GdT SCALE (Scheduling for Computing Architecture and Low Energy)

Organisateur:

Responsable(s): 
Jean-Marc NicodFEMTO-STBesançonjean-marc.nicod@ens2m.fr

La thématique du groupe de travail s’inscrit pleinement dans la définition de l’axe OPA avec une spécificité applicative des travaux au niveau des grands systèmes de calcul, des réseaux, des centres de données ou centres calcul de différentes natures. Ces architectures ont des contraintes spécifiques comme l’espace mémoire, la consommation énergétique, la disponibilité, etc. De nouveaux paradigmes apparaissent avec le Edge Computing et les architectures parallèles à large échelle (exascale). Cela conduit à une grande diversité dans les problèmes posés. Que les architectures visées soient à petite ou à très large échelle, la problématique de la consommation énergétique se pose avec là encore des réponses variées en terme de pilotage, de résilience et de robustesse. De nombreuses incertitudes existent dans les problèmes traités. Elles peuvent se manifester de différentes manières, pannes, alimentation énergétique renouvelable intrinsèquement intermittente, demande de calcul, etc.

 

Les Journées organisées par le GdT SCALE de l'axe OPA

  • le 3 décembre 2021

ENS-Lyon, site Monod (46 allée d’Italie)
Salles des conseils (2ème étage, centre)

Programme

Matin

10h00 Accueil café

10h30 Redouane Elghazi
Shelf schedules for independent moldable tasks to minimize the energy consumption
Scheduling independent tasks on a parallel platform is a widely-studied problem, in particular
when the goal is to minimize the total execution time, or makespan (P||Cmax problem
in Graham’s notations). Also, many applications do not consist of sequential tasks, but rather
parallel moldable tasks that can decide their degree of parallelism at execution (i.e., on how
many processors they are executed). Furthermore, since the energy consumption of data centers
is a growing concern, both from an environmental and economical point of view, minimizing the
energy consumption of a schedule is a main challenge to be addressed. One can then decide, for
each task, on how many processors it is executed, and at which speed the processors are operated,
with the goal to minimize the total energy consumption. We further focus on co-schedules,
where tasks are partitioned into shelves, and we prove that the problem of minimizing the energy
consumption remains NP-complete when static energy is consumed during the whole duration
of the application. We are however able to provide an optimal algorithm for the schedule within
one shelf, i.e., for a set of tasks that start at the same time. Several approximation results are
derived, and simulations are performed to show the performance of the proposed algorithms.

11h00 Lucas Perotin
Resilient Scheduling of Moldable Parallel Jobs to Cope with Silent Errors
This presentation features resilient scheduling of moldable parallel jobs on high-performance
computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before
execution, and their execution time obeys various speedup models. The objective is to minimize
the overall completion time or the makespan, when jobs can fail due to silent errors and hence
may need to be re-executed after each failure until successful completion. Our work introduce two
resilient scheduling algorithms that generalizes the classical scheduling framework for failure-free
jobs and cope with silent errors.

11h30 Vincent Fagnon
Scheduling avec oracle

Après Midi

14h00 Chifaa Dahik
Optimisation discrète robuste en présence d’incertitude ellipsoïdale
On s’intéresse à la version robuste des problèmes linéaires à variables binaires avec un ensemble
d’incertitude ellipsoïdal corrélé. Puisque ce problème est NP-difficile, une approche heuristique
intitulée DFW et basée sur l’algorithme de Frank-Wolfe est proposée. Dans cette approche, nous examinons la puissance d’exploration des itérations internes binaires de la méthode.
Pour les problèmes de petites tailles, la méthode est capable de fournir la solution optimale
fournie par CPLEX, après quelques centaines d’itérations. De plus, contrairement à la méthode
exacte, notre approche s’applique à des problèmes de grandes tailles également. Les résultats
numériques ont été appliqués au plus court chemin robuste. Un autre objectif est de proposer
une relaxation semi-définie positive (SDP) pour le plus court chemin robuste qui fournit une
borne inférieure pour valider des approches telles que l’algorithme DFW. Le problème relaxé
est le résultant d’une bidualisation du problème. Puis le problème relaxé est résolu en utilisant
une version creuse d’une méthode de décomposition dans un espace produit. Cette méthode
de validation est adaptée aux problèmes de grande taille. Finalement, une autre adaptation de
l’algorithme de Frank-Wolfe a été réalisé pour le problème du k-médiane, accompagnée d’un
algorithme d’arrondissement qui satisfait les contraintes

14h30 Anthony Dugois
Bounding the Flow Time in Online Scheduling with Structured Processing Sets
Replication in distributed key-value stores makes scheduling more challenging, as it introduces
processing set restrictions, which limits the number of machines that can process a given
task. We focus on the online minimization of the maximum response time in such systems, that
is, we aim at bounding the latency of each task. When processing sets have no structure, Anand
et al. (Algorithmica, 2017) derive a strong lower bound on the competitiveness of the problem :
no online scheduling algorithm can have a competitive ratio smaller than Omega(m), where m is the
number of machines. In practice, data replication schemes are regular, and structured processing
sets may make the problem easier to solve. We derive new lower bounds for various common
structures, including inclusive, nested or interval structures. In particular, we consider fixed sized
intervals of machines, which mimic the standard replication strategy of key-value stores. We
prove that EFT scheduling is (3 -2/k )-competitive when optimizing max-flow on disjoint intervals
of size k. However, we show that the competitive ratio of EFT is at least m - k + 1 when
these intervals overlap, even when unit tasks are considered. We compare these two replication
strategies in simulations and assess their efficiency when popularity biases are introduced, i.e.,
when some machines are accessed more frequently than others because they hold popular data.

15h00 Andrei Da Silva
Scheduling of Functions on Serverless Platforms

15h30 Manal Benaissa
Standalone Data-Center Sizing Combating the Over-provisioning of Both the IT
and Electrical Parts

During the two last decades, sustainability of IT infrastructures like data centers became a
major concern for computer giants like Google or Amazon. Data centers powered with renewable
energy have been proposed. But because of the intermittency of these power alternatives, these
platforms remains connected to the classical power grid. IT structure and electrical constraints
were often questioned separately, leading to a non-efficient global system. In this paper, an energy
self-sufficient green data center is modeled and designed, by proposing an electrically autonomous infrastructure including wind turbines, solar panels and its own short and long-term storage
devices mainly based on batteries and hydrogen system. The sizing of such an infrastructure one
of the main issue of the DATAZERO2 project. Based of previous sizing suggestions, the paper
shows how to reduce and combat the overprovioning by questioning its impact on the QoS, when
decreasing the number of computing or storage elements (servers and batteries).

16h00 Break / Discussion scientifique puis clôture de la journée

  • 13 avril 2022

FEMTO-ST, site TEMIS (15B avenue des Montboucons, Besançon)
Amphithéâtre Jean-Jacques Gagnepain (comme s'y rendre)

Programme

Matin

10h00 Accueil café

10h30 Redouane Elghazi
Update on the Asymptotic Optimality of LPT
When independent tasks are to be scheduled onto identical processors, the typical goal is to minimize the makespan. A simple and efficient heuristic consists in scheduling first the task with the longest processing time (LPT heuristic), and to plan its execution as soon as possible. While the performance of LPT has already been largely studied, in particular its asymptotic performance, we revisit results and propose a novel analysis for the case of tasks generated through uniform integer compositions. Also, we perform extensive simulations to empirically assess the asymptotic performance of LPT, and compare it to four other classical heuristics. The results show that the absolute error rapidly tends to zero for several distributions of task costs, including distributions studied by theoretical models, and realistic distributions coming from benchmarks.

11h00 Fanny Dufossé
Dimensioning of multi-clouds with follow-the-renewable approaches for environmental impact minimization
Cloud computing has become an essential component of our digital society. Efforts for reducing its environmental impact are being made by academics and industry alike, with commitments from major cloud providers to be fully operated by renewable energy in the future. One strategy to reduce nonrenewable energy usage is the “follow-the-renewables”, in which the workload is migrated to be executed in the data centers with the most availability of renewable energy.
The objective consists in developing a realistic model of Clouds supplied by green energy, and a simulation platform to compare scheduling algorithms. We consider the problem of cloud dimensioning, to minimize the ecological impact of data centers in terms of brown energy consumption and IT products manufacturing.

11h30 Bertrand Simon
Learning-Augmented Online Algorithms
Guaranteed classical online algorithms are often designed in order to minimize the competitive ratio, i.e., the performance in the worst case, so usually do not perform well on easy inputs. An orthogonal approach has been developed in the last decade thanks to the progress in machine learning, which allows to predict key parameters of the instance. Its downside is that no guarantee can be given on the prediction quality, for instance if the training set does not represent the current instance. This statement is at the origin of the recently emerging field of learning-augmented algorithms. In this framework, an online algorithm has access to a predictor, which can give relevant information about the future instance, though without any guarantee on the quality. The objective is then to design an algorithm which performs well when the predictor turns out to be accurate, while being robust to imprecise predictions. This tutorial will introduce this new domain, present some recent works and discuss current research questions.

Après Midi

13h30 Imed Kacem, Hans Kellerer
Minimizing the Maximum Lateness for Scheduling with Release Times and Job Rejection
Abstract coming soon

14h00 Denis Trystram
Une catastrophe écologique, une opportunité pour la recherche en Informatique
Je commencerai par dresser le panorama du développement des objets connectés et les enjeux économiques associés. Je ciblerai plus particulièrement le domaine de l'apprentissage embarqué et l'orchestration de services des systèmes digitaux du edge et extreme edge. Les questions de recherche associées sont stimulantes mais non soutenables dans un monde toujours plus digital, je terminerai en posant au débat de nouvelles questions qui s'appuient sur des outils simples, maitrisés et respectueux de l'environnement.

14h30 Anthony Dugois
A Scheduling Framework for Distributed Key-Value Stores and Application to Tail Latency Minimization
Distributed key-value stores employ replication for high availability. Yet, they do not always efficiently take advantage of the availability of multiple replicas for each value, and read operations often exhibit high tail latencies. Various replica selection strategies have been proposed to address this problem, together with local request scheduling policies. It is difficult, however, to determine what is the absolute performance gain each of these strategies can achieve. We present a formal framework allowing the systematic study of request scheduling strategies in key-value stores. We contribute a definition of the optimization problem related to reducing tail latency in a replicated key-value store as a minimization problem with respect to the maximum weighted flow criterion. By using scheduling theory, we show the difficulty of this problem, and therefore the need to develop performance guarantees. We also study the behavior of heuristic methods using simulations, which highlight which properties are useful for limiting tail latency: for instance, the EFT strategy—which uses the earliest available time of servers—exhibits a tail latency that is less than half that of state-of-the-art strategies, often matching the lower bound. Our study also emphasizes the importance of metrics such as the stretch to properly evaluate replica selection and local execution policies.

15h00 Frederic Vivien
Scheduling Strategies for Overloaded Real-Time Systems​
This work introduces and assesses novel strategies to schedule firm real-time jobs on an overloaded server. The jobs are released periodically and have the same relative deadline. Job execution times obey an arbitrary probability distribution and can take unbounded values (no WCET). We introduce three control parameters to decide when to start or interrupt a job. We couple this dynamic scheduling with several admission policies and investigate several optimization criteria, the most prominent being the Deadline Miss Ratio (DMR). Then we derive a Markov model and use its stationary distribution to determine the best value of each control parameter. Finally we conduct an extensive simulation campaign with 14 different probability distributions; the results nicely demonstrate how the new control parameters help improve system performance compared with traditional approaches. In particular, we show that (i) the best admission policy is to admit all jobs; (ii) the key control parameter is to upper bound the start time of each job; (iii) the best scheduling strategy decreases the DMR by up to 0.35 over traditional competitors.

15h30 Ilyes Kadri
Algorithms on variable-size rectangular interfaces
abstract coming soon

16h00 Break / Discussion scientifique puis clôture de la journée