    sched: replace capacity_factor by usage · e5aa6306
    vingu-linaro authored and Morten Rasmussen committed
    The scheduler tries to compute how many tasks a group of CPUs can handle by
    assuming that a task's load is SCHED_LOAD_SCALE and a CPU's capacity is
    SCHED_CAPACITY_SCALE. group_capacity_factor divides the capacity of the group
    by SCHED_LOAD_SCALE to estimate how many tasks can run in the group. Then, it
    compares this value with the sum of nr_running to decide if the group is
    overloaded or not. But group_capacity_factor hardly works for SMT systems;
    it sometimes works for big cores but fails to do the right thing for
    little cores.
    
    Below are two examples to illustrate the problem that this patch solves:
    
    1 - If the original capacity of a CPU is less than SCHED_CAPACITY_SCALE
    (640 as an example), a group of 3 CPUs will have a max capacity_factor of 2
    (div_round_closest(3*640/1024) = 2), which means that it will be seen as
    overloaded even if we have only one task per CPU.
    
    2 - If the original capacity of a CPU is greater than SCHED_CAPACITY_SCALE
    (1512 as an example), a group of 4 CPUs will have a capacity_factor of 4
    (at most, thanks to the fix [0] for SMT systems that prevents the appearance
    of ghost CPUs), but if one CPU is fully used by rt tasks (and its capacity is
    reduced to nearly nothing), the capacity factor of the group will still be 4
    (div_round_closest(3*1512/1024) = 4, which is also the cap imposed by [0]).
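
    The arithmetic in the two examples above can be reproduced with a small
    standalone sketch. It is a simplified model, not the kernel code: the
    helper names capacity_factor() and old_overloaded() are hypothetical, and
    the cap at the number of CPUs only stands in for the effect of fix [0].

```c
#define SCHED_CAPACITY_SCALE 1024UL

/* Round-to-nearest integer division for unsigned operands. */
static unsigned long div_round_closest(unsigned long x, unsigned long d)
{
	return (x + d / 2) / d;
}

/*
 * Simplified model of the old capacity factor (hypothetical helper):
 * group capacity divided by SCHED_CAPACITY_SCALE, capped at the number
 * of CPUs in the group (assumed to be the effect of fix [0]).
 */
static unsigned long capacity_factor(unsigned long group_capacity,
				     unsigned int nr_cpus)
{
	unsigned long factor = div_round_closest(group_capacity,
						 SCHED_CAPACITY_SCALE);

	return factor < nr_cpus ? factor : nr_cpus;
}

/* Old-style test: overloaded when more tasks run than the factor allows. */
static int old_overloaded(unsigned long factor, unsigned int nr_running)
{
	return nr_running > factor;
}
```

    With these helpers, capacity_factor(3 * 640, 3) is 2, so three tasks on
    three little CPUs are reported as overloaded. capacity_factor(3 * 1512, 4)
    is 4, so four tasks are not reported as overloaded even though only three
    CPUs are left for CFS tasks.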
    
    So, this patch tries to solve this issue by removing capacity_factor and
    replacing it with the following two metrics:
    - The available CPU capacity for CFS tasks, which is already used by
      load_balance.
    - The usage of the CPU by the CFS tasks. For the latter, utilization_avg_contrib
      has been re-introduced to compute the usage of a CPU by CFS tasks.
    
    group_capacity_factor and group_has_free_capacity have been removed and replaced
    by group_no_capacity. We compare the number of tasks with the number of CPUs and
    we evaluate the level of utilization of the CPUs to determine whether a group is
    overloaded or has capacity to handle more tasks.
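
    The classification described above can be sketched as follows. This is an
    illustration under stated assumptions rather than the patch itself: the
    struct layout, the field names and the margin_pct parameter (a percentage
    margin applied to usage before comparing it with capacity) are all
    hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-group statistics; names are illustrative only. */
struct group_stats {
	unsigned int nr_cpus;     /* number of CPUs in the group */
	unsigned int nr_running;  /* number of runnable tasks in the group */
	unsigned long capacity;   /* capacity available for CFS tasks */
	unsigned long usage;      /* usage of the CPUs by CFS tasks */
};

/*
 * A group can take more tasks if it has fewer tasks than CPUs, or if
 * its usage (scaled by the assumed margin) still fits in its capacity.
 */
static bool group_has_capacity(const struct group_stats *gs,
			       unsigned int margin_pct)
{
	if (gs->nr_running < gs->nr_cpus)
		return true;

	return gs->capacity * 100 > gs->usage * margin_pct;
}

/*
 * A group is overloaded only if it has more tasks than CPUs and its
 * usage (scaled by the assumed margin) exceeds its capacity.
 */
static bool group_is_overloaded(const struct group_stats *gs,
				unsigned int margin_pct)
{
	if (gs->nr_running <= gs->nr_cpus)
		return false;

	return gs->capacity * 100 < gs->usage * margin_pct;
}
```

    In this model, three little CPUs (capacity 3 * 640) running one task each
    are never reported as overloaded, because the task count does not exceed
    the CPU count. Because the group capacity only counts what is left for
    CFS tasks, a CPU consumed by rt tasks lowers the threshold at which the
    group stops having spare capacity.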
    
    For SD_PREFER_SIBLING, a group is tagged overloaded if it has more than 1 task,
    so it will be selected first (among the overloaded groups). Since [1],
    SD_PREFER_SIBLING is no longer affected by the computation of load_above_capacity
    because the local group is not overloaded.
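
    A minimal sketch of the SD_PREFER_SIBLING rule described above. The enum
    and the function name are hypothetical, and any additional conditions the
    patch applies before tagging a group are not modeled here.

```c
#include <stdbool.h>

/* Hypothetical group classification; names are illustrative only. */
enum group_type {
	GROUP_OTHER,
	GROUP_OVERLOADED,
};

/*
 * With prefer_sibling set, a group running more than one task is tagged
 * overloaded so that it is picked first when tasks are spread out.
 * Otherwise the classification computed earlier is kept.
 */
static enum group_type apply_prefer_sibling(bool prefer_sibling,
					    unsigned int nr_running,
					    enum group_type current)
{
	if (prefer_sibling && nr_running > 1)
		return GROUP_OVERLOADED;

	return current;
}
```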
    
    Finally, sched_group->sched_group_capacity->capacity_orig has been removed
    because it is no longer used during load balance.
    
    [1] https://lkml.org/lkml/2014/8/12/295
    
    
    
    Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
    [Fixed merge conflict on v3.19-rc6: Morten Rasmussen
    <morten.rasmussen@arm.com>]