1. 14 Dec, 2018 14 commits
  2. 11 Dec, 2018 14 commits
    • sched/fair: Select an energy-efficient CPU on task wake-up · 732cd75b
      Quentin Perret authored
      
      If an Energy Model (EM) is available and if the system isn't
      overutilized, re-route waking tasks into an energy-aware placement
      algorithm. The selection of an energy-efficient CPU for a task
      is achieved by estimating the impact on system-level active energy
      resulting from the placement of the task on the CPU with the highest
      spare capacity in each performance domain. This strategy spreads tasks
      in a performance domain and avoids overly aggressive task packing. The
      best CPU energy-wise is then selected if it saves a large enough amount
      of energy with respect to prev_cpu.
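The selection described above can be sketched in user space as follows. This is only an illustration, not the kernel implementation: the `toy_*` types, the two-level cost model and the 1/16 savings threshold are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; none of these are kernel types. */
struct toy_cpu {
	unsigned long capacity;	/* maximum compute capacity */
	unsigned long util;	/* current utilization, same scale */
	size_t pd;		/* index of the CPU's performance domain */
};

/* Two-level toy Energy Model: cost per unit of utilization. */
struct toy_pd {
	unsigned long cost_low;		/* used while max util <= threshold */
	unsigned long cost_high;	/* used above the threshold */
	unsigned long threshold;
};

/* Estimate system energy if the task (task_util) runs on CPU dst. */
unsigned long toy_energy(const struct toy_cpu *cpus, size_t n,
			 const struct toy_pd *pds, size_t npd,
			 unsigned long task_util, size_t dst)
{
	unsigned long total = 0;

	for (size_t d = 0; d < npd; d++) {
		unsigned long max_util = 0, sum_util = 0;

		for (size_t i = 0; i < n; i++) {
			unsigned long u;

			if (cpus[i].pd != d)
				continue;
			u = cpus[i].util + (i == dst ? task_util : 0);
			sum_util += u;
			if (u > max_util)
				max_util = u;
		}
		/* The busiest CPU of a domain decides the cost level. */
		total += sum_util * (max_util > pds[d].threshold ?
				     pds[d].cost_high : pds[d].cost_low);
	}
	return total;
}

/* Return the CPU to wake the task on; prev_cpu unless savings are large. */
size_t toy_find_energy_efficient_cpu(const struct toy_cpu *cpus, size_t n,
				     const struct toy_pd *pds, size_t npd,
				     unsigned long task_util, size_t prev_cpu)
{
	unsigned long prev_energy, best_energy;
	size_t best_cpu = prev_cpu;

	assert(prev_cpu < n);
	prev_energy = toy_energy(cpus, n, pds, npd, task_util, prev_cpu);
	best_energy = prev_energy;

	for (size_t d = 0; d < npd; d++) {
		size_t cand = n;	/* n means "no candidate yet" */
		unsigned long max_spare = 0;

		/* Candidate: CPU with the highest spare capacity in this domain. */
		for (size_t i = 0; i < n; i++) {
			unsigned long spare;

			if (cpus[i].pd != d ||
			    cpus[i].util + task_util > cpus[i].capacity)
				continue;
			spare = cpus[i].capacity - cpus[i].util;
			if (cand == n || spare > max_spare) {
				cand = i;
				max_spare = spare;
			}
		}
		if (cand == n || cand == prev_cpu)
			continue;

		unsigned long e = toy_energy(cpus, n, pds, npd, task_util, cand);
		if (e < best_energy) {
			best_energy = e;
			best_cpu = cand;
		}
	}

	/* Assumed threshold: migrate only if more than 1/16 of energy is saved. */
	if (prev_energy - best_energy > prev_energy / 16)
		return best_cpu;
	return prev_cpu;
}
```

Evaluating only one candidate per performance domain keeps the number of energy estimations proportional to the number of domains rather than to the number of CPUs.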
      
      Although it has already shown significant benefits on some existing
      targets, this approach cannot scale to platforms with numerous CPUs.
      This is an attempt to do something useful as writing a fast heuristic
      that performs reasonably well on a broad spectrum of architectures isn't
      an easy task. As such, the scope of usability of the energy-aware
      wake-up path is restricted to systems with the SD_ASYM_CPUCAPACITY flag
      set, and where the EM isn't too complex.
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-15-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Introduce an energy estimation helper function · 390031e4
      Quentin Perret authored
      
      In preparation for the definition of an energy-aware wakeup path,
      introduce a helper function to estimate the consequence on system energy
      when a specific task wakes up on a specific CPU. compute_energy()
      estimates the capacity state to be reached by all performance domains
      and estimates the consumption of each online CPU according to its Energy
      Model and its percentage of busy time.
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-14-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Add over-utilization/tipping point indicator · 2802bf3c
      Morten Rasmussen authored
      
      Energy-aware scheduling is only meant to be active while the system is
      _not_ over-utilized. That is, there are spare cycles available to shift
      tasks around based on their actual utilization to get a more
      energy-efficient task distribution without depriving any tasks. When
      above the tipping point, task placement is done the traditional way based
      on load_avg, spreading the tasks across as many cpus as possible based
      on priority-scaled load to preserve smp_nice. Below the tipping point we
      want to use util_avg instead. We need to define a criterion for when we
      make the switch.
      
      The util_avg for each cpu converges towards 100% regardless of how many
      additional tasks we may put on it. If we define over-utilized as:
      
      sum_{cpus}(rq.cfs.avg.util_avg) + margin > sum_{cpus}(rq.capacity)
      
      some individual cpus may be over-utilized running multiple tasks even
      when the above condition is false. That should be okay as long as we try
      to spread the tasks out to avoid per-cpu over-utilization as much as
      possible and if all tasks have the _same_ priority. If the latter isn't
      true, we have to consider priority to preserve smp_nice.
      
      For example, we could have n_cpus nice=-10 util_avg=55% tasks and
      n_cpus/2 nice=0 util_avg=60% tasks. Balancing based on util_avg we are
      likely to end up with nice=-10 tasks sharing cpus and nice=0 tasks
      getting their own as we have 1.5*n_cpus tasks in total and 55%+55% is less
      over-utilized than 55%+60% for those cpus that have to be shared. The
      system utilization is only 85% of the system capacity, but we are
      breaking smp_nice.
      
      To be sure not to break smp_nice, we have defined over-utilization
      conservatively as when any cpu in the system is fully utilized at its
      highest frequency instead:
      
      cpu_rq(any).cfs.avg.util_avg + margin > cpu_rq(any).capacity
      
      IOW, as soon as one cpu is (nearly) 100% utilized, we switch to load_avg
      to factor in priority to preserve smp_nice.
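The difference between the two definitions can be illustrated with a small user-space sketch. The function names and the additive `margin` parameter are assumptions for this example; they follow the two formulas above rather than any actual kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* System-wide definition: sum(util_avg) + margin > sum(capacity). */
bool toy_sum_overutilized(const unsigned long *util,
			  const unsigned long *capacity,
			  size_t n, unsigned long margin)
{
	unsigned long sum_util = 0, sum_cap = 0;

	assert(util != NULL && capacity != NULL);
	for (size_t i = 0; i < n; i++) {
		sum_util += util[i];
		sum_cap += capacity[i];
	}
	return sum_util + margin > sum_cap;
}

/* Conservative definition: any cpu with util_avg + margin > capacity. */
bool toy_any_overutilized(const unsigned long *util,
			  const unsigned long *capacity,
			  size_t n, unsigned long margin)
{
	assert(util != NULL && capacity != NULL);
	for (size_t i = 0; i < n; i++) {
		if (util[i] + margin > capacity[i])
			return true;
	}
	return false;
}
```

With two cpus of capacity 1024, utilizations of 1000 and 100, and a margin of 50, the system-wide test stays false while the per-cpu test trips, which is the conservative behaviour chosen here.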
      
      With this definition, we can skip periodic load-balance as no cpu has an
      always-running task when the system is not over-utilized. All tasks will
      be periodic and we can balance them at wake-up. This conservative
      condition does however mean that some scenarios that could benefit from
      energy-aware decisions even if one cpu is fully utilized would not get
      those benefits.
      
      For systems where some cpus might have reduced capacity
      (RT-pressure and/or big.LITTLE), we want periodic load-balance checks as
      soon as just a single cpu is fully utilized, as it might be one of those
      with reduced capacity, and in that case we want to migrate it.
      
      [ peterz: Added a comment explaining why new tasks are not accounted during
                overutilization detection. ]
      
      Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-13-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Clean-up update_sg_lb_stats parameters · 630246a0
      Quentin Perret authored
      
      In preparation for the introduction of a new root domain flag which can
      be set during load balance (the 'overutilized' flag), clean up the set
      of parameters passed to update_sg_lb_stats(). More specifically, the
      'local_group' and 'local_idx' parameters can be removed since they can
      easily be reconstructed from within the function.
      
      While at it, transform the 'overload' parameter into a flag stored in
      the 'sg_status' parameter hence facilitating the definition of new flags
      when needed.
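As a rough user-space illustration of why a status bitmask is easier to extend than a dedicated boolean parameter, consider the sketch below. The flag names, the values and the `toy_rq` structure are hypothetical and only mirror the idea, not the actual update_sg_lb_stats() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, chosen for this example only. */
#define TOY_SG_OVERLOAD		0x1	/* a cpu has more than one runnable task */
#define TOY_SG_OVERUTILIZED	0x2	/* a cpu's utilization exceeds its capacity */

struct toy_rq {
	unsigned int nr_running;
	unsigned long util;
	unsigned long capacity;
};

/* Accumulate status flags for a group of runqueues into *sg_status. */
void toy_update_sg_status(const struct toy_rq *rqs, size_t n, int *sg_status)
{
	assert(sg_status != NULL);
	for (size_t i = 0; i < n; i++) {
		if (rqs[i].nr_running > 1)
			*sg_status |= TOY_SG_OVERLOAD;
		if (rqs[i].util > rqs[i].capacity)
			*sg_status |= TOY_SG_OVERUTILIZED;
	}
}
```

Adding a further condition later only requires a new bit, with no change to the function signature.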
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-12-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Introduce the 'sched_energy_present' static key · 1f74de87
      Quentin Perret authored
      
      In order to make sure Energy Aware Scheduling (EAS) will not impact
      systems where no Energy Model is available, introduce a static key
      guarding the access to EAS code. Since EAS is enabled on a
      per-root-domain basis, the static key is enabled when at least one root
      domain meets all conditions for EAS.
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-10-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Make Energy Aware Scheduling depend on schedutil · 531b5c9f
      Quentin Perret authored
      
      Energy Aware Scheduling (EAS) is designed with the assumption that
      frequencies of CPUs follow their utilization value. When using a CPUFreq
      governor other than schedutil, the chances of this assumption being true
      are small, if any. When schedutil is being used, EAS' predictions are at
      least consistent with the frequency requests. Although those requests
      have no guarantees to be honored by the hardware, they should at least
      guide DVFS in the right direction and provide some hope with regard to the
      EAS model being accurate.
      
      To make sure EAS is only used in a sane configuration, create a strong
      dependency on schedutil being used. Since having sugov compiled-in does
      not provide that guarantee, make CPUFreq call a scheduler function on
      governor changes hence letting it rebuild the scheduling domains, check
      the governors of the online CPUs, and enable/disable EAS accordingly.
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Disable EAS on inappropriate platforms · b68a4c0d
      Quentin Perret authored
      
      Energy Aware Scheduling (EAS) in its current form is most relevant on
      platforms with asymmetric CPU topologies (e.g. Arm big.LITTLE) since
      this is where there is a lot of potential for saving energy through
      scheduling. This is particularly true since the Energy Model only
      includes the active power costs of CPUs, hence not providing enough data
      to compare packing-vs-spreading strategies.
      
      As such, disable EAS on root domains where the SD_ASYM_CPUCAPACITY flag
      is not set. While at it, disable EAS on systems where the complexity of
      the Energy Model is too high since that could lead to unacceptable
      scheduling overhead.
      
      All in all, EAS can be used on a root domain if and only if:
        1. an Energy Model is available;
        2. the root domain has an asymmetric CPU capacity topology;
        3. the complexity of the root domain's EM is low enough to keep
           scheduling overheads low.
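The three conditions listed above can be expressed as a single predicate, sketched here in user space. The commit message does not say how EM complexity is measured, so the metric (number of performance domains times the sum of capacity states and CPUs) and the 2048 limit below are assumptions for illustration, as are the `toy_*` names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed upper bound on EM complexity, for illustration only. */
#define TOY_EM_MAX_COMPLEXITY 2048U

/* Hypothetical summary of a root domain; not a kernel structure. */
struct toy_root_domain {
	bool has_em;		/* condition 1: an Energy Model is available */
	bool asym_cpucapacity;	/* condition 2: asymmetric CPU capacities */
	unsigned int nr_pd;	/* number of performance domains */
	unsigned int nr_cpus;	/* number of CPUs in the root domain */
	unsigned int nr_cs;	/* total number of capacity states */
};

/* True if and only if all three conditions hold. */
bool toy_eas_allowed(const struct toy_root_domain *rd)
{
	assert(rd != NULL);
	if (!rd->has_em || !rd->asym_cpucapacity)
		return false;
	/* Condition 3: keep the assumed complexity metric under the limit. */
	return rd->nr_pd * (rd->nr_cs + rd->nr_cpus) <= TOY_EM_MAX_COMPLEXITY;
}
```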
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-8-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Add lowest CPU asymmetry sched_domain level pointer · 011b27bb
      Quentin Perret authored
      
      Add another member to the family of per-cpu sched_domain shortcut
      pointers. This one, sd_asym_cpucapacity, points to the lowest level
      at which the SD_ASYM_CPUCAPACITY flag is set. While at it, rename the
      sd_asym shortcut to sd_asym_packing to avoid confusion.
      
      Generally speaking, the largest opportunity to save energy via
      scheduling comes from a smarter exploitation of heterogeneous platforms
      (i.e. big.LITTLE). Consequently, the sd_asym_cpucapacity shortcut will
      be used at first as the lowest domain where Energy-Aware Scheduling
      (EAS) should be applied. For example, it is possible to apply EAS within
      a socket on a multi-socket system, as long as each socket has an
      asymmetric topology. Energy-aware cross-sockets wake-up balancing will
      only happen when the system is over-utilized, or this_cpu and prev_cpu
      are in different sockets.
      
      Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-7-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Reference the Energy Model of CPUs when available · 6aa140fa
      Quentin Perret authored
      
      The existing scheduling domain hierarchy is defined to map to the cache
      topology of the system. However, Energy Aware Scheduling (EAS) requires
      more knowledge about the platform, and specifically needs to know about
      the span of Performance Domains (PD), which do not always align with
      caches.
      
      To address this issue, use the Energy Model (EM) of the system to extend
      the scheduler topology code with a representation of the PDs, alongside
      the scheduling domains. More specifically, a linked list of PDs is
      attached to each root domain. When multiple root domains are in use,
      each list contains only the PDs covering the CPUs of its root domain. If
      a PD spans over CPUs of multiple different root domains, it will be
      duplicated in all lists.
      
      The lists are fully maintained by the scheduler from
      partition_sched_domains() in order to cope with hotplug and cpuset
      changes. As for scheduling domains, the lists are protected by RCU to
      ensure safe concurrent updates.
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-6-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • PM: Introduce an Energy Model management framework · 27871f7a
      Quentin Perret authored
      
      Several subsystems in the kernel (task scheduler and/or thermal at the
      time of writing) can benefit from knowing about the energy consumed by
      CPUs. Yet, this information can come from different sources (DT or
      firmware for example), in different formats, hence making it hard to
      exploit without a standard API.
      
      As an attempt to address this, introduce a centralized Energy Model
      (EM) management framework which aggregates the power values provided
      by drivers into a table for each performance domain in the system. The
      power cost tables are made available to interested clients (e.g. task
      scheduler or thermal) via platform-agnostic APIs. The overall design
      is represented by the diagram below (focused on Arm-related drivers as
      an example, but applicable to any architecture):
      
           +---------------+  +-----------------+  +-------------+
           | Thermal (IPA) |  | Scheduler (EAS) |  |    Other    |
           +---------------+  +-----------------+  +-------------+
                   |                   | em_pd_energy()   |
                   |                   | em_cpu_get()     |
                   +-----------+       |         +--------+
                               |       |         |
                               v       v         v
                            +---------------------+
                            |                     |
                            |    Energy Model     |
                            |                     |
                            |     Framework       |
                            |                     |
                            +---------------------+
                               ^       ^       ^
                               |       |       | em_register_perf_domain()
                    +----------+       |       +---------+
                    |                  |                 |
            +---------------+  +---------------+  +--------------+
            |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
            +---------------+  +---------------+  +--------------+
                    ^                  ^                 ^
                    |                  |                 |
            +--------------+   +---------------+  +--------------+
            | Device Tree  |   |   Firmware    |  |      ?       |
            +--------------+   +---------------+  +--------------+
      
      Drivers (typically, but not limited to, CPUFreq drivers) can register
      data in the EM framework using the em_register_perf_domain() API. The
      calling driver must provide a callback function with a standardized
      signature that will be used by the EM framework to build the power
      cost tables of the performance domain. This design should offer a lot of
      flexibility to calling drivers, which are free to read information
      from any location and to use any technique to compute power costs.
      Moreover, the capacity states registered by drivers in the EM framework
      are not required to match real performance states of the target. This
      is particularly important on targets where the performance states are
      not known by the OS.
      
      The power cost coefficients managed by the EM framework are specified in
      milli-watts. Although the two potential users of those coefficients (IPA
      and EAS) only need relative correctness, IPA specifically needs to
      compare the power of CPUs with the power of other components (GPUs, for
      example), which are still expressed in absolute terms in their
      respective subsystems. Hence, specifying the power of CPUs in
      milli-watts should help transitioning IPA to using the EM framework
      without introducing new problems by keeping units comparable across
      sub-systems.
      In the longer term, the EM of devices other than CPUs could also be
      managed by the EM framework, which would make it possible to remove the
      absolute unit. However, this is not absolutely required as a first step,
      so this extension of the EM framework is left for later.
      
      On the client side, the EM framework offers APIs to access the power
      cost tables of a CPU (em_cpu_get()), and to estimate the energy
      consumed by the CPUs of a performance domain (em_pd_energy()). Clients
      such as the task scheduler can then use these APIs to access the shared
      data structures holding the Energy Model of CPUs.
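To make the idea of a per-domain power cost table more concrete, here is a small user-space sketch of how a client might estimate the energy of one performance domain from such a table. The `toy_*` names, the table layout and the estimation formula are simplifying assumptions for illustration; they are not the signatures or internals of em_cpu_get() or em_pd_energy().

```c
#include <assert.h>
#include <stddef.h>

/* One hypothetical capacity state of a performance domain. */
struct toy_cap_state {
	unsigned long capacity;	/* compute capacity delivered at this state */
	unsigned long power_mw;	/* power of one CPU at this state, in milli-watts */
};

/* Hypothetical per-domain table, sorted by increasing capacity. */
struct toy_perf_domain {
	const struct toy_cap_state *table;
	size_t nr_states;
};

/*
 * Estimate the energy of a domain: pick the lowest state able to serve
 * the busiest CPU (max_util), then scale its power by the busy time of
 * all CPUs in the domain (sum_util / capacity of that state).
 */
unsigned long toy_pd_energy(const struct toy_perf_domain *pd,
			    unsigned long max_util, unsigned long sum_util)
{
	assert(pd != NULL && pd->nr_states > 0);

	/* Fall back to the highest state if no state is large enough. */
	const struct toy_cap_state *cs = &pd->table[pd->nr_states - 1];

	for (size_t i = 0; i < pd->nr_states; i++) {
		if (pd->table[i].capacity >= max_util) {
			cs = &pd->table[i];
			break;
		}
	}
	return cs->power_mw * sum_util / cs->capacity;
}
```

Because all clients read the same table, only relative correctness of the power values matters for comparing placements within this sketch.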
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cpufreq: Prepare schedutil for Energy Aware Scheduling · 938e5e4b
      Quentin Perret authored
      
      Schedutil requests frequency by aggregating utilization signals from
      the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of
      them. Since Energy Aware Scheduling (EAS) needs to be able to predict
      the frequency requests, it needs to forecast the decisions made by the
      governor.
      
      In order to prepare for the introduction of EAS, introduce
      schedutil_freq_util() to centralize the aforementioned signal
      aggregation and make it available to both schedutil and EAS. Since
      frequency selection and energy estimation still need to deal with RT and
      DL signals slightly differently, schedutil_freq_util() is called with a
      different 'type' parameter in those two contexts, and returns an
      aggregated utilization signal accordingly. While at it, introduce the
      map_util_freq() function which is designed to make schedutil's 25%
      margin usable easily for both sugov and EAS.
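The 25% margin can be sketched as a utilization-to-frequency mapping. The helper below is a user-space illustration with an assumed name and signature, written from the description above rather than copied from the kernel.

```c
#include <assert.h>

/*
 * Map a utilization value to a frequency request with 25% headroom:
 * (freq + freq / 4) * util / cap, where freq is the maximum frequency
 * and cap the maximum capacity of the CPU.
 */
unsigned long toy_map_util_freq(unsigned long util, unsigned long freq,
				unsigned long cap)
{
	assert(cap > 0);
	/* freq + (freq >> 2) is freq * 1.25, i.e. the 25% margin. */
	return (freq + (freq >> 2)) * util / cap;
}
```

For instance, a CPU at half utilization (512 out of 1024) with a 1000000 kHz maximum frequency maps to a 625000 kHz request in this sketch. Sharing one helper lets the energy estimation predict the same request the governor would make.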
      
      As EAS will be able to predict schedutil's frequency requests more
      accurately than any other governor by design, it'd be sensible to make
      sure EAS cannot be used without schedutil. This will be done later, once
      EAS has actually been introduced.
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-3-quentin.perret@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Relocate arch_scale_cpu_capacity() to the internal header · 5bd0988b
      Quentin Perret authored
      
      By default, arch_scale_cpu_capacity() is only visible from within the
      kernel/sched folder. Relocate it to include/linux/sched/topology.h to
      make it visible to other clients needing to know about the capacity of
      CPUs, such as the Energy Model framework.
      
      This also shrinks the <linux/sched/topology.h> public header.
      
      Signed-off-by: Quentin Perret <quentin.perret@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adharmap@codeaurora.org
      Cc: chris.redpath@arm.com
      Cc: currojerez@riseup.net
      Cc: dietmar.eggemann@arm.com
      Cc: edubezval@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: javi.merino@kernel.org
      Cc: joel@joelfernandes.org
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: pkondeti@codeaurora.org
      Cc: rjw@rjwysocki.net
      Cc: skannan@codeaurora.org
      Cc: smuckle@google.com
      Cc: srinivas.pandruvada@linux.intel.com
      Cc: thara.gopinath@linaro.org
      Cc: tkjos@google.com
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Cc: viresh.kumar@linaro.org
      Link: https://lkml.kernel.org/r/20181203095628.11858-2-quentin.perret@arm.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5bd0988b
    • Yangtao Li's avatar
      sched/core: Remove unnecessary unlikely() in push_*_task() · 9ebc6053
      Yangtao Li authored
      
      
      WARN_ON() already contains an unlikely(), so it's not necessary to
      wrap the condition in unlikely() and then call WARN_ON(1).
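
      The before/after pattern can be sketched in user space as follows. This is
      only an illustration: the unlikely() and WARN_ON() macros below are
      simplified stand-ins rather than the kernel's definitions, and warn_count,
      push_check_old() and push_check_new() are hypothetical names introduced for
      the example. It relies on GNU C statement expressions (GCC or Clang).

```c
#include <stdio.h>

/* Simplified user-space stand-ins; the kernel's real macros differ in detail. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int warn_count; /* hypothetical counter so the warnings are observable */

/* Like the kernel macro, this evaluates to the truth value of cond and
 * already hints to the compiler that cond is unlikely. */
#define WARN_ON(cond) ({                                              \
    int ret_warn_on_ = !!(cond);                                      \
    if (unlikely(ret_warn_on_)) {                                     \
        warn_count++;                                                 \
        fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__);    \
    }                                                                 \
    unlikely(ret_warn_on_);                                           \
})

/* Before: an explicit unlikely() followed by an unconditional WARN_ON(1). */
static int push_check_old(int next_task, int curr)
{
    if (unlikely(next_task == curr)) {
        WARN_ON(1);
        return 0;
    }
    return 1;
}

/* After: WARN_ON() tests the condition itself, so unlikely() is redundant. */
static int push_check_new(int next_task, int curr)
{
    if (WARN_ON(next_task == curr))
        return 0;
    return 1;
}
```

      Both helpers return the same result and emit one warning when the two
      arguments are equal; the second form is simply shorter.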
      
      Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181103172602.1917-1-tiny.windzz@gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9ebc6053
    • vingu-linaro's avatar
      sched/topology: Remove the ::smt_gain field from 'struct sched_domain' · 765d0af1
      vingu-linaro authored
      ::smt_gain is used to compute the capacity of the CPUs of an SMT core, with
      the constraint 1 < ::smt_gain < 2 so that the number of CPUs per core can be
      computed. The has_free_capacity field of struct numa_stat, which was the
      last user of this CPUs-per-core computation, has been removed by:

        2d4056fa ("sched/numa: Remove numa_has_capacity()")

      We can now remove this constraint on core capacity and use the default value
      SCHED_CAPACITY_SCALE for SMT CPUs. With this removal, SCHED_CAPACITY_SCALE
      becomes the maximum compute capacity of CPUs on every system. This should
      help to simplify some code and remove fields like rd->max_cpu_capacity.

      Furthermore, arch_scale_cpu_capacity() is used with a NULL sd in several other
      places in the code when it wants the capacity of a CPU to scale
      some metrics, as in PELT, deadline or schedutil. In the SMT case, the value
      returned is not the capacity of SMT CPUs but the default SCHED_CAPACITY_SCALE.
      
      So remove it.
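
      As a rough illustration of why the constraint existed, the sketch below
      models the old and new per-CPU capacity in user space. The gain value 1178
      (about 1.15 on the SCHED_CAPACITY_SCALE scale) is an assumption chosen for
      the example and is not stated in this commit message; the function names are
      hypothetical and the code is not the kernel's implementation.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/* Assumed illustrative gain: between 1x and 2x SCHED_CAPACITY_SCALE. */
#define SMT_GAIN 1178UL

/* Old scheme (sketch): the capacity of a core is the gain, shared
 * equally between its hardware threads. */
static unsigned long smt_cpu_capacity_old(unsigned long threads_per_core)
{
    if (threads_per_core > 1)
        return SMT_GAIN / threads_per_core;
    return SCHED_CAPACITY_SCALE;
}

/* New scheme (sketch): every CPU reports the default scale. */
static unsigned long smt_cpu_capacity_new(unsigned long threads_per_core)
{
    (void)threads_per_core;
    return SCHED_CAPACITY_SCALE;
}

/* Recover the number of threads per core from the summed capacity of
 * nr_cpus CPUs by rounding up SCALE * nr_cpus / total_capacity.
 * This only works while 1 < gain < 2 in units of SCHED_CAPACITY_SCALE. */
static unsigned long threads_per_core_from_capacity(unsigned long nr_cpus,
                                                    unsigned long total_capacity)
{
    return (SCHED_CAPACITY_SCALE * nr_cpus + total_capacity - 1) / total_capacity;
}
```

      With two threads per core, each CPU gets 1178 / 2 = 589 under the old
      scheme, and the rounding-up division recovers 2 threads per core. Once no
      code needs that recovery step, reporting 1024 for every CPU is sufficient.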
      
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1535548752-4434-4-git-send-email-vincent.guittot@linaro.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      765d0af1
  3. 03 Dec, 2018 2 commits
  4. 02 Dec, 2018 4 commits
    • Linus Torvalds's avatar
      Linux 4.20-rc5 · 25956467
      Linus Torvalds authored
      25956467
    • Linus Torvalds's avatar
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 6a512726
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Volume is a little higher than usual due to a set of gpio fixes for
        Davinci platforms that's been around a while, still seemed appropriate
        to not hold off until next merge window.
      
        Besides that it's the usual mix of minor fixes, mostly corrections of
        small stuff in device trees.
      
        Major stability-related one is the removal of a regulator from DT on
        Rock960, since DVFS caused undervoltage. I expect it'll be restored
        once they figure out the underlying issue"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
        MAINTAINERS: Remove unused Qualcomm SoC mailing list
        ARM: davinci: dm644x: set the GPIO base to 0
        ARM: davinci: da830: set the GPIO base to 0
        ARM: davinci: dm355: set the GPIO base to 0
        ARM: davinci: dm646x: set the GPIO base to 0
        ARM: davinci: dm365: set the GPIO base to 0
        ARM: davinci: da850: set the GPIO base to 0
        gpio: davinci: restore a way to manually specify the GPIO base
        ARM: davinci: dm644x: define gpio interrupts as separate resources
        ARM: davinci: dm355: define gpio interrupts as separate resources
        ARM: davinci: dm646x: define gpio interrupts as separate resources
        ARM: davinci: dm365: define gpio interrupts as separate resources
        ARM: davinci: da8xx: define gpio interrupts as separate resources
        ARM: dts: at91: sama5d2: use the divided clock for SMC
        ARM: dts: imx51-zii-rdu1: Remove EEPROM node
        ARM: dts: rockchip: Remove @0 from the veyron memory node
        arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
        arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP
        arm64: dts: sdm845-mtp: Reserve reserved gpios
        arm64: dts: ti: k3-am654: Fix wakeup_uart reg address
        ...
      6a512726
    • Linus Torvalds's avatar
      Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 292974c5
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - A revert of a previous commit as it is no longer necessary and has
         shown to cause problems in some memory hotplug cases.
      
       - Some small fixes and a minor cleanup.
      
       - A patch for adding better diagnostic data in a very rare failure
         case.
      
      * tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        pvcalls-front: fixes incorrect error handling
        Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
        xen: xlate_mmu: add missing header to fix 'W=1' warning
        xen/x86: add diagnostic printout to xen_mc_flush() in case of error
        x86/xen: cleanup includes in arch/x86/xen/spinlock.c
      292974c5
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma · a234c737
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "This contains two fixes to at_hdmac which fix long-standing bugs
        reported recently on serial transfers causing a memory leak. These
        fixes were done by Richard Genoud"
      
      * tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: at_hdmac: fix module unloading
        dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
      a234c737
  5. 01 Dec, 2018 6 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4b783176
      Linus Torvalds authored
      Pull STIBP fallout fixes from Thomas Gleixner:
       "The performance destruction department finally got its act together
        and came up with a cure for the STIBP regression:
      
         - Provide a command line option to control the spectre v2 user space
           mitigations. Default is either seccomp or prctl (if seccomp is
           disabled in Kconfig). prctl allows mitigation opt-in, seccomp
           enables the mitigation for sandboxed processes.
      
         - Rework the code to handle the conditional STIBP/IBPB control and
           remove the now unused ptrace_may_access_sched() optimization
           attempt
      
         - Disable STIBP automatically when SMT is disabled
      
         - Optimize the switch_to() logic to avoid MSR writes and invocations
           of __switch_to_xtra().
      
         - Make the asynchronous speculation TIF updates synchronous to
           prevent stale mitigation state.
      
        As a general cleanup this also makes retpoline directly depend on
        compiler support and removes the 'minimal retpoline' option which just
        pretended to provide some form of security while providing none"
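
      The prctl opt-in mentioned above can be sketched from user space as
      follows. The PR_* constants are the ones exposed by <linux/prctl.h> on
      kernels that carry this series (fallback definitions are provided for older
      headers); the helper names are hypothetical, and whether the calls succeed
      depends on the running kernel, the CPU and the boot-time mitigation mode.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values match <linux/prctl.h>. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL         (1UL << 0)
#define PR_SPEC_ENABLE        (1UL << 1)
#define PR_SPEC_DISABLE       (1UL << 2)
#define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif

/* Ask the kernel to disable indirect branch speculation for the calling
 * task, i.e. opt in to the mitigation. Returns 0 on success, or -1 with
 * errno set if the request is unsupported or not permitted. */
static int opt_in_indirect_branch_mitigation(void)
{
    return prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                 PR_SPEC_DISABLE, 0, 0);
}

/* Query the current state: a bitmask of PR_SPEC_* flags, or -1 on error. */
static int query_indirect_branch_state(void)
{
    return prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
}

/* Interpret a state value returned by query_indirect_branch_state():
 * the mitigation is active when speculation is reported as disabled. */
static int mitigation_is_active(int state)
{
    if (state < 0)
        return 0;
    return (state & (PR_SPEC_DISABLE | PR_SPEC_FORCE_DISABLE)) != 0;
}
```

      A process would typically call opt_in_indirect_branch_mitigation() early
      and then check mitigation_is_active(query_indirect_branch_state()) rather
      than assume the request took effect.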
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
        x86/speculation: Provide IBPB always command line options
        x86/speculation: Add seccomp Spectre v2 user space protection mode
        x86/speculation: Enable prctl mode for spectre_v2_user
        x86/speculation: Add prctl() control for indirect branch speculation
        x86/speculation: Prepare arch_smt_update() for PRCTL mode
        x86/speculation: Prevent stale SPEC_CTRL msr content
        x86/speculation: Split out TIF update
        ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
        x86/speculation: Prepare for conditional IBPB in switch_mm()
        x86/speculation: Avoid __switch_to_xtra() calls
        x86/process: Consolidate and simplify switch_to_xtra() code
        x86/speculation: Prepare for per task indirect branch speculation control
        x86/speculation: Add command line control for indirect branch speculation
        x86/speculation: Unify conditional spectre v2 print functions
        x86/speculataion: Mark command line parser data __initdata
        x86/speculation: Mark string arrays const correctly
        x86/speculation: Reorder the spec_v2 code
        x86/l1tf: Show actual SMT state
        x86/speculation: Rework SMT state change
        sched/smt: Expose sched_smt_present static key
        ...
      4b783176
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-block · 88058417
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
      
       - Single range elevator discard merge fix, that caused crashes (Ming)
      
       - Fix for a regression in O_DIRECT, where we could potentially lose the
         error value (Maximilian Heyne)
      
       - NVMe pull request from Christoph, with little fixes all over the map
         for NVMe.
      
      * tag 'for-linus-20181201' of git://git.kernel.dk/linux-block:
        block: fix single range discard merge
        nvme-rdma: fix double freeing of async event data
        nvme: flush namespace scanning work just before removing namespaces
        nvme: warn when finding multi-port subsystems without multipathing enabled
        fs: fix lost error code in dio_complete
        nvme-pci: fix surprise removal
        nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
        nvme: Free ctrl device name on init failure
      88058417
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c734b425
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - Fix a link speed checking interface that broke PCIe gen3 cards in
         gen1 slots (Mikulas Patocka)
      
       - Fix an imx6 link training error (Trent Piepho)
      
       - Fix a layerscape outbound window accessor calling error (Hou
         Zhiqiang)
      
       - Fix a DesignWare endpoint MSI-X address calculation error (Gustavo
         Pimentel)
      
      * tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Fix incorrect value returned from pcie_get_speed_cap()
        PCI: dwc: Fix MSI-X EP framework address calculation bug
        PCI: layerscape: Fix wrong invocation of outbound window disable accessor
        PCI: imx6: Fix link training status detection in link up check
      c734b425
    • Bjorn Helgaas's avatar
      Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus · c74eadf8
      Bjorn Helgaas authored
        - Fix DesignWare endpoint MSI-X address calculation bug (Gustavo
          Pimentel)
      
        - Fix Layerscape outbound window disable usage (Hou Zhiqiang)
      
        - Fix imx6 link up detection (Trent Piepho)
      
      * lorenzo/pci/controller-fixes:
        PCI: dwc: Fix MSI-X EP framework address calculation bug
        PCI: layerscape: Fix wrong invocation of outbound window disable accessor
        PCI: imx6: Fix link training status detection in link up check
      c74eadf8
    • Mikulas Patocka's avatar
      PCI: Fix incorrect value returned from pcie_get_speed_cap() · f1f90e25
      Mikulas Patocka authored
      The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
      the register and compare it against them.
      
      This fixes errors like this:
      
        amdgpu: [powerplay] failed to send message 261 ret is 0
      
      when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is
      being incorrectly reported as PCIe-v3 capable.
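
      The difference between the two tests can be sketched in a few lines of
      user-space C. The PCI_EXP_LNKCAP_* values are those from the kernel's
      pci_regs.h; the enum and the two function names are hypothetical, and the
      sketch decodes only the Link Capabilities field, ignoring the
      PCI_EXP_LNKCAP2 handling referred to in the maintainer's note further down.

```c
#include <stdint.h>

/* Values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCAP_SLS        0x0000000f /* Supported Link Speeds field */
#define PCI_EXP_LNKCAP_SLS_2_5GB  0x00000001
#define PCI_EXP_LNKCAP_SLS_5_0GB  0x00000002
#define PCI_EXP_LNKCAP_SLS_8_0GB  0x00000003
#define PCI_EXP_LNKCAP_SLS_16_0GB 0x00000004

/* Hypothetical result type for the sketch. */
enum link_speed {
    SPEED_UNKNOWN, SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT, SPEED_16_0GT
};

/* Buggy pattern: treats the SLS_* values as if they were bit masks.
 * A 2.5GT/s slot (field value 1) matches "& 0x3" and reports 8.0GT/s. */
static enum link_speed speed_cap_bitmask_test(uint32_t lnkcap)
{
    if (lnkcap & PCI_EXP_LNKCAP_SLS_16_0GB)
        return SPEED_16_0GT;
    if (lnkcap & PCI_EXP_LNKCAP_SLS_8_0GB)
        return SPEED_8_0GT;
    if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
        return SPEED_5_0GT;
    if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
        return SPEED_2_5GT;
    return SPEED_UNKNOWN;
}

/* Corrected pattern: mask out the field, then compare it against the values. */
static enum link_speed speed_cap_masked_compare(uint32_t lnkcap)
{
    switch (lnkcap & PCI_EXP_LNKCAP_SLS) {
    case PCI_EXP_LNKCAP_SLS_16_0GB:
        return SPEED_16_0GT;
    case PCI_EXP_LNKCAP_SLS_8_0GB:
        return SPEED_8_0GT;
    case PCI_EXP_LNKCAP_SLS_5_0GB:
        return SPEED_5_0GT;
    case PCI_EXP_LNKCAP_SLS_2_5GB:
        return SPEED_2_5GT;
    default:
        return SPEED_UNKNOWN;
    }
}
```

      For a register value whose speed field is 1 (a 2.5GT/s-only slot), the
      bitmask test returns SPEED_8_0GT while the masked compare returns
      SPEED_2_5GT, which matches the misreporting described above.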
      
      6cf57be0, which appeared in v4.17, added pcie_get_speed_cap() with the
      incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask.  5d9a6330, which
      appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the
      amdgpu bug reports below are regressions in v4.19.
      
      Fixes: 6cf57be0 ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
      Fixes: 5d9a6330 ("drm/amdgpu: use pcie functions for link width and speed")
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
      Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
      
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      [bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
      PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2,
      remove test of PCI_EXP_LNKCAP for zero, since that register is required]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org	# v4.17+
      f1f90e25
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · d8f190ee
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "31 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
        ocfs2: fix potential use after free
        mm/khugepaged: fix the xas_create_range() error path
        mm/khugepaged: collapse_shmem() do not crash on Compound
        mm/khugepaged: collapse_shmem() without freezing new_page
        mm/khugepaged: minor reorderings in collapse_shmem()
        mm/khugepaged: collapse_shmem() remember to clear holes
        mm/khugepaged: fix crashes due to misaccounted holes
        mm/khugepaged: collapse_shmem() stop if punched or truncated
        mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
        mm/huge_memory: splitting set mapping+index before unfreeze
        mm/huge_memory: rename freeze_page() to unmap_page()
        initramfs: clean old path before creating a hardlink
        kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
        psi: make disabling/enabling easier for vendor kernels
        proc: fixup map_files test on arm
        debugobjects: avoid recursive calls with kmemleak
        userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
        userfaultfd: shmem: add i_size checks
        userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
        userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
        ...
      d8f190ee