1. 02 Aug, 2021 40 commits
    • Lukasz Luba's avatar
      PM: EM: Increase energy calculation precision · 60b3fa3f
      Lukasz Luba authored
      The Energy Model (EM) provides useful information about device power in
      each performance state to other subsystems, such as the Energy Aware
      Scheduler (EAS). The energy calculation in EAS performs arithmetic based
      on the EM function em_cpu_energy(). The current implementation of that
      function uses em_perf_state::cost as a pre-computed cost coefficient equal to:
      cost = power * max_frequency / frequency.
      The 'power' is expressed in milli-Watts (or in an abstract scale).
      There are corner cases in which the EAS energy calculation for two
      Performance Domains (PDs) returns the same value. EAS compares these
      values to choose the smaller one, and they can end up equal because of
      rounding error. In such a scenario we need better resolution, e.g. 1000
      times better. To provide it, increase the resolution of
      em_perf_state::cost on 64-bit architectures. The cost of increasing the
      resolution on 32-bit architectures is pretty high (64-bit division) and
      the returns do not justify it.
      This patch avoids the rounding-to-milli-Watt errors that might occur in
      the EAS energy estimation for each Performance Domain (PD). The rounding
      error is common for small tasks, which have small utilization.
      There are two places in the code where it makes a difference:
      1. In the find_energy_efficient_cpu() where we are searching for
      best_delta. We might suffer there when two PDs return the same result,
      like in the example below.
      Consider a lightly utilized system, e.g. ~200 sum_util for PD0 and ~220
      for PD1, with quite a few small tasks of ~10-15 util. These tasks would
      suffer from the rounding error. Such system utilization has been seen
      while playing some simple games. Under these conditions our partner
      reported 5..10mA less battery drain.
      Some details:
      We have two Perf Domains (PDs): PD0 (big) and PD1 (little)
      Let's compare w/o patch set ('old') and w/ patch set ('new')
      We are comparing energy w/ task and w/o task placed in the PDs
      a) 'old' w/o patch set, PD0
      task_util = 13
      cost = 480
      sum_util_w/o_task = 215
      sum_util_w_task = 228
      scale_cpu = 1024
      energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
      energy_w_task = 480 * 228 / 1024 = 106.87 => 106
      energy_diff = 106 - 100 = 6
      (this is equal to 'old' PD1's energy_diff in 'c)')
      b) 'new' w/ patch set, PD0
      task_util = 13
      cost = 480 * 1000 = 480000
      sum_util_w/o_task = 215
      sum_util_w_task = 228
      energy_w/o_task = 480000 * 215 / 1024 = 100781
      energy_w_task = 480000 * 228 / 1024  = 106875
      energy_diff = 106875 - 100781 = 6094
      (this is not equal to 'new' PD1's energy_diff in 'd)')
      c) 'old' w/o patch set, PD1
      task_util = 13
      cost = 160
      sum_util_w/o_task = 283
      sum_util_w_task = 296
      scale_cpu = 355
      energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
      energy_w_task = 160 * 296 / 355 = 133.41 => 133
      energy_diff = 133 - 127 = 6
      (this is equal to 'old' PD0's energy_diff in 'a)')
      d) 'new' w/ patch set, PD1
      task_util = 13
      cost = 160 * 1000 = 160000
      sum_util_w/o_task = 283
      sum_util_w_task = 296
      scale_cpu = 355
      energy_w/o_task = 160000 * 283 / 355 = 127549
      energy_w_task = 160000 * 296 / 355 =   133408
      energy_diff = 133408 - 127549 = 5859
      (this is not equal to 'new' PD0's energy_diff in 'b)')
      2. In the last step of find_energy_efficient_cpu(): the margin filter.
      With this patch the margin comparison also has better resolution,
      so it's possible to have better task placement thanks to that.
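      The arithmetic in the example above can be reproduced with a short
      sketch. It is an illustration only, not the kernel code: the helper names
      energy() and energy_diff() are hypothetical, integer division stands in
      for the truncation shown in the example, and PD1's utilization with the
      task is taken as 283 + 13 = 296, matching the energy lines in c) and d).

```python
def energy(cost, sum_util, scale_cpu):
    """Estimated energy of a performance domain, truncated to an integer."""
    return cost * sum_util // scale_cpu


def energy_diff(cost, sum_util_without_task, task_util, scale_cpu):
    """Energy increase caused by placing the task in the domain."""
    with_task = energy(cost, sum_util_without_task + task_util, scale_cpu)
    without_task = energy(cost, sum_util_without_task, scale_cpu)
    return with_task - without_task


TASK_UTIL = 13

# 'old' resolution: both domains report the same difference.
old_pd0 = energy_diff(480, 215, TASK_UTIL, 1024)
old_pd1 = energy_diff(160, 283, TASK_UTIL, 355)

# 'new' resolution: cost scaled by 1000, the tie disappears.
new_pd0 = energy_diff(480 * 1000, 215, TASK_UTIL, 1024)
new_pd1 = energy_diff(160 * 1000, 283, TASK_UTIL, 355)

print(old_pd0, old_pd1)  # 6 6
print(new_pd0, new_pd1)  # 6094 5859
```

      With the coarse cost both domains look equally expensive, while the
      scaled cost shows PD1 to be the cheaper placement for this task.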
      Fixes: 27871f7a ("PM: Introduce an Energy Model management framework")
      Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com>
      Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
    • Ionela Voinescu's avatar
      arch_topology: obtain cpu capacity using information from CPPC · 20416c14
      Ionela Voinescu authored
      Define arch_init_invariance_cppc() to use highest performance values
      from _CPC objects to obtain and set maximum capacity information for
      each CPU.
      The performance scale used by CPPC is a unified scale for all CPUs in
      the system. Therefore, by obtaining the raw highest performance values
      from the _CPC objects, and normalizing them on the [0, 1024] capacity
      scale used by the task scheduler, we obtain the CPU capacity of each CPU.
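      The normalization described above can be sketched as follows. This is a
      minimal illustration under stated assumptions: normalize_capacities() is
      a hypothetical helper, the raw values are made up, and the rounding used
      by the actual arch_init_invariance_cppc() implementation may differ.

```python
SCHED_CAPACITY_SCALE = 1024  # upper bound of the scheduler's capacity scale


def normalize_capacities(highest_perf):
    """Map raw per-CPU highest performance values onto [0, 1024].

    highest_perf: dict of cpu id -> raw highest performance value
    (illustrative numbers, not read from real _CPC objects).
    """
    max_perf = max(highest_perf.values())
    return {
        cpu: perf * SCHED_CAPACITY_SCALE // max_perf
        for cpu, perf in highest_perf.items()
    }


# Two little CPUs and two big CPUs on a shared performance scale.
capacities = normalize_capacities({0: 100, 1: 100, 2: 200, 3: 200})
print(capacities)
```

      Because all CPUs share one performance scale, dividing by the largest
      raw value is enough to make the biggest CPU map to 1024.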
      Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
    • Ionela Voinescu's avatar
      x86, ACPI: rename init_freq_invariance_cppc to arch_init_invariance_cppc · 83744b77
      Ionela Voinescu authored
      init_freq_invariance_cppc() was called in acpi_cppc_processor_probe(),
      after CPU performance information and controls were populated from the
      per-cpu _CPC objects.
      But these _CPC objects provide information that helps with both CPU
      (u-arch) and frequency invariance. Therefore, change the function name
      to a more generic one, while adding the arch_ prefix, as this function
      is expected to be defined differently by different architectures.
      Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
    • Vincent Donnefort's avatar
      PM / EM: Support for inefficient OPPs · 282844c7
      Vincent Donnefort authored
      Some SoCs, such as the sd855 or the old TC2, have OPPs within the same
      performance domain whose cost is higher than that of others with a higher
      frequency. Even if those OPPs are interesting from a cooling perspective,
      it makes no sense to use them when the device can run at full capacity.
      Those OPPs handicap the performance domain when choosing the most
      energy-efficient task placement, and they waste energy. They are
      inefficient.
      Hence, add support for such OPPs to the Energy Model, which creates for
      each OPP a performance state. The Energy Model can now be read using the
      regular table, which contains all performance states available, or using
      an efficient table, where inefficient performance states (and by
      extension, inefficient OPPs) have been removed.
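      The definition above (an OPP whose cost is higher than that of an OPP
      with a higher frequency) can be sketched as below. This is not the
      kernel implementation: mark_inefficient() and the frequency/power table
      are hypothetical, and the cost formula is the
      power * max_frequency / frequency coefficient used by the Energy Model.

```python
def mark_inefficient(states):
    """Flag performance states that cost more than a faster state.

    states: list of (frequency, power) tuples sorted by ascending frequency.
    Returns one dict per state, in the same order.
    """
    max_freq = states[-1][0]
    lowest_cost = None  # lowest cost seen among higher frequencies
    result = []
    # Walk from the highest frequency down so every state is compared
    # against the cheapest state that runs faster than it.
    for freq, power in reversed(states):
        cost = power * max_freq // freq
        inefficient = lowest_cost is not None and cost > lowest_cost
        if not inefficient:
            lowest_cost = cost
        result.append({"frequency": freq, "cost": cost, "inefficient": inefficient})
    result.reverse()
    return result


# Made-up table: (frequency in MHz, power in mW).
table = mark_inefficient([(500, 150), (1000, 260), (1500, 600), (2000, 900)])
efficient_table = [s for s in table if not s["inefficient"]]
```

      In this made-up table the 500 MHz state costs 600 while the 1000 MHz
      state costs 520, so the slowest state is the one dropped from the
      efficient table.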
      Currently, the efficient table is used in two paths: schedutil, which
      skips inefficient OPPs during frequency selection, and em_cpu_energy(),
      used by find_energy_efficient_cpu() to estimate the energy cost of a
      given task placement. We have to modify both paths in the same patch
      so they stay synchronized. The thermal framework still relies on the
      original table, and hence non-CPU devices won't create the efficient
      table.
      As used in the hot-path, the efficient table is a lookup table, generated
      dynamically when the perf domain is created. The complexity of searching
      a performance state is hence changed from O(n) to O(1). This also
      speeds-up em_cpu_energy() even if no inefficient OPPs have been found.
      Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
    • Valentin Schneider's avatar
      arm64: dts: sdm845: Bind CPU thermal throttling to cluster sensors · 78f50df5
      Valentin Schneider authored
      The previous commit removed per-CPU thermal zones. Despite having a thermal
      sensor per-CPU, the trip points in those zones would affect *several* CPUs:
      this system doesn't have per-CPU DVFS, so one cannot change the frequency
      of an individual CPU, but rather a group thereof (i.e. the frequency domain).
      Furthermore, the (existing) CPU cluster thermal zones have a "hot" trip
      point set at the same trip temperature as the lower per-CPU thermal
      zones. AIUI this is actually useless, as the struct thermal_zone_device_ops
      provided by thermal_of.c doesn't contain a .notify() callback, so no action
      will (and can) be taken as a consequence of hitting that trip point.
      Copy the previous per-CPU trip points / cooling maps into the CPU cluster
      thermal zones. This should effectively lead to a similar CPU thermal
      management as before with less overhead.
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
    • Morten Rasmussen's avatar
      sched/pelt: [HACK] Make PELT trace points unconditional · 753eddce
      Morten Rasmussen authored
      This is a temporary hack.
      Currently trace points for PELT are only triggered when the PELT metrics
      consumed by the scheduler are actually updated, i.e. util_avg. This
      means no updates if no 1 ms boundary is being crossed by the update.
      When reconstructing the PELT signal based on this data, the peak PELT
      value can therefore be up to 1 ms worth of PELT accumulation off (23 in
      absolute terms). This leads to a discrepancy that causes test cases to fail.
      This patch ensures that trace events are always emitted even if the
      metrics haven't been updated which should allow accurate reconstruction
      of the PELT signals.
    • Dietmar Eggemann's avatar
      sched_tp: Add sched_pelt_thermal trace event · 55a0bbb3
      Dietmar Eggemann authored
      hackbench-3060 [007] 565.489752: sched_pelt_thermal: cpu=7 load=101 runnable=103765 util=951 update_time=564835470336
      hackbench-3147 [005] 565.489760: sched_pelt_thermal: cpu=5 load=101 runnable=104036 util=954 update_time=565325810688
      task_n3-3-576  [000] 565.489763: sched_pelt_thermal: cpu=0 load=31 runnable=32404 util=953 update_time=557800611840
      hackbench-862  [001] 565.489763: sched_pelt_thermal: cpu=1 load=31 runnable=32395 util=952 update_time=564421108736
      hackbench-2136 [002] 565.489763: sched_pelt_thermal: cpu=2 load=31 runnable=32455 util=954 update_time=564142488576
      The actual PELT thermal signal is `load`; `runnable` and `util` are
      meaningless. They are printed here only because the trace event is
      implemented according to the `sched_pelt_rq_template` trace event class.
      The `sched_pelt_rt`, `sched_pelt_dl` and `sched_pelt_irq` trace events
      have a similar issue.
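      For post-processing, lines in the format shown above can be split into
      fields with a small parser. This is a sketch only: parse_trace_line()
      and the regular expression are hypothetical and assume exactly the
      layout of the sample lines (task-pid, [cpu], timestamp, event name,
      then key=value pairs with integer values).

```python
import re

# comm-pid [cpu] timestamp: event: key=value key=value ...
TRACE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<trace_cpu>\d+)\]\s+"
    r"(?P<timestamp>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<fields>.*)$"
)


def parse_trace_line(line):
    """Return a dict describing one trace line, or None if it does not match."""
    match = TRACE_RE.match(line)
    if match is None:
        return None
    parsed = {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "timestamp": float(match.group("timestamp")),
        "event": match.group("event"),
    }
    # Every payload field in the sample output is an integer key=value pair.
    for pair in match.group("fields").split():
        key, value = pair.split("=", 1)
        parsed[key] = int(value)
    return parsed


sample = parse_trace_line(
    "task_n3-3-576  [000] 565.489763: sched_pelt_thermal: "
    "cpu=0 load=31 runnable=32404 util=953 update_time=557800611840"
)
thermal_signal = sample["load"]  # the only meaningful PELT value here
```

      Only the `load` field should be consumed for the thermal signal, as the
      commit message notes.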
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Dietmar Eggemann's avatar
      sched_tp: Add capacity_curr to cpu_capacity trace event · 57b72eb5
      Dietmar Eggemann authored
      Whereas capacity_orig is the CPU capacity scaled by CPU architecture,
      capacity_curr (current capacity) is the CPU capacity scaled by CPU
      architecture and current_frequency/max_frequency.
      capacity_curr is used to create the PELT clock information for
      enqueue/dequeue (sched_switch) out of (not aligned) PELT sched events
      in LISA's PELT simulation.
      Here, the ftrace time between an enqueue/dequeue and a PELT sched event
      has to be rescaled by the time-scaling factor which itself consists of
      CPU capacity and current_frequency/max_frequency scaling factor. And
      this is done by using a series of capacity_curr values.
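      The relationship between capacity_orig, capacity_curr and the time
      rescaling can be sketched as below. This is an illustration under
      assumptions, not LISA's or the kernel's code: capacity_curr() and
      rescale_delta() are hypothetical helpers, and the rescaling assumes the
      time-scaling factor is capacity_curr / 1024.

```python
SCHED_CAPACITY_SCALE = 1024


def capacity_curr(capacity_orig, cur_freq, max_freq):
    """CPU capacity scaled by current_frequency / max_frequency."""
    return capacity_orig * cur_freq // max_freq


def rescale_delta(delta_ns, cap_curr):
    """Scale a wall-clock delta by the time-scaling factor (assumed cap_curr / 1024)."""
    return delta_ns * cap_curr // SCHED_CAPACITY_SCALE


# A little CPU (capacity_orig=446) running at half its maximum frequency.
little_half_speed = capacity_curr(446, 1_000_000, 2_000_000)
# A big CPU at half speed: 1000 ns of wall-clock time count as 500 ns.
scaled = rescale_delta(1000, capacity_curr(1024, 1_000_000, 2_000_000))
```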
    • Dietmar Eggemann's avatar
      sched_tp: Add cpu_capacity trace events · 53e0e084
      Dietmar Eggemann authored
      [000] .Ns3 185.489841: sched_cpu_capacity: cpu=0 capacity=437 capacity_orig=446
      [002] ..s1 185.501802: sched_cpu_capacity: cpu=2 capacity=999 capacity_orig=1024
      [003] ..s3 185.517837: sched_cpu_capacity: cpu=3 capacity=438 capacity_orig=446
      [002] ..s1 185.529794: sched_cpu_capacity: cpu=2 capacity=999 capacity_orig=1024
      [001] ..s1 185.545789: sched_cpu_capacity: cpu=1 capacity=1011 capacity_orig=1024
      [004] .Ns1 185.545837: sched_cpu_capacity: cpu=4 capacity=439 capacity_orig=446
      Use arch_scale_cpu_capacity() instead of rq->cpu_capacity_orig because
      there is a patch further up the stack which removes the latter.
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Dietmar Eggemann's avatar
      sched_tp: Add update_time to sched_pelt_cfs and sched_pelt_rq_template · 2890008f
      Dietmar Eggemann authored
      The trace event sched_pelt_se already has this entry to trace PELT's
      Having this for cfs, rt, dl rq and irq as well is interesting to debug
      PELT issues, especially with time-scaling (rq_clock_task/clock_pelt
      diff and sync).
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Dietmar Eggemann's avatar
      sched_tp: Avoid the use of runqueues since it's not exported · 323c6a81
      Dietmar Eggemann authored
      Use rq_of(cfs_rq)->cfs instead of &cpu_rq(cpu)->cfs since cpu_rq(cpu)
      uses runqueues.
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Qais Yousef's avatar
      sched_tp: Now 5.7-rc1 is released, change the compatability condition · 157af54e
      Qais Yousef authored
      The change that broke the trace event was actually merged into Linus' tree
      in 5.7, but since we test tip/sched/core, which had the change earlier,
      the version condition was temporarily incorrect to cope with that. Now
      that 5.7 is released, apply the correct condition.
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      sched_tp: Add uclamp trace points · 37685d5b
      Qais Yousef authored
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      sched_tp: rename some tracepoints to match code name · 03561384
      Qais Yousef authored
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      sched_tp: Manage a change in a member name in struct sched_avg · 071dad9c
      Qais Yousef authored
      We need to support both the old and the new version of the struct.
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      sched_tp: fix compilation error on SMP systems with NR_CPUS=2 · bac3fc0e
      Qais Yousef authored
      SPAN_SIZE was being set to 0 when NR_CPUS was 2, which triggered
      a compile-time assertion. That exposed a bug in the logic: we were
      rounding down instead of up. Fix it by using round_up() to ensure
      we get the correct span size for NR_CPUS.
      Also ensure that we don't use a span size greater than 128. The trace
      string can't take an arbitrarily large size. It's limited to 256, but be
      more conservative and cap it at 128.
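      The rounding bug and the cap can be illustrated with a short sketch.
      The commit message does not give the SPAN_SIZE formula, so this assumes
      (purely for illustration) that the span is printed with one character
      per 4 CPUs; span_size(), round_up() and CPUS_PER_CHAR are hypothetical
      names, and only the round-up and the 128 cap come from the text above.

```python
MAX_SPAN_SIZE = 128   # conservative cap named in the commit message
CPUS_PER_CHAR = 4     # assumption: one printed character covers 4 CPUs


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def span_size(nr_cpus):
    """Span size for nr_cpus, rounded up and capped at MAX_SPAN_SIZE."""
    size = round_up(nr_cpus, CPUS_PER_CHAR) // CPUS_PER_CHAR
    return min(size, MAX_SPAN_SIZE)


rounded_down = 2 // CPUS_PER_CHAR   # the buggy behaviour: 0 for NR_CPUS=2
rounded_up = span_size(2)           # rounding up yields a non-zero size
capped = span_size(4096)            # large NR_CPUS values hit the cap
```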
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      kmodule: Add update_time field to sched_load_se trace event · 841ea8bb
      Qais Yousef authored
      Record the PELT clock as used by the PELT computations to enable accurate
      reproduction of the signal in LISA.
      Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      sched_tp: rename some events to match what Lisa expects · 8f13befb
      Qais Yousef authored
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Qais Yousef's avatar
      sched: add a module to convert tp into events · 0c42668b
      Qais Yousef authored
      The module is always compiled as built-in except for !CONFIG_SMP
      where the targeted tracepoints don't exist/make sense.
      It creates a set of sched events in tracefs that are required to run
      Lisa tests.
      Signed-off-by: Qais Yousef <qais.yousef@arm.com>
    • Dietmar Eggemann's avatar
      arm, arm64: Enable kernel config options required for EAS testing · c1158a98
      Dietmar Eggemann authored
      arm and arm64:
          Add    Debug per_cpu maps access
          Add    Prove Locking
          Add    Scheduler statistics
          Add    kernel .config support and /proc/config.gz
          Add    Scheduler debugging
          Add    Ftrace
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Dietmar Eggemann's avatar
      arm, arm64: Enable kernel config options required for EAS · c3ec7d8f
      Dietmar Eggemann authored
      arm and arm64:
          Add    Cgroups (+ FAIR_GROUP_SCHED and FREEZER)
          Add    Uclamp support for tasks and taskgroups
          Add    Cpuset support
          Add    Scheduler autogroups
          Add    DIE (SCHED_MC) sched domain level
          Add    Energy Model
          Add    CpuFreq governors and make schedutil default
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    • Ionela Voinescu's avatar
      Revert "arch_topology: Make cpu_capacity sysfs node as read-only" · afc596cb
      Ionela Voinescu authored
      This reverts commit 5d777b18.
      [ionela.voinescu@arm.com: modify capacity of current CPU only]
    • Ionela Voinescu's avatar
      arm64, tc0: enable arm mhuv2 · 5fd90066
      Ionela Voinescu authored
    • Sudeep Holla's avatar
      arm64: dts: juno: add mhu doorbell support and scmi device nodes · 028dedba
      Sudeep Holla authored
      , later edit: changed compatibility string from
      arm,juno-scp-shmem to arm,scmi-shmem
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
    • Ionela Voinescu's avatar
      arm64: juno: disable CONFIG_MOUSE_PS2 · ab47a816
      Ionela Voinescu authored
      This config is causing instability on Juno boards.
      Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
    • Ionela Voinescu's avatar
      tc2: multi_v7_defconfig: add board support configs · 7310cd16
      Ionela Voinescu authored
      For Arm:
        Add       ARM vexpress-spc cpufreq driver
        Add       ARM Big.Little cpuidle driver
        Add       Sensor Vexpress
        Add       Schedutil as default governor
        Built-in  CPUfreq governors
      Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
    • John Stultz's avatar
      HACK: adv7511: Add poweron delay to allow for EDID probing to work · 73e11ee9
      John Stultz authored
      For some reason, EDID probing on HiKey960 doesn't work
      properly unless we delay a bit at poweron.
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • Chen Jun's avatar
      arm64: dts: hi3660: adb reboot node · 7a47a607
      Chen Jun authored
      Add "hisilicon,hi3660-reboot" node for hi3660.
      Eventually, when we've transitioned to UEFI, this can be dropped,
      as we can then use syscon-reboot-mode.
      Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
      Signed-off-by: Chen Jun <chenjun14@huawei.com>
    • Chen Feng's avatar
      reset: hisi-reboot: adb reboot bootloader · ddcfe525
      Chen Feng authored
      Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
    • Yu Chen's avatar
      dts: hi3660: Add support for usb on Hikey960 · 06cb2665
      Yu Chen authored
      This patch adds support for USB on HiKey960.
      Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
      Cc: Wei Xu <xuwei5@hisilicon.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • Yu Chen's avatar
      usb: gadget: Add configfs attribuite for controling match_existing_only · 48cf0b5a
      Yu Chen authored
      Currently "match_existing_only" of usb_gadget_driver in configfs is
      always set to one, which is not flexible.
      The dwc3 UDC is removed when the USB core switches to host mode. This
      causes writing the name of the dwc3 UDC to the configfs UDC attribute to fail.
      To fix this we need to add a way to change the config of
      Some systems, such as Android, do not support udev, so adding a
      "match_existing_only" attribute that the user can configure costs little.
      This patch adds a configfs attribute for controlling match_existing_only,
      which allows the user to configure "match_existing_only".
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Felipe Balbi <balbi@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Binghui Wang <wangbinghui@hisilicon.com>
      Signed-off-by: Yu Chen <chenyu56@huawei.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Change-Id: I49862a4de4e56b235278cddf7a5ec6e30c5f2ec4
    • Ionela Voinescu's avatar
      TEMP FIX: always call _find_current_opp(dev, opp_table) · 90406541
      Ionela Voinescu authored
      This fixes commit 81c4d8a3.
      Before the commit above, we always obtained information about the
      current OPP from hardware. If this performance level was different from
      the one we wanted to set, we would go ahead and set it. But after that
      commit, we use a cached value for the current OPP which (somehow) ends up
      being different from what hardware has knowledge of. This results in
      an OPP change request sometimes failing.
    • Ionela Voinescu's avatar
      rb5: additional boards support configs · dbe534bd
      Ionela Voinescu authored
    • Ionela Voinescu's avatar
      Revert "usb: renesas-xhci: Fix handling of unknown ROM state" · 51bdb9fa
      Ionela Voinescu authored
      This reverts commit d143825b.
    • Rob Clark's avatar
      drm/msm: Fix display fault handling · 58b4aa08
      Rob Clark authored
      It turns out that when the display is enabled by the bootloader, we can
      get some transient iommu faults from the display. That doesn't go over
      too well when we install a fault handler that is GPU-specific. To avoid
      this, defer installing the fault handler until we get around to setting
      up per-process pgtables (which is adreno_smmu-specific). The arm-smmu
      fallback error reporting is sufficient for reporting display-related
      faults (and in fact was all we had prior to f8f934c1).
      Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Reported-by: Yassine Oudjana <y.oudjana@protonmail.com>
      Fixes: 2a574cc0 ("drm/msm: Improve the a6xx page fault handler")
      Signed-off-by: Rob Clark <robdclark@chromium.org>
      Tested-by: John Stultz <john.stultz@linaro.org>
    • Ionela Voinescu's avatar