- 02 Aug, 2021 40 commits
-
-
The Energy Model (EM) provides useful information about device power in each performance state to other subsystems, such as the Energy Aware Scheduler (EAS). The energy calculation in EAS performs arithmetic based on the EM's em_cpu_energy(). The current implementation of that function uses em_perf_state::cost as a pre-computed cost coefficient equal to: cost = power * max_frequency / frequency, where 'power' is expressed in milli-Watts (or in an abstract scale).

There are corner cases where the EAS energy calculation for two Performance Domains (PDs) returns the same value. EAS compares these values to choose the smaller one, and they can end up equal due to rounding error. In such a scenario we need better resolution, e.g. 1000 times better. To provide this, increase the resolution of em_perf_state::cost on 64-bit architectures. The cost of increasing the resolution on 32-bit architectures is pretty high (64-bit division) and the returns do not justify it.

This patch avoids the rounding-to-milli-Watt errors that can occur in the EAS energy estimation for each Performance Domain (PD). The rounding error is common for small tasks which have a small utilization value. There are two places in the code where it makes a difference:

1. In find_energy_efficient_cpu(), where we are searching for best_delta. We might suffer there when two PDs return the same result, as in the example below.

Scenario: a low-utilization system, e.g. ~200 sum_util for PD0 and ~220 for PD1, with quite a few small tasks of ~10-15 util. These tasks would suffer from the rounding error. Such system utilization has been seen while playing some simple games. In this condition our partner reported 5..10 mA less battery drain.
Some details: We have two Perf Domains (PDs): PD0 (big) and PD1 (little). Let's compare without the patch set ('old') and with it ('new'), comparing the energy with and without the task placed in each PD.

a) 'old', PD0:
   task_util = 13
   cost = 480
   sum_util_w/o_task = 215
   sum_util_w_task = 228
   scale_cpu = 1024
   energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
   energy_w_task = 480 * 228 / 1024 = 106.87 => 106
   energy_diff = 106 - 100 = 6
   (this is equal to 'old' PD1's energy_diff in 'c)')

b) 'new', PD0:
   task_util = 13
   cost = 480 * 1000 = 480000
   sum_util_w/o_task = 215
   sum_util_w_task = 228
   scale_cpu = 1024
   energy_w/o_task = 480000 * 215 / 1024 = 100781
   energy_w_task = 480000 * 228 / 1024 = 106875
   energy_diff = 106875 - 100781 = 6094
   (this is not equal to 'new' PD1's energy_diff in 'd)')

c) 'old', PD1:
   task_util = 13
   cost = 160
   sum_util_w/o_task = 283
   sum_util_w_task = 296
   scale_cpu = 355
   energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
   energy_w_task = 160 * 296 / 355 = 133.41 => 133
   energy_diff = 133 - 127 = 6
   (this is equal to 'old' PD0's energy_diff in 'a)')

d) 'new', PD1:
   task_util = 13
   cost = 160 * 1000 = 160000
   sum_util_w/o_task = 283
   sum_util_w_task = 296
   scale_cpu = 355
   energy_w/o_task = 160000 * 283 / 355 = 127549
   energy_w_task = 160000 * 296 / 355 = 133408
   energy_diff = 133408 - 127549 = 5859
   (this is not equal to 'new' PD0's energy_diff in 'b)')

2. In the last find_energy_efficient_cpu() margin filter. With this patch the margin comparison also has better resolution, so it's possible to get better task placement thanks to that.

Fixes: 27871f7a ("PM: Introduce an Energy Model management framework")
Reported-by:
CCJ Yeh <CCj.Yeh@mediatek.com> Signed-off-by:
Lukasz Luba <lukasz.luba@arm.com>
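The rounding collision described above can be reproduced with a short sketch. This is not the kernel's em_cpu_energy(); it only mirrors the energy = cost * sum_util / scale_cpu arithmetic with integer (floor) division, using the numbers from the example:

```python
def energy(cost, sum_util, scale_cpu):
    # Integer (floor) division mimics the kernel's integer arithmetic.
    return cost * sum_util // scale_cpu

def energy_diff(cost, util_without, util_with, scale_cpu):
    return (energy(cost, util_with, scale_cpu)
            - energy(cost, util_without, scale_cpu))

# 'old' milli-Watt resolution: both PDs round to the same delta
assert energy_diff(480, 215, 228, 1024) == 6             # PD0 (big)
assert energy_diff(160, 283, 296, 355) == 6              # PD1 (little)

# 'new' 1000x resolution: the tie is broken
assert energy_diff(480 * 1000, 215, 228, 1024) == 6094   # PD0
assert energy_diff(160 * 1000, 283, 296, 355) == 5859    # PD1
```

With milli-Watt resolution EAS cannot tell the two placements apart; with the scaled cost, PD1 is correctly identified as the cheaper placement.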
-
Ionela Voinescu authored
Define arch_init_invariance_cppc() to use the highest performance values from the _CPC objects to obtain and set maximum capacity information for each CPU. The performance scale used by CPPC is a unified scale for all CPUs in the system. Therefore, by obtaining the raw highest performance values from the _CPC objects and normalizing them onto the [0, 1024] capacity scale used by the task scheduler, we obtain the CPU capacity of each CPU. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com>
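The normalization described above can be sketched as follows. This is illustrative, not the kernel code, and the raw highest-performance values below are made up; in reality they come from the per-CPU _CPC objects:

```python
SCHED_CAPACITY_SCALE = 1024  # the scheduler's capacity scale

def cpu_capacities(highest_perf):
    """Map per-CPU raw highest-performance values onto [0, 1024]."""
    max_perf = max(highest_perf)  # unified scale: normalize to system max
    return [p * SCHED_CAPACITY_SCALE // max_perf for p in highest_perf]

# Example: a big.LITTLE-style system (hypothetical raw _CPC values)
print(cpu_capacities([300, 300, 700, 700]))  # [438, 438, 1024, 1024]
```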
-
Ionela Voinescu authored
init_freq_invariance_cppc() was called in acpi_cppc_processor_probe(), after CPU performance information and controls were populated from the per-cpu _CPC objects. But these _CPC objects provide information that helps with both CPU (u-arch) and frequency invariance. Therefore, change the function name to a more generic one, while adding the arch_ prefix, as this function is expected to be defined differently by different architectures. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Giovanni Gherdovich <ggherdovich@suse.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
-
Some SoCs, such as the sd855 or the old TC2, have OPPs within the same performance domain whose cost is higher than that of others with a higher frequency. While those OPPs may be interesting from a cooling perspective, it makes no sense to use them when the device can run at full capacity: they handicap the performance domain when choosing the most energy-efficient task placement, and they waste energy. They are inefficient.

Hence, add support for such OPPs to the Energy Model, which creates a performance state for each OPP. The Energy Model can now be read using the regular table, which contains all available performance states, or using an efficient table, from which inefficient performance states (and by extension, inefficient OPPs) have been removed.

Currently, the efficient table is used in two paths: schedutil, which will skip inefficient OPPs during frequency selection, and em_cpu_energy(), used by find_energy_efficient_cpu() to estimate the energy cost for a specified task placement. We have to modify both paths in the same patch so they stay synchronized. The thermal framework still relies on the original table, and hence non-CPU devices won't create the efficient table.

As it is used in the hot path, the efficient table is a lookup table, generated dynamically when the perf domain is created. The complexity of searching a performance state is hence changed from O(n) to O(1). This also speeds up em_cpu_energy() even if no inefficient OPPs have been found. Signed-off-by:
Vincent Donnefort <vincent.donnefort@arm.com>
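As a sketch of the idea (not the kernel implementation): a performance state is inefficient when a higher-frequency state in the same domain has a lower or equal cost, so the efficient set can be built by walking the states from the highest frequency down. The frequencies and cost coefficients below are hypothetical:

```python
def efficient_states(states):
    """states: list of (frequency, cost), sorted by ascending frequency.
    Returns the subset with inefficient states removed."""
    keep = []
    for freq, cost in reversed(states):      # highest frequency first
        # Keep a state only if it is cheaper than every faster state;
        # otherwise a faster OPP dominates it and it is inefficient.
        if not keep or cost < keep[-1][1]:
            keep.append((freq, cost))
    return list(reversed(keep))

# Hypothetical domain where the 1200 MHz OPP costs more than 1400 MHz
states = [(600, 100), (900, 160), (1200, 500), (1400, 480)]
print(efficient_states(states))  # (1200, 500) is dropped
```

The kernel additionally turns this reduced set into a lookup table so a requested frequency resolves to its efficient state in O(1); the sketch only shows which states survive.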
-
The previous commit removed per-CPU thermal zones. Despite having a thermal sensor per-CPU, the trip points in those zones would affect *several* CPUs: this system doesn't have per-CPU DVFS, so one cannot change the frequency of an individual CPU, but rather a group thereof (i.e. the frequency domain). Furthermore, the (existing) CPU cluster thermal zones have a "hot" trip point set at the same trip temperature as the lower per-CPU thermal zones. AIUI this is actually useless, as the struct thermal_zone_device_ops provided by thermal_of.c doesn't contain a .notify() callback, so no action will (and can) be taken as a consequence of hitting that trip point. Copy the previous per-CPU trip points / cooling maps into the CPU cluster thermal zones. This should effectively lead to a similar CPU thermal management as before with less overhead. Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com>
-
Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com>
-
This is a temporary hack. Currently, trace points for PELT are only triggered when the PELT metrics consumed by the scheduler, i.e. util_avg, are actually updated. This means there are no updates if no 1 ms boundary is crossed by the update. When reconstructing the PELT signal from this data, the peak PELT value can therefore be off by up to 1 ms worth of PELT accumulation (23 in absolute terms). This leads to a discrepancy that causes test cases to fail. This patch ensures that trace events are always emitted, even if the metrics haven't been updated, which should allow accurate reconstruction of the PELT signals.
-
...
  hackbench-3060 [007] 565.489752: sched_pelt_thermal: cpu=7 load=101 runnable=103765 util=951 update_time=564835470336
  hackbench-3147 [005] 565.489760: sched_pelt_thermal: cpu=5 load=101 runnable=104036 util=954 update_time=565325810688
  task_n3-3-576  [000] 565.489763: sched_pelt_thermal: cpu=0 load=31 runnable=32404 util=953 update_time=557800611840
  hackbench-862  [001] 565.489763: sched_pelt_thermal: cpu=1 load=31 runnable=32395 util=952 update_time=564421108736
  hackbench-2136 [002] 565.489763: sched_pelt_thermal: cpu=2 load=31 runnable=32455 util=954 update_time=564142488576

The actual PELT thermal signal is `load`; `runnable` and `util` are meaningless. They are traced here since the trace event is implemented according to the `sched_pelt_rq_template` trace event class. The `sched_pelt_rt`, `sched_pelt_dl` and `sched_pelt_irq` trace events have a similar issue. Signed-off-by:
Dietmar Eggemann <dietmar.eggemann@arm.com>
-
Whereas capacity_orig is the CPU capacity scaled by CPU architecture, capacity_curr (current capacity) is the CPU capacity scaled by CPU architecture and current_frequency/max_frequency. capacity_curr is used to create the PELT clock information for enqueue/dequeue (sched_switch) out of (not aligned) PELT sched events in LISA's PELT simulation. Here, the ftrace time between an enqueue/dequeue and a PELT sched event has to be rescaled by the time-scaling factor which itself consists of CPU capacity and current_frequency/max_frequency scaling factor. And this is done by using a series of capacity_curr values.
-
...
  [000] .Ns3 185.489841: sched_cpu_capacity: cpu=0 capacity=437 capacity_orig=446
  [002] ..s1 185.501802: sched_cpu_capacity: cpu=2 capacity=999 capacity_orig=1024
  [003] ..s3 185.517837: sched_cpu_capacity: cpu=3 capacity=438 capacity_orig=446
  [002] ..s1 185.529794: sched_cpu_capacity: cpu=2 capacity=999 capacity_orig=1024
  [001] ..s1 185.545789: sched_cpu_capacity: cpu=1 capacity=1011 capacity_orig=1024
  [004] .Ns1 185.545837: sched_cpu_capacity: cpu=4 capacity=439 capacity_orig=446
...

Use arch_scale_cpu_capacity() instead of rq->cpu_capacity_orig because there is a patch further up the stack which removes the latter. Signed-off-by:
Dietmar Eggemann <dietmar.eggemann@arm.com>
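The relation above can be illustrated with a small sketch (not kernel code; the frequencies are hypothetical):

```python
def capacity_curr(capacity_orig, cur_freq, max_freq):
    # current capacity = arch-scaled capacity * current_frequency / max_frequency
    return capacity_orig * cur_freq // max_freq

# e.g. a little CPU with capacity_orig=446, as in the trace above,
# running slightly below its maximum frequency (frequencies made up):
print(capacity_curr(446, 1800000, 1844000))  # 435
```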
-
The trace event sched_pelt_se already has this entry to trace PELT's last_update_time. Having this for cfs, rt, dl rq and irq as well is interesting to debug PELT issues, especially with time-scaling (rq_clock_task/clock_pelt diff and sync). Signed-off-by:
Dietmar Eggemann <dietmar.eggemann@arm.com>
-
Use rq_of(cfs_rq)->cfs instead of &cpu_rq(cpu)->cfs, since cpu_rq(cpu) accesses the per-CPU `runqueues` variable. Signed-off-by:
Dietmar Eggemann <dietmar.eggemann@arm.com>
-
Signed-off-by:
Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
The change that broke the trace event was actually merged into Linus' tree in 5.7, but since we test tip/sched/core, which had the change earlier, the version condition was temporarily incorrect to cope with that. Now that 5.7 is released, apply the correct condition. Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
s/sched_load_cfs_rq/sched_pelt_cfs/ s/sched_load_se/sched_pelt_se/ Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
We need to support both old and new version of the struct. Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
SPAN_SIZE was being set to 0 when NR_CPUS was 2, which triggered a compile-time assertion. This exposed a bug in the logic, where we were rounding down instead of up. Fix it by using round_up() to ensure we get the correct span size for NR_CPUS. Ensure as well that we don't use a span size greater than 128. The trace string can't take an arbitrarily large size; it's limited to 256, but be more conservative and cap it at 128. Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
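A sketch of the fix, under the assumption that the span string uses one hex character per 4 CPUs of the cpumask (the exact formula in the module may differ); round_up() here follows the semantics of the kernel macro for power-of-two alignment:

```python
def round_up(x, y):
    # Kernel-style round_up() for power-of-two y.
    return (x + y - 1) & ~(y - 1)

def span_size(nr_cpus, cap=128):
    # One hex character per 4 CPUs, rounded *up*, so NR_CPUS=2 no
    # longer yields a zero-sized span; cap the size at 128.
    return min(round_up(nr_cpus, 4) // 4, cap)

assert span_size(2) == 1        # was 0 with round-down (2 // 4 == 0)
assert span_size(256) == 64
assert span_size(1024) == 128   # capped at 128
```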
-
Record the PELT clock as used by the PELT computations to enable accurate reproduction of the signal in LISA. Signed-off-by:
Douglas Raillard <douglas.raillard@arm.com> Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
s/sched_pelt_cfs/sched_load_cfs_rq/ s/sched_pelt_se/sched_load_se/ Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
The module is always compiled as built-in, except for !CONFIG_SMP, where the targeted tracepoints don't exist/make sense. It creates a set of sched events in tracefs that are required to run LISA tests. Signed-off-by:
Qais Yousef <qais.yousef@arm.com>
-
arm and arm64:
    Add Debug per_cpu maps access
    Add Prove Locking
    Add Scheduler statistics
arm:
    Add kernel .config support and /proc/config.gz
arm64:
    Add Scheduler debugging
    Add Ftrace
Signed-off-by:
Dietmar Eggemann <dietmar.eggemann@arm.com>
-
arm and arm64:
    Add Cgroups (+ FAIR_GROUP_SCHED and FREEZER)
    Add Uclamp support for tasks and taskgroups
arm:
    Add Cpuset support
    Add Scheduler autogroups
    Add DIE (SCHED_MC) sched domain level
    Add Energy Model
arm64:
    Add CpuFreq governors and make schedutil default
Signed-off-by:
Dietmar Eggemann <dietmar.eggemann@arm.com>
-
Ionela Voinescu authored
This reverts commit 5d777b18. [ionela.voinescu@arm.com: modify capacity of current CPU only]
-
Ionela Voinescu authored
-
Later edit: changed compatibility string from "arm,juno-scp-shmem" to "arm,scmi-shmem". Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com>
-
Ionela Voinescu authored
This config is causing instability on Juno boards. Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com>
-
Ionela Voinescu authored
For Arm:
    Add ARM vexpress-spc cpufreq driver
    Add ARM Big.Little cpuidle driver
    Add Sensor Vexpress
    Add Schedutil as default governor
    Built-in CPUfreq governors
Signed-off-by:
Ionela Voinescu <ionela.voinescu@arm.com>
-
Signed-off-by:
Valentin Schneider <valentin.schneider@arm.com>
-
For some reason, on HiKey960 the EDID probing doesn't work properly unless we delay a bit at power-on. Signed-off-by:
John Stultz <john.stultz@linaro.org>
-
Signed-off-by:
John Stultz <john.stultz@linaro.org>
-
Add the "hisilicon,hi3660-reboot" node for hi3660. Eventually, when we've transitioned to UEFI, this can be dropped, as we can then use syscon-reboot-mode. Signed-off-by:
Chen Feng <puck.chen@hisilicon.com> Signed-off-by:
Chen Jun <chenjun14@huawei.com>
-
Signed-off-by:
Chen Feng <puck.chen@hisilicon.com>
-
This patch adds support for USB on HiKey960. Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: Wei Xu <xuwei5@hisilicon.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: John Stultz <john.stultz@linaro.org> Cc: Binghui Wang <wangbinghui@hisilicon.com> Signed-off-by:
Yu Chen <chenyu56@huawei.com> Signed-off-by:
John Stultz <john.stultz@linaro.org>
-
Currently the "match_existing_only" field of usb_gadget_driver in configfs is set to one, which is not flexible. The dwc3 UDC will be removed when the USB core switches to host mode, which causes a failure when writing the name of the dwc3 UDC to the configfs UDC attribute. To fix this, we need a way to change the "match_existing_only" configuration. Systems like Android do not support udev, so adding a "match_existing_only" attribute to allow configuration by the user costs little. This patch adds a configfs attribute for controlling match_existing_only, which allows the user to configure it. Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Felipe Balbi <balbi@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Binghui Wang <wangbinghui@hisilicon.com> Signed-off-by:
Yu Chen <chenyu56@huawei.com> Signed-off-by:
John Stultz <john.stultz@linaro.org> Change-Id: I49862a4de4e56b235278cddf7a5ec6e30c5f2ec4
-
Ionela Voinescu authored
This fixes commit 81c4d8a3. Before that commit, we always obtained information about the current OPP from hardware; if this performance level was different from the one we wanted to set, we would go ahead and set it. After that commit, we use a cached value for the current OPP which (somehow) ends up being different from what the hardware has knowledge of. This results in an OPP change request sometimes failing.
-
Ionela Voinescu authored
-
Ionela Voinescu authored
This reverts commit d143825b.
-
It turns out that when the display is enabled by the bootloader, we can get some transient iommu faults from the display, which doesn't go over too well when we install a fault handler that is GPU specific. To avoid this, defer installing the fault handler until we get around to setting up per-process pgtables (which is adreno_smmu specific). The arm-smmu fallback error reporting is sufficient for reporting display-related faults (and in fact was all we had prior to f8f934c1). Reported-by:
Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reported-by:
Yassine Oudjana <y.oudjana@protonmail.com> Fixes: 2a574cc0 ("drm/msm: Improve the a6xx page fault handler") Signed-off-by:
Rob Clark <robdclark@chromium.org> Tested-by:
John Stultz <john.stultz@linaro.org>
-
Ionela Voinescu authored
-