Skip to content
  • Michal Hocko's avatar
    mm, vmstat: make quiet_vmstat lighter · f01f17d3
    Michal Hocko authored
    Mike has reported a considerable overhead of refresh_cpu_vm_stats from
    the idle entry during pipe test:
    
        12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
         4.75%  [kernel]       [k] __schedule
         4.70%  [kernel]       [k] mutex_unlock
         3.14%  [kernel]       [k] __switch_to
    
    This is caused by commit 0eb77e98
    
     ("vmstat: make vmstat_updater
    deferrable again and shut down on idle") which has placed quiet_vmstat
    into cpu_idle_loop.  The main reason here seems to be that the idle
    entry has to get over all zones and perform atomic operations for each
    vmstat entry even though there might be no per cpu diffs.  This is a
    pointless overhead for _each_ idle entry.
    
    Make sure that quiet_vmstat is as light as possible.
    
    First of all it doesn't make any sense to do any local sync if the
    current cpu is already set in oncpu_stat_off because vmstat_update puts
    itself there only if there is nothing to do.
    
    Then we can check need_update which should be a cheap way to check for
    potential per-cpu diffs and only then do refresh_cpu_vm_stats.
    
    The original patch also did cancel_delayed_work which we are not doing
    here.  There are two reasons for that.  Firstly cancel_delayed_work from
    idle context will blow up on RT kernels (reported by Mike):
    
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
      Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
      Call Trace:
        dump_stack+0x49/0x67
        ___might_sleep+0xf5/0x180
        rt_spin_lock+0x20/0x50
        try_to_grab_pending+0x69/0x240
        cancel_delayed_work+0x26/0xe0
        quiet_vmstat+0x75/0xa0
        cpu_idle_loop+0x38/0x3e0
        cpu_startup_entry+0x13/0x20
        start_secondary+0x114/0x140
    
    And secondly, even on !RT kernels it might add some non trivial overhead
    which is not necessary.  Even if the vmstat worker wakes up and preempts
    idle then it will be most likely a single shot noop because the stats
    were already synced and so it would end up on the oncpu_stat_off anyway.
    We just need to teach both vmstat_shepherd and vmstat_update to stop
    scheduling the worker if there is nothing to do.
    
    [mgalbraith@suse.de: cancel pending work of the cpu_stat_off CPU]
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Reported-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
    Acked-by: default avatarChristoph Lameter <cl@linux.com>
    Signed-off-by: default avatarMike Galbraith <mgalbraith@suse.de>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    f01f17d3