• Andy Lutomirski's avatar
    x86/mm: Rework lazy TLB mode and TLB freshness tracking · 94b1b03b
    Andy Lutomirski authored
    x86's lazy TLB mode used to be fairly weak -- it would switch to
    init_mm the first time it tried to flush a lazy TLB.  This meant an
    unnecessary CR3 write and, if the flush was remote, an unnecessary
    IPI.
    
    Rewrite it entirely.  When we enter lazy mode, we simply remove the
    CPU from mm_cpumask.  This means that we need a way to figure out
    whether we've missed a flush when we switch back out of lazy mode.
    I use the tlb_gen machinery to track whether a context is up to
    date.
    
    Note to reviewers: this patch, my itself, looks a bit odd.  I'm
    using an array of length 1 containing (ctx_id, tlb_gen) rather than
    just storing tlb_gen, and making it at array isn't necessary yet.
    I'm doing this because the next few patches add PCID support, and,
    with PCID, we need ctx_id, and the array will end up with a length
    greater than 1.  Making it an array now means that there will be
    less churn and therefore less stress on your eyeballs.
    
    NB: This is dubious but, AFAICT, still correct on Xen and UV.
    xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
    patch changes the way that mm_cpumask() works.  This should be okay,
    since Xen *also* iterates all online CPUs to find all the CPUs it
    needs to twiddle.
    
    The UV tlbflush code is rather dated and should be changed.
    
    Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
    (turbo off, intel_pstate requesting max performance) under KVM with
    the guest using idle=poll (to avoid artifacts when bouncing between
    CPUs).  I haven't done any real statistics here -- I just ran them
    in a loop and picked the fastest results that didn't look like
    outliers.  Unpatched means commit a4eb8b99
    
    , so all the
    bookkeeping overhead is gone.
    
    MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
    an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
    This is intended to be a nearly worst-case test.
    
      patched:         13.4µs
      unpatched:       21.6µs
    
    Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
    nrounds = 100, 256M data
    
      patched:         1.1 seconds or so
      unpatched:       1.9 seconds or so
    
    The sleepup on Vitaly's test appearss to be because it spends a lot
    of time blocked on mmap_sem, and this patch avoids sending IPIs to
    blocked CPUs.
    Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
    Reviewed-by: default avatarNadav Amit <nadav.amit@gmail.com>
    Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Cc: Andrew Banman <abanman@sgi.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Dimitri Sivanich <sivanich@sgi.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Mike Travis <travis@sgi.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org
    
    Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
    94b1b03b
init.c 23.7 KB