• Huang Ying's avatar
    mm, swap: VMA based swap readahead · ec560175
    Huang Ying authored
    The swap readahead is an important mechanism to reduce the swap in
    latency.  Although pure sequential memory access pattern isn't very
    popular for anonymous memory, the space locality is still considered
    In the original swap readahead implementation, the consecutive blocks in
    swap device are readahead based on the global space locality estimation.
    But the consecutive blocks in swap device just reflect the order of page
    reclaiming, don't necessarily reflect the access pattern in virtual
    memory.  And the different tasks in the system may have different access
    patterns, which makes the global space locality estimation incorrect.
    In this patch, when page fault occurs, the virtual pages near the fault
    address will be readahead instead of the swap slots near the fault swap
    slot in swap device.  This avoid to readahead the unrelated swap slots.
    At the same time, the swap readahead is changed to work on per-VMA from
    globally.  So that the different access patterns of the different VMAs
    could be distinguished, and the different readahead policy could be
    applied accordingly.  The original core readahead detection and scaling
    algorithm is reused, because it is an effect algorithm to detect the
    space locality.
    The test and result is as follow,
    Common test condition
    Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM) Swap device:
    NVMe disk
    Micro-benchmark with combined access pattern
    vm-scalability, sequential swap test case, 4 processes to eat 50G
    virtual memory space, repeat the sequential memory writing until 300
    seconds.  The first round writing will trigger swap out, the following
    rounds will trigger sequential swap in and out.
    At the same time, run vm-scalability random swap test case in
    background, 8 processes to eat 30G virtual memory space, repeat the
    random memory write until 300 seconds.  This will trigger random swap-in
    in the background.
    This is a combined workload with sequential and random memory accessing
    at the same time.  The result (for sequential workload) is as follow,
    			Base		Optimized
    			----		---------
    throughput		345413 KB/s	414029 KB/s (+19.9%)
    latency.average		97.14 us	61.06 us (-37.1%)
    latency.50th		2 us		1 us
    latency.60th		2 us		1 us
    latency.70th		98 us		2 us
    latency.80th		160 us		2 us
    latency.90th		260 us		217 us
    latency.95th		346 us		369 us
    latency.99th		1.34 ms		1.09 ms
    ra_hit%			52.69%		99.98%
    The original swap readahead algorithm is confused by the background
    random access workload, so readahead hit rate is lower.  The VMA-base
    readahead algorithm works much better.
    The test memory size is bigger than RAM to trigger swapping.
    			Base		Optimized
    			----		---------
    elapsed_time		393.49 s	329.88 s (-16.2%)
    ra_hit%			86.21%		98.82%
    The score of base and optimized kernel hasn't visible changes.  But the
    elapsed time reduced and readahead hit rate improved, so the optimized
    kernel runs better for startup and tear down stages.  And the absolute
    value of readahead hit rate is high, shows that the space locality is
    still valid in some practical workloads.
    Link: http://lkml.kernel.org/r/20170807054038.1843-4-ying.huang@intel.com
    Signed-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Shaohua Li <shli@kernel.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Fengguang Wu <fengguang.wu@intel.com>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>