    mm: swap: check if swap backing device is congested or not · 8fd2e0b5
    Yang Shi authored
    Swap readahead reads in a few pages regardless of whether the
    underlying device is busy.  If the device is congested, this can incur
    long waits and may also exacerbate the congestion.
    
    Use inode_read_congested() to check whether the underlying device is
    busy, as file page readahead does.  Get the inode from
    swap_info_struct.
    
    Although we could add inode information to swap_address_space
    (address_space->host), doing so may lead to unexpected side effects,
    e.g. it may break mapping_cap_account_dirty().  Using the inode from
    swap_info_struct seems simple and good enough.
    
    Do the check only in swap_cluster_readahead(), since
    swap_vma_readahead() is used only for non-rotational devices, which are
    much less likely to be congested than traditional HDDs.
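    
    As a rough illustration of the check described above, the following
    userspace C sketch models the decision.  It is not the kernel code:
    struct model_inode, struct model_swap_info and model_readahead_pages()
    are hypothetical stand-ins, the read_congested flag merely represents
    what inode_read_congested() would report for the backing inode, and the
    sketch assumes that a congested device means readahead is skipped so
    that only the faulting page is read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode of the swap backing device or file. */
struct model_inode {
    bool read_congested; /* models the result of inode_read_congested() */
};

/* Hypothetical stand-in for swap_info_struct: it only carries the inode. */
struct model_swap_info {
    struct model_inode *inode; /* may be NULL in this model */
};

/*
 * Decide how many pages a cluster-style readahead should read.
 * Returns 1 (the faulting page only) when the window is trivial or the
 * backing device reports read congestion, otherwise the full window.
 */
static unsigned int model_readahead_pages(const struct model_swap_info *si,
                                          unsigned int window)
{
    if (window <= 1)
        return 1;

    /* Assumption: only consult the inode when one is attached. */
    if (si->inode != NULL && si->inode->read_congested)
        return 1; /* skip readahead, do not add load to a busy device */

    return window;
}
```

    With a congested backing inode, model_readahead_pages(&si, 8) returns
    1, so only the faulting page is read; with an idle one the full window
    of 8 is kept.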
    
    Although swap slots may be consecutive on a swap partition, they may
    still be fragmented on a swap file.  This check helps reduce excessive
    stalls in that case.
    
    The test used page_fault1 from will-it-scale (tracing sometimes shows
    runtest.py, the wrapper script for page_fault1, instead), which
    launches NR_CPU threads that each generate 128MB of anonymous pages.
    On my virtual machine with a congested HDD, long tail latency is
    reduced significantly: in the excerpts below, the longest
    do_swap_page() call drops from about 57 ms to about 10 ms.
    
    Without the patch
     page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      #57377.796 us |  do_swap_page();
     page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642us   |  do_swap_page();
     page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      #1289.592 us |  do_swap_page();
     page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957us   |  do_swap_page();
     page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940us   |  do_swap_page();
     page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      #1411.385 us |  do_swap_page();
     page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916us   |  do_swap_page();
     page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      #4287.751 us |  do_swap_page();
    
    With the patch
          runtest.py-1417  [020]   301.925911: funcgraph_entry:      #9870.146 us |  do_swap_page();
          runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802us   |  do_swap_page();
          runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551us   |  do_swap_page();
          runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142us   |  do_swap_page();
          runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938us   |  do_swap_page();
          runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765us   |  do_swap_page();
          runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600us   |  do_swap_page();
          runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202us   |  do_swap_page();
    
    [akpm@linux-foundation.org: code cleanup]
    [yang.shi@linux.alibaba.com: add comment]
      Link: http://lkml.kernel.org/r/bbc7bda7-62d0-df1a-23ef-d369e865bdca@linux.alibaba.com
    Link: http://lkml.kernel.org/r/1546543673-108536-1-git-send-email-yang.shi@linux.alibaba.com
    
    
    Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
    Acked-by: Tim Chen <tim.c.chen@intel.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>