1. 14 Jun, 2018 1 commit
  2. 21 Feb, 2018 1 commit
    • mm, swap, frontswap: fix THP swap if frontswap enabled · 7ba71669
      Huang Ying authored
      It was reported by Sergey Senozhatsky that if THP (Transparent Huge
      Page) and frontswap (via zswap) are both enabled, when memory goes low
      so that swap is triggered, segfault and memory corruption will occur in
      random user space applications as follows:
      kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
       #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
       #1  0x00007fc08889c2f3 malloc (libc.so.6)
       #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
       #3  0x0000560e6005e75c n/a (urxvt)
       #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
       #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
       #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
       #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
       #8  0x0000560e6005cb55 ev_run (urxvt)
       #9  0x0000560e6003b9b9 main (urxvt)
       #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
       #11 0x0000560e6003f9da _start (urxvt)
      After bisection, it was found the first bad commit is bd4c82c2 ("mm,
      THP, swap: delay splitting THP after swapped out").
      The root cause is as follows:
      When the pages are written to the swap device during swap-out in
      swap_writepage(), zswap (frontswap) tries to compress the pages to
      improve performance.  But zswap (frontswap) will treat a THP as a normal
      page, so only the head page is saved.  After swapping in, tail pages
      will not be restored to their original contents, causing memory
      corruption in the applications.
      This is fixed by refusing to save the page in the frontswap store functions
      if the page is a THP, so that the THP will be swapped out to the swap device.
      Another choice is to split THP if frontswap is enabled.  But it is found
      that the frontswap enabling isn't flexible.  For example, if
      CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
      zswap itself isn't enabled.
      Frontswap has multiple backends.  To make it easy for one backend to
      enable THP support, the THP check is put in the backend frontswap store
      functions instead of the general interfaces.
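      The check described above can be sketched in userspace C as follows.
      This is an illustration only, not the kernel code: struct toy_page,
      toy_page_is_thp() and toy_backend_store() are hypothetical stand-ins for
      a page, a huge-page test and a backend frontswap store function, and the
      -EINVAL return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; nr_subpages > 1 models a THP. */
struct toy_page {
    size_t nr_subpages;
};

static bool toy_page_is_thp(const struct toy_page *page)
{
    return page->nr_subpages > 1;
}

/*
 * Hypothetical backend store function. Returns 0 if the backend took the
 * page, or a negative errno if the caller has to write the page to the
 * swap device instead.
 */
static int toy_backend_store(const struct toy_page *page)
{
    /* The backend only handles single pages, so refuse a THP outright. */
    if (toy_page_is_thp(page))
        return -EINVAL;

    /* ... compress and keep the single page here ... */
    return 0;
}
```

      Keeping the check inside the backend, rather than in a shared entry
      point, lets a backend that later learns to handle THP drop its own check
      without touching the others.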
      Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
      Fixes: bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped out")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: <stable@vger.kernel.org>	[4.14]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Feb, 2018 2 commits
    • zswap: only save zswap header when necessary · 9c3760eb
      Yu Zhao authored
      We waste sizeof(swp_entry_t) for zswap header when using zsmalloc as
      zpool driver because zsmalloc doesn't support eviction.
      Add zpool_evictable() to detect if zpool is potentially evictable, and
      use it in zswap to avoid wasting memory on the zswap header.
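      A minimal sketch of the size calculation this implies, assuming the
      header is only needed so that an evictable pool can recover the swap
      entry. struct toy_header and toy_alloc_size() are hypothetical names
      used for illustration, not zswap's own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical header stored in front of the compressed data. */
struct toy_header {
    unsigned long swap_entry;
};

/* Bytes to request from the pool for dlen bytes of compressed data. */
static size_t toy_alloc_size(bool pool_evictable, size_t dlen)
{
    /* Reserve room for the header only when the pool may evict entries. */
    size_t hlen = pool_evictable ? sizeof(struct toy_header) : 0;

    return hlen + dlen;
}
```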
      [yuzhao@google.com: the "zpool->" prefix is a result of copy & paste]
        Link: http://lkml.kernel.org/r/20180110225626.110330-1-yuzhao@google.com
      Link: http://lkml.kernel.org/r/20180110224741.83751-1-yuzhao@google.com
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Dan Streetman <ddstreet@ieee.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zswap: same-filled pages handling · a85f878b
      Srividya Desireddy authored
      Zswap is a cache which compresses the pages that are being swapped out
      and stores them into a dynamically allocated RAM-based memory pool.
      Experiments have shown that around 10-20% of pages stored in zswap are
      same-filled pages (i.e.  contents of the page are all same), but these
      pages are handled as normal pages by compressing and allocating memory
      in the pool.
      This patch adds a check in zswap_frontswap_store() to identify
      same-filled page before compression of the page.  If the page is a
      same-filled page, set zswap_entry.length to zero, save the same-filled
      value and skip the compression of the page and allocation of memory in
      zpool.  In zswap_frontswap_load(), check if value of zswap_entry.length
      is zero corresponding to the page to be loaded.  If zswap_entry.length
      is zero, fill the page with same-filled value.  This saves the
      decompression time during load.
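      The store-side check and load-side fill described above can be sketched
      in userspace C as follows. This is an illustration under assumptions,
      not the patch itself: TOY_PAGE_SIZE, toy_is_same_filled() and
      toy_fill_page() are hypothetical names, and a 4096-byte page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096  /* assumed page size for the sketch */
#define TOY_WORDS (TOY_PAGE_SIZE / sizeof(unsigned long))

/* Store side: report whether every word equals the first one. */
static bool toy_is_same_filled(const unsigned long *page, unsigned long *value)
{
    for (size_t i = 1; i < TOY_WORDS; i++) {
        if (page[i] != page[0])
            return false;
    }
    *value = page[0];
    return true;
}

/* Load side: rebuild the page from the saved value, no decompression. */
static void toy_fill_page(unsigned long *page, unsigned long value)
{
    for (size_t i = 0; i < TOY_WORDS; i++)
        page[i] = value;
}
```

      The check stops at the first differing word, so a page that differs
      only in its last word costs a full scan before it is handed to the
      compressor.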
      On an ARM Quad Core 32-bit device with 1.5GB RAM, by launching and
      relaunching different applications, out of ~64000 pages stored in zswap,
      ~11000 pages were same-value filled pages (including zero-filled pages)
      and ~9000 pages were zero-filled pages.
      An average of 17% of pages (including zero-filled pages) in zswap are
      same-value filled pages and 14% of pages are zero-filled pages.  An
      average of 3% of pages are same-filled non-zero pages.
      The below table shows the execution time profiling with the patch.
                                  Baseline    With patch  % Improvement
        *Zswap Store Time           26.5ms       18ms          32%
         (of same value pages)
        *Zswap Load Time            25.5ms       13ms          49%
         (of same value pages)
      On an Ubuntu PC with 2GB RAM, while executing a kernel build and other test
      scripts and running multimedia applications, out of 360000 pages stored
      in zswap, 78000 (~22%) of pages were found to be same-value filled pages
      (including zero-filled pages) and 64000 (~17%) were zero-filled pages.  So
      an average of 5% of pages are same-filled non-zero pages.
      The below table shows the execution time profiling with the patch.
                                  Baseline    With patch  % Improvement
        *Zswap Store Time           91ms        74ms           19%
         (of same value pages)
        *Zswap Load Time            50ms        7.5ms          85%
         (of same value pages)
      *The execution times may vary with test device used.
      Dan said:
      : I did test this patch out this week, and I added some instrumentation to
      : check the performance impact, and tested with a small program to try to
      : check the best and worst cases.
      : When doing a lot of swap where all (or almost all) pages are same-value, I
      : found this patch does save both time and space, significantly.  The exact
      : improvement in time and space depends on which compressor is being used,
      : but roughly agrees with the numbers you listed.
      : In the worst case situation, where all (or almost all) pages have the
      : same-value *except* the final long (meaning, zswap will check each long on
      : the entire page but then still have to pass the page to the compressor),
      : the same-value check is around 10-15% of the total time spent in
      : zswap_frontswap_store().  That's a not-insignificant amount of time, but
      : it's not huge.  Considering that most systems will probably be swapping
      : pages that aren't similar to the worst case (although I don't have any
      : data to know that), I'd say the improvement is worth the possible
      : worst-case performance impact.
      [srividya.dr@samsung.com: add memset_l instead of for loop]
      Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
      Signed-off-by: Srividya Desireddy <srividya.dr@samsung.com>
      Acked-by: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Dinakar Reddy Pathireddy <dinakar.p@samsung.com>
      Cc: SHARAN ALLUR <sharan.allur@samsung.com>
      Cc: RAJIB BASU <rajib.basu@samsung.com>
      Cc: JUHUN KIM <juhunkim@samsung.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Timofey Titovets <nefelim4ag@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 06 Jul, 2017 3 commits
  5. 28 Feb, 2017 3 commits
  6. 03 Feb, 2017 1 commit
    • zswap: disable changing params if init fails · d7b028f5
      Dan Streetman authored
      Add zswap_init_failed bool that prevents changing any of the module
      params, if init_zswap() fails, and set zswap_enabled to false.  Change
      'enabled' param to a callback, and check zswap_init_failed before
      allowing any change to 'enabled', 'zpool', or 'compressor' params.
      Any driver that is built-in to the kernel will not be unloaded if its
      init function returns error, and its module params remain accessible for
      users to change via sysfs.  Since zswap uses param callbacks, which
      assume that zswap has been initialized, changing the zswap params after
      a failed initialization will result in WARNING due to the param
      callbacks expecting a pool to already exist.  This prevents that by
      immediately exiting any of the param callbacks if initialization failed.
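      The guard described above can be sketched as follows. This is a
      userspace illustration, not the zswap code: toy_init(),
      toy_set_enabled() and the two flags are hypothetical names, and the
      -ENODEV return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

static bool toy_init_failed;  /* latched once if initialization fails */
static bool toy_enabled;

/* Hypothetical init hook: on failure, latch the flag and disable. */
static void toy_init(bool ok)
{
    if (!ok) {
        toy_init_failed = true;
        toy_enabled = false;
        return;
    }
    toy_enabled = true;
}

/* Hypothetical 'enabled' parameter callback. */
static int toy_set_enabled(bool value)
{
    /* Exit immediately: nothing behind this parameter was set up. */
    if (toy_init_failed)
        return -ENODEV;

    toy_enabled = value;
    return 0;
}
```

      The same early return would sit at the top of every parameter callback,
      so that no path can reach state that a failed init has already freed.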
      This was reported here:
      And fixes this WARNING:
        [  429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60
      The warning is just noise, and not serious.  However, when init fails,
      zswap frees all its percpu dstmem pages and its kmem cache.  The kmem
      cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
      the percpu dstmem pages are definitely a problem, as they're used as a
      temporary buffer for compressed pages before copying into place in the
      zpool.
      If the user does get zswap enabled after an init failure, then zswap
      will likely Oops on the first page it tries to compress (or worse, start
      corrupting memory).
      Fixes: 90b0fc26 ("zswap: change zpool/compressor at runtime")
      Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
      Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
      Reported-by: Marcin Miroslaw <marcin@mejor.pl>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 01 Dec, 2016 2 commits
  8. 21 May, 2016 1 commit
    • mm/zswap: use workqueue to destroy pool · 200867af
      Dan Streetman authored
      Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
      use the workqueue instead of using call_rcu().
      When zswap destroys a pool no longer in use, it uses call_rcu() to
      perform the destruction/freeing.  Since that executes in softirq
      context, it must not sleep.  However, actually destroying the pool
      involves freeing the per-cpu compressors (which requires locking the
      cpu_add_remove_lock mutex) and freeing the zpool, for which the
      implementation may sleep (e.g.  zsmalloc calls kmem_cache_destroy, which
      locks the slab_mutex).  So if either mutex is currently taken, or any
      other part of the compressor or zpool implementation sleeps, it will
      result in a BUG().
      It's not easy to reproduce this when changing zswap's params normally.
      In testing with a loaded system, this does not fail:
        $ cd /sys/module/zswap/parameters
        $ echo lz4 > compressor ; echo zsmalloc > zpool
      nor does this:
        $ while true ; do
        > echo lzo > compressor ; echo zbud > zpool
        > sleep 1
        > echo lz4 > compressor ; echo zsmalloc > zpool
        > sleep 1
        > done
      although it's still possible either of those might fail, depending on
      whether anything else besides zswap has locked the mutexes.
      However, changing a parameter with no delay immediately causes the
      schedule while atomic BUG:
        $ while true ; do
        > echo lzo > compressor ; echo lz4 > compressor
        > done
      This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
      but moved to zswap, to cover compressor and zpool freeing.
      Fixes: f1c54846 ("zswap: dynamic pool creation")
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Reported-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Streetman <dan.streetman@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 06 May, 2016 1 commit
    • mm/zswap: provide unique zpool name · 32a4e169
      Dan Streetman authored
      Instead of using "zswap" as the name for all zpools created, add an
      atomic counter and use "zswap%x" with the counter number for each zpool
      created, to provide a unique name for each new zpool.
      As zsmalloc, one of the zpool implementations, requires/expects a unique
      name for each pool created, zswap should provide a unique name.  The
      zsmalloc pool creation does not fail if a new pool with a conflicting
      name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
      zsmalloc pool creation fails with -ENOMEM.  Then zswap will be unable to
      change its compressor parameter if its zpool is zsmalloc; it also will
      be unable to change its zpool parameter back to zsmalloc, if it has any
      existing old zpool using zsmalloc with page(s) in it.  Attempts to
      change the parameters will result in failure to create the zpool.  This
      changes zswap to provide a unique name for each zpool creation.
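      A minimal userspace sketch of the naming scheme, using C11 atomics in
      place of the kernel's atomic counter. toy_pool_count and
      toy_pool_name() are hypothetical names, and starting the numbering at 1
      is an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

static atomic_uint toy_pool_count;  /* zero-initialized */

/* Write "zswap<hex counter>" into buf; each call gets a new number. */
static void toy_pool_name(char *buf, size_t len)
{
    /* atomic_fetch_add returns the old value, so add 1 to start at 1. */
    unsigned int id = atomic_fetch_add(&toy_pool_count, 1) + 1;

    snprintf(buf, len, "zswap%x", id);
}
```

      Because the counter only ever increases, a name is not reused while an
      older pool with pages still in it is alive.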
      Fixes: f1c54846 ("zswap: dynamic pool creation")
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dan Streetman <dan.streetman@canonical.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 04 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      This promise never materialized, and it is unlikely that it ever will.
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      Let's stop pretending that pages in page cache are special.  They are
      not.
      The changes are pretty straight-forward:
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - page_cache_get() -> get_page();
       - page_cache_release() -> put_page();
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
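      virtual patch
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)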
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 18 Dec, 2015 1 commit
  12. 07 Nov, 2015 3 commits
    • zswap: use charp for zswap param strings · c99b42c3
      Dan Streetman authored
      Instead of using a fixed-length string for the zswap params, use charp.
      This simplifies the code and uses less memory, as most zswap param strings
      will be less than the current maximum length.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/zswap.c: remove unneeded initialization to NULL in zswap_entry_find_get() · b0c9865f
      Alexey Klimov authored
      The entry variable is assigned on the next line, so there is no need to
      initialize it to NULL.
      Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd · d0164adc
      Mel Gorman authored
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      This patch then converts a number of sites:
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
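      The relationship between the reclaim flags described above can be
      sketched as follows. The TOY_ names and bit values are illustrative
      only and are not the kernel's definitions; toy_allow_blocking() is a
      hypothetical stand-in for a helper like gfpflags_allow_blocking(), on
      the assumption that such a helper tests the direct-reclaim bit.

```c
#include <stdbool.h>

/* Bit values are illustrative only, not the kernel's. */
#define TOY_GFP_DIRECT_RECLAIM 0x04u
#define TOY_GFP_KSWAPD_RECLAIM 0x08u

/* Redefined "wait": may enter direct reclaim and may wake kswapd. */
#define TOY_GFP_WAIT (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)

/* Mempool-style caller: still wakes kswapd but must not block. */
#define TOY_GFP_MEMPOOL (TOY_GFP_WAIT & ~TOY_GFP_DIRECT_RECLAIM)

/* A caller can block only if it is willing to enter direct reclaim. */
static bool toy_allow_blocking(unsigned int gfp)
{
    return (gfp & TOY_GFP_DIRECT_RECLAIM) != 0;
}
```

      Testing the whole "wait" mask for non-zero would report the
      mempool-style caller as able to block, which is the false positive
      mentioned in the list of converted sites.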
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 10 Sep, 2015 2 commits
  14. 08 Sep, 2015 2 commits
  15. 26 Jun, 2015 1 commit
  16. 13 Feb, 2015 1 commit
  17. 13 Dec, 2014 2 commits
  18. 13 Nov, 2014 1 commit
  19. 08 Aug, 2014 2 commits
    • mm/zswap.c: add __init to zswap_entry_cache_destroy() · c119239b
      Fabian Frederick authored
      zswap_entry_cache_destroy() is only called by __init init_zswap().
      This patch also fixes function name zswap_entry_cache_ s/destory/destroy
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Acked-by: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: rewrite uncharge API · 0a31bc97
      Johannes Weiner authored
      The memcg uncharging code that is involved towards the end of a page's
      lifetime - truncation, reclaim, swapout, migration - is impressively
      complicated and fragile.
      Because anonymous and file pages were always charged before they had their
      page->mapping established, uncharges had to happen when the page type
      could still be known from the context; as in unmap for anonymous, page
      cache removal for file and shmem pages, and swap cache truncation for swap
      pages.  However, these operations happen well before the page is actually
      freed, and so a lot of synchronization is necessary:
      - Charging, uncharging, page migration, and charge migration all need
        to take a per-page bit spinlock as they could race with uncharging.
      - Swap cache truncation happens during both swap-in and swap-out, and
        possibly repeatedly before the page is actually freed.  This means
        that the memcg swapout code is called from many contexts that make
        no sense and it has to figure out the direction from page state to
        make sure memory and memory+swap are always correctly charged.
      - On page migration, the old page might be unmapped but then reused,
        so memcg code has to prevent untimely uncharging in that case.
        Because this code - which should be a simple charge transfer - is so
        special-cased, it is not reusable for replace_page_cache().
      But now that charged pages always have a page->mapping, introduce
      mem_cgroup_uncharge(), which is called after the final put_page(), when we
      know for sure that nobody is looking at the page anymore.
      For page migration, introduce mem_cgroup_migrate(), which is called after
      the migration is successful and the new page is fully rmapped.  Because
      the old page is no longer uncharged after migration, prevent double
      charges by decoupling the page's memcg association (PCG_USED and
      pc->mem_cgroup) from the page holding an actual charge.  The new bits
      PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
      to the new page during migration.
      mem_cgroup_migrate() is suitable for replace_page_cache() as well,
      which gets rid of mem_cgroup_replace_page_cache().  However, care
      needs to be taken because both the source and the target page can
      already be charged and on the LRU when fuse is splicing: grab the page
      lock on the charge moving side to prevent changing pc->mem_cgroup of a
      page under migration.  Also, the lruvecs of both pages change as we
      uncharge the old and charge the new during migration, and putback may
      race with us, so grab the lru lock and isolate the pages iff on LRU to
      prevent races and ensure the pages are on the right lruvec afterward.
      Swap accounting is massively simplified: because the page is no longer
      uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
      transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
      before the final put_page() in page reclaim.
      Finally, page_cgroup changes are now protected by whatever protection the
      page itself offers: anonymous pages are charged under the page table lock,
      whereas page cache insertions, swapin, and migration hold the page lock.
      Uncharging happens under full exclusion with no outstanding references.
      Charging and uncharging also ensure that the page is off-LRU, which
      serializes against charge migration.  Remove the very costly page_cgroup
      lock and set pc->flags non-atomically.
      [mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
      [vdavydov@parallels.com: fix flags definition]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Tested-by: Jet Chen <jet.chen@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 07 Aug, 2014 1 commit
  21. 04 Jun, 2014 1 commit
  22. 07 Apr, 2014 4 commits
  23. 20 Mar, 2014 1 commit
    • mm, zswap: Fix CPU hotplug callback registration · 57637824
      Srivatsa S. Bhat authored
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      Instead, the correct and race-free way of performing the callback
      registration is:
      	/* Note the use of the double underscored version of the API */
      Fix the zswap code by using this latter form of callback registration.
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  24. 24 Jan, 2014 1 commit
  25. 13 Nov, 2013 1 commit