1. 15 Aug, 2020 2 commits
    • mm/swap_state: mark various intentional data races · b96a3db2
      Qian Cai authored
      
      
      swap_cache_info.* could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache
      
       write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
        lookup_swap_cache+0x12e/0x460
        lookup_swap_cache at mm/swap_state.c:322
        do_swap_page+0x112/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
        lookup_swap_cache+0x117/0x460
        lookup_swap_cache at mm/swap_state.c:322
        shmem_swapin_page+0xc7/0x9e0
        shmem_getpage_gfp+0x2ca/0x16c0
        shmem_fault+0xef/0x3c0
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
       write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
         __delete_from_swap_cache+0x681/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
       read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
         __delete_from_swap_cache+0x66e/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
      Both the read and the write are lockless. Since swap_cache_info.* are
      only used to print counter information, it is harmless if any of them
      misses a few increments due to data races, so just mark the accesses as
      intentional data races using the data_race() macro.
      
      While at it, fix a checkpatch.pl warning,
      
      WARNING: Single statement macros should not use a do {} while (0) loop
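
      The two changes described above can be sketched in userspace as follows.
      This is a minimal sketch, not the patch itself: the data_race() definition
      here is a stand-in that only evaluates its argument (the kernel macro
      additionally tells KCSAN that the race is intentional), and the helper
      functions record_lookup() and show_swap_cache_info() are hypothetical.

```c
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's data_race() macro. It only evaluates
 * the expression so that this sketch compiles outside the kernel.
 */
#define data_race(expr) (expr)

/* Counters modelled on the swap_cache_info.* statistics (assumed layout). */
static struct {
	unsigned long add_total;
	unsigned long del_total;
	unsigned long find_success;
	unsigned long find_total;
} swap_cache_info;

/* Single-statement macros: no do {} while (0) wrapper is needed. */
#define INC_CACHE_INFO(x)	data_race(swap_cache_info.x++)
#define ADD_CACHE_INFO(x, nr)	data_race(swap_cache_info.x += (nr))

/* Hypothetical helper: record one lookup and whether it hit. */
static void record_lookup(int hit)
{
	INC_CACHE_INFO(find_total);
	if (hit)
		INC_CACHE_INFO(find_success);
}

/* Hypothetical helper: printing is the only consumer of the counters. */
static void show_swap_cache_info(void)
{
	printf("find %lu/%lu\n",
	       swap_cache_info.find_success, swap_cache_info.find_total);
}
```

      Because the counters are only ever printed, a lost increment skews the
      statistics slightly but cannot corrupt any state that other code relies on.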
      
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace hpage_nr_pages with thp_nr_pages · 6c357848
      Matthew Wilcox (Oracle) authored
      
      
      The thp prefix is more frequently used than hpage and we should be
      consistent between the various functions.
      
      [akpm@linux-foundation.org: fix mm/migrate.c]
      
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 12 Aug, 2020 2 commits
  3. 07 Aug, 2020 1 commit
  4. 26 Jun, 2020 1 commit
    • mm: fix swap cache node allocation mask · 243bce09
      Hugh Dickins authored
      Chris Murphy reports that a slightly overcommitted load, testing swap
      and zram along with i915, splats and keeps on splatting, when it had
      better fail less noisily:
      
        gnome-shell: page allocation failure: order:0,
        mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
        nodemask=(null),cpuset=/,mems_allowed=0
        CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
        Call Trace:
          dump_stack+0x64/0x88
          warn_alloc.cold+0x75/0xd9
          __alloc_pages_slowpath.constprop.0+0xcfa/0xd30
          __alloc_pages_nodemask+0x2df/0x320
          alloc_slab_page+0x195/0x310
          allocate_slab+0x3c5/0x440
          ___slab_alloc+0x40c/0x5f0
          __slab_alloc+0x1c/0x30
          kmem_cache_alloc+0x20e/0x220
          xas_nomem+0x28/0x70
          add_to_swap_cache+0x321/0x400
          __read_swap_cache_async+0x105/0x240
          swap_cluster_readahead+0x22c/0x2e0
          shmem_swapin+0x8e/0xc0
          shmem_swapin_page+0x196/0x740
          shmem_getpage_gfp+0x3a2/0xa60
          shmem_read_mapping_page_gfp+0x32/0x60
          shmem_get_pages+0x155/0x5e0 [i915]
          __i915_gem_object_get_pages+0x68/0xa0 [i915]
          i915_vma_pin+0x3fe/0x6c0 [i915]
          eb_add_vma+0x10b/0x2c0 [i915]
          i915_gem_do_execbuffer+0x704/0x3430 [i915]
          i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
          drm_ioctl_kernel+0x86/0xd0 [drm]
          drm_ioctl+0x206/0x390 [drm]
          ksys_ioctl+0x82/0xc0
          __x64_sys_ioctl+0x16/0x20
          do_syscall_64+0x5b/0xf0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported on 5.7, but it goes back really to 3.1: when
      shmem_read_mapping_page_gfp() was implemented for use by i915, and
      allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
      missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
      __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
      from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
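
      The effect of the two masks can be illustrated with a small sketch. The
      bit values and TOY_* names below are invented for illustration and are
      not the kernel's real gfp encodings; the point is only that masking with
      a GFP_KERNEL-like value drops the caller's NORETRY and NOWARN hints,
      while a GFP_RECLAIM_MASK-like value keeps them and still strips the
      placement bits.

```c
typedef unsigned int toy_gfp_t;

/* Illustrative bit values only; NOT the kernel's real encodings. */
#define TOY_GFP_IO       0x01u
#define TOY_GFP_FS       0x02u
#define TOY_GFP_RECLAIM  0x04u  /* stands in for the reclaim bits */
#define TOY_GFP_NORETRY  0x08u
#define TOY_GFP_NOWARN   0x10u
#define TOY_GFP_HIGHMEM  0x20u  /* placement bit, as in HIGHUSER_MOVABLE */
#define TOY_GFP_MOVABLE  0x40u  /* placement bit, as in HIGHUSER_MOVABLE */

#define TOY_GFP_KERNEL \
	(TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_RECLAIM_MASK \
	(TOY_GFP_KERNEL | TOY_GFP_NORETRY | TOY_GFP_NOWARN)

/* Old behaviour: the caller's NORETRY/NOWARN hints are lost. */
static toy_gfp_t node_mask_old(toy_gfp_t gfp)
{
	return gfp & TOY_GFP_KERNEL;
}

/* New behaviour: the hints survive, the placement bits are still removed. */
static toy_gfp_t node_mask_new(toy_gfp_t gfp)
{
	return gfp & TOY_GFP_RECLAIM_MASK;
}
```

      With the old mask, a caller that asked for a quiet, no-retry allocation
      would still get a noisy node allocation, which matches the splat quoted
      above.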
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
      Fixes: 68da9f05 ("tmpfs: pass gfp to shmem_getpage_gfp")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Analyzed-by: Vlastimil Babka <vbabka@suse.cz>
      Analyzed-by: Matthew Wilcox <willy@infradead.org>
      Tested-by: Chris Murphy <lists@colorremedies.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 09 Jun, 2020 2 commits
    • mmap locking API: convert mmap_sem comments · c1e8d7c6
      Michel Lespinasse authored
      
      
      Convert comments that reference mmap_sem to reference mmap_lock instead.
      
      [akpm@linux-foundation.org: fix up linux-next leftovers]
      [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
      [akpm@linux-foundation.org: more linux-next fixups, per Michel]
      
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Liam Howlett <Liam.Howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ying Han <yinghan@google.com>
      Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport authored
      
      
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definitions of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
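
      As a worked example of the index computation, the sketch below plugs in
      concrete values. PMD_SHIFT = 21 and PTRS_PER_PMD = 512 are assumptions
      chosen to match a common 4 KiB page, 512-entry table layout; other
      architectures and configurations use different values.

```c
/* Assumed values for illustration; real ones are per-architecture. */
#define PMD_SHIFT     21
#define PTRS_PER_PMD  512

/* Same shape as the generic accessor quoted above. */
static inline unsigned long pmd_index(unsigned long address)
{
	/* Drop the bits below the PMD level, then keep the table-sized field. */
	return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}
```

      Only the two constants are architecture-specific, which is why a single
      shared definition can serve most architectures.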
      
      For architectures that really need a custom version there is always the
      possibility to override the generic version with the usual ifdef magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are removed with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 04 Jun, 2020 5 commits
    • mm: vmscan: reclaim writepage is IO cost · 96f8bf4f
      Johannes Weiner authored
      
      
      The VM tries to balance reclaim pressure between anon and file so as to
      reduce the amount of IO incurred due to the memory shortage.  It already
      counts refaults and swapins, but in addition it should also count
      writepage calls during reclaim.
      
      For swap, this is obvious: it's IO that wouldn't have occurred if the
      anonymous memory hadn't been under memory pressure.  From a relative
      balancing point of view this makes sense as well: even if anon is cold and
      reclaimable, a cache that isn't thrashing may have equally cold pages that
      don't require IO to reclaim.
      
      For file writeback, it's trickier: some of the reclaim writepage IO would
      have likely occurred anyway due to dirty expiration.  But not all of it -
      premature writeback reduces batching and generates additional writes.
      Since the flushers are already woken up by the time the VM starts writing
      cache pages one by one, let's assume that we're likely causing writes that
      wouldn't have happened without memory pressure.  In addition, the per-page
      cost of IO would have probably been much cheaper if written in larger
      batches from the flusher thread rather than the single-page-writes from
      kswapd.
      
      For our purposes - getting the trend right to accelerate convergence on a
      stable state that doesn't require paging at all - this is sufficiently
      accurate.  If we later wanted to optimize for sustained thrashing, we can
      still refine the measurements.
      
      Count all writepage calls from kswapd as IO cost toward the LRU that the
      page belongs to.
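
      The accounting rule in the previous sentence can be modelled with a toy
      structure. Everything named toy_* below is hypothetical and exists only
      for this sketch; it does not reproduce the kernel's data structures or
      function names.

```c
/* The two LRU types being balanced in this toy model. */
enum toy_lru { TOY_LRU_ANON, TOY_LRU_FILE, TOY_NR_LRU };

/* Hypothetical per-LRU cost counters. */
struct toy_lruvec {
	unsigned long io_cost[TOY_NR_LRU];
};

/* Charge nr_pages worth of IO cost to one LRU type. */
static void toy_note_cost(struct toy_lruvec *lv, enum toy_lru lru,
			  unsigned long nr_pages)
{
	lv->io_cost[lru] += nr_pages;
}

/* A reclaim-time writepage call costs IO on the LRU the page belongs to. */
static void toy_reclaim_writepage(struct toy_lruvec *lv, int page_is_file)
{
	toy_note_cost(lv, page_is_file ? TOY_LRU_FILE : TOY_LRU_ANON, 1);
}
```

      The counters are updated dynamically, per event, rather than through a
      static bias against one list; the questions that follow explain why.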
      
      Why do this dynamically?  Don't we know in advance that anon pages require
      IO to reclaim, and so could build in a static bias?
      
      First, scanning is not the same as reclaiming.  If all the anon pages are
      referenced, we may not swap for a while just because we're scanning the
      anon list.  During this time, however, it's important that we age
      anonymous memory and the page cache at the same rate so that their
      hot-cold gradients are comparable.  Everything else being equal, we still
      want to reclaim the coldest memory overall.
      
      Second, we keep copies in swap unless the page changes.  If there is
      swap-backed data that's mostly read (tmpfs file) and has been swapped out
      before, we can reclaim it without incurring additional IO.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20200520232525.798933-14-hannes@cmpxchg.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: balance LRU lists based on relative thrashing · 314b57fb
      Johannes Weiner authored
      Since the LRUs were split into anon and file lists, the VM has been
      balancing between page cache and anonymous pages based on per-list ratios
      of scanned vs.  rotated pages.  In most cases that tips page reclaim
      towards the list that is easier to reclaim and has the fewest actively
      used pages, but there are a few problems with it:
      
      1. Refaults and LRU rotations are weighted the same way, even though
         one costs IO and the other costs a bit of CPU.
      
      2. The less we scan an LRU list based on already observed rotations,
         the more we increase the sampling interval for new references, and
         rotations become even more likely on that list. This can enter a
         death spiral in which we stop looking at one list completely until
         the other one is all but annihilated by page reclaim.
      
      Since commit a528910e ("mm: thrash detection-based file cache sizing")
      we have refault detection for the page cache.  Along with swapin events,
      they are good indicators of when the file or anon list, respectively, is
      too small for its workingset and needs to grow.
      
      For example, if the page cache is thrashing, the cache pages need more
      time in memory, while there may be colder pages on the anonymous list.
      Likewise, if swapped pages are faulting back in, it indicates that we
      reclaim anonymous pages too aggressively and should back off.
      
      Replace LRU rotations with refaults and swapins as the basis for relative
      reclaim cost of the two LRUs.  This will have the VM target list balances
      that incur the least amount of IO on aggregate.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20200520232525.798933-12-hannes@cmpxchg.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() · 6058eaec
      Johannes Weiner authored
      
      
      They're the same function, and for the purpose of all callers they are
      equivalent to lru_cache_add().
      
      [akpm@linux-foundation.org: fix it for local_lock changes]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: delete unused lrucare handling · d9eb1ea2
      Johannes Weiner authored
      
      
      Swapin faults were the last event to charge pages after they had already
      been put on the LRU list.  Now that we charge directly on swapin, the
      lrucare portion of the charge code is unused.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: charge swapin pages on instantiation · 4c6355b2
      Johannes Weiner authored
      Right now, users that are otherwise memory controlled can easily escape
      their containment and allocate significant amounts of memory that they're
      not being charged for.  That's because swap readahead pages are not being
      charged until somebody actually faults them into their page table.  This
      can be exploited with MADV_WILLNEED, which triggers arbitrary readahead
      allocations without charging the pages.
      
      There are additional problems with the delayed charging of swap pages:
      
      1. To implement refault/workingset detection for anonymous pages, we
         need to have a target LRU available at swapin time, but the LRU is not
         determinable until the page has been charged.
      
      2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
         stable when the page is isolated from the LRU; otherwise, the locks
         change under us.  But swapcache gets charged after it's already on the
         LRU, and even if we cannot isolate it ourselves (since charging is not
         exactly optional).
      
      The previous patch ensured we always maintain cgroup ownership records for
      swap pages.  This patch moves the swapcache charging point from the fault
      handler to swapin time to fix all of the above problems.
      
      v2: simplify swapin error checking (Joonsoo)
      
      [hughd@google.com: fix livelock in __read_swap_cache_async()]
        Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvils
      
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 02 Jun, 2020 1 commit
    • mm/swap_state: fix a data race in swapin_nr_pages · d6c1f098
      Qian Cai authored
      
      
      "prev_offset" is a static variable in swapin_nr_pages() that can be
      accessed concurrently with only mmap_sem held in read mode as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead
      
       write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17:
        swap_cluster_readahead+0x2a6/0x5e0
        swapin_readahead+0x92/0x8dc
        do_swap_page+0x49b/0xf20
        __handle_mm_fault+0xcfb/0xd70
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x715
        page_fault+0x34/0x40
      
       1 lock held by (dnf)/14795:
        #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
        do_user_addr_fault at arch/x86/mm/fault.c:1405
        (inlined by) do_page_fault at arch/x86/mm/fault.c:1535
       irq event stamp: 83493
       count_memcg_event_mm+0x1a6/0x270
       count_memcg_event_mm+0x119/0x270
       __do_softirq+0x365/0x589
       irq_exit+0xa2/0xc0
      
       read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22:
        swap_cluster_readahead+0xfd/0x5e0
        swapin_readahead+0x92/0x8dc
        do_swap_page+0x49b/0xf20
        __handle_mm_fault+0xcfb/0xd70
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x715
        page_fault+0x34/0x40
      
       1 lock held by systemd/1:
        #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
       irq event stamp: 43530289
       count_memcg_event_mm+0x1a6/0x270
       count_memcg_event_mm+0x119/0x270
       __do_softirq+0x365/0x589
       irq_exit+0xa2/0xc0
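
      The message above does not spell out the fix. One common way to address
      a report like this is to mark the racing accesses so that each is a
      single load or store, which the sketch below shows in userspace. The
      READ_ONCE()/WRITE_ONCE() definitions here are simplified stand-ins that
      rely on the GCC/Clang __typeof__ extension, and
      toy_offset_is_adjacent() is a hypothetical function, not the kernel's
      swapin_nr_pages().

```c
/* Simplified userspace stand-ins for the kernel's marked accessors. */
#define READ_ONCE(x)        (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)  (*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Hypothetical model of a readahead heuristic: report whether this offset
 * is next to the previous one, then remember it for the next caller.
 */
static int toy_offset_is_adjacent(unsigned long offset)
{
	/* Shared by every caller, like the static variable in the report. */
	static unsigned long prev_offset;
	/* One marked load, so both comparisons see the same value. */
	unsigned long prev = READ_ONCE(prev_offset);
	int adjacent = (offset == prev + 1 || offset + 1 == prev);

	/* One marked store publishes the new value. */
	WRITE_ONCE(prev_offset, offset);
	return adjacent;
}
```

      Reading the shared value once into a local matters as much as the
      marking: without it, the two comparisons could observe different values
      if another task updated the variable in between.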
      
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pw
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 02 Apr, 2020 1 commit
  9. 24 Sep, 2019 2 commits
  10. 12 Jul, 2019 2 commits
    • mm/swap_state.c: simplify total_swapcache_pages() with get_swap_device() · 054f1d1f
      Huang Ying authored
      total_swapcache_pages() may race with swapper_spaces[] allocation and
      freeing.  Previously, this was protected with a swapper_spaces[]-specific
      RCU mechanism.  To reduce code complexity, it is replaced with
      get/put_swap_device().  The line count is reduced too.  Although not so
      important, swapoff() performance improves as well, because one
      synchronize_rcu() call during swapoff() is removed.
      
      [ying.huang@intel.com: fix bad swap file entry warning]
        Link: http://lkml.kernel.org/r/20190531024102.21723-1-ying.huang@intel.com
      Link: http://lkml.kernel.org/r/20190527082714.12151-1-ying.huang@intel.com
      
      
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, swap: fix race between swapoff and some swap operations · eb085574
      Huang Ying authored
      When swapin is performed, after getting the swap entry information from
      the page table, the system will swap in the swap entry without holding
      any lock that prevents the swap device from being swapped off.  This may
      cause a race like the one below,
      
      CPU 1				CPU 2
      -----				-----
      				do_swap_page
      				  swapin_readahead
      				    __read_swap_cache_async
      swapoff				      swapcache_prepare
        p->swap_map = NULL		        __swap_duplicate
      					  p->swap_map[?] /* !!! NULL pointer access */
      
      Because swapoff is usually done only at system shutdown, the race may
      not hit many people in practice.  But it is still a race that needs to be
      fixed.

      To fix the race, get_swap_device() is added to check whether the specified
      swap entry is valid in its swap device.  If so, it will keep the swap
      entry valid by preventing the swap device from being swapped off, until
      put_swap_device() is called.

      Because swapoff() is a very rare code path, to make the normal path run as
      fast as possible, rcu_read_lock/unlock() and synchronize_rcu() are used
      instead of a reference count to implement get/put_swap_device().  From
      get_swap_device() to put_swap_device(), the RCU reader side is locked, so
      synchronize_rcu() in swapoff() will wait until put_swap_device() is
      called.
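
      The calling pattern described above can be sketched with a
      single-threaded userspace model. All toy_* names are hypothetical, the
      RCU calls are replaced by a nesting counter that only tracks balance,
      and the signatures are simplified for the sketch; they are not the
      kernel's.

```c
#include <stddef.h>

/* Stand-ins for rcu_read_lock()/rcu_read_unlock(): track nesting only. */
static int toy_rcu_depth;
static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

/* Hypothetical, heavily simplified swap device. */
struct toy_swap_info {
	int valid;                /* cleared by swapoff before teardown */
	unsigned char *swap_map;  /* freed (NULL here) after swapoff */
	unsigned long max;        /* number of entries in swap_map */
};

/* On success the read side stays locked until toy_put_swap_device(). */
static struct toy_swap_info *toy_get_swap_device(struct toy_swap_info *si,
						 unsigned long offset)
{
	toy_rcu_read_lock();
	if (si && si->valid && offset < si->max)
		return si;
	toy_rcu_read_unlock();
	return NULL;
}

static void toy_put_swap_device(struct toy_swap_info *si)
{
	(void)si;
	toy_rcu_read_unlock();
}

/* Returns the map count, or -1 if the entry is no longer valid. */
static int toy_swap_count(struct toy_swap_info *si, unsigned long offset)
{
	struct toy_swap_info *pinned = toy_get_swap_device(si, offset);
	int count;

	if (!pinned)
		return -1;
	count = pinned->swap_map[offset];  /* safe: device cannot go away */
	toy_put_swap_device(pinned);
	return count;
}
```

      In this model the validity check and the dereference of swap_map happen
      inside the same read-side section, which is what closes the window shown
      in the CPU 1 / CPU 2 diagram.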
      
      In addition to the swap_map, cluster_info, etc. data structures in
      struct swap_info_struct, the swap cache radix tree will be freed after
      swapoff, so this patch fixes the race between swap cache lookup and
      swapoff too.
      
      Races between some other swap cache usages and swapoff are fixed too via
      calling synchronize_rcu() between clearing PageSwapCache() and freeing
      swap cache data structure.
      
      Another possible method to fix this is to use preempt_off() +
      stop_machine() to prevent the swap device from being swapped off while
      its data structure is being accessed.  The hot-path overhead of both
      methods is similar.  The advantages of the RCU-based method are,
      
      1. stop_machine() may disturb the normal execution code path on other
         CPUs.
      
      2. File cache uses RCU to protect its radix tree.  If the similar
         mechanism is used for swap cache too, it is easier to share code
         between them.
      
      3. RCU is used to protect swap cache in total_swapcache_pages() and
         exit_swap_address_space() already.  The two mechanisms can be
         merged to simplify the logic.
      
      Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
      Fixes: 235b6217 ("mm/swap: add cluster lock")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
      Not-nacked-by: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 06 Jul, 2019 1 commit
    • Revert "mm: page cache: store only head pages in i_pages" · 69bf4b6b
      Linus Torvalds authored
      This reverts commit 5fd4ca2d.
      
      Mikhail Gavrilov reports that it causes the VM_BUG_ON_PAGE() in
      __delete_from_swap_cache() to trigger:
      
         page:ffffd6d34dff0000 refcount:1 mapcount:1 mapping:ffff97812323a689 index:0xfecec363
         anon
         flags: 0x17fffe00080034(uptodate|lru|active|swapbacked)
         raw: 0017fffe00080034 ffffd6d34c67c508 ffffd6d3504b8d48 ffff97812323a689
         raw: 00000000fecec363 0000000000000000 0000000100000000 ffff978433ace000
         page dumped because: VM_BUG_ON_PAGE(entry != page)
         page->mem_cgroup:ffff978433ace000
         ------------[ cut here ]------------
         kernel BUG at mm/swap_state.c:170!
         invalid opcode: 0000 [#1] SMP NOPTI
         CPU: 1 PID: 221 Comm: kswapd0 Not tainted 5.2.0-0.rc2.git0.1.fc31.x86_64 #1
         Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2202 04/11/2019
         RIP: 0010:__delete_from_swap_cache+0x20d/0x240
         Code: 30 65 48 33 04 25 28 00 00 00 75 4a 48 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 2f dc 0f 8a 48 89 c7 e8 93 1b fd ff <0f> 0b 48 c7 c6 a8 74 0f 8a e8 85 1b fd ff 0f 0b 48 c7 c6 a8 7d 0f
         RSP: 0018:ffffa982036e7980 EFLAGS: 00010046
         RAX: 0000000000000021 RBX: 0000000000000040 RCX: 0000000000000006
         RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff97843d657900
         RBP: 0000000000000001 R08: ffffa982036e7835 R09: 0000000000000535
         R10: ffff97845e21a46c R11: ffffa982036e7835 R12: ffff978426387120
         R13: 0000000000000000 R14: ffffd6d34dff0040 R15: ffffd6d34dff0000
         FS:  0000000000000000(0000) GS:ffff97843d640000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 00002cba88ef5000 CR3: 000000078a97c000 CR4: 00000000003406e0
         Call Trace:
          delete_from_swap_cache+0x46/0xa0
          try_to_free_swap+0xbc/0x110
          swap_writepage+0x13/0x70
          pageout.isra.0+0x13c/0x350
          shrink_page_list+0xc14/0xdf0
          shrink_inactive_list+0x1e5/0x3c0
          shrink_node_memcg+0x202/0x760
          shrink_node+0xe0/0x470
          balance_pgdat+0x2d1/0x510
          kswapd+0x220/0x420
          kthread+0xfb/0x130
          ret_from_fork+0x22/0x40
      
      and it's not immediately obvious why it happens.  It's too late in the
      rc cycle to do anything but revert for now.
      
      Link: https://lore.kernel.org/lkml/CABXGCsN9mYmBD-4GaaeW_NrDu+FDXLzr_6x+XNxfmFV6QkYCDg@mail.gmail.com/
      
      
      Reported-and-bisected-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Kirill Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 14 May, 2019 1 commit
  13. 06 Mar, 2019 2 commits
    • mm: swap: add comment for swap_vma_readahead · e9f59873
      Yang Shi authored
      swap_vma_readahead() has no comment describing it; add one.
      
      Link: http://lkml.kernel.org/r/1546543673-108536-2-git-send-email-yang.shi@linux.alibaba.com
      
      
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: swap: check if swap backing device is congested or not · 8fd2e0b5
      Yang Shi authored
      Swap readahead reads in a few pages regardless of whether the
      underlying device is busy.  If the device is congested, this may incur
      a long wait and may also exacerbate the congestion.
      
      Use inode_read_congested() to check whether the underlying device is
      busy, as file page readahead does.  Get the inode from
      swap_info_struct.
      
      Although we could add inode information to swap_address_space
      (address_space->host), that may lead to unexpected side effects, e.g.
      it may break mapping_cap_account_dirty().  Using the inode from
      swap_info_struct seems simple and good enough.
      
      Do the check only in vma_cluster_readahead(), since
      swap_vma_readahead() is used only for non-rotational devices, which
      are much less likely to be congested than a traditional HDD.
      
      Although swap slots may be consecutive on a swap partition, they may
      still be fragmented in a swap file.  This check helps to reduce
      excessive stalls in such cases.
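      
      The decision described above can be sketched in user space as
      follows.  This is a minimal illustration, not the kernel code:
      struct swap_dev_sketch and pages_to_read() are hypothetical names,
      and the read_congested field merely stands in for whatever
      inode_read_congested() would report for the swap device's inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state of a swap device. */
struct swap_dev_sketch {
	bool read_congested;	/* what inode_read_congested() would report */
};

/*
 * Return how many pages to read for one swap-in fault.  The faulting
 * page is always read; the extra readahead pages are skipped when the
 * window is trivial or the backing device is already busy.
 */
unsigned int pages_to_read(const struct swap_dev_sketch *dev,
			   unsigned int readahead_window)
{
	if (readahead_window <= 1 || dev->read_congested)
		return 1;
	return readahead_window;
}

/* Usage example: an idle device gets the full window, a busy one does not. */
void pages_to_read_example(void)
{
	struct swap_dev_sketch idle = { .read_congested = false };
	struct swap_dev_sketch busy = { .read_congested = true };

	assert(pages_to_read(&idle, 8) == 8);
	assert(pages_to_read(&busy, 8) == 1);
}
```

      Falling back to a single page keeps the fault itself serviced
      while avoiding additional I/O on a device that is already behind.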
      
      Testing with page_fault1 of will-it-scale (the trace sometimes shows
      runtest.py, which is the wrapper script of page_fault1), which
      launches NR_CPU threads that each generate 128MB of anonymous pages,
      on my virtual machine with a congested HDD shows that long-tail
      latency is reduced significantly.
      
      Without the patch
       page_fault1_thr-1490  [023]   129.311706: funcgraph_entry:      #57377.796 us |  do_swap_page();
       page_fault1_thr-1490  [023]   129.369103: funcgraph_entry:        5.642us   |  do_swap_page();
       page_fault1_thr-1490  [023]   129.369119: funcgraph_entry:      #1289.592 us |  do_swap_page();
       page_fault1_thr-1490  [023]   129.370411: funcgraph_entry:        4.957us   |  do_swap_page();
       page_fault1_thr-1490  [023]   129.370419: funcgraph_entry:        1.940us   |  do_swap_page();
       page_fault1_thr-1490  [023]   129.378847: funcgraph_entry:      #1411.385 us |  do_swap_page();
       page_fault1_thr-1490  [023]   129.380262: funcgraph_entry:        3.916us   |  do_swap_page();
       page_fault1_thr-1490  [023]   129.380275: funcgraph_entry:      #4287.751 us |  do_swap_page();
      
      With the patch
            runtest.py-1417  [020]   301.925911: funcgraph_entry:      #9870.146 us |  do_swap_page();
            runtest.py-1417  [020]   301.935785: funcgraph_entry:        9.802us   |  do_swap_page();
            runtest.py-1417  [020]   301.935799: funcgraph_entry:        3.551us   |  do_swap_page();
            runtest.py-1417  [020]   301.935806: funcgraph_entry:        2.142us   |  do_swap_page();
            runtest.py-1417  [020]   301.935853: funcgraph_entry:        6.938us   |  do_swap_page();
            runtest.py-1417  [020]   301.935864: funcgraph_entry:        3.765us   |  do_swap_page();
            runtest.py-1417  [020]   301.935871: funcgraph_entry:        3.600us   |  do_swap_page();
            runtest.py-1417  [020]   301.935878: funcgraph_entry:        7.202us   |  do_swap_page();
      
      [akpm@linux-foundation.org: code cleanup]
      [yang.shi@linux.alibaba.com: add comment]
      Link: http://lkml.kernel.org/r/bbc7bda7-62d0-df1a-23ef-d369e865bdca@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1546543673-108536-1-git-send-email-yang.shi@linux.alibaba.com
      
      
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Tim Chen <tim.c.chen@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 26 Oct, 2018 1 commit
    • mm: workingset: tell cache transitions from workingset thrashing · 1899ad18
      Johannes Weiner authored
      Refaults happen during transitions between workingsets as well as
      during in-place thrashing.  Knowing the difference between the two has
      a range of applications, including measuring the impact of memory
      shortage on system performance and balancing pressure more
      intelligently between the filesystem cache and the swap-backed
      workingset.
      
      During workingset transitions, inactive cache refaults and pushes out
      established active cache.  When that active cache isn't stale, however,
      and also ends up refaulting, that's bona fide thrashing.
      
      Introduce a new page flag that tells on eviction whether the page has been
      active or not in its lifetime.  This bit is then stored in the shadow
      entry, to classify refaults as transitioning or thrashing.
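      
      The idea of carrying that bit in the shadow entry can be sketched
      as below.  The layout is an assumption made purely for
      illustration (bit 0 for the flag, the remaining bits for an
      eviction timestamp); the kernel's real shadow entry packs further
      fields that are not reproduced here, and all function names are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout: bit 0 records whether the page was ever active,
 * the upper 63 bits hold an eviction timestamp.
 */
uint64_t pack_shadow_sketch(uint64_t eviction, bool workingset)
{
	/* The timestamp must fit in 63 bits so the shift loses nothing. */
	assert((eviction >> 63) == 0);
	return (eviction << 1) | (uint64_t)workingset;
}

void unpack_shadow_sketch(uint64_t shadow, uint64_t *eviction,
			  bool *workingset)
{
	*workingset = (shadow & 1) != 0;
	*eviction = shadow >> 1;
}

/*
 * A refault of a page that had been active is classified as thrashing;
 * any other refault is treated as a workingset transition.
 */
bool refault_is_thrashing(uint64_t shadow)
{
	uint64_t eviction;
	bool workingset;

	unpack_shadow_sketch(shadow, &eviction, &workingset);
	return workingset;
}
```

      Because the flag travels with the shadow entry, the refault path
      can classify the event without the evicted page still existing.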
      
      How many page->flags does this leave us with on 32-bit?
      
      	20 bits are always page flags
      
      	21 if you have an MMU
      
      	23 with the zone bits for DMA, Normal, HighMem, Movable
      
      	29 with the sparsemem section bits
      
      	30 if PAE is enabled
      
      	31 with this patch.
      
      So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.  If
      that's not enough, the system can switch to discontigmem and re-gain the 6
      or 7 sparsemem section bits.
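      
      The arithmetic behind that budget can be written out as a small
      sketch.  The per-feature bit counts are the differences between
      the running totals listed above; the enumerator and function
      names are illustrative and are not kernel identifiers.

```c
#include <assert.h>

/* Bit counts derived from the running totals in the list above. */
enum page_flag_budget {
	BITS_ALWAYS     = 20,	/* always page flags */
	BITS_MMU        = 1,	/* 21 if you have an MMU */
	BITS_ZONE       = 2,	/* 23 with DMA, Normal, HighMem, Movable */
	BITS_SECTION    = 6,	/* 29 with the sparsemem section bits */
	BITS_PAE        = 1,	/* 30 if PAE is enabled */
	BITS_WORKINGSET = 1	/* 31 with this patch */
};

/* Total number of page->flags bits consumed in the 32-bit PAE case. */
unsigned int flag_bits_used(void)
{
	return BITS_ALWAYS + BITS_MMU + BITS_ZONE + BITS_SECTION +
	       BITS_PAE + BITS_WORKINGSET;
}

/* Bits left over in a flags word of the given width. */
unsigned int flag_bits_left(unsigned int word_bits)
{
	assert(word_bits >= flag_bits_used());
	return word_bits - flag_bits_used();
}
```

      With a 32-bit flags word this leaves the single bit mentioned
      above, which is enough to tell two NUMA nodes apart.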
      
      Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
      
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Drake <drake@endlessm.com>
      Tested-by: Suren Baghdasaryan <surenb@google.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Johannes Weiner <jweiner@fb.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Enderborg <peter.enderborg@sony.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 21 Oct, 2018 3 commits
  16. 12 Jun, 2018 1 commit
    • treewide: kvzalloc() -> kvcalloc() · 778e1cdd
      Kees Cook authored
      
      
      The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
      patch replaces cases of:
      
              kvzalloc(a * b, gfp)
      
      with:
      
              kvcalloc(a, b, gfp)
      
      as well as handling cases of:
      
              kvzalloc(a * b * c, gfp)
      
      with:
      
              kvzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kvcalloc(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kvzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
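      
      To illustrate why the multi-argument forms matter, here is a
      minimal user-space sketch, assuming a GCC- or Clang-compatible
      compiler for __builtin_mul_overflow().  checked_calloc() and
      array3_size_sketch() are hypothetical stand-ins, not the kernel
      helpers; they only mirror the idea that the size multiplication
      is checked for overflow instead of being left to wrap silently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the two-factor form: returns NULL rather
 * than a too-small buffer when n * size overflows.
 */
void *checked_calloc(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return calloc(1, bytes);
}

/*
 * Hypothetical stand-in for the three-factor size helper: saturates to
 * SIZE_MAX on overflow so that the allocation is certain to fail.
 */
size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Usage example: ordinary sizes work, overflowing ones are rejected. */
void checked_alloc_example(void)
{
	void *p = checked_calloc(4, sizeof(long));

	assert(p != NULL);
	free(p);
	assert(checked_calloc(SIZE_MAX, 2) == NULL);
	assert(array3_size_sketch(SIZE_MAX, 2, 1) == SIZE_MAX);
}
```

      An open-coded product such as a * b gives the allocator no way to
      notice that the multiplication wrapped, which is the case the
      conversion is meant to close off.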
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kvzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kvzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kvzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kvzalloc
      + kvcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kvzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kvzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kvzalloc(C1 * C2 * C3, ...)
      |
        kvzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kvzalloc(sizeof(THING) * C2, ...)
      |
        kvzalloc(sizeof(TYPE) * C2, ...)
      |
        kvzalloc(C1 * C2 * C3, ...)
      |
        kvzalloc(C1 * C2, ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
  17. 08 Jun, 2018 1 commit
  18. 11 Apr, 2018 1 commit
  19. 06 Apr, 2018 3 commits
  20. 16 Nov, 2017 3 commits
  21. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
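      
      As a rough illustration of how a compliance tool might look for
      such an identifier, the sketch below checks whether one line of a
      file carries a given SPDX tag.  line_has_spdx_id() is a
      hypothetical helper, not part of any kernel tooling, and the
      matching is deliberately simple.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true when the line carries an SPDX tag naming exactly the given
 * license, e.g. "GPL-2.0".  A longer identifier such as "GPL-2.0+" does
 * not match a query for "GPL-2.0".
 */
bool line_has_spdx_id(const char *line, const char *license)
{
	static const char tag[] = "SPDX-License-Identifier: ";
	const char *p;
	size_t n;

	assert(line != NULL && license != NULL);

	p = strstr(line, tag);
	if (!p)
		return false;

	p += sizeof(tag) - 1;
	n = strlen(license);
	if (strncmp(p, license, n) != 0)
		return false;

	/* The identifier must end here: end of line or whitespace. */
	return p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
}
```

      The same check works for both comment styles, for example
      "// SPDX-License-Identifier: GPL-2.0" and
      "/* SPDX-License-Identifier: GPL-2.0 */".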
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard...
  22. 13 Oct, 2017 1 commit
  23. 04 Oct, 2017 1 commit
  24. 07 Sep, 2017 1 commit