1. 16 Nov, 2019 13 commits
    • Merge branch 'akpm' (patches from Andrew) · bec8b6e9
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "11 fixes"
      
      MM fixes and one xz decompressor fix.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/debug.c: PageAnon() is true for PageKsm() pages
        mm/debug.c: __dump_page() prints an extra line
        mm/page_io.c: do not free shared swap slots
        mm/memory_hotplug: fix try_offline_node()
        mm,thp: recheck each page before collapsing file THP
        mm: slub: really fix slab walking for init_on_free
        mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
        mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
        lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
        mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
        mm: mempolicy: fix the wrong return value and potential pages leak of mbind
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6c9594bd
      Linus Torvalds authored
      Pull more input fixes from Dmitry Torokhov:
       "A couple of fixes in driver teardown paths and another ID for
        Synaptics RMI mode"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics - enable RMI mode for X1 Extreme 2nd Generation
        Input: synaptics-rmi4 - destroy F54 poller workqueue when removing
        Input: ff-memless - kill timer in destroy()
    • mm/debug.c: PageAnon() is true for PageKsm() pages · 6855ac4a
      Ralph Campbell authored
      PageAnon() and PageKsm() use the low two bits of the page->mapping
      pointer to indicate the page type.  PageAnon() only checks the LSB, while
      PageKsm() checks that the two least significant bits are equal to 3.

      Therefore, PageAnon() is also true for KSM pages, and __dump_page() never
      prints "ksm" because it checks PageAnon() first.  Fix this by checking
      PageKsm() first.
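
      The ordering problem can be reproduced outside the kernel.  The
      following is a minimal userspace sketch, not the __dump_page() code:
      the bit values follow the kernel's PAGE_MAPPING_* flags, while
      enum page_kind and the classify_*() helpers are hypothetical names
      introduced only for illustration.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Bit values as in the kernel's PAGE_MAPPING_* flags. */
      #define PAGE_MAPPING_ANON    0x1UL
      #define PAGE_MAPPING_MOVABLE 0x2UL
      #define PAGE_MAPPING_KSM     (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
      #define PAGE_MAPPING_FLAGS   (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

      /* Hypothetical result type for this sketch. */
      enum page_kind { KIND_OTHER, KIND_ANON, KIND_KSM };

      /* Tests only the LSB, so it is also true for KSM mappings. */
      static bool mapping_is_anon(uintptr_t mapping)
      {
          return (mapping & PAGE_MAPPING_ANON) != 0;
      }

      /* Tests that both low bits are set. */
      static bool mapping_is_ksm(uintptr_t mapping)
      {
          return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_KSM;
      }

      /* Old order: the anon test shadows the KSM test. */
      enum page_kind classify_anon_first(uintptr_t mapping)
      {
          if (mapping_is_anon(mapping))
              return KIND_ANON;
          if (mapping_is_ksm(mapping))
              return KIND_KSM;    /* unreachable in practice */
          return KIND_OTHER;
      }

      /* New order: test the more specific bit pattern first. */
      enum page_kind classify_ksm_first(uintptr_t mapping)
      {
          /* Every KSM mapping also passes the anon test. */
          assert(!mapping_is_ksm(mapping) || mapping_is_anon(mapping));

          if (mapping_is_ksm(mapping))
              return KIND_KSM;
          if (mapping_is_anon(mapping))
              return KIND_ANON;
          return KIND_OTHER;
      }
      ```

      With a mapping value whose low two bits are both set, the first
      helper reports an anonymous page and the second reports KSM.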
      
      Link: http://lkml.kernel.org/r/20191113000651.20677-1-rcampbell@nvidia.com
      Fixes: 1c6fb1d8 ("mm: print more information about mapping in __dump_page")
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/debug.c: __dump_page() prints an extra line · 76a1850e
      Ralph Campbell authored
      When dumping struct page information, __dump_page() prints the page type
      with a trailing blank followed by the page flags on a separate line:
      
        anon
        flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)
      
      It looks like the intent was to use pr_cont() for printing "flags:" but
      pr_cont() usage is discouraged so fix this by extending the format to
      include the flags into a single line:
      
        anon flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)
      
      If the page is file backed, the name might be long so use two lines:
      
        shmem_aops name:"dev/zero"
        flags: 0x10000000008000c(uptodate|dirty|swapbacked)
      
      Eliminate pr_cont() usage as well for appending compound_mapcount.
      
      Link: http://lkml.kernel.org/r/20191112012608.16926-1-rcampbell@nvidia.com
      
      Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_io.c: do not free shared swap slots · 5df373e9
      Vinayak Menon authored
      The following race is observed, in which a process faulting on a swap
      entry finds the page neither in swapcache nor in swap.  This causes
      zram to give a zero-filled page that gets mapped to the process,
      resulting in a user space crash later.
      
      Consider parent and child processes Pa and Pb sharing the same swap slot
      with swap_count 2.  Swap is on zram with SWP_SYNCHRONOUS_IO set.
      Virtual address 'VA' of Pa and Pb points to the shared swap entry.
      
      Pa                                       Pb
      
      fault on VA                              fault on VA
      do_swap_page                             do_swap_page
      lookup_swap_cache fails                  lookup_swap_cache fails
                                               Pb scheduled out
      swapin_readahead (deletes zram entry)
      swap_free (makes swap_count 1)
                                               Pb scheduled in
                                               swap_readpage (swap_count == 1)
                                               Takes SWP_SYNCHRONOUS_IO path
                                               zram entry absent
                                               zram gives a zero filled page
      
      Fix this by making sure that swap slot is freed only when swap count
      drops down to one.
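
      The effect of that condition can be modelled with a toy slot that
      tracks a reference count and whether the backing device still holds
      the data.  This is only a sketch under stated assumptions: struct
      toy_slot and the toy_*() helpers are hypothetical and do not
      correspond to the kernel's swap or zram interfaces.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical model of one shared swap slot. */
      struct toy_slot {
          int swap_count;        /* users still referencing the slot */
          bool backing_present;  /* backing device still holds the data */
      };

      /* Old behaviour: drop the backing copy after any successful read. */
      void toy_read_done_buggy(struct toy_slot *s)
      {
          s->backing_present = false;
      }

      /* New behaviour: drop it only when the reader is the last user. */
      void toy_read_done_fixed(struct toy_slot *s)
      {
          if (s->swap_count == 1)
              s->backing_present = false;
      }

      /* One user gives up its reference to the slot. */
      void toy_swap_free(struct toy_slot *s)
      {
          assert(s->swap_count > 0);
          s->swap_count--;
      }
      ```

      Starting from swap_count 2, the buggy helper leaves the second
      faulting process with a slot that is still referenced but has no
      data behind it, while the fixed helper keeps the data until the
      last reference is the one being read.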
      
      Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org
      Fixes: aa8d22a1 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference")
      Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
      Suggested-by: Minchan Kim <minchan@google.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: fix try_offline_node() · 2c91f8fc
      David Hildenbrand authored
      try_offline_node() is pretty much broken right now:
      
       - The node span is updated when onlining memory, not when adding it. We
         ignore memory that was never onlined. Bad.
      
       - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
         trigger a kernel panic. Bad for memory that is offline but also bad
         for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
         first PFN of a section might contain garbage.
      
       - Sections belonging to mixed nodes are not properly considered.
      
      As memory blocks might belong to multiple nodes, we would have to walk
      all pageblocks (or at least subsections) within present sections.
      However, we don't have a way to identify whether a memmap that is not
      online was initialized (relevant for ZONE_DEVICE).  This makes things
      more complicated.
      
      Luckily, we can piggyback on the node span and the nid stored in memory
      blocks.  Currently, the node span is grown when calling
      move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
      removing memory, before calling try_offline_node().  Sysfs links are
      created via link_mem_sections(), e.g., during boot or when adding
      memory.
      
      If the node still spans memory or if any memory block belongs to the
      nid, we don't set the node offline.  As memory blocks that span multiple
      nodes cannot get offlined, the nid stored in memory blocks is reliable
      enough (for such online memory blocks, the node still spans the memory).
      
      Introduce for_each_memory_block() to efficiently walk all memory blocks.
      
      Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
      when removing ZONE_DEVICE memory to fix similar issues (access of
      garbage memmaps) - until we have a reliable way to identify whether
      these memmaps were properly initialized.  This implies later, that once
      a node had ZONE_DEVICE memory, we won't be able to set a node offline -
      which should be acceptable.
      
      Since commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") memory that is added is not
      associated with a zone/node (memmap not initialized).  The introducing
      commit 60a5a19e ("memory-hotplug: remove sysfs file of node")
      already missed that we could have multiple nodes for a section and that
      the zone/node span is updated when onlining pages, not when adding them.
      
      I tested this by hotplugging two DIMMs to a memory-less and cpu-less
      NUMA node.  The node is properly onlined when adding the DIMMs.  When
      removing the DIMMs, the node is properly offlined.
      
      Masayoshi Mizuma reported:
      
      : Without this patch, memory hotplug fails as panic:
      :
      :  BUG: kernel NULL pointer dereference, address: 0000000000000000
      :  ...
      :  Call Trace:
      :   remove_memory_block_devices+0x81/0xc0
      :   try_remove_memory+0xb4/0x130
      :   __remove_memory+0xa/0x20
      :   acpi_memory_device_remove+0x84/0x100
      :   acpi_bus_trim+0x57/0x90
      :   acpi_bus_trim+0x2e/0x90
      :   acpi_device_hotplug+0x2b2/0x4d0
      :   acpi_hotplug_work_fn+0x1a/0x30
      :   process_one_work+0x171/0x380
      :   worker_thread+0x49/0x3f0
      :   kthread+0xf8/0x130
      :   ret_from_fork+0x35/0x40
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
      Fixes: 60a5a19e ("memory-hotplug: remove sysfs file of node")
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e8
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,thp: recheck each page before collapsing file THP · 4655e5e5
      Song Liu authored
      In collapse_file(), for !is_shmem case, current check cannot guarantee
      the locked page is up-to-date.  Specifically, xas_unlock_irq() should
      not be called before lock_page() and get_page(); and it is necessary to
      recheck PageUptodate() after locking the page.
      
      With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
      may contain corrupted data.  This is because khugepaged mistakenly
      collapses some not up-to-date sub pages into a huge page, and assumes
      the huge page is up-to-date.  This will NOT corrupt data in the disk,
      because the page is read-only and never written back.  Fix this by
      properly checking PageUptodate() after locking the page.  This check
      replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".
      
      Also, move the PageDirty() check after locking the page.  Current khugepaged
      should not try to collapse dirty file THP, because it is limited to
      read-only .text.  The only case where we hit a dirty page here is when the
      page hasn't been written back since it was written.  Bail out and retry
      when this happens.

      syzbot reported a bug in a previous version of this patch.
      
      Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com
      Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: slub: really fix slab walking for init_on_free · aea4df4c
      Laura Abbott authored
      Commit 1b7e816f ("mm: slub: Fix slab walking for init_on_free")
      fixed one problem with the slab walking but missed a key detail: When
      walking the list, the head and tail pointers need to be updated since we
      end up reversing the list as a result.  Without doing this, bulk free is
      broken.
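
      Why the head and tail must be updated is easiest to see on a plain
      singly linked list.  The sketch below is an illustration only, not
      the SLUB freelist code: it assumes a NULL-terminated list, and
      struct toy_obj and toy_rebuild_freelist() are hypothetical names.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical free object: only the link matters here. */
      struct toy_obj {
          struct toy_obj *next;
      };

      /*
       * Walk the list starting at *head and push every object onto a new
       * list.  Pushing reverses the order, so the caller's head and tail
       * pointers must be rewritten: the old head becomes the new tail and
       * the old tail becomes the new head.
       */
      void toy_rebuild_freelist(struct toy_obj **head, struct toy_obj **tail)
      {
          struct toy_obj *cur, *next;
          struct toy_obj *new_head = NULL;
          struct toy_obj *new_tail = NULL;

          assert(head != NULL && tail != NULL);

          for (cur = *head; cur != NULL; cur = next) {
              next = cur->next;
              cur->next = new_head;    /* push onto the new list */
              new_head = cur;
              if (new_tail == NULL)
                  new_tail = cur;      /* first object pushed ends up last */
          }

          *head = new_head;
          *tail = new_tail;
      }
      ```

      If the caller kept using its original head after such a walk, it
      would see a one-element list, since the old head now has no
      successor.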
      
      One way this is exposed is a NULL pointer dereference with slub_debug=F:
      
        =============================================================================
        BUG skbuff_head_cache (Tainted: G                T): Object already free
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x000000000d2d2f8f objects=16 used=3 fp=0x0000000064309071 flags=0x3fff00000000201
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        Oops: 0000 [#1] PREEMPT SMP PTI
        RIP: 0010:print_trailer+0x70/0x1d5
        Call Trace:
         <IRQ>
         free_debug_processing.cold.37+0xc9/0x149
         __slab_free+0x22a/0x3d0
         kmem_cache_free_bulk+0x415/0x420
         __kfree_skb_flush+0x30/0x40
         net_rx_action+0x2dd/0x480
         __do_softirq+0xf0/0x246
         irq_exit+0x93/0xb0
         do_IRQ+0xa0/0x110
         common_interrupt+0xf/0xf
         </IRQ>
      
      Given we're now almost identical to the existing debugging code which
      correctly walks the list, combine with that.
      
      Link: https://lkml.kernel.org/r/20191104170303.GA50361@gandi.net
      Link: http://lkml.kernel.org/r/20191106222208.26815-1-labbott@redhat.com
      Fixes: 1b7e816f ("mm: slub: Fix slab walking for init_on_free")
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Reported-by: Thibaut Sautereau <thibaut.sautereau@clip-os.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Tested-by: Alexander Potapenko <glider@google.com>
      Acked-by: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <clipos@ssi.gouv.fr>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() · 0362f326
      Roman Gushchin authored
      An exiting task might belong to an offline cgroup.  In this case an
      attempt to grab a cgroup reference from the task can end up in an
      infinite loop in hugetlb_cgroup_charge_cgroup(), because the cgroup
      will not become online again and the task will not be migrated to a
      live cgroup.
      
      Fix this by switching over to css_tryget().  As css_tryget_online()
      can't guarantee that the cgroup won't go offline, in most cases the
      check doesn't make sense.  In this particular case users of
      hugetlb_cgroup_charge_cgroup() are not affected by this change.
      
      A similar problem is described by commit 18fa84a2 ("cgroup: Use
      css_tryget() instead of css_tryget_online() in task_get_css()").
      
      Link: http://lkml.kernel.org/r/20191106225131.3543616-2-guro@fb.com
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() · 00d484f3
      Roman Gushchin authored
      We've encountered an RCU stall in get_mem_cgroup_from_mm():
      
        rcu: INFO: rcu_sched self-detected stall on CPU
        rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
        (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
        <...>
        RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
        <...>
         __memcg_kmem_charge+0x55/0x140
         __alloc_pages_nodemask+0x267/0x320
         pipe_write+0x1ad/0x400
         new_sync_write+0x127/0x1c0
         __kernel_write+0x4f/0xf0
         dump_emit+0x91/0xc0
         writenote+0xa0/0xc0
         elf_core_dump+0x11af/0x1430
         do_coredump+0xc65/0xee0
         get_signal+0x132/0x7c0
         do_signal+0x36/0x640
         exit_to_usermode_loop+0x61/0xd0
         do_syscall_64+0xd4/0x100
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The problem is caused by an exiting task which is associated with an
      offline memcg.  We're iterating over and over in the do {} while
      (!css_tryget_online()) loop, but obviously the memcg won't become online
      and the exiting task won't be migrated to a live memcg.
      
      Let's fix it by switching from css_tryget_online() to css_tryget().
      
      As css_tryget_online() cannot guarantee that the memcg won't go offline,
      the check is usually useless, except in some rare cases, for example when
      it determines whether something should be presented to a user.
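
      The difference between the two calls in a retry loop can be modelled
      in a few lines.  This is a sketch under stated assumptions: struct
      toy_css and the toy_*() helpers are hypothetical stand-ins, and the
      loop is bounded only so that the example terminates, whereas the
      kernel loop described above has no such bound.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for a cgroup subsystem state. */
      struct toy_css {
          int refcnt;    /* still positive while the css is pinned */
          bool online;   /* false once the cgroup has gone offline */
      };

      /* Succeeds as long as the object has not been released. */
      bool toy_tryget(struct toy_css *css)
      {
          if (css->refcnt <= 0)
              return false;
          css->refcnt++;
          return true;
      }

      /* Additionally requires the cgroup to be online. */
      bool toy_tryget_online(struct toy_css *css)
      {
          return css->online && toy_tryget(css);
      }

      /*
       * Retry until a reference is obtained.  Returns the number of
       * attempts used, or -1 if max_attempts was exhausted.
       */
      int toy_get_ref(bool (*tryget)(struct toy_css *), struct toy_css *css,
                      int max_attempts)
      {
          int i;

          assert(max_attempts > 0);
          for (i = 1; i <= max_attempts; i++) {
              if (tryget(css))
                  return i;
          }
          return -1;
      }
      ```

      For an offline but still pinned css, toy_get_ref() with
      toy_tryget_online() never succeeds no matter how large the bound is,
      while toy_tryget() succeeds on the first attempt.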
      
      A similar problem is described by commit 18fa84a2 ("cgroup: Use
      css_tryget() instead of css_tryget_online() in task_get_css()").
      
      Johannes:
      
      : The bug aside, it doesn't matter whether the cgroup is online for the
      : callers.  It used to matter when offlining needed to evacuate all charges
      : from the memcg, and so needed to prevent new ones from showing up, but we
      : don't care now.
      
      Link: http://lkml.kernel.org/r/20191106225131.3543616-1-guro@fb.com
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeeb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Koutn <mkoutny@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations · 8e20ba2e
      Lasse Collin authored
      s->dict.allocated was initialized to 0 but never set after a successful
      allocation, thus the code always thought that the dictionary buffer has
      to be reallocated.
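
      The pattern can be sketched in userspace as follows.  This is an
      assumption-laden illustration rather than the lib/xz code: struct
      toy_dict, toy_dict_prepare() and the record_allocation switch are
      hypothetical, and malloc()/free() stand in for whatever allocator
      the decompressor uses.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>

      /* Hypothetical dictionary state, loosely modelled on s->dict. */
      struct toy_dict {
          unsigned char *buf;
          size_t size;        /* size required by the current stream */
          size_t allocated;   /* size of the buffer we believe we own */
      };

      /*
       * Make sure the buffer can hold `needed` bytes.
       * Returns 1 if it was (re)allocated, 0 if the existing buffer was
       * reused, and -1 on allocation failure.
       * record_allocation = 0 models the bug, 1 models the fix.
       */
      int toy_dict_prepare(struct toy_dict *d, size_t needed,
                           int record_allocation)
      {
          assert(d != NULL && needed > 0);

          d->size = needed;
          if (d->allocated >= needed)
              return 0;

          free(d->buf);
          d->buf = malloc(needed);
          if (d->buf == NULL) {
              d->allocated = 0;
              return -1;
          }

          if (record_allocation)
              d->allocated = needed;   /* the bookkeeping that was missing */
          return 1;
      }
      ```

      Without the bookkeeping, a second call with the same size
      allocates again; with it, the existing buffer is reused until a
      larger dictionary is requested.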
      
      Link: http://lkml.kernel.org/r/20191104185107.3b6330df@tukaani.org
      
      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Reported-by: Yu Sun <yusun2@cisco.com>
      Acked-by: Daniel Walker <danielwa@cisco.com>
      Cc: "Yixia Si (yisi)" <yisi@cisco.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix trying to reclaim unevictable lru page when calling madvise_pageout · 82072962
      zhong jiang authored
      Recently, I hit the following issue when running upstream.
      
        kernel BUG at mm/vmscan.c:1521!
        invalid opcode: 0000 [#1] SMP KASAN PTI
        CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
        RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
        Call Trace:
         reclaim_pages+0x499/0x800 mm/vmscan.c:2188
         madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
         walk_pmd_range mm/pagewalk.c:53 [inline]
         walk_pud_range mm/pagewalk.c:112 [inline]
         walk_p4d_range mm/pagewalk.c:139 [inline]
         walk_pgd_range mm/pagewalk.c:166 [inline]
         __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
         walk_page_range+0x179/0x310 mm/pagewalk.c:349
         madvise_pageout_page_range mm/madvise.c:506 [inline]
         madvise_pageout+0x1f0/0x330 mm/madvise.c:542
         madvise_vma mm/madvise.c:931 [inline]
         __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
         do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      madvise_pageout() walks the specified range of the vma, isolates the
      pages it finds, and then runs shrink_page_list() to reclaim them.  But it
      also isolates unevictable pages, which triggers the BUG in
      shrink_page_list() shown above.

      The root cause is that we scan the page tables instead of a specific LRU
      list, so we need to filter out the unevictable LRU pages on our end.
      
      Link: http://lkml.kernel.org/r/1572616245-18946-1-git-send-email-zhongjiang@huawei.com
      Fixes: 1a4e58cc ("mm: introduce MADV_PAGEOUT")
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: mempolicy: fix the wrong return value and potential pages leak of mbind · a85dfc30
      Yang Shi authored
      Commit d8835445 ("mm: mempolicy: make the behavior consistent when
      MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") fixed the return value
      of mbind() for a couple of corner cases.  But it altered the errno for
      some other cases.  For example, mbind() should return -EFAULT when part
      or all of the memory range specified by nodemask and maxnode points
      outside your accessible address space, or when there is an unmapped hole
      in the memory range specified by addr and len.

      Fix this by preserving the errno returned by queue_pages_range().  Also,
      the pagelist may be non-empty even though queue_pages_range() returns an
      error.  Put those pages back on the LRU: mbind_range() is not called to
      actually apply the policy, so they should not be migrated.  This was also
      the behavior before the problematic commit.
      
      Link: http://lkml.kernel.org/r/1572454731-3925-1-git-send-email-yang.shi@linux.alibaba.com
      Fixes: d8835445 ("mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified")
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
      Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[4.19 and 5.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 15 Nov, 2019 17 commits
  3. 14 Nov, 2019 10 commits
    • drm/amdgpu: fix null pointer deref in firmware header printing · a84fddb1
      Xiaojie Yuan authored
      
      
      v2: declare as (struct common_firmware_header *) type because
          struct xxx_firmware_header inherits from it
      
      When CE's ucode_id(8) is used to get sdma_hdr, we will be accessing an
      unallocated amdgpu_firmware_info instance.
      
      This issue appears on rhel7.7 with gcc 4.8.5. Newer compilers might have
      optimized out such a 'defined but not referenced' variable.
      
      [ 1120.798564] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
      [ 1120.806703] IP: [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
      [ 1120.813693] PGD 80000002603ff067 PUD 271b8d067 PMD 0
      [ 1120.818931] Oops: 0000 [#1] SMP
      [ 1120.822245] Modules linked in: amdgpu(OE+) amdkcl(OE) amd_iommu_v2 amdttm(OE) amd_sched(OE) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_pmc_core intel_powerclamp coretemp intel_rapl joydev kvm_intel eeepc_wmi asus_wmi kvm sparse_keymap iTCO_wdt irqbypass rfkill crc32_pclmul snd_hda_codec_realtek mxm_wmi ghash_clmulni_intel intel_wmi_thunderbolt iTCO_vendor_support snd_hda_codec_generic snd_hda_codec_hdmi aesni_intel lrw gf128mul glue_helper ablk_helper sg cryptd pcspkr snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd pinctrl_sunrisepoint pinctrl_intel soundcore acpi_pad mei_me wmi mei i2c_i801 pcc_cpufreq ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i915 i2c_algo_bit iosf_mbi drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm ptp libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw pps_core drm_panel_orientation_quirks video i2c_hid
      [ 1120.954136] CPU: 4 PID: 2426 Comm: modprobe Tainted: G           OE  ------------   3.10.0-1062.el7.x86_64 #1
      [ 1120.964390] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015
      [ 1120.973321] task: ffff991ef1e3c1c0 ti: ffff991ee625c000 task.ti: ffff991ee625c000
      [ 1120.981020] RIP: 0010:[<ffffffffc0e3c9b3>]  [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
      [ 1120.990483] RSP: 0018:ffff991ee625f950  EFLAGS: 00010202
      [ 1120.995935] RAX: 0000000000000002 RBX: ffff991edf6b2d38 RCX: ffff991edf6a0000
      [ 1121.003391] RDX: 0000000000000000 RSI: ffff991f01d13898 RDI: ffffffffc110afb3
      [ 1121.010706] RBP: ffff991ee625f9b0 R08: 0000000000000000 R09: 0000000000000000
      [ 1121.018029] R10: 00000000000004c4 R11: ffff991ee625f64e R12: ffff991edf6b3220
      [ 1121.025353] R13: ffff991edf6a0000 R14: 0000000000000008 R15: ffff991edf6b2d30
      [ 1121.032666] FS:  00007f97b0c0b740(0000) GS:ffff991f01d00000(0000) knlGS:0000000000000000
      [ 1121.041000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1121.046880] CR2: 000000000000000a CR3: 000000025e604000 CR4: 00000000003607e0
      [ 1121.054239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1121.061631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1121.068938] Call Trace:
      [ 1121.071494]  [<ffffffffc0e3dba8>] psp_hw_init+0x218/0x270 [amdgpu]
      [ 1121.077886]  [<ffffffffc0da3188>] amdgpu_device_fw_loading+0xe8/0x160 [amdgpu]
      [ 1121.085296]  [<ffffffffc0e3b34c>] ? vega10_ih_irq_init+0x4bc/0x730 [amdgpu]
      [ 1121.092534]  [<ffffffffc0da5c75>] amdgpu_device_init+0x1495/0x1c90 [amdgpu]
      [ 1121.099675]  [<ffffffffc0da9cab>] amdgpu_driver_load_kms+0x8b/0x2f0 [amdgpu]
      [ 1121.106888]  [<ffffffffc01b25cf>] drm_dev_register+0x12f/0x1d0 [drm]
      [ 1121.113419]  [<ffffffffa4dcdfd8>] ? pci_enable_device_flags+0xe8/0x140
      [ 1121.120183]  [<ffffffffc0da260a>] amdgpu_pci_probe+0xca/0x170 [amdgpu]
      [ 1121.126919]  [<ffffffffa4dcf97a>] local_pci_probe+0x4a/0xb0
      [ 1121.132622]  [<ffffffffa4dd10c9>] pci_device_probe+0x109/0x160
      [ 1121.138607]  [<ffffffffa4eb4205>] driver_probe_device+0xc5/0x3e0
      [ 1121.144766]  [<ffffffffa4eb4603>] __driver_attach+0x93/0xa0
      [ 1121.150507]  [<ffffffffa4eb4570>] ? __device_attach+0x50/0x50
      [ 1121.156422]  [<ffffffffa4eb1da5>] bus_for_each_dev+0x75/0xc0
      [ 1121.162213]  [<ffffffffa4eb3b7e>] driver_attach+0x1e/0x20
      [ 1121.167771]  [<ffffffffa4eb3620>] bus_add_driver+0x200/0x2d0
      [ 1121.173590]  [<ffffffffa4eb4c94>] driver_register+0x64/0xf0
      [ 1121.179345]  [<ffffffffa4dd0905>] __pci_register_driver+0xa5/0xc0
      [ 1121.185593]  [<ffffffffc099f000>] ? 0xffffffffc099efff
      [ 1121.190914]  [<ffffffffc099f0a4>] amdgpu_init+0xa4/0xb0 [amdgpu]
      [ 1121.197101]  [<ffffffffa4a0210a>] do_one_initcall+0xba/0x240
      [ 1121.202901]  [<ffffffffa4b1c90a>] load_module+0x271a/0x2bb0
      [ 1121.208598]  [<ffffffffa4dad740>] ? ddebug_proc_write+0x100/0x100
      [ 1121.214894]  [<ffffffffa4b1ce8f>] SyS_init_module+0xef/0x140
      [ 1121.220698]  [<ffffffffa518bede>] system_call_fastpath+0x25/0x2a
      [ 1121.226870] Code: b4 01 60 a2 00 00 31 c0 e8 83 60 33 e4 41 8b 47 08 48 8b 4d d0 48 c7 c7 b3 af 10 c1 48 69 c0 68 07 00 00 48 8b 84 01 60 a2 00 00 <48> 8b 70 08 31 c0 48 89 75 c8 e8 56 60 33 e4 48 8b 4d d0 48 c7
      [ 1121.247422] RIP  [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
      [ 1121.254432]  RSP <ffff991ee625f950>
      [ 1121.258017] CR2: 000000000000000a
      [ 1121.261427] ---[ end trace e98b35387ede75bd ]---
      Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
      Fixes: c5fb9126 ("drm/amdgpu: add firmware header printing for psp fw loading (v2)")
      Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • rsxx: add missed destroy_workqueue calls in remove · dcb77e4b
      Chuhong Yuan authored
      
      
      The driver does not call destroy_workqueue() in remove, as is done when
      probe fails.  Add the missing calls.

      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dcb77e4b
    • Jiufei Xue's avatar
      iocost: check active_list of all the ancestors in iocg_activate() · 8b37bc27
      Jiufei Xue authored
      iocg_activate() has a bug: it checks the same active_list over and
      over again. The intention of the code was to check whether the iocg
      itself and all of its ancestors have already been activated. Fix it.
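
The pattern described above can be sketched in isolation. This is only an illustration, not the kernel code: `struct node`, its `parent` pointer and its `active` flag are hypothetical stand-ins for an iocg and its active_list, and both function names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an iocg; not the kernel's data structure. */
struct node {
	struct node *parent;	/* NULL at the root */
	bool active;		/* models "active_list is not empty" */
};

/* Buggy pattern: the loop walks upwards but always tests the start node. */
bool all_active_buggy(const struct node *self)
{
	for (const struct node *n = self; n; n = n->parent)
		if (!self->active)
			return false;
	return true;
}

/* Intended pattern: test the node the loop is currently visiting. */
bool all_active(const struct node *self)
{
	for (const struct node *n = self; n; n = n->parent)
		if (!n->active)
			return false;
	return true;
}
```

With an active leaf under an inactive root, the buggy variant reports that everything is active, while the intended variant does not.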
      
      Fixes: 7caa4715 ("blkcg: implement blk-iocost")
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8b37bc27
    • Ilya Dryomov's avatar
      rbd: silence bogus uninitialized warning in rbd_object_map_update_finish() · 633739b2
      Ilya Dryomov authored
      
      
      Some versions of gcc (so far 6.3 and 7.4) throw a warning:
      
        drivers/block/rbd.c: In function 'rbd_object_map_callback':
        drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
              (current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
        drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
          u8 state, new_state, current_state;
                                ^~~~~~~~~~~~~
      
      It's bogus because all current_state accesses are guarded by
      has_current_state.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      633739b2
    • Jeff Layton's avatar
      ceph: increment/decrement dio counter on async requests · 6a81749e
      Jeff Layton authored
      Ceph can in some cases issue an async DIO request, in which case we can
      end up calling ceph_end_io_direct before the I/O is actually complete.
      That may allow buffered operations to proceed while DIO requests are
      still in flight.
      
      Fix this by incrementing the i_dio_count when issuing an async DIO
      request, and decrement it when tearing down the aio_req.
      
      Fixes: 321fe13c ("ceph: add buffered/direct exclusionary locking for reads and writes")
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      6a81749e
    • Jeff Layton's avatar
      ceph: take the inode lock before acquiring cap refs · a81bc310
      Jeff Layton authored
      Most of the time, we (or the vfs layer) take the inode_lock and then
      acquires caps, but ceph_read_iter does the opposite, and that can lead
      to a deadlock.
      
      When there are multiple clients treading over the same data, we can end
      up in a situation where a reader takes caps and then tries to acquire
      the inode_lock. Another task holds the inode_lock and issues a request
      to the MDS which needs to revoke the caps, but that can't happen until
      the inode_lock is unwedged.
      
      Fix this by having ceph_read_iter take the inode_lock earlier, before
      attempting to acquire caps.
      
      Fixes: 321fe13c ("ceph: add buffered/direct exclusionary locking for reads and writes")
      Link: https://tracker.ceph.com/issues/36348
      
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      a81bc310
    • Takashi Iwai's avatar
      ALSA: usb-audio: Fix incorrect size check for processing/extension units · 976a68f0
      Takashi Iwai authored
      The recently introduced unit descriptor validation had a bug for
      processing and extension units: it counted the bControlSize byte
      twice, so it expected a bigger size than the descriptor actually has.
      This seems to result in a probe error on a few devices.
      
      Fix the calculation for proper checks of PU and EU.
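
The double counting can be illustrated with a simplified length check. The layout below is hypothetical and is not the real UAC processing/extension unit descriptor: byte 0 holds the total length, `fixed` bytes of fixed fields come first, then a one-byte control size, then that many control bytes, then one trailing byte. The function names are made up for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical layout (NOT the real UAC descriptor):
 *   d[0]              total descriptor length
 *   d[1 .. fixed-1]   fixed fields
 *   d[fixed]          control size (one byte)
 *   d[fixed+1 ..]     that many control bytes
 *   last byte         one trailing string-index byte
 */

/* Buggy: the control-size byte is added before the read and again after. */
bool length_ok_buggy(const unsigned char *d, size_t fixed)
{
	size_t len = fixed + 1;		/* fixed fields + control-size byte */

	if (d[0] < len)
		return false;
	len += 1 + d[fixed] + 1;	/* control-size byte counted a second time */
	return d[0] >= len;
}

/* Fixed: only require the size byte to be readable, then count it once. */
bool length_ok(const unsigned char *d, size_t fixed)
{
	size_t len = fixed;

	if (d[0] < len + 1)
		return false;
	len += 1 + d[fixed] + 1;	/* size byte, controls, trailing byte */
	return d[0] >= len;
}
```

A well-formed 8-byte descriptor with four fixed bytes and two control bytes passes the fixed check but is rejected by the buggy one, which wants 9 bytes.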
      
      Fixes: 57f87706 ("ALSA: usb-audio: More validations of descriptor units")
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20191114165613.7422-1-tiwai@suse.de
      
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      976a68f0
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.4-3' of... · 96b95eff
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - fix build error when compiling SPARC VDSO with CONFIG_COMPAT=y
      
       - pass correct --arch option to Sparse
      
      * tag 'kbuild-fixes-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: tell sparse about the $ARCH
        sparc: vdso: fix build error of vdso32
      96b95eff
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 4e84608c
      Linus Torvalds authored
      Pull RDMA fixes from Jason Gunthorpe:
       "Bug fixes for old bugs in the hns and hfi1 drivers:
      
         - Calculate various values in hns properly to avoid over/underflows
           in some cases
      
         - Fix an oops, PCI negotiation on Gen4 systems, and bugs related to
           retries"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/hns: Correct the value of srq_desc_size
        RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN
        IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR
        IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA
        IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet
        IB/hfi1: Ensure full Gen3 speed in a Gen4 system
      4e84608c
    • Sean Christopherson's avatar
      KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast() · ed69a6cb
      Sean Christopherson authored
      Acquire the per-VM slots_lock when zapping all shadow pages as part of
      toggling nx_huge_pages.  The fast zap algorithm relies on exclusivity
      (via slots_lock) to identify obsolete vs. valid shadow pages, because it
      uses a single bit for its generation number. Holding slots_lock also
      obviates the need to acquire a read lock on the VM's srcu.
      
      Failing to take slots_lock when toggling nx_huge_pages allows multiple
      instances of kvm_mmu_zap_all_fast() to run concurrently, as the other
      user, KVM_SET_USER_MEMORY_REGION, does not take the global kvm_lock.
      (kvm_mmu_zap_all_fast() does take kvm->mmu_lock, but it can be
      temporarily dropped by kvm_zap_obsolete_pages(), so it is not enough
      to enforce exclusivity).
      
      Concurrent fast zap instances cause obsolete shadow pages to be
      incorrectly identified as valid due to the single bit generation number
      wrapping, which results in stale shadow pages being left in KVM's MMU
      and leads to all sorts of undesirable behavior.
      The bug is easily confirmed by running with CONFIG_PROVE_LOCKING and
      toggling nx_huge_pages via its module param.
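
The wrap can be modelled in a few lines. This is a toy model under stated assumptions, not KVM code: `struct toy_mmu`, `struct toy_page`, `start_fast_zap()` and `is_obsolete()` are hypothetical names, and real concurrency is replaced by two back-to-back toggles standing in for a second zap that starts before the first has finished.

```c
#include <stdbool.h>

/* Hypothetical model: a one-bit "valid generation" per MMU and per page. */
struct toy_mmu  { unsigned int gen : 1; };
struct toy_page { unsigned int gen : 1; };

/* Starting a fast zap flips the generation, making older pages obsolete. */
void start_fast_zap(struct toy_mmu *m)
{
	m->gen ^= 1;
}

/* A page is obsolete when its generation differs from the current one. */
bool is_obsolete(const struct toy_mmu *m, const struct toy_page *p)
{
	return p->gen != m->gen;
}
```

After one toggle a generation-0 page is obsolete. After a second toggle, issued before the first zap has processed it, the generation wraps back to 0 and the same stale page looks valid again, which is why exclusivity is needed.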
      
      Note, until commit 4ae5acbc4936 ("KVM: x86/mmu: Take slots_lock when
      using kvm_mmu_zap_all_fast()", 2019-11-13) the fast zap algorithm used
      an ulong-sized generation instead of relying on exclusivity for
      correctness, but all callers except the recently added set_nx_huge_pages()
      needed to hold slots_lock anyways.  Therefore, this patch does not have
      to be backported to stable kernels.
      
      Given that toggling nx_huge_pages is by no means a fast path, force it
      to conform to the current approach instead of reintroducing the previous
      generation count.
      
      Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation", but NOT FOR STABLE)
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ed69a6cb