1. 24 Sep, 2019 1 commit
    • akpm@linux-foundation.org's avatar
      mm/gup: add make_dirty arg to put_user_pages_dirty_lock() · 2d15eb31
      akpm@linux-foundation.org authored
      [11~From: John Hubbard <jhubbard@nvidia.com>
      Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
      
      Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()",
      v3.
      
      There are about 50+ patches in my tree [2], and I'll be sending out the
      remaining ones in a few more groups:
      
      * The block/bio related changes (Jerome mostly wrote those, but I've had
        to move stuff around extensively, and add a little code)
      
      * mm/ changes
      
      * other subsystem patches
      
      * an RFC that shows the current state of the tracking patch set.  That
        can only be applied after all call sites are converted, but it's good to
        get an early look at it.
      
      This is part a tree-wide conversion, as described in fc1d8e7c ("mm:
      introduce put_user_page*(), placeholder versions").
      
      This patch (of 3):
      
      Provide more capable variation of put_user_pages_dirty_lock(), and delete
      put_user_pages_dirty().  This is based on the following:
      
      1.  Lots of call sites become simpler if a bool is passed into
         put_user_page*(), instead of making the call site choose which
         put_user_page*() variant to call.
      
      2.  Christoph Hellwig's observation that set_page_dirty_lock() is
         usually correct, and set_page_dirty() is usually a bug, or at least
         questionable, within a put_user_page*() calling chain.
      
      This leads to the following API choices:
      
          * put_user_pages_dirty_lock(page, npages, make_dirty)
      
          * There is no put_user_pages_dirty(). You have to
            hand code that, in the rare case that it's
            required.
      
      [jhubbard@nvidia.com: remove unused variable in siw_free_plist()]
        Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com
      Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com
      
      
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2d15eb31
  2. 21 Sep, 2019 1 commit
  3. 16 Sep, 2019 1 commit
  4. 13 Sep, 2019 2 commits
  5. 27 Aug, 2019 1 commit
  6. 21 Aug, 2019 12 commits
  7. 20 Aug, 2019 6 commits
  8. 16 Aug, 2019 1 commit
  9. 12 Aug, 2019 3 commits
  10. 07 Aug, 2019 2 commits
    • Mark Zhang's avatar
      RDMA/counter: Prevent QP counter binding if counters unsupported · d97de888
      Mark Zhang authored
      In case of rdma_counter_init() fails, counter allocation and QP bind
      should not be allowed.
      
      Fixes: 413d3347 ("RDMA/counter: Add set/clear per-port auto mode support")
      Fixes: 1bd8e0a9
      
       ("RDMA/counter: Allow manual mode configuration support")
      Signed-off-by: default avatarMark Zhang <markz@mellanox.com>
      Reviewed-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190807101819.7581-1-leon@kernel.org
      
      
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      d97de888
    • Yishai Hadas's avatar
      IB/mlx5: Fix implicit MR release flow · f591822c
      Yishai Hadas authored
      Once implicit MR is being called to be released by
      ib_umem_notifier_release() its leaves were marked as "dying".
      
      However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
      called, it skips running the mr_leaf_free_action (i.e. umem_odp->work)
      when those leaves were marked as "dying".
      
      As such ib_umem_release() for the leaves won't be called and their MRs
      will be leaked as well.
      
      When an application exits/killed without calling dereg_mr we might hit the
      above flow.
      
      This fatal scenario is reported by WARN_ON() upon
      mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the
      call trace can be seen below.
      
      Originally the "dying" mark as part of ib_umem_notifier_release() was
      introduced to prevent pagefault_mr() from returning a success response
      once this happened. However, we already have today the completion
      mechanism so no need for that in those flows any more.  Even in case a
      success response will be returned the firmware will not find the pages and
      an error will be returned in the following call as a released mm will
      cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero().
      
      Fix the above issue by dropping the "dying" from the above flows.  The
      other flows that are using "dying" are still needed it for their
      synchronization purposes.
      
         WARNING: CPU: 1 PID: 7218 at
         drivers/infiniband/hw/mlx5/main.c:2004
      		  mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
         CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G     E
      	       5.2.0-rc6+ #13
         Call Trace:
         uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
         ib_uverbs_close+0x1f/0x80 [ib_uverbs]
         __fput+0xbe/0x250
         task_work_run+0x88/0xa0
         do_exit+0x2cb/0xc30
         ? __fput+0x14b/0x250
         do_group_exit+0x39/0xb0
         get_signal+0x191/0x920
         ? _raw_spin_unlock_bh+0xa/0x20
         ? inet_csk_accept+0x229/0x2f0
         do_signal+0x36/0x5e0
         ? put_unused_fd+0x5b/0x70
         ? __sys_accept4+0x1a6/0x1e0
         ? inet_hash+0x35/0x40
         ? release_sock+0x43/0x90
         ? _raw_spin_unlock_bh+0xa/0x20
         ? inet_listen+0x9f/0x120
         exit_to_usermode_loop+0x5c/0xc6
         do_syscall_64+0x182/0x1b0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
      Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.org
      
      
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: default avatarArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      f591822c
  11. 05 Aug, 2019 1 commit
  12. 01 Aug, 2019 5 commits
    • Jack Morgenstein's avatar
      IB/mad: Fix use-after-free in ib mad completion handling · 770b7d96
      Jack Morgenstein authored
      We encountered a use-after-free bug when unloading the driver:
      
      [ 3562.116059] BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.117233] Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862
      [ 3562.118385]
      [ 3562.119519] CPU: 2 PID: 23862 Comm: kworker/u13:2 Tainted: G           OE     5.1.0-for-upstream-dbg-2019-05-19_16-44-30-13 #1
      [ 3562.121806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
      [ 3562.123075] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
      [ 3562.124383] Call Trace:
      [ 3562.125640]  dump_stack+0x9a/0xeb
      [ 3562.126911]  print_address_description+0xe3/0x2e0
      [ 3562.128223]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.129545]  __kasan_report+0x15c/0x1df
      [ 3562.130866]  ? ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.132174]  kasan_report+0xe/0x20
      [ 3562.133514]  ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
      [ 3562.134835]  ? find_mad_agent+0xa00/0xa00 [ib_core]
      [ 3562.136158]  ? qlist_free_all+0x51/0xb0
      [ 3562.137498]  ? mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
      [ 3562.138833]  ? quarantine_reduce+0x1fa/0x270
      [ 3562.140171]  ? kasan_unpoison_shadow+0x30/0x40
      [ 3562.141522]  ib_mad_recv_done+0xdf6/0x3000 [ib_core]
      [ 3562.142880]  ? _raw_spin_unlock_irqrestore+0x46/0x70
      [ 3562.144277]  ? ib_mad_send_done+0x1810/0x1810 [ib_core]
      [ 3562.145649]  ? mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
      [ 3562.147008]  ? _raw_spin_unlock_irqrestore+0x46/0x70
      [ 3562.148380]  ? debug_object_deactivate+0x2b9/0x4a0
      [ 3562.149814]  __ib_process_cq+0xe2/0x1d0 [ib_core]
      [ 3562.151195]  ib_cq_poll_work+0x45/0xf0 [ib_core]
      [ 3562.152577]  process_one_work+0x90c/0x1860
      [ 3562.153959]  ? pwq_dec_nr_in_flight+0x320/0x320
      [ 3562.155320]  worker_thread+0x87/0xbb0
      [ 3562.156687]  ? __kthread_parkme+0xb6/0x180
      [ 3562.158058]  ? process_one_work+0x1860/0x1860
      [ 3562.159429]  kthread+0x320/0x3e0
      [ 3562.161391]  ? kthread_park+0x120/0x120
      [ 3562.162744]  ret_from_fork+0x24/0x30
      ...
      [ 3562.187615] Freed by task 31682:
      [ 3562.188602]  save_stack+0x19/0x80
      [ 3562.189586]  __kasan_slab_free+0x11d/0x160
      [ 3562.190571]  kfree+0xf5/0x2f0
      [ 3562.191552]  ib_mad_port_close+0x200/0x380 [ib_core]
      [ 3562.192538]  ib_mad_remove_device+0xf0/0x230 [ib_core]
      [ 3562.193538]  remove_client_context+0xa6/0xe0 [ib_core]
      [ 3562.194514]  disable_device+0x14e/0x260 [ib_core]
      [ 3562.195488]  __ib_unregister_device+0x79/0x150 [ib_core]
      [ 3562.196462]  ib_unregister_device+0x21/0x30 [ib_core]
      [ 3562.197439]  mlx4_ib_remove+0x162/0x690 [mlx4_ib]
      [ 3562.198408]  mlx4_remove_device+0x204/0x2c0 [mlx4_core]
      [ 3562.199381]  mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
      [ 3562.200356]  mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
      [ 3562.201329]  __x64_sys_delete_module+0x2d2/0x400
      [ 3562.202288]  do_syscall_64+0x95/0x470
      [ 3562.203277]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The problem was that the MAD PD was deallocated before the MAD CQ.
      There was completion work pending for the CQ when the PD got deallocated.
      When the mad completion handling reached procedure
      ib_mad_post_receive_mads(), we got a use-after-free bug in the following
      line of code in that procedure:
         sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
      (the pd pointer in the above line is no longer valid, because the
      pd has been deallocated).
      
      We fix this by allocating the PD before the CQ in procedure
      ib_mad_port_open(), and deallocating the PD after freeing the CQ
      in procedure ib_mad_port_close().
      
      Since the CQ completion work queue is flushed during ib_free_cq(),
      no completions will be pending for that CQ when the PD is later
      deallocated.
      
      Note that freeing the CQ before deallocating the PD is the practice
      in the ULPs.
      
      Fixes: 4be90bc6
      
       ("IB/mad: Remove ib_get_dma_mr calls")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190801121449.24973-1-leon@kernel.org
      
      
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      770b7d96
    • Gal Pressman's avatar
      RDMA/restrack: Track driver QP types in resource tracker · 52e0a118
      Gal Pressman authored
      The check for QP type different than XRC has excluded driver QP
      types from the resource tracker.
      As a result, "rdma resource show" user command would not show opened
      driver QPs which does not reflect the real state of the system.
      
      Check QP type explicitly instead of assuming enum values/ordering.
      
      Fixes: 40909f66
      
       ("RDMA/efa: Add EFA verbs implementation")
      Signed-off-by: default avatarGal Pressman <galpress@amazon.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190801104354.11417-1-galpress@amazon.com
      
      
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      52e0a118
    • Jason Gunthorpe's avatar
      RDMA/devices: Remove the lock around remove_client_context · 9cd58817
      Jason Gunthorpe authored
      
      
      Due to the complexity of client->remove() callbacks it is desirable to not
      hold any locks while calling them. Remove the last one by tracking only
      the highest client ID and running backwards from there over the xarray.
      
      Since the only purpose of that lock was to protect the linked list, we can
      drop the lock.
      
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-3-leon@kernel.org
      
      
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      9cd58817
    • Jason Gunthorpe's avatar
      RDMA/devices: Do not deadlock during client removal · 621e55ff
      Jason Gunthorpe authored
      lockdep reports:
      
         WARNING: possible circular locking dependency detected
      
         modprobe/302 is trying to acquire lock:
         0000000007c8919c ((wq_completion)ib_cm){+.+.}, at: flush_workqueue+0xdf/0x990
      
         but task is already holding lock:
         000000002d3d2ca9 (&device->client_data_rwsem){++++}, at: remove_client_context+0x79/0xd0 [ib_core]
      
         which lock already depends on the new lock.
      
         the existing dependency chain (in reverse order) is:
      
         -> #2 (&device->client_data_rwsem){++++}:
                down_read+0x3f/0x160
                ib_get_net_dev_by_params+0xd5/0x200 [ib_core]
                cma_ib_req_handler+0x5f6/0x2090 [rdma_cm]
                cm_process_work+0x29/0x110 [ib_cm]
                cm_req_handler+0x10f5/0x1c00 [ib_cm]
                cm_work_handler+0x54c/0x311d [ib_cm]
                process_one_work+0x4aa/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #1 ((work_completion)(&(&work->work)->work)){+.+.}:
                process_one_work+0x45f/0xa30
                worker_thread+0x62/0x5b0
                kthread+0x1ca/0x1f0
                ret_from_fork+0x24/0x30
      
         -> #0 ((wq_completion)ib_cm){+.+.}:
                lock_acquire+0xc8/0x1d0
                flush_workqueue+0x102/0x990
                cm_remove_one+0x30e/0x3c0 [ib_cm]
                remove_client_context+0x94/0xd0 [ib_core]
                disable_device+0x10a/0x1f0 [ib_core]
                __ib_unregister_device+0x5a/0xe0 [ib_core]
                ib_unregister_device+0x21/0x30 [ib_core]
                mlx5_ib_stage_ib_reg_cleanup+0x9/0x10 [mlx5_ib]
                __mlx5_ib_remove+0x3d/0x70 [mlx5_ib]
                mlx5_ib_remove+0x12e/0x140 [mlx5_ib]
                mlx5_remove_device+0x144/0x150 [mlx5_core]
                mlx5_unregister_interface+0x3f/0xf0 [mlx5_core]
                mlx5_ib_cleanup+0x10/0x3a [mlx5_ib]
                __x64_sys_delete_module+0x227/0x350
                do_syscall_64+0xc3/0x6a4
                entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Which is due to the read side of the client_data_rwsem being obtained
      recursively through a work queue flush during cm client removal.
      
      The lock is being held across the remove in remove_client_context() so
      that the function is a fence, once it returns the client is removed. This
      is required so that the two callers do not proceed with destruction until
      the client completes removal.
      
      Instead of using client_data_rwsem use the existing device unregistration
      refcount and add a similar client unregistration (client->uses) refcount.
      
      This will fence the two unregistration paths without holding any locks.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 921eab11
      
       ("RDMA/devices: Re-organize device.c locking")
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20190731081841.32345-2-leon@kernel.org
      
      
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      621e55ff
    • Luck, Tony's avatar
      IB/core: Add mitigation for Spectre V1 · 61f25982
      Luck, Tony authored
      
      
      Some processors may mispredict an array bounds check and
      speculatively access memory that they should not. With
      a user supplied array index we like to play things safe
      by masking the value with the array size before it is
      used as an index.
      
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Link: https://lore.kernel.org/r/20190731043957.GA1600@agluck-desk2.amr.corp.intel.com
      
      
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      61f25982
  13. 31 Jul, 2019 1 commit
  14. 25 Jul, 2019 3 commits