1. 06 Feb, 2020 1 commit
    • net/mlx5: Fix deadlock in fs_core · c1948390
      Maor Gottlieb authored
      free_match_list() can be called when the flow table is already
      locked. We need to pass this indication on to tree_put_node().
      
      It fixes the following lockdep warning:
      
      [ 1797.268537] ============================================
      [ 1797.276837] WARNING: possible recursive locking detected
      [ 1797.285101] 5.5.0-rc5+ #10 Not tainted
      [ 1797.291641] --------------------------------------------
      [ 1797.299917] handler10/9296 is trying to acquire lock:
      [ 1797.307885] ffff889ad399a0a0 (&node->lock){++++}, at:
      tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.319694]
      [ 1797.319694] but task is already holding lock:
      [ 1797.330904] ffff889ad399a0a0 (&node->lock){++++}, at:
      nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
      [ 1797.344707]
      [ 1797.344707] other info that might help us debug this:
      [ 1797.356952]  Possible unsafe locking scenario:
      [ 1797.356952]
      [ 1797.368333]        CPU0
      [ 1797.373357]        ----
      [ 1797.378364]   lock(&node->lock);
      [ 1797.384222]   lock(&node->lock);
      [ 1797.390031]
      [ 1797.390031]  *** DEADLOCK ***
      [ 1797.390031]
      [ 1797.403003]  May be due to missing lock nesting notation
      [ 1797.403003]
      [ 1797.414691] 3 locks held by handler10/9296:
      [ 1797.421465]  #0: ffff889cf2c5a110 (&block->cb_lock){++++}, at:
      tc_setup_cb_add+0x70/0x250
      [ 1797.432810]  #1: ffff88a030081490 (&comp->sem){++++}, at:
      mlx5_devcom_get_peer_data+0x4c/0xb0 [mlx5_core]
      [ 1797.445829]  #2: ffff889ad399a0a0 (&node->lock){++++}, at:
      nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
      [ 1797.459913]
      [ 1797.459913] stack backtrace:
      [ 1797.469436] CPU: 1 PID: 9296 Comm: handler10 Kdump: loaded Not
      tainted 5.5.0-rc5+ #10
      [ 1797.480643] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS
      2.4.3 01/17/2017
      [ 1797.491480] Call Trace:
      [ 1797.496701]  dump_stack+0x96/0xe0
      [ 1797.502864]  __lock_acquire.cold.63+0xf8/0x212
      [ 1797.510301]  ? lockdep_hardirqs_on+0x250/0x250
      [ 1797.517701]  ? mark_held_locks+0x55/0xa0
      [ 1797.524547]  ? quarantine_put+0xb7/0x160
      [ 1797.531422]  ? lockdep_hardirqs_on+0x17d/0x250
      [ 1797.538913]  lock_acquire+0xd6/0x1f0
      [ 1797.545529]  ? tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.553701]  down_write+0x94/0x140
      [ 1797.560206]  ? tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.568464]  ? down_write_killable_nested+0x170/0x170
      [ 1797.576925]  ? del_hw_flow_group+0xde/0x1f0 [mlx5_core]
      [ 1797.585629]  tree_put_node+0x1d5/0x210 [mlx5_core]
      [ 1797.593891]  ? free_match_list.part.25+0x147/0x170 [mlx5_core]
      [ 1797.603389]  free_match_list.part.25+0xe0/0x170 [mlx5_core]
      [ 1797.612654]  _mlx5_add_flow_rules+0x17e2/0x20b0 [mlx5_core]
      [ 1797.621838]  ? lock_acquire+0xd6/0x1f0
      [ 1797.629028]  ? esw_get_prio_table+0xb0/0x3e0 [mlx5_core]
      [ 1797.637981]  ? alloc_insert_flow_group+0x420/0x420 [mlx5_core]
      [ 1797.647459]  ? try_to_wake_up+0x4c7/0xc70
      [ 1797.654881]  ? lock_downgrade+0x350/0x350
      [ 1797.662271]  ? __mutex_unlock_slowpath+0xb1/0x3f0
      [ 1797.670396]  ? find_held_lock+0xac/0xd0
      [ 1797.677540]  ? mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
      [ 1797.686467]  mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
      [ 1797.695134]  ? _mlx5_add_flow_rules+0x20b0/0x20b0 [mlx5_core]
      [ 1797.704270]  ? irq_exit+0xa5/0x170
      [ 1797.710764]  ? retint_kernel+0x10/0x10
      [ 1797.717698]  ? mlx5_eswitch_set_rule_source_port.isra.9+0x122/0x230
      [mlx5_core]
      [ 1797.728708]  mlx5_eswitch_add_offloaded_rule+0x465/0x6d0 [mlx5_core]
      [ 1797.738713]  ? mlx5_eswitch_get_prio_range+0x30/0x30 [mlx5_core]
      [ 1797.748384]  ? mlx5_fc_stats_work+0x670/0x670 [mlx5_core]
      [ 1797.757400]  mlx5e_tc_offload_fdb_rules.isra.27+0x24/0x90 [mlx5_core]
      [ 1797.767665]  mlx5e_tc_add_fdb_flow+0xaf8/0xd40 [mlx5_core]
      [ 1797.776886]  ? mlx5e_encap_put+0xd0/0xd0 [mlx5_core]
      [ 1797.785562]  ? mlx5e_alloc_flow.isra.43+0x18c/0x1c0 [mlx5_core]
      [ 1797.795353]  __mlx5e_add_fdb_flow+0x2e2/0x440 [mlx5_core]
      [ 1797.804558]  ? mlx5e_tc_update_neigh_used_value+0x8c0/0x8c0
      [mlx5_core]
      [ 1797.815093]  ? wait_for_completion+0x260/0x260
      [ 1797.823272]  mlx5e_configure_flower+0xe94/0x1620 [mlx5_core]
      [ 1797.832792]  ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
      [ 1797.842096]  ? down_read+0x11a/0x2e0
      [ 1797.849090]  ? down_write+0x140/0x140
      [ 1797.856142]  ? mlx5e_rep_indr_setup_block_cb+0xc0/0xc0 [mlx5_core]
      [ 1797.866027]  tc_setup_cb_add+0x11a/0x250
      [ 1797.873339]  fl_hw_replace_filter+0x25e/0x320 [cls_flower]
      [ 1797.882385]  ? fl_hw_destroy_filter+0x1c0/0x1c0 [cls_flower]
      [ 1797.891607]  fl_change+0x1d54/0x1fb6 [cls_flower]
      [ 1797.899772]  ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
      [cls_flower]
      [ 1797.910728]  ? lock_downgrade+0x350/0x350
      [ 1797.918187]  ? __radix_tree_lookup+0xa5/0x130
      [ 1797.926046]  ? fl_set_key+0x1590/0x1590 [cls_flower]
      [ 1797.934611]  ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
      [cls_flower]
      [ 1797.945673]  tc_new_tfilter+0xcd1/0x1240
      [ 1797.953138]  ? tc_del_tfilter+0xb10/0xb10
      [ 1797.960688]  ? avc_has_perm_noaudit+0x92/0x320
      [ 1797.968721]  ? avc_has_perm_noaudit+0x1df/0x320
      [ 1797.976816]  ? avc_has_extended_perms+0x990/0x990
      [ 1797.985090]  ? mark_lock+0xaa/0x9e0
      [ 1797.991988]  ? match_held_lock+0x1b/0x240
      [ 1797.999457]  ? match_held_lock+0x1b/0x240
      [ 1798.006859]  ? find_held_lock+0xac/0xd0
      [ 1798.014045]  ? symbol_put_addr+0x40/0x40
      [ 1798.021317]  ? rcu_read_lock_sched_held+0xd0/0xd0
      [ 1798.029460]  ? tc_del_tfilter+0xb10/0xb10
      [ 1798.036810]  rtnetlink_rcv_msg+0x4d5/0x620
      [ 1798.044236]  ? rtnl_bridge_getlink+0x460/0x460
      [ 1798.052034]  ? lockdep_hardirqs_on+0x250/0x250
      [ 1798.059837]  ? match_held_lock+0x1b/0x240
      [ 1798.067146]  ? find_held_lock+0xac/0xd0
      [ 1798.074246]  netlink_rcv_skb+0xc6/0x1f0
      [ 1798.081339]  ? rtnl_bridge_getlink+0x460/0x460
      [ 1798.089104]  ? netlink_ack+0x440/0x440
      [ 1798.096061]  netlink_unicast+0x2d4/0x3b0
      [ 1798.103189]  ? netlink_attachskb+0x3f0/0x3f0
      [ 1798.110724]  ? _copy_from_iter_full+0xda/0x370
      [ 1798.118415]  netlink_sendmsg+0x3ba/0x6a0
      [ 1798.125478]  ? netlink_unicast+0x3b0/0x3b0
      [ 1798.132705]  ? netlink_unicast+0x3b0/0x3b0
      [ 1798.139880]  sock_sendmsg+0x94/0xa0
      [ 1798.146332]  ____sys_sendmsg+0x36c/0x3f0
      [ 1798.153251]  ? copy_msghdr_from_user+0x165/0x230
      [ 1798.160941]  ? kernel_sendmsg+0x30/0x30
      [ 1798.167738]  ___sys_sendmsg+0xeb/0x150
      [ 1798.174411]  ? sendmsg_copy_msghdr+0x30/0x30
      [ 1798.181649]  ? lock_downgrade+0x350/0x350
      [ 1798.188559]  ? rcu_read_lock_sched_held+0xd0/0xd0
      [ 1798.196239]  ? __fget+0x21d/0x320
      [ 1798.202335]  ? do_dup2+0x2a0/0x2a0
      [ 1798.208499]  ? lock_downgrade+0x350/0x350
      [ 1798.215366]  ? __fget_light+0xd6/0xf0
      [ 1798.221808]  ? syscall_trace_enter+0x369/0x5d0
      [ 1798.229112]  __sys_sendmsg+0xd3/0x160
      [ 1798.235511]  ? __sys_sendmsg_sock+0x60/0x60
      [ 1798.242478]  ? syscall_trace_enter+0x233/0x5d0
      [ 1798.249721]  ? syscall_slow_exit_work+0x280/0x280
      [ 1798.257211]  ? do_syscall_64+0x1e/0x2e0
      [ 1798.263680]  do_syscall_64+0x72/0x2e0
      [ 1798.269950]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
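
The idea in the "Fix deadlock in fs_core" message, telling a put helper that its caller already holds the lock, can be illustrated with a minimal single-threaded sketch. This is not the driver's code: `toy_lock`, `toy_node`, and `toy_put_node` are hypothetical names, and the toy lock only asserts where a real non-recursive lock would deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded stand-in for a non-recursive write lock. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    /* A real non-recursive lock would deadlock here; the model asserts. */
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Hypothetical node: a refcount plus a parent whose lock guards removal. */
struct toy_node {
    struct toy_node *parent;
    struct toy_lock lock;
    int refcount;
    bool removed;
};

/*
 * Drop one reference. When the last reference goes away, the node is
 * unlinked under the parent's lock. 'parent_locked' tells the helper that
 * the caller already holds that lock, so it must not be taken again.
 */
static void toy_put_node(struct toy_node *node, bool parent_locked)
{
    struct toy_node *parent = node->parent;

    if (--node->refcount > 0)
        return;

    if (parent && !parent_locked)
        toy_lock_acquire(&parent->lock);

    node->removed = true;

    if (parent && !parent_locked)
        toy_lock_release(&parent->lock);
}
```

A caller that already holds the parent's lock passes `true`; every other caller passes `false` and lets the helper take and release the lock itself.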
  2. 16 Jan, 2020 4 commits
  3. 06 Jan, 2020 1 commit
    • Revert "net/mlx5: Support lockless FTE read lookups" · 1f0593e7
      Parav Pandit authored
      This reverts commit 7dee607e.
      
      During the cleanup path, the FTE's parent group node is removed, yet
      it is still referenced by the FTE while the FTE is being freed.
      Hence the lockless FTE read lookup optimization done in the cited
      commit is not possible at the moment.
      
      Hence, revert the commit.
      
      This avoids the KASAN call trace below.
      
      [  110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60
      [mlx5_core]
      [  110.391048] Read of size 4 at addr ffff888c19e6d220 by task
      swapper/12/0
      
      [  110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+
      [  110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70
      08/02/2014
      [  110.391225] Call Trace:
      [  110.391229]  <IRQ>
      [  110.391246]  dump_stack+0x95/0xd5
      [  110.391307]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391320]  print_address_description.constprop.5+0x20/0x320
      [  110.391379]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391435]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391441]  __kasan_report+0x149/0x18c
      [  110.391499]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391504]  kasan_report+0x12/0x20
      [  110.391511]  __asan_report_load4_noabort+0x14/0x20
      [  110.391567]  find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391625]  del_sw_fte_rcu+0x4a/0x100 [mlx5_core]
      [  110.391633]  rcu_core+0x404/0x1950
      [  110.391640]  ? rcu_accelerate_cbs_unlocked+0x100/0x100
      [  110.391649]  ? run_rebalance_domains+0x201/0x280
      [  110.391654]  rcu_core_si+0xe/0x10
      [  110.391661]  __do_softirq+0x181/0x66c
      [  110.391670]  irq_exit+0x12c/0x150
      [  110.391675]  smp_apic_timer_interrupt+0xf0/0x370
      [  110.391681]  apic_timer_interrupt+0xf/0x20
      [  110.391684]  </IRQ>
      [  110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0
      [  110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44
      00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00
      00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d
      [  110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX:
      ffffffffffffff13
      [  110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX:
      000000000000001f
      [  110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI:
      ffff888c277365a8
      [  110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09:
      0000000000035280
      [  110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12:
      ffffffffb1017740
      [  110.391723] R13: 0000000000000003 R14: 0000000000000003 R15:
      0000000000000000
      [  110.391732]  ? cpuidle_enter_state+0xea/0xba0
      [  110.391738]  cpuidle_enter+0x4f/0xa0
      [  110.391747]  call_cpuidle+0x6d/0xc0
      [  110.391752]  do_idle+0x360/0x430
      [  110.391758]  ? arch_cpu_idle_exit+0x40/0x40
      [  110.391765]  ? complete+0x67/0x80
      [  110.391771]  cpu_startup_entry+0x1d/0x20
      [  110.391779]  start_secondary+0x2f3/0x3c0
      [  110.391784]  ? set_cpu_sibling_map+0x2500/0x2500
      [  110.391795]  secondary_startup_64+0xa4/0xb0
      
      [  110.391841] Allocated by task 290:
      [  110.391917]  save_stack+0x21/0x90
      [  110.391921]  __kasan_kmalloc.constprop.8+0xa7/0xd0
      [  110.391925]  kasan_kmalloc+0x9/0x10
      [  110.391929]  kmem_cache_alloc_trace+0xf6/0x270
      [  110.391987]  create_root_ns.isra.36+0x58/0x260 [mlx5_core]
      [  110.392044]  mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core]
      [  110.392092]  mlx5_load_one+0xc7a/0x3860 [mlx5_core]
      [  110.392139]  init_one+0x6ff/0xf90 [mlx5_core]
      [  110.392145]  local_pci_probe+0xde/0x190
      [  110.392150]  work_for_cpu_fn+0x56/0xa0
      [  110.392153]  process_one_work+0x678/0x1140
      [  110.392157]  worker_thread+0x573/0xba0
      [  110.392162]  kthread+0x341/0x400
      [  110.392166]  ret_from_fork+0x1f/0x40
      
      [  110.392218] Freed by task 2742:
      [  110.392288]  save_stack+0x21/0x90
      [  110.392292]  __kasan_slab_free+0x137/0x190
      [  110.392296]  kasan_slab_free+0xe/0x10
      [  110.392299]  kfree+0x94/0x250
      [  110.392357]  tree_put_node+0x257/0x360 [mlx5_core]
      [  110.392413]  tree_remove_node+0x63/0xb0 [mlx5_core]
      [  110.392469]  clean_tree+0x199/0x240 [mlx5_core]
      [  110.392525]  mlx5_cleanup_fs+0x76/0x580 [mlx5_core]
      [  110.392572]  mlx5_unload+0x22/0xc0 [mlx5_core]
      [  110.392619]  mlx5_unload_one+0x99/0x260 [mlx5_core]
      [  110.392666]  remove_one+0x61/0x160 [mlx5_core]
      [  110.392671]  pci_device_remove+0x10b/0x2c0
      [  110.392677]  device_release_driver_internal+0x1e4/0x490
      [  110.392681]  device_driver_detach+0x36/0x40
      [  110.392685]  unbind_store+0x147/0x200
      [  110.392688]  drv_attr_store+0x6f/0xb0
      [  110.392693]  sysfs_kf_write+0x127/0x1d0
      [  110.392697]  kernfs_fop_write+0x296/0x420
      [  110.392702]  __vfs_write+0x66/0x110
      [  110.392707]  vfs_write+0x1a0/0x500
      [  110.392711]  ksys_write+0x164/0x250
      [  110.392715]  __x64_sys_write+0x73/0xb0
      [  110.392720]  do_syscall_64+0x9f/0x3a0
      [  110.392725]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 7dee607e ("net/mlx5: Support lockless FTE read lookups")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  4. 09 Dec, 2019 1 commit
  5. 20 Nov, 2019 1 commit
  6. 13 Nov, 2019 5 commits
  7. 01 Nov, 2019 2 commits
    • net/mlx5: Support lockless FTE read lookups · 7dee607e
      Parav Pandit authored
      
      
      During connection tracking offloads with a high number of connections
      (40K connections per second), flow table group lock contention is
      observed.
      To improve performance by reducing lock contention, a lockless
      FTE read lookup is performed as described below.
      
      Each flow table entry is refcounted.
      A flow table entry is removed when its refcount drops to zero.
      The rhash table allows RCU-protected lookup.
      Each hash table entry insertion and removal is write-lock protected.
      
      Hence, it is possible to perform a lockless lookup in the rhash table
      using the following scheme.
      
      (a) Guard FTE entry lookup per group using rcu read lock.
      (b) Before freeing the FTE entry, wait for all readers to finish
      accessing the FTE.
      
      The example below, with one reader and one writer racing in parallel,
      shows the protection provided by the RCU lock.
      
      lookup_fte_locked()
        rcu_read_lock();
        search_hash_table()
                                        existing_flow_group_write_lock();
                                        tree_put_node(fte)
                                          drop_ref_cnt(fte)
                                          del_sw_fte(fte)
                                          del_hash_table_entry();
                                          call_rcu();
                                        existing_flow_group_write_unlock();
        get_ref_cnt(fte) fails
        rcu_read_unlock();
                                        rcu grace period();
                                          [..]
                                          kmem_cache_free(fte);
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
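
The "get_ref_cnt(fte) fails" step in the race diagram relies on a reader taking a reference only while the count is still non-zero. A minimal user-space sketch of that one step, using C11 atomics, might look as follows. The names `toy_entry`, `toy_get_unless_zero`, and `toy_put` are hypothetical, and the RCU grace period and the hash table are deliberately not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry type; only the reference count matters here. */
struct toy_entry {
    atomic_int refcount;
};

/*
 * Take a reference only if the entry is still live (count > 0).
 * Returns false when the count has already reached zero, i.e. a writer
 * has started tearing the entry down and the reader must back off.
 */
static bool toy_get_unless_zero(struct toy_entry *e)
{
    int old = atomic_load(&e->refcount);

    while (old > 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&e->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool toy_put(struct toy_entry *e)
{
    int prev = atomic_fetch_sub(&e->refcount, 1);

    assert(prev > 0);
    return prev == 1;
}
```

In the scheme described above, a reader that gets `false` from the "get" step treats the lookup as a miss, while the memory itself stays valid until the readers have finished.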
    • net/mlx5: Do not hold group lock while allocating FTE in software · 84c7af63
      Parav Pandit authored
      
      
      FTE memory allocation using alloc_fte() doesn't have any dependency
      on the flow group.
      Hence, do not hold the flow group lock while performing alloc_fte().
      This helps to reduce contention on the flow group lock.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
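
The pattern in the "Do not hold group lock" message, allocating first and locking only for the update that needs the lock, can be sketched in a few lines. This is a single-threaded model with hypothetical names (`toy_group`, `toy_item`, `toy_group_add`); the `locked` flag merely marks where a real lock would be held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_item {
    int value;
    struct toy_item *next;
};

/* Hypothetical group: a flag standing in for its lock, plus a list head. */
struct toy_group {
    bool locked;
    struct toy_item *head;
};

/* Allocation takes no group argument: it does not depend on the group. */
static struct toy_item *toy_alloc_item(int value)
{
    struct toy_item *it = malloc(sizeof(*it));

    if (!it)
        return NULL;
    it->value = value;
    it->next = NULL;
    return it;
}

static int toy_group_add(struct toy_group *g, int value)
{
    /* Step 1: allocate before taking the lock. */
    struct toy_item *it = toy_alloc_item(value);

    if (!it)
        return -1;

    /* Step 2: short critical section, only the list update. */
    assert(!g->locked);
    g->locked = true;
    it->next = g->head;
    g->head = it;
    g->locked = false;
    return 0;
}

/* Release every item still linked into the group. */
static void toy_group_free(struct toy_group *g)
{
    while (g->head) {
        struct toy_item *next = g->head->next;

        free(g->head);
        g->head = next;
    }
}
```

Keeping the allocation outside the critical section shortens the time the lock is held, which is the contention reduction the message describes.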
  8. 03 Sep, 2019 3 commits
  9. 21 Aug, 2019 2 commits
  10. 03 Jul, 2019 1 commit
    • net/mlx5: Introduce and use mlx5_eswitch_get_total_vports() · 2752b823
      Parav Pandit authored
      
      
      Instead of MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
      In a subsequent patch, mlx5_eswitch_get_total_vports() accounts for SF
      vports as well.
      Expanding the MLX5_TOTAL_VPORTS macro would require exposing SF
      internals to the more generic vport.h header file. Such exposure is
      not desired.
      Hence mlx5_eswitch_get_total_vports() is introduced.
      
      Given that the mlx5_eswitch_get_total_vports() API wants to work on a
      const mlx5_core_dev *, change its helper functions to accept a
      const *dev as well.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
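
The design choice in the mlx5_eswitch_get_total_vports() message, a function with const-correct helpers in place of a macro, can be illustrated with a small sketch. All names and fields here (`toy_dev`, `num_special`, `toy_sf_vports`, `toy_total_vports`) are hypothetical and do not reflect the driver's actual structures or vport accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device description; the fields are illustrative only. */
struct toy_dev {
    int num_special;   /* assumed count of non-VF, non-SF vports */
    int num_vfs;
    int num_sfs;
    bool sf_supported;
};

/* Helper takes a const pointer so const callers can use it. */
static int toy_sf_vports(const struct toy_dev *dev)
{
    return dev->sf_supported ? dev->num_sfs : 0;
}

/*
 * Total vports as a function rather than a macro: the SF detail stays
 * behind toy_sf_vports() and never has to appear in a shared header.
 */
static int toy_total_vports(const struct toy_dev *dev)
{
    return dev->num_special + dev->num_vfs + toy_sf_vports(dev);
}
```

Because the accessor accepts a `const` pointer, every helper it calls must accept one too, which is why the message mentions changing the helper signatures.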
  11. 28 Jun, 2019 1 commit
    • net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_create · 5233794b
      Arnd Bergmann authored
      Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
      wasteful and causes a warning on 32-bit architectures when building
      with clang -fsanitize-coverage:
      
      drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
      drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      Since the structure is never written to, we can statically allocate
      it to avoid the stack usage. To be on the safe side, mark all
      subsequent function arguments that we pass it into as 'const'
      as well.
      
      Fixes: 10caabda ("net/mlx5e: Use termination table for VLAN push actions")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
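
The stack-usage fix described in the mlx5_eswitch_termtbl_create message, giving a never-written structure static storage and passing it as `const`, can be sketched as below. The structure layout and the names `toy_spec`, `toy_count_set_bytes`, and `toy_create_table` are assumptions for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large, read-only spec; the sizes are illustrative. */
struct toy_spec {
    unsigned char match_criteria[512];
    unsigned char match_value[512];
};

/* The callee only reads the spec, so it takes a const pointer. */
static size_t toy_count_set_bytes(const struct toy_spec *spec)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(spec->match_criteria); i++)
        n += spec->match_criteria[i] != 0;
    return n;
}

static size_t toy_create_table(void)
{
    /*
     * Zero-initialised object with static storage duration: it no longer
     * occupies about 1 KiB of this function's stack frame.
     */
    static const struct toy_spec empty_spec = { { 0 }, { 0 } };

    return toy_count_set_bytes(&empty_spec);
}
```

Marking the parameter `const` is what makes sharing one static object safe here, since no callee can modify it.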
  12. 26 Jun, 2019 1 commit
  13. 29 May, 2019 3 commits
  14. 17 May, 2019 1 commit
  15. 29 Apr, 2019 3 commits
  16. 10 Apr, 2019 2 commits
  17. 02 Apr, 2019 1 commit
  18. 11 Mar, 2019 4 commits
  19. 14 Feb, 2019 1 commit
  20. 05 Feb, 2019 1 commit
  21. 25 Jan, 2019 1 commit