1. 14 Oct, 2016 1 commit
    • Brenden Blanco's avatar
      net/mlx4_en: fixup xdp tx irq to match rx · 958b3d39
      Brenden Blanco authored
      In cases where the number of tx rings is not a multiple of the number of
      rx rings, the tx completion event will be handled on a different core
      from the transmit and population of the ring. Races on the ring will
      lead to a double-free of the page, and possibly other corruption.
      The rings are initialized by default with a valid multiple of rings,
      based on the number of cpus, therefore an invalid configuration requires
      ethtool to change the ring layout. For instance 'ethtool -L eth0 rx 9 tx
      8' will cause packets received on rx0, and XDP_TX'd to tx48, to be
      completed on cpu3 (48 % 9 == 3).
      Resolve this discrepancy by shifting the irq for the xdp tx queues to
      start again from 0, modulo rx_ring_num.
      Fixes: 9ecc2d86
       ("net/mlx4_en: add xdp forwarding and data write support")
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarBrenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 06 May, 2016 1 commit
    • Haggai Abramovsky's avatar
      net/mlx4: Avoid wrong virtual mappings · 73898db0
      Haggai Abramovsky authored
      The dma_alloc_coherent() function returns a virtual address which can
      be used for coherent access to the underlying memory.  On some
      architectures, like arm64, undefined behavior results if this memory is
      also accessed via virtual mappings that are not coherent.  Because of
      their undefined nature, operations like virt_to_page() return garbage
      when passed virtual addresses obtained from dma_alloc_coherent().  Any
      subsequent mappings via vmap() of the garbage page values are unusable
      and result in bad things like bus errors (synchronous aborts in ARM64
      The mlx4 driver contains code that does the equivalent of:
      vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
      device is opened.
      Prevent Ethernet driver to run this problematic code by forcing it to
      allocate contiguous memory. As for the Infiniband driver, at first we
      are trying to allocate contiguous memory, but in case of failure roll
      back to work with fragmented memory.
      Signed-off-by: default avatarHaggai Abramovsky <hagaya@mellanox.com>
      Signed-off-by: default avatarYishai Hadas <yishaih@mellanox.com>
      Reported-by: default avatarDavid Daney <david.daney@cavium.com>
      Tested-by: default avatarSinan Kaya <okaya@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  3. 18 Nov, 2015 2 commits
    • Eric Dumazet's avatar
      net: provide generic busy polling to all NAPI drivers · 93d05d4a
      Eric Dumazet authored
      NAPI drivers no longer need to observe a particular protocol
      to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
      napi_hash_add() and napi_hash_del() are automatically called
      from core networking stack, respectively from
      netif_napi_add() and netif_napi_del()
      This patch depends on free_netdev() and netif_napi_del() being
      called from process context, which seems to be the norm.
      Drivers might still prefer to call napi_hash_del() on their
      own, since they might combine all the rcu grace periods into
      a single one, knowing their NAPI structures lifetime, while
      core networking stack has no idea of a possible combining.
      Once this patch proves to not bring serious regressions,
      we will cleanup drivers to either remove napi_hash_del()
      or provide appropriate rcu grace periods combining.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      net: add netif_tx_napi_add() · d64b5e85
      Eric Dumazet authored
      netif_tx_napi_add() is a variant of netif_napi_add()
      It should be used by drivers that use a napi structure
      to exclusively poll TX.
      We do not want to add this kind of napi in napi_hash[] in following
      patches, adding generic busy polling to all NAPI drivers.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  4. 27 Aug, 2015 1 commit
  5. 31 May, 2015 2 commits
    • Ido Shamay's avatar
      net/mlx4_core: Move affinity hints to mlx4_core ownership · de161803
      Ido Shamay authored
      Now that EQs management is in the sole responsibility of mlx4_core,
      the IRQ affinity hints configuration should be in its hands as well.
      request_irq is called only once by the first consumer (maybe mlx4_ib),
      so mlx4_en passes the affinity mask too late. We also need to request
      vectors according to the cores we want to run on.
      mlx4_core distribution of IRQs to cores is straight forward,
      EQ(i)->IRQ will set affinity hint to core i.
      Consumers need to request EQ vectors, according to their cores
      considerations (NUMA).
      Signed-off-by: default avatarIdo Shamay <idos@mellanox.com>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Matan Barak's avatar
      net/mlx4: Add EQ pool · c66fa19c
      Matan Barak authored
      Previously, mlx4_en allocated EQs and used them exclusively.
      This affected RoCE performance, as applications which are
      events sensitive were limited to use only the legacy EQs.
      Change that by introducing an EQ pool. This pool is managed
      by mlx4_core. EQs are assigned to ports (when there are limited
      number of EQs, multiple ports could be assigned to the same EQs).
      An exception to this rule is the ASYNC EQ which handles various events.
      Legacy EQs are completely removed as all EQs could be shared.
      When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
      EQ serving on a specific port. The core driver calculates which
      EQ should be assigned to that request.
      Because IRQs are shared between IB and Ethernet modules, their
      names only include the PCI device BDF address.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarIdo Shamay <idos@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  6. 25 Jan, 2015 1 commit
  7. 17 Jul, 2014 1 commit
  8. 03 Jul, 2014 2 commits
  9. 11 Jun, 2014 1 commit
    • Yuval Atias's avatar
      net/mlx4_en: Use affinity hint · 9e311e77
      Yuval Atias authored
      The “affinity hint” mechanism is used by the user space
      daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
      Irqbalancer can use this hint to balance the irqs between the
      cpus indicated by the mask.
      We wish the HCA to preferentially map the IRQs it uses to numa cores
      close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
      sets the affinity hint according the following policy:
      First it maps IRQs to “close” numa cores.  If these are exhausted, the
      remaining IRQs are mapped to “far” numa cores.
      Signed-off-by: default avatarYuval Atias <yuvala@mellanox.com>
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  10. 02 Jun, 2014 2 commits
  11. 09 May, 2014 1 commit
  12. 16 Apr, 2014 1 commit
  13. 20 Dec, 2013 1 commit
  14. 08 Nov, 2013 2 commits
  15. 20 Jun, 2013 1 commit
  16. 24 Apr, 2013 1 commit
    • Amir Vadai's avatar
      net/mlx4_en: Add HW timestamping (TS) support · ec693d47
      Amir Vadai authored
      The patch allows to enable/disable HW timestamping for incoming and/or
      outgoing packets. It adds and initializes all structs and callbacks
      needed by kernel TS API.
      To enable/disable HW timestamping appropriate ioctl should be used.
      When enabling TS on receive flow - VLAN stripping will be disabled.
      Also were made all relevant changes in RX/TX flows to consider TS request
      and plant HW timestamps into relevant structures.
      mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature.
      Signed-off-by: default avatarEugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  17. 26 Nov, 2012 1 commit
    • Or Gerlitz's avatar
      mlx4: 64-byte CQE/EQE support · 08ff3235
      Or Gerlitz authored
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQES the driver will configure the HW to
      work in 64-byte mode.
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
      they need an update to be able to work with the PPF. This is done by
      changing the returned pf_context_behaviour not to be zero any more. In
      case none of these capabilities is activated that value remains zero
      and older guest drivers can run OK.
      The SRIOV related flow is as follows
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      2. the PPF activates the new capabilities using INIT_HCA.
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      Note that the VF detects that it must be aware to the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
      when **needed** i.e. only when the driver does use 64-byte CQEs or
      future device capabilities which must be in sync by user space. This
      practice allows to work with unmodified libmlx4 on older devices (e.g
      A0, B0) which don't support 64-byte CQEs.
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: default avatarEli Cohen <eli@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
  18. 19 Jul, 2012 2 commits
  19. 24 Apr, 2012 1 commit
  20. 30 Dec, 2011 1 commit
  21. 27 Nov, 2011 1 commit
  22. 10 Oct, 2011 2 commits
  23. 11 Aug, 2011 1 commit
  24. 23 Mar, 2011 1 commit
  25. 25 May, 2009 1 commit
  26. 18 May, 2009 1 commit
  27. 26 Dec, 2008 2 commits
  28. 22 Dec, 2008 1 commit
  29. 22 Oct, 2008 1 commit