- 18 Jan, 2019 1 commit
-
-
Ido Schimmel authored
When a packet should be trapped to the CPU the device consumes a WQE (work queue element) from an RDQ (receive descriptor queue) and copies the packet to the address specified in the WQE. The device then tries to post a CQE (completion queue element) that contains various metadata (e.g., ingress port) about the packet to a CQ (completion queue).

In case the device managed to consume a WQE, but did not manage to post the corresponding CQE, it will get stuck. This unlikely situation can be triggered due to the scheme the driver is currently using to process CQEs.

The driver will consume up to 512 CQEs at a time and after processing each corresponding WQE it will ring the RDQ's doorbell, letting the device know that a new WQE was posted for it to consume. Only after processing all the CQEs (up to 512), the driver will ring the CQ's doorbell, letting the device know that new ones can be posted.

Fix this by having the driver ring the CQ's doorbell for every processed CQE, but before ringing the RDQ's doorbell. This guarantees that whenever we post a new WQE, there is a corresponding CQE available. Copy the currently processed CQE to prevent the device from overwriting it with a new CQE after ringing the doorbell.

Note that the driver still arms the CQ only after processing all the pending CQEs, so that interrupts for this CQ will only be delivered after the driver finished its processing.

Before commit 8404f6f2 ("mlxsw: pci: Allow to use CQEs of version 1 and version 2") the issue was virtually impossible to trigger since the number of CQEs was twice the number of WQEs and the number of CQEs processed at a time was equal to the number of available WQEs.

Fixes: 8404f6f2 ("mlxsw: pci: Allow to use CQEs of version 1 and version 2") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reported-by:
Semion Lisyansky <semionl@mellanox.com> Tested-by:
Semion Lisyansky <semionl@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
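The doorbell ordering described in this commit message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: every name in it (`model_hw`, `model_trap`, `model_process_one`) is hypothetical and not taken from the mlxsw PCI code, the queue depth of 4 is arbitrary, and the "doorbells" are plain counter updates. It shows why the CQE is copied first, and why ringing the CQ doorbell before the RDQ doorbell means a newly posted WQE always has a free CQE slot behind it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CQ_SIZE 4 /* arbitrary queue depth for this model */

struct model_cqe {
	int ingress_port; /* stand-in for the CQE metadata */
};

struct model_hw {
	struct model_cqe cq[MODEL_CQ_SIZE];
	unsigned int cq_pi;      /* CQEs posted by the device so far */
	unsigned int cq_ci_hw;   /* CQ consumer index last rung by the driver */
	unsigned int rdq_posted; /* WQEs the device may still consume */
	bool stuck;
};

/* Device side (hypothetical): trap one packet to the CPU. */
static void model_trap(struct model_hw *hw, int port)
{
	if (hw->stuck || hw->rdq_posted == 0)
		return; /* no WQE available; nothing happens in this model */
	hw->rdq_posted--; /* consume a WQE */
	if (hw->cq_pi - hw->cq_ci_hw == MODEL_CQ_SIZE) {
		hw->stuck = true; /* WQE consumed, but no room for its CQE */
		return;
	}
	hw->cq[hw->cq_pi % MODEL_CQ_SIZE].ingress_port = port;
	hw->cq_pi++;
}

/* Driver side (hypothetical): process one CQE.
 * Returns the ingress port, or -1 if no CQE is pending.
 */
static int model_process_one(struct model_hw *hw, unsigned int *cq_ci)
{
	struct model_cqe copy;

	if (*cq_ci == hw->cq_pi)
		return -1;
	copy = hw->cq[*cq_ci % MODEL_CQ_SIZE]; /* copy before releasing the slot */
	(*cq_ci)++;
	hw->cq_ci_hw = *cq_ci; /* 1. ring the CQ doorbell */
	hw->rdq_posted++;      /* 2. repost the WQE, ring the RDQ doorbell */
	return copy.ingress_port;
}
```

In this model, once the CQ is full and one CQE has been processed, the next trapped packet reuses the slot that was just released. The value returned to the caller is unaffected because it comes from the local copy.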
-
- 08 Jan, 2019 8 commits
-
-
Ido Schimmel authored
When a VLAN is deleted from a bridge port we should not change the PVID unless the deleted VLAN is the PVID. Fixes: fe9ccc78 ("mlxsw: spectrum_switchdev: Don't batch VLAN operations") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
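The rule stated in this commit message is small enough to show as a sketch. The structure and function names below are hypothetical and are not the driver's own; the sketch also assumes that a PVID of 0 stands for "no PVID".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bridge port; 0 means "no PVID" in this sketch. */
struct model_bridge_port {
	uint16_t pvid;
};

/* Delete a VLAN from the port. The PVID is cleared only when the deleted
 * VLAN is the PVID itself; otherwise it is left untouched.
 */
static void model_vlan_del(struct model_bridge_port *port, uint16_t vid)
{
	if (port->pvid == vid)
		port->pvid = 0;
	/* removal of vid from the port's VLAN membership would go here */
}
```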
-
Ido Schimmel authored
Adding a VLAN on a port can trigger the offload of a VXLAN tunnel which is already a member in the VLAN. In case the configuration of the VXLAN is not supported, the driver would return -EOPNOTSUPP. This is problematic since bridge code does not interpret this as an error, but rather as an indication that it should try to set up the VLAN using the 8021q driver instead of switchdev. Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Drivers are not supposed to return errors in switchdev commit phase if they returned OK in prepare phase. Otherwise, a WARNING is emitted. However, when the offloading of a VXLAN tunnel is triggered by the addition of a VLAN on a local port, it is not possible to guarantee that the commit phase will succeed without doing a lot of work. In these cases, the artificial division between prepare and commit phase does not make sense, so simply do the work in the prepare phase. Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When VXLAN is a loadable module, MLXSW_SPECTRUM must not be built-in:
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:2547: undefined reference to `vxlan_fdb_find_uc'
Add Kconfig dependency to enforce usable configurations. Fixes: 1231e04f ("mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation") Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reported-by:
kbuild test robot <lkp@intel.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Make sure that LAG port TX is disabled before mlxsw_sp_port_lag_leave() is called, to prevent a possible EMAD error. Fixes: 0d65fc13 ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Nir Dotan authored
Removal of the mlxsw driver on Spectrum-2 platforms hits an ASSERT_RTNL() in Spectrum-2 ACL Bloom filter and in ERP removal paths. This happens because the multicast router implementation in Spectrum-2 relies on ACLs. Taking the RTNL lock upon driver removal is useless since the driver first removes its ports and unregisters from notifiers so concurrent writes cannot happen at that time. The assertions were originally put as a reminder for future work involving ERP background optimization, but having these assertions only during addition serves this purpose as well. Therefore remove the ASSERT_RTNL() in both places related to ERP and Bloom filter removal. Fixes: cf7221a4 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2") Signed-off-by:
Nir Dotan <nird@mellanox.com> Reviewed-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Nir Dotan authored
When writing to C-TCAM, mlxsw driver uses cregion->ops->entry_insert(). In case of C-TCAM HW insertion error, the opposite action should take place. Add error handling case in which the C-TCAM region entry is removed, by calling cregion->ops->entry_remove(). Fixes: a0a777b9 ("mlxsw: spectrum_acl: Start using A-TCAM") Signed-off-by:
Nir Dotan <nird@mellanox.com> Reviewed-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
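The error-handling pattern described in this commit message (undo the region entry insertion when the hardware write fails) can be sketched as below. This is a stand-alone illustration, not the driver's code: `model_region`, `model_hw_write()` and the other names are hypothetical, and the region is reduced to an entry counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical region: just counts entries and can simulate a HW failure. */
struct model_region {
	int entries;
	bool hw_fails;
};

/* Stand-in for cregion->ops->entry_insert(). */
static int model_entry_insert(struct model_region *r)
{
	r->entries++;
	return 0;
}

/* Stand-in for cregion->ops->entry_remove(). */
static void model_entry_remove(struct model_region *r)
{
	r->entries--;
}

/* Stand-in for the hardware insertion, which may fail. */
static int model_hw_write(struct model_region *r)
{
	return r->hw_fails ? -EIO : 0;
}

static int model_ctcam_entry_add(struct model_region *r)
{
	int err;

	err = model_entry_insert(r);
	if (err)
		return err;

	err = model_hw_write(r);
	if (err)
		goto err_hw_write;

	return 0;

err_hw_write:
	model_entry_remove(r); /* roll back the insertion on HW error */
	return err;
}
```

The goto-based unwind mirrors the usual kernel style, where each failure label undoes exactly the steps that had succeeded before it.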
-
Luis Chamberlain authored
We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superfluous. Phase it out. This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
- 07 Jan, 2019 2 commits
-
-
Stephen Warren authored
pci_{,un}map_sg are deprecated and replaced by dma_{,un}map_sg. This is especially relevant since the rest of the driver uses the DMA API. Fix the driver to use the replacement APIs. Signed-off-by:
Stephen Warren <swarren@nvidia.com> Reviewed-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Stephen Warren authored
This patch solves a crash at the time of mlx4 driver unload or system shutdown. The crash occurs because dma_alloc_coherent() returns one value in mlx4_alloc_icm_coherent(), but a different value is passed to dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because when allocated, that pointer is passed to sg_set_buf() to record it, then when freed it is re-calculated by calling lowmem_page_address(sg_page()) which returns a different value. Solve this by recording the value that dma_alloc_coherent() returns, and passing this to dma_free_coherent(). This patch is roughly equivalent to commit 378efe79 ("RDMA/hns: Get rid of page operation after dma_alloc_coherent"). Based-on-code-from: Christoph Hellwig <hch@lst.de> Signed-off-by:
Stephen Warren <swarren@nvidia.com> Reviewed-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
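The fix described in this commit message comes down to one rule: free exactly the value the allocator returned, rather than recomputing it later. A minimal user-space sketch of that rule follows. It assumes `calloc()`/`free()` as stand-ins for `dma_alloc_coherent()`/`dma_free_coherent()`, and the `model_icm_buf` structure and function names are hypothetical, not the mlx4 ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer descriptor: records the allocator's return value
 * verbatim so that the very same pointer can be handed back on free.
 */
struct model_icm_buf {
	void *cpu_addr;
	size_t size;
};

static int model_alloc_coherent(struct model_icm_buf *buf, size_t size)
{
	buf->cpu_addr = calloc(1, size); /* stand-in for dma_alloc_coherent() */
	if (!buf->cpu_addr)
		return -ENOMEM;
	buf->size = size;
	return 0;
}

static void model_free_coherent(struct model_icm_buf *buf)
{
	/* Use the recorded pointer; never re-derive it from other state. */
	free(buf->cpu_addr); /* stand-in for dma_free_coherent() */
	buf->cpu_addr = NULL;
	buf->size = 0;
}
```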
-
- 24 Dec, 2018 4 commits
-
-
Julia Lawall authored
Drop LIST_HEAD where the variable it declares has never been used. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by:
Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Julia Lawall authored
Drop LIST_HEAD where the variable it declares is never used. The uses were removed in 244cd96a ("net_sched: remove list_head from tc_action"), but not the declaration. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: 244cd96a ("net_sched: remove list_head from tc_action") Signed-off-by:
Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Julia Lawall authored
Drop LIST_HEAD where the variable it declares is never used. These became useless in 244cd96a ("net_sched: remove list_head from tc_action"). The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: 244cd96a ("net_sched: remove list_head from tc_action") Signed-off-by:
Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
kbuild test robot authored
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1339:57-58: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: 4c8fb298 ("net/mlx5e: Increase VF representors' SQ size to 128") CC: Gavi Teitz <gavi@mellanox.com> Signed-off-by:
kbuild test robot <fengguang.wu@intel.com> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 21 Dec, 2018 9 commits
-
-
Tariq Toukan authored
Add ethtool private flag 'xdp_tx_mpwqe' to control the feature from userspace. Feature is set ON by default, if supported. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
Add support for the HW feature of multi-packet WQE in XDP xmit flow.

The conventional TX descriptor (WQE, Work Queue Element) serves a single packet. Our HW has support for multi-packet WQE (MPWQE) in which a single descriptor serves multiple TX packets. This reduces both the PCI overhead and the CPU cycles wasted on writing them.

In this patch we add support for the HW feature, which is supported starting from ConnectX-5.

Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_TX:
We see a huge gain on single-port ConnectX-5, and reach the 100 Mpps milestone.
* Single-port HCA: Before: 70 Mpps, After: 100 Mpps (+42.8%)
* Dual-port HCA: Before: 51.7 Mpps, After: 57.3 Mpps (+10.8%)
* In both cases we tested traffic on one port. For now, on dual-port HCAs we see only a small gain. We are working to overcome this bottleneck, but for the moment the numbers seen on single-port HCAs can be reached on dual-port HCAs only with experimental firmware.

XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5. Due to a setup limitation, (A) and (B) are on different NUMA nodes, so absolute performance numbers are not optimal.
Note: Below is the transmit rate of (B), not the redirect rate of (A), which is in some cases higher.
* (B) is single-port: Before: 77 Mpps, After: 90 Mpps (+16.8%)
* (B) is dual-port: Before: 61 Mpps, After: 72 Mpps (+18%)

Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
Each xdp_wqe_info instance describes the number of data-segments and WQEBBs of the WQE. This is useful for a downstream patch that adds support for Multi-Packet TX WQE feature. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
This provides infrastructure to have multiple xdp_info instances for the same consumer index. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
Instead of calculating the control segment to be used upon an XDP xmit doorbell, save it in SQ structure. Nullify when no pending doorbell. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
Do not ignore the CQE opcode. This helps expose issues and debug them. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
Do not maintain an SQ state bit to indicate whether an XDP SQ serves redirect operations. Instead, rely on the fact that such an XDP SQ doesn't reside in an RQ instance, while the others do. This info is not known to the XDP SQ functions themselves, and they rely on their callers to distinguish between the cases. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
At the end of the RQ polling loop, some XDP-related operations might be required. Before checking them one by one, check if an XDP program is even loaded. Combine all the checks and operations in a single function in xdp files. This saves unnecessary checks for non-XDP flows. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
The opcode indicates the reason for the error. Printing it helps in debugging. Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Reviewed-by:
Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
- 20 Dec, 2018 16 commits
-
-
Ido Schimmel authored
VID 1 is not reserved anymore, so remove the check that prevented the creation of VLAN devices with this VID over mlxsw ports. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
There is no need to abuse VID 1 anymore and we can instead use VID 4095 as the default VLAN, which will be configured on the port throughout its lifetime. The OVS join / leave functions are changed to enable VIDs 1-4094 (inclusive) instead of 2-4095. This is because VID 4095 is now the default VLAN instead of 1. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
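The VID ranges in this commit message can be written down as a short sketch. The `MODEL_` names and the helper are hypothetical; the only outside fact relied on is that the kernel's `VLAN_N_VID` is 4096, so the new default VID is `VLAN_N_VID - 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VLAN_N_VID  4096                   /* mirrors the kernel's VLAN_N_VID */
#define MODEL_DEFAULT_VID (MODEL_VLAN_N_VID - 1) /* 4095, the new default VLAN */

/* Hypothetical helper: is this VID one of those enabled by the OVS
 * join / leave functions after the change, i.e. 1-4094 inclusive?
 */
static bool model_ovs_vid_enabled(uint16_t vid)
{
	return vid >= 1 && vid < MODEL_DEFAULT_VID;
}
```

Before the change the default VID was 1, so the corresponding range was 2-4095. Moving the default to the top of the VID space is what frees VID 1 for regular use.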
-
Ido Schimmel authored
VLAN entries on a port can be associated with either a bridge VLAN or a router port. Before the VLAN entry is destroyed these associations need to be cleaned up. Currently, this is always invoked from the function which destroys the VLAN entry, but the next patch is going to skip the destruction of the default entry when a port is unlinked from a LAG. The above does not mean that the associations should not be cleaned up, so add a helper that will be invoked from both call sites. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Subsequent patches will need to access the default port VLAN. Since this VLAN will exist throughout the lifetime of the port, simply store it in the port's struct. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function allows flushing all the existing VLAN entries on a port. It is invoked when a port is destroyed and when it is unlinked from a LAG. In the latter case, when moving to the new default VLAN, there will not be a need to destroy the default VLAN entry. Therefore, add an argument that controls whether the default port VLAN should be destroyed or not. Currently it is always set to 'true'. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
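The effect of the new argument can be sketched with a simplified port model. All names here are hypothetical and the VLAN entries are reduced to a boolean per VID; the real function operates on the driver's own VLAN entry list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_N_VID 4096

/* Hypothetical port: one flag per VID plus the port's default VID. */
struct model_port {
	bool vlan[MODEL_N_VID];
	uint16_t default_vid;
};

/* Flush all VLAN entries on the port. When flush_default is false, the
 * entry for the default VID is kept.
 */
static void model_port_vlan_flush(struct model_port *port, bool flush_default)
{
	for (unsigned int vid = 0; vid < MODEL_N_VID; vid++) {
		if (!flush_default && vid == port->default_vid)
			continue;
		port->vlan[vid] = false;
	}
}
```

Passing `true` reproduces the behavior at the time of this commit, where every entry including the default one is destroyed.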
-
Ido Schimmel authored
Currently, the driver does not set the port's PVID when initializing a new port. This is because the driver is using VID 1 as PVID which is the firmware default. Subsequent patches are going to change the PVID the driver is setting when initializing a new port. Prepare for that by explicitly setting the port's PVID. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Subsequent patches are going to replace the current default VID (1) with VLAN_N_VID - 1 (4095). Prepare for this conversion by replacing the hard-coded '1' with a define. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
In symmetric routing, the only two members in the VLAN corresponding to the L3 VNI are the router port and the VXLAN tunnel. In case the VXLAN device is already enslaved to the bridge and only later the VLAN interface is configured, the tunnel will not be offloaded. The reason for this is that when the router interface (RIF) corresponding to the VLAN interface is configured, it calls the core fid_get() API which does not check if NVE should be enabled on the FID. Instead, call into the bridge code which will check if NVE should be enabled on the FID. This effectively means that the same code path is used to retrieve a FID when either a local port or a router port joins the FID. Signed-off-by:
Ido Schimmel <idosch@mellanox.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Reviewed-by:
Petr Machata <petrm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Aviv Heller authored
If CONFIG_MLX5_ESWITCH is not defined, test for SR-IOV being disabled instead of calling the e-switch LAG prereq routine, since LAG with SR-IOV is allowed only when switchdev mode is on. Fixes: eff849b2 ("net/mlx5: Allow/disallow LAG according to pre-req only") Signed-off-by:
Aviv Heller <avivh@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Aviv Heller authored
vport system image guid should be queried using vport nic API for Ethernet ports, and vport hca API for Infiniband ports. Fixes: fadd59fc ("net/mlx5: Introduce inter-device communication mechanism") Signed-off-by:
Aviv Heller <avivh@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Eli Britstein authored
Generate encap header depending on the routed device to support native/tagged Ethernet header. Signed-off-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Eli Britstein authored
Support generation of native or tagged Ethernet header for encap header, depending on provided net device. Signed-off-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
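The difference between a native and a tagged Ethernet header can be shown with a short, stand-alone sketch. It is not the mlx5 helper: the function and macro names are hypothetical, the caller passes an explicit `tagged` flag instead of a net device, and the 802.1Q priority and DEI bits are left at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN   6
#define MODEL_TPID_8021Q 0x8100 /* IEEE 802.1Q tag protocol identifier */

/* Store a 16-bit value in network byte order; returns bytes written. */
static size_t model_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
	return 2;
}

/* Write an Ethernet header into buf (at least 18 bytes) and return its
 * length: 14 bytes when native, 18 bytes when an 802.1Q tag is inserted.
 */
static size_t model_gen_eth_hdr(uint8_t *buf, const uint8_t *dst,
				const uint8_t *src, bool tagged,
				uint16_t vid, uint16_t proto)
{
	size_t off = 0;

	memcpy(buf + off, dst, MODEL_ETH_ALEN);
	off += MODEL_ETH_ALEN;
	memcpy(buf + off, src, MODEL_ETH_ALEN);
	off += MODEL_ETH_ALEN;
	if (tagged) {
		off += model_put_be16(buf + off, MODEL_TPID_8021Q);
		off += model_put_be16(buf + off, vid & 0x0fff); /* PCP/DEI = 0 */
	}
	off += model_put_be16(buf + off, proto); /* encapsulated protocol */
	return off;
}
```

Because the header length depends on whether a tag is present, a caller sizing the encap header has to know the type of the routed device before allocating it.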
-
Eli Britstein authored
Change the order to first route IPv4/6 and return if error. Only after successful route continue to allocate an encap header, with no functional change. Signed-off-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Eli Britstein authored
In tunnel encap we prepare the encap header for IPv4/6 cases, in two separate functions. For ETH header generation the code is almost duplicated. Move the ETH header generation code from IPv4/6 functions to a helper function, with no functional change. Signed-off-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Eli Britstein authored
Currently we neither support nor fail attempts to offload encap flows routed to a VLAN device on the underlay network. We wrongly consider a VLAN underlay device to be on the same e-switch because the switchdev ID is retrieved recursively. Add an explicit check for that and fail such attempts. Also align to a stricter check that the ingress and the underlay devices are in practice on the same e-switch. Fixes: ce99f6b9 ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Fixes: 3e621b19 ('net/mlx5e: Support TC encapsulation offloads with upper devices') Signed-off-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Eli Britstein authored
For tunnels we determine the output devs for the IPv4/6 cases in two separate functions, with duplicated code. Move that code from the IPv4/6 functions to a helper function, with no functional change. Signed-off-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-