1. 03 Jun, 2018 7 commits
    • Wei Yongjun's avatar
      net/smc: fix error return code in smc_setsockopt() · 3dc9f558
      Wei Yongjun authored
      Fix to return error code -EINVAL instead of 0 if optlen is invalid.
      
      Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dc9f558
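The pattern behind this fix can be sketched as follows. This is a minimal illustration, not the net/smc code: `demo_set_int_option()` and its parameters are hypothetical names, and the only point is that an invalid option length yields a negative errno rather than a silent 0.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler for an int-valued option; not the net/smc code. */
int demo_set_int_option(const void *optval, size_t optlen, int *stored)
{
	int val;

	/* Invalid length: return a negative errno instead of falling
	 * through and reporting success (0) to the caller. */
	if (optval == NULL || optlen < sizeof(val))
		return -EINVAL;

	memcpy(&val, optval, sizeof(val));
	*stored = val;
	return 0;
}
```

A caller can then tell a rejected option apart from an accepted one by checking for a negative return value.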
    • Wei Yongjun's avatar
      net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static · 8cb77149
      Wei Yongjun authored
      
      
      Fixes the following sparse warning:
      
      drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
       symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?
      
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8cb77149
    • Wei Yongjun's avatar
      hv_netvsc: fix error return code in netvsc_probe() · 9c6ffbac
      Wei Yongjun authored
      Fix to return a negative error code from the failover register fail
      error handling case instead of 0, as done elsewhere in this function.
      
      Fixes: 1ff78076 ("netvsc: refactor notifier/event handling code to use the failover framework")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c6ffbac
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9c54aeb0
      David S. Miller authored
      
      
      Filling in the padding slot in the bpf structure as a bug fix in 'net'
      overlapped with actually using that padding area for something in
      'net-next'.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c54aeb0
    • Heiner Kallweit's avatar
      net: phy: consider PHY_IGNORE_INTERRUPT in state machine PHY_NOLINK handling · eaf47b17
      Heiner Kallweit authored
      
      
      We can also bail out immediately in the PHY_IGNORE_INTERRUPT case, because
      phy_mac_interrupt() informs us once the link is up.
      
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eaf47b17
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4cb160d0
      David S. Miller authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next tree:
      
      1) Get rid of nf_sk_is_transparent(), use inet_sk_transparent() instead.
         From Máté Eckl.
      
      2) Move shared tproxy infrastructure to nf_tproxy_ipv4 and nf_tproxy_ipv6.
         Also from Máté.
      
      3) Add hashtable to speed up chain lookups by name, from Florian Westphal.
      
      4) Patch series to add connlimit support reusing part of the
         nf_conncount infrastructure. This includes preparation changes such as
         passing context to the object and expression destroy interface;
         garbage collection for expressions embedded into set elements, and
         the introduction of the destroy_clone interface for expressions.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4cb160d0
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 918fe1b3
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Infinite loop in _decode_session6(), from Eric Dumazet.
      
       2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
          Dumazet.
      
       3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.
      
       4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
          Makita.
      
       5) Incorrect idr release in cls_flower, from Paul Blakey.
      
       6) Probe error handling fix in davinci_emac, from Dan Carpenter.
      
       7) Memory leak in XPS configuration, from Alexander Duyck.
      
       8) Use after free with cloned sockets in kcm, from Kirill Tkhai.
      
       9) MTU handling fixes for ip_tunnel and ip6_tunnel, from Nicolas
          Dichtel.
      
      10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
          from Daniel Borkmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
        bpf: fix uapi hole for 32 bit compat applications
        net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
        ip6_tunnel: remove magic mtu value 0xFFF8
        ip_tunnel: restore binding to ifaces with a large mtu
        net: dsa: b53: Add BCM5389 support
        kcm: Fix use-after-free caused by clonned sockets
        net-sysfs: Fix memory leak in XPS configuration
        ixgbe: fix parsing of TC actions for HW offload
        net: ethernet: davinci_emac: fix error handling in probe()
        net/ncsi: Fix array size in dumpit handler
        cls_flower: Fix incorrect idr release when failing to modify rule
        net/sonic: Use dma_mapping_error()
        xfrm Fix potential error pointer dereference in xfrm_bundle_create.
        vhost_net: flush batched heads before trying to busy polling
        tun: Fix NULL pointer dereference in XDP redirect
        be2net: Fix error detection logic for BE3
        net: qmi_wwan: Add Netgear Aircard 779S
        mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
        atm: zatm: fix memcmp casting
        iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
        ...
      918fe1b3
  2. 02 Jun, 2018 26 commits
    • Florian Westphal's avatar
      netfilter: nf_tables: handle chain name lookups via rhltable · 1b2470e5
      Florian Westphal authored
      
      
      If there is a significant number of chains, list search is too slow, so
      add an rhlist table for this.
      
      This speeds up ruleset loading: for every new rule we have to check if
      the name already exists in current generation.
      
      We need to be able to cope with duplicate chain names in case a transaction
      drops the nfnl mutex (for request_module) and the abort of this old
      transaction is still pending.
      
      The list is kept -- we need a way to iterate chains even if hash resize is
      in progress without missing an entry.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1b2470e5
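The idea in this commit message (a name-keyed hash lookup that tolerates duplicate names, with the plain list kept for iteration) can be sketched in userspace C. This is a simplified illustration under stated assumptions: it does not use the kernel's rhltable API, and `demo_chain`, `demo_table`, the `active` flag standing in for "exists in the current generation", and the fixed bucket count are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUCKETS 64

/* Hypothetical chain record; the real nf_tables structures differ. */
struct demo_chain {
	const char *name;
	bool active;                 /* visible in the current generation? */
	struct demo_chain *hnext;    /* next entry in the same hash bucket */
	struct demo_chain *lnext;    /* next entry in the plain list */
};

struct demo_table {
	struct demo_chain *buckets[DEMO_BUCKETS];
	struct demo_chain *list;     /* kept so every chain can be iterated */
};

static unsigned int demo_hash(const char *s)
{
	unsigned int h = 5381;       /* djb2 string hash */

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % DEMO_BUCKETS;
}

/* Duplicate names are allowed: entries simply share a bucket. */
void demo_chain_add(struct demo_table *t, struct demo_chain *c)
{
	unsigned int b = demo_hash(c->name);

	c->hnext = t->buckets[b];
	t->buckets[b] = c;
	c->lnext = t->list;
	t->list = c;
}

/* Walk one bucket only and skip entries that are not active. */
struct demo_chain *demo_chain_lookup(const struct demo_table *t,
				     const char *name)
{
	struct demo_chain *c;

	for (c = t->buckets[demo_hash(name)]; c; c = c->hnext)
		if (c->active && strcmp(c->name, name) == 0)
			return c;
	return NULL;
}
```

The lookup touches a single bucket instead of the whole list, while the `list` pointer still gives a complete walk over all chains regardless of the hash table's state.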
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: add connlimit support · 290180e2
      Pablo Neira Ayuso authored
      
      
      This feature allows you to limit the maximum number of
      connections per arbitrary key. The connlimit expression is stateful,
      therefore it can be used from meters to dynamically populate a set; this
      provides a mapping to the iptables connlimit match. This patch also
      allows you to define static connlimit policies.
      
      This extension depends on the nf_conncount infrastructure.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      290180e2
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · e0255aec
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "Eve of merge window fix: The original code was so bogus as to be
        casting the wrong generic device to an rport and proceeding to take
        actions based on the bogus values it found.
      
        Fortunately it seems the location that is dereferenced always exists,
        so the code hasn't oopsed yet, but it certainly annoys the memory
        checkers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: scsi_transport_srp: Fix shost to rport translation
      e0255aec
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux · ada7339e
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "A few final fixes:
      
        i915:
         - fix for potential Spectre vector in the new query uAPI
         - fix NULL pointer deref (FDO #106559)
         - DMI fix to hide LVDS for Radiant P845 (FDO #105468)
      
        amdgpu:
         - suspend/resume DC regression fix
         - underscan flicker fix on fiji
         - gamma setting fix after dpms
      
        omap:
         - fix oops regression
      
        core:
         - fix PSR timing
      
        dw-hdmi:
         - fix oops regression"
      
      * tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux:
        drm/amd/display: Update color props when modeset is required
        drm/amd/display: Make atomic-check validate underscan changes
        drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
        drm/amd/display: Fix BUG_ON during CRTC atomic check update
        drm/i915/query: nospec expects no more than an unsigned long
        drm/i915/query: Protect tainted function pointer lookup
        drm/i915/lvds: Move acpi lid notification registration to registration phase
        drm/i915: Disable LVDS on Radiant P845
        drm/omap: fix NULL deref crash with SDI displays
        drm/psr: Fix missed entry in PSR setup time table.
      ada7339e
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: add destroy_clone expression · 371ebcbb
      Pablo Neira Ayuso authored
      
      
      Before this patch, cloned expressions are released via ->destroy. This
      is a problem for the new connlimit expression, since the ->destroy path
      drops a reference on the conntrack modules and unregisters hooks. The
      new ->destroy_clone provides context that this expression is being
      released from the packet path, so it mirrors ->clone(), where
      neither the module reference is dropped nor hooks need to be unregistered,
      because this is done from the control plane path, from the ->init() path.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      371ebcbb
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: garbage collection for stateful expressions · 79b174ad
      Pablo Neira Ayuso authored
      
      
      Use the garbage collector to schedule removal of elements based on feedback
      from the expression that this element comes with. Therefore, the garbage
      collector is not guided by timeout expirations in this new mode.
      
      The new connlimit expression sets the NFT_EXPR_GC flag to enable this
      behaviour; the dynset expression needs to explicitly enable the garbage
      collector via the set->ops->gc_init call.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      79b174ad
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: pass ctx to nf_tables_expr_destroy() · 3453c927
      Pablo Neira Ayuso authored
      
      
      nft_set_elem_destroy() can be called from call_rcu context. Annotate
      netns and table in set object so we can populate the context object.
      Moreover, pass context object to nf_tables_set_elem_destroy() from the
      commit phase, since it is already available from there.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      3453c927
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conncount: expose connection list interface · 5e5cbc7b
      Pablo Neira Ayuso authored
      
      
      This patch provides an interface to maintain the list of connections and
      the lookup function to obtain the number of connections in the list.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      5e5cbc7b
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: pass context to object destroy indirection · 00bfb320
      Pablo Neira Ayuso authored
      
      
      The new connlimit object needs this to properly deal with conntrack
      dependencies.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      00bfb320
    • Máté Eckl's avatar
      netfilter: Libify xt_TPROXY · 45ca4e0c
      Máté Eckl authored
      
      
      The extracted functions will likely be useful to implement tproxy
      support in nf_tables.
      
      Extracted functions:
      	- nf_tproxy_sk_is_transparent
      	- nf_tproxy_laddr4
      	- nf_tproxy_handle_time_wait4
      	- nf_tproxy_get_sock_v4
      	- nf_tproxy_laddr6
      	- nf_tproxy_handle_time_wait6
      	- nf_tproxy_get_sock_v6
      
      (nf_)tproxy_handle_time_wait6 also needed some refactoring, as its current
      implementation was xtables-specific.
      
      Signed-off-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      45ca4e0c
    • Máté Eckl's avatar
      netfilter: Decrease code duplication regarding transparent socket option · 8d6e5557
      Máté Eckl authored
      
      
      There is a function in include/net/netfilter/nf_socket.h to decide if a
      socket has IP(V6)_TRANSPARENT socket option set or not. However this
      does the same as inet_sk_transparent() in include/net/tcp.h
      
      include/net/tcp.h:1733
      /* This helper checks if socket has IP_TRANSPARENT set */
      static inline bool inet_sk_transparent(const struct sock *sk)
      {
      	switch (sk->sk_state) {
      	case TCP_TIME_WAIT:
      		return inet_twsk(sk)->tw_transparent;
      	case TCP_NEW_SYN_RECV:
      		return inet_rsk(inet_reqsk(sk))->no_srccheck;
      	}
      	return inet_sk(sk)->transparent;
      }
      
      tproxy_sk_is_transparent has also been refactored to use this function
      instead of reimplementing it.
      
      Signed-off-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8d6e5557
    • Dave Airlie's avatar
      Merge branch 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 012cface
      Dave Airlie authored
      Two last-minute DC fixes for 4.17: a fix for underscan on fiji and
      a fix for gamma settings after dpms.
      
      * 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
        drm/amd/display: Update color props when modeset is required
        drm/amd/display: Make atomic-check validate underscan changes
      012cface
    • Linus Torvalds's avatar
      Merge tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 4277e6b9
      Linus Torvalds authored
      Pull MIPS fixes from James Hogan:
       "A final few MIPS fixes for 4.17:
      
         - drop Lantiq gphy reboot/remove reset (4.14)
      
         - prctl(PR_SET_FP_MODE): Disallow FRE without FR (4.0)
      
         - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)"
      
      * tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs
        MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests
        MIPS: lantiq: gphy: Drop reboot/remove reset asserts
      4277e6b9
    • Linus Torvalds's avatar
      Merge tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio · 7172a69c
      Linus Torvalds authored
      Pull VFIO fix from Alex Williamson:
       "Revert a pfn page mapping optimization identified as introducing a bad
        page state regression (Alex Williamson)"
      
      * tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio:
        Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"
      7172a69c
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 6ac9f42c
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are four small bugfixes for some char/misc drivers. Well, really
        three fixes and one fix for one of those fixes due to problems found
        by 0-day.
      
        This resolves some reported issues with the hwtracing drivers, and a
        reported regression for the thunderbolt subsystem. All of these have
        been in linux-next for a while now with no reported problems"
      
      * tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        hwtracing: stm: fix build error on some arches
        intel_th: Use correct device when freeing buffers
        stm class: Use vmalloc for the master map
        thunderbolt: Handle NULL boot ACL entries properly
      6ac9f42c
    • Linus Torvalds's avatar
      Merge tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 34a8e640
      Linus Torvalds authored
      Pull IIO driver fixes from Greg KH:
       "Here are some old IIO driver fixes that were sitting in my tree for a
        few weeks. Sorry about not getting them to you sooner. They fix a
        number of small IIO driver issues that have been reported.
      
        All of these have been in linux-next for a while with no reported
        problems"
      
      * tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: adc: select buffer for at91-sama5d2_adc
        iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume
        iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
        iio:kfifo_buf: check for uint overflow
        iio:buffer: make length types match kfifo types
        iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock
        iio: adc: stm32-dfsdm: fix successive oversampling settings
        iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ
      34a8e640
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 7fdf3e86
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "Just three small last minute regressions that were found in the last
        week. The Broadcom fix is a bit big for rc7, but since it is fixing
        driver crash regressions that were merged via netdev into rc1, I am
        sending it.
      
         - bnxt netdev changes merged this cycle caused the bnxt RDMA driver
           to crash under certain situations
      
         - Arnd found (several, unfortunately) kconfig problems with the
           patches adding INFINIBAND_ADDR_TRANS. Reverting this last part,
           will fix it more fully outside -rc.
      
         - Subtle change in error code for a uapi function caused breakage in
           userspace. This bug was subtly introduced this cycle"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        IB/core: Fix error code for invalid GID entry
        IB: Revert "remove redundant INFINIBAND kconfig dependencies"
        RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
      7fdf3e86
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · a36b7968
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "A documentation bugfix and a MAINTAINERS addition"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: ocores: update HDL sources URL
        i2c: xlp9xx: Add MAINTAINERS entry
      a36b7968
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 0938a8f5
      Linus Torvalds authored
      Merge two fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: fix the NULL mapping case in __isolate_lru_page()
        mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()
      0938a8f5
    • Hugh Dickins's avatar
      mm: fix the NULL mapping case in __isolate_lru_page() · 145e1a71
      Hugh Dickins authored
      George Boole would have noticed a slight error in 4.16 commit
      69d763fc ("mm: pin address_space before dereferencing it while
      isolating an LRU page").  Fix it, to match both the comment above it,
      and the original behaviour.
      
      Although anonymous pages are not marked PageDirty at first, we have an
      old habit of calling SetPageDirty when a page is removed from swap
      cache: so there's a category of ex-swap pages that are easily
      migratable, but were inadvertently excluded from compaction's async
      migration in 4.16.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805302014001.12558@eggly.anvils
      Fixes: 69d763fc ("mm: pin address_space before dereferencing it while isolating an LRU page")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Ivan Kalvachev <ikalvachev@gmail.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      145e1a71
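The message does not quote the faulty expression, so the following is only an assumption about the shape of the Boolean slip it describes: a dirty page with no mapping at all (such as the ex-swap anonymous pages mentioned above) should still count as migratable. All types and function names here are hypothetical stand-ins, not the mm/ code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for address_space and its operations table. */
struct demo_aops {
	int (*migratepage)(void);
};

struct demo_mapping {
	const struct demo_aops *a_ops;
};

/* Placeholder migration callback, used only to populate the table. */
int demo_noop_migrate(void)
{
	return 0;
}

/* Faulty shape: a NULL mapping makes the dirty page look unmigratable. */
bool demo_migrate_dirty_buggy(const struct demo_mapping *mapping)
{
	return mapping && mapping->a_ops->migratepage;
}

/* Intended shape: no mapping is fine; only a mapping that lacks a
 * migratepage method blocks migration of the dirty page. */
bool demo_migrate_dirty_fixed(const struct demo_mapping *mapping)
{
	return !mapping || mapping->a_ops->migratepage;
}
```

The two predicates agree whenever a mapping exists and differ only for the NULL mapping, which is the case the commit title names.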
    • Hugh Dickins's avatar
      mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty() · 2d077d4b
      Hugh Dickins authored
      Swapping load on huge=always tmpfs (with khugepaged tuned up to be very
      eager, but I'm not sure that is relevant) soon hung uninterruptibly,
      waiting for page lock in shmem_getpage_gfp()'s find_lock_entry(), most
      often when "cp -a" was trying to write to a smallish file.  Debug showed
      that the page in question was not locked, and page->mapping NULL by now,
      but page->index consistent with having been in a huge page before.
      
      Reproduced in minutes on a 4.15 kernel, even with 4.17's 605ca5ed
      ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") added
      in; but took hours to reproduce on a 4.17 kernel (no idea why).
      
      The culprit proved to be the __ClearPageDirty() on tails beyond i_size in
      __split_huge_page(): the non-atomic __bitoperation may have been safe when
      4.8's baa355fd ("thp: file pages support for split_huge_page()")
      introduced it, but liable to erase PageWaiters after 4.10's 62906027
      ("mm: add PageWaiters indicating tasks are waiting for a page bit").
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805291841070.3197@eggly.anvils
      Fixes: 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d077d4b
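Why a non-atomic bit clear can erase a concurrently set flag can be shown with a deterministic, single-threaded model. This is a sketch, not kernel code: the flag names and bit positions are hypothetical, and the "other CPU" is simulated by an update placed between the load and the store of the non-atomic variant.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the kernel's page flag layout differs. */
#define DEMO_PG_DIRTY   (1UL << 0)
#define DEMO_PG_WAITERS (1UL << 1)

/* Non-atomic clear modelled as a separate load and store, with a
 * simulated concurrent update landing between the two steps. */
unsigned long demo_nonatomic_clear(unsigned long initial)
{
	unsigned long word = initial;
	unsigned long snapshot = word;          /* step 1: load */

	word |= DEMO_PG_WAITERS;                /* another CPU marks a waiter */
	word = snapshot & ~DEMO_PG_DIRTY;       /* step 2: store stale value */
	return word;
}

/* Atomic clear: the read-modify-write is indivisible, so no update can
 * fall between its load and its store. */
unsigned long demo_atomic_clear(unsigned long initial)
{
	atomic_ulong word;

	atomic_init(&word, initial);
	atomic_fetch_or(&word, DEMO_PG_WAITERS);   /* waiter arrives */
	atomic_fetch_and(&word, ~DEMO_PG_DIRTY);   /* indivisible clear */
	return atomic_load(&word);
}
```

Starting from a word with only the dirty bit set, the non-atomic model ends with the waiters bit lost, while the atomic version keeps it. A lost waiters bit would fit the hang described above, where a task waits on a page that is no longer locked.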
    • Alex Williamson's avatar
      Revert "vfio/type1: Improve memory pinning process for raw PFN mapping" · 89c29def
      Alex Williamson authored
      Bisection by Amadeusz Sławiński implicates this commit leading to bad
      page state issues after VM shutdown, likely due to unbalanced page
      references.  The original commit was intended only as a performance
      improvement, therefore revert for offline rework.
      
      Link: https://lkml.org/lkml/2018/6/2/97
      Fixes: 356e88eb ("vfio/type1: Improve memory pinning process for raw PFN mapping")
      Cc: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
      Reported-by: Amadeusz Sławiński <amade@asmblr.net>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      89c29def
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 1ffdd8e1
      David S. Miller authored
      
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for your net-next
      tree, the most relevant things in this batch are:
      
      1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
         Same thing with the redirection support.
      
      2) Abort transaction if early initialization of the commit phase fails.
         Also from Florian.
      
      3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
         Florian.
      
      4) Abort nf_tables batch if fatal signal is pending, from Florian.
      
      5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
         From Florian Westphal.
      
      6) Support to match transparent sockets from nf_tables, from Máté Eckl.
      
      7) Audit support for nf_tables, from Phil Sutter.
      
      8) Validate chain dependencies from commit phase, fall back to fine grain
         validation only in case of errors.
      
      9) Attach dst to skbuff from netfilter flowtable packet path, from
         Jason A. Donenfeld.
      
      10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
          Patch from Kees Cook.
      
      11) Add extension to allow to forward packets through neighbour layer.
      
      12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.
      
      13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ffdd8e1
    • David S. Miller's avatar
      Merge tag 'mlx5e-updates-2018-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f39c6b29
      David S. Miller authored
      
      
      Saeed Mahameed says:
      
      ====================
      mlx5e-updates-2018-06-01
      
      1) From Tariq, Two patches to Fix IPoIB issues introduced in
         "net/mlx5e: TX, Use actual WQE size for SQ edge fill"
      
      2) From Eran, Additional improvements to mlx5e statistics reporting
      
      3) From Maor, Increase aRFS flow tables size
      
      4) From Adi, Support MTU change for ethernet representors
      
      5) From Ilan and Adi, Handle QP error events in FPGA
      
      6) From Tariq, the last 10 patches mainly deal with RX buffer scheme improvements for legacy RQ
         to use only order-0 pages and fragmented SKBs for large MTUs.
      
      -  Tariq starts with some refactoring and removing HW LRO support from traditional
         (legacy) RQ, since it complicates the buffer scheme and removing it makes it smoother
         to move to cyclic descriptor buffer for traditional RQ.
      
      - Use cyclic WQ in legacy RQ, which has many benefits and paves the way for fragmented SKBs
        for large MTUs.
      
      - Enhance legacy Receive Queue memory scheme, such that only order-0 pages are used.
        Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer.
        Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags
        as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4
        frags and 10KB of MTU.
      
      - TX statistics access improvements.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f39c6b29
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · cd075ce4
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-06-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in
         order to fix offsets on 32 bit archs.
      
      This will have a minor merge conflict with net-next which has the
      __u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this
      location. Resolution is to use the gpl_compatible member.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd075ce4
    • Daniel Borkmann's avatar
      bpf: fix uapi hole for 32 bit compat applications · 36f9814a
      Daniel Borkmann authored
      In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
      case of struct bpf_map_info but also struct bpf_prog_info. In net-next
      commit b85fab0e ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
      added a bitfield into it to expose some flags related to programs. Thus,
      add an unnamed __u32 bitfield for both so that alignment stays the same
      in both 32 and 64 bit cases, and can be naturally extended from there
      as in b85fab0e.
      
      Before:
      
        # file test.o
        test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
        # pahole test.o
        struct bpf_map_info {
      	__u32                      type;                 /*     0     4 */
      	__u32                      id;                   /*     4     4 */
      	__u32                      key_size;             /*     8     4 */
      	__u32                      value_size;           /*    12     4 */
      	__u32                      max_entries;          /*    16     4 */
      	__u32                      map_flags;            /*    20     4 */
      	char                       name[16];             /*    24    16 */
      	__u32                      ifindex;              /*    40     4 */
      	__u64                      netns_dev;            /*    44     8 */
      	__u64                      netns_ino;            /*    52     8 */
      
      	/* size: 64, cachelines: 1, members: 10 */
      	/* padding: 4 */
        };
      
      After (same as on 64 bit):
      
        # file test.o
        test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
        # pahole test.o
        struct bpf_map_info {
      	__u32                      type;                 /*     0     4 */
      	__u32                      id;                   /*     4     4 */
      	__u32                      key_size;             /*     8     4 */
      	__u32                      value_size;           /*    12     4 */
      	__u32                      max_entries;          /*    16     4 */
      	__u32                      map_flags;            /*    20     4 */
      	char                       name[16];             /*    24    16 */
      	__u32                      ifindex;              /*    40     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	__u64                      netns_dev;            /*    48     8 */
      	__u64                      netns_ino;            /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      
      	/* size: 64, cachelines: 1, members: 10 */
      	/* sum members: 60, holes: 1, sum holes: 4 */
        };
      
      Reported-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Reported-by: default avatarEugene Syromiatnikov <esyr@redhat.com>
      Fixes: 52775b33 ("bpf: offload: report device information about offloaded maps")
      Fixes: 675fc275 ("bpf: offload: report device information for offloaded programs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      36f9814a
  3. 01 Jun, 2018 7 commits
    • Tariq Toukan's avatar
      net/mlx5e: TX, Separate cachelines of xmit and completion stats · f65a59ff
      Tariq Toukan authored
      
      
      Avoid false sharing by placing the TX stats that are dirtied in the
      xmit flow and those dirtied in the completion flow on separate cachelines.
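
A minimal sketch of the idea, assuming a 64-byte cacheline. The struct and field names are hypothetical and do not reflect the driver's actual stats layout:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumed cacheline size */

struct txq_stats_sketch {
	/* Written by the xmit flow. */
	uint64_t packets;
	uint64_t bytes;
	uint64_t stopped;

	/* Written by the completion flow: force a fresh cacheline so the
	 * two flows never dirty the same line. */
	_Alignas(CACHELINE) uint64_t cqes;
	uint64_t wake;
};

/* Index of the cacheline that a given byte offset falls into. */
size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE;
}
```

Because the completion counters start on their own line, a CPU handling completions does not invalidate the line that the transmitting CPU keeps updating.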
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f65a59ff
    • Tariq Toukan's avatar
      net/mlx5e: RX, Always prefer Linear SKB configuration · 5ffd8194
      Tariq Toukan authored
      
      
      Prefer the linear SKB configuration of Legacy RQ over the
      non-linear one of Striding RQ.
      
      This implies that ConnectX-4 LX now uses legacy RQ by default,
      as it does not support the linear configuration of Striding RQ.
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5ffd8194
    • Tariq Toukan's avatar
      net/mlx5e: RX, Enhance legacy Receive Queue memory scheme · 069d1146
      Tariq Toukan authored
      
      
      Enhance the memory scheme of the legacy RQ, such that
      only order-0 pages are used.
      
      Whenever possible, prefer using a linear SKB, and build it
      wrapping the WQE buffer.
      
      Otherwise (for example, jumbo frames on x86), use non-linear SKB,
      with as many frags as needed. In this case, multiple WQE
      scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.
      
      This required removing HW LRO support in legacy RQ, as LRO would
      need a large number of page allocations and scatter entries per WQE
      on archs with PAGE_SIZE = 4KB, yielding bad performance.
      
      In earlier patches, we guaranteed that all completions are in-order,
      and that we use a cyclic WQ.
      This creates an opportunity for a performance optimization:
      The mapping between a "struct mlx5e_dma_info", and the
      WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
      across different cycles of a WQ. This allows initializing
      the mapping at RQ creation time, instead of handling it
      in the datapath.
      
      A struct mlx5e_dma_info that is shared between different WQEs
      is allocated by the first WQE, and freed by the last one.
      This implies an important requirement: WQEs that share the same
      struct mlx5e_dma_info must be posted within the same NAPI.
      Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
      point to the new struct mlx5e_dma_info, not the one that was posted
      (and the HW wrote to).
      This bulking requirement is also good for performance,
      hence we extend the bulk beyond the minimal requirement above.
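
The constant mapping can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not the driver code: frags have a fixed stride, pages are 4KB, and `frag_info_sketch`, `init_frag_map` and the `last_in_page` flag are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size */

struct frag_info_sketch {
	uint32_t page_idx;     /* which page (dma_info) this frag points into */
	uint32_t offset;       /* byte offset of the frag inside that page */
	uint8_t  last_in_page; /* set on the frag that releases the page */
};

/*
 * Fill the frag -> page mapping once, at queue creation time.
 * Frags are packed into pages back to back; a frag that would cross a
 * page boundary starts a new page. Returns the number of pages used,
 * or 0 if the stride is invalid.
 */
size_t init_frag_map(struct frag_info_sketch *frags, size_t nfrags,
		     uint32_t frag_stride)
{
	uint32_t page = 0, off = 0;
	size_t i;

	if (nfrags == 0 || frag_stride == 0 || frag_stride > PAGE_SZ)
		return 0;

	for (i = 0; i < nfrags; i++) {
		if (off + frag_stride > PAGE_SZ) {
			/* Previous frag was the last user of its page. */
			frags[i - 1].last_in_page = 1;
			page++;
			off = 0;
		}
		frags[i].page_idx = page;
		frags[i].offset = off;
		frags[i].last_in_page = 0;
		off += frag_stride;
	}
	frags[nfrags - 1].last_in_page = 1;

	return (size_t)page + 1;
}
```

Since the queue is cyclic and completions are in order, the table computed here stays valid on every pass over the queue, so the datapath only needs to look it up.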
      
      With this memory scheme, the RQ's memory footprint is reduced by a
      factor of 2 on x86, and by a factor of 32 on PowerPC.
      The same factors apply to the number of pages in a GRO session.
      
      Performance tests:
      ConnectX-4, single core, single RX ring, default MTU.
      
      x86:
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      
      Packet rate (early drop in TC): no degradation
      TCP streams: ~5% improvement
      
      PowerPC:
      CPU: POWER8 (raw), altivec supported
      
      Packet rate (early drop in TC): 20% gain
      TCP streams: 25% gain
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      069d1146
    • Tariq Toukan's avatar
      net/mlx5e: RX, Use cyclic WQ in legacy RQ · 99cbfa93
      Tariq Toukan authored
      
      
      Now that LRO is not supported for Legacy RQ, there is no source of
      out-of-order completions in the WQ, and we can use a cyclic one.
      This has multiple advantages:
      - reduces the WQE size (smaller PCI transactions).
      - lower overhead in datapath (no handling of 'next' pointers).
      - no reserved WQE for the WQ head (was needed in linked-list).
      - allows using a constant map between frag and dma_info struct, in downstream patch.
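
To illustrate why a cyclic queue needs no 'next' pointers, here is a minimal sketch assuming a power-of-two queue size and strictly in-order completions. All names (`cyc_wq_sketch` and its helpers) are hypothetical and are not the driver's API:

```c
#include <stdint.h>

struct cyc_wq_sketch {
	uint16_t sz_m1;   /* queue size minus one; size is a power of two */
	uint16_t wqe_ctr; /* free-running count of WQEs posted so far */
	uint16_t cur_sz;  /* WQEs currently posted and not yet completed */
};

void cyc_wq_init(struct cyc_wq_sketch *wq, uint8_t log_size)
{
	wq->sz_m1 = (uint16_t)((1u << log_size) - 1);
	wq->wqe_ctr = 0;
	wq->cur_sz = 0;
}

/* Slot for the next WQE to post: just the counter masked by the size. */
uint16_t cyc_wq_head(const struct cyc_wq_sketch *wq)
{
	return (uint16_t)(wq->wqe_ctr & wq->sz_m1);
}

/* Slot of the oldest outstanding WQE; valid because completions are
 * in order, so the oldest posted WQE is always the next to complete. */
uint16_t cyc_wq_tail(const struct cyc_wq_sketch *wq)
{
	return (uint16_t)((wq->wqe_ctr - wq->cur_sz) & wq->sz_m1);
}

int cyc_wq_is_full(const struct cyc_wq_sketch *wq)
{
	return wq->cur_sz == (uint16_t)(wq->sz_m1 + 1);
}

void cyc_wq_push(struct cyc_wq_sketch *wq)
{
	wq->wqe_ctr++;
	wq->cur_sz++;
}

void cyc_wq_pop(struct cyc_wq_sketch *wq)
{
	wq->cur_sz--;
}
```

Every slot is usable, and head and tail are derived from two counters, which is the kind of bookkeeping the list above contrasts with the linked-list WQ.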
      
      Performance tests:
      ConnectX-4, single core, single RX ring.
      Major gain in packet rate of single ring XDP drop.
      Bottleneck is shifted from HW (at 16Mpps) to SW (at 20Mpps).
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      99cbfa93
    • Tariq Toukan's avatar
      net/mlx5e: RX, Split WQ objects for different RQ types · 422d4c40
      Tariq Toukan authored
      
      
      Replace the common RQ WQ object with two separate ones for the
      different RQ types.
      This is in preparation for switching to using a cyclic WQ type
      in Legacy RQ.
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      422d4c40
    • Tariq Toukan's avatar
      net/mlx5e: RX, Remove HW LRO support in legacy RQ · 6c3a823e
      Tariq Toukan authored
      
      
      Current LRO implementation in Legacy RQ uses high-order pages.
      In downstream patches of this series we complete the transition
      to using only order-0 pages in RX datapath (which was already done
      in Striding RQ).
      
      Unlike the more advanced Striding RQ, Legacy RQ does not reuse
      any non-consumed buffers of non-full LRO sessions, and combining
      it with order-0 pages has many performance drawbacks.
      
      Hence, here we totally remove LRO support in Legacy RQ.
      This guarantees having no out-of-order completions, which allows using
      a cyclic work queue (instead of a linked-list) in a downstream patch.
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6c3a823e
    • Tariq Toukan's avatar
      net/mlx5e: RX, Dedicate a function for copying SKB header · 386471f1
      Tariq Toukan authored
      
      
      Move the logic of copying the packet header into the SKB linear part
      into a generic function. The function handles copy length alignment
      and DMA buffer sync.
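
A rough user-space sketch of the copy length alignment follows. It assumes the length is rounded up to a multiple of `sizeof(long)` and that both buffers have room for the rounded length; the DMA sync step is omitted, and `copy_header_sketch` and `ALIGN_UP` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Copy a packet header of headlen bytes, rounding the copy length up so
 * that the copy works on whole machine words. The caller must guarantee
 * that src and dst both hold at least the rounded length.
 * Returns the number of bytes actually copied.
 */
size_t copy_header_sketch(void *dst, const void *src, size_t headlen)
{
	size_t len = ALIGN_UP(headlen, sizeof(long));

	memcpy(dst, src, len);
	return len;
}
```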
      
      It is currently called only within the MPWQE flow.
      In a downstream patch, it will be called within the legacy RQ flow
      as well.
      
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      386471f1