  1. 15 Oct, 2018 4 commits
    • tls: convert to generic sk_msg interface · d829e9c4
      Daniel Borkmann authored
      Convert kTLS over to make use of the sk_msg interface for plaintext and
      encrypted scatter-gather data, so it reuses all the sk_msg helpers
      and data structures, which later, in a second step, makes it possible
      to glue this to BPF.
      This also allows removing quite a few open-coded helpers which
      are covered by the sk_msg API. Recent changes in kTLS 80ece6a0
      ("tls: Remove redundant vars from tls record structure") and
       ("tls: Add support for inplace records encryption")
      changed the data path handling a bit; while we've kept the latter
      optimization intact, we had to undo the former change to better
      fit the sk_msg model, hence the sg_aead_in and sg_aead_out have
      been brought back and are linked into the sk_msg sgs. Now the kTLS
      record contains a msg_plaintext and msg_encrypted sk_msg each.
      In the original code, zerocopy_from_iter() was used from both the
      TX and the RX path. For the strparser skb-based RX path,
      we've left the zerocopy_from_iter() in decrypt_internal() mostly
      untouched, meaning it has been moved into tls_setup_from_iter()
      with charging logic removed (as not used from RX). Given RX path
      is not based on sk_msg objects, we haven't pursued setting up a
      dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
      could be an option to pursue in a later step.
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: convert to generic sk_msg interface · 604326b4
      Daniel Borkmann authored
      Add a generic sk_msg layer, and convert current sockmap and later
      kTLS over to make use of it. While sk_buff handles network packet
      representation from netdevice up to socket, sk_msg handles data
      representation from application to socket layer.
      This means that the sk_msg framework spans across ULP users in the
      kernel, and enables features such as introspection or filtering
      of data with the help of BPF programs that operate on this data.
      The latter becomes particularly useful for kTLS, where data encryption
      is deferred into the kernel, enabling the kernel to perform L7
      introspection and policy based on BPF for TLS connections where
      the record is encrypted after BPF has run and come to a verdict.
      In order to get there, the first step is to transform the open
      coding of scatter-gather list handling into a common core framework
      that subsystems can use.
      The code itself has been split and refactored into three bigger
      pieces: i) the generic sk_msg API which deals with managing the
      scatter gather ring, providing helpers for walking and mangling,
      transferring application data from user space into it, and preparing
      it for BPF pre/post-processing, ii) the plain sock map itself
      where sockets can be attached to or detached from; these bits
      are independent of i) which can now be used also without sock
      map, and iii) the integration with plain TCP as one protocol
      to be used for processing L7 application data (later this could
      e.g. also be extended to other protocols like UDP). The semantics
      are the same as with the old sock map code, so there is no change
      of user-facing behavior or APIs. Pursuing this work also helped
      find a number of bugs in the old sockmap code that we've already
      fixed in earlier commits. The test_sockmap kselftest suite passes
      as well.
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
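      The scatter-gather ring mentioned in i) above can be pictured with a
      minimal user-space sketch. This is not the kernel's struct sk_msg; the
      names example_sg_ring, example_ring_push, example_ring_pop and
      example_ring_bytes, as well as the capacity of 4 elements, are
      hypothetical and only illustrate managing and walking a ring of
      (pointer, length) elements.

```c
#include <stddef.h>

#define EXAMPLE_MAX_FRAGS 4 /* hypothetical capacity, not a kernel value */

struct example_sg_elem {
    const void *data;
    size_t len;
};

/* Simplified ring: one slot stays unused so that start == end means empty. */
struct example_sg_ring {
    unsigned int start;
    unsigned int end;
    struct example_sg_elem elem[EXAMPLE_MAX_FRAGS + 1];
};

/* Advance an index by one slot, wrapping around at the end of the array. */
static unsigned int example_next(unsigned int i)
{
    return (i + 1) % (EXAMPLE_MAX_FRAGS + 1);
}

/* Append one element; returns 0 on success, -1 when the ring is full. */
int example_ring_push(struct example_sg_ring *r, const void *data, size_t len)
{
    unsigned int next = example_next(r->end);

    if (next == r->start)
        return -1;
    r->elem[r->end].data = data;
    r->elem[r->end].len = len;
    r->end = next;
    return 0;
}

/* Drop the oldest element; returns 0 on success, -1 when the ring is empty. */
int example_ring_pop(struct example_sg_ring *r)
{
    if (r->start == r->end)
        return -1;
    r->start = example_next(r->start);
    return 0;
}

/* Walk the ring from start to end and sum up the element lengths. */
size_t example_ring_bytes(const struct example_sg_ring *r)
{
    size_t total = 0;
    unsigned int i;

    for (i = r->start; i != r->end; i = example_next(i))
        total += r->elem[i].len;
    return total;
}
```

      A zero-initialized example_sg_ring is empty; pushing "ab" and "cde"
      yields 5 bytes in total, and popping the oldest element leaves 3.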
    • tcp, ulp: remove ulp bits from sockmap · 1243a51f
      Daniel Borkmann authored
      In order to prepare sockmap logic to be used in combination with kTLS,
      we need to disentangle it from ULP, and further split it in later
      commits into a generic API.
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup · 8b9088f8
      Daniel Borkmann authored
      Whenever the ULP data on the socket is mangled, enforce that the
      caller has the socket lock held as otherwise things may race with
      initialization and cleanup callbacks from ulp ops as both would
      mangle internal socket state.
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 14 Oct, 2018 1 commit
  3. 13 Oct, 2018 1 commit
  4. 11 Oct, 2018 4 commits
  5. 10 Oct, 2018 13 commits
  6. 09 Oct, 2018 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 071a234a
      David S. Miller authored
      Alexei Starovoitov says:
      pull-request: bpf-next 2018-10-08
      The following pull-request contains BPF updates for your *net-next* tree.
      The main changes are:
      1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
      BPF programs to perform lookups for sockets in a network namespace. This would
      allow programs to determine early on in processing whether the stack is
      expecting to receive the packet, and perform some action (e.g. drop,
      forward somewhere) based on this information.
      2) per-cpu cgroup local storage from Roman Gushchin.
      Per-cpu cgroup local storage is very similar to simple cgroup storage,
      except that all the data is per-cpu. The main goal of the per-cpu variant
      is to implement super fast counters (e.g. packet counters), which require
      neither lookups nor atomic operations in the fast path.
      An example of these hybrid counters is in selftests/bpf/netcnt_prog.c
      3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
      4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
      5) rename of libbpf interfaces from Andrey Ignatov.
      libbpf is maturing as a library and should follow good practices in
      library design and implementation to play well with other libraries.
      This patch set brings consistent naming convention to global symbols.
      6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
      to let Apache2 projects use libbpf
      7) various AF_XDP fixes from Björn and Magnus
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 9000a457
      David S. Miller authored
      Pablo Neira Ayuso says:
      Netfilter updates for net-next
      The following patchset contains Netfilter updates for your net-next tree:
      1) Support for matching on ipsec policy already set in the route, from
         Florian Westphal.
      2) Split set destruction into deactivate and destroy phase to make it
         fit better into the transaction infrastructure, also from Florian.
         This includes a patch to warn on imbalance when setting the new
         activate and deactivate interfaces.
      3) Release transaction list from the workqueue to remove expensive
         synchronize_rcu() from configuration plane path. This speeds up
         configuration plane quite a bit. From Florian Westphal.
      4) Add new xfrm/ipsec extension, this new extension allows you to match
         for ipsec tunnel keys such as source and destination address, spi and
         reqid. From Máté Eckl and Florian Westphal.
      5) Add secmark support, this includes connsecmark too, patches
         from Christian Gottsche.
      6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng.
         One follow up patch to calm a clang warning for this one, from
         Nathan Chancellor.
      7) Flush conntrack entries based on layer 3 family, from Kristian Evensen.
      8) New revision for cgroups2 to shrink the path field.
      9) Get rid of obsolete need_conntrack(), as a result of recent
         demodularization work.
      10) Use WARN_ON instead of BUG_ON, from Florian Westphal.
      11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian.
      12) Remove superfluous check for timeout netlink parser and dump
          functions in layer 4 conntrack helpers.
      13) Unnecessary redundant rcu read side locks in NAT redirect,
          from Taehee Yoo.
      14) Pass nf_hook_state structure to error handlers, patch from
          Florian Westphal.
      15) Remove ->new() interface from layer 4 protocol trackers. Place
          them in the ->packet() interface. From Florian.
      16) Place conntrack ->error() handling in the ->packet() interface.
          Patches from Florian Westphal.
      17) Remove unused parameter in the pernet initialization path,
          also from Florian.
      18) Remove additional parameter to specify layer 3 protocol when
          looking up for protocol tracker. From Florian.
      19) Shrink array of layer 4 protocol trackers, from Florian.
      20) Check for linear skb only once from the ALG NAT mangling
          codebase, from Taehee Yoo.
      21) Use rhashtable_walk_enter() instead of deprecated
          rhashtable_walk_init(), also from Taehee.
      22) No need to flush all conntracks when only one single address
          is gone, from Tan Hu.
      23) Remove redundant check for NAT flags in flowtable code, from
          Taehee Yoo.
      24) Use rhashtable_lookup() instead of rhashtable_lookup_fast()
          from netfilter codebase, since rcu read lock side is already
          assumed in this path.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 08 Oct, 2018 15 commits
    • bpf: fix building without CONFIG_INET · df3f94a0
      Arnd Bergmann authored
      The newly added TCP and UDP handling fails to link when CONFIG_INET
      is disabled:
      net/core/filter.o: In function `sk_lookup':
      filter.c:(.text+0x7ff8): undefined reference to `tcp_hashinfo'
      filter.c:(.text+0x7ffc): undefined reference to `tcp_hashinfo'
      filter.c:(.text+0x8020): undefined reference to `__inet_lookup_established'
      filter.c:(.text+0x8058): undefined reference to `__inet_lookup_listener'
      filter.c:(.text+0x8068): undefined reference to `udp_table'
      filter.c:(.text+0x8070): undefined reference to `udp_table'
      filter.c:(.text+0x808c): undefined reference to `__udp4_lib_lookup'
      net/core/filter.o: In function `bpf_sk_release':
      filter.c:(.text+0x82e8): undefined reference to `sock_gen_put'
      Wrap the related sections of code in #ifdefs for the config option.
      Furthermore, sk_lookup() should always have been marked 'static', this
      also avoids a warning about a missing prototype when building with
      'make W=1'.
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • netfilter: xt_quota: Don't use aligned attribute in sizeof · ffa0a9a5
      Nathan Chancellor authored
      Clang warns:
      net/netfilter/xt_quota.c:47:44: warning: 'aligned' attribute ignored
      when parsing type [-Wignored-attributes]
              BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64));
      Use 'sizeof(__u64)' instead, as the alignment doesn't affect the size
      of the type.
      Fixes: e9837e55 ("netfilter: xt_quota: fix the behavior of xt_quota module")
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
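      The reasoning in the xt_quota fix above, that an alignment attribute
      does not change the size of a type, can be checked with a small
      sketch. The typedef example_aligned_u64 is a hypothetical stand-in
      for the kernel's __aligned_u64, assumed here to be a 64-bit integer
      with 8-byte alignment.

```c
#include <stdint.h>

/* Hypothetical stand-in for __aligned_u64: same type, stricter alignment. */
typedef uint64_t __attribute__((aligned(8))) example_aligned_u64;

/* The attribute changes the required alignment, not the object size. */
_Static_assert(sizeof(example_aligned_u64) == sizeof(uint64_t),
               "aligned attribute must not change the size");

/* Return 1 when both types have the same size, 0 otherwise. */
int example_aligned_u64_has_same_size(void)
{
    return sizeof(example_aligned_u64) == sizeof(uint64_t);
}
```

      Because the two sizes are equal, comparing against the plain 64-bit
      type in a build-time check expresses the same constraint without the
      ignored attribute.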
    • dpaa2-eth: Don't account Tx confirmation frames on NAPI poll · 68049a5f
      Ioana Ciocoi Radulescu authored
      Until now, both Rx and Tx confirmation frames handled during
      NAPI poll were counted toward the NAPI budget. However, Tx
      confirmations are lighter to process than Rx frames, which can
      skew the amount of work actually done inside one NAPI cycle.
      Update the code to only count Rx frames toward the NAPI budget
      and set a separate threshold on how many Tx conf frames can be
      processed in one poll cycle.
      The NAPI poll routine stops when either the budget is consumed
      by Rx frames or when Tx confirmation frames reach this threshold.
      Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
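      The accounting rule described above can be sketched as a user-space
      simulation. This is not the driver's code: the struct and function
      names are hypothetical, and the threshold of 256 Tx confirmation
      frames is an assumed value chosen only for illustration.

```c
/* Assumed per-poll limit for Tx confirmation frames (illustrative only). */
#define EXAMPLE_TX_CONF_PER_POLL 256

struct example_queue {
    int rx_pending;      /* Rx frames waiting to be processed */
    int tx_conf_pending; /* Tx confirmation frames waiting to be processed */
};

struct example_poll_result {
    int rx_done;
    int tx_conf_done;
};

/*
 * Process pending frames until the budget is consumed by Rx frames, the
 * Tx confirmation threshold is reached, or nothing is left to do.
 * Only Rx frames count toward the budget.
 */
struct example_poll_result example_poll(struct example_queue *q, int budget)
{
    struct example_poll_result res = { 0, 0 };

    while (res.rx_done < budget &&
           res.tx_conf_done < EXAMPLE_TX_CONF_PER_POLL) {
        int progressed = 0;

        if (q->tx_conf_pending > 0) {
            q->tx_conf_pending--;
            res.tx_conf_done++;
            progressed = 1;
        }
        if (q->rx_pending > 0) {
            q->rx_pending--;
            res.rx_done++;
            progressed = 1;
        }
        if (!progressed)
            break;
    }
    return res;
}
```

      With 100 Rx frames, 10 Tx confirmations and a budget of 64, the
      sketch stops after 64 Rx frames; with 10 Rx frames and 1000 Tx
      confirmations it stops once 256 Tx confirmations have been handled.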
    • net: mscc: ocelot: remove set but not used variable 'phy_mode' · 9e19dabc
      YueHaibing authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      drivers/net/ethernet/mscc/ocelot_board.c: In function 'mscc_ocelot_probe':
      drivers/net/ethernet/mscc/ocelot_board.c:262:17: warning:
       variable 'phy_mode' set but not used [-Wunused-but-set-variable]
         enum phy_mode phy_mode;
      It has never been used since its introduction in
      commit 71e32a20 ("net: mscc: ocelot: make use of SerDes PHYs for handling their configuration")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'more-pmtu-selftests' · ee9615be
      David S. Miller authored
      Sabrina Dubroca says:
      selftests: add more PMTU tests
      The current selftests for PMTU cover VTI tunnels, but there's nothing
      about the generation and handling of PMTU exceptions by intermediate
      routers. This series adds and improves existing helpers, then adds
      IPv4 and IPv6 selftests with a setup involving an intermediate router.
      Joint work with Stefano Brivio.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu: add basic IPv4 and IPv6 PMTU tests · e44e428f
      Sabrina Dubroca authored
      Commit d1f1b9cb ("selftests: net: Introduce first PMTU test") and
      follow-ups introduced some PMTU tests, but they all rely on tunneling,
      and, particularly, on VTI.
      These new tests use simple routing to exercise the generation and
      update of PMTU exceptions in IPv4 and IPv6.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu: extend MTU parsing helper to locked MTU · 72ebddd7
      Sabrina Dubroca authored
      The mtu_parse helper introduced in commit f2c929fe ("selftests:
      pmtu: Factor out MTU parsing helper") can only handle "mtu 1234", but
      not "mtu lock 1234". Extend it, so that we can do IPv4 tests with PMTU
      smaller than net.ipv4.route.min_pmtu.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
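      The parsing rule described above, accepting both "mtu 1234" and
      "mtu lock 1234", can be sketched in C. The selftest itself is a shell
      script; example_mtu_parse below is a hypothetical re-implementation
      of the idea, not the helper from the commit, and the sample route
      lines in the note are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the MTU that follows the word "mtu" in a whitespace-separated
 * line, skipping an optional "lock" keyword. Returns -1 if no MTU is found.
 */
int example_mtu_parse(const char *line)
{
    char word[64];
    int consumed;
    int after_mtu = 0;
    int mtu;

    /* Read one word at a time; %n reports how many characters were used. */
    while (sscanf(line, "%63s%n", word, &consumed) == 1) {
        line += consumed;
        if (after_mtu) {
            if (strcmp(word, "lock") == 0)
                continue; /* skip the optional "lock" keyword */
            return sscanf(word, "%d", &mtu) == 1 ? mtu : -1;
        }
        if (strcmp(word, "mtu") == 0)
            after_mtu = 1;
    }
    return -1;
}
```

      For example, example_mtu_parse("cache expires 596sec mtu 1400")
      yields 1400 and example_mtu_parse("cache expires 596sec mtu lock 500")
      yields 500, while a line without an "mtu" word yields -1.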
    • selftests: pmtu: Introduce check_pmtu_value() · 1e0a7207
      Stefano Brivio authored
      Introduce and use a function that checks PMTU values against
      expected values and logs error messages, to remove some clutter.
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn/gigaset/isocdata: mark expected switch fall-through · 062f97a3
      Gustavo A. R. Silva authored
      Notice that in this particular case, I replaced the
      "--v-- fall through --v--" comment with a proper
      "fall through", which is what GCC is expecting to
      This fix is part of the ongoing efforts to enabling
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
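      The comment convention discussed above can be illustrated with a
      small, self-contained switch. The function example_steps_from is
      hypothetical and unrelated to the gigaset driver; it only shows
      intentional fall-through marked with a plain "fall through" comment
      directly before the next case label.

```c
/*
 * Count the steps executed when entering the switch at a given level.
 * Each case intentionally continues into the next one.
 */
int example_steps_from(int level)
{
    int steps = 0;

    switch (level) {
    case 2:
        steps++;
        /* fall through */
    case 1:
        steps++;
        /* fall through */
    case 0:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

      Entering at level 2 runs all three cases, level 1 runs two, level 0
      runs one, and any other value runs none.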
    • Merge branch 'rtnetlink-Add-support-for-rigid-checking-of-data-in-dump-request' · cd7f7df6
      David S. Miller authored
      David Ahern says:
      rtnetlink: Add support for rigid checking of data in dump request
      There are many use cases where a user wants to influence what is
      returned in a dump for some rtnetlink command: one is wanting data
      for a namespace other than the one in which the request is received, and
      another is limiting the amount of data returned in the dump to a
      specific set of interest to userspace, reducing the cpu overhead of
      both kernel and userspace. Unfortunately, the kernel has historically
      not been strict with checking for the proper header or checking the
      values passed in the header. This lenient implementation has allowed
      iproute2 and other packages to pass any struct or data in the dump
      request as long as the family is the first byte. For example, ifinfomsg
      struct is used by iproute2 for all generic dump requests - links,
      addresses, routes and rules when it is really only valid for link
      There is one example where the kernel deals with the wrong struct: link
      dumps after VF support was added. Older iproute2 was sending rtgenmsg as
      the header instead of ifinfomsg so a patch was added to try and detect
      old userspace vs new:
      e5eca6d4 ("rtnetlink: fix userspace API breakage for iproute2 < v3.9.0")
      The latest example is Christian's patch set wanting to return addresses for
      a target namespace. It guesses the header struct is an ifaddrmsg and if it
      guesses wrong a netlink warning is generated in the kernel log on every
      address dump which is unacceptable.
      Another example where the kernel is a bit lenient is route dumps: iproute2
      can send a request with either an ifinfomsg or an rtmsg as the header
      struct, yet the kernel always treats the header as an rtmsg (see
      inet_dump_fib and rtm_flags check). The header inconsistency impacts the
      ability to add kernel side filters for route dumps - a necessary feature
      for scale setups with 100k+ routes.
      How can we avoid breaking old userspace yet still be able to
      move forward with new features such as kernel side filtering, which are
      crucial for efficient operation at high scale?
      This patch set addresses the problem by adding a new socket flag,
      NETLINK_DUMP_STRICT_CHK, that userspace can use with setsockopt to
      request strict checking of headers and attributes on dump requests and
      hence unlock the ability to use kernel side filters as they are added.
      Kernel side, the dump handlers are updated to verify the message contains
      at least the expected header struct:
          RTM_GETLINK:       ifinfomsg
          RTM_GETADDR:       ifaddrmsg
          RTM_GETMULTICAST:  ifaddrmsg
          RTM_GETANYCAST:    ifaddrmsg
          RTM_GETADDRLABEL:  ifaddrlblmsg
          RTM_GETROUTE:      rtmsg
          RTM_GETSTATS:      if_stats_msg
          RTM_GETNEIGH:      ndmsg
          RTM_GETNEIGHTBL:   ndtmsg
          RTM_GETNSID:       rtgenmsg
          RTM_GETRULE:       fib_rule_hdr
          RTM_GETNETCONF:    netconfmsg
          RTM_GETMDB:        br_port_msg
      And then every field in the header struct should be 0 with the exception
      of the family. There are a few exceptions to this rule where the kernel
      already influences the data returned by values in the struct. Next the
      message should not contain attributes unless the kernel implements
      filtering for it. Any unexpected data causes the dump to fail with EINVAL.
      If the new flag is honored by the kernel and the dump contents adjusted
      by any data passed in the request, the dump handler can set the
      NLM_F_DUMP_FILTERED flag in the netlink message header.
      For old userspace on new kernel there is no impact as all checks are
      wrapped in a check on the new strict flag. For new userspace on old
      kernel, the data in the headers and any appended attributes are
      silently ignored though the setsockopt failing is the clue to userspace
      the feature is not supported. New userspace on new kernel gets the
      requested data dump.
      iproute2 patches can be found here:
      Major changes since v1
      - inner header is supposed to be 4-byte aligned. So for dumps that
        should not have attributes appended, changed the check to use:
              if (nlmsg_attrlen(nlh, sizeof(hdr)))
        Only impacts patches with headers that are not multiples of 4 bytes
        (rtgenmsg, netconfmsg), but applied the change to all patches not
        calling nlmsg_parse for consistency.
      - Added nlmsg_parse_strict and nla_parse_strict for tighter control on
        attribute parsing. There should be no unknown attribute types or extra
      - Moved validation to a helper in most cases
      Changes since rfc-v2
      - dropped the NLM_F_DUMP_FILTERED flag from target nsid dumps per
        Jiri's objections
      - changed the opt-in uapi from a netlink message flag to a socket
        flag. setsockopt provides an api for userspace to definitively
        know if the kernel supports strict checking on dumps.
      - re-ordered patches to peel off the extack on dumps if needed to
        keep this set size within limits
      - misc cleanups in patches based on testing
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
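      The opt-in described in the cover letter above can be sketched from
      the userspace side. This is a minimal sketch, not code from the
      series: the helper names are hypothetical, and the fallback value 12
      for NETLINK_DUMP_STRICT_CHK is an assumption to verify against
      linux/netlink.h, where the option may also appear under a different
      name in released headers.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_DUMP_STRICT_CHK
#define NETLINK_DUMP_STRICT_CHK 12 /* assumed value, check linux/netlink.h */
#endif

/* Ask for strict checking on dump requests; 0 on success, -1 on failure. */
int example_enable_strict_dump_check(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_DUMP_STRICT_CHK,
                      &one, sizeof(one));
}

/*
 * Probe whether the running kernel accepts the option.
 * Returns 1 if it does, 0 if setsockopt failed, -1 if no netlink
 * socket could be opened at all.
 */
int example_kernel_supports_strict_dump(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
    int supported;

    if (fd < 0)
        return -1;
    supported = (example_enable_strict_dump_check(fd) == 0);
    close(fd);
    return supported;
}
```

      A failing setsockopt is the signal, per the text above, that the
      kernel does not support strict checking, so userspace should fall
      back to the legacy request format and filter the dump itself.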
    • rtnetlink: Update rtnl_fdb_dump for strict data checking · 8c6e137f
      David Ahern authored
      Update rtnl_fdb_dump for strict data checking. If the flag is set,
      the dump request is expected to have an ndmsg struct as the header
      potentially followed by one or more attributes. Any data passed in the
      header or as an attribute is taken as a request to influence the data
      returned. Only values supported by the dump handler are allowed to be
      non-0 or set in the request. At the moment only the NDA_IFINDEX and
      NDA_MASTER attributes are supported.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: Move input checking for rtnl_fdb_dump to helper · 8dfbda19
      David Ahern authored
      Move the existing input checking for rtnl_fdb_dump into a helper,
      valid_fdb_dump_legacy. This function will retain the current
      logic that works around the 2 headers that userspace has been
      allowed to send up to this point.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/bridge: Update br_mdb_dump for strict data checking · c77b9364
      David Ahern authored
      Update br_mdb_dump for strict data checking. If the flag is set,
      the dump request is expected to have a br_port_msg struct as the
      header. All elements of the struct are expected to be 0 and no
      attributes can be appended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Update netconf dump handlers for strict data checking · addd383f
      David Ahern authored
      Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and
      mpls_netconf_dump_devconf for strict data checking. If the flag is set,
      the dump request is expected to have a netconfmsg struct as the header.
      The struct only has the family member and no attributes can be appended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Update ip6addrlbl_dump for strict data checking · f2ae64bb
      David Ahern authored
      Update ip6addrlbl_dump for strict data checking. If the flag is set,
      the dump request is expected to have an ifaddrlblmsg struct as the
      header. All elements of the struct are expected to be 0 and no
      attributes can be appended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
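      The strict rule described above (an ifaddrlblmsg header whose fields
      other than the family are 0, with nothing appended) can be mirrored
      in a user-space sketch. This is not the kernel's ip6addrlbl_dump
      code: the function and struct names are hypothetical, and the checks
      are an assumed reading of the rule using the uapi netlink macros.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addrlabel.h>

/* Netlink header immediately followed by the address label header. */
struct example_addrlabel_req {
    struct nlmsghdr nlh;
    struct ifaddrlblmsg ifal;
};

/* Fill in a dump request that satisfies the strict rules. */
void example_build_addrlabel_dump_req(struct example_addrlabel_req *req)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifal));
    req->nlh.nlmsg_type = RTM_GETADDRLABEL;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->ifal.ifal_family = AF_INET6;
}

/* Return 0 if the request passes the strict checks, -EINVAL otherwise. */
int example_check_addrlabel_dump_req(const struct nlmsghdr *nlh)
{
    const struct ifaddrlblmsg *ifal;

    /* The message must at least contain the expected header struct. */
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifal)))
        return -EINVAL;

    ifal = (const struct ifaddrlblmsg *)((const char *)nlh + NLMSG_HDRLEN);

    /* Every field except the family must be 0. */
    if (ifal->__ifal_reserved || ifal->ifal_prefixlen || ifal->ifal_flags ||
        ifal->ifal_index || ifal->ifal_seq)
        return -EINVAL;

    /* No attributes or other data may follow the aligned header. */
    if (nlh->nlmsg_len > NLMSG_SPACE(sizeof(*ifal)))
        return -EINVAL;

    return 0;
}
```

      A request built by example_build_addrlabel_dump_req passes the
      check; setting ifal_index to a non-zero value or growing nlmsg_len
      beyond the header makes it fail with -EINVAL.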