1. 18 Sep, 2015 4 commits
  2. 02 Sep, 2015 1 commit
      netfilter: nf_conntrack: make nf_ct_zone_dflt built-in · 62da9865
      Daniel Borkmann authored
      Fengguang reported that some randconfig builds generated the following
      linker error involving the nf_ct_zone_dflt object:
        CC      init/version.o
        LD      init/built-in.o
        net/built-in.o: In function `ipv4_conntrack_defrag':
        nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
        net/built-in.o: In function `ipv6_defrag':
        nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
        make: *** [vmlinux] Error 1
      Some configurations have a built-in part that accesses nf_ct_zone_dflt,
      such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(),
      while nf_conntrack itself is configured as a module. We must therefore
      move nf_ct_zone_dflt into an area that is guaranteed to be built in
      whenever netfilter is configured at all.
      Therefore, split the more generic parts into a common header under
      include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
      section that already holds parts related to CONFIG_NF_CONNTRACK in the
      netfilter core. This fixes the issue on my side.
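      To make the failure mode concrete, here is a minimal userspace sketch loosely modeled on the kernel's zone helpers as we understand them; struct ct_zone, ct_zone_id() and defrag_user() are hypothetical names, not the kernel API. The point is that the default zone object has exactly one definition, placed in code that is always linked in, so a built-in caller never references a symbol that only exists in a module.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a conntrack zone; not kernel code. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1 };

#define CT_DEFAULT_ZONE_ID 0
#define CT_ZONE_DIR_ORIG   (1 << CT_DIR_ORIGINAL)
#define CT_ZONE_DIR_REPL   (1 << CT_DIR_REPLY)

struct ct_zone {
	uint16_t id;
	uint8_t  flags;
	uint8_t  dir;	/* bitmask of directions this zone applies to */
};

/* The single definition of the default zone. In the fix described above,
 * the equivalent object lives in always built-in netfilter core code. */
const struct ct_zone ct_zone_dflt = {
	.id  = CT_DEFAULT_ZONE_ID,
	.dir = CT_ZONE_DIR_ORIG | CT_ZONE_DIR_REPL,
};

/* Zone id for a direction; falls back to the default id when the zone
 * does not apply to that direction. */
static uint16_t ct_zone_id(const struct ct_zone *zone, enum ct_dir dir)
{
	return (zone->dir & (1 << dir)) ? zone->id : CT_DEFAULT_ZONE_ID;
}

/* Hypothetical stand-in for a built-in defrag helper: it needs the default
 * zone object when the packet carries no zone of its own. */
static unsigned int defrag_user(const struct ct_zone *zone, enum ct_dir dir,
				unsigned int base)
{
	if (!zone)
		zone = &ct_zone_dflt;
	return base + ct_zone_id(zone, dir);
}
```

      Linking fails in the reported configuration precisely because a function like defrag_user() is built in while the object it takes the address of was only defined in a module.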
      Fixes: 308ac914 ("netfilter: nf_conntrack: push zone object into functions")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 23 Jul, 2015 1 commit
  4. 15 Jul, 2015 3 commits
      netfilter: move tee_active to core · e7c8899f
      Florian Westphal authored
      This prepares for a TEE-like expression in nftables.
      We want to ensure that only one duplicate is sent, so both will
      use the same per-CPU variable to detect duplication.
      The other use case is detecting recursive calls to xtables, but since
      we don't want a dependency from nft to the xtables core, it's put into
      core.c instead of the x_tables core.
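      A userspace analogy of the shared duplicate guard, assuming a C11 compiler: a _Thread_local flag stands in for the per-CPU variable, and dup_active, transmit() and send_duplicate() are hypothetical names. Whichever user sets the flag first suppresses any nested attempt to duplicate again.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-thread flag standing in for the per-CPU "duplicate in progress"
 * variable that both the xtables TEE target and an nftables expression
 * would share. */
static _Thread_local bool dup_active;

static int sent;	/* number of duplicates actually transmitted */

static void send_duplicate(int depth);

/* Hypothetical transmit path: sending the clone may traverse the ruleset
 * again and reach the duplicating rule a second time. */
static void transmit(int depth)
{
	sent++;
	if (depth > 0)
		send_duplicate(depth - 1);	/* simulated re-entry */
}

static void send_duplicate(int depth)
{
	if (dup_active)
		return;		/* already duplicating: do not recurse */
	dup_active = true;
	transmit(depth);
	dup_active = false;
}
```

      Keeping the flag in shared core code, rather than in either user, is what lets both paths consult the same state without one depending on the other.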
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: Per network namespace netfilter hooks. · 085db2c0
      Eric W. Biederman authored
      - Add a new set of functions for registering and unregistering per
        network namespace hooks.
      - Modify the old global namespace hook functions to use the per
        network namespace hooks in their implementation, so there remains a
        single list that needs to be walked for any hook (this is important
        for keeping the hook priority working and for keeping the code
        walking the hooks simple).
      - Only allow registering the per netdevice hooks in the network
        namespace where the network device lives.
      - Dynamically allocate the structures in the per network namespace
        hook list in nf_register_net_hook, and unregister them in
        Dynamic allocation is required somewhere, as the number of network
        namespaces is not fixed, so we might as well allocate them in the
        registration function.
        The chain of registered hooks on any list is expected to be small, so
        the cost of walking that list to find the entry we are unregistering
        should also be small.
        Performing the management of the dynamically allocated list entries
        in the registration and unregistration functions keeps the complexity
        from spreading.
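      The registration scheme above can be sketched as follows. This is a simplified model, not the kernel API: struct netns, register_net_hook() and unregister_net_hook() are hypothetical, and entries are kept sorted by ascending priority so that a single walk of the list preserves hook ordering.

```c
#include <assert.h>
#include <stdlib.h>

struct hook_ops {
	int priority;
};

/* List entry allocated at registration time, one per (netns, ops) pair. */
struct hook_entry {
	const struct hook_ops *ops;
	struct hook_entry *next;
};

/* Each namespace owns its own list, sorted by ascending priority. */
struct netns {
	struct hook_entry *hooks;
};

static int register_net_hook(struct netns *net, const struct hook_ops *ops)
{
	struct hook_entry *e = malloc(sizeof(*e));
	struct hook_entry **pos = &net->hooks;

	if (!e)
		return -1;
	e->ops = ops;
	/* Insert after existing entries of equal or lower priority. */
	while (*pos && (*pos)->ops->priority <= ops->priority)
		pos = &(*pos)->next;
	e->next = *pos;
	*pos = e;
	return 0;
}

/* Walk the (expected to be short) list, unlink the entry and free it. */
static void unregister_net_hook(struct netns *net, const struct hook_ops *ops)
{
	struct hook_entry **pos;

	for (pos = &net->hooks; *pos; pos = &(*pos)->next) {
		if ((*pos)->ops == ops) {
			struct hook_entry *e = *pos;

			*pos = e->next;
			free(e);
			return;
		}
	}
}
```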
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      netfilter: kill nf_hooks_active · 70aa9966
      Eric W. Biederman authored
      The function obscures what is going on in nf_hook_thresh, and its existence
      requires computing the hook list twice.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  5. 18 Jun, 2015 1 commit
      netfilter: don't pull include/linux/netfilter.h from netns headers · a263653e
      Pablo Neira Ayuso authored
      This pulls the full hook netfilter definitions from all those that include
      Instead let's just include the bare minimum required in the new
      linux/netfilter_defs.h file, and use it from the netfilter netns header files.
      I also needed to include in.h and in6.h from linux/netfilter.h, otherwise we hit
      this compilation error:
      In file included from include/linux/netfilter_defs.h:4:0,
                       from include/net/netns/netfilter.h:4,
                       from include/net/net_namespace.h:22,
                       from include/linux/netdevice.h:43,
                       from net/netfilter/nfnetlink_queue_core.c:23:
      include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type
        struct in_addr in;
      Also explicitly include linux/netfilter.h in several spots.
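      The failing field belongs to the address union in include/uapi/linux/netfilter.h (union nf_inet_addr, as we understand it), whose in and in6 members need complete definitions of struct in_addr and struct in6_addr. The userspace sketch below reproduces that shape under the hypothetical name demo_nf_inet_addr; it compiles only because <netinet/in.h> is included, and removing that include yields the same incomplete-type error.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>	/* complete struct in_addr and struct in6_addr */

/* One 16-byte storage area viewed as IPv4 or IPv6 address. Without the
 * include above, `in` and `in6` would have incomplete type. */
union demo_nf_inet_addr {
	uint32_t        all[4];
	uint32_t        ip;
	uint32_t        ip6[4];
	struct in_addr  in;
	struct in6_addr in6;
};
```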
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  6. 14 May, 2015 4 commits
      netfilter: add netfilter ingress hook after handle_ing() under unique static key · e687ad60
      Pablo Neira authored
      This patch adds the Netfilter ingress hook just after the existing tc ingress
      hook, which seems to be the consensus solution for this.
      Note that the Netfilter hook resides under the global static key that enables
      ingress filtering. Nonetheless, Netfilter still also has its own static key for
      minimal impact on the existing handle_ing().
      * Without this patch:
      Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags)
        16086246pps 7721Mb/sec (7721398080bps) errors: 100000000
          42.46%  kpktgend_0   [kernel.kallsyms]   [k] __netif_receive_skb_core
          25.92%  kpktgend_0   [kernel.kallsyms]   [k] kfree_skb
           7.81%  kpktgend_0   [pktgen]            [k] pktgen_thread_worker
           5.62%  kpktgend_0   [kernel.kallsyms]   [k] ip_rcv
           2.70%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_internal
           2.34%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_sk
           1.44%  kpktgend_0   [kernel.kallsyms]   [k] __build_skb
      * With this patch:
      Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags)
        16090536pps 7723Mb/sec (7723457280bps) errors: 100000000
          41.23%  kpktgend_0      [kernel.kallsyms]  [k] __netif_receive_skb_core
          26.57%  kpktgend_0      [kernel.kallsyms]  [k] kfree_skb
           7.72%  kpktgend_0      [pktgen]           [k] pktgen_thread_worker
           5.55%  kpktgend_0      [kernel.kallsyms]  [k] ip_rcv
           2.78%  kpktgend_0      [kernel.kallsyms]  [k] netif_receive_skb_internal
           2.06%  kpktgend_0      [kernel.kallsyms]  [k] netif_receive_skb_sk
           1.43%  kpktgend_0      [kernel.kallsyms]  [k] __build_skb
      * Without this patch + tc ingress:
              tc filter add dev eth4 parent ffff: protocol ip prio 1 \
                      u32 match ip dst
      Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags)
        10788648pps 5178Mb/sec (5178551040bps) errors: 100000000
          40.99%  kpktgend_0   [kernel.kallsyms]  [k] __netif_receive_skb_core
          17.50%  kpktgend_0   [kernel.kallsyms]  [k] kfree_skb
          11.77%  kpktgend_0   [cls_u32]          [k] u32_classify
           5.62%  kpktgend_0   [kernel.kallsyms]  [k] tc_classify_compat
           5.18%  kpktgend_0   [pktgen]           [k] pktgen_thread_worker
           3.23%  kpktgend_0   [kernel.kallsyms]  [k] tc_classify
           2.97%  kpktgend_0   [kernel.kallsyms]  [k] ip_rcv
           1.83%  kpktgend_0   [kernel.kallsyms]  [k] netif_receive_skb_internal
           1.50%  kpktgend_0   [kernel.kallsyms]  [k] netif_receive_skb_sk
           0.99%  kpktgend_0   [kernel.kallsyms]  [k] __build_skb
      * With this patch + tc ingress:
              tc filter add dev eth4 parent ffff: protocol ip prio 1 \
                      u32 match ip dst
      Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags)
        10743194pps 5156Mb/sec (5156733120bps) errors: 100000000
          42.01%  kpktgend_0   [kernel.kallsyms]   [k] __netif_receive_skb_core
          17.78%  kpktgend_0   [kernel.kallsyms]   [k] kfree_skb
          11.70%  kpktgend_0   [cls_u32]           [k] u32_classify
           5.46%  kpktgend_0   [kernel.kallsyms]   [k] tc_classify_compat
           5.16%  kpktgend_0   [pktgen]            [k] pktgen_thread_worker
           2.98%  kpktgend_0   [kernel.kallsyms]   [k] ip_rcv
           2.84%  kpktgend_0   [kernel.kallsyms]   [k] tc_classify
           1.96%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_internal
           1.57%  kpktgend_0   [kernel.kallsyms]   [k] netif_receive_skb_sk
      Note that the results are very similar before and after.
      I can see that gcc moves the code under the ingress static key out of the hot path.
      Then, on that cold branch, it generates the code to accommodate the netfilter
      ingress static key. My explanation for this is that this reduces the pressure
      on the instruction cache for non-users, as the new code is out of the hot path,
      and it comes with minimal impact for tc ingress users.
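      The nesting can be modeled in userspace as below. Real static keys patch jump instructions at runtime; this sketch substitutes plain flags and GCC/Clang's __builtin_expect, so it only shows the branch structure, not the code patching. ingress_needed, nf_ingress_needed, handle_ing(), nf_ingress() and receive() are simplified stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

static bool ingress_needed;	/* outer key: any ingress user (tc or netfilter) */
static bool nf_ingress_needed;	/* inner key: netfilter ingress hooks present */

static int tc_runs, nf_runs;

/* Both return true when the packet should continue. */
static bool handle_ing(void) { tc_runs++; return true; }
static bool nf_ingress(void) { nf_runs++; return true; }

/* Receive path: non-users only pay for the outer, unlikely test; the
 * netfilter check sits entirely on the cold branch. */
static bool receive(void)
{
	if (__builtin_expect(ingress_needed, 0)) {
		if (!handle_ing())
			return false;
		if (__builtin_expect(nf_ingress_needed, 0) && !nf_ingress())
			return false;
	}
	return true;	/* continue to protocol delivery */
}
```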
      Using gcc version 4.8.4 on:
      Architecture:          x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Byte Order:            Little Endian
      CPU(s):                8
      L1d cache:             16K
      L1i cache:             64K
      L2 cache:              2048K
      L3 cache:              8192K
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      netfilter: add nf_hook_list_active() · b8d0aad0
      Pablo Neira authored
      In preparation for a per-device netfilter ingress hook list.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 07 Apr, 2015 3 commits
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller authored
      On the output paths in particular, we sometimes have to deal with two
      socket contexts. The first, usually skb->sk, is the local socket that
      generated the frame.
      The second is potentially the socket used to control a tunneling
      socket, such as one that encapsulates using UDP.
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
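      A simplified sketch of the resulting calling convention, using stripped-down stand-ins for struct sock and struct sk_buff; okfn_t, tunnel_xmit() and hook_then() are hypothetical names. The hook path hands the relevant socket to okfn explicitly, so skb->sk can keep pointing at the originating socket for memory accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Stripped-down stand-ins, not the kernel definitions. */
struct sock { int family; };
struct sk_buff { struct sock *sk; };	/* owner, used for accounting */

/* okfn now receives the socket as an explicit argument. */
typedef int (*okfn_t)(struct sock *sk, struct sk_buff *skb);

static struct sock *seen_sk;

/* Output step that needs the socket in charge of this path, e.g. the UDP
 * tunnel socket, which is not necessarily skb->sk. */
static int tunnel_xmit(struct sock *sk, struct sk_buff *skb)
{
	(void)skb;
	seen_sk = sk;
	return 0;
}

/* Hook traversal passes the caller-supplied socket through unchanged. */
static int hook_then(okfn_t okfn, struct sock *sk, struct sk_buff *skb)
{
	/* ... hooks would run here and could drop the packet ... */
	return okfn(sk, skb);
}
```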
      Signed-off-by: David S. Miller <davem@davemloft.net>
      netfilter: Add socket pointer to nf_hook_state. · 1c984f8a
      David Miller authored
      It is currently always set to NULL, but nf_queue is adjusted to be
      prepared for it being set to a real socket by taking and releasing a
      reference to that socket when necessary.
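      The reference handling might look like this minimal model, in which sk_hold() and sk_put() are hypothetical counters standing in for the kernel's socket reference helpers: a reference is taken only when the socket pointer is non-NULL and dropped when the queued packet is released.

```c
#include <assert.h>
#include <stddef.h>

struct sock { int refcnt; };

struct hook_state {
	struct sock *sk;	/* may be NULL, as it currently always is */
};

/* A queued packet keeps its own copy of the hook state. */
struct queue_entry {
	struct hook_state state;
};

static void sk_hold(struct sock *sk) { sk->refcnt++; }
static void sk_put(struct sock *sk)  { assert(sk->refcnt > 0); sk->refcnt--; }

/* Pin the socket while the packet sits in the queue. */
static void queue_entry_get_refs(struct queue_entry *e)
{
	if (e->state.sk)
		sk_hold(e->state.sk);
}

/* Drop the pin once the verdict is handled. */
static void queue_entry_release_refs(struct queue_entry *e)
{
	if (e->state.sk)
		sk_put(e->state.sk);
}
```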
      Signed-off-by: David S. Miller <davem@davemloft.net>
      netfilter: Add nf_hook_state initializer function. · 107a9f4d
      David Miller authored
      This way we can consolidate where we set up new nf_hook_state objects,
      to make sure the entire thing is initialized.
      The only other place an nf_hook_state object is instantiated is nf_queue,
      wherein a structure copy is used.
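      A sketch of the consolidated initializer, assuming a reduced field set; struct hook_state, hook_state_init() and deliver() are hypothetical names, and the real nf_hook_state may carry different fields. Every caller goes through one function, and the nf_queue-style copy is an ordinary structure assignment.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;
struct sk_buff;

struct hook_state {
	unsigned int hook;
	int thresh;
	unsigned char pf;
	struct net_device *in;
	struct net_device *out;
	int (*okfn)(struct sk_buff *);
};

/* Single place that sets every field, so no call site can forget one. */
static void hook_state_init(struct hook_state *p, unsigned int hook,
			    int thresh, unsigned char pf,
			    struct net_device *indev, struct net_device *outdev,
			    int (*okfn)(struct sk_buff *))
{
	p->hook = hook;
	p->thresh = thresh;
	p->pf = pf;
	p->in = indev;
	p->out = outdev;
	p->okfn = okfn;
}

/* Example continuation function used by the checks. */
static int deliver(struct sk_buff *skb) { (void)skb; return 1; }
```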
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 04 Apr, 2015 2 commits
  9. 25 Aug, 2014 1 commit
  10. 14 Oct, 2013 2 commits
      netfilter: add nftables · 96518518
      Patrick McHardy authored
      This patch adds nftables which is the intended successor of iptables.
      This packet filtering framework reuses the existing netfilter hooks,
      the connection tracking system, the NAT subsystem, the transparent
      proxying engine, the logging infrastructure and the userspace packet
      queueing facilities.
      In a nutshell, nftables provides a pseudo-state machine with 4 general
      purpose registers of 128 bits and 1 specific purpose register to store
      verdicts. This pseudo-machine comes with an extensible instruction set,
      a.k.a. "expressions" in the nftables jargon. The expressions included
      in this patch provide the basic functionality:
      * bitwise: to perform bitwise operations.
      * byteorder: to convert between host and network endianness.
      * cmp: to compare data with the content of the registers.
      * counter: to enable counters on rules.
      * ct: to store conntrack keys into registers.
      * exthdr: to match IPv6 extension headers.
      * immediate: to load data into registers.
      * limit: to limit matching based on packet rate.
      * log: to log packets.
      * meta: to match metainformation that usually comes with the skbuff.
      * nat: to perform Network Address Translation.
      * payload: to fetch data from the packet payload and store it into registers.
      * reject (IPv4 only): to explicitly close a connection, e.g. with a TCP RST.
      Using this instruction-set, the userspace utility 'nft' can transform
      the rules expressed in human-readable text representation (using a
      new syntax, inspired by tcpdump) to nftables bytecode.
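      To make the register-machine description concrete, here is a toy evaluator in the same spirit: four 128-bit data registers, a verdict register, and three expression types (payload, cmp, immediate). The opcodes, struct layout and verdict values are invented for illustration and do not match the real nf_tables encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum verdict { V_CONTINUE, V_ACCEPT, V_DROP };
enum op { OP_PAYLOAD, OP_CMP_EQ, OP_IMMEDIATE };

struct regs {
	uint8_t data[4][16];	/* 4 general purpose registers, 128 bits each */
	enum verdict verdict;	/* special purpose verdict register */
};

struct expr {
	enum op op;
	int reg;		/* data register used (0..3) */
	unsigned int off, len;	/* payload offset/length, or compare length */
	uint8_t imm[16];	/* data to compare against */
	enum verdict v;		/* verdict loaded by OP_IMMEDIATE */
};

/* Run one rule: expressions execute in order, and a failed comparison
 * ends the rule with the verdict left at CONTINUE. */
static enum verdict eval_rule(const struct expr *e, size_t n,
			      const uint8_t *pkt, size_t pktlen)
{
	struct regs r;
	size_t i;

	memset(&r, 0, sizeof(r));
	r.verdict = V_CONTINUE;
	for (i = 0; i < n; i++) {
		if (e[i].reg < 0 || e[i].reg > 3 || e[i].len > 16)
			return V_CONTINUE;	/* malformed expression */
		switch (e[i].op) {
		case OP_PAYLOAD:	/* packet bytes -> data register */
			if (e[i].off + e[i].len > pktlen)
				return V_CONTINUE;
			memcpy(r.data[e[i].reg], pkt + e[i].off, e[i].len);
			break;
		case OP_CMP_EQ:		/* register vs. immediate data */
			if (memcmp(r.data[e[i].reg], e[i].imm, e[i].len) != 0)
				return V_CONTINUE;
			break;
		case OP_IMMEDIATE:	/* load a verdict */
			r.verdict = e[i].v;
			break;
		}
	}
	return r.verdict;
}
```

      For example, the three expressions "load bytes 2..3 into register 1, compare with 0x0016, set verdict DROP" drop packets whose bytes at offset 2 are 00 16 and leave all others to continue.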
      nftables also inherits the table, chain and rule objects from
      iptables, but in a more configurable way, and it also includes the
      original datatype-agnostic set infrastructure with mapping support.
      This set infrastructure is enhanced in the follow up patch (netfilter:
      nf_tables: add netlink set API).
      This patch includes the following components:
      * the netlink API: net/netfilter/nf_tables_api.c and
      * the packet filter core: net/netfilter/nf_tables_core.c
      * the expressions (described above): net/netfilter/nft_*.c
      * the filter tables: arp, IPv4, IPv6 and bridge:
      * the NAT table (IPv4 only):
      * the route table (similar to mangle):
      * internal definitions under:
      * It also includes a skeleton expression:
        and the preliminary implementation of the meta target
      It also includes a change in struct nf_hook_ops to add a new
      pointer to store private data to the hook, that is used to store
      the rule list per chain.
      This patch is based on the patch from Patrick McHardy, plus merged
      accumulated cleanups, fixes and small enhancements to the nftables
      code that has been done since 2009, which are:
      From Patrick McHardy:
      * nf_tables: adjust netlink handler function signatures
      * nf_tables: only retry table lookup after successful table module load
      * nf_tables: fix event notification echo and avoid unnecessary messages
      * nft_ct: add l3proto support
      * nf_tables: pass expression context to nft_validate_data_load()
      * nf_tables: remove redundant definition
      * nft_ct: fix maxattr initialization
      * nf_tables: fix invalid event type in nf_tables_getrule()
      * nf_tables: simplify nft_data_init() usage
      * nf_tables: build in more core modules
      * nf_tables: fix double lookup expression unregistration
      * nf_tables: move expression initialization to nf_tables_core.c
      * nf_tables: build in payload module
      * nf_tables: use NFPROTO constants
      * nf_tables: rename pid variables to portid
      * nf_tables: save 48 bits per rule
      * nf_tables: introduce chain rename
      * nf_tables: check for duplicate names on chain rename
      * nf_tables: remove ability to specify handles for new rules
      * nf_tables: return error for rule change request
      * nf_tables: return error for NLM_F_REPLACE without rule handle
      * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
      * nf_tables: fix NLM_F_MULTI usage in netlink notifications
      * nf_tables: include NLM_F_APPEND in rule dumps
      From Pablo Neira Ayuso:
      * nf_tables: fix stack overflow in nf_tables_newrule
      * nf_tables: nft_ct: fix compilation warning
      * nf_tables: nft_ct: fix crash with invalid packets
      * nft_log: group and qthreshold are 2^16
      * nf_tables: nft_meta: fix socket uid,gid handling
      * nft_counter: allow to restore counters
      * nf_tables: fix module autoload
      * nf_tables: allow to remove all rules placed in one chain
      * nf_tables: use 64-bits rule handle instead of 16-bits
      * nf_tables: fix chain after rule deletion
      * nf_tables: improve deletion performance
      * nf_tables: add missing code in route chain type
      * nf_tables: raise maximum number of expressions from 12 to 128
      * nf_tables: don't delete table if in use
      * nf_tables: fix basechain release
      From Tomasz Bursztyka:
      * nf_tables: Add support for changing users chain's name
      * nf_tables: Change chain's name to be fixed sized
      * nf_tables: Add support for replacing a rule by another one
      * nf_tables: Update uapi nftables netlink header documentation
      From Florian Westphal:
      * nft_log: group is u16, snaplen u32
      From Phil Oester:
      * nf_tables: operational limit match
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: pass hook ops to hookfn · 795aa6ef
      Patrick McHardy authored
      Pass the hook ops to the hookfn to allow for generic hook
      functions. This change is required by nf_tables.
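      A minimal model of why passing the ops helps, assuming simplified types; struct hook_ops with a priv pointer, struct chain and generic_chain_hook() are hypothetical here. A single generic function can serve many chains because it finds its chain through the ops it is called with.

```c
#include <assert.h>
#include <stddef.h>

struct sk_buff { int len; };
struct hook_ops;

/* The hook function receives the ops that registered it. */
typedef unsigned int (*hookfn_t)(const struct hook_ops *ops,
				 struct sk_buff *skb);

struct hook_ops {
	hookfn_t hook;
	void *priv;	/* private data, e.g. the chain's rule list */
};

struct chain { int max_len; };	/* stand-in for a per-chain rule list */

enum { DROP = 0, ACCEPT = 1 };

/* One function for every chain: the chain comes from ops->priv. */
static unsigned int generic_chain_hook(const struct hook_ops *ops,
				       struct sk_buff *skb)
{
	const struct chain *c = ops->priv;

	return skb->len <= c->max_len ? ACCEPT : DROP;
}
```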
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  11. 26 Sep, 2013 1 commit
      netfilter: Remove extern from function prototypes · a0f4ecf3
      Joe Perches authored
      There is a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
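      For example, the two prototypes below declare the same function, because a function declared at file scope has external linkage by default; nf_demo_add() is a hypothetical name.

```c
#include <assert.h>

extern int nf_demo_add(int a, int b);	/* explicit, redundant extern */
int nf_demo_add(int a, int b);		/* same declaration, no extern */

int nf_demo_add(int a, int b)
{
	return a + b;
}
```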
      Signed-off-by: Joe Perches <joe@perches.com>
  12. 27 Aug, 2013 1 commit
  13. 13 Aug, 2013 1 commit
  14. 31 Jul, 2013 2 commits
  15. 23 May, 2013 1 commit
  16. 05 Apr, 2013 1 commit
  17. 13 Oct, 2012 1 commit
  18. 30 Aug, 2012 1 commit
  19. 22 Jun, 2012 1 commit
  20. 20 Jun, 2012 1 commit
  21. 16 Jun, 2012 2 commits
  22. 07 Jun, 2012 1 commit
  23. 21 Apr, 2012 1 commit
  24. 24 Feb, 2012 1 commit
      static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() · c5905afb
      Ingo Molnar authored
      So here's a boot-tested patch on top of Jason's series that does
      all the cleanups I talked about and turns jump labels into a
      facility that is more intuitive to use. It should also address the
      various misconceptions and confusions that surround jump labels.
      Typical usage scenarios:
              #include <linux/static_key.h>
              struct static_key key = STATIC_KEY_INIT_TRUE;
              if (static_key_false(&key))
                      do unlikely code
              else
                      do likely code
      Or:
              if (static_key_true(&key))
                      do likely code
              else
                      do unlikely code
      The static key is modified via static_key_slow_inc(&key) and static_key_slow_dec(&key).
      The 'slow' prefix makes it abundantly clear that this is an
      expensive operation.
      I've updated all in-kernel code to use this everywhere. Note
      that I (intentionally) have not pushed the rename blindly
      through to the lowest levels: the actual jump-label
      patching arch facility should be named like that, so we want to
      decouple jump labels from the static-key facility a bit.
      On architectures without jump-label support, static keys fall back to
      likely()/unlikely() branches.
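      A userspace sketch of that fallback, assuming C11 atomics and GCC/Clang's __builtin_expect: the key is just a counter, the true/false tests become ordinary branches with likely()/unlikely() hints, and the 'slow' updaters only adjust the counter instead of patching code. The demo_ prefixed names are hypothetical, and this mirrors the idea rather than the kernel's exact code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct demo_static_key {
	atomic_int enabled;	/* > 0 means the key is "true" */
};

/* Branch hinted as rarely taken. */
static bool demo_static_key_false(struct demo_static_key *key)
{
	return unlikely(atomic_load(&key->enabled) > 0);
}

/* Branch hinted as usually taken. */
static bool demo_static_key_true(struct demo_static_key *key)
{
	return likely(atomic_load(&key->enabled) > 0);
}

/* With jump labels these would rewrite code; here they adjust a count. */
static void demo_static_key_slow_inc(struct demo_static_key *key)
{
	atomic_fetch_add(&key->enabled, 1);
}

static void demo_static_key_slow_dec(struct demo_static_key *key)
{
	int old = atomic_fetch_sub(&key->enabled, 1);

	assert(old > 0);	/* catches an unbalanced dec */
	(void)old;
}
```

      Because the key is a counter rather than a boolean, independent users can each enable it, and it only reads as false again once every user has called the matching dec.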
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: mathieu.desnoyers@efficios.com
      Cc: davem@davemloft.net
      Cc: ddaney.cavm@gmail.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  25. 21 Nov, 2011 1 commit
      netfilter: use jump_label for nf_hooks · a2d7ec58
      Eric Dumazet authored
      On configs where CONFIG_JUMP_LABEL=y, we can replace a load/compare/conditional
      jump in the fast path with a single jump with no dcache reference.
      The jump target is modified as soon as nf_hooks[pf][hook] switches from
      empty to non-empty state. The jump_label state is kept outside of the
      nf_hooks array, so it has no cost on CPU caches.
      This patch removes the test on CONFIG_NETFILTER_DEBUG: there is no need to
      call nf_hook_slow() at all if nf_hooks[pf][hook] is empty. Doing so didn't
      give useful information, but slowed things down a lot.
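      A simplified model of the fast-path test, assuming a plain counter per [pf][hook] in place of the jump-label key; hooks_needed, demo_nf_hook(), hook_slow() and the array sizes are hypothetical. When nothing is registered for a given [pf][hook], the slow hook walk is never called.

```c
#include <assert.h>

#define DEMO_NPROTO 13
#define DEMO_NHOOKS 8

/* Non-zero when at least one hook is registered for [pf][hook]. In the
 * kernel this is a jump-label key, so the test becomes a patched jump. */
static int hooks_needed[DEMO_NPROTO][DEMO_NHOOKS];
static int slow_calls;

static void demo_register(int pf, int hook)   { hooks_needed[pf][hook]++; }
static void demo_unregister(int pf, int hook) { hooks_needed[pf][hook]--; }

/* Stand-in for the slow path that walks the registered hooks. */
static int hook_slow(void) { slow_calls++; return 1; }

/* NF_HOOK-like entry point: returns 1 when the packet may continue. */
static int demo_nf_hook(int pf, int hook)
{
	if (!hooks_needed[pf][hook])
		return 1;	/* fast path: nothing registered */
	return hook_slow();
}
```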
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 27 May, 2011 1 commit