1. 29 Sep, 2015 1 commit
    • Nikolay Aleksandrov's avatar
      bridge: vlan: add per-vlan struct and move to rhashtables · 2594e906
      Nikolay Aleksandrov authored
      This patch changes the bridge vlan implementation to use rhashtables
      instead of bitmaps. The main motivation behind this change is that we
      need extensible per-vlan structures (both per-port and global) so more
      advanced features can be introduced and the vlan support can be
      extended. I've tried to break this up but the moment net_port_vlans is
      changed and the whole API goes away, thus this is a larger patch.
      A few short goals of this patch are:
      - Extensible per-vlan structs stored in rhashtables and a sorted list
      - Keep user-visible behaviour (compressed vlans etc)
      - Keep fastpath ingress/egress logic the same (optimizations to come
      Here's a brief list of some of the new features we'd like to introduce:
      - per-vlan counters
      - vlan ingress/egress mapping
      - per-vlan igmp configuration
      - vlan priorities
      - avoid fdb entries replication (e.g. local fdb scaling issues)
      The structure is kept single for both global and per-port entries so to
      avoid code duplication where possible and also because we'll soon introduce
      "port0 / aka bridge as port" which should simplify things further
      (thanks to Vlad for the suggestion!).
      Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port
      rhashtable, if an entry is added to a port it'll get a pointer to its
      global context so it can be quickly accessed later. There's also a
      sorted vlan list which is used for stable walks and some user-visible
      behaviour such as the vlan ranges, also for error paths.
      VLANs are stored in a "vlan group" which currently contains the
      rhashtable, sorted vlan list and the number of "real" vlan entries.
      A good side-effect of this change is that it resembles how hw keeps
      per-vlan data.
      One important note after this change is that if a VLAN is being looked up
      in the bridge's rhashtable for filtering purposes (or to check if it's an
      existing usable entry, not just a global context) then the new helper
      br_vlan_should_use() needs to be used if the vlan is found. In case the
      lookup is done only with a port's vlan group, then this check can be
      Things tested so far:
      - basic vlan ingress/egress
      - pvids
      - untagged vlans
      - adding/deleting vlans in different scenarios (with/without global ctx,
        while transmitting traffic, in ranges etc)
      - loading/removing the module while having/adding/deleting vlans
      - extracting bridge vlan information (user ABI), compressed requests
      - adding/deleting fdbs on vlans
      - bridge mac change, promisc mode
      - default pvid change
      - kmemleak ON during the whole time
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 18 Sep, 2015 2 commits
    • Eric W. Biederman's avatar
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman authored
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric W. Biederman's avatar
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman authored
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  3. 29 Jul, 2015 1 commit
    • Toshiaki Makita's avatar
      bridge: Fix network header pointer for vlan tagged packets · df356d5e
      Toshiaki Makita authored
      There are several devices that can receive vlan tagged packets with
      CHECKSUM_PARTIAL like tap, possibly veth and xennet.
      When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
      by bridge to a device with the IP_CSUM feature, they end up with checksum
      error because before entering bridge, the network header is set to
      ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
      get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.
      Since the network header is exepected to be ETH_HLEN in flow-dissection
      and hash-calculation in RPS in rx path, and since the header pointer fix
      is needed only in tx path, set the appropriate network header on forwarding
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  4. 10 Jul, 2015 1 commit
  5. 07 Apr, 2015 1 commit
    • David Miller's avatar
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller authored
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  6. 09 Mar, 2015 1 commit
  7. 05 Mar, 2015 1 commit
    • Jouni Malinen's avatar
      bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi · 842a9ae0
      Jouni Malinen authored
      This extends the design in commit 95850116
       ("bridge: Add support for
      IEEE 802.11 Proxy ARP") with optional set of rules that are needed to
      meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
      previously added BR_PROXYARP behavior is left as-is and a new
      BR_PROXYARP_WIFI alternative is added so that this behavior can be
      configured from user space when required.
      In addition, this enables proxyarp functionality for unicast ARP
      requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
      to use unicast as well as broadcast for these frames.
      The key differences in functionality:
      - uses the flag on the bridge port on which the request frame was
        received to determine whether to reply
      - block bridge port flooding completely on ports that enable proxy ARP
      - uses the flag on the bridge port to which the target device of the
        request belongs
      - block bridge port flooding selectively based on whether the proxyarp
        functionality replied
      Signed-off-by: default avatarJouni Malinen <jouni@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  8. 31 Oct, 2014 1 commit
    • Pablo Neira Ayuso's avatar
      netfilter: nft_reject_bridge: don't use IP stack to reject traffic · 523b929d
      Pablo Neira Ayuso authored
      If the packet is received via the bridge stack, this cannot reject
      packets from the IP stack.
      This adds functions to build the reject packet and send it from the
      bridge stack. Comments and assumptions on this patch:
      1) Validate the IPv4 and IPv6 headers before further processing,
         given that the packet comes from the bridge stack, we cannot assume
         they are clean. Truncated packets are dropped, we follow similar
         approach in the existing iptables match/target extensions that need
         to inspect layer 4 headers that is not available. This also includes
         packets that are directed to multicast and broadcast ethernet
      2) br_deliver() is exported to inject the reject packet via
         bridge localout -> postrouting. So the approach is similar to what
         we already do in the iptables reject target. The reject packet is
         sent to the bridge port from which we have received the original
      3) The reject packet is forged based on the original packet. The TTL
         is set based on sysctl_ip_default_ttl for IPv4 and per-net
         ipv6.devconf_all hoplimit for IPv6.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
  9. 27 Oct, 2014 1 commit
    • Kyeyoon Park's avatar
      bridge: Add support for IEEE 802.11 Proxy ARP · 95850116
      Kyeyoon Park authored
      This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
      the AP devices to keep track of the hardware-address-to-IP-address
      mapping of the mobile devices within the WLAN network.
      The AP will learn this mapping via observing DHCP, ARP, and NS/NA
      frames. When a request for such information is made (i.e. ARP request,
      Neighbor Solicitation), the AP will respond on behalf of the
      associated mobile device. In the process of doing so, the AP will drop
      the multicast request frame that was intended to go out to the wireless
      It was recommended at the LKS workshop to do this implementation in
      the bridge layer. vxlan.c is already doing something very similar.
      The DHCP snooping code will be added to the userspace application
      (hostapd) per the recommendation.
      This RFC commit is only for IPv4. A similar approach in the bridge
      layer will be taken for IPv6 as well.
      Signed-off-by: default avatarKyeyoon Park <kyeyoonp@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  10. 26 Sep, 2014 1 commit
    • Pablo Neira Ayuso's avatar
      netfilter: bridge: move br_netfilter out of the core · 34666d46
      Pablo Neira Ayuso authored
      Jesper reported that br_netfilter always registers the hooks since
      this is part of the bridge core. This harms performance for people that
      don't need this.
      This patch modularizes br_netfilter so it can be rmmod'ed, thus,
      the hooks can be unregistered. I think the bridge netfilter should have
      been a separated module since the beginning, Patrick agreed on that.
      Note that this is breaking compatibility for users that expect that
      bridge netfilter is going to be available after explicitly 'modprobe
      bridge' or via automatic load through brctl.
      However, the damage can be easily undone by modprobing br_netfilter.
      The bridge core also spots a message to provide a clue to people that
      didn't notice that this has been deprecated.
      On top of that, the plan is that nftables will not rely on this software
      layer, but integrate the connection tracking into the bridge layer to
      enable stateful filtering and NAT, which is was bridge netfilter users
      seem to require.
      This patch still keeps the fake_dst_ops in the bridge core, since this
      is required by when the bridge port is initialized. So we can safely
      modprobe/rmmod br_netfilter anytime.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
  11. 31 Mar, 2014 1 commit
  12. 20 Dec, 2013 1 commit
  13. 18 Dec, 2013 1 commit
  14. 11 Jun, 2013 1 commit
  15. 14 Feb, 2013 2 commits
  16. 14 Aug, 2012 1 commit
  17. 24 Apr, 2012 1 commit
  18. 15 Apr, 2012 1 commit
  19. 09 Dec, 2011 1 commit
  20. 06 Jan, 2011 1 commit
  21. 15 Nov, 2010 1 commit
  22. 20 Jul, 2010 1 commit
  23. 16 Jun, 2010 1 commit
  24. 15 Jun, 2010 1 commit
    • Herbert Xu's avatar
      bridge: Fix netpoll support · 91d2c34a
      Herbert Xu authored
      There are multiple problems with the newly added netpoll support:
      1) Use-after-free on each netpoll packet.
      2) Invoking unsafe code on netpoll/IRQ path.
      3) Breaks when netpoll is enabled on the underlying device.
      This patch fixes all of these problems.  In particular, we now
      allocate proper netpoll structures for each underlying device.
      We only allow netpoll to be enabled on the bridge when all the
      devices underneath it support netpoll.  Once it is enabled, we
      do not allow non-netpoll devices to join the bridge (until netpoll
      is disabled again).
      This allows us to do away with the npinfo juggling that caused
      problem number 1.
      Incidentally this patch fixes number 2 by bypassing unsafe code
      such as multicast snooping and netfilter.
      Reported-by: default avatarQianfeng Zhang <frzhang@redhat.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  25. 06 May, 2010 1 commit
    • WANG Cong's avatar
      bridge: make bridge support netpoll · c06ee961
      WANG Cong authored
      Based on the previous patch, make bridge support netpoll by:
      1) implement the 2 methods to support netpoll for bridge;
      2) modify netpoll during forwarding packets via bridge;
      3) disable netpoll support of bridge when a netpoll-unabled device
         is added to bridge;
      4) enable netpoll support when all underlying devices support netpoll.
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stephen Hemminger <shemminger@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: default avatarWANG Cong <amwang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  26. 28 Apr, 2010 3 commits
  27. 13 Apr, 2010 1 commit
  28. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      The script does the followings.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  29. 25 Mar, 2010 1 commit
    • Jan Engelhardt's avatar
      netfilter: bridge: use NFPROTO values for NF_HOOK invocation · 713aefa3
      Jan Engelhardt authored
      The first argument to NF_HOOK* is an nfproto since quite some time.
      Commit v2.6.27-2457-gfdc9314c
       was the first to practically start using
      the new names. Do that now for the remaining NF_HOOK calls.
      The semantic patch used was:
      // <smpl>
      // </smpl>
      Signed-off-by: default avatarJan Engelhardt <jengelh@medozas.de>
  30. 16 Mar, 2010 2 commits
    • David S. Miller's avatar
      bridge: Make first arg to deliver_clone const. · 87faf3cc
      David S. Miller authored
      Otherwise we get a warning from the call in br_forward().
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Michael Braun's avatar
      bridge: Fix br_forward crash in promiscuous mode · 7f7708f0
      Michael Braun authored
      From: Michael Braun <michael-dev@fami-braun.de>
      bridge: Fix br_forward crash in promiscuous mode
      It's a linux-next kernel from 2010-03-12 on an x86 system and it
      OOPs in the bridge module in br_pass_frame_up (called by
      br_handle_frame_finish) because brdev cannot be dereferenced (its set to
      a non-null value).
      Adding some BUG_ON statements revealed that
       BR_INPUT_SKB_CB(skb)->brdev == br-dev
      (as set in br_handle_frame_finish first)
      only holds until br_forward is called.
      The next call to br_pass_frame_up then fails.
      Digging deeper it seems that br_forward either frees the skb or passes
      it to NF_HOOK which will in turn take care of freeing the skb. The
      same is holds for br_pass_frame_ip. So it seems as if two independent
      skb allocations are required. As far as I can see, commit
       ("bridge: Avoid unnecessary
      clone on forward path") removed skb duplication and so likely causes
      this crash. This crash does not happen on 2.6.33.
      I've therefore modified br_forward the same way br_flood has been
      modified so that the skb is not freed if skb0 is going to be used
      and I can confirm that the attached patch resolves the issue for me.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  31. 28 Feb, 2010 4 commits
  32. 13 Aug, 2009 1 commit