1. 18 Sep, 2015 2 commits
    • Eric W. Biederman's avatar
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman authored
      
      
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • Eric W. Biederman's avatar
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman authored
      
      
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29a26a56
  2. 29 Jul, 2015 1 commit
    • Toshiaki Makita's avatar
      bridge: Fix network header pointer for vlan tagged packets · df356d5e
      Toshiaki Makita authored
      
      
      There are several devices that can receive vlan tagged packets with
      CHECKSUM_PARTIAL like tap, possibly veth and xennet.
      When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
      by bridge to a device with the IP_CSUM feature, they end up with checksum
      error because before entering bridge, the network header is set to
      ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
      get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.
      
      Since the network header is exepected to be ETH_HLEN in flow-dissection
      and hash-calculation in RPS in rx path, and since the header pointer fix
      is needed only in tx path, set the appropriate network header on forwarding
      packets.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df356d5e
  3. 10 Jul, 2015 1 commit
  4. 07 Apr, 2015 1 commit
    • David Miller's avatar
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller authored
      
      
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7026b1dd
  5. 09 Mar, 2015 1 commit
  6. 05 Mar, 2015 1 commit
    • Jouni Malinen's avatar
      bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi · 842a9ae0
      Jouni Malinen authored
      This extends the design in commit 95850116
      
       ("bridge: Add support for
      IEEE 802.11 Proxy ARP") with optional set of rules that are needed to
      meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
      previously added BR_PROXYARP behavior is left as-is and a new
      BR_PROXYARP_WIFI alternative is added so that this behavior can be
      configured from user space when required.
      
      In addition, this enables proxyarp functionality for unicast ARP
      requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
      to use unicast as well as broadcast for these frames.
      
      The key differences in functionality:
      
      BR_PROXYARP:
      - uses the flag on the bridge port on which the request frame was
        received to determine whether to reply
      - block bridge port flooding completely on ports that enable proxy ARP
      
      BR_PROXYARP_WIFI:
      - uses the flag on the bridge port to which the target device of the
        request belongs
      - block bridge port flooding selectively based on whether the proxyarp
        functionality replied
      Signed-off-by: default avatarJouni Malinen <jouni@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      842a9ae0
  7. 31 Oct, 2014 1 commit
    • Pablo Neira Ayuso's avatar
      netfilter: nft_reject_bridge: don't use IP stack to reject traffic · 523b929d
      Pablo Neira Ayuso authored
      
      
      If the packet is received via the bridge stack, this cannot reject
      packets from the IP stack.
      
      This adds functions to build the reject packet and send it from the
      bridge stack. Comments and assumptions on this patch:
      
      1) Validate the IPv4 and IPv6 headers before further processing,
         given that the packet comes from the bridge stack, we cannot assume
         they are clean. Truncated packets are dropped, we follow similar
         approach in the existing iptables match/target extensions that need
         to inspect layer 4 headers that is not available. This also includes
         packets that are directed to multicast and broadcast ethernet
         addresses.
      
      2) br_deliver() is exported to inject the reject packet via
         bridge localout -> postrouting. So the approach is similar to what
         we already do in the iptables reject target. The reject packet is
         sent to the bridge port from which we have received the original
         packet.
      
      3) The reject packet is forged based on the original packet. The TTL
         is set based on sysctl_ip_default_ttl for IPv4 and per-net
         ipv6.devconf_all hoplimit for IPv6.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      523b929d
  8. 27 Oct, 2014 1 commit
    • Kyeyoon Park's avatar
      bridge: Add support for IEEE 802.11 Proxy ARP · 95850116
      Kyeyoon Park authored
      
      
      This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
      the AP devices to keep track of the hardware-address-to-IP-address
      mapping of the mobile devices within the WLAN network.
      
      The AP will learn this mapping via observing DHCP, ARP, and NS/NA
      frames. When a request for such information is made (i.e. ARP request,
      Neighbor Solicitation), the AP will respond on behalf of the
      associated mobile device. In the process of doing so, the AP will drop
      the multicast request frame that was intended to go out to the wireless
      medium.
      
      It was recommended at the LKS workshop to do this implementation in
      the bridge layer. vxlan.c is already doing something very similar.
      The DHCP snooping code will be added to the userspace application
      (hostapd) per the recommendation.
      
      This RFC commit is only for IPv4. A similar approach in the bridge
      layer will be taken for IPv6 as well.
      Signed-off-by: default avatarKyeyoon Park <kyeyoonp@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95850116
  9. 26 Sep, 2014 1 commit
    • Pablo Neira Ayuso's avatar
      netfilter: bridge: move br_netfilter out of the core · 34666d46
      Pablo Neira Ayuso authored
      
      
      Jesper reported that br_netfilter always registers the hooks since
      this is part of the bridge core. This harms performance for people that
      don't need this.
      
      This patch modularizes br_netfilter so it can be rmmod'ed, thus,
      the hooks can be unregistered. I think the bridge netfilter should have
      been a separated module since the beginning, Patrick agreed on that.
      
      Note that this is breaking compatibility for users that expect that
      bridge netfilter is going to be available after explicitly 'modprobe
      bridge' or via automatic load through brctl.
      
      However, the damage can be easily undone by modprobing br_netfilter.
      The bridge core also spots a message to provide a clue to people that
      didn't notice that this has been deprecated.
      
      On top of that, the plan is that nftables will not rely on this software
      layer, but integrate the connection tracking into the bridge layer to
      enable stateful filtering and NAT, which is was bridge netfilter users
      seem to require.
      
      This patch still keeps the fake_dst_ops in the bridge core, since this
      is required by when the bridge port is initialized. So we can safely
      modprobe/rmmod br_netfilter anytime.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      34666d46
  10. 31 Mar, 2014 1 commit
  11. 20 Dec, 2013 1 commit
  12. 18 Dec, 2013 1 commit
  13. 11 Jun, 2013 1 commit
  14. 14 Feb, 2013 2 commits
  15. 14 Aug, 2012 1 commit
  16. 24 Apr, 2012 1 commit
  17. 15 Apr, 2012 1 commit
  18. 09 Dec, 2011 1 commit
  19. 06 Jan, 2011 1 commit
  20. 15 Nov, 2010 1 commit
  21. 20 Jul, 2010 1 commit
  22. 16 Jun, 2010 1 commit
  23. 15 Jun, 2010 1 commit
    • Herbert Xu's avatar
      bridge: Fix netpoll support · 91d2c34a
      Herbert Xu authored
      
      
      There are multiple problems with the newly added netpoll support:
      
      1) Use-after-free on each netpoll packet.
      2) Invoking unsafe code on netpoll/IRQ path.
      3) Breaks when netpoll is enabled on the underlying device.
      
      This patch fixes all of these problems.  In particular, we now
      allocate proper netpoll structures for each underlying device.
      
      We only allow netpoll to be enabled on the bridge when all the
      devices underneath it support netpoll.  Once it is enabled, we
      do not allow non-netpoll devices to join the bridge (until netpoll
      is disabled again).
      
      This allows us to do away with the npinfo juggling that caused
      problem number 1.
      
      Incidentally this patch fixes number 2 by bypassing unsafe code
      such as multicast snooping and netfilter.
      Reported-by: default avatarQianfeng Zhang <frzhang@redhat.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91d2c34a
  24. 06 May, 2010 1 commit
    • WANG Cong's avatar
      bridge: make bridge support netpoll · c06ee961
      WANG Cong authored
      
      
      Based on the previous patch, make bridge support netpoll by:
      
      1) implement the 2 methods to support netpoll for bridge;
      
      2) modify netpoll during forwarding packets via bridge;
      
      3) disable netpoll support of bridge when a netpoll-unabled device
         is added to bridge;
      
      4) enable netpoll support when all underlying devices support netpoll.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stephen Hemminger <shemminger@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: default avatarWANG Cong <amwang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c06ee961
  25. 28 Apr, 2010 3 commits
  26. 13 Apr, 2010 1 commit
  27. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  28. 25 Mar, 2010 1 commit
    • Jan Engelhardt's avatar
      netfilter: bridge: use NFPROTO values for NF_HOOK invocation · 713aefa3
      Jan Engelhardt authored
      The first argument to NF_HOOK* is an nfproto since quite some time.
      Commit v2.6.27-2457-gfdc9314c
      
       was the first to practically start using
      the new names. Do that now for the remaining NF_HOOK calls.
      
      The semantic patch used was:
      // <smpl>
      @@
      @@
      (NF_HOOK
      |NF_HOOK_THRESH
      )(
      -PF_BRIDGE,
      +NFPROTO_BRIDGE,
       ...)
      
      @@
      @@
       NF_HOOK(
      -PF_INET6,
      +NFPROTO_IPV6,
       ...)
      
      @@
      @@
       NF_HOOK(
      -PF_INET,
      +NFPROTO_IPV4,
       ...)
      // </smpl>
      Signed-off-by: default avatarJan Engelhardt <jengelh@medozas.de>
      713aefa3
  29. 16 Mar, 2010 2 commits
    • David S. Miller's avatar
      bridge: Make first arg to deliver_clone const. · 87faf3cc
      David S. Miller authored
      
      
      Otherwise we get a warning from the call in br_forward().
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87faf3cc
    • Michael Braun's avatar
      bridge: Fix br_forward crash in promiscuous mode · 7f7708f0
      Michael Braun authored
      From: Michael Braun <michael-dev@fami-braun.de>
      
      bridge: Fix br_forward crash in promiscuous mode
      
      It's a linux-next kernel from 2010-03-12 on an x86 system and it
      OOPs in the bridge module in br_pass_frame_up (called by
      br_handle_frame_finish) because brdev cannot be dereferenced (its set to
      a non-null value).
      
      Adding some BUG_ON statements revealed that
       BR_INPUT_SKB_CB(skb)->brdev == br-dev
      (as set in br_handle_frame_finish first)
      only holds until br_forward is called.
      The next call to br_pass_frame_up then fails.
      
      Digging deeper it seems that br_forward either frees the skb or passes
      it to NF_HOOK which will in turn take care of freeing the skb. The
      same is holds for br_pass_frame_ip. So it seems as if two independent
      skb allocations are required. As far as I can see, commit
      b33084be
      
       ("bridge: Avoid unnecessary
      clone on forward path") removed skb duplication and so likely causes
      this crash. This crash does not happen on 2.6.33.
      
      I've therefore modified br_forward the same way br_flood has been
      modified so that the skb is not freed if skb0 is going to be used
      and I can confirm that the attached patch resolves the issue for me.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f7708f0
  30. 28 Feb, 2010 4 commits
  31. 13 Aug, 2009 1 commit
  32. 09 Feb, 2009 1 commit
    • Herbert Xu's avatar
      bridge: Fix LRO crash with tun · 4906f998
      Herbert Xu authored
      
      
      > Kernel BUG at drivers/net/tun.c:444
      > invalid opcode: 0000 [1] SMP
      > last sysfs file: /class/net/lo/ifindex
      > CPU 0
      > Modules linked in: tun ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack
      > nfnetlink ipt_REJECT xt_tcpudp iptable_filter d
      > Pid: 6912, comm: qemu-kvm Tainted: G      2.6.18-128.el5 #1
      > RIP: 0010:[<ffffffff886f57b0>]  [<ffffffff886f57b0>]
      > :tun:tun_chr_readv+0x2b1/0x3a6
      > RSP: 0018:ffff8102202c5e48  EFLAGS: 00010246
      > RAX: 0000000000000000 RBX: ffff8102202c5e98 RCX: 0000000004010000
      > RDX: ffff810227063680 RSI: ffff8102202c5e9e RDI: ffff8102202c5e92
      > RBP: 0000000000010ff6 R08: 0000000000000000 R09: 0000000000000001
      > R10: ffff8102202c5e94 R11: 0000000000000202 R12: ffff8102275357c0
      > R13: ffff81022755e500 R14: 0000000000000000 R15: ffff8102202c5ef8
      > FS:  00002ae4398db980(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      > CR2: 00002ae4ab514000 CR3: 0000000221344000 CR4: 00000000000026e0
      > Process qemu-kvm (pid: 6912, threadinfo ffff8102202c4000, task
      > ffff81022e58d820)
      > Stack:  00000000498735cb ffff810229d1a3c0 0000000000000000 ffff81022e58d820
      >  ffffffff8008a461 ffff81022755e528 ffff81022755e528 ffffffff8009f925
      >  000005ea05ea0000 ffff8102209d0000 00001051143e1600 ffffffff8003c00e
      > Call Trace:
      >  [<ffffffff8008a461>] default_wake_function+0x0/0xe
      >  [<ffffffff8009f925>] enqueue_hrtimer+0x55/0x70
      >  [<ffffffff8003c00e>] hrtimer_start+0xbc/0xce
      >  [<ffffffff886f58bf>] :tun:tun_chr_read+0x1a/0x1f
      >  [<ffffffff8000b3f3>] vfs_read+0xcb/0x171
      >  [<ffffffff800117d4>] sys_read+0x45/0x6e
      >  [<ffffffff8005d116>] system_call+0x7e/0x83
      >
      >
      > Code: 0f 0b 68 40 62 6f 88 c2 bc 01 f6 42 0a 08 74 0c 80 4c 24 41
      > RIP  [<ffffffff886f57b0>] :tun:tun_chr_readv+0x2b1/0x3a6
      >  RSP <ffff8102202c5e48>
      >  <0>Kernel panic - not syncing: Fatal exception
      
      This crashed when an LRO packet generated by bnx2x reached a
      tun device through the bridge.  We're supposed to drop it at
      the bridge.  However, because the check was placed in br_forward
      instead of __br_forward, it's only effective if we are sending
      the packet through a single port.
      
      This patch fixes it by moving the check into __br_forward.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4906f998