1. 09 Jul, 2016 1 commit
  2. 30 Jun, 2016 1 commit
    • Nikolay Aleksandrov's avatar
      net: bridge: add support for IGMP/MLD stats and export them via netlink · 1080ab95
      Nikolay Aleksandrov authored
      
      
      This patch adds stats support for the currently used IGMP/MLD types by the
      bridge. The stats are per-port (plus one stat per-bridge) and per-direction
      (RX/TX). The stats are exported via netlink via the new linkxstats API
      (RTM_GETSTATS). In order to minimize the performance impact, a new option
      is used to enable/disable the stats - multicast_stats_enabled, similar to
      the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
      lookups and checks, we make use of the current "igmp" member of the bridge
      private skb->cb region to record the type on Rx (both host-generated and
      external packets pass by multicast_rcv()). We can do that since the igmp
      member was used as a boolean and all the valid IGMP/MLD types are positive
      values. The normal bridge fast-path is not affected at all, the only
      affected paths are the flooding ones and since we make use of the IGMP/MLD
      type, we can quickly determine if the packet should be counted using
      cache-hot data (cb's igmp member). We add counters for:
      * IGMP Queries
      * IGMP Leaves
      * IGMP v1/v2/v3 reports
      
      * MLD Queries
      * MLD Leaves
      * MLD v1/v2 reports
      
      These are invaluable when monitoring or debugging complex multicast setups
      with bridges.
      
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1080ab95
  3. 11 Jun, 2016 1 commit
    • Ido Schimmel's avatar
      bridge: Fix incorrect re-injection of STP packets · 56fae404
      Ido Schimmel authored
      Commit 8626c56c ("bridge: fix potential use-after-free when hook
      returns QUEUE or STOLEN verdict") fixed incorrect usage of NF_HOOK's
      return value by consuming packets in okfn via br_pass_frame_up().
      
      However, this function re-injects packets to the Rx path with skb->dev
      set to the bridge device, which breaks kernel's STP, as all STP packets
      appear to originate from the bridge device itself.
      
      Instead, if STP is enabled and bridge isn't a 802.1ad bridge, then learn
      packet's SMAC and inject it back to the Rx path for further processing
      by the packet handlers.
      
      The patch also makes netfilter's behavior consistent with regards to
      packets destined to the Bridge Group Address, as no hook registered at
      LOCAL_IN will ever be called, regardless if STP is enabled or not.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Fixes: 8626c56c
      
       ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      56fae404
  4. 14 Mar, 2016 1 commit
    • Florian Westphal's avatar
      bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict · 8626c56c
      Florian Westphal authored
      
      
      Zefir Kurtisi reported kernel panic with an openwrt specific patch.
      However, it turns out that mainline has a similar bug waiting to happen.
      
      Once NF_HOOK() returns the skb is in undefined state and must not be
      used.   Moreover, the okfn must consume the skb to support async
      processing (NF_QUEUE).
      
      Current okfn in this spot doesn't consume it and caller assumes that
      NF_HOOK return value tells us if skb was freed or not, but thats wrong.
      
      It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at
      LOCAL_IN that returns STOLEN or NF_QUEUE verdicts.
      
      Once we add NF_QUEUE support for nftables bridge this will break --
      NF_QUEUE holds the skb for async processing, caller will erronoulsy
      return RX_HANDLER_PASS and on reinject netfilter will access free'd skb.
      
      Fix this by pushing skb up the stack in the okfn instead.
      
      NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING
      completely in this case but this is how its been forever so it seems
      preferable to not change this.
      
      Cc: Felix Fietkau <nbd@openwrt.org>
      Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Tested-by: default avatarZefir Kurtisi <zefir.kurtisi@neratec.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8626c56c
  5. 13 Oct, 2015 1 commit
  6. 02 Oct, 2015 1 commit
  7. 29 Sep, 2015 1 commit
    • Nikolay Aleksandrov's avatar
      bridge: vlan: add per-vlan struct and move to rhashtables · 2594e906
      Nikolay Aleksandrov authored
      
      
      This patch changes the bridge vlan implementation to use rhashtables
      instead of bitmaps. The main motivation behind this change is that we
      need extensible per-vlan structures (both per-port and global) so more
      advanced features can be introduced and the vlan support can be
      extended. I've tried to break this up but the moment net_port_vlans is
      changed and the whole API goes away, thus this is a larger patch.
      A few short goals of this patch are:
      - Extensible per-vlan structs stored in rhashtables and a sorted list
      - Keep user-visible behaviour (compressed vlans etc)
      - Keep fastpath ingress/egress logic the same (optimizations to come
        later)
      
      Here's a brief list of some of the new features we'd like to introduce:
      - per-vlan counters
      - vlan ingress/egress mapping
      - per-vlan igmp configuration
      - vlan priorities
      - avoid fdb entries replication (e.g. local fdb scaling issues)
      
      The structure is kept single for both global and per-port entries so to
      avoid code duplication where possible and also because we'll soon introduce
      "port0 / aka bridge as port" which should simplify things further
      (thanks to Vlad for the suggestion!).
      
      Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port
      rhashtable, if an entry is added to a port it'll get a pointer to its
      global context so it can be quickly accessed later. There's also a
      sorted vlan list which is used for stable walks and some user-visible
      behaviour such as the vlan ranges, also for error paths.
      VLANs are stored in a "vlan group" which currently contains the
      rhashtable, sorted vlan list and the number of "real" vlan entries.
      A good side-effect of this change is that it resembles how hw keeps
      per-vlan data.
      One important note after this change is that if a VLAN is being looked up
      in the bridge's rhashtable for filtering purposes (or to check if it's an
      existing usable entry, not just a global context) then the new helper
      br_vlan_should_use() needs to be used if the vlan is found. In case the
      lookup is done only with a port's vlan group, then this check can be
      skipped.
      
      Things tested so far:
      - basic vlan ingress/egress
      - pvids
      - untagged vlans
      - undef CONFIG_BRIDGE_VLAN_FILTERING
      - adding/deleting vlans in different scenarios (with/without global ctx,
        while transmitting traffic, in ranges etc)
      - loading/removing the module while having/adding/deleting vlans
      - extracting bridge vlan information (user ABI), compressed requests
      - adding/deleting fdbs on vlans
      - bridge mac change, promisc mode
      - default pvid change
      - kmemleak ON during the whole time
      
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2594e906
  8. 18 Sep, 2015 3 commits
    • Eric W. Biederman's avatar
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman authored
      
      
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • Eric W. Biederman's avatar
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman authored
      
      
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29a26a56
    • Eric W. Biederman's avatar
      bridge: Add br_netif_receive_skb remove netif_receive_skb_sk · 04eb4489
      Eric W. Biederman authored
      
      
      netif_receive_skb_sk is only called once in the bridge code, replace
      it with a bridge specific function that calls netif_receive_skb.
      
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04eb4489
  9. 07 Apr, 2015 1 commit
    • David Miller's avatar
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller authored
      
      
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7026b1dd
  10. 05 Mar, 2015 1 commit
    • Jouni Malinen's avatar
      bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi · 842a9ae0
      Jouni Malinen authored
      This extends the design in commit 95850116
      
       ("bridge: Add support for
      IEEE 802.11 Proxy ARP") with optional set of rules that are needed to
      meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
      previously added BR_PROXYARP behavior is left as-is and a new
      BR_PROXYARP_WIFI alternative is added so that this behavior can be
      configured from user space when required.
      
      In addition, this enables proxyarp functionality for unicast ARP
      requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
      to use unicast as well as broadcast for these frames.
      
      The key differences in functionality:
      
      BR_PROXYARP:
      - uses the flag on the bridge port on which the request frame was
        received to determine whether to reply
      - block bridge port flooding completely on ports that enable proxy ARP
      
      BR_PROXYARP_WIFI:
      - uses the flag on the bridge port to which the target device of the
        request belongs
      - block bridge port flooding selectively based on whether the proxyarp
        functionality replied
      
      Signed-off-by: default avatarJouni Malinen <jouni@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      842a9ae0
  11. 14 Jan, 2015 1 commit
    • Arnd Bergmann's avatar
      bridge: only provide proxy ARP when CONFIG_INET is enabled · d92cfdbb
      Arnd Bergmann authored
      
      
      When IPV4 support is disabled, we cannot call arp_send from
      the bridge code, which would result in a kernel link error:
      
      net/built-in.o: In function `br_handle_frame_finish':
      :(.text+0x59914): undefined reference to `arp_send'
      :(.text+0x59a50): undefined reference to `arp_tbl'
      
      This makes the newly added proxy ARP support in the bridge
      code depend on the CONFIG_INET symbol and lets the compiler
      optimize the code out to avoid the link error.
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 95850116
      
       ("bridge: Add support for IEEE 802.11 Proxy ARP")
      Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d92cfdbb
  12. 27 Oct, 2014 1 commit
    • Kyeyoon Park's avatar
      bridge: Add support for IEEE 802.11 Proxy ARP · 95850116
      Kyeyoon Park authored
      
      
      This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows
      the AP devices to keep track of the hardware-address-to-IP-address
      mapping of the mobile devices within the WLAN network.
      
      The AP will learn this mapping via observing DHCP, ARP, and NS/NA
      frames. When a request for such information is made (i.e. ARP request,
      Neighbor Solicitation), the AP will respond on behalf of the
      associated mobile device. In the process of doing so, the AP will drop
      the multicast request frame that was intended to go out to the wireless
      medium.
      
      It was recommended at the LKS workshop to do this implementation in
      the bridge layer. vxlan.c is already doing something very similar.
      The DHCP snooping code will be added to the userspace application
      (hostapd) per the recommendation.
      
      This RFC commit is only for IPv4. A similar approach in the bridge
      layer will be taken for IPv6 as well.
      
      Signed-off-by: default avatarKyeyoon Park <kyeyoonp@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      95850116
  13. 26 Sep, 2014 1 commit
    • Pablo Neira Ayuso's avatar
      netfilter: bridge: move br_netfilter out of the core · 34666d46
      Pablo Neira Ayuso authored
      
      
      Jesper reported that br_netfilter always registers the hooks since
      this is part of the bridge core. This harms performance for people that
      don't need this.
      
      This patch modularizes br_netfilter so it can be rmmod'ed, thus,
      the hooks can be unregistered. I think the bridge netfilter should have
      been a separated module since the beginning, Patrick agreed on that.
      
      Note that this is breaking compatibility for users that expect that
      bridge netfilter is going to be available after explicitly 'modprobe
      bridge' or via automatic load through brctl.
      
      However, the damage can be easily undone by modprobing br_netfilter.
      The bridge core also spots a message to provide a clue to people that
      didn't notice that this has been deprecated.
      
      On top of that, the plan is that nftables will not rely on this software
      layer, but integrate the connection tracking into the bridge layer to
      enable stateful filtering and NAT, which is was bridge netfilter users
      seem to require.
      
      This patch still keeps the fake_dst_ops in the bridge core, since this
      is required by when the bridge port is initialized. So we can safely
      modprobe/rmmod br_netfilter anytime.
      
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      34666d46
  14. 11 Jun, 2014 1 commit
  15. 02 Jun, 2014 1 commit
  16. 11 Apr, 2014 1 commit
  17. 28 Mar, 2014 1 commit
  18. 10 Feb, 2014 1 commit
    • Toshiaki Makita's avatar
      bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr · a5642ab4
      Toshiaki Makita authored
      br_fdb_changeaddr() assumes that there is at most one local entry per port
      per vlan. It used to be true, but since commit 36fd2b63
      
       ("bridge: allow
      creating/deleting fdb entries via netlink"), it has not been so.
      Therefore, the function might fail to search a correct previous address
      to be deleted and delete an arbitrary local entry if user has added local
      entries manually.
      
      Example of problematic case:
        ip link set eth0 address ee:ff:12:34:56:78
        brctl addif br0 eth0
        bridge fdb add 12:34:56:78:90:ab dev eth0 master
        ip link set eth0 address aa:bb:cc:dd:ee:ff
      Then, the address 12:34:56:78:90:ab might be deleted instead of
      ee:ff:12:34:56:78, the original mac address of eth0.
      
      Address this issue by introducing a new flag, added_by_user, to struct
      net_bridge_fdb_entry.
      
      Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases
      like:
        ip link set eth0 address 12:34:56:78:90:ab
        ip link set eth1 address aa:bb:cc:dd:ee:ff
        brctl addif br0 eth0
        bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master
        brctl addif br0 eth1
        brctl delif br0 eth0
      In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff,
      but it also should have been added by "brctl addif br0 eth1" originally,
      so we don't delete it and treat it a new kernel-created entry.
      
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5642ab4
  19. 05 Jan, 2014 1 commit
  20. 29 Oct, 2013 1 commit
  21. 30 Aug, 2013 1 commit
    • Linus Lüssing's avatar
      bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones · cc0fdd80
      Linus Lüssing authored
      Currently we would still potentially suffer multicast packet loss if there
      is just either an IGMP or an MLD querier: For the former case, we would
      possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is
      because we are currently assuming that if either an IGMP or MLD querier
      is present that the other one is present, too.
      
      This patch makes the behaviour and fix added in
      "bridge: disable snooping if there is no querier" (b00589af
      
      )
      to also work if there is either just an IGMP or an MLD querier on the
      link: It refines the deactivation of the snooping to be protocol
      specific by using separate timers for the snooped IGMP and MLD queries
      as well as separate timers for our internal IGMP and MLD queriers.
      
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@web.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc0fdd80
  22. 01 Aug, 2013 1 commit
    • Linus Lüssing's avatar
      bridge: disable snooping if there is no querier · b00589af
      Linus Lüssing authored
      If there is no querier on a link then we won't get periodic reports and
      therefore won't be able to learn about multicast listeners behind ports,
      potentially leading to lost multicast packets, especially for multicast
      listeners that joined before the creation of the bridge.
      
      These lost multicast packets can appear since c5c23260
      
      
      ("bridge: Add multicast_querier toggle and disable queries by default")
      in particular.
      
      With this patch we are flooding multicast packets if our querier is
      disabled and if we didn't detect any other querier.
      
      A grace period of the Maximum Response Delay of the querier is added to
      give multicast responses enough time to arrive and to be learned from
      before disabling the flooding behaviour again.
      
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@web.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b00589af
  23. 11 Jun, 2013 2 commits
  24. 07 Mar, 2013 1 commit
  25. 14 Feb, 2013 4 commits
  26. 03 Nov, 2012 1 commit
  27. 30 Oct, 2012 1 commit
  28. 10 May, 2012 1 commit
    • Joe Perches's avatar
      bridge: Convert compare_ether_addr to ether_addr_equal · 9a7b6ef9
      Joe Perches authored
      
      
      Use the new bool function ether_addr_equal to add
      some clarity and reduce the likelihood for misuse
      of compare_ether_addr for sorting.
      
      Done via cocci script:
      
      $ cat compare_ether_addr.cocci
      @@
      expression a,b;
      @@
      -	!compare_ether_addr(a, b)
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	compare_ether_addr(a, b)
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) == 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) != 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) == 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) != 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!!ether_addr_equal(a, b)
      +	ether_addr_equal(a, b)
      
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a7b6ef9
  29. 31 Oct, 2011 1 commit
  30. 06 Oct, 2011 1 commit
    • stephen hemminger's avatar
      bridge: allow forwarding some link local frames · 515853cc
      stephen hemminger authored
      
      
      This is based on an earlier patch by Nick Carter with comments
      by David Lamparter but with some refinements. Thanks for their patience
      this is a confusing area with overlap of standards, user requirements,
      and compatibility with earlier releases.
      
      It adds a new sysfs attribute
         /sys/class/net/brX/bridge/group_fwd_mask
      that controls forwarding of frames with address of: 01-80-C2-00-00-0X
      The default setting has no forwarding to retain compatibility.
      
      One change from earlier releases is that forwarding of group
      addresses is not dependent on STP being enabled or disabled. This
      choice was made based on interpretation of tie 802.1 standards.
      I expect complaints will arise because of this, but better to follow
      the standard than continue acting incorrectly by default.
      
      The filtering mask is writeable, but only values that don't forward
      known control frames are allowed. It intentionally blocks attempts
      to filter control protocols. For example: writing a 8 allows
      forwarding 802.1X PAE addresses which is the most common request.
      
      Reported-by: default avatarDavid Lamparter <equinox@diac24.net>
      Original-patch-by: default avatarNick Carter <ncarter100@gmail.com>
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Tested-by: default avatarBenjamin Poirier <benjamin.poirier@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      515853cc
  31. 06 Jul, 2011 1 commit
  32. 22 Apr, 2011 1 commit
  33. 05 Apr, 2011 1 commit
  34. 16 Mar, 2011 1 commit