1. 18 Sep, 2015 1 commit
    • netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman authored
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be determined easily and reliably.
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()                    sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in than the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
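
The replacement described above can be sketched in userspace C. This is only an illustration under stated assumptions: the structs below are simplified stand-ins rather than the kernel definitions, and `hook_net_old()` / `hook_net_new()` are hypothetical helper names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct net { int id; };
struct net_device { struct net *nd_net; };

struct nf_hook_state {
	struct net_device *in;   /* input device, may be NULL */
	struct net_device *out;  /* output device, may be NULL */
	struct net *net;         /* namespace supplied by the call site */
};

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/* Old pattern (hypothetical helper): derive the namespace from
 * whichever device happens to be set. */
static struct net *hook_net_old(const struct nf_hook_state *state)
{
	assert(state->in != NULL || state->out != NULL);
	return dev_net(state->in ? state->in : state->out);
}

/* New pattern (hypothetical helper): trust the namespace that the
 * call site passed in explicitly. */
static struct net *hook_net_new(const struct nf_hook_state *state)
{
	return state->net;
}
```

When the call site derives `state->net` from the first device, both helpers return the same namespace. They differ only where the caller deliberately passes something else, such as a socket's namespace, which is the situation the listed exceptions describe.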
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Apr, 2015 1 commit
    • netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller authored
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      And second, there is potentially the socket used to control a tunneling
      socket, such as one that encapsulates using UDP.
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
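
The two-socket situation can be sketched as follows. This is a userspace illustration only: the structs, the `sock_family` enum, and the function names are hypothetical stand-ins, and the callback type merely mirrors the idea of an okfn() that receives the socket explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel definitions. */
enum sock_family { FAMILY_PACKET, FAMILY_INET };

struct sock { enum sock_family family; };
struct sk_buff { struct sock *sk; };   /* sk: socket that generated the frame */

/* Callback shape that receives the socket context explicitly. */
typedef int (*okfn_t)(struct sock *sk, struct sk_buff *skb);

/* Old-style check: assumes skb->sk is the IPv4 socket in use. */
static int output_uses_inet_sock_old(struct sk_buff *skb)
{
	return skb->sk != NULL && skb->sk->family == FAMILY_INET;
}

/* New-style finish function: inspects the socket it was handed and
 * leaves skb->sk alone, so memory accounting stays with the sender. */
static int output_finish(struct sock *sk, struct sk_buff *skb)
{
	(void)skb;
	return sk != NULL && sk->family == FAMILY_INET;
}

/* Stand-in for a hook invocation that forwards both arguments. */
static int run_hook(struct sock *sk, struct sk_buff *skb, okfn_t okfn)
{
	assert(okfn != NULL);
	return okfn(sk, skb);
}
```

In this sketch, a frame created by a packet socket and sent through a tunnel keeps `skb->sk` pointing at the packet socket, while the tunnel's own socket is what gets passed to `run_hook()`.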
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 09 Mar, 2015 1 commit
  4. 06 Mar, 2015 1 commit
    • DECnet: Only use neigh_ops for adding the link layer header · aaa4e704
      Eric W. Biederman authored
      Other users of the neighbour table use neigh->output as the method
      to decide when and which link-layer header to place on a packet.
      DECnet has been using neigh->output to decide which DECnet headers to
      place on a packet depending which neighbour the packet is destined for.
      The DECnet usage isn't totally wrong but it can run into problems if the
      neighbour output function is run for a second time as the teql driver
      and the bridge netfilter code can do.
      Therefore, to avoid pathological problems later down the line and make
      the neighbour code easier to understand, refactor the decnet output
      code to only use a neighbour method to add a link layer header to a
      packet.
      This is done by moving the neighbour operations lookup from
      dn_to_neigh_output to dn_neigh_output_packet.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 23 Feb, 2015 1 commit
  6. 19 Jan, 2015 1 commit
  7. 18 Jan, 2015 1 commit
    • netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg authored
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      This makes the very common pattern of
        if (genlmsg_end(...) < 0) { ... }
      be a whole bunch of dead code. Many places also simply do
        return nlmsg_end(...);
      and the caller is expected to deal with it.
      This also commonly (at least for me) causes errors, because it is very
      common to write
        if (my_function(...))
          /* error condition */
      and if my_function() does "return nlmsg_end()" this is of course wrong.
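
The pitfall described above can be reproduced with a small userspace sketch. The names ending in `_old` and `_new`, the `fill_info_*` helpers, the 16-byte header size, and the one-field `struct sk_buff` are all hypothetical stand-ins chosen for illustration, not the kernel's code.

```c
#include <assert.h>

/* Simplified stand-in; the real sk_buff is far larger. */
struct sk_buff { int len; };

/* Old behaviour: report the message length, which is always > 0
 * once a header has been placed in the buffer. */
static int nlmsg_end_old(struct sk_buff *skb)
{
	assert(skb->len > 0);
	return skb->len;
}

/* A fill function that forwards that length as its return value. */
static int fill_info_old(struct sk_buff *skb)
{
	skb->len += 16;              /* pretend a 16-byte header was added */
	return nlmsg_end_old(skb);
}

/* Caller using the usual "non-zero means error" convention. */
static int caller_sees_error(struct sk_buff *skb)
{
	if (fill_info_old(skb))
		return 1;            /* success is misread as failure */
	return 0;
}

/* New behaviour: nothing to return, so nothing to misread. */
static void nlmsg_end_new(struct sk_buff *skb)
{
	assert(skb->len > 0);
}

static int fill_info_new(struct sk_buff *skb)
{
	skb->len += 16;
	nlmsg_end_new(skb);
	return 0;                    /* explicit success code */
}
```

With the void variant, a fill function has to choose its own return value, so a caller that tests for non-zero sees an error only when one is actually reported.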
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      -	return nlmsg_end(...);
      +	nlm...
  8. 15 Apr, 2014 1 commit
  9. 15 Jan, 2014 1 commit
  10. 06 Dec, 2013 1 commit
  11. 22 Mar, 2013 3 commits
  12. 18 Feb, 2013 2 commits
  13. 28 Jan, 2013 1 commit
  14. 10 Sep, 2012 1 commit
  15. 09 Aug, 2012 1 commit
  16. 31 Jul, 2012 1 commit
    • ipv4: Restore old dst_free() behavior. · 54764bb6
      Eric Dumazet authored
      Commit 404e0a8b (net: ipv4: fix RCU races on dst refcounts) tried
      to solve a race but added a problem at device/fib dismantle time:
      We really want to call dst_free() as soon as possible, even if sockets
      still have dst in their cache.
      dst_release() calls in free_fib_info_rcu() are not welcomed.
      The root of the problem is that, now that we also cache output routes
      (in nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
      rt_free(), because output route lookups are done in process context.
      Based on feedback and initial patch from David Miller (adding another
      call_rcu_bh() call in fib, but it appears it was not the right fix)
      I left the inet_sk_rx_dst_set() helper and added __rcu attributes
      to nh_rth_output and nh_rth_input to better document what is going on in
      this code.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 30 Jul, 2012 1 commit
    • net: ipv4: fix RCU races on dst refcounts · 404e0a8b
      Eric Dumazet authored
      Commit c6cffba4 (ipv4: Fix input route performance regression.)
      added various fatal races with dst refcounts.
      Crashes happen on TCP workloads if routes are added/deleted at the same
      time.
      The dst_free() calls from free_fib_info_rcu() are clearly racy.
      We need instead regular dst refcounting (dst_release()) and make
      sure dst_release() is aware of RCU grace periods :
      Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
      before dst destruction for cached dst
      Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
      to make sure we don't increase a zero refcount (on a dst currently
      waiting for an RCU grace period before destruction)
      rt_cache_route() must take a reference on the new cached route, and
      release it if it was not able to install it.
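
The "increment unless zero" idea can be sketched in portable C11. This is an illustration under assumptions, not the kernel implementation: `refcnt_inc_not_zero()` and `rx_dst_set()` are hypothetical names, and `struct dst_entry` is reduced to a single counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in; NOT the kernel's struct dst_entry. */
struct dst_entry { atomic_int refcnt; };

/* Take a reference only if the count is still non-zero. A count of
 * zero means the object is already waiting to be destroyed. */
static bool refcnt_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Cache 'dst' in '*slot' only when a reference could be taken. */
static bool rx_dst_set(struct dst_entry **slot, struct dst_entry *dst)
{
	assert(slot != NULL && dst != NULL);
	if (!refcnt_inc_not_zero(&dst->refcnt))
		return false;
	*slot = dst;
	return true;
}
```

A plain increment would revive an entry whose count had already dropped to zero. The compare-and-exchange loop refuses in that case and leaves the cache slot untouched.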
      With this patch, my machines survive various benchmarks.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 23 Jul, 2012 1 commit
  19. 20 Jul, 2012 1 commit
  20. 17 Jul, 2012 1 commit
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller authored
      This will be used so that we can compose a full flow key.
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 12 Jul, 2012 1 commit
  22. 11 Jul, 2012 2 commits
  23. 05 Jul, 2012 2 commits
  24. 27 Jun, 2012 2 commits
  25. 15 May, 2012 1 commit
  26. 15 Apr, 2012 1 commit
  27. 05 Feb, 2012 1 commit
  28. 05 Dec, 2011 1 commit
  29. 26 Nov, 2011 2 commits
  30. 31 Oct, 2011 1 commit
  31. 18 Jul, 2011 3 commits