1. 10 Dec, 2013 9 commits
    • xen-netback: improve guest-receive-side flow control · ca2f09f2
      Paul Durrant authored
      
      
      The way that flow control works without this patch is that, in start_xmit()
      the code uses xenvif_count_skb_slots() to predict how many slots
      xenvif_gop_skb() will consume and then adds this to a 'req_cons_peek'
      counter which it then uses to determine if the shared ring has that amount
      of space available by checking whether 'req_prod' has passed that value.
      If the ring doesn't have space the tx queue is stopped.
      xenvif_gop_skb() will then consume slots and update 'req_cons' and issue
      responses, updating 'rsp_prod' as it goes. The frontend will consume those
      responses and post new requests, by updating req_prod. So, req_prod chases
      req_cons which chases rsp_prod, and can never exceed that value. Thus if
      xenvif_count_skb_slots() ever returns a number of slots greater than
      xenvif_gop_skb() uses, req_cons_peek will get to a value that req_prod cannot
      possibly achieve (since it's limited by the 'real' req_cons) and, if this
      happens enough times, req_cons_peek gets more than a ring size ahead of
      req_cons and the tx queue then remains stopped forever waiting for an
      unachievable amount of space to become available in the ring.
      
      Having two routines trying to calculate the same value is always going to be
      fragile, so this patch does away with that. All we essentially need to do is
      make sure that we have 'enough stuff' on our internal queue without letting
      it build up uncontrollably. So start_xmit() makes a cheap optimistic check
      of how much space is needed for an skb and only turns the queue off if that
      is unachievable. net_rx_action() is the place where we could do with an
      accurate prediction but, since that has proven tricky to calculate, a cheap
      worst-case (but not too bad) estimate is all we really need since the only
      thing we *must* prevent is xenvif_gop_skb() consuming more slots than are
      available.
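The failure mode described above is easy to reproduce in a plain userspace model of the ring indices. The following sketch is illustrative only (the names mirror the commit text but this is not netback code), assuming a 256-slot ring:

```c
/* Minimal model of the pre-patch accounting. 'predicted' stands in for
 * what xenvif_count_skb_slots() would return, 'actual' for what
 * xenvif_gop_skb() really consumes; all names are illustrative. */
#define RING_SIZE 256u

struct ring_state {
    unsigned int req_cons_peek; /* predicted consumption */
    unsigned int req_cons;      /* actual consumption */
    unsigned int req_prod;      /* frontend production */
};

/* Returns 0 if the skb could be queued, -1 if the tx queue stops. */
static int model_start_xmit(struct ring_state *r, unsigned int predicted,
                            unsigned int actual)
{
    /* The frontend can never post requests more than a ring size
     * beyond the slots the backend has *really* consumed. */
    r->req_prod = r->req_cons + RING_SIZE;

    if (r->req_prod - r->req_cons_peek < predicted)
        return -1; /* "no space": tx queue stopped */

    r->req_cons_peek += predicted;
    r->req_cons += actual;
    return 0;
}
```

With predicted = 2 and actual = 1, every skb widens the gap between req_cons_peek and req_cons by one slot, so after roughly a ring's worth of packets the queue stops and never restarts: req_prod is capped by the real req_cons and can never catch up with req_cons_peek.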
      
      Without this patch I can trivially stall netback permanently just by
      doing a large guest-to-guest file copy between two Windows Server 2008R2
      VMs on a single host.
      
      Patch tested with frontends in:
      - Windows Server 2008R2
      - CentOS 6.0
      - Debian Squeeze
      - Debian Wheezy
      - SLES11
      
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Annie Li <annie.li@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove interface state mirroring in bearer · 512137ee
      Erik Hugne authored
      
      
      struct 'tipc_bearer' is a generic representation of the underlying
      media type, and exists in a one-to-one relationship with each interface
      TIPC is using. The struct contains a 'blocked' flag that mirrors the
      operational and execution state of the represented interface, and is
      updated through notification calls from the latter. The users of
      tipc_bearer are checking this flag before each attempt to send a
      packet via the interface.
      
      This state mirroring serves no purpose in the current code base. TIPC
      links will not discover a media failure any faster through this
      mechanism, and in reality the flag only adds overhead at packet
      sending and reception.
      
      Furthermore, the fact that the flag needs to be protected by a spinlock
      aggregated into tipc_bearer has turned out to cause a serious and
      completely unnecessary deadlock problem.
      
      CPU0                                    CPU1
      ----                                    ----
      Time 0: bearer_disable()                link_timeout()
      Time 1:   spin_lock_bh(&b_ptr->lock)      tipc_link_push_queue()
      Time 2:   tipc_link_delete()                tipc_bearer_blocked(b_ptr)
      Time 3:     k_cancel_timer(&req->timer)       spin_lock_bh(&b_ptr->lock)
      Time 4:       del_timer_sync(&req->timer)
      
      I.e., del_timer_sync() on CPU0 never returns, because the timer handler
      on CPU1 is waiting for the bearer lock.
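The same hazard can be shown with a userspace analogue using pthreads (illustrative names, not TIPC code): the "timer handler" thread needs the lock the teardown path holds, so a synchronous wait for the handler, the analogue of del_timer_sync(), must not happen under that lock:

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t bearer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of the timer handler on CPU1: it takes the same lock that
 * the teardown path holds, so it blocks until teardown unlocks. */
static void *timer_handler(void *arg)
{
    pthread_mutex_lock(&bearer_lock);
    pthread_mutex_unlock(&bearer_lock);
    return NULL;
}

/* Analogue of bearer_disable() on CPU0. Returns 0 on success. */
int teardown(void)
{
    pthread_t handler;

    pthread_mutex_lock(&bearer_lock);
    if (pthread_create(&handler, NULL, timer_handler, NULL))
        return -1;
    /* A pthread_join() here, before unlocking, would never return:
     * the handler is waiting for bearer_lock, just like the timer
     * handler spinning on b_ptr->lock while del_timer_sync() waits. */
    pthread_mutex_unlock(&bearer_lock);
    return pthread_join(handler, NULL);
}
```

Dropping the lock before the synchronous wait is what removing the 'blocked' flag (and hence the lock around it) effectively achieves.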
      
      We eliminate the 'blocked' flag from struct tipc_bearer, along with all
      tests on this flag. This not only resolves the deadlock, but also
      simplifies and speeds up the data path execution of TIPC. It also fits
      well into our ongoing effort to make the locking policy simpler and
      more manageable.
      
      An effect of this change is that we can get rid of functions such as
      tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
      We replace the latter with a new function, tipc_reset_bearer(), which
      resets all links associated with the bearer immediately after an
      interface goes down.
      
      A user might notice one slight change in link behaviour after this
      change. When an interface goes down (e.g. through a NETDEV_DOWN
      event), all attached links will be reset immediately, instead of
      leaving it to each link to detect the failure through a timer-driven
      mechanism. We consider this an improvement, and see no obvious risks
      with the new behavior.
      
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • x25: convert printks to pr_<level> · b73e9e3c
      wangweidong authored
      
      
      Use pr_<level>() instead of printk(KERN_<LEVEL> ...).
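The conversion itself is mechanical. A userspace sketch with simplified stand-ins (the real kernel macros are richer, also fold in a per-file pr_fmt() prefix, and modern KERN_ERR is an SOH-based marker rather than the literal "<3>" used here):

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the kernel log-level machinery. */
#define KERN_ERR "<3>"

static char log_buf[128];

/* Stand-in for printk(): renders into a buffer instead of the kernel log. */
static int fake_printk(const char *fmt, int arg)
{
    return snprintf(log_buf, sizeof(log_buf), fmt, arg);
}

/* pr_err() is just printk() with KERN_ERR folded in, so
 *   before: printk(KERN_ERR "x25: unknown frame type %d\n", type);
 *   after:  pr_err("x25: unknown frame type %d\n", type);          */
#define pr_err(fmt, arg) fake_printk(KERN_ERR fmt, arg)
```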
      
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: introduce PACKET_QDISC_BYPASS socket option · d346a3fa
      Daniel Borkmann authored
      
      
      This patch introduces a PACKET_QDISC_BYPASS socket option, that
      allows for using a similar xmit() function as in pktgen instead
      of taking the dev_queue_xmit() path. This can be very useful when
      PF_PACKET applications need to be used in a pktgen-like scenario,
      but with full, flexible packet payloads, for example.
      
      By default, nothing changes in behaviour for normal PF_PACKET
      TX users, so everything stays as is for applications. New users,
      however, can now set PACKET_QDISC_BYPASS if needed to i) prevent
      their own packets from reentering packet_rcv() and ii) push frames
      directly to the driver.
      
      In doing so we can increase pps (here 64 byte packets) for
      PF_PACKET a bit:
      
        # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
        1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
        2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
        3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
        4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
        5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
        6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
        7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
        8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
        9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
       10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
       11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
       [...]
       20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081
      
       [**]: qdisc path with packet_rcv(), which is probably how most
             people use it (hopefully not anymore if not needed)
      
      The test was done using a modified trafgen, sending a simple
      static 64 byte packet, on all CPUs.  The trick in the fast
      "qdisc path" case, is to avoid reentering packet_rcv() by
      setting the RAW socket protocol to zero, like:
      socket(PF_PACKET, SOCK_RAW, 0);
      
      Tradeoffs are documented in this patch as well: clearly, if
      queues are busy, we will drop more packets, tc disciplines are
      ignored, and these packets are no longer visible to taps. For
      a pktgen-like scenario, we argue that this is acceptable.
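Opting in from an application is a single setsockopt() on the PF_PACKET socket. A Linux-only sketch; the #ifndef fallback to the value 20 is our assumption for older uapi headers that predate this option:

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef PACKET_QDISC_BYPASS
#define PACKET_QDISC_BYPASS 20 /* assumed fallback for older headers */
#endif

/* Enable qdisc bypass on an existing PF_PACKET socket. Returns 0 on
 * success, -1 on error (e.g. bad fd or a kernel without the option). */
static int enable_qdisc_bypass(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
                      &one, sizeof(one));
}
```

The socket itself would be created as in the commit's fast "qdisc path" trick, e.g. socket(PF_PACKET, SOCK_RAW, 0), which requires CAP_NET_RAW.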
      
      The pointer to the xmit function has been placed in a packet
      socket structure hole between cached_dev and prot_hook, which
      is hot anyway as we're working on cached_dev in each send path.
      
      Done in joint work together with Jesper Dangaard Brouer.
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: move inline skb_needs_linearize helper to header · 4262e5cc
      Daniel Borkmann authored
      
      
      As we need it elsewhere, move the inline helper function
      skb_needs_linearize() over to the skbuff.h include file. While
      at it, also convert the return type to 'bool' instead of 'int'
      and add proper kernel doc.
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 34f9f437
      David S. Miller authored
      
      
      Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
      Daniel's direct transmit changes depend upon.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: fix send path when running with proto == 0 · 66e56cd4
      Daniel Borkmann authored
      Commit e40526cb introduced a cached dev pointer that gets
      hooked into register_prot_hook(), __unregister_prot_hook() to
      update the device used for the send path.

      We need to fix this up, as otherwise it will not work with
      sockets created with protocol = 0, nor with sll_protocol = 0
      passed via sockaddr_ll when doing the bind.
      
      So instead, assign the pointer directly. The compiler can inline
      these helper functions automagically.
      
      While at it, also assume the cached dev fast-path as likely(),
      and document this variant of socket creation, as it seems it is
      not widely used (not even the author of TX_RING appears to have
      been aware of it in his reference example [1]). Tested with the
      reproducer from e40526cb.
      
       [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example
      
      Fixes: e40526cb ("packet: fix use after free race in send path when dev is released")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Tested-by: Salam Noureddine <noureddine@aristanetworks.com>
      Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: give visibility to mq slave qdiscs · 95dc1929
      Eric Dumazet authored
      Commit 6da7c8fc ("qdisc: allow setting default queuing discipline")
      added the ability to change the default qdisc from pfifo_fast to,
      say, fq.

      But as most modern ethernet devices are multiqueue, we can't really
      see all the statistics from "tc -s qdisc show", as the default root
      qdisc is mq.

      This patch adds the calls to qdisc_list_add() to mq and mqprio.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · fbec3706
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to i40e only.
      
      Jacob provides an i40e patch to get 1588 working correctly by
      separating the TSYNVALID and TSYNINDX fields in the receive
      descriptor.
      
      Jesse provides several i40e patches, first to correct the checking
      of the multi-bit state.  The hash is reported correctly in the RSS
      field if and only if the filter status is 3.  Other values of the
      filter status mean different things and we should not depend on a
      bitwise result.  Then provides a patch to enable a couple of
      workarounds based on revision ID that allow the driver to work
      more fully on early hardware.
      
      Shannon provides several i40e patches as well.  First sets the media
      type in the hardware structure based on the external connection type.
      Then provides a patch to only setup the rings that will be used.  Lastly
      provides a fix where the TESTING state was still set when exiting the
      ethtool diagnostics.
      
      Kevin Scott provides one i40e patch to add a new flag to
      i40e_add_veb(), which allows the driver to request that the
      hardware filter on layer 2 parameters.
      
      Anjali provides four i40e patches, first refactors the reset code in
      order to re-size queues and vectors while the interface is still up.
      Then provides a patch to enable all PCTYPEs except FCoE for RSS.  Adds
      a message to notify the user of how many VFs are initialized on each
      port.  Lastly adds a new variable to track the number of PF instances,
      this is a global counter on purpose so that each PF loaded has a
      unique ID.
      
      Catherine bumps the driver version.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Dec, 2013 12 commits
  3. 07 Dec, 2013 14 commits
  4. 06 Dec, 2013 5 commits