1. 30 Jul, 2015 1 commit
  2. 09 Jul, 2015 2 commits
  3. 26 May, 2015 1 commit
  4. 23 May, 2015 2 commits
  5. 13 May, 2015 1 commit
  6. 10 May, 2015 2 commits
    • pktgen: introduce xmit_mode '<start_xmit|netif_receive>' · 62f64aed
      Alexei Starovoitov authored
      
      
      Introduce xmit_mode 'netif_receive' for pktgen which generates the
      packets using familiar pktgen commands, but feeds them into
      netif_receive_skb() instead of ndo_start_xmit().
      
      Default mode is called 'start_xmit'.
      
      It is designed to test netif_receive_skb and ingress qdisc
      performance only. Make sure you understand how it works before
      using it for other rx benchmarking.
      
      Sample script 'pktgen.sh':
      #!/bin/bash
      function pgset() {
        local result
      
        echo $1 > $PGDEV
      
        result=`cat $PGDEV | fgrep "Result: OK:"`
        if [ "$result" = "" ]; then
          cat $PGDEV | fgrep Result:
        fi
      }
      
      [ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
      ETH=$1
      
      PGDEV=/proc/net/pktgen/kpktgend_0
      pgset "rem_device_all"
      pgset "add_device $ETH"
      
      PGDEV=/proc/net/pktgen/$ETH
      pgset "xmit_mode netif_receive"
      pgset "pkt_size 60"
      pgset "dst 198.18.0.1"
      pgset "dst_mac 90:e2:ba:ff:ff:ff"
      pgset "count 10000000"
      pgset "burst 32"
      
      PGDEV=/proc/net/pktgen/pgctrl
      echo "Running... ctrl^C to stop"
      pgset "start"
      echo "Done"
      cat /proc/net/pktgen/$ETH
      
      Usage:
      $ sudo ./pktgen.sh eth2
      ...
      Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
        43033682pps 20656Mb/sec (20656167360bps) errors: 10000000
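
The rate figures in the Result line are tied together by simple arithmetic, which can help when sanity-checking a run. The sketch below assumes the bit rate is the packet rate times 8 bits times the packet size in bytes, and that Mb/sec is the bit rate divided by 10^6 and truncated; both assumptions reproduce the numbers printed above. The helper names are made up for illustration and are not part of pktgen.

```shell
#!/bin/bash
# Hypothetical helpers, not part of pktgen.

# bits per second = packets per second * 8 * packet size in bytes
pps_to_bps() {
    echo $(( $1 * 8 * $2 ))
}

# Mb/sec as printed in the Result line: bps / 10^6, truncated
bps_to_mbps() {
    echo $(( $1 / 1000000 ))
}

bps=$(pps_to_bps 43033682 60)
echo "${bps}bps $(bps_to_mbps "$bps")Mb/sec"   # prints 20656167360bps 20656Mb/sec
```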
      
      Raw netif_receive_skb speed should be ~43 million packets
      per second on a 3.7GHz x86 and 'perf report' should look like:
        37.69%  kpktgend_0   [kernel.vmlinux]  [k] __netif_receive_skb_core
        25.81%  kpktgend_0   [kernel.vmlinux]  [k] kfree_skb
         7.22%  kpktgend_0   [kernel.vmlinux]  [k] ip_rcv
         5.68%  kpktgend_0   [pktgen]          [k] pktgen_thread_worker
      
      If fib_table_lookup is seen on top, it means the skb was processed
      by the stack. To benchmark netif_receive_skb only, make sure
      that the 'dst_mac' in your pktgen script is different from the
      receiving device's MAC, so that the packet is dropped by ip_rcv.
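
A pre-flight check along these lines could catch a 'dst_mac' that accidentally matches the receiving device. This is only a sketch: the function names are hypothetical, and it assumes the device MAC can be read from /sys/class/net/DEV/address.

```shell
#!/bin/bash
# Hypothetical pre-flight check, not part of pktgen.

# Succeeds (exit status 0) when the two MAC addresses differ, ignoring case.
mac_differs() {
    local a b
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$a" != "$b" ]
}

# usage: check_dst_mac DEV DST_MAC
check_dst_mac() {
    local dev_mac
    dev_mac=$(cat "/sys/class/net/$1/address") || return 2
    if mac_differs "$dev_mac" "$2"; then
        echo "ok: dst_mac $2 differs from $1 ($dev_mac)"
    else
        echo "warning: dst_mac equals the MAC of $1; packets may be processed by the stack" >&2
        return 1
    fi
}
```

For example, `check_dst_mac eth2 90:e2:ba:ff:ff:ff` could be run before the "start" command of the sample script.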
      
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant · f1f00d8f
      Jesper Dangaard Brouer authored
      Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
      with a negation of the flag like !NO_TIMESTAMP.
      
      Also document the option flag NO_TIMESTAMP.
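
As a rough illustration of the negation syntax, the two settings could be wrapped as below. The wrapper names are hypothetical, and eth2 is only an example device taken from the usage shown elsewhere in this log.

```shell
#!/bin/bash
# PGDEV: pktgen proc file of the device; eth2 is only an example name.
PGDEV=${PGDEV:-/proc/net/pktgen/eth2}

pgset() {
    echo "$1" > "$PGDEV"
}

# Single quotes keep an interactive bash from treating '!' as history expansion.
timestamp_off() { pgset 'flag NO_TIMESTAMP'; }
timestamp_on()  { pgset 'flag !NO_TIMESTAMP'; }
```

Calling `timestamp_off` and later `timestamp_on` would switch timestamping off and back on for that device.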
      
      Fixes: afb84b62 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 22 Apr, 2015 1 commit
  8. 23 Feb, 2015 1 commit
  9. 15 Feb, 2015 1 commit
  10. 05 Feb, 2015 1 commit
  11. 19 Nov, 2014 1 commit
  12. 02 Oct, 2014 1 commit
  13. 10 Sep, 2014 1 commit
  14. 02 Sep, 2014 3 commits
  15. 30 Aug, 2014 1 commit
  16. 25 Aug, 2014 1 commit
  17. 15 Jul, 2014 1 commit
  18. 01 Jul, 2014 2 commits
    • pktgen: RCU-ify "if_list" to remove lock in next_to_run() · 8788370a
      Jesper Dangaard Brouer authored
      
      
      The if_lock()/if_unlock() in next_to_run() adds significant
      overhead, because it's called for every packet in the busy loop of
      pktgen_thread_worker().  (Thomas Graf originally pointed me
      at this lock problem).

      Removing these two "LOCK" operations should in theory save us approx
      16ns (8ns x 2).  As illustrated below, we do save 16ns when removing
      the locks and introducing RCU protection.
      
      Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
       (single CPU performance, ixgbe 10Gbit/s, E5-2630)
       * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
       * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
       * Diff   : +588195 pps --> -16.50ns
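
The per-packet times in the table are simply 10^9 divided by the packet rate. A small sketch of that conversion, using awk for the floating-point division (the function names are made up for illustration):

```shell
#!/bin/bash
# Nanoseconds spent per packet at a given packet rate: 10^9 / pps
ns_per_pkt() {
    awk -v pps="$1" 'BEGIN { printf "%.2f\n", 1e9 / pps }'
}

# Per-packet time saved when going from rate $1 to rate $2
ns_saved() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 1e9 / a - 1e9 / b }'
}

ns_per_pkt 5684009         # Prev:    175.93
ns_per_pkt 6272204         # RCU-fix: 159.43
ns_saved 5684009 6272204   # Diff:    16.50
```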
      
      To understand this RCU patch, I describe the pktgen thread model
      below.
      
      In pktgen there are several kernel threads, but there is only one CPU
      running each kernel thread.  Communication with the kernel threads is
      done through some thread control flags.  This allows the thread to
      change data structures at a known synchronization point, see the main
      thread func pktgen_thread_worker().
      
      Userspace changes are communicated through proc-file writes.  There
      are three types of changes: general control changes "pgctrl"
      (func:pgctrl_write), thread changes "kpktgend_X"
      (func:pktgen_thread_write), and interface config changes "ethX@N"
      (func:pktgen_if_write).
      
      Userspace "pgctrl" and "thread" changes are synchronized via the mutex
      pktgen_thread_lock, thus only a single userspace instance can run.
      The mutex is taken while the packet generator is running, by pgctrl
      "start".  Thus e.g. "add_device" cannot be invoked when pktgen is
      running/started.
      
      All "pgctrl" and all "thread" changes, except thread "add_device",
      communicate via the thread control flags.  The main problem is the
      exception "add_device", which modifies the thread's "if_list" directly.
      
      Fortunately "add_device" cannot be invoked while pktgen is running.
      But there exists a race between "rem_device_all" and "add_device"
      (which normally doesn't occur, because "rem_device_all" waits 125ms
      before returning).  Backgrounding "rem_device_all" and running
      "add_device" immediately allows the race to occur.
      
      The race affects the thread's "if_list" (its list of devices).  The
      if_lock is used for protecting this "if_list".  Other readers are given
      lock-free access to the list under RCU read sections.
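
One way the race window described above might be exercised from userspace is sketched below. This is a hypothetical illustration, not the script referred to in this commit message; PGDIR and the function names are assumptions, and thread 0 is used only as an example.

```shell
#!/bin/bash
# PGDIR: pktgen proc directory (override it to dry-run against a scratch directory).
PGDIR=${PGDIR:-/proc/net/pktgen}

# Write one command to thread 0's control file.
pgset_thread() {
    echo "$1" > "$PGDIR/kpktgend_0"
}

# usage: race_once DEV
race_once() {
    pgset_thread "rem_device_all" &   # background it instead of waiting for it to return
    pgset_thread "add_device $1"      # issue add_device immediately
    wait                              # reap the backgrounded writer
}
```

Running `race_once eth2` in a loop, as root with the pktgen module loaded, would repeatedly hit the window between the two commands.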
      
      Note, interface config changes (via proc) can occur while pktgen is
      running, which worries me a bit.  I'm assuming proc_remove() takes
      appropriate locks, to ensure no writers exist after proc_remove()
      finishes.
      
      I've been running a script exercising the race condition (leading me
      to fix the proc_remove order), without any issues.  The script also
      exercises concurrent proc writes, while the interface config is
      getting removed.
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: avoid expensive set_current_state() call in loop · baac167b
      Jesper Dangaard Brouer authored
      
      
      Avoid calling set_current_state() inside the busy-loop in
      pktgen_thread_worker().  When pkt_dev->delay is set, it is still
      used/enabled in pktgen_xmit() via the spin() call.

      The set_current_state(TASK_INTERRUPTIBLE) uses an xchg, which is
      implicitly LOCK prefixed.  I've measured the asm LOCK operation to take
      approx 8ns on this E5-2630 CPU.  The performance increase correlates
      with this measurement.
      
      Performance data with CLONE_SKB==100000, rx-usecs=30:
       (single CPU performance, ixgbe 10Gbit/s, E5-2630)
       * Prev:  5454050 pps --> 183.35ns (1/5454050*10^9)
       * Now:   5684009 pps --> 175.93ns (1/5684009*10^9)
       * Diff:  +229959 pps -->  -7.42ns
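
To see how well the measured gain matches the roughly 8ns cost of one LOCK operation, the expected packet rate can be estimated from the old rate and the time saved per packet. The helper below is a made-up illustration of that estimate, not something pktgen provides.

```shell
#!/bin/bash
# Predicted packet rate after shaving $2 nanoseconds off the per-packet
# cost at rate $1: 10^9 / (10^9 / pps - ns_saved)
predict_pps() {
    awk -v pps="$1" -v ns="$2" 'BEGIN { printf "%d\n", 1e9 / (1e9 / pps - ns) }'
}

predict_pps 5454050 8   # compare with the measured 5684009 pps
```

With the numbers above the estimate comes out at roughly 5.70 million pps, close to the measured rate.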
      
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 16 May, 2014 1 commit
  20. 12 Apr, 2014 1 commit
  21. 07 Apr, 2014 1 commit
  22. 24 Feb, 2014 3 commits
  23. 22 Jan, 2014 1 commit
  24. 10 Jan, 2014 1 commit
  25. 06 Jan, 2014 1 commit
  26. 03 Jan, 2014 6 commits
    • {pktgen, xfrm} Show spi value properly when ipsec turned on · 8101328b
      Fan Du authored
      
      
      If the user runs pktgen plus IPsec using an spi, show the spi value
      properly in the output of cat /proc/net/pktgen/ethX.
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • {pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen · c454997e
      Fan Du authored
      
      
      Introduce xfrm_state_lookup_byspi to find the SA by the custom,
      user-specified spi from "pgset spi xxx". Using this scheme, any flow,
      regardless of its saddr/daddr, can be transformed by the SA specified
      with a configurable spi.
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • {pktgen, xfrm} Construct skb dst for tunnel mode transformation · cf93d47e
      Fan Du authored
      
      
      IPsec tunnel mode encapsulation needs to set the outer IP header
      with the right protocol/ttl/id values with regard to skb->dst->child.

      Looking up a rt in the standard way for every packet transmission
      is absolutely wrong. Instead, simply construct a dst by setting the
      necessary information to make tunnel mode encapsulation work.
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • {pktgen, xfrm} Using "pgset spi xxx" to spedifiy SA for a given flow · de4aee7d
      Fan Du authored
      
      
      The user can set a specific SPI value to arm a pktgen flow with IPsec
      transformation, instead of looking up the SA by saddr/daddr. The reason
      for doing so is that the current state lookup scheme is slow and, most
      important of all, pktgen doesn't actually need to match any SA state
      address information; all it needs is the SA transformation shell to
      do the encapsulation.

      This option also gives the user an alternative way of using pktgen
      to test an existing SA without creating new ones.
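
A per-device configuration using this option might look like the sketch below. The "flag IPSEC" and "flows 1" commands are the ones shown in the 3868204d commit message in this log. The helper names, the eth2 device and the SPI value 256 are placeholders; the SPI has to match an SA that is already installed, and the number format pktgen expects for it is not stated here, so check that before use.

```shell
#!/bin/bash
# PGDEV: pktgen proc file of the device; eth2 is only an example name.
PGDEV=${PGDEV:-/proc/net/pktgen/eth2}

# Print the per-device commands that arm a flow with IPsec for a given SPI.
# usage: ipsec_spi_cmds SPI
ipsec_spi_cmds() {
    printf '%s\n' "flag IPSEC" "flows 1" "spi $1"
}

# Feed commands from stdin to the device file, one write per command.
apply_cmds() {
    local cmd
    while IFS= read -r cmd; do
        echo "$cmd" > "$PGDEV"
    done
}
```

Usage would then be `ipsec_spi_cmds 256 | apply_cmds`, run as root after the device has been added to a pktgen thread.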
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • {pktgen, xfrm} Add statistics counting when transforming · 6de9ace4
      Fan Du authored
      
      
      Add statistics counting so that /proc/net/xfrm_stat can give the
      user a clue about what's wrong in this process.
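
When looking for such clues, it can help to print only the counters that are nonzero. The helper below is a hypothetical sketch that assumes the file holds one whitespace-separated "Name value" pair per line.

```shell
#!/bin/bash
# Print only the counters with a nonzero value.
# Assumes one "Name value" pair per line, separated by whitespace.
# usage: xfrm_nonzero [FILE]
xfrm_nonzero() {
    awk '$2 != 0 { print $1, $2 }' "${1:-/proc/net/xfrm_stat}"
}
```

Running `xfrm_nonzero` before and after a pktgen run and comparing the two outputs shows which counters moved.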
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • {pktgen, xfrm} Correct xfrm state lock usage when transforming · 0af0a413
      Fan Du authored
      
      
      The xfrm_state lock protects its state, i.e., VALID/DEAD and statistics,
      not the transforming procedure, as both mode/type output functions
      are reentrant.

      Another issue is that the state lock can be taken in BH context when the
      state timer fires. After the transformation in pktgen, updating the state
      statistics under the state lock should therefore disable BH for a moment.
      Otherwise LOCKDEP criticizes this:
      
      [   62.354339] pktgen: Packet Generator for packet performance testing. Version: 2.74
      [   62.655444]
      [   62.655448] =================================
      [   62.655451] [ INFO: inconsistent lock state ]
      [   62.655455] 3.13.0-rc2+ #70 Not tainted
      [   62.655457] ---------------------------------
      [   62.655459] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      [   62.655463] kpktgend_0/2764 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [   62.655466]  (&(&x->lock)->rlock){+.?...}, at: [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen]
      [   62.655479] {IN-SOFTIRQ-W} state was registered at:
      [   62.655484]   [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
      [   62.655492]   [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
      [   62.655498]   [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
      [   62.655505]   [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
      [   62.655511]   [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
      [   62.655519]   [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
      [   62.655523]   [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
      [   62.655526]   [<ffffffff8105a026>] irq_exit+0x96/0xc0
      [   62.655530]   [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
      [   62.655537]   [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
      [   62.655541]   [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
      [   62.655547]   [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
      [   62.655552]   [<ffffffff81761c3c>] rest_init+0xbc/0xd0
      [   62.655557]   [<ffffffff81ea5e5e>] start_kernel+0x3c4/0x3d1
      [   62.655583]   [<ffffffff81ea55a8>] x86_64_start_reservations+0x2a/0x2c
      [   62.655588]   [<ffffffff81ea569f>] x86_64_start_kernel+0xf5/0xfc
      [   62.655592] irq event stamp: 77
      [   62.655594] hardirqs last  enabled at (77): [<ffffffff810ab7f2>] vprintk_emit+0x1b2/0x520
      [   62.655597] hardirqs last disabled at (76): [<ffffffff810ab684>] vprintk_emit+0x44/0x520
      [   62.655601] softirqs last  enabled at (22): [<ffffffff81059b57>] __do_softirq+0x177/0x2d0
      [   62.655605] softirqs last disabled at (15): [<ffffffff8105a026>] irq_exit+0x96/0xc0
      [   62.655609]
      [   62.655609] other info that might help us debug this:
      [   62.655613]  Possible unsafe locking scenario:
      [   62.655613]
      [   62.655616]        CPU0
      [   62.655617]        ----
      [   62.655618]   lock(&(&x->lock)->rlock);
      [   62.655622]   <Interrupt>
      [   62.655623]     lock(&(&x->lock)->rlock);
      [   62.655626]
      [   62.655626]  *** DEADLOCK ***
      [   62.655626]
      [   62.655629] no locks held by kpktgend_0/2764.
      [   62.655631]
      [   62.655631] stack backtrace:
      [   62.655636] CPU: 0 PID: 2764 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #70
      [   62.655638] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
      [   62.655642]  ffffffff8216b7b0 ffff88001be43ab8 ffffffff8176af37 0000000000000007
      [   62.655652]  ffff88001c8d4fc0 ffff88001be43b18 ffffffff81766d78 0000000000000000
      [   62.655663]  ffff880000000001 ffff880000000001 ffffffff8101025f ffff88001be43b18
      [   62.655671] Call Trace:
      [   62.655680]  [<ffffffff8176af37>] dump_stack+0x46/0x58
      [   62.655685]  [<ffffffff81766d78>] print_usage_bug+0x1f1/0x202
      [   62.655691]  [<ffffffff8101025f>] ? save_stack_trace+0x2f/0x50
      [   62.655696]  [<ffffffff81099f8c>] mark_lock+0x28c/0x2f0
      [   62.655700]  [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
      [   62.655704]  [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
      [   62.655712]  [<ffffffff81115b09>] ? irq_work_queue+0x69/0xb0
      [   62.655717]  [<ffffffff810ab7f2>] ? vprintk_emit+0x1b2/0x520
      [   62.655722]  [<ffffffff8109cec5>] ? trace_hardirqs_on_caller+0x105/0x1d0
      [   62.655730]  [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
      [   62.655734]  [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
      [   62.655741]  [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
      [   62.655745]  [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
      [   62.655752]  [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
      [   62.655758]  [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen]
      [   62.655766]  [<ffffffffa0087a79>] ? pktgen_thread_worker+0xb19/0x1860 [pktgen]
      [   62.655771]  [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
      [   62.655777]  [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
      [   62.655785]  [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
      [   62.655791]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
      [   62.655795]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
      [   62.655800]  [<ffffffffa0086f60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
      [   62.655806]  [<ffffffff81078f84>] kthread+0xe4/0x100
      [   62.655813]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
      [   62.655819]  [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
      [   62.655824]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  27. 02 Dec, 2013 1 commit
    • {pktgen, xfrm} Update IPv4 header total len and checksum after tranformation · 3868204d
      fan.du authored
      commit a553e4a6 ("[PKTGEN]: IPSEC support")
      tried to support IPsec ESP transport transformation for pktgen, but actually
      this doesn't work at all, for two reasons (the original transformed packet has
      a bad IPv4 checksum value as well as a wrong auth value, as reported by Wireshark):

      - After transformation, the IPv4 header total length needs updating,
        because the encrypted payload's length is NOT the same as that of the plain text.

      - After transformation, the IPv4 checksum needs recalculating because the
        payload has been changed.

      With this patch, and with pktgen armed with the configuration below, Wireshark
      is able to decrypt the ESP packets generated by pktgen without any IPv4 checksum
      error or auth value error.
      
      pgset "flag IPSEC"
      pgset "flows 1"
      
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>