1. 10 Sep, 2014 35 commits
  2. 09 Sep, 2014 5 commits
    • Merge branch 'bpf-next' · 60005c60
      David S. Miller authored
      
      
      Daniel Borkmann says:
      
      ====================
      BPF updates
      
      [ Set applies on top of current net-next but also on top of
        Alexei's latest patches. Please see individual patches for
        more details. ]
      
      Changelog:
       v1->v2:
        - Removed paragraph in 1st commit message
        - Rest stays the same
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpf: be friendly to kmemcheck · 286aad3c
      Daniel Borkmann authored
      
      
      As reported by Mikulas Patocka, kmemcheck currently barks out a
      false positive since we don't have a special kmemcheck annotation
      for the bitfields used in the bpf_prog structure.
      
      We currently have jited:1, len:31, and thus when accessing len
      while CONFIG_KMEMCHECK is enabled, kmemcheck throws a warning that
      we're reading uninitialized memory.
      
      As we don't need the whole bit universe for the pages member, we
      can just shrink it to a u16 and use a bool flag for jited instead
      of a bitfield.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpf: arm: address randomize and write protect JIT code · 55309dd3
      Daniel Borkmann authored
      This is the ARM variant for 314beb9b ("x86: bpf_jit_comp: secure bpf
      jit against spraying attacks").
      
      It is now possible to implement it due to commits 75374ad4 ("ARM: mm:
      Define set_memory_* functions for ARM") and dca9aa92 ("ARM: add
      DEBUG_SET_MODULE_RONX option to Kconfig") which added infrastructure for
      this facility.
      
      Thus, this patch makes sure the BPF-generated JIT code is marked RO,
      like other kernel text sections, and also lets the generated JIT code
      start at a pseudo-random offset instead of on a page boundary. The
      holes are filled with illegal instructions.
      
      JIT tested on armv7hl with BPF test suite.
      
      Reference: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Mircea Gherzan <mgherzan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpf: consolidate JIT binary allocator · 738cbe72
      Daniel Borkmann authored
      Introduced in commit 314beb9b ("x86: bpf_jit_comp: secure bpf jit
      against spraying attacks") and later on replicated in aa2d2c73
      ("s390/bpf,jit: address randomize and write protect jit code") for
      the s390 architecture, write protection for BPF JIT images was added,
      along with a random start address for the JIT code so that it no
      longer sits on a page boundary.
      
      Since both use a very similar allocator for the BPF binary header,
      we can consolidate this code into the BPF core, as it's mostly JIT
      independent anyway.
      
      This will also allow future archs that support DEBUG_SET_MODULE_RONX
      to simply reuse it instead of reimplementing it.
      
      JIT tested on x86_64 and s390x with BPF test suite.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove dst refcount false sharing for prequeue mode · ca777eff
      Eric Dumazet authored
      Alexander Duyck reported high false sharing on the dst refcount in the
      TCP stack when prequeue is used. Prequeue is the mechanism used when a
      thread is blocked in recvmsg()/read() on a TCP socket, using a blocking
      model rather than a select()/poll()/epoll() non-blocking one.
      
      We already try to use RCU in the input path as much as possible, but we
      were forced to take a refcount on the dst when the skb escaped the
      RCU-protected region. When/if the user thread runs on a different cpu,
      dst_release() will then touch the dst refcount again.
      
      Commit 09316255 ("tcp: force a dst refcount when prequeue packet")
      was an example of a race fix.
      
      It turns out the only remaining usage of skb->dst for a packet stored
      in a TCP socket prequeue is IP early demux.
      
      We can add logic to detect when IP early demux is probably going
      to use skb->dst. Because we do an optimistic check rather than duplicate
      existing logic, we need to guard inet_sk_rx_dst_set() and
      inet6_sk_rx_dst_set() against using a NULL dst.
      
      Many thanks to Alexander for providing a nice bug report, git bisection,
      and reproducer.
      
      Tested using Alexander's script on a 40Gb NIC with 8 RX queues.
      Hosts have 24 cores, 48 hyperthreads.
      
      echo 0 >/proc/sys/net/ipv4/tcp_autocorking
      
      for i in `seq 0 47`
      do
        for j in `seq 0 2`
        do
           netperf -H $DEST -t TCP_STREAM -l 1000 \
                   -c -C -T $i,$i -P 0 -- \
                   -m 64 -s 64K -D &
        done
      done
      
      Before patch: ~6Mpps and ~95% cpu usage on receiver.
      After patch:  ~9Mpps and ~35% cpu usage on receiver.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>