1. 10 Nov, 2010 1 commit
    • David S. Miller's avatar
      filter: make sure filters dont read uninitialized memory · 57fe93b3
      David S. Miller authored
      There is a possibility malicious users can get limited information about
      uninitialized stack mem array. Even if sk_run_filter() result is bound
      to packet length (0 .. 65535), we could imagine this can be used by
      hostile user.
      Initializing mem[] array, like Dan Rosenberg suggested in his patch is
      expensive since most filters dont even use this array.
      Its hard to make the filter validation in sk_chk_filter(), because of
      the jumps. This might be done later.
      In this patch, I use a bitmap (a single long var) so that only filters
      using mem[] loads/stores pay the price of added security checks.
      For other filters, additional cost is a single instruction.
      [ Since we access fentry->k a lot now, cache it in a local variable
        and mark filter entry pointer as const. -DaveM ]
      Reported-by: default avatarDan Rosenberg <drosenberg@vsecurity.com>
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 25 Oct, 2010 1 commit
  3. 28 Sep, 2010 1 commit
  4. 26 Jun, 2010 1 commit
    • Hagen Paul Pfeifer's avatar
      net: optimize Berkeley Packet Filter (BPF) processing · 01f2f3f6
      Hagen Paul Pfeifer authored
      Gcc is currenlty not in the ability to optimize the switch statement in
      sk_run_filter() because of dense case labels. This patch replace the
      OR'd labels with ordered sequenced case labels. The sk_chk_filter()
      function is modified to patch/replace the original OPCODES in a
      ordered but equivalent form. gcc is now in the ability to transform the
      switch statement in sk_run_filter into a jump table of complexity O(1).
      Until this patch gcc generates a sequence of conditional branches (O(n) of 567
      byte .text segment size (arch x86_64):
      7ff: 8b 06                 mov    (%rsi),%eax
      801: 66 83 f8 35           cmp    $0x35,%ax
      805: 0f 84 d0 02 00 00     je     adb <sk_run_filter+0x31d>
      80b: 0f 87 07 01 00 00     ja     918 <sk_run_filter+0x15a>
      811: 66 83 f8 15           cmp    $0x15,%ax
      815: 0f 84 c5 02 00 00     je     ae0 <sk_run_filter+0x322>
      81b: 77 73                 ja     890 <sk_run_filter+0xd2>
      81d: 66 83 f8 04           cmp    $0x4,%ax
      821: 0f 84 17 02 00 00     je     a3e <sk_run_filter+0x280>
      827: 77 29                 ja     852 <sk_run_filter+0x94>
      829: 66 83 f8 01           cmp    $0x1,%ax
      With the modification the compiler translate the switch statement into
      the following jump table fragment:
      7ff: 66 83 3e 2c           cmpw   $0x2c,(%rsi)
      803: 0f 87 1f 02 00 00     ja     a28 <sk_run_filter+0x26a>
      809: 0f b7 06              movzwl (%rsi),%eax
      80c: ff 24 c5 00 00 00 00  jmpq   *0x0(,%rax,8)
      813: 44 89 e3              mov    %r12d,%ebx
      816: e9 43 03 00 00        jmpq   b5e <sk_run_filter+0x3a0>
      81b: 41 89 dc              mov    %ebx,%r12d
      81e: e9 3b 03 00 00        jmpq   b5e <sk_run_filter+0x3a0>
      Furthermore, I reordered the instructions to reduce cache line misses by
      order the most common instruction to the start.
      Signed-off-by: default avatarHagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  5. 22 Apr, 2010 1 commit
  6. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      The script does the followings.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  7. 25 Feb, 2010 1 commit
    • Paul E. McKenney's avatar
      net: Add checking to rcu_dereference() primitives · a898def2
      Paul E. McKenney authored
      Update rcu_dereference() primitives to use new lockdep-based
      checking. The rcu_dereference() in __in6_dev_get() may be
      protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
      The rcu_dereference() in __sk_free() is protected by the fact
      that it is never reached if an update could change it.  Check
      for this by using rcu_dereference_check() to verify that the
      struct sock's ->sk_wmem_alloc counter is zero.
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  8. 18 Feb, 2010 1 commit
  9. 20 Oct, 2009 2 commits
  10. 20 Nov, 2008 1 commit
    • Pablo Neira Ayuso's avatar
      filter: add SKF_AD_NLATTR_NEST to look for nested attributes · d214c753
      Pablo Neira Ayuso authored
      SKF_AD_NLATTR allows us to find the first matching attribute in a
      stream of netlink attributes from one offset to the end of the
      netlink message. This is not suitable to look for a specific
      matching inside a set of nested attributes.
      For example, in ctnetlink messages, if we look for the CTA_V6_SRC
      attribute in a message that talks about an IPv4 connection,
      SKF_AD_NLATTR returns the offset of CTA_STATUS which has the same
      value of CTA_V6_SRC but outside the nest. To differenciate
      CTA_STATUS and CTA_V6_SRC, we would have to make assumptions on the
      size of the attribute and the usual offset, resulting in horrible
      BSF code.
      This patch adds SKF_AD_NLATTR_NEST, which is a variant of
      SKF_AD_NLATTR, that looks for an attribute inside the limits of
      a nested attributes, but not further.
      This patch validates that we have enough room to look for the
      nested attributes - based on a suggestion from Patrick McHardy.
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  11. 02 Jul, 2008 1 commit
  12. 02 May, 2008 1 commit
  13. 10 Apr, 2008 3 commits
    • Patrick McHardy's avatar
      [SKFILTER]: Add SKF_ADF_NLATTR instruction · 4738c1db
      Patrick McHardy authored
      SKF_ADF_NLATTR searches for a netlink attribute, which avoids manually
      parsing and walking attributes. It takes the offset at which to start
      searching in the 'A' register and the attribute type in the 'X' register
      and returns the offset in the 'A' register. When the attribute is not
      found it returns zero.
      A top-level attribute can be located using a filter like this
      (example for nfnetlink, using struct nfgenmsg):
      		/* A = offset of first attribute */
      		.code	= BPF_LD | BPF_IMM,
      		.k	= sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg)
      		/* X = CTA_PROTOINFO */
      		.code	= BPF_LDX | BPF_IMM,
      		.k	= CTA_PROTOINFO,
      		/* A = netlink attribute offset */
      		.code	= BPF_LD | BPF_B | BPF_ABS,
      		.k	= SKF_AD_OFF + SKF_AD_NLATTR
      		/* Exit if not found */
      		.code   = BPF_JMP | BPF_JEQ | BPF_K,
      		.k	= 0,
      		.jt	= <error>
      A nested attribute below the CTA_PROTOINFO attribute would then
      be parsed like this:
      		/* A += sizeof(struct nlattr) */
      		.code	= BPF_ALU | BPF_ADD | BPF_K,
      		.k	= sizeof(struct nlattr),
      		/* X = CTA_PROTOINFO_TCP */
      		.code	= BPF_LDX | BPF_IMM,
      		.k	= CTA_PROTOINFO_TCP,
      		/* A = netlink attribute offset */
      		.code	= BPF_LD | BPF_B | BPF_ABS,
      		.k	= SKF_AD_OFF + SKF_AD_NLATTR
      The data of an attribute can be loaded into 'A' like this:
      		/* X = A (attribute offset) */
      		.code	= BPF_MISC | BPF_TAX,
      		/* A = skb->data[X + k] */
      		.code 	= BPF_LD | BPF_B | BPF_IND,
      		.k	= sizeof(struct nlattr),
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Stephen Hemminger's avatar
      socket: sk_filter deinline · 43db6d65
      Stephen Hemminger authored
      The sk_filter function is too big to be inlined. This saves 2296 bytes
      of text on allyesconfig.
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Stephen Hemminger's avatar
      socket: sk_filter minor cleanups · b715631f
      Stephen Hemminger authored
      Some minor style cleanups:
        * Move __KERNEL__ definitions to one place in filter.h
        * Use const for sk_filter_len
        * Line wrapping
        * Put EXPORT_SYMBOL next to function definition
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  14. 19 Oct, 2007 1 commit
    • Olof Johansson's avatar
      [NET]: Fix bug in sk_filter race cures. · 9b013e05
      Olof Johansson authored
      Looks like this might be causing problems, at least for me on ppc. This
      happened during a normal boot, right around first interface config/dhcp
      cpu 0x0: Vector: 300 (Data Access) at [c00000000147b820]
          pc: c000000000435e5c: .sk_filter_delayed_uncharge+0x1c/0x60
          lr: c0000000004360d0: .sk_attach_filter+0x170/0x180
          sp: c00000000147baa0
         msr: 9000000000009032
         dar: 4
       dsisr: 40000000
        current = 0xc000000004780fa0
        paca    = 0xc000000000650480
          pid   = 1295, comm = dhclient3
      0:mon> t
      [c00000000147bb20] c0000000004360d0 .sk_attach_filter+0x170/0x180
      [c00000000147bbd0] c000000000418988 .sock_setsockopt+0x788/0x7f0
      [c00000000147bcb0] c000000000438a74 .compat_sys_setsockopt+0x4e4/0x5a0
      [c00000000147bd90] c00000000043955c .compat_sys_socketcall+0x25c/0x2b0
      [c00000000147be30] c000000000007508 syscall_exit+0x0/0x40
      --- Exception: c01 (System Call) at 000000000ff618d8
      SP (fffdf040) is in userspace
      I.e. null pointer deref at sk_filter_delayed_uncharge+0x1c:
      0:mon> di $.sk_filter_delayed_uncharge
      c000000000435e40  7c0802a6      mflr    r0
      c000000000435e44  fbc1fff0      std     r30,-16(r1)
      c000000000435e48  7c8b2378      mr      r11,r4
      c000000000435e4c  ebc2cdd0      ld      r30,-12848(r2)
      c000000000435e50  f8010010      std     r0,16(r1)
      c000000000435e54  f821ff81      stdu    r1,-128(r1)
      c000000000435e58  380300a4      addi    r0,r3,164
      c000000000435e5c  81240004      lwz     r9,4(r4)
      That's the deref of fp:
      static void sk_filter_delayed_uncharge(struct sock *sk, struct sk_filter *fp)
              unsigned int size = sk_filter_len(fp);
      That is called from sk_attach_filter():
              old_fp = rcu_dereference(sk->sk_filter);
              rcu_assign_pointer(sk->sk_filter, fp);
              sk_filter_delayed_uncharge(sk, old_fp);
              return 0;
      So, looks like rcu_dereference() returned NULL. I don't know the
      filter code at all, but it seems like it might be a valid case?
      sk_detach_filter() seems to handle a NULL sk_filter, at least.
      So, this needs review by someone who knows the filter, but it fixes the
      problem for me:
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  15. 18 Oct, 2007 4 commits
  16. 26 Apr, 2007 3 commits
  17. 14 Feb, 2007 1 commit
    • Tim Schmielau's avatar
      [PATCH] remove many unneeded #includes of sched.h · cd354f1a
      Tim Schmielau authored
      After Al Viro (finally) succeeded in removing the sched.h #include in module.h
      recently, it makes sense again to remove other superfluous sched.h includes.
      There are quite a lot of files which include it but don't actually need
      anything defined in there.  Presumably these includes were once needed for
      macros that used to live in sched.h, but moved to other header files in the
      course of cleaning it up.
      To ease the pain, this time I did not fiddle with any header files and only
      removed #includes from .c-files, which tend to cause less trouble.
      Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
      arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
      allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
      configs in arch/arm/configs on arm.  I also checked that no new warnings were
      introduced by the patch (actually, some warnings are removed that were emitted
      by unnecessarily included header files).
      Signed-off-by: default avatarTim Schmielau <tim@physik3.uni-rostock.de>
      Acked-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  18. 11 Feb, 2007 1 commit
  19. 03 Dec, 2006 1 commit
  20. 22 Sep, 2006 1 commit
  21. 18 Apr, 2006 1 commit
  22. 24 Jan, 2006 1 commit
  23. 17 Jan, 2006 1 commit
  24. 13 Jan, 2006 1 commit
  25. 06 Jan, 2006 1 commit
  26. 04 Jan, 2006 1 commit
  27. 27 Dec, 2005 1 commit
  28. 20 Nov, 2005 1 commit
  29. 06 Sep, 2005 1 commit
    • Herbert Xu's avatar
      [NET]: 2.6.13 breaks libpcap (and tcpdump) · 1198ad00
      Herbert Xu authored
      Patrick McHardy says:
        Never mind, I got it, we never fall through to the second switch
        statement anymore. I think we could simply break when load_pointer
        returns NULL. The switch statement will fall through to the default
        case and return 0 for all cases but 0 > k >= SKF_AD_OFF.
      Here's a patch to do just that.
      I left BPF_MSH alone because it's really a hack to calculate the IP
      header length, which makes no sense when applied to the special data.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  30. 05 Jul, 2005 3 commits