1. 11 Apr, 2016 4 commits
    • Erik Hugne's avatar
      tipc: make dist queue pernet · 541726ab
      Erik Hugne authored
      
      
      Nametable updates received from the network that cannot be applied
      immediately are placed on a defer queue. This queue is global to the
      TIPC module, which might cause problems when using TIPC in containers.
      To prevent nametable updates from escaping into the wrong namespace,
      we make the queue pernet instead.
      Signed-off-by: default avatarErik Hugne <erik.hugne@gmail.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      541726ab
    • Dmitry Ivanov's avatar
      netlink: don't send NETLINK_URELEASE for unbound sockets · e2726020
      Dmitry Ivanov authored
      
      
      All existing users of NETLINK_URELEASE use it to clean up resources that
      were previously allocated to a socket via some command. As a result, no
      users require getting this notification for unbound sockets.
      
      Sending it for unbound sockets, however, is a problem because any user
      (including unprivileged users) can create a socket that uses the same ID
      as an existing socket. Binding this new socket will fail, but if the
      NETLINK_URELEASE notification is generated for such sockets, the users
      thereof will be tricked into thinking the socket that they allocated the
      resources for is closed.
      
      In the nl80211 case, this will cause destruction of virtual interfaces
      that still belong to an existing hostapd process; this is the case that
      Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
      of netlink log/queue it will cause them to stop reporting events, as if
      NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.
      
      Fix this problem by checking that the socket is bound before generating
      the NETLINK_URELEASE notification.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Ivanov <dima@ubnt.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e2726020
    • David S. Miller's avatar
      decnet: Do not build routes to devices without decnet private data. · a36a0d40
      David S. Miller authored
      
      
      In particular, make sure we check for decnet private presence
      for loopback devices.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a36a0d40
    • Marcelo Ricardo Leitner's avatar
      sctp: avoid refreshing heartbeat timer too often · ba6f5e33
      Marcelo Ricardo Leitner authored
      
      
      Currently on high rate SCTP streams the heartbeat timer refresh can
      consume quite a lot of resources as timer updates are costly and it
      contains a random factor, which a) is also costly and b) invalidates
      mod_timer() optimization for not editing a timer to the same value.
      It may even cause the timer to be slightly advanced, for no good reason.
      
      As suggested by David Laight this patch now removes this timer update
      from hot path by leaving the timer on and re-evaluating upon its
      expiration if the heartbeat is still needed or not, similarly to what is
      done for TCP. If it's not needed anymore the timer is re-scheduled to
      the new timeout, considering the time already elapsed.
      
      For this, we now record the last tx timestamp per transport, updated in
      the same spots as hb timer was restarted on tx. Also split up
      sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
      sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
      heartbeat one.
      
      On loopback with MTU of 65535 and data chunks with 1636, so that we
      have a considerable amount of chunks without stressing system calls,
      netperf -t SCTP_STREAM -l 30, perf looked like this before:
      
      Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
        Overhead  Command  Shared Object      Symbol
      +    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
         - _raw_write_unlock_irqrestore
            - 96,54% _raw_spin_unlock_irqrestore
               - 36,14% mod_timer
                  + 97,24% sctp_transport_reset_timers
                  + 2,76% sctp_do_sm
               + 33,65% __wake_up_sync_key
               + 28,77% sctp_ulpq_tail_event
               + 1,40% del_timer
            - 1,84% mod_timer
               + 99,03% sctp_transport_reset_timers
               + 0,97% sctp_do_sm
            + 1,50% sctp_ulpq_tail_event
      
      And after this patch, now with netperf -l 60:
      
      Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
        Overhead  Command  Shared Object      Symbol
      +    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
      +    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
         - _raw_spin_unlock_irqrestore
            + 49,89% __wake_up_sync_key
            + 45,68% sctp_ulpq_tail_event
            - 2,85% mod_timer
               + 76,51% sctp_transport_reset_t3_rtx
               + 23,49% sctp_do_sm
            + 1,55% del_timer
      +    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
      +    2,26%  netperf  [sctp]             [k] sctp_sendmsg
      
      Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
      ~3.7%.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba6f5e33
  2. 08 Apr, 2016 2 commits
    • Roopa Prabhu's avatar
      mpls: find_outdev: check for err ptr in addition to NULL check · 94a57f1f
      Roopa Prabhu authored
      
      
      find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to
      find the output device. In case of an error, inet{,6}_fib_lookup_dev()
      returns error pointer and dev_get_by_index() returns NULL. But the function
      only checks for NULL and thus can end up calling dev_put on an ERR_PTR.
      This patch adds an additional check for err ptr after the NULL check.
      
      Before: Trying to add an mpls route with no oif from user, no available
      path to 10.1.1.8 and no default route:
      $ip -f mpls route add 100 as 200 via inet 10.1.1.8
      [  822.337195] BUG: unable to handle kernel NULL pointer dereference at
      00000000000003a3
      [  822.340033] IP: [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
      [  822.340033] PGD 1db38067 PUD 1de9e067 PMD 0
      [  822.340033] Oops: 0000 [#1] SMP
      [  822.340033] Modules linked in:
      [  822.340033] CPU: 0 PID: 11148 Comm: ip Not tainted 4.5.0-rc7+ #54
      [  822.340033] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org
      04/01/2014
      [  822.340033] task: ffff88001db82580 ti: ffff88001dad4000 task.ti:
      ffff88001dad4000
      [  822.340033] RIP: 0010:[<ffffffff8148781e>]  [<ffffffff8148781e>]
      mpls_nh_assign_dev+0x10b/0x182
      [  822.340033] RSP: 0018:ffff88001dad7a88  EFLAGS: 00010282
      [  822.340033] RAX: ffffffffffffff9b RBX: ffffffffffffff9b RCX:
      0000000000000002
      [  822.340033] RDX: 00000000ffffff9b RSI: 0000000000000008 RDI:
      0000000000000000
      [  822.340033] RBP: ffff88001ddc9ea0 R08: ffff88001e9f1768 R09:
      0000000000000000
      [  822.340033] R10: ffff88001d9c1100 R11: ffff88001e3c89f0 R12:
      ffffffff8187e0c0
      [  822.340033] R13: ffffffff8187e0c0 R14: ffff88001ddc9e80 R15:
      0000000000000004
      [  822.340033] FS:  00007ff9ed798700(0000) GS:ffff88001fc00000(0000)
      knlGS:0000000000000000
      [  822.340033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  822.340033] CR2: 00000000000003a3 CR3: 000000001de89000 CR4:
      00000000000006f0
      [  822.340033] Stack:
      [  822.340033]  0000000000000000 0000000100000000 0000000000000000
      0000000000000000
      [  822.340033]  0000000000000000 0801010a00000000 0000000000000000
      0000000000000000
      [  822.340033]  0000000000000004 ffffffff8148749b ffffffff8187e0c0
      000000000000001c
      [  822.340033] Call Trace:
      [  822.340033]  [<ffffffff8148749b>] ? mpls_rt_alloc+0x2b/0x3e
      [  822.340033]  [<ffffffff81488e66>] ? mpls_rtm_newroute+0x358/0x3e2
      [  822.340033]  [<ffffffff810e7bbc>] ? get_page+0x5/0xa
      [  822.340033]  [<ffffffff813b7d94>] ? rtnetlink_rcv_msg+0x17e/0x191
      [  822.340033]  [<ffffffff8111794e>] ? __kmalloc_track_caller+0x8c/0x9e
      [  822.340033]  [<ffffffff813c9393>] ?
      rht_key_hashfn.isra.20.constprop.57+0x14/0x1f
      [  822.340033]  [<ffffffff813b7c16>] ? __rtnl_unlock+0xc/0xc
      [  822.340033]  [<ffffffff813cb794>] ? netlink_rcv_skb+0x36/0x82
      [  822.340033]  [<ffffffff813b4507>] ? rtnetlink_rcv+0x1f/0x28
      [  822.340033]  [<ffffffff813cb2b1>] ? netlink_unicast+0x106/0x189
      [  822.340033]  [<ffffffff813cb5b3>] ? netlink_sendmsg+0x27f/0x2c8
      [  822.340033]  [<ffffffff81392ede>] ? sock_sendmsg_nosec+0x10/0x1b
      [  822.340033]  [<ffffffff81393df1>] ? ___sys_sendmsg+0x182/0x1e3
      [  822.340033]  [<ffffffff810e4f35>] ?
      __alloc_pages_nodemask+0x11c/0x1e4
      [  822.340033]  [<ffffffff8110619c>] ? PageAnon+0x5/0xd
      [  822.340033]  [<ffffffff811062fe>] ? __page_set_anon_rmap+0x45/0x52
      [  822.340033]  [<ffffffff810e7bbc>] ? get_page+0x5/0xa
      [  822.340033]  [<ffffffff810e85ab>] ? __lru_cache_add+0x1a/0x3a
      [  822.340033]  [<ffffffff81087ea9>] ? current_kernel_time64+0x9/0x30
      [  822.340033]  [<ffffffff813940c4>] ? __sys_sendmsg+0x3c/0x5a
      [  822.340033]  [<ffffffff8148f597>] ?
      entry_SYSCALL_64_fastpath+0x12/0x6a
      [  822.340033] Code: 83 08 04 00 00 65 ff 00 48 8b 3c 24 e8 40 7c f2 ff
      eb 13 48 c7 c3 9f ff ff ff eb 0f 89 ce e8 f1 ae f1 ff 48 89 c3 48 85 db
      74 15 <48> 8b 83 08 04 00 00 65 ff 08 48 81 fb 00 f0 ff ff 76 0d eb 07
      [  822.340033] RIP  [<ffffffff8148781e>] mpls_nh_assign_dev+0x10b/0x182
      [  822.340033]  RSP <ffff88001dad7a88>
      [  822.340033] CR2: 00000000000003a3
      [  822.435363] ---[ end trace 98cc65e6f6b8bf11 ]---
      
      After patch:
      $ip -f mpls route add 100 as 200 via inet 10.1.1.8
      RTNETLINK answers: Network is unreachable
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Reported-by: default avatarDavid Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94a57f1f
    • Jakub Sitnicki's avatar
      ipv6: Count in extension headers in skb->network_header · 3ba3458f
      Jakub Sitnicki authored
      
      
      When sending a UDPv6 message longer than MTU, account for the length
      of fragmentable IPv6 extension headers in skb->network_header offset.
      Same as we do in alloc_new_skb path in __ip6_append_data().
      
      This ensures that later on __ip6_make_skb() will make space in
      headroom for fragmentable extension headers:
      
      	/* move skb->data to ip header from ext header */
      	if (skb->data < skb_network_header(skb))
      		__skb_pull(skb, skb_network_offset(skb));
      
      Prevents a splat due to skb_under_panic:
      
      skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \
      head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:104!
      invalid opcode: 0000 [#1] KASAN
      CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65
      [...]
      Call Trace:
       [<ffffffff813eb7b9>] skb_push+0x79/0x80
       [<ffffffff8143397b>] eth_header+0x2b/0x100
       [<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310
       [<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0
       [<ffffffff814efe3a>] ip6_output+0x16a/0x280
       [<ffffffff815440c1>] ip6_local_out+0xb1/0xf0
       [<ffffffff814f1115>] ip6_send_skb+0x45/0xd0
       [<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0
       [<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090
      [...]
      Reported-by: default avatarJi Jianwen <jiji@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ba3458f
  3. 07 Apr, 2016 3 commits
  4. 06 Apr, 2016 5 commits
  5. 05 Apr, 2016 13 commits
  6. 04 Apr, 2016 5 commits
  7. 01 Apr, 2016 1 commit
    • Daniel Borkmann's avatar
      tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter · 5a5abb1f
      Daniel Borkmann authored
      Sasha Levin reported a suspicious rcu_dereference_protected() warning
      found while fuzzing with trinity that is similar to this one:
      
        [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
        [   52.765688] other info that might help us debug this:
        [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
        [   52.765701] 1 lock held by a.out/1525:
        [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
        [   52.765721] stack backtrace:
        [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
        [...]
        [   52.765768] Call Trace:
        [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
        [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
        [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
        [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
        [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
        [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
        [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
        [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
        [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
        [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
        [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
        [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
      from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
      DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.
      
      Since the fix in f91ff5b9 ("net: sk_{detach|attach}_filter() rcu
      fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
      filter with rcu_dereference_protected(), checking whether socket lock
      is held in control path.
      
      Since its introduction in 99405162
      
       ("tun: socket filter support"),
      tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
      sock_owned_by_user(sk) doesn't apply in this specific case and therefore
      triggers the false positive.
      
      Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
      that is used by tap filters and pass in lockdep_rtnl_is_held() for the
      rcu_dereference_protected() checks instead.
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5a5abb1f
  8. 31 Mar, 2016 1 commit
  9. 30 Mar, 2016 5 commits
  10. 28 Mar, 2016 1 commit