1. 12 Sep, 2019 1 commit
    • Yang Yingliang's avatar
      tun: fix use-after-free when register netdev failed · 77f22f92
      Yang Yingliang authored
      I got a UAF repport in tun driver when doing fuzzy test:
      
      [  466.269490] ==================================================================
      [  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
      [  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
      [  466.271810]
      [  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
      [  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [  466.271838] Call Trace:
      [  466.271858]  dump_stack+0xca/0x13e
      [  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271890]  print_address_description+0x79/0x440
      [  466.271906]  ? vprintk_func+0x5e/0xf0
      [  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271935]  __kasan_report+0x15c/0x1df
      [  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271976]  kasan_report+0xe/0x20
      [  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
      [  466.272013]  do_iter_readv_writev+0x4b7/0x740
      [  466.272032]  ? default_llseek+0x2d0/0x2d0
      [  466.272072]  do_iter_read+0x1c5/0x5e0
      [  466.272110]  vfs_readv+0x108/0x180
      [  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
      [  466.299020]  ? fsnotify+0x888/0xd50
      [  466.299040]  ? __fsnotify_parent+0xd0/0x350
      [  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
      [  466.304548]  ? vfs_write+0x264/0x510
      [  466.304569]  ? ksys_write+0x101/0x210
      [  466.304591]  ? do_preadv+0x116/0x1a0
      [  466.304609]  do_preadv+0x116/0x1a0
      [  466.309829]  do_syscall_64+0xc8/0x600
      [  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.309861] RIP: 0033:0x4560f9
      [  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      [  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
      [  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
      [  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
      [  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
      [  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
      [  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
      [  466.323057]
      [  466.323064] Allocated by task 2605:
      [  466.335165]  save_stack+0x19/0x80
      [  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
      [  466.337755]  kmem_cache_alloc+0xe8/0x320
      [  466.339050]  getname_flags+0xca/0x560
      [  466.340229]  user_path_at_empty+0x2c/0x50
      [  466.341508]  vfs_statx+0xe6/0x190
      [  466.342619]  __do_sys_newstat+0x81/0x100
      [  466.343908]  do_syscall_64+0xc8/0x600
      [  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.347034]
      [  466.347517] Freed by task 2605:
      [  466.348471]  save_stack+0x19/0x80
      [  466.349476]  __kasan_slab_free+0x12e/0x180
      [  466.350726]  kmem_cache_free+0xc8/0x430
      [  466.351874]  putname+0xe2/0x120
      [  466.352921]  filename_lookup+0x257/0x3e0
      [  466.354319]  vfs_statx+0xe6/0x190
      [  466.355498]  __do_sys_newstat+0x81/0x100
      [  466.356889]  do_syscall_64+0xc8/0x600
      [  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.359567]
      [  466.360050] The buggy address belongs to the object at ffff888372139100
      [  466.360050]  which belongs to the cache names_cache of size 4096
      [  466.363735] The buggy address is located 336 bytes inside of
      [  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
      [  466.367179] The buggy address belongs to the page:
      [  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
      [  466.371582] flags: 0x2fffff80010200(slab|head)
      [  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
      [  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  466.377778] page dumped because: kasan: bad access detected
      [  466.379730]
      [  466.380288] Memory state around the buggy address:
      [  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.388257]                                                  ^
      [  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.394667] ==================================================================
      
      tun_chr_read_iter() accessed the memory which freed by free_netdev()
      called by tun_set_iff():
      
              CPUA                                           CPUB
        tun_set_iff()
          alloc_netdev_mqs()
          tun_attach()
                                                        tun_chr_read_iter()
                                                          tun_get()
                                                          tun_do_read()
                                                            tun_ring_recv()
          register_netdevice() <-- inject error
          goto err_detach
          tun_detach_all() <-- set RCV_SHUTDOWN
          free_netdev() <-- called from
                           err_free_dev path
            netdev_freemem() <-- free the memory
                              without check refcount
            (In this path, the refcount cannot prevent
             freeing the memory of dev, and the memory
             will be used by dev_put() called by
             tun_chr_read_iter() on CPUB.)
                                                           (Break from tun_ring_recv(),
                                                           because RCV_SHUTDOWN is set)
                                                         tun_put()
                                                           dev_put() <-- use the memory
                                                                         freed by netdev_freemem()
      
      Put the publishing of tfile->tun after register_netdevice(),
      so tun_get() won't get the tun pointer that freed by
      err_detach path if register_netdevice() failed.
      
      Fixes: eb0fb363
      
       ("tuntap: attach queue 0 before registering netdevice")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Suggested-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      77f22f92
  2. 25 Jul, 2019 1 commit
    • Alexis Bauvin's avatar
      tun: mark small packets as owned by the tap sock · 4b663366
      Alexis Bauvin authored
      - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
      
      Small packets going out of a tap device go through an optimized code
      path that uses build_skb() rather than sock_alloc_send_pskb(). The
      latter calls skb_set_owner_w(), but the small packet code path does not.
      
      The net effect is that small packets are not owned by the userland
      application's socket (e.g. QEMU), while large packets are.
      This can be seen with a TCP session, where packets are not owned when
      the window size is small enough (around PAGE_SIZE), while they are once
      the window grows (note that this requires the host to support virtio
      tso for the guest to offload segmentation).
      All this leads to inconsistent behaviour in the kernel, especially on
      netfilter modules that uses sk->socket (e.g. xt_owner).
      
      Fixes: 66ccbc9c
      
       ("tap: use build_skb() for small packet")
      Signed-off-by: default avatarAlexis Bauvin <abauvin@scaleway.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4b663366
  3. 09 Jul, 2019 1 commit
    • Al Viro's avatar
      coallocate socket_wq with socket itself · 333f7909
      Al Viro authored
      
      
      socket->wq is assign-once, set when we are initializing both
      struct socket it's in and struct socket_wq it points to.  As the
      matter of fact, the only reason for separate allocation was the
      ability to RCU-delay freeing of socket_wq.  RCU-delaying the
      freeing of socket itself gets rid of that need, so we can just
      fold struct socket_wq into the end of struct socket and simplify
      the life both for sock_alloc_inode() (one allocation instead of
      two) and for tun/tap oddballs, where we used to embed struct socket
      and struct socket_wq into the same structure (now - embedding just
      the struct socket).
      
      Note that reference to struct socket_wq in struct sock does remain
      a reference - that's unchanged.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      333f7909
  4. 18 Jun, 2019 1 commit
    • Fei Li's avatar
      tun: wake up waitqueues after IFF_UP is set · 72b319dc
      Fei Li authored
      
      
      Currently after setting tap0 link up, the tun code wakes tx/rx waited
      queues up in tun_net_open() when .ndo_open() is called, however the
      IFF_UP flag has not been set yet. If there's already a wait queue, it
      would fail to transmit when checking the IFF_UP flag in tun_sendmsg().
      Then the saving vhost_poll_start() will add the wq into wqh until it
      is waken up again. Although this works when IFF_UP flag has been set
      when tun_chr_poll detects; this is not true if IFF_UP flag has not
      been set at that time. Sadly the latter case is a fatal error, as
      the wq will never be waken up in future unless later manually
      setting link up on purpose.
      
      Fix this by moving the wakeup process into the NETDEV_UP event
      notifying process, this makes sure IFF_UP has been set before all
      waited queues been waken up.
      Signed-off-by: default avatarFei Li <lifei.shirley@bytedance.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72b319dc
  5. 30 May, 2019 1 commit
    • Thomas Gleixner's avatar
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 · c942fddf
      Thomas Gleixner authored
      
      
      Based on 3 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version this program is distributed in the
        hope that it will be useful but without any warranty without even
        the implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version [author] [kishon] [vijay] [abraham]
        [i] [kishon]@[ti] [com] this program is distributed in the hope that
        it will be useful but without any warranty without even the implied
        warranty of merchantability or fitness for a particular purpose see
        the gnu general public license for more details
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version [author] [graeme] [gregory]
        [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
        [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
        [hk] [hemahk]@[ti] [com] this program is distributed in the hope
        that it will be useful but without any warranty without even the
        implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 1105 file(s).
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAllison Randal <allison@lohutok.net>
      Reviewed-by: default avatarRichard Fontana <rfontana@redhat.com>
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c942fddf
  6. 09 May, 2019 2 commits
  7. 23 Apr, 2019 1 commit
    • Stanislav Fomichev's avatar
      net: pass net_device argument to the eth_get_headlen · c43f1255
      Stanislav Fomichev authored
      
      
      Update all users of eth_get_headlen to pass network device, fetch
      network namespace from it and pass it down to the flow dissector.
      This commit is a noop until administrator inserts BPF flow dissector
      program.
      
      Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: intel-wired-lan@lists.osuosl.org
      Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
      Cc: Salil Mehta <salil.mehta@huawei.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Cc: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c43f1255
  8. 24 Mar, 2019 1 commit
  9. 21 Mar, 2019 2 commits
  10. 20 Mar, 2019 1 commit
    • Paolo Abeni's avatar
      net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce
      Paolo Abeni authored
      
      
      After the previous patch, all the callers of ndo_select_queue()
      provide as a 'fallback' argument netdev_pick_tx.
      The only exceptions are nested calls to ndo_select_queue(),
      which pass down the 'fallback' available in the current scope
      - still netdev_pick_tx.
      
      We can drop such argument and replace fallback() invocation with
      netdev_pick_tx(). This avoids an indirect call per xmit packet
      in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
      with device drivers implementing such ndo. It also clean the code
      a bit.
      
      Tested with ixgbe and CONFIG_FCOE=m
      
      With pktgen using queue xmit:
      threads		vanilla 	patched
      		(kpps)		(kpps)
      1		2334		2428
      2		4166		4278
      4		7895		8100
      
       v1 -> v2:
       - rebased after helper's name change
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a350ecce
  11. 16 Mar, 2019 1 commit
  12. 15 Mar, 2019 1 commit
    • Eric Dumazet's avatar
      tun: properly test for IFF_UP · 4477138f
      Eric Dumazet authored
      Same reasons than the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx_ni() or napi_gro_frags() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro layer.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Fixes: 1bd4978a
      
       ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4477138f
  13. 25 Feb, 2019 2 commits
  14. 22 Feb, 2019 1 commit
    • Maxim Mikityanskiy's avatar
      net: Don't set transport offset to invalid value · d2aa125d
      Maxim Mikityanskiy authored
      
      
      If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
      skb->protocol will be unset, __skb_flow_dissect() will fail, and
      skb_probe_transport_header() will fall back to the offset_hint, making
      the resulting skb_transport_offset incorrect.
      
      If, however, there is no transport header in the packet,
      transport_header shouldn't be set to an arbitrary value.
      
      Fix it by leaving the transport offset unset if it couldn't be found, to
      be explicit rather than to fill it with some wrong value. It changes the
      behavior, but if some code relied on the old behavior, it would be
      broken anyway, as the old one is incorrect.
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d2aa125d
  15. 31 Jan, 2019 1 commit
  16. 10 Jan, 2019 1 commit
    • Stanislav Fomichev's avatar
      tun: publish tfile after it's fully initialized · 0b7959b6
      Stanislav Fomichev authored
      
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
      Call Trace:
       ? napi_gro_frags+0xa7/0x2c0
       tun_get_user+0xb50/0xf20
       tun_chr_write_iter+0x53/0x70
       new_sync_write+0xff/0x160
       vfs_write+0x191/0x1e0
       __x64_sys_write+0x5e/0xd0
       do_syscall_64+0x47/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I think there is a subtle race between sending a packet via tap and
      attaching it:
      
      CPU0:                    CPU1:
      tun_chr_ioctl(TUNSETIFF)
        tun_set_iff
          tun_attach
            rcu_assign_pointer(tfile->tun, tun);
                               tun_fops->write_iter()
                                 tun_chr_write_iter
                                   tun_napi_alloc_frags
                                     napi_get_frags
                                       napi->skb = napi_alloc_skb
            tun_napi_init
              netif_napi_add
                napi->skb = NULL
                                    napi->skb is NULL here
                                    napi_gro_frags
                                      napi_frags_skb
      				  skb = napi->skb
      				  skb_reset_mac_header(skb)
      				  panic()
      
      Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
      be the last thing we do in tun_attach(); this should guarantee that when we
      call tun_get() we always get an initialized object.
      
      v2 changes:
      * remove extra napi_mutex locks/unlocks for napi operations
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Fixes: 90e33d45
      
       ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0b7959b6
  17. 14 Dec, 2018 2 commits
  18. 06 Dec, 2018 2 commits
  19. 03 Dec, 2018 1 commit
    • Prashant Bhole's avatar
      tun: remove skb access after netif_receive_skb · 4e4b08e5
      Prashant Bhole authored
      In tun.c skb->len was accessed while doing stats accounting after a
      call to netif_receive_skb. We can not access skb after this call
      because buffers may be dropped.
      
      The fix for this bug would be to store skb->len in local variable and
      then use it after netif_receive_skb(). IMO using xdp data size for
      accounting bytes will be better because input for tun_xdp_one() is
      xdp_buff.
      
      Hence this patch:
      - fixes a bug by removing skb access after netif_receive_skb()
      - uses xdp data size for accounting bytes
      
      [613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun]
      [613.021062] Read of size 4 at addr ffff8881da9ab7c0 by task vhost-1115/1155
      [613.023073]
      [613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232
      [613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [613.029116] Call Trace:
      [613.031145]  dump_stack+0x5b/0x90
      [613.032219]  print_address_description+0x6c/0x23c
      [613.034156]  ? tun_sendmsg+0x77c/0xc50 [tun]
      [613.036141]  kasan_report.cold.5+0x241/0x308
      [613.038125]  tun_sendmsg+0x77c/0xc50 [tun]
      [613.040109]  ? tun_get_user+0x1960/0x1960 [tun]
      [613.042094]  ? __isolate_free_page+0x270/0x270
      [613.045173]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
      [613.047127]  ? peek_head_len.part.13+0x90/0x90 [vhost_net]
      [613.049096]  ? get_tx_bufs+0x5a/0x2c0 [vhost_net]
      [613.051106]  ? vhost_enable_notify+0x2d8/0x420 [vhost]
      [613.053139]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
      [613.053139]  ? vhost_net_buf_peek+0x340/0x340 [vhost_net]
      [613.053139]  ? __mutex_lock+0x8d9/0xb30
      [613.053139]  ? finish_task_switch+0x8f/0x3f0
      [613.053139]  ? handle_tx+0x32/0x120 [vhost_net]
      [613.053139]  ? mutex_trylock+0x110/0x110
      [613.053139]  ? finish_task_switch+0xcf/0x3f0
      [613.053139]  ? finish_task_switch+0x240/0x3f0
      [613.053139]  ? __switch_to_asm+0x34/0x70
      [613.053139]  ? __switch_to_asm+0x40/0x70
      [613.053139]  ? __schedule+0x506/0xf10
      [613.053139]  handle_tx+0xc7/0x120 [vhost_net]
      [613.053139]  vhost_worker+0x166/0x200 [vhost]
      [613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
      [613.053139]  ? __kthread_parkme+0x77/0x90
      [613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
      [613.053139]  kthread+0x1b1/0x1d0
      [613.053139]  ? kthread_park+0xb0/0xb0
      [613.053139]  ret_from_fork+0x35/0x40
      [613.088705]
      [613.088705] Allocated by task 1155:
      [613.088705]  kasan_kmalloc+0xbf/0xe0
      [613.088705]  kmem_cache_alloc+0xdc/0x220
      [613.088705]  __build_skb+0x2a/0x160
      [613.088705]  build_skb+0x14/0xc0
      [613.088705]  tun_sendmsg+0x4f0/0xc50 [tun]
      [613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
      [613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
      [613.088705]  handle_tx+0xc7/0x120 [vhost_net]
      [613.088705]  vhost_worker+0x166/0x200 [vhost]
      [613.088705]  kthread+0x1b1/0x1d0
      [613.088705]  ret_from_fork+0x35/0x40
      [613.088705]
      [613.088705] Freed by task 1155:
      [613.088705]  __kasan_slab_free+0x12e/0x180
      [613.088705]  kmem_cache_free+0xa0/0x230
      [613.088705]  ip6_mc_input+0x40f/0x5a0
      [613.088705]  ipv6_rcv+0xc9/0x1e0
      [613.088705]  __netif_receive_skb_one_core+0xc1/0x100
      [613.088705]  netif_receive_skb_internal+0xc4/0x270
      [613.088705]  br_pass_frame_up+0x2b9/0x2e0
      [613.088705]  br_handle_frame_finish+0x2fb/0x7a0
      [613.088705]  br_handle_frame+0x30f/0x6c0
      [613.088705]  __netif_receive_skb_core+0x61a/0x15b0
      [613.088705]  __netif_receive_skb_one_core+0x8e/0x100
      [613.088705]  netif_receive_skb_internal+0xc4/0x270
      [613.088705]  tun_sendmsg+0x738/0xc50 [tun]
      [613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
      [613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
      [613.088705]  handle_tx+0xc7/0x120 [vhost_net]
      [613.088705]  vhost_worker+0x166/0x200 [vhost]
      [613.088705]  kthread+0x1b1/0x1d0
      [613.088705]  ret_from_fork+0x35/0x40
      [613.088705]
      [613.088705] The buggy address belongs to the object at ffff8881da9ab740
      [613.088705]  which belongs to the cache skbuff_head_cache of size 232
      
      Fixes: 043d222f
      
       ("tuntap: accept an array of XDP buffs through sendmsg()")
      Reviewed-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e4b08e5
  20. 01 Dec, 2018 2 commits
  21. 19 Nov, 2018 2 commits
    • Matthew Cover's avatar
      tuntap: fix multiqueue rx · 8ebebcba
      Matthew Cover authored
      
      
      When writing packets to a descriptor associated with a combined queue, the
      packets should end up on that queue.
      
      Before this change all packets written to any descriptor associated with a
      tap interface end up on rx-0, even when the descriptor is associated with a
      different queue.
      
      The rx traffic can be generated by either of the following.
        1. a simple tap program which spins up multiple queues and writes packets
           to each of the file descriptors
        2. tx from a qemu vm with a tap multiqueue netdev
      
      The queue for rx traffic can be observed by either of the following (done
      on the hypervisor in the qemu case).
        1. a simple netmap program which opens and reads from per-queue
           descriptors
        2. configuring RPS and doing per-cpu captures with rxtxcpu
      
      Alternatively, if you printk() the return value of skb_get_rx_queue() just
      before each instance of netif_receive_skb() in tun.c, you will get 65535
      for every skb.
      
      Calling skb_record_rx_queue() to set the rx queue to the queue_index fixes
      the association between descriptor and rx queue.
      Signed-off-by: default avatarMatthew Cover <matthew.cover@stackpath.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ebebcba
    • Eric Dumazet's avatar
      tun: use netdev_alloc_frag() in tun_napi_alloc_frags() · aa6daaca
      Eric Dumazet authored
      In order to cook skbs in the same way than Ethernet drivers,
      it is probably better to not use GFP_KERNEL, but rather
      use the GFP_ATOMIC and PFMEMALLOC mechanisms provided by
      netdev_alloc_frag().
      
      This would allow to use tun driver even in memory stress
      situations, especially if swap is used over this tun channel.
      
      Fixes: 90e33d45
      
       ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Petar Penkov <peterpenkov96@gmail.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aa6daaca
  22. 18 Nov, 2018 1 commit
  23. 17 Nov, 2018 1 commit
  24. 08 Nov, 2018 1 commit
  25. 16 Oct, 2018 1 commit
    • Serhey Popovych's avatar
      tun: Consistently configure generic netdev params via rtnetlink · df52eab2
      Serhey Popovych authored
      Configuring generic network device parameters on tun will fail in
      presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
      since tun_validate() always return failure.
      
      This can be visualized with following ip-link(8) command sequences:
      
        # ip link set dev tun0 group 100
        # ip link set dev tun0 group 100 type tun
        RTNETLINK answers: Invalid argument
      
      with contrast to dummy and veth drivers:
      
        # ip link set dev dummy0 group 100
        # ip link set dev dummy0 type dummy
      
        # ip link set dev veth0 group 100
        # ip link set dev veth0 group 100 type veth
      
      Fix by returning zero in tun_validate() when @data is NULL that is
      always in case since rtnl_link_ops->maxtype is zero in tun driver.
      
      Fixes: f019a7a5
      
       ("tun: Implement ip link del tunXXX")
      Signed-off-by: default avatarSerhey Popovych <serhe.popovych@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df52eab2
  26. 11 Oct, 2018 1 commit
  27. 02 Oct, 2018 3 commits
    • Eric Dumazet's avatar
      tun: napi flags belong to tfile · af3fb24e
      Eric Dumazet authored
      Since tun->flags might be shared by multiple tfile structures,
      it is better to make sure tun_get_user() is using the flags
      for the current tfile.
      
      Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint
      of what could happen, but we need something stronger to please
      syzbot.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
      RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
      RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
      RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
      R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
      R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
      FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715
       tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967
       call_write_iter include/linux/fs.h:1808 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
       vfs_write+0x1fc/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457579
      Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579
      RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a
      RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4
      R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff
      Modules linked in:
      
      RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
      RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
      RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
      RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
      R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
      R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
      FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 90e33d45
      
       ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af3fb24e
    • Eric Dumazet's avatar
      tun: initialize napi_mutex unconditionally · c7256f57
      Eric Dumazet authored
      This is the first part to fix following syzbot report :
      
      console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=443816db871edd66
      dashboard link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80
      
      Following patch is fixing the race condition, but it seems safer
      to initialize this mutex at tfile creation anyway.
      
      Fixes: 90e33d45
      
       ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7256f57
    • Eric Dumazet's avatar
      tun: remove unused parameters · 06e55add
      Eric Dumazet authored
      
      
      tun_napi_disable() and tun_napi_del() do not need
      a pointer to the tun_struct
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      06e55add
  28. 24 Sep, 2018 1 commit
    • Eric Dumazet's avatar
      tun: remove ndo_poll_controller · 765cdc20
      Eric Dumazet authored
      
      
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.
      
      tun uses NAPI for TX completions, so we better let core
      networking stack call the napi->poll() to avoid the capture.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      765cdc20
  29. 13 Sep, 2018 3 commits