1. 12 Sep, 2019 1 commit
    • Yang Yingliang's avatar
      tun: fix use-after-free when register netdev failed · 77f22f92
      Yang Yingliang authored
      I got a UAF repport in tun driver when doing fuzzy test:
      [  466.269490] ==================================================================
      [  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
      [  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
      [  466.271810]
      [  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
      [  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [  466.271838] Call Trace:
      [  466.271858]  dump_stack+0xca/0x13e
      [  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271890]  print_address_description+0x79/0x440
      [  466.271906]  ? vprintk_func+0x5e/0xf0
      [  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271935]  __kasan_report+0x15c/0x1df
      [  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271976]  kasan_report+0xe/0x20
      [  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
      [  466.272013]  do_iter_readv_writev+0x4b7/0x740
      [  466.272032]  ? default_llseek+0x2d0/0x2d0
      [  466.272072]  do_iter_read+0x1c5/0x5e0
      [  466.272110]  vfs_readv+0x108/0x180
      [  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
      [  466.299020]  ? fsnotify+0x888/0xd50
      [  466.299040]  ? __fsnotify_parent+0xd0/0x350
      [  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
      [  466.304548]  ? vfs_write+0x264/0x510
      [  466.304569]  ? ksys_write+0x101/0x210
      [  466.304591]  ? do_preadv+0x116/0x1a0
      [  466.304609]  do_preadv+0x116/0x1a0
      [  466.309829]  do_syscall_64+0xc8/0x600
      [  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.309861] RIP: 0033:0x4560f9
      [  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      [  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
      [  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
      [  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
      [  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
      [  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
      [  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
      [  466.323057]
      [  466.323064] Allocated by task 2605:
      [  466.335165]  save_stack+0x19/0x80
      [  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
      [  466.337755]  kmem_cache_alloc+0xe8/0x320
      [  466.339050]  getname_flags+0xca/0x560
      [  466.340229]  user_path_at_empty+0x2c/0x50
      [  466.341508]  vfs_statx+0xe6/0x190
      [  466.342619]  __do_sys_newstat+0x81/0x100
      [  466.343908]  do_syscall_64+0xc8/0x600
      [  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.347034]
      [  466.347517] Freed by task 2605:
      [  466.348471]  save_stack+0x19/0x80
      [  466.349476]  __kasan_slab_free+0x12e/0x180
      [  466.350726]  kmem_cache_free+0xc8/0x430
      [  466.351874]  putname+0xe2/0x120
      [  466.352921]  filename_lookup+0x257/0x3e0
      [  466.354319]  vfs_statx+0xe6/0x190
      [  466.355498]  __do_sys_newstat+0x81/0x100
      [  466.356889]  do_syscall_64+0xc8/0x600
      [  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.359567]
      [  466.360050] The buggy address belongs to the object at ffff888372139100
      [  466.360050]  which belongs to the cache names_cache of size 4096
      [  466.363735] The buggy address is located 336 bytes inside of
      [  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
      [  466.367179] The buggy address belongs to the page:
      [  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
      [  466.371582] flags: 0x2fffff80010200(slab|head)
      [  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
      [  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  466.377778] page dumped because: kasan: bad access detected
      [  466.379730]
      [  466.380288] Memory state around the buggy address:
      [  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.388257]                                                  ^
      [  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.394667] ==================================================================
      tun_chr_read_iter() accessed the memory which freed by free_netdev()
      called by tun_set_iff():
              CPUA                                           CPUB
          register_netdevice() <-- inject error
          goto err_detach
          tun_detach_all() <-- set RCV_SHUTDOWN
          free_netdev() <-- called from
                           err_free_dev path
            netdev_freemem() <-- free the memory
                              without check refcount
            (In this path, the refcount cannot prevent
             freeing the memory of dev, and the memory
             will be used by dev_put() called by
             tun_chr_read_iter() on CPUB.)
                                                           (Break from tun_ring_recv(),
                                                           because RCV_SHUTDOWN is set)
                                                           dev_put() <-- use the memory
                                                                         freed by netdev_freemem()
      Put the publishing of tfile->tun after register_netdevice(),
      so tun_get() won't get the tun pointer that freed by
      err_detach path if register_netdevice() failed.
      Fixes: eb0fb363
       ("tuntap: attach queue 0 before registering netdevice")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Suggested-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 11 Sep, 2019 7 commits
  3. 10 Sep, 2019 1 commit
  4. 07 Sep, 2019 4 commits
    • Fred Lotter's avatar
      nfp: flower: cmsg rtnl locks can timeout reify messages · 28abe579
      Fred Lotter authored
      Flower control message replies are handled in different locations. The truly
      high priority replies are handled in the BH (tasklet) context, while the
      remaining replies are handled in a predefined Linux work queue. The work
      queue handler orders replies into high and low priority groups, and always
      start servicing the high priority replies within the received batch first.
      Reply Type:			Rtnl Lock:	Handler:
      CMSG_TYPE_PORT_MOD		no		BH tasklet (mtu)
      CMSG_TYPE_TUN_NEIGH		no		BH tasklet
      CMSG_TYPE_FLOW_STATS		no		BH tasklet
      CMSG_TYPE_PORT_REIFY		no		WQ high
      CMSG_TYPE_PORT_MOD		yes		WQ high (link/mtu)
      CMSG_TYPE_MERGE_HINT		yes		WQ low
      CMSG_TYPE_NO_NEIGH		no		WQ low
      CMSG_TYPE_QOS_STATS		no		WQ low
      CMSG_TYPE_LAG_CONFIG		no		WQ low
      A subset of control messages can block waiting for an rtnl lock (from both
      work queue priority groups). The rtnl lock is heavily contended for by
      external processes such as systemd-udevd, systemd-network and libvirtd,
      especially during netdev creation, such as when flower VFs and representors
      are instantiated.
      Kernel netlink instrumentation shows that external processes (such as
      systemd-udevd) often use successive rtnl_trylock() sequences, which can result
      in an rtnl_lock() blocked control message to starve for longer periods of time
      during rtnl lock contention, i.e. netdev creation.
      In the current design a single blocked control message will block the entire
      work queue (both priorities), and introduce a latency which is
      nondeterministic and dependent on system wide rtnl lock usage.
      In some extreme cases, one blocked control message at exactly the wrong time,
      just before the maximum number of VFs are instantiated, can block the work
      queue for long enough to prevent VF representor REIFY replies from getting
      handled in time for the 40ms timeout.
      The firmware will deliver the total maximum number of REIFY message replies in
      around 300us.
      Only REIFY and MTU update messages require replies within a timeout period (of
      40ms). The MTU-only updates are already done directly in the BH (tasklet)
      Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts
      caused by a blocked work queue waiting on rtnl locks.
      Signed-off-by: default avatarFred Lotter <frederik.lotter@netronome.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Juliet Kim's avatar
      net/ibmvnic: free reset work of removed device from queue · 1c2977c0
      Juliet Kim authored
      Commit 36f1031c ("ibmvnic: Do not process reset during or after
       device removal") made the change to exit reset if the driver has been
      removed, but does not free reset work items of the adapter from queue.
      Ensure all reset work items are freed when breaking out of the loop early.
      Fixes: 36f1031c
       ("ibmnvic: Do not process reset during or after device removal”)
      Signed-off-by: default avatarJuliet Kim <julietk@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Stefan Chulski's avatar
      net: phylink: Fix flow control resolution · 63b2ed4e
      Stefan Chulski authored
      Regarding to IEEE 802.3-2015 standard section 2
      28B.3 Priority resolution - Table 28-3 - Pause resolution
      In case of Local device Pause=1 AsymDir=0, Link partner
      Pause=1 AsymDir=1, Local device resolution should be enable PAUSE
      transmit, disable PAUSE receive.
      And in case of Local device Pause=1 AsymDir=1, Link partner
      Pause=1 AsymDir=0, Local device resolution should be enable PAUSE
      receive, disable PAUSE transmit.
      Fixes: 9525ae83
       ("phylink: add phylink infrastructure")
      Signed-off-by: default avatarStefan Chulski <stefanc@marvell.com>
      Reported-by: default avatarShaul Ben-Mayor <shaulb@marvell.com>
      Acked-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Christophe JAILLET's avatar
      net/hamradio/6pack: Fix the size of a sk_buff used in 'sp_bump()' · b82573fd
      Christophe JAILLET authored
      We 'allocate' 'count' bytes here. In fact, 'dev_alloc_skb' already add some
      extra space for padding, so a bit more is allocated.
      However, we use 1 byte for the KISS command, then copy 'count' bytes, so
      count+1 bytes.
      Explicitly allocate and use 1 more byte to be safe.
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  5. 06 Sep, 2019 2 commits
    • Zhu Yanjun's avatar
      forcedeth: use per cpu to collect xmit/recv statistics · f4b633b9
      Zhu Yanjun authored
      When testing with a background iperf pushing 1Gbit/sec traffic and running
      both ifconfig and netstat to collect statistics, some deadlocks occurred.
      Ifconfig and netstat will call nv_get_stats64 to get software xmit/recv
      statistics. In the commit f5d827ae ("forcedeth: implement
      ndo_get_stats64() API"), the normal tx/rx variables is to collect tx/rx
      statistics. The fix is to replace normal tx/rx variables with per
      cpu 64-bit variable to collect xmit/recv statistics. The per cpu variable
      will avoid deadlocks and provide fast efficient statistics updates.
      In nv_probe, the per cpu variable is initialized. In nv_remove, this
      per cpu variable is freed.
      In xmit/recv process, this per cpu variable will be updated.
      In nv_get_stats64, this per cpu variable on each cpu is added up. Then
      the driver can get xmit/recv packets statistics.
      A test runs for several days with this commit, the deadlocks disappear
      and the performance is better.
         - iperf SMP x86_64 ->
         Client connecting to, TCP port 5001
         TCP window size: 85.0 KByte (default)
         [  3] local port 38888 connected with port 5001
         [ ID] Interval       Transfer     Bandwidth
         [  3]  0.0-10.0 sec  1.10 GBytes   943 Mbits/sec
         ifconfig results:
         enp0s9 Link encap:Ethernet  HWaddr 00:21:28:6f:de:0f
                inet addr:  Bcast:  Mask:
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:5774764531 errors:0 dropped:0 overruns:0 frame:0
                TX packets:633534193 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:7646159340904 (7.6 TB) TX bytes:11425340407722 (11.4 TB)
         netstat results:
         Kernel Interface table
         enp0s9 1500 0  5774764531 0    0 0      633534193      0      0  0 BMRU
      Fixes: f5d827ae
       ("forcedeth: implement ndo_get_stats64() API")
      CC: Joe Jin <joe.jin@oracle.com>
      CC: JUNXIAO_BI <junxiao.bi@oracle.com>
      Reported-and-tested-by: default avatarNan san <nan.1986san@gmail.com>
      Signed-off-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Mao Wenan's avatar
      net: sonic: return NETDEV_TX_OK if failed to map buffer · 6e1cdedc
      Mao Wenan authored
      NETDEV_TX_BUSY really should only be used by drivers that call
      netif_tx_stop_queue() at the wrong moment. If dma_map_single() is
      failed to map tx DMA buffer, it might trigger an infinite loop.
      This patch use NETDEV_TX_OK instead of NETDEV_TX_BUSY, and change
      printk to pr_err_ratelimited.
      Fixes: d9fb9f38
       ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
      Signed-off-by: default avatarMao Wenan <maowenan@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  6. 03 Sep, 2019 7 commits
  7. 02 Sep, 2019 1 commit
  8. 01 Sep, 2019 9 commits
  9. 31 Aug, 2019 1 commit
  10. 30 Aug, 2019 2 commits
  11. 29 Aug, 2019 1 commit
  12. 28 Aug, 2019 4 commits