1. 14 Jan, 2019 1 commit
  2. 10 Jan, 2019 2 commits
    • netfilter: nft_flow_offload: fix interaction with vrf slave device · 10f4e765
      wenxu authored
      In the forward chain, the iif is changed from slave device to master vrf
      device. Thus, flow offload does not find a match on the lower slave
      device.
      This patch uses the cached route, i.e. dst->dev, to update the iif and
      oif fields in the flow entry.
      After this patch, the following example works fine:
       # ip addr add dev eth0
       # ip addr add dev eth1
       # ip link add user1 type vrf table 1
       # ip l set user1 up
       # ip l set dev eth0 master user1
       # ip l set dev eth1 master user1
       # nft add table firewall
       # nft add flowtable f fb1 { hook ingress priority 0 \; devices = { eth0, eth1 } \; }
       # nft add chain f ftb-all {type filter hook forward priority 0 \; policy accept \; }
       # nft add rule f ftb-all ct zone 1 ip protocol tcp flow offload @fb1
       # nft add rule f ftb-all ct zone 1 ip protocol udp flow offload @fb1
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ebtables: account ebt_table_info to kmemcg · e2c8d550
      Shakeel Butt authored
      The [ip,ip6,arp]_tables use x_tables_info internally and the underlying
      memory is already accounted to kmemcg. Do the same for ebtables. The
      syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able to OOM the
      whole system from a restricted memcg, a potential DoS.
      By accounting the ebt_table_info, the memory used for ebt_table_info can
      be contained within the memcg of the allocating process. However the
      lifetime of ebt_table_info is independent of the allocating process and
      is tied to the network namespace. So, the oom-killer will not be able to
      relieve the memory pressure due to ebt_table_info memory. The memory for
      ebt_table_info is allocated through vmalloc. Currently vmalloc does not
      handle the oom-killed allocating process correctly and one large
      allocation can bypass memcg limit enforcement. So, with this patch,
      at least the small allocations will be contained. For large allocations,
      we need to fix vmalloc.
      Reported-by: <syzbot+7713f3aa67be76b1552c@syzkaller.appspotmail.com>
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 09 Jan, 2019 1 commit
    • netfilter: nft_flow_offload: Fix reverse route lookup · a799aea0
      wenxu authored
      Using the following example:
      	client ---> which dnat to server
      The first reply packet (i.e. syn+ack) uses an incorrect destination
      address for the reverse route lookup, since it uses:
      	daddr = ct->tuplehash[!dir].tuple.dst.u3.ip;
      which is in the scenario that is described above, while this
      should be:
      	daddr = ct->tuplehash[dir].tuple.src.u3.ip;
      that is
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
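
A minimal userspace sketch of the two expressions in the commit above, assuming a DNAT'ed connection whose original tuple is client -> pre-DNAT address and whose reply tuple is server -> client. The types `struct tuple` and `struct conn`, the helper names, and the flat `src_ip`/`dst_ip` fields are hypothetical stand-ins for the conntrack structures, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the conntrack types (hypothetical). */
enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

struct tuple { uint32_t src_ip; uint32_t dst_ip; };

struct conn { struct tuple tuplehash[2]; };

/* Pre-patch choice: destination of the opposite direction's tuple. */
static uint32_t reverse_daddr_old(const struct conn *ct, enum ct_dir dir)
{
	assert(dir == DIR_ORIGINAL || dir == DIR_REPLY);
	return ct->tuplehash[!dir].dst_ip;
}

/* Post-patch choice: source of the current direction's tuple. */
static uint32_t reverse_daddr_new(const struct conn *ct, enum ct_dir dir)
{
	assert(dir == DIR_ORIGINAL || dir == DIR_REPLY);
	return ct->tuplehash[dir].src_ip;
}

/* Build a DNAT'ed connection: client -> vip, translated to server. */
static struct conn make_dnat_conn(uint32_t client, uint32_t vip,
				  uint32_t server)
{
	struct conn ct = {
		.tuplehash = {
			[DIR_ORIGINAL] = { .src_ip = client, .dst_ip = vip },
			[DIR_REPLY]    = { .src_ip = server, .dst_ip = client },
		},
	};
	return ct;
}
```

For a packet in the reply direction, `reverse_daddr_old()` returns the pre-DNAT address the client connected to, while `reverse_daddr_new()` returns the server address that actually sent the packet. In the original direction both return the client address, so in this model the two expressions differ only for the reply packet.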
  4. 08 Jan, 2019 7 commits
  5. 07 Jan, 2019 14 commits
  6. 06 Jan, 2019 3 commits
  7. 05 Jan, 2019 5 commits
    • ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses · d4a7e9bb
      David Ahern authored
      I realized the last patch calls dev_get_by_index_rcu in a branch not
      holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock.
      Fixes: ec90ad33 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udpv6_sendmsg-addr_any-fix' · 466f89e9
      Alexei Starovoitov authored
      Andrey Ignatov says:
      The patch set fixes the BSD'ism in sys_sendmsg that rewrites an
      unspecified destination IPv6 address with [::1] for unconnected UDP
      sockets. The fix covers the cases where CONFIG_CGROUP_BPF is enabled or
      where the sys_sendmsg BPF hook sets the destination IPv6 address to [::].
      Patch 1 is the fix and provides more details.
      Patch 2 adds two test cases to verify the fix.
      * Fix compile error in patch 1.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test [::] -> [::1] rewrite in sys_sendmsg in test_sock_addr · 976b4f3a
      Andrey Ignatov authored
      Test that sys_sendmsg BPF hook doesn't break sys_sendmsg behaviour to
      rewrite destination IPv6 = [::] with [::1] (BSD'ism).
      Two test cases are added:
      1) User passes dst IPv6 = [::] and BPF_CGROUP_UDP6_SENDMSG program
         doesn't touch it.
      2) User passes dst IPv6 != [::], but BPF_CGROUP_UDP6_SENDMSG program
         rewrites it with [::].
      In both cases [::1] is used by sys_sendmsg code eventually and datagram
      is sent successfully for unconnected UDP socket.
      Example of relevant output:
        Test case: sendmsg6: set dst IP = [::] (BSD'ism) .. [PASS]
        Test case: sendmsg6: preserve dst IP = [::] (BSD'ism) .. [PASS]
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix [::] -> [::1] rewrite in sys_sendmsg · e8e36984
      Andrey Ignatov authored
      sys_sendmsg has supported unspecified destination IPv6 (wildcard) for
      unconnected UDP sockets since 876c7f41. When [::] is passed by user as
      destination, sys_sendmsg rewrites it with [::1] to be consistent with
      BSD (see "BSD'ism" comment in the code).
      This didn't work when cgroup-bpf was enabled, though, since the rewrite
      [::] -> [::1] happened before passing control to the cgroup-bpf block, where
      fl6.daddr was updated with the user-supplied sockaddr_in6.sin6_addr (which
      might or might not be changed by a BPF program). So if the user passed
      [::] as dst IPv6, it was first rewritten with [::1] by the original code
      from 876c7f41, and then rewritten back to [::] by the cgroup-bpf block.
      This happened even when no BPF_CGROUP_UDP6_SENDMSG program was present
      (CONFIG_CGROUP_BPF=y was enough).
      The fix is to apply BSD'ism after cgroup-bpf block so that [::] is
      replaced with [::1] no matter where it came from: passed by user to
      sys_sendmsg or set by BPF_CGROUP_UDP6_SENDMSG program.
      Fixes: 1cedee13 ("bpf: Hooks for sys_sendmsg")
      Reported-by: Nitin Rawat <nitin.rawat@intel.com>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
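
The ordering problem described in the commit above can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel code: `sendmsg_hook_t` is a hypothetical stand-in for a BPF_CGROUP_UDP6_SENDMSG program, and `resolve_daddr_old()` / `resolve_daddr_new()` are illustrative names for the pre-fix and post-fix ordering.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical stand-in for a BPF_CGROUP_UDP6_SENDMSG program:
 * it may rewrite the destination address in place. */
typedef void (*sendmsg_hook_t)(struct in6_addr *daddr);

/* The BSD'ism: an unspecified destination [::] becomes loopback [::1]. */
static void apply_bsdism(struct in6_addr *daddr)
{
	assert(daddr != NULL);
	if (IN6_IS_ADDR_UNSPECIFIED(daddr))
		*daddr = in6addr_loopback;
}

/* Pre-fix ordering: the rewrite runs first, then the cgroup-bpf block
 * reloads the user-supplied address and undoes it. */
static struct in6_addr resolve_daddr_old(struct in6_addr user,
					 sendmsg_hook_t hook)
{
	struct in6_addr daddr = user;

	apply_bsdism(&daddr);
	daddr = user;		/* cgroup-bpf block overwrites daddr */
	if (hook)
		hook(&daddr);
	return daddr;
}

/* Post-fix ordering: the rewrite runs after the cgroup-bpf block. */
static struct in6_addr resolve_daddr_new(struct in6_addr user,
					 sendmsg_hook_t hook)
{
	struct in6_addr daddr = user;

	if (hook)
		hook(&daddr);
	apply_bsdism(&daddr);
	return daddr;
}

/* Example hook that sets the destination to [::]. */
static void hook_set_unspecified(struct in6_addr *daddr)
{
	*daddr = in6addr_any;
}
```

With the old ordering a user-supplied [::] stays [::] even when no hook is attached. With the new ordering [::] becomes [::1] whether it came from the user or from the hook, and other addresses pass through unchanged.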
    • ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address · ec90ad33
      David Ahern authored
      Similar to c5ee0663 ("ipv6: Consider sk_bound_dev_if when binding a
      socket to an address"), binding a socket to v4 mapped addresses needs to
      consider if the socket is bound to a device.
      This problem has also existed since the beginning of git history.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 04 Jan, 2019 7 commits
    • ixgbe: fix Kconfig when driver is not a module · ae84e4a8
      Jeff Kirsher authored
      The new ability added to the driver to use mii_bus to handle MII related
      ioctls is causing compile issues when the driver is compiled into the
      kernel (i.e. not a module).
      The problem was in selecting MDIO_DEVICE instead of the preferred PHYLIB
      Kconfig option.  MDIO_DEVICE depends on PHYLIB and would be compiled as
      a module when PHYLIB was a module, regardless of whether ixgbe was
      compiled into the kernel.
      CC: Dave Jones <davej@codemonkey.org.uk>
      CC: Steve Douthit <stephend@silicom-usa.com>
      CC: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Reviewed-by: Stephen Douthit <stephend@silicom-usa.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: make icmp6_send() robust against null skb->dev · 8d933670
      Eric Dumazet authored
      syzbot was able to crash one host with the following stack trace :
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8625 Comm: syz-executor4 Not tainted 4.20.0+ #8
      RIP: 0010:dev_net include/linux/netdevice.h:2169 [inline]
      RIP: 0010:icmp6_send+0x116/0x2d30 net/ipv6/icmp.c:426
      This is because an RX packet found the socket owned by the user and
      was stored in the socket backlog. Before leaving the RCU protected section,
      skb->dev was cleared in __sk_receive_skb(). When the socket backlog
      was finally handled at release_sock() time, the skb was fed to
      smack_socket_sock_rcv_skb() and then icmp6_send().
      We could fix the bug in smack_socket_sock_rcv_skb(), or simply
      make icmp6_send() more robust against such a possibility.
      In the future we might pass the net pointer to icmp6_send()
      instead of inferring it.
      Fixes: d66a8acb ("Smack: Inform peer that IPv6 traffic has been blocked")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Piotr Sawicki <p.sawicki2@partner.samsung.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
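
The kind of guard that makes a sender robust against a cleared skb->dev can be sketched in userspace as below. The structs are reduced, hypothetical stand-ins for the kernel types, and `icmp6_send_sketch()` is an illustrative name; the sketch assumes the fix amounts to checking skb->dev before inferring the namespace from it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sk_buff { struct net_device *dev; };

/* Infers the namespace from the device, as dev_net() does in spirit. */
static struct net *dev_net(const struct net_device *dev)
{
	assert(dev != NULL);	/* the unguarded dereference that crashed */
	return dev->nd_net;
}

/* Returns the namespace the error would be sent in, or NULL if the
 * packet has no device and sending is skipped. */
static struct net *icmp6_send_sketch(const struct sk_buff *skb)
{
	if (!skb->dev)
		return NULL;
	return dev_net(skb->dev);
}
```

An skb taken from the socket backlog with dev already cleared is then skipped instead of dereferencing a NULL pointer.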
    • selftests: net: fix/improve ip_defrag selftest · 3271a482
      Peter Oskolkov authored
      Commit ade44640 ("net: ipv4: do not handle duplicate fragments as
      overlapping") changed IPv4 defragmentation so that duplicate fragments,
      as well as _some_ fragments completely covered by previously delivered
      fragments, do not lead to the whole frag queue being discarded. This
      makes the existing ip_defrag selftest flaky.
      This patch
      * makes sure that negative IPv4 defrag tests generate truly overlapping
        fragments that trigger defrag queue drops;
      * tests that duplicate IPv4 fragments do not trigger defrag queue drops;
      * makes a couple of minor tweaks to the test aimed at increasing its code
        coverage and reducing flakiness.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qmi_wwan: add MTU default to qmap network interface · f87118d5
      Daniele Palmas authored
      This patch adds a default MTU value to the qmap network interface in
      order to avoid the "RTNETLINK answers: No buffer space available"
      error when setting an IPv6 address.
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns-fixes' · 75e7fb0a
      David S. Miller authored
      Huazhong Tan says:
      net: hns: Bugfixes for HNS driver
      This patchset includes bugfixes for the HNS ethernet controller driver.
      Every patch is independent.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Fix use after free identified by SLUB debug · bb989501
      Yonglong Liu authored
      When SLUB debug is enabled and the hns_enet_drv module is then removed,
      SLUB debug identifies a use-after-free bug:
      [134.189505] Unable to handle kernel paging request at virtual address
      [134.197553] Mem abort info:
      [134.200381]   ESR = 0x96000004
      [134.203487]   Exception class = DABT (current EL), IL = 32 bits
      [134.209497]   SET = 0, FnV = 0
      [134.212596]   EA = 0, S1PTW = 0
      [134.215777] Data abort info:
      [134.218701]   ISV = 0, ISS = 0x00000004
      [134.222596]   CM = 0, WnR = 0
      [134.225606] [006b6b6b6b6b6b6b] address between user and kernel address ranges
      [134.232851] Internal error: Oops: 96000004 [#1] SMP
      [134.237798] CPU: 21 PID: 27834 Comm: rmmod Kdump: loaded Tainted: G
      		OE     4.19.5-1.2.34.aarch64 #1
      [134.247856] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
      [134.255181] pstate: 20000005 (nzCv daif -PAN -UAO)
      [134.260044] pc : hns_ae_put_handle+0x38/0x60
      [134.264372] lr : hns_ae_put_handle+0x24/0x60
      [134.268700] sp : ffff00001be93c50
      [134.272054] x29: ffff00001be93c50 x28: ffff802faaec8040
      [134.277442] x27: 0000000000000000 x26: 0000000000000000
      [134.282830] x25: 0000000056000000 x24: 0000000000000015
      [134.288284] x23: ffff0000096fe098 x22: ffff000001050070
      [134.293671] x21: ffff801fb3c044a0 x20: ffff80afb75ec098
      [134.303287] x19: ffff80afb75ec098 x18: 0000000000000000
      [134.312945] x17: 0000000000000000 x16: 0000000000000000
      [134.322517] x15: 0000000000000002 x14: 0000000000000000
      [134.332030] x13: dead000000000100 x12: ffff7e02bea3c988
      [134.341487] x11: ffff80affbee9e68 x10: 0000000000000000
      [134.351033] x9 : 6fffff8000008101 x8 : 0000000000000000
      [134.360569] x7 : dead000000000100 x6 : ffff000009579748
      [134.370059] x5 : 0000000000210d00 x4 : 0000000000000000
      [134.379550] x3 : 0000000000000001 x2 : 0000000000000000
      [134.388813] x1 : 6b6b6b6b6b6b6b6b x0 : 0000000000000000
      [134.397993] Process rmmod (pid: 27834, stack limit = 0x00000000d474b7fd)
      [134.408498] Call trace:
      [134.414611]  hns_ae_put_handle+0x38/0x60
      [134.422208]  hnae_put_handle+0xd4/0x108
      [134.429563]  hns_nic_dev_remove+0x60/0xc0 [hns_enet_drv]
      [134.438342]  platform_drv_remove+0x2c/0x70
      [134.445958]  device_release_driver_internal+0x174/0x208
      [134.454810]  driver_detach+0x70/0xd8
      [134.461913]  bus_remove_driver+0x64/0xe8
      [134.469396]  driver_unregister+0x34/0x60
      [134.476822]  platform_driver_unregister+0x20/0x30
      [134.485130]  hns_nic_dev_driver_exit+0x14/0x6e4 [hns_enet_drv]
      [134.494634]  __arm64_sys_delete_module+0x238/0x290
      struct hnae_handle is a member of struct hnae_vf_cb, so using hnae_handle
      after vf_cb has been freed causes a use-after-free panic.
      This patch frees vf_cb after the last use of hnae_handle.
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
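
The lifetime rule behind the fix above, that an embedded member must not be touched after its container is freed, can be sketched in userspace as follows. `struct handle` and `struct vf_cb` are reduced, hypothetical stand-ins for struct hnae_handle and struct hnae_vf_cb, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct hnae_handle (hypothetical, reduced). */
struct handle { int users; };

/* Stand-in for struct hnae_vf_cb: the handle is embedded, not pointed to,
 * so freeing the container also invalidates the handle. */
struct vf_cb {
	int port;
	struct handle ae_handle;
};

/* Recover the container from a pointer to its embedded member. */
static struct vf_cb *handle_to_vf_cb(struct handle *h)
{
	assert(h != NULL);
	return (struct vf_cb *)((char *)h - offsetof(struct vf_cb, ae_handle));
}

/* Allocate a container and hand out its embedded handle. */
static struct handle *get_handle(int port)
{
	struct vf_cb *cb = calloc(1, sizeof(*cb));

	if (!cb)
		return NULL;
	cb->port = port;
	cb->ae_handle.users = 1;
	return &cb->ae_handle;
}

/* Safe ordering: finish every access to the handle, then free the
 * container. Returns the port so callers can check which one was released. */
static int put_handle(struct handle *h)
{
	struct vf_cb *cb = handle_to_vf_cb(h);
	int port = cb->port;

	h->users = 0;	/* last access to the embedded handle */
	free(cb);	/* only now release the containing object */
	return port;
}
```

Swapping the last two statements before the return in `put_handle()` would reproduce the bug class in the report: a write through `h` into memory that has already been freed.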
    • net: hns: Fix WARNING when hns modules installed · c77804be
      Yonglong Liu authored
      Commit 308c6caf ("net: hns: All ports can not work when insmod hns ko
      after rmmod.") added phy_stop in hns_nic_init_phy(). On the "net" branch
      this method is effective, but on the "net-next" branch it causes a
      WARNING when the hns modules are loaded; see commit 2b3e88ea ("net:
      phy: improve phy state checking"):
      [10.092168] ------------[ cut here ]------------
      [10.092171] called from state READY
      [10.092189] WARNING: CPU: 4 PID: 1 at ../drivers/net/phy/phy.c:854
      [10.092192] Modules linked in:
      [10.092197] CPU: 4 PID:1 Comm:swapper/0 Not tainted 4.20.0-rc7-next-20181220 #1
      [10.092200] Hardware name: Huawei TaiShan 2280 /D05, BIOS Hisilicon D05 UEFI
                      16.12 Release 05/15/2017
      [10.092202] pstate: 60000005 (nZCv daif -PAN -UAO)
      [10.092205] pc : phy_stop+0x90/0xb0
      [10.092208] lr : phy_stop+0x90/0xb0
      [10.092209] sp : ffff00001159ba90
      [10.092212] x29: ffff00001159ba90 x28: 0000000000000007
      [10.092215] x27: ffff000011180068 x26: ffff0000110a5620
      [10.092218] x25: ffff0000113b6000 x24: ffff842f96dac000
      [10.092221] x23: 0000000000000000 x22: 0000000000000000
      [10.092223] x21: ffff841fb8425e18 x20: ffff801fb3a56438
      [10.092226] x19: ffff801fb3a56000 x18: ffffffffffffffff
      [10.092228] x17: 0000000000000000 x16: 0000000000000000
      [10.092231] x15: ffff00001122d6c8 x14: ffff00009159b7b7
      [10.092234] x13: ffff00001159b7c5 x12: ffff000011245000
      [10.092236] x11: 0000000005f5e0ff x10: ffff00001159b750
      [10.092239] x9 : 00000000ffffffd0 x8 : 0000000000000465
      [10.092242] x7 : ffff0000112457f8 x6 : ffff0000113bd7ce
      [10.092245] x5 : 0000000000000000 x4 : 0000000000000000
      [10.092247] x3 : 00000000ffffffff x2 : ffff000011245828
      [10.092250] x1 : 4b5860bd05871300 x0 : 0000000000000000
      [10.092253] Call trace:
      [10.092255]  phy_stop+0x90/0xb0
      [10.092260]  hns_nic_init_phy+0xf8/0x110
      [10.092262]  hns_nic_try_get_ae+0x4c/0x3b0
      [10.092264]  hns_nic_dev_probe+0x1fc/0x480
      [10.092268]  platform_drv_probe+0x50/0xa0
      [10.092271]  really_probe+0x1f4/0x298
      [10.092273]  driver_probe_device+0x58/0x108
      [10.092275]  __driver_attach+0xdc/0xe0
      [10.092278]  bus_for_each_dev+0x74/0xc8
      [10.092280]  driver_attach+0x20/0x28
      [10.092283]  bus_add_driver+0x1b8/0x228
      [10.092285]  driver_register+0x60/0x110
      [10.092288]  __platform_driver_register+0x40/0x48
      [10.092292]  hns_nic_dev_driver_init+0x18/0x20
      [10.092296]  do_one_initcall+0x5c/0x180
      [10.092299]  kernel_init_freeable+0x198/0x240
      [10.092303]  kernel_init+0x10/0x108
      [10.092306]  ret_from_fork+0x10/0x18
      [10.092308] ---[ end trace 1396dd0278e397eb ]---
      This WARNING occurs because phy_stop is called before phy_start.
      The root cause of the problem addressed by commit 308c6caf is as follows.
      In hns_nic_init_phy, the flag phydev->supported is changed after
      phy_connect_direct. The flag phydev->supported is 0x6ff when the hns
      modules are loaded, so the Fiber Port power (see marvell.c), which is on
      by default, is not changed.
      The flag phydev->supported is then changed to 0x6f, so the Fiber Port
      power is turned off when the hns modules are removed.
      When the hns modules are installed again, the flag phydev->supported has
      its default value 0x6ff, so the Fiber Port power (now off) is not
      changed, and the MAC link does not come up.
      So the solution is to change the phy flags before phy_connect_direct.
      Fixes: 308c6caf ("net: hns: All ports can not work when insmod hns ko after rmmod.")
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>