1. 14 Oct, 2021 10 commits
    • icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe · 1fcd7945
      Xin Long authored
      In icmp_build_probe(), the icmp_ext_echo_iio parsing should be done
      step by step, and the skb_header_pointer() return value should always
      be checked. This patch fixes three places there:
      
        - On case ICMP_EXT_ECHO_CTYPE_NAME, it should only copy ident.name
          from skb by skb_header_pointer(), its len is ident_len. Besides,
          the return value of skb_header_pointer() should always be checked.
      
        - On case ICMP_EXT_ECHO_CTYPE_INDEX, move ident_len check ahead of
          skb_header_pointer(), and also do the return value check for
          skb_header_pointer().
      
        - On case ICMP_EXT_ECHO_CTYPE_ADDR, before accessing iio->ident.addr.
          ctype3_hdr.addrlen, skb_header_pointer() should be called first,
          then check its return value and ident_len.
          On subcases ICMP_AFI_IP and ICMP_AFI_IP6, also do check for ident.
          addr.ctype3_hdr.addrlen and skb_header_pointer()'s return value.
          On subcase ICMP_AFI_IP, the len for skb_header_pointer() should be
          "sizeof(iio->extobj_hdr) + sizeof(iio->ident.addr.ctype3_hdr) +
          sizeof(struct in_addr)" or "ident_len".
      
      v1->v2:
        - To make it clearer, call skb_header_pointer() only once for
          iio->ident's parsing, as Jakub suggested.
      v2->v3:
        - The extobj_hdr.length check against sizeof(_iio) should be done
          before calling skb_header_pointer(), as Eric noticed.
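The "validate the length, then fetch, then check the result" order described above can be sketched in userspace C. This is only an illustration: header_pointer(), struct obj_hdr, MAX_NAME and parse_name_object() are hypothetical names, and header_pointer() is a simplified stand-in for skb_header_pointer(), not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for skb_header_pointer(): copy len bytes found at
 * offset into scratch, or return NULL if the packet is too short. */
static const void *header_pointer(const uint8_t *pkt, size_t pkt_len,
                                  size_t offset, size_t len, void *scratch)
{
    if (offset > pkt_len || len > pkt_len - offset)
        return NULL;
    memcpy(scratch, pkt + offset, len);
    return scratch;
}

/* Hypothetical object header: big-endian length, class, ctype. */
struct obj_hdr {
    uint8_t length[2];
    uint8_t class_num;
    uint8_t ctype;
};

#define MAX_NAME 16

/* Returns 0 and fills name on success, -1 on any malformed input. */
static int parse_name_object(const uint8_t *pkt, size_t pkt_len,
                             char name[MAX_NAME + 1])
{
    struct obj_hdr hdr_buf;
    const struct obj_hdr *hdr;
    uint8_t name_buf[MAX_NAME];
    const uint8_t *ident;
    size_t obj_len, ident_len;

    /* Step 1: fetch only the fixed header and check the result. */
    hdr = header_pointer(pkt, pkt_len, 0, sizeof(hdr_buf), &hdr_buf);
    if (!hdr)
        return -1;

    /* Step 2: validate the claimed length before using it. */
    obj_len = ((size_t)hdr->length[0] << 8) | hdr->length[1];
    if (obj_len < sizeof(*hdr))
        return -1;
    ident_len = obj_len - sizeof(*hdr);
    if (ident_len == 0 || ident_len > MAX_NAME)
        return -1;

    /* Step 3: fetch exactly ident_len bytes and check the result again. */
    ident = header_pointer(pkt, pkt_len, sizeof(*hdr), ident_len, name_buf);
    if (!ident)
        return -1;

    memcpy(name, ident, ident_len);
    name[ident_len] = '\0';
    return 0;
}
```

A packet that claims 8 bytes but carries only 6 is rejected at step 3 instead of being read past its end.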
      
      Fixes: d329ea5b ("icmp: add response to RFC 8335 PROBE messages")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/31628dd76657ea62f5cf78bb55da6b35240831f1.1634205050.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: Update the devicetree documentation path of imx fec driver · ea142b09
      Cai Huoqing authored
      
      
      Change the devicetree documentation path
      to "Documentation/devicetree/bindings/net/fsl,fec.yaml"
      since 'fsl-fec.txt' has been converted to 'fsl,fec.yaml' already.
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Link: https://lore.kernel.org/r/20211014110214.3254-1-caihuoqing@baidu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sctp: account stream padding length for reconf chunk · a2d859e3
      Eiichi Tsukata authored
      sctp_make_strreset_req() makes repeated calls to sctp_addto_chunk()
      which will automatically account for padding on each call. inreq and
      outreq are already 4 bytes aligned, but the payload is not and doing
      SCTP_PAD4(a + b) (which _sctp_make_chunk() did implicitly here) is
      different from SCTP_PAD4(a) + SCTP_PAD4(b) and not enough. This could
      lead to an attempt to use more buffer space than was allocated, which
      triggered a BUG_ON.
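The padding mismatch is plain arithmetic. The sketch below uses a local PAD4() macro written in the spirit of SCTP_PAD4(); the sizes 18 and 10 are illustrative values chosen so that neither is a multiple of 4, not the real chunk sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Round up to a multiple of 4, in the spirit of the kernel's SCTP_PAD4(). */
#define PAD4(n) (((n) + (size_t)3) & ~(size_t)3)

/* Space reserved when the two pieces are sized together. */
static size_t padded_once(size_t a, size_t b)
{
    return PAD4(a + b);
}

/* Space consumed when each append pads its own piece. */
static size_t padded_each(size_t a, size_t b)
{
    return PAD4(a) + PAD4(b);
}
```

With a = 18 and b = 10, padded_once() gives 28 while padded_each() gives 20 + 12 = 32, so a buffer sized by the first formula is 4 bytes short for appends that follow the second.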
      
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Fixes: cc16f00f ("sctp: add support for generating stream reconf ssn reset request chunk")
      Reported-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/b97c1f8b0c7ff79ac4ed206fc2c49d3612e0850c.1634156849.git.mleitner@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: thermal: Fix out-of-bounds memory accesses · 332fdf95
      Ido Schimmel authored
      Currently, mlxsw allows cooling states to be set above the maximum
      cooling state supported by the driver:
      
       # cat /sys/class/thermal/thermal_zone2/cdev0/type
       mlxsw_fan
       # cat /sys/class/thermal/thermal_zone2/cdev0/max_state
       10
       # echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
       # echo $?
       0
      
      This results in out-of-bounds memory accesses when thermal state
      transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the
      transition table is accessed with a too large index (state) [1].
      
      According to the thermal maintainer, it is the responsibility of the
      driver to reject such operations [2].
      
      Therefore, return an error when the state to be set exceeds the maximum
      cooling state supported by the driver.
      
      To avoid dead code, as suggested by the thermal maintainer [3],
      partially revert commit a421ce08 ("mlxsw: core: Extend cooling
      device with cooling levels") that tried to interpret these invalid
      cooling states (above the maximum) in a special way. The cooling levels
      array is not removed in order to prevent the fans going below 20% PWM,
      which would cause them to get stuck at 0% PWM.
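The kind of check described above can be sketched in userspace C. The names struct fan, MAX_STATE and fan_set_cur_state() are hypothetical, the limit of 10 is taken from the max_state shown in the sysfs transcript, and the -EINVAL return code is an assumption about what "return an error" means here.

```c
#include <assert.h>
#include <errno.h>

#define MAX_STATE 10UL  /* matches max_state in the sysfs transcript */

struct fan {
    unsigned long cur_state;
};

/* Reject states above the maximum instead of silently accepting them. */
static int fan_set_cur_state(struct fan *fan, unsigned long state)
{
    if (state > MAX_STATE)
        return -EINVAL;  /* assumed error code */

    fan->cur_state = state;
    return 0;
}
```

With such a check in place, writing 18 fails and the stored state never becomes an out-of-range index into a statistics table sized for states 0 to 10.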
      
      [1]
      BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
      Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5
      
      CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122
      Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016
      Workqueue: events_freezable_power_ thermal_zone_device_check
      Call Trace:
       dump_stack_lvl+0x8b/0xb3
       print_address_description.constprop.0+0x1f/0x140
       kasan_report.cold+0x7f/0x11b
       thermal_cooling_device_stats_update+0x271/0x290
       __thermal_cdev_update+0x15e/0x4e0
       thermal_cdev_update+0x9f/0xe0
       step_wise_throttle+0x770/0xee0
       thermal_zone_device_update+0x3f6/0xdf0
       process_one_work+0xa42/0x1770
       worker_thread+0x62f/0x13e0
       kthread+0x3ee/0x4e0
       ret_from_fork+0x1f/0x30
      
      Allocated by task 1:
       kasan_save_stack+0x1b/0x40
       __kasan_kmalloc+0x7c/0x90
       thermal_cooling_device_setup_sysfs+0x153/0x2c0
       __thermal_cooling_device_register.part.0+0x25b/0x9c0
       thermal_cooling_device_register+0xb3/0x100
       mlxsw_thermal_init+0x5c5/0x7e0
       __mlxsw_core_bus_device_register+0xcb3/0x19c0
       mlxsw_core_bus_device_register+0x56/0xb0
       mlxsw_pci_probe+0x54f/0x710
       local_pci_probe+0xc6/0x170
       pci_device_probe+0x2b2/0x4d0
       really_probe+0x293/0xd10
       __driver_probe_device+0x2af/0x440
       driver_probe_device+0x51/0x1e0
       __driver_attach+0x21b/0x530
       bus_for_each_dev+0x14c/0x1d0
       bus_add_driver+0x3ac/0x650
       driver_register+0x241/0x3d0
       mlxsw_sp_module_init+0xa2/0x174
       do_one_initcall+0xee/0x5f0
       kernel_init_freeable+0x45a/0x4de
       kernel_init+0x1f/0x210
       ret_from_fork+0x1f/0x30
      
      The buggy address belongs to the object at ffff8881052f7800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 1016 bytes inside of
       1024-byte region [ffff8881052f7800, ffff8881052f7c00)
      The buggy address belongs to the page:
      page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0
      head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x200000000010200(slab|head|node=0|zone=2)
      raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
       ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                      ^
       ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      [2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/
      [3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/
      
      CC: Daniel Lezcano <daniel.lezcano@linaro.org>
      Fixes: a50c1e35 ("mlxsw: core: Implement thermal zone")
      Fixes: a421ce08 ("mlxsw: core: Extend cooling device with cooling levels")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Vadim Pasternak <vadimp@nvidia.com>
      Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ethernet: s2io: fix setting mac address during resume · 40507e7a
      Arnd Bergmann authored
      After recent cleanups, gcc started warning about a suspicious
      memcpy() call during the s2io_io_resume() function:
      
      In function '__dev_addr_set',
          inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:318:2,
          inlined from 's2io_set_mac_addr' at drivers/net/ethernet/neterion/s2io.c:5205:2,
          inlined from 's2io_io_resume' at drivers/net/ethernet/neterion/s2io.c:8569:7:
      arch/x86/include/asm/string_32.h:182:25: error: '__builtin_memcpy' accessing 6 bytes at offsets 0 and 2 overlaps 4 bytes at offset 2 [-Werror=restrict]
        182 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/netdevice.h:4648:9: note: in expansion of macro 'memcpy'
       4648 |         memcpy(dev->dev_addr, addr, len);
            |         ^~~~~~
      
      What apparently happened is that an old cleanup changed the calling
      conventions for s2io_set_mac_addr() from taking an ethernet address
      as a character array to taking a struct sockaddr, but one of the
      callers was not changed at the same time.
      
      Change it to instead call the low-level do_s2io_prog_unicast() function
      that still takes the old argument type.
      
      Fixes: 2fd37688 ("S2io: Added support set_mac_address driver entry point")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20211013143613.2049096-1-arnd@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'fix-two-possible-memory-leak-problems-in-nfc-digital-module' · cbcc5072
      Jakub Kicinski authored
      Ziyang Xuan says:
      
      ====================
      Fix two possible memory leak problems in NFC digital module.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1634111083.git.william.xuanziyang@huawei.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • NFC: digital: fix possible memory leak in digital_in_send_sdd_req() · 291c932f
      Ziyang Xuan authored
      'skb' is allocated in digital_in_send_sdd_req() but is not freed when
      digital_in_send_cmd() fails, which causes a memory leak. Fix it by
      freeing 'skb' if digital_in_send_cmd() fails.
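The ownership rule behind this fix (the caller frees the buffer when the send call fails) can be illustrated with a small userspace sketch. All names here are hypothetical, and the live_buffers counter exists only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

static int live_buffers;  /* outstanding allocations, for demonstration */

static void *buf_alloc(size_t n)
{
    void *p = malloc(n);

    if (p)
        live_buffers++;
    return p;
}

static void buf_free(void *p)
{
    if (p) {
        free(p);
        live_buffers--;
    }
}

typedef int (*send_fn)(void *buf);

/* A sender that fails and does not take ownership of the buffer. */
static int send_fail(void *buf)
{
    (void)buf;
    return -1;
}

/* A sender that succeeds and consumes the buffer. */
static int send_ok(void *buf)
{
    buf_free(buf);
    return 0;
}

static int send_request(send_fn send)
{
    void *buf = buf_alloc(32);
    int rc;

    if (!buf)
        return -1;

    rc = send(buf);
    if (rc)
        buf_free(buf);  /* without this line the failure path leaks */
    return rc;
}
```

After send_request(send_fail) returns, live_buffers is back to 0; removing the marked line leaves it at 1.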
      
      Fixes: 2c66daec ("NFC Digital: Add NFC-A technology support")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • NFC: digital: fix possible memory leak in digital_tg_listen_mdaa() · 58e7dcc9
      Ziyang Xuan authored
      'params' is allocated in digital_tg_listen_mdaa() but is not freed when
      digital_send_cmd() fails, which causes a memory leak. Fix it by
      freeing 'params' if digital_send_cmd() fails.
      
      Fixes: 1c7a4c24 ("NFC Digital: Add target NFC-DEP support")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: fix error handling of nfc_proto_register() · 0911ab31
      Ziyang Xuan authored
      When the NFC proto id is already in use, nfc_proto_register() returns
      the -EBUSY error code but forgets to unregister the proto. Fix it by
      adding proto_unregister() to the error handling path.
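A generic version of this "undo the step that already succeeded" pattern is sketched below. The table, struct proto and the low_level_* helpers are hypothetical stand-ins and do not reproduce the NFC code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PROTO 4

struct proto {
    int id;
    int registered;  /* set by the first registration step */
};

static const struct proto *proto_tab[MAX_PROTO];

static int low_level_register(struct proto *p)
{
    p->registered = 1;
    return 0;
}

static void low_level_unregister(struct proto *p)
{
    p->registered = 0;
}

static int proto_table_register(struct proto *p)
{
    int rc;

    if (p->id < 0 || p->id >= MAX_PROTO)
        return -EINVAL;

    rc = low_level_register(p);
    if (rc)
        return rc;

    if (proto_tab[p->id]) {
        /* Slot already taken: undo the step that succeeded. */
        low_level_unregister(p);
        return -EBUSY;
    }

    proto_tab[p->id] = p;
    return 0;
}
```

Registering a second protocol with an id that is already in the table returns -EBUSY and leaves that protocol's registered flag cleared.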
      
      Fixes: c7fe3b52 ("NFC: add NFC socket family")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Link: https://lore.kernel.org/r/20211013034932.2833737-1-william.xuanziyang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net: procfs: add seq_puts() statement for dev_mcast" · 1f922d9e
      Vladimir Oltean authored
      This reverts commit ec18e845.
      
      It turns out that there are user space programs which got broken by that
      change. One example is the "ifstat" program shipped by Debian:
      https://packages.debian.org/source/bullseye/ifstat
      which, confusingly enough, seems to not have anything in common with the
      much more familiar (at least to me) ifstat program from iproute2:
      https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/misc/ifstat.c
      
      root@debian:~# ifstat
      ifstat: /proc/net/dev: unsupported format.
      
      This change modified the header (first two lines of text) in
      /proc/net/dev so that it looks like this:
      
      root@debian:~# cat /proc/net/dev
      Interface|                            Receive                                       |                                 Transmit
               |            bytes      packets errs   drop fifo frame compressed multicast|            bytes      packets errs   drop fifo colls carrier compressed
             lo:            97400         1204    0      0    0     0          0         0            97400         1204    0      0    0     0       0          0
          bond0:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
           sit0:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
           eno2:          5002206         6651    0      0    0     0          0         0        105518642      1465023    0      0    0     0       0          0
           swp0:           134531         2448    0      0    0     0          0         0         99599598      1464381    0      0    0     0       0          0
           swp1:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
           swp2:          4867675         4203    0      0    0     0          0         0            58134          631    0      0    0     0       0          0
          sw0p0:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
          sw0p1:           124739         2448    0   1422    0     0          0         0         93741184      1464369    0      0    0     0       0          0
          sw0p2:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
          sw2p0:          4850863         4203    0      0    0     0          0         0            54722          619    0      0    0     0       0          0
          sw2p1:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
          sw2p2:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
          sw2p3:                0            0    0      0    0     0          0         0                0            0    0      0    0     0       0          0
            br0:            10508          212    0    212    0     0          0       212         61369558       958857    0      0    0     0       0          0
      
      whereas before it looked like this:
      
      root@debian:~# cat /proc/net/dev
      Inter-|   Receive                                                |  Transmit
       face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
          lo:   13160     164    0    0    0     0          0         0    13160     164    0    0    0     0       0          0
       bond0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        sit0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        eno2:   30824     268    0    0    0     0          0         0     3332      37    0    0    0     0       0          0
        swp0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        swp1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        swp2:   30824     268    0    0    0     0          0         0     2428      27    0    0    0     0       0          0
       sw0p0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
       sw0p1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
       sw0p2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
       sw2p0:   29752     268    0    0    0     0          0         0     1564      17    0    0    0     0       0          0
       sw2p1:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
       sw2p2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
       sw2p3:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
      
      The ifstat shipped by Debian (v1.1, with a Debian patch upgrading it
      to 1.1-8.1 at the time of writing) breaks because its "proc"
      driver/backend parses the header very literally:
      
      main/drivers.c#L825
        if (!data->checked && strncmp(buf, "Inter-|", 7))
          goto badproc;
      
      and there's no way in which the header can be changed such that programs
      parsing like that would not get broken.
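The effect of that literal check on the two headers shown earlier can be reproduced with a small standalone function. header_accepted() is a hypothetical wrapper around the same strncmp() comparison and is not code from ifstat itself.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the quoted check: accept only a first line starting "Inter-|". */
static int header_accepted(const char *first_line)
{
    return strncmp(first_line, "Inter-|", 7) == 0;
}
```

The old first line, which begins with "Inter-|", is accepted. The new one begins with "Interface|", differs at the sixth character, and is rejected.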
      
      Even if we fix this ancient and very "lightly" maintained program to
      parse the text output of /proc/net/dev in a more sensible way, this
      story seems bound to repeat again with other programs, and modifying
      them all could cause more trouble than it's worth. On the other hand,
      the reverted patch had no other reason than an aesthetic one, so
      reverting it is the simplest way out.
      
      I don't know what other distributions would be affected; the fact that
      Debian doesn't ship the iproute2 version of the program (a different
      code base altogether, which uses netlink and not /proc/net/dev) is
      surprising in itself.
      
      Fixes: ec18e845 ("net: procfs: add seq_puts() statement for dev_mcast")
      Link: https://lore.kernel.org/netdev/20211009163511.vayjvtn3rrteglsu@skbuf/
      
      
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Cc: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20211013001909.3164185-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 13 Oct, 2021 18 commits
    • net: encx24j600: check error in devm_regmap_init_encx24j600 · f03dca0c
      Nanyong Sun authored
      devm_regmap_init() may return an error, for example when out of memory.
      This results in a NULL pointer dereference later, when reading or
      writing a register:
      
      general protection fault in encx24j600_spi_probe
      KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
      CPU: 0 PID: 286 Comm: spi-encx24j600- Not tainted 5.15.0-rc2-00142-g9978db750e31-dirty #11 9c53a778c1306b1b02359f3c2bbedc0222cba652
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:regcache_cache_bypass drivers/base/regmap/regcache.c:540
      Code: 54 41 89 f4 55 53 48 89 fb 48 83 ec 08 e8 26 94 a8 fe 48 8d bb a0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 03 00 00 4c 8d ab b0 00 00 00 48 8b ab a0 00
      RSP: 0018:ffffc900010476b8 EFLAGS: 00010207
      RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: 0000000000000000
      RDX: 0000000000000012 RSI: ffff888002de0000 RDI: 0000000000000094
      RBP: ffff888013c9a000 R08: 0000000000000000 R09: fffffbfff3f9cc6a
      R10: ffffc900010476e8 R11: fffffbfff3f9cc69 R12: 0000000000000001
      R13: 000000000000000a R14: ffff888013c9af54 R15: ffff888013c9ad08
      FS:  00007ffa984ab580(0000) GS:ffff88801fe00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055a6384136c8 CR3: 000000003bbe6003 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       encx24j600_spi_probe drivers/net/ethernet/microchip/encx24j600.c:459
       spi_probe drivers/spi/spi.c:397
       really_probe drivers/base/dd.c:517
       __driver_probe_device drivers/base/dd.c:751
       driver_probe_device drivers/base/dd.c:782
       __device_attach_driver drivers/base/dd.c:899
       bus_for_each_drv drivers/base/bus.c:427
       __device_attach drivers/base/dd.c:971
       bus_probe_device drivers/base/bus.c:487
       device_add drivers/base/core.c:3364
       __spi_add_device drivers/spi/spi.c:599
       spi_add_device drivers/spi/spi.c:641
       spi_new_device drivers/spi/spi.c:717
       new_device_store+0x18c/0x1f1 [spi_stub 4e02719357f1ff33f5a43d00630982840568e85e]
       dev_attr_store drivers/base/core.c:2074
       sysfs_kf_write fs/sysfs/file.c:139
       kernfs_fop_write_iter fs/kernfs/file.c:300
       new_sync_write fs/read_write.c:508 (discriminator 4)
       vfs_write fs/read_write.c:594
       ksys_write fs/read_write.c:648
       do_syscall_64 arch/x86/entry/common.c:50
       entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:113
      
      Add error check in devm_regmap_init_encx24j600 to avoid this situation.
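The register dump above is consistent with an error value being used as a pointer: RBX is fffffffffffffff4, which is -12 as a signed 64-bit value (ENOMEM is 12 on Linux), and RDI (0x94) equals RBX plus 0xa0. The sketch below checks that arithmetic in userspace. is_err_value() and err_value() are hypothetical helpers modeled on the kernel's error-pointer convention, and reading RBX as the failed regmap pointer is an interpretation of the dump, not something the commit states.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* True if v lies in the top MAX_ERRNO values of the 64-bit address range. */
static int is_err_value(uint64_t v)
{
    return v >= (uint64_t)-MAX_ERRNO;
}

/* Reinterpret the value as a signed error code. */
static int64_t err_value(uint64_t v)
{
    return (int64_t)v;
}
```

For v = 0xfffffffffffffff4, is_err_value() is true and err_value() is -12. Adding a field offset of 0xa0 wraps around to 0x94, inside the range KASAN reports.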
      
      Fixes: 04fbfce7 ("net: Microchip encx24j600 driver")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      Link: https://lore.kernel.org/r/20211012125901.3623144-1-sunnanyong@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · b70b1521
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-10-12
      
      * tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
        net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
        net/mlx5e: Switchdev representors are not vlan challenged
        net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
        net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode
        net/mlx5: Fix cleanup of bridge delayed work
      ====================
      
      Link: https://lore.kernel.org/r/20211012205323.20123-1-saeed@kernel.org
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: korina: select CRC32 · 427f974d
      Vegard Nossum authored
      Fix the following build/link error by adding a dependency on the CRC32
      routines:
      
        ld: drivers/net/ethernet/korina.o: in function `korina_multicast_list':
        korina.c:(.text+0x1af): undefined reference to `crc32_le'
      
      Fixes: ef11291b ("Add support the Korina (IDT RC32434) Ethernet MAC")
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20211012152509.21771-1-vegard.nossum@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: arc: select CRC32 · e599ee23
      Vegard Nossum authored
      Fix the following build/link error by adding a dependency on the CRC32
      routines:
      
        ld: drivers/net/ethernet/arc/emac_main.o: in function `arc_emac_set_rx_mode':
        emac_main.c:(.text+0xb11): undefined reference to `crc32_le'
      
      The crc32_le() call comes through the ether_crc_le() call in
      arc_emac_set_rx_mode().
      
      [v2: moved the select to ARC_EMAC_CORE; the Makefile is a bit confusing,
      but the error comes from emac_main.o, which is part of the arc_emac module,
      which in turn is enabled by CONFIG_ARC_EMAC_CORE. Note that arc_emac is
      different from emac_arc...]
      
      Fixes: 775dd682 ("arc_emac: implement promiscuous mode and multicast filtering")
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Link: https://lore.kernel.org/r/20211012093446.1575-1-vegard.nossum@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'felix-dsa-driver-fixes' · 847c6bdb
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Felix DSA driver fixes
      
      This is an assorted collection of fixes for issues seen on the NXP
      LS1028A switch.
      
      - PTP packet drops due to switch congestion result in catastrophic
        damage to the driver's state
      - loops are not blocked by STP if using the ocelot-8021q tagger
      - driver uses the wrong CPU port when two of them are defined in DT
      - module autoloading is broken* with both tagging protocol drivers
        (ocelot and ocelot-8021q)
      
      Changes in v2:
      - Stop printing that we aren't going to take TX timestamps if we don't
        have TX timestamping anyway, and we are just carrying PTP frames for a
        cascaded DSA switch.
      - Shorten the deferred xmit kthread name so that it fits the 16
        character limit (TASK_COMM_LEN)
      ====================
      
      Link: https://lore.kernel.org/r/20211012114044.2526146-1-vladimir.oltean@nxp.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: break at first CPU port during init and teardown · 8d5f7954
      Vladimir Oltean authored
      The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
      one of them is capable of acting as an NPI port at a time (inject and
      extract packets using DSA tags).
      
      However, using the alternative ocelot-8021q tagging protocol, it should
      be possible to use both CPU ports symmetrically, but for that we need to
      mark both ports in the device tree as DSA masters.
      
      In the process of doing that, it can be seen that traffic to/from the
      network stack gets broken, and this is because the Felix driver iterates
      through all DSA CPU ports and configures them as NPI ports. But since
      there can only be a single NPI port, we effectively end up in a
      situation where DSA thinks the default CPU port is the first one, but
      the hardware port configured to be an NPI is the last one.
      
      I would like to treat this as a bug, because if the updated device trees
      are going to start circulating, it would be really good for existing
      kernels to support them, too.
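The difference between configuring every CPU port in turn and stopping at the first one comes down to a single break in the loop. The sketch below uses a hypothetical struct port and port list and does not reproduce the Felix driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct port {
    bool is_cpu;
};

/* Without a break, every CPU port is configured in turn, so the last wins. */
static int npi_port_without_break(const struct port *ports, size_t n)
{
    int npi = -1;

    for (size_t i = 0; i < n; i++) {
        if (ports[i].is_cpu)
            npi = (int)i;
    }
    return npi;
}

/* With a break, the first CPU port is the one that stays configured. */
static int npi_port_with_break(const struct port *ports, size_t n)
{
    int npi = -1;

    for (size_t i = 0; i < n; i++) {
        if (ports[i].is_cpu) {
            npi = (int)i;
            break;
        }
    }
    return npi;
}
```

With CPU ports at indexes 1 and 3, the first function returns 3 and the second returns 1, which matches the default CPU port that DSA is described as assuming.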
      
      Fixes: adb3dccf ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports · 43ba33b4
      Vladimir Oltean authored
      When setting up a bridge with stp_state 1, topology changes are not
      detected and loops are not blocked. This is because the standard way of
      transmitting a packet, based on VLAN IDs redirected by VCAP IS2 to the
      right egress port, does not override the port STP state (in the case of
      Ocelot switches, that's really the PGID_SRC masks).
      
      To force a packet to be injected into a port that's BLOCKING, we must
      send it as a control packet, which means in the case of this tagger to
      send it using the manual register injection method. We already do this
      for PTP frames, extend the logic to apply to any link-local MAC DA.
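A link-local MAC destination address lies in the IEEE 802.1 reserved block 01:80:C2:00:00:00 through 01:80:C2:00:00:0F, which includes the STP BPDU address. The sketch below shows one way to test for that block and to widen a "PTP only" decision to cover it. The function names are hypothetical, and whether the tagger uses this exact test is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True for 01:80:C2:00:00:0X, the reserved link-local multicast block. */
static bool is_link_local_mac(const uint8_t addr[6])
{
    static const uint8_t base[5] = {0x01, 0x80, 0xc2, 0x00, 0x00};

    return memcmp(addr, base, sizeof(base)) == 0 && (addr[5] & 0xf0) == 0;
}

/* Control traffic takes the register injection path; the rest does not. */
static bool needs_register_injection(const uint8_t dst[6], bool is_ptp)
{
    return is_ptp || is_link_local_mac(dst);
}
```

An STP BPDU sent to 01:80:C2:00:00:00 now qualifies even though it is not a PTP frame, while ordinary broadcast traffic does not.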
      
      Fixes: 7c83a7c5 ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent · 1328a883
      Vladimir Oltean authored
      At present, when a PTP packet which requires TX timestamping gets
      dropped under congestion by the switch, things go downhill very fast.
      The driver keeps a clone of that skb in a queue of packets awaiting TX
      timestamp interrupts, but interrupts will never be raised for the
      dropped packets.
      
      Moreover, matching timestamped packets to timestamps is done by a 2-bit
      timestamp ID, and this can wrap around and we can match on the wrong skb.
      
      Since with the default NPI-based tagging protocol, we get no notification
      about packet drops, the best we can do is eventually recover from the
      drop of a PTP frame: its skb will be dead memory until another skb which
      was assigned the same timestamp ID happens to find it.
      
      However, with the ocelot-8021q tagger which injects packets using the
      manual register interface, it appears that we can check for more
      information, such as:
      
      - whether the input queue has reached the high watermark or not
      - whether the injection group's FIFO can accept additional data or not
      
      so we know that a PTP frame is likely to get dropped before actually
      sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
      can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).
      
      But when we do that, we can also remove the skb from the timestamping
      queue, because there surely won't be any timestamp that matches it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1328a883
    • Vladimir Oltean's avatar
      net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib · 49f885b2
      Vladimir Oltean authored
      Michael reported that when using the "ocelot-8021q" tagging protocol,
      the switch driver module must be manually loaded before the tagging
      protocol can be loaded/is available.
      
      This appears to be the same problem described here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      where due to the fact that DSA tagging protocols make use of symbols
      exported by the switch drivers, circular dependencies appear and this
      breaks module autoloading.
      
      The ocelot_8021q driver needs the ocelot_can_inject() and
      ocelot_port_inject_frame() functions from the switch library. Previously
      the wrong approach was taken to solve that dependency: shims were
      provided for the case where the ocelot switch library was compiled out,
      but that turns out to be insufficient, because the dependency when the
      switch lib _is_ compiled is problematic too.
      
      We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
      static inline functions, because these access I/O functions like
      __ocelot_write_ix() which is called by ocelot_write_rix(). Making those
      static inline basically means exposing the whole guts of the ocelot
      switch library, not ideal...
      
      We already have one tagging protocol driver which calls into the switch
      driver during xmit but not using any exported symbol: sja1105_defer_xmit.
      We can do the same thing here: create a kthread worker and one work item
      per skb, and let the switch driver itself do the register accesses to
      send the skb, and then consume it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Reported-by: Michael Walle <michael@walle.cc>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      49f885b2
    • Vladimir Oltean's avatar
      net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver · deab6b1c
      Vladimir Oltean authored
      As explained here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocol drivers cannot depend on symbols exported by switch
      drivers, because this creates a circular dependency that breaks module
      autoloading.
      
      The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
      exported by the common ocelot switch lib. This function looks at
      OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
      DSA tag, for PTP timestamping (the command: one-step/two-step, and the
      TX timestamp identifier).
      
      None of that requires deep insight into the driver, it is quite
      stateless, as it only depends upon the skb->cb. So let's make it a
      static inline function and put it in include/linux/dsa/ocelot.h, a
      file that despite its name is used by the ocelot switch driver for
      populating the injection header too - since commit 40d3f295 ("net:
      mscc: ocelot: use common tag parsing code with DSA").
      
      With that function declared as static inline, its body is expanded
      inside each call site, so the dependency is broken and the DSA tagger
      can be built without the switch library, upon which the felix driver
      depends.
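
      A header-only helper of the kind described above might look roughly
      like the sketch below. Everything here is a hypothetical stand-in:
      demo_skb_cb mimics the data kept in skb->cb, and the command values
      and the "timestamp ID shifted left by 3" layout are assumptions made
      for illustration, not taken from this commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative command encodings (assumed, see the lead-in). */
#define REW_OP_TWO_STEP_PTP 0x3u
#define REW_OP_ORIGIN_PTP   0x5u

/* Hypothetical stand-in for the per-skb control block (skb->cb). */
struct demo_skb_cb {
	uint8_t ptp_cmd;                 /* requested timestamping command */
	uint8_t ts_id;                   /* TX timestamp identifier */
	const struct demo_skb_cb *clone; /* clone queued for two-step matching */
};

/* Stateless: the result depends only on the control block, so the function
 * can live in a shared header as static inline and needs no symbol exported
 * by the switch library.
 */
static inline uint32_t demo_ptp_rew_op(const struct demo_skb_cb *cb)
{
	uint32_t rew_op = 0;

	assert(cb != NULL);

	if (cb->ptp_cmd == REW_OP_TWO_STEP_PTP && cb->clone) {
		/* Two-step: command plus the timestamp ID of the clone. */
		rew_op = cb->ptp_cmd;
		rew_op |= (uint32_t)cb->clone->ts_id << 3;
	} else if (cb->ptp_cmd == REW_OP_ORIGIN_PTP) {
		/* One-step: the command alone is enough. */
		rew_op = cb->ptp_cmd;
	}

	return rew_op;
}
```

      Because the body is expanded in every translation unit that includes
      the header, the tagger and the switch driver each get their own copy
      and neither module has to import a symbol from the other.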
      
      Fixes: 39e5308b ("net: mscc: ocelot: support PTP Sync one-step timestamping")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      deab6b1c
    • Vladimir Oltean's avatar
      net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header · ebb4c6a9
      Vladimir Oltean authored
      The sad reality is that when a PTP frame with a TX timestamping request
      is transmitted, it isn't guaranteed that it will make it all the way to
      the wire (due to congestion inside the switch), and that a timestamp
      will be taken by the hardware and placed in the timestamp FIFO where an
      IRQ will be raised for it.
      
      The implication is that if enough PTP frames are silently dropped by the
      hardware such that the timestamp ID has rolled over, it is possible to
      match a timestamp to an old skb.
      
      Furthermore, nobody will match on the real skb corresponding to this
      timestamp, since we stupidly matched on a previous one that was stale in
      the queue, and stopped there.
      
      So PTP timestamping will be broken and there will be no way to recover.
      
      It looks like the hardware parses the sequenceID from the PTP header,
      and also provides that metadata for each timestamp. The driver currently
      ignores this, but it shouldn't.
      
      As an extra resiliency measure, do the following:
      
      - check whether the PTP sequenceID also matches between the skb and the
        timestamp, treat the skb as stale otherwise and free it
      
      - if we see a stale skb, don't stop there and try to match an skb one
        more time, chances are there's one more skb in the queue with the same
        timestamp ID, otherwise we wouldn't have ever found the stale one (it
        is by timestamp ID that we matched it).
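
      The two measures above can be sketched as a small userspace model.
      The array-based queue, struct pending_skb and match_timestamp() are
      hypothetical; the driver works on an skb queue, but the control flow
      (drop stale entries, keep searching) is the part being illustrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one skb waiting for its TX timestamp. */
struct pending_skb {
	bool in_use;     /* still queued */
	uint8_t ts_id;   /* timestamp ID assigned at transmit time */
	uint16_t seq_id; /* PTP sequenceID saved from the packet header */
};

/* Find the queued entry for a timestamp reported by the hardware.
 * Entries are scanned oldest first. An entry with the right timestamp ID
 * but the wrong sequenceID is stale: it is dropped from the queue and the
 * search continues instead of stopping at the first hit.
 * Returns the index of the matching entry, or -1 if there is none.
 */
static int match_timestamp(struct pending_skb *queue, size_t len,
			   uint8_t ts_id, uint16_t seq_id)
{
	size_t i;

	assert(queue != NULL);

	for (i = 0; i < len; i++) {
		if (!queue[i].in_use || queue[i].ts_id != ts_id)
			continue;

		queue[i].in_use = false; /* unlink from the queue */

		if (queue[i].seq_id == seq_id)
			return (int)i;   /* the real owner of this timestamp */

		/* Stale entry: already unlinked, keep looking. */
	}

	return -1;
}
```

      For example, with two queued entries that share timestamp ID 1 but
      carry sequenceIDs 100 and 101, a timestamp reported for sequenceID
      101 discards the older entry and matches the newer one.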
      
      While this does not prevent PTP packet drops, it at least prevents
      the catastrophic consequences of incorrect timestamp matching.
      
      Since we already call ptp_classify_raw in the TX path, save the result
      in the skb->cb of the clone, and just use that result in the interrupt
      code path.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ebb4c6a9
    • Vladimir Oltean's avatar
      net: mscc: ocelot: deny TX timestamping of non-PTP packets · fba01283
      Vladimir Oltean authored
      It appears that Ocelot switches cannot timestamp non-PTP frames.
      I tested this using the isochron program at:
      https://github.com/vladimiroltean/tsn-scripts
      
      with the result that the driver increments the ocelot_port->ts_id
      counter as expected, puts it in the REW_OP, but the hardware seems to
      not timestamp these packets at all, since no IRQ is emitted.
      
      Therefore check whether we are sending PTP frames, and refuse to
      populate REW_OP otherwise.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fba01283
    • Vladimir Oltean's avatar
      net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb · 9fde506e
      Vladimir Oltean authored
      When skb_match is NULL, it means we received a PTP IRQ for a timestamp
      ID that the kernel has no idea about, since there is no skb in the
      timestamping queue with that timestamp ID.
      
      This is a grave error and not something to just "continue" over.
      So print a big warning in case this happens.
      
      Also, move the check above ocelot_get_hwtimestamp(), there is no point
      in reading the full 64-bit current PTP time if we're not going to do
      anything with it anyway for this skb.
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9fde506e
    • Vladimir Oltean's avatar
      net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO · 52849bcf
      Vladimir Oltean authored
      TX timestamps for PTP packets with 2-step timestamping requests are
      matched to packets based on the egress port number and a 6-bit
      timestamp identifier. All PTP timestamps are held in a common FIFO
      that is 128 entries deep.
      
      This patch ensures that back-to-back timestamping requests cannot exceed
      the hardware FIFO capacity. If that happens, simply send the packets
      without requesting a TX timestamp to be taken (in the case of felix,
      since the DSA API has a void return code in ds->ops->port_txtstamp) or
      drop them (in the case of ocelot).
      
      I've moved the ts_id_lock from a per-port basis to a per-switch basis,
      because we need separate accounting for both numbers of PTP frames in
      flight. And since we need locking to inc/dec the per-switch counter,
      that also offers protection for the per-port counter and hence there is
      no reason to have a per-port lock anymore.
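
      A rough model of the two-level accounting is shown below. The struct
      and function names are hypothetical, the FIFO depth of 128 comes from
      the text above, and the per-port cap of 63 is an assumption derived
      from the 6-bit identifier with one reserved value. Locking is left to
      the caller, which is assumed to hold the per-switch ts_id_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_IDS_PER_PORT 63  /* assumption: 6-bit ID, one value reserved */
#define DEMO_TS_FIFO_DEPTH    128 /* depth of the shared timestamp FIFO */

/* Hypothetical per-switch and per-port state. */
struct demo_switch {
	unsigned int ptp_in_flight; /* requests outstanding on all ports */
};

struct demo_port {
	struct demo_switch *sw;
	unsigned int ptp_in_flight; /* requests outstanding on this port */
};

/* Account for a new 2-step timestamp request.
 * Returns 0 on success, or -EBUSY when either the port has run out of
 * identifiers or the shared FIFO could overflow. The caller then sends the
 * packet without a timestamp request, or drops it.
 */
static int demo_ts_request_begin(struct demo_port *port)
{
	assert(port != NULL && port->sw != NULL);

	if (port->ptp_in_flight == DEMO_MAX_IDS_PER_PORT ||
	    port->sw->ptp_in_flight == DEMO_TS_FIFO_DEPTH)
		return -EBUSY;

	port->ptp_in_flight++;
	port->sw->ptp_in_flight++;
	return 0;
}

/* Release the accounting once the timestamp was collected or the skb dropped. */
static void demo_ts_request_end(struct demo_port *port)
{
	assert(port != NULL && port->sw != NULL);
	assert(port->ptp_in_flight > 0 && port->sw->ptp_in_flight > 0);

	port->ptp_in_flight--;
	port->sw->ptp_in_flight--;
}
```

      Since both counters are only touched together, a single lock taken
      around these helpers is enough to keep them consistent.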
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      52849bcf
    • Vladimir Oltean's avatar
      net: mscc: ocelot: make use of all 63 PTP timestamp identifiers · c57fe003
      Vladimir Oltean authored
      At present, there is a problem when user space bombards a port with PTP
      event frames which have TX timestamping requests (or when a tc-taprio
      offload is installed on a port, which delays the TX timestamps by a
      significant amount of time). The driver will happily roll over the 2-bit
      timestamp ID and this will cause incorrect matches between an skb and
      the TX timestamp collected from the FIFO.
      
      The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
      63 is reserved, so that leaves identifiers 0-62 to be used.
      
      The timestamp identifiers are selected by the REW_OP packet field, and
      are actually shared between CPU-injected frames and frames which match a
      VCAP IS2 rule that modifies the REW_OP. The hardware supports
      partitioning between the two uses of the REW_OP field through the
      PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
      IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.
      
      The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
      and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
      ocelot_init_timestamp() which makes all timestamp identifiers available
      to CPU injection.
      
      Therefore, we can make use of all 63 timestamp identifiers, which should
      allow more timestampable packets to be in flight on each port. This is
      only part of the solution, more issues will be addressed in future changes.
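
      As a minimal sketch of what using identifiers 0-62 means, the
      allocator below wraps at 63 instead of at 4. The names
      demo_port_ts_state and demo_alloc_ts_id() are hypothetical, and the
      sketch deliberately ignores the in-flight accounting that later
      changes add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The identifier field is 6 bits wide and the value 63 is reserved,
 * so valid identifiers are 0..62.
 */
#define DEMO_MAX_PTP_ID 63

/* Hypothetical per-port allocator state. */
struct demo_port_ts_state {
	uint8_t next_ts_id;
};

/* Hand out the next timestamp identifier, wrapping from 62 back to 0. */
static uint8_t demo_alloc_ts_id(struct demo_port_ts_state *port)
{
	uint8_t id;

	assert(port != NULL);
	assert(port->next_ts_id < DEMO_MAX_PTP_ID);

	id = port->next_ts_id;
	port->next_ts_id = (uint8_t)((id + 1) % DEMO_MAX_PTP_ID);

	return id;
}
```

      With a 2-bit counter the fifth request in flight would already reuse
      an identifier; here reuse only starts with the 64th.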
      
      Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c57fe003
    • Jakub Kicinski's avatar
      Merge branch 'fix-circular-dependency-between-sja1105-and-tag_sja1105' · 3af760e4
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix circular dependency between sja1105 and tag_sja1105
      
      As discussed here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocols cannot use symbols exported by switch drivers.
      
      Eliminate the two instances of that from tag_sja1105, and that allows us
      to have a working setup with modules again.
      ====================
      
      Re-applying to net, this was mistakenly applied to net-next,
      see first Link.
      
      Link: https://lore.kernel.org/r/20211012114044.2526146-1-vladimir.oltean@nxp.com/
      Link: https://lore.kernel.org/r/20210922143726.2431036-1-vladimir.oltean@nxp.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3af760e4
    • Vladimir Oltean's avatar
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Vladimir Oltean authored
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4ac0567e
    • Vladimir Oltean's avatar
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555
      Vladimir Oltean authored
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      28da0555
  3. 12 Oct, 2021 12 commits
    • Alvin Šipraga's avatar
      net: dsa: fix spurious error message when unoffloaded port leaves bridge · 43a4b4db
      Alvin Šipraga authored
      Flip the sign of a return value check, thereby suppressing the following
      spurious error:
      
        port 2 failed to notify DSA_NOTIFIER_BRIDGE_LEAVE: -EOPNOTSUPP
      
      ... which is emitted when removing an unoffloaded DSA switch port from a
      bridge.
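
      The class of bug being fixed can be illustrated with a tiny check.
      The helper name below is hypothetical and the exact call site in DSA
      is not shown; the point is only that kernel error codes are negative,
      so a comparison against the positive constant never matches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Decide whether an error returned by a notifier should be reported.
 * A buggy variant would compare against the positive constant:
 *
 *     return err != 0 && err != EOPNOTSUPP;
 *
 * which is true for -EOPNOTSUPP and therefore prints a spurious error.
 */
static bool demo_error_is_reportable(int err)
{
	assert(err <= 0); /* kernel-style: 0 on success, negative errno on failure */

	return err != 0 && err != -EOPNOTSUPP;
}
```

      With the corrected sign, "operation not supported" from an
      unoffloaded port is silently tolerated, while real failures such as
      -EINVAL are still reported.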
      
      Fixes: d371b7c9 ("net: dsa: Unset vlan_filtering when ports leave the bridge")
      Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20211012112730.3429157-1-alvin@pqrs.dk
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      43a4b4db
    • Baowen Zheng's avatar
      nfp: flow_offload: move flow_indr_dev_register from app init to app start · 60d950f4
      Baowen Zheng authored
      Commit 74fc4f82 ("net: Fix offloading indirect devices dependency
      on qdisc order creation") added a step that triggers the callback to
      set up the bo callback when the driver registers a callback.
      
      In our current implementation, we are not ready to run the callback when
      nfp calls flow_indr_dev_register, which results in the following
      error:
      
      kernel: Oops: 0000 [#1] SMP PTI
      kernel: CPU: 0 PID: 14119 Comm: kworker/0:0 Tainted: G
      kernel: Workqueue: events work_for_cpu_fn
      kernel: RIP: 0010:nfp_flower_indr_setup_tc_cb+0x258/0x410
      kernel: RSP: 0018:ffffbc1e02c57bf8 EFLAGS: 00010286
      kernel: RAX: 0000000000000000 RBX: ffff9c761fabc000 RCX: 0000000000000001
      kernel: RDX: 0000000000000001 RSI: fffffffffffffff0 RDI: ffffffffc0be9ef1
      kernel: RBP: ffffbc1e02c57c58 R08: ffffffffc08f33aa R09: ffff9c6db7478800
      kernel: R10: 0000009c003f6e00 R11: ffffbc1e02800000 R12: ffffbc1e000d9000
      kernel: R13: ffffbc1e000db428 R14: ffff9c6db7478800 R15: ffff9c761e884e80
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: fffffffffffffff0 CR3: 00000009e260a004 CR4: 00000000007706f0
      kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      kernel: PKRU: 55555554
      kernel: Call Trace:
      kernel: ? flow_indr_dev_register+0xab/0x210
      kernel: ? __cond_resched+0x15/0x30
      kernel: ? kmem_cache_alloc_trace+0x44/0x4b0
      kernel: ? nfp_flower_setup_tc+0x1d0/0x1d0 [nfp]
      kernel: flow_indr_dev_register+0x158/0x210
      kernel: ? tcf_block_unbind+0xe0/0xe0
      kernel: nfp_flower_init+0x40b/0x650 [nfp]
      kernel: nfp_net_pci_probe+0x25f/0x960 [nfp]
      kernel: ? nfp_rtsym_read_le+0x76/0x130 [nfp]
      kernel: nfp_pci_probe+0x6a9/0x820 [nfp]
      kernel: local_pci_probe+0x45/0x80
      
      So we need to call flow_indr_dev_register in app start process instead of
      init stage.
      
      Fixes: 74fc4f82 ("net: Fix offloading indirect devices dependency on qdisc order creation")
      Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Link: https://lore.kernel.org/r/20211012124850.13025-1-louis.peens@corigine.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      60d950f4
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors · 84c8a874
      Maxim Mikityanskiy authored
      Commit 846d6da1 ("net/mlx5e: Fix division by 0 in
      mlx5e_select_queue") makes mlx5e_build_nic_params assign a non-zero
      initial value to priv->num_tc_x_num_ch, so that mlx5e_select_queue
      doesn't fail with division by 0 if called before the first activation of
      channels. However, the initialization flow of representors doesn't call
      mlx5e_build_nic_params, so this bug can still happen with representors.
      
      This commit fixes the bug by adding the missing assignment to
      mlx5e_build_rep_params.
      
      Fixes: 846d6da1 ("net/mlx5e: Fix division by 0 in mlx5e_select_queue")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      84c8a874
    • Aya Levin's avatar
      net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp · 0bc73ad4
      Aya Levin authored
      Due to current HW arch limitations, RX-FCS (scattering FCS frame field
      to software) and RX-port-timestamp (improved timestamp accuracy on the
      receive side) can't work together.
      RX-port-timestamp is not controlled by the user and it is enabled by
      default when supported by the HW/FW.
      This patch sets RX-port-timestamp opposite to RX-FCS configuration.
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      0bc73ad4
    • Saeed Mahameed's avatar
      net/mlx5e: Switchdev representors are not vlan challenged · b2107cdc
      Saeed Mahameed authored
      Before this patch, mlx5 representors advertised the
      NETIF_F_VLAN_CHALLENGED bit. This could lead to missing features when
      using reps with vxlan/bridge and possibly other virtual interfaces,
      since such interfaces inherit this bit and block VLAN usage in their
      topology.
      
      Example:
      $ ip link add dev bridge type bridge
      # add representor interface to the bridge
      $ ip link set dev pf0hpf master bridge
      $ ip link add link bridge name vlan10 type vlan id 10 protocol 802.1q
      Error: 8021q: VLANs not supported on device.
      
      Reps are perfectly capable of handling vlan traffic, although they don't
      implement vlan_{add,kill}_vid ndos, hence, remove
      NETIF_F_VLAN_CHALLENGED advertisement.
      
      Fixes: cb67b832 ("net/mlx5e: Introduce SRIOV VF representors")
      Reported-by: Roopa Prabhu <roopa@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      b2107cdc
    • Valentine Fatiev's avatar
      net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path · 94b960b9
      Valentine Fatiev authored
      Prior to this patch, if mlx5_core_destroy_cq() failed, it returned
      without completing all destroy operations, which led to a memory leak.
      Instead, complete the destroy flow before returning the error.
      
      Also move mlx5_debug_cq_remove() to the beginning of mlx5_core_destroy_cq()
      to be symmetrical with mlx5_core_create_cq().
      
      kmemleak complains on:
      
      unreferenced object 0xc000000038625100 (size 64):
        comm "ethtool", pid 28301, jiffies 4298062946 (age 785.380s)
        hex dump (first 32 bytes):
          60 01 48 94 00 00 00 c0 b8 05 34 c3 00 00 00 c0  `.H.......4.....
          02 00 00 00 00 00 00 00 00 db 7d c1 00 00 00 c0  ..........}.....
        backtrace:
          [<000000009e8643cb>] add_res_tree+0xd0/0x270 [mlx5_core]
          [<00000000e7cb8e6c>] mlx5_debug_cq_add+0x5c/0xc0 [mlx5_core]
          [<000000002a12918f>] mlx5_core_create_cq+0x1d0/0x2d0 [mlx5_core]
          [<00000000cef0a696>] mlx5e_create_cq+0x210/0x3f0 [mlx5_core]
          [<000000009c642c26>] mlx5e_open_cq+0xb4/0x130 [mlx5_core]
          [<0000000058dfa578>] mlx5e_ptp_open+0x7f4/0xe10 [mlx5_core]
          [<0000000081839561>] mlx5e_open_channels+0x9cc/0x13e0 [mlx5_core]
          [<0000000009cf05d4>] mlx5e_switch_priv_channels+0xa4/0x230 [mlx5_core]
          [<0000000042bbedd8>] mlx5e_safe_switch_params+0x14c/0x300 [mlx5_core]
          [<0000000004bc9db8>] set_pflag_tx_port_ts+0x9c/0x160 [mlx5_core]
          [<00000000a0553443>] mlx5e_set_priv_flags+0xd0/0x1b0 [mlx5_core]
          [<00000000a8f3d84b>] ethnl_set_privflags+0x234/0x2d0
          [<00000000fd27f27c>] genl_family_rcv_msg_doit+0x108/0x1d0
          [<00000000f495e2bb>] genl_family_rcv_msg+0xe4/0x1f0
          [<00000000646c5c2c>] genl_rcv_msg+0x78/0x120
          [<00000000d53e384e>] netlink_rcv_skb+0x74/0x1a0
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Valentine Fatiev <valentinef@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      94b960b9
    • Tariq Toukan's avatar
      net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode · ca20dfda
      Tariq Toukan authored
      Do not allow configurations of MQPRIO channel mode that do not
      fully define and utilize the channels' TXQs.
      
      Fixes: ec60c458 ("net/mlx5e: Support MQPRIO channel mode")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      ca20dfda
    • Shay Drory's avatar
      net/mlx5: Fix cleanup of bridge delayed work · 2266bb1e
      Shay Drory authored
      Currently, bridge cleanup calls cancel_delayed_work(). When this
      function returns, the delayed work may still be running. In addition,
      the delayed work requeues itself.
      As a result, the delayed work might execute after the bridge cleanup
      has finished and hit a null-ptr oops[1].
      
      Fix it by using cancel_delayed_work_sync(), which waits until the
      work is done and cancels any queued work.
      
      [1]
      [ 8202.143043 ] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 8202.144438 ] #PF: supervisor write access in kernel mode
      [ 8202.145476 ] #PF: error_code(0x0002) - not-present page
      [ 8202.146520 ] PGD 0 P4D 0
      [ 8202.147126 ] Oops: 0002 [#1] SMP NOPTI
      [ 8202.147899 ] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.14.0-rc6_for_upstream_min_debug_2021_08_25_16_06 #1
      [ 8202.149741 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 8202.151908 ] RIP: 0010:_raw_spin_lock+0xc/0x20
      [ 8202.156234 ] RSP: 0018:ffff88846f885ea0 EFLAGS: 00010046
      [ 8202.157289 ] RAX: 0000000000000000 RBX: ffff88846f880000 RCX: 0000000000000000
      [ 8202.158731 ] RDX: 0000000000000001 RSI: ffff8881004000c8 RDI: 0000000000000000
      [ 8202.160177 ] RBP: ffff8881fe684978 R08: ffff888100140000 R09: ffffffff824455b8
      [ 8202.161569 ] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      [ 8202.163004 ] R13: 0000000000000012 R14: 0000000000000200 R15: ffff88812992d000
      [ 8202.164018 ] FS:  0000000000000000(0000) GS:ffff88846f880000(0000) knlGS:0000000000000000
      [ 8202.164960 ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8202.165634 ] CR2: 0000000000000000 CR3: 0000000108cac004 CR4: 0000000000370ea0
      [ 8202.166450 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 8202.167807 ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 8202.168852 ] Call Trace:
      [ 8202.169421 ]  <IRQ>
      [ 8202.169792 ]  __queue_work+0xf2/0x3d0
      [ 8202.170481 ]  ? queue_work_node+0x40/0x40
      [ 8202.171270 ]  call_timer_fn+0x2b/0x100
      [ 8202.171932 ]  __run_timers.part.0+0x152/0x220
      [ 8202.172717 ]  ? __hrtimer_run_queues+0x171/0x290
      [ 8202.173526 ]  ? kvm_clock_get_cycles+0xd/0x10
      [ 8202.174232 ]  ? ktime_get+0x35/0x90
      [ 8202.174943 ]  run_timer_softirq+0x26/0x50
      [ 8202.175745 ]  __do_softirq+0xc7/0x271
      [ 8202.176373 ]  irq_exit_rcu+0x93/0xb0
      [ 8202.176983 ]  sysvec_apic_timer_interrupt+0x72/0x90
      [ 8202.177755 ]  </IRQ>
      [ 8202.178245 ]  asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      Fixes: c636a0f0 ("net/mlx5: Bridge, dynamic entry ageing")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      2266bb1e
    • Jacob Keller's avatar
      ice: fix locking for Tx timestamp tracking flush · 4d4a223a
      Jacob Keller authored
      Commit 4dd0d5c3 ("ice: add lock around Tx timestamp tracker flush")
      added a lock around the Tx timestamp tracker flow which is used to
      cleanup any left over SKBs and prepare for device removal.
      
      This lock is problematic because it is being held around a call to
      ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
      write command to firmware. This could lead to a deadlock if the mutex
      actually sleeps, and causes the following warning on a kernel with
      preemption debugging enabled:
      
      [  715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
      [  715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
      [  715.435652] INFO: lockdep is turned off.
      [  715.439591] Preemption disabled at:
      [  715.439594] [<0000000000000000>] 0x0
      [  715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G        W  OE     5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
      [  715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
      [  715.468483] Call Trace:
      [  715.470940]  dump_stack_lvl+0x6a/0x9a
      [  715.474613]  ___might_sleep.cold+0x224/0x26a
      [  715.478895]  __mutex_lock+0xb3/0x1440
      [  715.482569]  ? stack_depot_save+0x378/0x500
      [  715.486763]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.494979]  ? kfree+0xc1/0x520
      [  715.498128]  ? mutex_lock_io_nested+0x12a0/0x12a0
      [  715.502837]  ? kasan_set_free_info+0x20/0x30
      [  715.507110]  ? __kasan_slab_free+0x10b/0x140
      [  715.511385]  ? slab_free_freelist_hook+0xc7/0x220
      [  715.516092]  ? kfree+0xc1/0x520
      [  715.519235]  ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.527359]  ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.535133]  ? pci_device_remove+0xab/0x1d0
      [  715.539318]  ? __device_release_driver+0x35b/0x690
      [  715.544110]  ? driver_detach+0x214/0x2f0
      [  715.548035]  ? bus_remove_driver+0x11d/0x2f0
      [  715.552309]  ? pci_unregister_driver+0x26/0x250
      [  715.556840]  ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.564799]  ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
      [  715.570554]  ? do_syscall_64+0x3b/0x90
      [  715.574303]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  715.579529]  ? start_flush_work+0x542/0x8f0
      [  715.583719]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.591923]  ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.599960]  ? wait_for_completion_io+0x250/0x250
      [  715.604662]  ? lock_acquire+0x196/0x200
      [  715.608504]  ? do_raw_spin_trylock+0xa5/0x160
      [  715.612864]  ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.620813]  ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.628497]  ? __debug_check_no_obj_freed+0x1e8/0x3c0
      [  715.633550]  ? trace_hardirqs_on+0x1c/0x130
      [  715.637748]  ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.646220]  ? do_raw_spin_trylock+0xa5/0x160
      [  715.650581]  ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.658797]  ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.667013]  ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.675403]  ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.683440]  ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.691037]  ? _raw_spin_unlock_irqrestore+0x46/0x73
      [  715.696005]  pci_device_remove+0xab/0x1d0
      [  715.700018]  __device_release_driver+0x35b/0x690
      [  715.704637]  driver_detach+0x214/0x2f0
      [  715.708389]  bus_remove_driver+0x11d/0x2f0
      [  715.712489]  pci_unregister_driver+0x26/0x250
      [  715.716857]  ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
      [  715.724637]  __do_sys_delete_module.constprop.0+0x2d8/0x4e0
      [  715.730210]  ? free_module+0x6d0/0x6d0
      [  715.733963]  ? task_work_run+0xe1/0x170
      [  715.737803]  ? exit_to_user_mode_loop+0x17f/0x1d0
      [  715.742509]  ? rcu_read_lock_sched_held+0x12/0x80
      [  715.747215]  ? trace_hardirqs_on+0x1c/0x130
      [  715.751401]  do_syscall_64+0x3b/0x90
      [  715.754981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  715.760033] RIP: 0033:0x7f4dfe59000b
      [  715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
      [  715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      [  715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
      [  715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
      [  715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      [  715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
      [  715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0
      
      Notice that this is the only case where we use the lock in this way. In
      the cleanup kthread and work kthread the lock is only taken around the
      bit accesses. This was done intentionally to avoid this kind of issue.
      The way the lock is used, we only protect ordering of bit sets vs bit
      clears. The Tx writers in the hot path don't need to be protected
      against the entire kthread loop. The Tx queue threads only need to
      ensure that they do not re-use an index that is currently in use. The
      cleanup loop does not need to block all new set bits, since it will
      re-queue itself if new timestamps are present.
      
      Fix the tracker flow so that it uses the same flow as the standard
      cleanup thread. In addition, ensure the in_use bitmap actually gets
      cleared properly.
      
      This fixes the warning and also avoids the potential deadlock that might
      have occurred otherwise.
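
      The locking pattern described above can be sketched in user space as
      follows. This is a minimal illustration, not the driver code: the
      struct, the function names and the C11 atomic_flag spin lock are all
      hypothetical stand-ins, and clear_phy_tstamp_stub() only models an
      operation that may sleep. The point is that the lock covers the
      per-index bookkeeping only, and the possibly sleeping call runs after
      the lock is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TRACKER_LEN 8 /* hypothetical tracker size */

/* Hypothetical user-space model of a Tx timestamp tracker. */
struct tx_tracker {
	atomic_flag lock;               /* stand-in for the driver spinlock */
	bool lock_held;                 /* sketch-only aid to detect misuse */
	bool in_use[TRACKER_LEN];       /* models the in_use bitmap */
	void *skb[TRACKER_LEN];         /* models the outstanding SKBs */
	unsigned int phy_clears;        /* number of "PHY clear" calls made */
	unsigned int clears_under_lock; /* calls made while the lock was held */
};

static void tracker_init(struct tx_tracker *tx)
{
	memset(tx, 0, sizeof(*tx));
	atomic_flag_clear(&tx->lock);
}

static void tracker_lock(struct tx_tracker *tx)
{
	/* Spin until the flag is acquired; sleeping is not allowed here. */
	while (atomic_flag_test_and_set_explicit(&tx->lock, memory_order_acquire))
		;
	tx->lock_held = true;
}

static void tracker_unlock(struct tx_tracker *tx)
{
	tx->lock_held = false;
	atomic_flag_clear_explicit(&tx->lock, memory_order_release);
}

/* Models a call that may sleep (it takes a mutex in the real driver). */
static void clear_phy_tstamp_stub(struct tx_tracker *tx, size_t idx)
{
	(void)idx;
	if (tx->lock_held)
		tx->clears_under_lock++;
	tx->phy_clears++;
}

/* Flush every slot: bookkeeping under the lock, sleeping work outside it. */
static void flush_tracker(struct tx_tracker *tx)
{
	for (size_t idx = 0; idx < TRACKER_LEN; idx++) {
		void *skb;

		tracker_lock(tx);
		skb = tx->skb[idx];
		tx->skb[idx] = NULL;
		tx->in_use[idx] = false; /* make sure the slot is really cleared */
		tracker_unlock(tx);

		free(skb); /* free(NULL) is a no-op */
		clear_phy_tstamp_stub(tx, idx);
	}
}
```

      In this model a writer that grabs a slot between two iterations is
      never blocked for the duration of the whole loop, which mirrors the
      reasoning given for the cleanup thread.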
      
      Fixes: 4dd0d5c3 ("ice: add lock around Tx timestamp tracker flush")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ioam-fixes' · 7389074c
      David S. Miller authored
      Justin Iurman says:
      
      ====================
      Correct the IOAM behavior for undefined trace type bits
      
      (@Jakub @David: there will be a conflict for #2 when merging net->net-next, due
      to commit [1]. The conflict is only 5-10 lines for #2 (#1 should be fine) inside
      the file tools/testing/selftests/net/ioam6.sh, so quite short though possibly
      ugly. Sorry for that, I didn't expect to post this one... Had I known, I'd have
      done it the other way around.)
      
      Modify both the input and output behaviors regarding the trace type when one of
      the undefined bits is set. The goal is to preserve interoperability when new
      fields (i.e., new bits inside the range 12-21) are defined.
      
      The draft [2] says the following:
      ---------------------------------------------------------------
      "Bit 12-21  Undefined.  These values are available for future
             assignment in the IOAM Trace-Type Registry (Section 8.2).
             Every future node data field corresponding to one of
             these bits MUST be 4-octets long.  An IOAM encapsulating
             node MUST set the value of each undefined bit to 0.  If
             an IOAM transit node receives a packet with one or more
             of these bits set to 1, it MUST either:
      
             1.  Add corresponding node data filled with the reserved
                 value 0xFFFFFFFF, after the node data fields for the
                 IOAM-Trace-Type bits defined above, such that the
                 total node data added by this node in units of
                 4-octets is equal to NodeLen, or
      
             2.  Not add any node data fields to the packet, even for
                 the IOAM-Trace-Type bits defined above."
      ---------------------------------------------------------------
      
      The output behavior has been modified to respect the fact that "an IOAM encap
      node MUST set the value of each undefined bit to 0" (i.e., undefined bits can't
      be set anymore).
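
      As an illustration of the output-side rule, here is a minimal sketch
      of such a check. It is not the kernel code: the function names and the
      representation of the trace type (a 24-bit value held in a uint32_t,
      with draft bit 0 as the most significant of the 24 bits) are
      assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOAM_TRACE_TYPE_BITS 24 /* width of the trace type field */
#define IOAM_UNDEF_FIRST     12 /* first undefined bit in the draft */
#define IOAM_UNDEF_LAST      21 /* last undefined bit in the draft */

/* Map a draft bit number (0 = most significant) to a mask (assumed layout). */
static uint32_t trace_type_bit(unsigned int n)
{
	return UINT32_C(1) << (IOAM_TRACE_TYPE_BITS - 1 - n);
}

/* Build the mask covering all undefined bits (12 through 21). */
static uint32_t undefined_bits_mask(void)
{
	uint32_t mask = 0;

	for (unsigned int n = IOAM_UNDEF_FIRST; n <= IOAM_UNDEF_LAST; n++)
		mask |= trace_type_bit(n);
	return mask;
}

/* An encapsulating node must refuse any trace type with an undefined bit. */
static bool encap_trace_type_ok(uint32_t trace_type)
{
	return (trace_type & undefined_bits_mask()) == 0;
}
```

      With this layout, a configuration request whose trace type has, say,
      bit 12 set would be refused, while one using only bits 0 to 11 would
      be accepted.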
      
      As for the input behavior, the current implementation is based on the second
      choice (i.e., "not add any data fields to the packet [...]"). With this solution,
      interoperability is lost: if a new bit is defined, an "old" kernel
      implementation would not fill IOAM data when that new bit is set inside the
      trace type.
      
      The input behavior is therefore relaxed and these undefined bits are now allowed
      to be set. This is only possible thanks to the sentence "every future node data
      field corresponding to one of these bits MUST be 4-octets long". Indeed, the
      default empty value (the one for 4-octet fields) is inserted whenever an
      undefined bit is set.
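
      The relaxed input behavior can be sketched as below. This is only an
      illustration under assumptions: the function names, the uint32_t
      representation of the trace type (draft bit 0 as the most significant
      of 24 bits) and the output buffer interface are hypothetical. The
      0xFFFFFFFF filler is the reserved value quoted from the draft above.

```c
#include <stddef.h>
#include <stdint.h>

#define IOAM_TRACE_TYPE_BITS 24
#define IOAM_UNDEF_FIRST     12
#define IOAM_UNDEF_LAST      21
#define IOAM_EMPTY_U32       UINT32_C(0xFFFFFFFF) /* reserved "empty" value */

/* Map a draft bit number (0 = most significant) to a mask (assumed layout). */
static uint32_t trace_type_bit(unsigned int n)
{
	return UINT32_C(1) << (IOAM_TRACE_TYPE_BITS - 1 - n);
}

/*
 * For every undefined bit set in trace_type, append one 4-octet field holding
 * the reserved empty value. Returns the number of fields written, or -1 if
 * the buffer (out_words fields long) is too small.
 */
static int fill_undefined_fields(uint32_t trace_type, uint32_t *out,
				 size_t out_words)
{
	size_t written = 0;

	for (unsigned int n = IOAM_UNDEF_FIRST; n <= IOAM_UNDEF_LAST; n++) {
		if (!(trace_type & trace_type_bit(n)))
			continue;
		if (written == out_words)
			return -1;
		out[written++] = IOAM_EMPTY_U32;
	}
	return (int)written;
}
```

      Because each future field is guaranteed to be 4 octets long, a node
      that does not understand a bit can still keep the total node data
      length consistent by writing one empty field per unknown bit.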
      
        [1] cfbe9b00
        [2] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.1
      
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: modify IOAM tests for undef bits · 7b1700e0
      Justin Iurman authored
      
      
      The output behavior for undefined bits is now directly tested inside the bash
      script. Trying to set an undefined bit should be refused.
      
      The test of the input behavior for undefined bits has been removed, because it
      would require another sender that is allowed to set undefined bits.

      Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: ioam: move the check for undefined bits · 2bbc977c
      Justin Iurman authored
      
      
      The check for undefined bits in the trace type is moved from the input side to
      the output side, while the input side is relaxed and now inserts default empty
      values when an undefined bit is set.

      Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>