1. 26 Sep, 2019 1 commit
    • macsec: drop skb sk before calling gro_cells_receive · ba56d8ce
      Xin Long authored
      Fei Liu reported a crash when doing netperf on a topo of macsec
      dev over veth:
      
        [  448.919128] refcount_t: underflow; use-after-free.
        [  449.090460] Call trace:
        [  449.092895]  refcount_sub_and_test+0xb4/0xc0
        [  449.097155]  tcp_wfree+0x2c/0x150
        [  449.100460]  ip_rcv+0x1d4/0x3a8
        [  449.103591]  __netif_receive_skb_core+0x554/0xae0
        [  449.108282]  __netif_receive_skb+0x28/0x78
        [  449.112366]  netif_receive_skb_internal+0x54/0x100
        [  449.117144]  napi_gro_complete+0x70/0xc0
        [  449.121054]  napi_gro_flush+0x6c/0x90
        [  449.124703]  napi_complete_done+0x50/0x130
        [  449.128788]  gro_cell_poll+0x8c/0xa8
        [  449.132351]  net_rx_action+0x16c/0x3f8
        [  449.136088]  __do_softirq+0x128/0x320
      
      The issue was caused by the skb's truesize changing without its sk's
      sk_wmem_alloc being increased in tcp/skb_gro_receive(). Later, when
      the skb is freed and its truesize is subtracted from its sk's
      sk_wmem_alloc in tcp_wfree(), an underflow occurs.
      
      macsec is calling gro_cells_receive() to receive a packet, which
      actually requires skb->sk to be NULL. However when macsec dev is
      over veth, it's possible the skb->sk is still set if the skb was
      not unshared or expanded from the peer veth.
      
      ip_rcv() is calling skb_orphan() to drop the skb's sk for tproxy,
      but it is too late for macsec's calling gro_cells_receive(). So
      fix it by dropping the skb's sk earlier on rx path of macsec.
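
      The accounting problem and the effect of orphaning early can be
      modelled in plain userspace C. This is only a sketch: the toy_* types
      and helpers are hypothetical stand-ins, not the kernel's struct sock,
      struct sk_buff, skb_orphan() or tcp_wfree(), and the byte counts are
      arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct sock and struct sk_buff; not the kernel types. */
struct toy_sock {
    unsigned int wmem_alloc;   /* bytes currently charged to the socket */
    bool underflow;            /* set when an uncharge exceeds the charge */
};

struct toy_skb {
    unsigned int truesize;
    struct toy_sock *sk;       /* owner, or NULL once orphaned */
};

/* Charge the skb to its owner, as a sender would on transmit. */
void toy_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
    skb->sk = sk;
    sk->wmem_alloc += skb->truesize;
}

/* GRO-style merge: truesize grows, the owner's charge does not. */
void toy_gro_merge(struct toy_skb *head, unsigned int extra)
{
    head->truesize += extra;
}

/* Drop ownership early, uncharging exactly what was charged so far. */
void toy_orphan(struct toy_skb *skb)
{
    if (skb->sk) {
        skb->sk->wmem_alloc -= skb->truesize;
        skb->sk = NULL;
    }
}

/* Free path: subtract truesize from the owner and flag any underflow. */
void toy_free(struct toy_skb *skb)
{
    struct toy_sock *sk = skb->sk;

    if (!sk)
        return;
    if (skb->truesize > sk->wmem_alloc) {
        sk->underflow = true;
        sk->wmem_alloc = 0;
    } else {
        sk->wmem_alloc -= skb->truesize;
    }
    skb->sk = NULL;
}

/* Returns true if freeing after a merge underflows the accounting. */
bool toy_rx_underflows(bool orphan_first)
{
    struct toy_sock sk = { 0, false };
    struct toy_skb skb = { 1000, NULL };

    toy_set_owner(&skb, &sk);
    if (orphan_first)
        toy_orphan(&skb);          /* the "drop sk earlier" idea */
    toy_gro_merge(&skb, 500);
    toy_free(&skb);
    return sk.underflow;
}
```

      In this model toy_rx_underflows(false) reports an underflow, while
      toy_rx_underflows(true) does not, because the merge happens only after
      the owner has been dropped.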
      
      Fixes: 5491e7c6 ("macsec: enable GRO and RPS on macsec devices")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Reported-by: Fei Liu <feliu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 02 Jul, 2019 2 commits
  3. 30 May, 2019 1 commit
  4. 27 Apr, 2019 3 commits
    • genetlink: optionally validate strictly/dumps · ef6243ac
      Johannes Berg authored
      
      
      Add options to strictly validate messages and dump messages.
      Validating dump messages non-strictly may sometimes be required,
      so add an option for that as well.
      
      Since none of this can really be applied to existing commands,
      set the options everywhere using the following spatch:
      
          @@
          identifier ops;
          expression X;
          @@
          struct genl_ops ops[] = {
          ...,
           {
                  .cmd = X,
          +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                  ...
           },
          ...
          };
      
      For new commands one should just not copy the .validate 'opt-out'
      flags and thus get strict validation.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg authored
      
      
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
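
      How the four options combine into the three levels can be sketched as
      a bitmask check. This is a userspace model under stated assumptions:
      the TOY_* names, the policy encoding (expected length per type, -1 for
      an NLA_UNSPEC entry) and the single-attribute signature are all
      hypothetical and are not the kernel's netlink API.

```c
#include <stdbool.h>

/* Illustrative flag names, one per option in the list above. */
#define TOY_VALIDATE_TRAILING     (1u << 0)
#define TOY_VALIDATE_MAXTYPE      (1u << 1)
#define TOY_VALIDATE_UNSPEC       (1u << 2)
#define TOY_VALIDATE_STRICT_ATTRS (1u << 3)

/* The three levels: none, TRAILING + MAXTYPE, and everything. */
#define TOY_VALIDATE_LIBERAL 0u
#define TOY_VALIDATE_DEPRECATED_STRICT \
    (TOY_VALIDATE_TRAILING | TOY_VALIDATE_MAXTYPE)
#define TOY_VALIDATE_STRICT \
    (TOY_VALIDATE_TRAILING | TOY_VALIDATE_MAXTYPE | \
     TOY_VALIDATE_UNSPEC | TOY_VALIDATE_STRICT_ATTRS)

#define TOY_POLICY_UNSPEC (-1)

/* Hypothetical policy: type 0 has no entry, type 1 expects 4 bytes. */
const int toy_demo_policy[] = { TOY_POLICY_UNSPEC, 4 };
#define TOY_DEMO_MAXTYPE 1

/* Validate one attribute plus the amount of data trailing the message. */
bool toy_validate(int type, int len, int trailing_bytes,
                  const int *policy, int maxtype, unsigned int flags)
{
    if (trailing_bytes > 0 && (flags & TOY_VALIDATE_TRAILING))
        return false;
    if (type > maxtype)                       /* unknown attribute type */
        return !(flags & TOY_VALIDATE_MAXTYPE);
    if (policy[type] == TOY_POLICY_UNSPEC)    /* no policy entry */
        return !(flags & TOY_VALIDATE_UNSPEC);
    if (len < policy[type])                   /* too short: always rejected */
        return false;
    if (len > policy[type] && (flags & TOY_VALIDATE_STRICT_ATTRS))
        return false;                         /* too long: strict only */
    return true;
}
```

      With this model an over-long attribute passes under the liberal and
      deprecated-strict levels and fails only under the full set, which is
      the behaviour the text proposes as the default for new users.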
      
      Additionally, it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then, going
      forward, prevent forgetting attribute entries. The same can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek authored
      
      
      Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
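
      The compatibility concern and the wrapper arrangement can be sketched
      in userspace as follows. The flag bit positions mirror the uapi
      definitions in linux/netlink.h; the toy_* helpers are hypothetical and
      only model the attribute type field, not a real message buffer.

```c
#include <stdint.h>

/* Bit positions as in the uapi header; TOY_ prefix avoids name clashes. */
#define TOY_NLA_F_NESTED        (1 << 15)
#define TOY_NLA_F_NET_BYTEORDER (1 << 14)
#define TOY_NLA_TYPE_MASK       (~(TOY_NLA_F_NESTED | TOY_NLA_F_NET_BYTEORDER))

/* Old behaviour under its new name: the type is stored unchanged. */
uint16_t toy_nest_start_noflag(int attrtype)
{
    return (uint16_t)attrtype;
}

/* New wrapper: same thing, but with the nested flag set. */
uint16_t toy_nest_start(int attrtype)
{
    return toy_nest_start_noflag(attrtype | TOY_NLA_F_NESTED);
}

/* A parser that masks out the flag bits before comparing types. */
int toy_type_masked(uint16_t nla_type)
{
    return nla_type & TOY_NLA_TYPE_MASK;
}

/* A parser that reads nla_type directly, flags included. */
int toy_type_naive(uint16_t nla_type)
{
    return nla_type;
}
```

      A parser like toy_type_naive() keeps working with the noflag variant
      but sees 0x8005 instead of 5 once the flag is set, which is why
      existing callers are moved to the noflag name rather than silently
      changed.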
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 02 Apr, 2019 1 commit
  6. 22 Mar, 2019 1 commit
    • genetlink: make policy common to family · 3b0f31f2
      Johannes Berg authored
      
      
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 29 Oct, 2018 2 commits
  8. 22 Sep, 2018 1 commit
  9. 16 Apr, 2018 1 commit
  10. 22 Mar, 2018 1 commit
  11. 22 Jan, 2018 1 commit
    • macsec: restore uAPI after addition of GCM-AES-256 · e8660ded
      Sabrina Dubroca authored
      Commit ccfdec90 ("macsec: Add support for GCM-AES-256 cipher suite")
      changed a few values in the uapi headers for MACsec.
      
      Because of existing userspace implementations, we need to preserve the
      value of MACSEC_DEFAULT_CIPHER_ID. Not doing that resulted in
      wpa_supplicant segfaults when a secure channel was created using the
      default cipher. Thus, swap MACSEC_DEFAULT_CIPHER_{ID,ALT} back to their
      original values.
      
      Changing the maximum length of the MACSEC_SA_ATTR_KEY attribute is
      unnecessary, as the previous value (MACSEC_MAX_KEY_LEN, which was 128B)
      is large enough to carry 32-byte keys. This patch reverts
      MACSEC_MAX_KEY_LEN to 128B and restores the old length check on
      MACSEC_SA_ATTR_KEY.
      
      Fixes: ccfdec90 ("macsec: Add support for GCM-AES-256 cipher suite")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 09 Jan, 2018 1 commit
  13. 16 Nov, 2017 1 commit
    • genetlink: fix genlmsg_nlhdr() · 0a833c29
      Michal Kubecek authored
      According to the description, the first argument of genlmsg_nlhdr()
      points to what genlmsg_put() returns, i.e. the beginning of the user
      header. Therefore we should only subtract the size of the genetlink
      header and the netlink message header, not the user header.
      
      This also means we don't need to pass the pointer to genetlink family and
      the same is true for genl_dump_check_consistent() which is the only caller
      of genlmsg_nlhdr(). (Note that at the moment, these functions are only
      used for families which do not have user header so that they are not
      affected.)
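
      The offset arithmetic can be checked with a small model that works on
      byte offsets from the start of the message. The header sizes are
      assumptions chosen to match the usual aligned sizes (16 bytes for the
      netlink message header, 4 for the genetlink header), and the toy_*
      functions are hypothetical, not the kernel helpers.

```c
/* Assumed aligned header sizes; offsets are relative to the message start. */
#define TOY_NLMSG_HDRLEN 16L
#define TOY_GENL_HDRLEN   4L

/* Offset a genlmsg_put()-style helper would return: the user header. */
long toy_user_hdr_offset(void)
{
    return TOY_NLMSG_HDRLEN + TOY_GENL_HDRLEN;
}

/* Before the fix: the family's user header size is subtracted as well. */
long toy_nlhdr_offset_old(long user_hdr_off, long family_hdrsize)
{
    return user_hdr_off - family_hdrsize - TOY_GENL_HDRLEN - TOY_NLMSG_HDRLEN;
}

/* After the fix: only the two fixed headers are subtracted. */
long toy_nlhdr_offset_fixed(long user_hdr_off)
{
    return user_hdr_off - TOY_GENL_HDRLEN - TOY_NLMSG_HDRLEN;
}
```

      With a family header size of 0 both variants land on offset 0, which
      matches the note that current users are unaffected. With a
      hypothetical 8-byte user header the old variant lands 8 bytes before
      the message start.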
      
      Fixes: 670dc283 ("netlink: advertise incomplete dumps")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 22 Oct, 2017 3 commits
  15. 11 Oct, 2017 1 commit
  16. 05 Oct, 2017 1 commit
  17. 22 Aug, 2017 1 commit
  18. 27 Jun, 2017 3 commits
  19. 16 Jun, 2017 1 commit
    • networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg authored
      
      
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
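
      The effect on callers can be illustrated with a toy buffer whose push
      helper returns void *. Everything here (toy_buf, toy_push, toy_hdr) is
      hypothetical and far simpler than an sk_buff; in particular the sketch
      does no headroom checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy buffer: data starts at the end and grows downward on each push. */
struct toy_buf {
    unsigned char storage[64];
    unsigned char *data;
};

struct toy_hdr {
    uint8_t a;
    uint8_t b;
};

void toy_buf_init(struct toy_buf *b)
{
    b->data = b->storage + sizeof(b->storage);
}

/* Returns void * so the result converts to any header pointer type. */
void *toy_push(struct toy_buf *b, size_t len)
{
    b->data -= len;            /* no headroom check in this sketch */
    return b->data;
}

/* Pushes a two-byte header and then a single byte; returns their sum. */
unsigned int toy_demo(void)
{
    struct toy_buf b;
    struct toy_hdr *h;

    toy_buf_init(&b);
    h = toy_push(&b, sizeof(*h));          /* no (struct toy_hdr *) cast */
    h->a = 1;
    h->b = 2;
    *(uint8_t *)toy_push(&b, 1) = 0xAB;    /* direct byte use needs a cast */
    return (unsigned int)b.data[0] + b.data[1] + b.data[2];
}
```

      The assignment to h needs no cast because void * converts implicitly
      in C, while dereferencing the result directly requires the (uint8_t *)
      cast, mirroring the two cases the spatch handles.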
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 07 Jun, 2017 1 commit
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller authored
      
      
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
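
      The split between priv_destructor and needs_free_netdev can be
      sketched with a userspace model that counts frees. The toy_* names
      are hypothetical; only the two field names are taken from the text,
      and the register and unregister paths are reduced to the parts that
      matter for ownership.

```c
#include <stdbool.h>
#include <stdlib.h>

struct toy_netdev {
    int *priv;                                     /* set up by init */
    void (*priv_destructor)(struct toy_netdev *);  /* frees private state */
    bool needs_free_netdev;                        /* core frees the device */
};

int toy_priv_frees;   /* counters so a caller can observe what was freed */
int toy_dev_frees;

void toy_priv_destructor(struct toy_netdev *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    toy_priv_frees++;
}

void toy_free_netdev(struct toy_netdev *dev)
{
    free(dev);
    toy_dev_frees++;
}

/* Registration: on a late failure, private state is released here,
 * but the device itself is left for the caller to free. */
int toy_register(struct toy_netdev *dev, bool fail_late)
{
    dev->priv = malloc(sizeof(*dev->priv));        /* init-style allocation */
    if (!dev->priv)
        return -1;
    if (fail_late) {
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* Unregistration: private state first, then optionally the device. */
void toy_unregister(struct toy_netdev *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev)
        toy_free_netdev(dev);
}

/* Runs one lifecycle; returns priv_frees * 10 + dev_frees. */
int toy_lifecycle(bool fail_late)
{
    struct toy_netdev *dev = calloc(1, sizeof(*dev));

    toy_priv_frees = toy_dev_frees = 0;
    if (!dev)
        return -1;
    dev->priv_destructor = toy_priv_destructor;
    dev->needs_free_netdev = true;
    if (toy_register(dev, fail_late) < 0)
        toy_free_netdev(dev);                      /* caller owns dev on error */
    else
        toy_unregister(dev);
    return toy_priv_frees * 10 + toy_dev_frees;
}
```

      In both the failed-registration and the normal path the model frees
      the private state once and the device once, with no double free and
      no leak.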
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 05 Jun, 2017 1 commit
  22. 22 May, 2017 1 commit
  23. 26 Apr, 2017 1 commit
  24. 24 Apr, 2017 1 commit
    • macsec: avoid heap overflow in skb_to_sgvec · 4d6fa57b
      Jason A. Donenfeld authored
      
      
      While this may appear as a humdrum one line change, it's actually quite
      important. An sk_buff stores data in three places:
      
      1. A linear chunk of allocated memory in skb->data. This is the easiest
         one to work with, but it precludes using scattergather since the
         memory must be linear.
      2. The array skb_shinfo(skb)->frags, which is of maximum length
         MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
         can point to different pages.
      3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
         which in turn can have data in either (1) or (2).
      
      The first two are rather easy to deal with, since they're of a fixed
      maximum length, while the third one is not, since there can be
      potentially limitless chains of fragments. Fortunately dealing with
      frag_list is opt-in for drivers, so drivers don't actually have to deal
      with this mess. For whatever reason, macsec decided it wanted pain, and
      so it explicitly specified NETIF_F_FRAGLIST.
      
      Because dealing with (1), (2), and (3) is insane, most users of sk_buff
      doing any sort of crypto or paging operation call a convenient function
      called skb_to_sgvec (which happens to be recursive if (3) is in use!).
      This takes a sk_buff as input, and writes into its output pointer an
      array of scattergather list items. Sometimes people like to declare a
      fixed size scattergather list on the stack; other times people like to
      allocate a fixed size scattergather list on the heap. However, if you're
      doing it in a fixed-size fashion, you really shouldn't be using
      NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
      frag_list children aren't shared and then you check the number of
      fragments in total required).
      
      Macsec specifically does this:
      
              size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
              tmp = kmalloc(size, GFP_ATOMIC);
              *sg = (struct scatterlist *)(tmp + sg_offset);
      	...
              sg_init_table(sg, MAX_SKB_FRAGS + 1);
              skb_to_sgvec(skb, sg, 0, skb->len);
      
      Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
      using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
      overflow the heap, and disaster ensues.
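
      Why a table of MAX_SKB_FRAGS + 1 entries stops being enough can be
      seen by counting entries recursively over a toy buffer tree. This is
      a sketch only: toy_skb is a hypothetical structure, the value 17 for
      the fragment limit is an assumption (the real constant depends on the
      kernel configuration), and the counting merely approximates what a
      recursive scatterlist conversion would emit.

```c
#include <stddef.h>

#define TOY_MAX_SKB_FRAGS 17   /* assumed value, for illustration only */

struct toy_skb {
    int has_linear;                    /* 1 if there is linear data */
    int nr_frags;                      /* entries in the frags array */
    const struct toy_skb *frag_list;   /* first child buffer, if any */
    const struct toy_skb *next;        /* next sibling in a frag_list */
};

/* One entry for linear data, one per frag, plus all children recursively. */
int toy_sg_entries(const struct toy_skb *skb)
{
    int n = (skb->has_linear ? 1 : 0) + skb->nr_frags;
    const struct toy_skb *child;

    for (child = skb->frag_list; child; child = child->next)
        n += toy_sg_entries(child);
    return n;
}

/* Does the buffer fit a fixed table of TOY_MAX_SKB_FRAGS + 1 entries? */
int toy_fits_fixed_table(const struct toy_skb *skb)
{
    return toy_sg_entries(skb) <= TOY_MAX_SKB_FRAGS + 1;
}

/* Full head buffer, optionally with one child hanging off frag_list. */
int toy_demo_fits(int with_frag_list)
{
    struct toy_skb child = { 1, 2, NULL, NULL };
    struct toy_skb head = { 1, TOY_MAX_SKB_FRAGS, NULL, NULL };

    if (with_frag_list)
        head.frag_list = &child;
    return toy_fits_fixed_table(&head);
}
```

      A head buffer with linear data and a full frags array needs exactly
      MAX_SKB_FRAGS + 1 entries; a single child on frag_list already pushes
      the total past the fixed table.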
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: stable@vger.kernel.org
      Cc: security@kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 13 Apr, 2017 1 commit
  26. 21 Feb, 2017 1 commit
  27. 08 Jan, 2017 1 commit
  28. 08 Dec, 2016 1 commit
  29. 27 Oct, 2016 4 commits
    • macsec: Fix header length if SCI is added if explicitly disabled · e0f841f5
      Tobias Brunner authored
      Even if sending SCIs is explicitly disabled, the code that creates the
      Security Tag might still decide to add it (e.g. if multiple RX SCs are
      defined on the MACsec interface).
      But because the header length so far only depended on the configuration
      option the SCI overwrote the original frame's contents (EtherType and
      e.g. the beginning of the IP header) and if encrypted did not visibly
      end up in the packet, while the SC flag in the TCI field of the Security
      Tag was still set, resulting in invalid MACsec frames.
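
      The mismatch can be sketched by computing the tag length twice: once
      from the configuration option alone and once from the decision the
      tag builder actually makes. The lengths (6 bytes of tag fields after
      the EtherType, 8 bytes of SCI) and the simplified rule "add the SCI
      if configured or if more than one RX SC exists" are assumptions made
      for illustration; the toy_* helpers are hypothetical.

```c
#include <stdbool.h>

#define TOY_TAG_LEN 6   /* assumed: TCI/AN, short length, packet number */
#define TOY_SCI_LEN 8   /* assumed: secure channel identifier */

/* Length of the Security Tag for a given SCI decision. */
int toy_sectag_len(bool sci_present)
{
    return TOY_TAG_LEN + (sci_present ? TOY_SCI_LEN : 0);
}

/* Simplified stand-in for the tag builder's decision to include the SCI. */
bool toy_sci_present(bool send_sci_cfg, int n_rx_sc)
{
    return send_sci_cfg || n_rx_sc > 1;
}

/* Old sizing: space reserved from the config option only.
 * Returns how many bytes the written tag exceeds the reserved space by. */
int toy_overrun_old(bool send_sci_cfg, int n_rx_sc)
{
    int reserved = toy_sectag_len(send_sci_cfg);
    int written = toy_sectag_len(toy_sci_present(send_sci_cfg, n_rx_sc));

    return written - reserved;
}

/* Consistent sizing: reserve and write based on the same decision. */
int toy_overrun_fixed(bool send_sci_cfg, int n_rx_sc)
{
    bool sci = toy_sci_present(send_sci_cfg, n_rx_sc);

    return toy_sectag_len(sci) - toy_sectag_len(sci);
}
```

      With the SCI disabled in the configuration and two RX SCs, the old
      sizing is 8 bytes short, which corresponds to the SCI overwriting the
      start of the original frame's payload.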
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: Tobias Brunner <tobias@strongswan.org>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: mark families as __ro_after_init · 56989f6d
      Johannes Berg authored
      
      
      Now genl_register_family() is the only thing (other than the
      users themselves, perhaps, but I didn't find any doing that)
      writing to the family struct.
      
      In all families that I found, genl_register_family() is only
      called from __init functions (some indirectly, in which case
      I've added __init annotations to clarify things), so all can
      actually be marked __ro_after_init.
      
      This protects the data structure from accidental corruption.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: statically initialize families · 489111e5
      Johannes Berg authored
      
      
      Instead of providing macros/inline functions to initialize
      the families, make all users initialize them statically and
      get rid of the macros.
      
      This reduces the kernel code size by about 1.6k on x86-64
      (with allyesconfig).
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: no longer support using static family IDs · a07ea4d9
      Johannes Berg authored
      
      
      Static family IDs have never really been used, the only
      use case was the workaround I introduced for those users
      that assumed their family ID was also their multicast
      group ID.
      
      Additionally, because static family IDs would never be
      reserved by the generic netlink code, using a relatively
      low ID would only work for built-in families that can be
      registered immediately after generic netlink is started,
      which is basically only the control family (apart from
      the workaround code, which I also had to add code for so
      it would reserve those IDs).
      
      Thus, anything other than GENL_ID_GENERATE is flawed and
      luckily not used except in the cases I mentioned. Move
      those workarounds into a few lines of code, and then get
      rid of GENL_ID_GENERATE entirely, making it more robust.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>