1. 07 Jul, 2018 5 commits
  2. 06 Jul, 2018 1 commit
    • ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns · 70ba5b6d
      Tyler Hicks authored
      
      
      The low and high values of the net.ipv4.ping_group_range sysctl were
      being silently forced to the default disabled state when a write to the
      sysctl contained GIDs that didn't map to the associated user namespace.
      Confusingly, the sysctl's write operation would return success and then
      a subsequent read of the sysctl would indicate that the low and high
      values are the overflowgid.
      
      This patch changes the behavior by clearly returning an error when the
      sysctl write operation receives a GID range that doesn't map to the
      associated user namespace. In such a situation, the previous value of
      the sysctl is preserved and that range will be returned in a subsequent
      read of the sysctl.
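
The behavior described above can be sketched as a small user-space model. This is only an illustration, not the kernel code: `UserNamespace`, `PingGroupRange`, and the single-range GID mapping are hypothetical stand-ins, and the `(1, 0)` pair stands for the disabled default.

```python
import errno


class UserNamespace:
    """Toy user namespace with a single GID mapping range (hypothetical)."""

    def __init__(self, first_gid, lower_first, count):
        self.first_gid = first_gid
        self.lower_first = lower_first
        self.count = count

    def map_gid(self, gid):
        """Return the mapped GID, or None if gid has no mapping."""
        if self.first_gid <= gid < self.first_gid + self.count:
            return self.lower_first + (gid - self.first_gid)
        return None


class PingGroupRange:
    """Hypothetical stand-in for the sysctl state of one network namespace."""

    def __init__(self, user_ns):
        self.user_ns = user_ns
        self.range = (1, 0)  # low > high means "disabled"

    def write(self, low, high):
        """Return 0 on success, or -EINVAL and keep the previous range."""
        if self.user_ns.map_gid(low) is None or self.user_ns.map_gid(high) is None:
            return -errno.EINVAL  # reject instead of silently disabling
        self.range = (low, high) if low <= high else (1, 0)
        return 0

    def read(self):
        return self.range
```

With a mapping that covers GIDs 0 to 999, writing `0 999` succeeds, while a later write of `0 5000` fails with `-EINVAL` and a read still reports `0 999`.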
      
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 05 Jul, 2018 3 commits
    • net: qrtr: Reset the node and port ID of broadcast messages · d27e77a3
      Arun Kumar Neelakantam authored
      
      
      All the control messages broadcast to remote routers use
      QRTR_NODE_BCAST instead of the local router NODE ID, which causes
      the packets to be dropped on the remote router due to an invalid NODE ID.
      
      Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: Broadcast messages only from control port · fdf5fd39
      Arun Kumar Neelakantam authored
      
      
      The broadcast node id should only be sent with the control port id.
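
A minimal sketch of the rule stated above, written as a user-space check. The function name `check_destination` is hypothetical, the numeric values of the two constants are assumptions for illustration, and the error the kernel actually returns is not modeled.

```python
QRTR_NODE_BCAST = 0xFFFFFFFF  # assumed value of the broadcast node ID
QRTR_PORT_CTRL = 0xFFFFFFFE   # assumed value of the control port ID


def check_destination(node, port):
    """Reject a broadcast destination that does not use the control port."""
    if node == QRTR_NODE_BCAST and port != QRTR_PORT_CTRL:
        raise ValueError("broadcast node ID requires the control port ID")
    return (node, port)
```

Unicast destinations pass through unchanged; only the combination of the broadcast node ID with a non-control port is refused.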
      
      Signed-off-by: Arun Kumar Neelakantam <aneela@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: make ipv6_renew_options() interrupt/kernel safe · a9ba23d4
      Paul Moore authored
      
      
      At present the ipv6_renew_options_kern() function ends up calling into
      access_ok() which is problematic if done from inside an interrupt as
      access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
      (x86-64 is affected).  Example warning/backtrace is shown below:
      
       WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
       ...
       Call Trace:
        <IRQ>
        ipv6_renew_option+0xb2/0xf0
        ipv6_renew_options+0x26a/0x340
        ipv6_renew_options_kern+0x2c/0x40
        calipso_req_setattr+0x72/0xe0
        netlbl_req_setattr+0x126/0x1b0
        selinux_netlbl_inet_conn_request+0x80/0x100
        selinux_inet_conn_request+0x6d/0xb0
        security_inet_conn_request+0x32/0x50
        tcp_conn_request+0x35f/0xe00
        ? __lock_acquire+0x250/0x16c0
        ? selinux_socket_sock_rcv_skb+0x1ae/0x210
        ? tcp_rcv_state_process+0x289/0x106b
        tcp_rcv_state_process+0x289/0x106b
        ? tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_rcv+0xc82/0xcf0
        ip6_input_finish+0x10d/0x690
        ip6_input+0x45/0x1e0
        ? ip6_rcv_finish+0x1d0/0x1d0
        ipv6_rcv+0x32b/0x880
        ? ip6_make_skb+0x1e0/0x1e0
        __netif_receive_skb_core+0x6f2/0xdf0
        ? process_backlog+0x85/0x250
        ? process_backlog+0x85/0x250
        ? process_backlog+0xec/0x250
        process_backlog+0xec/0x250
        net_rx_action+0x153/0x480
        __do_softirq+0xd9/0x4f7
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        ...
      
      While not present in the backtrace, ipv6_renew_option() ends up calling
      access_ok() via the following chain:
      
        access_ok()
        _copy_from_user()
        copy_from_user()
        ipv6_renew_option()
      
      The fix presented in this patch is to perform the userspace copy
      earlier in the call chain such that it is only called when the option
      data is actually coming from userspace; that place is
      do_ipv6_setsockopt().  Not only does this solve the problem seen in
      the backtrace above, it also allows us to simplify the code quite a
      bit by removing ipv6_renew_options_kern() completely.  We also take
      this opportunity to clean up ipv6_renew_options()/ipv6_renew_option()
      a small amount as well.
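
The idea of copying user data once, at the system call boundary, can be illustrated with a toy model. Everything here is a hypothetical stand-in: `Context`, the option names, and the dictionary of options are not kernel structures, and `copy_from_user` only mimics the rule that user access is not allowed from interrupt context.

```python
class Context:
    """Tracks whether the model is pretending to run in interrupt context."""
    in_irq = False


def copy_from_user(user_buf):
    """Model of a user copy: only legal outside interrupt context."""
    if Context.in_irq:
        raise RuntimeError("user access from interrupt context")
    return bytes(user_buf)


def renew_options(current, opttype, newopt):
    """Kernel-side helper: newopt is already a kernel buffer, no user access."""
    updated = dict(current)
    if newopt:
        updated[opttype] = newopt
    else:
        updated.pop(opttype, None)
    return updated


def do_setsockopt(current, opttype, user_buf):
    """Syscall path: the only place that copies from user memory."""
    kernel_copy = copy_from_user(user_buf)
    return renew_options(current, opttype, kernel_copy)


def conn_request_setattr(current, opttype, kernel_buf):
    """Softirq path: hands a kernel buffer straight to the helper."""
    Context.in_irq = True
    try:
        return renew_options(current, opttype, kernel_buf)
    finally:
        Context.in_irq = False
```

Because `renew_options()` never touches user memory, the same helper serves both callers and no separate kernel-only variant is needed.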
      
      This patch is heavily based on a rough patch by Al Viro.  I've taken
      his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
      to a memdup_user() call, made better use of the e_inval jump target in
      the same function, and cleaned up the use of ipv6_renew_option() by
      ipv6_renew_options().
      
      CC: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Jul, 2018 3 commits
  5. 03 Jul, 2018 1 commit
  6. 02 Jul, 2018 2 commits
  7. 01 Jul, 2018 1 commit
    • tcp: prevent bogus FRTO undos with non-SACK flows · 1236f22f
      Ilpo Järvinen authored
      
      
      If SACK is not enabled and the first cumulative ACK after the RTO
      retransmission covers more than the retransmitted skb, a spurious
      FRTO undo will trigger (assuming FRTO is enabled for that RTO).
      The reason is that any non-retransmitted segment acknowledged will
      set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
      no indication that it would have been delivered for real (the
      scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
      case so the check for that bit won't help like it does with SACK).
      Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
      in tcp_process_loss.
      
      We need to use a stricter condition for the non-SACK case and check
      that none of the cumulatively ACKed segments were retransmitted
      to prove that progress is due to original transmissions. Only then
      keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
      non-SACK case.
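
The stricter non-SACK condition can be sketched as follows. This is a simplified model, not the code in tcp_clean_rtx_queue: the `Segment` type is hypothetical, the flag bit values are illustrative, and the SACK scoreboard check is omitted entirely.

```python
from dataclasses import dataclass

FLAG_RETRANS_DATA_ACKED = 0x08  # illustrative bit values
FLAG_ORIG_SACK_ACKED = 0x200


@dataclass
class Segment:
    end_seq: int
    retransmitted: bool


def clean_rtx_queue(queue, ack_seq, sack_enabled):
    """Return (flags, remaining queue) after a cumulative ACK up to ack_seq."""
    flag = 0
    remaining = []
    for seg in queue:
        if seg.end_seq > ack_seq:
            remaining.append(seg)  # not covered by this ACK
        elif seg.retransmitted:
            flag |= FLAG_RETRANS_DATA_ACKED
        else:
            flag |= FLAG_ORIG_SACK_ACKED
    # Non-SACK: progress only counts as "original" if no cumulatively
    # ACKed segment was ever retransmitted.
    if not sack_enabled and (flag & FLAG_RETRANS_DATA_ACKED):
        flag &= ~FLAG_ORIG_SACK_ACKED
    return flag, remaining
```

When the first cumulative ACK after the RTO covers both the retransmitted segment and a never-retransmitted one, the model clears FLAG_ORIG_SACK_ACKED, so the undo is not triggered.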
      
      (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
      to better indicate its purpose but to keep this change minimal, it
      will be done in another patch).
      
      Besides burstiness and congestion control violations, this problem
      can result in an RTO loop: when the loss recovery is prematurely
      undone, only new data will be transmitted (if available) and
      the next retransmission can occur only after a new RTO which in case
      of multiple losses (that are not for consecutive packets) requires
      one RTO per loss to recover.
      
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 30 Jun, 2018 5 commits
  9. 29 Jun, 2018 5 commits
  10. 28 Jun, 2018 7 commits
    • bpf: Change bpf_fib_lookup to return lookup status · 4c79579b
      David Ahern authored
      
      
      For ACLs implemented using either FIB rules or FIB entries, the BPF
      program needs the FIB lookup status to be able to drop the packet.
      Since the bpf_fib_lookup API has not reached a released kernel yet,
      change the return code to contain an encoding of the FIB lookup
      result and return the nexthop device index in the params struct.
      
      In addition, inform the BPF program of any post FIB lookup reason as
      to why the packet needs to go up the stack.
      
      The fib result for unicast routes must have an egress device, so remove
      the check that it is non-NULL.
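
How a program might act on the returned lookup status can be sketched in a small model. A real consumer would be a BPF program written in C; this Python version only illustrates the decision logic. The enum members are named after the BPF_FIB_LKUP_RET_* codes, but the numeric values and the `verdict` helper are assumptions made for this example.

```python
from enum import IntEnum


class FibLookupRet(IntEnum):
    """Lookup status codes (numeric values are illustrative)."""
    SUCCESS = 0
    BLACKHOLE = 1
    UNREACHABLE = 2
    PROHIBIT = 3
    NOT_FWDED = 4
    FWD_DISABLED = 5
    UNSUPP_LWT = 6
    NO_NEIGH = 7
    FRAG_NEEDED = 8


DROP_CODES = {
    FibLookupRet.BLACKHOLE,
    FibLookupRet.UNREACHABLE,
    FibLookupRet.PROHIBIT,
}


def verdict(rc):
    """Map a lookup status to a packet action (hypothetical helper)."""
    rc = FibLookupRet(rc)
    if rc == FibLookupRet.SUCCESS:
        return "redirect"  # forward via the nexthop device in params
    if rc in DROP_CODES:
        return "drop"      # ACL expressed as a FIB rule or entry
    return "pass"          # let the regular stack handle the packet
```

The split into three outcomes mirrors the commit text: ACL-style results lead to a drop, and every other non-success status is a reason for the packet to go up the stack.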
      
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds authored
      
      
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net/smc: rebuild nonblocking connect · 24ac3a08
      Ursula Braun authored
      The recent poll change may lead to stalls for non-blocking connecting
      SMC sockets, since sock_poll_wait is no longer performed on the
      internal CLC socket, but on the outer SMC socket.  kernel_connect() on
      the internal CLC socket returns with -EINPROGRESS, but the wake up
      logic does not work in all cases. If the internal CLC socket is still
      in state TCP_SYN_SENT when polled, sock_poll_wait() from sock_poll()
      does not sleep. It is supposed to sleep till the state of the internal
      CLC socket switches to TCP_ESTABLISHED.
      
      This problem triggered a redesign of the SMC nonblocking connect logic.
      This patch introduces a connect worker covering all connect steps
      followed by a wake up of socket waiters. This makes it possible to get rid of all
      delays and locks in smc_poll().
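
The connect-worker pattern described above can be sketched with ordinary threads. This is a loose analogy rather than the SMC implementation: `NonblockingSocket`, its state names, and the list of connect steps are all hypothetical.

```python
import errno
import threading


class NonblockingSocket:
    """Toy socket whose connect steps all run in one background worker."""

    def __init__(self, connect_steps):
        self._steps = connect_steps
        self._cond = threading.Condition()
        self.state = "INIT"

    def connect(self):
        """Start the worker and return immediately, like a nonblocking connect."""
        self.state = "CONNECTING"
        threading.Thread(target=self._connect_work).start()
        return -errno.EINPROGRESS

    def _connect_work(self):
        """Run every connect step, then wake up all waiters once."""
        ok = all(step() for step in self._steps)
        with self._cond:
            self.state = "ACTIVE" if ok else "CLOSED"
            self._cond.notify_all()

    def poll(self):
        """Readiness check: reads the state only, with no delays or sleeping."""
        return self.state == "ACTIVE"

    def wait_connected(self, timeout=None):
        """Sleep until the worker has finished; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self.state != "CONNECTING", timeout
            )
```

Since the worker is the only place that changes the state and wakes waiters, the poll path has nothing left to wait for.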
      
      Fixes: c0129a06 ("smc: convert to ->poll_mask")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add one more quick ack after ECN events · 15ecbe94
      Eric Dumazet authored
      Larry Brakmo's proposal (https://patchwork.ozlabs.org/patch/935233/
      "tcp: force cwnd at least 2 in tcp_cwnd_reduction") made us rethink
      our recent patch removing ~16 quick acks after ECN events.
      
      tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
      but in the case the sender cwnd was lowered to 1, we do not want
      to have a delayed ack for the next packet we will receive.
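
The effect of allowing one more quick ACK can be shown with a toy receiver. The `Receiver` class and the helper below are hypothetical, and the model ignores everything except the quick-ACK budget. A budget of 2 is used here only to represent "one more" than the single immediate ACK mentioned above.

```python
class Receiver:
    """Toy receiver that tracks only its remaining quick-ACK budget."""

    def __init__(self):
        self.quick_acks = 0

    def enter_quickack_mode(self, max_quickacks):
        """Allow up to max_quickacks immediate ACKs."""
        self.quick_acks = max(self.quick_acks, max_quickacks)

    def on_data(self):
        """Return how the ACK for one received packet is sent."""
        if self.quick_acks > 0:
            self.quick_acks -= 1
            return "immediate"
        return "delayed"


def acks_after_ecn_event(max_quickacks, packets):
    """ACK behavior for the packets received after an ECN event."""
    receiver = Receiver()
    receiver.enter_quickack_mode(max_quickacks)
    return [receiver.on_data() for _ in range(packets)]
```

With a budget of 1, the second packet gets a delayed ACK, which is the case to avoid when the sender's cwnd has dropped to 1. With a budget of 2, both packets are acknowledged immediately.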
      
      Fixes: 522040ea ("tcp: do not aggressively quick ack after ECN events")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpfilter: include bpfilter_umh in assembly instead of using objcopy · 8e75887d
      Masahiro Yamada authored
      
      
      What we want here is to embed a user-space program into the kernel.
      Instead of the complex ELF magic, let's simply wrap it in the assembly
      with the '.incbin' directive.
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • strparser: Remove early eaten to fix full tcp receive buffer stall · 977c7114
      Doron Roberts-Kedes authored
      
      
      On receiving an incomplete message, the existing code stores the
      remaining length of the cloned skb in the early_eaten field instead of
      incrementing the value returned by __strp_recv. This defers invocation
      of sock_rfree for the current skb until the next invocation of
      __strp_recv, which returns early_eaten if early_eaten is non-zero.
      
      This behavior causes a stall when the current message occupies the very
      tail end of a massive skb, and strp_peek/need_bytes indicates that the
      remainder of the current message has yet to arrive on the socket. The
      TCP receive buffer is totally full, causing the TCP window to go to
      zero, so the remainder of the message will never arrive.
      
      Incrementing the value returned by __strp_recv by the amount otherwise
      stored in early_eaten prevents stalls of this nature.
      
      Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpfilter: check compiler capability in Kconfig · 88e85a7d
      Masahiro Yamada authored
      
      
      With the brand-new syntax extension of Kconfig, we can directly
      check the compiler capability in the configuration phase.
      
      If cc-can-link.sh fails, BPFILTER_UMH is automatically
      hidden by the dependency.
      
      I also deleted 'default n', which is a no-op.
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 27 Jun, 2018 2 commits
  12. 26 Jun, 2018 3 commits
    • netfilter: nf_conncount: fix garbage collection confirm race · b36e4523
      Florian Westphal authored
      
      
      Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme
      used by nf_conncount.
      
      When doing the list walk, we look up the tuple in the conntrack table.
      If the lookup fails, we remove this tuple from our list because
      the conntrack entry is gone.
      
      This is the common cause, but it turns out it's not the only one.
      The list entry could have been created just before by another cpu, i.e. the
      conntrack entry might not yet have been inserted into the global hash.
      
      To avoid this, we introduce a timestamp and the owning cpu.
      If the entry appears to be stale, evict only if:
       1. The current cpu is the one that added the entry, or,
       2. The timestamp is older than two jiffies
      
      The second constraint allows GC to be taken over by another
      cpu too (e.g. because a cpu was offlined or napi got moved to another
      cpu).
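
The two eviction conditions listed above can be written as a small predicate. This is a sketch under assumptions: the function name and parameters are hypothetical, the timestamp is treated as a 32-bit jiffies counter, and reading "older than two jiffies" as a strict comparison is a guess about the exact boundary.

```python
def may_evict_stale(entry_cpu, entry_jiffies, current_cpu, now_jiffies):
    """Decide whether an entry whose conntrack lookup failed may be evicted."""
    # 1. The cpu that added the entry knows whether it was ever confirmed.
    if entry_cpu == current_cpu:
        return True
    # 2. Any other cpu may take over only once the entry is old enough.
    #    The mask keeps the age correct across a 32-bit counter wrap.
    age = (now_jiffies - entry_jiffies) & 0xFFFFFFFF
    return age > 2  # assumption: strictly older than two jiffies
```

A fresh entry seen from a different cpu is kept, since its conntrack entry might simply not have been inserted into the global hash yet.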
      
      We can't pretend the 'doubtful' entry wasn't in our list.
      Instead, when we don't find an entry, indicate via IS_ERR
      that the entry was removed ('did not exist') or withheld
      ('might-be-unconfirmed').
      
      This most likely also fixes a xt_connlimit imbalance earlier reported by
      Dmitry Andrianov.
      
      Cc: Dmitry Andrianov <dmitry.andrianov@alertme.com>
      Reported-by: Justin Pettit <jpettit@vmware.com>
      Reported-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: don't hold nf_log_mutex during user access · ce00bf07
      Jann Horn authored
      The old code would indefinitely block other users of nf_log_mutex if
      a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
      region. Fix it by moving proc_dostring() out of the locked region.
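
The locking change can be illustrated with a user-space analogy for the write path. The names `log_mutex`, `loggers`, and `set_logger` are hypothetical, and `proc_dostring` here is only a stand-in for a user copy that may block for an arbitrary time.

```python
import threading

log_mutex = threading.Lock()
loggers = {"ipv4": "NONE"}


def proc_dostring(user_write):
    """Stand-in for the user access; it must never run under log_mutex."""
    assert not log_mutex.locked(), "user access while holding the mutex"
    return user_write().strip()


def set_logger(family, user_write):
    """Copy the user data first, then take the mutex for the update only."""
    name = proc_dostring(user_write)  # may block, but no lock is held
    with log_mutex:                   # short critical section
        loggers[family] = name
    return 0
```

If the user copy stalls, other users of the mutex are unaffected because the lock is taken only after the data has arrived in a local buffer.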
      
      This is a followup to commit 266d07cb ("netfilter: nf_log: fix
      sleeping function called from invalid context"), which changed this code
      from using rcu_read_lock() to taking nf_log_mutex.
      
      Fixes: 266d07cb ("netfilter: nf_log: fix sleeping function calle[...]")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_log: fix uninit read in nf_log_proc_dostring · dffd22ae
      Jann Horn authored
      When proc_dostring() is called with a non-zero offset in strict mode, it
      doesn't just write to the ->data buffer, it also reads. Make sure it
      doesn't read uninitialized data.
      
      Fixes: c6ac37d8 ("netfilter: nf_log: fix error on write NONE to [...]")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. 23 Jun, 2018 2 commits