Skip to content
  • Daniel Borkmann's avatar
    net: sctp: rework multihoming retransmission path selection to rfc4960 · 4c47af4d
    Daniel Borkmann authored
    Problem statement: 1) both paths (primary path1 and alternate
    path2) are up after the association has been established i.e.,
    HB packets are normally exchanged, 2) path2 gets inactive after
    path_max_retrans * max_rto timed out (i.e. path2 is down completely),
    3) now, if a transmission times out on the only surviving/active
    path1 (any ~1sec network service impact could cause this like
    a channel bonding failover), then the retransmitted packets are
    sent over the inactive path2; this happens with partial failover
    and without it.
    
    Besides not being optimal in the above scenario, a small failure
    or timeout in the only existing path has the potential to cause
    long delays in the retransmission (depending on RTO_MAX) until
    the still active path is reselected. Further, when the T3-timeout
    occurs, we have active_patch == retrans_path, and even though the
    timeout occurred on the initial transmission of data, not a
    retransmit, we end up updating retransmit path.
    
    RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
    6.4.1. "Failover from an Inactive Destination Address" the
    following:
    
      Some of the transport addresses of a multi-homed SCTP endpoint
      may become inactive due to either the occurrence of certain
      error conditions (see Section 8.2) or adjustments from the
      SCTP user.
    
      When there is outbound data to send and the primary path
      becomes inactive (e.g., due to failures), or where the SCTP
      user explicitly requests to send data to an inactive
      destination transport address, before reporting an error to
      its ULP, the SCTP endpoint should try to send the data to an
      alternate __active__ destination transport address if one
      exists.
    
      When retransmitting data that timed out, if the endpoint is
      multihomed, it should consider each source-destination address
      pair in its retransmission selection policy. When retransmitting
      timed-out data, the endpoint should attempt to pick the most
      divergent source-destination pair from the original
      source-destination pair to which the packet was transmitted.
    
      Note: Rules for picking the most divergent source-destination
      pair are an implementation decision and are not specified
      within this document.
    
    So, we should first reconsider to take the current active
    retransmission transport if we cannot find an alternative
    active one. If all of that fails, we can still round robin
    through unkown, partial failover, and inactive ones in the
    hope to find something still suitable.
    
    Commit 4141ddc0 ("sctp: retran_path update bug fix") broke
    that behaviour by selecting the next inactive transport when
    no other active transport was found besides the current assoc's
    peer.retran_path. Before commit 4141ddc0, we would have
    traversed through the list until we reach our peer.retran_path
    again, and in case that is still in state SCTP_ACTIVE, we would
    take it and return. Only if that is not the case either, we
    take the next inactive transport.
    
    Besides all that, another issue is that transports in state
    SCTP_UNKNOWN could be preferred over transports in state
    SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
    SCTP_UNKNOWN in the transport list yielding a weaker transport
    state to be used in retransmission.
    
    This patch mostly reverts 4141ddc0, but also rewrites
    this function to introduce more clarity and strictness into
    the code. A strict priority of transport states is enforced
    in this patch, hence selection is active > unkown > partial
    failover > inactive.
    
    Fixes: 4141ddc0
    
     ("sctp: retran_path update bug fix")
    Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
    Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
    Acked-by: default avatarVlad Yasevich <yasevich@gmail.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    4c47af4d