    xps: fix xps for stacked devices · 2bd82484
    Eric Dumazet authored
    
    
    A typical qdisc setup is the following:
    
    bond0 : bonding device, using HTB hierarchy
    eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
    
    XPS allows spreading packets across specific TX queues, based on the CPU
    doing the send.
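    
    For reference, the pre-patch selection logic looks roughly like the
    sketch below (it follows the kernel's XPS structures and assumes
    CONFIG_XPS, but is a simplified illustration, not the code as merged):
    the xps map is indexed by whichever CPU happens to execute the transmit.
    
    #include <linux/kernel.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <linux/smp.h>
    
    /* Simplified pre-fix XPS queue selection: the xps map is indexed by the
     * CPU currently running the transmit, not the CPU that queued the skb.
     */
    static int xps_pick_queue_sketch(struct net_device *dev, struct sk_buff *skb)
    {
            struct xps_dev_maps *dev_maps;
            struct xps_map *map;
            int queue_index = -1;
    
            rcu_read_lock();
            dev_maps = rcu_dereference(dev->xps_maps);
            if (!dev_maps)
                    goto out;
    
            /* Wrong CPU here means a wrong queue for the whole batch. */
            map = rcu_dereference(dev_maps->cpu_map[raw_smp_processor_id()]);
            if (map) {
                    if (map->len == 1)
                            queue_index = map->queues[0];
                    else
                            queue_index = map->queues[
                                    reciprocal_scale(skb_get_hash(skb), map->len)];
                    if (unlikely(queue_index >= dev->real_num_tx_queues))
                            queue_index = -1;
            }
    out:
            rcu_read_unlock();
            return queue_index;
    }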
    
    The problem is that dequeues from the bond0 qdisc can happen on random
    CPUs, because qdisc_run() can dequeue a batch of packets.
    
    CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
    CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
    
    CPUB -> dequeue packet P1 from bond0
            enqueue packet on eth1/eth2
    CPUC -> dequeue packet P2 from bond0
            enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
    
    get_xps_queue() might then select the wrong queue for P1, since the
    current CPU might be different from CPUA.
    
    P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping)
    if CPUC runs a bit faster (or CPUB spins a bit on the qdisc lock).
    
    The effect of this bug is TCP reordering, and more generally suboptimal
    TX queue placement. (A victim bulk flow can be migrated to the wrong TX
    queue for a while.)
    
    To fix this, we have to record the sender CPU number the first time
    dev_queue_xmit() is called for a given TX skb.
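    
    The stamp itself is tiny; a sketch of the idea is below
    (skb_record_sender_cpu is an illustrative name, the real change lives in
    the TX queue selection path; 0 means "not stamped yet", hence the +1 when
    storing and the matching -1 when get_xps_queue() indexes the xps map):
    
    #include <linux/skbuff.h>
    #include <linux/smp.h>
    
    static inline void skb_record_sender_cpu(struct sk_buff *skb)
    {
    #ifdef CONFIG_XPS
            /* Stamp only once, on the first dev_queue_xmit() for this skb, so
             * later transmits through lower (slave) devices keep using the CPU
             * that originally queued the packet, whatever CPU dequeues it.
             */
            if (skb->sender_cpu == 0)
                    skb->sender_cpu = raw_smp_processor_id() + 1;
    #endif
    }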
    
    We can union napi_id (used on the receive path) with sender_cpu,
    provided we clear sender_cpu in skb_scrub_packet() (credit to Willem for
    this union idea).
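    
    In struct sk_buff this amounts to something along the lines of the sketch
    below, plus a small clear helper called from skb_scrub_packet() (field
    placement and helper sketched for illustration, not copied verbatim from
    the patch; the sender_cpu field is assumed to exist once this change is in):
    
    #include <linux/skbuff.h>
    
    /* Sketch of the relevant sk_buff members: sender_cpu (TX path) shares
     * storage with napi_id (RX path), so struct sk_buff does not grow.
     */
    struct sk_buff_xps_fields_sketch {
    #if defined(CONFIG_NET_RX_BUSY_POLL) || defined(CONFIG_XPS)
            union {
                    unsigned int    napi_id;        /* set on the receive path */
                    unsigned int    sender_cpu;     /* set on first transmit */
            };
    #endif
    };
    
    static inline void skb_sender_cpu_clear_sketch(struct sk_buff *skb)
    {
    #ifdef CONFIG_XPS
            /* Called from skb_scrub_packet(): forget the stamp when the skb
             * crosses a device or namespace boundary, forcing a fresh one.
             */
            skb->sender_cpu = 0;
    #endif
    }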
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Cc: Nandita Dukkipati <nanditad@google.com>
    Cc: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>