    bonding: eliminate bond_close race conditions · e6d265e8
    Jay Vosburgh authored
    
    
    This patch resolves two sets of race conditions.
    
    	Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the
    first, as follows:
    
    bond_close() calls cancel_delayed_work() to cancel the delayed work items.
    However, cancel_delayed_work() cannot cancel work that is already queued
    on the workqueue. bond_open() initializes work->data, and
    process_one_work() dereferences get_work_cwq(work)->wq->flags. Because
    get_work_cwq() returns NULL once work->data has been initialized, a panic
    occurs.
    
    	He included a patch that converted the cancel_delayed_work calls
    in bond_close to flush_delayed_work_sync, which eliminated the above
    problem.
    
    	His patch is incorporated, at least in principle, into this
    patch.  In this patch, we use cancel_delayed_work_sync in place of
    flush_delayed_work_sync, and also convert bond_uninit in addition to
    bond_close.
    
    	This conversion to _sync, however, opens new races between
    bond_close and three periodically executing workqueue functions:
    bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon.
    
    	The race occurs because bond_close and bond_uninit are always
    called with RTNL held, and these workqueue functions may acquire RTNL to
    perform failover-related activities.  If bond_close or bond_uninit is
    waiting in cancel_delayed_work_sync for one of these functions while that
    function is itself waiting for RTNL, neither can make progress, and the
    result is a deadlock.
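    	The circular wait can be reproduced in userspace.  The sketch
    below is a hypothetical model, not kernel code: a pthread mutex
    (rtnl_model) stands in for RTNL, pthread_join() plays the role of
    cancel_delayed_work_sync() waiting for a running work item, and
    monitor_worker() plays a monitor that insists on taking RTNL.  The
    worker uses pthread_mutex_timedlock() with a 200 ms deadline only so
    the demo terminates; rtnl_lock() has no timeout, so the same pattern
    in the kernel would hang forever.  Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for RTNL (not the kernel lock). */
static pthread_mutex_t rtnl_model = PTHREAD_MUTEX_INITIALIZER;

/* Models a monitor work function that must take RTNL to do failover.
 * A timed lock replaces a blocking lock so the demo cannot hang. */
static void *monitor_worker(void *arg)
{
    int *result = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000 * 1000;          /* +200 ms */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    *result = pthread_mutex_timedlock(&rtnl_model, &deadline);
    if (*result == 0)
        pthread_mutex_unlock(&rtnl_model);
    return NULL;
}

/* Models bond_close(): hold "RTNL", then wait synchronously for the
 * worker to finish. Returns the worker's lock result; ETIMEDOUT means
 * the worker could never get the lock, i.e. a real deadlock. */
static int close_while_holding_rtnl(void)
{
    pthread_t worker;
    int result = -1;

    pthread_mutex_lock(&rtnl_model);
    pthread_create(&worker, NULL, monitor_worker, &result);
    pthread_join(worker, NULL);   /* waits on a thread that waits on us */
    pthread_mutex_unlock(&rtnl_model);
    return result;
}
```

    	close_while_holding_rtnl() returns ETIMEDOUT: the worker gave up
    only because of the artificial deadline.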
    
    	These deadlocks are resolved by having the workqueue functions
    acquire RTNL conditionally.  If rtnl_trylock() fails, the functions
    reschedule themselves and return immediately.  For the cases that are
    attempting to perform link failover, a delay of 1 jiffy is used; for the
    other cases, the normal interval is used (as those activities are not as
    time critical).
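    	The conditional-locking pattern can be sketched in userspace as
    follows.  All names here (rtnl_stub, rtnl_trylock_stub, monitor_pass,
    MONITOR_INTERVAL, FAILOVER_RETRY_DELAY) are hypothetical: an atomic_flag
    stands in for RTNL, and delays are plain integers standing in for
    jiffies.  monitor_pass() models one run of a monitor and returns the
    delay it would pass when requeueing itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL, supporting only trylock/unlock. */
static atomic_flag rtnl_stub = ATOMIC_FLAG_INIT;

static bool rtnl_trylock_stub(void)
{
    /* test_and_set returns the previous value; false means we took it. */
    return !atomic_flag_test_and_set(&rtnl_stub);
}

static void rtnl_unlock_stub(void)
{
    atomic_flag_clear(&rtnl_stub);
}

/* Hypothetical delays, in "jiffies". */
enum { MONITOR_INTERVAL = 100, FAILOVER_RETRY_DELAY = 1 };

/* One monitor pass. needs_rtnl: this pass has work requiring RTNL.
 * is_failover: that work is a link failover (time critical).
 * Returns the delay before the monitor should run again. */
static int monitor_pass(bool needs_rtnl, bool is_failover, int *rtnl_work_done)
{
    int delay = MONITOR_INTERVAL;   /* delay kept in a local variable */

    if (!needs_rtnl)
        return delay;

    if (!rtnl_trylock_stub()) {
        /* RTNL is held (e.g. by the close path): never block here.
         * Retry failover almost immediately; other work can wait. */
        return is_failover ? FAILOVER_RETRY_DELAY : delay;
    }

    (*rtnl_work_done)++;            /* work done under RTNL */
    rtnl_unlock_stub();
    return delay;
}
```

    	Because the monitor never blocks on RTNL, a close path that holds
    RTNL while waiting for the monitor to finish can no longer deadlock.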
    
    	Additionally, the bond_mii_monitor function now stores the delay
    in a variable (mimicking the structure of bond_activebackup_arp_mon).
    
    	Lastly, all of the above renders the kill_timers sentinel moot,
    and therefore it has been removed.
    
    Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
    Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>