    iscsi-target: Fix iscsi_np reset hung task during parallel delete · 978d13d6
    Nicholas Bellinger authored
    
    
    This patch fixes a bug in iscsit_reset_np_thread() that can occur
    during parallel configfs rmdir of a single iscsi_np shared across
    multiple iscsi-target instances. The result is hung task(s) similar
    to the trace below, where the configfs rmdir process context blocks
    indefinitely waiting for iscsi_np->np_restart_comp to complete:
    
    [ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds.
    [ 6726.119440]       Tainted: G        W  O     4.1.26-3321 #2
    [ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 6726.132927] dcp_proxy_node_ D ffff8803f202bc88     0 15550      1 0x00000000
    [ 6726.140058]  ffff8803f202bc88 ffff88085c64d960 ffff88083b3b1ad0 ffff88087fffeb08
    [ 6726.147593]  ffff8803f202c000 7fffffffffffffff ffff88083f459c28 ffff88083b3b1ad0
    [ 6726.155132]  ffff88035373c100 ffff8803f202bca8 ffffffff8168ced2 ffff8803f202bcb8
    [ 6726.162667] Call Trace:
    [ 6726.165150]  [<ffffffff8168ced2>] schedule+0x32/0x80
    [ 6726.170156]  [<ffffffff8168f5b4>] schedule_timeout+0x214/0x290
    [ 6726.176030]  [<ffffffff810caef2>] ? __send_signal+0x52/0x4a0
    [ 6726.181728]  [<ffffffff8168d7d6>] wait_for_completion+0x96/0x100
    [ 6726.187774]  [<ffffffff810e7c80>] ? wake_up_state+0x10/0x10
    [ 6726.193395]  [<ffffffffa035d6e2>] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod]
    [ 6726.201278]  [<ffffffffa0355d86>] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod]
    [ 6726.210033]  [<ffffffffa0363f7f>] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod]
    [ 6726.218351]  [<ffffffff81260c5a>] configfs_write_file+0xaa/0x110
    [ 6726.224392]  [<ffffffff811ea364>] vfs_write+0xa4/0x1b0
    [ 6726.229576]  [<ffffffff811eb111>] SyS_write+0x41/0xb0
    [ 6726.234659]  [<ffffffff8169042e>] system_call_fastpath+0x12/0x71
    
    This happens because each iscsit_reset_np_thread() caller sets the
    thread state to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks
    waiting for completion of iscsi_np->np_restart_comp.
    
    However, if iscsi_np was actively processing a login request and
    more than one iscsit_reset_np_thread() caller for the same iscsi_np
    was blocked on iscsi_np->np_restart_comp, the iscsi_np kthread
    process context in __iscsi_target_login_thread() would flush pending
    signals and perform only a single completion of np->np_restart_comp
    before going back to sleep within transport specific
    iscsit_transport->iscsi_accept_np code, leaving the remaining
    callers blocked.
    
    To address this bug, add an iscsi_np->np_reset_count counter and
    update __iscsi_target_login_thread() to keep completing
    np->np_restart_comp until ->np_reset_count has reached zero.
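
The counting idea can be sketched in plain userspace C. This is only a
single-threaded model written for illustration, not the kernel change
itself: struct np_model, request_reset(), complete_one() and the two
blocked_after_*() helpers are hypothetical names, and a completion is
reduced to a count of blocked callers.

```c
#include <assert.h>

/* Hypothetical userspace model; NOT the kernel's struct iscsi_np. */
struct np_model {
    int reset_count; /* stands in for iscsi_np->np_reset_count */
    int blocked;     /* callers waiting on np_restart_comp */
};

/* Models one iscsit_reset_np_thread() caller: record the reset
 * request, then block on the completion. */
void request_reset(struct np_model *np)
{
    np->reset_count++;
    np->blocked++;
}

/* Models a single completion: wakes exactly one waiter, if any.
 * (Simplification: a completion with no waiter is just dropped.) */
void complete_one(struct np_model *np)
{
    if (np->blocked > 0)
        np->blocked--;
}

/* Old behavior as described above: all pending signals are flushed
 * and only one completion is performed.
 * Returns the number of callers still blocked. */
int blocked_after_single_completion(int callers)
{
    struct np_model np = { 0, 0 };

    assert(callers >= 0);
    for (int i = 0; i < callers; i++)
        request_reset(&np);

    complete_one(&np);
    return np.blocked;
}

/* Fixed behavior as described above: keep completing until the
 * reset counter has reached zero.
 * Returns the number of callers still blocked. */
int blocked_after_counted_completion(int callers)
{
    struct np_model np = { 0, 0 };

    assert(callers >= 0);
    for (int i = 0; i < callers; i++)
        request_reset(&np);

    while (np.reset_count > 0) {
        np.reset_count--;
        complete_one(&np);
    }
    return np.blocked;
}
```

In this model, three parallel callers leave two of them blocked with a
single completion, while the counted loop releases all three.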
    
    Reported-by: Gary Guo <ghg@datera.io>
    Tested-by: Gary Guo <ghg@datera.io>
    Cc: Mike Christie <mchristi@redhat.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>