Commit 1c019671 authored by Changwei Ge's avatar Changwei Ge Committed by Linus Torvalds
Browse files

ocfs2: fix cluster hang after a node dies

When a node dies, other live nodes have to choose a new master for an
existed lock resource mastered by the dead node.

As for ocfs2/dlm implementation, this is done by function -
dlm_move_lockres_to_recovery_list which marks those lock rsources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
changes lock resource's master later.

So without invoking dlm_move_lockres_to_recovery_list, no master will be
choosed after dlm recovery accomplishment since no lock resource can be
found through ::resource list.

What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
resources mastered a dead node, it will break up synchronization among

So invoke dlm_move_lockres_to_recovery_list again.

Fixs: 'commit ee8f7fcb ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'

Signed-off-by: default avatarChangwei Ge <>
Reported-by: default avatarVitaly Mayatskih <>
Tested-by: default avatarVitaly Mayatskikh <>
Cc: Mark Fasheh <>
Cc: Joel Becker <>
Cc: Junxiao Bi <>
Cc: Joseph Qi <>
Cc: <>
Signed-off-by: default avatarAndrew Morton <>
Signed-off-by: default avatarLinus Torvalds <>
parent 98d6c09e
......@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
dlm_move_lockres_to_recovery_list(dlm, res);
} else if (res->owner == dlm->node_num) {
dlm_free_dead_locks(dlm, res, dead_node);
__dlm_lockres_calc_usage(dlm, res);
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment