From: Eric Ren <zren@suse.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Dead lock and cluster blocked, any advices will be appreciated.
Date: Mon, 9 May 2016 13:39:52 +0800
Message-ID: <573022A8.2020400@suse.com>
In-Reply-To: <71604351584F6A4EBAE558C676F37CA482951330@H3CMLB14-EX.srv.huawei-3com.com>

Hello Zhonghua,

Thanks for reporting this.

On 05/07/2016 07:30 PM, Guozhonghua wrote:
> Hi, we have found a deadlock scenario.
>
> Node 2 was suddenly rebooted (fenced) because of an I/O error while accessing the storage, so its slot 2 remained marked valid on the shared disk.
> Node 1, which is in the same cluster as node 2, then mounted the same disk. At the same time, node 2 restarted and also mounted that disk.

It would be great to have specific steps to reproduce the deadlock.
Do we have any?

Eric

>
> The workflow is as follows:
>
>   Node 1                                      Node 2
>   ocfs2_dlm_init                              ocfs2_dlm_init
>   ocfs2_super_lock                            waiting for ocfs2_super_lock
>   ocfs2_find_slot
>   ocfs2_check_volume
>   ocfs2_mark_dead_nodes
>     ocfs2_slot_to_node_num_locked
>       finds that slot 2 is still valid
>       and puts node 2 into the recovery map
>
>     ocfs2_trylock_journal
>       the trylock on journal:0002 succeeds
>       this time, because node 2 is still
>       waiting for the super lock
>
>       ocfs2_recovery_thread
>         starts recovery for node 2
>   ocfs2_super_unlock
>                                               ocfs2_dlm_init
>                                               ocfs2_super_lock
>                                               ocfs2_find_slot
>                                                 journal:0002 lock granted with slot 2
>                                               ocfs2_super_unlock
>   __ocfs2_recovery_thread
>     ocfs2_super_lock
>     ocfs2_recover_node
>       recovering node 2 needs journal:0002,
>       which is granted to node 2, so node 1
>       keeps waiting for node 2, and node 2
>       never releases journal:0002
>                                               .... ....
>                                               ocfs2_super_lock
>                                                 node 2 now waits for node 1
>                                                 to release the super lock
>
>   So a deadlock occurs.
>
>
>
>
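The cycle in the workflow above can be modelled as a wait-for graph. The sketch below is a toy model, not OCFS2 code: the resource labels ("superblock", "journal:0002"), the node labels and the function name are hypothetical, and it assumes each blocked node waits on exactly one lock.

```python
# Toy wait-for model of the reported deadlock (hypothetical names, not OCFS2 code).
# holders: lock -> node currently holding it
# waits:   node -> lock that node is blocked on


def find_deadlock(holders, waits):
    """Return the nodes forming a wait-for cycle, or [] if there is none."""
    for start in waits:
        path = []
        node = start
        # Follow "node waits for lock, lock is held by node" edges.
        while node in waits and node not in path:
            path.append(node)
            node = holders.get(waits[node])
            if node is None:  # lock is free, so this waiter can make progress
                break
        if node is not None and node in path:
            return path[path.index(node):]
    return []


if __name__ == "__main__":
    # State at the end of the workflow: node 1's recovery thread holds the
    # super lock and waits for journal:0002, which node 2 holds in EX;
    # node 2 in turn waits for the super lock.
    holders = {"superblock": "node1", "journal:0002": "node2"}
    waits = {"node1": "journal:0002", "node2": "superblock"}
    print(find_deadlock(holders, waits))
```

If node 2 did not need the super lock again (drop its entry from `waits`), the function returns an empty list: in this model node 1 is merely waiting, not deadlocked.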
> Stack and lock resource information:
> /dev/dm-1: LABEL="o20160426150630" UUID="83269946-3428-4a04-8d78-1d76053b3f28" TYPE="ocfs2"
>
>     find deadlock on /dev/dm-1
> Lockres: M000000000000000000026a863e451d  Mode: No Lock
> Flags: Initialized Attached Busy
> RO Holders: 0  EX Holders: 0
> Pending Action: Convert  Pending Unlock Action: None
> Requested Mode: Exclusive  Blocking Mode: No Lock
> PR > Gets: 0  Fails: 0    Waits Total: 0us  Max: 0us  Avg: 0ns
> EX > Gets: 1  Fails: 0    Waits Total: 772us  Max: 772us  Avg: 772470ns
> Disk Refreshes: 1
>
>     inode of lock: M000000000000000000026a863e451d is 000000000000026a, file is:
>         618     //journal:0002
>     lock: M000000000000000000026a863e451d on local is:
> Lockres: M000000000000000000026a863e451d   Owner: 1    State: 0x0
> Last Used: 0      ASTs Reserved: 0    Inflight: 0    Migration Pending: No
> Refs: 4    Locks: 2    On Lists: None
> Reference Map: 2
>  Lock-Queue  Node  Level  Conv  Cookie           Refs  AST  BAST  Pending-Action
>  Granted     2     EX     -1    2:18553          2     No   No    None
>  Converting  1     NL     EX    1:15786          2     No   No    None
>
>     Local host is the Owner of M000000000000000000026a863e451d
>
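To read the dump above, here is a small decoding sketch. It assumes the inode lock name layout built by ocfs2_build_lock_name() in fs/ocfs2/dlmglue.c: a one-character lock type, the six-character pad "000000", the block number as 16 hex digits and the generation as 8 hex digits. Under that assumption the name decodes to block 0x26a = 618, matching the journal:0002 inode the script reported. The mode check is an illustrative re-implementation of the usual NL/PR/EX compatibility rules (as in o2dlm's dlm_lock_compatible()); both function names are hypothetical.

```python
# Illustrative helpers for reading an OCFS2 lockres dump (hypothetical names).
# Assumed name layout: <type char><"000000"><16 hex blkno><8 hex generation>.

PAD = "000000"


def decode_lockres_name(name):
    """Split an inode lockres name into (type, block number, generation)."""
    if len(name) != 1 + len(PAD) + 16 + 8 or name[1:7] != PAD:
        raise ValueError("unexpected lockres name layout: %r" % name)
    lock_type = name[0]                # e.g. 'M' for the inode meta lock
    blkno = int(name[7:23], 16)        # inode block number
    generation = int(name[23:31], 16)  # inode generation
    return lock_type, blkno, generation


# Modes as shown in the Lock-Queue: NL (no lock), PR (protected read), EX (exclusive).
def modes_compatible(granted, requested):
    """True if `requested` can be granted while another node holds `granted`."""
    if granted == "NL" or requested == "NL":
        return True                    # NL is compatible with every mode
    if requested == "EX":
        return False                   # EX conflicts with any non-NL holder
    return granted == "PR"             # PR is compatible only with PR (and NL)


if __name__ == "__main__":
    print(decode_lockres_name("M000000000000000000026a863e451d"))
    # Node 2 holds EX; node 1 is converting NL -> EX.
    print(modes_compatible("EX", "EX"))
```

Under these rules node 1's requested EX conflicts with node 2's granted EX, so the Converting entry stays queued until node 2 drops its lock on journal:0002.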
> Node 1
>    ========= find hung_up process ==========
>    16398 D    kworker/u128:0  ocfs2_wait_for_recovery
>    35883 D    ocfs2rec-832699 ocfs2_cluster_lock.isra.37
>    36601 D    df              ocfs2_wait_for_recovery
>    54451 D    kworker/u128:2  chbk_store_chk_proc
>    62872 D    kworker/u128:3  ocfs2_wait_for_recovery
>
>    ========== get stack of 16398 ==========
>
>    [<ffffffffc06367a5>] ocfs2_wait_for_recovery+0x75/0xc0 [ocfs2]
>    [<ffffffffc0621d68>] ocfs2_inode_lock_full_nested+0x318/0xc50 [ocfs2]
>    [<ffffffffc063b210>] ocfs2_complete_local_alloc_recovery+0x70/0x3f0 [ocfs2]
>    [<ffffffffc063698e>] ocfs2_complete_recovery+0x19e/0xfa0 [ocfs2]
>    [<ffffffff81096e64>] process_one_work+0x144/0x4c0
>    [<ffffffff810978fd>] worker_thread+0x11d/0x540
>    [<ffffffff8109def9>] kthread+0xc9/0xe0
>    [<ffffffff817f6a22>] ret_from_fork+0x42/0x70
>    [<ffffffffffffffff>] 0xffffffffffffffff
>
>    ========== get stack of 35883 ==========
>
>    [<ffffffffc0620260>] __ocfs2_cluster_lock.isra.37+0x2b0/0x9f0 [ocfs2]
>    [<ffffffffc0621c4d>] ocfs2_inode_lock_full_nested+0x1fd/0xc50 [ocfs2]
>    [<ffffffffc0638b72>] __ocfs2_recovery_thread+0x6f2/0x14d0 [ocfs2]
>    [<ffffffff8109def9>] kthread+0xc9/0xe0
>    [<ffffffff817f6a22>] ret_from_fork+0x42/0x70
>    [<ffffffffffffffff>] 0xffffffffffffffff
>
>    ========== get stack of 36601 ==========
>    df -BM -TP
>    [<ffffffffc06367a5>] ocfs2_wait_for_recovery+0x75/0xc0 [ocfs2]
>    [<ffffffffc0621d68>] ocfs2_inode_lock_full_nested+0x318/0xc50 [ocfs2]
>    [<ffffffffc066a1e1>] ocfs2_statfs+0x81/0x400 [ocfs2]
>    [<ffffffff81235969>] statfs_by_dentry+0x99/0x140
>    [<ffffffff81235a2b>] vfs_statfs+0x1b/0xa0
>    [<ffffffff81235af5>] user_statfs+0x45/0x80
>    [<ffffffff81235bab>] SYSC_statfs+0x1b/0x40
>    [<ffffffff81235cee>] SyS_statfs+0xe/0x10
>    [<ffffffff817f65f2>] system_call_fastpath+0x16/0x75
>    [<ffffffffffffffff>] 0xffffffffffffffff
>
>
> Thanks
>
> Guozhonghua
>


Thread overview: 2+ messages
2016-05-07 11:30 [Ocfs2-devel] Dead lock and cluster blocked, any advices will be appreciated Guozhonghua
2016-05-09  5:39 ` Eric Ren [this message]
