ocfs2-devel.oss.oracle.com archive mirror
From: Eric Ren <zren@suse.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Deadlock and cluster blocked, any advice will be appreciated.
Date: Mon, 9 May 2016 13:39:52 +0800	[thread overview]
Message-ID: <573022A8.2020400@suse.com> (raw)
In-Reply-To: <71604351584F6A4EBAE558C676F37CA482951330@H3CMLB14-EX.srv.huawei-3com.com>

Hello Zhonghua,

Thanks for reporting this.

On 05/07/2016 07:30 PM, Guozhonghua wrote:
> Hi, we have found a deadlock scenario.
>
> Node 2 was suddenly rebooted (fenced) due to an I/O error while accessing storage, so its slot 2 remained marked valid on disk.
> Node 1, which is in the same cluster as node 2, then went to mount the same disk. At the same time, node 2 restarted and mounted the same disk as well.

It'd be great if we had some specific steps to reproduce the deadlock.
Do we?

Eric

>
> So the workflow is as follows:
>
>   Node 1                                  Node 2
>   ocfs2_dlm_init                          ocfs2_dlm_init
>   ocfs2_super_lock                        (waiting on ocfs2_super_lock)
>   ocfs2_find_slot
>   ocfs2_check_volume
>   ocfs2_mark_dead_nodes
>     ocfs2_slot_to_node_num_locked
>       finds slot 2 still valid and
>       puts node 2 into the recovery map
>     ocfs2_trylock_journal
>       the trylock on journal:0002
>       succeeds, because node 2 is
>       still waiting on the super lock
>     ocfs2_recovery_thread
>       starts recovery for node 2
>   ocfs2_super_unlock
>                                           ocfs2_super_lock (granted)
>                                           ocfs2_find_slot
>                                             takes the journal:0002 lock
>                                             for slot 2
>                                           ocfs2_super_unlock
>   __ocfs2_recovery_thread
>     ocfs2_super_lock
>     ocfs2_recover_node
>       recovering node 2: waits for the
>       journal:0002 lock, which node 2
>       now holds and will never release
>                                           ... ...
>                                           ocfs2_super_lock
>                                             node 2 now waits for node 1
>                                             to release the super lock
>
> So a deadlock occurs: node 1 holds the super lock and waits for
> journal:0002, while node 2 holds journal:0002 and waits for the
> super lock.
>
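
Stripped of the ocfs2 specifics, this is a classic ABBA inversion between
the super lock and the journal:0002 lock. Here is a minimal userspace
sketch of the same pattern (plain pthreads, purely illustrative - not
ocfs2 code; the sleep() calls just force the interleaving shown above):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Stand-ins for the two cluster locks in the trace above. */
    static pthread_mutex_t super_lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t journal_0002 = PTHREAD_MUTEX_INITIALIZER;

    /* "Node 2" mounting: takes super lock -> journal:0002, keeps the
     * journal lock held (a mounted node holds its own journal lock),
     * and later needs the super lock again. */
    static void *node2_mount(void *arg)
    {
        pthread_mutex_lock(&super_lock);    /* ocfs2_super_lock (granted) */
        pthread_mutex_lock(&journal_0002);  /* ocfs2_find_slot takes slot 2's journal lock */
        pthread_mutex_unlock(&super_lock);  /* ocfs2_super_unlock */

        sleep(2);                           /* journal:0002 stays held by the mount */
        for (int i = 0; i < 3; i++) {       /* node 2 wants the super lock again */
            if (pthread_mutex_trylock(&super_lock) == 0) {
                pthread_mutex_unlock(&super_lock);
                break;
            }
            printf("node 2: holds journal:0002, waiting for super lock\n");
            sleep(1);
        }
        pthread_mutex_unlock(&journal_0002);
        return NULL;
    }

    /* "Node 1" recovering: takes the same two locks in the opposite
     * order, super lock -> journal:0002. */
    static void *node1_recovery(void *arg)
    {
        sleep(1);                           /* let node 2's mount get ahead */
        pthread_mutex_lock(&super_lock);    /* __ocfs2_recovery_thread */
        for (int i = 0; i < 3; i++) {       /* ocfs2_recover_node needs journal:0002 */
            if (pthread_mutex_trylock(&journal_0002) == 0) {
                pthread_mutex_unlock(&journal_0002);
                break;
            }
            printf("node 1: holds super lock, waiting for journal:0002\n");
            sleep(1);
        }
        pthread_mutex_unlock(&super_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t n1, n2;
        pthread_create(&n2, NULL, node2_mount, NULL);
        pthread_create(&n1, NULL, node1_recovery, NULL);
        pthread_join(n1, NULL);
        pthread_join(n2, NULL);
        return 0;
    }

The trylock loops only exist so the demo terminates; in the kernel both
sides block indefinitely, which is why the whole cluster hangs.
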
> Stacks and lockres info:
>
>     /dev/dm-1: LABEL="o20160426150630" UUID="83269946-3428-4a04-8d78-1d76053b3f28" TYPE="ocfs2"
>
>     found deadlock on /dev/dm-1
>     Lockres: M000000000000000000026a863e451d  Mode: No Lock
>     Flags: Initialized Attached Busy
>     RO Holders: 0  EX Holders: 0
>     Pending Action: Convert  Pending Unlock Action: None
>     Requested Mode: Exclusive  Blocking Mode: No Lock
>     PR > Gets: 0  Fails: 0    Waits Total: 0us  Max: 0us  Avg: 0ns
>     EX > Gets: 1  Fails: 0    Waits Total: 772us  Max: 772us  Avg: 772470ns
>     Disk Refreshes: 1
>
>     The inode behind lock M000000000000000000026a863e451d is 000000000000026a; the file is:
>         618     //journal:0002
>     Lock M000000000000000000026a863e451d on the local node:
>     Lockres: M000000000000000000026a863e451d   Owner: 1    State: 0x0
>     Last Used: 0      ASTs Reserved: 0    Inflight: 0    Migration Pending: No
>     Refs: 4    Locks: 2    On Lists: None
>     Reference Map: 2
>      Lock-Queue  Node  Level  Conv  Cookie           Refs  AST  BAST  Pending-Action
>      Granted     2     EX     -1    2:18553          2     No   No    None
>      Converting  1     NL     EX    1:15786          2     No   No    None
>
>     The local host is the owner of M000000000000000000026a863e451d.
>
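
As an aside, the lockres name itself already tells you which inode is
involved. If I read fs/ocfs2/dlmglue.c correctly, ocfs2_build_lock_name()
emits one type character, a 6-character pad, 16 hex digits of block
number, and 8 hex digits of generation; a quick standalone decoder
(the hard-coded name is just the one from the dump above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Decode an ocfs2 meta lock name: <type char><6-char pad>
     * <16 hex blkno><8 hex generation>. */
    int main(void)
    {
        const char *name = "M000000000000000000026a863e451d";
        char blkno_hex[17] = { 0 }, gen_hex[9] = { 0 };

        if (strlen(name) != 31) {
            fprintf(stderr, "unexpected lock name length\n");
            return 1;
        }
        memcpy(blkno_hex, name + 7, 16);   /* skip type char + pad */
        memcpy(gen_hex, name + 23, 8);

        printf("type:       %c (Meta)\n", name[0]);
        printf("blkno:      %llu (0x%llx)\n",
               strtoull(blkno_hex, NULL, 16),
               strtoull(blkno_hex, NULL, 16));
        printf("generation: 0x%lx\n", strtoul(gen_hex, NULL, 16));
        return 0;
    }

This prints blkno 618 (0x26a), matching the //journal:0002 inode listed
in the dump.
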
> Node 1
>   ========== hung (D-state) processes ==========
>   16398 D    kworker/u128:0  ocfs2_wait_for_recovery
>   35883 D    ocfs2rec-832699 ocfs2_cluster_lock.isra.37
>   36601 D    df              ocfs2_wait_for_recovery
>   54451 D    kworker/u128:2  chbk_store_chk_proc
>   62872 D    kworker/u128:3  ocfs2_wait_for_recovery
>
>   ========== stack of 16398 ==========
>   [<ffffffffc06367a5>] ocfs2_wait_for_recovery+0x75/0xc0 [ocfs2]
>   [<ffffffffc0621d68>] ocfs2_inode_lock_full_nested+0x318/0xc50 [ocfs2]
>   [<ffffffffc063b210>] ocfs2_complete_local_alloc_recovery+0x70/0x3f0 [ocfs2]
>   [<ffffffffc063698e>] ocfs2_complete_recovery+0x19e/0xfa0 [ocfs2]
>   [<ffffffff81096e64>] process_one_work+0x144/0x4c0
>   [<ffffffff810978fd>] worker_thread+0x11d/0x540
>   [<ffffffff8109def9>] kthread+0xc9/0xe0
>   [<ffffffff817f6a22>] ret_from_fork+0x42/0x70
>   [<ffffffffffffffff>] 0xffffffffffffffff
>
>   ========== stack of 35883 (ocfs2rec) ==========
>   [<ffffffffc0620260>] __ocfs2_cluster_lock.isra.37+0x2b0/0x9f0 [ocfs2]
>   [<ffffffffc0621c4d>] ocfs2_inode_lock_full_nested+0x1fd/0xc50 [ocfs2]
>   [<ffffffffc0638b72>] __ocfs2_recovery_thread+0x6f2/0x14d0 [ocfs2]
>   [<ffffffff8109def9>] kthread+0xc9/0xe0
>   [<ffffffff817f6a22>] ret_from_fork+0x42/0x70
>   [<ffffffffffffffff>] 0xffffffffffffffff
>
>   ========== stack of 36601 (df) ==========
>   [<ffffffffc06367a5>] ocfs2_wait_for_recovery+0x75/0xc0 [ocfs2]
>   [<ffffffffc0621d68>] ocfs2_inode_lock_full_nested+0x318/0xc50 [ocfs2]
>   [<ffffffffc066a1e1>] ocfs2_statfs+0x81/0x400 [ocfs2]
>   [<ffffffff81235969>] statfs_by_dentry+0x99/0x140
>   [<ffffffff81235a2b>] vfs_statfs+0x1b/0xa0
>   [<ffffffff81235af5>] user_statfs+0x45/0x80
>   [<ffffffff81235bab>] SYSC_statfs+0x1b/0x40
>   [<ffffffff81235cee>] SyS_statfs+0xe/0x10
>   [<ffffffff817f65f2>] system_call_fastpath+0x16/0x75
>   [<ffffffffffffffff>] 0xffffffffffffffff
>
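
All three blocked stacks funnel into the same place: callers park in
ocfs2_wait_for_recovery() until the recovery map drains. The shape of
that wait is roughly the pattern below (a simplified userspace sketch
with illustrative names, not the actual fs/ocfs2/journal.c code).
Because the recovery thread is itself stuck waiting for journal:0002,
the event is never signalled, so every lock taker - including df's
statfs - sits in D state forever:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  recovery_event = PTHREAD_COND_INITIALIZER;
    static bool recovery_running = true;  /* "node 2 is in the recovery map" */

    /* Every cluster-lock taker checks this first (cf. the stacks above). */
    static void wait_for_recovery(void)
    {
        pthread_mutex_lock(&map_lock);
        while (recovery_running)          /* never clears if recovery deadlocks */
            pthread_cond_wait(&recovery_event, &map_lock);
        pthread_mutex_unlock(&map_lock);
    }

    /* What the recovery thread would do on success; in the reported
     * scenario it never gets here, so waiters sleep forever. */
    static void recovery_done(void)
    {
        pthread_mutex_lock(&map_lock);
        recovery_running = false;
        pthread_cond_broadcast(&recovery_event);
        pthread_mutex_unlock(&map_lock);
    }

    static void *statfs_caller(void *arg)
    {
        printf("df: waiting for recovery\n");
        wait_for_recovery();              /* D state until recovery_done() */
        printf("df: proceeding with statfs\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, statfs_caller, NULL);
        sleep(1);
        recovery_done();                  /* comment this out to reproduce the hang */
        pthread_join(t, NULL);
        return 0;
    }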
>
> Thanks
>
> Guozhonghua
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from H3C, which is
> intended only for the person or entity whose address is listed above. Any use of the
> information contained herein in any way (including, but not limited to, total or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>


Thread overview: 2+ messages
2016-05-07 11:30 [Ocfs2-devel] Deadlock and cluster blocked, any advice will be appreciated Guozhonghua
2016-05-09  5:39 ` Eric Ren [this message]
