From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] hang with fsdlm
Date: Wed, 15 Apr 2009 19:07:19 -0700
Message-ID: <49E692D7.5030009@oracle.com>
In-Reply-To: <20090415165717.GD22705@redhat.com>

David,

So it's the same issue. bull-02 knows it is blocking an EX, has no
holders, but is not doing anything to downconvert.
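
To make that condition concrete, here is a minimal sketch of how the fs_locks
output quoted below could be checked for it. The function names and the rule
itself are assumptions made for illustration, not anything in ocfs2 or
ocfs2-tools: a lockres is flagged when it carries the Blocked flag, has zero
RO and EX holders, has no pending action, and reports a blocking mode other
than "No Lock".

```python
import re

def parse_lockres(dump):
    """Collect 'Key: value' pairs from a lockres dump into a dict.

    Fields on one line are separated by two or more spaces; a leading
    '> ' mail-quote prefix is tolerated.
    """
    fields = {}
    for line in dump.splitlines():
        line = line.lstrip("> ").strip()
        for chunk in re.split(r"\s{2,}", line):
            key, sep, value = chunk.partition(": ")
            if sep:
                fields[key.strip()] = value.strip()
    return fields

def looks_stuck(fields):
    """Hypothetical heuristic: blocking another node, idle, yet not downconverting."""
    return (
        "Blocked" in fields.get("Flags", "").split()
        and fields.get("RO Holders") == "0"
        and fields.get("EX Holders") == "0"
        and fields.get("Pending Action") == "None"
        and fields.get("Blocking Mode", "No Lock") != "No Lock"
    )

# Lockres fields copied from the bull-02 and bull-01 dumps in this thread.
BULL_02 = """\
Lockres: M0000000000000000046e2a00000000  Mode: Protected Read
Flags: Initialized Attached Blocked Needs Refresh Queued
RO Holders: 0  EX Holders: 0
Pending Action: None  Pending Unlock Action: None
Requested Mode: Protected Read  Blocking Mode: Exclusive
"""

BULL_01 = """\
Lockres: M0000000000000000046e2a00000000  Mode: Protected Read
Flags: Initialized Attached Busy
RO Holders: 0  EX Holders: 0
Pending Action: Convert  Pending Unlock Action: None
Requested Mode: Exclusive  Blocking Mode: No Lock
"""

stuck_02 = looks_stuck(parse_lockres(BULL_02))
stuck_01 = looks_stuck(parse_lockres(BULL_01))
```

With these inputs the check flags bull-02 and not bull-01, which is waiting
on its own convert rather than blocking anyone.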

Coly Li is working on this same issue on novellbz#492055.
https://bugzilla.novell.com/show_bug.cgi?id=492055

He is unable to reproduce with the sles11 kernel. The dlm bits should
be the same as mainline, I think. I've asked him to run with mainline.

Do you have a novell bz account?

Sunil

David Teigland wrote:
> Using fsdlm/ocfs2_controld.cman, I've rerun the test I've been having problems
> with on 2.6.30-rc1.  After running for several minutes in the same directory
> on three nodes, the test hangs, and I collect the following information:
>
>
> bull-01
> -------
>
>  3053 S<   [ocfs2dc]                   ocfs2_downconvert_thread
>  3054 S<   [dlm_astd]                  dlm_astd
>  3055 S<   [dlm_scand]                 dlm_scand
>  3056 S<   [dlm_recv/0]                worker_thread
>  3057 S<   [dlm_recv/1]                worker_thread
>  3058 S<   [dlm_recv/2]                worker_thread
>  3059 S<   [dlm_recv/3]                worker_thread
>  3060 S<   [dlm_send]                  worker_thread
>  3061 S<   [dlm_recoverd]              dlm_recoverd
>  3067 S<   [kjournald2]                kjournald2
>  3068 S<   [ocfs2cmt]                  ocfs2_commit_thread
>  3082 D+   /usr/bin/perl /root/make_pa ocfs2_cluster_lock
>
> make_panic    D 0000000000000002     0  3082   3021
>  ffff8801360f1a38 0000000000000046 ffffffffa0240b08 ffff88013703ea40
>  ffff88013eb877f8 0000000000004000 00000000001d0a80 000000000000cb10
>  0000000000000000 ffffffffa0240af0 ffff88013703ea40 ffff88007ffee340
> Call Trace:
>  [<ffffffffa022bdbf>] ? dlm_put_lockspace+0x18/0x2b [dlm]
>  [<ffffffff804f13e4>] schedule+0x9/0x1d
>  [<ffffffff804f171d>] schedule_timeout+0x24/0x15e
>  [<ffffffff804f1559>] ? wait_for_common+0x3c/0x123
>  [<ffffffff804f15ec>] wait_for_common+0xcf/0x123
>  [<ffffffff8023616f>] ? default_wake_function+0x0/0xf
>  [<ffffffff804f16ca>] wait_for_completion+0x18/0x1a
>  [<ffffffffa02d3755>] ocfs2_cluster_lock+0x91f/0x93e [ocfs2]
>  [<ffffffff804f387f>] ? _spin_unlock+0x26/0x2a
>  [<ffffffffa02e59fb>] ? ocfs2_recovery_completed+0x34/0x3c [ocfs2]
>  [<ffffffffa02d750c>] ocfs2_inode_lock_full+0x180/0xccb [ocfs2]
>  [<ffffffff8025d97f>] ? print_lock_contention_bug+0x1e/0x110
>  [<ffffffffa02f0210>] ocfs2_unlink+0x1a3/0xa6e [ocfs2]
>  [<ffffffff802c3686>] ? vfs_unlink+0x95/0x110
>  [<ffffffff802c36af>] vfs_unlink+0xbe/0x110
>  [<ffffffff802c5d46>] do_unlinkat+0xcd/0x15d
>  [<ffffffff802be708>] ? sys_newlstat+0x31/0x3c
>  [<ffffffff804f34fc>] ? lockdep_sys_exit_thunk+0x35/0x67
>  [<ffffffff802c5de7>] sys_unlink+0x11/0x13
>  [<ffffffff8020baeb>] system_call_fastpath+0x16/0x1b
>
>     Device => Id: 253,2  Uuid: 8A77044F9D98416FAB240B44B1843EDA  Gen: 0x7622CA50  Label: 
>     Volume => State: 2  Flags: 0x0
>      Sizes => Block: 4096  Cluster: 4096
>   Features => Compat: 0x3  Incompat: 0x1D0  ROcompat: 0x1
>      Mount => Opts: 0x0  AtimeQuanta: 60
>    Cluster => Stack: cman  Name: 8A77044F9D98416FAB240B44B1843EDA  Version: 1.0
>   DownCnvt => Pid: 3053  Count: 0  WakeSeq: 55892  WorkSeq: 55892
>   Recovery => Pid: -1  Nodes: None
>     Commit => Pid: 3068  Interval: 0  Needs: 1
>    Journal => State: 1  NumTxns: 2  TxnId: 1136
>      Stats => GlobalAllocs: 3  LocalAllocs: 915  SubAllocs: 2938  LAWinMoves: 1  SAExtends: 2
> LocalAlloc => State: 1  Descriptor: 2032128  Size: 2048 bits  Default: 2048 bits
>      Steal => Slot: -1  NumStolen: 0
>      Slots => Num     RecoGen
>             *   0           3
>                 1           3
>                 2           3
>                 3           3
>
> Lockres: M0000000000000000046e2a00000000  Mode: Protected Read
> Flags: Initialized Attached Busy
> RO Holders: 0  EX Holders: 0
> Pending Action: Convert  Pending Unlock Action: None
> Requested Mode: Exclusive  Blocking Mode: No Lock
> PR > Gets: 982  Fails: 0    Waits (usec) Total: 637775  Max: 79972
> EX > Gets: 69  Fails: 0    Waits (usec) Total: 723745  Max: 40985
> Disk Refreshes: 0
>
> Resource len 31  "M0000000000000000046e2a00000000"
> Master           
> LVB len 64 seq 123
> 05 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 
> 12 75 a1 17 e0 68 d0 10 12 79 39 dd b8 79 ee b4 
> 12 79 39 dd b8 79 ee b4 00 00 00 00 00 00 20 00 
> 41 ed 00 02 00 00 00 00 82 92 d2 7a 00 00 00 00
> Granted
> 01ce0003 PR      Remote:   4 03d80001          
> 014a0002 PR      Remote:   2 00d70001          
> Convert
> 038f0001 PR (EX)                               
>
>
> bull-02
> -------
>
>  2759 S<   [ocfs2dc]                   ocfs2_downconvert_thread
>  2760 S<   [dlm_astd]                  dlm_astd
>  2761 S<   [dlm_scand]                 dlm_scand
>  2762 S<   [dlm_recv/0]                worker_thread
>  2763 S<   [dlm_recv/1]                worker_thread
>  2764 S<   [dlm_recv/2]                worker_thread
>  2765 S<   [dlm_recv/3]                worker_thread
>  2766 S<   [dlm_send]                  worker_thread
>  2767 S<   [dlm_recoverd]              dlm_recoverd
>  2779 S<   [kjournald2]                kjournald2
>  2780 S<   [ocfs2cmt]                  ocfs2_commit_thread
>  2783 D+   /usr/bin/perl /root/make_pa ocfs2_cluster_lock
>
> make_panic    D 0000000000000002     0  2783   2727
>  ffff88013f497958 0000000000000046 ffffffffa024ab08 ffff88013ed6e540
>  ffff88007e9557f8 0000000000004000 00000000001d0a80 000000000000cb10
>  0000000000000000 ffffffffa024aaf0 ffff88013ed6e540 ffff88007fffe440
> Call Trace:
>  [<ffffffffa0235dbf>] ? dlm_put_lockspace+0x18/0x2b [dlm]
>  [<ffffffff804f13e4>] schedule+0x9/0x1d
>  [<ffffffff804f171d>] schedule_timeout+0x24/0x15e
>  [<ffffffff804f1559>] ? wait_for_common+0x3c/0x123
>  [<ffffffff804f15ec>] wait_for_common+0xcf/0x123
>  [<ffffffff8023616f>] ? default_wake_function+0x0/0xf
>  [<ffffffff804f16ca>] wait_for_completion+0x18/0x1a
>  [<ffffffffa02dd755>] ocfs2_cluster_lock+0x91f/0x93e [ocfs2]
>  [<ffffffff8025d97f>] ? print_lock_contention_bug+0x1e/0x110
>  [<ffffffff804f387f>] ? _spin_unlock+0x26/0x2a
>  [<ffffffffa02ef9fb>] ? ocfs2_recovery_completed+0x34/0x3c [ocfs2]
>  [<ffffffffa02e150c>] ocfs2_inode_lock_full+0x180/0xccb [ocfs2]
>  [<ffffffffa02d0198>] ? ocfs2_dentry_revalidate+0x194/0x268 [ocfs2]
>  [<ffffffffa02e7b97>] ocfs2_permission+0x70/0x167 [ocfs2]
>  [<ffffffff802c285f>] inode_permission+0x69/0x97
>  [<ffffffff802c4752>] __link_path_walk+0x134/0xdea
>  [<ffffffff802c560e>] path_walk+0x69/0xd4
>  [<ffffffff802c58a7>] do_path_lookup+0x187/0x1df
>  [<ffffffff802bc094>] ? get_empty_filp+0xfc/0x188
>  [<ffffffff802c59bd>] path_lookup_open+0x5a/0x9b
>  [<ffffffff802c6588>] do_filp_open+0xb6/0x899
>  [<ffffffff802cf62e>] ? alloc_fd+0x3a/0x12e
>  [<ffffffff802cf711>] ? alloc_fd+0x11d/0x12e
>  [<ffffffff802b92aa>] do_sys_open+0x53/0xd3
>  [<ffffffff802b9353>] sys_open+0x1b/0x1d
>  [<ffffffff8020baeb>] system_call_fastpath+0x16/0x1b
>
>     Device => Id: 253,2  Uuid: 8A77044F9D98416FAB240B44B1843EDA  Gen: 0x7622CA50  Label: 
>     Volume => State: 2  Flags: 0x0
>      Sizes => Block: 4096  Cluster: 4096
>   Features => Compat: 0x3  Incompat: 0x1D0  ROcompat: 0x1
>      Mount => Opts: 0x0  AtimeQuanta: 60
>    Cluster => Stack: cman  Name: 8A77044F9D98416FAB240B44B1843EDA  Version: 1.0
>   DownCnvt => Pid: 2759  Count: 1  WakeSeq: 14838  WorkSeq: 14838
>   Recovery => Pid: -1  Nodes: None
>     Commit => Pid: 2780  Interval: 0  Needs: 1
>    Journal => State: 1  NumTxns: 8  TxnId: 928
>      Stats => GlobalAllocs: 3  LocalAllocs: 513  SubAllocs: 1556  LAWinMoves: 1  SAExtends: 2
> LocalAlloc => State: 1  Descriptor: 2064384  Size: 2048 bits  Default: 2048 bits
>      Steal => Slot: -1  NumStolen: 0
>      Slots => Num     RecoGen
>                 0           3
>             *   1           3
>                 2           3
>                 3           3
>
> Lockres: M0000000000000000046e2a00000000  Mode: Protected Read
> Flags: Initialized Attached Blocked Needs Refresh Queued
> RO Holders: 0  EX Holders: 0
> Pending Action: None  Pending Unlock Action: None
> Requested Mode: Protected Read  Blocking Mode: Exclusive
> PR > Gets: 463  Fails: 0    Waits (usec) Total: 1052625  Max: 41985
> EX > Gets: 37  Fails: 0    Waits (usec) Total: 990652  Max: 79971
> Disk Refreshes: 0
>
> Resource len 31  "M0000000000000000046e2a00000000"
> Local 1          
> Granted
> 00d70001 PR      Master:   1 014a0002          
>
> dlm: 8A77044F9D98416FAB240B44B1843EDA: addwait d70001 cur 2 overlap 4 count 2 f 100000
> dlm: 8A77044F9D98416FAB240B44B1843EDA: remwait d70001 convert_reply zap overlap_cancel
> dlm: 8A77044F9D98416FAB240B44B1843EDA: remwait d70001 cancel_reply wait_type 0
>
> These messages mean that a cancel operation occurs while a remote convert is
> in progress, then the convert reply comes back indicating that the convert
> was successful, so the cancel is moot and is ignored.
>
>
> bull-04
> -------
>
>  4114 S<   [ocfs2dc]                   ocfs2_downconvert_thread
>  4115 S<   [dlm_astd]                  dlm_astd
>  4116 S<   [dlm_scand]                 dlm_scand
>  4117 S<   [dlm_recv/0]                worker_thread
>  4118 S<   [dlm_recv/1]                worker_thread
>  4119 S<   [dlm_recv/2]                worker_thread
>  4120 S<   [dlm_recv/3]                worker_thread
>  4121 S<   [dlm_send]                  worker_thread
>  4122 S<   [dlm_recoverd]              dlm_recoverd
>  4136 S<   [kjournald2]                kjournald2
>  4137 S<   [ocfs2cmt]                  ocfs2_commit_thread
>  4142 D+   /usr/bin/perl /root/make_pa ocfs2_wait_for_mask
>
> make_panic    D 0000000000000002     0  4142   4086
>  ffff88013147b998 0000000000000046 ffff88013147b938 ffffffff8025aea3
>  0000000000000000 ffff88013142acc0 ffff88007ffee340 ffff88013142b078
>  0000000100078e31 0000000000000246 0000000000000292 ffffffffa02b102b
> Call Trace:
>  [<ffffffff8025aea3>] ? print_lock_contention_bug+0x1b/0xe1
>  [<ffffffffa02b102b>] ? dlm_put_lockspace+0x18/0x2b [dlm]
>  [<ffffffff804e6cfd>] schedule+0x9/0x1d
>  [<ffffffff804e702f>] schedule_timeout+0x28/0x176
>  [<ffffffff804e6dce>] ? wait_for_common+0x3d/0x11e
>  [<ffffffff804e6e57>] wait_for_common+0xc6/0x11e
>  [<ffffffff80232c89>] ? default_wake_function+0x0/0xf
>  [<ffffffffa03457be>] ? ocfs2_cluster_lock+0xf2/0x8e9 [ocfs2]
>  [<ffffffff804e6f39>] wait_for_completion+0x18/0x1a
>  [<ffffffffa0345563>] ocfs2_wait_for_mask+0x15/0x23 [ocfs2]
>  [<ffffffffa0345f90>] ocfs2_cluster_lock+0x8c4/0x8e9 [ocfs2]
>  [<ffffffff8025aea3>] ? print_lock_contention_bug+0x1b/0xe1
>  [<ffffffff804e9001>] ? _spin_unlock+0x26/0x2a
>  [<ffffffffa03592b8>] ? ocfs2_recovery_completed+0x34/0x3c [ocfs2]
>  [<ffffffffa03479d0>] ocfs2_inode_lock_full+0x181/0xcbc [ocfs2]
>  [<ffffffff804e9001>] ? _spin_unlock+0x26/0x2a
>  [<ffffffffa033a3c7>] ? ocfs2_dentry_revalidate+0x194/0x26c [ocfs2]
>  [<ffffffffa03522a0>] ocfs2_permission+0x70/0x166 [ocfs2]
>  [<ffffffff802bdec4>] inode_permission+0x69/0x97
>  [<ffffffff802bfe9e>] __link_path_walk+0x13f/0xced
>  [<ffffffff8025aea3>] ? print_lock_contention_bug+0x1b/0xe1
>  [<ffffffff802c0a9a>] path_walk+0x4e/0x97
>  [<ffffffff802c0c84>] do_path_lookup+0x118/0x175
>  [<ffffffff802c1aff>] do_filp_open+0xde/0x819
>  [<ffffffff8025aea3>] ? print_lock_contention_bug+0x1b/0xe1
>  [<ffffffff802bfae7>] ? getname+0x26/0x1ac
>  [<ffffffff802ca4f2>] ? alloc_fd+0x33/0x11e
>  [<ffffffff804e9001>] ? _spin_unlock+0x26/0x2a
>  [<ffffffff802ca5ce>] ? alloc_fd+0x10f/0x11e
>  [<ffffffff802b4a30>] do_sys_open+0x53/0xd3
>  [<ffffffff802b4ad9>] sys_open+0x1b/0x1d
>  [<ffffffff8020baab>] system_call_fastpath+0x16/0x1b
>
>     Device => Id: 253,2  Uuid: 8A77044F9D98416FAB240B44B1843EDA  Gen: 0x7622CA50  Label: 
>     Volume => State: 2  Flags: 0x0
>      Sizes => Block: 4096  Cluster: 4096
>   Features => Compat: 0x3  Incompat: 0x1D0  ROcompat: 0x1
>      Mount => Opts: 0x0  AtimeQuanta: 60
>    Cluster => Stack: cman  Name: 8A77044F9D98416FAB240B44B1843EDA  Version: 1.0
>   DownCnvt => Pid: 4114  Count: 1  WakeSeq: 14213  WorkSeq: 14213
>   Recovery => Pid: -1  Nodes: None
>     Commit => Pid: 4137  Interval: 0  Needs: 1
>    Journal => State: 1  NumTxns: 4  TxnId: 851
>      Stats => GlobalAllocs: 2  LocalAllocs: 414  SubAllocs: 1396  LAWinMoves: 1  SAExtends: 1
> LocalAlloc => State: 1  Descriptor: 2096640  Size: 2048 bits  Default: 2048 bits
>      Steal => Slot: -1  NumStolen: 0
>      Slots => Num     RecoGen
>                 0           3
>                 1           3
>                 2           3
>             *   3           3
>
> no M0000000000000000046e2a00000000 exists in locking_state
>
> Resource len 31  "M0000000000000000046e2a00000000"
> Local 1          
> Granted
> 03d80001 PR      Master:   1 01ce0003          
>
> dlm: 8A77044F9D98416FAB240B44B1843EDA: validate_unlock_args -16 3d80001 0 10c 2 0 M0000000000000000046e2a00000000
>
> This message means the lock is granted, i.e. the convert being canceled
> already completed, so the cancel operation is ignored.
>
>

