From: Coly Li <coly.li@suse.de>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] dlm stress test hangs OCFS2
Date: Thu, 03 Sep 2009 01:11:53 +0800
Message-ID: <4A9EA759.5090906@suse.de>
In-Reply-To: <4A8B6C29.30802@oracle.com>



Sunil Mushran Wrote:
> Read this thread for some background. There are others like this.
> http://oss.oracle.com/pipermail/ocfs2-devel/2009-April/004313.html
> 
> David had run into a similar issue with two nodes. The symptoms were the
> same. In that case, we were failing to kick the downconvert thread under
> one situation.
> 
> Bottom line, the reason for the hang is that a node is not downconverting
> its lock. It could be a race in dlmglue or something else.
> 
> The node has a PR and another node wants an EX. Unless the node downconverts
> to NL, the master cannot upconvert the other node to EX. Hang. Also, cancel
> converts are in the mix.
> [snip]
> The downcnvt shows 1 lockres is queued. We have to assume it is this one.
> If not, then we have a bigger problem. Maybe add a quick/dirty hack to dump
> the lockres in this queue.
> 
> Maybe we are forgetting to kick it like last time. I did scan the code
> for that but came up empty handed.
> 
> To solve this mystery, you have to find out why the dc thread is
> not acting on the lockres. Forget stats. Just add printks in that thread,
> starting from, say, ocfs2_downconvert_thread_do_work().

I simplified the original Perl script into a short shell script:
---------------------------------
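#!/bin/sh
# Create files named <hostname>_<i> in the current directory in a tight
# loop; once i reaches 1000, remove them all and start over.

prefix=$(hostname)
i=1
while true; do
	f="${prefix}_${i}"
	echo "$f"
	touch "$f"
	i=$((i + 1))
	if [ "$i" -ge 1000 ]; then
		i=1
		rm -f "${prefix}"_*
	fi
done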
---------------------------------

Running the above script on both nodes also reproduces the blocking issue.
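
To see when the loop stops making progress, one option is a heartbeat kept
on a local (non-OCFS2) filesystem. This is only a sketch and not part of
the original test: is_stalled and the heartbeat file path are hypothetical,
and it assumes the reproducer is changed to run `date +%s > /tmp/hb` (any
local path would do) right before each touch.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original test.
# Usage: is_stalled HEARTBEAT_FILE MAX_AGE_SECONDS [NOW]
# Returns 0 when the epoch timestamp stored in HEARTBEAT_FILE is more than
# MAX_AGE_SECONDS older than NOW (default: the current time).
# Returns 2 when the file cannot be read or is empty.
is_stalled() {
	last=$(cat "$1") || return 2
	[ -n "$last" ] || return 2
	now=${3:-$(date +%s)}
	[ $((now - last)) -gt "$2" ]
}
```

For example, `is_stalled /tmp/hb 30 && echo "no touch for over 30 seconds"`
could be run from a second terminal on each node. Because the heartbeat file
is local, checking it does not itself touch the OCFS2 volume.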

When the blocking happens, ocfs2_downconvert_thread_do_work() still gets called
again and again.

I added a printk to display osb->blocked_lock_count just before the while(1)
loop inside ocfs2_downconvert_thread_do_work().

Here is what I observed:
1) Before the blocking happens, the number sequence is:
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 2
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
(the count can be 0, 1 or 2, in an irregular sequence)

2) When the blocking happens, the sequence of osb->blocked_lock_count values
always looks like this:
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
(always 0-1-0-1-0-1-..., a strictly regular sequence)
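
The two patterns can be told apart mechanically. The following is a minimal
sketch, assuming the printk lines end in "osb->blocked_lock_count: N" as in
the output above; classify_counts is a hypothetical helper name. Note that a
short window can look alternating by chance (the first seven samples in case
1 do), so it is only meaningful on a long run of samples.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and classify the
# osb->blocked_lock_count samples found in them.
#   alternating - every sample is 0 or 1 and differs from the previous one
#   irregular   - some sample is >1, or two consecutive samples are equal
#   empty       - no matching lines were seen
classify_counts() {
	awk '
		/osb->blocked_lock_count:/ {
			v = $NF            # the count is the last field on the line
			n++
			if (v != 0 && v != 1) bad = 1
			if (n > 1 && v == prev) bad = 1
			prev = v
		}
		END {
			if (n == 0) print "empty"
			else if (bad) print "irregular"
			else print "alternating"
		}
	'
}
```

Typical use would be `dmesg | classify_counts`, assuming the printk output
ends up in the kernel log buffer.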

I will continue tracking this down.

-- 
Coly Li
SuSE Labs
