From: Coly Li <coyli@suse.de>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
Date: Wed, 03 Dec 2008 10:23:48 +0800 [thread overview]
Message-ID: <4935EDB4.7020405@suse.de> (raw)
In-Reply-To: <4935B20B.9040309@oracle.com>
Sunil Mushran Wrote:
> Coly Li wrote:
>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>> index 44f87ca..c777844 100644
>> --- a/fs/ocfs2/dlm/dlmmaster.c
>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>> @@ -742,6 +742,19 @@ lookup:
>> goto lookup;
>> }
>>
>> + /* tmpres might be initiating by another process, in this case
>> + * we should wait until the initiation done */
>> + spin_lock(&tmpres->spinlock);
>> + if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
>> + (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
>> + __dlm_wait_on_lockres_flags(tmpres,
>> DLM_LOCK_RES_IN_PROGRESS);
>> + spin_unlock(&tmpres->spinlock);
>> + dlm_lockres_put(tmpres);
>> + tmpres = NULL;
>> + goto lookup;
>> + }
>> + spin_unlock(&tmpres->spinlock);
>> +
>> mlog(0, "found in hash!\n");
>> if (res)
>> dlm_lockres_put(res);
>
> This will work but we are wasting cycles as we are looking up the same
> lockres again. How about the following? The mlog is for testing.
>
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -742,6 +742,17 @@ lookup:
> goto lookup;
> }
>
> + /* wait if lock resource is being mastered by another
> thread */
> + spin_lock(&tmpres->spinlock);
> + if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
> + mlog(ML_NOTICE, "%s:%.*s Another thread is
> mastering "
> + "this resource\n", dlm->name, namelen,
> lockid);
> + __dlm_wait_on_lockres_flags(tmpres,
> +
> DLM_LOCK_RES_IN_PROGRESS);
> + BUG_ON(tmpres->owner ==
> DLM_LOCK_RES_OWNER_UNKNOWN);
> + }
> + spin_unlock(&tmpres->spinlock);
This is the way I tried firstly, didn't not work well. When I tested with more than 20 forks on 1
node, user space process got segmentation fault and kernel got oops. The kernel oops was triggered
on BUG_ON(res->inflight_locks == 0) in __dlm_lockres_drop_inflight_ref() IIRC.
Also, BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) might not always be correct IMHO. Even we
return from __dlm_wait_on_lockres_flags(), there still is chance to set tmpres->owner to
DLM_LOCK_RES_OWNER_UNKNOWN before BUG_ON is tested (in theory), especially when dlm->spinlock is not
held.
> +
> mlog(0, "found in hash!\n");
> if (res)
> dlm_lockres_put(res);
>
> Secondly, the patch comments are more than required. We should prune it
> down a bit. Like:
>
> =========================================================================
> ocfs2/dlm: Fix lockres mastery race
Cool. Since I was not sure this was a race error, you can see, I did not mention race anywhere.
I like this title more ;-)
>
> dlm_get_lock_resource() is supposed to return a lock resource with a proper
> master. If multiple concurrent threads attempt to lookup the lockres for
> the
> same lockid, one or more threads are likely to return a lockres without a
> proper master, if the lock mastery is underway.
>
> This patch makes the threads wait in dlm_get_lock_resource() while the
> mastery
> is underway, ensuring all threads return the lockres with a proper master.
>
> This issue is limited to users using the flock() syscall. For all other fs
> operations, dlmglue ensures that only one thread is performing dlm
> operations
> for a given lockid at any one time.
>
> Signed-off-by: Coly Li <...>
> =========================================================================
>
> Corrections welcome.
The comment is perfect IMHO.
>
> Lastly, test the patch not only with the flock() test but also otherwise.
> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> existing functionality. Run all multinode tests in ocfs2-test.
Sure, thanks for the suggestion. I will test the patch will all multinode tests, and submit a patch
to ocfs2 test-suite for this fix.
--
Coly Li
SuSE PRC Labs
next prev parent reply other threads:[~2008-12-03 2:23 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-02 12:06 [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking Coly Li
2008-12-02 22:09 ` Sunil Mushran
2008-12-03 2:23 ` Coly Li [this message]
2008-12-04 3:58 ` Sunil Mushran
2008-12-03 19:48 ` Coly Li
2008-12-04 1:31 ` tristan.ye
[not found] ` <4940CC19.5010801@suse.de>
2008-12-11 8:26 ` tristan.ye
-- strict thread matches above, loose matches on Subject: below --
2008-12-09 2:28 [Ocfs2-devel] flock() race - fix Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 11/11] ocfs2/dlm: Fix race during lockres mastery Sunil Mushran
2008-12-11 19:04 ` [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking Coly Li
2008-12-11 19:23 ` Sunil Mushran
2008-12-11 21:10 ` Coly Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4935EDB4.7020405@suse.de \
--to=coyli@suse.de \
--cc=ocfs2-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.