* [Ocfs2-devel] flock() race - fix
@ 2008-12-09 2:28 Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 10/11] ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 11/11] ocfs2/dlm: Fix race during lockres mastery Sunil Mushran
0 siblings, 2 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-12-09 2:28 UTC (permalink / raw)
To: ocfs2-devel
So these patches are against the ocfs2-1.4 tree. I am sending them as-is
as the bug has been filed against 1.4. Will email the patches for the
mainline kernel later.
Sunil
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 10/11] ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list
2008-12-09 2:28 [Ocfs2-devel] flock() race - fix Sunil Mushran
@ 2008-12-09 2:28 ` Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 11/11] ocfs2/dlm: Fix race during lockres mastery Sunil Mushran
1 sibling, 0 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-12-09 2:28 UTC (permalink / raw)
To: ocfs2-devel
This patch adds a new lock, dlm->tracking_lock, to protect adding/removing
lockres' to/from the dlm->tracking_list. We were previously using dlm->spinlock
for the same, but that proved inadequate as we could be freeing a lockres from
a context that did not hold that lock. As the new lock only protects this list,
we can explicitly take it when removing the lockres from the tracking list.
This bug was exposed when testing multiple processes concurrently flock() the
same file.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
fs/ocfs2/dlm/dlmcommon.h | 3 ++
fs/ocfs2/dlm/dlmdebug.c | 53 ++++++++++++++++++++-------------------------
fs/ocfs2/dlm/dlmdomain.c | 1 +
fs/ocfs2/dlm/dlmmaster.c | 10 ++++++++
4 files changed, 38 insertions(+), 29 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 9add88d..a7ecccc 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -140,6 +140,7 @@ struct dlm_ctxt
unsigned int purge_count;
spinlock_t spinlock;
spinlock_t ast_lock;
+ spinlock_t track_lock;
char *name;
u8 node_num;
u32 key;
@@ -316,6 +317,8 @@ struct dlm_lock_resource
* put on a list for the dlm thread to run. */
unsigned long last_used;
+ struct dlm_ctxt *dlm;
+
unsigned migration_pending:1;
atomic_t asts_reserved;
spinlock_t spinlock;
diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 1b81dcb..b32f60a 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -630,43 +630,38 @@ static void *lockres_seq_start(struct seq_file *m, loff_t *pos)
{
struct debug_lockres *dl = m->private;
struct dlm_ctxt *dlm = dl->dl_ctxt;
+ struct dlm_lock_resource *oldres = dl->dl_res;
struct dlm_lock_resource *res = NULL;
+ struct list_head *track_list;
- spin_lock(&dlm->spinlock);
+ spin_lock(&dlm->track_lock);
+ if (oldres)
+ track_list = &oldres->tracking;
+ else
+ track_list = &dlm->tracking_list;
- if (dl->dl_res) {
- list_for_each_entry(res, &dl->dl_res->tracking, tracking) {
- if (dl->dl_res) {
- dlm_lockres_put(dl->dl_res);
- dl->dl_res = NULL;
- }
- if (&res->tracking == &dlm->tracking_list) {
- mlog(0, "End of list found, %p\n", res);
- dl = NULL;
- break;
- }
+ list_for_each_entry(res, track_list, tracking) {
+ if (&res->tracking == &dlm->tracking_list)
+ res = NULL;
+ else
dlm_lockres_get(res);
- dl->dl_res = res;
- break;
- }
- } else {
- if (!list_empty(&dlm->tracking_list)) {
- list_for_each_entry(res, &dlm->tracking_list, tracking)
- break;
- dlm_lockres_get(res);
- dl->dl_res = res;
- } else
- dl = NULL;
+ break;
}
+ spin_unlock(&dlm->track_lock);
- if (dl) {
- spin_lock(&dl->dl_res->spinlock);
- dump_lockres(dl->dl_res, dl->dl_buf, dl->dl_len - 1);
- spin_unlock(&dl->dl_res->spinlock);
- }
+ if (oldres)
+ dlm_lockres_put(oldres);
- spin_unlock(&dlm->spinlock);
+ dl->dl_res = res;
+
+ if (res) {
+ spin_lock(&res->spinlock);
+ dump_lockres(res, dl->dl_buf, dl->dl_len - 1);
+ spin_unlock(&res->spinlock);
+ } else
+ dl = NULL;
+ /* passed to seq_show */
return dl;
}
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 2eff4a4..c2e192e 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -1550,6 +1550,7 @@ static struct dlm_ctxt *dlm_alloc_ctxt(const char *domain,
spin_lock_init(&dlm->spinlock);
spin_lock_init(&dlm->master_lock);
spin_lock_init(&dlm->ast_lock);
+ spin_lock_init(&dlm->track_lock);
INIT_LIST_HEAD(&dlm->list);
INIT_LIST_HEAD(&dlm->dirty_list);
INIT_LIST_HEAD(&dlm->reco.resources);
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index a575392..4d493f4 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -509,8 +509,10 @@ void dlm_change_lockres_owner(struct dlm_ctxt *dlm,
static void dlm_lockres_release(struct kref *kref)
{
struct dlm_lock_resource *res;
+ struct dlm_ctxt *dlm;
res = container_of(kref, struct dlm_lock_resource, refs);
+ dlm = res->dlm;
/* This should not happen -- all lockres' have a name
* associated with them at init time. */
@@ -519,6 +521,7 @@ static void dlm_lockres_release(struct kref *kref)
mlog(0, "destroying lockres %.*s\n", res->lockname.len,
res->lockname.name);
+ spin_lock(&dlm->track_lock);
if (!list_empty(&res->tracking))
list_del_init(&res->tracking);
else {
@@ -526,6 +529,9 @@ static void dlm_lockres_release(struct kref *kref)
res->lockname.len, res->lockname.name);
dlm_print_one_lock_resource(res);
}
+ spin_unlock(&dlm->track_lock);
+
+ dlm_put(dlm);
if (!hlist_unhashed(&res->hash_node) ||
!list_empty(&res->granted) ||
@@ -599,6 +605,10 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
res->migration_pending = 0;
res->inflight_locks = 0;
+ /* put in dlm_lockres_release */
+ dlm_grab(dlm);
+ res->dlm = dlm;
+
kref_init(&res->refs);
/* just for consistency */
--
1.5.6.3
^ permalink raw reply related [flat|nested] 13+ messages in thread* [Ocfs2-devel] [PATCH 11/11] ocfs2/dlm: Fix race during lockres mastery
2008-12-09 2:28 [Ocfs2-devel] flock() race - fix Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 10/11] ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list Sunil Mushran
@ 2008-12-09 2:28 ` Sunil Mushran
2008-12-11 19:04 ` [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking Coly Li
1 sibling, 1 reply; 13+ messages in thread
From: Sunil Mushran @ 2008-12-09 2:28 UTC (permalink / raw)
To: ocfs2-devel
dlm_get_lock_resource() is supposed to return a lock resource with a proper
master. If multiple concurrent threads attempt to lookup the lockres for the
same lockid while the lock mastery in underway, one or more threads are likely
to return a lockres without a proper master.
This patch makes the threads wait in dlm_get_lock_resource() while the mastery
is underway, ensuring all threads return the lockres with a proper master.
This issue is known to be limited to users using the flock() syscall. For all
other fs operations, the dlmglue layer serializes the dlm op for each lockid.
Patch fixes Novell bz#425491
https://bugzilla.novell.com/show_bug.cgi?id=425491
Users encountering this bug will see flock() return EINVAL and dmesg have the
following error:
ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource <LOCKID>: bad api args
Reported-by: Coly Li <coyli@suse.de>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
---
fs/ocfs2/dlm/dlmmaster.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 4d493f4..6bda3ab 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -736,14 +736,21 @@ lookup:
if (tmpres) {
int dropping_ref = 0;
+ spin_unlock(&dlm->spinlock);
+
spin_lock(&tmpres->spinlock);
+ /* We wait for the other thread that is mastering the resource */
+ if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+ __dlm_wait_on_lockres(tmpres);
+ BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+ }
+
if (tmpres->owner == dlm->node_num) {
BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
dlm_lockres_grab_inflight_ref(dlm, tmpres);
} else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
dropping_ref = 1;
spin_unlock(&tmpres->spinlock);
- spin_unlock(&dlm->spinlock);
/* wait until done messaging the master, drop our ref to allow
* the lockres to be purged, start over. */
--
1.5.6.3
^ permalink raw reply related [flat|nested] 13+ messages in thread* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 11/11] ocfs2/dlm: Fix race during lockres mastery Sunil Mushran
@ 2008-12-11 19:04 ` Coly Li
2008-12-11 19:23 ` Sunil Mushran
0 siblings, 1 reply; 13+ messages in thread
From: Coly Li @ 2008-12-11 19:04 UTC (permalink / raw)
To: ocfs2-devel
Sunil,
Sorry for my late replying and thanks for your patient explaining.
Sunil Mushran Wrote:
> So I have a new patch. It has debug code too. I am still testing this.
> Don't upload it to bz.
>
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -672,6 +672,12 @@ void __dlm_lockres_drop_inflight_ref(struct
> dlm_ctxt *dlm,
> {
> assert_spin_locked(&res->spinlock);
>
> + if (res->inflight_locks == 0) {
> + mlog(ML_ERROR, "%s:%.*s: inflight=0\n",
> + dlm->name, res->lockname.len, res->lockname.name);
> + __dlm_print_one_lock_resource(res);
> + dump_stack();
> + }
> BUG_ON(res->inflight_locks == 0);
> res->inflight_locks--;
> mlog(0, "%s:%.*s: inflight--: now %u\n",
> @@ -726,14 +732,22 @@ lookup:
> if (tmpres) {
> int dropping_ref = 0;
>
> + spin_unlock(&dlm->spinlock);
> +
> spin_lock(&tmpres->spinlock);
> + if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
> + mlog(ML_NOTICE, "%s:%.*s Another thread is
> mastering "
> + "this resource\n", dlm->name, namelen, lockid);
> + __dlm_wait_on_lockres(tmpres);
> + BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
> + }
> +
> if (tmpres->owner == dlm->node_num) {
> BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
> dlm_lockres_grab_inflight_ref(dlm, tmpres);
> } else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
> dropping_ref = 1;
> spin_unlock(&tmpres->spinlock);
> - spin_unlock(&dlm->spinlock);
>
> /* wait until done messaging the master, drop our ref to
> allow
> * the lockres to be purged, start over. */
>
>
>
> Coly Li wrote:
>> This is the way I tried firstly, didn't not work well. When I tested with more than 20 forks on 1
>> node, user space process got segmentation fault and kernel got oops. The kernel oops was triggered
>> on BUG_ON(res->inflight_locks == 0) in __dlm_lockres_drop_inflight_ref() IIRC.
>>
>> Also, BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) might not always be correct IMHO. Even we
>> return from __dlm_wait_on_lockres_flags(), there still is chance to set tmpres->owner to
>> DLM_LOCK_RES_OWNER_UNKNOWN before BUG_ON is tested (in theory), especially when dlm->spinlock is not
>> held.
>>
> Well, if you get an oops, get the stack. If you had, it would have
> looked like this.
>
> (19990,0):__dlm_lockres_drop_inflight_ref:677
> ERROR:
> 1B57953A48324D258163FE31C04C4B11:F00000000000000000801f61f38b60c: inflight=0
> Call Trace:
> .show_stack+0x68/0x1b0 (unreliable)
> .__dlm_lockres_drop_inflight_ref+0x98/0x184 [ocfs2_dlm]
> .dlmlock_master+0x41c/0x4f8 [ocfs2_dlm]
> .dlmlock+0x8b0/0x11a0 [ocfs2_dlm]
> .ocfs2_lock_create+0x174/0x3a8 [ocfs2]
> .ocfs2_file_lock+0x158/0x6d8 [ocfs2]
> .ocfs2_flock+0x18c/0x278 [ocfs2]
> .sys_flock+0x170/0x1d4
> syscall_exit+0x0/0x40
> kernel BUG in __dlm_lockres_drop_inflight_ref at
> /rpmbuild/smushran/BUILD/ocfs2-1.4.1/fs/ocfs2/dlm/dlmmaster.c:680!
>
> This tells you what the issue is. We are forgetting to take an inflight ref.
> The new patch moves the code up so that we take the reference.
Because I am not familiar with the code yet, I though this is an oops triggered by my first
modification. Therefore, I choose to use a loop which did not trigger the oops.
From your reply, it seems kernel BUG in __dlm_lockres_drop_inflight_ref at dlmmaster.c:680 is
another bug ? I saw a patch named "ocfs2/dlm: Fix race in adding/removing lockres' to/from the
tracking list", is it the fix for this bug ? If yes, I should learn how you resolve it ;)
>
> Secondly, dlm->spinlock does not protect res->owner. Furthermore, that
> BUG_ON is important because that is the case. If mastery is underway,
> the state will remain IN_PROGRESS. I am curious as to why you feel
> otherwise.
Here is how I thought, please comments on my mistake,
The dlm associated with lockres is projected by dlm->spinlock, if we only protect lockres by
lockres->spinlock, there *might* be possibility to modify dlm->node_num somewhere. Since we have
quite a few places to compare lockres->owner with dlm->node_num, I suspect that manipulating on
lockres->owner without protecting dlm->owner might be problematic.
I might be wrong, and happy to see your correction :)
>
> Lastly, the perl script can be improved by enclosing the 20 fork loop
> into another loop that changes the filename. To startwith, touch 5000 files
> (test.flock%d). umount and mount again. This will drop the caches. Now
> before running any ls, run flock.pl. It will spawn 20 processes to flock()
> each file and wait till the 20 forks have exited before letting loose
> another
> 20 for the next file. (Makesure you have another node or two or more
> that have
> just mounted that volume.)
>
> I ran this with 5000 files successfully. I did encounter a problem in
> the last
> run, not where I expected it. Still a work in progress. I can't email
> you the
> perl script as my box will remain in the debugger for the night. ;)
>
> My problem with your patch is not that it is wrong, but it is punting the
> issue. Say we do have a case in which the res->owner remains unknown
> eventhough
> the state in not in_prog. In that case, the process will keep looping. By
> oopsing, we catch the problem upfront.
Thanks for the comments, I do agree with you. Your comments help me to understand more to the code :-)
--
Coly Li
SuSE PRC Labs
^ permalink raw reply [flat|nested] 13+ messages in thread* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-11 19:04 ` [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking Coly Li
@ 2008-12-11 19:23 ` Sunil Mushran
2008-12-11 21:10 ` Coly Li
0 siblings, 1 reply; 13+ messages in thread
From: Sunil Mushran @ 2008-12-11 19:23 UTC (permalink / raw)
To: ocfs2-devel
Coly Li wrote:
> Because I am not familiar with the code yet, I though this is an oops triggered by my first
> modification. Therefore, I choose to use a loop which did not trigger the oops.
>
> From your reply, it seems kernel BUG in __dlm_lockres_drop_inflight_ref at dlmmaster.c:680 is
> another bug ? I saw a patch named "ocfs2/dlm: Fix race in adding/removing lockres' to/from the
> tracking list", is it the fix for this bug ? If yes, I should learn how you resolve it ;)
The oops in __dlm_lockres_drop_inflight_ref() is different that the
tracking list oops. No relationship.
The inflight_ref oops is because the "fix" was not taking the ref. Hence
it was zero during the drop. And that was because the "patch fix" was
at the wrong location. See the diff between my first patch and the final one
and see where the inflight ref is taken.
The tracking list bug has always been there. It was exposed during
forked flock() testing as explained in the patch.
> Here is how I thought, please comments on my mistake,
> The dlm associated with lockres is projected by dlm->spinlock, if we only protect lockres by
> lockres->spinlock, there *might* be possibility to modify dlm->node_num somewhere. Since we have
> quite a few places to compare lockres->owner with dlm->node_num, I suspect that manipulating on
> lockres->owner without protecting dlm->owner might be problematic.
dlm->node_num can never be modified. It is the node number which
is fixed for the life of the dlm domain (and more).
Sunil
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-11 19:23 ` Sunil Mushran
@ 2008-12-11 21:10 ` Coly Li
0 siblings, 0 replies; 13+ messages in thread
From: Coly Li @ 2008-12-11 21:10 UTC (permalink / raw)
To: ocfs2-devel
Sunil Mushran Wrote:
> Coly Li wrote:
>> Because I am not familiar with the code yet, I though this is an oops
>> triggered by my first
>> modification. Therefore, I choose to use a loop which did not trigger
>> the oops.
>>
>> From your reply, it seems kernel BUG in
>> __dlm_lockres_drop_inflight_ref at dlmmaster.c:680 is
>> another bug ? I saw a patch named "ocfs2/dlm: Fix race in
>> adding/removing lockres' to/from the
>> tracking list", is it the fix for this bug ? If yes, I should learn
>> how you resolve it ;)
>
> The oops in __dlm_lockres_drop_inflight_ref() is different that the
> tracking list oops. No relationship.
>
> The inflight_ref oops is because the "fix" was not taking the ref. Hence
> it was zero during the drop. And that was because the "patch fix" was
> at the wrong location. See the diff between my first patch and the final
> one and see where the inflight ref is taken.
>
Aha, Understand.
The previous patch did not grab an inflight_ref after waiting on __dlm_wait_on_lockres_flags(). A
lockres with inflight_ref (=0) is possible returned.
Yes, this was what I did not understand before. Perfect point :)
> The tracking list bug has always been there. It was exposed during
> forked flock() testing as explained in the patch.
>
>> Here is how I thought, please comments on my mistake,
>> The dlm associated with lockres is projected by dlm->spinlock, if we
>> only protect lockres by
>> lockres->spinlock, there *might* be possibility to modify
>> dlm->node_num somewhere. Since we have
>> quite a few places to compare lockres->owner with dlm->node_num, I
>> suspect that manipulating on
>> lockres->owner without protecting dlm->owner might be problematic.
>
> dlm->node_num can never be modified. It is the node number which
> is fixed for the life of the dlm domain (and more).
Got it, and remember it :-)
Thanks for your patient and accurate explaining again :-)
--
Coly Li
SuSE PRC Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
@ 2008-12-02 12:06 Coly Li
2008-12-02 22:09 ` Sunil Mushran
0 siblings, 1 reply; 13+ messages in thread
From: Coly Li @ 2008-12-02 12:06 UTC (permalink / raw)
To: ocfs2-devel
When calling flock(2) by multiple processes concurrently, it is reported to be very easy resulting
DLM_BADARGS error. In dmesg the similar lines can be found:
Sep 2 11:17:32 acct20 kernel: (6935,1):ocfs2_lock_create:901 ERROR: Dlm error "DLM_BADARGS" while
calling dlmlock on resource
F00000000000000005d1cfe83ab8dad: bad api args
Sep 2 11:17:32 acct20 kernel: (6935,1):ocfs2_file_lock:1486 ERROR: status = -22
Here is the analysis how this error comes, in condition that 2 process concurrently call flock(2) on
same file:
1) call sequence of process 1 is (first listed first called):
[ocfs2_do_flock] [ocfs2_file_lock] [ocfs2_lock_create] [dlmlock] [dlm_get_lock_resource]
[dlm_do_master_request] [o2net_send_message_vec]
In o2net_send_message_vec(), at line 1049:
wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw));
Here process 1 is scheduled out, and process 2 is scheduled in to ran.
At this moment, res->owner (ownership of the locking file resource) is set to
DLM_LOCK_RES_OWNER_UNKNOWN.
2) call sequence of process 2 is (first listed first called):
[ocfs2_do_flock] [ocfs2_file_lock] [ocfs2_lock_create] [dlmlock] [dlmlock_remote]
[o2net_send_message_vec]
in dlmlock(), at line 725:
if (res->owner == dlm->node_num)
status = dlmlock_master(dlm, res, lock, flags);
else
status = dlmlock_remote(dlm, res, lock, flags);
Since res->owner is still set to DLM_LOCK_RES_OWNER_UNKNOWN and not decided in process 1 yet,
process 2 calls dlmlock_remote() here. But inside dlmlock_remote(), at line 247:
__dlm_wait_on_lockres(res);
After this function returned, res->owner is set correctly to 0 (node 0). Then at line 257:
status = dlm_send_remote_lock_request(dlm, res, lock, flags);
dlm_send_remote_lock_request() calls o2net_send_message() at line 330, as bellowed:
tmpret = o2net_send_message(DLM_CREATE_LOCK_MSG, dlm->key, &create,
sizeof(create), res->owner, &status);
o2net_send_message() calls o2net_send_message_vec() and at line 987, the code is:
if (target_node == o2nm_this_node()) {
ret = -ELOOP;
goto out;
}
Here target_node is res->owner which is already set to 0. But before calling dlmlock_remote(),
res->owner is not set properly (still is DLM_LOCK_RES_OWNER_UNKNOWN). After entering
dlmlock_remote(), and returning from __dlm_wait_on_lockres(), though res->owner is correct, the
execution flow is in mistaken location.
3) The result is, process 1 can make the flock, and process 2 generates BADARGS error. 20 processes
condition also got tested, only first process can success the flock, and other processes all got
BADARGS.
With Sunil and Mark's suggestion, the solution is,
1) If a tmpres returned from hashed resources, check its owner.
2) if tmpres->owner is DLM_LOCK_RES_OWNER_UNKNOWN, it means another process owns the resource but
dose not finish its initiation. We should wait on DLM_LOCK_RES_IN_PROGRESS until the initiation
accomplished.
3) Once the initiation done, re-lookup the resource.
After basic testing on 2 nodes (each node forks 200,000 processes calling flock(2)) and 8 nodes
(each node forks 1000 processes calling flock(2)), it works fine so far.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
---
fs/ocfs2/dlm/dlmmaster.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 44f87ca..c777844 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -742,6 +742,19 @@ lookup:
goto lookup;
}
+ /* tmpres might be initiating by another process, in this case
+ * we should wait until the initiation done */
+ spin_lock(&tmpres->spinlock);
+ if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
+ (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
+ __dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_IN_PROGRESS);
+ spin_unlock(&tmpres->spinlock);
+ dlm_lockres_put(tmpres);
+ tmpres = NULL;
+ goto lookup;
+ }
+ spin_unlock(&tmpres->spinlock);
+
mlog(0, "found in hash!\n");
if (res)
dlm_lockres_put(res);
--
Coly Li
SuSE PRC Labs
^ permalink raw reply related [flat|nested] 13+ messages in thread* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-02 12:06 Coly Li
@ 2008-12-02 22:09 ` Sunil Mushran
2008-12-03 2:23 ` Coly Li
2008-12-03 19:48 ` Coly Li
0 siblings, 2 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-12-02 22:09 UTC (permalink / raw)
To: ocfs2-devel
Coly Li wrote:
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 44f87ca..c777844 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -742,6 +742,19 @@ lookup:
> goto lookup;
> }
>
> + /* tmpres might be initiating by another process, in this case
> + * we should wait until the initiation done */
> + spin_lock(&tmpres->spinlock);
> + if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
> + (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
> + __dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_IN_PROGRESS);
> + spin_unlock(&tmpres->spinlock);
> + dlm_lockres_put(tmpres);
> + tmpres = NULL;
> + goto lookup;
> + }
> + spin_unlock(&tmpres->spinlock);
> +
> mlog(0, "found in hash!\n");
> if (res)
> dlm_lockres_put(res);
This will work but we are wasting cycles as we are looking up the same
lockres again. How about the following? The mlog is for testing.
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -742,6 +742,17 @@ lookup:
goto lookup;
}
+ /* wait if lock resource is being mastered by another
thread */
+ spin_lock(&tmpres->spinlock);
+ if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+ mlog(ML_NOTICE, "%s:%.*s Another thread is
mastering "
+ "this resource\n", dlm->name, namelen, lockid);
+ __dlm_wait_on_lockres_flags(tmpres,
+
DLM_LOCK_RES_IN_PROGRESS);
+ BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+ }
+ spin_unlock(&tmpres->spinlock);
+
mlog(0, "found in hash!\n");
if (res)
dlm_lockres_put(res);
Secondly, the patch comments are more than required. We should prune it
down a bit. Like:
=========================================================================
ocfs2/dlm: Fix lockres mastery race
dlm_get_lock_resource() is supposed to return a lock resource with a proper
master. If multiple concurrent threads attempt to lookup the lockres for the
same lockid, one or more threads are likely to return a lockres without a
proper master, if the lock mastery is underway.
This patch makes the threads wait in dlm_get_lock_resource() while the
mastery
is underway, ensuring all threads return the lockres with a proper master.
This issue is limited to users using the flock() syscall. For all other fs
operations, dlmglue ensures that only one thread is performing dlm
operations
for a given lockid at any one time.
Signed-off-by: Coly Li <...>
=========================================================================
Corrections welcome.
Lastly, test the patch not only with the flock() test but also otherwise.
Say, with recovery. Part of fixing bugs is ensuring the fix does not break
existing functionality. Run all multinode tests in ocfs2-test.
Thanks
Sunil
^ permalink raw reply [flat|nested] 13+ messages in thread* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-02 22:09 ` Sunil Mushran
@ 2008-12-03 2:23 ` Coly Li
2008-12-04 3:58 ` Sunil Mushran
2008-12-03 19:48 ` Coly Li
1 sibling, 1 reply; 13+ messages in thread
From: Coly Li @ 2008-12-03 2:23 UTC (permalink / raw)
To: ocfs2-devel
Sunil Mushran Wrote:
> Coly Li wrote:
>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>> index 44f87ca..c777844 100644
>> --- a/fs/ocfs2/dlm/dlmmaster.c
>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>> @@ -742,6 +742,19 @@ lookup:
>> goto lookup;
>> }
>>
>> + /* tmpres might be initiating by another process, in this case
>> + * we should wait until the initiation done */
>> + spin_lock(&tmpres->spinlock);
>> + if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
>> + (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
>> + __dlm_wait_on_lockres_flags(tmpres,
>> DLM_LOCK_RES_IN_PROGRESS);
>> + spin_unlock(&tmpres->spinlock);
>> + dlm_lockres_put(tmpres);
>> + tmpres = NULL;
>> + goto lookup;
>> + }
>> + spin_unlock(&tmpres->spinlock);
>> +
>> mlog(0, "found in hash!\n");
>> if (res)
>> dlm_lockres_put(res);
>
> This will work but we are wasting cycles as we are looking up the same
> lockres again. How about the following? The mlog is for testing.
>
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -742,6 +742,17 @@ lookup:
> goto lookup;
> }
>
> + /* wait if lock resource is being mastered by another
> thread */
> + spin_lock(&tmpres->spinlock);
> + if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
> + mlog(ML_NOTICE, "%s:%.*s Another thread is
> mastering "
> + "this resource\n", dlm->name, namelen,
> lockid);
> + __dlm_wait_on_lockres_flags(tmpres,
> +
> DLM_LOCK_RES_IN_PROGRESS);
> + BUG_ON(tmpres->owner ==
> DLM_LOCK_RES_OWNER_UNKNOWN);
> + }
> + spin_unlock(&tmpres->spinlock);
This is the way I tried firstly, didn't not work well. When I tested with more than 20 forks on 1
node, user space process got segmentation fault and kernel got oops. The kernel oops was triggered
on BUG_ON(res->inflight_locks == 0) in __dlm_lockres_drop_inflight_ref() IIRC.
Also, BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) might not always be correct IMHO. Even we
return from __dlm_wait_on_lockres_flags(), there still is chance to set tmpres->owner to
DLM_LOCK_RES_OWNER_UNKNOWN before BUG_ON is tested (in theory), especially when dlm->spinlock is not
held.
> +
> mlog(0, "found in hash!\n");
> if (res)
> dlm_lockres_put(res);
>
> Secondly, the patch comments are more than required. We should prune it
> down a bit. Like:
>
> =========================================================================
> ocfs2/dlm: Fix lockres mastery race
Cool. Since I was not sure this was a race error, you can see, I did not mention race anywhere.
I like this title more ;-)
>
> dlm_get_lock_resource() is supposed to return a lock resource with a proper
> master. If multiple concurrent threads attempt to lookup the lockres for
> the
> same lockid, one or more threads are likely to return a lockres without a
> proper master, if the lock mastery is underway.
>
> This patch makes the threads wait in dlm_get_lock_resource() while the
> mastery
> is underway, ensuring all threads return the lockres with a proper master.
>
> This issue is limited to users using the flock() syscall. For all other fs
> operations, dlmglue ensures that only one thread is performing dlm
> operations
> for a given lockid at any one time.
>
> Signed-off-by: Coly Li <...>
> =========================================================================
>
> Corrections welcome.
The comment is perfect IMHO.
>
> Lastly, test the patch not only with the flock() test but also otherwise.
> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> existing functionality. Run all multinode tests in ocfs2-test.
Sure, thanks for the suggestion. I will test the patch will all multinode tests, and submit a patch
to ocfs2 test-suite for this fix.
--
Coly Li
SuSE PRC Labs
^ permalink raw reply [flat|nested] 13+ messages in thread* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-03 2:23 ` Coly Li
@ 2008-12-04 3:58 ` Sunil Mushran
0 siblings, 0 replies; 13+ messages in thread
From: Sunil Mushran @ 2008-12-04 3:58 UTC (permalink / raw)
To: ocfs2-devel
So I have a new patch. It has debug code too. I am still testing this.
Don't upload it to bz.
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -672,6 +672,12 @@ void __dlm_lockres_drop_inflight_ref(struct
dlm_ctxt *dlm,
{
assert_spin_locked(&res->spinlock);
+ if (res->inflight_locks == 0) {
+ mlog(ML_ERROR, "%s:%.*s: inflight=0\n",
+ dlm->name, res->lockname.len, res->lockname.name);
+ __dlm_print_one_lock_resource(res);
+ dump_stack();
+ }
BUG_ON(res->inflight_locks == 0);
res->inflight_locks--;
mlog(0, "%s:%.*s: inflight--: now %u\n",
@@ -726,14 +732,22 @@ lookup:
if (tmpres) {
int dropping_ref = 0;
+ spin_unlock(&dlm->spinlock);
+
spin_lock(&tmpres->spinlock);
+ if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+ mlog(ML_NOTICE, "%s:%.*s Another thread is
mastering "
+ "this resource\n", dlm->name, namelen, lockid);
+ __dlm_wait_on_lockres(tmpres);
+ BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+ }
+
if (tmpres->owner == dlm->node_num) {
BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
dlm_lockres_grab_inflight_ref(dlm, tmpres);
} else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
dropping_ref = 1;
spin_unlock(&tmpres->spinlock);
- spin_unlock(&dlm->spinlock);
/* wait until done messaging the master, drop our ref to
allow
* the lockres to be purged, start over. */
Coly Li wrote:
> This is the way I tried firstly, didn't not work well. When I tested with more than 20 forks on 1
> node, user space process got segmentation fault and kernel got oops. The kernel oops was triggered
> on BUG_ON(res->inflight_locks == 0) in __dlm_lockres_drop_inflight_ref() IIRC.
>
> Also, BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) might not always be correct IMHO. Even we
> return from __dlm_wait_on_lockres_flags(), there still is chance to set tmpres->owner to
> DLM_LOCK_RES_OWNER_UNKNOWN before BUG_ON is tested (in theory), especially when dlm->spinlock is not
> held.
>
Well, if you get an oops, get the stack. If you had, it would have
looked like this.
(19990,0):__dlm_lockres_drop_inflight_ref:677
ERROR:
1B57953A48324D258163FE31C04C4B11:F00000000000000000801f61f38b60c: inflight=0
Call Trace:
.show_stack+0x68/0x1b0 (unreliable)
.__dlm_lockres_drop_inflight_ref+0x98/0x184 [ocfs2_dlm]
.dlmlock_master+0x41c/0x4f8 [ocfs2_dlm]
.dlmlock+0x8b0/0x11a0 [ocfs2_dlm]
.ocfs2_lock_create+0x174/0x3a8 [ocfs2]
.ocfs2_file_lock+0x158/0x6d8 [ocfs2]
.ocfs2_flock+0x18c/0x278 [ocfs2]
.sys_flock+0x170/0x1d4
syscall_exit+0x0/0x40
kernel BUG in __dlm_lockres_drop_inflight_ref at
/rpmbuild/smushran/BUILD/ocfs2-1.4.1/fs/ocfs2/dlm/dlmmaster.c:680!
This tells you what the issue is. We are forgetting to take an inflight ref.
The new patch moves the code up so that we take the reference.
Secondly, dlm->spinlock does not protect res->owner. Furthermore, that
BUG_ON is important because that is the case. If mastery is underway,
the state will remain IN_PROGRESS. I am curious as to why you feel
otherwise.
Lastly, the perl script can be improved by enclosing the 20 fork loop
into another loop that changes the filename. To startwith, touch 5000 files
(test.flock%d). umount and mount again. This will drop the caches. Now
before running any ls, run flock.pl. It will spawn 20 processes to flock()
each file and wait till the 20 forks have exited before letting loose
another
20 for the next file. (Makesure you have another node or two or more
that have
just mounted that volume.)
I ran this with 5000 files successfully. I did encounter a problem in
the last
run, not where I expected it. Still a work in progress. I can't email
you the
perl script as my box will remain in the debugger for the night. ;)
My problem with your patch is not that it is wrong, but it is punting the
issue. Say we do have a case in which the res->owner remains unknown
eventhough
the state in not in_prog. In that case, the process will keep looping. By
oopsing, we catch the problem upfront.
Sunil
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-02 22:09 ` Sunil Mushran
2008-12-03 2:23 ` Coly Li
@ 2008-12-03 19:48 ` Coly Li
2008-12-04 1:31 ` tristan.ye
1 sibling, 1 reply; 13+ messages in thread
From: Coly Li @ 2008-12-03 19:48 UTC (permalink / raw)
To: ocfs2-devel
Sunil Mushran Wrote:
[snip]
> Corrections welcome.
>
> Lastly, test the patch not only with the flock() test but also otherwise.
> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> existing functionality. Run all multinode tests in ocfs2-test.
On a 8 nodes cluster setup by xen, I tested these test cases,
- run_buildkernel.py
- cross_delete.py
- dlmstress1.sh
- run_extend_and_write.py
- run_lvb_torture.py
- mmap_test
- test_netfail.py
Not tested cases are run_create_racer.py and open_delete.py, cause I encountered some error when
setup them.
When I ran dlmsress1.sh and test_netfail.py,
- Kernel BUG at fs/ocfs2/dlm/dlmast.c:334 was triggered 3 times.
- Kernel BUG at fs/ocfs2/dlm/dlmmaster.c:1694 was triggered once.
Right now, there is not obvious clue they are directly related to this patch. They might stay there
already, I will do more testing to confirm.
Thanks.
--
Coly Li
SuSE PRC Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
2008-12-03 19:48 ` Coly Li
@ 2008-12-04 1:31 ` tristan.ye
[not found] ` <4940CC19.5010801@suse.de>
0 siblings, 1 reply; 13+ messages in thread
From: tristan.ye @ 2008-12-04 1:31 UTC (permalink / raw)
To: ocfs2-devel
On Thu, 2008-12-04 at 03:48 +0800, Coly Li wrote:
>
> Sunil Mushran Wrote:
> [snip]
>
> > Corrections welcome.
> >
> > Lastly, test the patch not only with the flock() test but also otherwise.
> > Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> > existing functionality. Run all multinode tests in ocfs2-test.
> On a 8 nodes cluster setup by xen, I tested these test cases,
> - run_buildkernel.py
> - cross_delete.py
> - dlmstress1.sh
> - run_extend_and_write.py
> - run_lvb_torture.py
> - mmap_test
> - test_netfail.py
>
> Not tested cases are run_create_racer.py and open_delete.py, cause I encountered some error when
> setup them.
Hi Coly,
How did you setup the env for testing? are you using the latest
testsuite from ocfs2-test.git? As for the cluster testing, we need to
install openmpi first, and setup passwordless ssh&rsh connection among
all nodes.
Besides, all testing binaries need to be run from a installed BINDIR
after a successful building&installation.
Regards,
Tristan
>
> When I ran dlmsress1.sh and test_netfail.py,
> - Kernel BUG at fs/ocfs2/dlm/dlmast.c:334 was triggered 3 times.
> - Kernel BUG at fs/ocfs2/dlm/dlmmaster.c:1694 was triggered once.
> Right now, there is not obvious clue they are directly related to this patch. They might stay there
> already, I will do more testing to confirm.
>
> Thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-12-11 21:10 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-09 2:28 [Ocfs2-devel] flock() race - fix Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 10/11] ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list Sunil Mushran
2008-12-09 2:28 ` [Ocfs2-devel] [PATCH 11/11] ocfs2/dlm: Fix race during lockres mastery Sunil Mushran
2008-12-11 19:04 ` [Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking Coly Li
2008-12-11 19:23 ` Sunil Mushran
2008-12-11 21:10 ` Coly Li
-- strict thread matches above, loose matches on Subject: below --
2008-12-02 12:06 Coly Li
2008-12-02 22:09 ` Sunil Mushran
2008-12-03 2:23 ` Coly Li
2008-12-04 3:58 ` Sunil Mushran
2008-12-03 19:48 ` Coly Li
2008-12-04 1:31 ` tristan.ye
[not found] ` <4940CC19.5010801@suse.de>
2008-12-11 8:26 ` tristan.ye
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.