[PATCH 1/3] autofs4 - expiring filesystem from under process

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/3] autofs4 - expiring filesystem from under process
@ 2005-04-10 12:48 raven
  2005-04-11 17:05 ` Jeff Moyer
  0 siblings, 1 reply; 9+ messages in thread
From: raven @ 2005-04-10 12:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: autofs mailing list, Michael Blandford, linux-fsdevel, Jeff Moyer


autofs4-2.6.12-rc1-mm4-wait-order.patch

It's possible for an event wait request to arive before the event
requestor. If this happens the daemon never gets notified and autofs
hangs.

Signed-off-by: Ian Kent <raven@themaw.net>

--- linux-2.6.12-rc1-mm4/fs/autofs4/waitq.c.wait-order	2005-04-03 12:30:14.000000000 +0800
+++ linux-2.6.12-rc1-mm4/fs/autofs4/waitq.c	2005-04-03 12:31:57.000000000 +0800
@@ -210,17 +210,8 @@ int autofs4_wait(struct autofs_sb_info *
  		wq->len = len;
  		wq->status = -EINTR; /* Status return if interrupted */
  		atomic_set(&wq->wait_ctr, 2);
+		atomic_set(&wq->notified, 1);
  		up(&sbi->wq_sem);
-
-		DPRINTK("new wait id = 0x%08lx, name = %.*s, nfy=%d",
-			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
-		/* autofs4_notify_daemon() may block */
-		if (notify != NFY_NONE) {
-			autofs4_notify_daemon(sbi,wq, 
-					notify == NFY_MOUNT ?
-						  autofs_ptype_missing :
-						  autofs_ptype_expire_multi);
-		}
  	} else {
  		atomic_inc(&wq->wait_ctr);
  		up(&sbi->wq_sem);
@@ -229,6 +220,17 @@ int autofs4_wait(struct autofs_sb_info *
  			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
  	}

+	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
+		int type = (notify == NFY_MOUNT ? 
+			autofs_ptype_missing : autofs_ptype_expire_multi);
+
+		DPRINTK(("new wait id = 0x%08lx, name = %.*s, nfy=%d\n",
+			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify));
+
+		/* autofs4_notify_daemon() may block */
+		autofs4_notify_daemon(sbi, wq, type);
+	}
+
  	/* wq->name is NULL if and only if the lock is already released */

  	if ( sbi->catatonic ) {
--- linux-2.6.12-rc1-mm4/fs/autofs4/autofs_i.h.wait-order	2005-04-03 12:30:24.000000000 +0800
+++ linux-2.6.12-rc1-mm4/fs/autofs4/autofs_i.h	2005-04-03 12:30:46.000000000 +0800
@@ -84,6 +84,7 @@ struct autofs_wait_queue {
  	char *name;
  	/* This is for status reporting upon return */
  	int status;
+	atomic_t notified;
  	atomic_t wait_ctr;
  };


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-10 12:48 [PATCH 1/3] autofs4 - expiring filesystem from under process raven
@ 2005-04-11 17:05 ` Jeff Moyer
  2005-04-12 12:44   ` raven
  2005-04-20 14:34   ` raven
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Moyer @ 2005-04-11 17:05 UTC (permalink / raw)
  To: raven; +Cc: Andrew Morton, autofs mailing list, linux-fsdevel,
	Michael Blandford

==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under process; raven@themaw.net adds:

raven> autofs4-2.6.12-rc1-mm4-wait-order.patch

raven> It's possible for an event wait request to arive before the event
raven> requestor. If this happens the daemon never gets notified and autofs
raven> hangs.

I'm not sure I understand your problem description.  Could you please
elaborate on the set of circumstances that lead to this?

Could also please explain how the following is handled:

expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets
AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
process access the directory in question, causing a call to
try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING flag
is set, and so calls autofs4_wait with notify set to NFY_NONE.  However,
when we take the wq sem, we find that the expire has finished, and thus
create a new wq entry.  Because NFY_NONE is set, we don't tell the daemon.

So how will this process ever get woken up?

Thanks!

Jeff


raven> Signed-off-by: Ian Kent <raven@themaw.net>

raven> --- linux-2.6.12-rc1-mm4/fs/autofs4/waitq.c.wait-order	2005-04-03 12:30:14.000000000 +0800
raven> +++ linux-2.6.12-rc1-mm4/fs/autofs4/waitq.c	2005-04-03 12:31:57.000000000 +0800
raven> @@ -210,17 +210,8 @@ int autofs4_wait(struct autofs_sb_info *
wq-> len = len;
wq-> status = -EINTR; /* Status return if interrupted */
raven>   		atomic_set(&wq->wait_ctr, 2);
raven> +		atomic_set(&wq->notified, 1);
raven>   		up(&sbi->wq_sem);
raven> -
raven> -		DPRINTK("new wait id = 0x%08lx, name = %.*s, nfy=%d",
raven> -			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
raven> -		/* autofs4_notify_daemon() may block */
raven> -		if (notify != NFY_NONE) {
raven> -			autofs4_notify_daemon(sbi,wq, 
raven> -					notify == NFY_MOUNT ?
raven> -						  autofs_ptype_missing :
raven> -						  autofs_ptype_expire_multi);
raven> -		}
raven>   	} else {
raven>   		atomic_inc(&wq->wait_ctr);
raven>   		up(&sbi->wq_sem);
raven> @@ -229,6 +220,17 @@ int autofs4_wait(struct autofs_sb_info *
raven>   			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify);
raven>   	}

raven> +	if (notify != NFY_NONE && atomic_dec_and_test(&wq->notified)) {
raven> +		int type = (notify == NFY_MOUNT ? 
raven> +			autofs_ptype_missing : autofs_ptype_expire_multi);
raven> +
raven> +		DPRINTK(("new wait id = 0x%08lx, name = %.*s, nfy=%d\n",
raven> +			(unsigned long) wq->wait_queue_token, wq->len, wq->name, notify));
raven> +
raven> +		/* autofs4_notify_daemon() may block */
raven> +		autofs4_notify_daemon(sbi, wq, type);
raven> +	}
raven> +
raven>   	/* wq->name is NULL if and only if the lock is already released */

raven>   	if ( sbi->catatonic ) {
raven> --- linux-2.6.12-rc1-mm4/fs/autofs4/autofs_i.h.wait-order	2005-04-03 12:30:24.000000000 +0800
raven> +++ linux-2.6.12-rc1-mm4/fs/autofs4/autofs_i.h	2005-04-03 12:30:46.000000000 +0800
raven> @@ -84,6 +84,7 @@ struct autofs_wait_queue {
raven>   	char *name;
raven>   	/* This is for status reporting upon return */
raven>   	int status;
raven> +	atomic_t notified;
raven>   	atomic_t wait_ctr;
raven>   };

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-11 17:05 ` Jeff Moyer
@ 2005-04-12 12:44   ` raven
  2005-04-20 14:34   ` raven
  1 sibling, 0 replies; 9+ messages in thread
From: raven @ 2005-04-12 12:44 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Andrew Morton, autofs mailing list, linux-fsdevel,
	Michael Blandford

On Mon, 11 Apr 2005, Jeff Moyer wrote:

Glad you were able to find time to review the patches Jeff.
Much appreciated.

> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under process; raven@themaw.net adds:
>
> raven> autofs4-2.6.12-rc1-mm4-wait-order.patch
>
> raven> It's possible for an event wait request to arive before the event
> raven> requestor. If this happens the daemon never gets notified and autofs
> raven> hangs.
>
> I'm not sure I understand your problem description.  Could you please
> elaborate on the set of circumstances that lead to this?

The case is similar to the one you describe below except it's during the 
entry to the expire event.

During an expire, after a dentry is selected to be expired and 
AUTOFS_INF_EXPIRE is set. Another process may take the same path as below 
via try_to_fill_dentry and race for the semaphore. The result is the same 
as your example below.

>
> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets
> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
> process access the directory in question, causing a call to
> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING flag
> is set, and so calls autofs4_wait with notify set to NFY_NONE.  However,
> when we take the wq sem, we find that the expire has finished, and thus
> create a new wq entry.  Because NFY_NONE is set, we don't tell the daemon.
>
> So how will this process ever get woken up?

Never.
Well spotted.

This certainly looks like a potential race.
It appears to be the same case except on exit from the expire instead of 
on entry.

I've missed it till now.

I don't have a solution yet either but I'll give it some thought and see 
what I can come up with.

Ian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-11 17:05 ` Jeff Moyer
  2005-04-12 12:44   ` raven
@ 2005-04-20 14:34   ` raven
  2005-04-20 21:03     ` Jeff Moyer
  1 sibling, 1 reply; 9+ messages in thread
From: raven @ 2005-04-20 14:34 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Andrew Morton, autofs mailing list, Michael Blandford,
	linux-fsdevel

On Mon, 11 Apr 2005, Jeff Moyer wrote:

> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under process; raven@themaw.net adds:
>
> Could also please explain how the following is handled:
>
> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets
> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
> process access the directory in question, causing a call to
> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING flag
> is set, and so calls autofs4_wait with notify set to NFY_NONE.  However,
> when we take the wq sem, we find that the expire has finished, and thus
> create a new wq entry.  Because NFY_NONE is set, we don't tell the daemon.
>
> So how will this process ever get woken up?
>

I've thought about this for quite a while and I think all that's needed is 
to recognise that we're about to expire a dentry that's not mounted 
anymore.

Can you think of a case for which this patch fails?

Ian

--- linux-2.6.12-rc2-mm3/fs/autofs4/waitq.c.post-expire-race	2005-04-20 22:07:32.000000000 +0800
+++ linux-2.6.12-rc2-mm3/fs/autofs4/waitq.c	2005-04-20 22:08:29.000000000 +0800
@@ -191,6 +191,13 @@ int autofs4_wait(struct autofs_sb_info *
  	}

  	if ( !wq ) {
+		/* Can't expire if we are not mounted */
+		if (notify == NFY_EXPIRE && !d_mountpoint(dentry)) {
+			kfree(name);
+			up(&sbi->wq_sem);
+			return -ENOENT;
+		}
+
  		/* Create a new wait queue */
  		wq = kmalloc(sizeof(struct autofs_wait_queue),GFP_KERNEL);
  		if ( !wq ) {

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-20 14:34   ` raven
@ 2005-04-20 21:03     ` Jeff Moyer
  2005-04-21  9:34       ` Ian Kent
  2005-05-23 12:38       ` raven
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Moyer @ 2005-04-20 21:03 UTC (permalink / raw)
  To: raven; +Cc: Andrew Morton, autofs mailing list, Michael Blandford,
	linux-fsdevel

==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under process; raven@themaw.net adds:

raven> On Mon, 11 Apr 2005, Jeff Moyer wrote:
>> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under
>> process; raven@themaw.net adds:
>> 
>> Could also please explain how the following is handled:
>> 
>> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets
>> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
>> process access the directory in question, causing a call to
>> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING
>> flag is set, and so calls autofs4_wait with notify set to NFY_NONE.
>> However, when we take the wq sem, we find that the expire has finished,
>> and thus create a new wq entry.  Because NFY_NONE is set, we don't tell
>> the daemon.
>> 
>> So how will this process ever get woken up?
>> 

raven> I've thought about this for quite a while and I think all that's
raven> needed is to recognise that we're about to expire a dentry that's
raven> not mounted anymore.

raven> Can you think of a case for which this patch fails?

I don't see how the patch you provided addresses the race between an expire
event and a lookup.  The real issue here is that the checking of
AUTOFS_INF_EXPIRING and the list operations associated therewith need to
happen atomically.  Until this happens, we will have races.

The attached patch is more in line with how I think the problem should be
fixed.  I have not yet tested or even compiled this.  Could you please look
this over and comment?

Do you have a reproducer test case for this, by the way?  That would be
vastly useful.

Thanks,

Jeff

--- linux-2.6.9/fs/autofs4/waitq.c.orig	2005-04-20 14:34:40.746320400 -0400
+++ linux-2.6.9/fs/autofs4/waitq.c	2005-04-20 16:39:34.543090128 -0400
@@ -161,6 +161,7 @@ int autofs4_wait(struct autofs_sb_info *
 		enum autofs_notify notify)
 {
 	struct autofs_wait_queue *wq;
+	struct autofs_info *de_info = autofs4_dentry_ino(dentry);
 	char *name;
 	int len, status;
 
@@ -183,6 +184,22 @@ int autofs4_wait(struct autofs_sb_info *
 		return -EINTR;
 	}
 
+	/*
+	 *  The following two notify values are special:
+	 *
+	 *  The only case in which we don:t want notification is when we
+	 *  are trying to fill a dentry and there is an expiry in process.
+	 *  So, we check if we are expiring inside the wq_sem to avoid
+	 *  races.  Notice that we moved the setting of AUTOFS_INF_EXPIRING
+	 *  to inside the lock, as well.
+	 */
+	if (notify & NFY_EXPIRE)
+		de_info->d_flags |= AUTOFS_INF_EXPIRING;
+
+	if (notify & NFY_NONE)
+		if (!de_info->d_flags & AUTOFS_INF_EXPIRING)
+			return NFY_NONE;
+
 	for (wq = sbi->queues ; wq ; wq = wq->next) {
 		if (wq->hash == dentry->d_name.hash &&
 		    wq->len == len &&
@@ -208,6 +225,7 @@ int autofs4_wait(struct autofs_sb_info *
 		wq->hash = dentry->d_name.hash;
 		wq->name = name;
 		wq->len = len;
+		wq->info = de_info;
 		wq->status = -EINTR; /* Status return if interrupted */
 		atomic_set(&wq->wait_ctr, 2);
 		up(&sbi->wq_sem);
@@ -287,6 +305,8 @@ int autofs4_wait_release(struct autofs_s
 	}
 
 	*wql = wq->next;	/* Unlink from chain */
+	/* at this point, expiries have finished */
+	wq->info->flags &= ~AUTOFS_INF_EXPIRING;
 	up(&sbi->wq_sem);
 	kfree(wq->name);
 	wq->name = NULL;	/* Do not wait on this queue */
--- linux-2.6.9/fs/autofs4/expire.c.orig	2005-04-20 15:11:41.940647536 -0400
+++ linux-2.6.9/fs/autofs4/expire.c	2005-04-20 16:00:13.627003928 -0400
@@ -343,13 +343,9 @@ int autofs4_expire_multi(struct super_bl
 		return -EFAULT;
 
 	if ((dentry = autofs4_expire(sb, mnt, sbi, do_now)) != NULL) {
-		struct autofs_info *de_info = autofs4_dentry_ino(dentry);
-
 		/* This is synchronous because it makes the daemon a
                    little easier */
-		de_info->flags |= AUTOFS_INF_EXPIRING;
 		ret = autofs4_wait(sbi, dentry, NFY_EXPIRE);
-		de_info->flags &= ~AUTOFS_INF_EXPIRING;
 		dput(dentry);
 	}
 		
--- linux-2.6.9/fs/autofs4/autofs_i.h.orig	2005-04-20 15:36:49.956394296 -0400
+++ linux-2.6.9/fs/autofs4/autofs_i.h	2005-04-20 16:41:50.617403688 -0400
@@ -82,6 +82,7 @@ struct autofs_wait_queue {
 	int hash;
 	int len;
 	char *name;
+	struct autofs_info *info;
 	/* This is for status reporting upon return */
 	int status;
 	atomic_t wait_ctr;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-20 21:03     ` Jeff Moyer
@ 2005-04-21  9:34       ` Ian Kent
  2005-04-22 18:28         ` Jeff Moyer
  2005-05-23 12:38       ` raven
  1 sibling, 1 reply; 9+ messages in thread
From: Ian Kent @ 2005-04-21  9:34 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Andrew Morton, autofs mailing list, linux-fsdevel,
	Michael Blandford

On Wed, 20 Apr 2005, Jeff Moyer wrote:

> ==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under process; raven@themaw.net adds:
> 
> raven> On Mon, 11 Apr 2005, Jeff Moyer wrote:
> >> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under
> >> process; raven@themaw.net adds:
> >> 
> >> Could also please explain how the following is handled:
> >> 
> >> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets
> >> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
> >> process access the directory in question, causing a call to
> >> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING
> >> flag is set, and so calls autofs4_wait with notify set to NFY_NONE.
> >> However, when we take the wq sem, we find that the expire has finished,
> >> and thus create a new wq entry.  Because NFY_NONE is set, we don't tell
> >> the daemon.
> >> 
> >> So how will this process ever get woken up?
> >> 
> 
> raven> I've thought about this for quite a while and I think all that's
> raven> needed is to recognise that we're about to expire a dentry that's
> raven> not mounted anymore.
> 
> raven> Can you think of a case for which this patch fails?
> 
> I don't see how the patch you provided addresses the race between an expire
> event and a lookup.  The real issue here is that the checking of
> AUTOFS_INF_EXPIRING and the list operations associated therewith need to
> happen atomically.  Until this happens, we will have races.
> 
> The attached patch is more in line with how I think the problem should be
> fixed.  I have not yet tested or even compiled this.  Could you please look
> this over and comment?

Sorry about this but the test should be for a notify event of NFY_NONE not 
NFY_EXPIRE.

I started something similar to what you propose myself but I don't think 
it's needed to resolve the issue.

I'll reconsider, but ...

One possible issue with your proposal is that the dentry info may not 
yet exist for mount requests in directories that are not browsable (ie. 
via lookup) as at this point we have a newly created negative dentry that 
we are attempting to fill in. I'll have to check further to see if this is 
actually the case.

The point of the notify none is to "wait" until the expire operation 
is done then return a fail status so that the following lookup can perform 
the mount.

To explain.

In this case, during the path walk the cached lookup is called as the 
dentry concerned is present (it's dgot) until the very end of the expire. 
The cached lookup fails as the revalidate (NFY_NONE) returns fail and the 
dput allows d_invalidate to succeed. The path walk then calls real lookup 
which triggers a mount following the expire.

There may still be an oppertunity for a race between the time the 
autofs4_release function is called and before the dput is executed. I'll 
think more about that.

But, given the correct test, as far as the original issue you describe is 
concerned I think my recommended patch covers what's needed.

A single counter example is all that's needed.

Thanks for the help Jeff.

> 
> Do you have a reproducer test case for this, by the way?  That would be
> vastly useful.

Not easily any more. I may be able to get Nautilus to duplicate the 
situation as that's were I first saw it ages ago (but couldn't work out 
how to fix it).

This was with 2.6.0-test? but went away with later 2.6 releases.
You might be interested to know that my inability to properly fix this 
is the source nice call in the daemon.

Ian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-21  9:34       ` Ian Kent
@ 2005-04-22 18:28         ` Jeff Moyer
  2005-04-23  3:26           ` raven
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Moyer @ 2005-04-22 18:28 UTC (permalink / raw)
  To: Ian Kent
  Cc: Andrew Morton, autofs mailing list, Michael Blandford,
	linux-fsdevel

==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under process; Ian Kent <raven@themaw.net> adds:

raven> On Wed, 20 Apr 2005, Jeff Moyer wrote:
>> ==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under
>> process; raven@themaw.net adds:
>> 
raven> On Mon, 11 Apr 2005, Jeff Moyer wrote:
>> >> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under >>
>> process; raven@themaw.net adds:
>> >> 
>> >> Could also please explain how the following is handled:
>> >> 
>> >> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets >>
>> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
>> >> process access the directory in question, causing a call to >>
>> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING >>
>> flag is set, and so calls autofs4_wait with notify set to NFY_NONE.  >>
>> However, when we take the wq sem, we find that the expire has finished,
>> >> and thus create a new wq entry.  Because NFY_NONE is set, we don't
>> tell >> the daemon.
>> >> 
>> >> So how will this process ever get woken up?
>> >> 
>> 
raven> I've thought about this for quite a while and I think all that's
raven> needed is to recognise that we're about to expire a dentry that's
raven> not mounted anymore.
>>
raven> Can you think of a case for which this patch fails?
>> I don't see how the patch you provided addresses the race between an
>> expire event and a lookup.  The real issue here is that the checking of
>> AUTOFS_INF_EXPIRING and the list operations associated therewith need to
>> happen atomically.  Until this happens, we will have races.
>> 
>> The attached patch is more in line with how I think the problem should
>> be fixed.  I have not yet tested or even compiled this.  Could you
>> please look this over and comment?

raven> Sorry about this but the test should be for a notify event of
raven> NFY_NONE not NFY_EXPIRE.

Oh, that makes a bit more sense.  So the patch I am commenting on is this:

@@ -191,6 +191,13 @@ int autofs4_wait(struct autofs_sb_info *
  	}

  	if ( !wq ) {
+		/* Can't expire if we are not mounted */
+		if (notify == NFY_NONE && !d_mountpoint(dentry)) {
+			kfree(name);
+			up(&sbi->wq_sem);
+			return -ENOENT;
+		}
+
  		/* Create a new wait queue */
  		wq = kmalloc(sizeof(struct autofs_wait_queue),GFP_KERNEL);
  		if ( !wq ) {

The original case you mentioned was where the expire is racing with an
access at the beginning of the expiry.  So, they both enter autofs4_wait,
one with NFY_EXPIRE and the other with NFY_NONE.  The caller with NFY_NONE
gets the semaphore first, and so doesn't notify the daemon.  The NFY_EXPIRE
caller then gets added to the wait list, and no one wakes them up.  I don't
belive your patch addresses that race, since d_mountpoint(dentry) will
still return true.

raven> I started something similar to what you propose myself but I don't
raven> think it's needed to resolve the issue.

raven> I'll reconsider, but ...

raven> One possible issue with your proposal is that the dentry info may
raven> not yet exist for mount requests in directories that are not
raven> browsable (ie. via lookup) as at this point we have a newly created
raven> negative dentry that we are attempting to fill in. I'll have to
raven> check further to see if this is actually the case.

Okay, I'll look into this further, as well.

raven> The point of the notify none is to "wait" until the expire operation
raven> is done then return a fail status so that the following lookup can
raven> perform the mount.

Right.

raven> To explain.

raven> In this case, during the path walk the cached lookup is called as
raven> the dentry concerned is present (it's dgot) until the very end of
raven> the expire. The cached lookup fails as the revalidate (NFY_NONE)
raven> returns fail and the dput allows d_invalidate to succeed. The path
raven> walk then calls real lookup which triggers a mount following the
raven> expire.

raven> There may still be an oppertunity for a race between the time the
raven> autofs4_release function is called and before the dput is
raven> executed. I'll think more about that.

raven> But, given the correct test, as far as the original issue you
raven> describe is concerned I think my recommended patch covers what's
raven> needed.

raven> A single counter example is all that's needed.

raven> Thanks for the help Jeff.

No problem.  In case you are actually considering the patch I sent earlier,
I missed a hunk which changes fs/autofs4/root.c.  Let me know if you want
me to send that to you.  It basically changed try_to_fill_dentry to call
autofs4_wait unconditionally with NFY_NONE.

-Jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-22 18:28         ` Jeff Moyer
@ 2005-04-23  3:26           ` raven
  0 siblings, 0 replies; 9+ messages in thread
From: raven @ 2005-04-23  3:26 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Andrew Morton, autofs mailing list, Michael Blandford,
	linux-fsdevel

On Fri, 22 Apr 2005, Jeff Moyer wrote:

> ==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under process; Ian Kent <raven@themaw.net> adds:
>
> raven> On Wed, 20 Apr 2005, Jeff Moyer wrote:
>>> ==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under
>>> process; raven@themaw.net adds:
>>>
> raven> On Mon, 11 Apr 2005, Jeff Moyer wrote:
>>>>> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under >>
>>> process; raven@themaw.net adds:
>>>>>
>>>>> Could also please explain how the following is handled:
>>>>>
>>>>> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets >>
>>> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
>>>>> process access the directory in question, causing a call to >>
>>> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING >>
>>> flag is set, and so calls autofs4_wait with notify set to NFY_NONE.  >>
>>> However, when we take the wq sem, we find that the expire has finished,
>>>>> and thus create a new wq entry.  Because NFY_NONE is set, we don't
>>> tell >> the daemon.
>>>>>
>>>>> So how will this process ever get woken up?
>>>>>
>>>
> raven> I've thought about this for quite a while and I think all that's
> raven> needed is to recognise that we're about to expire a dentry that's
> raven> not mounted anymore.
>>>
> raven> Can you think of a case for which this patch fails?
>>> I don't see how the patch you provided addresses the race between an
>>> expire event and a lookup.  The real issue here is that the checking of
>>> AUTOFS_INF_EXPIRING and the list operations associated therewith need to
>>> happen atomically.  Until this happens, we will have races.
>>>
>>> The attached patch is more in line with how I think the problem should
>>> be fixed.  I have not yet tested or even compiled this.  Could you
>>> please look this over and comment?
>
> raven> Sorry about this but the test should be for a notify event of
> raven> NFY_NONE not NFY_EXPIRE.
>
> Oh, that makes a bit more sense.  So the patch I am commenting on is this:

That's right.

>
> @@ -191,6 +191,13 @@ int autofs4_wait(struct autofs_sb_info *
>  	}
>
>  	if ( !wq ) {
> +		/* Can't expire if we are not mounted */
> +		if (notify == NFY_NONE && !d_mountpoint(dentry)) {
> +			kfree(name);
> +			up(&sbi->wq_sem);
> +			return -ENOENT;
> +		}
> +
>  		/* Create a new wait queue */
>  		wq = kmalloc(sizeof(struct autofs_wait_queue),GFP_KERNEL);
>  		if ( !wq ) {
>
> The original case you mentioned was where the expire is racing with an
> access at the beginning of the expiry.  So, they both enter autofs4_wait,
> one with NFY_EXPIRE and the other with NFY_NONE.  The caller with NFY_NONE
> gets the semaphore first, and so doesn't notify the daemon.  The NFY_EXPIRE
> caller then gets added to the wait list, and no one wakes them up.  I don't
> belive your patch addresses that race, since d_mountpoint(dentry) will
> still return true.

Now my little brain is spinning.
I think that's not quite what we are discussing here.

The original patch from this thread is meant to address the race at expire 
initiation and I still believe it does. On lead in to the expire 
d_mountpoint should always be true so this test will always be false.

Following that you gave an example of a race at the tail end of the expire 
(good catch) which I believe the above patch resolves.

I thought about the possibilities for each of these cases for several days 
and I'm fairly confident that the two patches together address both 
issues.

Please, if you can think of a counter example for either case I'll be 
happy to reconsider, as always.

>
> raven> I started something similar to what you propose myself but I don't
> raven> think it's needed to resolve the issue.
>
> raven> I'll reconsider, but ...
>
> raven> One possible issue with your proposal is that the dentry info may
> raven> not yet exist for mount requests in directories that are not
> raven> browsable (ie. via lookup) as at this point we have a newly created
> raven> negative dentry that we are attempting to fill in. I'll have to
> raven> check further to see if this is actually the case.
>
> Okay, I'll look into this further, as well.
>
> raven> The point of the notify none is to "wait" until the expire operation
> raven> is done then return a fail status so that the following lookup can
> raven> perform the mount.
>
> Right.
>
> raven> To explain.
>
> raven> In this case, during the path walk the cached lookup is called as
> raven> the dentry concerned is present (it's dgot) until the very end of
> raven> the expire. The cached lookup fails as the revalidate (NFY_NONE)
> raven> returns fail and the dput allows d_invalidate to succeed. The path
> raven> walk then calls real lookup which triggers a mount following the
> raven> expire.
>
> raven> There may still be an oppertunity for a race between the time the
> raven> autofs4_release function is called and before the dput is
> raven> executed. I'll think more about that.
>
> raven> But, given the correct test, as far as the original issue you
> raven> describe is concerned I think my recommended patch covers what's
> raven> needed.
>
> raven> A single counter example is all that's needed.
>
> raven> Thanks for the help Jeff.
>
> No problem.  In case you are actually considering the patch I sent earlier,
> I missed a hunk which changes fs/autofs4/root.c.  Let me know if you want
> me to send that to you.  It basically changed try_to_fill_dentry to call
> autofs4_wait unconditionally with NFY_NONE.

I'll give that some thought but I suspect it will produce incorrect 
behaviour. I haven't checked but when the lookup is called it may prevent 
mounts from happening when they should.

We are focusing on interactions with expire here but I agree we probably 
need to do a similar analysis for the mount case. OTOH that hasn't been 
problematic as far a I can remember.

Ian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] autofs4 - expiring filesystem from under process
  2005-04-20 21:03     ` Jeff Moyer
  2005-04-21  9:34       ` Ian Kent
@ 2005-05-23 12:38       ` raven
  1 sibling, 0 replies; 9+ messages in thread
From: raven @ 2005-05-23 12:38 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Andrew Morton, autofs mailing list, Michael Blandford,
	linux-fsdevel


Sorry to take so long to get back to this Jeff.

On Wed, 20 Apr 2005, Jeff Moyer wrote:
> ==> Regarding Re: [PATCH 1/3] autofs4 - expiring filesystem from under process; raven@themaw.net adds:
>
> raven> On Mon, 11 Apr 2005, Jeff Moyer wrote:
>>> ==> Regarding [PATCH 1/3] autofs4 - expiring filesystem from under
>>> process; raven@themaw.net adds:
>>>
>>> Could also please explain how the following is handled:
>>>
>>> expire process runs and issues AUTOFS_EXPIRE_MULTI, which sets
>>> AUTOFS_INF_EXPIRING in flags.  While the expire is in progress, another
>>> process access the directory in question, causing a call to
>>> try_to_fill_dentry.  try_to_fill_dentry sees the AUTOFS_INF_EXPIRING
>>> flag is set, and so calls autofs4_wait with notify set to NFY_NONE.
>>> However, when we take the wq sem, we find that the expire has finished,
>>> and thus create a new wq entry.  Because NFY_NONE is set, we don't tell
>>> the daemon.
>>>
>>> So how will this process ever get woken up?
>>>
>
> raven> I've thought about this for quite a while and I think all that's
> raven> needed is to recognise that we're about to expire a dentry that's
> raven> not mounted anymore.
>
> raven> Can you think of a case for which this patch fails?
>
> I don't see how the patch you provided addresses the race between an expire
> event and a lookup.  The real issue here is that the checking of
> AUTOFS_INF_EXPIRING and the list operations associated therewith need to
> happen atomically.  Until this happens, we will have races.

I don't think that's quite how this process works.

In this case, since the the directory exists, we always come to 
autofs4_wait via autofs4_revalidate and only during an expire. If the 
expire finishes and we attempt to create a new wait as you describe then 
all that needs to be done is to return a fail instead of waiting and rely 
on the subsequent autofs4_lookup call to perform the requested mount.

For the normal case, when the process waits and then returns at expire 
completion has the same semantics. So for a NFY_NONE wait request, 
checking if the dentry is a mountpoint and returning -ENOENT is sufficient 
(in fact the return code itself is not significant). Please check 
do_lookup in namei.c and I think you will see what I mean.

Ian

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-05-23 12:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-10 12:48 [PATCH 1/3] autofs4 - expiring filesystem from under process raven
2005-04-11 17:05 ` Jeff Moyer
2005-04-12 12:44   ` raven
2005-04-20 14:34   ` raven
2005-04-20 21:03     ` Jeff Moyer
2005-04-21  9:34       ` Ian Kent
2005-04-22 18:28         ` Jeff Moyer
2005-04-23  3:26           ` raven
2005-05-23 12:38       ` raven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).