public inbox for linux-kernel@vger.kernel.org
* [-mm patch] Fix inotify umount hangs.
@ 2005-07-04 19:28 Anton Altaparmakov
  2005-07-05 15:20 ` Robert Love
  0 siblings, 1 reply; 5+ messages in thread
From: Anton Altaparmakov @ 2005-07-04 19:28 UTC (permalink / raw)
  To: Andrew Morton, Robert Love; +Cc: linux-kernel

Hi Andrew, Robert,

The below patch against 2.6.13-rc1-mm1 fixes the umount hangs caused by 
inotify.

It excludes more inodes from being messed around with in 
inotify_unmount_inodes(): the ones with zero i_count, as they cannot have 
any watches, and the I_WILL_FREE ones, on which it is not allowed to do 
__iget().

Note that at this stage MS_ACTIVE is cleared on the superblock, which 
means that iput() does not need shrink_icache_memory(); it evicts inodes 
reaching zero i_count itself and immediately.  Thus, if inotify does 
__iget() and then iput() on such a zero i_count inode, it will evict it 
from the icache and from the i_sb_list!  So the assumption that i_sb_list 
does not change whilst inotify_unmount_inodes() is running is bogus.  
Even with this patch the list changes, e.g. when watches are removed 
i_count can hit zero and the iput() at the bottom of the 
list_for_each_safe() loop _will_ modify i_sb_list by evicting the inode 
from the icache there and then.

Further to this, NTFS drops references to internal system inodes (also in 
the icache) during its ->put_inode() method.  For example, this happens 
for directory inodes: their bitmap inode is iput when the directory has 
its final iput done on it, which can be triggered by a removed inotify 
watch.  This can cause the "next_i" inode to also be evicted, which leads 
to hangs and crashes since it is no longer on i_sb_list and is in fact 
freed memory, which it is illegal to access.
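To make the failure mode concrete, here is a minimal userspace C model of 
it (all names here are made up for illustration; none of this is kernel 
code).  A walk that caches the next pointer before running the loop body, 
as list_for_each_entry_safe() does, goes wrong as soon as the body can 
drop the last reference to that next element:

```c
#include <assert.h>
#include <stddef.h>

struct ino {
	struct ino *next;
	int freed;
	struct ino *also_drops;	/* models NTFS ->put_inode() dropping a
				   reference on another inode */
};

/* "Final put": mark the inode freed; if it pins another inode, the
 * cascade frees that one too. */
static void final_put(struct ino *i)
{
	i->freed = 1;
	if (i->also_drops)
		final_put(i->also_drops);
}

/* Walk with a pre-fetched next pointer.  Returns 1 if the walk would
 * ever advance to an already-freed node, i.e. a use-after-free. */
static int unsafe_walk(struct ino *head)
{
	for (struct ino *i = head; i; ) {
		struct ino *next = i->next;	/* cached before the body */
		final_put(i);			/* may also free "next"! */
		if (next && next->freed)
			return 1;		/* dangling next pointer */
		i = next;
	}
	return 0;
}
```

With three nodes a -> b -> c where putting a also drops b (the 
directory/bitmap case), the walk arrives at a freed b; pinning b with an 
extra reference before putting a is exactly what prevents this.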

This patch solves this by taking a reference on next_i for as long as it 
is needed, again with the same constraints as placed on taking a 
reference on the inode itself.

This may not be the prettiest but it works.  (-:

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>

NB. A cleaner solution _might_ be (I am not sure, haven't tried to 
write it) to use a hand-rolled for loop instead of list_for_each_safe(), 
protect the current inode with a reference, and only drop that reference 
once the next inode has had a reference obtained on it, but this pretty 
much amounts to what I do in this patch...
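For what it is worth, the reference dance the patch performs can be 
modeled on a userspace doubly-linked list like this (a sketch with 
made-up names: refcount stands in for i_count, node_put() for the final 
iput() evicting a zero-count inode; the real kernel code differs in its 
locking and i_state checks):

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *prev, *next;
	int refcount;		/* stands in for i_count */
	int has_watch;
};

static struct node head = { &head, &head, 0, 0 };

/* Append a node; a watched node starts with one reference (the watch). */
static struct node *node_new(int watched)
{
	struct node *n = malloc(sizeof(*n));

	n->refcount = watched ? 1 : 0;
	n->has_watch = watched;
	n->next = &head;
	n->prev = head.prev;
	head.prev->next = n;
	head.prev = n;
	return n;
}

/* Drop a reference; the final put unlinks and frees the node, just as
 * iput() evicts a zero i_count inode once MS_ACTIVE is clear. */
static void node_put(struct node *n)
{
	if (--n->refcount == 0) {
		n->prev->next = n->next;
		n->next->prev = n->prev;
		free(n);
	}
}

/* Remove every watch, pinning both the current node and the next one so
 * neither can be freed under our feet.  Returns the number of watches
 * removed. */
static int unmount_walk(void)
{
	struct node *n, *next, *need_put = NULL;
	int dropped = 0;

	for (n = head.next; n != &head; n = next) {
		struct node *need_put_tmp;

		next = n->next;
		if (n->refcount == 0)	/* no references, so no watches */
			continue;
		need_put_tmp = need_put;
		need_put = NULL;
		if (n != need_put_tmp)
			n->refcount++;		/* like __iget(inode) */
		else
			need_put_tmp = NULL;	/* reuse last pass's pin */
		if (next != &head && next->refcount) {
			next->refcount++;	/* pin next_i */
			need_put = next;
		}
		if (need_put_tmp)
			node_put(need_put_tmp);	/* drop the stale pin */
		if (n->has_watch) {
			n->has_watch = 0;
			node_put(n);	/* the watch's reference */
			dropped++;
		}
		node_put(n);	/* our pin; may free and unlink n */
	}
	return dropped;
}

/* Count the nodes still on the list. */
static int list_len(void)
{
	int len = 0;

	for (struct node *n = head.next; n != &head; n = n->next)
		len++;
	return len;
}
```

Watched nodes end up freed and off the list once their watch (and our 
pin) is dropped; an unwatched node survives at refcount zero, like a zero 
i_count inode left behind for invalidate_inodes().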

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

--- linux-2.6.13-rc1-mm1-vanilla/fs/inotify.c	2005-07-01 14:51:09.000000000 +0100
+++ linux-2.6.13-rc1-mm1/fs/inotify.c	2005-07-04 20:04:31.000000000 +0100
@@ -560,43 +560,65 @@ EXPORT_SYMBOL_GPL(inotify_get_cookie);
  */
 void inotify_unmount_inodes(struct list_head *list)
 {
-	struct inode *inode, *next_i;
+	struct inode *inode, *next_i, *need_iput = NULL;
 
 	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
+		struct inode *need_iput_tmp;
 		struct inotify_watch *watch, *next_w;
 		struct list_head *watches;
 
 		/*
-		 * We cannot __iget() an inode in state I_CLEAR or I_FREEING,
-		 * which is fine becayse by that point the inode cannot have
-		 * any associated watches.
+		 * If i_count is zero, the inode cannot have any watches and
+		 * doing an __iget/iput with MS_ACTIVE clear would actually
+		 * evict all inodes with zero i_count from icache which is
+		 * unnecessarily violent and may in fact be illegal to do.
 		 */
-		if (inode->i_state & (I_CLEAR | I_FREEING))
+		if (!atomic_read(&inode->i_count))
 			continue;
-
-		/* In case the remove_watch() drops a reference */
-		__iget(inode);
-
 		/*
-		 * We can safely drop inode_lock here because the per-sb list
-		 * of inodes must not change during unmount and iprune_sem
-		 * keeps shrink_icache_memory() away.
+		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
+		 * I_WILL_FREE which is fine because by that point the inode
+		 * cannot have any associated watches.
+		 */
+		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
+			continue;
+		need_iput_tmp = need_iput;
+		need_iput = NULL;
+		/* In case the remove_watch() drops a reference. */
+		if (inode != need_iput_tmp)
+			__iget(inode);
+		else
+			need_iput_tmp = NULL;
+		/* In case the dropping of a reference would nuke next_i. */
+		if ((&next_i->i_sb_list != list) &&
+				atomic_read(&next_i->i_count) &&
+				!(next_i->i_state & (I_CLEAR | I_FREEING |
+				I_WILL_FREE))) {
+			__iget(next_i);
+			need_iput = next_i;
+		}
+		/*
+		 * We can safely drop inode_lock here because we hold
+		 * references on both inode and next_i.  Also no new inodes
+		 * will be added since the umount has begun.  Finally,
+		 * iprune_sem keeps shrink_icache_memory() away.
 		 */
 		spin_unlock(&inode_lock);
-
-		/* for each watch, send IN_UNMOUNT and then remove it */
+		if (need_iput_tmp)
+			iput(need_iput_tmp);
+		/* For each watch, send IN_UNMOUNT and then remove it. */
 		down(&inode->inotify_sem);
 		watches = &inode->inotify_watches;
 		list_for_each_entry_safe(watch, next_w, watches, i_list) {
 			struct inotify_device *dev = watch->dev;
 			down(&dev->sem);
-			inotify_dev_queue_event(dev, watch, IN_UNMOUNT,0,NULL);
+			inotify_dev_queue_event(dev, watch, IN_UNMOUNT, 0,
+					NULL);
 			remove_watch(watch, dev);
 			up(&dev->sem);
 		}
 		up(&inode->inotify_sem);
 		iput(inode);
-
 		spin_lock(&inode_lock);
 	}
 }


* Re: [-mm patch] Fix inotify umount hangs.
  2005-07-04 19:28 [-mm patch] Fix inotify umount hangs Anton Altaparmakov
@ 2005-07-05 15:20 ` Robert Love
  2005-07-05 17:03   ` Anton Altaparmakov
  0 siblings, 1 reply; 5+ messages in thread
From: Robert Love @ 2005-07-05 15:20 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Andrew Morton, linux-kernel

On Mon, 2005-07-04 at 20:28 +0100, Anton Altaparmakov wrote:

> The below patch against 2.6.13-rc1-mm1 fixes the umount hangs caused by 
> inotify.

Thank you, very much, Anton, for hacking on this over the weekend.

It's definitely not the prettiest thing, but there may be no easier
approach.  One thing: the messy code is working around the list
changing; doesn't invalidate_inodes() have the same problem?  If so, it
must solve it differently.

I'm also curious if the I_WILL_FREE or i_count check fixed the bug.  I
suspect the other fix did, but we probably still want this.  Or at least
the I_WILL_FREE check.

Anyhow... I'll send out an updated inotify patch after some testing.
Thanks again.

	Robert Love




* Re: [-mm patch] Fix inotify umount hangs.
  2005-07-05 15:20 ` Robert Love
@ 2005-07-05 17:03   ` Anton Altaparmakov
  2005-07-05 17:11     ` Anton Altaparmakov
  0 siblings, 1 reply; 5+ messages in thread
From: Anton Altaparmakov @ 2005-07-05 17:03 UTC (permalink / raw)
  To: Robert Love; +Cc: Andrew Morton, linux-kernel

On Tue, 5 Jul 2005, Robert Love wrote:
> On Mon, 2005-07-04 at 20:28 +0100, Anton Altaparmakov wrote:
> > The below patch against 2.6.13-rc1-mm1 fixes the umount hangs caused by 
> > inotify.
> 
> Thank you, very much, Anton, for hacking on this over the weekend.

You are welcome.  (-:

> It's definitely not the prettiest thing, but there may be no easier
> approach.  One thing, the messy code is working around the list
> changing, doesn't invalidate_inodes() have the same problem?  If so, it
> solves it differently.

It does.  It first goes over the i_sb_list and moves every inode it is 
interested in (i.e. all inodes with zero i_count) over to a private list 
via its i_list (this is done in invalidate_list()).  Then, when it is 
finished accessing the i_sb_list and all inodes of interest (zero 
i_count) are on the private list, dispose_list() is called and all inodes 
on the private list are exterminated.  This obviously no longer uses 
i_sb_list, so it does not matter that the list is changing by then.
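The same two-phase idea can be sketched in userspace C (made-up names; 
invalidate_list() and dispose_list() themselves look different): pass one 
unlinks every zero-count item onto a private list while walking the 
shared list once, and pass two frees the private list without touching 
the shared list again:

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	struct item *next;
	int count;		/* stands in for i_count */
};

/* Push a new item with the given reference count onto a list. */
static struct item *push(struct item *list, int count)
{
	struct item *it = malloc(sizeof(*it));

	it->count = count;
	it->next = list;
	return it;
}

/* Pass one: move all zero-count items from *shared onto a private list. */
static struct item *collect_idle(struct item **shared)
{
	struct item *private_list = NULL;
	struct item **pp = shared;

	while (*pp) {
		struct item *it = *pp;

		if (it->count == 0) {
			*pp = it->next;		/* unlink from shared list */
			it->next = private_list;
			private_list = it;
		} else {
			pp = &it->next;
		}
	}
	return private_list;
}

/* Pass two: free the private list.  The shared list is never touched
 * here, so it no longer matters if that one is changing.  Returns the
 * number of items freed. */
static int dispose(struct item *list)
{
	int freed = 0;

	while (list) {
		struct item *next = list->next;

		free(list);
		list = next;
		freed++;
	}
	return freed;
}
```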

> I'm also curious if the I_WILL_FREE or i_count check fixed the bug.  I
> suspect the other fix did, but we probably still want this.  Or at least
> the I_WILL_FREE check.

The i_count check is at least sensible, if not required (not sure); 
otherwise you do __iget() on an inode with zero i_count, then waste your 
time looking for watches (which cannot be there, or i_count would not be 
zero), and then iput() kills off the inode and throws it out of the 
icache.  This would be done in invalidate_inodes() anyway, so there is no 
need for you to do it first.  Even if this is not a required check, it 
saves some cpu cycles.

The I_WILL_FREE check is definitely required; otherwise you will catch 
inodes that suddenly become I_FREEING and then I_CLEAR under your feet 
once you drop the inode_lock, and we know that is a Bad Thing (TM), as it 
causes the BUG_ON(i_state == I_CLEAR) in iput() to trigger that was 
reported before (when I got drawn into inotify in the first place).

Note that if you want to be really thorough you could wait on the inode 
to be destroyed for good:

if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
	__wait_on_freeing_inode(inode);

And then re-check in the i_sb_list if the inode is still there (e.g. via 
prev member of next_i->i_sb_list which will no longer be "inode" if the 
inode has been evicted).  If the inode is still there someone 
"reactivated" it while you were waiting and you need to redo the 
i_count and i_state checks and deal with the inode as appropriate.

However given this is umount we are talking about there doesn't seem to be 
much point in being that thorough.

I am not familiar enough with inotify, but _if_ it is possible for a user 
to get a watch on an inode which is I_FREEING|I_CLEAR|I_WILL_FREE then 
you have to do the waiting, otherwise you will miss that watch, with I 
don't know what results but probably not good ones...

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


* Re: [-mm patch] Fix inotify umount hangs.
  2005-07-05 17:03   ` Anton Altaparmakov
@ 2005-07-05 17:11     ` Anton Altaparmakov
  2005-07-05 20:43       ` Anton Altaparmakov
  0 siblings, 1 reply; 5+ messages in thread
From: Anton Altaparmakov @ 2005-07-05 17:11 UTC (permalink / raw)
  To: Robert Love; +Cc: Andrew Morton, linux-kernel

On Tue, 5 Jul 2005, Anton Altaparmakov wrote:
> On Tue, 5 Jul 2005, Robert Love wrote:
> > On Mon, 2005-07-04 at 20:28 +0100, Anton Altaparmakov wrote:
> > > The below patch against 2.6.13-rc1-mm1 fixes the umount hangs caused by 
> > > inotify.
> > 
> > Thank you, very much, Anton, for hacking on this over the weekend.
> 
> You are welcome.  (-:
> 
> > It's definitely not the prettiest thing, but there may be no easier
> > approach.  One thing, the messy code is working around the list
> > changing, doesn't invalidate_inodes() have the same problem?  If so, it
> > solves it differently.
> 
> It does.  It first goes over the i_sb_list and moves every inode it is 
> interested in (i.e. all inodes with zero i_count) over to a private list 
> via its i_list (this is done in invalidate_list()).  Then, when it is 
> finished accessing the i_sb_list and all inodes of interest (zero 
> i_count) are on the private list, dispose_list() is called and all 
> inodes on the private list are exterminated.  This obviously no longer 
> uses i_sb_list, so it does not matter that the list is changing by then.
> 
> > I'm also curious if the I_WILL_FREE or i_count check fixed the bug.  I
> > suspect the other fix did, but we probably still want this.  Or at least
> > the I_WILL_FREE check.
> 
> The i_count check is at least sensible, if not required (not sure); 
> otherwise you do __iget() on an inode with zero i_count, then waste your 
> time looking for watches (which cannot be there, or i_count would not be 
> zero), and then iput() kills off the inode and throws it out of the 
> icache.  This would be done in invalidate_inodes() anyway, so there is 
> no need for you to do it first.  Even if this is not a required check, 
> it saves some cpu cycles.
> 
> The I_WILL_FREE check is definitely required; otherwise you will catch 
> inodes that suddenly become I_FREEING and then I_CLEAR under your feet 
> once you drop the inode_lock, and we know that is a Bad Thing (TM), as 
> it causes the BUG_ON(i_state == I_CLEAR) in iput() to trigger that was 
> reported before (when I got drawn into inotify in the first place).
> 
> Note that if you want to be really thorough you could wait on the inode 
> to be destroyed for good:
> 
> if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> 	__wait_on_freeing_inode(inode);
> 
> And then re-check in the i_sb_list if the inode is still there (e.g. via 
> prev member of next_i->i_sb_list which will no longer be "inode" if the 
> inode has been evicted).  If the inode is still there someone 
> "reactivated" it while you were waiting and you need to redo the 
> i_count and i_state checks and deal with the inode as appropriate.
> 
> However given this is umount we are talking about there doesn't seem to be 
> much point in being that thorough.
> 
> I am not familiar enough with inotify, but _if_ it is possible for a 
> user to get a watch on an inode which is I_FREEING|I_CLEAR|I_WILL_FREE 
> then you have to do the waiting, otherwise you will miss that watch, 
> with I don't know what results but probably not good ones...

Actually given that watches increment i_count, the results would be quite 
obvious if it were possible for this to happen.  The user would see my 
favourite printk (see fs/super.c::generic_shutdown_super()) in dmesg! (-;

printk("VFS: Busy inodes after unmount. "
   "Self-destruct in 5 seconds.  Have a nice day...\n");

Cheers,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


* Re: [-mm patch] Fix inotify umount hangs.
  2005-07-05 17:11     ` Anton Altaparmakov
@ 2005-07-05 20:43       ` Anton Altaparmakov
  0 siblings, 0 replies; 5+ messages in thread
From: Anton Altaparmakov @ 2005-07-05 20:43 UTC (permalink / raw)
  To: Robert Love; +Cc: Andrew Morton, linux-kernel

On Tue, 5 Jul 2005, Anton Altaparmakov wrote:
> On Tue, 5 Jul 2005, Anton Altaparmakov wrote:
> > On Tue, 5 Jul 2005, Robert Love wrote:
> > > On Mon, 2005-07-04 at 20:28 +0100, Anton Altaparmakov wrote:
> > > > The below patch against 2.6.13-rc1-mm1 fixes the umount hangs caused by 
> > > > inotify.
> > > 
> > > Thank you, very much, Anton, for hacking on this over the weekend.
> > 
> > You are welcome.  (-:
> > 
> > > It's definitely not the prettiest thing, but there may be no easier
> > > approach.  One thing, the messy code is working around the list
> > > changing, doesn't invalidate_inodes() have the same problem?  If so, it
> > > solves it differently.
> > 
> > It does.  It first goes over the i_sb_list and moves every inode it is 
> > interested in (i.e. all inodes with zero i_count) over to a private 
> > list via its i_list (this is done in invalidate_list()).  Then, when 
> > it is finished accessing the i_sb_list and all inodes of interest 
> > (zero i_count) are on the private list, dispose_list() is called and 
> > all inodes on the private list are exterminated.  This obviously no 
> > longer uses i_sb_list, so it does not matter that the list is changing 
> > by then.
> > 
> > > I'm also curious if the I_WILL_FREE or i_count check fixed the bug.  I
> > > suspect the other fix did, but we probably still want this.  Or at least
> > > the I_WILL_FREE check.
> > 
> > The i_count check is at least sensible, if not required (not sure); 
> > otherwise you do __iget() on an inode with zero i_count, then waste 
> > your time looking for watches (which cannot be there, or i_count would 
> > not be zero), and then iput() kills off the inode and throws it out of 
> > the icache.  This would be done in invalidate_inodes() anyway, so 
> > there is no need for you to do it first.  Even if this is not a 
> > required check, it saves some cpu cycles.
> > 
> > The I_WILL_FREE check is definitely required; otherwise you will catch 
> > inodes that suddenly become I_FREEING and then I_CLEAR under your feet 
> > once you drop the inode_lock, and we know that is a Bad Thing (TM), as 
> > it causes the BUG_ON(i_state == I_CLEAR) in iput() to trigger that was 
> > reported before (when I got drawn into inotify in the first place).
> > 
> > Note that if you want to be really thorough you could wait on the inode 
> > to be destroyed for good:
> > 
> > if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> > 	__wait_on_freeing_inode(inode);
> > 
> > And then re-check in the i_sb_list if the inode is still there (e.g. via 
> > prev member of next_i->i_sb_list which will no longer be "inode" if the 
> > inode has been evicted).  If the inode is still there someone 
> > "reactivated" it while you were waiting and you need to redo the 
> > i_count and i_state checks and deal with the inode as appropriate.
> > 
> > However given this is umount we are talking about there doesn't seem to be 
> > much point in being that thorough.
> > 
> > I am not familiar enough with inotify, but _if_ it is possible for a 
> > user to get a watch on an inode which is I_FREEING|I_CLEAR|I_WILL_FREE 
> > then you have to do the waiting, otherwise you will miss that watch, 
> > with I don't know what results but probably not good ones...
> 
> Actually given that watches increment i_count, the results would be quite 
> obvious if it were possible for this to happen.  The user would see my 
> favourite printk (see fs/super.c::generic_shutdown_super()) in dmesg! (-;
> 
> printk("VFS: Busy inodes after unmount. "
>    "Self-destruct in 5 seconds.  Have a nice day...\n");

I had a look at this and it is safe, i.e. to register a watch you need to 
open(2) a file first and you cannot do an open(2) at the point in time 
when invalidate_inodes() is called in the umount(2) code path.  And you 
cannot get that far into the umount(2) code path if a file is still open 
(you would get -EBUSY much earlier on and the umount would fail).  Thus it 
is not possible for a watch to be created when inotify_umount_inodes() is 
running.  Thus it is not necessary to wait_on_inode() when an inode is in 
i_state I_CLEAR|I_FREEING|I_WILL_FREE.  So my patch stands as being 
complete and sufficient for safety (AFAICS).  (-:

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

