linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
@ 2005-04-19 12:38 Artem B. Bityuckiy
  0 siblings, 0 replies; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-19 12:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Woodhouse, linux-mtd, viro, linux-fsdevel

Hello,

here is a patch to fix the problem discussed at the "[PATC] small VFS
change for JFFS2" thread in LKML (http://lkml.org/lkml/2005/4/18/77).

The problem description:
~~~~~~~~~~~~~~~~~~~~~~

prune_icache() removes inodes from the inode hash (inode->i_hash) and
drops the node_lock spinlock. If at that moment iget() is called, we end
up with the situation when VFS calls ->read_inode() twice for the same
inode without calling ->clear_inode() between. This happens despite of
the I_FREEING inode state because the inode is already removed from the
hash by the time find_inode_fast() is invoked.

The fix is: do not remove the inode from the hash too early.

The following patch fixes the problem. It was tested with JFFS2 (only)
and works perfectly.

Comments?



Signed-off-by: Artem B. Bityuckiy <dedekind@infradead.org>


diff -auNrp linux-2.6.11.5/fs/inode.c linux-2.6.11.5_fixed/fs/inode.c
--- linux-2.6.11.5/fs/inode.c   2005-03-19 09:35:04.000000000 +0300
+++ linux-2.6.11.5_fixed/fs/inode.c     2005-04-18 17:54:16.000000000
+0400
@@ -284,6 +284,12 @@ static void dispose_list(struct list_hea
                if (inode->i_data.nrpages)
                        truncate_inode_pages(&inode->i_data, 0);
                clear_inode(inode);
+
+               spin_lock(&inode_lock);
+               hlist_del_init(&inode->i_hash);
+               list_del_init(&inode->i_sb_list);
+               spin_unlock(&inode_lock);
+
                destroy_inode(inode);
                nr_disposed++;
        }
@@ -319,8 +325,6 @@ static int invalidate_list(struct list_h
                inode = list_entry(tmp, struct inode, i_sb_list);
                invalidate_inode_buffers(inode);
                if (!atomic_read(&inode->i_count)) {
-                       hlist_del_init(&inode->i_hash);
-                       list_del(&inode->i_sb_list);
                        list_move(&inode->i_list, dispose);
                        inode->i_state |= I_FREEING;
                        count++;
@@ -455,8 +459,6 @@ static void prune_icache(int nr_to_scan)
                        if (!can_unuse(inode))
                                continue;
                }
-               hlist_del_init(&inode->i_hash);
-               list_del_init(&inode->i_sb_list);
                list_move(&inode->i_list, &freeable);
                inode->i_state |= I_FREEING;
                nr_pruned++;

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
@ 2005-04-27 13:15 Artem B. Bityuckiy
  2005-04-27 13:42 ` Jan Harkes
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-27 13:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Woodhouse, viro, linux-fsdevel

Dear VFS developers,

is it possible to review/comment/apply the patch which fixes a severe
VFS bug which screws up JFFS2? 

(the third attempt).

The problem description:
~~~~~~~~~~~~~~~~~~~~~~

prune_icache() removes inodes from the inode hash (inode->i_hash) and
drops the node_lock spinlock. If at that moment iget() is called, we end
up with the situation when VFS calls ->read_inode() twice for the same
inode without calling ->clear_inode() between. This happens despite of
the I_FREEING inode state because the inode is already removed from the
hash by the time find_inode_fast() is invoked.

The fix is: do not remove the inode from the hash too early.

The following patch fixes the problem. It was tested with JFFS2 (only)
and works perfectly.

Comments?



Signed-off-by: Artem B. Bityuckiy <dedekind@infradead.org>


diff -auNrp linux-2.6.11.5/fs/inode.c linux-2.6.11.5_fixed/fs/inode.c
--- linux-2.6.11.5/fs/inode.c   2005-03-19 09:35:04.000000000 +0300
+++ linux-2.6.11.5_fixed/fs/inode.c     2005-04-18 17:54:16.000000000
+0400
@@ -284,6 +284,12 @@ static void dispose_list(struct list_hea
                if (inode->i_data.nrpages)
                        truncate_inode_pages(&inode->i_data, 0);
                clear_inode(inode);
+
+               spin_lock(&inode_lock);
+               hlist_del_init(&inode->i_hash);
+               list_del_init(&inode->i_sb_list);
+               spin_unlock(&inode_lock);
+
                destroy_inode(inode);
                nr_disposed++;
        }
@@ -319,8 +325,6 @@ static int invalidate_list(struct list_h
                inode = list_entry(tmp, struct inode, i_sb_list);
                invalidate_inode_buffers(inode);
                if (!atomic_read(&inode->i_count)) {
-                       hlist_del_init(&inode->i_hash);
-                       list_del(&inode->i_sb_list);
                        list_move(&inode->i_list, dispose);
                        inode->i_state |= I_FREEING;
                        count++;
@@ -455,8 +459,6 @@ static void prune_icache(int nr_to_scan)
                        if (!can_unuse(inode))
                                continue;
                }
-               hlist_del_init(&inode->i_hash);
-               list_del_init(&inode->i_sb_list);
                list_move(&inode->i_list, &freeable);
                inode->i_state |= I_FREEING;
                nr_pruned++;

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-27 13:15 [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Artem B. Bityuckiy
@ 2005-04-27 13:42 ` Jan Harkes
  2005-04-27 14:22 ` Miklos Szeredi
  2005-04-27 15:57 ` Miklos Szeredi
  2 siblings, 0 replies; 21+ messages in thread
From: Jan Harkes @ 2005-04-27 13:42 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-kernel, David Woodhouse, viro, linux-fsdevel

On Wed, Apr 27, 2005 at 05:15:41PM +0400, Artem B. Bityuckiy wrote:
> Dear VFS developers,
> 
> is it possible to review/comment/apply the patch which fixes a severe
> VFS bug which screws up JFFS2? 
> 
> (the third attempt).

I think this is a problem that hit Coda at some point. We used to have a
linked list that linked all the inodes together. However not long after
the RCU changes, I noticed that this linked list occasionally got
corrupted. I didn't want to dig too deep into all the dcache related
changes where I thought the problem came from, nobody else seemed to be
complaining and the linked list wasn't really necessary so I worked
around the problem by removing the linked list strucure.

But I think you found the actual cause and this patch looks good to me,
it probably should get merged.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-27 13:15 [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Artem B. Bityuckiy
  2005-04-27 13:42 ` Jan Harkes
@ 2005-04-27 14:22 ` Miklos Szeredi
  2005-04-27 15:57 ` Miklos Szeredi
  2 siblings, 0 replies; 21+ messages in thread
From: Miklos Szeredi @ 2005-04-27 14:22 UTC (permalink / raw)
  To: dedekind; +Cc: linux-kernel, dwmw2, viro, linux-fsdevel

> The following patch fixes the problem. It was tested with JFFS2 (only)
> and works perfectly.

You should send it to Andrew Morton, so it gets some more testing.
FWIW it looks good to me too.

Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-27 13:15 [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Artem B. Bityuckiy
  2005-04-27 13:42 ` Jan Harkes
  2005-04-27 14:22 ` Miklos Szeredi
@ 2005-04-27 15:57 ` Miklos Szeredi
  2005-04-27 16:19   ` Artem B. Bityuckiy
  2 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2005-04-27 15:57 UTC (permalink / raw)
  To: dedekind; +Cc: linux-kernel, dwmw2, viro, linux-fsdevel

> Comments?
> 

On second thought a wake_up_inode() seems to be missing in
dispose_list() just before destroy_inode().  

Also I'm not sure delaying removal from i_sb_list is the right thing.
generic_delete_inode() does this before clear_inode().

Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-27 15:57 ` Miklos Szeredi
@ 2005-04-27 16:19   ` Artem B. Bityuckiy
       [not found]     ` <E1DQqZu-0002Rf-00@dorka.pomaz.szeredi.hu>
  0 siblings, 1 reply; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-27 16:19 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, dwmw2, viro, linux-fsdevel, akpm

В Срд, 27/04/2005 в 17:57 +0200, Miklos Szeredi пишет:
> On second thought a wake_up_inode() seems to be missing in
> dispose_list() just before destroy_inode().  
Good point, thanks.

> Also I'm not sure delaying removal from i_sb_list is the right thing.
> generic_delete_inode() does this before clear_inode().
As I can see, dispose_list() doesn't correlate with generic_delete_inode
() code path, so this mustn't be a problem.

What am I supposed to do next? I already sent the old patch with the
error to Andrew Morton. Should I notify him about this? Just resend?
Whatever?

Cheers,
Artem.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
       [not found]     ` <E1DQqZu-0002Rf-00@dorka.pomaz.szeredi.hu>
@ 2005-04-28  7:32       ` Artem B. Bityuckiy
  2005-04-28  7:34         ` Andrew Morton
  2005-04-28  7:41         ` [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Miklos Szeredi
  0 siblings, 2 replies; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-28  7:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, dwmw2, linux-fsdevel, akpm

> Why do you need to move it from prune_icache() to dispose_list()?
> For the hash list it's the right thing, but for sb_list it's not
> needed, is it?
Yes, it is not needed but harmless. I did it only because i_hash &
i_sb_list insertions/deletions always come in couple. So I decided move
them both, to be more consistent, to make code less complicated.

If you regard this hazardous I might split these removals. But IMHO, my
variant is a bit more pleasant.

> Retest, and resend.
Thanks.

I assume I don't need to notify Andrew about the inconsistency in the
old patch, or should I?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-28  7:32       ` Artem B. Bityuckiy
@ 2005-04-28  7:34         ` Andrew Morton
  2005-05-04 12:17           ` Artem B. Bityuckiy
  2005-04-28  7:41         ` [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Miklos Szeredi
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2005-04-28  7:34 UTC (permalink / raw)
  To: dedekind; +Cc: miklos, linux-kernel, dwmw2, linux-fsdevel

"Artem B. Bityuckiy" <dedekind@infradead.org> wrote:
>
>  I assume I don't need to notify Andrew about the inconsistency in the
>  old patch, or should I?

Nope, just send the new patch.  Please be sure to include a complete and
up-to-date explanation of what it does, and why - I haven't looked at this
yet.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-28  7:32       ` Artem B. Bityuckiy
  2005-04-28  7:34         ` Andrew Morton
@ 2005-04-28  7:41         ` Miklos Szeredi
  2005-04-28  7:47           ` Artem B. Bityuckiy
  1 sibling, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2005-04-28  7:41 UTC (permalink / raw)
  To: dedekind; +Cc: linux-kernel, dwmw2, linux-fsdevel, akpm

> > Why do you need to move it from prune_icache() to dispose_list()?
> > For the hash list it's the right thing, but for sb_list it's not
> > needed, is it?
> Yes, it is not needed but harmless. I did it only because i_hash &
> i_sb_list insertions/deletions always come in couple. So I decided move
> them both, to be more consistent, to make code less complicated.
> 
> If you regard this hazardous I might split these removals. But IMHO, my
> variant is a bit more pleasant.

It's not just pleasentness.  You should be _very_ careful with any
changes you make to this kind of code, and have a very clear
explanation why the change is needed, and why it won't do any trouble.

I didn't actually think about the sb_list stuff, but my feeling is you
should not move it unless there's a very clear reason to do so.

Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-28  7:41         ` [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Miklos Szeredi
@ 2005-04-28  7:47           ` Artem B. Bityuckiy
  0 siblings, 0 replies; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-28  7:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, dwmw2, linux-fsdevel, akpm

> I didn't actually think about the sb_list stuff, but my feeling is you
> should not move it unless there's a very clear reason to do so.

Well, no problems. Probably safeness is of greater priority. I'll take
care not to touch it.

Thanks for review.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-04-28  7:34         ` Andrew Morton
@ 2005-05-04 12:17           ` Artem B. Bityuckiy
  2005-05-04 20:04             ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-05-04 12:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: miklos, linux-kernel, dwmw2, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

Hello Andrew,

here you can find a new patch for the VFS bug which was discussed at
http://lkml.org/lkml/2005/4/27/84

I added wake_up_inode() invocation just as Miklos suggested.


Bug symptoms
~~~~~~~~~~~~
For the same inode VFS calls read_inode() twice and doesn't call
clear_inode() between the two read_inode() invocations.

Bug description
~~~~~~~~~~~~~~~
Suppose we have an inode which has zero reference count but is still in
the inode cache. Suppose kswapd invokes shrink_icache_memory() to free
some RAM. In prune_icache() inodes are removed from i_hash. prune_icache
() is then going to call clear_inode(), but drops the inode_lock
spinlock before this. If in this moment another task calls iget() for an
inode which was just removed from i_hash by prune_icache(), then iget()
invokes read_inode() for this inode, because it is *already removed*
from i_hash.

The end result is: we call iget(#N) then iput(#N); inode #N has zero
i_count now and is in the inode cache; kswapd starts. kswapd removes the
inode #N from i_hash ans is preempted; we call iget(#N) again;
read_inode() is invoked as the result; but we expect clear_inode()
before.

Fix
~~~~~~~
To fix the bug I remove inodes from i_hash later, when clear_inode() is
actually called. I remove them from i_hash under spinlock protection.
Since the i_state is set to I_FREEING, it is safe to do this. The others
will sleep waiting for the inode state change.

I also postpone removing inodes from i_sb_list. It is not compulsory to
do so but I do it for readability reasons. Inodes are added/removed to
the lists together everywhere in the code and there is no point to
change this rule. This is harmless because the only user of i_sb_list
which somehow may interfere with me (invalidate_list()) is excluded by
the iprune_sem mutex.

The same race is possible in invalidate_list() so I do the same for it.

The patch is against linux 2.6.11.5.
The patch was tested for JFFS2.

Please. apply/comment.

Cheers,
Artem.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

[-- Attachment #2: vfs-double_inode_read-2.diff --]
[-- Type: text/x-patch, Size: 1190 bytes --]

diff -auNrp linux-2.6.11.5/fs/inode.c linux-2.6.11.5_fixed/fs/inode.c
--- linux-2.6.11.5/fs/inode.c	2005-03-19 09:35:04.000000000 +0300
+++ linux-2.6.11.5_fixed/fs/inode.c	2005-05-04 14:51:14.000000000 +0400
@@ -284,6 +284,13 @@ static void dispose_list(struct list_hea
 		if (inode->i_data.nrpages)
 			truncate_inode_pages(&inode->i_data, 0);
 		clear_inode(inode);
+		
+		spin_lock(&inode_lock);
+		hlist_del_init(&inode->i_hash);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_lock);
+		
+		wake_up_inode(inode);
 		destroy_inode(inode);
 		nr_disposed++;
 	}
@@ -319,8 +326,6 @@ static int invalidate_list(struct list_h
 		inode = list_entry(tmp, struct inode, i_sb_list);
 		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
-			hlist_del_init(&inode->i_hash);
-			list_del(&inode->i_sb_list);
 			list_move(&inode->i_list, dispose);
 			inode->i_state |= I_FREEING;
 			count++;
@@ -455,8 +460,6 @@ static void prune_icache(int nr_to_scan)
 			if (!can_unuse(inode))
 				continue;
 		}
-		hlist_del_init(&inode->i_hash);
-		list_del_init(&inode->i_sb_list);
 		list_move(&inode->i_list, &freeable);
 		inode->i_state |= I_FREEING;
 		nr_pruned++;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-05-04 12:17           ` Artem B. Bityuckiy
@ 2005-05-04 20:04             ` Andrew Morton
  2005-05-04 21:35               ` David Woodhouse
  2005-06-13 14:45               ` Synchronous FAT Artem B. Bityuckiy
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2005-05-04 20:04 UTC (permalink / raw)
  To: dedekind; +Cc: miklos, linux-kernel, dwmw2, linux-fsdevel

"Artem B. Bityuckiy" <dedekind@infradead.org> wrote:
>
> Bug symptoms
> ~~~~~~~~~~~~
> For the same inode VFS calls read_inode() twice and doesn't call
> clear_inode() between the two read_inode() invocations.
> 
> Bug description
> ~~~~~~~~~~~~~~~
> Suppose we have an inode which has zero reference count but is still in
> the inode cache. Suppose kswapd invokes shrink_icache_memory() to free
> some RAM. In prune_icache() inodes are removed from i_hash. prune_icache
> () is then going to call clear_inode(), but drops the inode_lock
> spinlock before this. If in this moment another task calls iget() for an
> inode which was just removed from i_hash by prune_icache(), then iget()
> invokes read_inode() for this inode, because it is *already removed*
> from i_hash.

This sounds more like a bug in the iget() caller to me.

Question is: if the inode has zero refcount and is unhashed then how did
the caller get its sticky paws onto the inode* in the first place?

If the caller had saved a copy of the inode* in local storage then the
caller should have taken a ref against the inode.

If the caller had just looked up the inode via hastable lookup via
iget_whatever() then again the caller will have a ref on the inode.

So.  Please tell us more about how the caller got into this situation.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-05-04 20:04             ` Andrew Morton
@ 2005-05-04 21:35               ` David Woodhouse
  2005-05-04 21:58                 ` Andrew Morton
  2005-06-13 14:45               ` Synchronous FAT Artem B. Bityuckiy
  1 sibling, 1 reply; 21+ messages in thread
From: David Woodhouse @ 2005-05-04 21:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dedekind, miklos, linux-kernel, linux-fsdevel

On Wed, 2005-05-04 at 13:04 -0700, Andrew Morton wrote:
> This sounds more like a bug in the iget() caller to me.
> 
> Question is: if the inode has zero refcount and is unhashed then how
> did the caller get its sticky paws onto the inode* in the first place?
> 
> If the caller had saved a copy of the inode* in local storage then the
> caller should have taken a ref against the inode.
> 
> If the caller had just looked up the inode via hastable lookup via
> iget_whatever() then again the caller will have a ref on the inode.
> 
> So.  Please tell us more about how the caller got into this situation.

I could explain in detail how JFFS2 garbage collection works, moving log
entries out of the way by calling iget() on the inode to which they
belong.... or I could just say "NFS".

-- 
dwmw2



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-05-04 21:35               ` David Woodhouse
@ 2005-05-04 21:58                 ` Andrew Morton
  2005-05-05  9:10                   ` David Woodhouse
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2005-05-04 21:58 UTC (permalink / raw)
  To: David Woodhouse; +Cc: dedekind, miklos, linux-kernel, linux-fsdevel

David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Wed, 2005-05-04 at 13:04 -0700, Andrew Morton wrote:
> > This sounds more like a bug in the iget() caller to me.
> > 
> > Question is: if the inode has zero refcount and is unhashed then how
> > did the caller get its sticky paws onto the inode* in the first place?
> > 
> > If the caller had saved a copy of the inode* in local storage then the
> > caller should have taken a ref against the inode.
> > 
> > If the caller had just looked up the inode via hastable lookup via
> > iget_whatever() then again the caller will have a ref on the inode.
> > 
> > So.  Please tell us more about how the caller got into this situation.
> 
> I could explain in detail how JFFS2 garbage collection works, moving log
> entries out of the way by calling iget() on the inode to which they
> belong.... or I could just say "NFS".
> 

That doesn't really settle the question of whether the callers are broken,
whether they are doing something which the VFS really should support and
what need to be done to the VFS to support it properly.

Looking at the proposed patch: what happens if an inode is on its way to
dispose_list() and someone tries to do an iget() on it?  I don't think I_LOCK
is set, so __wait_on_freeing_inode() will no longer provide this guarantee:

/*
 * If we try to find an inode in the inode hash while it is being deleted, we
 * have to wait until the filesystem completes its deletion before reporting
 * that it isn't found.  This is because iget will immediately call
 * ->read_inode, and we want to be sure that evidence of the deletion is found
 * by ->read_inode.

Not only will we break the __wait_on_freeing_inode() guarantee, but we'll
break it in extremely rare circumstances.

And that's just from a quick peek.  There may be other problems.  Worried.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-05-04 21:58                 ` Andrew Morton
@ 2005-05-05  9:10                   ` David Woodhouse
  2005-05-05 16:18                     ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: David Woodhouse @ 2005-05-05  9:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dedekind, miklos, linux-kernel, linux-fsdevel

On Wed, 2005-05-04 at 14:58 -0700, Andrew Morton wrote:
> That doesn't really settle the question of whether the callers are broken,
> whether they are doing something which the VFS really should support and
> what need to be done to the VFS to support it properly.

Filesystems exported by NFS _will_ get iget() called for recently-
deleted inodes. JFFS2 and (I believe) NTFS also do internal things which
end up having the same effect.

The premise is simple: regardless of who calls iget() and when they do
it, the VFS should not call the filesystem's read_inode() method twice
consecutively for the same inode without ever calling clear_inode() or
delete_inode() in between.

That's what __wait_on_freeing_inode() was introduced for -- so we can
make sure the clear_inode() call actually happened before we call
read_inode() again for the same inode. Unfortunately there is still a
code path where we can get it wrong, and that's what Artem is fixing.

> Looking at the proposed patch: what happens if an inode is on its way to
> dispose_list() and someone tries to do an iget() on it?  I don't think I_LOCK
> is set, so __wait_on_freeing_inode() will no longer provide this guarantee:

> /*
>  * If we try to find an inode in the inode hash while it is being deleted, we
>  * have to wait until the filesystem completes its deletion before reporting
>  * that it isn't found.  This is because iget will immediately call
>  * ->read_inode, and we want to be sure that evidence of the deletion is found
>  * by ->read_inode.

That comment isn't true any more. Look at what __wait_on_freeing_inode()
actually does, and observe the fact that all its callers actually loop
and start again after calling it. 

The current implementation of __wait_on_freeing_inode() waits until it
_might_ have changed, not until it _has_ changed. That's why it's OK for
it just to be a yield() or a wait on a bit_waitqueue.

I'm not convinced I _like_ that implementation, mind you -- it's changed
since I last looked at it. But I don't see that there's anything
strictly broken about it.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-05-05  9:10                   ` David Woodhouse
@ 2005-05-05 16:18                     ` Miklos Szeredi
  2005-05-06 11:08                       ` David Woodhouse
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2005-05-05 16:18 UTC (permalink / raw)
  To: dwmw2; +Cc: akpm, dedekind, linux-kernel, linux-fsdevel

> That comment isn't true any more. Look at what __wait_on_freeing_inode()
> actually does, and observe the fact that all its callers actually loop
> and start again after calling it. 
> 
> The current implementation of __wait_on_freeing_inode() waits until it
> _might_ have changed, not until it _has_ changed. That's why it's OK for
> it just to be a yield() or a wait on a bit_waitqueue.
> 
> I'm not convinced I _like_ that implementation, mind you -- it's changed
> since I last looked at it. But I don't see that there's anything
> strictly broken about it.

Using yield() to wait for a precisely defined event (clear_inode()
finishing) doesn't seem like a very good idea.  Especially, since
Artem's patch will probably make it trigger more often.

How about this (totally untested) patch?  Even if I_LOCK is not set
initially, wake_up_inode() should do the right thing and wake up the
waiting task after clear_inode().  It shouldn't cause spurious
wakeups, since there should be no other reference to the inode.

Miklos

--- inode.c~	2005-05-02 11:24:49.000000000 +0200
+++ inode.c	2005-05-05 18:12:57.000000000 +0200
@@ -1264,18 +1264,6 @@ static void __wait_on_freeing_inode(stru
 {
 	wait_queue_head_t *wq;
 	DEFINE_WAIT_BIT(wait, &inode->i_state, __I_LOCK);
-
-	/*
-	 * I_FREEING and I_CLEAR are cleared in process context under
-	 * inode_lock, so we have to give the tasks who would clear them
-	 * a chance to run and acquire inode_lock.
-	 */
-	if (!(inode->i_state & I_LOCK)) {
-		spin_unlock(&inode_lock);
-		yield();
-		spin_lock(&inode_lock);
-		return;
-	}
 	wq = bit_waitqueue(&inode->i_state, __I_LOCK);
 	prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
 	spin_unlock(&inode_lock);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between
  2005-05-05 16:18                     ` Miklos Szeredi
@ 2005-05-06 11:08                       ` David Woodhouse
  0 siblings, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2005-05-06 11:08 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, dedekind, linux-kernel, linux-fsdevel

On Thu, 2005-05-05 at 18:18 +0200, Miklos Szeredi wrote:
> Using yield() to wait for a precisely defined event (clear_inode()
> finishing) doesn't seem like a very good idea.  Especially, since
> Artem's patch will probably make it trigger more often.

Agreed. Even before Artem's patch, we're still effectively busy-waiting
for something which calls back into the file system's clear_inode method
and may well sleep and perform I/O.

> How about this (totally untested) patch?  Even if I_LOCK is not set
> initially, wake_up_inode() should do the right thing and wake up the
> waiting task after clear_inode().  It shouldn't cause spurious
> wakeups, since there should be no other reference to the inode.

Since Artem introduced a wake_up_inode() in dispose_list(), your patch
seems reasonable.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Synchronous FAT
  2005-05-04 20:04             ` Andrew Morton
  2005-05-04 21:35               ` David Woodhouse
@ 2005-06-13 14:45               ` Artem B. Bityuckiy
  2005-06-14  1:06                 ` Coywolf Qi Hunt
  1 sibling, 1 reply; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-06-13 14:45 UTC (permalink / raw)
  To: linux-fsdevel

Did somebody try to work with vfat mounted with -osync flag in 
linux-2.6.12-rc6 or linux-2.6.12-rc2 ? Is it supposed to be OK?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Synchronous FAT
  2005-06-13 14:45               ` Synchronous FAT Artem B. Bityuckiy
@ 2005-06-14  1:06                 ` Coywolf Qi Hunt
  2005-06-14 12:16                   ` Artem B. Bityuckiy
  0 siblings, 1 reply; 21+ messages in thread
From: Coywolf Qi Hunt @ 2005-06-14  1:06 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-fsdevel

On 6/13/05, Artem B. Bityuckiy <dedekind@yandex.ru> wrote:
> Did somebody try to work with vfat mounted with -osync flag in
> linux-2.6.12-rc6 or linux-2.6.12-rc2 ? Is it supposed to be OK?

>From the manual, "the  sync  option today has effect only for ext2,
ext3 and ufs".

-- 
Coywolf Qi Hunt
http://ahbl.org/~coywolf/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Synchronous FAT
  2005-06-14  1:06                 ` Coywolf Qi Hunt
@ 2005-06-14 12:16                   ` Artem B. Bityuckiy
  2005-06-15  1:19                     ` Coywolf Qi Hunt
  0 siblings, 1 reply; 21+ messages in thread
From: Artem B. Bityuckiy @ 2005-06-14 12:16 UTC (permalink / raw)
  To: Coywolf Qi Hunt; +Cc: linux-fsdevel

Coywolf Qi Hunt wrote:
>>From the manual, "the  sync  option today has effect only for ext2,
> ext3 and ufs".
It was added recently. But it oopses on one of my embedded devices.


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Synchronous FAT
  2005-06-14 12:16                   ` Artem B. Bityuckiy
@ 2005-06-15  1:19                     ` Coywolf Qi Hunt
  0 siblings, 0 replies; 21+ messages in thread
From: Coywolf Qi Hunt @ 2005-06-15  1:19 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-fsdevel

On 6/14/05, Artem B. Bityuckiy <dedekind@yandex.ru> wrote:
> Coywolf Qi Hunt wrote:
> >>From the manual, "the  sync  option today has effect only for ext2,
> > ext3 and ufs".
> It was added recently. But it oopses on one of my embedded devices.
> 

Yes, it was added in 2.6.11-mm2. Show your oops and CC lkml, someone
will come up. BTW, sync option destroys flash.
-- 
Coywolf Qi Hunt
http://ahbl.org/~coywolf/

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2005-06-15  1:19 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-27 13:15 [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Artem B. Bityuckiy
2005-04-27 13:42 ` Jan Harkes
2005-04-27 14:22 ` Miklos Szeredi
2005-04-27 15:57 ` Miklos Szeredi
2005-04-27 16:19   ` Artem B. Bityuckiy
     [not found]     ` <E1DQqZu-0002Rf-00@dorka.pomaz.szeredi.hu>
2005-04-28  7:32       ` Artem B. Bityuckiy
2005-04-28  7:34         ` Andrew Morton
2005-05-04 12:17           ` Artem B. Bityuckiy
2005-05-04 20:04             ` Andrew Morton
2005-05-04 21:35               ` David Woodhouse
2005-05-04 21:58                 ` Andrew Morton
2005-05-05  9:10                   ` David Woodhouse
2005-05-05 16:18                     ` Miklos Szeredi
2005-05-06 11:08                       ` David Woodhouse
2005-06-13 14:45               ` Synchronous FAT Artem B. Bityuckiy
2005-06-14  1:06                 ` Coywolf Qi Hunt
2005-06-14 12:16                   ` Artem B. Bityuckiy
2005-06-15  1:19                     ` Coywolf Qi Hunt
2005-04-28  7:41         ` [PATCH] VFS bugfix: two read_inode() calles without clear_inode() call between Miklos Szeredi
2005-04-28  7:47           ` Artem B. Bityuckiy
  -- strict thread matches above, loose matches on Subject: below --
2005-04-19 12:38 Artem B. Bityuckiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).