* Busy inodes after unmount followed by Oops
@ 2004-01-09 11:06 James Pearson
  2004-01-09 19:13 ` [NFS] " Ian Kent
  2004-05-21 15:44 ` James Pearson
  0 siblings, 2 replies; 11+ messages in thread
From: James Pearson @ 2004-01-09 11:06 UTC (permalink / raw)
  To: nfs, autofs

There was a long thread a few months ago about this subject:

http://marc.theaimsgroup.com/?t=106332683300004&r=1&w=2

and

http://marc.theaimsgroup.com/?t=106340013500006&r=1&w=2

I've read the posts but as far as I can tell, I can't find a 'solution'
to the problem (but I may have missed it in all the posts!)

I have the same problem with a large number of dual CPU machines running
2.4.20 and above that make heavy use of autofs (v4.0.0pre10).

We get a:

VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice
day...

message, followed some time later by Oopses from kswapd, umount or some
other user application.

Is there a 'fix' for this problem?

Thanks

James Pearson


_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: [NFS] Busy inodes after unmount followed by Oops
  2004-01-09 11:06 Busy inodes after unmount followed by Oops James Pearson
@ 2004-01-09 19:13 ` Ian Kent
  2004-05-21 15:44 ` James Pearson
  1 sibling, 0 replies; 11+ messages in thread
From: Ian Kent @ 2004-01-09 19:13 UTC (permalink / raw)
  To: James Pearson; +Cc: autofs, nfs

On Fri, 9 Jan 2004, James Pearson wrote:

> I have the same problem with a large number of dual CPU machines running
> 2.4.20 and above that make heavy use of autofs (v4.0.0pre10).
>
> We get a:
>
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice
> day...
>
> message, followed some time later by Oopses from kswapd, umount or some
> other user application.
>
> Is there a 'fix' for this problem?

I would very much appreciate if you would use 4.1.0 (on kernel.org). This
may help with the problem. I'm interested to hear how it goes against the
stock autofs4 module.

I have a patch set for 2.6 but, from a recent discussion, it may be faulty.

Ian


* Re: Busy inodes after unmount followed by Oops
  2004-01-09 11:06 Busy inodes after unmount followed by Oops James Pearson
  2004-01-09 19:13 ` [NFS] " Ian Kent
@ 2004-05-21 15:44 ` James Pearson
  2004-05-21 16:44     ` Trond Myklebust
  1 sibling, 1 reply; 11+ messages in thread
From: James Pearson @ 2004-05-21 15:44 UTC (permalink / raw)
  To: nfs, autofs

James Pearson wrote:
> 
> There was a long thread a few months ago about this subject:
> 
> http://marc.theaimsgroup.com/?t=106332683300004&r=1&w=2
> 
> and
> 
> http://marc.theaimsgroup.com/?t=106340013500006&r=1&w=2
> 
> I've read the posts but as far as I can tell, I can't find a 'solution'
> to the problem (but I may have missed it in all the posts!)
> 
> I have the same problem with a large number of dual CPU machines running
> 2.4.20 and above that make heavy use of autofs (v4.0.0pre10).
> 
> We get a:
> 
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice
> day...
> 
> message, followed some time later by Oopses from kswapd, umount or some
> other user application.
> 
> Is there a 'fix' for this problem?

Just to follow up on my original posting back in January, the problem
above appears to have been solved by using Ian Kent's latest autofs
kernel patch (20040508) from
http://www.kernel.org/pub/linux/daemons/autofs/v4 - we are also running
the latest 4.1.2/4.1.3 automount daemon.

We've been running this patch on over 400 machines for the last 10 or so
days, and have not seen any instance of this 'VFS: Busy inodes after
unmount/Oops' problem above - previously, we would have seen a
significant number of these in the same time scale (and we are still
seeing the problem on machines not using the patch).

Thanks

James Pearson


* Re: Re: Busy inodes after unmount followed by Oops
  2004-05-21 15:44 ` James Pearson
@ 2004-05-21 16:44     ` Trond Myklebust
  0 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2004-05-21 16:44 UTC (permalink / raw)
  To: James Pearson; +Cc: nfs, autofs

On Fri, 21/05/2004 at 11:44, James Pearson wrote:

> We've been running this patch on over 400 machines for the last 10 or so
> days, and have not seen any instance of this 'VFS: Busy inodes after
> unmount/Oops' problem above - previously, we would have seen a
> significant number of these in the same time scale (and we are still
> seeing the problem on machines not using the patch).

Note that Greg's patch for solving the NFS client issues that were
revealed by this bug is in 2.4.26 (and in 2.6.5)...

Cheers,
  Trond




* Re: [NFS] Re: Busy inodes after unmount followed by Oops
  2004-05-21 16:44     ` Trond Myklebust
  (?)
@ 2004-05-21 16:58     ` James Pearson
  2004-05-22 13:04       ` Greg Banks
  -1 siblings, 1 reply; 11+ messages in thread
From: James Pearson @ 2004-05-21 16:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: autofs, nfs

Trond Myklebust wrote:
> 
> On Fri, 21/05/2004 at 11:44, James Pearson wrote:
> 
> > We've been running this patch on over 400 machines for the last 10 or so
> > days, and have not seen any instance of this 'VFS: Busy inodes after
> > unmount/Oops' problem above - previously, we would have seen a
> > significant number of these in the same time scale (and we are still
> > seeing the problem on machines not using the patch).
> 
> Note that Greg's patch for solving the NFS client issues that were
> revealed by this bug is in 2.4.26 (and in 2.6.5)...

Not quite sure what you mean by this - I had tried Greg's patch
(http://marc.theaimsgroup.com/?l=linux-nfs&m=107604754127538&w=2)
previously - but it made no difference in my case. The kernel I'm using
now has both Greg's patch and Ian's recent autofs4 patch.

James Pearson


* Re: [NFS] Re: Busy inodes after unmount followed by Oops
  2004-05-21 16:44     ` Trond Myklebust
  (?)
  (?)
@ 2004-05-22  3:41     ` raven
  -1 siblings, 0 replies; 11+ messages in thread
From: raven @ 2004-05-22  3:41 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: autofs, James Pearson, nfs

On Fri, 21 May 2004, Trond Myklebust wrote:

> On Fri, 21/05/2004 at 11:44, James Pearson wrote:
> 
> > We've been running this patch on over 400 machines for the last 10 or so
> > days, and have not seen any instance of this 'VFS: Busy inodes after
> > unmount/Oops' problem above - previously, we would have seen a
> > significant number of these in the same time scale (and we are still
> > seeing the problem on machines not using the patch).
> 
> Note that Greg's patch for solving the NFS client issues that were
> revealed by this bug is in 2.4.26 (and in 2.6.5)...
> 

Yes, I'm sure that Greg's patch helps for the rename race it's meant to
fix. However, I believe the main issue revealed at James' site is a race
for the wait q struct (ref. waitq.c in autofs4). This struct had gone
entirely unprotected in 2.4 until I synced my 2.4 patch with what I've done
for 2.6. This is what James is using.

Ian


* Re: [NFS] Re: Busy inodes after unmount followed by Oops
  2004-05-21 16:58     ` [NFS] " James Pearson
@ 2004-05-22 13:04       ` Greg Banks
  2004-05-22 15:15         ` raven
  2004-05-24 15:51         ` James Pearson
  0 siblings, 2 replies; 11+ messages in thread
From: Greg Banks @ 2004-05-22 13:04 UTC (permalink / raw)
  To: James Pearson; +Cc: autofs, nfs, Trond Myklebust

On Fri, May 21, 2004 at 05:58:26PM +0100, James Pearson wrote:
> Not quite sure what you mean by this - I had tried Greg's patch
> (http://marc.theaimsgroup.com/?l=linux-nfs&m=107604754127538&w=2)
> previously - but it made no difference in my case. The kernel I'm using
> now has both Greg's patch and Ian's recent autofs4 patch.

The error message and subsequent oops are both generic symptoms and
could come from any kind of race with umount which causes a dentry
or inode reference count leak, not just the particular one in NFS
which I fixed.  There could well be another NFS bug like this, or
one in autofs.  Ian's patch may have fixed it, hidden it, or just
stopped tickling it.

The only way to tell for sure is to modify the code that generates
the message to BUG() instead and use a kernel debugger to figure out
what has gone wrong.  Note that using a debugger at oops time is
already too late.
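
To make the failure mode concrete, here is a minimal user-space sketch in C. It is not kernel code and not taken from any patch in this thread: toy_inode, toy_super, toy_get, toy_put and toy_umount are hypothetical names invented for illustration. It assumes only what is said above, namely that an unbalanced reference leaves an inode busy at unmount time, and it shows why stopping at the point of detection (the role BUG() would play) keeps the evidence intact, whereas a later oops does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT the kernel's struct inode / super_block. */
struct toy_inode {
    int refcount;               /* references still held on this inode */
};

struct toy_super {
    struct toy_inode *inodes;   /* all inodes belonging to this mount */
    size_t ninodes;
};

/* Take a reference on an inode. */
static void toy_get(struct toy_inode *ino)
{
    ino->refcount++;
}

/* Drop a reference; dropping below zero would itself be a bug. */
static void toy_put(struct toy_inode *ino)
{
    assert(ino->refcount > 0);
    ino->refcount--;
}

/* Count the inodes that still have references held. */
static size_t toy_busy_inodes(const struct toy_super *sb)
{
    size_t busy = 0;

    for (size_t i = 0; i < sb->ninodes; i++)
        if (sb->inodes[i].refcount != 0)
            busy++;
    return busy;
}

/*
 * Unmount-time check. With fail_fast == 0 it only reports the leak and
 * carries on, so any crash happens later and far from the cause. With
 * fail_fast != 0 it aborts immediately, while the leaked state can still
 * be inspected in a debugger or core dump.
 */
static size_t toy_umount(const struct toy_super *sb, int fail_fast)
{
    size_t busy = toy_busy_inodes(sb);

    if (busy != 0) {
        fprintf(stderr, "toy: %zu busy inode(s) at unmount\n", busy);
        if (fail_fast)
            abort();
    }
    return busy;
}
```

Calling toy_get() twice and toy_put() once on the inodes of a mount, then toy_umount(&sb, 0), reports one busy inode; the same call with fail_fast set stops the program at that point instead.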

James, are you able to reproduce this at will?

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


* Re: Re: [NFS] Re: Busy inodes after unmount followed by Oops
  2004-05-22 13:04       ` Greg Banks
@ 2004-05-22 15:15         ` raven
  2004-05-22 15:24           ` raven
  2004-05-24 15:51         ` James Pearson
  1 sibling, 1 reply; 11+ messages in thread
From: raven @ 2004-05-22 15:15 UTC (permalink / raw)
  To: Greg Banks; +Cc: autofs, James Pearson, nfs, Trond Myklebust

On Sat, 22 May 2004, Greg Banks wrote:

> On Fri, May 21, 2004 at 05:58:26PM +0100, James Pearson wrote:
> > Not quite sure what you mean by this - I had tried Greg's patch
> > (http://marc.theaimsgroup.com/?l=linux-nfs&m=107604754127538&w=2)
> > previously - but it made no difference in my case. The kernel I'm using
> > now has both Greg's patch and Ian's recent autofs4 patch.
> 
> The error message and subsequent oops are both generic symptoms and
> could come from any kind of race with umount which causes a dentry
> or inode reference count leak, not just the particular one in NFS
> which I fixed.  There could well be another NFS bug like this, or
> one in autofs.  Ian's patch may have fixed it, hidden it, or just
> stopped tickling it.

Probably a little of the first two.

The changes I made were a result of working on another problem. I noticed
that it was possible for two execution paths to raise separate waits for
the same mount. I changed the spin lock I had used to a semaphore and
extended the critical region to force correct wait q handling. It was
after this that James contacted me and I sent him my latest patch.

It's worth pointing out that in 2.4 there was previously no locking
during access to the wait q struct, yet there is at least one possibility
of two execution paths accessing it concurrently.
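
The idea of extending the critical region can be sketched in user-space C. This is only an illustration under stated assumptions, not the autofs4 code: toy_wait, toy_waitq, toy_wait_get and toy_waitq_len are hypothetical names, and a pthread mutex stands in for the kernel semaphore. The point it tries to show is that the lookup of an existing wait and the creation of a new one sit inside the same locked region, so two paths asking about the same mount cannot each raise their own wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One pending wait, keyed by mount name (hypothetical layout). */
struct toy_wait {
    struct toy_wait *next;
    char name[64];              /* longer names are truncated in this sketch */
    int waiters;                /* number of callers sharing this wait */
};

struct toy_waitq {
    pthread_mutex_t lock;       /* stands in for the kernel semaphore */
    struct toy_wait *head;
};

static void toy_waitq_init(struct toy_waitq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
}

/*
 * Find the wait for `name`, or create it if none exists. Search and
 * insert happen under one lock, so concurrent callers for the same
 * name always end up sharing a single entry.
 */
static struct toy_wait *toy_wait_get(struct toy_waitq *q, const char *name)
{
    struct toy_wait *w;

    assert(name != NULL);
    pthread_mutex_lock(&q->lock);
    for (w = q->head; w != NULL; w = w->next)
        if (strcmp(w->name, name) == 0)
            break;
    if (w == NULL) {
        w = calloc(1, sizeof(*w));      /* zeroed, so name stays terminated */
        if (w != NULL) {
            strncpy(w->name, name, sizeof(w->name) - 1);
            w->next = q->head;
            q->head = w;
        }
    }
    if (w != NULL)
        w->waiters++;
    pthread_mutex_unlock(&q->lock);
    return w;
}

/* Number of distinct waits currently queued. */
static size_t toy_waitq_len(struct toy_waitq *q)
{
    size_t n = 0;

    pthread_mutex_lock(&q->lock);
    for (struct toy_wait *w = q->head; w != NULL; w = w->next)
        n++;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

If the lock were dropped between the search and the insert, two callers could both miss the entry and both add one, which is the "separate waits for the same mount" case described above.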

The oops seems to occur some fair amount of time after the damage was 
done and his log hinted that there was some sort of corruption in the wait 
q struct.

There is the possibility that it is now hidden, as I cannot give you an
exact description of how this occurs except for the above observations.

There's not much doubt in my mind that there was a potential race for the
wait q. There are 4 possible execution paths that modify the wait q
struct. Two are synchronous (so really are only one) but two are not.

Ian


* Re: Re: [NFS] Re: Busy inodes after unmount followed by Oops
  2004-05-22 15:15         ` raven
@ 2004-05-22 15:24           ` raven
  0 siblings, 0 replies; 11+ messages in thread
From: raven @ 2004-05-22 15:24 UTC (permalink / raw)
  To: Greg Banks; +Cc: autofs, James Pearson, nfs, Trond Myklebust

On Sat, 22 May 2004 raven@themaw.net wrote:

> 
> There is the possibility that it is now hidden, as I cannot give you an
> exact description of how this occurs except for the above observations.
>
> There's not much doubt in my mind that there was a potential race for the
> wait q. There are 4 possible execution paths that modify the wait q
> struct. Two are synchronous (so really are only one) but two are not.

Auugh. It's not as simple as that either in 2.4, due to the use of the BKL
and a flag in the autofs dentry struct. But I maintain there are at least
two paths that are timing sensitive.

Ian


* Re: [NFS] Re: Busy inodes after unmount followed by Oops
  2004-05-22 13:04       ` Greg Banks
  2004-05-22 15:15         ` raven
@ 2004-05-24 15:51         ` James Pearson
  1 sibling, 0 replies; 11+ messages in thread
From: James Pearson @ 2004-05-24 15:51 UTC (permalink / raw)
  To: Greg Banks; +Cc: autofs, nfs

Greg Banks wrote:
> 
> On Fri, May 21, 2004 at 05:58:26PM +0100, James Pearson wrote:
> > Not quite sure what you mean by this - I had tried Greg's patch
> > (http://marc.theaimsgroup.com/?l=linux-nfs&m=107604754127538&w=2)
> > previously - but it made no difference in my case. The kernel I'm using
> > now has both Greg's patch and Ian's recent autofs4 patch.
> 
> The error message and subsequent oops are both generic symptoms and
> could come from any kind of race with umount which causes a dentry
> or inode reference count leak, not just the particular one in NFS
> which I fixed.  There could well be another NFS bug like this, or
> one in autofs.  Ian's patch may have fixed it, hidden it, or just
> stopped tickling it.
> 
> The only way to tell for sure is to modify the code that generates
> the message to BUG() instead and use a kernel debugger to figure out
> what has gone wrong.  Note that using a debugger at oops time is
> already too late.
> 
> James, are you able to reproduce this at will?

Unfortunately not. It proved impossible to reproduce in a 'controlled'
way - however since using the latest autofs4 patch we haven't had the
problem. Given that, I'm happy with the situation as it stands.

James Pearson


end of thread, other threads:[~2004-05-24 15:51 UTC | newest]

Thread overview: 11+ messages
2004-01-09 11:06 Busy inodes after unmount followed by Oops James Pearson
2004-01-09 19:13 ` [NFS] " Ian Kent
2004-05-21 15:44 ` James Pearson
2004-05-21 16:44   ` Trond Myklebust
2004-05-21 16:44     ` Trond Myklebust
2004-05-21 16:58     ` [NFS] " James Pearson
2004-05-22 13:04       ` Greg Banks
2004-05-22 15:15         ` raven
2004-05-22 15:24           ` raven
2004-05-24 15:51         ` James Pearson
2004-05-22  3:41     ` raven
