* RE: NFS root lockups with -next 20110113
2011-01-13 13:22 ` J. R. Okajima
@ 2011-01-13 13:28 ` Santosh Shilimkar
2011-01-13 13:45 ` J. R. Okajima
2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown
2011-01-13 13:41 ` Santosh Shilimkar
2 siblings, 1 reply; 12+ messages in thread
From: Santosh Shilimkar @ 2011-01-13 13:28 UTC (permalink / raw)
To: J. R. Okajima, Mark Brown
Cc: Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
linux-fsdevel
> -----Original Message-----
> From: J. R. Okajima [mailto:hooanon05@yahoo.co.jp]
> Sent: Thursday, January 13, 2011 6:52 PM
> To: Mark Brown
> Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux-
> nfs@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> fsdevel@vger.kernel.org
> Subject: Re: NFS root lockups with -next 20110113
>
>
> Mark Brown:
> > With -next 20110113 I'm experiencing lockups shortly after
> userspace
> > starts when booting with my root filesystem on NFS, log below. I
> can
> > boot into /bin/sh but running real userspace triggers this very
> easily.
> > This was introduced sometime this week.
> >
> > I've not bisected or otherwise investigated much yet, but I do
> notice
> > code added recently by Nick in "fs: rcu-walk for path lookup"
> showing up
> > in the backtrace so including him in the CCs.
>
> This and a report from Santosh Shilimkar look like the same problem
> which I reported, and I am testing this patch. If you can, please
> try it
> too.
> Note: Of course this is not offcial fix.
>
It works. My board booted with NFS with below patch
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 5bb7588..51d052f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
> {
> struct fs_struct *fs = current->fs;
> struct dentry *parent = nd->path.dentry;
> + int isroot;
>
> BUG_ON(!(nd->flags & LOOKUP_RCU));
> if (nd->root.mnt) {
> @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
> goto err_root;
> }
> spin_lock(&parent->d_lock);
> - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> - if (!__d_rcu_to_refcount(dentry, nd->seq))
> - goto err;
> + isroot = IS_ROOT(dentry);
> + if (!isroot) {
> + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> + if (!__d_rcu_to_refcount(dentry, nd->seq))
> + goto err;
> + }
> /*
> * If the sequence check on the child dentry passed, then the
> child has
> * not been removed from its parent. This means the parent
> dentry must
> * be valid and able to take a reference at this point.
> */
> - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
> + BUG_ON(!isroot && dentry->d_parent != parent);
> BUG_ON(!parent->d_count);
> parent->d_count++;
> - spin_unlock(&dentry->d_lock);
> + if (!isroot)
> + spin_unlock(&dentry->d_lock);
> spin_unlock(&parent->d_lock);
> if (nd->root.mnt) {
> path_get(&nd->root);
> @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
> nd->flags &= ~LOOKUP_RCU;
> return 0;
> err:
> - spin_unlock(&dentry->d_lock);
> + if (!isroot)
> + spin_unlock(&dentry->d_lock);
> spin_unlock(&parent->d_lock);
> err_root:
> if (nd->root.mnt)
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: NFS root lockups with -next 20110113
2011-01-13 13:28 ` Santosh Shilimkar
@ 2011-01-13 13:45 ` J. R. Okajima
2011-01-14 3:59 ` Nick Piggin
0 siblings, 1 reply; 12+ messages in thread
From: J. R. Okajima @ 2011-01-13 13:45 UTC (permalink / raw)
To: Santosh Shilimkar
Cc: Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
linux-fsdevel
Santosh Shilimkar:
> It works. My board booted with NFS with below patch
----------------------------------------
Mark Brown;
> Seems to do the trick:
>
> Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>
Thank you so quick tests.
When I post the patch to be mainlined, I will add Tested-by for both of
you. But Nick Piggin may take another approach.
J. R. Okajima
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-13 13:45 ` J. R. Okajima
@ 2011-01-14 3:59 ` Nick Piggin
2011-01-14 4:41 ` J. R. Okajima
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
0 siblings, 2 replies; 12+ messages in thread
From: Nick Piggin @ 2011-01-14 3:59 UTC (permalink / raw)
To: J. R. Okajima
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
On Fri, Jan 14, 2011 at 12:45 AM, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Santosh Shilimkar:
>> It works. My board booted with NFS with below patch
>
> ----------------------------------------
>
> Mark Brown;
>> Seems to do the trick:
>>
>> Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>
>
> Thank you so quick tests.
> When I post the patch to be mainlined, I will add Tested-by for both of
> you. But Nick Piggin may take another approach.
Thanks for your help, can you see how I've fixed it in my vfs-scale
tree? What do you think?
Thanks,
Nick
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-14 3:59 ` Nick Piggin
@ 2011-01-14 4:41 ` J. R. Okajima
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
1 sibling, 0 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-01-14 4:41 UTC (permalink / raw)
To: Nick Piggin
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
Nick Piggin:
> Thanks for your help, can you see how I've fixed it in my vfs-scale
> tree? What do you think?
I will.
It may take some time.
J. R. Okajima
^ permalink raw reply [flat|nested] 12+ messages in thread
* vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-01-14 3:59 ` Nick Piggin
2011-01-14 4:41 ` J. R. Okajima
@ 2011-01-19 6:43 ` J. R. Okajima
2011-01-19 7:21 ` Nick Piggin
2011-02-11 3:49 ` Ian Kent
1 sibling, 2 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-01-19 6:43 UTC (permalink / raw)
To: Nick Piggin
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
Hi,
Nick Piggin:
> Thanks for your help, can you see how I've fixed it in my vfs-scale
> tree? What do you think?
Your fix is great. I have no objection at all.
Other than the fix, here are more generic questions about vfs-scale work.
I am happy if you reply when you have time.
- getcwd(2) needs d_lock?
It acquires rename_lock and then tests whether the pwd is removed by
d_unhashed(). If a race condition between vfs_rename_dir() which may
unhash/rehash the dentry happens, then getcwd() may return the wrong
result due to unprotected d_unhashed() call, I am afraid. rename_lock
doesn't help this case.
- what is the right order of dget() and mntget()?
If I remember correctly, someone said "mntget() first and then
dget(). when putting, do in reverse" in the discussion when
path_{get,put}() were born. So it is called "the right order" in the
commit log.
It was many years ago. Is it still true? And should rcu-walk follow it
too? The current implementation doesn't seem to care about this order.
- d_move() and rename_lock
This may be out of rcu-walk work, but rename_lock in d_move() looks
outstanding since it surely kills concurrency. It is a pity that two
unrelated but concurrent d_move-s are serialized when we run rename(2)
on two different filesystems. Even if all of dentries, parents and
hash buckets are different from each other, d_move() never run
concurrently.
J. R. Okajima
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
@ 2011-01-19 7:21 ` Nick Piggin
2011-02-11 3:49 ` Ian Kent
1 sibling, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2011-01-19 7:21 UTC (permalink / raw)
To: J. R. Okajima
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
On Wed, Jan 19, 2011 at 5:43 PM, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Hi,
>
> Nick Piggin:
>> Thanks for your help, can you see how I've fixed it in my vfs-scale
>> tree? What do you think?
>
> Your fix is great. I have no objection at all.
> Other than the fix, here are more generic questions about vfs-scale work.
> I am happy if you reply when you have time.
Thanks for reviewing.
> - getcwd(2) needs d_lock?
> It acquires rename_lock and then tests whether the pwd is removed by
> d_unhashed(). If a race condition between vfs_rename_dir() which may
> unhash/rehash the dentry happens, then getcwd() may return the wrong
> result due to unprotected d_unhashed() call, I am afraid. rename_lock
> doesn't help this case.
We have the lock in write mode there, so it should exclude that
particular race. But I need to take another look at this code I
think, I'm not sure it's completely right, so I would appreciate reviews.
A while back I had some extra checks in there and would restart
the entire reverse walk in case of races... but need to think about
it.
> - what is the right order of dget() and mntget()?
> If I remember correctly, someone said "mntget() first and then
> dget(). when putting, do in reverse" in the discussion when
> path_{get,put}() were born. So it is called "the right order" in the
> commit log.
> It was many years ago. Is it still true? And should rcu-walk follow it
> too? The current implementation doesn't seem to care about this order.
Well dget and mntget is not a problem, because we can only do
mntget while already guaranteeing a reference on the mount, and
only dget when already guaranteeing a ref on the dentry (and mount).
But dput must happen before mntput so you don't have dentry ref
without mnt ref. Can you point out where rcu-walk does this wrongly?
> - d_move() and rename_lock
> This may be out of rcu-walk work, but rename_lock in d_move() looks
> outstanding since it surely kills concurrency. It is a pity that two
> unrelated but concurrent d_move-s are serialized when we run rename(2)
> on two different filesystems. Even if all of dentries, parents and
> hash buckets are different from each other, d_move() never run
> concurrently.
Yes I have a patch for that. I made a small hash table of rename locks.
This makes independent same-dir renames scalable. However that was
not the main motivation of the patch. On a really big POWER7 system,
the lookup path goes into a strange bimodal behaviour in the presence
of a relatively small amount of rename activity and sometimes starves
and throughput crashes. Breaking up rename_lock solves that too.
I'll wait until things settle down a bit more and perhaps have a chance
to get more numbers before submitting it (although I can show you when
I get back).
Thanks,
Nick
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
2011-01-19 7:21 ` Nick Piggin
@ 2011-02-11 3:49 ` Ian Kent
2011-02-13 2:19 ` J. R. Okajima
1 sibling, 1 reply; 12+ messages in thread
From: Ian Kent @ 2011-02-11 3:49 UTC (permalink / raw)
To: J. R. Okajima
Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust,
Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel
On Wed, 2011-01-19 at 15:43 +0900, J. R. Okajima wrote:
> Hi,
>
> Nick Piggin:
> > Thanks for your help, can you see how I've fixed it in my vfs-scale
> > tree? What do you think?
>
> Your fix is great. I have no objection at all.
> Other than the fix, here are more generic questions about vfs-scale work.
> I am happy if you reply when you have time.
>
> - getcwd(2) needs d_lock?
> It acquires rename_lock and then tests whether the pwd is removed by
> d_unhashed(). If a race condition between vfs_rename_dir() which may
> unhash/rehash the dentry happens, then getcwd() may return the wrong
> result due to unprotected d_unhashed() call, I am afraid. rename_lock
> doesn't help this case.
>
> - what is the right order of dget() and mntget()?
> If I remember correctly, someone said "mntget() first and then
> dget(). when putting, do in reverse" in the discussion when
> path_{get,put}() were born. So it is called "the right order" in the
> commit log.
> It was many years ago. Is it still true? And should rcu-walk follow it
> too? The current implementation doesn't seem to care about this order.
I didn't spot that, where did you see this?
I'm not sure about the get but I fairly sure the dput() has to be before
the mntput() because the shrink_dcache_*() cleanup routines object to
dentrys that have a reference count of more than one.
Ian
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-02-11 3:49 ` Ian Kent
@ 2011-02-13 2:19 ` J. R. Okajima
0 siblings, 0 replies; 12+ messages in thread
From: J. R. Okajima @ 2011-02-13 2:19 UTC (permalink / raw)
To: Ian Kent
Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust,
Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel
Ian Kent:
> > - what is the right order of dget() and mntget()?
> > If I remember correctly, someone said "mntget() first and then
> > dget(). when putting, do in reverse" in the discussion when
> > path_{get,put}() were born. So it is called "the right order" in the
> > commit log.
> > It was many years ago. Is it still true? And should rcu-walk follow it
> > too? The current implementation doesn't seem to care about this order.
>
> I didn't spot that, where did you see this?
>
> I'm not sure about the get but I fairly sure the dput() has to be before
> the mntput() because the shrink_dcache_*() cleanup routines object to
> dentrys that have a reference count of more than one.
For dget - mntget, there are several such code. For example,
nameidata_dentry_drop_rcu()
{
struct dentry *parent = nd->path.dentry;
:::
parent->d_count++;
spin_unlock(&dentry->d_lock);
spin_unlock(&parent->d_lock);
:::
mntget(nd->path.mnt);
:::
But I am not sure the "get" order is a problem.
Nick Piggin also replied and said dget and mntget is not a problem, and
I replied if I found such "put" order, I would write again.
J. R. Okajima
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28 ` Santosh Shilimkar
@ 2011-01-13 13:35 ` Mark Brown
2011-01-13 13:41 ` Santosh Shilimkar
2 siblings, 0 replies; 12+ messages in thread
From: Mark Brown @ 2011-01-13 13:35 UTC (permalink / raw)
To: J. R. Okajima
Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs,
linux-kernel, linux-fsdevel
On Thu, Jan 13, 2011 at 10:22:07PM +0900, J. R. Okajima wrote:
> This and a report from Santosh Shilimkar look like the same problem
> which I reported, and I am testing this patch. If you can, please try it
> too.
> Note: Of course this is not offcial fix.
Seems to do the trick:
Tested-by: Mark Brown <broonie-yzvPICuk2AABXRvFM2YniueM+bqZidxxQQ4Iyu8u01E@public.gmane.org>
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: NFS root lockups with -next 20110113
2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28 ` Santosh Shilimkar
2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown
@ 2011-01-13 13:41 ` Santosh Shilimkar
2 siblings, 0 replies; 12+ messages in thread
From: Santosh Shilimkar @ 2011-01-13 13:41 UTC (permalink / raw)
To: J. R. Okajima, Mark Brown
Cc: Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
linux-fsdevel, linux-arm-kernel
(+ 'linux-arm' since same problem was getting discussed
in another thread)
> -----Original Message-----
> From: Santosh Shilimkar [mailto:santosh.shilimkar@ti.com]
> Sent: Thursday, January 13, 2011 6:59 PM
> To: 'J. R. Okajima'; 'Mark Brown'
> Cc: 'Trond Myklebust'; 'Nick Piggin'; 'linux-nfs@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'linux-fsdevel@vger.kernel.org'
> Subject: RE: NFS root lockups with -next 20110113
>
> > -----Original Message-----
> > From: J. R. Okajima [mailto:hooanon05-/E1597aS9LR3+QwDJ9on6Q@public.gmane.org]
> > Sent: Thursday, January 13, 2011 6:52 PM
> > To: Mark Brown
> > Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux-
> > nfs@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> > fsdevel@vger.kernel.org
> > Subject: Re: NFS root lockups with -next 20110113
> >
> >
> > Mark Brown:
> > > With -next 20110113 I'm experiencing lockups shortly after
> > userspace
> > > starts when booting with my root filesystem on NFS, log below.
> I
> > can
> > > boot into /bin/sh but running real userspace triggers this very
> > easily.
> > > This was introduced sometime this week.
> > >
> > > I've not bisected or otherwise investigated much yet, but I do
> > notice
> > > code added recently by Nick in "fs: rcu-walk for path lookup"
> > showing up
> > > in the backtrace so including him in the CCs.
> >
> > This and a report from Santosh Shilimkar look like the same
> problem
> > which I reported, and I am testing this patch. If you can, please
> > try it
> > too.
> > Note: Of course this is not offcial fix.
> >
>
> It works. My board booted with NFS with below patch
>
>
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 5bb7588..51d052f 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> > {
> > struct fs_struct *fs = current->fs;
> > struct dentry *parent = nd->path.dentry;
> > + int isroot;
> >
> > BUG_ON(!(nd->flags & LOOKUP_RCU));
> > if (nd->root.mnt) {
> > @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> > goto err_root;
> > }
> > spin_lock(&parent->d_lock);
> > - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > - if (!__d_rcu_to_refcount(dentry, nd->seq))
> > - goto err;
> > + isroot = IS_ROOT(dentry);
> > + if (!isroot) {
> > + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > + if (!__d_rcu_to_refcount(dentry, nd->seq))
> > + goto err;
> > + }
> > /*
> > * If the sequence check on the child dentry passed, then the
> > child has
> > * not been removed from its parent. This means the parent
> > dentry must
> > * be valid and able to take a reference at this point.
> > */
> > - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
> > + BUG_ON(!isroot && dentry->d_parent != parent);
> > BUG_ON(!parent->d_count);
> > parent->d_count++;
> > - spin_unlock(&dentry->d_lock);
> > + if (!isroot)
> > + spin_unlock(&dentry->d_lock);
> > spin_unlock(&parent->d_lock);
> > if (nd->root.mnt) {
> > path_get(&nd->root);
> > @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> > nd->flags &= ~LOOKUP_RCU;
> > return 0;
> > err:
> > - spin_unlock(&dentry->d_lock);
> > + if (!isroot)
> > + spin_unlock(&dentry->d_lock);
> > spin_unlock(&parent->d_lock);
> > err_root:
> > if (nd->root.mnt)
^ permalink raw reply [flat|nested] 12+ messages in thread