* NFS root lockups with -next 20110113 @ 2011-01-13 12:06 Mark Brown 2011-01-13 13:22 ` J. R. Okajima 0 siblings, 1 reply; 15+ messages in thread From: Mark Brown @ 2011-01-13 12:06 UTC (permalink / raw) To: Trond Myklebust, Nick Piggin; +Cc: linux-nfs, linux-kernel, linux-fsdevel With -next 20110113 I'm experiencing lockups shortly after userspace starts when booting with my root filesystem on NFS, log below. I can boot into /bin/sh but running real userspace triggers this very easily. This was introduced sometime this week. I've not bisected or otherwise investigated much yet, but I do notice code added recently by Nick in "fs: rcu-walk for path lookup" showing up in the backtrace so including him in the CCs. Tail of the console log: INIT: version 2.86 booting [ 15.250000] BUG: spinlock recursion on CPU#0, rc/1151 [ 15.250000] lock: c74ad678, .magic: dead4ead, .owner: rc/1151, .owner_cpu: 0 [ 15.260000] [<c0035118>] (unwind_backtrace+0x0/0xe4) from [<c0190454>] (do_raw_spin_lock+0x48/0x148) [ 15.260000] [<c0190454>] (do_raw_spin_lock+0x48/0x148) from [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) [ 15.270000] [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) from [<c00c8b1c>] (d_revalidate+0x30/0x58) [ 15.280000] [<c00c8b1c>] (d_revalidate+0x30/0x58) from [<c00cb82c>] (link_path_walk+0xad8/0xb00) [ 15.290000] [<c00cb82c>] (link_path_walk+0xad8/0xb00) from [<c00cba4c>] (do_path_lookup+0x44/0xcc) [ 15.300000] [<c00cba4c>] (do_path_lookup+0x44/0xcc) from [<c00cc3f4>] (user_path_at+0x58/0x90) [ 15.310000] [<c00cc3f4>] (user_path_at+0x58/0x90) from [<c00c4210>] (vfs_fstatat+0x28/0x54) [ 15.320000] [<c00c4210>] (vfs_fstatat+0x28/0x54) from [<c00c431c>] (sys_stat64+0x14/0x34) [ 15.330000] [<c00c431c>] (sys_stat64+0x14/0x34) from [<c002f4e0>] (ret_fast_syscall+0x0/0x30) [ 25.830000] BUG: spinlock lockup on CPU#0, rc/1151, c74ad678 [ 25.830000] [<c0035118>] (unwind_backtrace+0x0/0xe4) from [<c0190518>] (do_raw_spin_lock+0x10c/0x148) [ 25.840000] [<c0190518>] (do_raw_spin_lock+0x10c/0x148) from [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) [ 25.850000] [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) from [<c00c8b1c>] (d_revalidate+0x30/0x58) [ 25.860000] [<c00c8b1c>] (d_revalidate+0x30/0x58) from [<c00cb82c>] (link_path_walk+0xad8/0xb00) [ 25.870000] [<c00cb82c>] (link_path_walk+0xad8/0xb00) from [<c00cba4c>] (do_path_lookup+0x44/0xcc) [ 25.870000] [<c00cba4c>] (do_path_lookup+0x44/0xcc) from [<c00cc3f4>] (user_path_at+0x58/0x90) [ 25.880000] [<c00cc3f4>] (user_path_at+0x58/0x90) from [<c00c4210>] (vfs_fstatat+0x28/0x54) [ 25.890000] [<c00c4210>] (vfs_fstatat+0x28/0x54) from [<c00c431c>] (sys_stat64+0x14/0x34) [ 25.900000] [<c00c431c>] (sys_stat64+0x14/0x34) from [<c002f4e0>] (ret_fast_syscall+0x0/0x30) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113 2011-01-13 12:06 NFS root lockups with -next 20110113 Mark Brown @ 2011-01-13 13:22 ` J. R. Okajima 2011-01-13 13:28 ` Santosh Shilimkar ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: J. R. Okajima @ 2011-01-13 13:22 UTC (permalink / raw) To: Mark Brown Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel Mark Brown: > With -next 20110113 I'm experiencing lockups shortly after userspace > starts when booting with my root filesystem on NFS, log below. I can > boot into /bin/sh but running real userspace triggers this very easily. > This was introduced sometime this week. > > I've not bisected or otherwise investigated much yet, but I do notice > code added recently by Nick in "fs: rcu-walk for path lookup" showing up > in the backtrace so including him in the CCs. This and a report from Santosh Shilimkar look like the same problem which I reported, and I am testing this patch. If you can, please try it too. Note: Of course this is not offcial fix. J. R. Okajima diff --git a/fs/namei.c b/fs/namei.c index 5bb7588..51d052f 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry { struct fs_struct *fs = current->fs; struct dentry *parent = nd->path.dentry; + int isroot; BUG_ON(!(nd->flags & LOOKUP_RCU)); if (nd->root.mnt) { @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry goto err_root; } spin_lock(&parent->d_lock); - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); - if (!__d_rcu_to_refcount(dentry, nd->seq)) - goto err; + isroot = IS_ROOT(dentry); + if (!isroot) { + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); + if (!__d_rcu_to_refcount(dentry, nd->seq)) + goto err; + } /* * If the sequence check on the child dentry passed, then the child has * not been removed from its parent. This means the parent dentry must * be valid and able to take a reference at this point. */ - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent); + BUG_ON(!isroot && dentry->d_parent != parent); BUG_ON(!parent->d_count); parent->d_count++; - spin_unlock(&dentry->d_lock); + if (!isroot) + spin_unlock(&dentry->d_lock); spin_unlock(&parent->d_lock); if (nd->root.mnt) { path_get(&nd->root); @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry nd->flags &= ~LOOKUP_RCU; return 0; err: - spin_unlock(&dentry->d_lock); + if (!isroot) + spin_unlock(&dentry->d_lock); spin_unlock(&parent->d_lock); err_root: if (nd->root.mnt) ^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: NFS root lockups with -next 20110113 2011-01-13 13:22 ` J. R. Okajima @ 2011-01-13 13:28 ` Santosh Shilimkar [not found] ` <676f5c24375e1cc2aa14fe6630ef1324-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown 2011-01-13 13:41 ` Santosh Shilimkar 2 siblings, 1 reply; 15+ messages in thread From: Santosh Shilimkar @ 2011-01-13 13:28 UTC (permalink / raw) To: J. R. Okajima, Mark Brown Cc: Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA > -----Original Message----- > From: J. R. Okajima [mailto:hooanon05-/E1597aS9LR3+QwDJ9on6Q@public.gmane.org] > Sent: Thursday, January 13, 2011 6:52 PM > To: Mark Brown > Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux- > nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux- > fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Subject: Re: NFS root lockups with -next 20110113 > > > Mark Brown: > > With -next 20110113 I'm experiencing lockups shortly after > userspace > > starts when booting with my root filesystem on NFS, log below. I > can > > boot into /bin/sh but running real userspace triggers this very > easily. > > This was introduced sometime this week. > > > > I've not bisected or otherwise investigated much yet, but I do > notice > > code added recently by Nick in "fs: rcu-walk for path lookup" > showing up > > in the backtrace so including him in the CCs. > > This and a report from Santosh Shilimkar look like the same problem > which I reported, and I am testing this patch. If you can, please > try it > too. > Note: Of course this is not offcial fix. > It works. My board booted with NFS with below patch > > diff --git a/fs/namei.c b/fs/namei.c > index 5bb7588..51d052f 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct > nameidata *nd, struct dentry *dentry > { > struct fs_struct *fs = current->fs; > struct dentry *parent = nd->path.dentry; > + int isroot; > > BUG_ON(!(nd->flags & LOOKUP_RCU)); > if (nd->root.mnt) { > @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct > nameidata *nd, struct dentry *dentry > goto err_root; > } > spin_lock(&parent->d_lock); > - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); > - if (!__d_rcu_to_refcount(dentry, nd->seq)) > - goto err; > + isroot = IS_ROOT(dentry); > + if (!isroot) { > + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); > + if (!__d_rcu_to_refcount(dentry, nd->seq)) > + goto err; > + } > /* > * If the sequence check on the child dentry passed, then the > child has > * not been removed from its parent. This means the parent > dentry must > * be valid and able to take a reference at this point. > */ > - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent); > + BUG_ON(!isroot && dentry->d_parent != parent); > BUG_ON(!parent->d_count); > parent->d_count++; > - spin_unlock(&dentry->d_lock); > + if (!isroot) > + spin_unlock(&dentry->d_lock); > spin_unlock(&parent->d_lock); > if (nd->root.mnt) { > path_get(&nd->root); > @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct > nameidata *nd, struct dentry *dentry > nd->flags &= ~LOOKUP_RCU; > return 0; > err: > - spin_unlock(&dentry->d_lock); > + if (!isroot) > + spin_unlock(&dentry->d_lock); > spin_unlock(&parent->d_lock); > err_root: > if (nd->root.mnt) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <676f5c24375e1cc2aa14fe6630ef1324-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: NFS root lockups with -next 20110113 [not found] ` <676f5c24375e1cc2aa14fe6630ef1324-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2011-01-13 13:45 ` J. R. Okajima 2011-01-14 3:59 ` Nick Piggin 0 siblings, 1 reply; 15+ messages in thread From: J. R. Okajima @ 2011-01-13 13:45 UTC (permalink / raw) To: Santosh Shilimkar Cc: Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA Santosh Shilimkar: > It works. My board booted with NFS with below patch ---------------------------------------- Mark Brown; > Seems to do the trick: > > Tested-by: Mark Brown <broonie-yzvPICuk2AABXRvFM2YniueM+bqZidxxQQ4Iyu8u01E@public.gmane.org> Thank you so quick tests. When I post the patch to be mainlined, I will add Tested-by for both of you. But Nick Piggin may take another approach. J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113 2011-01-13 13:45 ` J. R. Okajima @ 2011-01-14 3:59 ` Nick Piggin [not found] ` <AANLkTim=VY7+fo6d_nUXVxs+iZ_f79qWu_eYMUjvhVJO-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Nick Piggin @ 2011-01-14 3:59 UTC (permalink / raw) To: J. R. Okajima Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Fri, Jan 14, 2011 at 12:45 AM, J. R. Okajima <hooanon05-/E1597aS9LR3+QwDJ9on6Q@public.gmane.org> wrote: > > Santosh Shilimkar: >> It works. My board booted with NFS with below patch > > ---------------------------------------- > > Mark Brown; >> Seems to do the trick: >> >> Tested-by: Mark Brown <broonie-yzvPICuk2AABXRvFM2YniueM+bqZidxxQQ4Iyu8u01E@public.gmane.org> > > Thank you so quick tests. > When I post the patch to be mainlined, I will add Tested-by for both of > you. But Nick Piggin may take another approach. Thanks for your help, can you see how I've fixed it in my vfs-scale tree? What do you think? Thanks, Nick -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <AANLkTim=VY7+fo6d_nUXVxs+iZ_f79qWu_eYMUjvhVJO-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: NFS root lockups with -next 20110113 [not found] ` <AANLkTim=VY7+fo6d_nUXVxs+iZ_f79qWu_eYMUjvhVJO-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2011-01-14 4:41 ` J. R. Okajima 2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima 1 sibling, 0 replies; 15+ messages in thread From: J. R. Okajima @ 2011-01-14 4:41 UTC (permalink / raw) To: Nick Piggin Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA Nick Piggin: > Thanks for your help, can you see how I've fixed it in my vfs-scale > tree? What do you think? I will. It may take some time. J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* vfs-scale, general questions (Re: NFS root lockups with -next 20110113) [not found] ` <AANLkTim=VY7+fo6d_nUXVxs+iZ_f79qWu_eYMUjvhVJO-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2011-01-14 4:41 ` J. R. Okajima @ 2011-01-19 6:43 ` J. R. Okajima 2011-01-19 7:21 ` Nick Piggin 2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent 1 sibling, 2 replies; 15+ messages in thread From: J. R. Okajima @ 2011-01-19 6:43 UTC (permalink / raw) To: Nick Piggin Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA Hi, Nick Piggin: > Thanks for your help, can you see how I've fixed it in my vfs-scale > tree? What do you think? Your fix is great. I have no objection at all. Other than the fix, here are more generic questions about vfs-scale work. I am happy if you reply when you have time. - getcwd(2) needs d_lock? It acquires rename_lock and then tests whether the pwd is removed by d_unhashed(). If a race condition between vfs_rename_dir() which may unhash/rehash the dentry happens, then getcwd() may return the wrong result due to unprotected d_unhashed() call, I am afraid. rename_lock doesn't help this case. - what is the right order of dget() and mntget()? If I remember correctly, someone said "mntget() first and then dget(). when putting, do in reverse" in the discussion when path_{get,put}() were born. So it is called "the right order" in the commit log. It was many years ago. Is it still true? And should rcu-walk follow it too? The current implementation doesn't seem to care about this order. - d_move() and rename_lock This may be out of rcu-walk work, but rename_lock in d_move() looks outstanding since it surely kills concurrency. It is a pity that two unrelated but concurrent d_move-s are serialized when we run rename(2) on two different filesystems. Even if all of dentries, parents and hash buckets are different from each other, d_move() never run concurrently. J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113) 2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima @ 2011-01-19 7:21 ` Nick Piggin 2011-01-20 9:05 ` vfs-scale, general questions J. R. Okajima 2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent 1 sibling, 1 reply; 15+ messages in thread From: Nick Piggin @ 2011-01-19 7:21 UTC (permalink / raw) To: J. R. Okajima Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Wed, Jan 19, 2011 at 5:43 PM, J. R. Okajima <hooanon05-/E1597aS9LR3+QwDJ9on6Q@public.gmane.org> wrote: > > Hi, > > Nick Piggin: >> Thanks for your help, can you see how I've fixed it in my vfs-scale >> tree? What do you think? > > Your fix is great. I have no objection at all. > Other than the fix, here are more generic questions about vfs-scale work. > I am happy if you reply when you have time. Thanks for reviewing. > - getcwd(2) needs d_lock? > It acquires rename_lock and then tests whether the pwd is removed by > d_unhashed(). If a race condition between vfs_rename_dir() which may > unhash/rehash the dentry happens, then getcwd() may return the wrong > result due to unprotected d_unhashed() call, I am afraid. rename_lock > doesn't help this case. We have the lock in write mode there, so it should exclude that particular race. But I need to take another look at this code I think, I'm not sure it's completely right, so I would appreciate reviews. A while back I had some extra checks in there and would restart the entire reverse walk in case of races... but need to think about it. > - what is the right order of dget() and mntget()? > If I remember correctly, someone said "mntget() first and then > dget(). when putting, do in reverse" in the discussion when > path_{get,put}() were born. So it is called "the right order" in the > commit log. > It was many years ago. Is it still true? And should rcu-walk follow it > too? The current implementation doesn't seem to care about this order. Well dget and mntget is not a problem, because we can only do mntget while already guaranteeing a reference on the mount, and only dget when already guaranteeing a ref on the dentry (and mount). But dput must happen before mntput so you don't have dentry ref without mnt ref. Can you point out where rcu-walk does this wrongly? > - d_move() and rename_lock > This may be out of rcu-walk work, but rename_lock in d_move() looks > outstanding since it surely kills concurrency. It is a pity that two > unrelated but concurrent d_move-s are serialized when we run rename(2) > on two different filesystems. Even if all of dentries, parents and > hash buckets are different from each other, d_move() never run > concurrently. Yes I have a patch for that. I made a small hash table of rename locks. This makes independent same-dir renames scalable. However that was not the main motivation of the patch. On a really big POWER7 system, the lookup path goes into a strange bimodal behaviour in the presence of a relatively small amount of rename activity and sometimes starves and throughput crashes. Breaking up rename_lock solves that too. I'll wait until things settle down a bit more and perhaps have a chance to get more numbers before submitting it (although I can show you when I get back). Thanks, Nick -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions 2011-01-19 7:21 ` Nick Piggin @ 2011-01-20 9:05 ` J. R. Okajima 2011-01-20 11:15 ` Miklos Szeredi 0 siblings, 1 reply; 15+ messages in thread From: J. R. Okajima @ 2011-01-20 9:05 UTC (permalink / raw) To: Nick Piggin; +Cc: Nick Piggin, linux-kernel, linux-fsdevel (Some of mail destinations are removed since this is not nfs specific anymore.) Nick Piggin: > > - getcwd(2) needs d_lock? > > =A0It acquires rename_lock and then tests whether the pwd is removed by > > =A0d_unhashed(). If a race condition between vfs_rename_dir() which may > > =A0unhash/rehash the dentry happens, then getcwd() may return the wrong > > =A0result due to unprotected d_unhashed() call, I am afraid. rename_lock > > =A0doesn't help this case. > > We have the lock in write mode there, so it should exclude that > particular race. But I need to take another look at this code I > think, I'm not sure it's completely right, so I would appreciate reviews. You might think about the race around d_move, but what I meant is the race between d_unlinked and unhash/rehash. - getcwd return -ENOENT when pwd is unhashed. - vfs_rename_dir() + makes the existing target unhashed. + FS ->rename() is called, here let's assume an error happened. so the target dir is surely alive and reachable, nothing have been changed. + vfs_rename_dir() rehashes it again. During this unhashed period, getcwd(2) may be issued. And I am afraid it may return an error incorrectly. About other issues, I will reply when I have time. J. R. Okajima ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions 2011-01-20 9:05 ` vfs-scale, general questions J. R. Okajima @ 2011-01-20 11:15 ` Miklos Szeredi 2011-01-21 6:38 ` J. R. Okajima 0 siblings, 1 reply; 15+ messages in thread From: Miklos Szeredi @ 2011-01-20 11:15 UTC (permalink / raw) To: J. R. Okajima; +Cc: npiggin, npiggin, linux-kernel, linux-fsdevel On Thu, 20 Jan 2011, J. R. Okajima wrote: > You might think about the race around d_move, but what I meant is the > race between d_unlinked and unhash/rehash. > > - getcwd return -ENOENT when pwd is unhashed. > - vfs_rename_dir() > + makes the existing target unhashed. No it doesn't. dentry_unhash() doesn't drop the dentry from the hash unless it has no other references, cwd being one such ref. Thanks, Miklos ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions 2011-01-20 11:15 ` Miklos Szeredi @ 2011-01-21 6:38 ` J. R. Okajima 0 siblings, 0 replies; 15+ messages in thread From: J. R. Okajima @ 2011-01-21 6:38 UTC (permalink / raw) To: Miklos Szeredi; +Cc: npiggin, npiggin, linux-kernel, linux-fsdevel Miklos Szeredi: > No it doesn't. dentry_unhash() doesn't drop the dentry from the hash > unless it has no other references, cwd being one such ref. I see. When chdir() completets before vfs_rename_dir(), the dir will not be unhashed. It is OK. Even if the dir is unhashed temporary before chdir(), then the lookup fallback to ref-walk and i_mutex protects. It is OK too. Thank you. J. R. Okajima ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113) 2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima 2011-01-19 7:21 ` Nick Piggin @ 2011-02-11 3:49 ` Ian Kent 2011-02-13 2:19 ` J. R. Okajima 1 sibling, 1 reply; 15+ messages in thread From: Ian Kent @ 2011-02-11 3:49 UTC (permalink / raw) To: J. R. Okajima Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Wed, 2011-01-19 at 15:43 +0900, J. R. Okajima wrote: > Hi, > > Nick Piggin: > > Thanks for your help, can you see how I've fixed it in my vfs-scale > > tree? What do you think? > > Your fix is great. I have no objection at all. > Other than the fix, here are more generic questions about vfs-scale work. > I am happy if you reply when you have time. > > - getcwd(2) needs d_lock? > It acquires rename_lock and then tests whether the pwd is removed by > d_unhashed(). If a race condition between vfs_rename_dir() which may > unhash/rehash the dentry happens, then getcwd() may return the wrong > result due to unprotected d_unhashed() call, I am afraid. rename_lock > doesn't help this case. > > - what is the right order of dget() and mntget()? > If I remember correctly, someone said "mntget() first and then > dget(). when putting, do in reverse" in the discussion when > path_{get,put}() were born. So it is called "the right order" in the > commit log. > It was many years ago. Is it still true? And should rcu-walk follow it > too? The current implementation doesn't seem to care about this order. I didn't spot that, where did you see this? I'm not sure about the get but I fairly sure the dput() has to be before the mntput() because the shrink_dcache_*() cleanup routines object to dentrys that have a reference count of more than one. Ian -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113) 2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent @ 2011-02-13 2:19 ` J. R. Okajima 0 siblings, 0 replies; 15+ messages in thread From: J. R. Okajima @ 2011-02-13 2:19 UTC (permalink / raw) To: Ian Kent Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel Ian Kent: > > - what is the right order of dget() and mntget()? > > If I remember correctly, someone said "mntget() first and then > > dget(). when putting, do in reverse" in the discussion when > > path_{get,put}() were born. So it is called "the right order" in the > > commit log. > > It was many years ago. Is it still true? And should rcu-walk follow it > > too? The current implementation doesn't seem to care about this order. > > I didn't spot that, where did you see this? > > I'm not sure about the get but I fairly sure the dput() has to be before > the mntput() because the shrink_dcache_*() cleanup routines object to > dentrys that have a reference count of more than one. For dget - mntget, there are several such code. For example, nameidata_dentry_drop_rcu() { struct dentry *parent = nd->path.dentry; ::: parent->d_count++; spin_unlock(&dentry->d_lock); spin_unlock(&parent->d_lock); ::: mntget(nd->path.mnt); ::: But I am not sure the "get" order is a problem. Nick Piggin also replied and said dget and mntget is not a problem, and I replied if I found such "put" order, I would write again. J. R. Okajima ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113 2011-01-13 13:22 ` J. R. Okajima 2011-01-13 13:28 ` Santosh Shilimkar @ 2011-01-13 13:35 ` Mark Brown 2011-01-13 13:41 ` Santosh Shilimkar 2 siblings, 0 replies; 15+ messages in thread From: Mark Brown @ 2011-01-13 13:35 UTC (permalink / raw) To: J. R. Okajima Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA On Thu, Jan 13, 2011 at 10:22:07PM +0900, J. R. Okajima wrote: > This and a report from Santosh Shilimkar look like the same problem > which I reported, and I am testing this patch. If you can, please try it > too. > Note: Of course this is not offcial fix. Seems to do the trick: Tested-by: Mark Brown <broonie-yzvPICuk2AABXRvFM2YniueM+bqZidxxQQ4Iyu8u01E@public.gmane.org> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: NFS root lockups with -next 20110113 2011-01-13 13:22 ` J. R. Okajima 2011-01-13 13:28 ` Santosh Shilimkar 2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown @ 2011-01-13 13:41 ` Santosh Shilimkar 2 siblings, 0 replies; 15+ messages in thread From: Santosh Shilimkar @ 2011-01-13 13:41 UTC (permalink / raw) To: J. R. Okajima, Mark Brown Cc: Trond Myklebust, Nick Piggin, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r (+ 'linux-arm' since same problem was getting discussed in another thread) > -----Original Message----- > From: Santosh Shilimkar [mailto:santosh.shilimkar-l0cyMroinI0@public.gmane.org] > Sent: Thursday, January 13, 2011 6:59 PM > To: 'J. R. Okajima'; 'Mark Brown' > Cc: 'Trond Myklebust'; 'Nick Piggin'; 'linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org'; > 'linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org'; 'linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org' > Subject: RE: NFS root lockups with -next 20110113 > > > -----Original Message----- > > From: J. R. Okajima [mailto:hooanon05-/E1597aS9LR3+QwDJ9on6Q@public.gmane.org] > > Sent: Thursday, January 13, 2011 6:52 PM > > To: Mark Brown > > Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux- > > nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux- > > fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > Subject: Re: NFS root lockups with -next 20110113 > > > > > > Mark Brown: > > > With -next 20110113 I'm experiencing lockups shortly after > > userspace > > > starts when booting with my root filesystem on NFS, log below. > I > > can > > > boot into /bin/sh but running real userspace triggers this very > > easily. > > > This was introduced sometime this week. > > > > > > I've not bisected or otherwise investigated much yet, but I do > > notice > > > code added recently by Nick in "fs: rcu-walk for path lookup" > > showing up > > > in the backtrace so including him in the CCs. > > > > This and a report from Santosh Shilimkar look like the same > problem > > which I reported, and I am testing this patch. If you can, please > > try it > > too. > > Note: Of course this is not offcial fix. > > > > It works. My board booted with NFS with below patch > > > > > > diff --git a/fs/namei.c b/fs/namei.c > > index 5bb7588..51d052f 100644 > > --- a/fs/namei.c > > +++ b/fs/namei.c > > @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct > > nameidata *nd, struct dentry *dentry > > { > > struct fs_struct *fs = current->fs; > > struct dentry *parent = nd->path.dentry; > > + int isroot; > > > > BUG_ON(!(nd->flags & LOOKUP_RCU)); > > if (nd->root.mnt) { > > @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct > > nameidata *nd, struct dentry *dentry > > goto err_root; > > } > > spin_lock(&parent->d_lock); > > - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); > > - if (!__d_rcu_to_refcount(dentry, nd->seq)) > > - goto err; > > + isroot = IS_ROOT(dentry); > > + if (!isroot) { > > + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); > > + if (!__d_rcu_to_refcount(dentry, nd->seq)) > > + goto err; > > + } > > /* > > * If the sequence check on the child dentry passed, then the > > child has > > * not been removed from its parent. This means the parent > > dentry must > > * be valid and able to take a reference at this point. > > */ > > - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent); > > + BUG_ON(!isroot && dentry->d_parent != parent); > > BUG_ON(!parent->d_count); > > parent->d_count++; > > - spin_unlock(&dentry->d_lock); > > + if (!isroot) > > + spin_unlock(&dentry->d_lock); > > spin_unlock(&parent->d_lock); > > if (nd->root.mnt) { > > path_get(&nd->root); > > @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct > > nameidata *nd, struct dentry *dentry > > nd->flags &= ~LOOKUP_RCU; > > return 0; > > err: > > - spin_unlock(&dentry->d_lock); > > + if (!isroot) > > + spin_unlock(&dentry->d_lock); > > spin_unlock(&parent->d_lock); > > err_root: > > if (nd->root.mnt) -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-02-13 2:19 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-01-13 12:06 NFS root lockups with -next 20110113 Mark Brown 2011-01-13 13:22 ` J. R. Okajima 2011-01-13 13:28 ` Santosh Shilimkar [not found] ` <676f5c24375e1cc2aa14fe6630ef1324-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2011-01-13 13:45 ` J. R. Okajima 2011-01-14 3:59 ` Nick Piggin [not found] ` <AANLkTim=VY7+fo6d_nUXVxs+iZ_f79qWu_eYMUjvhVJO-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2011-01-14 4:41 ` J. R. Okajima 2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima 2011-01-19 7:21 ` Nick Piggin 2011-01-20 9:05 ` vfs-scale, general questions J. R. Okajima 2011-01-20 11:15 ` Miklos Szeredi 2011-01-21 6:38 ` J. R. Okajima 2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent 2011-02-13 2:19 ` J. R. Okajima 2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown 2011-01-13 13:41 ` Santosh Shilimkar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).