nfsd stuckage

public inbox for linux-kernel@vger.kernel.org

* nfsd stuckage
@ 2009-01-06 22:56 Andrew Morton
  2009-01-06 23:02 ` J. Bruce Fields
  2009-01-08 14:57 ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2009-01-06 22:56 UTC (permalink / raw)
  To: linux-nfs; +Cc: linux-kernel, Neil Brown, J. Bruce Fields


I just built current mainline plus the just-sent 266 -mm patches.

The machine failed to power off when hit with `halt -pfn'.  dmesg output:

[   37.087037] calling  rfcomm_init+0x0/0xb6 [rfcomm] @ 3946
[   37.087294] initcall rfcomm_init+0x0/0xb6 [rfcomm] returned 0 after 72 usecs
[   37.505855] calling  hidp_init+0x0/0x5e [hidp] @ 4046
[   37.506072] initcall hidp_init+0x0/0x5e [hidp] returned 0 after 28 usecs
[   37.636638] calling  init_autofs4_fs+0x0/0x23 [autofs4] @ 4081
[   37.636990] initcall init_autofs4_fs+0x0/0x23 [autofs4] returned 0 after 54 usecs
[   39.630075] calling  init_nlm+0x0/0x22 [lockd] @ 4264
[   39.630321] initcall init_nlm+0x0/0x22 [lockd] returned 0 after 59 usecs
[   39.690077] calling  init_rpcsec_gss+0x0/0x4a [auth_rpcgss] @ 4264
[   39.690281] initcall init_rpcsec_gss+0x0/0x4a [auth_rpcgss] returned 0 after 12 usecs
[   39.834034] calling  init_nfsd+0x0/0xe2 [nfsd] @ 4302
[   39.834471] initcall init_nfsd+0x0/0xe2 [nfsd] returned 0 after 236 usecs
[   39.924213] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[  672.162677] INFO: task nfsd4:4324 blocked for more than 480 seconds.
[  672.162706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  672.162725]  ffff880251df1d60 0000000000000046 ffff88025e1c0580 ffff8802488013d8
[  672.162753]  ffff880251df1d20 ffff88024c49a7a0 ffff88025e088760 ffff88024c49ab18
[  672.162834]  000000002807ee00 00000000ffff59ec ffff880251df1d50 0000000000000282
[  672.162865] Call Trace:
[  672.162880]  [<ffffffff8052a1d8>] __mutex_lock_slowpath+0x6a/0xac
[  672.162895]  [<ffffffff8052a0c5>] mutex_lock+0x2c/0x30
[  672.162908]  [<ffffffff802c4747>] vfs_fsync+0x63/0xa9
[  672.162933]  [<ffffffffa048c74c>] nfsd_sync_dir+0x10/0x12 [nfsd]
[  672.162960]  [<ffffffffa04a5427>] nfsd4_sync_rec_dir+0x27/0x40 [nfsd]
[  672.162984]  [<ffffffffa04a592a>] nfsd4_recdir_purge_old+0x3d/0x6a [nfsd]
[  672.163023]  [<ffffffffa04a1745>] laundromat_main+0x62/0x225 [nfsd]
[  672.163049]  [<ffffffffa04a16e3>] ? laundromat_main+0x0/0x225 [nfsd]
[  672.163064]  [<ffffffff8024b4a7>] run_workqueue+0x8d/0x124
[  672.163076]  [<ffffffff8024b5e0>] ? worker_thread+0x0/0xe5
[  672.163089]  [<ffffffff8024b6b8>] worker_thread+0xd8/0xe5
[  672.163102]  [<ffffffff8024e8cc>] ? autoremove_wake_function+0x0/0x36
[  672.163115]  [<ffffffff8024b5e0>] ? worker_thread+0x0/0xe5
[  672.163127]  [<ffffffff8024e5e0>] kthread+0x44/0x6b
[  672.163140]  [<ffffffff8020cfba>] child_rip+0xa/0x20
[  672.163151]  [<ffffffff8024e59c>] ? kthread+0x0/0x6b
[  672.163162]  [<ffffffff8020cfb0>] ? child_rip+0x0/0x20
[ 1204.739381] INFO: task nfsd4:4324 blocked for more than 480 seconds.
[ 1204.739415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1204.739436]  ffff880251df1d60 0000000000000046 ffff88025e1c0580 ffff8802488013d8
[ 1204.739523]  ffff880251df1d20 ffff88024c49a7a0 ffff88025e088760 ffff88024c49ab18
[ 1204.739551]  000000002807ee00 00000000ffff59ec ffff880251df1d50 0000000000000282
[ 1204.739579] Call Trace:

This didn't happen in linux-next a week or so ago.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: nfsd stuckage
  2009-01-06 22:56 nfsd stuckage Andrew Morton
@ 2009-01-06 23:02 ` J. Bruce Fields
  2009-01-06 23:03   ` Christoph Hellwig
  2009-01-08 14:57 ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2009-01-06 23:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-nfs, linux-kernel, Neil Brown, Eric Sesterhenn, hch

On Tue, Jan 06, 2009 at 02:56:12PM -0800, Andrew Morton wrote:
> 
> I just built current mainline plus the just-sent 266 -mm patches.
> 
> The machine failed to power off when hit with `halt -pfn'.  dmesg output:

Christoph, can you live with this for now?

--b.

commit 33e3950dc2eae7484e79685083c304d93013e3ec
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Tue Jan 6 13:37:03 2009 -0500

    nfsd: fix double-locks of directory mutex
    
    A number of nfsd operations depend on the i_mutex to cover more code
    than just the fsync, so the approach of 4c728ef583b3d8 "add a vfs_fsync
    helper" doesn't work for nfsd.  Revert the parts of those patches that
    touch nfsd, and remove the logic from vfs_fsync that was needed only for
    the special case of nfsd.
    
    Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 44aa92a..6e50aaa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -744,16 +744,44 @@ nfsd_close(struct file *filp)
 	fput(filp);
 }
 
+/*
+ * Sync a file
+ * As this calls fsync (not fdatasync) there is no need for a write_inode
+ * after it.
+ */
+static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
+			      const struct file_operations *fop)
+{
+	struct inode *inode = dp->d_inode;
+	int (*fsync) (struct file *, struct dentry *, int);
+	int err;
+
+	err = filemap_fdatawrite(inode->i_mapping);
+	if (err == 0 && fop && (fsync = fop->fsync))
+		err = fsync(filp, dp, 0);
+	if (err == 0)
+		err = filemap_fdatawait(inode->i_mapping);
+
+	return err;
+}
+
 static int
 nfsd_sync(struct file *filp)
 {
-	return vfs_fsync(filp, filp->f_path.dentry, 0);
+        int err;
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
+	mutex_lock(&inode->i_mutex);
+	err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
+	mutex_unlock(&inode->i_mutex);
+
+	return err;
 }
 
 int
-nfsd_sync_dir(struct dentry *dentry)
+nfsd_sync_dir(struct dentry *dp)
 {
-	return vfs_fsync(NULL, dentry, 0);
+	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
 }
 
 /*
diff --git a/fs/sync.c b/fs/sync.c
index 0921d6d..8e0a656 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -83,10 +83,6 @@ int file_fsync(struct file *filp, struct dentry *dentry, int datasync)
  *
  * Write back data and metadata for @file to disk.  If @datasync is
  * set only metadata needed to access modified file data is written.
- *
- * In case this function is called from nfsd @file may be %NULL and
- * only @dentry is set.  This can only happen when the filesystem
- * implements the export_operations API.
  */
 int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 {
@@ -94,18 +90,8 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
 	struct address_space *mapping;
 	int err, ret;
 
-	/*
-	 * Get mapping and operations from the file in case we have
-	 * as file, or get the default values for them in case we
-	 * don't have a struct file available.  Damn nfsd..
-	 */
-	if (file) {
-		mapping = file->f_mapping;
-		fop = file->f_op;
-	} else {
-		mapping = dentry->d_inode->i_mapping;
-		fop = dentry->d_inode->i_fop;
-	}
+	mapping = file->f_mapping;
+	fop = file->f_op;
 
 	if (!fop || !fop->fsync) {
 		ret = -EINVAL;


* Re: nfsd stuckage
  2009-01-06 23:02 ` J. Bruce Fields
@ 2009-01-06 23:03   ` Christoph Hellwig
  2009-01-06 23:05     ` J. Bruce Fields
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2009-01-06 23:03 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Andrew Morton, linux-nfs, linux-kernel, Neil Brown,
	Eric Sesterhenn, hch

On Tue, Jan 06, 2009 at 06:02:44PM -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 02:56:12PM -0800, Andrew Morton wrote:
> > 
> > I just built current mainline plus the just-sent 266 -mm patches.
> > 
> > The machine failed to power off when hit with `halt -pfn'.  dmesg output:
> 
> Christoph, can you live with this for now?

nfsd part is well, livable.  But the fs/sync.c is buggy as stackable
filesystems can call vfs_fsync with a NULL pointer due to nfs calling it
that way, so please drop that hunk.


* Re: nfsd stuckage
  2009-01-06 23:03   ` Christoph Hellwig
@ 2009-01-06 23:05     ` J. Bruce Fields
  2009-01-07  0:15       ` J. Bruce Fields
  0 siblings, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2009-01-06 23:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, linux-nfs, linux-kernel, Neil Brown,
	Eric Sesterhenn

On Wed, Jan 07, 2009 at 12:03:56AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 06, 2009 at 06:02:44PM -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 02:56:12PM -0800, Andrew Morton wrote:
> > > 
> > > I just built current mainline plus the just-sent 266 -mm patches.
> > > 
> > > The machine failed to power off when hit with `halt -pfn'.  dmesg output:
> > 
> > Christoph, can you live with this for now?
> 
> nfsd part is well, livable.  But the fs/sync.c is buggy as stackable
> filesystems can call vfs_fsync with a NULL pointer due to nfs calling it
> that way, so please drop that hunk.

Whoops, OK, I didn't understand that.  I'll drop that hunk, retest, then
submit--should take a few minutes.

--b.


* Re: nfsd stuckage
  2009-01-06 23:05     ` J. Bruce Fields
@ 2009-01-07  0:15       ` J. Bruce Fields
  2009-01-07  0:23         ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2009-01-07  0:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, linux-nfs, linux-kernel, Neil Brown,
	Eric Sesterhenn

On Tue, Jan 06, 2009 at 06:05:51PM -0500, bfields wrote:
> On Wed, Jan 07, 2009 at 12:03:56AM +0100, Christoph Hellwig wrote:
> > On Tue, Jan 06, 2009 at 06:02:44PM -0500, J. Bruce Fields wrote:
> > > On Tue, Jan 06, 2009 at 02:56:12PM -0800, Andrew Morton wrote:
> > > > 
> > > > I just built current mainline plus the just-sent 266 -mm patches.
> > > > 
> > > > The machine failed to power off when hit with `halt -pfn'.  dmesg output:
> > > 
> > > Christoph, can you live with this for now?
> > 
> > nfsd part is well, livable.  But the fs/sync.c is buggy as stackable
> > filesystems can call vfs_fsync with a NULL pointer due to nfs calling it
> > that way, so please drop that hunk.
> 
> Whoops, OK, I didn't understand that.  I'll drop that hunk, retest, then
> submit--should take a few minutes.

So this works for me--and I guess I may as well submit it in a pull
request.  But: it sounds like we still have a regression for ecryptfs?
(Since on ecryptfs export nfsd will still try to get the mutex twice on
create, unlink, etc.)  Maybe we should just revert
4c728ef583b3d82266584da5cb068294c09df31e entirely for now?

--b.

commit 3fbc5c762bd9f9ff52fe7b5b09398a3cff0e8415
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Tue Jan 6 13:37:03 2009 -0500

    nfsd: fix double-locks of directory mutex
    
    A number of nfsd operations depend on the i_mutex to cover more code
    than just the fsync, so the approach of 4c728ef583b3d8 "add a vfs_fsync
    helper" doesn't work for nfsd.  Revert the parts of those patches that
    touch nfsd.
    
    Note: we can't, however, remove the logic from vfs_fsync that was needed
    only for the special case of nfsd, because a vfs_fsync(NULL,...) call
    can still result indirectly from a stackable filesystem that was called
    by nfsd.
    
    Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 44aa92a..6e50aaa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -744,16 +744,44 @@ nfsd_close(struct file *filp)
 	fput(filp);
 }
 
+/*
+ * Sync a file
+ * As this calls fsync (not fdatasync) there is no need for a write_inode
+ * after it.
+ */
+static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
+			      const struct file_operations *fop)
+{
+	struct inode *inode = dp->d_inode;
+	int (*fsync) (struct file *, struct dentry *, int);
+	int err;
+
+	err = filemap_fdatawrite(inode->i_mapping);
+	if (err == 0 && fop && (fsync = fop->fsync))
+		err = fsync(filp, dp, 0);
+	if (err == 0)
+		err = filemap_fdatawait(inode->i_mapping);
+
+	return err;
+}
+
 static int
 nfsd_sync(struct file *filp)
 {
-	return vfs_fsync(filp, filp->f_path.dentry, 0);
+        int err;
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
+	mutex_lock(&inode->i_mutex);
+	err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
+	mutex_unlock(&inode->i_mutex);
+
+	return err;
 }
 
 int
-nfsd_sync_dir(struct dentry *dentry)
+nfsd_sync_dir(struct dentry *dp)
 {
-	return vfs_fsync(NULL, dentry, 0);
+	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
 }
 
 /*


* Re: nfsd stuckage
  2009-01-07  0:15       ` J. Bruce Fields
@ 2009-01-07  0:23         ` Andrew Morton
  2009-01-07  0:28           ` J. Bruce Fields
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2009-01-07  0:23 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: hch, linux-nfs, linux-kernel, neilb, snakebyte

On Tue, 6 Jan 2009 19:15:01 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

>     nfsd: fix double-locks of directory mutex

grumble.

>  
> +/*
> + * Sync a file
> + * As this calls fsync (not fdatasync) there is no need for a write_inode
> + * after it.
> + */
> +static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
> +			      const struct file_operations *fop)
> +{
> +	struct inode *inode = dp->d_inode;
> +	int (*fsync) (struct file *, struct dentry *, int);
> +	int err;
> +
> +	err = filemap_fdatawrite(inode->i_mapping);
> +	if (err == 0 && fop && (fsync = fop->fsync))
> +		err = fsync(filp, dp, 0);
> +	if (err == 0)
> +		err = filemap_fdatawait(inode->i_mapping);
> +
> +	return err;
> +}

This function is HUGE!  And hardly a fastpath.

>  static int
>  nfsd_sync(struct file *filp)
>  {
> -	return vfs_fsync(filp, filp->f_path.dentry, 0);
> +        int err;
> +	struct inode *inode = filp->f_path.dentry->d_inode;
> +	dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
> +	mutex_lock(&inode->i_mutex);
> +	err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);

(checkpatch?)

> +	mutex_unlock(&inode->i_mutex);
> +
> +	return err;
>  }
>  
>  int
> -nfsd_sync_dir(struct dentry *dentry)
> +nfsd_sync_dir(struct dentry *dp)
>  {
> -	return vfs_fsync(NULL, dentry, 0);
> +	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
>  }

And we expand it twice.  


* Re: nfsd stuckage
  2009-01-07  0:23         ` Andrew Morton
@ 2009-01-07  0:28           ` J. Bruce Fields
  2009-01-07  7:42             ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2009-01-07  0:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hch, linux-nfs, linux-kernel, neilb, snakebyte

On Tue, Jan 06, 2009 at 04:23:28PM -0800, Andrew Morton wrote:
> On Tue, 6 Jan 2009 19:15:01 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> >     nfsd: fix double-locks of directory mutex
> 
> grumble.

This is literally just a revert of part of 4c728ef583b3d822; if you'd
like me to clean up this stuff while I'm there, I'm happy to.

--b.



* Re: nfsd stuckage
  2009-01-07  0:28           ` J. Bruce Fields
@ 2009-01-07  7:42             ` Christoph Hellwig
  2009-01-07 16:56               ` J. Bruce Fields
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2009-01-07  7:42 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Andrew Morton, hch, linux-nfs, linux-kernel, neilb, snakebyte

On Tue, Jan 06, 2009 at 07:28:16PM -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 04:23:28PM -0800, Andrew Morton wrote:
> > On Tue, 6 Jan 2009 19:15:01 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > >     nfsd: fix double-locks of directory mutex
> > 
> > grumble.
> 
> This is literally just a revert of part of 4c728ef583b3d822; if you'd
> like me to clean up this stuff while I'm there, I'm happy to.

Please leave it as the revert.  NFSD really needs to use vfs_fsync
eventually so we can sort out our ->fsync usage.  I suspect the best
way to get there is to do the i_mutex removal for fsync earlier than
planned, but I'll need to audit the filesystems first.



* Re: nfsd stuckage
  2009-01-07  7:42             ` Christoph Hellwig
@ 2009-01-07 16:56               ` J. Bruce Fields
  2009-01-07 17:22                 ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: J. Bruce Fields @ 2009-01-07 16:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, hch, linux-nfs, linux-kernel, neilb, snakebyte

On Wed, Jan 07, 2009 at 02:42:56AM -0500, Christoph Hellwig wrote:
> On Tue, Jan 06, 2009 at 07:28:16PM -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 04:23:28PM -0800, Andrew Morton wrote:
> > > On Tue, 6 Jan 2009 19:15:01 -0500
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > 
> > > >     nfsd: fix double-locks of directory mutex
> > > 
> > > grumble.
> > 
> > This is literally just a revert of part of 4c728ef583b3d822; if you'd
> > like me to clean up this stuff while I'm there, I'm happy to.
> 
> Please leave it as the revert.  NFSD really needs to use vfs_fsync
> eventually so we can sort out our ->fsync usage.

OK.  Mind if we just revert the whole commit for now?  If the
double-lock regression is still there for ecryptfs exports, then I'd
rather do a simple revert of the whole patch and not try to pick out
just the fs/nfsd/vfs.c part.

--b.

> I suspect the best way to get there is to do the i_mutex removal for
> fsync earlier than planned, but I'll need to audit the filesystems
> first.


* Re: nfsd stuckage
  2009-01-07 16:56               ` J. Bruce Fields
@ 2009-01-07 17:22                 ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2009-01-07 17:22 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Andrew Morton, hch, linux-nfs, linux-kernel,
	neilb, snakebyte

On Wed, Jan 07, 2009 at 11:56:39AM -0500, J. Bruce Fields wrote:
> OK.  Mind if we just revert the whole commit for now?  If the
> double-lock regression is still there for ecryptfs exports, then I'd
> rather do a simple revert of the whole patch and not try to pick out
> just the fs/nfsd/vfs.c part.

Umm, exporting ecryptfs would previously take the lower i_mutex in
the ecryptfs fsync method and now does so in vfs_fsync; there should
be no change in behaviour.



* Re: nfsd stuckage
  2009-01-06 22:56 nfsd stuckage Andrew Morton
  2009-01-06 23:02 ` J. Bruce Fields
@ 2009-01-08 14:57 ` Peter Zijlstra
  2009-01-08 16:05   ` J. Bruce Fields
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2009-01-08 14:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-nfs, linux-kernel, Neil Brown, J. Bruce Fields,
	Christoph Hellwig

On Tue, 2009-01-06 at 14:56 -0800, Andrew Morton wrote:
> I just built current mainline plus the just-sent 266 -mm patches.
> 
> The machine failed to power off when hit with `halt -pfn'.  dmesg output:

> [  672.162677] INFO: task nfsd4:4324 blocked for more than 480 seconds.
> [  672.162706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  672.162725]  ffff880251df1d60 0000000000000046 ffff88025e1c0580 ffff8802488013d8
> [  672.162753]  ffff880251df1d20 ffff88024c49a7a0 ffff88025e088760 ffff88024c49ab18
> [  672.162834]  000000002807ee00 00000000ffff59ec ffff880251df1d50 0000000000000282
> [  672.162865] Call Trace:
> [  672.162880]  [<ffffffff8052a1d8>] __mutex_lock_slowpath+0x6a/0xac
> [  672.162895]  [<ffffffff8052a0c5>] mutex_lock+0x2c/0x30
> [  672.162908]  [<ffffffff802c4747>] vfs_fsync+0x63/0xa9
> [  672.162933]  [<ffffffffa048c74c>] nfsd_sync_dir+0x10/0x12 [nfsd]
> [  672.162960]  [<ffffffffa04a5427>] nfsd4_sync_rec_dir+0x27/0x40 [nfsd]
> [  672.162984]  [<ffffffffa04a592a>] nfsd4_recdir_purge_old+0x3d/0x6a [nfsd]
> [  672.163023]  [<ffffffffa04a1745>] laundromat_main+0x62/0x225 [nfsd]
> [  672.163049]  [<ffffffffa04a16e3>] ? laundromat_main+0x0/0x225 [nfsd]
> [  672.163064]  [<ffffffff8024b4a7>] run_workqueue+0x8d/0x124
> [  672.163076]  [<ffffffff8024b5e0>] ? worker_thread+0x0/0xe5
> [  672.163089]  [<ffffffff8024b6b8>] worker_thread+0xd8/0xe5
> [  672.163102]  [<ffffffff8024e8cc>] ? autoremove_wake_function+0x0/0x36
> [  672.163115]  [<ffffffff8024b5e0>] ? worker_thread+0x0/0xe5
> [  672.163127]  [<ffffffff8024e5e0>] kthread+0x44/0x6b
> [  672.163140]  [<ffffffff8020cfba>] child_rip+0xa/0x20
> [  672.163151]  [<ffffffff8024e59c>] ? kthread+0x0/0x6b
> [  672.163162]  [<ffffffff8020cfb0>] ? child_rip+0x0/0x20

FWIW lockdep seems to warn about this...

All I have to do to trigger this is boot the machine and let it sit for
a few minutes.

[  113.552497] =============================================
[  113.553289] [ INFO: possible recursive locking detected ]
[  113.553289] 2.6.28-tip #592                              
[  113.553289] ---------------------------------------------
[  113.553289] nfsd4/1914 is trying to acquire lock:        
[  113.553289]  (&type->i_mutex_dir_key#4){--..}, at: [<ffffffff802e7e5e>] vfs_fsync+0x6c/0xb1
[  113.553289]                                                                                
[  113.553289] but task is already holding lock:                                              
[  113.553289]  (&type->i_mutex_dir_key#4){--..}, at: [<ffffffffa0190727>] nfsd4_sync_rec_dir+0x22/0x47 [nfsd]                                                                                                        
[  113.553289]                                                                                             
[  113.553289] other info that might help us debug this:                                                   
[  113.553289] 4 locks held by nfsd4/1914:                                                                 
[  113.553289]  #0:  (nfsd4){--..}, at: [<ffffffff80252303>] run_workqueue+0xb6/0x21b                      
[  113.553289]  #1:  ((laundromat_work).work){--..}, at: [<ffffffff80252303>] run_workqueue+0xb6/0x21b     
[  113.553289]  #2:  (client_mutex){--..}, at: [<ffffffffa018bd05>] laundromat_main+0x33/0x24e [nfsd]      
[  113.553289]  #3:  (&type->i_mutex_dir_key#4){--..}, at: [<ffffffffa0190727>] nfsd4_sync_rec_dir+0x22/0x47 [nfsd]                                                                                                   
[  113.553289]                                                                                             
[  113.553289] stack backtrace:                                                                            
[  113.553289] Pid: 1914, comm: nfsd4 Not tainted 2.6.28-tip #592                                          
[  113.553289] Call Trace:                                                                                 
[  113.553289]  [<ffffffff80266987>] __lock_acquire+0xe42/0x161a                                           
[  113.553289]  [<ffffffff80288857>] ? __call_rcu+0x7a/0x107                                               
[  113.553289]  [<ffffffff802671b4>] lock_acquire+0x55/0x71                                                
[  113.553289]  [<ffffffff802e7e5e>] ? vfs_fsync+0x6c/0xb1                                                 
[  113.553289]  [<ffffffff805568d0>] mutex_lock_nested+0x4e/0x320                                          
[  113.553289]  [<ffffffff802e7e5e>] ? vfs_fsync+0x6c/0xb1                                                 
[  113.553289]  [<ffffffff8029bde0>] ? __filemap_fdatawrite_range+0x57/0x5f                                
[  113.553289]  [<ffffffff802e7e5e>] vfs_fsync+0x6c/0xb1                                                   
[  113.553289]  [<ffffffffa0176f8f>] nfsd_sync_dir+0x15/0x17 [nfsd]                                        
[  113.553289]  [<ffffffffa0190733>] nfsd4_sync_rec_dir+0x2e/0x47 [nfsd]                                   
[  113.553289]  [<ffffffffa0190791>] nfsd4_recdir_purge_old+0x45/0x73 [nfsd]                               
[  113.553289]  [<ffffffffa018bd44>] laundromat_main+0x72/0x24e [nfsd]                                     
[  113.553289]  [<ffffffff80252355>] run_workqueue+0x108/0x21b                                             
[  113.553289]  [<ffffffff80252303>] ? run_workqueue+0xb6/0x21b                                            
[  113.553289]  [<ffffffffa018bcd2>] ? laundromat_main+0x0/0x24e [nfsd]                                    
[  113.553289]  [<ffffffff8025254d>] worker_thread+0xe5/0xf6                                               
[  113.553289]  [<ffffffff80256615>] ? autoremove_wake_function+0x0/0x3d                                   
[  113.553289]  [<ffffffff80252468>] ? worker_thread+0x0/0xf6                                              
[  113.553289]  [<ffffffff80256200>] kthread+0x4e/0x7b                                                     
[  113.553289]  [<ffffffff8020d51a>] child_rip+0xa/0x20                                                    
[  113.553289]  [<ffffffff8020cec0>] ? restore_args+0x0/0x30                                               
[  113.553289]  [<ffffffff802561b2>] ? kthread+0x0/0x7b                                                    
[  113.553289]  [<ffffffff8020d510>] ? child_rip+0x0/0x20 




* Re: nfsd stuckage
  2009-01-08 14:57 ` Peter Zijlstra
@ 2009-01-08 16:05   ` J. Bruce Fields
  0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2009-01-08 16:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, linux-nfs, linux-kernel, Neil Brown,
	Christoph Hellwig

On Thu, Jan 08, 2009 at 03:57:30PM +0100, Peter Zijlstra wrote:
> FWIW lockdep seems to warn about this...
> 
> All I have to do to trigger this is boot the machine and let it sit for
> a few minutes.

Linus merged a fix (9a8d248e2d2 "nfsd: fix double-locks of directory
mutex") last night.  If you still see warnings after that, let us know.

--b.



end of thread, other threads:[~2009-01-08 16:05 UTC | newest]

Thread overview: 12+ messages
2009-01-06 22:56 nfsd stuckage Andrew Morton
2009-01-06 23:02 ` J. Bruce Fields
2009-01-06 23:03   ` Christoph Hellwig
2009-01-06 23:05     ` J. Bruce Fields
2009-01-07  0:15       ` J. Bruce Fields
2009-01-07  0:23         ` Andrew Morton
2009-01-07  0:28           ` J. Bruce Fields
2009-01-07  7:42             ` Christoph Hellwig
2009-01-07 16:56               ` J. Bruce Fields
2009-01-07 17:22                 ` Christoph Hellwig
2009-01-08 14:57 ` Peter Zijlstra
2009-01-08 16:05   ` J. Bruce Fields
