linux-fsdevel.vger.kernel.org archive mirror
* [PATCH,RESEND] make knfsd interact cleanly with HSMs
@ 2006-05-05 11:52 Greg Banks
  2006-05-08  1:13 ` Neil Brown
  2006-05-08  6:42 ` [NFS] " Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Greg Banks @ 2006-05-05 11:52 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List, Linux Filesystem Mailing List

G'day,

The NFSv3 protocol specifies an error, NFS3ERR_JUKEBOX, which a server
should return when an I/O operation will take a very long time.
This causes a different pattern of retries in clients, and avoids
a number of serious problems associated with I/Os which take longer
than an RPC timeout.  The Linux knfsd server has code to generate the
jukebox error and many NFS clients are known to have working code to
handle it.

One scenario in which a server should emit the JUKEBOX error is when
file data which the client is attempting to access is managed by
an HSM (Hierarchical Storage Manager) and is not present on the disk
and needs to be brought in from tape.  Due to the nature of tapes this
operation can take minutes rather than the milliseconds normally seen
for local file data.

Currently the Linux knfsd handles this situation poorly.  A READ NFS
call will cause the nfsd thread handling it to block until the file
is available, without sending a reply to the NFS client.  After a
few seconds the client retries, and this second READ call causes
another nfsd to block behind the first one.  A few seconds later and
the client's retries have blocked *all* the nfsd threads, and all NFS
service from the server stops until the original file arrives on disk.
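
To put a rough number on the lock-up described above, here is a minimal
sketch. It is a toy model rather than kernel code, and it assumes a single
client that retries at a fixed interval; real clients usually back off, so
treat the result as an order-of-magnitude estimate. The function name and
both parameters are hypothetical.

```c
/*
 * Toy model, not kernel code: how long until retries of one blocked READ
 * have tied up every nfsd thread, assuming each retry arrives a fixed
 * interval after the previous one and lands on a free thread.
 */
static unsigned int seconds_until_all_blocked(unsigned int nr_threads,
					      unsigned int retry_interval_secs)
{
	/* With no threads there is nothing left to block. */
	if (nr_threads == 0)
		return 0;

	/* The original call blocks one thread at t=0; each retry blocks one more. */
	return (nr_threads - 1) * retry_interval_secs;
}
```

With, say, 8 threads and a 5 second retry interval, this model has the last
thread blocked 35 seconds after the first READ, which is well short of the
minutes a tape recall can take.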

WRITEs and SETATTRs which truncate the file are marginally better, in
that the knfsd dupcache will catch the retries and drop them without
blocking an nfsd (the dupcache *will* catch the retries because the
cache entry remains in RC_INPROG state and is not reused until the
first call finishes).  However the first call still blocks, so given
WRITEs to enough offline files the server can still be locked up.
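
The dupcache behaviour described above can be sketched as follows. This is
a simplified userspace model, not the knfsd implementation: the `toy_`
names, the fixed-size array and the lookup by XID alone are all assumptions
made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the cache entry states and lookup results. */
enum toy_state  { TOY_UNUSED, TOY_INPROG, TOY_DONE };
enum toy_action { TOY_DOIT, TOY_DROPIT, TOY_REPLY };

struct toy_entry {
	uint32_t xid;		/* RPC transaction id of the cached call */
	enum toy_state state;
};

#define TOY_CACHE_SIZE 4

/*
 * Look up a request by XID.  A retry of a call that is still in progress
 * is dropped without occupying another thread; a retry of a finished call
 * would be answered from the cache; anything else is new work.
 */
static enum toy_action toy_cache_lookup(struct toy_entry *cache, uint32_t xid)
{
	struct toy_entry *free_slot = NULL;
	size_t i;

	for (i = 0; i < TOY_CACHE_SIZE; i++) {
		if (cache[i].state == TOY_UNUSED) {
			if (!free_slot)
				free_slot = &cache[i];
			continue;
		}
		if (cache[i].xid == xid)
			return cache[i].state == TOY_INPROG ?
				TOY_DROPIT : TOY_REPLY;
	}

	/* New request: remember it as in progress if a slot is free. */
	if (free_slot) {
		free_slot->xid = xid;
		free_slot->state = TOY_INPROG;
	}
	return TOY_DOIT;
}
```

The point of the model is that the in-progress entry is never reused while
the first call is blocked, so every retry with the same XID is dropped.
The first call itself still holds its thread.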

There are also client-side implications, depending on the client
implementation.  For example, on a Linux client an RPC retry loop uses
an RPC request slot, so reads from enough separate offline files can
lock up a mountpoint.

This patch seeks to remedy the interaction between knfsd and HSMs by
providing mechanisms to allow knfsd to tell an underlying filesystem
(which supports HSMs) not to block for reads, writes and truncates
of offline files.  It's a port of a Linux 2.4 patch used in SGI's
ProPack distro since 2004 and in SLES9 since SP2.  The patch:

*  provides a new ATTR_NO_BLOCK flag which the kernel can
   use to tell a filesystem's inode_ops->setattr() operation not
   to block when truncating an offline file.  XFS already obeys
   this flag (inside a #ifdef)

*  changes knfsd to provide ATTR_NO_BLOCK when it does the VFS
   calls to implement the SETATTR NFS call.

*  changes knfsd to supply the O_NONBLOCK flag in the temporary
   struct file it uses for VFS reads and writes, in order to ask
   the filesystem not to block when reading or writing an offline
   file.  XFS already obeys this new semantic for O_NONBLOCK
   (and in SLES9 so does JFS).

*  adds code to translate the -EAGAIN the filesystem returns when
   it would have blocked, to the -ETIMEDOUT that knfsd expects.

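The error translation in the last item can be sketched in isolation as
below. This is a userspace illustration only: the `sketch_` helpers are
hypothetical, and the second one stands in for a tiny part of what knfsd's
nfserrno() does. The numeric status values are the NFSv3 ones from
RFC 1813.

```c
#include <errno.h>

/* NFSv3 status codes as defined by RFC 1813. */
#define SKETCH_NFS3_OK		0
#define SKETCH_NFS3ERR_IO	5
#define SKETCH_NFS3ERR_JUKEBOX	10008

/* Step 1: map the filesystem's "would block" to the errno knfsd expects. */
static int sketch_hsm_errno(int err)
{
	return err == -EAGAIN ? -ETIMEDOUT : err;
}

/* Step 2: hypothetical, heavily reduced errno-to-NFS-status mapping. */
static int sketch_nfserrno(int err)
{
	switch (err) {
	case 0:
		return SKETCH_NFS3_OK;
	case -ETIMEDOUT:
		return SKETCH_NFS3ERR_JUKEBOX;
	default:
		return SKETCH_NFS3ERR_IO;
	}
}
```

Chaining the two, `sketch_nfserrno(sketch_hsm_errno(-EAGAIN))` yields the
jukebox status, while other errors pass through the first step unchanged.
The patch below only requests non-blocking behaviour when rq_vers >= 3,
since NFSv2 has no jukebox error to return.
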

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
(SLES9 patch Acked-by: okir@suse.de)
---

This is a resend of

http://marc.theaimsgroup.com/?l=linux-nfs&m=111087383132762&w=2


 fs/nfsd/vfs.c      |   33 +++++++++++++++++++++++++++++++--
 include/linux/fs.h |    1 +
 2 files changed, 32 insertions(+), 2 deletions(-)

Index: linux/fs/nfsd/vfs.c
===================================================================
--- linux.orig/fs/nfsd/vfs.c	2006-05-05 19:49:38.434101243 +1000
+++ linux/fs/nfsd/vfs.c	2006-05-05 21:00:22.568841897 +1000
@@ -327,6 +327,16 @@ nfsd_setattr(struct svc_rqst *rqstp, str
 			goto out_nfserr;
 		}
 		DQUOT_INIT(inode);
+
+
+		/*
+		 * Tell a Hierarchical Storage Manager (e.g. via DMAPI) to
+		 * return EAGAIN when an action would take minutes instead of
+		 * milliseconds so that NFS can reply to the client with
+		 * NFSERR_JUKEBOX instead of blocking an nfsd thread.
+		 */
+		if (rqstp->rq_vers >= 3)
+			iap->ia_valid |= ATTR_NO_BLOCK;
 	}
 
 	imode = inode->i_mode;
@@ -349,6 +359,9 @@ nfsd_setattr(struct svc_rqst *rqstp, str
 	if (!check_guard || guardtime == inode->i_ctime.tv_sec) {
 		fh_lock(fhp);
 		err = notify_change(dentry, iap);
+		/* to get NFSERR_JUKEBOX on the wire, need -ETIMEDOUT */
+		if (err == -EAGAIN)
+			err = -ETIMEDOUT;
 		err = nfserrno(err);
 		fh_unlock(fhp);
 	}
@@ -834,6 +847,10 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 	if (ra && ra->p_set)
 		file->f_ra = ra->p_ra;
 
+	/* Support HSMs -- see comment in nfsd_setattr() */
+	if (rqstp->rq_vers >= 3)
+		file->f_flags |= O_NONBLOCK;
+
 	if (file->f_op->sendfile) {
 		svc_pushback_unused_pages(rqstp);
 		err = file->f_op->sendfile(file, &offset, *count,
@@ -859,8 +876,12 @@ nfsd_vfs_read(struct svc_rqst *rqstp, st
 		*count = err;
 		err = 0;
 		fsnotify_access(file->f_dentry);
-	} else 
+	} else {
+		/* to get NFSERR_JUKEBOX on the wire, need -ETIMEDOUT */
+		if (err == -EAGAIN)
+			err = -ETIMEDOUT;
 		err = nfserrno(err);
+	}
 out:
 	return err;
 }
@@ -918,6 +939,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
 	if (stable && !EX_WGATHER(exp))
 		file->f_flags |= O_SYNC;
 
+	/* Support HSMs -- see comment in nfsd_setattr() */
+	if (rqstp->rq_vers >= 3)
+		file->f_flags |= O_NONBLOCK;
+
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
 	err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
@@ -970,8 +995,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, s
 	dprintk("nfsd: write complete err=%d\n", err);
 	if (err >= 0)
 		err = 0;
-	else 
+	else {
+		/* to get NFSERR_JUKEBOX on the wire, need -ETIMEDOUT */
+		if (err == -EAGAIN)
+			err = -ETIMEDOUT;
 		err = nfserrno(err);
+	}
 out:
 	return err;
 }
Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h	2006-05-05 19:49:40.587127963 +1000
+++ linux/include/linux/fs.h	2006-05-05 21:00:22.569818326 +1000
@@ -273,6 +273,7 @@ typedef void (dio_iodone_t)(struct kiocb
 #define ATTR_KILL_SUID	2048
 #define ATTR_KILL_SGID	4096
 #define ATTR_FILE	8192
+#define ATTR_NO_BLOCK	32768	/* Return EAGAIN and don't block on long truncates */
 
 /*
  * This is the Inode Attributes structure, used for notify_change().  It


-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


* Re: [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-05 11:52 [PATCH,RESEND] make knfsd interact cleanly with HSMs Greg Banks
@ 2006-05-08  1:13 ` Neil Brown
  2006-05-08  6:42 ` [NFS] " Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Neil Brown @ 2006-05-08  1:13 UTC (permalink / raw)
  To: Greg Banks; +Cc: Linux NFS Mailing List, Linux Filesystem Mailing List

On Friday May 5, gnb@sgi.com wrote:
> 
> This patch seeks to remedy the interaction between knfsd and HSMs by
> providing mechanisms to allow knfsd to tell an underlying filesystem
> (which supports HSMs) not to block for reads, writes and truncates
> of offline files.  It's a port of a Linux 2.4 patch used in SGI's
> ProPack distro since 2004 and in SLES9 since SP2.  The patch:
> 
> *  provides a new ATTR_NO_BLOCK flag which the kernel can
>    use to tell a filesystem's inode_ops->setattr() operation not
>    to block when truncating an offline file.  XFS already obeys
>    this flag (inside a #ifdef)
> 
> *  changes knfsd to provide ATTR_NO_BLOCK when it does the VFS
>    calls to implement the SETATTR NFS call.
> 
> *  changes knfsd to supply the O_NONBLOCK flag in the temporary
>    struct file it uses for VFS reads and writes, in order to ask
>    the filesystem not to block when reading or writing an offline
>    file.  XFS already obeys this new semantic for O_NONBLOCK
>    (and in SLES9 so does JFS).
> 
> *  adds code to translate the -EAGAIN the filesystem returns when
>    it would have blocked, to the -ETIMEDOUT that knfsd expects.
> 

Yes, I'm happy with this.  All the changes make sense and look right.

Thanks for persisting with it.

I'll send it on up the chain...

NeilBrown



* Re: [NFS] [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-05 11:52 [PATCH,RESEND] make knfsd interact cleanly with HSMs Greg Banks
  2006-05-08  1:13 ` Neil Brown
@ 2006-05-08  6:42 ` Christoph Hellwig
  2006-05-08 11:16   ` Neil Brown
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2006-05-08  6:42 UTC (permalink / raw)
  To: Greg Banks
  Cc: Neil Brown, Linux NFS Mailing List, Linux Filesystem Mailing List

NACK.  As long as we have no HSM support in the tree there's no reason to
add this.  From the kernel's point it's just untested and unused code that
can break.



* Re: [NFS] [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-08  6:42 ` [NFS] " Christoph Hellwig
@ 2006-05-08 11:16   ` Neil Brown
  2006-05-08 11:37     ` Nathan Scott
  2006-05-08 17:55     ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Neil Brown @ 2006-05-08 11:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Greg Banks, Linux NFS Mailing List, Linux Filesystem Mailing List

On Monday May 8, hch@infradead.org wrote:
> NACK.  As long as we have no HSM support in the tree there's no reason to
> add this.  From the kernel's point it's just untested and unused code that
> can break.


Greg: you seemed to suggest that there was already code in XFS that
could make use of this.  If that is so: could you point us to it
please.

Alternately, if this will only be used by future patches, please feel
free to add this patch to that set, and add

  Acked-By: NeilBrown <neilb@suse.de>

to it.

Thanks,
NeilBrown


* Re: [NFS] [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-08 11:16   ` Neil Brown
@ 2006-05-08 11:37     ` Nathan Scott
  2006-05-08 17:55     ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Nathan Scott @ 2006-05-08 11:37 UTC (permalink / raw)
  To: Neil Brown
  Cc: Christoph Hellwig, Greg Banks, Linux NFS Mailing List,
	Linux Filesystem Mailing List

On Mon, May 08, 2006 at 09:16:34PM +1000, Neil Brown wrote:
> On Monday May 8, hch@infradead.org wrote:
> > NACK.  As long as we have no HSM support in the tree there's no reason to
> > add this.  From the kernel's point it's just untested and unused code that
> > can break.
> 
> Greg: you seemed to suggest that there was already code in XFS that
> could make use of this.  If that is so: could you point us to it
> please.

The code in question is in fs/xfs/linux-2.6/xfs_iops.c::xfs_vn_setattr,
fs/xfs/xfs_dmapi.h::AT_DELAY_FLAG, and its users in xfs_setattr and
xfs_free_file_space and one or two other spots in XFS.  The DMAPI code
(driver, outside XFS) can be found in the XFS CVS tree on oss.sgi.com.
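
For readers without the XFS tree to hand, the general pattern on the
filesystem side can be sketched as below. This is not the XFS or DMAPI
code: the structure, the `offline` field and the function are hypothetical,
and only the flag value 32768 is taken from the patch under discussion.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_ATTR_SIZE	8	/* size change requested */
#define SKETCH_ATTR_NO_BLOCK	32768	/* value proposed in the patch */

/* Hypothetical inode: "offline" means the data lives only on tape. */
struct sketch_inode {
	bool offline;
	long size;
};

/*
 * Hypothetical setattr: a truncate of an offline file either fails fast
 * with -EAGAIN (caller asked not to block) or waits for the recall.
 */
static int sketch_setattr(struct sketch_inode *inode, unsigned int ia_valid,
			  long new_size)
{
	if (!(ia_valid & SKETCH_ATTR_SIZE))
		return 0;

	if (inode->offline) {
		if (ia_valid & SKETCH_ATTR_NO_BLOCK)
			return -EAGAIN;
		/* A real HSM would block here for the recall; the toy just flips the flag. */
		inode->offline = false;
	}

	inode->size = new_size;
	return 0;
}
```

A caller such as knfsd would set the no-block bit and translate the
resulting -EAGAIN, whereas a local truncate without the bit would wait for
the data as before.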

> Alternately, if this will only be used by future patches, please feel
> free to add this patch to that set, and add

The DMAPI code was released at the same time as XFS, but it seems to
be a spec that is not widely used (unsurprising since not many people
have tape robots doing their bidding) and, perhaps more importantly,
it's not viewed as a particularly clean spec by, erm, certain kernel
developers. ;)

cheers.

-- 
Nathan


* Re: [NFS] [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-08 11:16   ` Neil Brown
  2006-05-08 11:37     ` Nathan Scott
@ 2006-05-08 17:55     ` Christoph Hellwig
  2006-05-09  2:35       ` Greg Banks
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2006-05-08 17:55 UTC (permalink / raw)
  To: Neil Brown
  Cc: Christoph Hellwig, Greg Banks, Linux NFS Mailing List,
	Linux Filesystem Mailing List

On Mon, May 08, 2006 at 09:16:34PM +1000, Neil Brown wrote:
> On Monday May 8, hch@infradead.org wrote:
> > NACK.  As long as we have no HSM support in the tree there's no reason to
> > add this.  From the kernel's point it's just untested and unused code that
> > can break.
> 
> 
> Greg: you seemed to suggest that there was already code in XFS that
> could make use of this.  If that is so: could you point us to it
> please.

It's used by SGI's out of tree dmapi implementation.  Because dmapi is such
an utterly braindead standard I don't expect anyone to submit a kernel-based
implementation for inclusion, although support for a big enough subset of
that standard could be achieved by proper kernel <-> userspace cooperation.



* Re: [NFS] [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-08 17:55     ` Christoph Hellwig
@ 2006-05-09  2:35       ` Greg Banks
  2006-05-09  9:19         ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Banks @ 2006-05-09  2:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Neil Brown, Linux NFS Mailing List, Linux Filesystem Mailing List

On Tue, 2006-05-09 at 03:55, Christoph Hellwig wrote:
> On Mon, May 08, 2006 at 09:16:34PM +1000, Neil Brown wrote:
> > On Monday May 8, hch@infradead.org wrote:
> > > NACK.  As long as we have no HSM support in the tree there's no reason to
> > > add this.  From the kernel's point it's just untested and unused code that
> > > can break.

So your only objection is the absence of DMAPI or some equivalent?
If so, perhaps the best way forward in the short term is for SUSE
to add it as an out-of-tree patch.

> > Greg: you seemed to suggest that there was already code in XFS that
> > could make use of this.  If that is so: could you point us to it
> > please.
> 
> It's used by SGI's out of tree dmapi implementation.  Because dmapi is such
> an utterly braindead standard

Sure it's ugly, but to be fair it's no worse than SysV semaphores
and shmem, and it does fill a real (albeit niche) need which nothing
else does AFAIK.  Compare to TLI, which is pretty much entirely useless.

>  I don't expect anyone to submit a kernel-based
> implementation for inclusion, although support for a big enough subset of
> that standard could be achieved by proper kernel <-> userspace cooperation.

I don't understand what you mean here: surely almost all of what
DMAPI specifies is for achieving proper kernel/userspace co-operation?
Do you have some ideas for a better way of doing that?

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.




* Re: [NFS] [PATCH,RESEND] make knfsd interact cleanly with HSMs
  2006-05-09  2:35       ` Greg Banks
@ 2006-05-09  9:19         ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2006-05-09  9:19 UTC (permalink / raw)
  To: Greg Banks
  Cc: Christoph Hellwig, Neil Brown, Linux NFS Mailing List,
	Linux Filesystem Mailing List

On Tue, May 09, 2006 at 12:35:15PM +1000, Greg Banks wrote:
> So your only objection is the absence of DMAPI or some equivalent?
> If so, perhaps the best way forward in the short term is for SUSE
> to add it as an out-of-tree patch.

The objection doesn't have anything to do with DMAPI per se.  It has to
do with adding kernel code that's never used in tree and thus bloats the
kernel.   Also it has a fair chance to bitrot as there's no way it can be
tested.

p.s. -fsdevel couldn't care less about what patches you send to SuSE,
please stop posting such things here and talk to your SuSE technical contacts.

> Sure it's ugly, but to be fair it's no worse than SysV semaphores
> and shmem, and it does fill a real (albeit niche) need which nothing
> else does AFAIK.  Compare to TLI, which is pretty much entirely useless.

It's far worse because it makes assumptions that simply aren't true in
a unix or linux environment.  SysV IPC is ugly but doesn't do this.

> >  I don't expect anyone to submit a kernel-based
> > implementation for inclusion, although support for a big enough subset of
> > that standard could be achieved by proper kernel <-> userspace cooperation.
> 
> I don't understand what you mean here: surely almost all of what
> DMAPI specifies is for achieving proper kernel/userspace co-operation?

No.  The DMAPI spec doesn't even know about the user <-> kernel boundary,
nor should it.

> Do you have some ideas for a better way of doing that?

Yes, and I've written them up quite a few times, please look at the fsdevel
and linux-kernel archives and possibly some SGI-internal lists from years ago.
