* NFS4ERR_RESOURCE returns Remote IO Error
@ 2009-09-11  7:46 André Roth
From: André Roth @ 2009-09-11  7:46 UTC
  To: linux-nfs


Hello list,

We have a problem with the NFS client on Linux on a NetApp filer. Every
now and then we are getting huge amounts of "Remote IO Errors" on the
mounted home directories. This is new; we had no problems with the same
setup some months ago.

The reason for the Remote IO Errors is that the server is returning an
NFS4ERR_RESOURCE error, even just for a simple "touch foo". I have listed
the tcpdump of such a touch command below.

I suspect that maybe the deduplication or backup operation makes the
filer "busy" so it cannot respond to all compound operations.

I've seen that the Linux NFS client just passes this error to the user as
EREMOTEIO. OpenSolaris seems to delay and retry. According to the RFC
I'm not sure what would be the right thing to do. But I think this is
meant to be a temporary error, and an NFS client should, in the example
below, take the first two operations as completed, and retry the
remaining five, or maybe retry the whole compound message.

What is your insight on this error?
Is there something I can try, like returning EAGAIN and see if the error
gets handled correctly?
The problem occurs once or twice a week, without any regularity.

Thanks for your help

 André


"touch foo" request from the client:
==================
NFS      V4 COMPOUND Call PUTFH;SAVEFH;OPEN;GETFH;GETATTR;RESTOREFH;GETATTR
Remote Procedure Call, Type:Call XID:0x00c46a7a
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 0
    Operations (count: 7)
        Opcode: PUTFH (22)
        Opcode: SAVEFH (32)
        Opcode: OPEN (18)
        Opcode: GETFH (10)
        Opcode: GETATTR (9)
        Opcode: RESTOREFH (31)
        Opcode: GETATTR (9)
==================


Error Response from the server:
==================
NFS      V4 COMPOUND Reply PUTFH;SAVEFH;OPEN
Remote Procedure Call, Type:Reply XID:0x00c46a7a
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_RESOURCE (10018)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 3)
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: SAVEFH (32)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4ERR_RESOURCE (10018)
==================
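
The request and reply above can be lined up mechanically. The following is a minimal Python sketch, not code from any NFS client: split_compound and CompoundOutcome are hypothetical names introduced here, and the reply is assumed to be already decoded into (opcode name, status) pairs. It shows which operations the server reported as completed, which one stopped processing, and which were never evaluated.

```python
from typing import List, NamedTuple, Optional, Tuple

NFS4_OK = 0
NFS4ERR_RESOURCE = 10018  # numeric value as shown in the reply dump


class CompoundOutcome(NamedTuple):
    completed: List[str]               # ops the server reported as NFS4_OK
    failed: Optional[Tuple[str, int]]  # (op, status) that stopped processing
    not_attempted: List[str]           # ops the server never evaluated


def split_compound(request_ops, reply):
    """Line up a decoded COMPOUND reply with the ops that were requested.

    `reply` is a list of (opcode name, status) pairs in reply order.
    """
    completed = []
    failed = None
    for requested, (op, status) in zip(request_ops, reply):
        if op != requested:
            raise ValueError("reply op %s does not match request op %s"
                             % (op, requested))
        if status == NFS4_OK:
            completed.append(op)
        else:
            failed = (op, status)
            break
    evaluated = len(completed) + (1 if failed else 0)
    return CompoundOutcome(completed, failed, list(request_ops[evaluated:]))


# The "touch foo" exchange from the dumps.
REQUEST_OPS = ["PUTFH", "SAVEFH", "OPEN", "GETFH",
               "GETATTR", "RESTOREFH", "GETATTR"]
REPLY = [("PUTFH", NFS4_OK), ("SAVEFH", NFS4_OK), ("OPEN", NFS4ERR_RESOURCE)]

outcome = split_compound(REQUEST_OPS, REPLY)
print(outcome)
```

For this exchange the sketch reports two completed operations, OPEN as the failing one, and four operations that were never attempted, which together with OPEN are the five mentioned in the message.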


* Re: NFS4ERR_RESOURCE returns Remote IO Error
@ 2009-09-11 12:33 Trond Myklebust
From: Trond Myklebust @ 2009-09-11 12:33 UTC
  To: André Roth; +Cc: linux-nfs

On Fri, 2009-09-11 at 09:46 +0200, André Roth wrote:
> Hello list,
>
> We have a problem with the NFS client on Linux on a NetApp filer. Every
> now and then we are getting huge amounts of "Remote IO Errors" on the
> mounted home directories. This is new; we had no problems with the same
> setup some months ago.
>
> The reason for the Remote IO Errors is that the server is returning an
> NFS4ERR_RESOURCE error, even just for a simple "touch foo". I have listed
> the tcpdump of such a touch command below.
>
> I suspect that maybe the deduplication or backup operation makes the
> filer "busy" so it cannot respond to all compound operations.
>
> I've seen that the Linux NFS client just passes this error to the user as
> EREMOTEIO. OpenSolaris seems to delay and retry. According to the RFC
> I'm not sure what would be the right thing to do. But I think this is
> meant to be a temporary error, and an NFS client should, in the example
> below, take the first two operations as completed, and retry the
> remaining five, or maybe retry the whole compound message.
>
> What is your insight on this error?
> Is there something I can try, like returning EAGAIN and see if the error
> gets handled correctly?
> The problem occurs once or twice a week, without any regularity.

This has been discussed extensively on the IETF mailing lists, and the
code therefore reflects our current understanding of RFC3530.

Unlike NFS4ERR_DELAY, the error NFS4ERR_RESOURCE is not listed as a
temporary error, nor is there any admonition in the RFC that the client
should retry or do anything other than split up a possibly overly complex
COMPOUND (which is usually not possible for us). To quote:

   In the processing of the COMPOUND procedure, the server may find
   that it does not have the available resources to execute any or all
   of the operations within the COMPOUND sequence.  In this case, the
   error NFS4ERR_RESOURCE will be returned for the particular operation
   within the COMPOUND procedure where the resource exhaustion
   occurred.  This assumes that all previous operations within the
   COMPOUND sequence have been evaluated successfully.  The results for
   all of the evaluated operations must be returned to the client.
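
Read against the trace earlier in this thread, the quoted paragraph describes in-order evaluation that stops at the first failing operation. A minimal Python sketch of that behaviour follows. It is an illustration only, not filer or kernel code; evaluate_compound and starved_at_open are hypothetical names, and the overall COMPOUND status is assumed to be the status of the last operation evaluated, as the reply in the trace suggests.

```python
NFS4_OK = 0
NFS4ERR_RESOURCE = 10018  # numeric value as shown in the trace in this thread


def evaluate_compound(ops, execute):
    """Evaluate ops in order and stop at the first non-OK status.

    `execute` is a hypothetical callable mapping an op name to a status.
    Returns (overall status, results for every op that was evaluated).
    """
    results = []
    status = NFS4_OK
    for op in ops:
        status = execute(op)
        results.append((op, status))
        if status != NFS4_OK:
            break  # later ops are never evaluated
    return status, results


def starved_at_open(op):
    """Hypothetical executor that runs out of resources on OPEN."""
    return NFS4ERR_RESOURCE if op == "OPEN" else NFS4_OK


ops = ["PUTFH", "SAVEFH", "OPEN", "GETFH", "GETATTR", "RESTOREFH", "GETATTR"]
status, results = evaluate_compound(ops, starved_at_open)
print(status, results)
# prints: 10018 [('PUTFH', 0), ('SAVEFH', 0), ('OPEN', 10018)]
```

The three results and the overall status match the shape of the error response in the trace: results for the evaluated operations are returned, and nothing is reported for the four operations after OPEN.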

IOW: If dedup is causing the filer to start returning NFS4ERR_RESOURCE
errors to the client, then that would appear to be a serious server bug.
I'd suggest you contact your NetApp sales rep, and ask them to help you
escalate this so that it gets due attention.
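
The distinction drawn above between NFS4ERR_DELAY and NFS4ERR_RESOURCE can be sketched as a client-side policy. This is a simplified Python illustration, not the Linux client's code. run_compound and its parameters are hypothetical, the value 10008 for NFS4ERR_DELAY is taken from RFC 3530, the retry budget and backoff are arbitrary assumptions, and every non-retryable status is collapsed to EREMOTEIO for brevity.

```python
import errno
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008     # listed as a temporary error (value per RFC 3530)
NFS4ERR_RESOURCE = 10018  # value as shown in the trace in this thread

# EREMOTEIO is Linux-specific; fall back to its Linux value (121) elsewhere.
EREMOTEIO = getattr(errno, "EREMOTEIO", 121)


def run_compound(send, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Send a COMPOUND via the hypothetical callable `send` (returns a status).

    NFS4ERR_DELAY is retried with exponential backoff. Any other error,
    including NFS4ERR_RESOURCE, is returned to the caller as -EREMOTEIO
    without a retry (a simplification for this sketch).
    """
    for attempt in range(max_retries + 1):
        status = send()
        if status == NFS4_OK:
            return 0
        if status == NFS4ERR_DELAY and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
            continue
        return -EREMOTEIO
```

With this policy a server that answers NFS4ERR_DELAY twice and then NFS4_OK produces a successful call after two sleeps, while a single NFS4ERR_RESOURCE reply surfaces immediately as -EREMOTEIO, which is the behaviour reported at the start of the thread.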

Cheers,
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
