* Re: [NFS] I/O Errors with hard mounts
@ 2008-06-06  1:00 Ricardo Labiaga
  0 siblings, 0 replies; 13+ messages in thread
From: Ricardo Labiaga @ 2008-06-06  1:00 UTC (permalink / raw)
  To: David Konerding; +Cc: nfs

You have a significant number of dropped connections, as indicated by
the high EAGAIN count.
I wouldn't be surprised if the 2.6.16 kernel isn't handling the
reconnection correctly and is propagating EIO to the application.
There's been a fair amount of client side work in the RPC reconnection
code recently.  Can you try with a recent kernel?
A network trace and rpcdebug output would be invaluable when you're
able to reproduce this.
- ricardo
On Wed, Jun 4, 2008 at 3:45 PM, Ricardo Labiaga <labiaga@yahoo.com> wrote:
>> Does /var/log/messages show any errors around the same time?
>> In addition to the network trace and rpcdebug on the client, take a
>> look at "nfsstat -d" on the filer.
>> Is the filer dropping the connection?  Look for "dropped with EAGAIN"
>> or "dropped from vol offline" in the output.  This will help narrow
>> down the problem.
> So, sometimes when somebody deletes a lot of data (like the problem we
> just observed), the deleting host, and often other hosts, do report
> 'filer not responding' in the logs.
> However, operations that aren't happening in the delete dir tend to
> work just fine (for example, iozone could be running and doing pretty
> well).  Further, the most recent time this happened, the host didn't
> report filer not responding.
>
> This is the only EAGAIN reference I see:
>
> assist queue (queued, split mbufs, drop for EAGAIN) = (0, 64478612, 94340)
>
> Dave
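
For anyone comparing counters across several captures, here is a minimal
Python sketch that pulls the numbers out of a line shaped like the
"assist queue" line quoted above. The only input format it knows is that
single line; whether other "nfsstat -d" lines share the same
"name (labels) = (values)" shape is an assumption, and parse_counter_line
is a hypothetical helper name, not part of any tool.

```python
import re

# Assumed shape, taken only from the one line quoted in this thread:
#   name (label, label, ...) = (value, value, ...)
COUNTER_LINE = re.compile(
    r"^\s*(?P<name>[^()=]+?)\s*\((?P<labels>[^)]*)\)\s*=\s*\((?P<values>[^)]*)\)\s*$"
)


def parse_counter_line(line):
    """Return (name, {label: int}) for a matching line, else None."""
    match = COUNTER_LINE.match(line)
    if match is None:
        return None
    labels = [part.strip() for part in match.group("labels").split(",")]
    try:
        values = [int(part) for part in match.group("values").split(",")]
    except ValueError:
        return None  # a value was not a plain integer
    if len(labels) != len(values):
        return None  # labels and values do not line up
    return match.group("name"), dict(zip(labels, values))


line = "assist queue (queued, split mbufs, drop for EAGAIN) = (0, 64478612, 94340)"
name, counters = parse_counter_line(line)
print(name, counters["drop for EAGAIN"])  # prints: assist queue 94340
```

Running this against two captures taken a few minutes apart and
subtracting the "drop for EAGAIN" values would show whether the count is
still climbing during the busy period.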


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that nfs@lists.sourceforge.net is being discontinued.
Please subscribe to linux-nfs@vger.kernel.org instead.
    http://vger.kernel.org/vger-lists.html#linux-nfs


* Re: [NFS] I/O Errors with hard mounts
@ 2008-06-05  0:40 Ricardo Labiaga
  0 siblings, 0 replies; 13+ messages in thread
From: Ricardo Labiaga @ 2008-06-05  0:40 UTC (permalink / raw)
  To: dakoner; +Cc: nfs

Can you provide the entire nfsstat -d output on the filer?
(Apologies for the lack of subject line in previous reply)
- ricardo
> -----Original Message-----
> From: David Konerding [mailto:dakoner@gmail.com]
> Sent: Wednesday, June 04, 2008 3:56 PM
> To: nfs@lists.sourceforge.net
> Subject: Re: [NFS] I/O Errors with hard mounts
>
> On Wed, Jun 4, 2008 at 3:45 PM, Ricardo Labiaga
> <labiaga@yahoo.com> wrote:
> > Does /var/log/messages show any errors around the same
> > time?  In addition to the network trace and rpcdebug on the
> > client, take a look at "nfsstat -d" on the filer. Is the
> > filer dropping the connection?  Look for "dropped with
> > EAGAIN" or "dropped from vol offline" in the output.  This
> > will help narrow down the problem.
>
> So, sometimes when somebody deletes a lot of data (like the problem we
> just observed), the deleting host, and often other hosts, do report
> 'filer not responding' in the logs.
> However, operations that aren't happening in the delete dir tend to
> work just fine (for example, iozone could be running and doing pretty
> well).  Further, the most recent time this happened, the host didn't
> report filer not responding.
>
> This is the only EAGAIN reference I see:
>
> assist queue (queued, split mbufs, drop for EAGAIN) = (0, 64478612, 94340)
>
> Dave



* Re: [NFS] I/O Errors with hard mounts
@ 2008-06-04 22:45 Ricardo Labiaga
       [not found] ` <927260.87785.qm-KtJlQ5K7SlOvuULXzWHTWIglqE1Y4D90QQ4Iyu8u01E@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Ricardo Labiaga @ 2008-06-04 22:45 UTC (permalink / raw)
  To: dakoner; +Cc: nfs

Does /var/log/messages show any errors around the same time?  In
addition to the network trace and rpcdebug on the client, take a look
at "nfsstat -d" on the filer. Is the filer dropping the connection?
Look for "dropped with EAGAIN" or "dropped from vol offline" in the
output.  This will help narrow down the problem.
- ricardo
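
To line up application failures with /var/log/messages and a network
trace, it can help to record exactly when an I/O error reaches user
space. The following is only a sketch in Python: call_logging_eio is a
hypothetical wrapper, not something from this thread, and it assumes the
failing operation surfaces as an OSError with errno EIO.

```python
import errno
import time


def call_logging_eio(operation, *args, log=print, **kwargs):
    """Run operation; if it fails with EIO, log a UTC timestamp and re-raise."""
    try:
        return operation(*args, **kwargs)
    except OSError as exc:
        if exc.errno == errno.EIO:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
            op_name = getattr(operation, "__name__", repr(operation))
            log(f"{stamp} UTC EIO from {op_name}: {exc}")
        raise  # never hide the error from the caller


def simulated_read():
    # Stand-in for a file operation on the NFS mount that fails with EIO.
    raise OSError(errno.EIO, "Input/output error")


messages = []
try:
    call_logging_eio(simulated_read, log=messages.append)
except OSError:
    pass  # the wrapper re-raised after logging
print(len(messages))  # prints: 1
```

In real use the wrapped call would be something like os.stat or open on
a path under the NFS mount; the timestamps can then be matched against
the client logs and the trace.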
> -----Original Message-----
> From: David Konerding [mailto:dakoner@gmail.com]
> Sent: Wednesday, June 04, 2008 6:33 AM
> To: nfs@lists.sourceforge.net
> Subject: [NFS] I/O Errors with hard mounts
>
> Hi,
>
> We have a bunch of Linux clients (SLES 10 SP1) which mount a
> NetApp filer.
>
> When the NetApp gets very, very busy, for example, one user is
> deleting 1Tbyte of data while another user is doing a 30 client
> throughput test, it will stop responding to some requests.
>
> Although we are using hard mounts, some users report that during the
> hammering period, some of their file operations produce "I/O Error"
> messages on their terminal.
>
> We checked, and the hosts are indeed using hard mounting.  From our
> reading, I/O Errors should only ever make it back to the user if we
> are using soft mounting.
>
> We're pretty sure the filer is not sending back an NFS_ERR
> response (and we're pretty sure that wouldn't get reported to the
> user as an I/O Error...)
>
> At this point, we suspect there must be a path in the NFS
> implementation that returns I/O Error to user space even with a
> hard mount.
>
> Any ideas?
>
> Dave



* [NFS] I/O Errors with hard mounts
@ 2008-06-04 13:33 David Konerding
       [not found] ` <4f0f0cb0806040633x74fd0afbm94866cf85810f242-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: David Konerding @ 2008-06-04 13:33 UTC (permalink / raw)
  To: nfs

Hi,

We have a bunch of Linux clients (SLES 10 SP1) which mount a NetApp filer.

When the NetApp gets very, very busy, for example, one user is
deleting 1Tbyte of data while another user is doing a 30 client
throughput test, it will stop responding to some requests.

Although we are using hard mounts, some users report that during the
hammering period, some of their file operations produce "I/O Error"
messages on their terminal.

We checked, and the hosts are indeed using hard mounting.  From our
reading, I/O Errors should only ever make it back to the user if we
are using soft mounting.

We're pretty sure the filer is not sending back an NFS_ERR response (and we're
pretty sure that wouldn't get reported to the user as an I/O Error...)

At this point, we suspect there must be a path in the NFS
implementation that returns I/O Error to user space even with a
hard mount.

Any ideas?

Dave
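
One way to double-check the hard/soft setting on each client is to read
the mount options from /proc/mounts rather than from fstab. This is a
minimal Python sketch under stated assumptions: nfs_mount_modes is a
hypothetical helper, the sample filer names and paths are made up, and a
mount is treated as hard unless the "soft" option is present (hard being
the NFS default).

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to "soft" or "hard" from /proc/mounts-style text."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type not in ("nfs", "nfs4"):
            continue  # skip local filesystems and nfsd
        option_set = set(options.split(","))
        # Assumption: absence of "soft" means the default, hard.
        modes[mount_point] = "soft" if "soft" in option_set else "hard"
    return modes


# Made-up sample in /proc/mounts format (device, mount point, type, options, ...).
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
filer1:/vol/data /data nfs rw,vers=3,hard,intr,proto=tcp 0 0
filer1:/vol/scratch /scratch nfs rw,vers=3,soft,timeo=600 0 0
"""

print(nfs_mount_modes(SAMPLE))
```

On a client, passing in the contents of /proc/mounts in place of SAMPLE
would list every NFS mount with the mode the kernel is actually using.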


end of thread, other threads:[~2008-06-10  0:23 UTC | newest]

Thread overview: 13+ messages
     [not found] <505115.86554.qm@web31405.mail.mud.yahoo.com>
     [not found] ` <4f0f0cb0806061638i35ae4f9bp423148d6acbb953b@mail.gmail.com>
     [not found]   ` <4f0f0cb0806061638i35ae4f9bp423148d6acbb953b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-09 17:02     ` [NFS] I/O Errors with hard mounts David Konerding
     [not found]       ` <4f0f0cb0806091002w7f0110fh17e40568c7eb5bb8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-09 23:20         ` Trond Myklebust
2008-06-06  1:00 Ricardo Labiaga
  -- strict thread matches above, loose matches on Subject: below --
2008-06-05  0:40 Ricardo Labiaga
2008-06-04 22:45 Ricardo Labiaga
     [not found] ` <927260.87785.qm-KtJlQ5K7SlOvuULXzWHTWIglqE1Y4D90QQ4Iyu8u01E@public.gmane.org>
2008-06-04 22:56   ` David Konerding
2008-06-04 13:33 David Konerding
     [not found] ` <4f0f0cb0806040633x74fd0afbm94866cf85810f242-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-04 15:20   ` Blake Golliher
2008-06-04 16:17   ` Jeff Layton
     [not found]     ` <20080604121723.5b6a53e6-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-06-04 17:00       ` David Konerding
     [not found]         ` <4f0f0cb0806041000m7926d1e7m93f71ebaacd6c976-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-04 17:58           ` Jeff Layton
     [not found]             ` <20080604135817.0608273a-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-06-04 21:07               ` David Konerding
2008-06-04 18:19   ` Chuck Lever
