* 100% repeatable NFS v3 client hang
@ 2007-06-14 17:04 John McCorquodale
  2007-06-14 17:11 ` John McCorquodale
  0 siblings, 1 reply; 5+ messages in thread
From: John McCorquodale @ 2007-06-14 17:04 UTC (permalink / raw)
  To: nfs

Guys,

I've been having a reliable problem with NFS v3 since 2.6.19 (possibly earlier;
I didn't test), and just confirmed still present in 2.6.21.5 (x86_64).  The
problem is that issuing the command:

  $ dd if=/dev/zero of=biggy bs=1G count=50

100% reliably results in a hung client and completely idle server after a
few (different each time, usually between 5 and 10) GB have been written.
No output via dmesg at all.  Sometimes, however, when not writing large files,
I'll see an occasional dmesg "NFS: desynchronized value of nfs_i.ncommit."
It proceeds at about 30MB/s until the client eats it, and then it never wakes
up again.  The root fs is mounted via nfs v3, so once it eats it I can't log
on or anything.  Serial console outputs no information during/after the eat-it
condition, so it's not a panic.  Pings continue.  Feels deadlocky.

I saw an old discussion/patch about something that sounded like my problem here:
  http://lkml.org/lkml/2007/4/21/112
but this old kernel+patch did not fix my problem.

This problem has been reliable and present for months for me, and I assumed
that it was so obvious that anybody would see it immediately, and thus
concluded that 'nobody was working on v3 anymore'.  Enough eyebrows were
raised by that comment in my post on the v4 list this morning:
  http://linux-nfs.org/pipermail/nfsv4/2007-June/006183.html
that I am forced to re-evaluate that conclusion -- it appears likely that
this may be something unique to my configuration or personal luck.

The only thing at all weird about my platform is that the servers (not the
clients) are nvidia forcedeth gigE, which has never inspired my confidence.
The forcedeth in 2.6.21 seems fine, 'tho, as it regularly sends >800GB via
nc for backups without hiccup or incident.

Anyway, last bit of data is that I just switched to nfs4 and the problem is
gone.  I have successfully dd'd four 50G files-of-zeros simultaneously with
no ill effects.

Anyway, I have a shadow server/client that I could conceivably bring v3 up on
and run tests with if you guys would like to use me as a debugging agent or
would like something to log in to.

I was going to ignore this, but when Trond's eyebrows went up on the v4 list,
I figured I ought to get off my duff and report it.

Cheers,

-mcq

P.S. If this comes thru twice, my apologies.  I wasn't subscribed the first
     time I sent it and I assume it got blackholed.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* 100% repeatable NFS v3 client hang
@ 2007-06-14 16:55 John McCorquodale
  2007-06-14 18:03 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: John McCorquodale @ 2007-06-14 16:55 UTC (permalink / raw)
  To: nfs

Guys,

I've been having a reliable problem with NFS v3 since 2.6.19 (possibly earlier;
I didn't test), and just confirmed still present in 2.6.21.5 (x86_64).  The
problem is that issuing the command:

  $ dd if=/dev/zero of=biggy bs=1G count=50

100% reliably results in a hung client and completely idle server after a
few (different each time, usually between 5 and 10) GB have been written.
No output via dmesg at all.  Sometimes, however, when not writing large files,
I'll see an occasional dmesg "NFS: desynchronized value of nfs_i.ncommit."
It proceeds at about 30MB/s until the client eats it, and then it never wakes
up again.  The root fs is mounted via nfs v3, so once it eats it I can't log
on or anything.  Serial console outputs no information during/after the eat-it
condition, so it's not a panic.  Pings continue.  Feels deadlocky.

I saw an old discussion/patch about something that sounded like my problem
(but the old kernel+patch didn't fix my problem) here:
  http://lkml.org/lkml/2007/4/21/112

This problem has been reliable and present for months for me, and I assumed
that it was so obvious that anybody would see it immediately, and thus
concluded that 'nobody was working on v3 anymore'.  Enough eyebrows were
raised by that comment in my post on the v4 list this morning:
  http://linux-nfs.org/pipermail/nfsv4/2007-June/006183.html
that I am forced to re-evaluate that conclusion -- it appears likely that
this may be something unique to my hardware.

The only thing at all weird about my platform is that the servers (not the
clients) are nvidia forcedeth gigE, which has never inspired my confidence.
The forcedeth in 2.6.21 seems fine, 'tho, as it regularly sends >800GB via
nc for backups without hiccup or incident.

Anyway, last bit of data is that I just switched to nfs4 and the problem is
gone.  I have successfully dd'd four 50G files-of-zeros simultaneously with
no ill effects.

Anyway, I have a shadow server/client that I could conceivably bring v3 up on
and run tests with if you guys would like to use me as a debugging agent or
would like something to log in to.

I was going to ignore this, but when Trond's eyebrows went up on the v4 list,
I figured I ought to get off my duff and report it.

Cheers,

-mcq



end of thread, other threads:[~2007-06-14 18:09 UTC | newest]

Thread overview: 5+ messages
2007-06-14 17:04 100% repeatable NFS v3 client hang John McCorquodale
2007-06-14 17:11 ` John McCorquodale
  -- strict thread matches above, loose matches on Subject: below --
2007-06-14 16:55 John McCorquodale
2007-06-14 18:03 ` Trond Myklebust
2007-06-14 18:09   ` John McCorquodale
