* RedHat Rawhide Kernels
@ 2002-08-08 19:21 Jeremy Sanders
  2002-08-08 19:33 ` Benjamin LaHaise
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jeremy Sanders @ 2002-08-08 19:21 UTC (permalink / raw)
  To: nfs

Just to let you guys know (as it's close to the current thread), I'm
getting very bad lock-ups using the current rawhide kernel (2.4.18-7.94).
When I connect with a machine running 2.4.18-5 (standard 7.3 errata
kernel), both the client and the server get processes stuck in a "D" state
- the nfsd processes on the server and the user command on the client.
This means you can't shut the server down as the nfsd processes can't get
killed. Strangely 2.4.19 kernels can talk to the server fine!
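
For anyone wanting to check for the same symptom, here is a minimal sketch of how processes in the "D" (uninterruptible sleep) state could be listed by reading /proc/<pid>/stat. It assumes a Linux-style /proc; the function names parse_stat_line and uninterruptible_processes are made up for this illustration and are not part of any tool mentioned in this thread.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning, or unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process shown here cannot be killed until the kernel operation it is waiting on completes, which is why stuck nfsd threads block a clean shutdown.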

See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=70561

Tomorrow I'm going to get a stack trace of the nfsd processes, as the
RH guys suggested.

Jeremy

-- 
Jeremy Sanders <jss@ast.cam.ac.uk>   http://www-xray.ast.cam.ac.uk/~jss/
X-Ray Group, Institute of Astronomy, University of Cambridge, UK.
Public Key Server PGP Key ID: E1AAE053



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: RedHat Rawhide Kernels
  2002-08-08 19:21 RedHat Rawhide Kernels Jeremy Sanders
@ 2002-08-08 19:33 ` Benjamin LaHaise
  2002-08-08 20:16   ` Trond Myklebust
  2002-08-08 20:21 ` Trond Myklebust
  2002-08-12 10:41 ` Jeremy Sanders
  2 siblings, 1 reply; 13+ messages in thread
From: Benjamin LaHaise @ 2002-08-08 19:33 UTC (permalink / raw)
  To: Jeremy Sanders; +Cc: nfs

On Thu, Aug 08, 2002 at 08:21:27PM +0100, Jeremy Sanders wrote:
> Just to let you guys know (as it's close to the current thread), I'm
> getting very bad lock-ups using the current rawhide kernel (2.4.18-7.94).
> When I connect with a machine running 2.4.18-5 (standard 7.3 errata
> kernel), both the client and the server get processes stuck in a "D" state
> - the nfsd processes on the server and the user command on the client.
> This means you can't shut the server down as the nfsd processes can't get
> killed. Strangely 2.4.19 kernels can talk to the server fine!

For the sake of spreading the knowledge around, I've also got a couple 
of reports of -5 (which has an earlier set of Trond's client patches) 
and vanilla 2.4.18 getting D state stuck processes in lock_page on NFS 
mounts.  There's no useful data point beyond that, but it looks like a 
request is getting lost somewhere.

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."



* Re: RedHat Rawhide Kernels
  2002-08-08 19:33 ` Benjamin LaHaise
@ 2002-08-08 20:16   ` Trond Myklebust
  2002-08-08 20:23     ` Benjamin LaHaise
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2002-08-08 20:16 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: nfs

>>>>> " " == Benjamin LaHaise <bcrl@redhat.com> writes:

     > For the sake of spreading the knowledge around, I've also got a
     > couple of reports of -5 (which has an earlier set of Trond's
     > client patches) and vanilla 2.4.18 getting D state stuck
     > processes in lock_page on NFS mounts.  There's no useful data
     > point beyond that, but it looks like a request is getting lost
     > somewhere.

Do you know offhand which set of patches are included?

Cheers,
  Trond



* Re: RedHat Rawhide Kernels
  2002-08-08 19:21 RedHat Rawhide Kernels Jeremy Sanders
  2002-08-08 19:33 ` Benjamin LaHaise
@ 2002-08-08 20:21 ` Trond Myklebust
  2002-08-12 10:41 ` Jeremy Sanders
  2 siblings, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2002-08-08 20:21 UTC (permalink / raw)
  To: Jeremy Sanders; +Cc: nfs

>>>>> " " == Jeremy Sanders <jss@ast.cam.ac.uk> writes:

     > Just to let you guys know (as it's close to the current
     > thread), I'm getting very bad lock-ups using the current
     > rawhide kernel (2.4.18-7.94).  When I connect with a machine

I don't know about the server stuff, but the version of the client
patches that they appear to have included in 2.4.18-7.94 contains one
pretty nasty race in the RPC code that can cause significant
corruption.
I've already notified RH of this, and provided them with details on
how to fix it...

Cheers,
  Trond



* Re: RedHat Rawhide Kernels
  2002-08-08 20:16   ` Trond Myklebust
@ 2002-08-08 20:23     ` Benjamin LaHaise
  2002-08-08 20:31       ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin LaHaise @ 2002-08-08 20:23 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: nfs

On Thu, Aug 08, 2002 at 10:16:18PM +0200, Trond Myklebust wrote:
> Do you know offhand which set of patches are included?

# 15xx
# NFS patches: selected bits from Trond's 2.4.19pre8 patchset
#
Patch1501: linux-2.4.19-nfs-01-pathconf.dif.txt
Patch1503: linux-2.4.19-nfs-03-noac.dif.txt
Patch1504: linux-2.4.19-nfs-04-seekdir.dif.txt
Patch1505: linux-2.4.19-nfs-05-rdplus.dif.txt
Patch1506: linux-2.4.19-nfs-06-rpc_bkl.dif.txt
Patch1507: linux-2.4.19-nfs-07-bkl2.dif.txt
Patch1508: linux-2.4.19-nfs-08-rpc_cong.dif.txt
Patch1509: linux-2.4.19-nfs-09-rpc_wspace.dif.txt
Patch1510: linux-2.4.19-nfs-10-ping.dif.txt
Patch1511: linux-2.4.19-nfs-11-rpc_tweaks.dif.txt
Patch1520: linux-2.4.19-nfs-nosvc.patch
Patch1550: linux-2.4.18-nfs-default-size.patch

nosvc just silences the annoying unknown version (0) printk, and the 
default size patch reverts to a 4KB default if none is specified.

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."



* Re: RedHat Rawhide Kernels
  2002-08-08 20:23     ` Benjamin LaHaise
@ 2002-08-08 20:31       ` Trond Myklebust
  2002-08-08 20:57         ` Benjamin LaHaise
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2002-08-08 20:31 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: NFS maillist

>>>>> " " == Benjamin LaHaise <bcrl@redhat.com> writes:

     > On Thu, Aug 08, 2002 at 10:16:18PM +0200, Trond Myklebust
     > wrote:
    >> Do you know offhand which set of patches are included?

     > # 15xx NFS patches: selected bits from Trond's 2.4.19pre8
     > # patchset

That's for 2.4.18-7.94 (with the known RPC race problem), but I
understood that you mentioned a problem affecting the 2.4.18-5 client
too?

Cheers,
  Trond



* Re: RedHat Rawhide Kernels
  2002-08-08 20:31       ` Trond Myklebust
@ 2002-08-08 20:57         ` Benjamin LaHaise
  2002-08-08 21:16           ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin LaHaise @ 2002-08-08 20:57 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS maillist

On Thu, Aug 08, 2002 at 10:31:57PM +0200, Trond Myklebust wrote:
> >>>>> " " == Benjamin LaHaise <bcrl@redhat.com> writes:
> 
>      > On Thu, Aug 08, 2002 at 10:16:18PM +0200, Trond Myklebust
>      > wrote:
>     >> Do you know offhand which set of patches are included?
> 
>      > # 15xx NFS patches: selected bits from Trond's 2.4.19pre8
>      > # patchset
> 
> That's for 2.4.18-7.94 (with the known RPC race problem), but I
> understood that you mentioned a problem affecting the 2.4.18-5 client
> too?

That is the list of patches applied to 2.4.18-5.  Sorry for not being 
clear on that.

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."



* Re: RedHat Rawhide Kernels
  2002-08-08 20:57         ` Benjamin LaHaise
@ 2002-08-08 21:16           ` Trond Myklebust
  2002-08-08 21:27             ` Benjamin LaHaise
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2002-08-08 21:16 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: NFS maillist

>>>>> " " == Benjamin LaHaise <bcrl@redhat.com> writes:

     > That is the list of patches applied to 2.4.18-5.  Sorry for not
     > being clear on that.

Duh... My mistake I didn't read your mail clearly enough...

I know of 1 possible hang in that patchset (a hang which also affects
the standard 2.4.19 kernel): The spinlock in xprt_write_space() needs
to be converted to a bh-safe spinlock. The race should be very rare,
but definitely exists (it's been fixed BTW in the newer patchset)...

Do you have any details on the hangs in question that might help?

Cheers,
  Trond



* Re: RedHat Rawhide Kernels
  2002-08-08 21:16           ` Trond Myklebust
@ 2002-08-08 21:27             ` Benjamin LaHaise
  2002-08-08 21:35               ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin LaHaise @ 2002-08-08 21:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS maillist

On Thu, Aug 08, 2002 at 11:16:17PM +0200, Trond Myklebust wrote:
> Duh... My mistake I didn't read your mail clearly enough...
> 
> I know of 1 possible hang in that patchset (a hang which also affects
> the standard 2.4.19 kernel): The spinlock in xprt_write_space() needs
> to be converted to a bh-safe spinlock. The race should be very rare,
> but definitely exists (it's been fixed BTW in the newer patchset)...

Ah, interesting.  I'll put that into a test rpm for the people 
experiencing the problem to see if it helps.

> Do you have any details on the hangs in question that might help?

Basically, under heavy load several processes end up stuck in 
lock_page being called from generic_file_read.  The problem is 
very hard to reproduce (at least I can't on my local machines).

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."



* Re: RedHat Rawhide Kernels
  2002-08-08 21:27             ` Benjamin LaHaise
@ 2002-08-08 21:35               ` Trond Myklebust
  2002-08-09  1:04                 ` Thomas Langås
  0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2002-08-08 21:35 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Trond Myklebust, NFS maillist

>>>>> " " == Benjamin LaHaise <bcrl@redhat.com> writes:

    >> Do you have any details on the hangs in question that might
    >> help?

     > Basically, under heavy load several processes end up stuck in
     > lock_page being called from generic_file_read.  The problem is
     > very hard to reproduce (at least I can't on my local machines).

The only other hang I can think of concerns only HIGHMEM machines,
where you can deadlock while exhausting all free kmap() resources.

Also fixed (well - at least chances are *very* heavily reduced) in the
new 'kmap' patchsets.

Cheers,
  Trond



* Re: RedHat Rawhide Kernels
  2002-08-08 21:35               ` Trond Myklebust
@ 2002-08-09  1:04                 ` Thomas Langås
  2002-08-09  4:51                   ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Langås @ 2002-08-09  1:04 UTC (permalink / raw)
  To: NFS maillist

Trond Myklebust:
> The only other hang I can think of concerns only HIGHMEM machines,
> where you can deadlock while exhausting all free kmap() resources.
> Also fixed (well - at least chances are *very* heavily reduced) in the
> new 'kmap' patchsets.

Is this included in 2.4.19 or won't we see it before 2.4.20 comes along?

-- 
Thomas



* Re: RedHat Rawhide Kernels
  2002-08-09  1:04                 ` Thomas Langås
@ 2002-08-09  4:51                   ` Trond Myklebust
  0 siblings, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2002-08-09  4:51 UTC (permalink / raw)
  To: nfs

>>>>> " " == Thomas Langås <tlan@stud.ntnu.no> writes:

     > Trond Myklebust:
    >> The only other hang I can think of concerns only HIGHMEM
    >> machines, where you can deadlock while exhausting all free
    >> kmap() resources.  Also fixed (well - at least chances are
    >> *very* heavily reduced) in the new 'kmap' patchsets.

     > Is this included in 2.4.19 or won't we see it before 2.4.20
     > comes along?

You can see it in 2.5.x or from my patchsets for 2.4.19 (in the usual
place).

Cheers,
   Trond



* Re: RedHat Rawhide Kernels
  2002-08-08 19:21 RedHat Rawhide Kernels Jeremy Sanders
  2002-08-08 19:33 ` Benjamin LaHaise
  2002-08-08 20:21 ` Trond Myklebust
@ 2002-08-12 10:41 ` Jeremy Sanders
  2 siblings, 0 replies; 13+ messages in thread
From: Jeremy Sanders @ 2002-08-12 10:41 UTC (permalink / raw)
  To: nfs

On Thu, Aug 08, 2002 at 08:21:27PM +0100, Jeremy Sanders wrote:
> Just to let you guys know (as it's close to the current thread), I'm
> getting very bad lock-ups using the current rawhide kernel (2.4.18-7.94).
> When I connect with a machine running 2.4.18-5 (standard 7.3 errata
> kernel), both the client and the server get processes stuck in a "D" state
> - the nfsd processes on the server and the user command on the client.
> This means you can't shut the server down as the nfsd processes can't get
> killed. Strangely 2.4.19 kernels can talk to the server fine!
> 
> See
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=70561

Interestingly this bug seems to be due to ext3 (or a combination of ext3
and nfs). Remounting the partition as ext2 stops the hang.

The processes get stuck (when using ext3) in

nfsd          D F68108E0  5976  1142      1          1141  1143 (L-TLB)
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6af9de4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6af9e08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6af9e10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6af9e18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6af9e38))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6af9e40))
[<f8837ca0>] ext3_dir_operations [ext3] 0x0 (0xf6af9e80))
[<f899ee6e>] nfsd3_proc_readdirplus [nfsd] 0xde (0xf6af9ef0))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6af9f04))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6af9f24))
[<f89935c0>] nfsd_dispatch [nfsd] 0xd0 (0xf6af9f30))
[<f89a5898>] nfsd_version3 [nfsd] 0x0 (0xf6af9f44))
[<f89754cc>] svc_process_R6eda96b1 [sunrpc] 0x43c (0xf6af9f50))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6af9f78))
[<f89a58b8>] nfsd_program [nfsd] 0x0 (0xf6af9f7c))
[<f89933b0>] nfsd [nfsd] 0x1d0 (0xf6af9f98))
[<c010765e>] kernel_thread [kernel] 0x2e (0xf6af9ff0))
[<f89931e0>] nfsd [nfsd] 0x0 (0xf6af9ff8))

nfsd          D F68108E0  5592  1143      1          1142  1144 (L-TLB)
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6aefc24))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6aefc48))
[<f8837e20>] ext3_dir_inode_operations [ext3] 0x0 (0xf6aefc50))
[<f8999fab>] .text.lock.vfs [nfsd] 0xaf (0xf6aefc58))
[<c01af97c>] ide_wait_stat [kernel] 0xcc (0xf6aefc68))
[<f89a1a20>] encode_fattr3 [nfsd] 0x200 (0xf6aefc90))
[<f89a0f6b>] encode_entry [nfsd] 0x1ab (0xf6aefcb4))
[<f8824d02>] ext3_readdir [ext3] 0x262 (0xf6aefd70))
[<c0223760>] udp_getfrag [kernel] 0x0 (0xf6aefdcc))
[<c014f6b2>] vfs_readdir [kernel] 0x92 (0xf6aefe18))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6aefe24))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6aefe38))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6aefe40))
[<f8837ca0>] ext3_dir_operations [ext3] 0x0 (0xf6aefe80))
[<f899ee6e>] nfsd3_proc_readdirplus [nfsd] 0xde (0xf6aefef0))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6aeff04))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6aeff24))
[<f89935c0>] nfsd_dispatch [nfsd] 0xd0 (0xf6aeff30))
[<f89a5898>] nfsd_version3 [nfsd] 0x0 (0xf6aeff44))
[<f89754cc>] svc_process_R6eda96b1 [sunrpc] 0x43c (0xf6aeff50))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6aeff78))
[<f89a58b8>] nfsd_program [nfsd] 0x0 (0xf6aeff7c))
[<f89933b0>] nfsd [nfsd] 0x1d0 (0xf6aeff98))
[<c010765e>] kernel_thread [kernel] 0x2e (0xf6aefff0))
[<f89931e0>] nfsd [nfsd] 0x0 (0xf6aefff8))


(See the above bug for more details).
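
To make traces like the two above easier to compare, here is a rough sketch that pulls the symbol, module and offset out of each line. The regular expression is written only against the line format shown in this message, and the helper names parse_frames and likely_call_chain are invented for this example. Dropping entries with a 0x0 offset is a heuristic, on the assumption that those are function or data pointers that happened to be on the stack rather than return addresses; it is not a rule taken from the kernel.

```python
import re

# Matches e.g. "[<c0107f7a>] __down [kernel] 0x6a (0xf6af9de4))"
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[(\w+)\]\s+(0x[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return [(symbol, module, offset)] for each trace line that matches."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            _addr, symbol, module, offset = match.groups()
            frames.append((symbol, module, int(offset, 16)))
    return frames


def likely_call_chain(frames):
    """Heuristic: keep only entries whose offset into the symbol is non-zero."""
    return [(symbol, module) for symbol, module, offset in frames if offset != 0]


# First few lines of the first nfsd trace above.
SAMPLE = """\
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6af9de4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6af9e08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6af9e10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6af9e18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6af9e38))
"""

if __name__ == "__main__":
    for symbol, module in likely_call_chain(parse_frames(SAMPLE)):
        print(module, symbol)
```

Run over both traces, this makes it easier to see that both nfsd threads are sleeping in __down, and that the second thread has a non-zero offset inside ext3_readdir.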

Jeremy

-- 
Jeremy Sanders <jss@ast.cam.ac.uk>   http://www-xray.ast.cam.ac.uk/~jss/
X-Ray Group, Institute of Astronomy, University of Cambridge, UK.
Public Key Server PGP Key ID: E1AAE053



Thread overview: 13+ messages
2002-08-08 19:21 RedHat Rawhide Kernels Jeremy Sanders
2002-08-08 19:33 ` Benjamin LaHaise
2002-08-08 20:16   ` Trond Myklebust
2002-08-08 20:23     ` Benjamin LaHaise
2002-08-08 20:31       ` Trond Myklebust
2002-08-08 20:57         ` Benjamin LaHaise
2002-08-08 21:16           ` Trond Myklebust
2002-08-08 21:27             ` Benjamin LaHaise
2002-08-08 21:35               ` Trond Myklebust
2002-08-09  1:04                 ` Thomas Langås
2002-08-09  4:51                   ` Trond Myklebust
2002-08-08 20:21 ` Trond Myklebust
2002-08-12 10:41 ` Jeremy Sanders
