* 2.6.24: RPC: bad TCP reclen 0x00020090 (large) @ 2008-02-13 14:02 Michael Tokarev 2008-02-18 12:58 ` Andrew Morton 0 siblings, 1 reply; 5+ messages in thread From: Michael Tokarev @ 2008-02-13 14:02 UTC (permalink / raw) To: Kernel Mailing List Hello! After upgrading to 2.6.24 (from .23), we're seeing ALOT of messages like in $subj in dmesg: Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. ... with linux NFS server. The clients are all linux too, mostly 2.6.23 and some 2.6.22. I found the "offending" piece of code in net/sunrpc/svcsock.c, in routine svc_tcp_recvfrom() with condition being: if (svsk->sk_reclen > serv->sv_max_mesg) ... This happens after a server reboot. At this point, client(s) are trying to perform some NFS transaction and fail, and server starts generating the above messages - till I do a umount followed by mount on all clients. Before, such situation (nfs server reboot) were handled transparently, ie, there was nothing to do, the mount continued working just fine when the server comes back online. Now, I'm not sure if it's really 2.6.24-specific problem or a userspace problem. Some time ago we also upgraded nfs-kernel-server (Debian) package, and the remount-after-nfs-server-reboot problem started to occur at THAT time (and it is something to worry about as well, I just had no time to deal with it); but the dmesg spamming only appeared with 2.6.24. How to debug the issue further on from this point? Thanks! /mjt ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) 2008-02-13 14:02 2.6.24: RPC: bad TCP reclen 0x00020090 (large) Michael Tokarev @ 2008-02-18 12:58 ` Andrew Morton 2008-02-18 13:05 ` Michael Tokarev 2008-02-18 21:25 ` Tom Tucker 0 siblings, 2 replies; 5+ messages in thread From: Andrew Morton @ 2008-02-18 12:58 UTC (permalink / raw) To: Michael Tokarev; +Cc: Kernel Mailing List, linux-nfs (suitable cc added) (regression) On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <mjt@tls.msk.ru> wrote: > Hello! > > After upgrading to 2.6.24 (from .23), we're seeing ALOT > of messages like in $subj in dmesg: > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. > ... > > with linux NFS server. The clients are all linux too, mostly 2.6.23 > and some 2.6.22. > > I found the "offending" piece of code in net/sunrpc/svcsock.c, > in routine svc_tcp_recvfrom() with condition being: > > if (svsk->sk_reclen > serv->sv_max_mesg) ... > > This happens after a server reboot. At this point, client(s) are trying > to perform some NFS transaction and fail, and server starts generating > the above messages - till I do a umount followed by mount on all clients. > Before, such situation (nfs server reboot) were handled transparently, > ie, there was nothing to do, the mount continued working just fine when > the server comes back online. > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace > problem. Some time ago we also upgraded nfs-kernel-server (Debian) > package, and the remount-after-nfs-server-reboot problem started to > occur at THAT time (and it is something to worry about as well, I just > had no time to deal with it); but the dmesg spamming only appeared > with 2.6.24. > > How to debug the issue further on from this point? > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) 2008-02-18 12:58 ` Andrew Morton @ 2008-02-18 13:05 ` Michael Tokarev 2008-02-18 21:25 ` Tom Tucker 1 sibling, 0 replies; 5+ messages in thread From: Michael Tokarev @ 2008-02-18 13:05 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, linux-nfs Andrew Morton wrote: > (suitable cc added) Thanks. I was meant to sent it to linux-nfs originally, but looks like i mistyped the address. > (regression) Now, after we did some more experiments with it, I don't think it's a regression. I'll post a bit more details in a few hours when the ongoing testing finishes. Thanks! /mjt ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) 2008-02-18 12:58 ` Andrew Morton 2008-02-18 13:05 ` Michael Tokarev @ 2008-02-18 21:25 ` Tom Tucker 2008-02-18 22:00 ` Tom Tucker 1 sibling, 1 reply; 5+ messages in thread From: Tom Tucker @ 2008-02-18 21:25 UTC (permalink / raw) To: Andrew Morton; +Cc: Michael Tokarev, Kernel Mailing List, linux-nfs On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote: > (suitable cc added) > > (regression) > > On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <mjt@tls.msk.ru> wrote: > > > Hello! > > > > After upgrading to 2.6.24 (from .23), we're seeing ALOT > > of messages like in $subj in dmesg: > > > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. > > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. > > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. > > ... > > > > with linux NFS server. The clients are all linux too, mostly 2.6.23 > > and some 2.6.22. > > > > I found the "offending" piece of code in net/sunrpc/svcsock.c, > > in routine svc_tcp_recvfrom() with condition being: > > > > if (svsk->sk_reclen > serv->sv_max_mesg) ... The problem might be that the client is setting a bit in the RPC message length field that is meant to be interpreted and masked off by the server -- and we're not doing it yet. My bet is that 0x20000 is the bit we're looking for. I'll poke around... > > > > This happens after a server reboot. At this point, client(s) are trying > > to perform some NFS transaction and fail, and server starts generating > > the above messages - till I do a umount followed by mount on all clients. > > Before, such situation (nfs server reboot) were handled transparently, > > ie, there was nothing to do, the mount continued working just fine when > > the server comes back online. > > > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace > > problem. Some time ago we also upgraded nfs-kernel-server (Debian) > > package, and the remount-after-nfs-server-reboot problem started to > > occur at THAT time (and it is something to worry about as well, I just > > had no time to deal with it); but the dmesg spamming only appeared > > with 2.6.24. > > > > How to debug the issue further on from this point? > > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) 2008-02-18 21:25 ` Tom Tucker @ 2008-02-18 22:00 ` Tom Tucker 0 siblings, 0 replies; 5+ messages in thread From: Tom Tucker @ 2008-02-18 22:00 UTC (permalink / raw) To: Andrew Morton; +Cc: Michael Tokarev, Kernel Mailing List, linux-nfs On Mon, 2008-02-18 at 15:25 -0600, Tom Tucker wrote: > On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote: > > (suitable cc added) > > > > (regression) > > > > On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <mjt@tls.msk.ru> wrote: > > > > > Hello! > > > > > > After upgrading to 2.6.24 (from .23), we're seeing ALOT > > > of messages like in $subj in dmesg: > > > > > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed. > > > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed. > > > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large) > > > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed. > > > ... > > > > > > with linux NFS server. The clients are all linux too, mostly 2.6.23 > > > and some 2.6.22. > > > > > > I found the "offending" piece of code in net/sunrpc/svcsock.c, > > > in routine svc_tcp_recvfrom() with condition being: > > > > > > if (svsk->sk_reclen > serv->sv_max_mesg) ... > > The problem might be that the client is setting a bit in the RPC message > length field that is meant to be interpreted and masked off by the > server -- and we're not doing it yet. My bet is that 0x20000 is the bit > we're looking for. I'll poke around... Never mind. The way this is supposed to work is that the transport is shut down. The code used to delete the socket directly, but I reorganized this code to just set the XPT_CLOSE bit and let the normal close path handle it when it came back through to retry. I'm not sure exactly what version of the code you have, but it may be that your missing the code from the close path that does this, or it may just not work. As a first shot, can you try this patch and tell me if the messages go away? diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 1d3e5fc..cf6150a 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -920,6 +920,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) err_delete: set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags); + svc_delete_xprt(&svsk->sk_xprt); return -EAGAIN; error: > > > > > > > This happens after a server reboot. At this point, client(s) are trying > > > to perform some NFS transaction and fail, and server starts generating > > > the above messages - till I do a umount followed by mount on all clients. > > > Before, such situation (nfs server reboot) were handled transparently, > > > ie, there was nothing to do, the mount continued working just fine when > > > the server comes back online. > > > > > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace > > > problem. Some time ago we also upgraded nfs-kernel-server (Debian) > > > package, and the remount-after-nfs-server-reboot problem started to > > > occur at THAT time (and it is something to worry about as well, I just > > > had no time to deal with it); but the dmesg spamming only appeared > > > with 2.6.24. > > > > > > How to debug the issue further on from this point? > > > > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > - > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-02-18 21:51 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-02-13 14:02 2.6.24: RPC: bad TCP reclen 0x00020090 (large) Michael Tokarev 2008-02-18 12:58 ` Andrew Morton 2008-02-18 13:05 ` Michael Tokarev 2008-02-18 21:25 ` Tom Tucker 2008-02-18 22:00 ` Tom Tucker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox