* Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
@ 2024-06-13 22:44 Chris Friesen
2024-06-14 13:09 ` Chuck Lever III
0 siblings, 1 reply; 3+ messages in thread
From: Chris Friesen @ 2024-06-13 22:44 UTC (permalink / raw)
To: linux-nfs
Hi all,
I'm not subscribed to the list so please CC me on replies. I'm seeing
some odd behaviour and I'm not sure what's going on.
The short version is that I can mount NFS exports from this server
(including mounting on the server itself) via TCP but not via UDP.
rpcinfo on the server gives the following:
root@controller-0:/var/home/sysadmin# rpcinfo -s
program version(s) netid(s) service owner
100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser
100024 1 tcp6,udp6,tcp,udp status 116
100003 4,3 udp6,tcp6,udp,tcp nfs superuser
100227 3 udp6,tcp6,udp,tcp - superuser
100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser
100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser
root@controller-0:/var/home/sysadmin# rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 44993 status
100024 1 tcp 60561 status
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049
100003 3 udp 2049 nfs
100227 3 udp 2049
100021 1 udp 36993 nlockmgr
100021 3 udp 36993 nlockmgr
100021 4 udp 36993 nlockmgr
100021 1 tcp 33239 nlockmgr
100021 3 tcp 33239 nlockmgr
100021 4 tcp 33239 nlockmgr
100005 1 udp 43636 mountd
100005 1 tcp 49357 mountd
100005 2 udp 39783 mountd
100005 2 tcp 54527 mountd
100005 3 udp 40970 mountd
100005 3 tcp 41761 mountd
So it *looks* okay to me, but if I probe it I get the expected response
for TCP but not for UDP:
root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
program 100003 version 4 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
program 100003 version 3 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
rpcinfo: RPC: Timed out
program 100003 version 3 is not available
Oddly the last command returns immediately so it's not actually timing
out, the error message is a bit misleading. If I sniff the network
traffic while running that last command, I see this:
root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length
262144 bytes
20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF],
proto UDP (17), length 68)
localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF],
proto UDP (17), length 28)
localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!]
UDP, length 0
Ignoring the checksums, I get this:
root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length
262144 bytes
20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF],
proto UDP (17), length 68)
localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF],
proto UDP (17), length 28)
localhost.nfs > localhost.934: UDP, length 0
On a separate system with a 5.10 kernel I get a successful response that
is longer and is properly decoded:
root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length
262144 bytes
22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto
UDP (17), length 68)
localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto
UDP (17), length 52)
localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24
null
Anyone have any ideas what might be going on or how to debug? I'm
building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives
anything useful.
Thanks,
Chris
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
2024-06-13 22:44 Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice Chris Friesen
@ 2024-06-14 13:09 ` Chuck Lever III
2024-06-17 14:30 ` Chris Friesen
0 siblings, 1 reply; 3+ messages in thread
From: Chuck Lever III @ 2024-06-14 13:09 UTC (permalink / raw)
To: Chris Friesen; +Cc: Linux NFS Mailing List
> On Jun 13, 2024, at 6:44 PM, Chris Friesen <cbf123@usask.ca> wrote:
>
> Hi all,
>
> I'm not subscribed to the list so please CC me on replies. I'm seeing some odd behaviour and I'm not sure what's going on.
>
> The short version is that I can mount NFS exports from this server (including mounting on the server itself) via TCP but not via UDP.
>
> rpcinfo on the server gives the following:
>
> root@controller-0:/var/home/sysadmin# rpcinfo -s
> program version(s) netid(s) service owner
> 100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser
> 100024 1 tcp6,udp6,tcp,udp status 116
> 100003 4,3 udp6,tcp6,udp,tcp nfs superuser
> 100227 3 udp6,tcp6,udp,tcp - superuser
> 100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser
> 100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser
>
> root@controller-0:/var/home/sysadmin# rpcinfo -p
> program vers proto port service
> 100000 4 tcp 111 portmapper
> 100000 3 tcp 111 portmapper
> 100000 2 tcp 111 portmapper
> 100000 4 udp 111 portmapper
> 100000 3 udp 111 portmapper
> 100000 2 udp 111 portmapper
> 100024 1 udp 44993 status
> 100024 1 tcp 60561 status
> 100003 3 tcp 2049 nfs
> 100003 4 tcp 2049 nfs
> 100227 3 tcp 2049
> 100003 3 udp 2049 nfs
> 100227 3 udp 2049
> 100021 1 udp 36993 nlockmgr
> 100021 3 udp 36993 nlockmgr
> 100021 4 udp 36993 nlockmgr
> 100021 1 tcp 33239 nlockmgr
> 100021 3 tcp 33239 nlockmgr
> 100021 4 tcp 33239 nlockmgr
> 100005 1 udp 43636 mountd
> 100005 1 tcp 49357 mountd
> 100005 2 udp 39783 mountd
> 100005 2 tcp 54527 mountd
> 100005 3 udp 40970 mountd
> 100005 3 tcp 41761 mountd
>
>
> So it *looks* okay to me, but if I probe it I get the expected response for TCP but not for UDP:
>
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
> program 100003 version 4 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
> program 100003 version 3 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
> rpcinfo: RPC: Timed out
> program 100003 version 3 is not available
>
>
> Oddly the last command returns immediately so it's not actually timing out, the error message is a bit misleading. If I sniff the network traffic while running that last command, I see this:
>
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF], proto UDP (17), length 68)
> localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
> 20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF], proto UDP (17), length 28)
> localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!] UDP, length 0
>
>
> Ignoring the checksums, I get this:
>
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF], proto UDP (17), length 68)
> localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
> 20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF], proto UDP (17), length 28)
> localhost.nfs > localhost.934: UDP, length 0
>
>
> On a separate system with a 5.10 kernel I get a successful response that is longer and is properly decoded:
>
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto UDP (17), length 68)
> localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
> 22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto UDP (17), length 52)
> localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24 null
>
>
> Anyone have any ideas what might be going on or how to debug? I'm building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives anything useful.
linux-6.6.15 has "SUNRPC: use request size to initialize bio_vec in
svc_udp_sendto()" which might address this issue.
--
Chuck Lever
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
2024-06-14 13:09 ` Chuck Lever III
@ 2024-06-17 14:30 ` Chris Friesen
0 siblings, 0 replies; 3+ messages in thread
From: Chris Friesen @ 2024-06-17 14:30 UTC (permalink / raw)
To: Chuck Lever III; +Cc: Linux NFS Mailing List
On 6/14/2024 7:09 AM, Chuck Lever III wrote:
>> On Jun 13, 2024, at 6:44 PM, Chris Friesen <cbf123@usask.ca> wrote:
>> Anyone have any ideas what might be going on or how to debug? I'm building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives anything useful.
>
> linux-6.6.15 has "SUNRPC: use request size to initialize bio_vec in
> svc_udp_sendto()" which might address this issue.
>
>
> --
> Chuck Lever
Yes! Thanks for that, it did indeed fix the issue.
Chris
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-06-17 14:30 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 22:44 Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice Chris Friesen
2024-06-14 13:09 ` Chuck Lever III
2024-06-17 14:30 ` Chris Friesen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox