Linux NFS development
 help / color / mirror / Atom feed
* Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
@ 2024-06-13 22:44 Chris Friesen
  2024-06-14 13:09 ` Chuck Lever III
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Friesen @ 2024-06-13 22:44 UTC (permalink / raw)
  To: linux-nfs

Hi all,

I'm not subscribed to the list so please CC me on replies.  I'm seeing 
some odd behaviour and I'm not sure what's going on.

The short version is that I can mount NFS exports from this server 
(including mounting on the server itself) via TCP but not via UDP.

rpcinfo on the server gives the following:

root@controller-0:/var/home/sysadmin# rpcinfo -s
    program version(s) netid(s)                     service     owner
     100000  2,3,4     local,udp,tcp,udp6,tcp6      portmapper  superuser
     100024  1         tcp6,udp6,tcp,udp            status      116
     100003  4,3       udp6,tcp6,udp,tcp            nfs         superuser
     100227  3         udp6,tcp6,udp,tcp            -           superuser
     100021  4,3,1     tcp6,udp6,tcp,udp            nlockmgr    superuser
     100005  3,2,1     tcp6,udp6,tcp,udp            mountd      superuser

root@controller-0:/var/home/sysadmin# rpcinfo -p
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100024    1   udp  44993  status
     100024    1   tcp  60561  status
     100003    3   tcp   2049  nfs
     100003    4   tcp   2049  nfs
     100227    3   tcp   2049
     100003    3   udp   2049  nfs
     100227    3   udp   2049
     100021    1   udp  36993  nlockmgr
     100021    3   udp  36993  nlockmgr
     100021    4   udp  36993  nlockmgr
     100021    1   tcp  33239  nlockmgr
     100021    3   tcp  33239  nlockmgr
     100021    4   tcp  33239  nlockmgr
     100005    1   udp  43636  mountd
     100005    1   tcp  49357  mountd
     100005    2   udp  39783  mountd
     100005    2   tcp  54527  mountd
     100005    3   udp  40970  mountd
     100005    3   tcp  41761  mountd


So it *looks* okay to me, but if I probe it I get the expected response 
for TCP but not for UDP:

root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
program 100003 version 4 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
program 100003 version 3 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
rpcinfo: RPC: Timed out
program 100003 version 3 is not available


Oddly the last command returns immediately so it's not actually timing 
out, the error message is a bit misleading.   If I sniff the network 
traffic while running that last command, I see this:

root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF], 
proto UDP (17), length 68)
     localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF], 
proto UDP (17), length 28)
     localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!] 
UDP, length 0


Ignoring the checksums, I get this:

root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF], 
proto UDP (17), length 68)
     localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF], 
proto UDP (17), length 28)
     localhost.nfs > localhost.934: UDP, length 0


On a separate system with a 5.10 kernel I get a successful response that 
is longer and is properly decoded:

root@controller-0:/var/home/sysadmin#  tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto 
UDP (17), length 68)
     localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto 
UDP (17), length 52)
     localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24 
null


Anyone have any ideas what might be going on or how to debug?  I'm 
building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives 
anything useful.

Thanks,
Chris

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
  2024-06-13 22:44 Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice Chris Friesen
@ 2024-06-14 13:09 ` Chuck Lever III
  2024-06-17 14:30   ` Chris Friesen
  0 siblings, 1 reply; 3+ messages in thread
From: Chuck Lever III @ 2024-06-14 13:09 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Linux NFS Mailing List



> On Jun 13, 2024, at 6:44 PM, Chris Friesen <cbf123@usask.ca> wrote:
> 
> Hi all,
> 
> I'm not subscribed to the list so please CC me on replies.  I'm seeing some odd behaviour and I'm not sure what's going on.
> 
> The short version is that I can mount NFS exports from this server (including mounting on the server itself) via TCP but not via UDP.
> 
> rpcinfo on the server gives the following:
> 
> root@controller-0:/var/home/sysadmin# rpcinfo -s
>   program version(s) netid(s)                     service     owner
>    100000  2,3,4     local,udp,tcp,udp6,tcp6      portmapper  superuser
>    100024  1         tcp6,udp6,tcp,udp            status      116
>    100003  4,3       udp6,tcp6,udp,tcp            nfs         superuser
>    100227  3         udp6,tcp6,udp,tcp            -           superuser
>    100021  4,3,1     tcp6,udp6,tcp,udp            nlockmgr    superuser
>    100005  3,2,1     tcp6,udp6,tcp,udp            mountd      superuser
> 
> root@controller-0:/var/home/sysadmin# rpcinfo -p
>   program vers proto   port  service
>    100000    4   tcp    111  portmapper
>    100000    3   tcp    111  portmapper
>    100000    2   tcp    111  portmapper
>    100000    4   udp    111  portmapper
>    100000    3   udp    111  portmapper
>    100000    2   udp    111  portmapper
>    100024    1   udp  44993  status
>    100024    1   tcp  60561  status
>    100003    3   tcp   2049  nfs
>    100003    4   tcp   2049  nfs
>    100227    3   tcp   2049
>    100003    3   udp   2049  nfs
>    100227    3   udp   2049
>    100021    1   udp  36993  nlockmgr
>    100021    3   udp  36993  nlockmgr
>    100021    4   udp  36993  nlockmgr
>    100021    1   tcp  33239  nlockmgr
>    100021    3   tcp  33239  nlockmgr
>    100021    4   tcp  33239  nlockmgr
>    100005    1   udp  43636  mountd
>    100005    1   tcp  49357  mountd
>    100005    2   udp  39783  mountd
>    100005    2   tcp  54527  mountd
>    100005    3   udp  40970  mountd
>    100005    3   tcp  41761  mountd
> 
> 
> So it *looks* okay to me, but if I probe it I get the expected response for TCP but not for UDP:
> 
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
> program 100003 version 4 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
> program 100003 version 3 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
> rpcinfo: RPC: Timed out
> program 100003 version 3 is not available
> 
> 
> Oddly the last command returns immediately so it's not actually timing out, the error message is a bit misleading.   If I sniff the network traffic while running that last command, I see this:
> 
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF], proto UDP (17), length 68)
>    localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
> 20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF], proto UDP (17), length 28)
>    localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!] UDP, length 0
> 
> 
> Ignoring the checksums, I get this:
> 
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF], proto UDP (17), length 68)
>    localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
> 20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF], proto UDP (17), length 28)
>    localhost.nfs > localhost.934: UDP, length 0
> 
> 
> On a separate system with a 5.10 kernel I get a successful response that is longer and is properly decoded:
> 
> root@controller-0:/var/home/sysadmin#  tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto UDP (17), length 68)
>    localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
> 22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto UDP (17), length 52)
>    localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24 null
> 
> 
> Anyone have any ideas what might be going on or how to debug?  I'm building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives anything useful.

linux-6.6.15 has "SUNRPC: use request size to initialize bio_vec in
svc_udp_sendto()" which might address this issue.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
  2024-06-14 13:09 ` Chuck Lever III
@ 2024-06-17 14:30   ` Chris Friesen
  0 siblings, 0 replies; 3+ messages in thread
From: Chris Friesen @ 2024-06-17 14:30 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Linux NFS Mailing List

On 6/14/2024 7:09 AM, Chuck Lever III wrote:

>> On Jun 13, 2024, at 6:44 PM, Chris Friesen <cbf123@usask.ca> wrote:

>> Anyone have any ideas what might be going on or how to debug?  I'm building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives anything useful.
> 
> linux-6.6.15 has "SUNRPC: use request size to initialize bio_vec in
> svc_udp_sendto()" which might address this issue.
> 
> 
> --
> Chuck Lever

Yes! Thanks for that, it did indeed fix the issue.

Chris


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-06-17 14:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 22:44 Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice Chris Friesen
2024-06-14 13:09 ` Chuck Lever III
2024-06-17 14:30   ` Chris Friesen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox