Linux NFS development
 help / color / mirror / Atom feed
From: Chuck Lever III <chuck.lever@oracle.com>
To: Chris Friesen <cbf123@usask.ca>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice
Date: Fri, 14 Jun 2024 13:09:13 +0000	[thread overview]
Message-ID: <005CFD98-CF5B-4A33-9AB4-43FBE1FD8762@oracle.com> (raw)
In-Reply-To: <8ac2bb6f-63da-412f-8469-2a5be823fa40@usask.ca>



> On Jun 13, 2024, at 6:44 PM, Chris Friesen <cbf123@usask.ca> wrote:
> 
> Hi all,
> 
> I'm not subscribed to the list so please CC me on replies.  I'm seeing some odd behaviour and I'm not sure what's going on.
> 
> The short version is that I can mount NFS exports from this server (including mounting on the server itself) via TCP but not via UDP.
> 
> rpcinfo on the server gives the following:
> 
> root@controller-0:/var/home/sysadmin# rpcinfo -s
>   program version(s) netid(s)                     service     owner
>    100000  2,3,4     local,udp,tcp,udp6,tcp6      portmapper  superuser
>    100024  1         tcp6,udp6,tcp,udp            status      116
>    100003  4,3       udp6,tcp6,udp,tcp            nfs         superuser
>    100227  3         udp6,tcp6,udp,tcp            -           superuser
>    100021  4,3,1     tcp6,udp6,tcp,udp            nlockmgr    superuser
>    100005  3,2,1     tcp6,udp6,tcp,udp            mountd      superuser
> 
> root@controller-0:/var/home/sysadmin# rpcinfo -p
>   program vers proto   port  service
>    100000    4   tcp    111  portmapper
>    100000    3   tcp    111  portmapper
>    100000    2   tcp    111  portmapper
>    100000    4   udp    111  portmapper
>    100000    3   udp    111  portmapper
>    100000    2   udp    111  portmapper
>    100024    1   udp  44993  status
>    100024    1   tcp  60561  status
>    100003    3   tcp   2049  nfs
>    100003    4   tcp   2049  nfs
>    100227    3   tcp   2049
>    100003    3   udp   2049  nfs
>    100227    3   udp   2049
>    100021    1   udp  36993  nlockmgr
>    100021    3   udp  36993  nlockmgr
>    100021    4   udp  36993  nlockmgr
>    100021    1   tcp  33239  nlockmgr
>    100021    3   tcp  33239  nlockmgr
>    100021    4   tcp  33239  nlockmgr
>    100005    1   udp  43636  mountd
>    100005    1   tcp  49357  mountd
>    100005    2   udp  39783  mountd
>    100005    2   tcp  54527  mountd
>    100005    3   udp  40970  mountd
>    100005    3   tcp  41761  mountd
> 
> 
> So it *looks* okay to me, but if I probe it I get the expected response for TCP but not for UDP:
> 
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
> program 100003 version 4 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
> program 100003 version 3 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
> rpcinfo: RPC: Timed out
> program 100003 version 3 is not available
> 
> 
> Oddly the last command returns immediately so it's not actually timing out, the error message is a bit misleading.   If I sniff the network traffic while running that last command, I see this:
> 
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF], proto UDP (17), length 68)
>    localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
> 20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF], proto UDP (17), length 28)
>    localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!] UDP, length 0
> 
> 
> Ignoring the checksums, I get this:
> 
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF], proto UDP (17), length 68)
>    localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
> 20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF], proto UDP (17), length 28)
>    localhost.nfs > localhost.934: UDP, length 0
> 
> 
> On a separate system with a 5.10 kernel I get a successful response that is longer and is properly decoded:
> 
> root@controller-0:/var/home/sysadmin#  tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto UDP (17), length 68)
>    localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
> 22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto UDP (17), length 52)
>    localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24 null
> 
> 
> Anyone have any ideas what might be going on or how to debug?  I'm building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives anything useful.

linux-6.6.15 has "SUNRPC: use request size to initialize bio_vec in
svc_udp_sendto()" which might address this issue.


--
Chuck Lever



  reply	other threads:[~2024-06-14 13:09 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-13 22:44 Seeing strange behaviour from RPC/NFSD on 6.6.7 kernel, looking for debug advice Chris Friesen
2024-06-14 13:09 ` Chuck Lever III [this message]
2024-06-17 14:30   ` Chris Friesen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=005CFD98-CF5B-4A33-9AB4-43FBE1FD8762@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=cbf123@usask.ca \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox