Linux NFS development
* Recurring 5 second delays during NFS calls
@ 2023-02-07 10:58 Florian Möller
  2023-02-07 15:21 ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Möller @ 2023-02-07 10:58 UTC (permalink / raw)
  To: linux-nfs; +Cc: Andreas Seeg

[-- Attachment #1: Type: text/plain, Size: 3005 bytes --]

Hi all,

we are currently migrating our file server infrastructure to NFS. In our test 
environments the following problem has now occurred several times in certain 
situations:

A previously fast NFS file operation suddenly takes 5 seconds longer - per 
file. This makes applications run very slowly and severely delays file 
operations.

Here are the details:

NFS server:
OS: Ubuntu 22.04.1, all patches installed
Kernel: Ubuntu mainline, versions
	6.1.7-060107-generic_6.1.7-060107.202301181200
	6.1.8-060108_6.1.8-060108.202301240742
	6.1.9-060109_6.1.9-060109.202302010835
Security options: all Kerberos security options are affected
  (The bug does not seem to occur without Kerberos security.)
Output of exportfs -v:
/export 
gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export 
gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)


Client 1:
OS: Arch Linux, all patches installed
Kernel: 6.1.9-arch1-2, 6.1.9-arch1-1
Mount-Line: servername:/ on /nfs type nfs4 
(rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,clientaddr=XX.XX.XX.XX,local_lock=none,addr=YY.YY.YY.YY,_netdev)
krb5: 1.20.1-1
libevent: 2.1.12-4
nfs-utils: 2.6.2-1
util-linux: 2.38.1-1


Client 2:
OS: openSUSE Leap 15.4, all patches installed
Kernel: 5.14.21-150400.24.41-default
Mount-Line: servername:/ on /nfs type nfs4 
(rw,relatime,sync,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,softerr,softreval,noac,noresvport,proto=tcp,timeo=1,retrans=2,sec=krb5,clientaddr=XX.XX.XX.XX,lookupcache=none,local_lock=none,addr=YY.YY.YY.YY) 

libgssapi3: 7.8.0-bp154.2.4.1
libevent: 2.1.8-2.23
nfs-client: 2.1.1-150100.10.27.1
util-linux: 2.37.2-140400.8.14.1


The error occurs, for example, when the same file is touched twice:

touch testfile && echo "done" && touch testfile && echo "and again"
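Wrapped in time(1), the delay shows up on the second call. This is only a
sketch; the directory variable and file name are placeholders, not our exact
paths:

```shell
# Sketch: time two consecutive touches of the same file.
# DIR is assumed to be a directory on the affected NFS mount;
# it defaults to the current directory so the snippet runs anywhere.
DIR=${DIR:-.}
cd "$DIR"
time touch testfile   # first touch: fast
time touch testfile   # second touch: ~5 s once the bug triggers
rm -f testfile
```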

However, touching a large number of files (about 10000) with pairwise 
different filenames remains fast.


Here is another example that triggers the error:

1st step: create many files (shell code in zsh syntax):

for i in {1..10000}; do
  echo "test" > "$i.txt"
done

This is fast.

2nd step:

for i in {1..10000}; do
  echo "$i"
  cat "$i.txt"
done

This takes 5 seconds per cat(1) call.
After unmounting and remounting, the 2nd step again runs quickly at first. But 
after executing the 2nd step several times in a row, the error reappears 
(quite soon, after the 2nd or 3rd run).
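To see which calls stall, both steps can be reproduced with per-file timing. 
This is only a sketch (file count reduced, directory name chosen to match our 
test setup); on an unaffected mount every line prints a few milliseconds:

```shell
# Sketch reproducing both steps with per-file timing; run inside a
# directory on the affected NFS mount.
mkdir -p testdir && cd testdir

# Step 1: create the files (fast in our tests).
for i in {1..100}; do
  echo "test" > "$i.txt"
done

# Step 2: read them back, timing each cat(1); on the affected mount each
# iteration takes ~5 seconds once the bug has been triggered.
for i in {1..100}; do
  t0=$(date +%s.%N)
  cat "$i.txt" > /dev/null
  t1=$(date +%s.%N)
  echo "$i.txt: $(echo "$t1 - $t0" | bc) s"
done
```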

We were not able to reproduce the error without a Kerberos security type set.


Attached are a log from the server and one from the client. In both cases the 
following debug flags were set:

rpcdebug -m nfs -s all
rpcdebug -m nfsd -s all
rpcdebug -m rpc -s all


Best regards,
Florian Möller



-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596

[-- Attachment #2: client-log.txt --]
[-- Type: text/plain, Size: 21292 bytes --]

Feb 07 11:10:25 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:25 wma7113 kernel: NFS: nfs_weak_revalidate: inode 4625339 is valid
Feb 07 11:10:25 wma7113 kernel: NFS: read(testdir/3167.txt, 1048576@0)
Feb 07 11:10:25 wma7113 kernel: NFS: read(testdir/3167.txt, 1048576@5)
Feb 07 11:10:25 wma7113 kernel: NFS: flush(testdir/3167.txt)
Feb 07 11:10:25 wma7113 kernel: NFS: release(testdir/3167.txt)
Feb 07 11:10:25 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:25 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:25 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25366 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:25 wma7113 kernel: RPC:       xs_tcp_send_request(256) = 0
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:25 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:25 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:25 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:25 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:25 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:25 wma7113 kernel: nfs4_close_done: ret = 0
Feb 07 11:10:25 wma7113 kernel: NFS: dentry_delete(testdir/3167.txt, 48084c)
Feb 07 11:10:25 wma7113 kernel: NFS: permission(0:57/4622172), mask=0x81, res=0
Feb 07 11:10:25 wma7113 kernel: NFS: open file(testdir/3168.txt)
Feb 07 11:10:25 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:25 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:25 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25367 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_wrap_kerberos_v2
Feb 07 11:10:25 wma7113 kernel: RPC:       xs_tcp_send_request(328) = 0
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:25 wma7113 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:10:25 wma7113 kernel: decode_attr_type: type=0100000
Feb 07 11:10:25 wma7113 kernel: decode_attr_change: change attribute=1799337701143539908
Feb 07 11:10:25 wma7113 kernel: decode_attr_size: file size=5
Feb 07 11:10:25 wma7113 kernel: decode_attr_fsid: fsid=(0xaee0e28f29494091/0xa495028a183ab387)
Feb 07 11:10:25 wma7113 kernel: decode_attr_fileid: fileid=4625340
Feb 07 11:10:25 wma7113 kernel: decode_attr_fs_locations: fs_locations done, error = 0
Feb 07 11:10:25 wma7113 kernel: decode_attr_mode: file mode=0664
Feb 07 11:10:25 wma7113 kernel: decode_attr_nlink: nlink=1
Feb 07 11:10:25 wma7113 kernel: decode_attr_owner: name=moeller@ifm
Feb 07 11:10:25 wma7113 kernel: decode_attr_group: name=ldapusers@ifm
Feb 07 11:10:25 wma7113 kernel: decode_attr_rdev: rdev=(0x0:0x0)
Feb 07 11:10:25 wma7113 kernel: decode_attr_space_used: space used=4096
Feb 07 11:10:25 wma7113 kernel: decode_attr_time_access: atime=1675763820
Feb 07 11:10:25 wma7113 kernel: decode_attr_time_metadata: ctime=1675763820
Feb 07 11:10:25 wma7113 kernel: decode_attr_time_modify: mtime=1675763820
Feb 07 11:10:25 wma7113 kernel: decode_attr_mounted_on_fileid: fileid=4625340
Feb 07 11:10:25 wma7113 kernel: decode_getfattr_attrs: xdr returned 0
Feb 07 11:10:25 wma7113 kernel: decode_getfattr_generic: xdr returned 0
Feb 07 11:10:25 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:25 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:25 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:25 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:25 wma7113 kernel: NFS: nfs_update_inode(0:57/4625340 fh_crc=0x286e2ced ct=3 info=0x427e7f)
<--- 5 seconds delay --->
Feb 07 11:10:31 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:31 wma7113 kernel: NFS: nfs_weak_revalidate: inode 4625340 is valid
Feb 07 11:10:31 wma7113 kernel: NFS: read(testdir/3168.txt, 1048576@0)
Feb 07 11:10:31 wma7113 kernel: NFS: read(testdir/3168.txt, 1048576@5)
Feb 07 11:10:31 wma7113 kernel: NFS: flush(testdir/3168.txt)
Feb 07 11:10:31 wma7113 kernel: NFS: release(testdir/3168.txt)
Feb 07 11:10:31 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:31 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:31 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25368 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:31 wma7113 kernel: RPC:       xs_tcp_send_request(256) = 0
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:31 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:31 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:31 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:31 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:31 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:31 wma7113 kernel: nfs4_close_done: ret = 0
Feb 07 11:10:31 wma7113 kernel: NFS: dentry_delete(testdir/3168.txt, 48084c)
Feb 07 11:10:31 wma7113 kernel: NFS: permission(0:57/4622172), mask=0x81, res=0
Feb 07 11:10:31 wma7113 kernel: NFS: open file(testdir/3169.txt)
Feb 07 11:10:31 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:31 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:31 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25369 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_wrap_kerberos_v2
Feb 07 11:10:31 wma7113 kernel: RPC:       xs_tcp_send_request(328) = 0
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:31 wma7113 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:10:31 wma7113 kernel: decode_attr_type: type=0100000
Feb 07 11:10:31 wma7113 kernel: decode_attr_change: change attribute=1799337701143539908
Feb 07 11:10:31 wma7113 kernel: decode_attr_size: file size=5
Feb 07 11:10:31 wma7113 kernel: decode_attr_fsid: fsid=(0xaee0e28f29494091/0xa495028a183ab387)
Feb 07 11:10:31 wma7113 kernel: decode_attr_fileid: fileid=4625341
Feb 07 11:10:31 wma7113 kernel: decode_attr_fs_locations: fs_locations done, error = 0
Feb 07 11:10:31 wma7113 kernel: decode_attr_mode: file mode=0664
Feb 07 11:10:31 wma7113 kernel: decode_attr_nlink: nlink=1
Feb 07 11:10:31 wma7113 kernel: decode_attr_owner: name=moeller@ifm
Feb 07 11:10:31 wma7113 kernel: decode_attr_group: name=ldapusers@ifm
Feb 07 11:10:31 wma7113 kernel: decode_attr_rdev: rdev=(0x0:0x0)
Feb 07 11:10:31 wma7113 kernel: decode_attr_space_used: space used=4096
Feb 07 11:10:31 wma7113 kernel: decode_attr_time_access: atime=1675763820
Feb 07 11:10:31 wma7113 kernel: decode_attr_time_metadata: ctime=1675763820
Feb 07 11:10:31 wma7113 kernel: decode_attr_time_modify: mtime=1675763820
Feb 07 11:10:31 wma7113 kernel: decode_attr_mounted_on_fileid: fileid=4625341
Feb 07 11:10:31 wma7113 kernel: decode_getfattr_attrs: xdr returned 0
Feb 07 11:10:31 wma7113 kernel: decode_getfattr_generic: xdr returned 0
Feb 07 11:10:31 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:31 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:31 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:31 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:31 wma7113 kernel: NFS: nfs_update_inode(0:57/4625341 fh_crc=0x00286593 ct=3 info=0x427e7f)
Feb 07 11:10:31 wma7113 kernel: nfs4_renew_state: start
Feb 07 11:10:31 wma7113 kernel: nfs4_schedule_state_renewal: requeueing work. Lease period = 60
Feb 07 11:10:31 wma7113 kernel: nfs4_renew_state: done
<--- 5 seconds delay --->
Feb 07 11:10:36 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:36 wma7113 kernel: NFS: nfs_weak_revalidate: inode 4625341 is valid
Feb 07 11:10:36 wma7113 kernel: NFS: read(testdir/3169.txt, 1048576@0)
Feb 07 11:10:36 wma7113 kernel: NFS: read(testdir/3169.txt, 1048576@5)
Feb 07 11:10:36 wma7113 kernel: NFS: flush(testdir/3169.txt)
Feb 07 11:10:36 wma7113 kernel: NFS: release(testdir/3169.txt)
Feb 07 11:10:36 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:36 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:36 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25370 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:36 wma7113 kernel: RPC:       xs_tcp_send_request(256) = 0
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:36 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:36 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:36 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:36 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:36 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:36 wma7113 kernel: nfs4_close_done: ret = 0
Feb 07 11:10:36 wma7113 kernel: NFS: dentry_delete(testdir/3169.txt, 48084c)
Feb 07 11:10:36 wma7113 kernel: NFS: permission(0:57/4622172), mask=0x81, res=0
Feb 07 11:10:36 wma7113 kernel: NFS: open file(testdir/3170.txt)
Feb 07 11:10:36 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:36 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:36 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25371 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_wrap_kerberos_v2
Feb 07 11:10:36 wma7113 kernel: RPC:       xs_tcp_send_request(328) = 0
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:36 wma7113 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:10:36 wma7113 kernel: decode_attr_type: type=0100000
Feb 07 11:10:36 wma7113 kernel: decode_attr_change: change attribute=1799337701143539908
Feb 07 11:10:36 wma7113 kernel: decode_attr_size: file size=5
Feb 07 11:10:36 wma7113 kernel: decode_attr_fsid: fsid=(0xaee0e28f29494091/0xa495028a183ab387)
Feb 07 11:10:36 wma7113 kernel: decode_attr_fileid: fileid=4625342
Feb 07 11:10:36 wma7113 kernel: decode_attr_fs_locations: fs_locations done, error = 0
Feb 07 11:10:36 wma7113 kernel: decode_attr_mode: file mode=0664
Feb 07 11:10:36 wma7113 kernel: decode_attr_nlink: nlink=1
Feb 07 11:10:36 wma7113 kernel: decode_attr_owner: name=moeller@ifm
Feb 07 11:10:36 wma7113 kernel: decode_attr_group: name=ldapusers@ifm
Feb 07 11:10:36 wma7113 kernel: decode_attr_rdev: rdev=(0x0:0x0)
Feb 07 11:10:36 wma7113 kernel: decode_attr_space_used: space used=4096
Feb 07 11:10:36 wma7113 kernel: decode_attr_time_access: atime=1675763820
Feb 07 11:10:36 wma7113 kernel: decode_attr_time_metadata: ctime=1675763820
Feb 07 11:10:36 wma7113 kernel: decode_attr_time_modify: mtime=1675763820
Feb 07 11:10:36 wma7113 kernel: decode_attr_mounted_on_fileid: fileid=4625342
Feb 07 11:10:36 wma7113 kernel: decode_getfattr_attrs: xdr returned 0
Feb 07 11:10:36 wma7113 kernel: decode_getfattr_generic: xdr returned 0
Feb 07 11:10:36 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:36 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:36 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:36 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:36 wma7113 kernel: NFS: nfs_update_inode(0:57/4625342 fh_crc=0x21e97a40 ct=3 info=0x427e7f)
<--- 5 seconds delay --->
Feb 07 11:10:41 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:41 wma7113 kernel: NFS: nfs_weak_revalidate: inode 4625342 is valid
Feb 07 11:10:41 wma7113 kernel: NFS: read(testdir/3170.txt, 1048576@0)
Feb 07 11:10:41 wma7113 kernel: NFS: read(testdir/3170.txt, 1048576@5)
Feb 07 11:10:41 wma7113 kernel: NFS: flush(testdir/3170.txt)
Feb 07 11:10:41 wma7113 kernel: NFS: release(testdir/3170.txt)
Feb 07 11:10:41 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:41 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:41 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25372 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:41 wma7113 kernel: RPC:       xs_tcp_send_request(256) = 0
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:41 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:41 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:41 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:41 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:41 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:41 wma7113 kernel: nfs4_close_done: ret = 0
Feb 07 11:10:41 wma7113 kernel: NFS: dentry_delete(testdir/3170.txt, 48084c)
Feb 07 11:10:41 wma7113 kernel: NFS: permission(0:57/4622172), mask=0x81, res=0
Feb 07 11:10:41 wma7113 kernel: NFS: open file(testdir/3171.txt)
Feb 07 11:10:41 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:41 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:41 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25373 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_wrap_kerberos_v2
Feb 07 11:10:41 wma7113 kernel: RPC:       xs_tcp_send_request(328) = 0
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:41 wma7113 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:10:41 wma7113 kernel: decode_attr_type: type=0100000
Feb 07 11:10:41 wma7113 kernel: decode_attr_change: change attribute=1799337701147539776
Feb 07 11:10:41 wma7113 kernel: decode_attr_size: file size=5
Feb 07 11:10:41 wma7113 kernel: decode_attr_fsid: fsid=(0xaee0e28f29494091/0xa495028a183ab387)
Feb 07 11:10:41 wma7113 kernel: decode_attr_fileid: fileid=4625343
Feb 07 11:10:41 wma7113 kernel: decode_attr_fs_locations: fs_locations done, error = 0
Feb 07 11:10:41 wma7113 kernel: decode_attr_mode: file mode=0664
Feb 07 11:10:41 wma7113 kernel: decode_attr_nlink: nlink=1
Feb 07 11:10:41 wma7113 kernel: decode_attr_owner: name=moeller@ifm
Feb 07 11:10:41 wma7113 kernel: decode_attr_group: name=ldapusers@ifm
Feb 07 11:10:41 wma7113 kernel: decode_attr_rdev: rdev=(0x0:0x0)
Feb 07 11:10:41 wma7113 kernel: decode_attr_space_used: space used=4096
Feb 07 11:10:41 wma7113 kernel: decode_attr_time_access: atime=1675763820
Feb 07 11:10:41 wma7113 kernel: decode_attr_time_metadata: ctime=1675763820
Feb 07 11:10:41 wma7113 kernel: decode_attr_time_modify: mtime=1675763820
Feb 07 11:10:41 wma7113 kernel: decode_attr_mounted_on_fileid: fileid=4625343
Feb 07 11:10:41 wma7113 kernel: decode_getfattr_attrs: xdr returned 0
Feb 07 11:10:41 wma7113 kernel: decode_getfattr_generic: xdr returned 0
Feb 07 11:10:41 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:41 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:41 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:41 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:41 wma7113 kernel: NFS: nfs_update_inode(0:57/4625343 fh_crc=0x98fcf935 ct=3 info=0x427e7f)
<--- 5 seconds delay --->
Feb 07 11:10:46 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:46 wma7113 kernel: NFS: nfs_weak_revalidate: inode 4625343 is valid
Feb 07 11:10:46 wma7113 kernel: NFS: read(testdir/3171.txt, 1048576@0)
Feb 07 11:10:46 wma7113 kernel: NFS: read(testdir/3171.txt, 1048576@5)
Feb 07 11:10:46 wma7113 kernel: NFS: flush(testdir/3171.txt)
Feb 07 11:10:46 wma7113 kernel: NFS: release(testdir/3171.txt)
Feb 07 11:10:46 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:46 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:46 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25374 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:46 wma7113 kernel: RPC:       xs_tcp_send_request(256) = 0
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:46 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:46 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:46 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:46 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:46 wma7113 kernel: nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Feb 07 11:10:46 wma7113 kernel: nfs4_close_done: ret = 0
Feb 07 11:10:46 wma7113 kernel: NFS: dentry_delete(testdir/3171.txt, 48084c)
Feb 07 11:10:46 wma7113 kernel: NFS: permission(0:57/4622172), mask=0x81, res=0
Feb 07 11:10:46 wma7113 kernel: NFS: open file(testdir/3172.txt)
Feb 07 11:10:46 wma7113 kernel: --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=30
Feb 07 11:10:46 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_get_mic_v2
Feb 07 11:10:46 wma7113 kernel: encode_sequence: sessionid=1675430610:3317937782:36:0 seqid=25375 slotid=0 max_slotid=0 cache_this=1
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_wrap_kerberos_v2
Feb 07 11:10:46 wma7113 kernel: RPC:       xs_tcp_send_request(328) = 0
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:10:46 wma7113 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:10:46 wma7113 kernel: decode_attr_type: type=0100000
Feb 07 11:10:46 wma7113 kernel: decode_attr_change: change attribute=1799337701147539776
Feb 07 11:10:46 wma7113 kernel: decode_attr_size: file size=5
Feb 07 11:10:46 wma7113 kernel: decode_attr_fsid: fsid=(0xaee0e28f29494091/0xa495028a183ab387)
Feb 07 11:10:46 wma7113 kernel: decode_attr_fileid: fileid=4625344
Feb 07 11:10:46 wma7113 kernel: decode_attr_fs_locations: fs_locations done, error = 0
Feb 07 11:10:46 wma7113 kernel: decode_attr_mode: file mode=0664
Feb 07 11:10:46 wma7113 kernel: decode_attr_nlink: nlink=1
Feb 07 11:10:46 wma7113 kernel: decode_attr_owner: name=moeller@ifm
Feb 07 11:10:46 wma7113 kernel: decode_attr_group: name=ldapusers@ifm
Feb 07 11:10:46 wma7113 kernel: decode_attr_rdev: rdev=(0x0:0x0)
Feb 07 11:10:46 wma7113 kernel: decode_attr_space_used: space used=4096
Feb 07 11:10:46 wma7113 kernel: decode_attr_time_access: atime=1675763820
Feb 07 11:10:46 wma7113 kernel: decode_attr_time_metadata: ctime=1675763820
Feb 07 11:10:46 wma7113 kernel: decode_attr_time_modify: mtime=1675763820
Feb 07 11:10:46 wma7113 kernel: decode_attr_mounted_on_fileid: fileid=4625344
Feb 07 11:10:46 wma7113 kernel: decode_getfattr_attrs: xdr returned 0
Feb 07 11:10:46 wma7113 kernel: decode_getfattr_generic: xdr returned 0
Feb 07 11:10:46 wma7113 kernel: --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=30
Feb 07 11:10:46 wma7113 kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Feb 07 11:10:46 wma7113 kernel: nfs4_free_slot: slotid 1 highest_used_slotid 0
Feb 07 11:10:46 wma7113 kernel: nfs41_sequence_process: Error 0 free the slot 
Feb 07 11:10:46 wma7113 kernel: NFS: nfs_update_inode(0:57/4625344 fh_crc=0x15f7431d ct=3 info=0x427e7f)


[-- Attachment #3: server-log.txt --]
[-- Type: text/plain, Size: 11693 bytes --]

Feb 07 11:08:02 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:02 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:02 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:02 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:02 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:02 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:02 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:02 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:02 v09 kernel: check_slot_seqid enter. seqid 25308 slot_seqid 25307
Feb 07 11:08:02 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:02 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:02 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:02 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:02 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:02 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:02 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:02 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:02 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:02 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:02 v09 kernel: check_slot_seqid enter. seqid 25309 slot_seqid 25308
Feb 07 11:08:02 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:02 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:02 v09 kernel: RPC:       gss_wrap_kerberos_v2
<--- 5 seconds delay --->
Feb 07 11:08:07 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:07 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:07 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:07 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:07 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:07 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:07 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:07 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:07 v09 kernel: check_slot_seqid enter. seqid 25310 slot_seqid 25309
Feb 07 11:08:07 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:07 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:07 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:07 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:07 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:07 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:07 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:07 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:07 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:07 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:07 v09 kernel: check_slot_seqid enter. seqid 25311 slot_seqid 25310
Feb 07 11:08:07 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:07 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:07 v09 kernel: RPC:       gss_wrap_kerberos_v2
<--- 5 seconds delay --->
Feb 07 11:08:12 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:12 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:12 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:12 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:12 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:12 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:12 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:12 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:12 v09 kernel: check_slot_seqid enter. seqid 25312 slot_seqid 25311
Feb 07 11:08:12 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:12 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:12 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:12 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:12 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:12 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:12 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:12 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:12 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:12 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:12 v09 kernel: check_slot_seqid enter. seqid 25313 slot_seqid 25312
Feb 07 11:08:12 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:12 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:12 v09 kernel: RPC:       gss_wrap_kerberos_v2
<--- 5 seconds delay --->
Feb 07 11:08:17 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:17 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:17 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:17 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:17 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:17 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:17 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:17 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:17 v09 kernel: check_slot_seqid enter. seqid 25314 slot_seqid 25313
Feb 07 11:08:17 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:17 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:17 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:17 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:17 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:17 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:17 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:17 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:17 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:17 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:17 v09 kernel: check_slot_seqid enter. seqid 25315 slot_seqid 25314
Feb 07 11:08:17 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:17 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:17 v09 kernel: RPC:       gss_wrap_kerberos_v2
<--- 5 seconds delay --->
Feb 07 11:08:23 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:23 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:23 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:23 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:23 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:23 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:23 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:23 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:23 v09 kernel: check_slot_seqid enter. seqid 25316 slot_seqid 25315
Feb 07 11:08:23 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:23 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:23 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:23 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:23 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:23 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:23 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:23 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:23 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:23 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:23 v09 kernel: check_slot_seqid enter. seqid 25317 slot_seqid 25316
Feb 07 11:08:23 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:23 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:23 v09 kernel: RPC:       gss_wrap_kerberos_v2
<--- 5 seconds delay --->
Feb 07 11:08:28 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:28 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:28 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:28 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:28 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:28 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:28 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:28 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:28 v09 kernel: check_slot_seqid enter. seqid 25318 slot_seqid 25317
Feb 07 11:08:28 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:28 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:28 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:28 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:28 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:28 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:28 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:28 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:28 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:28 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:28 v09 kernel: check_slot_seqid enter. seqid 25319 slot_seqid 25318
Feb 07 11:08:28 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:28 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:28 v09 kernel: RPC:       gss_wrap_kerberos_v2
<--- 5 seconds delay --->
Feb 07 11:08:33 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:33 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:33 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:33 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:33 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:33 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:33 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:33 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:33 v09 kernel: check_slot_seqid enter. seqid 25320 slot_seqid 25319
Feb 07 11:08:33 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:33 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:33 v09 kernel: svc: server 000000006d109dfb, pool 0, transport 00000000433c4cd7, inuse=4
Feb 07 11:08:33 v09 kernel: svc: svc_authenticate (6)
Feb 07 11:08:33 v09 kernel: svc: server 000000004616e212, pool 0, transport 00000000433c4cd7, inuse=5
Feb 07 11:08:33 v09 kernel: RPC:       gss_verify_mic_v2
Feb 07 11:08:33 v09 kernel: RPC:       gss_get_mic_v2
Feb 07 11:08:33 v09 kernel: RPC:       gss_unwrap_kerberos_v2
Feb 07 11:08:33 v09 kernel: __find_in_sessionid_hashtbl: 1675430610:3317937782:36:0
Feb 07 11:08:33 v09 kernel: nfsd4_sequence: slotid 0
Feb 07 11:08:33 v09 kernel: check_slot_seqid enter. seqid 25321 slot_seqid 25320
Feb 07 11:08:33 v09 kernel: NFSD: nfsd4_open filename  op_openowner 0000000000000000
Feb 07 11:08:33 v09 kernel: --> nfsd4_store_cache_entry slot 00000000f22dd884
Feb 07 11:08:33 v09 kernel: RPC:       gss_wrap_kerberos_v2



* Re: Reoccurring 5 second delays during NFS calls
  2023-02-07 10:58 Reoccurring 5 second delays during NFS calls Florian Möller
@ 2023-02-07 15:21 ` Jeff Layton
  2023-02-21 13:33   ` Florian Möller
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2023-02-07 15:21 UTC (permalink / raw)
  To: Florian Möller, linux-nfs; +Cc: Andreas Seeg

On Tue, 2023-02-07 at 11:58 +0100, Florian Möller wrote:
> Hi all,
> 
> we are currently in the process of migrating our file server infrastructure to 
> NFS. In our test environments, the following problem has now occurred several 
> times in certain situations:
> 
> A previously very fast NFS file operation suddenly takes 5 seconds longer - per 
> file. This leads to applications running very slowly and severely delayed file 
> operations.
> 
> Here are the details:
> 
> NFS server:
> OS: Ubuntu 22.04.1, all patches installed
> Kernel: Ubuntu Mainline, Versions
> 	6.1.7-060107-generic_6.1.7-060107.202301181200
>          6.1.8-060108_6.1.8-060108.202301240742
>          6.1.9-060109_6.1.9-060109.202302010835
> Security options: all Kerberos security options are affected
>   (The bug does not seem to occur without Kerberos security.)
> Output of exportfs -v:
> /export 
> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> /export 
> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> 


I see you're using the -o async export option. Note that you may end up
with corrupt data on a server reboot (see exports(5) manpage).

Assuming you're aware of this and want to keep that anyway, then the
patch I just posted to the mailing list may help you, if the stalls are
coming during CLOSE operations:

https://lore.kernel.org/linux-nfs/9137413986ba9c2e83c030513fa9ae3358f30a85.camel@kernel.org/T/#mcb88f091263d07d8b9c13e6cc5ce0a0413d3f761
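
For reference, switching the export to sync only requires replacing that one option in /etc/exports and re-exporting with `exportfs -ra`. A sketch (the option list mirrors the exportfs -v output above; several of those options are defaults and could be omitted):

```
/export  gss/krb5p(sync,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export  gss/krb5(sync,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
```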

> 
> Client 1:
> OS: Arch Linux, all patches installed
> Kernel: 6.1.9-arch1-2, 6.1.9-arch1-1
> Mount-Line: servername:/ on /nfs type nfs4 
> (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,clientaddr=XX.XX.XX.XX,local_lock=none,addr=YY.YY.YY.YY,_netdev)
> krb5: 1.20.1-1
> libevent: 2.1.12-4
> nfs-utils: 2.6.2-1
> util-linux: 2.38.1-1
> 
> 
> Client 2:
> OS: openSuSE 15.4, all patches installed
> Kernel: 5.14.21-150400.24.41-default
> Mount-Line: servername:/ on /nfs type nfs4 
> (rw,relatime,sync,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,softerr,softreval,noac,noresvport,proto=tcp,timeo=1,retrans=2,sec=krb5,clientaddr=XX.XX.XX.XX,lookupcache=none,local_lock=none,addr=YY.YY.YY.YY) 
> 
> libgssapi3: 7.8.0-bp154.2.4.1
> libevent: 2.1.8-2.23
> nfs-client: 2.1.1-150100.10.27.1
> util-linux: 2.37.2-140400.8.14.1
> 
> 
> The error occurs for example if a file is touched twice:
> 
> touch testfile && echo "done" && touch testfile && echo "and again"
> 
> However, touching a large number of files (about 10000) with (pairwise) 
> different filenames works fast.
> 
> 
> Here is another example that triggers the error:
> 
> 1st step: create many files (shell code in Z-shell syntax):
> 
> for i in {1..10000}; do
> echo "test" > $i.txt
> done
> 
> This is fast.
> 
> 2nd step:
> 
> for i in {1..10000}; do
> echo $i
> cat $i.txt
> done
> 
> This takes 5 seconds per cat(1) call.
> After unmounting and mounting, the 2nd step also runs quickly at first. But 
> after executing the 2nd step several times in a row, the error occurs again 
> (quite soon, after the 2nd or 3rd execution).
> 
> We were not able to reproduce the error without a Kerberos security type set.
> 
> 
> Attached are a log from the server and from the client. In both cases
> 
> rpcdebug -m nfs -s all
> rpcdebug -m nfsd -s all
> rpcdebug -m rpc -s all
> 
> was set.
> 
> 
> Best regards,
> Florian Möller
> 
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>


* Re: Reoccurring 5 second delays during NFS calls
  2023-02-07 15:21 ` Jeff Layton
@ 2023-02-21 13:33   ` Florian Möller
  2023-02-21 14:13     ` Benjamin Coddington
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Möller @ 2023-02-21 13:33 UTC (permalink / raw)
  To: Jeff Layton, linux-nfs; +Cc: Andreas Seeg

[-- Attachment #1: Type: text/plain, Size: 5009 bytes --]

Hi all,

unfortunately the patch did not help, neither using -o async nor using -o sync. 
We did some more tests (which is the reason for the delay of this reply):

We used a qemu image housing both the NFS server and the client and did some 
kernel debugging.
OS: Ubuntu 22.04.1
Kernel: 5.15.78
Mount line: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,
	namlen=255,hard,proto=tcp,timeo=600,retrans=2,
	sec=krb5p,clientaddr=10.0.0.1,
	local_lock=none,addr=10.0.0.1

We touched a file and then touched the file again. This triggers the delay reliably.
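
The double-touch reproduction can also be scripted and timed. A minimal sketch in plain Python (the directory path is an assumption — point `workdir` at the NFS mount to reproduce; a local temporary directory is used here only so the sketch is self-contained):

```python
import os
import tempfile
import time

def touch(path):
    # Create the file if needed and update its timestamps, like touch(1).
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    os.utime(path, None)

def timed_touch(path):
    start = time.monotonic()
    touch(path)
    return time.monotonic() - start

# Replace with a directory on the NFS mount to reproduce the stall.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "testfile")

first = timed_touch(path)   # fast
second = timed_touch(path)  # on the affected setup, ~5 s slower
print(f"first: {first:.3f}s, second: {second:.3f}s")
```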

We set breakpoints on all functions starting with nfs4 and on 
"update_open_stateid". The attached file "1sttouch.log" contains a gdb log of 
the first touch.
"2ndtouch.log" shows the gdb output of the second touch. The delay occurs in 
line 116 in update_open_stateid.

We then deleted all breakpoints and set a sole breakpoint on 
update_open_stateid. We touched the file again and used only the "next" command 
of gdb. The gdb output is in "2ndtouch-next.log", the delay occurs in line 8.
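
For anyone reproducing this, the breakpoint setup described above can be scripted. A sketch of the gdb session, assuming the kernel was built with debug symbols and qemu was started with its gdbstub enabled (`-s`):

```
# gdb vmlinux, then:
target remote :1234          # qemu's default gdbstub port
rbreak ^nfs4                 # breakpoint on every function matching ^nfs4
break update_open_stateid
set logging file 2ndtouch.log
set logging on               # mirror breakpoint hits to the log file
continue
```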

Please let us know if you need more information or if you want us to perform 
further tests.

Best regards,
Florian Möller

On 07.02.23 at 16:21, Jeff Layton wrote:
> On Tue, 2023-02-07 at 11:58 +0100, Florian Möller wrote:
>> Hi all,
>>
>> we are currently in the process of migrating our file server infrastructure to
>> NFS. In our test environments, the following problem has now occurred several
>> times in certain situations:
>>
>> A previously very fast NFS file operation suddenly takes 5 seconds longer - per
>> file. This leads to applications running very slowly and severely delayed file
>> operations.
>>
>> Here are the details:
>>
>> NFS server:
>> OS: Ubuntu 22.04.1, all patches installed
>> Kernel: Ubuntu Mainline, Versions
>> 	6.1.7-060107-generic_6.1.7-060107.202301181200
>>           6.1.8-060108_6.1.8-060108.202301240742
>>           6.1.9-060109_6.1.9-060109.202302010835
>> Security options: all Kerberos security options are affected
>>    (The bug does not seem to occur without Kerberos security.)
>> Output of exportfs -v:
>> /export
>> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>> /export
>> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>
> 
> 
> I see you're using the -o async export option. Note that you may end up
> with corrupt data on a server reboot (see exports(5) manpage).
> 
> Assuming you're aware of this and want to keep that anyway, then the
> patch I just posted to the mailing list may help you, if the stalls are
> coming during CLOSE operations:
> 
> https://lore.kernel.org/linux-nfs/9137413986ba9c2e83c030513fa9ae3358f30a85.camel@kernel.org/T/#mcb88f091263d07d8b9c13e6cc5ce0a0413d3f761
> 
>>
>> Client 1:
>> OS: Arch Linux, all patches installed
>> Kernel: 6.1.9-arch1-2, 6.1.9-arch1-1
>> Mount-Line: servername:/ on /nfs type nfs4
>> (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,clientaddr=XX.XX.XX.XX,local_lock=none,addr=YY.YY.YY.YY,_netdev)
>> krb5: 1.20.1-1
>> libevent: 2.1.12-4
>> nfs-utils: 2.6.2-1
>> util-linux: 2.38.1-1
>>
>>
>> Client 2:
>> OS: openSuSE 15.4, all patches installed
>> Kernel: 5.14.21-150400.24.41-default
>> Mount-Line: servername:/ on /nfs type nfs4
>> (rw,relatime,sync,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,softerr,softreval,noac,noresvport,proto=tcp,timeo=1,retrans=2,sec=krb5,clientaddr=XX.XX.XX.XX,lookupcache=none,local_lock=none,addr=YY.YY.YY.YY)
>>
>> libgssapi3: 7.8.0-bp154.2.4.1
>> libevent: 2.1.8-2.23
>> nfs-client: 2.1.1-150100.10.27.1
>> util-linux: 2.37.2-140400.8.14.1
>>
>>
>> The error occurs for example if a file is touched twice:
>>
>> touch testfile && echo "done" && touch testfile && echo "and again"
>>
>> However, touching a large number of files (about 10000) with (pairwise)
>> different filenames works fast.
>>
>>
>> Here is another example that triggers the error:
>>
>> 1st step: create many files (shell code in Z-shell syntax):
>>
>> for i in {1..10000}; do
>> echo "test" > $i.txt
>> done
>>
>> This is fast.
>>
>> 2nd step:
>>
>> for i in {1..10000}; do
>> echo $i
>> cat $i.txt
>> done
>>
>> This takes 5 seconds per cat(1) call.
>> After unmounting and mounting, the 2nd step also runs quickly at first. But
>> after executing the 2nd step several times in a row, the error occurs again
>> (quite soon, after the 2nd or 3rd execution).
>>
>> We were not able to reproduce the error without a Kerberos security type set.
>>
>>
>> Attached are a log from the server and from the client. In both cases
>>
>> rpcdebug -m nfs -s all
>> rpcdebug -m nfsd -s all
>> rpcdebug -m rpc -s all
>>
>> was set.
>>
>>
>> Best regards,
>> Florian Möller
>>
>>
>>
> 

-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596

[-- Attachment #2: 1sttouch.log --]
[-- Type: text/x-log, Size: 7936 bytes --]

Breakpoint 181, 0xffffffffc0979970 in nfs4_atomic_open ()
Continuing.

Breakpoint 180, 0xffffffffc0978c80 in nfs4_do_open ()
Continuing.

Breakpoint 371, 0xffffffffc098c990 in nfs4_get_state_owner ()
Continuing.

Breakpoint 396, 0xffffffffc098f560 in nfs4_client_recover_expired_lease ()
Continuing.

Breakpoint 395, 0xffffffffc098f4b0 in nfs4_wait_clnt_recover ()
Continuing.

Breakpoint 62, 0xffffffffc096d300 in nfs4_opendata_alloc ()
Continuing.

Breakpoint 675, 0xffffffffc09130f0 in nfs4_label_alloc ()
Continuing.

Breakpoint 675, 0xffffffffc09130f0 in nfs4_label_alloc ()
Continuing.

Breakpoint 68, 0xffffffffc096e2f0 in nfs4_run_open_task ()
Continuing.

Breakpoint 120, 0xffffffffc0972a40 in nfs4_open_prepare ()
Continuing.

Breakpoint 104, 0xffffffffc09724a0 in nfs4_setup_sequence ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 325, 0xffffffffc0988e90 in nfs4_xdr_enc_open ()
Continuing.

Breakpoint 678, 0xffffffffc0580ed0 in nfs4svc_decode_compoundargs ()
Continuing.

Breakpoint 694, 0xffffffffc0585770 in nfs4_alloc_stid ()
Continuing.

Breakpoint 692, 0xffffffffc0584ad0 in nfs4_get_vfs_file ()
Continuing.

Breakpoint 699, 0xffffffffc0586370 in nfs4_inc_and_copy_stateid ()
Continuing.

Breakpoint 698, 0xffffffffc05859c0 in nfs4_put_stid ()
Continuing.

Breakpoint 688, 0xffffffffc05835a0 in nfs4_put_stateowner ()
Continuing.

Breakpoint 679, 0xffffffffc0581310 in nfs4svc_encode_compoundres ()
Continuing.

Breakpoint 338, 0xffffffffc098ae50 in nfs4_xdr_dec_open ()
Continuing.

Breakpoint 100, 0xffffffffc09712f0 in nfs4_open_done ()
Continuing.

Breakpoint 76, 0xffffffffc096f340 in nfs41_sequence_process ()
Continuing.

Breakpoint 400, 0xffffffffc098faf0 in nfs41_handle_sequence_flag_errors ()
Continuing.

Breakpoint 486, 0xffffffffc099a740 in nfs41_update_target_slotid ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 178, 0xffffffffc0978b20 in nfs4_open_release ()
Continuing.

Breakpoint 24, 0xffffffffc09691c0 in nfs4_update_changeattr_locked ()
Continuing.

Breakpoint 376, 0xffffffffc098d0b0 in nfs4_get_open_state ()
Continuing.

Breakpoint 737, 0xffffffffc0977280 in update_open_stateid ()
Continuing.

Breakpoint 418, 0xffffffffc0991d20 in nfs4_get_valid_delegation ()
Continuing.

Breakpoint 375, 0xffffffffc098d040 in nfs4_state_set_mode_locked ()
Continuing.

Breakpoint 87, 0xffffffffc0970230 in nfs4_sequence_free_slot ()
Continuing.

Breakpoint 73, 0xffffffffc096efe0 in nfs41_release_slot ()
Continuing.

Breakpoint 483, 0xffffffffc099a5e0 in nfs41_wake_and_assign_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 93, 0xffffffffc09706d0 in nfs4_opendata_free ()
Continuing.

Breakpoint 493, 0xffffffffc09a9800 in nfs4_lgopen_release ()
Continuing.

Breakpoint 372, 0xffffffffc098ce50 in nfs4_put_state_owner ()
Continuing.

Breakpoint 372, 0xffffffffc098ce50 in nfs4_put_state_owner ()
Continuing.

Breakpoint 412, 0xffffffffc0990840 in nfs4_file_flush ()
Continuing.

Breakpoint 426, 0xffffffffc0992fc0 in nfs4_delegation_flush_on_close ()
Continuing.

Breakpoint 140, 0xffffffffc0974b90 in nfs4_proc_setattr ()
Continuing.

Breakpoint 675, 0xffffffffc09130f0 in nfs4_label_alloc ()
Continuing.

Breakpoint 139, 0xffffffffc09746c0 in nfs4_do_setattr ()
Continuing.

Breakpoint 27, 0xffffffffc0969520 in nfs4_bitmap_copy_adjust ()
Continuing.

Breakpoint 423, 0xffffffffc0992610 in nfs4_inode_make_writeable ()
Continuing.

Breakpoint 53, 0xffffffffc096b060 in nfs4_call_sync_sequence ()
Continuing.

Breakpoint 111, 0xffffffffc09727c0 in nfs41_call_sync_prepare ()
Continuing.

Breakpoint 104, 0xffffffffc09724a0 in nfs4_setup_sequence ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 296, 0xffffffffc0984b50 in nfs4_xdr_enc_setattr ()
Continuing.

Breakpoint 678, 0xffffffffc0580ed0 in nfs4svc_decode_compoundargs ()
Continuing.

Breakpoint 679, 0xffffffffc0581310 in nfs4svc_encode_compoundres ()
Continuing.

Breakpoint 330, 0xffffffffc098a740 in nfs4_xdr_dec_setattr ()
Continuing.

Breakpoint 79, 0xffffffffc096f700 in nfs41_call_sync_done ()
Continuing.

Breakpoint 76, 0xffffffffc096f340 in nfs41_sequence_process ()
Continuing.

Breakpoint 400, 0xffffffffc098faf0 in nfs41_handle_sequence_flag_errors ()
Continuing.

Breakpoint 486, 0xffffffffc099a740 in nfs41_update_target_slotid ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 73, 0xffffffffc096efe0 in nfs41_release_slot ()
Continuing.

Breakpoint 483, 0xffffffffc099a5e0 in nfs41_wake_and_assign_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 126, 0xffffffffc0973950 in nfs4_handle_exception ()
Continuing.

Breakpoint 44, 0xffffffffc096a060 in nfs4_do_handle_exception ()
Continuing.

Breakpoint 412, 0xffffffffc0990840 in nfs4_file_flush ()
Continuing.

Breakpoint 426, 0xffffffffc0992fc0 in nfs4_delegation_flush_on_close ()
Continuing.

Breakpoint 36, 0xffffffffc0969980 in nfs4_close_context ()
Continuing.

Breakpoint 381, 0xffffffffc098e790 in nfs4_close_sync ()
Continuing.

Breakpoint 375, 0xffffffffc098d040 in nfs4_state_set_mode_locked ()
Continuing.

Breakpoint 184, 0xffffffffc0979dd0 in nfs4_do_close ()
Continuing.

Breakpoint 385, 0xffffffffc098ea70 in nfs4_copy_open_stateid ()
Continuing.

Breakpoint 121, 0xffffffffc0972c40 in nfs4_close_prepare ()
Continuing.

Breakpoint 45, 0xffffffffc096a650 in nfs4_bitmask_set.constprop ()
Continuing.

Breakpoint 104, 0xffffffffc09724a0 in nfs4_setup_sequence ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 316, 0xffffffffc0986f70 in nfs4_xdr_enc_close ()
Continuing.

Breakpoint 678, 0xffffffffc0580ed0 in nfs4svc_decode_compoundargs ()
Continuing.

Breakpoint 679, 0xffffffffc0581310 in nfs4svc_encode_compoundres ()
Continuing.

Breakpoint 327, 0xffffffffc098a490 in nfs4_xdr_dec_close ()
Continuing.

Breakpoint 326, 0xffffffffc098a370 in nfs4_xdr_dec_close.part ()
Continuing.

Breakpoint 125, 0xffffffffc09732a0 in nfs4_close_done ()
Continuing.

Breakpoint 89, 0xffffffffc0970510 in nfs4_sequence_done ()
Continuing.

Breakpoint 76, 0xffffffffc096f340 in nfs41_sequence_process ()
Continuing.

Breakpoint 400, 0xffffffffc098faf0 in nfs41_handle_sequence_flag_errors ()
Continuing.

Breakpoint 486, 0xffffffffc099a740 in nfs41_update_target_slotid ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 73, 0xffffffffc096efe0 in nfs41_release_slot ()
Continuing.

Breakpoint 483, 0xffffffffc099a5e0 in nfs41_wake_and_assign_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 50, 0xffffffffc096aa30 in nfs4_async_handle_exception ()
Continuing.

Breakpoint 44, 0xffffffffc096a060 in nfs4_do_handle_exception ()
Continuing.

Breakpoint 40, 0xffffffffc0969c80 in nfs4_free_closedata ()
Continuing.

Breakpoint 377, 0xffffffffc098d250 in nfs4_put_open_state ()
Continuing.

Breakpoint 422, 0xffffffffc0992510 in nfs4_inode_return_delegation_on_close ()
Continuing.

Breakpoint 372, 0xffffffffc098ce50 in nfs4_put_state_owner ()
Continuing.

[-- Attachment #3: 2ndtouch.log --]
[-- Type: text/x-log, Size: 8500 bytes --]

Breakpoint 673, 0xffffffffc0910200 in nfs4_lookup_revalidate ()
Continuing.

Breakpoint 672, 0xffffffffc0910130 in nfs4_do_lookup_revalidate ()
Continuing.

Breakpoint 414, 0xffffffffc0990bb0 in nfs4_file_open ()
Continuing.

Breakpoint 181, 0xffffffffc0979970 in nfs4_atomic_open ()
Continuing.

Breakpoint 180, 0xffffffffc0978c80 in nfs4_do_open ()
Continuing.

Breakpoint 371, 0xffffffffc098c990 in nfs4_get_state_owner ()
Continuing.

Breakpoint 396, 0xffffffffc098f560 in nfs4_client_recover_expired_lease ()
Continuing.

Breakpoint 395, 0xffffffffc098f4b0 in nfs4_wait_clnt_recover ()
Continuing.

Breakpoint 418, 0xffffffffc0991d20 in nfs4_get_valid_delegation ()
Continuing.

Breakpoint 62, 0xffffffffc096d300 in nfs4_opendata_alloc ()
Continuing.

Breakpoint 675, 0xffffffffc09130f0 in nfs4_label_alloc ()
Continuing.

Breakpoint 675, 0xffffffffc09130f0 in nfs4_label_alloc ()
Continuing.

Breakpoint 376, 0xffffffffc098d0b0 in nfs4_get_open_state ()
Continuing.

Breakpoint 68, 0xffffffffc096e2f0 in nfs4_run_open_task ()
Continuing.

Breakpoint 120, 0xffffffffc0972a40 in nfs4_open_prepare ()
Continuing.

Breakpoint 418, 0xffffffffc0991d20 in nfs4_get_valid_delegation ()
Continuing.

Breakpoint 104, 0xffffffffc09724a0 in nfs4_setup_sequence ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 324, 0xffffffffc0988d00 in nfs4_xdr_enc_open_noattr ()
Continuing.

Breakpoint 678, 0xffffffffc0580ed0 in nfs4svc_decode_compoundargs ()
Continuing.

Breakpoint 694, 0xffffffffc0585770 in nfs4_alloc_stid ()
Continuing.

Breakpoint 699, 0xffffffffc0586370 in nfs4_inc_and_copy_stateid ()
Continuing.

Breakpoint 698, 0xffffffffc05859c0 in nfs4_put_stid ()
Continuing.

Breakpoint 688, 0xffffffffc05835a0 in nfs4_put_stateowner ()
Continuing.

Breakpoint 698, 0xffffffffc05859c0 in nfs4_put_stid ()
Continuing.

Breakpoint 686, 0xffffffffc0582d50 in nfs4_free_cpntf_statelist ()
Continuing.

Breakpoint 691, 0xffffffffc0584a60 in nfs4_free_ol_stateid ()
Continuing.

Breakpoint 679, 0xffffffffc0581310 in nfs4svc_encode_compoundres ()
Continuing.

Breakpoint 340, 0xffffffffc098b0d0 in nfs4_xdr_dec_open_noattr ()
Continuing.

Breakpoint 100, 0xffffffffc09712f0 in nfs4_open_done ()
Continuing.

Breakpoint 76, 0xffffffffc096f340 in nfs41_sequence_process ()
Continuing.

Breakpoint 400, 0xffffffffc098faf0 in nfs41_handle_sequence_flag_errors ()
Continuing.

Breakpoint 486, 0xffffffffc099a740 in nfs41_update_target_slotid ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 178, 0xffffffffc0978b20 in nfs4_open_release ()
Continuing.

Breakpoint 737, 0xffffffffc0977280 in update_open_stateid ()
Continuing. <-- delay occurs here

Breakpoint 418, 0xffffffffc0991d20 in nfs4_get_valid_delegation ()
Continuing.

Breakpoint 375, 0xffffffffc098d040 in nfs4_state_set_mode_locked ()
Continuing.

Breakpoint 87, 0xffffffffc0970230 in nfs4_sequence_free_slot ()
Continuing.

Breakpoint 73, 0xffffffffc096efe0 in nfs41_release_slot ()
Continuing.

Breakpoint 483, 0xffffffffc099a5e0 in nfs41_wake_and_assign_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 93, 0xffffffffc09706d0 in nfs4_opendata_free ()
Continuing.

Breakpoint 493, 0xffffffffc09a9800 in nfs4_lgopen_release ()
Continuing.

Breakpoint 377, 0xffffffffc098d250 in nfs4_put_open_state ()
Continuing.

Breakpoint 372, 0xffffffffc098ce50 in nfs4_put_state_owner ()
Continuing.

Breakpoint 372, 0xffffffffc098ce50 in nfs4_put_state_owner ()
Continuing.

Breakpoint 412, 0xffffffffc0990840 in nfs4_file_flush ()
Continuing.

Breakpoint 426, 0xffffffffc0992fc0 in nfs4_delegation_flush_on_close ()
Continuing.

Breakpoint 140, 0xffffffffc0974b90 in nfs4_proc_setattr ()
Continuing.

Breakpoint 675, 0xffffffffc09130f0 in nfs4_label_alloc ()
Continuing.

Breakpoint 139, 0xffffffffc09746c0 in nfs4_do_setattr ()
Continuing.

Breakpoint 27, 0xffffffffc0969520 in nfs4_bitmap_copy_adjust ()
Continuing.

Breakpoint 423, 0xffffffffc0992610 in nfs4_inode_make_writeable ()
Continuing.

Breakpoint 53, 0xffffffffc096b060 in nfs4_call_sync_sequence ()
Continuing.

Breakpoint 111, 0xffffffffc09727c0 in nfs41_call_sync_prepare ()
Continuing.

Breakpoint 104, 0xffffffffc09724a0 in nfs4_setup_sequence ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 296, 0xffffffffc0984b50 in nfs4_xdr_enc_setattr ()
Continuing.

Breakpoint 678, 0xffffffffc0580ed0 in nfs4svc_decode_compoundargs ()
Continuing.

Breakpoint 679, 0xffffffffc0581310 in nfs4svc_encode_compoundres ()
Continuing.

Breakpoint 330, 0xffffffffc098a740 in nfs4_xdr_dec_setattr ()
Continuing.

Breakpoint 79, 0xffffffffc096f700 in nfs41_call_sync_done ()
Continuing.

Breakpoint 76, 0xffffffffc096f340 in nfs41_sequence_process ()
Continuing.

Breakpoint 400, 0xffffffffc098faf0 in nfs41_handle_sequence_flag_errors ()
Continuing.

Breakpoint 486, 0xffffffffc099a740 in nfs41_update_target_slotid ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 73, 0xffffffffc096efe0 in nfs41_release_slot ()
Continuing.

Breakpoint 483, 0xffffffffc099a5e0 in nfs41_wake_and_assign_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 126, 0xffffffffc0973950 in nfs4_handle_exception ()
Continuing.

Breakpoint 44, 0xffffffffc096a060 in nfs4_do_handle_exception ()
Continuing.

Breakpoint 412, 0xffffffffc0990840 in nfs4_file_flush ()
Continuing.

Breakpoint 426, 0xffffffffc0992fc0 in nfs4_delegation_flush_on_close ()
Continuing.

Breakpoint 36, 0xffffffffc0969980 in nfs4_close_context ()
Continuing.

Breakpoint 381, 0xffffffffc098e790 in nfs4_close_sync ()
Continuing.

Breakpoint 375, 0xffffffffc098d040 in nfs4_state_set_mode_locked ()
Continuing.

Breakpoint 184, 0xffffffffc0979dd0 in nfs4_do_close ()
Continuing.

Breakpoint 385, 0xffffffffc098ea70 in nfs4_copy_open_stateid ()
Continuing.

Breakpoint 121, 0xffffffffc0972c40 in nfs4_close_prepare ()
Continuing.

Breakpoint 45, 0xffffffffc096a650 in nfs4_bitmask_set.constprop ()
Continuing.

Breakpoint 104, 0xffffffffc09724a0 in nfs4_setup_sequence ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 316, 0xffffffffc0986f70 in nfs4_xdr_enc_close ()
Continuing.

Breakpoint 678, 0xffffffffc0580ed0 in nfs4svc_decode_compoundargs ()
Continuing.

Breakpoint 679, 0xffffffffc0581310 in nfs4svc_encode_compoundres ()
Continuing.

Breakpoint 327, 0xffffffffc098a490 in nfs4_xdr_dec_close ()
Continuing.

Breakpoint 326, 0xffffffffc098a370 in nfs4_xdr_dec_close.part ()
Continuing.

Breakpoint 125, 0xffffffffc09732a0 in nfs4_close_done ()
Continuing.

Breakpoint 89, 0xffffffffc0970510 in nfs4_sequence_done ()
Continuing.

Breakpoint 76, 0xffffffffc096f340 in nfs41_sequence_process ()
Continuing.

Breakpoint 400, 0xffffffffc098faf0 in nfs41_handle_sequence_flag_errors ()
Continuing.

Breakpoint 486, 0xffffffffc099a740 in nfs41_update_target_slotid ()
Continuing.

Breakpoint 480, 0xffffffffc099a430 in nfs4_alloc_slot ()
Continuing.

Breakpoint 471, 0xffffffffc0999e60 in nfs4_find_or_create_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 73, 0xffffffffc096efe0 in nfs41_release_slot ()
Continuing.

Breakpoint 483, 0xffffffffc099a5e0 in nfs41_wake_and_assign_slot ()
Continuing.

Breakpoint 476, 0xffffffffc099a200 in nfs4_free_slot ()
Continuing.

Breakpoint 50, 0xffffffffc096aa30 in nfs4_async_handle_exception ()
Continuing.

Breakpoint 44, 0xffffffffc096a060 in nfs4_do_handle_exception ()
Continuing.

Breakpoint 40, 0xffffffffc0969c80 in nfs4_free_closedata ()
Continuing.

Breakpoint 377, 0xffffffffc098d250 in nfs4_put_open_state ()
Continuing.

Breakpoint 422, 0xffffffffc0992510 in nfs4_inode_return_delegation_on_close ()
Continuing.

Breakpoint 372, 0xffffffffc098ce50 in nfs4_put_state_owner ()
Continuing.

[-- Attachment #4: 2ndtouch-next.log --]
[-- Type: text/x-log, Size: 1200 bytes --]

Breakpoint 1, 0xffffffffc0987280 in update_open_stateid ()
Single stepping until exit from function update_open_stateid,
which has no line number information.
_raw_spin_lock (lock=0xffff888117ab1440) at kernel/locking/spinlock.c:153
153	{
154		__raw_spin_lock(lock);
0xffffffffc09872fb in update_open_stateid ()
Single stepping until exit from function update_open_stateid,  <--- delay occurs here
which has no line number information.
0xffffffffc0987936 in _nfs4_opendata_to_nfs4_state ()
Single stepping until exit from function _nfs4_opendata_to_nfs4_state,
which has no line number information.
0xffffffffc0989262 in nfs4_do_open ()
Single stepping until exit from function nfs4_do_open,
which has no line number information.
0xffffffffc0989a71 in nfs4_atomic_open ()
Single stepping until exit from function nfs4_atomic_open,
which has no line number information.
0xffffffffc09a0cc1 in nfs4_file_open ()
Single stepping until exit from function nfs4_file_open,
which has no line number information.
do_dentry_open (f=f@entry=0xffff88810305b600, inode=0xffff888105136370, open=0xffffffffc09a0bb0 <nfs4_file_open>, open@entry=0x0 <fixed_percpu_data>) at fs/open.c:828
828			if (error)
Continuing.


* Re: Reoccurring 5 second delays during NFS calls
  2023-02-21 13:33   ` Florian Möller
@ 2023-02-21 14:13     ` Benjamin Coddington
  2023-02-21 16:52       ` Florian Möller
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Coddington @ 2023-02-21 14:13 UTC (permalink / raw)
  To: Florian Möller; +Cc: Jeff Layton, linux-nfs, Andreas Seeg

On 21 Feb 2023, at 8:33, Florian Möller wrote:

> Hi all,
>
> Unfortunately, the patch did not help, neither with -o async nor with -o sync. We did some more tests (which is the reason for the delayed reply):
>
> We used a qemu image housing both the NFS server and the client and did some kernel debugging.
> OS: Ubuntu 22.04.1
> Kernel: 5.15.78
> Mount line: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,
> 	namlen=255,hard,proto=tcp,timeo=600,retrans=2,
> 	sec=krb5p,clientaddr=10.0.0.1,
> 	local_lock=none,addr=10.0.0.1
>
> We touched a file and then touched the file again. This triggers the delay reliably.
>
> We set breakpoints on all functions starting with nfs4 and on "update_open_stateid". The attached file "1sttouch.log" contains a gdb log of the first touch.
> "2ndtouch.log" shows the gdb output of the second touch. The delay occurs in line 116 in update_open_stateid.
>
> We then deleted all breakpoints and set a sole breakpoint on update_open_stateid. We touched the file again and used only the "next" command of gdb. The gdb output is in "2ndtouch-next.log", the delay occurs in line 8.
>
> Please let us know if you need more information or if you want us to perform further tests.
>
> Best regards,
> Florian Möller

Hi Florian,

The 5-second value and the location of the delay make me suspect something
is wrong with the open stateid sequence processing.

The client introduces 5-second delays in order to correctly order stateid
updates from the server.  Usually this happens because there are multiple
processes sending OPEN/CLOSE calls and the server processes them out of
order, or the client receives the responses out of order.

It would be helpful to have a network capture of the problem, along with the
matching output from these tracepoints on the client:

nfs4:nfs4_open_stateid_update
nfs4:nfs4_open_stateid_update_wait
nfs4:nfs4_close_stateid_update_wait
sunrpc:xs_stream_read_request
sunrpc:rpc_request
sunrpc:rpc_task_end

And these tracepoints on the server:

nfsd:nfsd_preprocess
sunrpc:svc_process

I'm interested in seeing how the client is processing the sequence numbers
of the open stateid, or if perhaps there's a delegation in play.

LMK if you need help with the tracepoints -- you can simply append those
lines into /sys/kernel/debug/tracing/set_event, then reproduce the problem.
The output of those tracepoints will be in /sys/kernel/debug/tracing/trace.
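For convenience, the client-side setup above can be scripted roughly as follows (a sketch: it assumes tracefs is reachable under /sys/kernel/debug/tracing, which requires root; on some kernels it is mounted at /sys/kernel/tracing instead):

```shell
# Sketch: enable the client-side tracepoints listed above, reproduce the
# problem, then collect the trace log.  The tracefs path is an assumption
# about a typical distro kernel; pass another path to override it.
enable_nfs_client_tracepoints() {
    local tracefs="${1:-/sys/kernel/debug/tracing}"
    local ev
    for ev in nfs4:nfs4_open_stateid_update \
              nfs4:nfs4_open_stateid_update_wait \
              nfs4:nfs4_close_stateid_update_wait \
              sunrpc:xs_stream_read_request \
              sunrpc:rpc_request \
              sunrpc:rpc_task_end; do
        # Appending to set_event enables each tracepoint without
        # disturbing events that are already enabled.
        echo "$ev" >> "$tracefs/set_event"
    done
}

# Typical use (as root on the client):
#   enable_nfs_client_tracepoints
#   touch test.txt && sleep 2 && touch test.txt    # reproduce the delay
#   cat /sys/kernel/debug/tracing/trace > /tmp/nfs-client-trace.txt
```

The same pattern works on the server with the nfsd:nfsd_preprocess and sunrpc:svc_process events.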

Ben


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
  2023-02-21 14:13     ` Benjamin Coddington
@ 2023-02-21 16:52       ` Florian Möller
  2023-02-21 18:58         ` Benjamin Coddington
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Möller @ 2023-02-21 16:52 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: Jeff Layton, linux-nfs, Andreas Seeg

[-- Attachment #1: Type: text/plain, Size: 3076 bytes --]

Hi Benjamin,

here are the trace and a listing of the corresponding network packets. If the 
listing is not detailed enough, I can send you a full packet dump tomorrow.

The command I used was

touch test.txt && sleep 2 && touch test.txt

test.txt did not exist previously. So you have an example of a touch without and 
with delay.

Best regards,
Florian

Am 21.02.23 um 15:13 schrieb Benjamin Coddington:
> On 21 Feb 2023, at 8:33, Florian Möller wrote:
> 
>> Hi all,
>>
>> Unfortunately, the patch did not help, neither with -o async nor with -o sync. We did some more tests (which is the reason for the delayed reply):
>>
>> We used a qemu image housing both the NFS server and the client and did some kernel debugging.
>> OS: Ubuntu 22.04.1
>> Kernel: 5.15.78
>> Mount line: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,
>> 	namlen=255,hard,proto=tcp,timeo=600,retrans=2,
>> 	sec=krb5p,clientaddr=10.0.0.1,
>> 	local_lock=none,addr=10.0.0.1
>>
>> We touched a file and then touched the file again. This triggers the delay reliably.
>>
>> We set breakpoints on all functions starting with nfs4 and on "update_open_stateid". The attached file "1sttouch.log" contains a gdb log of the first touch.
>> "2ndtouch.log" shows the gdb output of the second touch. The delay occurs in line 116 in update_open_stateid.
>>
>> We then deleted all breakpoints and set a sole breakpoint on update_open_stateid. We touched the file again and used only the "next" command of gdb. The gdb output is in "2ndtouch-next.log", the delay occurs in line 8.
>>
>> Please let us know if you need more information or if you want us to perform further tests.
>>
>> Best regards,
>> Florian Möller
> 
> Hi Florian,
> 
> The 5-second value and the location of the delay make me suspect something
> is wrong with the open stateid sequence processing.
> 
> The client introduces 5-second delays in order to correctly order stateid
> updates from the server.  Usually this happens because there are multiple
> processes sending OPEN/CLOSE calls and the server processes them out of
> order, or the client receives the responses out of order.
> 
> It would be helpful to have a network capture of the problem, along with the
> matching output from these tracepoints on the client:
> 
> nfs4:nfs4_open_stateid_update
> nfs4:nfs4_open_stateid_update_wait
> nfs4:nfs4_close_stateid_update_wait
> sunrpc:xs_stream_read_request
> sunrpc:rpc_request
> sunrpc:rpc_task_end
> 
> And these tracepoints on the server:
> 
> nfsd:nfsd_preprocess
> sunrpc:svc_process
> 
> I'm interested in seeing how the client is processing the sequence numbers
> of the open stateid, or if perhaps there's a delegation in play.
> 
> LMK if you need help with the tracepoints -- you can simply append those
> lines into /sys/kernel/debug/tracing/set_event, then reproduce the problem.
> The output of those tracepoints will be in /sys/kernel/debug/tracing/trace.
> 
> Ben

-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596

[-- Attachment #2: nfs-network.txt --]
[-- Type: text/plain, Size: 2329 bytes --]

   57 30.720428828 10.126.0.109 → 10.126.0.109 NFS 252 [TCP Previous segment not captured] V4 Call SEQUENCE
   58 30.720592390 10.126.0.109 → 10.126.0.109 NFS 220 [TCP ACKed unseen segment] V4 Reply (Call In 57) SEQUENCE
   59 30.720611655 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=186 Ack=153 Win=511 Len=0 TSval=173753135 TSecr=173753135
   61 52.473490272 10.126.0.109 → 10.126.0.109 NFS 320 V4 Call
   62 52.473606700 10.126.0.109 → 10.126.0.109 NFS 420 V4 Reply (Call In 61)
   63 52.473614923 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=438 Ack=505 Win=510 Len=0 TSval=173774888 TSecr=173774888
   64 52.477849250 10.126.0.109 → 10.126.0.109 NFS 432 V4 Call
   65 52.477981562 10.126.0.109 → 10.126.0.109 NFS 540 V4 Reply (Call In 64)
   66 52.481123231 10.126.0.109 → 10.126.0.109 NFS 364 V4 Call
   67 52.481255060 10.126.0.109 → 10.126.0.109 NFS 452 V4 Reply (Call In 66)
   68 52.481383173 10.126.0.109 → 10.126.0.109 NFS 316 V4 Call CLOSE StateID: 0xab41
   69 52.481425875 10.126.0.109 → 10.126.0.109 NFS 228 V4 Reply (Call In 68) SEQUENCE | PUTFH Status: NFS4ERR_STALE
   70 52.481774274 10.126.0.109 → 10.126.0.109 NFS 320 V4 Call
   71 52.482851688 10.126.0.109 → 10.126.0.109 NFS 420 V4 Reply (Call In 70)
   72 52.523997562 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=1598 Ack=1873 Win=512 Len=0 TSval=173774938 TSecr=173774897
   73 54.489060244 10.126.0.109 → 10.126.0.109 NFS 388 V4 Call
   74 54.489340744 10.126.0.109 → 10.126.0.109 NFS 504 V4 Reply (Call In 73)
   75 54.489364270 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=1918 Ack=2309 Win=509 Len=0 TSval=173776904 TSecr=173776903
   76 59.648437485 10.126.0.109 → 10.126.0.109 NFS 364 V4 Call
   77 59.648966709 10.126.0.109 → 10.126.0.109 NFS 452 V4 Reply (Call In 76)
   78 59.648991812 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=2214 Ack=2693 Win=509 Len=0 TSval=173782063 TSecr=173782063
   79 59.651032061 10.126.0.109 → 10.126.0.109 NFS 316 V4 Call CLOSE StateID: 0xa5b6
   80 59.651429583 10.126.0.109 → 10.126.0.109 NFS 228 V4 Reply (Call In 79) SEQUENCE | PUTFH Status: NFS4ERR_STALE
   81 59.691894428 10.126.0.109 → 10.126.0.109 TCP 68 978 → 2049 [ACK] Seq=2462 Ack=2853 Win=512 Len=0 TSval=173782106 TSecr=173782066

[-- Attachment #3: nfs-trace.txt --]
[-- Type: text/plain, Size: 5009 bytes --]

           touch-2476    [000] .....  1841.512484: rpc_request: task:129@4 nfsv4 GETATTR (sync)
            nfsd-893     [000] .....  1841.515022: svc_process: addr=10.126.0.109:978 xid=0xffd24ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.515108: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0xffd24ee9 copied=348 reclen=348 offset=348
           touch-2476    [000] .....  1841.519191: rpc_task_end: task:129@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:0-1906    [000] .....  1841.519289: rpc_request: task:130@4 nfsv4 OPEN (async)
            nfsd-893     [000] .....  1841.519381: svc_process: addr=10.126.0.109:978 xid=0x00d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.519482: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x00d34ee9 copied=468 reclen=468 offset=468
    kworker/u2:0-1906    [000] .....  1841.520684: rpc_task_end: task:130@4 flags=ASYNC|MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
           touch-2476    [000] .....  1841.522551: nfs4_open_stateid_update: error=0 (OK) fileid=00:2e:1048887 fhandle=0x5a835bd1 stateid=1:0x8fe5bdfd
           touch-2476    [000] .....  1841.522571: rpc_request: task:131@4 nfsv4 SETATTR (sync)
            nfsd-893     [000] .....  1841.522695: svc_process: addr=10.126.0.109:978 xid=0x01d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.522750: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x01d34ee9 copied=380 reclen=380 offset=380
           touch-2476    [000] .....  1841.522825: rpc_task_end: task:131@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:0-1906    [000] .....  1841.522842: rpc_request: task:132@2 nfsv4 CLOSE (async)
            nfsd-893     [000] .....  1841.522891: svc_process: addr=10.126.0.109:978 xid=0x02d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.522916: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x02d34ee9 copied=156 reclen=156 offset=156
    kworker/u2:0-1906    [000] .....  1841.522929: rpc_task_end: task:132@2 flags=ASYNC|MOVEABLE|DYNAMIC|SOFT|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=-116 action=rpc_exit_task [sunrpc]
            bash-2230    [000] .....  1841.523219: rpc_request: task:133@4 nfsv4 GETATTR (sync)
            nfsd-893     [000] .....  1841.524303: svc_process: addr=10.126.0.109:978 xid=0x03d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1841.524347: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x03d34ee9 copied=348 reclen=348 offset=348
            bash-2230    [000] .....  1841.524427: rpc_task_end: task:133@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:1-2480    [000] .....  1843.530173: rpc_request: task:134@4 nfsv4 OPEN_NOATTR (async)
            nfsd-893     [000] .....  1843.530670: svc_process: addr=10.126.0.109:978 xid=0x04d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1843.530874: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x04d34ee9 copied=432 reclen=432 offset=432
    kworker/u2:1-2480    [000] .....  1843.531142: rpc_task_end: task:134@4 flags=ASYNC|MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
           touch-2482    [000] .....  1843.531395: nfs4_open_stateid_update_wait: error=0 (OK) fileid=00:2e:1048887 fhandle=0x5a835bd1 stateid=2:0x8fe5bdfd
           touch-2482    [000] .....  1848.689634: nfs4_open_stateid_update: error=-11 (EAGAIN) fileid=00:2e:1048887 fhandle=0x5a835bd1 stateid=2:0x8fe5bdfd
           touch-2482    [000] .....  1848.689728: rpc_request: task:135@4 nfsv4 SETATTR (sync)
            nfsd-893     [000] .....  1848.690267: svc_process: addr=10.126.0.109:978 xid=0x05d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1848.690494: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x05d34ee9 copied=380 reclen=380 offset=380
           touch-2482    [000] .....  1848.690806: rpc_task_end: task:135@4 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=0 action=rpc_exit_task [sunrpc]
    kworker/u2:1-2480    [000] .....  1848.691627: rpc_request: task:136@2 nfsv4 CLOSE (async)
            nfsd-893     [000] .....  1848.692586: svc_process: addr=10.126.0.109:978 xid=0x06d34ee9 service=nfsd vers=4 proc=COMPOUND
    kworker/u3:0-112     [000] .....  1848.692931: xs_stream_read_request: peer=[10.126.0.109]:2049 xid=0x06d34ee9 copied=156 reclen=156 offset=156
    kworker/u2:1-2480    [000] .....  1848.693157: rpc_task_end: task:136@2 flags=ASYNC|MOVEABLE|DYNAMIC|SOFT|SENT|NORTO|CRED_NOREF runstate=RUNNING|ACTIVE status=-116 action=rpc_exit_task [sunrpc]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
  2023-02-21 16:52       ` Florian Möller
@ 2023-02-21 18:58         ` Benjamin Coddington
       [not found]           ` <4f70c2f5-dfdb-c37c-8663-5f2a108e229e@mathematik.uni-wuerzburg.de>
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Coddington @ 2023-02-21 18:58 UTC (permalink / raw)
  To: Florian Möller; +Cc: Jeff Layton, linux-nfs, Andreas Seeg

On 21 Feb 2023, at 11:52, Florian Möller wrote:

> Hi Benjamin,
>
> here are the trace and a listing of the corresponding network packets. If the listing is not detailed enough, I can send you a full packet dump tomorrow.
>
> The command I used was
>
> touch test.txt && sleep 2 && touch test.txt
>
> test.txt did not exist previously. So you have an example of a touch without and with delay.

Thanks!  These are great - I can see from them that the client is indeed
waiting in the stateid update mechanism because the server has returned
NFS4ERR_STALE to the client's first CLOSE.

That is unusual.  The server is signaling that the open file's stateid is old,
so I am interested to see if the first CLOSE is sent with the stateid's
sequence that was returned from the server.  I could probably see this if I
had the server-side tracepoint data.

But from what's here I cannot see the details to figure this out because
you've only sent the summary of the packet capture.. can you send along the
raw capture?

I usually use something like:

tshark -w /tmp/pcap -i enps0 -P host server

.. then the file /tmp/pcap has the raw capture.  We can use that to look at
the sequence numbers for the stateid as sent from client to server.

However, if you use krb5p, we won't be able to see these details.  Only krb5
or krb5i.  If the problem only reproduces with krb5p, we have to rely on the
tracepoints output on the client and server.

Ben


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
       [not found]           ` <4f70c2f5-dfdb-c37c-8663-5f2a108e229e@mathematik.uni-wuerzburg.de>
@ 2023-02-22 11:54             ` Benjamin Coddington
  2023-02-22 12:22               ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Coddington @ 2023-02-22 11:54 UTC (permalink / raw)
  To: Florian Möller; +Cc: Jeff Layton, linux-nfs, Andreas Seeg

On 22 Feb 2023, at 3:19, Florian Möller wrote:

> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>
>>> Hi Benjamin,
>>>
>>> here are the trace and a listing of the corresponding network packets. If the listing is not detailed enough, I can send you a full packet dump tomorrow.
>>>
>>> The command I used was
>>>
>>> touch test.txt && sleep 2 && touch test.txt
>>>
>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
>>
>> Thanks!  These are great - I can see from them that the client is indeed
>> waiting in the stateid update mechanism because the server has returned
>> NFS4ERR_STALE to the client's first CLOSE.
>>
>> That is unusual.  The server is signaling that the open file's stateid is old,
>> so I am interested to see if the first CLOSE is sent with the stateid's
>> sequence that was returned from the server.  I could probably see this if I
>> had the server-side tracepoint data.
>
> Hi Benjamin,
>
> the server-side tracepoints
>
> nfsd:nfsd_preprocess
> sunrpc:svc_process
>
> were enabled. It seems they did not produce any output.
>
> What I did now was:
> - enable all nfsd tracepoints,
> - enable all nfs4 tracepoints,
> - enable all sunrpc tracepoints.
>
> The command I used was
>
> touch somefile && sleep 2 && touch somefile.
>
> Then I unmounted the NFS share - this also causes a delay.
>
> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.


In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
see:

nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70


.. so nfsd's exp_find_key() is having trouble and returns -ENOENT.  Does
this look familiar to anyone?

I am not as familiar with how the server operates here, so my next step
would be to start inserting trace_printk's into the kernel source to figure
out what's going wrong in there.  However, we can also use the function
graph tracer to see where the kernel is going.  That would look like:

 echo exp_find_key > /sys/kernel/tracing/set_graph_function
 echo 7 > /sys/kernel/debug/tracing/max_graph_depth
 echo function_graph > /sys/kernel/debug/tracing/current_tracer
> /sys/kernel/debug/tracing/trace

 .. reproduce

 cat /sys/kernel/debug/tracing/trace

Hopefully someone with more knfsd/sunrpc experience recognizes this.. but it
makes a lot more sense now that krb5 is part of the problem.

Ben


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 11:54             ` Benjamin Coddington
@ 2023-02-22 12:22               ` Jeff Layton
  2023-02-22 12:45                 ` Benjamin Coddington
  2023-02-22 12:48                 ` Florian Möller
  0 siblings, 2 replies; 14+ messages in thread
From: Jeff Layton @ 2023-02-22 12:22 UTC (permalink / raw)
  To: Benjamin Coddington, Florian Möller; +Cc: linux-nfs, Andreas Seeg

On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
> On 22 Feb 2023, at 3:19, Florian Möller wrote:
> 
> > Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
> > > On 21 Feb 2023, at 11:52, Florian Möller wrote:
> > > 
> > > > Hi Benjamin,
> > > > 
> > > > here are the trace and a listing of the corresponding network packets. If the listing is not detailed enough, I can send you a full packet dump tomorrow.
> > > > 
> > > > The command I used was
> > > > 
> > > > touch test.txt && sleep 2 && touch test.txt
> > > > 
> > > > test.txt did not exist previously. So you have an example of a touch without and with delay.
> > > 
> > > Thanks!  These are great - I can see from them that the client is indeed
> > > waiting in the stateid update mechanism because the server has returned
> > > NFS4ERR_STALE to the client's first CLOSE.
> > > 
> > > That is unusual.  The server is signaling that the open file's stateid is old,
> > > so I am interested to see if the first CLOSE is sent with the stateid's
> > > sequence that was returned from the server.  I could probably see this if I
> > > had the server-side tracepoint data.
> > 
> > Hi Benjamin,
> > 
> > the server-side tracepoints
> > 
> > nfsd:nfsd_preprocess
> > sunrpc:svc_process
> > 
> > were enabled. It seems they did not produce any output.
> > 
> > What I did now was:
> > - enable all nfsd tracepoints,
> > - enable all nfs4 tracepoints,
> > - enable all sunrpc tracepoints.
> > 
> > The command I used was
> > 
> > touch somefile && sleep 2 && touch somefile.
> > 
> > Then I unmounted the NFS share - this also causes a delay.
> > 
> > I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
> 
> 
> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
> see:
> 
> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
> 

This just means that the kernel called into the "cache" infrastructure
to find an export entry, and there wasn't one. 


Looking back at the original email here, I'd say this is expected since
the export wasn't set up for krb5i.

Output of exportfs -v:
/export 
gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export 
gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)


> 
> .. so nfsd's exp_find_key() is having trouble and returns -ENOENT.  Does
> this look familiar to anyone?
> 
> I am not as familiar with how the server operates here, so my next step
> would be to start inserting trace_printk's into the kernel source to figure
> out what's going wrong in there.  However, we can also use the function
> graph tracer to see where the kernel is going.  That would look like:
> 
>  echo exp_find_key > /sys/kernel/tracing/set_graph_function
>  echo 7 > /sys/kernel/debug/tracing/max_graph_depth
>  echo function_graph > /sys/kernel/debug/tracing/current_tracer
> > /sys/kernel/debug/tracing/trace
> 
>  .. reproduce
> 
>  cat /sys/kernel/debug/tracing/trace
> 
> Hopefully someone with more knfsd/sunrpc experience recognizes this.. but it
> makes a lot more sense now that krb5 is part of the problem.
> 
> Ben
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 12:22               ` Jeff Layton
@ 2023-02-22 12:45                 ` Benjamin Coddington
  2023-02-22 12:48                 ` Florian Möller
  1 sibling, 0 replies; 14+ messages in thread
From: Benjamin Coddington @ 2023-02-22 12:45 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Florian Möller, linux-nfs, Andreas Seeg

On 22 Feb 2023, at 7:22, Jeff Layton wrote:

> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
>>
>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> here are the trace and a listing of the corresponding network packets. If the listing is not detailed enough, I can send you a full packet dump tomorrow.
>>>>>
>>>>> The command I used was
>>>>>
>>>>> touch test.txt && sleep 2 && touch test.txt
>>>>>
>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
>>>>
>>>> Thanks!  These are great - I can see from them that the client is indeed
>>>> waiting in the stateid update mechanism because the server has returned
>>>> NFS4ERR_STALE to the client's first CLOSE.
>>>>
>>>> That is unusual.  The server is signaling that the open file's stateid is old,
>>>> so I am interested to see if the first CLOSE is sent with the stateid's
>>>> sequence that was returned from the server.  I could probably see this if I
>>>> had the server-side tracepoint data.
>>>
>>> Hi Benjamin,
>>>
>>> the server-side tracepoints
>>>
>>> nfsd:nfsd_preprocess
>>> sunrpc:svc_process
>>>
>>> were enabled. It seems they did not produce any output.
>>>
>>> What I did now was:
>>> - enable all nfsd tracepoints,
>>> - enable all nfs4 tracepoints,
>>> - enable all sunrpc tracepoints.
>>>
>>> The command I used was
>>>
>>> touch somefile && sleep 2 && touch somefile.
>>>
>>> Then I unmounted the NFS share - this also causes a delay.
>>>
>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
>>
>>
>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
>> see:
>>
>> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
>> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
>> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
>>
>
> This just means that the kernel called into the "cache" infrastructure
> to find an export entry, and there wasn't one.
>
>
> Looking back at the original email here, I'd say this is expected since
> the export wasn't set up for krb5i.
>
> Output of exportfs -v:
> /export
> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> /export
> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

Jeff and I were chatting on #linux-nfs, and it's back to looking like a
client issue: he's pointed out that the client starts out with krb5, then
in Frame 8 (the CLOSE that receives STALE) switches to krb5i, then back to
krb5 in Frame 10.

This should be reproducible.  Let me dust off my old KDC.

Ben



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 12:22               ` Jeff Layton
  2023-02-22 12:45                 ` Benjamin Coddington
@ 2023-02-22 12:48                 ` Florian Möller
  2023-02-22 19:43                   ` Benjamin Coddington
  1 sibling, 1 reply; 14+ messages in thread
From: Florian Möller @ 2023-02-22 12:48 UTC (permalink / raw)
  To: Jeff Layton, Benjamin Coddington; +Cc: linux-nfs, Andreas Seeg

Am 22.02.23 um 13:22 schrieb Jeff Layton:
> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
>>
>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> here are the trace and a listing of the corresponding network packets. If the listing is not detailed enough, I can send you a full packet dump tomorrow.
>>>>>
>>>>> The command I used was
>>>>>
>>>>> touch test.txt && sleep 2 && touch test.txt
>>>>>
>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
>>>>
>>>> Thanks!  These are great - I can see from them that the client is indeed
>>>> waiting in the stateid update mechanism because the server has returned
>>>> NFS4ERR_STALE to the client's first CLOSE.
>>>>
>>>> That is unusual.  The server is signaling that the open file's stateid is old,
>>>> so I am interested to see if the first CLOSE is sent with the stateid's
>>>> sequence that was returned from the server.  I could probably see this if I
>>>> had the server-side tracepoint data.
>>>
>>> Hi Benjamin,
>>>
>>> the server-side tracepoints
>>>
>>> nfsd:nfsd_preprocess
>>> sunrpc:svc_process
>>>
>>> were enabled. It seems they did not produce any output.
>>>
>>> What I did now was:
>>> - enable all nfsd tracepoints,
>>> - enable all nfs4 tracepoints,
>>> - enable all sunrpc tracepoints.
>>>
>>> The command I used was
>>>
>>> touch somefile && sleep 2 && touch somefile.
>>>
>>> Then I unmounted the NFS share - this also causes a delay.
>>>
>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
>>
>>
>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
>> see:
>>
>> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
>> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
>> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
>>
> 
> This just means that the kernel called into the "cache" infrastructure
> to find an export entry, and there wasn't one.
> 
> 
> Looking back at the original email here, I'd say this is expected since
> the export wasn't set up for krb5i.
> 
> Output of exportfs -v:
> /export
> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> /export
> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

I changed the export definitions to

/export 
gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export 
gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export 
gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

such that all three Kerberos security types are exported. With this setup the 
delay is gone; without the krb5i export it occurs again.
Mounting the NFS share with different Kerberos security types does not change 
this behaviour: without the krb5i export there is a delay.
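For reference, the same effect should be obtainable in /etc/exports without enumerating the gss/krb5* pseudo-clients, by listing all three flavours in a single sec= option (a sketch based on the exports(5) sec= syntax; the client wildcard is an assumption, adjust it to your network):

```text
# /etc/exports -- hypothetical equivalent using the sec= option
/export  *(sec=krb5:krb5i:krb5p,rw,async,wdelay,crossmnt,no_subtree_check,fsid=0,secure,root_squash,no_all_squash)
```

After editing, `exportfs -ra` re-exports the table so the change takes effect.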

Best,
Florian



> 
> 
>>
>> .. so nfsd's exp_find_key() is having trouble and returns -ENOENT.  Does
>> this look familiar to anyone?
>>
>> I am not as familiar with how the server operates here, so my next step
>> would be to start inserting trace_printk's into the kernel source to figure
>> out what's going wrong in there.  However, we can also use the function
>> graph tracer to see where the kernel is going.  That would look like:
>>
>>   echo exp_find_key > /sys/kernel/tracing/set_graph_function
>>   echo 7 > /sys/kernel/debug/tracing/max_graph_depth
>>   echo function_graph > /sys/kernel/debug/tracing/current_tracer
>>> /sys/kernel/debug/tracing/trace
>>
>>   .. reproduce
>>
>>   cat /sys/kernel/debug/tracing/trace
>>
>> Hopefully someone with more knfsd/sunrpc experience recognizes this.. but it
>> makes a lot more sense now that krb5 is part of the problem.
>>
>> Ben
>>
> 

-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 12:48                 ` Florian Möller
@ 2023-02-22 19:43                   ` Benjamin Coddington
  2023-02-22 20:14                     ` Rick Macklem
  2023-02-23  8:27                     ` Florian Möller
  0 siblings, 2 replies; 14+ messages in thread
From: Benjamin Coddington @ 2023-02-22 19:43 UTC (permalink / raw)
  To: Florian Möller; +Cc: Jeff Layton, linux-nfs, Andreas Seeg

On 22 Feb 2023, at 7:48, Florian Möller wrote:

> Am 22.02.23 um 13:22 schrieb Jeff Layton:
>> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
>>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
>>>
>>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
>>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>>>>
>>>>>> Hi Benjamin,
>>>>>>
>>>>>> here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.
>>>>>>
>>>>>> The command I used was
>>>>>>
>>>>>> touch test.txt && sleep 2 && touch test.txt
>>>>>>
>>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
>>>>>
>>>>> Thanks!  These are great - I can see from them that the client is indeed
>>>>> waiting in the stateid update mechanism because the server has returned
>>>>> NFS4ERR_STALE to the client's first CLOSE.
>>>>>
>>>>> That is unusual.  The server is signaling that the open file's stateid is old,
>>>>> so I am interested to see if the first CLOSE is sent with the stateid's
>>>>> sequence that was returned from the server.  I could probably see this if I
>>>>> had the server-side tracepoint data.
>>>>
>>>> Hi Benjamin,
>>>>
>>>> the server-side tracepoints
>>>>
>>>> nfsd:nfsd_preprocess
>>>> sunrpc:svc_process
>>>>
>>>> were enabled. It seems they did not produce any output.
>>>>
>>>> What I did now was:
>>>> - enable all nfsd tracepoints,
>>>> - enable all nfs4 tracepoints,
>>>> - enable all sunrpc tracepoints.
>>>>
>>>> The command I used was
>>>>
>>>> touch somefile && sleep 2 && touch somefile.
>>>>
>>>> Then I unmounted the NFS share - this also causes a delay.
>>>>
>>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
>>>
>>>
>>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
>>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
>>> see:
>>>
>>> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
>>> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
>>> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
>>>
>>
>> This just means that the kernel called into the "cache" infrastructure
>> to find an export entry, and there wasn't one.
>>
>>
>> Looking back at the original email here, I'd say this is expected since
>> the export wasn't set up for krb5i.
>>
>> Output of exportfs -v:
>> /export
>> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>> /export
>> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>
> I changed the export definitions to
>
> /export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> /export gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> /export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>
> such that all three kerberos security types are exported. With this setup the delay is gone. Without the krb5i export the delay occurs again.
> Mounting the NFS share with different kerberos security types does not change this behaviour - without the krb5i export there is a delay.

After getting the ol' KDC back in shape and trying to reproduce this, I
noticed you're using a client specifier that's been deprecated (the
gss/krb5), along with sec=sys in the same line.  That's a little weird, not
sure what the results of that might be.

I can't even get any mount to work with that:

[root@fedora ~]# exportfs -v
/exports      	gss/krb5(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
/exports      	gss/krb5p(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
[root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
mount.nfs: mounting fedora.fh.bcodding.com:/exports failed, reason given by server: No such file or directory

However, if I use the "sec=" method, everything seems to work fine:

[root@fedora ~]# exportfs -v
/exports      	10.0.1.0/24(sync,wdelay,hide,no_subtree_check,nordirplus,sec=sys:krb5:krb5p,rw,insecure,no_root_squash,no_all_squash)
[root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# umount /mnt/fedora
[root@fedora ~]# mount -t nfs -ov4.1,sec=krb5p fedora.fh.bcodding.com:/exports /mnt/fedora
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# umount /mnt/fedora

.. so I can't seem to reproduce this because I can't even mount.  Did you
perhaps have a previous mount with that client with a different security
flavor?

I think your NFS client ended up having a machine credential with krb5i, but
then the export with that flavor was removed.  When the client tried to do a
CLOSE, it picked the machine cred, which caused the server to fail in the
lookup of the export, which caused the client to think its stateid was
wrong.
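
If anyone wants to confirm that failure mode on the server side, the
nfsd_exp_find_key tracepoint that already showed up in the trace above can be
watched directly. A rough sketch - this assumes tracefs is mounted at the
usual location and needs root on the server:

```shell
# Sketch: watch nfsd export-key lookups while reproducing the delayed CLOSE.
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_exp_find_key/enable
cat /sys/kernel/tracing/trace_pipe
# A line like "domain=gss/krb5i status=-2" during the CLOSE would confirm the
# lookup is happening under the krb5i auth_domain that was never exported.
```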

Ben



* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 19:43                   ` Benjamin Coddington
@ 2023-02-22 20:14                     ` Rick Macklem
  2023-02-23 11:06                       ` Benjamin Coddington
  2023-02-23  8:27                     ` Florian Möller
  1 sibling, 1 reply; 14+ messages in thread
From: Rick Macklem @ 2023-02-22 20:14 UTC (permalink / raw)
  To: Benjamin Coddington
  Cc: Florian Möller, Jeff Layton, linux-nfs, Andreas Seeg

On Wed, Feb 22, 2023 at 11:44 AM Benjamin Coddington
<bcodding@redhat.com> wrote:
>
> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to IThelp@uoguelph.ca
>
>
> On 22 Feb 2023, at 7:48, Florian Möller wrote:
>
> > Am 22.02.23 um 13:22 schrieb Jeff Layton:
> >> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
> >>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
> >>>
> >>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
> >>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
> >>>>>
> >>>>>> Hi Benjamin,
> >>>>>>
> >>>>>> here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.
> >>>>>>
> >>>>>> The command I used was
> >>>>>>
> >>>>>> touch test.txt && sleep 2 && touch test.txt
> >>>>>>
> >>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
> >>>>>
> >>>>> Thanks!  These are great - I can see from them that the client is indeed
> >>>>> waiting in the stateid update mechanism because the server has returned
> >>>>> NFS4ERR_STALE to the client's first CLOSE.
> >>>>>
> >>>>> That is unusual.  The server is signaling that the open file's stateid is old,
> >>>>> so I am interested to see if the first CLOSE is sent with the stateid's
> >>>>> sequence that was returned from the server.  I could probably see this if I
> >>>>> had the server-side tracepoint data.
> >>>>
> >>>> Hi Benjamin,
> >>>>
> >>>> the server-side tracepoints
> >>>>
> >>>> nfsd:nfsd_preprocess
> >>>> sunrpc:svc_process
> >>>>
> >>>> were enabled. It seems they did not produce any output.
> >>>>
> >>>> What I did now was:
> >>>> - enable all nfsd tracepoints,
> >>>> - enable all nfs4 tracepoints,
> >>>> - enable all sunrpc tracepoints.
> >>>>
> >>>> The command I used was
> >>>>
> >>>> touch somefile && sleep 2 && touch somefile.
> >>>>
> >>>> Then I unmounted the NFS share - this also causes a delay.
> >>>>
> >>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
> >>>
> >>>
> >>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
> >>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
> >>> see:
> >>>
> >>> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
> >>> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
> >>> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
> >>>
> >>
> >> This just means that the kernel called into the "cache" infrastructure
> >> to find an export entry, and there wasn't one.
> >>
> >>
> >> Looking back at the original email here, I'd say this is expected since
> >> the export wasn't set up for krb5i.
> >>
> >> Output of exportfs -v:
> >> /export
> >> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> >> /export
> >> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> >
> > I changed the export definitions to
> >
> > /export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> > /export gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> > /export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> >
> > such that all three kerberos security types are exported. With this setup the delay is gone. Without the krb5i export the delay occurs again.
> > Mounting the NFS share with different kerberos security types does not change this behaviour - without the krb5i export there is a delay.
>
> After getting the ol' KDC back in shape and trying to reproduce this, I
> noticed you're using a client specifier that's been deprecated (the
> gss/krb5), along with sec=sys in the same line.  That's a little weird, not
> sure what the results of that might be.
>
> I can't even get any mount to work with that:
>
> [root@fedora ~]# exportfs -v
> /exports        gss/krb5(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
> /exports        gss/krb5p(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
> mount.nfs: mounting fedora.fh.bcodding.com:/exports failed, reason given by server: No such file or directory
>
> However, if I use the "sec=" method, everything seems to work fine:
>
> [root@fedora ~]# exportfs -v
> /exports        10.0.1.0/24(sync,wdelay,hide,no_subtree_check,nordirplus,sec=sys:krb5:krb5p,rw,insecure,no_root_squash,no_all_squash)
> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# umount /mnt/fedora
> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5p fedora.fh.bcodding.com:/exports /mnt/fedora
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# umount /mnt/fedora
>
> .. so I can't seem to reproduce this because I can't even mount.  Did you
> perhaps have a previous mount with that client with a different security
> flavor?
>
> I think your NFS client ended up having a machine credential with krb5i, but
> then the export with that flavor was removed.  When the client tried to do a
> CLOSE, it picked the machine cred, which caused the server to fail in the
> lookup of the export, which caused the client to think its stateid was
> wrong.
>
Just a shot in the dark, but is the client deciding to use SP4_MACH_CRED?
If so (and the server allows it by replying NFS_OK to ExchangeID), then the
client must use krb5i or krb5p for the state maintenance operations.
Easy to check. Just look at the ExchangeID done at mount time with wireshark.
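
One way to do that check from the command line with tshark; a sketch only -
the interface name and the exact dissector field names are assumptions:

```shell
# Sketch: capture the mount-time EXCHANGE_ID (NFSv4 op 42) and dump its
# state-protection arguments to see whether SP4_MACH_CRED was negotiated.
tshark -i eth0 -f "port 2049" -Y "nfs.opcode == 42" -V | grep -i state_protect
```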

rick

> Ben
>


* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 19:43                   ` Benjamin Coddington
  2023-02-22 20:14                     ` Rick Macklem
@ 2023-02-23  8:27                     ` Florian Möller
  1 sibling, 0 replies; 14+ messages in thread
From: Florian Möller @ 2023-02-23  8:27 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: Jeff Layton, linux-nfs, Andreas Seeg

Am 22.02.23 um 20:43 schrieb Benjamin Coddington:
> On 22 Feb 2023, at 7:48, Florian Möller wrote:
> 
>> Am 22.02.23 um 13:22 schrieb Jeff Layton:
>>> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
>>>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
>>>>
>>>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
>>>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>>>>>
>>>>>>> Hi Benjamin,
>>>>>>>
>>>>>>> here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.
>>>>>>>
>>>>>>> The command I used was
>>>>>>>
>>>>>>> touch test.txt && sleep 2 && touch test.txt
>>>>>>>
>>>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
>>>>>>
>>>>>> Thanks!  These are great - I can see from them that the client is indeed
>>>>>> waiting in the stateid update mechanism because the server has returned
>>>>>> NFS4ERR_STALE to the client's first CLOSE.
>>>>>>
>>>>>> That is unusual.  The server is signaling that the open file's stateid is old,
>>>>>> so I am interested to see if the first CLOSE is sent with the stateid's
>>>>>> sequence that was returned from the server.  I could probably see this if I
>>>>>> had the server-side tracepoint data.
>>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> the server-side tracepoints
>>>>>
>>>>> nfsd:nfsd_preprocess
>>>>> sunrpc:svc_process
>>>>>
>>>>> were enabled. It seems they did not produce any output.
>>>>>
>>>>> What I did now was:
>>>>> - enable all nfsd tracepoints,
>>>>> - enable all nfs4 tracepoints,
>>>>> - enable all sunrpc tracepoints.
>>>>>
>>>>> The command I used was
>>>>>
>>>>> touch somefile && sleep 2 && touch somefile.
>>>>>
>>>>> Then I unmounted the NFS share - this also causes a delay.
>>>>>
>>>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
>>>>
>>>>
>>>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
>>>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
>>>> see:
>>>>
>>>> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
>>>> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
>>>> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
>>>>
>>>
>>> This just means that the kernel called into the "cache" infrastructure
>>> to find an export entry, and there wasn't one.
>>>
>>>
>>> Looking back at the original email here, I'd say this is expected since
>>> the export wasn't set up for krb5i.
>>>
>>> Output of exportfs -v:
>>> /export
>>> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>> /export
>>> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>
>> I changed the export definitions to
>>
>> /export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>> /export gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>> /export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>
>> such that all three kerberos security types are exported. With this setup the delay is gone. Without the krb5i export the delay occurs again.
>> Mounting the NFS share with different kerberos security types does not change this behaviour - without the krb5i export there is a delay.
> 
> After getting the ol' KDC back in shape and trying to reproduce this, I
> noticed you're using a client specifier that's been deprecated (the
> gss/krb5), along with sec=sys in the same line.  That's a little weird, not
> sure what the results of that might be.

Our original /etc/exports entries were:

/export		gss/krb5p(rw,crossmnt,async,fsid=0)
/export		gss/krb5(rw,crossmnt,async,fsid=0)

These were based on a template shipped with Ubuntu (file 
/usr/share/nfs-kernel-server/conffiles/etc.exports from the package 
nfs-kernel-server), which uses the deprecated gss/krb5 syntax.
We didn't explicitly add any sec= option.

> 
> I can't even get any mount to work with that:
> 
> [root@fedora ~]# exportfs -v
> /exports      	gss/krb5(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
> /exports      	gss/krb5p(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
> mount.nfs: mounting fedora.fh.bcodding.com:/exports failed, reason given by server: No such file or directory

For us, mounting was never really a problem; we tried several distros, each with 
several kernels. Except for the delay, everything worked fine; no errors or 
deprecation warnings were given.

> However, if I use the "sec=" method, everything seems to work fine:
> 
> [root@fedora ~]# exportfs -v
> /exports      	10.0.1.0/24(sync,wdelay,hide,no_subtree_check,nordirplus,sec=sys:krb5:krb5p,rw,insecure,no_root_squash,no_all_squash)
> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# umount /mnt/fedora
> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5p fedora.fh.bcodding.com:/exports /mnt/fedora
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# touch /mnt/fedora/krbfoo
> [root@fedora ~]# umount /mnt/fedora
> 
> .. so I can't seem to reproduce this because I can't even mount.  Did you
> perhaps have a previous mount with that client with a different security
> flavor?

I can confirm that the delay doesn't occur with

/export		*(rw,crossmnt,async,fsid=0,sec=krb5:krb5p)

exportfs -v now gives

/export 
<world>(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=krb5:krb5p,rw,secure,root_squash,no_all_squash)

> I think your NFS client ended up having a machine credential with krb5i, but
> then the export with that flavor was removed.  When the client tried to do a
> CLOSE, it picked the machine cred, which caused the server to fail in the
> lookup of the export, which caused the client to think its stateid was
> wrong.

Thank you for your efforts! Things are working now.

Perhaps someone could rewrite the "RPCSEC_GSS security" section of exports(5). 
It starts with

You may use the special strings "gss/krb5", "gss/krb5i", or "gss/krb5p" to 
restrict access to clients using rpcsec_gss security.
However, this syntax is deprecated; on linux kernels since 2.6.23, you should 
instead use the "sec=" export option

After all this hassle, the first sentence (together with the Ubuntu template) is 
a bit misleading.

Best,
Florian

-- 
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596



* Re: Reoccurring 5 second delays during NFS calls
  2023-02-22 20:14                     ` Rick Macklem
@ 2023-02-23 11:06                       ` Benjamin Coddington
  0 siblings, 0 replies; 14+ messages in thread
From: Benjamin Coddington @ 2023-02-23 11:06 UTC (permalink / raw)
  To: Rick Macklem; +Cc: Florian Möller, Jeff Layton, linux-nfs, Andreas Seeg

On 22 Feb 2023, at 15:14, Rick Macklem wrote:

> On Wed, Feb 22, 2023 at 11:44 AM Benjamin Coddington
> <bcodding@redhat.com> wrote:
>>
>>
>>
>> On 22 Feb 2023, at 7:48, Florian Möller wrote:
>>
>>> Am 22.02.23 um 13:22 schrieb Jeff Layton:
>>>> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
>>>>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
>>>>>
>>>>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
>>>>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>>>>>>
>>>>>>>> Hi Benjamin,
>>>>>>>>
>>>>>>>> here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.
>>>>>>>>
>>>>>>>> The command I used was
>>>>>>>>
>>>>>>>> touch test.txt && sleep 2 && touch test.txt
>>>>>>>>
>>>>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay.
>>>>>>>
>>>>>>> Thanks!  These are great - I can see from them that the client is indeed
>>>>>>> waiting in the stateid update mechanism because the server has returned
>>>>>>> NFS4ERR_STALE to the client's first CLOSE.
>>>>>>>
>>>>>>> That is unusual.  The server is signaling that the open file's stateid is old,
>>>>>>> so I am interested to see if the first CLOSE is sent with the stateid's
>>>>>>> sequence that was returned from the server.  I could probably see this if I
>>>>>>> had the server-side tracepoint data.
>>>>>>
>>>>>> Hi Benjamin,
>>>>>>
>>>>>> the server-side tracepoints
>>>>>>
>>>>>> nfsd:nfsd_preprocess
>>>>>> sunrpc:svc_process
>>>>>>
>>>>>> were enabled. It seems they did not produce any output.
>>>>>>
>>>>>> What I did now was:
>>>>>> - enable all nfsd tracepoints,
>>>>>> - enable all nfs4 tracepoints,
>>>>>> - enable all sunrpc tracepoints.
>>>>>>
>>>>>> The command I used was
>>>>>>
>>>>>> touch somefile && sleep 2 && touch somefile.
>>>>>>
>>>>>> Then I unmounted the NFS share - this also causes a delay.
>>>>>>
>>>>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.
>>>>>
>>>>>
>>>>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
>>>>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
>>>>> see:
>>>>>
>>>>> nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
>>>>> nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
>>>>> nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
>>>>>
>>>>
>>>> This just means that the kernel called into the "cache" infrastructure
>>>> to find an export entry, and there wasn't one.
>>>>
>>>>
>>>> Looking back at the original email here, I'd say this is expected since
>>>> the export wasn't set up for krb5i.
>>>>
>>>> Output of exportfs -v:
>>>> /export
>>>> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>>> /export
>>>> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>>
>>> I changed the export definitions to
>>>
>>> /export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>> /export gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>> /export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
>>>
>>> such that all three kerberos security types are exported. With this setup the delay is gone. Without the krb5i export the delay occurs again.
>>> Mounting the NFS share with different kerberos security types does not change this behaviour - without the krb5i export there is a delay.
>>
>> After getting the ol' KDC back in shape and trying to reproduce this, I
>> noticed you're using a client specifier that's been deprecated (the
>> gss/krb5), along with sec=sys in the same line.  That's a little weird, not
>> sure what the results of that might be.
>>
>> I can't even get any mount to work with that:
>>
>> [root@fedora ~]# exportfs -v
>> /exports        gss/krb5(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
>> /exports        gss/krb5p(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
>> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
>> mount.nfs: mounting fedora.fh.bcodding.com:/exports failed, reason given by server: No such file or directory
>>
>> However, if I use the "sec=" method, everything seems to work fine:
>>
>> [root@fedora ~]# exportfs -v
>> /exports        10.0.1.0/24(sync,wdelay,hide,no_subtree_check,nordirplus,sec=sys:krb5:krb5p,rw,insecure,no_root_squash,no_all_squash)
>> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
>> [root@fedora ~]# touch /mnt/fedora/krbfoo
>> [root@fedora ~]# touch /mnt/fedora/krbfoo
>> [root@fedora ~]# umount /mnt/fedora
>> [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5p fedora.fh.bcodding.com:/exports /mnt/fedora
>> [root@fedora ~]# touch /mnt/fedora/krbfoo
>> [root@fedora ~]# touch /mnt/fedora/krbfoo
>> [root@fedora ~]# umount /mnt/fedora
>>
>> .. so I can't seem to reproduce this because I can't even mount.  Did you
>> perhaps have a previous mount with that client with a different security
>> flavor?
>>
>> I think your NFS client ended up having a machine credential with krb5i, but
>> then the export with that flavor was removed.  When the client tried to do a
>> CLOSE, it picked the machine cred, which caused the server to fail in the
>> lookup of the export, which caused the client to think its stateid was
>> wrong.
>>
> Just a shot in the dark, but is the client deciding to use SP4_MACH_CRED?
> If so (and the server allows it by replying NFS_OK to ExchangeID), then the
> client must use krb5i or krb5p for the state maintenance operations.
> Easy to check. Just look at the ExchangeID done at mount time with wireshark.

Yep, that's what's going on.

And it seems that the gss/krb5(options) export style causes exp_find_key() to
look for an auth_domain of gss/krb5i, which doesn't exist.  There's a bug in
here somewhere.

Ben



end of thread, other threads:[~2023-02-23 11:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-07 10:58 Reoccurring 5 second delays during NFS calls Florian Möller
2023-02-07 15:21 ` Jeff Layton
2023-02-21 13:33   ` Florian Möller
2023-02-21 14:13     ` Benjamin Coddington
2023-02-21 16:52       ` Florian Möller
2023-02-21 18:58         ` Benjamin Coddington
     [not found]           ` <4f70c2f5-dfdb-c37c-8663-5f2a108e229e@mathematik.uni-wuerzburg.de>
2023-02-22 11:54             ` Benjamin Coddington
2023-02-22 12:22               ` Jeff Layton
2023-02-22 12:45                 ` Benjamin Coddington
2023-02-22 12:48                 ` Florian Möller
2023-02-22 19:43                   ` Benjamin Coddington
2023-02-22 20:14                     ` Rick Macklem
2023-02-23 11:06                       ` Benjamin Coddington
2023-02-23  8:27                     ` Florian Möller
