linux-nfs.vger.kernel.org archive mirror
* ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
       [not found] <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net>
@ 2025-11-13  5:00 ` Salvatore Bonaccorso
  2025-11-13 14:30   ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Salvatore Bonaccorso @ 2025-11-13  5:00 UTC
  To: Tyler W. Ross, 1120598, Chuck Lever, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson
  Cc: Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, linux-nfs, linux-kernel

Hi NFS folks,

Tyler W. Ross reported the following issue in Debian
(https://bugs.debian.org/1120598):

On Wed, Nov 12, 2025 at 04:41:28PM -0500, Tyler W. Ross wrote:
> Package: nfs-common
> Version: 1:2.8.4-1+b1
> Severity: important
> X-Debbugs-Cc: twr+debbugs@tylerwross.com
> 
> 
> When the session key of a kerberos ticket uses a SHA2 cipher (aes256-cts-hmac-sha384-192 and aes128-cts-hmac-sha256-128 tested), readdir requests fail.
> 
> SHA1 ciphers (aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96 tested) work as expected.
> 
> ls reports the following:
> ls: reading directory '/mnt/example/': Input/output error
> 
> stat and touch of files and directories work, and cat'ing a file works (see the later note about cat with NFSv4.1 and 4.0).
> 
> 
> 
> Example of a non-working ticket, as reported by klist -e:
> 11/12/25 18:37:30  11/13/25 17:49:03  nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET
> 	Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192 
> 
> Example of a working ticket:
> 11/12/25 19:01:46  11/13/25 18:27:33  nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET
> 	Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha384-192 
> 
> If rpcdebug is enabled for nfs and rpc modules, the following is logged to dmesg: 
> [332376.797836] NFS: nfs_weak_revalidate: inode 262146 is valid
> [332376.798512] NFS: revalidating (0:58/262146)
> [332376.799169] --> nfs41_call_sync_prepare data->seq_server 00000000e22b1bd9
> [332376.799916] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
> [332376.800764] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [332376.801507] RPC:       gss_krb5_get_mic_v2
> [332376.802009] encode_sequence: sessionid=1762048597:1479457708:22:0 seqid=27 slotid=0 max_slotid=0 cache_this=0
> [332376.803204] RPC:       gss_krb5_get_mic_v2
> [332376.803726] RPC:       xs_tcp_send_request(260) = 0
> [332376.804536] RPC:       gss_krb5_verify_mic_v2
> [332376.805093] RPC:       gss_krb5_verify_mic_v2
> [332376.805643] decode_attr_type: type=040000
> [332376.806149] decode_attr_change: change attribute=22
> [332376.806866] decode_attr_size: file size=4096
> [332376.807398] decode_attr_fsid: fsid=(0xfdcb5a40986843e0/0xa4fc6c44ad8345ad)
> [332376.808154] decode_attr_fileid: fileid=262146
> [332376.808742] decode_attr_fs_locations: fs_locations done, error = 0
> [332376.809495] decode_attr_mode: file mode=0777
> [332376.810042] decode_attr_nlink: nlink=3
> [332376.810695] decode_attr_owner: uid=591200000
> [332376.811229] decode_attr_group: gid=591200004
> [332376.811761] decode_attr_rdev: rdev=(0x0:0x0)
> [332376.812291] decode_attr_space_used: space used=4096
> [332376.812878] decode_attr_time_access: atime=1762383044
> [332376.813487] decode_attr_time_create: btime=1761952933
> [332376.814098] decode_attr_time_metadata: ctime=1762055558
> [332376.814895] decode_attr_time_modify: mtime=1762055558
> [332376.815578] decode_attr_mounted_on_fileid: fileid=262146
> [332376.816225] decode_getfattr_attrs: xdr returned 0
> [332376.816796] decode_getfattr_generic: xdr returned 0
> [332376.817374] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
> [332376.818135] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [332376.818873] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [332376.819604] nfs41_sequence_process: Error 0 free the slot 
> [332376.820228] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [332376.820930] NFS: nfs_update_inode(0:58/262146 fh_crc=0xad8c294c ct=2 info=0x4427e7f)
> [332376.821767] NFS: (0:58/262146) revalidation complete
> [332376.822342] NFS: nfs_weak_revalidate: inode 262146 is valid
> [332376.823056] NFS: permission(0:58/262146), mask=0x24, res=0
> [332376.823684] NFS: open dir(/)
> [332376.824087] NFS: readdir(/) starting at cookie 0
> [332376.824641] _nfs4_proc_readdir: dentry = /, cookie = 0
> [332376.825229] --> nfs41_call_sync_prepare data->seq_server 00000000e22b1bd9
> [332376.825967] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
> [332376.826814] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [332376.827616] RPC:       gss_krb5_get_mic_v2
> [332376.828114] encode_sequence: sessionid=1762048597:1479457708:22:0 seqid=28 slotid=0 max_slotid=0 cache_this=0
> [332376.829146] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b4a23a:00000000
> [332376.830144] RPC:       gss_krb5_get_mic_v2
> [332376.830720] RPC:       xs_tcp_send_request(284) = 0
> [332376.831431] RPC:       gss_krb5_verify_mic_v2
> [332376.831967] RPC:       gss_krb5_verify_mic_v2
> [332376.832498] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
> [332376.833254] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [332376.833994] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [332376.834695] nfs41_sequence_process: Error 0 free the slot 
> [332376.835318] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [332376.836016] _nfs4_proc_readdir: returns -5
> [332376.836519] NFS: readdir(/) returns -5
> 
> 
> 
> Environment/Supporting Systems:
> - The NFS server is a fresh Debian 13 cloud image. freeipa-client, gssproxy, nfs-kernel-server, and qemu-guest-agent have been installed. Joined to FreeIPA via ipa-client-install.
> - Kerberos is provided by a newly installed FreeIPA instance on Fedora 43.
> 
> Failing NFS client configurations:
> 1. Freshly deployed and updated Debian 13 official cloud image (debian-13-genericcloud-amd64). freeipa-client, gssproxy, nfs-common, and qemu-guest-agent have been installed. Joined to FreeIPA via ipa-client-install.
> 2. Freshly installed Debian sid via mini ISO (2025-11-01). Same configuration as 1/above.
> 3. Minimal replication config: freshly installed Debian 13 via debian-13.1.0-amd64-netinst.iso . Installed nfs-common, krb5-config and krb5-user. Manually installed keytab: no additional krb5 configuration done (realm was automatically configured from hostname by krb5-config).
> 
> Working NFS client configuration:
> - Fedora 43 installation configured via ipa-client-install .
> 
> This issue was escalated to me by someone with a matching production environment (FreeIPA on Fedora 43, and Debian 13 NFS client(s) and server). That original reporter also found that a Fedora 43 client worked as expected with SHA2.
> 
> 
> 
> Miscellaneous observations:
> - Testing was primarily conducted with NFS v4.2. Error occurs with krb5, krb5i and krb5p on 4.2. Also confirmed with krb5i on 4.1 and 4.0 (other combinations of krb5/krb5p and vers 4.1/4.0 not tested).
> - readdir failure observed when client is mounted with NFS v4.2, 4.1, and 4.0. ls reports "input/output error" and dmesg reports "readdir(/) returns -5" in all 3 versions.
> - When mounted with v4.1 and 4.0, cat'ing a file also fails with SHA2. There is no obvious (to me) error in dmesg. stat/touch of files and directories remains working.
> - The failing state is cached on the client: if a user runs ls with a SHA2 session key, then acquires a new ticket with a SHA1 session key, the "input/output error" persists unless the NFS share is remounted. Setting the noac, actimeo=0, and lookupcache=none mount options does not affect this behavior: the error persists until a remount. The error also persisted when left overnight (about 13 hours).
> - Cursory examination of a packet capture shows an apparently normal NFSv4 readdir call and reply. The reply contains the expected directory listing.
> 
> Attempted file/directory operations with SHA2 session key and sec=krb5i:
> (all are successful/OK with SHA1 session key)
> ls directory:
>     4.2: "Input/output error"
>     4.1: "Input/output error"
>     4.0: "Input/output error"
> stat file and directory:
>     4.2: OK
>     4.1: OK
>     4.0: OK
> touch file and directory:
>     4.2: OK
>     4.1: OK
>     4.0: OK
> cat file:
>     4.2: OK
>     4.1: "Input/output error"
>     4.0: "Input/output error"
> 
> 
> 
> 
> -- Package-specific info:
> -- rpcinfo --
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
> -- /etc/default/nfs-common --
> NEED_STATD=
> NEED_IDMAPD=
> NEED_GSSD=
> -- /etc/nfs.conf --
> [general]
> pipefs-directory=/run/rpc_pipefs
> [nfsrahead]
> [exports]
> [exportfs]
> [gssd]
> use-gss-proxy=1
> [lockd]
> [exportd]
> [mountd]
> manage-gids=y
> [nfsdcld]
> [nfsd]
> [statd]
> [sm-notify]
> [svcgssd]
> -- /etc/nfs.conf.d/*.conf --
> 
> -- System Information:
> Debian Release: forky/sid
>   APT prefers unstable
>   APT policy: (500, 'unstable')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 6.17.7+deb14+1-amd64 (SMP w/4 CPU threads; PREEMPT)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled

Is there anything that could help us debug this?

Regards,
Salvatore


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13  5:00 ` ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2 Salvatore Bonaccorso
@ 2025-11-13 14:30   ` Chuck Lever
  2025-11-13 17:16     ` Tyler W. Ross
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-13 14:30 UTC
  To: Salvatore Bonaccorso, Tyler W. Ross, 1120598, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson
  Cc: Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, linux-nfs, linux-kernel

On 11/13/25 12:00 AM, Salvatore Bonaccorso wrote:
> [332376.824087] NFS: readdir(/) starting at cookie 0
> [332376.824641] _nfs4_proc_readdir: dentry = /, cookie = 0
> [332376.825229] --> nfs41_call_sync_prepare data->seq_server 00000000e22b1bd9
> [332376.825967] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
> [332376.826814] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [332376.827616] RPC:       gss_krb5_get_mic_v2
> [332376.828114] encode_sequence: sessionid=1762048597:1479457708:22:0 seqid=28 slotid=0 max_slotid=0 cache_this=0
> [332376.829146] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b4a23a:00000000
> [332376.830144] RPC:       gss_krb5_get_mic_v2
> [332376.830720] RPC:       xs_tcp_send_request(284) = 0
> [332376.831431] RPC:       gss_krb5_verify_mic_v2
> [332376.831967] RPC:       gss_krb5_verify_mic_v2
> [332376.832498] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
> [332376.833254] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [332376.833994] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [332376.834695] nfs41_sequence_process: Error 0 free the slot 
> [332376.835318] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [332376.836016] _nfs4_proc_readdir: returns -5
> [332376.836519] NFS: readdir(/) returns -5

That looks like the client can't understand the server's READDIR
response.
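
For readers following along: the -5 in the log is a negated errno
value, and errno 5 is EIO, which is why ls prints "Input/output
error". A minimal sketch of that decoding follows; the helper name
describe_kernel_status is hypothetical and is used only for
illustration.

```python
import errno
import os


def describe_kernel_status(status: int) -> str:
    """Map a kernel-style return value such as -5 to 'EIO (message)'."""
    # Kernel functions report failure as a negated errno value.
    code = -status if status < 0 else status
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


print(describe_kernel_status(-5))
```

The message text comes from the local C library, so only the
symbolic name should be treated as stable across systems.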

Trace points might give a little more indication of the problem. On the
client, start the tracing command:

 # trace-cmd record -e nfs -e nfs4 -e sunrpc -e rpcgss

In another window, run your reproducer. When it's finished, ^C the
trace-cmd, then:

 # trace-cmd report | less

There are also KUnit tests for the SunRPC Kerberos module that can
confirm there isn't some kind of basic problem there.
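
The report can get long. As a rough sketch (not part of trace-cmd,
and assuming the report keeps the "rpc_request: task:<id> nfsv4
<PROC>" and "task:<id>" text layout), the events for one RPC
procedure could be picked out like this. The function name
events_for_request and the SAMPLE lines are illustrative only.

```python
import re

# Matches the "task:00000008@00000005" token that sunrpc events carry.
TASK_RE = re.compile(r"task:([0-9a-f]+@[0-9a-f]+)")


def events_for_request(report_lines, rpc_name):
    """Return the report lines that belong to tasks issuing rpc_name."""
    tasks = set()
    # First pass: find the task ids whose rpc_request event names the procedure.
    for line in report_lines:
        if "rpc_request:" in line and f" {rpc_name} " in line:
            match = TASK_RE.search(line)
            if match:
                tasks.add(match.group(1))
    # Second pass: keep every line tagged with one of those task ids.
    selected = []
    for line in report_lines:
        match = TASK_RE.search(line)
        if match and match.group(1) in tasks:
            selected.append(line)
    return selected


SAMPLE = [
    "ls-969 [003] ..... 270.326802: rpc_request: task:00000007@00000005 nfsv4 ACCESS (sync)",
    "ls-969 [003] ..... 270.326932: rpc_request: task:00000008@00000005 nfsv4 READDIR (sync)",
    "ls-969 [003] ..... 270.326933: rpc_buf_alloc: task:00000008@00000005 callsize=3932 recvsize=176 status=0",
    "kworker/u16:0-12 [001] ..... 270.327055: xs_stream_read_data: peer=[10.108.2.102]:2049 err=-11 total=992",
]

for line in events_for_request(SAMPLE, "READDIR"):
    print(line)
```

Events that carry no task id (socket-level events such as
xs_stream_read_data) are dropped by this filter, so the full report
is still worth keeping.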


-- 
Chuck Lever


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 14:30   ` Chuck Lever
@ 2025-11-13 17:16     ` Tyler W. Ross
  2025-11-13 17:47       ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-13 17:16 UTC
  To: 1120598@bugs.debian.org, Chuck Lever, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson, Salvatore Bonaccorso
  Cc: Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, linux-nfs, linux-kernel

Thanks, Chuck.

Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
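
One note for reading the report: the rpcgss_upcall_msg event lists
the offered enctypes by number (enctypes=20,19,18,17). A small
sketch that maps those numbers to the names klist -e prints,
assuming the IANA Kerberos encryption type assignments; the table
covers only the four AES enctypes discussed in this thread, and the
name decode_enctypes is hypothetical.

```python
import re

# IANA Kerberos encryption type numbers for the AES enctypes in this thread.
ENCTYPE_NAMES = {
    17: "aes128-cts-hmac-sha1-96",
    18: "aes256-cts-hmac-sha1-96",
    19: "aes128-cts-hmac-sha256-128",
    20: "aes256-cts-hmac-sha384-192",
}


def decode_enctypes(upcall_msg: str):
    """Translate the 'enctypes=...' list of an upcall message to names."""
    match = re.search(r"enctypes=([0-9,]+)", upcall_msg)
    if match is None:
        return []
    numbers = [n for n in match.group(1).split(",") if n]
    # Numbers outside the small table are reported as unknown, not guessed.
    return [ENCTYPE_NAMES.get(int(n), f"unknown({n})") for n in numbers]


for name in decode_enctypes("mech=krb5 uid=591200003 enctypes=20,19,18,17"):
    print(name)
```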

cpus=4
              ls-969   [003] .....   270.318649: nfs_getattr_enter:    fileid=00:2d:262146 fhandle=0xad8c294c version=31 cache_validity=0x0 ()
              ls-969   [003] .....   270.318651: nfs_getattr_exit:     error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x0 ()
              ls-969   [003] .....   270.318654: nfs_revalidate_inode_enter: fileid=00:2d:262146 fhandle=0xad8c294c version=31 cache_validity=0x0 ()
              ls-969   [003] .....   270.318658: rpc_task_begin:       task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=0x4 status=0 action=0x0
              ls-969   [003] .....   270.318658: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_prepare_task
              ls-969   [003] .....   270.318660: nfs4_setup_sequence:  session=0x5988ad3c slot_nr=0 seq_nr=24 highest_used_slotid=0
              ls-969   [003] .....   270.318661: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_start
              ls-969   [003] .....   270.318661: rpc_request:          task:00000006@00000005 nfsv4 GETATTR (sync)
              ls-969   [003] .....   270.318662: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_reserve
              ls-969   [003] .....   270.318663: xprt_reserve:         task:00000006@00000005 xid=0x79569c7a
              ls-969   [003] .....   270.318663: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_reserveresult
              ls-969   [003] .....   270.318663: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_refresh
        rpc.gssd-613   [001] .....   270.318690: rpcgss_upcall_msg:    msg='mech=krb5 uid=591200003 enctypes=20,19,18,17'
        rpc.gssd-970   [002] .....   270.326582: rpcgss_context:       win_size=128 expiry=4316009978 now=4294959728 timeout=84201 acceptor=nfs@nfssrv.ipa.twrlab.net
              ls-969   [003] ...1.   270.326598: rpcgss_ctx_init:      cred=0xffff8895c5989900 service=integrity principal='(null)'
              ls-969   [003] .....   270.326600: rpcgss_upcall_result: for uid 591200003, result=0
              ls-969   [003] .....   270.326601: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_refreshresult
              ls-969   [003] .....   270.326601: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_allocate
              ls-969   [003] .....   270.326603: rpc_buf_alloc:        task:00000006@00000005 callsize=1844 recvsize=2704 status=0
              ls-969   [003] .....   270.326603: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_encode
              ls-969   [003] .....   270.326604: rpcgss_seqno:         task:00000006@00000005 xid=0x79569c7a seqno=1
              ls-969   [003] .....   270.326611: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x1c status=0 action=call_transmit
              ls-969   [003] ...1.   270.326611: xprt_reserve_xprt:    task:00000006@00000005 snd_task:00000006
              ls-969   [003] .....   270.326612: rpcgss_need_reencode: task:00000006@00000005 xid=0x79569c7a rq_seqno=1 seq_xmit=0 reencode unneeded
              ls-969   [003] .....   270.326612: rpc_xdr_sendto:       task:00000006@00000005 head=[0xffff8895c29fe008,260] page=0(0) tail=[(nil),0] len=260
              ls-969   [003] .....   270.326627: xprt_transmit:        task:00000006@00000005 xid=0x79569c7a seqno=1 status=0
              ls-969   [003] ...1.   270.326628: xprt_release_xprt:    task:00000006@00000005 snd_task:ffffffff
              ls-969   [003] .....   270.326629: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x14 status=0 action=call_transmit_status
              ls-969   [003] ...2.   270.326629: rpc_task_sleep:       task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x14 status=0 timeout=0 queue=xprt_pending
              ls-969   [003] .....   270.326630: rpc_task_sync_sleep:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x16 status=0 action=call_status
          <idle>-0     [001] ..s2.   270.326754: xs_data_ready:        peer=[10.108.2.102]:2049
   kworker/u16:0-12    [001] ...1.   270.326762: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x79569c7a status=0
   kworker/u16:0-12    [001] ...2.   270.326764: rpc_task_wakeup:      task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
   kworker/u16:0-12    [001] .....   270.326768: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x79569c7a copied=384 reclen=384 offset=384
   kworker/u16:0-12    [001] .....   270.326769: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=388
              ls-969   [003] .....   270.326775: rpc_task_sync_wake:   task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
              ls-969   [003] .....   270.326775: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
              ls-969   [003] .....   270.326775: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
              ls-969   [003] .....   270.326775: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
              ls-969   [003] .....   270.326776: rpc_xdr_recvfrom:     task:00000006@00000005 head=[0xffff8895c29fe73c,2704] page=0(0) tail=[(nil),0] len=384
              ls-969   [003] .....   270.326785: nfs4_map_name_to_uid: error=0 (OK) id=591200000 name=admin@ipa.twrlab.net
              ls-969   [003] .....   270.326786: nfs4_map_group_to_gid: error=0 (OK) id=591200004 name=domainusers@ipa.twrlab.net
              ls-969   [003] .....   270.326787: rpc_task_run_action:  task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
              ls-969   [003] .....   270.326787: rpc_task_end:         task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
              ls-969   [003] .....   270.326788: rpc_stats_latency:    task:00000006@00000005 xid=0x79569c7a nfsv4 GETATTR backlog=7956 rtt=149 execute=8131 xprt_id=1
              ls-969   [003] .....   270.326789: rpc_task_call_done:   task:00000006@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=nfs41_call_sync_done
              ls-969   [003] .....   270.326789: nfs4_sequence_done:   error=0 (OK) session=0x5988ad3c slot_nr=0 seq_nr=24 highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
              ls-969   [003] ...1.   270.326791: xprt_release_xprt:    task:00000006@00000005 snd_task:ffffffff
              ls-969   [003] .....   270.326793: nfs4_getattr:         error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c valid=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|FSID|FILEID|ATIME|MTIME|CTIME|CHANGE|BTIME|0x400200
              ls-969   [003] ...1.   270.326795: nfs_refresh_inode_enter: fileid=00:2d:262146 fhandle=0xad8c294c version=31 cache_validity=0x0 ()
              ls-969   [003] ...1.   270.326797: nfs_set_cache_invalid: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x0 ()
              ls-969   [003] ...1.   270.326797: nfs_refresh_inode_exit: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x0 ()
              ls-969   [003] .....   270.326798: nfs_revalidate_inode_exit: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x0 ()
              ls-969   [003] .....   270.326799: nfs_access_enter:     fileid=00:2d:262146 fhandle=0xad8c294c version=31 cache_validity=0x0 ()
              ls-969   [003] .....   270.326801: rpc_task_begin:       task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=0x4 status=0 action=0x0
              ls-969   [003] .....   270.326801: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_prepare_task
              ls-969   [003] .....   270.326801: nfs4_setup_sequence:  session=0x5988ad3c slot_nr=0 seq_nr=25 highest_used_slotid=0
              ls-969   [003] .....   270.326802: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_start
              ls-969   [003] .....   270.326802: rpc_request:          task:00000007@00000005 nfsv4 ACCESS (sync)
              ls-969   [003] .....   270.326802: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_reserve
              ls-969   [003] .....   270.326803: xprt_reserve:         task:00000007@00000005 xid=0x7a569c7a
              ls-969   [003] .....   270.326803: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_reserveresult
              ls-969   [003] .....   270.326803: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_refresh
              ls-969   [003] .....   270.326804: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_refreshresult
              ls-969   [003] .....   270.326804: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_allocate
              ls-969   [003] .....   270.326804: rpc_buf_alloc:        task:00000007@00000005 callsize=1836 recvsize=2712 status=0
              ls-969   [003] .....   270.326804: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_encode
              ls-969   [003] .....   270.326805: rpcgss_seqno:         task:00000007@00000005 xid=0x7a569c7a seqno=2
              ls-969   [003] .....   270.326807: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x1c status=0 action=call_transmit
              ls-969   [003] ...1.   270.326807: xprt_reserve_xprt:    task:00000007@00000005 snd_task:00000007
              ls-969   [003] .....   270.326808: rpcgss_need_reencode: task:00000007@00000005 xid=0x7a569c7a rq_seqno=2 seq_xmit=1 reencode unneeded
              ls-969   [003] .....   270.326808: rpc_xdr_sendto:       task:00000007@00000005 head=[0xffff8895c29fe008,268] page=0(0) tail=[(nil),0] len=268
              ls-969   [003] .....   270.326816: xprt_transmit:        task:00000007@00000005 xid=0x7a569c7a seqno=2 status=0
              ls-969   [003] ...1.   270.326817: xprt_release_xprt:    task:00000007@00000005 snd_task:ffffffff
              ls-969   [003] .....   270.326817: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x14 status=0 action=call_transmit_status
              ls-969   [003] ...2.   270.326817: rpc_task_sleep:       task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x14 status=0 timeout=0 queue=xprt_pending
              ls-969   [003] .....   270.326817: rpc_task_sync_sleep:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x16 status=0 action=call_status
          <idle>-0     [001] ..s2.   270.326882: xs_data_ready:        peer=[10.108.2.102]:2049
   kworker/u16:0-12    [001] ...1.   270.326885: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x7a569c7a status=0
   kworker/u16:0-12    [001] ...2.   270.326885: rpc_task_wakeup:      task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
   kworker/u16:0-12    [001] .....   270.326888: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7a569c7a copied=260 reclen=260 offset=260
   kworker/u16:0-12    [001] .....   270.326888: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=264
              ls-969   [003] .....   270.326895: rpc_task_sync_wake:   task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
              ls-969   [003] .....   270.326895: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
              ls-969   [003] .....   270.326895: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
              ls-969   [003] .....   270.326895: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
              ls-969   [003] .....   270.326895: rpc_xdr_recvfrom:     task:00000007@00000005 head=[0xffff8895c29fe734,2712] page=0(0) tail=[(nil),0] len=260
              ls-969   [003] .....   270.326898: rpc_task_run_action:  task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
              ls-969   [003] .....   270.326898: rpc_task_end:         task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
              ls-969   [003] .....   270.326899: rpc_stats_latency:    task:00000007@00000005 xid=0x7a569c7a nfsv4 ACCESS backlog=7 rtt=76 execute=98 xprt_id=1
              ls-969   [003] .....   270.326899: rpc_task_call_done:   task:00000007@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=nfs41_call_sync_done
              ls-969   [003] .....   270.326899: nfs4_sequence_done:   error=0 (OK) session=0x5988ad3c slot_nr=0 seq_nr=25 highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
              ls-969   [003] ...1.   270.326900: xprt_release_xprt:    task:00000007@00000005 snd_task:ffffffff
              ls-969   [003] ...1.   270.326901: nfs_refresh_inode_enter: fileid=00:2d:262146 fhandle=0xad8c294c version=31 cache_validity=0x0 ()
              ls-969   [003] ...1.   270.326901: nfs_set_cache_invalid: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x0 ()
              ls-969   [003] ...1.   270.326901: nfs_refresh_inode_exit: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x0 ()
              ls-969   [003] .....   270.326902: nfs4_access:          error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c
              ls-969   [003] .....   270.326903: nfs_access_exit:      error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET) mask=0x24 permitted=0x7
              ls-969   [003] .....   270.326907: nfs_getattr_enter:    fileid=00:2d:262146 fhandle=0xad8c294c version=31 cache_validity=0x0 ()
              ls-969   [003] .....   270.326908: nfs_getattr_exit:     error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x0 () nfs_flags=0x4 (ACL_LRU_SET)
              ls-969   [003] .....   270.326928: nfs_readdir_cache_fill: fileid=00:2d:262146 fhandle=0xad8c294c version=31 cookie=0000000000000000:0x0 cache_index=0 dtsize=4096
              ls-969   [003] .....   270.326931: rpc_task_begin:       task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=0x4 status=0 action=0x0
              ls-969   [003] .....   270.326931: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_prepare_task
              ls-969   [003] .....   270.326931: nfs4_setup_sequence:  session=0x5988ad3c slot_nr=0 seq_nr=26 highest_used_slotid=0
              ls-969   [003] .....   270.326931: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_start
              ls-969   [003] .....   270.326932: rpc_request:          task:00000008@00000005 nfsv4 READDIR (sync)
              ls-969   [003] .....   270.326932: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_reserve
              ls-969   [003] .....   270.326932: xprt_reserve:         task:00000008@00000005 xid=0x7b569c7a
              ls-969   [003] .....   270.326932: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_reserveresult
              ls-969   [003] .....   270.326932: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_refresh
              ls-969   [003] .....   270.326933: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_refreshresult
              ls-969   [003] .....   270.326933: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_allocate
              ls-969   [003] .....   270.326933: rpc_buf_alloc:        task:00000008@00000005 callsize=3932 recvsize=176 status=0
              ls-969   [003] .....   270.326933: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_encode
              ls-969   [003] .....   270.326934: rpcgss_seqno:         task:00000008@00000005 xid=0x7b569c7a seqno=3
              ls-969   [003] .....   270.326936: rpc_xdr_reply_pages:  task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=0
              ls-969   [003] .....   270.326937: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|NORTO|CRED_NOREF runstate=RUNNING|0x1c status=0 action=call_transmit
              ls-969   [003] ...1.   270.326937: xprt_reserve_xprt:    task:00000008@00000005 snd_task:00000008
              ls-969   [003] .....   270.326937: rpcgss_need_reencode: task:00000008@00000005 xid=0x7b569c7a rq_seqno=3 seq_xmit=2 reencode unneeded
              ls-969   [003] .....   270.326938: rpc_xdr_sendto:       task:00000008@00000005 head=[0xffff8895c29fe008,284] page=0(0) tail=[(nil),0] len=284
              ls-969   [003] .....   270.326946: xprt_transmit:        task:00000008@00000005 xid=0x7b569c7a seqno=3 status=0
              ls-969   [003] ...1.   270.326947: xprt_release_xprt:    task:00000008@00000005 snd_task:ffffffff
              ls-969   [003] .....   270.326947: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x14 status=0 action=call_transmit_status
              ls-969   [003] ...2.   270.326947: rpc_task_sleep:       task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x14 status=0 timeout=0 queue=xprt_pending
              ls-969   [003] .....   270.326947: rpc_task_sync_sleep:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x16 status=0 action=call_status
          <idle>-0     [001] ..s2.   270.327040: xs_data_ready:        peer=[10.108.2.102]:2049
   kworker/u16:0-12    [001] ...1.   270.327048: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
   kworker/u16:0-12    [001] ...2.   270.327050: rpc_task_wakeup:      task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
   kworker/u16:0-12    [001] .....   270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
   kworker/u16:0-12    [001] .....   270.327055: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=992
              ls-969   [003] .....   270.327062: rpc_task_sync_wake:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
              ls-969   [003] .....   270.327062: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
              ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
              ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
              ls-969   [003] .....   270.327063: rpc_xdr_recvfrom:     task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
              ls-969   [003] .....   270.327067: rpc_xdr_overflow:     task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
              ls-969   [003] .....   270.327068: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=-5 action=rpc_exit_task
              ls-969   [003] .....   270.327068: rpc_task_end:         task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=-5 action=rpc_exit_task
              ls-969   [003] .....   270.327068: rpc_stats_latency:    task:00000008@00000005 xid=0x7b569c7a nfsv4 READDIR backlog=7 rtt=110 execute=137 xprt_id=1
              ls-969   [003] .....   270.327068: rpc_task_call_done:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=-5 action=nfs41_call_sync_done
              ls-969   [003] .....   270.327068: nfs4_sequence_done:   error=0 (OK) session=0x5988ad3c slot_nr=0 seq_nr=26 highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
              ls-969   [003] ...1.   270.327069: xprt_release_xprt:    task:00000008@00000005 snd_task:ffffffff
              ls-969   [003] ...1.   270.327070: nfs_set_cache_invalid: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x4 (INVALID_ATIME) nfs_flags=0x4 (ACL_LRU_SET)
              ls-969   [003] .....   270.327070: nfs4_readdir:         error=-5 (EIO) fileid=00:2d:262146 fhandle=0xad8c294c
              ls-969   [003] .....   270.327071: nfs_readdir_cache_fill_done: error=-5 (IO) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x4 (INVALID_ATIME) nfs_flags=0x4 (ACL_LRU_SET)
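
A report this long is easier to scan mechanically. The following is a minimal sketch, assuming the trace-cmd report layout shown above ("comm-pid [cpu] flags timestamp: event: details"). It finds the first event that carries a negative status= value and also returns the event logged just before it. The names first_failure, LINE_RE and STATUS_RE are made up for this illustration and are not part of trace-cmd.

```python
import re

# Layout assumed from the report above:
#   <comm>-<pid> [cpu] <flags> <timestamp>: <event>: <details>
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<details>.*)$"
)
# \b keeps this from matching inside fields such as "status_flags=".
STATUS_RE = re.compile(r"\bstatus=(-?\d+)")


def first_failure(lines):
    """Return (timestamp, event, status, previous_event) for the first
    event with a negative status, or None if there is none."""
    previous_event = None
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that is not a trace line
        s = STATUS_RE.search(m.group("details"))
        if s is not None and int(s.group(1)) < 0:
            return (m.group("ts"), m.group("event"),
                    int(s.group(1)), previous_event)
        previous_event = m.group("event")
    return None
```

Fed the report above, this should point at the rpc_exit_task action with status=-5, with rpc_xdr_overflow as the event immediately before it.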



TWR



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 17:16     ` Tyler W. Ross
@ 2025-11-13 17:47       ` Chuck Lever
  2025-11-13 18:05         ` Tyler W. Ross
  2025-11-13 21:21         ` Salvatore Bonaccorso
  0 siblings, 2 replies; 31+ messages in thread
From: Chuck Lever @ 2025-11-13 17:47 UTC (permalink / raw)
  To: Tyler W. Ross, 1120598@bugs.debian.org, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson, Salvatore Bonaccorso
  Cc: Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, linux-nfs, linux-kernel

On 11/13/25 12:16 PM, Tyler W. Ross wrote:
> Thanks, Chuck.
> 
> Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
> 
>           <idle>-0     [001] ..s2.   270.327040: xs_data_ready:        peer=[10.108.2.102]:2049
>    kworker/u16:0-12    [001] ...1.   270.327048: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
>    kworker/u16:0-12    [001] ...2.   270.327050: rpc_task_wakeup:      task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
>    kworker/u16:0-12    [001] .....   270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
>    kworker/u16:0-12    [001] .....   270.327055: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=992
>               ls-969   [003] .....   270.327062: rpc_task_sync_wake:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
>               ls-969   [003] .....   270.327062: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
>               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
>               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
>               ls-969   [003] .....   270.327063: rpc_xdr_recvfrom:     task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
>               ls-969   [003] .....   270.327067: rpc_xdr_overflow:     task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988

Here's the problem. This is a sign of an XDR decoding issue. If you
capture the traffic with Wireshark, does Wireshark indicate where the
XDR is malformed?

If it doesn't, then there is some problem with the client code. Since
Fedora 43 is working as expected, I would guess there's a misapplied
patch on Debian 13's kernel...?
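
The numbers in the rpc_xdr_overflow line can be checked by hand or with a few lines of Python. This is only a sketch, under the assumption that p is the decoder's current position, end is the end of the buffer segment it is reading from, and the first bracket after xdr= is the head segment's base address and length. The helper name explain_overflow and the dictionary keys are invented for illustration.

```python
import re

# The overflow line copied from the trace quoted above.
TRACE_LINE = (
    "rpc_xdr_overflow:     task:00000008@00000005 nfsv4 READDIR "
    "requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 "
    "xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988"
)

OVERFLOW_RE = re.compile(
    r"requested=(?P<req>\d+)\s+p=(?P<p>0x[0-9a-f]+)\s+"
    r"end=(?P<end>0x[0-9a-f]+)\s+"
    r"xdr=\[(?P<head>0x[0-9a-f]+),(?P<head_len>\d+)\]"
)


def explain_overflow(line):
    """Work out how many bytes were left when the decode request failed."""
    m = OVERFLOW_RE.search(line)
    if m is None:
        raise ValueError("not an rpc_xdr_overflow line")
    requested = int(m.group("req"))
    p = int(m.group("p"), 16)
    end = int(m.group("end"), 16)
    head = int(m.group("head"), 16)
    head_len = int(m.group("head_len"))
    remaining = end - p
    return {
        "requested": requested,
        "remaining": remaining,
        "shortfall": requested - remaining,
        "offset_in_head": p - head,
        # True when the segment being read ends exactly where the head ends.
        "end_is_head_end": end == head + head_len,
    }


print(explain_overflow(TRACE_LINE))
```

With the line above, the decoder is 136 bytes into a 140-byte head and asks for 8 bytes when only 4 remain, so the request runs past the end of the head by 4 bytes.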


>               ls-969   [003] .....   270.327068: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=-5 action=rpc_exit_task
>               ls-969   [003] .....   270.327068: rpc_task_end:         task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=-5 action=rpc_exit_task
>               ls-969   [003] .....   270.327068: rpc_stats_latency:    task:00000008@00000005 xid=0x7b569c7a nfsv4 READDIR backlog=7 rtt=110 execute=137 xprt_id=1
>               ls-969   [003] .....   270.327068: rpc_task_call_done:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=-5 action=nfs41_call_sync_done
>               ls-969   [003] .....   270.327068: nfs4_sequence_done:   error=0 (OK) session=0x5988ad3c slot_nr=0 seq_nr=26 highest_slotid=63 target_highest_slotid=63 status_flags=0x0 ()
>               ls-969   [003] ...1.   270.327069: xprt_release_xprt:    task:00000008@00000005 snd_task:ffffffff
>               ls-969   [003] ...1.   270.327070: nfs_set_cache_invalid: error=0 (OK) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x4 (INVALID_ATIME) nfs_flags=0x4 (ACL_LRU_SET)
>               ls-969   [003] .....   270.327070: nfs4_readdir:         error=-5 (EIO) fileid=00:2d:262146 fhandle=0xad8c294c
>               ls-969   [003] .....   270.327071: nfs_readdir_cache_fill_done: error=-5 (IO) fileid=00:2d:262146 fhandle=0xad8c294c type=4 (DIR) version=31 size=4096 cache_validity=0x4 (INVALID_ATIME) nfs_flags=0x4 (ACL_LRU_SET)


-- 
Chuck Lever


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 17:47       ` Chuck Lever
@ 2025-11-13 18:05         ` Tyler W. Ross
  2025-11-13 18:12           ` Chuck Lever
  2025-11-13 21:21         ` Salvatore Bonaccorso
  1 sibling, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-13 18:05 UTC (permalink / raw)
  To: Chuck Lever
  Cc: 1120598@bugs.debian.org, Jeff Layton, NeilBrown, Scott Mayhew,
	Steve Dickson, Salvatore Bonaccorso, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On Thursday, November 13th, 2025 at 10:47 AM, Chuck Lever <chuck.lever@oracle.com> wrote:

> > ls-969 [003] ..... 270.327063: rpc_xdr_recvfrom: task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
> > ls-969 [003] ..... 270.327067: rpc_xdr_overflow: task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
> 
> 
> Here's the problem. This is a sign of an XDR decoding issue. If you
> capture the traffic with Wireshark, does Wireshark indicate where the
> XDR is malformed?

Wireshark appears to decode the READDIR reply without issue. Nothing is obviously marked as malformed, and values all appear sane when spot-checking fields in the decoded packet.


TWR


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 18:05         ` Tyler W. Ross
@ 2025-11-13 18:12           ` Chuck Lever
  2025-11-13 18:51             ` Tyler W. Ross
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-13 18:12 UTC (permalink / raw)
  To: Tyler W. Ross
  Cc: 1120598@bugs.debian.org, Jeff Layton, NeilBrown, Scott Mayhew,
	Steve Dickson, Salvatore Bonaccorso, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On 11/13/25 1:05 PM, Tyler W. Ross wrote:
> On Thursday, November 13th, 2025 at 10:47 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>>> ls-969 [003] ..... 270.327063: rpc_xdr_recvfrom: task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
>>> ls-969 [003] ..... 270.327067: rpc_xdr_overflow: task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
>>
>>
>> Here's the problem. This is a sign of an XDR decoding issue. If you
>> capture the traffic with Wireshark, does Wireshark indicate where the
>> XDR is malformed?
> 
> Wireshark appears to decode the READDIR reply without issue. Nothing is obviously marked as malformed, and values all appear sane when spot-checking fields in the decoded packet.

Then I would start looking for differences between the Debian 13 and
Fedora 43 kernel code base under net/sunrpc/ .

Alternatively, "git bisect first, ask questions later" ... :-)

So I didn't find an indication of whether this was sec=krb5, sec=krb5i,
or sec=krb5p. That might narrow down where the code changed.

Also, the xdr_buf might have a page boundary positioned in the middle of
an XDR data item. Knowing which data item is being decoded where the
"overflow" occurs might be helpful (I think adding pr_info() call sites
or trace_printk() will be adequate to gain some better observability).
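
To illustrate what "a boundary in the middle of an XDR data item" means, here is a toy model, not the kernel's implementation. It treats the receive buffer as three consecutive segments (head, pages, tail) and reports which segments a data item of a given size would touch. The segment lengths 140/4008/36 are taken from the trace; reading "page=4008(88)" as a page length of 4008 is an assumption, and the function name segments_touched is hypothetical.

```python
def segments_touched(seg_lens, offset, nbytes):
    """Return the names of the segments overlapped by the byte range
    [offset, offset + nbytes) in a head/pages/tail layout.

    seg_lens is a (head_len, page_len, tail_len) tuple of byte counts.
    """
    names = ("head", "pages", "tail")
    touched = []
    start = 0
    for name, length in zip(names, seg_lens):
        seg_end = start + length
        # Standard half-open interval overlap test; empty segments never match.
        if length > 0 and offset < seg_end and offset + nbytes > start:
            touched.append(name)
        start = seg_end
    return touched


# An 8-byte item starting 136 bytes into a 140-byte head straddles
# the head/pages boundary; the same item at offset 128 does not.
print(segments_touched((140, 4008, 36), 136, 8))
print(segments_touched((140, 4008, 36), 128, 8))
```

An item that touches two segments cannot be handed back as one contiguous pointer into a single segment, which is why knowing the item being decoded at the overflow is useful.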


-- 
Chuck Lever


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 18:12           ` Chuck Lever
@ 2025-11-13 18:51             ` Tyler W. Ross
  2025-11-13 18:57               ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-13 18:51 UTC (permalink / raw)
  To: Chuck Lever
  Cc: 1120598@bugs.debian.org, Jeff Layton, NeilBrown, Scott Mayhew,
	Steve Dickson, Salvatore Bonaccorso, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On Thursday, November 13th, 2025 at 11:12 AM, Chuck Lever <chuck.lever@oracle.com> wrote:

> Then I would start looking for differences between the Debian 13 and
> Fedora 43 kernel code base under net/sunrpc/ .
> 
> Alternatively, "git bisect first, ask questions later" ... :-)

This is outside my day-to-day, so I don't have a workflow for this kind of
testing/debugging, but I'll see what I can do.

Thanks for the starting place.

> So I didn't find an indication of whether this was sec=krb5, sec=krb5i,
> or sec=krb5p. That might narrow down where the code changed.

I confirmed the issue with all 3 krb5 sec modes, in both the 6.12 kernel
that ships with Debian 13 and the 6.17 that currently ships with Debian
Sid/unstable. Similarly, I confirmed NFSv4.2, 4.1 and 4.0 are impacted.

> Also, the xdr_buf might have a page boundary positioned in the middle of
> an XDR data item. Knowing which data item is being decoded where the
> "overflow" occurs might be helpful (I think adding pr_info() call sites
> or trace_printk() will be adequate to gain some better observability).

No experience with kernel hacking, so I'm not confident I can locate
meaningful places to insert those.

I'll see where some snooping and a bisect gets me. Failing that, if
anyone has recommendations on where to add those calls, I'd appreciate
the guidance.


TWR


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 18:51             ` Tyler W. Ross
@ 2025-11-13 18:57               ` Chuck Lever
  0 siblings, 0 replies; 31+ messages in thread
From: Chuck Lever @ 2025-11-13 18:57 UTC (permalink / raw)
  To: Tyler W. Ross
  Cc: 1120598@bugs.debian.org, Jeff Layton, NeilBrown, Scott Mayhew,
	Steve Dickson, Salvatore Bonaccorso, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On 11/13/25 1:51 PM, Tyler W. Ross wrote:
> On Thursday, November 13th, 2025 at 11:12 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> Then I would start looking for differences between the Debian 13 and
>> Fedora 43 kernel code base under net/sunrpc/ .
>>
>> Alternatively, "git bisect first, ask questions later" ... :-)
> 
> This is outside my day-to-day, so I don't have a workflow for this kind of
> testing/debugging, but I'll see what I can do.
> 
> Thanks for the starting place.
> 
>> So I didn't find an indication of whether this was sec=krb5, sec=krb5i,
>> or sec=krb5p. That might narrow down where the code changed.
> 
> I confirmed the issue with all 3 krb5 sec modes, in both the 6.12 kernel
> that ships with Debian 13 and the 6.17 that currently ships with Debian
> Sid/unstable. Similarly, I confirmed NFSv4.2, 4.1 and 4.0 are impacted.
> 
>> Also, the xdr_buf might have a page boundary positioned in the middle of
>> an XDR data item. Knowing which data item is being decoded where the
>> "overflow" occurs might be helpful (I think adding pr_info() call sites
>> or trace_printk() will be adequate to gain some better observability).
> 
> No experience with kernel hacking, so I'm not confident I can locate
> meaningful places to insert those.
> 
> I'll see where some snooping and a bisect gets me. Failing that, if
> anyone has recommendations on where to add those calls, I'd appreciate
> the guidance.

xdr_inline_decode(). Easiest approach (but somewhat noisy) would be to
add a WARN_ON just after each of the trace_rpc_xdr_overflow() call
sites. The stack trace on the failing decode will be dumped into the
system journal.


-- 
Chuck Lever


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 17:47       ` Chuck Lever
  2025-11-13 18:05         ` Tyler W. Ross
@ 2025-11-13 21:21         ` Salvatore Bonaccorso
  2025-11-13 21:23           ` Chuck Lever
  1 sibling, 1 reply; 31+ messages in thread
From: Salvatore Bonaccorso @ 2025-11-13 21:21 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Tyler W. Ross, 1120598@bugs.debian.org, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson, Salvatore Bonaccorso,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, linux-nfs, linux-kernel

Hi Chuck,

On Thu, Nov 13, 2025 at 12:47:23PM -0500, Chuck Lever wrote:
> On 11/13/25 12:16 PM, Tyler W. Ross wrote:
> > Thanks, Chuck.
> > 
> > Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
> > 
> >           <idle>-0     [001] ..s2.   270.327040: xs_data_ready:        peer=[10.108.2.102]:2049
> >    kworker/u16:0-12    [001] ...1.   270.327048: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
> >    kworker/u16:0-12    [001] ...2.   270.327050: rpc_task_wakeup:      task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
> >    kworker/u16:0-12    [001] .....   270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
> >    kworker/u16:0-12    [001] .....   270.327055: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=992
> >               ls-969   [003] .....   270.327062: rpc_task_sync_wake:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
> >               ls-969   [003] .....   270.327062: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
> >               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
> >               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
> >               ls-969   [003] .....   270.327063: rpc_xdr_recvfrom:     task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
> >               ls-969   [003] .....   270.327067: rpc_xdr_overflow:     task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
> 
> Here's the problem. This is a sign of an XDR decoding issue. If you
> capture the traffic with Wireshark, does Wireshark indicate where the
> XDR is malformed?
> 
> If it doesn't, then there is some problem with the client code. Since
> Fedora 43 is working as expected, I would guess there's a misapplied
> patch on Debian 13's kernel...?

If it is helpful: Debian follows the stable upstream releases (6.12.y
for trixie/Debian 13, right now 6.17.y for Debian unstable) and we try
to keep the patches limited which we apply on top. So far I see none
which touches net/sunrpc/. The patches applied:
https://salsa.debian.org/kernel-team/linux/-/tree/debian/6.17/forky/debian/patches?ref_type=heads
(in case this could help narrowing down more the issue).

But we could try here additionally, if Tyler has the possibility to do
so, to try directly the 6.17.7 upstream version without Debian patches
applied.

Regards,
Salvatore


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 21:21         ` Salvatore Bonaccorso
@ 2025-11-13 21:23           ` Chuck Lever
  2025-11-13 22:20             ` Salvatore Bonaccorso
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-13 21:23 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Tyler W. Ross, 1120598@bugs.debian.org, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On 11/13/25 4:21 PM, Salvatore Bonaccorso wrote:
> Hi Chuck,
> 
> On Thu, Nov 13, 2025 at 12:47:23PM -0500, Chuck Lever wrote:
>> On 11/13/25 12:16 PM, Tyler W. Ross wrote:
>>> Thanks, Chuck.
>>>
>>> Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
>>>
>>>           <idle>-0     [001] ..s2.   270.327040: xs_data_ready:        peer=[10.108.2.102]:2049
>>>    kworker/u16:0-12    [001] ...1.   270.327048: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
>>>    kworker/u16:0-12    [001] ...2.   270.327050: rpc_task_wakeup:      task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
>>>    kworker/u16:0-12    [001] .....   270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
>>>    kworker/u16:0-12    [001] .....   270.327055: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=992
>>>               ls-969   [003] .....   270.327062: rpc_task_sync_wake:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
>>>               ls-969   [003] .....   270.327062: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
>>>               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
>>>               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
>>>               ls-969   [003] .....   270.327063: rpc_xdr_recvfrom:     task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
>>>               ls-969   [003] .....   270.327067: rpc_xdr_overflow:     task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
>>
>> Here's the problem. This is a sign of an XDR decoding issue. If you
>> capture the traffic with Wireshark, does Wireshark indicate where the
>> XDR is malformed?
>>
>> If it doesn't, then there is some problem with the client code. Since
>> Fedora 43 is working as expected, I would guess there's a misapplied
>> patch on Debian 13's kernel...?
> 
> if it is helpful: Debian follows the stable upstream releases (6.12.y
> for trixie/Debian 13, right now 6.17.y for Debian unstable) and we try
> to keep the patches limited which we apply on top. So far I see none
> which touches net/sunrpc/. The patches applied:
> https://salsa.debian.org/kernel-team/linux/-/tree/debian/6.17/forky/debian/patches?ref_type=heads
> (in case this could help narrowing down more the issue).
> 
> But we could try here additionally, if Tyler has the possibility to do
> so, to try directly the 6.17.7 upstream version without Debian patches
> applied.

A bisect between broken v6.12.y and working v6.17.7 could identify
what is possibly missing from v6.12.y.


-- 
Chuck Lever


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 21:23           ` Chuck Lever
@ 2025-11-13 22:20             ` Salvatore Bonaccorso
  2025-11-13 22:30               ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Salvatore Bonaccorso @ 2025-11-13 22:20 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Tyler W. Ross, 1120598@bugs.debian.org, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

Hi Chuck,

On Thu, Nov 13, 2025 at 04:23:52PM -0500, Chuck Lever wrote:
> On 11/13/25 4:21 PM, Salvatore Bonaccorso wrote:
> > Hi Chuck,
> > 
> > On Thu, Nov 13, 2025 at 12:47:23PM -0500, Chuck Lever wrote:
> >> On 11/13/25 12:16 PM, Tyler W. Ross wrote:
> >>> Thanks, Chuck.
> >>>
> >>> Suggested trace-cmd report from the client follows. Last 3 lines appear salient, but I've included the full report just in case.
> >>>
> >>>           <idle>-0     [001] ..s2.   270.327040: xs_data_ready:        peer=[10.108.2.102]:2049
> >>>    kworker/u16:0-12    [001] ...1.   270.327048: xprt_lookup_rqst:     peer=[10.108.2.102]:2049 xid=0x7b569c7a status=0
> >>>    kworker/u16:0-12    [001] ...2.   270.327050: rpc_task_wakeup:      task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=0x6 status=0 timeout=15000 queue=xprt_pending
> >>>    kworker/u16:0-12    [001] .....   270.327054: xs_stream_read_request: peer=[10.108.2.102]:2049 xid=0x7b569c7a copied=988 reclen=988 offset=988
> >>>    kworker/u16:0-12    [001] .....   270.327055: xs_stream_read_data:  peer=[10.108.2.102]:2049 err=-11 total=992
> >>>               ls-969   [003] .....   270.327062: rpc_task_sync_wake:   task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
> >>>               ls-969   [003] .....   270.327062: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=xprt_timer
> >>>               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
> >>>               ls-969   [003] .....   270.327063: rpc_task_run_action:  task:00000008@00000005 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
> >>>               ls-969   [003] .....   270.327063: rpc_xdr_recvfrom:     task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=988
> >>>               ls-969   [003] .....   270.327067: rpc_xdr_overflow:     task:00000008@00000005 nfsv4 READDIR requested=8 p=0xffff8895c29fefec end=0xffff8895c29feff0 xdr=[0xffff8895c29fef64,140]/4008/[0xffff8895c29feff0,36]/988
> >>
> >> Here's the problem. This is a sign of an XDR decoding issue. If you
> >> capture the traffic with Wireshark, does Wireshark indicate where the
> >> XDR is malformed?
> >>
> >> If it doesn't, then there is some problem with the client code. Since
> >> Fedora 43 is working as expected, I would guess there's a misapplied
> >> patch on Debian 13's kernel...?
> > 
> > if it is helpful: Debian follows the stable upstream releases (6.12.y
> > for trixie/Debian 13, right now 6.17.y for Debian unstable) and we try
> > to keep the patches limited which we apply on top. So far I see none
> > which touches net/sunrpc/. The patches applied:
> > https://salsa.debian.org/kernel-team/linux/-/tree/debian/6.17/forky/debian/patches?ref_type=heads
> > (in case this could help narrowing down more the issue).
> > 
> > But we could try here additionally, if Tyler has the possibility to do
> > so, to try directly the 6.17.7 upstream version without Debian patches
> > applied.
> 
> A bisect between broken v6.12.y and working v6.17.7 could identify
> what is possibly missing from v6.12.y.

There seems to be a misunderstanding? 6.17.7 as present in Debian
unstable is not working either, at least Tyler said:

> 2. Freshly installed Debian sid via mini ISO (2025-11-01). Same
> configuration as 1/above.

which includes a 6.17.y based kernel (6.17.7-1).

Regards,
Salvatore


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 22:20             ` Salvatore Bonaccorso
@ 2025-11-13 22:30               ` Chuck Lever
  2025-11-14  4:35                 ` Tyler W. Ross
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-13 22:30 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Tyler W. Ross, 1120598@bugs.debian.org, Jeff Layton, NeilBrown,
	Scott Mayhew, Steve Dickson, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On 11/13/25 5:20 PM, Salvatore Bonaccorso wrote:
>>> if it is helpful: Debian follows the stable upstream releases (6.12.y
>>> for trixie/Debian 13, right now 6.17.y for Debian unstable) and we try
>>> to keep the patches limited which we apply on top. So far I see none
>>> which touches net/sunrpc/. The patches applied:
>>> https://salsa.debian.org/kernel-team/linux/-/tree/debian/6.17/forky/debian/patches?ref_type=heads
>>> (in case this could help narrowing down more the issue).
>>>
>>> But we could try here additionally, if Tyler has the possibility to do
>>> so, to try directly the 6.17.7 upstream version without Debian patches
>>> applied.
>> A bisect between broken v6.12.y and working v6.17.7 could identify
>> what is possibly missing from v6.12.y.
> There seems to be a misunderstanding? 6.17.7 as present in Debian
> unstable is not working either, at least Tyler said:
> 
>> 2. Freshly installed Debian sid via mini ISO (2025-11-01). Same
>> configuration as 1/above.
> which includes a 6.17.y based kernel (6.17.7-1).

Got it. No, I grew up with Fedora. The Debian distribution names are
somewhat lost on me.

However, if you know there is a working kernel release sometime in the
past, a bisect would be useful. Failing that, go ahead and try a stock
linux-6.17.y kernel. But try building both the Fedora /boot/config and
the Debian one. Could be we hit a Kconfig problem?
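
Before building twice, it may be quicker to compare the two config files directly. The sketch below assumes the usual Kconfig output format ("CONFIG_FOO=value" for set options and "# CONFIG_FOO is not set" for disabled ones) and treats an option that is absent from a file as disabled. The function names parse_kconfig and diff_kconfig are hypothetical, as are any file paths passed to them.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each option name to its value; disabled options map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_kconfig(a_text, b_text, prefix="CONFIG_"):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    Only options whose name starts with prefix are reported, and an
    option missing from one file is assumed to be 'n' there.
    """
    a, b = parse_kconfig(a_text), parse_kconfig(b_text)
    out = {}
    for key in sorted(set(a) | set(b)):
        if not key.startswith(prefix):
            continue
        va, vb = a.get(key, "n"), b.get(key, "n")
        if va != vb:
            out[key] = (va, vb)
    return out
```

Reading both files and calling diff_kconfig(fedora_text, debian_text, prefix="CONFIG_RPCSEC_GSS") would narrow the output to the RPCSEC GSS options; a second pass with prefix="CONFIG_SUNRPC" covers the rest of the RPC layer.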


-- 
Chuck Lever


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-13 22:30               ` Chuck Lever
@ 2025-11-14  4:35                 ` Tyler W. Ross
  2025-11-14  5:09                   ` Tyler W. Ross
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-14  4:35 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

I tried a couple vanilla/stock kernels today, without success.

Most notably, I built 6.17.8 from upstream using the Kconfig from the
working Fedora 43 client in my lab ("config-6.17.5-300.fc43.x86_64").

Unfortunately, the rpc_xdr_overflow still occurs with this kernel.


TWR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-14  4:35                 ` Tyler W. Ross
@ 2025-11-14  5:09                   ` Tyler W. Ross
  2025-11-14 14:18                     ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-14  5:09 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On Thursday, November 13th, 2025 at 9:35 PM, Tyler W. Ross <TWR@tylerwross.com> wrote:

> I tried a couple vanilla/stock kernels today, without success.
> 
> Most notably, I built 6.17.8 from upstream using the Kconfig from the
> working Fedora 43 client in my lab ("config-6.17.5-300.fc43.x86_64").
> 
> Unfortunately, the rpc_xdr_overflow still occurs with this kernel.

Quick addendum:

I had not tried Debian 12, because CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2
was not enabled in the shipped Kconfig.

I just spun up a Debian 12 VM and installed the aforementioned
upstream 6.17.8 with Fedora 43 Kconfig kernel and confirmed the issue
also occurs on Debian 12.


TWR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-14  5:09                   ` Tyler W. Ross
@ 2025-11-14 14:18                     ` Chuck Lever
  2025-11-16  0:38                       ` Tyler W. Ross
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-14 14:18 UTC (permalink / raw)
  To: Tyler W. Ross
  Cc: Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On 11/14/25 12:09 AM, Tyler W. Ross wrote:
> On Thursday, November 13th, 2025 at 9:35 PM, Tyler W. Ross <TWR@tylerwross.com> wrote:
> 
>> I tried a couple vanilla/stock kernels today, without success.
>>
>> Most notably, I built 6.17.8 from upstream using the Kconfig from the
>> working Fedora 43 client in my lab ("config-6.17.5-300.fc43.x86_64").
>>
>> Unfortunately, the rpc_xdr_overflow still occurs with this kernel.
> 
> Quick addendum:
> 
> I had not tried Debian 12, because CONFIG_RPCSEC_GSS_KRB5_ENCTYPES_AES_SHA2
> was not enabled in the shipped Kconfig.
> 
> I just spun up a Debian 12 VM and installed the aforementioned
> upstream 6.17.8 with Fedora 43 Kconfig kernel and confirmed the issue
> also occurs on Debian 12.
Then I would say further hunting for the broken commit is going to be
fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good next step so
we see which XDR data item (assuming it's the same one every time) is
failing to decode.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-14 14:18                     ` Chuck Lever
@ 2025-11-16  0:38                       ` Tyler W. Ross
  2025-11-16 16:29                         ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-16  0:38 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs,
	linux-kernel

On Friday, November 14th, 2025 at 7:19 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> Then I would say further hunting for the broken commit is going to be
> fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good next step so
> we see which XDR data item (assuming it's the same one every time) is
> failing to decode.

I added WARNs after each trace_rpc_xdr_overflow() call, and then a couple
pr_info() inside xdr_copy_to_scratch() as a follow-up.

If I'm understanding correctly, it's failing in the xdr_copy_to_scratch()
call inside xdr_inline_decode(), because the xdr_stream struct has an
unset/NULL scratch kvec. I don't understand the context enough to
speculate on why, though.
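
For readers following along, here is a minimal userspace sketch of the
behaviour described above. It is a simplified model, not the code in
net/sunrpc/xdr.c: struct model_stream and model_inline_decode() are
hypothetical names, and the two-segment layout (head buffer followed by
one page) is an assumption made to keep the example short. It shows why
a decode that straddles a segment boundary can only succeed when a
scratch buffer has been set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct xdr_stream. */
struct model_stream {
	const unsigned char *p;     /* read cursor in the current segment */
	const unsigned char *end;   /* end of the current segment */
	const unsigned char *next;  /* start of the next segment (e.g. a page) */
	size_t next_len;            /* bytes available in the next segment */
	unsigned char *scratch;     /* bounce buffer, NULL if none was set */
	size_t scratch_len;         /* size of the bounce buffer, 0 if unset */
};

/* Return a pointer to nbytes contiguous bytes, or NULL on overflow. */
static const unsigned char *model_inline_decode(struct model_stream *s,
						size_t nbytes)
{
	size_t avail;
	const unsigned char *ret;

	assert(s->p <= s->end);
	avail = (size_t)(s->end - s->p);

	if (nbytes <= avail) {
		/* Fast path: the item lies entirely in the current segment. */
		ret = s->p;
		s->p += nbytes;
		return ret;
	}

	/*
	 * Slow path: the item straddles the boundary, so it has to be
	 * reassembled in the scratch buffer. With no scratch buffer
	 * (scratch_len == 0) this is reported as an overflow.
	 */
	if (nbytes > s->scratch_len)
		return NULL;
	if (nbytes - avail > s->next_len)
		return NULL;

	memcpy(s->scratch, s->p, avail);
	memcpy(s->scratch + avail, s->next, nbytes - avail);

	/* Continue decoding from the next segment. */
	s->p = s->next + (nbytes - avail);
	s->end = s->next + s->next_len;
	s->next = NULL;
	s->next_len = 0;
	return s->scratch;
}
```

In this model, a stream with scratch == NULL and scratch_len == 0 decodes
fine until an item crosses from the head into the page, at which point
the call returns NULL, which would match the iov_base/iov_len values
printed below.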

[   26.844102] Entered xdr_copy_to_scratch()
[   26.844105] xdr->scratch.iov_base: 0000000000000000
[   26.844107] xdr->scratch.iov_len: 0
[   26.844127] ------------[ cut here ]------------
[   26.844128] WARNING: CPU: 1 PID: 886 at net/sunrpc/xdr.c:1490 xdr_inline_decode.cold+0x65/0x141 [sunrpc]
[   26.844153] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace netfs binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd ccp kvm cfg80211 hid_generic usbhid hid irqbypass rfkill ghash_clmulni_intel aesni_intel pcspkr 8021q garp stp virtio_balloon llc mrp button evdev joydev sg auth_rpcgss sunrpc configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_cryptoapi sr_mod cdrom bochs uhci_hcd drm_client_lib drm_shmem_helper ehci_pci ata_generic sd_mod drm_kms_helper ehci_hcd ata_piix libata drm virtio_net usbcore virtio_scsi floppy psmouse net_failover failover scsi_mod serio_raw i2c_piix4 usb_common scsi_common i2c_smbus
[   26.844217] CPU: 1 UID: 591200003 PID: 886 Comm: ls Not tainted 6.17.8-debbug1120598hack3 #9 PREEMPT(lazy)  
[   26.844220] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   26.844222] RIP: 0010:xdr_inline_decode.cold+0x65/0x141 [sunrpc]
[   26.844238] Code: 24 48 c7 c7 e7 eb 8c c0 48 8b 71 28 e8 5a 36 fc d7 48 8b 0c 24 4c 8b 44 24 10 48 8b 54 24 08 4c 39 41 28 73 0c 0f 1f 44 00 00 <0f> 0b e9 b7 fe fe ff 48 89 d8 48 89 cf 4c 89 44 24 08 48 29 d0 48
[   26.844240] RSP: 0018:ffffd09e82ce3758 EFLAGS: 00010293
[   26.844242] RAX: 0000000000000017 RBX: ffff8f1e0adcffe8 RCX: ffffd09e82ce3838
[   26.844244] RDX: ffff8f1e0adcffe4 RSI: 0000000000000001 RDI: ffff8f1f37c5ce40
[   26.844245] RBP: ffffd09e82ce37b4 R08: 0000000000000008 R09: ffffd09e82ce3600
[   26.844246] R10: ffffffff9acdb348 R11: 00000000ffffefff R12: 000000000000001a
[   26.844247] R13: ffff8f1e01151200 R14: 0000000000000000 R15: 0000000000440000
[   26.844250] FS:  00007fa5d13db240(0000) GS:ffff8f1f9c44a000(0000) knlGS:0000000000000000
[   26.844252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.844253] CR2: 00007fa5d13b9000 CR3: 000000010ab82000 CR4: 0000000000750ef0
[   26.844255] PKRU: 55555554
[   26.844257] Call Trace:
[   26.844259]  <TASK>
[   26.844263]  __decode_op_hdr+0x20/0x120 [nfsv4]
[   26.844288]  nfs4_xdr_dec_readdir+0xbb/0x120 [nfsv4]
[   26.844305]  gss_unwrap_resp+0x9e/0x150 [auth_rpcgss]
[   26.844311]  call_decode+0x211/0x230 [sunrpc]
[   26.844332]  ? __pfx_call_decode+0x10/0x10 [sunrpc]
[   26.844348]  __rpc_execute+0xb6/0x480 [sunrpc]
[   26.844369]  ? rpc_new_task+0x17a/0x200 [sunrpc]
[   26.844386]  rpc_execute+0x133/0x160 [sunrpc]
[   26.844401]  rpc_run_task+0x103/0x160 [sunrpc]
[   26.844419]  nfs4_call_sync_sequence+0x74/0xb0 [nfsv4]
[   26.844440]  _nfs4_proc_readdir+0x28d/0x310 [nfsv4]
[   26.844459]  nfs4_proc_readdir+0x60/0xf0 [nfsv4]
[   26.844475]  nfs_readdir_xdr_to_array+0x1fb/0x410 [nfs]
[   26.844494]  nfs_readdir+0x2ed/0xf00 [nfs]
[   26.844506]  iterate_dir+0xaa/0x270
[   26.844517]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844521]  __x64_sys_getdents64+0x7b/0x110
[   26.844523]  ? __pfx_filldir64+0x10/0x10
[   26.844526]  do_syscall_64+0x82/0x320
[   26.844530]  ? mod_memcg_lruvec_state+0xe7/0x2e0
[   26.844533]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844535]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844537]  ? __lruvec_stat_mod_folio+0x85/0xd0
[   26.844539]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844541]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844550]  ? set_ptes.isra.0+0x36/0x80
[   26.844555]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844557]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844560]  ? do_anonymous_page+0x101/0x970
[   26.844563]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844565]  ? ___pte_offset_map+0x1b/0x160
[   26.844570]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844572]  ? __handle_mm_fault+0xac6/0xef0
[   26.844577]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844578]  ? count_memcg_events+0xd6/0x220
[   26.844581]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844583]  ? handle_mm_fault+0x1d6/0x2d0
[   26.844585]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844587]  ? do_user_addr_fault+0x21a/0x690
[   26.844591]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844593]  ? srso_alias_return_thunk+0x5/0xfbef5
[   26.844595]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   26.844597] RIP: 0033:0x7fa5d15678a3
[   26.844606] Code: 8b 05 59 a5 10 00 64 c7 00 16 00 00 00 31 c0 eb 9e e8 11 03 04 00 90 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 21 a5 10 00 f7 d8
[   26.844607] RSP: 002b:00007fffa272d848 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[   26.844609] RAX: ffffffffffffffda RBX: 00007fa5d13b9010 RCX: 00007fa5d15678a3
[   26.844610] RDX: 0000000000020000 RSI: 00007fa5d13b9040 RDI: 0000000000000003
[   26.844611] RBP: 00007fa5d13b9040 R08: 00007fa5d1707400 R09: 0000000000000000
[   26.844613] R10: 0000000000000022 R11: 0000000000000293 R12: 00007fa5d13b9014
[   26.844614] R13: fffffffffffffea0 R14: 0000000000000000 R15: 0000564585c1c200
[   26.844617]  </TASK>
[   26.844618] ---[ end trace 0000000000000000 ]---



TWR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-16  0:38                       ` Tyler W. Ross
@ 2025-11-16 16:29                         ` Chuck Lever
  2025-11-16 18:21                           ` Trond Myklebust
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-16 16:29 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker
  Cc: Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, linux-nfs, linux-kernel, Tyler W. Ross

On 11/15/25 7:38 PM, Tyler W. Ross wrote:
> On Friday, November 14th, 2025 at 7:19 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> Then I would say further hunting for the broken commit is going to be
>> fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good next step so
>> we see which XDR data item (assuming it's the same one every time) is
>> failing to decode.
> 
> I added WARNs after each trace_rpc_xdr_overflow() call, and then a couple
> pr_info() inside xdr_copy_to_scratch() as a follow-up.
> 
> If I'm understanding correctly, it's failing in the xdr_copy_to_scratch()
> call inside xdr_inline_decode(), because the xdr_stream struct has an
> unset/NULL scratch kvec. I don't understand the context enough to
> speculate on why, though.
> 
> [   26.844102] Entered xdr_copy_to_scratch()
> [   26.844105] xdr->scratch.iov_base: 0000000000000000
> [   26.844107] xdr->scratch.iov_len: 0
> [   26.844127] ------------[ cut here ]------------
> [   26.844128] WARNING: CPU: 1 PID: 886 at net/sunrpc/xdr.c:1490 xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> [   26.844153] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace netfs binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd ccp kvm cfg80211 hid_generic usbhid hid irqbypass rfkill ghash_clmulni_intel aesni_intel pcspkr 8021q garp stp virtio_balloon llc mrp button evdev joydev sg auth_rpcgss sunrpc configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_cryptoapi sr_mod cdrom bochs uhci_hcd drm_client_lib drm_shmem_helper ehci_pci ata_generic sd_mod drm_kms_helper ehci_hcd ata_piix libata drm virtio_net usbcore virtio_scsi floppy psmouse net_failover failover scsi_mod serio_raw i2c_piix4 usb_common scsi_common i2c_smbus
> [   26.844217] CPU: 1 UID: 591200003 PID: 886 Comm: ls Not tainted 6.17.8-debbug1120598hack3 #9 PREEMPT(lazy)  
> [   26.844220] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   26.844222] RIP: 0010:xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> [   26.844238] Code: 24 48 c7 c7 e7 eb 8c c0 48 8b 71 28 e8 5a 36 fc d7 48 8b 0c 24 4c 8b 44 24 10 48 8b 54 24 08 4c 39 41 28 73 0c 0f 1f 44 00 00 <0f> 0b e9 b7 fe fe ff 48 89 d8 48 89 cf 4c 89 44 24 08 48 29 d0 48
> [   26.844240] RSP: 0018:ffffd09e82ce3758 EFLAGS: 00010293
> [   26.844242] RAX: 0000000000000017 RBX: ffff8f1e0adcffe8 RCX: ffffd09e82ce3838
> [   26.844244] RDX: ffff8f1e0adcffe4 RSI: 0000000000000001 RDI: ffff8f1f37c5ce40
> [   26.844245] RBP: ffffd09e82ce37b4 R08: 0000000000000008 R09: ffffd09e82ce3600
> [   26.844246] R10: ffffffff9acdb348 R11: 00000000ffffefff R12: 000000000000001a
> [   26.844247] R13: ffff8f1e01151200 R14: 0000000000000000 R15: 0000000000440000
> [   26.844250] FS:  00007fa5d13db240(0000) GS:ffff8f1f9c44a000(0000) knlGS:0000000000000000
> [   26.844252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   26.844253] CR2: 00007fa5d13b9000 CR3: 000000010ab82000 CR4: 0000000000750ef0
> [   26.844255] PKRU: 55555554
> [   26.844257] Call Trace:
> [   26.844259]  <TASK>
> [   26.844263]  __decode_op_hdr+0x20/0x120 [nfsv4]
> [   26.844288]  nfs4_xdr_dec_readdir+0xbb/0x120 [nfsv4]
> [   26.844305]  gss_unwrap_resp+0x9e/0x150 [auth_rpcgss]
> [   26.844311]  call_decode+0x211/0x230 [sunrpc]
> [   26.844332]  ? __pfx_call_decode+0x10/0x10 [sunrpc]
> [   26.844348]  __rpc_execute+0xb6/0x480 [sunrpc]
> [   26.844369]  ? rpc_new_task+0x17a/0x200 [sunrpc]
> [   26.844386]  rpc_execute+0x133/0x160 [sunrpc]
> [   26.844401]  rpc_run_task+0x103/0x160 [sunrpc]
> [   26.844419]  nfs4_call_sync_sequence+0x74/0xb0 [nfsv4]
> [   26.844440]  _nfs4_proc_readdir+0x28d/0x310 [nfsv4]
> [   26.844459]  nfs4_proc_readdir+0x60/0xf0 [nfsv4]
> [   26.844475]  nfs_readdir_xdr_to_array+0x1fb/0x410 [nfs]
> [   26.844494]  nfs_readdir+0x2ed/0xf00 [nfs]
> [   26.844506]  iterate_dir+0xaa/0x270

Hi Trond, Anna -

NFSv4 READDIR is hitting an XDR overflow because the XDR stream's
scratch buffer is missing, and one of the READDIR response's fields
crosses a page boundary in the receive buffer.

Shouldn't the client's readdir XDR decoder have a scratch buffer?


> [   26.844517]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844521]  __x64_sys_getdents64+0x7b/0x110
> [   26.844523]  ? __pfx_filldir64+0x10/0x10
> [   26.844526]  do_syscall_64+0x82/0x320
> [   26.844530]  ? mod_memcg_lruvec_state+0xe7/0x2e0
> [   26.844533]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844535]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844537]  ? __lruvec_stat_mod_folio+0x85/0xd0
> [   26.844539]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844541]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844550]  ? set_ptes.isra.0+0x36/0x80
> [   26.844555]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844557]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844560]  ? do_anonymous_page+0x101/0x970
> [   26.844563]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844565]  ? ___pte_offset_map+0x1b/0x160
> [   26.844570]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844572]  ? __handle_mm_fault+0xac6/0xef0
> [   26.844577]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844578]  ? count_memcg_events+0xd6/0x220
> [   26.844581]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844583]  ? handle_mm_fault+0x1d6/0x2d0
> [   26.844585]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844587]  ? do_user_addr_fault+0x21a/0x690
> [   26.844591]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844593]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   26.844595]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   26.844597] RIP: 0033:0x7fa5d15678a3
> [   26.844606] Code: 8b 05 59 a5 10 00 64 c7 00 16 00 00 00 31 c0 eb 9e e8 11 03 04 00 90 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 21 a5 10 00 f7 d8
> [   26.844607] RSP: 002b:00007fffa272d848 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
> [   26.844609] RAX: ffffffffffffffda RBX: 00007fa5d13b9010 RCX: 00007fa5d15678a3
> [   26.844610] RDX: 0000000000020000 RSI: 00007fa5d13b9040 RDI: 0000000000000003
> [   26.844611] RBP: 00007fa5d13b9040 R08: 00007fa5d1707400 R09: 0000000000000000
> [   26.844613] R10: 0000000000000022 R11: 0000000000000293 R12: 00007fa5d13b9014
> [   26.844614] R13: fffffffffffffea0 R14: 0000000000000000 R15: 0000564585c1c200
> [   26.844617]  </TASK>
> [   26.844618] ---[ end trace 0000000000000000 ]---
> 
> 
> 
> TWR


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-16 16:29                         ` Chuck Lever
@ 2025-11-16 18:21                           ` Trond Myklebust
  2025-11-17  5:19                             ` Tyler W. Ross
  2025-11-17 22:54                             ` Scott Mayhew
  0 siblings, 2 replies; 31+ messages in thread
From: Trond Myklebust @ 2025-11-16 18:21 UTC (permalink / raw)
  To: Chuck Lever, Anna Schumaker
  Cc: Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Scott Mayhew, Steve Dickson, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, linux-nfs, linux-kernel, Tyler W. Ross

On Sun, 2025-11-16 at 11:29 -0500, Chuck Lever wrote:
> On 11/15/25 7:38 PM, Tyler W. Ross wrote:
> > On Friday, November 14th, 2025 at 7:19 AM, Chuck Lever
> > <chuck.lever@oracle.com> wrote:
> > > Then I would say further hunting for the broken commit is going
> > > to be
> > > fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good next
> > > step so
> > > we see which XDR data item (assuming it's the same one every
> > > time) is
> > > failing to decode.
> > 
> > I added WARNs after each trace_rpc_xdr_overflow() call, and then a
> > couple
> > pr_info() inside xdr_copy_to_scratch() as a follow-up.
> > 
> > If I'm understanding correctly, it's failing in the
> > xdr_copy_to_scratch()
> > call inside xdr_inline_decode(), because the xdr_stream struct has
> > an
> > unset/NULL scratch kvec. I don't understand the context enough to
> > speculate on why, though.
> > 
> > [   26.844102] Entered xdr_copy_to_scratch()
> > [   26.844105] xdr->scratch.iov_base: 0000000000000000
> > [   26.844107] xdr->scratch.iov_len: 0
> > [   26.844127] ------------[ cut here ]------------
> > [   26.844128] WARNING: CPU: 1 PID: 886 at net/sunrpc/xdr.c:1490
> > xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > [   26.844153] Modules linked in: rpcsec_gss_krb5 nfsv4
> > dns_resolver nfs lockd grace netfs binfmt_misc intel_rapl_msr
> > intel_rapl_common kvm_amd ccp kvm cfg80211 hid_generic usbhid hid
> > irqbypass rfkill ghash_clmulni_intel aesni_intel pcspkr 8021q garp
> > stp virtio_balloon llc mrp button evdev joydev sg auth_rpcgss
> > sunrpc configfs efi_pstore nfnetlink vsock_loopback
> > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> > vmw_vmci qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache
> > jbd2 crc32c_cryptoapi sr_mod cdrom bochs uhci_hcd drm_client_lib
> > drm_shmem_helper ehci_pci ata_generic sd_mod drm_kms_helper
> > ehci_hcd ata_piix libata drm virtio_net usbcore virtio_scsi floppy
> > psmouse net_failover failover scsi_mod serio_raw i2c_piix4
> > usb_common scsi_common i2c_smbus
> > [   26.844217] CPU: 1 UID: 591200003 PID: 886 Comm: ls Not tainted
> > 6.17.8-debbug1120598hack3 #9 PREEMPT(lazy)  
> > [   26.844220] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [   26.844222] RIP: 0010:xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > [   26.844238] Code: 24 48 c7 c7 e7 eb 8c c0 48 8b 71 28 e8 5a 36
> > fc d7 48 8b 0c 24 4c 8b 44 24 10 48 8b 54 24 08 4c 39 41 28 73 0c
> > 0f 1f 44 00 00 <0f> 0b e9 b7 fe fe ff 48 89 d8 48 89 cf 4c 89 44 24
> > 08 48 29 d0 48
> > [   26.844240] RSP: 0018:ffffd09e82ce3758 EFLAGS: 00010293
> > [   26.844242] RAX: 0000000000000017 RBX: ffff8f1e0adcffe8 RCX:
> > ffffd09e82ce3838
> > [   26.844244] RDX: ffff8f1e0adcffe4 RSI: 0000000000000001 RDI:
> > ffff8f1f37c5ce40
> > [   26.844245] RBP: ffffd09e82ce37b4 R08: 0000000000000008 R09:
> > ffffd09e82ce3600
> > [   26.844246] R10: ffffffff9acdb348 R11: 00000000ffffefff R12:
> > 000000000000001a
> > [   26.844247] R13: ffff8f1e01151200 R14: 0000000000000000 R15:
> > 0000000000440000
> > [   26.844250] FS:  00007fa5d13db240(0000)
> > GS:ffff8f1f9c44a000(0000) knlGS:0000000000000000
> > [   26.844252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   26.844253] CR2: 00007fa5d13b9000 CR3: 000000010ab82000 CR4:
> > 0000000000750ef0
> > [   26.844255] PKRU: 55555554
> > [   26.844257] Call Trace:
> > [   26.844259]  <TASK>
> > [   26.844263]  __decode_op_hdr+0x20/0x120 [nfsv4]
> > [   26.844288]  nfs4_xdr_dec_readdir+0xbb/0x120 [nfsv4]
> > [   26.844305]  gss_unwrap_resp+0x9e/0x150 [auth_rpcgss]
> > [   26.844311]  call_decode+0x211/0x230 [sunrpc]
> > [   26.844332]  ? __pfx_call_decode+0x10/0x10 [sunrpc]
> > [   26.844348]  __rpc_execute+0xb6/0x480 [sunrpc]
> > [   26.844369]  ? rpc_new_task+0x17a/0x200 [sunrpc]
> > [   26.844386]  rpc_execute+0x133/0x160 [sunrpc]
> > [   26.844401]  rpc_run_task+0x103/0x160 [sunrpc]
> > [   26.844419]  nfs4_call_sync_sequence+0x74/0xb0 [nfsv4]
> > [   26.844440]  _nfs4_proc_readdir+0x28d/0x310 [nfsv4]
> > [   26.844459]  nfs4_proc_readdir+0x60/0xf0 [nfsv4]
> > [   26.844475]  nfs_readdir_xdr_to_array+0x1fb/0x410 [nfs]
> > [   26.844494]  nfs_readdir+0x2ed/0xf00 [nfs]
> > [   26.844506]  iterate_dir+0xaa/0x270
> 
> Hi Trond, Anna -
> 
> NFSv4 READDIR is hitting an XDR overflow because the XDR stream's
> scratch buffer is missing, and one of the READDIR response's fields
> crosses a page boundary in the receive buffer.
> 
> Shouldn't the client's readdir XDR decoder have a scratch buffer?

No it shouldn't.

The READDIR XDR decoder doesn't interpret the contents of the readdir
buffer. What it is supposed to do is read the op header and the readdir
verifier, and then to align the remaining data into the pages that were
allocated as buffer using a call to xdr_read_pages(). Essentially, it's
the exact same procedure as we follow for a READ call.

So if we're crossing into the pages before we hit the call to
xdr_read_pages() then that means we've allocated too small a header
buffer. Since it only appears to happen with RPCSEC_GSS, then my money
would be on AUTH_GSS not padding the reply buffer sufficiently when
setting the value of auth->au_cslack.
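
To illustrate the arithmetic behind that suspicion, here is a small
sketch. It is not the SUNRPC sizing code: model_head_bytes() and
model_spills() are hypothetical helpers, and every byte count passed to
them is an illustrative assumption. The only point is that if the
header buffer is sized from an estimate of the security overhead, and
the overhead actually received is larger, the inline part of the reply
ends past the header buffer and lands in the pages.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: bytes reserved for the reply head buffer, built
 * from the RPC header, the estimated auth slack, and the inline part
 * of the reply (for READDIR: op header plus verifier).
 */
static size_t model_head_bytes(size_t rpc_hdr, size_t auth_slack,
			       size_t inline_reply)
{
	return rpc_hdr + auth_slack + inline_reply;
}

/*
 * True if the inline part of the reply ends beyond the head buffer,
 * i.e. the decoder would cross into the pages before it reaches the
 * point where it hands the rest over to the page data.
 */
static bool model_spills(size_t head_bytes, size_t rpc_hdr,
			 size_t actual_auth, size_t inline_reply)
{
	return rpc_hdr + actual_auth + inline_reply > head_bytes;
}
```

For example, with a head sized as model_head_bytes(24, 12, 40) = 76
bytes, an actual overhead of 12 bytes fits, while 24 bytes does not.
The 12 and 24 here are just the truncated HMAC lengths implied by the
enctype names (96 and 192 bits); whether that is the quantity that is
underestimated in practice is exactly the open question.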

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-16 18:21                           ` Trond Myklebust
@ 2025-11-17  5:19                             ` Tyler W. Ross
  2025-11-17 13:41                               ` Chuck Lever
  2025-11-17 23:05                               ` Scott Mayhew
  2025-11-17 22:54                             ` Scott Mayhew
  1 sibling, 2 replies; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-17  5:19 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Chuck Lever, Anna Schumaker, Salvatore Bonaccorso,
	1120598@bugs.debian.org, Jeff Layton, NeilBrown, Scott Mayhew,
	Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-kernel

Weird behavior I just discovered:

Explicitly setting allowed-enctypes in the gssd section of /etc/nfs.conf
to exclude aes256-cts-hmac-sha1-96 makes both SHA2 ciphers work as
expected (assuming each is allowed).

If allowed-enctypes is unset (letting gssd interrogate the kernel for
supported enctypes) or includes aes256-cts-hmac-sha1-96, then the XDR
overflow occurs.

Non-working configurations (first is the commented-out default in nfs.conf):
allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes256-cts-hmac-sha1-96,aes128-cts-hmac-sha1-96
allowed-enctypes=aes256-cts-hmac-sha384-192,aes256-cts-hmac-sha1-96
allowed-enctypes=aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96

Working configurations (first is default sans aes256-cts-hmac-sha1-96):
allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes128-cts-hmac-sha1-96
allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128
allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha1-96
allowed-enctypes=aes128-cts-hmac-sha256-128,aes128-cts-hmac-sha1-96


Is this gssd mishandling some setup/initialization?
Or is there a miscalculation happening somewhere further up?
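
The pattern in the two lists above can be written down as a single
check. The sketch below only encodes the observation from these tests
(a list fails if and only if it names aes256-cts-hmac-sha1-96); it says
nothing about the cause, and list_has_enctype() is a hypothetical
helper, not something taken from gssd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the comma-separated enctype list contains exactly
 * the given name as one of its entries (whole-token match).
 */
static bool list_has_enctype(const char *list, const char *name)
{
	size_t n = strlen(name);
	const char *p = list;

	while (*p) {
		size_t len = strcspn(p, ",");   /* length of this entry */

		if (len == n && strncmp(p, name, n) == 0)
			return true;
		p += len;
		if (*p == ',')
			p++;                    /* skip the separator */
	}
	return false;
}
```

Calling list_has_enctype(value, "aes256-cts-hmac-sha1-96") on each
allowed-enctypes value listed above returns true for the non-working
configurations and false for the working ones.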


TWR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-17  5:19                             ` Tyler W. Ross
@ 2025-11-17 13:41                               ` Chuck Lever
  2025-11-17 18:38                                 ` Tyler W. Ross
  2025-11-17 23:05                               ` Scott Mayhew
  1 sibling, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-11-17 13:41 UTC (permalink / raw)
  To: Tyler W. Ross, Trond Myklebust
  Cc: Anna Schumaker, Salvatore Bonaccorso, 1120598@bugs.debian.org,
	Jeff Layton, NeilBrown, Scott Mayhew, Steve Dickson,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel

On 11/17/25 12:19 AM, Tyler W. Ross wrote:
> Weird behavior I just discovered:
> 
> Explicitly setting allowed-enctypes in the gssd section of /etc/nfs.conf
> to exclude aes256-cts-hmac-sha1-96 makes both SHA2 ciphers work as
> expected (assuming each is allowed).
> 
> If allowed-enctypes is unset (letting gssd interrogate the kernel for
> supported enctypes) or includes aes256-cts-hmac-sha1-96, then the XDR
> overflow occurs.
> 
> Non-working configurations (first is the commented-out default in nfs.conf):
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes256-cts-hmac-sha1-96,aes128-cts-hmac-sha1-96
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes256-cts-hmac-sha1-96
> allowed-enctypes=aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
> 
> Working configurations (first is default sans aes256-cts-hmac-sha1-96):
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes128-cts-hmac-sha1-96
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha1-96
> allowed-enctypes=aes128-cts-hmac-sha256-128,aes128-cts-hmac-sha1-96
> 
> 
> Is this gssd mishandling some setup/initialization?
> Or is there a miscalculation happening somewhere further up?
Does Debian's user space Kerberos support the sha2 enctypes?


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-17 13:41                               ` Chuck Lever
@ 2025-11-17 18:38                                 ` Tyler W. Ross
  0 siblings, 0 replies; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-17 18:38 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Trond Myklebust, Anna Schumaker, Salvatore Bonaccorso,
	1120598@bugs.debian.org, Jeff Layton, NeilBrown, Scott Mayhew,
	Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-kernel

On Monday, November 17th, 2025 at 6:41 AM, Chuck Lever <chuck.lever@oracle.com> wrote:

> > Is this gssd mishandling some setup/initialization?
> > Or is there a miscalculation happening somewhere further up?
> 
> Does Debian's user space Kerberos support the sha2 enctypes?

Appears to. MIT Kerberos docs list sha2 enctypes support on releases
>=1.15 . Debian 13 and Fedora 43 are both shipping 1.21.3 . Debian
unstable currently has 1.22.1 . I haven't had any issues managing
sha2 keytabs, tickets, etc. with the userspace tools. And at least some
NFS operations other than READDIR do seem to work, though I haven't
tested that beyond observing cat, stat, and touch are functional.


TWR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-16 18:21                           ` Trond Myklebust
  2025-11-17  5:19                             ` Tyler W. Ross
@ 2025-11-17 22:54                             ` Scott Mayhew
  2025-11-18  4:10                               ` Tyler W. Ross
  1 sibling, 1 reply; 31+ messages in thread
From: Scott Mayhew @ 2025-11-17 22:54 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Chuck Lever, Anna Schumaker, Salvatore Bonaccorso,
	1120598@bugs.debian.org, Jeff Layton, NeilBrown, Steve Dickson,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel,
	Tyler W. Ross

On Sun, 16 Nov 2025, Trond Myklebust wrote:

> On Sun, 2025-11-16 at 11:29 -0500, Chuck Lever wrote:
> > On 11/15/25 7:38 PM, Tyler W. Ross wrote:
> > > On Friday, November 14th, 2025 at 7:19 AM, Chuck Lever
> > > <chuck.lever@oracle.com> wrote:
> > > > Then I would say further hunting for the broken commit is going
> > > > to be
> > > > fruitless. Adding the WARNs in net/sunrpc/xdr.c is a good next
> > > > step so
> > > > we see which XDR data item (assuming it's the same one every
> > > > time) is
> > > > failing to decode.
> > > 
> > > I added WARNs after each trace_rpc_xdr_overflow() call, and then a
> > > couple
> > > pr_info() inside xdr_copy_to_scratch() as a follow-up.
> > > 
> > > If I'm understanding correctly, it's failing in the
> > > xdr_copy_to_scratch()
> > > call inside xdr_inline_decode(), because the xdr_stream struct has
> > > an
> > > unset/NULL scratch kvec. I don't understand the context enough to
> > > speculate on why, though.
> > > 
> > > [   26.844102] Entered xdr_copy_to_scratch()
> > > [   26.844105] xdr->scratch.iov_base: 0000000000000000
> > > [   26.844107] xdr->scratch.iov_len: 0
> > > [   26.844127] ------------[ cut here ]------------
> > > [   26.844128] WARNING: CPU: 1 PID: 886 at net/sunrpc/xdr.c:1490
> > > xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > > [   26.844153] Modules linked in: rpcsec_gss_krb5 nfsv4
> > > dns_resolver nfs lockd grace netfs binfmt_misc intel_rapl_msr
> > > intel_rapl_common kvm_amd ccp kvm cfg80211 hid_generic usbhid hid
> > > irqbypass rfkill ghash_clmulni_intel aesni_intel pcspkr 8021q garp
> > > stp virtio_balloon llc mrp button evdev joydev sg auth_rpcgss
> > > sunrpc configfs efi_pstore nfnetlink vsock_loopback
> > > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> > > vmw_vmci qemu_fw_cfg ip_tables x_tables autofs4 ext4 crc16 mbcache
> > > jbd2 crc32c_cryptoapi sr_mod cdrom bochs uhci_hcd drm_client_lib
> > > drm_shmem_helper ehci_pci ata_generic sd_mod drm_kms_helper
> > > ehci_hcd ata_piix libata drm virtio_net usbcore virtio_scsi floppy
> > > psmouse net_failover failover scsi_mod serio_raw i2c_piix4
> > > usb_common scsi_common i2c_smbus
> > > [   26.844217] CPU: 1 UID: 591200003 PID: 886 Comm: ls Not tainted
> > > 6.17.8-debbug1120598hack3 #9 PREEMPT(lazy)  
> > > [   26.844220] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > > 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > [   26.844222] RIP: 0010:xdr_inline_decode.cold+0x65/0x141 [sunrpc]
> > > [   26.844238] Code: 24 48 c7 c7 e7 eb 8c c0 48 8b 71 28 e8 5a 36
> > > fc d7 48 8b 0c 24 4c 8b 44 24 10 48 8b 54 24 08 4c 39 41 28 73 0c
> > > 0f 1f 44 00 00 <0f> 0b e9 b7 fe fe ff 48 89 d8 48 89 cf 4c 89 44 24
> > > 08 48 29 d0 48
> > > [   26.844240] RSP: 0018:ffffd09e82ce3758 EFLAGS: 00010293
> > > [   26.844242] RAX: 0000000000000017 RBX: ffff8f1e0adcffe8 RCX:
> > > ffffd09e82ce3838
> > > [   26.844244] RDX: ffff8f1e0adcffe4 RSI: 0000000000000001 RDI:
> > > ffff8f1f37c5ce40
> > > [   26.844245] RBP: ffffd09e82ce37b4 R08: 0000000000000008 R09:
> > > ffffd09e82ce3600
> > > [   26.844246] R10: ffffffff9acdb348 R11: 00000000ffffefff R12:
> > > 000000000000001a
> > > [   26.844247] R13: ffff8f1e01151200 R14: 0000000000000000 R15:
> > > 0000000000440000
> > > [   26.844250] FS:  00007fa5d13db240(0000)
> > > GS:ffff8f1f9c44a000(0000) knlGS:0000000000000000
> > > [   26.844252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   26.844253] CR2: 00007fa5d13b9000 CR3: 000000010ab82000 CR4:
> > > 0000000000750ef0
> > > [   26.844255] PKRU: 55555554
> > > [   26.844257] Call Trace:
> > > [   26.844259]  <TASK>
> > > [   26.844263]  __decode_op_hdr+0x20/0x120 [nfsv4]
> > > [   26.844288]  nfs4_xdr_dec_readdir+0xbb/0x120 [nfsv4]
> > > [   26.844305]  gss_unwrap_resp+0x9e/0x150 [auth_rpcgss]
> > > [   26.844311]  call_decode+0x211/0x230 [sunrpc]
> > > [   26.844332]  ? __pfx_call_decode+0x10/0x10 [sunrpc]
> > > [   26.844348]  __rpc_execute+0xb6/0x480 [sunrpc]
> > > [   26.844369]  ? rpc_new_task+0x17a/0x200 [sunrpc]
> > > [   26.844386]  rpc_execute+0x133/0x160 [sunrpc]
> > > [   26.844401]  rpc_run_task+0x103/0x160 [sunrpc]
> > > [   26.844419]  nfs4_call_sync_sequence+0x74/0xb0 [nfsv4]
> > > [   26.844440]  _nfs4_proc_readdir+0x28d/0x310 [nfsv4]
> > > [   26.844459]  nfs4_proc_readdir+0x60/0xf0 [nfsv4]
> > > [   26.844475]  nfs_readdir_xdr_to_array+0x1fb/0x410 [nfs]
> > > [   26.844494]  nfs_readdir+0x2ed/0xf00 [nfs]
> > > [   26.844506]  iterate_dir+0xaa/0x270
> > 
> > Hi Trond, Anna -
> > 
> > NFSv4 READDIR is hitting an XDR overflow because the XDR stream's
> > scratch buffer is missing, and one of the READDIR response's fields
> > crosses a page boundary in the receive buffer.
> > 
> > Shouldn't the client's readdir XDR decoder have a scratch buffer?
> 
> No it shouldn't.
> 
> The READDIR XDR decoder doesn't interpret the contents of the readdir
> buffer. What it is supposed to do is read the op header and the readdir
> verifier, and then to align the remaining data into the pages that were
> allocated as buffer using a call to xdr_read_pages(). Essentially, it's
> the exact same procedure as we follow for a READ call.
> 
> So if we're crossing into the pages before we hit the call to
> xdr_read_pages() then that means we've allocated too small a header
> buffer. Since it only appears to happen with RPCSEC_GSS, then my money
> would be on AUTH_GSS not padding the reply buffer sufficiently when
> setting the value of auth->au_cslack.

If replies are the problem, why wouldn't we want to focus on
auth->au_rslack and auth->au_ralign?

FWIW I have both Debian Trixie and Sid/Forky VMs, and krb5{,i,p} is
working across the board for me.  Normally I just use a plain MIT KDC,
so I tried IPA and that works fine too.  Looking at Tyler's tracepoint
output, these two jump out:

              ls-969   [003] .....   270.326933: rpc_buf_alloc:        task:00000008@00000005 callsize=3932 recvsize=176 status=0
                                                                                                                     ^^^
              ls-969   [003] .....   270.326936: rpc_xdr_reply_pages:  task:00000008@00000005 head=[0xffff8895c29fef64,140] page=4008(88) tail=[0xffff8895c29feff0,36] len=0
                                                                                                                       ^^^

Contrast that with what I see on my own systems:
              ls-13558   [000] ..... 419637.290876: rpc_buf_alloc: task:00000008@00000007 callsize=3932 recvsize=148 status=0
                                                                                                                 ^^^ 
              ls-13558   [000] ..... 419637.290879: rpc_xdr_reply_pages: task:00000008@00000007 head=[0000000050ca7092,144] page=4008(88) tail=[000000007b84934f,4] len=0
                                                                                                                       ^^^
Those values for the receive size and the head iov length are consistent
across all my VMs (not just my Debian ones).
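
To make the comparison above easier to repeat on other captures, here is a
minimal sketch in Python. It is not part of any NFS tooling, and the helper
name head_tail_lengths is made up for illustration. It pulls only the head and
tail iov lengths out of the two rpc_xdr_reply_pages events quoted above and
prints the differences.

```python
import re

# The two rpc_xdr_reply_pages events quoted above, minus the process and
# timestamp prefix.
TYLER = ("rpc_xdr_reply_pages: task:00000008@00000005 "
         "head=[0xffff8895c29fef64,140] page=4008(88) "
         "tail=[0xffff8895c29feff0,36] len=0")
MINE = ("rpc_xdr_reply_pages: task:00000008@00000007 "
        "head=[0000000050ca7092,144] page=4008(88) "
        "tail=[000000007b84934f,4] len=0")

# Hypothetical helper pattern: capture only the two iov lengths.
IOV_LENGTHS = re.compile(r"head=\[[^,\]]+,(\d+)\].*tail=\[[^,\]]+,(\d+)\]")


def head_tail_lengths(event):
    """Return (head_len, tail_len) from one rpc_xdr_reply_pages event."""
    match = IOV_LENGTHS.search(event)
    if match is None:
        raise ValueError("no head/tail iov found in event")
    return int(match.group(1)), int(match.group(2))


tyler_head, tyler_tail = head_tail_lengths(TYLER)
my_head, my_tail = head_tail_lengths(MINE)
print("head:", tyler_head, "vs", my_head, "delta", tyler_head - my_head)
print("tail:", tyler_tail, "vs", my_tail, "delta", tyler_tail - my_tail)
```

The same function can be fed any other rpc_xdr_reply_pages line from a
trace-cmd report.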

> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com
> 



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-17  5:19                             ` Tyler W. Ross
  2025-11-17 13:41                               ` Chuck Lever
@ 2025-11-17 23:05                               ` Scott Mayhew
  1 sibling, 0 replies; 31+ messages in thread
From: Scott Mayhew @ 2025-11-17 23:05 UTC (permalink / raw)
  To: Tyler W. Ross
  Cc: Trond Myklebust, Chuck Lever, Anna Schumaker,
	Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

On Mon, 17 Nov 2025, Tyler W. Ross wrote:

> Weird behavior I just discovered:
> 
> Explicitly setting allowed-enctypes in the gssd section of /etc/nfs.conf
> to exclude aes256-cts-hmac-sha1-96 makes both SHA2 ciphers work as
> expected (assuming each is allowed).
> 
> If allowed-enctypes is unset (letting gssd interrogate the kernel for
> supported enctypes) or includes aes256-cts-hmac-sha1-96, then the XDR
> overflow occurs.
> 
> Non-working configurations (first is the commented-out default in nfs.conf):
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes256-cts-hmac-sha1-96,aes128-cts-hmac-sha1-96
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes256-cts-hmac-sha1-96
> allowed-enctypes=aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
> 
> Working configurations (first is default sans aes256-cts-hmac-sha1-96):
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes128-cts-hmac-sha1-96
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128
> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha1-96
> allowed-enctypes=aes128-cts-hmac-sha256-128,aes128-cts-hmac-sha1-96
> 

That doesn't really make sense.  You should only need to use the
allowed-enctypes setting if you're talking to an NFS server that doesn't
have support for the new encryption types.

It basically works like the "permitted_enctypes" option in krb5.conf,
except it only affects NFS rather than affecting your krb5 configuration
as a whole.

Can you go back and re-do the tracepoint capture, except this time
umount your NFS filesystems before starting the capture (i.e. perform
the mount command while trace-cmd is running).  I'm curious what values
the rpcgss_update_slack tracepoint shows.

> 
> Is this gssd mishandling some setup/initialization?
> Or is there a miscalculation happening somewhere further up?
> 
> 
> TWR
> 



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-17 22:54                             ` Scott Mayhew
@ 2025-11-18  4:10                               ` Tyler W. Ross
  2025-11-18 17:52                                 ` Scott Mayhew
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-18  4:10 UTC (permalink / raw)
  To: Scott Mayhew
  Cc: Trond Myklebust, Chuck Lever, Anna Schumaker,
	Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

On 11/17/25 3:54 PM, Scott Mayhew wrote:
> FWIW I have both Debian Trixie and Sid/Forky VMs, and krb5{,i,p} is
> working across the board for me.  Normally I just use a plain MIT KDC,
> so I tried IPA and that works fine too.

Did you confirm the enctype used?

My repro steps, from initial mounted state:
kinit
kvno -e aes256-cts-hmac-sha384-192 <nfs spn>
ls /mnt/example

On my Debian Sid VM, if I do kinit and then immediately ls, the issue 
does not occur. klist shows the acquired service ticket has an
aes256-cts-hmac-sha1-96 session key.


TWR



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
@ 2025-11-18  4:32 Tyler W. Ross
  0 siblings, 0 replies; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-18  4:32 UTC (permalink / raw)
  To: Scott Mayhew
  Cc: Trond Myklebust, Chuck Lever, Anna Schumaker,
	Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

On 11/17/25 4:05 PM, Scott Mayhew wrote:
> On Mon, 17 Nov 2025, Tyler W. Ross wrote:
> 
>> Weird behavior I just discovered:
>>
>> Explicitly setting allowed-enctypes in the gssd section of /etc/nfs.conf
>> to exclude aes256-cts-hmac-sha1-96 makes both SHA2 ciphers work as
>> expected (assuming each is allowed).
>>
>> If allowed-enctypes is unset (letting gssd interrogate the kernel for
>> supported enctypes) or includes aes256-cts-hmac-sha1-96, then the XDR
>> overflow occurs.
>>
>> Non-working configurations (first is the commented-out default in nfs.conf):
>> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes256-cts-hmac-sha1-96,aes128-cts-hmac-sha1-96
>> allowed-enctypes=aes256-cts-hmac-sha384-192,aes256-cts-hmac-sha1-96
>> allowed-enctypes=aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
>> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96
>>
>> Working configurations (first is default sans aes256-cts-hmac-sha1-96):
>> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,camellia256-cts-cmac,camellia128-cts-cmac,aes128-cts-hmac-sha1-96
>> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128
>> allowed-enctypes=aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha1-96
>> allowed-enctypes=aes128-cts-hmac-sha256-128,aes128-cts-hmac-sha1-96
>>
> 
> That doesn't really make sense.  You should only need to use the
> allowed-enctypes setting if you're talking to an NFS server that doesn't
> have support for the new encryption types.
> 
> It basically works like the "permitted_enctypes" option in krb5.conf,
> except it only affects NFS rather than affecting your krb5 configuration
> as a whole.

Agreed. It really doesn't make sense. It may just be me being confounded 
by some ancillary behavior I don't understand.
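
For what it's worth, the pattern in the two lists can be written down as a
single rule and checked mechanically. The sketch below (Python; the names
TRIGGER and rule_predicts_overflow are made up for illustration) only restates
the observation that the overflow shows up whenever aes256-cts-hmac-sha1-96 is
in the allowed list. It says nothing about the root cause.

```python
TRIGGER = "aes256-cts-hmac-sha1-96"

# (allowed-enctypes value, did the XDR overflow occur?) as listed above.
OBSERVED = [
    ("aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,"
     "camellia256-cts-cmac,camellia128-cts-cmac,"
     "aes256-cts-hmac-sha1-96,aes128-cts-hmac-sha1-96", True),
    ("aes256-cts-hmac-sha384-192,aes256-cts-hmac-sha1-96", True),
    ("aes128-cts-hmac-sha256-128,aes256-cts-hmac-sha1-96", True),
    ("aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,"
     "aes256-cts-hmac-sha1-96", True),
    ("aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128,"
     "camellia256-cts-cmac,camellia128-cts-cmac,"
     "aes128-cts-hmac-sha1-96", False),
    ("aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha256-128", False),
    ("aes256-cts-hmac-sha384-192,aes128-cts-hmac-sha1-96", False),
    ("aes128-cts-hmac-sha256-128,aes128-cts-hmac-sha1-96", False),
]


def rule_predicts_overflow(allowed_enctypes):
    """Candidate rule: overflow iff the trigger enctype is allowed."""
    return TRIGGER in allowed_enctypes.split(",")


mismatches = [cfg for cfg, overflow in OBSERVED
              if rule_predicts_overflow(cfg) != overflow]
print("configurations the rule gets wrong:", len(mismatches))
```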

I find it especially strange that
allowed-enctypes=aes256-cts-hmac-sha384-192 works, but unset
allowed-enctypes with a manually acquired aes256-cts-hmac-sha384-192 
ticket doesn't work.

allowed-enctypes=aes256-cts-hmac-sha384-192 works both with an 
automatically acquired service ticket (kinit then ls) and a manually 
acquired service ticket (via kvno -e).

> Can you go back and re-do the tracepoint capture, except this time
> umount your NFS filesystems before starting the capture (i.e. perform
> the mount command while trace-cmd is running).  I'm curious what values
> the rpcgss_update_slack tracepoint shows.

Here are the 2 rpcgss_update_slack occurrences, with a couple lines of 
context. Let me know if you'd like the full report: it's ~1300 lines.

mount.nfs4-1043  [005] .....   190.746932: rpc_task_run_action:  task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
mount.nfs4-1043  [005] .....   190.746932: rpc_task_run_action:  task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
mount.nfs4-1043  [005] .....   190.746933: rpc_xdr_recvfrom:     task:00000002@00000001 head=[0xffff8a61a2848fd4,4392] page=0(0) tail=[(nil),0] len=312
mount.nfs4-1043  [005] .....   190.746938: rpcgss_update_slack:  task:00000002@00000001 xid=0xb28269cc auth=0xffff8a6189400798 rslack=19 ralign=11 verfsize=9
mount.nfs4-1043  [005] .....   190.746939: rpc_task_run_action:  task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1043  [005] .....   190.746939: rpc_task_end:         task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1043  [005] .....   190.746940: rpc_stats_latency:    task:00000002@00000001 xid=0xb28269cc nfsv4 EXCHANGE_ID backlog=12836 rtt=136 execute=12995 xprt_id=1
--
mount.nfs4-1043  [002] .....   190.755687: rpc_task_run_action:  task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
mount.nfs4-1043  [002] .....   190.755687: rpc_task_run_action:  task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
mount.nfs4-1043  [002] .....   190.755688: rpc_xdr_recvfrom:     task:00000001@00000002 head=[0xffff8a6182b4e6ac,2920] page=0(0) tail=[(nil),0] len=192
mount.nfs4-1043  [002] .....   190.755691: rpcgss_update_slack:  task:00000001@00000002 xid=0xb68269cc auth=0xffff8a6187759498 rslack=9 ralign=9 verfsize=9
mount.nfs4-1043  [002] .....   190.755694: rpc_task_run_action:  task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1043  [002] .....   190.755694: rpc_task_end:         task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1043  [002] .....   190.755694: rpc_stats_latency:    task:00000001@00000002 xid=0xb68269cc nfsv4 LOOKUP_ROOT backlog=7101 rtt=91 execute=7218 xprt_id=1


And here's with allowed-enctypes=aes256-cts-hmac-sha384-192

mount.nfs4-1100  [005] .....   580.221598: rpc_task_run_action:  task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
mount.nfs4-1100  [005] .....   580.221598: rpc_task_run_action:  task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
mount.nfs4-1100  [005] .....   580.221598: rpc_xdr_recvfrom:     task:00000002@00000001 head=[0xffff8b2b98850fd4,4392] page=0(0) tail=[(nil),0] len=336
mount.nfs4-1100  [005] .....   580.221604: rpcgss_update_slack:  task:00000002@00000001 xid=0x4c050148 auth=0xffff8b2b88864818 rslack=25 ralign=14 verfsize=12
mount.nfs4-1100  [005] .....   580.221605: rpc_task_run_action:  task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1100  [005] .....   580.221606: rpc_task_end:         task:00000002@00000001 flags=DYNAMIC|NO_ROUND_ROBIN|SOFT|SENT|TIMEOUT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1100  [005] .....   580.221607: rpc_stats_latency:    task:00000002@00000001 xid=0x4c050148 nfsv4 EXCHANGE_ID backlog=13249 rtt=164 execute=13435 xprt_id=1
--
mount.nfs4-1100  [000] .....   580.230841: rpc_task_run_action:  task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_status
mount.nfs4-1100  [000] .....   580.230841: rpc_task_run_action:  task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=call_decode
mount.nfs4-1100  [000] .....   580.230841: rpc_xdr_recvfrom:     task:00000001@00000002 head=[0xffff8b2ba07b66ac,2920] page=0(0) tail=[(nil),0] len=204
mount.nfs4-1100  [000] .....   580.230845: rpcgss_update_slack:  task:00000001@00000002 xid=0x50050148 auth=0xffff8b2b88864b18 rslack=12 ralign=12 verfsize=12
mount.nfs4-1100  [000] .....   580.230847: rpc_task_run_action:  task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1100  [000] .....   580.230847: rpc_task_end:         task:00000001@00000002 flags=MOVEABLE|DYNAMIC|SENT|NORTO|CRED_NOREF runstate=RUNNING|0x4 status=0 action=rpc_exit_task
mount.nfs4-1100  [000] .....   580.230848: rpc_stats_latency:    task:00000001@00000002 xid=0x50050148 nfsv4 LOOKUP_ROOT backlog=7760 rtt=98 execute=7878 xprt_id=1
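
A rough way to read the verfsize values in the two captures is sketched below
in Python. It rests on assumptions that are not stated anywhere in this
thread: that the slack fields are counted in 4-byte XDR words, that the
verifier is a krb5 MIC token with a 16-byte header followed by the truncated
HMAC, and that two extra words cover the verifier's flavor and length. The
helper names are made up for illustration.

```python
import re

# The EXCHANGE_ID rpcgss_update_slack events from the two captures above.
DEFAULT_CAPTURE = ("rpcgss_update_slack:  task:00000002@00000001 "
                   "xid=0xb28269cc auth=0xffff8a6189400798 "
                   "rslack=19 ralign=11 verfsize=9")
SHA384_ONLY_CAPTURE = ("rpcgss_update_slack:  task:00000002@00000001 "
                       "xid=0x4c050148 auth=0xffff8b2b88864818 "
                       "rslack=25 ralign=14 verfsize=12")

XDR_UNIT = 4     # assumption: fields are counted in 4-byte XDR words
MIC_HEADER = 16  # assumption: krb5 MIC token header length in bytes


def slack_fields(event):
    """Extract rslack, ralign and verfsize as integers."""
    pairs = re.findall(r"(rslack|ralign|verfsize)=(\d+)", event)
    return {name: int(value) for name, value in pairs}


def expected_verfsize(checksum_bytes):
    """Verifier size in XDR words under the assumptions stated above."""
    token_bytes = MIC_HEADER + checksum_bytes
    token_words = (token_bytes + XDR_UNIT - 1) // XDR_UNIT
    return token_words + 2  # assumed: flavor word + length word


for label, event in (("default", DEFAULT_CAPTURE),
                     ("sha384 only", SHA384_ONLY_CAPTURE)):
    print(label, slack_fields(event))
print("96-bit checksum  ->", expected_verfsize(12), "words")
print("192-bit checksum ->", expected_verfsize(24), "words")
```

Under those assumptions a 96-bit checksum gives the verfsize seen in the
default capture and a 192-bit checksum gives the one seen with
allowed-enctypes=aes256-cts-hmac-sha384-192.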



TWR



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-18  4:10                               ` Tyler W. Ross
@ 2025-11-18 17:52                                 ` Scott Mayhew
  2025-11-18 23:43                                   ` Tyler W. Ross
  0 siblings, 1 reply; 31+ messages in thread
From: Scott Mayhew @ 2025-11-18 17:52 UTC (permalink / raw)
  To: Tyler W. Ross
  Cc: Trond Myklebust, Chuck Lever, Anna Schumaker,
	Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

On Tue, 18 Nov 2025, Tyler W. Ross wrote:

> On 11/17/25 3:54 PM, Scott Mayhew wrote:
> > FWIW I have both Debian Trixie and Sid/Forky VMs, and krb5{,i,p} is
> > working across the board for me.  Normally I just use a plain MIT KDC,
> > so I tried IPA and that works fine too.
> 
> Did you confirm the enctype used?

Yes.  This is how I was testing:

root@forky:~# uname -r
6.17.7+deb14+1-amd64
root@forky:~# systemctl restart rpc-gssd
root@forky:~# klist -ce /tmp/krb5ccmachine_SMAYHEW.TEST
klist: No credentials cache found (filename: /tmp/krb5ccmachine_SMAYHEW.TEST)
root@forky:~# for serv in forky trixie rawhide rhel10 rhel9; do for flav in krb5 krb5i krb5p; do mount -o v4.2,sec=$flav $serv.smayhew.test:/export /mnt/t; ls -lR /mnt/t >/dev/null; umount /mnt/t; done; done
root@forky:~# klist -ce /tmp/krb5ccmachine_SMAYHEW.TEST
Ticket cache: FILE:/tmp/krb5ccmachine_SMAYHEW.TEST
Default principal: nfs/forky.smayhew.test@SMAYHEW.TEST

Valid starting     Expires            Service principal
11/14/25 14:53:03  11/15/25 14:53:03  krbtgt/SMAYHEW.TEST@SMAYHEW.TEST
        Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192
11/14/25 14:53:03  11/15/25 14:53:03  nfs/forky.smayhew.test@SMAYHEW.TEST
        Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192
11/14/25 14:53:03  11/15/25 14:53:03  nfs/trixie.smayhew.test@SMAYHEW.TEST
        Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192
11/14/25 14:53:03  11/15/25 14:53:03  nfs/rawhide.smayhew.test@SMAYHEW.TEST
        Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192
11/14/25 14:53:04  11/15/25 14:53:03  nfs/rhel10.smayhew.test@SMAYHEW.TEST
        Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192
11/14/25 14:53:05  11/15/25 14:53:03  nfs/rhel9.smayhew.test@SMAYHEW.TEST
        Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192

> 
> My repro steps, from initial mounted state:
> kinit
> kvno -e aes256-cts-hmac-sha384-192 <nfs spn>
> ls /mnt/example
> 
> On my Debian Sid VM, if I do kinit and then immediately ls, the issue 
> does not occur. klist shows the acquired service ticket has an
> aes256-cts-hmac-sha1-96 session key.

Oh!  I see the problem.  If the automatically acquired service ticket
for a normal user is using aes256-cts-hmac-sha1-96, then I'm assuming
the machine credential is also using aes256-cts-hmac-sha1-96.
Run 'klist -ce /tmp/krb5ccmachine_IPA.TWRLAB.NET' to check.  You can't
use 'kvno -e' to choose a different encryption type.  Why are you doing
that?  Is it because you want to use the stronger encryption types?  In
that case, the proper way to do this would be to manually add this line
to the "[libdefaults]" stanza of your /etc/krb5.conf:

  permitted_enctypes = aes256-cts-hmac-sha384-192 aes128-cts-hmac-sha256-128 aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96

and get rid of allowed-enctypes settings that you may have added to
/etc/nfs.conf.  Then unmount, run 'systemctl restart rpc-gssd', remount,
etc. and your system should be using aes256-cts-hmac-sha384-192 by default.

RHEL/CentOS/Fedora all ship a package called "crypto-policies" that
includes system-wide configurations for various crypto packages.  For
kerberos, it drops a config snippet in /etc/krb5.conf.d similar to what
I have above.  AFAICT Suse has this package too, but it appears Debian
does not.

Without the permitted_enctypes setting, the kerberos library will fall
back to the default settings, which according to krb5.conf(5) are:

---8<---
       permitted_enctypes
              Identifies the encryption types that servers will permit for ses‐
              sion keys and for ticket and authenticator encryption, ordered by
              preference from highest to lowest.   Starting  in  release  1.18,
              this  tag also acts as the default value for default_tgs_enctypes
              and default_tkt_enctypes.  The default  value  for  this  tag  is
              aes256-cts-hmac-sha1-96                   aes128-cts-hmac-sha1-96
              aes256-cts-hmac-sha384-192             aes128-cts-hmac-sha256-128
              des3-cbc-sha1    arcfour-hmac-md5   camellia256-cts-cmac   camel‐
              lia128-cts-cmac.
---8<---
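
To illustrate why the ordering matters, here is a deliberately simplified
sketch in Python. It assumes the session key ends up being the first enctype
in the client's preference list that the service also supports. The real
selection is made by the KDC and depends on the keys the service principal
actually has, so treat this as a model and not as the library's algorithm.
SERVICE_SUPPORTS and first_mutual_enctype are made-up names.

```python
# Preference order from the krb5.conf(5) excerpt above (library default).
LIBRARY_DEFAULT = [
    "aes256-cts-hmac-sha1-96", "aes128-cts-hmac-sha1-96",
    "aes256-cts-hmac-sha384-192", "aes128-cts-hmac-sha256-128",
    "des3-cbc-sha1", "arcfour-hmac-md5",
    "camellia256-cts-cmac", "camellia128-cts-cmac",
]

# Order from the permitted_enctypes line suggested above.
SHA2_FIRST = [
    "aes256-cts-hmac-sha384-192", "aes128-cts-hmac-sha256-128",
    "aes256-cts-hmac-sha1-96", "aes128-cts-hmac-sha1-96",
]

# Assumption for illustration: the service accepts all four AES enctypes.
SERVICE_SUPPORTS = set(SHA2_FIRST)


def first_mutual_enctype(client_preference, service_supports):
    """Return the first client-preferred enctype the service supports."""
    for enctype in client_preference:
        if enctype in service_supports:
            return enctype
    return None


print("default order:", first_mutual_enctype(LIBRARY_DEFAULT, SERVICE_SUPPORTS))
print("SHA2 first   :", first_mutual_enctype(SHA2_FIRST, SERVICE_SUPPORTS))
```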

If I remove that line from my krb5.conf and use 'kvno -e' like your
test, then I can reproduce the behavior you're seeing:

root@forky:~# systemctl restart rpc-gssd
root@forky:~# mount -o v4.2,sec=krb5 trixie.smayhew.test:/export /mnt/t
root@forky:~# klist -ce /tmp/krb5ccmachine_SMAYHEW.TEST 
Ticket cache: FILE:/tmp/krb5ccmachine_SMAYHEW.TEST
Default principal: nfs/forky.smayhew.test@SMAYHEW.TEST

Valid starting     Expires            Service principal
11/18/25 17:41:29  11/19/25 17:15:04  krbtgt/SMAYHEW.TEST@SMAYHEW.TEST
	Etype (skey, tkt): aes256-cts-hmac-sha1-96, camellia256-cts-cmac 
11/18/25 17:41:29  11/19/25 17:15:04  nfs/trixie.smayhew.test@SMAYHEW.TEST
	Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha384-192 
root@forky:~# su - smayhew
smayhew@forky:~$ kinit
Password for smayhew@SMAYHEW.TEST: 
smayhew@forky:~$ kvno -e aes256-cts-hmac-sha384-192 nfs/trixie.smayhew.test
nfs/trixie.smayhew.test@SMAYHEW.TEST: kvno = 1
smayhew@forky:~$ klist -ce 
Ticket cache: KEYRING:persistent:1052000003:1052000003
Default principal: smayhew@SMAYHEW.TEST

Valid starting     Expires            Service principal
11/18/25 17:41:53  11/19/25 17:20:27  nfs/trixie.smayhew.test@SMAYHEW.TEST
	Etype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192 
11/18/25 17:41:39  11/19/25 17:20:27  krbtgt/SMAYHEW.TEST@SMAYHEW.TEST
	Etype (skey, tkt): aes256-cts-hmac-sha1-96, camellia256-cts-cmac 
smayhew@forky:~$ ls /mnt/t
ls: reading directory '/mnt/t': Input/output error
smayhew@forky:~$ 
logout
root@forky:~# grep overflow /sys/kernel/debug/tracing/trace
              ls-2032    [002] .....  3025.593816: rpc_xdr_overflow: task:00000009@00000006 nfsv4 READDIR requested=8 p=00000000dfba8950 end=00000000b97e329e xdr=[00000000389cc91a,132]/4008/[00000000b97e329e,4]/988

-Scott
> 
> 
> TWR
> 



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-18 17:52                                 ` Scott Mayhew
@ 2025-11-18 23:43                                   ` Tyler W. Ross
  2025-11-19  4:50                                     ` Salvatore Bonaccorso
  0 siblings, 1 reply; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-18 23:43 UTC (permalink / raw)
  To: Scott Mayhew
  Cc: Trond Myklebust, Chuck Lever, Anna Schumaker,
	Salvatore Bonaccorso, 1120598@bugs.debian.org, Jeff Layton,
	NeilBrown, Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	linux-nfs, linux-kernel

On 11/18/25 10:52 AM, Scott Mayhew wrote:
> Oh!  I see the problem.  If the automatically acquired service ticket
> for a normal user is using aes256-cts-hmac-sha1-96, then I'm assuming
> the machine credential is also using aes256-cts-hmac-sha1-96.
> Run 'klist -ce /tmp/krb5ccmachine_IPA.TWRLAB.NET' to check.  You can't
> use 'kvno -e' to choose a different encryption type.  Why are you doing
> that?

Aha! Thank you!

That's exactly the case: the machine credential is
aes256-cts-hmac-sha1-96.

So, taking a step back for context/background: this issue was escalated 
to me by someone attempting to use constrained delegation via gssproxy. 
In the course of troubleshooting that, we found (by examining the 
krb5kdc logs on the IPA server) that the NFS service ticket acquired by 
gssproxy had an aes256-cts-hmac-sha384-192 session key.

Not understanding that the machine and user tickets must have matching
enctypes, I ended up down this rabbit hole thinking the problem was with 
the SHA2 enctypes. Sorry to bring you all with me on that misadventure.



The actual issue at hand then seems to be that gssproxy is requesting 
(and receiving) a service ticket with an unusable (for the NFS mount) 
enctype, when performing constrained delegation/S4U2Proxy.

krb5kdc logs of gssproxy performing S4U2Self and S4U2Proxy:

Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): TGS_REQ (8 etypes
{aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), 
aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19), 
UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23), 
camellia128-cts-cmac(25), camellia256-cts-cmac(26)}) 10.108.2.105: 
ISSUE: authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18), 
tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha1-96(18)}, 
host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for 
host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): ... 
PROTOCOL-TRANSITION s4u-client=jsmith@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): closing 
down fd 4
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): TGS_REQ (4 
etypes {aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19), 
aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17)}) 10.108.2.105: 
ISSUE: authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18), 
tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha384-192(20)}, 
host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for 
nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): ... 
CONSTRAINED-DELEGATION s4u-client=jsmith@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): closing 
down fd 11
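
For anyone who wants to pick the same fields out of their own krb5kdc log, a
small sketch in Python follows. The two entries are the TGS_REQ lines above,
re-joined and trimmed to the etype fields. The function name summarize is made
up for illustration.

```python
import re

# The two TGS_REQ entries above, re-joined and trimmed to the etype fields.
S4U2SELF = ("TGS_REQ (8 etypes {aes256-cts-hmac-sha1-96(18), "
            "aes128-cts-hmac-sha1-96(17), aes256-cts-hmac-sha384-192(20), "
            "aes128-cts-hmac-sha256-128(19), UNSUPPORTED:des3-hmac-sha1(16), "
            "DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), "
            "camellia256-cts-cmac(26)}) 10.108.2.105: ISSUE: authtime "
            "1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18), "
            "tkt=aes256-cts-hmac-sha384-192(20), "
            "ses=aes256-cts-hmac-sha1-96(18)}")
S4U2PROXY = ("TGS_REQ (4 etypes {aes256-cts-hmac-sha384-192(20), "
             "aes128-cts-hmac-sha256-128(19), aes256-cts-hmac-sha1-96(18), "
             "aes128-cts-hmac-sha1-96(17)}) 10.108.2.105: ISSUE: authtime "
             "1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18), "
             "tkt=aes256-cts-hmac-sha384-192(20), "
             "ses=aes256-cts-hmac-sha384-192(20)}")


def summarize(entry):
    """Return (first requested enctype, issued session-key enctype)."""
    requested = re.search(r"TGS_REQ \(\d+ etypes \{([^}]*)\}\)", entry).group(1)
    # Drop the trailing numeric enctype id, e.g. "(18)".
    names = [re.sub(r"\(\d+\)$", "", item) for item in requested.split(", ")]
    session = re.search(r"ses=([^(]+)\(\d+\)", entry).group(1)
    return names[0], session


for label, entry in (("S4U2Self", S4U2SELF), ("S4U2Proxy", S4U2PROXY)):
    first, session = summarize(entry)
    print(label, "first requested:", first, "session key:", session)
```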


On the Fedora 43 client, gssproxy also acquires an
aes256-cts-hmac-sha384-192 service ticket, but the machine credential is 
aes256-cts-hmac-sha384-192 and everything works as-expected.


TWR



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-18 23:43                                   ` Tyler W. Ross
@ 2025-11-19  4:50                                     ` Salvatore Bonaccorso
  2025-11-19 13:36                                       ` Scott Mayhew
  2025-11-19 20:54                                       ` Simon Josefsson
  0 siblings, 2 replies; 31+ messages in thread
From: Salvatore Bonaccorso @ 2025-11-19  4:50 UTC (permalink / raw)
  To: Tyler W. Ross
  Cc: Scott Mayhew, Trond Myklebust, Chuck Lever, Anna Schumaker,
	1120598@bugs.debian.org, Jeff Layton, NeilBrown, Steve Dickson,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel,
	Simon Josefsson

Hi,

On Tue, Nov 18, 2025 at 11:43:29PM +0000, Tyler W. Ross wrote:
> On 11/18/25 10:52 AM, Scott Mayhew wrote:
> > Oh!  I see the problem.  If the automatically acquired service ticket
> > for a normal user is using aes256-cts-hmac-sha1-96, then I'm assuming
> > the machine credential is also using aes256-cts-hmac-sha1-96.
> > Run 'klist -ce /tmp/krb5ccmachine_IPA.TWRLAB.NET' to check.  You can't
> > use 'kvno -e' to choose a different encryption type.  Why are you doing
> > that?
> 
> Aha! Thank you!

Thanks to all helping to debug this issue when reported downstream in
Debian, your time invested is very much appreciated!

> That's exactly the case: the machine credential is
> aes256-cts-hmac-sha1-96.
> 
> So, taking a step back for context/background: this issue was escalated to
> me by someone attempting to use constrained delegation via gssproxy. In the
> course of troubleshooting that, we found (by examining the krb5kdc logs on
> the IPA server) that the NFS service ticket acquired by gssproxy had an
> aes256-cts-hmac-sha384-192 session key.
> 
> Not understanding that the machine and user tickets must have matching
> enctypes, I ended up down this rabbit hole thinking the problem
> was with the SHA2 enctypes. Sorry to bring you all with me on that
> misadventure.
> 
> 
> 
> The actual issue at hand then seems to be that gssproxy is requesting (and
> receiving) a service ticket with an unusable (for the NFS mount) enctype,
> when performing constrained delegation/S4U2Proxy.
> 
> krb5kdc logs of gssproxy performing S4U2Self and S4U2Proxy:
> 
> Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): TGS_REQ (8 etypes
> {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17),
> aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19),
> UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23),
> camellia128-cts-cmac(25), camellia256-cts-cmac(26)}) 10.108.2.105: ISSUE:
> authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18),
> tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha1-96(18)},
> host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for
> host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET
> Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info):
> ... PROTOCOL-TRANSITION s4u-client=jsmith@IPA.TWRLAB.NET
> Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): closing down
> fd 4
> Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): TGS_REQ (4
> etypes {aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19),
> aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17)}) 10.108.2.105:
> ISSUE: authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18),
> tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha384-192(20)},
> host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for
> nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET
> Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): ...
> CONSTRAINED-DELEGATION s4u-client=jsmith@IPA.TWRLAB.NET
> Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): closing down
> fd 11
> 
> 
> On the Fedora 43 client, gssproxy also acquires an
> aes256-cts-hmac-sha384-192 service ticket, but the machine credential is
> aes256-cts-hmac-sha384-192 and everything works as-expected.

I'm looping in here the gssproxy maintainer as well. Simon, this is
about https://bugs.debian.org/1120598 . I assume there is nothing on
gssproxy side which can be done to warn about the situation, quoting
again:

> The actual issue at hand then seems to be that gssproxy is requesting (and
> receiving) a service ticket with an unusable (for the NFS mount) enctype,
> when performing constrained delegation/S4U2Proxy.

?

Regards,
Salvatore


* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-19  4:50                                     ` Salvatore Bonaccorso
@ 2025-11-19 13:36                                       ` Scott Mayhew
  2025-11-19 20:54                                       ` Simon Josefsson
  1 sibling, 0 replies; 31+ messages in thread
From: Scott Mayhew @ 2025-11-19 13:36 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Tyler W. Ross, Trond Myklebust, Chuck Lever, Anna Schumaker,
	1120598@bugs.debian.org, Jeff Layton, NeilBrown, Steve Dickson,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel,
	Simon Josefsson

On Wed, 19 Nov 2025, Salvatore Bonaccorso wrote:

> Hi,
> 
> On Tue, Nov 18, 2025 at 11:43:29PM +0000, Tyler W. Ross wrote:
> > On 11/18/25 10:52 AM, Scott Mayhew wrote:
> > > Oh!  I see the problem.  If the automatically acquired service ticket
> > > for a normal user is using aes256-cts-hmac-sha1-96, then I'm assuming
> > > the machine credential is also using aes256-cts-hmac-sha1-96.
> > > Run 'klist -ce /tmp/krb5ccmachine_IPA.TWRLAB.NET' to check.  You can't
> > > use 'kvno -e' to choose a different encryption type.  Why are you doing
> > > that?
> > 
> > Aha! Thank you!
> 
> Thanks to all helping to debug this issue when reported downstream in
> Debian, your time invested is very much appreciated!

While I still assert that if you want to use the stronger encryption
types with NFS, then you should prioritize those encryption types higher
in your kerberos configuration... after discussing this yesterday with
Olga I think the above scenario should probably work too.

I just sent a patch that makes that happen, but I forgot to add
"--in-reply-to" to my "git send-email" command, so here's the link:

https://lore.kernel.org/linux-nfs/20251119133231.3660975-1-smayhew@redhat.com/T/#u

-Scott

> 
> > That's exactly the case: the machine credential is
> > aes256-cts-hmac-sha1-96.
> > 
> > So, taking a step back for context/background: this issue was escalated to
> > me by someone attempting to use constrained delegation via gssproxy. In the
> > course of troubleshooting that, we found (by examining the krb5kdc logs on
> > the IPA server) that the NFS service ticket acquired by gssproxy had an
> > aes256-cts-hmac-sha384-192 session key.
> > 
> > Not understanding that the machine and user tickets must have matching
> > enctypes, I ended up down this rabbit hole thinking the problem
> > was with the SHA2 enctypes. Sorry to bring you all with me on that
> > misadventure.
> > 
> > 
> > 
> > The actual issue at hand then seems to be that gssproxy is requesting (and
> > receiving) a service ticket with an unusable (for the NFS mount) enctype,
> > when performing constrained delegation/S4U2Proxy.
> > 
> > krb5kdc logs of gssproxy performing S4U2Self and S4U2Proxy:
> > Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): TGS_REQ (8 etypes
> > {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17),
> > aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19),
> > UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23),
> > camellia128-cts-cmac(25), camellia256-cts-cmac(26)}) 10.108.2.105: ISSUE:
> > authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18),
> > tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha1-96(18)},
> > host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for
> > host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET
> > Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info):
> > ... PROTOCOL-TRANSITION s4u-client=jsmith@IPA.TWRLAB.NET
> > Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): closing down
> > fd 4
> > Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): TGS_REQ (4
> > etypes {aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19),
> > aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17)}) 10.108.2.105:
> > ISSUE: authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18),
> > tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha384-192(20)},
> > host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for
> > nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET
> > Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): ...
> > CONSTRAINED-DELEGATION s4u-client=jsmith@IPA.TWRLAB.NET
> > Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): closing down
> > fd 11
> > 
> > 
> > On the Fedora 43 client, gssproxy also acquires an
> > aes256-cts-hmac-sha384-192 service ticket, but the machine credential is
> > aes256-cts-hmac-sha384-192 and everything works as expected.
> 
> I'm looping in here the gssproxy maintainer as well. Simon, this is
> about https://bugs.debian.org/1120598 . I assume there is nothing on
> gssproxy side which can be done to warn about the situation, quoting
> again:
> 
> > The actual issue at hand then seems to be that gssproxy is requesting (and
> > receiving) a service ticket with an unusable (for the NFS mount) enctype,
> > when performing constrained delegation/S4U2Proxy.
> 
> ?
> 
> Regards,
> Salvatore
> 



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
@ 2025-11-19 17:19 Tyler W. Ross
  0 siblings, 0 replies; 31+ messages in thread
From: Tyler W. Ross @ 2025-11-19 17:19 UTC (permalink / raw)
  To: Scott Mayhew, Salvatore Bonaccorso
  Cc: Trond Myklebust, Chuck Lever, Anna Schumaker,
	1120598@bugs.debian.org, Jeff Layton, NeilBrown, Steve Dickson,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, linux-kernel,
	Simon Josefsson

On 11/19/25 6:36 AM, Scott Mayhew wrote:
> While I still assert that if you want to use the stronger encryption
> types with NFS, then you should prioritize those encryption types higher
> in your kerberos configuration... after discussing this yesterday with
> Olga I think the above scenario should probably work too.

There was no intent or attempt to specify encryption types in the
original configuration. Fiddling with enctypes only came up in the
course of troubleshooting.

This issue was found/replicated by:
1. Configuring a stock Debian 13 client using ipa-client-install against
a freshly deployed Fedora 43 IPA instance.
2. Adding a krb5 NFS entry in fstab
3. Installing and enabling gssproxy (use-gss-proxy=1 in nfs.conf)
4. Configuring gssproxy for constrained delegation as described in the
docs:
[service/nfs-client]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:FILE:/var/lib/gssproxy/clients/krb5cc_%U
  cred_usage = initiate
  allow_any_uid = yes
  impersonate = true
  euid = 0
5. Allowing constrained delegation on the IPA/KDC side


I think this should be a working configuration: it shouldn't be
necessary to change the enctypes from default for this to work.
But the above results in the aes256-cts-hmac-sha1-96 machine credential
and aes256-cts-hmac-sha384-192 client ticket situation.
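
One way to spot this mismatch on a client is to compare the session-key
enctypes that klist -e reports for the machine credential cache and for
the user's service ticket. The following is a minimal Python sketch, not
part of nfs-utils or gssproxy: the function names are hypothetical, the
sample strings are illustrative, and it assumes the "Etype (skey, tkt):"
line format shown in the klist -e output earlier in this thread.

```python
import re

# Matches the enctype line printed by "klist -e", for example:
#   Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha384-192
# Group 1 is the session-key enctype, group 2 the ticket enctype.
ETYPE_RE = re.compile(r"Etype \(skey, tkt\):\s*([^,\s]+),\s*(\S+)")


def session_key_enctypes(klist_output):
    """Return the session-key enctype of every ticket in klist -e output."""
    return [m.group(1) for m in ETYPE_RE.finditer(klist_output)]


def enctypes_match(machine_klist, user_klist):
    """True if both caches use exactly the same set of session-key enctypes."""
    machine = set(session_key_enctypes(machine_klist))
    user = set(session_key_enctypes(user_klist))
    # An empty cache is treated as "no match" rather than a trivial match.
    return bool(machine) and machine == user


# Illustrative inputs modeled on the situation described above (assumption:
# only the session-key column matters for the comparison).
MACHINE_KLIST = (
    "\tEtype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha384-192 \n"
)
USER_KLIST = (
    "\tEtype (skey, tkt): aes256-cts-hmac-sha384-192, aes256-cts-hmac-sha384-192 \n"
)
```

With these inputs, enctypes_match(MACHINE_KLIST, USER_KLIST) is False,
which corresponds to the failing combination described above.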


> I just sent a patch that makes that happen, but I forgot to add
> "--in-reply-to" to my "git send-email" command, so here's the link:
> 
> https://lore.kernel.org/linux-nfs/20251119133231.3660975-1-smayhew@redhat.com/T/#u

Thanks, Scott.

So it's technically possible for the machine and client credentials to
have mismatched enctypes? There are just assumptions (like the slack
variable calculations) that need to be changed to support that?



I'm also wondering if the gssproxy behavior is correct. I obviously
don't understand all the nuance here, but it appears gssproxy is
requesting the service ticket with a different preference/order of
enctypes -- which leads to this mismatch situation.

Looking at the KDC logs (below), the protocol transition request has
enctypes matching the default permitted_enctypes described in
krb5.conf(5) (i.e., with aes256-cts-hmac-sha1-96 first). But then the
constrained delegation request lists aes256-cts-hmac-sha384-192 first,
which I assume indicates preference and is why that enctype is issued.

Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): TGS_REQ (8 etypes {aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17), aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19), UNSUPPORTED:des3-hmac-sha1(16), DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), camellia256-cts-cmac(26)}) 10.108.2.105: ISSUE: authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha1-96(18)}, host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): ... PROTOCOL-TRANSITION s4u-client=jsmith@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): closing down fd 4
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): TGS_REQ (4 etypes {aes256-cts-hmac-sha384-192(20), aes128-cts-hmac-sha256-128(19), aes256-cts-hmac-sha1-96(18), aes128-cts-hmac-sha1-96(17)}) 10.108.2.105: ISSUE: authtime 1763506600, etypes {rep=aes256-cts-hmac-sha1-96(18), tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha384-192(20)}, host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): ... CONSTRAINED-DELEGATION s4u-client=jsmith@IPA.TWRLAB.NET
Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): closing down fd 11
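
The ordering difference can also be checked mechanically. This is a
rough Python sketch (the helper names are hypothetical, and the regular
expressions assume the krb5kdc TGS_REQ log format exactly as it appears
in the lines above) that extracts the requested etype list and the
issued session-key etype from each request:

```python
import re

# Requested list, e.g. "TGS_REQ (4 etypes {a(20), b(19)})"
REQ_RE = re.compile(r"TGS_REQ \(\d+ etypes \{([^}]*)\}\)")
# Issued etypes, e.g. "etypes {rep=a(18), tkt=b(20), ses=c(20)}"
ISSUED_RE = re.compile(r"etypes \{rep=([^,]+), tkt=([^,]+), ses=([^}]+)\}")


def parse_tgs_req(line):
    """Parse one TGS_REQ log line; return None for any other line."""
    req = REQ_RE.search(line)
    issued = ISSUED_RE.search(line)
    if not req or not issued:
        return None
    requested = [e.strip() for e in req.group(1).split(",")]
    rep, tkt, ses = (g.strip() for g in issued.groups())
    return {"requested": requested, "rep": rep, "tkt": tkt, "ses": ses}


def got_first_choice(info):
    """True if the issued session key is the first etype the client listed."""
    return info["ses"] == info["requested"][0]


# The two TGS_REQ lines from the KDC log above, split only for readability.
S4U2SELF_LINE = (
    "Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8463](info): "
    "TGS_REQ (8 etypes {aes256-cts-hmac-sha1-96(18), "
    "aes128-cts-hmac-sha1-96(17), aes256-cts-hmac-sha384-192(20), "
    "aes128-cts-hmac-sha256-128(19), UNSUPPORTED:des3-hmac-sha1(16), "
    "DEPRECATED:arcfour-hmac(23), camellia128-cts-cmac(25), "
    "camellia256-cts-cmac(26)}) 10.108.2.105: ISSUE: authtime 1763506600, "
    "etypes {rep=aes256-cts-hmac-sha1-96(18), "
    "tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha1-96(18)}, "
    "host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for "
    "host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET"
)
S4U2PROXY_LINE = (
    "Nov 18 18:06:51 directory.ipa.twrlab.net krb5kdc[8465](info): "
    "TGS_REQ (4 etypes {aes256-cts-hmac-sha384-192(20), "
    "aes128-cts-hmac-sha256-128(19), aes256-cts-hmac-sha1-96(18), "
    "aes128-cts-hmac-sha1-96(17)}) 10.108.2.105: ISSUE: authtime 1763506600, "
    "etypes {rep=aes256-cts-hmac-sha1-96(18), "
    "tkt=aes256-cts-hmac-sha384-192(20), ses=aes256-cts-hmac-sha384-192(20)}, "
    "host/nfsclient.ipa.twrlab.net@IPA.TWRLAB.NET for "
    "nfs/nfssrv.ipa.twrlab.net@IPA.TWRLAB.NET"
)
```

For these two lines, got_first_choice() is true in both cases: each
request is issued the first etype on its own list, and the session keys
differ only because the two lists are ordered differently.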



TWR



* Re: ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2
  2025-11-19  4:50                                     ` Salvatore Bonaccorso
  2025-11-19 13:36                                       ` Scott Mayhew
@ 2025-11-19 20:54                                       ` Simon Josefsson
  1 sibling, 0 replies; 31+ messages in thread
From: Simon Josefsson @ 2025-11-19 20:54 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Tyler W. Ross, Scott Mayhew, Trond Myklebust, Chuck Lever,
	Anna Schumaker, 1120598@bugs.debian.org, Jeff Layton, NeilBrown,
	Steve Dickson, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	linux-kernel

Salvatore Bonaccorso <carnil@debian.org> writes:

> I'm looping in here the gssproxy maintainer as well. Simon, this is
> about https://bugs.debian.org/1120598 . I assume there is nothing on
> gssproxy side which can be done to warn about the situation, quoting
> again:
>
>> The actual issue at hand then seems to be that gssproxy is requesting (and
>> receiving) a service ticket with an unusable (for the NFS mount) enctype,
>> when performing constrained delegation/S4U2Proxy.
>
> ?

It isn't clear to me if the gssproxy behaviour is buggy or just
sub-optimal, but it seems like gssproxy upstream could develop some
patch to make the enctypes match.  I'm not sure if that is generally a
safe thing, even if it would fix the problem.  Anyway, this definitely
looks beyond any Debian-specific concern about gssproxy, so I think some
upstream recommendation is needed here.  I don't have a working NFSv4 gss
setup available to debug this.

/Simon


end of thread, other threads:[~2025-11-19 21:15 UTC | newest]

Thread overview: 31+ messages
     [not found] <176298368872.955.14091113173156448257.reportbug@nfsclient-sid.ipa.twrlab.net>
2025-11-13  5:00 ` ls input/output error ("NFS: readdir(/) returns -5") on krb5 NFSv4 client using SHA2 Salvatore Bonaccorso
2025-11-13 14:30   ` Chuck Lever
2025-11-13 17:16     ` Tyler W. Ross
2025-11-13 17:47       ` Chuck Lever
2025-11-13 18:05         ` Tyler W. Ross
2025-11-13 18:12           ` Chuck Lever
2025-11-13 18:51             ` Tyler W. Ross
2025-11-13 18:57               ` Chuck Lever
2025-11-13 21:21         ` Salvatore Bonaccorso
2025-11-13 21:23           ` Chuck Lever
2025-11-13 22:20             ` Salvatore Bonaccorso
2025-11-13 22:30               ` Chuck Lever
2025-11-14  4:35                 ` Tyler W. Ross
2025-11-14  5:09                   ` Tyler W. Ross
2025-11-14 14:18                     ` Chuck Lever
2025-11-16  0:38                       ` Tyler W. Ross
2025-11-16 16:29                         ` Chuck Lever
2025-11-16 18:21                           ` Trond Myklebust
2025-11-17  5:19                             ` Tyler W. Ross
2025-11-17 13:41                               ` Chuck Lever
2025-11-17 18:38                                 ` Tyler W. Ross
2025-11-17 23:05                               ` Scott Mayhew
2025-11-17 22:54                             ` Scott Mayhew
2025-11-18  4:10                               ` Tyler W. Ross
2025-11-18 17:52                                 ` Scott Mayhew
2025-11-18 23:43                                   ` Tyler W. Ross
2025-11-19  4:50                                     ` Salvatore Bonaccorso
2025-11-19 13:36                                       ` Scott Mayhew
2025-11-19 20:54                                       ` Simon Josefsson
2025-11-18  4:32 Tyler W. Ross
  -- strict thread matches above, loose matches on Subject: below --
2025-11-19 17:19 Tyler W. Ross
