* Random IO errors on nfs clients running linux > 4.20
@ 2020-04-29 17:15 Alberto Gonzalez Iniesta
2020-04-30 6:08 ` Alberto Gonzalez Iniesta
2020-04-30 17:32 ` J. Bruce Fields
0 siblings, 2 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-04-29 17:15 UTC (permalink / raw)
To: linux-nfs; +Cc: Miguel Rodriguez, Isaac Marco Blancas
Hello NFS maintainers,
I'm sorry for reporting this (a little bit) late, but it took us (Miguel
in Cc:) some time to track this issue to an exact kernel update.
We're running an NFS server with more than 200 clients on Ubuntu 16.04
and 18.04. The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working for some years
now without issues.
But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04,
some of them started experiencing failures while working on NFS mounts.
The failures are arbitrary, and sometimes it takes more than 20 minutes
for one to show up (which made finding out which kernel version introduced
this a pain). We are almost sure that some directories are more prone to
suffer from this than others (maybe related to path length/characters?).
The error is also not very "verbose". From an strace:
execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
[lots of uninteresting output]
openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, TCGETS, 0x7ffd8b725c80) = -1 ENOTTY (Inappropriate ioctl for device)
getdents(3, /* 35 entries */, 32768) = 1936
[lots of lstat calls]
lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
getdents(3, 0x561af78de890, 32768) = -1 EIO (Input/output error)
(I can send you the full output if you need it)
We can run the previous "ls -lR" 20 times and get no error, or get
"ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
("ls: reading directory 'Becas y ayudas/': Input/output error") every
now and then.
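Since the failure only shows up "every now and then", a small retry loop makes the reproduction less tedious. This is just a sketch; the `retry_until_fail` helper is ours, not from the original report, and the directory path is the one mentioned above.

```shell
#!/bin/sh
# retry_until_fail CMD...: re-run CMD until it exits non-zero, then
# print how many runs succeeded before the failure appeared.
retry_until_fail() {
    n=0
    while "$@" >/dev/null 2>&1; do
        n=$((n + 1))
    done
    echo "$n"
}

# Example (path from the report; adjust to the affected NFS directory):
# retry_until_fail ls -lR "Becas y ayudas/"
```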
The error happens (obviously?) with ls, rsync and the users' GUI tools.
There's nothing in dmesg (or elsewhere).
These are the kernels we tried:
4.18.0-25 -> Can't reproduce
4.19.0 -> Can't reproduce
4.20.17 -> Happening (hard to reproduce)
5.0.0-15 -> Happening (hard to reproduce)
5.3.0-45 -> Happening (more frequently)
5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
We did long (as in day-long) testing trying to reproduce this with all
those kernel versions, so we are pretty sure 4.18 and 4.19 don't
experience this, and our Ubuntu 16.04 clients don't have any issue.
I know we aren't providing much info, but we are willing to do
all the testing required (we already spent lots of time on it).
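The day-long testing above narrows the regression to somewhere between 4.19 (good) and 4.20 (bad). One way to narrow it further, purely as a hypothetical sketch (not something the reporters describe doing), is a mainline kernel bisect between those tags; with a failure that can take 20+ minutes to appear, each "good" verdict needs a long soak.

```shell
# Hypothetical sketch: bisect mainline between the last-good and
# first-bad versions reported above, in a kernel git checkout.
git bisect start v4.20 v4.19
# For each step: build and boot the candidate kernel, try to reproduce
# the EIO for long enough to trust a negative, then record the result:
#   git bisect bad     # error reproduced
#   git bisect good    # error did not reproduce after a long soak
# Repeat until git names the first bad commit, then:
#   git bisect reset
```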
Thanks for your work.
Regards,
Alberto
--
Alberto González Iniesta | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid
^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Random IO errors on nfs clients running linux > 4.20
  2020-04-29 17:15 Random IO errors on nfs clients running linux > 4.20 Alberto Gonzalez Iniesta
@ 2020-04-30  6:08 ` Alberto Gonzalez Iniesta
  2020-04-30 17:32 ` J. Bruce Fields
  1 sibling, 0 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-04-30 6:08 UTC (permalink / raw)
To: linux-nfs

On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> Hello NFS maintainers,
>
> I know we aren't providing much info but we are willing to do
> all the testing required (we already spent lots of time on it).

Hi,

Sorry, I was providing way too little info... We're using NFSv4 with
Kerberos; mounts are done with:

mount -t nfs4 -o sec=krb5p,exec,noauto pluto.XXXX:/publico /media/pluto

Regards,

Alberto

--
Alberto Gonzalez Iniesta   | Training, consulting and technical support
mailto/sip: agi@inittab.org | in GNU/Linux and free software
Encrypted mail preferred    | http://inittab.com

Key fingerprint = 5347 CBD8 3E30 A9EB 4D7D 4BF2 009B 3375 6B9A AA55
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-04-29 17:15 Random IO errors on nfs clients running linux > 4.20 Alberto Gonzalez Iniesta
  2020-04-30  6:08 ` Alberto Gonzalez Iniesta
@ 2020-04-30 17:32 ` J. Bruce Fields
  2020-09-09  9:29 ` Alberto Gonzalez Iniesta
  1 sibling, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2020-04-30 17:32 UTC (permalink / raw)
To: Alberto Gonzalez Iniesta; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> I'm sorry for reporting this (a little bit) late, but it took us (Miguel
> in Cc:) some time to track this issue to an exact kernel update.
>
> We're running an NFS server with more than 200 clients on Ubuntu 16.04
> and 18.04. The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
> nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working for some years
> now without issues.
>
> But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04,
> some of them started experiencing failures while working on NFS mounts.
> The failures are arbitrary, and sometimes it takes more than 20 minutes
> for one to show up (which made finding out which kernel version introduced
> this a pain). We are almost sure that some directories are more prone to
> suffer from this than others (maybe related to path length/characters?).
>
> The error is also not very "verbose". From an strace:
>
> execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
> [lots of uninteresting output]
> openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
> fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> ioctl(1, TCGETS, 0x7ffd8b725c80) = -1 ENOTTY (Inappropriate ioctl for device)
> getdents(3, /* 35 entries */, 32768) = 1936
> [lots of lstat calls]
> lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> getdents(3, 0x561af78de890, 32768) = -1 EIO (Input/output error)

Ideas off the top of my head....

It'd be really useful to get a network trace--something like tcpdump -s0
-wtmp.pcap -i<interface>, then reproduce the problem, then look through
it to see if you can find the READDIR or STAT or whatever results
in the unexpected EIO. But if it takes a while to reproduce, that may be
difficult.

Is there anything in the logs?

It might be worth turning on some more debug logging--see the "rpcdebug"
command.

--b.

> (I can send you the full output if you need it)
>
> We can run the previous "ls -lR" 20 times and get no error, or get
> "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> ("ls: reading directory 'Becas y ayudas/': Input/output error") every
> now and then.
>
> The error happens (obviously?) with ls, rsync and the users' GUI tools.
>
> There's nothing in dmesg (or elsewhere).
>
> These are the kernels we tried:
> 4.18.0-25 -> Can't reproduce
> 4.19.0 -> Can't reproduce
> 4.20.17 -> Happening (hard to reproduce)
> 5.0.0-15 -> Happening (hard to reproduce)
> 5.3.0-45 -> Happening (more frequently)
> 5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
>
> We did long (as in day-long) testing trying to reproduce this with all
> those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> experience this, and our Ubuntu 16.04 clients don't have any issue.
>
> I know we aren't providing much info, but we are willing to do
> all the testing required (we already spent lots of time on it).
>
> Thanks for your work.
>
> Regards,
>
> Alberto
>
> --
> Alberto González Iniesta | Universidad a Distancia
> alberto.gonzalez@udima.es | de Madrid
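Bruce's capture-and-debug suggestion above could look roughly like the following; the interface name and output paths here are placeholders, and everything must be run as root on the affected client.

```shell
# Sketch of the suggested debugging session.
tcpdump -s0 -w /tmp/nfs-eio.pcap -i eth0 &   # capture client traffic
rpcdebug -m nfs -s all                       # verbose NFS client logging
rpcdebug -m rpc -s all                       # verbose SUNRPC logging

# ...reproduce the EIO (e.g. loop "ls -lR" on the affected directory),
# then turn logging back off and stop the capture:

rpcdebug -m nfs -c all
rpcdebug -m rpc -c all
kill %1
```

The kernel debug messages land in the system log (dmesg/syslog), which is where the err1/err2/err3 attachments later in this thread come from.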
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-04-30 17:32 ` J. Bruce Fields
@ 2020-09-09  9:29 ` Alberto Gonzalez Iniesta
  2020-09-09 13:47 ` J. Bruce Fields
  0 siblings, 1 reply; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-09-09 9:29 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

[-- Attachment #1: Type: text/plain, Size: 4154 bytes --]

On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > I'm sorry for reporting this (a little bit) late, but it took us (Miguel
> > in Cc:) some time to track this issue to an exact kernel update.
> >
> > We're running an NFS server with more than 200 clients on Ubuntu 16.04
> > and 18.04. The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
> > nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working for some years
> > now without issues.
> >
> > But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04,
> > some of them started experiencing failures while working on NFS mounts.
> > The failures are arbitrary, and sometimes it takes more than 20 minutes
> > for one to show up (which made finding out which kernel version introduced
> > this a pain). We are almost sure that some directories are more prone to
> > suffer from this than others (maybe related to path length/characters?).
> >
> > The error is also not very "verbose". From an strace:
> >
> > execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
> > [lots of uninteresting output]
> > openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
> > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> > ioctl(1, TCGETS, 0x7ffd8b725c80) = -1 ENOTTY (Inappropriate ioctl for device)
> > getdents(3, /* 35 entries */, 32768) = 1936
> > [lots of lstat calls]
> > lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > getdents(3, 0x561af78de890, 32768) = -1 EIO (Input/output error)
>
> Ideas off the top of my head....
>
> It'd be really useful to get a network trace--something like tcpdump -s0
> -wtmp.pcap -i<interface>, then reproduce the problem, then look through
> it to see if you can find the READDIR or STAT or whatever results
> in the unexpected EIO. But if it takes a while to reproduce, that may be
> difficult.
>
> Is there anything in the logs?
>
> It might be worth turning on some more debug logging--see the "rpcdebug"
> command.

Hi, Bruce et al.

I'm sorry this reply took so long, but with debugging enabled the error
was harder to reproduce. I'm attaching 3 log files (2 with just the "nfs"
debugging module and one with both the "nfs" and "rpc" debugging modules
enabled). I'm also attaching a pcap file; I don't know whether it will be
useful, since we run "sec=krb5p".

Let me know if there's anything else I can test/provide. These tests
were done with Ubuntu's 5.3.0-53-generic kernel.

Thanks,

Alberto

> > (I can send you the full output if you need it)
> >
> > We can run the previous "ls -lR" 20 times and get no error, or get
> > "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > ("ls: reading directory 'Becas y ayudas/': Input/output error") every
> > now and then.
> >
> > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> >
> > There's nothing in dmesg (or elsewhere).
> > These are the kernels we tried:
> > 4.18.0-25 -> Can't reproduce
> > 4.19.0 -> Can't reproduce
> > 4.20.17 -> Happening (hard to reproduce)
> > 5.0.0-15 -> Happening (hard to reproduce)
> > 5.3.0-45 -> Happening (more frequently)
> > 5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
> >
> > We did long (as in day-long) testing trying to reproduce this with all
> > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > experience this, and our Ubuntu 16.04 clients don't have any issue.
> >
> > I know we aren't providing much info, but we are willing to do
> > all the testing required (we already spent lots of time on it).
> >
> > Thanks for your work.
> >
> > Regards,
> >
> > Alberto
> >
> > --
> > Alberto González Iniesta | Universidad a Distancia
> > alberto.gonzalez@udima.es | de Madrid

--
Alberto González Iniesta | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid

[-- Attachment #2: err3.txt --]
[-- Type: text/plain, Size: 10137 bytes --]

Sep 8 16:03:23 portatil264 kernel: [15032.854134] RPC: 3283 call_decode result 0
Sep 8 16:03:23 portatil264 kernel: [15032.854140] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep 8 16:03:23 portatil264 kernel: [15032.854142] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep 8 16:03:23 portatil264 kernel: [15032.854145] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
Sep 8 16:03:23 portatil264 kernel: [15032.854147] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep 8 16:03:23 portatil264 kernel: [15032.854148] nfs41_sequence_process: Error 0 free the slot
Sep 8 16:03:23 portatil264 kernel: [15032.854150] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
Sep 8 16:03:23 portatil264 kernel: [15032.854152] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep 8 16:03:23 portatil264 kernel: [15032.854154] RPC: 3283 return 0, status 0
Sep 8 16:03:23 portatil264 kernel: [15032.854155] RPC: 3283 release task
Sep 8 16:03:23 portatil264 kernel: [15032.854159] RPC: freeing buffer of size 4584 at 00000000a3649daf
Sep 8 16:03:23 portatil264 kernel: [15032.854162] RPC: 3283 release request 0000000079df89b2
Sep 8 16:03:23 portatil264 kernel: [15032.854164] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
Sep 8 16:03:23 portatil264 kernel: [15032.854166] RPC: rpc_release_client(00000000b930c343)
Sep 8 16:03:23 portatil264 kernel: [15032.854169] RPC: 3283 freeing task
Sep 8 16:03:23 portatil264 kernel: [15032.854179] NFS: nfs_update_inode(0:53/31982477 fh_crc=0x47b6a915 ct=1 info=0x427e7f)
Sep 8 16:03:23 portatil264 kernel: [15032.854183] NFS: (0:53/31982477) revalidation complete
Sep 8 16:03:23 portatil264 kernel: [15032.854186] NFS: dentry_delete(innovacion/Proyectos de investigación, 28080c)
Sep 8 16:03:23 portatil264 kernel: [15032.854240] NFS: readdir(departamentos/innovacion) starting at cookie 19
Sep 8 16:03:23 portatil264 kernel: [15032.854259] _nfs4_proc_readdir: dentry = departamentos/innovacion, cookie = 0
Sep 8 16:03:23 portatil264 kernel: [15032.854262] RPC: new task initialized, procpid 26029
Sep 8 16:03:23 portatil264 kernel: [15032.854263] RPC: allocated task 00000000493c7c0e
Sep 8 16:03:23 portatil264 kernel: [15032.854267] RPC: 3284 __rpc_execute flags=0x4080
Sep 8 16:03:23 portatil264 kernel: [15032.854269] --> nfs41_call_sync_prepare data->seq_server 00000000eb43854e
Sep 8 16:03:23 portatil264 kernel: [15032.854271] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8 16:03:23 portatil264 kernel: [15032.854273] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep 8 16:03:23 portatil264 kernel: [15032.854276] RPC: 3284 call_start nfs4 proc READDIR (sync)
Sep 8 16:03:23 portatil264 kernel: [15032.854278] RPC: 3284 call_reserve (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854281] RPC: 3284 reserved req 0000000079df89b2 xid 5dfa3876
Sep 8 16:03:23 portatil264 kernel: [15032.854282] RPC: 3284 call_reserveresult (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854284] RPC: 3284 call_refresh (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854287] RPC: 3284 call_refreshresult (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854289] RPC: 3284 call_allocate (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854292] RPC: 3284 allocated buffer of size 4144 at 00000000a3649daf
Sep 8 16:03:23 portatil264 kernel: [15032.854297] RPC: 3284 call_encode (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854301] RPC: gss_get_mic_v2
Sep 8 16:03:23 portatil264 kernel: [15032.854319] encode_sequence: sessionid=1582481124:15749:9224:0 seqid=195 slotid=0 max_slotid=0 cache_this=0
Sep 8 16:03:23 portatil264 kernel: [15032.854323] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b0a23a:00000000
Sep 8 16:03:23 portatil264 kernel: [15032.854325] RPC: gss_wrap_kerberos_v2
Sep 8 16:03:23 portatil264 kernel: [15032.854350] RPC: 3284 call_transmit (status 0)
Sep 8 16:03:23 portatil264 kernel: [15032.854352] RPC: 3284 xprt_prepare_transmit
Sep 8 16:03:23 portatil264 kernel: [15032.854389] RPC: xs_tcp_send_request(272) = 0
Sep 8 16:03:23 portatil264 kernel: [15032.854392] RPC: wake_up_first(000000001bb91292 "xprt_sending")
Sep 8 16:03:23 portatil264 kernel: [15032.854395] RPC: 3284 sleep_on(queue "xprt_pending" time 4298650492)
Sep 8 16:03:23 portatil264 kernel: [15032.854397] RPC: 3284 added to queue 00000000a9a9ecc7 "xprt_pending"
Sep 8 16:03:23 portatil264 kernel: [15032.854398] RPC: 3284 setting alarm for 60000 ms
Sep 8 16:03:23 portatil264 kernel: [15032.854401] RPC: 3284 sync task going to sleep
Sep 8 16:03:23 portatil264 kernel: [15032.935579] RPC: xs_data_ready...
Sep 8 16:03:23 portatil264 kernel: [15032.939135] RPC: xs_data_ready...
Sep 8 16:03:23 portatil264 kernel: [15032.941013] decode_attr_owner: uid=138184
Sep 8 16:03:23 portatil264 kernel: [15033.016025] RPC: xs_data_ready...
Sep 8 16:03:23 portatil264 kernel: [15033.016117] RPC: xs_data_ready...
Sep 8 16:03:23 portatil264 kernel: [15033.016124] RPC: 3284 xid 5dfa3876 complete (4064 bytes received)
Sep 8 16:03:23 portatil264 kernel: [15033.016129] RPC: 3284 __rpc_wake_up_task (now 4298650533)
Sep 8 16:03:23 portatil264 kernel: [15033.016131] RPC: 3284 disabling timer
Sep 8 16:03:23 portatil264 kernel: [15033.016136] RPC: 3284 removed from queue 00000000a9a9ecc7 "xprt_pending"
Sep 8 16:03:23 portatil264 kernel: [15033.016149] RPC: __rpc_wake_up_task done
Sep 8 16:03:23 portatil264 kernel: [15033.016231] RPC: 3284 sync task resuming
Sep 8 16:03:23 portatil264 kernel: [15033.016234] RPC: 3284 call_status (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016235] RPC: 3284 call_decode (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016240] RPC: gss_verify_mic_v2
Sep 8 16:03:23 portatil264 kernel: [15033.016276] RPC: 3284 call_decode result -5
Sep 8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot
Sep 8 16:03:23 portatil264 kernel: [15033.016286] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
Sep 8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep 8 16:03:23 portatil264 kernel: [15033.016290] RPC: 3284 return 0, status -5
Sep 8 16:03:23 portatil264 kernel: [15033.016291] RPC: 3284 release task
Sep 8 16:03:23 portatil264 kernel: [15033.016295] RPC: freeing buffer of size 4144 at 00000000a3649daf
Sep 8 16:03:23 portatil264 kernel: [15033.016298] RPC: 3284 release request 0000000079df89b2
Sep 8 16:03:23 portatil264 kernel: [15033.016300] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
Sep 8 16:03:23 portatil264 kernel: [15033.016302] RPC: rpc_release_client(00000000b930c343)
Sep 8 16:03:23 portatil264 kernel: [15033.016304] RPC: 3284 freeing task
Sep 8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
Sep 8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
Sep 8 16:03:23 portatil264 kernel: [15033.016788] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep 8 16:03:23 portatil264 kernel: [15033.016793] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep 8 16:03:23 portatil264 kernel: [15033.016797] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep 8 16:03:23 portatil264 kernel: [15033.016801] NFS: revalidating (0:53/30801922)
Sep 8 16:03:23 portatil264 kernel: [15033.016808] RPC: new task initialized, procpid 26029
Sep 8 16:03:23 portatil264 kernel: [15033.016810] RPC: allocated task 000000005afe768d
Sep 8 16:03:23 portatil264 kernel: [15033.016814] RPC: 3285 __rpc_execute flags=0x4080
Sep 8 16:03:23 portatil264 kernel: [15033.016817] --> nfs41_call_sync_prepare data->seq_server 00000000eb43854e
Sep 8 16:03:23 portatil264 kernel: [15033.016820] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8 16:03:23 portatil264 kernel: [15033.016823] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep 8 16:03:23 portatil264 kernel: [15033.016826] RPC: 3285 call_start nfs4 proc GETATTR (sync)
Sep 8 16:03:23 portatil264 kernel: [15033.016829] RPC: 3285 call_reserve (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016832] RPC: 3285 reserved req 0000000079df89b2 xid 5efa3876
Sep 8 16:03:23 portatil264 kernel: [15033.016834] RPC: 3285 call_reserveresult (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016836] RPC: 3285 call_refresh (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016840] RPC: 3285 call_refreshresult (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016844] RPC: 3285 call_allocate (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016848] RPC: 3285 allocated buffer of size 4584 at 000000007624e941
Sep 8 16:03:23 portatil264 kernel: [15033.016850] RPC: 3285 call_encode (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016857] RPC: gss_get_mic_v2
Sep 8 16:03:23 portatil264 kernel: [15033.016876] encode_sequence: sessionid=1582481124:15749:9224:0 seqid=196 slotid=0 max_slotid=0 cache_this=0
Sep 8 16:03:23 portatil264 kernel: [15033.016879] RPC: gss_wrap_kerberos_v2
Sep 8 16:03:23 portatil264 kernel: [15033.016907] RPC: 3285 call_transmit (status 0)
Sep 8 16:03:23 portatil264 kernel: [15033.016908] RPC: 3285 xprt_prepare_transmit
Sep 8 16:03:23 portatil264 kernel: [15033.016941] RPC: xs_tcp_send_request(248) = 0
Sep 8 16:03:23 portatil264 kernel: [15033.016944] RPC: wake_up_first(000000001bb91292 "xprt_sending")
Sep 8 16:03:23 portatil264 kernel: [15033.016947] RPC: 3285 sleep_on(queue "xprt_pending" time 4298650533)
Sep 8 16:03:23 portatil264 kernel: [15033.016948] RPC: 3285 added to queue 00000000a9a9ecc7 "xprt_pending"
Sep 8 16:03:23 portatil264 kernel: [15033.016949] RPC: 3285 setting alarm for 60000 ms
Sep 8 16:03:23 portatil264 kernel: [15033.016952] RPC: 3285 sync task going to sleep

[-- Attachment #3: err2.txt --]
[-- Type: text/plain, Size: 8642 bytes --]

Sep 8 11:03:08 portatil264 kernel: [  294.429356] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep 8 11:03:08 portatil264 kernel: [  294.429365] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep 8 11:03:08 portatil264 kernel: [  294.429369] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep 8 11:03:08 portatil264 kernel: [  294.429372] NFS: nfs_lookup_revalidate_done(departamentos/innovacion) is valid
Sep 8 11:03:08 portatil264 kernel: [  294.429377] NFS: revalidating (0:53/30801924)
Sep 8 11:03:08 portatil264 kernel: [  294.429392] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep 8 11:03:08 portatil264 kernel: [  294.429396] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8 11:03:08 portatil264 kernel: [  294.429399] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep 8 11:03:08 portatil264 kernel: [  294.429456] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=50 slotid=0 max_slotid=0 cache_this=0
Sep 8 11:03:09 portatil264 kernel: [  294.480184] decode_attr_type: type=040000
Sep 8 11:03:09 portatil264 kernel: [  294.480187] decode_attr_change: change attribute=6810276040818143508
Sep 8 11:03:09 portatil264 kernel: [  294.480189] decode_attr_size: file size=4096
Sep 8 11:03:09 portatil264 kernel: [  294.480190] decode_attr_fsid: fsid=(0x0/0x0)
Sep 8 11:03:09 portatil264 kernel: [  294.480192] decode_attr_fileid: fileid=30801924
Sep 8 11:03:09 portatil264 kernel: [  294.480193] decode_attr_fs_locations: fs_locations done, error = 0
Sep 8 11:03:09 portatil264 kernel: [  294.480194] decode_attr_mode: file mode=02770
Sep 8 11:03:09 portatil264 kernel: [  294.480196] decode_attr_nlink: nlink=19
Sep 8 11:03:09 portatil264 kernel: [  294.480208] decode_attr_owner: uid=0
Sep 8 11:03:09 portatil264 kernel: [  294.480212] decode_attr_group: gid=709
Sep 8 11:03:09 portatil264 kernel: [  294.480213] decode_attr_rdev: rdev=(0x0:0x0)
Sep 8 11:03:09 portatil264 kernel: [  294.480215] decode_attr_space_used: space used=4096
Sep 8 11:03:09 portatil264 kernel: [  294.480216] decode_attr_time_access: atime=1599472206
Sep 8 11:03:09 portatil264 kernel: [  294.480217] decode_attr_time_metadata: ctime=1585640954
Sep 8 11:03:09 portatil264 kernel: [  294.480218] decode_attr_time_modify: mtime=1585640954
Sep 8 11:03:09 portatil264 kernel: [  294.480220] decode_attr_mounted_on_fileid: fileid=30801924
Sep 8 11:03:09 portatil264 kernel: [  294.480221] decode_getfattr_attrs: xdr returned 0
Sep 8 11:03:09 portatil264 kernel: [  294.480223] decode_getfattr_generic: xdr returned 0
Sep 8 11:03:09 portatil264 kernel: [  294.480228] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep 8 11:03:09 portatil264 kernel: [  294.480230] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep 8 11:03:09 portatil264 kernel: [  294.480232] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep 8 11:03:09 portatil264 kernel: [  294.480233] nfs41_sequence_process: Error 0 free the slot
Sep 8 11:03:09 portatil264 kernel: [  294.480234] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep 8 11:03:09 portatil264 kernel: [  294.480246] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x427e7f)
Sep 8 11:03:09 portatil264 kernel: [  294.480249] NFS: (0:53/30801924) revalidation complete
Sep 8 11:03:09 portatil264 kernel: [  294.480252] NFS: dentry_delete(departamentos/innovacion, 28084c)
Sep 8 11:03:09 portatil264 kernel: [  294.480271] NFS: permission(0:53/28049412), mask=0x81, res=0
Sep 8 11:03:09 portatil264 kernel: [  294.480273] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep 8 11:03:09 portatil264 kernel: [  294.480275] NFS: permission(0:53/28051004), mask=0x81, res=0
Sep 8 11:03:09 portatil264 kernel: [  294.480277] NFS: revalidating (0:53/30801924)
Sep 8 11:03:09 portatil264 kernel: [  294.480282] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep 8 11:03:09 portatil264 kernel: [  294.480284] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8 11:03:09 portatil264 kernel: [  294.480285] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep 8 11:03:09 portatil264 kernel: [  294.480306] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=51 slotid=0 max_slotid=0 cache_this=0
Sep 8 11:03:09 portatil264 kernel: [  294.562935] decode_attr_type: type=040000
Sep 8 11:03:09 portatil264 kernel: [  294.562937] decode_attr_change: change attribute=6810276040818143508
Sep 8 11:03:09 portatil264 kernel: [  294.562939] decode_attr_size: file size=4096
Sep 8 11:03:09 portatil264 kernel: [  294.562940] decode_attr_fsid: fsid=(0x0/0x0)
Sep 8 11:03:09 portatil264 kernel: [  294.562941] decode_attr_fileid: fileid=30801924
Sep 8 11:03:09 portatil264 kernel: [  294.562942] decode_attr_fs_locations: fs_locations done, error = 0
Sep 8 11:03:09 portatil264 kernel: [  294.562943] decode_attr_mode: file mode=02770
Sep 8 11:03:09 portatil264 kernel: [  294.562944] decode_attr_nlink: nlink=19
Sep 8 11:03:09 portatil264 kernel: [  294.562950] decode_attr_owner: uid=0
Sep 8 11:03:09 portatil264 kernel: [  294.562954] decode_attr_group: gid=709
Sep 8 11:03:09 portatil264 kernel: [  294.562955] decode_attr_rdev: rdev=(0x0:0x0)
Sep 8 11:03:09 portatil264 kernel: [  294.562956] decode_attr_space_used: space used=4096
Sep 8 11:03:09 portatil264 kernel: [  294.562957] decode_attr_time_access: atime=1599472206
Sep 8 11:03:09 portatil264 kernel: [  294.562958] decode_attr_time_metadata: ctime=1585640954
Sep 8 11:03:09 portatil264 kernel: [  294.562961] decode_attr_time_modify: mtime=1585640954
Sep 8 11:03:09 portatil264 kernel: [  294.562962] decode_attr_mounted_on_fileid: fileid=30801924
Sep 8 11:03:09 portatil264 kernel: [  294.562963] decode_getfattr_attrs: xdr returned 0
Sep 8 11:03:09 portatil264 kernel: [  294.562964] decode_getfattr_generic: xdr returned 0
Sep 8 11:03:09 portatil264 kernel: [  294.562968] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep 8 11:03:09 portatil264 kernel: [  294.562970] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep 8 11:03:09 portatil264 kernel: [  294.562971] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep 8 11:03:09 portatil264 kernel: [  294.562973] nfs41_sequence_process: Error 0 free the slot
Sep 8 11:03:09 portatil264 kernel: [  294.562974] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep 8 11:03:09 portatil264 kernel: [  294.562982] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x427e7f)
Sep 8 11:03:09 portatil264 kernel: [  294.562984] NFS: (0:53/30801924) revalidation complete
Sep 8 11:03:09 portatil264 kernel: [  294.562987] NFS: nfs_lookup_revalidate_done(departamentos/innovacion) is valid
Sep 8 11:03:09 portatil264 kernel: [  294.562991] NFS: permission(0:53/30801924), mask=0x24, res=0
Sep 8 11:03:09 portatil264 kernel: [  294.562995] NFS: open dir(departamentos/innovacion)
Sep 8 11:03:09 portatil264 kernel: [  294.563017] NFS: readdir(departamentos/innovacion) starting at cookie 0
Sep 8 11:03:09 portatil264 kernel: [  294.563029] _nfs4_proc_readdir: dentry = departamentos/innovacion, cookie = 0
Sep 8 11:03:09 portatil264 kernel: [  294.563034] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep 8 11:03:09 portatil264 kernel: [  294.563036] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8 11:03:09 portatil264 kernel: [  294.563037] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep 8 11:03:09 portatil264 kernel: [  294.563057] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=52 slotid=0 max_slotid=0 cache_this=0
Sep 8 11:03:09 portatil264 kernel: [  294.563060] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b0a23a:00000000
Sep 8 11:03:09 portatil264 kernel: [  294.643573] nfs41_sequence_process: Error 1 free the slot
Sep 8 11:03:09 portatil264 kernel: [  294.643576] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep 8 11:03:09 portatil264 kernel: [  294.643583] _nfs4_proc_readdir: returns -5
Sep 8 11:03:09 portatil264 kernel: [  294.643589] NFS: readdir(departamentos/innovacion) returns -5
Sep 8 11:03:09 portatil264 kernel: [  294.644109] NFS: dentry_delete(departamentos/innovacion, 28084c)
Sep 8 11:03:10 portatil264 kernel: [  296.012504] nfs4_renew_state: start
Sep 8 11:03:10 portatil264 kernel: [  296.012510] nfs4_schedule_state_renewal: requeueing work. Lease period = 59
Sep 8 11:03:10 portatil264 kernel: [  296.012513] nfs4_renew_state: done

[-- Attachment #4: err1.txt --]
[-- Type: text/plain, Size: 16867 bytes --]

Sep 8 11:02:03 portatil264 kernel: [  229.334583] FS-Cache: Loaded
Sep 8 11:02:03 portatil264 kernel: [  229.374147] FS-Cache: Netfs 'nfs' registered for caching
Sep 8 11:02:04 portatil264 kernel: [  229.528716] NFS: Registering the id_resolver key type
Sep 8 11:02:04 portatil264 kernel: [  229.528731] Key type id_resolver registered
Sep 8 11:02:04 portatil264 kernel: [  229.528732] Key type id_legacy registered
Sep 8 11:02:52 portatil264 kernel: [  278.316243] NFS: permission(0:53/28049412), mask=0x81, res=-10
Sep 8 11:02:52 portatil264 kernel: [  278.316247] NFS: revalidating (0:53/28049412)
Sep 8 11:02:52 portatil264 kernel: [  278.316260] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep 8 11:02:52 portatil264 kernel: [  278.316263] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8 11:02:52 portatil264 kernel: [  278.316265] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Sep 8 11:02:52 portatil264 kernel: [  278.316309] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=44 slotid=0 max_slotid=0 cache_this=0
Sep 8 11:02:52 portatil264 kernel: [  278.373736] decode_attr_type: type=040000
Sep 8 11:02:52 portatil264 kernel: [  278.373738] decode_attr_change: change attribute=6769953312283491654
Sep 8 11:02:52 portatil264 kernel: [  278.373739] decode_attr_size: file size=4096
Sep 8 11:02:52 portatil264 kernel: [  278.373741] decode_attr_fsid: fsid=(0x0/0x0)
Sep 8 11:02:52 portatil264 kernel: [  278.373742] decode_attr_fileid: fileid=28049412
Sep 8 11:02:52 portatil264 kernel: [  278.373743] decode_attr_fs_locations: fs_locations done, error = 0
Sep 8 11:02:52 portatil264 kernel: [  278.373745] decode_attr_mode: file mode=0755
Sep 8 11:02:52 portatil264 kernel: [  278.373746] decode_attr_nlink: nlink=6
Sep 8 11:02:52 portatil264 kernel: [  278.373757] decode_attr_owner: uid=0
Sep 8 11:02:52 portatil264 kernel: [  278.373761] decode_attr_group: gid=0
Sep 8 11:02:52 portatil264 kernel: [  278.373763] decode_attr_rdev: rdev=(0x0:0x0)
Sep 8 11:02:52 portatil264 kernel: [  278.373764] decode_attr_space_used: space used=4096
Sep 8 11:02:52 portatil264 kernel: [  278.373765] decode_attr_time_access: atime=1599538321
Sep 8 11:02:52 portatil264 kernel: [  278.373766] decode_attr_time_metadata: ctime=1576252587
Sep 8 11:02:52 portatil264 kernel: [  278.373767] decode_attr_time_modify: mtime=1576252587
Sep 8 11:02:52 portatil264 kernel: [  278.373768] decode_attr_mounted_on_fileid: fileid=28049412
Sep 8 11:02:52 portatil264 kernel: [  278.373770] decode_getfattr_attrs: xdr returned 0
Sep 8 11:02:52 portatil264 kernel: [  278.373771] decode_getfattr_generic: xdr returned 0
Sep 8 11:02:52 portatil264 kernel: [  278.373776] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31
Sep 8 11:02:52 portatil264 kernel: [  278.373778] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Sep 8 11:02:52 portatil264 kernel: [  278.373780] nfs4_free_slot: slotid 1 highest_used_slotid 0
Sep 8 11:02:52 portatil264 kernel: [  278.373781] nfs41_sequence_process: Error 0 free the slot
Sep 8 11:02:52 portatil264 kernel: [  278.373782] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Sep 8 11:02:52 portatil264 kernel: [  278.373794] NFS: nfs_update_inode(0:53/28049412 fh_crc=0xc875a563 ct=2 info=0x427e7f)
Sep 8 11:02:52 portatil264 kernel: [  278.373797] NFS: (0:53/28049412) revalidation complete
Sep 8 11:02:52 portatil264 kernel: [  278.373800] NFS: permission(0:53/28049412), mask=0x1, res=0
Sep 8 11:02:52 portatil264 kernel: [  278.373806] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid
Sep 8 11:02:52 portatil264 kernel: [  278.373812] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5
Sep 8 11:02:52 portatil264 kernel: [  278.373814] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31
Sep 8
11:02:52 portatil264 kernel: [ 278.373815] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Sep 8 11:02:52 portatil264 kernel: [ 278.373834] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=45 slotid=0 max_slotid=0 cache_this=0 Sep 8 11:02:52 portatil264 kernel: [ 278.430991] decode_attr_type: type=00 Sep 8 11:02:52 portatil264 kernel: [ 278.430994] decode_attr_change: change attribute=6792489597889629835 Sep 8 11:02:52 portatil264 kernel: [ 278.430995] decode_attr_size: file size=4096 Sep 8 11:02:52 portatil264 kernel: [ 278.430996] decode_attr_fsid: fsid=(0x0/0x0) Sep 8 11:02:52 portatil264 kernel: [ 278.430997] decode_attr_fileid: fileid=0 Sep 8 11:02:52 portatil264 kernel: [ 278.430998] decode_attr_fs_locations: fs_locations done, error = 0 Sep 8 11:02:52 portatil264 kernel: [ 278.430999] decode_attr_mode: file mode=00 Sep 8 11:02:52 portatil264 kernel: [ 278.431000] decode_attr_nlink: nlink=1 Sep 8 11:02:52 portatil264 kernel: [ 278.431001] decode_attr_rdev: rdev=(0x0:0x0) Sep 8 11:02:52 portatil264 kernel: [ 278.431002] decode_attr_space_used: space used=0 Sep 8 11:02:52 portatil264 kernel: [ 278.431003] decode_attr_time_access: atime=0 Sep 8 11:02:52 portatil264 kernel: [ 278.431004] decode_attr_time_metadata: ctime=1581499725 Sep 8 11:02:52 portatil264 kernel: [ 278.431005] decode_attr_time_modify: mtime=1581499725 Sep 8 11:02:52 portatil264 kernel: [ 278.431006] decode_attr_mounted_on_fileid: fileid=0 Sep 8 11:02:52 portatil264 kernel: [ 278.431007] decode_getfattr_attrs: xdr returned 0 Sep 8 11:02:52 portatil264 kernel: [ 278.431008] decode_getfattr_generic: xdr returned 0 Sep 8 11:02:52 portatil264 kernel: [ 278.431012] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31 Sep 8 11:02:52 portatil264 kernel: [ 278.431014] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Sep 8 11:02:52 portatil264 kernel: [ 278.431016] nfs4_free_slot: slotid 1 highest_used_slotid 0 Sep 8 11:02:52 portatil264 kernel: [ 278.431017] 
nfs41_sequence_process: Error 0 free the slot Sep 8 11:02:52 portatil264 kernel: [ 278.431018] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Sep 8 11:02:52 portatil264 kernel: [ 278.431026] NFS: nfs_update_inode(0:53/28051004 fh_crc=0x3843eff0 ct=1 info=0x26040) Sep 8 11:02:52 portatil264 kernel: [ 278.431033] NFS: permission(0:53/28051004), mask=0x1, res=0 Sep 8 11:02:52 portatil264 kernel: [ 278.431037] NFS: lookup(departamentos/innovacion) Sep 8 11:02:52 portatil264 kernel: [ 278.431039] NFS call lookup innovacion Sep 8 11:02:52 portatil264 kernel: [ 278.431043] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5 Sep 8 11:02:52 portatil264 kernel: [ 278.431045] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31 Sep 8 11:02:52 portatil264 kernel: [ 278.431047] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Sep 8 11:02:52 portatil264 kernel: [ 278.431066] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=46 slotid=0 max_slotid=0 cache_this=0 Sep 8 11:02:53 portatil264 kernel: [ 278.585742] decode_attr_type: type=040000 Sep 8 11:02:53 portatil264 kernel: [ 278.585745] decode_attr_change: change attribute=6810276040818143508 Sep 8 11:02:53 portatil264 kernel: [ 278.585746] decode_attr_size: file size=4096 Sep 8 11:02:53 portatil264 kernel: [ 278.585747] decode_attr_fsid: fsid=(0x0/0x0) Sep 8 11:02:53 portatil264 kernel: [ 278.585748] decode_attr_fileid: fileid=30801924 Sep 8 11:02:53 portatil264 kernel: [ 278.585749] decode_attr_fs_locations: fs_locations done, error = 0 Sep 8 11:02:53 portatil264 kernel: [ 278.585750] decode_attr_mode: file mode=02770 Sep 8 11:02:53 portatil264 kernel: [ 278.585751] decode_attr_nlink: nlink=19 Sep 8 11:02:53 portatil264 kernel: [ 278.585758] decode_attr_owner: uid=0 Sep 8 11:02:53 portatil264 kernel: [ 278.898245] decode_attr_group: gid=709 Sep 8 11:02:53 portatil264 kernel: [ 278.898249] decode_attr_rdev: rdev=(0x0:0x0) Sep 8 11:02:53 portatil264 kernel: [ 278.898250] 
decode_attr_space_used: space used=4096 Sep 8 11:02:53 portatil264 kernel: [ 278.898251] decode_attr_time_access: atime=1599472206 Sep 8 11:02:53 portatil264 kernel: [ 278.898252] decode_attr_time_metadata: ctime=1585640954 Sep 8 11:02:53 portatil264 kernel: [ 278.898253] decode_attr_time_modify: mtime=1585640954 Sep 8 11:02:53 portatil264 kernel: [ 278.898255] decode_attr_mounted_on_fileid: fileid=30801924 Sep 8 11:02:53 portatil264 kernel: [ 278.898256] decode_getfattr_attrs: xdr returned 0 Sep 8 11:02:53 portatil264 kernel: [ 278.898258] decode_getfattr_generic: xdr returned 0 Sep 8 11:02:53 portatil264 kernel: [ 278.898266] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31 Sep 8 11:02:53 portatil264 kernel: [ 278.898268] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Sep 8 11:02:53 portatil264 kernel: [ 278.898270] nfs4_free_slot: slotid 1 highest_used_slotid 0 Sep 8 11:02:53 portatil264 kernel: [ 278.898271] nfs41_sequence_process: Error 0 free the slot Sep 8 11:02:53 portatil264 kernel: [ 278.898273] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Sep 8 11:02:53 portatil264 kernel: [ 278.898281] NFS reply lookup: 0 Sep 8 11:02:53 portatil264 kernel: [ 278.898292] NFS: nfs_fhget(0:53/30801924 fh_crc=0x21ccf1a0 ct=1) Sep 8 11:02:53 portatil264 kernel: [ 278.898302] NFS: dentry_delete(departamentos/innovacion, 20080c) Sep 8 11:02:53 portatil264 kernel: [ 278.898330] NFS: permission(0:53/28049412), mask=0x81, res=0 Sep 8 11:02:53 portatil264 kernel: [ 278.898334] NFS: nfs_lookup_revalidate_done(publico/departamentos) is valid Sep 8 11:02:53 portatil264 kernel: [ 278.898336] NFS: permission(0:53/28051004), mask=0x81, res=0 Sep 8 11:02:53 portatil264 kernel: [ 278.898338] NFS: revalidating (0:53/30801924) Sep 8 11:02:53 portatil264 kernel: [ 278.898349] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5 Sep 8 11:02:53 portatil264 kernel: [ 278.898351] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 
max_slots=31 Sep 8 11:02:53 portatil264 kernel: [ 278.898352] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Sep 8 11:02:53 portatil264 kernel: [ 278.898393] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=47 slotid=0 max_slotid=0 cache_this=0 Sep 8 11:02:53 portatil264 kernel: [ 278.984901] decode_attr_type: type=040000 Sep 8 11:02:53 portatil264 kernel: [ 278.984904] decode_attr_change: change attribute=6810276040818143508 Sep 8 11:02:53 portatil264 kernel: [ 278.984905] decode_attr_size: file size=4096 Sep 8 11:02:53 portatil264 kernel: [ 278.984906] decode_attr_fsid: fsid=(0x0/0x0) Sep 8 11:02:53 portatil264 kernel: [ 278.984907] decode_attr_fileid: fileid=30801924 Sep 8 11:02:53 portatil264 kernel: [ 278.984909] decode_attr_fs_locations: fs_locations done, error = 0 Sep 8 11:02:53 portatil264 kernel: [ 278.984910] decode_attr_mode: file mode=02770 Sep 8 11:02:53 portatil264 kernel: [ 278.984911] decode_attr_nlink: nlink=19 Sep 8 11:02:53 portatil264 kernel: [ 278.984919] decode_attr_owner: uid=0 Sep 8 11:02:53 portatil264 kernel: [ 278.984922] decode_attr_group: gid=709 Sep 8 11:02:53 portatil264 kernel: [ 278.984923] decode_attr_rdev: rdev=(0x0:0x0) Sep 8 11:02:53 portatil264 kernel: [ 278.984924] decode_attr_space_used: space used=4096 Sep 8 11:02:53 portatil264 kernel: [ 278.984926] decode_attr_time_access: atime=1599472206 Sep 8 11:02:53 portatil264 kernel: [ 278.984927] decode_attr_time_metadata: ctime=1585640954 Sep 8 11:02:53 portatil264 kernel: [ 278.984928] decode_attr_time_modify: mtime=1585640954 Sep 8 11:02:53 portatil264 kernel: [ 278.984929] decode_attr_mounted_on_fileid: fileid=30801924 Sep 8 11:02:53 portatil264 kernel: [ 278.984930] decode_getfattr_attrs: xdr returned 0 Sep 8 11:02:53 portatil264 kernel: [ 278.984931] decode_getfattr_generic: xdr returned 0 Sep 8 11:02:53 portatil264 kernel: [ 278.984935] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31 Sep 8 11:02:53 portatil264 kernel: [ 278.984937] <-- 
nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Sep 8 11:02:53 portatil264 kernel: [ 278.984939] nfs4_free_slot: slotid 1 highest_used_slotid 0 Sep 8 11:02:53 portatil264 kernel: [ 278.984940] nfs41_sequence_process: Error 0 free the slot Sep 8 11:02:53 portatil264 kernel: [ 278.984941] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Sep 8 11:02:53 portatil264 kernel: [ 278.984951] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x427e7f) Sep 8 11:02:53 portatil264 kernel: [ 278.984954] NFS: (0:53/30801924) revalidation complete Sep 8 11:02:53 portatil264 kernel: [ 278.984957] NFS: nfs_lookup_revalidate_done(departamentos/innovacion) is valid Sep 8 11:02:53 portatil264 kernel: [ 278.984964] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5 Sep 8 11:02:53 portatil264 kernel: [ 278.984966] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31 Sep 8 11:02:53 portatil264 kernel: [ 278.984968] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Sep 8 11:02:53 portatil264 kernel: [ 278.984989] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=48 slotid=0 max_slotid=0 cache_this=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083208] decode_attr_type: type=00 Sep 8 11:02:53 portatil264 kernel: [ 279.083210] decode_attr_change: change attribute=6810276040818143508 Sep 8 11:02:53 portatil264 kernel: [ 279.083211] decode_attr_size: file size=4096 Sep 8 11:02:53 portatil264 kernel: [ 279.083212] decode_attr_fsid: fsid=(0x0/0x0) Sep 8 11:02:53 portatil264 kernel: [ 279.083213] decode_attr_fileid: fileid=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083215] decode_attr_fs_locations: fs_locations done, error = 0 Sep 8 11:02:53 portatil264 kernel: [ 279.083216] decode_attr_mode: file mode=00 Sep 8 11:02:53 portatil264 kernel: [ 279.083217] decode_attr_nlink: nlink=1 Sep 8 11:02:53 portatil264 kernel: [ 279.083218] decode_attr_rdev: rdev=(0x0:0x0) Sep 8 11:02:53 portatil264 kernel: [ 279.083219] 
decode_attr_space_used: space used=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083220] decode_attr_time_access: atime=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083221] decode_attr_time_metadata: ctime=1585640954 Sep 8 11:02:53 portatil264 kernel: [ 279.083222] decode_attr_time_modify: mtime=1585640954 Sep 8 11:02:53 portatil264 kernel: [ 279.083223] decode_attr_mounted_on_fileid: fileid=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083224] decode_getfattr_attrs: xdr returned 0 Sep 8 11:02:53 portatil264 kernel: [ 279.083225] decode_getfattr_generic: xdr returned 0 Sep 8 11:02:53 portatil264 kernel: [ 279.083230] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31 Sep 8 11:02:53 portatil264 kernel: [ 279.083232] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 Sep 8 11:02:53 portatil264 kernel: [ 279.083233] nfs4_free_slot: slotid 1 highest_used_slotid 0 Sep 8 11:02:53 portatil264 kernel: [ 279.083234] nfs41_sequence_process: Error 0 free the slot Sep 8 11:02:53 portatil264 kernel: [ 279.083236] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Sep 8 11:02:53 portatil264 kernel: [ 279.083244] NFS: nfs_update_inode(0:53/30801924 fh_crc=0x21ccf1a0 ct=1 info=0x26040) Sep 8 11:02:53 portatil264 kernel: [ 279.083249] NFS: permission(0:53/30801924), mask=0x24, res=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083253] NFS: open dir(departamentos/innovacion) Sep 8 11:02:53 portatil264 kernel: [ 279.083278] NFS: readdir(departamentos/innovacion) starting at cookie 0 Sep 8 11:02:53 portatil264 kernel: [ 279.083293] _nfs4_proc_readdir: dentry = departamentos/innovacion, cookie = 0 Sep 8 11:02:53 portatil264 kernel: [ 279.083298] --> nfs41_call_sync_prepare data->seq_server 00000000368715a5 Sep 8 11:02:53 portatil264 kernel: [ 279.083300] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31 Sep 8 11:02:53 portatil264 kernel: [ 279.083302] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 Sep 8 11:02:53 portatil264 kernel: [ 
279.083326] encode_sequence: sessionid=1582481124:15703:9206:0 seqid=49 slotid=0 max_slotid=0 cache_this=0 Sep 8 11:02:53 portatil264 kernel: [ 279.083329] encode_readdir: cookie = 0, verifier = 00000000:00000000, bitmap = 0018091a:00b0a23a:00000000 Sep 8 11:02:53 portatil264 kernel: [ 279.188491] nfs41_sequence_process: Error 1 free the slot Sep 8 11:02:53 portatil264 kernel: [ 279.188494] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 Sep 8 11:02:53 portatil264 kernel: [ 279.188501] _nfs4_proc_readdir: returns -5 Sep 8 11:02:53 portatil264 kernel: [ 279.188507] NFS: readdir(departamentos/innovacion) returns -5 Sep 8 11:02:53 portatil264 kernel: [ 279.189204] NFS: dentry_delete(departamentos/innovacion, 28080c) [-- Attachment #5: nfs_error.pcap --] [-- Type: application/vnd.tcpdump.pcap, Size: 6496 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-09-09  9:29 ` Alberto Gonzalez Iniesta
@ 2020-09-09 13:47 ` J. Bruce Fields
  2020-10-26 13:42 ` Alberto Gonzalez Iniesta
  0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2020-09-09 13:47 UTC (permalink / raw)
To: Alberto Gonzalez Iniesta; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
> On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> > On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > > I'm sorry for reporting this (a little bit) late, but it took us (Miguel
> > > in Cc:) some time to track this issue to an exact kernel update.
> > >
> > > We're running a +200 clients NFS server with Ubuntu 16.04 and 18.04
> > > clients. The server runs Debian 8.11 (jessie) with Linux 3.16.0 and
> > > nfs-kernel-server 1:1.2.8-9+deb8u1. It has been working some years now
> > > without issues.
> > >
> > > But since we started moving clients from Ubuntu 16.04 to Ubuntu 18.04
> > > some of them started experiencing failures while working on NFS mounts.
> > > The failures are arbitrary and sometimes it may take more than 20 minutes
> > > to come out (which made finding out which kernel version introduced
> > > this a pain). We are almost sure that some directories are more prone to
> > > suffer from this than others (maybe related to path length/chars?).
> > >
> > > The error is also not very "verbose", from an strace:
> > >
> > > execve("/bin/ls", ["ls", "-lR", "Becas y ayudas/"], 0x7ffccb7f5b20 /* 16 vars */) = 0
> > > [lots of uninteresting output]
> > > openat(AT_FDCWD, "Becas y ayudas/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
> > > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > > fstat(3, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > > fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
> > > ioctl(1, TCGETS, 0x7ffd8b725c80) = -1 ENOTTY (Inappropriate ioctl for device)
> > > getdents(3, /* 35 entries */, 32768) = 1936
> > > [lots of lstats]
> > > lstat("Becas y ayudas/Convocatorias", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> > > getdents(3, 0x561af78de890, 32768) = -1 EIO (Input/output error)
> >
> > Ideas off the top of my head....
> >
> > It'd be really useful to get a network trace--something like tcpdump -s0
> > -wtmp.pcap -i<interface>, then reproduce the problem, then look through
> > it to see if you can find the READDIR or STAT or whatever that results
> > in the unexpected EIO.  But if it takes a while to reproduce, that may be
> > difficult.
> >
> > Is there anything in the logs?
> >
> > It might be worth turning on some more debug logging--see the "rpcdebug"
> > command.
>
> Hi, Bruce et al.
>
> I'm sorry this reply took so long, but with debugging enabled the error
> was harder to reproduce.
>
> I'm attaching 3 log files (2 with just "nfs" debugging and one with
> "nfs" and "rpc" debugging modules enabled).
>
> I'm also attaching a pcap file, don't know if it would be useful since
> we run "sec=krb5p".

Yeah, not too useful.

> Let me know if there's anything else I can test/provide.
> These tests were done with Ubuntu's 5.3.0-53-generic.
>
> Thanks,
>
> Alberto
>
> > > (I can send you the full output if you need it)
>
> > > We can run the previous "ls -lR" 20 times and get no error, or get
> > > this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > > ("ls: reading directory 'Becas y ayudas/': Input/output error") every
> > > now and then.
> > >
> > > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > >
> > > There's nothing in dmesg (or elsewhere).
> > > These are the kernels we tried:
> > > 4.18.0-25 -> Can't reproduce
> > > 4.19.0 -> Can't reproduce
> > > 4.20.17 -> Happening (hard to reproduce)
> > > 5.0.0-15 -> Happening (hard to reproduce)
> > > 5.3.0-45 -> Happening (more frequently)
> > > 5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
> > >
> > > We did long (as in daylong) testing trying to reproduce this with all
> > > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > > experience this and our Ubuntu 16.04 clients don't have any issue.
> > >
> > > I know we aren't providing much info but we are really looking forward
> > > to doing all the testing required (we already spent lots of time in it).
> > >
> > > Thanks for your work.

So all I notice from this one is the readdir EIO came from call_decode.
I suspect that means it failed in the xdr decoding.  Looks like xdr
decoding of the actual directory data (which is the complicated part) is
done later, so this means it failed decoding the header or verifier,
which is a little odd:

> Sep 8 16:03:23 portatil264 kernel: [15033.016276] RPC: 3284 call_decode result -5
> Sep 8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot
> Sep 8 16:03:23 portatil264 kernel: [15033.016286] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> Sep 8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> Sep 8 16:03:23 portatil264 kernel: [15033.016290] RPC: 3284 return 0, status -5
> Sep 8 16:03:23 portatil264 kernel: [15033.016291] RPC: 3284 release task
> Sep 8 16:03:23 portatil264 kernel: [15033.016295] RPC: freeing buffer of size 4144 at 00000000a3649daf
> Sep 8 16:03:23 portatil264 kernel: [15033.016298] RPC: 3284 release request 0000000079df89b2
> Sep 8 16:03:23 portatil264 kernel: [15033.016300] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
> Sep 8 16:03:23 portatil264 kernel: [15033.016302] RPC: rpc_release_client(00000000b930c343)
> Sep 8 16:03:23 portatil264 kernel: [15033.016304] RPC: 3284 freeing task
> Sep 8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> Sep 8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5

--b.
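[Editorial note: the manual "run ls -lR 20 times" reproduction above can be automated. The sketch below is not from the thread; it is a minimal user-space reproducer that drives the same getdents()/lstat() pattern ls -lR uses and counts the attempts that fail with the intermittent EIO (-5 in the dprintk traces). The directory path is whatever NFS-mounted directory you want to test.]

```python
import errno
import os

def count_readdir_eio(path, attempts=20):
    """Repeatedly enumerate `path`, counting attempts that fail with EIO.

    os.scandir() issues the same getdents() syscall that intermittently
    returns -EIO on the affected kernels, and entry.stat() mirrors the
    lstat() calls "ls -l" makes.  Errors other than EIO are re-raised.
    """
    failures = 0
    for _ in range(attempts):
        try:
            for entry in os.scandir(path):
                entry.stat(follow_symlinks=False)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise
            failures += 1
    return failures
```

Running this in a loop overnight against a directory known to trigger the bug (e.g. "Becas y ayudas/") gives a failure rate per kernel without hand-running ls.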
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-09-09 13:47 ` J. Bruce Fields
@ 2020-10-26 13:42 ` Alberto Gonzalez Iniesta
  2020-10-26 13:58 ` Chuck Lever
  2020-10-26 14:14 ` J. Bruce Fields
  0 siblings, 2 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 13:42 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Wed, Sep 09, 2020 at 09:47:27AM -0400, J. Bruce Fields wrote:
> On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
> > On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> > > On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > > > We can run the previous "ls -lR" 20 times and get no error, or get
> > > > this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > > > ("ls: reading directory 'Becas y ayudas/': Input/output error") every
> > > > now and then.
> > > >
> > > > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > > >
> > > > There's nothing in dmesg (or elsewhere).
> > > > These are the kernels we tried:
> > > > 4.18.0-25 -> Can't reproduce
> > > > 4.19.0 -> Can't reproduce
> > > > 4.20.17 -> Happening (hard to reproduce)
> > > > 5.0.0-15 -> Happening (hard to reproduce)
> > > > 5.3.0-45 -> Happening (more frequently)
> > > > 5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
> > > >
> > > > We did long (as in daylong) testing trying to reproduce this with all
> > > > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > > > experience this and our Ubuntu 16.04 clients don't have any issue.
> > > >
> > > > I know we aren't providing much info but we are really looking forward
> > > > to doing all the testing required (we already spent lots of time in it).
> > > >
> > > > Thanks for your work.
>
> So all I notice from this one is the readdir EIO came from call_decode.
> I suspect that means it failed in the xdr decoding.  Looks like xdr
> decoding of the actual directory data (which is the complicated part) is
> done later, so this means it failed decoding the header or verifier,
> which is a little odd:
>
> > Sep 8 16:03:23 portatil264 kernel: [15033.016276] RPC: 3284 call_decode result -5
> > Sep 8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot
> > Sep 8 16:03:23 portatil264 kernel: [15033.016286] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> > Sep 8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> > Sep 8 16:03:23 portatil264 kernel: [15033.016290] RPC: 3284 return 0, status -5
> > Sep 8 16:03:23 portatil264 kernel: [15033.016291] RPC: 3284 release task
> > Sep 8 16:03:23 portatil264 kernel: [15033.016295] RPC: freeing buffer of size 4144 at 00000000a3649daf
> > Sep 8 16:03:23 portatil264 kernel: [15033.016298] RPC: 3284 release request 0000000079df89b2
> > Sep 8 16:03:23 portatil264 kernel: [15033.016300] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
> > Sep 8 16:03:23 portatil264 kernel: [15033.016302] RPC: rpc_release_client(00000000b930c343)
> > Sep 8 16:03:23 portatil264 kernel: [15033.016304] RPC: 3284 freeing task
> > Sep 8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> > Sep 8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5

Hi, Bruce et al.

Is there anything we can do to help debugging/fixing this? It's still
biting our users with a +4.20.x kernel.

Thanks,

Alberto

--
Alberto González Iniesta  | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid
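[Editorial note: Bruce's earlier suggestion to enable more debug logging can be done with the rpcdebug tool, roughly as below. This is a sketch, not a command sequence from the thread; it needs root, and "all" is very chatty, so narrower flag sets may be preferable on a busy client. Output lands in the kernel log (dmesg / syslog).]

```shell
# Turn on verbose NFS client and SUNRPC debug messages.
rpcdebug -m nfs -s all
rpcdebug -m rpc -s all

# ... reproduce the failing "ls -lR" here, then turn logging back off
# so the log volume does not swamp the machine:
rpcdebug -m nfs -c all
rpcdebug -m rpc -c all
```

Running `rpcdebug -m rpc` with no set/clear option prints the currently enabled flags, which is useful for confirming the state after a reboot.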
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 13:42 ` Alberto Gonzalez Iniesta
@ 2020-10-26 13:58 ` Chuck Lever
  2020-10-26 14:43 ` Alberto Gonzalez Iniesta
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 13:58 UTC (permalink / raw)
To: Alberto Gonzalez Iniesta
Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

> On Oct 26, 2020, at 9:42 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
>
> On Wed, Sep 09, 2020 at 09:47:27AM -0400, J. Bruce Fields wrote:
>> On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
>>> On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
>>>> On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
>>>>> We can run the previous "ls -lR" 20 times and get no error, or get
>>>>> this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
>>>>> ("ls: reading directory 'Becas y ayudas/': Input/output error") every
>>>>> now and then.
>>>>>
>>>>> The error happens (obviously?) with ls, rsync and the users' GUI tools.
>>>>>
>>>>> There's nothing in dmesg (or elsewhere).
>>>>> These are the kernels we tried:
>>>>> 4.18.0-25 -> Can't reproduce
>>>>> 4.19.0 -> Can't reproduce
>>>>> 4.20.17 -> Happening (hard to reproduce)
>>>>> 5.0.0-15 -> Happening (hard to reproduce)
>>>>> 5.3.0-45 -> Happening (more frequently)
>>>>> 5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
>>>>>
>>>>> We did long (as in daylong) testing trying to reproduce this with all
>>>>> those kernel versions, so we are pretty sure 4.18 and 4.19 don't
>>>>> experience this and our Ubuntu 16.04 clients don't have any issue.
>>>>>
>>>>> I know we aren't providing much info but we are really looking forward
>>>>> to doing all the testing required (we already spent lots of time in it).
>>>>>
>>>>> Thanks for your work.
>>
>> So all I notice from this one is the readdir EIO came from call_decode.
>> I suspect that means it failed in the xdr decoding.  Looks like xdr
>> decoding of the actual directory data (which is the complicated part) is
>> done later, so this means it failed decoding the header or verifier,
>> which is a little odd:
>>
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016276] RPC: 3284 call_decode result -5
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016286] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016290] RPC: 3284 return 0, status -5
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016291] RPC: 3284 release task
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016295] RPC: freeing buffer of size 4144 at 00000000a3649daf
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016298] RPC: 3284 release request 0000000079df89b2
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016300] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016302] RPC: rpc_release_client(00000000b930c343)
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016304] RPC: 3284 freeing task
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
>>> Sep 8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
>
> Hi, Bruce et al.
>
> Is there anything we can do to help debugging/fixing this? It's still
> biting our users with a +4.20.x kernel.

Alberto, can you share a snippet of a raw network capture that shows
the READDIR Reply that fails to decode?

--
Chuck Lever
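[Editorial note: the capture Chuck asks for can be taken on the client roughly as sketched below. This is not from the thread; it needs root, the interface name and output path are placeholders, and the port filter assumes the standard NFS port 2049.]

```shell
# Capture full frames (-s0) of NFS traffic on the client's NFS-facing
# interface; the filter keeps the file small on a busy machine.
tcpdump -s0 -i eth0 -w /tmp/nfs-readdir.pcap 'tcp port 2049'

# In another terminal, reproduce the failure, e.g.:
#   ls -lR "Becas y ayudas/"
# then stop tcpdump with Ctrl-C and look for the failing READDIR reply.
```

With sec=krb5p the RPC payloads in the resulting pcap are encrypted, which is why the later messages in this thread discuss feeding a keytab to Wireshark.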
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 13:58 ` Chuck Lever
@ 2020-10-26 14:43 ` Alberto Gonzalez Iniesta
  2020-10-26 14:46 ` Chuck Lever
  0 siblings, 1 reply; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 14:43 UTC (permalink / raw)
To: Chuck Lever
Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >> So all I notice from this one is the readdir EIO came from call_decode.
> >> I suspect that means it failed in the xdr decoding.  Looks like xdr
> >> decoding of the actual directory data (which is the complicated part) is
> >> done later, so this means it failed decoding the header or verifier,
> >> which is a little odd:
> >>
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016276] RPC: 3284 call_decode result -5
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016286] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016290] RPC: 3284 return 0, status -5
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016291] RPC: 3284 release task
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016295] RPC: freeing buffer of size 4144 at 00000000a3649daf
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016298] RPC: 3284 release request 0000000079df89b2
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016300] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016302] RPC: rpc_release_client(00000000b930c343)
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016304] RPC: 3284 freeing task
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
> >>> Sep 8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
> >
> > Hi, Bruce et al.
> >
> > Is there anything we can do to help debugging/fixing this? It's still
> > biting our users with a +4.20.x kernel.
>
> Alberto, can you share a snippet of a raw network capture that shows
> the READDIR Reply that fails to decode?

Hi, Chuck.

Thanks for your reply. We're using "sec=krb5p", which makes the network
capture useless :-(

I'm afraid it's quite difficult to disable that, since we have +200 NFS
clients in production.

--
Alberto González Iniesta  | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid
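[Editorial note: a krb5p capture is not necessarily a dead end. If a keytab containing the NFS service key is available, Wireshark/tshark can decrypt the GSS-wrapped payloads. The sketch below is an assumption-laden example, not from the thread: the preference names are as documented for Wireshark's Kerberos dissector, the keytab path is a placeholder, and decryption only works when the keytab holds the key actually used by the session.]

```shell
# Read the capture, asking the Kerberos dissector to decrypt GSS blobs
# using keys from the given keytab, and show only NFS traffic.
tshark -r nfs_error.pcap \
    -o kerberos.decrypt:TRUE \
    -o kerberos.file:/etc/krb5.keytab \
    -Y nfs
```

The same two preferences can be set in the Wireshark GUI under the KRB5 protocol preferences ("Try to decrypt Kerberos blobs" plus the keytab file), after which the failing READDIR reply can be inspected in the clear.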
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:43 ` Alberto Gonzalez Iniesta
@ 2020-10-26 14:46 ` Chuck Lever
  2020-10-26 14:54 ` Alberto Gonzalez Iniesta
  2020-10-26 15:02 ` Bruce Fields
  0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 14:46 UTC (permalink / raw)
To: Alberto Gonzalez Iniesta
Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

> On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
>
> On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
>>>> So all I notice from this one is the readdir EIO came from call_decode.
>>>> I suspect that means it failed in the xdr decoding. Looks like xdr
>>>> decoding of the actual directory data (which is the complicated part) is
>>>> done later, so this means it failed decoding the header or verifier,
>>>> which is a little odd:
>>>>
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016276] RPC: 3284 call_decode result -5
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016281] nfs41_sequence_process: Error 1 free the slot
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016286] RPC: wake_up_first(00000000d3f50f4d "ForeChannel Slot table")
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016288] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016290] RPC: 3284 return 0, status -5
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016291] RPC: 3284 release task
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016295] RPC: freeing buffer of size 4144 at 00000000a3649daf
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016298] RPC: 3284 release request 0000000079df89b2
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016300] RPC: wake_up_first(00000000c5ee49ee "xprt_backlog")
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016302] RPC: rpc_release_client(00000000b930c343)
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016304] RPC: 3284 freeing task
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016309] _nfs4_proc_readdir: returns -5
>>>>> Sep 8 16:03:23 portatil264 kernel: [15033.016318] NFS: readdir(departamentos/innovacion) returns -5
>>>
>>> Hi, Bruce et al.
>>>
>>> Is there anything we can do to help debugging/fixing this? It's still
>>> biting our users with a +4.20.x kernel.
>>
>> Alberto, can you share a snippet of a raw network capture that shows
>> the READDIR Reply that fails to decode?
>
> Hi, Chuck.
>
> Thanks for your reply. We're using "sec=krb5p", which makes the network
> capture useless :-(

You can plug keytabs into Wireshark to enable it to decrypt the traffic.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread
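[In practice, Chuck's suggestion amounts to capturing the raw traffic and handing the NFS service keytab to Wireshark's Kerberos dissector. A rough sketch — the interface, port, and keytab path are examples, and the preference names should be verified against your Wireshark build with `tshark -G prefs`:]

```shell
# Capture the NFS traffic on the client while reproducing the EIO.
tcpdump -i eth0 -s 0 -w nfs-krb5p.pcap port 2049

# Read the capture back with the keytab supplied to the Kerberos
# dissector, so the GSS-wrapped (krb5p) RPC payloads can be decrypted
# and the failing READDIR Reply inspected.
tshark -r nfs-krb5p.pcap \
    -o 'kerberos.decrypt:TRUE' \
    -o 'kerberos.file:/etc/krb5.keytab' \
    -Y nfs
```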
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:46 ` Chuck Lever
@ 2020-10-26 14:54 ` Alberto Gonzalez Iniesta
  2020-10-26 15:02 ` Bruce Fields
  1 sibling, 0 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 14:54 UTC (permalink / raw)
To: Chuck Lever
Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
>
> > On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> >
> > On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >>>> So all I notice from this one is the readdir EIO came from call_decode.
> >>>> [analysis and kernel debug trace snipped; quoted in full earlier in the thread]
> >>>
> >>> Hi, Bruce et al.
> >>>
> >>> Is there anything we can do to help debugging/fixing this? It's still
> >>> biting our users with a +4.20.x kernel.
> >>
> >> Alberto, can you share a snippet of a raw network capture that shows
> >> the READDIR Reply that fails to decode?
> >
> > Hi, Chuck.
> >
> > Thanks for your reply. We're using "sec=krb5p", which makes the network
> > capture useless :-(
>
> You can plug keytabs into Wireshark to enable it to decrypt the traffic.

Cool, I'll look into that then. Thanks!

--
Alberto González Iniesta | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:46 ` Chuck Lever
  2020-10-26 14:54 ` Alberto Gonzalez Iniesta
@ 2020-10-26 15:02 ` Bruce Fields
  2020-10-26 15:06 ` Chuck Lever
  1 sibling, 1 reply; 16+ messages in thread
From: Bruce Fields @ 2020-10-26 15:02 UTC (permalink / raw)
To: Chuck Lever
Cc: Alberto Gonzalez Iniesta, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
>
> > On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
> >
> > On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
> >>>> So all I notice from this one is the readdir EIO came from call_decode.
> >>>> [analysis and kernel debug trace snipped; quoted in full earlier in the thread]
> >>>
> >>> Hi, Bruce et al.
> >>>
> >>> Is there anything we can do to help debugging/fixing this? It's still
> >>> biting our users with a +4.20.x kernel.
> >>
> >> Alberto, can you share a snippet of a raw network capture that shows
> >> the READDIR Reply that fails to decode?
> >
> > Hi, Chuck.
> >
> > Thanks for your reply. We're using "sec=krb5p", which makes the network
> > capture useless :-(
>
> You can plug keytabs into Wireshark to enable it to decrypt the traffic.

Just skimming that range of history, there's some changes to the
handling of gss sequence numbers, I wonder if there's a chance he could
be hitting that?  You had a workload that would lead to calls dropping
out of the sequence number window, didn't you, Chuck?  Is there a quick
way to check whether that's happening?

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 15:02 ` Bruce Fields
@ 2020-10-26 15:06 ` Chuck Lever
  2020-10-26 15:26 ` Alberto Gonzalez Iniesta
  0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 15:06 UTC (permalink / raw)
To: Bruce Fields
Cc: Alberto Gonzalez Iniesta, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

> On Oct 26, 2020, at 11:02 AM, Bruce Fields <bfields@fieldses.org> wrote:
>
> On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
>>
>>> On Oct 26, 2020, at 10:43 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
>>>
>>> On Mon, Oct 26, 2020 at 09:58:17AM -0400, Chuck Lever wrote:
>>>>>> So all I notice from this one is the readdir EIO came from call_decode.
>>>>>> [analysis and kernel debug trace snipped; quoted in full earlier in the thread]
>>>>>
>>>>> Hi, Bruce et al.
>>>>>
>>>>> Is there anything we can do to help debugging/fixing this? It's still
>>>>> biting our users with a +4.20.x kernel.
>>>>
>>>> Alberto, can you share a snippet of a raw network capture that shows
>>>> the READDIR Reply that fails to decode?
>>>
>>> Hi, Chuck.
>>>
>>> Thanks for your reply. We're using "sec=krb5p", which makes the network
>>> capture useless :-(
>>
>> You can plug keytabs into Wireshark to enable it to decrypt the traffic.
>
> Just skimming that range of history, there's some changes to the
> handling of gss sequence numbers, I wonder if there's a chance he could
> be hitting that?  You had a workload that would lead to calls dropping
> out of the sequence number window, didn't you, Chuck?  Is there a quick
> way to check whether that's happening?

The server is supposed to drop the connection when that happens, though
I'm not sure 4.20's NFSD was perfect in that regard. Connection loss in
itself wouldn't result in EIO.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 15:06 ` Chuck Lever
@ 2020-10-26 15:26 ` Alberto Gonzalez Iniesta
  2020-10-26 17:23 ` Chuck Lever
  0 siblings, 1 reply; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 15:26 UTC (permalink / raw)
To: Chuck Lever
Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 11:06:24AM -0400, Chuck Lever wrote:
>
> > On Oct 26, 2020, at 11:02 AM, Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Mon, Oct 26, 2020 at 10:46:05AM -0400, Chuck Lever wrote:
> >> [earlier quoting, including the kernel debug trace, snipped; quoted in full earlier in the thread]
> >>
> >> You can plug keytabs into Wireshark to enable it to decrypt the traffic.
> >
> > Just skimming that range of history, there's some changes to the
> > handling of gss sequence numbers, I wonder if there's a chance he could
> > be hitting that?  You had a workload that would lead to calls dropping
> > out of the sequence number window, didn't you, Chuck?  Is there a quick
> > way to check whether that's happening?
>
> The server is supposed to drop the connection when that happens, though
> I'm not sure 4.20's NFSD was perfect in that regard. Connection loss in
> itself wouldn't result in EIO.

In case this is relevant, server is running 3.16.0. Clients (with
issues) +4.20.

--
Alberto González Iniesta | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 15:26 ` Alberto Gonzalez Iniesta
@ 2020-10-26 17:23 ` Chuck Lever
  0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2020-10-26 17:23 UTC (permalink / raw)
To: Alberto Gonzalez Iniesta
Cc: Bruce Fields, Linux NFS Mailing List, Miguel Rodriguez, Isaac Marco Blancas

> On Oct 26, 2020, at 11:26 AM, Alberto Gonzalez Iniesta <alberto.gonzalez@udima.es> wrote:
>
> On Mon, Oct 26, 2020 at 11:06:24AM -0400, Chuck Lever wrote:
>> [earlier quoting, including the kernel debug trace, snipped; quoted in full earlier in the thread]
>>
>>> Just skimming that range of history, there's some changes to the
>>> handling of gss sequence numbers, I wonder if there's a chance he could
>>> be hitting that?  You had a workload that would lead to calls dropping
>>> out of the sequence number window, didn't you, Chuck?  Is there a quick
>>> way to check whether that's happening?
>>
>> The server is supposed to drop the connection when that happens, though
>> I'm not sure 4.20's NFSD was perfect in that regard. Connection loss in
>> itself wouldn't result in EIO.
>
> In case this is relevant, server is running 3.16.0. Clients (with
> issues) +4.20.

Ah, I see. Well that's an old kernel. Have you engaged your distributor?
They might be able to provide builds with debugging instrumentation, for
example, if we can give them some instructions or a patch.

My experience tells me that this is probably an issue with either the
server's GSS wrap function, or the client's GSS unwrap function, if you
don't ever see this failure without krb5p in the picture. If you don't
see this problem on older clients, then I would start looking at the
client's GSS unwrap function, fwiw.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 13:42 ` Alberto Gonzalez Iniesta
  2020-10-26 13:58 ` Chuck Lever
@ 2020-10-26 14:14 ` J. Bruce Fields
  2020-10-26 14:54 ` Alberto Gonzalez Iniesta
  1 sibling, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2020-10-26 14:14 UTC (permalink / raw)
To: Alberto Gonzalez Iniesta; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 02:42:16PM +0100, Alberto Gonzalez Iniesta wrote:
> On Wed, Sep 09, 2020 at 09:47:27AM -0400, J. Bruce Fields wrote:
> > On Wed, Sep 09, 2020 at 11:29:00AM +0200, Alberto Gonzalez Iniesta wrote:
> > > On Thu, Apr 30, 2020 at 01:32:00PM -0400, J. Bruce Fields wrote:
> > > > On Wed, Apr 29, 2020 at 07:15:27PM +0200, Alberto Gonzalez Iniesta wrote:
> > > > > We can run the previous "ls -lR" 20 times and get no error, or get
> > > > > this "ls: leyendo el directorio 'Becas y ayudas/': Error de entrada/salida"
> > > > > (ls: reading directory 'Becas y ayudas/': Input/output error) every
> > > > > now and then.
> > > > >
> > > > > The error happens (obviously?) with ls, rsync and the users' GUI tools.
> > > > >
> > > > > There's nothing in dmesg (or elsewhere).
> > > > > These are the kernels we tried:
> > > > > 4.18.0-25 -> Can't reproduce
> > > > > 4.19.0    -> Can't reproduce
> > > > > 4.20.17   -> Happening (hard to reproduce)
> > > > > 5.0.0-15  -> Happening (hard to reproduce)
> > > > > 5.3.0-45  -> Happening (more frequently)
> > > > > 5.6.0-rc7 -> Reproduced a couple of times after boot, then nothing
> > > > >
> > > > > We did long (as in daylong) testing trying to reproduce this with all
> > > > > those kernel versions, so we are pretty sure 4.18 and 4.19 don't
> > > > > experience this and our Ubuntu 16.04 clients don't have any issue.
> > > > >
> > > > > I know we aren't providing much info but we are really looking forward
> > > > > to doing all the testing required (we already spent lots of time in it).
> > > > >
> > > > > Thanks for your work.
> >
> > So all I notice from this one is the readdir EIO came from call_decode.
> > I suspect that means it failed in the xdr decoding.  Looks like xdr
> > decoding of the actual directory data (which is the complicated part) is
> > done later, so this means it failed decoding the header or verifier,
> > which is a little odd:
> >
> > > [kernel debug trace snipped; quoted in full earlier in the thread]
>
> Hi, Bruce et al.
>
> Is there anything we can do to help debugging/fixing this? It's still
> biting our users with a +4.20.x kernel.

Sorry, I just don't know what this is off the top of my head.  If I had
the time, stuff I might try:

	- look at the wire traffic with wireshark: try to figure out
	  which operation this is happening on, and if there's anything
	  unusual about the reply.  May be difficult if a lot of traffic
	  is required to reproduce.
	- if you're using krb5, try without just to see if that makes a
	  difference.
	- look at history: gitk v4.19..v4.20.17 fs/nfs net/sunrpc, see
	  if anything looks relevant; or even just try a git bisect
	  (difficult again given the intermittent failure).
	- trace through the code to work out where call_decode might be
	  returning -EIO, try to narrow it down with tracepoints or even
	  just debugging printk's.

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread
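[The RPC: lines quoted throughout this thread come from the kernel's sunrpc debug flags; on a client they can be toggled with rpcdebug from nfs-utils, which is the quickest way to attempt Bruce's last suggestion without patching the kernel. A sketch, run as root:]

```shell
# Turn on verbose sunrpc and NFS client debugging.
rpcdebug -m rpc -s all
rpcdebug -m nfs -s all

# Reproduce the failure (e.g. the intermittent "ls -lR"), then look for
# the call_decode / _nfs4_proc_readdir lines in the kernel log.
dmesg | grep -E 'call_decode|_nfs4_proc_readdir'

# The output is very chatty; clear the flags when done.
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all
```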
* Re: Random IO errors on nfs clients running linux > 4.20
  2020-10-26 14:14 ` J. Bruce Fields
@ 2020-10-26 14:54 ` Alberto Gonzalez Iniesta
  0 siblings, 0 replies; 16+ messages in thread
From: Alberto Gonzalez Iniesta @ 2020-10-26 14:54 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Miguel Rodriguez, Isaac Marco Blancas

On Mon, Oct 26, 2020 at 10:14:52AM -0400, J. Bruce Fields wrote:
> Sorry, I just don't know what this is off the top of my head.  If I had
> the time, stuff I might try:
>
> 	- look at the wire traffic with wireshark: try to figure out
> 	  which operation this is happening on, and if there's anything
> 	  unusual about the reply.  May be difficult if a lot of traffic
> 	  is required to reproduce.
> 	- if you're using krb5, try without just to see if that makes a
> 	  difference.

Using "sec=krb5p" here... And with lots of active clients. :-/
Oh! Could I add another entry in /etc/exports with krb5 without
affecting current krb5p clients?

> 	- look at history: gitk v4.19..v4.20.17 fs/nfs net/sunrpc, see
> 	  if anything looks relevant; or even just try a git bisect
> 	  (difficult again given the intermittent failure).
> 	- trace through the code to work out where call_decode might be
> 	  returning -EIO, try to narrow it down with tracepoints or even
> 	  just debugging printk's.

I'm afraid I don't have the skills to carry out those tasks, haven't
coded in C for decades, not to mention kernel/nfs hacking. I guess
we'll have to wait until someone else hits this too, or we update the
server and it (hopefully/miraculously) fixes itself :-)

Thanks a lot,

Alberto

--
Alberto González Iniesta | Universidad a Distancia
alberto.gonzalez@udima.es | de Madrid

^ permalink raw reply	[flat|nested] 16+ messages in thread
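[On the exports question: nfs-utils does allow a single export to advertise several security flavors, so offering plain krb5 alongside krb5p for a test client should not disturb existing krb5p mounts. A hypothetical /etc/exports line — the path and client pattern are made up, and `exportfs -ra` must be run after editing:]

```
# Existing clients keep mounting with sec=krb5p; a test client can now
# mount the same export with -o sec=krb5 to rule out privacy
# (GSS wrap/unwrap) as the trigger.
/srv/nfs/shared  *.example.org(rw,sync,sec=krb5p:krb5i:krb5)
```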
end of thread, other threads: [~2020-10-26 17:23 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-29 17:15 Random IO errors on nfs clients running linux > 4.20 Alberto Gonzalez Iniesta
2020-04-30  6:08 ` Alberto Gonzalez Iniesta
2020-04-30 17:32 ` J. Bruce Fields
2020-09-09  9:29 ` Alberto Gonzalez Iniesta
2020-09-09 13:47 ` J. Bruce Fields
2020-10-26 13:42 ` Alberto Gonzalez Iniesta
2020-10-26 13:58 ` Chuck Lever
2020-10-26 14:43 ` Alberto Gonzalez Iniesta
2020-10-26 14:46 ` Chuck Lever
2020-10-26 14:54 ` Alberto Gonzalez Iniesta
2020-10-26 15:02 ` Bruce Fields
2020-10-26 15:06 ` Chuck Lever
2020-10-26 15:26 ` Alberto Gonzalez Iniesta
2020-10-26 17:23 ` Chuck Lever
2020-10-26 14:14 ` J. Bruce Fields
2020-10-26 14:54 ` Alberto Gonzalez Iniesta
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox