Linux NFS development
From: Jeff Layton <jlayton@kernel.org>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"clm@fb.com" <clm@fb.com>
Subject: Re: client looping in state recovery after netns teardown
Date: Mon, 24 Mar 2025 20:15:48 -0400	[thread overview]
Message-ID: <b799e9cb5b60e08186431915bda5111604fda4b3.camel@kernel.org> (raw)
In-Reply-To: <1f2d9f709a8b7b0f0746b90672f9a1f5293cba0a.camel@hammerspace.com>

[-- Attachment #1: Type: text/plain, Size: 8178 bytes --]

On Mon, 2025-03-24 at 22:49 +0000, Trond Myklebust wrote:
> On Mon, 2025-03-24 at 17:52 -0400, Jeff Layton wrote:
> > Hi Trond,
> > 
> > We hit a new failure mode with the containerized NFS client teardown
> > patches.
> > 
> > The symptoms are that we see the state recovery thread stuck, and
> > then
> > we see a flusher thread stuck like this:
> > 
> > #0  context_switch (kernel/sched/core.c:5434:2)
> > #1  __schedule (kernel/sched/core.c:6799:8)
> > #2  __schedule_loop (kernel/sched/core.c:6874:3)
> > #3  schedule (kernel/sched/core.c:6889:2)
> > #4  nfs_wait_bit_killable (fs/nfs/inode.c:77:2)
> > #5  __wait_on_bit (kernel/sched/wait_bit.c:49:10)
> > #6  out_of_line_wait_on_bit (kernel/sched/wait_bit.c:64:9)
> > #7  wait_on_bit_action (./include/linux/wait_bit.h:156:9)
> > #8  nfs4_wait_clnt_recover (fs/nfs/nfs4state.c:1329:8)
> > #9  nfs4_client_recover_expired_lease (fs/nfs/nfs4state.c:1347:9)
> > #10 pnfs_update_layout (fs/nfs/pnfs.c:2133:17)
> > #11 ff_layout_pg_get_mirror_count_write (fs/nfs/flexfilelayout/flexfilelayout.c:949:4)
> > #12 nfs_pageio_setup_mirroring (fs/nfs/pagelist.c:1114:18)
> > #13 nfs_pageio_add_request (fs/nfs/pagelist.c:1406:2)
> > #14 nfs_page_async_flush (fs/nfs/write.c:631:7)
> > #15 nfs_do_writepage (fs/nfs/write.c:657:9)
> > #16 nfs_writepages_callback (fs/nfs/write.c:684:8)
> > #17 write_cache_pages (mm/page-writeback.c:2569:11)
> > #18 nfs_writepages (fs/nfs/write.c:723:9)
> > #19 do_writepages (mm/page-writeback.c:2612:10)
> > #20 __writeback_single_inode (fs/fs-writeback.c:1650:8)
> > #21 writeback_sb_inodes (fs/fs-writeback.c:1942:3)
> > #22 __writeback_inodes_wb (fs/fs-writeback.c:2013:12)
> > #23 wb_writeback (fs/fs-writeback.c:2120:15)
> > #24 wb_check_old_data_flush (fs/fs-writeback.c:2224:10)
> > #25 wb_do_writeback (fs/fs-writeback.c:2277:11)
> > #26 wb_workfn (fs/fs-writeback.c:2305:20)
> > #27 process_one_work (kernel/workqueue.c:3267:2)
> > #28 process_scheduled_works (kernel/workqueue.c:3348:3)
> > #29 worker_thread (kernel/workqueue.c:3429:4)
> > #30 kthread (kernel/kthread.c:388:9)
> > #31 ret_from_fork (arch/x86/kernel/process.c:147:3)
> > #32 ret_from_fork_asm+0x11/0x16 (arch/x86/entry/entry_64.S:244)
> > 
> > Turning up some tracepoints, I see this:
> > 
> >  kworker/u1028:1-997456  [159] .....  9027.330396:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE status=0 action=call_reserve [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330397: xprt_reserve:
> > task:0000174b@00000000 xid=0x6bfc099c
> >  kworker/u1028:1-997456  [159] .....  9027.330397:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE status=0 action=call_reserveresult
> > [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330397:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE status=0 action=call_refresh
> > [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330399:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE status=0 action=call_refreshresult
> > [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330399:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE status=0 action=call_allocate
> > [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330400: rpc_buf_alloc:
> > task:0000174b@00000000 callsize=368 recvsize=80 status=0
> >  kworker/u1028:1-997456  [159] .....  9027.330400:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE status=0 action=call_encode [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330402:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0
> > action=call_connect [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330403: xprt_reserve_xprt:
> > task:0000174b@00000000 snd_task:0000174b
> >  kworker/u1028:1-997456  [159] .....  9027.330404: xprt_connect:
> > peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|BOUND
> >  kworker/u1028:1-997456  [159] .....  9027.330405: rpc_task_sleep:
> > task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 timeout=0
> > queue=xprt_pending
> >  kworker/u1028:1-997456  [159] .....  9027.330442:
> > rpc_socket_connect: error=-22 socket:[15084763] srcaddr=[::]:993
> > dstaddr=[2a03:83e4:4002:0:12a::2a]:0 state=1 (UNCONNECTED) sk_state=7
> > (CLOSE)
> >  kworker/u1028:1-997456  [159] .....  9027.330443: rpc_task_wakeup:
> > task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=QUEUED|ACTIVE|NEED_XMIT|NEED_RECV status=-22 timeout=60000
> > queue=xprt_pending
> >  kworker/u1028:1-997456  [159] .....  9027.330444:
> > xprt_disconnect_force: peer=[2a03:83e4:4002:0:12a::2a]:20492
> > state=LOCKED|CONNECTING|BOUND|SND_IS_COOKIE
> >  kworker/u1028:1-997456  [159] .....  9027.330444: xprt_release_xprt:
> > task:ffffffff@ffffffff snd_task:ffffffff
> >  kworker/u1028:1-997456  [159] .....  9027.330445:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22
> > action=call_connect_status [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330446:
> > rpc_connect_status: task:0000174b@00000000 status=-22
> >  kworker/u1028:1-997456  [159] .....  9027.330446: rpc_call_rpcerror:
> > task:0000174b@00000000 tk_status=-22 rpc_status=-22
> >  kworker/u1028:1-997456  [159] .....  9027.330446:
> > rpc_task_run_action: task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22
> > action=rpc_exit_task [sunrpc]
> >  kworker/u1028:1-997456  [159] .....  9027.330447: rpc_task_end:
> > task:0000174b@00000000
> > flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO
> > runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22
> > action=rpc_exit_task [sunrpc]
> > 
> > There is no traffic on the wire. So the problem seems to be that in
> > this situation we call call_connect and get back -EINVAL from the
> > network layer, which doesn't trigger the -ENETUNREACH handling.
> > 
> > Is this a bug from the network layer, or do we also need to handle
> > -EINVAL in call_connect_status?
> > 
> > ...or do we need a different fix?
> > 
> > Thoughts?
> 
> We need a way to tell the caller of
> nfs4_client_recover_expired_lease() that the client has terminally
> gone away, and that it should stop looping.
> I'm thinking that we probably need to mark the client as errored out
> with a call to nfs_mark_client_ready() that sets clp->cl_cons_state
> to an error.
> 
> However, I do find it odd that we're seeing an EINVAL error instead
> of ENETUNREACH.
> 

Same.

> It looks from your traces above as if that is coming from
> the call to xs_tcp_finish_connecting(), which almost has to mean that
> it is being returned by kernel_connect().
> Hmm. Maybe it is this: "dstaddr=[2a03:83e4:4002:0:12a::2a]:0"? A zero
> port value might explain it...

That sounds entirely plausible.

Here is the whole trace buffer. It sort of looks like the port isn't 0
in xprt_disconnect_auto/_force(), but in rpc_socket_connect() it
definitely is. I also see no calls to call_bind() in here, which is
odd if the port really is 0.
-- 
Jeff Layton <jlayton@kernel.org>

[-- Attachment #2: nfs-port0.txt.gz --]
[-- Type: application/gzip, Size: 2836 bytes --]


Thread overview: 3+ messages
2025-03-24 21:52 client looping in state recovery after netns teardown Jeff Layton
2025-03-24 22:49 ` Trond Myklebust
2025-03-25  0:15   ` Jeff Layton [this message]
