Linux NFS development
 help / color / mirror / Atom feed
* NFS workload leaves nfsd threads in D state
@ 2023-07-08 18:30 Chuck Lever III
  2023-07-09  6:58 ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-10  7:56 ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Chuck Lever III @ 2023-07-08 18:30 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: linux-block@vger.kernel.org, Linux NFS Mailing List, Chuck Lever

Hi -

I have a "standard" test of running the git regression suite with
many threads against an NFS mount. I found that with 6.5-rc, the
test stalled and several nfsd threads on the server were stuck
in D state.

I can reproduce this stall 100% with both an xfs and an ext4
export, so I bisected with both, and both bisects landed on the
same commit:

615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri May 19 06:40:48 2023 +0200

    blk-mq: defer to the normal submission path for post-flush requests

    Requests with the FUA bit on hardware without FUA support need a post
    flush before returning to the caller, but they can still be sent using
    the normal I/O path after initializing the flush-related fields and
    end I/O handler.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20230519044050.107790-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

 block/blk-flush.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

On system 1: the exports are on top of /dev/mapper and reside on
an "INTEL SSDSC2BA400G3" SATA device.

On system 2: the exports are on top of /dev/mapper and reside on
an "INTEL SSDSC2KB240G8" SATA device.

System 1 was where I discovered the stall. System 2 is where I ran
the bisects.

The call stacks vary a little. I've seen stalls in both the WRITE
and SETATTR paths. Here's a sample from system 1:

INFO: task nfsd:1237 blocked for more than 122 seconds.
      Tainted: G        W          6.4.0-08699-g9e268189cb14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nfsd            state:D stack:0     pid:1237  ppid:2      flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x78f/0x7db
 schedule+0x93/0xc8
 jbd2_log_wait_commit+0xb4/0xf4
 ? __pfx_autoremove_wake_function+0x10/0x10
 jbd2_complete_transaction+0x85/0x97
 ext4_fc_commit+0x118/0x70a
 ? _raw_spin_unlock+0x18/0x2e
 ? __mark_inode_dirty+0x282/0x302
 ext4_write_inode+0x94/0x121
 ext4_nfs_commit_metadata+0x72/0x7d
 commit_inode_metadata+0x1f/0x31 [nfsd]
 commit_metadata+0x26/0x33 [nfsd]
 nfsd_setattr+0x2f2/0x30e [nfsd]
 nfsd_create_setattr+0x4e/0x87 [nfsd]
 nfsd4_open+0x604/0x8fa [nfsd]
 nfsd4_proc_compound+0x4a8/0x5e3 [nfsd]
 ? nfs4svc_decode_compoundargs+0x291/0x2de [nfsd]
 nfsd_dispatch+0xb3/0x164 [nfsd]
 svc_process_common+0x3c7/0x53a [sunrpc]
 ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
 svc_process+0xc6/0xe3 [sunrpc]
 nfsd+0xf2/0x18c [nfsd]
 ? __pfx_nfsd+0x10/0x10 [nfsd]
 kthread+0x10d/0x115
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

--
Chuck Lever



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-07-25 13:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-08 18:30 NFS workload leaves nfsd threads in D state Chuck Lever III
2023-07-09  6:58 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-10  7:56 ` Christoph Hellwig
2023-07-10 14:06   ` Chuck Lever III
2023-07-10 15:10     ` Chuck Lever III
2023-07-10 15:18       ` Christoph Hellwig
2023-07-10 17:28       ` Christoph Hellwig
2023-07-10 17:40         ` Chuck Lever III
2023-07-11 12:01           ` Christoph Hellwig
2023-07-12 11:34             ` Chengming Zhou
2023-07-12 13:29               ` Chuck Lever III
2023-07-25  9:57                 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-25 13:21                   ` Chuck Lever III
2023-07-25 13:34                     ` Linux regression tracking #update (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox