* NFS workload leaves nfsd threads in D state
@ 2023-07-08 18:30 Chuck Lever III
2023-07-09 6:58 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-10 7:56 ` Christoph Hellwig
0 siblings, 2 replies; 14+ messages in thread
From: Chuck Lever III @ 2023-07-08 18:30 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig
Cc: linux-block@vger.kernel.org, Linux NFS Mailing List, Chuck Lever
Hi -
I have a "standard" test of running the git regression suite with
many threads against an NFS mount. I found that with 6.5-rc, the
test stalled and several nfsd threads on the server were stuck
in D state.
I can reproduce this stall 100% with both an xfs and an ext4
export, so I bisected with both, and both bisects landed on the
same commit:
615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
Author: Christoph Hellwig <hch@lst.de>
Date: Fri May 19 06:40:48 2023 +0200
blk-mq: defer to the normal submission path for post-flush requests
Requests with the FUA bit on hardware without FUA support need a post
flush before returning to the caller, but they can still be sent using
the normal I/O path after initializing the flush-related fields and
end I/O handler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230519044050.107790-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
block/blk-flush.c | 11 +++++++++++
1 file changed, 11 insertions(+)
On system 1: the exports are on top of /dev/mapper and reside on
an "INTEL SSDSC2BA400G3" SATA device.
On system 2: the exports are on top of /dev/mapper and reside on
an "INTEL SSDSC2KB240G8" SATA device.
System 1 was where I discovered the stall. System 2 is where I ran
the bisects.
The call stacks vary a little. I've seen stalls in both the WRITE
and SETATTR paths. Here's a sample from system 1:
INFO: task nfsd:1237 blocked for more than 122 seconds.
Tainted: G W 6.4.0-08699-g9e268189cb14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nfsd state:D stack:0 pid:1237 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x78f/0x7db
schedule+0x93/0xc8
jbd2_log_wait_commit+0xb4/0xf4
? __pfx_autoremove_wake_function+0x10/0x10
jbd2_complete_transaction+0x85/0x97
ext4_fc_commit+0x118/0x70a
? _raw_spin_unlock+0x18/0x2e
? __mark_inode_dirty+0x282/0x302
ext4_write_inode+0x94/0x121
ext4_nfs_commit_metadata+0x72/0x7d
commit_inode_metadata+0x1f/0x31 [nfsd]
commit_metadata+0x26/0x33 [nfsd]
nfsd_setattr+0x2f2/0x30e [nfsd]
nfsd_create_setattr+0x4e/0x87 [nfsd]
nfsd4_open+0x604/0x8fa [nfsd]
nfsd4_proc_compound+0x4a8/0x5e3 [nfsd]
? nfs4svc_decode_compoundargs+0x291/0x2de [nfsd]
nfsd_dispatch+0xb3/0x164 [nfsd]
svc_process_common+0x3c7/0x53a [sunrpc]
? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
svc_process+0xc6/0xe3 [sunrpc]
nfsd+0xf2/0x18c [nfsd]
? __pfx_nfsd+0x10/0x10 [nfsd]
kthread+0x10d/0x115
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>
--
Chuck Lever
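[A quick way to spot such stuck threads on the server, independent of the
hung-task splat above; this is a generic sketch and assumes a procps-style
ps that accepts the wchan:32 column width:]

```shell
# List tasks currently in uninterruptible sleep (D state) together with
# their kernel wait channel; on an affected server the nfsd threads show
# up here, waiting inside the jbd2/ext4 commit path.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
```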
* Re: NFS workload leaves nfsd threads in D state
2023-07-08 18:30 NFS workload leaves nfsd threads in D state Chuck Lever III
@ 2023-07-09 6:58 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-10 7:56 ` Christoph Hellwig
1 sibling, 0 replies; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-09 6:58 UTC (permalink / raw)
To: Chuck Lever III, Jens Axboe, Christoph Hellwig
Cc: linux-block@vger.kernel.org, Linux NFS Mailing List, Chuck Lever,
Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form. See the
link in the footer if these mails annoy you.]

On 08.07.23 20:30, Chuck Lever III wrote:
>
> I have a "standard" test of running the git regression suite with
> many threads against an NFS mount. I found that with 6.5-rc, the
> test stalled and several nfsd threads on the server were stuck
> in D state.
>
> I can reproduce this stall 100% with both an xfs and an ext4
> export, so I bisected with both, and both bisects landed on the
> same commit:
>
> 615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
> commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
> Author: Christoph Hellwig <hch@lst.de>
> Date: Fri May 19 06:40:48 2023 +0200
>
>     blk-mq: defer to the normal submission path for post-flush requests
>
>     Requests with the FUA bit on hardware without FUA support need a post
>     flush before returning to the caller, but they can still be sent using
>     the normal I/O path after initializing the flush-related fields and
>     end I/O handler.
>
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>     Link: https://lore.kernel.org/r/20230519044050.107790-6-hch@lst.de
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> block/blk-flush.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> On system 1: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2BA400G3" SATA device.
>
> On system 2: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2KB240G8" SATA device.
>
> System 1 was where I discovered the stall. System 2 is where I ran
> the bisects.
>
> The call stacks vary a little. I've seen stalls in both the WRITE
> and SETATTR paths. Here's a sample from system 1:
>
> INFO: task nfsd:1237 blocked for more than 122 seconds.
> Tainted: G W 6.4.0-08699-g9e268189cb14 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:nfsd state:D stack:0 pid:1237 ppid:2 flags:0x00004000
> Call Trace:
> <TASK>
> __schedule+0x78f/0x7db
> schedule+0x93/0xc8
> jbd2_log_wait_commit+0xb4/0xf4
> ? __pfx_autoremove_wake_function+0x10/0x10
> jbd2_complete_transaction+0x85/0x97
> ext4_fc_commit+0x118/0x70a
> ? _raw_spin_unlock+0x18/0x2e
> ? __mark_inode_dirty+0x282/0x302
> ext4_write_inode+0x94/0x121
> ext4_nfs_commit_metadata+0x72/0x7d
> commit_inode_metadata+0x1f/0x31 [nfsd]
> commit_metadata+0x26/0x33 [nfsd]
> nfsd_setattr+0x2f2/0x30e [nfsd]
> nfsd_create_setattr+0x4e/0x87 [nfsd]
> nfsd4_open+0x604/0x8fa [nfsd]
> nfsd4_proc_compound+0x4a8/0x5e3 [nfsd]
> ? nfs4svc_decode_compoundargs+0x291/0x2de [nfsd]
> nfsd_dispatch+0xb3/0x164 [nfsd]
> svc_process_common+0x3c7/0x53a [sunrpc]
> ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> svc_process+0xc6/0xe3 [sunrpc]
> nfsd+0xf2/0x18c [nfsd]
> ? __pfx_nfsd+0x10/0x10 [nfsd]
> kthread+0x10d/0x115
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x2c/0x50
> </TASK>

Thanks for the report.
To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced 615939a2ae734e
#regzbot title blk-mq: NFS workload leaves nfsd threads in D state
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify
when the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed
in the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags
pointing to the report (the parent of this mail). See page linked in
footer for details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
* Re: NFS workload leaves nfsd threads in D state
2023-07-08 18:30 NFS workload leaves nfsd threads in D state Chuck Lever III
2023-07-09 6:58 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-10 7:56 ` Christoph Hellwig
2023-07-10 14:06 ` Chuck Lever III
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-07-10 7:56 UTC (permalink / raw)
To: Chuck Lever III
Cc: Jens Axboe, Christoph Hellwig, linux-block@vger.kernel.org,
Linux NFS Mailing List, Chuck Lever

On Sat, Jul 08, 2023 at 06:30:26PM +0000, Chuck Lever III wrote:
> Hi -
>
> I have a "standard" test of running the git regression suite with
> many threads against an NFS mount. I found that with 6.5-rc, the
> test stalled and several nfsd threads on the server were stuck
> in D state.

Can you paste the exact reproducer here?

> I can reproduce this stall 100% with both an xfs and an ext4
> export, so I bisected with both, and both bisects landed on the
> same commit:
>
> On system 1: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2BA400G3" SATA device.
>
> On system 2: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2KB240G8" SATA device.
>
> System 1 was where I discovered the stall. System 2 is where I ran
> the bisects.

Ok. I'd be curious if this reproduces without either device mapper
or on a non-SATA device. If you have an easy way to run it in a VM
that'd be great. Otherwise I'll try to recreate it in various
setups if you post the exact reproducer.
* Re: NFS workload leaves nfsd threads in D state
2023-07-10 7:56 ` Christoph Hellwig
@ 2023-07-10 14:06 ` Chuck Lever III
2023-07-10 15:10 ` Chuck Lever III
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever III @ 2023-07-10 14:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block@vger.kernel.org, Linux NFS Mailing List,
Chuck Lever

> On Jul 10, 2023, at 3:56 AM, Christoph Hellwig <hch@lst.de> wrote:
>
> On Sat, Jul 08, 2023 at 06:30:26PM +0000, Chuck Lever III wrote:
>> Hi -
>>
>> I have a "standard" test of running the git regression suite with
>> many threads against an NFS mount. I found that with 6.5-rc, the
>> test stalled and several nfsd threads on the server were stuck
>> in D state.
>
> Can you paste the exact reproducer here?

It's a twisty little maze of scripts, but it does essentially this:

1. export a test filesystem on system B
2. mount that export on system A via NFS (I think I used NFSv4.1)
3. download the latest git tarball on system A
4. unpack the tarball on the test NFS mount on system A. umount / mount
5. "make -jN all docs" on system A, where N is nprocs. umount / mount
6. "make -jN test" on system A, where N is as in step 5.

(For "make test" to work, the mounted on dir on system A has to be
exactly the same for all steps.)

My system A has 12 cores, and B has 4, fwiw. The network fabric is
InfiniBand, but I suspect that won't make much difference.

During step 6, the tests will slow down and then stop cold. After
another two minutes, on system B you'll start to see the INFO splats
about hung processes.

As an interesting side note, I have a btrfs filesystem on that same
mapper group and physical device. I'm not able to reproduce the
problem on that filesystem.
>> I can reproduce this stall 100% with both an xfs and an ext4
>> export, so I bisected with both, and both bisects landed on the
>> same commit:
>
>> On system 1: the exports are on top of /dev/mapper and reside on
>> an "INTEL SSDSC2BA400G3" SATA device.
>>
>> On system 2: the exports are on top of /dev/mapper and reside on
>> an "INTEL SSDSC2KB240G8" SATA device.
>>
>> System 1 was where I discovered the stall. System 2 is where I ran
>> the bisects.
>
> Ok. I'd be curious if this reproduces without either device mapper
> or on a non-SATA device. If you have an easy way to run it in a VM
> that'd be great. Otherwise I'll try to recreate it in various
> setups if you post the exact reproducer.

I have a way to test it on an xfs export backed by a pair of AIC
NVMe devices. Stand by.

--
Chuck Lever
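[The six steps described above can be sketched as a script. This is a
hedged sketch, not the original "twisty little maze of scripts": SERVER,
EXPORT, MNT, and TARBALL are placeholder assumptions, and with DRY_RUN=1
(the default) the commands are only printed, not executed:]

```shell
#!/bin/sh
# Sketch of the reproducer described above, run on the NFS client
# (system A).  SERVER, EXPORT, MNT, and TARBALL are placeholders,
# not values taken from this thread.
: "${SERVER:=systemB}" "${EXPORT:=/export/test}" "${MNT:=/mnt/test}"
: "${TARBALL:=/tmp/git.tar.gz}" "${DRY_RUN:=1}"
N=$(nproc)

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Steps 1-2: export on system B (not shown), NFSv4.1 mount on system A.
# The mounted-on dir must be identical across all steps.
remount() {
	run umount "$MNT"
	run mount -t nfs -o vers=4.1 "$SERVER:$EXPORT" "$MNT"
}

remount
run tar -C "$MNT" -xzf "$TARBALL"        # step 4: unpack the git tarball
remount
run make -C "$MNT/git" -j"$N" all docs   # step 5
remount
run make -C "$MNT/git" -j"$N" test       # step 6: stalls on affected kernels
```

[Running it with DRY_RUN=0 performs the mounts and builds for real; the
vers=4.1 option matches the "I think I used NFSv4.1" note above.]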
* Re: NFS workload leaves nfsd threads in D state
2023-07-10 14:06 ` Chuck Lever III
@ 2023-07-10 15:10 ` Chuck Lever III
2023-07-10 15:18 ` Christoph Hellwig
2023-07-10 17:28 ` Christoph Hellwig
0 siblings, 2 replies; 14+ messages in thread
From: Chuck Lever III @ 2023-07-10 15:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block@vger.kernel.org, Linux NFS Mailing List,
Chuck Lever

> On Jul 10, 2023, at 10:06 AM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>> On Jul 10, 2023, at 3:56 AM, Christoph Hellwig <hch@lst.de> wrote:
>>
>> Ok. I'd be curious if this reproduces without either device mapper
>> or on a non-SATA device. If you have an easy way to run it in a VM
>> that'd be great. Otherwise I'll try to recreate it in various
>> setups if you post the exact reproducer.
>
> I have a way to test it on an xfs export backed by a pair of AIC
> NVMe devices. Stand by.

The issue is not reproducible with PCIe NVMe devices.

--
Chuck Lever
* Re: NFS workload leaves nfsd threads in D state
2023-07-10 15:10 ` Chuck Lever III
@ 2023-07-10 15:18 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2023-07-10 15:18 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Jens Axboe, linux-block@vger.kernel.org,
Linux NFS Mailing List, Chuck Lever

On Mon, Jul 10, 2023 at 03:10:41PM +0000, Chuck Lever III wrote:
>> I have a way to test it on an xfs export backed by a pair of AIC
>> NVMe devices. Stand by.
>
> The issue is not reproducible with PCIe NVMe devices.

Thanks. ATA is special because it can't queue flush commands, unlike
all other protocols. I'll see if I can get an ATA test setup and will
look into it ASAP.
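[As an aside, not from the thread: whether a device advertises native FUA,
and thus whether this emulated pre/post-flush path is in play at all, can
be read from the queue attributes in sysfs. A sketch:]

```shell
# Print write-cache/FUA capabilities for each block device.  fua=0
# combined with a "write back" cache means REQ_FUA writes are emulated
# via the flush state machine discussed in this thread.
show_flush_caps() {
	for q in /sys/block/*/queue; do
		printf '%s: fua=%s write_cache=%s\n' "${q%/queue}" \
			"$(cat "$q/fua" 2>/dev/null || echo '?')" \
			"$(cat "$q/write_cache" 2>/dev/null || echo '?')"
	done
}
show_flush_caps
```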
* Re: NFS workload leaves nfsd threads in D state
2023-07-10 15:10 ` Chuck Lever III
2023-07-10 15:18 ` Christoph Hellwig
@ 2023-07-10 17:28 ` Christoph Hellwig
2023-07-10 17:40 ` Chuck Lever III
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-07-10 17:28 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Jens Axboe, linux-block@vger.kernel.org,
Linux NFS Mailing List, Chuck Lever

Only found a virtualized AHCI setup so far without much success. Can
you try this (from Chengming Zhou) in the meantime?

diff --git a/block/blk-flush.c b/block/blk-flush.c
index dba392cf22bec6..5c392a277b9eb2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -443,7 +443,7 @@ bool blk_insert_flush(struct request *rq)
 	 * the post flush, and then just pass the command on.
 	 */
 	blk_rq_init_flush(rq);
-	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
+	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
 	spin_lock_irq(&fq->mq_flush_lock);
 	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
 	spin_unlock_irq(&fq->mq_flush_lock);
* Re: NFS workload leaves nfsd threads in D state
2023-07-10 17:28 ` Christoph Hellwig
@ 2023-07-10 17:40 ` Chuck Lever III
2023-07-11 12:01 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever III @ 2023-07-10 17:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block@vger.kernel.org, Linux NFS Mailing List,
Chuck Lever

> On Jul 10, 2023, at 1:28 PM, Christoph Hellwig <hch@lst.de> wrote:
>
> Only found a virtualized AHCI setup so far without much success. Can
> you try this (from Chengming Zhou) in the meantime?
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index dba392cf22bec6..5c392a277b9eb2 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -443,7 +443,7 @@ bool blk_insert_flush(struct request *rq)
>  	 * the post flush, and then just pass the command on.
>  	 */
>  	blk_rq_init_flush(rq);
> -	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
> +	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>  	spin_lock_irq(&fq->mq_flush_lock);
>  	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>  	spin_unlock_irq(&fq->mq_flush_lock);

Thanks for the quick response. No change.

--
Chuck Lever
* Re: NFS workload leaves nfsd threads in D state
2023-07-10 17:40 ` Chuck Lever III
@ 2023-07-11 12:01 ` Christoph Hellwig
2023-07-12 11:34 ` Chengming Zhou
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-07-11 12:01 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Jens Axboe, linux-block@vger.kernel.org,
Linux NFS Mailing List, Chuck Lever

On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>> 	blk_rq_init_flush(rq);
>> -	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>> +	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>> 	spin_lock_irq(&fq->mq_flush_lock);
>> 	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>> 	spin_unlock_irq(&fq->mq_flush_lock);
>
> Thanks for the quick response. No change.

I'm a bit lost and still can't reproduce. Below is a patch with the
only behavior differences I can find. It has two "#if 1" blocks,
which I'll need to bisect to find out which made it work (if any,
but I hope so).

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5504719b970d59..67364e607f2d1d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2927,6 +2927,7 @@ void blk_mq_submit_bio(struct bio *bio)
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	struct blk_plug *plug = blk_mq_plug(bio);
 	const int is_sync = op_is_sync(bio->bi_opf);
+	bool is_flush = op_is_flush(bio->bi_opf);
 	struct blk_mq_hw_ctx *hctx;
 	struct request *rq;
 	unsigned int nr_segs = 1;
@@ -2967,16 +2968,23 @@ void blk_mq_submit_bio(struct bio *bio)
 		return;
 	}

-	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
-		return;
-
-	if (plug) {
-		blk_add_rq_to_plug(plug, rq);
-		return;
+#if 1	/* Variant 1, the plug is holding us back */
+	if (op_is_flush(bio->bi_opf)) {
+		if (blk_insert_flush(rq))
+			return;
+	} else {
+		if (plug) {
+			blk_add_rq_to_plug(plug, rq);
+			return;
+		}
 	}
+#endif

 	hctx = rq->mq_hctx;
 	if ((rq->rq_flags & RQF_USE_SCHED) ||
+#if 1	/* Variant 2 (unlikely), blk_mq_try_issue_directly causes problems */
+	    is_flush ||
+#endif
 	    (hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
 		blk_mq_insert_request(rq, 0);
 		blk_mq_run_hw_queue(hctx, true);
* Re: NFS workload leaves nfsd threads in D state
2023-07-11 12:01 ` Christoph Hellwig
@ 2023-07-12 11:34 ` Chengming Zhou
2023-07-12 13:29 ` Chuck Lever III
0 siblings, 1 reply; 14+ messages in thread
From: Chengming Zhou @ 2023-07-12 11:34 UTC (permalink / raw)
To: Chuck Lever III, Christoph Hellwig, ross.lagerwall
Cc: Jens Axboe, linux-block@vger.kernel.org, Linux NFS Mailing List,
Chuck Lever

On 2023/7/11 20:01, Christoph Hellwig wrote:
> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>> 	blk_rq_init_flush(rq);
>>> -	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>> +	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>> 	spin_lock_irq(&fq->mq_flush_lock);
>>> 	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>> 	spin_unlock_irq(&fq->mq_flush_lock);
>>
>> Thanks for the quick response. No change.
>
> I'm a bit lost and still can't reproduce. Below is a patch with the
> only behavior differences I can find. It has two "#if 1" blocks,
> which I'll need to bisect to find out which made it work (if any,
> but I hope so).

Hello,

I tried today to reproduce, but can't, unfortunately.

Could you please also try the fix patch [1] from Ross Lagerwall that
fixes an IO hang caused by a recursive flush from the plug?

(Since the main difference is that post-flush requests now can go
into the plug.)

[1] https://lore.kernel.org/all/20230711160434.248868-1-ross.lagerwall@citrix.com/

Thanks!
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5504719b970d59..67364e607f2d1d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2927,6 +2927,7 @@ void blk_mq_submit_bio(struct bio *bio)
>  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>  	struct blk_plug *plug = blk_mq_plug(bio);
>  	const int is_sync = op_is_sync(bio->bi_opf);
> +	bool is_flush = op_is_flush(bio->bi_opf);
>  	struct blk_mq_hw_ctx *hctx;
>  	struct request *rq;
>  	unsigned int nr_segs = 1;
> @@ -2967,16 +2968,23 @@ void blk_mq_submit_bio(struct bio *bio)
>  		return;
>  	}
>
> -	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
> -		return;
> -
> -	if (plug) {
> -		blk_add_rq_to_plug(plug, rq);
> -		return;
> +#if 1	/* Variant 1, the plug is holding us back */
> +	if (op_is_flush(bio->bi_opf)) {
> +		if (blk_insert_flush(rq))
> +			return;
> +	} else {
> +		if (plug) {
> +			blk_add_rq_to_plug(plug, rq);
> +			return;
> +		}
> 	}
> +#endif
>
> 	hctx = rq->mq_hctx;
> 	if ((rq->rq_flags & RQF_USE_SCHED) ||
> +#if 1	/* Variant 2 (unlikely), blk_mq_try_issue_directly causes problems */
> +	    is_flush ||
> +#endif
> 	    (hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
> 		blk_mq_insert_request(rq, 0);
> 		blk_mq_run_hw_queue(hctx, true);
* Re: NFS workload leaves nfsd threads in D state
2023-07-12 11:34 ` Chengming Zhou
@ 2023-07-12 13:29 ` Chuck Lever III
2023-07-25 9:57 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever III @ 2023-07-12 13:29 UTC (permalink / raw)
To: Chengming Zhou
Cc: Christoph Hellwig, ross.lagerwall@citrix.com, Jens Axboe,
linux-block@vger.kernel.org, Linux NFS Mailing List, Chuck Lever

> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <chengming.zhou@linux.dev> wrote:
>
> On 2023/7/11 20:01, Christoph Hellwig wrote:
>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>>> 	blk_rq_init_flush(rq);
>>>> -	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>>> +	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>>> 	spin_lock_irq(&fq->mq_flush_lock);
>>>> 	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>>> 	spin_unlock_irq(&fq->mq_flush_lock);
>>>
>>> Thanks for the quick response. No change.
>>
>> I'm a bit lost and still can't reproduce. Below is a patch with the
>> only behavior differences I can find. It has two "#if 1" blocks,
>> which I'll need to bisect to find out which made it work (if any,
>> but I hope so).
>
> Hello,
>
> I tried today to reproduce, but can't, unfortunately.
>
> Could you please also try the fix patch [1] from Ross Lagerwall that
> fixes an IO hang caused by a recursive flush from the plug?
>
> (Since the main difference is that post-flush requests now can go
> into the plug.)
>
> [1] https://lore.kernel.org/all/20230711160434.248868-1-ross.lagerwall@citrix.com/

Thanks for the suggestion. No change, unfortunately.

> Thanks!
>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 5504719b970d59..67364e607f2d1d 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2927,6 +2927,7 @@ void blk_mq_submit_bio(struct bio *bio)
>>  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>>  	struct blk_plug *plug = blk_mq_plug(bio);
>>  	const int is_sync = op_is_sync(bio->bi_opf);
>> +	bool is_flush = op_is_flush(bio->bi_opf);
>>  	struct blk_mq_hw_ctx *hctx;
>>  	struct request *rq;
>>  	unsigned int nr_segs = 1;
>> @@ -2967,16 +2968,23 @@ void blk_mq_submit_bio(struct bio *bio)
>>  		return;
>>  	}
>>
>> -	if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
>> -		return;
>> -
>> -	if (plug) {
>> -		blk_add_rq_to_plug(plug, rq);
>> -		return;
>> +#if 1	/* Variant 1, the plug is holding us back */
>> +	if (op_is_flush(bio->bi_opf)) {
>> +		if (blk_insert_flush(rq))
>> +			return;
>> +	} else {
>> +		if (plug) {
>> +			blk_add_rq_to_plug(plug, rq);
>> +			return;
>> +		}
>> 	}
>> +#endif
>>
>> 	hctx = rq->mq_hctx;
>> 	if ((rq->rq_flags & RQF_USE_SCHED) ||
>> +#if 1	/* Variant 2 (unlikely), blk_mq_try_issue_directly causes problems */
>> +	    is_flush ||
>> +#endif
>> 	    (hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
>> 		blk_mq_insert_request(rq, 0);
>> 		blk_mq_run_hw_queue(hctx, true);

--
Chuck Lever
* Re: NFS workload leaves nfsd threads in D state
2023-07-12 13:29 ` Chuck Lever III
@ 2023-07-25 9:57 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-25 13:21 ` Chuck Lever III
0 siblings, 1 reply; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-25 9:57 UTC (permalink / raw)
To: Chuck Lever III, Chengming Zhou
Cc: Christoph Hellwig, ross.lagerwall@citrix.com, Jens Axboe,
linux-block@vger.kernel.org, Linux NFS Mailing List, Chuck Lever,
Linux kernel regressions list

On 12.07.23 15:29, Chuck Lever III wrote:
>> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <chengming.zhou@linux.dev> wrote:
>> On 2023/7/11 20:01, Christoph Hellwig wrote:
>>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>>>> 	blk_rq_init_flush(rq);
>>>>> -	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>>>> +	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>>>> 	spin_lock_irq(&fq->mq_flush_lock);
>>>>> 	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>>>> 	spin_unlock_irq(&fq->mq_flush_lock);
>>>>
>>>> Thanks for the quick response. No change.
>>> I'm a bit lost and still can't reproduce. Below is a patch with the
>>> only behavior differences I can find. It has two "#if 1" blocks,
>>> which I'll need to bisect to find out which made it work (if any,
>>> but I hope so).
>>
>> I tried today to reproduce, but can't, unfortunately.
>>
>> Could you please also try the fix patch [1] from Ross Lagerwall that
>> fixes an IO hang caused by a recursive flush from the plug?
>>
>> (Since the main difference is that post-flush requests now can go
>> into the plug.)
>>
>> [1] https://lore.kernel.org/all/20230711160434.248868-1-ross.lagerwall@citrix.com/
>
> Thanks for the suggestion. No change, unfortunately.

Chuck, what's the status here? This thread looks stalled, that's why I
wonder.

FWIW, I noticed a commit with a Fixes: tag for your culprit in next
(see 28b24123747098 ("blk-flush: fix rq->flush.seq for post-flush
requests")).
But unless I missed something you are not CCed, so I guess that's a
different issue?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
* Re: NFS workload leaves nfsd threads in D state
2023-07-25 9:57 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-25 13:21 ` Chuck Lever III
2023-07-25 13:34 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever III @ 2023-07-25 13:21 UTC (permalink / raw)
To: Linux regressions mailing list
Cc: Chengming Zhou, Christoph Hellwig, ross.lagerwall@citrix.com,
Jens Axboe, linux-block@vger.kernel.org, Linux NFS Mailing List,
Chuck Lever

> On Jul 25, 2023, at 5:57 AM, Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 12.07.23 15:29, Chuck Lever III wrote:
>>> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <chengming.zhou@linux.dev> wrote:
>>> On 2023/7/11 20:01, Christoph Hellwig wrote:
>>>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>>>>> 	blk_rq_init_flush(rq);
>>>>>> -	rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>>>>> +	rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>>>>> 	spin_lock_irq(&fq->mq_flush_lock);
>>>>>> 	list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>>>>> 	spin_unlock_irq(&fq->mq_flush_lock);
>>>>>
>>>>> Thanks for the quick response. No change.
>>>> I'm a bit lost and still can't reproduce. Below is a patch with the
>>>> only behavior differences I can find. It has two "#if 1" blocks,
>>>> which I'll need to bisect to find out which made it work (if any,
>>>> but I hope so).
>>>
>>> I tried today to reproduce, but can't, unfortunately.
>>>
>>> Could you please also try the fix patch [1] from Ross Lagerwall that
>>> fixes an IO hang caused by a recursive flush from the plug?
>>>
>>> (Since the main difference is that post-flush requests now can go
>>> into the plug.)
>>>
>>> [1] https://lore.kernel.org/all/20230711160434.248868-1-ross.lagerwall@citrix.com/
>>
>> Thanks for the suggestion. No change, unfortunately.
>
> Chuck, what's the status here? This thread looks stalled, that's why I
> wonder.
> FWIW, I noticed a commit with a Fixes: tag for your culprit in next
> (see 28b24123747098 ("blk-flush: fix rq->flush.seq for post-flush
> requests")). But unless I missed something you are not CCed, so I
> guess that's a different issue?

Hi Thorsten-

This issue was fixed in 6.5-rc2 by commit

9f87fc4d72f5 ("block: queue data commands from the flush state machine at the head")

--
Chuck Lever
* Re: NFS workload leaves nfsd threads in D state
2023-07-25 13:21 ` Chuck Lever III
@ 2023-07-25 13:34 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 0 replies; 14+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-07-25 13:34 UTC (permalink / raw)
To: Chuck Lever III, Linux regressions mailing list
Cc: Chengming Zhou, Christoph Hellwig, ross.lagerwall@citrix.com,
Jens Axboe, linux-block@vger.kernel.org, Linux NFS Mailing List,
Chuck Lever

On 25.07.23 15:21, Chuck Lever III wrote:
>> On Jul 25, 2023, at 5:57 AM, Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> On 12.07.23 15:29, Chuck Lever III wrote:
>>>> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <chengming.zhou@linux.dev> wrote:
>>>> On 2023/7/11 20:01, Christoph Hellwig wrote:
>>>>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>
>> Chuck, what's the status here? This thread looks stalled, that's why I
>> wonder.
>>
>> FWIW, I noticed a commit with a Fixes: tag for your culprit in next
>> (see 28b24123747098 ("blk-flush: fix rq->flush.seq for post-flush
>> requests")). But unless I missed something you are not CCed, so I
>> guess that's a different issue?
>
> This issue was fixed in 6.5-rc2 by commit
>
> 9f87fc4d72f5 ("block: queue data commands from the flush state machine at the head")

Ahh, many thx for the update, it lacked a proper Link:/Closes: tag and
mentioned another culprit, so I had missed that!

#regzbot fix: 9f87fc4d72f5
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.