From: Christoph Hellwig <hch@infradead.org>
To: Tal Zussman <tz2294@columbia.edu>
Cc: Jens Axboe <axboe@kernel.dk>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Carlos Maiolino <cem@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Christoph Hellwig <hch@infradead.org>,
Dave Chinner <dgc@kernel.org>,
Bart Van Assche <bvanassche@acm.org>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Gao Xiang <xiang@kernel.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
Date: Sun, 24 May 2026 22:24:30 -0700 [thread overview]
Message-ID: <ahPdDtu3vXfNpb__@infradead.org> (raw)
In-Reply-To: <ea6fc01f-5cb7-4a04-9f92-bbd2791fea51@columbia.edu>
[adding the PREEMPT-RT maintainers and list for one and a half questions
for them a bit below]
On Fri, May 22, 2026 at 07:09:59PM -0400, Tal Zussman wrote:
> > + while ((bio = bio_list_pop(&list)))
> > + bio->bi_end_io(bio);
>
> Q: "Does this code handle bios that do not have a completion callback?
> If a bio is submitted with the BIO_COMPLETE_IN_TASK flag but has a NULL
> bi_end_io pointer (like some flush or discard bios), executing
> bio->bi_end_io(bio) directly will cause a NULL pointer dereference.
> The synchronous completion path in bio_endio() checks if bio->bi_end_io is
> non-NULL before calling it. Should a similar check be added here?"
>
> A: Every bio that reaches this point should have bi_end_io set. There's no
> point in deferring a bio without a completion handler.
Yes. The only bios without bi_end_io that can end up in bio_endio
are those attached to passthrough requests, and deferring them of
course doesn't make sense.
> Q: "Could this loop also monopolize the CPU and cause latency spikes?
> Because the work uses a 1-jiffie delay, tens of thousands of bios might
> accumulate under heavy I/O load. Processing the entire batch sequentially
> without yielding or calling cond_resched() inside the inner loop could
> lead to scheduler latency or watchdog warnings on non-preemptible kernels."
>
> A: With PREEMPT_LAZY as the default, the worker is preemptible, as per
> Hillf and Christoph, so this is unnecessary.
Yes. Sashiko tends to spew this a lot, which is odd as there were clear
recommendations to only do this absolutely if needed even before
PREEMPT_LAZY.
> > +
> > + local_lock_irqsave(&bio_complete_batch.lock, flags);
>
> Q: "Is it safe to use local_lock_irqsave() here when called from an atomic
> context?
> On CONFIG_PREEMPT_RT kernels, local_lock_t maps to a per-CPU spinlock_t,
> which is a sleepable rt_mutex. Since __bio_complete_in_task() is specifically
> called when bio_in_atomic() is true (which includes hardware interrupts or
> execution under a raw_spinlock_t), attempting to acquire a sleepable lock
> here would trigger an "Invalid wait context" lockdep warning.
> Would a lockless list (llist) be more appropriate here to avoid sleeping
> in atomic contexts?"
>
> A: This seems legit, but I'm not super familiar with PREEMPT_RT. I don't want
> to switch to raw_spinlock_t, as it seems like that would add unnecessary
> overhead on non-PREEMPT_RT kernels. I think switching to use local_irq_save()
> (as is done for the per-CPU bio allocation cache) should work.
Adding the PREEMPT_RT maintainers for this as it is above my pay grade.
> Q: "Does creating this workqueue with WQ_MEM_RECLAIM break the local_lock_irq()
> protection in bio_complete_work_fn()?
> When a workqueue has WQ_MEM_RECLAIM, it spawns a global rescuer thread that
> can execute per-CPU work items during worker pool congestion. This rescuer
> thread executes unbound, meaning it could run on CPU B while processing
> CPU A's work item.
> Since local_lock operates strictly on the currently executing CPU, the
> rescuer thread on CPU B would acquire CPU B's lock, while popping elements
> from CPU A's list (derived via container_of()).
> If an interrupt on CPU A concurrently calls __bio_complete_in_task(),
> it will acquire CPU A's lock and modify the same list without mutual
> exclusion, potentially causing list corruption."
>
> A: The rescuer should run on the same CPU, not unbound, so this is not an
> issue.
This is another area where the PREEMPT_RT/scheduler folks might be able
to help.
> static inline bool bio_complete_in_task(struct bio *bio)
> {
> if (bio_flagged(bio, BIO_COMPLETE_IN_TASK))
> return false;
> if (!bio_in_atomic())
> return false;
> bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
> __bio_complete_in_task(bio);
> return true;
> }
>
> We can use the BIO_COMPLETE_IN_TASK flag to indicate that it's already
> been deferred to the workqueue as is safe to run.
Would be nice to avoid this, but yes.
next prev parent reply other threads:[~2026-05-25 5:24 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 21:51 [PATCH v6 0/4] block: enable RWF_DONTCACHE for block devices Tal Zussman
2026-05-14 21:51 ` [PATCH v6 1/4] block: add task-context bio completion infrastructure Tal Zussman
2026-05-18 6:48 ` Christoph Hellwig
2026-05-22 22:47 ` Tal Zussman
2026-05-25 5:17 ` Christoph Hellwig
2026-05-22 23:09 ` Tal Zussman
2026-05-25 5:24 ` Christoph Hellwig [this message]
2026-05-14 21:51 ` [PATCH v6 2/4] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback Tal Zussman
2026-05-18 6:48 ` Christoph Hellwig
2026-05-14 21:51 ` [PATCH v6 3/4] buffer: add dropbehind writeback support Tal Zussman
2026-05-18 6:49 ` Christoph Hellwig
2026-05-22 23:14 ` Tal Zussman
2026-05-25 5:25 ` Christoph Hellwig
2026-05-14 21:51 ` [PATCH v6 4/4] block: enable RWF_DONTCACHE for block devices Tal Zussman
2026-05-18 6:49 ` Christoph Hellwig
2026-05-22 23:17 ` Tal Zussman
2026-05-25 5:30 ` Christoph Hellwig
2026-05-25 18:06 ` Tal Zussman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahPdDtu3vXfNpb__@infradead.org \
--to=hch@infradead.org \
--cc=axboe@kernel.dk \
--cc=bigeasy@linutronix.de \
--cc=brauner@kernel.org \
--cc=bvanassche@acm.org \
--cc=cem@kernel.org \
--cc=clrkwllms@kernel.org \
--cc=dgc@kernel.org \
--cc=djwong@kernel.org \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rt-devel@lists.linux.dev \
--cc=linux-xfs@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tz2294@columbia.edu \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=xiang@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox