All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Tal Zussman <tz2294@columbia.edu>
Cc: Jens Axboe <axboe@kernel.dk>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Christian Brauner <brauner@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Carlos Maiolino <cem@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <dgc@kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Gao Xiang <xiang@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Clark Williams <clrkwllms@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
Date: Sun, 24 May 2026 22:24:30 -0700	[thread overview]
Message-ID: <ahPdDtu3vXfNpb__@infradead.org> (raw)
In-Reply-To: <ea6fc01f-5cb7-4a04-9f92-bbd2791fea51@columbia.edu>

[adding the PREEMPT-RT maintainers and list for one and a half questions
for them a bit below]

On Fri, May 22, 2026 at 07:09:59PM -0400, Tal Zussman wrote:
> > +		while ((bio = bio_list_pop(&list)))
> > +			bio->bi_end_io(bio);
> 
> Q: "Does this code handle bios that do not have a completion callback?
> If a bio is submitted with the BIO_COMPLETE_IN_TASK flag but has a NULL
> bi_end_io pointer (like some flush or discard bios), executing
> bio->bi_end_io(bio) directly will cause a NULL pointer dereference.
> The synchronous completion path in bio_endio() checks if bio->bi_end_io is
> non-NULL before calling it. Should a similar check be added here?"
> 
> A: Every bio that reaches this point should have bi_end_io set. There's no
> point in deferring a bio without a completion handler.

Yes.  The only bios without bi_end_io that can end up in bio_endio
are those attached to passthrough requests, and deferring them of
course doesn't make sense.

> Q: "Could this loop also monopolize the CPU and cause latency spikes?
> Because the work uses a 1-jiffie delay, tens of thousands of bios might
> accumulate under heavy I/O load. Processing the entire batch sequentially
> without yielding or calling cond_resched() inside the inner loop could
> lead to scheduler latency or watchdog warnings on non-preemptible kernels."
> 
> A: With PREEMPT_LAZY as the default, the worker is preemptible, as per 
> Hillf and Christoph, so this is unnecessary.

Yes.  Sashiko tends to spew this a lot, which is odd as there were clear
recommendations to only do this absolutely if needed even before
PREEMPT_LAZY.

> > +
> > +	local_lock_irqsave(&bio_complete_batch.lock, flags);
> 
> Q: "Is it safe to use local_lock_irqsave() here when called from an atomic
> context?
> On CONFIG_PREEMPT_RT kernels, local_lock_t maps to a per-CPU spinlock_t,
> which is a sleepable rt_mutex. Since __bio_complete_in_task() is specifically
> called when bio_in_atomic() is true (which includes hardware interrupts or
> execution under a raw_spinlock_t), attempting to acquire a sleepable lock
> here would trigger an "Invalid wait context" lockdep warning.
> Would a lockless list (llist) be more appropriate here to avoid sleeping
> in atomic contexts?"
> 
> A: This seems legit, but I'm not super familiar with PREEMPT_RT. I don't want
> to switch to raw_spinlock_t, as it seems like that would add unnecessary
> overhead on non-PREEMPT_RT kernels. I think switching to use local_irq_save()
> (as is done for the per-CPU bio allocation cache) should work.

Adding the PREEMPT_RT maintainers for this as it is above my pay grade.

> Q: "Does creating this workqueue with WQ_MEM_RECLAIM break the local_lock_irq()
> protection in bio_complete_work_fn()?
> When a workqueue has WQ_MEM_RECLAIM, it spawns a global rescuer thread that
> can execute per-CPU work items during worker pool congestion. This rescuer
> thread executes unbound, meaning it could run on CPU B while processing
> CPU A's work item.
> Since local_lock operates strictly on the currently executing CPU, the
> rescuer thread on CPU B would acquire CPU B's lock, while popping elements
> from CPU A's list (derived via container_of()).
> If an interrupt on CPU A concurrently calls __bio_complete_in_task(),
> it will acquire CPU A's lock and modify the same list without mutual
> exclusion, potentially causing list corruption."
> 
> A: The rescuer should run on the same CPU, not unbound, so this is not an
> issue.

This is another area where the PREEMPT_RT/scheduler folks might be able
to help.

> static inline bool bio_complete_in_task(struct bio *bio)
> {
> 	if (bio_flagged(bio, BIO_COMPLETE_IN_TASK))
> 		return false;
> 	if (!bio_in_atomic())
> 		return false;
> 	bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
> 	__bio_complete_in_task(bio);
> 	return true;
> }
> 
> We can use the BIO_COMPLETE_IN_TASK flag to indicate that it's already
> been deferred to the workqueue as is safe to run.

Would be nice to avoid this, but yes.



  reply	other threads:[~2026-05-25  5:24 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14 21:51 [PATCH v6 0/4] block: enable RWF_DONTCACHE for block devices Tal Zussman
2026-05-14 21:51 ` [PATCH v6 1/4] block: add task-context bio completion infrastructure Tal Zussman
2026-05-15  2:38   ` Hillf Danton
2026-05-18  6:48   ` Christoph Hellwig
2026-05-22 22:47     ` Tal Zussman
2026-05-25  5:17       ` Christoph Hellwig
2026-05-26 19:29         ` Tal Zussman
2026-05-27  9:42           ` Jan Kara
2026-05-27 13:00             ` Christoph Hellwig
2026-05-29 20:46               ` Tal Zussman
2026-06-01 11:04                 ` Jan Kara
2026-05-22 23:09   ` Tal Zussman
2026-05-25  5:24     ` Christoph Hellwig [this message]
2026-05-29  8:49       ` Sebastian Andrzej Siewior
2026-05-14 21:51 ` [PATCH v6 2/4] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback Tal Zussman
2026-05-18  6:48   ` Christoph Hellwig
2026-05-14 21:51 ` [PATCH v6 3/4] buffer: add dropbehind writeback support Tal Zussman
2026-05-18  6:49   ` Christoph Hellwig
2026-05-22 23:14   ` Tal Zussman
2026-05-25  5:25     ` Christoph Hellwig
2026-05-14 21:51 ` [PATCH v6 4/4] block: enable RWF_DONTCACHE for block devices Tal Zussman
2026-05-18  6:49   ` Christoph Hellwig
2026-05-22 23:17   ` Tal Zussman
2026-05-25  5:30     ` Christoph Hellwig
2026-05-25 18:06       ` Tal Zussman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahPdDtu3vXfNpb__@infradead.org \
    --to=hch@infradead.org \
    --cc=axboe@kernel.dk \
    --cc=bigeasy@linutronix.de \
    --cc=brauner@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=cem@kernel.org \
    --cc=clrkwllms@kernel.org \
    --cc=dgc@kernel.org \
    --cc=djwong@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=linux-xfs@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tz2294@columbia.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=xiang@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.