From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Christoph Hellwig <hch@infradead.org>
Cc: Tal Zussman <tz2294@columbia.edu>, Jens Axboe <axboe@kernel.dk>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Carlos Maiolino <cem@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Dave Chinner <dgc@kernel.org>,
Bart Van Assche <bvanassche@acm.org>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Gao Xiang <xiang@kernel.org>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
Date: Fri, 29 May 2026 10:49:39 +0200 [thread overview]
Message-ID: <20260529084939.bWkYVTj4@linutronix.de> (raw)
In-Reply-To: <ahPdDtu3vXfNpb__@infradead.org>
On 2026-05-24 22:24:30 [-0700], Christoph Hellwig wrote:
> > > + local_lock_irqsave(&bio_complete_batch.lock, flags);
> >
> > Q: "Is it safe to use local_lock_irqsave() here when called from an atomic
> > context?
> > On CONFIG_PREEMPT_RT kernels, local_lock_t maps to a per-CPU spinlock_t,
> > which is a sleepable rt_mutex. Since __bio_complete_in_task() is specifically
> > called when bio_in_atomic() is true (which includes hardware interrupts or
> > execution under a raw_spinlock_t), attempting to acquire a sleepable lock
> > here would trigger an "Invalid wait context" lockdep warning.
> > Would a lockless list (llist) be more appropriate here to avoid sleeping
> > in atomic contexts?"
> >
> > A: This seems legit, but I'm not super familiar with PREEMPT_RT. I don't want
> > to switch to raw_spinlock_t, as it seems like that would add unnecessary
> > overhead on non-PREEMPT_RT kernels. I think switching to use local_irq_save()
> > (as is done for the per-CPU bio allocation cache) should work.
>
> Adding the PREEMPT_RT maintainers for this as it is above my pay grade.
The local_lock_irqsave() seems to come from __bio_complete_in_task()
which sounds like preemptible context in general. So yes, using so is
safe.
It should be even save with interrupt handlers and so on since they are
threaded in general.
There is this new bio_in_atomic() this looks a bit odd to detect softirq
context as calimed in the comment. Anyway, on PREEMPT_RT
rcu_preempt_depth() will work as intended. The preemptible() shouldn't
get false because softirq handling is not recorded in preempt-counter,
interrupts are forced-threaded so you should never be in hardirq
context and things like spin_lock_irq() don't disable interrupts. So
unless you do local_irq_save(), preempt_disable() you should remain
preemptible (even in softirq). This might or might not do what you want.
> > Q: "Does creating this workqueue with WQ_MEM_RECLAIM break the local_lock_irq()
> > protection in bio_complete_work_fn()?
> > When a workqueue has WQ_MEM_RECLAIM, it spawns a global rescuer thread that
> > can execute per-CPU work items during worker pool congestion. This rescuer
> > thread executes unbound, meaning it could run on CPU B while processing
> > CPU A's work item.
> > Since local_lock operates strictly on the currently executing CPU, the
> > rescuer thread on CPU B would acquire CPU B's lock, while popping elements
> > from CPU A's list (derived via container_of()).
> > If an interrupt on CPU A concurrently calls __bio_complete_in_task(),
> > it will acquire CPU A's lock and modify the same list without mutual
> > exclusion, potentially causing list corruption."
> >
> > A: The rescuer should run on the same CPU, not unbound, so this is not an
> > issue.
>
> This is another area where the PREEMPT_RT/scheduler folks might be able
> to help.
Not sure what the question is. WQ_MEM_RECLAIM is one thing WQ_UNBOUND/
WQ_PERCPU another.
bio_complete_wq is WQ_MEM_RECLAIM | WQ_PERCPU. So it will run on the
requested CPU. The container_of() and local_local() looks like it will
access the same thing but having a WARN_ON() if they don't would be a
blessing. Or just use this_cpu in the worker FN to avoid all this.
The need_resched() check in bio_complete_work_fn() is bit odd. So if
need_resched() is true then you want to leave (and you care !PREEMPT
kernels). On PREEMPT_LAZY you could just continue as there would be
preemption sooner or later.
The bio_list_empty() check below is futile. If it is empty then you
leave doing nothing (so you could just leave without the check).
If there is an item, then the enqueue "thread" should have invoked
mod_delayed_work_on(, 1) claiming the work. That means the
mod_delayed_work_on(, 0) in this function should do nothing because the
work is already claimed (so you could just leave skipping the extra
work).
Sebastian
prev parent reply other threads:[~2026-05-29 8:49 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <ea6fc01f-5cb7-4a04-9f92-bbd2791fea51@columbia.edu>
2026-05-25 5:24 ` [PATCH v6 1/4] block: add task-context bio completion infrastructure Christoph Hellwig
2026-05-29 8:49 ` Sebastian Andrzej Siewior [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260529084939.bWkYVTj4@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=bvanassche@acm.org \
--cc=cem@kernel.org \
--cc=clrkwllms@kernel.org \
--cc=dgc@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rt-devel@lists.linux.dev \
--cc=linux-xfs@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=tz2294@columbia.edu \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=xiang@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox