From: Ming Lei <ming.lei@redhat.com>
To: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-block@vger.kernel.org,
Mikulas Patocka <mpatocka@redhat.com>,
Zhaoyang Huang <zhaoyang.huang@unisoc.com>,
Dave Chinner <dchinner@redhat.com>,
linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: calling into file systems directly from ->queue_rq, was Re: [PATCH V5 0/6] loop: improve loop aio perf by IOCB_NOWAIT
Date: Tue, 25 Nov 2025 17:19:44 +0800
Message-ID: <aSV0sDZGDoS-tLlp@fedora>
In-Reply-To: <db90b7b3-bf94-4531-8329-d9e0dbc6a997@linux.alibaba.com>
On Tue, Nov 25, 2025 at 03:26:39PM +0800, Gao Xiang wrote:
> Hi Ming and Christoph,
>
> On 2025/11/25 11:00, Ming Lei wrote:
> > On Mon, Nov 24, 2025 at 01:05:46AM -0800, Christoph Hellwig wrote:
> > > On Mon, Nov 24, 2025 at 05:02:03PM +0800, Ming Lei wrote:
> > > > On Sun, Nov 23, 2025 at 10:12:24PM -0800, Christoph Hellwig wrote:
> > > > > FYI, with this series I'm seeing somewhat frequent stack overflows when
> > > > > using loop on top of XFS on top of stacked block devices.
> > > >
> > > > Can you share your setup?
> > > >
> > > > BTW, there is one followup fix:
> > > >
> > > > https://lore.kernel.org/linux-block/20251120160722.3623884-1-ming.lei@redhat.com/
> > > >
> > > > I just ran 'xfstests -q quick' on loop on top of XFS on top of dm-stripe,
> > > > and did not see a stack overflow with the above fix against -next.
> > >
> > > This was with a development tree with lots of local code. So the
> > > messages aren't applicable (and probably a hint I need to reduce my
> > > stack usage). The observation is that we now stack through from block
> > > submission context into the file system write path, which is bad for a
> > > lot of reasons. journal_info being the most obvious one.
> > >
> > > > > In other words: I don't think issuing file system I/O from the
> > > > > submission thread in loop can work, and we should drop this again.
> > > >
> > > > I don't object to dropping it one more time.
> > > >
> > > > However, can we confirm whether it is really a stack overflow caused by
> > > > calling into the FS from ->queue_rq()?
> > >
> > > Yes.
> > >
> > > > If yes, it could be a dead end to improve loop in this way, and then I can give up.
> > >
> > > I think calling directly into the lower file system without a context
> > > switch is very problematic, so IMHO yes, it is a dead end.
> I've already explained the details in
> https://lore.kernel.org/r/8c596737-95c1-4274-9834-1fe06558b431@linux.alibaba.com
>
> to the zram folks why having block devices act like this is very
> risky: in brief, virtual block devices (unlike the inner fs itself)
> have no way to know whether the inner fs has already done something
> whose context was not saved (i.e. a side effect), so a new task
> context is absolutely necessary for virtual block devices to access
> backing fses in stacked usage.
>
> So whether a nested fs call can succeed is intrinsic to the specific
> fses (either they guarantee no complex journal_info access, or they
> save all affected contexts before transitioning to the block layer).
> But that is not something a bdev can do, since it has to work with
> any block fs.
IMO, task stack overflow could be the biggest trouble.
The block layer has current->blk_plug/current->bio_list, which are
dealt with in the following patches:
https://lore.kernel.org/linux-block/20251120160722.3623884-4-ming.lei@redhat.com/
https://lore.kernel.org/linux-block/20251120160722.3623884-5-ming.lei@redhat.com/
I am curious why the FS task context can't be saved/restored inside the
block layer when calling into new FS I/O, given that it is just per-task info.
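To make the idea concrete, here is a rough, untested sketch of the
save/restore I have in mind (the struct and helper names are made up for
illustration only; the real pieces are just the current->journal_info,
current->plug and current->bio_list task_struct members):

#include <linux/sched.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/fs.h>

/*
 * Hypothetical per-task FS/block context, saved before the loop driver
 * calls into the backing filesystem from bio submission context and
 * restored afterwards, so the nested FS I/O starts from a clean state.
 */
struct saved_fs_task_ctx {
	void			*journal_info;
	struct blk_plug		*plug;
	struct bio_list		*bio_list;
};

static void fs_task_ctx_save(struct saved_fs_task_ctx *ctx)
{
	ctx->journal_info	= current->journal_info;
	ctx->plug		= current->plug;
	ctx->bio_list		= current->bio_list;

	/* give the nested FS call a clean per-task context */
	current->journal_info	= NULL;
	current->plug		= NULL;
	current->bio_list	= NULL;
}

static void fs_task_ctx_restore(struct saved_fs_task_ctx *ctx)
{
	current->journal_info	= ctx->journal_info;
	current->plug		= ctx->plug;
	current->bio_list	= ctx->bio_list;
}

/* hypothetical call site in the loop driver's NOWAIT path */
static ssize_t lo_call_backing_fs(struct file *file, struct kiocb *iocb,
				  struct iov_iter *iter)
{
	struct saved_fs_task_ctx ctx;
	ssize_t ret;

	fs_task_ctx_save(&ctx);
	ret = call_write_iter(file, iocb, iter);	/* or call_read_iter() */
	fs_task_ctx_restore(&ctx);
	return ret;
}

Of course this only covers the per-task state; it does not help with the
stack depth concern above.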
Thanks,
Ming
Thread overview: 27+ messages
2025-10-15 11:07 [PATCH V5 0/6] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
2025-10-15 11:07 ` [PATCH V5 1/6] loop: add helper lo_cmd_nr_bvec() Ming Lei
2025-10-15 15:49 ` Bart Van Assche
2025-10-16 2:19 ` Ming Lei
2025-10-15 11:07 ` [PATCH V5 2/6] loop: add helper lo_rw_aio_prep() Ming Lei
2025-10-15 11:07 ` [PATCH V5 3/6] loop: add lo_submit_rw_aio() Ming Lei
2025-10-15 11:07 ` [PATCH V5 4/6] loop: move command blkcg/memcg initialization into loop_queue_work Ming Lei
2025-10-15 11:07 ` [PATCH V5 5/6] loop: try to handle loop aio command via NOWAIT IO first Ming Lei
2025-10-15 11:07 ` [PATCH V5 6/6] loop: add hint for handling aio via IOCB_NOWAIT Ming Lei
2025-11-18 12:55 ` [PATCH V5 0/6] loop: improve loop aio perf by IOCB_NOWAIT Ming Lei
2025-11-18 15:38 ` Jens Axboe
2025-11-24 6:12 ` calling into file systems directly from ->queue_rq, was " Christoph Hellwig
2025-11-24 9:02 ` Ming Lei
2025-11-24 9:05 ` Christoph Hellwig
2025-11-25 3:00 ` Ming Lei
2025-11-25 3:56 ` Jens Axboe
2025-11-25 7:26 ` Gao Xiang
2025-11-25 9:19 ` Ming Lei [this message]
2025-11-25 9:39 ` Gao Xiang
2025-11-25 10:13 ` Gao Xiang
2025-11-25 10:41 ` Ming Lei
2025-11-25 10:57 ` Gao Xiang
2025-11-25 11:10 ` Christoph Hellwig
2025-11-25 11:48 ` Ming Lei
2025-11-25 11:58 ` Gao Xiang
2025-11-25 12:18 ` Ming Lei
2025-11-25 15:16 ` Gao Xiang