From: asmadeus@codewreck.org
To: Christoph Hellwig <hch@infradead.org>
Cc: Eric Van Hensbergen <ericvh@kernel.org>,
Latchesar Ionkov <lucho@ionkov.net>,
Christian Schoenebeck <linux_oss@crudebyte.com>,
v9fs@lists.linux.dev, linux-kernel@vger.kernel.org,
David Howells <dhowells@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
linux-fsdevel@vger.kernel.org,
Chris Arges <carges@cloudflare.com>
Subject: Re: [PATCH] 9p/virtio: restrict page pinning to user_backed_iter() iovec
Date: Wed, 10 Dec 2025 16:38:02 +0900
Message-ID: <aTkjWsOyDzXq_bLv@codewreck.org>
In-Reply-To: <aTkNbptI5stvpBPn@infradead.org>

Christoph Hellwig wrote on Tue, Dec 09, 2025 at 10:04:30PM -0800:
> On Wed, Dec 10, 2025 at 06:04:23AM +0900, Dominique Martinet via B4 Relay wrote:
> > From: Dominique Martinet <asmadeus@codewreck.org>
> >
> > When doing a loop mount of a filesystem over 9p, read requests can come
> > from unexpected places and blow up as reported by Chris Arges with this
> > reproducer:
> > ```
> > dd if=/dev/zero of=./xfs.img bs=1M count=300
> > yes | mkfs.xfs -b size=8192 ./xfs.img
> > rm -rf ./mount && mkdir -p ./mount
> > mount -o loop ./xfs.img ./mount
> > ```
>
> We should really wire this up to xfstests so that all file systems
> see the pattern of kmalloc allocations passed into the block layer
> and then on to the direct I/O code.

Note this doesn't seem to reproduce on my test VM, so I'm not sure what
kind of precondition is needed to go through this code...

> > The problem is that iov_iter_get_pages_alloc2() apparently cannot be
> > called on folios (as illustrated by the backtrace below), so limit what
> > iov we can pin from !iov_iter_is_kvec() to user_backed_iter()
>
> As willy pointed out this is a kmalloc.

OK, I got confused because of the VM_BUG_ON_FOLIO(), but looking back
it's in a folio_get() called directly from __iov_iter_get_pages_alloc(),
so that was likely a bvec...
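
To make the difference between the two checks concrete, here is a
user-space sketch (not kernel code). The enum and helper names are made
up for illustration and only cover four iterator kinds; the one fact
taken from the kernel is that user_backed_iter() is true for ubuf and
iovec iterators only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of a few iterator kinds, for illustration only. */
enum mock_iter_type {
	MOCK_ITER_UBUF,
	MOCK_ITER_IOVEC,
	MOCK_ITER_BVEC,
	MOCK_ITER_KVEC,
};

/* Old condition, !iov_iter_is_kvec(): everything but a kvec gets pinned. */
static bool old_check_pins(enum mock_iter_type t)
{
	return t != MOCK_ITER_KVEC;
}

/* Proposed condition, user_backed_iter(): only ubuf and iovec get pinned. */
static bool new_check_pins(enum mock_iter_type t)
{
	return t == MOCK_ITER_UBUF || t == MOCK_ITER_IOVEC;
}

/* The two checks disagree on a bvec, the kernel-backed case in question. */
static void check_bvec_case(void)
{
	assert(old_check_pins(MOCK_ITER_BVEC));
	assert(!new_check_pins(MOCK_ITER_BVEC));
}
```

With the old check a bvec falls into the pinning path; with the new one
it is treated like a kvec.
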
My points of "but there's a case for it in __iov_iter_get_pages_alloc()"
and "we have no idea what to do" still stand, though you answered
that below:

> And 9p (just like NFS) really needs to switch away from
> iov_iter_get_pages_alloc2 to iov_iter_extract_pages, which handles not
> just this perfectly fine but also fixes various other issues.

OK, so we can remove the special branch for kvec and just extract pages
with this.

I understand it pins user space pages, so there's no risk of them moving
under us during the IO, and there's nothing else we need to do about it?

Looking at the implementation of iov_iter_extract_bvec_pages(), it looks
like it might not process all the way to the end of the iterator, so we
need to call iov_iter_extract_pages() in a loop? (I see networking code
looping on "while (iter->count > 0)")

I'll send a v2 with that when I can.
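
The loop shape I have in mind, as a user-space sketch only:
mock_extract() is a hypothetical stand-in for iov_iter_extract_pages()
that, by assumption, consumes at most two pages' worth of bytes per call
and advances the iterator. The real function also fills a page array and
can return a negative error, neither of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u

/* Hypothetical stand-in for an iov_iter: only the remaining byte count. */
struct mock_iter {
	size_t count;
};

/*
 * Hypothetical extractor: takes at most one "segment" (capped at two
 * pages here) per call, advances the iterator, returns bytes consumed.
 */
static size_t mock_extract(struct mock_iter *it, size_t maxsize)
{
	size_t seg_cap = 2 * MOCK_PAGE_SIZE;
	size_t n = it->count;

	if (n > maxsize)
		n = maxsize;
	if (n > seg_cap)
		n = seg_cap;
	it->count -= n;
	return n;
}

/*
 * Call the extractor until the iterator is drained. Returns the number
 * of calls made, or -1 if a call made no progress.
 */
static int extract_all(struct mock_iter *it, size_t *total)
{
	int calls = 0;

	assert(it != NULL && total != NULL);
	*total = 0;
	while (it->count > 0) {
		size_t got = mock_extract(it, it->count);

		if (got == 0)
			return -1;	/* avoid spinning forever */
		*total += got;
		calls++;
	}
	return calls;
}
```

The zero-progress check is there so a short return can never turn into
an endless loop.
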
While I have your attention, there's some work to move away from large
(>1MB) kmalloc() in the non-zerocopy case to kvmalloc(), whose buffers
might not be physically contiguous (see commit e21d451a82f3 ("9p: Use
kvmalloc for message buffers on supported transports"), which basically
only did that for trans_fd). There's no iov_iter involved so it's off
topic, but how would one go about "extracting pages" out of that?

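
To illustrate what I mean by "extracting pages" from such a buffer: a
virtually contiguous range has to be cut at page boundaries before each
piece can be mapped to its own page. The user-space sketch below only
does that arithmetic; struct chunk and split_by_page() are made-up
names, and the per-page lookup itself (vmalloc_to_page() exists for
vmalloc addresses) is left out. This is one possible approach, not what
9p does today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096u

/* One piece of the buffer that lies entirely within a single page. */
struct chunk {
	uintptr_t page_base;	/* page-aligned start address */
	size_t offset;		/* offset of the data within that page */
	size_t len;		/* bytes of data in that page */
};

/*
 * Split [addr, addr + len) at page boundaries. Returns the number of
 * chunks written to out (at most max_chunks).
 */
static size_t split_by_page(uintptr_t addr, size_t len,
			    struct chunk *out, size_t max_chunks)
{
	size_t n = 0;

	assert(out != NULL);
	while (len > 0 && n < max_chunks) {
		size_t off = addr % MOCK_PAGE_SIZE;
		size_t take = MOCK_PAGE_SIZE - off;

		if (take > len)
			take = len;
		out[n].page_base = addr - off;
		out[n].offset = off;
		out[n].len = take;
		n++;
		addr += take;
		len -= take;
	}
	return n;
}
```

Each chunk would then become one sg entry, with the page found
separately for every page_base.
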
> Note that the networking code still wants special treatment for kmalloc
> pages, so you might have more work there.

I *think* we're fine on this end, as it's just passing the buffers into
an sg list for virtio; as long as things don't move under the caller I
assume they don't care...

Thanks,
--
Dominique Martinet | Asmadeus