linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: asmadeus@codewreck.org
To: Christoph Hellwig <hch@infradead.org>
Cc: Eric Van Hensbergen <ericvh@kernel.org>,
	Latchesar Ionkov <lucho@ionkov.net>,
	Christian Schoenebeck <linux_oss@crudebyte.com>,
	v9fs@lists.linux.dev, linux-kernel@vger.kernel.org,
	David Howells <dhowells@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel@vger.kernel.org,
	Chris Arges <carges@cloudflare.com>
Subject: Re: [PATCH] 9p/virtio: restrict page pinning to user_backed_iter() iovec
Date: Sat, 13 Dec 2025 22:28:50 +0900	[thread overview]
Message-ID: <aT1qEmxcOjuJEZH9@codewreck.org> (raw)
In-Reply-To: <aTkwKbnXvUZs4UU9@infradead.org>

Christoph Hellwig wrote on Wed, Dec 10, 2025 at 12:32:41AM -0800:
> > Looking at the implementation for iov_iter_extract_bvec_pages() it looks
> > like it might not process all the way to the end, so we need to loop on
> > calling iov_iter_extract_pages()? (I see networking code looping on
> > "while (iter->count > 0)")
> 
> Yes.

Ok, I don't understand why the current code locks everything down and
wants to use a single scatterlist shared for the whole channel (and
capped to 128 pages?), it should only need to lock around the
virtqueue_add_sg() call, I'll need to play with that some more.

Looking at other virtio drivers I could probably use a sg_table and
have extract_iter_to_sg() do all the work for us...

> > I'll send a v2 with that when I can
> 
> You looked into this already, but in case you haven't seen it yet,
> don't forget to call unpin_user_folio on the completion side as well.

Thanks for bringing this up -- I'm not sure I understand how to decide
on what to cleanup with... Depending on the type of pages it's not
necessarily unpin_user_folio() is it? Or will it just ignore anything
else?
Ah, the caller must check iov_iter_extract_will_pin(iter), then it looks
like I'm free to call unpin_user_page() (or unpin_folio() which is the
same without pinned check) -- unpin_user_folio() would allow unpining
multiple pages within a folio, but iov_iter_extract_pages() doesn't give
us folios that could be used like that)


FWIW, the comment at the top of extract_iter_to_sg() says:
> The iov_iter_extract_mode() function should be used to query how cleanup
but I couldn't find any such function (even back when this comment was
added to netfs code in 2023...), the two copies of this comment probably
could use updating... David?


> > While I have your attention, there's some work to move away from large
> > (>1MB) kmalloc() in the non-zerocopy case into kvmalloc() that might not
> > be contiguous (see commit e21d451a82f3 ("9p: Use kvmalloc for message
> > buffers on supported transports") that basically only did that for
> > trans_fd), there's no iov_iter involved so it's off topic but how would
> > one get around "extracting pages" out of that?
> 
> vmalloc and I/O is in general problematic, because you need special
> calls to flush the cached for VIVT caches:
> flush_kernel_vmap_range / invalidate_kernel_vmap_range.
> 
> If you want to stuff that into an iov_iter, the only sane way is to
> call vmalloc_to_page and build a bvec iter from that at the moment.
> You also need to do the flush_kernel_vmap_range /
> invalidate_kernel_vmap_range in the caller for it.

Thanks, this is on the virtio/network side so I don't need an iov_iter,
but I agree virtio will not like vmap addresses as well -- I'd want to
resolve and pin the backing pages and pass that to a scatterlist like
we're doing here... Looking at extract_kvec_to_sg() it's just calling
vmalloc_to_page() on the the kvec address if it's a vmalloc address, so
no pin, so I guess pin might not be needed?
I was under the impression that vmalloc meant the kernel could shuffle
the physical data under us when we're not scheduled, but I'm really not
familiar enough with these memory management details; let's focus on the
bug at hands first...
I'll (eventually) poke back at this after we're all happy with the
zero-copy side.


> > I *think* we're fine on this end, as it's just passing the buffers into
> > a sg list for virtio, as long as things don't move under the caller I
> > assume they don't care...
> 
> Ok, if your backend is virtio that's fine.  If it's actual 9p over the
> network you might still into issues, though.

TCP and other transports don't do zero-copy straight into the iov_iter,
so virtio is our only problem right now -- thanksfully! :)


Thanks,
-- 
Dominique Martinet | Asmadeus

  reply	other threads:[~2025-12-13 13:29 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-09 21:04 [PATCH] 9p/virtio: restrict page pinning to user_backed_iter() iovec Dominique Martinet via B4 Relay
2025-12-10  4:21 ` Matthew Wilcox
2025-12-10  6:04 ` Christoph Hellwig
2025-12-10  7:38   ` asmadeus
2025-12-10  8:32     ` Christoph Hellwig
2025-12-13 13:28       ` asmadeus [this message]
2025-12-15  5:55         ` Christoph Hellwig
2025-12-15  7:34           ` Dominique Martinet
2025-12-15 11:16             ` Christian Schoenebeck
2025-12-15 14:37             ` Christoph Hellwig
2025-12-10 13:33   ` Christian Schoenebeck
2025-12-17 13:41 ` Christian Schoenebeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aT1qEmxcOjuJEZH9@codewreck.org \
    --to=asmadeus@codewreck.org \
    --cc=carges@cloudflare.com \
    --cc=dhowells@redhat.com \
    --cc=ericvh@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux_oss@crudebyte.com \
    --cc=lucho@ionkov.net \
    --cc=v9fs@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).