From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>
Subject: Re: [RFC] iov_iter_get_pages() semantics
Date: Wed, 1 Apr 2015 19:08:01 +0100 [thread overview]
Message-ID: <20150401180801.GA889@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFx0XbzyPJ8zYkSauhTAPVgkqTW5EubaZhg=QgOGHRGkpw@mail.gmail.com>
On Wed, Apr 01, 2015 at 09:45:27AM -0700, Linus Torvalds wrote:
> The *only* thing you can do with those pages is to copy the content.
> You must never *ever* do anything else. And if the caller only copies
> the content, then the caller is *wrong* to use the page-iterators.
>
> It really is that simple. Either the caller cares about just the
> content - in which case it should use the normal "iterate over
> addresses" - or the caller somehow cares about 'struct page' itself,
> in which case it is *wrong* to do it over vmalloc space or random
> kernel mappings.
... or the caller wants sg_table. When net/9p p9_client_write() is called
with virtio for underlying transport (and the amount of data is large enough)
it allocates two buffers (for request header and response resp.), builds
header in the first buffer and proceeds to populate two scatterlists - one
with header + data being sent (sg_set_buf() on the header + sg_set_page() on
each page in payload) and another covering the buffer for response. Then it
passes the sucker to virtio, which does the actual transfer.
Non-zerocopy variant of the same allocates a buffer for request + data,
a buffer for response, builds header _and_ copies data, then proceeds the
same way as zerocopy path.
In both cases we end up doing sg_set_buf() on some of that - in fact,
in non-zerocopy path that's all we do. And sg_set_buf() is just
sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
What's more, on the sg_set_page() side it has to deal with several cases:
1) normal write(); pages are obtained by get_user_pages_fast().
2) writepage(). Page should've been really passed by caller
via ITER_BVEC iov_iter and picked from there, but p9_client_write() isn't
iov_iter-aware, so the caller does kmap() and passes the kernel address
as payload. And then this sucker proceeds to do kmap_to_page() instead
of get_user_pages_fast() - virt_to_page() won't do. That one should be
killed off - there's really no reason to play with kmap() at all.
3) setxattr(). The data being written comes from the kernel
buffer, _hopefully_ not vmalloc'ed one. Same path as in (2) once we
get inside p9_client_write(); I'd prefer to pass ITER_KVEC there.
Similar thing happens in p9_client_read(), only there we definitely can
get vmalloc'ed destination - finit_module(2) will trigger read into
vmalloc'ed buffer. Zerocopy logics in net/9p is actually shared for
read and write (except that we have fixed header for request and header
+ payload for response). Getting the pages to feed into sg_set_page()
is done in net/9p/trans_virtio.c:p9_get_mapped_pages(); basically, it's
get_user_pages_fast() for userland payloads and this
} else {
/* kernel buffer, no need to pin pages */
int s, index = 0;
int count = nr_pages;
while (nr_pages) {
s = rest_of_page(data);
if (is_vmalloc_addr(data))
pages[index++] = vmalloc_to_page(data);
else
pages[index++] = kmap_to_page(data);
data += s;
nr_pages--;
}
nr_pages = count;
}
for the kernel ones. kmap_to_page() probably ought to be virt_to_page()
(it's an artefact of calling conventions - readpage and writepage shouldn't
have to pass the page as kmapped address), but vmalloc_to_page() is
for absolutely real use case - finit_module() reading a file from 9p fs
into vmalloc'ed buffer.
AFAICS, your objections seem to apply to that code. A lot of clumsiness
in there (and in fs/9p) would be killed if we switched p9_client_{read,write}
to passing iov_iter, but it's not a matter of adding functionality to those;
they already do the same thing, only with bloody inconvenient calling
conventions. It's already there.
How about a primitive that would feed iov_iter into scatterlist, and fill
an array with struct page pointers to drop afterwards? With ITER_IOVEC
dumping the get_user_pages_fast() results into that array and ITER_KVEC
just putting NULLs there.
IOW, do you have a problem with obtaining a pointer to kernel page and
immediately shoving it into scatterlist?
next prev parent reply other threads:[~2015-04-01 18:08 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-04 20:20 [RFC][PATCHES] iov_iter.c rewrite Al Viro
2014-12-04 20:23 ` [RFC][PATCH 01/13] iov_iter.c: macros for iterating over iov_iter Al Viro
2014-12-04 20:23 ` [RFC][PATCH 02/13] iov_iter.c: iterate_and_advance Al Viro
2014-12-04 20:23 ` [RFC][PATCH 03/13] iov_iter.c: convert iov_iter_npages() to iterate_all_kinds Al Viro
2014-12-04 20:23 ` [RFC][PATCH 04/13] iov_iter.c: convert iov_iter_get_pages() " Al Viro
2014-12-04 20:23 ` [RFC][PATCH 05/13] iov_iter.c: convert iov_iter_get_pages_alloc() " Al Viro
2014-12-04 20:23 ` [RFC][PATCH 06/13] iov_iter.c: convert iov_iter_zero() to iterate_and_advance Al Viro
2014-12-04 20:23 ` [RFC][PATCH 07/13] iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() Al Viro
2014-12-05 12:28 ` Sergei Shtylyov
2014-12-04 20:23 ` [RFC][PATCH 08/13] iov_iter.c: convert copy_from_iter() to iterate_and_advance Al Viro
2014-12-04 20:23 ` [RFC][PATCH 09/13] iov_iter.c: convert copy_to_iter() " Al Viro
2014-12-04 20:23 ` [RFC][PATCH 10/13] iov_iter.c: handle ITER_KVEC directly Al Viro
2014-12-04 20:23 ` [RFC][PATCH 11/13] csum_and_copy_..._iter() Al Viro
2014-12-04 20:23 ` [RFC][PATCH 12/13] new helper: iov_iter_kvec() Al Viro
2014-12-04 20:23 ` [RFC][PATCH 13/13] copy_from_iter_nocache() Al Viro
2014-12-08 16:46 ` [RFC][PATCHES] iov_iter.c rewrite Kirill A. Shutemov
2014-12-08 17:58 ` Al Viro
2014-12-08 18:08 ` Al Viro
2014-12-08 18:14 ` Linus Torvalds
2014-12-08 18:20 ` Al Viro
2014-12-08 18:37 ` Linus Torvalds
2014-12-08 18:46 ` Al Viro
2014-12-08 18:57 ` Linus Torvalds
2014-12-08 19:28 ` Al Viro
2014-12-08 19:48 ` Linus Torvalds
2014-12-09 1:56 ` Al Viro
2014-12-09 2:21 ` Kirill A. Shutemov
2015-04-01 2:33 ` [RFC] iov_iter_get_pages() semantics Al Viro
2015-04-01 16:45 ` Linus Torvalds
2015-04-01 18:08 ` Al Viro [this message]
2015-04-01 18:15 ` Linus Torvalds
2015-04-01 19:23 ` Al Viro
2015-04-01 18:26 ` Linus Torvalds
2015-04-01 18:34 ` Linus Torvalds
2015-04-01 20:15 ` Al Viro
2015-04-01 21:57 ` Linus Torvalds
2015-04-01 19:50 ` Al Viro
2014-12-08 18:56 ` [RFC][PATCHES] iov_iter.c rewrite Kirill A. Shutemov
2014-12-08 19:01 ` Linus Torvalds
2014-12-08 19:15 ` Dave Jones
2014-12-08 19:23 ` Kirill A. Shutemov
2014-12-08 22:14 ` Theodore Ts'o
2014-12-08 22:23 ` Linus Torvalds
2014-12-08 22:31 ` Dave Jones
2014-12-08 18:07 ` Linus Torvalds
2014-12-08 18:14 ` Al Viro
2014-12-08 18:23 ` Linus Torvalds
2014-12-08 18:35 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150401180801.GA889@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=kirill@shutemov.name \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).