From: Wolfgang Draxinger <wdraxinger.maillist@draxit.de>
To: linux-fsdevel@vger.kernel.org
Cc: viro@zeniv.linux.org.uk, matthew@wil.cx
Subject: Doubts regarding iov_iter_get_pages{,_alloc}
Date: Sun, 11 Dec 2016 21:23:50 +0100
Message-ID: <20161211212350.0fd99b17@loki.lan>
Hi,
I'm currently reworking a device driver (unfortunately it's mostly
under NDA and I can't share details about it) and am at the point of
replacing the old jenga tower of "DIY async read/write via custom
ioctls" with something that uses the kernel's AIO infrastructure.
Essentially, where I'm currently at is taking an iov_iter and producing
a DMA scatterlist from it (and I definitely think that THAT should be
something readily available; I will probably contribute my
implementation once it's done). There are a few extra quirks in what I
need, but those are not important here.
Anyway, in trying to understand what I have to do to this end, I was
looking at iov_iter_get_pages (and its _alloc variant), which raised
more questions than it answered. So here's the code as found in
linux-4.8 (linux/lib/iov_iter.c +585):

ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
		   struct page ***pages, size_t maxsize,
		   size_t *start)
{
	struct page **p;

	if (maxsize > i->count)
		maxsize = i->count;

	if (!maxsize)
		return 0;

	iterate_all_kinds(i, maxsize, v, ({
		unsigned long addr = (unsigned long)v.iov_base;
		size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1));
		int n;
		int res;

		addr &= ~(PAGE_SIZE - 1);
		n = DIV_ROUND_UP(len, PAGE_SIZE);
		p = get_pages_array(n);
		if (!p)
			return -ENOMEM;
		res = get_user_pages_fast(addr, n, (i->type & WRITE) != WRITE, p);
		if (unlikely(res < 0)) {
			kvfree(p);
			return res;
		}
		*pages = p;
		return (res == n ? len : res * PAGE_SIZE) - *start;
	0;}),({
		/* can't be more than PAGE_SIZE */
		*start = v.bv_offset;
		*pages = p = get_pages_array(1);
		if (!p)
			return -ENOMEM;
		get_page(*p = v.bv_page);
		return v.bv_len;
	}),({
		return -EFAULT;
	})
	)
	return 0;
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc);

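
To make the address arithmetic of the ITER_IOVEC step easier to follow,
here is a minimal userspace sketch of it. It is not kernel code: the
names segment_span, bytes_covered and the SKETCH_* macros are made up
for illustration, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE depends on the arch. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper type, not a kernel structure. */
struct span {
	unsigned long first_page; /* page-aligned address of the first page */
	size_t start;             /* offset of the data within that page */
	size_t npages;            /* pages needed to cover the segment */
};

/* Mirrors the arithmetic applied to v.iov_base and v.iov_len. */
static struct span segment_span(unsigned long addr, size_t iov_len)
{
	struct span s;
	size_t len;

	assert(iov_len > 0);
	s.start = addr & (SKETCH_PAGE_SIZE - 1);
	len = iov_len + s.start;
	s.first_page = addr & ~(SKETCH_PAGE_SIZE - 1);
	s.npages = SKETCH_DIV_ROUND_UP(len, SKETCH_PAGE_SIZE);
	return s;
}

/* Mirrors the final return expression for `pinned` successfully pinned pages. */
static size_t bytes_covered(struct span s, size_t iov_len, size_t pinned)
{
	size_t len = iov_len + s.start;

	return (pinned == s.npages ? len : pinned * SKETCH_PAGE_SIZE) - s.start;
}
```

For a segment of 8192 bytes at address 0x1234, this gives an offset of
0x234 into the first page and three pages in total. If only two of the
three pages get pinned, the reported byte count drops to 2 * 4096 minus
that offset.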
Here's the thing I'm dumbfounded about: this function will never iterate
past the first entry of an iov_iter. (Never mind the superfluous(?) `0;`
at the end of the ITER_IOVEC step; whatever value that statement
expression would yield, the return in front of it preempts it.) The
execution path of every kind of step ends in an unconditional return
statement, thereby (seemingly?) cutting the whole iteration short.
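
The control flow in question can be reproduced in userspace. The sketch
below assumes GCC or Clang, since it relies on GNU C statement
expressions just like the kernel macro does. The macro for_each_segment
is a hypothetical, much simplified stand-in for iterate_all_kinds, and
first_only and sum_all are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for iterate_all_kinds(): evaluates STEP once per
 * array element, with the current element bound to the name given as v.
 */
#define for_each_segment(arr, n, v, STEP) do {		\
	size_t idx_;					\
	for (idx_ = 0; idx_ < (n); idx_++) {		\
		int v = (arr)[idx_];			\
		(void)(STEP);				\
	}						\
} while (0)

/* A return inside the step leaves the enclosing function at element 0. */
static int first_only(const int *arr, size_t n, size_t *visited)
{
	assert(visited != NULL);
	*visited = 0;
	for_each_segment(arr, n, v, ({
		(*visited)++;
		return v;
	0;}));
	return 0;
}

/* Without a return in the step, every element is visited. */
static int sum_all(const int *arr, size_t n, size_t *visited)
{
	int sum = 0;

	assert(visited != NULL);
	*visited = 0;
	for_each_segment(arr, n, v, ({
		(*visited)++;
		sum += v;
	0;}));
	return sum;
}
```

Calling first_only on a three-element array visits one element and
returns it, while sum_all visits all three. A return inside a statement
expression returns from the enclosing function, not merely from the
expression.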
In case of the _alloc variant this is probably a good thing, as
otherwise the memory allocated for the page array would be leaked in
subsequent iterations.
Which leaves me with the following doubts:

1. What is the rationale for looking at only the very first element in
   an iov_iter?

2. If this somehow does give the pages of everything an iov_iter covers,
   how does this work?

3. If this gives the pages of only the very first element in an
   iov_iter, is this the intended behaviour of the API, and what's the
   recommended way of retrieving the pages of all the elements in an
   iov_iter?

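
One pattern that would cover every element is for the caller to loop:
fetch the first chunk, advance the iterator by the number of bytes
returned, and repeat until nothing remains. Whether this is the intended
use of the API is exactly what is being asked, so the following is only
a userspace model under that assumption. struct seg_iter, get_first_chunk,
advance and walk_all are hypothetical names; advance merely plays the
role that iov_iter_advance() would play in the kernel, and segments are
assumed to be non-empty.

```c
#include <assert.h>
#include <stddef.h>

struct seg { size_t len; };

/* Hypothetical userspace model of an iterator over segments. */
struct seg_iter {
	const struct seg *segs;
	size_t nr_segs;
	size_t seg_off; /* bytes already consumed in segs[0] */
	size_t count;   /* total bytes remaining */
};

/* Models the observed behaviour: covers at most the rest of segs[0]. */
static size_t get_first_chunk(const struct seg_iter *it, size_t maxsize)
{
	size_t avail;

	if (maxsize > it->count)
		maxsize = it->count;
	if (!maxsize)
		return 0;
	avail = it->segs[0].len - it->seg_off;
	return avail < maxsize ? avail : maxsize;
}

/* Consumes bytes and steps over segments that are fully used up. */
static void advance(struct seg_iter *it, size_t bytes)
{
	assert(bytes <= it->count);
	it->count -= bytes;
	it->seg_off += bytes;
	while (it->nr_segs && it->seg_off >= it->segs[0].len) {
		it->seg_off -= it->segs[0].len;
		it->segs++;
		it->nr_segs--;
	}
}

/* Caller-side loop: one chunk per pass until the iterator is drained. */
static size_t walk_all(struct seg_iter *it, size_t *chunks)
{
	size_t total = 0;

	*chunks = 0;
	while (it->count) {
		size_t got = get_first_chunk(it, it->count);

		if (!got)
			break; /* no progress possible, give up */
		advance(it, got);
		total += got;
		(*chunks)++;
	}
	return total;
}
```

With three segments of 100, 4096 and 10 bytes, walk_all makes three
passes and covers all 4206 bytes. In a real driver each pass would be
the place to append the returned pages to the scatterlist.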
Kind Regards,
Wolfgang