From: Al Viro <viro@ZenIV.linux.org.uk>
To: David Miller <davem@davemloft.net>
Cc: herbert@gondor.apana.org.au, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, bcrl@kvack.org
Subject: Re: [0/3] net: Kill skb_copy_datagram_const_iovec
Date: Tue, 4 Nov 2014 05:45:13 +0000 [thread overview]
Message-ID: <20141104054513.GB7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20141103.150553.633529234958324183.davem@davemloft.net>
On Mon, Nov 03, 2014 at 03:05:53PM -0500, David Miller wrote:
> I'll see if I can make some progress converting the networking over
> to iov_iter. It can't be that difficult... albeit perhaps a little
> time consuming.
FWIW, I have a queue that got started back in April; basically, the plan
of attack was
* separate kernel-side and userland msghdr.
* localize ->msg_iov uses - most of that gets taken care of by
several new helpers, as in
new helper: skb_copy_datagram_msg()
Absolute majority of skb_copy_datagram_iovec() callers (49 out of 56)
are passing it msg->msg_iov as iovec. Provide a trivial wrapper that
takes msg as argument instead of iovec.
and several like that (the numbers in the above are probably incorrect these
days - it was done more than half a year ago).
* switch kernel-side msghdr to iov_iter. That means diverging
layouts; it's really not hard, since we have copying of msghdr from
userland already localized. Initially - just a mechanical conversion
(i.e. direct uses of iov_iter fields instead of ->msg_iov/->msg_iovlen;
note that after the introduction of wrappers the number of such places
is very much reduced).
* start converting those relatively few places to iov_iter primitives.
And that's where it got stalled, since we have to deal with expectations
of callers. Syscall ones are trivial; that's not a problem. Unfortunately,
there are kernel_{send,recv}msg() users, and those do care about the state the
iovec is left in. Strictly speaking, the state of iovec after e.g.
->sendmsg() is undefined. And it's not just protocol-dependent - unless
I'm seriously misreading it, tcp_sendmsg() ends up modifying iovec in case
when it hits tcp_send_rcvq(), while in the normal case it leaves iovec
unmodified. So in general you need to feed ->{send,recv}msg() a throwaway copy
of iovec. Leads to wonders like
/* NB we can't trust socket ops to either consume our iovs
* or leave them alone. */
LASSERT (niov > 0);
for (nob = i = 0; i < niov; i++) {
scratchiov[i] = iov[i];
nob += scratchiov[i].iov_len;
}
LASSERT (nob <= conn->ksnc_rx_nob_wanted);
rc = kernel_recvmsg(conn->ksnc_sock, &msg,
(struct kvec *)scratchiov, niov, nob, MSG_DONTWAIT);
etc. However, there are places that don't bother and do this:
while (total_rx < data) {
rx_loop = kernel_recvmsg(conn->sock, &msg, iov_p, iov_len,
(data - total_rx), MSG_WAITALL);
if (rx_loop <= 0) {
pr_debug("rx_loop: %d total_rx: %d\n",
rx_loop, total_rx);
return rx_loop;
}
total_rx += rx_loop;
pr_debug("rx_loop: %d, total_rx: %d, data: %d\n",
rx_loop, total_rx, data);
}
(that's iscsit_do_rx_data()). Maybe it's a bug; maybe it's relying on
specific behaviour of the protocol known to be used - this code clearly
expects recvmsg to advance iovec, which seems to depend only on the
protocol. At the moment. In any case, it's very brittle...
Hell knows; I hadn't finished digging through that zoo - got sidetracked back
then. *IF* all such places either use a throwaway copy or assume that iovec
gets modified, we can do the following: switch the access to iovecs to
iov_iter primitives, with the first kind of callers creating a throwaway
iov_iter and the second just feeding the same iov_iter to e.g.
kernel_recvmsg(). iovec will remain constant, iov_iter will be advanced.
Moreover, in a lot of cases of first kind will be able to get rid of
throwaway iov_iter (and of manually advancing it), effectively converting
to the second one.
If we have places that currently rely on iovec remaining unchanged (i.e.
manually advancing it after kernel_{send,recv}msg()), the series will be
more painful ;-/ I very much hope that no such places exist...
FWIW, there is also a tactical question that needs to be dealt with. We
can, of course, start with renaming the "kernel-side" (i.e. post
copy_msghdr_from_user()/get_compat_msghdr()) to struct kmsghdr. OTOH,
that's a _lot_ of churn for very little reason - most of the instances
in the tree are of that kind. So I did it the other way round - introduced
struct user_msghdr (only in linux/socket.h; note that we do *not* have
struct msghdr in uabi/linux/socket.h, or anywhere else in uabi/*),
made the syscalls take pointers to it and (initially) rely upon the identical
layouts in copy_msghdr_from_user(); once we put iov_iter into kernel-side
msghdr, we'll just do it like get_compat_msghdr() does.
Is that acceptable? It would greatly reduce the amount of churn in net/* -
we don't need to pass iov_iter separately and most of the functions in
the middle of call chains are completely unchanged. Only the originators
of ->sendmsg()/->recvmsg() and the places doing actual data copying
need to be touched. OTOH, it makes for kernel struct msghdr looking
odd - instead of normal ->msg_iov and ->msg_iovlen it would have
->msg_iov_iter, with ->sendmsg()/->recvmsg() callers needing to set it
up... OTTH, the things *are* odd from userland programmer POV - sendmsg(2)
and recvmsg(2) leave the iovec unchanged, and having it changed unpredicatably
in the kernel-side counterparts seems to make for a nasty trap. Certainly
makes for a bunch of nasty comments in the code using those...
Comments?
next prev parent reply other threads:[~2014-11-04 5:45 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-02 23:05 fs: Use non-const iov in aio_read/aio_write Herbert Xu
2014-11-03 0:16 ` Al Viro
2014-11-03 0:21 ` Al Viro
2014-11-03 0:22 ` Herbert Xu
2014-11-03 0:45 ` Al Viro
2014-11-03 5:37 ` [0/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03 5:44 ` [PATCH 1/3] tun: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03 5:44 ` [PATCH 3/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03 5:44 ` [PATCH 2/3] macvtap: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03 20:05 ` [0/3] net: Kill skb_copy_datagram_const_iovec David Miller
2014-11-04 3:38 ` Herbert Xu
2014-11-04 8:31 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-04 14:32 ` Al Viro
2014-11-04 14:35 ` Al Viro
2014-11-04 14:44 ` Herbert Xu
2014-11-04 14:52 ` Al Viro
2014-11-04 14:55 ` Herbert Xu
2014-11-04 14:42 ` Herbert Xu
2014-11-04 15:13 ` Al Viro
2014-11-05 2:22 ` Herbert Xu
2014-11-05 3:27 ` David Miller
2014-11-05 3:55 ` Al Viro
2014-11-05 4:12 ` Al Viro
2014-11-05 20:51 ` David Miller
2014-11-05 20:50 ` David Miller
2014-11-05 21:07 ` Al Viro
2014-11-05 21:57 ` David Miller
2014-11-06 3:25 ` Al Viro
2014-11-06 5:50 ` ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-06 6:43 ` Al Viro
2014-11-06 6:46 ` Herbert Xu
2014-11-06 7:11 ` Al Viro
2014-11-06 9:55 ` Jon Maloy
2014-11-06 22:16 ` Al Viro
2014-11-28 5:14 ` Al Viro
2014-11-06 21:28 ` David Miller
2014-11-07 2:00 ` Herbert Xu
2014-11-07 13:25 ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice Herbert Xu
2014-11-07 13:27 ` [PATCH 1/2] ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-07 13:27 ` [PATCH 2/2] ipv4: Avoid reading user iov twice after raw_probe_proto_opt Herbert Xu
2014-11-10 19:26 ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice David Miller
2014-11-06 9:50 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Jon Maloy
2014-11-07 21:48 ` David Miller
2014-11-07 22:11 ` Al Viro
2014-11-07 22:31 ` Al Viro
2014-11-07 22:35 ` Al Viro
2014-11-07 23:42 ` Al Viro
2014-11-08 2:21 ` Herbert Xu
2014-11-09 21:19 ` Al Viro
2014-11-10 5:20 ` David Miller
2014-11-10 6:58 ` Al Viro
2014-11-10 7:30 ` David Miller
2014-11-10 9:09 ` Al Viro
2014-11-10 16:18 ` David Miller
2014-11-10 10:14 ` Michael S. Tsirkin
2014-11-07 21:52 ` David Miller
2014-11-05 20:24 ` David Miller
2014-11-06 8:23 ` Herbert Xu
2014-11-06 17:25 ` David Miller
2014-11-07 1:59 ` Herbert Xu
2014-11-07 3:13 ` David Miller
2014-11-07 13:21 ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-07 13:22 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-07 13:22 ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-07 13:22 ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-07 13:22 ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-06 8:27 ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-06 8:28 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-06 17:30 ` Al Viro
2014-11-07 1:58 ` Herbert Xu
2014-11-06 8:28 ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-06 8:28 ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-06 17:33 ` Al Viro
2014-11-06 8:28 ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04 8:31 ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-04 8:37 ` Herbert Xu
2014-11-05 2:49 ` YOSHIFUJI Hideaki
2014-11-05 3:41 ` Herbert Xu
2014-11-04 8:31 ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-04 8:31 ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04 5:45 ` Al Viro [this message]
2014-11-05 1:53 ` [0/3] " Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141104054513.GB7996@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=bcrl@kvack.org \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.