All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: David Miller <davem@davemloft.net>
Cc: herbert@gondor.apana.org.au, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bcrl@kvack.org,
	Steve French <sfrench@samba.org>, Sage Weil <sage@inktank.com>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>
Subject: Re: [0/3] net: Kill skb_copy_datagram_const_iovec
Date: Wed, 5 Nov 2014 01:53:55 +0000	[thread overview]
Message-ID: <20141105015355.GN7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20141104054513.GB7996@ZenIV.linux.org.uk>

On Tue, Nov 04, 2014 at 05:45:13AM +0000, Al Viro wrote:

> Hell knows; I hadn't finished digging through that zoo - got sidetracked back
> then.  *IF* all such places either use a throwaway copy or assume that iovec
> gets modified, we can do the following: switch the access to iovecs to
> iov_iter primitives, with the first kind of callers creating a throwaway
> iov_iter and the second just feeding the same iov_iter to e.g.
> kernel_recvmsg().  iovec will remain constant, iov_iter will be advanced.
> Moreover, in a lot of cases of first kind will be able to get rid of
> throwaway iov_iter (and of manually advancing it), effectively converting
> to the second one.

All right, now I _have_ finished that.  See the resulting notes below.
TL;DR version: looks like hypothesis above is correct, modulo 2 places,
both buggy - cifs smb_send_kvec() apparently relies on ->sendmsg() leaving the
iovec unchanged and so does of the ceph_tcp_sendmsg() callers
(write_partial_kvec()).  For TCP that's not always true.  Another apparent
bug caught in process is iscsi iscsit_do_tx_data() - assumes that iovec is
being consumed by sendmsg().  I don't see how that could not be a bug - TCP
sockets can get there and tcp_sendmsg() normally *doesn't* modify the iovec.
Sometimes it does, unfortunately for other two places...

Maintainers Cc'd...


Full version follows:
-----------------------------------------------------------------------------
->sendmsg(): there's such method in struct proto_ops and in struct proto; the
latter is called by (some of) the former, in cases when ->sendmsg() isn't the
same for the entire family.

Instances of proto ->sendmsg() are, on several occasions, called directly; some
of those calls are from another such instance (with unchanged payload).  There
are two exceptions to that - in tipc_accept() and tipc_connect() we call such
instances with empty payload.  All calls via method are from proto_ops
->sendmsg() instances, payload unchanged.

Instances of proto_ops ->sendmsg() are almost never called directly.  All
exceptions are from another such instance with unchanged payload.

There are two places that call proto_ops ->sendmsg() via method -
__sock_sendmsg_nosec() in net/socket.c, and handle_tx() in drivers/vhost/net.c.
The latter appears to be playing somewhat unusual games with passing NULL iocb,
making it impossible to use the former...  Everybody else in the kernel goes
through __sock_sendmsg_nosec(), though - it's a chokepoint for sendmsg path.

vhost callsite is somewhat worrying - granted, most of the ->sendmsg()
instances don't give a damn about iocb at all.  The rest, though...
E.g. what happens if we do VHOST_NET_SET_BACKEND with backend.fd being an
AF_UNIX socket?  AFAICS, if it ever gets to that ->sendmsg() call afterwards,
we'll get an oops when e.g. unix_dgram_sendmsg() calls kiocb_to_siocb(NULL).
The same goes for AF_NETLINK; AF_TIPC is even more interesting, since there
NULL iocb is used as "we are in weird callchain, socket is already locked"
flag (for tipc_{accept,connect}() callsites).  What's going on there?
[After having talked with mst: drivers/vhost/net.c checks that it's
AF_PACKET/SOCK_RAW (packet_sendmsg()) *or* comes from tun.c (tun_sendmsg())
or from macvtap.c (macvtap_sendmsg()) and all of those ignore iocb completely]

FWIW, the situation with iocb is
	* AF_UNIX and AF_NETLINK use it to get to sock_iocb
	* AF_TIPC uses it as a flag - it never dereferences the damn thing and
only compares it with NULL, to determine whether it's in a normal call chain,
or in tipc_accept/tipc_connect one...
	* everything else ignores it completely, either directly or after
passing it to something that ignores it (rxrpc has unusually deep chain, but it
still ends up ignoring the sucker).

->recvmsg(): again, present in proto_ops and proto, in the same relationship
to each other.  The situation is a bit simpler there: there is only one direct
caller of an instance of struct proto ->recvmsg() and it is in another such
instance, arguments unchanged.  All callers via method are in the instances of
struct proto_ops ->recvmsg(), message-related arguments unchanged.  No direct
callers of struct proto_ops recvmsg, three call sites via method - regular one
in __sock_recvmsg_nosec() plus two in drivers/vhost/net.c.  The latter have
NULL iocb and are saved from oopsing in ->recvmsg() by the same logics that
saves vhost on sendmsg side.  iocb is ignored by everything except AF_UNIX
and AF_NETLINK (those use it for sock_iocb) and neither can be reached from
vhost path.

So it boils down to the following: drivers/vhost/net.c aside, everything goes
through __sock_sendmsg_nosec() on the sendmsg side and __sock_recvmsg_nosec()
on recvmsg one.

Call chains leading to __sock_sendmsg_nosec():

__sock_sendmsg_nosec()
	<- sock_sendmsg_nosec()
		<- ___sys_sendmsg()
			<- __sys_sendmsg()
				<- sys_compat_sendmsg()
				<- sys_sendmsg()
			<- __sys_sendmmsg()
				<- sys_compat_senmmmsg()
				<- sys_sendmmsg()
	<- __sock_sendmsg()
		<- do_sock_write()
			<- sock_aio_write()
				== ->aio_write()
		<- sock_sendmsg()
			<- svc_sendto()	[no iovec at all]
			<- ___sys_sendmsg() [see above]
			<- sys_sendto()
			<- kernel_sendmsg()
All syscalls (and there's quite a tangled mess with sys_socketcall,
assorted ARM wrappers, etc.) end up with iovec discarded.
Ditto for ->aio_write() callers - they all free the iovec soon after
->aio_write() returns and never look at it before freeing.

Call chains leading to __sock_recvmsg_nosec():

__sock_recvmsg_nosec()
	<- sock_recvmsg_nosec()
		<- ___sys_recvmsg()
			<- __sys_recvmsg()
				<- sys_compat_recvmsg()
				<- sys_recvmsg()
			<- __sys_recvmmsg()
				<- sys_compat_recvmmsg()
				<- sys_recvmmsg()
	<- __sock_recvmsg()
		<- do_sock_read()
			<- sock_aio_read()
				== ->aio_read()
		<- sock_recvmsg() [why is it not static, BTW?]
			<- ___sys_recvmsg() [see above]
			<- sys_recvfrom()
			<- kernel_recvmsg()
Again, both the sycalls and ->aio_read() callers end up discarding
iovec.

All of that leaves us with kernel_{send,recv}msg() as the next-order
chokepoints.

kernel_recvmsg() callers:
	drbd drbd_recv_short() - single-element iovec discarded
	nbd sock_xmit() - single-element iovec discarded; would be better off
with advancing iov_iter.
	isdn l1oip_socket_thread() - single-element iovec discarded
	lustre ksocknal_lib_recv_iov() - iovec copied (with unhappy comment),
copy passed to kernel_recvmsg() and discarded
	lustre ksocknal_lib_recv_kiov() - ditto.  Would be much better off
with bvec-based iov_iter
	lustre libcfs_sock_read() - single-element iovec discarded; would be
better off with advancing iov_iter.
	iscsi iscsit_do_rx_data() - assumes that iovec is being consumed.
AFAICS, it's guaranteed to be TCP and tcp_recvmsg() appears to act that way,
so it's probably OK...
	usbip usbip_recv() - single-element iovec discarded; would be better
off with advancing iov_iter
	cifs cifs_readv_from_socket() - iovec copied, copy passed to
kernel_recvmsg() and discarded; *definitely* would be better off with advancing
iov_iter.
	dlm receive_from_sock() - 1- or 2-element iovec discarded.
	ncpfs _recv() - single-element iovec discarded.
	ocfs2 o2net_recv_tcp_msg() - single-element iovec discarded.
	ceph ceph_tcp_recvmsg() - single-element iovec discarded; at least one
of the loops using it would be better off with advancing iov_iter.
	ipvs ip_vs_receive() - single-element iovec discarded.
	sunrpc svc_udp_recvfrom() - no payload (MSG_PEEK, that one)
	tipc tipc_receive_from_sock() - single-element iovec discarded.
	sunrpc svc_recvfrom() - confusing; looks like iovec is a throwaway
one, though (and we might be better off if we could use an iov_iter of bvec
sort instead).

kernel_sendmsg() callers:
	drbd drbd_send() - single-element iovec discarded; would be better
off with advancing iov_iter
	nbd sock_xmit() - ditto
	isdn l1oip_socket_send() - single-element iovec discarded
	iscsi iscsi_sw_tcp_xmit_segment() - single-element iovec discarded
	lustre ksocknal_lib_send_iov() - iovec copied (with unhappy comment),
copy passed to kernel_sendmsg() and discarded
	lustre ksocknal_lib_send_kiov - ditto.  Would be much better off
with bvec iov_iter.
	lustre libcfs_sock_write() - single-element iovec discarded; better
off with advancing iov_iter
	iscsi iscsit_do_tx_data() - assumes that iovec is being consumed.
I don't see how that could not be a bug - TCP sockets can get there.
	usbip stub_send_ret_submit() - iovec is built and discarded;
short write is treated as an error
	usbip stub_send_ret_unlink() - ditto
	usbip vhci_send_cmd_submit() - ditto
	usbip vhci_send_cmd_unlink() - ditto
	cifs smb_send_kvec() - apparently relies on ->sendmsg() leaving the
iovec unchanged.  Looks like a bug - tcp_sendmsg() might drain iovec in some
cases.
	dlm sctp_send_shutdown() - no payload
	dlm sctp_init_assoc() - single-element iovec discarded
	ncpfs do_send() - single-element iovec discarded
	ocfs2 o2net_send_tcp_msg() - callers pass it a throwaway iovec
	bnep bnep_send() - single-element iovec discarded
	bnep_tx_frame() - short iovec is built and discarded
	cmtp_send_frame() - single-element iovec discarded
	hidp_send_frame() - single-element iovec discarded
	rfcomm_send_frame() - single-element iovec discarded
	rfcomm_send_test() - short iovec is built and discarded
	core sock_no_sendpage*() - single-element iovec discarded
	ipvs ip_vs_send_async() - single-element iovec discarded
	rds rds_tcp_sendmsg() - single-element iovec discarded
	rxrpc rxrpc_busy() - single-element iovec discarded
	rxrpc rxrpc_process_call() - short iovec is built and discarded
	rxrpc rxrpc_abort_connection() - short iovec is built and discarded
	rxrpc rxrpc_reject_packets() - assumes that sendmsg drains iovec;
may be a bug.
	rxrpc rxrpc_send_packet() - single-element iovec discarded
	rxrpc rxkad_issue_challenge() - short iovec is built and discarded
	rxrpc rxkad_send_response - short iovec is built and discarded
	sunrpc xs_send_kvec() - single-element iovec discarded; really asks
or iov_iter...
	tipc tipc_send_to_sock() - single-element iovec discarded; for some
reason it seems to believe that short writes never happen...
	ceph ceph_tcp_sendmsg() - one caller appears to discard iovec, another
(write_partial_kvec()) apparently assumes that iovec is unchanged by sendmsg.
Not guaranteed to be true for TCP, AFAICAS.

      reply	other threads:[~2014-11-05  1:54 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-02 23:05 fs: Use non-const iov in aio_read/aio_write Herbert Xu
2014-11-03  0:16 ` Al Viro
2014-11-03  0:21   ` Al Viro
2014-11-03  0:22   ` Herbert Xu
2014-11-03  0:45     ` Al Viro
2014-11-03  5:37       ` [0/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03  5:44         ` [PATCH 1/3] tun: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03  5:44         ` [PATCH 3/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03  5:44         ` [PATCH 2/3] macvtap: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03 20:05         ` [0/3] net: Kill skb_copy_datagram_const_iovec David Miller
2014-11-04  3:38           ` Herbert Xu
2014-11-04  8:31             ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-04  8:37               ` Herbert Xu
2014-11-05  2:49                 ` YOSHIFUJI Hideaki
2014-11-05  3:41                   ` Herbert Xu
2014-11-04  8:31             ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-04 14:32               ` Al Viro
2014-11-04 14:35                 ` Al Viro
2014-11-04 14:44                   ` Herbert Xu
2014-11-04 14:52                     ` Al Viro
2014-11-04 14:55                       ` Herbert Xu
2014-11-04 14:42                 ` Herbert Xu
2014-11-04 15:13                   ` Al Viro
2014-11-05  2:22                     ` Herbert Xu
2014-11-05  3:27                       ` David Miller
2014-11-05  3:55                         ` Al Viro
2014-11-05  4:12                           ` Al Viro
2014-11-05 20:51                             ` David Miller
2014-11-05 20:50                           ` David Miller
2014-11-05 21:07                             ` Al Viro
2014-11-05 21:57                               ` David Miller
2014-11-06  3:25                                 ` Al Viro
2014-11-06  5:50                                   ` ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-06  6:43                                     ` Al Viro
2014-11-06  6:46                                       ` Herbert Xu
2014-11-06  7:11                                         ` Al Viro
2014-11-06  9:55                                           ` Jon Maloy
2014-11-06 22:16                                             ` Al Viro
2014-11-28  5:14                                               ` Al Viro
2014-11-06 21:28                                         ` David Miller
2014-11-07  2:00                                           ` Herbert Xu
2014-11-07 13:25                                             ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice Herbert Xu
2014-11-07 13:27                                               ` [PATCH 1/2] ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-07 13:27                                               ` [PATCH 2/2] ipv4: Avoid reading user iov twice after raw_probe_proto_opt Herbert Xu
2014-11-10 19:26                                               ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice David Miller
2014-11-06  9:50                                   ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Jon Maloy
2014-11-07 21:48                                   ` David Miller
2014-11-07 22:11                                     ` Al Viro
2014-11-07 22:31                                       ` Al Viro
2014-11-07 22:35                                         ` Al Viro
2014-11-07 23:42                                       ` Al Viro
2014-11-08  2:21                                         ` Herbert Xu
2014-11-09 21:19                                         ` Al Viro
2014-11-10  5:20                                           ` David Miller
2014-11-10  6:58                                             ` Al Viro
2014-11-10  7:30                                               ` David Miller
2014-11-10  9:09                                                 ` Al Viro
2014-11-10 16:18                                                   ` David Miller
2014-11-10 10:14                                           ` Michael S. Tsirkin
2014-11-07 21:52                                   ` David Miller
2014-11-05 20:24               ` David Miller
2014-11-06  8:23                 ` Herbert Xu
2014-11-06 17:25                   ` David Miller
2014-11-07  1:59                     ` Herbert Xu
2014-11-07  3:13                       ` David Miller
2014-11-07 13:21                         ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-07 13:22                           ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-07 13:22                           ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-07 13:22                           ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-07 13:22                           ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-06  8:27                 ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-06  8:28                   ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-06 17:30                     ` Al Viro
2014-11-07  1:58                       ` Herbert Xu
2014-11-06  8:28                   ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-06  8:28                   ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-06 17:33                     ` Al Viro
2014-11-06  8:28                   ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04  8:31             ` [PATCH 3/4] macvtap: Use iovec iterators Herbert Xu
2014-11-04  8:31             ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04  5:45           ` [0/3] " Al Viro
2014-11-05  1:53             ` Al Viro [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141105015355.GN7996@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=bcrl@kvack.org \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nab@linux-iscsi.org \
    --cc=netdev@vger.kernel.org \
    --cc=sage@inktank.com \
    --cc=sfrench@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.