netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Antonio Quartulli <antonio@openvpn.net>
To: Sabrina Dubroca <sd@queasysnail.net>
Cc: netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	Sergey Ryazanov <ryazanov.s.a@gmail.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>, Andrew Lunn <andrew@lunn.ch>,
	Esben Haabendal <esben@geanix.com>
Subject: Re: [PATCH net-next v3 13/24] ovpn: implement TCP transport
Date: Wed, 15 May 2024 00:11:28 +0200	[thread overview]
Message-ID: <2ddf759d-378f-475c-8fc1-30c6e83c2d14@openvpn.net> (raw)
In-Reply-To: <ZkMnpy3_T8YO3eHD@hog>

On 14/05/2024 10:58, Sabrina Dubroca wrote:
>>>> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
>>>> index b5ff59a4b40f..ac4907705d98 100644
>>>> --- a/drivers/net/ovpn/peer.h
>>>> +++ b/drivers/net/ovpn/peer.h
>>>> + * @tcp.raw_len: next packet length as read from the stream (TCP only)
>>>> + * @tcp.skb: next packet being filled with data from the stream (TCP only)
>>>> + * @tcp.offset: position of the next byte to write in the skb (TCP only)
>>>> + * @tcp.data_len: next packet length converted to host order (TCP only)
>>>
>>> It would be nice to add information about whether they're used for TX or RX.
>>
>> they are all about "from the stream" and "to the skb", meaning that we are
>> doing RX.
>> Will make it more explicit.
> 
> Maybe group them in a struct rx?

yap, makes sense.

> 
>>>> + * @tcp.sk_cb.sk_data_ready: pointer to original cb
>>>> + * @tcp.sk_cb.sk_write_space: pointer to original cb
>>>> + * @tcp.sk_cb.prot: pointer to original prot object
>>>>     * @crypto: the crypto configuration (ciphers, keys, etc..)
>>>>     * @dst_cache: cache for dst_entry used to send to peer
>>>>     * @bind: remote peer binding
>>>> @@ -59,6 +69,25 @@ struct ovpn_peer {
>>>>    	struct ptr_ring netif_rx_ring;
>>>>    	struct napi_struct napi;
>>>>    	struct ovpn_socket *sock;
>>>> +	/* state of the TCP reading. Needed to keep track of how much of a
>>>> +	 * single packet has already been read from the stream and how much is
>>>> +	 * missing
>>>> +	 */
>>>> +	struct {
>>>> +		struct ptr_ring tx_ring;
>>>> +		struct work_struct tx_work;
>>>> +		struct work_struct rx_work;
>>>> +
>>>> +		u8 raw_len[sizeof(u16)];
>>>
>>> Why not u16 or __be16 for this one?
>>
>> because in this array we are putting the bytes as we get them from the
>> stream.
>> We may be at the point where one out of two bytes is available on the
>> stream. For this reason I use an array to store this u16 byte by byte.
>>
>> Once thw two bytes are ready, we convert the content in an actual int and
>> store it in "data_len" (a few lines below).
> 
> Ok, I see. Hopefully you can switch to strparser and make this one go
> away.
> 
> 
>>>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
>>>> index e099a61b03fa..004db5b13663 100644
>>>> --- a/drivers/net/ovpn/socket.c
>>>> +++ b/drivers/net/ovpn/socket.c
>>>> @@ -16,6 +16,7 @@
>>>>    #include "packet.h"
>>>>    #include "peer.h"
>>>>    #include "socket.h"
>>>> +#include "tcp.h"
>>>>    #include "udp.h"
>>>>    /* Finalize release of socket, called after RCU grace period */
>>>> @@ -26,6 +27,8 @@ static void ovpn_socket_detach(struct socket *sock)
>>>>    	if (sock->sk->sk_protocol == IPPROTO_UDP)
>>>>    		ovpn_udp_socket_detach(sock);
>>>> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
>>>> +		ovpn_tcp_socket_detach(sock);
>>>>    	sockfd_put(sock);
>>>>    }
>>>> @@ -69,6 +72,8 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>>>>    	if (sock->sk->sk_protocol == IPPROTO_UDP)
>>>>    		ret = ovpn_udp_socket_attach(sock, peer->ovpn);
>>>> +	else if (sock->sk->sk_protocol == IPPROTO_TCP)
>>>> +		ret = ovpn_tcp_socket_attach(sock, peer);
>>>>    	return ret;
>>>>    }
>>>> @@ -124,6 +129,21 @@ struct ovpn_socket *ovpn_socket_new(struct socket *sock, struct ovpn_peer *peer)
>>>>    	ovpn_sock->sock = sock;
>>>
>>> The line above this is:
>>>
>>>       ovpn_sock->ovpn = peer->ovpn;
>>>
>>> It's technically fine since you then overwrite this with peer in case
>>> we're on TCP, but ovpn_sock->ovpn only exists on UDP since you moved
>>> it into a union in this patch.
>>
>> Yeah, I did not want to make another branch, but having a UDP specific case
>> will make code easier to read.
> 
> Either that, or drop the union.

ACK

> 
> 
>>>> diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
>>>> new file mode 100644
>>>> index 000000000000..84ad7cd4fc4f
>>>> --- /dev/null
>>>> +++ b/drivers/net/ovpn/tcp.c
>>>> @@ -0,0 +1,511 @@
>>>> +static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
>>>> +			      unsigned int in_offset, size_t in_len)
>>>> +{
>>>> +	struct sock *sk = desc->arg.data;
>>>> +	struct ovpn_socket *sock;
>>>> +	struct ovpn_skb_cb *cb;
>>>> +	struct ovpn_peer *peer;
>>>> +	size_t chunk, copied = 0;
>>>> +	void *data;
>>>> +	u16 len;
>>>> +	int st;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	sock = rcu_dereference_sk_user_data(sk);
>>>> +	rcu_read_unlock();
>>>
>>> You can't just release rcu_read_lock and keep using sock (here and in
>>> the rest of this file). Either you keep rcu_read_lock, or you can take
>>> a reference on the ovpn_socket.
>>
>> I was just staring at this today, after having worked on the
>> rcu_read_lock/unlock for the peer get()s..
>>
>> I thinkt the assumption was: if we are in this read_sock callback, it's
>> impossible that the ovpn_socket was invalidated, because it gets invalidated
>> upon detach, which also prevents any further calling of this callback. But
>> this sounds racy, and I guess we should somewhat hold a reference..
> 
> ovpn_tcp_read_sock starts
> 
> detach
> kfree_rcu(ovpn_socket)
> ...
> ovpn_socket actually freed
> ...
> ovpn_tcp_read_sock continues with freed ovpn_socket
> 
> 
> I don't think anything in the current code prevents this.

mh yeah, if something like this happens right after having started the 
read_sock we are doomed.
Will fix this.


> 
> 
>>>> +/* Set TCP encapsulation callbacks */
>>>> +int ovpn_tcp_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>>>> +{
>>>> +	void *old_data;
>>>> +	int ret;
>>>> +
>>>> +	INIT_WORK(&peer->tcp.tx_work, ovpn_tcp_tx_work);
>>>> +
>>>> +	ret = ptr_ring_init(&peer->tcp.tx_ring, OVPN_QUEUE_LEN, GFP_KERNEL);
>>>> +	if (ret < 0) {
>>>> +		netdev_err(peer->ovpn->dev, "cannot allocate TCP TX ring\n");
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	peer->tcp.skb = NULL;
>>>> +	peer->tcp.offset = 0;
>>>> +	peer->tcp.data_len = 0;
>>>> +
>>>> +	write_lock_bh(&sock->sk->sk_callback_lock);
>>>> +
>>>> +	/* make sure no pre-existing encapsulation handler exists */
>>>> +	rcu_read_lock();
>>>> +	old_data = rcu_dereference_sk_user_data(sock->sk);
>>>> +	rcu_read_unlock();
>>>> +	if (old_data) {
>>>> +		netdev_err(peer->ovpn->dev,
>>>> +			   "provided socket already taken by other user\n");
>>>> +		ret = -EBUSY;
>>>> +		goto err;
>>>
>>> The UDP code differentiates "socket already owned by this interface"
>>> from "already taken by other user". That doesn't apply to TCP?
>>
>> This makes me wonder: how safe it is to interpret the user data as an object
>> of type ovpn_socket?
>>
>> When we find the user data already assigned, we don't know what was really
>> stored in there, right?
>> Technically this socket could have gone through another module which
>> assigned its own state.
>>
>> Therefore I think that what UDP does [ dereferencing ((struct ovpn_socket
>> *)user_data)->ovpn ] is probably not safe. Would you agree?
> 
> Hmmm, yeah, I think you're right. If you checked encap_type ==
> UDP_ENCAP_OVPNINUDP before (sk_prot for TCP), then you'd know it's
> really your data. Basically call ovpn_from_udp_sock during attach if
> you want to check something beyond EBUSY.

right. Maybe we can leave with simply reporting EBUSY and be done with 
it, without adding extra checks and what not.

> 
> Once you're in your own callbacks, it should be safe. If some other
> code sends packet with a non-ovpn socket to ovpn's ->encap_rcv,
> something is really broken.

yup

> 
>>>> +int __init ovpn_tcp_init(void)
>>>> +{
>>>> +	/* We need to substitute the recvmsg and the sock_is_readable
>>>> +	 * callbacks in the sk_prot member of the sock object for TCP
>>>> +	 * sockets.
>>>> +	 *
>>>> +	 * However sock->sk_prot is a pointer to a static variable and
>>>> +	 * therefore we can't directly modify it, otherwise every socket
>>>> +	 * pointing to it will be affected.
>>>> +	 *
>>>> +	 * For this reason we create our own static copy and modify what
>>>> +	 * we need. Then we make sk_prot point to this copy
>>>> +	 * (in ovpn_tcp_socket_attach())
>>>> +	 */
>>>> +	ovpn_tcp_prot = tcp_prot;
>>>
>>> Don't you need a separate variant for IPv6, like TLS does?
>>
>> Never did so far.
>>
>> My wild wild wild guess: for the time this socket is owned by ovpn, we only
>> use callbacks that are IPvX agnostic, hence v4 vs v6 doesn't make any
>> difference.
>> When this socket is released, we reassigned the original prot.
> 
> That seems a bit suspicious to me. For example, tcpv6_prot has a
> different backlog_rcv. And you don't control if the socket is detached
> before being closed, or which callbacks are needed. Your userspace
> client doesn't use them, but someone else's might.
> 
>>>> +	ovpn_tcp_prot.recvmsg = ovpn_tcp_recvmsg;
>>>
>>> You don't need to replace ->sendmsg as well? The userspace client is
>>> not expected to send messages?
>>
>> It is, but my assumption is that those packets will just go through the
>> socket as usual. No need to be handled by ovpn (those packets are not
>> encrypted/decrypted, like data traffic is).
>> And this is how it has worked so far.
>>
>> Makes sense?
> 
> Two things come to mind:
> 
> - userspace is expected to prefix the messages it inserts on the
>    stream with the 2-byte length field? otherwise, the peer won't be
>    able to parse them out of the stream

correct. userspace sends those packets as if ovpn is not running, 
therefore this happens naturally.

> 
> - I'm not convinced this would be safe wrt kernel writing partial
>    messages. if ovpn_tcp_send_one doesn't send the full message, you
>    could interleave two messages:
> 
>    +------+-------------------+------+--------+----------------+
>    | len1 | (bytes from msg1) | len2 | (msg2) | (rest of msg1) |
>    +------+-------------------+------+--------+----------------+
> 
>    and the RX side would parse that as:
> 
>    +------+-----------------------------------+------+---------
>    | len1 | (bytes from msg1) | len2 | (msg2) | ???? | ...
>    +------+-------------------+---------------+------+---------
> 
>    and try to interpret some random bytes out of either msg1 or msg2 as
>    a length prefix, resulting in a broken stream.

hm you are correct. if multiple sendmsg can overlap, then we might be in 
troubles, but are we sure this can truly happen?

> 
> 
> The stream format looks identical to ESP in TCP [1] (2B length prefix
> followed by the actual message), so I think the espintcp code (both tx
> and rx, except for actual protocol parsing) should look very
> similar. The problems that need to be solved for both protocols are
> pretty much the same.

ok, will have a look. maybe this will simplify the code even more and we 
will get rid of some of the issues we were discussing above.

Thanks!

> 
> [1] https://www.rfc-editor.org/rfc/rfc8229#section-3
> 

-- 
Antonio Quartulli
OpenVPN Inc.

  reply	other threads:[~2024-05-14 22:10 UTC|newest]

Thread overview: 117+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-06  1:16 [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 01/24] netlink: add NLA_POLICY_MAX_LEN macro Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 02/24] net: introduce OpenVPN Data Channel Offload (ovpn) Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 03/24] ovpn: add basic netlink support Antonio Quartulli
2024-05-08  0:10   ` Jakub Kicinski
2024-05-08  7:42     ` Antonio Quartulli
2024-05-08 14:42   ` Sabrina Dubroca
2024-05-08 14:51     ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 04/24] ovpn: add basic interface creation/destruction/management routines Antonio Quartulli
2024-05-08  0:18   ` Jakub Kicinski
2024-05-08  7:53     ` Antonio Quartulli
2024-05-08 14:52   ` Sabrina Dubroca
2024-05-09  1:06     ` Jakub Kicinski
2024-05-09  8:25     ` Antonio Quartulli
2024-05-09 10:09       ` Sabrina Dubroca
2024-05-09 10:35         ` Antonio Quartulli
2024-05-09 12:16           ` Sabrina Dubroca
2024-05-09 13:25             ` Antonio Quartulli
2024-05-09 13:52               ` Sabrina Dubroca
2024-05-06  1:16 ` [PATCH net-next v3 05/24] ovpn: implement interface creation/destruction via netlink Antonio Quartulli
2024-05-08  0:21   ` Jakub Kicinski
2024-05-08  9:49     ` Antonio Quartulli
2024-05-09  1:09       ` Jakub Kicinski
2024-05-09  8:30         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 06/24] ovpn: keep carrier always on Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 07/24] ovpn: introduce the ovpn_peer object Antonio Quartulli
2024-05-08 16:06   ` Sabrina Dubroca
2024-05-08 20:31     ` Antonio Quartulli
2024-05-09 13:04       ` Sabrina Dubroca
2024-05-09 13:24         ` Andrew Lunn
2024-05-10 18:57           ` Antonio Quartulli
2024-05-11  0:28             ` Jakub Kicinski
2024-05-09 13:44         ` Antonio Quartulli
2024-05-09 13:55           ` Andrew Lunn
2024-05-09 14:17           ` Sabrina Dubroca
2024-05-09 14:36             ` Antonio Quartulli
2024-05-09 14:53               ` Antonio Quartulli
2024-05-10 10:30                 ` Sabrina Dubroca
2024-05-10 12:34                   ` Antonio Quartulli
2024-05-10 14:11                     ` Sabrina Dubroca
2024-05-13 10:09   ` Simon Horman
2024-05-13 10:53     ` Antonio Quartulli
2024-05-13 15:04       ` Simon Horman
2024-05-06  1:16 ` [PATCH net-next v3 08/24] ovpn: introduce the ovpn_socket object Antonio Quartulli
2024-05-08 17:10   ` Sabrina Dubroca
2024-05-08 20:38     ` Antonio Quartulli
2024-05-09 13:32       ` Sabrina Dubroca
2024-05-09 13:46         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 09/24] ovpn: implement basic TX path (UDP) Antonio Quartulli
2024-05-10 13:01   ` Sabrina Dubroca
2024-05-10 13:39     ` Antonio Quartulli
2024-05-12 21:35   ` Sabrina Dubroca
2024-05-13  7:37     ` Antonio Quartulli
2024-05-13  9:36       ` Sabrina Dubroca
2024-05-13  9:47         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 10/24] ovpn: implement basic RX " Antonio Quartulli
2024-05-10 13:45   ` Sabrina Dubroca
2024-05-10 14:41     ` Antonio Quartulli
2024-07-18 10:46       ` Sabrina Dubroca
2024-07-18 13:06         ` Antonio Quartulli
2024-07-18 13:11           ` Sabrina Dubroca
2024-07-18 13:27             ` Antonio Quartulli
2024-07-18 13:40               ` Sabrina Dubroca
2024-07-18 14:15                 ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 11/24] ovpn: implement packet processing Antonio Quartulli
2024-05-12  8:46   ` Sabrina Dubroca
2024-05-13  7:14     ` Antonio Quartulli
2024-05-13  9:24       ` Sabrina Dubroca
2024-05-13  9:31         ` Antonio Quartulli
2024-05-22 14:08     ` Antonio Quartulli
2024-05-22 14:28       ` Andrew Lunn
2024-05-06  1:16 ` [PATCH net-next v3 12/24] ovpn: store tunnel and transport statistics Antonio Quartulli
2024-05-12  8:47   ` Sabrina Dubroca
2024-05-13  7:25     ` Antonio Quartulli
2024-05-13  9:19       ` Sabrina Dubroca
2024-05-13  9:33         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 13/24] ovpn: implement TCP transport Antonio Quartulli
2024-05-13 13:37   ` Antonio Quartulli
2024-05-13 15:34     ` Jakub Kicinski
2024-05-13 14:50   ` Sabrina Dubroca
2024-05-13 22:20     ` Antonio Quartulli
2024-05-14  8:58       ` Sabrina Dubroca
2024-05-14 22:11         ` Antonio Quartulli [this message]
2024-05-15 10:19           ` Sabrina Dubroca
2024-05-15 12:54             ` Antonio Quartulli
2024-05-15 14:55               ` Sabrina Dubroca
2024-05-15 19:44                 ` Antonio Quartulli
2024-05-15 20:35                   ` Sabrina Dubroca
2024-05-15 20:39                     ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 14/24] ovpn: implement multi-peer support Antonio Quartulli
2024-05-28 14:44   ` Sabrina Dubroca
2024-05-28 19:41     ` Antonio Quartulli
2024-05-29 15:16       ` Sabrina Dubroca
2024-05-29 20:15         ` Antonio Quartulli
2024-05-29 20:45           ` Sabrina Dubroca
2024-05-06  1:16 ` [PATCH net-next v3 15/24] ovpn: implement peer lookup logic Antonio Quartulli
2024-05-28 16:42   ` Sabrina Dubroca
2024-05-28 20:09     ` Antonio Quartulli
2024-05-29 16:42       ` Sabrina Dubroca
2024-05-29 20:19         ` Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 16/24] ovpn: implement keepalive mechanism Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 17/24] ovpn: add support for updating local UDP endpoint Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 18/24] ovpn: add support for peer floating Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 19/24] ovpn: implement peer add/dump/delete via netlink Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 20/24] ovpn: implement key add/del/swap " Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 21/24] ovpn: kill key and notify userspace in case of IV exhaustion Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 22/24] ovpn: notify userspace when a peer is deleted Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 23/24] ovpn: add basic ethtool support Antonio Quartulli
2024-05-06  1:16 ` [PATCH net-next v3 24/24] testing/selftest: add test tool and scripts for ovpn module Antonio Quartulli
2024-05-07 23:55   ` Jakub Kicinski
2024-05-08  9:51     ` Antonio Quartulli
2024-05-09  0:50       ` Jakub Kicinski
2024-05-09  8:40         ` Antonio Quartulli
2024-05-07 23:48 ` [PATCH net-next v3 00/24] Introducing OpenVPN Data Channel Offload Jakub Kicinski
2024-05-08  9:56   ` Antonio Quartulli
2024-05-09  0:53     ` Jakub Kicinski
2024-05-09  8:41       ` Antonio Quartulli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ddf759d-378f-475c-8fc1-30c6e83c2d14@openvpn.net \
    --to=antonio@openvpn.net \
    --cc=andrew@lunn.ch \
    --cc=edumazet@google.com \
    --cc=esben@geanix.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=ryazanov.s.a@gmail.com \
    --cc=sd@queasysnail.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).