[MPTCP] Re: [PATCH mptcp-next v2] mptcp: adjust mptcp receive buffer limit if subflow has larger one

All of lore.kernel.org
 help / color / mirror / Atom feed

* [MPTCP] Re: [PATCH mptcp-next v2] mptcp: adjust mptcp receive buffer limit if subflow has larger one
@ 2020-08-19  8:49 Florian Westphal
  0 siblings, 0 replies; 3+ messages in thread
From: Florian Westphal @ 2020-08-19  8:49 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 2595 bytes --]

Mat Martineau <mathew.j.martineau(a)linux.intel.com> wrote:
> > In mptcp_data_ready() we do not hold the mptcp socket lock, so
> > modification of sk_rcvbuf is racy.  Do it when moving skbs from subflow
> > to mptcp socket, both sockets are locked in this case.
> > 
> > v2: move rcvbuf update to __mptcp_move_skbs_from_subflow where both
> > mptcp and subflow sk locks are held (pointed out by Mat).
> > 
> > Signed-off-by: Florian Westphal <fw(a)strlen.de>
> > ---
> > net/mptcp/protocol.c | 22 +++++++++++++++++-----
> > 1 file changed, 17 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 77d655b0650c..ecf93d0bec14 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -454,10 +454,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> > 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> > 	struct sock *sk = (struct sock *)msk;
> > 	unsigned int moved = 0;
> > +	int sk_rbuf, ssk_rbuf;
> > 	bool more_data_avail;
> > 	struct tcp_sock *tp;
> > 	bool done = false;
> > 
> > +	ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
> > +	sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
> > +
> > +	if (unlikely(ssk_rbuf > sk_rbuf))
> > +		WRITE_ONCE(sk->sk_rcvbuf, ssk_rbuf);
> > +
> 
> The v2 changes do address my concerns with v1, but now I'm looking at how
> this change interacts with mptcp_rcv_space_adjust() and it's not obvious
> what all the consequences are. Is ssk_rbuf always in a valid range for the
> msk,

What is the 'valid range'?  TCP caps ssk_rcvbuf at rmem[2] sysctl.

> and is it ok if that is propagated to msk->sk_rcvbuf and changes the
> way other ssk->sk_rcvbuf values are updated in mptcp_rcv_space_adjust()?

Hmm, the values should be kept in-sync.

If msk->sk_rcvbuf is large (e.g. increased via msk autotune) and we
don't make the subflow rcvbuf setting follow along (this is what we do
ATM), then subflow will drop packets on the floor if we can't drain fast
enough.

If TCP has increased the subflow sk_rcvbuf via clamp_window() path and we
don't update the msk (which is what this patch changes), then we may
end up not moving all pending skbs to the mptcp socket because TCP
queued up more than what the mptcp rcvbuf limit allows.

That seems quite stupid to me -- data is already here, why have a delay?
Making this change gets rid of excess delays in the "lo" test cases for
me.

> Also what about checking (sk->sk_userlocks & SOCK_RCVBUF_LOCK)?

Agreed, that should be added.

I can send a v3, let me know.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [MPTCP] Re: [PATCH mptcp-next v2] mptcp: adjust mptcp receive buffer limit if subflow has larger one
@ 2020-08-19 23:11 Mat Martineau
  0 siblings, 0 replies; 3+ messages in thread
From: Mat Martineau @ 2020-08-19 23:11 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 3159 bytes --]

On Wed, 19 Aug 2020, Florian Westphal wrote:

> Mat Martineau <mathew.j.martineau(a)linux.intel.com> wrote:
>>> In mptcp_data_ready() we do not hold the mptcp socket lock, so
>>> modification of sk_rcvbuf is racy.  Do it when moving skbs from subflow
>>> to mptcp socket, both sockets are locked in this case.
>>>
>>> v2: move rcvbuf update to __mptcp_move_skbs_from_subflow where both
>>> mptcp and subflow sk locks are held (pointed out by Mat).
>>>
>>> Signed-off-by: Florian Westphal <fw(a)strlen.de>
>>> ---
>>> net/mptcp/protocol.c | 22 +++++++++++++++++-----
>>> 1 file changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 77d655b0650c..ecf93d0bec14 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -454,10 +454,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>>> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
>>> 	struct sock *sk = (struct sock *)msk;
>>> 	unsigned int moved = 0;
>>> +	int sk_rbuf, ssk_rbuf;
>>> 	bool more_data_avail;
>>> 	struct tcp_sock *tp;
>>> 	bool done = false;
>>>
>>> +	ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
>>> +	sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
>>> +
>>> +	if (unlikely(ssk_rbuf > sk_rbuf))
>>> +		WRITE_ONCE(sk->sk_rcvbuf, ssk_rbuf);
>>> +
>>
>> The v2 changes do address my concerns with v1, but now I'm looking at how
>> this change interacts with mptcp_rcv_space_adjust() and it's not obvious
>> what all the consequences are. Is ssk_rbuf always in a valid range for the
>> msk,
>
> What is the 'valid range'?  TCP caps ssk_rcvbuf at rmem[2] sysctl.
>

Ok, when I asked the question I was thinking the msk rcvbuf calculation 
differed more significantly from the TCP code, but as you point out they 
use the same sysctl limitation.

>> and is it ok if that is propagated to msk->sk_rcvbuf and changes the
>> way other ssk->sk_rcvbuf values are updated in mptcp_rcv_space_adjust()?
>
> Hmm, the values should be kept in-sync.
>
> If msk->sk_rcvbuf is large (e.g. increased via msk autotune) and we
> don't make the subflow rcvbuf setting follow along (this is what we do
> ATM), then subflow will drop packets on the floor if we can't drain fast
> enough.
>
> If TCP has increased the subflow sk_rcvbuf via clamp_window() path and we
> don't update the msk (which is what this patch changes), then we may
> end up not moving all pending skbs to the mptcp socket because TCP
> queued up more than what the mptcp rcvbuf limit allows.
>
> That seems quite stupid to me -- data is already here, why have a delay?
> Making this change gets rid of excess delays in the "lo" test cases for
> me.
>

I was misunderstanding the autotuning process at the msk level before. 
Thanks for explaining more of the details, it does seems ok to propagate 
the subflow sk_rcvbuf value up to the msk to avoid needlessly discarding 
good data.

>> Also what about checking (sk->sk_userlocks & SOCK_RCVBUF_LOCK)?
>
> Agreed, that should be added.
>
> I can send a v3, let me know.

Yes, please do.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [MPTCP] Re: [PATCH mptcp-next v2] mptcp: adjust mptcp receive buffer limit if subflow has larger one
@ 2020-08-18  1:02 Mat Martineau
  0 siblings, 0 replies; 3+ messages in thread
From: Mat Martineau @ 2020-08-18  1:02 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 4086 bytes --]

On Mon, 17 Aug 2020, Florian Westphal wrote:

> In addition to tcp autotuning during read, it may also increase the
> receive buffer in tcp_clamp_window().
>
> In this case, mptcp should adjust its receive buffer size as well so
> it can pull all pending skbs at once.
>
> At this time, when TCP grows its receive buffer, it may have more
> skbs ready for processing than what mptcp allows.
>
> In the mptcp case, the receive window is derived from free
> space of the mptcp parent socket instead of the individual subflows.
> Following the subflow allows mptcp to grow its receive buffer.
>
> This is especially noticeable for loopback traffic, when even two
> skbs are enough to fill the initial receive window.
>
> In mptcp_data_ready() we do not hold the mptcp socket lock, so
> modification of sk_rcvbuf is racy.  Do it when moving skbs from subflow
> to mptcp socket, both sockets are locked in this case.
>
> v2: move rcvbuf update to __mptcp_move_skbs_from_subflow where both
> mptcp and subflow sk locks are held (pointed out by Mat).
>
> Signed-off-by: Florian Westphal <fw(a)strlen.de>
> ---
> net/mptcp/protocol.c | 22 +++++++++++++++++-----
> 1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 77d655b0650c..ecf93d0bec14 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -454,10 +454,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> 	struct sock *sk = (struct sock *)msk;
> 	unsigned int moved = 0;
> +	int sk_rbuf, ssk_rbuf;
> 	bool more_data_avail;
> 	struct tcp_sock *tp;
> 	bool done = false;
>
> +	ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
> +	sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
> +
> +	if (unlikely(ssk_rbuf > sk_rbuf))
> +		WRITE_ONCE(sk->sk_rcvbuf, ssk_rbuf);
> +

The v2 changes do address my concerns with v1, but now I'm looking at how 
this change interacts with mptcp_rcv_space_adjust() and it's not obvious 
what all the consequences are. Is ssk_rbuf always in a valid range for the 
msk, and is it ok if that is propagated to msk->sk_rcvbuf and changes the 
way other ssk->sk_rcvbuf values are updated in mptcp_rcv_space_adjust()?

Also what about checking (sk->sk_userlocks & SOCK_RCVBUF_LOCK)?


Mat


> 	pr_debug("msk=%p ssk=%p", msk, ssk);
> 	tp = tcp_sk(ssk);
> 	do {
> @@ -511,7 +518,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
> 		WRITE_ONCE(tp->copied_seq, seq);
> 		more_data_avail = mptcp_subflow_data_available(ssk);
>
> -		if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf)) {
> +		if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf) {
> 			done = true;
> 			break;
> 		}
> @@ -603,6 +610,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> {
> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> +	int sk_rbuf, ssk_rbuf;
> 	bool wake;
>
> 	/* move_skbs_to_msk below can legitly clear the data_avail flag,
> @@ -613,12 +621,16 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> 	if (wake)
> 		set_bit(MPTCP_DATA_READY, &msk->flags);
>
> -	if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) &&
> -	    move_skbs_to_msk(msk, ssk))
> +	ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
> +	sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
> +	if (unlikely(ssk_rbuf > sk_rbuf))
> +		sk_rbuf = ssk_rbuf;
> +
> +	/* over limit? can't append more skbs to msk */
> +	if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
> 		goto wake;
>
> -	/* don't schedule if mptcp sk is (still) over limit */
> -	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
> +	if (move_skbs_to_msk(msk, ssk))
> 		goto wake;
>
> 	/* mptcp socket is owned, release_cb should retry */
> -- 
> 2.26.2
> _______________________________________________
> mptcp mailing list -- mptcp(a)lists.01.org
> To unsubscribe send an email to mptcp-leave(a)lists.01.org
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-19 23:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-19  8:49 [MPTCP] Re: [PATCH mptcp-next v2] mptcp: adjust mptcp receive buffer limit if subflow has larger one Florian Westphal
  -- strict thread matches above, loose matches on Subject: below --
2020-08-19 23:11 Mat Martineau
2020-08-18  1:02 Mat Martineau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.