All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] Re: [PATCH v4 7/7] mptcp: use mptcp backlog.
@ 2020-11-18 23:11 Florian Westphal
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2020-11-18 23:11 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

Paolo Abeni <pabeni(a)redhat.com> wrote:
> @@ -2812,6 +2859,17 @@ static void mptcp_release_cb(struct sock *sk)
>  {
>  	unsigned long flags, nflags;
>  
> +	/* push_pending may touch wmem_reserved, do it before the later
> +	 * cleanup
> +	 */

Can you expand here a bit on why the lock break is needed & safe?

We still own the socket, so I think its safe, but I am not sure why its
needed.  Only due to GFP_KERNEL alloc in mptcp_push_pending, or is there
more to it?

> +	set_bit(MPTCP_NOSPACE, &msk->flags);
> +	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */

I'm not sure why the barrier is needed.  What load/store needs the ordering?

Maybe I'm blind -- nothing catches my eye either here or in
subflow_write_space.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [MPTCP] Re: [PATCH v4 7/7] mptcp: use mptcp backlog.
@ 2020-11-19  8:36 Paolo Abeni
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2020-11-19  8:36 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

Hello,

On Thu, 2020-11-19 at 00:11 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni(a)redhat.com> wrote:
> > @@ -2812,6 +2859,17 @@ static void mptcp_release_cb(struct sock *sk)
> >  {
> >  	unsigned long flags, nflags;
> >  
> > +	/* push_pending may touch wmem_reserved, do it before the later
> > +	 * cleanup
> > +	 */
> 
> Can you expand here a bit on why the lock break is needed & safe?
> 
> We still own the socket, so I think its safe, but I am not sure why its
> needed.  Only due to GFP_KERNEL alloc in mptcp_push_pending, or is there
> more to it?

mptcp_push_pending() acquires the subflow socket lock:

1) can't be invoked in atomic scope
2) abba deadlock with msk socket spinlock, since the rx datapath
acquires the msk socket spinlock while helding the subflow socket lock.
> 
> > +	set_bit(MPTCP_NOSPACE, &msk->flags);
> > +	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
> 
> I'm not sure why the barrier is needed.  What load/store needs the ordering?

Maybe cargo-cult c&p on my side. Now MPTCP write poll code mirrors very
closely TCP's one, with s/MPTCP_NOSPACE/SOCK_NOSPACE/, and retains the
same barriers.

My understanding is that is needed in the following scenario:

BH - CPU0				User-space CPU1
<rx path>				<poll>

<free space>				<test write space>
<mb>					set_bit(...)
if test_bit(...)			<mb>
	sk_stream_write_space(sk)	<test write space>

without the memory barrier pairs the 2nd <test write space> could
actually miss the update done by <frees space> and test_bit can miss
the set_bit() perfomed on the other side due to reordering.

Looks like we currently lack the mb() on the rx side ;)

Additionally it would be better move the sk_stream_write_space() from
subflow_write_space af the end of mptcp_clean_una, where wspace is
actually freed.


> Maybe I'm blind -- nothing catches my eye either here or in
> subflow_write_space.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [MPTCP] Re: [PATCH v4 7/7] mptcp: use mptcp backlog.
@ 2020-11-19  9:41 Florian Westphal
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2020-11-19  9:41 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

Paolo Abeni <pabeni(a)redhat.com> wrote:
> > We still own the socket, so I think its safe, but I am not sure why its
> > needed.  Only due to GFP_KERNEL alloc in mptcp_push_pending, or is there
> > more to it?
> 
> mptcp_push_pending() acquires the subflow socket lock:
> 
> 1) can't be invoked in atomic scope
> 2) abba deadlock with msk socket spinlock, since the rx datapath
> acquires the msk socket spinlock while helding the subflow socket lock.

Can you add that to the comment?  Thanks!

> > > +	set_bit(MPTCP_NOSPACE, &msk->flags);
> > > +	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
> > 
> > I'm not sure why the barrier is needed.  What load/store needs the ordering?
> 
> Maybe cargo-cult c&p on my side. Now MPTCP write poll code mirrors very
> closely TCP's one, with s/MPTCP_NOSPACE/SOCK_NOSPACE/, and retains the
> same barriers.
> 
> My understanding is that is needed in the following scenario:
> 
> BH - CPU0				User-space CPU1
> <rx path>				<poll>
> 
> <free space>				<test write space>
> <mb>					set_bit(...)
> if test_bit(...)			<mb>
> 	sk_stream_write_space(sk)	<test write space>

You mean:

        set_bit(MPTCP_NOSPACE, &msk->flags);
        smp_mb__after_atomic();

	/* write_space might have changed right now,
	 * so we need to re-test.
	 */
        if (sk_stream_is_writeable(sk))
               return EPOLLOUT | EPOLLWRNORM;

> without the memory barrier pairs the 2nd <test write space> could
> actually miss the update done by <frees space> and test_bit can miss
> the set_bit() perfomed on the other side due to reordering.
> 
> Looks like we currently lack the mb() on the rx side ;)

Yes...

> Additionally it would be better move the sk_stream_write_space() from
> subflow_write_space af the end of mptcp_clean_una, where wspace is
> actually freed.

Right, there is no guarantee that we can push more data just because
a subflow has new space.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [MPTCP] Re: [PATCH v4 7/7] mptcp: use mptcp backlog.
@ 2020-11-19 11:09 Paolo Abeni
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2020-11-19 11:09 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 2600 bytes --]

On Thu, 2020-11-19 at 10:41 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni(a)redhat.com> wrote:
> > > We still own the socket, so I think its safe, but I am not sure why its
> > > needed.  Only due to GFP_KERNEL alloc in mptcp_push_pending, or is there
> > > more to it?
> > 
> > mptcp_push_pending() acquires the subflow socket lock:
> > 
> > 1) can't be invoked in atomic scope
> > 2) abba deadlock with msk socket spinlock, since the rx datapath
> > acquires the msk socket spinlock while helding the subflow socket lock.
> 
> Can you add that to the comment?  Thanks!

Ok, will do.

> > > > +	set_bit(MPTCP_NOSPACE, &msk->flags);
> > > > +	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
> > > 
> > > I'm not sure why the barrier is needed.  What load/store needs the ordering?
> > 
> > Maybe cargo-cult c&p on my side. Now MPTCP write poll code mirrors very
> > closely TCP's one, with s/MPTCP_NOSPACE/SOCK_NOSPACE/, and retains the
> > same barriers.
> > 
> > My understanding is that is needed in the following scenario:
> > 
> > BH - CPU0				User-space CPU1
> > <rx path>				<poll>
> > 
> > <free space>				<test write space>
> > <mb>					set_bit(...)
> > if test_bit(...)			<mb>
> > 	sk_stream_write_space(sk)	<test write space>
> 
> You mean:
> 
>         set_bit(MPTCP_NOSPACE, &msk->flags);
>         smp_mb__after_atomic();
> 
> 	/* write_space might have changed right now,
> 	 * so we need to re-test.
> 	 */
>         if (sk_stream_is_writeable(sk))
>                return EPOLLOUT | EPOLLWRNORM;

Yep, I tried to be agnostic, as the same pattern works for TCP, too.

> > without the memory barrier pairs the 2nd <test write space> could
> > actually miss the update done by <frees space> and test_bit can miss
> > the set_bit() perfomed on the other side due to reordering.
> > 
> > Looks like we currently lack the mb() on the rx side ;)
> 
> Yes...

I'll add it.

> > Additionally it would be better move the sk_stream_write_space() from
> > subflow_write_space af the end of mptcp_clean_una, where wspace is
> > actually freed.
> 
> Right, there is no guarantee that we can push more data just because
> a subflow has new space.

Yep, but the current code (subflow_write_space()) checks both ssk and
msk free_space, so it should be correct.

A potential problem is that __mptcp_clean_una() can happen _after_
subflow_write_space() is invoked - when delayed to release_cb(), so we
would need a later packet to actually wake up the sender.

I'll cook v5 with all the above.

/P

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-11-19 11:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-19  9:41 [MPTCP] Re: [PATCH v4 7/7] mptcp: use mptcp backlog Florian Westphal
  -- strict thread matches above, loose matches on Subject: below --
2020-11-19 11:09 Paolo Abeni
2020-11-19  8:36 Paolo Abeni
2020-11-18 23:11 Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.