All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-28  5:57 Christoph Paasch
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Paasch @ 2017-11-28  5:57 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
---
 net/ipv4/tcp_input.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4e84f299c96f..7acc1f46641c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
 				break;
 
 			default:
-				tcp_extra_options_parse(opcode, opsize, ptr,
-							skb, opt_rx,
-							tcp_to_sk(tp));
+				if (static_branch_unlikely(&tcp_extra_options_enabled))
+					tcp_extra_options_parse(opcode, opsize,
+								ptr, skb,
+								opt_rx,
+								tcp_to_sk(tp));
 				break;
 
 			}
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29  1:49 Mat Martineau
  0 siblings, 0 replies; 9+ messages in thread
From: Mat Martineau @ 2017-11-29  1:49 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]


On Mon, 27 Nov 2017, Christoph Paasch wrote:

> Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> ---
> net/ipv4/tcp_input.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4e84f299c96f..7acc1f46641c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
> 				break;
>
> 			default:
> -				tcp_extra_options_parse(opcode, opsize, ptr,
> -							skb, opt_rx,
> -							tcp_to_sk(tp));
> +				if (static_branch_unlikely(&tcp_extra_options_enabled))

Since this case in the switch statement is only executed for unrecognized 
options, I had skipped the static branch in the initial implementation. It 
is more consistent to use the static branch, and there isn't much use in 
making the function call just to try to iterate over an empty list.

> +					tcp_extra_options_parse(opcode, opsize,
> +								ptr, skb,
> +								opt_rx,
> +								tcp_to_sk(tp));
> 				break;
>
> 			}
> -- 
> 2.15.0
>
>

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29  1:58 Mat Martineau
  0 siblings, 0 replies; 9+ messages in thread
From: Mat Martineau @ 2017-11-29  1:58 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 1673 bytes --]


On Tue, 28 Nov 2017, Mat Martineau wrote:

>
> On Mon, 27 Nov 2017, Christoph Paasch wrote:
>
>> Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
>> ---
>> net/ipv4/tcp_input.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>> 
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 4e84f299c96f..7acc1f46641c 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
>> 				break;
>>
>> 			default:
>> -				tcp_extra_options_parse(opcode, opsize, ptr,
>> -							skb, opt_rx,
>> -							tcp_to_sk(tp));
>> +				if 
>> (static_branch_unlikely(&tcp_extra_options_enabled))
>
> Since this case in the switch statement is only executed for unrecognized 
> options, I had skipped the static branch in the initial implementation. It is 
> more consistent to use the static branch, and there isn't much use in making 
> the function call just to try to iterate over an empty list.
>
>> +					tcp_extra_options_parse(opcode, 
>> opsize,
>> +								ptr, skb,
>> +								opt_rx,
>> + 
>> tcp_to_sk(tp));
>> 				break;
>>
>> 			}
>> -- 
>> 2.15.0
>> 
>>

I sent my reply and then realized I had one more point to make. We might 
need to take another look at the use of static branching by the extra 
options framework. It's one thing to inc/dec tcp_extra_options_enabled 
when modules are added or removed, but we don't want to be rapidly 
changing static branches as MPTCP sockets come and go. Does 
tcp_extra_options_enabled still make sense for per-socket extra_option 
lists?

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29 18:31 Christoph Paasch
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Paasch @ 2017-11-29 18:31 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 2220 bytes --]

On 28/11/17 - 17:58:52, Mat Martineau wrote:
> 
> On Tue, 28 Nov 2017, Mat Martineau wrote:
> 
> > 
> > On Mon, 27 Nov 2017, Christoph Paasch wrote:
> > 
> > > Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> > > ---
> > > net/ipv4/tcp_input.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 4e84f299c96f..7acc1f46641c 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
> > > 				break;
> > > 
> > > 			default:
> > > -				tcp_extra_options_parse(opcode, opsize, ptr,
> > > -							skb, opt_rx,
> > > -							tcp_to_sk(tp));
> > > +				if (static_branch_unlikely(&tcp_extra_options_enabled))
> > 
> > Since this case in the switch statement is only executed for
> > unrecognized options, I had skipped the static branch in the initial
> > implementation. It is more consistent to use the static branch, and
> > there isn't much use in making the function call just to try to iterate
> > over an empty list.
> > 
> > > +					tcp_extra_options_parse(opcode, opsize,
> > > +								ptr, skb,
> > > +								opt_rx,
> > > + tcp_to_sk(tp));
> > > 				break;
> > > 
> > > 			}
> > > -- 
> > > 2.15.0
> > > 
> > > 
> 
> I sent my reply and then realized I had one more point to make. We might
> need to take another look at the use of static branching by the extra
> options framework. It's one thing to inc/dec tcp_extra_options_enabled when
> modules are added or removed, but we don't want to be rapidly changing
> static branches as MPTCP sockets come and go. Does tcp_extra_options_enabled
> still make sense for per-socket extra_option lists?

You think that the test for whether or not the tcp_option_list is emtpy is
sufficient?

I don't know about the cost of static_branch_inc/dec. Is it that bad?


One thing is that having the static branches allows for sure to avoid any
cost in the hot data-path. Which is definitely something netdev will want.


If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first step ;))


Christoph



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29 20:19 Mat Martineau
  0 siblings, 0 replies; 9+ messages in thread
From: Mat Martineau @ 2017-11-29 20:19 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 3362 bytes --]

On Wed, 29 Nov 2017, Christoph Paasch wrote:

> On 28/11/17 - 17:58:52, Mat Martineau wrote:
>>
>> On Tue, 28 Nov 2017, Mat Martineau wrote:
>>
>>>
>>> On Mon, 27 Nov 2017, Christoph Paasch wrote:
>>>
>>>> Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
>>>> ---
>>>> net/ipv4/tcp_input.c | 8 +++++---
>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>>> index 4e84f299c96f..7acc1f46641c 100644
>>>> --- a/net/ipv4/tcp_input.c
>>>> +++ b/net/ipv4/tcp_input.c
>>>> @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
>>>> 				break;
>>>>
>>>> 			default:
>>>> -				tcp_extra_options_parse(opcode, opsize, ptr,
>>>> -							skb, opt_rx,
>>>> -							tcp_to_sk(tp));
>>>> +				if (static_branch_unlikely(&tcp_extra_options_enabled))
>>>
>>> Since this case in the switch statement is only executed for
>>> unrecognized options, I had skipped the static branch in the initial
>>> implementation. It is more consistent to use the static branch, and
>>> there isn't much use in making the function call just to try to iterate
>>> over an empty list.
>>>
>>>> +					tcp_extra_options_parse(opcode, opsize,
>>>> +								ptr, skb,
>>>> +								opt_rx,
>>>> + tcp_to_sk(tp));
>>>> 				break;
>>>>
>>>> 			}
>>>> --
>>>> 2.15.0
>>>>
>>>>
>>
>> I sent my reply and then realized I had one more point to make. We might
>> need to take another look at the use of static branching by the extra
>> options framework. It's one thing to inc/dec tcp_extra_options_enabled when
>> modules are added or removed, but we don't want to be rapidly changing
>> static branches as MPTCP sockets come and go. Does tcp_extra_options_enabled
>> still make sense for per-socket extra_option lists?
>
> You think that the test for whether or not the tcp_option_list is emtpy is
> sufficient?

It might be if we wrap it in a "unlikely()".

>
> I don't know about the cost of static_branch_inc/dec. Is it that bad?
>

Yeah, it's that bad. The kernel is patching machine code every time the 
key is enabled or disabled (rewriting no-ops with jump instructions and 
vice-versa).

"""
  * Since this relies on modifying code, the branch modifying functions
  * must be considered absolute slow paths (machine wide synchronization etc.).
"""

http://elixir.free-electrons.com/linux/v4.15-rc1/source/include/linux/jump_label.h#L56

Not just a slow path, but an *absolute* slow path!

static_key_deferred and static_key_slow_dec_deferred() are available in 
jump_label_ratelimit.h, which is one way to keep it from thrashing too 
much. This rate-limited static key is only used in one file 
(arch/x86/kvm/lapic.c). That subsystem limits jump label patching to once 
per second.

>
> One thing is that having the static branches allows for sure to avoid any
> cost in the hot data-path. Which is definitely something netdev will want.
>
>
> If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first step ;))

It would be more straightforward if only MPTCP was paying the cost, but in 
this case the hit is systemwide if any extra TCP options are in use. The 
combination of a rate-limiting static_key and checking the option list 
before the call could be fast enough.

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29 20:50 Christoph Paasch
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Paasch @ 2017-11-29 20:50 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 4064 bytes --]

On 29/11/17 - 12:19:32, Mat Martineau wrote:
> On Wed, 29 Nov 2017, Christoph Paasch wrote:
> 
> > On 28/11/17 - 17:58:52, Mat Martineau wrote:
> > > 
> > > On Tue, 28 Nov 2017, Mat Martineau wrote:
> > > 
> > > > 
> > > > On Mon, 27 Nov 2017, Christoph Paasch wrote:
> > > > 
> > > > > Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> > > > > ---
> > > > > net/ipv4/tcp_input.c | 8 +++++---
> > > > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > index 4e84f299c96f..7acc1f46641c 100644
> > > > > --- a/net/ipv4/tcp_input.c
> > > > > +++ b/net/ipv4/tcp_input.c
> > > > > @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
> > > > > 				break;
> > > > > 
> > > > > 			default:
> > > > > -				tcp_extra_options_parse(opcode, opsize, ptr,
> > > > > -							skb, opt_rx,
> > > > > -							tcp_to_sk(tp));
> > > > > +				if (static_branch_unlikely(&tcp_extra_options_enabled))
> > > > 
> > > > Since this case in the switch statement is only executed for
> > > > unrecognized options, I had skipped the static branch in the initial
> > > > implementation. It is more consistent to use the static branch, and
> > > > there isn't much use in making the function call just to try to iterate
> > > > over an empty list.
> > > > 
> > > > > +					tcp_extra_options_parse(opcode, opsize,
> > > > > +								ptr, skb,
> > > > > +								opt_rx,
> > > > > + tcp_to_sk(tp));
> > > > > 				break;
> > > > > 
> > > > > 			}
> > > > > --
> > > > > 2.15.0
> > > > > 
> > > > > 
> > > 
> > > I sent my reply and then realized I had one more point to make. We might
> > > need to take another look at the use of static branching by the extra
> > > options framework. It's one thing to inc/dec tcp_extra_options_enabled when
> > > modules are added or removed, but we don't want to be rapidly changing
> > > static branches as MPTCP sockets come and go. Does tcp_extra_options_enabled
> > > still make sense for per-socket extra_option lists?
> > 
> > You think that the test for whether or not the tcp_option_list is emtpy is
> > sufficient?
> 
> It might be if we wrap it in a "unlikely()".
> 
> > 
> > I don't know about the cost of static_branch_inc/dec. Is it that bad?
> > 
> 
> Yeah, it's that bad. The kernel is patching machine code every time the key
> is enabled or disabled (rewriting no-ops with jump instructions and
> vice-versa).
> 
> """
>  * Since this relies on modifying code, the branch modifying functions
>  * must be considered absolute slow paths (machine wide synchronization etc.).
> """
> 
> http://elixir.free-electrons.com/linux/v4.15-rc1/source/include/linux/jump_label.h#L56
> 
> Not just a slow path, but an *absolute* slow path!
> 
> static_key_deferred and static_key_slow_dec_deferred() are available in
> jump_label_ratelimit.h, which is one way to keep it from thrashing too much.
> This rate-limited static key is only used in one file
> (arch/x86/kvm/lapic.c). That subsystem limits jump label patching to once
> per second.

I see.

> 
> > 
> > One thing is that having the static branches allows for sure to avoid any
> > cost in the hot data-path. Which is definitely something netdev will want.
> > 
> > 
> > If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first step ;))
> 
> It would be more straightforward if only MPTCP was paying the cost, but in
> this case the hit is systemwide if any extra TCP options are in use. The
> combination of a rate-limiting static_key and checking the option list
> before the call could be fast enough.

If we have rate-limited static keys, do we still need to check the
option-list before the call?

Also, what if we rather inc/dec the static key more rarely. E.g., not for
every new connection but rather for the very first one and then dec again at
the very last one.

I'm trying to cook a patch...


Christoph





> 
> --
> Mat Martineau
> Intel OTC

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29 21:16 Christoph Paasch
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Paasch @ 2017-11-29 21:16 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 6342 bytes --]

On 29/11/17 - 12:50:24, Christoph Paasch wrote:
> On 29/11/17 - 12:19:32, Mat Martineau wrote:
> > On Wed, 29 Nov 2017, Christoph Paasch wrote:
> > 
> > > On 28/11/17 - 17:58:52, Mat Martineau wrote:
> > > > 
> > > > On Tue, 28 Nov 2017, Mat Martineau wrote:
> > > > 
> > > > > 
> > > > > On Mon, 27 Nov 2017, Christoph Paasch wrote:
> > > > > 
> > > > > > Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> > > > > > ---
> > > > > > net/ipv4/tcp_input.c | 8 +++++---
> > > > > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > > index 4e84f299c96f..7acc1f46641c 100644
> > > > > > --- a/net/ipv4/tcp_input.c
> > > > > > +++ b/net/ipv4/tcp_input.c
> > > > > > @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
> > > > > > 				break;
> > > > > > 
> > > > > > 			default:
> > > > > > -				tcp_extra_options_parse(opcode, opsize, ptr,
> > > > > > -							skb, opt_rx,
> > > > > > -							tcp_to_sk(tp));
> > > > > > +				if (static_branch_unlikely(&tcp_extra_options_enabled))
> > > > > 
> > > > > Since this case in the switch statement is only executed for
> > > > > unrecognized options, I had skipped the static branch in the initial
> > > > > implementation. It is more consistent to use the static branch, and
> > > > > there isn't much use in making the function call just to try to iterate
> > > > > over an empty list.
> > > > > 
> > > > > > +					tcp_extra_options_parse(opcode, opsize,
> > > > > > +								ptr, skb,
> > > > > > +								opt_rx,
> > > > > > + tcp_to_sk(tp));
> > > > > > 				break;
> > > > > > 
> > > > > > 			}
> > > > > > --
> > > > > > 2.15.0
> > > > > > 
> > > > > > 
> > > > 
> > > > I sent my reply and then realized I had one more point to make. We might
> > > > need to take another look at the use of static branching by the extra
> > > > options framework. It's one thing to inc/dec tcp_extra_options_enabled when
> > > > modules are added or removed, but we don't want to be rapidly changing
> > > > static branches as MPTCP sockets come and go. Does tcp_extra_options_enabled
> > > > still make sense for per-socket extra_option lists?
> > > 
> > > You think that the test for whether or not the tcp_option_list is emtpy is
> > > sufficient?
> > 
> > It might be if we wrap it in a "unlikely()".
> > 
> > > 
> > > I don't know about the cost of static_branch_inc/dec. Is it that bad?
> > > 
> > 
> > Yeah, it's that bad. The kernel is patching machine code every time the key
> > is enabled or disabled (rewriting no-ops with jump instructions and
> > vice-versa).
> > 
> > """
> >  * Since this relies on modifying code, the branch modifying functions
> >  * must be considered absolute slow paths (machine wide synchronization etc.).
> > """
> > 
> > http://elixir.free-electrons.com/linux/v4.15-rc1/source/include/linux/jump_label.h#L56
> > 
> > Not just a slow path, but an *absolute* slow path!
> > 
> > static_key_deferred and static_key_slow_dec_deferred() are available in
> > jump_label_ratelimit.h, which is one way to keep it from thrashing too much.
> > This rate-limited static key is only used in one file
> > (arch/x86/kvm/lapic.c). That subsystem limits jump label patching to once
> > per second.
> 
> I see.
> 
> > 
> > > 
> > > One thing is that having the static branches allows for sure to avoid any
> > > cost in the hot data-path. Which is definitely something netdev will want.
> > > 
> > > 
> > > If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first step ;))
> > 
> > It would be more straightforward if only MPTCP was paying the cost, but in
> > this case the hit is systemwide if any extra TCP options are in use. The
> > combination of a rate-limiting static_key and checking the option list
> > before the call could be fast enough.
> 
> If we have rate-limited static keys, do we still need to check the
> option-list before the call?
> 
> Also, what if we rather inc/dec the static key more rarely. E.g., not for
> every new connection but rather for the very first one and then dec again at
> the very last one.
> 
> I'm trying to cook a patch...

This is what I have in mind:


-----
commit ba22182b0b3100539e545313e0eb12a9c466c1fe
Author: Christoph Paasch <cpaasch(a)apple.com>
Date:   Wed Nov 29 13:11:44 2017 -0800

    tcp_extra_options: Reduce static_branch_inc/dec through a refcnt
    
    Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6db92bbbf3e2..344a0d859b37 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2008,6 +2008,7 @@ struct tcp_extra_option_ops {
 	struct tcp_extra_option_store *(*move)(struct tcp_extra_option_store *from);
 	void (*destroy)(struct tcp_extra_option_store *store);
 	struct module		*owner;
+	atomic_t		*refcnt;
 };
 
 /* Every extra-option in a TCP-socket gets stored in a _store structure which
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2dcb12c330eb..3067e7b5ad00 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3570,7 +3570,8 @@ int tcp_register_extra_option(struct tcp_extra_option_store *store,
 	list_add_tail(&store->list, add_before);
 	pr_debug("Option kind %u registered\n", store->ops->option_kind);
 
-	static_branch_inc(&tcp_extra_options_enabled);
+	if (atomic_inc_return(store->ops->refcnt) == 1)
+		static_branch_inc(&tcp_extra_options_enabled);
 
 	return ret;
 }
@@ -3602,7 +3603,8 @@ void tcp_extra_options_copy(struct sock *listener,
 
 		list_add_tail(&new->list, to);
 
-		static_branch_inc(&tcp_extra_options_enabled);
+		if (atomic_inc_return(new->ops->refcnt) == 1)
+			static_branch_inc(&tcp_extra_options_enabled);
 	}
 }
 
@@ -3632,11 +3634,16 @@ void tcp_extra_options_destroy(struct sock *sk)
 
 	list_for_each_entry_safe(entry, tmp, lhead, list) {
 		struct module *owner = entry->ops->owner;
+		bool dec_static = false;
 		list_del(&entry->list);
 
+		if (atomic_dec_return(entry->ops->refcnt) == 0)
+			dec_static = true;
+
 		entry->ops->destroy(entry);
 
-		static_branch_dec(&tcp_extra_options_enabled);
+		if (dec_static)
+			static_branch_dec(&tcp_extra_options_enabled);
 
 		module_put(owner);
 	}

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29 22:24 Mat Martineau
  0 siblings, 0 replies; 9+ messages in thread
From: Mat Martineau @ 2017-11-29 22:24 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 7294 bytes --]


On Wed, 29 Nov 2017, Christoph Paasch wrote:

> On 29/11/17 - 12:50:24, Christoph Paasch wrote:
>> On 29/11/17 - 12:19:32, Mat Martineau wrote:
>>> On Wed, 29 Nov 2017, Christoph Paasch wrote:
>>>
>>>> On 28/11/17 - 17:58:52, Mat Martineau wrote:
>>>>>
>>>>> On Tue, 28 Nov 2017, Mat Martineau wrote:
>>>>>
>>>>>>
>>>>>> On Mon, 27 Nov 2017, Christoph Paasch wrote:
>>>>>>
>>>>>>> Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
>>>>>>> ---
>>>>>>> net/ipv4/tcp_input.c | 8 +++++---
>>>>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>>>>>> index 4e84f299c96f..7acc1f46641c 100644
>>>>>>> --- a/net/ipv4/tcp_input.c
>>>>>>> +++ b/net/ipv4/tcp_input.c
>>>>>>> @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
>>>>>>> 				break;
>>>>>>>
>>>>>>> 			default:
>>>>>>> -				tcp_extra_options_parse(opcode, opsize, ptr,
>>>>>>> -							skb, opt_rx,
>>>>>>> -							tcp_to_sk(tp));
>>>>>>> +				if (static_branch_unlikely(&tcp_extra_options_enabled))
>>>>>>
>>>>>> Since this case in the switch statement is only executed for
>>>>>> unrecognized options, I had skipped the static branch in the initial
>>>>>> implementation. It is more consistent to use the static branch, and
>>>>>> there isn't much use in making the function call just to try to iterate
>>>>>> over an empty list.
>>>>>>
>>>>>>> +					tcp_extra_options_parse(opcode, opsize,
>>>>>>> +								ptr, skb,
>>>>>>> +								opt_rx,
>>>>>>> + tcp_to_sk(tp));
>>>>>>> 				break;
>>>>>>>
>>>>>>> 			}
>>>>>>> --
>>>>>>> 2.15.0
>>>>>>>
>>>>>>>
>>>>>
>>>>> I sent my reply and then realized I had one more point to make. We might
>>>>> need to take another look at the use of static branching by the extra
>>>>> options framework. It's one thing to inc/dec tcp_extra_options_enabled when
>>>>> modules are added or removed, but we don't want to be rapidly changing
>>>>> static branches as MPTCP sockets come and go. Does tcp_extra_options_enabled
>>>>> still make sense for per-socket extra_option lists?
>>>>
>>>> You think that the test for whether or not the tcp_option_list is emtpy is
>>>> sufficient?
>>>
>>> It might be if we wrap it in a "unlikely()".
>>>
>>>>
>>>> I don't know about the cost of static_branch_inc/dec. Is it that bad?
>>>>
>>>
>>> Yeah, it's that bad. The kernel is patching machine code every time the key
>>> is enabled or disabled (rewriting no-ops with jump instructions and
>>> vice-versa).
>>>
>>> """
>>>  * Since this relies on modifying code, the branch modifying functions
>>>  * must be considered absolute slow paths (machine wide synchronization etc.).
>>> """
>>>
>>> http://elixir.free-electrons.com/linux/v4.15-rc1/source/include/linux/jump_label.h#L56
>>>
>>> Not just a slow path, but an *absolute* slow path!
>>>
>>> static_key_deferred and static_key_slow_dec_deferred() are available in
>>> jump_label_ratelimit.h, which is one way to keep it from thrashing too much.
>>> This rate-limited static key is only used in one file
>>> (arch/x86/kvm/lapic.c). That subsystem limits jump label patching to once
>>> per second.
>>
>> I see.
>>
>>>
>>>>
>>>> One thing is that having the static branches allows for sure to avoid any
>>>> cost in the hot data-path. Which is definitely something netdev will want.
>>>>
>>>>
>>>> If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first step ;))
>>>
>>> It would be more straightforward if only MPTCP was paying the cost, but in
>>> this case the hit is systemwide if any extra TCP options are in use. The
>>> combination of a rate-limiting static_key and checking the option list
>>> before the call could be fast enough.
>>
>> If we have rate-limited static keys, do we still need to check the
>> option-list before the call?

You'd still want the fast path on concurrent regular TCP sockets to be 
minimally affected, even when other "extra option" sockets exist. I'm not 
sure if the maintainers would consider it a requirement, but this code is 
so highly optimized that I'm assuming they would be interested in cutting 
out the extra CPU cycles.

>>
>> Also, what if we rather inc/dec the static key more rarely. E.g., not for
>> every new connection but rather for the very first one and then dec again at
>> the very last one.
>>
>> I'm trying to cook a patch...
>
> This is what I have in mind:
>
>
> -----
> commit ba22182b0b3100539e545313e0eb12a9c466c1fe
> Author: Christoph Paasch <cpaasch(a)apple.com>
> Date:   Wed Nov 29 13:11:44 2017 -0800
>
>    tcp_extra_options: Reduce static_branch_inc/dec through a refcnt
>
>    Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 6db92bbbf3e2..344a0d859b37 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2008,6 +2008,7 @@ struct tcp_extra_option_ops {
> 	struct tcp_extra_option_store *(*move)(struct tcp_extra_option_store *from);
> 	void (*destroy)(struct tcp_extra_option_store *store);
> 	struct module		*owner;
> +	atomic_t		*refcnt;
> };
>
> /* Every extra-option in a TCP-socket gets stored in a _store structure which
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 2dcb12c330eb..3067e7b5ad00 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3570,7 +3570,8 @@ int tcp_register_extra_option(struct tcp_extra_option_store *store,
> 	list_add_tail(&store->list, add_before);
> 	pr_debug("Option kind %u registered\n", store->ops->option_kind);
>
> -	static_branch_inc(&tcp_extra_options_enabled);
> +	if (atomic_inc_return(store->ops->refcnt) == 1)
> +		static_branch_inc(&tcp_extra_options_enabled);
>
> 	return ret;
> }
> @@ -3602,7 +3603,8 @@ void tcp_extra_options_copy(struct sock *listener,
>
> 		list_add_tail(&new->list, to);
>
> -		static_branch_inc(&tcp_extra_options_enabled);
> +		if (atomic_inc_return(new->ops->refcnt) == 1)
> +			static_branch_inc(&tcp_extra_options_enabled);
> 	}
> }
>
> @@ -3632,11 +3634,16 @@ void tcp_extra_options_destroy(struct sock *sk)
>
> 	list_for_each_entry_safe(entry, tmp, lhead, list) {
> 		struct module *owner = entry->ops->owner;
> +		bool dec_static = false;
> 		list_del(&entry->list);
>
> +		if (atomic_dec_return(entry->ops->refcnt) == 0)
> +			dec_static = true;
> +
> 		entry->ops->destroy(entry);
>
> -		static_branch_dec(&tcp_extra_options_enabled);
> +		if (dec_static)
> +			static_branch_dec(&tcp_extra_options_enabled);
>
> 		module_put(owner);
> 	}
>

static_branch_{inc,dec} are already reference counts - I didn't explain 
the performance hit clearly. They aren't always expensive.

The problem is when the static key goes from 0->1 and 1->0. As long as 
there's at least one extended option socket active on the system, the 
expensive instruction rewrites won't happen. If you had a process rapidly 
open and close a socket that enabled/disabled the static key, the system 
would be drastically impacted. A server with one or more long-lived 
listening MPTCP sockets would not be affected because the static branch 
reference count for extended options would always be at least 1.

--
Mat Martineau
Intel OTC

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse
@ 2017-11-29 23:59 Christoph Paasch
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Paasch @ 2017-11-29 23:59 UTC (permalink / raw)
  To: mptcp 

[-- Attachment #1: Type: text/plain, Size: 8325 bytes --]

On 29/11/17 - 14:24:27, Mat Martineau wrote:
> 
> On Wed, 29 Nov 2017, Christoph Paasch wrote:
> 
> > On 29/11/17 - 12:50:24, Christoph Paasch wrote:
> > > On 29/11/17 - 12:19:32, Mat Martineau wrote:
> > > > On Wed, 29 Nov 2017, Christoph Paasch wrote:
> > > > 
> > > > > On 28/11/17 - 17:58:52, Mat Martineau wrote:
> > > > > > 
> > > > > > On Tue, 28 Nov 2017, Mat Martineau wrote:
> > > > > > 
> > > > > > > 
> > > > > > > On Mon, 27 Nov 2017, Christoph Paasch wrote:
> > > > > > > 
> > > > > > > > Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> > > > > > > > ---
> > > > > > > > net/ipv4/tcp_input.c | 8 +++++---
> > > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > > > > index 4e84f299c96f..7acc1f46641c 100644
> > > > > > > > --- a/net/ipv4/tcp_input.c
> > > > > > > > +++ b/net/ipv4/tcp_input.c
> > > > > > > > @@ -3799,9 +3799,11 @@ void tcp_parse_options(const struct net *net,
> > > > > > > > 				break;
> > > > > > > > 
> > > > > > > > 			default:
> > > > > > > > -				tcp_extra_options_parse(opcode, opsize, ptr,
> > > > > > > > -							skb, opt_rx,
> > > > > > > > -							tcp_to_sk(tp));
> > > > > > > > +				if (static_branch_unlikely(&tcp_extra_options_enabled))
> > > > > > > 
> > > > > > > Since this case in the switch statement is only executed for
> > > > > > > unrecognized options, I had skipped the static branch in the initial
> > > > > > > implementation. It is more consistent to use the static branch, and
> > > > > > > there isn't much use in making the function call just to try to iterate
> > > > > > > over an empty list.
> > > > > > > 
> > > > > > > > +					tcp_extra_options_parse(opcode, opsize,
> > > > > > > > +								ptr, skb,
> > > > > > > > +								opt_rx,
> > > > > > > > + tcp_to_sk(tp));
> > > > > > > > 				break;
> > > > > > > > 
> > > > > > > > 			}
> > > > > > > > --
> > > > > > > > 2.15.0
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > > I sent my reply and then realized I had one more point to make. We might
> > > > > > need to take another look at the use of static branching by the extra
> > > > > > options framework. It's one thing to inc/dec tcp_extra_options_enabled when
> > > > > > modules are added or removed, but we don't want to be rapidly changing
> > > > > > static branches as MPTCP sockets come and go. Does tcp_extra_options_enabled
> > > > > > still make sense for per-socket extra_option lists?
> > > > > 
> > > > > You think that the test for whether or not the tcp_option_list is emtpy is
> > > > > sufficient?
> > > > 
> > > > It might be if we wrap it in a "unlikely()".
> > > > 
> > > > > 
> > > > > I don't know about the cost of static_branch_inc/dec. Is it that bad?
> > > > > 
> > > > 
> > > > Yeah, it's that bad. The kernel is patching machine code every time the key
> > > > is enabled or disabled (rewriting no-ops with jump instructions and
> > > > vice-versa).
> > > > 
> > > > """
> > > >  * Since this relies on modifying code, the branch modifying functions
> > > >  * must be considered absolute slow paths (machine wide synchronization etc.).
> > > > """
> > > > 
> > > > http://elixir.free-electrons.com/linux/v4.15-rc1/source/include/linux/jump_label.h#L56
> > > > 
> > > > Not just a slow path, but an *absolute* slow path!
> > > > 
> > > > static_key_deferred and static_key_slow_dec_deferred() are available in
> > > > jump_label_ratelimit.h, which is one way to keep it from thrashing too much.
> > > > This rate-limited static key is only used in one file
> > > > (arch/x86/kvm/lapic.c). That subsystem limits jump label patching to once
> > > > per second.
> > > 
> > > I see.
> > > 
> > > > 
> > > > > 
> > > > > One thing is that having the static branches allows for sure to avoid any
> > > > > cost in the hot data-path. Which is definitely something netdev will want.
> > > > > 
> > > > > 
> > > > > If we pay a cost for MPTCP, I think it's fine IMO. (at least in a first step ;))
> > > > 
> > > > It would be more straightforward if only MPTCP was paying the cost, but in
> > > > this case the hit is systemwide if any extra TCP options are in use. The
> > > > combination of a rate-limiting static_key and checking the option list
> > > > before the call could be fast enough.
> > > 
> > > If we have rate-limited static keys, do we still need to check the
> > > option-list before the call?
> 
> You'd still want the fast path on concurrent regular TCP sockets to be
> minimally affected, even when other "extra option" sockets exist. I'm not
> sure if the maintainers would consider it a requirement, but this code is so
> highly optimized that I'm assuming they would be interested in cutting out
> the extra CPU cycles.
> 
> > > 
> > > Also, what if we rather inc/dec the static key more rarely. E.g., not for
> > > every new connection but rather for the very first one and then dec again at
> > > the very last one.
> > > 
> > > I'm trying to cook a patch...
> > 
> > This is what I have in mind:
> > 
> > 
> > -----
> > commit ba22182b0b3100539e545313e0eb12a9c466c1fe
> > Author: Christoph Paasch <cpaasch(a)apple.com>
> > Date:   Wed Nov 29 13:11:44 2017 -0800
> > 
> >    tcp_extra_options: Reduce static_branch_inc/dec through a refcnt
> > 
> >    Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
> > 
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 6db92bbbf3e2..344a0d859b37 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -2008,6 +2008,7 @@ struct tcp_extra_option_ops {
> > 	struct tcp_extra_option_store *(*move)(struct tcp_extra_option_store *from);
> > 	void (*destroy)(struct tcp_extra_option_store *store);
> > 	struct module		*owner;
> > +	atomic_t		*refcnt;
> > };
> > 
> > /* Every extra-option in a TCP-socket gets stored in a _store structure which
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 2dcb12c330eb..3067e7b5ad00 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3570,7 +3570,8 @@ int tcp_register_extra_option(struct tcp_extra_option_store *store,
> > 	list_add_tail(&store->list, add_before);
> > 	pr_debug("Option kind %u registered\n", store->ops->option_kind);
> > 
> > -	static_branch_inc(&tcp_extra_options_enabled);
> > +	if (atomic_inc_return(store->ops->refcnt) == 1)
> > +		static_branch_inc(&tcp_extra_options_enabled);
> > 
> > 	return ret;
> > }
> > @@ -3602,7 +3603,8 @@ void tcp_extra_options_copy(struct sock *listener,
> > 
> > 		list_add_tail(&new->list, to);
> > 
> > -		static_branch_inc(&tcp_extra_options_enabled);
> > +		if (atomic_inc_return(new->ops->refcnt) == 1)
> > +			static_branch_inc(&tcp_extra_options_enabled);
> > 	}
> > }
> > 
> > @@ -3632,11 +3634,16 @@ void tcp_extra_options_destroy(struct sock *sk)
> > 
> > 	list_for_each_entry_safe(entry, tmp, lhead, list) {
> > 		struct module *owner = entry->ops->owner;
> > +		bool dec_static = false;
> > 		list_del(&entry->list);
> > 
> > +		if (atomic_dec_return(entry->ops->refcnt) == 0)
> > +			dec_static = true;
> > +
> > 		entry->ops->destroy(entry);
> > 
> > -		static_branch_dec(&tcp_extra_options_enabled);
> > +		if (dec_static)
> > +			static_branch_dec(&tcp_extra_options_enabled);
> > 
> > 		module_put(owner);
> > 	}
> > 
> 
> static_branch_{inc,dec} are already reference counts - I didn't explain the
> performance hit clearly. They aren't always expensive.

Ok, I see. Thanks for clarifying.

> The problem is when the static key goes from 0->1 and 1->0. As long as
> there's at least one extended option socket active on the system, the
> expensive instruction rewrites won't happen. If you had a process rapidly
> open and close a socket that enabled/disabled the static key, the system
> would be drastically impacted. A server with one or more long-lived
> listening MPTCP sockets would not be affected because the static branch
> reference count for extended options would always be at least 1.

Ok, I think the scenario where we have a single socket
appearing/disappearing can be considered a corner-case scenario for now.

Later-on we can optimize that.



Christoph


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-11-29 23:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-29 23:59 [MPTCP] [PATCH v2 14/25] tcp_extra_options: Check static branch before _parse Christoph Paasch
  -- strict thread matches above, loose matches on Subject: below --
2017-11-29 22:24 Mat Martineau
2017-11-29 21:16 Christoph Paasch
2017-11-29 20:50 Christoph Paasch
2017-11-29 20:19 Mat Martineau
2017-11-29 18:31 Christoph Paasch
2017-11-29  1:58 Mat Martineau
2017-11-29  1:49 Mat Martineau
2017-11-28  5:57 Christoph Paasch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.