netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 1/2] sock: Code cleanup on __sk_mem_raise_allocated()
@ 2023-09-20 13:25 Abel Wu
  2023-09-20 13:25 ` [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory Abel Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Abel Wu @ 2023-09-20 13:25 UTC (permalink / raw)
  To: Shakeel Butt, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kuniyuki Iwashima, Abel Wu, Breno Leitao,
	Alexander Mikhalitsyn, David Howells, Jason Xing
  Cc: Glauber Costa, Xin Long, Andy Shevchenko,
	open list:NETWORKING [GENERAL], open list

Code cleanup for both better simplicity and readability.
No functional change intended.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
---
 net/core/sock.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index a5995750c5c5..379eb8b65562 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3046,17 +3046,19 @@ EXPORT_SYMBOL(sk_wait_data);
  */
 int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 {
-	bool memcg_charge = mem_cgroup_sockets_enabled && sk->sk_memcg;
+	struct mem_cgroup *memcg = mem_cgroup_sockets_enabled ? sk->sk_memcg : NULL;
 	struct proto *prot = sk->sk_prot;
-	bool charged = true;
+	bool charged = false;
 	long allocated;
 
 	sk_memory_allocated_add(sk, amt);
 	allocated = sk_memory_allocated(sk);
-	if (memcg_charge &&
-	    !(charged = mem_cgroup_charge_skmem(sk->sk_memcg, amt,
-						gfp_memcg_charge())))
-		goto suppress_allocation;
+
+	if (memcg) {
+		if (!mem_cgroup_charge_skmem(memcg, amt, gfp_memcg_charge()))
+			goto suppress_allocation;
+		charged = true;
+	}
 
 	/* Under limit. */
 	if (allocated <= sk_prot_mem_limits(sk, 0)) {
@@ -3111,8 +3113,8 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 		 */
 		if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) {
 			/* Force charge with __GFP_NOFAIL */
-			if (memcg_charge && !charged) {
-				mem_cgroup_charge_skmem(sk->sk_memcg, amt,
+			if (memcg && !charged) {
+				mem_cgroup_charge_skmem(memcg, amt,
 					gfp_memcg_charge() | __GFP_NOFAIL);
 			}
 			return 1;
@@ -3124,8 +3126,8 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 
 	sk_memory_allocated_sub(sk, amt);
 
-	if (memcg_charge && charged)
-		mem_cgroup_uncharge_skmem(sk->sk_memcg, amt);
+	if (charged)
+		mem_cgroup_uncharge_skmem(memcg, amt);
 
 	return 0;
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-09-20 13:25 [PATCH net-next 1/2] sock: Code cleanup on __sk_mem_raise_allocated() Abel Wu
@ 2023-09-20 13:25 ` Abel Wu
  2023-09-21 19:01   ` Shakeel Butt
  0 siblings, 1 reply; 9+ messages in thread
From: Abel Wu @ 2023-09-20 13:25 UTC (permalink / raw)
  To: Shakeel Butt, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Kuniyuki Iwashima, Abel Wu, Breno Leitao,
	Alexander Mikhalitsyn, David Howells, Jason Xing, Xin Long,
	Glauber Costa, KAMEZAWA Hiroyuki
  Cc: open list:NETWORKING [GENERAL], open list

Before sockets became aware of net-memcg's memory pressure since
commit e1aab161e013 ("socket: initial cgroup code."), the memory
usage would be granted to raise if below average even when under
protocol's pressure. This provides fairness among the sockets of
same protocol.

That commit changes this because the heuristic will also be
effective when only memcg is under pressure which makes no sense.
Fix this by skipping this heuristic when under memcg pressure.

Fixes: e1aab161e013 ("socket: initial cgroup code.")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 net/core/sock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 379eb8b65562..ef5cf6250f17 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 	if (sk_has_memory_pressure(sk)) {
 		u64 alloc;
 
-		if (!sk_under_memory_pressure(sk))
+		if (memcg && mem_cgroup_under_socket_pressure(memcg))
+			goto suppress_allocation;
+
+		if (!sk_under_global_memory_pressure(sk))
 			return 1;
+
+		/* Trying to be fair among all the sockets under the
+		 * protocol's memory pressure, by allowing the ones
+		 * that below average usage to raise.
+		 */
 		alloc = sk_sockets_allocated_read_positive(sk);
 		if (sk_prot_mem_limits(sk, 2) > alloc *
 		    sk_mem_pages(sk->sk_wmem_queued +
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-09-20 13:25 ` [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory Abel Wu
@ 2023-09-21 19:01   ` Shakeel Butt
  2023-09-22  8:36     ` Abel Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2023-09-21 19:01 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long, Glauber Costa,
	KAMEZAWA Hiroyuki, open list:NETWORKING [GENERAL], open list

On Wed, Sep 20, 2023 at 09:25:41PM +0800, Abel Wu wrote:
> Before sockets became aware of net-memcg's memory pressure since
> commit e1aab161e013 ("socket: initial cgroup code."), the memory
> usage would be granted to raise if below average even when under
> protocol's pressure. This provides fairness among the sockets of
> same protocol.
> 
> That commit changes this because the heuristic will also be
> effective when only memcg is under pressure which makes no sense.
> Fix this by skipping this heuristic when under memcg pressure.
> 
> Fixes: e1aab161e013 ("socket: initial cgroup code.")
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
>  net/core/sock.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 379eb8b65562..ef5cf6250f17 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>  	if (sk_has_memory_pressure(sk)) {
>  		u64 alloc;
>  
> -		if (!sk_under_memory_pressure(sk))
> +		if (memcg && mem_cgroup_under_socket_pressure(memcg))
> +			goto suppress_allocation;
> +
> +		if (!sk_under_global_memory_pressure(sk))
>  			return 1;

I am onboard with replacing sk_under_memory_pressure() with
sk_under_global_memory_pressure(). However suppressing on memcg pressure
is a behavior change from status quo and need more thought and testing.

I think there are three options for this hunk:

1. proposed patch
2. Consider memcg pressure only for !in_softirq().
3. Don't consider memcg pressure at all.

All three options are behavior change from the status quo but with
different risk levels. (1) may reintroduce the regression fixed by
720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").
(2) is more inlined with 720ca52bcef22. (3) has the risk to making memcg
limits ineffective.

IMHO we should go with (2) as there is already a precedence in
720ca52bcef22.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-09-21 19:01   ` Shakeel Butt
@ 2023-09-22  8:36     ` Abel Wu
  2023-09-22 10:10       ` Abel Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Abel Wu @ 2023-09-22  8:36 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
	open list:NETWORKING [GENERAL], open list

On 9/22/23 3:01 AM, Shakeel Butt wrote:
> On Wed, Sep 20, 2023 at 09:25:41PM +0800, Abel Wu wrote:
>> Before sockets became aware of net-memcg's memory pressure since
>> commit e1aab161e013 ("socket: initial cgroup code."), the memory
>> usage would be granted to raise if below average even when under
>> protocol's pressure. This provides fairness among the sockets of
>> same protocol.
>>
>> That commit changes this because the heuristic will also be
>> effective when only memcg is under pressure which makes no sense.
>> Fix this by skipping this heuristic when under memcg pressure.
>>
>> Fixes: e1aab161e013 ("socket: initial cgroup code.")
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>> ---
>>   net/core/sock.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 379eb8b65562..ef5cf6250f17 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
>>   	if (sk_has_memory_pressure(sk)) {
>>   		u64 alloc;
>>   
>> -		if (!sk_under_memory_pressure(sk))
>> +		if (memcg && mem_cgroup_under_socket_pressure(memcg))
>> +			goto suppress_allocation;
>> +
>> +		if (!sk_under_global_memory_pressure(sk))
>>   			return 1;
> 
> I am onboard with replacing sk_under_memory_pressure() with
> sk_under_global_memory_pressure(). However suppressing on memcg pressure
> is a behavior change from status quo and need more thought and testing.
> 
> I think there are three options for this hunk:
> 
> 1. proposed patch
> 2. Consider memcg pressure only for !in_softirq().
> 3. Don't consider memcg pressure at all.
> 
> All three options are behavior change from the status quo but with
> different risk levels. (1) may reintroduce the regression fixed by
> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").

Just for the record, it is same for the current upstream implementation
if the socket reaches average usage. Taking option 2 will fix this too.

> (2) is more inlined with 720ca52bcef22. (3) has the risk to making memcg
> limits ineffective.
> 
> IMHO we should go with (2) as there is already a precedence in
> 720ca52bcef22.

Yes, I agree. Actually applying option(2) would make this patch quite
similar to the previous version[a], except the below part:

  	/* Under limit. */
  	if (allocated <= sk_prot_mem_limits(sk, 0)) {
  		sk_leave_memory_pressure(sk);
-		return 1;
+		if (!under_memcg_pressure)
+			return 1;
  	}

My original thought is to inherit the behavior of tcpmem pressure.
There are also 3 levels of memcg pressure named low/medium/critical,
but considering that the 'low' level is too much conservative for
socket allocation, I made the following match:

	PROTOCOL	MEMCG		ACTION
	-----------------------------------------------------
	low		<medium		allow allocation
	pressure	medium		be more conservative
	high		critical	throttle

which also seems align with the design[b] of memcg pressure. Anyway
I will take option (2) and post v2.

Thanks & Best,
	Abel

[a] 
https://lore.kernel.org/lkml/20230901062141.51972-4-wuyun.abel@bytedance.com/
[b] 
https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#memory-pressure

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-09-22  8:36     ` Abel Wu
@ 2023-09-22 10:10       ` Abel Wu
  2023-09-24  7:28         ` Shakeel Butt
  0 siblings, 1 reply; 9+ messages in thread
From: Abel Wu @ 2023-09-22 10:10 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
	open list:NETWORKING [GENERAL], open list

On 9/22/23 4:36 PM, Abel Wu wrote:
> On 9/22/23 3:01 AM, Shakeel Butt wrote:
>> On Wed, Sep 20, 2023 at 09:25:41PM +0800, Abel Wu wrote:
>>> Before sockets became aware of net-memcg's memory pressure since
>>> commit e1aab161e013 ("socket: initial cgroup code."), the memory
>>> usage would be granted to raise if below average even when under
>>> protocol's pressure. This provides fairness among the sockets of
>>> same protocol.
>>>
>>> That commit changes this because the heuristic will also be
>>> effective when only memcg is under pressure which makes no sense.
>>> Fix this by skipping this heuristic when under memcg pressure.
>>>
>>> Fixes: e1aab161e013 ("socket: initial cgroup code.")
>>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>>> ---
>>>   net/core/sock.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 379eb8b65562..ef5cf6250f17 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -3093,8 +3093,16 @@ int __sk_mem_raise_allocated(struct sock *sk, 
>>> int size, int amt, int kind)
>>>       if (sk_has_memory_pressure(sk)) {
>>>           u64 alloc;
>>> -        if (!sk_under_memory_pressure(sk))
>>> +        if (memcg && mem_cgroup_under_socket_pressure(memcg))
>>> +            goto suppress_allocation;
>>> +
>>> +        if (!sk_under_global_memory_pressure(sk))
>>>               return 1;
>>
>> I am onboard with replacing sk_under_memory_pressure() with
>> sk_under_global_memory_pressure(). However suppressing on memcg pressure
>> is a behavior change from status quo and need more thought and testing.
>>
>> I think there are three options for this hunk:
>>
>> 1. proposed patch
>> 2. Consider memcg pressure only for !in_softirq().
>> 3. Don't consider memcg pressure at all.
>>
>> All three options are behavior change from the status quo but with
>> different risk levels. (1) may reintroduce the regression fixed by
>> 720ca52bcef22 ("net-memcg: avoid stalls when under memory pressure").
> 
> Just for the record, it is same for the current upstream implementation
> if the socket reaches average usage. Taking option 2 will fix this too.
> 
>> (2) is more inlined with 720ca52bcef22. (3) has the risk to making memcg
>> limits ineffective.
>>
>> IMHO we should go with (2) as there is already a precedence in
>> 720ca52bcef22.
> 
> Yes, I agree. Actually applying option(2) would make this patch quite
> similar to the previous version[a], except the below part:
> 
>       /* Under limit. */
>       if (allocated <= sk_prot_mem_limits(sk, 0)) {
>           sk_leave_memory_pressure(sk);
> -        return 1;
> +        if (!under_memcg_pressure)
> +            return 1;
>       }

After a second thought, it is still vague to me about the position
the memcg pressure should be in socket memory allocation. It lacks
convincing design. I think the above hunk helps, but not much.

I wonder if we should take option (3) first. Thoughts?

Thanks,
	Abel

> 
> My original thought is to inherit the behavior of tcpmem pressure.
> There are also 3 levels of memcg pressure named low/medium/critical,
> but considering that the 'low' level is too much conservative for
> socket allocation, I made the following match:
> 
>      PROTOCOL    MEMCG        ACTION
>      -----------------------------------------------------
>      low        <medium        allow allocation
>      pressure    medium        be more conservative
>      high        critical    throttle
> 
> which also seems align with the design[b] of memcg pressure. Anyway
> I will take option (2) and post v2.
> 
> Thanks & Best,
>      Abel
> 
> [a] 
> https://lore.kernel.org/lkml/20230901062141.51972-4-wuyun.abel@bytedance.com/
> [b] 
> https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#memory-pressure

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-09-22 10:10       ` Abel Wu
@ 2023-09-24  7:28         ` Shakeel Butt
  2023-10-03 12:49           ` Abel Wu
  0 siblings, 1 reply; 9+ messages in thread
From: Shakeel Butt @ 2023-09-24  7:28 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
	open list:NETWORKING [GENERAL], open list

On Fri, Sep 22, 2023 at 06:10:06PM +0800, Abel Wu wrote:
[...]
> 
> After a second thought, it is still vague to me about the position
> the memcg pressure should be in socket memory allocation. It lacks
> convincing design. I think the above hunk helps, but not much.
> 
> I wonder if we should take option (3) first. Thoughts?
> 

Let's take a step further. Let's decouple the memcg accounting and
global skmem accounting. __sk_mem_raise_allocated is already very hard
to reason. There are couple of heuristics in it which may or may not
apply to both accounting infrastructures.

Let's explicitly document what heurisitics allows to forcefully succeed
the allocations i.e. irrespective of pressure or over limit for both
accounting infras. I think decoupling them would make the flow of the
code very clear.

There are three heuristics:

1. minimum buffer size even under pressure.

2. allow allocation for a socket whose usage is below average of the
system.

3. socket is over its sndbuf.

Let's discuss which heuristic applies to which accounting infra and
under which state (under pressure or over limit).

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-09-24  7:28         ` Shakeel Butt
@ 2023-10-03 12:49           ` Abel Wu
  2023-10-11  3:04             ` Abel Wu
  2023-10-13  1:09             ` Shakeel Butt
  0 siblings, 2 replies; 9+ messages in thread
From: Abel Wu @ 2023-10-03 12:49 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
	open list:NETWORKING [GENERAL], open list

On 9/24/23 3:28 PM, Shakeel Butt wrote:
> On Fri, Sep 22, 2023 at 06:10:06PM +0800, Abel Wu wrote:
> [...]
>>
>> After a second thought, it is still vague to me about the position
>> the memcg pressure should be in socket memory allocation. It lacks
>> convincing design. I think the above hunk helps, but not much.
>>
>> I wonder if we should take option (3) first. Thoughts?
>>
> 
> Let's take a step further. Let's decouple the memcg accounting and
> global skmem accounting. __sk_mem_raise_allocated is already very hard
> to reason. There are couple of heuristics in it which may or may not
> apply to both accounting infrastructures.
> 
> Let's explicitly document what heurisitics allows to forcefully succeed
> the allocations i.e. irrespective of pressure or over limit for both
> accounting infras. I think decoupling them would make the flow of the
> code very clear.

I can't agree more.

> 
> There are three heuristics:

I found all of them were first introduced in linux-2.4.0-test7pre1 for
TCP only, and then migrated to socket core in linux-2.6.8-rc1 without
functional change.

> 
> 1. minimum buffer size even under pressure.

This is required by RFC 7323 (TCP Extensions for High Performance) to
make features like Window Scale option work as expected, and should be
succeeded under global pressure by tcp_{r,w}mem's definition. And IMHO
for same reason, it should also be succeeded under memcg pressure, or
else workloads might suffer performance drop due to bottleneck on
network.

The allocation must not be succeeded either exceed global or memcg's
hard limit, or else a DoS attack can be taken place by spawning lots
of sockets that are under minimum buffer size.

> 
> 2. allow allocation for a socket whose usage is below average of the
> system.

Since 'average' is within the scope of global accounting, this one
only makes sense under global memory pressure. Actually this exists
before cgroup was born, hence doesn't take memcg into consideration.

While OTOH the intention of throttling under memcg pressure is to
relief the memcg from heavy reclaim pressure, this heuristic does no
help. And there also seems to be no reason to succeed the allocation
when global or memcg's hard limit is exceeded.

> 
> 3. socket is over its sndbuf.

TBH I don't get its point..

> 
> Let's discuss which heuristic applies to which accounting infra and
> under which state (under pressure or over limit).

I will follow your suggestion to post a patch to explicitly document
the behaviors once things are cleared.

Thanks,
	Abel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-10-03 12:49           ` Abel Wu
@ 2023-10-11  3:04             ` Abel Wu
  2023-10-13  1:09             ` Shakeel Butt
  1 sibling, 0 replies; 9+ messages in thread
From: Abel Wu @ 2023-10-11  3:04 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long,
	open list:NETWORKING [GENERAL], open list

Gentle ping :)

在 10/3/23 8:49 PM, Abel Wu Wrote:
> On 9/24/23 3:28 PM, Shakeel Butt wrote:
>> On Fri, Sep 22, 2023 at 06:10:06PM +0800, Abel Wu wrote:
>> [...]
>>>
>>> After a second thought, it is still vague to me about the position
>>> the memcg pressure should be in socket memory allocation. It lacks
>>> convincing design. I think the above hunk helps, but not much.
>>>
>>> I wonder if we should take option (3) first. Thoughts?
>>>
>>
>> Let's take a step further. Let's decouple the memcg accounting and
>> global skmem accounting. __sk_mem_raise_allocated is already very hard
>> to reason. There are couple of heuristics in it which may or may not
>> apply to both accounting infrastructures.
>>
>> Let's explicitly document what heurisitics allows to forcefully succeed
>> the allocations i.e. irrespective of pressure or over limit for both
>> accounting infras. I think decoupling them would make the flow of the
>> code very clear.
> 
> I can't agree more.
> 
>>
>> There are three heuristics:
> 
> I found all of them were first introduced in linux-2.4.0-test7pre1 for
> TCP only, and then migrated to socket core in linux-2.6.8-rc1 without
> functional change.
> 
>>
>> 1. minimum buffer size even under pressure.
> 
> This is required by RFC 7323 (TCP Extensions for High Performance) to
> make features like Window Scale option work as expected, and should be
> succeeded under global pressure by tcp_{r,w}mem's definition. And IMHO
> for same reason, it should also be succeeded under memcg pressure, or
> else workloads might suffer performance drop due to bottleneck on
> network.
> 
> The allocation must not be succeeded either exceed global or memcg's
> hard limit, or else a DoS attack can be taken place by spawning lots
> of sockets that are under minimum buffer size.
> 
>>
>> 2. allow allocation for a socket whose usage is below average of the
>> system.
> 
> Since 'average' is within the scope of global accounting, this one
> only makes sense under global memory pressure. Actually this exists
> before cgroup was born, hence doesn't take memcg into consideration.
> 
> While OTOH the intention of throttling under memcg pressure is to
> relief the memcg from heavy reclaim pressure, this heuristic does no
> help. And there also seems to be no reason to succeed the allocation
> when global or memcg's hard limit is exceeded.
> 
>>
>> 3. socket is over its sndbuf.
> 
> TBH I don't get its point..
> 
>>
>> Let's discuss which heuristic applies to which accounting infra and
>> under which state (under pressure or over limit).
> 
> I will follow your suggestion to post a patch to explicitly document
> the behaviors once things are cleared.
> 
> Thanks,
>      Abel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory
  2023-10-03 12:49           ` Abel Wu
  2023-10-11  3:04             ` Abel Wu
@ 2023-10-13  1:09             ` Shakeel Butt
  1 sibling, 0 replies; 9+ messages in thread
From: Shakeel Butt @ 2023-10-13  1:09 UTC (permalink / raw)
  To: Abel Wu
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Kuniyuki Iwashima, Breno Leitao, Alexander Mikhalitsyn,
	David Howells, Jason Xing, Xin Long, KAMEZAWA Hiroyuki,
	open list:NETWORKING [GENERAL], open list

On Tue, Oct 03, 2023 at 08:49:08PM +0800, Abel Wu wrote:
[...]
> > 1. minimum buffer size even under pressure.
> 
> This is required by RFC 7323 (TCP Extensions for High Performance) to
> make features like Window Scale option work as expected, and should be
> succeeded under global pressure by tcp_{r,w}mem's definition. And IMHO
> for same reason, it should also be succeeded under memcg pressure, or
> else workloads might suffer performance drop due to bottleneck on
> network.
> 
> The allocation must not be succeeded either exceed global or memcg's
> hard limit, or else a DoS attack can be taken place by spawning lots
> of sockets that are under minimum buffer size.
> 

Sounds good.

> > 
> > 2. allow allocation for a socket whose usage is below average of the
> > system.
> 
> Since 'average' is within the scope of global accounting, this one
> only makes sense under global memory pressure. Actually this exists
> before cgroup was born, hence doesn't take memcg into consideration.
> 
> While OTOH the intention of throttling under memcg pressure is to
> relief the memcg from heavy reclaim pressure, this heuristic does no
> help. And there also seems to be no reason to succeed the allocation
> when global or memcg's hard limit is exceeded.
> 

Sounds good too.

> > 
> > 3. socket is over its sndbuf.
> 
> TBH I don't get its point..
> 

So, this corresponds to following code in __sk_mem_raise_allocated()

	if (kind == SK_MEM_SEND && sk->sk_type == SOCK_STREAM) {
		sk_stream_moderate_sndbuf(sk);

		/* Fail only if socket is _under_ its sndbuf.
		 * In this case we cannot block, so that we have to fail.
		 */
		if (sk->sk_wmem_queued + size >= sk->sk_sndbuf) {
			/* Force charge with __GFP_NOFAIL */
			if (memcg_charge && !charged) {
				mem_cgroup_charge_skmem(sk->sk_memcg, amt,
					gfp_memcg_charge() | __GFP_NOFAIL);
			}
			return 1;
		}
	}

Here we moderate the sk_sndbuf possibly half of sk_wmem_queued and thus
we always succeed unless user has done SO_SNDBUF on the socket in which
case it interacts with sk_stream_wait_memory() called in sendmsg.

I am not really able to make sense of the interaction between this code
and sk_stream_wait_memory() and will punt to networking experts to shed
some light.

Other than that I think we need to answer if we want to moderate the
sndbuf on memcg charge failure.

> > 
> > Let's discuss which heuristic applies to which accounting infra and
> > under which state (under pressure or over limit).
> 
> I will follow your suggestion to post a patch to explicitly document
> the behaviors once things are cleared.
> 

Let's just post the patch and see what other folks comment as well.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-10-13  1:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-20 13:25 [PATCH net-next 1/2] sock: Code cleanup on __sk_mem_raise_allocated() Abel Wu
2023-09-20 13:25 ` [PATCH net-next 2/2] sock: Fix improper heuristic on raising memory Abel Wu
2023-09-21 19:01   ` Shakeel Butt
2023-09-22  8:36     ` Abel Wu
2023-09-22 10:10       ` Abel Wu
2023-09-24  7:28         ` Shakeel Butt
2023-10-03 12:49           ` Abel Wu
2023-10-11  3:04             ` Abel Wu
2023-10-13  1:09             ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).