* [PATCH RESEND net-next 2/2] net-memcg: Remove redundant tcpmem_pressure
2023-07-11 12:41 [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Abel Wu
@ 2023-07-11 12:41 ` Abel Wu
2023-07-12 3:45 ` [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Jakub Kicinski
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-07-11 12:41 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, David Ahern, Yosry Ahmed,
Matthew Wilcox (Oracle), Yu Zhao, Abel Wu, Yafang Shao,
Kefeng Wang, Kuniyuki Iwashima, Martin KaFai Lau,
Alexander Mikhalitsyn, Breno Leitao, David Howells, Jason Xing,
Xin Long
Cc: Michal Hocko, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
As {socket,tcpmem}_pressure are only used in default/legacy mode
respectively, use socket_pressure instead of tcpmem_pressure in all
kinds of cgroup hierarchies.
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
include/linux/memcontrol.h | 3 +--
mm/memcontrol.c | 4 ++--
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5860c7f316b9..341d397186ff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -288,7 +288,6 @@ struct mem_cgroup {
/* Legacy tcp memory accounting */
bool tcpmem_active;
- int tcpmem_pressure;
#ifdef CONFIG_MEMCG_KMEM
int kmemcg_id;
@@ -1728,7 +1727,7 @@ void mem_cgroup_sk_free(struct sock *sk);
static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
- return !!memcg->tcpmem_pressure;
+ return !!memcg->socket_pressure;
do {
if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
return true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ca4bdcb03c..e9e26dbd65b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7292,10 +7292,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
struct page_counter *fail;
if (page_counter_try_charge(&memcg->tcpmem, nr_pages, &fail)) {
- memcg->tcpmem_pressure = 0;
+ memcg->socket_pressure = 0;
return true;
}
- memcg->tcpmem_pressure = 1;
+ memcg->socket_pressure = 1;
if (gfp_mask & __GFP_NOFAIL) {
page_counter_charge(&memcg->tcpmem, nr_pages);
return true;
--
2.37.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-11 12:41 [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Abel Wu
2023-07-11 12:41 ` [PATCH RESEND net-next 2/2] net-memcg: Remove redundant tcpmem_pressure Abel Wu
@ 2023-07-12 3:45 ` Jakub Kicinski
2023-07-12 6:45 ` Abel Wu
2023-07-20 7:58 ` Abel Wu
2023-07-22 0:20 ` Roman Gushchin
3 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2023-07-12 3:45 UTC (permalink / raw)
To: Abel Wu
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On Tue, 11 Jul 2023 20:41:43 +0800 Abel Wu wrote:
> Now there are two indicators of socket memory pressure sit inside
> struct mem_cgroup, socket_pressure and tcpmem_pressure.
>
> When in legacy mode aka. cgroupv1, the socket memory is charged
> into a separate counter memcg->tcpmem rather than ->memory, so
> the reclaim pressure of the memcg has nothing to do with socket's
> pressure at all. While for default mode, the ->tcpmem is simply
> not used.
>
> So {socket,tcpmem}_pressure are only used in default/legacy mode
> respectively. This patch fixes the pieces of code that make mixed
> use of both.
Eric Dumazet is currently AFK, can we wait for him to return
(in about a week) before merging?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-12 3:45 ` [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Jakub Kicinski
@ 2023-07-12 6:45 ` Abel Wu
0 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-07-12 6:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
Hi Jakub,
On 7/12/23 11:45 AM, Jakub Kicinski wrote:
> On Tue, 11 Jul 2023 20:41:43 +0800 Abel Wu wrote:
>> Now there are two indicators of socket memory pressure sit inside
>> struct mem_cgroup, socket_pressure and tcpmem_pressure.
>>
>> When in legacy mode aka. cgroupv1, the socket memory is charged
>> into a separate counter memcg->tcpmem rather than ->memory, so
>> the reclaim pressure of the memcg has nothing to do with socket's
>> pressure at all. While for default mode, the ->tcpmem is simply
>> not used.
>>
>> So {socket,tcpmem}_pressure are only used in default/legacy mode
>> respectively. This patch fixes the pieces of code that make mixed
>> use of both.
>
> Eric Dumazet is currently AFK, can we wait for him to return
> (in about a week) before merging?
Sure, thanks for providing this information!
Best Regards,
Abel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-11 12:41 [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Abel Wu
2023-07-11 12:41 ` [PATCH RESEND net-next 2/2] net-memcg: Remove redundant tcpmem_pressure Abel Wu
2023-07-12 3:45 ` [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Jakub Kicinski
@ 2023-07-20 7:58 ` Abel Wu
2023-07-20 8:57 ` Eric Dumazet
2023-07-22 0:20 ` Roman Gushchin
3 siblings, 1 reply; 13+ messages in thread
From: Abel Wu @ 2023-07-20 7:58 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
Muchun Song, Andrew Morton, David Ahern, Yosry Ahmed,
Matthew Wilcox (Oracle), Yu Zhao, Kefeng Wang, Yafang Shao,
Kuniyuki Iwashima, Martin KaFai Lau, Alexander Mikhalitsyn,
Breno Leitao, David Howells, Jason Xing, Xin Long
Cc: Michal Hocko, Alexei Starovoitov, open list,
open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
Gentle ping :)
On 7/11/23 8:41 PM, Abel Wu wrote:
> Now there are two indicators of socket memory pressure sit inside
> struct mem_cgroup, socket_pressure and tcpmem_pressure.
>
> When in legacy mode aka. cgroupv1, the socket memory is charged
> into a separate counter memcg->tcpmem rather than ->memory, so
> the reclaim pressure of the memcg has nothing to do with socket's
> pressure at all. While for default mode, the ->tcpmem is simply
> not used.
>
> So {socket,tcpmem}_pressure are only used in default/legacy mode
> respectively. This patch fixes the pieces of code that make mixed
> use of both.
>
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
> include/linux/memcontrol.h | 4 ++--
> mm/vmpressure.c | 8 ++++++++
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-20 7:58 ` Abel Wu
@ 2023-07-20 8:57 ` Eric Dumazet
2023-07-20 11:34 ` Abel Wu
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2023-07-20 8:57 UTC (permalink / raw)
To: Abel Wu
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On Thu, Jul 20, 2023 at 9:59 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>
> Gentle ping :)
I was hoping for some feedback from memcg experts.
You claim to fix a bug, please provide a Fixes: tag so that we can
involve original patch author.
Thanks.
>
> On 7/11/23 8:41 PM, Abel Wu wrote:
> > Now there are two indicators of socket memory pressure sit inside
> > struct mem_cgroup, socket_pressure and tcpmem_pressure.
> >
> > When in legacy mode aka. cgroupv1, the socket memory is charged
> > into a separate counter memcg->tcpmem rather than ->memory, so
> > the reclaim pressure of the memcg has nothing to do with socket's
> > pressure at all. While for default mode, the ->tcpmem is simply
> > not used.
> >
> > So {socket,tcpmem}_pressure are only used in default/legacy mode
> > respectively. This patch fixes the pieces of code that make mixed
> > use of both.
> >
> > Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> > ---
> > include/linux/memcontrol.h | 4 ++--
> > mm/vmpressure.c | 8 ++++++++
> > 2 files changed, 10 insertions(+), 2 deletions(-)
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-20 8:57 ` Eric Dumazet
@ 2023-07-20 11:34 ` Abel Wu
0 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-07-20 11:34 UTC (permalink / raw)
To: Eric Dumazet, Johannes Weiner
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle), Yu Zhao,
Kefeng Wang, Yafang Shao, Kuniyuki Iwashima, Martin KaFai Lau,
Alexander Mikhalitsyn, Breno Leitao, David Howells, Jason Xing,
Xin Long, Michal Hocko, Alexei Starovoitov, open list,
open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On 7/20/23 4:57 PM, Eric Dumazet wrote:
> On Thu, Jul 20, 2023 at 9:59 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>>
>> Gentle ping :)
>
> I was hoping for some feedback from memcg experts.
Me too :)
>
> You claim to fix a bug, please provide a Fixes: tag so that we can
> involve original patch author.
Sorry for missing that part, will be added in next version.
Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket
pressure")
Thanks!
Abel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-11 12:41 [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure Abel Wu
` (2 preceding siblings ...)
2023-07-20 7:58 ` Abel Wu
@ 2023-07-22 0:20 ` Roman Gushchin
2023-07-24 3:47 ` Abel Wu
3 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2023-07-22 0:20 UTC (permalink / raw)
To: Abel Wu
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On Tue, Jul 11, 2023 at 08:41:43PM +0800, Abel Wu wrote:
> Now there are two indicators of socket memory pressure sit inside
> struct mem_cgroup, socket_pressure and tcpmem_pressure.
Hi Abel!
> When in legacy mode aka. cgroupv1, the socket memory is charged
> into a separate counter memcg->tcpmem rather than ->memory, so
> the reclaim pressure of the memcg has nothing to do with socket's
> pressure at all.
But we still might set memcg->socket_pressure and propagate the pressure,
right?
If you're changing this, you need to provide a bit more data on why it's
a good idea. I'm not saying the current status is perfect, but I think we need
a bit more justification for this change.
> While for default mode, the ->tcpmem is simply
> not used.
>
> So {socket,tcpmem}_pressure are only used in default/legacy mode
> respectively. This patch fixes the pieces of code that make mixed
> use of both.
>
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> ---
> include/linux/memcontrol.h | 4 ++--
> mm/vmpressure.c | 8 ++++++++
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 5818af8eca5a..5860c7f316b9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1727,8 +1727,8 @@ void mem_cgroup_sk_alloc(struct sock *sk);
> void mem_cgroup_sk_free(struct sock *sk);
> static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> {
> - if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
> - return true;
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> + return !!memcg->tcpmem_pressure;
So here you can have something like
if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
do {
if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
return true;
} while ((memcg = parent_mem_cgroup(memcg)));
} else {
return !!READ_ONCE(memcg->socket_pressure);
}
And, please, add a bold comment here or nearby the socket_pressure definition
that it has a different semantics in the legacy and default modes.
Overall I think it's a good idea to clean these things up and thank you
for working on this. But I wonder if we can make the next step and leave only
one mechanism for both cgroup v1 and v2 instead of having this weird setup
where memcg->socket_pressure is set differently from different paths on cgroup
v1 and v2.
Thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-22 0:20 ` Roman Gushchin
@ 2023-07-24 3:47 ` Abel Wu
2023-07-26 2:56 ` Roman Gushchin
0 siblings, 1 reply; 13+ messages in thread
From: Abel Wu @ 2023-07-24 3:47 UTC (permalink / raw)
To: Roman Gushchin
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
Hi Roman, thanks for taking time to have a look!
On 7/22/23 8:20 AM, Roman Gushchin wrote:
> On Tue, Jul 11, 2023 at 08:41:43PM +0800, Abel Wu wrote:
>> Now there are two indicators of socket memory pressure sit inside
>> struct mem_cgroup, socket_pressure and tcpmem_pressure.
>
> Hi Abel!
>
>> When in legacy mode aka. cgroupv1, the socket memory is charged
>> into a separate counter memcg->tcpmem rather than ->memory, so
>> the reclaim pressure of the memcg has nothing to do with socket's
>> pressure at all.
>
> But we still might set memcg->socket_pressure and propagate the pressure,
> right?
Yes, but the pressure comes from memcg->socket_pressure does not mean
pressure in socket memory in cgroupv1, which might lead to premature
reclamation or throttling on socket memory allocation. As the following
example shows:
->memory ->tcpmem
limit 10G 10G
usage 9G 4G
pressure true false
the memcg's memory limits are both set to 10G, and the ->memory part
is suffering from reclaim pressure while ->tcpmem still has much room
for use. I have no idea why should treat the ->tcpmem as under pressure
in this scenario, am I missed something?
> If you're changing this, you need to provide a bit more data on why it's
> a good idea. I'm not saying the current status is perfect, but I think we need
> a bit more justification for this change.
>
>> While for default mode, the ->tcpmem is simply
>> not used.
>>
>> So {socket,tcpmem}_pressure are only used in default/legacy mode
>> respectively. This patch fixes the pieces of code that make mixed
>> use of both.
>>
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>> ---
>> include/linux/memcontrol.h | 4 ++--
>> mm/vmpressure.c | 8 ++++++++
>> 2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 5818af8eca5a..5860c7f316b9 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -1727,8 +1727,8 @@ void mem_cgroup_sk_alloc(struct sock *sk);
>> void mem_cgroup_sk_free(struct sock *sk);
>> static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
>> {
>> - if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
>> - return true;
>> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>> + return !!memcg->tcpmem_pressure;
>
> So here you can have something like
> if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> do {
> if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
> return true;
> } while ((memcg = parent_mem_cgroup(memcg)));
> } else {
> return !!READ_ONCE(memcg->socket_pressure);
> }
Yes, this looks better.
>
> And, please, add a bold comment here or nearby the socket_pressure definition
> that it has a different semantics in the legacy and default modes.
Agreed.
>
> Overall I think it's a good idea to clean these things up and thank you
> for working on this. But I wonder if we can make the next step and leave only
> one mechanism for both cgroup v1 and v2 instead of having this weird setup
> where memcg->socket_pressure is set differently from different paths on cgroup
> v1 and v2.
There is some difficulty in unifying the mechanism for both cgroup
designs. Throttling socket memory allocation when memcg is under
pressure only makes sense when socket memory and other usages are
sharing the same limit, which is not true for cgroupv1. Thoughts?
Thanks & Best,
Abel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-24 3:47 ` Abel Wu
@ 2023-07-26 2:56 ` Roman Gushchin
2023-07-26 8:44 ` Abel Wu
0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2023-07-26 2:56 UTC (permalink / raw)
To: Abel Wu
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
> Hi Roman, thanks for taking time to have a look!
>
> On 7/22/23 8:20 AM, Roman Gushchin wrote:
> > On Tue, Jul 11, 2023 at 08:41:43PM +0800, Abel Wu wrote:
> > > Now there are two indicators of socket memory pressure sit inside
> > > struct mem_cgroup, socket_pressure and tcpmem_pressure.
> >
> > Hi Abel!
> >
> > > When in legacy mode aka. cgroupv1, the socket memory is charged
> > > into a separate counter memcg->tcpmem rather than ->memory, so
> > > the reclaim pressure of the memcg has nothing to do with socket's
> > > pressure at all.
> >
> > But we still might set memcg->socket_pressure and propagate the pressure,
> > right?
>
> Yes, but the pressure comes from memcg->socket_pressure does not mean
> pressure in socket memory in cgroupv1, which might lead to premature
> reclamation or throttling on socket memory allocation. As the following
> example shows:
>
> ->memory ->tcpmem
> limit 10G 10G
> usage 9G 4G
> pressure true false
Yes, now it makes sense to me. Thank you for the explanation.
Then I'd organize the patchset in the following way:
1) cgroup v1-only fix to not throttle tcpmem based on the vmpressure
2) a formal code refactoring
>
> the memcg's memory limits are both set to 10G, and the ->memory part
> is suffering from reclaim pressure while ->tcpmem still has much room
> for use. I have no idea why should treat the ->tcpmem as under pressure
> in this scenario, am I missed something?
>
> > If you're changing this, you need to provide a bit more data on why it's
> > a good idea. I'm not saying the current status is perfect, but I think we need
> > a bit more justification for this change.
> >
> > > While for default mode, the ->tcpmem is simply
> > > not used.
> > >
> > > So {socket,tcpmem}_pressure are only used in default/legacy mode
> > > respectively. This patch fixes the pieces of code that make mixed
> > > use of both.
> > >
> > > Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> > > ---
> > > include/linux/memcontrol.h | 4 ++--
> > > mm/vmpressure.c | 8 ++++++++
> > > 2 files changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 5818af8eca5a..5860c7f316b9 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1727,8 +1727,8 @@ void mem_cgroup_sk_alloc(struct sock *sk);
> > > void mem_cgroup_sk_free(struct sock *sk);
> > > static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> > > {
> > > - if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
> > > - return true;
> > > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > > + return !!memcg->tcpmem_pressure;
> >
> > So here you can have something like
> > if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > do {
> > if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
> > return true;
> > } while ((memcg = parent_mem_cgroup(memcg)));
> > } else {
> > return !!READ_ONCE(memcg->socket_pressure);
> > }
>
> Yes, this looks better.
>
> >
> > And, please, add a bold comment here or nearby the socket_pressure definition
> > that it has a different semantics in the legacy and default modes.
>
> Agreed.
>
> >
> > Overall I think it's a good idea to clean these things up and thank you
> > for working on this. But I wonder if we can make the next step and leave only
> > one mechanism for both cgroup v1 and v2 instead of having this weird setup
> > where memcg->socket_pressure is set differently from different paths on cgroup
> > v1 and v2.
>
> There is some difficulty in unifying the mechanism for both cgroup
> designs. Throttling socket memory allocation when memcg is under
> pressure only makes sense when socket memory and other usages are
> sharing the same limit, which is not true for cgroupv1. Thoughts?
I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
as it is, except when it creates an unnecessary complexity in the code.
I'm curious, was your work driven by some real-world problem or a desire to clean
up the code? Both are valid reasons of course.
Thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-26 2:56 ` Roman Gushchin
@ 2023-07-26 8:44 ` Abel Wu
2023-07-27 0:19 ` Roman Gushchin
0 siblings, 1 reply; 13+ messages in thread
From: Abel Wu @ 2023-07-26 8:44 UTC (permalink / raw)
To: Roman Gushchin
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On 7/26/23 10:56 AM, Roman Gushchin wrote:
> On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
>> Hi Roman, thanks for taking time to have a look!
>>>
>>>> When in legacy mode aka. cgroupv1, the socket memory is charged
>>>> into a separate counter memcg->tcpmem rather than ->memory, so
>>>> the reclaim pressure of the memcg has nothing to do with socket's
>>>> pressure at all.
>>>
>>> But we still might set memcg->socket_pressure and propagate the pressure,
>>> right?
>>
>> Yes, but the pressure comes from memcg->socket_pressure does not mean
>> pressure in socket memory in cgroupv1, which might lead to premature
>> reclamation or throttling on socket memory allocation. As the following
>> example shows:
>>
>> ->memory ->tcpmem
>> limit 10G 10G
>> usage 9G 4G
>> pressure true false
>
> Yes, now it makes sense to me. Thank you for the explanation.
Cheers!
>
> Then I'd organize the patchset in the following way:
> 1) cgroup v1-only fix to not throttle tcpmem based on the vmpressure
> 2) a formal code refactoring
OK, I will take a try to re-organize in next version.
>>>
>>> Overall I think it's a good idea to clean these things up and thank you
>>> for working on this. But I wonder if we can make the next step and leave only
>>> one mechanism for both cgroup v1 and v2 instead of having this weird setup
>>> where memcg->socket_pressure is set differently from different paths on cgroup
>>> v1 and v2.
>>
>> There is some difficulty in unifying the mechanism for both cgroup
>> designs. Throttling socket memory allocation when memcg is under
>> pressure only makes sense when socket memory and other usages are
>> sharing the same limit, which is not true for cgroupv1. Thoughts?
>
> I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
> as it is, except when it creates an unnecessary complexity in the code.
Are you suggesting that the 2nd patch can be ignored and keep
->tcpmem_pressure as it is? Or keep the 2nd patch and add some
explanation around as you suggested in last reply?
>
> I'm curious, was your work driven by some real-world problem or a desire to clean
> up the code? Both are valid reasons of course.
We (a cloud service provider) are migrating users to cgroupv2,
but encountered some problems among which the socket memory
really puts us in a difficult situation. There is no specific
threshold for socket memory in cgroupv2 and relies largely on
workloads doing traffic control themselves.
Say one workload behaves fine in cgroupv1 with 10G of ->memory
and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
with 11G of ->memory due to burst memory usage on socket.
It's rational for the workloads to build some traffic control
to better utilize the resources they bought, but from kernel's
point of view it's also reasonable to suppress the allocation
of socket memory once there is a shortage of free memory, given
that performance degradation is better than failure.
Currently the mechanism of net-memcg's pressure doesn't work as
we expected, please check the discussion in [1]. Besides this,
we are also working on mitigating the priority inversion issue
introduced by the net protocols' global shared thresholds [2],
which has something to do with the net-memcg's pressure. This
patchset and maybe some other are byproducts of the above work.
[1]
https://lore.kernel.org/netdev/20230602081135.75424-1-wuyun.abel@bytedance.com/
[2]
https://lore.kernel.org/netdev/20230609082712.34889-1-wuyun.abel@bytedance.com/
Thanks!
Abel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-26 8:44 ` Abel Wu
@ 2023-07-27 0:19 ` Roman Gushchin
2023-07-28 12:45 ` Abel Wu
0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2023-07-27 0:19 UTC (permalink / raw)
To: Abel Wu
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote:
> On 7/26/23 10:56 AM, Roman Gushchin wrote:
> > On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
> > > Hi Roman, thanks for taking time to have a look!
> > > >
> > > > > When in legacy mode aka. cgroupv1, the socket memory is charged
> > > > > into a separate counter memcg->tcpmem rather than ->memory, so
> > > > > the reclaim pressure of the memcg has nothing to do with socket's
> > > > > pressure at all.
> > > >
> > > > But we still might set memcg->socket_pressure and propagate the pressure,
> > > > right?
> > >
> > > Yes, but the pressure comes from memcg->socket_pressure does not mean
> > > pressure in socket memory in cgroupv1, which might lead to premature
> > > reclamation or throttling on socket memory allocation. As the following
> > > example shows:
> > >
> > > ->memory ->tcpmem
> > > limit 10G 10G
> > > usage 9G 4G
> > > pressure true false
> >
> > Yes, now it makes sense to me. Thank you for the explanation.
>
> Cheers!
>
> >
> > Then I'd organize the patchset in the following way:
> > 1) cgroup v1-only fix to not throttle tcpmem based on the vmpressure
> > 2) a formal code refactoring
>
> OK, I will take a try to re-organize in next version.
Thank you!
>
> > > >
> > > > Overall I think it's a good idea to clean these things up and thank you
> > > > for working on this. But I wonder if we can make the next step and leave only
> > > > one mechanism for both cgroup v1 and v2 instead of having this weird setup
> > > > where memcg->socket_pressure is set differently from different paths on cgroup
> > > > v1 and v2.
> > >
> > > There is some difficulty in unifying the mechanism for both cgroup
> > > designs. Throttling socket memory allocation when memcg is under
> > > pressure only makes sense when socket memory and other usages are
> > > sharing the same limit, which is not true for cgroupv1. Thoughts?
> >
> > I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
> > as it is, except when it creates an unnecessary complexity in the code.
>
> Are you suggesting that the 2nd patch can be ignored and keep
> ->tcpmem_pressure as it is? Or keep the 2nd patch and add some
> explanation around as you suggested in last reply?
I suggest to split a code refactoring (which is not expected to bring any
functional changes) and an actual change of the behavior on cgroup v1.
Re the refactoring: I see a lot of value in adding comments and make the
code more readable, I don't see that much value in merging two variables.
But if it comes organically with the code simplification - nice.
>
> >
> > I'm curious, was your work driven by some real-world problem or a desire to clean
> > up the code? Both are valid reasons of course.
>
> We (a cloud service provider) are migrating users to cgroupv2,
> but encountered some problems among which the socket memory
> really puts us in a difficult situation. There is no specific
> threshold for socket memory in cgroupv2 and relies largely on
> workloads doing traffic control themselves.
>
> Say one workload behaves fine in cgroupv1 with 10G of ->memory
> and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
> with 11G of ->memory due to burst memory usage on socket.
>
> It's rational for the workloads to build some traffic control
> to better utilize the resources they bought, but from kernel's
> point of view it's also reasonable to suppress the allocation
> of socket memory once there is a shortage of free memory, given
> that performance degradation is better than failure.
Yeah, I can see it. But Idk if it's too workload-specific to have
a single-policy-fits-all-cases approach.
E.g. some workloads might prefer to have a portion of pagecache
being reclaimed.
What do you think?
>
> Currently the mechanism of net-memcg's pressure doesn't work as
> we expected, please check the discussion in [1]. Besides this,
> we are also working on mitigating the priority inversion issue
> introduced by the net protocols' global shared thresholds [2],
> which has something to do with the net-memcg's pressure. This
> patchset and maybe some other are byproducts of the above work.
>
> [1] https://lore.kernel.org/netdev/20230602081135.75424-1-wuyun.abel@bytedance.com/
> [2] https://lore.kernel.org/netdev/20230609082712.34889-1-wuyun.abel@bytedance.com/
Thanks for the clarification!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RESEND net-next 1/2] net-memcg: Scopify the indicators of sockmem pressure
2023-07-27 0:19 ` Roman Gushchin
@ 2023-07-28 12:45 ` Abel Wu
0 siblings, 0 replies; 13+ messages in thread
From: Abel Wu @ 2023-07-28 12:45 UTC (permalink / raw)
To: Roman Gushchin
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song,
Andrew Morton, David Ahern, Yosry Ahmed, Matthew Wilcox (Oracle),
Yu Zhao, Kefeng Wang, Yafang Shao, Kuniyuki Iwashima,
Martin KaFai Lau, Alexander Mikhalitsyn, Breno Leitao,
David Howells, Jason Xing, Xin Long, Michal Hocko,
Alexei Starovoitov, open list, open list:NETWORKING [GENERAL],
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG),
open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
On 7/27/23 8:19 AM, Roman Gushchin wrote:
> On Wed, Jul 26, 2023 at 04:44:24PM +0800, Abel Wu wrote:
>> On 7/26/23 10:56 AM, Roman Gushchin wrote:
>>> On Mon, Jul 24, 2023 at 11:47:02AM +0800, Abel Wu wrote:
>>>> Hi Roman, thanks for taking time to have a look!
>>>>>
>>>>> Overall I think it's a good idea to clean these things up and thank you
>>>>> for working on this. But I wonder if we can make the next step and leave only
>>>>> one mechanism for both cgroup v1 and v2 instead of having this weird setup
>>>>> where memcg->socket_pressure is set differently from different paths on cgroup
>>>>> v1 and v2.
>>>>
>>>> There is some difficulty in unifying the mechanism for both cgroup
>>>> designs. Throttling socket memory allocation when memcg is under
>>>> pressure only makes sense when socket memory and other usages are
>>>> sharing the same limit, which is not true for cgroupv1. Thoughts?
>>>
>>> I see... Generally speaking cgroup v1 is considered frozen, so we can leave it
>>> as it is, except when it creates an unnecessary complexity in the code.
>>
>> Are you suggesting that the 2nd patch can be ignored and keep
>> ->tcpmem_pressure as it is? Or keep the 2nd patch and add some
>> explanation around as you suggested in last reply?
>
> I suggest to split a code refactoring (which is not expected to bring any
> functional changes) and an actual change of the behavior on cgroup v1.
> Re the refactoring: I see a lot of value in adding comments and make the
> code more readable, I don't see that much value in merging two variables.
> But if it comes organically with the code simplification - nice.
I see, thanks for the clarification!
>
>>> I'm curious, was your work driven by some real-world problem or a desire to clean
>>> up the code? Both are valid reasons of course.
>>
>> We (a cloud service provider) are migrating users to cgroupv2,
>> but encountered some problems among which the socket memory
>> really puts us in a difficult situation. There is no specific
>> threshold for socket memory in cgroupv2 and relies largely on
>> workloads doing traffic control themselves.
>>
>> Say one workload behaves fine in cgroupv1 with 10G of ->memory
>> and 1G of ->tcpmem, but will suck (or even be OOMed) in cgroupv2
>> with 11G of ->memory due to burst memory usage on socket.
>>
>> It's rational for the workloads to build some traffic control
>> to better utilize the resources they bought, but from kernel's
>> point of view it's also reasonable to suppress the allocation
>> of socket memory once there is a shortage of free memory, given
>> that performance degradation is better than failure.
>
> Yeah, I can see it. But Idk if it's too workload-specific to have
> a single-policy-fits-all-cases approach.
> E.g. some workloads might prefer to have a portion of pagecache
> being reclaimed.
> What do you think?
Now the memcg is considered to be under pressure if the number of
pages reclaimed is much less than desired. I doubt it could be a
win in such case to spend more time on reclaiming while letting
socket continue to allocate memory (which could make things worse),
compared to relieving reclaim pressure and putting time on its real
work.
Best,
Abel
^ permalink raw reply [flat|nested] 13+ messages in thread