* [PATCH v3] memcg: expose socket memory pressure in a cgroup
@ 2025-07-22  7:11 Daniel Sedlak
  2025-07-22  7:17 ` Eric Dumazet
  2025-07-22  8:57 ` Michal Koutný
  0 siblings, 2 replies; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-22  7:11 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, Kuniyuki Iwashima,
	David Ahern, Andrew Morton, Shakeel Butt, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups
  Cc: Daniel Sedlak, Matyas Hurtik

This patch is the result of our long-standing debugging sessions, which
started with "networking is slow": TCP throughput suddenly dropped from
tens of Gbps to a few Mbps, and we could not see anything in the kernel
log or netstat counters.

Currently, we have two memory pressure counters for TCP sockets [1],
which we manipulate only when the memory pressure is signaled through
the proto struct [2]. However, memory pressure can also be signaled
through the cgroup memory subsystem, which we do not reflect in the
netstat counters. As a result, when the cgroup memory subsystem signals
that it is under pressure, we silently reduce the advertised TCP window
with tcp_adjust_rcv_ssthresh() to 4*advmss, which causes a significant
throughput reduction.

Keep in mind that when the cgroup memory subsystem signals socket
memory pressure, it affects all sockets used in that cgroup.

This patch exposes a new file for each cgroup in the cgroup filesystem
which signals the cgroup's socket memory pressure. The file is
accessible at the following path:

  /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure

The output value is an integer matching the internal semantics of
socket_pressure in struct mem_cgroup. It is a periodically re-armed
clock marking the end of the current socket memory pressure episode;
each time the clock is re-armed, it is set to jiffies + HZ.
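
For reference, the check path that leads to this clamp is roughly the
following (paraphrased and simplified from the v6.15 sources; not the
exact kernel code):

/* Per-socket check used by the TCP receive path (include/net/tcp.h): */
static inline bool tcp_under_memory_pressure(const struct sock *sk)
{
        if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
            mem_cgroup_under_socket_pressure(sk->sk_memcg))
                return true;

        return READ_ONCE(tcp_memory_pressure);
}

/*
 * The memcg side (include/linux/memcontrol.h, cgroup v1 handling
 * omitted): pressure holds while the re-arm clock lies in the future,
 * for the socket's memcg and all of its ancestors.
 */
static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
        do {
                if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
                        return true;
        } while ((memcg = parent_mem_cgroup(memcg)));

        return false;
}

While this check returns true, tcp_adjust_rcv_ssthresh() keeps clamping
the advertised window to 4*advmss, which is where the throughput
collapse described above comes from.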

Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/uapi/linux/snmp.h#L231-L232 [1]
Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/net/sock.h#L1300-L1301 [2]
Co-developed-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
Signed-off-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
Signed-off-by: Daniel Sedlak <daniel.sedlak@cdn77.com>
---
Changes:
v2 -> v3:
- Expose the socket memory pressure on the cgroups instead of netstat
- Split patch
- Link: https://lore.kernel.org/netdev/20250714143613.42184-1-daniel.sedlak@cdn77.com/

v1 -> v2:
- Add tracepoint
- Link: https://lore.kernel.org/netdev/20250707105205.222558-1-daniel.sedlak@cdn77.com/


 mm/memcontrol.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 902da8a9c643..8e8808fb2d7a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4647,6 +4647,15 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	return nbytes;
 }
 
+static int memory_socket_pressure_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	seq_printf(m, "%lu\n", READ_ONCE(memcg->socket_pressure));
+
+	return 0;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -4718,6 +4727,11 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NS_DELEGATABLE,
 		.write = memory_reclaim,
 	},
+	{
+		.name = "net.socket_pressure",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_socket_pressure_show,
+	},
 	{ }	/* terminate */
 };
 

base-commit: e96ee511c906c59b7c4e6efd9d9b33917730e000
-- 
2.39.5



* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22  7:11 [PATCH v3] memcg: expose socket memory pressure in a cgroup Daniel Sedlak
@ 2025-07-22  7:17 ` Eric Dumazet
  2025-07-22  7:27   ` Daniel Sedlak
  2025-07-22  8:57 ` Michal Koutný
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2025-07-22  7:17 UTC (permalink / raw)
  To: Daniel Sedlak
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Neal Cardwell, Kuniyuki Iwashima, David Ahern,
	Andrew Morton, Shakeel Butt, Yosry Ahmed, linux-mm, netdev,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 12:12 AM Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
>
> This patch is a result of our long-standing debug sessions, where it all
> started as "networking is slow", and TCP network throughput suddenly
> dropped from tens of Gbps to few Mbps, and we could not see anything in
> the kernel log or netstat counters.
>
> Currently, we have two memory pressure counters for TCP sockets [1],
> which we manipulate only when the memory pressure is signalled through
> the proto struct [2]. However, the memory pressure can also be signaled
> through the cgroup memory subsystem, which we do not reflect in the
> netstat counters. In the end, when the cgroup memory subsystem signals
> that it is under pressure, we silently reduce the advertised TCP window
> with tcp_adjust_rcv_ssthresh() to 4*advmss, which causes a significant
> throughput reduction.
>
> Keep in mind that when the cgroup memory subsystem signals the socket
> memory pressure, it affects all sockets used in that cgroup.
>
> This patch exposes a new file for each cgroup in sysfs which signals
> the cgroup socket memory pressure. The file is accessible in
> the following path.
>
>   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
>
> The output value is an integer matching the internal semantics of the
> struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> representing the end of the said socket memory pressure, and once the
> clock is re-armed it is set to jiffies + HZ.
>
> Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/uapi/linux/snmp.h#L231-L232 [1]
> Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/net/sock.h#L1300-L1301 [2]
> Co-developed-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
> Signed-off-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
> Signed-off-by: Daniel Sedlak <daniel.sedlak@cdn77.com>
> ---
> Changes:
> v2 -> v3:
> - Expose the socket memory pressure on the cgroups instead of netstat
> - Split patch
> - Link: https://lore.kernel.org/netdev/20250714143613.42184-1-daniel.sedlak@cdn77.com/
>
> v1 -> v2:
> - Add tracepoint
> - Link: https://lore.kernel.org/netdev/20250707105205.222558-1-daniel.sedlak@cdn77.com/
>
>
>  mm/memcontrol.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 902da8a9c643..8e8808fb2d7a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4647,6 +4647,15 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>         return nbytes;
>  }
>
> +static int memory_socket_pressure_show(struct seq_file *m, void *v)
> +{
> +       struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> +
> +       seq_printf(m, "%lu\n", READ_ONCE(memcg->socket_pressure));
> +
> +       return 0;
> +}
> +
>  static struct cftype memory_files[] = {
>         {
>                 .name = "current",
> @@ -4718,6 +4727,11 @@ static struct cftype memory_files[] = {
>                 .flags = CFTYPE_NS_DELEGATABLE,
>                 .write = memory_reclaim,
>         },
> +       {
> +               .name = "net.socket_pressure",
> +               .flags = CFTYPE_NOT_ON_ROOT,
> +               .seq_show = memory_socket_pressure_show,
> +       },
>         { }     /* terminate */
>  };
>

It seems you forgot to update Documentation/admin-guide/cgroup-v2.rst


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22  7:17 ` Eric Dumazet
@ 2025-07-22  7:27   ` Daniel Sedlak
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-22  7:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Neal Cardwell, Kuniyuki Iwashima, David Ahern,
	Andrew Morton, Shakeel Butt, Yosry Ahmed, linux-mm, netdev,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	cgroups, Matyas Hurtik

On 7/22/25 9:17 AM, Eric Dumazet wrote:
> On Tue, Jul 22, 2025 at 12:12 AM Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
>>
>> This patch is a result of our long-standing debug sessions, where it all
>> started as "networking is slow", and TCP network throughput suddenly
>> dropped from tens of Gbps to few Mbps, and we could not see anything in
>> the kernel log or netstat counters.
>>
>> Currently, we have two memory pressure counters for TCP sockets [1],
>> which we manipulate only when the memory pressure is signalled through
>> the proto struct [2]. However, the memory pressure can also be signaled
>> through the cgroup memory subsystem, which we do not reflect in the
>> netstat counters. In the end, when the cgroup memory subsystem signals
>> that it is under pressure, we silently reduce the advertised TCP window
>> with tcp_adjust_rcv_ssthresh() to 4*advmss, which causes a significant
>> throughput reduction.
>>
>> Keep in mind that when the cgroup memory subsystem signals the socket
>> memory pressure, it affects all sockets used in that cgroup.
>>
>> This patch exposes a new file for each cgroup in sysfs which signals
>> the cgroup socket memory pressure. The file is accessible in
>> the following path.
>>
>>    /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
>>
>> The output value is an integer matching the internal semantics of the
>> struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
>> representing the end of the said socket memory pressure, and once the
>> clock is re-armed it is set to jiffies + HZ.
>>
>> Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/uapi/linux/snmp.h#L231-L232 [1]
>> Link: https://elixir.bootlin.com/linux/v6.15.4/source/include/net/sock.h#L1300-L1301 [2]
>> Co-developed-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
>> Signed-off-by: Matyas Hurtik <matyas.hurtik@cdn77.com>
>> Signed-off-by: Daniel Sedlak <daniel.sedlak@cdn77.com>
>> ---
>> Changes:
>> v2 -> v3:
>> - Expose the socket memory pressure on the cgroups instead of netstat
>> - Split patch
>> - Link: https://lore.kernel.org/netdev/20250714143613.42184-1-daniel.sedlak@cdn77.com/
>>
>> v1 -> v2:
>> - Add tracepoint
>> - Link: https://lore.kernel.org/netdev/20250707105205.222558-1-daniel.sedlak@cdn77.com/
>>
>>
>>   mm/memcontrol.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 902da8a9c643..8e8808fb2d7a 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -4647,6 +4647,15 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>>          return nbytes;
>>   }
>>
>> +static int memory_socket_pressure_show(struct seq_file *m, void *v)
>> +{
>> +       struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
>> +
>> +       seq_printf(m, "%lu\n", READ_ONCE(memcg->socket_pressure));
>> +
>> +       return 0;
>> +}
>> +
>>   static struct cftype memory_files[] = {
>>          {
>>                  .name = "current",
>> @@ -4718,6 +4727,11 @@ static struct cftype memory_files[] = {
>>                  .flags = CFTYPE_NS_DELEGATABLE,
>>                  .write = memory_reclaim,
>>          },
>> +       {
>> +               .name = "net.socket_pressure",
>> +               .flags = CFTYPE_NOT_ON_ROOT,
>> +               .seq_show = memory_socket_pressure_show,
>> +       },
>>          { }     /* terminate */
>>   };
>>
> 
> It seems you forgot to update Documentation/admin-guide/cgroup-v2.rst

Oops, missed that. I will add it to the v4.

Thanks!
Daniel



* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22  7:11 [PATCH v3] memcg: expose socket memory pressure in a cgroup Daniel Sedlak
  2025-07-22  7:17 ` Eric Dumazet
@ 2025-07-22  8:57 ` Michal Koutný
  2025-07-22 17:50   ` Shakeel Butt
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Koutný @ 2025-07-22  8:57 UTC (permalink / raw)
  To: Daniel Sedlak, Shakeel Butt
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, Kuniyuki Iwashima,
	David Ahern, Andrew Morton, Yosry Ahmed, linux-mm, netdev,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	cgroups, Matyas Hurtik

Hello Daniel.

On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
>   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> 
> The output value is an integer matching the internal semantics of the
> struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> representing the end of the said socket memory pressure, and once the
> clock is re-armed it is set to jiffies + HZ.

I don't find it ideal to expose this value in its raw form, which is
rather an implementation detail.

IIUC, the information is possibly valid only during one jiffy interval.
How would userspace consume this?

I'd consider exposing this as a cumulative counter in memory.stat for
simplicity (or possibly the cumulative time spent in the pressure
condition).

Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
thought it was kind of legacy.

Thanks,
Michal


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22  8:57 ` Michal Koutný
@ 2025-07-22 17:50   ` Shakeel Butt
  2025-07-22 18:27     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 20+ messages in thread
From: Shakeel Butt @ 2025-07-22 17:50 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Daniel Sedlak, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Neal Cardwell,
	Kuniyuki Iwashima, David Ahern, Andrew Morton, Yosry Ahmed,
	linux-mm, netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> Hello Daniel.
> 
> On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> >   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> > 
> > The output value is an integer matching the internal semantics of the
> > struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> > representing the end of the said socket memory pressure, and once the
> > clock is re-armed it is set to jiffies + HZ.
> 
> I don't find it ideal to expose this value in its raw form that is
> rather an implementation detail.
> 
> IIUC, the information is possibly valid only during one jiffy interval.
> How would be the userspace consuming this?
> 
> I'd consider exposing this as a cummulative counter in memory.stat for
> simplicity (or possibly cummulative time spent in the pressure
> condition).
> 
> Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> thought it's kind of legacy.


Yes, vmpressure is legacy and we should not expose the raw underlying
number to userspace. How about just 0 or 1, backed by
mem_cgroup_under_socket_pressure()? Then, if we change the underlying
implementation in the future, the output of this interface stays
consistent.
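
An untested sketch of that suggestion, reusing the seq_show handler from
this patch and only changing what it prints:

static int memory_socket_pressure_show(struct seq_file *m, void *v)
{
        struct mem_cgroup *memcg = mem_cgroup_from_seq(m);

        /*
         * Report only whether the memcg is currently under socket
         * pressure, independent of how that is tracked internally.
         */
        seq_printf(m, "%d\n",
                   mem_cgroup_under_socket_pressure(memcg) ? 1 : 0);

        return 0;
}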


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 17:50   ` Shakeel Butt
@ 2025-07-22 18:27     ` Kuniyuki Iwashima
  2025-07-22 18:41       ` Waiman Long
  2025-07-22 19:05       ` Shakeel Butt
  0 siblings, 2 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-22 18:27 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Koutný, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> > Hello Daniel.
> >
> > On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> > >   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> > >
> > > The output value is an integer matching the internal semantics of the
> > > struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> > > representing the end of the said socket memory pressure, and once the
> > > clock is re-armed it is set to jiffies + HZ.
> >
> > I don't find it ideal to expose this value in its raw form that is
> > rather an implementation detail.
> >
> > IIUC, the information is possibly valid only during one jiffy interval.
> > How would be the userspace consuming this?
> >
> > I'd consider exposing this as a cummulative counter in memory.stat for
> > simplicity (or possibly cummulative time spent in the pressure
> > condition).
> >
> > Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> > thought it's kind of legacy.
>
>
> Yes vmpressure is legacy and we should not expose raw underlying number
> to the userspace. How about just 0 or 1 and use
> mem_cgroup_under_socket_pressure() underlying? In future if we change
> the underlying implementation, the output of this interface should be
> consistent.

But this is available only for 1 second, and it will not be useful
except for live debugging?


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 18:27     ` Kuniyuki Iwashima
@ 2025-07-22 18:41       ` Waiman Long
  2025-07-22 18:49         ` Kuniyuki Iwashima
  2025-07-22 19:05       ` Shakeel Butt
  1 sibling, 1 reply; 20+ messages in thread
From: Waiman Long @ 2025-07-22 18:41 UTC (permalink / raw)
  To: Kuniyuki Iwashima, Shakeel Butt
  Cc: Michal Koutný, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik


On 7/22/25 2:27 PM, Kuniyuki Iwashima wrote:
> On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>> On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
>>> Hello Daniel.
>>>
>>> On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
>>>>    /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
>>>>
>>>> The output value is an integer matching the internal semantics of the
>>>> struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
>>>> representing the end of the said socket memory pressure, and once the
>>>> clock is re-armed it is set to jiffies + HZ.
>>> I don't find it ideal to expose this value in its raw form that is
>>> rather an implementation detail.
>>>
>>> IIUC, the information is possibly valid only during one jiffy interval.
>>> How would be the userspace consuming this?
>>>
>>> I'd consider exposing this as a cummulative counter in memory.stat for
>>> simplicity (or possibly cummulative time spent in the pressure
>>> condition).
>>>
>>> Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
>>> thought it's kind of legacy.
>>
>> Yes vmpressure is legacy and we should not expose raw underlying number
>> to the userspace. How about just 0 or 1 and use
>> mem_cgroup_under_socket_pressure() underlying? In future if we change
>> the underlying implementation, the output of this interface should be
>> consistent.
> But this is available only for 1 second, and it will not be useful
> except for live debugging ?

If the new interface is used mainly for debugging purposes, I would
suggest adding the CFTYPE_DEBUG flag so that it only shows up when
"cgroup_debug" is specified on the kernel command line.

Cheers,
Longman



* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 18:41       ` Waiman Long
@ 2025-07-22 18:49         ` Kuniyuki Iwashima
  0 siblings, 0 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-22 18:49 UTC (permalink / raw)
  To: Waiman Long
  Cc: Shakeel Butt, Michal Koutný, Daniel Sedlak, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Neal Cardwell, David Ahern, Andrew Morton,
	Yosry Ahmed, linux-mm, netdev, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 11:41 AM Waiman Long <llong@redhat.com> wrote:
>
>
> On 7/22/25 2:27 PM, Kuniyuki Iwashima wrote:
> > On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >> On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> >>> Hello Daniel.
> >>>
> >>> On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> >>>>    /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> >>>>
> >>>> The output value is an integer matching the internal semantics of the
> >>>> struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> >>>> representing the end of the said socket memory pressure, and once the
> >>>> clock is re-armed it is set to jiffies + HZ.
> >>> I don't find it ideal to expose this value in its raw form that is
> >>> rather an implementation detail.
> >>>
> >>> IIUC, the information is possibly valid only during one jiffy interval.
> >>> How would be the userspace consuming this?
> >>>
> >>> I'd consider exposing this as a cummulative counter in memory.stat for
> >>> simplicity (or possibly cummulative time spent in the pressure
> >>> condition).
> >>>
> >>> Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> >>> thought it's kind of legacy.
> >>
> >> Yes vmpressure is legacy and we should not expose raw underlying number
> >> to the userspace. How about just 0 or 1 and use
> >> mem_cgroup_under_socket_pressure() underlying? In future if we change
> >> the underlying implementation, the output of this interface should be
> >> consistent.
> > But this is available only for 1 second, and it will not be useful
> > except for live debugging ?
>
> If the new interface is used mainly for debugging purpose, I will
> suggest adding the CFTYPE_DEBUG flag so that it will only show up when
> "cgroup_debug" is specified in the kernel command line.

Sorry, I meant that a signal available for only 1 second does not help
troubleshooting, and we cannot get any hint from reading 0 _after_
something bad has happened.

The flag works if the issue is persistent or can be reproduced so that
we can reboot with it enabled, but it does not fit here.  I guess the
flag is for a different use case?


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 18:27     ` Kuniyuki Iwashima
  2025-07-22 18:41       ` Waiman Long
@ 2025-07-22 19:05       ` Shakeel Butt
  2025-07-22 19:58         ` Kuniyuki Iwashima
  2025-07-23  8:41         ` Daniel Sedlak
  1 sibling, 2 replies; 20+ messages in thread
From: Shakeel Butt @ 2025-07-22 19:05 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Michal Koutný, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 11:27:39AM -0700, Kuniyuki Iwashima wrote:
> On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> > > Hello Daniel.
> > >
> > > On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> > > >   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> > > >
> > > > The output value is an integer matching the internal semantics of the
> > > > struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> > > > representing the end of the said socket memory pressure, and once the
> > > > clock is re-armed it is set to jiffies + HZ.
> > >
> > > I don't find it ideal to expose this value in its raw form that is
> > > rather an implementation detail.
> > >
> > > IIUC, the information is possibly valid only during one jiffy interval.
> > > How would be the userspace consuming this?
> > >
> > > I'd consider exposing this as a cummulative counter in memory.stat for
> > > simplicity (or possibly cummulative time spent in the pressure
> > > condition).
> > >
> > > Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> > > thought it's kind of legacy.
> >
> >
> > Yes vmpressure is legacy and we should not expose raw underlying number
> > to the userspace. How about just 0 or 1 and use
> > mem_cgroup_under_socket_pressure() underlying? In future if we change
> > the underlying implementation, the output of this interface should be
> > consistent.
> 
> But this is available only for 1 second, and it will not be useful
> except for live debugging ?

1 second is the current implementation, and it can be more if the memcg
remains under memory pressure. Regarding usefulness, I think the periodic
stat collectors (like cadvisor or Google's internal borglet+rumbo) would
be interested in scraping this interface. If this is still not useful,
what would be better? Some kind of trace that tracks the socket pressure
state of a memcg (i.e. going into and out of pressure)?


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 19:05       ` Shakeel Butt
@ 2025-07-22 19:58         ` Kuniyuki Iwashima
  2025-07-22 20:11           ` Shakeel Butt
  2025-07-23  8:41         ` Daniel Sedlak
  1 sibling, 1 reply; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-22 19:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Koutný, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 12:05 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Jul 22, 2025 at 11:27:39AM -0700, Kuniyuki Iwashima wrote:
> > On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> > > > Hello Daniel.
> > > >
> > > > On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> > > > >   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> > > > >
> > > > > The output value is an integer matching the internal semantics of the
> > > > > struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> > > > > representing the end of the said socket memory pressure, and once the
> > > > > clock is re-armed it is set to jiffies + HZ.
> > > >
> > > > I don't find it ideal to expose this value in its raw form that is
> > > > rather an implementation detail.
> > > >
> > > > IIUC, the information is possibly valid only during one jiffy interval.
> > > > How would be the userspace consuming this?
> > > >
> > > > I'd consider exposing this as a cummulative counter in memory.stat for
> > > > simplicity (or possibly cummulative time spent in the pressure
> > > > condition).
> > > >
> > > > Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> > > > thought it's kind of legacy.
> > >
> > >
> > > Yes vmpressure is legacy and we should not expose raw underlying number
> > > to the userspace. How about just 0 or 1 and use
> > > mem_cgroup_under_socket_pressure() underlying? In future if we change
> > > the underlying implementation, the output of this interface should be
> > > consistent.
> >
> > But this is available only for 1 second, and it will not be useful
> > except for live debugging ?
>
> 1 second is the current implementation and it can be more if the memcg
> remains in memory pressure. Regarding usefullness I think the periodic
> stat collectors (like cadvisor or Google's internal borglet+rumbo) would
> be interested in scraping this interface.

I think the cumulative counter suggested above is better, at least.

If we poll such an interface periodically, the cumulative counter also
works; we can just calculate the delta.  And even if we do not monitor
it continuously, we can still tell afterwards whether there was memory
pressure.



> If this is still not useful,
> what will be better? Some kind of trace which tracks the state of socket
> pressure state of a memcg (i.e. going into and out of pressure)?


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 19:58         ` Kuniyuki Iwashima
@ 2025-07-22 20:11           ` Shakeel Butt
  2025-07-22 22:10             ` Kuniyuki Iwashima
  2025-07-23  8:38             ` Michal Koutný
  0 siblings, 2 replies; 20+ messages in thread
From: Shakeel Butt @ 2025-07-22 20:11 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Michal Koutný, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 12:58:17PM -0700, Kuniyuki Iwashima wrote:
> On Tue, Jul 22, 2025 at 12:05 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Tue, Jul 22, 2025 at 11:27:39AM -0700, Kuniyuki Iwashima wrote:
> > > On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> > > > > Hello Daniel.
> > > > >
> > > > > On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> > > > > >   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> > > > > >
> > > > > > The output value is an integer matching the internal semantics of the
> > > > > > struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> > > > > > representing the end of the said socket memory pressure, and once the
> > > > > > clock is re-armed it is set to jiffies + HZ.
> > > > >
> > > > > I don't find it ideal to expose this value in its raw form that is
> > > > > rather an implementation detail.
> > > > >
> > > > > IIUC, the information is possibly valid only during one jiffy interval.
> > > > > How would be the userspace consuming this?
> > > > >
> > > > > I'd consider exposing this as a cummulative counter in memory.stat for
> > > > > simplicity (or possibly cummulative time spent in the pressure
> > > > > condition).
> > > > >
> > > > > Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> > > > > thought it's kind of legacy.
> > > >
> > > >
> > > > Yes vmpressure is legacy and we should not expose raw underlying number
> > > > to the userspace. How about just 0 or 1 and use
> > > > mem_cgroup_under_socket_pressure() underlying? In future if we change
> > > > the underlying implementation, the output of this interface should be
> > > > consistent.
> > >
> > > But this is available only for 1 second, and it will not be useful
> > > except for live debugging ?
> >
> > 1 second is the current implementation and it can be more if the memcg
> > remains in memory pressure. Regarding usefullness I think the periodic
> > stat collectors (like cadvisor or Google's internal borglet+rumbo) would
> > be interested in scraping this interface.
> 
> I think the cumulative counter suggested above is better at least.

It is tied to the underlying implementation. If we decide to use, for
example, PSI in future, what should this interface show?



* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 20:11           ` Shakeel Butt
@ 2025-07-22 22:10             ` Kuniyuki Iwashima
  2025-07-23  8:38             ` Michal Koutný
  1 sibling, 0 replies; 20+ messages in thread
From: Kuniyuki Iwashima @ 2025-07-22 22:10 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Koutný, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 1:11 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Jul 22, 2025 at 12:58:17PM -0700, Kuniyuki Iwashima wrote:
> > On Tue, Jul 22, 2025 at 12:05 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Tue, Jul 22, 2025 at 11:27:39AM -0700, Kuniyuki Iwashima wrote:
> > > > On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > > >
> > > > > On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
> > > > > > Hello Daniel.
> > > > > >
> > > > > > On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
> > > > > > >   /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
> > > > > > >
> > > > > > > The output value is an integer matching the internal semantics of the
> > > > > > > struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
> > > > > > > representing the end of the said socket memory pressure, and once the
> > > > > > > clock is re-armed it is set to jiffies + HZ.
> > > > > >
> > > > > > I don't find it ideal to expose this value in its raw form that is
> > > > > > rather an implementation detail.
> > > > > >
> > > > > > IIUC, the information is possibly valid only during one jiffy interval.
> > > > > > How would be the userspace consuming this?
> > > > > >
> > > > > > I'd consider exposing this as a cummulative counter in memory.stat for
> > > > > > simplicity (or possibly cummulative time spent in the pressure
> > > > > > condition).
> > > > > >
> > > > > > Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
> > > > > > thought it's kind of legacy.
> > > > >
> > > > >
> > > > > Yes vmpressure is legacy and we should not expose raw underlying number
> > > > > to the userspace. How about just 0 or 1 and use
> > > > > mem_cgroup_under_socket_pressure() underlying? In future if we change
> > > > > the underlying implementation, the output of this interface should be
> > > > > consistent.
> > > >
> > > > But this is available only for 1 second, and it will not be useful
> > > > except for live debugging ?
> > >
> > > 1 second is the current implementation and it can be more if the memcg
> > > remains in memory pressure. Regarding usefullness I think the periodic
> > > stat collectors (like cadvisor or Google's internal borglet+rumbo) would
> > > be interested in scraping this interface.
> >
> > I think the cumulative counter suggested above is better at least.
>
> It is tied to the underlying implementation. If we decide to use, for
> example, PSI in future, what should this interface show?

Sorry, I'm not yet familiar with PSI, so I can't say what would be
especially useful.


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 20:11           ` Shakeel Butt
  2025-07-22 22:10             ` Kuniyuki Iwashima
@ 2025-07-23  8:38             ` Michal Koutný
  2025-07-23  8:58               ` Daniel Sedlak
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Koutný @ 2025-07-23  8:38 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Kuniyuki Iwashima, Daniel Sedlak, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Neal Cardwell, David Ahern, Andrew Morton, Yosry Ahmed, linux-mm,
	netdev, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, cgroups, Matyas Hurtik

On Tue, Jul 22, 2025 at 01:11:05PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > 1 second is the current implementation and it can be more if the memcg
> > > remains in memory pressure. Regarding usefullness I think the periodic
> > > stat collectors (like cadvisor or Google's internal borglet+rumbo) would
> > > be interested in scraping this interface.
> > 
> > I think the cumulative counter suggested above is better at least.
> 
> It is tied to the underlying implementation. If we decide to use, for
> example, PSI in future, what should this interface show?

Actually, if it were exposed as cumulative time under pressure (not
cumulative events), that's quite similar to PSI.

My curiosity is whether this can be useful for responsive actions (hence
worth watching at high frequency, or even creating notification events)
or rather for post-hoc examination and low-frequency adjustments (the
reason for a cumulative counter). I.e., what can this signal to
userspace?

Michal


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-22 19:05       ` Shakeel Butt
  2025-07-22 19:58         ` Kuniyuki Iwashima
@ 2025-07-23  8:41         ` Daniel Sedlak
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-23  8:41 UTC (permalink / raw)
  To: Shakeel Butt, Kuniyuki Iwashima
  Cc: Michal Koutný, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Neal Cardwell,
	David Ahern, Andrew Morton, Yosry Ahmed, linux-mm, netdev,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	cgroups, Matyas Hurtik

On 7/22/25 9:05 PM, Shakeel Butt wrote:
> On Tue, Jul 22, 2025 at 11:27:39AM -0700, Kuniyuki Iwashima wrote:
>> On Tue, Jul 22, 2025 at 10:50 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>
>>> On Tue, Jul 22, 2025 at 10:57:31AM +0200, Michal Koutný wrote:
>>>> Hello Daniel.
>>>>
>>>> On Tue, Jul 22, 2025 at 09:11:46AM +0200, Daniel Sedlak <daniel.sedlak@cdn77.com> wrote:
>>>>>    /sys/fs/cgroup/**/<cgroup name>/memory.net.socket_pressure
>>>>>
>>>>> The output value is an integer matching the internal semantics of the
>>>>> struct mem_cgroup for socket_pressure. It is a periodic re-arm clock,
>>>>> representing the end of the said socket memory pressure, and once the
>>>>> clock is re-armed it is set to jiffies + HZ.
>>>>
>>>> I don't find it ideal to expose this value in its raw form that is
>>>> rather an implementation detail.
>>>>
>>>> IIUC, the information is possibly valid only during one jiffy interval.
>>>> How would be the userspace consuming this?
>>>>
>>>> I'd consider exposing this as a cummulative counter in memory.stat for
>>>> simplicity (or possibly cummulative time spent in the pressure
>>>> condition).
>>>>
>>>> Shakeel, how useful is this vmpressure per-cgroup tracking nowadays? I
>>>> thought it's kind of legacy.
>>>
>>>
>>> Yes vmpressure is legacy and we should not expose raw underlying number
>>> to the userspace. How about just 0 or 1 and use
>>> mem_cgroup_under_socket_pressure() underlying? In future if we change
>>> the underlying implementation, the output of this interface should be
>>> consistent.
>>
>> But this is available only for 1 second, and it will not be useful
>> except for live debugging ?
> 
> 1 second is the current implementation and it can be more if the memcg
> remains in memory pressure.

In our production environment, when this pressure happens, the cgroup
typically stays under pressure for a few hours straight. So, in our
scenario, even a 1 or 0 would be helpful, since it does not oscillate.




* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-23  8:38             ` Michal Koutný
@ 2025-07-23  8:58               ` Daniel Sedlak
  2025-07-23 17:54                 ` Shakeel Butt
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-23  8:58 UTC (permalink / raw)
  To: Michal Koutný, Shakeel Butt
  Cc: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Neal Cardwell,
	David Ahern, Andrew Morton, Yosry Ahmed, linux-mm, netdev,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	cgroups, Matyas Hurtik

On 7/23/25 10:38 AM, Michal Koutný wrote:
> On Tue, Jul 22, 2025 at 01:11:05PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>> 1 second is the current implementation and it can be more if the memcg
>>>> remains in memory pressure. Regarding usefullness I think the periodic
>>>> stat collectors (like cadvisor or Google's internal borglet+rumbo) would
>>>> be interested in scraping this interface.
>>>
>>> I think the cumulative counter suggested above is better at least.
>>
>> It is tied to the underlying implementation. If we decide to use, for
>> example, PSI in future, what should this interface show?
> 
> Actually, if it was exposed as cummulative time under pressure (not
> cummulative events), that's quite similar to PSI.

I think the cumulative counter is overall better than just signaling 1
or 0, but it lacks timing information (if not scraped periodically). In
addition, the state may oscillate between under_pressure=true/false
rather quickly, and the cumulative counter would catch that.

To me, introducing a new PSI entry for sockets (like the ones for CPU,
IO, and memory) would be slightly better than a cumulative counter,
because PSI carries the timing information without frequent periodic
scrapes, so it may help with live debugging.

However, if we were to just add a new counter to memory.stat in each
cgroup, would that be easier to do?


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-23  8:58               ` Daniel Sedlak
@ 2025-07-23 17:54                 ` Shakeel Butt
  2025-07-24  8:43                   ` Daniel Sedlak
  0 siblings, 1 reply; 20+ messages in thread
From: Shakeel Butt @ 2025-07-23 17:54 UTC (permalink / raw)
  To: Daniel Sedlak
  Cc: Michal Koutný, Kuniyuki Iwashima, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Neal Cardwell, David Ahern, Andrew Morton,
	Yosry Ahmed, linux-mm, netdev, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, cgroups, Matyas Hurtik

On Wed, Jul 23, 2025 at 10:58:10AM +0200, Daniel Sedlak wrote:
> On 7/23/25 10:38 AM, Michal Koutný wrote:
> > On Tue, Jul 22, 2025 at 01:11:05PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > > > 1 second is the current implementation and it can be more if the memcg
> > > > > remains in memory pressure. Regarding usefullness I think the periodic
> > > > > stat collectors (like cadvisor or Google's internal borglet+rumbo) would
> > > > > be interested in scraping this interface.
> > > > 
> > > > I think the cumulative counter suggested above is better at least.
> > > 
> > > It is tied to the underlying implementation. If we decide to use, for
> > > example, PSI in future, what should this interface show?
> > 
> > Actually, if it was exposed as cummulative time under pressure (not
> > cummulative events), that's quite similar to PSI.
> 
> I think overall the cumulative counter is better than just signaling 1 or 0,
> but it lacks the time information (if not scraped periodically). In
> addition, it may oscillate between under_pressure=true/false rather quickly
> so the cumulative counter would catch this.

Yes cumulative counter would not miss small bursts.

> 
> To me, introducing the new PSI for sockets (like for CPU, IO, memory), would
> be slightly better than cumulative counter because PSI can have the timing
> information without frequent periodic scrapes. So it may help with live
> debugs.

How would this PSI for sockets work? What would be the entry and exit
points?

> 
> However, if we were to just add a new counter to the memory.stat in each
> cgroup, then it would be easier to do so?


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-23 17:54                 ` Shakeel Butt
@ 2025-07-24  8:43                   ` Daniel Sedlak
  2025-07-25  0:44                     ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-24  8:43 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Koutný, Kuniyuki Iwashima, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Neal Cardwell, David Ahern, Andrew Morton,
	Yosry Ahmed, linux-mm, netdev, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, cgroups, Matyas Hurtik

On 7/23/25 7:54 PM, Shakeel Butt wrote:
>>
>> To me, introducing the new PSI for sockets (like for CPU, IO, memory), would
>> be slightly better than cumulative counter because PSI can have the timing
>> information without frequent periodic scrapes. So it may help with live
>> debugs.
> 
> How would this PSI for sockets work? What would be the entry and exit
> points?
> 
Currently, we know the following information:

- we know when the pressure starts
- and we know when the pressure ends if not rearmed (start time + HZ)

From that, we should be able to calculate a triplet similar to the
pressure endpoints in the cgroup (cpu|io|memory|irq).pressure files.
That is, what percentage of time on average was spent under pressure as
avg10, avg60, and avg300, i.e. the average pressure over the past 10,
60, and 300 seconds, respectively (plus the total time spent under
pressure).

For example, if we had pressure for 5 seconds straight, then the output 
of socket.pressure could be:

	full avg10=50.00 avg60=8.33 avg300=1.66 total=77777

Do you think this would be feasible? If so, I can try to send it as v4.

Thanks!
Daniel


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-24  8:43                   ` Daniel Sedlak
@ 2025-07-25  0:44                     ` Tejun Heo
  2025-07-28 11:29                       ` Daniel Sedlak
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2025-07-25  0:44 UTC (permalink / raw)
  To: Daniel Sedlak
  Cc: Shakeel Butt, Michal Koutný, Kuniyuki Iwashima,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, David Ahern,
	Andrew Morton, Yosry Ahmed, linux-mm, netdev, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, cgroups, Matyas Hurtik

On Thu, Jul 24, 2025 at 10:43:27AM +0200, Daniel Sedlak wrote:
...
> Currently, we know the following information:
> 
> - we know when the pressure starts
> - and we know when the pressure ends if not rearmed (start time + HZ)
> 
> From that, we should be able to calculate a similar triplet to the pressure
> endpoints in the cgroups (cpu|io|memory|irq).pressure. That is, how much %
> of time on average was spent under pressure for avg10, avg60, avg300 i.e.
> average pressure over the past 10 seconds, 60 seconds, and 300 seconds,
> respectively. (+ total time spent under pressure)

Let's just add the cumulative duration that socket pressure was present.
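
A rough, untested sketch of one way to do that. The field and helper
below (socket_pressure_us, memcg_account_socket_pressure()) are
hypothetical, not existing kernel symbols; the idea is to account the
armed window at the place that currently (re)arms
memcg->socket_pressure (mm/vmpressure.c today), so overlapping re-arms
are not double counted:

static void memcg_account_socket_pressure(struct mem_cgroup *memcg)
{
        unsigned long now = jiffies;
        unsigned long end = READ_ONCE(memcg->socket_pressure);
        unsigned long new_end = now + HZ;
        unsigned long delta;

        /* Count only the part of the new window not accounted yet. */
        delta = time_before(now, end) ? new_end - end : HZ;
        atomic64_add(jiffies_to_usecs(delta), &memcg->socket_pressure_us);

        WRITE_ONCE(memcg->socket_pressure, new_end);
}

The accumulated value could then be printed, in microseconds, from
memory.stat or a dedicated file with a single seq_printf().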

Thanks.

-- 
tejun


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-25  0:44                     ` Tejun Heo
@ 2025-07-28 11:29                       ` Daniel Sedlak
  2025-07-30  0:15                         ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Sedlak @ 2025-07-28 11:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shakeel Butt, Michal Koutný, Kuniyuki Iwashima,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, David Ahern,
	Andrew Morton, Yosry Ahmed, linux-mm, netdev, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, cgroups, Matyas Hurtik

On 7/25/25 2:44 AM, Tejun Heo wrote:
> On Thu, Jul 24, 2025 at 10:43:27AM +0200, Daniel Sedlak wrote:
> ...
>> Currently, we know the following information:
>>
>> - we know when the pressure starts
>> - and we know when the pressure ends if not rearmed (start time + HZ)
>>
>>  From that, we should be able to calculate a similar triplet to the pressure
>> endpoints in the cgroups (cpu|io|memory|irq).pressure. That is, how much %
>> of time on average was spent under pressure for avg10, avg60, avg300 i.e.
>> average pressure over the past 10 seconds, 60 seconds, and 300 seconds,
>> respectively. (+ total time spent under pressure)
> 
> Let's just add the cumulative duration that socket pressure was present.
> 
> Thanks.
> 

OK, I will send it as v4 if there are no other objections. In which
units should the duration be? Milliseconds? It could also be
microseconds, which are currently used in the (cpu|io|memory|irq).pressure
files.

Thanks!
Daniel


* Re: [PATCH v3] memcg: expose socket memory pressure in a cgroup
  2025-07-28 11:29                       ` Daniel Sedlak
@ 2025-07-30  0:15                         ` Tejun Heo
  0 siblings, 0 replies; 20+ messages in thread
From: Tejun Heo @ 2025-07-30  0:15 UTC (permalink / raw)
  To: Daniel Sedlak
  Cc: Shakeel Butt, Michal Koutný, Kuniyuki Iwashima,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Neal Cardwell, David Ahern,
	Andrew Morton, Yosry Ahmed, linux-mm, netdev, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, cgroups, Matyas Hurtik

On Mon, Jul 28, 2025 at 01:29:29PM +0200, Daniel Sedlak wrote:
...
> Ok, I will send it as v4, if no other objections are made. In which units
> the duration should be? Milliseconds? It can also be microseconds, which are
> now used in (cpu|io|memory|irq).pressure files.

Yes, microseconds.

Thanks.

-- 
tejun

