* [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-29 20:53 ` Waiman Long
@ 2022-01-29 20:53 ` Waiman Long
-1 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-29 20:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini, Waiman Long
For *scnprintf(), vsnprintf() is always called even if the input size is
0. That is a waste of time, so just return 0 in this case.
Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
lib/vsprintf.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b8129dd374c..a65df546fb06 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
int i;
+ if (!size)
+ return 0;
+
i = vsnprintf(buf, size, fmt, args);
if (likely(i < size))
return i;
- if (size != 0)
- return size - 1;
- return 0;
+
+ return size - 1;
}
EXPORT_SYMBOL(vscnprintf);
--
2.27.0
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
@ 2022-01-29 20:53 ` Waiman Long
0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-29 20:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes
Cc: linux-kernel, cgroups, linux-mm, Ira Weiny, Rafael Aquini,
Waiman Long
For *scnprintf(), vsnprintf() is always called even if the input size is
0. That is a waste of time, so just return 0 in this case.
Signed-off-by: Waiman Long <longman@redhat.com>
---
lib/vsprintf.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b8129dd374c..a65df546fb06 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
int i;
+ if (!size)
+ return 0;
+
i = vsnprintf(buf, size, fmt, args);
if (likely(i < size))
return i;
- if (size != 0)
- return size - 1;
- return 0;
+
+ return size - 1;
}
EXPORT_SYMBOL(vscnprintf);
--
2.27.0
^ permalink raw reply related [flat|nested] 63+ messages in thread[parent not found: <20220129205315.478628-2-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-29 20:53 ` Waiman Long
@ 2022-01-30 20:49 ` David Rientjes
-1 siblings, 0 replies; 63+ messages in thread
From: David Rientjes @ 2022-01-30 20:49 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Sat, 29 Jan 2022, Waiman Long wrote:
> For *scnprintf(), vsnprintf() is always called even if the input size is
> 0. That is a waste of time, so just return 0 in this case.
>
> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> lib/vsprintf.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 3b8129dd374c..a65df546fb06 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
> {
> int i;
>
> + if (!size)
> + return 0;
Nit: any reason this shouldn't be unlikely()? If the conditional for
i < size is likely(), this seems assumed already?
> +
> i = vsnprintf(buf, size, fmt, args);
>
> if (likely(i < size))
> return i;
> - if (size != 0)
> - return size - 1;
> - return 0;
> +
> + return size - 1;
> }
> EXPORT_SYMBOL(vscnprintf);
>
> --
> 2.27.0
>
>
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
@ 2022-01-30 20:49 ` David Rientjes
0 siblings, 0 replies; 63+ messages in thread
From: David Rientjes @ 2022-01-30 20:49 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Sat, 29 Jan 2022, Waiman Long wrote:
> For *scnprintf(), vsnprintf() is always called even if the input size is
> 0. That is a waste of time, so just return 0 in this case.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> lib/vsprintf.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 3b8129dd374c..a65df546fb06 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
> {
> int i;
>
> + if (!size)
> + return 0;
Nit: any reason this shouldn't be unlikely()? If the conditional for
i < size is likely(), this seems assumed already?
> +
> i = vsnprintf(buf, size, fmt, args);
>
> if (likely(i < size))
> return i;
> - if (size != 0)
> - return size - 1;
> - return 0;
> +
> + return size - 1;
> }
> EXPORT_SYMBOL(vscnprintf);
>
> --
> 2.27.0
>
>
>
^ permalink raw reply [flat|nested] 63+ messages in thread[parent not found: <d99b3c4b-7b6e-529-6e4b-b91b65c92d81-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-30 20:49 ` David Rientjes
@ 2022-01-30 20:57 ` Waiman Long
-1 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-30 20:57 UTC (permalink / raw)
To: David Rientjes
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On 1/30/22 15:49, David Rientjes wrote:
> On Sat, 29 Jan 2022, Waiman Long wrote:
>
>> For *scnprintf(), vsnprintf() is always called even if the input size is
>> 0. That is a waste of time, so just return 0 in this case.
>>
>> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>> lib/vsprintf.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 3b8129dd374c..a65df546fb06 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
>> {
>> int i;
>>
>> + if (!size)
>> + return 0;
> Nit: any reason this shouldn't be unlikely()? If the conditional for
> i < size is likely(), this seems assumed already?
Good suggestion. Will make the change in the next version.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
@ 2022-01-30 20:57 ` Waiman Long
0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-30 20:57 UTC (permalink / raw)
To: David Rientjes
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On 1/30/22 15:49, David Rientjes wrote:
> On Sat, 29 Jan 2022, Waiman Long wrote:
>
>> For *scnprintf(), vsnprintf() is always called even if the input size is
>> 0. That is a waste of time, so just return 0 in this case.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> lib/vsprintf.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 3b8129dd374c..a65df546fb06 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
>> {
>> int i;
>>
>> + if (!size)
>> + return 0;
> Nit: any reason this shouldn't be unlikely()? If the conditional for
> i < size is likely(), this seems assumed already?
Good suggestion. Will make the change in the next version.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-30 20:49 ` David Rientjes
(?)
(?)
@ 2022-01-31 10:25 ` Andy Shevchenko
2022-01-31 10:30 ` Andy Shevchenko
-1 siblings, 1 reply; 63+ messages in thread
From: Andy Shevchenko @ 2022-01-31 10:25 UTC (permalink / raw)
To: David Rientjes
Cc: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
Andrew Morton, Petr Mladek, Steven Rostedt, Sergey Senozhatsky,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Sun, Jan 30, 2022 at 12:49:37PM -0800, David Rientjes wrote:
> On Sat, 29 Jan 2022, Waiman Long wrote:
>
> > For *scnprintf(), vsnprintf() is always called even if the input size is
> > 0. That is a waste of time, so just return 0 in this case.
Why do you think it's not legit?
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-31 10:25 ` Andy Shevchenko
@ 2022-01-31 10:30 ` Andy Shevchenko
[not found] ` <Yfe6SfG4CqzWSaMM-XvqNBM/wLWRrdx17CPfAsdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 63+ messages in thread
From: Andy Shevchenko @ 2022-01-31 10:30 UTC (permalink / raw)
To: David Rientjes
Cc: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
Andrew Morton, Petr Mladek, Steven Rostedt, Sergey Senozhatsky,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Mon, Jan 31, 2022 at 12:25:09PM +0200, Andy Shevchenko wrote:
> On Sun, Jan 30, 2022 at 12:49:37PM -0800, David Rientjes wrote:
> > On Sat, 29 Jan 2022, Waiman Long wrote:
> >
> > > For *scnprintf(), vsnprintf() is always called even if the input size is
> > > 0. That is a waste of time, so just return 0 in this case.
>
> Why do you think it's not legit?
I have to elaborate.
For *nprintf() the size=0 is quite useful to have.
For *cnprintf() the size=0 makes less sense, but, if we read `man snprintf()`:
The functions snprintf() and vsnprintf() do not write more than size bytes
(including the terminating null byte ('\0')). If the output was truncated due
to this limit, then the return value is the number of characters (excluding
the terminating null byte) which would have been written to the final string
if enough space had been available. Thus, a return value of size or more
means that the output was truncated. (See also below under NOTES.)
If an output error is encountered, a negative value is returned.
Note the last sentence there. You need to answer to it in the commit message
why your change is okay and it will show that you thought through all possible
scenarios.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-29 20:53 ` Waiman Long
@ 2022-01-31 2:53 ` Sergey Senozhatsky
-1 siblings, 0 replies; 63+ messages in thread
From: Sergey Senozhatsky @ 2022-01-31 2:53 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On (22/01/29 15:53), Waiman Long wrote:
> For *scnprintf(), vsnprintf() is always called even if the input size is
> 0. That is a waste of time, so just return 0 in this case.
>
> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> +++ b/lib/vsprintf.c
> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
> {
> int i;
>
> + if (!size)
> + return 0;
> +
> i = vsnprintf(buf, size, fmt, args);
>
> if (likely(i < size))
> return i;
> - if (size != 0)
> - return size - 1;
> - return 0;
> +
> + return size - 1;
> }
> EXPORT_SYMBOL(vscnprintf);
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
@ 2022-01-31 2:53 ` Sergey Senozhatsky
0 siblings, 0 replies; 63+ messages in thread
From: Sergey Senozhatsky @ 2022-01-31 2:53 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On (22/01/29 15:53), Waiman Long wrote:
> For *scnprintf(), vsnprintf() is always called even if the input size is
> 0. That is a waste of time, so just return 0 in this case.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> +++ b/lib/vsprintf.c
> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
> {
> int i;
>
> + if (!size)
> + return 0;
> +
> i = vsnprintf(buf, size, fmt, args);
>
> if (likely(i < size))
> return i;
> - if (size != 0)
> - return size - 1;
> - return 0;
> +
> + return size - 1;
> }
> EXPORT_SYMBOL(vscnprintf);
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
2022-01-29 20:53 ` Waiman Long
@ 2022-01-31 18:17 ` Roman Gushchin
-1 siblings, 0 replies; 63+ messages in thread
From: Roman Gushchin @ 2022-01-31 18:17 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Sat, Jan 29, 2022 at 03:53:13PM -0500, Waiman Long wrote:
> For *scnprintf(), vsnprintf() is always called even if the input size is
> 0. That is a waste of time, so just return 0 in this case.
>
> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> lib/vsprintf.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 3b8129dd374c..a65df546fb06 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
> {
> int i;
>
> + if (!size)
> + return 0;
> +
> i = vsnprintf(buf, size, fmt, args);
>
> if (likely(i < size))
> return i;
> - if (size != 0)
> - return size - 1;
> - return 0;
> +
> + return size - 1;
> }
> EXPORT_SYMBOL(vscnprintf);
Acked-by: Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>
Thanks!
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 1/3] lib/vsprintf: Avoid redundant work with 0 size
@ 2022-01-31 18:17 ` Roman Gushchin
0 siblings, 0 replies; 63+ messages in thread
From: Roman Gushchin @ 2022-01-31 18:17 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Sat, Jan 29, 2022 at 03:53:13PM -0500, Waiman Long wrote:
> For *scnprintf(), vsnprintf() is always called even if the input size is
> 0. That is a waste of time, so just return 0 in this case.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> lib/vsprintf.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 3b8129dd374c..a65df546fb06 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2895,13 +2895,15 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
> {
> int i;
>
> + if (!size)
> + return 0;
> +
> i = vsnprintf(buf, size, fmt, args);
>
> if (likely(i < size))
> return i;
> - if (size != 0)
> - return size - 1;
> - return 0;
> +
> + return size - 1;
> }
> EXPORT_SYMBOL(vscnprintf);
Acked-by: Roman Gushchin <guro@fb.com>
Thanks!
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v2 2/3] mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
2022-01-29 20:53 ` Waiman Long
@ 2022-01-29 20:53 ` Waiman Long
-1 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-29 20:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini, Waiman Long
The snprintf() function can return a length greater than the given
input size. That will require a check for buffer overrun after each
invocation of snprintf(). scnprintf(), on the other hand, will never
return a greater length. By using scnprintf() in selected places, we
can avoid some buffer overrun checks except after stack_depot_snprint()
and after the last snprintf().
Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
mm/page_owner.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 99e360df9465..28dac73e0542 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -338,19 +338,16 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
if (!kbuf)
return -ENOMEM;
- ret = snprintf(kbuf, count,
+ ret = scnprintf(kbuf, count,
"Page allocated via order %u, mask %#x(%pGg), pid %d, ts %llu ns, free_ts %llu ns\n",
page_owner->order, page_owner->gfp_mask,
&page_owner->gfp_mask, page_owner->pid,
page_owner->ts_nsec, page_owner->free_ts_nsec);
- if (ret >= count)
- goto err;
-
/* Print information relevant to grouping pages by mobility */
pageblock_mt = get_pageblock_migratetype(page);
page_mt = gfp_migratetype(page_owner->gfp_mask);
- ret += snprintf(kbuf + ret, count - ret,
+ ret += scnprintf(kbuf + ret, count - ret,
"PFN %lu type %s Block %lu type %s Flags %pGp\n",
pfn,
migratetype_names[page_mt],
@@ -358,19 +355,14 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
migratetype_names[pageblock_mt],
&page->flags);
- if (ret >= count)
- goto err;
-
ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
if (ret >= count)
goto err;
if (page_owner->last_migrate_reason != -1) {
- ret += snprintf(kbuf + ret, count - ret,
+ ret += scnprintf(kbuf + ret, count - ret,
"Page has been migrated, last migrate reason: %s\n",
migrate_reason_names[page_owner->last_migrate_reason]);
- if (ret >= count)
- goto err;
}
ret += snprintf(kbuf + ret, count - ret, "\n");
--
2.27.0
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v2 2/3] mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
@ 2022-01-29 20:53 ` Waiman Long
0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-29 20:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes
Cc: linux-kernel, cgroups, linux-mm, Ira Weiny, Rafael Aquini,
Waiman Long
The snprintf() function can return a length greater than the given
input size. That will require a check for buffer overrun after each
invocation of snprintf(). scnprintf(), on the other hand, will never
return a greater length. By using scnprintf() in selected places, we
can avoid some buffer overrun checks except after stack_depot_snprint()
and after the last snprintf().
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/page_owner.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 99e360df9465..28dac73e0542 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -338,19 +338,16 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
if (!kbuf)
return -ENOMEM;
- ret = snprintf(kbuf, count,
+ ret = scnprintf(kbuf, count,
"Page allocated via order %u, mask %#x(%pGg), pid %d, ts %llu ns, free_ts %llu ns\n",
page_owner->order, page_owner->gfp_mask,
&page_owner->gfp_mask, page_owner->pid,
page_owner->ts_nsec, page_owner->free_ts_nsec);
- if (ret >= count)
- goto err;
-
/* Print information relevant to grouping pages by mobility */
pageblock_mt = get_pageblock_migratetype(page);
page_mt = gfp_migratetype(page_owner->gfp_mask);
- ret += snprintf(kbuf + ret, count - ret,
+ ret += scnprintf(kbuf + ret, count - ret,
"PFN %lu type %s Block %lu type %s Flags %pGp\n",
pfn,
migratetype_names[page_mt],
@@ -358,19 +355,14 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
migratetype_names[pageblock_mt],
&page->flags);
- if (ret >= count)
- goto err;
-
ret += stack_depot_snprint(handle, kbuf + ret, count - ret, 0);
if (ret >= count)
goto err;
if (page_owner->last_migrate_reason != -1) {
- ret += snprintf(kbuf + ret, count - ret,
+ ret += scnprintf(kbuf + ret, count - ret,
"Page has been migrated, last migrate reason: %s\n",
migrate_reason_names[page_owner->last_migrate_reason]);
- if (ret >= count)
- goto err;
}
ret += snprintf(kbuf + ret, count - ret, "\n");
--
2.27.0
^ permalink raw reply related [flat|nested] 63+ messages in thread[parent not found: <20220129205315.478628-3-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 2/3] mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
2022-01-29 20:53 ` Waiman Long
@ 2022-01-31 2:58 ` Sergey Senozhatsky
-1 siblings, 0 replies; 63+ messages in thread
From: Sergey Senozhatsky @ 2022-01-31 2:58 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On (22/01/29 15:53), Waiman Long wrote:
> The snprintf() function can return a length greater than the given
> input size. That will require a check for buffer overrun after each
> invocation of snprintf(). scnprintf(), on the other hand, will never
> return a greater length. By using scnprintf() in selected places, we
> can avoid some buffer overrun checks except after stack_depot_snprint()
> and after the last snprintf().
>
> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 2/3] mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
@ 2022-01-31 2:58 ` Sergey Senozhatsky
0 siblings, 0 replies; 63+ messages in thread
From: Sergey Senozhatsky @ 2022-01-31 2:58 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On (22/01/29 15:53), Waiman Long wrote:
> The snprintf() function can return a length greater than the given
> input size. That will require a check for buffer overrun after each
> invocation of snprintf(). scnprintf(), on the other hand, will never
> return a greater length. By using scnprintf() in selected places, we
> can avoid some buffer overrun checks except after stack_depot_snprint()
> and after the last snprintf().
>
> Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-29 20:53 ` Waiman Long
@ 2022-01-29 20:53 ` Waiman Long
-1 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-29 20:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini, Waiman Long
It was found that a number of offlined memcgs were not freed because
they were pinned by some charged pages that were present. Even "echo
1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
offlined but not freed memcgs tend to increase in number over time with
the side effect that percpu memory consumption as shown in /proc/meminfo
also increases over time.
In order to find out more information about those pages that pin
offlined memcgs, the page_owner feature is extended to dump memory
cgroup information especially whether the cgroup is offlined or not.
Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 28dac73e0542..8dc5cd0fa227 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -10,6 +10,7 @@
#include <linux/migrate.h>
#include <linux/stackdepot.h>
#include <linux/seq_file.h>
+#include <linux/memcontrol.h>
#include <linux/sched/clock.h>
#include "internal.h"
@@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
depot_stack_handle_t handle)
{
int ret, pageblock_mt, page_mt;
+ unsigned long __maybe_unused memcg_data;
char *kbuf;
count = min_t(size_t, count, PAGE_SIZE);
@@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
migrate_reason_names[page_owner->last_migrate_reason]);
}
+#ifdef CONFIG_MEMCG
+ /*
+ * Look for memcg information and print it out
+ */
+ memcg_data = READ_ONCE(page->memcg_data);
+ if (memcg_data) {
+ struct mem_cgroup *memcg = page_memcg_check(page);
+ bool onlined;
+ char name[80];
+
+ if (memcg_data & MEMCG_DATA_OBJCGS)
+ ret += scnprintf(kbuf + ret, count - ret,
+ "Slab cache page\n");
+
+ if (!memcg)
+ goto copy_out;
+
+ onlined = (memcg->css.flags & CSS_ONLINE);
+ cgroup_name(memcg->css.cgroup, name, sizeof(name));
+ ret += scnprintf(kbuf + ret, count - ret,
+ "Charged %sto %smemcg %s\n",
+ PageMemcgKmem(page) ? "(via objcg) " : "",
+ onlined ? "" : "offlined ",
+ name);
+ }
+
+copy_out:
+#endif
+
ret += snprintf(kbuf + ret, count - ret, "\n");
if (ret >= count)
goto err;
--
2.27.0
^ permalink raw reply related [flat|nested] 63+ messages in thread* [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-29 20:53 ` Waiman Long
0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-29 20:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes
Cc: linux-kernel, cgroups, linux-mm, Ira Weiny, Rafael Aquini,
Waiman Long
It was found that a number of offlined memcgs were not freed because
they were pinned by some charged pages that were present. Even "echo
1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
offlined but not freed memcgs tend to increase in number over time with
the side effect that percpu memory consumption as shown in /proc/meminfo
also increases over time.
In order to find out more information about those pages that pin
offlined memcgs, the page_owner feature is extended to dump memory
cgroup information especially whether the cgroup is offlined or not.
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 28dac73e0542..8dc5cd0fa227 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -10,6 +10,7 @@
#include <linux/migrate.h>
#include <linux/stackdepot.h>
#include <linux/seq_file.h>
+#include <linux/memcontrol.h>
#include <linux/sched/clock.h>
#include "internal.h"
@@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
depot_stack_handle_t handle)
{
int ret, pageblock_mt, page_mt;
+ unsigned long __maybe_unused memcg_data;
char *kbuf;
count = min_t(size_t, count, PAGE_SIZE);
@@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
migrate_reason_names[page_owner->last_migrate_reason]);
}
+#ifdef CONFIG_MEMCG
+ /*
+ * Look for memcg information and print it out
+ */
+ memcg_data = READ_ONCE(page->memcg_data);
+ if (memcg_data) {
+ struct mem_cgroup *memcg = page_memcg_check(page);
+ bool onlined;
+ char name[80];
+
+ if (memcg_data & MEMCG_DATA_OBJCGS)
+ ret += scnprintf(kbuf + ret, count - ret,
+ "Slab cache page\n");
+
+ if (!memcg)
+ goto copy_out;
+
+ onlined = (memcg->css.flags & CSS_ONLINE);
+ cgroup_name(memcg->css.cgroup, name, sizeof(name));
+ ret += scnprintf(kbuf + ret, count - ret,
+ "Charged %sto %smemcg %s\n",
+ PageMemcgKmem(page) ? "(via objcg) " : "",
+ onlined ? "" : "offlined ",
+ name);
+ }
+
+copy_out:
+#endif
+
ret += snprintf(kbuf + ret, count - ret, "\n");
if (ret >= count)
goto err;
--
2.27.0
^ permalink raw reply related [flat|nested] 63+ messages in thread[parent not found: <20220129205315.478628-4-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-29 20:53 ` Waiman Long
@ 2022-01-30 6:33 ` Mike Rapoport
-1 siblings, 0 replies; 63+ messages in thread
From: Mike Rapoport @ 2022-01-30 6:33 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Sat, Jan 29, 2022 at 03:53:15PM -0500, Waiman Long wrote:
> It was found that a number of offlined memcgs were not freed because
> they were pinned by some charged pages that were present. Even "echo
> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> offlined but not freed memcgs tend to increase in number over time with
> the side effect that percpu memory consumption as shown in /proc/meminfo
> also increases over time.
>
> In order to find out more information about those pages that pin
> offlined memcgs, the page_owner feature is extended to dump memory
> cgroup information especially whether the cgroup is offlined or not.
>
> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 28dac73e0542..8dc5cd0fa227 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -10,6 +10,7 @@
> #include <linux/migrate.h>
> #include <linux/stackdepot.h>
> #include <linux/seq_file.h>
> +#include <linux/memcontrol.h>
> #include <linux/sched/clock.h>
>
> #include "internal.h"
> @@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> depot_stack_handle_t handle)
> {
> int ret, pageblock_mt, page_mt;
> + unsigned long __maybe_unused memcg_data;
> char *kbuf;
>
> count = min_t(size_t, count, PAGE_SIZE);
> @@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> migrate_reason_names[page_owner->last_migrate_reason]);
> }
>
> +#ifdef CONFIG_MEMCG
Can we put all this along with the declaration of memcg_data in a helper
function please?
> + /*
> + * Look for memcg information and print it out
> + */
> + memcg_data = READ_ONCE(page->memcg_data);
> + if (memcg_data) {
> + struct mem_cgroup *memcg = page_memcg_check(page);
> + bool onlined;
> + char name[80];
> +
> + if (memcg_data & MEMCG_DATA_OBJCGS)
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Slab cache page\n");
> +
> + if (!memcg)
> + goto copy_out;
> +
> + onlined = (memcg->css.flags & CSS_ONLINE);
> + cgroup_name(memcg->css.cgroup, name, sizeof(name));
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Charged %sto %smemcg %s\n",
> + PageMemcgKmem(page) ? "(via objcg) " : "",
> + onlined ? "" : "offlined ",
> + name);
> + }
> +
> +copy_out:
> +#endif
> +
> ret += snprintf(kbuf + ret, count - ret, "\n");
> if (ret >= count)
> goto err;
> --
> 2.27.0
>
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-30 6:33 ` Mike Rapoport
0 siblings, 0 replies; 63+ messages in thread
From: Mike Rapoport @ 2022-01-30 6:33 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Sat, Jan 29, 2022 at 03:53:15PM -0500, Waiman Long wrote:
> It was found that a number of offlined memcgs were not freed because
> they were pinned by some charged pages that were present. Even "echo
> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> offlined but not freed memcgs tend to increase in number over time with
> the side effect that percpu memory consumption as shown in /proc/meminfo
> also increases over time.
>
> In order to find out more information about those pages that pin
> offlined memcgs, the page_owner feature is extended to dump memory
> cgroup information especially whether the cgroup is offlined or not.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 28dac73e0542..8dc5cd0fa227 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -10,6 +10,7 @@
> #include <linux/migrate.h>
> #include <linux/stackdepot.h>
> #include <linux/seq_file.h>
> +#include <linux/memcontrol.h>
> #include <linux/sched/clock.h>
>
> #include "internal.h"
> @@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> depot_stack_handle_t handle)
> {
> int ret, pageblock_mt, page_mt;
> + unsigned long __maybe_unused memcg_data;
> char *kbuf;
>
> count = min_t(size_t, count, PAGE_SIZE);
> @@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> migrate_reason_names[page_owner->last_migrate_reason]);
> }
>
> +#ifdef CONFIG_MEMCG
Can we put all this along with the declaration of memcg_data in a helper
function please?
> + /*
> + * Look for memcg information and print it out
> + */
> + memcg_data = READ_ONCE(page->memcg_data);
> + if (memcg_data) {
> + struct mem_cgroup *memcg = page_memcg_check(page);
> + bool onlined;
> + char name[80];
> +
> + if (memcg_data & MEMCG_DATA_OBJCGS)
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Slab cache page\n");
> +
> + if (!memcg)
> + goto copy_out;
> +
> + onlined = (memcg->css.flags & CSS_ONLINE);
> + cgroup_name(memcg->css.cgroup, name, sizeof(name));
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Charged %sto %smemcg %s\n",
> + PageMemcgKmem(page) ? "(via objcg) " : "",
> + onlined ? "" : "offlined ",
> + name);
> + }
> +
> +copy_out:
> +#endif
> +
> ret += snprintf(kbuf + ret, count - ret, "\n");
> if (ret >= count)
> goto err;
> --
> 2.27.0
>
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-30 6:33 ` Mike Rapoport
(?)
@ 2022-01-30 18:22 ` Waiman Long
[not found] ` <82c99093-e44b-7fac-14ab-9e8392d107ea-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
-1 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2022-01-30 18:22 UTC (permalink / raw)
To: Mike Rapoport
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On 1/30/22 01:33, Mike Rapoport wrote:
> On Sat, Jan 29, 2022 at 03:53:15PM -0500, Waiman Long wrote:
>> It was found that a number of offlined memcgs were not freed because
>> they were pinned by some charged pages that were present. Even "echo
>> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
>> offlined but not freed memcgs tend to increase in number over time with
>> the side effect that percpu memory consumption as shown in /proc/meminfo
>> also increases over time.
>>
>> In order to find out more information about those pages that pin
>> offlined memcgs, the page_owner feature is extended to dump memory
>> cgroup information especially whether the cgroup is offlined or not.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
>> 1 file changed, 31 insertions(+)
>>
>> diff --git a/mm/page_owner.c b/mm/page_owner.c
>> index 28dac73e0542..8dc5cd0fa227 100644
>> --- a/mm/page_owner.c
>> +++ b/mm/page_owner.c
>> @@ -10,6 +10,7 @@
>> #include <linux/migrate.h>
>> #include <linux/stackdepot.h>
>> #include <linux/seq_file.h>
>> +#include <linux/memcontrol.h>
>> #include <linux/sched/clock.h>
>>
>> #include "internal.h"
>> @@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
>> depot_stack_handle_t handle)
>> {
>> int ret, pageblock_mt, page_mt;
>> + unsigned long __maybe_unused memcg_data;
>> char *kbuf;
>>
>> count = min_t(size_t, count, PAGE_SIZE);
>> @@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
>> migrate_reason_names[page_owner->last_migrate_reason]);
>> }
>>
>> +#ifdef CONFIG_MEMCG
> Can we put all this along with the declaration of memcg_data in a helper
> function please?
>
Sure. Will post another version with that change.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-29 20:53 ` Waiman Long
@ 2022-01-31 9:38 ` Michal Hocko
-1 siblings, 0 replies; 63+ messages in thread
From: Michal Hocko @ 2022-01-31 9:38 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Sat 29-01-22 15:53:15, Waiman Long wrote:
> It was found that a number of offlined memcgs were not freed because
> they were pinned by some charged pages that were present. Even "echo
> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> offlined but not freed memcgs tend to increase in number over time with
> the side effect that percpu memory consumption as shown in /proc/meminfo
> also increases over time.
>
> In order to find out more information about those pages that pin
> offlined memcgs, the page_owner feature is extended to dump memory
> cgroup information especially whether the cgroup is offlined or not.
It is not really clear to me how this is supposed to be used. Are you
really dumping all the pages in the system to find out offline memcgs?
That looks rather clumsy to me. I am not against adding memcg
information to the page owner output. That can be useful in other
contexts.
> Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 28dac73e0542..8dc5cd0fa227 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -10,6 +10,7 @@
> #include <linux/migrate.h>
> #include <linux/stackdepot.h>
> #include <linux/seq_file.h>
> +#include <linux/memcontrol.h>
> #include <linux/sched/clock.h>
>
> #include "internal.h"
> @@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> depot_stack_handle_t handle)
> {
> int ret, pageblock_mt, page_mt;
> + unsigned long __maybe_unused memcg_data;
> char *kbuf;
>
> count = min_t(size_t, count, PAGE_SIZE);
> @@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> migrate_reason_names[page_owner->last_migrate_reason]);
> }
>
> +#ifdef CONFIG_MEMCG
This really begs to be in a dedicated function. page_owner_print_memcg
or something like that.
> + /*
> + * Look for memcg information and print it out
> + */
> + memcg_data = READ_ONCE(page->memcg_data);
> + if (memcg_data) {
> + struct mem_cgroup *memcg = page_memcg_check(page);
> + bool onlined;
> + char name[80];
What does prevent memcg to go away and being reused for a different
purpose?
> +
> + if (memcg_data & MEMCG_DATA_OBJCGS)
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Slab cache page\n");
> +
> + if (!memcg)
> + goto copy_out;
> +
> + onlined = (memcg->css.flags & CSS_ONLINE);
> + cgroup_name(memcg->css.cgroup, name, sizeof(name));
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Charged %sto %smemcg %s\n",
> + PageMemcgKmem(page) ? "(via objcg) " : "",
> + onlined ? "" : "offlined ",
> + name);
> + }
> +
> +copy_out:
> +#endif
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-31 9:38 ` Michal Hocko
0 siblings, 0 replies; 63+ messages in thread
From: Michal Hocko @ 2022-01-31 9:38 UTC (permalink / raw)
To: Waiman Long
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Sat 29-01-22 15:53:15, Waiman Long wrote:
> It was found that a number of offlined memcgs were not freed because
> they were pinned by some charged pages that were present. Even "echo
> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> offlined but not freed memcgs tend to increase in number over time with
> the side effect that percpu memory consumption as shown in /proc/meminfo
> also increases over time.
>
> In order to find out more information about those pages that pin
> offlined memcgs, the page_owner feature is extended to dump memory
> cgroup information especially whether the cgroup is offlined or not.
It is not really clear to me how this is supposed to be used. Are you
really dumping all the pages in the system to find out offline memcgs?
That looks rather clumsy to me. I am not against adding memcg
information to the page owner output. That can be useful in other
contexts.
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> mm/page_owner.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 28dac73e0542..8dc5cd0fa227 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -10,6 +10,7 @@
> #include <linux/migrate.h>
> #include <linux/stackdepot.h>
> #include <linux/seq_file.h>
> +#include <linux/memcontrol.h>
> #include <linux/sched/clock.h>
>
> #include "internal.h"
> @@ -331,6 +332,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> depot_stack_handle_t handle)
> {
> int ret, pageblock_mt, page_mt;
> + unsigned long __maybe_unused memcg_data;
> char *kbuf;
>
> count = min_t(size_t, count, PAGE_SIZE);
> @@ -365,6 +367,35 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> migrate_reason_names[page_owner->last_migrate_reason]);
> }
>
> +#ifdef CONFIG_MEMCG
This really begs to be in a dedicated function. page_owner_print_memcg
or something like that.
> + /*
> + * Look for memcg information and print it out
> + */
> + memcg_data = READ_ONCE(page->memcg_data);
> + if (memcg_data) {
> + struct mem_cgroup *memcg = page_memcg_check(page);
> + bool onlined;
> + char name[80];
What does prevent memcg to go away and being reused for a different
purpose?
> +
> + if (memcg_data & MEMCG_DATA_OBJCGS)
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Slab cache page\n");
> +
> + if (!memcg)
> + goto copy_out;
> +
> + onlined = (memcg->css.flags & CSS_ONLINE);
> + cgroup_name(memcg->css.cgroup, name, sizeof(name));
> + ret += scnprintf(kbuf + ret, count - ret,
> + "Charged %sto %smemcg %s\n",
> + PageMemcgKmem(page) ? "(via objcg) " : "",
> + onlined ? "" : "offlined ",
> + name);
> + }
> +
> +copy_out:
> +#endif
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 63+ messages in thread[parent not found: <YfeuK5j7cbgM+Oo+-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-31 9:38 ` Michal Hocko
@ 2022-01-31 16:53 ` Johannes Weiner
-1 siblings, 0 replies; 63+ messages in thread
From: Johannes Weiner @ 2022-01-31 16:53 UTC (permalink / raw)
To: Michal Hocko
Cc: Waiman Long, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
> On Sat 29-01-22 15:53:15, Waiman Long wrote:
> > It was found that a number of offlined memcgs were not freed because
> > they were pinned by some charged pages that were present. Even "echo
> > 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> > offlined but not freed memcgs tend to increase in number over time with
> > the side effect that percpu memory consumption as shown in /proc/meminfo
> > also increases over time.
> >
> > In order to find out more information about those pages that pin
> > offlined memcgs, the page_owner feature is extended to dump memory
> > cgroup information especially whether the cgroup is offlined or not.
>
> It is not really clear to me how this is supposed to be used. Are you
> really dumping all the pages in the system to find out offline memcgs?
> That looks rather clumsy to me. I am not against adding memcg
> information to the page owner output. That can be useful in other
> contexts.
We've sometimes done exactly that in production, but with drgn
scripts. It's not very common, so it doesn't need to be very efficient
either. Typically, we'd encounter a host with an unusual number of
dying cgroups, ssh in and poke around with drgn to figure out what
kind of objects are still pinning the cgroups in question.
This patch would make that process a little easier, I suppose.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-31 16:53 ` Johannes Weiner
0 siblings, 0 replies; 63+ messages in thread
From: Johannes Weiner @ 2022-01-31 16:53 UTC (permalink / raw)
To: Michal Hocko
Cc: Waiman Long, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
> On Sat 29-01-22 15:53:15, Waiman Long wrote:
> > It was found that a number of offlined memcgs were not freed because
> > they were pinned by some charged pages that were present. Even "echo
> > 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> > offlined but not freed memcgs tend to increase in number over time with
> > the side effect that percpu memory consumption as shown in /proc/meminfo
> > also increases over time.
> >
> > In order to find out more information about those pages that pin
> > offlined memcgs, the page_owner feature is extended to dump memory
> > cgroup information especially whether the cgroup is offlined or not.
>
> It is not really clear to me how this is supposed to be used. Are you
> really dumping all the pages in the system to find out offline memcgs?
> That looks rather clumsy to me. I am not against adding memcg
> information to the page owner output. That can be useful in other
> contexts.
We've sometimes done exactly that in production, but with drgn
scripts. It's not very common, so it doesn't need to be very efficient
either. Typically, we'd encounter a host with an unusual number of
dying cgroups, ssh in and poke around with drgn to figure out what
kind of objects are still pinning the cgroups in question.
This patch would make that process a little easier, I suppose.
^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <YfgT/9tEREQNiiAN-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-31 16:53 ` Johannes Weiner
@ 2022-01-31 18:15 ` Roman Gushchin
-1 siblings, 0 replies; 63+ messages in thread
From: Roman Gushchin @ 2022-01-31 18:15 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Waiman Long, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
> On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
> > On Sat 29-01-22 15:53:15, Waiman Long wrote:
> > > It was found that a number of offlined memcgs were not freed because
> > > they were pinned by some charged pages that were present. Even "echo
> > > 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> > > offlined but not freed memcgs tend to increase in number over time with
> > > the side effect that percpu memory consumption as shown in /proc/meminfo
> > > also increases over time.
> > >
> > > In order to find out more information about those pages that pin
> > > offlined memcgs, the page_owner feature is extended to dump memory
> > > cgroup information especially whether the cgroup is offlined or not.
> >
> > It is not really clear to me how this is supposed to be used. Are you
> > really dumping all the pages in the system to find out offline memcgs?
> > That looks rather clumsy to me. I am not against adding memcg
> > information to the page owner output. That can be useful in other
> > contexts.
>
> We've sometimes done exactly that in production, but with drgn
> scripts. It's not very common, so it doesn't need to be very efficient
> either. Typically, we'd encounter a host with an unusual number of
> dying cgroups, ssh in and poke around with drgn to figure out what
> kind of objects are still pinning the cgroups in question.
>
> This patch would make that process a little easier, I suppose.
Right. Over last few years I've spent enormous amount of time digging into
various aspects of this problem and in my experience the combination of drgn
for the inspection of the current state and bpf for following various decisions
on the reclaim path was the most useful combination.
I really appreciate an effort to put useful tools to track memcg references
into the kernel tree, however the page_owner infra has a limited usefulness
as it has to be enabled on the boot. But because it doesn't add any overhead,
I also don't think there any reasons to not add it.
Thanks!
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-31 18:15 ` Roman Gushchin
0 siblings, 0 replies; 63+ messages in thread
From: Roman Gushchin @ 2022-01-31 18:15 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Waiman Long, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
> On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
> > On Sat 29-01-22 15:53:15, Waiman Long wrote:
> > > It was found that a number of offlined memcgs were not freed because
> > > they were pinned by some charged pages that were present. Even "echo
> > > 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> > > offlined but not freed memcgs tend to increase in number over time with
> > > the side effect that percpu memory consumption as shown in /proc/meminfo
> > > also increases over time.
> > >
> > > In order to find out more information about those pages that pin
> > > offlined memcgs, the page_owner feature is extended to dump memory
> > > cgroup information especially whether the cgroup is offlined or not.
> >
> > It is not really clear to me how this is supposed to be used. Are you
> > really dumping all the pages in the system to find out offline memcgs?
> > That looks rather clumsy to me. I am not against adding memcg
> > information to the page owner output. That can be useful in other
> > contexts.
>
> We've sometimes done exactly that in production, but with drgn
> scripts. It's not very common, so it doesn't need to be very efficient
> either. Typically, we'd encounter a host with an unusual number of
> dying cgroups, ssh in and poke around with drgn to figure out what
> kind of objects are still pinning the cgroups in question.
>
> This patch would make that process a little easier, I suppose.
Right. Over last few years I've spent enormous amount of time digging into
various aspects of this problem and in my experience the combination of drgn
for the inspection of the current state and bpf for following various decisions
on the reclaim path was the most useful combination.
I really appreciate an effort to put useful tools to track memcg references
into the kernel tree, however the page_owner infra has a limited usefulness
as it has to be enabled on the boot. But because it doesn't add any overhead,
I also don't think there any reasons to not add it.
Thanks!
^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <YfgnUZQBRkqhrEIb-cx5fftMpWqeCjSd+JxjunQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>]
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-31 18:15 ` Roman Gushchin
@ 2022-01-31 18:25 ` Michal Hocko
-1 siblings, 0 replies; 63+ messages in thread
From: Michal Hocko @ 2022-01-31 18:25 UTC (permalink / raw)
To: Roman Gushchin
Cc: Johannes Weiner, Waiman Long, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Mon 31-01-22 10:15:45, Roman Gushchin wrote:
> On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
> > On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
> > > On Sat 29-01-22 15:53:15, Waiman Long wrote:
> > > > It was found that a number of offlined memcgs were not freed because
> > > > they were pinned by some charged pages that were present. Even "echo
> > > > 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> > > > offlined but not freed memcgs tend to increase in number over time with
> > > > the side effect that percpu memory consumption as shown in /proc/meminfo
> > > > also increases over time.
> > > >
> > > > In order to find out more information about those pages that pin
> > > > offlined memcgs, the page_owner feature is extended to dump memory
> > > > cgroup information especially whether the cgroup is offlined or not.
> > >
> > > It is not really clear to me how this is supposed to be used. Are you
> > > really dumping all the pages in the system to find out offline memcgs?
> > > That looks rather clumsy to me. I am not against adding memcg
> > > information to the page owner output. That can be useful in other
> > > contexts.
> >
> > We've sometimes done exactly that in production, but with drgn
> > scripts. It's not very common, so it doesn't need to be very efficient
> > either. Typically, we'd encounter a host with an unusual number of
> > dying cgroups, ssh in and poke around with drgn to figure out what
> > kind of objects are still pinning the cgroups in question.
> >
> > This patch would make that process a little easier, I suppose.
>
> Right. Over last few years I've spent enormous amount of time digging into
> various aspects of this problem and in my experience the combination of drgn
> for the inspection of the current state and bpf for following various decisions
> on the reclaim path was the most useful combination.
>
> I really appreciate an effort to put useful tools to track memcg references
> into the kernel tree, however the page_owner infra has a limited usefulness
> as it has to be enabled on the boot. But because it doesn't add any overhead,
> I also don't think there any reasons to not add it.
Would it be feasible to add a debugfs interface to displa dead memcg
information?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-31 18:25 ` Michal Hocko
0 siblings, 0 replies; 63+ messages in thread
From: Michal Hocko @ 2022-01-31 18:25 UTC (permalink / raw)
To: Roman Gushchin
Cc: Johannes Weiner, Waiman Long, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Mon 31-01-22 10:15:45, Roman Gushchin wrote:
> On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
> > On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
> > > On Sat 29-01-22 15:53:15, Waiman Long wrote:
> > > > It was found that a number of offlined memcgs were not freed because
> > > > they were pinned by some charged pages that were present. Even "echo
> > > > 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
> > > > offlined but not freed memcgs tend to increase in number over time with
> > > > the side effect that percpu memory consumption as shown in /proc/meminfo
> > > > also increases over time.
> > > >
> > > > In order to find out more information about those pages that pin
> > > > offlined memcgs, the page_owner feature is extended to dump memory
> > > > cgroup information especially whether the cgroup is offlined or not.
> > >
> > > It is not really clear to me how this is supposed to be used. Are you
> > > really dumping all the pages in the system to find out offline memcgs?
> > > That looks rather clumsy to me. I am not against adding memcg
> > > information to the page owner output. That can be useful in other
> > > contexts.
> >
> > We've sometimes done exactly that in production, but with drgn
> > scripts. It's not very common, so it doesn't need to be very efficient
> > either. Typically, we'd encounter a host with an unusual number of
> > dying cgroups, ssh in and poke around with drgn to figure out what
> > kind of objects are still pinning the cgroups in question.
> >
> > This patch would make that process a little easier, I suppose.
>
> Right. Over last few years I've spent enormous amount of time digging into
> various aspects of this problem and in my experience the combination of drgn
> for the inspection of the current state and bpf for following various decisions
> on the reclaim path was the most useful combination.
>
> I really appreciate an effort to put useful tools to track memcg references
> into the kernel tree, however the page_owner infra has a limited usefulness
> as it has to be enabled on the boot. But because it doesn't add any overhead,
> I also don't think there any reasons to not add it.
Would it be feasible to add a debugfs interface to displa dead memcg
information?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <Yfgpknwr1tMnPkqh-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-31 18:25 ` Michal Hocko
@ 2022-01-31 18:38 ` Waiman Long
-1 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-31 18:38 UTC (permalink / raw)
To: Michal Hocko, Roman Gushchin
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On 1/31/22 13:25, Michal Hocko wrote:
> On Mon 31-01-22 10:15:45, Roman Gushchin wrote:
>> On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
>>> On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
>>>> On Sat 29-01-22 15:53:15, Waiman Long wrote:
>>>>> It was found that a number of offlined memcgs were not freed because
>>>>> they were pinned by some charged pages that were present. Even "echo
>>>>> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
>>>>> offlined but not freed memcgs tend to increase in number over time with
>>>>> the side effect that percpu memory consumption as shown in /proc/meminfo
>>>>> also increases over time.
>>>>>
>>>>> In order to find out more information about those pages that pin
>>>>> offlined memcgs, the page_owner feature is extended to dump memory
>>>>> cgroup information especially whether the cgroup is offlined or not.
>>>> It is not really clear to me how this is supposed to be used. Are you
>>>> really dumping all the pages in the system to find out offline memcgs?
>>>> That looks rather clumsy to me. I am not against adding memcg
>>>> information to the page owner output. That can be useful in other
>>>> contexts.
>>> We've sometimes done exactly that in production, but with drgn
>>> scripts. It's not very common, so it doesn't need to be very efficient
>>> either. Typically, we'd encounter a host with an unusual number of
>>> dying cgroups, ssh in and poke around with drgn to figure out what
>>> kind of objects are still pinning the cgroups in question.
>>>
>>> This patch would make that process a little easier, I suppose.
>> Right. Over last few years I've spent enormous amount of time digging into
>> various aspects of this problem and in my experience the combination of drgn
>> for the inspection of the current state and bpf for following various decisions
>> on the reclaim path was the most useful combination.
>>
>> I really appreciate an effort to put useful tools to track memcg references
>> into the kernel tree, however the page_owner infra has a limited usefulness
>> as it has to be enabled on the boot. But because it doesn't add any overhead,
>> I also don't think there any reasons to not add it.
> Would it be feasible to add a debugfs interface to displa dead memcg
> information?
Originally, I added some debug code to keep track of the list of memcg
that has been offlined but not yet freed. After some more testing, I
figured out that the memcg's were not freed because they were pinned by
references in the page structs. At this point, I realize the using the
existing page owner debugging tool will be useful to track this kind of
problem since it already have all the infrastructure to list where the
pages were allocated as well as various field in the page structures.
Of course, it is also possible to have a debugfs interface to list those
dead memcg information, displaying more information about the page that
pins the memcg will be hard without using the page owner tool. Keeping
track of the list of dead memcg's may also have some runtime overhead.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-31 18:38 ` Waiman Long
0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-31 18:38 UTC (permalink / raw)
To: Michal Hocko, Roman Gushchin
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On 1/31/22 13:25, Michal Hocko wrote:
> On Mon 31-01-22 10:15:45, Roman Gushchin wrote:
>> On Mon, Jan 31, 2022 at 11:53:19AM -0500, Johannes Weiner wrote:
>>> On Mon, Jan 31, 2022 at 10:38:51AM +0100, Michal Hocko wrote:
>>>> On Sat 29-01-22 15:53:15, Waiman Long wrote:
>>>>> It was found that a number of offlined memcgs were not freed because
>>>>> they were pinned by some charged pages that were present. Even "echo
>>>>> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
>>>>> offlined but not freed memcgs tend to increase in number over time with
>>>>> the side effect that percpu memory consumption as shown in /proc/meminfo
>>>>> also increases over time.
>>>>>
>>>>> In order to find out more information about those pages that pin
>>>>> offlined memcgs, the page_owner feature is extended to dump memory
>>>>> cgroup information especially whether the cgroup is offlined or not.
>>>> It is not really clear to me how this is supposed to be used. Are you
>>>> really dumping all the pages in the system to find out offline memcgs?
>>>> That looks rather clumsy to me. I am not against adding memcg
>>>> information to the page owner output. That can be useful in other
>>>> contexts.
>>> We've sometimes done exactly that in production, but with drgn
>>> scripts. It's not very common, so it doesn't need to be very efficient
>>> either. Typically, we'd encounter a host with an unusual number of
>>> dying cgroups, ssh in and poke around with drgn to figure out what
>>> kind of objects are still pinning the cgroups in question.
>>>
>>> This patch would make that process a little easier, I suppose.
>> Right. Over last few years I've spent enormous amount of time digging into
>> various aspects of this problem and in my experience the combination of drgn
>> for the inspection of the current state and bpf for following various decisions
>> on the reclaim path was the most useful combination.
>>
>> I really appreciate an effort to put useful tools to track memcg references
>> into the kernel tree, however the page_owner infra has a limited usefulness
>> as it has to be enabled on the boot. But because it doesn't add any overhead,
>> I also don't think there any reasons to not add it.
> Would it be feasible to add a debugfs interface to displa dead memcg
> information?
Originally, I added some debug code to keep track of the list of memcg
that has been offlined but not yet freed. After some more testing, I
figured out that the memcg's were not freed because they were pinned by
references in the page structs. At this point, I realize the using the
existing page owner debugging tool will be useful to track this kind of
problem since it already have all the infrastructure to list where the
pages were allocated as well as various field in the page structures.
Of course, it is also possible to have a debugfs interface to list those
dead memcg information, displaying more information about the page that
pins the memcg will be hard without using the page owner tool. Keeping
track of the list of dead memcg's may also have some runtime overhead.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread
[parent not found: <12686956-612d-d89b-5641-470d5e913090-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-31 18:38 ` Waiman Long
@ 2022-02-01 10:49 ` Michal Hocko
-1 siblings, 0 replies; 63+ messages in thread
From: Michal Hocko @ 2022-02-01 10:49 UTC (permalink / raw)
To: Waiman Long
Cc: Roman Gushchin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On Mon 31-01-22 13:38:28, Waiman Long wrote:
[...]
> Of course, it is also possible to have a debugfs interface to list those
> dead memcg information, displaying more information about the page that pins
> the memcg will be hard without using the page owner tool.
Yes, you will need page owner or hook into the kernel by other means
(like already mentioned drgn). The question is whether scanning all
existing pages to get that information is the best we can offer.
> Keeping track of
> the list of dead memcg's may also have some runtime overhead.
Could you be more specific? Offlined memcgs are still part of the
hierarchy IIRC. So it shouldn't be much more than iterating the whole
cgroup tree and collect interesting data about dead cgroups.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-02-01 10:49 ` Michal Hocko
0 siblings, 0 replies; 63+ messages in thread
From: Michal Hocko @ 2022-02-01 10:49 UTC (permalink / raw)
To: Waiman Long
Cc: Roman Gushchin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On Mon 31-01-22 13:38:28, Waiman Long wrote:
[...]
> Of course, it is also possible to have a debugfs interface to list those
> dead memcg information, displaying more information about the page that pins
> the memcg will be hard without using the page owner tool.
Yes, you will need page owner or hook into the kernel by other means
(like already mentioned drgn). The question is whether scanning all
existing pages to get that information is the best we can offer.
> Keeping track of
> the list of dead memcg's may also have some runtime overhead.
Could you be more specific? Offlined memcgs are still part of the
hierarchy IIRC. So it shouldn't be much more than iterating the whole
cgroup tree and collect interesting data about dead cgroups.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-02-01 10:49 ` Michal Hocko
(?)
@ 2022-02-01 16:41 ` Waiman Long
[not found] ` <268a8bdf-4c70-b967-f34c-2293b54186f0-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
-1 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2022-02-01 16:41 UTC (permalink / raw)
To: Michal Hocko
Cc: Roman Gushchin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On 2/1/22 05:49, Michal Hocko wrote:
> On Mon 31-01-22 13:38:28, Waiman Long wrote:
> [...]
>> Of course, it is also possible to have a debugfs interface to list those
>> dead memcg information, displaying more information about the page that pins
>> the memcg will be hard without using the page owner tool.
> Yes, you will need page owner or hook into the kernel by other means
> (like already mentioned drgn). The question is whether scanning all
> existing pages to get that information is the best we can offer.
The page_owner tool records the page information at allocation time.
There are some slight performance overhead, but it is the memory
overhead that is the major drawback of this approach as we need one
page_owner structure for each physical page. Page scanning is only done
when users read the page_owner debugfs file. Yes, I agree that scanning
all the pages is not the most efficient way to get these dead memcg
information, but it is what the page_owner tool does. I would argue that
this is the most efficient coding-wise to get this information.
>> Keeping track of
>> the list of dead memcg's may also have some runtime overhead.
> Could you be more specific? Offlined memcgs are still part of the
> hierarchy IIRC. So it shouldn't be much more than iterating the whole
> cgroup tree and collect interesting data about dead cgroups.
What I mean is that without piggybacking on top of page_owner, we will
to add a lot more code to collect and display those information which
may have some overhead of its own.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
2022-01-31 9:38 ` Michal Hocko
@ 2022-01-31 19:01 ` Waiman Long
-1 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-31 19:01 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Ira Weiny, Rafael Aquini
On 1/31/22 04:38, Michal Hocko wrote:
> On Sat 29-01-22 15:53:15, Waiman Long wrote:
>> It was found that a number of offlined memcgs were not freed because
>> they were pinned by some charged pages that were present. Even "echo
>> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
>> offlined but not freed memcgs tend to increase in number over time with
>> the side effect that percpu memory consumption as shown in /proc/meminfo
>> also increases over time.
>>
>> In order to find out more information about those pages that pin
>> offlined memcgs, the page_owner feature is extended to dump memory
>> cgroup information especially whether the cgroup is offlined or not.
> It is not really clear to me how this is supposed to be used. Are you
> really dumping all the pages in the system to find out offline memcgs?
> That looks rather clumsy to me. I am not against adding memcg
> information to the page owner output. That can be useful in other
> contexts.
I am just piggybacking on top of the existing page_owner tool to provide
information for me to find out what pages are pinning the dead memcgs.
page_owner is a debugging tool that is not turned on by default. We do
have to add a kernel parameter and rebooting the system to use that,
but that is pretty easy to do once we have a reproducer to reproduce the
problem.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v2 3/3] mm/page_owner: Dump memcg information
@ 2022-01-31 19:01 ` Waiman Long
0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2022-01-31 19:01 UTC (permalink / raw)
To: Michal Hocko
Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
Rasmus Villemoes, linux-kernel, cgroups, linux-mm, Ira Weiny,
Rafael Aquini
On 1/31/22 04:38, Michal Hocko wrote:
> On Sat 29-01-22 15:53:15, Waiman Long wrote:
>> It was found that a number of offlined memcgs were not freed because
>> they were pinned by some charged pages that were present. Even "echo
>> 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These
>> offlined but not freed memcgs tend to increase in number over time with
>> the side effect that percpu memory consumption as shown in /proc/meminfo
>> also increases over time.
>>
>> In order to find out more information about those pages that pin
>> offlined memcgs, the page_owner feature is extended to dump memory
>> cgroup information especially whether the cgroup is offlined or not.
> It is not really clear to me how this is supposed to be used. Are you
> really dumping all the pages in the system to find out offline memcgs?
> That looks rather clumsy to me. I am not against adding memcg
> information to the page owner output. That can be useful in other
> contexts.
I am just piggybacking on top of the existing page_owner tool to provide
information for me to find out what pages are pinning the dead memcgs.
page_owner is a debugging tool that is not turned on by default. We do
have to add a kernel parameter and rebooting the system to use that,
but that is pretty easy to do once we have a reproducer to reproduce the
problem.
Cheers,
Longman
^ permalink raw reply [flat|nested] 63+ messages in thread