* [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
[not found] ` <20230501165450.15352-2-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2023-05-02 7:55 ` Jani Nikula
2023-05-01 16:54 ` [PATCH 04/40] nodemask: Split out include/linux/nodemask_types.h Suren Baghdasaryan
` (29 subsequent siblings)
30 siblings, 2 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
Previously, string_get_size() outputted a space between the number and
the units, i.e.
9.88 MiB
This changes it to
9.88MiB
which allows it to be parsed correctly by the 'sort -h' command.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Noralf Trønnes" <noralf@tronnes.org>
Cc: Jens Axboe <axboe@kernel.dk>
---
lib/string_helpers.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 230020a2e076..593b29fece32 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -126,8 +126,7 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
else
unit = units_str[units][i];
- snprintf(buf, len, "%u%s %s", (u32)size,
- tmp, unit);
+ snprintf(buf, len, "%u%s%s", (u32)size, tmp, unit);
}
EXPORT_SYMBOL(string_get_size);
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
[parent not found: <20230501165450.15352-2-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
[not found] ` <20230501165450.15352-2-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2023-05-01 18:13 ` Davidlohr Bueso
2023-05-01 19:35 ` Kent Overstreet
[not found] ` <ZFAUj+Q+hP7cWs4w@moria.home.lan>
0 siblings, 2 replies; 160+ messages in thread
From: Davidlohr Bueso @ 2023-05-01 18:13 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
willy-wEGCiKHe2LqWVfeAwA7xHQ, liam.howlett-QHcLZuEGTsvQT0dZR+AlfA,
corbet-T1hC0tSOHrs, void-gq6j2QGBifHby3iVrkZq2A,
peterz-wEGCiKHe2LqWVfeAwA7xHQ, juri.lelli-H+wXaHxf7aLQT0dZR+AlfA,
ldufour-tEXmvtCZX7AybS5Ee8rs3A, catalin.marinas-5wv7dgnIgG8,
will-DgEjT+Ai2ygdnm+yROfE0A, arnd-r2nGTMty4D4,
tglx-hfZtesqFncYOwBW4kG4KsQ, mingo-H+wXaHxf7aLQT0dZR+AlfA,
dave.hansen-VuQAYsv1563Yd54FQh9/CA, x86-DgEjT+Ai2ygdnm+yROfE0A,
peterx-H+wXaHxf7aLQT0dZR+AlfA, david-H+wXaHxf7aLQT0dZR+AlfA,
axboe-tSWWG44O7X1aa/9Udqfwiw, mcgrof-DgEjT+Ai2ygdnm+yROfE0A,
masahiroy-DgEjT+Ai2ygdnm+yROfE0A, nathan-DgEjT+Ai2ygdnm+yROfE0A,
dennis-DgEjT+Ai2ygdnm+yROfE0A, tj-DgEjT+Ai2ygdnm+yROfE0A,
muchun.song-fxUVXftIFDnyG1zEObXtfA, rppt-DgEjT+Ai2ygdnm+yROfE0A,
paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w, keescook
On Mon, 01 May 2023, Suren Baghdasaryan wrote:
>From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
>
>Previously, string_get_size() outputted a space between the number and
>the units, i.e.
> 9.88 MiB
>
>This changes it to
> 9.88MiB
>
>which allows it to be parsed correctly by the 'sort -h' command.
Wouldn't this break users that already parse it the current way?
Thanks,
Davidlohr
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 18:13 ` Davidlohr Bueso
@ 2023-05-01 19:35 ` Kent Overstreet
[not found] ` <ZFAUj+Q+hP7cWs4w@moria.home.lan>
1 sibling, 0 replies; 160+ messages in thread
From: Kent Overstreet @ 2023-05-01 19:35 UTC (permalink / raw)
To: Suren Baghdasaryan, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
mhocko-IBi9RG/b67k, vbabka-AlSwsSmVLrQ,
hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
willy-wEGCiKHe2LqWVfeAwA7xHQ, liam.howlett-QHcLZuEGTsvQT0dZR+AlfA,
corbet-T1hC0tSOHrs, void-gq6j2QGBifHby3iVrkZq2A,
peterz-wEGCiKHe2LqWVfeAwA7xHQ, juri.lelli-H+wXaHxf7aLQT0dZR+AlfA,
ldufour-tEXmvtCZX7AybS5Ee8rs3A, catalin.marinas-5wv7dgnIgG8,
will-DgEjT+Ai2ygdnm+yROfE0A, arnd-r2nGTMty4D4,
tglx-hfZtesqFncYOwBW4kG4KsQ, mingo-H+wXaHxf7aLQT0dZR+AlfA,
dave.hansen-VuQAYsv1563Yd54FQh9/CA, x86-DgEjT+Ai2ygdnm+yROfE0A,
peterx-H+wXaHxf7aLQT0dZR+AlfA, david-H+wXaHxf7aLQT0dZR+AlfA,
axboe-tSWWG44O7X1aa/9Udqfwiw, mcgrof-DgEjT+Ai2ygdnm+yROfE0A,
masahiroy-DgEjT+Ai2ygdnm+yROfE0A, nathan-DgEjT+Ai2ygdnm+yROfE0A,
dennis-DgEjT+Ai2ygdnm+yROfE0A, tj-DgEjT+Ai2ygdnm+yROfE0A,
muchun.song-fxUVXftIFDnyG1zEObXtfA, rppt-DgEjT+Ai2ygdnm+yROfE0A,
paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w, k
On Mon, May 01, 2023 at 11:13:15AM -0700, Davidlohr Bueso wrote:
> On Mon, 01 May 2023, Suren Baghdasaryan wrote:
>
> > From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
> >
> > Previously, string_get_size() outputted a space between the number and
> > the units, i.e.
> > 9.88 MiB
> >
> > This changes it to
> > 9.88MiB
> >
> > which allows it to be parsed correctly by the 'sort -h' command.
>
> Wouldn't this break users that already parse it the current way?
It's not impossible - but it's not used in very many places and we
wouldn't be printing in human-readable units if it was meant to be
parsed - it's mainly used for debug output currently.
If someone raises a specific objection we'll do something different,
otherwise I think standardizing on what userspace tooling already parses
is a good idea.
^ permalink raw reply [flat|nested] 160+ messages in thread
[parent not found: <ZFAUj+Q+hP7cWs4w@moria.home.lan>]
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
[not found] ` <ZFAUj+Q+hP7cWs4w@moria.home.lan>
@ 2023-05-01 19:57 ` Andy Shevchenko
2023-05-01 21:16 ` Kent Overstreet
` (2 more replies)
[not found] ` <ZFAUj+Q+hP7cWs4w-jC9Py7bek1znysI04z7BkA@public.gmane.org>
[not found] ` <b6b472b65b76e95bb4c7fc7eac1ee296fdbb64fd.camel@HansenPartnership.com>
2 siblings, 3 replies; 160+ messages in thread
From: Andy Shevchenko @ 2023-05-01 19:57 UTC (permalink / raw)
To: Kent Overstreet
Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
mgorman, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, k
On Mon, May 1, 2023 at 10:36 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Mon, May 01, 2023 at 11:13:15AM -0700, Davidlohr Bueso wrote:
> > On Mon, 01 May 2023, Suren Baghdasaryan wrote:
> >
> > > From: Kent Overstreet <kent.overstreet@linux.dev>
> > >
> > > Previously, string_get_size() outputted a space between the number and
> > > the units, i.e.
> > > 9.88 MiB
> > >
> > > This changes it to
> > > 9.88MiB
> > >
> > > which allows it to be parsed correctly by the 'sort -h' command.
But why do we need that? What's the use case?
> > Wouldn't this break users that already parse it the current way?
>
> It's not impossible - but it's not used in very many places and we
> wouldn't be printing in human-readable units if it was meant to be
> parsed - it's mainly used for debug output currently.
>
> If someone raises a specific objection we'll do something different,
> otherwise I think standardizing on what userspace tooling already parses
> is a good idea.
Yes, I NAK this on the basis of
https://english.stackexchange.com/a/2911/153144
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 19:57 ` Andy Shevchenko
@ 2023-05-01 21:16 ` Kent Overstreet
2023-05-01 21:33 ` Liam R. Howlett
2023-05-02 0:53 ` Kent Overstreet
2 siblings, 0 replies; 160+ messages in thread
From: Kent Overstreet @ 2023-05-01 21:16 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
mgorman, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, k
On Mon, May 01, 2023 at 10:57:07PM +0300, Andy Shevchenko wrote:
> On Mon, May 1, 2023 at 10:36 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Mon, May 01, 2023 at 11:13:15AM -0700, Davidlohr Bueso wrote:
> > > On Mon, 01 May 2023, Suren Baghdasaryan wrote:
> > >
> > > > From: Kent Overstreet <kent.overstreet@linux.dev>
> > > >
> > > > Previously, string_get_size() outputted a space between the number and
> > > > the units, i.e.
> > > > 9.88 MiB
> > > >
> > > > This changes it to
> > > > 9.88MiB
> > > >
> > > > which allows it to be parsed correctly by the 'sort -h' command.
>
> But why do we need that? What's the use case?
As was in the commit message: to produce output that sort -h knows how
to parse.
> > > Wouldn't this break users that already parse it the current way?
> >
> > It's not impossible - but it's not used in very many places and we
> > wouldn't be printing in human-readable units if it was meant to be
> > parsed - it's mainly used for debug output currently.
> >
> > If someone raises a specific objection we'll do something different,
> > otherwise I think standardizing on what userspace tooling already parses
> > is a good idea.
>
> Yes, I NAK this on the basis of
> https://english.stackexchange.com/a/2911/153144
Not sure I find a style guide on stackexchange more compelling than
interop with a tool everyone already has installed :)
^ permalink raw reply [flat|nested] 160+ messages in thread
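For readers following the 'sort -h' argument above, here is a minimal sketch of why the space matters. It is a simplified model, not GNU sort's actual code: it assumes a human-numeric comparison that only recognizes a unit suffix when it directly follows the digits, ranks by suffix first, and falls back to the plain number otherwise. The names unit_rank() and human_compare() are hypothetical.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: rank of the unit suffix that directly follows
 * the number, or 0 when the next character is not a known suffix
 * (for example, when a space separates number and unit).
 */
static int unit_rank(const char *s)
{
	char *end;

	(void)strtod(s, &end);	/* skip over the numeric part */

	switch (*end) {
	case 'K':
	case 'k':
		return 1;
	case 'M':
		return 2;
	case 'G':
		return 3;
	case 'T':
		return 4;
	default:
		return 0;
	}
}

/*
 * Simplified model of a human-numeric comparison: a larger unit wins,
 * and the numeric value is only compared when the units are equal.
 * Returns <0, 0 or >0 like strcmp().
 */
int human_compare(const char *a, const char *b)
{
	int ra = unit_rank(a), rb = unit_rank(b);
	double da, db;

	if (ra != rb)
		return ra < rb ? -1 : 1;

	da = strtod(a, NULL);
	db = strtod(b, NULL);
	return (da > db) - (da < db);
}
```

Under this model "9.88MiB" orders after "10KiB", while "9.88 MiB" and "10 KiB" are both treated as unit-less numbers, so 9.88 orders before 10.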
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 19:57 ` Andy Shevchenko
2023-05-01 21:16 ` Kent Overstreet
@ 2023-05-01 21:33 ` Liam R. Howlett
2023-05-02 0:11 ` Kent Overstreet
2023-05-02 0:53 ` Kent Overstreet
2 siblings, 1 reply; 160+ messages in thread
From: Liam R. Howlett @ 2023-05-01 21:33 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Kent Overstreet, Suren Baghdasaryan, akpm, mhocko, vbabka, hannes,
roman.gushchin, mgorman, willy, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd
* Andy Shevchenko <andy.shevchenko@gmail.com> [230501 15:57]:
> On Mon, May 1, 2023 at 10:36 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Mon, May 01, 2023 at 11:13:15AM -0700, Davidlohr Bueso wrote:
> > > On Mon, 01 May 2023, Suren Baghdasaryan wrote:
> > >
> > > > From: Kent Overstreet <kent.overstreet@linux.dev>
> > > >
> > > > Previously, string_get_size() outputted a space between the number and
> > > > the units, i.e.
> > > > 9.88 MiB
> > > >
> > > > This changes it to
> > > > 9.88MiB
> > > >
> > > > which allows it to be parsed correctly by the 'sort -h' command.
>
> But why do we need that? What's the use case?
>
> > > Wouldn't this break users that already parse it the current way?
> >
> > It's not impossible - but it's not used in very many places and we
> > wouldn't be printing in human-readable units if it was meant to be
> > parsed - it's mainly used for debug output currently.
> >
> > If someone raises a specific objection we'll do something different,
> > otherwise I think standardizing on what userspace tooling already parses
> > is a good idea.
>
> Yes, I NAK this on the basis of
> https://english.stackexchange.com/a/2911/153144
This fixes the output to be better aligned with:
the output of ls -sh
the input expected by find -size
Are there counter-examples of commands that follow the SI Brochure?
Thanks,
Liam
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 21:33 ` Liam R. Howlett
@ 2023-05-02 0:11 ` Kent Overstreet
0 siblings, 0 replies; 160+ messages in thread
From: Kent Overstreet @ 2023-05-02 0:11 UTC (permalink / raw)
To: Liam R. Howlett, Andy Shevchenko, Suren Baghdasaryan,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
willy-wEGCiKHe2LqWVfeAwA7xHQ, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao
On Mon, May 01, 2023 at 05:33:49PM -0400, Liam R. Howlett wrote:
> * Andy Shevchenko <andy.shevchenko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [230501 15:57]:
> This fixes the output to be better aligned with:
> the output of ls -sh
> the input expected by find -size
>
> Are there counter-examples of commands that follow the SI Brochure?
Even perf, which is included in the kernel tree, doesn't include the
space - example perf top output:
0 bcachefs:move_extent_fail
0 bcachefs:move_extent_alloc_mem_fail
3 bcachefs:move_data
0 bcachefs:evacuate_bucket
0 bcachefs:copygc
2 bcachefs:copygc_wait
195K bcachefs:transaction_commit
0 bcachefs:trans_restart_injected
(I'm also going to need to submit a patch that deletes or makes optional
the B suffix, just because we're using human readable units doesn't mean
it's bytes).
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 19:57 ` Andy Shevchenko
2023-05-01 21:16 ` Kent Overstreet
2023-05-01 21:33 ` Liam R. Howlett
@ 2023-05-02 0:53 ` Kent Overstreet
2 siblings, 0 replies; 160+ messages in thread
From: Kent Overstreet @ 2023-05-02 0:53 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
mgorman, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, k
On Mon, May 01, 2023 at 10:57:07PM +0300, Andy Shevchenko wrote:
> But why do we need that? What's the use case?
It looks like we missed you on the initial CC, here's the use case:
https://lore.kernel.org/linux-fsdevel/ZFAsm0XTqC%2F%2Ff4FP@P9FQF9L96D/T/#mdda814a8c569e2214baa31320912b0ef83432fa9
^ permalink raw reply [flat|nested] 160+ messages in thread
[parent not found: <ZFAUj+Q+hP7cWs4w-jC9Py7bek1znysI04z7BkA@public.gmane.org>]
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
[not found] ` <ZFAUj+Q+hP7cWs4w-jC9Py7bek1znysI04z7BkA@public.gmane.org>
@ 2023-05-02 2:22 ` James Bottomley
0 siblings, 0 replies; 160+ messages in thread
From: James Bottomley @ 2023-05-02 2:22 UTC (permalink / raw)
To: Kent Overstreet, Suren Baghdasaryan,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
willy-wEGCiKHe2LqWVfeAwA7xHQ, liam.howlett-QHcLZuEGTsvQT0dZR+AlfA,
corbet-T1hC0tSOHrs, void-gq6j2QGBifHby3iVrkZq2A,
peterz-wEGCiKHe2LqWVfeAwA7xHQ, juri.lelli-H+wXaHxf7aLQT0dZR+AlfA,
ldufour-tEXmvtCZX7AybS5Ee8rs3A, catalin.marinas-5wv7dgnIgG8,
will-DgEjT+Ai2ygdnm+yROfE0A, arnd-r2nGTMty4D4,
tglx-hfZtesqFncYOwBW4kG4KsQ, mingo-H+wXaHxf7aLQT0dZR+AlfA,
dave.hansen-VuQAYsv1563Yd54FQh9/CA, x86-DgEjT+Ai2ygdnm+yROfE0A,
peterx-H+wXaHxf7aLQT0dZR+AlfA, david-H+wXaHxf7aLQT0dZR+AlfA,
axboe-tSWWG44O7X1aa/9Udqfwiw, mcgrof-DgEjT+Ai2ygdnm+yROfE0A,
masahiroy-DgEjT+Ai2ygdnm+yROfE0A, nathan-DgEjT+Ai2ygdnm+yROfE0A,
dennis-DgEjT+Ai2ygdnm+yROfE0A, tj-DgEjT+Ai2ygdnm+yROfE0A,
muchun.song-fxUVXftIFDnyG1zEObXtfA, rppt-DgEjT+Ai2ygdnm+yROfE0A,
paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells
On Mon, 2023-05-01 at 15:35 -0400, Kent Overstreet wrote:
> On Mon, May 01, 2023 at 11:13:15AM -0700, Davidlohr Bueso wrote:
> > On Mon, 01 May 2023, Suren Baghdasaryan wrote:
> >
> > > From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
> > >
> > > Previously, string_get_size() outputted a space between the
> > > number and the units, i.e.
> > > 9.88 MiB
> > >
> > > This changes it to
> > > 9.88MiB
> > >
> > > which allows it to be parsed correctly by the 'sort -h' command.
> >
> > Wouldn't this break users that already parse it the current way?
>
> It's not impossible - but it's not used in very many places and we
> wouldn't be printing in human-readable units if it was meant to be
> parsed - it's mainly used for debug output currently.
It is not used just for debug. It's used all over the kernel for
printing out device sizes. The output mostly goes to the kernel print
buffer, so it's anyone's guess as to what, if any, tools are parsing
it, but the concern about breaking log parsers seems to be a valid one.
> If someone raises a specific objection we'll do something different,
> otherwise I think standardizing on what userspace tooling already
> parses is a good idea.
If you want to omit the space, why not simply add your own variant? A
string_get_size_nospace() which would use most of the body of this one
as a helper function but give its own snprintf format string at the
end. It's only a couple of lines longer as a patch and has the bonus
that it definitely wouldn't break anything by altering an existing
output.
James
^ permalink raw reply [flat|nested] 160+ messages in thread
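The variant suggested above could look roughly like the following userspace sketch. It is not the kernel implementation: the real string_get_size() takes a block size and a units enum and rounds to three significant digits, whereas this model only handles binary units and truncates to two decimals. All demo_* names are hypothetical; the point is only that both entry points share one body and differ in the separator passed to the final snprintf().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shared body: format @bytes with a caller-chosen separator. */
static void demo_format_size(uint64_t bytes, const char *sep,
			     char *buf, size_t len)
{
	static const char *const units[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	uint64_t whole = bytes, rem = 0;
	unsigned int i = 0;

	/* Scale down by 1024 until the value fits the current unit. */
	while (whole >= 1024 && i < 6) {
		rem = whole % 1024;
		whole /= 1024;
		i++;
	}

	if (i == 0)
		snprintf(buf, len, "%" PRIu64 "%s%s", whole, sep, units[i]);
	else
		/* Two decimal digits, truncated rather than rounded. */
		snprintf(buf, len, "%" PRIu64 ".%02u%s%s", whole,
			 (unsigned int)(rem * 100 / 1024), sep, units[i]);
}

/* Existing behaviour: space between number and unit. */
void demo_get_size(uint64_t bytes, char *buf, size_t len)
{
	demo_format_size(bytes, " ", buf, len);
}

/* New variant: no space, leaving existing callers untouched. */
void demo_get_size_nospace(uint64_t bytes, char *buf, size_t len)
{
	demo_format_size(bytes, "", buf, len);
}
```

With this split, existing callers keep their output byte for byte, and only code that opts into the nospace variant produces the sort-friendly form.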
[parent not found: <b6b472b65b76e95bb4c7fc7eac1ee296fdbb64fd.camel@HansenPartnership.com>]
* Re: [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output
2023-05-01 16:54 ` [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output Suren Baghdasaryan
[not found] ` <20230501165450.15352-2-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2023-05-02 7:55 ` Jani Nikula
1 sibling, 0 replies; 160+ messages in thread
From: Jani Nikula @ 2023-05-02 7:55 UTC (permalink / raw)
To: Suren Baghdasaryan, akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Mon, 01 May 2023, Suren Baghdasaryan <surenb@google.com> wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> Previously, string_get_size() outputted a space between the number and
> the units, i.e.
> 9.88 MiB
>
> This changes it to
> 9.88MiB
>
> which allows it to be parsed correctly by the 'sort -h' command.
The former is easier for humans to parse, and that should be
preferred. 'sort -h' is supposed to compare "human readable numbers", so
arguably sort does not do its job here.
BR,
Jani.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Andy Shevchenko <andy@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: "Noralf Trønnes" <noralf@tronnes.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> ---
> lib/string_helpers.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
> index 230020a2e076..593b29fece32 100644
> --- a/lib/string_helpers.c
> +++ b/lib/string_helpers.c
> @@ -126,8 +126,7 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
> else
> unit = units_str[units][i];
>
> - snprintf(buf, len, "%u%s %s", (u32)size,
> - tmp, unit);
> + snprintf(buf, len, "%u%s%s", (u32)size, tmp, unit);
> }
> EXPORT_SYMBOL(string_get_size);
--
Jani Nikula, Intel Open Source Graphics Center
^ permalink raw reply [flat|nested] 160+ messages in thread

* [PATCH 04/40] nodemask: Split out include/linux/nodemask_types.h
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 05/40] prandom: Remove unused include Suren Baghdasaryan
` (28 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
sched.h, which defines task_struct, needs nodemask_t - but sched.h is a
frequently used header and ideally shouldn't be pulling in any more code
than it needs to.
This splits out nodemask_types.h which has the definition sched.h needs,
which will avoid a circular header dependency in the alloc tagging patch
series, and as a bonus should speed up kernel build times.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
include/linux/nodemask.h | 2 +-
include/linux/nodemask_types.h | 9 +++++++++
include/linux/sched.h | 2 +-
3 files changed, 11 insertions(+), 2 deletions(-)
create mode 100644 include/linux/nodemask_types.h
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index bb0ee80526b2..fda37b6df274 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -93,10 +93,10 @@
#include <linux/threads.h>
#include <linux/bitmap.h>
#include <linux/minmax.h>
+#include <linux/nodemask_types.h>
#include <linux/numa.h>
#include <linux/random.h>
-typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
extern nodemask_t _unused_nodemask_arg_;
/**
diff --git a/include/linux/nodemask_types.h b/include/linux/nodemask_types.h
new file mode 100644
index 000000000000..84c2f47c4237
--- /dev/null
+++ b/include/linux/nodemask_types.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_NODEMASK_TYPES_H
+#define __LINUX_NODEMASK_TYPES_H
+
+#include <linux/numa.h>
+
+typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
+
+#endif /* __LINUX_NODEMASK_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f..35e7efdea2d9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -20,7 +20,7 @@
#include <linux/hrtimer.h>
#include <linux/irqflags.h>
#include <linux/seccomp.h>
-#include <linux/nodemask.h>
+#include <linux/nodemask_types.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/resource.h>
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 05/40] prandom: Remove unused include
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 04/40] nodemask: Split out include/linux/nodemask_types.h Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 06/40] lib/string.c: strsep_no_empty() Suren Baghdasaryan
` (27 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
prandom.h doesn't use percpu.h - this fixes some circular header issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/prandom.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index f2ed5b72b3d6..f7f1e5251c67 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -10,7 +10,6 @@
#include <linux/types.h>
#include <linux/once.h>
-#include <linux/percpu.h>
#include <linux/random.h>
struct rnd_state {
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 06/40] lib/string.c: strsep_no_empty()
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (2 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 05/40] prandom: Remove unused include Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-02 12:37 ` Petr Tesařík
2023-05-01 16:54 ` [PATCH 08/40] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
` (26 subsequent siblings)
30 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new helper which is like strsep, except that it skips empty
tokens.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/string.h | 1 +
lib/string.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/string.h b/include/linux/string.h
index c062c581a98b..6cd5451c262c 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -96,6 +96,7 @@ extern char * strpbrk(const char *,const char *);
#ifndef __HAVE_ARCH_STRSEP
extern char * strsep(char **,const char *);
#endif
+extern char *strsep_no_empty(char **, const char *);
#ifndef __HAVE_ARCH_STRSPN
extern __kernel_size_t strspn(const char *,const char *);
#endif
diff --git a/lib/string.c b/lib/string.c
index 3d55ef890106..dd4914baf45a 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -520,6 +520,25 @@ char *strsep(char **s, const char *ct)
EXPORT_SYMBOL(strsep);
#endif
+/**
+ * strsep_no_empt - Split a string into tokens, but don't return empty tokens
+ * @s: The string to be searched
+ * @ct: The characters to search for
+ *
+ * strsep() updates @s to point after the token, ready for the next call.
+ */
+char *strsep_no_empty(char **s, const char *ct)
+{
+ char *ret;
+
+ do {
+ ret = strsep(s, ct);
+ } while (ret && !*ret);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(strsep_no_empty);
+
#ifndef __HAVE_ARCH_MEMSET
/**
* memset - Fill a region of memory with the given value
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 06/40] lib/string.c: strsep_no_empty()
2023-05-01 16:54 ` [PATCH 06/40] lib/string.c: strsep_no_empty() Suren Baghdasaryan
@ 2023-05-02 12:37 ` Petr Tesařík
0 siblings, 0 replies; 160+ messages in thread
From: Petr Tesařík @ 2023-05-02 12:37 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Mon, 1 May 2023 09:54:16 -0700
Suren Baghdasaryan <surenb@google.com> wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> This adds a new helper which is like strsep, except that it skips empty
> tokens.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> include/linux/string.h | 1 +
> lib/string.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/include/linux/string.h b/include/linux/string.h
> index c062c581a98b..6cd5451c262c 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -96,6 +96,7 @@ extern char * strpbrk(const char *,const char *);
> #ifndef __HAVE_ARCH_STRSEP
> extern char * strsep(char **,const char *);
> #endif
> +extern char *strsep_no_empty(char **, const char *);
> #ifndef __HAVE_ARCH_STRSPN
> extern __kernel_size_t strspn(const char *,const char *);
> #endif
> diff --git a/lib/string.c b/lib/string.c
> index 3d55ef890106..dd4914baf45a 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -520,6 +520,25 @@ char *strsep(char **s, const char *ct)
> EXPORT_SYMBOL(strsep);
> #endif
>
> +/**
> + * strsep_no_empt - Split a string into tokens, but don't return empty tokens
^^^^
Typo: strsep_no_empty
Petr T
^ permalink raw reply [flat|nested] 160+ messages in thread
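To illustrate the difference between strsep() and the proposed strsep_no_empty(), here is a small userspace sketch. It assumes strsep() semantics as implemented with strpbrk(); demo_strsep(), demo_strsep_no_empty() and count_tokens() are hypothetical stand-ins so the example builds with standard C only. The loop in demo_strsep_no_empty() mirrors the one in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for strsep(): cut the next token off *s at any char in @ct. */
static char *demo_strsep(char **s, const char *ct)
{
	char *sbegin = *s;
	char *end;

	if (sbegin == NULL)
		return NULL;

	end = strpbrk(sbegin, ct);
	if (end)
		*end++ = '\0';
	*s = end;
	return sbegin;
}

/* Same loop as in the patch: keep calling until a non-empty token or NULL. */
static char *demo_strsep_no_empty(char **s, const char *ct)
{
	char *ret;

	do {
		ret = demo_strsep(s, ct);
	} while (ret && !*ret);

	return ret;
}

/* Count the tokens in @str; note that @str is modified in place. */
size_t count_tokens(char *str, const char *ct, int skip_empty)
{
	char *cursor = str;
	size_t n = 0;

	while ((skip_empty ? demo_strsep_no_empty(&cursor, ct)
			   : demo_strsep(&cursor, ct)) != NULL)
		n++;

	return n;
}
```

For the input "a,,b," with "," as the delimiter, the plain variant yields four tokens ("a", "", "b", ""), while the skipping variant yields only "a" and "b".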
* [PATCH 08/40] mm: introduce slabobj_ext to support slab object extensions
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (3 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 06/40] lib/string.c: strsep_no_empty() Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 10/40] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
` (25 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Currently slab pages can store only vectors of obj_cgroup pointers in
page->memcg_data. Introduce the slabobj_ext structure to allow more data
to be stored for each slab object. Wrap obj_cgroup in slabobj_ext
to preserve current functionality while allowing slabobj_ext to be
extended in the future.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
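
The wrapping described above can be illustrated with a small userspace
sketch. It assumes a per-slab word that holds a pointer to a vector of
extension structs, with flag bits kept in the low bits of that pointer.
All names ending in _sketch and all SKETCH_ macros are hypothetical
stand-ins; calloc()/free() stand in for the kernel allocator, and no
locking or cmpxchg() is modelled.

```c
#include <stdint.h>
#include <stdlib.h>

struct obj_cgroup_sketch {
	int id;
};

/* One entry per slab object; aligned so the low pointer bits stay free. */
struct slabobj_ext_sketch {
	_Alignas(8) struct obj_cgroup_sketch *objcg;
};

#define SKETCH_DATA_OBJEXTS	((uintptr_t)1 << 0)
#define SKETCH_DATA_KMEM	((uintptr_t)1 << 1)
#define SKETCH_FLAGS_MASK	(((uintptr_t)1 << 2) - 1)

struct slab_sketch {
	uintptr_t obj_exts;	/* vector pointer plus flag bits */
	unsigned int objects;	/* number of objects in this slab */
};

/* Allocate a zeroed vector and tag the stored pointer with the flag. */
int sketch_alloc_obj_exts(struct slab_sketch *slab)
{
	void *vec = calloc(slab->objects, sizeof(struct slabobj_ext_sketch));

	if (!vec)
		return -1;

	slab->obj_exts = (uintptr_t)vec | SKETCH_DATA_OBJEXTS;
	return 0;
}

/* Mask the flag bits off to recover the vector, or NULL if none is set. */
struct slabobj_ext_sketch *sketch_slab_obj_exts(const struct slab_sketch *slab)
{
	return (struct slabobj_ext_sketch *)(slab->obj_exts & ~SKETCH_FLAGS_MASK);
}

void sketch_free_obj_exts(struct slab_sketch *slab)
{
	free(sketch_slab_obj_exts(slab));
	slab->obj_exts = 0;
}
```

Callers that used to index a vector of obj_cgroup pointers now index the
vector of structs and reach the cgroup through the .objcg member, which
leaves room for further members later.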
include/linux/memcontrol.h | 20 +++--
include/linux/mm_types.h | 4 +-
init/Kconfig | 4 +
mm/kfence/core.c | 14 ++--
mm/kfence/kfence.h | 4 +-
mm/memcontrol.c | 56 ++------------
mm/page_owner.c | 2 +-
mm/slab.h | 148 +++++++++++++++++++++++++------------
mm/slab_common.c | 47 ++++++++++++
9 files changed, 185 insertions(+), 114 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 222d7370134c..b9fd9732a52b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -339,8 +339,8 @@ struct mem_cgroup {
extern struct mem_cgroup *root_mem_cgroup;
enum page_memcg_data_flags {
- /* page->memcg_data is a pointer to an objcgs vector */
- MEMCG_DATA_OBJCGS = (1UL << 0),
+ /* page->memcg_data is a pointer to a slabobj_ext vector */
+ MEMCG_DATA_OBJEXTS = (1UL << 0),
/* page has been accounted as a non-slab kernel page */
MEMCG_DATA_KMEM = (1UL << 1),
/* the next bit after the last actual flag */
@@ -378,7 +378,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
unsigned long memcg_data = folio->memcg_data;
VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
- VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+ VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -399,7 +399,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
unsigned long memcg_data = folio->memcg_data;
VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
- VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+ VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);
return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -496,7 +496,7 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
*/
unsigned long memcg_data = READ_ONCE(folio->memcg_data);
- if (memcg_data & MEMCG_DATA_OBJCGS)
+ if (memcg_data & MEMCG_DATA_OBJEXTS)
return NULL;
if (memcg_data & MEMCG_DATA_KMEM) {
@@ -542,7 +542,7 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob
static inline bool folio_memcg_kmem(struct folio *folio)
{
VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
- VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJCGS, folio);
+ VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);
return folio->memcg_data & MEMCG_DATA_KMEM;
}
@@ -1606,6 +1606,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
}
#endif /* CONFIG_MEMCG */
+/*
+ * Extended information for slab objects stored as an array in page->memcg_data
+ * if MEMCG_DATA_OBJEXTS is set.
+ */
+struct slabobj_ext {
+ struct obj_cgroup *objcg;
+} __aligned(8);
+
static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
{
__mod_lruvec_kmem_state(p, idx, 1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..e79303e1e30c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -194,7 +194,7 @@ struct page {
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
atomic_t _refcount;
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
unsigned long memcg_data;
#endif
@@ -320,7 +320,7 @@ struct folio {
void *private;
atomic_t _mapcount;
atomic_t _refcount;
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
unsigned long memcg_data;
#endif
/* private: the union with struct page is transitional */
diff --git a/init/Kconfig b/init/Kconfig
index 32c24950c4ce..44267919a2a2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -936,10 +936,14 @@ config CGROUP_FAVOR_DYNMODS
Say N if unsure.
+config SLAB_OBJ_EXT
+ bool
+
config MEMCG
bool "Memory controller"
select PAGE_COUNTER
select EVENTFD
+ select SLAB_OBJ_EXT
help
Provides control over the memory footprint of tasks in a cgroup.
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index dad3c0eb70a0..aea6fa145080 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -590,9 +590,9 @@ static unsigned long kfence_init_pool(void)
continue;
__folio_set_slab(slab_folio(slab));
-#ifdef CONFIG_MEMCG
- slab->memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg |
- MEMCG_DATA_OBJCGS;
+#ifdef CONFIG_MEMCG_KMEM
+ slab->obj_exts = (unsigned long)&kfence_metadata[i / 2 - 1].obj_exts |
+ MEMCG_DATA_OBJEXTS;
#endif
}
@@ -634,8 +634,8 @@ static unsigned long kfence_init_pool(void)
if (!i || (i % 2))
continue;
-#ifdef CONFIG_MEMCG
- slab->memcg_data = 0;
+#ifdef CONFIG_MEMCG_KMEM
+ slab->obj_exts = 0;
#endif
__folio_clear_slab(slab_folio(slab));
}
@@ -1093,8 +1093,8 @@ void __kfence_free(void *addr)
{
struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);
-#ifdef CONFIG_MEMCG
- KFENCE_WARN_ON(meta->objcg);
+#ifdef CONFIG_MEMCG_KMEM
+ KFENCE_WARN_ON(meta->obj_exts.objcg);
#endif
/*
* If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing
diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index 2aafc46a4aaf..8e0d76c4ea2a 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -97,8 +97,8 @@ struct kfence_metadata {
struct kfence_track free_track;
/* For updating alloc_covered on frees. */
u32 alloc_stack_hash;
-#ifdef CONFIG_MEMCG
- struct obj_cgroup *objcg;
+#ifdef CONFIG_MEMCG_KMEM
+ struct slabobj_ext obj_exts;
#endif
};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b27e245a055..f2a7fe718117 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2892,13 +2892,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
}
#ifdef CONFIG_MEMCG_KMEM
-/*
- * The allocated objcg pointers array is not accounted directly.
- * Moreover, it should not come from DMA buffer and is not readily
- * reclaimable. So those GFP bits should be masked off.
- */
-#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
-
/*
* mod_objcg_mlstate() may be called with irq enabled, so
* mod_memcg_lruvec_state() should be used.
@@ -2917,62 +2910,27 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
rcu_read_unlock();
}
-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab)
-{
- unsigned int objects = objs_per_slab(s, slab);
- unsigned long memcg_data;
- void *vec;
-
- gfp &= ~OBJCGS_CLEAR_MASK;
- vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
- slab_nid(slab));
- if (!vec)
- return -ENOMEM;
-
- memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
- if (new_slab) {
- /*
- * If the slab is brand new and nobody can yet access its
- * memcg_data, no synchronization is required and memcg_data can
- * be simply assigned.
- */
- slab->memcg_data = memcg_data;
- } else if (cmpxchg(&slab->memcg_data, 0, memcg_data)) {
- /*
- * If the slab is already in use, somebody can allocate and
- * assign obj_cgroups in parallel. In this case the existing
- * objcg vector should be reused.
- */
- kfree(vec);
- return 0;
- }
-
- kmemleak_not_leak(vec);
- return 0;
-}
-
static __always_inline
struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
{
/*
* Slab objects are accounted individually, not per-page.
* Memcg membership data for each individual object is saved in
- * slab->memcg_data.
+ * slab->obj_exts.
*/
if (folio_test_slab(folio)) {
- struct obj_cgroup **objcgs;
+ struct slabobj_ext *obj_exts;
struct slab *slab;
unsigned int off;
slab = folio_slab(folio);
- objcgs = slab_objcgs(slab);
- if (!objcgs)
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
return NULL;
off = obj_to_index(slab->slab_cache, slab, p);
- if (objcgs[off])
- return obj_cgroup_memcg(objcgs[off]);
+ if (obj_exts[off].objcg)
+ return obj_cgroup_memcg(obj_exts[off].objcg);
return NULL;
}
@@ -2980,7 +2938,7 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
/*
* folio_memcg_check() is used here, because in theory we can encounter
* a folio where the slab flag has been cleared already, but
- * slab->memcg_data has not been freed yet
+ * slab->obj_exts has not been freed yet
* folio_memcg_check() will guarantee that a proper memory
* cgroup pointer or NULL will be returned.
*/
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 31169b3e7f06..8b6086c666e6 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -372,7 +372,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
if (!memcg_data)
goto out_unlock;
- if (memcg_data & MEMCG_DATA_OBJCGS)
+ if (memcg_data & MEMCG_DATA_OBJEXTS)
ret += scnprintf(kbuf + ret, count - ret,
"Slab cache page\n");
diff --git a/mm/slab.h b/mm/slab.h
index f01ac256a8f5..25d14b3a7280 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -57,8 +57,8 @@ struct slab {
#endif
atomic_t __page_refcount;
-#ifdef CONFIG_MEMCG
- unsigned long memcg_data;
+#ifdef CONFIG_SLAB_OBJ_EXT
+ unsigned long obj_exts;
#endif
};
@@ -67,8 +67,8 @@ struct slab {
SLAB_MATCH(flags, __page_flags);
SLAB_MATCH(compound_head, slab_cache); /* Ensure bit 0 is clear */
SLAB_MATCH(_refcount, __page_refcount);
-#ifdef CONFIG_MEMCG
-SLAB_MATCH(memcg_data, memcg_data);
+#ifdef CONFIG_SLAB_OBJ_EXT
+SLAB_MATCH(memcg_data, obj_exts);
#endif
#undef SLAB_MATCH
static_assert(sizeof(struct slab) <= sizeof(struct page));
@@ -390,36 +390,106 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
return false;
}
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef CONFIG_SLAB_OBJ_EXT
+
/*
- * slab_objcgs - get the object cgroups vector associated with a slab
+ * slab_obj_exts - get the pointer to the slab object extension vector
+ * associated with a slab.
* @slab: a pointer to the slab struct
*
- * Returns a pointer to the object cgroups vector associated with the slab,
+ * Returns a pointer to the object extension vector associated with the slab,
* or NULL if no such vector has been associated yet.
*/
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
{
- unsigned long memcg_data = READ_ONCE(slab->memcg_data);
+ unsigned long obj_exts = READ_ONCE(slab->obj_exts);
- VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS),
+#ifdef CONFIG_MEMCG
+ VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS),
slab_page(slab));
- VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab));
+ VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
- return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
+#else
+ return (struct slabobj_ext *)obj_exts;
+#endif
}
-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab);
-void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
- enum node_stat_item idx, int nr);
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+ gfp_t gfp, bool new_slab);
-static inline void memcg_free_slab_cgroups(struct slab *slab)
+static inline bool need_slab_obj_ext(void)
{
- kfree(slab_objcgs(slab));
- slab->memcg_data = 0;
+ /*
+ * CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
+ * inside memcg_slab_post_alloc_hook. No other users for now.
+ */
+ return false;
}
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+ struct slabobj_ext *obj_exts;
+
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
+ return;
+
+ kfree(obj_exts);
+ slab->obj_exts = 0;
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+ struct slab *slab;
+
+ if (!p)
+ return NULL;
+
+ if (!need_slab_obj_ext())
+ return NULL;
+
+ slab = virt_to_slab(p);
+ if (!slab_obj_exts(slab) &&
+ WARN(alloc_slab_obj_exts(slab, s, flags, false),
+ "%s, %s: Failed to create slab extension vector!\n",
+ __func__, s->name))
+ return NULL;
+
+ return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+}
+
+#else /* CONFIG_SLAB_OBJ_EXT */
+
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+{
+ return NULL;
+}
+
+static inline int alloc_slab_obj_exts(struct slab *slab,
+ struct kmem_cache *s, gfp_t gfp,
+ bool new_slab)
+{
+ return 0;
+}
+
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+ return NULL;
+}
+
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
+#ifdef CONFIG_MEMCG_KMEM
+void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
+ enum node_stat_item idx, int nr);
+
static inline size_t obj_full_size(struct kmem_cache *s)
{
/*
@@ -487,16 +557,15 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
if (likely(p[i])) {
slab = virt_to_slab(p[i]);
- if (!slab_objcgs(slab) &&
- memcg_alloc_slab_cgroups(slab, s, flags,
- false)) {
+ if (!slab_obj_exts(slab) &&
+ alloc_slab_obj_exts(slab, s, flags, false)) {
obj_cgroup_uncharge(objcg, obj_full_size(s));
continue;
}
off = obj_to_index(s, slab, p[i]);
obj_cgroup_get(objcg);
- slab_objcgs(slab)[off] = objcg;
+ slab_obj_exts(slab)[off].objcg = objcg;
mod_objcg_state(objcg, slab_pgdat(slab),
cache_vmstat_idx(s), obj_full_size(s));
} else {
@@ -509,14 +578,14 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects)
{
- struct obj_cgroup **objcgs;
+ struct slabobj_ext *obj_exts;
int i;
if (!memcg_kmem_online())
return;
- objcgs = slab_objcgs(slab);
- if (!objcgs)
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
return;
for (i = 0; i < objects; i++) {
@@ -524,11 +593,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
unsigned int off;
off = obj_to_index(s, slab, p[i]);
- objcg = objcgs[off];
+ objcg = obj_exts[off].objcg;
if (!objcg)
continue;
- objcgs[off] = NULL;
+ obj_exts[off].objcg = NULL;
obj_cgroup_uncharge(objcg, obj_full_size(s));
mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
-obj_full_size(s));
@@ -537,27 +606,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
}
#else /* CONFIG_MEMCG_KMEM */
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
-{
- return NULL;
-}
-
static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
{
return NULL;
}
-static inline int memcg_alloc_slab_cgroups(struct slab *slab,
- struct kmem_cache *s, gfp_t gfp,
- bool new_slab)
-{
- return 0;
-}
-
-static inline void memcg_free_slab_cgroups(struct slab *slab)
-{
-}
-
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
struct list_lru *lru,
struct obj_cgroup **objcgp,
@@ -594,7 +647,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
struct kmem_cache *s, gfp_t gfp)
{
if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
- memcg_alloc_slab_cgroups(slab, s, gfp, true);
+ alloc_slab_obj_exts(slab, s, gfp, true);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
PAGE_SIZE << order);
@@ -603,8 +656,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
static __always_inline void unaccount_slab(struct slab *slab, int order,
struct kmem_cache *s)
{
- if (memcg_kmem_online())
- memcg_free_slab_cgroups(slab);
+ free_slab_obj_exts(slab);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
-(PAGE_SIZE << order));
@@ -684,6 +736,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
unsigned int orig_size)
{
unsigned int zero_size = s->object_size;
+ struct slabobj_ext *obj_exts;
size_t i;
flags &= gfp_allowed_mask;
@@ -714,6 +767,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
kmsan_slab_alloc(s, p[i], flags);
+ obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
}
memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 607249785c07..f11cc072b01e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -204,6 +204,53 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
return NULL;
}
+#ifdef CONFIG_SLAB_OBJ_EXT
+/*
+ * The allocated objcg pointers array is not accounted directly.
+ * Moreover, it should not come from DMA buffer and is not readily
+ * reclaimable. So those GFP bits should be masked off.
+ */
+#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
+
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+ gfp_t gfp, bool new_slab)
+{
+ unsigned int objects = objs_per_slab(s, slab);
+ unsigned long obj_exts;
+ void *vec;
+
+ gfp &= ~OBJCGS_CLEAR_MASK;
+ vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
+ slab_nid(slab));
+ if (!vec)
+ return -ENOMEM;
+
+ obj_exts = (unsigned long)vec;
+#ifdef CONFIG_MEMCG
+ obj_exts |= MEMCG_DATA_OBJEXTS;
+#endif
+ if (new_slab) {
+ /*
+ * If the slab is brand new and nobody can yet access its
+ * obj_exts, no synchronization is required and obj_exts can
+ * be simply assigned.
+ */
+ slab->obj_exts = obj_exts;
+ } else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+ /*
+ * If the slab is already in use, somebody can allocate and
+ * assign slabobj_exts in parallel. In this case the existing
+ * objcg vector should be reused.
+ */
+ kfree(vec);
+ return 0;
+ }
+
+ kmemleak_not_leak(vec);
+ return 0;
+}
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
static struct kmem_cache *create_cache(const char *name,
unsigned int object_size, unsigned int align,
slab_flags_t flags, unsigned int useroffset,
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 10/40] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (4 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 08/40] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 11/40] mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache objects Suren Baghdasaryan
` (24 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Slab extension objects can't be allocated before slab infrastructure is
initialized. Some caches, like kmem_cache and kmem_cache_node, are created
before slab infrastructure is initialized. Objects from these caches can't
have extension objects. Introduce SLAB_NO_OBJ_EXT slab flag to mark these
caches and avoid creating extensions for objects allocated from these
slabs.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
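
A rough userspace sketch of the idea above: caches created during
bootstrap carry an extra flag, and code that would create extensions
checks that flag first. The 0x20000000U value is taken from the patch;
SKETCH_HWCACHE_ALIGN, SKETCH_SLAB_OBJ_EXT, struct cache_sketch and
sketch_cache_wants_ext() are hypothetical stand-ins chosen for
illustration.

```c
#include <stdbool.h>

#define SKETCH_SLAB_OBJ_EXT 1	/* stand-in for CONFIG_SLAB_OBJ_EXT */

#if SKETCH_SLAB_OBJ_EXT
#define SKETCH_NO_OBJ_EXT	0x20000000U
#else
/* With the feature compiled out the flag is 0 and tests on it fold away. */
#define SKETCH_NO_OBJ_EXT	0U
#endif

#define SKETCH_HWCACHE_ALIGN	0x00002000U	/* illustrative value */

struct cache_sketch {
	const char *name;
	unsigned int flags;
};

/* True if objects from this cache may get an extension vector. */
bool sketch_cache_wants_ext(const struct cache_sketch *s)
{
	return !(s->flags & SKETCH_NO_OBJ_EXT);
}
```

A boot cache would be created with SKETCH_HWCACHE_ALIGN |
SKETCH_NO_OBJ_EXT and therefore be skipped, while an ordinary cache
created later would not carry the flag.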
include/linux/slab.h | 7 +++++++
mm/slab.c | 2 +-
mm/slub.c | 5 +++--
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..99a146f3cedf 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -147,6 +147,13 @@
#endif
#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */
+#ifdef CONFIG_SLAB_OBJ_EXT
+/* Slab created using create_boot_cache */
+#define SLAB_NO_OBJ_EXT ((slab_flags_t __force)0x20000000U)
+#else
+#define SLAB_NO_OBJ_EXT 0
+#endif
+
/*
* ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
*
diff --git a/mm/slab.c b/mm/slab.c
index bb57f7fdbae1..ccc76f7455e9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1232,7 +1232,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN, 0, 0);
+ SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
list_add(&kmem_cache->list, &slab_caches);
slab_state = PARTIAL;
diff --git a/mm/slub.c b/mm/slub.c
index c87628cd8a9a..507b71372ee4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5020,7 +5020,8 @@ void __init kmem_cache_init(void)
node_set(node, slab_nodes);
create_boot_cache(kmem_cache_node, "kmem_cache_node",
- sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
+ sizeof(struct kmem_cache_node),
+ SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
@@ -5030,7 +5031,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN, 0, 0);
+ SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
kmem_cache = bootstrap(&boot_kmem_cache);
kmem_cache_node = bootstrap(&boot_kmem_cache_node);
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 11/40] mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache objects
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (5 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 10/40] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 12/40] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
` (23 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Use __GFP_NO_OBJ_EXT to prevent recursion when allocating slabobj_ext
objects. Also prevent slabobj_ext allocations for kmem_cache objects.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
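
The recursion being avoided can be shown with a userspace sketch. It
assumes the extension vector is obtained from the same allocator that
triggers its creation, so an unflagged nested request would ask for a
vector of its own, and so on. SKETCH_GFP_NO_OBJ_EXT and its bit value are
hypothetical (the real __GFP_NO_OBJ_EXT is not defined in this part of the
thread), and malloc() stands in for the slab allocator.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_GFP_NO_OBJ_EXT	(1u << 0)	/* hypothetical bit value */

int sketch_depth;	/* current nesting of sketch_alloc() */
int sketch_max_depth;	/* deepest nesting observed so far */

void *sketch_alloc(size_t size, unsigned int gfp)
{
	void *obj;

	if (++sketch_depth > sketch_max_depth)
		sketch_max_depth = sketch_depth;

	obj = malloc(size);
	if (obj && !(gfp & SKETCH_GFP_NO_OBJ_EXT)) {
		/*
		 * The vector comes from the same allocator. Setting the
		 * flag keeps the nested call from requesting its own
		 * vector, which bounds the nesting at two levels.
		 */
		void *vec = sketch_alloc(8 * sizeof(void *),
					 gfp | SKETCH_GFP_NO_OBJ_EXT);

		free(vec);	/* the sketch does not keep the vector */
	}

	sketch_depth--;
	return obj;
}
```

Dropping the "| SKETCH_GFP_NO_OBJ_EXT" in the nested call would make the
function recurse without bound, which is the situation the patch guards
against.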
mm/slab.h | 6 ++++++
mm/slab_common.c | 2 ++
2 files changed, 8 insertions(+)
diff --git a/mm/slab.h b/mm/slab.h
index 25d14b3a7280..b1c22dc87047 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -450,6 +450,12 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
if (!need_slab_obj_ext())
return NULL;
+ if (s->flags & SLAB_NO_OBJ_EXT)
+ return NULL;
+
+ if (flags & __GFP_NO_OBJ_EXT)
+ return NULL;
+
slab = virt_to_slab(p);
if (!slab_obj_exts(slab) &&
WARN(alloc_slab_obj_exts(slab, s, flags, false),
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f11cc072b01e..42777d66d0e3 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -220,6 +220,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
void *vec;
gfp &= ~OBJCGS_CLEAR_MASK;
+ /* Prevent recursive extension vector allocation */
+ gfp |= __GFP_NO_OBJ_EXT;
vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
slab_nid(slab));
if (!vec)
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 12/40] slab: objext: introduce objext_flags as extension to page_memcg_data_flags
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (6 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 11/40] mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache objects Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 13/40] lib: code tagging framework Suren Baghdasaryan
` (22 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Introduce objext_flags to store additional objext flags unrelated to memcg.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
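
As a sketch of how the combined mask falls out of the two enums: the
objext flags start at the first bit not used by the memcg flags, and the
mask covers every bit below the final sentinel. The names below are
hypothetical SKETCH_ stand-ins, and SKETCH_MEMCG plays the role of
CONFIG_MEMCG.

```c
#include <stdint.h>

#define SKETCH_MEMCG 1	/* stand-in for CONFIG_MEMCG */

#if SKETCH_MEMCG
enum sketch_memcg_data_flags {
	SKETCH_DATA_OBJEXTS	= 1UL << 0,
	SKETCH_DATA_KMEM	= 1UL << 1,
	/* the next bit after the last actual flag */
	SKETCH_NR_MEMCG_FLAGS	= 1UL << 2,
};
#define SKETCH_FIRST_OBJEXT_FLAG SKETCH_NR_MEMCG_FLAGS
#else
#define SKETCH_FIRST_OBJEXT_FLAG (1UL << 0)
#endif

enum sketch_objext_flags {
	/*
	 * A future flag would take SKETCH_FIRST_OBJEXT_FLAG here and
	 * push the sentinel below up by one bit.
	 */
	SKETCH_NR_OBJEXTS_FLAGS = SKETCH_FIRST_OBJEXT_FLAG,
};

/* Every bit below the sentinel belongs to some flag. */
#define SKETCH_OBJEXTS_FLAGS_MASK ((uintptr_t)SKETCH_NR_OBJEXTS_FLAGS - 1)

/* Recover the pointer stored in a flag-tagged word. */
void *sketch_strip_flags(uintptr_t word)
{
	return (void *)(word & ~SKETCH_OBJEXTS_FLAGS_MASK);
}
```

With SKETCH_MEMCG set the mask is 3 (two memcg bits, no objext bits yet);
with it cleared the mask is 0, so one masking expression works in both
configurations.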
include/linux/memcontrol.h | 29 ++++++++++++++++++++++-------
mm/slab.h | 4 +---
2 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b9fd9732a52b..5e2da63c525f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -347,7 +347,22 @@ enum page_memcg_data_flags {
__NR_MEMCG_DATA_FLAGS = (1UL << 2),
};
-#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+#define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS
+
+#else /* CONFIG_MEMCG */
+
+#define __FIRST_OBJEXT_FLAG (1UL << 0)
+
+#endif /* CONFIG_MEMCG */
+
+enum objext_flags {
+ /* the next bit after the last actual flag */
+ __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG,
+};
+
+#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
+
+#ifdef CONFIG_MEMCG
static inline bool folio_memcg_kmem(struct folio *folio);
@@ -381,7 +396,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);
- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}
/*
@@ -402,7 +417,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);
- return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}
/*
@@ -459,11 +474,11 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
if (memcg_data & MEMCG_DATA_KMEM) {
struct obj_cgroup *objcg;
- objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
return obj_cgroup_memcg(objcg);
}
- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}
/*
@@ -502,11 +517,11 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
if (memcg_data & MEMCG_DATA_KMEM) {
struct obj_cgroup *objcg;
- objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
return obj_cgroup_memcg(objcg);
}
- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}
static inline struct mem_cgroup *page_memcg_check(struct page *page)
diff --git a/mm/slab.h b/mm/slab.h
index b1c22dc87047..bec202bdcfb8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -409,10 +409,8 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
slab_page(slab));
VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
- return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
-#else
- return (struct slabobj_ext *)obj_exts;
#endif
+ return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 13/40] lib: code tagging framework
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (7 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 12/40] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 14/40] lib: code tagging module support Suren Baghdasaryan
` (21 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Add basic infrastructure to support code tagging, which stores common tag
information consisting of the module name, function, file name and line
number. Provide functions to register a new code tag type and navigate
between code tags.
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
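
The navigation between tags can be sketched in userspace as a walk over an
array of fixed-size records, stepping by the tag size of the registered
type rather than by sizeof(struct codetag). The sketch assumes a tag type
embeds the common part as its first member; struct alloc_tag_sketch and
the sketch_ functions are hypothetical, and an ordinary array stands in
for the ELF section.

```c
#include <stddef.h>

/* Common part shared by every tag type. */
struct codetag_sketch {
	unsigned int flags;
	unsigned int lineno;
	const char *modname;
	const char *function;
	const char *filename;
};

/* Hypothetical tag type: common part first, type-specific data after. */
struct alloc_tag_sketch {
	struct codetag_sketch ct;
	unsigned long bytes;
};

/* Half-open range [start, stop) of tags belonging to one module. */
struct codetag_range_sketch {
	struct codetag_sketch *start;
	struct codetag_sketch *stop;
};

struct codetag_sketch *sketch_first_ct(const struct codetag_range_sketch *r)
{
	return r->start < r->stop ? r->start : NULL;
}

/* Step by tag_size bytes, since each record is larger than the common part. */
struct codetag_sketch *sketch_next_ct(const struct codetag_range_sketch *r,
				      struct codetag_sketch *ct,
				      size_t tag_size)
{
	struct codetag_sketch *res =
		(struct codetag_sketch *)((char *)ct + tag_size);

	return res < r->stop ? res : NULL;
}

size_t sketch_count_tags(const struct codetag_range_sketch *r, size_t tag_size)
{
	struct codetag_sketch *ct;
	size_t n = 0;

	for (ct = sketch_first_ct(r); ct; ct = sketch_next_ct(r, ct, tag_size))
		n++;
	return n;
}
```

Carrying the tag size alongside the range is what lets one iterator serve
tag types of different sizes.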
include/linux/codetag.h | 71 ++++++++++++++
lib/Kconfig.debug | 4 +
lib/Makefile | 1 +
lib/codetag.c | 199 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 275 insertions(+)
create mode 100644 include/linux/codetag.h
create mode 100644 lib/codetag.c
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
new file mode 100644
index 000000000000..a9d7adecc2a5
--- /dev/null
+++ b/include/linux/codetag.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * code tagging framework
+ */
+#ifndef _LINUX_CODETAG_H
+#define _LINUX_CODETAG_H
+
+#include <linux/types.h>
+
+struct codetag_iterator;
+struct codetag_type;
+struct seq_buf;
+struct module;
+
+/*
+ * An instance of this structure is created in a special ELF section at every
+ * code location being tagged. At runtime, the special section is treated as
+ * an array of these.
+ */
+struct codetag {
+ unsigned int flags; /* used in later patches */
+ unsigned int lineno;
+ const char *modname;
+ const char *function;
+ const char *filename;
+} __aligned(8);
+
+union codetag_ref {
+ struct codetag *ct;
+};
+
+struct codetag_range {
+ struct codetag *start;
+ struct codetag *stop;
+};
+
+struct codetag_module {
+ struct module *mod;
+ struct codetag_range range;
+};
+
+struct codetag_type_desc {
+ const char *section;
+ size_t tag_size;
+};
+
+struct codetag_iterator {
+ struct codetag_type *cttype;
+ struct codetag_module *cmod;
+ unsigned long mod_id;
+ struct codetag *ct;
+};
+
+#define CODE_TAG_INIT { \
+ .modname = KBUILD_MODNAME, \
+ .function = __func__, \
+ .filename = __FILE__, \
+ .lineno = __LINE__, \
+ .flags = 0, \
+}
+
+void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
+struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
+struct codetag *codetag_next_ct(struct codetag_iterator *iter);
+
+void codetag_to_text(struct seq_buf *out, struct codetag *ct);
+
+struct codetag_type *
+codetag_register_type(const struct codetag_type_desc *desc);
+
+#endif /* _LINUX_CODETAG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ce51d4dc6803..5078da7d3ffb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -957,6 +957,10 @@ config DEBUG_STACKOVERFLOW
If in doubt, say "N".
+config CODE_TAGGING
+ bool
+ select KALLSYMS
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/Makefile b/lib/Makefile
index 293a0858a3f8..28d70ecf2976 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -228,6 +228,7 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \
of-reconfig-notifier-error-inject.o
obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+obj-$(CONFIG_CODE_TAGGING) += codetag.o
lib-$(CONFIG_GENERIC_BUG) += bug.o
obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/codetag.c b/lib/codetag.c
new file mode 100644
index 000000000000..7708f8388e55
--- /dev/null
+++ b/lib/codetag.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/codetag.h>
+#include <linux/idr.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
+#include <linux/seq_buf.h>
+#include <linux/slab.h>
+
+struct codetag_type {
+ struct list_head link;
+ unsigned int count;
+ struct idr mod_idr;
+ struct rw_semaphore mod_lock; /* protects mod_idr */
+ struct codetag_type_desc desc;
+};
+
+static DEFINE_MUTEX(codetag_lock);
+static LIST_HEAD(codetag_types);
+
+void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
+{
+ if (lock)
+ down_read(&cttype->mod_lock);
+ else
+ up_read(&cttype->mod_lock);
+}
+
+struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
+{
+ struct codetag_iterator iter = {
+ .cttype = cttype,
+ .cmod = NULL,
+ .mod_id = 0,
+ .ct = NULL,
+ };
+
+ return iter;
+}
+
+static inline struct codetag *get_first_module_ct(struct codetag_module *cmod)
+{
+ return cmod->range.start < cmod->range.stop ? cmod->range.start : NULL;
+}
+
+static inline
+struct codetag *get_next_module_ct(struct codetag_iterator *iter)
+{
+ struct codetag *res = (struct codetag *)
+ ((char *)iter->ct + iter->cttype->desc.tag_size);
+
+ return res < iter->cmod->range.stop ? res : NULL;
+}
+
+struct codetag *codetag_next_ct(struct codetag_iterator *iter)
+{
+ struct codetag_type *cttype = iter->cttype;
+ struct codetag_module *cmod;
+ struct codetag *ct;
+
+ lockdep_assert_held(&cttype->mod_lock);
+
+ if (unlikely(idr_is_empty(&cttype->mod_idr)))
+ return NULL;
+
+ ct = NULL;
+ while (true) {
+ cmod = idr_find(&cttype->mod_idr, iter->mod_id);
+
+ /* If module was removed move to the next one */
+ if (!cmod)
+ cmod = idr_get_next_ul(&cttype->mod_idr,
+ &iter->mod_id);
+
+ /* Exit if no more modules */
+ if (!cmod)
+ break;
+
+ if (cmod != iter->cmod) {
+ iter->cmod = cmod;
+ ct = get_first_module_ct(cmod);
+ } else
+ ct = get_next_module_ct(iter);
+
+ if (ct)
+ break;
+
+ iter->mod_id++;
+ }
+
+ iter->ct = ct;
+ return ct;
+}
+
+void codetag_to_text(struct seq_buf *out, struct codetag *ct)
+{
+ seq_buf_printf(out, "%s:%u module:%s func:%s",
+ ct->filename, ct->lineno,
+ ct->modname, ct->function);
+}
+
+static inline size_t range_size(const struct codetag_type *cttype,
+ const struct codetag_range *range)
+{
+ return ((char *)range->stop - (char *)range->start) /
+ cttype->desc.tag_size;
+}
+
+static void *get_symbol(struct module *mod, const char *prefix, const char *name)
+{
+ char buf[64];
+ int res;
+
+ res = snprintf(buf, sizeof(buf), "%s%s", prefix, name);
+ if (WARN_ON(res < 1 || res > sizeof(buf)))
+ return NULL;
+
+ return mod ?
+ (void *)find_kallsyms_symbol_value(mod, buf) :
+ (void *)kallsyms_lookup_name(buf);
+}
+
+static struct codetag_range get_section_range(struct module *mod,
+ const char *section)
+{
+ return (struct codetag_range) {
+ get_symbol(mod, "__start_", section),
+ get_symbol(mod, "__stop_", section),
+ };
+}
+
+static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
+{
+ struct codetag_range range;
+ struct codetag_module *cmod;
+ int err;
+
+ range = get_section_range(mod, cttype->desc.section);
+ if (!range.start || !range.stop) {
+ pr_warn("Failed to load code tags of type %s from the module %s\n",
+ cttype->desc.section,
+ mod ? mod->name : "(built-in)");
+ return -EINVAL;
+ }
+
+ /* Ignore empty ranges */
+ if (range.start == range.stop)
+ return 0;
+
+ BUG_ON(range.start > range.stop);
+
+ cmod = kmalloc(sizeof(*cmod), GFP_KERNEL);
+ if (unlikely(!cmod))
+ return -ENOMEM;
+
+ cmod->mod = mod;
+ cmod->range = range;
+
+ down_write(&cttype->mod_lock);
+ err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
+ if (err >= 0)
+ cttype->count += range_size(cttype, &range);
+ up_write(&cttype->mod_lock);
+
+ if (err < 0) {
+ kfree(cmod);
+ return err;
+ }
+
+ return 0;
+}
+
+struct codetag_type *
+codetag_register_type(const struct codetag_type_desc *desc)
+{
+ struct codetag_type *cttype;
+ int err;
+
+ BUG_ON(desc->tag_size <= 0);
+
+ cttype = kzalloc(sizeof(*cttype), GFP_KERNEL);
+ if (unlikely(!cttype))
+ return ERR_PTR(-ENOMEM);
+
+ cttype->desc = *desc;
+ idr_init(&cttype->mod_idr);
+ init_rwsem(&cttype->mod_lock);
+
+ err = codetag_module_init(cttype, NULL);
+ if (unlikely(err)) {
+ kfree(cttype);
+ return ERR_PTR(err);
+ }
+
+ mutex_lock(&codetag_lock);
+ list_add_tail(&cttype->link, &codetag_types);
+ mutex_unlock(&codetag_lock);
+
+ return cttype;
+}
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 14/40] lib: code tagging module support
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (8 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 13/40] lib: code tagging framework Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 15/40] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
` (20 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Add support for code tagging from dynamically loaded modules.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
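Illustration only, not part of the patch: a minimal userspace sketch of the
per-type load/unload hooks this patch adds. Every name here (tag_type,
tag_module, tag_type_load, tag_type_unload, MAX_MODS) is hypothetical. A fixed
array stands in for the kernel's IDR and no locking is modeled.

```c
/* Userspace sketch; all identifiers are made up for illustration. */
#include <stddef.h>

#define MAX_MODS 8

struct tag_range { const char *start, *stop; };

struct tag_module {
	const void *mod;		/* stand-in for struct module * */
	struct tag_range range;		/* this module's slice of the tag section */
	int used;
};

struct tag_type {
	size_t tag_size;		/* size of one tag in bytes */
	size_t count;			/* total tags over all registered modules */
	struct tag_module mods[MAX_MODS];
	void (*module_load)(struct tag_type *t, struct tag_module *m);
	void (*module_unload)(struct tag_type *t, struct tag_module *m);
};

static size_t range_size(const struct tag_type *t, const struct tag_range *r)
{
	return (size_t)(r->stop - r->start) / t->tag_size;
}

/* Register a module's tags; returns the slot index, or -1 if nothing was stored. */
static int tag_type_load(struct tag_type *t, const void *mod, struct tag_range r)
{
	if (r.start == r.stop)		/* empty ranges are not registered */
		return -1;

	for (int i = 0; i < MAX_MODS; i++) {
		if (t->mods[i].used)
			continue;
		t->mods[i] = (struct tag_module){ mod, r, 1 };
		t->count += range_size(t, &r);
		if (t->module_load)
			t->module_load(t, &t->mods[i]);
		return i;
	}
	return -1;			/* table full */
}

/* Drop a module's tags; returns 1 if the module was found, 0 otherwise. */
static int tag_type_unload(struct tag_type *t, const void *mod)
{
	for (int i = 0; i < MAX_MODS; i++) {
		struct tag_module *m = &t->mods[i];

		if (!m->used || m->mod != mod)
			continue;
		if (t->module_unload)
			t->module_unload(t, m);
		t->count -= range_size(t, &m->range);
		m->used = 0;
		return 1;
	}
	return 0;
}
```

The sketch keeps the same ordering as the hunks below: the load callback runs
only after the module is recorded and counted, and the unload callback runs
before the entry is removed.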
include/linux/codetag.h | 12 +++++++++
kernel/module/main.c | 4 +++
lib/codetag.c | 58 +++++++++++++++++++++++++++++++++++++++--
3 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index a9d7adecc2a5..386733e89b31 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -42,6 +42,10 @@ struct codetag_module {
struct codetag_type_desc {
const char *section;
size_t tag_size;
+ void (*module_load)(struct codetag_type *cttype,
+ struct codetag_module *cmod);
+ void (*module_unload)(struct codetag_type *cttype,
+ struct codetag_module *cmod);
};
struct codetag_iterator {
@@ -68,4 +72,12 @@ void codetag_to_text(struct seq_buf *out, struct codetag *ct);
struct codetag_type *
codetag_register_type(const struct codetag_type_desc *desc);
+#ifdef CONFIG_CODE_TAGGING
+void codetag_load_module(struct module *mod);
+void codetag_unload_module(struct module *mod);
+#else
+static inline void codetag_load_module(struct module *mod) {}
+static inline void codetag_unload_module(struct module *mod) {}
+#endif
+
#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 044aa2c9e3cb..4232e7bff549 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -56,6 +56,7 @@
#include <linux/dynamic_debug.h>
#include <linux/audit.h>
#include <linux/cfi.h>
+#include <linux/codetag.h>
#include <linux/debugfs.h>
#include <uapi/linux/module.h>
#include "internal.h"
@@ -1249,6 +1250,7 @@ static void free_module(struct module *mod)
{
trace_module_free(mod);
+ codetag_unload_module(mod);
mod_sysfs_teardown(mod);
/*
@@ -2974,6 +2976,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
/* Get rid of temporary copy. */
free_copy(info, flags);
+ codetag_load_module(mod);
+
/* Done! */
trace_module_load(mod);
diff --git a/lib/codetag.c b/lib/codetag.c
index 7708f8388e55..4ea57fb37346 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -108,15 +108,20 @@ static inline size_t range_size(const struct codetag_type *cttype,
static void *get_symbol(struct module *mod, const char *prefix, const char *name)
{
char buf[64];
+ void *ret;
int res;
res = snprintf(buf, sizeof(buf), "%s%s", prefix, name);
if (WARN_ON(res < 1 || res > sizeof(buf)))
return NULL;
- return mod ?
+ preempt_disable();
+ ret = mod ?
(void *)find_kallsyms_symbol_value(mod, buf) :
(void *)kallsyms_lookup_name(buf);
+ preempt_enable();
+
+ return ret;
}
static struct codetag_range get_section_range(struct module *mod,
@@ -157,8 +162,11 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
down_write(&cttype->mod_lock);
err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
- if (err >= 0)
+ if (err >= 0) {
cttype->count += range_size(cttype, &range);
+ if (cttype->desc.module_load)
+ cttype->desc.module_load(cttype, cmod);
+ }
up_write(&cttype->mod_lock);
if (err < 0) {
@@ -197,3 +205,49 @@ codetag_register_type(const struct codetag_type_desc *desc)
return cttype;
}
+
+void codetag_load_module(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ if (!mod)
+ return;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link)
+ codetag_module_init(cttype, mod);
+ mutex_unlock(&codetag_lock);
+}
+
+void codetag_unload_module(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ if (!mod)
+ return;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ struct codetag_module *found = NULL;
+ struct codetag_module *cmod;
+ unsigned long mod_id, tmp;
+
+ down_write(&cttype->mod_lock);
+ idr_for_each_entry_ul(&cttype->mod_idr, cmod, tmp, mod_id) {
+ if (cmod->mod && cmod->mod == mod) {
+ found = cmod;
+ break;
+ }
+ }
+ if (found) {
+ if (cttype->desc.module_unload)
+ cttype->desc.module_unload(cttype, cmod);
+
+ cttype->count -= range_size(cttype, &cmod->range);
+ idr_remove(&cttype->mod_idr, mod_id);
+ kfree(cmod);
+ }
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 15/40] lib: prevent module unloading if memory is not freed
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (9 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 14/40] lib: code tagging module support Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 16/40] lib: code tagging query helper functions Suren Baghdasaryan
` (19 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Skip freeing a module's data section if any of its allocation tags still
has a non-zero count. Otherwise, once those allocations are freed, accessing
their code tags would be a use-after-free (UAF).
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
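Illustration only, not part of the patch: a small userspace sketch of the
"leak the data section rather than risk a UAF" decision described above. The
types and helpers (demo_tag, demo_module, demo_tags_unused, demo_module_free)
are hypothetical; malloc()/free() stand in for module memory.

```c
/* Userspace sketch; all identifiers are made up for illustration. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_tag {
	const char *site;		/* e.g. "file.c:123" */
	long bytes_outstanding;		/* bytes allocated and not yet freed */
};

struct demo_module {
	void *data;			/* stand-in for the module's data memory */
	struct demo_tag *tags;		/* tags live inside that data memory */
	size_t nr_tags;
};

/* True when no tag of this module has outstanding allocations. */
static bool demo_tags_unused(const struct demo_module *m)
{
	for (size_t i = 0; i < m->nr_tags; i++)
		if (m->tags[i].bytes_outstanding != 0)
			return false;
	return true;
}

/*
 * Free the module's data only when it is safe to do so. If live allocations
 * still point at tags inside the data, keep it around (a deliberate leak) so
 * that freeing those allocations later does not touch freed memory.
 * Returns true if the data was freed.
 */
static bool demo_module_free(struct demo_module *m)
{
	bool unload_ok = demo_tags_unused(m);

	if (unload_ok) {
		free(m->data);
		m->data = NULL;
		m->tags = NULL;
		m->nr_tags = 0;
	}
	return unload_ok;
}
```

The trade-off is the same one the patch makes: a bounded memory leak at
unload time is preferred over dangling tag pointers.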
include/linux/codetag.h | 6 +++---
kernel/module/main.c | 23 +++++++++++++++--------
lib/codetag.c | 11 ++++++++---
3 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index 386733e89b31..d98e4c8e86f0 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -44,7 +44,7 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- void (*module_unload)(struct codetag_type *cttype,
+ bool (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
};
@@ -74,10 +74,10 @@ codetag_register_type(const struct codetag_type_desc *desc);
#ifdef CONFIG_CODE_TAGGING
void codetag_load_module(struct module *mod);
-void codetag_unload_module(struct module *mod);
+bool codetag_unload_module(struct module *mod);
#else
static inline void codetag_load_module(struct module *mod) {}
-static inline void codetag_unload_module(struct module *mod) {}
+static inline bool codetag_unload_module(struct module *mod) { return true; }
#endif
#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 4232e7bff549..9ff56f2bb09d 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1218,15 +1218,19 @@ static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
return module_alloc(size);
}
-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(void *ptr, enum mod_mem_type type,
+ bool unload_codetags)
{
+ if (!unload_codetags && mod_mem_type_is_core_data(type))
+ return;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
module_memfree(ptr);
}
-static void free_mod_mem(struct module *mod)
+static void free_mod_mem(struct module *mod, bool unload_codetags)
{
for_each_mod_mem_type(type) {
struct module_memory *mod_mem = &mod->mem[type];
@@ -1237,20 +1241,23 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
- module_memory_free(mod_mem->base, type);
+ module_memory_free(mod_mem->base, type,
+ unload_codetags);
}
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+ module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA, unload_codetags);
}
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
+ bool unload_codetags;
+
trace_module_free(mod);
- codetag_unload_module(mod);
+ unload_codetags = codetag_unload_module(mod);
mod_sysfs_teardown(mod);
/*
@@ -1292,7 +1299,7 @@ static void free_module(struct module *mod)
kfree(mod->args);
percpu_modfree(mod);
- free_mod_mem(mod);
+ free_mod_mem(mod, unload_codetags);
}
void *__symbol_get(const char *symbol)
@@ -2294,7 +2301,7 @@ static int move_module(struct module *mod, struct load_info *info)
return 0;
out_enomem:
for (t--; t >= 0; t--)
- module_memory_free(mod->mem[t].base, t);
+ module_memory_free(mod->mem[t].base, t, true);
return ret;
}
@@ -2424,7 +2431,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
percpu_modfree(mod);
module_arch_freeing_init(mod);
- free_mod_mem(mod);
+ free_mod_mem(mod, true);
}
int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/codetag.c b/lib/codetag.c
index 4ea57fb37346..0ad4ea66c769 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -5,6 +5,7 @@
#include <linux/module.h>
#include <linux/seq_buf.h>
#include <linux/slab.h>
+#include <linux/vmalloc.h>
struct codetag_type {
struct list_head link;
@@ -219,12 +220,13 @@ void codetag_load_module(struct module *mod)
mutex_unlock(&codetag_lock);
}
-void codetag_unload_module(struct module *mod)
+bool codetag_unload_module(struct module *mod)
{
struct codetag_type *cttype;
+ bool unload_ok = true;
if (!mod)
- return;
+ return true;
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
@@ -241,7 +243,8 @@ void codetag_unload_module(struct module *mod)
}
if (found) {
if (cttype->desc.module_unload)
- cttype->desc.module_unload(cttype, cmod);
+ if (!cttype->desc.module_unload(cttype, cmod))
+ unload_ok = false;
cttype->count -= range_size(cttype, &cmod->range);
idr_remove(&cttype->mod_idr, mod_id);
@@ -250,4 +253,6 @@ void codetag_unload_module(struct module *mod)
up_write(&cttype->mod_lock);
}
mutex_unlock(&codetag_lock);
+
+ return unload_ok;
}
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 16/40] lib: code tagging query helper functions
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (10 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 15/40] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 17/40] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
` (18 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
Provide codetag_query_parse() to parse codetag queries and
codetag_matches_query() to check if the query affects a given codetag.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
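Illustration only, not part of the patch: a userspace sketch of how a
"line 10-20" style range could be parsed and matched, loosely following
parse_range() in the diff below. strtoul() is used as a stand-in for
kstrtouint(), and the sketch is stricter than the kernel helper (for example
it rejects a leading '+' and a trailing newline). parse_uint() and
line_in_range() are hypothetical helpers.

```c
/* Userspace sketch; helper names are made up for illustration. */
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse a plain decimal unsigned int; returns 0 on success, -1 on error. */
static int parse_uint(const char *s, unsigned int *out)
{
	char *end;
	unsigned long v;

	if (*s < '0' || *s > '9')	/* reject empty input, signs, whitespace */
		return -1;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end != '\0' || v > UINT_MAX)
		return -1;

	*out = (unsigned int)v;
	return 0;
}

/*
 * Parse "N" or "N-M" in place (the '-' is overwritten with '\0').
 * A single number yields first == last.
 */
static int parse_range(char *str, unsigned int *first, unsigned int *last)
{
	char *dash = strchr(str, '-');

	if (dash)
		*dash++ = '\0';

	if (parse_uint(str, first))
		return -1;

	if (!dash) {
		*last = *first;
		return 0;
	}
	return parse_uint(dash, last);
}

/* Inclusive range check, as used for the line-number match. */
static int line_in_range(unsigned int line, unsigned int first,
			 unsigned int last)
{
	return line >= first && line <= last;
}
```

Note that the input buffer is modified, which is also why
codetag_query_parse() takes a non-const char *.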
include/linux/codetag.h | 27 ++++++++
lib/codetag.c | 135 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 162 insertions(+)
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index d98e4c8e86f0..87207f199ac9 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -80,4 +80,31 @@ static inline void codetag_load_module(struct module *mod) {}
static inline bool codetag_unload_module(struct module *mod) { return true; }
#endif
+/* Codetag query parsing */
+
+struct codetag_query {
+ const char *filename;
+ const char *module;
+ const char *function;
+ const char *class;
+ unsigned int first_line, last_line;
+ unsigned int first_index, last_index;
+ unsigned int cur_index;
+
+ bool match_line:1;
+ bool match_index:1;
+
+ unsigned int set_enabled:1;
+ unsigned int enabled:2;
+
+ unsigned int set_frequency:1;
+ unsigned int frequency;
+};
+
+char *codetag_query_parse(struct codetag_query *q, char *buf);
+bool codetag_matches_query(struct codetag_query *q,
+ const struct codetag *ct,
+ const struct codetag_module *mod,
+ const char *class);
+
#endif /* _LINUX_CODETAG_H */
diff --git a/lib/codetag.c b/lib/codetag.c
index 0ad4ea66c769..84f90f3b922c 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -256,3 +256,138 @@ bool codetag_unload_module(struct module *mod)
return unload_ok;
}
+
+/* Codetag query parsing */
+
+#define CODETAG_QUERY_TOKENS() \
+ x(func) \
+ x(file) \
+ x(line) \
+ x(module) \
+ x(class) \
+ x(index)
+
+enum tokens {
+#define x(name) TOK_##name,
+ CODETAG_QUERY_TOKENS()
+#undef x
+};
+
+static const char * const token_strs[] = {
+#define x(name) #name,
+ CODETAG_QUERY_TOKENS()
+#undef x
+ NULL
+};
+
+static int parse_range(char *str, unsigned int *first, unsigned int *last)
+{
+ char *first_str = str;
+ char *last_str = strchr(first_str, '-');
+
+ if (last_str)
+ *last_str++ = '\0';
+
+ if (kstrtouint(first_str, 10, first))
+ return -EINVAL;
+
+ if (!last_str)
+ *last = *first;
+ else if (kstrtouint(last_str, 10, last))
+ return -EINVAL;
+
+ return 0;
+}
+
+char *codetag_query_parse(struct codetag_query *q, char *buf)
+{
+ while (1) {
+ char *p = buf;
+ char *str1 = strsep_no_empty(&p, " \t\r\n");
+ char *str2 = strsep_no_empty(&p, " \t\r\n");
+ int ret, token;
+
+ if (!str1 || !str2)
+ break;
+
+ token = match_string(token_strs, ARRAY_SIZE(token_strs), str1);
+ if (token < 0)
+ break;
+
+ switch (token) {
+ case TOK_func:
+ q->function = str2;
+ break;
+ case TOK_file:
+ q->filename = str2;
+ break;
+ case TOK_line:
+ ret = parse_range(str2, &q->first_line, &q->last_line);
+ if (ret)
+ return ERR_PTR(ret);
+ q->match_line = true;
+ break;
+ case TOK_module:
+ q->module = str2;
+ break;
+ case TOK_class:
+ q->class = str2;
+ break;
+ case TOK_index:
+ ret = parse_range(str2, &q->first_index, &q->last_index);
+ if (ret)
+ return ERR_PTR(ret);
+ q->match_index = true;
+ break;
+ }
+
+ buf = p;
+ }
+
+ return buf;
+}
+
+bool codetag_matches_query(struct codetag_query *q,
+ const struct codetag *ct,
+ const struct codetag_module *mod,
+ const char *class)
+{
+ size_t classlen = q->class ? strlen(q->class) : 0;
+
+ if (q->module &&
+ (!mod->mod ||
+ strcmp(q->module, ct->modname)))
+ return false;
+
+ if (q->filename &&
+ strcmp(q->filename, ct->filename) &&
+ strcmp(q->filename, kbasename(ct->filename)))
+ return false;
+
+ if (q->function &&
+ strcmp(q->function, ct->function))
+ return false;
+
+ /* match against the line number range */
+ if (q->match_line &&
+ (ct->lineno < q->first_line ||
+ ct->lineno > q->last_line))
+ return false;
+
+ /* match against the class */
+ if (classlen &&
+ (strncmp(q->class, class, classlen) ||
+ (class[classlen] && class[classlen] != ':')))
+ return false;
+
+ /* match against the fault index */
+ if (q->match_index &&
+ (q->cur_index < q->first_index ||
+ q->cur_index > q->last_index)) {
+ q->cur_index++;
+ return false;
+ }
+
+ q->cur_index++;
+ return true;
+}
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 17/40] lib: add allocation tagging support for memory allocation profiling
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (11 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 16/40] lib: code tagging query helper functions Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 18/40] lib: introduce support for page allocation tagging Suren Baghdasaryan
` (17 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily
instrument memory allocators. It also registers an "alloc_tags" codetag
type with an "allocations" debugfs interface to output allocation tag
information.
CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory
allocation profiling instrumentation.
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
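Illustration only, not part of the patch: a userspace sketch of the
save/add/restore/sub flow that the alloc_tag helpers below implement. A plain
long replaces the lazy percpu counter and a global pointer replaces
current->alloc_tag; every demo_* name is hypothetical.

```c
/* Userspace sketch; all identifiers are made up for illustration. */
#include <stddef.h>

struct demo_alloc_tag {
	const char *file;
	unsigned int line;
	long bytes;			/* bytes currently charged to this call site */
};

/* Per-object back-pointer to the tag that was charged for the object. */
struct demo_ref {
	struct demo_alloc_tag *tag;
};

/* Stand-in for current->alloc_tag. */
static struct demo_alloc_tag *demo_current_tag;

/* Install a call site's tag and hand back the previous one. */
static struct demo_alloc_tag *demo_tag_save(struct demo_alloc_tag *tag)
{
	struct demo_alloc_tag *old = demo_current_tag;

	demo_current_tag = tag;
	return old;
}

/* Put the previous tag back once the allocation call returns. */
static void demo_tag_restore(struct demo_alloc_tag *old)
{
	demo_current_tag = old;
}

/* Charge an allocation to a tag and remember the tag in the object. */
static void demo_tag_add(struct demo_ref *ref, struct demo_alloc_tag *tag,
			 size_t bytes)
{
	if (!ref || !tag)
		return;

	ref->tag = tag;
	tag->bytes += (long)bytes;
}

/* Uncharge on free and clear the back-pointer. */
static void demo_tag_sub(struct demo_ref *ref, size_t bytes)
{
	if (!ref || !ref->tag)
		return;

	ref->tag->bytes -= (long)bytes;
	ref->tag = NULL;
}
```

Because each object carries a reference to its tag, the free path does not
need to know which call site made the allocation.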
.../admin-guide/kernel-parameters.txt | 2 +
include/asm-generic/codetag.lds.h | 14 ++
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/alloc_tag.h | 105 +++++++++++
include/linux/sched.h | 24 +++
lib/Kconfig.debug | 19 ++
lib/Makefile | 2 +
lib/alloc_tag.c | 177 ++++++++++++++++++
scripts/module.lds.S | 7 +
9 files changed, 353 insertions(+)
create mode 100644 include/asm-generic/codetag.lds.h
create mode 100644 include/linux/alloc_tag.h
create mode 100644 lib/alloc_tag.c
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..2fd8e56b7af8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3770,6 +3770,8 @@
nomce [X86-32] Disable Machine Check Exception
+ nomem_profiling Disable memory allocation profiling.
+
nomfgpt [X86-32] Disable Multi-Function General Purpose
Timer usage (for AMD Geode machines).
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
new file mode 100644
index 000000000000..64f536b80380
--- /dev/null
+++ b/include/asm-generic/codetag.lds.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_CODETAG_LDS_H
+#define __ASM_GENERIC_CODETAG_LDS_H
+
+#define SECTION_WITH_BOUNDARIES(_name) \
+ . = ALIGN(8); \
+ __start_##_name = .; \
+ KEEP(*(_name)) \
+ __stop_##_name = .;
+
+#define CODETAG_SECTIONS() \
+ SECTION_WITH_BOUNDARIES(alloc_tags)
+
+#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index d1f57e4868ed..985ff045c2a2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -50,6 +50,8 @@
* [__nosave_begin, __nosave_end] for the nosave data
*/
+#include <asm-generic/codetag.lds.h>
+
#ifndef LOAD_OFFSET
#define LOAD_OFFSET 0
#endif
@@ -374,6 +376,7 @@
. = ALIGN(8); \
BOUNDED_SECTION_BY(__dyndbg_classes, ___dyndbg_classes) \
BOUNDED_SECTION_BY(__dyndbg, ___dyndbg) \
+ CODETAG_SECTIONS() \
LIKELY_PROFILE() \
BRANCH_PROFILE() \
TRACE_PRINTKS() \
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
new file mode 100644
index 000000000000..d913f8d9a7d8
--- /dev/null
+++ b/include/linux/alloc_tag.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * allocation tagging
+ */
+#ifndef _LINUX_ALLOC_TAG_H
+#define _LINUX_ALLOC_TAG_H
+
+#include <linux/bug.h>
+#include <linux/codetag.h>
+#include <linux/container_of.h>
+#include <linux/lazy-percpu-counter.h>
+#include <linux/static_key.h>
+
+/*
+ * An instance of this structure is created in a special ELF section at every
+ * allocation callsite. At runtime, the special section is treated as
+ * an array of these. The embedded codetag plugs into the codetag framework.
+ */
+struct alloc_tag {
+ struct codetag ct;
+ struct lazy_percpu_counter bytes_allocated;
+} __aligned(8);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
+{
+ return container_of(ct, struct alloc_tag, ct);
+}
+
+#define DEFINE_ALLOC_TAG(_alloc_tag, _old) \
+ static struct alloc_tag _alloc_tag __used __aligned(8) \
+ __section("alloc_tags") = { .ct = CODE_TAG_INIT }; \
+ struct alloc_tag * __maybe_unused _old = alloc_tag_save(&_alloc_tag)
+
+extern struct static_key_true mem_alloc_profiling_key;
+
+static inline bool mem_alloc_profiling_enabled(void)
+{
+ return static_branch_likely(&mem_alloc_profiling_key);
+}
+
+static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes,
+ bool may_allocate)
+{
+ struct alloc_tag *tag;
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ /* The switch should be checked before this */
+ BUG_ON(!mem_alloc_profiling_enabled());
+
+ WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
+#endif
+ if (!ref || !ref->ct)
+ return;
+
+ tag = ct_to_alloc_tag(ref->ct);
+
+ if (may_allocate)
+ lazy_percpu_counter_add(&tag->bytes_allocated, -bytes);
+ else
+ lazy_percpu_counter_add_noupgrade(&tag->bytes_allocated, -bytes);
+ ref->ct = NULL;
+}
+
+static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
+{
+ __alloc_tag_sub(ref, bytes, true);
+}
+
+static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
+{
+ __alloc_tag_sub(ref, bytes, false);
+}
+
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ /* The switch should be checked before this */
+ BUG_ON(!mem_alloc_profiling_enabled());
+
+ WARN_ONCE(ref && ref->ct,
+ "alloc_tag was not cleared (got tag for %s:%u)\n",
+ ref->ct->filename, ref->ct->lineno);
+
+ WARN_ONCE(!tag, "current->alloc_tag not set\n");
+#endif
+ if (!ref || !tag)
+ return;
+
+ ref->ct = &tag->ct;
+ lazy_percpu_counter_add(&tag->bytes_allocated, bytes);
+}
+
+#else
+
+#define DEFINE_ALLOC_TAG(_alloc_tag, _old)
+static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
+static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
+ size_t bytes) {}
+
+#endif
+
+#endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 35e7efdea2d9..33708bf8f191 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -763,6 +763,10 @@ struct task_struct {
unsigned int flags;
unsigned int ptrace;
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ struct alloc_tag *alloc_tag;
+#endif
+
#ifdef CONFIG_SMP
int on_cpu;
struct __call_single_node wake_entry;
@@ -802,6 +806,7 @@ struct task_struct {
struct task_group *sched_task_group;
#endif
+
#ifdef CONFIG_UCLAMP_TASK
/*
* Clamp values requested for a scheduling entity.
@@ -2444,4 +2449,23 @@ static inline void sched_core_fork(struct task_struct *p) { }
extern void sched_set_stop_task(int cpu, struct task_struct *stop);
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
+{
+ swap(current->alloc_tag, tag);
+ return tag;
+}
+
+static inline void alloc_tag_restore(struct alloc_tag *tag, struct alloc_tag *old)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ WARN(current->alloc_tag != tag, "current->alloc_tag was changed:\n");
+#endif
+ current->alloc_tag = old;
+}
+#else
+static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag) { return NULL; }
+#define alloc_tag_restore(_tag, _old)
+#endif
+
#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5078da7d3ffb..da0a91ea6042 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -961,6 +961,25 @@ config CODE_TAGGING
bool
select KALLSYMS
+config MEM_ALLOC_PROFILING
+ bool "Enable memory allocation profiling"
+ default n
+ depends on DEBUG_FS
+ select CODE_TAGGING
+ select LAZY_PERCPU_COUNTER
+ help
+ Track allocation source code and record total allocation size
+ initiated at that code location. The mechanism can be used to track
+ memory leaks with a low performance impact.
+
+config MEM_ALLOC_PROFILING_DEBUG
+ bool "Memory allocation profiler debugging"
+ default n
+ depends on MEM_ALLOC_PROFILING
+ help
+ Adds warnings with helpful error messages for memory allocation
+ profiling.
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/Makefile b/lib/Makefile
index 28d70ecf2976..8d09ccb4d30c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -229,6 +229,8 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \
obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
obj-$(CONFIG_CODE_TAGGING) += codetag.o
+obj-$(CONFIG_MEM_ALLOC_PROFILING) += alloc_tag.o
+
lib-$(CONFIG_GENERIC_BUG) += bug.o
obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
new file mode 100644
index 000000000000..3c4cfeb79862
--- /dev/null
+++ b/lib/alloc_tag.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/alloc_tag.h>
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/seq_buf.h>
+#include <linux/uaccess.h>
+
+DEFINE_STATIC_KEY_TRUE(mem_alloc_profiling_key);
+
+/*
+ * Won't need to be exported once page allocation accounting is moved to the
+ * correct place:
+ */
+EXPORT_SYMBOL(mem_alloc_profiling_key);
+
+static int __init mem_alloc_profiling_disable(char *s)
+{
+ static_branch_disable(&mem_alloc_profiling_key);
+ return 1;
+}
+__setup("nomem_profiling", mem_alloc_profiling_disable);
+
+struct alloc_tag_file_iterator {
+ struct codetag_iterator ct_iter;
+ struct seq_buf buf;
+ char rawbuf[4096];
+};
+
+struct user_buf {
+ char __user *buf; /* destination user buffer */
+ size_t size; /* size of requested read */
+ ssize_t ret; /* bytes read so far */
+};
+
+static int flush_ubuf(struct user_buf *dst, struct seq_buf *src)
+{
+ if (src->len) {
+ size_t bytes = min_t(size_t, src->len, dst->size);
+ int err = copy_to_user(dst->buf, src->buffer, bytes);
+
+ if (err)
+ return err;
+
+ dst->ret += bytes;
+ dst->buf += bytes;
+ dst->size -= bytes;
+ src->len -= bytes;
+ memmove(src->buffer, src->buffer + bytes, src->len);
+ }
+
+ return 0;
+}
+
+static int allocations_file_open(struct inode *inode, struct file *file)
+{
+ struct codetag_type *cttype = inode->i_private;
+ struct alloc_tag_file_iterator *iter;
+
+ iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+ if (!iter)
+ return -ENOMEM;
+
+ codetag_lock_module_list(cttype, true);
+ iter->ct_iter = codetag_get_ct_iter(cttype);
+ codetag_lock_module_list(cttype, false);
+ seq_buf_init(&iter->buf, iter->rawbuf, sizeof(iter->rawbuf));
+ file->private_data = iter;
+
+ return 0;
+}
+
+static int allocations_file_release(struct inode *inode, struct file *file)
+{
+ struct alloc_tag_file_iterator *iter = file->private_data;
+
+ kfree(iter);
+ return 0;
+}
+
+static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
+{
+ struct alloc_tag *tag = ct_to_alloc_tag(ct);
+ char buf[10];
+
+ string_get_size(lazy_percpu_counter_read(&tag->bytes_allocated), 1,
+ STRING_UNITS_2, buf, sizeof(buf));
+
+ seq_buf_printf(out, "%8s ", buf);
+ codetag_to_text(out, ct);
+ seq_buf_putc(out, '\n');
+}
+
+static ssize_t allocations_file_read(struct file *file, char __user *ubuf,
+ size_t size, loff_t *ppos)
+{
+ struct alloc_tag_file_iterator *iter = file->private_data;
+ struct user_buf buf = { .buf = ubuf, .size = size };
+ struct codetag *ct;
+ int err = 0;
+
+ codetag_lock_module_list(iter->ct_iter.cttype, true);
+ while (1) {
+ err = flush_ubuf(&buf, &iter->buf);
+ if (err || !buf.size)
+ break;
+
+ ct = codetag_next_ct(&iter->ct_iter);
+ if (!ct)
+ break;
+
+ alloc_tag_to_text(&iter->buf, ct);
+ }
+ codetag_lock_module_list(iter->ct_iter.cttype, false);
+
+ return err ? : buf.ret;
+}
+
+static const struct file_operations allocations_file_ops = {
+ .owner = THIS_MODULE,
+ .open = allocations_file_open,
+ .release = allocations_file_release,
+ .read = allocations_file_read,
+};
+
+static int __init dbgfs_init(struct codetag_type *cttype)
+{
+ struct dentry *file;
+
+ file = debugfs_create_file("allocations", 0444, NULL, cttype,
+ &allocations_file_ops);
+
+ return IS_ERR(file) ? PTR_ERR(file) : 0;
+}
+
+static bool alloc_tag_module_unload(struct codetag_type *cttype, struct codetag_module *cmod)
+{
+ struct codetag_iterator iter = codetag_get_ct_iter(cttype);
+ bool module_unused = true;
+ struct alloc_tag *tag;
+ struct codetag *ct;
+ size_t bytes;
+
+ for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
+ if (iter.cmod != cmod)
+ continue;
+
+ tag = ct_to_alloc_tag(ct);
+ bytes = lazy_percpu_counter_read(&tag->bytes_allocated);
+
+ if (!WARN(bytes, "%s:%u module %s func:%s has %zu allocated at module unload",
+ ct->filename, ct->lineno, ct->modname, ct->function, bytes))
+ lazy_percpu_counter_exit(&tag->bytes_allocated);
+ else
+ module_unused = false;
+ }
+
+ return module_unused;
+}
+
+static int __init alloc_tag_init(void)
+{
+ struct codetag_type *cttype;
+ const struct codetag_type_desc desc = {
+ .section = "alloc_tags",
+ .tag_size = sizeof(struct alloc_tag),
+ .module_unload = alloc_tag_module_unload,
+ };
+
+ cttype = codetag_register_type(&desc);
+ if (IS_ERR_OR_NULL(cttype))
+ return PTR_ERR(cttype);
+
+ return dbgfs_init(cttype);
+}
+module_init(alloc_tag_init);
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index bf5bcf2836d8..45c67a0994f3 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -9,6 +9,8 @@
#define DISCARD_EH_FRAME *(.eh_frame)
#endif
+#include <asm-generic/codetag.lds.h>
+
SECTIONS {
/DISCARD/ : {
*(.discard)
@@ -47,12 +49,17 @@ SECTIONS {
.data : {
*(.data .data.[0-9a-zA-Z_]*)
*(.data..L*)
+ CODETAG_SECTIONS()
}
.rodata : {
*(.rodata .rodata.[0-9a-zA-Z_]*)
*(.rodata..L*)
}
+#else
+ .data : {
+ CODETAG_SECTIONS()
+ }
#endif
}
--
2.40.1.495.gc816e09b53d-goog
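The read path in lib/alloc_tag.c above stages text in a fixed buffer and drains it into the reader's buffer in chunks, sliding any leftover bytes to the front so the next read() can pick up where this one stopped. A minimal userspace sketch of that drain step follows. The names stage_buf, dest_buf, stage() and drain() are hypothetical stand-ins for struct seq_buf, struct user_buf and flush_ubuf(), and memcpy() takes the place of copy_to_user().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the staging seq_buf. */
struct stage_buf {
	char   data[64];
	size_t len;      /* bytes currently staged */
};

/* Hypothetical stand-in for struct user_buf from the patch. */
struct dest_buf {
	char  *buf;      /* next write position in the destination */
	size_t size;     /* space left in the destination */
	size_t ret;      /* bytes delivered so far */
};

/* Append a string to the staging buffer; -1 if it would not fit. */
static int stage(struct stage_buf *src, const char *s)
{
	size_t n = strlen(s);

	if (n > sizeof(src->data) - src->len)
		return -1;
	memcpy(src->data + src->len, s, n);
	src->len += n;
	return 0;
}

/* Copy as much staged data as fits, then slide the rest to the front. */
static void drain(struct dest_buf *dst, struct stage_buf *src)
{
	size_t bytes = src->len < dst->size ? src->len : dst->size;

	memcpy(dst->buf, src->data, bytes);
	dst->ret  += bytes;
	dst->buf  += bytes;
	dst->size -= bytes;
	src->len  -= bytes;
	memmove(src->data, src->data + bytes, src->len);
}
```

One difference from the sketch is worth keeping in mind: in the kernel, copy_to_user() returns the number of bytes it could not copy rather than an errno, so callers usually turn a nonzero result into -EFAULT.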
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 18/40] lib: introduce support for page allocation tagging
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (12 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 17/40] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
` (16 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Introduce helper functions to easily instrument page allocators by
storing a pointer to the allocation tag associated with the code that
allocated the page in a page_ext field.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
include/linux/pgalloc_tag.h | 33 +++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 1 +
lib/alloc_tag.c | 17 +++++++++++++++++
mm/page_ext.c | 12 +++++++++---
4 files changed, 60 insertions(+), 3 deletions(-)
create mode 100644 include/linux/pgalloc_tag.h
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
new file mode 100644
index 000000000000..f8c7b6ef9c75
--- /dev/null
+++ b/include/linux/pgalloc_tag.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * page allocation tagging
+ */
+#ifndef _LINUX_PGALLOC_TAG_H
+#define _LINUX_PGALLOC_TAG_H
+
+#include <linux/alloc_tag.h>
+#include <linux/page_ext.h>
+
+extern struct page_ext_operations page_alloc_tagging_ops;
+struct page_ext *lookup_page_ext(const struct page *page);
+
+static inline union codetag_ref *get_page_tag_ref(struct page *page)
+{
+ if (page && mem_alloc_profiling_enabled()) {
+ struct page_ext *page_ext = lookup_page_ext(page);
+
+ if (page_ext)
+ return (void *)page_ext + page_alloc_tagging_ops.offset;
+ }
+ return NULL;
+}
+
+static inline void pgalloc_tag_dec(struct page *page, unsigned int order)
+{
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref)
+ alloc_tag_sub(ref, PAGE_SIZE << order);
+}
+
+#endif /* _LINUX_PGALLOC_TAG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index da0a91ea6042..d3aa5ee0bf0d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -967,6 +967,7 @@ config MEM_ALLOC_PROFILING
depends on DEBUG_FS
select CODE_TAGGING
select LAZY_PERCPU_COUNTER
+ select PAGE_EXTENSION
help
Track allocation source code and record total allocation size
initiated at that code location. The mechanism can be used to track
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 3c4cfeb79862..4a0b95a46b2e 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -4,6 +4,7 @@
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/module.h>
+#include <linux/page_ext.h>
#include <linux/seq_buf.h>
#include <linux/uaccess.h>
@@ -159,6 +160,22 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype, struct codetag_
return module_unused;
}
+static __init bool need_page_alloc_tagging(void)
+{
+ return true;
+}
+
+static __init void init_page_alloc_tagging(void)
+{
+}
+
+struct page_ext_operations page_alloc_tagging_ops = {
+ .size = sizeof(union codetag_ref),
+ .need = need_page_alloc_tagging,
+ .init = init_page_alloc_tagging,
+};
+EXPORT_SYMBOL(page_alloc_tagging_ops);
+
static int __init alloc_tag_init(void)
{
struct codetag_type *cttype;
diff --git a/mm/page_ext.c b/mm/page_ext.c
index dc1626be458b..eaf054ec276c 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -10,6 +10,7 @@
#include <linux/page_idle.h>
#include <linux/page_table_check.h>
#include <linux/rcupdate.h>
+#include <linux/pgalloc_tag.h>
/*
* struct page extension
@@ -82,6 +83,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
&page_idle_ops,
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ &page_alloc_tagging_ops,
+#endif
#ifdef CONFIG_PAGE_TABLE_CHECK
&page_table_check_ops,
#endif
@@ -90,7 +94,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
unsigned long page_ext_size;
static unsigned long total_usage;
-static struct page_ext *lookup_page_ext(const struct page *page);
+struct page_ext *lookup_page_ext(const struct page *page);
bool early_page_ext __meminitdata;
static int __init setup_early_page_ext(char *str)
@@ -199,7 +203,7 @@ void __meminit pgdat_page_ext_init(struct pglist_data *pgdat)
pgdat->node_page_ext = NULL;
}
-static struct page_ext *lookup_page_ext(const struct page *page)
+struct page_ext *lookup_page_ext(const struct page *page)
{
unsigned long pfn = page_to_pfn(page);
unsigned long index;
@@ -219,6 +223,7 @@ static struct page_ext *lookup_page_ext(const struct page *page)
MAX_ORDER_NR_PAGES);
return get_entry(base, index);
}
+EXPORT_SYMBOL(lookup_page_ext);
static int __init alloc_node_page_ext(int nid)
{
@@ -278,7 +283,7 @@ static bool page_ext_invalid(struct page_ext *page_ext)
return !page_ext || (((unsigned long)page_ext & PAGE_EXT_INVALID) == PAGE_EXT_INVALID);
}
-static struct page_ext *lookup_page_ext(const struct page *page)
+struct page_ext *lookup_page_ext(const struct page *page)
{
unsigned long pfn = page_to_pfn(page);
struct mem_section *section = __pfn_to_section(pfn);
@@ -295,6 +300,7 @@ static struct page_ext *lookup_page_ext(const struct page *page)
return NULL;
return get_entry(page_ext, pfn);
}
+EXPORT_SYMBOL(lookup_page_ext);
static void *__meminit alloc_page_ext(size_t size, int nid)
{
--
2.40.1.495.gc816e09b53d-goog
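get_page_tag_ref() in the patch above finds the tag reference by adding page_alloc_tagging_ops.offset to the page_ext entry. A small userspace sketch of that offset scheme is below. The types ext_header and ext_ops and the helpers register_ops() and ext_slot() are hypothetical, and the way offsets are handed out here is only an assumption about how mm/page_ext.c lays out its entries.

```c
#include <stddef.h>

/* Hypothetical fixed header at the start of every per-page entry. */
struct ext_header {
	unsigned long flags;
};

/* Hypothetical miniature of struct page_ext_operations. */
struct ext_ops {
	size_t size;    /* bytes this feature needs per page */
	size_t offset;  /* filled in at registration time */
};

/* Give each feature the next free offset; return the total entry size. */
static size_t register_ops(struct ext_ops **ops, size_t n)
{
	size_t total = sizeof(struct ext_header);

	for (size_t i = 0; i < n; i++) {
		ops[i]->offset = total;
		total += ops[i]->size;
	}
	return total;
}

/* Same arithmetic as get_page_tag_ref(): entry base plus feature offset. */
static void *ext_slot(void *entry, const struct ext_ops *ops)
{
	return entry ? (char *)entry + ops->offset : NULL;
}
```

The kernel code adds the offset to a void pointer directly, which relies on a GNU C extension; the sketch casts to char * so that it also builds as standard C.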
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (13 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 18/40] lib: introduce support for page allocation tagging Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-02 15:50 ` Petr Tesařík
[not found] ` <20230501165450.15352-20-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2023-05-01 16:54 ` [PATCH 22/40] mm: create new codetag references during page splitting Suren Baghdasaryan
` (15 subsequent siblings)
30 siblings, 2 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
After redefining alloc_pages, all uses of that name are being replaced.
Change the conflicting names to prevent the preprocessor from replacing
them when it's not intended.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
arch/x86/kernel/amd_gart_64.c | 2 +-
drivers/iommu/dma-iommu.c | 2 +-
drivers/xen/grant-dma-ops.c | 2 +-
drivers/xen/swiotlb-xen.c | 2 +-
include/linux/dma-map-ops.h | 2 +-
kernel/dma/mapping.c | 4 ++--
6 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 56a917df410d..842a0ec5eaa9 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
.get_sgtable = dma_common_get_sgtable,
.dma_supported = dma_direct_supported,
.get_required_mask = dma_direct_get_required_mask,
- .alloc_pages = dma_direct_alloc_pages,
+ .alloc_pages_op = dma_direct_alloc_pages,
.free_pages = dma_direct_free_pages,
};
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7a9f0b0bddbd..76a9d5ca4eee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1556,7 +1556,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.flags = DMA_F_PCI_P2PDMA_SUPPORTED,
.alloc = iommu_dma_alloc,
.free = iommu_dma_free,
- .alloc_pages = dma_common_alloc_pages,
+ .alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
.free_noncontiguous = iommu_dma_free_noncontiguous,
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 9784a77fa3c9..6c7d984f164d 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
static const struct dma_map_ops xen_grant_dma_ops = {
.alloc = xen_grant_dma_alloc,
.free = xen_grant_dma_free,
- .alloc_pages = xen_grant_dma_alloc_pages,
+ .alloc_pages_op = xen_grant_dma_alloc_pages,
.free_pages = xen_grant_dma_free_pages,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..5ab2616153f0 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.dma_supported = xen_swiotlb_dma_supported,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
- .alloc_pages = dma_common_alloc_pages,
+ .alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
};
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 31f114f486c4..d741940dcb3b 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -27,7 +27,7 @@ struct dma_map_ops {
unsigned long attrs);
void (*free)(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs);
- struct page *(*alloc_pages)(struct device *dev, size_t size,
+ struct page *(*alloc_pages_op)(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir,
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9a4db5cce600..fc42930af14b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
size = PAGE_ALIGN(size);
if (dma_alloc_direct(dev, ops))
return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
- if (!ops->alloc_pages)
+ if (!ops->alloc_pages_op)
return NULL;
- return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+ return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
}
struct page *dma_alloc_pages(struct device *dev, size_t size,
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-01 16:54 ` [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
@ 2023-05-02 15:50 ` Petr Tesařík
2023-05-02 18:38 ` Suren Baghdasaryan
[not found] ` <20230501165450.15352-20-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
1 sibling, 1 reply; 160+ messages in thread
From: Petr Tesařík @ 2023-05-02 15:50 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Mon, 1 May 2023 09:54:29 -0700
Suren Baghdasaryan <surenb@google.com> wrote:
> After redefining alloc_pages, all uses of that name are being replaced.
> Change the conflicting names to prevent the preprocessor from replacing
> them when it's not intended.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> arch/x86/kernel/amd_gart_64.c | 2 +-
> drivers/iommu/dma-iommu.c | 2 +-
> drivers/xen/grant-dma-ops.c | 2 +-
> drivers/xen/swiotlb-xen.c | 2 +-
> include/linux/dma-map-ops.h | 2 +-
> kernel/dma/mapping.c | 4 ++--
> 6 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> index 56a917df410d..842a0ec5eaa9 100644
> --- a/arch/x86/kernel/amd_gart_64.c
> +++ b/arch/x86/kernel/amd_gart_64.c
> @@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
> .get_sgtable = dma_common_get_sgtable,
> .dma_supported = dma_direct_supported,
> .get_required_mask = dma_direct_get_required_mask,
> - .alloc_pages = dma_direct_alloc_pages,
> + .alloc_pages_op = dma_direct_alloc_pages,
> .free_pages = dma_direct_free_pages,
> };
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7a9f0b0bddbd..76a9d5ca4eee 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1556,7 +1556,7 @@ static const struct dma_map_ops iommu_dma_ops = {
> .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
> .alloc = iommu_dma_alloc,
> .free = iommu_dma_free,
> - .alloc_pages = dma_common_alloc_pages,
> + .alloc_pages_op = dma_common_alloc_pages,
> .free_pages = dma_common_free_pages,
> .alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
> .free_noncontiguous = iommu_dma_free_noncontiguous,
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index 9784a77fa3c9..6c7d984f164d 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
> static const struct dma_map_ops xen_grant_dma_ops = {
> .alloc = xen_grant_dma_alloc,
> .free = xen_grant_dma_free,
> - .alloc_pages = xen_grant_dma_alloc_pages,
> + .alloc_pages_op = xen_grant_dma_alloc_pages,
> .free_pages = xen_grant_dma_free_pages,
> .mmap = dma_common_mmap,
> .get_sgtable = dma_common_get_sgtable,
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 67aa74d20162..5ab2616153f0 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
> .dma_supported = xen_swiotlb_dma_supported,
> .mmap = dma_common_mmap,
> .get_sgtable = dma_common_get_sgtable,
> - .alloc_pages = dma_common_alloc_pages,
> + .alloc_pages_op = dma_common_alloc_pages,
> .free_pages = dma_common_free_pages,
> };
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 31f114f486c4..d741940dcb3b 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -27,7 +27,7 @@ struct dma_map_ops {
> unsigned long attrs);
> void (*free)(struct device *dev, size_t size, void *vaddr,
> dma_addr_t dma_handle, unsigned long attrs);
> - struct page *(*alloc_pages)(struct device *dev, size_t size,
> + struct page *(*alloc_pages_op)(struct device *dev, size_t size,
> dma_addr_t *dma_handle, enum dma_data_direction dir,
> gfp_t gfp);
> void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 9a4db5cce600..fc42930af14b 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
> size = PAGE_ALIGN(size);
> if (dma_alloc_direct(dev, ops))
> return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
> - if (!ops->alloc_pages)
> + if (!ops->alloc_pages_op)
> return NULL;
> - return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
> + return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
> }
>
> struct page *dma_alloc_pages(struct device *dev, size_t size,
I'm not impressed. This patch increases churn for code which does not
(directly) benefit from the change, and that for limitations in your
tooling?
Why not just rename the conflicting uses in your local tree, but then
remove the rename from the final patch series?
Just my two cents,
Petr T
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-02 15:50 ` Petr Tesařík
@ 2023-05-02 18:38 ` Suren Baghdasaryan
2023-05-02 20:09 ` Petr Tesařík
0 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-02 18:38 UTC (permalink / raw)
To: Petr Tesařík
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Tue, May 2, 2023 at 8:50 AM Petr Tesařík <petr@tesarici.cz> wrote:
>
> On Mon, 1 May 2023 09:54:29 -0700
> Suren Baghdasaryan <surenb@google.com> wrote:
>
> > After redefining alloc_pages, all uses of that name are being replaced.
> > Change the conflicting names to prevent the preprocessor from replacing
> > them when it's not intended.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > arch/x86/kernel/amd_gart_64.c | 2 +-
> > drivers/iommu/dma-iommu.c | 2 +-
> > drivers/xen/grant-dma-ops.c | 2 +-
> > drivers/xen/swiotlb-xen.c | 2 +-
> > include/linux/dma-map-ops.h | 2 +-
> > kernel/dma/mapping.c | 4 ++--
> > 6 files changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> > index 56a917df410d..842a0ec5eaa9 100644
> > --- a/arch/x86/kernel/amd_gart_64.c
> > +++ b/arch/x86/kernel/amd_gart_64.c
> > @@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
> > .get_sgtable = dma_common_get_sgtable,
> > .dma_supported = dma_direct_supported,
> > .get_required_mask = dma_direct_get_required_mask,
> > - .alloc_pages = dma_direct_alloc_pages,
> > + .alloc_pages_op = dma_direct_alloc_pages,
> > .free_pages = dma_direct_free_pages,
> > };
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 7a9f0b0bddbd..76a9d5ca4eee 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1556,7 +1556,7 @@ static const struct dma_map_ops iommu_dma_ops = {
> > .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
> > .alloc = iommu_dma_alloc,
> > .free = iommu_dma_free,
> > - .alloc_pages = dma_common_alloc_pages,
> > + .alloc_pages_op = dma_common_alloc_pages,
> > .free_pages = dma_common_free_pages,
> > .alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
> > .free_noncontiguous = iommu_dma_free_noncontiguous,
> > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > index 9784a77fa3c9..6c7d984f164d 100644
> > --- a/drivers/xen/grant-dma-ops.c
> > +++ b/drivers/xen/grant-dma-ops.c
> > @@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
> > static const struct dma_map_ops xen_grant_dma_ops = {
> > .alloc = xen_grant_dma_alloc,
> > .free = xen_grant_dma_free,
> > - .alloc_pages = xen_grant_dma_alloc_pages,
> > + .alloc_pages_op = xen_grant_dma_alloc_pages,
> > .free_pages = xen_grant_dma_free_pages,
> > .mmap = dma_common_mmap,
> > .get_sgtable = dma_common_get_sgtable,
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 67aa74d20162..5ab2616153f0 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
> > .dma_supported = xen_swiotlb_dma_supported,
> > .mmap = dma_common_mmap,
> > .get_sgtable = dma_common_get_sgtable,
> > - .alloc_pages = dma_common_alloc_pages,
> > + .alloc_pages_op = dma_common_alloc_pages,
> > .free_pages = dma_common_free_pages,
> > };
> > diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> > index 31f114f486c4..d741940dcb3b 100644
> > --- a/include/linux/dma-map-ops.h
> > +++ b/include/linux/dma-map-ops.h
> > @@ -27,7 +27,7 @@ struct dma_map_ops {
> > unsigned long attrs);
> > void (*free)(struct device *dev, size_t size, void *vaddr,
> > dma_addr_t dma_handle, unsigned long attrs);
> > - struct page *(*alloc_pages)(struct device *dev, size_t size,
> > + struct page *(*alloc_pages_op)(struct device *dev, size_t size,
> > dma_addr_t *dma_handle, enum dma_data_direction dir,
> > gfp_t gfp);
> > void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
> > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > index 9a4db5cce600..fc42930af14b 100644
> > --- a/kernel/dma/mapping.c
> > +++ b/kernel/dma/mapping.c
> > @@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
> > size = PAGE_ALIGN(size);
> > if (dma_alloc_direct(dev, ops))
> > return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
> > - if (!ops->alloc_pages)
> > + if (!ops->alloc_pages_op)
> > return NULL;
> > - return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
> > + return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
> > }
> >
> > struct page *dma_alloc_pages(struct device *dev, size_t size,
>
> I'm not impressed. This patch increases churn for code which does not
> (directly) benefit from the change, and that for limitations in your
> tooling?
>
> Why not just rename the conflicting uses in your local tree, but then
> remove the rename from the final patch series?
With the alloc_pages function becoming a macro, the preprocessor ends up
replacing all instances of that name, even when it's not used as a
function. That is what necessitates this change. If there is a way to
work around this issue without changing all alloc_pages() calls in the
source base, I would love to learn it, but I'm not quite clear about
your suggestion and whether it solves the issue. Could you please provide
more details?
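To make the conflict concrete, here is a minimal userspace sketch, assuming the redefinition is a function-like macro. The names real_alloc_pages, hook_hits, backend_alloc and call_through_ops are hypothetical and exist only for illustration. With a macro of this kind, at least every use of the name that is followed by an opening parenthesis gets expanded, which includes a call through a struct member.

```c
#include <stddef.h>

static int hook_hits;  /* counts how often the instrumented path runs */

/* Hypothetical stand-in for the real allocator entry point. */
static size_t real_alloc_pages(unsigned int order)
{
	return (size_t)1 << order;
}

/* Function-like macro that takes over the name (assumed shape). */
#define alloc_pages(order) (hook_hits++, real_alloc_pages(order))

struct ops {
	/*
	 * A call written as ops->alloc_pages(3) would be expanded by the
	 * macro into ops->(hook_hits++, real_alloc_pages(3)) and fail to
	 * compile, so the member carries a different name.
	 */
	size_t (*alloc_pages_op)(unsigned int order);
};

/* Ordinary caller: the macro routes it through the instrumented path. */
static size_t backend_alloc(unsigned int order)
{
	return alloc_pages(order);
}

/* Mirrors the NULL check and indirect call in __dma_alloc_pages(). */
static size_t call_through_ops(const struct ops *ops, unsigned int order)
{
	return ops->alloc_pages_op ? ops->alloc_pages_op(order) : 0;
}
```

With the member renamed, the indirect call and the instrumented direct call can coexist in one translation unit.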
>
> Just my two cents,
> Petr T
>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-02 18:38 ` Suren Baghdasaryan
@ 2023-05-02 20:09 ` Petr Tesařík
2023-05-02 20:18 ` Kent Overstreet
2023-05-02 20:24 ` Suren Baghdasaryan
0 siblings, 2 replies; 160+ messages in thread
From: Petr Tesařík @ 2023-05-02 20:09 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Tue, 2 May 2023 11:38:49 -0700
Suren Baghdasaryan <surenb@google.com> wrote:
> On Tue, May 2, 2023 at 8:50 AM Petr Tesařík <petr@tesarici.cz> wrote:
> >
> > On Mon, 1 May 2023 09:54:29 -0700
> > Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > After redefining alloc_pages, all uses of that name are being replaced.
> > > Change the conflicting names to prevent the preprocessor from replacing
> > > them when it's not intended.
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > ---
> > > arch/x86/kernel/amd_gart_64.c | 2 +-
> > > drivers/iommu/dma-iommu.c | 2 +-
> > > drivers/xen/grant-dma-ops.c | 2 +-
> > > drivers/xen/swiotlb-xen.c | 2 +-
> > > include/linux/dma-map-ops.h | 2 +-
> > > kernel/dma/mapping.c | 4 ++--
> > > 6 files changed, 7 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> > > index 56a917df410d..842a0ec5eaa9 100644
> > > --- a/arch/x86/kernel/amd_gart_64.c
> > > +++ b/arch/x86/kernel/amd_gart_64.c
> > > @@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
> > > .get_sgtable = dma_common_get_sgtable,
> > > .dma_supported = dma_direct_supported,
> > > .get_required_mask = dma_direct_get_required_mask,
> > > - .alloc_pages = dma_direct_alloc_pages,
> > > + .alloc_pages_op = dma_direct_alloc_pages,
> > > .free_pages = dma_direct_free_pages,
> > > };
> > >
> > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > > index 7a9f0b0bddbd..76a9d5ca4eee 100644
> > > --- a/drivers/iommu/dma-iommu.c
> > > +++ b/drivers/iommu/dma-iommu.c
> > > @@ -1556,7 +1556,7 @@ static const struct dma_map_ops iommu_dma_ops = {
> > > .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
> > > .alloc = iommu_dma_alloc,
> > > .free = iommu_dma_free,
> > > - .alloc_pages = dma_common_alloc_pages,
> > > + .alloc_pages_op = dma_common_alloc_pages,
> > > .free_pages = dma_common_free_pages,
> > > .alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
> > > .free_noncontiguous = iommu_dma_free_noncontiguous,
> > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > index 9784a77fa3c9..6c7d984f164d 100644
> > > --- a/drivers/xen/grant-dma-ops.c
> > > +++ b/drivers/xen/grant-dma-ops.c
> > > @@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
> > > static const struct dma_map_ops xen_grant_dma_ops = {
> > > .alloc = xen_grant_dma_alloc,
> > > .free = xen_grant_dma_free,
> > > - .alloc_pages = xen_grant_dma_alloc_pages,
> > > + .alloc_pages_op = xen_grant_dma_alloc_pages,
> > > .free_pages = xen_grant_dma_free_pages,
> > > .mmap = dma_common_mmap,
> > > .get_sgtable = dma_common_get_sgtable,
> > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > > index 67aa74d20162..5ab2616153f0 100644
> > > --- a/drivers/xen/swiotlb-xen.c
> > > +++ b/drivers/xen/swiotlb-xen.c
> > > @@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
> > > .dma_supported = xen_swiotlb_dma_supported,
> > > .mmap = dma_common_mmap,
> > > .get_sgtable = dma_common_get_sgtable,
> > > - .alloc_pages = dma_common_alloc_pages,
> > > + .alloc_pages_op = dma_common_alloc_pages,
> > > .free_pages = dma_common_free_pages,
> > > };
> > > diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> > > index 31f114f486c4..d741940dcb3b 100644
> > > --- a/include/linux/dma-map-ops.h
> > > +++ b/include/linux/dma-map-ops.h
> > > @@ -27,7 +27,7 @@ struct dma_map_ops {
> > > unsigned long attrs);
> > > void (*free)(struct device *dev, size_t size, void *vaddr,
> > > dma_addr_t dma_handle, unsigned long attrs);
> > > - struct page *(*alloc_pages)(struct device *dev, size_t size,
> > > + struct page *(*alloc_pages_op)(struct device *dev, size_t size,
> > > dma_addr_t *dma_handle, enum dma_data_direction dir,
> > > gfp_t gfp);
> > > void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
> > > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > > index 9a4db5cce600..fc42930af14b 100644
> > > --- a/kernel/dma/mapping.c
> > > +++ b/kernel/dma/mapping.c
> > > @@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
> > > size = PAGE_ALIGN(size);
> > > if (dma_alloc_direct(dev, ops))
> > > return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
> > > - if (!ops->alloc_pages)
> > > + if (!ops->alloc_pages_op)
> > > return NULL;
> > > - return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
> > > + return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
> > > }
> > >
> > > struct page *dma_alloc_pages(struct device *dev, size_t size,
> >
> > I'm not impressed. This patch increases churn for code which does not
> > (directly) benefit from the change, and that for limitations in your
> > tooling?
> >
> > Why not just rename the conflicting uses in your local tree, but then
> > remove the rename from the final patch series?
>
> With the alloc_pages function becoming a macro, the preprocessor ends up
> replacing all instances of that name, even when it's not used as a
> function. That is what necessitates this change. If there is a way to
> work around this issue without changing all alloc_pages() calls in the
> source base, I would love to learn it, but I'm not quite clear about
> your suggestion and whether it solves the issue. Could you please provide
> more details?
Ah, right, I admit I did not quite understand why this change is
needed. However, this is exactly what I don't like about preprocessor
macros. Each macro effectively adds a new keyword to the language.
I believe everything can be solved with inline functions. What exactly
does not work if you rename alloc_pages() to e.g. alloc_pages_caller()
and then add an alloc_pages() inline function which calls
alloc_pages_caller() with _RET_IP_ as a parameter?
Petr T
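The alternative proposed in this mail can be sketched in userspace as follows, assuming a GCC or Clang toolchain. __builtin_return_address(0) plays the role of _RET_IP_, and last_caller is a hypothetical variable added only so the recorded value can be inspected. This is an illustration of the suggestion, not code from the series.

```c
#include <stddef.h>

static unsigned long last_caller;  /* address recorded by the latest call */

/* The renamed allocator receives the caller's address explicitly. */
static size_t alloc_pages_caller(unsigned int order, unsigned long caller)
{
	last_caller = caller;
	return (size_t)1 << order;
}

/*
 * Wrapper that keeps the old name. It is kept out of line here so that
 * __builtin_return_address(0) points just past the call in the caller;
 * if the wrapper were inlined, the builtin would be expected to report
 * the return address of the enclosing function instead.
 */
__attribute__((noinline))
static size_t alloc_pages(unsigned int order)
{
	return alloc_pages_caller(order,
			(unsigned long)__builtin_return_address(0));
}
```

What this records is a raw code address that still has to be symbolized, whereas the tags in the series carry ct->filename, ct->lineno and ct->function directly, as the WARN() in alloc_tag_module_unload() shows.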
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-02 20:09 ` Petr Tesařík
@ 2023-05-02 20:18 ` Kent Overstreet
2023-05-02 20:24 ` Suren Baghdasaryan
1 sibling, 0 replies; 160+ messages in thread
From: Kent Overstreet @ 2023-05-02 20:18 UTC (permalink / raw)
To: Petr Tesařík
Cc: Suren Baghdasaryan, akpm, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andr
On Tue, May 02, 2023 at 10:09:09PM +0200, Petr Tesařík wrote:
> Ah, right, I admit I did not quite understand why this change is
> needed. However, this is exactly what I don't like about preprocessor
> macros. Each macro effectively adds a new keyword to the language.
>
> I believe everything can be solved with inline functions. What exactly
> does not work if you rename alloc_pages() to e.g. alloc_pages_caller()
> and then add an alloc_pages() inline function which calls
> alloc_pages_caller() with _RET_IP_ as a parameter?
Perhaps you should spend a little more time reading the patchset and
learning how the code works before commenting.
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-02 20:09 ` Petr Tesařík
2023-05-02 20:18 ` Kent Overstreet
@ 2023-05-02 20:24 ` Suren Baghdasaryan
2023-05-02 20:39 ` Petr Tesařík
1 sibling, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-02 20:24 UTC (permalink / raw)
To: Petr Tesařík
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Tue, May 2, 2023 at 1:09 PM Petr Tesařík <petr@tesarici.cz> wrote:
>
> On Tue, 2 May 2023 11:38:49 -0700
> Suren Baghdasaryan <surenb@google.com> wrote:
>
> > On Tue, May 2, 2023 at 8:50 AM Petr Tesařík <petr@tesarici.cz> wrote:
> > >
> > > On Mon, 1 May 2023 09:54:29 -0700
> > > Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > After redefining alloc_pages, all uses of that name are being replaced.
> > > > Change the conflicting names to prevent preprocessor from replacing them
> > > > when it's not intended.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > ---
> > > > arch/x86/kernel/amd_gart_64.c | 2 +-
> > > > drivers/iommu/dma-iommu.c | 2 +-
> > > > drivers/xen/grant-dma-ops.c | 2 +-
> > > > drivers/xen/swiotlb-xen.c | 2 +-
> > > > include/linux/dma-map-ops.h | 2 +-
> > > > kernel/dma/mapping.c | 4 ++--
> > > > 6 files changed, 7 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> > > > index 56a917df410d..842a0ec5eaa9 100644
> > > > --- a/arch/x86/kernel/amd_gart_64.c
> > > > +++ b/arch/x86/kernel/amd_gart_64.c
> > > > @@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
> > > > .get_sgtable = dma_common_get_sgtable,
> > > > .dma_supported = dma_direct_supported,
> > > > .get_required_mask = dma_direct_get_required_mask,
> > > > - .alloc_pages = dma_direct_alloc_pages,
> > > > + .alloc_pages_op = dma_direct_alloc_pages,
> > > > .free_pages = dma_direct_free_pages,
> > > > };
> > > >
> > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > > > index 7a9f0b0bddbd..76a9d5ca4eee 100644
> > > > --- a/drivers/iommu/dma-iommu.c
> > > > +++ b/drivers/iommu/dma-iommu.c
> > > > @@ -1556,7 +1556,7 @@ static const struct dma_map_ops iommu_dma_ops = {
> > > > .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
> > > > .alloc = iommu_dma_alloc,
> > > > .free = iommu_dma_free,
> > > > - .alloc_pages = dma_common_alloc_pages,
> > > > + .alloc_pages_op = dma_common_alloc_pages,
> > > > .free_pages = dma_common_free_pages,
> > > > .alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
> > > > .free_noncontiguous = iommu_dma_free_noncontiguous,
> > > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > > index 9784a77fa3c9..6c7d984f164d 100644
> > > > --- a/drivers/xen/grant-dma-ops.c
> > > > +++ b/drivers/xen/grant-dma-ops.c
> > > > @@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
> > > > static const struct dma_map_ops xen_grant_dma_ops = {
> > > > .alloc = xen_grant_dma_alloc,
> > > > .free = xen_grant_dma_free,
> > > > - .alloc_pages = xen_grant_dma_alloc_pages,
> > > > + .alloc_pages_op = xen_grant_dma_alloc_pages,
> > > > .free_pages = xen_grant_dma_free_pages,
> > > > .mmap = dma_common_mmap,
> > > > .get_sgtable = dma_common_get_sgtable,
> > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > > > index 67aa74d20162..5ab2616153f0 100644
> > > > --- a/drivers/xen/swiotlb-xen.c
> > > > +++ b/drivers/xen/swiotlb-xen.c
> > > > @@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
> > > > .dma_supported = xen_swiotlb_dma_supported,
> > > > .mmap = dma_common_mmap,
> > > > .get_sgtable = dma_common_get_sgtable,
> > > > - .alloc_pages = dma_common_alloc_pages,
> > > > + .alloc_pages_op = dma_common_alloc_pages,
> > > > .free_pages = dma_common_free_pages,
> > > > };
> > > > diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> > > > index 31f114f486c4..d741940dcb3b 100644
> > > > --- a/include/linux/dma-map-ops.h
> > > > +++ b/include/linux/dma-map-ops.h
> > > > @@ -27,7 +27,7 @@ struct dma_map_ops {
> > > > unsigned long attrs);
> > > > void (*free)(struct device *dev, size_t size, void *vaddr,
> > > > dma_addr_t dma_handle, unsigned long attrs);
> > > > - struct page *(*alloc_pages)(struct device *dev, size_t size,
> > > > + struct page *(*alloc_pages_op)(struct device *dev, size_t size,
> > > > dma_addr_t *dma_handle, enum dma_data_direction dir,
> > > > gfp_t gfp);
> > > > void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
> > > > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > > > index 9a4db5cce600..fc42930af14b 100644
> > > > --- a/kernel/dma/mapping.c
> > > > +++ b/kernel/dma/mapping.c
> > > > @@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
> > > > size = PAGE_ALIGN(size);
> > > > if (dma_alloc_direct(dev, ops))
> > > > return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
> > > > - if (!ops->alloc_pages)
> > > > + if (!ops->alloc_pages_op)
> > > > return NULL;
> > > > - return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
> > > > + return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
> > > > }
> > > >
> > > > struct page *dma_alloc_pages(struct device *dev, size_t size,
> > >
> > > I'm not impressed. This patch increases churn for code which does not
> > > (directly) benefit from the change, and that for limitations in your
> > > tooling?
> > >
> > > Why not just rename the conflicting uses in your local tree, but then
> > > remove the rename from the final patch series?
> >
> > With alloc_pages function becoming a macro, the preprocessor ends up
> > replacing all instances of that name, even when it's not used as a
> > function. That is what necessitates this change. If there is a way to
> > work around this issue without changing all alloc_pages() calls in the
> > source base I would love to learn it but I'm not quite clear about
> > your suggestion and if it solves the issue. Could you please provide
> > more details?
>
> Ah, right, I admit I did not quite understand why this change is
> needed. However, this is exactly what I don't like about preprocessor
> macros. Each macro effectively adds a new keyword to the language.
>
> I believe everything can be solved with inline functions. What exactly
> does not work if you rename alloc_pages() to e.g. alloc_pages_caller()
> and then add an alloc_pages() inline function which calls
> alloc_pages_caller() with _RET_IP_ as a parameter?

I don't think that would work because we need to inject the codetag at
the file/line of the actual allocation call. If we pass _RET_IP_ then
we would have to look up the codetag associated with that _RET_IP_,
which results in additional runtime overhead.
>
> Petr T
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
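
The difference between the two approaches discussed above can be sketched
in plain C. This is only an illustration under stated assumptions: struct
site_tag, tagged_alloc(), my_alloc() and my_alloc_inline() are hypothetical
names, not the series' codetag API, and the macro relies on the GNU C
statement-expression extension. The macro expands in the caller, so each
call site gets its own static tag carrying that caller's file and line. The
inline wrapper has a single tag that names the wrapper itself, so the
caller would have to be recovered some other way, for example by looking
up a return address at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a codetag: one static instance per call site. */
struct site_tag {
	const char *file;
	int line;
	size_t bytes;	/* bytes charged to this site */
};

/* Remembers which tag the most recent allocation was charged to (demo only). */
static struct site_tag *last_tag;

static void *tagged_alloc(struct site_tag *tag, size_t size)
{
	void *p = malloc(size);

	if (p)
		tag->bytes += size;
	last_tag = tag;
	return p;
}

/*
 * Macro wrapper: expanded in the caller, so the static tag and
 * __FILE__/__LINE__ describe the call site. GNU C statement expression.
 */
#define my_alloc(size) ({						\
	static struct site_tag _tag = { __FILE__, __LINE__, 0 };	\
	tagged_alloc(&_tag, (size));					\
})

/* Inline wrapper: a single tag that names this line, not the caller's. */
static inline void *my_alloc_inline(size_t size)
{
	static struct site_tag tag = { __FILE__, __LINE__, 0 };

	return tagged_alloc(&tag, size);
}

/* Returns 1 if two macro calls on different lines used different tags. */
static int macro_sites_are_distinct(void)
{
	struct site_tag *first;
	void *a, *b;
	int distinct;

	a = my_alloc(16);
	first = last_tag;
	b = my_alloc(32);
	distinct = (first != last_tag) && (first->line != last_tag->line);

	free(a);
	free(b);
	return distinct;
}

/* Returns 1 if two inline-wrapper calls used different tags. */
static int inline_sites_are_distinct(void)
{
	struct site_tag *first;
	void *a, *b;
	int distinct;

	a = my_alloc_inline(16);
	first = last_tag;
	b = my_alloc_inline(32);
	distinct = (first != last_tag);

	free(a);
	free(b);
	return distinct;
}
```

Calling the two helper functions from a main() shows the contrast: the
macro variant reports distinct tags per call site, the inline variant
reports one shared tag.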
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-02 20:24 ` Suren Baghdasaryan
@ 2023-05-02 20:39 ` Petr Tesařík
[not found] ` <20230502223915.6b38f8c4-TD/jYOLh/Qr2G+KSGY6Hrl+YFMdMcpeZ@public.gmane.org>
0 siblings, 1 reply; 160+ messages in thread
From: Petr Tesařík @ 2023-05-02 20:39 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Tue, 2 May 2023 13:24:37 -0700
Suren Baghdasaryan <surenb@google.com> wrote:
> On Tue, May 2, 2023 at 1:09 PM Petr Tesařík <petr@tesarici.cz> wrote:
> >
> > On Tue, 2 May 2023 11:38:49 -0700
> > Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > On Tue, May 2, 2023 at 8:50 AM Petr Tesařík <petr@tesarici.cz> wrote:
> > > >
> > > > On Mon, 1 May 2023 09:54:29 -0700
> > > > Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > > After redefining alloc_pages, all uses of that name are being replaced.
> > > > > Change the conflicting names to prevent preprocessor from replacing them
> > > > > when it's not intended.
> > > > >
> > > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > > > ---
> > > > > arch/x86/kernel/amd_gart_64.c | 2 +-
> > > > > drivers/iommu/dma-iommu.c | 2 +-
> > > > > drivers/xen/grant-dma-ops.c | 2 +-
> > > > > drivers/xen/swiotlb-xen.c | 2 +-
> > > > > include/linux/dma-map-ops.h | 2 +-
> > > > > kernel/dma/mapping.c | 4 ++--
> > > > > 6 files changed, 7 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
> > > > > index 56a917df410d..842a0ec5eaa9 100644
> > > > > --- a/arch/x86/kernel/amd_gart_64.c
> > > > > +++ b/arch/x86/kernel/amd_gart_64.c
> > > > > @@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
> > > > > .get_sgtable = dma_common_get_sgtable,
> > > > > .dma_supported = dma_direct_supported,
> > > > > .get_required_mask = dma_direct_get_required_mask,
> > > > > - .alloc_pages = dma_direct_alloc_pages,
> > > > > + .alloc_pages_op = dma_direct_alloc_pages,
> > > > > .free_pages = dma_direct_free_pages,
> > > > > };
> > > > >
> > > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > > > > index 7a9f0b0bddbd..76a9d5ca4eee 100644
> > > > > --- a/drivers/iommu/dma-iommu.c
> > > > > +++ b/drivers/iommu/dma-iommu.c
> > > > > @@ -1556,7 +1556,7 @@ static const struct dma_map_ops iommu_dma_ops = {
> > > > > .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
> > > > > .alloc = iommu_dma_alloc,
> > > > > .free = iommu_dma_free,
> > > > > - .alloc_pages = dma_common_alloc_pages,
> > > > > + .alloc_pages_op = dma_common_alloc_pages,
> > > > > .free_pages = dma_common_free_pages,
> > > > > .alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
> > > > > .free_noncontiguous = iommu_dma_free_noncontiguous,
> > > > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > > > index 9784a77fa3c9..6c7d984f164d 100644
> > > > > --- a/drivers/xen/grant-dma-ops.c
> > > > > +++ b/drivers/xen/grant-dma-ops.c
> > > > > @@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
> > > > > static const struct dma_map_ops xen_grant_dma_ops = {
> > > > > .alloc = xen_grant_dma_alloc,
> > > > > .free = xen_grant_dma_free,
> > > > > - .alloc_pages = xen_grant_dma_alloc_pages,
> > > > > + .alloc_pages_op = xen_grant_dma_alloc_pages,
> > > > > .free_pages = xen_grant_dma_free_pages,
> > > > > .mmap = dma_common_mmap,
> > > > > .get_sgtable = dma_common_get_sgtable,
> > > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > > > > index 67aa74d20162..5ab2616153f0 100644
> > > > > --- a/drivers/xen/swiotlb-xen.c
> > > > > +++ b/drivers/xen/swiotlb-xen.c
> > > > > @@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
> > > > > .dma_supported = xen_swiotlb_dma_supported,
> > > > > .mmap = dma_common_mmap,
> > > > > .get_sgtable = dma_common_get_sgtable,
> > > > > - .alloc_pages = dma_common_alloc_pages,
> > > > > + .alloc_pages_op = dma_common_alloc_pages,
> > > > > .free_pages = dma_common_free_pages,
> > > > > };
> > > > > diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> > > > > index 31f114f486c4..d741940dcb3b 100644
> > > > > --- a/include/linux/dma-map-ops.h
> > > > > +++ b/include/linux/dma-map-ops.h
> > > > > @@ -27,7 +27,7 @@ struct dma_map_ops {
> > > > > unsigned long attrs);
> > > > > void (*free)(struct device *dev, size_t size, void *vaddr,
> > > > > dma_addr_t dma_handle, unsigned long attrs);
> > > > > - struct page *(*alloc_pages)(struct device *dev, size_t size,
> > > > > + struct page *(*alloc_pages_op)(struct device *dev, size_t size,
> > > > > dma_addr_t *dma_handle, enum dma_data_direction dir,
> > > > > gfp_t gfp);
> > > > > void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
> > > > > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > > > > index 9a4db5cce600..fc42930af14b 100644
> > > > > --- a/kernel/dma/mapping.c
> > > > > +++ b/kernel/dma/mapping.c
> > > > > @@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
> > > > > size = PAGE_ALIGN(size);
> > > > > if (dma_alloc_direct(dev, ops))
> > > > > return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
> > > > > - if (!ops->alloc_pages)
> > > > > + if (!ops->alloc_pages_op)
> > > > > return NULL;
> > > > > - return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
> > > > > + return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
> > > > > }
> > > > >
> > > > > struct page *dma_alloc_pages(struct device *dev, size_t size,
> > > >
> > > > I'm not impressed. This patch increases churn for code which does not
> > > > (directly) benefit from the change, and that for limitations in your
> > > > tooling?
> > > >
> > > > Why not just rename the conflicting uses in your local tree, but then
> > > > remove the rename from the final patch series?
> > >
> > > With alloc_pages function becoming a macro, the preprocessor ends up
> > > replacing all instances of that name, even when it's not used as a
> > > function. That is what necessitates this change. If there is a way to
> > > work around this issue without changing all alloc_pages() calls in the
> > > source base I would love to learn it but I'm not quite clear about
> > > your suggestion and if it solves the issue. Could you please provide
> > > more details?
> >
> > Ah, right, I admit I did not quite understand why this change is
> > needed. However, this is exactly what I don't like about preprocessor
> > macros. Each macro effectively adds a new keyword to the language.
> >
> > I believe everything can be solved with inline functions. What exactly
> > does not work if you rename alloc_pages() to e.g. alloc_pages_caller()
> > and then add an alloc_pages() inline function which calls
> > alloc_pages_caller() with _RET_IP_ as a parameter?
>
> I don't think that would work because we need to inject the codetag at
> the file/line of the actual allocation call. If we pass _RET_IP_ then
> we would have to look up the codetag associated with that _RET_IP_,
> which results in additional runtime overhead.

OK. If the reference to source code itself must be recorded in the
kernel, and not resolved later (either by the debugfs read fops, or by
a tool which reads the file), then this information can only be
obtained with a preprocessor macro.

I was hoping that a debugging feature could be less intrusive. OTOH
it's not my call to balance the tradeoffs.

Thank you for your patient explanations.

Petr T
^ permalink raw reply [flat|nested] 160+ messages in thread
[parent not found: <20230501165450.15352-20-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
[not found] ` <20230501165450.15352-20-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2023-05-03 16:25 ` Steven Rostedt
2023-05-03 18:03 ` Suren Baghdasaryan
0 siblings, 1 reply; 160+ messages in thread
From: Steven Rostedt @ 2023-05-03 16:25 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl
On Mon, 1 May 2023 09:54:29 -0700
Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> After redefining alloc_pages, all uses of that name are being replaced.
> Change the conflicting names to prevent preprocessor from replacing them
> when it's not intended.

Note, every change log should have enough information in it to know why it
is being done. This says what the patch does, but does not fully explain
"why". It should never be assumed that one must read other patches to get
the context. A year from now, investigating git history, this may be the
only thing someone sees for why this change occurred.

The "why" above is simply "prevent preprocessor from replacing them
when it's not intended". What does that mean?

-- Steve
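
For context, the kind of conflict the quoted changelog refers to can be
sketched as follows. All names here (_alloc_pages, demo_ops,
backend_alloc) are hypothetical stand-ins, and the macro is a simplified
assumption rather than the definition used in the series. In C a
function-like macro is expanded wherever its name is followed by "(",
including after "->", so a call through a struct member of the same name
would be rewritten as well. Renaming the member keeps the two apart.

```c
#include <assert.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
static int calls_seen;

static int _alloc_pages(int order)
{
	calls_seen++;
	return order;
}

/* Once this exists, every "alloc_pages(" in later code is rewritten. */
#define alloc_pages(...) _alloc_pages(__VA_ARGS__)

static int backend_alloc(int order)
{
	return 100 + order;
}

struct demo_ops {
	/* Renamed member, so that the macro above cannot match it. */
	int (*alloc_pages_op)(int order);
};

static const struct demo_ops demo = {
	.alloc_pages_op = backend_alloc,
};

static int call_through_ops(const struct demo_ops *ops, int order)
{
	/*
	 * With a member named alloc_pages, "ops->alloc_pages(order)" would
	 * be preprocessed into "ops->_alloc_pages(order)", which names a
	 * member that does not exist and fails to compile.
	 */
	return ops->alloc_pages_op(order);
}

static int call_direct(int order)
{
	/* Expands to _alloc_pages(order). */
	return alloc_pages(order);
}
```

Only call_direct() goes through the macro; call_through_ops() reaches the
backend untouched because the member name no longer matches.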
>
> Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts
2023-05-03 16:25 ` Steven Rostedt
@ 2023-05-03 18:03 ` Suren Baghdasaryan
0 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-03 18:03 UTC (permalink / raw)
To: Steven Rostedt
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Wed, May 3, 2023 at 9:25 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 1 May 2023 09:54:29 -0700
> Suren Baghdasaryan <surenb@google.com> wrote:
>
> > After redefining alloc_pages, all uses of that name are being replaced.
> > Change the conflicting names to prevent preprocessor from replacing them
> > when it's not intended.
>
> Note, every change log should have enough information in it to know why it
> is being done. This says what the patch does, but does not fully explain
> "why". It should never be assumed that one must read other patches to get
> the context. A year from now, investigating git history, this may be the
> only thing someone sees for why this change occurred.
>
> The "why" above is simply "prevent preprocessor from replacing them
> when it's not intended". What does that mean?
Thanks for the feedback, Steve. I'll make appropriate modifications to
the description.
>
> -- Steve
>
>
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 22/40] mm: create new codetag references during page splitting
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (14 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 24/40] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
` (14 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
When a high-order page is split into smaller ones, each newly split
page should get its own codetag reference. The original codetag is
reused for these pages, but each new reference is recorded as a 0-byte
allocation because the original codetag already accounts for the whole
high-order allocation.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
mm/huge_memory.c | 2 ++
mm/page_alloc.c | 2 ++
3 files changed, 34 insertions(+)
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 567327c1c46f..0cbba13869b5 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -52,11 +52,41 @@ static inline void pgalloc_tag_dec(struct page *page, unsigned int order)
}
}
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
+{
+ int i;
+ struct page_ext *page_ext;
+ union codetag_ref *ref;
+ struct alloc_tag *tag;
+
+ if (!mem_alloc_profiling_enabled())
+ return;
+
+ page_ext = page_ext_get(page);
+ if (unlikely(!page_ext))
+ return;
+
+ ref = codetag_ref_from_page_ext(page_ext);
+ if (!ref->ct)
+ goto out;
+
+ tag = ct_to_alloc_tag(ref->ct);
+ page_ext = page_ext_next(page_ext);
+ for (i = 1; i < nr; i++) {
+ /* New reference with 0 bytes accounted */
+ alloc_tag_add(codetag_ref_from_page_ext(page_ext), tag, 0);
+ page_ext = page_ext_next(page_ext);
+ }
+out:
+ page_ext_put(page_ext);
+}
+
#else /* CONFIG_MEM_ALLOC_PROFILING */
static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
static inline void put_page_tag_ref(union codetag_ref *ref) {}
#define pgalloc_tag_dec(__page, __size) do {} while (0)
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
#endif /* CONFIG_MEM_ALLOC_PROFILING */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 624671aaa60d..221cce0052a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -37,6 +37,7 @@
#include <linux/page_owner.h>
#include <linux/sched/sysctl.h>
#include <linux/memory-tiers.h>
+#include <linux/pgalloc_tag.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2557,6 +2558,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* Caller disabled irqs, so they are still disabled here */
split_page_owner(head, nr);
+ pgalloc_tag_split(head, nr);
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edd35500f7f6..8cf5a835af7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2796,6 +2796,7 @@ void split_page(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
split_page_owner(page, 1 << order);
+ pgalloc_tag_split(page, 1 << order);
split_page_memcg(page, 1 << order);
}
EXPORT_SYMBOL_GPL(split_page);
@@ -5012,6 +5013,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *last = page + nr;
split_page_owner(page, 1 << order);
+ pgalloc_tag_split(page, 1 << order);
split_page_memcg(page, 1 << order);
while (page < --last)
set_page_refcounted(last);
--
2.40.1.495.gc816e09b53d-goog
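
The accounting rule described in this patch can be modelled in a few lines
of user-space C. This is a sketch under assumptions: demo_tag, demo_page
and the demo_* helpers are hypothetical and only mimic the byte counter,
and it assumes each page freed after the split is uncharged at order 0.
The full size is charged once at allocation, the split adds references
with 0 bytes, and each later single-page free subtracts one page, so the
counter returns to zero once every page has been freed.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE	4096UL
#define DEMO_MAX_PAGES	16

/* Hypothetical stand-ins for the tag counter and the per-page reference. */
struct demo_tag {
	size_t bytes;
};

struct demo_page {
	struct demo_tag *tag;
};

static void demo_tag_add(struct demo_page *pg, struct demo_tag *tag, size_t bytes)
{
	pg->tag = tag;
	tag->bytes += bytes;
}

static void demo_tag_sub(struct demo_page *pg, size_t bytes)
{
	if (!pg->tag)
		return;
	pg->tag->bytes -= bytes;
	pg->tag = NULL;
}

/* Split: pages 1..nr-1 get a reference to the head's tag, 0 bytes each. */
static void demo_tag_split(struct demo_page *pages, unsigned int nr)
{
	struct demo_tag *tag = pages[0].tag;
	unsigned int i;

	if (!tag)
		return;
	for (i = 1; i < nr; i++)
		demo_tag_add(&pages[i], tag, 0);
}

/*
 * Allocate 2^order pages, split them, free the first nfree pages one by
 * one, and return the bytes still charged to the tag.
 */
static size_t demo_bytes_after_split(unsigned int order, unsigned int nfree)
{
	struct demo_page pages[DEMO_MAX_PAGES] = { { NULL } };
	struct demo_tag tag = { 0 };
	unsigned int nr = 1U << order;
	unsigned int i;

	if (nr > DEMO_MAX_PAGES || nfree > nr)
		return 0;

	demo_tag_add(&pages[0], &tag, DEMO_PAGE_SIZE << order);
	demo_tag_split(pages, nr);
	for (i = 0; i < nfree; i++)
		demo_tag_sub(&pages[i], DEMO_PAGE_SIZE);

	return tag.bytes;
}
```

Charging the split pages a second time would double-count the allocation,
which is why the new references carry 0 bytes in this model.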
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 24/40] mm/slab: add allocation accounting into slab allocation and free paths
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (15 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 22/40] mm: create new codetag references during page splitting Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 25/40] mm/slab: enable slab allocation tagging for kmalloc and friends Suren Baghdasaryan
` (13 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Account slab allocations using the codetag reference embedded in slabobj_ext.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
include/linux/slab_def.h | 2 +-
include/linux/slub_def.h | 4 ++--
mm/slab.c | 4 +++-
mm/slab.h | 35 +++++++++++++++++++++++++++++++++++
4 files changed, 41 insertions(+), 4 deletions(-)
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index a61e7d55d0d3..23f14dcb8d5b 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -107,7 +107,7 @@ static inline void *nearest_obj(struct kmem_cache *cache, const struct slab *sla
* reciprocal_divide(offset, cache->reciprocal_buffer_size)
*/
static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
+ const struct slab *slab, const void *obj)
{
u32 offset = (obj - slab->s_mem);
return reciprocal_divide(offset, cache->reciprocal_buffer_size);
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index f6df03f934e5..e8be5b368857 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -176,14 +176,14 @@ static inline void *nearest_obj(struct kmem_cache *cache, const struct slab *sla
/* Determine object index from a given position */
static inline unsigned int __obj_to_index(const struct kmem_cache *cache,
- void *addr, void *obj)
+ void *addr, const void *obj)
{
return reciprocal_divide(kasan_reset_tag(obj) - addr,
cache->reciprocal_size);
}
static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
+ const struct slab *slab, const void *obj)
{
if (is_kfence_address(obj))
return 0;
diff --git a/mm/slab.c b/mm/slab.c
index ccc76f7455e9..026f0c08708a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3367,9 +3367,11 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
+ struct slab *slab = virt_to_slab(objp);
bool init;
- memcg_slab_free_hook(cachep, virt_to_slab(objp), &objp, 1);
+ memcg_slab_free_hook(cachep, slab, &objp, 1);
+ alloc_tagging_slab_free_hook(cachep, slab, &objp, 1);
if (is_kfence_address(objp)) {
kmemleak_free_recursive(objp, cachep->flags);
diff --git a/mm/slab.h b/mm/slab.h
index f953e7c81e98..f9442d3a10b2 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -494,6 +494,35 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
#endif /* CONFIG_SLAB_OBJ_EXT */
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects)
+{
+ struct slabobj_ext *obj_exts;
+ int i;
+
+ if (!mem_alloc_profiling_enabled())
+ return;
+
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
+ return;
+
+ for (i = 0; i < objects; i++) {
+ unsigned int off = obj_to_index(s, slab, p[i]);
+
+ alloc_tag_sub(&obj_exts[off].ref, s->size);
+ }
+}
+
+#else
+
+static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
#ifdef CONFIG_MEMCG_KMEM
void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr);
@@ -776,6 +805,12 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
s->flags, flags);
kmsan_slab_alloc(s, p[i], flags);
obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ /* obj_exts can be allocated for other reasons */
+ if (likely(obj_exts) && mem_alloc_profiling_enabled())
+ alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+#endif
}
memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
--
2.40.1.495.gc816e09b53d-goog
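
The per-object bookkeeping added by this patch can be illustrated with a
small user-space model. Everything below is a hypothetical sketch:
demo_slab, demo_tag and the demo_* helpers are invented for illustration,
the tag is passed in explicitly instead of coming from the current task,
and the index is computed with a plain division where the kernel uses
reciprocal_divide(). Each object slot in a slab has a reference to the tag
that allocated it. Allocation charges the object size to that tag, and the
free hook finds the slot from the object's address and uncharges it.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_OBJ_SIZE	64
#define DEMO_NR_OBJS	4

/* Hypothetical stand-in for an allocation tag's byte counter. */
struct demo_tag {
	size_t bytes;
};

/* One slab: object storage plus one tag reference per object slot. */
struct demo_slab {
	char mem[DEMO_OBJ_SIZE * DEMO_NR_OBJS];
	struct demo_tag *exts[DEMO_NR_OBJS];
};

/* Map an object address to its slot index within the slab. */
static unsigned int demo_obj_to_index(const struct demo_slab *slab, const void *obj)
{
	return (unsigned int)(((const char *)obj - slab->mem) / DEMO_OBJ_SIZE);
}

/* Allocation side: hand out slot idx and charge its size to the tag. */
static void *demo_alloc_obj(struct demo_slab *slab, unsigned int idx,
			    struct demo_tag *tag)
{
	void *obj = &slab->mem[idx * DEMO_OBJ_SIZE];

	slab->exts[demo_obj_to_index(slab, obj)] = tag;
	tag->bytes += DEMO_OBJ_SIZE;
	return obj;
}

/* Free side: uncharge every object in the array from its recorded tag. */
static void demo_free_hook(struct demo_slab *slab, void **p, int objects)
{
	int i;

	for (i = 0; i < objects; i++) {
		unsigned int off = demo_obj_to_index(slab, p[i]);
		struct demo_tag *tag = slab->exts[off];

		if (!tag)
			continue;
		tag->bytes -= DEMO_OBJ_SIZE;
		slab->exts[off] = NULL;
	}
}

/*
 * Allocate slots 0 and 2 for tag a and slot 1 for tag b, free slots 0
 * and 1, then return the bytes left on a (which == 0) or on b.
 */
static size_t demo_slab_scenario(int which)
{
	struct demo_slab slab = { { 0 }, { NULL } };
	struct demo_tag a = { 0 }, b = { 0 };
	void *objs[3];

	objs[0] = demo_alloc_obj(&slab, 0, &a);
	objs[1] = demo_alloc_obj(&slab, 1, &b);
	objs[2] = demo_alloc_obj(&slab, 2, &a);
	demo_free_hook(&slab, objs, 2);

	return which == 0 ? a.bytes : b.bytes;
}
```

In this scenario tag a keeps one object charged and tag b ends at zero,
mirroring how the free hook subtracts per object rather than per slab.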
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 25/40] mm/slab: enable slab allocation tagging for kmalloc and friends
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (16 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 24/40] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 26/40] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
` (12 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Redefine kmalloc, krealloc, kzalloc, kcalloc, etc. to record allocations
and deallocations done by these functions.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
include/linux/slab.h | 175 ++++++++++++++++++++++---------------------
mm/slab.c | 16 ++--
mm/slab_common.c | 22 +++---
mm/slub.c | 17 +++--
mm/util.c | 10 +--
5 files changed, 124 insertions(+), 116 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 99a146f3cedf..43c922524081 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -213,7 +213,10 @@ int kmem_cache_shrink(struct kmem_cache *s);
/*
* Common kmalloc functions provided by all allocators
*/
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
+void * __must_check _krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
+#define krealloc(_p, _size, _flags) \
+ alloc_hooks(_krealloc(_p, _size, _flags), void*, NULL)
+
void kfree(const void *objp);
void kfree_sensitive(const void *objp);
size_t __ksize(const void *objp);
@@ -451,6 +454,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
static_assert(PAGE_SHIFT <= 20);
#define kmalloc_index(s) __kmalloc_index(s, true)
+#include <linux/alloc_tag.h>
+
void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
/**
@@ -463,9 +468,15 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_siz
*
* Return: pointer to the new object or %NULL in case of error
*/
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) __assume_slab_alignment __malloc;
-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags) __assume_slab_alignment __malloc;
+void *_kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc(_s, _flags) \
+ alloc_hooks(_kmem_cache_alloc(_s, _flags), void*, NULL)
+
+void *_kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_lru(_s, _lru, _flags) \
+ alloc_hooks(_kmem_cache_alloc_lru(_s, _lru, _flags), void*, NULL)
+
void kmem_cache_free(struct kmem_cache *s, void *objp);
/*
@@ -476,7 +487,9 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
* Note that interrupts must be enabled when calling these functions.
*/
void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+int _kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+#define kmem_cache_alloc_bulk(_s, _flags, _size, _p) \
+ alloc_hooks(_kmem_cache_alloc_bulk(_s, _flags, _size, _p), int, 0)
static __always_inline void kfree_bulk(size_t size, void **p)
{
@@ -485,20 +498,32 @@ static __always_inline void kfree_bulk(size_t size, void **p)
void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
__alloc_size(1);
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
- __malloc;
+void *_kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
+ __malloc;
+#define kmem_cache_alloc_node(_s, _flags, _node) \
+ alloc_hooks(_kmem_cache_alloc_node(_s, _flags, _node), void*, NULL)
-void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
+void *_kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
__assume_kmalloc_alignment __alloc_size(3);
-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+void *_kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size) __assume_kmalloc_alignment
__alloc_size(4);
-void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+#define kmalloc_trace(_s, _flags, _size) \
+ alloc_hooks(_kmalloc_trace(_s, _flags, _size), void*, NULL)
+
+#define kmalloc_node_trace(_s, _gfpflags, _node, _size) \
+ alloc_hooks(_kmalloc_node_trace(_s, _gfpflags, _node, _size), void*, NULL)
+
+void *_kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
__alloc_size(1);
+#define kmalloc_large(_size, _flags) \
+ alloc_hooks(_kmalloc_large(_size, _flags), void*, NULL)
-void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+void *_kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
__alloc_size(1);
+#define kmalloc_large_node(_size, _flags, _node) \
+ alloc_hooks(_kmalloc_large_node(_size, _flags, _node), void*, NULL)
/**
* kmalloc - allocate kernel memory
@@ -554,37 +579,40 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_align
* Try really hard to succeed the allocation but fail
* eventually.
*/
-static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
+static __always_inline __alloc_size(1) void *_kmalloc(size_t size, gfp_t flags)
{
if (__builtin_constant_p(size) && size) {
unsigned int index;
if (size > KMALLOC_MAX_CACHE_SIZE)
- return kmalloc_large(size, flags);
+ return _kmalloc_large(size, flags);
index = kmalloc_index(size);
- return kmalloc_trace(
+ return _kmalloc_trace(
kmalloc_caches[kmalloc_type(flags)][index],
flags, size);
}
return __kmalloc(size, flags);
}
+#define kmalloc(_size, _flags) alloc_hooks(_kmalloc(_size, _flags), void*, NULL)
-static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline __alloc_size(1) void *_kmalloc_node(size_t size, gfp_t flags, int node)
{
if (__builtin_constant_p(size) && size) {
unsigned int index;
if (size > KMALLOC_MAX_CACHE_SIZE)
- return kmalloc_large_node(size, flags, node);
+ return _kmalloc_large_node(size, flags, node);
index = kmalloc_index(size);
- return kmalloc_node_trace(
+ return _kmalloc_node_trace(
kmalloc_caches[kmalloc_type(flags)][index],
flags, node, size);
}
return __kmalloc_node(size, flags, node);
}
+#define kmalloc_node(_size, _flags, _node) \
+ alloc_hooks(_kmalloc_node(_size, _flags, _node), void*, NULL)
/**
* kmalloc_array - allocate memory for an array.
@@ -592,16 +620,18 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *_kmalloc_array(size_t n, size_t size, gfp_t flags)
{
size_t bytes;
if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
- return kmalloc(bytes, flags);
- return __kmalloc(bytes, flags);
+ return _kmalloc(bytes, flags);
+ return _kmalloc(bytes, flags);
}
+#define kmalloc_array(_n, _size, _flags) \
+ alloc_hooks(_kmalloc_array(_n, _size, _flags), void*, NULL)
/**
* krealloc_array - reallocate memory for an array.
@@ -610,18 +640,20 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
* @new_size: new size of a single member of the array
* @flags: the type of memory to allocate (see kmalloc)
*/
-static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
- size_t new_n,
- size_t new_size,
- gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check _krealloc_array(void *p,
+ size_t new_n,
+ size_t new_size,
+ gfp_t flags)
{
size_t bytes;
if (unlikely(check_mul_overflow(new_n, new_size, &bytes)))
return NULL;
- return krealloc(p, bytes, flags);
+ return _krealloc(p, bytes, flags);
}
+#define krealloc_array(_p, _n, _size, _flags) \
+ alloc_hooks(_krealloc_array(_p, _n, _size, _flags), void*, NULL)
/**
* kcalloc - allocate memory for an array. The memory is set to zero.
@@ -629,16 +661,14 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flags)
-{
- return kmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kcalloc(_n, _size, _flags) \
+ kmalloc_array(_n, _size, (_flags) | __GFP_ZERO)
void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
unsigned long caller) __alloc_size(1);
#define kmalloc_node_track_caller(size, flags, node) \
- __kmalloc_node_track_caller(size, flags, node, \
- _RET_IP_)
+ alloc_hooks(__kmalloc_node_track_caller(size, flags, node, \
+ _RET_IP_), void*, NULL)
/*
* kmalloc_track_caller is a special version of kmalloc that records the
@@ -648,11 +678,10 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
* allocator where we care about the real place the memory allocation
* request comes from.
*/
-#define kmalloc_track_caller(size, flags) \
- __kmalloc_node_track_caller(size, flags, \
- NUMA_NO_NODE, _RET_IP_)
+#define kmalloc_track_caller(size, flags) \
+ kmalloc_node_track_caller(size, flags, NUMA_NO_NODE)
-static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
+static inline __alloc_size(1, 2) void *_kmalloc_array_node(size_t n, size_t size, gfp_t flags,
int node)
{
size_t bytes;
@@ -660,75 +689,53 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size,
if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
- return kmalloc_node(bytes, flags, node);
+ return _kmalloc_node(bytes, flags, node);
return __kmalloc_node(bytes, flags, node);
}
+#define kmalloc_array_node(_n, _size, _flags, _node) \
+ alloc_hooks(_kmalloc_array_node(_n, _size, _flags, _node), void*, NULL)
-static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t flags, int node)
-{
- return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
-}
+#define kcalloc_node(_n, _size, _flags, _node) \
+ kmalloc_array_node(_n, _size, (_flags) | __GFP_ZERO, _node)
/*
* Shortcuts
*/
-static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
-{
- return kmem_cache_alloc(k, flags | __GFP_ZERO);
-}
+#define kmem_cache_zalloc(_k, _flags) \
+ kmem_cache_alloc(_k, (_flags)|__GFP_ZERO)
/**
* kzalloc - allocate memory. The memory is set to zero.
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
-{
- return kmalloc(size, flags | __GFP_ZERO);
-}
-
-/**
- * kzalloc_node - allocate zeroed memory from a particular memory node.
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @node: memory node from which to allocate
- */
-static inline __alloc_size(1) void *kzalloc_node(size_t size, gfp_t flags, int node)
-{
- return kmalloc_node(size, flags | __GFP_ZERO, node);
-}
+#define kzalloc(_size, _flags) kmalloc(_size, (_flags)|__GFP_ZERO)
+#define kzalloc_node(_size, _flags, _node) kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
-extern void *kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1);
-static inline __alloc_size(1) void *kvmalloc(size_t size, gfp_t flags)
-{
- return kvmalloc_node(size, flags, NUMA_NO_NODE);
-}
-static inline __alloc_size(1) void *kvzalloc_node(size_t size, gfp_t flags, int node)
-{
- return kvmalloc_node(size, flags | __GFP_ZERO, node);
-}
-static inline __alloc_size(1) void *kvzalloc(size_t size, gfp_t flags)
-{
- return kvmalloc(size, flags | __GFP_ZERO);
-}
+extern void *_kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1);
+#define kvmalloc_node(_size, _flags, _node) \
+ alloc_hooks(_kvmalloc_node(_size, _flags, _node), void*, NULL)
-static inline __alloc_size(1, 2) void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
-{
- size_t bytes;
+#define kvmalloc(_size, _flags) kvmalloc_node(_size, _flags, NUMA_NO_NODE)
+#define kvzalloc(_size, _flags) kvmalloc(_size, _flags|__GFP_ZERO)
- if (unlikely(check_mul_overflow(n, size, &bytes)))
- return NULL;
+#define kvzalloc_node(_size, _flags, _node) kvmalloc_node(_size, _flags|__GFP_ZERO, _node)
- return kvmalloc(bytes, flags);
-}
+#define kvmalloc_array(_n, _size, _flags) \
+({ \
+ size_t _bytes; \
+ \
+ !check_mul_overflow(_n, _size, &_bytes) ? kvmalloc(_bytes, _flags) : NULL; \
+})
-static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t flags)
-{
- return kvmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kvcalloc(_n, _size, _flags) kvmalloc_array(_n, _size, _flags|__GFP_ZERO)
-extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+extern void *_kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
__realloc_size(3);
+
+#define kvrealloc(_p, _oldsize, _newsize, _flags) \
+ alloc_hooks(_kvrealloc(_p, _oldsize, _newsize, _flags), void*, NULL)
+
extern void kvfree(const void *addr);
extern void kvfree_sensitive(const void *addr, size_t len);
diff --git a/mm/slab.c b/mm/slab.c
index 026f0c08708a..e08bd3496f56 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3448,18 +3448,18 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
return ret;
}
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+void *_kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
return __kmem_cache_alloc_lru(cachep, NULL, flags);
}
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(_kmem_cache_alloc);
-void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+void *_kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
gfp_t flags)
{
return __kmem_cache_alloc_lru(cachep, lru, flags);
}
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
+EXPORT_SYMBOL(_kmem_cache_alloc_lru);
static __always_inline void
cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
@@ -3471,7 +3471,7 @@ cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
p[i] = cache_alloc_debugcheck_after(s, flags, p[i], caller);
}
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
+int _kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
void **p)
{
struct obj_cgroup *objcg = NULL;
@@ -3510,7 +3510,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
kmem_cache_free_bulk(s, i, p);
return 0;
}
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+EXPORT_SYMBOL(_kmem_cache_alloc_bulk);
/**
* kmem_cache_alloc_node - Allocate an object on the specified node
@@ -3525,7 +3525,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
*
* Return: pointer to the new object or %NULL in case of error
*/
-void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+void *_kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);
@@ -3533,7 +3533,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+EXPORT_SYMBOL(_kmem_cache_alloc_node);
void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
int nodeid, size_t orig_size,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 42777d66d0e3..a05333bbb7f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1101,7 +1101,7 @@ size_t __ksize(const void *object)
return slab_ksize(folio_slab(folio)->slab_cache);
}
-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+void *_kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
size, _RET_IP_);
@@ -1111,9 +1111,9 @@ void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_trace);
+EXPORT_SYMBOL(_kmalloc_trace);
-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+void *_kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size)
{
void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
@@ -1123,7 +1123,7 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_node_trace);
+EXPORT_SYMBOL(_kmalloc_node_trace);
gfp_t kmalloc_fix_flags(gfp_t flags)
{
@@ -1168,7 +1168,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
return ptr;
}
-void *kmalloc_large(size_t size, gfp_t flags)
+void *_kmalloc_large(size_t size, gfp_t flags)
{
void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
@@ -1176,9 +1176,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
flags, NUMA_NO_NODE);
return ret;
}
-EXPORT_SYMBOL(kmalloc_large);
+EXPORT_SYMBOL(_kmalloc_large);
-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+void *_kmalloc_large_node(size_t size, gfp_t flags, int node)
{
void *ret = __kmalloc_large_node(size, flags, node);
@@ -1186,7 +1186,7 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
flags, node);
return ret;
}
-EXPORT_SYMBOL(kmalloc_large_node);
+EXPORT_SYMBOL(_kmalloc_large_node);
#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Randomize a generic freelist */
@@ -1405,7 +1405,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
return (void *)p;
}
- ret = kmalloc_track_caller(new_size, flags);
+ ret = __kmalloc_node_track_caller(new_size, flags, NUMA_NO_NODE, _RET_IP_);
if (ret && p) {
/* Disable KASAN checks as the object's redzone is accessed. */
kasan_disable_current();
@@ -1429,7 +1429,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
*
* Return: pointer to the allocated memory or %NULL in case of error
*/
-void *krealloc(const void *p, size_t new_size, gfp_t flags)
+void *_krealloc(const void *p, size_t new_size, gfp_t flags)
{
void *ret;
@@ -1444,7 +1444,7 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)
return ret;
}
-EXPORT_SYMBOL(krealloc);
+EXPORT_SYMBOL(_krealloc);
/**
* kfree_sensitive - Clear sensitive information in memory before freeing
diff --git a/mm/slub.c b/mm/slub.c
index 507b71372ee4..8f57fd086f69 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3470,18 +3470,18 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
return ret;
}
-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+void *_kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
return __kmem_cache_alloc_lru(s, NULL, gfpflags);
}
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(_kmem_cache_alloc);
-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+void *_kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
return __kmem_cache_alloc_lru(s, lru, gfpflags);
}
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
+EXPORT_SYMBOL(_kmem_cache_alloc_lru);
void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t orig_size,
@@ -3491,7 +3491,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
caller, orig_size);
}
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
+void *_kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
@@ -3499,7 +3499,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+EXPORT_SYMBOL(_kmem_cache_alloc_node);
static noinline void free_to_partial_list(
struct kmem_cache *s, struct slab *slab,
@@ -3779,6 +3779,7 @@ static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab,
unsigned long addr)
{
memcg_slab_free_hook(s, slab, p, cnt);
+ alloc_tagging_slab_free_hook(s, slab, p, cnt);
/*
* With KASAN enabled slab_free_freelist_hook modifies the freelist
* to remove objects, whose reuse must be delayed.
@@ -4009,7 +4010,7 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
#endif /* CONFIG_SLUB_TINY */
/* Note that interrupts must be enabled when calling this function. */
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
+int _kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
void **p)
{
int i;
@@ -4034,7 +4035,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
slab_want_init_on_alloc(flags, s), s->object_size);
return i;
}
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+EXPORT_SYMBOL(_kmem_cache_alloc_bulk);
/*
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..e9077d1af676 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -579,7 +579,7 @@ EXPORT_SYMBOL(vm_mmap);
*
* Return: pointer to the allocated memory of %NULL in case of failure
*/
-void *kvmalloc_node(size_t size, gfp_t flags, int node)
+void *_kvmalloc_node(size_t size, gfp_t flags, int node)
{
gfp_t kmalloc_flags = flags;
void *ret;
@@ -601,7 +601,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
kmalloc_flags &= ~__GFP_NOFAIL;
}
- ret = kmalloc_node(size, kmalloc_flags, node);
+ ret = _kmalloc_node(size, kmalloc_flags, node);
/*
* It doesn't really make sense to fallback to vmalloc for sub page
@@ -630,7 +630,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
node, __builtin_return_address(0));
}
-EXPORT_SYMBOL(kvmalloc_node);
+EXPORT_SYMBOL(_kvmalloc_node);
/**
* kvfree() - Free memory.
@@ -669,7 +669,7 @@ void kvfree_sensitive(const void *addr, size_t len)
}
EXPORT_SYMBOL(kvfree_sensitive);
-void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+void *_kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
{
void *newp;
@@ -682,7 +682,7 @@ void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
kvfree(p);
return newp;
}
-EXPORT_SYMBOL(kvrealloc);
+EXPORT_SYMBOL(_kvrealloc);
/**
* __vmalloc_array - allocate memory for a virtually contiguous array.
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 26/40] mm/slub: Mark slab_free_freelist_hook() __always_inline
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (17 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 25/40] mm/slab: enable slab allocation tagging for kmalloc and friends Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 27/40] mempool: Hook up to memory allocation profiling Suren Baghdasaryan
` (11 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
It seems we need to be more forceful with the compiler on this one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index 8f57fd086f69..9dd57b3384a1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1781,7 +1781,7 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s,
return kasan_slab_free(s, x, init);
}
-static inline bool slab_free_freelist_hook(struct kmem_cache *s,
+static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
void **head, void **tail,
int *cnt)
{
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 27/40] mempool: Hook up to memory allocation profiling
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (18 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 26/40] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 29/40] mm: percpu: Introduce pcpuobj_ext Suren Baghdasaryan
` (10 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
This adds hooks to mempools so that mempool-backed allocations are
attributed to the source line of the caller and show up correctly in
/sys/kernel/debug/allocations.
Various inline functions are converted to wrappers so that we can invoke
alloc_hooks() in fewer places.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
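The reason the static inline helpers in mempool.h become macros can be shown with a small userspace sketch. It assumes nothing about the real alloc_hooks(); demo_hook(), pool_create_inline() and pool_create_macro() are hypothetical names, and __LINE__ stands in for whatever call-site identity the hook records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: reports the source line it was expanded on. */
#define demo_hook() (__LINE__)

/*
 * Inline wrapper: the hook expands here, inside the helper, so every
 * caller is reported as this one line of the header.
 */
static inline int pool_create_inline(void)
{
	return demo_hook();
}

/* Macro wrapper: the hook expands at each caller instead. */
#define pool_create_macro() demo_hook()
```

With the inline version, every pool created through the helper would be accounted to a single line in the header. With the macro version, each user of mempool_create_slab_pool() and friends gets its own entry.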
include/linux/mempool.h | 73 ++++++++++++++++++++---------------------
mm/mempool.c | 28 ++++++----------
2 files changed, 45 insertions(+), 56 deletions(-)
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 4aae6c06c5f2..aa6e886b01d7 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -5,6 +5,8 @@
#ifndef _LINUX_MEMPOOL_H
#define _LINUX_MEMPOOL_H
+#include <linux/sched.h>
+#include <linux/alloc_tag.h>
#include <linux/wait.h>
#include <linux/compiler.h>
@@ -39,18 +41,32 @@ void mempool_exit(mempool_t *pool);
int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int node_id);
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+
+int _mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
+#define mempool_init(...) \
+ alloc_hooks(_mempool_init(__VA_ARGS__), int, -ENOMEM)
extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
-extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
+
+extern mempool_t *_mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int nid);
+#define mempool_create_node(...) \
+ alloc_hooks(_mempool_create_node(__VA_ARGS__), mempool_t *, NULL)
+
+#define mempool_create(_min_nr, _alloc_fn, _free_fn, _pool_data) \
+ mempool_create_node(_min_nr, _alloc_fn, _free_fn, _pool_data, \
+ GFP_KERNEL, NUMA_NO_NODE)
extern int mempool_resize(mempool_t *pool, int new_min_nr);
extern void mempool_destroy(mempool_t *pool);
-extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
+
+extern void *_mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
+#define mempool_alloc(_pool, _gfp) \
+ alloc_hooks(_mempool_alloc((_pool), (_gfp)), void *, NULL)
+
extern void mempool_free(void *element, mempool_t *pool);
/*
@@ -61,19 +77,10 @@ extern void mempool_free(void *element, mempool_t *pool);
void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data);
void mempool_free_slab(void *element, void *pool_data);
-static inline int
-mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
-{
- return mempool_init(pool, min_nr, mempool_alloc_slab,
- mempool_free_slab, (void *) kc);
-}
-
-static inline mempool_t *
-mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
-{
- return mempool_create(min_nr, mempool_alloc_slab, mempool_free_slab,
- (void *) kc);
-}
+#define mempool_init_slab_pool(_pool, _min_nr, _kc) \
+ mempool_init(_pool, (_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))
+#define mempool_create_slab_pool(_min_nr, _kc) \
+ mempool_create((_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))
/*
* a mempool_alloc_t and a mempool_free_t to kmalloc and kfree the
@@ -82,17 +89,12 @@ mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data);
void mempool_kfree(void *element, void *pool_data);
-static inline int mempool_init_kmalloc_pool(mempool_t *pool, int min_nr, size_t size)
-{
- return mempool_init(pool, min_nr, mempool_kmalloc,
- mempool_kfree, (void *) size);
-}
-
-static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
-{
- return mempool_create(min_nr, mempool_kmalloc, mempool_kfree,
- (void *) size);
-}
+#define mempool_init_kmalloc_pool(_pool, _min_nr, _size) \
+ mempool_init(_pool, (_min_nr), mempool_kmalloc, mempool_kfree, \
+ (void *)(unsigned long)(_size))
+#define mempool_create_kmalloc_pool(_min_nr, _size) \
+ mempool_create((_min_nr), mempool_kmalloc, mempool_kfree, \
+ (void *)(unsigned long)(_size))
/*
* A mempool_alloc_t and mempool_free_t for a simple page allocator that
@@ -101,16 +103,11 @@ static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data);
void mempool_free_pages(void *element, void *pool_data);
-static inline int mempool_init_page_pool(mempool_t *pool, int min_nr, int order)
-{
- return mempool_init(pool, min_nr, mempool_alloc_pages,
- mempool_free_pages, (void *)(long)order);
-}
-
-static inline mempool_t *mempool_create_page_pool(int min_nr, int order)
-{
- return mempool_create(min_nr, mempool_alloc_pages, mempool_free_pages,
- (void *)(long)order);
-}
+#define mempool_init_page_pool(_pool, _min_nr, _order) \
+ mempool_init(_pool, (_min_nr), mempool_alloc_pages, \
+ mempool_free_pages, (void *)(long)(_order))
+#define mempool_create_page_pool(_min_nr, _order) \
+ mempool_create((_min_nr), mempool_alloc_pages, \
+ mempool_free_pages, (void *)(long)(_order))
#endif /* _LINUX_MEMPOOL_H */
diff --git a/mm/mempool.c b/mm/mempool.c
index 734bcf5afbb7..4fc90735853c 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -230,17 +230,17 @@ EXPORT_SYMBOL(mempool_init_node);
*
* Return: %0 on success, negative error code otherwise.
*/
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+int _mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data)
{
return mempool_init_node(pool, min_nr, alloc_fn, free_fn,
pool_data, GFP_KERNEL, NUMA_NO_NODE);
}
-EXPORT_SYMBOL(mempool_init);
+EXPORT_SYMBOL(_mempool_init);
/**
- * mempool_create - create a memory pool
+ * mempool_create_node - create a memory pool
* @min_nr: the minimum number of elements guaranteed to be
* allocated for this pool.
* @alloc_fn: user-defined element-allocation function.
@@ -255,15 +255,7 @@ EXPORT_SYMBOL(mempool_init);
*
* Return: pointer to the created memory pool object or %NULL on error.
*/
-mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data)
-{
- return mempool_create_node(min_nr, alloc_fn, free_fn, pool_data,
- GFP_KERNEL, NUMA_NO_NODE);
-}
-EXPORT_SYMBOL(mempool_create);
-
-mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
+mempool_t *_mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int node_id)
{
@@ -281,7 +273,7 @@ mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
return pool;
}
-EXPORT_SYMBOL(mempool_create_node);
+EXPORT_SYMBOL(_mempool_create_node);
/**
* mempool_resize - resize an existing memory pool
@@ -377,7 +369,7 @@ EXPORT_SYMBOL(mempool_resize);
*
* Return: pointer to the allocated element or %NULL on error.
*/
-void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
+void *_mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
{
void *element;
unsigned long flags;
@@ -444,7 +436,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
finish_wait(&pool->wait, &wait);
goto repeat_alloc;
}
-EXPORT_SYMBOL(mempool_alloc);
+EXPORT_SYMBOL(_mempool_alloc);
/**
* mempool_free - return an element to the pool.
@@ -515,7 +507,7 @@ void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data)
{
struct kmem_cache *mem = pool_data;
VM_BUG_ON(mem->ctor);
- return kmem_cache_alloc(mem, gfp_mask);
+ return _kmem_cache_alloc(mem, gfp_mask);
}
EXPORT_SYMBOL(mempool_alloc_slab);
@@ -533,7 +525,7 @@ EXPORT_SYMBOL(mempool_free_slab);
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
{
size_t size = (size_t)pool_data;
- return kmalloc(size, gfp_mask);
+ return _kmalloc(size, gfp_mask);
}
EXPORT_SYMBOL(mempool_kmalloc);
@@ -550,7 +542,7 @@ EXPORT_SYMBOL(mempool_kfree);
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data)
{
int order = (int)(long)pool_data;
- return alloc_pages(gfp_mask, order);
+ return _alloc_pages(gfp_mask, order);
}
EXPORT_SYMBOL(mempool_alloc_pages);
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 29/40] mm: percpu: Introduce pcpuobj_ext
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (19 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 27/40] mempool: Hook up to memory allocation profiling Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (9 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
Upcoming alloc tagging patches require a place to stash per-allocation
metadata.
We already do this when memcg is enabled, so this patch generalizes the
obj_cgroup * vector in struct pcpu_chunk by creating a pcpuobj_ext
type, which we will be adding to in an upcoming patch - similarly to the
previous slabobj_ext patch.
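The generalization can be pictured with a small userspace sketch. This is an illustration only, not the kernel code: demo_chunk, the demo_* helpers and DEMO_MIN_ALLOC_SHIFT are hypothetical stand-ins for struct pcpu_chunk, the memcg hooks and PCPU_MIN_ALLOC_SHIFT, and calloc() stands in for pcpu_mem_zalloc().

```c
#include <stdlib.h>

struct obj_cgroup;			/* opaque in this sketch */

/* One extension record per allocation slot; later patches can add members. */
struct pcpuobj_ext {
	struct obj_cgroup *cgroup;
};

#define DEMO_MIN_ALLOC_SHIFT 2		/* assumed slot granularity of 4 bytes */

/* Hypothetical stand-in for struct pcpu_chunk. */
struct demo_chunk {
	struct pcpuobj_ext *obj_exts;	/* vector of per-object extensions */
};

/* Allocate a zeroed extension vector, one entry per slot. */
static int demo_chunk_init(struct demo_chunk *chunk, size_t nr_slots)
{
	chunk->obj_exts = calloc(nr_slots, sizeof(*chunk->obj_exts));
	return chunk->obj_exts ? 0 : -1;
}

/* Record the owning cgroup for the object at byte offset off. */
static void demo_set_cgroup(struct demo_chunk *chunk, int off,
			    struct obj_cgroup *objcg)
{
	if (chunk->obj_exts)
		chunk->obj_exts[off >> DEMO_MIN_ALLOC_SHIFT].cgroup = objcg;
}

/* Look up the owning cgroup, or NULL if nothing is tracked. */
static struct obj_cgroup *demo_get_cgroup(const struct demo_chunk *chunk,
					  int off)
{
	if (!chunk->obj_exts)
		return NULL;
	return chunk->obj_exts[off >> DEMO_MIN_ALLOC_SHIFT].cgroup;
}
```

Because callers index a vector of structs rather than a vector of pointers, a new per-object field only needs a new struct member, not a second parallel vector.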
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: linux-mm@kvack.org
---
mm/percpu-internal.h | 19 +++++++++++++++++--
mm/percpu.c | 30 +++++++++++++++---------------
2 files changed, 32 insertions(+), 17 deletions(-)
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index f9847c131998..2433e7b24172 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -32,6 +32,16 @@ struct pcpu_block_md {
int nr_bits; /* total bits responsible for */
};
+struct pcpuobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
+ struct obj_cgroup *cgroup;
+#endif
+};
+
+#ifdef CONFIG_MEMCG_KMEM
+#define NEED_PCPUOBJ_EXT
+#endif
+
struct pcpu_chunk {
#ifdef CONFIG_PERCPU_STATS
int nr_alloc; /* # of allocations */
@@ -57,8 +67,8 @@ struct pcpu_chunk {
int end_offset; /* additional area required to
have the region end page
aligned */
-#ifdef CONFIG_MEMCG_KMEM
- struct obj_cgroup **obj_cgroups; /* vector of object cgroups */
+#ifdef NEED_PCPUOBJ_EXT
+ struct pcpuobj_ext *obj_exts; /* vector of object cgroups */
#endif
int nr_pages; /* # of pages served by this chunk */
@@ -67,6 +77,11 @@ struct pcpu_chunk {
unsigned long populated[]; /* populated bitmap */
};
+static inline bool need_pcpuobj_ext(void)
+{
+ return !mem_cgroup_kmem_disabled();
+}
+
extern spinlock_t pcpu_lock;
extern struct list_head *pcpu_chunk_lists;
diff --git a/mm/percpu.c b/mm/percpu.c
index 28e07ede46f6..95b26a6b718d 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1392,9 +1392,9 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
panic("%s: Failed to allocate %zu bytes\n", __func__,
alloc_size);
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
/* first chunk is free to use */
- chunk->obj_cgroups = NULL;
+ chunk->obj_exts = NULL;
#endif
pcpu_init_md_blocks(chunk);
@@ -1463,12 +1463,12 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
if (!chunk->md_blocks)
goto md_blocks_fail;
-#ifdef CONFIG_MEMCG_KMEM
- if (!mem_cgroup_kmem_disabled()) {
- chunk->obj_cgroups =
+#ifdef NEED_PCPUOBJ_EXT
+ if (need_pcpuobj_ext()) {
+ chunk->obj_exts =
pcpu_mem_zalloc(pcpu_chunk_map_bits(chunk) *
- sizeof(struct obj_cgroup *), gfp);
- if (!chunk->obj_cgroups)
+ sizeof(struct pcpuobj_ext), gfp);
+ if (!chunk->obj_exts)
goto objcg_fail;
}
#endif
@@ -1480,7 +1480,7 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
return chunk;
-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
objcg_fail:
pcpu_mem_free(chunk->md_blocks);
#endif
@@ -1498,8 +1498,8 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk)
{
if (!chunk)
return;
-#ifdef CONFIG_MEMCG_KMEM
- pcpu_mem_free(chunk->obj_cgroups);
+#ifdef NEED_PCPUOBJ_EXT
+ pcpu_mem_free(chunk->obj_exts);
#endif
pcpu_mem_free(chunk->md_blocks);
pcpu_mem_free(chunk->bound_map);
@@ -1648,8 +1648,8 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,
if (!objcg)
return;
- if (likely(chunk && chunk->obj_cgroups)) {
- chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg;
+ if (likely(chunk && chunk->obj_exts)) {
+ chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = objcg;
rcu_read_lock();
mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_PERCPU_B,
@@ -1665,13 +1665,13 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
{
struct obj_cgroup *objcg;
- if (unlikely(!chunk->obj_cgroups))
+ if (unlikely(!chunk->obj_exts))
return;
- objcg = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT];
+ objcg = chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup;
if (!objcg)
return;
- chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL;
+ chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = NULL;
obj_cgroup_uncharge(objcg, pcpu_obj_full_size(size));
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
[parent not found: <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* [PATCH 02/40] scripts/kallsyms: Always include __start and __stop symbols
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 03/40] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
` (10 subsequent siblings)
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
These symbols are used to denote section boundaries: by always including
them we can unify loading sections from modules with loading built-in
sections, which leads to some significant cleanup.
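For illustration, the name filter this adds can be sketched in plain userspace C. string_starts_with() mirrors the helper in the diff; is_section_boundary_symbol() is a hypothetical wrapper name used only in this sketch and does not exist in scripts/kallsyms.c.

```c
#include <stdbool.h>
#include <string.h>

/* Same prefix test as the helper added by the patch. */
static bool string_starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical wrapper: true for symbols that mark section boundaries. */
static bool is_section_boundary_symbol(const char *name)
{
	return string_starts_with(name, "__start_") ||
	       string_starts_with(name, "__stop_");
}
```

Note that the trailing underscore is part of the prefix, so a name such as `__startup` would not match.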
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
scripts/kallsyms.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 0d2db41177b2..7b7dbeb5bd6e 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -203,6 +203,11 @@ static int symbol_in_range(const struct sym_entry *s,
return 0;
}
+static bool string_starts_with(const char *s, const char *prefix)
+{
+ return strncmp(s, prefix, strlen(prefix)) == 0;
+}
+
static int symbol_valid(const struct sym_entry *s)
{
const char *name = sym_name(s);
@@ -210,6 +215,14 @@ static int symbol_valid(const struct sym_entry *s)
/* if --all-symbols is not specified, then symbols outside the text
* and inittext sections are discarded */
if (!all_symbols) {
+ /*
+ * Symbols starting with __start and __stop are used to denote
+ * section boundaries, and should always be included:
+ */
+ if (string_starts_with(name, "__start_") ||
+ string_starts_with(name, "__stop_"))
+ return 1;
+
if (symbol_in_range(s, text_ranges,
ARRAY_SIZE(text_ranges)) == 0)
return 0;
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 03/40] fs: Convert alloc_inode_sb() to a macro
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2023-05-01 16:54 ` [PATCH 02/40] scripts/kallsyms: Always include __start and __stop symbols Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-02 12:35 ` Petr Tesařík
2023-05-01 16:54 ` [PATCH 07/40] Lazy percpu counters Suren Baghdasaryan
` (9 subsequent siblings)
11 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
We're introducing alloc tagging, which tracks memory allocations by
callsite. Converting alloc_inode_sb() to a macro means allocations will
be tracked by its caller, which is a bit more useful.
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Alexander Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
---
include/linux/fs.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..4905ce14db0b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2699,11 +2699,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
* This must be used for allocating filesystems specific inodes to set
* up the inode reclaim context correctly.
*/
-static inline void *
-alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
-{
- return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
-}
+#define alloc_inode_sb(_sb, _cache, _gfp) kmem_cache_alloc_lru(_cache, &_sb->s_inode_lru, _gfp)
extern void __insert_inode_hash(struct inode *, unsigned long hashval);
static inline void insert_inode_hash(struct inode *inode)
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 03/40] fs: Convert alloc_inode_sb() to a macro
2023-05-01 16:54 ` [PATCH 03/40] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
@ 2023-05-02 12:35 ` Petr Tesařík
[not found] ` <20230502143530.1586e287-TD/jYOLh/Qr2G+KSGY6Hrl+YFMdMcpeZ@public.gmane.org>
0 siblings, 1 reply; 160+ messages in thread
From: Petr Tesařík @ 2023-05-02 12:35 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, roman.gushchin,
mgorman, dave, willy, liam.howlett, corbet, void, peterz,
juri.lelli, ldufour, catalin.marinas, will, arnd, tglx, mingo,
dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy, nathan,
dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, andreyknvl
On Mon, 1 May 2023 09:54:13 -0700
Suren Baghdasaryan <surenb@google.com> wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> We're introducing alloc tagging, which tracks memory allocations by
> callsite. Converting alloc_inode_sb() to a macro means allocations will
> be tracked by its caller, which is a bit more useful.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> ---
> include/linux/fs.h | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 21a981680856..4905ce14db0b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2699,11 +2699,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
> * This must be used for allocating filesystems specific inodes to set
> * up the inode reclaim context correctly.
> */
> -static inline void *
> -alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
> -{
> - return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
> -}
> +#define alloc_inode_sb(_sb, _cache, _gfp) kmem_cache_alloc_lru(_cache, &_sb->s_inode_lru, _gfp)
Honestly, I don't like this change. In general, pre-processor macros
are ugly and error-prone.
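One concrete hazard of this kind, given as a generic illustration rather than as the specific objection raised here: the macro in the patch writes `&_sb->s_inode_lru` without parenthesizing `_sb`. The userspace sketch below uses hypothetical stand-in types (lru_list, super_block_demo) and a hypothetical LRU_OF() macro to show the parenthesized form next to an inline function.

```c
/* Hypothetical stand-ins for the kernel types. */
struct lru_list { int nr_items; };
struct super_block_demo { struct lru_list s_inode_lru; };

/*
 * Parenthesizing the argument keeps "->" bound to the whole expression.
 * Without the parentheses, LRU_OF(sbs + 1) would expand to
 * &sbs + 1->s_inode_lru, which does not compile.
 */
#define LRU_OF(_sb) (&(_sb)->s_inode_lru)

/* An inline function has no such hazard: the argument is evaluated first. */
static inline struct lru_list *lru_of(struct super_block_demo *sb)
{
	return &sb->s_inode_lru;
}
```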
Besides, it works for you only because __kmem_cache_alloc_lru() is
declared __always_inline (unless CONFIG_SLUB_TINY is defined, but then
you probably don't want the tracking either). In any case, it's going
to be difficult for people to understand why and how this works.
If the actual caller of alloc_inode_sb() is needed, I'd rather add it
as a parameter and pass down _RET_IP_ explicitly here.
Just my two cents,
Petr T
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 07/40] Lazy percpu counters
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2023-05-01 16:54 ` [PATCH 02/40] scripts/kallsyms: Always include __start and __stop symbols Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 03/40] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 19:17 ` Randy Dunlap
2023-05-01 16:54 ` [PATCH 09/40] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
` (8 subsequent siblings)
11 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
This patch adds lib/lazy-percpu-counter.c, which implements counters
that start out as atomics, but lazily switch to percpu mode if the
update rate crosses some threshold (arbitrarily set at 256 per second).
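The atomic-mode encoding can be sketched in userspace as follows. This is a simplified illustration under stated assumptions, not the patch itself: the names pack_add(), unpack_val() and unpack_ticks() are hypothetical, plain uint64_t arithmetic replaces atomic64_t, and the shift in unpack_val() assumes the compiler implements right shift of a negative signed value as an arithmetic shift (as GCC and Clang do).

```c
#include <stdint.h>

#define MOD_BITS	8			/* high bits: update-rate ticks */
#define MOD_MASK	(~(~0ULL >> MOD_BITS))	/* mask of the top 8 bits */
#define MOD_BITS_START	(64 - MOD_BITS)
#define IS_PCPU_BIT	1			/* low bit: percpu-mode flag */

/* Encode "add i" plus one tick of the secondary counter. */
static uint64_t pack_add(int64_t i)
{
	uint64_t delta = (uint64_t)i << IS_PCPU_BIT;	/* skip the flag bit */

	delta &= ~MOD_MASK;			/* stay out of the tick field */
	delta |= 1ULL << MOD_BITS_START;	/* one tick per update */
	return delta;
}

/* Recover the signed 55-bit value stored between flag bit and tick field. */
static int64_t unpack_val(uint64_t v)
{
	/* Drop the tick field, then shift back down with sign extension. */
	return (int64_t)(v << MOD_BITS) >> (MOD_BITS + IS_PCPU_BIT);
}

/* Number of updates recorded since the tick field last wrapped. */
static unsigned int unpack_ticks(uint64_t v)
{
	return (unsigned int)(v >> MOD_BITS_START);
}
```

The tick field wraps every 256 updates, which is where the 256-per-second threshold comes from: two wraps within one second (HZ jiffies) trigger the switch. In this sketch a negative delta can carry out of the value field into the tick field, so the tick count is best read as a rate heuristic rather than an exact update count.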
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
include/linux/lazy-percpu-counter.h | 102 ++++++++++++++++++++++
lib/Kconfig | 3 +
lib/Makefile | 2 +
lib/lazy-percpu-counter.c | 127 ++++++++++++++++++++++++++++
4 files changed, 234 insertions(+)
create mode 100644 include/linux/lazy-percpu-counter.h
create mode 100644 lib/lazy-percpu-counter.c
diff --git a/include/linux/lazy-percpu-counter.h b/include/linux/lazy-percpu-counter.h
new file mode 100644
index 000000000000..45ca9e2ce58b
--- /dev/null
+++ b/include/linux/lazy-percpu-counter.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lazy percpu counters:
+ * (C) 2022 Kent Overstreet
+ *
+ * Lazy percpu counters start out in atomic mode, then switch to percpu mode if
+ * the update rate crosses some threshold.
+ *
+ * This means we don't have to decide between low memory overhead atomic
+ * counters and higher performance percpu counters - we can have our cake and
+ * eat it, too!
+ *
+ * Internally we use an atomic64_t, where the low bit indicates whether we're in
+ * percpu mode, and the high 8 bits are a secondary counter that's incremented
+ * when the counter is modified - meaning 55 bits of precision are available for
+ * the counter itself.
+ */
+
+#ifndef _LINUX_LAZY_PERCPU_COUNTER_H
+#define _LINUX_LAZY_PERCPU_COUNTER_H
+
+#include <linux/atomic.h>
+#include <asm/percpu.h>
+
+struct lazy_percpu_counter {
+ atomic64_t v;
+ unsigned long last_wrap;
+};
+
+void lazy_percpu_counter_exit(struct lazy_percpu_counter *c);
+void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i);
+void lazy_percpu_counter_add_slowpath_noupgrade(struct lazy_percpu_counter *c, s64 i);
+s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c);
+
+/*
+ * We use the high bits of the atomic counter for a secondary counter, which is
+ * incremented every time the counter is touched. When the secondary counter
+ * wraps, we check the time the counter last wrapped, and if it was recent
+ * enough that means the update frequency has crossed our threshold and we
+ * switch to percpu mode:
+ */
+#define COUNTER_MOD_BITS 8
+#define COUNTER_MOD_MASK ~(~0ULL >> COUNTER_MOD_BITS)
+#define COUNTER_MOD_BITS_START (64 - COUNTER_MOD_BITS)
+
+/*
+ * We use the low bit of the counter to indicate whether we're in atomic mode
+ * (low bit clear), or percpu mode (low bit set, counter is a pointer to actual
+ * percpu counters):
+ */
+#define COUNTER_IS_PCPU_BIT 1
+
+static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v)
+{
+ if (!(v & COUNTER_IS_PCPU_BIT))
+ return NULL;
+
+ v ^= COUNTER_IS_PCPU_BIT;
+ return (u64 __percpu *)(unsigned long)v;
+}
+
+/**
+ * lazy_percpu_counter_add: Add a value to a lazy_percpu_counter
+ *
+ * @c: counter to modify
+ * @i: value to add
+ */
+static inline void lazy_percpu_counter_add(struct lazy_percpu_counter *c, s64 i)
+{
+ u64 v = atomic64_read(&c->v);
+ u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+ if (likely(pcpu_v))
+ this_cpu_add(*pcpu_v, i);
+ else
+ lazy_percpu_counter_add_slowpath(c, i);
+}
+
+/**
+ * lazy_percpu_counter_add_noupgrade: Add a value to a lazy_percpu_counter,
+ * without upgrading to percpu mode
+ *
+ * @c: counter to modify
+ * @i: value to add
+ */
+static inline void lazy_percpu_counter_add_noupgrade(struct lazy_percpu_counter *c, s64 i)
+{
+ u64 v = atomic64_read(&c->v);
+ u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+ if (likely(pcpu_v))
+ this_cpu_add(*pcpu_v, i);
+ else
+ lazy_percpu_counter_add_slowpath_noupgrade(c, i);
+}
+
+static inline void lazy_percpu_counter_sub(struct lazy_percpu_counter *c, s64 i)
+{
+ lazy_percpu_counter_add(c, -i);
+}
+
+#endif /* _LINUX_LAZY_PERCPU_COUNTER_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 5c2da561c516..7380292a8fcd 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -505,6 +505,9 @@ config ASSOCIATIVE_ARRAY
for more information.
+config LAZY_PERCPU_COUNTER
+ bool
+
config HAS_IOMEM
bool
depends on !NO_IOMEM
diff --git a/lib/Makefile b/lib/Makefile
index 876fcdeae34e..293a0858a3f8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -164,6 +164,8 @@ obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
obj-$(CONFIG_DEBUG_LIST) += list_debug.o
obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
+obj-$(CONFIG_LAZY_PERCPU_COUNTER) += lazy-percpu-counter.o
+
obj-$(CONFIG_BITREVERSE) += bitrev.o
obj-$(CONFIG_LINEAR_RANGES) += linear_ranges.o
obj-$(CONFIG_PACKING) += packing.o
diff --git a/lib/lazy-percpu-counter.c b/lib/lazy-percpu-counter.c
new file mode 100644
index 000000000000..4f4e32c2dc09
--- /dev/null
+++ b/lib/lazy-percpu-counter.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/atomic.h>
+#include <linux/gfp.h>
+#include <linux/jiffies.h>
+#include <linux/lazy-percpu-counter.h>
+#include <linux/percpu.h>
+
+static inline s64 lazy_percpu_counter_atomic_val(s64 v)
+{
+ /* Ensure output is sign extended properly: */
+ return (v << COUNTER_MOD_BITS) >>
+ (COUNTER_MOD_BITS + COUNTER_IS_PCPU_BIT);
+}
+
+static void lazy_percpu_counter_switch_to_pcpu(struct lazy_percpu_counter *c)
+{
+ u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
+ u64 old, new, v;
+
+ if (!pcpu_v)
+ return;
+
+ preempt_disable();
+ v = atomic64_read(&c->v);
+ do {
+ if (lazy_percpu_counter_is_pcpu(v)) {
+ free_percpu(pcpu_v);
+ return;
+ }
+
+ old = v;
+ new = (unsigned long)pcpu_v | 1;
+
+ *this_cpu_ptr(pcpu_v) = lazy_percpu_counter_atomic_val(v);
+ } while ((v = atomic64_cmpxchg(&c->v, old, new)) != old);
+ preempt_enable();
+}
+
+/**
+ * lazy_percpu_counter_exit: Free resources associated with a
+ * lazy_percpu_counter
+ *
+ * @c: counter to exit
+ */
+void lazy_percpu_counter_exit(struct lazy_percpu_counter *c)
+{
+ free_percpu(lazy_percpu_counter_is_pcpu(atomic64_read(&c->v)));
+}
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_exit);
+
+/**
+ * lazy_percpu_counter_read: Read current value of a lazy_percpu_counter
+ *
+ * @c: counter to read
+ */
+s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c)
+{
+ s64 v = atomic64_read(&c->v);
+ u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+ if (pcpu_v) {
+ int cpu;
+
+ v = 0;
+ for_each_possible_cpu(cpu)
+ v += *per_cpu_ptr(pcpu_v, cpu);
+ } else {
+ v = lazy_percpu_counter_atomic_val(v);
+ }
+
+ return v;
+}
+EXPORT_SYMBOL_GPL(lazy_percpu_counter_read);
+
+void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i)
+{
+ u64 atomic_i;
+ u64 old, v = atomic64_read(&c->v);
+ u64 __percpu *pcpu_v;
+
+ atomic_i = i << COUNTER_IS_PCPU_BIT;
+ atomic_i &= ~COUNTER_MOD_MASK;
+ atomic_i |= 1ULL << COUNTER_MOD_BITS_START;
+
+ do {
+ pcpu_v = lazy_percpu_counter_is_pcpu(v);
+ if (pcpu_v) {
+ this_cpu_add(*pcpu_v, i);
+ return;
+ }
+
+ old = v;
+ } while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old);
+
+ if (unlikely(!(v & COUNTER_MOD_MASK))) {
+ unsigned long now = jiffies;
+
+ if (c->last_wrap &&
+ unlikely(time_after(c->last_wrap + HZ, now)))
+ lazy_percpu_counter_switch_to_pcpu(c);
+ else
+ c->last_wrap = now;
+ }
+}
+EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath);
+
+void lazy_percpu_counter_add_slowpath_noupgrade(struct lazy_percpu_counter *c, s64 i)
+{
+ u64 atomic_i;
+ u64 old, v = atomic64_read(&c->v);
+ u64 __percpu *pcpu_v;
+
+ atomic_i = i << COUNTER_IS_PCPU_BIT;
+ atomic_i &= ~COUNTER_MOD_MASK;
+
+ do {
+ pcpu_v = lazy_percpu_counter_is_pcpu(v);
+ if (pcpu_v) {
+ this_cpu_add(*pcpu_v, i);
+ return;
+ }
+
+ old = v;
+ } while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old);
+}
+EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath_noupgrade);
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 07/40] Lazy percpu counters
2023-05-01 16:54 ` [PATCH 07/40] Lazy percpu counters Suren Baghdasaryan
@ 2023-05-01 19:17 ` Randy Dunlap
0 siblings, 0 replies; 160+ messages in thread
From: Randy Dunlap @ 2023-05-01 19:17 UTC (permalink / raw)
To: Suren Baghdasaryan, akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Hi--
On 5/1/23 09:54, Suren Baghdasaryan wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> This patch adds lib/lazy-percpu-counter.c, which implements counters
> that start out as atomics, but lazily switch to percpu mode if the
> update rate crosses some threshold (arbitrarily set at 256 per second).
>
from submitting-patches.rst:
Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour.
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> include/linux/lazy-percpu-counter.h | 102 ++++++++++++++++++++++
> lib/Kconfig | 3 +
> lib/Makefile | 2 +
> lib/lazy-percpu-counter.c | 127 ++++++++++++++++++++++++++++
> 4 files changed, 234 insertions(+)
> create mode 100644 include/linux/lazy-percpu-counter.h
> create mode 100644 lib/lazy-percpu-counter.c
>
> diff --git a/include/linux/lazy-percpu-counter.h b/include/linux/lazy-percpu-counter.h
> new file mode 100644
> index 000000000000..45ca9e2ce58b
> --- /dev/null
> +++ b/include/linux/lazy-percpu-counter.h
> @@ -0,0 +1,102 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Lazy percpu counters:
> + * (C) 2022 Kent Overstreet
> + *
> + * Lazy percpu counters start out in atomic mode, then switch to percpu mode if
> + * the update rate crosses some threshold.
> + *
> + * This means we don't have to decide between low memory overhead atomic
> + * counters and higher performance percpu counters - we can have our cake and
> + * eat it, too!
> + *
> + * Internally we use an atomic64_t, where the low bit indicates whether we're in
> + * percpu mode, and the high 8 bits are a secondary counter that's incremented
> + * when the counter is modified - meaning 55 bits of precision are available for
> + * the counter itself.
> + */
> +
> +#ifndef _LINUX_LAZY_PERCPU_COUNTER_H
> +#define _LINUX_LAZY_PERCPU_COUNTER_H
> +
> +#include <linux/atomic.h>
> +#include <asm/percpu.h>
> +
> +struct lazy_percpu_counter {
> + atomic64_t v;
> + unsigned long last_wrap;
> +};
> +
> +void lazy_percpu_counter_exit(struct lazy_percpu_counter *c);
> +void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i);
> +void lazy_percpu_counter_add_slowpath_noupgrade(struct lazy_percpu_counter *c, s64 i);
> +s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c);
> +
> +/*
> + * We use the high bits of the atomic counter for a secondary counter, which is
> + * incremented every time the counter is touched. When the secondary counter
> + * wraps, we check the time the counter last wrapped, and if it was recent
> + * enough that means the update frequency has crossed our threshold and we
> + * switch to percpu mode:
> + */
> +#define COUNTER_MOD_BITS 8
> +#define COUNTER_MOD_MASK ~(~0ULL >> COUNTER_MOD_BITS)
> +#define COUNTER_MOD_BITS_START (64 - COUNTER_MOD_BITS)
> +
> +/*
> + * We use the low bit of the counter to indicate whether we're in atomic mode
> + * (low bit clear), or percpu mode (low bit set, counter is a pointer to actual
> + * percpu counters):
> + */
> +#define COUNTER_IS_PCPU_BIT 1
> +
> +static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v)
> +{
> + if (!(v & COUNTER_IS_PCPU_BIT))
> + return NULL;
> +
> + v ^= COUNTER_IS_PCPU_BIT;
> + return (u64 __percpu *)(unsigned long)v;
> +}
> +
> +/**
> + * lazy_percpu_counter_add: Add a value to a lazy_percpu_counter
For kernel-doc, the function name should be followed by '-', not ':'.
(many places)
> + *
> + * @c: counter to modify
> + * @i: value to add
> + */
> +static inline void lazy_percpu_counter_add(struct lazy_percpu_counter *c, s64 i)
> +{
> + u64 v = atomic64_read(&c->v);
> + u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
> +
> + if (likely(pcpu_v))
> + this_cpu_add(*pcpu_v, i);
> + else
> + lazy_percpu_counter_add_slowpath(c, i);
> +}
> +
> +/**
> + * lazy_percpu_counter_add_noupgrade: Add a value to a lazy_percpu_counter,
> + * without upgrading to percpu mode
> + *
> + * @c: counter to modify
> + * @i: value to add
> + */
> +static inline void lazy_percpu_counter_add_noupgrade(struct lazy_percpu_counter *c, s64 i)
> +{
> + u64 v = atomic64_read(&c->v);
> + u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
> +
> + if (likely(pcpu_v))
> + this_cpu_add(*pcpu_v, i);
> + else
> + lazy_percpu_counter_add_slowpath_noupgrade(c, i);
> +}
> +
> +static inline void lazy_percpu_counter_sub(struct lazy_percpu_counter *c, s64 i)
> +{
> + lazy_percpu_counter_add(c, -i);
> +}
> +
> +#endif /* _LINUX_LAZY_PERCPU_COUNTER_H */
> diff --git a/lib/lazy-percpu-counter.c b/lib/lazy-percpu-counter.c
> new file mode 100644
> index 000000000000..4f4e32c2dc09
> --- /dev/null
> +++ b/lib/lazy-percpu-counter.c
> @@ -0,0 +1,127 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/atomic.h>
> +#include <linux/gfp.h>
> +#include <linux/jiffies.h>
> +#include <linux/lazy-percpu-counter.h>
> +#include <linux/percpu.h>
> +
> +static inline s64 lazy_percpu_counter_atomic_val(s64 v)
> +{
> + /* Ensure output is sign extended properly: */
> + return (v << COUNTER_MOD_BITS) >>
> + (COUNTER_MOD_BITS + COUNTER_IS_PCPU_BIT);
> +}
> +
...
> +
> +/**
> + * lazy_percpu_counter_exit: Free resources associated with a
> + * lazy_percpu_counter
Same kernel-doc comment.
> + *
> + * @c: counter to exit
> + */
> +void lazy_percpu_counter_exit(struct lazy_percpu_counter *c)
> +{
> + free_percpu(lazy_percpu_counter_is_pcpu(atomic64_read(&c->v)));
> +}
> +EXPORT_SYMBOL_GPL(lazy_percpu_counter_exit);
> +
> +/**
> + * lazy_percpu_counter_read: Read current value of a lazy_percpu_counter
> + *
> + * @c: counter to read
> + */
> +s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c)
> +{
> + s64 v = atomic64_read(&c->v);
> + u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
> +
> + if (pcpu_v) {
> + int cpu;
> +
> + v = 0;
> + for_each_possible_cpu(cpu)
> + v += *per_cpu_ptr(pcpu_v, cpu);
> + } else {
> + v = lazy_percpu_counter_atomic_val(v);
> + }
> +
> + return v;
> +}
> +EXPORT_SYMBOL_GPL(lazy_percpu_counter_read);
> +
> +void lazy_percpu_counter_add_slowpath(struct lazy_percpu_counter *c, s64 i)
> +{
> + u64 atomic_i;
> + u64 old, v = atomic64_read(&c->v);
> + u64 __percpu *pcpu_v;
> +
> + atomic_i = i << COUNTER_IS_PCPU_BIT;
> + atomic_i &= ~COUNTER_MOD_MASK;
> + atomic_i |= 1ULL << COUNTER_MOD_BITS_START;
> +
> + do {
> + pcpu_v = lazy_percpu_counter_is_pcpu(v);
> + if (pcpu_v) {
> + this_cpu_add(*pcpu_v, i);
> + return;
> + }
> +
> + old = v;
> + } while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old);
> +
> + if (unlikely(!(v & COUNTER_MOD_MASK))) {
> + unsigned long now = jiffies;
> +
> + if (c->last_wrap &&
> + unlikely(time_after(c->last_wrap + HZ, now)))
> + lazy_percpu_counter_switch_to_pcpu(c);
> + else
> + c->last_wrap = now;
> + }
> +}
> +EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath);
> +
> +void lazy_percpu_counter_add_slowpath_noupgrade(struct lazy_percpu_counter *c, s64 i)
> +{
> + u64 atomic_i;
> + u64 old, v = atomic64_read(&c->v);
> + u64 __percpu *pcpu_v;
> +
> + atomic_i = i << COUNTER_IS_PCPU_BIT;
> + atomic_i &= ~COUNTER_MOD_MASK;
> +
> + do {
> + pcpu_v = lazy_percpu_counter_is_pcpu(v);
> + if (pcpu_v) {
> + this_cpu_add(*pcpu_v, i);
> + return;
> + }
> +
> + old = v;
> + } while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old);
> +}
> +EXPORT_SYMBOL(lazy_percpu_counter_add_slowpath_noupgrade);
These last 2 exported functions could use some comments, preferably in
kernel-doc format.
Thanks.
--
~Randy
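
As an illustration of the request above, here is a minimal userspace sketch of two functions documented in the kernel-doc `name() - description` form. The struct, the toy_* names and TOY_MOD_WRAP are hypothetical stand-ins, not the patch's code; in particular the model upgrades after a fixed number of modifications and ignores the jiffies-based check in the real slowpath.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model; not the kernel's struct lazy_percpu_counter. */
struct toy_lazy_counter {
	int64_t val;		/* models the atomic value */
	unsigned int mods;	/* modifications seen so far */
	bool upgraded;		/* models "switched to per-CPU mode" */
};

#define TOY_MOD_WRAP 4u		/* hypothetical upgrade threshold */

/**
 * toy_lazy_counter_add() - Add to a counter, upgrading it under heavy use.
 * @c: counter to modify
 * @i: value to add
 *
 * Adds @i to @c and marks @c as upgraded once it has been modified
 * TOY_MOD_WRAP times.
 */
static void toy_lazy_counter_add(struct toy_lazy_counter *c, int64_t i)
{
	assert(c);
	c->val += i;
	if (++c->mods == TOY_MOD_WRAP)
		c->upgraded = true;
}

/**
 * toy_lazy_counter_add_noupgrade() - Add to a counter without upgrading it.
 * @c: counter to modify
 * @i: value to add
 *
 * Like toy_lazy_counter_add(), but never marks @c as upgraded.
 */
static void toy_lazy_counter_add_noupgrade(struct toy_lazy_counter *c, int64_t i)
{
	assert(c);
	c->val += i;
}
```

Each comment opens with `/**`, names the function followed by `()` and a hyphen, lists every parameter as `@name:`, and leaves a blank comment line before the longer description.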
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 09/40] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (2 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 07/40] Lazy percpu counters Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
[not found] ` <20230501165450.15352-10-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2023-05-01 16:54 ` [PATCH 20/40] mm: enable page allocation tagging Suren Baghdasaryan
` (7 subsequent siblings)
11 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Introduce the __GFP_NO_OBJ_EXT flag to prevent recursive allocations
when allocating slabobj_ext on a slab.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/gfp_types.h | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 6583a58670c5..aab1959130f9 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -53,8 +53,13 @@ typedef unsigned int __bitwise gfp_t;
#define ___GFP_SKIP_ZERO 0
#define ___GFP_SKIP_KASAN 0
#endif
+#ifdef CONFIG_SLAB_OBJ_EXT
+#define ___GFP_NO_OBJ_EXT 0x4000000u
+#else
+#define ___GFP_NO_OBJ_EXT 0
+#endif
#ifdef CONFIG_LOCKDEP
-#define ___GFP_NOLOCKDEP 0x4000000u
+#define ___GFP_NOLOCKDEP 0x8000000u
#else
#define ___GFP_NOLOCKDEP 0
#endif
@@ -99,12 +104,15 @@ typedef unsigned int __bitwise gfp_t;
* node with no fallbacks or placement policy enforcements.
*
* %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
+ *
+ * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
*/
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE)
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL)
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)
#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT)
+#define __GFP_NO_OBJ_EXT ((__force gfp_t)___GFP_NO_OBJ_EXT)
/**
* DOC: Watermark modifiers
@@ -249,7 +257,7 @@ typedef unsigned int __bitwise gfp_t;
#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
/* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP))
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/**
--
2.40.1.495.gc816e09b53d-goog
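
The bit arithmetic in the patch above can be checked with a small userspace sketch. The flag values and the shift are copied from the hunks; LOCKDEP_ENABLED, gfp_flag_fits() and gfp_layout_check() are hypothetical names introduced here, and both CONFIG_SLAB_OBJ_EXT and CONFIG_LOCKDEP are assumed to be enabled.

```c
#include <assert.h>

/* Values copied from the hunk above; both config options are assumed enabled. */
#define GFP_NO_OBJ_EXT	0x4000000u	/* ___GFP_NO_OBJ_EXT: bit 26 */
#define GFP_NOLOCKDEP	0x8000000u	/* ___GFP_NOLOCKDEP: moved to bit 27 */
#define LOCKDEP_ENABLED	1		/* stands in for IS_ENABLED(CONFIG_LOCKDEP) */
#define GFP_BITS_SHIFT	(27 + LOCKDEP_ENABLED)
#define GFP_BITS_MASK	((1u << GFP_BITS_SHIFT) - 1)

/* Hypothetical helper: nonzero when every bit of flag lies inside mask. */
static int gfp_flag_fits(unsigned int flag, unsigned int mask)
{
	return (flag & ~mask) == 0;
}

/* Hypothetical self-check of the layout after this patch. */
static void gfp_layout_check(void)
{
	assert(GFP_NO_OBJ_EXT == 1u << 26);
	assert(GFP_NOLOCKDEP == 1u << 27);
	assert((GFP_NO_OBJ_EXT & GFP_NOLOCKDEP) == 0);	/* no overlap */
	assert(gfp_flag_fits(GFP_NO_OBJ_EXT | GFP_NOLOCKDEP, GFP_BITS_MASK));
	/* With the old shift (26 + 1), bit 27 would fall outside the mask. */
	assert(!gfp_flag_fits(GFP_NOLOCKDEP, (1u << 27) - 1));
}
```

The last assertion shows why __GFP_BITS_SHIFT has to grow by one: moving ___GFP_NOLOCKDEP up to bit 27 would otherwise leave it outside __GFP_BITS_MASK.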
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 20/40] mm: enable page allocation tagging
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (3 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 09/40] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 21/40] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
` (6 subsequent siblings)
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Redefine the page allocators as macros that record an allocation tag at
each call site. Instrument post_alloc_hook() and free_pages_prepare() to
modify the current allocation tag.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 11 ++++
include/linux/gfp.h | 123 +++++++++++++++++++++++++-----------
include/linux/page_ext.h | 1 -
include/linux/pagemap.h | 9 ++-
include/linux/pgalloc_tag.h | 38 +++++++++--
mm/compaction.c | 9 ++-
mm/filemap.c | 6 +-
mm/mempolicy.c | 30 ++++-----
mm/mm_init.c | 1 +
mm/page_alloc.c | 73 ++++++++++++---------
10 files changed, 208 insertions(+), 93 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index d913f8d9a7d8..07922d81b641 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -102,4 +102,15 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
#endif
+#define alloc_hooks(_do_alloc, _res_type, _err) \
+({ \
+ _res_type _res; \
+ DEFINE_ALLOC_TAG(_alloc_tag, _old); \
+ \
+ _res = _do_alloc; \
+ alloc_tag_restore(&_alloc_tag, _old); \
+ _res; \
+})
+
+
#endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ed8cb537c6a7..0cb4a515109a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -6,6 +6,8 @@
#include <linux/mmzone.h>
#include <linux/topology.h>
+#include <linux/alloc_tag.h>
+#include <linux/sched.h>
struct vm_area_struct;
@@ -174,42 +176,57 @@ static inline void arch_free_page(struct page *page, int order) { }
static inline void arch_alloc_page(struct page *page, int order) { }
#endif
-struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
+struct page *_alloc_pages2(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
-struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
+#define __alloc_pages(_gfp, _order, _preferred_nid, _nodemask) \
+ alloc_hooks(_alloc_pages2(_gfp, _order, _preferred_nid, \
+ _nodemask), struct page *, NULL)
+
+struct folio *_folio_alloc2(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
+#define __folio_alloc(_gfp, _order, _preferred_nid, _nodemask) \
+ alloc_hooks(_folio_alloc2(_gfp, _order, _preferred_nid, \
+ _nodemask), struct folio *, NULL)
-unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
+unsigned long _alloc_pages_bulk(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
struct list_head *page_list,
struct page **page_array);
-
-unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+#define __alloc_pages_bulk(_gfp, _preferred_nid, _nodemask, _nr_pages, \
+ _page_list, _page_array) \
+ alloc_hooks(_alloc_pages_bulk(_gfp, _preferred_nid, \
+ _nodemask, _nr_pages, \
+ _page_list, _page_array), \
+ unsigned long, 0)
+
+unsigned long _alloc_pages_bulk_array_mempolicy(gfp_t gfp,
unsigned long nr_pages,
struct page **page_array);
+#define alloc_pages_bulk_array_mempolicy(_gfp, _nr_pages, _page_array) \
+ alloc_hooks(_alloc_pages_bulk_array_mempolicy(_gfp, \
+ _nr_pages, _page_array), \
+ unsigned long, 0)
/* Bulk allocate order-0 pages */
-static inline unsigned long
-alloc_pages_bulk_list(gfp_t gfp, unsigned long nr_pages, struct list_head *list)
-{
- return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, list, NULL);
-}
+#define alloc_pages_bulk_list(_gfp, _nr_pages, _list) \
+ __alloc_pages_bulk(_gfp, numa_mem_id(), NULL, _nr_pages, _list, NULL)
-static inline unsigned long
-alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_array)
-{
- return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
-}
+#define alloc_pages_bulk_array(_gfp, _nr_pages, _page_array) \
+ __alloc_pages_bulk(_gfp, numa_mem_id(), NULL, _nr_pages, NULL, _page_array)
static inline unsigned long
-alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+_alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
- return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+ return _alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
}
+#define alloc_pages_bulk_array_node(_gfp, _nid, _nr_pages, _page_array) \
+ alloc_hooks(_alloc_pages_bulk_array_node(_gfp, _nid, _nr_pages, _page_array), \
+ unsigned long, 0)
+
static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
{
gfp_t warn_gfp = gfp_mask & (__GFP_THISNODE|__GFP_NOWARN);
@@ -229,21 +246,25 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
* online. For more general interface, see alloc_pages_node().
*/
static inline struct page *
-__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+_alloc_pages_node2(int nid, gfp_t gfp_mask, unsigned int order)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp_mask);
- return __alloc_pages(gfp_mask, order, nid, NULL);
+ return _alloc_pages2(gfp_mask, order, nid, NULL);
}
+#define __alloc_pages_node(_nid, _gfp_mask, _order) \
+ alloc_hooks(_alloc_pages_node2(_nid, _gfp_mask, _order), \
+ struct page *, NULL)
+
static inline
struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp);
- return __folio_alloc(gfp, order, nid, NULL);
+ return _folio_alloc2(gfp, order, nid, NULL);
}
/*
@@ -251,32 +272,45 @@ struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
* prefer the current CPU's closest node. Otherwise node must be valid and
* online.
*/
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
+static inline struct page *_alloc_pages_node(int nid, gfp_t gfp_mask,
unsigned int order)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
- return __alloc_pages_node(nid, gfp_mask, order);
+ return _alloc_pages_node2(nid, gfp_mask, order);
}
+#define alloc_pages_node(_nid, _gfp_mask, _order) \
+ alloc_hooks(_alloc_pages_node(_nid, _gfp_mask, _order), \
+ struct page *, NULL)
+
#ifdef CONFIG_NUMA
-struct page *alloc_pages(gfp_t gfp, unsigned int order);
-struct folio *folio_alloc(gfp_t gfp, unsigned order);
-struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
+struct page *_alloc_pages(gfp_t gfp, unsigned int order);
+struct folio *_folio_alloc(gfp_t gfp, unsigned int order);
+struct folio *_vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned long addr, bool hugepage);
#else
-static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+static inline struct page *_alloc_pages(gfp_t gfp_mask, unsigned int order)
{
- return alloc_pages_node(numa_node_id(), gfp_mask, order);
+ return _alloc_pages_node(numa_node_id(), gfp_mask, order);
}
-static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order)
+static inline struct folio *_folio_alloc(gfp_t gfp, unsigned int order)
{
return __folio_alloc_node(gfp, order, numa_node_id());
}
-#define vma_alloc_folio(gfp, order, vma, addr, hugepage) \
- folio_alloc(gfp, order)
+#define _vma_alloc_folio(gfp, order, vma, addr, hugepage) \
+ _folio_alloc(gfp, order)
#endif
+
+#define alloc_pages(_gfp, _order) \
+ alloc_hooks(_alloc_pages(_gfp, _order), struct page *, NULL)
+#define folio_alloc(_gfp, _order) \
+ alloc_hooks(_folio_alloc(_gfp, _order), struct folio *, NULL)
+#define vma_alloc_folio(_gfp, _order, _vma, _addr, _hugepage) \
+ alloc_hooks(_vma_alloc_folio(_gfp, _order, _vma, _addr, \
+ _hugepage), struct folio *, NULL)
+
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
static inline struct page *alloc_page_vma(gfp_t gfp,
struct vm_area_struct *vma, unsigned long addr)
@@ -286,12 +320,21 @@ static inline struct page *alloc_page_vma(gfp_t gfp,
return &folio->page;
}
-extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
-extern unsigned long get_zeroed_page(gfp_t gfp_mask);
+extern unsigned long _get_free_pages(gfp_t gfp_mask, unsigned int order);
+#define __get_free_pages(_gfp_mask, _order) \
+ alloc_hooks(_get_free_pages(_gfp_mask, _order), unsigned long, 0)
+extern unsigned long _get_zeroed_page(gfp_t gfp_mask);
+#define get_zeroed_page(_gfp_mask) \
+ alloc_hooks(_get_zeroed_page(_gfp_mask), unsigned long, 0)
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask) __alloc_size(1);
+void *_alloc_pages_exact(size_t size, gfp_t gfp_mask) __alloc_size(1);
+#define alloc_pages_exact(_size, _gfp_mask) \
+ alloc_hooks(_alloc_pages_exact(_size, _gfp_mask), void *, NULL)
void free_pages_exact(void *virt, size_t size);
-__meminit void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);
+
+__meminit void *_alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);
+#define alloc_pages_exact_nid(_nid, _size, _gfp_mask) \
+ alloc_hooks(_alloc_pages_exact_nid(_nid, _size, _gfp_mask), void *, NULL)
#define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask), 0)
@@ -354,10 +397,16 @@ static inline bool pm_suspended_storage(void)
#ifdef CONFIG_CONTIG_ALLOC
/* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
+extern int _alloc_contig_range(unsigned long start, unsigned long end,
unsigned migratetype, gfp_t gfp_mask);
-extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask);
+#define alloc_contig_range(_start, _end, _migratetype, _gfp_mask) \
+ alloc_hooks(_alloc_contig_range(_start, _end, _migratetype, \
+ _gfp_mask), int, -ENOMEM)
+extern struct page *_alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask);
+#define alloc_contig_pages(_nr_pages, _gfp_mask, _nid, _nodemask) \
+ alloc_hooks(_alloc_contig_pages(_nr_pages, _gfp_mask, _nid, \
+ _nodemask), struct page *, NULL)
#endif
void free_contig_range(unsigned long pfn, unsigned long nr_pages);
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 67314f648aeb..cff15ee5440e 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -4,7 +4,6 @@
#include <linux/types.h>
#include <linux/stacktrace.h>
-#include <linux/stackdepot.h>
struct pglist_data;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a4..b2efafa001f8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -467,14 +467,17 @@ static inline void *detach_page_private(struct page *page)
}
#ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
+struct folio *_filemap_alloc_folio(gfp_t gfp, unsigned int order);
#else
-static inline struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
+static inline struct folio *_filemap_alloc_folio(gfp_t gfp, unsigned int order)
{
- return folio_alloc(gfp, order);
+ return _folio_alloc(gfp, order);
}
#endif
+#define filemap_alloc_folio(_gfp, _order) \
+ alloc_hooks(_filemap_alloc_folio(_gfp, _order), struct folio *, NULL)
+
static inline struct page *__page_cache_alloc(gfp_t gfp)
{
return &filemap_alloc_folio(gfp, 0)->page;
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index f8c7b6ef9c75..567327c1c46f 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -6,28 +6,58 @@
#define _LINUX_PGALLOC_TAG_H
#include <linux/alloc_tag.h>
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
#include <linux/page_ext.h>
extern struct page_ext_operations page_alloc_tagging_ops;
-struct page_ext *lookup_page_ext(const struct page *page);
+extern struct page_ext *page_ext_get(struct page *page);
+extern void page_ext_put(struct page_ext *page_ext);
+
+static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+{
+ return (void *)page_ext + page_alloc_tagging_ops.offset;
+}
+
+static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+{
+ return (void *)ref - page_alloc_tagging_ops.offset;
+}
static inline union codetag_ref *get_page_tag_ref(struct page *page)
{
if (page && mem_alloc_profiling_enabled()) {
- struct page_ext *page_ext = lookup_page_ext(page);
+ struct page_ext *page_ext = page_ext_get(page);
if (page_ext)
- return (void *)page_ext + page_alloc_tagging_ops.offset;
+ return codetag_ref_from_page_ext(page_ext);
}
return NULL;
}
+static inline void put_page_tag_ref(union codetag_ref *ref)
+{
+ if (ref)
+ page_ext_put(page_ext_from_codetag_ref(ref));
+}
+
static inline void pgalloc_tag_dec(struct page *page, unsigned int order)
{
union codetag_ref *ref = get_page_tag_ref(page);
- if (ref)
+ if (ref) {
alloc_tag_sub(ref, PAGE_SIZE << order);
+ put_page_tag_ref(ref);
+ }
}
+#else /* CONFIG_MEM_ALLOC_PROFILING */
+
+static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
+static inline void put_page_tag_ref(union codetag_ref *ref) {}
+#define pgalloc_tag_dec(__page, __size) do {} while (0)
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
#endif /* _LINUX_PGALLOC_TAG_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index c8bcdea15f5f..32707fb62495 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1684,7 +1684,7 @@ static void isolate_freepages(struct compact_control *cc)
* This is a migrate-callback that "allocates" freepages by taking pages
* from the isolated freelists in the block we are migrating to.
*/
-static struct page *compaction_alloc(struct page *migratepage,
+static struct page *_compaction_alloc(struct page *migratepage,
unsigned long data)
{
struct compact_control *cc = (struct compact_control *)data;
@@ -1704,6 +1704,13 @@ static struct page *compaction_alloc(struct page *migratepage,
return freepage;
}
+static struct page *compaction_alloc(struct page *migratepage,
+ unsigned long data)
+{
+ return alloc_hooks(_compaction_alloc(migratepage, data),
+ struct page *, NULL);
+}
+
/*
* This is a migrate-callback that "frees" freepages back to the isolated
* freelist. All pages on the freelist are from the same zone, so there is no
diff --git a/mm/filemap.c b/mm/filemap.c
index a34abfe8c654..f0f8b782d172 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -958,7 +958,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
EXPORT_SYMBOL_GPL(filemap_add_folio);
#ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
+struct folio *_filemap_alloc_folio(gfp_t gfp, unsigned int order)
{
int n;
struct folio *folio;
@@ -973,9 +973,9 @@ struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
return folio;
}
- return folio_alloc(gfp, order);
+ return _folio_alloc(gfp, order);
}
-EXPORT_SYMBOL(filemap_alloc_folio);
+EXPORT_SYMBOL(_filemap_alloc_folio);
#endif
/*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2068b594dc88..80cd33811641 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2141,7 +2141,7 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
}
/**
- * vma_alloc_folio - Allocate a folio for a VMA.
+ * _vma_alloc_folio - Allocate a folio for a VMA.
* @gfp: GFP flags.
* @order: Order of the folio.
* @vma: Pointer to VMA or NULL if not available.
@@ -2155,7 +2155,7 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
*
* Return: The folio on success or NULL if allocation fails.
*/
-struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
+struct folio *_vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned long addr, bool hugepage)
{
struct mempolicy *pol;
@@ -2240,10 +2240,10 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
out:
return folio;
}
-EXPORT_SYMBOL(vma_alloc_folio);
+EXPORT_SYMBOL(_vma_alloc_folio);
/**
- * alloc_pages - Allocate pages.
+ * _alloc_pages - Allocate pages.
* @gfp: GFP flags.
* @order: Power of two of number of pages to allocate.
*
@@ -2256,7 +2256,7 @@ EXPORT_SYMBOL(vma_alloc_folio);
* flags are used.
* Return: The page on success or NULL if allocation fails.
*/
-struct page *alloc_pages(gfp_t gfp, unsigned order)
+struct page *_alloc_pages(gfp_t gfp, unsigned int order)
{
struct mempolicy *pol = &default_policy;
struct page *page;
@@ -2274,15 +2274,15 @@ struct page *alloc_pages(gfp_t gfp, unsigned order)
page = alloc_pages_preferred_many(gfp, order,
policy_node(gfp, pol, numa_node_id()), pol);
else
- page = __alloc_pages(gfp, order,
+ page = _alloc_pages2(gfp, order,
policy_node(gfp, pol, numa_node_id()),
policy_nodemask(gfp, pol));
return page;
}
-EXPORT_SYMBOL(alloc_pages);
+EXPORT_SYMBOL(_alloc_pages);
-struct folio *folio_alloc(gfp_t gfp, unsigned order)
+struct folio *_folio_alloc(gfp_t gfp, unsigned int order)
{
struct page *page = alloc_pages(gfp | __GFP_COMP, order);
@@ -2290,7 +2290,7 @@ struct folio *folio_alloc(gfp_t gfp, unsigned order)
prep_transhuge_page(page);
return (struct folio *)page;
}
-EXPORT_SYMBOL(folio_alloc);
+EXPORT_SYMBOL(_folio_alloc);
static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
struct mempolicy *pol, unsigned long nr_pages,
@@ -2309,13 +2309,13 @@ static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
for (i = 0; i < nodes; i++) {
if (delta) {
- nr_allocated = __alloc_pages_bulk(gfp,
+ nr_allocated = _alloc_pages_bulk(gfp,
interleave_nodes(pol), NULL,
nr_pages_per_node + 1, NULL,
page_array);
delta--;
} else {
- nr_allocated = __alloc_pages_bulk(gfp,
+ nr_allocated = _alloc_pages_bulk(gfp,
interleave_nodes(pol), NULL,
nr_pages_per_node, NULL, page_array);
}
@@ -2337,11 +2337,11 @@ static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
preferred_gfp = gfp | __GFP_NOWARN;
preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
- nr_allocated = __alloc_pages_bulk(preferred_gfp, nid, &pol->nodes,
+ nr_allocated = _alloc_pages_bulk(preferred_gfp, nid, &pol->nodes,
nr_pages, NULL, page_array);
if (nr_allocated < nr_pages)
- nr_allocated += __alloc_pages_bulk(gfp, numa_node_id(), NULL,
+ nr_allocated += _alloc_pages_bulk(gfp, numa_node_id(), NULL,
nr_pages - nr_allocated, NULL,
page_array + nr_allocated);
return nr_allocated;
@@ -2353,7 +2353,7 @@ static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
* It can accelerate memory allocation especially interleaving
* allocate memory.
*/
-unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+unsigned long _alloc_pages_bulk_array_mempolicy(gfp_t gfp,
unsigned long nr_pages, struct page **page_array)
{
struct mempolicy *pol = &default_policy;
@@ -2369,7 +2369,7 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
return alloc_pages_bulk_array_preferred_many(gfp,
numa_node_id(), pol, nr_pages, page_array);
- return __alloc_pages_bulk(gfp, policy_node(gfp, pol, numa_node_id()),
+ return _alloc_pages_bulk(gfp, policy_node(gfp, pol, numa_node_id()),
policy_nodemask(gfp, pol), nr_pages, NULL,
page_array);
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 7f7f9c677854..42135fad4d8a 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -24,6 +24,7 @@
#include <linux/page_ext.h>
#include <linux/pti.h>
#include <linux/pgtable.h>
+#include <linux/stackdepot.h>
#include <linux/swap.h>
#include <linux/cma.h>
#include "internal.h"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9de2a18519a1..edd35500f7f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -74,6 +74,7 @@
#include <linux/psi.h>
#include <linux/khugepaged.h>
#include <linux/delayacct.h>
+#include <linux/pgalloc_tag.h>
#include <asm/sections.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -657,6 +658,7 @@ static inline bool pcp_allowed_order(unsigned int order)
static inline void free_the_page(struct page *page, unsigned int order)
{
+
if (pcp_allowed_order(order)) /* Via pcp? */
free_unref_page(page, order);
else
@@ -1259,6 +1261,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
__memcg_kmem_uncharge_page(page, order);
reset_page_owner(page, order);
page_table_check_free(page, order);
+ pgalloc_tag_dec(page, order);
return false;
}
@@ -1301,6 +1304,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
reset_page_owner(page, order);
page_table_check_free(page, order);
+ pgalloc_tag_dec(page, order);
if (!PageHighMem(page)) {
debug_check_no_locks_freed(page_address(page),
@@ -1669,6 +1673,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) &&
!should_skip_init(gfp_flags);
bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS);
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref *ref;
+#endif
int i;
set_page_private(page, 0);
@@ -1721,6 +1728,14 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
set_page_owner(page, order, gfp_flags);
page_table_check_alloc(page, order);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ ref = get_page_tag_ref(page);
+ if (ref) {
+ alloc_tag_add(ref, current->alloc_tag, PAGE_SIZE << order);
+ put_page_tag_ref(ref);
+ }
+#endif
}
static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
@@ -4568,7 +4583,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
*
* Returns the number of pages on the list or array.
*/
-unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
+unsigned long _alloc_pages_bulk(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
struct list_head *page_list,
struct page **page_array)
@@ -4704,7 +4719,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
pcp_trylock_finish(UP_flags);
failed:
- page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
+ page = _alloc_pages2(gfp, 0, preferred_nid, nodemask);
if (page) {
if (page_list)
list_add(&page->lru, page_list);
@@ -4715,12 +4730,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
goto out;
}
-EXPORT_SYMBOL_GPL(__alloc_pages_bulk);
+EXPORT_SYMBOL_GPL(_alloc_pages_bulk);
/*
* This is the 'heart' of the zoned buddy allocator.
*/
-struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
+struct page *_alloc_pages2(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask)
{
struct page *page;
@@ -4783,41 +4798,41 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
return page;
}
-EXPORT_SYMBOL(__alloc_pages);
+EXPORT_SYMBOL(_alloc_pages2);
-struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
+struct folio *_folio_alloc2(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask)
{
- struct page *page = __alloc_pages(gfp | __GFP_COMP, order,
+ struct page *page = _alloc_pages2(gfp | __GFP_COMP, order,
preferred_nid, nodemask);
if (page && order > 1)
prep_transhuge_page(page);
return (struct folio *)page;
}
-EXPORT_SYMBOL(__folio_alloc);
+EXPORT_SYMBOL(_folio_alloc2);
/*
* Common helper functions. Never use with __GFP_HIGHMEM because the returned
* address cannot represent highmem pages. Use alloc_pages and then kmap if
* you need to access high mem.
*/
-unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
+unsigned long _get_free_pages(gfp_t gfp_mask, unsigned int order)
{
struct page *page;
- page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
+ page = _alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
if (!page)
return 0;
return (unsigned long) page_address(page);
}
-EXPORT_SYMBOL(__get_free_pages);
+EXPORT_SYMBOL(_get_free_pages);
-unsigned long get_zeroed_page(gfp_t gfp_mask)
+unsigned long _get_zeroed_page(gfp_t gfp_mask)
{
- return __get_free_page(gfp_mask | __GFP_ZERO);
+ return _get_free_pages(gfp_mask | __GFP_ZERO, 0);
}
-EXPORT_SYMBOL(get_zeroed_page);
+EXPORT_SYMBOL(_get_zeroed_page);
/**
* __free_pages - Free pages allocated with alloc_pages().
@@ -5009,7 +5024,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
}
/**
- * alloc_pages_exact - allocate an exact number physically-contiguous pages.
+ * _alloc_pages_exact - allocate an exact number of physically-contiguous pages.
* @size: the number of bytes to allocate
* @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
*
@@ -5023,7 +5038,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
*
* Return: pointer to the allocated area or %NULL in case of error.
*/
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+void *_alloc_pages_exact(size_t size, gfp_t gfp_mask)
{
unsigned int order = get_order(size);
unsigned long addr;
@@ -5031,13 +5046,13 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);
- addr = __get_free_pages(gfp_mask, order);
+ addr = _get_free_pages(gfp_mask, order);
return make_alloc_exact(addr, order, size);
}
-EXPORT_SYMBOL(alloc_pages_exact);
+EXPORT_SYMBOL(_alloc_pages_exact);
/**
- * alloc_pages_exact_nid - allocate an exact number of physically-contiguous
+ * _alloc_pages_exact_nid - allocate an exact number of physically-contiguous
* pages on a node.
* @nid: the preferred node ID where memory should be allocated
* @size: the number of bytes to allocate
@@ -5048,7 +5063,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
*
* Return: pointer to the allocated area or %NULL in case of error.
*/
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
+void * __meminit _alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
{
unsigned int order = get_order(size);
struct page *p;
@@ -5056,7 +5071,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);
- p = alloc_pages_node(nid, gfp_mask, order);
+ p = _alloc_pages_node(nid, gfp_mask, order);
if (!p)
return NULL;
return make_alloc_exact((unsigned long)page_address(p), order, size);
@@ -6729,7 +6744,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
}
/**
- * alloc_contig_range() -- tries to allocate given range of pages
+ * _alloc_contig_range() -- tries to allocate given range of pages
* @start: start PFN to allocate
* @end: one-past-the-last PFN to allocate
* @migratetype: migratetype of the underlying pageblocks (either
@@ -6749,7 +6764,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
* pages which PFN is in [start, end) are allocated for the caller and
* need to be freed with free_contig_range().
*/
-int alloc_contig_range(unsigned long start, unsigned long end,
+int _alloc_contig_range(unsigned long start, unsigned long end,
unsigned migratetype, gfp_t gfp_mask)
{
unsigned long outer_start, outer_end;
@@ -6873,15 +6888,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
undo_isolate_page_range(start, end, migratetype);
return ret;
}
-EXPORT_SYMBOL(alloc_contig_range);
+EXPORT_SYMBOL(_alloc_contig_range);
static int __alloc_contig_pages(unsigned long start_pfn,
unsigned long nr_pages, gfp_t gfp_mask)
{
unsigned long end_pfn = start_pfn + nr_pages;
- return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
- gfp_mask);
+ return _alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
+ gfp_mask);
}
static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
@@ -6916,7 +6931,7 @@ static bool zone_spans_last_pfn(const struct zone *zone,
}
/**
- * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
+ * _alloc_contig_pages() -- tries to find and allocate contiguous range of pages
* @nr_pages: Number of contiguous pages to allocate
* @gfp_mask: GFP mask to limit search and used during compaction
* @nid: Target node
@@ -6936,8 +6951,8 @@ static bool zone_spans_last_pfn(const struct zone *zone,
*
* Return: pointer to contiguous pages on success, or NULL if not successful.
*/
-struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask)
+struct page *_alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask)
{
unsigned long ret, pfn, flags;
struct zonelist *zonelist;
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 21/40] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (4 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 20/40] mm: enable page allocation tagging Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 23/40] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
` (5 subsequent siblings)
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
For all page allocations to be tagged, page_ext has to be initialized
before the first page allocation. Early tasks allocate their stacks
using the page allocator before alloc_node_page_ext() initializes the
page_ext area, unless early_page_ext is enabled. These allocations will
therefore generate a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is
enabled. Enable early_page_ext whenever CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
to ensure page_ext is initialized prior to any page allocation. This
carries all the negative effects associated with early_page_ext, such as
a possibly longer boot time, so we enable it only when debugging with
CONFIG_MEM_ALLOC_PROFILING_DEBUG and not universally for
CONFIG_MEM_ALLOC_PROFILING.
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
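Note (not part of the patch): the ordering problem described above can
be sketched in userspace. This is only a model under the assumption
that an allocation can be tagged once the page_ext area is ready; the
names model_boot, EARLY_ALLOCS and LATE_ALLOCS are hypothetical and do
not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counts: allocations made by early tasks (e.g. their
 * stacks) before the late page_ext initialization point, and after it. */
#define EARLY_ALLOCS 3
#define LATE_ALLOCS  5

/*
 * Returns how many allocations would be flagged as missing a tag.
 * With early_page_ext set, page_ext is ready before the first
 * allocation; otherwise it becomes ready only after the early ones.
 */
static int model_boot(bool early_page_ext)
{
	bool page_ext_ready = early_page_ext;
	int untagged = 0;

	for (int i = 0; i < EARLY_ALLOCS + LATE_ALLOCS; i++) {
		if (i == EARLY_ALLOCS)
			page_ext_ready = true;	/* late initialization point */
		if (!page_ext_ready)
			untagged++;
	}

	/* Only the early allocations can ever be missed. */
	assert(untagged <= EARLY_ALLOCS);
	return untagged;
}
```

In this model, model_boot(false) misses exactly the early allocations,
while model_boot(true) misses none, which is the effect the default
change is after.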
mm/page_ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/page_ext.c b/mm/page_ext.c
index eaf054ec276c..55ba797f8881 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -96,7 +96,16 @@ unsigned long page_ext_size;
static unsigned long total_usage;
struct page_ext *lookup_page_ext(const struct page *page);
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+/*
+ * To ensure correct allocation tagging for pages, page_ext should be available
+ * before the first page allocation. Otherwise early task stacks will be
+ * allocated before page_ext initialization and missing tags will be flagged.
+ */
+bool early_page_ext __meminitdata = true;
+#else
bool early_page_ext __meminitdata;
+#endif
static int __init setup_early_page_ext(char *str)
{
early_page_ext = true;
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 23/40] lib: add codetag reference into slabobj_ext
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (5 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 21/40] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 28/40] timekeeping: Fix a circular include dependency Suren Baghdasaryan
` (4 subsequent siblings)
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
To store a code tag for every slab object, a codetag reference is
embedded into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Co-developed-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
---
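Note (not part of the patch): a small userspace sketch of how the
conditional members change the per-object extension record. The MODEL_*
macros and *_model types are hypothetical stand-ins, and the codetag
reference is assumed here to be a union wrapping a single pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: both options enabled, mirroring CONFIG_MEMCG_KMEM=y and
 * CONFIG_MEM_ALLOC_PROFILING=y. Undefine one to see the record shrink. */
#define MODEL_MEMCG_KMEM 1
#define MODEL_MEM_ALLOC_PROFILING 1

struct obj_cgroup;	/* opaque in this sketch */
struct codetag;		/* opaque in this sketch */

/* Assumption: the reference is modeled as a union around one pointer. */
union codetag_ref_model {
	struct codetag *ct;
};

struct slabobj_ext_model {
#ifdef MODEL_MEMCG_KMEM
	struct obj_cgroup *objcg;
#endif
#ifdef MODEL_MEM_ALLOC_PROFILING
	union codetag_ref_model ref;
#endif
} __attribute__((aligned(8)));

/* The 8-byte alignment must hold whichever members are compiled in. */
static_assert(sizeof(struct slabobj_ext_model) % 8 == 0,
	      "extension record must stay 8-byte aligned");

/* Bytes needed for the extension vector of a slab with nr_objs objects. */
static size_t obj_exts_bytes(size_t nr_objs)
{
	return nr_objs * sizeof(struct slabobj_ext_model);
}
```

With both options on, each object carries two pointer-sized fields, so
the vector roughly doubles compared to memcg accounting alone.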
include/linux/memcontrol.h | 5 +++++
lib/Kconfig.debug | 1 +
mm/slab.h | 4 ++++
3 files changed, 10 insertions(+)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5e2da63c525f..c7f21b15b540 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1626,7 +1626,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
* if MEMCG_DATA_OBJEXTS is set.
*/
struct slabobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *objcg;
+#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref ref;
+#endif
} __aligned(8);
static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d3aa5ee0bf0d..4157c2251b07 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -968,6 +968,7 @@ config MEM_ALLOC_PROFILING
select CODE_TAGGING
select LAZY_PERCPU_COUNTER
select PAGE_EXTENSION
+ select SLAB_OBJ_EXT
help
Track allocation source code and record total allocation size
initiated at that code location. The mechanism can be used to track
diff --git a/mm/slab.h b/mm/slab.h
index bec202bdcfb8..f953e7c81e98 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -418,6 +418,10 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
static inline bool need_slab_obj_ext(void)
{
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ if (mem_alloc_profiling_enabled())
+ return true;
+#endif
/*
* CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
* inside memcg_slab_post_alloc_hook. No other users for now.
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 28/40] timekeeping: Fix a circular include dependency
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (6 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 23/40] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
[not found] ` <20230501165450.15352-29-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2023-05-01 16:54 ` [PATCH 30/40] mm: percpu: Add codetag reference into pcpuobj_ext Suren Baghdasaryan
` (3 subsequent siblings)
11 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
This avoids a circular header dependency in an upcoming patch by making
hrtimer.h depend only on percpu-defs.h.
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
---
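Note (not part of the patch): the forward declaration added to
time_namespace.h relies on a general C property, sketched below. A
pointer to an incomplete struct type can be used in members and
prototypes without including the header that defines it. The *_model
names are hypothetical, and the reading that the declaration used to
arrive through the heavier include chain is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* "Header" part: the forward declaration is enough to name the type,
 * so no heavy include is needed at this point. */
struct vm_area_model;

struct timens_user_model {
	struct vm_area_model *vma;	/* pointer to incomplete type is fine */
};

static int timens_user_has_vma(const struct timens_user_model *u);

/* "Source" part: only code that dereferences the pointer needs the
 * full definition, which would come from the heavier header. */
struct vm_area_model {
	unsigned long start;
	unsigned long end;
};

static int timens_user_has_vma(const struct timens_user_model *u)
{
	assert(u != NULL);
	return u->vma != NULL && u->vma->end > u->vma->start;
}
```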
include/linux/hrtimer.h | 2 +-
include/linux/time_namespace.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0ee140176f10..e67349e84364 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -16,7 +16,7 @@
#include <linux/rbtree.h>
#include <linux/init.h>
#include <linux/list.h>
-#include <linux/percpu.h>
+#include <linux/percpu-defs.h>
#include <linux/seqlock.h>
#include <linux/timer.h>
#include <linux/timerqueue.h>
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index bb9d3f5542f8..d8e0cacfcae5 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -11,6 +11,8 @@
struct user_namespace;
extern struct user_namespace init_user_ns;
+struct vm_area_struct;
+
struct timens_offsets {
struct timespec64 monotonic;
struct timespec64 boottime;
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 30/40] mm: percpu: Add codetag reference into pcpuobj_ext
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (7 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 28/40] timekeeping: Fix a circular include dependency Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 32/40] arm64: Fix circular header dependency Suren Baghdasaryan
` (2 subsequent siblings)
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
To store a codetag for every per-cpu allocation, a codetag reference is
embedded into pcpuobj_ext when CONFIG_MEM_ALLOC_PROFILING=y. Hooks that
use the newly introduced codetag are added.
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
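Note (not part of the patch): the hooks find the extension slot by
shifting the byte offset within the chunk. The userspace sketch below
models that lookup. It assumes a minimum allocation shift of 2 (4-byte
units) and passes the tag explicitly instead of reading
current->alloc_tag; all *_model names and MODEL_* constants are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4-byte minimum allocation unit, i.e. a shift of 2. */
#define MODEL_MIN_ALLOC_SHIFT 2
#define MODEL_CHUNK_BYTES     64
#define MODEL_NR_EXTS         (MODEL_CHUNK_BYTES >> MODEL_MIN_ALLOC_SHIFT)

struct tag_model {
	size_t bytes;	/* bytes currently charged to this call site */
};

struct ext_model {
	struct tag_model *tag;	/* stands in for union codetag_ref */
};

struct chunk_model {
	struct ext_model obj_exts[MODEL_NR_EXTS];
};

/* Charge an allocation at byte offset off to the given tag. */
static void model_alloc_hook(struct chunk_model *c, int off, size_t size,
			     struct tag_model *tag)
{
	struct ext_model *ext = &c->obj_exts[off >> MODEL_MIN_ALLOC_SHIFT];

	assert(!ext->tag);	/* slot must not already be in use */
	ext->tag = tag;
	tag->bytes += size;
}

/* Undo the charge using only the offset; the tag is found via the slot. */
static void model_free_hook(struct chunk_model *c, int off, size_t size)
{
	struct ext_model *ext = &c->obj_exts[off >> MODEL_MIN_ALLOC_SHIFT];

	if (!ext->tag)
		return;
	ext->tag->bytes -= size;
	ext->tag = NULL;
}
```

The point of the slot lookup is that the free path needs no knowledge
of the allocating call site: the offset alone leads back to the tag.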
mm/percpu-internal.h | 11 +++++++++--
mm/percpu.c | 26 ++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 2433e7b24172..c5d1d6723a66 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -36,9 +36,12 @@ struct pcpuobj_ext {
#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *cgroup;
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref tag;
+#endif
};
-#ifdef CONFIG_MEMCG_KMEM
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEM_ALLOC_PROFILING)
#define NEED_PCPUOBJ_EXT
#endif
@@ -79,7 +82,11 @@ struct pcpu_chunk {
static inline bool need_pcpuobj_ext(void)
{
- return !mem_cgroup_kmem_disabled();
+ if (IS_ENABLED(CONFIG_MEM_ALLOC_PROFILING))
+ return true;
+ if (!mem_cgroup_kmem_disabled())
+ return true;
+ return false;
}
extern spinlock_t pcpu_lock;
diff --git a/mm/percpu.c b/mm/percpu.c
index 95b26a6b718d..4e2592f2e58f 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1701,6 +1701,32 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
}
#endif /* CONFIG_MEMCG_KMEM */
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+ size_t size)
+{
+ if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts)) {
+ alloc_tag_add(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag,
+ current->alloc_tag, size);
+ }
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+ if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts))
+ alloc_tag_sub_noalloc(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag, size);
+}
+#else
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+ size_t size)
+{
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+}
+#endif
+
/**
* pcpu_alloc - the percpu allocator
* @size: size of area to allocate in bytes
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 32/40] arm64: Fix circular header dependency
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (8 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 30/40] mm: percpu: Add codetag reference into pcpuobj_ext Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 33/40] move stack capture functionality into a separate function for reuse Suren Baghdasaryan
2023-05-01 17:47 ` [PATCH 00/40] Memory allocation profiling Roman Gushchin
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
From: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Replace the linux/percpu.h include with asm/percpu.h to avoid a circular
dependency.
Signed-off-by: Kent Overstreet <kent.overstreet-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
arch/arm64/include/asm/spectre.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/spectre.h b/arch/arm64/include/asm/spectre.h
index db7b371b367c..31823d9715ab 100644
--- a/arch/arm64/include/asm/spectre.h
+++ b/arch/arm64/include/asm/spectre.h
@@ -13,8 +13,8 @@
#define __BP_HARDEN_HYP_VECS_SZ ((BP_HARDEN_EL2_SLOTS - 1) * SZ_2K)
#ifndef __ASSEMBLY__
-
-#include <linux/percpu.h>
+#include <linux/smp.h>
+#include <asm/percpu.h>
#include <asm/cpufeature.h>
#include <asm/virt.h>
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 33/40] move stack capture functionality into a separate function for reuse
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (9 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 32/40] arm64: Fix circular header dependency Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 17:47 ` [PATCH 00/40] Memory allocation profiling Roman Gushchin
11 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
Cc: kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
Make the save_stack() function part of the stackdepot API so that it can
be used outside of page_owner. Also rename task_struct's in_page_owner
flag to in_capture_stack to better convey its wider use.
Signed-off-by: Suren Baghdasaryan <surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
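Note (not part of the patch): the recursion guard kept by
stack_depot_capture_stack() can be modeled in userspace as below. A
global flag stands in for current->in_capture_stack, and the depot save
is replaced by a stub that may re-enter the capture path or fail. All
model_* names and MODEL_* handle values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int handle_model_t;

/* Hypothetical sentinel handles, standing in for the utility stacks. */
#define MODEL_RECURSION_HANDLE 1u
#define MODEL_FAILURE_HANDLE   2u
#define MODEL_FIRST_REAL       100u

static bool in_capture_stack;		/* models current->in_capture_stack */
static handle_model_t next_handle = MODEL_FIRST_REAL;
static handle_model_t last_nested;	/* result of the re-entrant call */

static handle_model_t model_capture_stack(bool reenter, bool fail);

/* Models a depot save that may allocate memory and thereby re-enter the
 * capture path; returns 0 on failure, like stack_depot_save(). */
static handle_model_t model_depot_save(bool reenter, bool fail)
{
	assert(in_capture_stack);	/* only reached from the capture path */
	if (reenter)
		last_nested = model_capture_stack(false, false);
	return fail ? 0 : next_handle++;
}

static handle_model_t model_capture_stack(bool reenter, bool fail)
{
	handle_model_t handle;

	if (in_capture_stack)
		return MODEL_RECURSION_HANDLE;	/* nested call: bail out */
	in_capture_stack = true;

	handle = model_depot_save(reenter, fail);
	if (!handle)
		handle = MODEL_FAILURE_HANDLE;

	in_capture_stack = false;
	return handle;
}
```

The nested call gets the recursion sentinel instead of recursing
further, and a failed save is reported through the failure sentinel, so
callers always receive a usable handle.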
include/linux/sched.h | 6 ++--
include/linux/stackdepot.h | 16 +++++++++
lib/stackdepot.c | 68 ++++++++++++++++++++++++++++++++++++++
mm/page_owner.c | 52 ++---------------------------
4 files changed, 90 insertions(+), 52 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 33708bf8f191..6eca46ab6d78 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -942,9 +942,9 @@ struct task_struct {
/* Stalled due to lack of memory */
unsigned in_memstall:1;
#endif
-#ifdef CONFIG_PAGE_OWNER
- /* Used by page_owner=on to detect recursion in page tracking. */
- unsigned in_page_owner:1;
+#ifdef CONFIG_STACKDEPOT
+ /* Used by stack_depot_capture_stack to detect recursion. */
+ unsigned in_capture_stack:1;
#endif
#ifdef CONFIG_EVENTFD
/* Recursion prevention for eventfd_signal() */
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index e58306783d8e..baf7e80cf449 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -164,4 +164,20 @@ depot_stack_handle_t __must_check stack_depot_set_extra_bits(
*/
unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
+/**
+ * stack_depot_capture_init - Initialize stack depot capture mechanism
+ *
+ * Return: Stack depot initialization status
+ */
+bool stack_depot_capture_init(void);
+
+/**
+ * stack_depot_capture_stack - Capture current stack trace into stack depot
+ *
+ * @flags: Allocation GFP flags
+ *
+ * Return: Handle of the stack trace stored in depot, 0 on failure
+ */
+depot_stack_handle_t stack_depot_capture_stack(gfp_t flags);
+
#endif
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 2f5aa851834e..c7e5e22fcb16 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -539,3 +539,71 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
return parts.extra;
}
EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
+static depot_stack_handle_t recursion_handle;
+static depot_stack_handle_t failure_handle;
+
+static __always_inline depot_stack_handle_t create_custom_stack(void)
+{
+ unsigned long entries[4];
+ unsigned int nr_entries;
+
+ nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+ return stack_depot_save(entries, nr_entries, GFP_KERNEL);
+}
+
+static noinline void register_recursion_stack(void)
+{
+ recursion_handle = create_custom_stack();
+}
+
+static noinline void register_failure_stack(void)
+{
+ failure_handle = create_custom_stack();
+}
+
+bool stack_depot_capture_init(void)
+{
+ static DEFINE_MUTEX(stack_depot_capture_init_mutex);
+ static bool utility_stacks_ready;
+
+ mutex_lock(&stack_depot_capture_init_mutex);
+ if (!utility_stacks_ready) {
+ register_recursion_stack();
+ register_failure_stack();
+ utility_stacks_ready = true;
+ }
+ mutex_unlock(&stack_depot_capture_init_mutex);
+
+ return utility_stacks_ready;
+}
+
+/* TODO: teach stack_depot_capture_stack to use off stack temporal storage */
+#define CAPTURE_STACK_DEPTH (16)
+
+depot_stack_handle_t stack_depot_capture_stack(gfp_t flags)
+{
+ unsigned long entries[CAPTURE_STACK_DEPTH];
+ depot_stack_handle_t handle;
+ unsigned int nr_entries;
+
+ /*
+ * Avoid recursion.
+ *
+ * Sometimes page metadata allocation tracking requires more
+ * memory to be allocated:
+ * - when new stack trace is saved to stack depot
+ * - when backtrace itself is calculated (ia64)
+ */
+ if (current->in_capture_stack)
+ return recursion_handle;
+ current->in_capture_stack = 1;
+
+ nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
+ handle = stack_depot_save(entries, nr_entries, flags);
+ if (!handle)
+ handle = failure_handle;
+
+ current->in_capture_stack = 0;
+ return handle;
+}
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 8b6086c666e6..9fafbc290d5b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -15,12 +15,6 @@
#include "internal.h"
-/*
- * TODO: teach PAGE_OWNER_STACK_DEPTH (__dump_page_owner and save_stack)
- * to use off stack temporal storage
- */
-#define PAGE_OWNER_STACK_DEPTH (16)
-
struct page_owner {
unsigned short order;
short last_migrate_reason;
@@ -37,8 +31,6 @@ struct page_owner {
static bool page_owner_enabled __initdata;
DEFINE_STATIC_KEY_FALSE(page_owner_inited);
-static depot_stack_handle_t dummy_handle;
-static depot_stack_handle_t failure_handle;
static depot_stack_handle_t early_handle;
static void init_early_allocated_pages(void);
@@ -68,16 +60,6 @@ static __always_inline depot_stack_handle_t create_dummy_stack(void)
return stack_depot_save(entries, nr_entries, GFP_KERNEL);
}
-static noinline void register_dummy_stack(void)
-{
- dummy_handle = create_dummy_stack();
-}
-
-static noinline void register_failure_stack(void)
-{
- failure_handle = create_dummy_stack();
-}
-
static noinline void register_early_stack(void)
{
early_handle = create_dummy_stack();
@@ -88,8 +70,7 @@ static __init void init_page_owner(void)
if (!page_owner_enabled)
return;
- register_dummy_stack();
- register_failure_stack();
+ stack_depot_capture_init();
register_early_stack();
static_branch_enable(&page_owner_inited);
init_early_allocated_pages();
@@ -107,33 +88,6 @@ static inline struct page_owner *get_page_owner(struct page_ext *page_ext)
return (void *)page_ext + page_owner_ops.offset;
}
-static noinline depot_stack_handle_t save_stack(gfp_t flags)
-{
- unsigned long entries[PAGE_OWNER_STACK_DEPTH];
- depot_stack_handle_t handle;
- unsigned int nr_entries;
-
- /*
- * Avoid recursion.
- *
- * Sometimes page metadata allocation tracking requires more
- * memory to be allocated:
- * - when new stack trace is saved to stack depot
- * - when backtrace itself is calculated (ia64)
- */
- if (current->in_page_owner)
- return dummy_handle;
- current->in_page_owner = 1;
-
- nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
- handle = stack_depot_save(entries, nr_entries, flags);
- if (!handle)
- handle = failure_handle;
-
- current->in_page_owner = 0;
- return handle;
-}
-
void __reset_page_owner(struct page *page, unsigned short order)
{
int i;
@@ -146,7 +100,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
if (unlikely(!page_ext))
return;
- handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+ handle = stack_depot_capture_stack(GFP_NOWAIT | __GFP_NOWARN);
for (i = 0; i < (1 << order); i++) {
__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
page_owner = get_page_owner(page_ext);
@@ -189,7 +143,7 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
struct page_ext *page_ext;
depot_stack_handle_t handle;
- handle = save_stack(gfp_mask);
+ handle = stack_depot_capture_stack(gfp_mask);
page_ext = page_ext_get(page);
if (unlikely(!page_ext))
--
2.40.1.495.gc816e09b53d-goog
* Re: [PATCH 00/40] Memory allocation profiling
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
` (10 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 33/40] move stack capture functionality into a separate function for reuse Suren Baghdasaryan
@ 2023-05-01 17:47 ` Roman Gushchin
2023-05-01 18:08 ` Suren Baghdasaryan
11 siblings, 1 reply; 160+ messages in thread
From: Roman Gushchin @ 2023-05-01 17:47 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
kent.overstreet-fxUVXftIFDnyG1zEObXtfA, mhocko-IBi9RG/b67k,
vbabka-AlSwsSmVLrQ, hannes-druUgvl0LCNAfugRpC6u6w,
mgorman-l3A5Bk7waGM, dave-h16yJtLeMjHk1uMJSBkQmQ,
willy-wEGCiKHe2LqWVfeAwA7xHQ, liam.howlett-QHcLZuEGTsvQT0dZR+AlfA,
corbet-T1hC0tSOHrs, void-gq6j2QGBifHby3iVrkZq2A,
peterz-wEGCiKHe2LqWVfeAwA7xHQ, juri.lelli-H+wXaHxf7aLQT0dZR+AlfA,
ldufour-tEXmvtCZX7AybS5Ee8rs3A, catalin.marinas-5wv7dgnIgG8,
will-DgEjT+Ai2ygdnm+yROfE0A, arnd-r2nGTMty4D4,
tglx-hfZtesqFncYOwBW4kG4KsQ, mingo-H+wXaHxf7aLQT0dZR+AlfA,
dave.hansen-VuQAYsv1563Yd54FQh9/CA, x86-DgEjT+Ai2ygdnm+yROfE0A,
peterx-H+wXaHxf7aLQT0dZR+AlfA, david-H+wXaHxf7aLQT0dZR+AlfA,
axboe-tSWWG44O7X1aa/9Udqfwiw, mcgrof-DgEjT+Ai2ygdnm+yROfE0A,
masahiroy-DgEjT+Ai2ygdnm+yROfE0A, nathan-DgEjT+Ai2ygdnm+yROfE0A,
dennis-DgEjT+Ai2ygdnm+yROfE0A, tj-DgEjT+Ai2ygdnm+yROfE0A,
muchun.song-fxUVXftIFDnyG1zEObXtfA, rppt-DgEjT+Ai2ygdnm+yROfE0A,
paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w,
keescook-F7+t8E8rja9g9hUCZPvPmw
On Mon, May 01, 2023 at 09:54:10AM -0700, Suren Baghdasaryan wrote:
> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise. Below is performance
> comparison between the baseline kernel, profiling when enabled, profiling
> when disabled (nomem_profiling=y) and (for comparison purposes) baseline
> with CONFIG_MEMCG_KMEM enabled and allocations using __GFP_ACCOUNT:
>
>                       kmalloc             pgalloc
> Baseline (6.3-rc7)    9.200s              31.050s
> profiling disabled    9.800 (+6.52%)      32.600 (+4.99%)
> profiling enabled     12.500 (+35.87%)    39.010 (+25.60%)
> memcg_kmem enabled    41.400 (+350.00%)   70.600 (+127.38%)
Hm, this makes me think we have a regression with memcg_kmem in one of
the recent releases. When I measured it a couple of years ago, the overhead
was definitely within 100%.
Do you understand what makes your profiling drastically faster than kmem?
Thanks!
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-01 17:47 ` [PATCH 00/40] Memory allocation profiling Roman Gushchin
@ 2023-05-01 18:08 ` Suren Baghdasaryan
2023-05-01 18:14 ` Roman Gushchin
0 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 18:08 UTC (permalink / raw)
To: Roman Gushchin
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, mgorman, dave,
willy, liam.howlett, corbet, void, peterz, juri.lelli, ldufour,
catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Mon, May 1, 2023 at 10:47 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Mon, May 01, 2023 at 09:54:10AM -0700, Suren Baghdasaryan wrote:
> > Performance overhead:
> > To evaluate performance we implemented an in-kernel test executing
> > multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> > sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> > affinity set to a specific CPU to minimize the noise. Below is performance
> > comparison between the baseline kernel, profiling when enabled, profiling
> > when disabled (nomem_profiling=y) and (for comparison purposes) baseline
> > with CONFIG_MEMCG_KMEM enabled and allocations using __GFP_ACCOUNT:
> >
> >                         kmalloc             pgalloc
> > Baseline (6.3-rc7)      9.200s              31.050s
> > profiling disabled      9.800 (+6.52%)      32.600 (+4.99%)
> > profiling enabled       12.500 (+35.87%)    39.010 (+25.60%)
> > memcg_kmem enabled      41.400 (+350.00%)   70.600 (+127.38%)
>
> Hm, this makes me think we have a regression with memcg_kmem in one of
> the recent releases. When I measured it a couple of years ago, the overhead
> was definitely within 100%.
>
> Do you understand what makes your profiling drastically faster than kmem?
I haven't profiled or looked into kmem overhead closely but I can do
that. I just wanted to see how the overhead compares with the existing
accounting mechanisms.
For kmalloc, the overhead is low because after we create the vector of
slab_ext objects (which is the same as what memcg_kmem does), memory
profiling just increments a lazy counter (which in many cases would be
a per-cpu counter). memcg_kmem operates on cgroup hierarchy with
additional overhead associated with that. I'm guessing that's the
reason for the big difference between these mechanisms, but I didn't
look into the details to understand memcg_kmem performance.
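
To make the cost argument concrete, here is a minimal user-space sketch of the per-cpu counter idea, not the patchset's code. The names percpu_counter_model, counter_add and counter_read are made up for illustration, the CPU count is arbitrary, and a kernel implementation would presumably use this_cpu_add()-style primitives instead of an explicit cpu index:

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for this model */

/* One counter slot per CPU; padding keeps slots on separate cache lines. */
struct percpu_counter_model {
	struct {
		long bytes;
		char pad[64 - sizeof(long)];
	} slot[NR_CPUS_MODEL];
};

/* Fast path: touch only the local CPU's slot; no atomics, no hierarchy walk. */
static void counter_add(struct percpu_counter_model *c, unsigned int cpu,
			long bytes)
{
	assert(cpu < NR_CPUS_MODEL);
	c->slot[cpu].bytes += bytes;
}

/* Slow path: the reader pays for the summation, which is expected to be rare. */
static long counter_read(const struct percpu_counter_model *c)
{
	long sum = 0;

	for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += c->slot[cpu].bytes;
	return sum;
}
```

An individual slot can go negative when memory is freed on a different CPU than the one that allocated it; only the sum is meaningful. The allocation path stays cheap because it never has to walk a hierarchy the way a cgroup-based charge might.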
>
> Thanks!
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-01 18:08 ` Suren Baghdasaryan
@ 2023-05-01 18:14 ` Roman Gushchin
2023-05-01 19:37 ` Kent Overstreet
0 siblings, 1 reply; 160+ messages in thread
From: Roman Gushchin @ 2023-05-01 18:14 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, mhocko, vbabka, hannes, mgorman, dave,
willy, liam.howlett, corbet, void, peterz, juri.lelli, ldufour,
catalin.marinas, will, arnd, tglx, mingo, dave.hansen, x86,
peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Mon, May 01, 2023 at 11:08:05AM -0700, Suren Baghdasaryan wrote:
> On Mon, May 1, 2023 at 10:47 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Mon, May 01, 2023 at 09:54:10AM -0700, Suren Baghdasaryan wrote:
> > > Performance overhead:
> > > To evaluate performance we implemented an in-kernel test executing
> > > multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> > > sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> > > affinity set to a specific CPU to minimize the noise. Below is performance
> > > comparison between the baseline kernel, profiling when enabled, profiling
> > > when disabled (nomem_profiling=y) and (for comparison purposes) baseline
> > > with CONFIG_MEMCG_KMEM enabled and allocations using __GFP_ACCOUNT:
> > >
> > >                         kmalloc             pgalloc
> > > Baseline (6.3-rc7)      9.200s              31.050s
> > > profiling disabled      9.800 (+6.52%)      32.600 (+4.99%)
> > > profiling enabled       12.500 (+35.87%)    39.010 (+25.60%)
> > > memcg_kmem enabled      41.400 (+350.00%)   70.600 (+127.38%)
> >
> > Hm, this makes me think we have a regression with memcg_kmem in one of
> > the recent releases. When I measured it a couple of years ago, the overhead
> > was definitely within 100%.
> >
> > Do you understand what makes your profiling drastically faster than kmem?
>
> I haven't profiled or looked into kmem overhead closely but I can do
> that. I just wanted to see how the overhead compares with the existing
> accounting mechanisms.
It's a good idea and I generally think that +25-35% for kmalloc/pgalloc
should be ok for production use, which is great!
In reality, most workloads are not that sensitive to the speed of
memory allocation.
>
> For kmalloc, the overhead is low because after we create the vector of
> slab_ext objects (which is the same as what memcg_kmem does), memory
> profiling just increments a lazy counter (which in many cases would be
> a per-cpu counter).
So does kmem (this is why I'm somewhat surprised by the difference).
> memcg_kmem operates on cgroup hierarchy with
> additional overhead associated with that. I'm guessing that's the
> reason for the big difference between these mechanisms, but I didn't
> look into the details to understand memcg_kmem performance.
I suspect recent rt-related changes and also the wide usage of
rcu primitives in the kmem code. I'll try to look closer as well.
Thanks!
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-01 18:14 ` Roman Gushchin
@ 2023-05-01 19:37 ` Kent Overstreet
[not found] ` <ZFAVFlrRtpVgxJ0q-jC9Py7bek1znysI04z7BkA@public.gmane.org>
0 siblings, 1 reply; 160+ messages in thread
From: Kent Overstreet @ 2023-05-01 19:37 UTC (permalink / raw)
To: Roman Gushchin
Cc: Suren Baghdasaryan, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
mhocko-IBi9RG/b67k, vbabka-AlSwsSmVLrQ,
hannes-druUgvl0LCNAfugRpC6u6w, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w, keescook
On Mon, May 01, 2023 at 11:14:45AM -0700, Roman Gushchin wrote:
> It's a good idea and I generally think that +25-35% for kmalloc/pgalloc
> should be ok for production use, which is great!
> In reality, most workloads are not that sensitive to the speed of
> memory allocation.
:)
My main takeaway has been "the slub fast path is _really_ fast". No
disabling of preemption, no atomic instructions, just a non-locked
double-word cmpxchg - it's a slick piece of work.
> > For kmalloc, the overhead is low because after we create the vector of
> > slab_ext objects (which is the same as what memcg_kmem does), memory
> > profiling just increments a lazy counter (which in many cases would be
> > a per-cpu counter).
>
> So does kmem (this is why I'm somewhat surprised by the difference).
>
> > memcg_kmem operates on cgroup hierarchy with
> > additional overhead associated with that. I'm guessing that's the
> > reason for the big difference between these mechanisms, but I didn't
> > look into the details to understand memcg_kmem performance.
>
> I suspect recent rt-related changes and also the wide usage of
> rcu primitives in the kmem code. I'll try to look closer as well.
Happy to give you something to compare against :)
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 31/40] mm: percpu: enable per-cpu allocation tagging
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (21 preceding siblings ...)
[not found] ` <20230501165450.15352-1-surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 34/40] lib: code tagging context capture support Suren Baghdasaryan
` (7 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Redefine __alloc_percpu, __alloc_percpu_gfp and __alloc_reserved_percpu
to record allocations and deallocations done by these functions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/percpu.h | 19 ++++++++----
mm/percpu.c | 66 +++++-------------------------------------
2 files changed, 22 insertions(+), 63 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 1338ea2aa720..51ec257379af 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -2,12 +2,14 @@
#ifndef __LINUX_PERCPU_H
#define __LINUX_PERCPU_H
+#include <linux/alloc_tag.h>
#include <linux/mmdebug.h>
#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/cpumask.h>
#include <linux/pfn.h>
#include <linux/init.h>
+#include <linux/sched.h>
#include <asm/percpu.h>
@@ -116,7 +118,6 @@ extern int __init pcpu_page_first_chunk(size_t reserved_size,
pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
#endif
-extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) __alloc_size(1);
extern bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr);
extern bool is_kernel_percpu_address(unsigned long addr);
@@ -124,10 +125,15 @@ extern bool is_kernel_percpu_address(unsigned long addr);
extern void __init setup_per_cpu_areas(void);
#endif
-extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
-extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
-extern void free_percpu(void __percpu *__pdata);
-extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+extern void __percpu *__pcpu_alloc(size_t size, size_t align, bool reserved,
+ gfp_t gfp) __alloc_size(1);
+
+#define __alloc_percpu_gfp(_size, _align, _gfp) alloc_hooks( \
+ __pcpu_alloc(_size, _align, false, _gfp), void __percpu *, NULL)
+#define __alloc_percpu(_size, _align) alloc_hooks( \
+ __pcpu_alloc(_size, _align, false, GFP_KERNEL), void __percpu *, NULL)
+#define __alloc_reserved_percpu(_size, _align) alloc_hooks( \
+ __pcpu_alloc(_size, _align, true, GFP_KERNEL), void __percpu *, NULL)
#define alloc_percpu_gfp(type, gfp) \
(typeof(type) __percpu *)__alloc_percpu_gfp(sizeof(type), \
@@ -136,6 +142,9 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
(typeof(type) __percpu *)__alloc_percpu(sizeof(type), \
__alignof__(type))
+extern void free_percpu(void __percpu *__pdata);
+extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+
extern unsigned long pcpu_nr_pages(void);
#endif /* __LINUX_PERCPU_H */
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e2592f2e58f..4b5cf260d8e0 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1728,7 +1728,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
#endif
/**
- * pcpu_alloc - the percpu allocator
+ * __pcpu_alloc - the percpu allocator
* @size: size of area to allocate in bytes
* @align: alignment of area (max PAGE_SIZE)
* @reserved: allocate from the reserved chunk if available
@@ -1742,8 +1742,8 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
* RETURNS:
* Percpu pointer to the allocated area on success, NULL on failure.
*/
-static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
- gfp_t gfp)
+void __percpu *__pcpu_alloc(size_t size, size_t align, bool reserved,
+ gfp_t gfp)
{
gfp_t pcpu_gfp;
bool is_atomic;
@@ -1909,6 +1909,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
pcpu_memcg_post_alloc_hook(objcg, chunk, off, size);
+ pcpu_alloc_tag_alloc_hook(chunk, off, size);
+
return ptr;
fail_unlock:
@@ -1935,61 +1937,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
return NULL;
}
-
-/**
- * __alloc_percpu_gfp - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- * @gfp: allocation flags
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align. If
- * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
- * be called from any context but is a lot more likely to fail. If @gfp
- * has __GFP_NOWARN then no warning will be triggered on invalid or failed
- * allocation requests.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp)
-{
- return pcpu_alloc(size, align, false, gfp);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu_gfp);
-
-/**
- * __alloc_percpu - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Equivalent to __alloc_percpu_gfp(size, align, %GFP_KERNEL).
- */
-void __percpu *__alloc_percpu(size_t size, size_t align)
-{
- return pcpu_alloc(size, align, false, GFP_KERNEL);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu);
-
-/**
- * __alloc_reserved_percpu - allocate reserved percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align
- * from reserved percpu area if arch has set it up; otherwise,
- * allocation is served from the same dynamic area. Might sleep.
- * Might trigger writeouts.
- *
- * CONTEXT:
- * Does GFP_KERNEL allocation.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
-{
- return pcpu_alloc(size, align, true, GFP_KERNEL);
-}
+EXPORT_SYMBOL_GPL(__pcpu_alloc);
/**
* pcpu_balance_free - manage the amount of free chunks
@@ -2299,6 +2247,8 @@ void free_percpu(void __percpu *ptr)
size = pcpu_free_area(chunk, off);
+ pcpu_alloc_tag_free_hook(chunk, off, size);
+
pcpu_memcg_free_hook(chunk, off, size);
/*
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 34/40] lib: code tagging context capture support
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (22 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 31/40] mm: percpu: enable per-cpu allocation tagging Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-03 7:35 ` Michal Hocko
2023-05-01 16:54 ` [PATCH 35/40] lib: implement context capture support for tagged allocations Suren Baghdasaryan
` (6 subsequent siblings)
30 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Add support for code tag context capture when registering a new code tag
type. When context capture for a specific code tag is enabled,
codetag_ref will point to a codetag_ctx object which can be attached
to an application-specific object storing code invocation context.
codetag_ctx has a pointer to its codetag_with_ctx object with embedded
codetag object in it. All context objects of the same code tag are placed
into codetag_with_ctx.ctx_head linked list. codetag.flag is used to
indicate when a context capture for the associated code tag is
initialized and enabled.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/codetag.h | 50 +++++++++++++-
include/linux/codetag_ctx.h | 48 +++++++++++++
lib/codetag.c | 134 ++++++++++++++++++++++++++++++++++++
3 files changed, 231 insertions(+), 1 deletion(-)
create mode 100644 include/linux/codetag_ctx.h
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index 87207f199ac9..9ab2f017e845 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -5,8 +5,12 @@
#ifndef _LINUX_CODETAG_H
#define _LINUX_CODETAG_H
+#include <linux/container_of.h>
+#include <linux/spinlock.h>
#include <linux/types.h>
+struct kref;
+struct codetag_ctx;
struct codetag_iterator;
struct codetag_type;
struct seq_buf;
@@ -18,15 +22,38 @@ struct module;
* an array of these.
*/
struct codetag {
- unsigned int flags; /* used in later patches */
+ unsigned int flags; /* has to be the first member shared with codetag_ctx */
unsigned int lineno;
const char *modname;
const char *function;
const char *filename;
} __aligned(8);
+/* codetag_with_ctx flags */
+#define CTC_FLAG_CTX_PTR (1 << 0)
+#define CTC_FLAG_CTX_READY (1 << 1)
+#define CTC_FLAG_CTX_ENABLED (1 << 2)
+
+/*
+ * Code tag with context capture support. Contains a list to store context for
+ * each tag hit, a lock protecting the list and a flag to indicate whether
+ * context capture is enabled for the tag.
+ */
+struct codetag_with_ctx {
+ struct codetag ct;
+ struct list_head ctx_head;
+ spinlock_t ctx_lock;
+} __aligned(8);
+
+/*
+ * Tag reference can point to codetag directly or indirectly via codetag_ctx.
+ * Direct codetag pointer is used when context capture is disabled or not
+ * supported. When context capture for the tag is used, the reference points
+ * to the codetag_ctx through which the codetag can be reached.
+ */
union codetag_ref {
struct codetag *ct;
+ struct codetag_ctx *ctx;
};
struct codetag_range {
@@ -46,6 +73,7 @@ struct codetag_type_desc {
struct codetag_module *cmod);
bool (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
+ void (*free_ctx)(struct kref *ref);
};
struct codetag_iterator {
@@ -53,6 +81,7 @@ struct codetag_iterator {
struct codetag_module *cmod;
unsigned long mod_id;
struct codetag *ct;
+ struct codetag_ctx *ctx;
};
#define CODE_TAG_INIT { \
@@ -63,9 +92,28 @@ struct codetag_iterator {
.flags = 0, \
}
+static inline bool is_codetag_ctx_ref(union codetag_ref *ref)
+{
+ return !!(ref->ct->flags & CTC_FLAG_CTX_PTR);
+}
+
+static inline
+struct codetag_with_ctx *ct_to_ctc(struct codetag *ct)
+{
+ return container_of(ct, struct codetag_with_ctx, ct);
+}
+
void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
struct codetag *codetag_next_ct(struct codetag_iterator *iter);
+struct codetag_ctx *codetag_next_ctx(struct codetag_iterator *iter);
+
+bool codetag_enable_ctx(struct codetag_with_ctx *ctc, bool enable);
+static inline bool codetag_ctx_enabled(struct codetag_with_ctx *ctc)
+{
+ return !!(ctc->ct.flags & CTC_FLAG_CTX_ENABLED);
+}
+bool codetag_has_ctx(struct codetag_with_ctx *ctc);
void codetag_to_text(struct seq_buf *out, struct codetag *ct);
diff --git a/include/linux/codetag_ctx.h b/include/linux/codetag_ctx.h
new file mode 100644
index 000000000000..e741484f0e08
--- /dev/null
+++ b/include/linux/codetag_ctx.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * code tag context
+ */
+#ifndef _LINUX_CODETAG_CTX_H
+#define _LINUX_CODETAG_CTX_H
+
+#include <linux/codetag.h>
+#include <linux/kref.h>
+
+/* Code tag hit context. */
+struct codetag_ctx {
+ unsigned int flags; /* has to be the first member shared with codetag */
+ struct codetag_with_ctx *ctc;
+ struct list_head node;
+ struct kref refcount;
+} __aligned(8);
+
+static inline struct codetag_ctx *kref_to_ctx(struct kref *refcount)
+{
+ return container_of(refcount, struct codetag_ctx, refcount);
+}
+
+static inline void add_ctx(struct codetag_ctx *ctx,
+ struct codetag_with_ctx *ctc)
+{
+ kref_init(&ctx->refcount);
+ spin_lock(&ctc->ctx_lock);
+ ctx->flags = CTC_FLAG_CTX_PTR;
+ ctx->ctc = ctc;
+ list_add_tail(&ctx->node, &ctc->ctx_head);
+ spin_unlock(&ctc->ctx_lock);
+}
+
+static inline void rem_ctx(struct codetag_ctx *ctx,
+ void (*free_ctx)(struct kref *refcount))
+{
+ struct codetag_with_ctx *ctc = ctx->ctc;
+
+ spin_lock(&ctc->ctx_lock);
+ /* ctx might have been removed while we were using it */
+ if (!list_empty(&ctx->node))
+ list_del_init(&ctx->node);
+ spin_unlock(&ctc->ctx_lock);
+ kref_put(&ctx->refcount, free_ctx);
+}
+
+#endif /* _LINUX_CODETAG_CTX_H */
diff --git a/lib/codetag.c b/lib/codetag.c
index 84f90f3b922c..d891bbe4481d 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/codetag.h>
+#include <linux/codetag_ctx.h>
#include <linux/idr.h>
#include <linux/kallsyms.h>
#include <linux/module.h>
@@ -92,6 +93,139 @@ struct codetag *codetag_next_ct(struct codetag_iterator *iter)
return ct;
}
+static struct codetag_ctx *next_ctx_from_ct(struct codetag_iterator *iter)
+{
+ struct codetag_with_ctx *ctc;
+ struct codetag_ctx *ctx = NULL;
+ struct codetag *ct = iter->ct;
+
+ while (ct) {
+ if (!(ct->flags & CTC_FLAG_CTX_READY))
+ goto next;
+
+ ctc = ct_to_ctc(ct);
+ spin_lock(&ctc->ctx_lock);
+ if (!list_empty(&ctc->ctx_head)) {
+ ctx = list_first_entry(&ctc->ctx_head,
+ struct codetag_ctx, node);
+ kref_get(&ctx->refcount);
+ }
+ spin_unlock(&ctc->ctx_lock);
+ if (ctx)
+ break;
+next:
+ ct = codetag_next_ct(iter);
+ }
+
+ iter->ctx = ctx;
+ return ctx;
+}
+
+struct codetag_ctx *codetag_next_ctx(struct codetag_iterator *iter)
+{
+ struct codetag_ctx *ctx = iter->ctx;
+ struct codetag_ctx *found = NULL;
+
+ lockdep_assert_held(&iter->cttype->mod_lock);
+
+ if (!ctx)
+ return next_ctx_from_ct(iter);
+
+ spin_lock(&ctx->ctc->ctx_lock);
+ /*
+ * Do not advance if the object was isolated, restart at the same tag.
+ */
+ if (!list_empty(&ctx->node)) {
+ if (list_is_last(&ctx->node, &ctx->ctc->ctx_head)) {
+ /* Finished with this tag, advance to the next */
+ codetag_next_ct(iter);
+ } else {
+ found = list_next_entry(ctx, node);
+ kref_get(&found->refcount);
+ }
+ }
+ spin_unlock(&ctx->ctc->ctx_lock);
+ kref_put(&ctx->refcount, iter->cttype->desc.free_ctx);
+
+ if (!found)
+ return next_ctx_from_ct(iter);
+
+ iter->ctx = found;
+ return found;
+}
+
+static struct codetag_type *find_cttype(struct codetag *ct)
+{
+ struct codetag_module *cmod;
+ struct codetag_type *cttype;
+ unsigned long mod_id;
+ unsigned long tmp;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ down_read(&cttype->mod_lock);
+ idr_for_each_entry_ul(&cttype->mod_idr, cmod, tmp, mod_id) {
+ if (ct >= cmod->range.start && ct < cmod->range.stop) {
+ up_read(&cttype->mod_lock);
+ goto found;
+ }
+ }
+ up_read(&cttype->mod_lock);
+ }
+ cttype = NULL;
+found:
+ mutex_unlock(&codetag_lock);
+
+ return cttype;
+}
+
+bool codetag_enable_ctx(struct codetag_with_ctx *ctc, bool enable)
+{
+ struct codetag_type *cttype = find_cttype(&ctc->ct);
+
+ if (!cttype || !cttype->desc.free_ctx)
+ return false;
+
+ lockdep_assert_held(&cttype->mod_lock);
+ BUG_ON(!rwsem_is_locked(&cttype->mod_lock));
+
+ if (codetag_ctx_enabled(ctc) == enable)
+ return false;
+
+ if (enable) {
+ /* Initialize context capture fields only once */
+ if (!(ctc->ct.flags & CTC_FLAG_CTX_READY)) {
+ spin_lock_init(&ctc->ctx_lock);
+ INIT_LIST_HEAD(&ctc->ctx_head);
+ ctc->ct.flags |= CTC_FLAG_CTX_READY;
+ }
+ ctc->ct.flags |= CTC_FLAG_CTX_ENABLED;
+ } else {
+ /*
+ * The list of context objects is intentionally left untouched.
+ * It can be read back and if context capture is re-enabled it
+ * will append new objects.
+ */
+ ctc->ct.flags &= ~CTC_FLAG_CTX_ENABLED;
+ }
+
+ return true;
+}
+
+bool codetag_has_ctx(struct codetag_with_ctx *ctc)
+{
+ bool no_ctx;
+
+ if (!(ctc->ct.flags & CTC_FLAG_CTX_READY))
+ return false;
+
+ spin_lock(&ctc->ctx_lock);
+ no_ctx = list_empty(&ctc->ctx_head);
+ spin_unlock(&ctc->ctx_lock);
+
+ return !no_ctx;
+}
+
void codetag_to_text(struct seq_buf *out, struct codetag *ct)
{
seq_buf_printf(out, "%s:%u module:%s func:%s",
--
2.40.1.495.gc816e09b53d-goog
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 34/40] lib: code tagging context capture support
2023-05-01 16:54 ` [PATCH 34/40] lib: code tagging context capture support Suren Baghdasaryan
@ 2023-05-03 7:35 ` Michal Hocko
2023-05-03 15:18 ` Suren Baghdasaryan
0 siblings, 1 reply; 160+ messages in thread
From: Michal Hocko @ 2023-05-03 7:35 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Mon 01-05-23 09:54:44, Suren Baghdasaryan wrote:
[...]
> +static inline void add_ctx(struct codetag_ctx *ctx,
> + struct codetag_with_ctx *ctc)
> +{
> + kref_init(&ctx->refcount);
> + spin_lock(&ctc->ctx_lock);
> + ctx->flags = CTC_FLAG_CTX_PTR;
> + ctx->ctc = ctc;
> + list_add_tail(&ctx->node, &ctc->ctx_head);
> + spin_unlock(&ctc->ctx_lock);
AFAIU every single tracked allocation will get its own codetag_ctx.
There is no aggregation per allocation site or anything else. This looks
like a scalability and a memory overhead red flag to me.
> +}
> +
> +static inline void rem_ctx(struct codetag_ctx *ctx,
> + void (*free_ctx)(struct kref *refcount))
> +{
> + struct codetag_with_ctx *ctc = ctx->ctc;
> +
> + spin_lock(&ctc->ctx_lock);
This could deadlock when the allocator is called from IRQ context.
> + /* ctx might have been removed while we were using it */
> + if (!list_empty(&ctx->node))
> + list_del_init(&ctx->node);
> + spin_unlock(&ctc->ctx_lock);
> + kref_put(&ctx->refcount, free_ctx);
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 34/40] lib: code tagging context capture support
2023-05-03 7:35 ` Michal Hocko
@ 2023-05-03 15:18 ` Suren Baghdasaryan
2023-05-03 15:26 ` Dave Hansen
[not found] ` <CAJuCfpHrZ4kWYFPvA3W9J+CmNMuOtGa_ZMXE9fOmKsPQeNt2tg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 2 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-03 15:18 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Wed, May 3, 2023 at 12:36 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 01-05-23 09:54:44, Suren Baghdasaryan wrote:
> [...]
> > +static inline void add_ctx(struct codetag_ctx *ctx,
> > + struct codetag_with_ctx *ctc)
> > +{
> > + kref_init(&ctx->refcount);
> > + spin_lock(&ctc->ctx_lock);
> > + ctx->flags = CTC_FLAG_CTX_PTR;
> > + ctx->ctc = ctc;
> > + list_add_tail(&ctx->node, &ctc->ctx_head);
> > + spin_unlock(&ctc->ctx_lock);
>
> AFAIU every single tracked allocation will get its own codetag_ctx.
> There is no aggregation per allocation site or anything else. This looks
> like a scalability and a memory overhead red flag to me.
True. The allocations here would not be limited. We could introduce a
global limit to the amount of memory that we can use to store contexts
and maybe reuse the oldest entry (in LRU fashion) when we hit that
limit?
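
A very rough user-space sketch of what such a bounded store could look like, assuming a fixed slot count stands in for the global memory limit. The names ctx_model, ctx_pool, ctx_pool_store and ctx_pool_oldest are hypothetical, locking is omitted, and reuse here follows insertion order (oldest first) since nothing in this model "touches" an entry after it is stored:

```c
#include <assert.h>
#include <stddef.h>

#define CTX_POOL_MAX 3   /* hypothetical global limit on stored contexts */

struct ctx_model {
	unsigned long id;   /* stands in for the captured context */
};

struct ctx_pool {
	struct ctx_model slot[CTX_POOL_MAX];
	size_t next;        /* slot to (re)use next; the oldest once full */
	size_t used;        /* number of valid slots, capped at the limit */
};

/* Store a context; when the pool is full, overwrite the oldest entry. */
static struct ctx_model *ctx_pool_store(struct ctx_pool *p, unsigned long id)
{
	struct ctx_model *ctx;

	assert(p->next < CTX_POOL_MAX);
	ctx = &p->slot[p->next];
	ctx->id = id;
	p->next = (p->next + 1) % CTX_POOL_MAX;
	if (p->used < CTX_POOL_MAX)
		p->used++;
	return ctx;
}

/* Return the oldest context still held, or NULL if the pool is empty. */
static struct ctx_model *ctx_pool_oldest(struct ctx_pool *p)
{
	if (!p->used)
		return NULL;
	if (p->used < CTX_POOL_MAX)
		return &p->slot[0];   /* no wrap-around yet */
	return &p->slot[p->next];     /* next victim is the oldest */
}
```

Memory use is bounded by CTX_POOL_MAX regardless of how many allocations are tracked, at the price of losing the context of older allocations that may still be live.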
>
> > +}
> > +
> > +static inline void rem_ctx(struct codetag_ctx *ctx,
> > + void (*free_ctx)(struct kref *refcount))
> > +{
> > + struct codetag_with_ctx *ctc = ctx->ctc;
> > +
> > + spin_lock(&ctc->ctx_lock);
>
> > This could deadlock when the allocator is called from IRQ context.
I see. spin_lock_irqsave() then?
Thanks for the feedback!
Suren.
>
> > + /* ctx might have been removed while we were using it */
> > + if (!list_empty(&ctx->node))
> > + list_del_init(&ctx->node);
> > + spin_unlock(&ctc->ctx_lock);
> > + kref_put(&ctx->refcount, free_ctx);
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 34/40] lib: code tagging context capture support
2023-05-03 15:18 ` Suren Baghdasaryan
@ 2023-05-03 15:26 ` Dave Hansen
2023-05-03 19:45 ` Suren Baghdasaryan
[not found] ` <CAJuCfpHrZ4kWYFPvA3W9J+CmNMuOtGa_ZMXE9fOmKsPQeNt2tg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 160+ messages in thread
From: Dave Hansen @ 2023-05-03 15:26 UTC (permalink / raw)
To: Suren Baghdasaryan, Michal Hocko
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On 5/3/23 08:18, Suren Baghdasaryan wrote:
>>> +static inline void rem_ctx(struct codetag_ctx *ctx,
>>> + void (*free_ctx)(struct kref *refcount))
>>> +{
>>> + struct codetag_with_ctx *ctc = ctx->ctc;
>>> +
>>> + spin_lock(&ctc->ctx_lock);
>> This could deadlock when the allocator is called from IRQ context.
> I see. spin_lock_irqsave() then?
Yes. But, even better, please turn on lockdep when you are testing. It
will find these for you. If you're on x86, we have a set of handy-dandy
debug options that you can add to an existing config with:
make x86_debug.config
That said, I'm as concerned as everyone else that this is all "new" code
and doesn't lean on existing tracing or things like PAGE_OWNER enough.
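
For reference, a rough user-space model of the spin_lock_irqsave() conversion discussed above, assuming both add_ctx() and rem_ctx() would take ctx_lock with interrupts disabled. The spinlock, the IRQ state (irqs_disabled_model) and the list helpers are simplified stand-ins so the pattern compiles outside the kernel; kref handling is left out, and this is not code from the series:

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-ins for kernel primitives (illustrative only). */
static bool irqs_disabled_model;   /* models the local IRQ-disable state */
typedef struct { bool locked; } spinlock_t;

#define spin_lock_irqsave(lock, flags) do {		\
	(flags) = irqs_disabled_model;			\
	irqs_disabled_model = true;			\
	assert(!(lock)->locked);			\
	(lock)->locked = true;				\
} while (0)

#define spin_unlock_irqrestore(lock, flags) do {	\
	(lock)->locked = false;				\
	irqs_disabled_model = (flags);			\
} while (0)

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

struct codetag_with_ctx { struct list_head ctx_head; spinlock_t ctx_lock; };
struct codetag_ctx { struct codetag_with_ctx *ctc; struct list_head node; };

static void add_ctx(struct codetag_ctx *ctx, struct codetag_with_ctx *ctc)
{
	unsigned long flags;

	/* IRQs stay off while the lock is held, so an allocation from an
	 * interrupt handler on this CPU cannot spin on the same lock. */
	spin_lock_irqsave(&ctc->ctx_lock, flags);
	ctx->ctc = ctc;
	list_add_tail(&ctx->node, &ctc->ctx_head);
	spin_unlock_irqrestore(&ctc->ctx_lock, flags);
}

static void rem_ctx(struct codetag_ctx *ctx)
{
	struct codetag_with_ctx *ctc = ctx->ctc;
	unsigned long flags;

	spin_lock_irqsave(&ctc->ctx_lock, flags);
	/* ctx might have been removed while we were using it */
	if (!list_empty(&ctx->node))
		list_del_init(&ctx->node);
	spin_unlock_irqrestore(&ctc->ctx_lock, flags);
}
```

The saved flags restore whatever IRQ state the caller had, which matters if these helpers can be reached both from process context and from paths that already run with interrupts off.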
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 34/40] lib: code tagging context capture support
2023-05-03 15:26 ` Dave Hansen
@ 2023-05-03 19:45 ` Suren Baghdasaryan
0 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-03 19:45 UTC (permalink / raw)
To: Dave Hansen
Cc: Michal Hocko, akpm, kent.overstreet, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, an
On Wed, May 3, 2023 at 8:26 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 5/3/23 08:18, Suren Baghdasaryan wrote:
> >>> +static inline void rem_ctx(struct codetag_ctx *ctx,
> >>> + void (*free_ctx)(struct kref *refcount))
> >>> +{
> >>> + struct codetag_with_ctx *ctc = ctx->ctc;
> >>> +
> >>> + spin_lock(&ctc->ctx_lock);
> >> This could deadlock when the allocator is called from IRQ context.
> > I see. spin_lock_irqsave() then?
>
> Yes. But, even better, please turn on lockdep when you are testing. It
> will find these for you. If you're on x86, we have a set of handy-dandy
> debug options that you can add to an existing config with:
>
> make x86_debug.config
Nice!
I thought I tested with lockdep enabled but I might be wrong. The
beauty of working on multiple patchsets in parallel is that I can't
remember what I did for each one :)
>
> That said, I'm as concerned as everyone else that this is all "new" code
> and doesn't lean on existing tracing or things like PAGE_OWNER enough.
Yeah, that's being actively discussed.
>
^ permalink raw reply [flat|nested] 160+ messages in thread
[parent not found: <CAJuCfpHrZ4kWYFPvA3W9J+CmNMuOtGa_ZMXE9fOmKsPQeNt2tg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 34/40] lib: code tagging context capture support
[not found] ` <CAJuCfpHrZ4kWYFPvA3W9J+CmNMuOtGa_ZMXE9fOmKsPQeNt2tg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2023-05-04 8:04 ` Michal Hocko
2023-05-04 14:31 ` Suren Baghdasaryan
0 siblings, 1 reply; 160+ messages in thread
From: Michal Hocko @ 2023-05-04 8:04 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
kent.overstreet-fxUVXftIFDnyG1zEObXtfA, vbabka-AlSwsSmVLrQ,
hannes-druUgvl0LCNAfugRpC6u6w,
roman.gushchin-fxUVXftIFDnyG1zEObXtfA, mgorman-l3A5Bk7waGM,
dave-h16yJtLeMjHk1uMJSBkQmQ, willy-wEGCiKHe2LqWVfeAwA7xHQ,
liam.howlett-QHcLZuEGTsvQT0dZR+AlfA, corbet-T1hC0tSOHrs,
void-gq6j2QGBifHby3iVrkZq2A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
juri.lelli-H+wXaHxf7aLQT0dZR+AlfA, ldufour-tEXmvtCZX7AybS5Ee8rs3A,
catalin.marinas-5wv7dgnIgG8, will-DgEjT+Ai2ygdnm+yROfE0A,
arnd-r2nGTMty4D4, tglx-hfZtesqFncYOwBW4kG4KsQ,
mingo-H+wXaHxf7aLQT0dZR+AlfA, dave.hansen-VuQAYsv1563Yd54FQh9/CA,
x86-DgEjT+Ai2ygdnm+yROfE0A, peterx-H+wXaHxf7aLQT0dZR+AlfA,
david-H+wXaHxf7aLQT0dZR+AlfA, axboe-tSWWG44O7X1aa/9Udqfwiw,
mcgrof-DgEjT+Ai2ygdnm+yROfE0A, masahiroy-DgEjT+Ai2ygdnm+yROfE0A,
nathan-DgEjT+Ai2ygdnm+yROfE0A, dennis-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, muchun.song-fxUVXftIFDnyG1zEObXtfA,
rppt-DgEjT+Ai2ygdnm+yROfE0A, paulmck-DgEjT+Ai2ygdnm+yROfE0A,
pasha.tatashin-2EmBfe737+LQT0dZR+AlfA,
yosryahmed-hpIqsD4AKlfQT0dZR+AlfA, yuzhao-hpIqsD4AKlfQT0dZR+AlfA,
dhowells-H+wXaHxf7aLQT0dZR+AlfA, hughd-hpIqsD4AKlfQT0dZR+AlfA,
andreyknvl-Re5JQEeQqe8AvxtiuMwx3w, keescook
On Wed 03-05-23 08:18:39, Suren Baghdasaryan wrote:
> On Wed, May 3, 2023 at 12:36 AM Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org> wrote:
> >
> > On Mon 01-05-23 09:54:44, Suren Baghdasaryan wrote:
> > [...]
> > > +static inline void add_ctx(struct codetag_ctx *ctx,
> > > + struct codetag_with_ctx *ctc)
> > > +{
> > > + kref_init(&ctx->refcount);
> > > + spin_lock(&ctc->ctx_lock);
> > > + ctx->flags = CTC_FLAG_CTX_PTR;
> > > + ctx->ctc = ctc;
> > > + list_add_tail(&ctx->node, &ctc->ctx_head);
> > > + spin_unlock(&ctc->ctx_lock);
> >
> > AFAIU every single tracked allocation will get its own codetag_ctx.
> > There is no aggregation per allocation site or anything else. This looks
> > like a scalability and a memory overhead red flag to me.
>
> True. The allocations here would not be limited. We could introduce a
> global limit to the amount of memory that we can use to store contexts
> and maybe reuse the oldest entry (in LRU fashion) when we hit that
> limit?
Wouldn't it make more sense to aggregate identical allocations? Sure, pids
get recycled, but quite honestly I am not sure that information is all
that interesting, precisely because of pid recycling and short-lived
processes. I think there is quite a lot to think through regarding
detailed context tracking.
> >
> > > +}
> > > +
> > > +static inline void rem_ctx(struct codetag_ctx *ctx,
> > > + void (*free_ctx)(struct kref *refcount))
> > > +{
> > > + struct codetag_with_ctx *ctc = ctx->ctc;
> > > +
> > > + spin_lock(&ctc->ctx_lock);
> >
> > This could deadlock when allocator is called from the IRQ context.
>
> I see. spin_lock_irqsave() then?
Yes. I have checked that the lock is not held over the whole list
traversal, which is good, but the changelog could be more explicit about
the iterators and the implications for lock hold times.
--
Michal Hocko
SUSE Labs
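
The aggregation idea above can be sketched in plain userspace C. This is only a model under stated assumptions, not code from the series: struct agg_ctx, struct agg_table, AGG_SLOTS and the agg_*() helpers are hypothetical names, a uint32_t stands in for depot_stack_handle_t, and a linear scan stands in for whatever lookup structure a real implementation would pick. The point is that the table grows with the number of distinct call stacks rather than with the number of live allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregated record: one per distinct call stack. */
struct agg_ctx {
	uint32_t stack_handle;	/* stands in for depot_stack_handle_t */
	size_t nr_allocs;	/* live allocations sharing this stack */
	size_t bytes;		/* live bytes sharing this stack */
};

#define AGG_SLOTS 64	/* arbitrary bound chosen for the sketch */

struct agg_table {
	struct agg_ctx slots[AGG_SLOTS];
	size_t used;
};

/* Find the record for a stack; optionally create it. NULL if absent or full. */
static struct agg_ctx *agg_lookup(struct agg_table *t, uint32_t stack_handle,
				  int create)
{
	size_t i;

	for (i = 0; i < t->used; i++)
		if (t->slots[i].stack_handle == stack_handle)
			return &t->slots[i];

	if (!create || t->used == AGG_SLOTS)
		return NULL;

	t->slots[t->used] = (struct agg_ctx){ .stack_handle = stack_handle };
	return &t->slots[t->used++];
}

/* Account one allocation against its call stack. */
static int agg_account_alloc(struct agg_table *t, uint32_t stack_handle,
			     size_t size)
{
	struct agg_ctx *c = agg_lookup(t, stack_handle, 1);

	if (!c)
		return -1;
	c->nr_allocs++;
	c->bytes += size;
	return 0;
}

/* Undo the accounting on free; fails for unknown stacks or underflow. */
static int agg_account_free(struct agg_table *t, uint32_t stack_handle,
			    size_t size)
{
	struct agg_ctx *c = agg_lookup(t, stack_handle, 0);

	if (!c || !c->nr_allocs || c->bytes < size)
		return -1;
	c->nr_allocs--;
	c->bytes -= size;
	return 0;
}
```

In this model two page-table allocations coming from the same call stack share one record with nr_allocs == 2, whereas the per-allocation scheme would keep two separate contexts. Per-allocation details such as pid and timestamp are deliberately dropped, which matches the trade-off being discussed.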
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 34/40] lib: code tagging context capture support
2023-05-04 8:04 ` Michal Hocko
@ 2023-05-04 14:31 ` Suren Baghdasaryan
0 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-04 14:31 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Thu, May 4, 2023 at 1:04 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 03-05-23 08:18:39, Suren Baghdasaryan wrote:
> > On Wed, May 3, 2023 at 12:36 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 01-05-23 09:54:44, Suren Baghdasaryan wrote:
> > > [...]
> > > > +static inline void add_ctx(struct codetag_ctx *ctx,
> > > > + struct codetag_with_ctx *ctc)
> > > > +{
> > > > + kref_init(&ctx->refcount);
> > > > + spin_lock(&ctc->ctx_lock);
> > > > + ctx->flags = CTC_FLAG_CTX_PTR;
> > > > + ctx->ctc = ctc;
> > > > + list_add_tail(&ctx->node, &ctc->ctx_head);
> > > > + spin_unlock(&ctc->ctx_lock);
> > >
> > > AFAIU every single tracked allocation will get its own codetag_ctx.
> > > There is no aggregation per allocation site or anything else. This looks
> > > like a scalability and a memory overhead red flag to me.
> >
> > True. The allocations here would not be limited. We could introduce a
> > global limit to the amount of memory that we can use to store contexts
> > and maybe reuse the oldest entry (in LRU fashion) when we hit that
> > limit?
>
> Wouldn't it make more sense to aggregate identical allocations? Sure, pids
> get recycled, but quite honestly I am not sure that information is all
> that interesting, precisely because of pid recycling and short-lived
> processes. I think there is quite a lot to think through regarding
> detailed context tracking.
That would be a nice optimization. I'll need to look into the
implementation details. Thanks for the idea.
>
> > >
> > > > +}
> > > > +
> > > > +static inline void rem_ctx(struct codetag_ctx *ctx,
> > > > + void (*free_ctx)(struct kref *refcount))
> > > > +{
> > > > + struct codetag_with_ctx *ctc = ctx->ctc;
> > > > +
> > > > + spin_lock(&ctc->ctx_lock);
> > >
> > > This could deadlock when allocator is called from the IRQ context.
> >
> > I see. spin_lock_irqsave() then?
>
> Yes. I have checked that the lock is not held over the whole list
> traversal, which is good, but the changelog could be more explicit about
> the iterators and the implications for lock hold times.
Ack. Will add more information.
>
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 35/40] lib: implement context capture support for tagged allocations
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (23 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 34/40] lib: code tagging context capture support Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-03 7:39 ` Michal Hocko
2023-05-01 16:54 ` [PATCH 36/40] lib: add memory allocations report in show_mem() Suren Baghdasaryan
` (5 subsequent siblings)
30 siblings, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Implement mechanisms for capturing allocation call context which consists
of:
- allocation size
- pid, tgid and name of the allocating task
- allocation timestamp
- allocation call stack
The patch creates an allocations.ctx file which can be written to in
order to enable or disable context capture for a specific code tag.
Captured context can be obtained by reading the allocations.ctx file.
Usage example:
echo "file include/asm-generic/pgalloc.h line 63 enable" > \
/sys/kernel/debug/allocations.ctx
cat /sys/kernel/debug/allocations.ctx
91.0MiB 212 include/asm-generic/pgalloc.h:63 module:pgtable func:__pte_alloc_one
size: 4096
pid: 1551
tgid: 1551
comm: cat
ts: 670109646361
call stack:
pte_alloc_one+0xfe/0x130
__pte_alloc+0x22/0x90
move_page_tables.part.0+0x994/0xa60
shift_arg_pages+0xa4/0x180
setup_arg_pages+0x286/0x2d0
load_elf_binary+0x4e1/0x18d0
bprm_execve+0x26b/0x660
do_execveat_common.isra.0+0x19d/0x220
__x64_sys_execve+0x2e/0x40
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
size: 4096
pid: 1551
tgid: 1551
comm: cat
ts: 670109711801
call stack:
pte_alloc_one+0xfe/0x130
__do_fault+0x52/0xc0
__handle_mm_fault+0x7d9/0xdd0
handle_mm_fault+0xc0/0x2b0
do_user_addr_fault+0x1c3/0x660
exc_page_fault+0x62/0x150
asm_exc_page_fault+0x22/0x30
...
echo "file include/asm-generic/pgalloc.h line 63 disable" > \
/sys/kernel/debug/allocations.ctx
Note that disabling context capture will not clear already captured
context but no new context will be captured.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 25 +++-
include/linux/codetag.h | 3 +-
include/linux/pgalloc_tag.h | 4 +-
lib/Kconfig.debug | 1 +
lib/alloc_tag.c | 238 +++++++++++++++++++++++++++++++++++-
lib/codetag.c | 20 +--
6 files changed, 272 insertions(+), 19 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 07922d81b641..2a3d248aae10 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -17,20 +17,29 @@
* an array of these. Embedded codetag utilizes codetag framework.
*/
struct alloc_tag {
- struct codetag ct;
+ struct codetag_with_ctx ctc;
struct lazy_percpu_counter bytes_allocated;
} __aligned(8);
#ifdef CONFIG_MEM_ALLOC_PROFILING
+static inline struct alloc_tag *ctc_to_alloc_tag(struct codetag_with_ctx *ctc)
+{
+ return container_of(ctc, struct alloc_tag, ctc);
+}
+
static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
{
- return container_of(ct, struct alloc_tag, ct);
+ return container_of(ct_to_ctc(ct), struct alloc_tag, ctc);
}
+struct codetag_ctx *alloc_tag_create_ctx(struct alloc_tag *tag, size_t size);
+void alloc_tag_free_ctx(struct codetag_ctx *ctx, struct alloc_tag **ptag);
+bool alloc_tag_enable_ctx(struct alloc_tag *tag, bool enable);
+
#define DEFINE_ALLOC_TAG(_alloc_tag, _old) \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { .ct = CODE_TAG_INIT }; \
+ __section("alloc_tags") = { .ctc.ct = CODE_TAG_INIT }; \
struct alloc_tag * __maybe_unused _old = alloc_tag_save(&_alloc_tag)
extern struct static_key_true mem_alloc_profiling_key;
@@ -54,7 +63,10 @@ static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes,
if (!ref || !ref->ct)
return;
- tag = ct_to_alloc_tag(ref->ct);
+ if (is_codetag_ctx_ref(ref))
+ alloc_tag_free_ctx(ref->ctx, &tag);
+ else
+ tag = ct_to_alloc_tag(ref->ct);
if (may_allocate)
lazy_percpu_counter_add(&tag->bytes_allocated, -bytes);
@@ -88,7 +100,10 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
if (!ref || !tag)
return;
- ref->ct = &tag->ct;
+ if (codetag_ctx_enabled(&tag->ctc))
+ ref->ctx = alloc_tag_create_ctx(tag, bytes);
+ else
+ ref->ct = &tag->ctc.ct;
lazy_percpu_counter_add(&tag->bytes_allocated, bytes);
}
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index 9ab2f017e845..b6a2f0287a83 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -104,7 +104,8 @@ struct codetag_with_ctx *ct_to_ctc(struct codetag *ct)
}
void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
-struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
+void codetag_init_iter(struct codetag_iterator *iter,
+ struct codetag_type *cttype);
struct codetag *codetag_next_ct(struct codetag_iterator *iter);
struct codetag_ctx *codetag_next_ctx(struct codetag_iterator *iter);
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 0cbba13869b5..e4661bbd40c6 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -6,6 +6,7 @@
#define _LINUX_PGALLOC_TAG_H
#include <linux/alloc_tag.h>
+#include <linux/codetag_ctx.h>
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -70,7 +71,8 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
if (!ref->ct)
goto out;
- tag = ct_to_alloc_tag(ref->ct);
+ tag = is_codetag_ctx_ref(ref) ? ctc_to_alloc_tag(ref->ctx->ctc)
+ : ct_to_alloc_tag(ref->ct);
page_ext = page_ext_next(page_ext);
for (i = 1; i < nr; i++) {
/* New reference with 0 bytes accounted */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4157c2251b07..1b83ef17d232 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -969,6 +969,7 @@ config MEM_ALLOC_PROFILING
select LAZY_PERCPU_COUNTER
select PAGE_EXTENSION
select SLAB_OBJ_EXT
+ select STACKDEPOT
help
Track allocation source code and record total allocation size
initiated at that code location. The mechanism can be used to track
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 4a0b95a46b2e..675c7a08e38b 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -1,13 +1,18 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/alloc_tag.h>
+#include <linux/codetag_ctx.h>
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/module.h>
#include <linux/page_ext.h>
+#include <linux/sched/clock.h>
#include <linux/seq_buf.h>
+#include <linux/stackdepot.h>
#include <linux/uaccess.h>
+#define STACK_BUF_SIZE 1024
+
DEFINE_STATIC_KEY_TRUE(mem_alloc_profiling_key);
/*
@@ -23,6 +28,16 @@ static int __init mem_alloc_profiling_disable(char *s)
}
__setup("nomem_profiling", mem_alloc_profiling_disable);
+struct alloc_call_ctx {
+ struct codetag_ctx ctx;
+ size_t size;
+ pid_t pid;
+ pid_t tgid;
+ char comm[TASK_COMM_LEN];
+ u64 ts_nsec;
+ depot_stack_handle_t stack_handle;
+} __aligned(8);
+
struct alloc_tag_file_iterator {
struct codetag_iterator ct_iter;
struct seq_buf buf;
@@ -64,7 +79,7 @@ static int allocations_file_open(struct inode *inode, struct file *file)
return -ENOMEM;
codetag_lock_module_list(cttype, true);
- iter->ct_iter = codetag_get_ct_iter(cttype);
+ codetag_init_iter(&iter->ct_iter, cttype);
codetag_lock_module_list(cttype, false);
seq_buf_init(&iter->buf, iter->rawbuf, sizeof(iter->rawbuf));
file->private_data = iter;
@@ -125,24 +140,240 @@ static const struct file_operations allocations_file_ops = {
.read = allocations_file_read,
};
+static void alloc_tag_ops_free_ctx(struct kref *refcount)
+{
+ kfree(container_of(kref_to_ctx(refcount), struct alloc_call_ctx, ctx));
+}
+
+struct codetag_ctx *alloc_tag_create_ctx(struct alloc_tag *tag, size_t size)
+{
+ struct alloc_call_ctx *ac_ctx;
+
+ /* TODO: use a dedicated kmem_cache */
+ ac_ctx = kmalloc(sizeof(struct alloc_call_ctx), GFP_KERNEL);
+ if (WARN_ON(!ac_ctx))
+ return NULL;
+
+ ac_ctx->size = size;
+ ac_ctx->pid = current->pid;
+ ac_ctx->tgid = current->tgid;
+ strscpy(ac_ctx->comm, current->comm, sizeof(ac_ctx->comm));
+ ac_ctx->ts_nsec = local_clock();
+ ac_ctx->stack_handle =
+ stack_depot_capture_stack(GFP_NOWAIT | __GFP_NOWARN);
+ add_ctx(&ac_ctx->ctx, &tag->ctc);
+
+ return &ac_ctx->ctx;
+}
+EXPORT_SYMBOL_GPL(alloc_tag_create_ctx);
+
+void alloc_tag_free_ctx(struct codetag_ctx *ctx, struct alloc_tag **ptag)
+{
+ *ptag = ctc_to_alloc_tag(ctx->ctc);
+ rem_ctx(ctx, alloc_tag_ops_free_ctx);
+}
+EXPORT_SYMBOL_GPL(alloc_tag_free_ctx);
+
+bool alloc_tag_enable_ctx(struct alloc_tag *tag, bool enable)
+{
+ static bool stack_depot_ready;
+
+ if (enable && !stack_depot_ready) {
+ stack_depot_init();
+ stack_depot_capture_init();
+ stack_depot_ready = true;
+ }
+
+ return codetag_enable_ctx(&tag->ctc, enable);
+}
+
+static void alloc_tag_ctx_to_text(struct seq_buf *out, struct codetag_ctx *ctx)
+{
+ struct alloc_call_ctx *ac_ctx;
+ char *buf;
+
+ ac_ctx = container_of(ctx, struct alloc_call_ctx, ctx);
+ seq_buf_printf(out, " size: %zu\n", ac_ctx->size);
+ seq_buf_printf(out, " pid: %d\n", ac_ctx->pid);
+ seq_buf_printf(out, " tgid: %d\n", ac_ctx->tgid);
+ seq_buf_printf(out, " comm: %s\n", ac_ctx->comm);
+ seq_buf_printf(out, " ts: %llu\n", ac_ctx->ts_nsec);
+
+ buf = kmalloc(STACK_BUF_SIZE, GFP_KERNEL);
+ if (buf) {
+ int bytes_read = stack_depot_snprint(ac_ctx->stack_handle, buf,
+ STACK_BUF_SIZE - 1, 8);
+ buf[bytes_read] = '\0';
+ seq_buf_printf(out, " call stack:\n%s\n", buf);
+ }
+ kfree(buf);
+}
+
+static ssize_t allocations_ctx_file_read(struct file *file, char __user *ubuf,
+ size_t size, loff_t *ppos)
+{
+ struct alloc_tag_file_iterator *iter = file->private_data;
+ struct codetag_iterator *ct_iter = &iter->ct_iter;
+ struct user_buf buf = { .buf = ubuf, .size = size };
+ struct codetag_ctx *ctx;
+ struct codetag *prev_ct;
+ int err = 0;
+
+ codetag_lock_module_list(ct_iter->cttype, true);
+ while (1) {
+ err = flush_ubuf(&buf, &iter->buf);
+ if (err || !buf.size)
+ break;
+
+ prev_ct = ct_iter->ct;
+ ctx = codetag_next_ctx(ct_iter);
+ if (!ctx)
+ break;
+
+ if (prev_ct != &ctx->ctc->ct)
+ alloc_tag_to_text(&iter->buf, &ctx->ctc->ct);
+ alloc_tag_ctx_to_text(&iter->buf, ctx);
+ }
+ codetag_lock_module_list(ct_iter->cttype, false);
+
+ return err ? : buf.ret;
+}
+
+#define CTX_CAPTURE_TOKENS() \
+ x(disable, 0) \
+ x(enable, 0)
+
+static const char * const ctx_capture_token_strs[] = {
+#define x(name, nr_args) #name,
+ CTX_CAPTURE_TOKENS()
+#undef x
+ NULL
+};
+
+enum ctx_capture_token {
+#define x(name, nr_args) TOK_##name,
+ CTX_CAPTURE_TOKENS()
+#undef x
+};
+
+static int enable_ctx_capture(struct codetag_type *cttype,
+ struct codetag_query *query, bool enable)
+{
+ struct codetag_iterator ct_iter;
+ struct codetag_with_ctx *ctc;
+ struct codetag *ct;
+ unsigned int nfound = 0;
+
+ codetag_lock_module_list(cttype, true);
+
+ codetag_init_iter(&ct_iter, cttype);
+ while ((ct = codetag_next_ct(&ct_iter))) {
+ if (!codetag_matches_query(query, ct, ct_iter.cmod, NULL))
+ continue;
+
+ ctc = ct_to_ctc(ct);
+ if (codetag_ctx_enabled(ctc) == enable)
+ continue;
+
+ if (!alloc_tag_enable_ctx(ctc_to_alloc_tag(ctc), enable)) {
+ pr_warn("Failed to toggle context capture\n");
+ continue;
+ }
+
+ nfound++;
+ }
+
+ codetag_lock_module_list(cttype, false);
+
+ return nfound ? 0 : -ENOENT;
+}
+
+static int parse_command(struct codetag_type *cttype, char *buf)
+{
+ struct codetag_query query = { NULL };
+ char *cmd;
+ int ret;
+ int tok;
+
+ buf = codetag_query_parse(&query, buf);
+ if (IS_ERR(buf))
+ return PTR_ERR(buf);
+
+ cmd = strsep_no_empty(&buf, " \t\r\n");
+ if (!cmd)
+ return -EINVAL; /* no command */
+
+ tok = match_string(ctx_capture_token_strs,
+ ARRAY_SIZE(ctx_capture_token_strs), cmd);
+ if (tok < 0)
+ return -EINVAL; /* unknown command */
+
+ ret = enable_ctx_capture(cttype, &query, tok == TOK_enable);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static ssize_t allocations_ctx_file_write(struct file *file, const char __user *ubuf,
+ size_t len, loff_t *offp)
+{
+ struct alloc_tag_file_iterator *iter = file->private_data;
+ char tmpbuf[256];
+
+ if (len == 0)
+ return 0;
+ /* we don't check *offp -- multiple writes() are allowed */
+ if (len > sizeof(tmpbuf) - 1)
+ return -E2BIG;
+
+ if (copy_from_user(tmpbuf, ubuf, len))
+ return -EFAULT;
+
+ tmpbuf[len] = '\0';
+ parse_command(iter->ct_iter.cttype, tmpbuf);
+
+ *offp += len;
+ return len;
+}
+
+static const struct file_operations allocations_ctx_file_ops = {
+ .owner = THIS_MODULE,
+ .open = allocations_file_open,
+ .release = allocations_file_release,
+ .read = allocations_ctx_file_read,
+ .write = allocations_ctx_file_write,
+};
+
static int __init dbgfs_init(struct codetag_type *cttype)
{
struct dentry *file;
+ struct dentry *ctx_file;
file = debugfs_create_file("allocations", 0444, NULL, cttype,
&allocations_file_ops);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ ctx_file = debugfs_create_file("allocations.ctx", 0666, NULL, cttype,
+ &allocations_ctx_file_ops);
+ if (IS_ERR(ctx_file)) {
+ debugfs_remove(file);
+ return PTR_ERR(ctx_file);
+ }
- return IS_ERR(file) ? PTR_ERR(file) : 0;
+ return 0;
}
static bool alloc_tag_module_unload(struct codetag_type *cttype, struct codetag_module *cmod)
{
- struct codetag_iterator iter = codetag_get_ct_iter(cttype);
+ struct codetag_iterator iter;
bool module_unused = true;
struct alloc_tag *tag;
struct codetag *ct;
size_t bytes;
+ codetag_init_iter(&iter, cttype);
for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
if (iter.cmod != cmod)
continue;
@@ -183,6 +414,7 @@ static int __init alloc_tag_init(void)
.section = "alloc_tags",
.tag_size = sizeof(struct alloc_tag),
.module_unload = alloc_tag_module_unload,
+ .free_ctx = alloc_tag_ops_free_ctx,
};
cttype = codetag_register_type(&desc);
diff --git a/lib/codetag.c b/lib/codetag.c
index d891bbe4481d..cbff146b3fe8 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -27,16 +27,14 @@ void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
up_read(&cttype->mod_lock);
}
-struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
+void codetag_init_iter(struct codetag_iterator *iter,
+ struct codetag_type *cttype)
{
- struct codetag_iterator iter = {
- .cttype = cttype,
- .cmod = NULL,
- .mod_id = 0,
- .ct = NULL,
- };
-
- return iter;
+ iter->cttype = cttype;
+ iter->cmod = NULL;
+ iter->mod_id = 0;
+ iter->ct = NULL;
+ iter->ctx = NULL;
}
static inline struct codetag *get_first_module_ct(struct codetag_module *cmod)
@@ -128,6 +126,10 @@ struct codetag_ctx *codetag_next_ctx(struct codetag_iterator *iter)
lockdep_assert_held(&iter->cttype->mod_lock);
+ /* Move to the first codetag if search just started */
+ if (!iter->ct)
+ codetag_next_ct(iter);
+
if (!ctx)
return next_ctx_from_ct(iter);
--
2.40.1.495.gc816e09b53d-goog
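
The enable/disable keyword handling in parse_command() relies on an X-macro so that the string table and the enum stay in sync. A minimal userspace sketch of that technique follows. It is an illustration only: match_token() is a hypothetical stand-in for the kernel's match_string(), parse_action() is a hypothetical helper that simply takes the last word of the command line (the patch instead consumes the query with codetag_query_parse() first), and the unused nr_args macro argument is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Single list of tokens; every table below is generated from it. */
#define CTX_CAPTURE_TOKENS()	\
	x(disable)		\
	x(enable)

static const char * const token_strs[] = {
#define x(name) #name,
	CTX_CAPTURE_TOKENS()
#undef x
};

enum ctx_capture_token {
#define x(name) TOK_##name,
	CTX_CAPTURE_TOKENS()
#undef x
	TOK_NR
};

/* Hypothetical stand-in for match_string(): index of exact match or -1. */
static int match_token(const char *cmd)
{
	size_t i;

	for (i = 0; i < sizeof(token_strs) / sizeof(token_strs[0]); i++)
		if (strcmp(token_strs[i], cmd) == 0)
			return (int)i;
	return -1;
}

/* Classify the last whitespace-delimited word of a command line. */
static int parse_action(const char *line)
{
	static const char ws[] = " \t\r\n";
	char word[16];
	size_t end = strlen(line), start;

	/* Trim trailing whitespace, then walk back to the word start. */
	while (end > 0 && strchr(ws, line[end - 1]))
		end--;
	start = end;
	while (start > 0 && !strchr(ws, line[start - 1]))
		start--;

	if (end == start || end - start >= sizeof(word))
		return -1;

	memcpy(word, line + start, end - start);
	word[end - start] = '\0';
	return match_token(word);
}
```

Because both the string array and the enum expand from the same list, the index returned by the lookup can be compared directly against TOK_enable, which is what the "tok == TOK_enable" test in the patch depends on.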
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH 35/40] lib: implement context capture support for tagged allocations
2023-05-01 16:54 ` [PATCH 35/40] lib: implement context capture support for tagged allocations Suren Baghdasaryan
@ 2023-05-03 7:39 ` Michal Hocko
[not found] ` <ZFIPmnrSIdJ5yusM-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
0 siblings, 1 reply; 160+ messages in thread
From: Michal Hocko @ 2023-05-03 7:39 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Mon 01-05-23 09:54:45, Suren Baghdasaryan wrote:
[...]
> +struct codetag_ctx *alloc_tag_create_ctx(struct alloc_tag *tag, size_t size)
> +{
> + struct alloc_call_ctx *ac_ctx;
> +
> + /* TODO: use a dedicated kmem_cache */
> + ac_ctx = kmalloc(sizeof(struct alloc_call_ctx), GFP_KERNEL);
You cannot really use GFP_KERNEL here. This is the post_alloc_hook path,
which has its own gfp context.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH 36/40] lib: add memory allocations report in show_mem()
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (24 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 35/40] lib: implement context capture support for tagged allocations Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 37/40] codetag: debug: skip objext checking when it's for objext itself Suren Baghdasaryan
` (4 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
Include allocations in show_mem reports.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 2 ++
lib/alloc_tag.c | 48 +++++++++++++++++++++++++++++++++++----
lib/show_mem.c | 15 ++++++++++++
3 files changed, 60 insertions(+), 5 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 2a3d248aae10..190ab793f7e5 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -23,6 +23,8 @@ struct alloc_tag {
#ifdef CONFIG_MEM_ALLOC_PROFILING
+void alloc_tags_show_mem_report(struct seq_buf *s);
+
static inline struct alloc_tag *ctc_to_alloc_tag(struct codetag_with_ctx *ctc)
{
return container_of(ctc, struct alloc_tag, ctc);
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 675c7a08e38b..e2ebab8999a9 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -13,6 +13,8 @@
#define STACK_BUF_SIZE 1024
+static struct codetag_type *alloc_tag_cttype;
+
DEFINE_STATIC_KEY_TRUE(mem_alloc_profiling_key);
/*
@@ -133,6 +135,43 @@ static ssize_t allocations_file_read(struct file *file, char __user *ubuf,
return err ? : buf.ret;
}
+void alloc_tags_show_mem_report(struct seq_buf *s)
+{
+ struct codetag_iterator iter;
+ struct codetag *ct;
+ struct {
+ struct codetag *tag;
+ size_t bytes;
+ } tags[10], n;
+ unsigned int i, nr = 0;
+
+ codetag_init_iter(&iter, alloc_tag_cttype);
+
+ codetag_lock_module_list(alloc_tag_cttype, true);
+ while ((ct = codetag_next_ct(&iter))) {
+ n.tag = ct;
+ n.bytes = lazy_percpu_counter_read(&ct_to_alloc_tag(ct)->bytes_allocated);
+
+ for (i = 0; i < nr; i++)
+ if (n.bytes > tags[i].bytes)
+ break;
+
+ if (i < ARRAY_SIZE(tags)) {
+ nr -= nr == ARRAY_SIZE(tags);
+ memmove(&tags[i + 1],
+ &tags[i],
+ sizeof(tags[0]) * (nr - i));
+ nr++;
+ tags[i] = n;
+ }
+ }
+
+ for (i = 0; i < nr; i++)
+ alloc_tag_to_text(s, tags[i].tag);
+
+ codetag_lock_module_list(alloc_tag_cttype, false);
+}
+
static const struct file_operations allocations_file_ops = {
.owner = THIS_MODULE,
.open = allocations_file_open,
@@ -409,7 +448,6 @@ EXPORT_SYMBOL(page_alloc_tagging_ops);
static int __init alloc_tag_init(void)
{
- struct codetag_type *cttype;
const struct codetag_type_desc desc = {
.section = "alloc_tags",
.tag_size = sizeof(struct alloc_tag),
@@ -417,10 +455,10 @@ static int __init alloc_tag_init(void)
.free_ctx = alloc_tag_ops_free_ctx,
};
- cttype = codetag_register_type(&desc);
- if (IS_ERR_OR_NULL(cttype))
- return PTR_ERR(cttype);
+ alloc_tag_cttype = codetag_register_type(&desc);
+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
+ return PTR_ERR(alloc_tag_cttype);
- return dbgfs_init(cttype);
+ return dbgfs_init(alloc_tag_cttype);
}
module_init(alloc_tag_init);
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 1485c87be935..5c82f29168e3 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -7,6 +7,7 @@
#include <linux/mm.h>
#include <linux/cma.h>
+#include <linux/seq_buf.h>
void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
{
@@ -34,4 +35,18 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
#ifdef CONFIG_MEMORY_FAILURE
printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ {
+ struct seq_buf s;
+ char *buf = kmalloc(4096, GFP_ATOMIC);
+
+ if (buf) {
+ printk("Memory allocations:\n");
+ seq_buf_init(&s, buf, 4096);
+ alloc_tags_show_mem_report(&s);
+ printk("%s", buf);
+ kfree(buf);
+ }
+ }
+#endif
}
--
2.40.1.495.gc816e09b53d-goog
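
The selection loop in alloc_tags_show_mem_report() keeps a small array sorted by size and evicts the smallest entry once it is full. The same bounded insertion can be exercised in userspace with the sketch below. It is a model, not the kernel code: struct top_entry, struct top_list, TOP_N and top_insert() are hypothetical names, an int id stands in for the codetag pointer, and TOP_N is 3 rather than 10 to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOP_N 3	/* the patch uses an array of 10 */

struct top_entry {
	int id;		/* stands in for struct codetag * */
	size_t bytes;
};

struct top_list {
	struct top_entry e[TOP_N];
	size_t nr;	/* number of valid entries, sorted descending */
};

static void top_insert(struct top_list *t, int id, size_t bytes)
{
	size_t i;

	/* Find the first entry that the new one is strictly larger than. */
	for (i = 0; i < t->nr; i++)
		if (bytes > t->e[i].bytes)
			break;

	/* Full list and nothing smaller in it: the new entry is dropped. */
	if (i >= TOP_N)
		return;

	/* When full, forget the last (smallest) entry before shifting. */
	if (t->nr == TOP_N)
		t->nr--;

	memmove(&t->e[i + 1], &t->e[i], sizeof(t->e[0]) * (t->nr - i));
	t->e[i] = (struct top_entry){ .id = id, .bytes = bytes };
	t->nr++;
}
```

Each insertion costs at most TOP_N comparisons and one memmove, and no memory is allocated, which suits a report that may be printed while the system is already short of memory.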
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH 37/40] codetag: debug: skip objext checking when it's for objext itself
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (25 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 36/40] lib: add memory allocations report in show_mem() Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 38/40] codetag: debug: mark codetags for reserved pages as empty Suren Baghdasaryan
` (3 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
objext objects are created with __GFP_NO_OBJ_EXT flag and therefore have
no corresponding objext themselves (otherwise we would get an infinite
recursion). When freeing these objects their codetag will be empty and
when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled this will lead to false
warnings. Introduce CODETAG_EMPTY special codetag value to mark
allocations which intentionally lack codetag to avoid these warnings.
Set objext codetags to CODETAG_EMPTY before freeing to indicate that
the codetag is expected to be empty.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 28 ++++++++++++++++++++++++++++
mm/slab.h | 33 +++++++++++++++++++++++++++++++++
mm/slab_common.c | 1 +
3 files changed, 62 insertions(+)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 190ab793f7e5..2c3f4f3a8c93 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -51,6 +51,28 @@ static inline bool mem_alloc_profiling_enabled(void)
return static_branch_likely(&mem_alloc_profiling_key);
}
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+#define CODETAG_EMPTY (void *)1
+
+static inline bool is_codetag_empty(union codetag_ref *ref)
+{
+ return ref->ct == CODETAG_EMPTY;
+}
+
+static inline void set_codetag_empty(union codetag_ref *ref)
+{
+ if (ref)
+ ref->ct = CODETAG_EMPTY;
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+static inline void set_codetag_empty(union codetag_ref *ref) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes,
bool may_allocate)
{
@@ -65,6 +87,11 @@ static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes,
if (!ref || !ref->ct)
return;
+ if (is_codetag_empty(ref)) {
+ ref->ct = NULL;
+ return;
+ }
+
if (is_codetag_ctx_ref(ref))
alloc_tag_free_ctx(ref->ctx, &tag);
else
@@ -112,6 +139,7 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
#else
#define DEFINE_ALLOC_TAG(_alloc_tag, _old)
+static inline void set_codetag_empty(union codetag_ref *ref) {}
static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
diff --git a/mm/slab.h b/mm/slab.h
index f9442d3a10b2..50d86008a86a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -416,6 +416,31 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
gfp_t gfp, bool new_slab);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
+{
+ struct slabobj_ext *slab_exts;
+ struct slab *obj_exts_slab;
+
+ obj_exts_slab = virt_to_slab(obj_exts);
+ slab_exts = slab_obj_exts(obj_exts_slab);
+ if (slab_exts) {
+ unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
+ obj_exts_slab, obj_exts);
+ /* codetag should be NULL */
+ WARN_ON(slab_exts[offs].ref.ct);
+ set_codetag_empty(&slab_exts[offs].ref);
+ }
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
static inline bool need_slab_obj_ext(void)
{
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -437,6 +462,14 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;
+ /*
+ * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
+ * corresponding extension will be NULL. alloc_tag_sub() will throw a
+ * warning if slab has extensions but the extension of an object is
+ * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
+ * the extension for obj_exts is expected to be NULL.
+ */
+ mark_objexts_empty(obj_exts);
kfree(obj_exts);
slab->obj_exts = 0;
}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a05333bbb7f1..89265f825c43 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -244,6 +244,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
* assign slabobj_exts in parallel. In this case the existing
* objcg vector should be reused.
*/
+ mark_objexts_empty(vec);
kfree(vec);
return 0;
}
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 38/40] codetag: debug: mark codetags for reserved pages as empty
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (26 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 37/40] codetag: debug: skip objext checking when it's for objext itself Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 39/40] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations Suren Baghdasaryan
` (2 subsequent siblings)
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
To avoid debug warnings while freeing reserved pages which were not
allocated with usual allocators, mark their codetags as empty before
freeing.
Maybe we can annotate reserved pages correctly and avoid this?
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/mm.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..f5969cb85879 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5,6 +5,7 @@
#include <linux/errno.h>
#include <linux/mmdebug.h>
#include <linux/gfp.h>
+#include <linux/pgalloc_tag.h>
#include <linux/bug.h>
#include <linux/list.h>
#include <linux/mmzone.h>
@@ -2920,6 +2921,13 @@ extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
/* Free the reserved page into the buddy system, so it gets managed. */
static inline void free_reserved_page(struct page *page)
{
+ union codetag_ref *ref;
+
+ ref = get_page_tag_ref(page);
+ if (ref) {
+ set_codetag_empty(ref);
+ put_page_tag_ref(ref);
+ }
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
--
2.40.1.495.gc816e09b53d-goog
* [PATCH 39/40] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (27 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 38/40] codetag: debug: mark codetags for reserved pages as empty Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 40/40] MAINTAINERS: Add entries for code tagging and memory allocation profiling Suren Baghdasaryan
2023-05-03 7:25 ` [PATCH 00/40] Memory " Michal Hocko
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
If slabobj_ext vector allocation for a slab object fails and later on it
succeeds for another object in the same slab, the slabobj_ext for the
original object will be NULL and will be flagged when
CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
Mark failed slabobj_ext vector allocations using a new objext_flags flag
stored in the lower bits of slab->obj_exts. When a new allocation succeeds,
it marks all tag references in the same slabobj_ext vector as empty to
avoid warnings implemented by CONFIG_MEM_ALLOC_PROFILING_DEBUG checks.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/memcontrol.h | 4 +++-
mm/slab_common.c | 27 +++++++++++++++++++++++++--
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c7f21b15b540..3eb8975c1462 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -356,8 +356,10 @@ enum page_memcg_data_flags {
#endif /* CONFIG_MEMCG */
enum objext_flags {
+ /* slabobj_ext vector failed to allocate */
+ OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
/* the next bit after the last actual flag */
- __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG,
+ __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
};
#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 89265f825c43..5b7e096b70a5 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -217,21 +217,44 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
{
unsigned int objects = objs_per_slab(s, slab);
unsigned long obj_exts;
- void *vec;
+ struct slabobj_ext *vec;
gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
gfp |= __GFP_NO_OBJ_EXT;
vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
slab_nid(slab));
- if (!vec)
+ if (!vec) {
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ if (new_slab) {
+ /* Mark vectors which failed to allocate */
+ slab->obj_exts = OBJEXTS_ALLOC_FAIL;
+#ifdef CONFIG_MEMCG
+ slab->obj_exts |= MEMCG_DATA_OBJEXTS;
+#endif
+ }
+#endif
return -ENOMEM;
+ }
obj_exts = (unsigned long)vec;
#ifdef CONFIG_MEMCG
obj_exts |= MEMCG_DATA_OBJEXTS;
#endif
if (new_slab) {
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ /*
+ * If vector previously failed to allocate then we have live
+ * objects with no tag reference. Mark all references in this
+ * vector as empty to avoid warnings later on.
+ */
+ if (slab->obj_exts & OBJEXTS_ALLOC_FAIL) {
+ unsigned int i;
+
+ for (i = 0; i < objects; i++)
+ set_codetag_empty(&vec[i].ref);
+ }
+#endif
/*
* If the slab is brand new and nobody can yet access its
* obj_exts, no synchronization is required and obj_exts can
--
2.40.1.495.gc816e09b53d-goog
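The changelog of this patch describes keeping a failure flag in the lower bits of slab->obj_exts. The following is a minimal userspace sketch of that idea, not the kernel code: `struct fake_slab`, `FLAG_ALLOC_FAIL`, `FLAGS_MASK` and the helper names are all hypothetical, and the sketch assumes the vector pointer is at least 8-byte aligned so that its low three bits are free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's objext_flags; values are illustrative. */
#define FLAG_ALLOC_FAIL ((uintptr_t)1 << 0)
#define FLAGS_MASK      ((uintptr_t)0x7) /* low 3 bits of an 8-byte-aligned pointer */

struct fake_slab {
	uintptr_t obj_exts; /* vector pointer in the high bits, flags in the low bits */
};

/* Record that the vector could not be allocated: no pointer, only the flag. */
static void mark_alloc_failed(struct fake_slab *slab)
{
	slab->obj_exts = FLAG_ALLOC_FAIL;
}

/* Report whether an earlier allocation attempt failed for this slab. */
static int alloc_previously_failed(const struct fake_slab *slab)
{
	return (slab->obj_exts & FLAG_ALLOC_FAIL) != 0;
}

/* Install a vector and clear the flags; vec must be at least 8-byte aligned. */
static void install_vector(struct fake_slab *slab, void *vec)
{
	assert(((uintptr_t)vec & FLAGS_MASK) == 0);
	slab->obj_exts = (uintptr_t)vec;
}

/* Strip the flag bits to recover the vector pointer (NULL if none). */
static void *get_vector(const struct fake_slab *slab)
{
	return (void *)(slab->obj_exts & ~FLAGS_MASK);
}
```

In the patch itself, a successful allocation that follows a recorded failure additionally marks every reference in the new vector as empty; the sketch leaves that step out.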
* [PATCH 40/40] MAINTAINERS: Add entries for code tagging and memory allocation profiling
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (28 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 39/40] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations Suren Baghdasaryan
@ 2023-05-01 16:54 ` Suren Baghdasaryan
2023-05-03 7:25 ` [PATCH 00/40] Memory " Michal Hocko
30 siblings, 0 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-01 16:54 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, mhocko, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
From: Kent Overstreet <kent.overstreet@linux.dev>
The new code & libraries added are being maintained - mark them as such.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
MAINTAINERS | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 3889d1adf71f..6f3b79266204 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5116,6 +5116,13 @@ S: Supported
F: Documentation/process/code-of-conduct-interpretation.rst
F: Documentation/process/code-of-conduct.rst
+CODE TAGGING
+M: Suren Baghdasaryan <surenb@google.com>
+M: Kent Overstreet <kent.overstreet@linux.dev>
+S: Maintained
+F: include/linux/codetag.h
+F: lib/codetag.c
+
COMEDI DRIVERS
M: Ian Abbott <abbotti@mev.co.uk>
M: H Hartley Sweeten <hsweeten@visionengravers.com>
@@ -11658,6 +11665,12 @@ S: Maintained
F: Documentation/devicetree/bindings/leds/backlight/kinetic,ktz8866.yaml
F: drivers/video/backlight/ktz8866.c
+LAZY PERCPU COUNTERS
+M: Kent Overstreet <kent.overstreet@linux.dev>
+S: Maintained
+F: include/linux/lazy-percpu-counter.h
+F: lib/lazy-percpu-counter.c
+
L3MDEV
M: David Ahern <dsahern@kernel.org>
L: netdev@vger.kernel.org
@@ -13468,6 +13481,15 @@ F: mm/memblock.c
F: mm/mm_init.c
F: tools/testing/memblock/
+MEMORY ALLOCATION PROFILING
+M: Suren Baghdasaryan <surenb@google.com>
+M: Kent Overstreet <kent.overstreet@linux.dev>
+S: Maintained
+F: include/linux/alloc_tag.h
+F: include/linux/codetag_ctx.h
+F: lib/alloc_tag.c
+F: lib/pgalloc_tag.c
+
MEMORY CONTROLLER DRIVERS
M: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
L: linux-kernel@vger.kernel.org
--
2.40.1.495.gc816e09b53d-goog
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
` (29 preceding siblings ...)
2023-05-01 16:54 ` [PATCH 40/40] MAINTAINERS: Add entries for code tagging and memory allocation profiling Suren Baghdasaryan
@ 2023-05-03 7:25 ` Michal Hocko
2023-05-03 7:34 ` Kent Overstreet
2023-05-03 15:09 ` Suren Baghdasaryan
30 siblings, 2 replies; 160+ messages in thread
From: Michal Hocko @ 2023-05-03 7:25 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
> Memory allocation profiling infrastructure provides a low overhead
> mechanism to make all kernel allocations in the system visible. It can be
> used to monitor memory usage, track memory hotspots, detect memory leaks,
> identify memory regressions.
>
> To keep the overhead to the minimum, we record only allocation sizes for
> every allocation in the codebase. With that information, if users are
> interested in more detailed context for a specific allocation, they can
> enable in-depth context tracking, which includes capturing the pid, tgid,
> task name, allocation size, timestamp and call stack for every allocation
> at the specified code location.
[...]
> Implementation utilizes a more generic concept of code tagging, introduced
> as part of this patchset. Code tag is a structure identifying a specific
> location in the source code which is generated at compile time and can be
> embedded in an application-specific structure. A number of applications
> for code tagging have been presented in the original RFC [1].
> Code tagging uses the old trick of "define a special elf section for
> objects of a given type so that we can iterate over them at runtime" and
> creates a proper library for it.
>
> To profile memory allocations, we instrument page, slab and percpu
> allocators to record total memory allocated in the associated code tag at
> every allocation in the codebase. Every time an allocation is performed by
> an instrumented allocator, the code tag at that location increments its
> counter by allocation size. Every time the memory is freed the counter is
> decremented. To decrement the counter upon freeing, allocated object needs
> a reference to its code tag. Page allocators use page_ext to record this
> reference while slab allocators use memcg_data (renamed into more generic
> slabobj_ext) of the slab page.
[...]
> [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
[...]
> 70 files changed, 2765 insertions(+), 554 deletions(-)
Sorry for cutting the cover considerably but I believe I have quoted the
most important/interesting parts here. The approach is not fundamentally
different from the previous version [1] and there was a significant
discussion around this approach. The cover letter neither summarizes nor
deals with the concerns expressed previously, AFAICS. So let me bring those
back up. At least those I find the most important:
- This is a big change and it adds a significant maintenance burden
because each allocation entry point needs to be handled specifically.
The cost will grow with the intended coverage, especially where the
allocation is hidden in library code.
- It has been brought up that this is duplicating functionality already
available via existing tracing infrastructure. You should make it very
clear why that is not suitable for the job.
- We already have page_owner infrastructure that provides allocation
tracking data. Why can't it be used/extended?
Thanks!
--
Michal Hocko
SUSE Labs
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-03 7:25 ` [PATCH 00/40] Memory " Michal Hocko
@ 2023-05-03 7:34 ` Kent Overstreet
[not found] ` <ZFIOfb6/jHwLqg6M-jC9Py7bek1znysI04z7BkA@public.gmane.org>
2023-05-03 15:09 ` Suren Baghdasaryan
1 sibling, 1 reply; 160+ messages in thread
From: Kent Overstreet @ 2023-05-03 7:34 UTC (permalink / raw)
To: Michal Hocko
Cc: Suren Baghdasaryan, akpm, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl
On Wed, May 03, 2023 at 09:25:29AM +0200, Michal Hocko wrote:
> On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
> > Memory allocation profiling infrastructure provides a low overhead
> > mechanism to make all kernel allocations in the system visible. It can be
> > used to monitor memory usage, track memory hotspots, detect memory leaks,
> > identify memory regressions.
> >
> > To keep the overhead to the minimum, we record only allocation sizes for
> > every allocation in the codebase. With that information, if users are
> > interested in more detailed context for a specific allocation, they can
> > enable in-depth context tracking, which includes capturing the pid, tgid,
> > task name, allocation size, timestamp and call stack for every allocation
> > at the specified code location.
> [...]
> > Implementation utilizes a more generic concept of code tagging, introduced
> > as part of this patchset. Code tag is a structure identifying a specific
> > location in the source code which is generated at compile time and can be
> > embedded in an application-specific structure. A number of applications
> > for code tagging have been presented in the original RFC [1].
> > Code tagging uses the old trick of "define a special elf section for
> > objects of a given type so that we can iterate over them at runtime" and
> > creates a proper library for it.
> >
> > To profile memory allocations, we instrument page, slab and percpu
> > allocators to record total memory allocated in the associated code tag at
> > every allocation in the codebase. Every time an allocation is performed by
> > an instrumented allocator, the code tag at that location increments its
> > counter by allocation size. Every time the memory is freed the counter is
> > decremented. To decrement the counter upon freeing, allocated object needs
> > a reference to its code tag. Page allocators use page_ext to record this
> > reference while slab allocators use memcg_data (renamed into more generic
> > slabobj_ext) of the slab page.
> [...]
> > [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
> [...]
> > 70 files changed, 2765 insertions(+), 554 deletions(-)
>
> Sorry for cutting the cover considerably but I believe I have quoted the
> most important/interesting parts here. The approach is not fundamentally
> different from the previous version [1] and there was a significant
> discussion around this approach. The cover letter doesn't summarize nor
> deal with concerns expressed previous AFAICS. So let me bring those up
> back. At least those I find the most important:
We covered this previously, I'll just be giving the same answers I did
before:
> - This is a big change and it adds a significant maintenance burden
> because each allocation entry point needs to be handled specifically.
> The cost will grow with the intended coverage especially there when
> allocation is hidden in a library code.
We've made this as clean and simple as possible: a single new macro
invocation per allocation function, no calling convention changes (that
would indeed have been a lot of churn!).
> - It has been brought up that this is duplicating functionality already
> available via existing tracing infrastructure. You should make it very
> clear why that is not suitable for the job
Tracing people _claimed_ this, but never demonstrated it. Tracepoints
exist but the tooling that would consume them to provide this kind of
information does not exist; it would require maintaining an index of
_every outstanding allocation_ so that frees could be accounted
correctly - IOW, it would be _drastically_ higher overhead, so not at
all comparable.
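To illustrate the kind of index described above, here is a minimal userspace sketch of a consumer that sees only "alloc" and "free" events. It is not how any existing tracing tool works, and it says nothing about actual cost; `on_alloc_event`, `on_free_event`, the integer call-site id and the table sizes are all hypothetical. The point is that a free event carries only an address, so the consumer has to remember every live allocation to know which site to credit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_SLOTS 1024            /* illustrative capacity, power of two */
#define MAX_SITES   16              /* illustrative number of call sites */
#define SLOT_EMPTY  ((uintptr_t)0)
#define SLOT_DEAD   ((uintptr_t)1)  /* tombstone left by a removed entry */

struct live_alloc {
	uintptr_t addr; /* key: address reported by the alloc event */
	int site;       /* hypothetical call-site id from the alloc event */
	size_t size;
};

static struct live_alloc index_tab[INDEX_SLOTS];
static size_t site_bytes[MAX_SITES]; /* bytes currently live per site */

static size_t slot_of(uintptr_t addr)
{
	return (size_t)(addr >> 4) & (INDEX_SLOTS - 1);
}

/* Alloc event: remember which site owns this address (linear probing). */
static int on_alloc_event(uintptr_t addr, int site, size_t size)
{
	size_t i, s = slot_of(addr);

	if (site < 0 || site >= MAX_SITES)
		return -1;
	for (i = 0; i < INDEX_SLOTS; i++) {
		struct live_alloc *e = &index_tab[(s + i) & (INDEX_SLOTS - 1)];

		if (e->addr == SLOT_EMPTY || e->addr == SLOT_DEAD) {
			e->addr = addr;
			e->site = site;
			e->size = size;
			site_bytes[site] += size;
			return 0;
		}
	}
	return -1; /* index full */
}

/* Free event: only the address is known, so the owner must be looked up. */
static int on_free_event(uintptr_t addr)
{
	size_t i, s = slot_of(addr);

	for (i = 0; i < INDEX_SLOTS; i++) {
		struct live_alloc *e = &index_tab[(s + i) & (INDEX_SLOTS - 1)];

		if (e->addr == SLOT_EMPTY)
			return -1; /* address was never recorded */
		if (e->addr == addr) {
			site_bytes[e->site] -= e->size;
			e->addr = SLOT_DEAD;
			return 0;
		}
	}
	return -1;
}
```

The table grows with the number of outstanding allocations rather than with the number of call sites, which is the difference being argued about here.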
> - We already have page_owner infrastructure that provides allocation
> tracking data. Why it cannot be used/extended?
Page owner also has very high overhead, its output is not very user
friendly (tracking the full call stack means many related allocations get
split, which is generally not what you want), and it doesn't cover slab.
This tracks _all_ memory allocations - slab, page, vmalloc, percpu.
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-03 7:25 ` [PATCH 00/40] Memory " Michal Hocko
2023-05-03 7:34 ` Kent Overstreet
@ 2023-05-03 15:09 ` Suren Baghdasaryan
2023-05-03 16:28 ` Steven Rostedt
2023-05-04 9:07 ` Michal Hocko
1 sibling, 2 replies; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-03 15:09 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Wed, May 3, 2023 at 12:25 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
> > Memory allocation profiling infrastructure provides a low overhead
> > mechanism to make all kernel allocations in the system visible. It can be
> > used to monitor memory usage, track memory hotspots, detect memory leaks,
> > identify memory regressions.
> >
> > To keep the overhead to the minimum, we record only allocation sizes for
> > every allocation in the codebase. With that information, if users are
> > interested in more detailed context for a specific allocation, they can
> > enable in-depth context tracking, which includes capturing the pid, tgid,
> > task name, allocation size, timestamp and call stack for every allocation
> > at the specified code location.
> [...]
> > Implementation utilizes a more generic concept of code tagging, introduced
> > as part of this patchset. Code tag is a structure identifying a specific
> > location in the source code which is generated at compile time and can be
> > embedded in an application-specific structure. A number of applications
> > for code tagging have been presented in the original RFC [1].
> > Code tagging uses the old trick of "define a special elf section for
> > objects of a given type so that we can iterate over them at runtime" and
> > creates a proper library for it.
> >
> > To profile memory allocations, we instrument page, slab and percpu
> > allocators to record total memory allocated in the associated code tag at
> > every allocation in the codebase. Every time an allocation is performed by
> > an instrumented allocator, the code tag at that location increments its
> > counter by allocation size. Every time the memory is freed the counter is
> > decremented. To decrement the counter upon freeing, allocated object needs
> > a reference to its code tag. Page allocators use page_ext to record this
> > reference while slab allocators use memcg_data (renamed into more generic
> > slabobj_ext) of the slab page.
> [...]
> > [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
> [...]
> > 70 files changed, 2765 insertions(+), 554 deletions(-)
>
> Sorry for cutting the cover considerably but I believe I have quoted the
> most important/interesting parts here. The approach is not fundamentally
> different from the previous version [1] and there was a significant
> discussion around this approach. The cover letter doesn't summarize nor
> deal with concerns expressed previous AFAICS. So let me bring those up
> back.
Thanks for summarizing!
> At least those I find the most important:
> - This is a big change and it adds a significant maintenance burden
> because each allocation entry point needs to be handled specifically.
> The cost will grow with the intended coverage especially there when
> allocation is hidden in a library code.
Do you mean that with more allocations in the codebase more codetags will
be generated? Is that the concern? Or maybe, as you commented in
another patch, that the context capturing feature does not limit how many
stacks will be captured?
> - It has been brought up that this is duplicating functionality already
> available via existing tracing infrastructure. You should make it very
> clear why that is not suitable for the job
I experimented with using tracing with _RET_IP_ to implement this
accounting. The major issue is the _RET_IP_ to codetag lookup runtime
overhead, which is orders of magnitude higher than with the proposed code
tagging approach. With the code tagging proposal, that link is resolved at
compile time. Since we want this mechanism deployed in production, we
want to keep the overhead to the absolute minimum.
You asked me before how much overhead would be tolerable and the
answer will always be "as small as possible". This is especially true
for slab allocators which are ridiculously fast and regressing them
would be very noticeable (due to the frequent use).
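The difference between a link resolved at compile time and one looked up from a return address can be shown in userspace. This is only a sketch under stated assumptions, not the series' API: `struct site_tag`, `DEFINE_SITE_TAG`, `tagged_malloc` and `tagged_free` are hypothetical names, the tag reference lives in a header in front of the object (the series stores it in page_ext or slabobj_ext instead), and the counter is a plain variable that is not safe for concurrent use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-call-site tag; one static instance exists per site. */
struct site_tag {
	const char *file;
	int line;
	size_t bytes; /* bytes currently allocated from this site */
};

/* The tag is an ordinary static object, so its address is known at build time. */
#define DEFINE_SITE_TAG(name) \
	static struct site_tag name = { __FILE__, __LINE__, 0 }

/* Header in front of each object; the union keeps the payload aligned. */
union alloc_hdr {
	struct {
		struct site_tag *tag;
		size_t size;
	} h;
	max_align_t align;
};

/* Allocate and charge the size to the caller's tag. No lookup is needed. */
static void *tagged_malloc(struct site_tag *tag, size_t size)
{
	union alloc_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->h.tag = tag;
	hdr->h.size = size;
	tag->bytes += size;
	return hdr + 1;
}

/* Free and uncharge through the reference stored with the object. */
static void tagged_free(void *ptr)
{
	union alloc_hdr *hdr;

	if (!ptr)
		return;
	hdr = (union alloc_hdr *)ptr - 1;
	assert(hdr->h.tag->bytes >= hdr->h.size);
	hdr->h.tag->bytes -= hdr->h.size;
	free(hdr);
}
```

A caller would write `DEFINE_SITE_TAG(my_tag);` followed by `p = tagged_malloc(&my_tag, n);`. Both the allocation and the free then touch one counter through a pointer they already hold.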
There is another issue, which I think can be solved in a smart way but
will either affect performance or would require more memory. With the
tracing approach we don't know beforehand how many individual
allocation sites exist, so we have to allocate code tags (or similar
structures for counting) at runtime vs compile time. We can be smart
about it and allocate in batches or even preallocate more than we need
beforehand but, as I said, it will require some kind of compromise.
I understand that code tagging creates additional maintenance burdens
but I hope it also produces enough benefits that people will want
this. The cost is also hopefully amortized when additional
applications like the ones we presented in RFC [1] are built using the
same framework.
> - We already have page_owner infrastructure that provides allocation
> tracking data. Why it cannot be used/extended?
1. The overhead.
2. Covers only page allocators.
I didn't think about extending the page_owner approach to slab
allocators but I suspect it would not be trivial. I don't see
attaching an owner to every slab object to be a scalable solution. The
overhead would again be of concern here.
I should point out that there was one important technical concern
about lack of a kill switch for this feature, which was an issue for
distributions that can't disable the CONFIG flag. In this series we
addressed that concern.
[1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
Thanks,
Suren.
>
> Thanks!
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-03 15:09 ` Suren Baghdasaryan
@ 2023-05-03 16:28 ` Steven Rostedt
[not found] ` <20230503122839.0d9934c5-f9ZlEuEWxVcJvu8Pb33WZ0EMvNT87kid@public.gmane.org>
2023-05-04 9:07 ` Michal Hocko
1 sibling, 1 reply; 160+ messages in thread
From: Steven Rostedt @ 2023-05-03 16:28 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Michal Hocko, akpm, kent.overstreet, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells, hughd, an
On Wed, 3 May 2023 08:09:28 -0700
Suren Baghdasaryan <surenb@google.com> wrote:
> There is another issue, which I think can be solved in a smart way but
> will either affect performance or would require more memory. With the
> tracing approach we don't know beforehand how many individual
> allocation sites exist, so we have to allocate code tags (or similar
> structures for counting) at runtime vs compile time. We can be smart
> about it and allocate in batches or even preallocate more than we need
> beforehand but, as I said, it will require some kind of compromise.
This approach is actually quite common, especially since tagging every
instance is usually overkill: if you trace function calls in a running
kernel, you will find that only a small percentage of the kernel ever
executes. It's possible that you will be allocating a lot of tags that will
never be used. If run time allocation is possible, that is usually the
better approach.
-- Steve
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-03 15:09 ` Suren Baghdasaryan
2023-05-03 16:28 ` Steven Rostedt
@ 2023-05-04 9:07 ` Michal Hocko
2023-05-04 15:08 ` Suren Baghdasaryan
[not found] ` <ZFN1yswCd9wRgYPR-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
1 sibling, 2 replies; 160+ messages in thread
From: Michal Hocko @ 2023-05-04 9:07 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Wed 03-05-23 08:09:28, Suren Baghdasaryan wrote:
> On Wed, May 3, 2023 at 12:25 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> Thanks for summarizing!
>
> > At least those I find the most important:
> > - This is a big change and it adds a significant maintenance burden
> > because each allocation entry point needs to be handled specifically.
> > The cost will grow with the intended coverage especially there when
> > allocation is hidden in a library code.
>
> Do you mean with more allocations in the codebase more codetags will
> be generated? Is that the concern?
No. I am mostly concerned about the _maintenance_ overhead. For the
bare tracking (without profiling and thus stack traces) only those
allocations that are directly inlined into the consumer are really
of any use. That increases the code impact of the tracing because any
relevant allocation location has to go through the micro surgery.
E.g. is it really interesting to know that there is a likely memory
leak in seq_file proper doing an allocation? No, as it is most likely the
specific implementation using seq_file that is leaking. There are
other examples like that. See?
> Or maybe as you commented in
> another patch that context capturing feature does not limit how many
> stacks will be captured?
That is a memory overhead which can be really huge and it would be nice
to be more explicit about that in the cover letter. It is a downside for
sure but not something that has a code maintenance impact and it is an
opt-in so it can be enabled only when necessary.
Quite honestly, though, the more I look into the context capturing part,
the more it seems to me that there is much to be reconsidered there, and
if you really want to move forward with the code tagging part then you
should drop that for now. It would make the whole series smaller and
easier to digest.
> > - It has been brought up that this is duplicating functionality already
> > available via existing tracing infrastructure. You should make it very
> > clear why that is not suitable for the job
>
> I experimented with using tracing with _RET_IP_ to implement this
> accounting. The major issue is the _RET_IP_ to codetag lookup runtime
> overhead which is orders of magnitude higher than proposed code
> tagging approach. With code tagging proposal, that link is resolved at
> compile time. Since we want this mechanism deployed in production, we
> want to keep the overhead to the absolute minimum.
> You asked me before how much overhead would be tolerable and the
> answer will always be "as small as possible". This is especially true
> for slab allocators which are ridiculously fast and regressing them
> would be very noticeable (due to the frequent use).
It would have been more convincing if you had some numbers at hand.
E.g. this is a typical workload we are dealing with. With the compile
time tags we are able to learn this much at that cost. With dynamic
tracing we are able to learn this much at that cost. See? As small as
possible is a rather vague term that different people will have a very
different idea about.
> There is another issue, which I think can be solved in a smart way but
> will either affect performance or would require more memory. With the
> tracing approach we don't know beforehand how many individual
> allocation sites exist, so we have to allocate code tags (or similar
> structures for counting) at runtime vs compile time. We can be smart
> about it and allocate in batches or even preallocate more than we need
> beforehand but, as I said, it will require some kind of compromise.
I have tried our usual distribution config (only vmlinux without modules
so the real impact will be larger as we build a lot of stuff into
modules) just to get an idea:
    text     data      bss      dec     hex filename
28755345 17040322 19845124 65640791 3e99957 vmlinux.before
28867168 17571838 19386372 65825378 3ec6a62 vmlinux.after
Less than 1% for text, 3% for data. This is not all that terrible
for an initial submission and a more dynamic approach could be added
later. E.g. with a smaller pre-allocated hash table that could be
expanded lazily. Anyway not something I would be losing sleep over. This
can always be improved later on.
> I understand that code tagging creates additional maintenance burdens
> but I hope it also produces enough benefits that people will want
> this. The cost is also hopefully amortized when additional
> applications like the ones we presented in RFC [1] are built using the
> same framework.
TBH I am much more concerned about the maintenance burden on the MM side
than the actual code tagging itself, which is much more self-contained. I
haven't seen other potential applications of the same infrastructure and
maybe the code impact would be much smaller than in the MM proper. Our
allocator API is really hairy and convoluted.
> > - We already have page_owner infrastructure that provides allocation
> > tracking data. Why can it not be used/extended?
>
> 1. The overhead.
Do you have any numbers?
> 2. Covers only page allocators.
Yes this sucks.
>
> I didn't think about extending the page_owner approach to slab
> allocators but I suspect it would not be trivial. I don't see
> attaching an owner to every slab object to be a scalable solution. The
> overhead would again be of concern here.
This would have been a nice argument to mention in the changelog so that
we know that you have considered that option at least. Why should I (as
a reviewer) have to guess that?
> I should point out that there was one important technical concern
> about lack of a kill switch for this feature, which was an issue for
> distributions that can't disable the CONFIG flag. In this series we
> addressed that concern.
Thanks, that is certainly appreciated. I haven't looked deeper into that
part but from the cover letter I have understood that CONFIG_MEM_ALLOC_PROFILING
implies unconditional page_ext and therefore the memory overhead
associated with that. There seems to be a kill switch, nomem_profiling, but
from a quick look it doesn't seem to disable page_ext allocations. I
might be missing something there, of course. Having a high-level
description of that would be really nice as well.
> [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-04 9:07 ` Michal Hocko
@ 2023-05-04 15:08 ` Suren Baghdasaryan
[not found] ` <CAJuCfpEkV_+pAjxyEpMqY+x7buZhSpj5qDF6KubsS=ObrQKUZg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
[not found] ` <ZFN1yswCd9wRgYPR-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
1 sibling, 1 reply; 160+ messages in thread
From: Suren Baghdasaryan @ 2023-05-04 15:08 UTC (permalink / raw)
To: Michal Hocko
Cc: akpm, kent.overstreet, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl, keescook
On Thu, May 4, 2023 at 2:07 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 03-05-23 08:09:28, Suren Baghdasaryan wrote:
> > On Wed, May 3, 2023 at 12:25 AM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > Thanks for summarizing!
> >
> > > At least those I find the most important:
> > > - This is a big change and it adds a significant maintenance burden
> > > because each allocation entry point needs to be handled specifically.
> > > The cost will grow with the intended coverage, especially where the
> > > allocation is hidden in library code.
> >
> > Do you mean with more allocations in the codebase more codetags will
> > be generated? Is that the concern?
>
> No. I am mostly concerned about the _maintenance_ overhead. For the
> bare tracking (without profiling and thus stack traces) only those
> allocations that are directly inlined into the consumer are really
> of any use. That increases the code impact of the tracing because any
> relevant allocation location has to go through the micro surgery.
>
> e.g. is it really interesting to know that there is a likely memory
> leak in seq_file proper doing an allocation? No, as it is the specific
> implementation using seq_file that is leaking most likely. There are
> other examples like that. See?
Yes, I see that. One level tracking does not provide all the
information needed to track such issues. Something more informative
would cost more. That's why our proposal is to have a light-weight
mechanism to get a high level picture and then be able to zoom into a
specific area using context capture. If you have ideas to improve
this, I'm open to suggestions.
>
> > Or maybe as you commented in
> > another patch that context capturing feature does not limit how many
> > stacks will be captured?
>
> That is a memory overhead which can be really huge and it would be nice
> to be more explicit about that in the cover letter. It is a downside for
> sure but not something that has a code maintenance impact and it is an
> opt-in so it can be enabled only when necessary.
You are right, I'll add that into the cover letter.
>
> Quite honestly, though, the more I look into the context capturing part,
> the more it seems to me that there is much more to be reconsidered there
> and if you
> really want to move forward with the code tagging part then you should
> drop that for now. It would make the whole series smaller and easier to
> digest.
Sure, I don't see an issue with removing that for now and refining the
mechanism before posting again.
>
> > > - It has been brought up that this is duplicating functionality already
> > > available via existing tracing infrastructure. You should make it very
> > > clear why that is not suitable for the job
> >
> > I experimented with using tracing with _RET_IP_ to implement this
> > accounting. The major issue is the _RET_IP_ to codetag lookup runtime
> > overhead which is orders of magnitude higher than proposed code
> > tagging approach. With code tagging proposal, that link is resolved at
> > compile time. Since we want this mechanism deployed in production, we
> > want to keep the overhead to the absolute minimum.
> > You asked me before how much overhead would be tolerable and the
> > answer will always be "as small as possible". This is especially true
> > for slab allocators which are ridiculously fast and regressing them
> > would be very noticeable (due to the frequent use).
>
> It would have been more convincing if you had some numbers at hand.
> E.g. this is a typical workload we are dealing with. With the compile
> time tags we are able to learn this with that much of cost. With a dynamic
> tracing we are able to learn this much with that cost. See? As small as
> possible is a rather vague term that different people will have a very
> different idea about.
I'm rerunning my tests with the latest kernel to collect the
comparison data. I profiled these solutions before but the kernel
changed since then, so I need to update them.
>
> > There is another issue, which I think can be solved in a smart way but
> > will either affect performance or would require more memory. With the
> > tracing approach we don't know beforehand how many individual
> > allocation sites exist, so we have to allocate code tags (or similar
> > structures for counting) at runtime vs compile time. We can be smart
> > about it and allocate in batches or even preallocate more than we need
> > beforehand but, as I said, it will require some kind of compromise.
>
> I have tried our usual distribution config (only vmlinux without modules
> so the real impact will be larger as we build a lot of stuff into
> modules) just to get an idea:
>     text     data      bss      dec     hex filename
> 28755345 17040322 19845124 65640791 3e99957 vmlinux.before
> 28867168 17571838 19386372 65825378 3ec6a62 vmlinux.after
>
> Less than 1% for text, 3% for data. This is not all that terrible
> for an initial submission and a more dynamic approach could be added
> later. E.g. with a smaller pre-allocated hash table that could be
> expanded lazily. Anyway not something I would be losing sleep over. This
> can always be improved later on.
Ah, right. I should have mentioned this overhead too. Thanks for
keeping me honest.
> > I understand that code tagging creates additional maintenance burdens
> > but I hope it also produces enough benefits that people will want
> > this. The cost is also hopefully amortized when additional
> > applications like the ones we presented in RFC [1] are built using the
> > same framework.
>
> TBH I am much more concerned about the maintenance burden on the MM side
> than the actual code tagging itself, which is much more self-contained. I
> haven't seen other potential applications of the same infrastructure and
> maybe the code impact would be much smaller than in the MM proper. Our
> allocator API is really hairy and convoluted.
Yes, other applications are much smaller and cleaner. MM allocation
code is quite complex indeed.
>
> > > - We already have page_owner infrastructure that provides allocation
> > > tracking data. Why can it not be used/extended?
> >
> > 1. The overhead.
>
> Do you have any numbers?
Will post once my tests are completed.
>
> > 2. Covers only page allocators.
>
> Yes this sucks.
> >
> > I didn't think about extending the page_owner approach to slab
> > allocators but I suspect it would not be trivial. I don't see
> > attaching an owner to every slab object to be a scalable solution. The
> > overhead would again be of concern here.
>
> This would have been a nice argument to mention in the changelog so that
> we know that you have considered that option at least. Why should I (as
> a reviewer) have to guess that?
Sorry, it's hard to remember all the decisions, discussions and
conclusions when working on a feature over a long time period. I'll
include more information about that.
>
> > I should point out that there was one important technical concern
> > about lack of a kill switch for this feature, which was an issue for
> > distributions that can't disable the CONFIG flag. In this series we
> > addressed that concern.
>
> Thanks, that is certainly appreciated. I haven't looked deeper into that
> part but from the cover letter I have understood that CONFIG_MEM_ALLOC_PROFILING
> implies unconditional page_ext and therefore the memory overhead
> associated with that. There seems to be a kill switch, nomem_profiling, but
> from a quick look it doesn't seem to disable page_ext allocations. I
> might be missing something there, of course. Having a high-level
> description of that would be really nice as well.
Right, will add a description of that as well.
We eliminate the runtime overhead but not the memory one. However I
believe it's also doable using page_ext_operations.need callback. Will
look into it.
Thanks,
Suren.
>
> > [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
>
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 160+ messages in thread
[parent not found: <ZFN1yswCd9wRgYPR-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH 00/40] Memory allocation profiling
[not found] ` <ZFN1yswCd9wRgYPR-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
@ 2023-05-07 17:20 ` Kent Overstreet
2023-05-07 20:55 ` Steven Rostedt
2023-05-08 15:52 ` Petr Tesařík
0 siblings, 2 replies; 160+ messages in thread
From: Kent Overstreet @ 2023-05-07 17:20 UTC (permalink / raw)
To: Michal Hocko
Cc: Suren Baghdasaryan, akpm, vbabka, hannes, roman.gushchin, mgorman,
dave, willy, liam.howlett, corbet, void, peterz, juri.lelli,
ldufour, catalin.marinas, will, arnd, tglx, mingo, dave.hansen,
x86, peterx, david, axboe, mcgrof, masahiroy, nathan, dennis, tj,
muchun.song, rppt, paulmck, pasha.tatashin, yosryahmed, yuzhao,
dhowells, hughd, andreyknvl
On Thu, May 04, 2023 at 11:07:22AM +0200, Michal Hocko wrote:
> No. I am mostly concerned about the _maintenance_ overhead. For the
> bare tracking (without profiling and thus stack traces) only those
> allocations that are directly inlined into the consumer are really
> of any use. That increases the code impact of the tracing because any
> relevant allocation location has to go through the micro surgery.
>
> e.g. is it really interesting to know that there is a likely memory
> leak in seq_file proper doing an allocation? No, as it is the specific
> implementation using seq_file that is leaking most likely. There are
> other examples like that. See?
So this is a rather strange usage of "maintenance overhead" :)
But it's something we thought of. If we had to plumb around a _RET_IP_
parameter, or a codetag pointer, it would be a hassle annotating the
correct callsite.
Instead, alloc_hooks() wraps a memory allocation function and stashes a
pointer to a codetag in task_struct for use by the core slub/buddy
allocator code.
That means that in your example, to move tracking to a given seq_file
function, we just:
- hook the seq_file function with alloc_hooks
- change the seq_file function to call non-hooked memory allocation
functions.
> It would have been more convincing if you had some numbers at hand.
> E.g. this is a typical workload we are dealing with. With the compile
> time tags we are able to learn this with that much of cost. With a dynamic
> tracing we are able to learn this much with that cost. See? As small as
> possible is a rather vague term that different people will have a very
> different idea about.
Engineers don't prototype and benchmark everything as a matter of
course; we're expected to have the rough equivalent of a CS education
and an understanding of big O notation, cache architecture, etc.
The slub fast path is _really_ fast - double word non locked cmpxchg.
That's what we're trying to compete with. Adding a big globally
accessible hash table is going to tank performance compared to that.
I believe the numbers we already posted speak for themselves. We're
considerably faster than memcg, fast enough to run in production.
I'm not going to be switching to a design that significantly regresses
performance, sorry :)
> TBH I am much more concerned about the maintenance burden on the MM side
> than the actual code tagging itself, which is much more self-contained. I
> haven't seen other potential applications of the same infrastructure and
> maybe the code impact would be much smaller than in the MM proper. Our
> allocator API is really hairy and convoluted.
You keep saying "maintenance burden", but this is a criticism that can
be directed at _any_ patchset that adds new code; it's generally
understood that that is the accepted cost for new functionality.
If you have specific concerns where you think we did something that
makes the code harder to maintain, _please point them out in the
appropriate patch_. I don't think you'll find too much - the
instrumentation in the allocators simply generalizes what memcg was
already doing, and the hooks themselves are a bit boilerplaty but hardly
the sort of thing people will be tripping over later.
TL;DR - put up or shut up :)
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-07 17:20 ` Kent Overstreet
@ 2023-05-07 20:55 ` Steven Rostedt
2023-05-07 21:53 ` Kent Overstreet
2023-05-08 15:52 ` Petr Tesařík
1 sibling, 1 reply; 160+ messages in thread
From: Steven Rostedt @ 2023-05-07 20:55 UTC (permalink / raw)
To: Kent Overstreet
Cc: Michal Hocko, Suren Baghdasaryan, akpm, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells
On Sun, 7 May 2023 13:20:55 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:
> On Thu, May 04, 2023 at 11:07:22AM +0200, Michal Hocko wrote:
> > No. I am mostly concerned about the _maintenance_ overhead. For the
> > bare tracking (without profiling and thus stack traces) only those
> > allocations that are directly inlined into the consumer are really
> > of any use. That increases the code impact of the tracing because any
> > relevant allocation location has to go through the micro surgery.
> >
> > e.g. is it really interesting to know that there is a likely memory
> > leak in seq_file proper doing an allocation? No, as it is the specific
> > implementation using seq_file that is leaking most likely. There are
> > other examples like that. See?
>
> So this is a rather strange usage of "maintenance overhead" :)
>
> But it's something we thought of. If we had to plumb around a _RET_IP_
> parameter, or a codetag pointer, it would be a hassle annotating the
> correct callsite.
>
> Instead, alloc_hooks() wraps a memory allocation function and stashes a
> pointer to a codetag in task_struct for use by the core slub/buddy
> allocator code.
>
> That means that in your example, to move tracking to a given seq_file
> function, we just:
> - hook the seq_file function with alloc_hooks
> - change the seq_file function to call non-hooked memory allocation
> functions.
>
> > It would have been more convincing if you had some numbers at hand.
> > E.g. this is a typical workload we are dealing with. With the compile
> > time tags we are able to learn this with that much of cost. With a dynamic
> > tracing we are able to learn this much with that cost. See? As small as
> > possible is a rather vague term that different people will have a very
> > different idea about.
>
> Engineers don't prototype and benchmark everything as a matter of
> course; we're expected to have the rough equivalent of a CS education
> and an understanding of big O notation, cache architecture, etc.
>
> The slub fast path is _really_ fast - double word non locked cmpxchg.
> That's what we're trying to compete with. Adding a big globally
> accessible hash table is going to tank performance compared to that.
>
> I believe the numbers we already posted speak for themselves. We're
> considerably faster than memcg, fast enough to run in production.
>
> I'm not going to be switching to a design that significantly regresses
> performance, sorry :)
>
> > TBH I am much more concerned about the maintenance burden on the MM side
> > than the actual code tagging itself, which is much more self-contained. I
> > haven't seen other potential applications of the same infrastructure and
> > maybe the code impact would be much smaller than in the MM proper. Our
> > allocator API is really hairy and convoluted.
>
> You keep saying "maintenance burden", but this is a criticism that can
> be directed at _any_ patchset that adds new code; it's generally
> understood that that is the accepted cost for new functionality.
>
> If you have specific concerns where you think we did something that
> makes the code harder to maintain, _please point them out in the
> appropriate patch_. I don't think you'll find too much - the
> instrumentation in the allocators simply generalizes what memcg was
> already doing, and the hooks themselves are a bit boilerplaty but hardly
> the sort of thing people will be tripping over later.
>
> TL;DR - put up or shut up :)
Your email would have been much better if you left the above line out. :-/
Comments like the above do not go over well via text. Even if you add the ":)"
Back to the comment about this being a burden. I just applied all the
patches and did a diff (much easier than to wade through 40 patches!)
One thing we need to get rid of, and this isn't your fault but this
series is extending it, is the use of the damn underscores to
differentiate functions. This is one of the abominations of the early
Linux kernel code base. I admit, I'm guilty of this too. But today I
have learned and avoid it at all cost. Underscores are meaningless and
error prone, not to mention confusing to people coming onboard. Let's
use something that has some meaning.
What's the difference between:
_kmem_cache_alloc_node() and __kmem_cache_alloc_node()?
And if every allocation function requires a double hook, that is a
maintenance burden. We do this for things like system calls, but
there's a strong rationale for that. I'm guessing that Michal's concern
is that he and other mm maintainers will need to make sure any new
allocation function has this double call and is done properly. This
isn't just new code that needs to be maintained, it's something that
needs to be understood when adding any new interface to page
allocations.
It's true that all new code has a maintenance burden, and unless the
maintainer feels the burden is worth their time, they have the right to
complain about it.
I've given talks about how to get code into open source projects, and
the title is "Commits are pulled and never pushed". Where basically I
talk about convincing the maintainers that they want your change, and
not by pushing it because you want it.
-- Steve
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-07 20:55 ` Steven Rostedt
@ 2023-05-07 21:53 ` Kent Overstreet
2023-05-07 22:09 ` Steven Rostedt
0 siblings, 1 reply; 160+ messages in thread
From: Kent Overstreet @ 2023-05-07 21:53 UTC (permalink / raw)
To: Steven Rostedt
Cc: Michal Hocko, Suren Baghdasaryan, akpm, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells
On Sun, May 07, 2023 at 04:55:38PM -0400, Steven Rostedt wrote:
> > TL;DR - put up or shut up :)
>
> Your email would have been much better if you left the above line out. :-/
> Comments like the above do not go over well via text. Even if you add the ":)"
I stand by that comment :)
> Back to the comment about this being a burden. I just applied all the
> patches and did a diff (much easier than to wade through 40 patches!)
>
> One thing we need to get rid of, and this isn't your fault but this
> series is extending it, is the use of the damn underscores to
> differentiate functions. This is one of the abominations of the early
> Linux kernel code base. I admit, I'm guilty of this too. But today I
> have learned and avoid it at all cost. Underscores are meaningless and
> error prone, not to mention confusing to people coming onboard. Let's
> use something that has some meaning.
>
> What's the difference between:
>
> _kmem_cache_alloc_node() and __kmem_cache_alloc_node()?
>
> And if every allocation function requires a double hook, that is a
> maintenance burden. We do this for things like system calls, but
> there's a strong rationale for that.
The underscore is a legitimate complaint - I brought this up in
development, not sure why it got lost. We'll do something better with a
consistent suffix, perhaps kmem_cache_alloc_noacct().
> I'm guessing that Michal's concern is that he and other mm maintainers
> will need to make sure any new allocation function has this double
> call and is done properly. This isn't just new code that needs to be
> maintained, it's something that needs to be understood when adding any
> new interface to page allocations.
Well, isn't that part of the problem then? We're _this far_ into the
thread and still guessing on what Michal's "maintenance concerns" are?
Regarding your specific concern: My main design consideration was making
sure every allocation gets accounted somewhere; we don't want a memory
allocation profiling system where it's possible for allocations to be
silently not tracked! There are warnings in the core allocators if they
see an allocation without an alloc tag, and in testing we chased down
everything we found.
So if anyone later creates a new memory allocation interface and forgets
to hook it, they'll see the same warning - but perhaps we could improve
the warning message so it says exactly what needs to be done (wrap the
allocation in an alloc_hooks() call).
> It's true that all new code has a maintenance burden, and unless the
> maintainer feels the burden is worth their time, they have the right to
> complain about it.
Sure, but complaints should say what they're complaining about.
Complaints so vague they could be levelled at any patchset don't do
anything for the discussion.
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-07 21:53 ` Kent Overstreet
@ 2023-05-07 22:09 ` Steven Rostedt
[not found] ` <20230507180911.09d328c8-tvvo3QnZDcrq6bvjpv6Lkf3PFXHtQ0wO@public.gmane.org>
0 siblings, 1 reply; 160+ messages in thread
From: Steven Rostedt @ 2023-05-07 22:09 UTC (permalink / raw)
To: Kent Overstreet
Cc: Michal Hocko, Suren Baghdasaryan, akpm, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells
On Sun, 7 May 2023 17:53:09 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:
> The underscore is a legitimate complaint - I brought this up in
> development, not sure why it got lost. We'll do something better with a
> consistent suffix, perhaps kmem_cache_alloc_noacct().
Would "_noprofile()" be a better name? I'm not sure what "acct" means.
-- Steve
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-07 17:20 ` Kent Overstreet
2023-05-07 20:55 ` Steven Rostedt
@ 2023-05-08 15:52 ` Petr Tesařík
2023-05-08 15:57 ` Kent Overstreet
1 sibling, 1 reply; 160+ messages in thread
From: Petr Tesařík @ 2023-05-08 15:52 UTC (permalink / raw)
To: Kent Overstreet
Cc: Michal Hocko, Suren Baghdasaryan, akpm, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells
On Sun, 7 May 2023 13:20:55 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:
> On Thu, May 04, 2023 at 11:07:22AM +0200, Michal Hocko wrote:
> > No. I am mostly concerned about the _maintenance_ overhead. For the
> > bare tracking (without profiling and thus stack traces) only those
> > allocations that are directly inlined into the consumer are really
> > of any use. That increases the code impact of the tracing because any
> > relevant allocation location has to go through the micro surgery.
> >
> > e.g. is it really interesting to know that there is a likely memory
> > leak in seq_file proper doing an allocation? No, as it is the specific
> > implementation using seq_file that is leaking most likely. There are
> > other examples like that. See?
>
> So this is a rather strange usage of "maintenance overhead" :)
>
> But it's something we thought of. If we had to plumb around a _RET_IP_
> parameter, or a codetag pointer, it would be a hassle annotating the
> correct callsite.
>
> Instead, alloc_hooks() wraps a memory allocation function and stashes a
> pointer to a codetag in task_struct for use by the core slub/buddy
> allocator code.
>
> That means that in your example, to move tracking to a given seq_file
> function, we just:
> - hook the seq_file function with alloc_hooks
Thank you. That's exactly what I was trying to point out. So you hook
seq_buf_alloc(), just to find out it's called from traverse(), which
is not very helpful either. So, you hook traverse(), which sounds quite
generic. Yes, you're lucky, because it is a static function, and the
identifier is not actually used anywhere else (right now), but each
time you want to hook something, you must make sure it does not
conflict with any other identifier in the kernel...
Petr T
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-08 15:52 ` Petr Tesařík
@ 2023-05-08 15:57 ` Kent Overstreet
2023-05-08 16:09 ` Petr Tesařík
0 siblings, 1 reply; 160+ messages in thread
From: Kent Overstreet @ 2023-05-08 15:57 UTC (permalink / raw)
To: Petr Tesařík
Cc: Michal Hocko, Suren Baghdasaryan, akpm, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells
On Mon, May 08, 2023 at 05:52:06PM +0200, Petr Tesařík wrote:
> On Sun, 7 May 2023 13:20:55 -0400
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> > On Thu, May 04, 2023 at 11:07:22AM +0200, Michal Hocko wrote:
> > > No. I am mostly concerned about the _maintenance_ overhead. For the
> > > bare tracking (without profiling and thus stack traces) only those
> > > allocations that are directly inlined into the consumer are really
> > > of any use. That increases the code impact of the tracing because any
> > > relevant allocation location has to go through the micro surgery.
> > >
> > > e.g. is it really interesting to know that there is a likely memory
> > > leak in seq_file proper doing an allocation? No, as it is the specific
> > > implementation using seq_file that is leaking most likely. There are
> > > other examples like that. See?
> >
> > So this is a rather strange usage of "maintenance overhead" :)
> >
> > But it's something we thought of. If we had to plumb around a _RET_IP_
> > parameter, or a codetag pointer, it would be a hassle annotating the
> > correct callsite.
> >
> > Instead, alloc_hooks() wraps a memory allocation function and stashes a
> > pointer to a codetag in task_struct for use by the core slub/buddy
> > allocator code.
> >
> > That means that in your example, to move tracking to a given seq_file
> > function, we just:
> > - hook the seq_file function with alloc_hooks
>
> Thank you. That's exactly what I was trying to point out. So you hook
> seq_buf_alloc(), just to find out it's called from traverse(), which
> is not very helpful either. So, you hook traverse(), which sounds quite
> generic. Yes, you're lucky, because it is a static function, and the
> identifier is not actually used anywhere else (right now), but each
> time you want to hook something, you must make sure it does not
> conflict with any other identifier in the kernel...
Cscope makes quick and easy work of this kind of stuff.
* Re: [PATCH 00/40] Memory allocation profiling
2023-05-08 15:57 ` Kent Overstreet
@ 2023-05-08 16:09 ` Petr Tesařík
From: Petr Tesařík @ 2023-05-08 16:09 UTC (permalink / raw)
To: Kent Overstreet
Cc: Michal Hocko, Suren Baghdasaryan, akpm, vbabka, hannes,
roman.gushchin, mgorman, dave, willy, liam.howlett, corbet, void,
peterz, juri.lelli, ldufour, catalin.marinas, will, arnd, tglx,
mingo, dave.hansen, x86, peterx, david, axboe, mcgrof, masahiroy,
nathan, dennis, tj, muchun.song, rppt, paulmck, pasha.tatashin,
yosryahmed, yuzhao, dhowells
On Mon, 8 May 2023 11:57:10 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:
> On Mon, May 08, 2023 at 05:52:06PM +0200, Petr Tesařík wrote:
> > On Sun, 7 May 2023 13:20:55 -0400
> > Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >
> > > On Thu, May 04, 2023 at 11:07:22AM +0200, Michal Hocko wrote:
> > > > No. I am mostly concerned about the _maintenance_ overhead. For the
> > > > bare tracking (without profiling and thus stack traces) only those
> > > > allocations that are directly inlined into the consumer are really
> > > > of any use. That increases the code impact of the tracing because any
> > > > relevant allocation location has to go through the micro surgery.
> > > >
> > > > e.g. is it really interesting to know that there is a likely memory
> > > > leak in seq_file proper doing an allocation? No, as it is the specific
> > > > implementation using seq_file that is most likely leaking. There are
> > > > other examples like that. See?
> > >
> > > So this is a rather strange usage of "maintenance overhead" :)
> > >
> > > But it's something we thought of. If we had to plumb around a _RET_IP_
> > > parameter, or a codetag pointer, it would be a hassle annotating the
> > > correct callsite.
> > >
> > > Instead, alloc_hooks() wraps a memory allocation function and stashes a
> > > pointer to a codetag in task_struct for use by the core slub/buddy
> > > allocator code.
> > >
> > > That means that in your example, to move tracking to a given seq_file
> > > function, we just:
> > > - hook the seq_file function with alloc_hooks
> >
> > Thank you. That's exactly what I was trying to point out. So you hook
> > seq_buf_alloc(), just to find out it's called from traverse(), which
> > is not very helpful either. So, you hook traverse(), which sounds quite
> > generic. Yes, you're lucky, because it is a static function, and the
> > identifier is not actually used anywhere else (right now), but each
> > time you want to hook something, you must make sure it does not
> > conflict with any other identifier in the kernel...
>
> Cscope makes quick and easy work of this kind of stuff.
Sure, although AFAIK the index does not cover all possible config
options (so non-x86 arch code is often forgotten). However, that's the
less important part.
What do you do if you need to hook something that does conflict with an
existing identifier?
Petr T
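For readers following the exchange above, here is a minimal userspace sketch of the mechanism being discussed. It is not the patchset's code: the struct layout, the thread-local current_alloc_tag (standing in for the pointer stashed in task_struct), core_alloc(), last_charged_tag, and the _unhooked suffix are all hypothetical names chosen for illustration, and the macro relies on GNU C statement expressions. It only shows the idea that a hook macro creates one static tag per call site, and that hooking a function means taking over its name with a macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a codetag: identifies one call site. */
struct codetag {
	const char *file;
	int line;
	size_t bytes;		/* bytes charged to this call site */
};

/* Stand-in for the codetag pointer the thread says is kept in task_struct. */
static _Thread_local struct codetag *current_alloc_tag;

/* Illustration only: remembers the tag charged most recently. */
static struct codetag *last_charged_tag;

/* "Core allocator": charges whichever tag is currently stashed. */
static void *core_alloc(size_t size)
{
	struct codetag *tag = current_alloc_tag;

	if (tag) {
		tag->bytes += size;
		last_charged_tag = tag;
	}
	return malloc(size);
}

/* Simplified hook: one static tag per expansion, stashed around the call. */
#define alloc_hooks(call)						\
({									\
	static struct codetag _tag = { __FILE__, __LINE__, 0 };		\
	struct codetag *_old = current_alloc_tag;			\
	__typeof__(call) _ret;						\
	current_alloc_tag = &_tag;					\
	_ret = (call);							\
	current_alloc_tag = _old;					\
	_ret;								\
})

/* The real function is renamed and calls the non-hooked allocator ... */
static void *seq_buf_alloc_unhooked(size_t size)
{
	return core_alloc(size);
}

/* ... and the original name becomes a macro, so every caller gets a tag. */
#define seq_buf_alloc(size) alloc_hooks(seq_buf_alloc_unhooked(size))

static void *caller_a(void) { return seq_buf_alloc(16); }
static void *caller_b(void) { return seq_buf_alloc(64); }

/* Each caller is accounted separately, and the stash is restored afterwards. */
static void demo(void)
{
	void *a = caller_a();
	struct codetag *tag_a = last_charged_tag;
	void *b = caller_b();
	struct codetag *tag_b = last_charged_tag;

	assert(tag_a != tag_b);
	assert(tag_a->bytes == 16 && tag_b->bytes == 64);
	assert(current_alloc_tag == NULL);
	free(a);
	free(b);
}
```

In this sketch, seq_buf_alloc is a function-like macro once hooked, so any other use of that identifier followed by "(" in code that sees the definition is expanded as well. That is the kind of collision the question above is about, and the sketch does not attempt to answer it.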