* [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
@ 2025-04-08 11:09 Breno Leitao
2025-04-08 11:30 ` Andrea Righi
0 siblings, 1 reply; 5+ messages in thread
From: Breno Leitao @ 2025-04-08 11:09 UTC
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel, christophe.jaillet, changwoo, kernel-team, stable,
Rik van Riel, Breno Leitao

Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
can require large contiguous memory (up to order=9) depending on the
implementation. This change prevents allocation failures by allowing the
system to fall back to vmalloc when contiguous memory allocation fails.

Since this buffer is only used for debugging purposes, physical memory
contiguity is not required, making vmalloc a suitable alternative.

Cc: stable@vger.kernel.org
Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
Suggested-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
- Use kvfree() on the free path as well.
- Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
---
kernel/sched/ext.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 66bcd40a28ca1..db9af6a3c04fd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)
 
 static void free_exit_info(struct scx_exit_info *ei)
 {
-	kfree(ei->dump);
+	kvfree(ei->dump);
 	kfree(ei->msg);
 	kfree(ei->bt);
 	kfree(ei);
@@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)
 
 	ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
 	ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
-	ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
+	ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);
 
 	if (!ei->bt || !ei->msg || !ei->dump) {
 		free_exit_info(ei);
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250407-scx-11dbf94803c3
Best regards,
--
Breno Leitao <leitao@debian.org>
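
The allocate/free pairing that the patch relies on can be sketched in
userspace. This is only a model under stated assumptions, not kernel
code: every model_* name and MODEL_CONTIG_LIMIT are hypothetical,
calloc() stands in for both the contiguous and the vmalloc-backed
allocators, and the from_fallback flag stands in for the address check
that lets the kernel's kvfree() pick the right release path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption for the model: "contiguous" requests above this size fail. */
#define MODEL_CONTIG_LIMIT (128 * 1024)

struct model_buf {
	void *ptr;
	bool from_fallback;	/* stands in for the kernel's vmalloc-address check */
};

/* Models the contiguous allocator: zeroed memory, fails for large sizes. */
static void *model_contig_zalloc(size_t size)
{
	if (size > MODEL_CONTIG_LIMIT)
		return NULL;
	return calloc(1, size);
}

/* Models kvzalloc(): try the contiguous path first, then fall back. */
static struct model_buf model_kvzalloc(size_t size)
{
	struct model_buf b = {
		.ptr = model_contig_zalloc(size),
		.from_fallback = false,
	};

	if (!b.ptr) {
		b.ptr = calloc(1, size);	/* stands in for vmalloc-backed memory */
		b.from_fallback = (b.ptr != NULL);
	}
	return b;
}

/* Models kvfree(): a single release function valid for both paths. */
static void model_kvfree(struct model_buf *b)
{
	free(b->ptr);
	b->ptr = NULL;
	b->from_fallback = false;
}
```

In the model, a 32 KiB request is served by the contiguous path and a
2 MiB request by the fallback, and both are released through the same
model_kvfree() call. That mirrors why v2 also switches the free path to
kvfree(): a buffer that may come from either allocator needs a free
routine that handles both.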
* Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
2025-04-08 11:09 [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation Breno Leitao
@ 2025-04-08 11:30 ` Andrea Righi
2025-04-08 12:17 ` Breno Leitao
0 siblings, 1 reply; 5+ messages in thread
From: Andrea Righi @ 2025-04-08 11:30 UTC
To: Breno Leitao
Cc: Tejun Heo, David Vernet, Changwoo Min, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel, christophe.jaillet, kernel-team, stable,
Rik van Riel
Hi Breno,
I already acked even the buggy version, so this one looks good. :)
On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> can require large contiguous memory (up to order=9) depending on the

BTW, where is this order=9 coming from? exit_dump_len is 32K by
default, but a BPF scheduler can arbitrarily set it to any value via
ops->exit_dump_len, so it could be even bigger than an order 9 allocation.

Thanks,
-Andrea
> implementation. This change prevents allocation failures by allowing the
> system to fall back to vmalloc when contiguous memory allocation fails.
>
> Since this buffer is only used for debugging purposes, physical memory
> contiguity is not required, making vmalloc a suitable alternative.
>
> Cc: stable@vger.kernel.org
> Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
> Suggested-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Acked-by: Andrea Righi <arighi@nvidia.com>
> ---
> Changes in v2:
> - Use kvfree() on the free path as well.
> - Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
> ---
> kernel/sched/ext.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 66bcd40a28ca1..db9af6a3c04fd 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)
>
> static void free_exit_info(struct scx_exit_info *ei)
> {
> - kfree(ei->dump);
> + kvfree(ei->dump);
> kfree(ei->msg);
> kfree(ei->bt);
> kfree(ei);
> @@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)
>
> ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
> ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
> - ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
> + ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);
>
> if (!ei->bt || !ei->msg || !ei->dump) {
> free_exit_info(ei);
>
> ---
> base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
> change-id: 20250407-scx-11dbf94803c3
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
* Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
2025-04-08 11:30 ` Andrea Righi
@ 2025-04-08 12:17 ` Breno Leitao
2025-04-08 13:12 ` Andrea Righi
0 siblings, 1 reply; 5+ messages in thread
From: Breno Leitao @ 2025-04-08 12:17 UTC
To: Andrea Righi
Cc: Tejun Heo, David Vernet, Changwoo Min, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel, christophe.jaillet, kernel-team, stable,
Rik van Riel
Hello Andrea,
On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> Hi Breno,
>
> I already acked even the buggy version, so this one looks good. :)
>
> On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > can require large contiguous memory (up to order=9) depending on the
>
> BTW, where is this order=9 coming from? exit_dump_len is 32K by
> default, but a BPF scheduler can arbitrarily set it to any value via
> ops->exit_dump_len, so it could be even bigger than an order 9 allocation.

You are absolutely correct: this allocation could be of any size.

I ran into this problem while monitoring the Meta fleet: I saw a bunch
of allocation failures and decided to investigate. In this specific
case, the users were using order=9 (512 pages), but, again, this could
be even bigger.

Thanks for the review,
--breno
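
For reference, the arithmetic behind these figures can be sketched as
below. It assumes 4 KiB pages (the page size is not stated in the
thread), and order_for_size() is a hypothetical userspace helper that
computes the smallest order whose page count covers a given size. It is
not a kernel function.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. Other page sizes give different orders. */
#define ASSUMED_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: smallest order such that
 * (ASSUMED_PAGE_SIZE << order) >= size.
 */
static unsigned int order_for_size(size_t size)
{
	/* Round the size up to a whole number of pages. */
	size_t pages = (size + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
	unsigned int order = 0;

	/* Grow the order until 2^order pages cover the request. */
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under that assumption, the 32K default exit_dump_len is 8 pages, so
order_for_size(32 * 1024) gives 3, while 512 pages is 2 MiB, so
order_for_size(2 * 1024 * 1024) gives 9. Anything larger than 2 MiB set
via ops->exit_dump_len would need an order above 9.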
* Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
2025-04-08 12:17 ` Breno Leitao
@ 2025-04-08 13:12 ` Andrea Righi
2025-04-08 13:40 ` Breno Leitao
0 siblings, 1 reply; 5+ messages in thread
From: Andrea Righi @ 2025-04-08 13:12 UTC
To: Breno Leitao
Cc: Tejun Heo, David Vernet, Changwoo Min, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel, christophe.jaillet, kernel-team, stable,
Rik van Riel
On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> Hello Andrea,
>
> On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > Hi Breno,
> >
> > I already acked even the buggy version, so this one looks good. :)
> >
> > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > > can require large contiguous memory (up to order=9) depending on the
> >
> > BTW, where is this order=9 coming from? exit_dump_len is 32K by
> > default, but a BPF scheduler can arbitrarily set it to any value via
> > ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
>
> You are absolutely correct: this allocation could be of any size.
>
> I ran into this problem while monitoring the Meta fleet: I saw a bunch
> of allocation failures and decided to investigate. In this specific
> case, the users were using order=9 (512 pages), but, again, this could
> be even bigger.
I see, makes sense. Maybe we can rephrase this part to not mention the
order=9 allocation and avoid potential confusion.
Thanks,
-Andrea
* Re: [PATCH v2] sched_ext: Use kvzalloc for large exit_dump allocation
2025-04-08 13:12 ` Andrea Righi
@ 2025-04-08 13:40 ` Breno Leitao
0 siblings, 0 replies; 5+ messages in thread
From: Breno Leitao @ 2025-04-08 13:40 UTC
To: Andrea Righi
Cc: Tejun Heo, David Vernet, Changwoo Min, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel, christophe.jaillet, kernel-team, stable,
Rik van Riel
On Tue, Apr 08, 2025 at 03:12:43PM +0200, Andrea Righi wrote:
> On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> > Hello Andrea,
> >
> > On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > > Hi Breno,
> > >
> > > I already acked even the buggy version, so this one looks good. :)
> > >
> > > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which
> > > > can require large contiguous memory (up to order=9) depending on the
> > >
> > > BTW, where is this order=9 coming from? exit_dump_len is 32K by
> > > default, but a BPF scheduler can arbitrarily set it to any value via
> > > ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
> >
> > You are absolutely correct: this allocation could be of any size.
> >
> > I ran into this problem while monitoring the Meta fleet: I saw a bunch
> > of allocation failures and decided to investigate. In this specific
> > case, the users were using order=9 (512 pages), but, again, this could
> > be even bigger.
>
> I see, makes sense. Maybe we can rephrase this part to not mention the
> order=9 allocation and avoid potential confusion.
Sure! I will send a v3 later today, then.