* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-14 6:28 ` Hugh Dickins
@ 2025-10-14 17:51 ` Liam R. Howlett
2025-10-15 9:10 ` Lorenzo Stoakes
2025-10-14 21:33 ` Kalesh Singh
2025-10-17 9:00 ` Lorenzo Stoakes
2 siblings, 1 reply; 12+ messages in thread
From: Liam R. Howlett @ 2025-10-14 17:51 UTC (permalink / raw)
To: Hugh Dickins
Cc: Kalesh Singh, akpm, minchan, lorenzo.stoakes, david, rppt,
pfalcato, kernel-team, android-mm, stable, SeongJae Park,
Alexander Viro, Christian Brauner, Jan Kara, Kees Cook,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Shuah Khan,
linux-kernel, linux-fsdevel, linux-mm, linux-trace-kernel,
linux-kselftest
* Hugh Dickins <hughd@google.com> [251014 02:28]:
> On Mon, 13 Oct 2025, Kalesh Singh wrote:
>
> > The VMA count limit check in do_mmap() and do_brk_flags() uses a
> > strict inequality (>), which allows a process's VMA count to exceed
> > the configured sysctl_max_map_count limit by one.
...
> > /* Too many mappings? */
> > - if (mm->map_count > sysctl_max_map_count)
> > + if (mm->map_count >= sysctl_max_map_count)
> > return -ENOMEM;
> >
> > /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index a2e1ae954662..fba68f13e628 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -2797,7 +2797,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
> > return -ENOMEM;
> >
> > - if (mm->map_count > sysctl_max_map_count)
> > + if (mm->map_count >= sysctl_max_map_count)
> > return -ENOMEM;
> >
> > if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
> > --
> > 2.51.0.760.g7b8bcc2412-goog
>
> Sorry for letting you go so far before speaking up (I had to test what
> I believed to be true, and had hoped that meanwhile one of your many
> illustrious reviewers would say so first, but no): it's a NAK from me.
>
> These are not off-by-ones: at the point of these checks, it is not
> known whether an additional map/vma will have to be added, or the
> addition will be merged into an existing map/vma. So the checks
> err on the lenient side, letting you get perhaps one more than the
> sysctl said, but not allowing any more than that.
>
> Which is all that matters, isn't it? Limiting unrestrained growth.
>
> In this patch you're proposing to change it from erring on the
> lenient side to erring on the strict side - prohibiting merges
> at the limit which have been allowed for many years.
>
> Whatever one thinks about the merits of erring on the lenient versus
> erring on the strict side, I see no reason to make this change now,
> and most certainly not with a Fixes Cc: stable. There is no danger
> in the current behaviour; there is danger in prohibiting what was
> allowed before.
Thanks Hugh.
I'm left wondering if the issue is that we are checking in the wrong
location. That is, should we be checking so early in the process or
later when we know the count adjustment?
But then again, later we may be in mid-operation and find out we're out
of room. Other places are even more lenient and allow us to exceed the
count for a potential limited time, and we really don't know what's
going to happen until we examine what already exists.. So it seems like
a lot of effort to avoid one extra vma.
>
> As to the remainder of your series: I have to commend you for doing
> a thorough and well-presented job, but I cannot myself see the point in
> changing 21 files for what almost amounts to a max_map_count subsystem.
> I call it misdirected effort, not at all to my taste, which prefers the
> straightforward checks already there; but accept that my taste may be
> out of fashion, so won't stand in the way if others think it worthwhile.
I'm not sure which way I favour, it does seem like a large change to
avoid an issue that never existed.
In either case, it seems like a good idea to adjust the comments to
state that the count may not change.
Thanks,
Liam
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-14 17:51 ` Liam R. Howlett
@ 2025-10-15 9:10 ` Lorenzo Stoakes
0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2025-10-15 9:10 UTC (permalink / raw)
To: Liam R. Howlett, Hugh Dickins, Kalesh Singh, akpm, minchan, david,
rppt, pfalcato, kernel-team, android-mm, stable, SeongJae Park,
Alexander Viro, Christian Brauner, Jan Kara, Kees Cook,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Shuah Khan,
linux-kernel, linux-fsdevel, linux-mm, linux-trace-kernel,
linux-kselftest
On Tue, Oct 14, 2025 at 01:51:31PM -0400, Liam R. Howlett wrote:
> In either case, it seems like a good idea to adjust the comments to
> state that the count may not change.
Yes, I am busy trying to catch up with my backlog (do intend to look at
this more closely) but wanted to say - let's please document what we're
doing here in a comment.
It's my firm conviction that we in mm need to eliminate as many instances
of 'just have to know' or implicit knowledge that is either undocumented or
tricky to find (buried in a commit message from X yrs ago for instance).
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-14 6:28 ` Hugh Dickins
2025-10-14 17:51 ` Liam R. Howlett
@ 2025-10-14 21:33 ` Kalesh Singh
2025-10-16 5:05 ` Hugh Dickins
2025-10-17 9:00 ` Lorenzo Stoakes
2025-10-17 9:00 ` Lorenzo Stoakes
2 siblings, 2 replies; 12+ messages in thread
From: Kalesh Singh @ 2025-10-14 21:33 UTC (permalink / raw)
To: Hugh Dickins
Cc: akpm, minchan, lorenzo.stoakes, david, Liam.Howlett, rppt,
pfalcato, kernel-team, android-mm, stable, SeongJae Park,
Alexander Viro, Christian Brauner, Jan Kara, Kees Cook,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Shuah Khan,
linux-kernel, linux-fsdevel, linux-mm, linux-trace-kernel,
linux-kselftest
On Mon, Oct 13, 2025 at 11:28 PM Hugh Dickins <hughd@google.com> wrote:
>
> On Mon, 13 Oct 2025, Kalesh Singh wrote:
>
> > The VMA count limit check in do_mmap() and do_brk_flags() uses a
> > strict inequality (>), which allows a process's VMA count to exceed
> > the configured sysctl_max_map_count limit by one.
> >
> > A process with mm->map_count == sysctl_max_map_count will incorrectly
> > pass this check and then exceed the limit upon allocation of a new VMA
> > when its map_count is incremented.
> >
> > Other VMA allocation paths, such as split_vma(), already use the
> > correct, inclusive (>=) comparison.
> >
> > Fix this bug by changing the comparison to be inclusive in do_mmap()
> > and do_brk_flags(), bringing them in line with the correct behavior
> > of other allocation paths.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: <stable@vger.kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Mike Rapoport <rppt@kernel.org>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Pedro Falcato <pfalcato@suse.de>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > Acked-by: SeongJae Park <sj@kernel.org>
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> > - Collect Reviewed-by and Acked-by tags.
> >
> > Changes in v2:
> > - Fix mmap check, per Pedro
> >
> > mm/mmap.c | 2 +-
> > mm/vma.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 644f02071a41..da2cbdc0f87b 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -374,7 +374,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > return -EOVERFLOW;
> >
> > /* Too many mappings? */
> > - if (mm->map_count > sysctl_max_map_count)
> > + if (mm->map_count >= sysctl_max_map_count)
> > return -ENOMEM;
> >
> > /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index a2e1ae954662..fba68f13e628 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -2797,7 +2797,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
> > return -ENOMEM;
> >
> > - if (mm->map_count > sysctl_max_map_count)
> > + if (mm->map_count >= sysctl_max_map_count)
> > return -ENOMEM;
> >
> > if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
> > --
> > 2.51.0.760.g7b8bcc2412-goog
>
> Sorry for letting you go so far before speaking up (I had to test what
> I believed to be true, and had hoped that meanwhile one of your many
> illustrious reviewers would say so first, but no): it's a NAK from me.
>
> These are not off-by-ones: at the point of these checks, it is not
> known whether an additional map/vma will have to be added, or the
> addition will be merged into an existing map/vma. So the checks
> err on the lenient side, letting you get perhaps one more than the
> sysctl said, but not allowing any more than that.
>
> Which is all that matters, isn't it? Limiting unrestrained growth.
>
> In this patch you're proposing to change it from erring on the
> lenient side to erring on the strict side - prohibiting merges
> at the limit which have been allowed for many years.
>
> Whatever one thinks about the merits of erring on the lenient versus
> erring on the strict side, I see no reason to make this change now,
> and most certainly not with a Fixes Cc: stable. There is no danger
> in the current behaviour; there is danger in prohibiting what was
> allowed before.
>
> As to the remainder of your series: I have to commend you for doing
> a thorough and well-presented job, but I cannot myself see the point in
> changing 21 files for what almost amounts to a max_map_count subsystem.
> I call it misdirected effort, not at all to my taste, which prefers the
> straightforward checks already there; but accept that my taste may be
> out of fashion, so won't stand in the way if others think it worthwhile.
Hi Hugh,
Thanks for the detailed review and for taking the time to test the behavior.
You've raised a valid point. I wasn't aware of the history behind the
lenient check for merges. The lack of a comment, like the one that
exists for exceeding the limit in munmap(), led me to misinterpret
this as an off-by-one bug. The convention makes sense if we consider
potential merges.
If it was in-fact the intended behavior, then I agree we should keep
it lenient. It would mean though, that munmap() being able to free a
VMA if a split is required (by permitting exceeding the limit by 1)
would not work in the case where we have already exceeded the limit. I
find this to be inconsistent but this is also the current behavior ...
I will drop this patch and the patch that introduces the
vma_count_remaining() helper, as I see your point about it potentially
being unnecessary overhead.
Regarding your feedback on the rest of the series, I believe the 3
remaining patches are still valuable on their own.
- The selftest adds a comprehensive tests for VMA operations at the
sysctl_max_map_count limit. This will self-document the exact behavior
expected, including the leniency for potential merges that you
highlighted, preventing the kind of misunderstanding that led to my
initial patch.
- The rename of mm_struct->map_count to vma_count, is a
straightforward cleanup for code clarity that makes the purpose of the
field more explicit.
- The tracepoint adds needed observability for telemetry, allowing us
to see when processes are failing in the field due to VMA count limit.
The selftest, is what makes up a large portion of the diff you
sited, and with vma_count_remaining() gone the series will not touch
nearly as many files.
Would this be an acceptable path forward?
Thanks,
Kalesh
>
> Hugh
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-14 21:33 ` Kalesh Singh
@ 2025-10-16 5:05 ` Hugh Dickins
2025-10-16 17:19 ` Kalesh Singh
2025-10-17 9:00 ` Lorenzo Stoakes
1 sibling, 1 reply; 12+ messages in thread
From: Hugh Dickins @ 2025-10-16 5:05 UTC (permalink / raw)
To: Kalesh Singh
Cc: Hugh Dickins, akpm, minchan, lorenzo.stoakes, david, Liam.Howlett,
rppt, pfalcato, kernel-team, android-mm, stable, SeongJae Park,
Alexander Viro, Christian Brauner, Jan Kara, Kees Cook,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Shuah Khan,
linux-kernel, linux-fsdevel, linux-mm, linux-trace-kernel,
linux-kselftest
[-- Attachment #1: Type: text/plain, Size: 4956 bytes --]
On Tue, 14 Oct 2025, Kalesh Singh wrote:
> On Mon, Oct 13, 2025 at 11:28 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > Sorry for letting you go so far before speaking up (I had to test what
> > I believed to be true, and had hoped that meanwhile one of your many
> > illustrious reviewers would say so first, but no): it's a NAK from me.
> >
> > These are not off-by-ones: at the point of these checks, it is not
> > known whether an additional map/vma will have to be added, or the
> > addition will be merged into an existing map/vma. So the checks
> > err on the lenient side, letting you get perhaps one more than the
> > sysctl said, but not allowing any more than that.
> >
> > Which is all that matters, isn't it? Limiting unrestrained growth.
> >
> > In this patch you're proposing to change it from erring on the
> > lenient side to erring on the strict side - prohibiting merges
> > at the limit which have been allowed for many years.
> >
> > Whatever one thinks about the merits of erring on the lenient versus
> > erring on the strict side, I see no reason to make this change now,
> > and most certainly not with a Fixes Cc: stable. There is no danger
> > in the current behaviour; there is danger in prohibiting what was
> > allowed before.
> >
> > As to the remainder of your series: I have to commend you for doing
> > a thorough and well-presented job, but I cannot myself see the point in
> > changing 21 files for what almost amounts to a max_map_count subsystem.
> > I call it misdirected effort, not at all to my taste, which prefers the
> > straightforward checks already there; but accept that my taste may be
> > out of fashion, so won't stand in the way if others think it worthwhile.
>
> Hi Hugh,
>
> Thanks for the detailed review and for taking the time to test the behavior.
>
> You've raised a valid point. I wasn't aware of the history behind the
> lenient check for merges. The lack of a comment, like the one that
> exists for exceeding the limit in munmap(), led me to misinterpret
> this as an off-by-one bug. The convention makes sense if we consider
> potential merges.
Yes, a comment there would be helpful (and I doubt it's worth more
than adding a comment); but I did not understand at all, Liam's
suggestion for the comment "to state that the count may not change".
>
> If it was in-fact the intended behavior, then I agree we should keep
> it lenient. It would mean though, that munmap() being able to free a
> VMA if a split is required (by permitting exceeding the limit by 1)
> would not work in the case where we have already exceeded the limit. I
> find this to be inconsistent but this is also the current behavior ...
You're saying that once we go one over the limit, say with a new mmap,
an munmap check makes it impossible to munmap that or any other vma?
If that's so, I do agree with you, that's nasty, and I would hate any
new code to behave that way. In code that's survived as long as this
without troubling anyone, I'm not so sure: but if it's easily fixed
(a more lenient check at the munmap end?) that would seem worthwhile.
Ah, but reading again, you say "if a split is required": I guess
munmapping the whole vma has no problem; and it's fine for a middle
munmap, splitting into three before munmapping the middle, to fail.
I suppose it would be nicer if munmaping start or end succeeeded,
but I don't think that matters very much in this case.
>
> I will drop this patch and the patch that introduces the
> vma_count_remaining() helper, as I see your point about it potentially
> being unnecessary overhead.
>
> Regarding your feedback on the rest of the series, I believe the 3
> remaining patches are still valuable on their own.
>
> - The selftest adds a comprehensive tests for VMA operations at the
> sysctl_max_map_count limit. This will self-document the exact behavior
> expected, including the leniency for potential merges that you
> highlighted, preventing the kind of misunderstanding that led to my
> initial patch.
>
> - The rename of mm_struct->map_count to vma_count, is a
> straightforward cleanup for code clarity that makes the purpose of the
> field more explicit.
>
> - The tracepoint adds needed observability for telemetry, allowing us
> to see when processes are failing in the field due to VMA count limit.
>
> The selftest, is what makes up a large portion of the diff you
> sited, and with vma_count_remaining() gone the series will not touch
> nearly as many files.
>
> Would this be an acceptable path forward?
Possibly, if others like it: my concern was to end a misunderstanding
(I'm generally much too slow to get involved in cleanups).
Though given that the sysctl is named "max_map_count", I'm not very
keen on renaming everything else from map_count to vma_count
(and of course I'm not suggesting to rename the sysctl).
Hugh
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-16 5:05 ` Hugh Dickins
@ 2025-10-16 17:19 ` Kalesh Singh
2025-10-16 19:15 ` David Hildenbrand
0 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2025-10-16 17:19 UTC (permalink / raw)
To: Hugh Dickins
Cc: akpm, minchan, lorenzo.stoakes, david, Liam.Howlett, rppt,
pfalcato, kernel-team, android-mm, stable, SeongJae Park,
Alexander Viro, Christian Brauner, Jan Kara, Kees Cook,
Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Jann Horn,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Ben Segall, Mel Gorman, Valentin Schneider, Shuah Khan,
linux-kernel, linux-fsdevel, linux-mm, linux-trace-kernel,
linux-kselftest
On Wed, Oct 15, 2025 at 10:05 PM Hugh Dickins <hughd@google.com> wrote:
>
> On Tue, 14 Oct 2025, Kalesh Singh wrote:
> > On Mon, Oct 13, 2025 at 11:28 PM Hugh Dickins <hughd@google.com> wrote:
> > >
> > > Sorry for letting you go so far before speaking up (I had to test what
> > > I believed to be true, and had hoped that meanwhile one of your many
> > > illustrious reviewers would say so first, but no): it's a NAK from me.
> > >
> > > These are not off-by-ones: at the point of these checks, it is not
> > > known whether an additional map/vma will have to be added, or the
> > > addition will be merged into an existing map/vma. So the checks
> > > err on the lenient side, letting you get perhaps one more than the
> > > sysctl said, but not allowing any more than that.
> > >
> > > Which is all that matters, isn't it? Limiting unrestrained growth.
> > >
> > > In this patch you're proposing to change it from erring on the
> > > lenient side to erring on the strict side - prohibiting merges
> > > at the limit which have been allowed for many years.
> > >
> > > Whatever one thinks about the merits of erring on the lenient versus
> > > erring on the strict side, I see no reason to make this change now,
> > > and most certainly not with a Fixes Cc: stable. There is no danger
> > > in the current behaviour; there is danger in prohibiting what was
> > > allowed before.
> > >
> > > As to the remainder of your series: I have to commend you for doing
> > > a thorough and well-presented job, but I cannot myself see the point in
> > > changing 21 files for what almost amounts to a max_map_count subsystem.
> > > I call it misdirected effort, not at all to my taste, which prefers the
> > > straightforward checks already there; but accept that my taste may be
> > > out of fashion, so won't stand in the way if others think it worthwhile.
> >
> > Hi Hugh,
> >
> > Thanks for the detailed review and for taking the time to test the behavior.
> >
> > You've raised a valid point. I wasn't aware of the history behind the
> > lenient check for merges. The lack of a comment, like the one that
> > exists for exceeding the limit in munmap(), led me to misinterpret
> > this as an off-by-one bug. The convention makes sense if we consider
> > potential merges.
>
> Yes, a comment there would be helpful (and I doubt it's worth more
> than adding a comment); but I did not understand at all, Liam's
> suggestion for the comment "to state that the count may not change".
>
> >
> > If it was in-fact the intended behavior, then I agree we should keep
> > it lenient. It would mean though, that munmap() being able to free a
> > VMA if a split is required (by permitting exceeding the limit by 1)
> > would not work in the case where we have already exceeded the limit. I
> > find this to be inconsistent but this is also the current behavior ...
>
> You're saying that once we go one over the limit, say with a new mmap,
> an munmap check makes it impossible to munmap that or any other vma?
>
> If that's so, I do agree with you, that's nasty, and I would hate any
> new code to behave that way. In code that's survived as long as this
> without troubling anyone, I'm not so sure: but if it's easily fixed
> (a more lenient check at the munmap end?) that would seem worthwhile.
>
> Ah, but reading again, you say "if a split is required": I guess
> munmapping the whole vma has no problem; and it's fine for a middle
> munmap, splitting into three before munmapping the middle, to fail.
> I suppose it would be nicer if munmaping start or end succeeeded,
> but I don't think that matters very much in this case.
>
Yes, your understanding is correct. I meant that currently, we allow
for an munmap() requiring a single split to succeed even if it will
temporarily exceed the limit by one, as immediately after we will be
removing one of those VMAs. However, if the process has already
exceeded the limit, say, due to a non-merging mmap(), then an munmap()
requiring a split will fail. It's not a big issue, but I found it
inconsistent that this succeeds in some cases and not in others.
> >
> > I will drop this patch and the patch that introduces the
> > vma_count_remaining() helper, as I see your point about it potentially
> > being unnecessary overhead.
> >
> > Regarding your feedback on the rest of the series, I believe the 3
> > remaining patches are still valuable on their own.
> >
> > - The selftest adds a comprehensive tests for VMA operations at the
> > sysctl_max_map_count limit. This will self-document the exact behavior
> > expected, including the leniency for potential merges that you
> > highlighted, preventing the kind of misunderstanding that led to my
> > initial patch.
> >
> > - The rename of mm_struct->map_count to vma_count, is a
> > straightforward cleanup for code clarity that makes the purpose of the
> > field more explicit.
> >
> > - The tracepoint adds needed observability for telemetry, allowing us
> > to see when processes are failing in the field due to VMA count limit.
> >
> > The selftest, is what makes up a large portion of the diff you
> > sited, and with vma_count_remaining() gone the series will not touch
> > nearly as many files.
> >
> > Would this be an acceptable path forward?
>
> Possibly, if others like it: my concern was to end a misunderstanding
> (I'm generally much too slow to get involved in cleanups).
>
> Though given that the sysctl is named "max_map_count", I'm not very
> keen on renaming everything else from map_count to vma_count
> (and of course I'm not suggesting to rename the sysctl).
I still believe vma_count is a clearer name for the field, given some
existing comments already refer to it as vma count. The inconsistency
between vma_count and sysctl_max_map_count can be abstracted away; and
the sysctl made non-global.
I'll wait for feedback form others on how to proceed.
Thanks for the thorough review and discussion.
-- Kalesh
>
> Hugh
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-16 17:19 ` Kalesh Singh
@ 2025-10-16 19:15 ` David Hildenbrand
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2025-10-16 19:15 UTC (permalink / raw)
To: Kalesh Singh, Hugh Dickins
Cc: akpm, minchan, lorenzo.stoakes, Liam.Howlett, rppt, pfalcato,
kernel-team, android-mm, stable, SeongJae Park, Alexander Viro,
Christian Brauner, Jan Kara, Kees Cook, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Shuah Khan, linux-kernel,
linux-fsdevel, linux-mm, linux-trace-kernel, linux-kselftest
>>> Would this be an acceptable path forward?
>>
>> Possibly, if others like it: my concern was to end a misunderstanding
>> (I'm generally much too slow to get involved in cleanups).
>>
>> Though given that the sysctl is named "max_map_count", I'm not very
>> keen on renaming everything else from map_count to vma_count
>> (and of course I'm not suggesting to rename the sysctl).
>
> I still believe vma_count is a clearer name for the field, given some
> existing comments already refer to it as vma count. The inconsistency
> between vma_count and sysctl_max_map_count can be abstracted away; and
> the sysctl made non-global.
Yes, to me that part makes perfect sense (taste differs as we know).
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-14 21:33 ` Kalesh Singh
2025-10-16 5:05 ` Hugh Dickins
@ 2025-10-17 9:00 ` Lorenzo Stoakes
1 sibling, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2025-10-17 9:00 UTC (permalink / raw)
To: Kalesh Singh
Cc: Hugh Dickins, akpm, minchan, david, Liam.Howlett, rppt, pfalcato,
kernel-team, android-mm, stable, SeongJae Park, Alexander Viro,
Christian Brauner, Jan Kara, Kees Cook, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Shuah Khan, linux-kernel,
linux-fsdevel, linux-mm, linux-trace-kernel, linux-kselftest
On Tue, Oct 14, 2025 at 02:33:05PM -0700, Kalesh Singh wrote:
> I will drop this patch and the patch that introduces the
> vma_count_remaining() helper, as I see your point about it potentially
> being unnecessary overhead.
Please do not drop the helper thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-14 6:28 ` Hugh Dickins
2025-10-14 17:51 ` Liam R. Howlett
2025-10-14 21:33 ` Kalesh Singh
@ 2025-10-17 9:00 ` Lorenzo Stoakes
2025-10-17 21:41 ` Kalesh Singh
2 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Stoakes @ 2025-10-17 9:00 UTC (permalink / raw)
To: Hugh Dickins
Cc: Kalesh Singh, akpm, minchan, david, Liam.Howlett, rppt, pfalcato,
kernel-team, android-mm, stable, SeongJae Park, Alexander Viro,
Christian Brauner, Jan Kara, Kees Cook, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Shuah Khan, linux-kernel,
linux-fsdevel, linux-mm, linux-trace-kernel, linux-kselftest
On Mon, Oct 13, 2025 at 11:28:16PM -0700, Hugh Dickins wrote:
> On Mon, 13 Oct 2025, Kalesh Singh wrote:
>
> > The VMA count limit check in do_mmap() and do_brk_flags() uses a
> > strict inequality (>), which allows a process's VMA count to exceed
> > the configured sysctl_max_map_count limit by one.
> >
> > A process with mm->map_count == sysctl_max_map_count will incorrectly
> > pass this check and then exceed the limit upon allocation of a new VMA
> > when its map_count is incremented.
> >
> > Other VMA allocation paths, such as split_vma(), already use the
> > correct, inclusive (>=) comparison.
> >
> > Fix this bug by changing the comparison to be inclusive in do_mmap()
> > and do_brk_flags(), bringing them in line with the correct behavior
> > of other allocation paths.
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: <stable@vger.kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: Mike Rapoport <rppt@kernel.org>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Pedro Falcato <pfalcato@suse.de>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > Acked-by: SeongJae Park <sj@kernel.org>
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> > - Collect Reviewed-by and Acked-by tags.
> >
> > Changes in v2:
> > - Fix mmap check, per Pedro
> >
> > mm/mmap.c | 2 +-
> > mm/vma.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 644f02071a41..da2cbdc0f87b 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -374,7 +374,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > return -EOVERFLOW;
> >
> > /* Too many mappings? */
> > - if (mm->map_count > sysctl_max_map_count)
> > + if (mm->map_count >= sysctl_max_map_count)
> > return -ENOMEM;
> >
> > /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index a2e1ae954662..fba68f13e628 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -2797,7 +2797,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
> > return -ENOMEM;
> >
> > - if (mm->map_count > sysctl_max_map_count)
> > + if (mm->map_count >= sysctl_max_map_count)
> > return -ENOMEM;
> >
> > if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
> > --
> > 2.51.0.760.g7b8bcc2412-goog
>
> Sorry for letting you go so far before speaking up (I had to test what
> I believed to be true, and had hoped that meanwhile one of your many
> illustrious reviewers would say so first, but no): it's a NAK from me.
>
> These are not off-by-ones: at the point of these checks, it is not
> known whether an additional map/vma will have to be added, or the
> addition will be merged into an existing map/vma. So the checks
> err on the lenient side, letting you get perhaps one more than the
> sysctl said, but not allowing any more than that.
>
> Which is all that matters, isn't it? Limiting unrestrained growth.
>
> In this patch you're proposing to change it from erring on the
> lenient side to erring on the strict side - prohibiting merges
> at the limit which have been allowed for many years.
>
> Whatever one thinks about the merits of erring on the lenient versus
> erring on the strict side, I see no reason to make this change now,
> and most certainly not with a Fixes Cc: stable. There is no danger
> in the current behaviour; there is danger in prohibiting what was
> allowed before.
Thanks for highlighting this, but this is something that people just 'had
to know'. If so many maintainers are unaware that this is a requirement,
this is a sign that this is very unclear.
So as I said to Kalesh elsewhere, this is something we really do need to
comment very clearly.
Or perhaps have as a helper function to _very explicitly_ show that we're
making this allowance.
I do agree we should err on the side of caution, though if you're at a
point where you're _about_ to hit the map count limit you're already
screwed really.
But for the sake of avoiding breaking people who are doing crazy things (or
perhaps I'm not imaginative enough here :) yes let's leave it as is.
But I really _do not_ want to see this global exported so, I think an
appropriate helper function or use of the newly introduced one with a
comment are vital.
>
> As to the remainder of your series: I have to commend you for doing
> a thorough and well-presented job, but I cannot myself see the point in
> changing 21 files for what almost amounts to a max_map_count subsystem.
> I call it misdirected effort, not at all to my taste, which prefers the
> straightforward checks already there; but accept that my taste may be
> out of fashion, so won't stand in the way if others think it worthwhile.
I disagree very much, I see value here.
Avoiding referencing an ugly global is a big win in itself, but
self-documenting code has huge value.
In general mm has had a habit of hiding information as to how things work
for a long time (when writing the book I really had to decode a _lot_ of
this kind of thing).
I think it's time we moved away from this, and tried to make the code as
clear as possible.
>
> Hugh
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-17 9:00 ` Lorenzo Stoakes
@ 2025-10-17 21:41 ` Kalesh Singh
2025-10-20 11:32 ` Lorenzo Stoakes
0 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2025-10-17 21:41 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Hugh Dickins, akpm, minchan, david, Liam.Howlett, rppt, pfalcato,
kernel-team, android-mm, stable, SeongJae Park, Alexander Viro,
Christian Brauner, Jan Kara, Kees Cook, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Shuah Khan, linux-kernel,
linux-fsdevel, linux-mm, linux-trace-kernel, linux-kselftest
On Fri, Oct 17, 2025 at 2:00 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Mon, Oct 13, 2025 at 11:28:16PM -0700, Hugh Dickins wrote:
> > On Mon, 13 Oct 2025, Kalesh Singh wrote:
> >
> > > The VMA count limit check in do_mmap() and do_brk_flags() uses a
> > > strict inequality (>), which allows a process's VMA count to exceed
> > > the configured sysctl_max_map_count limit by one.
> > >
> > > A process with mm->map_count == sysctl_max_map_count will incorrectly
> > > pass this check and then exceed the limit upon allocation of a new VMA
> > > when its map_count is incremented.
> > >
> > > Other VMA allocation paths, such as split_vma(), already use the
> > > correct, inclusive (>=) comparison.
> > >
> > > Fix this bug by changing the comparison to be inclusive in do_mmap()
> > > and do_brk_flags(), bringing them in line with the correct behavior
> > > of other allocation paths.
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > Cc: <stable@vger.kernel.org>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: David Hildenbrand <david@redhat.com>
> > > Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > Cc: Mike Rapoport <rppt@kernel.org>
> > > Cc: Minchan Kim <minchan@kernel.org>
> > > Cc: Pedro Falcato <pfalcato@suse.de>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > > Acked-by: SeongJae Park <sj@kernel.org>
> > > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > ---
> > >
> > > Changes in v3:
> > > - Collect Reviewed-by and Acked-by tags.
> > >
> > > Changes in v2:
> > > - Fix mmap check, per Pedro
> > >
> > > mm/mmap.c | 2 +-
> > > mm/vma.c | 2 +-
> > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 644f02071a41..da2cbdc0f87b 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -374,7 +374,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > return -EOVERFLOW;
> > >
> > > /* Too many mappings? */
> > > - if (mm->map_count > sysctl_max_map_count)
> > > + if (mm->map_count >= sysctl_max_map_count)
> > > return -ENOMEM;
> > >
> > > /*
> > > diff --git a/mm/vma.c b/mm/vma.c
> > > index a2e1ae954662..fba68f13e628 100644
> > > --- a/mm/vma.c
> > > +++ b/mm/vma.c
> > > @@ -2797,7 +2797,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
> > > return -ENOMEM;
> > >
> > > - if (mm->map_count > sysctl_max_map_count)
> > > + if (mm->map_count >= sysctl_max_map_count)
> > > return -ENOMEM;
> > >
> > > if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT))
> > > --
> > > 2.51.0.760.g7b8bcc2412-goog
> >
> > Sorry for letting you go so far before speaking up (I had to test what
> > I believed to be true, and had hoped that meanwhile one of your many
> > illustrious reviewers would say so first, but no): it's a NAK from me.
> >
> > These are not off-by-ones: at the point of these checks, it is not
> > known whether an additional map/vma will have to be added, or the
> > addition will be merged into an existing map/vma. So the checks
> > err on the lenient side, letting you get perhaps one more than the
> > sysctl said, but not allowing any more than that.
> >
> > Which is all that matters, isn't it? Limiting unrestrained growth.
> >
> > In this patch you're proposing to change it from erring on the
> > lenient side to erring on the strict side - prohibiting merges
> > at the limit which have been allowed for many years.
> >
> > Whatever one thinks about the merits of erring on the lenient versus
> > erring on the strict side, I see no reason to make this change now,
> > and most certainly not with a Fixes Cc: stable. There is no danger
> > in the current behaviour; there is danger in prohibiting what was
> > allowed before.
>
> Thanks for highlighting this, but this is something that people just 'had
> to know'. If so many maintainers are unaware that this is a requirement,
> this is a sign that this is very unclear.
>
> So as I said to Kalesh elsewhere, this is something we really do need to
> comment very clearly.
>
> Or perhaps have as a helper function to _very explicitly_ show that we're
> making this allowance.
>
> I do agree we should err on the side of caution, though if you're at a
> point where you're _about_ to hit the map count limit you're already
> screwed really.
>
> But for the sake of avoiding breaking people who are doing crazy things (or
> perhaps I'm not imaginative enough here :) yes let's leave it as is.
>
> But I really _do not_ want to see this global exported so, I think an
> appropriate helper function or use of the newly introduced one with a
> comment are vital.
>
> >
> > As to the remainder of your series: I have to commend you for doing
> > a thorough and well-presented job, but I cannot myself see the point in
> > changing 21 files for what almost amounts to a max_map_count subsystem.
> > I call it misdirected effort, not at all to my taste, which prefers the
> > straightforward checks already there; but accept that my taste may be
> > out of fashion, so won't stand in the way if others think it worthwhile.
>
> I disagree very much, I see value here.
>
> Avoiding referencing an ugly global is a big win in itself, but
> self-documenting code has huge value.
>
> In general mm has had a habit of hiding information as to how things work
> for a long time (when writing the book I really had to decode a _lot_ of
> this kind of thing).
>
> I think it's time we moved away from this, and tried to make the code as
> clear as possible.
Hi all,
Thanks, Lorenzo, for the feedback and support.
The consensus from the discussion appears to be:
- Drop this patch keeping the existing check lenient, but make it more
obvious and clear that this is the intended behavior.
- Keep the vma_count_remaining() or another helper and abstracts away
the direct use of the global sysctl_max_map_count.
- Keep the rename of map_count to vma_count
- Keep the selftest and tracepoint patches, which are important for
documenting the behavior and providing observability.
If there are no objections, I'll plan to send out a new version of the series.
Thanks,
Kalesh
>
> >
> > Hugh
>
> Cheers, Lorenzo
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3 1/5] mm: fix off-by-one error in VMA count limit checks
2025-10-17 21:41 ` Kalesh Singh
@ 2025-10-20 11:32 ` Lorenzo Stoakes
0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2025-10-20 11:32 UTC (permalink / raw)
To: Kalesh Singh
Cc: Hugh Dickins, akpm, minchan, david, Liam.Howlett, rppt, pfalcato,
kernel-team, android-mm, stable, SeongJae Park, Alexander Viro,
Christian Brauner, Jan Kara, Kees Cook, Vlastimil Babka,
Suren Baghdasaryan, Michal Hocko, Jann Horn, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Shuah Khan, linux-kernel,
linux-fsdevel, linux-mm, linux-trace-kernel, linux-kselftest
On Fri, Oct 17, 2025 at 02:41:00PM -0700, Kalesh Singh wrote:
>
> Hi all,
>
> Thanks, Lorenzo, for the feedback and support.
>
> The consensus from the discussion appears to be:
>
> - Drop this patch keeping the existing check lenient, but make it more
> obvious and clear that this is the intended behavior.
> - Keep the vma_count_remaining() or another helper and abstracts away
> the direct use of the global sysctl_max_map_count.
> - Keep the rename of map_count to vma_count
> - Keep the selftest and tracepoint patches, which are important for
> documenting the behavior and providing observability.
>
> If there are no objections, I'll plan to send out a new version of the series.
>
> Thanks,
> Kalesh
>
Ack, sounds good!
^ permalink raw reply [flat|nested] 12+ messages in thread