Linux s390 Architecture development
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	Jann Horn <jannh@google.com>,
	akpm@linux-foundation.org, hca@linux.ibm.com,
	linux-s390@vger.kernel.org, david@kernel.org, mhocko@suse.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	surenb@google.com, timmurray@google.com
Subject: Re: [PATCH v3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Fri, 15 May 2026 22:47:06 -0700	[thread overview]
Message-ID: <aggE2haQ7yGeHPeF@google.com> (raw)
In-Reply-To: <CAHk-=wiHfzbufXD2nuc9BhOVp-pYEmnVUYeThKmfZRXLU0kJ7w@mail.gmail.com>

On Fri, May 15, 2026 at 04:45:38PM -0700, Linus Torvalds wrote:
> On Fri, 15 May 2026 at 15:33, Minchan Kim <minchan@kernel.org> wrote:
> >
> > I hope thisclarifies the motivation and mechanics behind this issue.
> 
> This still seems very hacky and a complete special case for one odd situation.

Hi Linus,

Thank you for looking into this issue and sharing your thoughts.

Regarding proc_mem_open(), it actually operates very close to what you suggested.
It acquires a reference to the mm_struct itself via mmgrab() but immediately
unpins the address space memory via mmput(). Thus, no long-term mm_users
reference is held across the open file descriptor.

The latency issue occurs during seqfile iteration (m_start/m_stop) in
smaps/maps, or during get_cmdline() and ptrace_access_vm(), where the reader
temporarily acquires mm_users via mmget_not_zero() or get_task_mm().

Because these monitoring processes or background readers often run with lower
priority on heavily loaded systems under memory pressure, their temporary
mmget() reference can easily be delayed or preempted for extended periods.

When the victim process exits, its memory teardown (exit_mmap) is blocked by
this stalled reference. Because any kernel interface acquiring mmget() can
potentially cause this delay under heavy load, the issue appears more general
than just a single file interface.

This is why I felt allowing process_mrelease() to directly acquire the dying
address space could be a clean and robust way to resolve this expedited reclaim
issue, without modifying individual callers across the kernel. I would highly
appreciate your thoughts on this perspective.

> 
> This all sounds like it's just because smap is a pig.
> 
> And yes, smap *is* a pig, but it should be trivial to just fix smap
> for this case - fix the problem spot, don't add new horrid logic
> elsewhere.
> 
> I really think the fix is to fix smap instead.
> 
> And I think smap is doing odd things. For example, it does
> 
>    pid_smaps_open -> do_maps_open -> proc_mem_open
> 
> and that proc_mem_open() takes that long-term ref to the mm. And then
> does various memory allocations - and copying data to user space -
> under that long-term ref, which is presumably what causes all the
> latency issues.
> 
> But it doesn't actually seem to *need* a long-term ref to the mm. The
> seqfile interface is designed so that it should all be chunkable, and
> the locks and refs should be done at m_start/m_end time.
> 
> And the smap / maps m_start and m_end functions already *almost* seem
> to do that. They literally look up the task again with
> 
>         priv->task = get_proc_task(priv->inode);
> 
> etc, but then they do that odd
> 
>         lock_ctx = &priv->lock_ctx;
>         mm = lock_ctx->mm;
>         if (!mm || !mmget_not_zero(mm)) {
>                 put_task_struct(priv->task);
>                 priv->task = NULL;
>                 return NULL;
>         }
> 
> dance (where lock_ctx->mm is literally that long-term ref we hold).
> 
> And I don't see why they need to do any of this. I think it's all a
> historical accident.
> 
> Because I think it could look up the mm from the task pointer every
> time, without holding that long-term ref from proc_mem_open() at all.
> 
> IOW, at open time, we could save off the "this is the mm I opened",
> but *without* any refcount, and then just verify that "yes, the task
> mm still matches". No long-term refs anywhere.
> 
> Yes, yes, we'd need some sequence counter for when the mm changes due
> to execve, but *that* should be absolutely trivial.
> 
> And wouldn't that make all of this go away entirely? And probably
> clean up the code at the same time?
> 
> I think the only reason "proc_mem_open()" does what it does was that
> it was simple. Not because it was a good idea.
> 
>                    Linus

  parent reply	other threads:[~2026-05-16  5:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11 21:42 [PATCH v3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
2026-05-15 14:41 ` Christian Brauner
2026-05-15 15:49   ` Oleg Nesterov
2026-05-15 20:15   ` Oleg Nesterov
2026-05-15 22:33     ` Minchan Kim
2026-05-15 23:45       ` Linus Torvalds
2026-05-16  0:00         ` Linus Torvalds
2026-05-16  5:47         ` Minchan Kim [this message]
2026-05-15 20:42   ` Suren Baghdasaryan
2026-05-15 20:52   ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aggE2haQ7yGeHPeF@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=david@kernel.org \
    --cc=hca@linux.ibm.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox