From: Mateusz Guzik <mjguzik@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Dennis Zhou <dennis@kernel.org>,
	linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shakeelb@google.com,
	linux-mm@kvack.org
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
Date: Tue, 22 Aug 2023 16:24:56 +0200
Message-ID: <CAGudoHHe5nzRTuj4G1fphD+JJ02TE5BnHEDwFm=-W6DoEj2qVQ@mail.gmail.com>
In-Reply-To: <20230822095154.7cr5ofogw552z3jk@quack3>

On 8/22/23, Jan Kara <jack@suse.cz> wrote:
> On Tue 22-08-23 00:29:49, Mateusz Guzik wrote:
>> On 8/21/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
>> > True Fix(tm) is a longer story.
>> >
>> > Maybe let's sort out this patchset first, whichever way. :)
>> >
>>
>> So I found the discussion around the original patch with a perf
>> regression report.
>>
>> https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/
>>
>> The reporter suggests dodging the problem by only allocating per-cpu
>> counters when the process goes multithreaded. Given that there are
>> still plenty of forever single-threaded procs out there, I think that
>> does sound like a great plan regardless of what happens with this
>> patchset.
>>
>> Almost all access is already done using dedicated routines, so this
>> should be an afternoon of churn to sort out, unless I missed a
>> showstopper. (Maybe there is no good place to stuff a flag or some
>> other indicator of the state of the counters?)
>>
>> That said I'll look into it some time this or next week.
>
> Good, just let me know how it went, I also wanted to start looking into
> this to come up with some concrete patches :). What I had in mind was that
> we could use 'counters == NULL' as an indication that the counter is still
> in 'single counter mode'.
>

In the current state there are only pointers to counters in mm_struct
and there is no storage for them in task_struct. So I don't think
merely null-checking the per-cpu stuff is going to cut it -- where
should the single-threaded counters land?

A bonus problem: tasks other than current can modify these counters, and
that needs to be safe against current playing with them at the same time
(and it would be a shame to require current to use atomics for its own
updates).

That said, my initial proposal adds a union:
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5e74ce4a28cd..ea70f0c08286 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -737,7 +737,11 @@ struct mm_struct {

                unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

-               struct percpu_counter rss_stat[NR_MM_COUNTERS];
+               union {
+                       struct percpu_counter rss_stat[NR_MM_COUNTERS];
+                       u64 *rss_stat_single;
+               };
+               bool    magic_flag_stuffed_elsewhere;

                struct linux_binfmt *binfmt;


Then for the single-threaded case an area is allocated for
NR_MM_COUNTERS counters * 2 -- the first set is updated without any
synchronisation by the current thread, while the second set is only
modified by others and is protected with mm->arg_lock. The lock
protects remote access to the union to begin with.
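
To make that more concrete, a rough sketch (the helper names are made
up, the flag is just the placeholder from the diff above, and I'm
ignoring init/teardown ordering details):

/* single-threaded mode: 2 * NR_MM_COUNTERS u64s, local set then remote set */
static int mm_counters_init_single(struct mm_struct *mm)
{
	mm->rss_stat_single = kcalloc(2 * NR_MM_COUNTERS, sizeof(u64),
				      GFP_KERNEL);
	if (!mm->rss_stat_single)
		return -ENOMEM;
	mm->magic_flag_stuffed_elsewhere = false;
	return 0;
}

/* current updating its own mm: no locking, only current ever flips the flag */
static void mm_counter_add_local(struct mm_struct *mm, int member, long value)
{
	if (mm->magic_flag_stuffed_elsewhere)
		percpu_counter_add(&mm->rss_stat[member], value);
	else
		mm->rss_stat_single[member] += value;
}

/* everybody else: both the mode and the remote set are stable under arg_lock */
static void mm_counter_add_remote(struct mm_struct *mm, int member, long value)
{
	spin_lock(&mm->arg_lock);
	if (mm->magic_flag_stuffed_elsewhere)
		percpu_counter_add(&mm->rss_stat[member], value);
	else
		mm->rss_stat_single[NR_MM_COUNTERS + member] += value;
	spin_unlock(&mm->arg_lock);
}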

Transition to per-CPU operation sets the magic flag (there is plenty
of spare space in mm_struct, I'll find a good home for it without
growing the struct). It would be a one-way street -- a process which
gets a bunch of threads and goes back to one stays with per-CPU.
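
The switch itself could look roughly like below (again only a sketch,
presumably done by current when it is about to spawn a second thread;
in particular I'm glossing over the gfp side of things -- GFP_ATOMIC is
only there so the percpu init can sit under arg_lock, which keeps
remote updaters out while the union changes meaning):

/* one-way switch from the single-threaded counters to per-CPU ones */
static int mm_counters_switch_to_percpu(struct mm_struct *mm)
{
	u64 vals[NR_MM_COUNTERS];
	u64 *single;
	int i;

	spin_lock(&mm->arg_lock);
	/* the union aliases the pointer, so stash it before any init */
	single = mm->rss_stat_single;
	for (i = 0; i < NR_MM_COUNTERS; i++)
		vals[i] = single[i] + single[NR_MM_COUNTERS + i];
	for (i = 0; i < NR_MM_COUNTERS; i++) {
		if (percpu_counter_init(&mm->rss_stat[i], vals[i], GFP_ATOMIC))
			goto unwind;
	}
	mm->magic_flag_stuffed_elsewhere = true;	/* never cleared */
	spin_unlock(&mm->arg_lock);
	kfree(single);
	return 0;
unwind:
	while (i-- > 0)
		percpu_counter_destroy(&mm->rss_stat[i]);
	mm->rss_stat_single = single;
	spin_unlock(&mm->arg_lock);
	return -ENOMEM;
}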

Then you get the true value of a given counter by adding up both sets.
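
A read would then be something like this (sketch; current could skip
the lock while still in single mode, but taking it makes the helper
safe for everybody):

static u64 mm_counter_read(struct mm_struct *mm, int member)
{
	u64 val;

	spin_lock(&mm->arg_lock);
	if (mm->magic_flag_stuffed_elsewhere) {
		spin_unlock(&mm->arg_lock);
		return percpu_counter_sum_positive(&mm->rss_stat[member]);
	}
	val = mm->rss_stat_single[member] +
	      mm->rss_stat_single[NR_MM_COUNTERS + member];
	spin_unlock(&mm->arg_lock);
	return val;
}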

arg_lock is sparingly used, so remote ops are not expected to contend
with anything. In fact their cost is going to go down compared to the
current code, since percpu summation takes a spinlock which also
disables interrupts.

Local ops should be about the same in cost as they are right now.

I might have missed some detail in the above description, but I think
the approach is decent.

-- 
Mateusz Guzik <mjguzik gmail.com>

