linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>,
	tj@kernel.org, cgroups@vger.kernel.org,  hannes@cmpxchg.org,
	lizefan.x@bytedance.com, longman@redhat.com,
	 kernel-team@cloudflare.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
Date: Thu, 18 Jul 2024 20:11:32 -0700	[thread overview]
Message-ID: <CAJD7tkZbOf7125mcuN-EQdn6MB=dEasHWpoLmY1p-KCjjRxGXQ@mail.gmail.com> (raw)
In-Reply-To: <k3aiufe36mb2re3fyfzam4hqdeshvbqcashxiyb5grn7w2iz2s@2oeaei6klok3>

On Thu, Jul 18, 2024 at 5:41 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> Hi Jesper,
>
> On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
> >
> [...]
> >
> >
> > Looking at the production numbers for the time the lock is held for level 0:
> >
> > @locked_time_level[0]:
> > [4M, 8M)     623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
> > [8M, 16M)    860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [16M, 32M)   295 |@@@@@@@@@@@@@@@@@                                   |
> > [32M, 64M)   275 |@@@@@@@@@@@@@@@@                                    |
> >
>
> Is it possible to get the above histogram for other levels as well? I
> know this is 12 numa node machine, how many total CPUs are there?
>
> > The time is in nanosec, so M corresponds to ms (milliseconds).
> >
> > With 36 flushes per second (as shown earlier) this is a flush every
> > 27.7ms.  It is not unreasonable (from above data) that the flush time
> > also spend 27ms, which means that we spend a full CPU second flushing.
> > That is spending too much time flushing.
>
> One idea to further reduce this time is more fine grained flush
> skipping. At the moment we either skip the whole flush or not. How
> about we make this decision per-cpu? We already have per-cpu updates
> data and if it is less than MEMCG_CHARGE_BATCH, skip flush on that cpu.

Good idea.

I think we would need a per-subsystem callback to decide whether we
want to flush the cgroup or not. This needs to happen in the core
rstat flushing code (not the memcg flushing code), as we need to make
sure we do not remove the cgroup from the per-cpu updated tree if we
don't flush it.

More generally, I think we should be able to have a "force" flush API
that skips all optimizations and ensures that a flush occurs. I think
this will be needed in the cgroup_rstat_exit() path, where stats of a
cgroup being freed must be propagated to its parent, no matter how
insignificant they may be, to avoid inconsistencies.


  reply	other threads:[~2024-07-19  3:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-11 13:28 [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes Jesper Dangaard Brouer
2024-07-11 13:29 ` [PATCH V7 2/2 RFC] cgroup/rstat: add tracepoint for ongoing flusher waits Jesper Dangaard Brouer
2024-07-16  8:42 ` [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes Jesper Dangaard Brouer
2024-07-17  0:35   ` Yosry Ahmed
2024-07-17  3:00     ` Waiman Long
2024-07-17 16:05       ` Yosry Ahmed
2024-07-17 16:36     ` Jesper Dangaard Brouer
2024-07-17 16:49       ` Yosry Ahmed
2024-07-18  8:12         ` Jesper Dangaard Brouer
2024-07-18 15:55           ` Yosry Ahmed
2024-07-19  0:40       ` Shakeel Butt
2024-07-19  3:11         ` Yosry Ahmed [this message]
2024-07-19 23:01           ` Shakeel Butt
2024-07-19  7:54         ` Jesper Dangaard Brouer
2024-07-19 22:47           ` Shakeel Butt
2024-07-20  4:52             ` Yosry Ahmed
     [not found]               ` <CAJD7tkaypFa3Nk0jh_ZYJX8YB0i7h9VY2YFXMg7GKzSS+f8H5g@mail.gmail.com>
2024-07-20 15:05                 ` Jesper Dangaard Brouer
2024-07-22 20:02               ` Shakeel Butt
2024-07-22 20:12                 ` Yosry Ahmed
2024-07-22 21:32                   ` Shakeel Butt
2024-07-22 22:58                     ` Shakeel Butt
2024-07-23  6:24                       ` Yosry Ahmed
2024-07-17  0:30 ` Yosry Ahmed
2024-07-17  7:32   ` Jesper Dangaard Brouer
2024-07-17 16:31     ` Yosry Ahmed
2024-07-17 18:17       ` Jesper Dangaard Brouer
2024-07-17 18:43         ` Yosry Ahmed
2024-07-19 15:07   ` Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJD7tkZbOf7125mcuN-EQdn6MB=dEasHWpoLmY1p-KCjjRxGXQ@mail.gmail.com' \
    --to=yosryahmed@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hawk@kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=longman@redhat.com \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).