From: Ivan Babrou <ivan@cloudflare.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>,
	Waiman Long <longman@redhat.com>,
	tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com,
	cgroups@vger.kernel.org, yosryahmed@google.com,
	netdev@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@cloudflare.com,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Daniel Dao <dqminh@cloudflare.com>,
	jr@cloudflare.com
Subject: Re: [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
Date: Mon, 6 May 2024 09:28:41 -0700
Message-ID: <CABWYdi2pTg5Yc23XAV_ZLJepF42b8L3R5syhocDVJS-Po=ZYOA@mail.gmail.com>
In-Reply-To: <mnakwztmiskni3k6ia5mynqfllb3dw5kicuv4wp4e4ituaezwt@2pzkuuqg6r3e>

On Mon, May 6, 2024 at 9:22 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Mon, May 06, 2024 at 02:03:47PM +0200, Jesper Dangaard Brouer wrote:
> >
> >
> > On 03/05/2024 21.18, Shakeel Butt wrote:
> [...]
> > >
> > > Hmm 128 usec is actually unexpectedly high.
> >
> > > What does the cgroup hierarchy on your system look like?
> >
> > I didn't design this, so hopefully my co-workers can help me out here? (To
> > @Daniel or @Jon)
> >
> > My low-level view: there are 17 top-level directories in
> > /sys/fs/cgroup/.
> > There are 649 cgroups in total (counting occurrences of memory.stat).
> > Two directories contain most of them:
> >  - /sys/fs/cgroup/system.slice = 379
> >  - /sys/fs/cgroup/production.slice = 233
> >    (production.slice has two levels of subdirectories)
> >  - remaining = 37
> >
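For reference, counts like those above should be reproducible with
something along these lines (paths taken from the listing, not verified
on this machine):

  find /sys/fs/cgroup -name memory.stat | wc -l                  # all cgroups
  find /sys/fs/cgroup/system.slice -name memory.stat | wc -l     # system.slice
  find /sys/fs/cgroup/production.slice -name memory.stat | wc -l # production.slice
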
> > We are open to changing this if you have any advice.
> > (@Daniel and @Jon are actually working on restructuring this)
> >
> > > How many cgroups have actual workloads running?
> > Do you have a command line trick to determine this?
> >
>
> The rstat infra maintains a per-cpu cgroup update tree to only flush
> stats of cgroups which have seen updates. So, even if you have a large
> number of cgroups, as long as the workload is active in a small number
> of them, the update tree should be much smaller. That is the reason I
> asked these questions. I don't have any advice yet. At the moment I am
> trying to understand the usage and then hopefully work on optimizing
> those cases.
>
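As an untested sketch for answering "how many cgroups are active": the
probe below counts distinct cgroups feeding the per-cpu update trees
(assumes BTF is available so bpftrace can walk args->cgrp->kn->id):

  sudo bpftrace -e 'kfunc:vmlinux:cgroup_rstat_updated
      { @cgrps[args->cgrp->kn->id] = count(); }'

  # Ctrl-C after a representative window; the number of distinct map
  # keys approximates how many cgroups actually see stat updates.
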
> >
> > > Can the network softirqs run on any CPU, or only on a smaller set
> > > of CPUs? I am assuming these softirqs are processing packets from
> > > any or all cgroups and thus produce a larger cgroup update tree.
> >
> > Softirqs, and specifically NET_RX, run on half of the cores (i.e. 64
> > of them). (I'm looking at restructuring this allocation.)
> >
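The per-CPU NET_RX distribution is visible in the softirq counters, if
that helps cross-check:

  grep -E 'CPU|NET_RX:' /proc/softirqs
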
> > > I wonder if you could comment out the MEMCG_SOCK stat update and
> > > see whether you still observe the same lock hold time.
> > >
> >
> > It doesn't look like MEMCG_SOCK is used.
> >
> > I deduce you are asking:
> >  - What is the update count for different types of mod_memcg_state() calls?
> >
> > // Dumped via BTF info
> > enum memcg_stat_item {
> >         MEMCG_SWAP = 43,
> >         MEMCG_SOCK = 44,
> >         MEMCG_PERCPU_B = 45,
> >         MEMCG_VMALLOC = 46,
> >         MEMCG_KMEM = 47,
> >         MEMCG_ZSWAP_B = 48,
> >         MEMCG_ZSWAPPED = 49,
> >         MEMCG_NR_STAT = 50,
> > };
> >
> > sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state{@[args->idx]=count()}
> > END{printf("\nEND time elapsed: %d sec\n", elapsed / 1000000000);}'
> > Attaching 2 probes...
> > ^C
> > END time elapsed: 99 sec
> >
> > @[45]: 17996
> > @[46]: 18603
> > @[43]: 61858
> > @[47]: 21398919
> >
> > It seems clear that MEMCG_KMEM = 47 is the main "user".
> >  - 21398919 / 99 sec ≈ 216150 calls per sec
> >
> > Could someone explain to me what this MEMCG_KMEM is used for?
> >
>
> MEMCG_KMEM is the kernel memory charged to a cgroup. It also contains
> untyped kernel memory that is not included in kernel_stack,
> pagetables, percpu, vmalloc, slab, etc.
>
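A rough way to see which code paths drive those MEMCG_KMEM updates, in
the same style as the one-liner above (untested; assumes MEMCG_KMEM is
still 47 on this kernel, per the BTF dump):

  sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state
      /args->idx == 47/ { @[kstack(5)] = count(); }'
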
> The reason I asked about MEMCG_SOCK was that it might be causing
> larger update trees (more cgroups) on the CPUs processing NET_RX.

We pass cgroup.memory=nosocket in the kernel cmdline:

* https://lore.kernel.org/lkml/CABWYdi0G7cyNFbndM-ELTDAR3x4Ngm0AehEp5aP0tfNkXUE+Uw@mail.gmail.com/
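The flag can be double-checked at runtime with:

  grep -o 'cgroup.memory=[^ ]*' /proc/cmdline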

> Anyway, did the mutex change help your production workload with
> regard to latencies?


Thread overview: 21+ messages
2024-05-01 14:04 [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints Jesper Dangaard Brouer
2024-05-01 14:24 ` Waiman Long
2024-05-01 17:22   ` Jesper Dangaard Brouer
2024-05-01 18:41     ` Waiman Long
2024-05-02 11:23       ` Jesper Dangaard Brouer
2024-05-02 18:19         ` Waiman Long
2024-05-03 14:00           ` Jesper Dangaard Brouer
2024-05-03 14:30             ` Waiman Long
2024-05-03 19:18             ` Shakeel Butt
2024-05-06 12:03               ` Jesper Dangaard Brouer
2024-05-06 16:22                 ` Shakeel Butt
2024-05-06 16:28                   ` Ivan Babrou [this message]
2024-05-06 19:45                     ` Shakeel Butt
2024-05-06 19:54                   ` Jesper Dangaard Brouer
2024-05-02 19:44     ` Shakeel Butt
2024-05-03 12:58       ` Jesper Dangaard Brouer
2024-05-03 18:11         ` Shakeel Butt
2024-05-14  5:18 ` Jesper Dangaard Brouer
2024-05-14  5:55   ` Tejun Heo
2024-05-14 16:59     ` Waiman Long
2024-05-15 16:58 ` Tejun Heo
