From: Ivan Babrou <ivan@cloudflare.com>
Date: Mon, 6 May 2024 09:28:41 -0700
Subject: Re: [PATCH v1] cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, Waiman Long, tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, cgroups@vger.kernel.org, yosryahmed@google.com, netdev@vger.kernel.org, linux-mm@kvack.org, kernel-team@cloudflare.com, Arnaldo Carvalho de Melo, Sebastian Andrzej Siewior, Daniel Dao, jr@cloudflare.com
On Mon, May 6, 2024 at 9:22 AM Shakeel Butt wrote:
>
> On Mon, May 06, 2024 at 02:03:47PM +0200, Jesper Dangaard Brouer wrote:
> >
> > On 03/05/2024 21.18, Shakeel Butt wrote:
> [...]
> > >
> > > Hmm 128 usec is actually unexpectedly high.
> > > How does the cgroup hierarchy on your system look like?
> > I didn't design this, so hopefully my co-workers can help me out here? (To
> > @Daniel or @Jon)
> >
> > My low-level view is that there are 17 top-level directories in
> > /sys/fs/cgroup/.
> > There are 649 cgroups (counting occurrences of memory.stat).
> > There are two directories that contain the major part:
> >  - /sys/fs/cgroup/system.slice = 379
> >  - /sys/fs/cgroup/production.slice = 233
> >    - (production.slice has two directory levels)
> >  - remaining: 37
> >
> > We are open to changing this if you have any advice?
> > (@Daniel and @Jon are actually working on restructuring this)
> >
> > > How many cgroups have actual workloads running?
> >
> > Do you have a command line trick to determine this?
> >
>
> The rstat infra maintains a per-cpu cgroup update tree to only flush
> stats of cgroups which have seen updates. So, even if you have a large
> number of cgroups but the workload is active in a small number of
> cgroups, the update tree should be much smaller. That is the reason I
> asked these questions. I don't have any advice yet. At the moment I am
> trying to understand the usage and then hopefully work on optimizing
> those.
>
> > > Can the network softirqs run on any cpus or a smaller
> > > set of cpus? I am assuming these softirqs are processing packets from
> > > any or all cgroups and thus have a larger cgroup update tree.
> >
> > Softirq, and specifically NET_RX, is running on half of the cores (e.g. 64).
> > (I'm looking at restructuring this allocation)
> >
> > > I wonder if
> > > you comment out the MEMCG_SOCK stat update and still see the same
> > > holding time.
> >
> > It doesn't look like MEMCG_SOCK is used.
> >
> > I deduce you are asking:
> >  - What is the update count for different types of mod_memcg_state() calls?
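[For the counting questions above, a couple of shell one-liners along the lines of the memory.stat counting already mentioned may help. This is a sketch assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup; the "populated" flag in cgroup.events only approximates "cgroups with actual workloads" and is not the same thing as the per-cpu rstat update tree.]

```shell
# Root of the cgroup v2 hierarchy (override CGROOT for testing).
CGROOT="${CGROOT:-/sys/fs/cgroup}"

# Total number of cgroups, counted by occurrences of memory.stat
# (the same trick used in the thread).
find "$CGROOT" -name memory.stat 2>/dev/null | wc -l

# Cgroups whose subtree contains live processes, per the "populated"
# flag in cgroup.events (an approximation for "actual workloads").
find "$CGROOT" -name cgroup.events 2>/dev/null \
  | xargs -r grep -l '^populated 1' | wc -l
```

Note that "populated" propagates upward: every ancestor of a cgroup with live processes also reports populated 1, so the second count includes intermediate slices.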
> >
> > // Dumped via BTF info
> > enum memcg_stat_item {
> >         MEMCG_SWAP = 43,
> >         MEMCG_SOCK = 44,
> >         MEMCG_PERCPU_B = 45,
> >         MEMCG_VMALLOC = 46,
> >         MEMCG_KMEM = 47,
> >         MEMCG_ZSWAP_B = 48,
> >         MEMCG_ZSWAPPED = 49,
> >         MEMCG_NR_STAT = 50,
> > };
> >
> > sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state{@[args->idx]=count()}
> > END{printf("\nEND time elapsed: %d sec\n", elapsed / 1000000000);}'
> > Attaching 2 probes...
> > ^C
> > END time elapsed: 99 sec
> >
> > @[45]: 17996
> > @[46]: 18603
> > @[43]: 61858
> > @[47]: 21398919
> >
> > It seems clear that MEMCG_KMEM = 47 is the main "user".
> >  - 21398919 / 99 = 216150 calls per sec
> >
> > Could someone explain to me what this MEMCG_KMEM is used for?
> >
>
> MEMCG_KMEM is the kernel memory charged to a cgroup. It also contains
> the untyped kernel memory which is not included in kernel_stack,
> pagetables, percpu, vmalloc, slab, etc.
>
> The reason I asked about MEMCG_SOCK was that it might be causing larger
> update trees (more cgroups) on CPUs processing the NET_RX softirqs.

We pass cgroup.memory=nosocket in the kernel cmdline:

* https://lore.kernel.org/lkml/CABWYdi0G7cyNFbndM-ELTDAR3x4Ngm0AehEp5aP0tfNkXUE+Uw@mail.gmail.com/

> Anyways, did the mutex change help your production workload regarding
> latencies?
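[A possible follow-up for attributing those MEMCG_KMEM updates to code paths, along the same lines as the bpftrace one-liner above. This is a sketch: idx 47 is MEMCG_KMEM per the BTF enum dump in this thread, and the stack depth of 5 is an arbitrary illustrative choice.]

```bpftrace
sudo bpftrace -e 'kfunc:vmlinux:__mod_memcg_state /args->idx == 47/ { @[kstack(5)] = count(); }'
```

Sorting the resulting per-stack counts would show whether the ~216k updates/sec come from a few hot allocation sites (e.g. slab charging in the network path) or are spread widely.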