From: Jesper Dangaard Brouer
Date: Sat, 20 Jul 2024 17:05:53 +0200
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Yosry Ahmed, Shakeel Butt
Cc: tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org,
 lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <74c53382-5c31-41e9-94a2-0a7f88c0d2a5@kernel.org>
References: <172070450139.2992819.13210624094367257881.stgit@firesoul>
 <100caebf-c11c-45c9-b864-d8562e2a5ac5@kernel.org>
 <5ccc693a-2142-489d-b3f1-426758883c1e@kernel.org>

On 20/07/2024 06.52, Yosry Ahmed wrote:
> On Fri, Jul 19, 2024 at 9:52 PM Yosry Ahmed wrote:
>>
>> On Fri, Jul 19, 2024 at 3:48 PM Shakeel Butt wrote:
>>>
>>> On Fri, Jul 19, 2024 at 09:54:41AM GMT, Jesper Dangaard Brouer wrote:
>>>>
>>>>
>>>> On 19/07/2024 02.40, Shakeel Butt wrote:
>>>>> Hi Jesper,
>>>>>
>>>>> On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
>>>>>>
>>>>> [...]
>>>>>>
>>>>>>
>>>>>> Looking at the production numbers for the time the lock is held for level 0:
>>>>>>
>>>>>> @locked_time_level[0]:
>>>>>> [4M, 8M)     623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
>>>>>> [8M, 16M)    860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>>>>>> [16M, 32M)   295 |@@@@@@@@@@@@@@@@@                                   |
>>>>>> [32M, 64M)   275 |@@@@@@@@@@@@@@@@                                    |
>>>>>>
>>>>>
>>>>> Is it possible to get the above histogram for other levels as well?
>>>>
>>>> Data from other levels available in [1]:
>>>>   [1]
>>>> https://lore.kernel.org/all/8c123882-a5c5-409a-938b-cb5aec9b9ab5@kernel.org/
>>>>
>>>> IMHO the data shows we will get most out of skipping level-0 root-cgroup
>>>> flushes.
>>>>
>>>
>>> Thanks a lot of the data. Are all or most of these locked_time_level[0]
>>> from kswapds? This just motivates me to strongly push the ratelimited
>>> flush patch of mine (which would be orthogonal to your patch series).
>>

There are also others flushing level 0.
I extended the bpftrace oneliner to also capture the process 'comm' name.
(I reduced 'kworker' to one entry below, e.g. pattern 'kworker/u392:19').

grep 'level\[' out01.bpf_oneliner_locked_time | awk -F/ '{print $1}' | sort | uniq
@locked_time_level[0, cadvisor]:
@locked_time_level[0, consul]:
@locked_time_level[0, kswapd0]:
@locked_time_level[0, kswapd10]:
@locked_time_level[0, kswapd11]:
@locked_time_level[0, kswapd1]:
@locked_time_level[0, kswapd2]:
@locked_time_level[0, kswapd3]:
@locked_time_level[0, kswapd4]:
@locked_time_level[0, kswapd5]:
@locked_time_level[0, kswapd6]:
@locked_time_level[0, kswapd7]:
@locked_time_level[0, kswapd8]:
@locked_time_level[0, kswapd9]:
@locked_time_level[0, kworker
@locked_time_level[0, lassen]:
@locked_time_level[0, thunderclap-san]:
@locked_time_level[0, xdpd]:
@locked_time_level[1, cadvisor]:
@locked_time_level[2, cadvisor]:
@locked_time_level[2, kworker
@locked_time_level[2, memory-saturati]:
@locked_time_level[2, systemd]:
@locked_time_level[2, thread-saturati]:
@locked_time_level[3, cadvisor]:
@locked_time_level[3, cat]:
@locked_time_level[3, kworker
@locked_time_level[3, memory-saturati]:
@locked_time_level[3, systemd]:
@locked_time_level[3, thread-saturati]:
@locked_time_level[4, cadvisor]:

>> Jesper and I were discussing a better ratelimiting approach, whether
>> it's measuring the time since the last flush, or only skipping if we
>> have a lot of flushes in a specific time frame (using __ratelimit()).
>> I believe this would be better than the current memcg ratelimiting
>> approach, and we can remove the latter.
>>
>> WDYT?
>
> Forgot to link this:
> https://lore.kernel.org/lkml/CAJD7tkZ5nxoa7aCpAix1bYOoYiLVfn+aNkq7jmRAZqsxruHYLw@mail.gmail.com/
>

I agree that ratelimiting is orthogonal to this patch, and that we
really need to address this in a follow-up patchset.

The proposed mem_cgroup_flush_stats_ratelimited patch[1] helps, but is
limited to the memory area. I'm proposing a more generic solution in [2]
that helps all users of rstat.

It is time-based, because it makes sense to observe the time it takes to
flush the root (the service rate), and then limit how soon afterwards
another flusher can run (limiting the arrival rate). From practical
queueing theory we intuitively know that we should keep the arrival rate
below the service rate, or else queueing happens.

--Jesper

[1] "memcg: use ratelimited stats flush in the reclaim"
  - https://lore.kernel.org/all/20240615081257.3945587-1-shakeel.butt@linux.dev/

[2] "cgroup/rstat: introduce ratelimited rstat flushing"
  - https://lore.kernel.org/all/171328990014.3930751.10674097155895405137.stgit@firesoul/
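
To illustrate the arrival-rate vs. service-rate argument above, here is
a minimal userspace sketch in Python. It is not the code from [2]: the
names TimeBasedFlushLimiter, factor and simulate are hypothetical, the
times are assumed to be in milliseconds, and every flush is assumed to
take the same fixed time. The model records how long the last flush took
and skips any request that arrives sooner than factor times that
duration after the flush ended.

```python
from dataclasses import dataclass


@dataclass
class TimeBasedFlushLimiter:
    """Hypothetical userspace model of time-based flush ratelimiting."""

    # Assumed knob: minimum gap after a flush, as a multiple of how long
    # that flush took (its observed service time).
    factor: float = 1.0
    last_end: float = float("-inf")  # when the previous flush finished
    last_duration: float = 0.0       # how long the previous flush took

    def may_flush(self, now: float) -> bool:
        # A negative gap means a flush is still in progress: skip as well.
        return now - self.last_end >= self.factor * self.last_duration

    def record_flush(self, start: float, end: float) -> None:
        self.last_duration = end - start
        self.last_end = end


def simulate(arrivals, flush_cost, limiter):
    """Count (flushed, skipped) for flush requests arriving at given times."""
    flushed = skipped = 0
    for now in arrivals:
        if limiter.may_flush(now):
            limiter.record_flush(now, now + flush_cost)
            flushed += 1
        else:
            skipped += 1
    return flushed, skipped


if __name__ == "__main__":
    # Times in ms; each flush is assumed to hold the lock for 10 ms.
    print(simulate([0, 5, 10, 15, 20, 40, 45, 100], 10, TimeBasedFlushLimiter()))
```

With factor=1.0 each flush is followed by an idle gap at least as long
as the flush itself, so in this model the lock is held at most half the
time, however fast the requests arrive.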