From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2930AC3DA4A for ; Fri, 2 Aug 2024 11:43:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 556D56B007B; Fri, 2 Aug 2024 07:43:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4DF456B0083; Fri, 2 Aug 2024 07:43:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 37FAA6B0085; Fri, 2 Aug 2024 07:43:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 10F206B007B for ; Fri, 2 Aug 2024 07:43:19 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B91BF1A02EA for ; Fri, 2 Aug 2024 11:43:18 +0000 (UTC) X-FDA: 82407119676.29.9188C82 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf19.hostedemail.com (Postfix) with ESMTP id ECA091A0017 for ; Fri, 2 Aug 2024 11:43:16 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=JG3YdIlH; spf=pass (imf19.hostedemail.com: domain of hawk@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=hawk@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1722598938; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JLdPFcqbI7/lYtGN4tHAV0POioX1NsElusDcnK9aF1Q=; b=c+O3Xz2Ge+hvf2h6vo/oTPx4AsS3btNccFROBqYxdJa0MbGmAdcfsWCV/NUGMPsTs53JHi pAa7oAEfL3TqvDc8+7tUK4tnlU9T1cHKGtGdo2clGHShWFLjZ9ff8fQzuEtR5mp4nEICf/ insLWFMRfW3uOvZiquq8mznzrjImTEg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722598938; a=rsa-sha256; cv=none; b=dNtsHZVnBI+45OrGdPW/e6oseklFqwmhmqlyTvMVvtDhwOGNP4Px8B7abpWB2By3ticX6v 5rF3HV9IzstOXqb37tYdQUPT1O1Yt1WDXypxjvm1mq9hpbyv9fv6fT1f6E1bE+1/Q3XILf MhNNBErxUzg6oodE2tieYQfQqjFLh38= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=JG3YdIlH; spf=pass (imf19.hostedemail.com: domain of hawk@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=hawk@kernel.org; dmarc=pass (policy=none) header.from=kernel.org Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id DDF1062A03; Fri, 2 Aug 2024 11:43:15 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9AB5EC32782; Fri, 2 Aug 2024 11:43:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1722598995; bh=WC8O4GJENSSswcFTCoObnQZlBeKC7T4XO63/D9YHEt4=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=JG3YdIlH3nAcFY7HrlMXy0Q2/YCo1T+FPoBQU6dqPwmOaOS0jCTWrwAO/ifAnb1SZ NjBq799toE5scjJhxUPZ8MvXnrqtGAkRuxyrIx19LPkOrnsK50wgweSYoWFXiPCD1F 9qNPcxyqhcPt5OiMgyixcTOrqjvWhh6qSlTUJ1A2EqA9GoCLcArvaKBy2mdjlqqEBF i87tRQUzh+P5LMnKP6XWywDapg1QeeIiOiOeiR8QUlUZanUmWDIsnaI22qYZqkXIkN rTSqfShIZpHp36z4B0hosmuD+pvovQ3O2Pa3ez+DkXQxcfARKdSRs9w+/QgZ8LL79R mXSuDajzgiRsQ== Message-ID: Date: Fri, 2 Aug 2024 13:43:11 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH V8 1/2] cgroup/rstat: Avoid flushing if there is an ongoing overlapping flush To: Yosry Ahmed Cc: tj@kernel.org, cgroups@vger.kernel.org, shakeel.butt@linux.dev, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <172139415725.3084888.13770938453137383953.stgit@firesoul> Content-Language: en-US From: Jesper Dangaard Brouer In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: x5pycp6wshdu9grya41bqhjoecoewzc7 X-Rspamd-Queue-Id: ECA091A0017 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1722598996-564060 X-HE-Meta: U2FsdGVkX18sx5VuRHimURADOCqlKhMdBe5WuAwnzSgufGf9Jks+CZn1PvdUTwATD+HKy9n2ZXZWgY2mQ71AJyY4C+n5cAkVaUmq1knNF3ySQ9cxVI1UVteLgPxUQu9FkwjI2ynOuBHddnTHlvavmoA4XbDQIrw6Hks1COvT/lNDiJPLOSuMp10sIaUGhcKb56CpM/rEeB4PK6ihlZSwNf5xeVBh1ZktfKWgGRfg1MWvXRmn1TYONi/rv1BJxqFModI85/pn1BVFZPTfj9UdJhrsVPpco4PnG8fCHjjdtJuuWtH5uwgVtR9VU2kZt7+axdk6vA5N2Ob/D8iPi0z1+M9SDHaGx080IG4nJdSJLdEKWwqvNJok4D/YSOvImSsEeNKNyy4oUZEZQZN5GfYNh7b+F6xYjreyO0u8XhVCgt1FiU+Q/0tsqbMVFcb4bQxyUNXOkxOVlLPbQ2uAYwqA1YeOv0uzAC9ufpMw1DFR4msjkHTLkSUHA2tAFqdtFw0zge7oFjoTLx3ZzttPRLCO5c9k1RIgsXeeYpPNyBcb0JfZ9EBLb/C44GzQh6JOfHAToUBvyK9FFBuHO1a4SfZpg0hS61xA1S22aEtXtlTwzHU1VoayUBHlmpaA1Gsb4Z87WM1DZy10eRrHXxji7oZ53JGg2enjnl0t/IPkRXgXji/P3HFC/FSkX/g9TpqqqLmpWWQo8pwpM8PzS/TN+gu52DfTx7O+OTH/h12Fo3oxN228KhmIy3miyMiizxphsWKw8N23XWmEeyXNdOHXtweONUr7KNktZC2xDLqM2MpGOY5UkUW3fhdoPEFviQyWcwSS1KEqZrFQwbCwtg99VFCK9/OzK0T7bNbbNhTXStYYV+WcpxnqvI4OkSWotW/6fKoFjrFNZTuFZmHQ4900i7a5vbfBUxwJjsU/p2cpJkoLRFezvVMukBTiph91ez0VYeoL/PRaqOHXoT2i8KJdH0h mlFxPwqX YjraZXUiuYHkmIZGjPiw158zs+4WUOKwnBI6XekKWkHTf/XuG9UmR013Smcw05XldytI88KNTLECS8xZ4UklD/M54TXCMdmG6dyMiYmtBjFp4Z/TrLu6U9JEL/Y7QTCjDNXRTd0v/HMXqkVdzIaCtjgj+4JI2OpHALo6AfoKZOu/teszTDRj0MM/PFfddzx8I8QJiQqQHvLa0QpzvFm+wSfMWNN4sBlhOHFRu X-Bogosity: Ham, tests=bogofilter, spamicity=0.013421, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 30/07/2024 20.54, Yosry Ahmed wrote: > [..] >> >> Well... I'm still not convinced that it makes sense to have level >= 2 >> be the ongoing flusher. >> >> E.g. if a level 2 cgroup becomes ongoing flusher, and kswapd starts 12 >> NUMA flushes at the same time, then the code will have these 12 kswapd >> threads spin on the lock, until ongoing flusher finishes. That is likely >> what happened above (for a level 1). These 12 spinning (root) flushers >> will not recheck ongoing_flusher and will all flush the root >> (unnecessarily 11 times). > > Hmm regardless of whether or not the level-2 cgroup becomes the > ongoing flusher, the kswapd threads will all spin on the lock anyway > since none of them can be the ongoing flusher until the level-2 cgroup > finishes. Right? > > Is the scenario you have in mind that the level-2 cgroup starts > flushing at the same time as kswapd, so there is a race on who gets to > be the ongoing flusher? In this case as well, whoever gets the lock > will be the ongoing flusher anyway. > > Not allowing whoever is holding the lock to be the ongoing flusher > based on level is only useful when we can have multiple ongoing > flushers (with lock yielding). Right? > > Perhaps I am missing something here. > >> >> So, I don't think it is a good idea to have anything else that the root >> as the ongoing flusher. >> >> Can you explain/convince me why having sub-cgroups as ongoing flusher is >> an advantage? > > I just don't see the benefit of the special casing here as I mentioned > above. If I missed something please let me know. > I do think you missed something. Let me try to explain this in another way. (I hope my frustrations doesn't shine through). The main purpose of the patch is/was to stop the thundering herd of kswapd thread flushing (root-cgrp) at exactly the same time, leading to lock contention. This happens all-the-time/constantly in production. The first versions (where ongoing was limited to root/level=0) solved this 100%. The patches that generalized this to be all levels can become ongoing flush, doesn't solve the problem any-longer! I hope it is clear what fails. E.g. When a level>0 becomes ongoing flusher, and 12 kswapd simultaneously does a level=0/root-cgrp flush, then we have 12 CPU cores spinning on the rstat lock. (These 12 kswapd threads will all go-through completing the flush, as they do not discover/recheck that ongoing flush was previously became their own level). I think we need to go back to only having root-cgroup as ongoing flusher. --Jesper