From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <43732a44-1f90-4119-9e52-000b5a6a2f99@kernel.org>
Date: Wed, 26 Jun 2024 23:35:07 +0200
X-Mailing-List: cgroups@vger.kernel.org
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Yosry Ahmed, "Christoph Lameter (Ampere)"
Cc: Shakeel Butt, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org,
 lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
From: Jesper Dangaard Brouer

On 26/06/2024 00.59, Yosry Ahmed wrote:
> On Tue, Jun 25, 2024 at 3:35 PM Christoph Lameter (Ampere) wrote:
>>
>> On Tue, 25 Jun 2024, Yosry Ahmed wrote:
>>
>>>> In my reply above, I am not arguing to go back to the older
>>>> stats_flush_ongoing situation. Rather I am discussing what should be the
>>>> best eventual solution. From the vmstats infra, we can learn that
>>>> frequent async flushes along with no sync flush, users are fine with the
>>>> 'non-determinism'. Of course cgroup stats are different from vmstats
>>>> i.e. are hierarchical but I think we can try out this approach and see
>>>> if this works or not.
>>>
>>> If we do not do sync flushing, then the same problem that happened
>>> with stats_flush_ongoing could occur again, right? Userspace could
>>> read the stats after an event, and get a snapshot of the system before
>>> that event.
>>>
>>> Perhaps this is fine for vmstats if it has always been like that (I
>>> have no idea), or if no users make assumptions about this. But for
>>> cgroup stats, we have use cases that rely on this behavior.
>>
>> vmstat updates are triggered initially as needed by the shepherd task and
>> there is no requirement that this is triggered simultaneously. We
>> could actually randomize the intervals in vmstat_update() a bit if this
>> will help.
>
> The problem is that for cgroup stats, the behavior has been that a
> userspace read will trigger a flush (i.e. propagating updates). We
> have use cases that depend on this. If we switch to the vmstat model
> where updates are triggered independently from user reads, it
> constitutes a behavioral change.

I implemented a variant using completions as Yosry asked for:
 https://lore.kernel.org/all/171943668946.1638606.1320095353103578332.stgit@firesoul/

--Jesper
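
To make the behavioural difference discussed in the quoted text concrete, here is a toy userspace model. It is only a sketch, not kernel code and not the linked patch: the class `ToyStats` and its methods are hypothetical names invented for illustration. It shows a read that flushes first (the behaviour cgroup stats users are said to rely on) next to a read that returns whatever the last flush left behind (the vmstat-like model).

```python
class ToyStats:
    """Toy model: per-CPU deltas that must be flushed into a shared total."""

    def __init__(self, nr_cpus):
        self.pending = [0] * nr_cpus  # updates not yet propagated
        self.total = 0                # value visible without a flush

    def update(self, cpu, delta):
        # An "event": cheap per-CPU update, nothing is propagated yet.
        self.pending[cpu] += delta

    def flush(self):
        # Propagate all pending per-CPU deltas into the total.
        self.total += sum(self.pending)
        self.pending = [0] * len(self.pending)

    def read_sync(self):
        # Flush-on-read: the reader always sees events that happened before it.
        self.flush()
        return self.total

    def read_async(self):
        # Deferred model: the reader sees the state as of the last flush only.
        return self.total


stats = ToyStats(nr_cpus=2)
stats.update(0, 5)
stats.update(1, 3)
stale = stats.read_async()  # no flush has run since the two updates
fresh = stats.read_sync()   # the flush happens as part of the read
```

In this model `stale` misses both updates while `fresh` includes them, which is the "snapshot of the system before that event" concern raised above. A real implementation also has to deal with concurrency and the cgroup hierarchy, which this sketch ignores.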