From: Johannes Weiner <hannes@cmpxchg.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: "Tejun Heo" <tj@kernel.org>, "Josef Bacik" <josef@toxicpanda.com>,
"Jens Axboe" <axboe@kernel.dk>,
"Zefan Li" <lizefan.x@bytedance.com>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeelb@google.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Vasily Averin" <vasily.averin@linux.dev>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org
Subject: Re: [PATCH v1 5/9] memcg: replace stats_flush_lock with an atomic
Date: Tue, 28 Mar 2023 13:53:22 -0400
Message-ID: <ZCMpklJZqwWHro0u@cmpxchg.org>
In-Reply-To: <20230328061638.203420-6-yosryahmed@google.com>
On Tue, Mar 28, 2023 at 06:16:34AM +0000, Yosry Ahmed wrote:
> As Johannes notes in [1], stats_flush_lock is currently used to:
> (a) Protect updates to stats_flush_threshold.
> (b) Protect updates to flush_next_time.
> (c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.
>
> However:
>
> 1. stats_flush_threshold is already an atomic
>
> 2. flush_next_time is not atomic. The writer is locked, but the reader
> is lockless. If the reader races with a flush, you could see this:
>
>                                 if (time_after(jiffies, flush_next_time))
>   spin_trylock()
>   flush_next_time = now + delay
>   flush()
>   spin_unlock()
>                                 spin_trylock()
>                                 flush_next_time = now + delay
>                                 flush()
>                                 spin_unlock()
>
> which means we already can get flushes at a higher frequency than
> FLUSH_TIME during races. But it isn't really a problem.
>
> The reader could also see garbled partial updates, so it needs at
> least READ_ONCE and WRITE_ONCE protection.
>
> 3. Serializing cgroup_rstat_flush() calls against the ratelimit
> factors is currently broken because of the race in 2. But the race
> is actually harmless, all we might get is the occasional earlier
> flush. If there is no delta, the flush won't do much. And if there
> is, the flush is justified.
>
> So the lock can be removed altogether. However, the lock also served
> the purpose of preventing a thundering herd problem for concurrent
> flushers, see [2]. Use an atomic instead to serve the purpose of
> unifying concurrent flushers.
>
> [1] https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
> [2] https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
With Shakeel's suggestion:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
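
For illustration only, the idea in the quoted changelog (an atomic flag
that lets one flusher proceed while concurrent callers back off, plus a
tear-free read of flush_next_time) might look roughly like the userspace
C11 sketch below. This is not the patch itself. stats_flush_ongoing,
flush_stats(), flush_stats_ratelimited(), do_flush(), flush_count and
FLUSH_DELAY are hypothetical names. The relaxed atomic accesses stand in
for READ_ONCE()/WRITE_ONCE(), and the plain ">" comparison stands in for
time_after(), so unlike the kernel helper it does not handle wraparound.
Reading the flag before the exchange is an assumption about how one
might avoid bouncing the cache line, not something stated above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the state discussed in the changelog. */
static atomic_int stats_flush_ongoing;   /* 0 = idle, 1 = a flush is running */
static atomic_ulong flush_next_time;     /* earliest tick for the next ratelimited flush */
static unsigned long flush_count;        /* number of flushes, for illustration only */

#define FLUSH_DELAY 2UL                  /* hypothetical ratelimit interval, in ticks */

/* Stand-in for cgroup_rstat_flush(); here it only counts invocations. */
static void do_flush(void)
{
	flush_count++;
}

/* Returns true if this caller performed the flush, false if it backed off. */
static bool flush_stats(unsigned long now)
{
	/* Cheap read first: if someone is already flushing, do not join in. */
	if (atomic_load_explicit(&stats_flush_ongoing, memory_order_relaxed))
		return false;

	/* Only the caller that flips the flag from 0 to 1 proceeds. */
	if (atomic_exchange(&stats_flush_ongoing, 1))
		return false;

	/* Tear-free store, playing the role of WRITE_ONCE(). */
	atomic_store_explicit(&flush_next_time, now + FLUSH_DELAY,
			      memory_order_relaxed);
	do_flush();
	atomic_store(&stats_flush_ongoing, 0);
	return true;
}

/* Lockless ratelimit check; a racing reader may flush slightly early. */
static bool flush_stats_ratelimited(unsigned long now)
{
	/* Tear-free load, playing the role of READ_ONCE(). */
	if (now > atomic_load_explicit(&flush_next_time, memory_order_relaxed))
		return flush_stats(now);
	return false;
}
```

In this sketch a caller that finds the flag set simply returns without
waiting, which is what keeps concurrent flushers from piling up. The
ratelimit check remains racy in the same harmless way the changelog
describes: at worst it produces an occasional earlier flush.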