* [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-03 22:03 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:03 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Yosry Ahmed
wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
flush, which can be expensive on large systems. Currently,
wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
have to make the rstat flush atomically. On systems with a lot of
cpus/cgroups, this can cause us to disable irqs for a long time,
potentially causing problems.
Move the call to wb_over_bg_thresh() outside the lock section in
preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
The list_empty(&wb->work_list) should be okay outside the lock section
of wb->list_lock as it is protected by a separate lock (wb->work_lock),
and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
lists the wb->list_lock is protecting. Also, the loop seems to be
already releasing and reacquring the lock, so this refactoring looks
safe.
Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
fs/fs-writeback.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d831..012357bc8daa3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
struct blk_plug plug;
blk_start_plug(&plug);
- spin_lock(&wb->list_lock);
for (;;) {
/*
* Stop writeback when nr_pages has been consumed
@@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
if (work->for_background && !wb_over_bg_thresh(wb))
break;
+
+ spin_lock(&wb->list_lock);
+
/*
* Kupdate and background works are special and we want to
* include all inodes that need writing. Livelock avoidance is
@@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
* mean the overall work is done. So we keep looping as long
* as made some progress on cleaning pages or inodes.
*/
- if (progress)
+ if (progress) {
+ spin_unlock(&wb->list_lock);
continue;
+ }
+
/*
* No more inodes for IO, bail
*/
- if (list_empty(&wb->b_more_io))
+ if (list_empty(&wb->b_more_io)) {
+ spin_unlock(&wb->list_lock);
break;
+ }
+
/*
* Nothing written. Wait for some inode to
* become available for writeback. Otherwise
@@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
spin_unlock(&wb->list_lock);
/* This function drops i_lock... */
inode_sleep_on_writeback(inode);
- spin_lock(&wb->list_lock);
}
- spin_unlock(&wb->list_lock);
blk_finish_plug(&plug);
return nr_pages - work->nr_pages;
--
2.40.0.348.gf938b09366-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-03 22:03 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:03 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel, linux-kernel, cgroups, linux-mm, Yosry Ahmed
wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
flush, which can be expensive on large systems. Currently,
wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
have to make the rstat flush atomically. On systems with a lot of
cpus/cgroups, this can cause us to disable irqs for a long time,
potentially causing problems.
Move the call to wb_over_bg_thresh() outside the lock section in
preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
The list_empty(&wb->work_list) should be okay outside the lock section
of wb->list_lock as it is protected by a separate lock (wb->work_lock),
and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
lists the wb->list_lock is protecting. Also, the loop seems to be
already releasing and reacquring the lock, so this refactoring looks
safe.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
fs/fs-writeback.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d831..012357bc8daa3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
struct blk_plug plug;
blk_start_plug(&plug);
- spin_lock(&wb->list_lock);
for (;;) {
/*
* Stop writeback when nr_pages has been consumed
@@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
if (work->for_background && !wb_over_bg_thresh(wb))
break;
+
+ spin_lock(&wb->list_lock);
+
/*
* Kupdate and background works are special and we want to
* include all inodes that need writing. Livelock avoidance is
@@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
* mean the overall work is done. So we keep looping as long
* as made some progress on cleaning pages or inodes.
*/
- if (progress)
+ if (progress) {
+ spin_unlock(&wb->list_lock);
continue;
+ }
+
/*
* No more inodes for IO, bail
*/
- if (list_empty(&wb->b_more_io))
+ if (list_empty(&wb->b_more_io)) {
+ spin_unlock(&wb->list_lock);
break;
+ }
+
/*
* Nothing written. Wait for some inode to
* become available for writeback. Otherwise
@@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
spin_unlock(&wb->list_lock);
/* This function drops i_lock... */
inode_sleep_on_writeback(inode);
- spin_lock(&wb->list_lock);
}
- spin_unlock(&wb->list_lock);
blk_finish_plug(&plug);
return nr_pages - work->nr_pages;
--
2.40.0.348.gf938b09366-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread[parent not found: <20230403220337.443510-2-yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-19 11:38 ` Michal Koutný
-1 siblings, 0 replies; 44+ messages in thread
From: Michal Koutný @ 2023-04-19 11:38 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
[-- Attachment #1: Type: text/plain, Size: 2609 bytes --]
On Mon, Apr 03, 2023 at 10:03:33PM +0000, Yosry Ahmed <yosryahmed-hpIqsD4AKldhl2p70BpVqQ@public.gmane.orgm> wrote:
> wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> flush, which can be expensive on large systems. Currently,
> wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> have to make the rstat flush atomically. On systems with a lot of
> cpus/cgroups, this can cause us to disable irqs for a long time,
> potentially causing problems.
>
> Move the call to wb_over_bg_thresh() outside the lock section in
> preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> The list_empty(&wb->work_list) should be okay outside the lock section
> of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> lists the wb->list_lock is protecting. Also, the loop seems to be
> already releasing and reacquring the lock, so this refactoring looks
> safe.
>
> Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> fs/fs-writeback.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 195dc23e0d831..012357bc8daa3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> struct blk_plug plug;
>
> blk_start_plug(&plug);
> - spin_lock(&wb->list_lock);
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> if (work->for_background && !wb_over_bg_thresh(wb))
> break;
>
> +
> + spin_lock(&wb->list_lock);
> +
> /*
> * Kupdate and background works are special and we want to
> * include all inodes that need writing. Livelock avoidance is
> @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + spin_unlock(&wb->list_lock);
> continue;
> + }
> +
This would release wb->list_lock temporarily with progress but that's
already not held continuously due to writeback_sb_inodes().
Holding the lock could even be shortened by taking it later after
trace_writeback_start().
Altogether, the change looks OK,
Reviewed-by: Michal Koutný <mkoutny-IBi9RG/b67k@public.gmane.org>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-19 11:38 ` Michal Koutný
0 siblings, 0 replies; 44+ messages in thread
From: Michal Koutný @ 2023-04-19 11:38 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel, linux-kernel, cgroups, linux-mm
[-- Attachment #1: Type: text/plain, Size: 2530 bytes --]
On Mon, Apr 03, 2023 at 10:03:33PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
> wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> flush, which can be expensive on large systems. Currently,
> wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> have to make the rstat flush atomically. On systems with a lot of
> cpus/cgroups, this can cause us to disable irqs for a long time,
> potentially causing problems.
>
> Move the call to wb_over_bg_thresh() outside the lock section in
> preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> The list_empty(&wb->work_list) should be okay outside the lock section
> of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> lists the wb->list_lock is protecting. Also, the loop seems to be
> already releasing and reacquring the lock, so this refactoring looks
> safe.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
> fs/fs-writeback.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 195dc23e0d831..012357bc8daa3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> struct blk_plug plug;
>
> blk_start_plug(&plug);
> - spin_lock(&wb->list_lock);
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> if (work->for_background && !wb_over_bg_thresh(wb))
> break;
>
> +
> + spin_lock(&wb->list_lock);
> +
> /*
> * Kupdate and background works are special and we want to
> * include all inodes that need writing. Livelock avoidance is
> @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + spin_unlock(&wb->list_lock);
> continue;
> + }
> +
This would release wb->list_lock temporarily with progress but that's
already not held continuously due to writeback_sb_inodes().
Holding the lock could even be shortened by taking it later after
trace_writeback_start().
Altogether, the change looks OK,
Reviewed-by: Michal Koutný <mkoutny@suse.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-19 11:38 ` Michal Koutný
@ 2023-04-20 20:23 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-20 20:23 UTC (permalink / raw)
To: Michal Koutný
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Wed, Apr 19, 2023 at 4:38 AM Michal Koutný <mkoutny-WDPRK7U/wSs@public.gmane.orgm> wrote:
>
> On Mon, Apr 03, 2023 at 10:03:33PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
> > wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> > flush, which can be expensive on large systems. Currently,
> > wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> > have to make the rstat flush atomically. On systems with a lot of
> > cpus/cgroups, this can cause us to disable irqs for a long time,
> > potentially causing problems.
> >
> > Move the call to wb_over_bg_thresh() outside the lock section in
> > preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> > The list_empty(&wb->work_list) should be okay outside the lock section
> > of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> > and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> > lists the wb->list_lock is protecting. Also, the loop seems to be
> > already releasing and reacquring the lock, so this refactoring looks
> > safe.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 195dc23e0d831..012357bc8daa3 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> > struct blk_plug plug;
> >
> > blk_start_plug(&plug);
> > - spin_lock(&wb->list_lock);
> > for (;;) {
> > /*
> > * Stop writeback when nr_pages has been consumed
> > @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> > if (work->for_background && !wb_over_bg_thresh(wb))
> > break;
> >
> > +
> > + spin_lock(&wb->list_lock);
> > +
> > /*
> > * Kupdate and background works are special and we want to
> > * include all inodes that need writing. Livelock avoidance is
> > @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + spin_unlock(&wb->list_lock);
> > continue;
> > + }
> > +
>
> This would release wb->list_lock temporarily with progress but that's
> already not held continuously due to writeback_sb_inodes().
> Holding the lock could even be shortened by taking it later after
> trace_writeback_start().
>
> Altogether, the change looks OK,
> Reviewed-by: Michal Koutný <mkoutny-IBi9RG/b67k@public.gmane.org>
Thanks for taking a look!
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-20 20:23 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-20 20:23 UTC (permalink / raw)
To: Michal Koutný
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel, linux-kernel, cgroups, linux-mm
On Wed, Apr 19, 2023 at 4:38 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Mon, Apr 03, 2023 at 10:03:33PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
> > wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> > flush, which can be expensive on large systems. Currently,
> > wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> > have to make the rstat flush atomically. On systems with a lot of
> > cpus/cgroups, this can cause us to disable irqs for a long time,
> > potentially causing problems.
> >
> > Move the call to wb_over_bg_thresh() outside the lock section in
> > preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> > The list_empty(&wb->work_list) should be okay outside the lock section
> > of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> > and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> > lists the wb->list_lock is protecting. Also, the loop seems to be
> > already releasing and reacquring the lock, so this refactoring looks
> > safe.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 195dc23e0d831..012357bc8daa3 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> > struct blk_plug plug;
> >
> > blk_start_plug(&plug);
> > - spin_lock(&wb->list_lock);
> > for (;;) {
> > /*
> > * Stop writeback when nr_pages has been consumed
> > @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> > if (work->for_background && !wb_over_bg_thresh(wb))
> > break;
> >
> > +
> > + spin_lock(&wb->list_lock);
> > +
> > /*
> > * Kupdate and background works are special and we want to
> > * include all inodes that need writing. Livelock avoidance is
> > @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + spin_unlock(&wb->list_lock);
> > continue;
> > + }
> > +
>
> This would release wb->list_lock temporarily with progress but that's
> already not held continuously due to writeback_sb_inodes().
> Holding the lock could even be shortened by taking it later after
> trace_writeback_start().
>
> Altogether, the change looks OK,
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
Thanks for taking a look!
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-20 18:53 ` Shakeel Butt
-1 siblings, 0 replies; 44+ messages in thread
From: Shakeel Butt @ 2023-04-20 18:53 UTC (permalink / raw)
To: Yosry Ahmed, Jan Kara, Jens Axboe
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, Andrew Morton,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
+Jens & Jan
The patch looks good but it would be nice to pass this patch through
the eyes of experts of this area.
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> flush, which can be expensive on large systems. Currently,
> wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> have to make the rstat flush atomically. On systems with a lot of
> cpus/cgroups, this can cause us to disable irqs for a long time,
> potentially causing problems.
>
> Move the call to wb_over_bg_thresh() outside the lock section in
> preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> The list_empty(&wb->work_list) should be okay outside the lock section
> of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> lists the wb->list_lock is protecting. Also, the loop seems to be
> already releasing and reacquring the lock, so this refactoring looks
> safe.
>
> Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---
> fs/fs-writeback.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 195dc23e0d831..012357bc8daa3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> struct blk_plug plug;
>
> blk_start_plug(&plug);
> - spin_lock(&wb->list_lock);
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> if (work->for_background && !wb_over_bg_thresh(wb))
> break;
>
> +
> + spin_lock(&wb->list_lock);
> +
> /*
> * Kupdate and background works are special and we want to
> * include all inodes that need writing. Livelock avoidance is
> @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + spin_unlock(&wb->list_lock);
> continue;
> + }
> +
> /*
> * No more inodes for IO, bail
> */
> - if (list_empty(&wb->b_more_io))
> + if (list_empty(&wb->b_more_io)) {
> + spin_unlock(&wb->list_lock);
> break;
> + }
> +
> /*
> * Nothing written. Wait for some inode to
> * become available for writeback. Otherwise
> @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> spin_unlock(&wb->list_lock);
> /* This function drops i_lock... */
> inode_sleep_on_writeback(inode);
> - spin_lock(&wb->list_lock);
> }
> - spin_unlock(&wb->list_lock);
> blk_finish_plug(&plug);
>
> return nr_pages - work->nr_pages;
> --
> 2.40.0.348.gf938b09366-goog
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-20 18:53 ` Shakeel Butt
0 siblings, 0 replies; 44+ messages in thread
From: Shakeel Butt @ 2023-04-20 18:53 UTC (permalink / raw)
To: Yosry Ahmed, Jan Kara, Jens Axboe
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, Andrew Morton, linux-fsdevel,
linux-kernel, cgroups, linux-mm
+Jens & Jan
The patch looks good but it would be nice to pass this patch through
the eyes of experts of this area.
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> flush, which can be expensive on large systems. Currently,
> wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> have to make the rstat flush atomically. On systems with a lot of
> cpus/cgroups, this can cause us to disable irqs for a long time,
> potentially causing problems.
>
> Move the call to wb_over_bg_thresh() outside the lock section in
> preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> The list_empty(&wb->work_list) should be okay outside the lock section
> of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> lists the wb->list_lock is protecting. Also, the loop seems to be
> already releasing and reacquring the lock, so this refactoring looks
> safe.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
> fs/fs-writeback.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 195dc23e0d831..012357bc8daa3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> struct blk_plug plug;
>
> blk_start_plug(&plug);
> - spin_lock(&wb->list_lock);
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> if (work->for_background && !wb_over_bg_thresh(wb))
> break;
>
> +
> + spin_lock(&wb->list_lock);
> +
> /*
> * Kupdate and background works are special and we want to
> * include all inodes that need writing. Livelock avoidance is
> @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + spin_unlock(&wb->list_lock);
> continue;
> + }
> +
> /*
> * No more inodes for IO, bail
> */
> - if (list_empty(&wb->b_more_io))
> + if (list_empty(&wb->b_more_io)) {
> + spin_unlock(&wb->list_lock);
> break;
> + }
> +
> /*
> * Nothing written. Wait for some inode to
> * become available for writeback. Otherwise
> @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> spin_unlock(&wb->list_lock);
> /* This function drops i_lock... */
> inode_sleep_on_writeback(inode);
> - spin_lock(&wb->list_lock);
> }
> - spin_unlock(&wb->list_lock);
> blk_finish_plug(&plug);
>
> return nr_pages - work->nr_pages;
> --
> 2.40.0.348.gf938b09366-goog
>
^ permalink raw reply [flat|nested] 44+ messages in thread[parent not found: <CALvZod5h5G9YNu8d9fAOL5eXie5iM3urw9QgD=vucBdCMAQnxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-20 18:53 ` Shakeel Butt
@ 2023-04-20 20:22 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-20 20:22 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jan Kara, Jens Axboe, Alexander Viro, Christian Brauner,
Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Andrew Morton, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Thu, Apr 20, 2023 at 11:53 AM Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> +Jens & Jan
>
> The patch looks good but it would be nice to pass this patch through
> the eyes of experts of this area.
Thanks for taking a look and CC'ing folks. I will make sure to include
them in the next rounds as well. FWIW, Jens & Jan did not show up when
I ran scripts/get_maintainers.ph if I remember correctly.
>
> On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> > flush, which can be expensive on large systems. Currently,
> > wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> > have to make the rstat flush atomically. On systems with a lot of
> > cpus/cgroups, this can cause us to disable irqs for a long time,
> > potentially causing problems.
> >
> > Move the call to wb_over_bg_thresh() outside the lock section in
> > preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> > The list_empty(&wb->work_list) should be okay outside the lock section
> > of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> > and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> > lists the wb->list_lock is protecting. Also, the loop seems to be
> > already releasing and reacquring the lock, so this refactoring looks
> > safe.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 195dc23e0d831..012357bc8daa3 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> > struct blk_plug plug;
> >
> > blk_start_plug(&plug);
> > - spin_lock(&wb->list_lock);
> > for (;;) {
> > /*
> > * Stop writeback when nr_pages has been consumed
> > @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> > if (work->for_background && !wb_over_bg_thresh(wb))
> > break;
> >
> > +
> > + spin_lock(&wb->list_lock);
> > +
> > /*
> > * Kupdate and background works are special and we want to
> > * include all inodes that need writing. Livelock avoidance is
> > @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + spin_unlock(&wb->list_lock);
> > continue;
> > + }
> > +
> > /*
> > * No more inodes for IO, bail
> > */
> > - if (list_empty(&wb->b_more_io))
> > + if (list_empty(&wb->b_more_io)) {
> > + spin_unlock(&wb->list_lock);
> > break;
> > + }
> > +
> > /*
> > * Nothing written. Wait for some inode to
> > * become available for writeback. Otherwise
> > @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> > spin_unlock(&wb->list_lock);
> > /* This function drops i_lock... */
> > inode_sleep_on_writeback(inode);
> > - spin_lock(&wb->list_lock);
> > }
> > - spin_unlock(&wb->list_lock);
> > blk_finish_plug(&plug);
> >
> > return nr_pages - work->nr_pages;
> > --
> > 2.40.0.348.gf938b09366-goog
> >
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-20 20:22 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-20 20:22 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jan Kara, Jens Axboe, Alexander Viro, Christian Brauner,
Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Andrew Morton, linux-fsdevel, linux-kernel, cgroups, linux-mm
On Thu, Apr 20, 2023 at 11:53 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> +Jens & Jan
>
> The patch looks good but it would be nice to pass this patch through
> the eyes of experts of this area.
Thanks for taking a look and CC'ing folks. I will make sure to include
them in the next rounds as well. FWIW, Jens & Jan did not show up when
I ran scripts/get_maintainers.ph if I remember correctly.
>
> On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> > flush, which can be expensive on large systems. Currently,
> > wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> > have to make the rstat flush atomically. On systems with a lot of
> > cpus/cgroups, this can cause us to disable irqs for a long time,
> > potentially causing problems.
> >
> > Move the call to wb_over_bg_thresh() outside the lock section in
> > preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> > The list_empty(&wb->work_list) should be okay outside the lock section
> > of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> > and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> > lists the wb->list_lock is protecting. Also, the loop seems to be
> > already releasing and reacquring the lock, so this refactoring looks
> > safe.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 195dc23e0d831..012357bc8daa3 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> > struct blk_plug plug;
> >
> > blk_start_plug(&plug);
> > - spin_lock(&wb->list_lock);
> > for (;;) {
> > /*
> > * Stop writeback when nr_pages has been consumed
> > @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> > if (work->for_background && !wb_over_bg_thresh(wb))
> > break;
> >
> > +
> > + spin_lock(&wb->list_lock);
> > +
> > /*
> > * Kupdate and background works are special and we want to
> > * include all inodes that need writing. Livelock avoidance is
> > @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + spin_unlock(&wb->list_lock);
> > continue;
> > + }
> > +
> > /*
> > * No more inodes for IO, bail
> > */
> > - if (list_empty(&wb->b_more_io))
> > + if (list_empty(&wb->b_more_io)) {
> > + spin_unlock(&wb->list_lock);
> > break;
> > + }
> > +
> > /*
> > * Nothing written. Wait for some inode to
> > * become available for writeback. Otherwise
> > @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> > spin_unlock(&wb->list_lock);
> > /* This function drops i_lock... */
> > inode_sleep_on_writeback(inode);
> > - spin_lock(&wb->list_lock);
> > }
> > - spin_unlock(&wb->list_lock);
> > blk_finish_plug(&plug);
> >
> > return nr_pages - work->nr_pages;
> > --
> > 2.40.0.348.gf938b09366-goog
> >
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-21 8:53 ` Jan Kara
-1 siblings, 0 replies; 44+ messages in thread
From: Jan Kara @ 2023-04-21 8:53 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Mon 03-04-23 22:03:33, Yosry Ahmed wrote:
> wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> flush, which can be expensive on large systems. Currently,
> wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> have to make the rstat flush atomically. On systems with a lot of
> cpus/cgroups, this can cause us to disable irqs for a long time,
> potentially causing problems.
>
> Move the call to wb_over_bg_thresh() outside the lock section in
> preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> The list_empty(&wb->work_list) should be okay outside the lock section
> of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> lists the wb->list_lock is protecting. Also, the loop seems to be
> already releasing and reacquring the lock, so this refactoring looks
> safe.
>
> Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
The patch looks good to me. Nice find. Feel free to add:
Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Honza
> ---
> fs/fs-writeback.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 195dc23e0d831..012357bc8daa3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> struct blk_plug plug;
>
> blk_start_plug(&plug);
> - spin_lock(&wb->list_lock);
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> if (work->for_background && !wb_over_bg_thresh(wb))
> break;
>
> +
> + spin_lock(&wb->list_lock);
> +
> /*
> * Kupdate and background works are special and we want to
> * include all inodes that need writing. Livelock avoidance is
> @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + spin_unlock(&wb->list_lock);
> continue;
> + }
> +
> /*
> * No more inodes for IO, bail
> */
> - if (list_empty(&wb->b_more_io))
> + if (list_empty(&wb->b_more_io)) {
> + spin_unlock(&wb->list_lock);
> break;
> + }
> +
> /*
> * Nothing written. Wait for some inode to
> * become available for writeback. Otherwise
> @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> spin_unlock(&wb->list_lock);
> /* This function drops i_lock... */
> inode_sleep_on_writeback(inode);
> - spin_lock(&wb->list_lock);
> }
> - spin_unlock(&wb->list_lock);
> blk_finish_plug(&plug);
>
> return nr_pages - work->nr_pages;
> --
> 2.40.0.348.gf938b09366-goog
>
--
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-21 8:53 ` Jan Kara
0 siblings, 0 replies; 44+ messages in thread
From: Jan Kara @ 2023-04-21 8:53 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel, linux-kernel, cgroups, linux-mm
On Mon 03-04-23 22:03:33, Yosry Ahmed wrote:
> wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> flush, which can be expensive on large systems. Currently,
> wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> have to make the rstat flush atomically. On systems with a lot of
> cpus/cgroups, this can cause us to disable irqs for a long time,
> potentially causing problems.
>
> Move the call to wb_over_bg_thresh() outside the lock section in
> preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> The list_empty(&wb->work_list) should be okay outside the lock section
> of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> lists the wb->list_lock is protecting. Also, the loop seems to be
> already releasing and reacquring the lock, so this refactoring looks
> safe.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
The patch looks good to me. Nice find. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 195dc23e0d831..012357bc8daa3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> struct blk_plug plug;
>
> blk_start_plug(&plug);
> - spin_lock(&wb->list_lock);
> for (;;) {
> /*
> * Stop writeback when nr_pages has been consumed
> @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> if (work->for_background && !wb_over_bg_thresh(wb))
> break;
>
> +
> + spin_lock(&wb->list_lock);
> +
> /*
> * Kupdate and background works are special and we want to
> * include all inodes that need writing. Livelock avoidance is
> @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + spin_unlock(&wb->list_lock);
> continue;
> + }
> +
> /*
> * No more inodes for IO, bail
> */
> - if (list_empty(&wb->b_more_io))
> + if (list_empty(&wb->b_more_io)) {
> + spin_unlock(&wb->list_lock);
> break;
> + }
> +
> /*
> * Nothing written. Wait for some inode to
> * become available for writeback. Otherwise
> @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> spin_unlock(&wb->list_lock);
> /* This function drops i_lock... */
> inode_sleep_on_writeback(inode);
> - spin_lock(&wb->list_lock);
> }
> - spin_unlock(&wb->list_lock);
> blk_finish_plug(&plug);
>
> return nr_pages - work->nr_pages;
> --
> 2.40.0.348.gf938b09366-goog
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
2023-04-21 8:53 ` Jan Kara
@ 2023-04-21 17:21 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-21 17:21 UTC (permalink / raw)
To: Jan Kara
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Fri, Apr 21, 2023 at 1:53 AM Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote:
>
> On Mon 03-04-23 22:03:33, Yosry Ahmed wrote:
> > wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> > flush, which can be expensive on large systems. Currently,
> > wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> > have to make the rstat flush atomically. On systems with a lot of
> > cpus/cgroups, this can cause us to disable irqs for a long time,
> > potentially causing problems.
> >
> > Move the call to wb_over_bg_thresh() outside the lock section in
> > preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> > The list_empty(&wb->work_list) should be okay outside the lock section
> > of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> > and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> > lists the wb->list_lock is protecting. Also, the loop seems to be
> > already releasing and reacquring the lock, so this refactoring looks
> > safe.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>
> The patch looks good to me. Nice find. Feel free to add:
>
> Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Thanks for taking a look!
>
> Honza
>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 195dc23e0d831..012357bc8daa3 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> > struct blk_plug plug;
> >
> > blk_start_plug(&plug);
> > - spin_lock(&wb->list_lock);
> > for (;;) {
> > /*
> > * Stop writeback when nr_pages has been consumed
> > @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> > if (work->for_background && !wb_over_bg_thresh(wb))
> > break;
> >
> > +
> > + spin_lock(&wb->list_lock);
> > +
> > /*
> > * Kupdate and background works are special and we want to
> > * include all inodes that need writing. Livelock avoidance is
> > @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + spin_unlock(&wb->list_lock);
> > continue;
> > + }
> > +
> > /*
> > * No more inodes for IO, bail
> > */
> > - if (list_empty(&wb->b_more_io))
> > + if (list_empty(&wb->b_more_io)) {
> > + spin_unlock(&wb->list_lock);
> > break;
> > + }
> > +
> > /*
> > * Nothing written. Wait for some inode to
> > * become available for writeback. Otherwise
> > @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> > spin_unlock(&wb->list_lock);
> > /* This function drops i_lock... */
> > inode_sleep_on_writeback(inode);
> > - spin_lock(&wb->list_lock);
> > }
> > - spin_unlock(&wb->list_lock);
> > blk_finish_plug(&plug);
> >
> > return nr_pages - work->nr_pages;
> > --
> > 2.40.0.348.gf938b09366-goog
> >
> --
> Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 1/5] writeback: move wb_over_bg_thresh() call outside lock section
@ 2023-04-21 17:21 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-21 17:21 UTC (permalink / raw)
To: Jan Kara
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
linux-fsdevel, linux-kernel, cgroups, linux-mm
On Fri, Apr 21, 2023 at 1:53 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 03-04-23 22:03:33, Yosry Ahmed wrote:
> > wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
> > flush, which can be expensive on large systems. Currently,
> > wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
> > have to make the rstat flush atomically. On systems with a lot of
> > cpus/cgroups, this can cause us to disable irqs for a long time,
> > potentially causing problems.
> >
> > Move the call to wb_over_bg_thresh() outside the lock section in
> > preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
> > The list_empty(&wb->work_list) should be okay outside the lock section
> > of wb->list_lock as it is protected by a separate lock (wb->work_lock),
> > and wb_over_bg_thresh() doesn't seem like it is modifying any of the b_*
> > lists the wb->list_lock is protecting. Also, the loop seems to be
> > already releasing and reacquring the lock, so this refactoring looks
> > safe.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> The patch looks good to me. Nice find. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
Thanks for taking a look!
>
> Honza
>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 195dc23e0d831..012357bc8daa3 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2021,7 +2021,6 @@ static long wb_writeback(struct bdi_writeback *wb,
> > struct blk_plug plug;
> >
> > blk_start_plug(&plug);
> > - spin_lock(&wb->list_lock);
> > for (;;) {
> > /*
> > * Stop writeback when nr_pages has been consumed
> > @@ -2046,6 +2045,9 @@ static long wb_writeback(struct bdi_writeback *wb,
> > if (work->for_background && !wb_over_bg_thresh(wb))
> > break;
> >
> > +
> > + spin_lock(&wb->list_lock);
> > +
> > /*
> > * Kupdate and background works are special and we want to
> > * include all inodes that need writing. Livelock avoidance is
> > @@ -2075,13 +2077,19 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + spin_unlock(&wb->list_lock);
> > continue;
> > + }
> > +
> > /*
> > * No more inodes for IO, bail
> > */
> > - if (list_empty(&wb->b_more_io))
> > + if (list_empty(&wb->b_more_io)) {
> > + spin_unlock(&wb->list_lock);
> > break;
> > + }
> > +
> > /*
> > * Nothing written. Wait for some inode to
> > * become available for writeback. Otherwise
> > @@ -2093,9 +2101,7 @@ static long wb_writeback(struct bdi_writeback *wb,
> > spin_unlock(&wb->list_lock);
> > /* This function drops i_lock... */
> > inode_sleep_on_writeback(inode);
> > - spin_lock(&wb->list_lock);
> > }
> > - spin_unlock(&wb->list_lock);
> > blk_finish_plug(&plug);
> >
> > return nr_pages - work->nr_pages;
> > --
> > 2.40.0.348.gf938b09366-goog
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH mm-unstable RFC 4/5] memcg: remove mem_cgroup_flush_stats_atomic()
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-03 22:03 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:03 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Yosry Ahmed
Previous patches removed all callers of mem_cgroup_flush_stats_atomic().
Remove the function and simplify the code.
Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
include/linux/memcontrol.h | 5 -----
mm/memcontrol.c | 24 +++++-------------------
2 files changed, 5 insertions(+), 24 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 222d7370134c7..00a88cf947e14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1038,7 +1038,6 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
}
void mem_cgroup_flush_stats(void);
-void mem_cgroup_flush_stats_atomic(void);
void mem_cgroup_flush_stats_ratelimited(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
@@ -1537,10 +1536,6 @@ static inline void mem_cgroup_flush_stats(void)
{
}
-static inline void mem_cgroup_flush_stats_atomic(void)
-{
-}
-
static inline void mem_cgroup_flush_stats_ratelimited(void)
{
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e7fe18c0c0ef2..33339106f1d9b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -638,7 +638,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
}
}
-static void do_flush_stats(bool atomic)
+static void do_flush_stats(void)
{
/*
* We always flush the entire tree, so concurrent flushers can just
@@ -651,30 +651,16 @@ static void do_flush_stats(bool atomic)
WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
- if (atomic)
- cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
- else
- cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
+ cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
atomic_set(&stats_flush_threshold, 0);
atomic_set(&stats_flush_ongoing, 0);
}
-static bool should_flush_stats(void)
-{
- return atomic_read(&stats_flush_threshold) > num_online_cpus();
-}
-
void mem_cgroup_flush_stats(void)
{
- if (should_flush_stats())
- do_flush_stats(false);
-}
-
-void mem_cgroup_flush_stats_atomic(void)
-{
- if (should_flush_stats())
- do_flush_stats(true);
+ if (atomic_read(&stats_flush_threshold) > num_online_cpus())
+ do_flush_stats();
}
void mem_cgroup_flush_stats_ratelimited(void)
@@ -689,7 +675,7 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
* Always flush here so that flushing in latency-sensitive paths is
* as cheap as possible.
*/
- do_flush_stats(false);
+ do_flush_stats();
queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
--
2.40.0.348.gf938b09366-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH mm-unstable RFC 4/5] memcg: remove mem_cgroup_flush_stats_atomic()
@ 2023-04-03 22:03 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:03 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel, linux-kernel, cgroups, linux-mm, Yosry Ahmed
Previous patches removed all callers of mem_cgroup_flush_stats_atomic().
Remove the function and simplify the code.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
include/linux/memcontrol.h | 5 -----
mm/memcontrol.c | 24 +++++-------------------
2 files changed, 5 insertions(+), 24 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 222d7370134c7..00a88cf947e14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1038,7 +1038,6 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
}
void mem_cgroup_flush_stats(void);
-void mem_cgroup_flush_stats_atomic(void);
void mem_cgroup_flush_stats_ratelimited(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
@@ -1537,10 +1536,6 @@ static inline void mem_cgroup_flush_stats(void)
{
}
-static inline void mem_cgroup_flush_stats_atomic(void)
-{
-}
-
static inline void mem_cgroup_flush_stats_ratelimited(void)
{
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e7fe18c0c0ef2..33339106f1d9b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -638,7 +638,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
}
}
-static void do_flush_stats(bool atomic)
+static void do_flush_stats(void)
{
/*
* We always flush the entire tree, so concurrent flushers can just
@@ -651,30 +651,16 @@ static void do_flush_stats(bool atomic)
WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
- if (atomic)
- cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
- else
- cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
+ cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
atomic_set(&stats_flush_threshold, 0);
atomic_set(&stats_flush_ongoing, 0);
}
-static bool should_flush_stats(void)
-{
- return atomic_read(&stats_flush_threshold) > num_online_cpus();
-}
-
void mem_cgroup_flush_stats(void)
{
- if (should_flush_stats())
- do_flush_stats(false);
-}
-
-void mem_cgroup_flush_stats_atomic(void)
-{
- if (should_flush_stats())
- do_flush_stats(true);
+ if (atomic_read(&stats_flush_threshold) > num_online_cpus())
+ do_flush_stats();
}
void mem_cgroup_flush_stats_ratelimited(void)
@@ -689,7 +675,7 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
* Always flush here so that flushing in latency-sensitive paths is
* as cheap as possible.
*/
- do_flush_stats(false);
+ do_flush_stats();
queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
--
2.40.0.348.gf938b09366-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 4/5] memcg: remove mem_cgroup_flush_stats_atomic()
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-20 19:38 ` Shakeel Butt
-1 siblings, 0 replies; 44+ messages in thread
From: Shakeel Butt @ 2023-04-20 19:38 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, Andrew Morton, linux-fsdevel,
linux-kernel, cgroups, linux-mm
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Previous patches removed all callers of mem_cgroup_flush_stats_atomic().
> Remove the function and simplify the code.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 4/5] memcg: remove mem_cgroup_flush_stats_atomic()
@ 2023-04-20 19:38 ` Shakeel Butt
0 siblings, 0 replies; 44+ messages in thread
From: Shakeel Butt @ 2023-04-20 19:38 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, Andrew Morton, linux-fsdevel,
linux-kernel, cgroups, linux-mm
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Previous patches removed all callers of mem_cgroup_flush_stats_atomic().
> Remove the function and simplify the code.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-03 22:03 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:03 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
Yosry Ahmed
Previous patches removed the only caller of cgroup_rstat_flush_atomic().
Remove the function and simplify the code.
Signed-off-by: Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
include/linux/cgroup.h | 1 -
kernel/cgroup/rstat.c | 26 +++++---------------------
2 files changed, 5 insertions(+), 22 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 885f5395fcd04..567c547cf371f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -692,7 +692,6 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
*/
void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_atomic(struct cgroup *cgrp);
void cgroup_rstat_flush_hold(struct cgroup *cgrp);
void cgroup_rstat_flush_release(void);
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d3252b0416b69..f9ad33f117c82 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -171,7 +171,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
__diag_pop();
/* see cgroup_rstat_flush() */
-static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
+static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
{
int cpu;
@@ -207,9 +207,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
}
raw_spin_unlock_irqrestore(cpu_lock, flags);
- /* if @may_sleep, play nice and yield if necessary */
- if (may_sleep && (need_resched() ||
- spin_needbreak(&cgroup_rstat_lock))) {
+ /* play nice and yield if necessary */
+ if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
spin_unlock_irq(&cgroup_rstat_lock);
if (!cond_resched())
cpu_relax();
@@ -236,25 +235,10 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
might_sleep();
spin_lock_irq(&cgroup_rstat_lock);
- cgroup_rstat_flush_locked(cgrp, true);
+ cgroup_rstat_flush_locked(cgrp);
spin_unlock_irq(&cgroup_rstat_lock);
}
-/**
- * cgroup_rstat_flush_atomic- atomic version of cgroup_rstat_flush()
- * @cgrp: target cgroup
- *
- * This function can be called from any context.
- */
-void cgroup_rstat_flush_atomic(struct cgroup *cgrp)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&cgroup_rstat_lock, flags);
- cgroup_rstat_flush_locked(cgrp, false);
- spin_unlock_irqrestore(&cgroup_rstat_lock, flags);
-}
-
/**
* cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
* @cgrp: target cgroup
@@ -269,7 +253,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp)
{
might_sleep();
spin_lock_irq(&cgroup_rstat_lock);
- cgroup_rstat_flush_locked(cgrp, true);
+ cgroup_rstat_flush_locked(cgrp);
}
/**
--
2.40.0.348.gf938b09366-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
@ 2023-04-03 22:03 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:03 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel, linux-kernel, cgroups, linux-mm, Yosry Ahmed
Previous patches removed the only caller of cgroup_rstat_flush_atomic().
Remove the function and simplify the code.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
include/linux/cgroup.h | 1 -
kernel/cgroup/rstat.c | 26 +++++---------------------
2 files changed, 5 insertions(+), 22 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 885f5395fcd04..567c547cf371f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -692,7 +692,6 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
*/
void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_atomic(struct cgroup *cgrp);
void cgroup_rstat_flush_hold(struct cgroup *cgrp);
void cgroup_rstat_flush_release(void);
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d3252b0416b69..f9ad33f117c82 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -171,7 +171,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
__diag_pop();
/* see cgroup_rstat_flush() */
-static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
+static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
{
int cpu;
@@ -207,9 +207,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
}
raw_spin_unlock_irqrestore(cpu_lock, flags);
- /* if @may_sleep, play nice and yield if necessary */
- if (may_sleep && (need_resched() ||
- spin_needbreak(&cgroup_rstat_lock))) {
+ /* play nice and yield if necessary */
+ if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
spin_unlock_irq(&cgroup_rstat_lock);
if (!cond_resched())
cpu_relax();
@@ -236,25 +235,10 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
might_sleep();
spin_lock_irq(&cgroup_rstat_lock);
- cgroup_rstat_flush_locked(cgrp, true);
+ cgroup_rstat_flush_locked(cgrp);
spin_unlock_irq(&cgroup_rstat_lock);
}
-/**
- * cgroup_rstat_flush_atomic- atomic version of cgroup_rstat_flush()
- * @cgrp: target cgroup
- *
- * This function can be called from any context.
- */
-void cgroup_rstat_flush_atomic(struct cgroup *cgrp)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&cgroup_rstat_lock, flags);
- cgroup_rstat_flush_locked(cgrp, false);
- spin_unlock_irqrestore(&cgroup_rstat_lock, flags);
-}
-
/**
* cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
* @cgrp: target cgroup
@@ -269,7 +253,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp)
{
might_sleep();
spin_lock_irq(&cgroup_rstat_lock);
- cgroup_rstat_flush_locked(cgrp, true);
+ cgroup_rstat_flush_locked(cgrp);
}
/**
--
2.40.0.348.gf938b09366-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-20 19:40 ` Shakeel Butt
-1 siblings, 0 replies; 44+ messages in thread
From: Shakeel Butt @ 2023-04-20 19:40 UTC (permalink / raw)
To: Yosry Ahmed, Tejun Heo
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, Andrew Morton, linux-fsdevel,
linux-kernel, cgroups, linux-mm
+Tejun
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Previous patches removed the only caller of cgroup_rstat_flush_atomic().
> Remove the function and simplify the code.
I would say let cgroup maintainers decide this and this patch can be
decoupled from the series.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
> include/linux/cgroup.h | 1 -
> kernel/cgroup/rstat.c | 26 +++++---------------------
> 2 files changed, 5 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 885f5395fcd04..567c547cf371f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -692,7 +692,6 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
> */
> void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
> void cgroup_rstat_flush(struct cgroup *cgrp);
> -void cgroup_rstat_flush_atomic(struct cgroup *cgrp);
> void cgroup_rstat_flush_hold(struct cgroup *cgrp);
> void cgroup_rstat_flush_release(void);
>
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index d3252b0416b69..f9ad33f117c82 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -171,7 +171,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
> __diag_pop();
>
> /* see cgroup_rstat_flush() */
> -static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
> +static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
> __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
> {
> int cpu;
> @@ -207,9 +207,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
> }
> raw_spin_unlock_irqrestore(cpu_lock, flags);
>
> - /* if @may_sleep, play nice and yield if necessary */
> - if (may_sleep && (need_resched() ||
> - spin_needbreak(&cgroup_rstat_lock))) {
> + /* play nice and yield if necessary */
> + if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
> spin_unlock_irq(&cgroup_rstat_lock);
> if (!cond_resched())
> cpu_relax();
> @@ -236,25 +235,10 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
> might_sleep();
>
> spin_lock_irq(&cgroup_rstat_lock);
> - cgroup_rstat_flush_locked(cgrp, true);
> + cgroup_rstat_flush_locked(cgrp);
> spin_unlock_irq(&cgroup_rstat_lock);
> }
>
> -/**
> - * cgroup_rstat_flush_atomic- atomic version of cgroup_rstat_flush()
> - * @cgrp: target cgroup
> - *
> - * This function can be called from any context.
> - */
> -void cgroup_rstat_flush_atomic(struct cgroup *cgrp)
> -{
> - unsigned long flags;
> -
> - spin_lock_irqsave(&cgroup_rstat_lock, flags);
> - cgroup_rstat_flush_locked(cgrp, false);
> - spin_unlock_irqrestore(&cgroup_rstat_lock, flags);
> -}
> -
> /**
> * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
> * @cgrp: target cgroup
> @@ -269,7 +253,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp)
> {
> might_sleep();
> spin_lock_irq(&cgroup_rstat_lock);
> - cgroup_rstat_flush_locked(cgrp, true);
> + cgroup_rstat_flush_locked(cgrp);
> }
>
> /**
> --
> 2.40.0.348.gf938b09366-goog
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
@ 2023-04-20 19:40 ` Shakeel Butt
0 siblings, 0 replies; 44+ messages in thread
From: Shakeel Butt @ 2023-04-20 19:40 UTC (permalink / raw)
To: Yosry Ahmed, Tejun Heo
Cc: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Muchun Song, Andrew Morton, linux-fsdevel,
linux-kernel, cgroups, linux-mm
+Tejun
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> Previous patches removed the only caller of cgroup_rstat_flush_atomic().
> Remove the function and simplify the code.
I would say let cgroup maintainers decide this and this patch can be
decoupled from the series.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
> include/linux/cgroup.h | 1 -
> kernel/cgroup/rstat.c | 26 +++++---------------------
> 2 files changed, 5 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 885f5395fcd04..567c547cf371f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -692,7 +692,6 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
> */
> void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
> void cgroup_rstat_flush(struct cgroup *cgrp);
> -void cgroup_rstat_flush_atomic(struct cgroup *cgrp);
> void cgroup_rstat_flush_hold(struct cgroup *cgrp);
> void cgroup_rstat_flush_release(void);
>
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index d3252b0416b69..f9ad33f117c82 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -171,7 +171,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
> __diag_pop();
>
> /* see cgroup_rstat_flush() */
> -static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
> +static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
> __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
> {
> int cpu;
> @@ -207,9 +207,8 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
> }
> raw_spin_unlock_irqrestore(cpu_lock, flags);
>
> - /* if @may_sleep, play nice and yield if necessary */
> - if (may_sleep && (need_resched() ||
> - spin_needbreak(&cgroup_rstat_lock))) {
> + /* play nice and yield if necessary */
> + if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
> spin_unlock_irq(&cgroup_rstat_lock);
> if (!cond_resched())
> cpu_relax();
> @@ -236,25 +235,10 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
> might_sleep();
>
> spin_lock_irq(&cgroup_rstat_lock);
> - cgroup_rstat_flush_locked(cgrp, true);
> + cgroup_rstat_flush_locked(cgrp);
> spin_unlock_irq(&cgroup_rstat_lock);
> }
>
> -/**
> - * cgroup_rstat_flush_atomic- atomic version of cgroup_rstat_flush()
> - * @cgrp: target cgroup
> - *
> - * This function can be called from any context.
> - */
> -void cgroup_rstat_flush_atomic(struct cgroup *cgrp)
> -{
> - unsigned long flags;
> -
> - spin_lock_irqsave(&cgroup_rstat_lock, flags);
> - cgroup_rstat_flush_locked(cgrp, false);
> - spin_unlock_irqrestore(&cgroup_rstat_lock, flags);
> -}
> -
> /**
> * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
> * @cgrp: target cgroup
> @@ -269,7 +253,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp)
> {
> might_sleep();
> spin_lock_irq(&cgroup_rstat_lock);
> - cgroup_rstat_flush_locked(cgrp, true);
> + cgroup_rstat_flush_locked(cgrp);
> }
>
> /**
> --
> 2.40.0.348.gf938b09366-goog
>
^ permalink raw reply [flat|nested] 44+ messages in thread[parent not found: <CALvZod79Au=Fw9MTW7fP0P7KwQQ_NUiKgRHsNUFnv9v61pKnZA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
2023-04-20 19:40 ` Shakeel Butt
@ 2023-04-20 19:48 ` Tejun Heo
-1 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2023-04-20 19:48 UTC (permalink / raw)
To: Shakeel Butt
Cc: Yosry Ahmed, Alexander Viro, Christian Brauner, Johannes Weiner,
Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Thu, Apr 20, 2023 at 12:40:24PM -0700, Shakeel Butt wrote:
> +Tejun
>
>
> On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > Previous patches removed the only caller of cgroup_rstat_flush_atomic().
> > Remove the function and simplify the code.
>
>
> I would say let cgroup maintainers decide this and this patch can be
> decoupled from the series.
Looks fine to me but yeah please cc me on the whole series from the next
round.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
@ 2023-04-20 19:48 ` Tejun Heo
0 siblings, 0 replies; 44+ messages in thread
From: Tejun Heo @ 2023-04-20 19:48 UTC (permalink / raw)
To: Shakeel Butt
Cc: Yosry Ahmed, Alexander Viro, Christian Brauner, Johannes Weiner,
Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton,
linux-fsdevel, linux-kernel, cgroups, linux-mm
On Thu, Apr 20, 2023 at 12:40:24PM -0700, Shakeel Butt wrote:
> +Tejun
>
>
> On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > Previous patches removed the only caller of cgroup_rstat_flush_atomic().
> > Remove the function and simplify the code.
>
>
> I would say let cgroup maintainers decide this and this patch can be
> decoupled from the series.
Looks fine to me but yeah please cc me on the whole series from the next
round.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 5/5] cgroup: remove cgroup_rstat_flush_atomic()
2023-04-20 19:48 ` Tejun Heo
(?)
@ 2023-04-20 20:19 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-20 20:19 UTC (permalink / raw)
To: Tejun Heo
Cc: Shakeel Butt, Alexander Viro, Christian Brauner, Johannes Weiner,
Michal Hocko, Roman Gushchin, Muchun Song, Andrew Morton,
linux-fsdevel, linux-kernel, cgroups, linux-mm
On Thu, Apr 20, 2023 at 12:48 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Thu, Apr 20, 2023 at 12:40:24PM -0700, Shakeel Butt wrote:
> > +Tejun
> >
> >
> > On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > Previous patches removed the only caller of cgroup_rstat_flush_atomic().
> > > Remove the function and simplify the code.
> >
> >
> > I would say let cgroup maintainers decide this and this patch can be
> > decoupled from the series.
>
> Looks fine to me but yeah please cc me on the whole series from the next
> round.
Thanks for taking a look, I don't know how I missed CC'ing you on this
RFC. If I have to guess, my initial draft did not have this patch, so
I did not include you or linux-cgroups, then I added this patch. Sorry
for that :)
>
>
> Thanks.
>
> --
> tejun
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 0/5] cgroup: eliminate atomic rstat
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-03 22:04 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:04 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> A previous patch series ([1] currently in mm-unstable) changed most
.. and I naturally forgot to link this:
[1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/
> atomic rstat flushing contexts to become non-atomic. This was done to
> avoid an expensive operation that scales with # cgroups and # cpus to
> happen with irqs disabled and scheduling not permitted. There were two
> remaining atomic flushing contexts after that series. This series tries
> to eliminate them as well, eliminating atomic rstat flushing completely.
>
> The two remaining atomic flushing contexts are:
> (a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
> (b) mem_cgroup_threshold()->mem_cgroup_usage()
>
> For (a), flushing needs to be atomic as wb_writeback() calls
> wb_over_bg_thresh() with a spinlock held. However, it seems like the
> call to wb_over_bg_thresh() doesn't need to be protected by that
> spinlock, so this series proposes a refactoring that moves the call
> outside the lock criticial section and makes the stats flushing
> in mem_cgroup_wb_stats() non-atomic.
>
> For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
> with irqs disabled. We only flush the stats when calculating the root
> usage, as it is approximated as the sum of some memcg stats (file, anon,
> and optionally swap) instead of the conventional page counter. This
> series proposes changing this calculation to use the global stats
> instead, eliminating the need for a memcg stat flush.
>
> After these 2 contexts are eliminated, we no longer need
> mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can
> remove them and simplify the code.
>
> Yosry Ahmed (5):
> writeback: move wb_over_bg_thresh() call outside lock section
> memcg: flush stats non-atomically in mem_cgroup_wb_stats()
> memcg: calculate root usage from global state
> memcg: remove mem_cgroup_flush_stats_atomic()
> cgroup: remove cgroup_rstat_flush_atomic()
>
> fs/fs-writeback.c | 16 +++++++----
> include/linux/cgroup.h | 1 -
> include/linux/memcontrol.h | 5 ----
> kernel/cgroup/rstat.c | 26 ++++--------------
> mm/memcontrol.c | 54 ++++++++------------------------------
> 5 files changed, 27 insertions(+), 75 deletions(-)
>
> --
> 2.40.0.348.gf938b09366-goog
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 0/5] cgroup: eliminate atomic rstat
@ 2023-04-03 22:04 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-03 22:04 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel, linux-kernel, cgroups, linux-mm
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> A previous patch series ([1] currently in mm-unstable) changed most
.. and I naturally forgot to link this:
[1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/
> atomic rstat flushing contexts to become non-atomic. This was done to
> avoid an expensive operation that scales with # cgroups and # cpus to
> happen with irqs disabled and scheduling not permitted. There were two
> remaining atomic flushing contexts after that series. This series tries
> to eliminate them as well, eliminating atomic rstat flushing completely.
>
> The two remaining atomic flushing contexts are:
> (a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
> (b) mem_cgroup_threshold()->mem_cgroup_usage()
>
> For (a), flushing needs to be atomic as wb_writeback() calls
> wb_over_bg_thresh() with a spinlock held. However, it seems like the
> call to wb_over_bg_thresh() doesn't need to be protected by that
> spinlock, so this series proposes a refactoring that moves the call
> outside the lock criticial section and makes the stats flushing
> in mem_cgroup_wb_stats() non-atomic.
>
> For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
> with irqs disabled. We only flush the stats when calculating the root
> usage, as it is approximated as the sum of some memcg stats (file, anon,
> and optionally swap) instead of the conventional page counter. This
> series proposes changing this calculation to use the global stats
> instead, eliminating the need for a memcg stat flush.
>
> After these 2 contexts are eliminated, we no longer need
> mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can
> remove them and simplify the code.
>
> Yosry Ahmed (5):
> writeback: move wb_over_bg_thresh() call outside lock section
> memcg: flush stats non-atomically in mem_cgroup_wb_stats()
> memcg: calculate root usage from global state
> memcg: remove mem_cgroup_flush_stats_atomic()
> cgroup: remove cgroup_rstat_flush_atomic()
>
> fs/fs-writeback.c | 16 +++++++----
> include/linux/cgroup.h | 1 -
> include/linux/memcontrol.h | 5 ----
> kernel/cgroup/rstat.c | 26 ++++--------------
> mm/memcontrol.c | 54 ++++++++------------------------------
> 5 files changed, 27 insertions(+), 75 deletions(-)
>
> --
> 2.40.0.348.gf938b09366-goog
>
^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <CAJD7tkZ5vh5ssDux1LStX9ZivmGmXsFyxfADGJD5AXDaMnGWRQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH mm-unstable RFC 0/5] cgroup: eliminate atomic rstat
2023-04-03 22:04 ` Yosry Ahmed
@ 2023-04-06 18:26 ` Tim Chen
-1 siblings, 0 replies; 44+ messages in thread
From: Tim Chen @ 2023-04-06 18:26 UTC (permalink / raw)
To: Yosry Ahmed, Alexander Viro, Christian Brauner, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Mon, 2023-04-03 at 15:04 -0700, Yosry Ahmed wrote:
> On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > A previous patch series ([1] currently in mm-unstable) changed most
>
> .. and I naturally forgot to link this:
> [1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/
Thanks. Saw this after I sent my request for link.
Tim
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 0/5] cgroup: eliminate atomic rstat
@ 2023-04-06 18:26 ` Tim Chen
0 siblings, 0 replies; 44+ messages in thread
From: Tim Chen @ 2023-04-06 18:26 UTC (permalink / raw)
To: Yosry Ahmed, Alexander Viro, Christian Brauner, Johannes Weiner,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton
Cc: linux-fsdevel, linux-kernel, cgroups, linux-mm
On Mon, 2023-04-03 at 15:04 -0700, Yosry Ahmed wrote:
> On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > A previous patch series ([1] currently in mm-unstable) changed most
>
> .. and I naturally forgot to link this:
> [1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/
Thanks. Saw this after I sent my request for link.
Tim
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 0/5] cgroup: eliminate atomic rstat
2023-04-03 22:03 ` Yosry Ahmed
@ 2023-04-17 11:54 ` Yosry Ahmed
-1 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-17 11:54 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> A previous patch series ([1] currently in mm-unstable) changed most
> atomic rstat flushing contexts to become non-atomic. This was done to
> avoid an expensive operation that scales with # cgroups and # cpus to
> happen with irqs disabled and scheduling not permitted. There were two
> remaining atomic flushing contexts after that series. This series tries
> to eliminate them as well, eliminating atomic rstat flushing completely.
>
> The two remaining atomic flushing contexts are:
> (a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
> (b) mem_cgroup_threshold()->mem_cgroup_usage()
>
> For (a), flushing needs to be atomic as wb_writeback() calls
> wb_over_bg_thresh() with a spinlock held. However, it seems like the
> call to wb_over_bg_thresh() doesn't need to be protected by that
> spinlock, so this series proposes a refactoring that moves the call
> outside the lock criticial section and makes the stats flushing
> in mem_cgroup_wb_stats() non-atomic.
>
> For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
> with irqs disabled. We only flush the stats when calculating the root
> usage, as it is approximated as the sum of some memcg stats (file, anon,
> and optionally swap) instead of the conventional page counter. This
> series proposes changing this calculation to use the global stats
> instead, eliminating the need for a memcg stat flush.
>
> After these 2 contexts are eliminated, we no longer need
> mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can
> remove them and simplify the code.
>
> Yosry Ahmed (5):
> writeback: move wb_over_bg_thresh() call outside lock section
> memcg: flush stats non-atomically in mem_cgroup_wb_stats()
> memcg: calculate root usage from global state
> memcg: remove mem_cgroup_flush_stats_atomic()
> cgroup: remove cgroup_rstat_flush_atomic()
>
> fs/fs-writeback.c | 16 +++++++----
> include/linux/cgroup.h | 1 -
> include/linux/memcontrol.h | 5 ----
> kernel/cgroup/rstat.c | 26 ++++--------------
> mm/memcontrol.c | 54 ++++++++------------------------------
> 5 files changed, 27 insertions(+), 75 deletions(-)
>
> --
> 2.40.0.348.gf938b09366-goog
>
Any thoughts on this series, anyone? :)
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH mm-unstable RFC 0/5] cgroup: eliminate atomic rstat
@ 2023-04-17 11:54 ` Yosry Ahmed
0 siblings, 0 replies; 44+ messages in thread
From: Yosry Ahmed @ 2023-04-17 11:54 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Johannes Weiner, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton
Cc: linux-fsdevel, linux-kernel, cgroups, linux-mm
On Mon, Apr 3, 2023 at 3:03 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> A previous patch series ([1] currently in mm-unstable) changed most
> atomic rstat flushing contexts to become non-atomic. This was done to
> avoid an expensive operation that scales with # cgroups and # cpus to
> happen with irqs disabled and scheduling not permitted. There were two
> remaining atomic flushing contexts after that series. This series tries
> to eliminate them as well, eliminating atomic rstat flushing completely.
>
> The two remaining atomic flushing contexts are:
> (a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
> (b) mem_cgroup_threshold()->mem_cgroup_usage()
>
> For (a), flushing needs to be atomic as wb_writeback() calls
> wb_over_bg_thresh() with a spinlock held. However, it seems like the
> call to wb_over_bg_thresh() doesn't need to be protected by that
> spinlock, so this series proposes a refactoring that moves the call
> outside the lock criticial section and makes the stats flushing
> in mem_cgroup_wb_stats() non-atomic.
>
> For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
> with irqs disabled. We only flush the stats when calculating the root
> usage, as it is approximated as the sum of some memcg stats (file, anon,
> and optionally swap) instead of the conventional page counter. This
> series proposes changing this calculation to use the global stats
> instead, eliminating the need for a memcg stat flush.
>
> After these 2 contexts are eliminated, we no longer need
> mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can
> remove them and simplify the code.
>
> Yosry Ahmed (5):
> writeback: move wb_over_bg_thresh() call outside lock section
> memcg: flush stats non-atomically in mem_cgroup_wb_stats()
> memcg: calculate root usage from global state
> memcg: remove mem_cgroup_flush_stats_atomic()
> cgroup: remove cgroup_rstat_flush_atomic()
>
> fs/fs-writeback.c | 16 +++++++----
> include/linux/cgroup.h | 1 -
> include/linux/memcontrol.h | 5 ----
> kernel/cgroup/rstat.c | 26 ++++--------------
> mm/memcontrol.c | 54 ++++++++------------------------------
> 5 files changed, 27 insertions(+), 75 deletions(-)
>
> --
> 2.40.0.348.gf938b09366-goog
>
Any thoughts on this series, anyone? :)
^ permalink raw reply [flat|nested] 44+ messages in thread