* + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree
@ 2019-01-30 5:10 akpm
2019-01-30 5:54 ` Roman Gushchin
0 siblings, 1 reply; 6+ messages in thread
From: akpm @ 2019-01-30 5:10 UTC (permalink / raw)
To: dairinin, dchinner, guro, mhocko, mm-commits, riel, stable
The patch titled
Subject: Revert "mm: don't reclaim inodes with many attached pages"
has been added to the -mm tree. Its filename is
revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dave Chinner <dchinner@redhat.com>
Subject: Revert "mm: don't reclaim inodes with many attached pages"
This reverts commit a76cf1a474d7dbcd9336b5f5afb0162baa142cf0.
This change causes serious changes to page cache and inode cache behaviour
and balance, resulting in major performance regressions when combining
worklaods such as large file copies and kernel compiles.
https://bugzilla.kernel.org/show_bug.cgi?id=202441
This change is a hack to work around the problems introduced by changing
how agressive shrinkers are on small caches in commit 172b06c32b94 ("mm:
slowly shrink slabs with a relatively small number of objects"). It
creates more problems than it solves, wasn't adequately reviewed or
tested, so it needs to be reverted.
Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
--- a/fs/inode.c~revert-mm-dont-reclaim-inodes-with-many-attached-pages
+++ a/fs/inode.c
@@ -730,11 +730,8 @@ static enum lru_status inode_lru_isolate
return LRU_REMOVED;
}
- /*
- * Recently referenced inodes and inodes with many attached pages
- * get one more pass.
- */
- if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
+ /* recently referenced inodes get one more pass */
+ if (inode->i_state & I_REFERENCED) {
inode->i_state &= ~I_REFERENCED;
spin_unlock(&inode->i_lock);
return LRU_ROTATE;
_
Patches currently in -mm which might be from dchinner@redhat.com are
revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch
revert-mm-slowly-shrink-slabs-with-a-relatively-small-number-of-objects.patch
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree
2019-01-30 5:10 + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree akpm
@ 2019-01-30 5:54 ` Roman Gushchin
2019-01-30 22:10 ` Andrew Morton
2019-01-31 1:44 ` Dave Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Roman Gushchin @ 2019-01-30 5:54 UTC (permalink / raw)
To: akpm@linux-foundation.org
Cc: dairinin@gmail.com, dchinner@redhat.com, mhocko@kernel.org,
mm-commits@vger.kernel.org, riel@surriel.com,
stable@vger.kernel.org
Hi, Andrew!
I believe, that Rik's patch ( https://lkml.org/lkml/2019/1/28/1865 ) can make
a difference here, and might fix the regression. I'd give it a chance, before
reverting these two patches. Reverting will re-introduce the memcg-leak, which
is quite bad.
Thanks!
On Tue, Jan 29, 2019 at 09:10:09PM -0800, Andrew Morton wrote:
>
> The patch titled
> Subject: Revert "mm: don't reclaim inodes with many attached pages"
> has been added to the -mm tree. Its filename is
> revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch
>
> This patch should soon appear at
> https://urldefense.proofpoint.com/v2/url?u=http-3A__ozlabs.org_-7Eakpm_mmots_broken-2Dout_revert-2Dmm-2Ddont-2Dreclaim-2Dinodes-2Dwith-2Dmany-2Dattached-2Dpages.patch&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=FyF1C9cGjEdqHkUBqvei-KdEdRTXFGpZ4I0PmiEfyFo&s=VSS5uU4gj2RCLtvxvOT33OayobrBHoMnr7S_BX6Z1aU&e=
> and later at
> https://urldefense.proofpoint.com/v2/url?u=http-3A__ozlabs.org_-7Eakpm_mmotm_broken-2Dout_revert-2Dmm-2Ddont-2Dreclaim-2Dinodes-2Dwith-2Dmany-2Dattached-2Dpages.patch&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=FyF1C9cGjEdqHkUBqvei-KdEdRTXFGpZ4I0PmiEfyFo&s=-LrpRUXob7gymkXVHfk99LZx2GwNY8kcoUD1TuxO7kk&e=
>
> Before you just go and hit "reply", please:
> a) Consider who else should be cc'ed
> b) Prefer to cc a suitable mailing list as well
> c) Ideally: find the original patch on the mailing list and do a
> reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
>
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
>
> ------------------------------------------------------
> From: Dave Chinner <dchinner@redhat.com>
> Subject: Revert "mm: don't reclaim inodes with many attached pages"
>
> This reverts commit a76cf1a474d7dbcd9336b5f5afb0162baa142cf0.
>
> This change causes serious changes to page cache and inode cache behaviour
> and balance, resulting in major performance regressions when combining
> worklaods such as large file copies and kernel compiles.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=202441
>
> This change is a hack to work around the problems introduced by changing
> how agressive shrinkers are on small caches in commit 172b06c32b94 ("mm:
> slowly shrink slabs with a relatively small number of objects"). It
> creates more problems than it solves, wasn't adequately reviewed or
> tested, so it needs to be reverted.
>
> Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Spock <dairinin@gmail.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>
> --- a/fs/inode.c~revert-mm-dont-reclaim-inodes-with-many-attached-pages
> +++ a/fs/inode.c
> @@ -730,11 +730,8 @@ static enum lru_status inode_lru_isolate
> return LRU_REMOVED;
> }
>
> - /*
> - * Recently referenced inodes and inodes with many attached pages
> - * get one more pass.
> - */
> - if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
> + /* recently referenced inodes get one more pass */
> + if (inode->i_state & I_REFERENCED) {
> inode->i_state &= ~I_REFERENCED;
> spin_unlock(&inode->i_lock);
> return LRU_ROTATE;
> _
>
> Patches currently in -mm which might be from dchinner@redhat.com are
>
> revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch
> revert-mm-slowly-shrink-slabs-with-a-relatively-small-number-of-objects.patch
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree
2019-01-30 5:54 ` Roman Gushchin
@ 2019-01-30 22:10 ` Andrew Morton
2019-01-31 1:44 ` Dave Chinner
1 sibling, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2019-01-30 22:10 UTC (permalink / raw)
To: Roman Gushchin
Cc: dairinin@gmail.com, dchinner@redhat.com, mhocko@kernel.org,
mm-commits@vger.kernel.org, riel@surriel.com,
stable@vger.kernel.org
On Wed, 30 Jan 2019 05:54:27 +0000 Roman Gushchin <guro@fb.com> wrote:
> I believe, that Rik's patch ( https://lkml.org/lkml/2019/1/28/1865 ) can make
> a difference here, and might fix the regression. I'd give it a chance, before
> reverting these two patches. Reverting will re-introduce the memcg-leak, which
> is quite bad.
Yup. I'll restore Rik's "mm, slab, vmscan: accumulate gradual pressure
on small slabs" for next -mm.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree
2019-01-30 5:54 ` Roman Gushchin
2019-01-30 22:10 ` Andrew Morton
@ 2019-01-31 1:44 ` Dave Chinner
2019-01-31 2:33 ` Rik van Riel
1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2019-01-31 1:44 UTC (permalink / raw)
To: Roman Gushchin
Cc: akpm@linux-foundation.org, dairinin@gmail.com, mhocko@kernel.org,
mm-commits@vger.kernel.org, riel@surriel.com,
stable@vger.kernel.org
On Wed, Jan 30, 2019 at 05:54:27AM +0000, Roman Gushchin wrote:
> Hi, Andrew!
>
> I believe, that Rik's patch ( https://lkml.org/lkml/2019/1/28/1865 ) can make
> a difference here, and might fix the regression. I'd give it a chance, before
> reverting these two patches. Reverting will re-introduce the memcg-leak, which
> is quite bad.
Rik's change is just another hack that will still have effects on
reclaim behaviour.
Indeed, the fs/inode.c change definitely needs reverting, because
that is just *plain wrong* and breaks long-standing memory reclaim
behaviour.
I seriously disagree with shovelling a different, largely untested
and contentious change to the shrinker algorithm to try and patch
over the symptoms of the original change. It leaves the underlying
problem unfixed (dying memcgs need a reaper to shrink the remaining
slab objects that pin that specific memcg) and instead plays
"whack-a-mole" on what we alreayd know is a fundamentally broken
assumption (i.e. that shrinking small slabs more agressively is
side-effect free).
-Dave.
--
Dave Chinner
dchinner@redhat.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree
2019-01-31 1:44 ` Dave Chinner
@ 2019-01-31 2:33 ` Rik van Riel
2019-01-31 22:49 ` Dave Chinner
0 siblings, 1 reply; 6+ messages in thread
From: Rik van Riel @ 2019-01-31 2:33 UTC (permalink / raw)
To: Dave Chinner, Roman Gushchin
Cc: akpm@linux-foundation.org, dairinin@gmail.com, mhocko@kernel.org,
mm-commits@vger.kernel.org, stable@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]
On Thu, 2019-01-31 at 12:44 +1100, Dave Chinner wrote:
> Indeed, the fs/inode.c change definitely needs reverting, because
> that is just *plain wrong* and breaks long-standing memory reclaim
> behaviour.
The long-standing behavior may be wrong here, because
of just how incredibly slow ext4 and XFS are when it
comes to reclaiming inodes with lots of dirty pages.
We have observed some real system stalls when the
reclaim code hits an ext4 or XFS inode with dozens
of megabytes of dirty data that needs to be synced
out before the inode can be reclaimed.
Have you observed any regressions due to not
reclaiming inodes with cached pages attached?
If so, what kind of behavioral differences are you
seeing due to that regression?
It would be nice if we could figure out a way to
avoid both bad behaviors...
> I seriously disagree with shovelling a different, largely untested
> and contentious change to the shrinker algorithm to try and patch
> over the symptoms of the original change. It leaves the underlying
> problem unfixed (dying memcgs need a reaper to shrink the remaining
> slab objects that pin that specific memcg) and instead plays
> "whack-a-mole" on what we alreayd know is a fundamentally broken
> assumption (i.e. that shrinking small slabs more agressively is
> side-effect free).
My patch shrinks small slabs with the same pressure
as larger slabs. It also ensures that slabs from dead
memcgs will get eventually reclaimed.
What am I missing?
--
All Rights Reversed.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree
2019-01-31 2:33 ` Rik van Riel
@ 2019-01-31 22:49 ` Dave Chinner
0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2019-01-31 22:49 UTC (permalink / raw)
To: Rik van Riel
Cc: Roman Gushchin, akpm@linux-foundation.org, dairinin@gmail.com,
mhocko@kernel.org, mm-commits@vger.kernel.org,
stable@vger.kernel.org
On Wed, Jan 30, 2019 at 09:33:34PM -0500, Rik van Riel wrote:
> On Thu, 2019-01-31 at 12:44 +1100, Dave Chinner wrote:
>
> > Indeed, the fs/inode.c change definitely needs reverting, because
> > that is just *plain wrong* and breaks long-standing memory reclaim
> > behaviour.
>
> The long-standing behavior may be wrong here, because
> of just how incredibly slow ext4 and XFS are when it
> comes to reclaiming inodes with lots of dirty pages.
That has nothing to do with the reasons that the change that was
made. The reasons the change were to band-aid a regression for a
different change that has clearly introduced multiple regressions.
This change of >15 year old behaviour can't be justified by arguing
that "but what about <this other completely differnt issue>!"
If you have problems with writeback and inode reclaim, do what
everyone else does: write a bug report showing where the problematic
writeback is coming from, explain why it's a problem and propose a
fix. And post it to linux-fsdevel@vger.kernel.org for review.
Rik, you know how to work with upstream development, I should not
have to be telling you this.
> We have observed some real system stalls when the
> reclaim code hits an ext4 or XFS inode with dozens
> of megabytes of dirty data that needs to be synced
> out before the inode can be reclaimed.
"reclaim code" is a massive foot print. Please be specific.
> Have you observed any regressions due to not
> reclaiming inodes with cached pages attached?
That's not how we justify a change. The burden of proof of
correctness is on the person proposing the change, not on the
reviewer to prove the change is wrong. We have clear indication that
is changes behaviour of common workloads (copying files at the same
time as compiling a kernel is a common thing!), so we should revert
it until we have a set of changes that actually are proved to work
correctly for common workloads.
> > I seriously disagree with shovelling a different, largely untested
> > and contentious change to the shrinker algorithm to try and patch
> > over the symptoms of the original change. It leaves the underlying
> > problem unfixed (dying memcgs need a reaper to shrink the remaining
> > slab objects that pin that specific memcg) and instead plays
> > "whack-a-mole" on what we alreayd know is a fundamentally broken
> > assumption (i.e. that shrinking small slabs more agressively is
> > side-effect free).
>
> My patch shrinks small slabs with the same pressure
> as larger slabs. It also ensures that slabs from dead
> memcgs will get eventually reclaimed.
>
> What am I missing?
That "freeable" is not a measure of slab cache size.
It's a measure of how many freeable objects are in the cache.
i.e. a large cache with nothing freeable returns the same value
as a small cache with nothing freeable.
i.e. the basic premise that "freeable is small == this is a small
cache" is flawed.
And then there's the bugs. Don't forget - the same shrinker can run
concurrent on every node via kswapd and be called by every instance
of direct reclaim on every CPU at the same time. (i.e. the unbound
concurrency problem that fucks up inode caches and results in having
to throttle (block) in the shrinker implementation so the working
sets don't get trashed by massively excessive memory reclaim
pressure).
-Dave.
--
Dave Chinner
dchinner@redhat.com
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-01-31 22:49 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-01-30 5:10 + revert-mm-dont-reclaim-inodes-with-many-attached-pages.patch added to -mm tree akpm
2019-01-30 5:54 ` Roman Gushchin
2019-01-30 22:10 ` Andrew Morton
2019-01-31 1:44 ` Dave Chinner
2019-01-31 2:33 ` Rik van Riel
2019-01-31 22:49 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).