public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs: Timely free truncated dirty pages
@ 2017-01-10 10:17 Jan Kara
  2017-01-10 11:09 ` Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jan Kara @ 2017-01-10 10:17 UTC
  To: Darrick J . Wong
  Cc: linux-xfs, Petr Tuma, Jan Kara, Brian Foster, Vlastimil Babka

Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
to skip dirty pages in xfs_vm_releasepage() which also has the effect
that if a dirty page is truncated, it does not get freed by
block_invalidatepage() and lingers on the LRU list waiting for reclaim.
So a simple loop like:

while true; do
	dd if=/dev/zero of=file bs=1M count=100
	rm file
done

will keep using more and more memory until we hit low watermarks and
start pagecache reclaim which will eventually reclaim also the truncated
pages. Keeping these truncated (and thus never usable) pages in memory
is just a waste of memory, is unnecessarily stressing page cache
reclaim, and is also confusing users into thinking they are running out
of memory.

So instead of just skipping dirty pages in xfs_vm_releasepage(), return
to the old behavior of skipping them only if they have delalloc or
unwritten buffers and fix the spurious warnings by warning only if the
page is clean.

CC: Brian Foster <bfoster@redhat.com>
CC: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/xfs/xfs_aops.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0f56fcd3a5d5..670d38ff7dc7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1150,21 +1150,20 @@ xfs_vm_releasepage(
 	 * the dirty bit cleared. Thus, it can send actual dirty pages to
 	 * ->releasepage() via shrink_active_list(). Conversely,
 	 * block_invalidatepage() can send pages that are still marked dirty
-	 * but otherwise have invalidated buffers.
-	 *
-	 * We've historically freed buffers on the latter. Instead, quietly
-	 * filter out all dirty pages to avoid spurious buffer state warnings.
-	 * This can likely be removed once shrink_active_list() is fixed.
+	 * but otherwise have invalidated buffers. So we warn only if the page
+	 * is clean to avoid spurious warnings when called from
+	 * shrink_active_list() for a dirty page.
 	 */
-	if (PageDirty(page))
-		return 0;
-
 	xfs_count_page_state(page, &delalloc, &unwritten);
 
-	if (WARN_ON_ONCE(delalloc))
+	if (delalloc) {
+		WARN_ON_ONCE(!PageDirty(page));
 		return 0;
-	if (WARN_ON_ONCE(unwritten))
+	}
+	if (unwritten) {
+		WARN_ON_ONCE(!PageDirty(page));
 		return 0;
+	}
 
 	return try_to_free_buffers(page);
 }
-- 
2.10.2



* Re: [PATCH] xfs: Timely free truncated dirty pages
  2017-01-10 10:17 [PATCH] xfs: Timely free truncated dirty pages Jan Kara
@ 2017-01-10 11:09 ` Vlastimil Babka
  2017-01-10 12:43 ` Brian Foster
  2017-01-11  8:07 ` Vlastimil Babka
  2 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2017-01-10 11:09 UTC
  To: Jan Kara, Darrick J . Wong; +Cc: linux-xfs, Petr Tuma, Brian Foster, stable

On 01/10/2017 11:17 AM, Jan Kara wrote:
> Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
> to skip dirty pages in xfs_vm_releasepage() which also has the effect
> that if a dirty page is truncated, it does not get freed by
> block_invalidatepage() and is lingering in LRU list waiting for reclaim.
> So a simple loop like:
> 
> while true; do
> 	dd if=/dev/zero of=file bs=1M count=100
> 	rm file
> done
> 
> will keep using more and more memory until we hit low watermarks and
> start pagecache reclaim which will eventually reclaim also the truncated
> pages. Keeping these truncated (and thus never usable) pages in memory
> is just a waste of memory, is unnecessarily stressing page cache
> reclaim, and is also confusing users thinking they are running out of
> memory.
> 
> So instead of just skipping dirty pages in xfs_vm_releasepage(), return
> to old behavior of skipping them only if they have delalloc or unwritten
> buffers and fix the spurious warnings by warning only if the page is
> clean.
> 
> CC: Brian Foster <bfoster@redhat.com>
> CC: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
> Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
> Signed-off-by: Jan Kara <jack@suse.cz>

Thanks,

I think this is worth including in stable? (4.8+)

> ---
>  fs/xfs/xfs_aops.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 0f56fcd3a5d5..670d38ff7dc7 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1150,21 +1150,20 @@ xfs_vm_releasepage(
>  	 * the dirty bit cleared. Thus, it can send actual dirty pages to
>  	 * ->releasepage() via shrink_active_list(). Conversely,
>  	 * block_invalidatepage() can send pages that are still marked dirty
> -	 * but otherwise have invalidated buffers.
> -	 *
> -	 * We've historically freed buffers on the latter. Instead, quietly
> -	 * filter out all dirty pages to avoid spurious buffer state warnings.
> -	 * This can likely be removed once shrink_active_list() is fixed.
> +	 * but otherwise have invalidated buffers. So we warn only if the page
> +	 * is clean to avoid spurious warnings when called from
> +	 * shrink_active_list() for a dirty page.
>  	 */
> -	if (PageDirty(page))
> -		return 0;
> -
>  	xfs_count_page_state(page, &delalloc, &unwritten);
>  
> -	if (WARN_ON_ONCE(delalloc))
> +	if (delalloc) {
> +		WARN_ON_ONCE(!PageDirty(page));
>  		return 0;
> -	if (WARN_ON_ONCE(unwritten))
> +	}
> +	if (unwritten) {
> +		WARN_ON_ONCE(!PageDirty(page));
>  		return 0;
> +	}
>  
>  	return try_to_free_buffers(page);
>  }
> 



* Re: [PATCH] xfs: Timely free truncated dirty pages
  2017-01-10 10:17 [PATCH] xfs: Timely free truncated dirty pages Jan Kara
  2017-01-10 11:09 ` Vlastimil Babka
@ 2017-01-10 12:43 ` Brian Foster
  2017-01-11  8:57   ` Jan Kara
  2017-01-11  8:07 ` Vlastimil Babka
  2 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2017-01-10 12:43 UTC
  To: Jan Kara; +Cc: Darrick J . Wong, linux-xfs, Petr Tuma, Vlastimil Babka

On Tue, Jan 10, 2017 at 11:17:45AM +0100, Jan Kara wrote:
> Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
> to skip dirty pages in xfs_vm_releasepage() which also has the effect
> that if a dirty page is truncated, it does not get freed by
> block_invalidatepage() and is lingering in LRU list waiting for reclaim.
> So a simple loop like:
> 
> while true; do
> 	dd if=/dev/zero of=file bs=1M count=100
> 	rm file
> done
> 
> will keep using more and more memory until we hit low watermarks and
> start pagecache reclaim which will eventually reclaim also the truncated
> pages. Keeping these truncated (and thus never usable) pages in memory
> is just a waste of memory, is unnecessarily stressing page cache
> reclaim, and is also confusing users thinking they are running out of
> memory.
> 
> So instead of just skipping dirty pages in xfs_vm_releasepage(), return
> to old behavior of skipping them only if they have delalloc or unwritten
> buffers and fix the spurious warnings by warning only if the page is
> clean.
> 
> CC: Brian Foster <bfoster@redhat.com>
> CC: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
> Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Ugh, so this is closer to the alternative I was considering at the time
this was posted. Thanks for the fix. The code looks fine, I'd just like
to update the comment...

>  fs/xfs/xfs_aops.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 0f56fcd3a5d5..670d38ff7dc7 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1150,21 +1150,20 @@ xfs_vm_releasepage(
>  	 * the dirty bit cleared. Thus, it can send actual dirty pages to
>  	 * ->releasepage() via shrink_active_list(). Conversely,
>  	 * block_invalidatepage() can send pages that are still marked dirty
> -	 * but otherwise have invalidated buffers.
> -	 *
> -	 * We've historically freed buffers on the latter. Instead, quietly
> -	 * filter out all dirty pages to avoid spurious buffer state warnings.
> -	 * This can likely be removed once shrink_active_list() is fixed.
> +	 * but otherwise have invalidated buffers. So we warn only if the page
> +	 * is clean to avoid spurious warnings when called from
> +	 * shrink_active_list() for a dirty page.
>  	 */

Could you fold in something like the following?

-	 * but otherwise have invalidated buffers. So we warn only if the page
-	 * is clean to avoid spurious warnings when called from
-	 * shrink_active_list() for a dirty page.
+	 * but otherwise have invalidated buffers.
+	 *
+	 * We want to release the latter to avoid unnecessary buildup of the
+	 * LRU, skip the former and warn if we've left any lingering
+	 * delalloc/unwritten buffers on clean pages. Skip pages with delalloc
+	 * or unwritten buffers and warn if the page is not dirty. Otherwise
+	 * try to release the buffers.
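
The decision the comment describes can be modelled in userspace as a
rough sketch. This is not the kernel function: the struct and
model_releasepage() are hypothetical names, and the page state is
reduced to a dirty flag plus the two buffer counts that
xfs_count_page_state() reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the modelled ->releasepage() decision (hypothetical type). */
struct release_decision {
	bool try_free;	/* would go on to try_to_free_buffers() */
	bool warn;	/* would trigger WARN_ON_ONCE() */
};

/*
 * Model of the patched logic: pages with delalloc or unwritten buffers
 * are skipped, and a warning fires only if such a page is clean.
 * Everything else, including a truncated page that is still marked
 * dirty but has invalidated buffers, is released.
 */
static struct release_decision model_releasepage(bool dirty, int delalloc,
						 int unwritten)
{
	struct release_decision d = { .try_free = true, .warn = false };

	assert(delalloc >= 0 && unwritten >= 0);

	if (delalloc || unwritten) {
		d.try_free = false;
		d.warn = !dirty;
	}
	return d;
}
```

With this model, model_releasepage(true, 0, 0) (the truncated dirty
page case) yields try_free without a warning, whereas the code before
the patch would have returned early on the dirty bit alone.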

Brian

> -	if (PageDirty(page))
> -		return 0;
> -
>  	xfs_count_page_state(page, &delalloc, &unwritten);
>  
> -	if (WARN_ON_ONCE(delalloc))
> +	if (delalloc) {
> +		WARN_ON_ONCE(!PageDirty(page));
>  		return 0;
> -	if (WARN_ON_ONCE(unwritten))
> +	}
> +	if (unwritten) {
> +		WARN_ON_ONCE(!PageDirty(page));
>  		return 0;
> +	}
>  
>  	return try_to_free_buffers(page);
>  }
> -- 
> 2.10.2
> 


* Re: [PATCH] xfs: Timely free truncated dirty pages
  2017-01-10 10:17 [PATCH] xfs: Timely free truncated dirty pages Jan Kara
  2017-01-10 11:09 ` Vlastimil Babka
  2017-01-10 12:43 ` Brian Foster
@ 2017-01-11  8:07 ` Vlastimil Babka
  2017-01-11  8:19   ` Vlastimil Babka
  2017-01-11  8:59   ` Jan Kara
  2 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2017-01-11  8:07 UTC
  To: Jan Kara, Darrick J . Wong; +Cc: linux-xfs, Petr Tuma, Brian Foster

On 01/10/2017 11:17 AM, Jan Kara wrote:
> Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
> to skip dirty pages in xfs_vm_releasepage() which also has the effect
> that if a dirty page is truncated, it does not get freed by
> block_invalidatepage() and is lingering in LRU list waiting for reclaim.
> So a simple loop like:
> 
> while true; do
> 	dd if=/dev/zero of=file bs=1M count=100
> 	rm file
> done
> 
> will keep using more and more memory until we hit low watermarks and
> start pagecache reclaim which will eventually reclaim also the truncated
> pages. Keeping these truncated (and thus never usable) pages in memory
> is just a waste of memory, is unnecessarily stressing page cache
> reclaim, and is also confusing users thinking they are running out of
> memory.

Hi,

So the impact is even worse than that, as it's the kernel that is
actually confused, while the user still gets the impression of memory
being available.
According to the reporter, this bug has manifested as anonymous mmap()
returning with ENOMEM in their workload (some benchmark), which does not
happen anymore after switching from xfs to ext4.

Here are the relevant /proc/meminfo counters from a system that is
experiencing the bug (but hasn't exhausted all memory and hit ENOMEM yet):


MemTotal:       65862388 kB
MemFree:         6882888 kB
MemAvailable:   54829584 kB
Buffers:            2248 kB
Cached:          1937376 kB
SwapCached:            0 kB
Active:         14969280 kB
Inactive:       42491396 kB
Active(anon):   10034824 kB
Inactive(anon):     1420 kB
Active(file):    4934456 kB
Inactive(file): 42489976 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        511996 kB
SwapFree:         511996 kB
Dirty:              1452 kB
Writeback:             0 kB
AnonPages:      10034556 kB
Mapped:           143956 kB
Shmem:              1836 kB
...

MemAvailable suggests that most of memory (except anonymous mmaps) is
still available, i.e. free or reclaimable. But there's a large
discrepancy between "Cached", which is based on the NR_FILE_PAGES
counter, and [In]Active(file), which reflects NR_[IN]ACTIVE_FILE
counters. So the truncated pages are not counted towards page cache
anymore, but are still accounted as LRU pages, as they are placed on the
LRU lists waiting for reclaim.
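
To put rough numbers on that discrepancy, here is a small userspace
sketch in C (not kernel code). It assumes that "Cached" is
NR_FILE_PAGES minus Buffers and SwapCached, so the three are added
back to approximate NR_FILE_PAGES. The struct and function names are
made up for illustration. All values are in kB and come from the
excerpt above.

```c
#include <assert.h>

/* Values in kB, as printed by /proc/meminfo. */
struct meminfo_kb {
	unsigned long buffers;
	unsigned long cached;
	unsigned long swap_cached;
	unsigned long active_file;
	unsigned long inactive_file;
};

/* Approximate NR_FILE_PAGES (assumption: Cached excludes the other two). */
static unsigned long file_pages_kb(const struct meminfo_kb *m)
{
	return m->cached + m->buffers + m->swap_cached;
}

/* Size of the file LRU lists: Active(file) + Inactive(file). */
static unsigned long file_lru_kb(const struct meminfo_kb *m)
{
	return m->active_file + m->inactive_file;
}

/* File LRU pages that are no longer counted as page cache. */
static unsigned long lru_excess_kb(const struct meminfo_kb *m)
{
	unsigned long lru = file_lru_kb(m);
	unsigned long cache = file_pages_kb(m);

	return lru > cache ? lru - cache : 0;
}

/* The excerpt above: roughly 43 GiB of LRU pages outside the page cache. */
static unsigned long excerpt_excess_kb(void)
{
	const struct meminfo_kb m = {
		.buffers = 2248, .cached = 1937376, .swap_cached = 0,
		.active_file = 4934456, .inactive_file = 42489976,
	};

	assert(file_lru_kb(&m) >= file_pages_kb(&m));
	return lru_excess_kb(&m);
}
```

For the excerpt this gives 47424432 kB on the file LRUs against about
1939624 kB of page cache, i.e. 45484808 kB unaccounted for. Shmem is
ignored here, so treat the result as an estimate only.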

Now MemAvailable is based on NR_FREE_PAGES and NR_[IN]ACTIVE_FILE (I
actually wonder why the value is so high here, it should consider only
half of the file LRU's as available, according to the comment there...
oh, I see, there's a math error there, I'll report that separately...).

And AFAICS, the ENOMEM result of mmap() comes from
__vm_enough_memory(), which in OVERCOMMIT_GUESS mode considers as
available memory (called 'free' there) the sum of NR_FREE_PAGES and
NR_FILE_PAGES.
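
As a rough illustration of that check, here is a simplified userspace
sketch in C. It uses only the two terms named above. The real
__vm_enough_memory() takes further terms and reserves into account,
which are deliberately left out. The function names are hypothetical,
and the NR_FILE_PAGES value is an approximation (Cached + Buffers +
SwapCached from the excerpt). Units are kB rather than pages.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified OVERCOMMIT_GUESS-style check: the request is allowed if
 * it is smaller than free memory plus page cache. Other terms of the
 * kernel heuristic are intentionally omitted in this sketch.
 */
static bool guess_enough_memory_kb(unsigned long free_kb,
				   unsigned long file_pages_kb,
				   unsigned long request_kb)
{
	unsigned long avail_kb = free_kb + file_pages_kb;

	return avail_kb > request_kb;
}

/* Would a 10 GiB anonymous mapping pass with the excerpt's counters? */
static bool excerpt_allows_10gib(void)
{
	const unsigned long mem_free_kb = 6882888;	/* MemFree */
	const unsigned long file_pages_kb = 1939624;	/* approximated */
	const unsigned long request_kb = 10UL * 1024 * 1024;

	/* The request is far below MemAvailable (54829584 kB). */
	assert(request_kb < 54829584UL);
	return guess_enough_memory_kb(mem_free_kb, file_pages_kb, request_kb);
}
```

Under these assumptions the sum is 8822512 kB, so a 10 GiB request is
refused even though MemAvailable reports more than 50 GiB.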

So this would explain the ENOMEM, and how this bug is a problem for
workloads that truncate/remove files on xfs, and at the same time rely
on anonymous mmap(). In that case, these mmaps can apparently start
failing if there's no other source of sufficient memory pressure to let
reclaim get rid of the truncated pages on LRU. This should be serious
enough for a stable backport IMHO.

Thanks,
Vlastimil

> So instead of just skipping dirty pages in xfs_vm_releasepage(), return
> to old behavior of skipping them only if they have delalloc or unwritten
> buffers and fix the spurious warnings by warning only if the page is
> clean.
> 
> CC: Brian Foster <bfoster@redhat.com>
> CC: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
> Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/xfs/xfs_aops.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 0f56fcd3a5d5..670d38ff7dc7 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1150,21 +1150,20 @@ xfs_vm_releasepage(
>  	 * the dirty bit cleared. Thus, it can send actual dirty pages to
>  	 * ->releasepage() via shrink_active_list(). Conversely,
>  	 * block_invalidatepage() can send pages that are still marked dirty
> -	 * but otherwise have invalidated buffers.
> -	 *
> -	 * We've historically freed buffers on the latter. Instead, quietly
> -	 * filter out all dirty pages to avoid spurious buffer state warnings.
> -	 * This can likely be removed once shrink_active_list() is fixed.
> +	 * but otherwise have invalidated buffers. So we warn only if the page
> +	 * is clean to avoid spurious warnings when called from
> +	 * shrink_active_list() for a dirty page.
>  	 */
> -	if (PageDirty(page))
> -		return 0;
> -
>  	xfs_count_page_state(page, &delalloc, &unwritten);
>  
> -	if (WARN_ON_ONCE(delalloc))
> +	if (delalloc) {
> +		WARN_ON_ONCE(!PageDirty(page));
>  		return 0;
> -	if (WARN_ON_ONCE(unwritten))
> +	}
> +	if (unwritten) {
> +		WARN_ON_ONCE(!PageDirty(page));
>  		return 0;
> +	}
>  
>  	return try_to_free_buffers(page);
>  }
> 



* Re: [PATCH] xfs: Timely free truncated dirty pages
  2017-01-11  8:07 ` Vlastimil Babka
@ 2017-01-11  8:19   ` Vlastimil Babka
  2017-01-11  8:59   ` Jan Kara
  1 sibling, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2017-01-11  8:19 UTC
  To: Jan Kara, Darrick J . Wong; +Cc: linux-xfs, Petr Tuma, Brian Foster

On 01/11/2017 09:07 AM, Vlastimil Babka wrote:
> Now MemAvailable is based on NR_FREE_PAGES and NR_[IN]ACTIVE_FILE (I
> actually wonder why the value is so high here, it should consider only
> half of the file LRU's as available, according to the comment there...
> oh, I see, there's a math error there, I'll report that separately...).

Ah, looks like the comment is just somewhat ambiguous. "half of the file
LRU's" applies only if their size is below the low watermarks. The code
seems to do the right thing.
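
A userspace sketch of that part of the calculation, assuming the file
LRU contribution is the LRU size minus the smaller of half that size
and the low watermark. The function names are hypothetical, and the
watermark value in the usage note is invented, since the thread does
not give one.

```c
#include <assert.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Part of the file LRUs counted as available: everything except
 * min(half of the LRUs, low watermark). The "half" limit only applies
 * when half of the LRUs is smaller than the low watermark.
 */
static unsigned long file_lru_available(unsigned long file_lru,
					unsigned long wmark_low)
{
	unsigned long avail = file_lru - min_ul(file_lru / 2, wmark_low);

	assert(avail <= file_lru);
	return avail;
}
```

With the 47424432 kB of file LRU from the earlier excerpt and an
assumed low watermark of 65536 kB, nearly all of the LRU counts as
available (47358896 kB), which is consistent with the high
MemAvailable value.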


* Re: [PATCH] xfs: Timely free truncated dirty pages
  2017-01-10 12:43 ` Brian Foster
@ 2017-01-11  8:57   ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2017-01-11  8:57 UTC
  To: Brian Foster
  Cc: Jan Kara, Darrick J . Wong, linux-xfs, Petr Tuma, Vlastimil Babka

On Tue 10-01-17 07:43:20, Brian Foster wrote:
> On Tue, Jan 10, 2017 at 11:17:45AM +0100, Jan Kara wrote:
> > Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
> > to skip dirty pages in xfs_vm_releasepage() which also has the effect
> > that if a dirty page is truncated, it does not get freed by
> > block_invalidatepage() and is lingering in LRU list waiting for reclaim.
> > So a simple loop like:
> > 
> > while true; do
> > 	dd if=/dev/zero of=file bs=1M count=100
> > 	rm file
> > done
> > 
> > will keep using more and more memory until we hit low watermarks and
> > start pagecache reclaim which will eventually reclaim also the truncated
> > pages. Keeping these truncated (and thus never usable) pages in memory
> > is just a waste of memory, is unnecessarily stressing page cache
> > reclaim, and is also confusing users thinking they are running out of
> > memory.
> > 
> > So instead of just skipping dirty pages in xfs_vm_releasepage(), return
> > to old behavior of skipping them only if they have delalloc or unwritten
> > buffers and fix the spurious warnings by warning only if the page is
> > clean.
> > 
> > CC: Brian Foster <bfoster@redhat.com>
> > CC: Vlastimil Babka <vbabka@suse.cz>
> > Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
> > Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> 
> Ugh, so this is closer to the alternative I was considering at the time
> this was posted. Thanks for the fix. The code looks fine, I'd just like
> to update the comment...

OK, done.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] xfs: Timely free truncated dirty pages
  2017-01-11  8:07 ` Vlastimil Babka
  2017-01-11  8:19   ` Vlastimil Babka
@ 2017-01-11  8:59   ` Jan Kara
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Kara @ 2017-01-11  8:59 UTC
  To: Vlastimil Babka
  Cc: Jan Kara, Darrick J . Wong, linux-xfs, Petr Tuma, Brian Foster

Hi,

On Wed 11-01-17 09:07:51, Vlastimil Babka wrote:
> On 01/10/2017 11:17 AM, Jan Kara wrote:
> > Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
> > to skip dirty pages in xfs_vm_releasepage() which also has the effect
> > that if a dirty page is truncated, it does not get freed by
> > block_invalidatepage() and is lingering in LRU list waiting for reclaim.
> > So a simple loop like:
> > 
> > while true; do
> > 	dd if=/dev/zero of=file bs=1M count=100
> > 	rm file
> > done
> > 
> > will keep using more and more memory until we hit low watermarks and
> > start pagecache reclaim which will eventually reclaim also the truncated
> > pages. Keeping these truncated (and thus never usable) pages in memory
> > is just a waste of memory, is unnecessarily stressing page cache
> > reclaim, and is also confusing users thinking they are running out of
> > memory.
> 
> So the impact is even worse than that, as it's the kernel that is
> actually confused, while the user still gets the impression of memory
> being available.
> According to the reporter, this bug has manifested as anonymous mmap()
> returning with ENOMEM in their workload (some benchmark), which does not
> happen anymore after switching from xfs to ext4.

OK, I've changed the changelog to contain:

... and reportedly also leads to anonymous mmap(2) returning ENOMEM
prematurely.

<snip>

> So this would explain the ENOMEM, and how this bug is a problem for
> workloads that truncate/remove files on xfs, and at the same time rely
> on anonymous mmap(). In that case, these mmaps can apparently start
> failing if there's no other source of sufficient memory pressure to let
> reclaim get rid of the truncated pages on LRU. This should be serious
> enough for a stable backport IMHO.

OK, added CC to stable.
	
								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
