public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Ethan Lien <ethanlien@synology.com>
To: linux-btrfs@vger.kernel.org
Cc: Ethan Lien <ethanlien@synology.com>
Subject: [PATCH] btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
Date: Fri, 27 Apr 2018 15:05:24 +0800	[thread overview]
Message-ID: <20180427070524.12188-1-ethanlien@synology.com> (raw)

We should balance dirty metadata pages at the end of
btrfs_finish_ordered_io, since a small, unmergeable random write can
potentially produce dirty metadata which is multiple times larger than
the data itself. For example, a small, unmergeable 4KiB write may
produce:

    16KiB dirty leaf (and possibly 16KiB dirty node) in subvolume tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in checksum tree
    16KiB dirty leaf (and possibly 16KiB dirty node) in extent tree

Although we do call balance dirty pages in write side, but in the
buffered write path, most metadata are dirtied only after we reach the
dirty background limit (which by far onlys counts dirty data pages) and
wakeup the flusher thread. If there are many small, unmergeable random
writes spread in a large btree, we'll find a burst of dirty pages
exceeds the dirty_bytes limit after we wakeup the flusher thread - which
is not what we expect. In our machine, it caused out-of-memory problem
since a page cannot be dropped if it is marked dirty.

Someone may worry about we may sleep in btrfs_btree_balance_dirty(), but
since we do btrfs_finish_ordered_io in a separate worker, it will not
stop the flusher consuming dirty pages. Also, we use different worker for
metadata writeback endio, sleep in btrfs_finish_ordered_io help us
throttle the size of dirty metadata pages.

Signed-off-by: Ethan Lien <ethanlien@synology.com>
---
 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..d09433ab126e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3151,6 +3151,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 	/* once for the tree */
 	btrfs_put_ordered_extent(ordered_extent);
 
+	btrfs_btree_balance_dirty(fs_info);
+
 	return ret;
 }
 
-- 
2.17.0


             reply	other threads:[~2018-04-27  7:05 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-27  7:05 Ethan Lien [this message]
2018-05-22 16:28 ` [PATCH] btrfs: balance dirty metadata pages in btrfs_finish_ordered_io David Sterba
2018-05-23  9:34   ` ethanlien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180427070524.12188-1-ethanlien@synology.com \
    --to=ethanlien@synology.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox