From: Liu Bo <bo.li.liu@oracle.com>
To: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Rework qgroup accounting
Date: Thu, 19 Dec 2013 10:00:40 +0800
Message-ID: <20131219020039.GA18185@localhost.localdomain>
In-Reply-To: <1387400849-7274-1-git-send-email-jbacik@fb.com>
On Wed, Dec 18, 2013 at 04:07:26PM -0500, Josef Bacik wrote:
> People have been complaining about autodefrag/defrag killing their box with OOM.
> This is because the snapshot aware defrag stuff super sucks if you have lots of
> snapshots, and so that needs to be reworked. The problem is once that is fixed
> you start to hit horrible lock contention on the delayed refs lock because we
> have thousands of similar entries that can't be merged until we actually
> run the delayed ref. This problem exists because of the delayed ref sequence
> number.
>
> The major user of the delayed ref sequence number is the qgroup code. It uses
> it to pass into btrfs_find_all_roots to see what roots pointed to a particular
> bytenr either before or including the current operation. It needs this
> information to know if we were removing the last ref or just the last ref for
> this particular root. The problem with this is that it has made the delayed ref
> code incredibly fragile and has forced us to do things like
> btrfs_merge_delayed_refs which is what is causing us so much pain when we have
> thousands of ref updates for the same block.
>
> In order to fix this I'm introducing a new way of adjusting quota counts. I've
> called them qgroup operations, and we apply them in very specific situations.
> We only add these when we add or remove the only ref for a particular root.
> Obviously we have to account for shared refs as well so there is some extra code
> for these special cases, but basically we make the qgroup accounting only happen
> when we know there was a real change (or likely a real change in the case of
> shared refs).
>
> In order to do this I've also introduced lock/unlock_ref. This only gets used
> if we actually have qgroups enabled, but it will be relatively low cost even if
> we have qgroups enabled as it only locks the bytenr for reference updates. So
> delayed ref updates will not trip over this since we only do one at a time
> anyway, so we'll only have contention if we have delayed refs running at the
> same time as a qgroup operation update.
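A userspace sketch of per-bytenr locking that is skipped entirely when qgroups are off. It substitutes lock striping with POSIX mutexes for whatever structure the patch actually uses, so two different bytenrs may occasionally share a stripe. The names, the 64-stripe table, and the 4 KiB shift before hashing are assumptions for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REF_LOCK_BITS 6
#define SKETCH_REF_LOCK_STRIPES (1u << SKETCH_REF_LOCK_BITS) /* 64 stripes */

/* Hypothetical lock table, one per filesystem in this model. */
struct sketch_ref_locks {
	bool qgroups_enabled; /* assumed fixed between lock and unlock */
	pthread_mutex_t stripe[SKETCH_REF_LOCK_STRIPES];
};

static void sketch_ref_locks_init(struct sketch_ref_locks *t, bool qgroups)
{
	t->qgroups_enabled = qgroups;
	for (unsigned int i = 0; i < SKETCH_REF_LOCK_STRIPES; i++) {
		int ret = pthread_mutex_init(&t->stripe[i], NULL);

		assert(ret == 0); /* default attributes; failure is unexpected */
		(void)ret;        /* avoid an unused warning under NDEBUG */
	}
}

/* Map a bytenr to its stripe with a multiplicative (Fibonacci) hash. */
static pthread_mutex_t *sketch_ref_lock_for(struct sketch_ref_locks *t,
					    uint64_t bytenr)
{
	uint64_t h = (bytenr >> 12) * 0x9E3779B97F4A7C15ULL;

	/* Top SKETCH_REF_LOCK_BITS bits select a stripe in [0, 64). */
	return &t->stripe[h >> (64 - SKETCH_REF_LOCK_BITS)];
}

/* No-op unless qgroups are enabled, so the common case pays nothing. */
static void sketch_lock_ref(struct sketch_ref_locks *t, uint64_t bytenr)
{
	if (t->qgroups_enabled)
		pthread_mutex_lock(sketch_ref_lock_for(t, bytenr));
}

static void sketch_unlock_ref(struct sketch_ref_locks *t, uint64_t bytenr)
{
	if (t->qgroups_enabled)
		pthread_mutex_unlock(sketch_ref_lock_for(t, bytenr));
}
```

Only updates that hash to the same stripe contend, which roughly matches the claim that contention appears only when a delayed ref and a qgroup operation touch the same bytenr at the same time.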
>
> Then all we need to account for is the fact that we will get the full view of
> the roots at the time we run the operations, not what they were when our
> particular operation occurred. This is ok because we will either ignore our
> root in the case of add or not ignore it in the case of remove when calculating the
> ref counts. We use the same ref counting scheme that Arne developed as it's
> pretty freaking awesome, and just adjust how we count the ref counts based on
> our operations.
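To illustrate why computing from the run-time view of the roots can still give the right answer, here is a simplified model of referenced/exclusive accounting for per-subvolume (level-0) qgroups only. It is not Arne's ref counting scheme, which also handles higher-level qgroups. The structures and function are hypothetical. The caller passes the other roots that reference the extent when the operation runs, with our own root excluded (the "ignore our root" case for an add). For a remove, the model simply assumes our root no longer appears in that list. How the patch treats our root on removal is not modelled beyond that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-subvolume qgroup counters (level 0 only). */
struct sketch_qgroup {
	uint64_t rfer; /* bytes referenced by this subvolume */
	uint64_t excl; /* bytes referenced only by this subvolume */
};

enum sketch_op { SKETCH_OP_ADD, SKETCH_OP_REMOVE };

/*
 * Apply one operation for 'ours' on an extent of num_bytes. 'others'
 * holds the qgroups of every other root referencing the extent at the
 * time the operation runs (our root excluded).
 */
static void sketch_apply_op(enum sketch_op op, struct sketch_qgroup *ours,
			    struct sketch_qgroup **others, size_t nr_others,
			    uint64_t num_bytes)
{
	if (op == SKETCH_OP_ADD) {
		ours->rfer += num_bytes;
		if (nr_others == 0)
			ours->excl += num_bytes;      /* we are the only user */
		else if (nr_others == 1)
			others[0]->excl -= num_bytes; /* no longer exclusive to it */
	} else {
		assert(ours->rfer >= num_bytes);
		ours->rfer -= num_bytes;
		if (nr_others == 0)
			ours->excl -= num_bytes;      /* it was ours alone */
		else if (nr_others == 1)
			others[0]->excl += num_bytes; /* now exclusive to it */
	}
	/* With two or more other roots, nobody's exclusive count changes. */
}
```

Walking through a snapshot: root A writes an extent (exclusive to A), snapshot B takes a ref (A loses exclusivity), then A drops its ref (the extent becomes exclusive to B).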
>
> In addition to all of this new code I've added a big set of sanity tests to make
> sure everything is working right. Between this and the qgroups xfstests I'm
> pretty certain I haven't broken anything obvious with qgroups. This is just the
> first step in getting rid of the delayed ref sequence number and fixing the
> defrag OOM mess but it is the biggest part. Thanks,
I'd say I love the idea; I'll look at it more closely.
-liubo