Re: Rework qgroup accounting

From: Liu Bo <bo.li.liu@oracle.com>
To: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Rework qgroup accounting
Date: Thu, 19 Dec 2013 10:00:40 +0800
Message-ID: <20131219020039.GA18185@localhost.localdomain>
In-Reply-To: <1387400849-7274-1-git-send-email-jbacik@fb.com>

On Wed, Dec 18, 2013 at 04:07:26PM -0500, Josef Bacik wrote:
> People have been complaining about autodefrag/defrag killing their box with OOM.
> This is because the snapshot aware defrag stuff super sucks if you have lots of
> snapshots, and so that needs to be reworked.  The problem is once that is fixed
> you start to hit horrible lock contention on the delayed refs lock because we
> have thousands of like entries that can't be merged until we actually go to
> run the delayed ref.  This problem exists because of the delayed ref sequence
> number.
> 
> The major user of the delayed ref sequence number is the qgroup code.  It uses
> it to pass into btrfs_find_all_roots to see what roots pointed to a particular
> bytenr either before or including the current operation.  It needs this
> information to know if we were removing the last ref or just the last ref for
> this particular root.  The problem with this is that it has made the delayed ref
> code incredibly fragile and has forced us to do things like
> btrfs_merge_delayed_refs which is what is causing us so much pain when we have
> thousands of ref updates for the same block.
> 
> In order to fix this I'm introducing a new way of adjusting quota counts.  I've
> called them qgroup operations, and we apply them in very specific situations.
> We only add these when we add or remove the only ref for a particular root.
> Obviously we have to account for shared refs as well so there is some extra code
> for these special cases, but basically we make the qgroup accounting only happen
> when we know there was a real change (or likely a real change in the case of
> shared refs).
> 
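
A minimal userspace sketch of the rule described above: a qgroup operation is
recorded only when a root gains its first ref on an extent or loses its last
one. The enum and function names are hypothetical and do not come from the
patches, and the shared-ref special cases mentioned above are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation kinds for this sketch, not the patch's names. */
enum sketch_qgroup_op {
	SKETCH_OP_NONE,		/* ref count changed, root still has refs */
	SKETCH_OP_ADD,		/* root gained its first ref on the extent */
	SKETCH_OP_REMOVE,	/* root dropped its last ref on the extent */
};

/*
 * refs_before: refs this root held on the extent before the update.
 * delta:       +1 for an added ref, -1 for a dropped ref.
 */
enum sketch_qgroup_op sketch_op_for_ref_mod(uint64_t refs_before, int delta)
{
	if (delta > 0 && refs_before == 0)
		return SKETCH_OP_ADD;
	if (delta < 0 && refs_before == 1)
		return SKETCH_OP_REMOVE;
	return SKETCH_OP_NONE;
}
```

With this rule, thousands of updates that only move a root's ref count
between non-zero values produce no qgroup work at all.
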
> In order to do this I've also introduced lock/unlock_ref.  This only gets used
> if we actually have qgroups enabled, but it will be relatively low cost even if
> we have qgroups enabled as it only locks the bytenr for reference updates.
> Delayed ref updates will not trip over this since we only do one at a time
> anyway, so we'll only have contention if we have delayed refs running at the
> same time as a qgroup operation update.
> 
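
One way to picture a lock that covers a single bytenr is a small table of
currently locked bytenrs guarded by a mutex. The sketch below is userspace
code built on pthreads; the sketch_* names, the fixed-size table, and the
trylock helper are assumptions for illustration, not the lock_ref/unlock_ref
implementation in patch 1/3.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_LOCKED 64	/* arbitrary capacity for this sketch */

struct ref_lock_table {
	pthread_mutex_t mutex;		/* protects locked[] and nr_locked */
	pthread_cond_t released;	/* signalled whenever a bytenr is unlocked */
	uint64_t locked[SKETCH_MAX_LOCKED];
	int nr_locked;
};

void ref_lock_table_init(struct ref_lock_table *t)
{
	pthread_mutex_init(&t->mutex, NULL);
	pthread_cond_init(&t->released, NULL);
	t->nr_locked = 0;
}

/* Caller must hold t->mutex. Returns the slot index, or -1 if not locked. */
static int find_locked(const struct ref_lock_table *t, uint64_t bytenr)
{
	for (int i = 0; i < t->nr_locked; i++)
		if (t->locked[i] == bytenr)
			return i;
	return -1;
}

/* Take the lock for one bytenr without blocking; false if busy or full. */
bool sketch_trylock_ref(struct ref_lock_table *t, uint64_t bytenr)
{
	bool ok = false;

	pthread_mutex_lock(&t->mutex);
	if (find_locked(t, bytenr) < 0 && t->nr_locked < SKETCH_MAX_LOCKED) {
		t->locked[t->nr_locked++] = bytenr;
		ok = true;
	}
	pthread_mutex_unlock(&t->mutex);
	return ok;
}

/* Block until this bytenr is free, then take it. Other bytenrs never wait. */
void sketch_lock_ref(struct ref_lock_table *t, uint64_t bytenr)
{
	pthread_mutex_lock(&t->mutex);
	while (find_locked(t, bytenr) >= 0 || t->nr_locked == SKETCH_MAX_LOCKED)
		pthread_cond_wait(&t->released, &t->mutex);
	t->locked[t->nr_locked++] = bytenr;
	pthread_mutex_unlock(&t->mutex);
}

void sketch_unlock_ref(struct ref_lock_table *t, uint64_t bytenr)
{
	pthread_mutex_lock(&t->mutex);
	int i = find_locked(t, bytenr);
	if (i >= 0) {
		/* Fill the hole with the last entry and wake any waiters. */
		t->locked[i] = t->locked[--t->nr_locked];
		pthread_cond_broadcast(&t->released);
	}
	pthread_mutex_unlock(&t->mutex);
}
```

Two updaters contend only when they target the same bytenr, which matches
the claim above that contention is limited to a delayed ref running at the
same time as a qgroup operation on that extent.
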
> Then all we need to account for is the fact that we will get the full view of
> the roots at the time we run the operations, not what they were when our
> particular operation occurred.  This is ok because we will either ignore our
> root in the case of add or not ignore it in case of remove when calculating the
> ref counts.  We use the same ref counting scheme that Arne developed as it's
> pretty freaking awesome, and just adjust how we count the ref counts based on
> our operations.
> 
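
A simplified model of the accounting described above, assuming one qgroup per
root and no qgroup hierarchy. The operation's own root is left out of the
current root list, and the operation type then says whether that root was
just added or just removed. All names are hypothetical; this is one reading
of the description, not the code from patch 2/3, and it does not reproduce
Arne's ref counting scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_op_type { SKETCH_ADD, SKETCH_REMOVE };

struct sketch_delta {
	int64_t rfer;		/* change in referenced bytes for the op's root */
	int64_t excl;		/* change in exclusive bytes for the op's root */
	uint64_t other_root;	/* sole other owner, or 0 if none or several */
	int64_t other_excl;	/* change in that owner's exclusive bytes */
};

/*
 * cur_roots is the view of the extent's roots at the time the operation
 * runs, which may differ from the view when the operation was queued.
 */
struct sketch_delta sketch_account_op(enum sketch_op_type type, uint64_t root,
				      const uint64_t *cur_roots, size_t nr,
				      uint64_t num_bytes)
{
	struct sketch_delta d = { 0, 0, 0, 0 };
	int64_t bytes = (int64_t)num_bytes;
	int64_t sign = (type == SKETCH_ADD) ? 1 : -1;
	size_t others = 0;
	uint64_t last_other = 0;

	/* Count every root except our own, whatever the current view says. */
	for (size_t i = 0; i < nr; i++) {
		if (cur_roots[i] != root) {
			others++;
			last_other = cur_roots[i];
		}
	}

	/* Our root always gains or loses the extent as referenced space. */
	d.rfer = sign * bytes;

	if (others == 0) {
		/* No other owner: the extent is (or was) exclusive to us. */
		d.excl = sign * bytes;
	} else if (others == 1) {
		/* One other owner loses exclusivity on add, regains it on remove. */
		d.other_root = last_other;
		d.other_excl = -sign * bytes;
	}
	return d;
}
```

For example, adding root 6 to an extent already owned by root 5 raises root
6's referenced bytes and lowers root 5's exclusive bytes by the extent size;
removing root 6 again reverses both changes.
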
> In addition to all of this new code I've added a big set of sanity tests to make
> sure everything is working right.  Between this and the qgroups xfstests I'm
> pretty certain I haven't broken anything obvious with qgroups.  This is just the
> first step in getting rid of the delayed ref sequence number and fixing the
> defrag OOM mess but it is the biggest part.  Thanks,

I'd say I love the idea, and I will look at it more closely.

-liubo


Thread overview: 14+ messages
2013-12-18 21:07 Rework qgroup accounting Josef Bacik
2013-12-18 21:07 ` [PATCH 1/3] Btrfs: introduce lock_ref/unlock_ref Josef Bacik
2013-12-19  4:01   ` Dave Chinner
2013-12-19 14:37     ` Josef Bacik
2013-12-18 21:07 ` [PATCH 2/3] Btrfs: rework qgroup accounting Josef Bacik
2013-12-21  8:01   ` Wang Shilong
2013-12-21 14:13     ` Josef Bacik
2013-12-21  8:56   ` Wang Shilong
2013-12-21 14:14     ` Josef Bacik
2014-01-07 16:43     ` Josef Bacik
2014-01-08 14:33   ` David Sterba
2014-01-08 14:42     ` Josef Bacik
2013-12-18 21:07 ` [PATCH 3/3] Btrfs: add sanity tests for new qgroup accounting code Josef Bacik
2013-12-19  2:00 ` Liu Bo [this message]
