Re: device balance times

From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: device balance times
Date: Thu, 23 Oct 2014 22:35:29 -0400
Message-ID: <20141024023529.GD17395@hungrycats.org>
In-Reply-To: <pan$d3176$38668d14$9f498698$7cd4113f@cox.net>

On Fri, Oct 24, 2014 at 01:05:39AM +0000, Duncan wrote:
> Austin S Hemmelgarn posted on Thu, 23 Oct 2014 07:39:28 -0400 as
> excerpted:
> 
> > On 2014-10-23 05:19, Miao Xie wrote:
> >>
> >> Now my colleague and I are implementing the scrub/replace for RAID5/6
> >> and I have a plan to reimplement the balance and split it off from the
> >> metadata/file data process. The main idea is:
> >> - allocate a new chunk which has the same size as the relocated one,
> >>   but don't insert it into the block group list, so we don't allocate
> >>   the free space from it.
> >> - set the source chunk to be Read-only
> >> - copy the data from the source chunk to the new chunk
> >> - replace the extent map of the source chunk with the one of the new
> >>   chunk(The new chunk has the same logical address and the length as
> >>   the old one)
> >> - release the source chunk
> >>
> >> This way, we needn't deal with the data one extent at a time, and
> >> needn't do any space reservation, so the speed will be very fast even
> >> if we have lots of snapshots.
> >>
> > Even if balance gets re-implemented this way, we should still provide
> > some way to consolidate the data from multiple partially full chunks.
> > Maybe keep the old balance path and have some option (maybe call it
> > aggressive?) that turns it on instead of the new code.
> 
> IMO:
> 
> * Keep normal default balance behavior as-is.
> 
> * Add two new options, --fast, and --aggressive.
> 
> * --aggressive behaves as today and is the normal default.
> 
> * --fast is the new chunk-by-chunk behavior.  This becomes the default if 
> the convert filter is used, or if balance detects that it /is/ changing 
> the mode, thus converting or filling in missing chunk copies, even when 
> the convert filter was not specifically set.  Thus, if there's only one 
> chunk copy (single or raid0 mode, or raid1/10 or dup with a missing/
> invalid copy) and the balance would result in two copies, default to
> --fast.  Similarly, if it's raid1/10 and switching to single/raid0, 
> default to --fast.  If no conversion is being done, keep the normal
> --aggressive default.
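
To make sure I'm reading the proposed defaults correctly, here is a
minimal sketch of the selection rule for a single chunk.  Everything in
it is hypothetical: the function name, the parameters, and the COPIES
table are illustration only, and neither --fast nor --aggressive exists
in btrfs-progs today.

```python
from typing import Optional

# Copies of each chunk kept by the profiles named above (illustrative table).
COPIES = {"single": 1, "raid0": 1, "dup": 2, "raid1": 2, "raid10": 2}


def default_balance_mode(profile: str, intact_copies: int,
                         convert_to: Optional[str] = None,
                         user_choice: Optional[str] = None) -> str:
    """Return "fast" or "aggressive" for one chunk under the proposal."""
    if user_choice is not None:
        # An explicit option always overrides the default.
        return user_choice
    if convert_to is not None:
        # The convert filter was given, so copy chunk-by-chunk.
        return "fast"
    if intact_copies != COPIES[profile]:
        # A copy is missing or invalid, so balance would fill it in.
        return "fast"
    # No conversion of any kind: keep today's behavior.
    return "aggressive"


# 3+ device raid1 with one device missing: defaults differ per chunk.
print(default_balance_mode("raid1", intact_copies=2))
print(default_balance_mode("raid1", intact_copies=1))
```

This reproduces the mixed-mode case described further down: chunks with
both copies intact default to aggressive, chunks that lost a copy
default to fast, unless the user forces one mode for the whole balance.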

My pet peeve:  if balance is converting profiles from RAID1 to single,
the conversion should be *instantaneous* (or at least small_constant *
number_of_block_groups).  Pick one mirror, keep all the chunks on that
mirror, delete all the corresponding chunks on the other mirror.
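
Roughly what I have in mind, as a toy model rather than anything
resembling the real kernel structures (BlockGroup, its fields, and
raid1_to_single are all made up for this sketch):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockGroup:
    profile: str
    stripes: Dict[str, int]  # device name -> physical offset of that copy


def raid1_to_single(block_groups: List[BlockGroup], keep_device: str) -> int:
    """Keep only the copy on keep_device; return how many groups changed.

    Each block group costs a constant amount of metadata work and no
    data is copied, so the total is proportional to the number of
    block groups.
    """
    converted = 0
    for bg in block_groups:
        if bg.profile != "raid1" or keep_device not in bg.stripes:
            continue  # not mirrored, or no copy on the surviving device
        bg.stripes = {keep_device: bg.stripes[keep_device]}
        bg.profile = "single"
        converted += 1
    return converted
```

Block groups that have no copy on the chosen device (possible with
three or more devices) are left alone here; they would still need an
ordinary balance.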

Sometimes when a RAID1 mirror dies we want to temporarily convert
the remaining disk to single data / DUP metadata while we wait for
a replacement.  Right now if we try to do this, we discover:

	- if the system reboots during the rebalance, btrfs now sees a
	mix of single and RAID1 data profiles on the disk.  The rebalance
	takes a long time, and a hardware replacement has been ordered,
	so the probability of this happening is pretty close to 1.0.

	- one disk is missing, so there's a check in the mount code path
	that counts missing disks like this:

		- RAID1 profile: we can tolerate 1 missing disk so just
		mount rw,degraded

		- single profile: we can tolerate zero missing disks,
		so we don't allow rw mounts even if degraded.

That filesystem is now permanently read-only (or at least it was in
kernel 3.14).  It's not even possible to add or replace disks any more,
since both operations require mounting the filesystem read-write.
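
The failure mode is easier to see as a sketch of that check.  This is
my reading of the behavior, not the actual mount code: the table and
can_mount_rw are hypothetical names, and I assume the strictest profile
present on disk decides the outcome.

```python
from typing import Iterable

# Missing devices each profile can tolerate, per the list above.
TOLERATED_MISSING = {"single": 0, "raid1": 1}


def can_mount_rw(profiles_on_disk: Iterable[str], missing_devices: int) -> bool:
    """Allow rw,degraded only if every profile present tolerates the loss."""
    tolerance = min(TOLERATED_MISSING[p] for p in profiles_on_disk)
    return missing_devices <= tolerance


# Pure RAID1 with one dead disk: rw,degraded is allowed.
print(can_mount_rw({"raid1"}, missing_devices=1))
# Interrupted RAID1 -> single conversion with the same dead disk: the
# first single chunk drags the tolerance down to zero.
print(can_mount_rw({"raid1", "single"}, missing_devices=1))
```

A per-chunk check (is every chunk still readable from the devices that
are present?) would not have this problem, since every single-profile
chunk written by the interrupted balance lives on the surviving disk.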

> * Users could always specify the behavior they want, overriding the 
> default, using the appropriate option.
> 
> * Of course defaults may result in some chunks being rebalanced in fast 
> mode, while others are rebalanced in aggressive mode, if for instance 
> it's 3+ device raid1 mode filesystem with one device missing, since in 
> that case there'd be the usual two copies of some chunks and those would 
> default to aggressive, while there'd be one copy of chunks where the 
> other one was on the missing device.  However, users could always specify 
> the desired behavior using the last point above, thus getting the same 
> behavior for the entire balance.
> 
> -- 
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
