Date: Thu, 23 Oct 2014 22:35:29 -0400
From: Zygo Blaxell
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: device balance times
Message-ID: <20141024023529.GD17395@hungrycats.org>

On Fri, Oct 24, 2014 at 01:05:39AM +0000, Duncan wrote:
> Austin S Hemmelgarn posted on Thu, 23 Oct 2014 07:39:28 -0400 as
> excerpted:
>
> > On 2014-10-23 05:19, Miao Xie wrote:
> >>
> >> Now my colleague and I are implementing the scrub/replace for RAID5/6,
> >> and I have a plan to reimplement the balance and split it off from the
> >> metadata/file data process. The main idea is:
> >> - allocate a new chunk which has the same size as the relocated one,
> >> but don't insert it into the block group list, so we don't allocate
> >> the free space from it.
> >> - set the source chunk to be read-only
> >> - copy the data from the source chunk to the new chunk
> >> - replace the extent map of the source chunk with the one of the new
> >> chunk (the new chunk has the same logical address and length as
> >> the old one)
> >> - release the source chunk
> >>
> >> This way, we needn't deal with the data one extent at a time, and
> >> needn't do any space reservation, so the speed will be very fast even
> >> [if] we have lots of snapshots.
> >>
> > Even if balance gets re-implemented this way, we should still provide
> > some way to consolidate the data from multiple partially full chunks.
> > Maybe keep the old balance path and have some option (maybe call it
> > aggressive?) that turns it on instead of the new code.
>
> IMO:
>
> * Keep normal default balance behavior as-is.
>
> * Add two new options, --fast and --aggressive.
>
> * --aggressive behaves as today and is the normal default.
>
> * --fast is the new chunk-by-chunk behavior. This becomes the default if
> the convert filter is used, or if balance detects that it /is/ changing
> the mode, thus converting or filling in missing chunk copies, even when
> the convert filter was not specifically set. Thus, if there's only one
> chunk copy (single or raid0 mode, or raid1/10 or dup with a missing/
> invalid copy) and the balance would result in two copies, default to
> --fast. Similarly, if it's raid1/10 and switching to single/raid0,
> default to --fast. If no conversion is being done, keep the normal
> --aggressive default.

My pet peeve: if balance is converting profiles from RAID1 to single,
the conversion should be *instantaneous* (or at least take time
proportional to the number of block groups, with a small constant).
Pick one mirror, keep all the chunks on that mirror, and delete all the
corresponding chunks on the other mirror.
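The instantaneous RAID1-to-single conversion proposed above only has to edit the chunk map; no data is read or written. A minimal sketch of that idea, assuming a made-up in-memory model (the `chunk_map` layout, `raid1_to_single`, and the device names are hypothetical, not btrfs kernel structures or a btrfs-progs API): each chunk keeps one copy that lives on a surviving device and drops the rest, so the work per block group is constant. It covers data chunks only; producing DUP metadata would still require copying.

```python
def raid1_to_single(chunk_map, missing):
    """Convert RAID1 chunks to single by dropping redundant copies.

    chunk_map: hypothetical model, logical start -> list of (device, physical_offset)
    missing:   set of device names that are no longer present
    Returns the dropped copies (their space becomes free).
    """
    freed = []
    for logical, stripes in chunk_map.items():
        # Keep the first copy that lives on a device we still have.
        idx = next((i for i, (dev, _off) in enumerate(stripes) if dev not in missing), None)
        if idx is None:
            raise RuntimeError(f"chunk at {logical:#x} has no surviving copy")
        # Every other copy is redundant: only the map changes, no data moves.
        freed.extend(stripes[:idx] + stripes[idx + 1:])
        chunk_map[logical] = [stripes[idx]]
    return freed
```

For a two-disk RAID1 where `sdb` has died, calling `raid1_to_single(chunks, missing={"sdb"})` leaves every chunk mapped only to `sda` and returns the `sdb` copies as freed space.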
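The five steps in Miao Xie's quoted plan can be modelled in a few lines. This is a sketch under stated assumptions, not the kernel implementation: `Stripe`, `relocate_chunk`, the `free_slots` list (assumed to hold equal-size areas, each big enough for any chunk) and the bytearray "devices" are all hypothetical stand-ins. It shows why no per-extent work is needed: the logical address never changes, so nothing that refers to it has to be updated.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """Hypothetical physical location backing one logical chunk."""
    device: str
    offset: int
    length: int
    read_only: bool = False


def relocate_chunk(chunk_map, free_slots, devices, logical):
    """Move one chunk by copying it whole (toy model of the quoted plan).

    chunk_map:  logical start -> Stripe (stand-in for the chunk/extent map)
    free_slots: list of (device, offset) areas, assumed large enough for any chunk
    devices:    device name -> bytearray (stand-in for block devices)
    """
    src = chunk_map[logical]
    # 1. Allocate a same-size target. In this model it is only a free slot,
    #    not a block group, so nothing else allocates from it during the copy.
    dev, off = free_slots.pop()
    dst = Stripe(dev, off, src.length)
    # 2. Make the source read-only so its contents are stable while copying.
    src.read_only = True
    # 3. Copy the chunk as one block: no per-extent walk, no space reservation.
    data = devices[src.device][src.offset:src.offset + src.length]
    devices[dst.device][dst.offset:dst.offset + dst.length] = data
    # 4. Swap the mapping. Logical address and length are unchanged, so
    #    extents referring to this range (including snapshots) need no updates.
    chunk_map[logical] = dst
    # 5. Release the source area.
    free_slots.append((src.device, src.offset))
    return dst
```

Because only the mapping for `logical` changes, the cost depends on chunk size alone, not on how many snapshots share the extents inside it. Consolidating partially full chunks, as requested in the reply, is exactly what this approach cannot do.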
Sometimes when a RAID1 mirror dies we want to temporarily convert
the remaining disk to single data / DUP metadata while we wait for a
replacement. Right now, if we try to do this, we discover:

- if the system reboots during the rebalance, btrfs now sees a mix
of single and RAID1 data profiles on the disk. The rebalance takes a
long time, and a hardware replacement has been ordered, so the
probability of this happening is pretty close to 1.0.

- one disk is missing, so there's a check in the mount code path that
counts missing disks like this:

    - RAID1 profile: we can tolerate 1 missing disk, so just mount
    rw,degraded

    - single profile: we can tolerate zero missing disks, so we don't
    allow rw mounts even if degraded.

That filesystem is now permanently read-only (or at least it was in 3.14).
It's not even possible to add or replace disks anymore, since that
requires mounting the filesystem read-write.

> * Users could always specify the behavior they want, overriding the
> default, using the appropriate option.
>
> * Of course defaults may result in some chunks being rebalanced in fast
> mode, while others are rebalanced in aggressive mode, if for instance
> it's a 3+ device raid1-mode filesystem with one device missing, since in
> that case there'd be the usual two copies of some chunks and those would
> default to aggressive, while there'd be one copy of chunks where the
> other one was on the missing device. However, users could always specify
> the desired behavior using the last point above, thus getting the same
> behavior for the entire balance.
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."
> Richard Stallman
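The mount-time check described in this message (RAID1 tolerates one missing disk, single tolerates none) is a filesystem-wide minimum, which is why an interrupted RAID1-to-single conversion becomes a trap. A minimal sketch, modelling only the 3.14-era behavior as described here (later kernels may decide this differently); `MAX_MISSING` and `can_mount_rw_degraded` are hypothetical names, not kernel functions.

```python
# Missing devices each profile can lose and still allow a read-write
# degraded mount, per the behavior described above (3.14-era).
MAX_MISSING = {"single": 0, "raid0": 0, "dup": 0, "raid1": 1, "raid10": 1}


def can_mount_rw_degraded(profiles_present, missing_devices):
    """Filesystem-wide check: the least tolerant profile on disk decides."""
    tolerance = min(MAX_MISSING[p] for p in profiles_present)
    return missing_devices <= tolerance
```

A pure RAID1 filesystem with one dead disk passes (`can_mount_rw_degraded({"raid1"}, 1)` is `True`), but as soon as one single chunk exists alongside the RAID1 chunks, `can_mount_rw_degraded({"raid1", "single"}, 1)` is `False`, even though every single chunk is on the surviving disk.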
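Duncan's proposed defaults amount to a per-chunk decision with a user override. A sketch of that policy, assuming the proposed `--fast`/`--aggressive` options (which do not exist in btrfs-progs; they are only suggested in this thread) and a hypothetical `default_balance_mode` helper; the copy counts treat raid1/raid10 as two-copy profiles, as the discussion does.

```python
# Copies kept by each profile (raid1/raid10 as two-copy profiles, as discussed here).
COPIES = {"single": 1, "raid0": 1, "dup": 2, "raid1": 2, "raid10": 2}


def default_balance_mode(current, target, copies_present, convert_filter=False, user_choice=None):
    """Pick the proposed "fast" or "aggressive" mode for one chunk."""
    if user_choice is not None:
        # An explicit --fast/--aggressive always overrides the default.
        return user_choice
    if convert_filter or current != target:
        # Changing profile: use the chunk-by-chunk copy.
        return "fast"
    if copies_present != COPIES[target]:
        # Same profile, but a copy is missing and must be filled in.
        return "fast"
    # Nothing is being converted: keep today's extent-by-extent behavior.
    return "aggressive"
```

In the 3-device raid1 example with one device missing, chunks with both copies get `"aggressive"` and chunks that lost a copy get `"fast"`; passing `user_choice` makes the whole balance use one mode, as the last bullet suggests.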