From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Andrei Borzenkov <arvidjaar@gmail.com>
Cc: kreijack@inwind.it, linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Question: how understand the raid profile of a btrfs filesystem
Date: Sat, 21 Mar 2020 03:14:25 -0400
Message-ID: <20200321071425.GS13306@hungrycats.org>
In-Reply-To: <bd70e1fc-5e5b-672b-cc7c-0cd9b8b31e4a@gmail.com>

On Sat, Mar 21, 2020 at 08:40:50AM +0300, Andrei Borzenkov wrote:
> 21.03.2020 06:29, Zygo Blaxell wrote:
> > On Fri, Mar 20, 2020 at 06:56:38PM +0100, Goffredo Baroncelli wrote:
> >> Hi all,
> >>
> >> for a btrfs filesystem, how can a user understand which {data,metadata,system} [raid] profile is in use? E.g. which profile will the next chunk have?
> > 
> > It's the profile used by the highest-numbered block group for the
> > allocation type (one for data, one for metadata/system).
> 
> Is "highest-numbered" block group always the last one created? 

It's not required by the filesystem format but it is the current behavior
of the implementation.

> Can block group numbers wrap around?

In theory, yes, but they are 64 bits long and correspond to bytes in the
filesystem's address space.  If you loop balancing a filesystem with a
single 4K data block and you can do it at 1000 block groups per second,
you'll wrap around in a little over six months.  Typical use cases
(and even extreme ones) will take centuries to wrap around if you
are converting all the time.
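
As a rough check of that estimate, the arithmetic can be sketched in
Python.  The 1 GiB block group size is an assumption on my part (a
common size for data block groups, not something stated above), and the
function name is only illustrative.

```python
ADDRESS_SPACE_BYTES = 2 ** 64      # block group numbers are 64-bit byte addresses
BLOCK_GROUP_BYTES = 1 << 30        # ASSUMPTION: each new block group consumes 1 GiB of address space
BLOCK_GROUPS_PER_SECOND = 1000     # rate used in the estimate above


def seconds_to_wrap(address_space=ADDRESS_SPACE_BYTES,
                    block_group=BLOCK_GROUP_BYTES,
                    rate=BLOCK_GROUPS_PER_SECOND):
    """Seconds until the address space is exhausted at a constant allocation rate."""
    return address_space / block_group / rate


days_to_wrap = seconds_to_wrap() / 86400
print(f"{days_to_wrap:.1f} days")  # roughly 199 days, i.e. a little over six months
```

At a more realistic rate of, say, one new block group per second, the
same calculation gives about 1000 times longer, which is where the
"centuries" figure comes from.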

> Recently someone reported that after conversion block groups with old
> profile remained and this probably explains it - conversion races with
> new allocation.

Conversion *is* new allocation; no race is possible because they are
the same thing.

While a conversion is running, the conversion itself forces the
raid profile of newly created block groups, so there is no race.
After conversion is completed, there is special case code to prevent the
last empty block group in the filesystem from being deleted; otherwise,
btrfs would lose information about the selected raid profile.

When a conversion is paused or cancelled, new allocations normally
continue using the conversion target profile; however, if all block
groups of the new profile are deleted (i.e. all the data contained in
the new block groups is removed) then allocation can revert to an
older profile.  For example, if you want to combine a balance convert
with a device remove, you have to let the convert run long enough to
ensure several block groups of the new raid profile exist on drives
other than the drive being removed.  The device remove will delete all
block groups on the removed device, in reverse device physical offset
order, which is often (but not necessarily) reverse block group order.
If the only block groups of the new profile are on that device, the
remove deletes them and allocation switches back to the old RAID
profile.  This example is not any kind of race: the result can be
produced deterministically, and the conversion must be paused first.
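
To illustrate the rule, here is a toy model in Python.  It is not btrfs
code: the dictionaries, device names and the next_profile() helper are
all hypothetical, and a real device remove relocates data rather than
simply dropping list entries.  It only shows how "use the profile of the
highest-numbered block group" reverts once the new-profile block groups
are gone.

```python
GIB = 1 << 30


def next_profile(block_groups, alloc_type):
    """Return the profile of the highest-numbered block group of one allocation type."""
    candidates = [bg for bg in block_groups if bg["type"] == alloc_type]
    return max(candidates, key=lambda bg: bg["start"])["profile"]


# Hypothetical state after a single -> raid1 convert was paused very early:
# only one raid1 block group exists, and one of its copies lives on "sdc".
block_groups = [
    {"start": 1 * GIB, "type": "data", "profile": "single", "devices": {"sda"}},
    {"start": 2 * GIB, "type": "data", "profile": "single", "devices": {"sdb"}},
    {"start": 3 * GIB, "type": "data", "profile": "raid1", "devices": {"sdb", "sdc"}},
]
before_remove = next_profile(block_groups, "data")

# Toy "device remove": every block group touching the removed device is deleted.
remaining = [bg for bg in block_groups if "sdc" not in bg["devices"]]
after_remove = next_profile(remaining, "data")

print(before_remove, "->", after_remove)
```

In this model the last raid1 block group is deleted with the device, so
the highest-numbered data block group is a single-profile one again.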

A conversion can be forcibly stopped by various events:  crashes,
unmounting the filesystem, having an unrecoverable read or write error,
or running out of space.  These events will leave block groups with old
profiles on the disk.  Generally if an external event forces conversion
to stop, then it will need to be manually restarted.

If there are uncorrectable read errors on the filesystem then affected
data blocks must be removed from the filesystem before conversion can
be completed.  The same applies to free space: there must be enough of
it for the conversion to complete.

Old versions of mkfs.btrfs had bugs which would leave empty block groups
with different profiles on the filesystem.  When in doubt, or if you have
an older vintage btrfs filesystem, run a converting balance with the
desired raid profile and the 'soft' filter to be sure only one profile
is present--it will be a no-op if conversion is complete; otherwise,
it will finish the conversion.
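
A minimal sketch of such a converting balance, written as a Python
helper that only builds the command line.  The helper name, the mount
point and the choice of raid1 are placeholders for illustration; the
convert=<profile>,soft filter syntax is the one documented in
btrfs-balance(8).

```python
def soft_convert_command(mountpoint, data_profile, metadata_profile):
    """Build a 'btrfs balance start' command that converts with the 'soft' filter.

    With 'soft', block groups that already have the target profile are skipped,
    so the balance does nothing if the conversion is already complete.
    """
    return [
        "btrfs", "balance", "start",
        f"-dconvert={data_profile},soft",
        f"-mconvert={metadata_profile},soft",
        mountpoint,
    ]


# Placeholder values; substitute your own mount point and profiles.
cmd = soft_convert_command("/mnt/example", "raid1", "raid1")
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) as root once
the profiles and mount point have been checked.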

> >> So the question is: which profile will the next chunk have?
> >> Is there any way to understand what will happen?
> 
> Well, from that explanation it is not possible using standard tools -
> one needs to crawl btrfs internals to find out the "last" block group.

This is required only during the conversion process.  In normal cases
users can assume the only profile present is the one that will be used.

The python-btrfs package contains an example of listing block groups.
The last entry in the list will have the current allocation profile.

An unprivileged user can monitor 'btrfs fi df' output over time.
Used space will increase or decrease in the current profile, and
only decrease in the other profiles.
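
The monitoring idea could be scripted roughly as follows.  This sketch
assumes raw byte output ('btrfs fi df -b'); the two snapshots are
made-up sample text, not output from a real filesystem, and the function
names are mine.

```python
import re

# Matches lines such as "Data, RAID1: total=1073741824, used=104857600"
LINE_RE = re.compile(
    r"^(?P<type>\w+), (?P<profile>\w+): total=(?P<total>\d+), used=(?P<used>\d+)$"
)


def parse_fi_df(text):
    """Map (allocation type, profile) -> used bytes from 'btrfs fi df -b' style text."""
    usage = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            usage[(match["type"], match["profile"])] = int(match["used"])
    return usage


def growing_profiles(before, after):
    """Return the (type, profile) pairs whose used space increased between snapshots."""
    return sorted(key for key, used in after.items() if used > before.get(key, 0))


# Made-up snapshots taken some time apart while data is being written.
SNAPSHOT_1 = """\
Data, single: total=2147483648, used=1073741824
Data, RAID1: total=1073741824, used=104857600
Metadata, RAID1: total=1073741824, used=1048576
GlobalReserve, single: total=16777216, used=0
"""
SNAPSHOT_2 = """\
Data, single: total=2147483648, used=1063741824
Data, RAID1: total=2147483648, used=904857600
Metadata, RAID1: total=1073741824, used=1048576
GlobalReserve, single: total=16777216, used=0
"""

growing = growing_profiles(parse_fi_df(SNAPSHOT_1), parse_fi_df(SNAPSHOT_2))
print(growing)
```

A profile that shows up as growing for a given allocation type is the
one currently used for new allocations of that type; a single pair of
snapshots is only a hint, so sample several times.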
