public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Goffredo Baroncelli <kreijack@libero.it>,
	linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>,
	Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
	Paul Jones <paul@pauljones.id.au>, Boris Burkov <boris@bur.io>,
	Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [PATCH 1/5] btrfs: add flags to give an hint to the chunk allocator
Date: Tue, 22 Mar 2022 23:26:28 -0400	[thread overview]
Message-ID: <YjqTZIxRm0wJ05Rx@hungrycats.org> (raw)
In-Reply-To: <YjoNvoIy/WmulvEc@localhost.localdomain>

On Tue, Mar 22, 2022 at 01:56:14PM -0400, Josef Bacik wrote:
> On Sun, Mar 06, 2022 at 07:14:39PM +0100, Goffredo Baroncelli wrote:
> > From: Goffredo Baroncelli <kreijack@inwind.it>
> > 
> > Add the following flags to give an hint about which chunk should be
> > allocated in which disk:
> > 
> > - BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED
> >   preferred for data chunk, but metadata chunk allowed
> > - BTRFS_DEV_ALLOCATION_HINT_METADATA_PREFERRED
> >   preferred for metadata chunk, but data chunk allowed
> > - BTRFS_DEV_ALLOCATION_HINT_METADATA_ONLY
> >   only metadata chunk allowed
> > - BTRFS_DEV_ALLOCATION_HINT_DATA_ONLY
> >   only data chunk allowed
> > 
> > Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> > ---
> >  include/uapi/linux/btrfs_tree.h | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> > index b069752a8ecf..e0d842c2e616 100644
> > --- a/include/uapi/linux/btrfs_tree.h
> > +++ b/include/uapi/linux/btrfs_tree.h
> > @@ -389,6 +389,22 @@ struct btrfs_key {
> >  	__u64 offset;
> >  } __attribute__ ((__packed__));
> >  
> > +/* dev_item.type */
> > +
> > +/* btrfs chunk allocation hint */
> > +#define BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT	2
> > +/* btrfs chunk allocation hint mask */
> > +#define BTRFS_DEV_ALLOCATION_HINT_MASK	\
> > +	((1 << BTRFS_DEV_ALLOCATION_HINT_BIT_COUNT) - 1)
> > +/* preferred data chunk, but metadata chunk allowed */
> > +#define BTRFS_DEV_ALLOCATION_HINT_DATA_PREFERRED	(0ULL)
> 
> Actually don't we have 0 set on type already?  So this will make all existing
> file systems have DATA_PREFERRED, when in reality they may not want that
> behavior.  So should we start at 1?  Thanks,

If all the disks in the filesystem have the same value, and it's one
of the two _PREFERRED values, then there is no change in behavior from
before:  all devices will be filled with no (difference in) preferences.
So if the default is either DATA_PREFERRED or METADATA_PREFERRED, then
nothing changes for existing filesystems until at least one device is
set to some other type.

Another problem is, what happens if we add a new device to a filesystem
where some devices have _ONLY set?  In that case, the new device has
type 0, and might get either data or metadata allocated on it that we
don't want.

Four possible solutions to choose from:

1.  Leave things as they are, sysadmins will have to set the type on
new devices immediately after add, and possibly balance to fix bad
allocations if they lose the race.

2.  Make device type a parameter to the add-device ioctl, so that
drives can be added with a non-default type already set.

3.  Define "0" as "get the preference value from a mount option, a
sysfs variable, or some on-disk item?" which is another way to do
#2 that doesn't mean having to change an existing ioctl's parameters
or roll a new one (I haven't checked to see if we have a spare u64
on device add).

4.  Define "0" as meaning "last resort", a new preference level where
the device is not preferred for any allocation unless there are no other
devices with a preference set.  Ecch I don't like this one because there's
a possible race where we're converting e.g. a 4-disk raid1 array into 2
data-only 2 metadata-only, and there's a point where we have 2x data-only
1x metadata-only and one type 0 device and it's possible to ENOSPC on
metadata at that moment if type 0 doesn't allow metadata allocation.
This plan would prevent the "data on the wrong drive" failure mode, so
it's in the list, but don't use it unless you can solve the new problem.


> Josef
> 

  parent reply	other threads:[~2022-03-23  3:26 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-06 18:14 [PATCH 0/5][V12] btrfs: allocation_hint Goffredo Baroncelli
2022-03-06 18:14 ` [PATCH 1/5] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
2022-03-22 17:51   ` Josef Bacik
2022-03-22 17:56   ` Josef Bacik
2022-03-22 18:49     ` Goffredo Baroncelli
2022-04-06 19:32       ` Goffredo Baroncelli
2022-03-23  3:26     ` Zygo Blaxell [this message]
2022-03-22 19:52   ` Josef Bacik
2022-03-22 20:25     ` Goffredo Baroncelli
2022-03-23  2:56       ` Zygo Blaxell
2022-03-24 19:05         ` Goffredo Baroncelli
2022-03-25 14:59       ` Josef Bacik
2022-03-25 18:55         ` Goffredo Baroncelli
2022-03-06 18:14 ` [PATCH 2/5] btrfs: export the device allocation_hint property in sysfs Goffredo Baroncelli
2022-03-08 13:25   ` kernel test robot
2022-03-22 18:05   ` Josef Bacik
2022-03-22 19:02     ` Goffredo Baroncelli
2022-03-06 18:14 ` [PATCH 3/5] btrfs: change the device allocation_hint property via sysfs Goffredo Baroncelli
2022-03-22 19:19   ` Josef Bacik
2022-03-22 19:52     ` Goffredo Baroncelli
2022-03-06 18:14 ` [PATCH 4/5] btrfs: add allocation_hint mode Goffredo Baroncelli
2022-03-22 19:47   ` Josef Bacik
2022-03-22 20:56     ` Goffredo Baroncelli
2022-03-06 18:14 ` [PATCH 5/5] btrfs: rename dev_item->type to dev_item->flags Goffredo Baroncelli
2022-03-21 20:29 ` [PATCH 0/5][V12] btrfs: allocation_hint Goffredo Baroncelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjqTZIxRm0wJ05Rx@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=boris@bur.io \
    --cc=dsterba@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kreijack@inwind.it \
    --cc=kreijack@libero.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=paul@pauljones.id.au \
    --cc=shafeeqs@panasas.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox