linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Kiper <dkiper@net-space.pl>
To: kreijack@inwind.it
Cc: Daniel Kiper <dkiper@net-space.pl>,
	grub-devel@gnu.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
Date: Thu, 27 Sep 2018 17:47:59 +0200	[thread overview]
Message-ID: <20180927154759.GA22053@router-fw-old.i.net-space.pl> (raw)
In-Reply-To: <01115a89-3397-3cad-73b1-10495e11cb61@libero.it>

On Wed, Sep 26, 2018 at 10:40:32PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 17.31, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli <kreijack@inwind.it>
> >>
> >> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> >> ---
> >>  grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 66 insertions(+)
> >>
> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> >> index be195448d..56c42746d 100644
> >> --- a/grub-core/fs/btrfs.c
> >> +++ b/grub-core/fs/btrfs.c
> >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
> >>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1         0x10
> >>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED    0x20
> >>  #define GRUB_BTRFS_CHUNK_TYPE_RAID10        0x40
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5         0x80
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6         0x100
> >>    grub_uint8_t dummy2[0xc];
> >>    grub_uint16_t nstripes;
> >>    grub_uint16_t nsubstripes;
> >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
> >>  	      stripe_offset = low + chunk_stripe_length
> >>  		* high;
> >>  	      csize = chunk_stripe_length - low;
> >> +	      break;
> >> +	    }
> >> +	  case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> >> +	  case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> >> +	    {
> >> +	      grub_uint64_t nparities, block_nr, high, low;
> >> +
> >> +	      redundancy = 1;	/* no redundancy for now */
> >> +
> >> +	      if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> >> +		{
> >> +		  grub_dprintf ("btrfs", "RAID5\n");
> >> +		  nparities = 1;
> >> +		}
> >> +	      else
> >> +		{
> >> +		  grub_dprintf ("btrfs", "RAID6\n");
> >> +		  nparities = 2;
> >> +		}
> >> +
> >> +	      /*
> >> +	       * A RAID 6 layout consists of several blocks spread on the disks.
> >> +	       * The raid terminology is used to call all the blocks of a row
> >> +	       * "stripe". Unfortunately the BTRFS terminology confuses block
> >
> > Stripe is data set or parity (parity stripe) on one disk. Block has
> > different meaning. Please stick to btrfs terminology and say it clearly
> > in the comment. And even add a link to btrfs wiki page to ease reading.
> >
> > I think about this one:
> >   https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
> >
> >> +	       * and stripe.
> >
> > I do not think so. Or at least not so much...
>
> Trust me, generally speaking stripe is the "row" in the disks (without the parity); looking at the ext3 man page:
>
> ....
>                    stride=stride-size
>                           Configure  the  filesystem  for  a  RAID  array with
>                           stride-size filesystem blocks. This is the number of
>                           blocks  read or written to disk before moving to the
>                           next disk, which is sometimes  referred  to  as  the
>                           chunk   size.   This  mostly  affects  placement  of
>                           filesystem metadata like bitmaps at mke2fs  time  to
>                           avoid  placing them on a single disk, which can hurt
>                           performance.  It may also be used by the block allo???
>                           cator.
>
>                    stripe_width=stripe-width
>                           Configure  the  filesystem  for  a  RAID  array with
>                           stripe-width filesystem blocks per stripe.  This  is
>                           typically  stride-size * N, where N is the number of
>                           data-bearing disks in the  RAID  (e.g.  for  RAID  5
>                           there is one parity disk, so N will be the number of
>                           disks in the array minus 1).  This allows the  block
>                           allocator to prevent read-modify-write of the parity
>                           in a RAID stripe if possible when the data is  writ???
>                           ten.
>
> ....
> Looking at the RAID5 wikipedia page, it seems that the term "stripe"
> is coherent with the ext3 man page.

Ugh... It looks that I have messed up things. Sorry about that.

> I suspect that the confusion is due to the fact that in RAID1 a stripe
> is in ONE disk (because the others are like parities). In BTRFS the
> RAID5/6 code uses the structure of RAID1 with some minimal
> extensions...
>
> To be clear, I don't have problem to be adherent to the BTRFS
> terminology. However I found this very confusing because I was used to
> a different terminology. I am bit worried about the fact that grub

Yeah, I have the same feeling. However, I think that in btrfs code we
should stay with btrfs terms. Though I think that it make sense to
underline differences between btrfs and well known RAID here.

> uses both MD/DM code and BTRFS code; a quick look to the code (eg
> ./grub-core/disk/diskfilter.c) shows that the stripe_size field seems
> to be related to a disks row without parities.
>
> And... yes in BTRFS "chunk" is a completely different beast than what
> it is reported in the ext3 man page :-)

As I said above, please say it in the comment. This will ease reading
for people who are not used to btrfs terms.

> >> +	       *
> >> +	       *   Disk0  Disk1  Disk2  Disk3
> >> +	       *
> >> +	       *    A1     B1     P1     Q1
> >> +	       *    Q2     A2     B2     P2
> >> +	       *    P3     Q3     A3     B3
> >> +	       *  [...]
> >> +	       *
> >> +	       *  Note that the placement of the parities depends on row index.
> >> +	       *  In the code below:
> >> +	       *  - block_nr is the block number without the parities
> >
> > Well, it seems to me that the btrfs code introduces confusion not the
> > spec itself. I would leave code as is but s/block number/stripe number/.
>
> Ok I will replace the two terms. However I have to put a warning that this is a "BTRFS" terminology :-)

Yep, and please explain the differences at the beginning of the comment.

> >> +	       *    (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
> >> +	       *  - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
> >> +	       *  - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
> >
> > s/disk number/disk number in a row/
>
> This value doesn't depend by the row. So "number of disk" is more correct

Yes, but without "row" it is a bit confusing because it suggests that
it is an arbitrary number. Even if you give an example next to the
description. So, I am insisting on adding "in a row" here.

> >> +	       *  - off is the logical address to read
> >> +	       *  - chunk_stripe_length is the size of a block (typically 64k),
> >
> > s/a block/a stripe/

Taking into account discussion above I am not sure right now which one
is correct. Please double check and fix it if it is needed.

> >> +	       *  - nstripes is the number of disks,
> >
> > s/number of disks/number of disks in a row/
> ditto

As above, Is it total number of disks in array? I do not think so.
Hence, I think that "in a row" helps a bit. Even if it is not very
precise. However, if you come up with something better I am not
against it.

> > I miss the description of nparities here...
>
> Right:
> +              *  - nparities is the number of parities (1 for RAID5, 2 for RAID6);
> +              *    used only in RAID5/6 code.

LGTM.

> >> +	       *  - low is the offset of the data inside a stripe,
> >> +	       *  - stripe_offset is the disk offset,
> >
> > s/the disk offset/the data offset in an array/?
>
> Yes
>
> >
> >> +	       *  - csize is the "potential" data to read. It will be reduced to
> >> +	       *    size if the latter is smaller.
> >> +	       */
> >> +	      block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
> >> +
> >> +	      /*
> >> +	       * stripen is computed without the parities (0 for A1, A2, A3...
> >> +	       * 1 for B1, B2...).
> >> +	       */
> >> +	      high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
> >
> > This is clear...
> >
> >> +	      /*
> >> +	       * stripen is recomputed considering the parities (0 for A1, 1 for
> >> +	       * A2, 2 for A3....).
> >> +	       */
> >> +	      grub_divmod64 (high + stripen, nstripes, &stripen);
> >
> > ... but this looks strange... You add disk number to row number. Hmmm...
> > It looks that it works but this is not obvious at first sight. Could you
> > explain that?
>
> What about
> +	      /*
> +	       * stripen is recomputed considering the parities: different row have
> +              * a different offset, we have to add to stripen the number of row ("high") in
> +              * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
> +	       */

This is better but not much. You are repeating what code does.
I am especially interested in why this math is correct. It is not
obvious at first sight. If it is not it should be explained.
Otherwise we will forget in a few months why it is correct.

Daniel

  reply	other threads:[~2018-09-27 15:53 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-19 18:40 [PATCH V7] Add support for BTRFS raid5/6 to GRUB Goffredo Baroncelli
2018-09-19 18:40 ` [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile Goffredo Baroncelli
2018-09-25 15:31   ` Daniel Kiper
2018-09-26 20:40     ` Goffredo Baroncelli
2018-09-27 15:47       ` Daniel Kiper [this message]
2018-09-19 18:40 ` [PATCH 2/9] btrfs: Add helper to check the btrfs header Goffredo Baroncelli
2018-09-19 18:40 ` [PATCH 3/9] btrfs: Move the error logging from find_device() to its caller Goffredo Baroncelli
2018-09-25 17:23   ` Daniel Kiper
2018-09-19 18:40 ` [PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found Goffredo Baroncelli
2018-09-25 17:29   ` Daniel Kiper
2018-09-26 19:55     ` Goffredo Baroncelli
2018-09-27 16:03       ` Daniel Kiper
2018-09-19 18:40 ` [PATCH 5/9] btrfs: Move logging code in grub_btrfs_read_logical() Goffredo Baroncelli
2018-09-19 18:40 ` [PATCH 6/9] btrfs: Refactor the code that read from disk Goffredo Baroncelli
2018-09-19 18:40 ` [PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles Goffredo Baroncelli
2018-09-25 19:10   ` Daniel Kiper
2018-09-26 19:55     ` Goffredo Baroncelli
2018-09-27 16:18       ` Daniel Kiper
2018-09-19 18:40 ` [PATCH 8/9] btrfs: Make more generic the code for RAID 6 rebuilding Goffredo Baroncelli
2018-09-19 18:40 ` [PATCH 9/9] btrfs: Add RAID 6 recovery for a btrfs filesystem Goffredo Baroncelli
2018-09-25 19:20   ` Daniel Kiper
2018-09-26 19:56     ` Goffredo Baroncelli
2018-09-27 16:20       ` Daniel Kiper
  -- strict thread matches above, loose matches on Subject: below --
2018-09-27 18:34 [PATCH V8] Add support for BTRFS raid5/6 to GRUB Goffredo Baroncelli
2018-09-27 18:34 ` [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile Goffredo Baroncelli
2018-10-09 17:51   ` Daniel Kiper
2018-10-11 13:17     ` Daniel Kiper
2018-10-11 18:50 [PATCH V9] Add support for BTRFS raid5/6 to GRUB Goffredo Baroncelli
2018-10-11 18:50 ` [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile Goffredo Baroncelli
2018-10-17 13:46   ` Daniel Kiper
2018-10-18 17:55 [PATCH V10] Add support for BTRFS raid5/6 to GRUB Goffredo Baroncelli
2018-10-18 17:55 ` [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile Goffredo Baroncelli
2018-10-22 17:29 [PATCH V11] Add support for BTRFS raid5/6 to GRUB Goffredo Baroncelli
2018-10-22 17:29 ` [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile Goffredo Baroncelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180927154759.GA22053@router-fw-old.i.net-space.pl \
    --to=dkiper@net-space.pl \
    --cc=grub-devel@gnu.org \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).