Re: Status of RAID5/6 - Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	Chris Murphy <lists@colorremedies.com>,
	Christoph Anton Mitterer <calestyo@scientia.net>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Status of RAID5/6
Date: Wed, 4 Apr 2018 07:15:54 +0200
Message-ID: <cfae662a-d64d-6fd8-9ca3-9892fc349747@inwind.it>
In-Reply-To: <20180403225742.GJ2446@hungrycats.org>

On 04/04/2018 12:57 AM, Zygo Blaxell wrote:
>> I have to point out that in any case the extent is physically
>> interrupted at the disk-stripe size. Assuming disk-stripe=64KB, if
>> you want to write 128KB, the first half is written in the first disk,
>> the other in the 2nd disk.  If you want to write 96kb, the first 64
>> are written in the first disk, the last part in the 2nd, only on a
>> different BG.
> The "only on a different BG" part implies something expensive, either
> a seek or a new erase page depending on the hardware.  Without that,
> nearby logical blocks are nearby physical blocks as well.

In any case the second part is written to a different disk.

> 
>> So yes there is a fragmentation from a logical point of view; from a
>> physical point of view the data is spread on the disks in any case.

> What matters is the extent-tree point of view.  There is (currently)
> no fragmentation there, even for RAID5/6.  The extent tree is unaware
> of RAID5/6 (to its peril).

Earlier you pointed out that writing non-contiguous blocks has an impact on performance. I am replying that the switch to a different BG happens at the disk-stripe boundary, so in any case the block is physically interrupted and continues on another disk.

However, yes: from an extent-tree point of view there will be an increase in the number of extents, because the tail of the write is allocated in another BG (if the size is not aligned to the stripe boundary).
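
A minimal sketch of the physical split described above, assuming a simplified striping model (64KB disk-stripe, two data disks, no parity rotation, no multiple-BG allocation). The function name `split_write` and its parameters are made up for illustration; this is not btrfs code.

```python
KIB = 1024
STRIPE_LEN = 64 * KIB  # disk-stripe size assumed in this thread


def split_write(offset, length, n_data_disks, stripe_len=STRIPE_LEN):
    """Split a logical write into (disk index, bytes) pieces.

    Simplified model: data is laid out round-robin over the data disks
    in stripe_len units; parity placement is ignored.
    """
    pieces = []
    end = offset + length
    while offset < end:
        disk = (offset // stripe_len) % n_data_disks
        room = stripe_len - (offset % stripe_len)  # bytes left in this disk-stripe
        size = min(room, end - offset)
        pieces.append((disk, size))
        offset += size
    return pieces


# 128KB: one half on each disk.
print(split_write(0, 128 * KIB, n_data_disks=2))  # [(0, 65536), (1, 65536)]
# 96KB: the write is still interrupted at the 64KB boundary.
print(split_write(0, 96 * KIB, n_data_disks=2))   # [(0, 65536), (1, 32768)]
```

In both cases the data crosses from one disk to the next at the 64KB boundary; the multiple-BG proposal only changes which BG holds the 32KB tail.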

> If an application does a loop writing 68K then fsync(), the multiple-BG
> solution adds two seeks to read every 68K.  That's expensive if sequential
> read bandwidth is more scarce than free space.

Why do you talk about additional seeks? In any case (even without the additional BG) the read happens from another disk.

>> * c),d),e) are applied only for the tail of the extent, in case the
>> size is less than the stripe size.
> 
> It's only necessary to split an extent if there are no other writes
> in the same transaction that could be combined with the extent tail
> into a single RAID stripe.  As long as everything in the RAID stripe
> belongs to a single transaction, there is no write hole

Maybe a simpler optimization would be to close the transaction when the data reaches the stripe boundary... But I suspect that it is not so simple to implement.
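
A toy illustration of that idea, assuming a full stripe of 128KB (2 data disks x 64KB) and a hypothetical helper `commit_points`. It only marks the writes after which the accumulated data ends exactly on a full-stripe boundary; it says nothing about how btrfs transactions actually work.

```python
KIB = 1024


def commit_points(write_sizes, full_stripe):
    """Return the indices of the writes after which the total amount of
    data written so far is an exact multiple of full_stripe."""
    points = []
    total = 0
    for index, size in enumerate(write_sizes):
        total += size
        if total % full_stripe == 0:
            points.append(index)
    return points


writes = [68 * KIB, 60 * KIB, 128 * KIB, 4 * KIB]
print(commit_points(writes, full_stripe=128 * KIB))  # [1, 2]
```

The last 4KB write never reaches a boundary, and an fsync() after it would still force a partial stripe to disk, which is one reason this is probably harder than it looks.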

> Not for d.  Balance doesn't know how to get rid of unreachable blocks
> in extents (it just moves the entire extent around) so after a balance
> the writes would still be rounded up to the stripe size.  Balance would
> never be able to free the rounded-up space.  That space would just be
> gone until the file was overwritten, deleted, or defragged.

If balance is capable of moving the extent, why not place the extents next to each other during a balance? The goal is not to limit the writing of the end of an extent, but to avoid writing the end of an extent without further data (i.e. the gap to the stripe boundary has to be filled in the same transaction).
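
To make the question concrete, here is a sketch of placing extent tails next to each other, using plain first-fit-decreasing packing. Everything in it is an assumption for illustration (the name `pack_tails`, the 128KB full stripe, the tail sizes); it is not how balance works today.

```python
KIB = 1024


def pack_tails(tail_sizes, full_stripe):
    """Group extent tails so that each group fits in one full stripe.

    First-fit decreasing: take the tails from largest to smallest and
    put each one into the first stripe that still has room for it.
    """
    stripes = []  # each entry is the list of tail sizes sharing a stripe
    for size in sorted(tail_sizes, reverse=True):
        for stripe in stripes:
            if sum(stripe) + size <= full_stripe:
                stripe.append(size)
                break
        else:
            stripes.append([size])
    return stripes


tails = [4 * KIB, 32 * KIB, 96 * KIB, 60 * KIB, 64 * KIB]
for stripe in pack_tails(tails, full_stripe=128 * KIB):
    print([size // KIB for size in stripe])
# [96, 32]
# [64, 60, 4]
```

With these sizes the five tails fill two stripes completely, where writing each tail alone would have left five partially filled stripes.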

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
