Linux Btrfs filesystem development
* Raid5 Write Hole: Is it worse than in MD?
@ 2020-10-13  9:34 Hendrik Friedel
  2020-10-13  9:43 ` Johannes Thumshirn
  2020-10-13 22:54 ` Zygo Blaxell, @hungrycats.org
  0 siblings, 2 replies; 4+ messages in thread
From: Hendrik Friedel @ 2020-10-13  9:34 UTC
  To: linux-btrfs

Hello,

I recently read this article about the write hole in md:
https://lwn.net/Articles/665299/

Whilst the article focuses on the journal as a fix for the write hole
(by the way: is that possible with btrfs?), it made me wonder: is the
write hole in btrfs any worse than in md?

Regards,
Hendrik



* Re: Raid5 Write Hole: Is it worse than in MD?
From: Johannes Thumshirn @ 2020-10-13  9:43 UTC
  To: Hendrik Friedel, linux-btrfs@vger.kernel.org

On 13/10/2020 11:34, Hendrik Friedel wrote:
> Whilst the article focuses on the journal as a fix for the write hole
> (by the way: is that possible with btrfs?), it made me wonder: is the
> write hole in btrfs any worse than in md?

Not a direct answer to your question, but IMHO adding a journal isn't the 
right fix for btrfs. The correct fix for the write hole (and other problems
we encountered with btrfs raid5/6) would be a raid stripe tree.

This is something I'm currently investigating.
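One way to picture the stripe tree idea, assuming its job is to record where each logical range was actually written (so new writes can go to fresh stripes instead of being computed onto a fixed, possibly half-used stripe), is a sorted logical-to-physical map. `StripeExtent` and `StripeTree` are hypothetical names for this sketch; it is not the btrfs on-disk format:

```python
import bisect
from dataclasses import dataclass, field

@dataclass(frozen=True)
class StripeExtent:
    logical: int   # start of the logical (filesystem) address range, bytes
    length: int    # length of the range, bytes
    devid: int     # device the range was actually written to
    physical: int  # byte offset on that device

@dataclass
class StripeTree:
    """Hypothetical in-memory stand-in for a stripe tree: records where each
    logical range landed instead of deriving it from a fixed layout."""
    starts: list = field(default_factory=list)   # sorted logical starts
    extents: list = field(default_factory=list)  # extents in the same order

    def insert(self, ext: StripeExtent) -> None:
        i = bisect.bisect_left(self.starts, ext.logical)
        self.starts.insert(i, ext.logical)
        self.extents.insert(i, ext)

    def lookup(self, logical: int):
        """Return (devid, physical) for a logical address, or None if unmapped."""
        i = bisect.bisect_right(self.starts, logical) - 1
        if i < 0:
            return None
        ext = self.extents[i]
        if logical >= ext.logical + ext.length:
            return None
        return ext.devid, ext.physical + (logical - ext.logical)

tree = StripeTree()
tree.insert(StripeExtent(logical=0, length=65536, devid=1, physical=1048576))
tree.insert(StripeExtent(logical=65536, length=65536, devid=2, physical=2097152))
print(tree.lookup(4096))  # (1, 1052672)
```

The point of the indirection is that placement becomes a recorded decision rather than a fixed formula, which is what lets an allocator avoid rewriting live stripes in place.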

For the other problems of raid56, Zygo once compiled a very comprehensive list,
but I don't have the link anymore.

Byte,
	Johannes


* Re: Raid5 Write Hole: Is it worse than in MD?
From: Piotr Szymaniak @ 2020-10-13 13:46 UTC
  To: Johannes Thumshirn; +Cc: Hendrik Friedel, linux-btrfs@vger.kernel.org


On Tue, Oct 13, 2020 at 09:43:25AM +0000, Johannes Thumshirn wrote:
> *snip*
> For the other problems of raid56, Zygo once compiled a very comprehensive list,
> but I don't have the link anymore.

Here is the list (user-facing and developer-facing versions):
https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@hungrycats.org/
https://lore.kernel.org/linux-btrfs/20200627030614.GW10769@hungrycats.org/


Best regards,
Piotr Szymaniak.
-- 
I guess I have to get back to the shop now.  Kelly is all right, but
sometimes she can completely switch off.  And on top of that she doesn't
believe in anything like responsibility.  It has something to do with
that cult she belongs to.  Maharishi Brainwash or something like that.
  -- Graham Masterton, "Mirror"



* Re: Raid5 Write Hole: Is it worse than in MD?
From: Zygo Blaxell, @hungrycats.org @ 2020-10-13 22:54 UTC
  To: Hendrik Friedel; +Cc: linux-btrfs

On Tue, Oct 13, 2020 at 09:34:50AM +0000, Hendrik Friedel wrote:
> Hello,
> 
> I recently read this article about the write hole in md:
> https://lwn.net/Articles/665299/
> 
> Whilst the article focuses on the journal as a fix for the write hole (by
> the way: is that possible with btrfs?), it made me wonder: is the write hole
> in btrfs any worse than in md?

It is hard to compare them directly, because the write hole is only one of
several ways a raid5 array can fail on either mdadm or btrfs, and both
have significant shortcomings.
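For readers new to the term, a toy model of the write hole itself, assuming a 3-disk RAID5 stripe (two data strips, one XOR parity strip) updated in place by read-modify-write. This is illustrative Python, not code from md or btrfs:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A 3-disk RAID5 stripe: two data strips plus one parity strip.
d0, d1 = b"AAAA", b"BBBB"
stripe = {"d0": d0, "d1": d1, "p": xor_blocks(d0, d1)}

# Read-modify-write of d0 only: the new data reaches the disk,
# then power is lost before the matching parity update is written.
stripe["d0"] = b"CCCC"
# stripe["p"] = xor_blocks(stripe["d0"], stripe["d1"])  # never happens

# Later the disk holding d1 fails; rebuild d1 from the survivors.
rebuilt_d1 = xor_blocks(stripe["d0"], stripe["p"])
print(rebuilt_d1 == d1)  # False: d1 was never rewritten, yet it is lost
```

Note that the damaged block (d1) is not the one being written, which is why the hole affects data that was "at rest".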

btrfs and mdadm have different strengths and weaknesses in their raid5
implementations.  For example, btrfs can often recover from data corruption
that is not reported by the drives, while mdadm can neither detect nor
repair it.  On the other hand, as far as I know mdadm has no problems
reading a degraded, non-corrupted raid5 array, while btrfs has some known
troubles there.
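A sketch of why per-block checksums help with silent corruption: with parity alone, the most a consistency check can tell is that a stripe does not add up; with a checksum per data strip, the bad strip can be identified and rebuilt from the others. `repair_stripe` is a hypothetical helper, and `zlib.crc32` stands in for the crc32c that btrfs uses by default:

```python
import zlib
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def repair_stripe(data, parity, csums):
    """Rebuild at most one bad data strip, using checksums to find it."""
    bad = [i for i, blk in enumerate(data) if zlib.crc32(blk) != csums[i]]
    if not bad:
        return list(data)
    if len(bad) > 1:
        raise ValueError("more bad strips than single parity can rebuild")
    i = bad[0]
    survivors = [blk for j, blk in enumerate(data) if j != i]
    rebuilt = xor_blocks(parity, *survivors)
    if zlib.crc32(rebuilt) != csums[i]:
        raise ValueError("rebuilt strip fails its checksum too")
    fixed = list(data)
    fixed[i] = rebuilt
    return fixed

good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(*good)
csums = [zlib.crc32(blk) for blk in good]
silently_corrupt = [b"AAAA", b"BxBB", b"CCCC"]  # drive reported no error

# Parity alone only says "this stripe is inconsistent"...
print(xor_blocks(*silently_corrupt) == parity)                 # False
# ...checksums say which strip is wrong, so it can be rebuilt.
print(repair_stripe(silently_corrupt, parity, csums) == good)  # True
```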

It's possible to implement a raid5 stripe update journal (or tree), but
that is not the only possible solution (and it may be only part of a
complete solution).  Other possible solutions include:

	- adjust the allocator to minimize stripe read-modify-write (RMW)
	update operations (effectively banning them outright for datacow
	and metadata), or

	- throw out the existing raid5/6 implementation and start
	over with an implementation that works in harmony with the
	copy-on-write semantics, more like the way data compression in
	btrfs works now (effectively solving the problem the same way
	ZFS did).
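The allocator approach in the first bullet can be illustrated with two hypothetical helpers (not btrfs functions): one splits a write into per-stripe pieces and flags which cover a whole stripe (no RMW needed) and which are partial (RMW needed if the stripe already holds live data); the other is an assumed allocator policy that starts every new extent on a fresh stripe. `stripe_bytes` is the data capacity of one full stripe, e.g. a 64 KiB strip times the number of data disks:

```python
def plan_write(offset: int, length: int, stripe_bytes: int):
    """Split a write into (stripe_index, start, length, kind) pieces.
    kind is "full" for whole-stripe pieces, "partial" otherwise."""
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        idx = pos // stripe_bytes
        stripe_start = idx * stripe_bytes
        stripe_end = stripe_start + stripe_bytes
        piece_end = min(end, stripe_end)
        full = pos == stripe_start and piece_end == stripe_end
        pieces.append((idx, pos, piece_end - pos, "full" if full else "partial"))
        pos = piece_end
    return pieces

def stripe_aligned(next_free: int, stripe_bytes: int) -> int:
    """Hypothetical allocator policy: round the next free offset up to a
    stripe boundary so new extents never share a stripe with old data."""
    return -(-next_free // stripe_bytes) * stripe_bytes

STRIPE = 128 * 1024  # e.g. 64 KiB strips on 2 data disks
for piece in plan_write(96 * 1024, 192 * 1024, STRIPE):
    print(piece)
# (0, 98304, 32768, 'partial')
# (1, 131072, 131072, 'full')
# (2, 262144, 32768, 'partial')
```

Aligning the start removes the leading partial piece, but the tail of a write can still end mid-stripe; padding it wastes space, which is one of the tradeoffs mentioned below.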

These all have various performance and capability tradeoffs.  Some of
them can even be combined (e.g. minimize RMW updates with allocator
changes, fall back to stripe log tree for the rest).
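The combination described above (full stripes written directly, remaining RMW updates going through a stripe log) can be sketched as a toy 3-disk stripe with a write-ahead log. `LoggedStripe`, `rmw_update`, `full_stripe_write` and `replay_log` are hypothetical names; the in-memory list stands in for a persistent journal or log tree, and real-disk ordering and flush semantics are ignored:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class LoggedStripe:
    """Toy 3-disk RAID5 stripe protected by a write-ahead stripe log."""

    def __init__(self, d0: bytes, d1: bytes):
        self.disk = {"d0": d0, "d1": d1, "p": xor_blocks(d0, d1)}
        self.log = []  # stands in for a persistent journal / log tree

    def full_stripe_write(self, d0: bytes, d1: bytes) -> None:
        # Fast path: a full stripe written into free space has no live data
        # that depends on the old parity, so it bypasses the log.
        self.disk = {"d0": d0, "d1": d1, "p": xor_blocks(d0, d1)}

    def rmw_update(self, strip: str, new_data: bytes, crash: bool = False) -> None:
        other = "d1" if strip == "d0" else "d0"
        new_parity = xor_blocks(new_data, self.disk[other])
        # 1. Persist the complete intended update (data + parity) first.
        self.log.append({strip: new_data, "p": new_parity})
        # 2. Update the stripe in place.
        self.disk[strip] = new_data
        if crash:
            return  # power lost before the parity write; log entry survives
        self.disk["p"] = new_parity
        # 3. Retire the log entry once the stripe is consistent.
        self.log.clear()

    def replay_log(self) -> None:
        """Recovery (e.g. at mount time): re-apply logged updates, then clear."""
        for entry in self.log:
            self.disk.update(entry)
        self.log.clear()

s = LoggedStripe(b"AAAA", b"BBBB")
s.rmw_update("d0", b"CCCC", crash=True)
print(xor_blocks(s.disk["d0"], s.disk["p"]) == b"BBBB")  # False: stale parity
s.replay_log()
print(xor_blocks(s.disk["d0"], s.disk["p"]) == b"BBBB")  # True: hole closed
```

Only the RMW path pays the cost of writing everything twice, which is why reducing RMW first and logging the remainder can be combined.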

> Regards,
> Hendrik
> 
