linux-raid.vger.kernel.org archive mirror
* AW: AW: RAID1 and data safety?
@ 2005-03-29  8:54 Schuett Thomas EXT
  2005-03-29  9:27 ` Peter T. Breuer
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Schuett Thomas EXT @ 2005-03-29  8:54 UTC (permalink / raw)
  To: linux-raid

I wrote:
>> 
>> Still, if there is different data on the two disks due to a previous 
>> power failure, the comparison could really be the better choice, couldn't it?
>> 
>

Neil wrote:
>What does a comparison of two blocks tell you?  That they are
>different, not which one is "right".
>
>A filesystem designed to handle these sort of problems wouldn't suffer
>from data inconsistencies due to power off.  It would "know" where it
>was up to and would either re-write or ignore any data that it doesn't
>know to certainly be safe.

But:
If you have a RAID1 and a journaling fs, consider the following:
If the system crashes at the end of a write transaction,
then the end-of-transaction information may already have been
written to hda, but not to hdb. On the next boot, the
journaling fs may see an overall unclean bit (a transaction is
*probably* pending), so it reads the transaction log.

And here the fault happens:
By chance, it reads the transaction log from hda, sees that the
transaction was finished, and clears the overall unclean bit.
This clearing is a write, so it goes to *both* HDs.

Situation now: On hdb there is a pending transaction in the transaction
log, but the overall unclean bit is cleared. This may go unnoticed
until, by chance, hda crashes a year later and you finally face the fact
that the data on the remaining HD is corrupt.
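
A minimal sketch of the scenario described above, as a toy model only. The
dict keys "journal_commit" and "unclean" and all function names are
illustrative assumptions, not real on-disk fields or md/filesystem APIs:

```python
# Toy model: each disk is a dict. The keys are hypothetical stand-ins for
# the end-of-transaction record and the overall unclean bit.

def make_disk():
    return {"journal_commit": False, "unclean": False}

def mirror_write(disks, key, value, reach=None):
    """Write to the mirror halves. 'reach' limits the write to the first
    N disks, which models a crash in the middle of a mirrored write."""
    for disk in disks[:reach]:
        disk[key] = value

def mirror_read(disks, key, source=0):
    """A RAID1 read is served from a single half, chosen here by 'source'."""
    return disks[source][key]

def simulate_crash_and_recovery():
    hda, hdb = make_disk(), make_disk()
    disks = [hda, hdb]

    mirror_write(disks, "unclean", True)                  # transaction starts
    mirror_write(disks, "journal_commit", True, reach=1)  # crash: only hda has it

    # Reboot: the fs sees the unclean bit and reads the journal from hda only.
    if mirror_read(disks, "unclean") and mirror_read(disks, "journal_commit", source=0):
        mirror_write(disks, "unclean", False)             # written to both halves

    return hda, hdb

if __name__ == "__main__":
    hda, hdb = simulate_crash_and_recovery()
    print("hda:", hda)
    print("hdb:", hdb)
```

In this model hdb ends up marked clean although its journal never recorded
the end of the transaction, which is the silent inconsistency in question.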

Solution approach: if it had read the transaction log from both HDs and
compared the copies, the mismatch could have been reported as a read fault.
A good journaling fs probably stores the end-of-transaction info in a
different block than the start-of-transaction info. Then it can say: If I
cannot properly read the end-of-transaction info, then I consider the
transaction as not finished, so I do a rollback.
(Of course this requires a readable start-of-transaction info, therefore it
should be stored separately from the end-of-transaction info.)
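
The proposal can be sketched roughly as follows. This is an illustration
under assumptions, not how md or any existing filesystem behaves: the block
names "txn_start" and "txn_end", the MirrorMismatch exception and both
functions are hypothetical, and each disk is modelled as a dict.

```python
class MirrorMismatch(OSError):
    """Raised when the mirror halves return different data for one block."""

def read_all_and_compare(disks, block):
    """Read 'block' from every half and refuse to pick a winner on mismatch."""
    copies = [disk.get(block) for disk in disks]
    if any(copy != copies[0] for copy in copies[1:]):
        raise MirrorMismatch(f"block {block!r} differs between mirror halves")
    return copies[0]

def recover(disks):
    """Return 'commit' or 'rollback' for the transaction found in the log."""
    # The start record must be readable; a mismatch here is left to propagate.
    start = read_all_and_compare(disks, "txn_start")
    try:
        end = read_all_and_compare(disks, "txn_end")
    except MirrorMismatch:
        end = None  # unreadable end record: treat the transaction as unfinished
    return "commit" if start is not None and end == start else "rollback"
```

With hda holding both records and hdb holding only the start record,
recover() returns "rollback" instead of silently trusting hda.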

Does this sound reasonable?

Thomas

* AW: RAID1 and data safety?
@ 2005-04-07 15:35 Schuett Thomas EXT
  2005-04-07 16:05 ` Doug Ledford
  0 siblings, 1 reply; 18+ messages in thread
From: Schuett Thomas EXT @ 2005-04-07 15:35 UTC (permalink / raw)
  To: 'Doug Ledford'; +Cc: linux-raid

[Please excuse me, my mail tool breaks threads ...]
Reply to mail from 2005-04-05

Hello Doug,

many thanks for this highly detailed and structured posting.

A few questions are left: Is it common today that an (E)IDE HD does
not report a write as finished (i.e. send a completion event, if I got this
right) before it has been written to *media*?

I am happy to hear about these "write barriers", even though I am astonished
that they don't bring down the whole system performance (at least for raid1).


> This is where the event counters
> come into play.  That's what md uses to be able to tell which drives in
> an array are up to date versus those that aren't, which is what's needed
> to satisfy C.

So event counters are the second type of information that gets written with
write barriers. One is the journal data from the (j)fs (and actually the real
data too, for it to make sense; otherwise the end-of-transaction write is like
a semaphore with only one of the two parties using it), and the other is the
event counter.
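
As a rough illustration of the idea in the quoted paragraph (event counters
tell up-to-date members from stale ones), here is a simplified sketch. The
function name and the plain dict input are assumptions for illustration;
the real md assembly logic is more involved than a single comparison.

```python
def split_by_event_count(members):
    """'members' maps a device name to the event counter read from that
    device. Returns (fresh, stale) as sorted lists of device names."""
    newest = max(members.values())
    fresh = sorted(dev for dev, events in members.items() if events == newest)
    stale = sorted(dev for dev, events in members.items() if events < newest)
    return fresh, stale
```

A member whose counter lags behind the highest value seen missed at least
one update and would have to be resynchronised from a fresh member.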

> Now, if I recall correctly, Peter posted a patch that changed this
> semantic in the raid1 code.  The raid1 code does not complete a write to
> the upper layers of the kernel until it's been completed on all devices
> and his patch made it such that as soon as it hit 1 device it returned
> the write to the upper layers of the kernel.

I am glad to hear that the behaviour is such that the barrier waits until
*all* media have been written. That was one of the things that really
worried me. I hope the patch has been backed out and didn't go into any distros.
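
The difference between the two completion semantics mentioned in the quote
can be sketched like this. It is a toy model, not the raid1 code: the class
MirroredWrite, its wait_for_all flag and the callback signature are all
hypothetical names chosen for illustration.

```python
class MirroredWrite:
    """Tracks one write that was fanned out to several mirror devices."""

    def __init__(self, devices, on_complete, wait_for_all=True):
        self.pending = set(devices)
        self.on_complete = on_complete    # called once, towards the upper layer
        self.wait_for_all = wait_for_all  # False models completing on first hit
        self.reported = False

    def device_done(self, device):
        """Called when 'device' reports that its copy of the write finished."""
        self.pending.discard(device)
        ready = not self.pending if self.wait_for_all else True
        if ready and not self.reported:
            self.reported = True
            # Pass along which devices had NOT finished when we reported.
            self.on_complete(sorted(self.pending))
```

With wait_for_all=True the callback always sees an empty list. With
wait_for_all=False it can fire while a device is still outstanding, so a
crash at that moment leaves the halves different after the write was
already acknowledged.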

> had in its queue.  Being a nice, smart SCSI disk with tagged queuing
> enabled, it then proceeds to complete the whole queue of writes in
> whatever order is most efficient for it.

But just to make sure: Your previous statement "...when the linux block layer 
did not provide any means of write barriers. As a result, they used completion 
events as write barriers." indicates that even a "nice, smart SCSI disk with
tagged queuing enabled" will behave as required, because writing in this
special way, with the appended "completion event testing", makes sure it does?

---

You mentioned data journaling, and it sounded like it works reliably.
Which of the existing journaling filesystems did you have in mind?

---

Afaik a read only reads from *one* HD (in raid1). So how can I be sure
that *both* HDs are still perfectly o.k.? Is it fine to do a
   cat /dev/hda2 > /dev/null ; cat /dev/hdb2 > /dev/null
even *while* the md is active and being used r/w?
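
For illustration, a small sketch that does the same kind of read-through as
the cat commands above, but reports how far it got and which error stopped
it. The function name read_through is hypothetical. The sketch only shows
whether every byte can be read; it does not compare the two halves, and it
does not settle whether doing this on an active array is advisable.

```python
import sys

def read_through(path, chunk_size=1024 * 1024):
    """Read 'path' from start to end and discard the data.
    Returns (bytes_read, error); error is None if the end was reached,
    otherwise the OSError that stopped the scan."""
    total = 0
    try:
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(chunk_size)
                if not chunk:
                    return total, None
                total += len(chunk)
    except OSError as exc:
        return total, exc

if __name__ == "__main__":
    # Paths are taken from the command line, e.g. the two mirror halves.
    for path in sys.argv[1:]:
        total, error = read_through(path)
        status = "ok" if error is None else f"failed: {error}"
        print(f"{path}: read {total} bytes, {status}")
```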

best regards,
  Thomas



* AW: RAID1 and data safety?
@ 2005-03-22 15:08 Schuett Thomas EXT
  2005-03-23 22:31 ` Neil Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Schuett Thomas EXT @ 2005-03-22 15:08 UTC (permalink / raw)
  To: linux-raid

>Neil Brown wrote:
>>> Is there any way to tell MD to [...] and
>>> read-from-all-disks on a RAID1 array?
>
>Not sure why a checksum of X data blocks should be cheaper
>performance-wise than a comparison between X data blocks, but I can
>see the point in that you only have to load the data once and check
>the checksum.  Not quite the same security, but almost.

Still, if there is different data on the two disks due to a previous 
power failure, the comparison could really be the better choice, couldn't it?
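
A block-by-block comparison of two mirror halves could look roughly like
the sketch below. The function name and the 4096-byte block size are
assumptions for illustration. It reports where the halves differ, not
which copy is the correct one.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Return the indices of all blocks whose contents differ between the
    two files or devices. A length difference shows up as differing
    blocks at the tail."""
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                return diffs
            if block_a != block_b:
                diffs.append(index)
            index += 1
```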




end of thread, other threads:[~2005-04-08 12:16 UTC | newest]

Thread overview: 18+ messages
2005-03-29  8:54 AW: AW: RAID1 and data safety? Schuett Thomas EXT
2005-03-29  9:27 ` Peter T. Breuer
2005-03-29 10:09   ` Neil Brown
2005-03-29 11:26     ` Peter T. Breuer
2005-03-29 12:13       ` Lars Marowsky-Bree
2005-04-04 22:57       ` Doug Ledford
2005-03-29  9:30 ` AW: " Molle Bestefich
2005-03-29 10:08 ` AW: " Neil Brown
2005-03-29 11:29   ` Peter T. Breuer
2005-03-29 16:46     ` Luca Berra
2005-03-29 18:43       ` Peter T. Breuer
2005-03-29 20:07     ` Mario Holbe
2005-04-04 20:06     ` Doug Ledford
2005-04-08 12:16       ` Peter T. Breuer
  -- strict thread matches above, loose matches on Subject: below --
2005-04-07 15:35 AW: " Schuett Thomas EXT
2005-04-07 16:05 ` Doug Ledford
2005-03-22 15:08 Schuett Thomas EXT
2005-03-23 22:31 ` Neil Brown
