* mdadm raid 5 one disk overwritten file system failed
From: John Andre Taule @ 2015-02-19 7:38 UTC
To: linux-raid

Hi!

Case: mdadm RAID 5, four 2 TB disks, ext4 formatted spanning the raid.
Attack: dd if=/dev/zero of=/dev/sdb bs=1M

The expected result was a raid that could be recovered without data loss.

The actual result was that the file system failed and could not be recovered.

As I understand it, if this had been a "hardware type fake" raid controller, the
outcome would have been uncertain. However, I'm a bit confused as to why the raid
(or more specifically the file system) would fail so horribly when losing one disk.
Is there perhaps critical information written "outside" the raid on the physical
disk, and was this overwritten in the attack?

It would be nice to have an exact idea of why it failed so hard, and how obvious it
should be that this attack would have more consequences than a degraded raid.

//Regards
* Re: mdadm raid 5 one disk overwritten file system failed
From: Mikael Abrahamsson @ 2015-02-19 11:20 UTC
To: John Andre Taule; +Cc: linux-raid

On Thu, 19 Feb 2015, John Andre Taule wrote:

> Hi!
>
> Case: mdadm RAID 5, four 2 TB disks, ext4 formatted spanning the raid.
> Attack: dd if=/dev/zero of=/dev/sdb bs=1M
>
> The expected result was a raid that could be recovered without data loss.
>
> The actual result was that the file system failed and could not be recovered.
>
> As I understand it, if this had been a "hardware type fake" raid controller, the
> outcome would have been uncertain. However, I'm a bit confused as to why the raid
> (or more specifically the file system) would fail so horribly when losing one disk.
> Is there perhaps critical information written "outside" the raid on the physical
> disk, and was this overwritten in the attack?

Did you stop the array before you ran the dd command, or did you just run it?

If you just ran it, you most likely overwrote the superblock on the drive (located
near the beginning of the drive with recent default metadata), plus part of the
file system.

> It would be nice to have an exact idea of why it failed so hard, and how obvious
> it should be that this attack would have more consequences than a degraded raid.

Because the drive was still active, the operating system most likely didn't notice
that you overwrote part of the data on the disk, and the drive was never failed.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
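A minimal sketch of how to see this for yourself, assuming the array is /dev/md0
(an assumed name; the overwritten member /dev/sdb is from the report above):

    # the array will typically still list the zeroed member as active
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # the md superblock near the start of the member should now be gone
    mdadm --examine /dev/sdb

If dd wiped the metadata area, --examine can be expected to report that it finds no
valid md superblock, while the running array keeps using the member as if nothing
had happened until it is stopped or the kernel hits an actual I/O error.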
* SV: mdadm raid 5 one disk overwritten file system failed
From: John Andre Taule @ 2015-02-19 14:00 UTC
To: linux-raid

The array was not stopped before dd was run. The "hacker" logged on, left the command
running and logged off. It was discovered the next morning, about 5 hours later.
There was very high load on the server, and I think that is the only reason the
command was discovered at all.

That is how this raid has behaved earlier when a drive has failed. I'm a bit
surprised that overwriting anything on the physical disk should corrupt the file
system on the raid. I would think that would be similar to a disk crashing or
failing in other ways.

What you say about Linux possibly not having seen the disk as failing is
interesting. This could explain why the file system got corrupted.

-----Original message-----
From: Mikael Abrahamsson [mailto:swmike@swm.pp.se]
Sent: 19 February 2015 12:20
To: John Andre Taule
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm raid 5 one disk overwritten file system failed

On Thu, 19 Feb 2015, John Andre Taule wrote:

> Hi!
>
> Case: mdadm RAID 5, four 2 TB disks, ext4 formatted spanning the raid.
> Attack: dd if=/dev/zero of=/dev/sdb bs=1M
>
> The expected result was a raid that could be recovered without data loss.
>
> The actual result was that the file system failed and could not be recovered.
>
> As I understand it, if this had been a "hardware type fake" raid controller,
> the outcome would have been uncertain. However, I'm a bit confused as to why
> the raid (or more specifically the file system) would fail so horribly when
> losing one disk. Is there perhaps critical information written "outside" the
> raid on the physical disk, and was this overwritten in the attack?

Did you stop the array before you ran the dd command, or did you just run it?

If you just ran it, you most likely overwrote the superblock on the drive (located
near the beginning of the drive with recent default metadata), plus part of the
file system.

> It would be nice to have an exact idea of why it failed so hard, and how obvious
> it should be that this attack would have more consequences than a degraded raid.

Because the drive was still active, the operating system most likely didn't notice
that you overwrote part of the data on the disk, and the drive was never failed.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: mdadm raid 5 one disk overwritten file system failed
From: Mikael Abrahamsson @ 2015-02-19 14:23 UTC
To: John Andre Taule; +Cc: linux-raid

On Thu, 19 Feb 2015, John Andre Taule wrote:

> I'm a bit surprised that overwriting anything on the physical disk should corrupt
> the file system on the raid. I would think that would be similar to a disk
> crashing or failing in other ways.

Errr, in raid5 you have data blocks and parity blocks. When you overwrite one of
the component drives with zeroes, you're effectively doing the same as writing
zeroes to a non-raid drive once every 3 x $stripesize. You're zeroing a lot of the
filesystem information.

> What you say about Linux possibly not having seen the disk as failing is
> interesting. This could explain why the file system got corrupted.

Correct. There is no mechanism that periodically checks the contents of the
superblock and fails the drive if it's not there anymore. So the drive is never
failed.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
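The closest thing md offers is an explicitly requested consistency scrub. A minimal
sketch, assuming the array is /dev/md0 (an assumed name):

    # compare data and parity across all stripes; mismatches are counted, not fixed
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt

Note that "repair" (echo repair > .../sync_action) recomputes parity from the data
blocks; with one member deliberately zeroed that would tend to cement the bad data
rather than recover it, so a scrub here only detects the damage, it cannot undo it.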
* Re: mdadm raid 5 one disk overwritten file system failed
From: Adam Goryachev @ 2015-02-19 14:39 UTC
To: John Andre Taule; +Cc: linux-raid

On 20/02/2015 01:23, Mikael Abrahamsson wrote:
> On Thu, 19 Feb 2015, John Andre Taule wrote:
>
>> I'm a bit surprised that overwriting anything on the physical disk should
>> corrupt the file system on the raid. I would think that would be similar to a
>> disk crashing or failing in other ways.
>
> Errr, in raid5 you have data blocks and parity blocks. When you overwrite one of
> the component drives with zeroes, you're effectively doing the same as writing
> zeroes to a non-raid drive once every 3 x $stripesize. You're zeroing a lot of
> the filesystem information.
>
>> What you say about Linux possibly not having seen the disk as failing is
>> interesting. This could explain why the file system got corrupted.
>
> Correct. There is no mechanism that periodically checks the contents of the
> superblock and fails the drive if it's not there anymore. So the drive is never
> failed.

In addition, there is no checking of the data when it is read to confirm that the
data on the first 4 disks matches the parity on the 5th disk (assuming a 5-disk
raid5). This applies equally to all raid levels as currently implemented in Linux
md raid. While there are some use cases where it would be nice to confirm that the
data read is correct, this has not yet been implemented (for live operation, you
can schedule a check at periodic intervals).

Even if md noticed that the data on the first 4 disks did not match the parity on
the 5th disk, it would have no method to determine which disk contained the wrong
value (it could be any of the data stripes, or the parity stripe). raid6 begins to
allow for this type of check, and I remember a lot of work being done on this;
however, I think that was still an offline tool, more useful for data recovery from
multiple partially failed drives.

From memory, there are filesystems which will do what you are asking (check that
the data received from disk is correct, use multiple 'disks' and ensure protection
from x failed drives, etc.). I am certain zfs and btrfs both support this. (I've
never used either due to stability concerns, but I read about them every now and
then....)

Regards,
Adam
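As a sketch of what those checksumming filesystems offer (the pool name "tank" and
mount point /data are assumptions for illustration): both zfs and btrfs store a
checksum per block and verify it on every read, and both can walk the whole device
set on demand:

    zpool scrub tank
    zpool status tank

    btrfs scrub start /data
    btrfs scrub status /data

Unlike an md "check", a failed checksum on a redundant zfs/btrfs profile identifies
which copy is bad and rewrites it from a good one.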
* SV: mdadm raid 5 one disk overwritten file system failed
From: John Andre Taule @ 2015-02-19 16:21 UTC
To: linux-raid

How common would this knowledge be? Personally I would never do something like this
on a live system, just because there are too many unknown variables in play. I know
what the different raid levels do, but I am not working on this full time; my
day-to-day work is toward the user end of the application stack.

Usually we use Areca hardware raids, but this particular raid used mdadm, well,
because that's what was available at the time. It's been stable enough; I think it
has survived 2 or 3 failed drives, of course not at the same time.

I would like to thank the list for confirming my suspicion that there was something
else at play here that made the /dev/zero do more damage than the "hacker" believed
it would do.

//Regards

-----Original message-----
From: Adam Goryachev [mailto:mailinglists@websitemanagers.com.au]
Sent: 19 February 2015 15:40
To: John Andre Taule
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm raid 5 one disk overwritten file system failed

On 20/02/2015 01:23, Mikael Abrahamsson wrote:
> On Thu, 19 Feb 2015, John Andre Taule wrote:
>
>> I'm a bit surprised that overwriting anything on the physical disk should
>> corrupt the file system on the raid. I would think that would be similar to a
>> disk crashing or failing in other ways.
>
> Errr, in raid5 you have data blocks and parity blocks. When you overwrite one of
> the component drives with zeroes, you're effectively doing the same as writing
> zeroes to a non-raid drive once every 3 x $stripesize. You're zeroing a lot of
> the filesystem information.
>
>> What you say about Linux possibly not having seen the disk as failing is
>> interesting. This could explain why the file system got corrupted.
>
> Correct. There is no mechanism that periodically checks the contents of the
> superblock and fails the drive if it's not there anymore. So the drive is never
> failed.

In addition, there is no checking of the data when it is read to confirm that the
data on the first 4 disks matches the parity on the 5th disk (assuming a 5-disk
raid5). This applies equally to all raid levels as currently implemented in Linux
md raid. While there are some use cases where it would be nice to confirm that the
data read is correct, this has not yet been implemented (for live operation, you
can schedule a check at periodic intervals).

Even if md noticed that the data on the first 4 disks did not match the parity on
the 5th disk, it would have no method to determine which disk contained the wrong
value (it could be any of the data stripes, or the parity stripe). raid6 begins to
allow for this type of check, and I remember a lot of work being done on this;
however, I think that was still an offline tool, more useful for data recovery from
multiple partially failed drives.

From memory, there are filesystems which will do what you are asking (check that
the data received from disk is correct, use multiple 'disks' and ensure protection
from x failed drives, etc.). I am certain zfs and btrfs both support this. (I've
never used either due to stability concerns, but I read about them every now and
then....)

Regards,
Adam
* Re: mdadm raid 5 one disk overwritten file system failed
From: Wols Lists @ 2015-02-19 22:15 UTC
To: Adam Goryachev, John Andre Taule; +Cc: linux-raid

On 19/02/15 14:39, Adam Goryachev wrote:
> From memory, there are filesystems which will do what you are asking (check that
> the data received from disk is correct, use multiple 'disks' and ensure
> protection from x failed drives, etc.). I am certain zfs and btrfs both support
> this. (I've never used either due to stability concerns, but I read about them
> every now and then....)

When I used Pr1mes, I don't remember whether it was hardware or software, but I
believe their drives implemented some form of parity check and recovery. Basically,
every eight-bit byte you wrote went to disk as sixteen bits - a data byte and a
parity byte.

I don't know how it worked, but (1) you could reconstruct either byte from the
other, and (2) for any 1-bit error you could tell which of the data or parity bytes
was corrupt. For any 2-bit error I think you had a 90% chance of telling which byte
was corrupt - something like that anyway.

Of course, that's no use if your hacker feeds their corrupt stream through your
parity mechanism, or if 0x00000000 is valid when read from disk.

Cheers,
Wol
* SV: mdadm raid 5 one disk overwritten file system failed
From: John Andre Taule @ 2015-04-15 11:47 UTC
To: linux-raid

The guy who did this to us got 3 months in jail.

His argument was that we should have failed the system manually (removed the disk
that he targeted with "dd"), and the raid should have magically fixed itself. Does
anyone think this would have worked?

It was 5 hours of heavy writes and deletes to the file system (ext4), and all that
time the dd command was running.

Later I also found this exact "test" of a raid in the mdadm documentation, marked
as something you should not do (it will break data integrity, e.g. corrupt the
filesystem, period).

/regards

-----Original message-----
From: Mikael Abrahamsson [mailto:swmike@swm.pp.se]
Sent: 19 February 2015 15:24
To: John Andre Taule
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm raid 5 one disk overwritten file system failed

On Thu, 19 Feb 2015, John Andre Taule wrote:

> I'm a bit surprised that overwriting anything on the physical disk should corrupt
> the file system on the raid. I would think that would be similar to a disk
> crashing or failing in other ways.

Errr, in raid5 you have data blocks and parity blocks. When you overwrite one of
the component drives with zeroes, you're effectively doing the same as writing
zeroes to a non-raid drive once every 3 x $stripesize. You're zeroing a lot of the
filesystem information.

> What you say about Linux possibly not having seen the disk as failing is
> interesting. This could explain why the file system got corrupted.

Correct. There is no mechanism that periodically checks the contents of the
superblock and fails the drive if it's not there anymore. So the drive is never
failed.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: mdadm raid 5 one disk overwritten file system failed
From: Mikael Abrahamsson @ 2015-04-15 12:38 UTC
To: John Andre Taule; +Cc: linux-raid

On Wed, 15 Apr 2015, John Andre Taule wrote:

> The guy who did this to us got 3 months in jail.
>
> His argument was that we should have failed the system manually (removed the disk
> that he targeted with "dd"), and the raid should have magically fixed itself.
> Does anyone think this would have worked?
> It was 5 hours of heavy writes and deletes to the file system (ext4), and all
> that time the dd command was running.

Not a chance. After 5 hours dd had basically overwritten 1/3 of the data, spread
out across a large portion of the volume. We're talking massive file and filesystem
corruption.

I don't know enough about zfs, but I am under the impression that zfs perhaps could
have detected the bad information (because the checksums would no longer match on
those blocks) if you had used native zfs to create the raid5. I don't have personal
experience with zfs though; someone else might be able to answer that part.

It's really hard to protect against this kind of intentional sabotage. Even if you
had been running zfs instead, he could have just dd'ed to the actual raid volume
instead.
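A back-of-the-envelope check of that scale, assuming a sustained sequential write
rate of roughly 100 MB/s for the dd (an assumed figure, not reported in the thread):

    100 MB/s x 5 h x 3600 s/h = ~1.8 TB

That is close to the entire 2 TB member, which on a 4-disk raid5 touches essentially
every stripe of the 6 TB volume.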
* Re: mdadm raid 5 one disk overwritten file system failed
From: Wols Lists @ 2015-04-15 18:27 UTC
To: Mikael Abrahamsson, John Andre Taule; +Cc: linux-raid

On 15/04/15 13:38, Mikael Abrahamsson wrote:
> On Wed, 15 Apr 2015, John Andre Taule wrote:
>
>> The guy who did this to us got 3 months in jail.
>>
>> His argument was that we should have failed the system manually (removed the
>> disk that he targeted with "dd"), and the raid should have magically fixed
>> itself. Does anyone think this would have worked?
>> It was 5 hours of heavy writes and deletes to the file system (ext4), and all
>> that time the dd command was running.
>
> Not a chance. After 5 hours dd had basically overwritten 1/3 of the data, spread
> out across a large portion of the volume. We're talking massive file and
> filesystem corruption.

Wouldn't failing the drive and then adding it "as new" (triggering a rebuild)
recover any files that hadn't been modified while the dd was running?

Of course, that still means any directories that had been modified would likely
have also been corrupted, in all probability landing the files in them in
"lost+found" and necessitating a massive recataloging of all the files in there.
The data would have been recovered, but the directory structure ... not a nice
recovery job.

Cheers,
Wol
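A minimal sketch of what that fail-and-rebuild attempt would look like, assuming the
array is /dev/md0 and the zeroed member is /dev/sdb (assumed names; with its metadata
wiped, the member is added back as if it were a blank disk):

    mdadm /dev/md0 --fail /dev/sdb
    mdadm /dev/md0 --remove /dev/sdb
    mdadm /dev/md0 --add /dev/sdb
    cat /proc/mdstat        # watch the recovery progress

This only reconstructs the member from the three surviving disks; stripes the
filesystem wrote while the dd was running may already carry parity computed against
zeroed data, so an fsck afterwards would still be unavoidable.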
* Re: mdadm raid 5 one disk overwritten file system failed
From: Piergiorgio Sartor @ 2015-02-19 17:15 UTC
To: John Andre Taule; +Cc: linux-raid

On Thu, Feb 19, 2015 at 08:38:19AM +0100, John Andre Taule wrote:
> Hi!
>
> Case: mdadm RAID 5, four 2 TB disks, ext4 formatted spanning the raid.
> Attack: dd if=/dev/zero of=/dev/sdb bs=1M
>
> The expected result was a raid that could be recovered without data loss.
>
> The actual result was that the file system failed and could not be recovered.
>
> As I understand it, if this had been a "hardware type fake" raid controller, the
> outcome would have been uncertain. However, I'm a bit confused as to why the raid
> (or more specifically the file system) would fail so horribly when losing one
> disk. Is there perhaps critical information written "outside" the raid on the
> physical disk, and was this overwritten in the attack?
>
> It would be nice to have an exact idea of why it failed so hard, and how obvious
> it should be that this attack would have more consequences than a degraded raid.
>
> //Regards

In this situation, there is no HDD failure. The kernel, the md driver, the sata
driver and so on cannot detect any failure, because there is none. The HDD is alive
and kicking, and happily writing.

Just to be clear and avoid confusion: a (redundant) RAID does *not* check, at each
read operation, that the data is consistent. It only uses redundancy in order to
regenerate missing data *after* a failure is detected.

So, writing to a RAID component does not trigger any error, hence no failure, hence
no reconstruction, but a corrupted filesystem.

bye,
pg

--
piergiorgio
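As a small illustration of that point, even event-based monitoring would have stayed
silent here. A typical setup (sketch only; the mail address is an assumption) such as

    mdadm --monitor --scan --daemonise --mail=root@localhost

reports events like Fail, DegradedArray or RebuildFinished, and none of those are
generated when a member is silently overwritten while it keeps responding normally
to I/O.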