Linux RAID subsystem development
* Unexpected mdadm behavior with old replugged disc
@ 2017-11-18 14:35 Matthias Walther
  2017-11-18 14:58 ` Wols Lists
  2017-11-20  2:08 ` Phil Turmel
From: Matthias Walther @ 2017-11-18 14:35 UTC
  To: linux-raid

Hello,

I just signed up for this mailing list to discuss the following
unexpected behavior:

Situation: RAID6 with 6 discs. For reasons that don't matter here, I
had earlier replaced a disc that was still fully functional. That disc
was never modified or written to in the meantime.

Today I plugged this particular disc back into the server as a 7th disc
(cold plug; the server was switched off).

Unexpectedly, mdadm broke up my fully synced RAID6 and is now syncing
back onto this old disc, having dropped one of the newer discs from the
array.

This might be because its UUID is still stored with a higher rank than
the newer disc's, or because the old disc got a lower sdX slot. I don't
know the details.

Anyway, I wouldn't expect mdadm to act like this. It could use the old,
re-plugged disc as a hot spare or ignore it altogether. But it shouldn't
break a fully synced array. I have now had reduced redundancy for about
24 hours, for no good reason.

All the more so since this isn't a usual use case. In general, you'd
only replace broken discs, so this behavior makes even less sense to me:
mdadm shouldn't try to use a potentially broken disc.

Is this a bug? Is it something the developers haven't thought of, or
is this case just unprecedented?

The kernel version is 4.14 mainline on Ubuntu 16.04 LTS.

Regards,
Matthias


* Re: Unexpected mdadm behavior with old replugged disc
  2017-11-18 14:35 Unexpected mdadm behavior with old replugged disc Matthias Walther
@ 2017-11-18 14:58 ` Wols Lists
  2017-11-18 15:06   ` Matthias Walther
  2017-11-20  2:08 ` Phil Turmel
From: Wols Lists @ 2017-11-18 14:58 UTC
  To: Matthias Walther, linux-raid

On 18/11/17 14:35, Matthias Walther wrote:
> Hello,
> 
> I just signed up for this mailing list to discuss the following
> unexpected behavior:
>
> Situation: RAID6 with 6 discs. For reasons that don't matter here, I
> had earlier replaced a disc that was still fully functional. That disc
> was never modified or written to in the meantime.
>
> Today I plugged this particular disc back into the server as a 7th disc
> (cold plug; the server was switched off).
>
> Unexpectedly, mdadm broke up my fully synced RAID6 and is now syncing
> back onto this old disc, having dropped one of the newer discs from the
> array.
>
> This might be because its UUID is still stored with a higher rank than
> the newer disc's, or because the old disc got a lower sdX slot. I don't
> know the details.
>
> Anyway, I wouldn't expect mdadm to act like this. It could use the old,
> re-plugged disc as a hot spare or ignore it altogether. But it shouldn't
> break a fully synced array. I have now had reduced redundancy for about
> 24 hours, for no good reason.

Just a guess? "mdadm --assemble --incremental"?

What I *suspect* happened is that, as the system booted, mdadm scanned
the drives as they came available, and because this drive became
available before some of the others, it got included in the array.

I can't, off the top of my head, think of any way to stop this
happening, other than to prevent raid assembling during boot, or having
an *accurate* mdadm.conf from which mdadm could realise this drive
wasn't meant to be included.

Did you update mdadm.conf after you removed this drive? Do you even have
an mdadm.conf?

The only good point here is that if you had three such drives, mdadm
would almost certainly have failed the array as it booted, and left you
in an (easily) recoverable situation. I don't really see what else it
could have done?
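A back-of-the-envelope sketch of the arithmetic behind that last point, in Python. The helper names `max_missing` and `can_start` are hypothetical and this is not mdadm code: a RAID6 stripe carries two parity blocks, so an array can still run with at most two of its members missing. If, as the reasoning above assumes, every role claimed by a stale disc counts as missing, three such discs leave a 6-disc RAID6 with too few trusted members to start.

```python
# Back-of-the-envelope check, not mdadm code: how many member devices each
# RAID level can lose before the array can no longer be started.


def max_missing(level: str, n_devices: int) -> int:
    """Number of members that may be absent while the data stays complete."""
    if level == "raid1":
        return n_devices - 1   # every mirror member holds a full copy
    if level == "raid5":
        return 1               # one parity block per stripe
    if level == "raid6":
        return 2               # two parity blocks (P and Q) per stripe
    raise ValueError(f"unsupported level: {level}")


def can_start(level: str, n_devices: int, n_trusted: int) -> bool:
    """True if n_trusted up-to-date members are enough to run the array."""
    return n_devices - n_trusted <= max_missing(level, n_devices)


if __name__ == "__main__":
    # The thread's case: a 6-disc RAID6 with one role in doubt.
    print(can_start("raid6", 6, 5))   # True
    # The hypothetical above: three roles claimed by stale discs.
    print(can_start("raid6", 6, 3))   # False
```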

Cheers,
Wol


* Re: Unexpected mdadm behavior with old replugged disc
  2017-11-18 14:58 ` Wols Lists
@ 2017-11-18 15:06   ` Matthias Walther
  2017-11-18 18:04     ` Wols Lists
From: Matthias Walther @ 2017-11-18 15:06 UTC
  To: Wols Lists, linux-raid

Am 18.11.2017 um 15:58 schrieb Wols Lists:
> On 18/11/17 14:35, Matthias Walther wrote:
>> [snip]
Hello,

thanks for your quick reply.
> Just a guess? "mdadm --assemble --incremental"?
What do you mean by this guess? I didn't do anything. It all happened
automatically.
>
> What I *suspect* happened is that, as the system booted, mdadm scanned
> the drives as they came available, and because this drive became
> available before some of the others, it got included in the array.
Probably.
>
> I can't, off the top of my head, think of any way to stop this
> happening, other than to prevent raid assembling during boot, or having
> an *accurate* mdadm.conf from which mdadm could realise this drive
> wasn't meant to be included.
>
> Did you update mdadm.conf after you removed this drive? Do you even have
> an mdadm.conf?
No, I've always relied on auto-configuration. So if I had had a correctly
updated mdadm.conf, this wouldn't have happened?
>
> The only good point here is that if you had three such drives, mdadm
> would almost certainly have failed the array as it booted, and left you
> in an (easily) recoverable situation. I don't really see what else it
> could have done?
>
> Cheers,
> Wol

I have to reassemble it manually now anyway. The system crashed, the disc
order changed again, and the array doesn't assemble automatically at the
moment. I'll have to look up how to determine which discs to use.

Once it's reassembled, I'll create an mdadm.conf to prevent this in the future.

Regards,
Matthias


* Re: Unexpected mdadm behavior with old replugged disc
  2017-11-18 15:06   ` Matthias Walther
@ 2017-11-18 18:04     ` Wols Lists
From: Wols Lists @ 2017-11-18 18:04 UTC
  To: Matthias Walther, linux-raid

On 18/11/17 15:06, Matthias Walther wrote:
> Am 18.11.2017 um 15:58 schrieb Wols Lists:
>> On 18/11/17 14:35, Matthias Walther wrote:
>>> [snip]
> Hello,
> 
> thanks for your quick reply.

>> Just a guess? "mdadm --assemble --incremental"?

> What do you mean by this guess? I didn't do anything. It all happened
> automatically.
>>
Exactly. As I understand it, this is the command that the boot sequence
runs. It doesn't wait until all the drives are available (it can't know
when all the drives are available), so it adds each drive as it sees it.
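A toy model of that first-come, first-served behavior, in Python. It is an illustration under stated assumptions, not mdadm's actual logic: the `Disc` type, the function names and the event counts are all hypothetical. The same seven discs, discovered in two different orders, put a different disc into role 3, and the double claim on that role only becomes visible once every disc has turned up, which is what incremental assembly can't wait for.

```python
# Toy model, not mdadm's real code: during boot each disc is handled as it is
# discovered, and the first disc claiming a device role gets that slot.
from dataclasses import dataclass


@dataclass
class Disc:
    name: str    # kernel name such as "sda"; may differ from boot to boot
    role: int    # device role recorded in the disc's md superblock
    events: int  # event count recorded in the superblock


def assemble_first_come(discovered: list[Disc]) -> dict[int, Disc]:
    """Give each role to whichever disc claiming it shows up first."""
    slots: dict[int, Disc] = {}
    for disc in discovered:
        slots.setdefault(disc.role, disc)   # later claimants are ignored
    return slots


def role_conflicts(discovered: list[Disc]) -> dict[int, list[str]]:
    """Roles claimed by more than one disc, freshest (most events) first.

    This needs the complete list of discs; as described in the thread,
    incremental assembly can't know when the last disc has appeared.
    """
    claims: dict[int, list[Disc]] = {}
    for disc in discovered:
        claims.setdefault(disc.role, []).append(disc)
    return {
        role: [d.name for d in sorted(ds, key=lambda d: d.events, reverse=True)]
        for role, ds in claims.items()
        if len(ds) > 1
    }


if __name__ == "__main__":
    # Hypothetical numbers: six current discs (roles 0-5) plus the old disc,
    # which still claims role 3 with an older event count.
    current = [Disc(f"sd{c}", role, 1000) for role, c in enumerate("bcdefg")]
    old = Disc("sda", 3, 800)
    print(assemble_first_come(current + [old])[3].name)  # sde
    print(assemble_first_come([old] + current)[3].name)  # sda
    print(role_conflicts([old] + current))               # {3: ['sde', 'sda']}
```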

>> What I *suspect* happened is that, as the system booted, mdadm scanned
>> the drives as they came available, and because this drive became
>> available before some of the others, it got included in the array.

> Probably.
>>
>> I can't, off the top of my head, think of any way to stop this
>> happening, other than to prevent raid assembling during boot, or having
>> an *accurate* mdadm.conf from which mdadm could realise this drive
>> wasn't meant to be included.
>>
>> Did you update mdadm.conf after you removed this drive? Do you even have
>> an mdadm.conf?

> No, I've always relied on auto-configuration. So if I had had a correctly
> updated mdadm.conf, this wouldn't have happened?

I don't know. My system doesn't have an mdadm.conf. But if you had an
mdadm.conf, it may well have told mdadm that drive didn't belong there.
>>
>> The only good point here is that if you had three such drives, mdadm
>> would almost certainly have failed the array as it booted, and left you
>> in an (easily) recoverable situation. I don't really see what else it
>> could have done?
>>
>> Cheers,
>> Wol
> 
> I have to reassemble it manually now anyway. The system crashed, the disc
> order changed again, and the array doesn't assemble automatically at the
> moment. I'll have to look up how to determine which discs to use.

Download the lsdrv tool; it'll give you a load of info. And mdadm
--examine (on the member discs) or --detail (on the assembled array)
should tell you everything you need to know.
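To compare the discs, here is a minimal Python sketch that summarises `mdadm --examine` output: for each array UUID and device role it keeps the disc with the highest event count and lists any others as stale. It assumes the v1.x metadata text layout ("Key : value" lines including `Array UUID`, `Events` and `Device Role : Active device N`); the function names are hypothetical, the sample is abbreviated and made up, and the event count is only a hint, so check the full output before assembling anything.

```python
# Sketch: summarise `mdadm --examine` output to see which member discs look
# current. Assumes v1.x metadata output ("Key : value" lines with "Array UUID",
# "Events" and "Device Role : Active device N"); adjust for other formats.
import re
from collections import defaultdict

DEVICE_HEADER = re.compile(r"^(/dev/\S+):\s*$")               # "/dev/sda1:"
FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.*)$")  # "  Key : value"


def parse_examine(text: str) -> dict[str, dict[str, str]]:
    """Map each device path to the "Key : value" fields printed for it."""
    devices: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_HEADER.match(line)
        if header:
            current = devices.setdefault(header.group(1), {})
        elif current is not None:
            field = FIELD.match(line)
            if field:
                current[field.group(1)] = field.group(2).strip()
    return devices


def classify(devices: dict[str, dict[str, str]]) -> tuple[list[str], list[str]]:
    """Per (array UUID, device role), keep the disc with the most events."""
    claims = defaultdict(list)
    for path, fields in devices.items():
        role = fields.get("Device Role", "")
        if "Events" in fields and role.startswith("Active device"):
            key = (fields.get("Array UUID", ""), role)
            claims[key].append((int(fields["Events"]), path))
    fresh: list[str] = []
    stale: list[str] = []
    for entries in claims.values():
        entries.sort(reverse=True)        # highest event count first
        fresh.append(entries[0][1])
        stale.extend(path for _, path in entries[1:])
    return sorted(fresh), sorted(stale)


# Abbreviated, made-up sample; real output has many more fields per disc.
SAMPLE = """\
/dev/sda1:
     Array UUID : 11111111:22222222:33333333:44444444
         Events : 800
    Device Role : Active device 3
/dev/sde1:
     Array UUID : 11111111:22222222:33333333:44444444
         Events : 1000
    Device Role : Active device 3
/dev/sdb1:
     Array UUID : 11111111:22222222:33333333:44444444
         Events : 1000
    Device Role : Active device 0
"""

if __name__ == "__main__":
    fresh, stale = classify(parse_examine(SAMPLE))
    print("fresh:", fresh)   # fresh: ['/dev/sdb1', '/dev/sde1']
    print("stale:", stale)   # stale: ['/dev/sda1']
```

On a real system you would feed it the text printed by `mdadm --examine` for the member partitions, with device names adjusted to your setup, instead of the sample.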
> 
> Once it's reassembled, I'll create an mdadm.conf to prevent this in the future.
> 
Next time you remove a drive, it would pay you to wipe its superblock
(mdadm --zero-superblock). I suspect using --replace would also flag the
removed drive as no longer valid.

Cheers,
Wol



* Re: Unexpected mdadm behavior with old replugged disc
  2017-11-18 14:35 Unexpected mdadm behavior with old replugged disc Matthias Walther
  2017-11-18 14:58 ` Wols Lists
@ 2017-11-20  2:08 ` Phil Turmel
From: Phil Turmel @ 2017-11-20  2:08 UTC
  To: Matthias Walther, linux-raid

On 11/18/2017 09:35 AM, Matthias Walther wrote:
> Hello,
> 
> I just signed up for this mailing list to discuss the following
> unexpected behavior:
>
> Situation: RAID6 with 6 discs. For reasons that don't matter here, I
> had earlier replaced a disc that was still fully functional. That disc
> was never modified or written to in the meantime.
>
> Today I plugged this particular disc back into the server as a 7th disc
> (cold plug; the server was switched off).
>
> Unexpectedly, mdadm broke up my fully synced RAID6 and is now syncing
> back onto this old disc, having dropped one of the newer discs from the
> array.
>
> This might be because its UUID is still stored with a higher rank than
> the newer disc's, or because the old disc got a lower sdX slot. I don't
> know the details.

This is called split-brain: when mdadm encounters this disk during
startup, it has insufficient information to tell that the disk should be
discarded if it shows up before the other disk in that device role.
You're lucky it was a raid6 instead of a mirror, as you can really get
screwed in that case.

It's not a question of rank, simply the order of device discovery on boot.

The correct (and only) solution is to clean the md superblock, and with
it the array UUID, off a removed device with --zero-superblock if that
device is actually still working.

Phil

