* Unexpected mdadm behavior with old replugged disc
From: Matthias Walther @ 2017-11-18 14:35 UTC
To: linux-raid

Hello,

I just signed up for this mailing list to discuss the following
unexpected behavior:

Situation: Raid6 with 6 discs. For reasons that are unimportant here,
I had previously replaced a disc that was fully functional. This disc was
never changed or written to in between.

Today I replugged this particular disc additionally as a 7th disc to the
server (cold plug, server was switched off).

Unexpectedly, mdadm broke up my fully synced raid6 and is now syncing back to
this old disc, dropping one of the newer discs from the raid.

This might be because its uuid is still stored with a higher rank than
that of the newer disc, or because the old disc got a lower sdX slot. I don't
know the details.

Anyway, I wouldn't expect mdadm to act like this. It might use the old,
now re-plugged disc as a hot spare or ignore it altogether. But it
shouldn't break a fully synced raid. I have had reduced redundancy for about
24 hours now - without any rational reason.

This is especially odd as it isn't a usual use case. In general, you'd only
replace broken discs. So this behavior makes even less sense to me, because
mdadm shouldn't try to use a potentially broken disc.

Is this a bug? Or maybe something the developers haven't thought of, or
is this just unprecedented?

The kernel version is 4.14 mainline on Ubuntu 16.04 LTS.

Regards,
Matthias

* Re: Unexpected mdadm behavior with old replugged disc
From: Wols Lists @ 2017-11-18 14:58 UTC
To: Matthias Walther, linux-raid

On 18/11/17 14:35, Matthias Walther wrote:
> Hello,
>
> I just signed up for this mailing list to discuss the following,
> unexpected behavior:
>
> Situation: Raid6 with 6 discs. For some reasons, which are unimportant,
> I had replaced a disc before, which was fully functional. This disc was
> never changed or written to in between.
>
> Today I replugged this particular disc additionally as 7th disc to the
> server (cold plug, server was switched off).
>
> Unexpectedly mdadm broke up my fully synced raid6 and now syncs back to
> this old disc dropping one of the newer discs from the raid.
>
> This might be because it has its uuid still stored with higher rank than
> the newer disc or because the old disc got a lower sdX slot. I don't
> know that in detail.
>
> Anyway, I wouldn't expect mdadm to act like this. It might use the old,
> now plugged in again disc as hot spare or ignore it at all. But it
> shouldn't break a fully synced raid. I have reduced redundancy for about
> 24 hours now - without any rational reason.

Just a guess? "mdadm --assemble --incremental"?

What I *suspect* happened is that, as the system booted, mdadm scanned
the drives as they came available, and because this drive became
available before some of the others, it got included in the array.

I can't, off the top of my head, think of any way to stop this
happening, other than to prevent raid assembling during boot, or having
an *accurate* mdadm.conf from which mdadm could realise this drive
wasn't meant to be included.

Did you update mdadm.conf after you removed this drive? Do you even have
an mdadm.conf?

The only good point here, is that if you had three such drives, mdadm
would almost certainly have failed the array as it booted, and left you
in an (easily) recoverable situation. I don't really see what else it
could have done?

Cheers,
Wol

* Re: Unexpected mdadm behavior with old replugged disc
From: Matthias Walther @ 2017-11-18 15:06 UTC
To: Wols Lists, linux-raid

On 18.11.2017 at 15:58, Wols Lists wrote:
> On 18/11/17 14:35, Matthias Walther wrote:
>> Hello,
>>
>> I just signed up for this mailing list to discuss the following,
>> unexpected behavior:
>>
>> Situation: Raid6 with 6 discs. For some reasons, which are unimportant,
>> I had replaced a disc before, which was fully functional. This disc was
>> never changed or written to in between.
>>
>> Today I replugged this particular disc additionally as 7th disc to the
>> server (cold plug, server was switched off).
>>
>> Unexpectedly mdadm broke up my fully synced raid6 and now syncs back to
>> this old disc dropping one of the newer discs from the raid.
>>
>> This might be because it has its uuid still stored with higher rank than
>> the newer disc or because the old disc got a lower sdX slot. I don't
>> know that in detail.
>>
>> Anyway, I wouldn't expect mdadm to act like this. It might use the old,
>> now plugged in again disc as hot spare or ignore it at all. But it
>> shouldn't break a fully synced raid. I have reduced redundancy for about
>> 24 hours now - without any rational reason.

Hello,

thanks for your quick reply.

> Just a guess? "mdadm --assemble --incremental"?

What do you mean by this guess? I didn't do anything. It all happened
automatically.

>
> What I *suspect* happened is that, as the system booted, mdadm scanned
> the drives as they came available, and because this drive became
> available before some of the others, it got included in the array.

Probably.

>
> I can't, off the top of my head, think of any way to stop this
> happening, other than to prevent raid assembling during boot, or having
> an *accurate* mdadm.conf from which mdadm could realise this drive
> wasn't meant to be included.
>
> Did you update mdadm.conf after you removed this drive? Do you even have
> an mdadm.conf?

No, I have always relied on auto-configuration. So if I had had a correctly
updated mdadm.conf, this wouldn't have happened?

>
> The only good point here, is that if you had three such drives, mdadm
> would almost certainly have failed the array as it booted, and left you
> in an (easily) recoverable situation. I don't really see what else it
> could have done?
>
> Cheers,
> Wol

I have to reassemble it manually now anyway. The system crashed, changed
the order again and doesn't assemble automatically at the moment. I'll
have to find out how to determine which discs to use.

Once reassembled, I'll create an mdadm.conf to prevent this in the future.

Regards,
Matthias
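
A minimal sketch of the "create an mdadm.conf" step mentioned above, offered under assumptions that are not from the thread: a Debian/Ubuntu layout with the config at /etc/mdadm/mdadm.conf, and a hypothetical `MDADM` variable and `record_arrays` helper so the step can be dry-run. Whether an ARRAY line alone would have kept the stale disc out is not confirmed anywhere in this thread; a removed member still carries the array's UUID in its superblock.

```shell
#!/bin/sh
# Sketch only: append ARRAY lines for the currently running arrays to mdadm.conf.
# MDADM is a hypothetical override; set MDADM="echo mdadm" for a dry run.
MDADM="${MDADM:-mdadm}"

record_arrays() {
    # Default path assumes Debian/Ubuntu; pass another file as $1 if needed.
    conf="${1:-/etc/mdadm/mdadm.conf}"
    # --detail --scan prints one ARRAY line per running array.
    $MDADM --detail --scan >> "$conf"
}
```

Run as root once the array is assembled the way you want it, for example `record_arrays`, and review the file afterwards. On Debian/Ubuntu, `update-initramfs -u` then copies the new config into the initramfs used at boot.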

* Re: Unexpected mdadm behavior with old replugged disc
From: Wols Lists @ 2017-11-18 18:04 UTC
To: Matthias Walther, linux-raid

On 18/11/17 15:06, Matthias Walther wrote:
> On 18.11.2017 at 15:58, Wols Lists wrote:
>> On 18/11/17 14:35, Matthias Walther wrote:
>>> Hello,
>>>
>>> I just signed up for this mailing list to discuss the following,
>>> unexpected behavior:
>>>
>>> Situation: Raid6 with 6 discs. For some reasons, which are unimportant,
>>> I had replaced a disc before, which was fully functional. This disc was
>>> never changed or written to in between.
>>>
>>> Today I replugged this particular disc additionally as 7th disc to the
>>> server (cold plug, server was switched off).
>>>
>>> Unexpectedly mdadm broke up my fully synced raid6 and now syncs back to
>>> this old disc dropping one of the newer discs from the raid.
>>>
>>> This might be because it has its uuid still stored with higher rank than
>>> the newer disc or because the old disc got a lower sdX slot. I don't
>>> know that in detail.
>>>
>>> Anyway, I wouldn't expect mdadm to act like this. It might use the old,
>>> now plugged in again disc as hot spare or ignore it at all. But it
>>> shouldn't break a fully synced raid. I have reduced redundancy for about
>>> 24 hours now - without any rational reason.
> Hello,
>
> thanks for your quick reply.
>> Just a guess? "mdadm --assemble --incremental"?
> What do you mean with this guess? I didn't do anything. All happened
> automatically
>>

Exactly. As I understand it, this is the command that the boot sequence
runs. It doesn't wait until all the drives are available (it can't know
when all the drives are available), so it adds each drive as it sees it.

>> What I *suspect* happened is that, as the system booted, mdadm scanned
>> the drives as they came available, and because this drive became
>> available before some of the others, it got included in the array.
> Probably.
>>
>> I can't, off the top of my head, think of any way to stop this
>> happening, other than to prevent raid assembling during boot, or having
>> an *accurate* mdadm.conf from which mdadm could realise this drive
>> wasn't meant to be included.
>>
>> Did you update mdadm.conf after you removed this drive? Do you even have
>> an mdadm.conf?
> No, always relied on auto-configuration. So if I had had a correctly
> updated mdadm.conf, this wouldn't have happened?

I don't know. My system doesn't have an mdadm.conf. But if you had an
mdadm.conf, it may well have told mdadm that drive didn't belong there.

>>
>> The only good point here, is that if you had three such drives, mdadm
>> would almost certainly have failed the array as it booted, and left you
>> in an (easily) recoverable situation. I don't really see what else it
>> could have done?
>>
>> Cheers,
>> Wol
>
> I have to reassemble it manually anyway now. The system crashed, changed
> the order again and doesn't assemble automatically at the moment. I'll
> have to search how to determine to use which discs.

Download the lsdrv command, it'll give you a load of info. And mdadm
--examine or --detail should tell you everything you need to know.

>
> Once reassembled, I'll create a mdadm.conf to prevent this in the future.
>

Next time you remove a drive, it would pay you to use --wipe-superblock
or whatever the option is. I suspect using --replace would also flag the
removed drive as no longer valid.

Cheers,
Wol
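
As a rough illustration of the `mdadm --examine` suggestion above, the loop below prints the on-disk superblock of each candidate member so the entries can be compared side by side. It is only a sketch: `examine_members` and the `MDADM` override are hypothetical names introduced here, and the device names are placeholders rather than anything taken from the thread.

```shell
#!/bin/sh
# Sketch only: dump the md superblock of every device given on the command line.
# MDADM is a hypothetical override; set MDADM="echo mdadm" for a dry run.
MDADM="${MDADM:-mdadm}"

examine_members() {
    for dev in "$@"; do
        echo "== $dev"
        # --examine reads the superblock stored on the device itself,
        # including the array UUID, the event counter and the update time.
        $MDADM --examine "$dev"
    done
}
```

Run as root with the discs in question, for example `examine_members /dev/sdX /dev/sdY`. A member that was pulled earlier will typically show an older update time and a lower event count than the discs that stayed in the array. For an array that is already running, `mdadm --detail /dev/mdN` reports the assembled view instead.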

* Re: Unexpected mdadm behavior with old replugged disc
From: Phil Turmel @ 2017-11-20 2:08 UTC
To: Matthias Walther, linux-raid

On 11/18/2017 09:35 AM, Matthias Walther wrote:
> Hello,
>
> I just signed up for this mailing list to discuss the following,
> unexpected behavior:
>
> Situation: Raid6 with 6 discs. For some reasons, which are unimportant,
> I had replaced a disc before, which was fully functional. This disc was
> never changed or written to in between.
>
> Today I replugged this particular disc additionally as 7th disc to the
> server (cold plug, server was switched off).
>
> Unexpectedly mdadm broke up my fully synced raid6 and now syncs back to
> this old disc dropping one of the newer discs from the raid.
>
> This might be because it has its uuid still stored with higher rank than
> the newer disc or because the old disc got a lower sdX slot. I don't
> know that in detail.

This is called split-brain, as there's insufficient information for
mdadm to tell when it encounters this disk during startup to discard
this disk if it shows up before the other in that device role. You're
lucky it was a raid6 instead of a mirror, as you can really get screwed
in that case.

It's not a question of rank, simply the order of device discovery on boot.

The correct (and only) solution is to clean off your UUIDs with
--zero-superblock if a removed device is actually still working.

Phil
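
The `--zero-superblock` advice above could look roughly like the following. This is a sketch under stated assumptions, not a tested procedure: `retire_member` and the `MDADM` override are hypothetical names introduced here, the device path is a placeholder, and the disc is assumed to have already been removed from any running array.

```shell
#!/bin/sh
# Sketch only: erase the md metadata from a disc that was taken out of an
# array but still works, so it cannot be picked up by auto-assembly later.
# MDADM is a hypothetical override; set MDADM="echo mdadm" for a dry run.
MDADM="${MDADM:-mdadm}"

retire_member() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: retire_member <device>" >&2
        return 1
    fi
    # Show what is on the disc first; only zero the superblock if
    # --examine succeeded, i.e. md metadata was actually found there.
    $MDADM --examine "$dev" && $MDADM --zero-superblock "$dev"
}
```

Run as root, for example `retire_member /dev/sdX`, and double-check the device name beforehand, since zeroing the superblock of a live member would damage the array.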