linux-raid.vger.kernel.org archive mirror
* raid1 show clean but md0 will not assemble
@ 2024-03-07 22:39 Stewart Andreason
  2024-03-07 22:58 ` Roman Mamedov
  0 siblings, 1 reply; 3+ messages in thread
From: Stewart Andreason @ 2024-03-07 22:39 UTC (permalink / raw)
  To: linux-raid

Hi, I am new to raid, but have many decades in tech.

I have a raid1 array that was working but now gives conflicting 
information that I need some help understanding. Searching did not 
turn up an answer for my specific situation.

I've put all the required output in a file instead of attaching it here:

http://seahorsecorral.org/bugreport/linux-raid-help-form.txt.gz

What confuses me is that both drives report clean with mdadm --examine, 
so it does not look like a typical degraded situation.

I cannot see clearly why they are out of sync. Is it timestamps, last 
unmounted time, broken superblocks, etc.? I would think that if something 
was altered, it would say where that happened. That would get into what 
I was doing when it broke, and make a longer story.

The model-number specs show CMR, not shingled. Problems with portable 
shingled drives are what moved me into creating a raid.

sdc is inactive, slot 0, (AA), and is out of date. I guess AA is from 
the last time it was written to? So it was turned off in the correct 
condition.

sdd is RO, degraded, mountable, slot 1, and (.A).

That indicates slot 0 is Missing. I understand missing from the current 
array, but... it's right there.

I set SCT Error Recovery Control to 7 seconds today, after the problem 
occurred, so that is a detail to be ignored.
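
For reference, I set it with smartctl (from memory; the values are in 
tenths of a second, so 70 = 7 seconds):

$ sudo smartctl -l scterc,70,70 /dev/sdc
$ sudo smartctl -l scterc,70,70 /dev/sdd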

I have tried this:

$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0

$ sudo mdadm --assemble --verbose /dev/md0 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
mdadm: added /dev/sdc1 to /dev/md0 as 0 (possibly out of date)
mdadm: added /dev/sdd1 to /dev/md0 as 1
mdadm: /dev/md0 has been started with 1 drive (out of 2).

My questions:

Where are the Last-updated timestamps? Don't they match?

Is the correct way to fix this to --force, OR to --remove and then --add?

What exactly would --force do?

I can find nothing about what exactly it does. Presumably something 
less than a whole-disk copy, unlike the second option.
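
For clarity, the two approaches I mean would look something like this 
(untested; just my reading of the man page, so the sequence may be off):

# option 1: force assembly despite the event-count mismatch
$ sudo mdadm --stop /dev/md0
$ sudo mdadm --assemble --force /dev/md0 /dev/sdc1 /dev/sdd1

# option 2: drop the stale member, then add it back for a full resync
$ sudo mdadm /dev/md0 --remove /dev/sdc1
$ sudo mdadm /dev/md0 --add /dev/sdc1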

Do I need to --zero-superblock sdc first, even without seeing a specific 
error regarding the superblocks?

Thank you



* Re: raid1 show clean but md0 will not assemble
  2024-03-07 22:39 raid1 show clean but md0 will not assemble Stewart Andreason
@ 2024-03-07 22:58 ` Roman Mamedov
       [not found]   ` <ac42e2cf-b8a0-4072-949e-fe3d0d969f7c@gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Roman Mamedov @ 2024-03-07 22:58 UTC (permalink / raw)
  To: Stewart Andreason; +Cc: linux-raid

On Thu, 7 Mar 2024 14:39:17 -0800
Stewart Andreason <sandreas41@gmail.com> wrote:

> $ sudo mdadm --assemble --verbose /dev/md0 /dev/sdc1 /dev/sdd1
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
> mdadm: added /dev/sdc1 to /dev/md0 as 0 (possibly out of date)
> mdadm: added /dev/sdd1 to /dev/md0 as 1
> mdadm: /dev/md0 has been started with 1 drive (out of 2).

Please include the "dmesg" output that is printed after running this command.

> Where are the Last-updated timestamps? Don't they match?

See the "Event" counters, one drive indeed has less than the other. But that
shouldn't be a problem as you use a bitmap and the outdated parts should
simply resync from the other drive.
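
For example, to compare them side by side (as root; using your device 
names):

  for d in /dev/sdc1 /dev/sdd1; do
      echo "== $d =="
      mdadm --examine "$d" | grep -E 'Update Time|Events'
  done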

As for the actual steps, when you are in the state shown in your report, 
I'd try:

  mdadm --re-add /dev/md0 /dev/sdc1
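
If the re-add is accepted, the bitmap should make the resync near-instant, 
and you can watch it finish with:

  watch cat /proc/mdstat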

But to me it is puzzling why it got removed to begin with.

-- 
With respect,
Roman


* Re: raid1 show clean but md0 will not assemble
       [not found]   ` <ac42e2cf-b8a0-4072-949e-fe3d0d969f7c@gmail.com>
@ 2024-03-08  0:55     ` Roman Mamedov
  0 siblings, 0 replies; 3+ messages in thread
From: Roman Mamedov @ 2024-03-08  0:55 UTC (permalink / raw)
  To: Stewart Andreason, linux-raid

On Thu, 7 Mar 2024 16:45:49 -0800
Stewart Andreason <sandreas41@gmail.com> wrote:

> Hi Roman,
> 
> Does this board have rules about replying to everyone, or not?

Hello,

Yes, it is better to reply to everyone, to let people know it is now solved 
and that nobody else needs to spend time analyzing the original report.

> I'll go with not until advised otherwise.
> 
> 
> >> $ sudo mdadm --assemble --verbose /dev/md0 /dev/sdc1 /dev/sdd1
> >> mdadm: looking for devices for /dev/md0
> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
> >> mdadm: added /dev/sdc1 to /dev/md0 as 0 (possibly out of date)
> >> mdadm: added /dev/sdd1 to /dev/md0 as 1
> >> mdadm: /dev/md0 has been started with 1 drive (out of 2).
> > Please include "dmesg" output that's printed after running this command.
> 
> 
> Certainly.
> 
> http://seahorsecorral.org/bugreport/Roxy10-dmesg-20240307-clip.txt.gz
> 
> 
> 
> > See the "Event" counters, one drive indeed has less than the other.
> 
> 
> When I first opened these in January, they went into a different 
> enclosure, an Acasis EC-7352.
> 
> All or 99% of the errors are from that month, both the events and the 3 
> serious errors in the SMART log. The enclosure was first configured for 
> hardware raid, but proved to have several issues, including not turning 
> the fan back on after waking up. It crashed a few times, even in JBOD 
> mode. So I started over with new individual enclosures.
> 
> Hard to tell what was responsible for those crashes, since the most 
> recent ones froze up the whole OS, so no dmesg could be retrieved. That 
> was Jan. 29, and it was the end of Acasis in my rating.
> 
> 
> > As for the actual steps, when you are in this state as in your report, I'd try:
> >
> >    mdadm --re-add /dev/md0 /dev/sdc1
> 
> I powered up drive 0, expecting sdc: Device Role : Active device 0 , ok.
> 
> I powered up drive 1, expecting sdd: Active device 1.
> 
> $ sudo mdadm --detail /dev/md0
> /dev/md0:
>             Version : 1.2
>       Creation Time : Sat Jan 27 12:07:27 2024
>          Raid Level : raid1
>          Array Size : 5860388864 (5588.90 GiB 6001.04 GB)
>       Used Dev Size : 5860388864 (5588.90 GiB 6001.04 GB)
>        Raid Devices : 2
>       Total Devices : 1
>         Persistence : Superblock is persistent
> 
>       Intent Bitmap : Internal
> 
>         Update Time : Sat Mar  2 18:50:02 2024
>               State : clean, degraded
>      Active Devices : 1
>     Working Devices : 1
>      Failed Devices : 0
> 
>                Name : roxy10-debian11-x64:0  (local to host 
> roxy10-debian11-x64)
>                UUID : 1e3f7f7e:23a5b75f:6f76abf5:88f5e704
>              Events : 21691
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       33        0      active sync   /dev/sdc1
>         -       0        0        1      removed
> 
> Why is sdc the active one this time?
> 
> $ lsblk
> 
> sdc       8:32   0   5.5T  0 disk
> └─sdc1    8:33   0   5.5T  0 part
>    └─md0   9:0    0   5.5T  0 raid1
> sdd       8:48   0   5.5T  0 disk
> └─sdd1    8:49   0   5.5T  0 part
> 
> I keep getting confused about which drive is the bad one, and I repeated 
> my steps before posting my question. Now maybe I was not imagining it.
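> 
> For next time, identifying the drives by serial rather than by letter 
> should remove the ambiguity, e.g.:
> 
> $ ls -l /dev/disk/by-id/ | grep -E 'sd[cd]'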
> 
> I only got the slot numbers verified by reassembling it, so I'll do that 
> step again.
> 
> $ sudo mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> $ sudo mdadm --assemble --verbose /dev/md0 /dev/sdc1 /dev/sdd1
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 1.
> mdadm: added /dev/sdc1 to /dev/md0 as 0 (possibly out of date)
> mdadm: added /dev/sdd1 to /dev/md0 as 1
> mdadm: /dev/md0 has been started with 1 drive (out of 2).
> 
> Huh. Well, onward then. I'll just include what changed:
> 
> $ sudo mdadm --detail /dev/md0
>      Number   Major   Minor   RaidDevice State
>         -       0        0        0      removed
>         1       8       49        1      active sync   /dev/sdd1
> 
> $ sudo mdadm --re-add /dev/md0 /dev/sdc1
> mdadm: re-added /dev/sdc1
> 
> $ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
> [raid4] [raid10]
> md0 : active raid1 sdc1[0] sdd1[1]
>        5860388864 blocks super 1.2 [2/2] [UU]
>        bitmap: 1/44 pages [4KB], 65536KB chunk
> 
> $ sudo mdadm --detail /dev/md0
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       33        0      active sync   /dev/sdc1
>         1       8       49        1      active sync   /dev/sdd1
> 
> $ dmesg
> 
> [37737.530811] md: kicking non-fresh sdc1 from array!
> [37737.556503] md/raid1:md0: active with 1 out of 2 mirrors
> [37737.561908] md0: detected capacity change from 0 to 6001038196736
> [37818.049342] md: recovery of RAID array md0
> [37818.319780] md: md0: recovery done.
> 
> Fixed. I'm so glad I asked the right forum. Thank you!
> 
> 
> > But to me it is puzzling why it got removed to begin with.
> >
> I had intended to make a backup of my primary OS while it was 
> unmounted, and after researching the safe commands to reassemble the 
> raid in a different linux OS, I rebooted into System-Rescue-10, copied 
> the mdadm.conf to /etc over the existing template, and attempted to 
> mount /dev/md0.
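> 
> Roughly, the steps there were something like this (from memory; the 
> copy source and mountpoint below are made up):
> 
> $ sudo cp /mnt/main/etc/mdadm/mdadm.conf /etc/mdadm.conf
> $ sudo mdadm --assemble --scan
> $ sudo mount /dev/md0 /mnt/raid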
> 
> I got only one drive up. Cue the mild panic, because when does doing the 
> research first ever go perfectly?
> 
> Now with a few more days experience, I get to try again.
> 
> Thanks again,
> 
> Stewart
> 


-- 
With respect,
Roman

