* Issue with moving LSI/Dell Raid to MD
@ 2024-03-16 18:26 Shaya Potter
  2024-03-18 11:18 ` Mariusz Tkaczyk
  0 siblings, 1 reply; 3+ messages in thread
From: Shaya Potter @ 2024-03-16 18:26 UTC
  To: linux-raid

note: not subscribed, so please cc me on responses.

I recently had a Dell R710 die where I was using the Perc6 to provide
storage to the box.  As the box wasn't usable, I decided to image the
individual disks to a newer machine with significantly more storage.

I sort of messed up the process, but in doing so I may have discovered a
bug in mdadm.

Background: the Dell R710 supports 6 drives, which I had set up as a 1TB
SATA SSD plus 5x8TB SATA disks in a RAID5 array.

In the process of imaging it, I was setting up devices on /dev/loop
to be ready to assemble the RAID, but I think I accidentally
assembled the RAID while still imaging the last disk (which in effect
caused the last disk to get out of sync with the other disks).  This
was initially OK, until the VM I was doing it on crashed with a
KVM/QEMU failure (unsure what occurred).

I was hoping it would be easy to bring the RAID array up again, but
now mdadm segfaulted on a NULL pointer dereference whenever I tried to
assemble the array (I was only trying the RAID5 portion).

I thought perhaps my VM had been corrupted, but I couldn't confirm
that, so I decided to reimage the disks (more carefully this time).
Sure enough, the 5th disk was marked as being in quick init, while the
others were more consistent.

However, the same segfault was occurring, so I built mdadm from source
(with -g and no -O; as an aside, this would be a good Makefile target
to have, to make issues easier to debug).

After digging into the issue, the segfault seems to be due to
Assemble.c calling update_super() with a DDF super, except that
super-ddf.c doesn't provide that function.

i.e. in Assemble.c it was crashing at

    if (st->ss->update_super(st, &devices[j].i, UOPT_SPEC_ASSEMBLE, NULL,
                             c->verbose, 0, NULL)) {...}

which explained the segfault on a NULL pointer dereference.  I was
able to get past the segfault (perhaps badly, but it "seems" to
work for me) by putting a NULL check before the update_super()
call, i.e.

    if (st->ss->update_super && st->ss->update_super(....)) { ... }

Any thoughts on my "fix" (perhaps super-ddf.c needs an empty
update_super function?), and on whether this is a bug? (Perhaps it's
unexpected for me to have gotten into this state in the first place?)
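
For readers who want to see the guard in isolation, here is a minimal
standalone sketch of the pattern.  It does not use mdadm's real types:
struct superswitch and struct supertype below are hypothetical,
heavily simplified stand-ins, the two-argument update_super signature
and its "non-zero means updated" return convention are assumptions
made for the sketch, and try_update_super() is an invented helper
name.

```c
#include <assert.h>
#include <stddef.h>

struct supertype;

/* Hypothetical, simplified stand-in for a metadata handler table. */
struct superswitch {
    const char *name;
    /* Optional hook: NULL when the handler does not implement it. */
    int (*update_super)(struct supertype *st, int verbose);
};

/* Hypothetical, simplified stand-in for a per-array handle. */
struct supertype {
    const struct superswitch *ss;
    int updates;    /* counts how often the hook actually ran */
};

/* Example hook for a handler that does implement update_super. */
static int example_update_super(struct supertype *st, int verbose)
{
    (void)verbose;
    st->updates++;
    return 1;       /* sketch convention: non-zero means "updated" */
}

static const struct superswitch with_hook = { "with-hook", example_update_super };
static const struct superswitch without_hook = { "without-hook", NULL };

/*
 * Call the optional hook only if the handler provides one.
 * Returns 1 if the hook ran and reported an update, otherwise 0.
 */
static int try_update_super(struct supertype *st, int verbose)
{
    assert(st != NULL && st->ss != NULL);
    if (st->ss->update_super && st->ss->update_super(st, verbose))
        return 1;
    return 0;
}
```

With this guard, a handler whose table leaves update_super as NULL is
simply skipped instead of being called through a NULL function
pointer.  The alternative mentioned above (an empty update_super in
the handler itself) would keep call sites unchanged, at the cost of a
stub in every handler that lacks the hook.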


* Re: Issue with moving LSI/Dell Raid to MD
  2024-03-16 18:26 Issue with moving LSI/Dell Raid to MD Shaya Potter
@ 2024-03-18 11:18 ` Mariusz Tkaczyk
  2024-03-18 11:42   ` Shaya Potter
  0 siblings, 1 reply; 3+ messages in thread
From: Mariusz Tkaczyk @ 2024-03-18 11:18 UTC
  To: Shaya Potter; +Cc: linux-raid

Hello Shaya,
DDF is not actively developed. I'm considering dropping it.
If you are interested in bringing it to life, you are more than
welcome to send patches!

If DDF doesn't implement update_super(), then the fix you proposed seems
to be valid. Please send a proper patch for it and we will review it.

Thanks,
Mariusz


* Re: Issue with moving LSI/Dell Raid to MD
  2024-03-18 11:18 ` Mariusz Tkaczyk
@ 2024-03-18 11:42   ` Shaya Potter
  0 siblings, 0 replies; 3+ messages in thread
From: Shaya Potter @ 2024-03-18 11:42 UTC
  To: Mariusz Tkaczyk; +Cc: linux-raid

On Mon, Mar 18, 2024 at 1:18 PM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
> Hello Shaya,
> DDF is not actively developed. I'm considering dropping
> it.
> If you are interested in bringing it too life then you are
> more than welcome to send patches!
>
> If DDF doesn't implement update_super() then fix proposed by you seems to be
> valid. Please send proper patch for that then we will review it.
>
> Thanks,
> Mariusz

I'll make a proper patch in the coming days.

Just to note: DDF support is very useful for recovering data from RAID
arrays that use that metadata.  It would be a shame (IMO) to lose
support for it, as that would have made my recovery/migration efforts
much more difficult.  At worst, I'd suggest marking it unmaintained
and requiring a specific flag to use it, with a note that, since it's
unmaintained, it might go down code paths that are untested and could
break in the future (i.e. what happened to me).

As a completely separate aside: md seems to perform much better on
loop devices when the loop devices are created with direct-io
support.
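
For anyone wanting to try the direct-io setup: the usual route is
probably losetup's --direct-io=on option when creating the device.
The same switch can also be flipped from C on an already configured
loop device through the LOOP_SET_DIRECT_IO ioctl from <linux/loop.h>.
The sketch below is only an illustration under those assumptions;
loop_enable_direct_io() is an invented helper name, it needs enough
privilege to open the loop device, and the kernel may refuse the
request if the backing file cannot do direct I/O.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel to use direct I/O for an existing loop device.
 * loopdev is a path such as "/dev/loop0" (caller-supplied).
 * Returns 0 on success, -1 on failure with errno set.
 */
static int loop_enable_direct_io(const char *loopdev)
{
    assert(loopdev != NULL);

    int fd = open(loopdev, O_RDWR);
    if (fd < 0)
        return -1;

    /* Third argument: non-zero enables direct I/O, zero disables it. */
    int rc = ioctl(fd, LOOP_SET_DIRECT_IO, 1UL);

    /* Preserve the ioctl's errno across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return rc < 0 ? -1 : 0;
}
```

A caller would invoke this once per loop device after attaching the
disk images and before assembling the array.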
