linux-lvm.redhat.com archive mirror
* [linux-lvm] Disk Died - Ideas?
@ 2001-09-25 15:02 Jeff Layton
  2001-09-25 16:56 ` lembark
  2001-09-25 17:09 ` Andreas Dilger
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Layton @ 2001-09-25 15:02 UTC
  To: linux-lvm

Hello,

   I searched the mailing list and the web but didn't come up with
too much to help me - so I thought I would ask the experts.
   I had 6 PVs across 6 disks (the first one was just part of the disk,
the other 5 spanned the entire drives). On top of this I had vg01
and in that I had lvol1. lvol1 was ext2 formatted and mounted as
/home. There is no striping across the disks. The first drive is
internal (part of the root disk) and the other five are external to
the machine.
   Well, of course, the last disk gave up the ghost (lots of SCSI
errors, machine will not boot without unplugging drives from
machine). I'm pretty sure you can guess what I'm going to ask :)
Can I just unplug the last drive, bring the system up, don't run
fsck on lvol1, mount lvol1, and try to pull as much data as I can
off what's left of the filesystem? If this works, then I can just
redo the PVs, the VGs, and the LVOLs and recreate the filesystem
and move over what data I can recover.
   Oh, by the way, this filesystem had no backups. The powers
that be claimed they were working on a backup solution for us,
but they didn't get one in place by the time this drive died. I know,
I know. I screamed very loudly, made lots of enemies internally,
but still no backups were ever done. It's a good thing they don't
allow firearms on the site :>

TIA,

Jeff Layton

Lockheed-Martin


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-25 15:02 [linux-lvm] Disk Died - Ideas? Jeff Layton
@ 2001-09-25 16:56 ` lembark
  2001-09-25 17:09 ` Andreas Dilger
  1 sibling, 0 replies; 9+ messages in thread
From: lembark @ 2001-09-25 16:56 UTC
  To: linux-lvm


-- Jeff Layton <jeffrey.b.layton@lmco.com> on 09/25/01 11:02:32 -0400

> Hello,
> 
>    I searched the mailing list and the web but didn't come up with
> too much to help me - so I thought I would ask the experts.
>    I had 6 PVs across 6 disks (the first one was just part of the disk,
> the other 5 spanned the entire drives). On top of this I had vg01
> and in that I had lvol1. lvol1 was ext2 formatted and mounted as
> /home. There is no striping across the disks. The first drive is
> internal (part of the root disk) and the other five are external to
> the machine.
>    Well, of course, the last disk gave up the ghost (lots of SCSI
> errors, machine will not boot without unplugging drives from
> machine). I'm pretty sure you can guess what I'm going to ask :)
> Can I just unplug the last drive, bring the system up, don't run
> fsck on lvol1, mount lvol1, and try to pull as much data as I can
> off what's left of the filesystem? If this works, then I can just
> redo the PVs, the VGs, and the LVOLs and recreate the filesystem
> and move over what data I can recover.
>    Oh, by the way, this filesystem had no backups. The powers
> that be claimed they were working on a backup solution for us,
> but they didn't get one in place by the time this drive died. I know,
> I know. I screamed very loudly, made lots of enemies internally,
> but still no backups were ever done. It's a good thing they don't
> allow firearms on the site :>


vgexport vgXX;
vgimport vgXX <list of disks that didn't fail>


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-25 15:02 [linux-lvm] Disk Died - Ideas? Jeff Layton
  2001-09-25 16:56 ` lembark
@ 2001-09-25 17:09 ` Andreas Dilger
  2001-09-27 11:39   ` Jeff Layton
  1 sibling, 1 reply; 9+ messages in thread
From: Andreas Dilger @ 2001-09-25 17:09 UTC
  To: linux-lvm

On Sep 25, 2001  11:02 -0400, Jeff Layton wrote:
>    Well, of course, the last disk gave up the ghost (lots of SCSI
> errors, machine will not boot without unplugging drives from
> machine). I'm pretty sure you can guess what I'm going to ask :)
> Can I just unplug the last drive, bring the system up, don't run
> fsck on lvol1, mount lvol1, and try to pull as much data as I can
> off what's left of the filesystem? If this works, then I can just
> redo the PVs, the VGs, and the LVOLs and recreate the filesystem
> and move over what data I can recover.

Two ways to do it.  I _think_ the latest release of EVMS will allow you
to have partial LVs like this.  Are you sure that the LV was using space
on the last PV?  If so it is less likely to work.

The other alternative is to take the output from "pvdata -avP <dev>" on
each remaining disk, and manually "dd" out the data from each disk.  If
the LV was mostly consecutive PEs, then it will be easy, otherwise a lot
of work (you may want to write a tool if so).

PE 0 data starts at (pe_on_disk.start + pe_on_disk.size), and is in chunks
of PE size.  The pvdata output will tell you which PE numbers belonged to
your LV, so let's say on the first PV this LV starts at PE 10, you want:

(pe_on_disk.start + pe_on_disk.size) + 10 * pe_size = byte offset of LV

This offset will probably be a multiple of at least 1024, and maybe of 4096
(a larger bs will make for a faster dd); divide the byte offset by 1024 to
get the skip value in kB.  Then, for the number of consecutive PEs on disk:

count=<number of consecutive IN ORDER PEs> * pe_size / 1024 = consecutive kB

dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #1 in kB> count=<count #1>

Do the same thing for the next set of consecutive PEs, with:

dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #2 in kB> \
	seek=<count #1> count=<count #2>

dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #3 in kB> \
	seek=<count #1 + count #2> count=<count #3>
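
A minimal sketch of the arithmetic above, not a tested recovery procedure.
The helper name pe_run_dd, the output name backup.img, and every number in
the example call (a 68 kB metadata area, 4 MB PEs, a run of 25 PEs starting
at PE 10) are assumptions for illustration; the real values have to come
from the pvdata output. It only prints the dd command, so the numbers can
be checked by hand before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the dd command for one run of
# consecutive, in-order PEs on a single PV.
#   $1 = pe_on_disk.start + pe_on_disk.size, in bytes
#   $2 = PE size in bytes
#   $3 = first PE of the run on this PV
#   $4 = number of consecutive PEs in the run
#   $5 = seek offset in kB (sum of the counts of all earlier runs)
pe_run_dd() {
    data_start=$1
    pe_size=$2
    first_pe=$3
    n_pes=$4
    seek_kb=$5
    # Byte offset of the run, converted to 1 kB blocks for bs=1024.
    skip_kb=$(( (data_start + first_pe * pe_size) / 1024 ))
    count_kb=$(( n_pes * pe_size / 1024 ))
    echo "dd if=/dev/pv1 of=backup.img bs=1024 skip=$skip_kb seek=$seek_kb count=$count_kb"
}

# Invented example values, not taken from any real pvdata output.
pe_run_dd 69632 4194304 10 25 0
```

With these made-up inputs the skip works out to 68 + 10 * 4096 = 41028 kB
and the count to 25 * 4096 = 102400 kB.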


>    Oh, by the way, this filesystem had no backups. The powers
> that be claimed they were working on a backup solution for us,
> but they didn't get one in place by the time this drive died. I know,
> I know. I screamed very loudly, made lots of enemies internally,
> but still no backups were ever done.

Never trust anyone w.r.t backups.  I got burned this way as well.  Also
make sure they have a RESTORE system, and not just a BACKUP system (i.e.
make sure they can get your data back from tape).

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-25 17:09 ` Andreas Dilger
@ 2001-09-27 11:39   ` Jeff Layton
  2001-09-27 14:40     ` Steven Lembark
  2001-09-27 15:42     ` Andreas Dilger
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Layton @ 2001-09-27 11:39 UTC
  To: linux-lvm

Andreas Dilger wrote:

> On Sep 25, 2001  11:02 -0400, Jeff Layton wrote:
> >    Well, of course, the last disk gave up the ghost (lots of SCSI
> > errors, machine will not boot without unplugging drives from
> > machine). I'm pretty sure you can guess what I'm going to ask :)
> > Can I just unplug the last drive, bring the system up, don't run
> > fsck on lvol1, mount lvol1, and try to pull as much data as I can
> > off what's left of the filesystem? If this works, then I can just
> > redo the PVs, the VGs, and the LVOLs and recreate the filesystem
> > and move over what data I can recover.
>
> Two ways to do it.  I _think_ the latest release of EVMS will allow you
> to have partial LVs like this.  Are you sure that the LV was using space
> on the last PV?  If so it is less likely to work.
>
> The other alternative is to take the output from "pvdata -avP <dev>" on
> each remaining disk, and manually "dd" out the data from each disk.  If
> the LV was mostly consecutive PEs, then it will be easy, otherwise a lot
> of work (you may want to write a tool if so).
>
> PE 0 data starts at (pe_on_disk.start + pe_on_disk.size), and is in chunks
> of PE size.  The pvdata output will tell you which PE numbers belonged to
> your LV, so let's say on the first PV this LV starts at PE 10, you want:
>
> (pe_on_disk.start + pe_on_disk.size) + 10 * pe_size = byte offset of LV
>
> This will probably be at least a multiple of 1024, but maybe 4096 (larger
> will make for faster dd).  Then, for the number of consecutive PEs on disk:
>
> count=<number of consecutive IN ORDER PEs> * pe_size / 1024 = consecutive kB
>
> dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #1 in kB> count=<count #1>
>
> Do the same thing for the next set of consecutive PEs, with:
>
> dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #2 in kB> \
>         seek=<count #1> count=<count #2>
>
> dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #3 in kB> \
>         seek=<count #1 + count #2> count=<count #3>

Andreas,

   Sorry for the late follow-up. I follow you so far (I think). The next step
after copying these "blocks" using dd to some backup device is to
rebuild the filesystem on the disks and then copy the data back
from the backup device using dd (just reverse the procedure).
Is this correct? (Sorry for the novice question, but I want to be sure I
have the correct steps before trying this).
   Thanks very much for your help!

Jeff


>
>
> >    Oh, by the way, this filesystem had no backups. The powers
> > that be claimed they were working on a backup solution for us,
> > but they didn't get one in place by the time this drive died. I know,
> > I know. I screamed very loudly, made lots of enemies internally,
> > but still no backups were ever done.
>
> Never trust anyone w.r.t backups.  I got burned this way as well.  Also
> make sure they have a RESTORE system, and not just a BACKUP system (i.e.
> make sure they can get your data back from tape).
>
> Cheers, Andreas
> --
> Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
>                  \  would they cancel out, leaving him still hungry?"
> http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
>


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-27 11:39   ` Jeff Layton
@ 2001-09-27 14:40     ` Steven Lembark
  2001-09-27 15:55       ` Andreas Dilger
  2001-09-27 15:42     ` Andreas Dilger
  1 sibling, 1 reply; 9+ messages in thread
From: Steven Lembark @ 2001-09-27 14:40 UTC
  To: linux-lvm


-- Jeff Layton <jeffrey.b.layton@lmco.com>

> Andreas Dilger wrote:
>
>> On Sep 25, 2001  11:02 -0400, Jeff Layton wrote:
>> >    Well, of course, the last disk gave up the ghost (lots of SCSI
>> > errors, machine will not boot without unplugging drives from
>> > machine). I'm pretty sure you can guess what I'm going to ask :)
>> > Can I just unplug the last drive, bring the system up, don't run
>> > fsck on lvol1, mount lvol1, and try to pull as much data as I can
>> > off what's left of the filesystem? If this works, then I can just
>> > redo the PVs, the VGs, and the LVOLs and recreate the filesystem
>> > and move over what data I can recover.

<broken record>

boot once.

vgexport /dev/vgwhatever;
vgimport /dev/vgwhatever <list of drives that didn't croak>

you will now have your VG back on line with whatever portion of the
data is on the clean drives.  any LV's spanning the dead drive are
likely to be lost anyway.  It'll take you less time to vgextend the
imported group onto a new, working drive and recover backups onto
new LV's than almost anything else you can try.

</broken record>


--
Steven Lembark                                               2930 W. Palmer
Workhorse Computing                                       Chicago, IL 60647
                                                            +1 800 762 1582


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-27 15:55       ` Andreas Dilger
@ 2001-09-27 15:32         ` Kevin Corry
  2001-09-27 16:09         ` Steven Lembark
  1 sibling, 0 replies; 9+ messages in thread
From: Kevin Corry @ 2001-09-27 15:32 UTC
  To: linux-lvm

> > <broken record>
> >
> > boot once.
> >
> > vgexport /dev/vgwhatever;
> > vgimport /dev/vgwhatever <list of drives that didn't croak>
> >
> > you will now have your VG back on line with whatever portion of the
> > data is on the clean drives.  any LV's spanning the dead drive are
> > likely to be lost anyway.  It'll take you less time to vgextend the
> > imported group onto a new, working drive and recover backups onto
> > new LV's than almost anything else you can try.
> >
> > </broken record>
>
> bzzzt.  This _may_ work on HPUX and AIX, but I _highly_ doubt it will
> work with Linux LVM.  The Linux LVM code requires that all of the disks
> be present, and that they all have the correct data (no metadata backups
> yet).  You could hack the vgscan code so that it doesn't require this,
> but it would probably end up causing grief somewhere else before you
> could actually read from the LV.

I'd agree with Andreas. I have tested this situation, and the
vgexport/vgimport method is not guaranteed to work. If you have run
vgscan at any point after a PV is lost, the VG will no longer be recognized,
and you can't run vgexport anymore. It just complains and tells you to run
vgscan again.

> AFAIK, not even HPUX or AIX would allow you to read from a partial LV
> (which is the situation we are discussing here), so it wouldn't help.
> What _would_ be very useful is a tool that reads the LVM metadata
> directly, creates a list of available LEs (in order) and dumps them
> to a file, writing zeros for LEs that are not available (and writing
> large warnings for each missing LE).

EVMS already does this. It is perfectly happy recognizing partial volume 
groups, and exports any complete volumes it finds in such a group. Any 
incomplete volume in the group (one that had data on the lost disk) will be 
exported read-only, so you can at least do a raw backup of whatever data is 
left, or use some sort of filesystem recovery tools if they are available for 
your fs.

-Kevin


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-27 11:39   ` Jeff Layton
  2001-09-27 14:40     ` Steven Lembark
@ 2001-09-27 15:42     ` Andreas Dilger
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Dilger @ 2001-09-27 15:42 UTC
  To: linux-lvm

On Sep 27, 2001  07:39 -0400, Jeff Layton wrote:
> Andreas Dilger wrote:
> > dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #1 in kB> count=<count #1>
> >
> > Do the same thing for the next set of consecutive PEs, with:
> >
> > dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #2 in kB> \
> >         seek=<count #1> count=<count #2>
> >
> > dd if=/dev/pv1 of=<backup> bs=1024 skip=<offset #3 in kB> \
> >         seek=<count #1 + count #2> count=<count #3>
> 
>    Sorry for the late follow-up. I follow you so far (I think). The next step
> after copying these "blocks" using dd to some backup device is to
> rebuild the filesystem on the disks and then copy the data back
> from the backup device using dd (just reverse the procedure).
> Is this correct? (Sorry for the novice question, but I want to be sure I
> have the correct steps before trying this).

If you do the above "dd" steps so that they save the PEs in LE order
(as given by pvdata -avP) then you simply need to "dd" the entire
backup file out to the LV once it is built (assuming you have enough
space to save the entire LV somewhere).  If you are fortunate, and
there is only a single contiguous string of PEs/LEs on each disk it
will not be too much work.  If needed, you could "dd" the data from
each disk to a separate tape/remote machine disk, but then you need
to make sure you get the "seek" offsets correct when you are restoring
it (it should just be the sum of all the block counts output from
previous "dd" runs).
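
A rough sketch of the seek bookkeeping described above, for the case where
each run was saved to its own file. The function name restore_cmds, the
part1.img, part2.img, ... file names, the /dev/vg01/lvol1 device path, and
the kB counts in the example call are all assumptions for illustration. It
prints the restore commands instead of running them; each seek value is the
sum of the counts of all earlier pieces.

```shell
#!/bin/sh
# Hypothetical helper: given the kB counts of the saved pieces in LE order,
# print one restore command per piece with the matching seek offset.
restore_cmds() {
    seek=0   # running total of kB already placed in the LV
    n=1      # piece number, used only to build the made-up file name
    for count in "$@"; do
        echo "dd if=part$n.img of=/dev/vg01/lvol1 bs=1024 seek=$seek"
        seek=$((seek + count))
        n=$((n + 1))
    done
}

# Invented counts (in kB) for three saved pieces.
restore_cmds 102400 20480 8192
```

For the invented counts above, the three pieces land at seek offsets 0,
102400 and 122880.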

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-27 14:40     ` Steven Lembark
@ 2001-09-27 15:55       ` Andreas Dilger
  2001-09-27 15:32         ` Kevin Corry
  2001-09-27 16:09         ` Steven Lembark
  0 siblings, 2 replies; 9+ messages in thread
From: Andreas Dilger @ 2001-09-27 15:55 UTC
  To: linux-lvm

On Sep 27, 2001  09:40 -0500, Steven Lembark wrote:
> <broken record>
> 
> boot once.
> 
> vgexport /dev/vgwhatever;
> vgimport /dev/vgwhatever <list of drives that didn't croak>
> 
> you will now have your VG back on line with whatever portion of the
> data is on the clean drives.  any LV's spanning the dead drive are
> likely to be lost anyway.  It'll take you less time to vgextend the
> imported group onto a new, working drive and recover backups onto
> new LV's than almost anything else you can try.
> 
> </broken record>

bzzzt.  This _may_ work on HPUX and AIX, but I _highly_ doubt it will
work with Linux LVM.  The Linux LVM code requires that all of the disks
be present, and that they all have the correct data (no metadata backups
yet).  You could hack the vgscan code so that it doesn't require this,
but it would probably end up causing grief somewhere else before you
could actually read from the LV.

AFAIK, not even HPUX or AIX would allow you to read from a partial LV
(which is the situation we are discussing here), so it wouldn't help.
What _would_ be very useful is a tool that reads the LVM metadata
directly, creates a list of available LEs (in order) and dumps them
to a file, writing zeros for LEs that are not available (and writing
large warnings for each missing LE).
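
A very rough sketch of what the assembly half of such a tool might look
like. It does not read LVM metadata at all: it assumes someone has already
turned the pvdata output into a map file by hand. The function name
assemble_lv and the map format (one line per LE, in LE order, holding
either "<source> <skip in kB>" or the word "missing") are invented for this
example. Source paths must not contain spaces.

```shell
#!/bin/sh
# assemble_lv MAP OUT PE_KB
#   MAP    one line per LE, in LE order:
#            "<source device or file> <skip in kB>"  for an available LE
#            "missing"                               for a lost LE
#   OUT    image file to build
#   PE_KB  PE size in kB
assemble_lv() {
    map=$1
    out=$2
    pe_kb=$3
    le=0
    : > "$out"   # start from an empty image
    while read -r src skip_kb; do
        if [ "$src" = "missing" ]; then
            echo "WARNING: LE $le is missing, writing zeros" >&2
            src=/dev/zero
            skip_kb=0
        fi
        # conv=notrunc keeps the parts of the image that are already written.
        # dd statistics are discarded; a failed copy is reported below.
        dd if="$src" of="$out" bs=1024 skip="$skip_kb" \
            seek=$((le * pe_kb)) count="$pe_kb" conv=notrunc 2>/dev/null ||
            echo "WARNING: copy failed for LE $le" >&2
        le=$((le + 1))
    done < "$map"
}

# Example call (all names hypothetical):
#   assemble_lv lvol1.map lvol1.img 4096
```

Writing to a plain image file rather than straight to a new LV leaves room
to inspect the result, or to run filesystem recovery tools on a copy,
before committing to anything.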

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: [linux-lvm] Disk Died - Ideas?
  2001-09-27 15:55       ` Andreas Dilger
  2001-09-27 15:32         ` Kevin Corry
@ 2001-09-27 16:09         ` Steven Lembark
  1 sibling, 0 replies; 9+ messages in thread
From: Steven Lembark @ 2001-09-27 16:09 UTC
  To: linux-lvm


-- Andreas Dilger <adilger@turbolabs.com>

> On Sep 27, 2001  09:40 -0500, Steven Lembark wrote:
>> <broken record>
>>
>> boot once.
>>
>> vgexport /dev/vgwhatever;
>> vgimport /dev/vgwhatever <list of drives that didn't croak>
>>
>> you will now have your VG back on line with whatever portion of the
>> data is on the clean drives.  any LV's spanning the dead drive are
>> likely to be lost anyway.  It'll take you less time to vgextend the
>> imported group onto a new, working drive and recover backups onto
>> new LV's than almost anything else you can try.
>>
>> </broken record>
>
> bzzzt.  This _may_ work on HPUX and AIX, but I _highly_ doubt it will
> work with Linux LVM.  The Linux LVM code requires that all of the disks
> be present, and that they all have the correct data (no metadata backups
> yet).  You could hack the vgscan code so that it doesn't require this,
> but it would probably end up causing grief somewhere else before you
> could actually read from the LV.
>
> AFAIK, not even HPUX or AIX would allow you to read from a partial LV
> (which is the situation we are discussing here), so it wouldn't help.
> What _would_ be very useful is a tool that reads the LVM metadata
> directly, creates a list of available LEs (in order) and dumps them
> to a file, writing zeros for LEs that are not available (and writing
> large warnings for each missing LE).

Hmmm...  this is what I've used repeatedly to get drives back when LVM
croaks on me and doesn't like the VG's.  None of the commercial LVM
products allow reading from partial LV's.  Point I made was that it's
usually simpler to give up, get the volumes back on line, vgextend onto
working PV's and restore from backup.  Main problem with anything that
reads partial LV's is that you can only recover raw data, which will
normally leave you with a badly scrambled file system (e.g., ext2)
rather than any kind of real "data" you can manage -- unless you're
into locating inodes and extracting the block information then trying
to jigsaw that out of the LV recovery data.

--
Steven Lembark                                               2930 W. Palmer
Workhorse Computing                                       Chicago, IL 60647
                                                            +1 800 762 1582


