* [linux-lvm] Failed PV recovery
@ 2006-07-24 2:19 Lamont R. Peterson
2006-07-24 12:19 ` Dieter Stüken
0 siblings, 1 reply; 3+ messages in thread
From: Lamont R. Peterson @ 2006-07-24 2:19 UTC
To: LVM
All,
Here's the setup: home file server has 3 drives, 4.3GB, 45GB, 120GB; all IDE.
The 4.3GB drive has a /boot/ partition and a small swap with the rest
allocated to an LVM partition which is the only member of the "system" VG.
The other two drives are single LVM partitions and comprise the "data" VG.
That's how it was configured for over a year.
A few months ago, I started seeing some unreadable sectors on the 45GB drive.
I purchased a 320GB SATA drive and a PCI controller (no SATA on this
motherboard) to replace the two drives (I'll get more SATA disks and convert
to LVM on RAID as I can afford them). Long story short, motherboard needed
BIOS flash and a little coaxing to recognize the PCI SATA controller, but
that's sorted out now.
I partitioned the 320GB drive as a single LVM PV and added it to the data VG.
I ran "pvmove /dev/hde1 /dev/sda1" (120GB -> 320GB), which took about 75
minutes (the 120GB drive was almost completely full) with no issues.
At that point, I *should* have run "vgreduce data /dev/hde1" so that I
wouldn't have the 120GB drive in the VG anymore, but I didn't. 20/20
hindsight.
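For reference, the order of operations described above can be sketched as a
small shell script. This is only a sketch under assumptions: the helper names
run and evacuate_pv are made up for illustration (they are not LVM commands),
the device names are the ones from this message, and by default the script
only prints the commands instead of running them. The point it illustrates is
removing each old PV from the VG with vgreduce right after its pvmove
finishes, before starting on the next disk.

```shell
#!/bin/sh
# Sketch only: prints commands by default. Set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper that echoes a command in dry-run mode,
# or executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# evacuate_pv VG OLD_PV NEW_PV (hypothetical helper)
# Move all extents off OLD_PV onto NEW_PV, then drop OLD_PV from the VG
# immediately, so a later failure cannot involve the emptied disk.
evacuate_pv() {
    vg=$1
    old=$2
    new=$3
    run pvmove "$old" "$new" || return 1
    run vgreduce "$vg" "$old"
}

# Prepare the new disk and add it to the VG (device names from this thread).
run pvcreate /dev/sda1
run vgextend data /dev/sda1

# Empty and remove one old PV at a time.
evacuate_pv data /dev/hde1 /dev/sda1
evacuate_pv data /dev/hdg1 /dev/sda1
```

With this ordering, the 120GB drive would already be out of the VG when the
second pvmove runs, and would be free to receive a copy of the failing disk.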
Next I ran "pvmove /dev/hdg1 /dev/sda1" (45GB -> 320GB). About 45% of the way
through, it crashes:
/dev/hdg1: Moved: 45.0%
/dev/hdg1: read failed after 0 of 1024 at 4096: Input/output error
/dev/hdg1: read failed after 0 of 2048 at 0: Input/output error
Failed to read existing physical volume '/dev/hdg1'
Physical volume /dev/hdg1 not found
ABORTING: Can't reread PV /dev/hdg1
ABORTING: Can't reread VG for /dev/hdg1
The system was still running, but the /dev/hdg disk no longer showed up. In
the past, I could power down for an hour or so (let the drive cool down) and
then it would show up again. It looked like the mounted LVs which are on
data were fine (I could read & write), so I powered off. Rebooting, I get
kernel panics. I can bring the box up in "emergency" mode or with a rescue
environment.
Prior to this, only one LV was unusable. I was able to read every bit of the
rest of them just fine (I have backups of everything important). The one bad
LV (due to unreadable sectors on the 45GB drive) was for /var/spool/up2date
when I was running RHEL3, which I have obviously replaced since RHEL3
wouldn't support SATA (I have SUSE Linux 10.1 on there now).
If I had already removed the 120GB drive from the VG, I would try dd_rescue
and copy the entire 45GB drive over to the 120GB one. I can't get vgreduce
to run correctly and pull it out of the VG. When I run pvscan, I get:
NOTE: I just booted up the box to get the output, and the 45GB disk was
working. It hasn't been for about a week now. I have successfully removed
the 120GB drive from the data VG. Man, I gotta love having a little bit of
luck! Wow. :D
I could just blow it all away and recreate the data VG from scratch, reloading
from backups (and pulling down things like .iso images, etc.). I would like
to figure out some techniques to try to recover this from here. As I make my
living teaching over 1,000 people/year (newbies and experts alike) to use
Linux, I'd like to be able to use this experience to teach others how to
recover if they find themselves up the "Creek Who Should Not Be Named".
1. How can I take an unused PV out of a VG with another PV that's broken?
2. Once I have a copy of the entire bad drive's contents, how do I alter the
VG (hand edit?) so that it is using the copy instead of the original.
3. What am I not asking/seeing?
4. Are there better ways I could have handled this (other than the obvious
like RAID to start with, etc.)?
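One possible approach to questions 1 and 2 is sketched below; it has not
been tested against this exact failure. It assumes the spare 120GB partition
(/dev/hde1) is already out of the VG and is at least as large as the failing
one, that dd_rescue is installed, and that the backup path under /root is
just a placeholder. The helper names run, rescue_copy and rescue_activate are
hypothetical. The idea is that a block-for-block copy also copies the LVM
label and PV UUID, so once the failing disk is physically disconnected, LVM
should find the copy by UUID without any hand editing of the metadata. The
original and the copy should not be visible at the same time, since both
carry the same UUID.

```shell
#!/bin/sh
# Sketch only: prints commands by default. Set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper that echoes a command in dry-run mode,
# or executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rescue_copy VG BAD_PV SPARE (hypothetical helper)
# Step 1: save the VG metadata, deactivate the VG, copy the failing PV.
rescue_copy() {
    vg=$1
    bad=$2
    spare=$3
    # Keep a text copy of the current metadata (placeholder path).
    run vgcfgbackup -f "/root/$vg-before-rescue.vg" "$vg" || return 1
    # Make sure nothing writes to the VG during the copy.
    run vgchange -an "$vg" || return 1
    # Copy block for block, continuing past unreadable sectors.
    run dd_rescue "$bad" "$spare"
}

# rescue_activate VG (hypothetical helper)
# Step 2: run only AFTER powering off and disconnecting the failing disk.
rescue_activate() {
    vg=$1
    run vgscan || return 1
    run vgchange -ay "$vg"
}

rescue_copy data /dev/hdg1 /dev/hde1
rescue_activate data
```

For question 1, "vgreduce --removemissing" is the LVM2 option meant for
dropping PVs that can no longer be found; whether it is safe to use depends
on which LVs still have extents on the missing PV, so taking the metadata
backup first seems prudent.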
--
Lamont R. Peterson <peregrine@OpenBrainstem.net>
Founder [ http://blog.OpenBrainstem.net/peregrine/ ]
GPG Key fingerprint: 0E35 93C5 4249 49F0 EC7B 4DDD BE46 4732 6460 CCB5
___ ____ _ _
/ _ \ _ __ ___ _ __ | __ ) _ __ __ _(_)_ __ ___| |_ ___ _ __ ___
| | | | '_ \ / _ \ '_ \| _ \| '__/ _` | | '_ \/ __| __/ _ \ '_ ` _ \
| |_| | |_) | __/ | | | |_) | | | (_| | | | | \__ \ || __/ | | | | |
\___/| .__/ \___|_| |_|____/|_| \__,_|_|_| |_|___/\__\___|_| |_| |_|
|_| Intelligent Open Source Software Engineering
[ http://www.OpenBrainstem.net/ ]
* Re: [linux-lvm] Failed PV recovery
2006-07-24 2:19 [linux-lvm] Failed PV recovery Lamont R. Peterson
@ 2006-07-24 12:19 ` Dieter Stüken
2006-07-24 17:59 ` Lamont R. Peterson
0 siblings, 1 reply; 3+ messages in thread
From: Dieter Stüken @ 2006-07-24 12:19 UTC
To: LVM general discussion and development
Lamont R. Peterson wrote:
> 4. Are there better ways I could have handled this (other than the obvious
> like RAID to start with, etc.)?
Hi Lamont,
I strongly suggest using http://smartmontools.sourceforge.net/
to monitor all disks. In most cases you get warnings about
problems with your disks weeks before they fail completely.
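As a minimal sketch of what such monitoring can look like: "smartctl -A"
(from smartmontools) prints the drive's SMART attribute table, and the filter
below flags a few attributes commonly associated with failing sectors when
their raw value is nonzero. The function name check_smart_attrs and the
choice of attributes are assumptions made for illustration; the sketch also
assumes the usual ten-column attribute table with the attribute name in
column 2 and the raw value in column 10. Attribute names vary between drive
vendors.

```shell
#!/bin/sh
# check_smart_attrs (hypothetical helper): reads "smartctl -A" output on
# stdin and prints a warning for each watched attribute whose raw value
# (column 10) is greater than zero. Exits 1 if any warning was printed.
check_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) {
                print "WARNING: " $2 " raw value " $10
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example use (needs root and smartmontools; device name from this thread):
#   smartctl -H /dev/hdg                      # overall health self-assessment
#   smartctl -A /dev/hdg | check_smart_attrs  # flag suspicious attributes
```

For unattended use, the smartd daemon shipped with smartmontools can run
such checks periodically and send mail when something changes.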
> A few months ago, I started seeing some unreadable sectors on the 45GB drive.
... but you have to take it seriously :-)
Dieter.
* Re: [linux-lvm] Failed PV recovery
2006-07-24 12:19 ` Dieter Stüken
@ 2006-07-24 17:59 ` Lamont R. Peterson
0 siblings, 0 replies; 3+ messages in thread
From: Lamont R. Peterson @ 2006-07-24 17:59 UTC
To: LVM
On Monday 24 July 2006 06:19am, Dieter Stüken wrote:
> Lamont R. Peterson wrote:
> > 4. Are there better ways I could have handled this (other than the
> > obvious like RAID to start with, etc.)?
>
> Hi Lamont,
>
> I strongly suggest using http://smartmontools.sourceforge.net/
> to monitor all disks. In most cases you get warnings about
> problems with your disks weeks before they fail completely.
Thanks :), I'll check it out. Since my home file server is now growing,
that's just the kind of monitoring I want for the future.
> > A few months ago, I started seeing some unreadable sectors on the 45GB
> > drive.
>
> ... but you have to take it seriously :-)
Absolutely; and I did take the indications seriously. Unfortunately, the
accounting department (a.k.a. my wife & my bank account) couldn't help me out
fast enough.
Of course, the good news is that she now knows first hand (well 1.5 hand, I
guess) what it's like to have a disk go bad with data on it. That should
help me get parts faster in the future :) .
--
Lamont R. Peterson <peregrine@OpenBrainstem.net>
Founder [ http://blog.OpenBrainstem.net/peregrine/ ]
GPG Key fingerprint: 0E35 93C5 4249 49F0 EC7B 4DDD BE46 4732 6460 CCB5