* [linux-lvm] Pvmove Cannot Be Aborted
@ 2007-07-04 14:05 Jim Schatzman
2007-07-05 16:41 ` Stuart D. Gathman
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Jim Schatzman @ 2007-07-04 14:05 UTC (permalink / raw)
To: linux-lvm
All-
I made the mistake of trying to use pvmove to move any good data from a bad disk to a new, identical, good disk in an LV. Unfortunately, the pvmove failed mid-operation. It cannot now be aborted, presumably because of the bad disk.
Furthermore, when I set up /dev/ioerror with dmsetup and try to activate the LV with -Pay, I get an LV that is unusable ("d" type).
So... if I activate/mount the LV normally, the mount works but I get IO errors and eventually the drive turns itself off. I cannot mount the LV without the missing drive (type "d", which I am guessing happens due to the pending pvmove). I cannot abort the pvmove because of the bad drive.
So... I have learned my lesson - never use pvmove on a bad drive. However, now that I have done it, how can I extract the data from the remaining JBOD disks in the LV?
Thanks!
Jim
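
[Editorial sketch, not part of the original message.] The /dev/ioerror approach mentioned above can be illustrated roughly as follows. This is a minimal sketch, assuming a device-mapper device whose single table line maps every sector to the "error" target, followed by a partial activation. The device name "ioerror", the VG name "vg0", and the sector count are placeholders, and the helper names are hypothetical. The script only prints the commands; it runs nothing.

```python
import shlex
from typing import List


def error_table(num_sectors: int) -> str:
    """Return a one-line device-mapper table that fails all I/O.

    Table format is "<start> <length> <target>"; the "error" target
    takes no further arguments.
    """
    if num_sectors <= 0:
        raise ValueError("num_sectors must be positive")
    return f"0 {num_sectors} error"


def recovery_commands(name: str, num_sectors: int, vg: str) -> List[str]:
    """Build (but do not run) the commands for a partial activation.

    Assumption: `name` is what the stand-in device should be called and
    `vg` is the volume group to activate; both are placeholders here.
    """
    table = error_table(num_sectors)
    return [
        f"dmsetup create {shlex.quote(name)} --table {shlex.quote(table)}",
        f"vgchange -ay --partial {shlex.quote(vg)}",
    ]


if __name__ == "__main__":
    # The sector count would normally come from `blockdev --getsz` on a
    # disk of the same size as the missing one; 1000 is a dummy value.
    for cmd in recovery_commands("ioerror", 1000, "vg0"):
        print(cmd)
```

Whether a partially activated LV is usable while a pvmove is still pending is exactly the open question in this thread, so the sketch shows the mechanism only, not a confirmed fix.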
* Re: [linux-lvm] Pvmove Cannot Be Aborted
2007-07-04 14:05 [linux-lvm] Pvmove Cannot Be Aborted Jim Schatzman
@ 2007-07-05 16:41 ` Stuart D. Gathman
2007-07-05 16:47 ` [linux-lvm] Debugging partial drive failure Stuart D. Gathman
2007-07-05 16:53 ` [linux-lvm] Pvmove Cannot Be Aborted Richard van den Berg
2007-07-06 4:55 ` [linux-lvm] lv with cmirror Michael Eisenkölbl / FIS
2 siblings, 1 reply; 8+ messages in thread
From: Stuart D. Gathman @ 2007-07-05 16:41 UTC (permalink / raw)
To: LVM general discussion and development
On Wed, 4 Jul 2007, Jim Schatzman wrote:
> I made the mistake of trying to use pvmove to move any good data from a bad
> disk to a new identical good disk in an LV. Unfortunately, the Pvmove failed
> in midoperation. It cannot now be aborted, presumably because of the bad
> disk.
I've noticed that LVM has big problems handling partially failed drives.
I think this is largely due to the difficulty of testing. It is simple
enough to simulate completely failed drives. I've tested this by disconnecting
the power from a drive while the system is running (I'm sure there are
safer ways.) However, a partially failed drive (lots of bad sectors) is
another matter.
I wish there was a SMART command to a drive that would tell it to pretend a
range of sectors is bad until further notice (without actually remapping said
sectors). This would be a great help in debugging the error handling of things
like LVM. In fact, maybe there already is such a feature in SMART, and
it just isn't widely known.
--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
* [linux-lvm] Debugging partial drive failure
2007-07-05 16:41 ` Stuart D. Gathman
@ 2007-07-05 16:47 ` Stuart D. Gathman
2007-07-07 11:05 ` Nix
0 siblings, 1 reply; 8+ messages in thread
From: Stuart D. Gathman @ 2007-07-05 16:47 UTC (permalink / raw)
To: LVM general discussion and development
On Thu, 5 Jul 2007, Stuart D. Gathman wrote:
> On Wed, 4 Jul 2007, Jim Schatzman wrote:
>
> > I made the mistake of trying to use pvmove to move any good data from a bad
> > disk to a new identical good disk in an LV. Unfortunately, the Pvmove failed
> > in midoperation. It cannot now be aborted, presumably because of the bad
> > disk.
>
> I wish there was a SMART command to a drive that would tell it to pretend a
> range of sectors is bad until further notice (without actually remapping said
> sectors). This would be a great help in debugging the error handling of
> things like LVM. In fact, maybe there already is such a feature in SMART,
> and it just isn't widely known.
This could be added to the linux disk driver. There could be an
option for a disk module that would pretend that a range of sectors
got an I/O error from the drive. This would allow debugging the LVM
layer.
E.g. modprobe.conf:
options mptbase debug_bad=450000-500000
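
[Editorial sketch, not part of the original message.] The debug_bad option above is the poster's proposal, not an existing module parameter. A similar effect can be approximated today with a different technique: a device-mapper table that passes most of a disk through the "linear" target and maps one range to the "error" target. The sketch below only generates such a table. The device path, the total size, and reading 450000-500000 as an inclusive sector range are assumptions for illustration.

```python
from typing import List


def bad_range_table(device: str, total_sectors: int,
                    first_bad: int, last_bad: int) -> List[str]:
    """Return device-mapper table lines that simulate a bad sector range.

    Sectors first_bad..last_bad (inclusive) map to the "error" target;
    everything else is passed through to `device` via "linear".
    Line format: "<start> <length> <target> [<args>]".
    """
    if not (0 <= first_bad <= last_bad < total_sectors):
        raise ValueError("bad range must lie inside the device")

    lines = []
    if first_bad > 0:
        # Healthy region before the bad range.
        lines.append(f"0 {first_bad} linear {device} 0")

    bad_len = last_bad - first_bad + 1
    lines.append(f"{first_bad} {bad_len} error")

    after = last_bad + 1
    if after < total_sectors:
        # Healthy region after the bad range, same offset on the device.
        lines.append(f"{after} {total_sectors - after} linear {device} {after}")
    return lines


if __name__ == "__main__":
    # /dev/sdX and the size of 1000000 sectors are placeholders.
    print("\n".join(bad_range_table("/dev/sdX", 1000000, 450000, 500000)))
```

The printed lines could be piped into `dmsetup create <name>`, which reads a multi-line table from standard input, and the resulting mapped device used as a test PV. This does not touch real hardware error paths in the disk driver, so it exercises LVM's handling of I/O errors but not the driver-level behaviour the poster had in mind.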
--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
* Re: [linux-lvm] Debugging partial drive failure
2007-07-05 16:47 ` [linux-lvm] Debugging partial drive failure Stuart D. Gathman
@ 2007-07-07 11:05 ` Nix
0 siblings, 0 replies; 8+ messages in thread
From: Nix @ 2007-07-07 11:05 UTC (permalink / raw)
To: LVM general discussion and development
On 5 Jul 2007, Stuart D. Gathman uttered the following:
> This could be added to the linux disk driver. There could be an
> option for a disk module that would pretend that a range of sectors
> got an I/O error from the drive. This would allow debugging the LVM
> layer.
I think something in /sys or /debug would be more practical. debugfs
seems a better fit.
You really want to be able to change this at runtime without rmmodding
or rebooting.
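
[Editorial sketch, not part of the original message.] For runtime control through debugfs, the kernel's fault-injection framework offers something close: with CONFIG_FAIL_MAKE_REQUEST enabled, block I/O requests to a chosen disk can be made to fail with a given probability. It does not target a sector range, so it is only a partial match for the idea discussed here. The sketch below assumes debugfs is mounted at /sys/kernel/debug and uses "sdb" as a placeholder disk name; the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Assumption: debugfs is mounted at the conventional location.
DEBUGFS = Path("/sys/kernel/debug")


def fault_settings(disk: str, probability: int, times: int) -> Dict[Path, str]:
    """Return the files to write, and their values, to inject I/O failures.

    probability: chance in percent that an eligible request fails.
    times: maximum number of injected failures, -1 for no limit.
    """
    if not 0 <= probability <= 100:
        raise ValueError("probability is a percentage (0-100)")
    base = DEBUGFS / "fail_make_request"
    return {
        base / "probability": str(probability),
        base / "interval": "1",       # every request is a candidate
        base / "times": str(times),
        # Per-disk switch that opts this device in to fault injection.
        Path("/sys/block") / disk / "make-it-fail": "1",
    }


def apply_settings(settings: Dict[Path, str]) -> None:
    """Write the settings; needs root and a kernel built with the option."""
    for path, value in settings.items():
        path.write_text(value)


if __name__ == "__main__":
    for path, value in fault_settings("sdb", 10, -1).items():
        print(f"echo {value} > {path}")
```

Because everything is a file write, the failure rate can be changed or switched off while the system is running, which is the property asked for above.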
--
`... in the sense that dragons logically follow evolution so they would
be able to wield metal.' --- Kenneth Eng's colourless green ideas sleep
furiously
* Re: [linux-lvm] Pvmove Cannot Be Aborted
2007-07-04 14:05 [linux-lvm] Pvmove Cannot Be Aborted Jim Schatzman
2007-07-05 16:41 ` Stuart D. Gathman
@ 2007-07-05 16:53 ` Richard van den Berg
2007-07-06 0:12 ` Jim Schatzman
2007-07-06 4:55 ` [linux-lvm] lv with cmirror Michael Eisenkölbl / FIS
2 siblings, 1 reply; 8+ messages in thread
From: Richard van den Berg @ 2007-07-05 16:53 UTC (permalink / raw)
To: LVM general discussion and development
Jim Schatzman wrote:
> So... I have learned my lesson - never use pvmove on a bad drive.
Your VG and LVs are not in worse shape now (after the stuck pvmove) than
they would have been if you did not attempt the pvmove, right? I guess
you did not use RAID on top of or under LVM?
Sincerely,
Richard van den Berg
* Re: [linux-lvm] Pvmove Cannot Be Aborted
2007-07-05 16:53 ` [linux-lvm] Pvmove Cannot Be Aborted Richard van den Berg
@ 2007-07-06 0:12 ` Jim Schatzman
0 siblings, 0 replies; 8+ messages in thread
From: Jim Schatzman @ 2007-07-06 0:12 UTC (permalink / raw)
To: LVM general discussion and development
At 10:53 AM 7/5/2007, you wrote:
>Jim Schatzman wrote:
>> So... I have learned my lesson - never use pvmove on a bad drive.
>
>Your VG and LVs are not in worse shape now (after the stuck pvmove) than
>they would have been if you did not attempt the pvmove, right? I guess
>you did not use RAID on top of or under LVM?
>
>Sincerely,
>
>Richard van den Berg
Thanks for your question.
The problem is more complicated than I explained. The drive (a late-model SATA device) apparently tries to relocate bad sectors automatically. The relocation invariably fails. After a while, the system log fills with thousands of error messages and Linux eventually disables the drive. The drive then somehow disables itself (I have not figured out how), so that after a reboot Linux reports "soft reset failed" and disables the drive again.
When the drive is offline, Linux LVM refuses to activate the LVs on the VG (obviously) unless I use dmsetup to error out the bad drive. If I do that, the LV ends up in a "d" state ("device present without tables"). I am not sure what "tables" are meant. I am certain that there are many EXT2 superblocks on the good drives, including the first one, so that isn't it. If I had to guess, I would think that what is meant is the LVM tables.
I can power cycle the drive and computer about 20 times to get the drive to come back online. (It would help if I could get SATA hot-swap to work so that I wouldn't have to reboot the computer, but no joy, even though the controller supposedly supports hot swap. Maybe "hot swap" is different from "hot initial plug-in"?) Then I can copy off a bit of data until the drive goes haywire again and Linux disables it. After a few days, I was able to retrieve all the good data I was going to get.
What would have saved me a lot of time is being able to mount the LV filesystems without the bad drive via the dmsetup mechanism. That did not work. My guess is that this is due to my executing "pvmove" in a vain attempt to replace the bad drive without having to rebuild the VG from scratch. Once the pvmove failed, I was hosed, apparently. No going forward and no going back.
Jim Schatzman
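
[Editorial sketch, not part of the original message.] Copying data off in bursts between drive dropouts, as described above, can be made less painful with a block-by-block copy that tolerates read errors and can be resumed. The following is a minimal sketch of that general technique, not the procedure the poster used. The function name, block size, and give-up threshold are illustrative assumptions.

```python
import os
from typing import List, Tuple


def rescue_copy(src_path: str, dst_path: str, start: int = 0,
                block_size: int = 65536,
                give_up_after: int = 8) -> Tuple[int, List[int]]:
    """Copy src to dst from byte offset `start`, surviving read errors.

    Unreadable blocks are zero-filled in dst and their offsets recorded.
    After `give_up_after` consecutive failures the copy stops, on the
    assumption that the drive has dropped off the bus.
    Returns (offset reached, list of bad block offsets).
    """
    bad_offsets: List[int] = []
    failures_in_a_row = 0
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # Seeking to the end works for block devices as well as files.
        size = os.lseek(src, 0, os.SEEK_END)
        offset = start
        while offset < size and failures_in_a_row < give_up_after:
            length = min(block_size, size - offset)
            try:
                data = os.pread(src, length, offset)
                failures_in_a_row = 0
            except OSError:
                bad_offsets.append(offset)
                failures_in_a_row += 1
                data = b"\0" * length   # placeholder for the lost block
            if not data:
                break                   # unexpected end of input
            offset += os.pwrite(dst, data, offset)
        return offset, bad_offsets
    finally:
        os.close(src)
        os.close(dst)
```

After the drive comes back, the copy can be restarted with `start` set to the returned offset, and the recorded bad offsets retried separately with a smaller block size. Dedicated tools such as GNU ddrescue implement the same idea far more thoroughly.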
* [linux-lvm] lv with cmirror
2007-07-04 14:05 [linux-lvm] Pvmove Cannot Be Aborted Jim Schatzman
2007-07-05 16:41 ` Stuart D. Gathman
2007-07-05 16:53 ` [linux-lvm] Pvmove Cannot Be Aborted Richard van den Berg
@ 2007-07-06 4:55 ` Michael Eisenkölbl / FIS
2007-07-10 14:51 ` Jonathan Brassow
2 siblings, 1 reply; 8+ messages in thread
From: Michael Eisenkölbl / FIS @ 2007-07-06 4:55 UTC (permalink / raw)
To: linux-lvm
Hi,
I set up a clustered VG with a mirrored LV (RHEL 4.5 and cmirror),
using three iSCSI storage devices (two channels each) and multipathd, to
be safe.
If I disconnect one iSCSI storage device, the LV stops working.
The lvs or lvdisplay command hangs. When I reconnect the storage, the
command finishes and everything works again.
Why doesn't it remove the faulty PV? Any ideas?
kind regards,
michael