linux-lvm.redhat.com archive mirror
* [linux-lvm] pvmove obliterates filesystem (Opensuse 10.2, x86-64)
From: Brian Strand @ 2007-10-16 23:27 UTC (permalink / raw)
  To: linux-lvm

(Apologies in advance if this is the wrong place for this.)  Yesterday I
ran a pvmove of a mounted filesystem, but something went wrong and the
filesystem was very badly damaged.  The box is a dual quad-core machine
with 16 GB of RAM running openSUSE 10.2 x86-64; it is under heavy load
24x7 (typical load average 15-20).  The storage is connected to a SAN via
a QLogic 2462 dual-port FC HBA, using the qla2400 driver (no dm-multipath).
Note:  I had just completed a successful pvmove of another LV about 30
minutes prior to this incident.


# pvmove --version
  LVM version:     2.02.13 (2006-10-27)
  Library version: 1.02.12 (2006-10-13)
  Driver version:  4.7.0

# uname -a
Linux somebox 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006
x86_64 x86_64 x86_64 GNU/Linux


Here is the output from attempting to pvmove a 100 GB LV:

# (time pvmove --verbose -n archlogs /dev/sdc @sata) \
    >> pvmove-archlogs.log-20071009 2>&1 </dev/null &

# cat pvmove-archlogs.log-20071009
    Wiping cache of LVM-capable devices
    Finding volume group "switch"
    Archiving volume group "switch" metadata (seqno 248).
    Creating logical volume pvmove0
    Moving 800 extents of logical volume switch/archlogs
    Found volume group "switch"
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/switch" (seqno 249).
    Found volume group "switch"
    Found volume group "switch"
    Suspending switch-archlogs (253:13)
    Found volume group "switch"
    Found volume group "switch"
    Creating switch-pvmove0
  device-mapper: create ioctl failed: Device or resource busy
    Loading switch-archlogs table
  device-mapper: reload ioctl failed: Invalid argument
    Checking progress every 15 seconds
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sda2) called while suspended
  WARNING: dev_open(/dev/sdb) called while suspended
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sda2) called while suspended
  WARNING: dev_open(/dev/sdb) called while suspended
  WARNING: dev_open(/dev/sda2) called while suspended
  WARNING: dev_open(/dev/sdb) called while suspended
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sdb) called while suspended
  WARNING: dev_open(/dev/sda2) called while suspended
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sdc) called while suspended
  WARNING: dev_open(/dev/sda2) called while suspended
  WARNING: dev_open(/dev/sdb) called while suspended
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/switch" (seqno 250).
    Found volume group "switch"
    Found volume group "switch"
    Found volume group "switch"
    Found volume group "switch"
    Suspending switch-pvmove0 (253:14)
    Found volume group "switch"
    Creating switch-pvmove0
  device-mapper: create ioctl failed: Device or resource busy
  Unable to reactivate logical volume "pvmove0"
    Found volume group "switch"
    Creating switch-pvmove0
  device-mapper: create ioctl failed: Device or resource busy
    Loading switch-archlogs table
  device-mapper: reload ioctl failed: Invalid argument
  ABORTING: Segment progression failed.
    Found volume group "switch"
    Found volume group "switch"
    Found volume group "switch"
    Found volume group "switch"
    Found volume group "switch"
    Creating switch-pvmove0
  device-mapper: create ioctl failed: Device or resource busy
  Unable to reactivate logical volume "pvmove0"
    Found volume group "switch"
    Loading switch-archlogs table
    Resuming switch-archlogs (253:13)
    Found volume group "switch"
    Removing switch-pvmove0 (253:14)
    Found volume group "switch"
    Removing temporary pvmove LV
    Writing out final volume group after pvmove
    Creating volume group backup "/etc/lvm/backup/switch" (seqno 252).
  /dev/sdc: Moved: 60.0%

real    0m21.789s
user    0m0.108s
sys     0m0.052s


Kernel messages from /var/log/messages:

Oct  9 22:33:21 somebox kernel: device-mapper: table: 253:13: linear: dm-linear: Device lookup failed
Oct  9 22:33:21 somebox kernel: device-mapper: ioctl: error adding target to table
Oct  9 22:33:21 somebox kernel: klogd 1.4.1, ---------- state change ----------
Oct  9 22:33:36 somebox kernel: device-mapper: table: 253:13: linear: dm-linear: Device lookup failed
Oct  9 22:33:36 somebox kernel: device-mapper: ioctl: error adding target to table
Oct  9 22:40:01 somebox kernel: ReiserFS: dm-13: warning: vs-4080: reiserfs_free_block: free_block (dm-13:13061735)[dev:blocknr]: bit already cleared
Oct  9 22:40:01 somebox kernel: ReiserFS: dm-13: warning: vs-4080: reiserfs_free_block: free_block (dm-13:13061734)[dev:blocknr]: bit already cleared

...and many thousands more complaints from ReiserFS.  Given the error
messages (especially "/dev/sdc: Moved: 60.0%") and the speed with which
the destruction occurred, my working hypothesis is that the first 60% of
the LV was repointed to the destination PV, but the data itself was left
behind.
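The speed argument can be made concrete with the figures in the log.  The
sketch below takes the reported 60% at face value (a reply below notes the
message may be wrong) and assumes the LV is exactly 100 GiB, which gives
128 MiB extents.  The copy would then have had to run at around 2,800 MiB/s,
several times what a single 4 Gb/s Fibre Channel port carries (roughly
400 MB/s; the link speed is an assumption).  Variable names are illustrative.

```python
# Figures taken from the pvmove log; LV_MIB is an assumption ("a 100 GB LV"
# taken to mean exactly 100 GiB).
LV_MIB = 100 * 1024
EXTENTS = 800            # "Moving 800 extents of logical volume switch/archlogs"
PERCENT_REPORTED = 60    # "/dev/sdc: Moved: 60.0%" (possibly inaccurate)
ELAPSED_S = 21.789       # "real 0m21.789s", which also covers setup and teardown

extent_mib = LV_MIB // EXTENTS                       # 128 MiB per extent
moved_extents = EXTENTS * PERCENT_REPORTED // 100    # 480 extents
moved_mib = moved_extents * extent_mib               # 61440 MiB
# Lower bound on the copy rate, since ELAPSED_S includes non-copy work.
implied_mib_per_s = moved_mib / ELAPSED_S            # roughly 2820 MiB/s

# Assumed ceiling: a single 4 Gb/s FC port carries roughly 400 MB/s.
FC_PORT_MIB_PER_S = 400 * 10**6 / 2**20              # about 381 MiB/s

print(f"{moved_extents} extents = {moved_mib} MiB in {ELAPSED_S} s "
      f"-> {implied_mib_per_s:.0f} MiB/s "
      f"({implied_mib_per_s / FC_PORT_MIB_PER_S:.1f}x one FC port)")
```

If the numbers hold, real copying at that rate is implausible, which is
consistent with the extents being remapped without their data.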

Are there any known issues with pvmove?  Is pvmove a supported
operation?  I had many pvmove-induced kernel oopses under SUSE 9.3, but
until this incident it had worked fine under openSUSE 10.2 for at
least 10 pvmoves on various boxes, all under load.

Thanks,
Brian


* Re: [linux-lvm] pvmove obliterates filesystem (Opensuse 10.2, x86-64)
From: Alasdair G Kergon @ 2007-10-17  1:28 UTC (permalink / raw)
  To: Brian Strand; +Cc: linux-lvm

On Tue, Oct 16, 2007 at 11:27:42PM +0000, Brian Strand wrote:
> 2462 dual-port FC HBA, using qla2400 (no dm-multipath).  Note:  I had
> just completed a successful pvmove of another lv about 30 minutes prior
> to this incident.
 
>   LVM version:     2.02.13 (2006-10-27)
>   Library version: 1.02.12 (2006-10-13)
>   Driver version:  4.7.0
> Linux somebox 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006
> x86_64 x86_64 x86_64 GNU/Linux
 
(Need to check that those versions are all compatible and that the kernel
isn't missing relevant patches.)

>     Creating switch-pvmove0
>   device-mapper: create ioctl failed: Device or resource busy

That should *not* happen.
Are you sure the preceding pvmove completed correctly?
Is some version of udev enabled on dm devices that might be interfering?

>   device-mapper: reload ioctl failed: Invalid argument

>     Creating volume group backup "/etc/lvm/backup/switch" (seqno 250).

Need to check through the sequence of backups to see all the metadata
changes it actually made (probably need the ones in the on-disk metadata
area rather than just the /etc/lvm/backup ones).

>   ABORTING: Segment progression failed.

>   /dev/sdc: Moved: 60.0%

The message could be incorrect; need to check.

> Oct  9 22:33:21 somebox kernel: device-mapper: table: 253:13: linear:
> dm-linear: Device lookup failed

So it couldn't use that device (a common cause is a size error when the
wrong device is used, e.g. with software RAID or partially-cloned devices).
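One way to test the size-error theory is to compare each linear segment of a
table against the size of the device it points at.  A minimal sketch,
assuming table lines in the `dmsetup table` form
`<start> <length> linear <major:minor> <offset>` (all in 512-byte sectors)
and device sizes gathered separately, e.g. with `blockdev --getsz`.
`check_linear_table` is a hypothetical helper, not part of LVM2, and it does
not reproduce every check the kernel makes.

```python
from typing import Dict, List


def check_linear_table(table: str, dev_sectors: Dict[str, int]) -> List[str]:
    """Flag linear segments that reach past the end of their device.

    `table` holds lines in `dmsetup table` form, optionally prefixed with
    "name:"; `dev_sectors` maps "major:minor" to the device size in
    512-byte sectors (assumed to come from e.g. `blockdev --getsz`).
    """
    problems = []
    for raw in table.splitlines():
        fields = raw.split()
        if fields and fields[0].endswith(":"):
            fields = fields[1:]  # drop the "name:" prefix of multi-device output
        if len(fields) != 5 or fields[2] != "linear":
            continue  # blank line or a non-linear target (e.g. mirror)
        start, length, _, dev, offset = fields
        end = int(offset) + int(length)  # one past the last sector used
        size = dev_sectors.get(dev)
        if size is None:
            problems.append(f"{dev}: unknown device (segment at sector {start})")
        elif end > size:
            problems.append(
                f"{dev}: segment at sector {start} needs {end} sectors, "
                f"device has {size}")
    return problems


if __name__ == "__main__":
    # Made-up table and sizes, for illustration only.
    example = "0 1000 linear 8:32 384\n1000 500 linear 8:16 0\n"
    for problem in check_linear_table(example, {"8:32": 2000, "8:16": 400}):
        print(problem)
```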

Alasdair
-- 
agk@redhat.com


* Re: [linux-lvm] pvmove obliterates filesystem (Opensuse 10.2, x86-64)
From: Morten Torstensen @ 2007-10-17 22:11 UTC (permalink / raw)
  To: LVM general discussion and development

 > [snip pvmove problems]

This is why, IMO, LVM on Linux needs to virtualize PEs: an LV would map
only to LEs (logical extents), which could themselves be mapped to one
or more PEs. Then you can pvmove by adding maps and removing the old
maps after verifying that the new mapping is OK.

This is basically how LVM is implemented in AIX. Is this too big an NIH
problem for LVM on Linux? I am amazed that LVM on Linux doesn't steal
more ideas from LVM on AIX, as that is without doubt the best LVM
implementation out there. Big words, yes... if anyone can argue why I
am wrong, I am happy to hear it :) (and will gladly accept defeat if
that is required)
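The proposal can be illustrated with a toy model: each logical extent maps to
a list of physical-extent copies, and a move adds the new copy, verifies it,
and only then drops the old one, so a failed copy leaves the original mapping
untouched.  This is a hypothetical Python sketch (`move_extent`, `plain_copy`
and the in-memory `disks` dict are invented for illustration); it does not
describe how AIX or LVM2 actually implement extent moves.

```python
from typing import Callable, Dict, List, Tuple

PE = Tuple[str, int]          # (PV name, physical extent number)
Disks = Dict[PE, bytes]       # toy stand-in for the contents of each PE


def plain_copy(disks: Disks, src: PE, dst: PE) -> None:
    """Default copy step: duplicate the source PE's contents."""
    disks[dst] = disks[src]


def move_extent(le_map: Dict[int, List[PE]], disks: Disks, le: int, dst: PE,
                copy: Callable[[Disks, PE, PE], None] = plain_copy) -> None:
    """Move logical extent `le` to `dst` without ever losing the only copy."""
    src = le_map[le][0]            # reads are served from the first copy
    le_map[le].append(dst)         # 1. add the new mapping next to the old one
    copy(disks, src, dst)          # 2. synchronize the new copy
    if disks.get(dst) != disks[src]:
        le_map[le].remove(dst)     # verification failed: drop only the new map
        raise OSError(f"LE {le}: copy to {dst} did not verify; old mapping kept")
    le_map[le].remove(src)         # 3. retire the old mapping only after verifying
```

The design point is the ordering: the old mapping is the last thing to go,
so any failure before step 3 leaves the LV readable from its original PE.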

//Morten


* [linux-lvm] Re: pvmove obliterates filesystem (Opensuse 10.2, x86-64)
From: Brian Strand @ 2007-10-17 23:52 UTC (permalink / raw)
  To: linux-lvm

Alasdair G Kergon wrote:
> On Tue, Oct 16, 2007 at 11:27:42PM +0000, Brian Strand wrote:
>> 2462 dual-port FC HBA, using qla2400 (no dm-multipath).  Note:  I had
>> just completed a successful pvmove of another lv about 30 minutes prior
>> to this incident.
>  
>>   LVM version:     2.02.13 (2006-10-27)
>>   Library version: 1.02.12 (2006-10-13)
>>   Driver version:  4.7.0
>> Linux somebox 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006
>> x86_64 x86_64 x86_64 GNU/Linux
>  
> (need to check those versions are all compatible and kernel isn't missing
> relevant patches)

Is this information available somewhere I can check?


>>     Creating switch-pvmove0
>>   device-mapper: create ioctl failed: Device or resource busy
> 
> That should *not* happen.
> Are you sure the preceding pvmove completed correctly?

For the preceding pvmove, the log file showed no errors and there was
nothing from device-mapper in the logs.  "lvs -o +devices" showed the
expected result (the LV was now on the desired PV, and had not been there
prior to the pvmove).  Also, the successfully moved LV contains some of
Oracle's system datafiles as well as binaries, so Oracle would have
imploded rapidly if something had gone wrong.


> Is some version of udev enabled on dm devices that might be interfering?

This I don't know; we're just running the stock out-of-the-box udev.
Any pointers on how I can find this out would be appreciated.
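A first pass at the udev question is to search the rules files for anything
that matches device-mapper nodes.  A minimal sketch, assuming the rules live
in /etc/udev/rules.d or /lib/udev/rules.d (adjust for the distribution) and
using a crude keyword match on `dm-`, `device-mapper` and `dmsetup`.  The
function name is hypothetical, and a hit only means the rule deserves a
closer look, not that it is the culprit.

```python
import os
import re
from typing import Iterable, List, Tuple

# Assumed rule locations; adjust for the distribution at hand.
DEFAULT_RULE_DIRS = ("/etc/udev/rules.d", "/lib/udev/rules.d")

# Crude heuristic: rules touching dm devices usually mention one of these.
DM_PATTERN = re.compile(r"dm-|device-mapper|dmsetup")


def find_dm_rules(dirs: Iterable[str] = DEFAULT_RULE_DIRS) -> List[Tuple[str, int, str]]:
    """Return (file, line number, rule text) for non-comment rule lines that
    look like they match device-mapper nodes."""
    hits = []
    for d in dirs:
        if not os.path.isdir(d):
            continue  # directory absent on this system
        for name in sorted(os.listdir(d)):
            if not name.endswith(".rules"):
                continue  # udev only reads *.rules files
            path = os.path.join(d, name)
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    rule = line.strip()
                    if rule.startswith("#"):
                        continue  # skip comments
                    if DM_PATTERN.search(rule):
                        hits.append((path, lineno, rule))
    return hits


if __name__ == "__main__":
    for path, lineno, rule in find_dm_rules():
        print(f"{path}:{lineno}: {rule}")
```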


>>   device-mapper: reload ioctl failed: Invalid argument
> 
>>     Creating volume group backup "/etc/lvm/backup/switch" (seqno 250).
> 
> Need to check through the sequence of backups to see all the metadata
> changes it actually made (probably need the ones in the on-disk metadata
> area rather than just the /etc/lvm/backup ones).

Would dd suffice to get the on-disk metadata area?  If so, what is (are)
the offset(s) to use?  Is it ok to post these as attachments to the
list, or is there some other preferred means?
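For the on-disk copies, a minimal sketch, assuming the LVM2 layout in which
the PV label ("LABELONE") sits in one of the first four 512-byte sectors and
the metadata area that follows keeps text copies of the VG metadata, each
carrying a `seqno = N` line, with older copies often surviving in its
circular buffer.  It also assumes the metadata area lies within the first
MiB, dumped read-only with something like
`dd if=/dev/sdc of=sdc-head.bin bs=1M count=1`; raise `count` if the PV was
created with a larger metadata area.  Splitting on NUL bytes is a heuristic,
and the function names are hypothetical.

```python
import re
import sys
from typing import Dict, Optional

SECTOR = 512
LABEL_ID = b"LABELONE"                    # LVM2 label, in one of sectors 0-3
SEQNO_RE = re.compile(rb"seqno = (\d+)")  # one per metadata text copy


def find_label_sector(buf: bytes) -> Optional[int]:
    """Return the sector (0-3) holding the LVM2 label, or None."""
    for sector in range(4):
        off = sector * SECTOR
        if buf[off:off + len(LABEL_ID)] == LABEL_ID:
            return sector
    return None


def extract_metadata_copies(buf: bytes) -> Dict[int, str]:
    """Heuristic: treat each NUL-separated chunk containing a seqno line as
    one metadata copy, keyed by its seqno."""
    copies: Dict[int, str] = {}
    for chunk in buf.split(b"\0"):
        m = SEQNO_RE.search(chunk)
        if m:
            copies[int(m.group(1))] = chunk.decode("ascii", errors="replace")
    return copies


def main(path: str) -> None:
    with open(path, "rb") as f:
        buf = f.read()
    sector = find_label_sector(buf)
    if sector is None:
        sys.exit(f"{path}: no LVM2 label in the first four sectors")
    print(f"LVM2 label found in sector {sector}")
    for seqno, text in sorted(extract_metadata_copies(buf).items()):
        out = f"{path}.seqno{seqno}.txt"
        with open(out, "w") as f:
            f.write(text)
        print(f"seqno {seqno}: {len(text)} bytes -> {out}")


if __name__ == "__main__":
    for dump in sys.argv[1:]:   # e.g. sdc-head.bin
        main(dump)
```

The resulting text files are small and could be compared with the copies in
/etc/lvm/backup and /etc/lvm/archive.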


>>   ABORTING: Segment progression failed.
> 
>>   /dev/sdc: Moved: 60.0%
> 
> Message could be incorrect, need to check.
> 
>> Oct  9 22:33:21 somebox kernel: device-mapper: table: 253:13: linear:
>> dm-linear: Device lookup failed
> 
> So it couldn't use that device (a common cause is a size error when wrong
> device is used e.g. with software raid or partially-cloned devices).
> 
> Alasdair

If it helps, 253:13 is the major:minor of the LV that was destroyed
during the pvmove.  It is still present in /dev/mapper, as I left the LV
alone (after fsck).  Please let me know if I should attach (or otherwise
send) any files.

Thanks,
Brian


* Re: [linux-lvm] pvmove obliterates filesystem (Opensuse 10.2, x86-64)
From: Stuart D. Gathman @ 2007-10-20 17:12 UTC (permalink / raw)
  To: LVM general discussion and development

On Thu, 18 Oct 2007, Morten Torstensen wrote:

>  > [snip pvmove problems]
> 
> This is why, imo, why lvm on linux needs to virtualize PE in lVm... a LV 
> only maps to LE (logical extents) that can themselves be mapped to one 
> or more PEs. Then you can pvmove by adding maps and removing old maps 
> after verifying that the new mapping is ok.

While I come from AIX myself, I was impressed that LVM obliterated only
the LV being moved.  That shows good isolation between LVs.

-- 
	      Stuart D. Gathman <stuart@bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

