* [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-16 5:21 UTC
To: linux-lvm
Hi!
On my computer, pvmove just hangs at 0.2% done. First it looked like the
reason was that I hadn't loaded dm-mirror before calling pvmove.
Modprobe was called automatically, but it tried to write to
/var/log/ksymoops/, which was on one of the LVs being moved.
But then I booted into single-user mode and didn't mount any partitions on
LVM, and pvmove still hung at 0.2%.
Another unusual detail about my installation is that the target pv is on
a degraded raid1 array. Perhaps there is some locking issue?
Jan
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-16 10:48 UTC
To: linux-lvm
On Sat, Aug 16, 2003 at 12:20:06PM +0200, Jan Niehusmann wrote:
> On my computer, pvmove just hangs at 0.2% done. First it looked like the
> reason was that I hadn't loaded dm-mirror before calling pvmove.
I just noticed I forgot an important detail: Version numbers :-)
# pvmove --version
LVM version: 2.00.05 (2003-07-18)
Library version: 1.00.02-ioctl (2003-07-12)
Driver version: 4.0.1
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-16 13:57 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Sat, Aug 16, 2003 at 12:20:06PM +0200, Jan Niehusmann wrote:
> On my computer, pvmove just hangs at 0.2% done.
If pvmove fails, you can use 'pvmove --abort' to abandon it
or 'pvmove' on its own [or with same source pv as before] to
retry. xxchange -ay may also attempt to restart pvmoves in
progress (leaving a pvmove daemon running).
> But then I booted to single user mode and didn't mount any partitions on
> lvm, and still, pvmove was hanging at 0.2%.
Need a more precise description of what you mean by 'hanging'.
And the usual diagnostics: debug log file from the command that should
have set the pvmove running, any kernel error messages, lvm.conf, a copy
of the current VG's metadata [e.g. with vgcfgbackup -f] etc.
If pvmove process itself freezes, then a long/wide 'ps' output for that
process, kcopyd & kmirrord (incl cols: NI, VSZ, RSS, STAT
and symbolic WCHAN).
Alasdair
--
agk@uk.sistina.com
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-17 11:11 UTC
To: linux-lvm
Good news (perhaps?):
If I move only single LVs (i.e., I specify one LV with -n on pvmove), it
seems to work.
Perhaps the problem is that source and target PV have exactly the same
number of extents? An off-by-one error, if the target volume gets completely
filled up by the pvmove?
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-17 11:34 UTC
To: linux-lvm
On Sun, Aug 17, 2003 at 06:10:22PM +0200, Jan Niehusmann wrote:
> Perhaps the problem is that source and target PV have exactly the same
> number of extents? Off-by-one error, if the target volume gets completely
> filled up by the pvmove?
Who knows?
The log file might show whether that theory is correct or not.
If you want me to look at this, you need to supply some of the
information requested in my last email: your description of the problem
so far is too vague.
Alasdair
--
agk@uk.sistina.com
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-17 11:41 UTC
To: linux-lvm
On Sun, Aug 17, 2003 at 05:33:26PM +0100, Alasdair G Kergon wrote:
> The log file might show whether that theory is correct or not.
I sent it to the list a few hours ago, but it seems like it didn't go
through. Perhaps it exceeded some size limitation?
However, now I put the files on http://gondor.com/lvm/
They contain the following:
lvm2.log debug log of pvmove, vgcfgbackup (to save the new
config) and pvmove --abort
lvm.conf obvious
config-old config before pvmove
config-new config after pvmove
ps output of ps while (first) pvmove is running
I checked the off-by-one theory, by filling up most of the target PV
with a dummy partition and moving a single LV which exactly fits into
the remaining extents. The pvmove succeeded, so this probably wasn't the
cause.
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-17 12:00 UTC
To: linux-lvm
On Sun, Aug 17, 2003 at 06:40:28PM +0200, Jan Niehusmann wrote:
> I sent it to the list a few hours ago, but it seems like it didn't go
> through. Perhaps it exceeded some size limitation?
Quite likely - after the 1.9MB message slipped through the net
last week.
Alasdair
--
agk@uk.sistina.com
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-17 12:42 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Sun, Aug 17, 2003 at 01:46:38PM +0200, Jan Niehusmann wrote:
> I hope this helps - do you have a suggestion on what to try next?
This is the first clue:
activate/dev_manager.c:487 pvmove Mirror status: 2 003:004 009:002 383/384
activate/dev_manager.c:532 pvmove Mirror percent: 99.739586
383/384 means the kernel mirror has only successfully copied 383 out
of 384 512-KB regions from (major, minor) (3,4) to (9,2).
The first pvmove segment has
12 16MB extents = 393216 sectors = 384 512KB regions.
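This Python sketch reproduces the region arithmetic above. It assumes 512-byte sectors, 16 MiB extents and the 1024-sector (512 KiB) region size discussed in this thread. The helper names are made up for illustration.

```python
SECTOR_BYTES = 512        # assumed sector size
REGION_SECTORS = 1024     # 512 KiB copy region, as explained in this thread

def extents_to_sectors(n_extents, extent_mib):
    """Sectors covered by n_extents physical extents of extent_mib MiB each."""
    return n_extents * extent_mib * 1024 * 1024 // SECTOR_BYTES

def sectors_to_regions(sectors, region_sectors=REGION_SECTORS):
    """Copy regions needed to cover `sectors`, rounding up a partial region."""
    return -(-sectors // region_sectors)

def mirror_percent(copied, total):
    """Progress as shown on the 'Mirror percent' log line."""
    return 100.0 * copied / total

sectors = extents_to_sectors(12, 16)
print(sectors, sectors_to_regions(sectors))   # 393216 384
print(round(mirror_percent(383, 384), 6))     # 99.739583
```

The logged 99.739586 differs from the exact 99.739583 only in the last digits. That is consistent with the tool computing the percentage as a single-precision float, though this is an assumption. The same helpers give 524288 sectors = 512 regions and 511/512 = 99.80% for the second hang reported later in the thread.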
Can you check (e.g. using dd) that all the old sectors can be read OK,
and that the new ones can be written to OK? (You're not seeing kernel error
messages, though. And since some volume moves work OK, an off-by-one theory
seems unlikely, or nothing would work?)
The first entry of 'dmsetup table vgraid-pvmove0' would show
you the sector ranges - 30448+3156 on pv1=/dev/hda4, length above
& 384 on pv0=/dev/md2 ?
[NB Be careful not to overwrite data - if you've done some
other successful pvmoves since, some of those /dev/md2 sectors
might now be allocated onto.]
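A read-only Python equivalent of the dd read check might look like the sketch below. It assumes the start sector and length are taken from the pvmove segment, for example from the dmsetup table. `first_unreadable_sector` is a hypothetical helper. Given the warning above about overwriting data, it deliberately only reads.

```python
import os

SECTOR_BYTES = 512

def first_unreadable_sector(path, start_sector, n_sectors, chunk_sectors=1024):
    """Read n_sectors starting at start_sector, one chunk at a time.

    Returns the first sector of a chunk whose read fails or comes back
    short, or None if the whole range reads cleanly. Never writes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        sector = start_sector
        end = start_sector + n_sectors
        while sector < end:
            count = min(chunk_sectors, end - sector)
            want = count * SECTOR_BYTES
            try:
                data = os.pread(fd, want, sector * SECTOR_BYTES)
            except OSError:
                return sector          # I/O error somewhere in this chunk
            if len(data) != want:
                return sector          # short read, e.g. past end of device
            sector += count
        return None
    finally:
        os.close(fd)
```

For example, `first_unreadable_sector("/dev/hda4", start, 393216)` returns None if the whole 384-region segment reads cleanly. Here `start` is the source offset from the table.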
We haven't implemented bad sector error reporting and handling yet,
so for now, you have to construct pvmoves to avoid any bad sectors.
Alasdair
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-17 13:27 UTC
To: linux-lvm
On Sun, Aug 17, 2003 at 06:41:38PM +0100, Alasdair G Kergon wrote:
> 383/384 means the kernel mirror has only successfully copied 383 out
> of 384 512-KB regions from (major, minor) (3,4) to (9,2).
>
> The first pvmove segment has
> 12 16MB extents = 393216 sectors = 384 512KB regions.
Well, now after I did some pvmoves of individual LVs, pvmove /dev/hda4
hangs at a different point:
activate/dev_manager.c:487 pvmove Mirror status: 2 003:004 009:002 511/512
activate/dev_manager.c:532 pvmove Mirror percent: 99.804688
dmsetup says:
0 524288 mirror core 1 1024 2 003:004 111146736 009:002 56852864
524288 229376 linear 003:004 88864496
753664 2326528 linear 003:004 89093872
[...]
I hope I understand these numbers correctly: The first line means that
the virtual device vgraid-pvmove0 starts with 524288 blocks (512 bytes
each). These are located at position 111146736 on /dev/hda4 and
56852864 on /dev/md2. (But I don't know what core 1 1024 2 means)
I can read/write these areas with dd, without error messages.
> We haven't implemented bad sector error reporting and handling yet,
> so for now, you have to construct pvmoves to avoid any bad sectors.
I don't think the drives have bad sectors - at least I haven't seen any
signs of bad sectors so far.
And I pvmoved a 19GB LV, much more data than the whole-PV pvmove copies
before it hangs.
All the moves had decent performance, ~20MB/s, whereas the source drive
only reads ~30MB/s with dd. The only strange thing I saw was that there
were sometimes several seconds without any disk activity at all.
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-17 13:50 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Sun, Aug 17, 2003 at 08:26:03PM +0200, Jan Niehusmann wrote:
> only reads ~30MB/s with dd. The only strange thing I saw was that there
> were sometimes several seconds without any disk activity at all.
That'll be because of the pvmove polling mode - by default
it only checks (and displays) each segment's pvmove progress
every 15 seconds. For 5 sec. intervals use pvmove -i5
or for best performance (but no progress displays *during*
segments) try -i0.
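The effect of a polling interval can be illustrated with a small loop that samples the mirror status every `interval` seconds. This is only a sketch. `get_status` is a hypothetical stand-in for however you fetch the status, for example by wrapping `dmsetup status vgraid-pvmove0` with `subprocess`. The sketch also assumes the `copied/total` ratio is the last field, as in the log lines quoted in this thread. Other device-mapper versions may append further fields.

```python
import time

def parse_progress(status):
    """Return (copied, total) from a status such as '2 003:004 009:002 383/384'."""
    copied, total = status.split()[-1].split("/")
    return int(copied), int(total)

def poll(get_status, interval, max_polls):
    """Sample get_status() every `interval` seconds until the mirror is in
    sync or max_polls samples have been taken; return the percentages seen."""
    seen = []
    for _ in range(max_polls):
        copied, total = parse_progress(get_status())
        seen.append(100.0 * copied / total)
        if copied == total:
            break
        time.sleep(interval)
    return seen
```

With a stuck mirror like the one in this thread, every sample would read 383/384. A longer interval only delays when you notice that.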
Alasdair
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-17 13:55 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Sun, Aug 17, 2003 at 08:26:03PM +0200, Jan Niehusmann wrote:
> (But I don't know what core 1 1024 2 means)
core means the data recording the state of the mirror is
held in-core - i.e. it is not a persistent mirror
[No other options apart from 'core' available yet]
1 means 1 parameter associated with 'core' follows:
1024 is the number of sectors moved at once (ie 512KB)
2 means 2 mirror devices follow
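Following that explanation, a mirror line from `dmsetup table` can be split into its fields as below. The parser is a sketch that assumes exactly the layout described here: log type, argument count, log arguments, mirror count, then device/offset pairs. It is not device-mapper's own parsing code.

```python
def parse_mirror_line(line):
    """Split a 'dmsetup table' mirror line into its parts, assuming
    <start> <length> mirror <log_type> <#log_args> <log_args...>
    <#mirrors> (<major:minor> <offset>)..."""
    f = line.split()
    start, length, target = int(f[0]), int(f[1]), f[2]
    if target != "mirror":
        raise ValueError("not a mirror target: %s" % target)
    log_type = f[3]
    n_log_args = int(f[4])
    log_args = f[5:5 + n_log_args]
    i = 5 + n_log_args
    n_mirrors = int(f[i])
    legs = []
    for k in range(n_mirrors):
        dev = f[i + 1 + 2 * k]
        offset = int(f[i + 2 + 2 * k])          # start sector on that device
        major, minor = (int(x) for x in dev.split(":"))
        legs.append(((major, minor), offset))
    return {"start": start, "length": length, "log_type": log_type,
            "log_args": log_args, "legs": legs}

m = parse_mirror_line(
    "0 524288 mirror core 1 1024 2 003:004 111146736 009:002 56852864")
print(m["legs"])                                # [((3, 4), 111146736), ((9, 2), 56852864)]
print(m["length"] // int(m["log_args"][0]))     # 512
```

With the core log, the single argument is the region size in sectors. That gives 524288 / 1024 = 512 regions, matching the 511/512 status reported earlier.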
Alasdair
--
agk@uk.sistina.com
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-17 18:15 UTC
To: linux-lvm
On Sat, Aug 16, 2003 at 07:56:06PM +0100, Alasdair G Kergon wrote:
> If pvmove process itself freezes, then a long/wide 'ps' output for that
> process, kcopyd & kmirrord (incl cols: NI, VSZ, RSS, STAT
> and symbolic WCHAN).
I noticed the WCHAN fields in my ps output were not very useful, because
they pointed to a module and were not decoded.
So I took the numeric value (92fa83) and looked it up by hand. It seems
to point to the schedule() call in line 56 of dm-daemon.c for both
kcopyd and kmirrord.
And then, I made a quite worrying observation: When the hanging happens,
the device being copied is in a completely garbled state. In this case,
/dev/vgraid/lvol5 didn't look like an ext3 filesystem at all.
So I looked at the underlying devices, and both /dev/hda4 and /dev/md2 had
a copy of the actual filesystem, only differing in the last 512K. This
is consistent with the fact that the mirroring was done for all but one 512K
block. But these 512K are completely different (target device all zeroes).
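The two underlying devices can be compared region by region, so that the result points at the exact 512 KiB block that differs. This Python sketch assumes byte offsets computed from the dmsetup table (sector offset * 512). `differing_regions` is a hypothetical helper.

```python
import os

REGION_BYTES = 512 * 1024   # 512 KiB, the mirror region size discussed here

def differing_regions(path_a, offset_a, path_b, offset_b, length,
                      region=REGION_BYTES):
    """Compare `length` bytes of two devices/files starting at the given
    byte offsets and return the indices of the regions that differ."""
    diffs = []
    fa = os.open(path_a, os.O_RDONLY)
    fb = os.open(path_b, os.O_RDONLY)
    try:
        for idx, pos in enumerate(range(0, length, region)):
            n = min(region, length - pos)
            if os.pread(fa, n, offset_a + pos) != os.pread(fb, n, offset_b + pos):
                diffs.append(idx)
    finally:
        os.close(fa)
        os.close(fb)
    return diffs
```

For a 512-region segment in the state described above, the expected result would be `[511]`: only the last region differs.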
However, these are only the underlying devices. Looking at
/dev/vgraid/lvol5 a little bit closer revealed that it contained parts
of some other filesystem. Very strange. And worrying. I don't even want
to know if writing to this LV would overwrite some unrelated partition.
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-18 6:46 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Mon, Aug 18, 2003 at 01:14:15AM +0200, Jan Niehusmann wrote:
> So I looked at the underlying devices, and both /dev/hda4 and /dev/md2 had
> a copy of the actual filesystem, only differing in the last 512K. This
> conforms to the fact that the mirroring was done for all but one 512K
> block. But these 512K are completely different (target device all zeroes).
Please confirm which kernel you are using, and the device-mapper
patch(es) you applied. [patches/linux-2.4.2?* from the 1.00.02
device-mapper tarball?]
Some more diagnostics:
check that you have 'activation = 1' and 'level = 7' in the log{}
section of lvm.conf; if not, add them and recreate the log of a failing
pvmove. (The one you put on the web seems incomplete.)
['activation = 1' setting is only meant for use when diagnosing
problems like this - don't leave it there permanently]
> However, these are only the underlying devices. Looking at
> /dev/vgraid/lvol5 a little bit closer revealed that it contained parts
> of some other filesystem. Very strange. And worrying. I don't even want
> to know if writing to this LV would overwrite some unrelated partition.
Use dmsetup to see what's going on:
run 'dmsetup ls' to get a list of internal device names,
then 'dmsetup -v table <device_name_shown_in_ls>'
on all the relevant ones (e.g. vgraid-lvol5*)
After all the interruptions you've had, check the /dev/vgraid/lvol5
symlink (->/dev/mapper/vgraid-lvol5) & destination major/minor is
still right.
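These checks can be scripted. The Python sketch below parses `dmsetup ls` output into a name-to-(major, minor) map. It also resolves a /dev symlink to the device numbers of its target, so the two can be compared. It assumes `dmsetup ls` prints lines like `vgraid-lvol5 (254, 5)`; some versions print `(254:5)`, and both are accepted. The helper names are hypothetical.

```python
import os
import re
import stat

LS_LINE = re.compile(r"^(\S+)\s+\((\d+)[,:]\s*(\d+)\)")

def parse_dmsetup_ls(text):
    """Map device-mapper names to (major, minor) from 'dmsetup ls' output."""
    devices = {}
    for line in text.splitlines():
        m = LS_LINE.match(line.strip())
        if m:
            devices[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return devices

def node_numbers(path):
    """Follow symlinks from `path`; return (resolved_path, (major, minor))
    for a block device, or (resolved_path, None) for anything else."""
    real = os.path.realpath(path)
    st = os.stat(real)
    if not stat.S_ISBLK(st.st_mode):
        return real, None
    return real, (os.major(st.st_rdev), os.minor(st.st_rdev))
```

Usage would be along these lines. Run `devs = parse_dmsetup_ls(subprocess.run(["dmsetup", "ls"], capture_output=True, text=True).stdout)`, then check that `devs["vgraid-lvol5"] == node_numbers("/dev/vgraid/lvol5")[1]`.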
Check the current metadata (with vgcfgbackup) still looks right.
[Does it show pvmove(s) in progress, or is it clean again?]
Alasdair
--
agk@uk.sistina.com
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-18 7:07 UTC
To: linux-lvm
On Mon, Aug 18, 2003 at 12:45:15PM +0100, Alasdair G Kergon wrote:
> Please confirm which kernel you are using, and the device-mapper
> patch(es) you applied. [patches/linux-2.4.2?* from the 1.00.02
> device-mapper tarball?]
It's 2.4.21, and yes, both linux-2.4.21-devmapper-ioctl.patch and
linux-2.4.21-VFS-lock.patch from the 1.00.02 tarball are applied.
> Some more diagnostics:
> check that you have 'activation = 1' and 'level = 7' in the log{}
> section of lvm.conf and if not, recreate a problem pvmove application log
> (The one you put on the web seems incomplete)
I have level = 7, but not activation = 1. I'll try it again later, when
I'm back at the console... don't want to hang the computer via ssh :-)
> Use dmsetup to see what's going on:
> run 'dmsetup ls' to get a list of internal device names,
> then 'dmsetup -v table <device_name_shown_in_ls>'
> on all the relevant ones (e.g. vgraid-lvol5*)
I had a look at these and didn't see anything unusual, but I'll check
again carefully.
> After all the interruptions you've had, check the /dev/vgraid/lvol5
> symlink (->/dev/mapper/vgraid-lvol5) & destination major/minor is
> still right.
lvol5 was mounted when the problems started (errors on ls on that
partition), so it's surely not a symlink problem, as the kernel doesn't
access a mounted device via the symlink.
> Check the current metadata (with vgcfgbackup) still looks right.
> [Does it show pvmove(s) in progress, or is it clean again?]
When pvmove hangs, vgcfgbackup shows the pvmove in progress, even after
CTRL-C (which is expected), and pvmove --abort doesn't work at all, as I
said.
So I always had to restore LVM2 to a sane state with vgcfgrestore. After
that, I reboot and LVM comes back up cleanly. Even lvol5 was not
damaged, because the garbage on the logical volume never got written to
the underlying device, and vgcfgrestore removed all traces of the
incomplete pvmove.
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-18 9:13 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Mon, Aug 18, 2003 at 01:14:15AM +0200, Jan Niehusmann wrote:
> I noticed the WCHAN fields in my ps output were not very useful, because
> they pointed to a module and were not decoded.
Does your System.map file correspond to the running kernel?
[See ps man page / strace to check where it's getting that file from]
Alasdair
--
agk@uk.sistina.com
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-18 9:30 UTC
To: linux-lvm
On Mon, Aug 18, 2003 at 03:12:29PM +0100, Alasdair G Kergon wrote:
> On Mon, Aug 18, 2003 at 01:14:15AM +0200, Jan Niehusmann wrote:
> > I noticed the WCHAN fields in my ps output were not very useful, because
> > they pointed to a module and were not decoded.
>
> Does your System.map file correspond to the running kernel?
> [See ps man page / strace to check where it's getting that file from]
Sure it is :-)
(I use debian make-kpkg to build a kernel .deb, which makes sure that
the right System.map file gets installed together with the kernel)
But it doesn't contain symbols which get defined by modules. And
even /proc/ksyms only contains exported symbols, which daemon()
from dm-daemon.c is not. So I looked at the next exported symbol
(dm_daemon_start), took the difference between these two addresses,
went back that many bytes in the objdump --disassemble output, and
compared that to the source code to find the actual positions.
Even though I don't know x86 assembly too well, it was not difficult to
match assembly instructions to C source code, as the source at this
position is quite low-level.
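The manual lookup described above amounts to finding the nearest known symbols around the WCHAN address and the distance to each. This Python sketch assumes the symbol table has already been read into (address, name) pairs from System.map or /proc/ksyms. The addresses in the example are invented. A non-exported function such as daemon() never appears in that table, so the result is only a starting point for reading the disassembly.

```python
import bisect

def nearest_symbols(addr, symbols):
    """Return (below, above) for a kernel address.

    `symbols` is a list of (address, name) pairs. `below` is
    (name, addr - symbol_address) for the closest symbol at or before addr;
    `above` is (name, symbol_address - addr) for the closest one after it.
    Either is None if no such symbol exists.
    """
    table = sorted(symbols)
    addrs = [a for a, _ in table]
    i = bisect.bisect_right(addrs, addr)   # index of first symbol above addr
    below = (table[i - 1][1], addr - table[i - 1][0]) if i > 0 else None
    above = (table[i][1], table[i][0] - addr) if i < len(table) else None
    return below, above

# Invented addresses, for illustration only.
syms = [(0xd08a1000, "some_exported_fn"), (0xd08a1400, "dm_daemon_start")]
below, above = nearest_symbols(0xd08a13a0, syms)
print(below)   # ('some_exported_fn', 928)
print(above)   # ('dm_daemon_start', 96)
```

The step described in the message corresponds to the second result. Take the following exported symbol, then go back `above[1]` bytes in the disassembly.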
Jan
* Re: [linux-lvm] pvmove hangs
From: Alasdair G Kergon @ 2003-08-18 12:58 UTC
To: Jan Niehusmann; +Cc: linux-lvm
On Sun, Aug 17, 2003 at 01:46:38PM +0200, Jan Niehusmann wrote:
> pvmove --abort really hangs...
Puzzling. But we can think of some explanations that could tie this in
with the 383/384 problem.
> The difference is that the first pvmove just doesn't show any progress,
> but can be stopped with ctrl-c.
> pvmove --abort doesn't return, isn't
> killable, and after issuing it, every attempt to access a filesystem on
> lvm also blocks.
I'd like to see the debug log file/strace for that pvmove --abort and
the full ps output for the process (incl decoded wchan and process state,
nice value, resident memory size & whether its CPU time is static or
increasing) - before and after issuing the 'kill -9' that doesn't work.
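Most of these process details can also be sampled from /proc. That makes it easy to take one reading before and one after the 'kill -9' and compare utime+stime. This Python sketch assumes the field numbering documented in proc(5): state is field 3, utime 14, stime 15, nice 19, vsize 23 and rss 24 (in pages). It also assumes the kernel provides /proc/<pid>/wchan, which not every kernel does.

```python
def parse_proc_stat(text):
    """Pick a few fields out of one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so everything after the *last* ')' is split on
    whitespace. Field numbers follow proc(5); rest[0] is field 3.
    """
    rparen = text.rindex(")")
    comm = text[text.index("(") + 1:rparen]
    rest = text[rparen + 1:].split()

    def field(n):
        return rest[n - 3]

    return {
        "comm": comm,
        "state": field(3),              # e.g. 'D' = uninterruptible sleep
        "utime": int(field(14)),        # user CPU time, clock ticks
        "stime": int(field(15)),        # system CPU time, clock ticks
        "nice": int(field(19)),
        "vsize": int(field(23)),        # virtual size, bytes
        "rss_pages": int(field(24)),    # resident set size, pages
    }

def sample(pid):
    """Read stat (and wchan, where the kernel provides it) for one process."""
    with open("/proc/%d/stat" % pid) as f:
        info = parse_proc_stat(f.read())
    try:
        with open("/proc/%d/wchan" % pid) as f:
            info["wchan"] = f.read().strip()
    except OSError:
        info["wchan"] = None
    return info
```

Take `before = sample(pid)`, send the signal, wait a little, then take `after = sample(pid)`. If `after["utime"] + after["stime"]` equals the same sum from `before`, the CPU time is static.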
We're probably going to need to start sending you patches to instrument
the kernel if we are to work out what's going on here [unless you happen
to have the machine set up for kgdb].
Also, careful use of 'dmsetup' should let us deduce some internal state.
Do you use irc or IM - might speed things up if so?
Alasdair
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-18 13:21 UTC
To: linux-lvm
On Mon, Aug 18, 2003 at 06:57:19PM +0100, Alasdair G Kergon wrote:
> We're probably going to need to start sending you patches to instrument
> the kernel if we are to work out what's going on here [unless you happen
> to have the machine set up for kgdb].
I don't use kgdb, but I'd happily apply patches if this may help finding
the problem.
> Do you use irc or IM - might speed things up if so?
jannic on the freenode IRC network, or jannic@jabber.gondor.com on
jabber. I'd prefer jabber, because on IRC I often miss messages :-)
But I don't think I'll have time this evening - perhaps later, but I'm
not sure. Tomorrow evening I'll probably have enough time to do some
more testing.
How are you reachable by IRC or IM?
Jan
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-18 17:55 UTC
To: linux-lvm
On Mon, Aug 18, 2003 at 06:57:19PM +0100, Alasdair G Kergon wrote:
> I'd like to see the debug log file/strace for that pvmove --abort and
> the full ps output for the process (incl decoded wchan and process state,
> nice value, resident memory size & whether its CPU time is static or
> increasing) - before and after issuing the 'kill -9' that doesn't work.
The ps simply says
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 24006 1180 1 -18 14140 14140 down D<L tty1 0:00 pvmove --abort
(or numeric:)
4 0 24006 1180 1 -18 14140 14140 105b14 D<L tty1 0:00 pvmove --abort
This doesn't change with kill -9, all stays exactly the same.
I do not yet have a logfile of pvmove --abort.
But http://gondor.com/lvm/lvm2.log.20030818 does contain the complete
logfile of pvmove -v /dev/hda4.
I'm going to sleep now. Let's do further debugging tomorrow (Tuesday), if
that's ok for you.
Jan
* Re: [linux-lvm] pvmove hangs
From: Jan Niehusmann @ 2003-08-19 17:52 UTC
To: linux-lvm
Hi!
Unfortunately I had some network problems today, so the debugging
session was shorter than expected.
But I did gather another logfile with activation=1.
It's available from http://gondor.com/lvm/lvm2.log.20030819
Additionally, there is a file 'typescript' in the same directory, which
contains output from dmsetup ls and dmsetup -v table vgraid-pvmove0.
Jan
* [linux-lvm] pvmove hangs
From: Gergely Imre @ 2005-04-26 9:19 UTC
To: linux-lvm
hi
i have a LV with two PVs in it, like this:
--- Logical volume ---
LV Name /dev/watchdog/watchdog_var
VG Name watchdog
LV UUID v6U2JY-PBMf-snpJ-IXX5-ZzcV-y4fk-PchoQg
LV Write Access read/write
LV Status available
# open 1
LV Size 72.22 GB
Current LE 2311
Segments 1
Allocation next free (default)
Read ahead sectors 0
Block device 254:6
--- Physical volumes ---
PV Name /dev/sdb2
PV UUID 5v23aD-mkO0-I9Ur-zUKG-sGd2-8iCP-zhgfJe
PV Status allocatable
Total PE / Free PE 2500 / 189
PV Name /dev/sda1
PV UUID 7mzLEZ-Va0c-sdiW-8KAV-lyjo-a5yI-GBORGS
PV Status allocatable
Total PE / Free PE 2941 / 2753
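Before a pvmove, it is worth checking that the extent counts in this display agree. This Python sketch uses the figures printed above. The 32 MiB extent size is inferred from 72.22 GB / 2311 LE and is not shown in the output.

```python
# Figures from the lvdisplay/pvdisplay output above.
lv_extents = 2311
sdb2_total, sdb2_free = 2500, 189
sda1_free = 2753
pe_mib = 32   # inferred, not printed: 72.22 GB / 2311 LE is about 32 MiB

def lv_size_gib(extents, pe_mib):
    """LV size in GiB for a given extent count and extent size."""
    return extents * pe_mib / 1024

used_on_sdb2 = sdb2_total - sdb2_free
print(used_on_sdb2)                                 # 2311, same as the LV's Current LE
print(used_on_sdb2 <= sda1_free)                    # True: sda1 has room for it
print(round(lv_size_gib(lv_extents, pe_mib), 2))    # 72.22
```

This ignores allocation-policy and contiguity constraints. It only shows that enough free extents exist on the target.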
i want to remove sdb2, so i wanted to make a pvmove /dev/sdb2, so it
would move everything to sda1. the strange thing is, pvmove starts, and
after, say, 20% (about an hour or so) the whole system hangs somehow. i
can ping the box, but any command i try to run, fails. it's like it's
waiting for the disk to read/write, or something like that.
the distrib is Fedora Core 2.
lvm> version
LVM version: 2.00.15 (2004-04-19)
Library version: 1.00.14-ioctl (2004-04-06)
Driver version: 4.4.0
filesystem is ext3.
the system has 2GB RAM, dual Intel Xeon 3GHz.
i tried with kernel 2.6.11.7, after a while it gives me kernel panic
during pvmove. then i tried with 2.6.9, it didn't panic, but it hung,
and i had to reboot.
it's kinda urgent, thanks.
* Re: [linux-lvm] pvmove hangs
From: Diaz Rodriguez, Eduardo @ 2005-04-26 13:30 UTC
To: LVM general discussion and development
Hi, try with pvmove -t or pvmove -v,
Send the panics output, I think that we need more information.
regards!
On Tue, 26 Apr 2005 12:19:58 +0300, Gergely Imre wrote
> hi
>
> i have a LV with two PVs in it, like this:
>
> --- Logical volume ---
> LV Name /dev/watchdog/watchdog_var
> VG Name watchdog
> LV UUID v6U2JY-PBMf-snpJ-IXX5-ZzcV-y4fk-PchoQg
> LV Write Access read/write
> LV Status available
> # open 1
> LV Size 72.22 GB
> Current LE 2311
> Segments 1
> Allocation next free (default)
> Read ahead sectors 0
> Block device 254:6
>
> --- Physical volumes ---
> PV Name /dev/sdb2
> PV UUID 5v23aD-mkO0-I9Ur-zUKG-sGd2-8iCP-zhgfJe
> PV Status allocatable
> Total PE / Free PE 2500 / 189
>
> PV Name /dev/sda1
> PV UUID 7mzLEZ-Va0c-sdiW-8KAV-lyjo-a5yI-GBORGS
> PV Status allocatable
> Total PE / Free PE 2941 / 2753
>
> i want to remove sdb2, so i wanted to make a pvmove /dev/sdb2, so it
> would move everything to sda1. the strange thing is, pvmove starts, and
> after, say, 20% (about an hour or so) the whole system hangs
> somehow. i can ping the box, but any command i try to run, fails.
> it's like it's waiting for the disk to read/write, or something like
> that. the distrib is Fedora Core 2.
>
> lvm> version
> LVM version: 2.00.15 (2004-04-19)
> Library version: 1.00.14-ioctl (2004-04-06)
> Driver version: 4.4.0
>
> filesystem is ext3.
> the system has 2GB RAM, dual Intel Xeon 3GHz.
>
> i tried with kernel 2.6.11.7, after a while it gives me kernel panic
> during pvmove. then i tried with 2.6.9, it didn't panic, but it hung,
> and i had to reboot.
>
> it's kinda urgent, thanks.
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
=======================================================================================
This day that you so fear as your last is the dawn of the eternal day.
-- Seneca (2 BC-AD 65), Latin philosopher.
=======================================================================================
* Re: [linux-lvm] pvmove hangs
From: Gergely Imre @ 2005-04-27 5:46 UTC
To: linux-lvm
output of the command
pvmove -t -vv -d /dev/sdb2 > /tmp/pvmove-test.txt 2>&1
is here (http://terran.noc.astral.ro/pvmove/pvmove-test.txt)
then, i ran it without -t, like this:
pvmove -vv -d /dev/sdb2 > /tmp/pvmove.txt 2>&1
the output here:
http://terran.noc.astral.ro/pvmove/pvmove.txt
at first, everything seemed alright, it kept writing the progress in the
file. i even ran 'lvs' a couple of times:
[root@watchdog tmp]# lvs
LV VG Attr LSize Origin Snap% Move Move%
.
.
.
pvmove0 watchdog p-C-ao 72.22G /dev/sdb2 4.15
watchdog_root watchdog -wn-ao 5.88G
watchdog_var watchdog -wN-ao 72.22G
the filesystem on this LV was mounted and used during the pvmove, but
this shouldn't be a problem, right?
i kept an eye on it, it did its job up to around 67%, then it hung, the
system was not responding, any command failed, i had to reboot. after
the reboot the whole thing started all over again, from the beginning.
but there is nothing in the logs that would indicate a problem.
Diaz Rodriguez, Eduardo wrote:
> Hi, try with pvmove -t or pvmove -v,
>
> Send the panics output, I think that we need more information.
>
> regards!
--
Gergely Imre
SysAdmin
Nextra TeleCom - group Astral
Tel: +4(0)266-317500
http://www.nextra.ro/gimre
GPG key: 0x34525305 (www.keyserver.net)
* [linux-lvm] pvmove hangs
From: Gergely Imre @ 2005-04-27 13:04 UTC
To: linux-lvm
let me 'restart' this thread. i ran into another problem. or is it the same one?
[root@test etc]# fdisk -l
Disk /dev/sda: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 17 136521 82 Linux swap
/dev/sda2 * 18 267 2008125 83 Linux
/dev/sda3 268 392 1004062+ 8e Linux LVM
/dev/sda4 393 522 1044225 8e Linux LVM
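The Blocks column follows from the geometry line. One cylinder is 255 * 63 = 16065 sectors (8225280 bytes). fdisk reports sizes in 1 KiB blocks and appends '+' when a partition has an odd number of sectors. This Python sketch reproduces the figures. It assumes the usual DOS layout, where the first partition starts one track (63 sectors) into the disk.

```python
HEADS, SECTORS_PER_TRACK = 255, 63
SECTORS_PER_CYL = HEADS * SECTORS_PER_TRACK      # 16065 sectors = 8225280 bytes

def fdisk_blocks(start_cyl, end_cyl, first_partition=False):
    """Return fdisk's 'Blocks' figure (1 KiB units) for a cylinder range.

    The first partition is assumed to start one track in, after the MBR
    track. An odd sector count is shown with a trailing '+'.
    """
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYL
    if first_partition:
        sectors -= SECTORS_PER_TRACK
    blocks, odd = divmod(sectors, 2)              # two 512-byte sectors per block
    return "%d%s" % (blocks, "+" if odd else "")

for name, start, end in [("sda1", 1, 17), ("sda2", 18, 267),
                         ("sda3", 268, 392), ("sda4", 393, 522)]:
    print(name, fdisk_blocks(start, end, first_partition=(start == 1)))
```

This prints 136521, 2008125, 1004062+ and 1044225, matching the table.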
i installed FC2 on sda2, i upgraded the kernel to 2.6.11.7, created a VG
and LV out of sda3+sda4, like this:
[root@test etc]# vgdisplay -v
Finding all volume groups
Finding volume group "test"
--- Volume group ---
VG Name test
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 56
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 2
Act PV 2
VG Size 1.95 GB
PE Size 4.00 MB
Total PE 499
Alloc PE / Size 244 / 976.00 MB
Free PE / Size 255 / 1020.00 MB
VG UUID E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH
--- Logical volume ---
LV Name /dev/test/root
VG Name test
LV UUID wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
LV Write Access read/write
LV Status available
# open 1
LV Size 976.00 MB
Current LE 244
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:0
--- Physical volumes ---
PV Name /dev/sda3
PV UUID 1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
PV Status allocatable
Total PE / Free PE 245 / 1
PV Name /dev/sda4
PV UUID rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
PV Status allocatable
Total PE / Free PE 254 / 254
so, practically the root LV is on sda3. i moved the system from sda2 to
sda3, now the whole / is on /dev/mapper/test-root, and it's working fine.
[root@test etc]# df -T
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/mapper/test-root
ext3 983704 638908 294828 69% /
none tmpfs 63464 0 63464 0% /dev/shm
grub.conf:
title Fedora Core (2.6.11.7)
root (hd0,1)
kernel /boot/vmlinuz-2.6.11.7 ro root=/dev/mapper/test-root
initrd /boot/initrd-2.6.11.7.img
so, the /boot directory stayed on /dev/sda2.
fstab:
[root@test etc]# cat /etc/fstab
/dev/mapper/test-root / ext3 defaults 1 1
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
/dev/sda1 swap swap defaults .
.
.
so far so good. now let's say i want to remove sda3. to do this, i need
to pvmove everything from sda3 to sda4. if i run
pvmove -vv /dev/sda3
i get the following:
[root@test root]# pvmove -vv /dev/sda3
Setting global/locking_type to 1
Setting global/locking_dir to /var/lock/lvm
File-based locking enabled.
/dev/sda3: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda1: No label detected
/dev/sda2: No label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
Finding volume group "test"
Locking /var/lock/lvm/V_test WB
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
Archiving volume group "test" metadata.
Creating logical volume pvmove0
Getting target version for mirror
Moving /dev/sda3:0-243 of test/root
Moving 244 extents of logical volume test/root
Finding volume group for uuid
E012hQKRPNygZImyIsXfbV68tt6S44wHwqoG5mX3FnGsaj8P9tcSwSAXi4IJ5j2P
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
Found volume group "test"
Setting activation/missing_stripe_filler to /dev/ioerror
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/test"
Finding volume group for uuid
E012hQKRPNygZImyIsXfbV68tt6S44wHwqoG5mX3FnGsaj8P9tcSwSAXi4IJ5j2P
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
Found volume group "test"
Locking memory
Suspending test-root
Finding volume group for uuid
E012hQKRPNygZImyIsXfbV68tt6S44wH860Z8k3Rbibp4LTSyFNUkHANkOMh0qdK
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
/dev/sda3: lvm2 label detected
/dev/sda4: lvm2 label detected
Found volume group "test"
Loading test-pvmove0
Setting activation/mirror_region_size to 512
and that's it... i wait around 20 minutes, nothing. no picture, no
sound... i can't do anything, so i have to do a hard reset. there is no
disk activity whatsoever.
after the reset, i quickly log in, and i find (running lvs) that it's
continuing the pvmove, like nothing happened:
[root@test root]# lvs
LV VG Attr LSize Origin Snap% Move Copy%
pvmove0 test p-C-ao 976.00M /dev/sda3 44.67
root test -wI-ao 976.00M
i didn't run anything, still, pvmove continues. after a while it
finishes, and it seems like everything is OK.
[root@test root]# lvs
LV VG Attr LSize Origin Snap% Move Copy%
root test -wi-ao 976.00M
[root@test root]#
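The `pvmove0` entry in the first `lvs` listing above is the temporary mirror LV that pvmove created ("Creating logical volume pvmove0" in the -vv output), and its `Attr` field starts with `p`. Below is a minimal sketch for spotting leftover LVs like this from a script. It assumes report lines of the form `lv_name lv_attr copy_percent`, for example from `lvs -a --noheadings -o lv_name,lv_attr,copy_percent`. Those field names are an assumption for this LVM2 version; `lvs -o help` lists the valid ones. The function name `pending_pvmoves` is hypothetical.

```shell
# pending_pvmoves (hypothetical helper): read lvs report lines of the form
#   "<lv_name> <lv_attr> <copy_percent>"
# on stdin and print "<lv_name> <copy_percent>" for each LV whose attribute
# string starts with "p", i.e. a temporary pvmove LV.
pending_pvmoves() {
    awk 'substr($2, 1, 1) == "p" { print $1, $3 }'
}

# Possible use on a live system (field names assumed, see "lvs -o help"):
#   lvs -a --noheadings -o lv_name,lv_attr,copy_percent | pending_pvmoves
```

If a pvmove LV is still listed after an interruption, the pvmove man page says that running `pvmove` again with no PV arguments restarts interrupted moves. `pvmove --abort` abandons them instead, as suggested earlier in this thread. Check `man pvmove` for the version in use.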
[root@test root]# vgdisplay -v
Finding all volume groups
Finding volume group "test"
--- Volume group ---
VG Name test
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 60
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 2
Act PV 2
VG Size 1.95 GB
PE Size 4.00 MB
Total PE 499
Alloc PE / Size 244 / 976.00 MB
Free PE / Size 255 / 1020.00 MB
VG UUID E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH
--- Logical volume ---
LV Name /dev/test/root
VG Name test
LV UUID wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
LV Write Access read/write
LV Status available
# open 1
LV Size 976.00 MB
Current LE 244
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:1
--- Physical volumes ---
PV Name /dev/sda3
PV UUID 1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
PV Status allocatable
Total PE / Free PE 245 / 245
PV Name /dev/sda4
PV UUID rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
PV Status allocatable
Total PE / Free PE 254 / 10
now everything is on sda4, i could remove sda3. but here's the thing...
i do a reboot, and i get the following error:
http://terran.noc.astral.ro/pvmove/boot.png
but if i give the root password, /dev/mapper/test-root is mounted, and
if i remount it read-write, it seems to be alright. but if i try to fsck
/dev/mapper/test-root, it still gives me this error
[root@test root]# fsck /dev/mapper/test-root
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
fsck.ext3: No such device or address while trying to open
/dev/mapper/test-root
Possibly non-existent or swap device?
[root@test root]#
i look in /dev/mapper:
[root@test root]# cd /dev/mapper
[root@test mapper]# ll
total 0
crw------- 1 root root 10, 63 Apr 27 15:43 control
brw------- 1 root root 253, 0 Apr 27 15:43 test-pvmove0
brw------- 1 root root 253, 1 Apr 27 15:43 test-root
[root@test mapper]#
it did not remove test-pvmove0, after finishing the move. another
strange thing is that now test-root has major/minor 253,1 and
test-pvmove0 has 253,0.
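One way to confirm this kind of mismatch is to compare the major/minor numbers of the nodes under /dev/mapper with what device-mapper itself reports for each mapping. Below is a minimal sketch. It assumes the device-mapper view is saved as `name major minor` lines, for example via `dmsetup info -c --noheadings --separator ' ' -o name,major,minor`; check the option and field names against `dmsetup help` for the installed version. The node view is assumed to be saved `ls -l /dev/mapper` output. The function name `stale_nodes` is hypothetical.

```shell
# stale_nodes (hypothetical helper)
#   $1: file of "name major minor" lines, as reported by device-mapper
#   $2: file containing saved "ls -l /dev/mapper" output
# Prints each block-device node whose major:minor differs from the
# device-mapper view, or that device-mapper does not know about at all.
stale_nodes() {
    awk '
        # First file: remember the kernel major:minor for each mapping name.
        FILENAME == ARGV[1] { dm[$1] = $2 ":" $3; next }
        # Second file: look only at block special files (mode starts with b).
        substr($1, 1, 1) == "b" {
            node = $NF
            maj = $5; sub(/,$/, "", maj)      # turn "253," into "253"
            have = maj ":" $6
            if (!(node in dm))
                print node, "unknown to device-mapper (node " have ")"
            else if (dm[node] != have)
                print node, "node " have ", device-mapper " dm[node]
        }
    ' "$1" "$2"
}
```

Any node it reports could then be removed and recreated, which is what the next steps do by hand with `lvm vgmknodes`. `dmsetup mknodes` is another command intended to bring /dev/mapper back in line with the kernel.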
i removed test-*
[root@test mapper]# rm test-pvmove0 test-root
rm: remove block special file `test-pvmove0'? y
rm: remove block special file `test-root'? y
then i did a lvm vgmknodes (i looked this up in /etc/rc.sysinit:)
[root@test mapper]# lvm vgmknodes
[root@test mapper]# ls -la
total 124
drwxr-xr-x 2 root root 4096 Apr 27 15:59 .
drwxr-xr-x 24 root root 118784 Apr 27 15:55 ..
crw------- 1 root root 10, 63 Apr 27 15:43 control
brw------- 1 root root 253, 0 Apr 27 15:59 test-root
it created the test-root node, but with major/minor 253,0, so now the
fsck is working again:
[root@test mapper]# fsck /dev/mapper/test-root
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/dev/mapper/test-root is mounted.
WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.
Do you really want to continue (y/n)? no
check aborted.
if i do a reboot now, everything is alright again, and in fact i got
what i wanted, it moved everything from sda3 to sda4. but what about all
these problems?
it's not over yet :)
now i want to move everything back to sda3. but this time i do a:
[root@test root]# mount / -o remount,noatime
then the moving:
[root@test root]# pvmove -v /dev/sda4
Finding volume group "test"
Archiving volume group "test" metadata.
Creating logical volume pvmove0
Moving 244 extents of logical volume test/root
Found volume group "test"
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/test"
Found volume group "test"
Found volume group "test"
Loading test-pvmove0
Found volume group "test"
Loading test-root
Checking progress every 15 seconds
/dev/sda4: Moved: 11.9%
/dev/sda4: Moved: 21.7%
/dev/sda4: Moved: 32.4%
/dev/sda4: Moved: 45.5%
/dev/sda4: Moved: 56.1%
/dev/sda4: Moved: 67.6%
/dev/sda4: Moved: 78.3%
/dev/sda4: Moved: 89.3%
/dev/sda4: Moved: 99.6%
/dev/sda4: Moved: 100.0%
Found volume group "test"
Found volume group "test"
Found volume group "test"
Loading test-pvmove0
Found volume group "test"
Loading test-root
Found volume group "test"
Found volume group "test"
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/test"
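The "Checking progress every 15 seconds" line shows pvmove's reporting interval, which the `-i`/`--interval` option controls. Suppose the verbose output is captured to a file, for instance with `pvmove -v /dev/sda4 2>&1 | tee pvmove.log` (the file name is hypothetical). A small sketch like the following then extracts the last reported percentage, for example to check afterwards whether a move ever progressed. The function name `last_progress` is hypothetical.

```shell
# last_progress (hypothetical helper): read saved "pvmove -v" output on
# stdin and print the last "Moved:" percentage, without the "%" sign.
# Prints nothing if no progress line was found.
last_progress() {
    sed -n 's/^[[:space:]]*[^ ]*: Moved: \([0-9.]*\)%.*$/\1/p' | tail -n 1
}

# Possible use:  last_progress < pvmove.log
```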
during the move, i ran lvs a couple of times; nothing unusual.
[root@test root]# lvs
LV VG Attr LSize Origin Snap% Move Copy%
pvmove0 test p-C-ao 976.00M /dev/sda4 22.95
root test -wI-ao 976.00M
but what is unusual, is this:
[root@test root]# cd /dev/mapper/
[root@test mapper]# ll
total 0
crw------- 1 root root 10, 63 Apr 27 16:01 control
brw------- 1 root root 253, 1 Apr 27 16:03 test-pvmove0
brw------- 1 root root 253, 0 Apr 27 15:59 test-root
[root@test mapper]#
now it created test-pvmove0 with 253,1, and it didn't touch test-root.
and after pvmove was finished, it removed test-pvmove0 as well. now that's
what i call a proper pvmove ;) i ran vgdisplay just to be sure:
[root@test root]# vgdisplay -v
Finding all volume groups
Finding volume group "test"
--- Volume group ---
VG Name test
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 63
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 2
Act PV 2
VG Size 1.95 GB
PE Size 4.00 MB
Total PE 499
Alloc PE / Size 244 / 976.00 MB
Free PE / Size 255 / 1020.00 MB
VG UUID E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH
--- Logical volume ---
LV Name /dev/test/root
VG Name test
LV UUID wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
LV Write Access read/write
LV Status available
# open 1
LV Size 976.00 MB
Current LE 244
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:0
--- Physical volumes ---
PV Name /dev/sda3
PV UUID 1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
PV Status allocatable
Total PE / Free PE 245 / 1
PV Name /dev/sda4
PV UUID rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
PV Status allocatable
Total PE / Free PE 254 / 254
everything is in place (on sda3). sda4 is empty like it should.
could somebody explain this to me? what's happening here?
the box i was playing on is a vmware emulated comp (128MB ram, 4GB hdd),
with a buslogic scsi adapter. i installed fedora core 2 custom, with no
packages selected, after that i did a yum update. kernel 2.6.11.7
vanilla, with no patches at all, compiled with minimum stuff. latest
lvm2 and device-mapper.
output of ps:
[root@test root]# ps ax
PID TTY STAT TIME COMMAND
1 ? S 0:02 init [3]
2 ? SWN 0:00 [ksoftirqd/0]
3 ? SW< 0:00 [events/0]
4 ? SW< 0:00 [khelper]
9 ? SW< 0:00 [kthread]
17 ? SW< 0:00 [kblockd/0]
71 ? SW 0:00 [pdflush]
72 ? SW 0:00 [pdflush]
74 ? SW< 0:00 [aio/0]
73 ? SW 0:00 [kswapd0]
659 ? SW 0:00 [kseriod]
694 ? SW 0:00 [scsi_eh_0]
715 ? SW< 0:00 [kcryptd/0]
716 ? SW< 0:00 [kmirrord/0]
751 ? SW 0:00 [kjournald]
1722 ? S 0:00 syslogd -m 0
1726 ? S 0:00 klogd -x
1749 ? S 0:00 /usr/sbin/sshd
1761 ? S 0:00 crond
1785 tty1 S 0:00 /sbin/mingetty tty1
1807 tty2 S 0:00 /sbin/mingetty tty2
1818 tty3 S 0:00 /sbin/mingetty tty3
1819 tty4 S 0:00 /sbin/mingetty tty4
1906 tty5 S 0:00 /sbin/mingetty tty5
1937 tty6 S 0:00 /sbin/mingetty tty6
1972 ? R 0:00 sshd: root@pts/0
1974 pts/0 S 0:00 -bash
2010 ? S 0:00 sshd: root@pts/1
2012 pts/1 S 0:00 -bash
2047 pts/0 R 0:00 ps ax
(sorry for the long mail;)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-lvm] pvmove hangs
2005-04-27 13:04 [linux-lvm] pvmove hangs Gergely Imre
@ 2005-04-28 9:46 ` Diaz Rodriguez, Eduardo
2005-04-28 10:23 ` Gergely Imre
0 siblings, 1 reply; 30+ messages in thread
From: Diaz Rodriguez, Eduardo @ 2005-04-28 9:46 UTC (permalink / raw)
To: LVM general discussion and development
This is too difficult for me. I only have one idea: update LVM to the latest version and
check the changelog for any updates related to pvmove.
If you solve the problem, please summarize.
Good Luck!
On Wed, 27 Apr 2005 16:04:20 +0300, Gergely Imre wrote
> let me 'restart' this thread. i ran into another problem. or it's
> the same?
=======================================================================================
In your troubles and toils, turn to proverbs.
=======================================================================================
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-lvm] pvmove hangs
2005-04-28 9:46 ` Diaz Rodriguez, Eduardo
@ 2005-04-28 10:23 ` Gergely Imre
2005-04-28 11:13 ` Diaz Rodriguez, Eduardo
0 siblings, 1 reply; 30+ messages in thread
From: Gergely Imre @ 2005-04-28 10:23 UTC (permalink / raw)
To: LVM general discussion and development
the main question is: why does it keep hanging when i want to move "/"? it
doesn't even begin to move the PEs. i tried moving some other live
partitions while writing to them, and that worked.
Diaz Rodriguez, Eduardo wrote:
> This is too difficult for me. I only have one idea: update LVM to the latest version and
> check the changelog for any updates related to pvmove.
>
> If you solve the problem, please summarize.
>
> Good Luck!
>
> On Wed, 27 Apr 2005 16:04:20 +0300, Gergely Imre wrote
>
>>let me 'restart' this thread. i ran into another problem. or it's
>>the same?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-lvm] pvmove hangs
2005-04-28 10:23 ` Gergely Imre
@ 2005-04-28 11:13 ` Diaz Rodriguez, Eduardo
0 siblings, 0 replies; 30+ messages in thread
From: Diaz Rodriguez, Eduardo @ 2005-04-28 11:13 UTC (permalink / raw)
To: LVM general discussion and development
The / partition is different.
I am sure that if you did the same thing, but imported your VGs on another machine
and moved them there, you would not have problems.
The / partition holds the modules and a lot of other data that the system can't move
because they are in use (I suppose)....
Where do you keep your boot files?
lrwxrwxrwx 1 root root 28 Apr 13 21:50 vmlinuz -> boot/vmlinuz-2.4.30.20050413
lrwxrwxrwx 1 root root 26 Apr 12 20:20 vmlinuz.old -> boot/vmlinuz-2.4.30.050412
/dev/hda2 5763648 4796612 674252 88% /
I have kernel files in / :-D, yes, I know it isn't the best way ....
:-| I think this would be a good question for the developers :-)....
On Thu, 28 Apr 2005 13:23:14 +0300, Gergely Imre wrote
> the main question is: why does it keep hanging when i want to move "/"?
> it doesn't even begin to move the PEs. i tried moving some other
> live partitions while writing to them, and that worked.
>
> Diaz Rodriguez, Eduardo wrote:
> > This is too difficult for me. I only have one idea: update LVM to the latest version and
> > check the changelog for any updates related to pvmove.
> >
> > If you solve the problem, please summarize.
> >
> > Good Luck!
> >
> > On Wed, 27 Apr 2005 16:04:20 +0300, Gergely Imre wrote
> >
> >>let me 'restart' this thread. i ran into another problem. or it's
> >>the same?
^ permalink raw reply [flat|nested] 30+ messages in thread
* [linux-lvm] pvmove hangs
@ 2010-08-17 19:26 Allen, Jack
2010-08-17 22:12 ` Thomas Hager
0 siblings, 1 reply; 30+ messages in thread
From: Allen, Jack @ 2010-08-17 19:26 UTC (permalink / raw)
To: LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 10628 bytes --]
Hello:
I posted this on the Dm-devel list yesterday afternoon, but so
far I have not gotten any responses, so I thought I would ask the same
questions here, since the command that hangs is pvmove.
I had a customer who tried to do a pvmove and it hung. So we
set up a test system to try to duplicate the problem, and we were able to.
A little history, and why I am asking the question on this list:
The customer needed to move from an existing SAN to a new SAN and wanted
as little downtime as possible for the Application. So they zoned the
new SAN for access by the system and then added the new LUNs to the
existing Volume Group. Then they ran the pvmove commands. It worked with no
problem on one of the PVs, but on the second one all the I/O from the
Application hung, as did any commands that access the LVM information, such
as vgdisplay.
On our test system we only have 1 SAN (EMC CX700). We put X
number of LUNs in a Volume Group and allocated Logical Volumes for the
Application. Added some more LUNs to the Volume Group to simulate a
second SAN. Started the Application with a test program to generate I/O.
Ran pvmove with no problems on one PV, but on the second PV, it hung
just like on the customer's system.
The reason I am posting to this list is that the same type of
move was done earlier on the test system running PowerPath and did not
have any problems. The OS is Red Hat EL 5.5 32 bit. The same version of
LVM was used on both tests. I can provide other details if needed.
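The migration described above has three steps: add the new LUNs to the existing VG, pvmove each old PV, and then, presumably, drop the old PVs from the VG. These map onto the standard `vgextend`, `pvmove` and `vgreduce` commands. Below is a minimal dry-run sketch that only prints the command sequence for review instead of running it. The function name `plan_pv_migration` and the VG/device names in the test are hypothetical.

```shell
# plan_pv_migration (hypothetical helper): print, but do not run, the LVM
# commands for moving a VG from one set of PVs to another.
#   $1: volume group name
#   $2: space-separated list of old PVs
#   $3: space-separated list of new PVs (same count as the old PVs)
# Each old PV is paired with the new PV at the same position.
plan_pv_migration() {
    vg=$1; old=$2; new=$3
    set -- $old; n_old=$#
    set -- $new; n_new=$#           # positional params now hold the new PVs
    if [ "$n_old" -ne "$n_new" ]; then
        echo "plan_pv_migration: old/new PV counts differ" >&2
        return 1
    fi
    echo "vgextend $vg $new"
    for src in $old; do
        echo "pvmove $src $1"       # move one PV at a time onto its partner
        shift
    done
    echo "vgreduce $vg $old"
}
```

Running the printed `pvmove` lines one at a time, and checking `lvs` between them, keeps a single temporary mirror active at once. That may make it easier to see which step hangs.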
Below is part of the messages file from when this happened.
Aug 13 14:18:14 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: remove map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-17: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-20: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-23: add map (uevent)
Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk D 00000BD7 1784 22158 22151 22159 22157 (NOTLB)
Aug 13 14:27:22 mss121 kernel: f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:27:22 mss121 kernel: f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:27:22 mss121 kernel: 00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel: [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel: [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel: [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel: [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel: [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel: [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel: [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel: [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel: [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel: [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel: [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel: [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel: [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel: [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel: [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel: [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel: [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel: [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel: =======================
Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk D 00000BD7 1884 22161 22151 22162 22160 (NOTLB)
Aug 13 14:27:22 mss121 kernel: f34e2e04 00000082 baebd585 00000bd7 f34e2e50 c045d1a9 f34e2e50 0000000a
Aug 13 14:27:22 mss121 kernel: f6eb1550 baec5e00 00000bd7 0000887b 00000000 f6eb165c c8612700 f723f040
Aug 13 14:27:22 mss121 kernel: 00000000 00000000 00000000 c12e1f80 018dc68e c042cbd1 f6cb3bdc ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel: [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel: [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel: [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel: [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel: [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel: [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel: [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel: [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel: [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel: [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel: [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel: [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel: [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel: [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel: [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel: [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel: [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel: [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel: [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel: [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel: =======================
Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:29:22 mss121 kernel: "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:29:22 mss121 kernel: mpdsk D 00000BD7 1784 22158 22151 22159 22157 (NOTLB)
Aug 13 14:29:22 mss121 kernel: f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:29:22 mss121 kernel: f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:29:22 mss121 kernel: 00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:29:22 mss121 kernel: Call Trace:
Aug 13 14:29:22 mss121 kernel: [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:29:22 mss121 kernel: [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:29:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:29:22 mss121 kernel: [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:29:22 mss121 kernel: [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:29:22 mss121 kernel: [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:29:22 mss121 kernel: [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:29:22 mss121 kernel: [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:29:22 mss121 kernel: [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:29:22 mss121 kernel: [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:29:22 mss121 kernel: [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:29:22 mss121 kernel: [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:29:22 mss121 kernel: [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:29:22 mss121 kernel: [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:29:22 mss121 kernel: [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:29:22 mss121 kernel: [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:29:22 mss121 kernel: [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:29:22 mss121 kernel: [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:29:22 mss121 kernel: [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:29:22 mss121 kernel: [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:29:22 mss121 kernel: [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:29:22 mss121 kernel: [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:29:22 mss121 kernel: [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:29:22 mss121 kernel: =======================
Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
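When several processes are stuck, it can help to reduce a long excerpt like this to one line per blocked task. The following is one possible way to do that in Python. It assumes that each `INFO: task ... blocked for more than N seconds` banner sits on a single line with the classic `Mon DD HH:MM:SS host kernel:` syslog prefix. Mail clients that re-wrap long lines will break the match. The names HUNG_RE, summarize_hung_tasks and SAMPLE are illustrative, and the sample lines are taken from the messages above, rejoined onto single lines.

```python
import re

# Hung-task banner as printed by the kernel's hung-task detector, with the
# classic syslog prefix "Mon DD HH:MM:SS host kernel:".
HUNG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<comm>[^:\s]+):(?P<pid>\d+) blocked for more than "
    r"(?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Group hung-task banners by (command, PID).

    Returns a dict, in order of first appearance, mapping (comm, pid) to the
    number of reports and the first and last timestamps seen.
    """
    summary = {}
    for line in lines:
        m = HUNG_RE.match(line)
        if m is None:
            continue  # not a hung-task banner (call trace, multipathd, ...)
        key = (m.group("comm"), int(m.group("pid")))
        entry = summary.setdefault(
            key, {"reports": 0, "first": m.group("ts"), "last": m.group("ts")}
        )
        entry["reports"] += 1
        entry["last"] = m.group("ts")
    return summary


SAMPLE = [
    "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.",
    "Aug 13 14:27:22 mss121 kernel: [<c061c156>] io_schedule+0x36/0x59",
    "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.",
    "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.",
    "Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.",
]

if __name__ == "__main__":
    for (comm, pid), info in summarize_hung_tasks(SAMPLE).items():
        # first line printed: "mpdsk 22158: 2 reports, Aug 13 14:27:22 .. Aug 13 14:29:22"
        print(f"{comm} {pid}: {info['reports']} reports, {info['first']} .. {info['last']}")
```

Both mpdsk PIDs are reported again two minutes later. This most likely means they stayed blocked across the detector's checks rather than blocking twice.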
The mpdsk processes above belong to the Application, a MUMPS database
(not an RDBMS) that writes data blocks directly to a raw Logical Volume
(no file system involved). It would have been writing during both
pvmoves. I know pvmove is part of LVM2, but I am asking here because the
move worked with PowerPath and failed with Multipath while everything
else was the same.
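The call trace above runs through sys_write, blkdev_file_write, generic_file_aio_write_nolock, sync_page_range_nolock, generic_osync_inode and wait_on_page_writeback_range. This suggests the Application's writes to the raw LV were synchronous. In kernels of this era, generic_file_aio_write_nolock calls sync_page_range_nolock only when the file was opened with O_SYNC or the inode is marked synchronous. Each write() therefore waits for writeback to finish. If the device underneath stops completing I/O, that wait does not end and the task sits in state D.

The minimal sketch below shows such a write from user space. It rests on a few assumptions:

* The writes use O_SYNC.
* The block size is fixed. BLOCK_SIZE and write_block_sync are hypothetical names, and the thread does not say how the Application opens its LVs.
* It targets a temporary file rather than a real LV.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical database block size


def write_block_sync(path, block_no, data):
    """Write exactly one block at offset block_no * BLOCK_SIZE using O_SYNC.

    With O_SYNC the write call does not return until the data has been
    written through to the underlying storage, so if that storage stops
    completing I/O (for example, a suspended device-mapper device) the
    caller blocks inside the kernel.
    """
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        return os.pwrite(fd, data, block_no * BLOCK_SIZE)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Stand-in for a raw LV: a temporary file preallocated to four blocks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * BLOCK_SIZE * 4)
        target = tmp.name
    try:
        print(write_block_sync(target, 2, b"M" * BLOCK_SIZE), "bytes written")
    finally:
        os.remove(target)
```

Pointing this at an actual LV device would need root and would overwrite data, which is why the example uses a scratch file.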
_____
Jack Allen
* Re: [linux-lvm] pvmove hangs
2010-08-17 19:26 Allen, Jack
@ 2010-08-17 22:12 ` Thomas Hager
2010-08-17 23:09 ` Allen, Jack
0 siblings, 1 reply; 30+ messages in thread
From: Thomas Hager @ 2010-08-17 22:12 UTC (permalink / raw)
To: linux-lvm
On Tue, 2010-08-17 at 15:26 -0400, Allen, Jack wrote:
> I know pvmove is part of LVM2, but because it worked with PowerPath and
> not when using Multipath and all other things are the same is the
> reason I am asking the questions here.
we had similar problems with Novell SLES every now and then; they were
not reproducible and occurred at random times.
among them:
- the pvmove simply stalled, outputting the same %-done message every 15
seconds until we reset the server.
- the server crashed and performed an automated reboot.
and worst:
- pvmove threw an I/O error immediately after starting, yet still
committed all pending moves, so all data previously residing on the old
LUN was lost :(
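Tom's first symptom, pvmove printing the same percentage every 15 seconds, can be watched for automatically. The sketch below flags a stall when the reported percentage stops changing. Its assumptions:

* Progress lines contain `Moved: <number>%`, for example `/dev/sdb1: Moved: 0.2%`. Check what your LVM version actually prints.
* The threshold of 8 repeats is an arbitrary choice, about two minutes at a 15-second interval.
* detect_stall and PROGRESS_RE are hypothetical names.

```python
import re

# Assumed progress-line format, e.g. "  /dev/sdb1: Moved: 0.2%".
PROGRESS_RE = re.compile(r"Moved:\s*(\d+(?:\.\d+)?)%")


def detect_stall(lines, max_repeats=8):
    """Return the stuck percentage if one value is repeated max_repeats
    times in a row after its first report, otherwise None.

    Lines that do not look like progress reports are ignored.
    """
    last = None
    repeats = 0
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m is None:
            continue
        pct = float(m.group(1))
        if pct == last:
            repeats += 1
            if repeats >= max_repeats:
                return pct
        else:
            last = pct
            repeats = 0
    return None


if __name__ == "__main__":
    stuck = ["  /dev/sdb1: Moved: 0.2%"] * 10
    print(detect_stall(stuck))  # 0.2
```

In practice you would feed it pvmove's progress output line by line and raise an alert. Deciding to abandon a stuck move (pvmove --abort) is better left to a person.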
Novell provided several updates to the kernel, lvm2 and device-mapper,
but there may still be bugs lurking that we haven't triggered yet. one
piece of advice they gave us was to migrate the PEs of only one LV at a
time, which we followed afterwards.
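Novell's advice to move the PEs of one LV at a time maps onto pvmove's -n (--name) option, which restricts a run to the extents of a single LV. The sketch below only builds the command lines, so they can be reviewed before they are run one after another. The function name per_lv_pvmove_commands and the device and LV names in the example are hypothetical. The optional -v matches what Jack did in his follow-up to get more detail.

```python
import shlex


def per_lv_pvmove_commands(source_pv, lv_names, dest_pv=None, verbose=True):
    """Build one pvmove invocation per LV so extents move one LV at a time.

    Each command uses -n/--name to limit pvmove to that LV's extents on
    source_pv. Nothing is executed here.
    """
    commands = []
    for lv in lv_names:
        cmd = ["pvmove"]
        if verbose:
            cmd.append("-v")
        cmd += ["-n", lv, source_pv]
        if dest_pv is not None:
            cmd.append(dest_pv)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical multipath devices and LV names.
    for cmd in per_lv_pvmove_commands(
        "/dev/mapper/mpath0", ["lv_db1", "lv_db2"], "/dev/mapper/mpath4"
    ):
        print(shlex.join(cmd))
```

Each command could then be run only after the previous one has finished, for instance with `subprocess.run(cmd, check=True)`. The intent is to keep a failure confined to one LV.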
we've seen this behaviour only on SLES, which is the distribution we use
on most of our servers. the few Red Hat servers we have migrated fine
with pvmove, though we didn't migrate much on them (only a few hundred
GB, compared to the ~50TB we had to migrate on SLES). and it was not
related to the storage driver we used: we faced the same issues with
HP's adapted QLogic driver as well as with dm-mp.
anyway, you should definitely open an SR in Red Hat's CC so they can
investigate the issue more closely.
hth,
tom.
--
Thomas "Duke" Hager duke@sigsegv.at
GPG: 1024D/D27F858C http://www.sigsegv.at/gpg/duke.gpg
=================================================================
"Never Underestimate the Power of Stupid People in Large Groups."
* Re: [linux-lvm] pvmove hangs
2010-08-17 22:12 ` Thomas Hager
@ 2010-08-17 23:09 ` Allen, Jack
0 siblings, 0 replies; 30+ messages in thread
From: Allen, Jack @ 2010-08-17 23:09 UTC (permalink / raw)
To: LVM general discussion and development
-----Original Message-----
From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com] On Behalf Of Thomas Hager
Sent: Tuesday, August 17, 2010 6:12 PM
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] pvmove hangs
Thanks for the info, Tom.
After sending the post, I tried the pvmove command several more times,
this time adding the -v option. One PV had 3 LVs on it. pvmove completed
the first LV with no problem; then, as it completed the second LV, it
displayed a message about suspending the LV (it did this for the first
LV too), and that is when everything related to the PV hung. Any LVM
command run against the VG hangs, and if you abort it with ^C it reports
that it was aborted while waiting on flock /var/lock/lvm/X, where X is
the VG name. If I remove that file I can run LVM commands against the VG
again, but the pvmove is still hung. So there seems to be some race
condition, a deadly embrace (a catch-22) of two resources waiting on
each other: writes are waiting because the LV is suspended, and the
suspend is waiting because there are outstanding writes to be done.
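Jack's observation that deleting the lock file lets other LVM commands proceed while pvmove stays hung is consistent with how flock() works. The lock belongs to the open file, not to the path. After the file is unlinked, a new command creates and locks a fresh file, while the hung process still holds its lock on the old, now nameless one.

A less drastic first step is to test whether the lock is held at all. The sketch below assumes the VG lock is an flock() on the file Jack names (/var/lock/lvm/X). lock_is_held and lockcheck.py are hypothetical names. Because it probes with a shared lock, it only detects exclusive holders.

```python
import errno
import fcntl
import os
import sys


def lock_is_held(path):
    """Return True if some other open file holds an exclusive flock() on path.

    Probes with a non-blocking shared lock, so a shared (read) lock held
    elsewhere is not reported. The file is never written or removed.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return False  # no lock file, so nothing can be holding it
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return True  # an exclusive lock is held elsewhere
        raise
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)  # release our probe immediately
        return False
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python lockcheck.py /path/to/lock/file
    print("held" if lock_is_held(sys.argv[1]) else "free")
```

Pairing a "held" result with a list of the processes that have the file open (for example via lsof) would show which command owns the lock, without removing anything.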
I plan to open a case with Red Hat, but was hoping the problem had
already been addressed. I am still concerned that it fails with
multipath but works fine with PowerPath, because I would rather use
multipath: it makes doing yum updates a lot easier.
------
Jack Allen
Thread overview: 30+ messages
2005-04-27 13:04 [linux-lvm] pvmove hangs Gergely Imre
2005-04-28 9:46 ` Diaz Rodriguez, Eduardo
2005-04-28 10:23 ` Gergely Imre
2005-04-28 11:13 ` Diaz Rodriguez, Eduardo
-- strict thread matches above, loose matches on Subject: below --
2010-08-17 19:26 Allen, Jack
2010-08-17 22:12 ` Thomas Hager
2010-08-17 23:09 ` Allen, Jack
2005-04-26 9:19 Gergely Imre
2005-04-26 13:30 ` Diaz Rodriguez, Eduardo
2005-04-27 5:46 ` Gergely Imre
2003-08-16 5:21 Jan Niehusmann
2003-08-16 10:48 ` Jan Niehusmann
2003-08-16 13:57 ` Alasdair G Kergon
2003-08-17 11:11 ` Jan Niehusmann
2003-08-17 11:34 ` Alasdair G Kergon
2003-08-17 11:41 ` Jan Niehusmann
2003-08-17 12:00 ` Alasdair G Kergon
2003-08-17 18:15 ` Jan Niehusmann
2003-08-18 6:46 ` Alasdair G Kergon
2003-08-18 7:07 ` Jan Niehusmann
2003-08-18 9:13 ` Alasdair G Kergon
2003-08-18 9:30 ` Jan Niehusmann
[not found] ` <20030817114638.GA1839@gondor.com>
2003-08-17 12:42 ` Alasdair G Kergon
2003-08-17 13:27 ` Jan Niehusmann
2003-08-17 13:50 ` Alasdair G Kergon
2003-08-17 13:55 ` Alasdair G Kergon
2003-08-18 12:58 ` Alasdair G Kergon
2003-08-18 13:21 ` Jan Niehusmann
2003-08-18 17:55 ` Jan Niehusmann
2003-08-19 17:52 ` Jan Niehusmann