* [linux-lvm] pvmove hangs
@ 2003-08-16  5:21 Jan Niehusmann
  2003-08-16 10:48 ` Jan Niehusmann
  2003-08-16 13:57 ` Alasdair G Kergon
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-16  5:21 UTC (permalink / raw)
  To: linux-lvm

Hi!

On my computer, pvmove just hangs at 0.2% done. First it looked like the
reason was that I hadn't loaded dm-mirror before calling pvmove.
Modprobe was called automatically, but tried to write to
/var/log/ksymoops/, which was on one of the lvs to move.

But then I booted to single-user mode and didn't mount any partitions on
lvm, and still, pvmove was hanging at 0.2%.

Another unusual detail about my installation is that the target pv is on
a degraded raid1 array. Perhaps there is some locking issue?

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-16  5:21 Jan Niehusmann
@ 2003-08-16 10:48 ` Jan Niehusmann
  2003-08-16 13:57 ` Alasdair G Kergon
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-16 10:48 UTC (permalink / raw)
  To: linux-lvm

On Sat, Aug 16, 2003 at 12:20:06PM +0200, Jan Niehusmann wrote:
> On my computer, pvmove just hangs at 0.2% done. First it looked like the
> reason was that I hadn't loaded dm-mirror before calling pvmove.

I just noticed I forgot an important detail: Version numbers :-)

# pvmove --version
  LVM version:     2.00.05 (2003-07-18)
  Library version: 1.00.02-ioctl (2003-07-12)
  Driver version:  4.0.1

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-16  5:21 Jan Niehusmann
  2003-08-16 10:48 ` Jan Niehusmann
@ 2003-08-16 13:57 ` Alasdair G Kergon
  2003-08-17 11:11   ` Jan Niehusmann
                     ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-16 13:57 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Sat, Aug 16, 2003 at 12:20:06PM +0200, Jan Niehusmann wrote:
> On my computer, pvmove just hangs at 0.2% done. 

  If pvmove fails, you can use 'pvmove --abort' to abandon it
  or 'pvmove' on its own [or with same source pv as before] to
  retry.  'vgchange -ay' or 'lvchange -ay' may also attempt to
  restart pvmoves in progress (leaving a pvmove daemon running).

> But then I booted to single-user mode and didn't mount any partitions on
> lvm, and still, pvmove was hanging at 0.2%.

Need more precise description of what you mean by 'hanging'.

And the usual diagnostics:  debug log file from the command that should
have set the pvmove running, any kernel error messages, lvm.conf, a copy 
of the current VG's metadata [e.g. with vgcfgbackup -f] etc.
If pvmove process itself freezes, then a long/wide 'ps' output for that
process, kcopyd & kmirrord (incl cols: NI, VSZ, RSS, STAT 
and symbolic WCHAN).
 
Alasdair
-- 
agk@uk.sistina.com


* Re: [linux-lvm] pvmove hangs
  2003-08-16 13:57 ` Alasdair G Kergon
@ 2003-08-17 11:11   ` Jan Niehusmann
  2003-08-17 11:34     ` Alasdair G Kergon
  2003-08-17 18:15   ` Jan Niehusmann
       [not found]   ` <20030817114638.GA1839@gondor.com>
  2 siblings, 1 reply; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-17 11:11 UTC (permalink / raw)
  To: linux-lvm

Good news (perhaps?):
If I move only single LVs (i.e., I specify one LV with -n on pvmove), it
seems to work.
Perhaps the problem is that source and target PV have exactly the same
number of extents? Off-by-one error, if the target volume gets completely
filled up by the pvmove?

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-17 11:11   ` Jan Niehusmann
@ 2003-08-17 11:34     ` Alasdair G Kergon
  2003-08-17 11:41       ` Jan Niehusmann
  0 siblings, 1 reply; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-17 11:34 UTC (permalink / raw)
  To: linux-lvm

On Sun, Aug 17, 2003 at 06:10:22PM +0200, Jan Niehusmann wrote:
> Perhaps the problem is that source and target PV have exactly the same
> number of extents? Off-by-one error, if the target volume gets completely
> filled up by the pvmove?

Who knows?

The log file might show whether that theory is correct or not.

If you want me to look at this, you need to supply some of the 
information requested in my last email: your description of the problem
so far is too vague.
 
Alasdair
-- 
agk@uk.sistina.com


* Re: [linux-lvm] pvmove hangs
  2003-08-17 11:34     ` Alasdair G Kergon
@ 2003-08-17 11:41       ` Jan Niehusmann
  2003-08-17 12:00         ` Alasdair G Kergon
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-17 11:41 UTC (permalink / raw)
  To: linux-lvm

On Sun, Aug 17, 2003 at 05:33:26PM +0100, Alasdair G Kergon wrote:
> The log file might show whether that theory is correct or not.

I sent it to the list a few hours ago, but it seems like it didn't go
through. Perhaps it exceeded some size limitation?

However, I have now put the files on http://gondor.com/lvm/
They contain the following:

lvm2.log	debug log of pvmove, vgcfgbackup (to save the new
		config) and pvmove --abort
lvm.conf	obvious
config-old	config before pvmove
config-new	config after pvmove
ps		output of ps while (first) pvmove is running

I checked the off-by-one theory by filling up most of the target PV
with a dummy partition and moving a single LV which fits exactly into
the remaining extents. The pvmove succeeded, so this probably wasn't the
cause.

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-17 11:41       ` Jan Niehusmann
@ 2003-08-17 12:00         ` Alasdair G Kergon
  0 siblings, 0 replies; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-17 12:00 UTC (permalink / raw)
  To: linux-lvm

On Sun, Aug 17, 2003 at 06:40:28PM +0200, Jan Niehusmann wrote:
> I sent it to the list a few hours ago, but it seems like it didn't go
> through. Perhaps it exceeded some size limitation?

Quite likely - after the 1.9MB message slipped through the net
last week.

Alasdair
-- 
agk@uk.sistina.com


* Re: [linux-lvm] pvmove hangs
       [not found]   ` <20030817114638.GA1839@gondor.com>
@ 2003-08-17 12:42     ` Alasdair G Kergon
  2003-08-17 13:27       ` Jan Niehusmann
  2003-08-18 12:58     ` Alasdair G Kergon
  1 sibling, 1 reply; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-17 12:42 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Sun, Aug 17, 2003 at 01:46:38PM +0200, Jan Niehusmann wrote:
> I hope this helps - do you have a suggestion on what to try next?

This is the first clue:
   activate/dev_manager.c:487 pvmove  Mirror status: 2 003:004 009:002 383/384
   activate/dev_manager.c:532 pvmove  Mirror percent: 99.739586

383/384 means the kernel mirror has only successfully copied 383 out 
of 384 512-KB regions from (major, minor) (3,4) to (9,2).

The first pvmove segment has 
  12 16MB extents = 393216 sectors = 384 512KB regions.
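
The arithmetic behind these figures can be checked with a short sketch.
The sizes (16 MB extents, 512-byte sectors, 1024-sector regions) are the
ones quoted in this thread; the function names are purely illustrative.
Note that the percentage in the log (99.739586) differs from the exact
value in the last digits, presumably because of single-precision rounding.

```python
SECTOR_BYTES = 512                 # sector size used throughout the thread
EXTENT_BYTES = 16 * 1024 * 1024    # 16 MB physical extents
REGION_SECTORS = 1024              # one mirror region = 1024 sectors = 512 KB


def segment_geometry(extents):
    """Return (sectors, regions) for a pvmove segment of `extents` extents."""
    sectors = extents * EXTENT_BYTES // SECTOR_BYTES
    regions = sectors // REGION_SECTORS
    return sectors, regions


def mirror_percent(done, total):
    """Percentage of mirror regions reported as copied."""
    return 100.0 * done / total


sectors, regions = segment_geometry(12)
print(sectors, regions)                      # 393216 384
print(f"{mirror_percent(383, 384):.4f}")     # 99.7396
```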

Can you check (e.g. using dd) that all the old sectors can be read OK,
and the new ones written to OK?  (You're not seeing kernel error
messages, though; and some volume moves work OK, so an off-by-one theory
is unlikely, or nothing would work?)

The first entry of 'dmsetup table vgraid-pvmove0' would show
you the sector ranges - 30448+3156 on pv1=/dev/hda4, length above
& 384 on pv0=/dev/md2 ? 

[NB Be careful not to overwrite data - if you've done some
other successful pvmoves since, some of those /dev/md2 sectors 
might now be allocated onto.]

We haven't implemented bad sector error reporting and handling yet,
so for now, you have to construct pvmoves to avoid any bad sectors.

Alasdair


* Re: [linux-lvm] pvmove hangs
  2003-08-17 12:42     ` Alasdair G Kergon
@ 2003-08-17 13:27       ` Jan Niehusmann
  2003-08-17 13:50         ` Alasdair G Kergon
  2003-08-17 13:55         ` Alasdair G Kergon
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-17 13:27 UTC (permalink / raw)
  To: linux-lvm

On Sun, Aug 17, 2003 at 06:41:38PM +0100, Alasdair G Kergon wrote:
> 383/384 means the kernel mirror has only successfully copied 383 out 
> of 384 512-KB regions from (major, minor) (3,4) to (9,2).
> 
> The first pvmove segment has 
>   12 16MB extents = 393216 sectors = 384 512KB regions.

Well, now after I did some pvmoves of individual LVs, pvmove /dev/hda4
hangs at a different point:

activate/dev_manager.c:487 pvmove  Mirror status: 2 003:004 009:002 511/512
activate/dev_manager.c:532 pvmove  Mirror percent: 99.804688

dmsetup says:

0 524288 mirror core 1 1024 2 003:004 111146736 009:002 56852864 
524288 229376 linear 003:004 88864496
753664 2326528 linear 003:004 89093872
[...]

I hope I understand these numbers correctly: The first line means that
the virtual device vgraid-pvmove0 starts with 524288 blocks (512 bytes
each). These are located at position 111146736 on /dev/hda4 and 
56852864 on /dev/md2. (But I don't know what core 1 1024 2 means.)

I can read/write these areas with dd, without error messages.
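
A read-only version of such a check could be scripted roughly as below.
This is only a sketch: the helper name is made up, the offsets are the
ones from the first table line above, and the script merely builds dd
command lines that read the region and discard the data. It does not
run them, and it deliberately does not generate any write command.

```python
def dd_read_command(device, start_sector, sector_count):
    """Build a dd command that reads a sector range and discards the data."""
    return (f"dd if={device} of=/dev/null bs=512 "
            f"skip={start_sector} count={sector_count}")


# Source and target areas of the first (mirrored) segment, in 512-byte sectors.
print(dd_read_command("/dev/hda4", 111146736, 524288))
print(dd_read_command("/dev/md2", 56852864, 524288))
```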

> We haven't implemented bad sector error reporting and handling yet,
> so for now, you have to construct pvmoves to avoid any bad sectors.

I don't think the drives have bad sectors - at least I haven't seen any
signs of bad sectors so far.

And I pvmoved a 19GB LV, much more than the global pvmove gets done.
All the moves had decent performance, ~20MB/s, where the source drive
only reads ~30MB/s with dd. The only strange thing I saw was that there
sometimes were a few seconds without any disk activity at all. 

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-17 13:27       ` Jan Niehusmann
@ 2003-08-17 13:50         ` Alasdair G Kergon
  2003-08-17 13:55         ` Alasdair G Kergon
  1 sibling, 0 replies; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-17 13:50 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Sun, Aug 17, 2003 at 08:26:03PM +0200, Jan Niehusmann wrote:
> only reads ~30MB/s with dd. The only strange thing I saw was that there
> sometimes were a few seconds without any disk activity at all. 

That'll be because of the pvmove polling mode - by default
it only checks (and displays) each segment's pvmove progress
every 15 seconds.   For 5 sec. intervals use pvmove -i5
or for best performance (but no progress displays *during*
segments) try -i0.

Alasdair


* Re: [linux-lvm] pvmove hangs
  2003-08-17 13:27       ` Jan Niehusmann
  2003-08-17 13:50         ` Alasdair G Kergon
@ 2003-08-17 13:55         ` Alasdair G Kergon
  1 sibling, 0 replies; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-17 13:55 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Sun, Aug 17, 2003 at 08:26:03PM +0200, Jan Niehusmann wrote:
> (But I don't know what core 1 1024 2 means)

core means the data recording the state of the mirror is
held in-core - i.e. it is not a persistent mirror
[No other options apart from 'core' available yet]

1 means 1 parameter associated with 'core' follows:
  1024 is the number of sectors moved at once (ie 512KB)

2 means 2 mirror devices follow
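
Read that way, a mirror line from 'dmsetup table' can be split into its
fields as in the sketch below. The field layout follows the explanation
above and the table line quoted earlier in the thread; the function and
key names are illustrative and not part of any LVM or device-mapper API.

```python
def parse_mirror_line(line):
    """Split one 'mirror' line of dmsetup table output into named fields."""
    fields = line.split()
    start, length, target = int(fields[0]), int(fields[1]), fields[2]
    if target != "mirror":
        raise ValueError(f"not a mirror target: {target}")
    log_type = fields[3]                    # e.g. 'core' (in-memory log)
    n_log_args = int(fields[4])             # number of log parameters
    log_args = fields[5:5 + n_log_args]     # here: region size in sectors
    pos = 5 + n_log_args
    n_mirrors = int(fields[pos])            # number of (device, offset) pairs
    pos += 1
    mirrors = [(fields[pos + 2 * i], int(fields[pos + 2 * i + 1]))
               for i in range(n_mirrors)]
    return {"start": start, "length": length, "log_type": log_type,
            "log_args": log_args, "mirrors": mirrors}


parsed = parse_mirror_line(
    "0 524288 mirror core 1 1024 2 003:004 111146736 009:002 56852864")
region_sectors = int(parsed["log_args"][0])
print(parsed["mirrors"])
print(parsed["length"] // region_sectors)   # number of regions in the segment
```

For the quoted line this gives 512 regions, which matches the 511/512
shown in the mirror status.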
 
Alasdair
-- 
agk@uk.sistina.com


* Re: [linux-lvm] pvmove hangs
  2003-08-16 13:57 ` Alasdair G Kergon
  2003-08-17 11:11   ` Jan Niehusmann
@ 2003-08-17 18:15   ` Jan Niehusmann
  2003-08-18  6:46     ` Alasdair G Kergon
  2003-08-18  9:13     ` Alasdair G Kergon
       [not found]   ` <20030817114638.GA1839@gondor.com>
  2 siblings, 2 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-17 18:15 UTC (permalink / raw)
  To: linux-lvm

On Sat, Aug 16, 2003 at 07:56:06PM +0100, Alasdair G Kergon wrote:
> If pvmove process itself freezes, then a long/wide 'ps' output for that
> process, kcopyd & kmirrord (incl cols: NI, VSZ, RSS, STAT 
> and symbolic WCHAN).

I noticed the WCHAN fields in my ps output were not very useful, because
they pointed to a module and were not decoded.

So I took the numeric value (92fa83) and looked it up by hand. It seems
to point to the schedule() call in line 56 of dm-daemon.c for both
kcopyd and kmirrord. 

And then, I made a quite worrying observation: When the hanging happens,
the device being copied is in a completely garbled state. In this case,
/dev/vgraid/lvol5 didn't look like an ext3 filesystem at all. 

So I looked at the underlying devices, and both /dev/hda4 and /dev/md2 had
a copy of the actual filesystem, only differing in the last 512K. This
conforms to the fact that the mirroring was done for all but one 512K
block. But these 512K are completely different (target device all zeroes).

However, these are only the underlying devices. Looking at
/dev/vgraid/lvol5 a little bit closer revealed that it contained parts
of some other filesystem. Very strange. And worrying. I don't even want
to know if writing to this LV would overwrite some unrelated partition.

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-17 18:15   ` Jan Niehusmann
@ 2003-08-18  6:46     ` Alasdair G Kergon
  2003-08-18  7:07       ` Jan Niehusmann
  2003-08-18  9:13     ` Alasdair G Kergon
  1 sibling, 1 reply; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-18  6:46 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Mon, Aug 18, 2003 at 01:14:15AM +0200, Jan Niehusmann wrote:
> So I looked at the underlying devices, and both /dev/hda4 and /dev/md2 had
> a copy of the actual filesystem, only differing in the last 512K. This
> conforms to the fact that the mirroring was done for all but one 512K
> block. But these 512K are completely different (target device all zeroes).
 
Please confirm which kernel you are using, and the device-mapper
patch(es) you applied.  [patches/linux-2.4.2?* from the 1.00.02
device-mapper tarball?]

Some more diagnostics:
  check that you have 'activation = 1' and 'level = 7' in the log{}
  section of lvm.conf and if not, recreate a problem pvmove application log 
  (The one you put on the web seems incomplete.)
  ['activation = 1' setting is only meant for use when diagnosing
  problems like this - don't leave it there permanently]

> However, these are only the underlying devices. Looking at
> /dev/vgraid/lvol5 a little bit closer revealed that it contained parts
> of some other filesystem. Very strange. And worrying. I don't even want
> to know if writing to this LV would overwrite some unrelated partition.

Use dmsetup to see what's going on:
  run 'dmsetup ls' to get a list of internal device names,
  then 'dmsetup -v table <device_name_shown_in_ls>' 
on all the relevant ones (e.g. vgraid-lvol5*)

After all the interruptions you've had, check the /dev/vgraid/lvol5 
symlink (->/dev/mapper/vgraid-lvol5) & destination major/minor is 
still right.

Check the current metadata (with vgcfgbackup) still looks right.
[Does it show pvmove(s) in progress, or is it clean again?]

Alasdair
-- 
agk@uk.sistina.com


* Re: [linux-lvm] pvmove hangs
  2003-08-18  6:46     ` Alasdair G Kergon
@ 2003-08-18  7:07       ` Jan Niehusmann
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-18  7:07 UTC (permalink / raw)
  To: linux-lvm

On Mon, Aug 18, 2003 at 12:45:15PM +0100, Alasdair G Kergon wrote:
> Please confirm which kernel you are using, and the device-mapper
> patch(es) you applied.  [patches/linux-2.4.2?* from the 1.00.02
> device-mapper tarball?]

It's 2.4.21, and yes, both linux-2.4.21-devmapper-ioctl.patch and
linux-2.4.21-VFS-lock.patch from the 1.00.02 tarball are applied.

> Some more diagnostics:
>   check that you have 'activation = 1' and 'level = 7' in the log{}
>   section of lvm.conf and if not, recreate a problem pvmove application log 
>   (The one you put on the web seems incomplete.)

I have level = 7, but not activation = 1. I'll try it again later, when
I'm back at the console... don't want to hang the computer via ssh :-)

> Use dmsetup to see what's going on:
>   run 'dmsetup ls' to get a list of internal device names,
>   then 'dmsetup -v table <device_name_shown_in_ls>' 
> on all the relevant ones (e.g. vgraid-lvol5*)

I had a look at these and didn't see anything unusual, but I'll check
again carefully.

> After all the interruptions you've had, check the /dev/vgraid/lvol5 
> symlink (->/dev/mapper/vgraid-lvol5) & destination major/minor is 
> still right.

lvol5 was mounted when the problems started (errors on ls on that
partition), so it's surely not a symlink problem, as the kernel doesn't
access a mounted device via the symlink.

> Check the current metadata (with vgcfgbackup) still looks right.
> [Does it show pvmove(s) in progress, or is it clean again?]

When pvmove hangs, vgcfgbackup shows the pvmove in progress, even after
CTRL-C (which is expected), and pvmove --abort doesn't work at all, as I
said. 
So I always had to restore LVM2 to a sane state with vgcfgrestore. After
that, I reboot and get LVM back up and running cleanly. Even lvol5 was not
damaged, because the garbage on the logical volume never got written to
the underlying device, and vgcfgrestore removed all traces of the
incomplete pvmove.

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-17 18:15   ` Jan Niehusmann
  2003-08-18  6:46     ` Alasdair G Kergon
@ 2003-08-18  9:13     ` Alasdair G Kergon
  2003-08-18  9:30       ` Jan Niehusmann
  1 sibling, 1 reply; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-18  9:13 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Mon, Aug 18, 2003 at 01:14:15AM +0200, Jan Niehusmann wrote:
> I noticed the WCHAN fields in my ps output were not very useful, because
> they pointed to a module and were not decoded.

Does your System.map file correspond to the running kernel?
[See ps man page / strace to check where it's getting that file from]
 
Alasdair
-- 
agk@uk.sistina.com


* Re: [linux-lvm] pvmove hangs
  2003-08-18  9:13     ` Alasdair G Kergon
@ 2003-08-18  9:30       ` Jan Niehusmann
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-18  9:30 UTC (permalink / raw)
  To: linux-lvm

On Mon, Aug 18, 2003 at 03:12:29PM +0100, Alasdair G Kergon wrote:
> On Mon, Aug 18, 2003 at 01:14:15AM +0200, Jan Niehusmann wrote:
> > I noticed the WCHAN fields in my ps output were not very useful, because
> > they pointed to a module and were not decoded.
> 
> Does your System.map file correspond to the running kernel?
> [See ps man page / strace to check where it's getting that file from]

Sure it is :-)
(I use debian make-kpkg to build a kernel .deb, which makes sure that
the right System.map file gets installed together with the kernel)

But it doesn't contain symbols which get defined by modules. And
even /proc/ksyms only contains exported symbols, which daemon()
from dm-daemon.c is not. So I looked at the next exported symbol
(dm_daemon_start), took the difference between these two addresses,
went back that number of bytes in objdump --disassemble, and compared
that to the source code to find the actual positions.
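
As a rough sketch of that lookup: given a list of exported symbols and
their addresses, find the next exported symbol after the WCHAN value and
the distance back from it. Only the WCHAN value 92fa83 and the symbol
name dm_daemon_start come from this thread; both symbol addresses below
are invented for illustration.

```python
import bisect


def next_exported_symbol(addr, symbols):
    """Return (name, bytes_back) for the first exported symbol above addr.

    symbols must be a list of (address, name) tuples sorted by address.
    """
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr)
    if i == len(symbols):
        raise LookupError("no exported symbol above this address")
    sym_addr, name = symbols[i]
    return name, sym_addr - addr


# Hypothetical addresses; a real list would come from the running kernel.
symbols = sorted([
    (0x92FA00, "hypothetical_exported_symbol"),
    (0x92FB10, "dm_daemon_start"),
])
name, bytes_back = next_exported_symbol(0x92FA83, symbols)
print(name, hex(bytes_back))
```

The resulting byte count is what one would step back from the symbol's
position in the disassembly.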

Even though I don't know x86 assembly too well, it was not difficult to
match assembly instructions to C source code, as the source at this
position is quite low-level.

Jan


* Re: [linux-lvm] pvmove hangs
       [not found]   ` <20030817114638.GA1839@gondor.com>
  2003-08-17 12:42     ` Alasdair G Kergon
@ 2003-08-18 12:58     ` Alasdair G Kergon
  2003-08-18 13:21       ` Jan Niehusmann
  2003-08-18 17:55       ` Jan Niehusmann
  1 sibling, 2 replies; 30+ messages in thread
From: Alasdair G Kergon @ 2003-08-18 12:58 UTC (permalink / raw)
  To: Jan Niehusmann; +Cc: linux-lvm

On Sun, Aug 17, 2003 at 01:46:38PM +0200, Jan Niehusmann wrote:
> pvmove --abort really hangs... 

Puzzling.  But we can think of some explanations that could tie this in
with the 383/384 problem.

> The difference is that the first pvmove just doesn't show any progress,
> but can be stopped with ctrl-c. 

> pvmove --abort doesn't return, isn't
> killable, and after issuing it, every attempt to access a filesystem on
> lvm also blocks.

I'd like to see the debug log file/strace for that pvmove --abort and
the full ps output for the process (incl decoded wchan and process state,
nice value, resident memory size & whether its CPU time is static or
increasing) - before and after issuing the 'kill -9' that doesn't work.

We're probably going to need to start sending you patches to instrument
the kernel if we are to work out what's going on here [unless you happen
to have the machine set up for kgdb].

Also, careful use of 'dmsetup' should let us deduce some internal state.

Do you use irc or IM - might speed things up if so?

Alasdair


* Re: [linux-lvm] pvmove hangs
  2003-08-18 12:58     ` Alasdair G Kergon
@ 2003-08-18 13:21       ` Jan Niehusmann
  2003-08-18 17:55       ` Jan Niehusmann
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-18 13:21 UTC (permalink / raw)
  To: linux-lvm

On Mon, Aug 18, 2003 at 06:57:19PM +0100, Alasdair G Kergon wrote:
> We're probably going to need to start sending you patches to instrument
> the kernel if we are to work out what's going on here [unless you happen
> to have the machine set up for kgdb].

I don't use kgdb, but I'd happily apply patches if this may help finding
the problem.

> Do you use irc or IM - might speed things up if so?

jannic on the freenode IRC network, or jannic@jabber.gondor.com on
jabber. I'd prefer jabber, because on IRC I often miss messages :-)

But I don't think I'll have time this evening - perhaps later, but I'm
not sure. Tomorrow evening I'll probably have enough time to do some
more testing.
How are you reachable by IRC or IM?

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-18 12:58     ` Alasdair G Kergon
  2003-08-18 13:21       ` Jan Niehusmann
@ 2003-08-18 17:55       ` Jan Niehusmann
  2003-08-19 17:52         ` Jan Niehusmann
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-18 17:55 UTC (permalink / raw)
  To: linux-lvm

On Mon, Aug 18, 2003 at 06:57:19PM +0100, Alasdair G Kergon wrote:
> I'd like to see the debug log file/strace for that pvmove --abort and
> the full ps output for the process (incl decoded wchan and process state,
> nice value, resident memory size & whether its CPU time is static or
> increasing) - before and after issuing the 'kill -9' that doesn't work.

The ps simply says 

F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND

4     0 24006  1180   1 -18 14140 14140 down  D<L  tty1       0:00 pvmove --abort

(or numeric:)
4     0 24006  1180   1 -18 14140 14140 105b14 D<L tty1       0:00 pvmove --abort

This doesn't change with kill -9, all stays exactly the same. 
I do not yet have a logfile of pvmove --abort.

But http://gondor.com/lvm/lvm2.log.20030818  does contain the complete
logfile of pvmove -v /dev/hda4.

I'm going to sleep now. Let's do further debugging tomorrow (Tuesday), if
that's ok for you.

Jan


* Re: [linux-lvm] pvmove hangs
  2003-08-18 17:55       ` Jan Niehusmann
@ 2003-08-19 17:52         ` Jan Niehusmann
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Niehusmann @ 2003-08-19 17:52 UTC (permalink / raw)
  To: linux-lvm

Hi!

Unfortunately I had some network problems today, so the debugging
session was shorter than expected. 

But I did gather another logfile with activation=1.
It's available from http://gondor.com/lvm/lvm2.log.20030819
Additionally, there is a file 'typescript' in the same directory, which
contains output from dmsetup ls and dmsetup -v table vgraid-pvmove0.

Jan


* [linux-lvm] pvmove hangs
@ 2005-04-26  9:19 Gergely Imre
  2005-04-26 13:30 ` Diaz Rodriguez, Eduardo
  0 siblings, 1 reply; 30+ messages in thread
From: Gergely Imre @ 2005-04-26  9:19 UTC (permalink / raw)
  To: linux-lvm


hi

i have a LV with two PVs in it, like this:

 --- Logical volume ---
  LV Name                /dev/watchdog/watchdog_var
  VG Name                watchdog
  LV UUID                v6U2JY-PBMf-snpJ-IXX5-ZzcV-y4fk-PchoQg
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                72.22 GB
  Current LE             2311
  Segments               1
  Allocation             next free (default)
  Read ahead sectors     0
  Block device           254:6

  --- Physical volumes ---
  PV Name               /dev/sdb2
  PV UUID               5v23aD-mkO0-I9Ur-zUKG-sGd2-8iCP-zhgfJe
  PV Status             allocatable
  Total PE / Free PE    2500 / 189

  PV Name               /dev/sda1
  PV UUID               7mzLEZ-Va0c-sdiW-8KAV-lyjo-a5yI-GBORGS
  PV Status             allocatable
  Total PE / Free PE    2941 / 2753

i want to remove sdb2, so i wanted to make a pvmove /dev/sdb2, so it
would move everything to sda1. the strange thing is, pvmove starts, and
after, say, 20% (about an hour or so) the whole system hangs somehow. i
can ping the box, but any command i try to run, fails. it's like it's
waiting for the disk to read/write, or something like that.
the distrib is Fedora Core 2.
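
a quick sanity check on the extent counts above, as a sketch only (the
function name is made up; the numbers are the ones from the listing): the
extents in use on sdb2 should fit into the free extents on sda1, and the
LV size divided by its extent count suggests an extent size of about 32 MB.

```python
def pvmove_fits(src_total_pe, src_free_pe, dst_free_pe):
    """Return (extents_to_move, fits) for moving everything off the source PV."""
    used = src_total_pe - src_free_pe
    return used, used <= dst_free_pe


used, fits = pvmove_fits(2500, 189, 2753)   # /dev/sdb2 -> /dev/sda1
print(used, fits)                           # 2311 True

# Approximate extent size in MB, from LV Size (72.22 GB) and Current LE (2311).
extent_mb = round(72.22 * 1024 / 2311)
print(extent_mb)                            # 32
```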

lvm> version
  LVM version:     2.00.15 (2004-04-19)
  Library version: 1.00.14-ioctl (2004-04-06)
  Driver version:  4.4.0

filesystem is ext3.
the system has 2GB RAM, dual Intel Xeon 3GHz.

i tried with kernel 2.6.11.7, after a while it gives me kernel panic
during pvmove. then i tried with 2.6.9, it didn't panic, but it hung,
and i had to reboot.

it's kinda urgent, thanks.


* Re: [linux-lvm] pvmove hangs
  2005-04-26  9:19 Gergely Imre
@ 2005-04-26 13:30 ` Diaz Rodriguez, Eduardo
  2005-04-27  5:46   ` Gergely Imre
  0 siblings, 1 reply; 30+ messages in thread
From: Diaz Rodriguez, Eduardo @ 2005-04-26 13:30 UTC (permalink / raw)
  To: LVM general discussion and development

Hi, try with pvmove -t or pvmove -v.

Send the panic output; I think we need more information.

regards!




=======================================================================================
This day that you fear so much as your last is the dawn of the eternal day.
		-- Seneca (2 BC - 65 AD), Latin philosopher.
=======================================================================================


* Re: [linux-lvm] pvmove hangs
  2005-04-26 13:30 ` Diaz Rodriguez, Eduardo
@ 2005-04-27  5:46   ` Gergely Imre
  0 siblings, 0 replies; 30+ messages in thread
From: Gergely Imre @ 2005-04-27  5:46 UTC (permalink / raw)
  To: linux-lvm


output of the command
 pvmove -t -vv -d /dev/sdb2 > /tmp/pvmove-test.txt 2>&1
is here (http://terran.noc.astral.ro/pvmove/pvmove-test.txt)

then, i ran it without -t, like this:
 pvmove -vv -d /dev/sdb2 > /tmp/pvmove.txt 2>&1

the output here:
http://terran.noc.astral.ro/pvmove/pvmove.txt

at first, everything seemed alright, it kept writing the progress in the
file. i even ran 'lvs' a couple of times:

[root@watchdog tmp]# lvs
LV            VG       Attr   LSize  Origin Snap%  Move      Move%
.
.
.
pvmove0       watchdog p-C-ao 72.22G               /dev/sdb2   4.15
watchdog_root watchdog -wn-ao  5.88G
watchdog_var  watchdog -wN-ao 72.22G

the filesystem on this LV was mounted and used during the pvmove, but
this shouldn't be a problem, right?

i kept an eye on it, it did its job for around 67%, then it hung, the
system was not responding, any command failed, i had to reboot. after
the reboot the whole thing started all over again, from the beginning.
but there is nothing in the logs that would indicate a problem.

Diaz Rodriguez, Eduardo wrote:
> Hi, try with pvmove -t or pvmove -v,
> 
> Send the panics output, I think that we need more information.
> 
> regards!
> 
> 
> On Tue, 26 Apr 2005 12:19:58 +0300, Gergely Imre wrote
> 
>>hi
>>
>>i have a LV with two PVs in it, like this:
>>
>> --- Logical volume ---
>>  LV Name                /dev/watchdog/watchdog_var
>>  VG Name                watchdog
>>  LV UUID                v6U2JY-PBMf-snpJ-IXX5-ZzcV-y4fk-PchoQg
>>  LV Write Access        read/write
>>  LV Status              available
>>  # open                 1
>>  LV Size                72.22 GB
>>  Current LE             2311
>>  Segments               1
>>  Allocation             next free (default)
>>  Read ahead sectors     0
>>  Block device           254:6
>>
>>  --- Physical volumes ---
>>  PV Name               /dev/sdb2
>>  PV UUID               5v23aD-mkO0-I9Ur-zUKG-sGd2-8iCP-zhgfJe
>>  PV Status             allocatable
>>  Total PE / Free PE    2500 / 189
>>
>>  PV Name               /dev/sda1
>>  PV UUID               7mzLEZ-Va0c-sdiW-8KAV-lyjo-a5yI-GBORGS
>>  PV Status             allocatable
>>  Total PE / Free PE    2941 / 2753
>>
>>i want to remove sdb2, so i wanted to make a pvmove /dev/sdb2, so it
>>would move everything to sda1. the strange thing is, pvmove starts, and
>>after, say, 20% (about an hour or so) the whole system hangs 
>>somehow. i can ping the box, but any command i try to run, fails. 
>>it's like it's waiting for the disk to read/write, or something like 
>>that. the distrib is Fedora Core 2.
>>
>>lvm> version
>>  LVM version:     2.00.15 (2004-04-19)
>>  Library version: 1.00.14-ioctl (2004-04-06)
>>  Driver version:  4.4.0
>>
>>filesystem is ext3.
>>the system has 2GB RAM, dual Intel Xeon 3GHz.
>>
>>i tried with kernel 2.6.11.7, after a while it gives me kernel panic
>>during pvmove. then i tried with 2.6.9, it didn't panic, but it hung,
>>and i had to reboot.
>>
>>it's kinda urgent, thanks.
>>
>>_______________________________________________
>>linux-lvm mailing list
>>linux-lvm@redhat.com
>>https://www.redhat.com/mailman/listinfo/linux-lvm
>>read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
> 
> 
> 
> =======================================================================================
> This day you fear so much as your last is the dawn of the eternal day.
> 		-- Seneca (2 BC-65), Latin philosopher.
> =======================================================================================
> 

-- 
Gergely Imre
SysAdmin
Nextra TeleCom - group Astral
Tel: +4(0)266-317500
http://www.nextra.ro/gimre
GPG key: 0x34525305 (www.keyserver.net)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [linux-lvm] pvmove hangs
@ 2005-04-27 13:04 Gergely Imre
  2005-04-28  9:46 ` Diaz Rodriguez, Eduardo
  0 siblings, 1 reply; 30+ messages in thread
From: Gergely Imre @ 2005-04-27 13:04 UTC (permalink / raw)
  To: linux-lvm


let me 'restart' this thread. i ran into another problem. or it's the same?

[root@test etc]# fdisk -l

Disk /dev/sda: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          17      136521   82  Linux swap
/dev/sda2   *          18         267     2008125   83  Linux
/dev/sda3             268         392     1004062+  8e  Linux LVM
/dev/sda4             393         522     1044225   8e  Linux LVM

i installed FC2 on sda2, i upgraded the kernel to 2.6.11.7, created a VG
and LV out of sda3+sda4, like this:

[root@test etc]# vgdisplay -v
    Finding all volume groups
    Finding volume group "test"
  --- Volume group ---
  VG Name               test
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  56
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.95 GB
  PE Size               4.00 MB
  Total PE              499
  Alloc PE / Size       244 / 976.00 MB
  Free  PE / Size       255 / 1020.00 MB
  VG UUID               E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH

  --- Logical volume ---
  LV Name                /dev/test/root
  VG Name                test
  LV UUID                wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                976.00 MB
  Current LE             244
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

  --- Physical volumes ---
  PV Name               /dev/sda3
  PV UUID               1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
  PV Status             allocatable
  Total PE / Free PE    245 / 1

  PV Name               /dev/sda4
  PV UUID               rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
  PV Status             allocatable
  Total PE / Free PE    254 / 254

so, practically the root LV is on sda3. i moved the system from sda2 to
sda3, now the whole / is on /dev/mapper/test-root, and it's working fine.
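
As a quick sanity check on the numbers in the vgdisplay output above, the sketch below redoes the extent arithmetic in Python: a size is the extent count times the 4.00 MB PE size, and a pvmove off sda3 can only fit if the other PV has at least as many free extents as sda3 has allocated. The function names are made up for illustration; the figures are copied from the output above.

```python
PE_SIZE_MB = 4  # "PE Size 4.00 MB" in the vgdisplay output


def extents_to_mb(extents, pe_size_mb=PE_SIZE_MB):
    """Convert a number of extents to a size in MB."""
    return extents * pe_size_mb


def allocated(total_pe, free_pe):
    """Extents in use on a PV, from its 'Total PE / Free PE' line."""
    return total_pe - free_pe


def fits(source, target):
    """True if everything allocated on source fits into target's free extents.

    source and target are (total_pe, free_pe) tuples.
    """
    return allocated(*source) <= target[1]


sda3 = (245, 1)    # Total PE / Free PE    245 / 1
sda4 = (254, 254)  # Total PE / Free PE    254 / 254

print(extents_to_mb(244))    # size of the root LV (244 LE) in MB
print(allocated(*sda3))      # extents that pvmove has to relocate
print(fits(sda3, sda4))      # whether sda4 has room for them
```

With these figures the LV comes out at 976 MB, matching "LV Size 976.00 MB", and the 244 allocated extents fit into the 254 free ones on sda4.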

[root@test etc]# df -T
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/mapper/test-root
              ext3      983704    638908    294828  69% /
none         tmpfs       63464         0     63464   0% /dev/shm

grub.conf:
title Fedora Core (2.6.11.7)
        root (hd0,1)
        kernel /boot/vmlinuz-2.6.11.7 ro root=/dev/mapper/test-root
        initrd /boot/initrd-2.6.11.7.img

so, the /boot directory stayed on /dev/sda2.

fstab:
[root@test etc]# cat /etc/fstab
/dev/mapper/test-root   /                       ext3    defaults        1 1
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/sda1               swap                    swap    defaults        .
.
.

so far so good. now let's say i want to remove sda3. to do this, i need
to pvmove everything from sda3 to sda4. if i run

pvmove -vv /dev/sda3

i get the following:

[root@test root]# pvmove -vv /dev/sda3
      Setting global/locking_type to 1
      Setting global/locking_dir to /var/lock/lvm
      File-based locking enabled.
      /dev/sda3: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda1: No label detected
      /dev/sda2: No label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
    Finding volume group "test"
      Locking /var/lock/lvm/V_test WB
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
    Archiving volume group "test" metadata.
    Creating logical volume pvmove0
      Getting target version for mirror
      Moving /dev/sda3:0-243 of test/root
    Moving 244 extents of logical volume test/root
      Finding volume group for uuid
E012hQKRPNygZImyIsXfbV68tt6S44wHwqoG5mX3FnGsaj8P9tcSwSAXi4IJ5j2P
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
    Found volume group "test"
      Setting activation/missing_stripe_filler to /dev/ioerror
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/test"
      Finding volume group for uuid
E012hQKRPNygZImyIsXfbV68tt6S44wHwqoG5mX3FnGsaj8P9tcSwSAXi4IJ5j2P
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
    Found volume group "test"
      Locking memory
      Suspending test-root
      Finding volume group for uuid
E012hQKRPNygZImyIsXfbV68tt6S44wH860Z8k3Rbibp4LTSyFNUkHANkOMh0qdK
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
      /dev/sda3: lvm2 label detected
      /dev/sda4: lvm2 label detected
    Found volume group "test"
    Loading test-pvmove0
      Setting activation/mirror_region_size to 512

and that's it... i wait around 20 minutes, nothing. no picture, no
sound... i can't do anything, so i have to do a hard reset. there is no
disk activity whatsoever.

after the reset, i quickly log in, and i find (running lvs) that it's
continuing the pvmove, like nothing happened:

[root@test root]# lvs
  LV      VG   Attr   LSize   Origin Snap%  Move      Copy%
  pvmove0 test p-C-ao 976.00M               /dev/sda3  44.67
  root    test -wI-ao 976.00M

i didn't run anything, still, pvmove continues. after a while it
finishes, and it seems like everything is OK.
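
For watching a resumed move like this without reading the table by eye, here is a rough Python sketch that pulls the Move and Copy% values out of lvs text shaped like the output above. It relies on plain whitespace splitting, which only works because Origin and Snap% are empty in that row, and on the Attr string of the temporary volume starting with "p" as it does above; both are assumptions about this particular output, and the function name is hypothetical.

```python
LVS_OUTPUT = """\
  LV      VG   Attr   LSize   Origin Snap%  Move      Copy%
  pvmove0 test p-C-ao 976.00M               /dev/sda3  44.67
  root    test -wI-ao 976.00M
"""


def pvmove_progress(lvs_text):
    """Map each pvmove LV name to (source PV, percent copied)."""
    progress = {}
    for line in lvs_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Rows without Move/Copy% columns (ordinary LVs) are too short.
        if len(fields) < 6:
            continue
        name, attr = fields[0], fields[2]
        if attr.startswith("p"):
            progress[name] = (fields[-2], float(fields[-1]))
    return progress


print(pvmove_progress(LVS_OUTPUT))
```

On a live system the text would come from running lvs; this sketch deliberately parses only the captured sample.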

[root@test root]# lvs
  LV   VG   Attr   LSize   Origin Snap%  Move Copy%
  root test -wi-ao 976.00M
[root@test root]#

[root@test root]# vgdisplay -v
    Finding all volume groups
    Finding volume group "test"
  --- Volume group ---
  VG Name               test
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  60
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.95 GB
  PE Size               4.00 MB
  Total PE              499
  Alloc PE / Size       244 / 976.00 MB
  Free  PE / Size       255 / 1020.00 MB
  VG UUID               E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH

  --- Logical volume ---
  LV Name                /dev/test/root
  VG Name                test
  LV UUID                wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                976.00 MB
  Current LE             244
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:1

  --- Physical volumes ---
  PV Name               /dev/sda3
  PV UUID               1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
  PV Status             allocatable
  Total PE / Free PE    245 / 245

  PV Name               /dev/sda4
  PV UUID               rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
  PV Status             allocatable
  Total PE / Free PE    254 / 10

now everything is on sda4, i could remove sda3. but here's the thing...
i do a reboot, and i get the following error:

http://terran.noc.astral.ro/pvmove/boot.png

but if i give the root password, /dev/mapper/test-root is mounted, and
if i remount it read-write, it seems to be alright. but if i try to fsck
/dev/mapper/test-root, it still gives me this error
[root@test root]# fsck /dev/mapper/test-root
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
fsck.ext3: No such device or address while trying to open
/dev/mapper/test-root
Possibly non-existent or swap device?
[root@test root]#

i look in /dev/mapper:

[root@test root]# cd /dev/mapper
[root@test mapper]# ll
total 0
crw-------  1 root root  10, 63 Apr 27 15:43 control
brw-------  1 root root 253,  0 Apr 27 15:43 test-pvmove0
brw-------  1 root root 253,  1 Apr 27 15:43 test-root
[root@test mapper]#

it did not remove test-pvmove0, after finishing the move. another
strange thing is that now test-root has major/minor 253,1 and
test-pvmove0 has 253,0.
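
One way to look at the listing above is as a comparison between the nodes sitting in /dev/mapper and the devices the kernel actually has. The Python sketch below does that comparison on plain dictionaries. The "active" table is an assumption (that the kernel had test-root on 253:0 after the reboot, which the listing itself does not show), and the helper names are made up for illustration.

```python
import os
import stat


def node_numbers(path):
    """Return (major, minor) of a device node, or None if path is not one."""
    st = os.stat(path)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        return None
    return (os.major(st.st_rdev), os.minor(st.st_rdev))


def stale_nodes(nodes, active):
    """Names whose node numbers differ from, or are missing in, active."""
    return sorted(
        name for name, numbers in nodes.items() if active.get(name) != numbers
    )


# Nodes as shown by "ll" in /dev/mapper above.
nodes = {"test-pvmove0": (253, 0), "test-root": (253, 1)}
# ASSUMPTION: what the kernel had active after the reboot.
active = {"test-root": (253, 0)}

print(stale_nodes(nodes, active))
```

Under that assumption both nodes count as stale: test-pvmove0 has no active device behind it and test-root points at the wrong minor number, which would be consistent with fsck reporting "No such device or address".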

i removed test-*
[root@test mapper]# rm test-pvmove0 test-root
rm: remove block special file `test-pvmove0'? y
rm: remove block special file `test-root'? y

then i did a lvm vgmknodes (i looked this up in /etc/rc.sysinit:)

[root@test mapper]# lvm vgmknodes
[root@test mapper]# ls -la
total 124
drwxr-xr-x   2 root root    4096 Apr 27 15:59 .
drwxr-xr-x  24 root root  118784 Apr 27 15:55 ..
crw-------   1 root root  10, 63 Apr 27 15:43 control
brw-------   1 root root 253,  0 Apr 27 15:59 test-root

it created the test-root node, but with major/minor 253,0, so now the
fsck is working again:
[root@test mapper]# fsck /dev/mapper/test-root
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/dev/mapper/test-root is mounted.

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? no

check aborted.

if i do a reboot now, everything is alright again, and in fact i got
what i wanted, it moved everything from sda3 to sda4. but what about all
these problems?

it's not over yet :)
now i want to move everything back to sda3. but this time i do a:

[root@test root]# mount / -o remount,noatime

then the moving:

[root@test root]# pvmove -v /dev/sda4
    Finding volume group "test"
    Archiving volume group "test" metadata.
    Creating logical volume pvmove0
    Moving 244 extents of logical volume test/root
    Found volume group "test"
    Updating volume group metadata
    Creating volume group backup "/etc/lvm/backup/test"
    Found volume group "test"
    Found volume group "test"
    Loading test-pvmove0
    Found volume group "test"
    Loading test-root
    Checking progress every 15 seconds
  /dev/sda4: Moved: 11.9%
  /dev/sda4: Moved: 21.7%
  /dev/sda4: Moved: 32.4%
  /dev/sda4: Moved: 45.5%
  /dev/sda4: Moved: 56.1%
  /dev/sda4: Moved: 67.6%
  /dev/sda4: Moved: 78.3%
  /dev/sda4: Moved: 89.3%
  /dev/sda4: Moved: 99.6%
  /dev/sda4: Moved: 100.0%
    Found volume group "test"
    Found volume group "test"
    Found volume group "test"
    Loading test-pvmove0
    Found volume group "test"
    Loading test-root
    Found volume group "test"
    Found volume group "test"
    Removing temporary pvmove LV
    Writing out final volume group after pvmove
    Creating volume group backup "/etc/lvm/backup/test"


during the move, i ran lvs a couple of times, nothing unusual.

[root@test root]# lvs
  LV      VG   Attr   LSize   Origin Snap%  Move      Copy%
  pvmove0 test p-C-ao 976.00M               /dev/sda4  22.95
  root    test -wI-ao 976.00M

but what is unusual, is this:

[root@test root]# cd /dev/mapper/
[root@test mapper]# ll
total 0
crw-------  1 root root  10, 63 Apr 27 16:01 control
brw-------  1 root root 253,  1 Apr 27 16:03 test-pvmove0
brw-------  1 root root 253,  0 Apr 27 15:59 test-root
[root@test mapper]#

now it created test-pvmove0 with 253,1, and it didn't touch test-root.
and after pvmove was finished, it removed test-pvmove0 also. now that is
what i call a proper pvmove ;) i run vgdisplay just to be sure:

[root@test root]# vgdisplay -v
    Finding all volume groups
    Finding volume group "test"
  --- Volume group ---
  VG Name               test
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  63
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               1.95 GB
  PE Size               4.00 MB
  Total PE              499
  Alloc PE / Size       244 / 976.00 MB
  Free  PE / Size       255 / 1020.00 MB
  VG UUID               E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH

  --- Logical volume ---
  LV Name                /dev/test/root
  VG Name                test
  LV UUID                wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                976.00 MB
  Current LE             244
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

  --- Physical volumes ---
  PV Name               /dev/sda3
  PV UUID               1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
  PV Status             allocatable
  Total PE / Free PE    245 / 1

  PV Name               /dev/sda4
  PV UUID               rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
  PV Status             allocatable
  Total PE / Free PE    254 / 254

everything is in place (on sda3). sda4 is empty like it should be.

could somebody explain this to me? what's happening here?

the box i was playing on is a vmware emulated comp (128MB ram, 4GB hdd),
with a buslogic scsi adapter. i installed fedora core 2 custom, with no
packages selected, after that i did a yum update. kernel 2.6.11.7
vanilla, with no patches at all, compiled with minimum stuff. latest
lvm2 and device-mapper.

output of ps:

[root@test root]# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:02 init [3]
    2 ?        SWN    0:00 [ksoftirqd/0]
    3 ?        SW<    0:00 [events/0]
    4 ?        SW<    0:00 [khelper]
    9 ?        SW<    0:00 [kthread]
   17 ?        SW<    0:00 [kblockd/0]
   71 ?        SW     0:00 [pdflush]
   72 ?        SW     0:00 [pdflush]
   74 ?        SW<    0:00 [aio/0]
   73 ?        SW     0:00 [kswapd0]
  659 ?        SW     0:00 [kseriod]
  694 ?        SW     0:00 [scsi_eh_0]
  715 ?        SW<    0:00 [kcryptd/0]
  716 ?        SW<    0:00 [kmirrord/0]
  751 ?        SW     0:00 [kjournald]
 1722 ?        S      0:00 syslogd -m 0
 1726 ?        S      0:00 klogd -x
 1749 ?        S      0:00 /usr/sbin/sshd
 1761 ?        S      0:00 crond
 1785 tty1     S      0:00 /sbin/mingetty tty1
 1807 tty2     S      0:00 /sbin/mingetty tty2
 1818 tty3     S      0:00 /sbin/mingetty tty3
 1819 tty4     S      0:00 /sbin/mingetty tty4
 1906 tty5     S      0:00 /sbin/mingetty tty5
 1937 tty6     S      0:00 /sbin/mingetty tty6
 1972 ?        R      0:00 sshd: root@pts/0
 1974 pts/0    S      0:00 -bash
 2010 ?        S      0:00 sshd: root@pts/1
 2012 pts/1    S      0:00 -bash
 2047 pts/0    R      0:00 ps ax

(sorry for the long mail;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-lvm] pvmove hangs
  2005-04-27 13:04 [linux-lvm] pvmove hangs Gergely Imre
@ 2005-04-28  9:46 ` Diaz Rodriguez, Eduardo
  2005-04-28 10:23   ` Gergely Imre
  0 siblings, 1 reply; 30+ messages in thread
From: Diaz Rodriguez, Eduardo @ 2005-04-28  9:46 UTC (permalink / raw)
  To: LVM general discussion and development

This is too difficult for me. I have only one idea: update lvm to the latest
version and check the changelog for any updates about pvmove.

If you solve the problem, please summarize.

Good Luck!

On Wed, 27 Apr 2005 16:04:20 +0300, Gergely Imre wrote
> let me 'restart' this thread. i ran into another problem. or it's 
> the same?
> 
> [root@test etc]# fdisk -l
> 
> Disk /dev/sda: 4294 MB, 4294967296 bytes
> 255 heads, 63 sectors/track, 522 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1          17      136521   82  Linux swap
> /dev/sda2   *          18         267     2008125   83  Linux
> /dev/sda3             268         392     1004062+  8e  Linux LVM
> /dev/sda4             393         522     1044225   8e  Linux LVM
> 
> i installed FC2 on sda2, i upgraded the kernel to 2.6.11.7, created 
> a VG and LV out of sda3+sda4, like this:
> 
> [root@test etc]# vgdisplay -v
>     Finding all volume groups
>     Finding volume group "test"
>   --- Volume group ---
>   VG Name               test
>   System ID
>   Format                lvm2
>   Metadata Areas        2
>   Metadata Sequence No  56
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               1
>   Max PV                0
>   Cur PV                2
>   Act PV                2
>   VG Size               1.95 GB
>   PE Size               4.00 MB
>   Total PE              499
>   Alloc PE / Size       244 / 976.00 MB
>   Free  PE / Size       255 / 1020.00 MB
>   VG UUID               E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH
> 
>   --- Logical volume ---
>   LV Name                /dev/test/root
>   VG Name                test
>   LV UUID                wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                976.00 MB
>   Current LE             244
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     0
>   Block device           253:0
> 
>   --- Physical volumes ---
>   PV Name               /dev/sda3
>   PV UUID               1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
>   PV Status             allocatable
>   Total PE / Free PE    245 / 1
> 
>   PV Name               /dev/sda4
>   PV UUID               rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
>   PV Status             allocatable
>   Total PE / Free PE    254 / 254
> 
> so, practically the root LV is on sda3. i moved the system from sda2 
> to sda3, now the whole / is on /dev/mapper/test-root, and it's 
> working fine.
> 
> [root@test etc]# df -T
> Filesystem    Type   1K-blocks      Used Available Use% Mounted on
> /dev/mapper/test-root
>               ext3      983704    638908    294828  69% /
> none         tmpfs       63464         0     63464   0% /dev/shm
> 
> grub.conf:
> title Fedora Core (2.6.11.7)
>         root (hd0,1)
>         kernel /boot/vmlinuz-2.6.11.7 ro root=/dev/mapper/test-root
>         initrd /boot/initrd-2.6.11.7.img
> 
> so, the /boot directory stayed on /dev/sda2.
> 
> fstab:
> [root@test etc]# cat /etc/fstab
> /dev/mapper/test-root   /                       ext3    defaults        1 1
> none                    /dev/pts                devpts  gid=5,mode=620  0 0
> none                    /dev/shm                tmpfs   defaults        0 0
> /dev/sda1               swap                    swap    defaults        .
> .
> .
> 
> so far so good. now let's say i want to remove sda3. to do this, i need
> to pvmove everything from sda3 to sda4. if i run
> 
> pvmove -vv /dev/sda3
> 
> i get the following:
> 
> [root@test root]# pvmove -vv /dev/sda3
>       Setting global/locking_type to 1
>       Setting global/locking_dir to /var/lock/lvm
>       File-based locking enabled.
>       /dev/sda3: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda1: No label detected
>       /dev/sda2: No label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>     Finding volume group "test"
>       Locking /var/lock/lvm/V_test WB
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>     Archiving volume group "test" metadata.
>     Creating logical volume pvmove0
>       Getting target version for mirror
>       Moving /dev/sda3:0-243 of test/root
>     Moving 244 extents of logical volume test/root
>       Finding volume group for uuid
> E012hQKRPNygZImyIsXfbV68tt6S44wHwqoG5mX3FnGsaj8P9tcSwSAXi4IJ5j2P
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>     Found volume group "test"
>       Setting activation/missing_stripe_filler to /dev/ioerror
>     Updating volume group metadata
>     Creating volume group backup "/etc/lvm/backup/test"
>       Finding volume group for uuid
> E012hQKRPNygZImyIsXfbV68tt6S44wHwqoG5mX3FnGsaj8P9tcSwSAXi4IJ5j2P
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>     Found volume group "test"
>       Locking memory
>       Suspending test-root
>       Finding volume group for uuid
> E012hQKRPNygZImyIsXfbV68tt6S44wH860Z8k3Rbibp4LTSyFNUkHANkOMh0qdK
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>       /dev/sda3: lvm2 label detected
>       /dev/sda4: lvm2 label detected
>     Found volume group "test"
>     Loading test-pvmove0
>       Setting activation/mirror_region_size to 512
> 
> and that's it... i wait around 20 minutes, nothing. no picture, no
> sound... i can't do anything, so i have to do a hard reset. there is 
> no disk activity whatsoever.
> 
> after the reset, i quickly log in, and i find (running lvs) that it's
> continuing the pvmove, like nothing happened:
> 
> [root@test root]# lvs
>   LV      VG   Attr   LSize   Origin Snap%  Move      Copy%
>   pvmove0 test p-C-ao 976.00M               /dev/sda3  44.67
>   root    test -wI-ao 976.00M
> 
> i didn't run anything, still, pvmove continues. after a while it
> finishes, and it seems like everything is OK.
> 
> [root@test root]# lvs
>   LV   VG   Attr   LSize   Origin Snap%  Move Copy%
>   root test -wi-ao 976.00M
> [root@test root]#
> 
> [root@test root]# vgdisplay -v
>     Finding all volume groups
>     Finding volume group "test"
>   --- Volume group ---
>   VG Name               test
>   System ID
>   Format                lvm2
>   Metadata Areas        2
>   Metadata Sequence No  60
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               1
>   Max PV                0
>   Cur PV                2
>   Act PV                2
>   VG Size               1.95 GB
>   PE Size               4.00 MB
>   Total PE              499
>   Alloc PE / Size       244 / 976.00 MB
>   Free  PE / Size       255 / 1020.00 MB
>   VG UUID               E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH
> 
>   --- Logical volume ---
>   LV Name                /dev/test/root
>   VG Name                test
>   LV UUID                wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                976.00 MB
>   Current LE             244
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     0
>   Block device           253:1
> 
>   --- Physical volumes ---
>   PV Name               /dev/sda3
>   PV UUID               1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
>   PV Status             allocatable
>   Total PE / Free PE    245 / 245
> 
>   PV Name               /dev/sda4
>   PV UUID               rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
>   PV Status             allocatable
>   Total PE / Free PE    254 / 10
> 
> now everything is on sda4, i could remove sda3. but here's the thing...
> i do a reboot, and i get the following error:
> 
> http://terran.noc.astral.ro/pvmove/boot.png
> 
> but if i give the root password, /dev/mapper/test-root is mounted, 
> and if i remount it read-write, it seems to be alright. but if i try 
> to fsck /dev/mapper/test-root, it still gives me this error
> [root@test root]# fsck /dev/mapper/test-root
> fsck 1.35 (28-Feb-2004)
> e2fsck 1.35 (28-Feb-2004)
> fsck.ext3: No such device or address while trying to open
> /dev/mapper/test-root
> Possibly non-existent or swap device?
> [root@test root]#
> 
> i look in /dev/mapper:
> 
> [root@test root]# cd /dev/mapper
> [root@test mapper]# ll
> total 0
> crw-------  1 root root  10, 63 Apr 27 15:43 control
> brw-------  1 root root 253,  0 Apr 27 15:43 test-pvmove0
> brw-------  1 root root 253,  1 Apr 27 15:43 test-root
> [root@test mapper]#
> 
> it did not remove test-pvmove0, after finishing the move. another
> strange thing is that now test-root has major/minor 253,1 and
> test-pvmove0 has 253,0.
> 
> i removed test-*
> [root@test mapper]# rm test-pvmove0 test-root
> rm: remove block special file `test-pvmove0'? y
> rm: remove block special file `test-root'? y
> 
> then i did a lvm vgmknodes (i looked this up in /etc/rc.sysinit:)
> 
> [root@test mapper]# lvm vgmknodes
> [root@test mapper]# ls -la
> total 124
> drwxr-xr-x   2 root root    4096 Apr 27 15:59 .
> drwxr-xr-x  24 root root  118784 Apr 27 15:55 ..
> crw-------   1 root root  10, 63 Apr 27 15:43 control
> brw-------   1 root root 253,  0 Apr 27 15:59 test-root
> 
> it created the test-root node, but with major/minor 253,0, so now the
> fsck is working again:
> [root@test mapper]# fsck /dev/mapper/test-root
> fsck 1.35 (28-Feb-2004)
> e2fsck 1.35 (28-Feb-2004)
> /dev/mapper/test-root is mounted.
> 
> WARNING!!!  Running e2fsck on a mounted filesystem may cause
> SEVERE filesystem damage.
> 
> Do you really want to continue (y/n)? no
> 
> check aborted.
> 
> if i do a reboot now, everything is alright again, and in fact i got
> what i wanted, it moved everything from sda3 to sda4. but what about 
> all these problems?
> 
> it's not over yet :)
> now i want to move everything back to sda3. but this time i do a:
> 
> [root@test root]# mount / -o remount,noatime
> 
> then the moving:
> 
> [root@test root]# pvmove -v /dev/sda4
>     Finding volume group "test"
>     Archiving volume group "test" metadata.
>     Creating logical volume pvmove0
>     Moving 244 extents of logical volume test/root
>     Found volume group "test"
>     Updating volume group metadata
>     Creating volume group backup "/etc/lvm/backup/test"
>     Found volume group "test"
>     Found volume group "test"
>     Loading test-pvmove0
>     Found volume group "test"
>     Loading test-root
>     Checking progress every 15 seconds
>   /dev/sda4: Moved: 11.9%
>   /dev/sda4: Moved: 21.7%
>   /dev/sda4: Moved: 32.4%
>   /dev/sda4: Moved: 45.5%
>   /dev/sda4: Moved: 56.1%
>   /dev/sda4: Moved: 67.6%
>   /dev/sda4: Moved: 78.3%
>   /dev/sda4: Moved: 89.3%
>   /dev/sda4: Moved: 99.6%
>   /dev/sda4: Moved: 100.0%
>     Found volume group "test"
>     Found volume group "test"
>     Found volume group "test"
>     Loading test-pvmove0
>     Found volume group "test"
>     Loading test-root
>     Found volume group "test"
>     Found volume group "test"
>     Removing temporary pvmove LV
>     Writing out final volume group after pvmove
>     Creating volume group backup "/etc/lvm/backup/test"
> 
> during the move, i ran lvs a couple of times, nothing unusual.
> 
> [root@test root]# lvs
>   LV      VG   Attr   LSize   Origin Snap%  Move      Copy%
>   pvmove0 test p-C-ao 976.00M               /dev/sda4  22.95
>   root    test -wI-ao 976.00M
> 
> but what is unusual, is this:
> 
> [root@test root]# cd /dev/mapper/
> [root@test mapper]# ll
> total 0
> crw-------  1 root root  10, 63 Apr 27 16:01 control
> brw-------  1 root root 253,  1 Apr 27 16:03 test-pvmove0
> brw-------  1 root root 253,  0 Apr 27 15:59 test-root
> [root@test mapper]#
> 
> now it created test-pvmove0 with 253,1, and it didn't touch test-root.
> and after pvmove was finished, it removed test-pvmove0 also. now that is
> what i call a proper pvmove ;) i run vgdisplay just to be sure:
> 
> [root@test root]# vgdisplay -v
>     Finding all volume groups
>     Finding volume group "test"
>   --- Volume group ---
>   VG Name               test
>   System ID
>   Format                lvm2
>   Metadata Areas        2
>   Metadata Sequence No  63
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               1
>   Max PV                0
>   Cur PV                2
>   Act PV                2
>   VG Size               1.95 GB
>   PE Size               4.00 MB
>   Total PE              499
>   Alloc PE / Size       244 / 976.00 MB
>   Free  PE / Size       255 / 1020.00 MB
>   VG UUID               E012hQ-KRPN-ygZI-myIs-XfbV-68tt-6S44wH
> 
>   --- Logical volume ---
>   LV Name                /dev/test/root
>   VG Name                test
>   LV UUID                wqoG5m-X3Fn-Gsaj-8P9t-cSwS-AXi4-IJ5j2P
>   LV Write Access        read/write
>   LV Status              available
>   # open                 1
>   LV Size                976.00 MB
>   Current LE             244
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     0
>   Block device           253:0
> 
>   --- Physical volumes ---
>   PV Name               /dev/sda3
>   PV UUID               1FWdvz-Bg30-VGNp-sq2P-HpD3-9x0t-s3QNAV
>   PV Status             allocatable
>   Total PE / Free PE    245 / 1
> 
>   PV Name               /dev/sda4
>   PV UUID               rwLsxX-3h8f-Z7tc-Jmgv-375C-RFvP-KnxNOj
>   PV Status             allocatable
>   Total PE / Free PE    254 / 254
> 
> everything is in place (on sda3). sda4 is empty like it should be.
> 
> could somebody explain this to me? what's happening here?
> 
> the box i was playing on is a vmware emulated comp (128MB ram, 4GB 
> hdd), with a buslogic scsi adapter. i installed fedora core 2 custom,
>  with no packages selected, after that i did a yum update. kernel 2.6.11.7
> vanilla, with no patches at all, compiled with minimum stuff. latest
> lvm2 and device-mapper.
> 
> output of ps:
> 
> [root@test root]# ps ax
>   PID TTY      STAT   TIME COMMAND
>     1 ?        S      0:02 init [3]
>     2 ?        SWN    0:00 [ksoftirqd/0]
>     3 ?        SW<    0:00 [events/0]
>     4 ?        SW<    0:00 [khelper]
>     9 ?        SW<    0:00 [kthread]
>    17 ?        SW<    0:00 [kblockd/0]
>    71 ?        SW     0:00 [pdflush]
>    72 ?        SW     0:00 [pdflush]
>    74 ?        SW<    0:00 [aio/0]
>    73 ?        SW     0:00 [kswapd0]
>   659 ?        SW     0:00 [kseriod]
>   694 ?        SW     0:00 [scsi_eh_0]
>   715 ?        SW<    0:00 [kcryptd/0]
>   716 ?        SW<    0:00 [kmirrord/0]
>   751 ?        SW     0:00 [kjournald]
>  1722 ?        S      0:00 syslogd -m 0
>  1726 ?        S      0:00 klogd -x
>  1749 ?        S      0:00 /usr/sbin/sshd
>  1761 ?        S      0:00 crond
>  1785 tty1     S      0:00 /sbin/mingetty tty1
>  1807 tty2     S      0:00 /sbin/mingetty tty2
>  1818 tty3     S      0:00 /sbin/mingetty tty3
>  1819 tty4     S      0:00 /sbin/mingetty tty4
>  1906 tty5     S      0:00 /sbin/mingetty tty5
>  1937 tty6     S      0:00 /sbin/mingetty tty6
>  1972 ?        R      0:00 sshd: root@pts/0
>  1974 pts/0    S      0:00 -bash
>  2010 ?        S      0:00 sshd: root@pts/1
>  2012 pts/1    S      0:00 -bash
>  2047 pts/0    R      0:00 ps ax
> 
> (sorry for the long mail;)
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


=======================================================================================
In your troubles and toils, turn to the proverbs.
=======================================================================================

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-lvm] pvmove hangs
  2005-04-28  9:46 ` Diaz Rodriguez, Eduardo
@ 2005-04-28 10:23   ` Gergely Imre
  2005-04-28 11:13     ` Diaz Rodriguez, Eduardo
  0 siblings, 1 reply; 30+ messages in thread
From: Gergely Imre @ 2005-04-28 10:23 UTC (permalink / raw)
  To: LVM general discussion and development


the main question is: why does it keep hanging when i want to move "/"? it
doesn't even begin to move the PEs. i tried moving some other live
partitions while writing to them, and it worked.

Diaz Rodriguez, Eduardo wrote:
> Too difficult for me; I only have one idea: update lvm to the latest version
> and check the changelog for any updates about pvmove.
> 
> If you solve the problem, please summarize.
> 
> Good Luck!
> 
> On Wed, 27 Apr 2005 16:04:20 +0300, Gergely Imre wrote
> 
>>let me 'restart' this thread. i ran into another problem. or it's 
>>the same?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-lvm] pvmove hangs
  2005-04-28 10:23   ` Gergely Imre
@ 2005-04-28 11:13     ` Diaz Rodriguez, Eduardo
  0 siblings, 0 replies; 30+ messages in thread
From: Diaz Rodriguez, Eduardo @ 2005-04-28 11:13 UTC (permalink / raw)
  To: LVM general discussion and development

The / partition is different.

I am sure that if you did the same thing but imported your VGs on another
machine and moved them there, you would not have problems.

The / partition holds the modules and a lot of data that the system can't
move because it is in use (I suppose)....

Where do you keep the boot files?

lrwxrwxrwx    1 root     root           28 Apr 13 21:50 vmlinuz -> boot/vmlinuz-2.4.30.20050413
lrwxrwxrwx    1 root     root           26 Apr 12 20:20 vmlinuz.old -> boot/vmlinuz-2.4.30.050412

/dev/hda2              5763648   4796612    674252  88% /

I have kernel files in / :-D, yes, it isn't the best way ....

:-|, I think that could be a good question for the developers :-)....

On Thu, 28 Apr 2005 13:23:14 +0300, Gergely Imre wrote
> the main question is: why does it keep hanging when i want to move "/"?
> it doesn't even begin to move the PEs. i tried moving some other
> live partitions while writing to them, and it worked.
> 
> Diaz Rodriguez, Eduardo wrote:
> > Too difficult for me; I only have one idea: update lvm to the latest version
> > and check the changelog for any updates about pvmove.
> > 
> > If you solve the problem, please summarize.
> > 
> > Good Luck!
> > 
> > On Wed, 27 Apr 2005 16:04:20 +0300, Gergely Imre wrote
> > 
> >>let me 'restart' this thread. i ran into another problem. or it's 
> >>the same?


=======================================================================================
In your troubles and toils, turn to the proverbs.
=======================================================================================

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [linux-lvm] pvmove hangs
@ 2010-08-17 19:26 Allen, Jack
  2010-08-17 22:12 ` Thomas Hager
  0 siblings, 1 reply; 30+ messages in thread
From: Allen, Jack @ 2010-08-17 19:26 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 10628 bytes --]

Hello:

        I posted this to the dm-devel list yesterday afternoon, but so
far I have not gotten any responses, so I thought I would ask the same
questions here since the command that hung is pvmove.

        I had a customer that tried to do a pvmove and it hung. So we
set up a test system to try to duplicate the problem and were able to.

        A little history and why I am asking the question in this list.
The customer needed to move from an existing SAN to a new SAN and wanted
as little downtime as possible for the Application. So they zoned the
new SAN for access by the system and then added the new LUNs to the
existing Volume Group. Then they ran the pvmove commands. It worked with no
problem on one of the PVs, but on the second one all I/O from the
Application hung, as did any command that accesses the LVM information,
such as vgdisplay.

        On our test system we only have 1 SAN (EMC CX700). We put X
number of LUNs in a Volume Group and allocated Logical Volumes for the
Application. Added some more LUNs to the Volume Group to simulate a
second SAN. Started the Application with a test program to generate I/O.
Ran pvmove with no problems on one PV, but on the second PV, it hung
just like on the customer's system.

        The reason I am posting to this list is because the same type of
move was done earlier on the test system running PowerPath and did not
have any problems. The OS is Red Hat EL 5.5 32 bit. The same version of
LVM was used on both tests. I can provide other details if needed.

        Below is part of the messages file from when this happened.

Aug 13 14:18:14 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:18:14 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:19:53 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-19: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-22: add map (uevent)
Aug 13 14:21:26 mss121 multipathd: dm-25: remove map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-25: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-17: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-20: add map (uevent)
Aug 13 14:22:47 mss121 multipathd: dm-23: add map (uevent)
Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk         D 00000BD7  1784 22158 22151         22159 22157 (NOTLB)
Aug 13 14:27:22 mss121 kernel:        f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:27:22 mss121 kernel:        f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:27:22 mss121 kernel:        00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel:  =======================
Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.
Aug 13 14:27:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:27:22 mss121 kernel: mpdsk         D 00000BD7  1884 22161 22151         22162 22160 (NOTLB)
Aug 13 14:27:22 mss121 kernel:        f34e2e04 00000082 baebd585 00000bd7 f34e2e50 c045d1a9 f34e2e50 0000000a
Aug 13 14:27:22 mss121 kernel:        f6eb1550 baec5e00 00000bd7 0000887b 00000000 f6eb165c c8612700 f723f040
Aug 13 14:27:22 mss121 kernel:        00000000 00000000 00000000 c12e1f80 018dc68e c042cbd1 f6cb3bdc ffffffff
Aug 13 14:27:22 mss121 kernel: Call Trace:
Aug 13 14:27:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:27:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:27:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:27:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:27:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:27:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:27:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:27:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:27:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:27:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:27:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:27:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:27:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:27:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:27:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:27:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:27:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:27:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:27:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:27:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:27:22 mss121 kernel:  =======================
Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
Aug 13 14:29:22 mss121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 13 14:29:22 mss121 kernel: mpdsk         D 00000BD7  1784 22158 22151         22159 22157 (NOTLB)
Aug 13 14:29:22 mss121 kernel:        f3025e04 00000082 bde60e11 00000bd7 f3025e50 c045d1a9 f3025e50 0000000a
Aug 13 14:29:22 mss121 kernel:        f7c60000 bde67e2a 00000bd7 00007019 00000000 f7c6010c c8612700 f6ec4e40
Aug 13 14:29:22 mss121 kernel:        00000000 00000000 00000000 c12b8dc0 018dc6f2 c042cbd1 f6cb3f0c ffffffff
Aug 13 14:29:22 mss121 kernel: Call Trace:
Aug 13 14:29:22 mss121 kernel:  [<c045d1a9>] __pagevec_release+0x15/0x1d
Aug 13 14:29:22 mss121 kernel:  [<c042cbd1>] getnstimeofday+0x30/0xb6
Aug 13 14:29:22 mss121 kernel:  [<c061c156>] io_schedule+0x36/0x59
Aug 13 14:29:22 mss121 kernel:  [<c04569c0>] sync_page+0x38/0x3b
Aug 13 14:29:22 mss121 kernel:  [<c061c32d>] __wait_on_bit+0x33/0x58
Aug 13 14:29:22 mss121 kernel:  [<c0456988>] sync_page+0x0/0x3b
Aug 13 14:29:22 mss121 kernel:  [<c0456a48>] wait_on_page_bit+0x5b/0x62
Aug 13 14:29:22 mss121 kernel:  [<c043642c>] wake_bit_function+0x0/0x3c
Aug 13 14:29:22 mss121 kernel:  [<c04573cf>] wait_on_page_writeback_range+0x4d/0xf1
Aug 13 14:29:22 mss121 kernel:  [<c04934a0>] generic_osync_inode+0x93/0xbf
Aug 13 14:29:22 mss121 kernel:  [<c0457618>] sync_page_range_nolock+0x68/0x93
Aug 13 14:29:22 mss121 kernel:  [<c0458930>] generic_file_aio_write_nolock+0x71/0x83
Aug 13 14:29:22 mss121 kernel:  [<c047b301>] blkdev_file_write+0x0/0x1e
Aug 13 14:29:22 mss121 kernel:  [<c0458c8d>] generic_file_write_nolock+0x86/0x9a
Aug 13 14:29:22 mss121 kernel:  [<c04566fe>] find_get_pages_tag+0x30/0x75
Aug 13 14:29:22 mss121 kernel:  [<c0457428>] wait_on_page_writeback_range+0xa6/0xf1
Aug 13 14:29:22 mss121 kernel:  [<c04363ff>] autoremove_wake_function+0x0/0x2d
Aug 13 14:29:22 mss121 kernel:  [<c061c408>] mutex_lock+0xb/0x19
Aug 13 14:29:22 mss121 kernel:  [<c0449c52>] audit_syscall_entry+0x15a/0x18c
Aug 13 14:29:22 mss121 kernel:  [<c047b31b>] blkdev_file_write+0x1a/0x1e
Aug 13 14:29:22 mss121 kernel:  [<c0474d53>] vfs_write+0xa1/0x143
Aug 13 14:29:22 mss121 kernel:  [<c0475345>] sys_write+0x3c/0x63
Aug 13 14:29:22 mss121 kernel:  [<c0404f17>] syscall_call+0x7/0xb
Aug 13 14:29:22 mss121 kernel:  =======================
Aug 13 14:29:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more than 120 seconds.

        The mpdsk processes above are part of the Application, which is a
MUMPS database (not an RDB) that writes data blocks to raw
Logical Volumes (no file system involved). It would have been doing
writes during both pvmoves. I know pvmove is part of LVM2, but because it
worked with PowerPath and not when using Multipath and all other things
are the same is the reason I am asking the questions here.
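
For anyone who wants to summarize a messages file like the excerpt above, here is a minimal sketch in Python that counts the kernel's hung-task warnings per process. The function name hung_tasks and the idea of passing the log in as a string are assumptions made for illustration; only the "INFO: task ... blocked for more than N seconds" wording is taken from the log itself. Whitespace is collapsed first so that warnings wrapped across lines by a mail client still match.

```python
import re
from collections import Counter

# Matches the kernel hung-task warning, for example:
#   kernel: INFO: task mpdsk:22158 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Count hung-task warnings per (command, pid) in a syslog excerpt."""
    # Collapse runs of whitespace so wrapped log lines are rejoined.
    flat = re.sub(r"\s+", " ", log_text)
    counts = Counter()
    for match in HUNG_RE.finditer(flat):
        counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22158 blocked for more\n"
        "than 120 seconds.\n"
        "Aug 13 14:27:22 mss121 kernel: INFO: task mpdsk:22161 blocked for more\n"
        "than 120 seconds.\n"
    )
    for (comm, pid), n in sorted(hung_tasks(sample).items()):
        print(comm, pid, n)
```

Run against the full messages file, this shows which PIDs keep reappearing in the warnings, which is usually enough to tell whether the same writers stay blocked for the whole hang.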

_____

Jack Allen


[-- Attachment #2: Type: text/html, Size: 24218 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-lvm] pvmove hangs
  2010-08-17 19:26 Allen, Jack
@ 2010-08-17 22:12 ` Thomas Hager
  2010-08-17 23:09   ` Allen, Jack
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Hager @ 2010-08-17 22:12 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 1772 bytes --]

On Tue, 2010-08-17 at 15:26 -0400, Allen, Jack wrote:
> I know pvmove is part of LVM2, but because it worked with PowerPath and
> not when using Multipath and all other things are the same is the
> reason I am asking the questions here.
we had similar problems with novell SLES every now and then, and they
were not reproducible and occurred at random times.

among them:

- the pvmove simply stalled, outputting the same %-done message every 15
seconds until we reset the server.

- the server crashed and performed an automated reboot.

and worst:

- pvmove immediately threw an I/O error after starting, committed all
pending moves though -> all data previously residing on the old LUN was
lost :(

novell provided several updates to the kernel, lvm2 and the
device-mapper, but there might still lurk some bugs we haven't triggered
yet. one piece of advice they gave us was to only migrate PEs of one LV at a
time, which we followed afterwards.
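
a minimal sketch of the "one LV at a time" approach, in Python. it only builds the command lines and does not run them. pvmove's -n option restricts a move to the extents of the named LV; the helper name per_lv_pvmove_commands, the LV names and the device paths are hypothetical and only there for illustration.

```python
import shlex


def per_lv_pvmove_commands(lv_names, source_pv, dest_pv):
    """Build one pvmove invocation per LV, each restricted with -n."""
    return [["pvmove", "-n", lv, source_pv, dest_pv] for lv in lv_names]


if __name__ == "__main__":
    # Hypothetical LV names and PV paths; substitute your own.
    commands = per_lv_pvmove_commands(
        ["lv_data1", "lv_data2"],
        "/dev/mapper/old_lun",
        "/dev/mapper/new_lun",
    )
    for cmd in commands:
        # Print for review; run each one by hand and let it finish
        # before starting the next.
        print(shlex.join(cmd))
```

printing rather than executing is deliberate: each move can be checked and started only after the previous one has completed.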

we've seen this behaviour only on SLES, which is the distribution we use
on most of our servers. the few redhats we have migrated fine with
pvmove, we didn't migrate much on these though (only some hundred GB
compared to the ~50TB we had to migrate with SLES). and it was not
related to the storage driver we used, we faced the same issues with
HP's adapted qlogic driver as well as with dm-mp.

anyway, you definitely should open an SR in redhat's CC, so they can
investigate the issue more closely.

hth,
tom.

-- 
Thomas "Duke" Hager                               duke@sigsegv.at
GPG: 1024D/D27F858C            http://www.sigsegv.at/gpg/duke.gpg
=================================================================
"Never Underestimate the Power of Stupid People in Large Groups."


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-lvm] pvmove hangs
  2010-08-17 22:12 ` Thomas Hager
@ 2010-08-17 23:09   ` Allen, Jack
  0 siblings, 0 replies; 30+ messages in thread
From: Allen, Jack @ 2010-08-17 23:09 UTC (permalink / raw)
  To: LVM general discussion and development

-----Original Message-----
From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com]
On Behalf Of Thomas Hager
Sent: Tuesday, August 17, 2010 6:12 PM
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] pvmove hangs

On Tue, 2010-08-17 at 15:26 -0400, Allen, Jack wrote:
> I know pvmove is part of LVM2, but because it worked with PowerPath and
> not when using Multipath and all other things are the same is the
> reason I am asking the questions here.
we had similar problems with novell SLES every now and then, and they
were not reproducible and occurred at random times.

among them:

- the pvmove simply stalled, outputting the same %-done message every 15
seconds until we reset the server.

- the server crashed and performed an automated reboot.

and worst:

- pvmove immediately threw an I/O error after starting, committed all
pending moves though -> all data previously residing on the old LUN was
lost :(

novell provided several updates to the kernel, lvm2 and the
device-mapper, but there might still lurk some bugs we haven't triggered
yet. one piece of advice they gave us was to only migrate PEs of one LV at a
time, which we followed afterwards.

we've seen this behaviour only on SLES, which is the distribution we use
on most of our servers. the few redhats we have migrated fine with
pvmove, we didn't migrate much on these though (only some hundred GB
compared to the ~50TB we had to migrate with SLES). and it was not
related to the storage driver we used, we faced the same issues with
HP's adapted qlogic driver as well as with dm-mp.

anyway, you definitely should open an SR in redhat's CC, so they can
investigate the issue more closely.

hth,
tom.
=============
Thanks for the info Tom.

After sending the post I tried the pvmove command several more times,
this time adding the -v option. One PV had 3 LVs on it. pvmove completed
the first LV with no problem, and then, as it completed the second LV, it
displayed "suspending LV" (it did for the first one too), and this is when
everything related to the PV hung. If you try to run any LVM commands on
the VG they hang, but if you abort them with ^C they report being aborted
while waiting on flock /var/lock/lvm/X, where X is the VG name. If I
remove the file I can then run LVM commands related to the VG, but the
pvmove is still hung. So it would seem there is some race condition,
deadly embrace, or catch-22 of 2 resources waiting on each other: writes
are waiting because the LV is suspended, and the suspend is waiting
because there are outstanding writes to be done.
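
A less destructive way to see whether something still holds the lock, instead of removing the file, is to try a non-blocking flock on it. The following is a minimal sketch in Python using the standard fcntl module; the function name lock_is_held is an assumption for illustration, and the path is whatever lock file the aborted command reported. Note that when the lock is free the check takes it for an instant, so it should be treated as a diagnostic aid only.

```python
import fcntl
import os


def lock_is_held(path):
    """Return True if some other open file description holds an flock() on path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock, so nobody else had it; give it back right away.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

If this returns True while pvmove is hung, the lock is still held by a live process, which fits the picture of the suspend never finishing rather than a stale lock file being left behind.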

I plan to open a case with Red Hat, but was hoping the problem had already
been handled. I am still concerned that it does not work with multipath
but works fine with PowerPath, because I would rather use multipath,
since it makes doing yum updates a lot easier.

------
Jack Allen

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2010-08-17 23:09 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-27 13:04 [linux-lvm] pvmove hangs Gergely Imre
2005-04-28  9:46 ` Diaz Rodriguez, Eduardo
2005-04-28 10:23   ` Gergely Imre
2005-04-28 11:13     ` Diaz Rodriguez, Eduardo
  -- strict thread matches above, loose matches on Subject: below --
2010-08-17 19:26 Allen, Jack
2010-08-17 22:12 ` Thomas Hager
2010-08-17 23:09   ` Allen, Jack
2005-04-26  9:19 Gergely Imre
2005-04-26 13:30 ` Diaz Rodriguez, Eduardo
2005-04-27  5:46   ` Gergely Imre
2003-08-16  5:21 Jan Niehusmann
2003-08-16 10:48 ` Jan Niehusmann
2003-08-16 13:57 ` Alasdair G Kergon
2003-08-17 11:11   ` Jan Niehusmann
2003-08-17 11:34     ` Alasdair G Kergon
2003-08-17 11:41       ` Jan Niehusmann
2003-08-17 12:00         ` Alasdair G Kergon
2003-08-17 18:15   ` Jan Niehusmann
2003-08-18  6:46     ` Alasdair G Kergon
2003-08-18  7:07       ` Jan Niehusmann
2003-08-18  9:13     ` Alasdair G Kergon
2003-08-18  9:30       ` Jan Niehusmann
     [not found]   ` <20030817114638.GA1839@gondor.com>
2003-08-17 12:42     ` Alasdair G Kergon
2003-08-17 13:27       ` Jan Niehusmann
2003-08-17 13:50         ` Alasdair G Kergon
2003-08-17 13:55         ` Alasdair G Kergon
2003-08-18 12:58     ` Alasdair G Kergon
2003-08-18 13:21       ` Jan Niehusmann
2003-08-18 17:55       ` Jan Niehusmann
2003-08-19 17:52         ` Jan Niehusmann
