[linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?

* [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
@ 2002-10-02 12:23 Gregory K. Ade
  2002-10-04  3:54 ` Heinz J . Mauelshagen
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory K. Ade @ 2002-10-02 12:23 UTC (permalink / raw)
  To: linux-lvm

I'm not sure what I found, or why it's happening, but I managed to
exercise some bug or other in LVM 1.0.5...

We use home-rolled scripts for doing our system backups, and one of the
steps creates snapshots of our database filesystems, so that we can dump
the snapshots to tape and get a consistent backup image.  These scripts
were misconfigured, and attempted to create a snapshot of a volume on a
volume group that did not exist.

This machine is running Linux 2.4.19, patched with Broadcom Gigabit
drivers and LVM 1.0.5 (linux-2.4.19-VFS-lock.patch and
lvm-1.0.5-2.4.19-1.burpr.patch, generated by running make in
/usr/src/LVM/1.0.5/PATCHES).  I then compiled and installed the LVM
userland tools from the sources.

This machine has one volume group, vg00, consisting of a single physical
volume, /dev/sda4, which is itself a partition of ~100GB on a hardware
RAID-10 array.

--->8--[ Cut Here ]--->8--
root@burpr(pts/1):~ 34 # ls -al /dev/vg00
total 47
dr-xr-xr-x    2 root     root          232 Oct  2 02:55 ./
drwxr-xr-x   15 root     root        46926 Oct  2 02:55 ../
brw-rw----    1 root     disk      58,   5 Oct  2 02:55 dat
brw-rw----    1 root     disk      58,   6 Oct  2 02:55 db1
brw-rw----    1 root     disk      58,   7 Oct  2 02:55 db2
crw-r-----    1 root     disk     109,   0 Oct  2 02:55 group
brw-rw----    1 root     disk      58,   3 Oct  2 02:55 home
brw-rw----    1 root     disk      58,   0 Oct  2 02:55 root
brw-rw----    1 root     disk      58,   1 Oct  2 02:55 tmp
brw-rw----    1 root     disk      58,   4 Oct  2 02:55 u
brw-rw----    1 root     disk      58,   8 Oct  2 02:55 unifytmp
brw-rw----    1 root     disk      58,   2 Oct  2 02:55 var
--->8--[ Cut Here ]--->8--

The command which was errantly run was:

--->8--[ Cut Here ]--->8--
lvcreate --size 8G --snapshot --name db1_snap vg01
--->8--[ Cut Here ]--->8--

I got this output:

--->8--[ Cut Here ]--->8--
lvcreate -- "/etc/lvmtab.d/vg01" doesn't exist
lvcreate -- can't create logical volume: volume group "vg01" doesn't
exist
--->8--[ Cut Here ]--->8--

That's all well and good, and expected.  Well, I saw the backup scripts
trying to do this, so I killed them off as cleanly as possible, fixed
the configuration, and restarted them.  Only now, they got stuck on the
first vgscan they tried to run.
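
The trouble started with a snapshot request against a volume group that did not exist, so a backup script could check for the volume group before calling lvcreate. The following is only a sketch. It assumes that LVM 1.x keeps one entry per known volume group under /etc/lvmtab.d (the path named in the error message above); the function names and the LVMTAB_DIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a backup script.
# Assumption: LVM 1.x records each known volume group as an entry named
# after the VG in /etc/lvmtab.d (the path quoted in the error above).
LVMTAB_DIR="${LVMTAB_DIR:-/etc/lvmtab.d}"

# vg_known VGNAME -> exit status 0 if an lvmtab entry exists for VGNAME.
vg_known() {
    [ -n "$1" ] && [ -e "$LVMTAB_DIR/$1" ]
}

# snapshot_if_vg_known VG LV SNAPNAME SIZE
# Prints the lvcreate command instead of running it, so the sketch is
# safe to try; remove the "echo" to act for real.
snapshot_if_vg_known() {
    if vg_known "$1"; then
        echo lvcreate --size "$4" --snapshot --name "$3" "/dev/$1/$2"
    else
        echo "skipping $2: volume group \"$1\" is not in $LVMTAB_DIR" >&2
        return 1
    fi
}
```

Called as `snapshot_if_vg_known vg01 db1 db1_snap 8G`, this would skip the snapshot with a message on stderr rather than hand the request to lvcreate.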

Running vgdisplay by hand now, I seem to have "lost" 8GB from my vg. 
vgdisplay shows 8GB less free than should be there if you add up the
allocations to all the existing lv's.  lvscan segfaults, and vgscan
hangs while trying to open /dev/lvm.  lvcreate hangs as well.  Running
strace:

--->8--[ Cut Here ]--->8--
root@burpr(pts/1):~ 51 # strace lvcreate --size 256M --snapshot --name
unifytmp_snap /dev/vg00/unifytmp vg00
--->8--[ Cut Here ]--->8--

ends up with a hang, and these are the last few lines of the trace:

--->8--[ Cut Here ]--->8--
open("/dev/vg00/group", O_RDONLY)       = 3
ioctl(3, 0xc004fe05, 0x80a40b8)         = 0
close(3)                                = 0
stat64("/dev/lvm", {st_mode=S_IFCHR|0640, st_rdev=makedev(109, 0), ...})
= 0
open("/dev/lvm", O_RDONLY)              = 3
ioctl(3, 0x8004fe98, 0xbfffec22)        = 0
close(3)                                = 0
stat64("/dev/lvm", {st_mode=S_IFCHR|0640, st_rdev=makedev(109, 0), ...})
= 0
open("/dev/lvm", O_RDONLY)              = 3
ioctl(3, 0xff00 <unfinished ...>
--->8--[ Cut Here ]--->8--

The <unfinished ...> is when I gave up after 5 minutes and hit
<control>-c.
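
The request numbers in the trace can be split into their fields to see which driver they belong to. The sketch below applies the Linux _IOC() layout used on i386 (bits 0-7 command number, bits 8-15 type, bits 16-29 argument size, bits 30-31 direction). The helper name decode_ioctl is hypothetical, and the mapping from each number to an LVM operation would still have to be looked up in the lvm.h of the patched kernel; that is not attempted here.

```shell
#!/bin/sh
# decode_ioctl REQUEST
# Split an ioctl request number, as printed by strace, into the fields
# of the Linux _IOC() encoding (i386 layout):
#   bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
# Runs in a subshell so its variables do not leak into the caller.
decode_ioctl() (
    req=$(( $1 ))
    dir=$((  (req >> 30) & 0x3    ))
    size=$(( (req >> 16) & 0x3fff ))
    type=$(( (req >> 8)  & 0xff   ))
    nr=$((    req        & 0xff   ))
    case $dir in
        0) dirtxt=none ;;
        1) dirtxt=write ;;
        2) dirtxt=read ;;
        3) dirtxt=read+write ;;
    esac
    printf 'type=0x%02x nr=0x%02x size=%d dir=%s\n' \
        "$type" "$nr" "$size" "$dirtxt"
)
```

For the three requests above, `decode_ioctl 0xc004fe05` prints `type=0xfe nr=0x05 size=4 dir=read+write`, `decode_ioctl 0x8004fe98` prints `type=0xfe nr=0x98 size=4 dir=read`, and `decode_ioctl 0xff00` prints `type=0xff nr=0x00 size=0 dir=none`. The last one, where the hang occurs, carries no size or direction bits.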

I have complete straces available of vgscan, lvscan, and lvcreate, as
well as the output of lvdisplay for each of the lv's I've got.  I also
have a core file for lvscan, if that would help, too.

We are going to reboot the server over lunch today, hopefully that will
clear out whatever kernel structures are gorked, but I'm really not
happy that this happened in the first place, and hope someone here can
point me to an answer.

The hardware is a Dell PowerEdge 6600 with PERC3/DC RAID controller (LSI
MegaRAID), 6 15krpm 36GB disks in a RAID-10, 8GB memory, four 1.6GHz
Xeon CPUs.  Running SuSE Linux Enterprise Server 7 (essentially a
stripped-down SuSE 7.2), kernel.org's 2.4.19 + Broadcom and LVM patches,
and LVM 1.0.5.

I haven't had any problems yet on another server (PowerEdge 2450, 2x
P-III 1GHz, 2GB ram, same kernel & lvm, different raid controller).

I've tried to be thorough in my data collection; let me know if there's
something more needed to debug this.

TIA

--
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-02 12:23 [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19? Gregory K. Ade
@ 2002-10-04  3:54 ` Heinz J . Mauelshagen
  2002-10-08 15:07   ` Gregory Ade
  0 siblings, 1 reply; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-10-04  3:54 UTC (permalink / raw)
  To: linux-lvm

Gregory,

running "lvcreate --size 8G --snapshot --name db1_snap vg01" should give
a syntax error rather than "... doesn't exist".

Did you eventually run
"lvcreate --size 8G --snapshot --name db1_snap /dev/vg01/db1"
instead?

I guess the problem has disappeared after your reboot, right?
If so, are you able to repeat the problem?

Regards,
Heinz    -- The LVM Guy --


*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-04  3:54 ` Heinz J . Mauelshagen
@ 2002-10-08 15:07   ` Gregory Ade
  2002-10-09  6:38     ` Heinz J . Mauelshagen
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory Ade @ 2002-10-08 15:07 UTC (permalink / raw)
  To: linux-lvm

On Fri, 2002-10-04 at 01:50, Heinz J . Mauelshagen wrote:
> 
> Gregory,
> 
> running "lvcreate --size 8G --snapshot --name db1_snap vg01" should give
> a syntax error rather than "... doesn't exist".
> 
> Did you eventually run
> "lvcreate --size 8G --snapshot --name db1_snap /dev/vg01/db1"
> instead?

According to the manpage and help for lvcreate, the syntax was correct. 
The problem was that vg01 really didn't exist. :)

That, however, is not the problem.

> I guess the problem has disappeared after your reboot, right?
> If so, are you able to repeat the problem?

Kinda...

I'm hesitant to do any serious poking at it just yet, as the machine
having the problems is already in production and being heavily used.

Rebooting mostly clears up the problem.  After a reboot, I can use
lvdisplay and vgdisplay to my heart's content.  I can lvcreate and
lvremove lv's from the vg without problem.  vgscan and lvscan work as
expected.

It seems that the `vgchange -a n` in /etc/init.d/halt hangs, though.  On
other systems, this is not the case, even with the same version kernel &
LVM utilities (2.4.19 & 1.0.5, respectively).

Another point is that this system is configured with root (/) on LVM
(/dev/vg00/root).  On other systems we have configured this way, we
occasionally get an error message that vgchange was unable to deactivate
the volume group because there were open volumes.  On this machine, it
just hangs, requiring a power-cycle to do a reboot (why did they stop
putting reset switches in Intel servers???)
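
Before `vgchange -a n` is attempted, the halt script could report which logical volumes are still open. The sketch below reads lvdisplay output in the LVM 1.x layout (an "LV Name" line followed later by a "# open" line) and prints each volume whose open count is non-zero. The function name open_lvs is hypothetical, and the field layout is assumed to be stable across volumes.

```shell
#!/bin/sh
# open_lvs: read lvdisplay output on stdin and print "LVNAME COUNT" for
# every logical volume whose "# open" count is greater than zero.
# Assumes the LVM 1.x lvdisplay layout, where "LV Name" precedes
# "# open" within each volume's block.
open_lvs() {
    awk '
        $1 == "LV" && $2 == "Name"           { name = $3 }
        $1 == "#"  && $2 == "open" && $3 > 0 { print name, $3 }
    '
}
```

A halt script might run `lvdisplay /dev/vg00/root | open_lvs` for each volume and log the result, so that a later hang can at least be tied to a known open volume.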

I haven't tried creating a snapshot yet, mainly because I'm gun shy
right now, and I don't want to risk imposing brain damage on the system
in the middle of the week.

I'm also experiencing bizarre behavior with this system when doing any
operations involving rapid traversal of the filesystems, which again
does not happen on any of our other systems with identical software
loaded.  The best example is to run:

	find / > /dev/null

On any other system, this just forces the system to read every directory
on the filesystem.  Not very useful, but it doesn't do much more than
take 10%CPU on a P-III 800 for a little bit.  On this system (Quad
1.6GHz Xeon, HyperThreading disabled, 8GB RAM, 8GB Swap, 100GB RAID-10),
running that command will eventually choke up the machine, forcing find
and kswapd to >40%CPU (according to top), and occasionally bringing
kupdated into the mix.

I'm currently trying to figure out if the problem is with LVM, if I need
to double the swap space to 16GB, or if I need to find a new driver for
the RAID card.  As it is, any intrusive testing on this system will have
to wait until I'm physically at the location with this server, which
will likely happen on or about Sat. Oct. 19, so I can have the machine
to myself on a weekend.

I'm starting to get desperate for a solution...

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-08 15:07   ` Gregory Ade
@ 2002-10-09  6:38     ` Heinz J . Mauelshagen
  2002-10-27 22:30       ` Gregory K. Ade
  0 siblings, 1 reply; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-10-09  6:38 UTC (permalink / raw)
  To: linux-lvm

On Tue, Oct 08, 2002 at 01:06:39PM -0700, Gregory Ade wrote:
> On Fri, 2002-10-04 at 01:50, Heinz J . Mauelshagen wrote:
> > 
> > Gregory,
> > 
> > running "lvcreate --size 8G --snapshot --name db1_snap vg01" should give
> > a syntax error rather than "... doesn't exist".
> > 
> > Did you eventually run
> > "lvcreate --size 8G --snapshot --name db1_snap /dev/vg01/db1"
> > instead?
> 
> According to the manpage and help for lvcreate, the syntax was correct. 
> The problem was that vg01 really didn't exist. :)
> 
> That, however, is not the problem.

<pickymode>
True, it is not. Manual page says "OriginalLogicalVolumePath".
</pickymode>

> 
> > I guess the problem has disappeared after your reboot, right?
> > If so, are you able to repeat the problem?
> 
> Kinda...

:)

> 
> I'm hesitant to do any serious poking at it just yet, as the machine
> having the problems is already in production and being heavily used.

I understand.

From the information in your message, the machine in question has a
variety of hardware components. If you have a chance, please put load
on the machine using non-LVM devices and see whether that causes
problems as well.

There's a variety of nasty things which can hit you under load:

- one or more CPUs with heat problems
- heat problems with mainboard components or drives
  (i.e. your airflow could be insufficient under load)
- flaky cables/connectors
- flaky drivers (as you already referred to)
- other kernel flaws

Tell me about your test results, please.
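
To put load on non-LVM devices only, the mounted filesystems first have to be split into LVM and non-LVM ones. The sketch below reads a mount table in /proc/mounts format and prints the mount points whose device is a /dev node outside the given volume group. The function name non_lvm_mounts is hypothetical, and the sketch assumes LVM 1.x device names of the form /dev/VG/LV.

```shell
#!/bin/sh
# non_lvm_mounts VG
# Read a mount table (/proc/mounts format) on stdin and print the mount
# point of every /dev-backed filesystem that is NOT under /dev/VG/.
# Assumes LVM 1.x naming, where logical volumes appear as /dev/VG/LV.
non_lvm_mounts() {
    awk -v prefix="/dev/$1/" '
        index($1, "/dev/") == 1 && index($1, prefix) != 1 { print $2 }
    '
}
```

Each mount point printed by `non_lvm_mounts vg00 < /proc/mounts` could then be walked with `find MOUNTPOINT -xdev > /dev/null` to repeat the filesystem-traversal load without touching the LVM driver.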

Regards,
Heinz    -- The LVM Guy --

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-09  6:38     ` Heinz J . Mauelshagen
@ 2002-10-27 22:30       ` Gregory K. Ade
  2002-10-28  3:21         ` Heinz J . Mauelshagen
  2002-10-28 14:36         ` jon+lvm
  0 siblings, 2 replies; 21+ messages in thread
From: Gregory K. Ade @ 2002-10-27 22:30 UTC (permalink / raw)
  To: linux-lvm

On Wed, 2002-10-09 at 04:36, Heinz J . Mauelshagen wrote:
> <pickymode>
> True, it is not. Manual page says "OriginalLogicalVolumePath".
> </pickymode>

root@burpr(pts/0):~ 24 # lvcreate --snapshot --extents 512 --name tmp_snap /dev/vg00/tmp
lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/vg00/tmp_snap"
Segmentation fault
root@burpr(pts/0):~ 25 #

This is not good.  This is a system that has been recently rebooted, and
come up clean.  This is the very first LVM command I have run by hand
since the boot.

In terms of the other problems I was having, I think I _may_ have gotten
those fixed (i.e., things like `find /` killing the system) by some
friendly souls in the linux-kernel mailing list with a VM patch.  There
seemed to be a bug in largemem systems that was being exercised.

Now, I just rebooted the system with that patch on a fresh kernel: 
Linux Kernel 2.4.19, patched with the VM patch, LVM 1.0.5 and Broadcom
Gigabit Ethernet patches.

I left for dinner, during which time nobody did anything to the
machine.  I came back and decided to try creating a snapshot volume of
/dev/vg00/tmp, and lvcreate segfaulted.  lvscan also segfaults, vgscan
hangs, and vgdisplay and lvdisplay seem to still work.

Trying to create a test volume after that (`lvcreate --extents 512
--name testvol vg00`) simply hangs.
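
Since the LVM tools either crash or hang in this state, a script that calls them could wrap each call so it can tell the two apart and stop early. The sketch below assumes the `timeout` utility from GNU coreutils is available (it is much newer than the system described here); the function name run_guarded is hypothetical. Exit status 124 is what `timeout` reports when the time limit was hit, and a status above 128 is the shell convention for death by signal (139 for a segmentation fault).

```shell
#!/bin/sh
# run_guarded SECONDS COMMAND [ARGS...]
# Run COMMAND under a time limit and print one summary line:
#   ok | timeout | signal N | exit N
# Assumes GNU coreutils "timeout", which exits 124 when the limit is hit.
run_guarded() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo ok
    elif [ "$status" -eq 124 ]; then
        echo timeout
    elif [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
    return "$status"
}
```

A backup script could call `run_guarded 60 vgscan` and abort on anything other than `ok`. A process blocked uninterruptibly inside the kernel would not be stopped by this, so it is a safeguard for the script rather than a cure for the hang.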

So, who do I give what information to so that we can trace down the base
of this problem, and get a fix?  Ask me for whatever you need from the
system, and I'll provide it if I can.

Thanks again in advance,

Gregory

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-27 22:30       ` Gregory K. Ade
@ 2002-10-28  3:21         ` Heinz J . Mauelshagen
  2002-10-28 22:38           ` Gregory K. Ade
  2002-10-28 14:36         ` jon+lvm
  1 sibling, 1 reply; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-10-28  3:21 UTC (permalink / raw)
  To: linux-lvm

On Sun, Oct 27, 2002 at 09:29:35PM -0800, Gregory K. Ade wrote:
> On Wed, 2002-10-09 at 04:36, Heinz J . Mauelshagen wrote:
> > <pickymode>
> > True, it is not. Manual page says "OriginalLogicalVolumePath".
> > </pickymode>
> 
> root@burpr(pts/0):~ 24 # lvcreate --snapshot --extents 512 --name tmp_snap /dev/vg00/tmp
> lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/vg00/tmp_snap"
> Segmentation fault
> root@burpr(pts/0):~ 25 #
> 
> This is not good.  This is a system that has been recently rebooted, and
> come up clean.  This is the very first LVM command I have run by hand
> since the boot.
> 
> In terms of the other problems I was having, I think I _may_ have gotten
> those fixed (i.e., things like `find /` killing the system) by some
> friendly souls in the linux-kernel mailing list with a VM patch.  There
> seemed to be a bug in largemem systems that was being exercised.
> 
> Now, I just rebooted the system with that patch on a fresh kernel: 
> Linux Kernel 2.4.19, patched with the VM patch, LVM 1.0.5 and Broadcom
> Gigabit Ethernet patches.
> 
> I left for dinner, during which time nobody did anything to the
> machine.  I came back and decided to try creating a snapshot volume of
> /dev/vg00/tmp, and lvcreate segfaulted.  lvscan also segfaults, vgscan
> hangs, and vgdisplay and lvdisplay seem to still work.
> 
> Trying to create a test volume after that (`lvcreate --extents 512
> --name testvol vg00`) simply hangs.
> 
> So, who do I give what information to so that we can trace down the base
> of this problem, and get a fix?  Ask me for whatever you need from the
> system, and I'll provide it if I can.

Run the kernel oops you probably got for "lvcreate --snapshot ..." through
ksymoops, please.

After that oops the LVM driver is in no sane state anyway which
is the reason for other LVM commands to behave strangely.
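
One way to get the oops into a file for ksymoops is to cut it out of the kernel log. The sketch below prints the first oops it finds on stdin, from the opening line through the "Code:" line. The function name extract_oops and the file name oops.txt are hypothetical, and the start patterns are assumptions about how 2.4 kernels introduce an oops.

```shell
#!/bin/sh
# extract_oops: print the first kernel oops found on stdin, starting at
# its opening line and ending with the "Code:" line.
# The start patterns are assumptions about 2.4-era oops headers.
extract_oops() {
    awk '
        /kernel BUG at|Oops:|Unable to handle kernel/ { grab = 1 }
        grab                                          { print }
        grab && /^Code:/                              { exit }
    '
}
```

With that, `dmesg | extract_oops > oops.txt` followed by `ksymoops -m /boot/System.map-$(uname -r) oops.txt` would decode the trace against the System.map of the running kernel.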

> 
> Thanks again in advance,
> 
> Gregory
> 
> 
> 
> -- 
> Gregory K. Ade <gkade@bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu



-- 

Regards,
Heinz    -- The LVM Guy --

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-27 22:30       ` Gregory K. Ade
  2002-10-28  3:21         ` Heinz J . Mauelshagen
@ 2002-10-28 14:36         ` jon+lvm
  2002-10-28 16:40           ` Gregory K. Ade
  1 sibling, 1 reply; 21+ messages in thread
From: jon+lvm @ 2002-10-28 14:36 UTC (permalink / raw)
  To: linux-lvm

On Sun, Oct 27, 2002 at 09:29:35PM -0800, Gregory K. Ade wrote:

[cut]

> So, who do I give what information to so that we can trace down the base
> of this problem, and get a fix?  Ask me for whatever you need from the
> system, and I'll provide it if I can.

I don't know what your problem is, but I run LVM 1.0.5 on a default 2.4.19
without trouble (any more).
I lvcreate once an hour, though it is a snapshot, and remove it again
the next hour. I've got a script running every minute that extends the
snapshots if they become too small. So, maybe the problem isn't in LVM, but
in memory, and it just happens to hit LVM?
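
The script itself is not shown in this thread. The decision step of such a job could look like the sketch below, which is an illustration and not JonB's script. It takes the snapshot's current fill level as a whole-number percentage and prints an lvextend command once a limit is reached. How the fill level is read from lvdisplay is left out, because that output format is not shown here; the function name and all values are hypothetical.

```shell
#!/bin/sh
# extend_cmd_if_needed LV USED_PCT LIMIT_PCT STEP
# Print the lvextend command that would grow snapshot LV by STEP when
# its fill level USED_PCT (integer percent) has reached LIMIT_PCT.
# Prints nothing when the snapshot still has room.
extend_cmd_if_needed() {
    if [ "$2" -ge "$3" ]; then
        echo lvextend -L "+$4" "$1"
    fi
}
```

For example, `extend_cmd_if_needed /dev/vg00/tmp_snap 85 80 256M` prints `lvextend -L +256M /dev/vg00/tmp_snap`, while a fill level of 10 prints nothing.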

JonB

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-28 14:36         ` jon+lvm
@ 2002-10-28 16:40           ` Gregory K. Ade
  2002-10-29 15:21             ` Luca Berra
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory K. Ade @ 2002-10-28 16:40 UTC (permalink / raw)
  To: linux-lvm

On Mon, 2002-10-28 at 12:35, jon+lvm@silicide.dk wrote:
> On Sun, Oct 27, 2002 at 09:29:35PM -0800, Gregory K. Ade wrote:
> 
> [cut]
> 
> > So, who do I give what information to so that we can trace down the base
> > of this problem, and get a fix?  Ask me for whatever you need from the
> > system, and I'll provide it if I can.
> 
> I don't know what your problem is, but I run LVM 1.0.5 on a default 2.4.19
> without trouble (any more).
> I lvcreate once an hour, though it is a snapshot, and remove it again
> the next hour. I've got a script running every minute that extends the
> snapshots if they become too small. So, maybe the problem isn't in LVM, but
> in memory, and it just happens to hit LVM?

Interesting that you should mention this.  This system has proved to
have several interesting quirks, and most of those that have been
resolved were fixed by patches or other changes related to the large
memory configuration (8GB).  System performance issues related to the
filesystems were actually resolved by applying a VM/VFS patch (I'm not
really sure what it patched, honestly) that specifically addressed
problems on systems with more than 2GB of RAM.

Is it possible that similar issues may be present in LVM?  How many
people here are running LVM on systems with ~100GB available storage or
more and 4GB or more of RAM?

This is the only system exhibiting these problems.  Then again, this is
the only system that requires highmem (64GB) support in order to address
all the memory, too.

I'll be able to generate a ksymoops sometime tonight, which will
hopefully point the LVM developers in the right direction for a fix.

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu

* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-28  3:21         ` Heinz J . Mauelshagen
@ 2002-10-28 22:38           ` Gregory K. Ade
  2002-10-29  3:15             ` Heinz J . Mauelshagen
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory K. Ade @ 2002-10-28 22:38 UTC (permalink / raw)
  To: linux-lvm

On Mon, 2002-10-28 at 01:17, Heinz J . Mauelshagen wrote:

> Run the kernel oops you probably got for "lvcreate --snapshot ..." through
> ksymoops, please.

Okay, just for consistency's sake, I ran:

--->8--[Cut Here]-->8---
root@burpr(pts/24):/ 36 # lvdisplay /dev/vg00/tmp
--- Logical volume ---
LV Name                /dev/vg00/tmp
VG Name                vg00
LV Write Access        read/write
LV Status              available
LV #                   2
# open                 1
LV Size                2 GB
Current LE             512
Allocated LE           512
Allocation             next free
Read ahead sectors     120
Block device           58:1


root@burpr(pts/24):/ 37 # lvcreate --snapshot --extents 512 --name
tmp_snap /dev/vg00/tmp
lvcreate -- INFO: using default snapshot chunk size of 64 KB for
"/dev/vg00/tmp_snap"
Segmentation fault
root@burpr(pts/24):/ 38 #
--->8--[Cut Here]-->8---

That gave me an oops.  Pulling it from dmesg and sticking it into a
file, I ran ksymoops, and here's the output in all its hairy glory:

--->8--[Cut Here]-->8---
ksymoops 2.4.1 on i686 2.4.19-2.burpr.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.19-2.burpr/ (default)
     -m /boot/System.map-2.4.19-2.burpr (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_maps): mismatch on symbol usb_devfs_handle  , usbcore says f9233274, /lib/modules/2.4.19-2.burpr/kernel/drivers/usb/usbcore.o says f9232d34.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/drivers/usb/usbcore.o entry
Warning (compare_maps): mismatch on symbol icmpv6_socket  , ipv6 says f92232e0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f9220420.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol icmpv6_statistics  , ipv6 says f92212e0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921e420.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inet6_dev_count  , ipv6 says f921f000, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921c140.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inet6_ifa_count  , ipv6 says f921f004, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921c144.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inet6_protos  , ipv6 says f9221260, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921e3a0.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inetsw6  , ipv6 says f921efa0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921c0e0.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol ip6_ra_chain  , ipv6 says f92209a0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921dae0.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol ipv6_statistics  , ipv6 says f921f1a0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921c2e0.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol raw_v6_htable  , ipv6 says f92211e0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921e320.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol rt6_stats  , ipv6 says f921f168, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921c2a8.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol udp_stats_in6  , ipv6 says f92209e0, /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o says f921db20.  Ignoring /lib/modules/2.4.19-2.burpr/kernel/net/ipv6/ipv6.o entry
kernel BUG at vmalloc.c:236!
invalid operand: 0000
CPU:    2
EIP:    0010:[<c0132082>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: ffffffff   ebx: 00000000   ecx: 51eb851f   edx: 00000000
esi: 00000000   edi: e6421400   ebp: fffffff4   esp: d5d7bd20
ds: 0018   es: 0018   ss: 0018
Process lvcreate (pid: 19229, stackpage=d5d7b000)
Stack: 00000000 00000000 e6421400 fffffff4 000001f0 f931a000 00000001 fffffff4
       c030ca14 c030cb7c 000001f0 00000001 c024cd45 00000000 000001f2 00000163
       e642156c 00000000 e6421400 d5d7bdf8 c024cdf8 e6421400 e6421400 000bd000
Call Trace:    [<c024cd45>] [<c024cdf8>] [<c024a850>] [<c02480cc>] [<c01eaaff>]
  [<c014b397>] [<c01089cb>]
Code: 0f 0b ec 00 00 ae 2b c0 31 c0 e9 e4 01 00 00 6a 02 53 e8 3f

>>EIP; c0132082 <__vmalloc+26/224>   <=====
Trace; c024cd45 <lvm_snapshot_alloc_hash_table+45/8c>
Trace; c024cdf8 <lvm_snapshot_alloc+6c/e0>
Trace; c024a850 <lvm_do_lv_create+528/878>
Trace; c02480cc <lvm_chr_ioctl+71c/828>
Trace; c01eaaff <locate_hd_struct+27/70>
Trace; c014b397 <sys_ioctl+1bb/1f6>
Trace; c01089cb <system_call+33/38>
Code;  c0132082 <__vmalloc+26/224>
00000000 <_EIP>:
Code;  c0132082 <__vmalloc+26/224>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0132084 <__vmalloc+28/224>
   2:   ec                        in     (%dx),%al
Code;  c0132085 <__vmalloc+29/224>
   3:   00 00                     add    %al,(%eax)
Code;  c0132087 <__vmalloc+2b/224>
   5:   ae                        scas   %es:(%edi),%al
Code;  c0132088 <__vmalloc+2c/224>
   6:   2b c0                     sub    %eax,%eax
Code;  c013208a <__vmalloc+2e/224>
   8:   31 c0                     xor    %eax,%eax
Code;  c013208c <__vmalloc+30/224>
   a:   e9 e4 01 00 00            jmp    1f3 <_EIP+0x1f3> c0132275 <__vmalloc+219/224>
Code;  c0132091 <__vmalloc+35/224>
   f:   6a 02                     push   $0x2
Code;  c0132093 <__vmalloc+37/224>
  11:   53                        push   %ebx
Code;  c0132094 <__vmalloc+38/224>
  12:   e8 3f 00 00 00            call   56 <_EIP+0x56> c01320d8 <__vmalloc+7c/224>


13 warnings issued.  Results may not be reliable.
--->8--[Cut Here]-->8---

Anything else I can provide?

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-28 22:38           ` Gregory K. Ade
@ 2002-10-29  3:15             ` Heinz J . Mauelshagen
  2002-10-29 13:54               ` Gregory K. Ade
  0 siblings, 1 reply; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-10-29  3:15 UTC (permalink / raw)
  To: linux-lvm

Gregory,

as Jon already thought, this seems to be a flaw in the VM subsystem.

The ksymoops output you provided shows that, because it fails in vmalloc when
lvm_snapshot_alloc_hash_table() tries to allocate virtual memory for the
copy-on-write exception table it needs to track the changes which happen
to the original logical volume.

Is there any chance to prove that by running the system with less than 2GB
of memory and _without_ high memory support for a test run?
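
The call chain read off the ksymoops output above can also be extracted mechanically. What follows is only a minimal sketch in Python: the regular expression and the helper name call_chain are illustrative assumptions, not part of ksymoops or LVM, and the input lines are copied from the decoded oops in this thread.

```python
import re

# Lines copied from the decoded oops posted in this thread.
KSYMOOPS_LINES = """\
>>EIP; c0132082 <__vmalloc+26/224>   <=====
Trace; c024cd45 <lvm_snapshot_alloc_hash_table+45/8c>
Trace; c024cdf8 <lvm_snapshot_alloc+6c/e0>
Trace; c024a850 <lvm_do_lv_create+528/878>
Trace; c02480cc <lvm_chr_ioctl+71c/828>
Trace; c01eaaff <locate_hd_struct+27/70>
Trace; c014b397 <sys_ioctl+1bb/1f6>
Trace; c01089cb <system_call+33/38>
"""

# Hypothetical pattern: "<tag>; <address> <symbol+offset/length>"
FRAME = re.compile(
    r"^(?:>>EIP|Trace);\s+([0-9a-f]{8})\s+<([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def call_chain(text):
    """Return (symbol, address, offset, length) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:
            addr, sym, off, length = match.groups()
            frames.append((sym, int(addr, 16), int(off, 16), int(length, 16)))
    return frames


chain = call_chain(KSYMOOPS_LINES)
for sym, addr, off, length in chain:
    print(f"{sym:32s} {addr:#010x}  +{off:#x} of {length:#x}")
```

Read top to bottom, the printed list runs from the faulting function (__vmalloc) outward to the system call entry, which is the order used in the explanation above.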

Regards,
Heinz    -- The LVM Guy --


On Mon, Oct 28, 2002 at 08:37:52PM -0800, Gregory K. Ade wrote:
> On Mon, 2002-10-28 at 01:17, Heinz J . Mauelshagen wrote:
> 
> > Run the kernel oops you probably got for "lvcreate --snapshot ..." through
> > ksymoops, please.
> 
> Okay, just for consistency's sake, I ran:
> 
> --->8--[Cut Here]-->8---
> root@burpr(pts/24):/ 36 # lvdisplay /dev/vg00/tmp
> --- Logical volume ---
> LV Name                /dev/vg00/tmp
> VG Name                vg00
> LV Write Access        read/write
> LV Status              available
> LV #                   2
> # open                 1
> LV Size                2 GB
> Current LE             512
> Allocated LE           512
> Allocation             next free
> Read ahead sectors     120
> Block device           58:1
> 
> 
> root@burpr(pts/24):/ 37 # lvcreate --snapshot --extents 512 --name tmp_snap /dev/vg00/tmp
> lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/vg00/tmp_snap"
> Segmentation fault
> root@burpr(pts/24):/ 38 #
> --->8--[Cut Here]-->8---
> 
> That gave me an oops.  Pulling it from dmesg and sticking it into a
> file, I ran ksymoops, and here's the output in all its hairy glory:
> 
> [snip ksymoops output, quoted unchanged from the previous message]
> 
> Anything else I can provide?
> 
> -- 
> Gregory K. Ade <gkade@bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-29  3:15             ` Heinz J . Mauelshagen
@ 2002-10-29 13:54               ` Gregory K. Ade
  2002-10-31  6:52                 ` Heinz J . Mauelshagen
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory K. Ade @ 2002-10-29 13:54 UTC (permalink / raw)
  To: linux-lvm

On Tue, 2002-10-29 at 01:10, Heinz J . Mauelshagen wrote:
> 
> Gregory,
> 
> as Jon already thought, this seems to be a flaw in the VM subsystem.
> 
> The ksymoops output you provided shows that, because it fails in vmalloc when
> lvm_snapshot_alloc_hash_table() tries to allocate virtual memory for the
> copy-on-write exception table it needs to track the changes which happen
> to the original logical volume.
> 
> Is there any chance to prove that by running the system with less than 2GB
> of memory and _without_ high memory support for a test run?
> 
> Regards,
> Heinz    -- The LVM Guy --

Actually, yes, I can run this on several other smaller-memory systems. 
I've got a machine I've been working on setting up that's 2GB exactly,
and several others running 2.4.19 & LVM 1.0.5 with less than 1GB of
RAM.  However, I've not had any of these systems produce errors like
these.  I can run the 2GB system with and without high memory support
and try to induce a similar failure.

Are there any specific tests you'd like me to run on this other system,
in case I can't induce an Oops on it?

Thanks,

Gregory

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-28 16:40           ` Gregory K. Ade
@ 2002-10-29 15:21             ` Luca Berra
  0 siblings, 0 replies; 21+ messages in thread
From: Luca Berra @ 2002-10-29 15:21 UTC (permalink / raw)
  To: linux-lvm

On Mon, Oct 28, 2002 at 02:39:40PM -0800, Gregory K. Ade wrote:
>On Mon, 2002-10-28 at 12:35, jon+lvm@silicide.dk wrote:
>> On Sun, Oct 27, 2002 at 09:29:35PM -0800, Gregory K. Ade wrote:
>Is it possible that similar issues may be present in LVM?  How many
>people here are running LVM on systems with ~100GB available storage or
>more and 4GB or more of RAM?

i am, rh 2.4.18-5 kernel (with rh-supplied lvm [1.0.3 + fixes])
8 cpus
8GB ram
two FC cards (emulex)
about 1 tb of space on an HP VA7xxx (through a brocade switch)

i have had other problems, but not the one you mention

regards,
L.
(cannot do ANY test, that baby is in production)

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-29 13:54               ` Gregory K. Ade
@ 2002-10-31  6:52                 ` Heinz J . Mauelshagen
  2002-10-31 16:55                   ` Gregory Ade
  0 siblings, 1 reply; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-10-31  6:52 UTC (permalink / raw)
  To: linux-lvm

On Tue, Oct 29, 2002 at 11:53:44AM -0800, Gregory K. Ade wrote:
> On Tue, 2002-10-29 at 01:10, Heinz J . Mauelshagen wrote:
> > 
> > Gregory,
> > 
> > as Jon already thought, this seems to be a flaw in the VM subsystem.
> > 
> > The ksymoops output you provided shows that, because it fails in vmalloc when
> > lvm_snapshot_alloc_hash_table() tries to allocate virtual memory for the
> > copy-on-write exception table it needs to track the changes which happen
> > to the original logical volume.
> > 
> > Is there any chance to prove that by running the system with less than 2GB
> > of memory and _without_ high memory support for a test run?
> > 
> > Regards,
> > Heinz    -- The LVM Guy --
> 
> Actually, yes, I can run this on several other smaller-memory systems. 
> I've got a machine I've been working on setting up that's 2GB exactly,
> and several others running 2.4.19 & LVM 1.0.5 with less than 1GB of
> RAM.  However, I've not had any of these systems produce errors like
> these.  I can run the 2GB system with and without high memory support
> and try to induce a similar failure.
> 
> Are there any specific tests you'd like me to run on this other system,
> in case I can't induce an Oops on it?

Gregory, a _different_ system is unlikely to help.
I was asking for changes to take effect on the system where you do have
the problem.
If you have a chance to reconfigure the failing system temporarily,
running what you did before should reproduce the same problems (segfault
on snapshot creation after fresh reboot) in case there's _no_ bug in the
large/high memory support.
My assumption is that it will not.

> 
> Thanks,
> 
> Gregory
> 
> -- 
> Gregory K. Ade <gkade@bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu



-- 

Regards,
Heinz    -- The LVM Guy --


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-31  6:52                 ` Heinz J . Mauelshagen
@ 2002-10-31 16:55                   ` Gregory Ade
  2002-11-01  9:07                     ` Wolfgang Weisselberg
  2002-11-05  8:33                     ` Heinz J . Mauelshagen
  0 siblings, 2 replies; 21+ messages in thread
From: Gregory Ade @ 2002-10-31 16:55 UTC (permalink / raw)
  To: linux-lvm

On Thu, 2002-10-31 at 04:48, Heinz J . Mauelshagen wrote:

> Gregory, a _different_ system is unlikely to help.
> I was asking for changes to take effect on the system where you do have
> the problem.

Ahh...  That may take a little time, then.  I can certainly prepare
another kernel for testing with high memory support turned off, and try
to do some tests at the end of business on Friday.  It might have to
wait a little longer, though.  I'll see what they're willing to
schedule, and get you results when I can.

> If you have a chance to reconfigure the failing system temporarily,
> running what you did before should reproduce the same problems (segfault
> on snapshot creation after fresh reboot) in case there's _no_ bug in the
> large/high memory support.
> My assumption is that it will not.

So, just for clarification, on the system where I am having this
problem, all you want me to do is run a kernel without high memory
support and attempt the snapshot creation?  Unfortunately, the way the
system was assembled, the smallest I can physically reduce the RAM to is
4GB (two DIMMs on each memory carrier is the minimum required for the
system to boot, and they shipped it to us with 8 1GB DIMMs.)

Thanks again,
Gregory

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-31 16:55                   ` Gregory Ade
@ 2002-11-01  9:07                     ` Wolfgang Weisselberg
  2002-11-05  8:33                     ` Heinz J . Mauelshagen
  1 sibling, 0 replies; 21+ messages in thread
From: Wolfgang Weisselberg @ 2002-11-01  9:07 UTC (permalink / raw)
  To: linux-lvm

Gregory Ade (gkade@bigbrother.net) wrote 60 lines:

> support and attempt the snapshot creation?  Unfortunately, the way the
> system was assembled, the smallest I can physically reduce the RAM to is
> 4GB (two DIMMs on each memory carrier is the minimum required for the
> system to boot, and they shipped it to us with 8 1GB DIMMs.)

There should be no problem with "mem=1023M" as a boot parameter,
as long as at least 1023 MB are actually physically available.

-Wolfgang


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-10-31 16:55                   ` Gregory Ade
  2002-11-01  9:07                     ` Wolfgang Weisselberg
@ 2002-11-05  8:33                     ` Heinz J . Mauelshagen
  2002-11-07 21:45                       ` Gregory Ade
  1 sibling, 1 reply; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-11-05  8:33 UTC (permalink / raw)
  To: linux-lvm

On Thu, Oct 31, 2002 at 02:54:18PM -0800, Gregory Ade wrote:
> On Thu, 2002-10-31 at 04:48, Heinz J . Mauelshagen wrote:
> 
> > Gregory, a _different_ system is unlikely to help.
> > I was asking for changes to take effect on the system where you do have
> > the problem.
> 
> Ahh...  That may take a little time, then.  I can certainly prepare
> another kernel for testing with high memory support turned off, and try
> to do some tests at the end of business on Friday.  It might have to
> wait a little longer, though.  I'll see what they're willing to
> schedule, and get you results when I can.
> 
> > If you have a chance to reconfigure the failing system temporarily,
> > running what you did before should reproduce the same problems (segfault
> > on snapshot creation after fresh reboot) in case there's _no_ bug in the
> > large/high memory support.
> > My assumption is that it will not.
> 
> So, just for clarification, on the system where I am having this
> problem, all you want me to do is run a kernel without high memory
> support and attempt the snapshot creation?  Unfortunately, the way the
> system was assembled, the smallest I can physically reduce the RAM to is
> 4GB (two DIMMs on each memory carrier is the minimum required for the
> system to boot, and they shipped it to us with 8 1GB DIMMs.)

There's no need to physically remove any DIMMs. Just cook up a kernel
without the patches you mentioned to get that elephant going and
without high memory support.

You can boot with the LILO command line containing "mem=1G" to test with
just 1GB.
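
After booting with a mem= limit it is worth confirming that the kernel really sees less memory before repeating the snapshot test. A minimal sketch in Python, assuming the usual "MemTotal: <n> kB" line in /proc/meminfo; the helper names and the sample values are hypothetical and only there so the example can run anywhere.

```python
def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def within_limit(meminfo_text, limit_mb):
    """True if the kernel reports no more than limit_mb megabytes of RAM."""
    return mem_total_kb(meminfo_text) <= limit_mb * 1024


# Made-up sample; on the test machine, read /proc/meminfo instead.
SAMPLE = "MemTotal:      1031000 kB\nMemFree:        512000 kB\n"
print(within_limit(SAMPLE, 1024))
```

On the real system the text would come from open("/proc/meminfo").read(); a result of False after booting with "mem=1G" would suggest the boot parameter was not picked up.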

> 
> Thanks again,
> Gregory
> 
> -- 
> Gregory K. Ade <gkade@bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu

Regards,
Heinz    -- The LVM Guy --


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-11-05  8:33                     ` Heinz J . Mauelshagen
@ 2002-11-07 21:45                       ` Gregory Ade
  2002-11-09  6:12                         ` Heinz J . Mauelshagen
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory Ade @ 2002-11-07 21:45 UTC (permalink / raw)
  To: linux-lvm

On Tue, 2002-11-05 at 06:28, Heinz J . Mauelshagen wrote:

> > > If you have a chance to reconfigure the failing system temporarily,
> > > running what you did before should reproduce the same problems (segfault
> > > on snapshot creation after fresh reboot) in case there's _no_ bug in the
> > > large/high memory support.
> > > My assumption is that it will not.

> There's no need to physically remove any DIMMs. Just cook up a kernel
> without the patches you mentioned to get that elephant going and
> without high memory support.

Okay, I disabled high memory support (only change from production
kernel), rebooted with this test kernel, and tried to create a snapshot:

root@burpr(pts/0):~ 26 # lvcreate --snapshot --extents 512 --name tmp_snap /dev/vg00/tmp
lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/vg00/tmp_snap"
lvcreate -- doing automatic backup of "vg00"
lvcreate -- logical volume "/dev/vg00/tmp_snap" successfully created
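
For a sense of scale, the numbers in this transcript and in the lvdisplay output can be turned into a chunk count. This is a rough sketch in Python using only figures shown in the thread (2 GB LV, 512 extents, 64 KB chunks); treating the chunk count as the number of copy-on-write exception table entries is an assumption, not something the thread states.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

lv_size = 2 * GB        # "LV Size 2 GB" from lvdisplay
extents = 512           # "Current LE 512", also passed as --extents 512
chunk_size = 64 * KB    # default snapshot chunk size reported by lvcreate

extent_size = lv_size // extents          # size of one logical extent
snapshot_size = extents * extent_size     # space given to the snapshot
chunks = snapshot_size // chunk_size      # 64 KB chunks the snapshot can hold

print(f"extent size:   {extent_size // MB} MB")
print(f"snapshot size: {snapshot_size // GB} GB")
print(f"64 KB chunks:  {chunks}")
```

With these inputs the extent size works out to 4 MB and the snapshot can hold 32768 chunks, so any per-chunk table for it stays small.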


It worked just fine:

root@burpr(pts/1):log 28 # lvdisplay /dev/vg00/tmp
--- Logical volume ---
LV Name                /dev/vg00/tmp
VG Name                vg00
LV Write Access        read/write
LV snapshot status     source of
                       /dev/vg00/tmp_snap [active]
LV Status              available
LV #                   2
# open                 1
LV Size                2 GB
Current LE             512
Allocated LE           512
Allocation             next free
Read ahead sectors     120
Block device           58:1

So I removed it:

root@burpr(pts/1):log 29 # lvremove /dev/vg00/tmp_snap
lvremove -- do you really want to remove "/dev/vg00/tmp_snap"? [y/n]: y
lvremove -- doing automatic backup of volume group "vg00"
lvremove -- logical volume "/dev/vg00/tmp_snap" successfully removed


No kernel oops or BUG in the dmesg.

Again, the _ONLY DIFFERENCE_ between this test kernel and the production
kernel is the high-memory support option.  On the test kernel, it is
off, and on the production kernel, it is set to 64GB.
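
A claim like "the only difference is the high-memory option" can be checked by comparing the two kernel .config files. The sketch below is in Python; the helper names are hypothetical and the two config fragments are made-up stand-ins for the production and test kernels, using the 2.4-era symbols CONFIG_NOHIGHMEM, CONFIG_HIGHMEM4G and CONFIG_HIGHMEM64G.

```python
def parse_config(text):
    """Map each CONFIG_ symbol in kernel .config text to its value ('n' if unset)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(old_text, new_text):
    """Return {symbol: (old, new)} for every symbol whose value differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    diff = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            diff[name] = (before, after)
    return diff


# Made-up fragments, not the real configs from this machine.
PRODUCTION = """\
CONFIG_SMP=y
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y
CONFIG_BLK_DEV_LVM=y
"""
TEST = """\
CONFIG_SMP=y
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
CONFIG_BLK_DEV_LVM=y
"""

for name, (before, after) in config_diff(PRODUCTION, TEST).items():
    print(f"{name}: {before} -> {after}")
```

Because one menu choice can switch several derived symbols, more than one line may show up in the diff even when only the high-memory setting was changed by hand.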

Hope this helps.

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-11-07 21:45                       ` Gregory Ade
@ 2002-11-09  6:12                         ` Heinz J . Mauelshagen
  2002-11-11  5:56                           ` Jon Bendtsen
                                             ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Heinz J . Mauelshagen @ 2002-11-09  6:12 UTC (permalink / raw)
  To: linux-lvm

Gregory,

this proves my assumption right that something is fishy with the high
memory support in your SMP environment.

My guess is that it would also work if you build a single-processor kernel
_with_ high memory enabled and repeat the very same test, which would point
to a highmem/SMP problem that still needs to be fixed.
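
As a hedged aside on why a high-memory configuration can matter to size arithmetic at all: on 32-bit x86 an unsigned long holds 32 bits, so a byte count derived from a page count wraps once physical memory reaches 4GB. The Python sketch below only demonstrates that wraparound for RAM sizes mentioned in this thread; whether any code on the snapshot path computes sizes this way is an assumption that nothing in the thread confirms, and the function name is made up.

```python
PAGE_SHIFT = 12          # 4 KB pages on i386
WORD_MASK = 0xFFFFFFFF   # an unsigned long is 32 bits wide on i386


def bytes_as_u32(num_pages):
    """Byte count for num_pages pages, truncated to 32 bits as on i386."""
    return (num_pages << PAGE_SHIFT) & WORD_MASK


for gib in (1, 2, 4, 8):
    pages = (gib << 30) >> PAGE_SHIFT
    print(f"{gib} GB RAM -> {pages} pages -> {bytes_as_u32(pages)} bytes in 32 bits")
```

For 1GB and 2GB the truncated value equals the true byte count; for 4GB and 8GB it wraps to zero. That is one reason to compare runs with and without high memory on the same machine before blaming SMP.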

Anyone else with hints?

Regards,
Heinz    -- The LVM Guy --


On Thu, Nov 07, 2002 at 07:44:36PM -0800, Gregory Ade wrote:
> On Tue, 2002-11-05 at 06:28, Heinz J . Mauelshagen wrote:
> 
> > > > If you have a chance to reconfigure the failing system temporarily,
> > > > running what you did before should reproduce the same problems (segfault
> > > > on snapshot creation after fresh reboot) in case there's _no_ bug in the
> > > > large/high memory support.
> > > > My assumption is that it will not.
> 
> > There's no need to physically remove any DIMMs. Just cook up a kernel
> > without the patches you mentioned to get that elephant going and
> > without high memory support.
> 
> Okay, I disabled high memory support (only change from production
> kernel), rebooted with this test kernel, and tried to create a snapshot:
> 
> [snip]
> 
> Again, the _ONLY DIFFERENCE_ between this test kernel and the production
> kernel is the high-memory support option.  On the test kernel, it is
> off, and on the production kernel, it is set to 64GB.
> 
> Hope this helps.
> 
> -- 
> Gregory K. Ade <gkade@bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-11-09  6:12                         ` Heinz J . Mauelshagen
@ 2002-11-11  5:56                           ` Jon Bendtsen
  2002-11-22 15:51                           ` Gregory Ade
  2002-12-05 21:35                           ` Gregory Ade
  2 siblings, 0 replies; 21+ messages in thread
From: Jon Bendtsen @ 2002-11-11  5:56 UTC (permalink / raw)
  To: linux-lvm

"Heinz J . Mauelshagen" wrote:
> 
> Gregory,
> 
> this proves my assumption right that something is fishy with the high
> memory support in your SMP environment.
> 
> My guess is that it would also work if you build a single-processor kernel
> _with_ high memory enabled and repeat the very same test, which would point
> to a highmem/SMP problem that still needs to be fixed.
> 
> Anyone else with hints?

Earthrays ?? ;-D




JonB


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-11-09  6:12                         ` Heinz J . Mauelshagen
  2002-11-11  5:56                           ` Jon Bendtsen
@ 2002-11-22 15:51                           ` Gregory Ade
  2002-12-05 21:35                           ` Gregory Ade
  2 siblings, 0 replies; 21+ messages in thread
From: Gregory Ade @ 2002-11-22 15:51 UTC (permalink / raw)
  To: linux-lvm

On Sat, 2002-11-09 at 04:06, Heinz J . Mauelshagen wrote:
> 
> Gregory,
> 
> this proves my assumption right that something is fishy with the high
> memory support in your SMP environment.
> 
> I guess that it might work as well in case you make a single processor kernel
> _with_ high memory enabled and repeat the very same test and that it might
> be a highmem/smp problem still to be fixed.

I'll give this a try in the next week or so and get back to you.

-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


* Re: [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
  2002-11-09  6:12                         ` Heinz J . Mauelshagen
  2002-11-11  5:56                           ` Jon Bendtsen
  2002-11-22 15:51                           ` Gregory Ade
@ 2002-12-05 21:35                           ` Gregory Ade
  2 siblings, 0 replies; 21+ messages in thread
From: Gregory Ade @ 2002-12-05 21:35 UTC (permalink / raw)
  To: linux-lvm


On Sat, 2002-11-09 at 04:06, Heinz J . Mauelshagen wrote:
> On Thu, Nov 07, 2002 at 07:44:36PM -0800, Gregory Ade wrote:
> > Okay, I disabled high memory support (only change from production
> > kernel), rebooted with this test kernel, and tried to create a snapshot:
[snip]
> > It worked just fine:
[snip]
> > So I removed it:
[snip]
> > No kernel oops or BUG in the dmesg.
> > 
> > Again, the _ONLY DIFFERENCE_ between this test kernel and the production
> > kernel is the high-memory support option.  On the test kernel, it is
> > off, and on the production kernel, it is set to 64GB.
> > 
> > Hope this helps.

[snip]

> this proves my assumption right that something is fishy with the high
> memory support in your SMP environment.
> 
> I guess that it might work as well in case you make a single processor kernel
> _with_ high memory enabled and repeat the very same test and that it might
> be a highmem/smp problem still to be fixed.

Well, it's not solely an SMP thing.  I finally got an opportunity to
test a non-SMP high-memory kernel tonight, and it's the exact same
failure mode as the original problem report.  Here's a full report,
complete with the Oops and a run through ksymoops.

root@burpr(pts/0):~ 24 # uname -a
Linux burpr 2.4.19-2.burpr.test #1 Thu Dec 5 16:22:49 PST 2002 i686 unknown
root@burpr(pts/0):~ 25 # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.60GHz
stepping        : 1
cpu MHz         : 1595.176
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3185.04

root@burpr(pts/0):~ 26 # free -tm
             total       used       free     shared    buffers     cached
Mem:          7580        132       7448          0         46         28
-/+ buffers/cache:         57       7523
Swap:         8191          0       8191
Total:       15772        132      15640
root@burpr(pts/0):~ 27 # lvcreate --snapshot --extents 512 --name tmp_snap /dev/vg00/tmp
lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/vg00/tmp_snap"
Segmentation fault

--->8--[ Oops output taken from dmesg ]-->8---

kernel BUG at vmalloc.c:236!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012caa6>]    Not tainted
EFLAGS: 00010246
eax: ffffffff   ebx: 00000000   ecx: 51eb851f   edx: 00000000
esi: 00000000   edi: f4f51a00   ebp: fffffff4   esp: f3ce9d20
ds: 0018   es: 0018   ss: 0018
Process lvcreate (pid: 740, stackpage=f3ce9000)
Stack: 00000000 00000000 f4f51a00 fffffff4 000001f0 f930f000 00000001 fffffff4
       c02eab14 c02eac7c 000001f0 00000001 c023ad1b 00000000 000001f2 00000163
       f4f51b6c 00000000 f4f51a00 f3ce9df8 c023adc8 f4f51a00 f4f51a00 000bd000
Call Trace:    [<c023ad1b>] [<c023adc8>] [<c0238870>] [<c023614c>] [<c01dad7f>]
  [<c01419f7>] [<c010867b>]

Code: 0f 0b ec 00 20 cb 29 c0 31 c0 e9 bf 01 00 00 6a 02 53 e8 9f

--->8--[ Oops output taken from dmesg ]-->8---
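[Editorial sketch, not part of the original message.] One possible way a
high-memory configuration could trip a sanity check in __vmalloc is a
zero-byte allocation request caused by 32-bit overflow. The Python below
is only an illustration of that arithmetic under stated assumptions: that
the snapshot hash-table size is capped at roughly 2% of physical memory,
computed from the page count in 32-bit unsigned arithmetic, and that the
allocator refuses a size of zero. None of this is confirmed by the thread;
the function names, the 4096-bucket request and the page counts are
hypothetical. The value 0x51eb851f in ecx is a constant compilers commonly
use for division by 100, which would at least be consistent with such a
calculation.

```python
# Hypothetical model of a 32-bit size calculation; not taken from LVM source.
PAGE_SHIFT = 12          # 4 KB pages on i386
MASK32 = 0xFFFFFFFF      # emulate a 32-bit unsigned long
LIST_HEAD_SIZE = 8       # assumed: two 32-bit pointers per hash bucket


def max_buckets_32bit(num_physpages):
    """Cap buckets at about 2% of RAM, with every step truncated to 32 bits."""
    mem = (num_physpages << PAGE_SHIFT) & MASK32   # bytes of RAM, may wrap
    mem //= 100
    mem = (mem * 2) & MASK32
    return mem // LIST_HEAD_SIZE


def round_down_pow2(n):
    """Clear low set bits until at most one bit remains (0 stays 0)."""
    while n & (n - 1):
        n &= n - 1
    return n


def hash_table_bytes(requested_buckets, num_physpages):
    """Size in bytes that would be handed to the allocator."""
    buckets = min(requested_buckets, max_buckets_32bit(num_physpages))
    return round_down_pow2(buckets) * LIST_HEAD_SIZE


if __name__ == "__main__":
    lowmem_pages = 229376    # 896 MB, a typical non-highmem limit
    highmem_pages = 2 ** 21  # 8 GB, assumed for illustration
    print("lowmem :", hash_table_bytes(4096, lowmem_pages), "bytes")
    print("highmem:", hash_table_bytes(4096, highmem_pages), "bytes")
```

In this model the byte count for 8 GB wraps to zero, so the request would
be for zero bytes, while the 896 MB case yields a normal size. That matches
the observed pattern (works without high memory, fails with it, regardless
of SMP), but it remains a guess until checked against the actual sources.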

--->8--[ Oops output from ksymoops ]-->8---

ksymoops 2.4.1 on i686 2.4.19-2.burpr.test.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.19-2.burpr.test/ (default)
     -m /boot/System.map-2.4.19-2.burpr.test (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_maps): mismatch on symbol usb_devfs_handle  , usbcore says f922a6f4, /lib/modules/2.4.19-2.burpr.test/kernel/drivers/usb/usbcore.o says f922a1b4.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/drivers/usb/usbcore.o entry
Warning (compare_maps): mismatch on symbol icmpv6_socket  , ipv6 says f921ac80, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218960.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol icmpv6_statistics  , ipv6 says f921ab80, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218860.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inet6_dev_count  , ipv6 says f921a7a0, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218480.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inet6_ifa_count  , ipv6 says f921a7a4, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218484.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inet6_protos  , ipv6 says f921ab00, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f92187e0.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol inetsw6  , ipv6 says f921a740, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218420.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol ip6_ra_chain  , ipv6 says f921aa00, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f92186e0.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol ipv6_statistics  , ipv6 says f921a940, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218620.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol raw_v6_htable  , ipv6 says f921aa80, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218760.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol rt6_stats  , ipv6 says f921a908, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f92185e8.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
Warning (compare_maps): mismatch on symbol udp_stats_in6  , ipv6 says f921aa40, /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o says f9218720.  Ignoring /lib/modules/2.4.19-2.burpr.test/kernel/net/ipv6/ipv6.o entry
LAPIC_NMI (acpi_id[0x0007] polarity[0x1] trigger[0x1] lint[0x1])
LAPIC_NMI (acpi_id[0x0008] polarity[0x1] trigger[0x1] lint[0x1])
cpu: 0, clocks: 996964, slice: 498482
kernel BUG at vmalloc.c:236!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c012caa6>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: ffffffff   ebx: 00000000   ecx: 51eb851f   edx: 00000000
esi: 00000000   edi: f4f51a00   ebp: fffffff4   esp: f3ce9d20
ds: 0018   es: 0018   ss: 0018
Process lvcreate (pid: 740, stackpage=f3ce9000)
Stack: 00000000 00000000 f4f51a00 fffffff4 000001f0 f930f000 00000001 fffffff4 
       c02eab14 c02eac7c 000001f0 00000001 c023ad1b 00000000 000001f2 00000163 
       f4f51b6c 00000000 f4f51a00 f3ce9df8 c023adc8 f4f51a00 f4f51a00 000bd000 
Call Trace:    [<c023ad1b>] [<c023adc8>] [<c0238870>] [<c023614c>] [<c01dad7f>]
  [<c01419f7>] [<c010867b>]
Code: 0f 0b ec 00 20 cb 29 c0 31 c0 e9 bf 01 00 00 6a 02 53 e8 9f 

>>EIP; c012caa6 <__vmalloc+26/1fc>   <=====
Trace; c023ad1b <lvm_snapshot_alloc_hash_table+3f/80>
Trace; c023adc8 <lvm_snapshot_alloc+6c/e0>
Trace; c0238870 <lvm_do_lv_create+518/868>
Trace; c023614c <lvm_chr_ioctl+710/81c>
Trace; c01dad7f <locate_hd_struct+27/70>
Trace; c01419f7 <sys_ioctl+16b/184>
Trace; c010867b <system_call+33/38>
Code;  c012caa6 <__vmalloc+26/1fc>
00000000 <_EIP>:
Code;  c012caa6 <__vmalloc+26/1fc>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012caa8 <__vmalloc+28/1fc>
   2:   ec                        in     (%dx),%al
Code;  c012caa9 <__vmalloc+29/1fc>
   3:   00 20                     add    %ah,(%eax)
Code;  c012caab <__vmalloc+2b/1fc>
   5:   cb                        lret   
Code;  c012caac <__vmalloc+2c/1fc>
   6:   29 c0                     sub    %eax,%eax
Code;  c012caae <__vmalloc+2e/1fc>
   8:   31 c0                     xor    %eax,%eax
Code;  c012cab0 <__vmalloc+30/1fc>
   a:   e9 bf 01 00 00            jmp    1ce <_EIP+0x1ce> c012cc74 <__vmalloc+1f4/1fc>
Code;  c012cab5 <__vmalloc+35/1fc>
   f:   6a 02                     push   $0x2
Code;  c012cab7 <__vmalloc+37/1fc>
  11:   53                        push   %ebx
Code;  c012cab8 <__vmalloc+38/1fc>
  12:   e8 9f 00 00 00            call   b6 <_EIP+0xb6> c012cb5c <__vmalloc+dc/1fc>


13 warnings issued.  Results may not be reliable.

--->8--[ Oops output from ksymoops ]-->8---
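
[Editorial sketch, not part of the original message.] For readers who want
to compare several reports like this one, the decoded call chain can be
pulled out of ksymoops output mechanically. The Python below is a minimal
sketch that assumes the ">>EIP;" and "Trace;" line layout shown above; the
function name parse_ksymoops and the tuple layout are illustrative choices,
not part of any tool.

```python
import re

# Matches lines such as:
#   >>EIP; c012caa6 <__vmalloc+26/1fc>   <=====
#   Trace; c023ad1b <lvm_snapshot_alloc_hash_table+3f/80>
FRAME_RE = re.compile(
    r"^(?:>>EIP|Trace);\s+([0-9a-f]{8})\s+<([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_ksymoops(text):
    """Return a list of (address, symbol, offset, size), innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            addr, symbol, offset, size = match.groups()
            frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


# Lines copied from the report above.
SAMPLE = """\
>>EIP; c012caa6 <__vmalloc+26/1fc>   <=====
Trace; c023ad1b <lvm_snapshot_alloc_hash_table+3f/80>
Trace; c023adc8 <lvm_snapshot_alloc+6c/e0>
Trace; c0238870 <lvm_do_lv_create+518/868>
Trace; c023614c <lvm_chr_ioctl+710/81c>
Trace; c01dad7f <locate_hd_struct+27/70>
Trace; c01419f7 <sys_ioctl+16b/184>
Trace; c010867b <system_call+33/38>
"""

if __name__ == "__main__":
    for addr, symbol, offset, size in parse_ksymoops(SAMPLE):
        print(f"{addr:08x}  {symbol} (+0x{offset:x} of 0x{size:x})")
```

Read from the bottom up, the chain is the lvcreate ioctl entering
lvm_chr_ioctl, then lvm_do_lv_create, then the snapshot hash-table
allocation, which ends in __vmalloc where the BUG fires.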


-- 
Gregory K. Ade <gkade@bigbrother.net>
http://bigbrother.net/~gkade
OpenPGP Key ID: EAF4844B  keyserver: pgpkeys.mit.edu


Thread overview: 21+ messages
2002-10-02 12:23 [linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19? Gregory K. Ade
2002-10-04  3:54 ` Heinz J . Mauelshagen
2002-10-08 15:07   ` Gregory Ade
2002-10-09  6:38     ` Heinz J . Mauelshagen
2002-10-27 22:30       ` Gregory K. Ade
2002-10-28  3:21         ` Heinz J . Mauelshagen
2002-10-28 22:38           ` Gregory K. Ade
2002-10-29  3:15             ` Heinz J . Mauelshagen
2002-10-29 13:54               ` Gregory K. Ade
2002-10-31  6:52                 ` Heinz J . Mauelshagen
2002-10-31 16:55                   ` Gregory Ade
2002-11-01  9:07                     ` Wolfgang Weisselberg
2002-11-05  8:33                     ` Heinz J . Mauelshagen
2002-11-07 21:45                       ` Gregory Ade
2002-11-09  6:12                         ` Heinz J . Mauelshagen
2002-11-11  5:56                           ` Jon Bendtsen
2002-11-22 15:51                           ` Gregory Ade
2002-12-05 21:35                           ` Gregory Ade
2002-10-28 14:36         ` jon+lvm
2002-10-28 16:40           ` Gregory K. Ade
2002-10-29 15:21             ` Luca Berra
