* [linux-lvm] Problems with snapshots on sparc64
@ 2003-05-28 19:09 Daniel Bayer
  2003-06-02  4:50 ` Heinz J . Mauelshagen
From: Daniel Bayer @ 2003-05-28 19:09 UTC
  To: linux-lvm

Hello

I have an E250 running Debian/stable with Linux 2.4.20 (vanilla kernel,
except for XFS). Three weeks ago I created an LV with an XFS filesystem
on it. I also created a backup script that uses snapshots. For two weeks
everything worked fine.
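
For readers unfamiliar with the approach, a snapshot-based backup run
usually follows the pattern sketched below. This is not the script used
on this machine, which was never posted to the list. The snapshot size,
mount point, archive path and mount options are assumptions chosen for
illustration; only the device names /dev/vg1/lv and lv_snapshot come
from the message.

```python
import subprocess


def snapshot_backup_commands(origin, snap_name, snap_size, mountpoint, archive):
    """Return the command sequence for one snapshot-based backup run."""
    vg_dir = origin.rsplit("/", 1)[0]        # "/dev/vg1/lv" -> "/dev/vg1"
    snap_dev = f"{vg_dir}/{snap_name}"
    return [
        # Create a snapshot LV of the origin (size is an assumption).
        ["lvcreate", "-s", "-L", snap_size, "-n", snap_name, origin],
        # Mount it read-only; "nouuid" is assumed here because an XFS
        # snapshot carries the same filesystem UUID as its origin.
        ["mount", "-t", "xfs", "-o", "ro,nouuid", snap_dev, mountpoint],
        # Archive the frozen view of the filesystem.
        ["tar", "-cf", archive, "-C", mountpoint, "."],
        # Clean up so snapshots do not accumulate.
        ["umount", mountpoint],
        ["lvremove", "-f", snap_dev],
    ]


def run_backup(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


commands = snapshot_backup_commands(
    "/dev/vg1/lv", "lv_snapshot", "512M", "/mnt/snap", "/backup/lv.tar"
)
run_backup(commands, dry_run=True)
```

The sketch stops at the first failing command, so a real script would
also need to unmount and remove the snapshot when the archive step fails.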

Then I got oopses during the backup (mainly from kupdated). After that
the system was rebooted. I had to boot with init=/bin/sh because the
system froze while mounting the XFS partitions (another story). I
ran vgscan and vgchange. After that I ran
"lvdisplay /dev/vg1/lv_snapshot" and got the usual output plus, three
times, the error "Trying to vfree() nonexistent vm area (addr)". Then I
tried "lvdisplay /dev/vg1/lv" and also got the vfree() error, followed
by a segmentation fault after the line that says there is a snapshot of
this LV. I removed the snapshot, and "lvdisplay /dev/vg1/lv" worked
fine. After that I could bring the whole system back into a working
state. I could reproduce this.

The address in the vfree() error message only changes after a reboot.
I also found older vfree() errors in the log.

The backtrace of the oops always ends in
| Trace; 005143e8 <lvm_map+488/580>
| Trace; 005144e8 <lvm_make_request_fn+8/40>

Another related problem is that "cat /proc/lvm/global" in most
cases doesn't output the usual data but some random memory. It's really
odd. But if I try often enough I sometimes get the right output
(and no, I didn't calculate the probability). The other files in
/proc/lvm work fine.
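
The probability mentioned above could be estimated by reading the file
repeatedly and counting how often the result looks like the normal
report. The sketch below is one way to do that. The marker b"LVM" is an
assumption about what the correct output contains (the thread only says
it includes a version string), and the function names are made up for
this example.

```python
def looks_like_expected(data, marker=b"LVM"):
    """True if data is printable ASCII text and contains the marker.

    Printable text alone is not enough, because stray strings from other
    programs could also show up in a bad read.
    """
    if not data or marker not in data:
        return False
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)


def sample_reads(path, attempts=100, marker=b"LVM"):
    """Read path `attempts` times and return (good, bad) counts."""
    good = bad = 0
    for _ in range(attempts):
        with open(path, "rb") as f:
            data = f.read()
        if looks_like_expected(data, marker):
            good += 1
        else:
            bad += 1
    return good, bad
```

On the affected machine, sample_reads("/proc/lvm/global") would give the
two counts, and good / (good + bad) is the observed success rate.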

Since I have moved the important data to another system, I can run some
tests and provide further information.


Daniel


* Re: [linux-lvm] Problems with snapshots on sparc64
  2003-05-28 19:09 [linux-lvm] Problems with snapshots on sparc64 Daniel Bayer
@ 2003-06-02  4:50 ` Heinz J . Mauelshagen
  2003-06-02  8:11   ` Daniel Bayer
From: Heinz J . Mauelshagen @ 2003-06-02  4:50 UTC
  To: linux-lvm

Daniel,

Which LVM version (kernel and tools) are you running, and do you have
HIGHMEM configured (IOW: more than 960M of memory)?
There are flaws in Linux 2.4.20 highmem that show up with snapshots :(


-- 

Regards,
Heinz    -- The LVM Guy --

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


* Re: [linux-lvm] Problems with snapshots on sparc64
  2003-06-02  4:50 ` Heinz J . Mauelshagen
@ 2003-06-02  8:11   ` Daniel Bayer
  2003-06-02 10:15     ` Heinz J . Mauelshagen
From: Daniel Bayer @ 2003-06-02  8:11 UTC
  To: linux-lvm

Hello

On Mon, Jun 02, 2003 at 11:39:39AM +0200, Heinz J . Mauelshagen wrote:
> which LVM version (kernel and tools)

kernel: 1.0.5+(22/07/2002)
tools:  1.0.4-4

> and do you have HIGHMEM configured
> (IOW: more than 960M memory) ?

It's a 64-bit architecture. There is no need for things like HIGHMEM.
But yes, the system has 2 GB of RAM.

> There are flaws in Linux 2.4.20 highmem that show up with snapshots :(

But this problem:
> > Another related problem is that "cat /proc/lvm/global" in most
> > cases doesn't output the usual data but some random memory. It's really
> > odd. But if I try often enough I sometimes get the right output
> > (and no, I didn't calculate the probability). The other files in
> > /proc/lvm work fine.

also appears when there are no snapshots at all. (This time I only had
to try twice to get the version string.)

Tomorrow I will have some time to update the kernel and the tools to
the current versions.


Daniel


* Re: [linux-lvm] Problems with snapshots on sparc64
  2003-06-02  8:11   ` Daniel Bayer
@ 2003-06-02 10:15     ` Heinz J . Mauelshagen
  2003-06-03 14:38       ` Daniel Bayer
From: Heinz J . Mauelshagen @ 2003-06-02 10:15 UTC
  To: linux-lvm

On Mon, Jun 02, 2003 at 03:12:32PM +0200, Daniel Bayer wrote:
> Hello
> 
> On Mon, Jun 02, 2003 at 11:39:39AM +0200, Heinz J . Mauelshagen wrote:
> > which LVM version (kernel and tools)
> 
> kernel: 1.0.5+(22/07/2002)
> tools:  1.0.4-4
> 
> > and do you have HIGHMEM configured
> > (IOW: more than 960M memory) ?
> 
> It's a 64-bit architecture. There is no need for things like HIGHMEM.
> But yes, the system has 2 GB of RAM.

Oh, I overlooked that in your email :(

> 
> > There are flaws in Linux 2.4.20 highmem that show up with snapshots :(
> 
> But this problem:
> > > Another related problem is that "cat /proc/lvm/global" in most
> > > cases doesn't output the usual data but some random memory. It's really
> > > odd. But if I try often enough I sometimes get the right output
> > > (and no, I didn't calculate the probability). The other files in
> > > /proc/lvm work fine.
> 
> also appears when there are no snapshots at all. (This time I only had
> to try twice to get the version string.)
> 
> Tomorrow I will have some time to update the kernel and the tools to
> the current versions.

Yes, that is my recommendation anyway.


-- 

Regards,
Heinz    -- The LVM Guy --



* Re: [linux-lvm] Problems with snapshots on sparc64
  2003-06-02 10:15     ` Heinz J . Mauelshagen
@ 2003-06-03 14:38       ` Daniel Bayer
  2003-06-04 10:53         ` [linux-lvm] LVM + software Raid 5 : freeze when using snapshots Serge Rossi
From: Daniel Bayer @ 2003-06-03 14:38 UTC
  To: linux-lvm

On Mon, Jun 02, 2003 at 05:05:07PM +0200, Heinz J . Mauelshagen wrote:
> On Mon, Jun 02, 2003 at 03:12:32PM +0200, Daniel Bayer wrote:
> > But this problem:
> > > > Another related problem is that "cat /proc/lvm/global" in most
> > > > cases doesn't output the usual data but some random memory. It's really
> > > > odd. But if I try often enough I sometimes get the right output
> > > > (and no, I didn't calculate the probability). The other files in
> > > > /proc/lvm work fine.
> > 
> > also appears when there are no snapshots at all. (This time I only had
> > to try twice to get the version string.)

I have just updated the kernel (LVM 1.0.7(28/03/2003)) and the tools
(1.0.7-3).

The problem with /proc/lvm/global still appears:
| tau:~ # hexdump -C < /proc/lvm/global
| 00000000  00 06 d9 84 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 00000010  00 06 d9 40 00 00 00 00  00 04 f4 08 00 00 00 00  |...@............|
| 00000020  00 06 e6 fc 00 00 00 00  00 06 da bc 00 00 00 00  |................|
| 00000030  00 06 f1 70 00 00 00 00  00 06 f0 94 00 00 00 00  |...p............|
| 00000040  00 06 db f0 00 00 00 00  00 06 f5 88 00 00 00 00  |................|
| 00000050  00 06 d8 a0 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 00000060  00 06 e6 a0 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 00000080  00 00 00 00 00 00 00 00  00 06 d9 08 00 00 00 00  |................|
| 00000090  00 00 00 00 00 00 00 00  00 06 ea dc 00 00 00 00  |................|
| 000000a0  00 00 00 00 00 00 00 00  00 06 fb cc 00 00 00 00  |................|
| 000000b0  00 06 f9 88 00 00 00 00  00 06 f5 88 00 00 00 00  |................|
| 000000c0  00 07 b0 9c 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 000000d0  00 06 db 50 00 00 00 00  00 06 c7 d8 00 00 00 00  |...P............|
| 000000e0  00 06 f3 bc 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 000000f0  00 00 00 00 00 00 00 00  00 06 d9 84 00 00 00 00  |................|
| 00000100  00 00 00 00 00 00 00 00  00 06 f5 70 00 00 00 00  |...........p....|
| 00000110  00 06 d8 68 00 00 00 00  00 06 fc e8 00 00 00 00  |...h............|
| 00000120  00 06 e6 c4 00 00 00 00  00 06 da bc 00 00 00 00  |................|
| 00000130  00 06 f1 70 00 00 00 00  00 06 f0 94 00 00 00 00  |...p............|
| 00000140  00 06 db f0 00 00 00 00  00 06 f5 88 00 00 00 00  |................|
| 00000150  00 00 00 00 00 00 00 00  00 06 c1 e0 00 00 00 00  |................|
| 00000160  00 06 e7 38 00 00 00 00  00 06 d2 a4 00 00 00 00  |...8............|
| 00000170  00 06 d4 1c 00 00 00 00  00 06 c1 5c 00 00 00 00  |...........\....|
| 00000180  00 06 fc 88 00 00 00 00  00 06 d9 08 00 00 00 00  |................|
| 00000190  00 00 00 00 00 00 00 00  00 06 ea dc 00 00 00 00  |................|
| 000001a0  00 00 00 00 00 00 00 00  00 06 f8 94 00 00 00 00  |................|
| 000001b0  00 06 f9 88 00 00 00 00  00 06 f5 88 00 00 00 00  |................|
| 000001c0  00 06 d8 50 00 00 00 00  00 04 f3 00 00 00 00 00  |...P............|
| 000001d0  00 06 db 50 00 00 00 00  00 06 f4 94 00 00 00 00  |...P............|
| 000001e0  00 06 f3 bc 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| 000001f0  00 00 00 00 00 00 00 00  00 06 eb 88 00 00 00 00  |................|
| 00000200  00 00 00 00 00 00 00 00  00 06 e9 60 00 00 00 00  |...........`....|
| 00000210  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
| *
| 000003b0  00 00 00 00 00 00 00                              |.......|
| 000003b7
| tau:~ # 

At the moment there are four or five outputs that appear often.
Others are rarer. Sometimes there are strings in the output that
(at least I believe so) belong to some userspace program.
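
To compare such dumps it can help to look at them as machine words
rather than bytes. sparc64 is big-endian, so the sketch below turns
`hexdump -C` lines (with or without the leading "| " used in this mail)
into big-endian 32-bit words. What the values actually represent is not
established in this thread; the helper names are made up for the
example.

```python
import struct


def parse_hexdump_line(line):
    """Return the raw bytes of one `hexdump -C` line.

    Returns b"" for the "*" repeat marker and for the final offset-only line.
    """
    line = line.lstrip("| ").rstrip()
    hex_part = line.split("|", 1)[0]      # drop the ASCII column
    fields = hex_part.split()
    if len(fields) < 2:                   # "*" or trailing offset
        return b""
    return bytes.fromhex("".join(fields[1:]))   # fields[0] is the offset


def be32_words(data):
    """Interpret data as big-endian 32-bit words, ignoring leftover bytes."""
    n = len(data) // 4
    return struct.unpack(">%dI" % n, data[: 4 * n])


sample = "| 00000000  00 06 d9 84 00 00 00 00  00 00 00 00 00 00 00 00  |................|"
print([hex(w) for w in be32_words(parse_hexdump_line(sample))])
```

Read this way, the first line of the dump above becomes the words
0x6d984, 0x0, 0x0, 0x0.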

If I run "lvdisplay" on a snapshot I still get the vfree() error:
| Jun  3 20:58:16 tau kernel: Trying to vfree() nonexistent vm area (00000001401a8000)
| Jun  3 20:58:45 tau last message repeated 5 times
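
Whether the reported address really stays the same between reboots can
be checked by tallying the addresses in the log. The sketch below does
this for syslog lines in the format shown above and folds syslog's
"last message repeated N times" lines into the count. The function name
is made up for the example, and the log path to feed it is left to the
reader.

```python
import re
from collections import Counter

VFREE_RE = re.compile(
    r"Trying to vfree\(\) nonexistent vm area \(([0-9a-fA-F]+)\)"
)
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_vfree_errors(lines):
    """Return a Counter mapping each vfree() address to its occurrence count."""
    counts = Counter()
    last_addr = None                      # address of the preceding vfree line
    for line in lines:
        m = VFREE_RE.search(line)
        if m:
            last_addr = m.group(1)
            counts[last_addr] += 1
            continue
        r = REPEAT_RE.search(line)
        if r and last_addr is not None:
            # The repeat notice refers to the vfree message just before it.
            counts[last_addr] += int(r.group(1))
        else:
            last_addr = None              # some other message intervened
    return counts


log = [
    "Jun  3 20:58:16 tau kernel: Trying to vfree() nonexistent vm area (00000001401a8000)",
    "Jun  3 20:58:45 tau last message repeated 5 times",
]
print(count_vfree_errors(log))
```

For the two lines quoted above this yields a count of 6 for the address
00000001401a8000.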

As I'm only logged in remotely, I won't try to crash the system now.


Daniel


* [linux-lvm] LVM + software Raid 5 : freeze when using snapshots.
  2003-06-03 14:38       ` Daniel Bayer
@ 2003-06-04 10:53         ` Serge Rossi
From: Serge Rossi @ 2003-06-04 10:53 UTC
  To: linux-lvm

I'm using two IBM file servers (Samba, NFS...) running RH 7.2 (with 
latest updates) and LVM over software Raid 5.

RH Kernel 2.4.20-13.7 (Including LVM 1.0.3)
lvm tools and lib 1.0.5

Hardware: Dual Xeon MP (HT enabled) + 1 GB RAM

Each server has 14 * 73 GB disks for the data.
md0 and md1 are RAID 5 units of 7 disks each, both included in one VG.

The VG is subdivided into several 100 GB LVs plus some free space for 
snapshots.

Without snapshots, everything runs perfectly: 2 months without problems 
with 1500 Windows PCs (SMB clients) and 500 SGI workstations (NFS 
clients). Load on the servers is between 0.1 and 1.0.

We tried to snapshot 3 LVs each night. After 2 or 3 days (6 or 9 
snapshots stored in the VG), the load on the servers was between 2.0 and 
3.0, and sometimes one of the servers stopped responding.
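
The snapshot counts above, and a rough idea of why load might rise with
them, can be worked through as follows. This is only a back-of-envelope
model, not a diagnosis of the freeze. It assumes one snapshot per LV per
night with none removed, and that each snapshot of an origin keeps its
own copy-on-write store, so the first write to an unchanged origin chunk
is copied once per snapshot before the write itself goes through.

```python
def snapshot_load(lvs_per_night, nights):
    """Return (total snapshots, snapshots per origin, chunk writes per first touch)."""
    total_snapshots = lvs_per_night * nights
    snapshots_per_origin = nights          # one new snapshot per LV per night
    # One copy into each snapshot's store plus the original write (assumed model).
    writes_on_first_touch = snapshots_per_origin + 1
    return total_snapshots, snapshots_per_origin, writes_on_first_touch


for nights in (0, 1, 2, 3):
    total, per_origin, writes = snapshot_load(3, nights)
    print(f"nights={nights} total={total} per_origin={per_origin} "
          f"chunk_writes_on_first_touch={writes}")
```

Under these assumptions, 3 nights give 9 snapshots in the VG, 3 per
origin LV, and up to 4 chunk-sized writes for the first change to each
origin chunk instead of 1.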

The kernel still responds to ping and SysRq, but the command line does 
not respond, even on the console.

No activity at all on the disks.

SysRq-S only syncs the system disk (not in the VG) but does not sync the
LVs. A reboot is the only way out.

Nothing special in messages (no Oops).


We removed the snapshots and the servers are stable again.
I have not tried to determine how many snapshots are required to 
freeze the servers (difficult on production servers!).


Does anybody have any idea about this problem?

Are the servers too weak to handle the load caused by the snapshots?

-- 
   ----------------------------------------------------------------------
    Serge Rossi - System & Network Architect  Tel : +33 (0)1 34 55 91 14
                 iDVU - Renault               Fax : +33 (0)1 34 55 92 25
   ----------------------------------------------------------------------

