* [linux-lvm] massive LV corruption
@ 2004-09-11 11:08 Tracy R Reed
  2004-09-11 11:25 ` Tracy R Reed
  2004-09-14  0:46 ` Tracy R Reed
  0 siblings, 2 replies; 8+ messages in thread
From: Tracy R Reed @ 2004-09-11 11:08 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 1769 bytes --]

I am running Fedora Core 1 with stock RedHat kernel 2.4.22-1.2188.nptl. I
filled my /usr/local to 100% and decided I needed some more space so I ran
the lvextend and resize_reiserfs commands like I have done many times
before to add a couple gig to the volume. A few hours later I began
noticing very strange behaviors. My .vimrc file was filled with garbage.
All of my email disappeared. Lots of filesystem errors began appearing on
the console.  I rebooted the machine and upon the reboot my entire /home
logical volume was nowhere to be found. The /usr/local LV existed but its
filesystem was very badly corrupted. I tried restoring the LVM config with
vgcfgrestore to no avail. I tested the memory with memtest and found no
problems. I did a non-destructive badblocks test of all 80G of the drive
with everything unmounted and / mounted read-only and came up with no
problems. The symptoms really look like disk was allocated improperly and
disk space already in use got overwritten. I have saved a bunch of output from
various lvm commands and other things and the backup vgcfg file from right
after I made the change which probably caused the damage in case they are
of use to someone. They can be found here:
 
http://ultraviolet.org/tmp
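
One way to test the "space already in use got handed out again" theory is to compare the physical extent ranges each LV claims in the metadata backups. Below is a minimal Python sketch of that check. It assumes the segments have already been copied out of the vgcfg backup files by hand; the LV names, the PV name, and every extent number are made up for illustration and are not taken from the files at the URL above.

```python
from itertools import combinations
from typing import NamedTuple


class Segment(NamedTuple):
    lv: str        # logical volume name
    pv: str        # physical volume the segment lives on
    start_pe: int  # first physical extent (inclusive)
    count: int     # number of extents in the segment


def find_overlaps(segments):
    """Return pairs of segments from different LVs that share extents on the same PV."""
    overlaps = []
    for a, b in combinations(segments, 2):
        if a.lv == b.lv or a.pv != b.pv:
            continue
        # Half-open ranges [start, start + count) intersect
        # exactly when each one starts before the other ends.
        if a.start_pe < b.start_pe + b.count and b.start_pe < a.start_pe + a.count:
            overlaps.append((a, b))
    return overlaps


# Hypothetical layout: "usrlocal" gained a segment inside extents "home" owns.
segments = [
    Segment("home", "/dev/hda2", 0, 500),
    Segment("usrlocal", "/dev/hda2", 500, 250),
    Segment("usrlocal", "/dev/hda2", 400, 128),  # bogus extension
]

for a, b in find_overlaps(segments):
    print(f"{a.lv} and {b.lv} overlap on {a.pv}")
```

If a check like this came back clean for both the before and after backups, the metadata itself would look consistent and the damage would more likely have happened below or beside it.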

Unfortunately I had to get the server back up and running so I didn't have
time to try to reproduce it or do any more debugging, although I am afraid
to use LVM on this box now. I eventually deleted the corrupted LVs and
recreated them from scratch, and all seems well for the moment. I am SO glad
to have made a backup a couple of days before so I didn't lose too much.

-- 
Tracy Reed                     The attachment is a digital signature.
http://copilotconsulting.com   More info: http://copilotconsulting.com/sig

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [linux-lvm] massive LV corruption
  2004-09-11 11:08 [linux-lvm] massive LV corruption Tracy R Reed
@ 2004-09-11 11:25 ` Tracy R Reed
  2004-09-14  0:46 ` Tracy R Reed
  1 sibling, 0 replies; 8+ messages in thread
From: Tracy R Reed @ 2004-09-11 11:25 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On Sat, Sep 11, 2004 at 04:08:30AM -0700, Tracy R Reed spake thusly:
> after I made the change which probably caused the damage in case they are
> of use to someone. They can be found here:
>  
> http://ultraviolet.org/tmp

I just added system.conf.4.old so you can see the lvm config from before I
executed the lvextend commands. 

-- 
Tracy Reed                     The attachment is a digital signature.
http://copilotconsulting.com   More info: http://copilotconsulting.com/sig

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [linux-lvm] massive LV corruption
  2004-09-11 11:08 [linux-lvm] massive LV corruption Tracy R Reed
  2004-09-11 11:25 ` Tracy R Reed
@ 2004-09-14  0:46 ` Tracy R Reed
  2004-09-14  5:45   ` Clint Byrum
  1 sibling, 1 reply; 8+ messages in thread
From: Tracy R Reed @ 2004-09-14  0:46 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

On Sat, Sep 11, 2004 at 04:08:30AM -0700, Tracy R Reed spake thusly:
> I am running Fedora Core 1 with stock RedHat kernel 2.4.22-1.2188.nptl. I
> filled my /usr/local to 100% and decided I needed some more space so I ran
> the lvextend and resize_reiserfs commands like I have done many times

Nobody has an opinion to offer? I've been a big fan of LVM for ages but
this incident has really shaken my confidence in it. It is my
understanding that all of the software I am using should be pretty solid
by now. I am hesitant to allow LVM operations on production systems
anymore if this sort of thing can happen without explanation. :( I would
like to think I did something wrong but I've used lvextend and
resize_reiserfs a number of times before on other machines without
incident.

-- 
Tracy Reed                     The attachment is a digital signature.
http://copilotconsulting.com   More info: http://copilotconsulting.com/sig

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [linux-lvm] massive LV corruption
  2004-09-14  0:46 ` Tracy R Reed
@ 2004-09-14  5:45   ` Clint Byrum
  2004-09-14 12:45     ` Tracy R Reed
  0 siblings, 1 reply; 8+ messages in thread
From: Clint Byrum @ 2004-09-14  5:45 UTC (permalink / raw)
  To: LVM general discussion and development


On Monday, September 13, 2004, at 05:46 PM, Tracy R Reed wrote:

> On Sat, Sep 11, 2004 at 04:08:30AM -0700, Tracy R Reed spake thusly:
>> I am running Fedora Core 1 with stock RedHat kernel 
>> 2.4.22-1.2188.nptl. I
>> filled my /usr/local to 100% and decided I needed some more space so 
>> I ran
>> the lvextend and resize_reiserfs commands like I have done many times
>
> Nobody has an opinion to offer? I've been a big fan of LVM for ages but
> this incident has really shaken my confidence in it. It is my
> understanding that all of the software I am using should be pretty 
> solid
> by now. I am hesitant to allow LVM operations on production systems
> anymore if this sort of thing can happen without explanation. :( I 
> would
> like to think I did something wrong but I've used lvextend and
> resize_reiserfs a number of times before on other machines without
> incident.
>

I've never used resize_reiserfs, but I do know that a lot of people I 
talk to won't use ReiserFS because of past problems that have since 
been fixed. The tools that come with ReiserFS are generally very good. 
Personally I only use ext3 or XFS on LVM because of ReiserFS's slowness 
when being written to by more than one process.

If I had to blame one thing, I'd blame the heavily hacked 2.4 kernel 
that came with Fedora. :-P

* Re: [linux-lvm] massive LV corruption
  2004-09-14  5:45   ` Clint Byrum
@ 2004-09-14 12:45     ` Tracy R Reed
  2004-09-14 14:49       ` Clint Byrum
  0 siblings, 1 reply; 8+ messages in thread
From: Tracy R Reed @ 2004-09-14 12:45 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]

On Mon, Sep 13, 2004 at 10:45:55PM -0700, Clint Byrum spake thusly:
> I've never used resize_reiserfs, but I do know that a lot of people I 
> talk to won't use ReiserFS because of past problems that have since 
> been fixed. The tools that come with ReiserFS are generally very good. 

I'm pretty sure it can't possibly be reiserfs because the actual LVs were
hosed. The LVM/block layer should prevent resize_reiserfs or any part of
reiserfs from damaging the LVs. I love reiser and have used it with great
success for years. I find it sad that people still pan reiserfs after all
this time. I am really looking forward to reiser4 (released already but I
want to see it get some more time behind it) and some cool plugins for it.
I have a feeling it is going to do for Linux what MS claims WinFS will
(someday) do for their OS.

> If I had to blame one thing, I'd blame the heavily hacked 2.4 kernel 
> that came with Fedora. :-P

I suspect this is the case. I am using Fedora Core 2 with a 2.6.something
(exact kernel version in the typescript file from my original posting)
kernel and that seems to be the most likely culprit. However, I doubt
anyone from RedHat is going to take an interest because this isn't
reproducible. That is to say, I am not going to trash my box again to
reproduce it.

-- 
Tracy Reed                     The attachment is a digital signature.
http://copilotconsulting.com   More info: http://copilotconsulting.com/sig

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [linux-lvm] massive LV corruption
  2004-09-14 12:45     ` Tracy R Reed
@ 2004-09-14 14:49       ` Clint Byrum
  2004-09-14 20:06         ` Tracy R Reed
  0 siblings, 1 reply; 8+ messages in thread
From: Clint Byrum @ 2004-09-14 14:49 UTC (permalink / raw)
  To: LVM general discussion and development


On Tuesday, September 14, 2004, at 05:45 AM, Tracy R Reed wrote:

> On Mon, Sep 13, 2004 at 10:45:55PM -0700, Clint Byrum spake thusly:
>> I've never used resize_reiserfs, but I do know that a lot of people I
>> talk to won't use ReiserFS because of past problems that have since
>> been fixed. The tools that come with ReiserFS are generally very good.
>
> I'm pretty sure it can't possibly be reiserfs because the actual lv's 
> were
> hosed. The LVM/block layer should prevent resize_reiserfs or any part 
> of
> reiserfs from damaging the lv's. I love reiser and have used it with 
> great
> success for years. I find it sad that people still pan reiserfs after 
> all

Just wanted to say that I don't pan ReiserFS, as I have never had 
problems like others did when it was still very new and there were 
problems keeping it in sync with the mainline kernel. I don't use 
ReiserFS v3 because, while very fast for workstations, it has major 
problems with concurrent write accesses.

Hans Reiser has stated that this is because each filesystem has a lock 
on it, so while writing to, say, /home/cvs, anybody else who wants to 
write to /home/cvs will have to wait. We have a CVS server where the 
CVS trees and home dirs are on two separate logical volumes, and this 
locking scheme *HURTS* when two people are trying to do a cvs update. 
CVS writes a "read lock" file to each cvs directory, and some temp 
files in the working copy. Combine this with vim writing to its "swap" 
files all the time.. the box sometimes comes to a screeching halt for 
all users for almost a minute as they get in line with the filesystem 
locks.

That said, this ReiserFS+LVM1 system (redhat 8.0) has never had any 
data issues. :-P

> this time. I am really looking forward to reiser4 (released already 
> but I
> want to see it get some more time behind it) and some cool plugins for 
> it.
> I have a feeling it is going to do for Linux what MS claims WinFS will
> (someday) do for their OS.
>

Yes, bring it on. I plan to convert some workstations to it first.. 
then home server.. then non critical work servers.. the usual 
progression before production.

>> If I had to blame one thing, I'd blame the heavily hacked 2.4 kernel
>> that came with Fedora. :-P
>
> I suspect this is the case. I am using Fedora Core 2 with a 
> 2.6.something
> (exact kernel version in the typescript file from my original posting)
> kernel and that seems to be the most likely culprit. However, I doubt
> anyone from RedHat is going to take an interest because this isn't
> reproducible. That is to say, I am not going to trash my box again to
> reproduce it.
>

You said you were running 2.4.22.nptl or something.

* Re: [linux-lvm] massive LV corruption
  2004-09-14 14:49       ` Clint Byrum
@ 2004-09-14 20:06         ` Tracy R Reed
  2004-09-14 20:41           ` Clint Byrum
  0 siblings, 1 reply; 8+ messages in thread
From: Tracy R Reed @ 2004-09-14 20:06 UTC (permalink / raw)
  To: LVM general discussion and development

[-- Attachment #1: Type: text/plain, Size: 874 bytes --]

On Tue, Sep 14, 2004 at 07:49:45AM -0700, Clint Byrum spake thusly:
> Hans Reiser has stated that this is because each filesystem has a lock 
> on it, so while writing to, say, /home/cvs, anybody else who wants to 
> write to /home/cvs will have to wait. We have a CVS server where the 

That's odd given that each hard drive can only physically write to one
place on the disk at a time anyhow due to head movement and that the
kernel caches the writes and lays them back out on the disk with some sort
of elevator algorithm.
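
To make the elevator idea concrete, here is a toy Python sketch of a LOOK-style sweep: pending requests at or above the head position are served in ascending order, then the rest on the way back down. This only illustrates the general technique. It is not the scheduler in the 2.4 kernel, and the block numbers and head position are invented.

```python
def elevator_order(pending, head):
    """Order pending block numbers as one sweep up from head, then back down."""
    # Requests at or beyond the head are picked up on the way up.
    upward = sorted(b for b in pending if b >= head)
    # Everything behind the head is served on the return sweep.
    downward = sorted((b for b in pending if b < head), reverse=True)
    return upward + downward


# Hypothetical queue of block numbers in arrival order.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
print(elevator_order(requests, head=53))
# prints [65, 67, 98, 122, 124, 183, 37, 14]
```

The reordering only cuts seek distance once requests reach the block layer; it says nothing about how long writers wait on locks higher up.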

> You said you were running 2.4.22.nptl or something.

Oops, right you are. That is what the box in question was and is running.
I was thinking of a different box with FC2 on it.

-- 
Tracy Reed                     The attachment is a digital signature.
http://copilotconsulting.com   More info: http://copilotconsulting.com/sig

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: [linux-lvm] massive LV corruption
  2004-09-14 20:06         ` Tracy R Reed
@ 2004-09-14 20:41           ` Clint Byrum
  0 siblings, 0 replies; 8+ messages in thread
From: Clint Byrum @ 2004-09-14 20:41 UTC (permalink / raw)
  To: LVM general discussion and development


On Tuesday, September 14, 2004, at 01:06 PM, Tracy R Reed wrote:

> On Tue, Sep 14, 2004 at 07:49:45AM -0700, Clint Byrum spake thusly:
>> Hans Reiser has stated that this is because each filesystem has a lock
>> on it, so while writing to, say, /home/cvs, anybody else who wants to
>> write to /home/cvs will have to wait. We have a CVS server where the
>
> That's odd given that each hard drive can only physically write to one
> place on the disk at a time anyhow due to head movement and that the
> kernel caches the writes and lays them back out on the disk with some 
> sort
> of elevator algorithm.
>

You're assuming that programs actually wait for disks! One process is 
creating a file at /home/cvs/dir1/#lockfile the other at 
/home/cvs/dir2/#lockfile. Until they run fsync, the physical disk isn't 
necessarily involved. The problem lies in the fact that with other 
filesystems, like XFS, the kernel will happily modify (at the VFS 
layer) two different dirs at one time, as they lock by meta-object (I 
won't say inode, because I don't think XFS has inodes). With ReiserFS, 
the entire partition is locked while things are modified. With a cvs 
lock file, you might not even want to call fsync() to send it to the 
disk, as the VFS layer will already have it there, and that's all you 
care about. This is one reason why using a secondary device as a 
journalling device can be so beneficial: you won't have to seek 
around the disk with every metadata update.
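
A rough way to picture the difference is two writers touching different directories, once under a single filesystem-wide lock and once under per-directory locks. The Python sketch below models only the behaviour described above, with a sleep standing in for a metadata update. It is not how ReiserFS or XFS are actually implemented, and the hold time and helper names are made up.

```python
import threading
import time

HOLD = 0.1  # seconds each writer holds its lock (stand-in for a metadata update)


def run_writers(lock_for, dirs):
    """Run one writer thread per directory and return the elapsed wall-clock time."""
    def writer(d):
        with lock_for(d):
            time.sleep(HOLD)

    threads = [threading.Thread(target=writer, args=(d,)) for d in dirs]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


dirs = ["/home/cvs/dir1", "/home/cvs/dir2"]

# Coarse scheme: every writer contends for one filesystem-wide lock.
fs_lock = threading.Lock()
coarse = run_writers(lambda d: fs_lock, dirs)

# Fine scheme: each directory has its own lock, so the writers do not collide.
dir_locks = {d: threading.Lock() for d in dirs}
fine = run_writers(lambda d: dir_locks[d], dirs)

print(f"one lock for everything: {coarse:.2f}s")
print(f"one lock per directory:  {fine:.2f}s")
```

With the single lock the two writers run back to back, so the total is at least twice the hold time; with per-directory locks they overlap. No disk is involved in either case, which is the point being made above.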

Somebody who knows what they're talking about.. feel free to shoot all 
of this down. I feel like I'm talking out of my arse a bit. ;-)

-cb
