* [linux-lvm] massive LV corruption
@ 2004-09-11 11:08 Tracy R Reed
  2004-09-11 11:25 ` Tracy R Reed
  2004-09-14  0:46 ` Tracy R Reed
  0 siblings, 2 replies; 8+ messages in thread
From: Tracy R Reed @ 2004-09-11 11:08 UTC
  To: linux-lvm

I am running Fedora Core 1 with stock RedHat kernel 2.4.22-1.2188.nptl. I
filled my /usr/local to 100% and decided I needed some more space, so I ran
the lvextend and resize_reiserfs commands, as I have done many times before,
to add a couple of gigabytes to the volume.

A few hours later I began noticing very strange behavior. My .vimrc file
was filled with garbage. All of my email disappeared. Lots of filesystem
errors began appearing on the console. I rebooted the machine, and after the
reboot my entire /home logical volume was nowhere to be found. The
/usr/local lv existed, but the fs was very badly corrupted. I tried
restoring the lvm config with vgcfgrestore, to no avail.

I tested the memory with memtest and found no problems. I did a
non-destructive badblocks test of all 80G of the drive with everything
unmounted and / mounted RO, and came up with no problems. The symptoms
really look like disk was allocated improperly and disk space already in
use got overwritten.

I have saved a bunch of output from various lvm commands and other things,
along with the backup vgcfg file from right after I made the change which
probably caused the damage, in case they are of use to someone. They can be
found here:

http://ultraviolet.org/tmp

Unfortunately I had to get the server back up and running, so I didn't have
time to try to reproduce it or do any more debugging, although I am afraid
to use lvm on this box now. I eventually deleted the corrupted lv's and
remade them from scratch, and all seems well for the moment. I am SO glad
to have made a backup a couple of days before, so I didn't lose too much.

-- 
Tracy Reed
The attachment is a digital signature.
http://copilotconsulting.com
More info: http://copilotconsulting.com/sig
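The suspicion in the message above, that space already in use was handed
out a second time, can be phrased as a simple check: no two LV segments on
the same physical volume should claim intersecting extent ranges. The
following is only a minimal sketch of that idea. The tuple layout, the
function name, and the example numbers are all hypothetical; nothing here
is read from the files posted at the URL above, and it does not parse any
real LVM metadata format.

```python
from itertools import combinations


def overlapping_segments(segments):
    """Return pairs of segments on the same PV whose extent ranges intersect.

    Each segment is a hypothetical tuple:
    (lv_name, pv_name, first_extent, extent_count).
    """
    clashes = []
    for a, b in combinations(segments, 2):
        same_pv = a[1] == b[1]
        a_end = a[2] + a[3]  # one past the last extent of a
        b_end = b[2] + b[3]  # one past the last extent of b
        # Two half-open ranges intersect when each starts before the other ends.
        if same_pv and a[2] < b_end and b[2] < a_end:
            clashes.append((a, b))
    return clashes


# Made-up layout for illustration only (not the poster's real volume group).
example = [
    ("home", "pv0", 0, 500),
    ("usrlocal", "pv0", 500, 250),
    ("usrlocal", "pv0", 400, 100),  # an extension that lands inside "home"
]
clashes = overlapping_segments(example)
for a, b in clashes:
    print("overlap:", a, b)
```

A check like this would only show whether the recorded metadata is
self-consistent; it says nothing about what the kernel actually wrote.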
* Re: [linux-lvm] massive LV corruption
  2004-09-11 11:08 [linux-lvm] massive LV corruption Tracy R Reed
@ 2004-09-11 11:25 ` Tracy R Reed
  2004-09-14  0:46 ` Tracy R Reed
  1 sibling, 0 replies; 8+ messages in thread
From: Tracy R Reed @ 2004-09-11 11:25 UTC
  To: linux-lvm

On Sat, Sep 11, 2004 at 04:08:30AM -0700, Tracy R Reed spake thusly:
> after I made the change which probably caused the damage in case they are
> of use to someone. They can be found here:
>
> http://ultraviolet.org/tmp

I just added system.conf.4.old so you can see the lvm config from before I
executed the lvextend commands.

-- 
Tracy Reed
The attachment is a digital signature.
http://copilotconsulting.com
More info: http://copilotconsulting.com/sig
* Re: [linux-lvm] massive LV corruption
  2004-09-11 11:08 [linux-lvm] massive LV corruption Tracy R Reed
  2004-09-11 11:25 ` Tracy R Reed
@ 2004-09-14  0:46 ` Tracy R Reed
  2004-09-14  5:45   ` Clint Byrum
  1 sibling, 1 reply; 8+ messages in thread
From: Tracy R Reed @ 2004-09-14  0:46 UTC
  To: linux-lvm

On Sat, Sep 11, 2004 at 04:08:30AM -0700, Tracy R Reed spake thusly:
> I am running Fedora Core 1 with stock RedHat kernel 2.4.22-1.2188.nptl. I
> filled my /usr/local to 100% and decided I needed some more space so I ran
> the lvextend and resize_reiserfs commands like I have done many times

Nobody has an opinion to offer? I've been a big fan of LVM for ages, but
this incident has really shaken my confidence in it. It is my understanding
that all of the software I am using should be pretty solid by now. I am
hesitant to allow LVM operations on production systems anymore if this sort
of thing can happen without explanation. :( I would like to think I did
something wrong, but I've used lvextend and resize_reiserfs a number of
times before on other machines without incident.

-- 
Tracy Reed
The attachment is a digital signature.
http://copilotconsulting.com
More info: http://copilotconsulting.com/sig
* Re: [linux-lvm] massive LV corruption
  2004-09-14  0:46 ` Tracy R Reed
@ 2004-09-14  5:45   ` Clint Byrum
  2004-09-14 12:45     ` Tracy R Reed
  0 siblings, 1 reply; 8+ messages in thread
From: Clint Byrum @ 2004-09-14  5:45 UTC
  To: LVM general discussion and development

On Monday, September 13, 2004, at 05:46 PM, Tracy R Reed wrote:

> On Sat, Sep 11, 2004 at 04:08:30AM -0700, Tracy R Reed spake thusly:
>> I am running Fedora Core 1 with stock RedHat kernel 2.4.22-1.2188.nptl.
>> I filled my /usr/local to 100% and decided I needed some more space so
>> I ran the lvextend and resize_reiserfs commands like I have done many
>> times
>
> Nobody has an opinion to offer? I've been a big fan of LVM for ages but
> this incident has really shaken my confidence in it. It is my
> understanding that all of the software I am using should be pretty solid
> by now. I am hesitant to allow LVM operations on production systems
> anymore if this sort of thing can happen without explanation. :( I would
> like to think I did something wrong but I've used lvextend and
> resize_reiserfs a number of times before on other machines without
> incident.
>

I've never used resize_reiserfs, but I do know that a lot of people I talk
to won't use ReiserFS because of past problems that have since been fixed.
The tools that come with ReiserFS are generally very good. Personally I
only use ext3 or XFS on LVM because of ReiserFS's slowness when being
written to by more than one process.

If I had to blame one thing, I'd blame the heavily hacked 2.4 kernel that
came with Fedora. :-P
* Re: [linux-lvm] massive LV corruption
  2004-09-14  5:45   ` Clint Byrum
@ 2004-09-14 12:45     ` Tracy R Reed
  2004-09-14 14:49       ` Clint Byrum
  0 siblings, 1 reply; 8+ messages in thread
From: Tracy R Reed @ 2004-09-14 12:45 UTC
  To: LVM general discussion and development

On Mon, Sep 13, 2004 at 10:45:55PM -0700, Clint Byrum spake thusly:
> I've never used resize_reiserfs, but I do know that a lot of people I
> talk to won't use ReiserFS because of past problems that have since
> been fixed. The tools that come with ReiserFS are generally very good.

I'm pretty sure it can't possibly be reiserfs, because the actual lv's were
hosed. The LVM/block layer should prevent resize_reiserfs or any part of
reiserfs from damaging the lv's. I love reiser and have used it with great
success for years. I find it sad that people still pan reiserfs after all
this time. I am really looking forward to reiser4 (released already, but I
want to see it get some more time behind it) and some cool plugins for it.
I have a feeling it is going to do for Linux what MS claims WinFS will
(someday) do for their OS.

> If I had to blame one thing, I'd blame the heavily hacked 2.4 kernel
> that came with Fedora. :-P

I suspect this is the case. I am using Fedora Core 2 with a 2.6.something
(exact kernel version in the typescript file from my original posting)
kernel and that seems to be the most likely culprit. However, I doubt
anyone from RedHat is going to take an interest because this isn't
reproducible. That is to say, I am not going to trash my box again to
reproduce it.

-- 
Tracy Reed
The attachment is a digital signature.
http://copilotconsulting.com
More info: http://copilotconsulting.com/sig
* Re: [linux-lvm] massive LV corruption
  2004-09-14 12:45     ` Tracy R Reed
@ 2004-09-14 14:49       ` Clint Byrum
  2004-09-14 20:06         ` Tracy R Reed
  0 siblings, 1 reply; 8+ messages in thread
From: Clint Byrum @ 2004-09-14 14:49 UTC
  To: LVM general discussion and development

On Tuesday, September 14, 2004, at 05:45 AM, Tracy R Reed wrote:

> On Mon, Sep 13, 2004 at 10:45:55PM -0700, Clint Byrum spake thusly:
>> I've never used resize_reiserfs, but I do know that a lot of people I
>> talk to won't use ReiserFS because of past problems that have since
>> been fixed. The tools that come with ReiserFS are generally very good.
>
> I'm pretty sure it can't possibly be reiserfs because the actual lv's
> were hosed. The LVM/block layer should prevent resize_reiserfs or any
> part of reiserfs from damaging the lv's. I love reiser and have used it
> with great success for years. I find it sad that people still pan
> reiserfs after all

Just wanted to say that I don't pan ReiserFS, as I have never had problems
like others did when it was still very new and there were problems keeping
it in sync with the mainline kernel.

I don't use ReiserFS v3 because, while very fast for workstations, it has
major problems with concurrent write accesses. Hans Reiser has stated that
this is because each filesystem has a lock on it, so while writing to, say,
/home/cvs, anybody else who wants to write to /home/cvs will have to wait.
We have a CVS server where the CVS trees and home dirs are on two separate
logical volumes, and this locking scheme *HURTS* when two people are trying
to do a cvs update. CVS writes a "read lock" file to each cvs directory,
and some temp files in the working copy. Combine this with vim writing to
its "swap" files all the time.. the box sometimes comes to a screeching
halt for all users for almost a minute as they get in line with the
filesystem locks.

That said, this ReiserFS+LVM1 system (redhat 8.0) has never had any data
issues. :-P

> this time. I am really looking forward to reiser4 (released already but
> I want to see it get some more time behind it) and some cool plugins for
> it. I have a feeling it is going to do for Linux what MS claims WinFS
> will (someday) do for their OS.
>

Yes, bring it on. I plan to convert some workstations to it first.. then
home server.. then non critical work servers.. the usual progression before
production.

>> If I had to blame one thing, I'd blame the heavily hacked 2.4 kernel
>> that came with Fedora. :-P
>
> I suspect this is the case. I am using Fedora Core 2 with a
> 2.6.something (exact kernel version in the typescript file from my
> original posting) kernel and that seems to be the most likely culprit.
> However, I doubt anyone from RedHat is going to take an interest because
> this isn't reproducible. That is to say, I am not going to trash my box
> again to reproduce it.
>

You said you were running 2.4.22.nptl or something.
* Re: [linux-lvm] massive LV corruption
  2004-09-14 14:49       ` Clint Byrum
@ 2004-09-14 20:06         ` Tracy R Reed
  2004-09-14 20:41           ` Clint Byrum
  0 siblings, 1 reply; 8+ messages in thread
From: Tracy R Reed @ 2004-09-14 20:06 UTC
  To: LVM general discussion and development

On Tue, Sep 14, 2004 at 07:49:45AM -0700, Clint Byrum spake thusly:
> Hans Reiser has stated that this is because each filesystem has a lock
> on it, so while writing to, say, /home/cvs, anybody else who wants to
> write to /home/cvs will have to wait. We have a CVS server where the

That's odd given that each hard drive can only physically write to one
place on the disk at a time anyhow due to head movement and that the kernel
caches the writes and lays them back out on the disk with some sort of
elevator algorithm.

> You said you were running 2.4.22.nptl or something.

Oops, right you are. That is what the box in question was and is running. I
was thinking of a different box with FC2 on it.

-- 
Tracy Reed
The attachment is a digital signature.
http://copilotconsulting.com
More info: http://copilotconsulting.com/sig
* Re: [linux-lvm] massive LV corruption
  2004-09-14 20:06         ` Tracy R Reed
@ 2004-09-14 20:41           ` Clint Byrum
  0 siblings, 0 replies; 8+ messages in thread
From: Clint Byrum @ 2004-09-14 20:41 UTC
  To: LVM general discussion and development

On Tuesday, September 14, 2004, at 01:06 PM, Tracy R Reed wrote:

> On Tue, Sep 14, 2004 at 07:49:45AM -0700, Clint Byrum spake thusly:
>> Hans Reiser has stated that this is because each filesystem has a lock
>> on it, so while writing to, say, /home/cvs, anybody else who wants to
>> write to /home/cvs will have to wait. We have a CVS server where the
>
> That's odd given that each hard drive can only physically write to one
> place on the disk at a time anyhow due to head movement and that the
> kernel caches the writes and lays them back out on the disk with some
> sort of elevator algorithm.
>

You're assuming that programs actually wait for disks! One process is
creating a file at /home/cvs/dir1/#lockfile, the other at
/home/cvs/dir2/#lockfile. Until they run fsync, the physical disk isn't
necessarily involved. The problem lies in the fact that with other
filesystems, like XFS, the kernel will happily modify (at the VFS layer)
two different dirs at one time, as they lock by meta-object (I won't say
inode, because I don't think XFS has inodes). With ReiserFS, the entire
partition is locked while things are modified.

With a cvs lock file, you might not even want to call fsync() to send it to
the disk, as the VFS layer will already have it there, and that's all you
care about.

This is one reason why using a secondary device as a journalling device can
be so beneficial.. as you won't have to seek around the disk with every
meta data update.

Somebody who knows what they're talking about.. feel free to shoot all of
this down. I feel like I'm talking out of my arse a bit. ;-)

-cb
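The difference described above, one lock for the whole filesystem versus
one lock per directory (or other metadata object), can be shown with a
small user-space analogy. This is only a sketch using Python threads. It is
not ReiserFS, XFS, or VFS code, and the function name, the lock objects,
and the timeout are assumptions made up for illustration. It asks a single
question: can two writers to different directories be inside their critical
sections at the same time?

```python
import posixpath
import threading


def writers_overlap(lock_for, paths, timeout_s=0.2):
    """Return True if all writers were inside their critical sections at once.

    lock_for(path) returns the lock a writer must hold to modify path.
    """
    barrier = threading.Barrier(len(paths))
    results = []

    def writer(path):
        with lock_for(path):
            try:
                # Passes only if every writer holds its lock simultaneously.
                barrier.wait(timeout=timeout_s)
                results.append(True)
            except threading.BrokenBarrierError:
                # Someone timed out waiting: the writers were serialized.
                results.append(False)

    threads = [threading.Thread(target=writer, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)


paths = ["/home/cvs/dir1/#lockfile", "/home/cvs/dir2/#lockfile"]

# One lock for the whole "filesystem": every writer contends for it.
fs_lock = threading.Lock()
coarse_overlap = writers_overlap(lambda path: fs_lock, paths)

# One lock per directory: writers in different directories do not contend.
dir_locks = {posixpath.dirname(p): threading.Lock() for p in paths}
fine_overlap = writers_overlap(lambda path: dir_locks[posixpath.dirname(path)], paths)

print("whole-filesystem lock, writers overlap:", coarse_overlap)
print("per-directory locks, writers overlap:", fine_overlap)
```

No disk is involved anywhere in the sketch, which matches the point made in
the message: the waiting happens on the lock, before fsync or any physical
write comes into play.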
end of thread, other threads:[~2004-09-14 20:41 UTC]

Thread overview: 8+ messages
2004-09-11 11:08 [linux-lvm] massive LV corruption Tracy R Reed
2004-09-11 11:25 ` Tracy R Reed
2004-09-14  0:46 ` Tracy R Reed
2004-09-14  5:45   ` Clint Byrum
2004-09-14 12:45     ` Tracy R Reed
2004-09-14 14:49       ` Clint Byrum
2004-09-14 20:06         ` Tracy R Reed
2004-09-14 20:41           ` Clint Byrum