* raid1-diseaster on reboot: old version overwrites new version
From: peter pilsl @ 2005-04-02 15:43 UTC
To: linux-raid
Two days ago I had a severe server crash due to raid problems. The whole
thing started with a (homemade) DoS attack on the server. The server
went to its knees and had to be reset. After the reboot the server
was working fine and background reconstruction of the mirrors started.
About 30 minutes later the first anomalies occurred: applications
reported missing libraries, fs errors (reiserfs) and so on.
It took a while until I recognized what was going on:
the / partition was on a raid1, /dev/md2, built from two disks: hda6 + hdc6.
For some reason the raid seemed to have been out of sync for over a year, and
hdc6 held an old copy that was now progressively overwriting hda6,
changing the content of / while the raid was running.
I booted from a live CD and discovered that hdc6 was an exact copy from
spring 2004 (easily determined from the content and timestamps of various
files across the system) and that hda6 was not mountable. I ran reiserfsck
and had the tree rebuilt on hda6, but it was too late. All current data was gone.
I had a backup, the server is up again and my head is still on my shoulders,
but this leaves me with a lot of questions:
* How can the raid be out of sync? I monitor /proc/mdstat at a
5-minute interval and log the content to files. Over the last year the
output was consistently:
md2 : active raid1 hdc6[0] hda6[1]
5120000 blocks [2/2] [UU]
without a single exception. I also just tested the entries in my
watchdog and checked that it works by removing one disk. It definitely barks.
* How can, in the case of an unsynced raid, the old version overwrite the new
version? This is like a nightmare (and I remember something like this
happening to me before).
* What did I do wrong?
The only explanation I can see is that I had the wrong entry in my
lilo.conf: root=/dev/hda6 instead of root=/dev/md2.
So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
which was started but never had any data written to it. Is this a
possible explanation?
kernel 2.4.24
raidtools-0.90
thnx for any advice,
peter
--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at
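The watchdog described above boils down to checking the "[n/m] [UU]" status that /proc/mdstat prints under each array. A minimal Python sketch of such a check, assuming the 2.4-era /proc/mdstat layout quoted above (the function name degraded_arrays is hypothetical, not part of raidtools or mdadm), makes its blind spot visible: it only reports whether the member partitions are present, so an array that is healthy but never mounted still shows [UU].

```python
import re

# Status part of an array entry in /proc/mdstat, e.g. "[2/2] [UU]":
# [total members/active members] followed by one U (up) or _ (missing) per slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
HEADER_RE = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows a missing member.

    Hypothetical helper; assumes the 2.4-era layout where an "mdN : ..." line
    is followed by a line containing "[total/active] [U_...]".
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            total, active = int(status.group(1)), int(status.group(2))
            if active < total or "_" in status.group(3):
                degraded.append(current)
            current = None
    # An array passes this check even if no filesystem is mounted on it.
    return degraded
```

Fed the md2 output quoted above, this returns an empty list, which is why such a monitor would never fire in this situation.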

* Re: raid1-diseaster on reboot: old version overwrites new version
From: Gordon Henderson @ 2005-04-02 17:27 UTC
To: linux-raid

On Sat, 2 Apr 2005, peter pilsl wrote:

> The only explanation to me is, that I had the wrong entry in my
> lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2
> So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
> which was started, but never had any data written to it. Is this a
> possible explanation?

It's possible, but I think the root= parameter needs to correspond to
what's in the /etc/fstab file. I'd check that too if it's still possible.
I've no experience with reiser though.

Gordon

* Re: raid1-diseaster on reboot: old version overwrites new version
From: Tim Moore @ 2005-04-02 17:35 UTC
To: linux-raid

peter pilsl wrote:
> The only explanation to me is, that I had the wrong entry in my
> lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2
> So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
> which was started, but never had any data written to it. Is this a
> possible explanation?

No. The lilo.conf entry just tells the kernel where root is located.

Can you publish your /etc/fstab and fdisk -l output?

* Re: raid1-diseaster on reboot: old version overwrites new version
From: peter pilsl @ 2005-04-02 18:10 UTC
To: Tim Moore; +Cc: linux-raid

Tim Moore wrote:
>
> peter pilsl wrote:
>
>> The only explanation to me is, that I had the wrong entry in my
>> lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2
>> So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
>> which was started, but never had any data written to it. Is this a
>> possible explanation?
>
> No. The lilo.conf entry just tells the kernel where root is located.
>
> Can you publish your /etc/fstab and fdisk -l output?

thnx. Following are my fstab, the fdisk output of both involved drives
and my raidtab. (Which reminds me to move the swap from raid to several
single partitions.)

---------------fstab-------------------
# cat /etc/fstab
/dev/md2   /             reiserfs  noatime,notail  1 1
/dev/md0   /boot         ext2      noatime         1 2
/dev/md3   /var          reiserfs  noatime,notail  1 2
/dev/md1   swap          swap      defaults        0 0
/dev/md4   /data         reiserfs  noatime,notail  1 2
/dev/md5   /backup_cust  reiserfs  noatime,notail  1 2
/dev/md6   /data2        reiserfs  noatime,notail  1 2
/dev/hdc8  /opt_noraid   reiserfs  noatime,notail  1 2
/dev/hdd7  /opt          reiserfs  noatime,notail  1 2
none       /dev/pts      devpts    mode=0620       0 0
none       /dev/shm      tmpfs     defaults        0 0
none       /proc         proc      defaults        0 0
#/dev/hdb  /mnt/cdrom    auto      user,iocharset=iso8859-1,exec,codepage=850,ro,noauto  0 0
#/dev/fd0  /mnt/floppy   auto      user,iocharset=iso8859-1,sync,exec,codepage=850,noauto  0 0

---------------fdisk-------------------
# fdisk -l /dev/hda

Disk /dev/hda: 255 heads, 63 sectors, 7297 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1             1         3     24066   fd  Linux raid autodetect
/dev/hda2             4        67    514080   fd  Linux raid autodetect
/dev/hda3            68      7297  58074975    5  Extended
/dev/hda5            68       705   5124703+  fd  Linux raid autodetect
/dev/hda6           706      1343   5124703+  fd  Linux raid autodetect
/dev/hda7          1344      5168  30724281   fd  Linux raid autodetect
/dev/hda8          5169      6443  10241406   fd  Linux raid autodetect
/dev/hda9          6444      7297   6859723+  fd  Linux raid autodetect

# fdisk -l /dev/hdc

Disk /dev/hdc: 16 heads, 63 sectors, 232581 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1             1        48     24160+  fd  Linux raid autodetect
/dev/hdc2            49      1069    514584   fd  Linux raid autodetect
/dev/hdc3          1070    232581 116682048    5  Extended
/dev/hdc5          1070     11238   5125144+  fd  Linux raid autodetect
/dev/hdc6         11239     21407   5125144+  fd  Linux raid autodetect
/dev/hdc7         21408     82368  30724312+  fd  Linux raid autodetect
/dev/hdc8         82369    232581  75707320+  83  Linux

---------------raidtab-------------------
# cat /etc/raidtab
# /boot
raiddev /dev/md0
    raid-level              1
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           3
    device                  /dev/hdc1
    raid-disk               0
    device                  /dev/hda1
    raid-disk               1
    device                  /dev/hdd1
    raid-disk               2
# swap
raiddev /dev/md1
    raid-level              0
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           2
    device                  /dev/hdc2
    raid-disk               0
    device                  /dev/hda2
    raid-disk               1
# /
raiddev /dev/md2
    raid-level              1
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           2
    device                  /dev/hdc6
    raid-disk               0
    device                  /dev/hda6
    raid-disk               1
# /var
raiddev /dev/md3
    raid-level              1
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           2
    device                  /dev/hda5
    raid-disk               0
    device                  /dev/hdc5
    raid-disk               1
# /data
raiddev /dev/md4
    raid-level              1
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           2
    device                  /dev/hda7
    raid-disk               0
    device                  /dev/hdc7
    raid-disk               1
# /back_customer
raiddev /dev/md5
    raid-level              1
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           2
    device                  /dev/hdd5
    raid-disk               0
    device                  /dev/hda8
    raid-disk               1
# /data2
raiddev /dev/md6
    raid-level              1
    chunk-size              64k
    persistent-superblock   1
    nr-raid-disks           2
    device                  /dev/hdd6
    raid-disk               0
    device                  /dev/hda9
    raid-disk               1

thnx for your help,
peter

* Re: raid1-diseaster on reboot: old version overwrites new version
From: Doug Ledford @ 2005-04-04 19:39 UTC
To: Tim Moore; +Cc: linux-raid

On Sat, 2005-04-02 at 09:35 -0800, Tim Moore wrote:
>
> peter pilsl wrote:
> > The only explanation to me is, that I had the wrong entry in my
> > lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2
> > So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
> > which was started, but never had any data written to it. Is this a
> > possible explanation?
>
> No. The lilo.conf entry just tells the kernel where root is located.

Yes, as Neil posted, this exactly explains the issue. If /dev/hda6 is
part of a raid1 array and you write to it instead of /dev/md2, then those
writes are never sent to /dev/hdc6 and the two devices get out of sync.
Plus, standard initrd setups and the like are written to accommodate users
passing arbitrary root= options on the kernel command line to override the
default root partition, and in those situations the root partition must be
taken from the command line and not from fstab in order for this to work.
So, whether it's lilo or grub or whatever, the root= option on your kernel
command line is *the* authority when it comes to what will be mounted as
the root partition you actually use.

> Can you publish your /etc/fstab and fdisk -l output?

Keep in mind the root partition is already mounted in ro mode by the time
fstab is available, and the rc.sysinit script merely remounts it rw.
Again, the command line is the authority.

--
Doug Ledford <dledford@redhat.com>
http://people.redhat.com/dledford
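Since the root= option on the kernel command line, not the / entry in /etc/fstab, decides which device becomes the root filesystem, a disagreement between the two is worth flagging. A minimal Python sketch of that consistency check follows; the helper names are hypothetical, and it assumes both values are plain /dev paths (a root= given as a numeric device number or a label would need resolving first). It also assumes a later root= on the command line overrides an earlier one.

```python
def root_from_cmdline(cmdline):
    """Return the value of the root= option on a kernel command line, or None."""
    root = None
    for token in cmdline.split():
        if token.startswith("root="):
            # Assumption: a later root= overrides an earlier one.
            root = token[len("root="):]
    return root


def root_from_fstab(fstab_text):
    """Return the device fstab lists for the mount point /, or None."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and commented-out entries
        if len(fields) >= 2 and fields[1] == "/":
            return fields[0]
    return None


def root_mismatch(cmdline, fstab_text):
    """Return (cmdline_root, fstab_root) when both are known and differ, else None."""
    booted = root_from_cmdline(cmdline)
    listed = root_from_fstab(fstab_text)
    if booted is not None and listed is not None and booted != listed:
        return (booted, listed)
    return None
```

On a running system the two inputs would be the contents of /proc/cmdline and /etc/fstab; with the lilo.conf entry described in this thread the check would report ("/dev/hda6", "/dev/md2").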

* Re: raid1-diseaster on reboot: old version overwrites new version
From: Neil Brown @ 2005-04-02 22:31 UTC
To: peter pilsl; +Cc: linux-raid

On Saturday April 2, pilsl@goldfisch.at wrote:
>
> * What did I do wrong?
>
> The only explanation to me is, that I had the wrong entry in my
> lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2
> So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
> which was started, but never had any data written to it. Is this a
> possible explanation?

Yep, this completely explains everything.
/ was *not* on /dev/md2; it was on /dev/hda6, which also happened to be
part of an unused raid1 array.
After a crash, the raid1 array did a resync, copying from hdc6 to hda6.
Very sad.
Very good that you had backups.

2.6 won't let you do this: you cannot have a partition in a raid array and
mounted as a filesystem at the same time.

NeilBrown
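The diagnosis above suggests a check that a plain [UU] monitor cannot make on 2.4: is any mounted filesystem sitting on a raw partition that is also an md member? A minimal Python sketch, assuming the /proc/mdstat format quoted earlier in this thread and /proc/mounts-style input (device and mount point as the first two fields); both function names are hypothetical. Because /proc/mounts may list the root filesystem as /dev/root, the caller can pass the real root device, for example the root= value from /proc/cmdline.

```python
import re


def raid_members(mdstat_text):
    """Map member names such as 'hda6' to their array ('md2') from /proc/mdstat text."""
    members = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            # Members appear as name[slot], e.g. "hdc6[0] hda6[1]".
            for dev in re.findall(r"(\w+)\[\d+\]", header.group(2)):
                members[dev] = header.group(1)
    return members


def mounted_raid_members(mdstat_text, mounts_text, root_device=None):
    """List (device, mountpoint, array) for filesystems mounted from a raw md member.

    root_device: real device to substitute when the mounts list shows /dev/root.
    """
    members = raid_members(mdstat_text)
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mountpoint = fields[0], fields[1]
        if device == "/dev/root" and root_device:
            device = root_device
        if not device.startswith("/dev/"):
            continue  # proc, none, etc.
        name = device[len("/dev/"):]
        if name in members:
            hits.append((device, mountpoint, members[name]))
    return hits
```

With the md2 status from this thread and root_device="/dev/hda6", the sketch reports that / is mounted from a member of md2, which is the situation described above.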

Thread overview: 6+ messages
2005-04-02 15:43 raid1-diseaster on reboot: old version overwrites new version peter pilsl
2005-04-02 17:27 ` Gordon Henderson
2005-04-02 17:35 ` Tim Moore
2005-04-02 18:10   ` peter pilsl
2005-04-04 19:39   ` Doug Ledford
2005-04-02 22:31 ` Neil Brown