linux-raid.vger.kernel.org archive mirror
* raid 1 single drive failure. rebuilt with replacement drive. missing 20G of data
@ 2009-03-12  2:10 Mitchell Laks
  2009-03-12  2:22 ` Mitchell Laks
  2009-03-23  6:09 ` Neil Brown
  0 siblings, 2 replies; 4+ messages in thread
From: Mitchell Laks @ 2009-03-12  2:10 UTC
  To: linux-raid

Hi,

I have used linux software raid1 for many years without any data loss. Now I seem to have a problem.

I run a server with a pair of WD 400G drives that I keep in a raid 1 configuration. The raid became degraded
but continued to function and continued to store data on the remaining drive.

I simply replaced the faulty drive and rebuilt the raid. All seemed to be ok; however, I now notice that 20G of data are
missing from the raid. I see this by comparing against a backup device that I mirror to every night.

Any idea how this could have happened? These are the most recent 20G of data, perhaps the last 20G written since the
raid went bad. However, the data was clearly stored initially: I see it on the backup device I copied to.

Could the data have been lost in the rebuild process? That seems strange. There is a total of 220G of data on the device at present.
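
To pin down exactly which files are missing, the comparison against the nightly backup could be scripted. The following is only a rough sketch, not what was actually run here: the function names and the mount points in the usage comment are hypothetical, and it compares by path only, not by content.

```python
import os


def missing_from_target(backup_root, raid_root):
    """List (relative_path, size_in_bytes) for files that exist under
    backup_root but have no entry at the same relative path under raid_root."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if not os.path.lexists(os.path.join(raid_root, rel_path)):
                size = os.lstat(os.path.join(dirpath, name)).st_size
                missing.append((rel_path, size))
    return sorted(missing)


def total_bytes(entries):
    """Sum the sizes of the entries returned by missing_from_target()."""
    return sum(size for _path, size in entries)


# Hypothetical usage; both mount points are assumptions:
#   lost = missing_from_target("/mnt/backup", "/mnt/raid")
#   print(len(lost), "files,", total_bytes(lost) / 1e9, "GB missing")
```

Sorting the result by modification time instead of by path would show whether the missing files all date from after the array went degraded.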

The server is running Debian 4.0 (etch), which runs mdadm-2.5.6-9 (Debian), on Linux kernel 2.6.18-5-686.


Thanks

Mitchell Laks


* Re: raid 1 single drive failure. rebuilt with replacement drive. missing 20G of data
From: Mitchell Laks @ 2009-03-12  2:22 UTC
  To: linux-raid

On 22:10 Wed 11 Mar, Mitchell Laks wrote:

Additional information:

I use Promise SATA150 or SATA300 TX4 4-port SATA controllers, and I find the following errors in

dmesg|tail -n

ata2: translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x84 { DriveStatusError BadCRC }
md: md2: sync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sdd1
 disk 1, wo:0, o:1, dev:sdb1
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 390708736 blocks.
md: md1: sync done.
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1
md: syncing RAID array md2
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 390708736 blocks.
md: md2: sync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:sdd1
 disk 1, wo:0, o:1, dev:sdb1
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
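
For reading the status/error bytes in the output above, here is a small decoding sketch. It is an illustration only: the names for 0x40, 0x10, 0x01 and for the two error bits are taken from the kernel messages shown, while the remaining status names are assumed from the classic ATA status register layout (BSY, DF, DRQ, CORR, IDX). Error bits not seen in this log are reported as raw hex rather than guessed.

```python
# ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),            # BSY
    (0x40, "DriveReady"),      # DRDY, named in the log
    (0x20, "DeviceFault"),     # DF
    (0x10, "SeekComplete"),    # DSC, named in the log
    (0x08, "DataRequest"),     # DRQ
    (0x04, "CorrectedError"),  # CORR
    (0x02, "Index"),           # IDX
    (0x01, "Error"),           # ERR, named in the log
]

# Only the two error-register bits that appear in the log above.
ERROR_BITS = [
    (0x04, "DriveStatusError"),
    (0x80, "BadCRC"),
]


def decode_status(value):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


def decode_error(value):
    """Return names for known error bits; leftover bits are shown as hex."""
    names = [name for mask, name in ERROR_BITS if value & mask]
    known = sum(mask for mask, _name in ERROR_BITS)
    rest = value & ~known & 0xFF
    if rest:
        names.append("unknown(0x%02x)" % rest)
    return names


print(decode_status(0x51), decode_error(0x84))  # the ata2 lines
print(decode_status(0x50))                      # the ata3 lines
```

Decoded this way, the ata2 lines carry the Error bit together with a CRC error, while the ata3 status 0x50 has the Error bit clear.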


and when I do 

dmesg|grep ata 

I find:

 BIOS-e820: 000000003ffb0000 - 000000003ffc0000 (ACPI data)
Memory: 1031252k/1048256k available (1541k kernel code, 16308k reserved, 576k data, 196k init, 130752k highmem)
libata version 2.00 loaded.
sata_promise 0000:00:0d.0: version 1.04
 hda:<6>ata1: SATA max UDMA/133 cmd 0xF88C8200 ctl 0xF88C8238 bmdma 0x0 irq 185
ata2: SATA max UDMA/133 cmd 0xF88C8280 ctl 0xF88C82B8 bmdma 0x0 irq 185
ata3: SATA max UDMA/133 cmd 0xF88C8300 ctl 0xF88C8338 bmdma 0x0 irq 185
ata4: SATA max UDMA/133 cmd 0xF88C8380 ctl 0xF88C83B8 bmdma 0x0 irq 185
scsi0 : sata_promise
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7, max UDMA/133, 781422768 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 0
ata1.00: configured for UDMA/133
scsi1 : sata_promise
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7, max UDMA/133, 781422768 sectors: LBA48 NCQ (depth 0/32)
ata2.00: ata2: dev 0 multi count 0
ata2.00: configured for UDMA/133
scsi2 : sata_promise
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7, max UDMA/133, 1465149168 sectors: LBA48 NCQ (depth 0/32)
ata3.00: ata3: dev 0 multi count 0
ata3.00: configured for UDMA/133
scsi3 : sata_promise
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7, max UDMA/133, 781422768 sectors: LBA48 NCQ (depth 0/32)
ata4.00: ata4: dev 0 multi count 0
ata4.00: configured for UDMA/133
EXT3-fs: mounted filesystem with ordered data mode.
sata_via 0000:00:0f.0: version 2.0
sata_via 0000:00:0f.0: routed to hard irq line 10
ata5: SATA max UDMA/133 cmd 0xC000 ctl 0xB802 bmdma 0xA800 irq 177
ata6: SATA max UDMA/133 cmd 0xB400 ctl 0xB002 bmdma 0xA808 irq 177
scsi4 : sata_via
ata5: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
scsi5 : sata_via
ata6: SATA link down 1.5 Gbps (SStatus 0 SControl 300)
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs: mounted filesystem with ordered data mode.
ata2: translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x84 { DriveStatusError BadCRC }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
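
As a cross-check on which drive sits on which port, the sector counts above can be converted to sizes. This is a back-of-the-envelope sketch that assumes 512-byte sectors and that the md "blocks" figure (390708736) is in 1 KiB units; both are assumptions, not something the log states.

```python
SECTOR_BYTES = 512     # assumption: 512-byte logical sectors
MD_BLOCK_BYTES = 1024  # assumption: md reports sizes in 1 KiB blocks


def sectors_to_gb(sectors):
    """Convert a libata sector count to decimal gigabytes."""
    return sectors * SECTOR_BYTES / 1e9


def md_blocks_to_gb(blocks):
    """Convert an md block count to decimal gigabytes."""
    return blocks * MD_BLOCK_BYTES / 1e9


# Sector counts copied from the "ataN.00: ATA-7 ..." lines above.
ports = {
    "ata1.00": 781422768,
    "ata2.00": 781422768,
    "ata3.00": 1465149168,
    "ata4.00": 781422768,
}

for port, sectors in sorted(ports.items()):
    print("%s: %.1f GB" % (port, sectors_to_gb(sectors)))

# Array size from the "over a total of 390708736 blocks" sync lines.
print("md array: %.1f GB" % md_blocks_to_gb(390708736))
```

Under those assumptions, ata1, ata2 and ata4 come out at about 400 GB each and ata3 at about 750 GB, so ata3 is a differently sized disk from the other three.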
 
When I search Google for these errors, I find:

http://bugzilla.kernel.org/show_bug.cgi?id=7516



Any more thoughts?

Thanks

Mitchell Laks




* Re: raid 1 single drive failure. rebuilt with replacement drive. missing 20G of data
From: Neil Brown @ 2009-03-23  6:09 UTC
  To: Mitchell Laks; +Cc: linux-raid

On Wednesday March 11, mlaks@post.harvard.edu wrote:
> Hi,
> 
> I have used linux software raid1 for many years without any data loss. Now I seem to have a problem.
> 
> I run a server with a pair of WD 400G drives that I keep in a raid 1 configuration. The raid became degraded
> but continued to function and continue to store data on the remaining  drive. 
> 
> I simply replaced the faulty drive and rebuilt the raid. All seemed to be   ok,  however now I notice that 20G of data are 
> missing from the raid. I see this by comparing to a backup device that I mirror to every night.
> 
> Any idea on how can this have happened? These are the most recent 20G of data. Perhaps this is the last 20 G since the 
> raid went bad,  however that the data was clearly stored initially - I see it on the backup device I copied to.
> 
> Could the data   have been lost in the rebuild process? seems strange. Total of 220G of data on the device at present.

Seems very strange indeed.  I cannot explain it at all.
Do you have complete kernel logs covering the time when the drive
failed, and the time when you added a new device?  They might shed
some light.

An explanation that fits most of the available data is that you
actually replaced the good drive rather than the bad drive.  Was the
'bad' drive totally dead, or just a bit sick and might have recovered
after cooling down?
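
One way to reduce the chance of pulling the wrong disk is to match kernel device names to the model and serial strings printed on the drive labels before touching any hardware. The sketch below assumes udev populates /dev/disk/by-id with symlinks whose names embed model and serial; whether that directory exists on a given system is an assumption, and the function name is made up for illustration.

```python
import os


def map_devices_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each resolved device node (e.g. /dev/sdb1) to the list of
    by-id symlink names that point at it."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            target = os.path.realpath(link)
            mapping.setdefault(target, []).append(name)
    return mapping


# Hypothetical usage on the affected server:
#   for dev, names in sorted(map_devices_by_id().items()):
#       print(dev, ", ".join(names))
```

Comparing that listing with the device names in the RAID1 conf printout would show which physical drive each array member really is.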

NeilBrown


* Re: raid 1 single drive failure. rebuilt with replacement drive. missing 20G of data
From: Nifty Fedora Mitch @ 2009-03-23 20:29 UTC
  To: Neil Brown; +Cc: Mitchell Laks, linux-raid

On Mon, Mar 23, 2009 at 05:09:58PM +1100, Neil Brown wrote:
> On Wednesday March 11, mlaks@post.harvard.edu wrote:
> > Hi,
> > 
> > I have used linux software raid1 for many years without any data loss. Now I seem to have a problem.
> > 
> > I run a server with a pair of WD 400G drives that I keep in a raid 1 configuration. The raid became degraded
> > but continued to function and continue to store data on the remaining  drive. 
> > 

Just curious... How much RAM in the system?  Size and config of swap?
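
If it helps, both totals can be read from /proc/meminfo. A minimal sketch, assuming the usual "Key:   value kB" layout of that file (the function name is only illustrative):

```python
def parse_meminfo(text):
    """Extract MemTotal and SwapTotal (in kB) from /proc/meminfo-style text."""
    wanted = ("MemTotal", "SwapTotal")
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            values[key] = int(rest.split()[0])
    return values


# Hypothetical usage on the server:
#   with open("/proc/meminfo") as f:
#       print(parse_meminfo(f.read()))
```

The per-device swap layout is listed separately in /proc/swaps.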


-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?

