ext3 on a RAID1 going read only?

Linux RAID subsystem development

* ext3 on a RAID1 going read only?
From: Steven Haigh @ 2009-12-25 23:17 UTC
  To: linux-raid

Hi guys,

Not 100% sure where to go with this one. I've been having an issue with a particular server where, after 30 days or so of uptime, the / partition goes read-only after spitting the following to the console:

EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
Aborting journal on device md2.
Dec 25 18:17:27 wireless kernel: EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
Dec 25 18:17:27 wireless kernel: Aborting journal on device md2.
ext3_abort called.
EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
Dec 25 18:17:27 wireless kernel: ext3_abort called.
Remounting filesystem read-only
Dec 25 18:17:27 wireless kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
Dec 25 18:17:27 wireless kernel: Remounting filesystem read-only
EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
Dec 25 18:17:36 wireless kernel: EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979

I'm a bit confused here: from what I understand, if there are bad blocks on a disk, the disk should be kicked from the array. However, ext3 seems to figure out there's a bad block by itself and names /dev/md2 as the culprit...

Can anyone shed some light on what is going on here? I'm not quite as cluey with this stuff as I probably should be ;)

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


* Re: ext3 on a RAID1 going read only?
From: Steven Haigh @ 2009-12-25 23:35 UTC
  To: linux-raid


On 26/12/2009, at 10:17 AM, Steven Haigh wrote:

> Hi guys,
> 
> Not 100% sure where to go with this one.... I've been having an issue with a particular server where after 30 days or so of uptime the / partition will go readonly after spitting the following to the console:
> 
> EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
> Aborting journal on device md2.
> Dec 25 18:17:27 wireless kernel: EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
> Dec 25 18:17:27 wireless kernel: Aborting journal on device md2.
> ext3_abort called.
> Dec 25 18:17:27 EXT3-fs error (device md2): ext3_journal_start_sb: wireless kernel:Detected aborted journal ext3_abort called.
> Remounting filesystem read-only
> Dec 25 18:17:27 wireless kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
> Dec 25 18:17:27 wireless kernel: Remounting filesystem read-only
> EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
> Dec 25 18:17:36 wireless kernel: EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
> 
> I'm a bit confused here as from what I understand, if there are bad blocks on a disk the disk should be kicked from the array - however ext3 seems to figure out there's a bad block by itself and nominates /dev/md2 as the culprit...
> 
> Can anyone shine some light on what is going on here - as I'm not quite as cluey with this stuff as I probably should be ;)

I should also mention that this is using CentOS 5.4 with kernel 2.6.18-164.9.1.el5. A few more details:

# mdadm -Q --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Mon Feb 23 17:15:41 2009
     Raid Level : raid1
     Array Size : 300511808 (286.59 GiB 307.72 GB)
  Used Dev Size : 300511808 (286.59 GiB 307.72 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sat Dec 26 10:34:23 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : fed99e3d:d08fdcc9:b9593a45:2cc09736
         Events : 0.30586

    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1      22        3        1      active sync   /dev/hdc3

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 hdc1[1] hda1[0]
      521984 blocks [2/2] [UU]
      
md1 : active raid1 hdc2[1] hda2[0]
      10482304 blocks [2/2] [UU]
      
md3 : active raid1 hdc4[1] hda4[0]
      1052160 blocks [2/2] [UU]
      
md2 : active raid1 hdc3[1] hda3[0]
      300511808 blocks [2/2] [UU]
      
unused devices: <none>

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

* Re: ext3 on a RAID1 going read only?
From: Goswin von Brederlow @ 2009-12-27 13:01 UTC
  To: Steven Haigh; +Cc: linux-raid

Steven Haigh <netwiz@crc.id.au> writes:

> On 26/12/2009, at 10:17 AM, Steven Haigh wrote:
>
>> Hi guys,
>> 
>> Not 100% sure where to go with this one.... I've been having an issue with a particular server where after 30 days or so of uptime the / partition will go readonly after spitting the following to the console:
>> 
>> EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
>> Aborting journal on device md2.
>> Dec 25 18:17:27 wireless kernel: EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
>> Dec 25 18:17:27 wireless kernel: Aborting journal on device md2.
>> ext3_abort called.
>> Dec 25 18:17:27 EXT3-fs error (device md2): ext3_journal_start_sb: wireless kernel:Detected aborted journal ext3_abort called.
>> Remounting filesystem read-only
>> Dec 25 18:17:27 wireless kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
>> Dec 25 18:17:27 wireless kernel: Remounting filesystem read-only
>> EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
>> Dec 25 18:17:36 wireless kernel: EXT3-fs error (device md2): ext3_xattr_block_list: inode 4932068: bad block 9873979
>> 
>> I'm a bit confused here as from what I understand, if there are bad blocks on a disk the disk should be kicked from the array - however ext3 seems to figure out there's a bad block by itself and nominates /dev/md2 as the culprit...
>> 
>> Can anyone shine some light on what is going on here - as I'm not quite as cluey with this stuff as I probably should be ;)
>
> I should also mention that this is using CentOS 5.4 with kernel 2.6.18-164.9.1.el5. A few more details:
>
> # mdadm -Q --detail /dev/md2
> /dev/md2:
>         Version : 0.90
>   Creation Time : Mon Feb 23 17:15:41 2009
>      Raid Level : raid1
>      Array Size : 300511808 (286.59 GiB 307.72 GB)
>   Used Dev Size : 300511808 (286.59 GiB 307.72 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 2
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Dec 26 10:34:23 2009
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : fed99e3d:d08fdcc9:b9593a45:2cc09736
>          Events : 0.30586
>
>     Number   Major   Minor   RaidDevice State
>        0       3        3        0      active sync   /dev/hda3
>        1      22        3        1      active sync   /dev/hdc3
>
> # cat /proc/mdstat 
> Personalities : [raid1] 
> md0 : active raid1 hdc1[1] hda1[0]
>       521984 blocks [2/2] [UU]
>       
> md1 : active raid1 hdc2[1] hda2[0]
>       10482304 blocks [2/2] [UU]
>       
> md3 : active raid1 hdc4[1] hda4[0]
>       1052160 blocks [2/2] [UU]
>       
> md2 : active raid1 hdc3[1] hda3[0]
>       300511808 blocks [2/2] [UU]
>       
> unused devices: <none>

Sounds like a block with bad data that doesn't give an IO error. The
raid layer can't see that the data is bad but the filesystem
recognises that the data makes no sense.
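
A rough way to test that idea for the one block named in the log is to
read it from each half of the mirror and compare checksums. The sketch
below is only an illustration and not something posted in the thread:
the function names block_sum and compare_block are made up, it assumes
a 4096-byte ext3 block size (tune2fs -l /dev/md2 reports the real
value), and it relies on 0.90 metadata sitting at the end of each
member, so that filesystem block N should be at the same offset on
both partitions. Reads may also be served from cache, so a "same"
result is a hint rather than proof.

```sh
# Print the MD5 checksum of a single block read from a device or image file.
# usage: block_sum <device-or-file> <block-number> [block-size]
block_sum() {
    dd if="$1" bs="${3:-4096}" skip="$2" count=1 2>/dev/null | md5sum | cut -d' ' -f1
}

# Report whether the same block is identical on two mirror members.
# usage: compare_block <member-a> <member-b> <block-number> [block-size]
compare_block() {
    a="$(block_sum "$1" "$3" "${4:-4096}")"
    b="$(block_sum "$2" "$3" "${4:-4096}")"
    if [ "$a" = "$b" ]; then
        echo "same"
    else
        echo "differ"
    fi
}
```

On the system described here the call would presumably be
compare_block /dev/hda3 /dev/hdc3 9873979, run as root. "differ" would
suggest the two drives disagree about that block; "same" would suggest
both hold the same (possibly bad) data.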

First I would run a check on the raid to see if the contents of both
drives differ. Then I would take one drive out of the raid and run
badblocks in read-write mode on it. Then resync and repeat with the
other drive.
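
For md2 the sequence above might look roughly like the following. This
is a dry-run sketch under assumptions, not a tested procedure: the
helper names (run, start_check, show_mismatches, test_member) are
invented for illustration, the device names are taken from the mdadm
output quoted earlier, and badblocks -n (the non-destructive
read-write test) is one reading of "read-write mode". The array has no
redundancy while a member is out, so a current backup comes first.

```sh
# Dry-run by default: commands are printed instead of executed.
# Set DRY_RUN=0 (as root) only after double-checking the device names.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Ask md to read every block of both mirrors and compare them.
# usage: start_check md2
start_check() {
    run sh -c "echo check > /sys/block/$1/md/sync_action"
}

# After /proc/mdstat shows the check has finished, a non-zero value
# here means the two halves of the mirror disagree somewhere.
# usage: show_mismatches md2
show_mismatches() {
    run cat "/sys/block/$1/md/mismatch_cnt"
}

# Take one member out, surface-test it, and re-add it (re-adding starts a resync).
# usage: test_member md2 /dev/hdc3
test_member() {
    run mdadm "/dev/$1" --fail "$2"
    run mdadm "/dev/$1" --remove "$2"
    run badblocks -nsv "$2"    # -n non-destructive read-write, -s progress, -v verbose
    run mdadm "/dev/$1" --add "$2"
}
```

Calling test_member md2 /dev/hdc3 with the default DRY_RUN=1 only
prints the four commands, which makes it easy to review them before
running anything for real. Once the resync has completed, the same
call with /dev/hda3 covers the other drive.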

If none of those shows an error, I would back up, reformat, restore,
and pray.

MfG
        Goswin

PS: Why is your / not read-only?
PPS: run http://mrvn.homeip.net/fstest/

