linux-raid.vger.kernel.org archive mirror
* Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
@ 2011-04-14 21:14 Gavin Flower
  2011-04-14 21:19 ` Mathias Burén
  0 siblings, 1 reply; 28+ messages in thread
From: Gavin Flower @ 2011-04-14 21:14 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Mathias Burén, neilb, linux-raid

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

--- On Fri, 15/4/11, Phil Turmel <philip@turmel.org> wrote:

> From: Phil Turmel <philip@turmel.org>
> Subject: Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> To: "Gavin Flower" <gavinflower@yahoo.com>
> Cc: "Mathias Burén" <mathias.buren@gmail.com>, neilb@suse.de, linux-raid@vger.kernel.org
> Date: Friday, 15 April, 2011, 1:16
> Hi Gavin,
> 
> I think you might want to investigate your *power supply*
[...]

Attaching an OpenDocument file with full details of the SMART output and a comparison table.


Cheers,
Gavin

[-- Attachment #2: raid-notes-20110415-smart.odt --]
[-- Type: application/vnd.oasis.opendocument.text, Size: 18683 bytes --]

* Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
@ 2011-04-13 22:24 Gavin Flower
  2011-04-13 22:28 ` Mathias Burén
  2011-04-13 23:09 ` NeilBrown
  0 siblings, 2 replies; 28+ messages in thread
From: Gavin Flower @ 2011-04-13 22:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid


--- On Fri, 8/4/11, Gavin Flower <gavinflower@yahoo.com> wrote:

> From: Gavin Flower <gavinflower@yahoo.com>
> Subject: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
[...]
> This morning, I noticed my system was extremely
> unresponsive, and that there were clicking sounds coming
> from one of my 5 hard drives.  
[...]

Hi Neil,

When I do 
   badblocks -s -v /dev/sdc
I hear clicking sounds from the hard drive, and notice lots and lots of log messages such as:
ata3: exception Emask 0x10 SAct 0x0 SErr 0x90200 action 0xe frozen
ata3: irq_stat 0x00400000, PHY RDY changed
ata3: SError: { Persist PHYRdyChg 10B8B }
ata3: hard resetting link
ata3: softreset failed (device not ready)
ata3: applying SB600 PMP SRST workaround and retrying
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata3.00: configured for UDMA/33
ata3: EH complete

So I assume that the clicking corresponds to the hard reset, but I'm not certain of that.  Initially, I thought it might be some kind of disk head problem.  Note that SMART reports no bad blocks.
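
To check whether the resets are confined to one port, the "hard resetting link" lines can be counted per ATA port. This is only a sketch: the helper name count_link_resets is made up for illustration, and it assumes the log lines look like the ones quoted above.

```shell
# Count "hard resetting link" events per ATA port in kernel log text on stdin.
# count_link_resets is a hypothetical helper name, not an existing tool.
count_link_resets() {
    awk '/hard resetting link/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9]+:$/) {
                     port = $i
                     sub(/:$/, "", port)   # drop the trailing colon
                     n[port]++
                 }
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Small demonstration using two of the lines quoted above.
printf '%s\n' 'ata3: hard resetting link' 'ata3: EH complete' | count_link_resets

# On a live system (assuming the kernel log is in /var/log/messages):
#   grep -i ata /var/log/messages | count_link_resets
```

If only ata3 shows up with a large count, that points at one drive, cable, or port rather than at the controller as a whole.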


Regards,
Gavin

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
@ 2011-04-08  2:01 Gavin Flower
  0 siblings, 0 replies; 28+ messages in thread
From: Gavin Flower @ 2011-04-08  2:01 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

Hi Neil,

Looks like the log file was simply too big.

Here are the initial and ending lines:

Cheers,
Gavin

output of:
grep -i ATA /var/log/messages

Apr  4 00:46:18 saturn kernel: [58150.946089] pata_atiixp 0000:00:14.1: PCI INT A disabled
Apr  4 00:46:19 saturn kernel: [58151.620996] pata_atiixp 0000:00:14.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Apr  4 00:46:19 saturn kernel: [58151.776364] ata6.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:46:19 saturn kernel: [58151.776367] ata6.00: ACPI cmd ef/03:46:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:46:19 saturn kernel: [58151.776370] ata6.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Apr  4 00:46:19 saturn kernel: [58151.792475] ata5.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:46:19 saturn kernel: [58151.792478] ata5.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:46:19 saturn kernel: [58151.792481] ata5.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Apr  4 00:46:19 saturn kernel: [58151.814455] ata5.00: configured for UDMA/33
Apr  4 00:46:19 saturn kernel: [58151.850339] ata6.00: configured for UDMA/100
Apr  4 00:46:19 saturn kernel: [58151.864031] ata3: softreset failed (device not ready)
Apr  4 00:46:19 saturn kernel: [58151.864035] ata4: softreset failed (device not ready)
Apr  4 00:46:19 saturn kernel: [58151.864038] ata3: applying SB600 PMP SRST workaround and retrying
Apr  4 00:46:19 saturn kernel: [58151.864040] ata4: applying SB600 PMP SRST workaround and retrying
Apr  4 00:46:19 saturn kernel: [58151.864059] ata2: softreset failed (device not ready)
Apr  4 00:46:19 saturn kernel: [58151.864061] ata1: softreset failed (device not ready)
Apr  4 00:46:19 saturn kernel: [58151.864063] ata2: applying SB600 PMP SRST workaround and retrying
Apr  4 00:46:19 saturn kernel: [58151.864065] ata1: applying SB600 PMP SRST workaround and retrying
Apr  4 00:46:19 saturn kernel: [58152.019042] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:46:19 saturn kernel: [58152.019046] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:46:19 saturn kernel: [58152.019070] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:46:19 saturn kernel: [58152.019079] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:46:19 saturn kernel: [58152.021363] ata3.00: configured for UDMA/133
Apr  4 00:46:19 saturn kernel: [58152.085139] ata4.00: configured for UDMA/133
Apr  4 00:46:19 saturn kernel: [58152.085152] ata1.00: configured for UDMA/133
Apr  4 00:46:19 saturn kernel: [58152.085165] ata2.00: configured for UDMA/133
[...]
Apr  7 14:41:58 saturn kernel: [231943.624749] ata3: hard resetting link
Apr  7 14:42:05 saturn kernel: [231950.625059] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr  7 14:42:05 saturn kernel: [231950.635608] ata3.00: configured for UDMA/33
Apr  7 14:42:05 saturn kernel: [231950.635617] ata3: EH complete
Apr  7 14:42:05 saturn kernel: [231950.654531] ata3.00: exception Emask 0x50 SAct 0x1 SErr 0x90a00 action 0xe frozen
Apr  7 14:42:05 saturn kernel: [231950.654535] ata3.00: irq_stat 0x01400000, PHY RDY changed
Apr  7 14:42:05 saturn kernel: [231950.654538] ata3: SError: { Persist HostInt PHYRdyChg 10B8B }
Apr  7 14:42:05 saturn kernel: [231950.654541] ata3.00: failed command: READ FPDMA QUEUED
Apr  7 14:42:05 saturn kernel: [231950.654546] ata3.00: cmd 60/80:00:f0:21:3b/00:00:1c:00:00/40 tag 0 ncq 65536 in
Apr  7 14:42:05 saturn kernel: [231950.654547]          res 40/00:00:f0:21:3b/00:00:1c:00:00/40 Emask 0x50 (ATA bus error)
Apr  7 14:42:05 saturn kernel: [231950.654550] ata3.00: status: { DRDY }
Apr  7 14:42:05 saturn kernel: [231950.654554] ata3: hard resetting link
Apr  7 14:42:12 saturn kernel: [231957.654285] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr  7 14:42:12 saturn kernel: [231957.666115] ata3.00: configured for UDMA/33
Apr  7 14:42:12 saturn kernel: [231957.666123] ata3: EH complete
Apr  7 14:42:12 saturn kernel: [231957.756013] ata3.00: exception Emask 0x50 SAct 0x1 SErr 0x90a00 action 0xe frozen
Apr  7 14:42:12 saturn kernel: [231957.756016] ata3.00: irq_stat 0x01400000, PHY RDY changed
Apr  7 14:42:12 saturn kernel: [231957.756020] ata3: SError: { Persist HostInt PHYRdyChg 10B8B }
Apr  7 14:42:12 saturn kernel: [231957.756023] ata3.00: failed command: READ FPDMA QUEUED
Apr  7 14:42:12 saturn kernel: [231957.756028] ata3.00: cmd 60/80:00:f0:24:3b/00:00:1c:00:00/40 tag 0 ncq 65536 in
Apr  7 14:42:12 saturn kernel: [231957.756029]          res 40/00:00:f0:24:3b/00:00:1c:00:00/40 Emask 0x50 (ATA bus error)
Apr  7 14:42:12 saturn kernel: [231957.756032] ata3.00: status: { DRDY }
Apr  7 14:42:12 saturn kernel: [231957.756037] ata3: hard resetting link
Apr  7 14:42:16 saturn kernel: [231961.389026] ata3: softreset failed (device not ready)
Apr  7 14:42:16 saturn kernel: [231961.389032] ata3: applying SB600 PMP SRST workaround and retrying
Apr  7 14:42:16 saturn kernel: [231961.544030] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr  7 14:42:16 saturn kernel: [231961.546323] ata3.00: configured for UDMA/33
Apr  7 14:42:16 saturn kernel: [231961.546331] ata3: EH complete

--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!


--- On Fri, 8/4/11, Gavin Flower <gavinflower@yahoo.com> wrote:

> From: Gavin Flower <gavinflower@yahoo.com>
> Subject: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> To: neilb@suse.de
> Cc: linux-raid@vger.kernel.org
> Date: Friday, 8 April, 2011, 13:32
[...]
> My original email may have been eaten: as it did not appear
> on the list, nor did I get an error message back.  So
> perhaps there was a problem with the attached files.
> 
> I will resend the attachments one at a time in separate
> emails.
[...]

* Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
@ 2011-04-08  1:34 Gavin Flower
  0 siblings, 0 replies; 28+ messages in thread
From: Gavin Flower @ 2011-04-08  1:34 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 446 bytes --]

my notes: raid-notes-20110407a.txt
--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!


--- On Fri, 8/4/11, Gavin Flower <gavinflower@yahoo.com> wrote:

> From: Gavin Flower <gavinflower@yahoo.com>
> Subject: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> To: neilb@suse.de
> Cc: linux-raid@vger.kernel.org
> Date: Friday, 8 April, 2011, 13:32
[...]

[-- Attachment #2: raid-notes-20110407a.txt --]
[-- Type: text/plain, Size: 22633 bytes --]

Note that the check on md1 took almost 2 hours!
# grep md1 /var/log/messages 
Apr  4 08:25:38 saturn kernel: [    3.203058] md: md1 stopped. 
Apr  4 08:25:38 saturn kernel: [    3.221821] md/raid:md1: device sda2 operational as raid disk 0 
Apr  4 08:25:38 saturn kernel: [    3.223099] md/raid:md1: device sdc2 operational as raid disk 4 
Apr  4 08:25:38 saturn kernel: [    3.224364] md/raid:md1: device sdd2 operational as raid disk 3 
Apr  4 08:25:38 saturn kernel: [    3.225589] md/raid:md1: device sde2 operational as raid disk 2 
Apr  4 08:25:38 saturn kernel: [    3.226806] md/raid:md1: device sdb2 operational as raid disk 1 
Apr  4 08:25:38 saturn kernel: [    3.229256] md/raid:md1: allocated 5334kB 
Apr  4 08:25:38 saturn kernel: [    3.230500] md/raid:md1: raid level 6 active with 5 out of 5 devices, algorithm 2 
Apr  4 08:25:38 saturn kernel: [    3.232503] md1: detected capacity change from 0 to 314571227136 
Apr  4 08:25:38 saturn kernel: [    3.234559] dracut: mdadm: /dev/md1 has been started with 5 drives. 
Apr  4 08:25:38 saturn kernel: [    3.236257] md1: detected capacity change from 0 to 314571227136 
Apr  4 08:25:38 saturn kernel: [    3.237425]  md1: unknown partition table 
Apr  4 08:25:38 saturn kernel: [    9.892068] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: (null) 
Apr  5 07:05:28 saturn kernel: [65356.926079] Modules linked in: tcp_lp powernow_k8 freq_table mperf fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_TCPMSS ipt_LOG xt_limit bridge stp llc rmd160 crypto_null camellia lzo lzo_compress cast6 cast5 deflate zlib_deflate cts ctr gcm ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key bluetooth rfkill nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm usblp r8169 edac_core atl1e uvcvideo mii snd_hda_codec_atihdmi edac_mce_amd shpchp videodev v4l2_compat_ioctl32 asus_atk0110 serio_raw snd_hda_codec_via snd_usb_audio snd_usbmidi_lib joydev snd_hda_intel i2c_piix4 k10temp snd_hda_codec snd_hw 
Apr  7 07:54:01 saturn kernel: [207546.188800] md: data-check of RAID array md1 
Apr  7 07:54:01 saturn kernel: [207546.188868] md: delaying data-check of md0 until md1 has finished (they share one or more physical units) 
Apr  7 07:54:01 saturn kernel: [207546.190517] md: delaying data-check of md2 until md1 has finished (they share one or more physical units) 
Apr  7 07:54:01 saturn kernel: [207546.190523] md: delaying data-check of md0 until md1 has finished (they share one or more physical units) 
Apr  7 08:42:08 saturn kernel: [210414.109856] md/raid:md1: read error corrected (8 sectors at 17195800 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109869] md/raid:md1: read error corrected (8 sectors at 17195808 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109872] md/raid:md1: read error corrected (8 sectors at 17195816 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109875] md/raid:md1: read error corrected (8 sectors at 17195824 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109877] md/raid:md1: read error corrected (8 sectors at 17195832 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109880] md/raid:md1: read error corrected (8 sectors at 17195840 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109883] md/raid:md1: read error corrected (8 sectors at 17195848 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109891] md/raid:md1: read error corrected (8 sectors at 17195856 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109894] md/raid:md1: read error corrected (8 sectors at 17195864 on sdc2) 
Apr  7 08:42:08 saturn kernel: [210414.109897] md/raid:md1: read error corrected (8 sectors at 17195872 on sdc2) 
Apr  7 08:54:39 saturn kernel: [211161.824066] md/raid:md1: read error corrected (8 sectors at 137014528 on sdc2) 
Apr  7 09:51:47 saturn kernel: [214581.140560] md: md1: data-check done. 
# 
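
The "almost 2 hours" figure can be checked from the syslog wall-clock times of the "data-check of RAID array md1" and "data-check done" lines above. A small sketch, assuming both timestamps fall on the same day (elapsed_hms is an illustrative name only):

```shell
# Elapsed time between two HH:MM:SS timestamps on the same day.
# elapsed_hms is a hypothetical helper, not part of any package.
elapsed_hms() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        d = (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
        printf "%d:%02d:%02d\n", int(d / 3600), int((d % 3600) / 60), d % 60
    }'
}

# Start and end times of the md1 data-check, taken from the log above.
elapsed_hms 07:54:01 09:51:47   # prints 1:57:46
```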

# date ; cat /proc/mdstat 
Thu Apr  7 10:31:24 NZST 2011 
Personalities : [raid6] [raid5] [raid4] 
md2 : active raid6 sda4[0] sdc4[6] sdd4[3] sdb4[5] sde4[1] 
      1114745856 blocks super 1.1 level 6, 512k chunk, algorithm 2 [5/5] [UUUUU] 
      [==========>..........]  check = 54.1% (201068416/371581952) finish=32.6min speed=87129K/sec 
      bitmap: 2/3 pages [8KB], 65536KB chunk 

md1 : active raid6 sda2[0] sdc2[4] sdd2[3] sde2[2] sdb2[1] 
      307198464 blocks level 6, 512k chunk, algorithm 2 [5/5] [UUUUU] 
      
md0 : active raid6 sda3[0] sdb3[4] sdd3[3] sdc3[2] sde3[1] 
      10751808 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU] 
      
unused devices: <none> 
# 
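
The finish estimate in the md2 line can be reproduced from the other figures on that line: remaining blocks divided by the reported speed. A sketch, assuming the block counts and the speed are both in units of 1K (eta_minutes is an illustrative name):

```shell
# Remaining check time in minutes: (total - current) / speed / 60.
# Arguments: current position, total size, speed in K/sec, as shown in /proc/mdstat.
eta_minutes() {
    awk -v cur="$1" -v total="$2" -v speed="$3" \
        'BEGIN { printf "%.1f\n", (total - cur) / speed / 60 }'
}

# Figures from the md2 check line above.
eta_minutes 201068416 371581952 87129   # prints 32.6
```

The result agrees with the finish=32.6min that the kernel reported.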

From root@localhost6.localdomain6  Thu Apr  7 11:12:41 2011 
Return-Path: <root@localhost6.localdomain6> 
Date: Thu, 7 Apr 2011 11:12:40 +1200 
From: Anacron <root@localhost6.localdomain6> 
To: root@localhost6.localdomain6 
Content-Type: text/plain; charset="ANSI_X3.4-1968" 
Subject: Anacron job 'cron.weekly' on saturn 
Status: R 

/etc/cron.weekly/99-raid-check: 

WARNING: mismatch_cnt is not 0 on /dev/md2 
WARNING: mismatch_cnt is not 0 on /dev/md0 

# cat /sys/block/md0/md/mismatch_cnt
128 
# cat /sys/block/md1/md/mismatch_cnt 
0 
# cat /sys/block/md2/md/mismatch_cnt 
28904 
# 
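
The three cat commands above can be folded into one loop over all md arrays. A sketch only: the function name and its optional sysfs-root argument are assumptions made for illustration, while the mismatch_cnt paths are the ones used above.

```shell
# Print "<array> <mismatch_cnt>" for every md array under a sysfs root.
# The root defaults to /sys; passing another directory is useful for testing.
show_mismatch_counts() {
    root=${1:-/sys}
    for f in "$root"/block/md*/md/mismatch_cnt; do
        [ -r "$f" ] || continue      # glob matched nothing, or file unreadable
        dev=${f#"$root"/block/}      # strip the leading path
        dev=${dev%%/*}               # keep just md0, md1, ...
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

show_mismatch_counts
```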


# e2fsck -f -n /dev/md2 
e2fsck 1.41.12 (17-May-2010) 
Warning!  /dev/md2 is mounted. 
Warning: skipping journal recovery because doing a read-only filesystem check. 
Pass 1: Checking inodes, blocks, and sizes 
Inodes that were part of a corrupted orphan linked list found.  Fix? no 

Inode 20186332 was part of the orphaned inode list.  IGNORED. 
Inode 20317506 was part of the orphaned inode list.  IGNORED. 
Inode 20317552 was part of the orphaned inode list.  IGNORED. 
Inode 20317955 was part of the orphaned inode list.  IGNORED. 
Inode 20447237 was part of the orphaned inode list.  IGNORED. 
Inode 20447245 was part of the orphaned inode list.  IGNORED. 
Inode 20447287 was part of the orphaned inode list.  IGNORED. 
Inode 20447296 was part of the orphaned inode list.  IGNORED. 
Inode 20447302 was part of the orphaned inode list.  IGNORED. 
Inode 20447311 was part of the orphaned inode list.  IGNORED. 
Inode 20447353 was part of the orphaned inode list.  IGNORED. 
Inode 20447360 was part of the orphaned inode list.  IGNORED. 
Inode 21500787 was part of the orphaned inode list.  IGNORED. 
Inode 21628913 was part of the orphaned inode list.  IGNORED. 
Inode 22158808 was part of the orphaned inode list.  IGNORED. 
Inode 22158811 was part of the orphaned inode list.  IGNORED. 
Inode 22158840 was part of the orphaned inode list.  IGNORED. 
Inode 22158842 was part of the orphaned inode list.  IGNORED. 
Inode 22158846 was part of the orphaned inode list.  IGNORED. 
Inode 25952949 was part of the orphaned inode list.  IGNORED. 
Inode 25953424 was part of the orphaned inode list.  IGNORED. 
Inode 25954542 was part of the orphaned inode list.  IGNORED. 
Deleted inode 45088771 has zero dtime.  Fix? no 

Inode 45088772 was part of the orphaned inode list.  IGNORED. 
Inode 45088773 was part of the orphaned inode list.  IGNORED. 
Inode 45088774 was part of the orphaned inode list.  IGNORED. 
Inode 45088775 was part of the orphaned inode list.  IGNORED. 
Inode 45088972 was part of the orphaned inode list.  IGNORED. 
Inode 45089022 was part of the orphaned inode list.  IGNORED. 
Inode 45089035 was part of the orphaned inode list.  IGNORED. 
Inode 45089037 was part of the orphaned inode list.  IGNORED. 
Inode 45089043 was part of the orphaned inode list.  IGNORED. 
Inode 45089044 was part of the orphaned inode list.  IGNORED. 
Inode 45089045 was part of the orphaned inode list.  IGNORED. 
Inode 45089057 was part of the orphaned inode list.  IGNORED. 
Inode 45089060 was part of the orphaned inode list.  IGNORED. 
Inode 45089062 was part of the orphaned inode list.  IGNORED. 
Inode 45089064 was part of the orphaned inode list.  IGNORED. 
Inode 45089067 was part of the orphaned inode list.  IGNORED. 
Inode 45089068 was part of the orphaned inode list.  IGNORED. 
Inode 45089070 was part of the orphaned inode list.  IGNORED. 
Inode 45089137 was part of the orphaned inode list.  IGNORED. 
Inode 45089150 was part of the orphaned inode list.  IGNORED. 
Inode 45089156 was part of the orphaned inode list.  IGNORED. 
Inode 45089190 was part of the orphaned inode list.  IGNORED. 
Inode 45089204 was part of the orphaned inode list.  IGNORED. 
Inode 45089205 was part of the orphaned inode list.  IGNORED. 
Inode 45089207 was part of the orphaned inode list.  IGNORED. 
Inode 45089213 was part of the orphaned inode list.  IGNORED. 
Inode 45089218 was part of the orphaned inode list.  IGNORED. 
Inode 45089238 was part of the orphaned inode list.  IGNORED. 
Inode 45089249 was part of the orphaned inode list.  IGNORED. 
Inode 45089257 was part of the orphaned inode list.  IGNORED. 
Inode 45089264 was part of the orphaned inode list.  IGNORED. 
Inode 45089282 was part of the orphaned inode list.  IGNORED. 
Inode 45089284 was part of the orphaned inode list.  IGNORED. 
Inode 45089286 was part of the orphaned inode list.  IGNORED. 
Inode 45089291 was part of the orphaned inode list.  IGNORED. 
Inode 45089297 was part of the orphaned inode list.  IGNORED. 
Inode 45089298 was part of the orphaned inode list.  IGNORED. 
Inode 45089305 was part of the orphaned inode list.  IGNORED. 
Inode 45089307 was part of the orphaned inode list.  IGNORED. 
Inode 45089319 was part of the orphaned inode list.  IGNORED. 
Inode 45089320 was part of the orphaned inode list.  IGNORED. 
Inode 63705919 was part of the orphaned inode list.  IGNORED. 
Inode 65938687 was part of the orphaned inode list.  IGNORED. 
Inode 65939256 was part of the orphaned inode list.  IGNORED. 
Inode 65939355 was part of the orphaned inode list.  IGNORED. 
Inode 65939368 was part of the orphaned inode list.  IGNORED. 
Inode 66191686 was part of the orphaned inode list.  IGNORED. 
Inode 66191689 was part of the orphaned inode list.  IGNORED. 
Inode 66191738 was part of the orphaned inode list.  IGNORED. 
Inode 66191741 was part of the orphaned inode list.  IGNORED. 
Inode 66191747 was part of the orphaned inode list.  IGNORED. 
Inode 66197970 was part of the orphaned inode list.  IGNORED. 
Pass 2: Checking directory structure 
Pass 3: Checking directory connectivity 
Pass 4: Checking reference counts 
Pass 5: Checking group summary information 
Block bitmap differences:  -(2393344--2393372) -(2393792--2393809) -(2470272--2470336) -(2502016--2502080) +(7831552--7841252) +(7841792--7864319) -(79795252--79795253) -(79823488--79823615) -(79824000--79824123) -(79824640--79825142) -79826344 -79898014 -79923101 -(79923154--79923165) -(80123296--80123311) -(80152298--80152301) -80291729 -80291732 -80291759 -80847380 -(80847438--80847441) -80847502 -80847555 -80847736 -80874645 -80874664 -80875873 -(80875914--80875920) -80875927 -80875960 -(80876002--80876004) -80876048 -(80876052--80876056) -(80876600--80876601) -(80876639--80876641) -81330516 -(81334527--81334528) -81334535 -(81821915--81821947) -(81822170--81822204) -(81894559--81894562) -81923317 -81925743 -81925934 -(81925951--81925952) -(81926003--81926004) -(81956735--81957638) -(82971732--82971733) -(82971902--82971903) -(82971917--82971918) -(82971947--82971948) -(82971972--82971991) -85992203 -86516481 -87626360 -88613273 -104083592 -(104083946--104083948) -104083957 -104084073 -104084084 -104084487 -104137397 -104138111 -104236430 -(104236580--104236596) -(104236598--104236610) -(104301814--104301815) -(104301822--104301828) -104343080 -(105686863--105686864) -105686916 -(115903040--115903065) +(115903516--115903541) -134259847 -134284245 -134284593 -(134284674--134284675) -134285473 -(170994896--170994901) -170994959 -170995027 -(180397545--180397547) -(255167322--255167805) -(263756512--263756516) -(263764800--263764807) -(263779568--263779592) -(263782498--263782533) -(264798344--264798348) -(264804016--264804023) -(264804064--264804074) -(264804968--264804973) -(264809216--264809359) 
Fix? no 

Free blocks count wrong for group #239 (539, counted=32768). 
Fix? no 

Free blocks count wrong for group #2446 (23057, counted=23053). 
Fix? no 

Free blocks count wrong (256921638, counted=256646017). 
Fix? no 

Inode bitmap differences:  -20186332 -20317506 -20317552 -20317955 -20447237 -20447245 -20447287 -20447296 -20447302 -20447311 -20447353 -20447360 -21500787 -21628913 -22158808 -22158811 -22158840 -22158842 -22158846 -25952949 -25953424 -25954542 -(45088771--45088775) -45088972 -45089022 -45089035 -45089037 -(45089043--45089045) -45089057 -45089060 -45089062 -45089064 -(45089067--45089068) -45089070 -45089137 -45089150 -45089156 -45089190 -(45089204--45089205) -45089207 -45089213 -45089218 -45089238 -45089249 -45089257 -45089264 -45089282 -45089284 -45089286 -45089291 -(45089297--45089298) -45089305 -45089307 -(45089319--45089320) -63705919 -65938687 -65939256 -65939355 -65939368 -66191686 -66191689 -66191738 -66191741 -66191747 -66197970 
Fix? no 

Directories count wrong for group #2624 (735, counted=734). 
Fix? no 

Directories count wrong for group #2640 (735, counted=734). 
Fix? no 

Directories count wrong for group #2704 (541, counted=540). 
Fix? no 

Free inodes count wrong (68295781, counted=68268234). 
Fix? no 


/dev/md2: ********** WARNING: Filesystem still has errors ********** 

/dev/md2: 1377179/69672960 files (0.4% non-contiguous), 21764826/278686464 blocks 
# 

# e2fsck -f -n /dev/sda4 
e2fsck 1.41.12 (17-May-2010) 
e2fsck: Device or resource busy while trying to open /dev/sda4 
Filesystem mounted or opened exclusively by another program? 

# mdadm --detail /dev/md2 
/dev/md2: 
        Version : 1.1 
  Creation Time : Wed Nov 24 08:27:42 2010 
     Raid Level : raid6 
     Array Size : 1114745856 (1063.10 GiB 1141.50 GB) 
  Used Dev Size : 371581952 (354.37 GiB 380.50 GB) 
   Raid Devices : 5 
  Total Devices : 5 
    Persistence : Superblock is persistent 

  Intent Bitmap : Internal 

    Update Time : Thu Apr  7 12:11:59 2011 
          State : active 
 Active Devices : 5 
Working Devices : 5 
 Failed Devices : 0 
  Spare Devices : 0 

         Layout : left-symmetric 
     Chunk Size : 512K 

           Name : localhost.localdomain:2 
           UUID : a511e656:a742a2f2:f4917939:2d333c7e 
         Events : 38609 

    Number   Major   Minor   RaidDevice State 
       0       8        4        0      active sync   /dev/sda4 
       1       8       68        1      active sync   /dev/sde4 
       5       8       20        2      active sync   /dev/sdb4 
       3       8       52        3      active sync   /dev/sdd4 
       6       8       36        4      active sync   /dev/sdc4 
# 

note absence of /dev/md0 (swap)!!!
# df 
Filesystem           1K-blocks      Used Available Use% Mounted on 
/dev/md2             1097254408  70799328 970717788   7% / 
tmpfs                  4097108       824   4096284   1% /dev/shm 
/dev/sda1              1032088    128772    850888  14% /boot 
/dev/md1             302377920  72501428 214516572  26% /data 
# 

# mdadm -Evs 
ARRAY /dev/md1 level=raid6 num-devices=5 UUID=6f1176ae:a0ad6cac:bfe78010:bc810f04 
   devices=/dev/sde2,/dev/sdc2,/dev/sdd2,/dev/sdb2,/dev/sda2 
ARRAY /dev/md0 level=raid6 num-devices=5 UUID=3b76ac20:8253f696:bfe78010:bc810f04 
   devices=/dev/sde3,/dev/sdc3,/dev/sdd3,/dev/sdb3,/dev/sda3 
ARRAY /dev/md/2 level=raid6 metadata=1.1 num-devices=5 UUID=a511e656:a742a2f2:f4917939:2d333c7e name=localhost.localdomain:2 
   devices=/dev/sde4,/dev/sdc4,/dev/sdd4,/dev/sdb4,/dev/sda4 
# 

# fdisk -l 

Disk /dev/sda: 500.1 GB, 500107862016 bytes 
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 512 bytes / 512 bytes 
Disk identifier: 0x0000ca3a 

   Device Boot      Start         End      Blocks   Id  System 
/dev/sda1   *          63     2097214     1048576   83  Linux 
/dev/sda2         2097215   206897214   102400000   fd  Linux raid autodetect 
/dev/sda3       206897215   214065214     3584000   fd  Linux raid autodetect 
/dev/sda4       214066125   957233024   371583450   fd  Linux raid autodetect 

Disk /dev/sdb: 500.1 GB, 500107862016 bytes 
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 512 bytes / 512 bytes 
Disk identifier: 0x000566c1 

   Device Boot      Start         End      Blocks   Id  System 
/dev/sdb1              63     2097214     1048576   83  Linux 
/dev/sdb2         2097215   206897214   102400000   fd  Linux raid autodetect 
/dev/sdb3       206897215   214065214     3584000   fd  Linux raid autodetect 
/dev/sdb4       214066125   957233024   371583450   fd  Linux raid autodetect 

Disk /dev/sdd: 500.1 GB, 500107862016 bytes 
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 512 bytes / 512 bytes 
Disk identifier: 0x0000af79 

   Device Boot      Start         End      Blocks   Id  System 
/dev/sdd1   *          63     2097214     1048576   83  Linux 
/dev/sdd2         2097215   206897214   102400000   fd  Linux raid autodetect 
/dev/sdd3       206897215   214065214     3584000   fd  Linux raid autodetect 
/dev/sdd4       214066125   957233024   371583450   fd  Linux raid autodetect 

Disk /dev/sdc: 500.1 GB, 500107862016 bytes 
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 512 bytes / 512 bytes 
Disk identifier: 0x00081ccd 

   Device Boot      Start         End      Blocks   Id  System 
/dev/sdc1   *          63     2097214     1048576   83  Linux 
/dev/sdc2         2097215   206897214   102400000   fd  Linux raid autodetect 
/dev/sdc3       206897215   214065214     3584000   fd  Linux raid autodetect 
/dev/sdc4       214066125   957233024   371583450   fd  Linux raid autodetect 

Disk /dev/sde: 500.1 GB, 500107862016 bytes 
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 512 bytes / 512 bytes 
Disk identifier: 0x00081ccd 

   Device Boot      Start         End      Blocks   Id  System 
/dev/sde1   *          63     2097214     1048576   83  Linux 
/dev/sde2         2097215   206897214   102400000   fd  Linux raid autodetect 
/dev/sde3       206897215   214065214     3584000   fd  Linux raid autodetect 
/dev/sde4       214066125   957233024   371583450   fd  Linux raid autodetect 

Disk /dev/md0: 11.0 GB, 11009851392 bytes 
2 heads, 4 sectors/track, 2687952 cylinders, total 21503616 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 65536 bytes / 196608 bytes 
Disk identifier: 0x00000000 

Disk /dev/md0 doesn't contain a valid partition table 

Disk /dev/md1: 314.6 GB, 314571227136 bytes 
2 heads, 4 sectors/track, 76799616 cylinders, total 614396928 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 524288 bytes / 1572864 bytes 
Disk identifier: 0x00000000 

Disk /dev/md1 doesn't contain a valid partition table 

Disk /dev/md2: 1141.5 GB, 1141499756544 bytes 
2 heads, 4 sectors/track, 278686464 cylinders, total 2229491712 sectors 
Units = sectors of 1 * 512 = 512 bytes 
Sector size (logical/physical): 512 bytes / 512 bytes 
I/O size (minimum/optimal): 524288 bytes / 1572864 bytes 
Disk identifier: 0x00000000 

Disk /dev/md2 doesn't contain a valid partition table 
# 

# dmraid -b 
/dev/sde:    976773168 total, "6VM2FE64" 
/dev/sdc:    976773168 total, "5VMJ3RJE" 
/dev/sdd:    976773168 total, "6VM2AM98" 
/dev/sdb:    976773168 total, "6VM2H5W7" 
/dev/sda:    976773168 total, "5VM1VNM9" 
# 

I ran badblocks on each drive concurrently.  Note that the run for sda took about an hour longer than the others, but it was sdc that reported a bad block.
# badblocks -s -v /dev/sda 
Checking blocks 0 to 488386583 
Checking for bad blocks (read-only test): done                                
Pass completed, 0 bad blocks found. 
# badblocks -s -v /dev/sdb 
Checking blocks 0 to 488386583 
Checking for bad blocks (read-only test): done                                
Pass completed, 0 bad blocks found. 
# badblocks -s -v /dev/sdc 
Checking blocks 0 to 488386583 
Checking for bad blocks (read-only test): 236817152one, 58:43 elapsed 
done                                
Pass completed, 1 bad blocks found. 
# badblocks -s -v /dev/sdd 
Checking blocks 0 to 488386583 
Checking for bad blocks (read-only test): done                                
Pass completed, 0 bad blocks found. 
# badblocks -s -v /dev/sde 
Checking blocks 0 to 488386583 
Checking for bad blocks (read-only test): done                                
Pass completed, 0 bad blocks found. 
#
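
To relate the badblocks result to the partition layout, the block number can be converted to a 512-byte sector. This sketch makes two assumptions: that the number 236817152 in the sdc output above is the bad block, and that badblocks used its default 1024-byte block size (no -b option was given, and 488386584 blocks of 1024 bytes match the 500107862016-byte disk size). The helper name is illustrative.

```shell
# Convert a badblocks block number to a 512-byte sector number.
# Second argument is the badblocks block size in bytes (default 1024).
block_to_sector() {
    awk -v blk="$1" -v bs="${2:-1024}" 'BEGIN { printf "%d\n", blk * bs / 512 }'
}

sector=$(block_to_sector 236817152)
echo "disk sector: $sector"

# /dev/sdc4 starts at sector 214066125 in the fdisk listing above.
echo "offset into sdc4: $((sector - 214066125)) sectors"
```

Under those assumptions the block falls inside sdc4, the partition that belongs to md2.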
Selected lines from the smartctl output:
# smartctl -a /dev/sda 
Model Family:     Seagate Barracuda 7200.12 family 
Device Model:     ST3500418AS 
Serial Number:    5VM1VNM9 
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       17 


# smartctl -a /dev/sdb 
Model Family:     Seagate Barracuda 7200.12 family 
Device Model:     ST3500418AS 
Serial Number:    6VM2H5W7 
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       42 


# smartctl -a /dev/sdc 
Model Family:     Seagate Barracuda 7200.12 family 
Device Model:     ST3500418AS 
Serial Number:    5VMJ3RJE 
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0 


# smartctl -a /dev/sdd 
Model Family:     Seagate Barracuda 7200.12 family 
Device Model:     ST3500418AS 
Serial Number:    6VM2AM98 
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1 


# smartctl -a /dev/sde 
Model Family:     Seagate Barracuda 7200.12 family 
Device Model:     ST3500418AS 
Serial Number:    6VM2FE64 
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       79 
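
To compare the reallocated-sector counts across drives without reading the full smartctl output, the raw value can be pulled from the attribute row. This is only a sketch: the helper name realloc_count is hypothetical, and it assumes the ten-column attribute layout shown in the lines above, where the raw value is the last column.

```shell
#!/bin/sh
# Sketch only: realloc_count is a hypothetical helper name.
# Reads smartctl attribute output on stdin and prints the raw value
# (10th column) of the Reallocated_Sector_Ct row.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Demonstration on the sde attribute line quoted in this message. On a live
# system the input would come from something like:
#   smartctl -A /dev/sde | realloc_count
printf '%s\n' '  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       79' | realloc_count
```

For the five drives listed here the raw values are 17, 42, 0, 1 and 79 (sda to sde), so sdc, the only drive where badblocks found a bad block, is also the only one with no reallocated sectors so far.
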



* RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
@ 2011-04-08  1:32 Gavin Flower
  2011-04-08  9:34 ` NeilBrown
  0 siblings, 1 reply; 28+ messages in thread
From: Gavin Flower @ 2011-04-08  1:32 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

Hi Neil,

My original email may have been eaten: it did not appear on the list, and I did not get an error message back.  Perhaps there was a problem with the attached files.

I will resend the attachments one at a time in separate emails.


Cheers,
Gavin

[begin original]
Hi Neil,

Your help (or anybody else's) would be greatly appreciated, yet again!

This morning, I noticed my system was extremely unresponsive, and that there were clicking sounds coming from one of my 5 hard drives.  There was also excessive disk I/O even for trivial things like bringing up a directory window, and lots of ata3 errors were being reported to the system log.  These symptoms occurred mostly during a RAID check process.

Somewhere along the way, I seemed to have lost my swap partition!

So I did some extensive investigation, which took most of the day.  My notes were created in OpenDocument format using LibreOffice, but I have converted them to txt format for inclusion; I can supply the .odt file if requested.

I have included 2 files:
               my notes: raid-notes-20110407a.txt
   selected log entries: messages-gcf-20110407-ATA

If there are some additional diagnostics that might prove useful, please let me know.


Cheers,
Gavin
[end original]
--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!

* Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
@ 2011-04-07 21:58 Gavin Flower
  0 siblings, 0 replies; 28+ messages in thread
From: Gavin Flower @ 2011-04-07 21:58 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

Hi Neil,

After further checking, I found there was no problem with the swap partition.


Cheers,
Gavin
--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!


--- On Thu, 7/4/11, Gavin Flower <gavinflower@yahoo.com> wrote:

> From: Gavin Flower <gavinflower@yahoo.com>
> Subject: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> To: neilb@suse.de
> Cc: linux-raid@vger.kernel.org
> Date: Thursday, 7 April, 2011, 18:07
[...]
> Somewhere along the way, I seemed to have lost my swap
> partition!
[...]


Thread overview: 28+ messages
2011-04-14 21:14 RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive Gavin Flower
2011-04-14 21:19 ` Mathias Burén
2011-04-14 23:15   ` John Robinson
  -- strict thread matches above, loose matches on Subject: below --
2011-04-13 22:24 Gavin Flower
2011-04-13 22:28 ` Mathias Burén
2011-04-14  0:15   ` Gavin Flower
2011-04-14  4:08     ` Roman Mamedov
2011-04-14 13:16     ` Phil Turmel
2011-04-14 21:12       ` Gavin Flower
2011-04-14 22:23         ` Phil Turmel
2011-04-28 20:03           ` Gavin Flower
2011-04-28 20:11             ` Roman Mamedov
2011-04-28 22:11               ` Phil Turmel
2011-04-28 22:40                 ` Phil Turmel
2011-04-13 23:09 ` NeilBrown
2011-04-08  2:01 Gavin Flower
2011-04-08  1:34 Gavin Flower
2011-04-08  1:32 Gavin Flower
2011-04-08  9:34 ` NeilBrown
2011-04-08  9:59   ` Gavin Flower
2011-04-08 11:50     ` NeilBrown
2011-04-11  6:50       ` Gavin Flower
2011-04-12 21:30       ` Gavin Flower
2011-04-13 10:57         ` John Robinson
2011-04-13 11:13           ` NeilBrown
2011-04-13 11:58             ` John Robinson
2011-04-13 20:30               ` Gavin Flower
2011-04-07 21:58 Gavin Flower
