Checking the sanity of SATA disks

* Checking the sanity of SATA disks
@ 2005-10-04 14:29 Andy Smith
  2005-10-04 14:42 ` Molle Bestefich
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Andy Smith @ 2005-10-04 14:29 UTC
  To: linux-raid

Hello,

I have a home fileserver with 4 SATA disks in a RAID 5.  As I am
sure you are aware, SATA devices in Linux currently cannot be
queried for SMART info, so I can't do SMART health checks of these
devices.

Also there is still the tendency for Linux Software RAID to kick
devices out of the array as soon as there is any error on them.

I really don't want to be in the situation where a drive dies, I fit
a new one, and during the resync another device is kicked out
because of spontaneously finding a bad sector.

I tried simply doing a

        dd if=/dev/sd[abcd] of=/dev/null

To check each disk in a very unsubtle fashion, but it drives the
load average on the machine way way up (like to 20+) and makes it
very unresponsive (wait several minutes for a keypress to be
acknowledged), even if I run it under nice -n 19.

I don't notice any performance problems on this server during normal
day to day use, and while it's not particularly beefy it is an AMD
Sempron 1.8GHz so I am surprised that simply reading from one disk
causes these performance issues.

I know this isn't right, so has anyone got any advice in the way of
tracking down which part of the system is at fault, possibly
off-list if it's too offtopic?

Thanks,
Andy


* Re: Checking the sanity of SATA disks
  2005-10-04 14:29 Checking the sanity of SATA disks Andy Smith
@ 2005-10-04 14:42 ` Molle Bestefich
  2005-10-04 14:56   ` Andy Smith
  2005-10-04 15:59 ` Patrik Jonsson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Molle Bestefich @ 2005-10-04 14:42 UTC
  To: andy; +Cc: linux-raid

Andy Smith wrote:
> I tried simply doing a
>
>         dd if=/dev/sd[abcd] of=/dev/null
>
> To check each disk in a very unsubtle fashion, but it drives the
> load average on the machine way way up (like to 20+)

Checked that the disks are using DMA transfers (use 'hdparm')?


* Re: Checking the sanity of SATA disks
  2005-10-04 14:42 ` Molle Bestefich
@ 2005-10-04 14:56   ` Andy Smith
  2005-10-04 15:02     ` Molle Bestefich
  0 siblings, 1 reply; 9+ messages in thread
From: Andy Smith @ 2005-10-04 14:56 UTC
  To: Molle Bestefich; +Cc: linux-raid

On Tue, Oct 04, 2005 at 02:42:58PM +0000, Molle Bestefich wrote:
> Andy Smith wrote:
> > I tried simply doing a
> >
> >         dd if=/dev/sd[abcd] of=/dev/null
> >
> > To check each disk in a very unsubtle fashion, but it drives the
> > load average on the machine way way up (like to 20+)
> 
> Checked that the disks are using DMA transfers (use 'hdparm')?

$ sudo hdparm /dev/sda

/dev/sda:
 HDIO_GET_MULTCOUNT failed: Inappropriate ioctl for device
 IO_support   =  0 (default 16-bit)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 14946/255/63, sectors = 122942324736, start = 0
$ sudo hdparm -d /dev/sda

/dev/sda:

Well that doesn't look right.

From dmesg:

ata1: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xDB00 irq 11
ata2: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xDB08 irq 11
ata1: dev 0 cfg 49:2f00 82:7c6b 83:7b09 84:4003 85:7c69 86:3a01 87:4003 88:407f
ata1: dev 0 ATA, max UDMA/133, 240121728 sectors:
ata1: dev 0 configured for UDMA/133
ata2: dev 0 cfg 49:2f00 82:7c6b 83:7b09 84:4003 85:7c69 86:3a01 87:4003 88:407f
ata2: dev 0 ATA, max UDMA/133, 240121728 sectors:
ata2: dev 0 configured for UDMA/133
  Vendor: ATA       Model: Maxtor 6Y120M0    Rev: YAR5
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0,  type 0
  Vendor: ATA       Model: Maxtor 6Y120M0    Rev: YAR5
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0,  type 0
ata3: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xE100 irq 10
ata4: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xE108 irq 10
ata3: dev 0 cfg 49:2f00 82:7c6b 83:7b09 84:4003 85:7c69 86:3a01 87:4003 88:407f
ata3: dev 0 ATA, max UDMA/133, 240121728 sectors:
ata3: dev 0 configured for UDMA/133
ata4: dev 0 cfg 49:2f00 82:7c6b 83:7b09 84:4003 85:7c69 86:3a01 87:4003 88:407f
ata4: dev 0 ATA, max UDMA/133, 240121728 sectors:
ata4: dev 0 configured for UDMA/133
  Vendor: ATA       Model: Maxtor 6Y120M0    Rev: YAR5
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdc: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1 sdc2 sdc3
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
Attached scsi generic sg2 at scsi2, channel 0, id 0, lun 0,  type 0
  Vendor: ATA       Model: Maxtor 6Y120M0    Rev: YAR5
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdd: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 240121728 512-byte hdwr sectors (122942 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1 sdd2 sdd3
Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
Attached scsi generic sg3 at scsi3, channel 0, id 0, lun 0,  type 0



* Re: Checking the sanity of SATA disks
  2005-10-04 14:56   ` Andy Smith
@ 2005-10-04 15:02     ` Molle Bestefich
  0 siblings, 0 replies; 9+ messages in thread
From: Molle Bestefich @ 2005-10-04 15:02 UTC
  To: linux-raid

Andy Smith wrote:
> > Checked that the disks are using DMA transfers (use 'hdparm')?
>
> /dev/sda:
>  HDIO_GET_MULTCOUNT failed: Inappropriate ioctl for device

Oh, SATA; right.
You need a patch to make hdparm play well with SATA drives.
Ehrm, but I think that all SATAs run in DMA mode anyway, so never mind.

Sorry for the noise.

(Btw, what's with the "putting me in the Mail-Followup-To field" thingy?)


* Re: Checking the sanity of SATA disks
  2005-10-04 14:29 Checking the sanity of SATA disks Andy Smith
  2005-10-04 14:42 ` Molle Bestefich
@ 2005-10-04 15:59 ` Patrik Jonsson
  2005-10-04 17:32 ` Dan Stromberg
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Patrik Jonsson @ 2005-10-04 15:59 UTC
  To: Andy Smith; +Cc: linux-raid

Hi,

There is a patch that can be applied to libata that enables SMART for 
SATA drives. It's not 100% stable, so running smartd is not recommended, 
but I've been running nightly SMART selftests and 5-minute temperature 
logging on our 8-drive array since June and have never run into a problem
(though it's not a very heavily used machine). ymmv... See e.g. 
http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/2304.html

/Patrik

Andy Smith wrote:

>Hello,
>
>I have a home fileserver with 4 SATA disks in a RAID 5.  As I am
>sure you are aware, SATA devices in Linux currently cannot be
>queried for SMART info, so I can't do SMART health checks of these
>devices.
>
>Also there is still the tendency for Linux Software RAID to kick
>devices out of the array as soon as there is any error on them.
>
>I really don't want to be in the situation where a drive dies, I fit
>a new one, and during the resync another device is kicked out
>because of spontaneously finding a bad sector.
>
>I tried simply doing a
>
>        dd if=/dev/sd[abcd] of=/dev/null
>
>To check each disk in a very unsubtle fashion, but it drives the
>load average on the machine way way up (like to 20+) and makes it
>very unresponsive (wait several minutes for a keypress to be
>acknowledged), even if I run it under nice -n 19.
>
>I don't notice any performance problems on this server during normal
>day to day use, and while it's not particularly beefy it is an AMD
>Sempron 1.8GHz so I am surprised that simply reading from one disk
>causes these performance issues.
>
>I know this isn't right, so has anyone got any advice in the way of
>tracking down which part of the system is at fault, possibly
>off-list if it's too offtopic?
>
>Thanks,
>Andy


* Re: Checking the sanity of SATA disks
  2005-10-04 14:29 Checking the sanity of SATA disks Andy Smith
  2005-10-04 14:42 ` Molle Bestefich
  2005-10-04 15:59 ` Patrik Jonsson
@ 2005-10-04 17:32 ` Dan Stromberg
  2005-10-05 11:08   ` Andy Smith
  2005-10-09 15:21 ` Mark Hahn
  2005-11-07 19:22 ` Bill Davidsen
  4 siblings, 1 reply; 9+ messages in thread
From: Dan Stromberg @ 2005-10-04 17:32 UTC
  To: Andy Smith; +Cc: linux-raid, strombrg


You may be better off using smart, but if you want to go through all the
blocks in a device without making your load skyrocket, I have two
solutions for you:

reblock (check the "-s" option):
http://dcs.nac.uci.edu/~strombrg/reblock.html

slowdown in combination with dd:
http://dcs.nac.uci.edu/~strombrg/slowdown/

HTH :)
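
The general idea of walking every block while leaving the disk idle
between reads can be sketched as below. This is only an illustration
and is not how reblock or slowdown work internally; the function name
throttled_scan, the 16 MiB chunk size and the one-second pause are all
assumptions. It relies on dd and, for block devices, on blockdev from
util-linux.

```shell
#!/bin/sh
# throttled_scan DEVICE [CHUNK_BYTES] [PAUSE_SECONDS]
# Hypothetical helper: read DEVICE one chunk at a time, sleeping between
# chunks so other I/O gets a turn. Prints the number of chunks read.
throttled_scan() {
    dev=$1
    chunk=${2:-16777216}   # bytes per read (assumed default: 16 MiB)
    pause=${3:-1}          # seconds to sleep after each chunk (assumed)

    [ -r "$dev" ] || { echo "cannot read $dev" >&2; return 1; }

    # Total size in bytes: block devices need blockdev, files use wc.
    if [ -b "$dev" ]; then
        size=$(blockdev --getsize64 "$dev")
    else
        size=$(($(wc -c < "$dev")))
    fi

    offset=0               # position, counted in chunks
    while [ "$((offset * chunk))" -lt "$size" ]; do
        # skip= is counted in units of bs, so this reads chunk number $offset.
        if ! dd if="$dev" of=/dev/null bs="$chunk" skip="$offset" count=1 2>/dev/null; then
            echo "read error in chunk $offset of $dev" >&2
            return 1
        fi
        offset=$((offset + 1))
        sleep "$pause"
    done
    echo "$offset"
}

# Example (as root): throttled_scan /dev/sda
```

With these defaults the scan is capped at roughly 16 MiB per second, so
a 120 GB disk takes a couple of hours; shrink the pause or grow the
chunk if that is too slow.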

On Tue, 2005-10-04 at 14:29 +0000, Andy Smith wrote:
> Hello,
> 
> I have a home fileserver with 4 SATA disks in a RAID 5.  As I am
> sure you are aware, SATA devices in Linux currently cannot be
> queried for SMART info, so I can't do SMART health checks of these
> devices.
> 
> Also there is still the tendency for Linux Software RAID to kick
> devices out of the array as soon as there is any error on them.
> 
> I really don't want to be in the situation where a drive dies, I fit
> a new one, and during the resync another device is kicked out
> because of spontaneously finding a bad sector.
> 
> I tried simply doing a
> 
>         dd if=/dev/sd[abcd] of=/dev/null
> 
> To check each disk in a very unsubtle fashion, but it drives the
> load average on the machine way way up (like to 20+) and makes it
> very unresponsive (wait several minutes for a keypress to be
> acknowledged), even if I run it under nice -n 19.
> 
> I don't notice any performance problems on this server during normal
> day to day use, and while it's not particularly beefy it is an AMD
> Sempron 1.8GHz so I am surprised that simply reading from one disk
> causes these performance issues.
> 
> I know this isn't right, so has anyone got any advice in the way of
> tracking down which part of the system is at fault, possibly
> off-list if it's too offtopic?
> 
> Thanks,
> Andy



* Re: Checking the sanity of SATA disks
  2005-10-04 17:32 ` Dan Stromberg
@ 2005-10-05 11:08   ` Andy Smith
  0 siblings, 0 replies; 9+ messages in thread
From: Andy Smith @ 2005-10-05 11:08 UTC
  To: linux-raid

On Tue, Oct 04, 2005 at 10:32:32AM -0700, Dan Stromberg wrote:
> You may be better off using smart, but if you want to go through all the
> blocks in a device without making your load skyrocket, I have two
> solutions for you:

Thanks Dan.  As it happens it rapidly became clear there was some
major problem with this machine as even normal tasks sent the load
through the roof once they did any significant IO.  I rebooted it
and now the drives all report 4 times the throughput from "hdparm
-t" as they did before.  Something I need to investigate.

Your "slowdown" tool looks useful though and I will no doubt use it
in tandem with dd now, as I don't particularly need those dds to be
fast.

Thanks to others who took the time to reply also.


* Re: Checking the sanity of SATA disks
  2005-10-04 14:29 Checking the sanity of SATA disks Andy Smith
                   ` (2 preceding siblings ...)
  2005-10-04 17:32 ` Dan Stromberg
@ 2005-10-09 15:21 ` Mark Hahn
  2005-11-07 19:22 ` Bill Davidsen
  4 siblings, 0 replies; 9+ messages in thread
From: Mark Hahn @ 2005-10-09 15:21 UTC
  To: Andy Smith; +Cc: linux-raid

>         dd if=/dev/sd[abcd] of=/dev/null

the only thing really wrong with that is that you're using 512 byte IOs,
which is ridiculously inefficient.  disks these days don't do IO in 512B
chunks, so there's no justification for this.  (the last time I did a scan
of a sick disk, IIRC the minimum data size was actually 8K - that was an 
old deathstar that someone had flopping around loose inside a case...)

using, say, 128KB chunks will consume less overhead.  using O_DIRECT will
also treat the rest of your system a bit nicer.
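
A minimal sketch of that advice, assuming GNU dd from coreutils, whose
iflag=direct opens the input with O_DIRECT. The function name read_scan
and its arguments are hypothetical. Note that /dev/sd[abcd] in the
original command is shorthand: dd takes one input file, so each disk
needs its own invocation.

```shell
#!/bin/sh
# read_scan DEVICE [BLOCK_SIZE] [IFLAG]
# Hypothetical helper: read DEVICE end to end with large reads and
# discard the data. Assumes GNU dd; iflag=direct requests O_DIRECT so
# the scan bypasses the page cache. Pass another IFLAG (for example
# fullblock) where O_DIRECT is not supported.
read_scan() {
    dev=$1
    bs=${2:-128k}
    iflag=${3:-direct}
    # dd exits non-zero on a read error, which is the signal a
    # sanity check is looking for.
    dd if="$dev" of=/dev/null bs="$bs" iflag="$iflag"
}

# Example (as root, one disk at a time):
#   for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
#       read_scan "$d" || echo "read error on $d" >&2
#   done
```

Running the disks one after another rather than in parallel also keeps
the scan from competing with itself for the controller.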


* Re: Checking the sanity of SATA disks
  2005-10-04 14:29 Checking the sanity of SATA disks Andy Smith
                   ` (3 preceding siblings ...)
  2005-10-09 15:21 ` Mark Hahn
@ 2005-11-07 19:22 ` Bill Davidsen
  4 siblings, 0 replies; 9+ messages in thread
From: Bill Davidsen @ 2005-11-07 19:22 UTC
  To: Andy Smith; +Cc: linux-raid

Andy Smith wrote:

>Hello,
>
>I have a home fileserver with 4 SATA disks in a RAID 5.  As I am
>sure you are aware, SATA devices in Linux currently cannot be
>queried for SMART info, so I can't do SMART health checks of these
>devices.
>
>Also there is still the tendency for Linux Software RAID to kick
>devices out of the array as soon as there is any error on them.
>
>I really don't want to be in the situation where a drive dies, I fit
>a new one, and during the resync another device is kicked out
>because of spontaneously finding a bad sector.
>
>I tried simply doing a
>
>        dd if=/dev/sd[abcd] of=/dev/null
>
>To check each disk in a very unsubtle fashion, but it drives the
>load average on the machine way way up (like to 20+) and makes it
>very unresponsive (wait several minutes for a keypress to be
>acknowledged), even if I run it under nice -n 19.
>

You want (a) larger buffers and (b) a program which uses O_DIRECT for
I/O. I had a news server which was running 28 aps until I started using
dd, then it dropped to 3 aps. Using O_DIRECT there is no measurable
slowdown (and no buffer contention).

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

