* Infrequent soft reset of ata for silicon image 3512 cards
@ 2008-07-11 9:29 Sagar Borikar
2008-08-01 4:14 ` Tejun Heo
0 siblings, 1 reply; 2+ messages in thread
From: Sagar Borikar @ 2008-07-11 9:29 UTC (permalink / raw)
To: linux-ide
Hello,
I hope this is the right list for the following questions; if not, please
direct me to the correct one.
I am currently working with a NAS box which has the following configuration:
MIPS arch
2.6.18 kernel - comparatively older but box is in production
128 MB RAM
sil 3512 SATA controller
xfs file system
When running the iozone stress test on the box over CIFS and NFS
simultaneously, I find that the ata port gets soft reset once every 5-8
hours, and because of this the continuous write activity stalls on the
drives. All the smbd processes which are writing data to the disk go
into uninterruptible sleep state and stay there, so the test doesn't
complete.
Following is the log that I get:
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/100
ata1: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
After this, I start getting errors from the file system:
can't seek in filesystem at bb 10686861057857128
can't read btree block 1630685585/1000141
can't seek in filesystem at bb 8951363201349912
can't read btree block 1365869628/911139
can't seek in filesystem at bb 5768064121399776
can't read btree block 880136736/1043772
It looks like the filesystem is trying to read blocks which are not
present in the partition, and because of this the device driver
complains that it is trying to access data beyond the end of the
device.
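A quick sanity check of that reading, as a rough sketch only: it assumes the "bb" values in the errors are 512-byte basic block numbers (the log does not state the unit) and compares them with the 488397168 sectors the kernel reports for sda. The helper name is made up for illustration.

```python
# Device size from the kernel log above: 488397168 sectors of 512 bytes.
DEVICE_SECTORS = 488397168

# "bb" offsets copied from the filesystem errors above.
# Assumption: these are 512-byte basic block numbers.
REPORTED_BB = [
    10686861057857128,
    8951363201349912,
    5768064121399776,
]


def beyond_device(bb, device_sectors=DEVICE_SECTORS):
    """Return True if block number bb lies past the last sector."""
    return bb >= device_sectors


for bb in REPORTED_BB:
    # Integer ratio shows how far outside the device the offset is.
    ratio = bb // DEVICE_SECTORS
    print(f"bb {bb}: beyond device = {beyond_device(bb)}, "
          f"about {ratio}x the device size")
```

Under that assumption all three offsets are millions of times larger than the whole disk, which fits garbage block pointers better than a slightly wrong partition size.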
So I guess there is filesystem corruption too, which can be solved
independently, but ata1 getting soft reset under load is strange.
Has anyone observed this before with Silicon Image 3512 cards?
If I look at the health of the drive, everything looks good:
[root@NAS001ee5ab9c85 ~]# smartctl -d ata -A /dev/sata1
smartctl version 5.33 [mips-unknown-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   190   187   021    Pre-fail  Always       -       5500
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       603
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3188
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       601
194 Temperature_Celsius     0x0022   120   096   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
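The same reading of the table can be automated. The following is only a sketch: it assumes `smartctl -A` output with one attribute per line in the ten-column layout shown above, and the set of "watched" attributes is an illustrative choice, not something smartctl defines. The function names are hypothetical.

```python
# Attributes whose raw value is expected to stay at 0 on a healthy link
# and disk. This selection is an assumption made for illustration.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_smart_attributes(text):
    """Parse the attribute rows of `smartctl -A` output into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return rows


def suspicious(rows):
    """Return names of attributes that deserve a closer look."""
    flagged = []
    for row in rows:
        below_thresh = row["thresh"] > 0 and row["value"] <= row["thresh"]
        nonzero_raw = row["name"] in WATCHED and row["raw"] != "0"
        if below_thresh or nonzero_raw or row["when_failed"] != "-":
            flagged.append(row["name"])
    return flagged


# Two rows taken from the output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(suspicious(parse_smart_attributes(SAMPLE)))
```

For the two sample rows nothing is flagged, which matches the impression that the drive itself reports no reallocated sectors and no CRC errors on the link.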
I checked whether the affected drive has any bad blocks, but the check
returns success. The buffered read and cached read tests using hdparm
also succeed.
[root@NAS001ee5ab9c85 ~]# hdparm -tT /dev/sata1
/dev/sata1:
Timing cached reads: 308 MB in 2.03 seconds = 152.05 MB/sec
Timing buffered disk reads: 132 MB in 3.04 seconds = 43.45 MB/sec
Any pointers?
Thanks in advance
Sagar
* Re: Infrequent soft reset of ata for silicon image 3512 cards
From: Tejun Heo @ 2008-08-01 4:14 UTC (permalink / raw)
To: Sagar Borikar; +Cc: linux-ide
Sagar Borikar wrote:
> I hope this is the right list for the following questions; if not, please
> direct me to the correct one.
>
> I am currently working with a NAS box which has the following configuration:
>
> MIPS arch
> 2.6.18 kernel - comparatively older but box is in production
Ah... it's a bit too old at this point.
> 128 MB RAM
> sil 3512 SATA controller
> xfs file system
>
> When running the iozone stress test on the box over CIFS and NFS
> simultaneously, I find that the ata port gets soft reset once every 5-8
> hours, and because of this the continuous write activity stalls on the
> drives. All the smbd processes which are writing data to the disk go
> into uninterruptible sleep state and stay there, so the test doesn't
> complete.
>
> Following is the log that I get:
>
> ata1: soft resetting port
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata1.00: configured for UDMA/100
> ata1: EH complete
> SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
> sda: Write Protect is off
> SCSI device sda: drive cache: write back
These only report the actions taken by EH to recover from an error
condition. Is there any message before this?
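One way to collect that context, as a minimal sketch: it assumes the kernel log has been saved as plain text (for example from dmesg or the syslog file) and simply keeps the lines that precede each reset message. The function name and the window size are illustrative, and the demo lines are placeholders, not real log output.

```python
from collections import deque


def context_before(lines, marker="soft resetting port", before=20):
    """Yield (preceding_lines, line) for each log line containing marker."""
    recent = deque(maxlen=before)  # sliding window of earlier lines
    for line in lines:
        if marker in line:
            yield list(recent), line
        recent.append(line)


# Placeholder input; in practice pass the lines of the saved kernel log.
demo = [
    "<placeholder: earlier kernel message A>",
    "<placeholder: earlier kernel message B>",
    "ata1: soft resetting port",
    "ata1: EH complete",
]

for context, hit in context_before(demo, before=2):
    for line in context:
        print("  before:", line)
    print("reset  :", hit)
```

Whatever libata printed just ahead of the reset (an exception, a timeout, a failed command) would show up in the "before" lines and says far more about the cause than the recovery messages do.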
> After this, I start getting errors from the file system:
>
> can't seek in filesystem at bb 10686861057857128
> can't read btree block 1630685585/1000141
> can't seek in filesystem at bb 8951363201349912
> can't read btree block 1365869628/911139
> can't seek in filesystem at bb 5768064121399776
> can't read btree block 880136736/1043772
>
> It looks like the filesystem is trying to read blocks which are not
> present in the partition, and because of this the device driver
> complains that it is trying to access data beyond the end of the
> device.
>
> So I guess there is filesystem corruption too, which can be solved
> independently, but ata1 getting soft reset under load is strange.
> Has anyone observed this before with Silicon Image 3512 cards?
Yeah, it looks like fs corruption. There have been a few reports of
data corruption on 3512 when combined with certain chipsets but they
didn't involve timeouts or any other error conditions.
One common way to trigger data corruption is to briefly disconnect power
and reapply it. All the data in the cache will get lost and the driver
has no way to know whether it lost any data or not, so all hell breaks
loose. Similar situations do occur on running systems if the power
supply can't maintain voltage for whatever reason. Things like this
usually occur when a hard drive is plugged in (as the new one draws
power to spin up, existing ones suffer a voltage drop) but I've seen it
happen without such an event under heavy IO load.
Ruling it out is easy. Just prepare a separate power supply, connect
the hard drive (only the hard drive) to it and see whether the problem
disappears. You can power up an ATX PSU w/o a motherboard easily.
http://modtown.co.uk/mt/article2.php?id=psumod
--
tejun