* Spurious HD convictions
@ 2009-12-12 19:42 Leslie Rhorer
0 siblings, 0 replies; 10+ messages in thread
From: Leslie Rhorer @ 2009-12-12 19:42 UTC (permalink / raw)
To: linux-raid
What's happening here? Suddenly, my backup server is suffering
apparently spurious hard drive convictions. The server is running RAID5 on
7 disks under md. It has been running well for months, but suddenly it has
started kicking drives from the array when under moderately heavy read or
write loads. The thing is, it isn't convicting any particular drive
repeatedly, and the drives are not showing any errors under SMART. This is
a port multiplier (PM) system, and I have tried changing the drive adapters, changing the PMs,
changing cables, moving the drives around, and moving them out of the CPU
enclosure to a new external chassis. The convictions are not occurring on
any one channel, over any one particular PM, or over any particular cable.
Since this started happening, I have been unable to get all the way through
a resync before the array dumps at least one of the drives. Here is a
sample from the kernel log during one of the convictions:
Dec 12 13:03:39 Backup kernel: [56319.397992] ata6.00: failed to read SCR 1
(Emask=0x40)
Dec 12 13:03:39 Backup kernel: [56319.397999] ata6.01: failed to read SCR 1
(Emask=0x40)
Dec 12 13:03:39 Backup kernel: [56319.398001] ata6.02: failed to read SCR 1
(Emask=0x40)
Dec 12 13:03:39 Backup kernel: [56319.398006] ata6.03: failed to read SCR 1
(Emask=0x40)
Dec 12 13:03:39 Backup kernel: [56319.398008] ata6.04: failed to read SCR 1
(Emask=0x40)
Dec 12 13:03:39 Backup kernel: [56319.398010] ata6.05: failed to read SCR 1
(Emask=0x40)
Dec 12 13:03:39 Backup kernel: [56319.398014] ata6.15: exception Emask 0x4
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398018] ata6.00: exception Emask 0x100
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398022] ata6.00: cmd
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2
Dec 12 13:03:39 Backup kernel: [56319.398023] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 12 13:03:39 Backup kernel: [56319.398025] ata6.00: status: { DRDY }
Dec 12 13:03:39 Backup kernel: [56319.398028] ata6.01: exception Emask 0x100
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398031] ata6.02: exception Emask 0x100
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398034] ata6.03: exception Emask 0x100
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398037] ata6.04: exception Emask 0x100
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398040] ata6.05: exception Emask 0x100
SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 12 13:03:39 Backup kernel: [56319.398044] ata6.15: hard resetting link
Dec 12 13:03:41 Backup kernel: [56321.597384] ata6.15: SATA link up 3.0 Gbps
(SStatus 123 SControl 0)
Dec 12 13:03:41 Backup kernel: [56321.597864] ata6.00: hard resetting link
Dec 12 13:03:42 Backup kernel: [56321.933843] ata6.00: SATA link up 3.0 Gbps
(SStatus 123 SControl 320)
Dec 12 13:03:42 Backup kernel: [56321.933849] ata6.01: hard resetting link
Dec 12 13:03:42 Backup kernel: [56322.294048] ata6.01: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Dec 12 13:03:42 Backup kernel: [56322.294055] ata6.02: hard resetting link
Dec 12 13:03:42 Backup kernel: [56322.642243] ata6.02: SATA link down
(SStatus 0 SControl 320)
Dec 12 13:03:42 Backup kernel: [56322.646087] ata6.03: hard resetting link
Dec 12 13:03:43 Backup kernel: [56323.006393] ata6.03: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Dec 12 13:03:43 Backup kernel: [56323.006400] ata6.04: hard resetting link
Dec 12 13:03:43 Backup kernel: [56323.354708] ata6.04: SATA link up 1.5 Gbps
(SStatus 113 SControl 300)
Dec 12 13:03:43 Backup kernel: [56323.354714] ata6.05: hard resetting link
Dec 12 13:03:43 Backup kernel: [56323.690211] ata6.05: SATA link up 1.5 Gbps
(SStatus 113 SControl 320)
Dec 12 13:03:43 Backup kernel: [56323.694555] ata6.00: configured for
UDMA/100
Dec 12 13:03:43 Backup kernel: [56323.695732] ata6.01: configured for
UDMA/100
Dec 12 13:03:44 Backup kernel: [56323.703212] ata6.03: configured for
UDMA/100
Dec 12 13:03:44 Backup kernel: [56323.803119] ata6.04: configured for
UDMA/100
Dec 12 13:03:44 Backup kernel: [56323.803188] ata6: EH complete
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] 2930277168
512-byte hardware sectors (1500302 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] 2930277168
512-byte hardware sectors (1500302 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] 2930277168
512-byte hardware sectors (1500302 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] 625142448
512-byte hardware sectors (320073 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] 2930277168
512-byte hardware sectors (1500302 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:0:0:0: [sde] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] 2930277168
512-byte hardware sectors (1500302 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:1:0:0: [sdf] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] 2930277168
512-byte hardware sectors (1500302 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:3:0:0: [sdg] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] 625142448
512-byte hardware sectors (320073 MB)
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Write
Protect is off
Dec 12 13:03:44 Backup kernel: [56323.803119] sd 5:4:0:0: [sdh] Mode Sense:
00 3a 00 00
Dec 12 13:03:44 Backup kernel: [56323.807115] sd 5:4:0:0: [sdh] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 12 13:03:44 Backup kernel: [56323.839100] end_request: I/O error, dev
sde, sector 10
Dec 12 13:03:44 Backup kernel: [56323.839100] md: super_written gets
error=-5, uptodate=0
Dec 12 13:03:44 Backup kernel: [56323.839100] raid5: Disk failure on sde,
disabling device.
Dec 12 13:03:44 Backup kernel: [56323.839100] raid5: Operation continuing on
6 devices.
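
As a reading aid for the log above, here is a minimal Python sketch that decodes the Emask and SStatus values. The bit names follow the AC_ERR_* flags in the kernel's include/linux/libata.h, and the field layout follows the SATA SStatus register definition (libata prints the register in hex). The function names (decode_emask, decode_sstatus, link_states) are made up for this illustration and are not part of any existing tool.

```python
import re

# Bit names follow the AC_ERR_* flags in the kernel's include/linux/libata.h.
EMASK_BITS = {
    0x001: "device error",
    0x002: "host state machine violation",
    0x004: "timeout",
    0x008: "media error",
    0x010: "ATA bus error",
    0x020: "host bus error",
    0x040: "system error",
    0x080: "invalid argument",
    0x100: "other/unknown",
    0x200: "no-device hint",
    0x400: "NCQ marker",
}

# SStatus SPD field (bits 7:4) as defined by the SATA specification.
LINK_SPEED = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_emask(value):
    """Return the names of all error bits set in a libata Emask value."""
    return [name for bit, name in EMASK_BITS.items() if value & bit]


def decode_sstatus(value):
    """Split an SStatus register value into its DET, SPD and IPM fields."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "device_present": det == 3,  # DET=3: device detected, PHY link established
        "speed": LINK_SPEED.get(spd, "unknown"),
        "active": ipm == 1,  # IPM=1: interface in active state
    }


def link_states(log_text):
    """Map each ataN.MM link to its decoded SStatus from the reset messages."""
    flat = " ".join(log_text.split())  # undo e-mail line wrapping
    pattern = (
        r"(ata\d+\.\d+): SATA link (?:up [\d.]+ Gbps|down) "
        r"\(SStatus ([0-9A-Fa-f]+)"
    )
    return {
        link: decode_sstatus(int(raw, 16))
        for link, raw in re.findall(pattern, flat)
    }
```

Read this way, the Emask 0x4 on ata6.15 (the port multiplier's control port) is a timeout, the 0x40 on the "failed to read SCR 1" lines is a system error, and SStatus 0 on ata6.02 means no device was detected on that link after the reset.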
^ permalink raw reply [flat|nested] 10+ messages in thread
* Spurious HD convictions
@ 2009-12-13 2:02 lrhorer
2009-12-13 2:57 ` Majed B.
0 siblings, 1 reply; 10+ messages in thread
From: lrhorer @ 2009-12-13 2:02 UTC (permalink / raw)
To: linux-raid
^ permalink raw reply [flat|nested] 10+ messages in thread
* Spurious HD convictions
@ 2009-12-13 2:07 lrhorer
0 siblings, 0 replies; 10+ messages in thread
From: lrhorer @ 2009-12-13 2:07 UTC (permalink / raw)
To: linux-raid
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Spurious HD convictions
2009-12-13 2:02 lrhorer
@ 2009-12-13 2:57 ` Majed B.
0 siblings, 0 replies; 10+ messages in thread
From: Majed B. @ 2009-12-13 2:57 UTC (permalink / raw)
To: lrhorer@satx.rr.com; +Cc: linux-raid
Hi Leslie,
According to some of the links here:
http://www.google.com/search?hl=en&q=failed+to+read+SCR+1+(Emask%3D0x40)
It seems to be either the Power Supply Unit (PSU) or the Port Multiplier (PM).
A quick workaround seems to be disabling NCQ on all affected devices.
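
One common way to try the NCQ workaround at runtime is to set the queue depth of each disk to 1 through sysfs. The following is a minimal Python sketch of that, assuming the disks expose /sys/block/<dev>/device/queue_depth and that it is run as root. The helper name disable_ncq and its sysfs_root parameter are hypothetical, introduced only for this example.

```python
from pathlib import Path


def disable_ncq(devices, sysfs_root="/sys/block"):
    """Set queue_depth to 1 for each named disk; return the previous depths.

    A queue depth of 1 means only one command is outstanding at a time,
    which effectively turns NCQ off for that device.
    """
    previous = {}
    for name in devices:
        knob = Path(sysfs_root) / name / "device" / "queue_depth"
        previous[name] = int(knob.read_text().strip())  # remember old value
        knob.write_text("1\n")
    return previous
```

For the disks named in the log this would be called as disable_ncq(["sde", "sdf", "sdg", "sdh"]). The setting does not survive a reboot; the libata.force=noncq kernel boot parameter is the persistent alternative.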
On Sun, Dec 13, 2009 at 5:02 AM, lrhorer@satx.rr.com
<lrhorer@satx.rr.com> wrote:
>
> What's happening here? Suddenly, my backup server is suffering apparently
> spurious hard drive convictions. The server is running RAID5 on 7 disks
> under md. It has been running well for months, but suddenly it has started
> kicking drives from the array when under moderately heavy read or write
> loads. The thing is, it isn't convicting any particular drive repeatedly,
> and the drives are not showing any errors under SMART. This is a PM system,
> and I have tried changing the drive adapters, changing the PMs, changing
> cables, moving the drives around, and moving them out of the CPU enclosure to
> a new external chassis. The convictions are not occurring on any one
> channel, over any one particular PM, or over any particular cable. Since
> this started happening, I have been unable to get all the way through a
> resync before the array dumps at least one of the drives. Here is a sample
> from the kernel log during one of the convictions:
--
Majed B.
* RE: Spurious HD convictions
@ 2009-12-13 3:44 lrhorer
2009-12-14 20:06 ` Majed B.
0 siblings, 1 reply; 10+ messages in thread
From: lrhorer @ 2009-12-13 3:44 UTC (permalink / raw)
To: linux-raid
Hmm. I don't see how it could be either the PS or the PMs, since the drives
were moved to a new enclosure when the problem started happening, yet the
problem persists. The new chassis has all new PMs and of course a new PS,
and the problem is happening across multiple PMs. In addition, if NCQ is the
problem, why has it just started happening? This system has been up and
running for the better part of a year. Regardless, I have disabled NCQ by
executing `echo 1 > /sys/block/sd[a-g]/device/queue_depth`, and I am
attempting a repair action again. We'll see how it goes.
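For reference, the one-liner above is presumably shorthand: in bash, a
redirection whose target pattern matches more than one file is rejected as
an ambiguous redirect, so the value has to be written to each drive's
queue_depth file in turn. A minimal sketch, assuming the member drives are
sda through sdg and that it is run as root; the function name
set_queue_depth and its optional directory argument are illustrative only:

```shell
#!/bin/sh
# Write one queue depth to every matching drive, one file at a time.
# Usage: set_queue_depth DEPTH [SYSFS_BLOCK_DIR]
# SYSFS_BLOCK_DIR defaults to /sys/block; the override is an assumption
# added so the function can be tried against a scratch directory first.
set_queue_depth() {
    depth=$1
    root=${2:-/sys/block}
    for f in "$root"/sd[a-g]/device/queue_depth; do
        # Skip the unexpanded pattern if no drive matched.
        [ -e "$f" ] || continue
        echo "$depth" > "$f" || echo "could not set $f" >&2
    done
}

# Example: a queue depth of 1 effectively turns NCQ off on sda..sdg.
# set_queue_depth 1
```

The setting does not survive a reboot, so it would need to be reapplied
from a boot script if it turns out to help.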
> Hi Leslie,
>
> According to some of the links here:
> http://www.google.com/search?hl=en&q=failed+to+read+SCR+1+(Emask%3D0x40)
>
> It seems to be either the Power Supply Unit (PSU) or the Port Multiplier
> (PM).
>
> A quick workaround seems to be disabling NCQ on all affected devices.
>
> On Sun, Dec 13, 2009 at 5:02 AM, lrhorer@satx.rr.com
> <lrhorer@satx.rr.com> wrote:
> >
> > What's happening here? Suddenly, my backup server is suffering
> > apparently spurious hard drive convictions. The server is running RAID5
> > on 7 disks under md. It has been running well for months, but suddenly
> > it has started kicking drives from the array when under moderately heavy
> > read or write loads. The thing is, it isn't convicting any particular
> > drive repeatedly, and the drives are not showing any errors under SMART.
> > This is a PM system, and I have tried changing the drive adapters,
> > changing the PMs, changing cables, moving the drives around, and moving
> > them out of the CPU enclosure to a new external chassis. The convictions
> > are not occurring on any one channel, over any one particular PM, or
> > over any particular cable. Since this started happening, I have been
> > unable to get all the way through a resync before the array dumps at
> > least one of the drives. Here is a sample from the kernel log during one
> > of the convictions:
* Re: Spurious HD convictions
2009-12-13 3:44 Spurious HD convictions lrhorer
@ 2009-12-14 20:06 ` Majed B.
[not found] ` <4b271970.5e44f10a.484f.ffffdd07SMTPIN_ADDED@mx.google.com>
2009-12-16 5:41 ` Leslie Rhorer
0 siblings, 2 replies; 10+ messages in thread
From: Majed B. @ 2009-12-14 20:06 UTC (permalink / raw)
To: lrhorer@satx.rr.com; +Cc: linux-raid
Hi Leslie,
I was wondering if you were able to stop the weird behavior with your disks.
On Sun, Dec 13, 2009 at 6:44 AM, lrhorer@satx.rr.com
<lrhorer@satx.rr.com> wrote:
> Hmm. I don't see how it could be either the PS or the PMs, since the drives
> were moved to a new enclosure when the problem started happening, yet the
> problem persists. The new chassis has all new PMs and of course a new PS,
> and the problem is happening across multiple PMs. In addition, if NCQ is the
> problem, why has it just started happening? This system has been up and
> running for the better part of a year. Regardless, I have disabled NCQ by
> executing `echo 1 > /sys/block/sd[a-g]/device/queue_depth`, and I am
> attempting a repair action again. We'll see how it goes.
>
>> Hi Leslie,
>>
>> According to some of the links here:
>> http://www.google.com/search?hl=en&q=failed+to+read+SCR+1+(Emask%3D0x40)
>>
>> It seems to be either the Power Supply Unit (PSU) or the Port Multiplier
>> (PM).
>>
>> A quick workaround seems to be disabling NCQ on all affected devices.
>>
>> On Sun, Dec 13, 2009 at 5:02 AM, lrhorer@satx.rr.com
>> <lrhorer@satx.rr.com> wrote:
>> >
>> > What's happening here? Suddenly, my backup server is suffering
>> > apparently spurious hard drive convictions. The server is running RAID5
>> > on 7 disks under md. It has been running well for months, but suddenly
>> > it has started kicking drives from the array when under moderately
>> > heavy read or write loads. The thing is, it isn't convicting any
>> > particular drive repeatedly, and the drives are not showing any errors
>> > under SMART. This is a PM system, and I have tried changing the drive
>> > adapters, changing the PMs, changing cables, moving the drives around,
>> > and moving them out of the CPU enclosure to a new external chassis. The
>> > convictions are not occurring on any one channel, over any one
>> > particular PM, or over any particular cable. Since this started
>> > happening, I have been unable to get all the way through a resync
>> > before the array dumps at least one of the drives. Here is a sample
>> > from the kernel log during one of the convictions:
--
Majed B.
* Re: Spurious HD convictions
[not found] ` <4b271970.5e44f10a.484f.ffffdd07SMTPIN_ADDED@mx.google.com>
@ 2009-12-15 8:47 ` Majed B.
2009-12-16 5:40 ` Leslie Rhorer
0 siblings, 1 reply; 10+ messages in thread
From: Majed B. @ 2009-12-15 8:47 UTC (permalink / raw)
To: linux-raid
As far as I know, the numbers in between define the queue depth of
commands sent to the disks.
It's a matter of whether you want the disk(s) firmware to manage
sorting & executing the commands (when the queue is > 1) or not. In
all cases, as far as I know, the kernel does the sorting before
sending the commands to the disk(s). So if you notice better
performance (when the array is stable) with the queue=1, then keep it
that way.
On Tue, Dec 15, 2009 at 8:06 AM, Leslie Rhorer <lrhorer@satx.rr.com> wrote:
>> Hi Leslie,
>>
>> I was wondering if you were able to stop the weird behavior with your
>> disks.
>
> It seems to have done so, yes. I've looked around the web trying to
> find some additional info, but I've come up empty handed. Perhaps someone
> here can answer at least one of my questions? I know that putting a value
> of 32 into /sys/block/<driveID>/device/queue_depth fully enables NCQ, and
> putting a value of 1 there disables NCQ. What do all the numbers in between
> do?
>
>
--
Majed B.
* RE: Spurious HD convictions
2009-12-15 8:47 ` Majed B.
@ 2009-12-16 5:40 ` Leslie Rhorer
0 siblings, 0 replies; 10+ messages in thread
From: Leslie Rhorer @ 2009-12-16 5:40 UTC (permalink / raw)
To: 'Majed B.', linux-raid
> As far as I know, the numbers in between define the queue depth of
> commands sent to the disks.
>
> It's a matter of whether you want the disk(s) firmware to manage
> sorting & executing the commands (when the queue is > 1) or not.
Well, what I am wondering is whether or not some value higher than 1
might produce better performance without risking the error.
> In all cases, as far as I know, the kernel does the sorting before
> sending the commands to the disk(s). So if you notice better
> performance (when the array is stable) with the queue=1, then keep it
> that way.
Definitely not, it seems. At least for the resync, turning off NCQ
dropped the read rate from 35 MBps per drive to 25 MBps per drive.
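Whether some depth between 1 and the maximum gives back part of that rate
could be checked by stepping through a few values while a resync is
running and noting the speed md reports. A rough sketch, assuming root,
member drives sda through sdg, and a resync already in progress; the
function names and the 60-second settle time are illustrative only, and
raising the depth may well bring the original errors back:

```shell
#!/bin/sh
# Print the resync/recovery speed figures (in K/sec) found in an mdstat file.
# Usage: resync_speed [MDSTAT_FILE]   (defaults to /proc/mdstat)
# Prints one number per array that is currently resyncing.
resync_speed() {
    sed -n 's/.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p' "${1:-/proc/mdstat}"
}

# Step through several queue depths during a resync and report the speed
# md shows for each. The sd[a-g] pattern is an assumption about which
# drives belong to the array.
sweep_queue_depths() {
    for depth in 1 2 4 8 16 31; do
        for f in /sys/block/sd[a-g]/device/queue_depth; do
            [ -e "$f" ] || continue
            echo "$depth" > "$f"
        done
        sleep 60    # let the resync rate settle before sampling
        echo "depth=$depth speed=$(resync_speed)K/sec"
    done
}

# Example (run as root while the array is resyncing):
# sweep_queue_depths
```

If a drive gets kicked partway through the sweep, the last depth tried
gives an upper bound on what is safe for this hardware.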
* RE: Spurious HD convictions
2009-12-14 20:06 ` Majed B.
[not found] ` <4b271970.5e44f10a.484f.ffffdd07SMTPIN_ADDED@mx.google.com>
@ 2009-12-16 5:41 ` Leslie Rhorer
2009-12-16 9:13 ` Robin Hill
1 sibling, 1 reply; 10+ messages in thread
From: Leslie Rhorer @ 2009-12-16 5:41 UTC (permalink / raw)
To: 'Majed B.'; +Cc: linux-raid
> Hi Leslie,
>
> I was wondering if you were able to stop the weird behavior with your
> disks.
It seems to have done so, yes. I've looked around the web trying to
find some additional info, but I've come up empty handed. Perhaps someone
here can answer at least one of my questions? I know that putting a value
of 32 into /sys/block/<driveID>/device/queue_depth fully enables NCQ, and
putting a value of 1 there disables NCQ. What do all the numbers in between
do?
* Re: Spurious HD convictions
2009-12-16 5:41 ` Leslie Rhorer
@ 2009-12-16 9:13 ` Robin Hill
0 siblings, 0 replies; 10+ messages in thread
From: Robin Hill @ 2009-12-16 9:13 UTC (permalink / raw)
To: linux-raid
On Tue Dec 15, 2009 at 11:41:03PM -0600, Leslie Rhorer wrote:
> It seems to have done so, yes. I've looked around the web trying to
> find some additional info, but I've come up empty handed. Perhaps someone
> here can answer at least one of my questions? I know that putting a value
> of 32 into /sys/block/<driveID>/device/queue_depth fully enables NCQ, and
> putting a value of 1 there disables NCQ. What do all the numbers in between
> do?
>
I believe the value sets the allowed queue length. A value of 1 thus
effectively disables queueing. Most ATA drives have a maximum queue
depth of 32 so this is usually set for fully enabling it (though I
believe it's recommended only to use 31 - this seems to be the default).
If you set it to 32 and the drive has a shorter maximum queue, the
kernel will use the drive's maximum instead.
Setting any value is permitted, there's just generally little point in
doing so.
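To see what the kernel actually accepted, the value can be written and
then read straight back. A minimal sketch, assuming root; the helper name
effective_queue_depth and the drive sdb in the example are hypothetical:

```shell
#!/bin/sh
# Write a requested depth to a queue_depth file, then print what the file
# holds afterwards. On a real drive the kernel may store a lower value
# than the one requested.
# Usage: effective_queue_depth DEPTH QUEUE_DEPTH_FILE
effective_queue_depth() {
    echo "$1" > "$2" || return 1
    cat "$2"
}

# Example (run as root):
# effective_queue_depth 32 /sys/block/sdb/device/queue_depth
```

If the number printed is lower than the one requested, the limit has been
applied.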
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |