* Fwd: CF card I/O errors on Linux
From: Larry Baker @ 2010-05-12 21:19 UTC
To: jgarzik; +Cc: linux-ide
Jeff,
I have been trouble-shooting a misbehaving CF card on Fedora 10 (their
kernel 2.6.27.41-170.2.117). It fails in the ATA driver with an
ABORT. I have concluded that the card does not properly implement the
(PIO) READ/WRITE MULTIPLE commands. As a result of my work in your
libata-core.c code, I have a few suggestions:
1. I needed to see more of the IDENTIFY DEVICE response, so I
modified the probe printout to include words 47, 49, 59, and 80-88,
like this:
- "%s: cfg 49:%04x 82:%04x 83:%04x 84:%04x "
- "85:%04x 86:%04x 87:%04x 88:%04x\n",
+ "%s: cfg 47:%04x 49:%04x 59:%04x "
+ "80-88:%04x %04x %04x %04x %04x %04x %04x %04x %04x\n",
__func__,
- id[49], id[82], id[83], id[84],
- id[85], id[86], id[87], id[88]);
+ id[47], id[49], id[59], id[80], id[81], id[82],
+ id[83], id[84], id[85], id[86], id[87], id[88]);
2. The device I was having trouble with claims to support multi-
sector transfers, with a maximum Sector Count of 1. There's not much
point then, is there? I suggest adding a test for max >= 2 in the
conditional where dev->multi_count is assigned:
/* get current R/W Multiple count setting */
if ((dev->id[47] >> 8) == 0x80 && (dev->id[59] & 0x100)) {
unsigned int max = dev->id[47] & 0xff;
unsigned int cnt = dev->id[59] & 0xff;
+ /* no point unless max >= 2 */
+ if (max >= 2)
/* only recognize/allow powers of two here */
if (is_power_of_2(max) && is_power_of_2(cnt))
if (cnt <= max)
dev->multi_count = cnt;
}
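
For anyone who wants to try the check outside the kernel, here is a minimal stand-alone sketch of the conditional in 2. The helper names (is_pow2, multi_count_from_id) are made up for illustration and are not libata functions; id[] is assumed to hold the IDENTIFY DEVICE words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same test as the kernel's is_power_of_2(): exactly one bit set. */
static bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper (not part of libata): returns the value the
 * patched conditional would assign to multi_count, or 0 to leave
 * READ/WRITE MULTIPLE disabled.
 */
static unsigned int multi_count_from_id(const uint16_t *id)
{
	unsigned int max = id[47] & 0xff;	/* maximum sectors per block */
	unsigned int cnt = id[59] & 0xff;	/* current sectors per block */

	/* word 47 fixed field must be 80h and word 59 bit 8 must be set */
	if ((id[47] >> 8) != 0x80 || !(id[59] & 0x100))
		return 0;
	/* no point unless max >= 2 */
	if (max < 2)
		return 0;
	/* only recognize/allow powers of two here */
	if (!is_pow2(max) || !is_pow2(cnt))
		return 0;
	return cnt <= max ? cnt : 0;
}
```

With a well-formed word 47 of 8001h and word 59 of 0101h, the sketch returns 0, so a device whose maximum is one sector would never be driven with READ/WRITE MULTIPLE.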
3. In the error recovery code (in libata-eh.c?), I suggest a way to
recover for devices like the one I've got that improperly implement
READ/WRITE MULTI commands:
If the failed command is one of the READ/WRITE MULTI commands (entries
0-4 in ata_rw_cmds[]), and the device response is ABORT, use your
HORKAGE mechanism to set a flag like ATA_HORKAGE_BROKEN_MULTI to skip
setting dev->multi_count in libata-core.c (the code in 2., above),
i.e., disable multi-sector transfers, then reconfigure the device to
disable multi-sector transfers (call ata_dev_configure(), I guess, so
the new configuration gets printed out in dmesg) and retry the
transfer. Any future calls to ata_dev_configure() for that device
will not attempt to enable multi-sector transfers because the
ATA_HORKAGE_BROKEN_MULTI is set:
+ /* don't try to enable R/W multi if it is broken */
+ if (!(dev->horkage & ATA_HORKAGE_BROKEN_MULTI))
/* get current R/W Multiple count setting */
if ((dev->id[47] >> 8) == 0x80 && (dev->id[59] & 0x100)) {
unsigned int max = dev->id[47] & 0xff;
unsigned int cnt = dev->id[59] & 0xff;
+ /* no point unless max >= 2 */
+ if (max >= 2)
/* only recognize/allow powers of two here */
if (is_power_of_2(max) && is_power_of_2(cnt))
if (cnt <= max)
dev->multi_count = cnt;
}
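
The recovery step in 3. could look roughly like the sketch below. It is only a user-space model of the idea: struct sketch_dev, HORKAGE_BROKEN_MULTI, and recover_broken_multi() are hypothetical names, the flag value is arbitrary, and the real work of reconfiguring the device and requeueing the command is left to the caller. The opcodes are the standard READ/WRITE MULTIPLE family.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERR_ABRT             (1u << 2)	/* ABRT bit in the ATA Error register */
#define HORKAGE_BROKEN_MULTI (1u << 0)	/* hypothetical flag, value arbitrary */

/* Hypothetical stand-in for the fields of struct ata_device used here. */
struct sketch_dev {
	unsigned int horkage;
	unsigned int multi_count;
};

/* True for the READ/WRITE MULTIPLE family of opcodes. */
static bool is_multi_cmd(uint8_t cmd)
{
	switch (cmd) {
	case 0xC4:	/* READ MULTIPLE */
	case 0xC5:	/* WRITE MULTIPLE */
	case 0x29:	/* READ MULTIPLE EXT */
	case 0x39:	/* WRITE MULTIPLE EXT */
	case 0xCE:	/* WRITE MULTIPLE FUA EXT */
		return true;
	default:
		return false;
	}
}

/*
 * Returns true if the caller should reconfigure the device and retry
 * the transfer with multi-sector commands disabled.
 */
static bool recover_broken_multi(struct sketch_dev *dev, uint8_t cmd,
				 uint8_t err)
{
	if (!is_multi_cmd(cmd) || !(err & ERR_ABRT))
		return false;
	if (dev->horkage & HORKAGE_BROKEN_MULTI)
		return false;	/* already disabled once; do not loop */

	dev->horkage |= HORKAGE_BROKEN_MULTI;
	dev->multi_count = 0;
	return true;
}
```

The second check keeps the retry from repeating forever if the device aborts for some unrelated reason after multi-sector transfers have already been turned off.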
Larry Baker
US Geological Survey
650-329-5608
baker@usgs.gov
Begin forwarded message:
> From: Larry Baker <baker@usgs.gov>
> Date: May 11, 2010 10:09:07 PM PDT
> To: Raydon Gordon <rgordon@stec-inc.com>
> Cc: Russ Sell <rwsell@usgs.gov>, Joe Fletcher <jfletcher@usgs.gov>,
> Tim MacDonald <tmacdonald@usgs.gov>, Ian Billings <i.billings@reftek.com>
> Subject: Re: CF card I/O errors on Linux
>
> Ron,
>
> I have analyzed the behavior of Fedora 10 Linux with the M2+ CF 9.0
> Rev: K116 CF card and I have concluded that the I/O errors are not
> due to a bug in Linux, but that the CF card does not properly
> implement the ATA (PIO) READ (WRITE) MULTIPLE command. It also
> returns a malformed value for word 47 (READ/WRITE MULTIPLE support)
> from the IDENTIFY DEVICE command.
>
> I reported to you before that the ATA standard says an ABORT error
> status means that the READ MULTIPLE command is illegal. I deduced
> that meant the CF card does not support multi-sector transfers. (I
> asked you if that was true, and when you did not contradict me, I
> assumed you implicitly confirmed my deduction.) I speculated that
> the Linux driver was not correctly initializing the device
> multi_count parameter, which I thought could lead to the illegal use
> of the READ MULTIPLE command instead of the READ SECTORS command.
>
> I modified the Fedora 10 Linux ATA device driver to initialize the
> device multi_count parameter to zero. That didn't help. I found
> that the CF card was in fact returning device information to the ATA
> driver that multi-sector transfers were supported. I also verified
> that the ATA driver was correctly reading the multi-sector count
> from the CF card. However, the only way I could ever get the CF
> card to work was by forcing the ATA driver to use a READ SECTORS
> command instead of a READ MULTIPLE command.
>
> I modified the Fedora 10 Linux ATA device driver to print out the
> ATA IDENTIFY DEVICE data words 47, 49, 59, and 80-88 (in hexadecimal):
>
> ata5.00: ata_dev_configure: cfg 47:0001 49:2b00 59:0101
> 80-88:0078 0000 740a 4004 4000 7429 0004 4003 001f
>
> Word 47 (0001h), bits 0:7 (01h) is the maximum number of sectors for
> READ/WRITE MULTIPLE commands. The value is 1 (which means there is
> no benefit to READ/WRITE MULTIPLE).
>
> Word 59 (0101h), bit 8 (1) is the "multi-sector transfers enabled"
> state. If multi-sector transfers are enabled (=1), the multi-sector
> count in bits 0:7 (01h) is the current Sector Count for READ/WRITE
> MULTIPLE commands. The Sector Count is 1, which, since it is also
> the maximum allowed, is the only legal value. If the host does not
> alter the Sector Count (to 1, 2, 4, 8, 16, 32, 64, or 128, with a
> SET MULTIPLE MODE), the device can use a preferred Sector Count
> (which can be 0=disabled) after power-up/reset for subsequent READ/
> WRITE MULTIPLE commands. That is, the host is not required to issue
> a SET MULTIPLE MODE command if the device enables multi-sector
> transfers by default. The device information returned by the CF
> card says multi-sector transfers should work. (In fact, they are
> required to work; see below.)
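
The field layout described above can be decoded mechanically. The following is a small sketch, assuming only the bit positions quoted in this message; struct multi_info and decode_multi_words() are illustrative names, not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of IDENTIFY DEVICE words 47 and 59 (illustrative struct). */
struct multi_info {
	unsigned int fixed_field;	/* word 47, bits 15:8 (80h in ATA-4..6) */
	unsigned int max_sectors;	/* word 47, bits 7:0 */
	bool setting_valid;		/* word 59, bit 8 */
	unsigned int cur_sectors;	/* word 59, bits 7:0 */
};

static struct multi_info decode_multi_words(uint16_t w47, uint16_t w59)
{
	struct multi_info mi;

	mi.fixed_field = w47 >> 8;
	mi.max_sectors = w47 & 0xff;
	mi.setting_valid = (w59 & 0x100) != 0;
	mi.cur_sectors = w59 & 0xff;
	return mi;
}
```

For the values printed above (47:0001, 59:0101) this gives a fixed field of 00h, a maximum of 1 sector, a valid setting, and a current count of 1 sector.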
>
> Recall, the code in the Fedora 10 (Linux kernel version 2.6.27.41)
> ATA device driver (function ata_dev_configure() in drivers/ata/
> libata-core.c) that sets the device multi_count parameter is:
>
> if (dev->id[59] & 0x100)
> dev->multi_count = dev->id[59] & 0xff;
>
> I speculated the assignment statement was not being executed for the
> CF card. That was wrong; it is being executed, and the value 1 is
> being assigned to the device multi_count parameter.
>
> I looked at the code in the latest version of the Linux kernel. It
> is different:
>
> /* get current R/W Multiple count setting */
> if ((dev->id[47] >> 8) == 0x80 && (dev->id[59] & 0x100)) {
> unsigned int max = dev->id[47] & 0xff;
> unsigned int cnt = dev->id[59] & 0xff;
> /* only recognize/allow powers of two here */
> if (is_power_of_2(max) && is_power_of_2(cnt))
> if (cnt <= max)
> dev->multi_count = cnt;
> }
>
> It leaves the device multi_count parameter zero (disabling multi-
> sector transfers) if either:
>
> Word 47, bits 8:15 are not 80h (sanity check of a fixed field
> value)
> Word 59, bit 8 is not set (=1) (the device is currently
> configured to disable multi-sector transfers)
> Word 47, bits 0:7 (max) is not a power of 2 (sanity check)
> Word 59, bits 0:7 (cnt) is not a power of 2 (sanity check)
> Word 59, bits 0:7 (cnt) (current sector count) is greater than
> word 47, bits 0:7 (max) (maximum value allowed for sector count)
>
> Otherwise, it sets the device multi_count parameter to word 59, bits
> 0:7 (cnt).
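
To see which of the five conditions rejects a given device, the checks can be replayed one at a time. This is a sketch under the assumption that the checks run in the order listed; the enum and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Why multi_count would be left at zero (illustrative names). */
enum multi_reject {
	MULTI_OK = 0,
	MULTI_BAD_FIXED_FIELD,	/* word 47, bits 15:8 are not 80h */
	MULTI_NOT_ENABLED,	/* word 59, bit 8 is clear */
	MULTI_MAX_NOT_POW2,	/* word 47, bits 7:0 not a power of 2 */
	MULTI_CNT_NOT_POW2,	/* word 59, bits 7:0 not a power of 2 */
	MULTI_CNT_EXCEEDS_MAX	/* cnt > max */
};

static bool pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static enum multi_reject check_multi(uint16_t w47, uint16_t w59)
{
	unsigned int max = w47 & 0xff;
	unsigned int cnt = w59 & 0xff;

	if ((w47 >> 8) != 0x80)
		return MULTI_BAD_FIXED_FIELD;
	if (!(w59 & 0x100))
		return MULTI_NOT_ENABLED;
	if (!pow2(max))
		return MULTI_MAX_NOT_POW2;
	if (!pow2(cnt))
		return MULTI_CNT_NOT_POW2;
	if (cnt > max)
		return MULTI_CNT_EXCEEDS_MAX;
	return MULTI_OK;
}
```

Calling check_multi(0x0001, 0x0101) with the words reported by this card returns MULTI_BAD_FIXED_FIELD, while a card reporting 8001h in word 47 with the same word 59 would pass every check.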
>
> I modified the Fedora 10 Linux ATA driver to use the newer code.
> After that, the device multi_count parameter is zero, which disables
> the use of the READ MULTIPLE command, and the CF card works fine.
>
> I think that is accidental, because:
>
> Word 80 (0078h) sets bits for the ATA version(s) supported. Bits
> 3-6 are set (1). That means the CF card supports ATA versions 3
> through 6. All ATA standards since September 1995 (version 3 and
> up) have made READ, WRITE, and SET MULTIPLE commands mandatory.
> Examining the values for words 47 and 59 from the IDENTIFY DEVICE
> command, we can see the reason the device multi_count parameter was
> left set to zero is because bits 8:15 of word 47 are 00h, not 80h.
> The ATA standard versions 3 through 6 (see the portions below)
> specify that bits 8:15 of word 47 must (F=fixed value) be 80h. The
> CF card violates this requirement of the ATA standard. The sanity
> check in the Linux ATA driver fails. This has the inadvertent side
> effect of disabling READ MULTIPLE in the Linux ATA driver. Thus,
> the flaw in the value for word 47, bits 8:15 in the device
> information coincidentally causes the Linux ATA driver to conceal
> the flaw in the support for multi-sector transfers, and the CF card
> operates correctly in PIO mode on Linux systems that use kernel
> versions later than 2.6.30.
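
The word 80 reading can be checked with a few lines of C. This sketch assumes the bit assignments in the ATA/ATAPI-6 table quoted further down (bit n set means major version n is supported, for n from 3 to 14); the function names are illustrative.

```c
#include <stdint.h>

/* Highest major version claimed in word 80, or 0 if not reported. */
static int ata_highest_major(uint16_t w80)
{
	if (w80 == 0x0000 || w80 == 0xFFFF)
		return 0;	/* device does not report version */
	for (int v = 14; v >= 3; v--)
		if (w80 & (1u << v))
			return v;
	return 0;
}

/* Lowest major version claimed in word 80, or 0 if not reported. */
static int ata_lowest_major(uint16_t w80)
{
	if (w80 == 0x0000 || w80 == 0xFFFF)
		return 0;
	for (int v = 3; v <= 14; v++)
		if (w80 & (1u << v))
			return v;
	return 0;
}
```

For word 80 = 0078h the lowest claimed version is 3 and the highest is 6, which matches the reading above.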
>
> I do not know why the card is not being used in DMA mode on Linux.
> Perhaps there is some setup in the Linux PCMCIA Card and Socket
> Services that I do not know about. I also do not know how Windows
> XP uses the CF card (READ SECTORS, READ MULTIPLE, or READ DMA). I
> had no trouble reading the CF card on the same hardware when I
> booted Windows XP (it is set up to boot either 32-bit Windows XP,
> 64-bit Windows XP, or 64-bit Linux).
>
> I was also wrong when I told you earlier that the Linux ATA driver
> had nothing to do with reading the CF card with a USB card reader.
> When I got the CF card to work using the CF card-to-PC card adapter,
> it appeared as a SCSI (/dev/sd) disk. I thought it would be seen as
> an IDE (/dev/hd) disk. The Linux SCSI driver is using the ATA
> driver to access the CF card "disk". This same ATA driver code
> might be used when the CF card is plugged into a USB reader. I
> don't know yet. I will not have a chance to work on this again
> until after I return from vacation in a month.
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> baker@usgs.gov
>
> I downloaded the last drafts of each version of the ATA
> specifications from the Technical Committee T13, AT Attachment, web
> site, http://www.t13.org. (You have to pay ANSI for copies of the
> final ATA standards.)
>
> ATA version 7 (d1532v1r4b ATA-ATAPI-7.pdf)
>
> 6.17.22 Word 47: READ/WRITE MULTIPLE support.
> Bits (7:0) of this word define the maximum number of sectors per
> block that the device supports for READ/WRITE MULTIPLE commands. If
> the serial interface is implemented, this field shall be set to 16
> or less.
>
> 6.17.28 Word 59: Multiple sector setting
> If bit 8 is set to one, bits (7:0) reflect the number of sectors
> currently set to transfer on a READ/WRITE MULTIPLE command. This
> field may default to the preferred value for the device (See 6.52).
>
> 6.17.40 Word 80: Major version number
> If not 0000h or FFFFh, the device claims compliance with the major
> version(s) as indicated by bits (6:3) being set to one. Values other
> than 0000h and FFFFh are bit significant. Since ATA standards
> maintain downward compatibility, a device may set more than one bit.
>
> 6.17.41 Word 81: Minor version number
> If an implementor claims that the revision of the standard they used
> to guide their implementation does not need to be reported or if the
> implementation was based upon a standard prior to the ATA-3
> standard, word 81 shall be 0000h or FFFFh.
>
> ATA version 6 (d1410r3b-ATA-ATAPI-6.pdf)
>
> 47 M F 15-8 80h
> F 7-0 00h = Reserved
> F 01h-FFh = Maximum number of sectors that shall be transferred
> per interrupt on READ/WRITE MULTIPLE commands
>
> 59 M F 15-9 Reserved
> V 8 1 = Multiple sector setting is valid
> V 7-0 xxh = Current setting for number of sectors that shall be
> transferred per interrupt on R/W Multiple command
>
> 80 M Major version number
> 0000h or FFFFh = device does not report version
> F 15 Reserved
> F 14 Reserved for ATA/ATAPI-14
> F 13 Reserved for ATA/ATAPI-13
> F 12 Reserved for ATA/ATAPI-12
> F 11 Reserved for ATA/ATAPI-11
> F 10 Reserved for ATA/ATAPI-10
> F 9 Reserved for ATA/ATAPI-9
> F 8 Reserved for ATA/ATAPI-8
> F 7 Reserved for ATA/ATAPI-7
> F 6 1 = supports ATA/ATAPI-6
> F 5 1 = supports ATA/ATAPI-5
> F 4 1 = supports ATA/ATAPI-4
> F 3 1 = supports ATA-3
> X 2 Obsolete
> X 1 Obsolete
> F 0 Reserved
>
> 81 M F Minor version number
> 0000h or FFFFh = device does not report version
> 0001h-FFFEh = see 8.15.41
>
> ATA version 5 (d1321r3-ATA-ATAPI-5.pdf)
>
> 47 X 15-8 80h
> R 7-0 00h = Reserved
> F 01h-FFh = Maximum number of sectors that shall be transferred
> per interrupt on READ/WRITE MULTIPLE commands
>
> ATA version 4 (d1153r18-ATA-ATAPI-4.pdf)
>
> 47 X 15-8 80h
> R 7-0 00h =Reserved
> F 01h-FFh = Maximum number of sectors that shall be transferred
> per interrupt on READ/WRITE MULTIPLE commands
>
> ATA version 3 (d2008r7b-ATA-3.pdf)
>
> 47 X 15-8 Vendor specific
> R 7-0 00h =Reserved
> F 01h-FFh = Maximum number of sectors that can be transferred per
> interrupt on READ/WRITE MULTIPLE commands
>
> Key:
> F = the content of the word is fixed and does not change. For
> removable media devices, these values may change when media is
> removed or changed.
> V = the contents of the word is variable and may change depending on
> the state of the device or the commands executed by the device.
> X = the content of the word is vendor specific and may be fixed or
> variable.
> R = the content of the word is reserved and shall be zero.
>
> On May 9, 2010, at 11:47 AM, Raydon Gordon wrote:
>
>> Hi Larry,
>>
>> Thank you for the explanation of your findings with respect to the
>> Linux driver. I agree that not initializing the value of “device
>> multi_count” could potentially spell disaster. I appreciate your
>> willingness to apprise me of what you find after patching the driver.
>>
>> In regard to the USB to CF card issue, I have engaged our
>> engineering team, as previously mentioned. They should be looking
>> into this issue during the coming week. I will keep you posted.
>>
>>
>> Best Regards,
>>
>> Raydon Gordon
>> rgordon@stec-inc.com
>>
>> STEC, Inc - Field Application Engineer
>> 2107 N First Street, Suite 415
>> San Jose, CA 95131
>> (800) 367-7330 x8919 (Office)
>> (408) 416-6514 (Cell)
>> (408) 452-7936 (Fax)
>>
>> From: Larry Baker [mailto:baker@usgs.gov]
>> Sent: Friday, May 07, 2010 3:49 PM
>> To: Raydon Gordon
>> Cc: Russ Sell; Joe Fletcher; Tim MacDonald; Ian Billings
>> Subject: Fwd: CF card I/O errors on Linux
>>
>> Ray,
>>
>> I have found the reason for the failure to read your CF card (using
>> a CF card-to-PCMCIA card adapter) on our Fedora 9 and 10 Linux
>> laptops: the ATA device driver in Fedora 9 and 10 is broken.
>>
>> As I mentioned before, the device multi_count setting controls
>> whether Linux uses the ATA (PIO) READ MULTIPLE command when it
>> reads from the CF card. The device multi_count setting is a device
>> configuration setting initialized by the Linux kernel ATA device
>> driver (function ata_dev_configure() in drivers/ata/libata-core.c):
>>
>> if (dev->id[59] & 0x100)
>> dev->multi_count = dev->id[59] & 0xff;
>>
>> However, this code alters the device multi_count setting only if
>> bit 8 in the IDENTIFY DEVICE word 59 is set (Multiple sector
>> setting is valid). If a device does not support multi-sector
>> transfers, bit 8 will be clear, and the device multi_count setting
>> will retain its previous value. The question is, what is that
>> value? Unless it has been initialized somewhere else, it will have
>> whatever garbage was in the memory location for the device
>> multi_count setting when the device configuration data structure
>> was allocated. That would be a bug, and the symptoms would be:
>> even though a device responds that it does not support multi-sector
>> transfers, Linux will use them anyway, and the transfer will fail
>> with an ABORT error. That is exactly what is happening to us.
>>
>> Prior to the code above, the device configuration parameters are
>> initialized:
>>
>> /* initialize to-be-configured parameters */
>> dev->flags &= ~ATA_DFLAG_CFG_MASK;
>> dev->max_sectors = 0;
>> dev->cdb_len = 0;
>> dev->n_sectors = 0;
>> dev->cylinders = 0;
>> dev->heads = 0;
>> dev->sectors = 0;
>>
>> The device multi_count setting does not appear, leading me to
>> suspect it has not been initialized to zero anywhere. Inspection
>> of the same section of code in the latest stable release of the
>> Linux kernel (2.6.33.3) shows that a line has been added that
>> initializes the device multi_count setting:
>>
>> /* initialize to-be-configured parameters */
>> dev->flags &= ~ATA_DFLAG_CFG_MASK;
>> dev->max_sectors = 0;
>> dev->cdb_len = 0;
>> dev->n_sectors = 0;
>> dev->cylinders = 0;
>> dev->heads = 0;
>> dev->sectors = 0;
>> dev->multi_count = 0;
>>
>> Fedora 10 Linux uses kernel release 2.6.27.41-170. The first Linux
>> release that contains the missing line is 2.6.30. Thus, any ATA
>> device that requires PIO transfers and does not support multi-
>> sector transfers will probably fail for Linux kernels older than
>> release 2.6.30. I will try to patch the Linux ATA device drivers
>> on our Fedora 9 and Fedora 10 laptops and verify that this explains
>> the errors we have been getting.
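
The suspected failure mode can be modeled outside the kernel. The sketch below is a simplified illustration, not the driver code: struct stale_dev and configure_multi() are hypothetical, and the init_first argument stands in for the presence or absence of the added "dev->multi_count = 0;" line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of interest. */
struct stale_dev {
	unsigned int multi_count;
};

/*
 * Model of the multi_count setup. With init_first false this mimics
 * the older code, which only writes multi_count when word 59 bit 8
 * is set; with init_first true it mimics the newer code, which
 * clears the field first.
 */
static void configure_multi(struct stale_dev *dev, const uint16_t *id,
			    bool init_first)
{
	if (init_first)
		dev->multi_count = 0;
	if (id[59] & 0x100)
		dev->multi_count = id[59] & 0xff;
}
```

If multi_count holds a leftover value such as 16 and the device clears bit 8 of word 59, the older path leaves 16 in place, while the newer path ends with 0.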
>>
>> CF cards inserted in CF card-to-PCMCIA card adapters into a PCMCIA
>> card slot appear to Linux as ATA devices (/dev/hd devices). CF
>> cards inserted in USB CF card readers appear to Linux as SCSI
>> devices (/dev/sd devices). That is not the code I have been
>> looking at. The problems we have encountered reading CF cards
>> using USB CF card readers uses different device drivers. I'll work
>> on that next when I can find a test setup that works and one that
>> fails.
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> baker@usgs.gov
>>
>> Begin forwarded message:
>>
>>
>> From: Larry Baker <baker@usgs.gov>
>> Date: May 6, 2010 8:07:22 PM PDT
>> To: Raydon Gordon <rgordon@stec-inc.com>
>> Cc: Russ Sell <rwsell@usgs.gov>, Joe Fletcher <jfletcher@usgs.gov>,
>> Tim MacDonald <tmacdonald@usgs.gov>
>> Subject: Re: CF card I/O errors on Linux
>>
>> Ray,
>>
>> I have decoded the commands being sent to the CF card. The Linux
>> message log file sequence repeats:
>>
>> ... ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> ... ata10.00: cmd c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio
>> 16384 in
>> ... res 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1
>> (device error)
>> ... ata10.00: status: { DRDY ERR }
>> ... ata10.00: error: { ABRT }
>> ... ata10.00: configured for PIO0
>> ... ata10: EH complete
>>
>> The cmd=C4h command being executed is (PIO) READ MULTIPLE. The
>> arguments are Count=20h (32), LBA=0, Device=E0h (OBSOLETE_BIT_15 +
>> LBA_BIT + OBSOLETE_BIT13).
>>
>> The res=51h response is DRDY (Device Ready) + DSC (bit 4, Device Seek
>> Complete) + ERR (Error); DRQ (Data Request, bit 3) is not set. The
>> error bits=04h are ABRT (Abort).
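
The register values in the log can be decoded bit by bit. This is a minimal sketch: the bit names follow the older ATA Status register naming (BSY, DRDY, DF, DSC, DRQ, CORR, IDX, ERR), some of which later revisions mark obsolete or rename, and the helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Status register bit names, indexed by bit number (older ATA naming). */
static const char *const status_bit_names[8] = {
	"ERR", "IDX", "CORR", "DRQ", "DSC", "DF", "DRDY", "BSY"
};

/*
 * Writes the names of the bits set in status into out, most
 * significant bit first, separated by spaces. outlen must be >= 1.
 */
static void decode_status(uint8_t status, char *out, size_t outlen)
{
	out[0] = '\0';
	for (int bit = 7; bit >= 0; bit--) {
		if (!(status & (1u << bit)))
			continue;
		if (out[0] != '\0')
			strncat(out, " ", outlen - strlen(out) - 1);
		strncat(out, status_bit_names[bit],
			outlen - strlen(out) - 1);
	}
}

/* ABRT is bit 2 of the Error register. */
static int error_is_abrt(uint8_t error)
{
	return (error & 0x04) != 0;
}

/* Bytes requested by a 28-bit command; a Count of 0 means 256 sectors. */
static unsigned int transfer_bytes(uint8_t count)
{
	unsigned int sectors = count ? count : 256;

	return sectors * 512u;
}
```

For status 51h the decoder yields "DRDY DSC ERR", error 04h has ABRT set, and a Count of 20h corresponds to 16384 bytes, matching the "pio 16384 in" in the log.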
>>
>> The ATA spec (http://www.t13.org/Documents/UploadedDocuments/docs2006/D1699r3f-ATA8-ACS.pdf
>> ) for READ MULTIPLE (page 155) says:
>>
>> 7.30 READ MULTIPLE - C4h, PIO data-in
>>
>> 7.30.2 Description
>>
>> If the READ MULTIPLE command is received when read multiple
>> commands are disabled, the READ MULTIPLE
>> operation shall be rejected with command aborted.
>>
>> The ATA spec for the Abort error bit (page 53) says:
>>
>> 6.2 Error Bits
>>
>> 6.2.1 Abort (ABRT)
>>
>> Error bit 2. Abort shall be set to one if the command is not
>> supported. Abort may be set to one if the device is not
>> able to complete the action requested by the command. Abort shall
>> also be set to one if an address outside of
>> the range of user-accessible addresses is requested if IDNF is not
>> set to one.
>>
>> From this, I conclude that the "M2+ CF 9.0 Rev: K116" CF card does
>> not support the ATA (PIO) READ MULTIPLE command. Is that true?
>>
>> I found the Linux kernel device driver code (function
>> ata_rwcmd_protocol() in drivers/ata/libata-core.c) that selects PIO
>> MULTI vs. PIO transfers. It tests the ATA device configuration
>> field "multi_count" to decide whether to use (PIO) multi-sector
>> transfers. multi_count is set by the Linux kernel ATA device
>> driver (function ata_dev_configure(), also in drivers/ata/libata-
>> core.c):
>>
>> if (dev->id[59] & 0x100)
>> dev->multi_count = dev->id[59] & 0xff;
>>
>> This code tests whether bit 8 (Multiple sector setting is valid;
>> see page 92 in the ATA spec) is set in id[59] (word 59 of the
>> response to the IDENTIFY DEVICE command), and, if so, sets the
>> value of multi_count to the (presumably non-zero) value in the
>> lower 8 bits in id[59] (Current setting for number of logical
>> sectors that shall be transferred per DRQ data block on READ/WRITE
>> Multiple commands).
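
The selection described above reduces to a simple choice for 28-bit PIO transfers. The sketch below is a deliberate simplification based only on this description, not a copy of ata_rwcmd_protocol(): it ignores LBA48, FUA, and DMA, and pio_rw_cmd() is a hypothetical name. The opcodes are the standard ones for READ/WRITE SECTORS and READ/WRITE MULTIPLE.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Picks the 28-bit PIO opcode: the MULTIPLE variant when multi_count
 * is non-zero, the single-sector-per-DRQ variant otherwise.
 */
static uint8_t pio_rw_cmd(unsigned int multi_count, bool is_write)
{
	if (multi_count)
		return is_write ? 0xC5 : 0xC4;	/* WRITE/READ MULTIPLE */
	return is_write ? 0x30 : 0x20;		/* WRITE/READ SECTORS */
}
```

With multi_count set to 1, as read from word 59 of this card, a read comes out as C4h, the command seen in the log; with multi_count at 0 it would be 20h.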
>>
>> From this, I conclude that the "M2+ CF 9.0 Rev: K116" CF card is
>> responding to the IDENTIFY DEVICE command that it does support the
>> ATA (PIO) READ MULTIPLE command. Is that true?
>>
>> If both these conclusions are true, they are inconsistent. Do you
>> agree with my conclusions? Do you have any way to test this
>> hypothesis?
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> baker@usgs.gov
>>
>> On May 6, 2010, at 4:38 PM, Raydon Gordon wrote:
>>
>>
>> Larry,
>>
>> I will have to check.
>>
>> Would it be possible for you to send me a photo of the backside of
>> the card, which contains version and part number information?
>>
>> Also, which brand and model notebook computer are you using?
>>
>>
>> Thanks!
>>
>>
>> Best Regards,
>>
>> Raydon Gordon
>> rgordon@stec-inc.com
>>
>> STEC, Inc - Field Application Engineer
>> 2107 N First Street, Suite 415
>> San Jose, CA 95131
>> (800) 367-7330 x8919 (Office)
>> (408) 416-6514 (Cell)
>> (408) 452-7936 (Fax)
>>
>>
>> -----Original Message-----
>> From: Larry Baker [mailto:baker@usgs.gov]
>> Sent: Thursday, May 06, 2010 4:32 PM
>> To: Raydon Gordon
>> Subject: Re: CF card I/O errors on Linux
>>
>> Ray,
>>
>> Do you have any laptop Linux systems in house with PCMCIA card slots?
>> I can send you a CF card-to-PCMCIA card adapter that I have tried (I
>> tried 3 or 4 of them).
>>
>> Also, which distributions/versions of Linux do you use? I believe the
>> two systems we have (both laptops) run Fedora: one runs Fedora 9, the
>> other runs Fedora 10.
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> baker@usgs.gov
>>
>> On May 6, 2010, at 4:26 PM, Raydon Gordon wrote:
>>
>> > Ok, Larry. Since we have not seen this failure in-house on the Linux
>> > systems we run, I think it is best to look at the CF card reader. At
>> > least this way, we should be able to replicate the problem and track
>> > down the root cause.
>> >
>> > It is no more difficult for us to diagnose this failure on a "cheap"
>> > card reader than it would be on a full-fledged system. We use an
>> > interposer card and a bus analyzer, so there should not be a problem.
>> >
>> > Best Regards,
>> >
>> > Raydon Gordon
>> > rgordon@stec-inc.com
>> >
>> > STEC, Inc - Field Application Engineer
>> > 2107 N First Street, Suite 415
>> > San Jose, CA 95131
>> > (800) 367-7330 x8919 (Office)
>> > (408) 416-6514 (Cell)
>> > (408) 452-7936 (Fax)
>> >
>> > -----Original Message-----
>> > From: Larry Baker [mailto:baker@usgs.gov]
>> > Sent: Thursday, May 06, 2010 11:24 AM
>> > To: Raydon Gordon
>> > Subject: Re: CF card I/O errors on Linux
>> >
>> > Ray,
>> >
>> > No problem. I'll try to spend some time to decode the error messages
>> > I included for you. Note: the errors I sent were not using a CF card
>> > reader, but a CF card-to-PCMCIA card adapter, which, I believe, is not
>> > an active device. That is, it passes the pins on the CF card to the
>> > proper pins in a PCMCIA card slot. Presumably, that setup should have
>> > worked, and should be simpler to diagnose than a (likely, cheap) CF
>> > card reader.
>> >
>> > Larry Baker
>> > US Geological Survey
>> > 650-329-5608
>> > baker@usgs.gov
>> >
>> > On May 5, 2010, at 8:19 PM, Raydon Gordon wrote:
>> >
>> >> Larry,
>> >>
>> >> Sorry for the delay in responding. We are currently in the process
>> >> of investigating the alleged CF card reader compatibility issue.
>> >>
>> >> At this time, the root cause has not been identified as we have not
>> >> been able to witness the problem first hand. I'm not saying that it
>> >> doesn't exist, just that until now we have not seen a problem with
>> >> any CF card readers. I have purchased a card reader suspected of
>> >> exhibiting this issue and we will begin examining its interaction
>> >> with the new STEC CF cards shortly.
>> >>
>> >> We will be able to provide more insight once our investigation is
>> >> complete. At this time I do not have an ETA, however, I can assure
>> >> you that this reported problem is being investigated.
>> >>
>> >> Best Regards,
>> >>
>> >> Raydon Gordon
>> >> rgordon@stec-inc.com
>> >>
>> >> STEC, Inc - Field Application Engineer
>> >> 2107 N First Street, Suite 415
>> >> San Jose, CA 95131
>> >> (800) 367-7330 x8919 (Office)
>> >> (408) 416-6514 (Cell)
>> >> (408) 452-7936 (Fax)
>> >>
>> >> -----Original Message-----
>> >> From: Jesse Molina
>> >> Sent: Monday, April 19, 2010 3:45 PM
>> >> To: Larry Baker; Mark Swiney
>> >> Cc: Raydon Gordon
>> >> Subject: RE: CF card I/O errors on Linux
>> >>
>> >> Hi Larry,
>> >>
>> >> I am cc'ing Raydon Gordon who is our local FAE. Raydon can take a
>> >> look at things and provide some insight. Thanks.
>> >>
>> >> Best Regards,
>> >> Jesse
>> >>
>> >> Jesse Molina
>> >> Field Sales Engineer
>> >> STEC, Inc.
>> >> Tel: 800-367-7330 x 8916
>> >> Cell: 408-605-5583
>> >> Website: www.stec-inc.com
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Larry Baker [mailto:baker@usgs.gov]
>> >> Sent: Monday, April 19, 2010 12:04 PM
>> >> To: Mark Swiney; Jesse Molina
>> >> Subject: CF card I/O errors on Linux
>> >>
>> >> We have several M2+ CF 9.0 Rev: K116 cards. They were supplied to us
>> >> by an instrument manufacturer (Refraction Technology) to record
>> >> seismic data. We use either USB CF card readers or CF-to-PCCARD
>> >> adapters to connect the CF cards to Linux laptops for playback. They
>> >> seem to be very temperamental. They work with some USB CF card
>> >> readers, but not with others. Sometimes they take an excruciatingly
>> >> long time to play back. I have uniformly bad luck using them with any
>> >> of the many (5V and low-voltage keyed) CF-to-PCCARD adapters. (The
>> >> Linux kernel messages are below. I have not tried to decode the
>> >> SCSI/ATA responses.) The problems do not seem to appear when we use
>> >> Windows XP -- only Linux -- I don't know why. However, our data
>> >> management systems are all Linux; we do not use Windows. The Linux
>> >> systems I have tried are Fedora 9 and 10. We do not have this problem
>> >> with other brands of CF cards (e.g., Kingston). However, our
>> >> instrument supplier uses your cards. Do your engineers have any
>> >> comments or suggestions?
>> >>
>> >> Larry Baker
>> >> US Geological Survey
>> >> 650-329-5608
>> >> baker@usgs.gov
>> >>
>> >> Portions of the Fedora Linux 10 /var/log/messages file with error
>> >> messages and drive status information following insertion of
>> >> CF-to-PCCARD adapter:
>> >>
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0
>> SAct 0x0
>> >> SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd
>> >> c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio 16384 in
>> >> Apr 19 04:45:42 necv50 kernel: res
>> >> 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0
>> SAct 0x0
>> >> SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd
>> >> c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio 16384 in
>> >> Apr 19 04:45:42 necv50 kernel: res
>> >> 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0
>> SAct 0x0
>> >> SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd
>> >> c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio 16384 in
>> >> Apr 19 04:45:42 necv50 kernel: res
>> >> 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0
>> SAct 0x0
>> >> SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd
>> >> c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio 16384 in
>> >> Apr 19 04:45:42 necv50 kernel: res
>> >> 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0
>> SAct 0x0
>> >> SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd
>> >> c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio 16384 in
>> >> Apr 19 04:45:42 necv50 kernel: res
>> >> 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0
>> SAct 0x0
>> >> SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd
>> >> c4/00:20:00:00:00/00:00:00:00:00/e0 tag 0 pio 16384 in
>> >> Apr 19 04:45:42 necv50 kernel: res
>> >> 51/04:20:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Result:
>> >> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Sense Key :
>> Aborted
>> >> Command [current] [descriptor]
>> >> Apr 19 04:45:42 necv50 kernel: Descriptor sense data with sense
>> >> descriptors (in hex):
>> >> Apr 19 04:45:42 necv50 kernel: 72 0b 00 00 00 00 00 0c 00
>> 0a
>> >> 80
>> >> 00 00 00 00 00
>> >> Apr 19 04:45:42 necv50 kernel: 00 00 00 00
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Add. Sense: No
>> >> additional sense information
>> >> Apr 19 04:45:42 necv50 kernel: end_request: I/O error, dev sdb,
>> >> sector 0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
>> >> Apr 19 04:45:42 necv50 kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
>> >> Apr 19 04:45:42 necv50 kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
>> >> Apr 19 04:45:42 necv50 kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
>> >> Apr 19 04:45:42 necv50 kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
>> >> Apr 19 04:45:42 necv50 kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: cmd c4/00:08:00:00:00/00:00:00:00:00/e0 tag 0 pio 4096 in
>> >> Apr 19 04:45:42 necv50 kernel: res 51/04:08:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: status: { DRDY ERR }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: error: { ABRT }
>> >> Apr 19 04:45:42 necv50 kernel: ata10.00: configured for PIO0
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
>> >> Apr 19 04:45:42 necv50 kernel: Descriptor sense data with sense descriptors (in hex):
>> >> Apr 19 04:45:42 necv50 kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
>> >> Apr 19 04:45:42 necv50 kernel: 00 00 00 00
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Add. Sense: No additional sense information
>> >> Apr 19 04:45:42 necv50 kernel: end_request: I/O error, dev sdb, sector 0
>> >> Apr 19 04:45:42 necv50 kernel: ata10: EH complete
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] 3844512 512-byte hardware sectors (1968 MB)
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Write Protect is off
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] 3844512 512-byte hardware sectors (1968 MB)
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Write Protect is off
>> >> Apr 19 04:45:42 necv50 kernel: sd 9:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> >>
>> >>
>> >> PROPRIETARY-CONFIDENTIAL INFORMATION INCLUDED
>> >>
>> >> This electronic transmission, and any documents attached hereto, may
>> >> contain confidential, proprietary and/or legally privileged
>> >> information. The information is intended only for use by the
>> >> recipient named above. If you received this electronic message in
>> >> error, please notify the sender and delete the electronic message.
>> >> Any disclosure, copying, distribution, or use of the contents of
>> >> information received in error is strictly prohibited, and violators
>> >> will be pursued legally.
>> >
>
* Re: Fwd: CF card I/O errors on Linux
2010-05-12 21:19 ` Fwd: CF card I/O errors on Linux Larry Baker
@ 2010-05-20 9:50 ` Tejun Heo
2010-05-23 13:58 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2010-05-20 9:50 UTC (permalink / raw)
To: Larry Baker; +Cc: jgarzik, linux-ide, Sergei Shtylyov, Alan Cox
Hello,
On 05/12/2010 11:19 PM, Larry Baker wrote:
> 1. I needed to see more of the IDENTIFY DEVICE response, so I modified
> the probe printout to include words 47, 49, 59, and 80-88, like this:
>
> - "%s: cfg 49:%04x 82:%04x 83:%04x 84:%04x "
> - "85:%04x 86:%04x 87:%04x 88:%04x\n",
> + "%s: cfg 47:%04x 49:%04x 59:%04x "
> + "80-88:%04x %04x %04x %04x %04x %04x %04x %04x %04x\n",
> __func__,
> - id[49], id[82], id[83], id[84],
> - id[85], id[86], id[87], id[88]);
> + id[47], id[49], id[59], id[80], id[81], id[82],
> + id[83], id[84], id[85], id[86], id[87], id[88]);
I think it would be better to add a module parameter which makes
libata dump the whole IDENTIFY DEVICE data. The message is only useful
for development and debugging anyway. I'll add such an option.
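For illustration only, a minimal user-space sketch of what dumping the
whole IDENTIFY block could look like. The function names, the row
layout, and the 8-words-per-row choice are assumptions made here; this
is not the option that was later added to libata.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ID_WORDS      256	/* IDENTIFY DEVICE data is 256 16-bit words */
#define WORDS_PER_ROW 8		/* layout choice for this sketch only */

/*
 * Format one row as "NNN: xxxx xxxx ...", where NNN is the index of the
 * first word in the row. Returns the number of characters written.
 * The caller must pass first <= ID_WORDS - WORDS_PER_ROW.
 */
int format_id_row(const uint16_t *id, unsigned int first,
		  char *buf, size_t len)
{
	int used = snprintf(buf, len, "%3u:", first);

	for (unsigned int i = 0;
	     i < WORDS_PER_ROW && used >= 0 && (size_t)used < len; i++)
		used += snprintf(buf + used, len - used, " %04x",
				 (unsigned int)id[first + i]);
	return used;
}

/* Print all 256 words, one row per line. */
void dump_identify(const uint16_t *id, FILE *out)
{
	char row[64];

	for (unsigned int i = 0; i < ID_WORDS; i += WORDS_PER_ROW) {
		format_id_row(id, i, row, sizeof(row));
		fprintf(out, "%s\n", row);
	}
}
```

With this layout, words 47 and 59 (the two that matter for R/W
Multiple) land at the end of the row starting at 40 and in the row
starting at 56.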
> 2. The device I was having trouble with claims to support multi-sector
> transfers, with a maximum Sector Count of 1. There's not much point
> then, is there? I suggest adding a test for max >= 2 in the conditional
> where dev->multi_count is assigned:
>
> /* get current R/W Multiple count setting */
> if ((dev->id[47] >> 8) == 0x80 && (dev->id[59] & 0x100)) {
> unsigned int max = dev->id[47] & 0xff;
> unsigned int cnt = dev->id[59] & 0xff;
> + /* no point unless max >= 2 */
> + if (max >= 2)
> /* only recognize/allow powers of two here */
> if (is_power_of_2(max) && is_power_of_2(cnt))
> if (cnt <= max)
> dev->multi_count = cnt;
> }
Yeap, this does make sense. Sergei, Alan, what do you guys think?
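As a stand-alone sketch of the check with the proposed max >= 2 test
folded in (user-space C; multi_count_from_id() and is_pow2() are names
made up for this example, standing in for the open-coded kernel logic
and is_power_of_2()):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n is a nonzero power of two. */
static bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: return the R/W Multiple count to use, or 0 to
 * leave multi-sector transfers disabled. id points at the 256 IDENTIFY
 * DEVICE words.
 */
unsigned int multi_count_from_id(const uint16_t *id)
{
	/* word 47 high byte must be 0x80, word 59 bit 8 marks cnt valid */
	if ((id[47] >> 8) != 0x80 || !(id[59] & 0x100))
		return 0;

	unsigned int max = id[47] & 0xff;
	unsigned int cnt = id[59] & 0xff;

	/* a maximum of one sector per block gains nothing */
	if (max < 2)
		return 0;

	/* only recognize/allow powers of two */
	if (!is_pow2(max) || !is_pow2(cnt))
		return 0;

	return cnt <= max ? cnt : 0;
}
```

A card reporting word 47 = 0x8001 (maximum of 1) gets 0 back, so the
driver would fall back to plain single-sector PIO reads and writes.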
> 3. In the error recovery code (in libata-eh.c?), I suggest a way to
> recover for devices like the one I've got that improperly implement
> READ/WRITE MULTI commands:
>
> If the failed command is one of the READ/WRITE MULTI commands (entries
> 0-4 in ata_rw_cmds[]), and the device response is ABORT, use your
> HORKAGE mechanism to set a flag like ATA_HORKAGE_BROKEN_MULTI to skip
> setting dev->multi_count in libata-core.c (the code in 2., above), i.e.,
> disable multi-sector transfers, then reconfigure the device to disable
> multi-sector transfers (call ata_dev_configure(), I guess, so the new
> configuration gets printed out in dmesg) and retry the transfer. Any
> future calls to ata_dev_configure() for that device will not attempt to
> enable multi-sector transfers because the ATA_HORKAGE_BROKEN_MULTI is set:
>
> + /* don't try to enable R/W multi if it is broken */
> + if (!(dev->horkage & ATA_HORKAGE_BROKEN_MULTI))
So, turn off MULTI on ABORT.... Oh yeah, I can add that to the
existing speed down logic. Would you be available to test patches?
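For reference, a small sketch of the condition being described: a
READ/WRITE MULTIPLE command that completes with ERR set in the status
register and ABRT set in the error register. The opcode and bit values
are the standard ATA ones; the enum and function names are invented for
this example and do not exist in libata.

```c
#include <stdbool.h>
#include <stdint.h>

/* ATA opcodes for the PIO multi-sector commands. */
enum {
	CMD_READ_MULTI          = 0xC4,
	CMD_WRITE_MULTI         = 0xC5,
	CMD_READ_MULTI_EXT      = 0x29,
	CMD_WRITE_MULTI_EXT     = 0x39,
	CMD_WRITE_MULTI_FUA_EXT = 0xCE,
};

/* Register bits tested below. */
enum {
	STATUS_ERR = 0x01,	/* status register bit 0: error */
	ERROR_ABRT = 0x04,	/* error register bit 2: command aborted */
};

/* True for any of the READ/WRITE MULTIPLE opcodes. */
bool is_multi_cmd(uint8_t cmd)
{
	switch (cmd) {
	case CMD_READ_MULTI:
	case CMD_WRITE_MULTI:
	case CMD_READ_MULTI_EXT:
	case CMD_WRITE_MULTI_EXT:
	case CMD_WRITE_MULTI_FUA_EXT:
		return true;
	default:
		return false;
	}
}

/* True when a multi command failed with ERR + ABRT. */
bool multi_was_aborted(uint8_t cmd, uint8_t status, uint8_t error)
{
	return is_multi_cmd(cmd) &&
	       (status & STATUS_ERR) && (error & ERROR_ABRT);
}
```

The failures in the quoted log fit this pattern: cmd c4 with res 51/04
is READ MULTIPLE returning status 0x51 (DRDY, ERR) and error 0x04
(ABRT).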
Thanks.
--
tejun
* Re: Fwd: CF card I/O errors on Linux
2010-05-20 9:50 ` Tejun Heo
@ 2010-05-23 13:58 ` Tejun Heo
0 siblings, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2010-05-23 13:58 UTC (permalink / raw)
To: Larry Baker; +Cc: jgarzik, linux-ide, Sergei Shtylyov, Alan Cox
On 05/20/2010 11:50 AM, Tejun Heo wrote:
>> If the failed command is one of the READ/WRITE MULTI commands (entries
>> 0-4 in ata_rw_cmds[]), and the device response is ABORT, use your
>> HORKAGE mechanism to set a flag like ATA_HORKAGE_BROKEN_MULTI to skip
>> setting dev->multi_count in libata-core.c (the code in 2., above), i.e.,
>> disable multi-sector transfers, then reconfigure the device to disable
>> multi-sector transfers (call ata_dev_configure(), I guess, so the new
>> configuration gets printed out in dmesg) and retry the transfer. Any
>> future calls to ata_dev_configure() for that device will not attempt to
>> enable multi-sector transfers because the ATA_HORKAGE_BROKEN_MULTI is set:
>>
>> + /* don't try to enable R/W multi if it is broken */
>> + if (!(dev->horkage & ATA_HORKAGE_BROKEN_MULTI))
>
> So, turn off MULTI on ABORT.... Oh yeah, I can add that to the
> existing speed down logic. Would you be available to test patches?
Something like the following. Please test whether it works as
expected. It should turn off multi after two device errors during
partition scan.
Thanks.
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c47373f..1f97189 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2416,7 +2416,8 @@ int ata_dev_configure(struct ata_device *dev)
 		dev->n_sectors = ata_id_n_sectors(id);
 
 		/* get current R/W Multiple count setting */
-		if ((dev->id[47] >> 8) == 0x80 && (dev->id[59] & 0x100)) {
+		if (!(dev->flags & ATA_DFLAG_MULTI_OFF) &&
+		    (dev->id[47] >> 8) == 0x80 && (dev->id[59] & 0x100)) {
 			unsigned int max = dev->id[47] & 0xff;
 			unsigned int cnt = dev->id[59] & 0xff;
 			/* only recognize/allow powers of two here */
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index f77a673..60b352f 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -53,6 +53,7 @@ enum {
 	ATA_EH_SPDN_SPEED_DOWN = (1 << 1),
 	ATA_EH_SPDN_FALLBACK_TO_PIO = (1 << 2),
 	ATA_EH_SPDN_KEEP_ERRORS = (1 << 3),
+	ATA_EH_SPDN_MULTI_OFF = (1 << 4),
 
 	/* error flags */
 	ATA_EFLAG_IS_IO = (1 << 0),
@@ -1805,13 +1806,13 @@ static int speed_down_verdict_cb(struct ata_ering_entry *ent, void *void_arg)
  * occurred during last 5 mins, SPEED_DOWN and FALLBACK_TO_PIO.
  *
  * 2. If more than one DUBIOUS_TOUT_HSM or DUBIOUS_UNK_DEV errors
- * occurred during last 5 mins, NCQ_OFF.
+ * occurred during last 5 mins, NCQ_OFF and MULTI_OFF.
  *
  * 3. If more than 8 ATA_BUS, TOUT_HSM or UNK_DEV errors
  * ocurred during last 5 mins, FALLBACK_TO_PIO
  *
  * 4. If more than 3 TOUT_HSM or UNK_DEV errors occurred
- * during last 10 mins, NCQ_OFF.
+ * during last 10 mins, NCQ_OFF and MULTI_OFF.
  *
  * 5. If more than 3 ATA_BUS or TOUT_HSM errors, or more than 6
  * UNK_DEV errors occurred during last 10 mins, SPEED_DOWN.
@@ -1841,7 +1842,8 @@ static unsigned int ata_eh_speed_down_verdict(struct ata_device *dev)
 
 	if (arg.nr_errors[ATA_ECAT_DUBIOUS_TOUT_HSM] +
 	    arg.nr_errors[ATA_ECAT_DUBIOUS_UNK_DEV] > 1)
-		verdict |= ATA_EH_SPDN_NCQ_OFF | ATA_EH_SPDN_KEEP_ERRORS;
+		verdict |= ATA_EH_SPDN_NCQ_OFF | ATA_EH_SPDN_MULTI_OFF |
+			ATA_EH_SPDN_KEEP_ERRORS;
 
 	if (arg.nr_errors[ATA_ECAT_ATA_BUS] +
 	    arg.nr_errors[ATA_ECAT_TOUT_HSM] +
@@ -1855,7 +1857,7 @@ static unsigned int ata_eh_speed_down_verdict(struct ata_device *dev)
 
 	if (arg.nr_errors[ATA_ECAT_TOUT_HSM] +
 	    arg.nr_errors[ATA_ECAT_UNK_DEV] > 3)
-		verdict |= ATA_EH_SPDN_NCQ_OFF;
+		verdict |= ATA_EH_SPDN_NCQ_OFF | ATA_EH_SPDN_MULTI_OFF;
 
 	if (arg.nr_errors[ATA_ECAT_ATA_BUS] +
 	    arg.nr_errors[ATA_ECAT_TOUT_HSM] > 3 ||
@@ -1908,6 +1910,14 @@ static unsigned int ata_eh_speed_down(struct ata_device *dev,
 		goto done;
 	}
 
+	/* turn off multi? */
+	if ((verdict & ATA_EH_SPDN_MULTI_OFF) && (dev->flags & ATA_DFLAG_PIO)) {
+		dev->flags |= ATA_DFLAG_MULTI_OFF;
+		ata_dev_printk(dev, KERN_WARNING,
+			       "PIO multi disabled due to excessive errors\n");
+		goto done;
+	}
+
 	/* speed down? */
 	if (verdict & ATA_EH_SPDN_SPEED_DOWN) {
 		/* speed down SATA link speed if possible */
diff --git a/include/linux/libata.h b/include/linux/libata.h
index ee84e7e..a145d80 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -147,6 +147,7 @@ enum {
 	ATA_DFLAG_DUBIOUS_XFER = (1 << 16), /* data transfer not verified */
 	ATA_DFLAG_NO_UNLOAD = (1 << 17), /* device doesn't support unload */
 	ATA_DFLAG_UNLOCK_HPA = (1 << 18), /* unlock HPA */
+	ATA_DFLAG_MULTI_OFF = (1 << 19), /* turn off PIO multi mode */
 	ATA_DFLAG_INIT_MASK = (1 << 24) - 1,
 
 	ATA_DFLAG_DETACH = (1 << 24),
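
To make the intended behaviour of the patch above easier to follow, here
is a simplified user-space model of just the "more than one dubious
error" rule and the new multi-off step. It is a sketch, not the kernel
code: the names are shortened, the SPDN_NCQ_OFF value and the device
flag bit positions are assumptions made for the example, and the error
ring and time windows are left out entirely.

```c
#include <stdbool.h>

/* Verdict bits (SPDN_NCQ_OFF value is an assumption for this sketch). */
enum {
	SPDN_NCQ_OFF     = 1 << 0,
	SPDN_KEEP_ERRORS = 1 << 3,
	SPDN_MULTI_OFF   = 1 << 4,
};

/* Device flags; bit positions here are illustrative only. */
enum {
	DFLAG_PIO       = 1 << 0,	/* device is using PIO transfers */
	DFLAG_MULTI_OFF = 1 << 1,	/* PIO multi mode turned off */
};

/* Rule 2: more than one dubious error within the window. */
unsigned int dubious_verdict(unsigned int dubious_tout_hsm,
			     unsigned int dubious_unk_dev)
{
	unsigned int verdict = 0;

	if (dubious_tout_hsm + dubious_unk_dev > 1)
		verdict |= SPDN_NCQ_OFF | SPDN_MULTI_OFF | SPDN_KEEP_ERRORS;
	return verdict;
}

/* Returns true when multi mode was switched off for a PIO device. */
bool apply_multi_off(unsigned int verdict, unsigned int *dev_flags)
{
	if ((verdict & SPDN_MULTI_OFF) && (*dev_flags & DFLAG_PIO)) {
		*dev_flags |= DFLAG_MULTI_OFF;
		return true;
	}
	return false;
}
```

In this model one dubious error yields no verdict, while a second one
sets SPDN_MULTI_OFF, and only a device flagged as PIO has multi mode
turned off. Once the flag is set, the ata_dev_configure() hunk skips
reading the multi count on the next reconfiguration.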