* Handling Asynchronous Notification when IO are outstanding
@ 2010-03-09 0:27 Gwendal Grignou
2010-03-24 23:48 ` Tejun Heo
0 siblings, 1 reply; 2+ messages in thread
From: Gwendal Grignou @ 2010-03-09 0:27 UTC (permalink / raw)
To: IDE/ATA development list
I am working with a Marvell 7042 controller and a SiI3276 port
multiplier [PMP], and I would like to handle asynchronous notification
[AN] properly.
However, if a command is outstanding when the PMP raises an AN, the
port is frozen, which prevents the _autopsy_ error-handling code from
doing its work.
For example, consider a disk behind a port multiplier that has a power
glitch while a command is outstanding. The PMP detects the signal loss
and sends an AN.
In sata_mv.c, mv_err_intr() is called and detects the notification: it
records the information in the error descriptor and calls
ata_port_schedule_eh() via sata_async_notification().
However, when we enter ata_scsi_error() with a command outstanding,
__ata_port_freeze() is called, which prevents sata_scr_read() from
succeeding in ata_eh_link_autopsy():
Feb 25 02:11:57 bdfl11 kernel: ata4.00: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.01: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.02: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.03: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.04: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.05: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.15: exception Emask 0x4 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.15: edma_err_cause=02000100 pp_flags=00000005, fis_cause=00008200
Feb 25 02:11:57 bdfl11 kernel: ata4.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.04: cmd ca/00:80:e7:78:56/00:00:00:00:00/e8 tag 3 dma 65536 out
Feb 25 02:11:57 bdfl11 kernel: res 50/00:00:4e:10:45/00:00:00:00:00/e8 Emask 0x4 (timeout)
Feb 25 02:11:57 bdfl11 kernel: ata4.04: status: { DRDY }
Feb 25 02:11:57 bdfl11 kernel: ata4.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.15: hard resetting link
Feb 25 02:11:58 bdfl11 kernel: ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 25 02:11:58 bdfl11 kernel: ata4.00: hard resetting link
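To make the Emask values in the log above easier to read, here is a
minimal user-space sketch of a decoder. The bit assignments are written
from memory of enum ata_completion_errors in include/linux/libata.h and
should be checked against the tree in use; decode_emask() and the table
are hypothetical helpers, not kernel functions. Under those assumptions,
0x40 is AC_ERR_SYSTEM (which, as far as I can tell, is what the
internal-command path returns for a frozen port), 0x4 is AC_ERR_TIMEOUT,
and 0x100 is AC_ERR_OTHER.

```c
#include <stdio.h>
#include <string.h>

/* Bit values as recalled from enum ata_completion_errors in
 * include/linux/libata.h; treat them as an assumption and verify. */
enum {
	AC_ERR_DEV        = 1 << 0,
	AC_ERR_HSM        = 1 << 1,
	AC_ERR_TIMEOUT    = 1 << 2,
	AC_ERR_MEDIA      = 1 << 3,
	AC_ERR_ATA_BUS    = 1 << 4,
	AC_ERR_HOST_BUS   = 1 << 5,
	AC_ERR_SYSTEM     = 1 << 6,
	AC_ERR_INVALID    = 1 << 7,
	AC_ERR_OTHER      = 1 << 8,
	AC_ERR_NODEV_HINT = 1 << 9,
	AC_ERR_NCQ        = 1 << 10
};

/* Hypothetical lookup table, ordered from the lowest bit upward. */
struct emask_name {
	unsigned int bit;
	const char *name;
};

static const struct emask_name emask_names[] = {
	{ AC_ERR_DEV,        "AC_ERR_DEV" },
	{ AC_ERR_HSM,        "AC_ERR_HSM" },
	{ AC_ERR_TIMEOUT,    "AC_ERR_TIMEOUT" },
	{ AC_ERR_MEDIA,      "AC_ERR_MEDIA" },
	{ AC_ERR_ATA_BUS,    "AC_ERR_ATA_BUS" },
	{ AC_ERR_HOST_BUS,   "AC_ERR_HOST_BUS" },
	{ AC_ERR_SYSTEM,     "AC_ERR_SYSTEM" },
	{ AC_ERR_INVALID,    "AC_ERR_INVALID" },
	{ AC_ERR_OTHER,      "AC_ERR_OTHER" },
	{ AC_ERR_NODEV_HINT, "AC_ERR_NODEV_HINT" },
	{ AC_ERR_NCQ,        "AC_ERR_NCQ" }
};

/* Write the '|'-separated names of the bits set in emask into buf
 * and return buf. An emask of 0 yields an empty string. */
static char *decode_emask(unsigned int emask, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(emask_names) / sizeof(emask_names[0]); i++) {
		int n;

		if (!(emask & emask_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", emask_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Calling decode_emask(0x40, buf, sizeof(buf)) with a 64-byte buffer fills
buf with the name for bit 6, which matches the "failed to read SCR 1"
lines above under the stated assumption about the bit layout.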
I haven't found the right solution to handle this problem yet:
1: removing __ata_port_freeze() in ata_scsi_error() unilaterally is
very dangerous: it opens a new race condition and may schedule the
error handler several times.
2: in sata_mv, we cannot wait for commands to complete like we do for
NCQ, because in the case above, the command sent to the failed disk
will never come back.
I am thinking of waiting for all IO to complete on every port except the
impacted one(s), adding a new action to the ehi descriptor to indicate
that an AN is scheduled, and preventing the error from freezing the port
if only IOs to the failed ports are outstanding.
The _autopsy_ code would then collect and decode the SError register
for the failed port.
Is it the right approach?
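The policy proposed above can be sketched as a small user-space model.
Nothing here is libata code: struct an_model_port, its fields, and both
helpers are hypothetical names used only to illustrate the decision
(wait while healthy links still have IO in flight; skip the freeze once
only the links that raised the AN have outstanding commands).

```c
#include <stdbool.h>

#define AN_MODEL_MAX_LINKS 16	/* hypothetical upper bound on PMP links */

/* Hypothetical model of a host port behind which a PMP fans out. */
struct an_model_port {
	unsigned int an_links;	/* bitmask of links that raised an AN */
	unsigned int outstanding[AN_MODEL_MAX_LINKS]; /* in-flight cmds per link */
	int nr_links;		/* number of valid entries in outstanding[] */
};

/* True if a link that did NOT raise an AN still has commands in
 * flight, i.e. the handler should wait for them to complete. */
static bool an_model_must_wait(const struct an_model_port *p)
{
	int i;

	for (i = 0; i < p->nr_links && i < AN_MODEL_MAX_LINKS; i++) {
		bool impacted = p->an_links & (1u << i);

		if (!impacted && p->outstanding[i] > 0)
			return true;
	}
	return false;
}

/* True if the freeze could be skipped under the proposed policy:
 * an AN is pending and every outstanding command targets an
 * impacted link (those commands will never complete anyway). */
static bool an_model_can_skip_freeze(const struct an_model_port *p)
{
	if (p->an_links == 0)
		return false;	/* no AN pending: keep existing behaviour */
	return !an_model_must_wait(p);
}
```

In this model, a port with an AN on link 4 and a single stuck command on
link 4 is allowed to skip the freeze, while the same port with another
command in flight on link 2 first has to wait for link 2 to drain.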
Thanks,
Gwendal.
* Re: Handling Asynchronous Notification when IO are outstanding
From: Tejun Heo @ 2010-03-24 23:48 UTC (permalink / raw)
To: Gwendal Grignou; +Cc: IDE/ATA development list
Hello, Gwendal.
On 03/09/2010 09:27 AM, Gwendal Grignou wrote:
> However, when we enter ata_scsi_error() with a command outstanding,
> __ata_port_freeze() is called, which prevents sata_scr_read() from
> succeeding in ata_eh_link_autopsy():
Ah... that's an interesting problem.
> 1: removing __ata_port_freeze() in ata_scsi_error() unilaterally is
> very dangerous: it opens a new race condition and may schedule the
> error handler several times.
> 2: in sata_mv, we cannot wait for commands to complete like we do for
> NCQ, because in the case above, the command sent to the failed disk
> will never come back.
I don't think there will be race conditions even if you remove
__ata_port_freeze() there. Port freezing is mostly there to protect the
host from a malfunctioning controller, which in the early days often led
to "IRQ nobody cared" issues. For FIS-based controllers, I don't think
removing it from there would be such a bad idea.
> I am thinking of waiting for all IO to complete on every port except the
> impacted one(s), adding a new action to the ehi descriptor to indicate
> that an AN is scheduled, and preventing the error from freezing the port
> if only IOs to the failed ports are outstanding.
> The _autopsy_ code would then collect and decode the SError register
> for the failed port.
>
> Is it the right approach?
My reply is very late but can you please try removing
__ata_port_freeze() and see how it works?
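As a rough illustration of what that experiment changes, here is a
user-space model of the interaction described in this thread. It is a
sketch under assumptions, not kernel code: every fz_* name is
hypothetical, the value 1 << 6 stands in for AC_ERR_SYSTEM as recalled
from libata.h, and the model assumes that a PMP SCR read is issued as an
internal command which is rejected while the port is frozen.

```c
#include <stdbool.h>

#define FZ_NR_LINKS 5	/* hypothetical number of PMP fan-out links */

enum { FZ_AC_ERR_SYSTEM = 1 << 6 };	/* assumed value, see lead-in */

/* Hypothetical model of a port: a frozen flag plus one SError
 * value per PMP link. */
struct fz_port {
	bool frozen;
	unsigned int serror[FZ_NR_LINKS];
};

/* Models the gate in front of internal commands: a frozen port
 * rejects them with a non-zero error mask. */
static unsigned int fz_exec_internal(const struct fz_port *p)
{
	return p->frozen ? FZ_AC_ERR_SYSTEM : 0;
}

/* Models a PMP SCR read, which needs an internal command.
 * Returns 0 and fills *val on success, -1 on failure. */
static int fz_pmp_scr_read(const struct fz_port *p, int link,
			   unsigned int *val)
{
	if (link < 0 || link >= FZ_NR_LINKS)
		return -1;
	if (fz_exec_internal(p) != 0)
		return -1;
	*val = p->serror[link];
	return 0;
}

/* Models EH entry. keep_freeze selects between the current
 * behaviour (freeze when commands are outstanding) and the
 * experiment suggested above (no freeze at this point). */
static void fz_eh_entry(struct fz_port *p, int nr_outstanding,
			bool keep_freeze)
{
	if (nr_outstanding > 0 && keep_freeze)
		p->frozen = true;
}
```

With keep_freeze set, fz_pmp_scr_read() fails after fz_eh_entry() for a
port that has one command outstanding, mirroring the "failed to read
SCR 1" messages; with keep_freeze cleared, the same read returns the
stored SError value, which is the outcome the experiment is meant to
confirm on real hardware.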
Thanks.
--
tejun