Handling Asynchronous Notification when IO are outstanding

linux-ide.vger.kernel.org archive mirror

* Handling Asynchronous Notification when IO are outstanding
From: Gwendal Grignou @ 2010-03-09  0:27 UTC (permalink / raw)
  To: IDE/ATA development list

I am working with a Marvell 7042 controller and a SiI3276 port
multiplier [PMP] and would like to handle asynchronous notification
[AN] properly.
However, if a command is outstanding when the PMP raises an AN, the
port is frozen, which prevents the _autopsy_ error code from doing its
work.

For example, here is a case where a disk behind a port multiplier has
a power glitch while a command is outstanding. The PMP detects the
signal loss and sends an AN.
In sata_mv.c, mv_err_intr() is called and detects the notification: it
pushes info into the error descriptor and calls ata_port_schedule_eh()
via sata_async_notification().

However, when we enter ata_scsi_error(), if a command is outstanding,
__ata_port_freeze() is called, which prevents sata_scr_read() from
succeeding in ata_eh_link_autopsy():

Feb 25 02:11:57 bdfl11 kernel: ata4.00: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.01: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.02: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.03: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.04: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.05: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.15: exception Emask 0x4 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.15: edma_err_cause=02000100 pp_flags=00000005, fis_cause=00008200
Feb 25 02:11:57 bdfl11 kernel: ata4.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.04: cmd ca/00:80:e7:78:56/00:00:00:00:00/e8 tag 3 dma 65536 out
Feb 25 02:11:57 bdfl11 kernel: res 50/00:00:4e:10:45/00:00:00:00:00/e8 Emask 0x4 (timeout)
Feb 25 02:11:57 bdfl11 kernel: ata4.04: status: { DRDY }
Feb 25 02:11:57 bdfl11 kernel: ata4.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.15: hard resetting link
Feb 25 02:11:58 bdfl11 kernel: ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 25 02:11:58 bdfl11 kernel: ata4.00: hard resetting link
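
To make the Emask values in the log above easier to read, here is a
small userspace sketch that decodes them into libata's AC_ERR_* names.
The bit assignments are written from memory of include/linux/libata.h
and should be checked against your tree; emask_decode() and the table
are hypothetical helpers, not kernel code. The log itself labels 0x4 as
a timeout, which agrees with the table; under the same table 0x40
decodes as AC_ERR_SYSTEM and 0x100 as AC_ERR_OTHER.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit layout of libata's err_mask (verify against libata.h). */
struct emask_bit {
	unsigned int bit;
	const char *name;
};

static const struct emask_bit emask_bits[] = {
	{ 1u << 0,  "AC_ERR_DEV" },
	{ 1u << 1,  "AC_ERR_HSM" },
	{ 1u << 2,  "AC_ERR_TIMEOUT" },
	{ 1u << 3,  "AC_ERR_MEDIA" },
	{ 1u << 4,  "AC_ERR_ATA_BUS" },
	{ 1u << 5,  "AC_ERR_HOST_BUS" },
	{ 1u << 6,  "AC_ERR_SYSTEM" },
	{ 1u << 7,  "AC_ERR_INVALID" },
	{ 1u << 8,  "AC_ERR_OTHER" },
	{ 1u << 9,  "AC_ERR_NODEV_HINT" },
	{ 1u << 10, "AC_ERR_NCQ" },
};

/*
 * Hypothetical helper: write a '|'-separated list of the bits set in
 * emask into buf and return buf. Output is cut short if buf is too small.
 */
static const char *emask_decode(unsigned int emask, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(emask_bits) / sizeof(emask_bits[0]); i++) {
		int n;

		if (!(emask & emask_bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", emask_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Calling emask_decode(0x40, buf, sizeof(buf)) on the "failed to read SCR
1" lines yields the name for the error the SCR read returned while the
port was frozen.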

I haven't found the right solution to this problem yet:

1: removing __ata_port_freeze() from ata_scsi_error() unilaterally is
very dangerous: it opens a new race condition and may schedule the
error handler several times.
2: in sata_mv, we cannot wait for commands to complete like we do for
NCQ, because in the case above, the command sent to the failed disk
will never come back.

I am thinking of waiting for all IO to complete on all ports except the
impacted one(s), adding a new action in the ehi descriptor to indicate
that an AN is scheduled, and preventing the error from freezing the
port if only IOs to the failed ports are outstanding.
Then the _autopsy_ code would collect and decode the SError register
for the failed port.
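
The decision described above can be sketched as a userspace model. This
is only an illustration under stated assumptions, not libata code:
struct an_state, links_to_drain() and must_freeze() are hypothetical
names, and each PMP fan-out link is represented by one bit (bit n for
link n).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port snapshot taken when the AN is handled. */
struct an_state {
	uint16_t busy_links;	/* links with at least one command in flight */
	uint16_t an_links;	/* links the PMP reported in its notification */
};

/* Links whose IO should be allowed to finish: busy, but not flagged by AN. */
static uint16_t links_to_drain(const struct an_state *s)
{
	return s->busy_links & (uint16_t)~s->an_links;
}

/*
 * Freeze only if commands remain on links the AN did not flag.
 * If every outstanding command targets a flagged link, the port stays
 * thawed so the autopsy code can still read the SCRs.
 */
static bool must_freeze(const struct an_state *s)
{
	return links_to_drain(s) != 0;
}
```

In the log above the only outstanding command is on ata4.04, so with
busy_links = 0x10 and an_links = 0x10 this model would skip the freeze.
If link 1 also had IO in flight (busy_links = 0x12), links_to_drain()
would return 0x02 and that IO would be waited for first.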

Is it the right approach?

Thanks,
Gwendal.

* Re: Handling Asynchronous Notification when IO are outstanding
From: Tejun Heo @ 2010-03-24 23:48 UTC (permalink / raw)
  To: Gwendal Grignou; +Cc: IDE/ATA development list

Hello, Gwendal.

On 03/09/2010 09:27 AM, Gwendal Grignou wrote:
> However, when we enter ata_scsi_error(), if a command is outstanding,
> __ata_port_freeze() is called, which prevents sata_scr_read() from
> succeeding in ata_eh_link_autopsy():

Ah... that's an interesting problem.

> 1: removing __ata_port_freeze() from ata_scsi_error() unilaterally is
> very dangerous: it opens a new race condition and may schedule the
> error handler several times.
> 2: in sata_mv, we cannot wait for commands to complete like we do for
> NCQ, because in the case above, the command sent to the failed disk
> will never come back.

I don't think there will be race conditions even if you remove
__ata_port_freeze() there.  Port freezing is mostly there to protect
the host from a malfunctioning controller, which in the early days
often led to "IRQ: nobody cared" issues.  For FIS-based controllers, I
don't think removing it from there would be such a bad idea.

> I am thinking of waiting for all IO to complete on all ports except the
> impacted one(s), adding a new action in the ehi descriptor to indicate
> that an AN is scheduled, and preventing the error from freezing the
> port if only IOs to the failed ports are outstanding.
> Then the _autopsy_ code would collect and decode the SError register
> for the failed port.
> 
> Is it the right approach?

My reply is very late, but can you please try removing
__ata_port_freeze() and see how it works?

Thanks.

-- 
tejun
