public inbox for linux-kernel@vger.kernel.org
* [RFC&PATCH 1/2] PCI Error Recovery (readX_check)
@ 2004-08-24  5:24 Hidetoshi Seto
  2004-08-24  5:41 ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Hidetoshi Seto @ 2004-08-24  5:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ia64, Linus Torvalds

(I forgot to cc the lists, so I'm sending this again.)

Hi, all

This is a request for comments about "PCI error recovery."

Some time (six months!) ago, we had a discussion about this topic, and
in the end we came to the conclusion "check the bridge status."
Based on this idea (Linus's), I have reconsidered the design and refined
the implementation.  I'd really appreciate it if you could point out any
problems with the following ideas and give me feedback.

-----

First, to restate, my goal is:
    "Enable some possible error recovery in drivers via error notification."

Today, errors on some types of transactions, such as:
    - memory mapped I/O write
    - DMA from device to memory
    - DMA from memory to device
are recoverable, because the error (e.g. a parity error) is visible in the
status register of the device.

However, in the case of a memory mapped I/O read, the result of the
transaction is not visible on the device.  Whether there was an error
or not is only visible on the bus bridges located above the device.

We have to be careful, since on some platforms the host bus bridge cannot
be used for recovery (e.g. the host is not a PCI device).
If so, we have to use the PCI-to-PCI bridge just under the host bridge
instead of the host bridge itself.

Put simply, we need to check the status of the "highest" bus bridge, and
only in the case of an mmio read.

Also, the status of the highest bridge is shared by other devices, so
clearing the status on behalf of one device could affect the check done
for another device.  Therefore, the status of the highest bridge should
be handled outside of the device drivers, never within each driver.

So what I have to consider next is how to implement the notification on
an mmio read, for which we need to check the highest bridge's status.

---

Okay, let's talk about the actual implementation :-)

The requirement is:
    "Guarantee the result of an mmio read on error, while multiple mmio
       reads by devices under the same bridge are running."

To realize this, all of #1 to #3 must be satisfied:

#1:
    We have to know which devices under the same bridge are doing mmio reads.

Add a list, "working_device", to the pci_dev struct to track the running
devices.  Register the device with the highest bridge at the beginning
of a session containing mmio reads, and unregister the device at the
end of the session.

#2:
    Clear the bridge status only when no device under the bridge is doing
    an mmio read.

Take a lock during an mmio read, which could change the status, and while
clearing the status.  Logically, an rwlock is convenient:
    - Processing an mmio read: read_lock
    - Clearing the status:     write_lock
To reduce the load on this lock, check the status before clearing it, and
skip the clear if the status shows no error.

#3:
    Send the error status to all drivers in the session before clearing it.

There is no way to know which device caused an error if multiple devices
are doing mmio reads.  (Using a spinlock instead of an rwlock would be one
possible way, but it would clearly impact I/O performance, so I rule it out.)
Thus, the best we can do for now is to send the error to all drivers
concerned.  That is, notify the error to all devices registered on the
"working_device" list of the highest bridge, by updating an "err_status"
value newly added to the pci_dev struct.  Note that we must hold the
write_lock from the update of "err_status" through the clearing of the
bridge status, or the result of any I/O done between the update and the
clear would vanish into the night.

---

I imagine the basic mmio read of a RAS-aware driver looking like the following:

========================================================================
int retries = 2;

do {
	clear_pci_errors(dev);		/* clear bridge status */
	val = readX_check(dev, addr);	/* memory mapped I/O read */
	status = read_pci_errors(dev);	/* check bridge status */
} while (status && --retries > 0);
if (status)
	/* error */
========================================================================

The basic design of the new functions is:

$1. clear_pci_errors(DEVICE)
   - find the highest (host or topmost PCI-to-PCI) bridge of DEVICE
   - check the status of the highest bridge, and if it indicates error(s):
     - write_lock
     - update the err_status of each device registered on the working_device
       list of the highest bridge, by OR-ing ("|=") in the bridge status value
     - clear the bridge status
     - write_unlock
   - clear the err_status of DEVICE
   - register DEVICE with the highest bridge

$2. readX_check(DEVICE, ADDR)
   - read_lock
   - I/O (read)
   - read_unlock

$3. read_pci_errors(DEVICE)
   - find the highest bridge of DEVICE
   - store the status of the highest bridge as STATUS
   - check ( STATUS | DEVICE->err_status )
   - return 1 on error (e.g. Master/Target Abort, Parity Error), else return 0


Note:
For now, there is no initialization of the control register of the highest
bridge.  Generic initialization could be implemented in this code, but the
values are user configurable and occasionally some buses need specific
values, so I haven't written it yet.


Thanks,
H.Seto






* Re: [RFC&PATCH 1/2] PCI Error Recovery (readX_check)
@ 2004-08-28  1:23 Hidetoshi Seto
  2004-09-17 12:00 ` Hidetoshi Seto
  2004-09-17 12:06 ` Hidetoshi Seto
  0 siblings, 2 replies; 15+ messages in thread
From: Hidetoshi Seto @ 2004-08-28  1:23 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Linus Torvalds, Benjamin Herrenschmidt, Linux Kernel list,
	linux-ia64

(I couldn't see my last mail, posted a few days ago, in the list (kicked?),
  so I'm sending it again...
  Grant, Linus and Benjamin, I'd appreciate it if you could read this mail
  and let me know when you receive it. Of course, new comments are welcome.)

-------- Original Message --------

Grant Grundler wrote:
> Do we only need to determine there was an error in the IO hierarchy
> or do we also need to know which device/driver caused the error?
> 
> If the latter I agree with linus. If the former, then the error recovery
> can support asyncronous errors (like the bad DMA address case) and tell
> all affected (thanks willy) drivers.

What I supposed here is the former.

As Linus said, I also assume that most high-end hardware has enough bridges
and that the number of devices sharing the same bridge will be minimal.
(Additionally, I assume that there is no "mixed" bridge, i.e. one having both
  a device owned by a RAS-aware driver supporting the recovery infrastructure
  and a device owned by a non-aware driver.  Generally, all should be RAS-aware.)

However, my implementation was not designed on the assumption of
"1 bridge = 1 device" as on ppc64, but on "1 bridge = 1 device group."
Of course, some groups could consist of only one device.
It will depend on the structure of the system, which you can configure.

Devices in the same group can run at the same time while keeping a certain
level of performance, and must not mind being affected (or even killed!) by
a (PCI bus) error caused by someone else in the group.  They either swim
together or sink together.

Fortunately, such errors are a rare occurrence, and even if one occurs, it
will be either "recoverable by a retry, since it was an ordinary soft error"
or "unrecoverable, since it was a rare hard error."

Without this new recovery infrastructure, the system cannot have proper
drivers that retry the transaction to determine whether the error was soft
or hard, nor drivers that avoid returning the broken data to the user.
So for now the system goes down; the last resort comes first.


> Does anyone expect to recover from devices attempting unmapped DMA?
> Ie an IOMMU which services multiple PCI busses getting a bad DMA address
> will cause the next MMIO read by any of the (grandchildren) PCI devices to 
> see an error (MCA on IA64). I'm asking only to determine if this is
> outside the scope of what the PCI error recovery is trying to support.

At present, unmapped DMA is outside the scope... but alongside this I am
also trying out possible IA64-specific recovery (with MCA & CPE) using
prototypes.


>>> +	bool "PCI device error recovery"
>>> +	depends on PCI
> 
> 	depends on PCI && EXPERIMENTAL
> 
>>> +	---help---
>>> +	By default, the device driver hardly recovers from PCI errors. When
>>> +	this feature is available, the special io interface are provided
>>> +	from the kernel.
> 
> May I suggest an alternate text?
> 	Saying Y provides PCI infrastructure to recover from some PCI errors.
> 	Currently, very few PCI drivers actually implement this.
> 	See Documentation/pci-errors.txt for a description of the
> 	infrastructure provided.

Thank you for the good substitution :-D

I understand the need for Documentation/pci-errors.txt for driver developers,
so I'll write the document and post it ASAP.


Thanks,
H.Seto


end of thread, other threads:[~2004-09-21  8:33 UTC | newest]

Thread overview: 15+ messages
2004-08-24  5:24 [RFC&PATCH 1/2] PCI Error Recovery (readX_check) Hidetoshi Seto
2004-08-24  5:41 ` Linus Torvalds
2004-08-24  8:06   ` Hidetoshi Seto
2004-08-25  7:01   ` Benjamin Herrenschmidt
2004-08-25  7:20     ` Linus Torvalds
2004-08-25 15:52       ` Grant Grundler
2004-08-25 17:25         ` Linus Torvalds
2004-08-25 23:23       ` Benjamin Herrenschmidt
2004-08-25 23:35         ` Linus Torvalds
2004-08-25 15:42     ` Grant Grundler
  -- strict thread matches above, loose matches on Subject: below --
2004-08-28  1:23 Hidetoshi Seto
2004-09-17 12:00 ` Hidetoshi Seto
2004-09-17 12:06 ` Hidetoshi Seto
2004-09-18  4:36   ` Grant Grundler
2004-09-21  8:32     ` Hidetoshi Seto
