[PATCH] PCIERR : interfaces for synchronous I/O error detection on driver

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] PCIERR : interfaces for synchronous I/O error detection on driver
@ 2006-03-22  8:38 Hidetoshi Seto
  2006-03-22 21:01 ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-22  8:38 UTC (permalink / raw)
  To: Linux Kernel list, linux-ia64; +Cc: Greg KH

This patch proposes new interfaces, that known by few as iochk_*,
for synchronous I/O error detection on PCI drivers.

At 2.6.16-rc1, Linux kernel accepts and provides "PCI-bus error event
callbacks" that enable RAS-aware drivers to notice errors asynchronously,
and to join following kernel-initiated PCI-bus error recovery.
This callbacks work well on PPC64 where it was designed to fit.

However, some difficulty still remains to cover all possible error
situations even if we use callbacks. It will not help keeping data
integrity, passing no broken data to drivers and user lands, preventing
applications from going crazy or sudden death.

I suppose that proposed interfaces will solve most of remaining issues.
It would be FAQ that what is the difference between this interfaces and
the callback. Following list would help you:

** Feature 1: Asynchronous error event callbacks:
  - written by Linas Vepstas
  - patch was already accepted (2.6.16-rc1)
  - designed for "Error Recovery":
    Interface for asynchronous error recovery (such as bus reset)
    on other context after an serious error occurrences.
    This will be effective if system needs complex taking-time recovery,
    ex. re-enabling erroneous PCI-bus.
  - It will be useful if the arch supports bus-isolating and can continue
    running after such isolation, limiting a part of its services.
  - i.e. this feature improves Serviceability.
  - drivers can use this feature by constructing struct pci_error_handlers
    with its callbacks and register it in struct pci_driver of itself.

** Feature 2: Synchronous error detect interfaces:
  - written by Hidetoshi Seto
  - patch is not accepted yet.
  - designed for "Error Detection":
    Interface for synchronous error detection on same context
    and prevent system from data pollution on any error occurrences.
    This will be effective if system doesn't require complex recovery,
    ex. retrying I/O after trivial soft error, or re-issuing I/O if it
    gets poisoned data (forwarded error from PCI-Express).
  - It will be useful if arch chooses panic on bus errors not to pass
    any broken data to un-reliable drivers.
  - i.e. this feature improves Availability by keeping data integrity.
  - drivers can use this feature by clipping its I/O by *_clear/read call
    and checking the return value after all.

Following patch adds definition of interfaces described as Feature 2.
(They were renamed from iochk_* to pcierr_*, and located in pci.h.)
There are no more patch for generic part. The core implementation is
architecture dependent. You will get proper error status only on arch
where supports and implements this feature. Or you will get always 0,
means "no errors".

Thanks,
H.Seto

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----

  include/linux/pci.h |   15 +++++++++++++++
  1 files changed, 15 insertions(+)

Index: linux-2.6.16_WORK/include/linux/pci.h
===================================================================
--- linux-2.6.16_WORK.orig/include/linux/pci.h	2006-03-22 14:27:04.000000000 +0900
+++ linux-2.6.16_WORK/include/linux/pci.h	2006-03-22 14:41:01.000000000 +0900
@@ -696,6 +696,21 @@
  #endif /* HAVE_ARCH_PCI_RESOURCE_TO_USER */


+/* PCIERR_CHECK provides additional interfaces for drivers to detect
+ * some IO errors, to support drivers can handle errors properly.
+ * Some archs requiring high-reliability would implement these.
+ */
+#ifdef HAVE_ARCH_PCIERR_CHECK
+/* type of iocookie is arch dependent */
+extern void pcierr_clear(iocookie *cookie, struct pci_dev *dev);
+extern int pcierr_read(iocookie *cookie);
+#else
+typedef int iocookie;
+static inline void pcierr_clear(iocookie *cookie, struct pci_dev *dev) {}
+static inline int pcierr_read(iocookie *cookie) { return 0; }
+#endif
+
+
  /*
   *  The world is not perfect and supplies us with broken PCI devices.
   *  For at least a part of these bugs we need a work-around, so both


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] PCIERR : interfaces for synchronous I/O error detection on driver
  2006-03-22  8:38 [PATCH] PCIERR : interfaces for synchronous I/O error detection on driver Hidetoshi Seto
@ 2006-03-22 21:01 ` Greg KH
  2006-03-24  7:47   ` [PATCH 1/6] " Hidetoshi Seto
                     ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Greg KH @ 2006-03-22 21:01 UTC (permalink / raw)
  To: Hidetoshi Seto; +Cc: Linux Kernel list, linux-ia64

On Wed, Mar 22, 2006 at 05:38:51PM +0900, Hidetoshi Seto wrote:
> This patch proposes new interfaces, that known by few as iochk_*,
> for synchronous I/O error detection on PCI drivers.

Please include the linux-pci mailing list in further discussion of PCI
specific things.

> Index: linux-2.6.16_WORK/include/linux/pci.h
> ===================================================================
> --- linux-2.6.16_WORK.orig/include/linux/pci.h	2006-03-22 
> 14:27:04.000000000 +0900

Your patch is linewrapped :(

> +++ linux-2.6.16_WORK/include/linux/pci.h	2006-03-22 
> 14:41:01.000000000 +0900
> @@ -696,6 +696,21 @@
>  #endif /* HAVE_ARCH_PCI_RESOURCE_TO_USER */
> 
> 
> +/* PCIERR_CHECK provides additional interfaces for drivers to detect
> + * some IO errors, to support drivers can handle errors properly.
> + * Some archs requiring high-reliability would implement these.
> + */
> +#ifdef HAVE_ARCH_PCIERR_CHECK
> +/* type of iocookie is arch dependent */
> +extern void pcierr_clear(iocookie *cookie, struct pci_dev *dev);
> +extern int pcierr_read(iocookie *cookie);
> +#else
> +typedef int iocookie;
> +static inline void pcierr_clear(iocookie *cookie, struct pci_dev *dev) {}
> +static inline int pcierr_read(iocookie *cookie) { return 0; }
> +#endif

We need to see the code that implements this before we can accept a
header file change.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver
  2006-03-22 21:01 ` Greg KH
@ 2006-03-24  7:47   ` Hidetoshi Seto
  2006-03-24 23:43     ` Linas Vepstas
  2006-03-24  7:48   ` [PATCH 2/6] PCIERR : interfaces for synchronous I/O error detection on driver (config) Hidetoshi Seto
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-24  7:47 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux Kernel list, linux-ia64, linux-pci

There are 6 patches:
  1: New interface's proposal, generic code in pci.h.
  2: [1/4]IA64 specific implementation. (config)
  3: [2/4]IA64 specific implementation. (base)
  4: [3/4]IA64 specific implementation. (mcadrv)
  5: [4/4]IA64 specific implementation. (poison)
  6: Sample driver (Fusion MPT)

Following is abstract and 1st patch:

This patch proposes new interfaces, that known by few as iochk_*,
for synchronous I/O error detection on PCI drivers.

At 2.6.16-rc1, Linux kernel accepts and provides "PCI-bus error event
callbacks" that enable RAS-aware drivers to notice errors asynchronously,
and to join following kernel-initiated PCI-bus error recovery.
This callbacks work well on PPC64 where it was designed to fit.

However, some difficulty still remains to cover all possible error
situations even if we use callbacks. It will not help keeping data
integrity, passing no broken data to drivers and user lands, preventing
applications from going crazy or sudden death.

I suppose that proposed interfaces will solve most of remaining issues.
It would be FAQ that what is the difference between this interfaces and
the callback. Following list would help you:

** Feature 1: Asynchronous error event callbacks:
  - written by Linas Vepstas
  - patch was already accepted (2.6.16-rc1)
  - designed for "Error Recovery":
    Interface for asynchronous error recovery (such as bus reset)
    on other context after an serious error occurrences.
    This will be effective if system needs complex taking-time recovery,
    ex. re-enabling erroneous PCI-bus.
  - It will be useful if the arch supports bus-isolating and can continue
    running after such isolation, limiting a part of its services.
  - i.e. this feature improves Serviceability.
  - drivers can use this feature by constructing struct pci_error_handlers
    with its callbacks and register it in struct pci_driver of itself.

** Feature 2: Synchronous error detect interfaces:
  - written by Hidetoshi Seto
  - patch is not accepted yet.
  - designed for "Error Detection":
    Interface for synchronous error detection on same context
    and prevent system from data pollution on any error occurrences.
    This will be effective if system doesn't require complex recovery,
    ex. retrying I/O after trivial soft error, or re-issuing I/O if it
    gets poisoned data (forwarded error from PCI-Express).
  - It will be useful if arch chooses panic on bus errors not to pass
    any broken data to un-reliable drivers.
  - i.e. this feature improves Availability by keeping data integrity.
  - drivers can use this feature by clipping its I/O by *_clear/read call
    and checking the return value after all.

Following patch adds definition of interfaces described as Feature 2.
(They were renamed from iochk_* to pcierr_*, and located in pci.h.)
There are no more patch for generic part. The core implementation is
architecture dependent. You will get proper error status only on arch
where supports and implements this feature. Or you will get always 0,
means "no errors".

Thanks,
H.Seto

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
  include/linux/pci.h |   15 +++++++++++++++
  1 files changed, 15 insertions(+)

Index: linux-2.6.16_WORK/include/linux/pci.h
===================================================================
--- linux-2.6.16_WORK.orig/include/linux/pci.h
+++ linux-2.6.16_WORK/include/linux/pci.h
@@ -696,6 +696,21 @@
  #endif /* HAVE_ARCH_PCI_RESOURCE_TO_USER */

+/* PCIERR_CHECK provides additional interfaces for drivers to detect
+ * some IO errors, to support drivers can handle errors properly.
+ * Some archs requiring high-reliability would implement these.
+ */
+#ifdef HAVE_ARCH_PCIERR_CHECK
+/* type of iocookie is arch dependent */
+extern void pcierr_clear(iocookie *cookie, struct pci_dev *dev);
+extern int pcierr_read(iocookie *cookie);
+#else
+typedef int iocookie;
+static inline void pcierr_clear(iocookie *cookie, struct pci_dev *dev) {}
+static inline int pcierr_read(iocookie *cookie) { return 0; }
+#endif
+
+
  /*
   *  The world is not perfect and supplies us with broken PCI devices.
   *  For at least a part of these bugs we need a work-around, so both

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver
  2006-03-24  7:47   ` [PATCH 1/6] " Hidetoshi Seto
@ 2006-03-24 23:43     ` Linas Vepstas
  2006-03-27  2:37       ` Hidetoshi Seto
  0 siblings, 1 reply; 12+ messages in thread
From: Linas Vepstas @ 2006-03-24 23:43 UTC (permalink / raw)
  To: Hidetoshi Seto; +Cc: Greg KH, Linux Kernel list, linux-ia64, linux-pci

On Fri, Mar 24, 2006 at 04:47:25PM +0900, Hidetoshi Seto wrote:
> 
> At 2.6.16-rc1, Linux kernel accepts and provides "PCI-bus error event
> callbacks" that enable RAS-aware drivers to notice errors asynchronously,
> and to join following kernel-initiated PCI-bus error recovery.
> This callbacks work well on PPC64 where it was designed to fit.
> 
> However, some difficulty still remains to cover all possible error
> situations even if we use callbacks. It will not help keeping data
> integrity, passing no broken data to drivers and user lands, preventing
> applications from going crazy or sudden death.

This is not true.  Although there are some subtle issues, (which
I invite you to describe), the goal of the current design is to 
insure data integrity, and make sure that neither the driver nor 
the userland gets corrupted data. There shouldn't be any "crazy
or sudden death" if the device drivers are any good.

Of course, this depends on the hardware implementation. If
your PCI bus sends corrupt data up to the driver ... all bets 
are off. The design is predicated on the assumption that the
hardware sends either good data or no data, ad that the latter
is associated with a bus state indicating an error has ocurred.

>  - It will be useful if arch chooses panic on bus errors not to pass
>    any broken data to un-reliable drivers.

I assume you meant "if arch chooses NOT to panic on bus errors ..."

I'll review the rest of the patch via sepaate email

--linas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver
  2006-03-24 23:43     ` Linas Vepstas
@ 2006-03-27  2:37       ` Hidetoshi Seto
  2006-03-31 22:01         ` Linas Vepstas
  0 siblings, 1 reply; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-27  2:37 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: Greg KH, Linux Kernel list, linux-ia64, linux-pci

Linas Vepstas wrote:
> On Fri, Mar 24, 2006 at 04:47:25PM +0900, Hidetoshi Seto wrote:
>> However, some difficulty still remains to cover all possible error
>> situations even if we use callbacks. It will not help keeping data
>> integrity, passing no broken data to drivers and user lands, preventing
>> applications from going crazy or sudden death.
> 
> This is not true.  Although there are some subtle issues, (which
> I invite you to describe), the goal of the current design is to 
> insure data integrity, and make sure that neither the driver nor 
> the userland gets corrupted data. There shouldn't be any "crazy
> or sudden death" if the device drivers are any good.

OK, we are sharing the same goal even now.

I failed to mention that as you know this synchronous error detection
would be required if the async-callback needs to touch the hardware
due to recover it or to pick up diagnostic data during kernel-initiated
recovery. (I found a word "pci_check_whatever() API" at your comment
in document, Documentation/pci-error-recovery.txt)

> Of course, this depends on the hardware implementation. If
> your PCI bus sends corrupt data up to the driver ... all bets 
> are off. The design is predicated on the assumption that the
> hardware sends either good data or no data, ad that the latter
> is associated with a bus state indicating an error has ocurred.
> 
>>  - It will be useful if arch chooses panic on bus errors not to pass
>>    any broken data to un-reliable drivers.
> 
> I assume you meant "if arch chooses NOT to panic on bus errors ..."

Hmm, what I meant is that:
   There is an arch that chooses reboot on bus error.
   The reason why it do so is that the design is based on the assumption
   that no driver is able to handle bus error and that almost all drivers
   will go without checking hardware status. So the arch chooses rebooting
   rather than polluting user data.
   The design allows OS to determine whether the system goes reboot or not,
   but OS has no idea to know which driver actually check hardware state.
Therefore this interface will help OS to know which driver is reliable.

Of course there are some arch that chooses not to panic/reboot on bus error.
I think they are believing that all drivers working on the arch can handle
any type of errors, or they have their special feature against errors...,
or just being idiot about hardware errors.

Anyway, all that is certain is that:
  - To check the data from hardware, driver need to ask anywhere synchronously.
  - "Anywhere" would be a register, and/or something in kernel/hardware.
  - State check would be architecture dependent routine work.

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver
  2006-03-27  2:37       ` Hidetoshi Seto
@ 2006-03-31 22:01         ` Linas Vepstas
  2006-04-03  4:54           ` Hidetoshi Seto
  0 siblings, 1 reply; 12+ messages in thread
From: Linas Vepstas @ 2006-03-31 22:01 UTC (permalink / raw)
  To: Hidetoshi Seto; +Cc: Greg KH, Linux Kernel list, linux-ia64, linux-pci

On Mon, Mar 27, 2006 at 11:37:36AM +0900, Hidetoshi Seto wrote:
>  - State check would be architecture dependent routine work.

I read through your patches.  You are proposing a very different
way of handling PCI errors than the pci_error_handlers API.
It seems to be much more invasive, and I don't understand why
its needed or how its better.  Let me be specific:

In the mpt code you have a function called pciras_readl()
that tries to perform an error-free read by retrying the read:

  do {
    pcierr_clear(&cookie, ioc->pcidev);
    val = ioread32(addr);
    status = pcierr_read(&cookie);
  } while(status && (--retries > 0));

Why not create special arch/ia_64 readl routine to do this?
In that case, other device drivers would get the benefit of
the retry-on-error type read.

Now, you probably shouldn't put this into the default readl
routine, since some devices do peculiar things if the same
register is read repeatedly.

Next, I notice that if the repeated read fails, then

   schedule_work(&mptbase_rstTask);

is called. This seems to be exactly the kind of action
that the pci_error_handlers API was meant to provide:
if there is a pci read error that cannot be trivially
recovered, then the error_detected() &c. routines would
be called. The mpt device driver would then initiate
a mptbase_rstTask upon one of these callbacks.

Thus, in the ia64 code, if a repeated readl fails,
then the ia64 reset task calls the device drivers
error_detected() routine, followed by the drivers's 
link_reset() routine, followed by the resume() routine.

For the mpt, it would probably be resume() that was
a wrapper around mptbase_rstTask(). Wouldn't this 
work just as well? 

--linas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] PCIERR : interfaces for synchronous I/O error detection on driver
  2006-03-31 22:01         ` Linas Vepstas
@ 2006-04-03  4:54           ` Hidetoshi Seto
  0 siblings, 0 replies; 12+ messages in thread
From: Hidetoshi Seto @ 2006-04-03  4:54 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: Greg KH, Linux Kernel list, linux-ia64, linux-pci

Linas Vepstas wrote:
> On Mon, Mar 27, 2006 at 11:37:36AM +0900, Hidetoshi Seto wrote:
>>  - State check would be architecture dependent routine work.
> 
> I read through your patches.  You are proposing a very different
> way of handling PCI errors than the pci_error_handlers API.
> It seems to be much more invasive, and I don't understand why
> its needed or how its better.  Let me be specific:
> 
> In the mpt code you have a function called pciras_readl()
> that tries to perform an error-free read by retrying the read:
> 
>   do {
>     pcierr_clear(&cookie, ioc->pcidev);
>     val = ioread32(addr);
>     status = pcierr_read(&cookie);
>   } while(status && (--retries > 0));
> 
> Why not create special arch/ia_64 readl routine to do this?
> In that case, other device drivers would get the benefit of
> the retry-on-error type read.
> 
> Now, you probably shouldn't put this into the default readl
> routine, since some devices do peculiar things if the same
> register is read repeatedly.

My first proposal of readX_check() about 2 years ago was
already rejected by Linus. I think there were some problem
that let this style slip from linux ... :

  - To Make drivers to be more RAS-aware, rewriting large
    portion of the codes to be able to respond to errors would
    be inevitable anyway.
    It would be better to have generic interfaces wrap
    arch-specific codes because such error status check would
    require some kind of synchronization with arch-specific
    hardware or firmware etc.
    Even if dividing normal-drivers and RAS-hardened-drivers
    is inevitable, still it is better than having many
    arch-specific drivers.

  - Since such synchronized error check could be heavy time
    consumer, it would be better the ratio of actual I/O to
    error check is N:1, i.e. an error check can examine more
    than one I/O.
    Of course it is ideal if one check can examine all I/O,
    however it is always possible that using a broken data
    from earlier read brings the device or whole system to
    fatal situation. Therefore, it would be realistic that
    splitting driver's code to some possible stages and putting
    an error check for each stage.

It is possible to define more interfaces such as readX_check()
and/or readX_retry_on_error(), however they will be just
wrappers of actual readX and pcierr_clear/read(), what everyone
can imagine easily.

What style of check is convenient is depend on the device and
driver. Maybe sample mpt driver could be optimized.
I guess that readX_check() style will convenient on many cases,
but not all cases. That was maybe what Linus said.

> Next, I notice that if the repeated read fails, then
> 
>    schedule_work(&mptbase_rstTask);
> 
> is called. This seems to be exactly the kind of action
> that the pci_error_handlers API was meant to provide:
> if there is a pci read error that cannot be trivially
> recovered, then the error_detected() &c. routines would
> be called. The mpt device driver would then initiate
> a mptbase_rstTask upon one of these callbacks.
> 
> Thus, in the ia64 code, if a repeated readl fails,
> then the ia64 reset task calls the device drivers
> error_detected() routine, followed by the drivers's 
> link_reset() routine, followed by the resume() routine.
> 
> For the mpt, it would probably be resume() that was
> a wrapper around mptbase_rstTask(). Wouldn't this 
> work just as well? 

Yes, this part can be exactly replaced by callback API,
because the reset can be performed asynchronously.
Why this is here is simply ia64 have not implement it yet.

To make callback APIs work on ia64, at least it is required
that ia64 system can continue after a bus error.
This sensitive arch can catch a bus error synchronously
by its arch-specific way, but at this time there is no way
to forward the error to drivers and to know which driver can
deal the error, so it chooses system reset.

If I disable all such sensitive thing and make callback APIs
to work on ia64, drivers still need synchronous error check for
each I/O to prevent user data from pollution by error, but the
coverage of errors will be limited to only what mpt can detect.
No one care about pollution by non-aware drivers in this situation.

Now I'm watching the problems to be solved before the asynchronous
reset. Maybe my next work after this will be implement of ia64's
callback API...

If you have any better idea than this pcierr_* (and its possible
wrapper, ex. readX_check()) to keep or get high data integrity,
please tell me.

Thanks,
H.Seto

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/6] PCIERR : interfaces for synchronous I/O error detection on driver (config)
  2006-03-22 21:01 ` Greg KH
  2006-03-24  7:47   ` [PATCH 1/6] " Hidetoshi Seto
@ 2006-03-24  7:48   ` Hidetoshi Seto
  2006-03-24  7:49   ` [PATCH 3/6] PCIERR : interfaces for synchronous I/O error detection on driver (base) Hidetoshi Seto
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-24  7:48 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux Kernel list, linux-ia64, linux-pci

This patch is 1/4 of PCIERR implementation for IA64.

This part enable us to switch IA64-specific PCIERR by config.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
  arch/ia64/Kconfig            |   10 ++++++++++
  arch/ia64/lib/Makefile       |    1 +
  arch/ia64/lib/pcierr_check.c |   23 +++++++++++++++++++++++
  include/asm-ia64/pci.h       |    8 ++++++++
  4 files changed, 42 insertions(+)

Index: linux-2.6.16_WORK/arch/ia64/Kconfig
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/Kconfig
+++ linux-2.6.16_WORK/arch/ia64/Kconfig
@@ -410,6 +410,16 @@
  	bool
  	default PCI

+config PCIERR_CHECK
+	bool "Support pcierr interfaces for IO error detection."
+	depends on PCI
+	help
+	  Saying Y provides pcierr infrastructure for "RAS-aware" drivers
+	  to detect and recover some IO errors, which strongly required by
+	  some of very-high-reliable systems.
+
+	  If you don't know what to do here, say N.
+
  source "drivers/pci/Kconfig"

  source "drivers/pci/hotplug/Kconfig"
Index: linux-2.6.16_WORK/arch/ia64/lib/Makefile
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/lib/Makefile
+++ linux-2.6.16_WORK/arch/ia64/lib/Makefile
@@ -15,6 +15,7 @@
  lib-$(CONFIG_MCKINLEY)	+= copy_page_mck.o memcpy_mck.o
  lib-$(CONFIG_PERFMON)	+= carta_random.o
  lib-$(CONFIG_MD_RAID5)	+= xor.o
+lib-$(CONFIG_PCIERR_CHECK) += pcierr_check.o

  AFLAGS___divdi3.o	=
  AFLAGS___udivdi3.o	= -DUNSIGNED
Index: linux-2.6.16_WORK/include/asm-ia64/pci.h
===================================================================
--- linux-2.6.16_WORK.orig/include/asm-ia64/pci.h
+++ linux-2.6.16_WORK/include/asm-ia64/pci.h
@@ -171,4 +171,12 @@

  #define pcibios_scan_all_fns(a, b)	0

+#ifdef CONFIG_PCIERR_CHECK
+/* Enable ia64 pcierr - See arch/ia64/lib/pcierr_check.c */
+#define HAVE_ARCH_PCIERR_CHECK
+typedef struct {
+	int dummy;
+} iocookie;
+#endif /* CONFIG_PCIERR_CHECK  */
+
  #endif /* _ASM_IA64_PCI_H */
Index: linux-2.6.16_WORK/arch/ia64/lib/pcierr_check.c
===================================================================
--- /dev/null
+++ linux-2.6.16_WORK/arch/ia64/lib/pcierr_check.c
@@ -0,0 +1,23 @@
+/*
+ * File:    pcierr_check.c
+ * Purpose: Implement the IA64 specific pcierr interfaces for RAS-drivers
+ */
+
+#include <linux/pci.h>
+
+void pcierr_clear(iocookie *cookie, struct pci_dev *dev);
+int  pcierr_read(iocookie *cookie);
+
+void pcierr_clear(iocookie *cookie, struct pci_dev *dev)
+{
+	/* register device etc. */
+}
+
+int pcierr_read(iocookie *cookie)
+{
+	/* check error etc. */
+	return 0;
+}
+
+EXPORT_SYMBOL(pcierr_read);
+EXPORT_SYMBOL(pcierr_clear);



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/6] PCIERR : interfaces for synchronous I/O error detection on driver (base)
  2006-03-22 21:01 ` Greg KH
  2006-03-24  7:47   ` [PATCH 1/6] " Hidetoshi Seto
  2006-03-24  7:48   ` [PATCH 2/6] PCIERR : interfaces for synchronous I/O error detection on driver (config) Hidetoshi Seto
@ 2006-03-24  7:49   ` Hidetoshi Seto
  2006-03-24  7:50   ` [PATCH 4/6] PCIERR : interfaces for synchronous I/O error detection on driver (mcadrv) Hidetoshi Seto
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-24  7:49 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux Kernel list, linux-ia64, linux-pci

This patch is 2/4 of PCIERR implementation for IA64.

This part describes base of IA64-specific PCIERR, registering
driver and checking status.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
  arch/ia64/lib/pcierr_check.c |   53 +++++++++++++++++++++++++++++++++++++++++--
  include/asm-ia64/pci.h       |    4 ++-
  2 files changed, 54 insertions(+), 3 deletions(-)

Index: linux-2.6.16_WORK/arch/ia64/lib/pcierr_check.c
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/lib/pcierr_check.c
+++ linux-2.6.16_WORK/arch/ia64/lib/pcierr_check.c
@@ -8,14 +8,63 @@
  void pcierr_clear(iocookie *cookie, struct pci_dev *dev);
  int  pcierr_read(iocookie *cookie);

+LIST_HEAD(pcierr_list);		/* all iocookies are listed in this */
+DEFINE_SPINLOCK(pcierr_lock);	/* to protect list above */
+
+static int have_error(struct pci_dev *dev);
+
  void pcierr_clear(iocookie *cookie, struct pci_dev *dev)
  {
-	/* register device etc. */
+	unsigned long flag;
+
+	INIT_LIST_HEAD(&(cookie->list));
+
+	cookie->dev = dev;
+
+	spin_lock_irqsave(&pcierr_lock, flag);
+	list_add(&cookie->list, &pcierr_list);
+	spin_unlock_irqrestore(&pcierr_lock, flag);
+
+	cookie->error = 0;
  }

  int pcierr_read(iocookie *cookie)
  {
-	/* check error etc. */
+	unsigned long flag;
+	int ret = 0;
+
+	spin_lock_irqsave(&pcierr_lock, flag);
+	if (cookie->error || have_error(cookie->dev))
+		ret = 1;
+	list_del(&cookie->list);
+	spin_unlock_irqrestore(&pcierr_lock, flag);
+
+	return ret;
+}
+
+static int have_error(struct pci_dev *dev)
+{
+	u16 status;
+
+	/* check status */
+	switch (dev->hdr_type) {
+	case PCI_HEADER_TYPE_NORMAL: /* 0 */
+		pci_read_config_word(dev, PCI_STATUS, &status);
+		break;
+	case PCI_HEADER_TYPE_BRIDGE: /* 1 */
+		pci_read_config_word(dev, PCI_SEC_STATUS, &status);
+		break;
+	case PCI_HEADER_TYPE_CARDBUS: /* 2 */
+		return 0; /* FIX ME */
+	default:
+		BUG();
+	}
+
+	if ( (status & PCI_STATUS_REC_TARGET_ABORT)
+		|| (status & PCI_STATUS_REC_MASTER_ABORT)
+		|| (status & PCI_STATUS_DETECTED_PARITY) )
+		return 1;
+
  	return 0;
  }

Index: linux-2.6.16_WORK/include/asm-ia64/pci.h
===================================================================
--- linux-2.6.16_WORK.orig/include/asm-ia64/pci.h
+++ linux-2.6.16_WORK/include/asm-ia64/pci.h
@@ -175,7 +175,9 @@
  /* Enable ia64 pcierr - See arch/ia64/lib/pcierr_check.c */
  #define HAVE_ARCH_PCIERR_CHECK
  typedef struct {
-	int dummy;
+	struct list_head	list;
+	struct pci_dev		*dev;	/* target device */
+	unsigned long		error;	/* error flag */
  } iocookie;
  #endif /* CONFIG_PCIERR_CHECK  */



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/6] PCIERR : interfaces for synchronous I/O error detection on driver (mcadrv)
  2006-03-22 21:01 ` Greg KH
                     ` (2 preceding siblings ...)
  2006-03-24  7:49   ` [PATCH 3/6] PCIERR : interfaces for synchronous I/O error detection on driver (base) Hidetoshi Seto
@ 2006-03-24  7:50   ` Hidetoshi Seto
  2006-03-24  7:51   ` [PATCH 5/6] PCIERR : interfaces for synchronous I/O error detection on driver (poison) Hidetoshi Seto
  2006-03-24  7:52   ` [PATCH 6/6] PCIERR : interfaces for synchronous I/O error detection on driver (sample: Fusion MPT) Hidetoshi Seto
  5 siblings, 0 replies; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-24  7:50 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux Kernel list, linux-ia64, linux-pci

This patch is 3/4 of PCIERR implementation for IA64.

This part describes modifies in mca_drv.
If MCA happens and the error address was in resource of registered
RAS-aware driver, mca_drv notify the error to the driver and
restart the system. Soon notified driver will do its recovery.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
  arch/ia64/kernel/mca_drv.c   |   81 +++++++++++++++++++++++++++++++++++++++++++
  arch/ia64/lib/pcierr_check.c |    1
  2 files changed, 82 insertions(+)

Index: linux-2.6.16_WORK/arch/ia64/kernel/mca_drv.c
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/kernel/mca_drv.c
+++ linux-2.6.16_WORK/arch/ia64/kernel/mca_drv.c
@@ -37,6 +37,11 @@

  #include "mca_drv.h"

+#ifdef CONFIG_PCIERR_CHECK
+#include <linux/pci.h>
+extern struct list_head pcierr_list;
+#endif
+
  /* max size of SAL error record (default) */
  static int sal_rec_max = 10000;

@@ -399,6 +404,76 @@
  	return MCA_IS_GLOBAL;
  }

+#ifdef CONFIG_PCIERR_CHECK
+
+/**
+ * get_target_identifier - get address of target_identifier
+ * @peidx:	pointer of index of processor error section
+ *
+ * Return value:
+ *	addr if valid / 0 if not valid
+ */
+static u64 get_target_identifier(peidx_table_t *peidx)
+{
+	sal_log_mod_error_info_t *smei;
+
+	smei = peidx_bus_check(peidx, 0);
+	if (smei->valid.target_identifier)
+		return (smei->target_identifier);
+	return 0;
+}
+
+#define isMEM64(resource) \
+(((resource).flags & (PCI_BASE_ADDRESS_SPACE|PCI_BASE_ADDRESS_MEM_TYPE_MASK)) \
+	== (PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64))
+
+/**
+ * pcierr_recovery - Check if MCA occur on transaction in pcierr_list.
+ * @peidx:	pointer of index of processor error section
+ *
+ * Return value:
+ *	1 if error could be caught in driver / 0 if not
+ */
+static int pcierr_recovery(peidx_table_t *peidx)
+{
+	u64 addr;
+	int i, loop = 0;
+	struct pci_dev *tdev;
+	iocookie *cookie;
+
+	if (list_empty(&pcierr_list))
+		return 0;
+
+	addr = get_target_identifier(peidx);
+	if (!addr)
+		return 0;
+
+	list_for_each_entry(cookie, &pcierr_list, list) {
+		/* MCA is non-maskable, be careful */
+		if (loop++ > 1024 || !cookie)
+			break;
+		/* double check */
+		if (pcierr_list.next == cookie->list.next)
+			break;
+		tdev = cookie->dev;
+		if (!tdev)
+			continue;
+		for (i = 0; i < PCI_ROM_RESOURCE; i++) {
+			if (tdev->resource[i].start <= addr
+				&& addr <= tdev->resource[i].end) {
+				cookie->error = 1;
+				return 1;
+			}
+			if (isMEM64(tdev->resource[i]))
+				i++;
+		}
+	}
+
+	return 0;
+}
+
+#endif /* CONFIG_PCIERR_CHECK */
+
  /**
   * recover_from_read_error - Try to recover the errors which type are "read"s.
   * @slidx:	pointer of index of SAL error record
@@ -473,6 +548,12 @@

  	}

+#ifdef CONFIG_PCIERR_CHECK
+	/* Are there any RAS-aware drivers? */
+	if (pcierr_recovery(peidx))
+		return 1;
+#endif
+
  	return 0;
  }

Index: linux-2.6.16_WORK/arch/ia64/lib/pcierr_check.c
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/lib/pcierr_check.c
+++ linux-2.6.16_WORK/arch/ia64/lib/pcierr_check.c
@@ -68,5 +68,6 @@
  	return 0;
  }

+EXPORT_SYMBOL(pcierr_list);	/* for MCA driver */
  EXPORT_SYMBOL(pcierr_read);
  EXPORT_SYMBOL(pcierr_clear);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/6] PCIERR : interfaces for synchronous I/O error detection on driver (poison)
  2006-03-22 21:01 ` Greg KH
                     ` (3 preceding siblings ...)
  2006-03-24  7:50   ` [PATCH 4/6] PCIERR : interfaces for synchronous I/O error detection on driver (mcadrv) Hidetoshi Seto
@ 2006-03-24  7:51   ` Hidetoshi Seto
  2006-03-24  7:52   ` [PATCH 6/6] PCIERR : interfaces for synchronous I/O error detection on driver (sample: Fusion MPT) Hidetoshi Seto
  5 siblings, 0 replies; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-24  7:51 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux Kernel list, linux-ia64, linux-pci

This patch is 4/4 of PCIERR implementation for IA64.

This part defines ia64_mca_barrier.
This barrier makes sure that the passed value is poisoned or not.
It can say that this will work as light-version of PAL_MC_DRAIN.
Please refer the comment in patch to get more detail.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
  arch/ia64/kernel/mca.c     |    4 +
  arch/ia64/kernel/mca_asm.S |   30 +++++++++++++
  include/asm-ia64/io.h      |   98 +++++++++++++++++++++++++++++++++++++++++++++
  3 files changed, 132 insertions(+)

Index: linux-2.6.16_WORK/arch/ia64/kernel/mca.c
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/kernel/mca.c
+++ linux-2.6.16_WORK/arch/ia64/kernel/mca.c
@@ -103,6 +103,10 @@
  /* In mca_asm.S */
  extern void			ia64_os_init_dispatch_monarch (void);
  extern void			ia64_os_init_dispatch_slave (void);
+#ifdef CONFIG_PCIERR_CHECK
+extern void			ia64_mca_barrier (unsigned long value);
+EXPORT_SYMBOL(ia64_mca_barrier);
+#endif

  static int monarch_cpu = -1;

Index: linux-2.6.16_WORK/arch/ia64/kernel/mca_asm.S
===================================================================
--- linux-2.6.16_WORK.orig/arch/ia64/kernel/mca_asm.S
+++ linux-2.6.16_WORK/arch/ia64/kernel/mca_asm.S
@@ -19,6 +19,9 @@
  // 12/08/05 Keith Owens <kaos@sgi.com>
  //		   Use per cpu MCA/INIT stacks for all data.
  //
+// 06/03/22 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
+//		   Added ia64_mca_barrier.
+//
  #include <linux/config.h>
  #include <linux/threads.h>

@@ -39,6 +42,7 @@
  	.global ia64_os_mca_dispatch
  	.global ia64_os_init_dispatch_monarch
  	.global ia64_os_init_dispatch_slave
+	.global ia64_mca_barrier

  	.text
  	.align 16
@@ -413,6 +417,32 @@

  //EndMain//////////////////////////////////////////////////////////////////////

+//StartMain////////////////////////////////////////////////////////////////////
+
+//
+// ia64_mca_barrier:
+//
+// Some chipset may poison the data read, instead of signaling a BERR.
+// The consummation of poisoned data triggers a MCA, which tells us the
+// polluted address.
+// Note that the read operation by itself does not consume the bad data,
+// so we have to do something with it before it is too late.
+//
+// Calling this function forces a consumption of the passed value since
+// the compiler will have to copy it from whatever register it was in to
+// an "out" register to pass to the function.
+// To avoid possible optimization by compiler, and to make doubly sure,
+// this assenbly clearly consumes the value by moving it to r8.
+//
+// In this way, the value is guaranteed secure, not poisoned, and sanity.
+//
+
+ia64_mca_barrier:
+	mov	r8=r32
+	br.ret.sptk.many rp
+
+//EndMain//////////////////////////////////////////////////////////////////////
+
  // common defines for the stubs
  #define	ms		r4
  #define	regs		r5
Index: linux-2.6.16_WORK/include/asm-ia64/io.h
===================================================================
--- linux-2.6.16_WORK.orig/include/asm-ia64/io.h
+++ linux-2.6.16_WORK/include/asm-ia64/io.h
@@ -165,6 +165,51 @@
   * during optimization, which is why we use "volatile" pointers.
   */

+#ifdef CONFIG_PCIERR_CHECK
+
+extern void ia64_mca_barrier(unsigned long value);
+
+static inline unsigned int
+___ia64_inb (unsigned long port)
+{
+	volatile unsigned char *addr = __ia64_mk_io_addr(port);
+	unsigned char ret;
+
+	ret = *addr;
+	__ia64_mf_a();
+	ia64_mca_barrier(ret);
+
+	return ret;
+}
+
+static inline unsigned int
+___ia64_inw (unsigned long port)
+{
+	volatile unsigned short *addr = __ia64_mk_io_addr(port);
+	unsigned short ret;
+
+	ret = *addr;
+	__ia64_mf_a();
+	ia64_mca_barrier(ret);
+
+	return ret;
+}
+
+static inline unsigned int
+___ia64_inl (unsigned long port)
+{
+	volatile unsigned int *addr = __ia64_mk_io_addr(port);
+	unsigned int ret;
+
+	ret = *addr;
+	__ia64_mf_a();
+	ia64_mca_barrier(ret);
+
+	return ret;
+}
+
+#else /* CONFIG_PCIERR_CHECK */
+
  static inline unsigned int
  ___ia64_inb (unsigned long port)
  {
@@ -198,6 +243,8 @@
  	return ret;
  }

+#endif /* CONFIG_PCIERR_CHECK */
+
  static inline void
  ___ia64_outb (unsigned char val, unsigned long port)
  {
@@ -314,6 +361,55 @@
   * a good idea).  Writes are ok though for all existing ia64 platforms (and
   * hopefully it'll stay that way).
   */
+
+#ifdef CONFIG_PCIERR_CHECK
+
+static inline unsigned char
+___ia64_readb (const volatile void __iomem *addr)
+{
+	unsigned char val;
+
+	val = *(volatile unsigned char __force *)addr;
+	ia64_mca_barrier(val);
+
+	return val;
+}
+
+static inline unsigned short
+___ia64_readw (const volatile void __iomem *addr)
+{
+	unsigned short val;
+
+	val = *(volatile unsigned short __force *)addr;
+	ia64_mca_barrier(val);
+
+	return val;
+}
+
+static inline unsigned int
+___ia64_readl (const volatile void __iomem *addr)
+{
+	unsigned int val;
+
+	val = *(volatile unsigned int __force *) addr;
+	ia64_mca_barrier(val);
+
+	return val;
+}
+
+static inline unsigned long
+___ia64_readq (const volatile void __iomem *addr)
+{
+	unsigned long val;
+
+	val = *(volatile unsigned long __force *) addr;
+	ia64_mca_barrier(val);
+
+	return val;
+}
+
+#else /* CONFIG_PCIERR_CHECK */
+
  static inline unsigned char
  ___ia64_readb (const volatile void __iomem *addr)
  {
@@ -338,6 +434,8 @@
  	return *(volatile unsigned long __force *) addr;
  }

+#endif /* CONFIG_PCIERR_CHECK */
+
  static inline void
  __writeb (unsigned char val, volatile void __iomem *addr)
  {


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 6/6] PCIERR : interfaces for synchronous I/O error detection on driver (sample: Fusion MPT)
  2006-03-22 21:01 ` Greg KH
                     ` (4 preceding siblings ...)
  2006-03-24  7:51   ` [PATCH 5/6] PCIERR : interfaces for synchronous I/O error detection on driver (poison) Hidetoshi Seto
@ 2006-03-24  7:52   ` Hidetoshi Seto
  5 siblings, 0 replies; 12+ messages in thread
From: Hidetoshi Seto @ 2006-03-24  7:52 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux Kernel list, linux-ia64, linux-pci

This is a sample of how to use.

This patch includes following changes:
1:
   Change CHIPREG_READ32 & CHIPREG_WRITE32 to take three args,
   pointer to adapter, and two memory addresses. And change
   them to return the result of memory access.
2:
   Set proper args to every CHIPREG_{READ,WRITE}32 call, and
   also put error code that returns if the access failed.
3:
   Declare a task that resets adapter. Schedule it if an error
   is detected and reset is required. Using CHIPREG_*HR call is
   convenient to require such reset on an error.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
  drivers/message/fusion/mptbase.c |  477 +++++++++++++++++++++++++++++----------
  1 files changed, 359 insertions(+), 118 deletions(-)

Index: linux-2.6.16_WORK/drivers/message/fusion/mptbase.c
===================================================================
--- linux-2.6.16_WORK.orig/drivers/message/fusion/mptbase.c
+++ linux-2.6.16_WORK/drivers/message/fusion/mptbase.c
@@ -181,16 +181,124 @@
  static void	mpt_spi_log_info(MPT_ADAPTER *ioc, u32 log_info);
  static void	mpt_sas_log_info(MPT_ADAPTER *ioc, u32 log_info);

+#ifdef CONFIG_PCIERR_CHECK
+static void	mptbase_schedule_reset(void *ioc);
+static struct work_struct	mptbase_rstTask;
+#endif
+
  /* module entry point */
  static int  __init    fusion_init  (void);
  static void __exit    fusion_exit  (void);

-#define CHIPREG_READ32(addr) 		readl_relaxed(addr)
-#define CHIPREG_READ32_dmasync(addr)	readl(addr)
-#define CHIPREG_WRITE32(addr,val) 	writel(val, addr)
+#define CHIPREG_READ32HR(ioc,addr,val)	pciras_readl(ioc,val,addr,1)
+#define CHIPREG_WRITE32HR(ioc,addr,val)	pciras_writel(ioc,val,addr,1)
+#define CHIPREG_READ32(ioc,addr,val)	pciras_readl(ioc,val,addr,0)
+#define CHIPREG_WRITE32(ioc,addr,val)	pciras_writel(ioc,val,addr,0)
  #define CHIPREG_PIO_WRITE32(addr,val)	outl(val, (unsigned long)addr)
  #define CHIPREG_PIO_READ32(addr) 	inl((unsigned long)addr)

+#ifdef CONFIG_PCIERR_CHECK
+#define PCI_ERR_RETRIES			2
+int
+pciras_readl(MPT_ADAPTER *ioc, u32 *regval, u32 *addr, int hres)
+{
+	u32 val;
+	u16 status;
+	iocookie cookie;
+	int retries = PCI_ERR_RETRIES;
+
+	do {
+		pcierr_clear(&cookie, ioc->pcidev);
+		val = ioread32(addr);
+		status = pcierr_read(&cookie);
+		if ((status) && (retries == PCI_ERR_RETRIES))
+			printk(MYIOC_s_WARN_FMT  "pciras_readl(), "
+			"detects pci parity error, do retry.\n", ioc->name);
+	} while(status && (--retries > 0));
+
+	if (status) {
+		printk(MYIOC_s_WARN_FMT  "pciras_readl(), detects pci parity "
+			"error, retries exhausted.\n", ioc->name);
+		if (hres)
+			schedule_work(&mptbase_rstTask);
+		return 1; /* Error */
+	}
+
+	if (regval)
+		*regval = val;
+	return 0; /* Success */
+}
+
+int
+pciras_writel(MPT_ADAPTER *ioc, u32 regval, u32 *addr, int hres)
+{
+	u16 status;
+	u16 perror;
+	int retries = PCI_ERR_RETRIES;
+
+	do {
+		perror = 0;
+		writel(regval, addr);
+		pci_read_config_word(ioc->pcidev, PCI_STATUS, &status);
+		if (status == 0xffff) {
+			if (retries == PCI_ERR_RETRIES)
+				printk(MYIOC_s_WARN_FMT  "pciras_writel(), "
+				"couldn't read pci register.\n", ioc->name);
+		} else if (status & PCI_STATUS_DETECTED_PARITY) {
+			if (retries == PCI_ERR_RETRIES)
+				printk(MYIOC_s_WARN_FMT  "pciras_writel(), "
+					"detects pci parity error, do retry.\n",
+					ioc->name);
+			perror = 1;
+		}
+		pci_write_config_word(ioc->pcidev, PCI_STATUS, status);
+	} while (perror && (--retries > 0));
+
+	if (perror) {
+		printk(MYIOC_s_WARN_FMT  "pciras_writel(), detects pci "
+			"parity error, retries exhausted.\n", ioc->name);
+
+		if (hres)
+			schedule_work(&mptbase_rstTask);
+		return 1; /* Error */
+	}
+
+	return 0; /* Success */
+}
+
+static void
+mptbase_schedule_reset(void *arg)
+{
+        MPT_ADAPTER *ioc = (MPT_ADAPTER *)arg;
+
+	mpt_HardResetHandler(ioc, CAN_SLEEP);
+
+	return;
+}
+
+#else /* CONFIG_PCIERR_CHECK */
+
+static inline int
+pciras_readl(MPT_ADAPTER *ioc, u32 *regval, u32 *addr, int hres)
+{
+	u32 val;
+
+	val = ioread32(addr);
+	if (regval)
+		*regval = val;
+	return 0; /* Success */
+}
+
+static inline int
+pciras_writel(MPT_ADAPTER *ioc, u32 regval, u32 *addr, int hres)
+{
+	writel(regval, addr);
+
+	return 0; /* Success */
+}
+
+#endif
+
  static void
  pci_disable_io_access(struct pci_dev *pdev)
  {
@@ -342,7 +450,7 @@

   out:
  	/*  Flush (non-TURBO) reply with a WRITE!  */
-	CHIPREG_WRITE32(&ioc->chip->ReplyFifo, pa);
+	CHIPREG_WRITE32HR(ioc, &ioc->chip->ReplyFifo, pa);

  	if (freeme)
  		mpt_free_msg_frame(ioc, mf);
@@ -373,12 +481,27 @@
  	MPT_ADAPTER *ioc = bus_id;
  	u32 pa;

+#ifdef CONFIG_PCIERR_CHECK
+	{
+		u16 status;
+
+		/* read status register to check whether DMA transfer was failed. */
+		pci_read_config_word(ioc->pcidev, PCI_STATUS, &status);
+		if (status & (PCI_STATUS_DETECTED_PARITY | PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_REC_TARGET_ABORT)) {
+			printk(MYIOC_s_WARN_FMT "mpt_interrupt(), detects pci parity error.\n", ioc->name);
+			pci_write_config_word(ioc->pcidev, PCI_STATUS, status);
+			schedule_work(&mptbase_rstTask);
+			return IRQ_HANDLED;
+		}
+	}
+#endif
  	/*
  	 *  Drain the reply FIFO!
  	 */
  	while (1) {
-		pa = CHIPREG_READ32_dmasync(&ioc->chip->ReplyFifo);
-		if (pa == 0xFFFFFFFF)
+		u16 status;
+		status = CHIPREG_READ32HR(ioc, &ioc->chip->ReplyFifo, &pa);
+		if (status || (pa == 0xFFFFFFFF))
  			return IRQ_HANDLED;
  		else if (pa & MPI_ADDRESS_REPLY_A_BIT)
  			mpt_reply(ioc, pa);
@@ -833,7 +956,7 @@

  	mf_dma_addr = (ioc->req_frames_low_dma + req_offset) | ioc->RequestNB[req_idx];
  	dsgprintk((MYIOC_s_INFO_FMT "mf_dma_addr=%x req_idx=%d RequestNB=%x\n", ioc->name, mf_dma_addr, req_idx, 
ioc->RequestNB[req_idx]));
-	CHIPREG_WRITE32(&ioc->chip->RequestFifo, mf_dma_addr);
+	CHIPREG_WRITE32HR(ioc, &ioc->chip->RequestFifo, mf_dma_addr);
  }

  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
@@ -933,11 +1056,11 @@
  	}

  	/* Make sure there are no doorbells */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
-
-	CHIPREG_WRITE32(&ioc->chip->Doorbell,
-			((MPI_FUNCTION_HANDSHAKE<<MPI_DOORBELL_FUNCTION_SHIFT) |
-			 ((reqBytes/4)<<MPI_DOORBELL_ADD_DWORDS_SHIFT)));
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->Doorbell,
+		       ((MPI_FUNCTION_HANDSHAKE<<MPI_DOORBELL_FUNCTION_SHIFT) |
+			((reqBytes/4)<<MPI_DOORBELL_ADD_DWORDS_SHIFT))))
+		return -6;

  	/* Wait for IOC doorbell int */
  	if ((ii = WaitForDoorbellInt(ioc, 5, sleepFlag)) < 0) {
@@ -945,13 +1068,18 @@
  	}

  	/* Read doorbell and check for active bit */
-	if (!(CHIPREG_READ32(&ioc->chip->Doorbell) & MPI_DOORBELL_ACTIVE))
-		return -5;
-
+	{
+		u32 pa;
+		u16 status;
+		status = CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa);
+		if (status || !(pa & MPI_DOORBELL_ACTIVE))
+			 return -5;
+	}
  	dhsprintk((KERN_INFO MYNAM ": %s: mpt_send_handshake_request start, WaitCnt=%d\n",
  		ioc->name, ii));

-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -8;

  	if ((r = WaitForDoorbellAck(ioc, 5, sleepFlag)) < 0) {
  		return -2;
@@ -966,7 +1094,8 @@
  			(req_as_bytes[(ii*4) + 1] <<  8) |
  			(req_as_bytes[(ii*4) + 2] << 16) |
  			(req_as_bytes[(ii*4) + 3] << 24));
-		CHIPREG_WRITE32(&ioc->chip->Doorbell, word);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->Doorbell, word))
+			return -9;
  		if ((r = WaitForDoorbellAck(ioc, 5, sleepFlag)) < 0) {
  			r = -3;
  			break;
@@ -979,7 +1108,8 @@
  		r = -4;

  	/* Make sure there are no doorbells */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -10;

  	return r;
  }
@@ -1006,16 +1136,20 @@
  	int	 r = 0;

  	/* return if in use */
-	if (CHIPREG_READ32(&ioc->chip->Doorbell)
-	    & MPI_DOORBELL_ACTIVE)
-	    return -1;
-
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	{
+		u32 pa;
+		u16 status;
+		status = CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa);
+		if (status || (pa & MPI_DOORBELL_ACTIVE))
+			return -1;
+	}

-	CHIPREG_WRITE32(&ioc->chip->Doorbell,
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0) ||
+		CHIPREG_WRITE32(ioc, &ioc->chip->Doorbell,
  		((MPI_FUNCTION_HOST_PAGEBUF_ACCESS_CONTROL
  		 <<MPI_DOORBELL_FUNCTION_SHIFT) |
-		 (access_control_value<<12)));
+		 (access_control_value<<12))))
+			return -3;

  	/* Wait for IOC to clear Doorbell Status bit */
  	if ((r = WaitForDoorbellAck(ioc, 5, sleepFlag)) < 0) {
@@ -1209,6 +1343,9 @@
  	int		 r = -ENODEV;
  	u8		 revision;
  	u8		 pcixcmd;
+#ifdef CONFIG_PCIERR_CHECK
+	u16		 pcicmd;
+#endif
  	static int	 mpt_ids = 0;
  #ifdef CONFIG_PROC_FS
  	struct proc_dir_entry *dent, *ent;
@@ -1329,6 +1466,10 @@
  		ioc->pio_chip = (SYSIF_REGS __iomem *)pmem;
  	}

+#ifdef CONFIG_PCIERR_CHECK
+	INIT_WORK(&mptbase_rstTask, mptbase_schedule_reset, (void *)ioc);
+#endif
+
  	if (pdev->device == MPI_MANUFACTPAGE_DEVICEID_FC909) {
  		ioc->prod_name = "LSIFC909";
  		ioc->bus_type = FC;
@@ -1397,6 +1538,17 @@
  			pcixcmd &= 0x8F;
  			pci_write_config_byte(pdev, 0x6a, pcixcmd);
  		}
+#ifdef CONFIG_PCIERR_CHECK
+		/* set PER(0040) and SERR_EN(0100) for PCI command register */
+                /* set DPER(0001) for PCI-X command register */
+		pci_read_config_word(pdev, PCI_COMMAND, &pcicmd);
+		pcicmd |= PCI_COMMAND_PARITY | PCI_COMMAND_SERR;
+		pci_write_config_word(pdev, PCI_COMMAND, pcicmd);
+
+		pci_read_config_byte(pdev, 0x6a, &pcixcmd);
+		pcixcmd |= 0x0001; /* set DPER */
+		pci_write_config_byte(pdev, 0x6a, pcixcmd);
+#endif
  	}
  	else if (pdev->device == MPI_MANUFACTPAGE_DEVID_1030_53C1035) {
  		ioc->prod_name = "LSI53C1035";
@@ -1438,9 +1590,11 @@
  	spin_lock_init(&ioc->FreeQlock);

  	/* Disable all! */
-	CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, 0xFFFFFFFF))
+		return -EIO;
  	ioc->active = 0;
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -EIO;

  	/* Set lookup ptr. */
  	list_add_tail(&ioc->list, &ioc_list);
@@ -1559,19 +1713,23 @@
  	}

  	/* Disable interrupts! */
-	CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, 0xFFFFFFFF))
+		return;

  	ioc->active = 0;
  	synchronize_irq(pdev->irq);

  	/* Clear any lingering interrupt */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return;

-	CHIPREG_READ32(&ioc->chip->IntStatus);
+	if (CHIPREG_READ32(ioc, &ioc->chip->IntStatus, NULL))
+		return;

  	mpt_adapter_dispose(ioc);

  	pci_set_drvdata(pdev, NULL);
+
  }

  /**************************************************************************
@@ -1605,11 +1763,13 @@
  	}

  	/* disable interrupts */
-	CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, 0xFFFFFFFF))
+		return -EIO;
  	ioc->active = 0;

  	/* Clear any lingering interrupt */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -EIO;

  	pci_disable_device(pdev);
  	pci_set_power_state(pdev, device_state);
@@ -1630,6 +1790,7 @@
  	u32 device_state = pdev->current_state;
  	int recovery_state;
  	int ii;
+	u32 pa;

  	printk(MYIOC_s_INFO_FMT
  	"pci-resume: pdev=0x%p, slot=%s, Previous operating state [D%d]\n",
@@ -1640,11 +1801,16 @@
  	pci_enable_device(pdev);

  	/* enable interrupts */
-	CHIPREG_WRITE32(&ioc->chip->IntMask, MPI_HIM_DIM);
+	CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, MPI_HIM_DIM);
  	ioc->active = 1;

  	/* F/W not running */
-	if(!CHIPREG_READ32(&ioc->chip->Doorbell)) {
+	{
+		u16 status;
+		pa = 0;
+		status = CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa);
+	}
+	if (!pa) {
  		/* enable domain validation flags */
  		for (ii=0; ii < MPT_MAX_SCSI_DEVICES; ii++) {
  			ioc->spi_data.dvStatus[ii] |= MPT_SCSICFG_NEED_DV;
@@ -1655,7 +1821,7 @@
  		"pci-resume: ioc-state=0x%x,doorbell=0x%x\n",
  		ioc->name,
  		(mpt_GetIocState(ioc, 1) >> MPI_IOC_STATE_SHIFT),
-		CHIPREG_READ32(&ioc->chip->Doorbell));
+		pa);

  	/* bring ioc to operational state */
  	if ((recovery_state = mpt_do_ioc_recovery(ioc,
@@ -1708,7 +1874,8 @@
  			ioc->name, reason==MPT_HOSTEVENT_IOC_BRINGUP ? "bringup" : "recovery");

  	/* Disable reply interrupts (also blocks FreeQ) */
-	CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, 0xFFFFFFFF))
+		return -5;
  	ioc->active = 0;

  	if (ioc->alt_ioc) {
@@ -1716,7 +1883,9 @@
  			reset_alt_ioc_active = 1;

  		/* Disable alt-IOC's reply interrupts (and FreeQ) for a bit ... */
-		CHIPREG_WRITE32(&ioc->alt_ioc->chip->IntMask, 0xFFFFFFFF);
+		if (CHIPREG_WRITE32(ioc->alt_ioc, &ioc->alt_ioc->chip->IntMask, 0xFFFFFFFF))
+			ret = -6;
+
  		ioc->alt_ioc->active = 0;
  	}

@@ -1733,7 +1902,9 @@
  				/* (re)Enable alt-IOC! (reply interrupt, FreeQ) */
  				dprintk((KERN_INFO MYNAM ": alt-%s reply irq re-enabled\n",
  						ioc->alt_ioc->name));
-				CHIPREG_WRITE32(&ioc->alt_ioc->chip->IntMask, MPI_HIM_DIM);
+				if (CHIPREG_WRITE32(ioc->alt_ioc, &ioc->alt_ioc->chip->IntMask, MPI_HIM_DIM))
+					ret = -7;
+
  				ioc->alt_ioc->active = 1;
  			}

@@ -1849,7 +2020,8 @@

  	if (ret == 0) {
  		/* Enable! (reply interrupt) */
-		CHIPREG_WRITE32(&ioc->chip->IntMask, MPI_HIM_DIM);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, MPI_HIM_DIM))
+			ret = -8;
  		ioc->active = 1;
  	}

@@ -1857,7 +2029,8 @@
  		/* (re)Enable alt-IOC! (reply interrupt) */
  		dinitprintk((KERN_INFO MYNAM ": alt-%s reply irq re-enabled\n",
  				ioc->alt_ioc->name));
-		CHIPREG_WRITE32(&ioc->alt_ioc->chip->IntMask, MPI_HIM_DIM);
+		if (CHIPREG_WRITE32(ioc->alt_ioc, &ioc->alt_ioc->chip->IntMask, MPI_HIM_DIM))
+			ret = -9;
  		ioc->alt_ioc->active = 1;
  	}

@@ -2042,10 +2215,12 @@
  	}

  	/* Disable adapter interrupts! */
-	CHIPREG_WRITE32(&ioc->chip->IntMask, 0xFFFFFFFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntMask, 0xFFFFFFFF))
+		return;
  	ioc->active = 0;
  	/* Clear any lingering interrupt */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return;

  	if (ioc->alloc != NULL) {
  		sz = ioc->alloc_sz;
@@ -2371,7 +2546,8 @@
  	u32 s, sc;

  	/*  Get!  */
-	s = CHIPREG_READ32(&ioc->chip->Doorbell);
+	CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &s);
+
  //	dprintk((MYIOC_s_INFO_FMT "raw state = %08x\n", ioc->name, s));
  	sc = s & MPI_IOC_STATE_MASK;

@@ -2968,15 +3144,14 @@

  	ddlprintk((MYIOC_s_INFO_FMT "downloadboot: fw size 0x%x (%d), FW Ptr %p\n",
  				ioc->name, pFwHeader->ImageSize, pFwHeader->ImageSize, pFwHeader));
-
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, 0xFF);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE);
-
-	CHIPREG_WRITE32(&ioc->chip->Diagnostic, (MPI_DIAG_PREVENT_IOC_BOOT | MPI_DIAG_DISABLE_ARM));
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, 0xFF)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, (MPI_DIAG_PREVENT_IOC_BOOT | MPI_DIAG_DISABLE_ARM)))
+		return EFAULT;

  	/* wait 1 msec */
  	if (sleepFlag == CAN_SLEEP) {
@@ -2985,11 +3160,15 @@
  		mdelay (1);
  	}

-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
-	CHIPREG_WRITE32(&ioc->chip->Diagnostic, diag0val | MPI_DIAG_RESET_ADAPTER);
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -EFAULT;
+
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, diag0val | MPI_DIAG_RESET_ADAPTER))
+		return -EFAULT;

  	for (count = 0; count < 30; count ++) {
-		diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+		if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+			return -EFAULT;
  		if (!(diag0val & MPI_DIAG_RESET_ADAPTER)) {
  			ddlprintk((MYIOC_s_INFO_FMT "RESET_ADAPTER cleared, count=%d\n",
  				ioc->name, count));
@@ -3010,15 +3189,17 @@
  		return -3;
  	}

-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, 0xFF);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE);
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE);
+	if(CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, 0xFF)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE))
+		return -EFAULT;

  	/* Set the DiagRwEn and Disable ARM bits */
-	CHIPREG_WRITE32(&ioc->chip->Diagnostic, (MPI_DIAG_RW_ENABLE | MPI_DIAG_DISABLE_ARM));
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, (MPI_DIAG_RW_ENABLE | MPI_DIAG_DISABLE_ARM)))
+		return -EFAULT;

  	fwSize = (pFwHeader->ImageSize + 3)/4;
  	ptrFw = (u32 *) pFwHeader;
@@ -3081,10 +3262,10 @@
  		CHIPREG_PIO_WRITE32(&ioc->pio_chip->DiagRwData, diagRwData);

  	} else /* if((ioc->bus_type == SAS) || (ioc->bus_type == FC)) */ {
-		diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
-		CHIPREG_WRITE32(&ioc->chip->Diagnostic, diag0val |
-		    MPI_DIAG_CLEAR_FLASH_BAD_SIG);
-
+		if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val)
+			|| CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic,
+				 diag0val | MPI_DIAG_CLEAR_FLASH_BAD_SIG))
+			return -EFAULT;
  		/* wait 1 msec */
  		if (sleepFlag == CAN_SLEEP) {
  			msleep_interruptible (1);
@@ -3096,17 +3277,20 @@
  	if (ioc->errata_flag_1064)
  		pci_disable_io_access(ioc->pcidev);

-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -EFAULT;
  	ddlprintk((MYIOC_s_INFO_FMT "downloadboot diag0val=%x, "
  		"turning off PREVENT_IOC_BOOT, DISABLE_ARM, RW_ENABLE\n",
  		ioc->name, diag0val));
  	diag0val &= ~(MPI_DIAG_PREVENT_IOC_BOOT | MPI_DIAG_DISABLE_ARM | MPI_DIAG_RW_ENABLE);
  	ddlprintk((MYIOC_s_INFO_FMT "downloadboot now diag0val=%x\n",
  		ioc->name, diag0val));
-	CHIPREG_WRITE32(&ioc->chip->Diagnostic, diag0val);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, diag0val))
+		return -EFAULT;

  	/* Write 0xFF to reset the sequencer */
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, 0xFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, 0xFF))
+		return -EFAULT;

  	if (ioc->bus_type == SAS) {
  		ioc_state = mpt_GetIocState(ioc, 0);
@@ -3250,14 +3434,17 @@
  #endif

  	/* Clear any existing interrupts */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -1;

  	/* Use "Diagnostic reset" method! (only thing available!) */
-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -1;

  #ifdef MPT_DEBUG
  	if (ioc->alt_ioc)
-		diag1val = CHIPREG_READ32(&ioc->alt_ioc->chip->Diagnostic);
+		if (CHIPREG_READ32(ioc->alt_ioc, &ioc->alt_ioc->chip->Diagnostic, &diag1val))
+			return -1;
  	dprintk((MYIOC_s_INFO_FMT "DbG1: diag0=%08x, diag1=%08x\n",
  			ioc->name, diag0val, diag1val));
  #endif
@@ -3270,12 +3457,13 @@
  			/* Write magic sequence to WriteSequence register
  			 * Loop until in diagnostic mode
  			 */
-			CHIPREG_WRITE32(&ioc->chip->WriteSequence, 0xFF);
-			CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE);
-			CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE);
-			CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE);
-			CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE);
-			CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE);
+			if (CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, 0xFF)
+				|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE)
+				|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE)
+				|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE)
+				|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE)
+				|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE))
+				return -1;

  			/* wait 100 msec */
  			if (sleepFlag == CAN_SLEEP) {
@@ -3292,7 +3480,8 @@

  			}

-			diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+			if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+				return -1;

  			dprintk((MYIOC_s_INFO_FMT "Wrote magic DiagWriteEn sequence (%x)\n",
  					ioc->name, diag0val));
@@ -3300,7 +3489,8 @@

  #ifdef MPT_DEBUG
  		if (ioc->alt_ioc)
-			diag1val = CHIPREG_READ32(&ioc->alt_ioc->chip->Diagnostic);
+			if (CHIPREG_READ32(ioc->alt_ioc, &ioc->alt_ioc->chip->Diagnostic, &diag1val))
+				return -1;
  		dprintk((MYIOC_s_INFO_FMT "DbG2: diag0=%08x, diag1=%08x\n",
  				ioc->name, diag0val, diag1val));
  #endif
@@ -3308,14 +3498,16 @@
  		 * Disable the ARM (Bug fix)
  		 *
  		 */
-		CHIPREG_WRITE32(&ioc->chip->Diagnostic, diag0val | MPI_DIAG_DISABLE_ARM);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, diag0val | MPI_DIAG_DISABLE_ARM))
+			return -1;
  		mdelay(1);

  		/*
  		 * Now hit the reset bit in the Diagnostic register
  		 * (THE BIG HAMMER!) (Clears DRWE bit).
  		 */
-		CHIPREG_WRITE32(&ioc->chip->Diagnostic, diag0val | MPI_DIAG_RESET_ADAPTER);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, diag0val | MPI_DIAG_RESET_ADAPTER))
+			return -1;
  		hard_reset_done = 1;
  		dprintk((MYIOC_s_INFO_FMT "Diagnostic reset performed\n",
  				ioc->name));
@@ -3351,7 +3543,8 @@
  			 * case.  _diag_reset will return < 0
  			 */
  			for (count = 0; count < 30; count ++) {
-				diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+				if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+					return -1;
  				if (!(diag0val & MPI_DIAG_RESET_ADAPTER)) {
  					break;
  				}
@@ -3377,7 +3570,8 @@
  			 * with calling program.
  			 */
  			for (count = 0; count < 60; count ++) {
-				doorbell = CHIPREG_READ32(&ioc->chip->Doorbell);
+			  if (CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &doorbell))
+					return -1;
  				doorbell &= MPI_IOC_STATE_MASK;

  				if (doorbell == MPI_IOC_STATE_READY) {
@@ -3394,10 +3588,12 @@
  		}
  	}

-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -1;
  #ifdef MPT_DEBUG
  	if (ioc->alt_ioc)
-		diag1val = CHIPREG_READ32(&ioc->alt_ioc->chip->Diagnostic);
+		if (CHIPREG_READ32(ioc->alt_ioc, &ioc->alt_ioc->chip->Diagnostic, &diag1val))
+			return -1;
  	dprintk((MYIOC_s_INFO_FMT "DbG3: diag0=%08x, diag1=%08x\n",
  		ioc->name, diag0val, diag1val));
  #endif
@@ -3405,18 +3601,20 @@
  	/* Clear RESET_HISTORY bit!  Place board in the
  	 * diagnostic mode to update the diag register.
  	 */
-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -1;
  	count = 0;
  	while ((diag0val & MPI_DIAG_DRWE) == 0) {
  		/* Write magic sequence to WriteSequence register
  		 * Loop until in diagnostic mode
  		 */
-		CHIPREG_WRITE32(&ioc->chip->WriteSequence, 0xFF);
-		CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE);
-		CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE);
-		CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE);
-		CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE);
-		CHIPREG_WRITE32(&ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, 0xFF)
+			|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_1ST_KEY_VALUE)
+			|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_2ND_KEY_VALUE)
+			|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_3RD_KEY_VALUE)
+			|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_4TH_KEY_VALUE)
+			|| CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, MPI_WRSEQ_5TH_KEY_VALUE))
+			return -1;

  		/* wait 100 msec */
  		if (sleepFlag == CAN_SLEEP) {
@@ -3431,11 +3629,15 @@
  					ioc->name, diag0val);
  			break;
  		}
-		diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+		if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+			return -1;
  	}
  	diag0val &= ~MPI_DIAG_RESET_HISTORY;
-	CHIPREG_WRITE32(&ioc->chip->Diagnostic, diag0val);
-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->Diagnostic, diag0val))
+		return -1;
+
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -1;
  	if (diag0val & MPI_DIAG_RESET_HISTORY) {
  		printk(MYIOC_s_WARN_FMT "ResetHistory bit failed to clear!\n",
  				ioc->name);
@@ -3443,11 +3645,13 @@

  	/* Disable Diagnostic Mode
  	 */
-	CHIPREG_WRITE32(&ioc->chip->WriteSequence, 0xFFFFFFFF);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->WriteSequence, 0xFFFFFFFF))
+		return -1;

  	/* Check FW reload status flags.
  	 */
-	diag0val = CHIPREG_READ32(&ioc->chip->Diagnostic);
+	if (CHIPREG_READ32(ioc, &ioc->chip->Diagnostic, &diag0val))
+		return -1;
  	if (diag0val & (MPI_DIAG_FLASH_BAD_SIG | MPI_DIAG_RESET_ADAPTER | MPI_DIAG_DISABLE_ARM)) {
  		printk(MYIOC_s_ERR_FMT "Diagnostic reset FAILED! (%02xh)\n",
  				ioc->name, diag0val);
@@ -3456,7 +3660,8 @@

  #ifdef MPT_DEBUG
  	if (ioc->alt_ioc)
-		diag1val = CHIPREG_READ32(&ioc->alt_ioc->chip->Diagnostic);
+		if (CHIPREG_READ32(ioc->alt_ioc, &ioc->alt_ioc->chip->Diagnostic, &diag1val))
+			return -1;
  	dprintk((MYIOC_s_INFO_FMT "DbG4: diag0=%08x, diag1=%08x\n",
  			ioc->name, diag0val, diag1val));
  #endif
@@ -3492,7 +3697,8 @@

  	drsprintk((KERN_INFO MYNAM ": %s: Sending IOC reset(0x%02x)!\n",
  			ioc->name, reset_type));
-	CHIPREG_WRITE32(&ioc->chip->Doorbell, reset_type<<MPI_DOORBELL_FUNCTION_SHIFT);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->Doorbell, reset_type<<MPI_DOORBELL_FUNCTION_SHIFT))
+		return -EFAULT;
  	if ((r = WaitForDoorbellAck(ioc, 5, sleepFlag)) < 0)
  		return r;

@@ -3789,7 +3995,9 @@

  	for (i = 0; i < ioc->reply_depth; i++) {
  		/*  Write each address to the IOC!  */
-		CHIPREG_WRITE32(&ioc->chip->ReplyFifo, alloc_dma);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->ReplyFifo, alloc_dma))
+			goto out_fail;
+
  		alloc_dma += ioc->reply_sz;
  	}

@@ -3854,10 +4062,11 @@
  	 * then tell IOC that we want to handshake a request of N words.
  	 * (WRITE u32val to Doorbell reg).
  	 */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
-	CHIPREG_WRITE32(&ioc->chip->Doorbell,
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0)
+		|| CHIPREG_WRITE32(ioc, &ioc->chip->Doorbell,
  			((MPI_FUNCTION_HANDSHAKE<<MPI_DOORBELL_FUNCTION_SHIFT) |
-			 ((reqBytes/4)<<MPI_DOORBELL_ADD_DWORDS_SHIFT)));
+			((reqBytes/4)<<MPI_DOORBELL_ADD_DWORDS_SHIFT))))
+		return -1;

  	/*
  	 * Wait for IOC's doorbell handshake int
@@ -3869,15 +4078,21 @@
  			ioc->name, reqBytes, t, failcnt ? " - MISSING DOORBELL HANDSHAKE!" : ""));

  	/* Read doorbell and check for active bit */
-	if (!(CHIPREG_READ32(&ioc->chip->Doorbell) & MPI_DOORBELL_ACTIVE))
+	{
+		u32 pa;
+		u16 status;
+		status = CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa);
+		if (status || !(pa & MPI_DOORBELL_ACTIVE))
  			return -1;
+	}

  	/*
  	 * Clear doorbell int (WRITE 0 to IntStatus reg),
  	 * then wait for IOC to ACKnowledge that it's ready for
  	 * our handshake request.
  	 */
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -1;
  	if (!failcnt && (t = WaitForDoorbellAck(ioc, 5, sleepFlag)) < 0)
  		failcnt++;

@@ -3895,7 +4110,8 @@
  				    (req_as_bytes[(ii*4) + 2] << 16) |
  				    (req_as_bytes[(ii*4) + 3] << 24));

-			CHIPREG_WRITE32(&ioc->chip->Doorbell, word);
+			if (CHIPREG_WRITE32(ioc, &ioc->chip->Doorbell, word))
+				return -1;
  			if ((t = WaitForDoorbellAck(ioc, 5, sleepFlag)) < 0)
  				failcnt++;
  		}
@@ -3951,7 +4167,8 @@

  	if (sleepFlag == CAN_SLEEP) {
  		while (--cntdn) {
-			intstat = CHIPREG_READ32(&ioc->chip->IntStatus);
+			if (CHIPREG_READ32(ioc, &ioc->chip->IntStatus, &intstat))
+				return -1;
  			if (! (intstat & MPI_HIS_IOP_DOORBELL_STATUS))
  				break;
  			msleep_interruptible (1);
@@ -3959,7 +4176,8 @@
  		}
  	} else {
  		while (--cntdn) {
-			intstat = CHIPREG_READ32(&ioc->chip->IntStatus);
+			if (CHIPREG_READ32(ioc, &ioc->chip->IntStatus, &intstat))
+				return -1;
  			if (! (intstat & MPI_HIS_IOP_DOORBELL_STATUS))
  				break;
  			mdelay (1);
@@ -4000,7 +4218,8 @@
  	cntdn = 1000 * howlong;
  	if (sleepFlag == CAN_SLEEP) {
  		while (--cntdn) {
-			intstat = CHIPREG_READ32(&ioc->chip->IntStatus);
+			if (CHIPREG_READ32(ioc, &ioc->chip->IntStatus, &intstat))
+				return -1;
  			if (intstat & MPI_HIS_DOORBELL_INTERRUPT)
  				break;
  			msleep_interruptible(1);
@@ -4008,7 +4227,8 @@
  		}
  	} else {
  		while (--cntdn) {
-			intstat = CHIPREG_READ32(&ioc->chip->IntStatus);
+			if (CHIPREG_READ32(ioc, &ioc->chip->IntStatus, &intstat))
+				return -1;
  			if (intstat & MPI_HIS_DOORBELL_INTERRUPT)
  				break;
  			mdelay(1);
@@ -4059,13 +4279,27 @@
  	if ((t = WaitForDoorbellInt(ioc, howlong, sleepFlag)) < 0) {
  		failcnt++;
  	} else {
-		hs_reply[u16cnt++] = le16_to_cpu(CHIPREG_READ32(&ioc->chip->Doorbell) & 0x0000FFFF);
-		CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+		{
+			u32 pa;
+			if (CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa))
+				return -1;
+			hs_reply[u16cnt++] = le16_to_cpu(pa & 0x0000FFFF);
+		}
+
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+			return -1;
  		if ((t = WaitForDoorbellInt(ioc, 5, sleepFlag)) < 0)
  			failcnt++;
  		else {
-			hs_reply[u16cnt++] = le16_to_cpu(CHIPREG_READ32(&ioc->chip->Doorbell) & 0x0000FFFF);
-			CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+			{
+				u32 pa;
+				if (CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa))
+					return -1;
+				hs_reply[u16cnt++] = le16_to_cpu(pa & 0x0000FFFF);
+			}
+
+			if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+				return -1;
  		}
  	}

@@ -4080,16 +4314,23 @@
  	for (u16cnt=2; !failcnt && u16cnt < (2 * mptReply->MsgLength); u16cnt++) {
  		if ((t = WaitForDoorbellInt(ioc, 5, sleepFlag)) < 0)
  			failcnt++;
-		hword = le16_to_cpu(CHIPREG_READ32(&ioc->chip->Doorbell) & 0x0000FFFF);
+		{
+			u32 pa;
+			if (CHIPREG_READ32(ioc, &ioc->chip->Doorbell, &pa))
+				return -1;
+			hword = le16_to_cpu(pa & 0x0000FFFF);
+		}
  		/* don't overflow our IOC hs_reply[] buffer! */
  		if (u16cnt < sizeof(ioc->hs_reply) / sizeof(ioc->hs_reply[0]))
  			hs_reply[u16cnt] = hword;
-		CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+		if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+			return -1;
  	}

  	if (!failcnt && (t = WaitForDoorbellInt(ioc, 5, sleepFlag)) < 0)
  		failcnt++;
-	CHIPREG_WRITE32(&ioc->chip->IntStatus, 0);
+	if (CHIPREG_WRITE32(ioc, &ioc->chip->IntStatus, 0))
+		return -1;

  	if (failcnt) {
  		printk(MYIOC_s_ERR_FMT "Handshake reply failure!\n",



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-04-03  4:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-22  8:38 [PATCH] PCIERR : interfaces for synchronous I/O error detection on driver Hidetoshi Seto
2006-03-22 21:01 ` Greg KH
2006-03-24  7:47   ` [PATCH 1/6] " Hidetoshi Seto
2006-03-24 23:43     ` Linas Vepstas
2006-03-27  2:37       ` Hidetoshi Seto
2006-03-31 22:01         ` Linas Vepstas
2006-04-03  4:54           ` Hidetoshi Seto
2006-03-24  7:48   ` [PATCH 2/6] PCIERR : interfaces for synchronous I/O error detection on driver (config) Hidetoshi Seto
2006-03-24  7:49   ` [PATCH 3/6] PCIERR : interfaces for synchronous I/O error detection on driver (base) Hidetoshi Seto
2006-03-24  7:50   ` [PATCH 4/6] PCIERR : interfaces for synchronous I/O error detection on driver (mcadrv) Hidetoshi Seto
2006-03-24  7:51   ` [PATCH 5/6] PCIERR : interfaces for synchronous I/O error detection on driver (poison) Hidetoshi Seto
2006-03-24  7:52   ` [PATCH 6/6] PCIERR : interfaces for synchronous I/O error detection on driver (sample: Fusion MPT) Hidetoshi Seto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox