* [RFC] I/O MCA recovery
@ 2004-05-04 16:54 Jesse Barnes
2004-05-04 17:14 ` Grant Grundler
` (32 more replies)
0 siblings, 33 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-04 16:54 UTC (permalink / raw)
To: linux-ia64
Background: in an effort to allow option ROM emulation on ia64 (via the X
int10+x86 emulator), I've had to look at doing I/O error recovery since many
option ROMs expect to do legacy I/O port reads and writes to ports that may
or may not respond (one particular ROM that I've looked at continuously polls
a register in legacy I/O space until it returns a value). On sn2, when a
device doesn't respond to an I/O (legacy space or otherwise), a PCI master
abort is generated, which generally causes an MCA.
Recovering from such an event requires reprogramming chipset and bridge
registers (some to just clear error state and others to re-arm error
detection) and as such is very platform specific. Another issue is that the
MCA event may arrive after the processor has switched to a task completely
unrelated to the I/O. The approach I've taken thus far is to register the
I/O address range that a process mmaps in /proc/bus/pci (in
pci_mmap_page_range), along with its associated PID. When an MCA occurs, an
I/O error recovery routine checks the target identifier value against the
linked list of I/O ranges and recovers appropriately (the PID is there so
that we can send a SIGBUS or somesuch in the future). This allows us to
avoid calling PAL_MC_DRAIN on every interrupt to try and flush out errors
(which I'm guessing would be very expensive), but may have other problems.
Ultimately, this involves adding a machine vector for I/O error recovery and a
linked list of I/O regions and their PIDs. The I/O error handler could
optionally be extended to look for any PCI resource range and call a
per-device error handling callback or shutdown routine.
Thoughts? Does this approach sound reasonable?
Thanks,
Jesse
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
From: Grant Grundler @ 2004-05-04 17:14 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 09:54:09AM -0700, Jesse Barnes wrote:
> Another issue is that the
> MCA event may arrive after the processor has switched to a task completely
> unrelated to the I/O. The approach I've taken thus far is to register the
> I/O address range that a process mmaps in /proc/bus/pci (in
> pci_mmap_page_range), along with its associated PID.
Why not use the existing resource map?
The PCI bus data structures are hierarchical and resources are well
defined within them. Seems like the linked list of I/O ranges could (1)
get very long and (2) just replicate what's already there.
> Ultimately, this involves adding a machine vector for I/O error recovery and a
> linked list of I/O regions and their PIDs. The I/O error handler could
> optionally be extended to look for any PCI resource range and call a
> per-device error handling callback or shutdown routine.
>
> Thoughts? Does this approach sound reasonable?
Seems to fit nicely with previous pci_read_check() support proposed
earlier.
grant
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-04 17:27 UTC (permalink / raw)
To: linux-ia64
On Tuesday, May 4, 2004 10:14 am, Grant Grundler wrote:
> On Tue, May 04, 2004 at 09:54:09AM -0700, Jesse Barnes wrote:
> > Another issue is that the
> > MCA event may arrive after the processor has switched to a task
> > completely unrelated to the I/O. The approach I've taken thus far is to
> > register the I/O address range that a process mmaps in /proc/bus/pci (in
> > pci_mmap_page_range), along with its associated PID.
>
> Why not use the existing resource map?
> The PCI bus data structures are hierarchical and resources are well
> defined within them. Seems like the linked list of I/O ranges could (1)
> get very long and (2) just replicate what's already there.
You mean just use request_region at pci_mmap_page_range time instead? That
would prevent multiple processes from accessing the same region, but we'd
still want to know the PID of the process that had that range reserved (the
gfx guys really want to get a signal when an I/O error occurs so they can
recover in userspace; other userspace drivers probably want the same).
Another problem is that legacy I/O space isn't listed in any of the PCI
resource maps (at least as far as I know), so there would be no way to track
that region, which is the one that I'm *really* interested in. :)
> > Ultimately, this involves adding a machine vector for I/O error recovery
> > and a linked list of I/O regions and their PIDs. The I/O error handler
> > could optionally be extended to look for any PCI resource range and call
> > a per-device error handling callback or shutdown routine.
> >
> > Thoughts? Does this approach sound reasonable?
>
> Seems to fit nicely with previous pci_read_check() support proposed
> earlier.
Yeah, it could be tied in with that interface.
Jesse
* Re: [RFC] I/O MCA recovery
From: David Mosberger @ 2004-05-04 17:43 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 4 May 2004 09:54:09 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> Background: in an effort to allow option ROM emulation on
Jesse> ia64 (via the X int10+x86 emulator), I've had to look at
Jesse> doing I/O error recovery since many option ROMs expect to do
Jesse> legacy I/O port reads and writes to ports that may or may not
Jesse> respond (one particular ROM that I've looked at continuously
Jesse> polls a register in legacy I/O space until it returns a
Jesse> value). On sn2, when a device doesn't respond to an I/O
Jesse> (legacy space or otherwise), a PCI master abort is generated,
Jesse> which generally causes an MCA.
Jesse> Recovering from such an event requires reprogramming chipset
Jesse> and bridge registers (some to just clear error state and
Jesse> others to re-arm error detection) and as such is very
Jesse> platform specific. Another issue is that the MCA event may
Jesse> arrive after the processor has switched to a task completely
Jesse> unrelated to the I/O. The approach I've taken thus far is to
Jesse> register the I/O address range that a process mmaps in
Jesse> /proc/bus/pci (in pci_mmap_page_range), along with its
Jesse> associated PID. When an MCA occurs, an I/O error recovery
Jesse> routine checks the target identifier value against the linked
Jesse> list of I/O ranges and recovers appropriately (the PID is
Jesse> there so that we can send a SIGBUS or somesuch in the
Jesse> future). This allows us to avoid calling PAL_MC_DRAIN on
Jesse> every interrupt to try and flush out errors (which I'm
Jesse> guessing would be very expensive), but may have other
Jesse> problems.
Jesse> Ultimately, this involves adding a machine vector for I/O
Jesse> error recovery and a linked list of I/O regions and their
Jesse> PIDs. The I/O error handler could optionally be extended to
Jesse> look for any PCI resource range and call a per-device error
Jesse> handling callback or shutdown routine.
Jesse> Thoughts? Does this approach sound reasonable?
Eh, I/O space is required to soft-fail, isn't it?
Why can't you hide this in the platform-specific inX/outX routines? I
suppose it would be very slow to drain MCAs after every inX/outX, but
you'd have to do the slow part only once, until you know whether or
not the given I/O address is safe.
--david
* Re: [RFC] I/O MCA recovery
From: Grant Grundler @ 2004-05-04 17:51 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 10:27:21AM -0700, Jesse Barnes wrote:
> On Tuesday, May 4, 2004 10:14 am, Grant Grundler wrote:
> > Why not use the existing resource map?
> > The PCI bus data structures are hierarchical and resources are well
> > defined within them. Seems like the linked list of I/O ranges could (1)
> > get very long and (2) just replicate what's already there.
>
> You mean just use request_region at pci_mmap_page_range time instead?
No - directly walk either the PCI bus/device tree or the ioport space
resource tree and look up the owner.
> That
> would prevent multiple processes from accessing the same region, but we'd
> still want to know the PID of the process that had that range reserved (the
> gfx guys really want to get a signal when an I/O error occurs so they can
> recover in userspace; other userspace drivers probably want the same).
hrm...ic. And the process doesn't need to "mmap" the region/file for
IO Port space like it would for MMIO accesses. :^(
Isn't working with the vendor to NOT do this sort of crap also an option?
At least a few vendors offer EFI drivers for video cards...
(ie avoid the problem of x86 BIOS in the first place)
Linux really needs a driver/card that can provide HW acceleration
without first having BIOS initializing it. parisc-linux port
(and probably a few others) could use such a driver/card too.
> Another problem is that legacy I/O space isn't listed in any of the PCI
> resource maps (at least as far as I know), so there would be no way to track
> that region, which is the one that I'm *really* interested in. :)
Well, if code is randomly poking around without registering with
request_region, then your proposal is as good as any.
> Yeah, it could be tied in with that interface.
cool.
thanks,
grant
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-04 18:04 UTC (permalink / raw)
To: linux-ia64
On Tuesday, May 4, 2004 10:51 am, Grant Grundler wrote:
> hrm...ic. And the process doesn't need to "mmap" the region/file for
> IO Port space like it would for MMIO accesses. :^(
Right, that's even worse. I'm not sure how to deal with that (sn2 doesn't
even support port I/O in the ia64 architected sense).
> Isn't working with the vendor to NOT do this sort of crap also an option?
> At least a few vendors offer EFI drivers for video cards...
> (ie avoid the problem of x86 BIOS in the first place)
Yeah, that's the Way Of The Future (tm), but there's a lot of legacy stuff out
there...
> Linux really needs a driver/card that can provide HW acceleration
> without first having BIOS initializing it. parisc-linux port
> (and probably a few others) could use such a driver/card too.
The parisc port should be able to use the int10 emulator too, as long as you
can recover from any errors that might be generated...
> > Another problem is that legacy I/O space isn't listed in any of the PCI
> > resource maps (at least as far as I know), so there would be no way to
> > track that region, which is the one that I'm *really* interested in. :)
>
> Well, if code is randomly poking around without registering with
> request_region, then your proposal is as good as any.
Ok, I'll keep hacking on it then and post a patch when I have something
presentable.
Thanks,
Jesse
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-04 18:07 UTC (permalink / raw)
To: linux-ia64
On Tuesday, May 4, 2004 10:43 am, David Mosberger wrote:
> Eh, I/O space is required to soft-fail, isn't it?
I thought so too, but I haven't been able to find the spec that contains that
requirement.
> Why can't you hide this in the platform-specific inX/outX routines? I
> suppose it would be very slow to drain MCAs after every inX/outX, but
> you'd have to do the slow part only once, until you know whether or
> not the given I/O address is safe.
This is I/O initiated by userspace loads/stores, so unless I wrap every in/out
with some sort of ioctl or something, those won't help me. Also, with this
scheme, we could potentially recover from regular read/writes too.
Jesse
* Re: [RFC] I/O MCA recovery
From: David Mosberger @ 2004-05-04 18:20 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 4 May 2004 11:07:41 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> On Tuesday, May 4, 2004 10:43 am, David Mosberger wrote:
>> Eh, I/O space is required to soft-fail, isn't it?
Jesse> I thought so too, but I haven't been able to find the spec
Jesse> that contains that requirement.
It's certainly implied.
>> Why can't you hide this in the platform-specific inX/outX routines? I
>> suppose it would be very slow to drain MCAs after every inX/outX, but
>> you'd have to do the slow part only once, until you know whether or
>> not the given I/O address is safe.
Jesse> This is I/O initiated by userspace loads/stores, so unless I
Jesse> wrap every in/out with some sort of ioctl or something, those
Jesse> won't help me.
User-level accesses are mapped via the MMU so you could always
intercept the page-faults.
Jesse> Also, with this scheme, we could potentially recover from
Jesse> regular read/writes too.
_If_ there is an infrastructure that you can hook into, fine. But I'm
highly suspicious of using broken platforms as a justification for new
infrastructure.
--david
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-04 22:36 UTC (permalink / raw)
To: linux-ia64
On Tuesday, May 4, 2004 11:20 am, David Mosberger wrote:
> Jesse> This is I/O initiated by userspace loads/stores, so unless I
> Jesse> wrap every in/out with some sort of ioctl or something, those
> Jesse> won't help me.
>
> User-level accesses are mapped via the MMU so you could always
> intercept the page-faults.
Wouldn't that mean that on every I/O access we'd have to page fault, do the
I/O, and then invalidate the mapping? That seems like a lot of overhead.
> Jesse> Also, with this scheme, we could potentially recover from
> Jesse> regular read/writes too.
>
> _If_ there is an infrastructure what you can hook into, fine. But I'm
> highly suspicious of using broken platforms as a justification for new
> infrastructure.
Are you describing ia64 as a broken platform here? The problem I'm trying to
solve isn't sn2 specific (though part of the X stuff I have to do will be
driven by sn2 requirements), it's a generic way to deal with hard fails on
PIO reads, which afaik, affects all ia64 platforms. Correct me if I'm wrong
here...
Thanks,
Jesse
* Re: [RFC] I/O MCA recovery
From: Chris Wedgwood @ 2004-05-04 22:50 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 10:51:37AM -0700, Grant Grundler wrote:
> Linux really needs a driver/card that can provide HW acceleration
> without first having BIOS initializing it. parisc-linux port
> (and probably a few others) could use such a driver/card too.
Modern hardware is complex. The (closed) McApple firmware includes
code to init various video cards and stuff, AFAIK. Not sure if
this is any better than BIOS emulation though.
--cw
* Re: [RFC] I/O MCA recovery
From: David Mosberger @ 2004-05-04 22:51 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 4 May 2004 15:36:13 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
>> User-level accesses are mapped via the MMU so you could always
>> intercept the page-faults.
Jesse> Wouldn't that mean that on every I/O access we'd have to page
Jesse> fault, do the I/O, and then invalidate the mapping? That
Jesse> seems like a lot of overhead.
Yes. I doubt it would be an issue for inX/outX emulation in the int10
module.
Jesse> Also, with this scheme, we could potentially recover from
Jesse> regular read/writes too.
>> _If_ there is an infrastructure that you can hook into, fine.
>> But I'm highly suspicious of using broken platforms as a
>> justification for new infrastructure.
Jesse> Are you describing ia64 as a broken platform here?
Hardly.
Jesse> The problem I'm trying to solve isn't sn2 specific (though
Jesse> part of the X stuff I have to do will be driven by sn2
Jesse> requirements)
I was talking about hard-failure of inX/outX. If SN2 does that, it's
broken and I'm not terribly sympathetic (but see below).
Jesse> it's a generic way to deal with hard fails on PIO reads,
Jesse> which afaik, affects all ia64 platforms. Correct me if I'm
Jesse> wrong here...
Let me try to say it differently: inX/outX must soft-fail. How you
achieve that on SN2, I don't really care. If, for other reasons,
there happens to be an infrastructure you can hook into to facilitate
implementation of soft-fail inX/outX on SN2, that's certainly fine by
me. But don't try to use inX/outX soft-fail as a reason to justify
the infrastructure. Better?
--david
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-04 22:58 UTC (permalink / raw)
To: linux-ia64
On Tuesday, May 4, 2004 3:51 pm, David Mosberger wrote:
> Yes. I doubt it would be an issue for inX/outX emulation in the int10
> module.
True.
> I was talking about hard-failure of inX/outX. If SN2 does that, it's
> broken and I'm not terribly sympathetic (but see below).
But my point was: doesn't in/out hard fail on other ia64 platforms too? If
so, then it makes sense to deal with it generically.
> Jesse> it's a generic way to deal with hard fails on PIO reads,
> Jesse> which afaik, affects all ia64 platforms. Correct me if I'm
> Jesse> wrong here...
>
> Let me try to say it differently: inX/outX must soft-fail. How you
> achieve that on SN2, I don't really care. If, for other reasons,
> there happens to be an infrastructure you can hook into to facilitate
> implementation of soft-fail inX/outX on SN2, that's certainly fine by
> me. But don't try to use inX/outX soft-fail as a reason to justify
> the infrastructure. Better?
Sure, that makes sense. The other part of the implementation was to deal with
regular MMIO accesses though--userspace drivers want to get signalled when an
error occurs, would you propose the page fault mechanism to detect that as
well, or is an MCA handler a better way to go?
Thanks,
Jesse
* Re: [RFC] I/O MCA recovery
From: Grant Grundler @ 2004-05-04 23:11 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 03:36:13PM -0700, Jesse Barnes wrote:
> Are you describing ia64 as a broken platform here? The problem I'm trying to
> solve isn't sn2 specific (though part of the X stuff I have to do will be
> driven by sn2 requirements), it's a generic way to deal with hard fails on
> PIO reads, which afaik, affects all ia64 platforms. Correct me if I'm wrong
> here...
hardfail vs softfail is a chipset, not arch issue.
Intel IA32/IA64 chipsets will softfail on MMIO reads and IO port
access that master abort (timeout).
HP chipsets (ZX1/SX1000) will hardfail - it's one of the differences
I point out in the "Porting drivers to ZX1" OLS2002 paper I wrote.
Sounds like SGI chipsets behave the same way.
ISTR only config space accesses are always required to softfail
on master aborts. I don't recall whether I/O port space is required to as
well, and it looks like this issue is outside the scope of the PCI spec.
grant
* Re: [RFC] I/O MCA recovery
From: David Mosberger @ 2004-05-04 23:13 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 4 May 2004 15:58:32 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> But my point was: doesn't in/out hard fail on other ia64
Jesse> platforms too?
No. AFAIK, inX/outX is always supposed to soft-fail. On zx1-based
machines, firmware initializes the chipset such that memory-mapped I/O
will hard-fail. Intel chipsets will always soft-fail, even for
memory-mapped I/O. Does anybody know what IBM's chipset does in this
regard?
Jesse> Sure, that makes sense. The other part of the implementation
Jesse> was to deal with regular MMIO accesses though--userspace
Jesse> drivers want to get signalled when an error occurs, would you
Jesse> propose the page fault mechanism to detect that as well, or
Jesse> is an MCA handler a better way to go?
I don't have a strong opinion on this particular issue.
--david
* Re: [RFC] I/O MCA recovery
From: David Mosberger @ 2004-05-04 23:15 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 4 May 2004 16:11:35 -0700, Grant Grundler <iod00d@hp.com> said:
Grant> HP chipsets (ZX1/SX1000) will hardfail - it's one of the
Grant> differences I point out in the "Porting drivers to ZX1"
Grant> OLS2002 paper I wrote. Sounds like SGI chipsets behave the
Grant> same way.
Hmmh, I thought zx1 soft-fails on inX/outX?
--david
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-04 23:17 UTC (permalink / raw)
To: linux-ia64
On Tuesday, May 4, 2004 4:11 pm, Grant Grundler wrote:
> On Tue, May 04, 2004 at 03:36:13PM -0700, Jesse Barnes wrote:
> > Are you describing ia64 as a broken platform here? The problem I'm
> > trying to solve isn't sn2 specific (though part of the X stuff I have to
> > do will be driven by sn2 requirements), it's a generic way to deal with
> > hard fails on PIO reads, which afaik, affects all ia64 platforms.
> > Correct me if I'm wrong here...
>
> hardfail vs softfail is a chipset, not arch issue.
> Intel IA32/IA64 chipsets will softfail on MMIO reads and IO port
> access that master abort (timeout).
That's nice (that's why I wanted a pointer to a spec--I can hand it to our hw
guys who have been asking for it).
> HP chipsets (ZX1/SX1000) will hardfail - it's one of the differences
> I point out in the "Porting drivers to ZX1" OLS2002 paper I wrote.
> Sounds like SGI chipsets behave the same way.
Yep.
> ISTR only config space accesses are always required to softfail
> on master aborts. I don't recall if IO Port space is required to as
> well and it looks like this issue is outside the scope of PCI spec.
Alex says that zx1 can soft fail on IO port accesses if configured to do so.
We don't have that option.
And of course, the more general case of MMIO accesses by user level drivers
remains...
Thanks,
Jesse
* Re: [RFC] I/O MCA recovery
From: Grant Grundler @ 2004-05-04 23:18 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 04:15:15PM -0700, David Mosberger wrote:
> >>>>> On Tue, 4 May 2004 16:11:35 -0700, Grant Grundler <iod00d@hp.com> said:
>
> Grant> HP chipsets (ZX1/SX1000) will hardfail - it's one of the
> Grant> differences I point out in the "Porting drivers to ZX1"
> Grant> OLS2002 paper I wrote. Sounds like SGI chipsets behave the
> Grant> same way.
>
> Hmmh, I thought zx1 soft-fails on inX/outX?
I'll have to try it...none of the drivers I care about use I/O port
space or randomly attempt to poke around.
ISTR the graphics folks agonizing over this issue when bringing
up the Radeon cards as console - the BIOS emulation in firmware
had issues with this same problem. Alex, remember more details?
grant
* Re: [RFC] I/O MCA recovery
From: Alex Williamson @ 2004-05-04 23:23 UTC (permalink / raw)
To: linux-ia64
On Tue, 2004-05-04 at 17:17, Jesse Barnes wrote:
> > ISTR only config space accesses are always required to softfail
> > on master aborts. I don't recall if IO Port space is required to as
> > well and it looks like this issue is outside the scope of PCI spec.
>
> Alex says that zx1 can soft fail on IO port accesses if configured to do so.
> We don't have that option.
>
And the default configuration for zx1 is I/O port space in soft fail,
MMIO space in hardfail. The current zx1 "smarts" in X can reconfigure
the bridge w/ VGA to softfail MMIO for probing/int10 support. It's
ugly, but it works.
Alex
--
Alex Williamson HP Linux & Open Source Lab
* Re: [RFC] I/O MCA recovery
From: Grant Grundler @ 2004-05-04 23:31 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 04:17:30PM -0700, Jesse Barnes wrote:
> That's nice (that's why I wanted a pointer to a spec--I can hand it to our hw
> guys who have been asking for it).
Well, ask Intel. Nicely. :^)
> Alex says that zx1 can soft fail on IO port accesses if configured to do so.
Ok. Then IO Port and MMIO accesses will by default hardfail on ZX1.
I believe there is only one bit that can be twiddled regarding this.
grant
* Re: [RFC] I/O MCA recovery
From: David Mosberger @ 2004-05-04 23:31 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 4 May 2004 16:17:30 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> That's nice (that's why I wanted a pointer to a spec--I can
Jesse> hand it to our hw guys who have been asking for it).
I'm not sure you'll find an explicit statement to that effect. In my
opinion, it's implied because the x86 IN/OUT instructions do not (and
should not) cause MCAs.
--david
* Re: [RFC] I/O MCA recovery
From: Grant Grundler @ 2004-05-04 23:36 UTC (permalink / raw)
To: linux-ia64
On Tue, May 04, 2004 at 04:31:26PM -0700, Grant Grundler wrote:
> Ok. Then IO Port and MMIO accesses will by default hardfail on ZX1.
> I believe there is only one bit that can be twiddled regarding this.
I'm wrong.
later chip revs added two more bits to "adjust" I/O port failure mode.
(thanks alex)
grant
* Re: [RFC] I/O MCA recovery
From: Jesse Barnes @ 2004-05-12 19:03 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 1331 bytes --]
On Tuesday, May 4, 2004 11:04 am, Jesse Barnes wrote:
> Ok, I'll keep hacking on it then and post a patch when I have something
> presentable.
Well, maybe this isn't quite presentable, but it seems to work for my small
test cases. The basic idea is that when a user land process mmaps something
in /proc/bus/pci, it'll get added to a list which is checked by the MCA
handler when a bus check occurs. If there's a match, the process is
SIGBUS'd.
There are still lots of holes, I think, especially since I'm not sure about all
of the MCA-specific bits yet. In particular, is looking at the current MCA
record ok? Should I be indexing into ->proc_err.info any further than I do?
There are also locking problems, since at the moment an MCA could occur on
multiple processors, but I think the MCA code in general doesn't handle that
case...
Anyway, I'd appreciate any comments. Maybe something like this is ok for the
mainline? I haven't dealt with read_check here, but I think the pseudocode
would be something like this:
read_check(addr) {
	val = *addr;
	PAL_MC_DRAIN;
	if (dev->error_flag)
		return error;
	return val;
}
Obviously, error_flag would be set in the MCA code as well based on walking
the PCI resource structures looking for a match with the target identifier
that caused the MCA...
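For illustration, that read_check pattern might flesh out like this as a
user-space sketch, with pal_mc_drain() and the per-device error flag mocked
out (both are hypothetical stand-ins for the real PAL call and MCA-handler
state):

```c
#include <assert.h>
#include <stdint.h>

/* Mock state: in the real thing, error_flag would be per-device state set
 * by the MCA handler when the target identifier falls in one of the
 * device's resource ranges, and pal_mc_drain() would issue PAL_MC_DRAIN
 * to flush outstanding transactions so a pending bus error surfaces. */
static volatile int error_flag;
static void pal_mc_drain(void) { /* stub for the PAL call */ }

#define READ_CHECK_ERROR (-1L)

/* Checked read: load, drain pending machine checks, then consult the
 * error flag before trusting the value. */
static long read_check(volatile uint32_t *addr)
{
	uint32_t val = *addr;
	pal_mc_drain();
	if (error_flag)
		return READ_CHECK_ERROR;
	return (long)val;
}
```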
Jesse
[-- Attachment #2: io-error-sigbus.patch --]
[-- Type: text/plain, Size: 5077 bytes --]
===== arch/ia64/kernel/mca.c 1.60 vs edited =====
--- 1.60/arch/ia64/kernel/mca.c Mon Mar 1 06:43:35 2004
+++ edited/arch/ia64/kernel/mca.c Wed May 12 11:33:24 2004
@@ -797,13 +797,51 @@
void
ia64_mca_ucmc_handler(void)
{
+ struct io_range *range;
+ unsigned long io_addr = 0;
pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
&ia64_sal_to_os_handoff_state.proc_state_param;
- int recover = psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc);
+ int recover = 0;
+ ia64_err_rec_t *curr_record;
/* Get the MCA error record and log it */
ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
+ /* TLB errors are fixed up before we get here, so recover */
+ if (psp->tc) {
+ recover = 1;
+ goto return_to_sal;
+ }
+
+ /*
+ * If it's not a bus check with a valid target identifier,
+ * we don't have a chance.
+ */
+ if (!psp->bc) {// || !curr_record->proc_err.info->valid.target_identifier) {
+ recover = 0;
+ goto return_to_sal;
+ }
+
+ curr_record = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA);
+ io_addr = curr_record->proc_err.info->target_identifier;
+
+ /*
+ * See if an I/O error occurred in a previously registered range
+ */
+ list_for_each_entry(range, &pci_io_ranges, range_list) {
+ if (range->start <= io_addr && io_addr <= range->end) {
+ struct siginfo siginfo;
+ recover = 1;
+ siginfo.si_signo = SIGBUS;
+ siginfo.si_code = BUS_ADRERR;
+ siginfo.si_addr = (void *) io_addr;
+ force_sig_info(SIGBUS, &siginfo,
+ find_task_by_pid(range->owner));
+ break;
+ }
+ }
+
+return_to_sal:
/*
* Wakeup all the processors which are spinning in the rendezvous
* loop.
===== arch/ia64/pci/pci.c 1.48 vs edited =====
--- 1.48/arch/ia64/pci/pci.c Wed Apr 21 14:26:09 2004
+++ edited/arch/ia64/pci/pci.c Wed May 12 10:56:16 2004
@@ -20,6 +20,7 @@
#include <linux/slab.h>
#include <linux/smp_lock.h>
#include <linux/spinlock.h>
+#include <linux/slab.h>
#include <asm/machvec.h>
#include <asm/page.h>
@@ -48,6 +49,9 @@
struct pci_fixup pcibios_fixups[1];
+LIST_HEAD(pci_io_ranges);
+spinlock_t io_range_list_lock = SPIN_LOCK_UNLOCKED;
+
/*
* Low-level SAL-based PCI configuration access functions. Note that SAL
* calls are already serialized (via sal_lock), so we don't need another
@@ -437,6 +441,8 @@
pci_mmap_page_range (struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine)
{
+ struct io_range *new_range;
+
/*
* I/O space cannot be accessed via normal processor loads and stores on this
* platform.
@@ -465,6 +471,29 @@
vma->vm_end - vma->vm_start, vma->vm_page_prot))
return -EAGAIN;
+ new_range = kmalloc(sizeof(struct io_range), GFP_KERNEL);
+ if (!new_range) {
+ printk(KERN_WARNING "%s: cannot allocate io_range, "
+ "I/O errors for 0x%016lx-0x%016lx will be fatal\n",
+ __FUNCTION__, vma->vm_start, vma->vm_end);
+ goto out;
+ }
+
+ /*
+ * Track this range and its associated process for use by the
+ * MCA handler.
+ */
+ new_range->start = __pa(vma->vm_pgoff << PAGE_SHIFT);
+ new_range->end = new_range->start + (vma->vm_end - vma->vm_start);
+ new_range->owner = current->pid;
+
+ spin_lock(&io_range_list_lock);
+ list_add(&new_range->range_list, &pci_io_ranges);
+ spin_unlock(&io_range_list_lock);
+
+ printk("I/O range 0x%016lx-0x%016lx registered\n",
+ new_range->start, new_range->end);
+ out:
return 0;
}
===== drivers/pci/proc.c 1.38 vs edited =====
--- 1.38/drivers/pci/proc.c Fri Mar 26 08:11:04 2004
+++ edited/drivers/pci/proc.c Wed May 12 11:46:04 2004
@@ -279,8 +279,22 @@
static int proc_bus_pci_release(struct inode *inode, struct file *file)
{
+ struct io_range *range;
+
kfree(file->private_data);
file->private_data = NULL;
+
+ spin_lock(&io_range_list_lock);
+ list_for_each_entry(range, &pci_io_ranges, range_list) {
+ if (range->owner == current->pid) {
+ list_del(&range->range_list);
+ printk("I/O range 0x%016lx-0x%016lx de-registered\n",
+ range->start, range->end);
+ kfree(range);
+ break;
+ }
+ }
+ spin_unlock(&io_range_list_lock);
return 0;
}
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h Tue Feb 3 21:31:10 2004
+++ edited/include/asm-ia64/io.h Tue May 4 10:02:55 2004
@@ -1,6 +1,8 @@
#ifndef _ASM_IA64_IO_H
#define _ASM_IA64_IO_H
+#include <linux/list.h>
+
/*
* This file contains the definitions for the emulated IO instructions
* inb/inw/inl/outb/outw/outl and the "string versions" of the same
@@ -50,12 +52,26 @@
extern struct io_space io_space[];
extern unsigned int num_io_spaces;
+/*
+ * Simple I/O range object with owner (if there is one)
+ */
+struct io_range {
+ unsigned long start, end;
+ struct list_head range_list;
+ pid_t owner;
+};
+
+extern struct list_head pci_io_ranges;
+
# ifdef __KERNEL__
+#include <linux/spinlock.h>
#include <asm/intrinsics.h>
#include <asm/machvec.h>
#include <asm/page.h>
#include <asm/system.h>
+
+extern spinlock_t io_range_list_lock;
/*
* Change virtual addresses to physical addresses and vv.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (20 preceding siblings ...)
2004-05-12 19:03 ` Jesse Barnes
@ 2004-05-12 21:11 ` David Mosberger
2004-05-12 21:24 ` Jesse Barnes
` (10 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: David Mosberger @ 2004-05-12 21:11 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 12 May 2004 12:03:28 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> Anyway, I'd appreciate any comments. Maybe something like
Jesse> this is ok for the mainline? I haven't dealt with read_check
Jesse> here, but I think the pseudocode would be something like
Jesse> this:
So if multiple processes map the same I/O range then _all_ of them
will be terminated? What if stuff gets remapped etc? This all just
seems incredibly fragile to me.
--david
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (21 preceding siblings ...)
2004-05-12 21:11 ` David Mosberger
@ 2004-05-12 21:24 ` Jesse Barnes
2004-05-12 21:35 ` David Mosberger
` (9 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-12 21:24 UTC (permalink / raw)
To: linux-ia64
On Wednesday, May 12, 2004 2:11 pm, David Mosberger wrote:
> So if multiple processes map the same I/O range then _all_ of them
> will be terminated?
As it stands, only the first PID that has a matching range will get a SIGBUS,
but that could be changed.
> What if stuff gets remapped etc?
Then the machine takes an unrecoverable MCA, just like it does now.
> This all just seems incredibly fragile to me.
I think it's useful for at least a subset of the I/O MCAs that people see, and
with the addition of the read_check stuff, may be more generally useful.
Obviously though, this doesn't recover from 100% of I/O errors, something
that's probably not practical anyway...
Jesse
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (22 preceding siblings ...)
2004-05-12 21:24 ` Jesse Barnes
@ 2004-05-12 21:35 ` David Mosberger
2004-05-12 21:44 ` Jesse Barnes
` (8 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: David Mosberger @ 2004-05-12 21:35 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 12 May 2004 14:24:42 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
>> This all just seems incredibly fragile to me.
Jesse> I think it's useful for at least a subset of the I/O MCAs
Jesse> that people see, and with the addition of the read_check
Jesse> stuff, may be more generally useful. Obviously though, this
Jesse> doesn't recover from 100% of I/O errors, something that's
Jesse> probably not practical anyway...
It definitely falls in the experimental category. That isn't bad but
I don't think it's mainline material. Certainly not for 2.6.
--david
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (23 preceding siblings ...)
2004-05-12 21:35 ` David Mosberger
@ 2004-05-12 21:44 ` Jesse Barnes
2004-05-12 21:52 ` Jesse Barnes
` (7 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-12 21:44 UTC (permalink / raw)
To: linux-ia64
On Wednesday, May 12, 2004 2:35 pm, David Mosberger wrote:
> It definitely falls in the experimental category. That isn't bad but
> I don't think it's mainline material. Certainly not for 2.6.
What would you like to see before it hits the mainline? The read_check
support implemented? For better or worse, that requires a lot of buy in on
lkml. Just looking for other suggestions...
Thanks,
Jesse
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (24 preceding siblings ...)
2004-05-12 21:44 ` Jesse Barnes
@ 2004-05-12 21:52 ` Jesse Barnes
2004-05-12 21:54 ` David Mosberger
` (6 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-12 21:52 UTC (permalink / raw)
To: linux-ia64
On Wednesday, May 12, 2004 2:24 pm, Jesse Barnes wrote:
> > What if stuff gets remapped etc?
>
> Then the machine takes an unrecoverable MCA, just like it does now.
Oh, actually I'm wrong here. Since I'm looking at the physical address in the
target identifier, a virtual remap will have no effect, so it'll still get
caught. Of course, that only addresses one of your concerns...
Jesse
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (25 preceding siblings ...)
2004-05-12 21:52 ` Jesse Barnes
@ 2004-05-12 21:54 ` David Mosberger
2004-05-12 21:59 ` Jesse Barnes
` (5 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: David Mosberger @ 2004-05-12 21:54 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 12 May 2004 14:44:22 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> On Wednesday, May 12, 2004 2:35 pm, David Mosberger wrote:
>> It definitely falls in the experimental category. That isn't bad but
>> I don't think it's mainline material. Certainly not for 2.6.
Jesse> What would you like to see before it hits the mainline? The
Jesse> read_check support implemented? For better or worse, that
Jesse> requires a lot of buy in on lkml. Just looking for other
Jesse> suggestions...
I just don't see a lot of demand for this kind of feature at this
point, so why bloat the kernel? Plus the solution so far seems rather
incomplete and fragile. I suggest playing with this in a separate
tree, and if users start to really like the patch and can't live
without it, well, I'm sure I'll hear about it.
--david
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (26 preceding siblings ...)
2004-05-12 21:54 ` David Mosberger
@ 2004-05-12 21:59 ` Jesse Barnes
2004-05-13 9:02 ` Luck, Tony
` (4 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-12 21:59 UTC (permalink / raw)
To: linux-ia64
On Wednesday, May 12, 2004 2:54 pm, David Mosberger wrote:
> I just don't see a lot of demand for this kind of feature at this
> point, so why bloat the kernel? Plus the solution so far seems rather
> incomplete and fragile. I suggest to play with this in a separate
> tree and if users start to really like the patch and can't live
> without it, well, I'm sure I'll hear about it.
Sure, ok. I'll see how useful it is in real-world situations (e.g. our gfx
folks have been wanting this for a while) and come back when I have
compelling evidence of its utility.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (27 preceding siblings ...)
2004-05-12 21:59 ` Jesse Barnes
@ 2004-05-13 9:02 ` Luck, Tony
2004-05-13 15:52 ` Jesse Barnes
` (3 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: Luck, Tony @ 2004-05-13 9:02 UTC (permalink / raw)
To: linux-ia64
>There are also locking problems, since at the moment an MCA
>could occur on multiple processors, but I think the MCA code
>in general doesn't handle that case...
At the moment the MCA code serializes simultaneous MCAs on multiple
processors (see the hand-crafted spinlock in mca_asm.S at the
ia64_os_mca_spin label).
-Tony
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (28 preceding siblings ...)
2004-05-13 9:02 ` Luck, Tony
@ 2004-05-13 15:52 ` Jesse Barnes
2004-05-13 16:07 ` Luck, Tony
` (2 subsequent siblings)
32 siblings, 0 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-13 15:52 UTC (permalink / raw)
To: linux-ia64
On Thursday, May 13, 2004 2:02 am, Luck, Tony wrote:
> >There are also locking problems, since at the moment an MCA
> >could occur on multiple processors, but I think the MCA code
> >in general doesn't handle that case...
>
> At the moment the MCA code serializes simultaneous MCAs on multiple
> processors (see the hand-crafted spinlock in mca_asm.S at the
> ia64_os_mca_spin label).
Thanks Tony, I hadn't looked at that code in a while. I guess the I/O error
recovery code should try to acquire the io_range_list_lock before looking
through the list. If it can't get the lock, we just have to give up and make
the error unrecoverable since we don't know if another CPU will take an MCA
while holding that lock, leaving the list in a bad state...
I don't *think* that doing unconditional rendezvous in the PROM will help this
situation either, but maybe someone else has good ideas about how to handle
that?
Thanks,
Jesse
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (29 preceding siblings ...)
2004-05-13 15:52 ` Jesse Barnes
@ 2004-05-13 16:07 ` Luck, Tony
2004-05-13 16:43 ` Russ Anderson
2004-05-13 16:53 ` Jesse Barnes
32 siblings, 0 replies; 34+ messages in thread
From: Luck, Tony @ 2004-05-13 16:07 UTC (permalink / raw)
To: linux-ia64
>Thanks Tony, I hadn't looked at that code in awhile. I guess
>the I/O error recovery code should try to acquire the
>io_range_list_lock before looking through the list.
Yes ... I wouldn't attempt anything trickier than a "trylock"
in code that's going to run as part of an MCA handler. The
async delivery of MCA means that there's always a risk of
deadlocking a cpu if it tries to grab a lock that it already
holds at the time the MCA is delivered.
-Tony
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (30 preceding siblings ...)
2004-05-13 16:07 ` Luck, Tony
@ 2004-05-13 16:43 ` Russ Anderson
2004-05-13 16:53 ` Jesse Barnes
32 siblings, 0 replies; 34+ messages in thread
From: Russ Anderson @ 2004-05-13 16:43 UTC (permalink / raw)
To: linux-ia64
Jesse Barnes wrote:
> On Thursday, May 13, 2004 2:02 am, Luck, Tony wrote:
> > >There are also locking problems, since at the moment an MCA
> > >could occur on multiple processors, but I think the MCA code
> > >in general doesn't handle that case...
> >
> > At the moment the MCA code serializes simultaneous MCAs on multiple
> > processors (see the hand-crafted spinlock in mca_asm.S at the
> > ia64_os_mca_spin label).
>
> Thanks Tony, I hadn't looked at that code in awhile. I guess the I/O error
> recovery code should try to acquire the io_range_list_lock before looking
> through the list. If it can't get the lock, we just have to give up and make
> the error unrecoverable since we don't know if another CPU will take an MCA
> while holding that lock, leaving the list in a bad state...
Seems like spinning on a trylock for a short period would
be reasonable. If everything is OK, the process holding the
lock will release it quickly. Otherwise, we're probably dead
anyway.
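A bounded-trylock loop along those lines might look like this (user-space
sketch: a pthread mutex stands in for io_range_list_lock, and the retry
bound is an arbitrary choice):

```c
#include <assert.h>
#include <pthread.h>

/* Bounded trylock for an MCA-like context where blocking is not an
 * option: retry a limited number of times, then give up and let the
 * caller treat the error as unrecoverable. */
#define LOCK_TRIES 1000

static int bounded_trylock(pthread_mutex_t *lock)
{
	for (int i = 0; i < LOCK_TRIES; i++)
		if (pthread_mutex_trylock(lock) == 0)
			return 1;	/* acquired */
	return 0;			/* give up: mark the MCA unrecoverable */
}
```

In the kernel version this would be spin_trylock() in a cpu_relax() loop,
but the shape is the same.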
> I don't *think* that doing unconditional rendezvous in the PROM will help this
> situation either, but maybe someone else has good ideas about how to handle
> that?
In general, I suggest avoiding rendezvous unless there is a really
obvious reason to do so. In this case, I think you're right.
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC] I/O MCA recovery
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
` (31 preceding siblings ...)
2004-05-13 16:43 ` Russ Anderson
@ 2004-05-13 16:53 ` Jesse Barnes
32 siblings, 0 replies; 34+ messages in thread
From: Jesse Barnes @ 2004-05-13 16:53 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 777 bytes --]
On Thursday, May 13, 2004 9:43 am, Russ Anderson wrote:
> Seems like spinning on a trylock for a short period would
> be reasonable. If everything is OK, the process holding the
> lock will release it quickly. Otherwise, we're probably dead
> anyway.
Here's the latest (untested!) version that tries to do that.
> > I don't *think* that doing unconditional rendezvous in the PROM will help
> > this situation either, but maybe someone else has good ideas about how to
> > handle that?
>
> In general, I suggest avoiding rendezvous unless there is a really
> obvious reason to do so. In this case, I think you're right.
I'm coming to the same conclusion, even though throwing a whole new kernel
execution context into the mix (MCA context) really makes this confusing...
Jesse
[-- Attachment #2: io-error-sigbus-3.patch --]
[-- Type: text/plain, Size: 5425 bytes --]
===== arch/ia64/kernel/mca.c 1.60 vs edited =====
--- 1.60/arch/ia64/kernel/mca.c Mon Mar 1 06:43:35 2004
+++ edited/arch/ia64/kernel/mca.c Thu May 13 09:28:06 2004
@@ -797,13 +797,70 @@
void
ia64_mca_ucmc_handler(void)
{
+ struct io_range *range;
+ unsigned long io_addr = 0;
pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
&ia64_sal_to_os_handoff_state.proc_state_param;
- int recover = psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc);
+ int recover = 0;
+ ia64_err_rec_t *curr_record;
/* Get the MCA error record and log it */
ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
+ /* TLB errors are fixed up before we get here, so recover */
+ if (psp->tc) {
+ recover = 1;
+ goto return_to_sal;
+ }
+
+ /*
+ * If it's not a bus check with a valid target identifier,
+ * we don't have a chance.
+ */
+ if (!psp->bc) {
+ recover = 0;
+ goto return_to_sal;
+ }
+
+ /*
+ * If we can't get this lock, we can't safely look at the list,
+ * so give up.
+ */
+ if (!spin_trylock(&io_range_list_lock)) {
+ recover = 0;
+ goto return_to_sal;
+ }
+
+ curr_record = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA);
+ io_addr = curr_record->proc_err.info->target_identifier;
+
+ /*
+ * See if an I/O error occurred in a previously registered range
+ */
+ list_for_each_entry(range, &pci_io_ranges, range_list) {
+ if (range->start <= io_addr && io_addr <= range->end) {
+ struct siginfo siginfo;
+ struct task_struct *owner = NULL;
+ recover = 1;
+ siginfo.si_signo = SIGBUS;
+ siginfo.si_code = BUS_ADRERR;
+ siginfo.si_addr = (void *) io_addr;
+ owner = find_task_by_pid(range->owner);
+ if (owner)
+ force_sig_info(SIGBUS, &siginfo, owner);
+ else {
+ /*
+ * need to free memory too, is that safe
+ * here?
+ */
+ list_del(&range->range_list);
+ }
+ break;
+ }
+ }
+ spin_unlock(&io_range_list_lock);
+
+return_to_sal:
/*
* Wakeup all the processors which are spinning in the rendezvous
* loop.
===== arch/ia64/pci/pci.c 1.48 vs edited =====
--- 1.48/arch/ia64/pci/pci.c Wed Apr 21 14:26:09 2004
+++ edited/arch/ia64/pci/pci.c Wed May 12 10:56:16 2004
@@ -20,6 +20,7 @@
#include <linux/slab.h>
#include <linux/smp_lock.h>
#include <linux/spinlock.h>
+#include <linux/slab.h>
#include <asm/machvec.h>
#include <asm/page.h>
@@ -48,6 +49,9 @@
struct pci_fixup pcibios_fixups[1];
+LIST_HEAD(pci_io_ranges);
+spinlock_t io_range_list_lock = SPIN_LOCK_UNLOCKED;
+
/*
* Low-level SAL-based PCI configuration access functions. Note that SAL
* calls are already serialized (via sal_lock), so we don't need another
@@ -437,6 +441,8 @@
pci_mmap_page_range (struct pci_dev *dev, struct vm_area_struct *vma,
enum pci_mmap_state mmap_state, int write_combine)
{
+ struct io_range *new_range;
+
/*
* I/O space cannot be accessed via normal processor loads and stores on this
* platform.
@@ -465,6 +471,29 @@
vma->vm_end - vma->vm_start, vma->vm_page_prot))
return -EAGAIN;
+ new_range = kmalloc(sizeof(struct io_range), GFP_KERNEL);
+ if (!new_range) {
+ printk(KERN_WARNING "%s: cannot allocate io_range, "
+ "I/O errors for 0x%016lx-0x%016lx will be fatal\n",
+ __FUNCTION__, vma->vm_start, vma->vm_end);
+ goto out;
+ }
+
+ /*
+ * Track this range and its associated process for use by the
+ * MCA handler.
+ */
+ new_range->start = __pa(vma->vm_pgoff << PAGE_SHIFT);
+ new_range->end = new_range->start + (vma->vm_end - vma->vm_start);
+ new_range->owner = current->pid;
+
+ spin_lock(&io_range_list_lock);
+ list_add(&new_range->range_list, &pci_io_ranges);
+ spin_unlock(&io_range_list_lock);
+
+ printk("I/O range 0x%016lx-0x%016lx registered\n",
+ new_range->start, new_range->end);
+ out:
return 0;
}
===== drivers/pci/proc.c 1.38 vs edited =====
--- 1.38/drivers/pci/proc.c Fri Mar 26 08:11:04 2004
+++ edited/drivers/pci/proc.c Wed May 12 11:46:04 2004
@@ -279,8 +279,22 @@
static int proc_bus_pci_release(struct inode *inode, struct file *file)
{
+ struct io_range *range;
+
kfree(file->private_data);
file->private_data = NULL;
+
+ spin_lock(&io_range_list_lock);
+ list_for_each_entry(range, &pci_io_ranges, range_list) {
+ if (range->owner == current->pid) {
+ list_del(&range->range_list);
+ printk("I/O range 0x%016lx-0x%016lx de-registered\n",
+ range->start, range->end);
+ kfree(range);
+ break;
+ }
+ }
+ spin_unlock(&io_range_list_lock);
return 0;
}
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h Tue Feb 3 21:31:10 2004
+++ edited/include/asm-ia64/io.h Tue May 4 10:02:55 2004
@@ -1,6 +1,8 @@
#ifndef _ASM_IA64_IO_H
#define _ASM_IA64_IO_H
+#include <linux/list.h>
+
/*
* This file contains the definitions for the emulated IO instructions
* inb/inw/inl/outb/outw/outl and the "string versions" of the same
@@ -50,12 +52,26 @@
extern struct io_space io_space[];
extern unsigned int num_io_spaces;
+/*
+ * Simple I/O range object with owner (if there is one)
+ */
+struct io_range {
+ unsigned long start, end;
+ struct list_head range_list;
+ pid_t owner;
+};
+
+extern struct list_head pci_io_ranges;
+
# ifdef __KERNEL__
+#include <linux/spinlock.h>
#include <asm/intrinsics.h>
#include <asm/machvec.h>
#include <asm/page.h>
#include <asm/system.h>
+
+extern spinlock_t io_range_list_lock;
/*
* Change virtual addresses to physical addresses and vv.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2004-05-13 16:53 UTC | newest]
Thread overview: 34+ messages
-- links below jump to the message on this page --
2004-05-04 16:54 [RFC] I/O MCA recovery Jesse Barnes
2004-05-04 17:14 ` Grant Grundler
2004-05-04 17:27 ` Jesse Barnes
2004-05-04 17:43 ` David Mosberger
2004-05-04 17:51 ` Grant Grundler
2004-05-04 18:04 ` Jesse Barnes
2004-05-04 18:07 ` Jesse Barnes
2004-05-04 18:20 ` David Mosberger
2004-05-04 22:36 ` Jesse Barnes
2004-05-04 22:50 ` Chris Wedgwood
2004-05-04 22:51 ` David Mosberger
2004-05-04 22:58 ` Jesse Barnes
2004-05-04 23:11 ` Grant Grundler
2004-05-04 23:13 ` David Mosberger
2004-05-04 23:15 ` David Mosberger
2004-05-04 23:17 ` Jesse Barnes
2004-05-04 23:18 ` Grant Grundler
2004-05-04 23:23 ` Alex Williamson
2004-05-04 23:31 ` Grant Grundler
2004-05-04 23:31 ` David Mosberger
2004-05-04 23:36 ` Grant Grundler
2004-05-12 19:03 ` Jesse Barnes
2004-05-12 21:11 ` David Mosberger
2004-05-12 21:24 ` Jesse Barnes
2004-05-12 21:35 ` David Mosberger
2004-05-12 21:44 ` Jesse Barnes
2004-05-12 21:52 ` Jesse Barnes
2004-05-12 21:54 ` David Mosberger
2004-05-12 21:59 ` Jesse Barnes
2004-05-13 9:02 ` Luck, Tony
2004-05-13 15:52 ` Jesse Barnes
2004-05-13 16:07 ` Luck, Tony
2004-05-13 16:43 ` Russ Anderson
2004-05-13 16:53 ` Jesse Barnes