Re: Scanning problems - machine lockups

public inbox for linux-kernel@vger.kernel.org

* Re: Scanning problems - machine lockups
       [not found] <01011823245400.01549@statler.ether-net>
@ 2001-01-19  1:41 ` Bob Frey
  2001-01-19 19:37   ` Gérard Roudier
  2001-01-20 15:54   ` Stephen Kitchener
  2001-01-19  9:30 ` David Woodhouse
  1 sibling, 2 replies; 5+ messages in thread
From: Bob Frey @ 2001-01-19  1:41 UTC
  To: Stephen Kitchener; +Cc: linux-scsi, linux-kernel

On Thu, Jan 18, 2001 at 11:24:54PM +0000, Stephen Kitchener wrote:
> The only thing that might be odd is that the scanner's scsi card and the 
> display card are using the same IRQ, but I thought that IRQ sharing was ok in 
> the new kernels. The display card is an AGP type and the scsi card is pci.
>
> As you might have guessed, I am at a loss as to what to do next. Any help 
> appreciated, even suggestions as to how I can track down what I haven't done 
> (yet!)
Sharing interrupts could be the problem. Interrupt sharing is supported
in the kernel as far as two different drivers being able to register a
handler for the same interrupt, but not much beyond that. From studying
the code I don't find any handling of unclaimed or spurious interrupts.

Some drivers (like video cards) do not register a handler for their card's
interrupt. So when another driver (like the advansys driver) shares an
interrupt with this card's "unregistered" interrupt there is no one left
to handle the interrupt. The system will loop taking an interrupt from
the card. I've observed this using the frame buffer driver. Note: this
problem is unnoticed if the (video) card does not share an interrupt with
another driver, because (at least on x86) Linux does not enable the
PIC IRQ bit for IRQs that do not have registered interrupt handlers.

For Linux I think the right way to handle this is to have each (SA_SHIRQ)
sharing capable interrupt handler return a TRUE or FALSE value indicating
whether the interrupt belongs to the driver. In kernel/irq.c:handle_IRQ_event()
check the return value. If after one pass through all of the interrupt
(action) handlers no one has claimed the interrupt then log a warning message
(spurious interrupt) and clear the interrupt. The difficult/painstaking
problem is that all SA_SHIRQ drivers need to be changed to return a
value to make this work.
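
A minimal userspace sketch of the proposal above, not kernel code. All
names here (irq_action, dispatch_shared_irq, toy_dev, toy_handler) are
hypothetical and chosen for illustration; the "pending" flag stands in
for whatever status bit a real device would expose. Each handler reports
whether the interrupt was its own, and the dispatcher warns when a full
pass over the chain finds no owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler type: returns true if the interrupt was ours. */
typedef bool (*irq_handler_fn)(int irq, void *dev_id);

/* One entry in the per-IRQ chain of sharing handlers. */
struct irq_action {
    irq_handler_fn handler;
    void *dev_id;
    struct irq_action *next;
};

/* Walk the chain once and report whether any handler claimed the IRQ. */
bool dispatch_shared_irq(int irq, struct irq_action *chain)
{
    bool claimed = false;

    for (struct irq_action *a = chain; a != NULL; a = a->next) {
        assert(a->handler != NULL);
        /* Call every handler: several devices may be asserting at once. */
        if (a->handler(irq, a->dev_id))
            claimed = true;
    }

    if (!claimed)
        fprintf(stderr, "irq %d: spurious interrupt, nobody claimed it\n",
                irq);

    return claimed;
}

/* Toy device: "pending" stands in for a device status register bit. */
struct toy_dev {
    bool pending;
};

bool toy_handler(int irq, void *dev_id)
{
    struct toy_dev *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return false;       /* not ours */
    dev->pending = false;   /* acknowledge the device */
    return true;
}
```

With two toy devices on one chain, the first dispatch is claimed by the
device whose pending flag is set; a second dispatch with nothing pending
is reported as spurious. What "clear the interrupt" should mean for an
unclaimed interrupt is left open here, as it is in the message.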

Anyway, the simplest solution for you is probably, if you can, to
assign the video card its own interrupt. Putting the two advansys cards
on the same interrupt is fine. I have used interrupt sharing between
multiple advansys cards and ethernet cards without a problem.

--
Bob Frey
bfrey@turbolinux.com.cn
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Scanning problems - machine lockups
       [not found] <01011823245400.01549@statler.ether-net>
  2001-01-19  1:41 ` Scanning problems - machine lockups Bob Frey
@ 2001-01-19  9:30 ` David Woodhouse
  1 sibling, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2001-01-19  9:30 UTC
  To: Bob Frey; +Cc: Stephen Kitchener, linux-scsi, linux-kernel

bfrey@turbolinux.com.cn said:
> For Linux I think the right way to handle this is to have each
> (SA_SHIRQ) sharing capable interrupt handler return a TRUE or FALSE
> value indicating whether the interrupt belongs to the driver. In
> kernel/irq.c:handle_IRQ_event() check the return value. If after one
> pass through all of the interrupt (action) handlers no one has claimed
> the interrupt then log a warning message (spurious interrupt) and clear
> the interrupt.

There exists hardware on which it's impossible to know for sure whether an 
interrupt was generated or not. Upon receiving what might have been a 
'status change' IRQ you check the status register. If it's reading the same 
as when you last looked at it, you have no way to be sure that it hasn't 
actually changed twice.

So you'd have to have a YES/NO/MAYBE return code, which means you'd still 
want to deal with the case of an IRQ storm with the only handler returning 
MAYBE. And in reality, your first pass through all the drivers would simply 
be to change the prototype and make them return MAYBE. And many 
of them would stay that way for a _loooong_ time. 
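
A small sketch of why such a handler cannot answer NO. The names
(irq_claim, status_dev, status_change_handler) are hypothetical, and the
status value is passed in as a parameter instead of being read from real
hardware. If the register differs from the last value seen, the handler
can say YES; if it reads the same, the status may have changed twice, so
the honest answer is MAYBE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-valued result, ordered NO < MAYBE < YES. */
enum irq_claim { IRQ_CLAIM_NO, IRQ_CLAIM_MAYBE, IRQ_CLAIM_YES };

struct status_dev {
    uint8_t last_status;    /* value seen on the previous interrupt */
};

/*
 * status_now is the freshly read status register. An unchanged value
 * does not prove the device was idle: it may have toggled there and back.
 */
enum irq_claim status_change_handler(struct status_dev *dev,
                                     uint8_t status_now)
{
    assert(dev != NULL);

    if (status_now != dev->last_status) {
        dev->last_status = status_now;
        return IRQ_CLAIM_YES;
    }
    return IRQ_CLAIM_MAYBE;
}
```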

Also, you don't necessarily need to mask the IRQ just because you got a 
single spurious trigger. It's only really necessary if you're getting 
inundated with so many IRQs that the system is falling over.

So your heuristic might be something like...

 Each jiffy, reset the 'spurious IRQ' count for each IRQ.

 Each time an IRQ is triggered, note the 'maximum' return code from the
	handlers called (where NO < MAYBE < YES).

 If YES, goto out:

 If MAYBE, increase the spurious IRQ count for this IRQ by 1

 If NO, increase the spurious IRQ count for this IRQ by 100

 If the spurious IRQ count for this IRQ has reached <a suitable number>,
	then disable the IRQ.

out:
 Return from irq.

Perhaps you don't want to touch the whole list of IRQ counts every jiffy. 
Perhaps you could reset it to zero on each YES response. Or something.

And if you start randomly disabling IRQs because they're triggering too 
often, then you'll probably need a way for a device driver to force you to 
turn them back on again, like kick_irq().
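
The heuristic above can be sketched in plain C as follows. This is a
userspace illustration only, under stated assumptions: the names
(irq_claim, irq_state, irq_tick, irq_account, irq_reenable) are
hypothetical, and SPURIOUS_LIMIT is an arbitrary stand-in for
<a suitable number>. The weights 1 and 100 and the per-jiffy reset come
from the steps listed above; irq_reenable() plays the role suggested for
kick_irq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-valued result, ordered NO < MAYBE < YES. */
enum irq_claim { IRQ_CLAIM_NO = 0, IRQ_CLAIM_MAYBE = 1, IRQ_CLAIM_YES = 2 };

#define SPURIOUS_LIMIT 1000  /* assumption: placeholder threshold */
#define WEIGHT_MAYBE   1
#define WEIGHT_NO      100

struct irq_state {
    unsigned int spurious;   /* spurious count since the last reset */
    bool disabled;
};

/* Called once per jiffy: reset the spurious count. */
void irq_tick(struct irq_state *st)
{
    st->spurious = 0;
}

/* 'Maximum' return code over all handlers called for this IRQ. */
enum irq_claim max_claim(const enum irq_claim *results, int n)
{
    enum irq_claim best = IRQ_CLAIM_NO;

    for (int i = 0; i < n; i++)
        if (results[i] > best)
            best = results[i];
    return best;
}

/* Account for one triggered IRQ; returns true while it stays enabled. */
bool irq_account(struct irq_state *st, const enum irq_claim *results, int n)
{
    assert(n >= 0);

    switch (max_claim(results, n)) {
    case IRQ_CLAIM_YES:
        return !st->disabled;            /* claimed: nothing to count */
    case IRQ_CLAIM_MAYBE:
        st->spurious += WEIGHT_MAYBE;
        break;
    case IRQ_CLAIM_NO:
        st->spurious += WEIGHT_NO;
        break;
    }

    if (st->spurious >= SPURIOUS_LIMIT)
        st->disabled = true;             /* stand-in for masking the IRQ */
    return !st->disabled;
}

/* A driver asks for the IRQ to be turned back on. */
void irq_reenable(struct irq_state *st)
{
    st->spurious = 0;
    st->disabled = false;
}
```

With these numbers, ten unclaimed interrupts within one jiffy disable the
line, while it takes a thousand MAYBE answers to do the same.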

--
dwmw2


* Re: Scanning problems - machine lockups
  2001-01-19  1:41 ` Scanning problems - machine lockups Bob Frey
@ 2001-01-19 19:37   ` Gérard Roudier
  2001-01-20 15:54   ` Stephen Kitchener
  1 sibling, 0 replies; 5+ messages in thread
From: Gérard Roudier @ 2001-01-19 19:37 UTC
  To: Bob Frey; +Cc: Stephen Kitchener, linux-scsi, linux-kernel



On Fri, 19 Jan 2001, Bob Frey wrote:

> On Thu, Jan 18, 2001 at 11:24:54PM +0000, Stephen Kitchener wrote:
> > The only thing that might be odd is that the scanner's scsi card and the 
> > display card are using the same IRQ, but I thought that IRQ sharing was ok in 
> > the new kernels. The display card is an AGP type and the scsi card is pci.
> >
> > As you might have guessed, I am at a loss as to what to do next. Any help 
> > appreciated, even suggestions as to how I can track down what I haven't done 
> > (yet!)
> Sharing interrupts could be the problem. Interrupt sharing is supported
> in the kernel as far as two different drivers being able to register a
> handler for the same interrupt, but not much beyond that. From studying
> the code I don't find any handling of unclaimed or spurious interrupts.
> 
> Some drivers (like video cards) do not register a handler for their card's
> interrupt. So when another driver (like the advansys driver) shares an
> interrupt with this card's "unregistered" interrupt there is no one left
> to handle the interrupt. The system will loop taking an interrupt from
> the card. I've observed this using the frame buffer driver. Note: this
> problem is unnoticed if the (video) card does not share an interrupt with
> another driver, because (at least on x86) Linux does not enable the
> PIC IRQ bit for IRQs that do not have registered interrupt handlers.
> 
> For Linux I think the right way to handle this is to have each (SA_SHIRQ)
> sharing capable interrupt handler return a TRUE or FALSE value indicating
> whether the interrupt belongs to the driver. In kernel/irq.c:handle_IRQ_event()
> check the return value. If after one pass through all of the interrupt
> (action) handlers no one has claimed the interrupt then log a warning message
> (spurious interrupt) and clear the interrupt. The difficult/painstaking
> problem is that all SA_SHIRQ drivers need to be changed to return a
> value to make this work.

There is no ordering of interrupts with respect to transactions in PCI.
As a result, getting interrupts that do not match a pending interrupt
condition as seen by the driver can happen, without the interrupt being
spurious.

As a result, the 2 following assertions:
- All interrupts in PCI are spurious
- No interrupt in PCI is spurious
are less wrong than asserting that some interrupts in PCI are relevant and
some are spurious. :-)

And btw, some hardware, notably Intel's, seems to ensure coherency
prior to delivering interrupts. This is useless work when the IRQ is
actually shared and only makes sense for ISA or ISA-like PCI devices
and in situations where the IRQ is not actually shared.

> Anyway, the simplest solution for you is probably, if you can, to
> assign the video card its own interrupt. Putting the two advansys cards
> on the same interrupt is fine. I have used interrupt sharing between
> multiple advansys cards and ethernet cards without a problem.

In theory, the O/S should warn _loudly_ if any PCI device has no
software driver attached, because there is no generic way to
completely quiesce a PCI device. As a result, loading drivers
after boot, or just loading drivers with interrupts enabled at boot, is
unsafe with PCI devices. This should be considered, even if the risk of
breakage is generally very low.

  Gérard.


* Re: Scanning problems - machine lockups
  2001-01-19  1:41 ` Scanning problems - machine lockups Bob Frey
  2001-01-19 19:37   ` Gérard Roudier
@ 2001-01-20 15:54   ` Stephen Kitchener
  2001-01-21  1:32     ` Bob Frey
  1 sibling, 1 reply; 5+ messages in thread
From: Stephen Kitchener @ 2001-01-20 15:54 UTC
  To: Bob Frey; +Cc: linux-scsi, linux-kernel

On Friday 19 January 2001 01:41, Bob Frey wrote:
> On Thu, Jan 18, 2001 at 11:24:54PM +0000, Stephen Kitchener wrote:
> > The only thing that might be odd is that the scanner's scsi card and the
> > display card are using the same IRQ, but I thought that IRQ sharing was
> > ok in the new kernels. The display card is an AGP type and the scsi card
> > is pci.
> >
> > As you might have guessed, I am at a loss as to what to do next. Any help
> > appreciated, even suggestions as to how I can track down what I haven't
> > done (yet!)
>
> Sharing interrupts could be the problem. Interrupt sharing is supported
> in the kernel as far as two different drivers being able to register a
> handler for the same interrupt, but not much beyond that. From studying
> the code I don't find any handling of unclaimed or spurious interrupts.
>
> Some drivers (like video cards) do not register a handler for their card's
> interrupt. So when another driver (like the advansys driver) shares an
> interrupt with this card's "unregistered" interrupt there is no one left
> to handle the interrupt. The system will loop taking an interrupt from
> the card. I've observed this using the frame buffer driver. Note: this
> problem is unnoticed if the (video) card does not share an interrupt with
> another driver, because (at least on x86) Linux does not enable the
> PIC IRQ bit for IRQs that do not have registered interrupt handlers.
>
> For Linux I think the right way to handle this is to have each (SA_SHIRQ)
> sharing capable interrupt handler return a TRUE or FALSE value indicating
> whether the interrupt belongs to the driver. In
> kernel/irq.c:handle_IRQ_event() check the return value. If after one pass
> through all of the interrupt (action) handlers no one has claimed the
> interrupt then log a warning message (spurious interrupt) and clear the
> interrupt. The difficult/painstaking problem is that all SA_SHIRQ drivers
> need to be changed to return a value to make this work.
>
> Anyway, the simplest solution for you is probably, if you can, to
> assign the video card its own interrupt. Putting the two advansys cards
> on the same interrupt is fine. I have used interrupt sharing between
> multiple advansys cards and ethernet cards without a problem.

Hi Bob and the list,

I eventually succeeded in putting the graphics card onto a different IRQ from 
either of the SCSI cards. Not without some problems though. The AGP card would 
follow whatever IRQ I assigned the PCI slot nearest it. The mobo is an ASUS 
P2BD btw. The only way I could make the change was to swap the ethernet card 
with the scsi card. The Ethernet card now has the same IRQ as the Graphics 
card.

I would try a PCI graphics card, just in case the AGP card is getting in
the way, but I don't have one.

Anyway, I thought that it had cured the problem, but after a few scans, 
admittedly more than before, the scan head didn't return on the last scan 
that was successfully started.

Trying to scan again, hoping that it would reset the scanner and carry on, 
... nothing, no response from scanner.

So.. could it be the scsi driver (Advansys 3940uw, in the kernel), or a 
broken scanner itself? Is there a way I can test this, run tests, switch on 
debug etc.?

Scsi device is set up as follows...

Device Information for AdvanSys SCSI Host 0:
Target IDs Detected: 1, 7, (7=Host Adapter)
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: UMAX     Model: Astra 2400S      Rev: V1.1
  Type:   Scanner                          ANSI SCSI revision: 02
 
EEPROM Settings for AdvanSys SCSI Host 0:
 Serial Number: AA48A919D387
 Host SCSI ID: 7, Host Queue Size: 253, Device Queue Size: 63
 termination: 0 (Automatic), bios_ctrl: ffe7
 Target ID:            0 1 2 3 4 5 6 7 8 9 A B C D E F
 Disconnects:          Y N Y Y Y Y Y Y Y Y Y Y Y Y Y Y
 Command Queuing:      Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
 Start Motor:          Y N Y Y Y N Y Y Y Y Y Y Y Y Y Y
 Synchronous Transfer: Y N Y Y Y Y Y Y Y Y Y Y Y Y Y Y
 Ultra Transfer:       Y N Y Y Y Y Y Y Y Y Y Y Y Y Y Y
 Wide Transfer:        Y N Y Y Y Y Y Y Y Y Y Y Y Y Y Y


-- 
Stephen Kitchener

* Re: Scanning problems - machine lockups
  2001-01-20 15:54   ` Stephen Kitchener
@ 2001-01-21  1:32     ` Bob Frey
  0 siblings, 0 replies; 5+ messages in thread
From: Bob Frey @ 2001-01-21  1:32 UTC
  To: Stephen Kitchener; +Cc: linux-scsi, linux-kernel

On Sat, Jan 20, 2001 at 03:54:22PM +0000, Stephen Kitchener wrote:
> Anyway, I thought that it had cured the problem, but after a few scans, 
> admittedly more than before, the scan head didn't return on the last scan 
> that was successfully started.
It sounds like you did solve the "lock the machine solid" problem by putting
the video card on its own interrupt. Does anyone specifically maintain
kernel/irq.c? If so, please submit some spurious interrupt handling (for
interrupts generated by devices that don't have a handler) along the lines
of what David Woodhouse suggested. I would expect it to deactivate the IRQ
and report the problem to the user so they can fix it themselves, probably
by changing the IRQ configuration. Not perfect, but better than locking up
the machine with no message.

> Trying to scan again, hoping that it would reset the scanner and carry on, 
> ... nothing, no response from scanner.
So now apparently the scanner doesn't respond after a few scans, but the
system continues to work OK. This sounds like a problem with either
the scanner software, the advansys driver, or the scanner itself.

1) I'm not familiar with SANE (is that what you're using?), but it probably
has some test programs. Please try them if they exist and report the
results.
2) Please send the contents of the /proc/scsi/advansys/1 file after the hang. Also
you said you have another advansys card (/proc/scsi/advansys/0) with a lot
of devices attached. Do they all work correctly after the scanner hang? If
so, the problem is isolated to the second card and it's not a general driver
problem.
3) Another experiment is to compile the advansys driver as a module if you
can and after the hang, rmmod/insmod it to see if the scanner starts working
again. I would expect it to because this will reset the driver, adapter, and
SCSI bus.

-- 
Bob Frey
bfrey@turbolinux.com.cn
