linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!)
@ 2012-05-10 18:19 Jean-Francois Dagenais
  2012-05-11  7:48 ` Linus Walleij
  2012-05-11 18:53 ` gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!) Grant Likely
  0 siblings, 2 replies; 7+ messages in thread
From: Jean-Francois Dagenais @ 2012-05-10 18:19 UTC (permalink / raw)
  To: Grant Likely, linus.walleij
  Cc: torvalds, open list, Thomas Gleixner, eric.miao, maz

Hi all,

Here's a fun, yet potentially dangerous problem.

I've sent this twice already, without any feedback. I'm just trying to get more 
visibility on this because I believe it to be a serious issue depending where 
this chip may be used. Interrupts could be missed in what could be very tough 
conditions to replicate. In my setup I could reproduce the problem easily and 
took great care in analysing and documenting it.
===============Re-Re-Send (with edits)==================
Hi all,
(Using 3.1.0-rc4, didn't change since)

I have detected a flaw in the interrupt support of the gpio-pca953x driver. 
This is how I discovered the issue:

I have a interrupt client device which has it's interrupt signal connected to 
one of the GPIOs of the pca9555. The client is an ad714x device. It's interrupt 
routine is falling edge triggered (so says the driver, but according to the 
AD714x specs, it should be level,low, however this is another topic). The 
pca9555 is interrupting the CPU through INTA of an ethernet PCI card for which 
I have blacklisted the driver module and only done "echo 1 > 
/bus/pci/devices/00xyz/enable". It is "(level, low)" which matches the pca953x 
request_threaded_isr call (IRQF_TRIGGER_LOW | IRQF_ONESHOT). I used two GPOs on 
some other device to get the !ISR signals shown below. These show the nested 
nature of the cli ISR called from the pca953x's ISR.
ASCII art of my probing (use a monospace font, "!ISR"==low, mean ISR running)

Events(lined-up):1  2 3 4 5  6    7       8    9
             ____          _______
client !INT      |________|       |________(lost)___
             ___________                   _________
cli !ISR                |____|____________|
             ____      ___         _________________
pca953x !INT     |____|   |_______|
             ______                             ____
pca953x !ISR       |___________________________|
(note: the little spike in the cli !ISR marks the end of reading the client's 
status registers, the rest of the ISR is updating driver state machines, etc.)

Summary: The client enters interrupt(1), which immediately puts the pca953x in 
interrupt as well(1). When the pca953x enters it's isr thread(2), it reads the 
input status register, this immediately clears the pca953x interrupt(3). The 
pca953x ISR then figure's out it should invoke the nested ISR for the client. 
The client ISR starts (4) by reading the client chip's status register, which 
clears the client interrupt(5). This immediately puts the pca953x back into 
interrupt (5) since it's a different state than what was last read in the 
pca953x ISR(3). Since pca953x registered it's interrupt with IRQF_ONESHOT, this 
new IRQ state is masked and goes un-noticed. The pca953x chips' FLAW IS THEN 
REVEILED if/when the client chip goes back in interrupt before (7) the client 
ISR (and the pca053x's ISR with the current code) even has time to finish(8 and 
9). At that moment, the pca953x's input are back to the way they where when the 
pca953x ISR last read the input status register. So when all of the ISR 
routines are done (9...), the client has it's INT line asserted and no more 
interrupt will come from the pca953x because of this.

pca953x specs quote: "Resetting the interrupt circuit is achieved when data on 
the port is CHANGED TO THE ORIGINAL SETTING, data is read from the port that 
generated the interrupt or in a Stop event. [...] Interrupts that occur during 
the ACK or NACK clock pulse CAN BE LOST (or be very short) due to the resetting 
of the interrupt during this pulse."

So in essence, although the pca953x.c's design and doc says only edges are 
detected and supported, the chip design makes it IMPOSSIBLE to guarantee that 
all edges will be caught.

The IRQ function SHOULD BE REMOVED from the driver since the chip cannot honor 
the driver's promise. At the very least, for system integrators that want to 
use level base interrupts implemented by the pca953x, changes to the driver 
should be made to correctly handle the lack of LATCHING INTERRUPT STATUS 
register...

I would propose a patch, but I am missing knowledge on the internals of the 
kernel's irq code, so here's a simple suggestion description, hoping to get 
some feedback.

First of all, I propose pca953x_irq_set_type should only support 
IRQ_TYPE_LEVEL_HIGH and IRQ_TYPE_LEVEL_LOW. pca953x_irq_handler should be 
adjusted accordingly.

To make sure we catch as many transitions of pca953x chips INT as possible 
(yikes), we make pca953x_irq_setup() request it's own interrupt as both rising 
and falling (IRQF_TRIGGER_RISING|IRQF_TRIGGER_FALLING, is that even possible?) 
without IRQF_ONESHOT, so that pca953x_irq_handler is rescheduled for running 
even when it's already running.

Any thoughts? Maybe there are more/less fancy, or better yet, "correct" ways of 
doing this? Will this even work?

/jfd


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!)
  2012-05-10 18:19 gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!) Jean-Francois Dagenais
@ 2012-05-11  7:48 ` Linus Walleij
       [not found]   ` <20120604091127.0f95a85c@archvile>
  2012-05-11 18:53 ` gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!) Grant Likely
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Walleij @ 2012-05-11  7:48 UTC (permalink / raw)
  To: Jean-Francois Dagenais, David Jander
  Cc: Grant Likely, linus.walleij, torvalds, open list, Thomas Gleixner,
	eric.miao, maz

On Thu, May 10, 2012 at 8:19 PM, Jean-Francois Dagenais
<jeff.dagenais@gmail.com> wrote:

> I would propose a patch, but I am missing knowledge on the internals of the
> kernel's irq code, so here's a simple suggestion description, hoping to get
> some feedback.

David Jander has been working on this driver lately, David can you have a
look at the quite complex description from Jean-Francois?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!)
  2012-05-10 18:19 gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!) Jean-Francois Dagenais
  2012-05-11  7:48 ` Linus Walleij
@ 2012-05-11 18:53 ` Grant Likely
  1 sibling, 0 replies; 7+ messages in thread
From: Grant Likely @ 2012-05-11 18:53 UTC (permalink / raw)
  To: Jean-Francois Dagenais, linus.walleij
  Cc: torvalds, open list, Thomas Gleixner, eric.miao, maz

On Thu, 10 May 2012 14:19:17 -0400, Jean-Francois Dagenais <jeff.dagenais@gmail.com> wrote:
> Hi all,
> 
> Here's a fun, yet potentially dangerous problem.

Hi Jean-Francois,

It sounds like your analysis is accurate and your conclusions look
correct.  There really isn't much that I can do for you on this though
as I don't have the hardware.  As Linus mentioned, David Jander has
worked on the driver in the past, and git log shows some other folks.
I would ask them.

g.

> 
> I've sent this twice already, without any feedback. I'm just trying to get more 
> visibility on this because I believe it to be a serious issue depending where 
> this chip may be used. Interrupts could be missed in what could be very tough 
> conditions to replicate. In my setup I could reproduce the problem easily and 
> took great care in analysing and documenting it.
> ===============Re-Re-Send (with edits)==================
> Hi all,
> (Using 3.1.0-rc4, didn't change since)
> 
> I have detected a flaw in the interrupt support of the gpio-pca953x driver. 
> This is how I discovered the issue:
> 
> I have a interrupt client device which has it's interrupt signal connected to 
> one of the GPIOs of the pca9555. The client is an ad714x device. It's interrupt 
> routine is falling edge triggered (so says the driver, but according to the 
> AD714x specs, it should be level,low, however this is another topic). The 
> pca9555 is interrupting the CPU through INTA of an ethernet PCI card for which 
> I have blacklisted the driver module and only done "echo 1 > 
> /bus/pci/devices/00xyz/enable". It is "(level, low)" which matches the pca953x 
> request_threaded_isr call (IRQF_TRIGGER_LOW | IRQF_ONESHOT). I used two GPOs on 
> some other device to get the !ISR signals shown below. These show the nested 
> nature of the cli ISR called from the pca953x's ISR.
> ASCII art of my probing (use a monospace font, "!ISR"==low, mean ISR running)
> 
> Events(lined-up):1  2 3 4 5  6    7       8    9
>              ____          _______
> client !INT      |________|       |________(lost)___
>              ___________                   _________
> cli !ISR                |____|____________|
>              ____      ___         _________________
> pca953x !INT     |____|   |_______|
>              ______                             ____
> pca953x !ISR       |___________________________|
> (note: the little spike in the cli !ISR marks the end of reading the client's 
> status registers, the rest of the ISR is updating driver state machines, etc.)
> 
> Summary: The client enters interrupt(1), which immediately puts the pca953x in 
> interrupt as well(1). When the pca953x enters it's isr thread(2), it reads the 
> input status register, this immediately clears the pca953x interrupt(3). The 
> pca953x ISR then figure's out it should invoke the nested ISR for the client. 
> The client ISR starts (4) by reading the client chip's status register, which 
> clears the client interrupt(5). This immediately puts the pca953x back into 
> interrupt (5) since it's a different state than what was last read in the 
> pca953x ISR(3). Since pca953x registered it's interrupt with IRQF_ONESHOT, this 
> new IRQ state is masked and goes un-noticed. The pca953x chips' FLAW IS THEN 
> REVEILED if/when the client chip goes back in interrupt before (7) the client 
> ISR (and the pca053x's ISR with the current code) even has time to finish(8 and 
> 9). At that moment, the pca953x's input are back to the way they where when the 
> pca953x ISR last read the input status register. So when all of the ISR 
> routines are done (9...), the client has it's INT line asserted and no more 
> interrupt will come from the pca953x because of this.
> 
> pca953x specs quote: "Resetting the interrupt circuit is achieved when data on 
> the port is CHANGED TO THE ORIGINAL SETTING, data is read from the port that 
> generated the interrupt or in a Stop event. [...] Interrupts that occur during 
> the ACK or NACK clock pulse CAN BE LOST (or be very short) due to the resetting 
> of the interrupt during this pulse."
> 
> So in essence, although the pca953x.c's design and doc says only edges are 
> detected and supported, the chip design makes it IMPOSSIBLE to guarantee that 
> all edges will be caught.
> 
> The IRQ function SHOULD BE REMOVED from the driver since the chip cannot honor 
> the driver's promise. At the very least, for system integrators that want to 
> use level base interrupts implemented by the pca953x, changes to the driver 
> should be made to correctly handle the lack of LATCHING INTERRUPT STATUS 
> register...
> 
> I would propose a patch, but I am missing knowledge on the internals of the 
> kernel's irq code, so here's a simple suggestion description, hoping to get 
> some feedback.
> 
> First of all, I propose pca953x_irq_set_type should only support 
> IRQ_TYPE_LEVEL_HIGH and IRQ_TYPE_LEVEL_LOW. pca953x_irq_handler should be 
> adjusted accordingly.
> 
> To make sure we catch as many transitions of pca953x chips INT as possible 
> (yikes), we make pca953x_irq_setup() request it's own interrupt as both rising 
> and falling (IRQF_TRIGGER_RISING|IRQF_TRIGGER_FALLING, is that even possible?) 
> without IRQF_ONESHOT, so that pca953x_irq_handler is rescheduled for running 
> even when it's already running.
> 
> Any thoughts? Maybe there are more/less fancy, or better yet, "correct" ways of 
> doing this? Will this even work?
> 
> /jfd
> 

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gpio: pca953x: interrupt feature unreliable
       [not found]       ` <20120620170126.311b8eef@archvile>
@ 2012-06-20 19:28         ` Jean-François Dagenais
  2012-06-21  7:10           ` David Jander
  0 siblings, 1 reply; 7+ messages in thread
From: Jean-François Dagenais @ 2012-06-20 19:28 UTC (permalink / raw)
  To: David Jander; +Cc: Linus Walleij, Grant Likely, open list, torvalds

(putting Linus in CC 'cause I hear he enjoys interaction with hardware. As do I,
and this is a funny hard/soft timing issue, insignificant maybe in the large
scale of the kernel, but an interesting puzzle. Sorry if doing so is out-of-line.

It seems a few answers missed the lists, the interesting bits are quoted, and
should give a flavour of the discussion.)

On 2012-06-20, at 11:01 AM, David Jander wrote:
> 
> First of all: As a hardware designer, if you intended to connect this chip's
> nINT output to a PCA953x input, I assume you know what you are doing, and not
> just trust the features a particular driver advertises.

Of course, I found out about this while "proof of concept"ing chaining the
AD714X INT to a PCA9555. The product this goes in is not in production yet.
 
> Second: What I try to explain as a possible workaround for your case, is to
> move the complete ISR callback functionality into a thread. But since I don't
> know the details of your driver, I don't know if that is possible.
I trie not to modify the kernel wherever I can. For the "purist" part of me,
hacking a chip driver here to compensate the flaws of another there is not the
best way to do it. I would prefer fixing a chip's shortcomings in it's own
driver.


> 
>>> It still is an enormous advantage to have this functionality, because polling
>>> an I2C peripheral just to wait for input state can be a terrible waste of
>>> CPU-time ;-)
>> 
>> I totally respect this. However, it may be OK for a student project to have a
>> flaky interrupt controller, but for industrial or medical applications, the
>> interrupt controller must be reliable.
> 
> We use it in industrial applications, and we have no problems at all ;-)

If you examine the ASCII art of the timing I showed in the original message, you
will see that my particular timing is not the only problematic one... Can you
tell me how your 953x recovers from the state established at step 5? i.e. when
you clear the slave's INT through whatever mechanism, the 953x re-assert's it's
INT signal.

If I can attempt an answer: since the 953x has requested it's own ISR using
level-low, when the 953x's ISR thread is done, since the INT is still low, it
re-enters it's threaded ISR, and read it's input register again, which clears
953x's INT. (And this is key to the problem here, I hope you can see it, or I'm
mistaken here, but the ISR thread of the 953x runs a SECOND TIME) On this second
run, 953x's ISR examines the bits that changed since the last read, in this
case, two things can happen: (assuming your slave is trigger_falling) 1. Things
are quiet, the last read had your slave chip's INT low (and client's the nested
ISR was called (falling edge)), but this current read now says high. This is
considered like a rising and doesn't trigger calling the nested ISR again. All
is well then.  2. if for whatever reason (fast slave OR some delay scheduling
the 953x's ISR thread a second time) the second ISR run comes after the client's
chip re-asserting it's INT. This event makes the 953x's input the same as when
the input register was read on the first ISR thread run, so by definition, this
de-asserts the 953x's INT. When 953x finally reads the input register moments
later, there is no difference between old_stat and cur_stat and you are
essentially locked for this slave until you reset the whole thing.

Since the timings of the slave and the OS are factors here, it's quite hard to
reproduce. Even with the best intention and diligence at design, a race is
possible.


> Really, you must know what you are doing if you want to use an I2C I/O
> expander as interrupt chain controller!

After a good read of the specs and code, I had the idea to use our existing 9555
to separate the interrupts of two i2c slaves. I put this together on a bread
board and put the idea to the test. So I am in agreement that you must know what
you are doing when doing this. I had to know for sure before telling the
hardware guy to put this on real hardware.

>> For example, we are most likely going to
>> replace the 9555 in our next hardware rev because of this problem.
> 
> Then most probably you started your design committing a mistake. You should
> just assume your hardware design is flawed and not try to blame some driver
> for it.

Again, bread board. Plus, I hope you see that since we will most likely not be
using this combination of i2c devices, I no longer have any personal interest of
my own in this, other than my wish to improve the kernel in general.


> 
>> Remember though that I was proposing a fix instead of a complete removal (I
>> admit that suggesting removing the feature was a way to attract attention to the
>> problem ;) The only problem is that since we are most likely not keeping this
>> part in our design, putting effort into a fix of the pca953x.c driver is much
>> lower now in my priority list.
> 
> Ok, then what about just leaving it as it is now?

If you are comfortable with the scenario I demonstrated originally, plus the one
I mention in this message, or you can explain to me (and others) why I am wrong,
then by all means keep the driver as it is! I just thought, first the world should
know about this, and second, get feedback before attempting modifications, if I
ever do get there.


Cheers!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gpio: pca953x: interrupt feature unreliable
  2012-06-20 19:28         ` gpio: pca953x: interrupt feature unreliable Jean-François Dagenais
@ 2012-06-21  7:10           ` David Jander
  2012-06-21 19:34             ` Jean-François Dagenais
  0 siblings, 1 reply; 7+ messages in thread
From: David Jander @ 2012-06-21  7:10 UTC (permalink / raw)
  To: Jean-François Dagenais
  Cc: Linus Walleij, Grant Likely, open list, torvalds


Dear Jean-François,

On Wed, 20 Jun 2012 15:28:05 -0400
Jean-François Dagenais <jeff.dagenais@gmail.com> wrote:

> (putting Linus in CC 'cause I hear he enjoys interaction with hardware. As do I,
> and this is a funny hard/soft timing issue, insignificant maybe in the large
> scale of the kernel, but an interesting puzzle. Sorry if doing so is out-of-line.

Oh dear. Well, if you really need to do that.... ;-)
I hope we are not boring the hell out of a lot of people with this silly
discussion. My excuses to the rest of the CC if we are...

> It seems a few answers missed the lists, the interesting bits are quoted, and
> should give a flavour of the discussion.)
> 
> On 2012-06-20, at 11:01 AM, David Jander wrote:
> > 
> > First of all: As a hardware designer, if you intended to connect this chip's
> > nINT output to a PCA953x input, I assume you know what you are doing, and not
> > just trust the features a particular driver advertises.
> 
> Of course, I found out about this while "proof of concept"ing chaining the
> AD714X INT to a PCA9555. The product this goes in is not in production yet.

Phew! Then what's the problem? You saved yourself right on time ;-)

> > Second: What I try to explain as a possible workaround for your case, is to
> > move the complete ISR callback functionality into a thread. But since I don't
> > know the details of your driver, I don't know if that is possible.
> I trie not to modify the kernel wherever I can. For the "purist" part of me,
> hacking a chip driver here to compensate the flaws of another there is not the
> best way to do it. I would prefer fixing a chip's shortcomings in it's own
> driver.

I agree, but I thought you had a major problem so I made a suggestion about how
you could work around it.... not to suggest how to write an acceptable driver.

> >>> It still is an enormous advantage to have this functionality, because polling
> >>> an I2C peripheral just to wait for input state can be a terrible waste of
> >>> CPU-time ;-)
> >> 
> >> I totally respect this. However, it may be OK for a student project to have a
> >> flaky interrupt controller, but for industrial or medical applications, the
> >> interrupt controller must be reliable.
> > 
> > We use it in industrial applications, and we have no problems at all ;-)
> 
> If you examine the ASCII art of the timing I showed in the original message, you
> will see that my particular timing is not the only problematic one... Can you
> tell me how your 953x recovers from the state established at step 5? i.e. when
> you clear the slave's INT through whatever mechanism, the 953x re-assert's it's
> INT signal.

Easy: Just release the button and push it again ;-)
(I mentioned our application was a gpio_keys keyboard in my previous e-mail.)
That's the kind of applications you can use the PCA953x as interrupt
controller for. My patches incorporated support for device-tree bindings and a
few fixes for both gpio-pca953x.c and gpio_keys.c, so it became possible to do
just this in a device-tree:

	i2c@1700 {
		[...]
		interrupts = <0x9 0x8>;
		[...]
		gpio0: gpio@74 {
			compatible = "nxp,pca9539";
			reg = <0x74>;
			interrupts = <0x11 0x8>;
			#gpio-cells = <2>;
			gpio-controller;
		}
	}
	[...]
	gpio_keys {
		compatible = "gpio-keys";
		#address-cells = <1>;
		#size-cells = <0>;
		autorepeat;
		button@20 {
			label = "GPIO Key ESC";
			linux,code = <1>;
			gpios = <&gpio0 0 1>;
		};
		button@21 {
			label = "GPIO Key UP";
			linux,code = <103>;
			gpios = <&gpio0 1 1>;
		};
		...

Get the idea? There is no other code involved, and it "just works"! I am
absolutely amazed by this marvel ;-)

> If I can attempt an answer: since the 953x has requested it's own ISR using
> level-low, when the 953x's ISR thread is done, since the INT is still low, it
> re-enters it's threaded ISR, and read it's input register again, which clears
> 953x's INT. (And this is key to the problem here, I hope you can see it, or I'm
> mistaken here, but the ISR thread of the 953x runs a SECOND TIME) On this second
> run, 953x's ISR examines the bits that changed since the last read, in this
> case, two things can happen: (assuming your slave is trigger_falling) 1. Things

I perfectly understand your reasoning. But it is not the case in our
application, since the user will not keep his finger on the key forever if the
key doesn't react ;-)
Also, in our case, pushing and releasing a button is such a slow process, the
the likelihood this will ever occur is negligible.

> are quiet, the last read had your slave chip's INT low (and client's the nested
> ISR was called (falling edge)), but this current read now says high. This is
> considered like a rising and doesn't trigger calling the nested ISR again. All
> is well then.  2. if for whatever reason (fast slave OR some delay scheduling
> the 953x's ISR thread a second time) the second ISR run comes after the client's
> chip re-asserting it's INT. This event makes the 953x's input the same as when
> the input register was read on the first ISR thread run, so by definition, this
> de-asserts the 953x's INT. When 953x finally reads the input register moments
> later, there is no difference between old_stat and cur_stat and you are
> essentially locked for this slave until you reset the whole thing.
> 
> Since the timings of the slave and the OS are factors here, it's quite hard to
> reproduce. Even with the best intention and diligence at design, a race is
> possible.

Yes, of course. But that is a shortcoming of the extremely simple design of
PCA953x gpio expanders. There is not much we can do about this in the driver.
Maybe we could place a warning in the documentation about the shortcomings of
these chips...?

> > Really, you must know what you are doing if you want to use an I2C I/O
> > expander as interrupt chain controller!
> 
> After a good read of the specs and code, I had the idea to use our existing 9555
> to separate the interrupts of two i2c slaves. I put this together on a bread
> board and put the idea to the test. So I am in agreement that you must know what
> you are doing when doing this. I had to know for sure before telling the
> hardware guy to put this on real hardware.

Never mind. I am a hardware designer myself and we all make such mistakes.
Just try not to blame someone or something else for it publicly ;-)

> >> For example, we are most likely going to
> >> replace the 9555 in our next hardware rev because of this problem.
> > 
> > Then most probably you started your design committing a mistake. You should
> > just assume your hardware design is flawed and not try to blame some driver
> > for it.
> 
> Again, bread board. Plus, I hope you see that since we will most likely not be
> using this combination of i2c devices, I no longer have any personal interest of
> my own in this, other than my wish to improve the kernel in general.

Feel free to improve the driver... but please do not remove features that work
for others.

> >> Remember though that I was proposing a fix instead of a complete removal (I
> >> admit that suggesting removing the feature was a way to attract attention to the
> >> problem ;) The only problem is that since we are most likely not keeping this
> >> part in our design, putting effort into a fix of the pca953x.c driver is much
> >> lower now in my priority list.
> > 
> > Ok, then what about just leaving it as it is now?
> 
> If you are comfortable with the scenario I demonstrated originally, plus the one
> I mention in this message, or you can explain to me (and others) why I am wrong,
> then by all means keep the driver as it is! I just thought, first the world should
> know about this, and second, get feedback before attempting modifications, if I
> ever do get there.

I hope my explanation was clear enough this time. As I said, we may put some
warnings in the documentation of the driver.

Best regards,

-- 
David Jander

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gpio: pca953x: interrupt feature unreliable
  2012-06-21  7:10           ` David Jander
@ 2012-06-21 19:34             ` Jean-François Dagenais
  2012-06-22  7:53               ` David Jander
  0 siblings, 1 reply; 7+ messages in thread
From: Jean-François Dagenais @ 2012-06-21 19:34 UTC (permalink / raw)
  To: David Jander; +Cc: Linus Walleij, Grant Likely, open list

Dear David!

On 2012-06-21, at 3:10 AM, David Jander wrote:

> On Wed, 20 Jun 2012 15:28:05 -0400
> Jean-François Dagenais <jeff.dagenais@gmail.com> wrote:
> 
>> (putting Linus in CC 'cause I hear he enjoys interaction with hardware. As do I,
>> and this is a funny hard/soft timing issue, insignificant maybe in the large
>> scale of the kernel, but an interesting puzzle. Sorry if doing so is out-of-line.
> 
> Oh dear. Well, if you really need to do that.... ;-)
> I hope we are not boring the hell out of a lot of people with this silly
> discussion. My excuses to the rest of the CC if we are...

I personally think the "silliness" of the discussion is still in debate. It may be silly in the grand context of the kernel, I agree, but definitely core for pca953x users. I merely involved torvalds for the kick of tricky timing issues and register specs, not to attract a parade on this. Removed on this reply, he can follow if he's interested on the list. Sorry if any etiquette was violated.

> 
>> Of course, I found out about this while "proof of concept"ing chaining the
>> AD714X INT to a PCA9555. The product this goes in is not in production yet.
> 
> Phew! Then what's the problem?

Uuuh, I'm good myself thanks ;) but the problem is this is not a reliable interrupt controller (in the general sense), especially the way the current driver works with it.

> ...
> Easy: Just release the button and push it again ;-)
> (I mentioned our application was a gpio_keys keyboard in my previous e-mail.)

Yeah I had missed that you were doing one key per GPIO... But that's indeed why you are fine with it. You are using it more like a keyboard driver with an interrupt controller. If you use the 953x as an actual interrupt "controller" for a client that has a hardware-logic controlled INT signal, it won't be reliable for such an application (with the current driver state), contrary to what the driver initially suggests.

> That's the kind of applications you can use the PCA953x as interrupt
> controller for. My patches incorporated support for device-tree bindings and a
> few fixes for both gpio-pca953x.c and gpio_keys.c, so it became possible to do
> just this in a device-tree:
> 		...
> 
> Get the idea? There is no other code involved, and it "just works"! I am
> absolutely amazed by this marvel ;-)

I wasn't familiar with device-tree, thanks for this intro, I will look into it more. Not being familiar, I couldn't make this part out properly: are each of your keys using their own nested IRQ? 

> ...
>> Since the timings of the slave and the OS are factors here, it's quite hard to
>> reproduce. Even with the best intention and diligence at design, a race is
>> possible.
> 
> Yes, of course. But that is a shortcoming of the extremely simple design of
> PCA953x gpio expanders. There is not much we can do about this in the driver.

Well, what about my original suggestion? I know you said you agree with serving on only LEVEL, but it had a few other specifics... I guess at this point, my idea would have to be backed by tests and a patch. (Hearing you tell me: Yeah JF, why don't you put your money where your mouth is!? ;)

> Maybe we could place a warning in the documentation about the shortcomings of
> these chips...?

Ok, now we are starting to agree! ;-) Short of an actual mod patch to the driver with results, I think this is a minimum. It could save a lot of people a lot of time, and perhaps save some aggravation too.

> Never mind. I am a hardware designer myself and we all make such mistakes.
> Just try not to blame someone or something else for it publicly ;-)

I'm sorry if you felt my interventions here had anything to do with blame, to you or anyone. I was merely acting on the behalf of helping others avoid encountering issues with this, and saving other engineers some time from my own experiences. And in the spirit of giving back since I take so much from the open source community. I know it sounds corny, but it's still true.

> 
> Feel free to improve the driver... but please do not remove features that work
> for others.

Indeed. Hence why I try to get feedback about my improvement idea. It has effects on current users since the interrupt clients would have to adjust since instead of providing EDGE based, it would provide only LEVEL, plus, in existing designs, the pca953x would now ask that it's own interrupt be EDGE(RISING|FALLING) instead of LEVEL_LOW. These changes are important for my theory to work correctly. The only remaining trap at this point would be in the chip, which says it cannot catch edges that happen too quickly, but the minimum length of the edges it detects is sufficiently small not to be of anyone's concern.

> I hope my explanation was clear enough this time. As I said, we may put some
> warnings in the documentation of the driver.
> 
> Best regards,
> 
> -- 
> David Jander

Thanks for your participation and work on this driver! ;)
/jfd


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: gpio: pca953x: interrupt feature unreliable
  2012-06-21 19:34             ` Jean-François Dagenais
@ 2012-06-22  7:53               ` David Jander
  0 siblings, 0 replies; 7+ messages in thread
From: David Jander @ 2012-06-22  7:53 UTC (permalink / raw)
  To: Jean-François Dagenais; +Cc: Linus Walleij, Grant Likely, open list


Dear Jean-François,

On Thu, 21 Jun 2012 15:34:07 -0400
Jean-François Dagenais <jeff.dagenais@gmail.com> wrote:
> Uuuh, I'm good myself thanks ;) but the problem is this is not a reliable
> interrupt controller (in the general sense), especially the way the current
> driver works with it.

I still think that the real problem is with the chip itself. The driver should
just do it's best, and if the problem can't be worked around in the driver,
the driver should stay neutral IMHO.
In theory the PCA953x is still an edge triggered interrupt controller, but its
own interrupt output is level.

> > ...
> > Easy: Just release the button and push it again ;-)
> > (I mentioned our application was a gpio_keys keyboard in my previous e-mail.)
> 
> Yeah I had missed that you were doing one key per GPIO... But that's indeed why
> you are fine with it. You are using it more like a keyboard driver with an
> interrupt controller. If you use the 953x as an actual interrupt "controller"
> for a client that has a hardware-logic controlled INT signal, it won't be
> reliable for such an application (with the current driver state), contrary to
> what the driver initially suggests.

I insist. The driver can't really do much about it.

> > That's the kind of applications you can use the PCA953x as interrupt
> > controller for. My patches incorporated support for device-tree bindings and a
> > few fixes for both gpio-pca953x.c and gpio_keys.c, so it became possible to do
> > just this in a device-tree:
> > 		...
> > 
> > Get the idea? There is no other code involved, and it "just works"! I am
> > absolutely amazed by this marvel ;-)
> 
> I wasn't familiar with device-tree, thanks for this intro, I will look into

Be prepared, a device-tree is coming to a SoC near you any day... unless you
are stuck in the x86 world, where such marvel is not likely to appear anytime
soon ;-)

> it more. Not being familiar, I couldn't make this part out properly: are each
> of your keys using their own nested IRQ?

Yes!
Unfortunately gpio_keys.c requests edge-based IRQ's, so if you eliminate edge
functionality from the pca953x driver, my setup will stop working.
And the pca953x really _is_ an edge based interrupt controller (if one really
uses it as such), although a fairly limited (broken if you must) one.

> >> Since the timings of the slave and the OS are factors here, it's quite hard to
> >> reproduce. Even with the best intention and diligence at design, a race is
> >> possible.
> > 
> > Yes, of course. But that is a shortcoming of the extremely simple design of
> > PCA953x gpio expanders. There is not much we can do about this in the driver.
> 
> Well, what about my original suggestion? I know you said you agree with serving
> on only LEVEL, but it had a few other specifics... I guess at this point, my

Sorry. I was wrong. It is not a LEVEL interrupt controller. It doesn't work as
such. Besides, as I said above, this would break my setup, although the
decice-tree where we use this functionality is not mainlined, so I am probably
not allowed to complain about that :-(

> idea would have to be backed by tests and a patch. (Hearing you tell me: Yeah
> JF, why don't you put your money where your mouth is!? ;)

I wasn't about to say that.

> > Maybe we could place a warning in the documentation about the shortcomings of
> > these chips...?
> 
> Ok, now we are starting to agree! ;-) Short of an actual mod patch to the
> driver with results, I think this is a minimum. It could save a lot of people
> a lot of time, and perhaps save some aggravation too.
> 
> > Never mind. I am a hardware designer myself and we all make such mistakes.
> > Just try not to blame someone or something else for it publicly ;-)
> 
> I'm sorry if you felt my interventions here had anything to do with blame,
> to you or anyone. I was merely acting on the behalf of helping others avoid

Well, to me it looked like you wanted to blame the pca953x driver for your
problem with the chip....

> encountering issues with this, and saving other engineers some time from my
> own experiences. And in the spirit of giving back since I take so much from
> the open source community. I know it sounds corny, but it's still true.
> 
> > 
> > Feel free to improve the driver... but please do not remove features that work
> > for others.
> 
> Indeed. Hence why I try to get feedback about my improvement idea. It has
> effects on current users since the interrupt clients would have to adjust
> since instead of providing EDGE based, it would provide only LEVEL, plus, in
> existing designs, the pca953x would now ask that it's own interrupt be
> EDGE(RISING|FALLING) instead of LEVEL_LOW. These changes are important for my
> theory to work correctly. The only remaining trap at this point would be in
> the chip, which says it cannot catch edges that happen too quickly, but the
> minimum length of the edges it detects is sufficiently small not to be of
> anyone's concern.

You are proposing to change the PCA953x's input sensitivity artificially to
LEVEL, and its output sensitivity to EDGE(RISING|FALLING)?
Weird. Quickly sketching a scenario where you have just one level-based
acknowledged interrupt on a PCA953x input, it would generate no less than 4
interrupts to the primary controller (2 for each edge of the input signal)!!
I don't think this will be accepted.... besides it will break my setup and be
pretty much counter-intuitive to anyone else trying to use this chip as
advertised (talking about pitfalls).
And it will generate quite a significant amount of downstream interrupts,
triggering a lot if I2C traffic.

> > I hope my explanation was clear enough this time. As I said, we may put some
> > warnings in the documentation of the driver.

I still think that would be the best option... specially since you are not
going to use your proposed setup anyway.

> Thanks for your participation and work on this driver! ;)

You are welcome. Thanks for the discussion.

Best regards,

-- 
David Jander
Protonic Holland.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-06-22  7:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-10 18:19 gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!) Jean-Francois Dagenais
2012-05-11  7:48 ` Linus Walleij
     [not found]   ` <20120604091127.0f95a85c@archvile>
     [not found]     ` <73B37CBF-9E2A-49F5-9CA9-8CFBEF3666D6@gmail.com>
     [not found]       ` <20120620170126.311b8eef@archvile>
2012-06-20 19:28         ` gpio: pca953x: interrupt feature unreliable Jean-François Dagenais
2012-06-21  7:10           ` David Jander
2012-06-21 19:34             ` Jean-François Dagenais
2012-06-22  7:53               ` David Jander
2012-05-11 18:53 ` gpio: pca953x: interrupt feature unreliable (re-re-send, Grant please acknowledge!!) Grant Likely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).