linux-gpio.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] gpiolib: cdev: can't read RELEASED event for last line
@ 2023-05-25  3:09 Kent Gibson
  2023-05-25  7:09 ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-25  3:09 UTC (permalink / raw)
  To: brgl; +Cc: linux-gpio

Hi Bart,

In testing I'm finding that I can't read the RELEASED event from the
chip fd when the last line on the chip is released.
The chip fd becomes readable, but when I try to read it I get ENODEV.

I suspect this change is the likely culprit:

533aae7c94db gpiolib: cdev: fix NULL-pointer dereferences

@@ -2425,6 +2449,9 @@ static ssize_t lineinfo_watch_read(struct file *file, char __user *buf,
        int ret;
        size_t event_size;

+       if (!cdev->gdev->chip)
+               return -ENODEV;
+

though I haven't bisected it yet to be sure.

Btw, that is testing on 6.4.0-rc3 mainline.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] gpiolib: cdev: can't read RELEASED event for last line
  2023-05-25  3:09 [BUG] gpiolib: cdev: can't read RELEASED event for last line Kent Gibson
@ 2023-05-25  7:09 ` Kent Gibson
  2023-05-25  7:46   ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-25  7:09 UTC (permalink / raw)
  To: brgl; +Cc: linux-gpio

On Thu, May 25, 2023 at 11:09:52AM +0800, Kent Gibson wrote:
> Hi Bart,
> 
> In testing I'm finding that I can't read the RELEASED event from the
> chip fd when the last line on the chip is released.
> The chip fd becomes readable, but when I try to read it I get ENODEV.
> 
> I suspect this change is the likely culprit:
> 
> 533aae7c94db gpiolib: cdev: fix NULL-pointer dereferences
> 
> @@ -2425,6 +2449,9 @@ static ssize_t lineinfo_watch_read(struct file *file, char __user *buf,
>         int ret;
>         size_t event_size;
> 
> +       if (!cdev->gdev->chip)
> +               return -ENODEV;
> +
> 
> though I haven't bisected it yet to be sure.
> 
> Btw, that is testing on 6.4.0-rc3 mainline.
> 

To be clear, I happened to spot the problem with 6.4.0-rc3, but the
problem was almost certainly introduced earlier.

The problem ican be repeated with the info_change_events test from [1]

e.g.

$ cargo test --all-features chip

.... 

failures:

---- chip::uapi_v1::info_change_events stdout ----
read: Err(UapiError(ReadEvent, Os(Errno { code: 19, description: Some("No such device") })))
thread 'chip::uapi_v1::info_change_events' panicked at 'assertion failed: res.is_ok()', lib/tests/chip.rs:719:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- chip::uapi_v2::info_change_events stdout ----
read: Err(UapiError(ReadEvent, Os(Errno { code: 19, description: Some("No such device") })))
thread 'chip::uapi_v2::info_change_events' panicked at 'assertion failed: res.is_ok()', lib/tests/chip.rs:719:9


I haven't done a complete bisection, but the tests do pass with 6.1.0-rc1
- the previous kernel I was testing with.

I can also confirm that receiving the event using a blocking read() on the
fd still works, it is a poll() on the fd followed by a read() that fails.

I only ran across the bug by pure chance while debugging that test case
- previously it only did the blocking read.
So there is probably limited real world impact, but something additional
to test for, and fix, of course.

Cheers,
Kent.
[1] https://github.com/warthog618/gpiocdev-rs/blob/36e9c1640e2ae5c6e08d43cd0c254a2e7cec2d63/lib/tests/chip.rs ~line 649

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] gpiolib: cdev: can't read RELEASED event for last line
  2023-05-25  7:09 ` Kent Gibson
@ 2023-05-25  7:46   ` Kent Gibson
  2023-05-25 13:21     ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-25  7:46 UTC (permalink / raw)
  To: brgl; +Cc: linux-gpio

On Thu, May 25, 2023 at 03:09:26PM +0800, Kent Gibson wrote:
> On Thu, May 25, 2023 at 11:09:52AM +0800, Kent Gibson wrote:
> > Hi Bart,
> > 
> I can also confirm that receiving the event using a blocking read() on the
> fd still works, it is a poll() on the fd followed by a read() that fails.
> 

Hmmm, so it occurred to me that gpionotify does the poll()/read(), so it
should exhibit the bug.  But no, it doesn't.

So it could be my code doing something boneheaded??
Or there is some other variable at play.
I'll try to write a test for it with libgpiod and see I can reproduce
it.  But I might put it on the back burner - this one isn't terribly
high priority.

Cheers,
Kent.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] gpiolib: cdev: can't read RELEASED event for last line
  2023-05-25  7:46   ` Kent Gibson
@ 2023-05-25 13:21     ` Kent Gibson
  2023-05-26  0:45       ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-25 13:21 UTC (permalink / raw)
  To: brgl; +Cc: linux-gpio

On Thu, May 25, 2023 at 03:46:12PM +0800, Kent Gibson wrote:
> On Thu, May 25, 2023 at 03:09:26PM +0800, Kent Gibson wrote:
> > On Thu, May 25, 2023 at 11:09:52AM +0800, Kent Gibson wrote:
> > > Hi Bart,
> > > 
> > I can also confirm that receiving the event using a blocking read() on the
> > fd still works, it is a poll() on the fd followed by a read() that fails.
> > 
> 
> Hmmm, so it occurred to me that gpionotify does the poll()/read(), so it
> should exhibit the bug.  But no, it doesn't.
> 
> So it could be my code doing something boneheaded??
> Or there is some other variable at play.
> I'll try to write a test for it with libgpiod and see I can reproduce
> it.  But I might put it on the back burner - this one isn't terribly
> high priority.
> 

Bisect result:

[bdbbae241a04f387ba910b8609f95fad5f1470c7] gpiolib: protect the GPIO device against being dropped while in use by user-space

So, the semaphores patch.
The Rust test gets the timings right to hit a race/order of events issue?

Cheers,
Kent.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] gpiolib: cdev: can't read RELEASED event for last line
  2023-05-25 13:21     ` Kent Gibson
@ 2023-05-26  0:45       ` Kent Gibson
  2023-05-26  6:51         ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-26  0:45 UTC (permalink / raw)
  To: brgl; +Cc: linux-gpio

On Thu, May 25, 2023 at 09:21:24PM +0800, Kent Gibson wrote:
> On Thu, May 25, 2023 at 03:46:12PM +0800, Kent Gibson wrote:
> > On Thu, May 25, 2023 at 03:09:26PM +0800, Kent Gibson wrote:
> > > On Thu, May 25, 2023 at 11:09:52AM +0800, Kent Gibson wrote:
> > > > Hi Bart,
> > > > 
> > > I can also confirm that receiving the event using a blocking read() on the
> > > fd still works, it is a poll() on the fd followed by a read() that fails.
> > > 
> > 
> > Hmmm, so it occurred to me that gpionotify does the poll()/read(), so it
> > should exhibit the bug.  But no, it doesn't.
> > 
> > So it could be my code doing something boneheaded??
> > Or there is some other variable at play.
> > I'll try to write a test for it with libgpiod and see I can reproduce
> > it.  But I might put it on the back burner - this one isn't terribly
> > high priority.
> > 
> 
> Bisect result:
> 
> [bdbbae241a04f387ba910b8609f95fad5f1470c7] gpiolib: protect the GPIO device against being dropped while in use by user-space
> 
> So, the semaphores patch.
> The Rust test gets the timings right to hit a race/order of events issue?
> 

Well that throws some new light on the problem.
One of the differentiators of the Rust test from the other ways I was
trying to reproduce the problem is that the Rust test is multithreaded.

This being semaphore related makes other weirdness I was seeing make
sense.
The original form of that test was locking up when the bg thread
released the line.  The drop never returned and the fg thread never
received a RELEASED event.  That wasn't a problem with the drop, or an
event being lost, as I assumed, that was a DEADLOCK.

The current form of the test uses message passing to coordinate the
threads, so that deadlock condition doesn't occur any more.
(That change was because of my concern that the lack of buffering of info
changed events in the kernel could result in lost events - and the goal of
the test was to test my iterator, not the kernel...)

I'll revert that test case to see if I can reproduce the deadlock case.

But it looks like those semaphores have problems. At least one path
might lead to deadlock and another leads to an inconsistent state.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] gpiolib: cdev: can't read RELEASED event for last line
  2023-05-26  0:45       ` Kent Gibson
@ 2023-05-26  6:51         ` Kent Gibson
  0 siblings, 0 replies; 6+ messages in thread
From: Kent Gibson @ 2023-05-26  6:51 UTC (permalink / raw)
  To: brgl; +Cc: linux-gpio

On Fri, May 26, 2023 at 08:45:38AM +0800, Kent Gibson wrote:
> On Thu, May 25, 2023 at 09:21:24PM +0800, Kent Gibson wrote:
> > On Thu, May 25, 2023 at 03:46:12PM +0800, Kent Gibson wrote:
> > > On Thu, May 25, 2023 at 03:09:26PM +0800, Kent Gibson wrote:
> > > > On Thu, May 25, 2023 at 11:09:52AM +0800, Kent Gibson wrote:
> > > > > Hi Bart,
> > > > > 
> > > > I can also confirm that receiving the event using a blocking read() on the
> > > > fd still works, it is a poll() on the fd followed by a read() that fails.
> > > > 
> > > 
> > > Hmmm, so it occurred to me that gpionotify does the poll()/read(), so it
> > > should exhibit the bug.  But no, it doesn't.
> > > 
> > > So it could be my code doing something boneheaded??
> > > Or there is some other variable at play.
> > > I'll try to write a test for it with libgpiod and see I can reproduce
> > > it.  But I might put it on the back burner - this one isn't terribly
> > > high priority.
> > > 
> > 
> > Bisect result:
> > 
> > [bdbbae241a04f387ba910b8609f95fad5f1470c7] gpiolib: protect the GPIO device against being dropped while in use by user-space
> > 
> > So, the semaphores patch.
> > The Rust test gets the timings right to hit a race/order of events issue?
> > 
> 
> Well that throws some new light on the problem.
> One of the differentiators of the Rust test from the other ways I was
> trying to reproduce the problem is that the Rust test is multithreaded.
> 
> This being semaphore related makes other weirdness I was seeing make
> sense.
> The original form of that test was locking up when the bg thread
> released the line.  The drop never returned and the fg thread never
> received a RELEASED event.  That wasn't a problem with the drop, or an
> event being lost, as I assumed, that was a DEADLOCK.
> 
> The current form of the test uses message passing to coordinate the
> threads, so that deadlock condition doesn't occur any more.
> (That change was because of my concern that the lack of buffering of info
> changed events in the kernel could result in lost events - and the goal of
> the test was to test my iterator, not the kernel...)
> 
> I'll revert that test case to see if I can reproduce the deadlock case.
> 
> But it looks like those semaphores have problems. At least one path
> might lead to deadlock and another leads to an inconsistent state.
> 

I have failed to reproduce the deadlock case, which is good news, I
guess.  Not sure what the issue was then, but not seeing it now.
I dealing with kernel crashes at the time, so perhaps my kernel was
in a weird state?

Anyway, I've split the problem ENODEV test case out into a new repo[1]
so it is easier to play with.  In the longer term the idea is to create a
regression test suite for the uAPI.

You can run the tests by checking out the repo and calling "cargo test".

At the moment there are two tests in there - the enodev, and the
"deadlock" (well the test that was hanging on me, but now isn't).
The deadlock test is just for reference.

I've reduced the enodev case as much as I can to try to identify the
root cause.

It could be related to the event queue somehow, as it passes if it
doesn't wait for the bg thread to complete and reads the events as they
arrive.  So the two threads do not run concurrently when it fails.

It also passes if the request/drop is performed in the same thread.
So it requires the bg thread.

Not sure if that helps any, but that is where I'm at - still puzzled.

Cheers,
Kent.

[1]https://github.com/warthog618/gurt-rs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-26  6:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-25  3:09 [BUG] gpiolib: cdev: can't read RELEASED event for last line Kent Gibson
2023-05-25  7:09 ` Kent Gibson
2023-05-25  7:46   ` Kent Gibson
2023-05-25 13:21     ` Kent Gibson
2023-05-26  0:45       ` Kent Gibson
2023-05-26  6:51         ` Kent Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).