linux-gpio.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gpio: cdev: fix a crash on line-request release
@ 2023-05-23 15:51 Bartosz Golaszewski
  2023-05-23 23:58 ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Bartosz Golaszewski @ 2023-05-23 15:51 UTC (permalink / raw)
  To: Kent Gibson, Linus Walleij, Andy Shevchenko, Viresh Kumar
  Cc: linux-gpio, linux-kernel, Bartosz Golaszewski

When a GPIO device is forcefully unregistered, we are left with an
inactive object. If user-space kept an open file descriptor to a line
request associated with such a structure, upon closing it, we'll see the
kernel crash due to freeing unexistent GPIO descriptors.

Fix it by checking if chip is still alive before calling gpiod_free() in
release callbacks for both v2 and v1 ABI.

Fixes: 3c0d9c635ae2 ("gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL")
Reported-by: Kent Gibson <warthog618@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
---
 drivers/gpio/gpiolib-cdev.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 0a33971c964c..6830f668a1b0 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -315,13 +315,19 @@ static long linehandle_ioctl_compat(struct file *file, unsigned int cmd,
 
 static void linehandle_free(struct linehandle_state *lh)
 {
+	struct gpio_device *gdev = lh->gdev;
 	int i;
 
-	for (i = 0; i < lh->num_descs; i++)
-		if (lh->descs[i])
-			gpiod_free(lh->descs[i]);
+	for (i = 0; i < lh->num_descs; i++) {
+		if (lh->descs[i]) {
+			down_write(&gdev->sem);
+			if (gdev->chip)
+				gpiod_free(lh->descs[i]);
+			up_write(&gdev->sem);
+		}
+	}
 	kfree(lh->label);
-	gpio_device_put(lh->gdev);
+	gpio_device_put(gdev);
 	kfree(lh);
 }
 
@@ -1565,17 +1571,21 @@ static ssize_t linereq_read(struct file *file, char __user *buf,
 
 static void linereq_free(struct linereq *lr)
 {
+	struct gpio_device *gdev = lr->gdev;
 	unsigned int i;
 
 	for (i = 0; i < lr->num_lines; i++) {
 		if (lr->lines[i].desc) {
 			edge_detector_stop(&lr->lines[i]);
-			gpiod_free(lr->lines[i].desc);
+			down_write(&gdev->sem);
+			if (gdev->chip)
+				gpiod_free(lr->lines[i].desc);
+			up_write(&gdev->sem);
 		}
 	}
 	kfifo_free(&lr->events);
 	kfree(lr->label);
-	gpio_device_put(lr->gdev);
+	gpio_device_put(gdev);
 	kfree(lr);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpio: cdev: fix a crash on line-request release
  2023-05-23 15:51 [PATCH] gpio: cdev: fix a crash on line-request release Bartosz Golaszewski
@ 2023-05-23 23:58 ` Kent Gibson
  2023-05-24  2:09   ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-23 23:58 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Linus Walleij, Andy Shevchenko, Viresh Kumar, linux-gpio,
	linux-kernel

On Tue, May 23, 2023 at 05:51:01PM +0200, Bartosz Golaszewski wrote:
> When a GPIO device is forcefully unregistered, we are left with an
> inactive object. If user-space kept an open file descriptor to a line
> request associated with such a structure, upon closing it, we'll see the
> kernel crash due to freeing unexistent GPIO descriptors.
> 

nonexistent

But I'm not sure that works - gpiod_free() is null aware, so strictly
speaking "freeing nonexistent GPIO descriptors" isn't the problem.
You mean orphaned GPIO descriptors?

> Fix it by checking if chip is still alive before calling gpiod_free() in
> release callbacks for both v2 and v1 ABI.
> 
> Fixes: 3c0d9c635ae2 ("gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL")

The problem is also in v1, so do we want to consider backporting a fix
for that too?

> Reported-by: Kent Gibson <warthog618@gmail.com>
> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
> ---
>  drivers/gpio/gpiolib-cdev.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
> index 0a33971c964c..6830f668a1b0 100644
> --- a/drivers/gpio/gpiolib-cdev.c
> +++ b/drivers/gpio/gpiolib-cdev.c
> @@ -315,13 +315,19 @@ static long linehandle_ioctl_compat(struct file *file, unsigned int cmd,
>  
>  static void linehandle_free(struct linehandle_state *lh)
>  {
> +	struct gpio_device *gdev = lh->gdev;

It isn't clear to me what this is for.
The flow below now calls gpiod_free() less often, so not that.
It is there for the normal case??

>  	int i;
>  
> -	for (i = 0; i < lh->num_descs; i++)
> -		if (lh->descs[i])
> -			gpiod_free(lh->descs[i]);
> +	for (i = 0; i < lh->num_descs; i++) {
> +		if (lh->descs[i]) {
> +			down_write(&gdev->sem);
> +			if (gdev->chip)
> +				gpiod_free(lh->descs[i]);
> +			up_write(&gdev->sem);
> +		}
> +	}
>  	kfree(lh->label);
> -	gpio_device_put(lh->gdev);
> +	gpio_device_put(gdev);
>  	kfree(lh);
>  }
>  

lineevent_free() needs the fix too?

> @@ -1565,17 +1571,21 @@ static ssize_t linereq_read(struct file *file, char __user *buf,
>  
>  static void linereq_free(struct linereq *lr)
>  {
> +	struct gpio_device *gdev = lr->gdev;
>  	unsigned int i;
>  
>  	for (i = 0; i < lr->num_lines; i++) {
>  		if (lr->lines[i].desc) {
>  			edge_detector_stop(&lr->lines[i]);
> -			gpiod_free(lr->lines[i].desc);
> +			down_write(&gdev->sem);
> +			if (gdev->chip)
> +				gpiod_free(lr->lines[i].desc);
> +			up_write(&gdev->sem);
>  		}
>  	}
>  	kfifo_free(&lr->events);
>  	kfree(lr->label);
> -	gpio_device_put(lr->gdev);
> +	gpio_device_put(gdev);
>  	kfree(lr);
>  }
>  

TBH the fact you have to mess with sems here indicates to me the problem
lies in gpiolib itself.  As a gpiolib client, cdev should just be able to
release the desc back to gpiolib and have it cleanup the mess.

Not that I ever got my head around the whole gpiolib object lifecycle here
- for v2 I just followed what v1 did.

Also, gpiolib still reports an error when forceably removing chips that
have open requests:

    dev_crit(&gdev->dev,
			 "REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED\n");

Any other gpiolib clients out there that this might impact?
Else why report that crit error if you expect it is dealt with?

So while this may fix the crash, I can't say I'm happy with it.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpio: cdev: fix a crash on line-request release
  2023-05-23 23:58 ` Kent Gibson
@ 2023-05-24  2:09   ` Kent Gibson
  2023-05-24  4:36     ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-24  2:09 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Linus Walleij, Andy Shevchenko, Viresh Kumar, linux-gpio,
	linux-kernel

On Wed, May 24, 2023 at 07:58:36AM +0800, Kent Gibson wrote:
> On Tue, May 23, 2023 at 05:51:01PM +0200, Bartosz Golaszewski wrote:
> > When a GPIO device is forcefully unregistered, we are left with an
> > inactive object. If user-space kept an open file descriptor to a line
> > request associated with such a structure, upon closing it, we'll see the
> > kernel crash due to freeing unexistent GPIO descriptors.
> > 
> 
> > @@ -1565,17 +1571,21 @@ static ssize_t linereq_read(struct file *file, char __user *buf,
> >  
> >  static void linereq_free(struct linereq *lr)
> >  {
> > +	struct gpio_device *gdev = lr->gdev;
> >  	unsigned int i;
> >  
> >  	for (i = 0; i < lr->num_lines; i++) {
> >  		if (lr->lines[i].desc) {
> >  			edge_detector_stop(&lr->lines[i]);
> > -			gpiod_free(lr->lines[i].desc);
> > +			down_write(&gdev->sem);
> > +			if (gdev->chip)
> > +				gpiod_free(lr->lines[i].desc);
> > +			up_write(&gdev->sem);

Ummm, taking another look at the oops I sent you, the crash actually
occurs in edge_detector_stop():

May 23 11:47:06 firefly kernel: [ 4216.877056] Call Trace:
May 23 11:47:06 firefly kernel: [ 4216.877512]  <TASK>
May 23 11:47:06 firefly kernel: [ 4216.877924]  irq_domain_deactivate_irq+0x19/0x30
May 23 11:47:06 firefly kernel: [ 4216.878543]  free_irq+0x257/0x360
May 23 11:47:06 firefly kernel: [ 4216.879056]  linereq_free+0x9b/0xe0
May 23 11:47:06 firefly kernel: [ 4216.879608]  linereq_release+0xc/0x20
May 23 11:47:06 firefly kernel: [ 4216.880230]  __fput+0x87/0x240
May 23 11:47:06 firefly kernel: [ 4216.880744]  task_work_run+0x54/0x80

That free_irq() call is in edge_detector_stop() (which apparently is inlined),
not in gpiod_free().

So pretty sure this patch doesn't even solve my problem, but I will test
it to confirm.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpio: cdev: fix a crash on line-request release
  2023-05-24  2:09   ` Kent Gibson
@ 2023-05-24  4:36     ` Kent Gibson
  2023-05-24 19:42       ` Bartosz Golaszewski
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Gibson @ 2023-05-24  4:36 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Linus Walleij, Andy Shevchenko, Viresh Kumar, linux-gpio,
	linux-kernel

On Wed, May 24, 2023 at 10:09:42AM +0800, Kent Gibson wrote:
> On Wed, May 24, 2023 at 07:58:36AM +0800, Kent Gibson wrote:
> > On Tue, May 23, 2023 at 05:51:01PM +0200, Bartosz Golaszewski wrote:
> > > When a GPIO device is forcefully unregistered, we are left with an
> > > inactive object. If user-space kept an open file descriptor to a line
> > > request associated with such a structure, upon closing it, we'll see the
> > > kernel crash due to freeing unexistent GPIO descriptors.
> > > 
> > 
> > > @@ -1565,17 +1571,21 @@ static ssize_t linereq_read(struct file *file, char __user *buf,
> > >  
> > >  static void linereq_free(struct linereq *lr)
> > >  {
> > > +	struct gpio_device *gdev = lr->gdev;
> > >  	unsigned int i;
> > >  
> > >  	for (i = 0; i < lr->num_lines; i++) {
> > >  		if (lr->lines[i].desc) {
> > >  			edge_detector_stop(&lr->lines[i]);
> > > -			gpiod_free(lr->lines[i].desc);
> > > +			down_write(&gdev->sem);
> > > +			if (gdev->chip)
> > > +				gpiod_free(lr->lines[i].desc);
> > > +			up_write(&gdev->sem);
> 
> Ummm, taking another look at the oops I sent you, the crash actually
> occurs in edge_detector_stop():
> 
> May 23 11:47:06 firefly kernel: [ 4216.877056] Call Trace:
> May 23 11:47:06 firefly kernel: [ 4216.877512]  <TASK>
> May 23 11:47:06 firefly kernel: [ 4216.877924]  irq_domain_deactivate_irq+0x19/0x30
> May 23 11:47:06 firefly kernel: [ 4216.878543]  free_irq+0x257/0x360
> May 23 11:47:06 firefly kernel: [ 4216.879056]  linereq_free+0x9b/0xe0
> May 23 11:47:06 firefly kernel: [ 4216.879608]  linereq_release+0xc/0x20
> May 23 11:47:06 firefly kernel: [ 4216.880230]  __fput+0x87/0x240
> May 23 11:47:06 firefly kernel: [ 4216.880744]  task_work_run+0x54/0x80
> 
> That free_irq() call is in edge_detector_stop() (which apparently is inlined),
> not in gpiod_free().
> 
> So pretty sure this patch doesn't even solve my problem, but I will test
> it to confirm.
> 

Yeah, doesn't fix my problem still crashes.

If the line request doesn't have edge detection enabled (so no
irq) then I don't get a crash.
i.e. use gpioset to request the line, rather than gpiomon.

(To provide background for anyone else trying to follow along, the
scenario is:
1. create a gpio-sim
2. request a line
3. destroy the gpio-sim
4. release the line.

3 triggers this error:

May 24 11:11:12 firefly kernel: [  200.027280] gpio_stub_drv gpiochip0: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED

and 4 triggers a crash - if the requested line holds an irq.)

I would point out:
/**
 * gpiochip_remove() - unregister a gpio_chip
 * @gc: the chip to unregister
 *
 * A gpio_chip with any GPIOs still requested may not be removed.
 */
void gpiochip_remove(struct gpio_chip *gc)

which is where that dev_crit() is, so destroying the gpio-sim has already
invalidated that contract.

Anyway, it seems my problem is the forced removal of the gpiochip invalidates
the irq that the line request is holding.
Not sure how best to deal with that.

Moving the edge_detector_stop() inside the "if (gdev->chip)" check of
your patch does prevent crash.
But in that case edge_detector_stop() does other cleanup that is no longer
getting done.
Perhaps if the chip is gone then zero line->irq prior to calling
edge_detector_stop()?
Again, this is starting to feel like a hack for gpiolib not being good
at telling the client that it has to pull the rug.
Though according the the gpiochip_remove() docs, it WONT pull the rug,
so you get that.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpio: cdev: fix a crash on line-request release
  2023-05-24  4:36     ` Kent Gibson
@ 2023-05-24 19:42       ` Bartosz Golaszewski
  2023-05-25  2:47         ` Kent Gibson
  0 siblings, 1 reply; 6+ messages in thread
From: Bartosz Golaszewski @ 2023-05-24 19:42 UTC (permalink / raw)
  To: Kent Gibson
  Cc: Linus Walleij, Andy Shevchenko, Viresh Kumar, linux-gpio,
	linux-kernel

On Wed, May 24, 2023 at 6:36 AM Kent Gibson <warthog618@gmail.com> wrote:
>
> On Wed, May 24, 2023 at 10:09:42AM +0800, Kent Gibson wrote:
> > On Wed, May 24, 2023 at 07:58:36AM +0800, Kent Gibson wrote:
> > > On Tue, May 23, 2023 at 05:51:01PM +0200, Bartosz Golaszewski wrote:
> > > > When a GPIO device is forcefully unregistered, we are left with an
> > > > inactive object. If user-space kept an open file descriptor to a line
> > > > request associated with such a structure, upon closing it, we'll see the
> > > > kernel crash due to freeing unexistent GPIO descriptors.
> > > >
> > >
> > > > @@ -1565,17 +1571,21 @@ static ssize_t linereq_read(struct file *file, char __user *buf,
> > > >
> > > >  static void linereq_free(struct linereq *lr)
> > > >  {
> > > > + struct gpio_device *gdev = lr->gdev;
> > > >   unsigned int i;
> > > >
> > > >   for (i = 0; i < lr->num_lines; i++) {
> > > >           if (lr->lines[i].desc) {
> > > >                   edge_detector_stop(&lr->lines[i]);
> > > > -                 gpiod_free(lr->lines[i].desc);
> > > > +                 down_write(&gdev->sem);
> > > > +                 if (gdev->chip)
> > > > +                         gpiod_free(lr->lines[i].desc);
> > > > +                 up_write(&gdev->sem);
> >
> > Ummm, taking another look at the oops I sent you, the crash actually
> > occurs in edge_detector_stop():
> >
> > May 23 11:47:06 firefly kernel: [ 4216.877056] Call Trace:
> > May 23 11:47:06 firefly kernel: [ 4216.877512]  <TASK>
> > May 23 11:47:06 firefly kernel: [ 4216.877924]  irq_domain_deactivate_irq+0x19/0x30
> > May 23 11:47:06 firefly kernel: [ 4216.878543]  free_irq+0x257/0x360
> > May 23 11:47:06 firefly kernel: [ 4216.879056]  linereq_free+0x9b/0xe0
> > May 23 11:47:06 firefly kernel: [ 4216.879608]  linereq_release+0xc/0x20
> > May 23 11:47:06 firefly kernel: [ 4216.880230]  __fput+0x87/0x240
> > May 23 11:47:06 firefly kernel: [ 4216.880744]  task_work_run+0x54/0x80
> >
> > That free_irq() call is in edge_detector_stop() (which apparently is inlined),
> > not in gpiod_free().
> >
> > So pretty sure this patch doesn't even solve my problem, but I will test
> > it to confirm.
> >
>
> Yeah, doesn't fix my problem still crashes.
>
> If the line request doesn't have edge detection enabled (so no
> irq) then I don't get a crash.
> i.e. use gpioset to request the line, rather than gpiomon.
>
> (To provide background for anyone else trying to follow along, the
> scenario is:
> 1. create a gpio-sim
> 2. request a line
> 3. destroy the gpio-sim
> 4. release the line.
>
> 3 triggers this error:
>
> May 24 11:11:12 firefly kernel: [  200.027280] gpio_stub_drv gpiochip0: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
>
> and 4 triggers a crash - if the requested line holds an irq.)
>
> I would point out:
> /**
>  * gpiochip_remove() - unregister a gpio_chip
>  * @gc: the chip to unregister
>  *
>  * A gpio_chip with any GPIOs still requested may not be removed.
>  */
> void gpiochip_remove(struct gpio_chip *gc)
>
> which is where that dev_crit() is, so destroying the gpio-sim has already
> invalidated that contract.
>
> Anyway, it seems my problem is the forced removal of the gpiochip invalidates
> the irq that the line request is holding.
> Not sure how best to deal with that.
>
> Moving the edge_detector_stop() inside the "if (gdev->chip)" check of
> your patch does prevent crash.
> But in that case edge_detector_stop() does other cleanup that is no longer
> getting done.
> Perhaps if the chip is gone then zero line->irq prior to calling
> edge_detector_stop()?
> Again, this is starting to feel like a hack for gpiolib not being good
> at telling the client that it has to pull the rug.
> Though according the the gpiochip_remove() docs, it WONT pull the rug,
> so you get that.
>
> Cheers,
> Kent.

Interestingly enough, I did test it just like you and the "fix" seemed
to address the issue. Upon a further look at the code, it's of course
clear that the patch is wrong.

I wanted to debug the code to see what's happening exactly but it
turned out that enabling the generation of DWARF data hid the issue as
well even without any fix. It means that it's some kind of a memory
corruption rather than a regular NULL-pointer dereference.

I'm not yet sure where the crash happens exactly other that it's in
the irq domain code.

Anyway, I'll be back at it tomorrow.

Bart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpio: cdev: fix a crash on line-request release
  2023-05-24 19:42       ` Bartosz Golaszewski
@ 2023-05-25  2:47         ` Kent Gibson
  0 siblings, 0 replies; 6+ messages in thread
From: Kent Gibson @ 2023-05-25  2:47 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Linus Walleij, Andy Shevchenko, Viresh Kumar, linux-gpio,
	linux-kernel

On Wed, May 24, 2023 at 09:42:13PM +0200, Bartosz Golaszewski wrote:
> On Wed, May 24, 2023 at 6:36 AM Kent Gibson <warthog618@gmail.com> wrote:
> >
> Interestingly enough, I did test it just like you and the "fix" seemed
> to address the issue. Upon a further look at the code, it's of course
> clear that the patch is wrong.
> 
> I wanted to debug the code to see what's happening exactly but it
> turned out that enabling the generation of DWARF data hid the issue as
> well even without any fix. It means that it's some kind of a memory
> corruption rather than a regular NULL-pointer dereference.
> 
> I'm not yet sure where the crash happens exactly other that it's in
> the irq domain code.
> 
> Anyway, I'll be back at it tomorrow.
> 

I was also playing with a patch for gpiomon to have it add POLLERR to
its poll() to see if it would notice the chip removal and exit.
It didn't, it just stayed blocked, but it DID made the crash go away
when I killed it.  No idea why that would be.  So yeah, weird things.

This was the patch, btw:

diff --git a/tools/gpiomon.c b/tools/gpiomon.c
index c2684c2..f4251fc 100644
--- a/tools/gpiomon.c
+++ b/tools/gpiomon.c
@@ -431,7 +431,7 @@ int main(int argc, char **argv)
                                   resolver->chips[i].path);
 
                pollfds[i].fd = gpiod_line_request_get_fd(requests[i]);
-               pollfds[i].events = POLLIN;
+               pollfds[i].events = POLLIN | POLLERR;
                gpiod_chip_close(chip);
        }
 
@@ -452,6 +452,9 @@ int main(int argc, char **argv)
                        if (pollfds[i].revents == 0)
                                continue;
 
+                       if (pollfds[i].revents & POLLERR)
+                               die_perror("error polling for events");
+
                        ret = gpiod_line_request_read_edge_events(requests[i],
                                         event_buffer, EVENT_BUF_SIZE);
                        if (ret < 0)


I expect to add that, or something along those lines, if hot removal of
chips from the kernel ever works like it should.
In the meantime it is just another curiosity.

Cheers,
Kent.

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-25  2:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-23 15:51 [PATCH] gpio: cdev: fix a crash on line-request release Bartosz Golaszewski
2023-05-23 23:58 ` Kent Gibson
2023-05-24  2:09   ` Kent Gibson
2023-05-24  4:36     ` Kent Gibson
2023-05-24 19:42       ` Bartosz Golaszewski
2023-05-25  2:47         ` Kent Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).