Linux GPIO subsystem development
* regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
@ 2025-03-11 10:00 David Jander
  2025-03-11 10:21 ` Bartosz Golaszewski
  2025-03-11 11:45 ` Bartosz Golaszewski
  0 siblings, 2 replies; 12+ messages in thread
From: David Jander @ 2025-03-11 10:00 UTC (permalink / raw)
  To: Bartosz Golaszewski; +Cc: Kent Gibson, Linus Walleij, linux-gpio, linux-kernel


Dear Bartosz,

I noticed this because, after updating the kernel from 6.11 to 6.14, a
user-space application that uses GPIOs heavily became extremely slow, to the
point that I will need to heavily modify it in order for it to be usable
again.
I traced the problem down to the following patch that went into 6.13:

fcc8b637c542 gpiolib: switch the line state notifier to atomic

What happens here is that gpio_chrdev_release() now calls
atomic_notifier_chain_unregister(), which uses RCU, and as such must call
synchronize_rcu(). synchronize_rcu() waits for the RCU grace period to expire
before returning and, according to the documentation, can cause a delay of up
to several milliseconds. In fact it seems to take between 8 and 10ms on my
system (an STM32MP153C single-core Cortex-A7).

This has the effect that a close() on a /dev/gpiochipX chardev now takes
~10ms each time. If I git-revert this commit, close() takes less than 1ms.
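For what it's worth, the per-close() cost is easy to confirm from user space.
A small Python sketch (the function name is mine; point it at any
/dev/gpiochipX chardev on an affected board):

```python
import os
import time

def time_close_ms(path):
    """Open `path`, then measure how long the close() alone takes, in ms.

    Pointed at a /dev/gpiochipX chardev on an affected 6.13 kernel this
    reports roughly 8-10ms per close(); on 6.12, or with commit
    fcc8b637c542 reverted, it stays well under 1ms.
    """
    fd = os.open(path, os.O_RDONLY)
    start = time.perf_counter()
    os.close(fd)                       # the slow part on affected kernels
    return (time.perf_counter() - start) * 1000.0
```

e.g. `time_close_ms("/dev/gpiochip0")`.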

10ms doesn't sound like much, but it is ~10x the time it took before,
and unfortunately libgpiod code calls this function very often in some places,
especially in find_line() if your board has many gpiochips (mine has 16
chardevs).

The effect can easily be reproduced with the gpiofind tool:

Running on kernel 6.12:

$ time gpiofind LPOUT0
gpiochip7 9
real    0m 0.02s
user    0m 0.00s
sys     0m 0.01s

Running on kernel 6.13:

$ time gpiofind LPOUT0
gpiochip7 9
real    0m 0.19s
user    0m 0.00s
sys     0m 0.01s

That is almost a 10x increase in execution time of the whole program!!

On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
on 6.12.

Unfortunately I can't come up with an easy solution to this problem, which is
why I don't have a patch to propose. Sorry for that.

I still think it is a bit alarming that this change has such a huge impact.
IMHO it really shouldn't. What can be done about this? Is it maybe possible to
defer unregistering and freeing to a kthread and return from the release
function earlier?

Best regards,

-- 
David Jander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 10:00 regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance David Jander
@ 2025-03-11 10:21 ` Bartosz Golaszewski
  2025-03-11 11:03   ` David Jander
  2025-03-11 11:45 ` Bartosz Golaszewski
  1 sibling, 1 reply; 12+ messages in thread
From: Bartosz Golaszewski @ 2025-03-11 10:21 UTC (permalink / raw)
  To: David Jander
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel

On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote:
>
>
> Dear Bartosz,
>
> I noticed this because, after updating the kernel from 6.11 to 6.14, a
> user-space application that uses GPIOs heavily became extremely slow, to the
> point that I will need to heavily modify it in order for it to be usable
> again.
> I traced the problem down to the following patch that went into 6.13:
>
> fcc8b637c542 gpiolib: switch the line state notifier to atomic
>
> What happens here is that gpio_chrdev_release() now calls
> atomic_notifier_chain_unregister(), which uses RCU, and as such must call
> synchronize_rcu(). synchronize_rcu() waits for the RCU grace period to expire
> before returning and, according to the documentation, can cause a delay of up
> to several milliseconds. In fact it seems to take between 8 and 10ms on my
> system (an STM32MP153C single-core Cortex-A7).
>
> This has the effect that a close() on a /dev/gpiochipX chardev now takes
> ~10ms each time. If I git-revert this commit, close() takes less than 1ms.
>

Thanks for the detailed report!

> 10ms doesn't sound like much, but it is ~10x the time it took before,
> and unfortunately libgpiod code calls this function very often in some places,
> especially in find_line() if your board has many gpiochips (mine has 16
> chardevs).

Yeah, I imagine it can affect the speed of execution of gpiofind,
gpiodetect and any other program that iterates over all character
devices.

>
> The effect can easily be reproduced with the gpiofind tool:
>
> Running on kernel 6.12:
>
> $ time gpiofind LPOUT0
> gpiochip7 9
> real    0m 0.02s
> user    0m 0.00s
> sys     0m 0.01s
>
> Running on kernel 6.13:
>
> $ time gpiofind LPOUT0
> gpiochip7 9
> real    0m 0.19s
> user    0m 0.00s
> sys     0m 0.01s
>
> That is almost a 10x increase in execution time of the whole program!!
>
> On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> on 6.12.
>
> Unfortunately I can't come up with an easy solution to this problem, that's
> why I don't have a patch to propose. Sorry for that.
>
> I still think it is a bit alarming this change has such a huge impact. IMHO it
> really shouldn't. What can be done about this? Is it maybe possible to defer
> unregistering and freeing to a kthread and return from the release function
> earlier?
>

This was my first idea too. Alternatively we can switch to using a raw
notifier and provide a spinlock ourselves.
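As a loose analogy (the kernel primitives would be
raw_notifier_chain_register/unregister() under a gpiolib-owned spinlock; this
Python toy exists only to show the trade-off, nothing here is gpiolib API):

```python
import threading

class RawNotifierChain:
    """Toy model of a raw notifier chain with a caller-provided lock.

    unregister() becomes a plain locked list removal with no
    grace-period wait, but every notify() now pays for the lock -
    the cost moves from the (rare) writer to the (hot) reader path."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the spinlock
        self._callbacks = []

    def register(self, cb):
        with self._lock:
            self._callbacks.append(cb)

    def unregister(self, cb):
        with self._lock:                # fast: no synchronize_rcu()
            self._callbacks.remove(cb)

    def notify(self, event):
        with self._lock:                # readers take the lock too
            for cb in list(self._callbacks):
                cb(event)
```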

Bartosz


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 10:21 ` Bartosz Golaszewski
@ 2025-03-11 11:03   ` David Jander
  2025-03-12  1:32     ` Kent Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: David Jander @ 2025-03-11 11:03 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel


Dear Bartosz,

On Tue, 11 Mar 2025 11:21:10 +0100
Bartosz Golaszewski <brgl@bgdev.pl> wrote:

> On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote:
> >
> >
> > Dear Bartosz,
> >
> > I noticed this because, after updating the kernel from 6.11 to 6.14, a
> > user-space application that uses GPIOs heavily became extremely slow, to the
> > point that I will need to heavily modify it in order for it to be usable
> > again.
> > I traced the problem down to the following patch that went into 6.13:
> >
> > fcc8b637c542 gpiolib: switch the line state notifier to atomic
> >
> > What happens here is that gpio_chrdev_release() now calls
> > atomic_notifier_chain_unregister(), which uses RCU, and as such must call
> > synchronize_rcu(). synchronize_rcu() waits for the RCU grace period to expire
> > before returning and, according to the documentation, can cause a delay of up
> > to several milliseconds. In fact it seems to take between 8 and 10ms on my
> > system (an STM32MP153C single-core Cortex-A7).
> >
> > This has the effect that a close() on a /dev/gpiochipX chardev now takes
> > ~10ms each time. If I git-revert this commit, close() takes less than 1ms.
> >  
> 
> Thanks for the detailed report!

Thanks to you for making this patch in such a way that it is easy to revert
without breaking stuff! That was a real time-saver while diagnosing.

> > 10ms doesn't sound like much, but it is ~10x the time it took before,
> > and unfortunately libgpiod code calls this function very often in some places,
> > especially in find_line() if your board has many gpiochips (mine has 16
> > chardevs).  
> 
> Yeah, I imagine it can affect the speed of execution of gpiofind,
> gpiodetect and any other program that iterates over all character
> devices.

Indeed, it does. My application is written in python and uses the python gpiod
module. Even in such an environment the impact is killing.

> > The effect can easily be reproduced with the gpiofind tool:
> >
> > Running on kernel 6.12:
> >
> > $ time gpiofind LPOUT0
> > gpiochip7 9
> > real    0m 0.02s
> > user    0m 0.00s
> > sys     0m 0.01s
> >
> > Running on kernel 6.13:
> >
> > $ time gpiofind LPOUT0
> > gpiochip7 9
> > real    0m 0.19s
> > user    0m 0.00s
> > sys     0m 0.01s
> >
> > That is almost a 10x increase in execution time of the whole program!!
> >
> > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> > on 6.12.
> >
> > Unfortunately I can't come up with an easy solution to this problem, that's
> > why I don't have a patch to propose. Sorry for that.
> >
> > I still think it is a bit alarming this change has such a huge impact. IMHO it
> > really shouldn't. What can be done about this? Is it maybe possible to defer
> > unregistering and freeing to a kthread and return from the release function
> > earlier?
> >  
> 
> This was my first idea too. Alternatively we can switch to using a raw
> notifier and provide a spinlock ourselves.

That would probably be a good alternative, although gpiod_line_state_notify()
would no longer benefit from the lock-free RCU read side and would incur a
spinlock penalty. Arguably, that path is a lot more performance-critical than
closing the chardev, so maybe the atomic notifier isn't a bad idea... we just
need to deal with the writer side so that user-space doesn't have to wait for
the RCU grace period?

Certainly, I suppose switching to the raw notifier is the easier fix.

OTOH, I know from my own experience that the penalty of a spinlock does
matter sometimes, and not having it in the performance-critical path is
probably nice to have.

Best regards,

-- 
David Jander



* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 10:00 regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance David Jander
  2025-03-11 10:21 ` Bartosz Golaszewski
@ 2025-03-11 11:45 ` Bartosz Golaszewski
  2025-03-11 12:30   ` David Jander
  1 sibling, 1 reply; 12+ messages in thread
From: Bartosz Golaszewski @ 2025-03-11 11:45 UTC (permalink / raw)
  To: David Jander
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel

On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote:
>
> On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> on 6.12.
>

Interestingly, I cannot reproduce it. Obviously gpiofind doesn't exist in
libgpiod v2, but I'm running gpiodetect with and without reverting these
changes and am getting roughly the same results: ~0.050s real time for 1 up
to 4 chips.

Any idea why that could be? Can you reproduce it with libgpiod v2? (I don't
know why that wouldn't be the case, but it's worth double-checking.)

Bart


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 11:45 ` Bartosz Golaszewski
@ 2025-03-11 12:30   ` David Jander
  2025-03-11 13:21     ` Bartosz Golaszewski
  0 siblings, 1 reply; 12+ messages in thread
From: David Jander @ 2025-03-11 12:30 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel

On Tue, 11 Mar 2025 12:45:51 +0100
Bartosz Golaszewski <brgl@bgdev.pl> wrote:

> On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote:
> >
> > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> > on 6.12.
> >  
> 
> Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist
> in libgpiod v2 but I'm running gpiodetect with and without reverting
> these changes and am getting roughly the same results: ~0.050s real
> time for 1 up to 4 chips.
> 
> Any idea why that could be? Can you reproduce it with libgpiod v2 (I
> don't know why that wouldn't be the case but worth double checking).

I had libgpiod version 1.6.3 and I have now built libgpiod version 2.1.3.

Here are my findings:

1. time gpiodetect on kernel 6.13 and gpiod 1.6.3:

$ time gpiodetect 
gpiochip0 [GPIOA] (16 lines)
gpiochip1 [GPIOB] (16 lines)
gpiochip10 [GPIOK] (8 lines)
gpiochip11 [GPIOZ] (8 lines)
gpiochip12 [unknown] (22 lines)
gpiochip13 [mcp23s17.0] (16 lines)
gpiochip14 [0-0020] (16 lines)
gpiochip15 [0-0021] (16 lines)
gpiochip2 [GPIOC] (16 lines)
gpiochip3 [GPIOD] (16 lines)
gpiochip4 [GPIOE] (16 lines)
gpiochip5 [GPIOF] (16 lines)
gpiochip6 [GPIOG] (16 lines)
gpiochip7 [GPIOH] (16 lines)
gpiochip8 [GPIOI] (16 lines)
gpiochip9 [GPIOJ] (16 lines)
real    0m 0.19s
user    0m 0.00s
sys     0m 0.01s

2. time gpiodetect on kernel 6.13 and gpiod 2.1.3:

$ time gpiodetect 
gpiochip0 [GPIOA] (16 lines)
gpiochip1 [GPIOB] (16 lines)
gpiochip2 [GPIOC] (16 lines)
gpiochip3 [GPIOD] (16 lines)
gpiochip4 [GPIOE] (16 lines)
gpiochip5 [GPIOF] (16 lines)
gpiochip6 [GPIOG] (16 lines)
gpiochip7 [GPIOH] (16 lines)
gpiochip8 [GPIOI] (16 lines)
gpiochip9 [GPIOJ] (16 lines)
gpiochip10 [GPIOK] (8 lines)
gpiochip11 [GPIOZ] (8 lines)
gpiochip12 [unknown] (22 lines)
gpiochip13 [mcp23s17.0] (16 lines)
gpiochip14 [0-0020] (16 lines)
gpiochip15 [0-0021] (16 lines)
real    0m 0.22s
user    0m 0.00s
sys     0m 0.06s

(note that it became slightly slower from v1 -> v2)

3. time gpiodetect on kernel 6.12 and gpiod 1.6.3:

$ time gpiodetect 
gpiochip0 [GPIOA] (16 lines)
gpiochip1 [GPIOB] (16 lines)
gpiochip10 [GPIOK] (8 lines)
gpiochip11 [GPIOZ] (8 lines)
gpiochip12 [unknown] (22 lines)
gpiochip13 [mcp23s17.0] (16 lines)
gpiochip14 [0-0020] (16 lines)
gpiochip15 [0-0021] (16 lines)
gpiochip2 [GPIOC] (16 lines)
gpiochip3 [GPIOD] (16 lines)
gpiochip4 [GPIOE] (16 lines)
gpiochip5 [GPIOF] (16 lines)
gpiochip6 [GPIOG] (16 lines)
gpiochip7 [GPIOH] (16 lines)
gpiochip8 [GPIOI] (16 lines)
gpiochip9 [GPIOJ] (16 lines)
real    0m 0.03s
user    0m 0.00s
sys     0m 0.01s

4. time gpiodetect on kernel 6.12 and gpiod 2.1.3:

$ time gpiodetect 
gpiochip0 [GPIOA] (16 lines)
gpiochip1 [GPIOB] (16 lines)
gpiochip2 [GPIOC] (16 lines)
gpiochip3 [GPIOD] (16 lines)
gpiochip4 [GPIOE] (16 lines)
gpiochip5 [GPIOF] (16 lines)
gpiochip6 [GPIOG] (16 lines)
gpiochip7 [GPIOH] (16 lines)
gpiochip8 [GPIOI] (16 lines)
gpiochip9 [GPIOJ] (16 lines)
gpiochip10 [GPIOK] (8 lines)
gpiochip11 [GPIOZ] (8 lines)
gpiochip12 [unknown] (22 lines)
gpiochip13 [mcp23s17.0] (16 lines)
gpiochip14 [0-0020] (16 lines)
gpiochip15 [0-0021] (16 lines)
real    0m 0.07s
user    0m 0.00s
sys     0m 0.06s

(roughly same speed difference from v1 -> v2).

Can you describe your platform? Is it a multi-core or single-core CPU? Which
RCU implementation does it use, tree or tiny? If it is multi-core, is there a
difference if you disable all but one core?
Maybe there is some kernel CONFIG option that makes a difference? I am not an
expert in RCU (in fact I barely know what it does), so maybe I am missing
something that makes this problem go away?

Best regards,

-- 
David Jander


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 12:30   ` David Jander
@ 2025-03-11 13:21     ` Bartosz Golaszewski
  2025-03-11 14:09       ` David Jander
  0 siblings, 1 reply; 12+ messages in thread
From: Bartosz Golaszewski @ 2025-03-11 13:21 UTC (permalink / raw)
  To: David Jander
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel

On Tue, Mar 11, 2025 at 1:30 PM David Jander <david@protonic.nl> wrote:
>
> On Tue, 11 Mar 2025 12:45:51 +0100
> Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote:
> > >
> > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> > > on 6.12.
> > >
> >
> > Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist
> > in libgpiod v2 but I'm running gpiodetect with and without reverting
> > these changes and am getting roughly the same results: ~0.050s real
> > time for 1 up to 4 chips.
> >
> > Any idea why that could be? Can you reproduce it with libgpiod v2 (I
> > don't know why that wouldn't be the case but worth double checking).
>
>
> Can you describe your platform? Is it a multi-core or single-core CPU? What
> RCU implementation does it use? Tree or tiny? If it is multi-core, is there a
> difference if you disable all but one core?
> Maybe some kernel CONFIG option that makes a difference? I am not an expert in
> RCU (in fact I barely know what it does), so maybe I am missing something that
> makes this problem go away?
>

I'm testing on a qemu VM - SMP and single core. RCU algo is tree. In
any case: I've just sent you an RFT patch that switches to using the
raw notifier. Could you see what results you're getting with it?

Bart


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 13:21     ` Bartosz Golaszewski
@ 2025-03-11 14:09       ` David Jander
  0 siblings, 0 replies; 12+ messages in thread
From: David Jander @ 2025-03-11 14:09 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel

On Tue, 11 Mar 2025 14:21:37 +0100
Bartosz Golaszewski <brgl@bgdev.pl> wrote:

> On Tue, Mar 11, 2025 at 1:30 PM David Jander <david@protonic.nl> wrote:
> >
> > On Tue, 11 Mar 2025 12:45:51 +0100
> > Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >  
> > > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote:  
> > > >
> > > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> > > > on 6.12.
> > > >  
> > >
> > > Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist
> > > in libgpiod v2 but I'm running gpiodetect with and without reverting
> > > these changes and am getting roughly the same results: ~0.050s real
> > > time for 1 up to 4 chips.
> > >
> > > Any idea why that could be? Can you reproduce it with libgpiod v2 (I
> > > don't know why that wouldn't be the case but worth double checking).  
> >
> >
> > Can you describe your platform? Is it a multi-core or single-core CPU? What
> > RCU implementation does it use? Tree or tiny? If it is multi-core, is there a
> > difference if you disable all but one core?
> > Maybe some kernel CONFIG option that makes a difference? I am not an expert in
> > RCU (in fact I barely know what it does), so maybe I am missing something that
> > makes this problem go away?
> >  
> 
> I'm testing on a qemu VM - SMP and single core. RCU algo is tree. 

I haven't followed the RCU code in depth, but I had the impression that
synchronize_rcu() sets up a completion and waits for it, so I suspect the bulk
of the delay comes from context switches. Maybe on qemu this behaves
differently than on real hardware in terms of timing and overhead?

Btw, there is also another way this can be made visible:

$ strace -r gpiodetect
     0.000000 execve("/usr/bin/gpiodetect", ["gpiodetect"], 0x7eec5d94 /* 13 vars */) = 0
     0.017348 brk(NULL)                 = 0xc21000
     0.000689 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
...
     0.000677 openat(AT_FDCWD, "/dev/gpiochip9", O_RDWR|O_CLOEXEC) = 18
     0.000741 statx(AT_FDCWD, "/dev/gpiochip9", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0600, stx_size=0, ...}) = 0
     0.000794 statx(AT_FDCWD, "/dev/gpiochip9", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0600, stx_size=0, ...}) = 0
     0.000794 access("/sys/bus/gpio/devices/gpiochip9/dev", R_OK) = 0
     0.000738 openat(AT_FDCWD, "/sys/bus/gpio/devices/gpiochip9/dev", O_RDONLY) = 19
     0.000754 read(19, "254:9\n", 15)   = 6
     0.000597 close(19)                 = 0
     0.000569 ioctl(18, GPIO_GET_CHIPINFO_IOCTL, {name="gpiochip9", label="GPIOJ", lines=16}) = 0
     0.000631 statx(1, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0600, stx_size=0, ...}) = 0
     0.000778 ioctl(1, TCGETS, {c_iflag=ICRNL|IXON|IXOFF|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B1500000|CS8|CREAD|HUPCL|CLOCAL, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|ECHOKE, ...}) = 0
     0.001941 write(1, "gpiochip0 [GPIOA] (16 lines)\n", 29gpiochip0 [GPIOA] (16 lines)
) = 29
     0.000670 close(3)                  = 0
     0.006162 write(1, "gpiochip1 [GPIOB] (16 lines)\n", 29gpiochip1 [GPIOB] (16 lines)
) = 29
     0.000655 close(4)                  = 0
     0.006250 write(1, "gpiochip10 [GPIOK] (8 lines)\n", 29gpiochip10 [GPIOK] (8 lines)
) = 29
     0.000667 close(5)                  = 0
     0.006338 write(1, "gpiochip11 [GPIOZ] (8 lines)\n", 29gpiochip11 [GPIOZ] (8 lines)
) = 29
     0.000661 close(6)                  = 0
     0.006321 write(1, "gpiochip12 [unknown] (22 lines)\n", 32gpiochip12 [unknown] (22 lines)
) = 32
     0.000662 close(7)                  = 0
     0.006365 write(1, "gpiochip13 [mcp23s17.0] (16 line"..., 35gpiochip13 [mcp23s17.0] (16 lines)
) = 35
     0.000672 close(8)                  = 0
     0.006389 write(1, "gpiochip14 [0-0020] (16 lines)\n", 31gpiochip14 [0-0020] (16 lines)
) = 31
     0.000641 close(9)                  = 0
     0.006268 write(1, "gpiochip15 [0-0021] (16 lines)\n", 31gpiochip15 [0-0021] (16 lines)
) = 31
     0.000677 close(10)                 = 0
     0.006330 write(1, "gpiochip2 [GPIOC] (16 lines)\n", 29gpiochip2 [GPIOC] (16 lines)
) = 29
     0.000648 close(11)                 = 0
     0.006358 write(1, "gpiochip3 [GPIOD] (16 lines)\n", 29gpiochip3 [GPIOD] (16 lines)
) = 29
     0.000655 close(12)                 = 0
     0.006333 write(1, "gpiochip4 [GPIOE] (16 lines)\n", 29gpiochip4 [GPIOE] (16 lines)
) = 29
     0.000669 close(13)                 = 0
     0.006332 write(1, "gpiochip5 [GPIOF] (16 lines)\n", 29gpiochip5 [GPIOF] (16 lines)
) = 29
     0.000653 close(14)                 = 0
     0.006359 write(1, "gpiochip6 [GPIOG] (16 lines)\n", 29gpiochip6 [GPIOG] (16 lines)
) = 29
     0.000653 close(15)                 = 0
     0.006332 write(1, "gpiochip7 [GPIOH] (16 lines)\n", 29gpiochip7 [GPIOH] (16 lines)
) = 29
     0.000657 close(16)                 = 0
     0.006344 write(1, "gpiochip8 [GPIOI] (16 lines)\n", 29gpiochip8 [GPIOI] (16 lines)
) = 29
     0.000661 close(17)                 = 0
     0.006343 write(1, "gpiochip9 [GPIOJ] (16 lines)\n", 29gpiochip9 [GPIOJ] (16 lines)
) = 29
     0.000654 close(18)                 = 0
     0.006394 exit_group(0)             = ?
     0.001335 +++ exited with 0 +++

I hope this is readable. Notice the long relative time-stamp (~6ms) on each
syscall that immediately follows a close() of a gpiochip chardev: with
strace -r that delta is the time since the previous call began, i.e. roughly
the time the close() itself took.

For comparison, here's the output after applying your patch:

...
     0.000474 close(3)                  = 0
     0.000472 write(1, "gpiochip1 [GPIOB] (16 lines)\n", 29gpiochip1 [GPIOB] (16 lines)
) = 29
     0.000426 close(4)                  = 0
     0.000406 write(1, "gpiochip10 [GPIOK] (8 lines)\n", 29gpiochip10 [GPIOK] (8 lines)
) = 29
     0.000418 close(5)                  = 0
     0.000354 write(1, "gpiochip11 [GPIOZ] (8 lines)\n", 29gpiochip11 [GPIOZ] (8 lines)
) = 29
     0.000454 close(6)                  = 0
     0.000359 write(1, "gpiochip12 [unknown] (22 lines)\n", 32gpiochip12 [unknown] (22 lines)
) = 32
     0.000457 close(7)                  = 0
     0.000433 write(1, "gpiochip13 [mcp23s17.0] (16 line"..., 35gpiochip13 [mcp23s17.0] (16 lines)
) = 35
     0.000412 close(8)                  = 0
     0.000422 write(1, "gpiochip14 [0-0020] (16 lines)\n", 31gpiochip14 [0-0020] (16 lines)
) = 31
     0.000416 close(9)                  = 0
     0.000416 write(1, "gpiochip15 [0-0021] (16 lines)\n", 31gpiochip15 [0-0021] (16 lines)
) = 31
     0.000408 close(10)                 = 0
     0.000358 write(1, "gpiochip2 [GPIOC] (16 lines)\n", 29gpiochip2 [GPIOC] (16 lines)
) = 29
     0.000441 close(11)                 = 0
     0.000359 write(1, "gpiochip3 [GPIOD] (16 lines)\n", 29gpiochip3 [GPIOD] (16 lines)
) = 29
     0.000443 close(12)                 = 0
     0.000359 write(1, "gpiochip4 [GPIOE] (16 lines)\n", 29gpiochip4 [GPIOE] (16 lines)
) = 29
     0.000394 close(13)                 = 0
     0.000413 write(1, "gpiochip5 [GPIOF] (16 lines)\n", 29gpiochip5 [GPIOF] (16 lines)
) = 29
     0.000396 close(14)                 = 0
     0.000479 write(1, "gpiochip6 [GPIOG] (16 lines)\n", 29gpiochip6 [GPIOG] (16 lines)
) = 29
     0.000431 close(15)                 = 0
     0.000431 write(1, "gpiochip7 [GPIOH] (16 lines)\n", 29gpiochip7 [GPIOH] (16 lines)
) = 29
     0.000471 close(16)                 = 0
     0.000381 write(1, "gpiochip8 [GPIOI] (16 lines)\n", 29gpiochip8 [GPIOI] (16 lines)
) = 29
     0.000484 close(17)                 = 0
     0.000374 write(1, "gpiochip9 [GPIOJ] (16 lines)\n", 29gpiochip9 [GPIOJ] (16 lines)
) = 29
     0.000480 close(18)                 = 0
     0.000418 exit_group(0)             = ?
     0.001373 +++ exited with 0 +++

> In
> any case: I've just sent you an RFT patch that switches to using the
> raw notifier. Could you see what results you're getting with it?

Thanks! This was quick :-)

Like I also replied to the patch email (kernel 6.13 and libgpiod 1.6.3):

$ time gpiofind LPOUT0
gpiochip7 9
real    0m 0.02s
user    0m 0.00s
sys     0m 0.01s

$ time gpiodetect
gpiochip0 [GPIOA] (16 lines)
gpiochip1 [GPIOB] (16 lines)
gpiochip10 [GPIOK] (8 lines)
gpiochip11 [GPIOZ] (8 lines)
gpiochip12 [unknown] (22 lines)
gpiochip13 [mcp23s17.0] (16 lines)
gpiochip14 [0-0020] (16 lines)
gpiochip15 [0-0021] (16 lines)
gpiochip2 [GPIOC] (16 lines)
gpiochip3 [GPIOD] (16 lines)
gpiochip4 [GPIOE] (16 lines)
gpiochip5 [GPIOF] (16 lines)
gpiochip6 [GPIOG] (16 lines)
gpiochip7 [GPIOH] (16 lines)
gpiochip8 [GPIOI] (16 lines)
gpiochip9 [GPIOJ] (16 lines)
real    0m 0.03s
user    0m 0.00s
sys     0m 0.01s

Best regards,

-- 
David Jander


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-11 11:03   ` David Jander
@ 2025-03-12  1:32     ` Kent Gibson
  2025-03-12  8:08       ` David Jander
  0 siblings, 1 reply; 12+ messages in thread
From: Kent Gibson @ 2025-03-12  1:32 UTC (permalink / raw)
  To: David Jander
  Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij,
	linux-gpio, linux-kernel

On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote:
>
>
> Indeed, it does. My application is written in python and uses the python gpiod
> module. Even in such an environment the impact is killing.
>

Interesting - the only reason I could think of for an application
requesting/releasing GPIOs at a high rate was if it was built on top of
the libgpiod tools and so was unable to hold the request fd.
Generally an application should request the lines it requires once and hold
them for the duration.  Similarly, functions such as find_line() should be
performed once per line.
From a performance perspective, NOT having to re-request a line is
considerably faster than requesting it - even with Bart's fix.

Is there something unusual about your app that requires the lines be
released?

Cheers,
Kent.


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-12  1:32     ` Kent Gibson
@ 2025-03-12  8:08       ` David Jander
  2025-03-12  9:10         ` Kent Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: David Jander @ 2025-03-12  8:08 UTC (permalink / raw)
  To: Kent Gibson
  Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij,
	linux-gpio, linux-kernel

On Wed, 12 Mar 2025 09:32:56 +0800
Kent Gibson <warthog618@gmail.com> wrote:

> On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote:
> >
> > Indeed, it does. My application is written in python and uses the python gpiod
> > module. Even in such an environment the impact is killing.
> 
> Interesting - the only reason I could think of for an application
> requesting/releasing GPIOs at a high rate was if it was built on top of
> the libgpiod tools and so was unable to hold the request fd.

I didn't want to bother the list with the details, but this happens during the
configuration phase of the application. It receives many configuration
messages for different IO objects at a fast pace. Most of those objects use
one or more GPIO lines identified by their label, so the application calls
gpiod.find_line(label) for each of them. Apparently libgpiod (version 1.6.3 in
our case) isn't very efficient here, since it will open and close each of the
gpiochip devices in order to query for each of the GPIO lines. I wouldn't
blame libgpiod (python bindings) for doing it that way, since open()/close()
on a chardev are expected to be fast, and caching this information is probably
error-prone anyway, since AFAIK user space cannot yet be informed of changes
to GPIO chips from kernel space.

If this had always been this slow (even before 6.13), I would probably have
done things a bit differently and cached the config requests to then "find"
the lines in batches, working directly on the character devices instead of
using gpiod, so I could open/close each one just once while finding many
different lines.
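That batching approach can be sketched as follows. The per-chip scan itself
(one open(), one line-info query per offset - GPIO_V2_GET_LINEINFO_IOCTL on
recent kernels - and one close()) is abstracted into the input so the sketch
stays self-contained, and the helper name is made up:

```python
def build_line_cache(chips):
    """Build a name -> (chip, offset) map from one scan per chip.

    `chips` maps a chip name to its ordered list of line names, as
    collected by opening each /dev/gpiochipN exactly once and reading
    the line info for every offset.  Line names are not guaranteed
    unique, so the first match wins, mirroring find_line() semantics.
    Unnamed lines (empty string) are skipped.
    """
    cache = {}
    for chip, names in chips.items():
        for offset, name in enumerate(names):
            if name and name not in cache:
                cache[name] = (chip, offset)
    return cache
```

With 16 chardevs this opens each chip exactly once, instead of once per
find_line() call.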

> Generally an application should request the lines it requires once and hold
> them for the duration.  Similarly functions such as find_line() should be
> performed once per line.

Of course it does that ;-)
This board has a large number of GPIO lines, and like I said, it is during the
initial configuration phase of the application that I am seeing this problem.

> From a performance perspective, NOT having to re-request a line is
> considerably faster than requesting it - even with Bart's fix.
> 
> Is there something unusual about your app that requires the lines be
> released?

See above.

Best regards,

-- 
David Jander


* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-12  8:08       ` David Jander
@ 2025-03-12  9:10         ` Kent Gibson
  2025-03-12 10:24           ` David Jander
  0 siblings, 1 reply; 12+ messages in thread
From: Kent Gibson @ 2025-03-12  9:10 UTC (permalink / raw)
  To: David Jander
  Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij,
	linux-gpio, linux-kernel

On Wed, Mar 12, 2025 at 09:08:29AM +0100, David Jander wrote:
> On Wed, 12 Mar 2025 09:32:56 +0800
> Kent Gibson <warthog618@gmail.com> wrote:
>
> > On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote:
> > >
> > > Indeed, it does. My application is written in python and uses the python gpiod
> > > module. Even in such an environment the impact is killing.
> >
> > Interesting - the only reason I could think of for an application
> > requesting/releasing GPIOs at a high rate was if it was built on top of
> > the libgpiod tools and so was unable to hold the request fd.
>

Btw, I'm not suggesting that anyone build an app on top of the libgpiod
tools - I was just hunting for an explanation as to why anyone might be
opening and closing chips or requests at a high rate.

> I didn't want to bother the list with the details, but this is during the
> configuration phase of the application.

The fact that close() was slow is a valid issue, but it left me wondering why
you needed to do that so frequently.
It helps to understand what you are doing and why, to see if there are other,
better solutions - or if there should be.

> It receives many configuration messages
> for different IO objects at a fast pace. Most of those objects use one or more
> GPIO lines identified by their label. So the application calls
gpiod.find_line(label) on each of them. Apparently libgpiod (version 1.6.3 in
> our case) isn't very efficient, since it will open and close each of the
> gpiodev devices in order to query for each of the gpio lines. I wouldn't blame
> libgpiod (python bindings) for doing it that way, since open()/close() of a
> chardev are expected to be fast, and caching this information is probably
> error prone anyway, since AFAIK user space cannot yet be informed of changes
> to gpio chips from kernel space.
>

Ok, if the issue is purely the name -> (chip,offset) mapping, it is pretty
safe to assume that line names are immutable - though not unique, so
caching the mapping should be fine.
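
Such a cache could be sketched roughly as follows. This is an illustrative
sketch, not libgpiod code: build_line_cache() and scan_chips() are hypothetical
names, and the fake data stands in for whatever actually enumerates the
/dev/gpiochipX devices and reads their line info.

```python
# Hypothetical sketch: cache the name -> [(chip, offset), ...] mapping by
# scanning every chip exactly once.  scan_chips() is an assumption standing
# in for the real enumeration of /dev/gpiochip* (e.g. via libgpiod).
from collections import defaultdict

def build_line_cache(scan_chips):
    """Scan every chip once and record all name -> (chip, offset) pairs.

    Line names are immutable but not guaranteed unique, so each name
    maps to a list of matches rather than a single one.
    """
    cache = defaultdict(list)
    for chip, names in scan_chips():   # yields (chip_path, [line names])
        for offset, name in enumerate(names):
            if name:                   # skip unnamed lines
                cache[name].append((chip, offset))
    return cache

# Example with fake data in place of real chips:
def fake_scan():
    yield "gpiochip0", ["", "LED0", "LPOUT0"]
    yield "gpiochip7", ["LPOUT0"]

cache = build_line_cache(fake_scan)
print(cache["LPOUT0"])   # both matches, since names need not be unique
```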

The kernel can already tell userspace about a number of changes.
What changes are you concerned about - adding/removing chips?

> If this had been this slow always (even before 6.13), I would probably have
> done things a bit differently and cached the config requests to then "find"
> the lines in batches directly working on the character devices instead of
> using gpiod, so I could open/close each one just once for finding many
> different lines each time.
>

The libgpiod v2 tools do just that - they scan the chips once rather
than once per line.  But that functionality is not exposed in the
libgpiod v2 API as the C interface is hideous and it is difficult to
provide well-defined behaviour (e.g. in what order are the chips scanned?).
So it is left to applications to determine how they want to do it.
There isn't even a find_line() equivalent in v2, IIRC.
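
A single-pass batch lookup along those lines might look like the sketch
below. find_lines() and read_names() are hypothetical names; read_names()
stands in for opening a chip and reading its line names (e.g. via the
libgpiod v2 bindings), so the logic is shown here with fake data.

```python
# Sketch of resolving many names in one scan, opening each chip exactly
# once, instead of re-opening every chip per name as v1 find_line() does.
def find_lines(paths, wanted, read_names):
    """Resolve many names with one pass.  read_names(path) -> list of names."""
    wanted = set(wanted)
    found = {}
    for path in paths:               # each chip is visited exactly once
        for offset, name in enumerate(read_names(path)):
            if name in wanted and name not in found:
                found[name] = (path, offset)   # keep the first match
        if len(found) == len(wanted):
            break                    # stop early once everything is resolved
    return found

# Example with a fake reader in place of real chardev access:
def fake_names(path):
    return {"gpiochip0": ["", "LED0"], "gpiochip7": ["LPOUT0", "LPOUT1"]}[path]

hits = find_lines(["gpiochip0", "gpiochip7"], {"LED0", "LPOUT0"}, fake_names)
print(hits)
```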

> > Generally an application should request the lines it requires once and hold
> > them for the duration.  Similarly functions such as find_line() should be
> > performed once per line.
>
> Of course it does that ;-)
> This board has a large number of GPIO lines, and like I said, it is during the
> initial configuration phase of the application that I am seeing this problem.
>

Good to hear - from your earlier description I was concerned that
you might be doing it continuously.

Cheers,
Kent.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-12  9:10         ` Kent Gibson
@ 2025-03-12 10:24           ` David Jander
  2025-03-12 12:10             ` Kent Gibson
  0 siblings, 1 reply; 12+ messages in thread
From: David Jander @ 2025-03-12 10:24 UTC (permalink / raw)
  To: Kent Gibson
  Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij,
	linux-gpio, linux-kernel

On Wed, 12 Mar 2025 17:10:56 +0800
Kent Gibson <warthog618@gmail.com> wrote:

> On Wed, Mar 12, 2025 at 09:08:29AM +0100, David Jander wrote:
> > On Wed, 12 Mar 2025 09:32:56 +0800
> > Kent Gibson <warthog618@gmail.com> wrote:
> >  
> > > On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote:  
> > > >
> > > > Indeed, it does. My application is written in python and uses the python gpiod
> > > > module. Even in such an environment the impact is killing.  
> > >
> > > Interesting - the only reason I could think of for an application
> > > requesting/releasing GPIOs at a high rate was if it was built on top of
> > > the libgpiod tools and so was unable to hold the request fd.  
> >  
> 
> Btw, I'm not suggesting that anyone build an app on top of the libgpiod
> tools - I was just hunting for an explanation as to why anyone might be
> opening and closing chips or requests at a high rate.

Very understandable.

> > I didn't want to bother the list with the details, but this is during the
> > configuration phase of the application.  
> 
> The fact that close() was slow is a valid issue, but it left me wondering why
> you needed to do that so frequently.
> It helps to understand what you are doing and why, to see if there are
> other, better solutions - or if there should be.

You are right. In this case it was warranted to describe the situation in a
bit more detail.

> > It receives many configuration messages
> > for different IO objects at a fast pace. Most of those objects use one or more
> > GPIO lines identified by their label. So the application calls
> > gpiod.find_line(label) on each of them. Apparently libgpiod (version 1.6.3 in
> > our case) isn't very efficient, since it will open and close each of the
> > gpiodev devices in order to query for each of the gpio lines. I wouldn't blame
> > libgpiod (python bindings) for doing it that way, since open()/close() of a
> > chardev are expected to be fast, and caching this information is probably
> > error prone anyway, since AFAIK user space cannot yet be informed of changes
> > to gpio chips from kernel space.
> >  
> 
> Ok, if the issue is purely the name -> (chip,offset) mapping, it is pretty
> safe to assume that line names are immutable - though not unique, so
> caching the mapping should be fine.

On our board that would be fine, but what about hot-pluggable GPIO
chips/devices, or modules that are loaded at a later time? I was thinking
about libgpiod in general...

> The kernel can already tell userspace about a number of changes.
> What changes are you concerned about - adding/removing chips?

Yes, since the patches from Bartosz I understand that is indeed possible now
;-)
No special concern, just thinking about general applicability of caching such
information in libgpiod (especially considering the old version I am using
presumably from long before the kernel could do this).

> > If this had been this slow always (even before 6.13), I would probably have
> > done things a bit differently and cached the config requests to then "find"
> > the lines in batches directly working on the character devices instead of
> > using gpiod, so I could open/close each one just once for finding many
> > different lines each time.
> 
> The libgpiod v2 tools do just that - they scan the chips once rather
> than once per line.  But that functionality is not exposed in the
> libgpiod v2 API as the C interface is hideous and it is difficult to
> provide well-defined behaviour (e.g. in what order are the chips scanned?).
> So it is left to applications to determine how they want to do it.
> There isn't even a find_line() equivalent in v2, IIRC.
 
I think I should move to v2 as soon as I find the time to do it. ;-)

In any case, I could also reproduce the issue with the gpiodetect tool from
v2. You can visually see each found chip being printed individually on the
terminal with kernel 6.13, while with 6.12 all chip names would appear
"instantly". Hard to describe in words, but you could in fact tell which
kernel was running just by looking at the terminal output of "gpiodetect"
while it was being executed... my board has 16 gpio chips, so you can really
see the output "scrolling" up as it prints them with kernel 6.13.
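
The per-close() cost can also be measured crudely with a sketch like this.
Pointing path at one of your /dev/gpiochipX nodes is an assumption about the
target board (and should show the ~10 ms close() on affected kernels); the
runnable demo below uses a temporary file instead so it works anywhere.

```python
# Rough sketch: average the cost of an open()+close() cycle on a path.
# On affected kernels, run it against /dev/gpiochipX (hardware assumption);
# here a temporary file stands in so the demo is self-contained.
import os
import tempfile
import time

def avg_open_close_ms(path, n=50):
    """Return the mean open()+close() time for path, in milliseconds."""
    t0 = time.perf_counter()
    for _ in range(n):
        os.close(os.open(path, os.O_RDONLY))
    return (time.perf_counter() - t0) * 1000 / n

with tempfile.NamedTemporaryFile() as f:
    print(f"{avg_open_close_ms(f.name):.3f} ms per open/close cycle")
```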

> > > Generally an application should request the lines it requires once and hold
> > > them for the duration.  Similarly functions such as find_line() should be
> > > performed once per line.  
> >
> > Of course it does that ;-)
> > This board has a large number of GPIO lines, and like I said, it is during the
> > initial configuration phase of the application that I am seeing this problem.
> >  
> 
> Good to hear - from your earlier description I was concerned that
> you might be doing it continuously.

Thanks. Tbh, I am quite proud of the efficiency and speed of the application
itself. Being written in pure Python and running on a rather slow Cortex-A7,
it is surprisingly fast, controlling 16 stepper motors, reacting to 26 sensor
inputs changing at a blazingly high pace and continuously sampling several IIO
ADCs at a 16 kHz sample rate, all with rather low CPU usage (in Python!). It
makes heavy use of asyncio and thus of course the epoll() event mechanisms the
kernel provides for GPIOs, IIO, LMC and other interfaces.
So you may understand that I was a bit triggered by your suggestion of
inefficiency initially. Sorry ;-)
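
The asyncio/epoll pattern mentioned above can be sketched as below. For real
GPIOs the watched fd would be a line-request fd (an assumption about the
setup, e.g. from the libgpiod v2 bindings); a pipe stands in here so the
example is self-contained.

```python
# Sketch: react to a readable fd via asyncio's loop.add_reader(), which is
# epoll-backed on Linux.  A pipe simulates the GPIO event fd.
import asyncio
import os

async def main():
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    loop = asyncio.get_running_loop()
    got = loop.create_future()

    def on_edge():                      # runs when the fd becomes readable
        got.set_result(os.read(rfd, 64))

    loop.add_reader(rfd, on_edge)       # register with the event loop
    os.write(wfd, b"edge")              # stand-in for a real GPIO edge event
    data = await got
    loop.remove_reader(rfd)
    os.close(rfd)
    os.close(wfd)
    return data

print(asyncio.run(main()))              # -> b'edge'
```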

Best regards,

-- 
David Jander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
  2025-03-12 10:24           ` David Jander
@ 2025-03-12 12:10             ` Kent Gibson
  0 siblings, 0 replies; 12+ messages in thread
From: Kent Gibson @ 2025-03-12 12:10 UTC (permalink / raw)
  To: David Jander
  Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij,
	linux-gpio, linux-kernel

On Wed, Mar 12, 2025 at 11:24:01AM +0100, David Jander wrote:
> >
> > Ok, if the issue is purely the name -> (chip,offset) mapping it is pretty
> > safe to assume that line names are immutable - though not unique, so
> > caching the mapping should be fine.
>
> On our board that would be fine, but what about hot-pluggable GPIO
> chips/devices, or modules that are loaded at a later time? I was thinking
> about libgpiod in general...
>

Indeed.  A general solution suitable for libgpiod is a hard problem.
It is probably something better left to a higher layer.

> > The kernel can already tell userspace about a number of changes.
> > What changes are you concerned about - adding/removing chips?
>
> Yes, since the patches from Bartosz I understand that is indeed possible now
> ;-)
> No special concern, just thinking about general applicability of caching such
> information in libgpiod (especially considering the old version I am using
> presumably from long before the kernel could do this).
>

The change notifiers you are referring to have been there for some time
- they were first introduced late in uAPI v1.
I was also thinking of udev events for when devices are added or removed.
Though again, not something core libgpiod should be getting involved with.
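
The udev/uevent approach could be sketched like this. Real code would read
these messages from a NETLINK_KOBJECT_UEVENT socket or via libudev; only the
parsing of the NUL-separated key=value payload is shown here, fed a synthetic
message, so nothing depends on actual hardware.

```python
# Sketch: filter kernel uevents for gpiochip add/remove.  The payload format
# (NUL-separated KEY=value fields after the "action@devpath" header) is what
# a NETLINK_KOBJECT_UEVENT socket delivers; the message below is synthetic.
def parse_uevent(raw):
    """Split a kernel uevent payload into a dict of its KEY=value fields."""
    fields = {}
    for part in raw.split(b"\0"):
        if b"=" in part:
            key, _, value = part.partition(b"=")
            fields[key.decode()] = value.decode()
    return fields

msg = b"add@/devices/gpiochip7\0ACTION=add\0SUBSYSTEM=gpio\0DEVNAME=gpiochip7\0"
ev = parse_uevent(msg)
if ev.get("SUBSYSTEM") == "gpio":
    print(ev["ACTION"], ev["DEVNAME"])   # -> add gpiochip7
```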

> > > If this had been this slow always (even before 6.13), I would probably have
> > > done things a bit differently and cached the config requests to then "find"
> > > the lines in batches directly working on the character devices instead of
> > > using gpiod, so I could open/close each one just once for finding many
> > > different lines each time.
> >
> > The libgpiod v2 tools do just that - they scan the chips once rather
> > than once per line.  But that functionality is not exposed in the
> > libgpiod v2 API as the C interface is hideous and it is difficult to
> > provide well-defined behaviour (e.g. in what order are the chips scanned?).
> > So it is left to applications to determine how they want to do it.
> > There isn't even a find_line() equivalent in v2, IIRC.
>
> I think I should move to v2 as soon as I find the time to do it. ;-)
>

You certainly should - the v2 Python bindings are much more pythonic and
easier to use.

> In any case, I could also reproduce the issue with the gpiodetect tool from
> v2. You can visually see each found chip being printed individually on the
> terminal with kernel 6.13, while with 6.12 all chip names would appear
> "instantly". Hard to describe in words, but you could in fact tell which
> kernel was running just by looking at the terminal output of "gpiodetect"
> while it was being executed... my board has 16 gpio chips, so you can really
> see the output "scrolling" up as it prints them with kernel 6.13.
>

For sure - the close() issue is a real issue.

> > > > Generally an application should request the lines it requires once and hold
> > > > them for the duration.  Similarly functions such as find_line() should be
> > > > performed once per line.
> > >
> > > Of course it does that ;-)
> > > This board has a large number of GPIO lines, and like I said, it is during the
> > > initial configuration phase of the application that I am seeing this problem.
> > >
> >
> > Good to hear - from your earlier description I was concerned that
> > you might be doing it continuously.
>
> Thanks. Tbh, I am quite proud of the efficiency and speed of the application
> itself. Being written in pure Python and running on a rather slow Cortex-A7,
> it is surprisingly fast, controlling 16 stepper motors, reacting to 26 sensor
> inputs changing at a blazingly high pace and continuously sampling several IIO
> ADCs at a 16 kHz sample rate, all with rather low CPU usage (in Python!). It
> makes heavy use of asyncio and thus of course the epoll() event mechanisms the
> kernel provides for GPIOs, IIO, LMC and other interfaces.

You can do some surprising things with Python.

> So you may understand that I was a bit triggered by your suggestion of
> inefficiency initially. Sorry ;-)
>

No problems - I've seen people do some silly things and just wanted
to be sure you weren't one of them ;-).
Glad everything is working for you.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-03-12 12:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-11 10:00 regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance David Jander
2025-03-11 10:21 ` Bartosz Golaszewski
2025-03-11 11:03   ` David Jander
2025-03-12  1:32     ` Kent Gibson
2025-03-12  8:08       ` David Jander
2025-03-12  9:10         ` Kent Gibson
2025-03-12 10:24           ` David Jander
2025-03-12 12:10             ` Kent Gibson
2025-03-11 11:45 ` Bartosz Golaszewski
2025-03-11 12:30   ` David Jander
2025-03-11 13:21     ` Bartosz Golaszewski
2025-03-11 14:09       ` David Jander

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox