* regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance @ 2025-03-11 10:00 David Jander 2025-03-11 10:21 ` Bartosz Golaszewski 2025-03-11 11:45 ` Bartosz Golaszewski 0 siblings, 2 replies; 12+ messages in thread From: David Jander @ 2025-03-11 10:00 UTC (permalink / raw) To: Bartosz Golaszewski; +Cc: Kent Gibson, Linus Walleij, linux-gpio, linux-kernel
Dear Bartosz,
I noticed this because after updating the kernel from 6.11 to 6.14 a user-space application that uses GPIOs heavily became extremely slow, to the point that I will need to heavily modify this application for it to be usable again. I traced the problem down to the following patch that went into 6.13:
fcc8b637c542 gpiolib: switch the line state notifier to atomic
What happens here is that gpio_chrdev_release() now calls atomic_notifier_chain_unregister(), which uses RCU, and as such must call synchronize_rcu(). synchronize_rcu() waits for the RCU grace period to expire before returning and according to the documentation can cause a delay of up to several milliseconds. In fact it seems to take between 8 and 10ms on my system (an STM32MP153C single-core Cortex-A7).
The effect is that a close() on a /dev/gpiochipX now takes ~10ms each time. If I git-revert this commit, close() will take less than 1ms.
10ms doesn't sound like much, but it is ~10x the time it took before, and unfortunately libgpiod code calls this function very often in some places, especially in find_line() if your board has many gpiochips (mine has 16 chardevs).
The effect can easily be reproduced with the gpiofind tool:
Running on kernel 6.12:
$ time gpiofind LPOUT0
gpiochip7 9
real 0m 0.02s
user 0m 0.00s
sys 0m 0.01s
Running on kernel 6.13:
$ time gpiofind LPOUT0
gpiochip7 9
real 0m 0.19s
user 0m 0.00s
sys 0m 0.01s
That is almost a 10x increase in execution time of the whole program!!
On kernel 6.13, after git revert -n fcc8b637c542 the time is back to what it was on 6.12.
Unfortunately I can't come up with an easy solution to this problem, which is why I don't have a patch to propose. Sorry for that.
I still think it is a bit alarming that this change has such a huge impact. IMHO it really shouldn't. What can be done about this? Is it maybe possible to defer unregistering and freeing to a kthread and return from the release function earlier?
Best regards, -- David Jander ^ permalink raw reply [flat|nested] 12+ messages in thread
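For reference, the per-close() latency can be measured directly with a minimal C program along these lines; the /dev/gpiochipN paths and the count of 16 chips are assumptions matching the board described above, not part of the original report.

/* Minimal sketch: measure how long close() takes on each GPIO chardev. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        char path[32];

        for (int i = 0; i < 16; i++) {
                struct timespec t0, t1;
                int fd;

                snprintf(path, sizeof(path), "/dev/gpiochip%d", i);
                fd = open(path, O_RDWR);
                if (fd < 0)
                        continue; /* chip not present */

                clock_gettime(CLOCK_MONOTONIC, &t0);
                close(fd);
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                            (t1.tv_nsec - t0.tv_nsec) / 1e6;
                printf("%s: close() took %.3f ms\n", path, ms);
        }

        return 0;
}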
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 10:00 regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance David Jander @ 2025-03-11 10:21 ` Bartosz Golaszewski 2025-03-11 11:03 ` David Jander 2025-03-11 11:45 ` Bartosz Golaszewski 1 sibling, 1 reply; 12+ messages in thread From: Bartosz Golaszewski @ 2025-03-11 10:21 UTC (permalink / raw) To: David Jander Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio, linux-kernel On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote: > > > Dear Bartosz, > > I noticed this because after updating the kernel from 6.11 to 6.14 a > user-space application that uses GPIOs heavily started getting extremely slow, > to the point that I will need to heavily modify this application in order to > be usable again. > I traced the problem down to the following patch that went into 6.13: > > fcc8b637c542 gpiolib: switch the line state notifier to atomic > > What happens here, is that gpio_chrdev_release() now calls > atomic_notifier_chain_unregister(), which uses RCU, and as such must call > synchronize_rcu(). synchronize_rcu() waits for the RCU grace time to expire > before returning and according to the documentation can cause a delay of up to > several milliseconds. In fact it seems to take between 8-10ms on my system (an > STM32MP153C single-core Cortex-A7). > > This has the effect that the time it takes to call close() on a /dev/gpiochipX > takes now ~10ms each time. If I git-revert this commit, close() will take less > than 1ms. > Thanks for the detailed report! > 10ms doesn't sound like much, but it is more ~10x the time it tool before, > and unfortunately libgpiod code calls this function very often in some places, > especially in find_line() if your board has many gpiochips (mine has 16 > chardevs). Yeah, I imagine it can affect the speed of execution of gpiofind, gpiodetect and any other program that iterates over all character devices. > > The effect can easily be reproduced with the gpiofind tool: > > Running on kernel 6.12: > > $ time gpiofind LPOUT0 > gpiochip7 9 > real 0m 0.02s > user 0m 0.00s > sys 0m 0.01s > > Running on kernel 6.13: > > $ time gpiofind LPOUT0 > gpiochip7 9 > real 0m 0.19s > user 0m 0.00s > sys 0m 0.01s > > That is almost a 10x increase in execution time of the whole program!! > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > on 6.12. > > Unfortunately I can't come up with an easy solution to this problem, that's > why I don't have a patch to propose. Sorry for that. > > I still think it is a bit alarming this change has such a huge impact. IMHO it > really shouldn't. What can be done about this? Is it maybe possible to defer > unregistering and freeing to a kthread and return from the release function > earlier? > This was my first idea too. Alternatively we can switch to using a raw notifier and provide a spinlock ourselves. Bartosz ^ permalink raw reply [flat|nested] 12+ messages in thread
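For context, a raw notifier guarded by a caller-provided spinlock is a standard kernel pattern. The following is only an illustrative sketch of that second option, with hypothetical demo_* names rather than actual gpiolib code: unregistering no longer implies synchronize_rcu(), at the cost of taking a spinlock on the notify path.

/* Illustrative sketch only -- not the actual gpiolib change. A raw notifier
 * chain does no RCU synchronization on unregister; the caller provides its
 * own locking, here a spinlock usable from atomic context.
 */
#include <linux/notifier.h>
#include <linux/spinlock.h>

static RAW_NOTIFIER_HEAD(demo_line_state_notifier);
static DEFINE_SPINLOCK(demo_notifier_lock);

static void demo_line_state_notify(unsigned long action, void *data)
{
        unsigned long flags;

        spin_lock_irqsave(&demo_notifier_lock, flags);
        raw_notifier_call_chain(&demo_line_state_notifier, action, data);
        spin_unlock_irqrestore(&demo_notifier_lock, flags);
}

static int demo_notifier_register(struct notifier_block *nb)
{
        unsigned long flags;
        int ret;

        spin_lock_irqsave(&demo_notifier_lock, flags);
        ret = raw_notifier_chain_register(&demo_line_state_notifier, nb);
        spin_unlock_irqrestore(&demo_notifier_lock, flags);
        return ret;
}

static int demo_notifier_unregister(struct notifier_block *nb)
{
        unsigned long flags;
        int ret;

        /* No synchronize_rcu() here, so no grace-period wait on close(). */
        spin_lock_irqsave(&demo_notifier_lock, flags);
        ret = raw_notifier_chain_unregister(&demo_line_state_notifier, nb);
        spin_unlock_irqrestore(&demo_notifier_lock, flags);
        return ret;
}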
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 10:21 ` Bartosz Golaszewski @ 2025-03-11 11:03 ` David Jander 2025-03-12 1:32 ` Kent Gibson 0 siblings, 1 reply; 12+ messages in thread From: David Jander @ 2025-03-11 11:03 UTC (permalink / raw) To: Bartosz Golaszewski Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio, linux-kernel
Dear Bartosz,
On Tue, 11 Mar 2025 11:21:10 +0100 Bartosz Golaszewski <brgl@bgdev.pl> wrote: > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote: > > > > > > Dear Bartosz, > > > > I noticed this because after updating the kernel from 6.11 to 6.14 a > > user-space application that uses GPIOs heavily started getting extremely slow, > > to the point that I will need to heavily modify this application in order to > > be usable again. > > I traced the problem down to the following patch that went into 6.13: > > > > fcc8b637c542 gpiolib: switch the line state notifier to atomic > > > > What happens here, is that gpio_chrdev_release() now calls > > atomic_notifier_chain_unregister(), which uses RCU, and as such must call > > synchronize_rcu(). synchronize_rcu() waits for the RCU grace time to expire > > before returning and according to the documentation can cause a delay of up to > > several milliseconds. In fact it seems to take between 8-10ms on my system (an > > STM32MP153C single-core Cortex-A7). > > > > This has the effect that the time it takes to call close() on a /dev/gpiochipX > > takes now ~10ms each time. If I git-revert this commit, close() will take less > > than 1ms. > > > > Thanks for the detailed report!
Thanks to you for making this patch in such a way that it is easy to revert without breaking stuff! That was a real time-saver while diagnosing.
> > 10ms doesn't sound like much, but it is more ~10x the time it tool before, > > and unfortunately libgpiod code calls this function very often in some places, > > especially in find_line() if your board has many gpiochips (mine has 16 > > chardevs). > > Yeah, I imagine it can affect the speed of execution of gpiofind, > gpiodetect and any other program that iterates over all character > devices.
Indeed, it does. My application is written in python and uses the python gpiod module. Even in such an environment the impact is killing.
> > The effect can easily be reproduced with the gpiofind tool: > > > > Running on kernel 6.12: > > > > $ time gpiofind LPOUT0 > > gpiochip7 9 > > real 0m 0.02s > > user 0m 0.00s > > sys 0m 0.01s > > > > Running on kernel 6.13: > > > > $ time gpiofind LPOUT0 > > gpiochip7 9 > > real 0m 0.19s > > user 0m 0.00s > > sys 0m 0.01s > > > > That is almost a 10x increase in execution time of the whole program!! > > > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > > on 6.12. > > > > Unfortunately I can't come up with an easy solution to this problem, that's > > why I don't have a patch to propose. Sorry for that. > > > > I still think it is a bit alarming this change has such a huge impact. IMHO it > > really shouldn't. What can be done about this? Is it maybe possible to defer > > unregistering and freeing to a kthread and return from the release function > > earlier? > > > > This was my first idea too. Alternatively we can switch to using a raw > notifier and provide a spinlock ourselves.
That would probably be a good alternative, although gpiod_line_state_notify() wouldn't benefit from the zero-lock RCU implementation and would incur a spin_lock penalty.
Arguably, this is probably a lot more performance-critical than closing the chardev, so maybe the atomic notifier isn't a bad idea... we just need to deal with the writer (unregister) side so that user-space doesn't have to wait for the RCU grace period?
Certainly, I suppose switching to the raw notifier is the easier fix. OTOH, I know from my own experience that the penalty of a spin-lock does matter sometimes and not having it in the performance-critical path is probably nice to have.
Best regards, -- David Jander ^ permalink raw reply [flat|nested] 12+ messages in thread
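For illustration, the deferral idea mentioned above could in principle push the grace-period wait out of the close() path with a work item. This is only a rough sketch with made-up struct and function names, not a proposed gpiolib patch, and it glosses over lifetime issues (for example the chip going away while the work is still pending).

/* Illustrative sketch only: defer atomic_notifier_chain_unregister() (and
 * the synchronize_rcu() it implies) to a work item so release() can return
 * immediately. Struct and function names are hypothetical.
 */
#include <linux/fs.h>
#include <linux/notifier.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_cdev_state {
        struct atomic_notifier_head *chain;
        struct notifier_block nb;
        struct work_struct cleanup_work;
};

static void demo_cleanup_fn(struct work_struct *work)
{
        struct demo_cdev_state *s =
                container_of(work, struct demo_cdev_state, cleanup_work);

        /* The RCU grace-period wait now happens in workqueue context. */
        atomic_notifier_chain_unregister(s->chain, &s->nb);
        kfree(s);
}

static int demo_release(struct inode *inode, struct file *file)
{
        struct demo_cdev_state *s = file->private_data;

        INIT_WORK(&s->cleanup_work, demo_cleanup_fn);
        schedule_work(&s->cleanup_work);
        return 0;       /* no synchronize_rcu() wait in the close() path */
}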
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 11:03 ` David Jander @ 2025-03-12 1:32 ` Kent Gibson 2025-03-12 8:08 ` David Jander 0 siblings, 1 reply; 12+ messages in thread From: Kent Gibson @ 2025-03-12 1:32 UTC (permalink / raw) To: David Jander Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel
On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote: > > > Indeed, it does. My application is written in python and uses the python gpiod > module. Even in such an environment the impact is killing. >
Interesting - the only reason I could think of for an application requesting/releasing GPIOs at a high rate was if it was built on top of the libgpiod tools and so was unable to hold the request fd.
Generally an application should request the lines it requires once and hold them for the duration. Similarly functions such as find_line() should be performed once per line.
From a performance perspective, NOT having to re-request a line is considerably faster than requesting it - even with Bart's fix.
Is there something unusual about your app that requires the lines to be released?
Cheers, Kent. ^ permalink raw reply [flat|nested] 12+ messages in thread
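For reference, "request once and hold the fd" at the uAPI level looks roughly like the sketch below; the chip path, line offset and consumer label are placeholders rather than values from this thread.

/* Minimal sketch: request one output line once via the GPIO uAPI and
 * toggle it repeatedly without ever re-requesting it.
 */
#include <fcntl.h>
#include <linux/gpio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        struct gpio_v2_line_request req;
        struct gpio_v2_line_values vals;
        int chip_fd, i;

        chip_fd = open("/dev/gpiochip0", O_RDWR);       /* placeholder chip */
        if (chip_fd < 0)
                return 1;

        memset(&req, 0, sizeof(req));
        req.offsets[0] = 9;                             /* placeholder offset */
        req.num_lines = 1;
        req.config.flags = GPIO_V2_LINE_FLAG_OUTPUT;
        strcpy(req.consumer, "demo");
        if (ioctl(chip_fd, GPIO_V2_GET_LINE_IOCTL, &req) < 0)
                return 1;
        close(chip_fd);         /* the line request fd stays valid */

        for (i = 0; i < 1000; i++) {
                memset(&vals, 0, sizeof(vals));
                vals.mask = 1;                  /* line 0 of the request */
                vals.bits = i & 1;              /* toggle */
                ioctl(req.fd, GPIO_V2_LINE_SET_VALUES_IOCTL, &vals);
        }

        close(req.fd);
        return 0;
}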
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-12 1:32 ` Kent Gibson @ 2025-03-12 8:08 ` David Jander 2025-03-12 9:10 ` Kent Gibson 0 siblings, 1 reply; 12+ messages in thread From: David Jander @ 2025-03-12 8:08 UTC (permalink / raw) To: Kent Gibson Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel
On Wed, 12 Mar 2025 09:32:56 +0800 Kent Gibson <warthog618@gmail.com> wrote: > On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote: > > > > Indeed, it does. My application is written in python and uses the python gpiod > > module. Even in such an environment the impact is killing. > > Interesting - the only reason I could think of for an application > requesting/releasing GPIOs at a high rate was it if was built on top of > the libgpiod tools and so was unable to hold the request fd.
I didn't want to bother the list with the details, but this is during the configuration phase of the application. It receives many configuration messages for different IO objects at a fast pace. Most of those objects use one or more GPIO lines identified by their label. So the application calls gpiod.find_line(label) on each of them. Apparently libgpiod (version 1.6.3 in our case) isn't very efficient, since it will open and close each of the gpiochip devices in order to query for each of the gpio lines. I wouldn't blame libgpiod (python bindings) for doing it that way, since open()/close() of a chardev are expected to be fast, and caching this information is probably error-prone anyway, since AFAIK user space cannot yet be informed of changes to gpio chips from kernel space.
If this had always been this slow (even before 6.13), I would probably have done things a bit differently and cached the config requests, then "found" the lines in batches by working directly on the character devices instead of using gpiod, so I could open/close each one just once for finding many different lines each time.
> Generally an application should request the lines it requires once and hold > them for the duration. Similarly functions such as find_line() should be > performed once per line.
Of course it does that ;-) This board has a large number of GPIO lines, and like I said, it is during the initial configuration phase of the application that I am seeing this problem.
> From a performance perspective, NOT having to re-request a line is > considerably faster than requesting it - even with Bart's fix. > > Is there something unusual about your app that requires the lines be > released?
See above.
Best regards, -- David Jander ^ permalink raw reply [flat|nested] 12+ messages in thread
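A rough sketch of that batch approach, working directly on the character devices so that each chip is opened and closed exactly once per scan; the paths and the count of 16 chips are assumptions based on the board described above.

/* Open every gpiochip a single time and list a name -> (chip, offset)
 * view using the uAPI line-info ioctl. A real application would match
 * the names against its pending configuration requests instead of printing.
 */
#include <fcntl.h>
#include <linux/gpio.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

static void scan_chip(const char *path)
{
        struct gpiochip_info cinfo;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
                return;

        if (ioctl(fd, GPIO_GET_CHIPINFO_IOCTL, &cinfo) == 0) {
                for (__u32 off = 0; off < cinfo.lines; off++) {
                        struct gpio_v2_line_info linfo;

                        memset(&linfo, 0, sizeof(linfo));
                        linfo.offset = off;
                        if (ioctl(fd, GPIO_V2_GET_LINEINFO_IOCTL, &linfo))
                                continue;
                        if (linfo.name[0])
                                printf("%s %u %s\n", cinfo.name, off, linfo.name);
                }
        }

        close(fd);      /* one close() per chip, not one per lookup */
}

int main(void)
{
        char path[32];

        for (int i = 0; i < 16; i++) {  /* 16 chips on the board above */
                snprintf(path, sizeof(path), "/dev/gpiochip%d", i);
                scan_chip(path);
        }
        return 0;
}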
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-12 8:08 ` David Jander @ 2025-03-12 9:10 ` Kent Gibson 2025-03-12 10:24 ` David Jander 0 siblings, 1 reply; 12+ messages in thread From: Kent Gibson @ 2025-03-12 9:10 UTC (permalink / raw) To: David Jander Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel
On Wed, Mar 12, 2025 at 09:08:29AM +0100, David Jander wrote: > On Wed, 12 Mar 2025 09:32:56 +0800 > Kent Gibson <warthog618@gmail.com> wrote: > > > On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote: > > > > > > Indeed, it does. My application is written in python and uses the python gpiod > > > module. Even in such an environment the impact is killing. > > > > Interesting - the only reason I could think of for an application > > requesting/releasing GPIOs at a high rate was it if was built on top of > > the libgpiod tools and so was unable to hold the request fd. >
Btw, I'm not suggesting that anyone build an app on top of the libgpiod tools - I was just hunting for an explanation as to why anyone might be opening and closing chips or requests at a high rate.
> I didn't want to bother the list with the details, but this is during the > configuration phase of the application.
The fact that close() was slow is valid but it left me wondering why you were needing to do that so frequently. It helps to understand what you are doing and why to see if there are other better solutions - or if there should be.
> It receives many configuration messages > for different IO objects at a fast pace. Most of those objects use one or more > GPIO lines identified by their label. So the application calls > gpiod.find_line(label) on each of them. Apparently libgiod (version 1.6.3 in > our case) isn't very efficient, since it will open and close each of the > gpiodev devices in order to query for each of the gpio lines. I wouldn't blame > libgpiod (python bindings) for doing it that way, since open()/close() of a > chardev are expected to be fast, and caching this information is probably > error prone anyway, since AFAIK user space cannot yet be informed of changes > to gpio chips from kernel space. >
Ok, if the issue is purely the name -> (chip,offset) mapping it is pretty safe to assume that line names are immutable - though not unique, so caching the mapping should be fine.
The kernel can already tell userspace about a number of changes. What changes are you concerned about - adding/removing chips?
> If this had been this slow always (even before 6.13), I would probably have > done things a bit differently and cached the config requests to then "find" > the lines in batches directly working on the character devices instead of > using gpiod, so I could open/close each one just once for finding many > different lines each time. >
The libgpiod v2 tools do just that - they scan the chips once rather than once per line. But that functionality is not exposed in the libgpiod v2 API as the C interface is hideous and it is difficult to provide well-defined behaviour (e.g. in what order are the chips scanned?). So it is left to the application to determine how they want to do it. There isn't even a find_line() equivalent in v2, IIRC.
> > Generally an application should request the lines it requires once and hold > > them for the duration. Similarly functions such as find_line() should be > > performed once per line.
> > Of course it does that ;-) > This board has a large amount of GPIO lines, and like I said, it is during the > initial configuration phase of the application that I am seeing this problem. > Good to hear - from your earlier description I was concerned that you might be doing it continuously. Cheers, Kent. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-12 9:10 ` Kent Gibson @ 2025-03-12 10:24 ` David Jander 2025-03-12 12:10 ` Kent Gibson 0 siblings, 1 reply; 12+ messages in thread From: David Jander @ 2025-03-12 10:24 UTC (permalink / raw) To: Kent Gibson Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel On Wed, 12 Mar 2025 17:10:56 +0800 Kent Gibson <warthog618@gmail.com> wrote: > On Wed, Mar 12, 2025 at 09:08:29AM +0100, David Jander wrote: > > On Wed, 12 Mar 2025 09:32:56 +0800 > > Kent Gibson <warthog618@gmail.com> wrote: > > > > > On Tue, Mar 11, 2025 at 12:03:46PM +0100, David Jander wrote: > > > > > > > > Indeed, it does. My application is written in python and uses the python gpiod > > > > module. Even in such an environment the impact is killing. > > > > > > Interesting - the only reason I could think of for an application > > > requesting/releasing GPIOs at a high rate was it if was built on top of > > > the libgpiod tools and so was unable to hold the request fd. > > > > Btw, I'm not suggesting that anyone build an app on top of the libgpiod > tools - I was just hunting for an explanation as to why anyone might be > opening and closing chips or requests at a high rate. Very understandable. > > I didn't want to bother the list with the details, but this is during the > > configuration phase of the application. > > The fact that close() was slow is valid but it left me wondering why you > were needing to do that so frequently. > It helps to understand what you are doing and why to see if there are > other better solutions - or it there should be. You are right. In this case it was warranted to describe the situation in a bit more detail. > > It receives many configuration messages > > for different IO objects at a fast pace. Most of those objects use one or more > > GPIO lines identified by their label. So the application calls > > gpiod.find_line(label) on each of them. Apparently libgiod (version 1.6.3 in > > our case) isn't very efficient, since it will open and close each of the > > gpiodev devices in order to query for each of the gpio lines. I wouldn't blame > > libgpiod (python bindings) for doing it that way, since open()/close() of a > > chardev are expected to be fast, and caching this information is probably > > error prone anyway, since AFAIK user space cannot yet be informed of changes > > to gpio chips from kernel space. > > > > Ok, if the issue is purely the name -> (chip,offset) mapping it is pretty > safe to assume that line names are immutable - though not unique, so > caching the mapping should be fine. On our board that would be fine, but what about hot-pluggable GPIO chips/devices, or modules that are loaded at a later time? I was thinking about libgpiod in general... > The kernel can already tell userspace about a number of changes. > What changes are you concerned about - adding/removing chips? Yes, since the patches from Bartosz I understand that is indeed possible now ;-) No special concern, just thinking about general applicability of caching such information in libgpiod (especially considering the old version I am using presumably from long before the kernel could do this). 
> > If this had been this slow always (even before 6.13), I would probably have > > done things a bit differently and cached the config requests to then "find" > > the lines in batches directly working on the character devices instead of > > using gpiod, so I could open/close each one just once for finding many > > different lines each time. > > The libgpiod v2 tools do just that - they scan the chips once rather > than once per line. But that functionality is not exposed in the > libgpiod v2 API as the C interface is hideous and it is difficult to > provide well defined behaviour (e.g. in what order are the chips scanned?). > So it is left to the application to determine how they want to do it. > There isn't even a find_line() equivalent in v2, IIRC.
I think I should move to v2 as soon as I find the time to do it. ;-)
In any case, I could also reproduce the issue with the gpiodetect tool from v2. You can visually see each found chip being printed individually on the terminal with kernel v6.13, while with 6.12 all chip names would appear "instantly". Hard to describe with words, but you could in fact tell which kernel was running just by looking at the terminal output of "gpiodetect" while it was being executed... my board has 16 gpio chips, so you can really see it "scrolling" up as it prints them with kernel 6.13.
> > > Generally an application should request the lines it requires once and hold > > > them for the duration. Similarly functions such as find_line() should be > > > performed once per line. > > > > Of course it does that ;-) > > This board has a large amount of GPIO lines, and like I said, it is during the > > initial configuration phase of the application that I am seeing this problem. > > > > Good to hear - from your earlier description I was concerned that > you might be doing it continuously.
Thanks. Tbh, I am quite proud of the efficiency and speed of the application itself. Being written in pure python and running on a rather slow Cortex-A7, it is surprisingly fast, controlling 16 stepper motors, reacting to 26 sensor inputs changing at a blazingly high pace and continuously sampling several IIO ADCs at a 16kHz sample rate, all with rather low CPU usage (in python!). It makes heavy use of asyncio and thus of course the epoll() event mechanisms the kernel provides for GPIOs, IIO, LMC and other interfaces.
So you may understand that I was a bit triggered by your suggestion of inefficiency initially. Sorry ;-)
Best regards, -- David Jander ^ permalink raw reply [flat|nested] 12+ messages in thread
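For illustration, the event mechanism referred to above boils down to something like the following uAPI-level sketch: request a line with edge detection and wait on the request fd with poll() (the same fd works with epoll and, through the bindings, with asyncio). The chip path and offset are placeholders.

/* Minimal sketch: wait for GPIO edge events on a line-request fd. */
#include <fcntl.h>
#include <linux/gpio.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        struct gpio_v2_line_request req;
        struct gpio_v2_line_event ev;
        struct pollfd pfd;
        int chip_fd;

        chip_fd = open("/dev/gpiochip0", O_RDWR);       /* placeholder */
        if (chip_fd < 0)
                return 1;

        memset(&req, 0, sizeof(req));
        req.offsets[0] = 0;                             /* placeholder */
        req.num_lines = 1;
        req.config.flags = GPIO_V2_LINE_FLAG_INPUT |
                           GPIO_V2_LINE_FLAG_EDGE_RISING |
                           GPIO_V2_LINE_FLAG_EDGE_FALLING;
        strcpy(req.consumer, "demo-events");
        if (ioctl(chip_fd, GPIO_V2_GET_LINE_IOCTL, &req) < 0)
                return 1;
        close(chip_fd);

        pfd.fd = req.fd;
        pfd.events = POLLIN;
        while (poll(&pfd, 1, -1) > 0) {
                if (read(req.fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev))
                        printf("edge %s at %llu ns\n",
                               ev.id == GPIO_V2_LINE_EVENT_RISING_EDGE ?
                                        "rising" : "falling",
                               (unsigned long long)ev.timestamp_ns);
        }

        close(req.fd);
        return 0;
}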
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-12 10:24 ` David Jander @ 2025-03-12 12:10 ` Kent Gibson 0 siblings, 0 replies; 12+ messages in thread From: Kent Gibson @ 2025-03-12 12:10 UTC (permalink / raw) To: David Jander Cc: Bartosz Golaszewski, Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel On Wed, Mar 12, 2025 at 11:24:01AM +0100, David Jander wrote: > > > > Ok, if the issue is purely the name -> (chip,offset) mapping it is pretty > > safe to assume that line names are immutable - though not unique, so > > caching the mapping should be fine. > > On our board that would be fine, but what about hot-pluggable GPIO > chips/devices, or modules that are loaded at a later time? I was thinking > about libgpiod in general... > Indeed. A general solution suitable for libgpiod is a hard problem. It is probably something better left to a higher layer. > > The kernel can already tell userspace about a number of changes. > > What changes are you concerned about - adding/removing chips? > > Yes, since the patches from Bartosz I understand that is indeed possible now > ;-) > No special concern, just thinking about general applicability of caching such > information in libgpiod (especially considering the old version I am using > presumably from long before the kernel could do this). > The change notifiers you are referring to have been there for some time - they were first introduced late in uAPI v1. I was also thinking of udev events for when devices are added or removed. Though again, not something core libgpiod should be getting involved with. > > > If this had been this slow always (even before 6.13), I would probably have > > > done things a bit differently and cached the config requests to then "find" > > > the lines in batches directly working on the character devices instead of > > > using gpiod, so I could open/close each one just once for finding many > > > different lines each time. > > > > The libgpiod v2 tools do just that - they scan the chips once rather > > than once per line. But that functionality is not exposed in the > > libgpiod v2 API as the C interface is hideous and it is difficult to > > provide well defined behaviour (e.g. in what order are the chips scanned?). > > So it is left to the application to determine how they want to do it. > > There isn't even a find_line() equivalent in v2, IIRC. > > I think I should move to v2 as soon as I find the time to do it. ;-) > You certainly should - the v2 Python bindings are much more pythonic and easier to use. > In any case, I also could reproduce the issue with the gpiodetect tool from > v2. You can visually see each found chips being printed individually on the > terminal with kernel v6.13, while with 6.12 all chip names would appear > "instantly". Hard to describe with words, but you could in fact tell which > kernel was running just by looking at the terminal output of "gpiodetect" > while it was being executed... my board has 16 gpio chips, so you can really > see it "scrolling" up as it prints them with kernel 6.13. > For sure - the close() issue is a real issue. > > > > Generally an application should request the lines it requires once and hold > > > > them for the duration. Similarly functions such as find_line() should be > > > > performed once per line. 
> > > > > > Of course it does that ;-) > > > This board has a large amount of GPIO lines, and like I said, it is during the > > > initial configuration phase of the application that I am seeing this problem. > > > > > > > Good to hear - from your earlier description I was concerned that > > you might be doing it continuously. > > Thanks. Tbh, I am quite proud of the efficiency and speed of the application > itself. Being written in pure python and running on a rather slow Cortex-A7, > it is surprisingly fast, controlling 16 stepper motors, reacting to 26 sensor > inputs changing at a blazing high pace and continuously sampling several IIO > adcs at 16kHz sample rate, all with rather low CPU usage (in python!). It makes > heavy use of asyncio an thus of course the epoll() event mechanisms the kernel > provides for GPIOs, IIO, LMC and other interfaces. You can do some surprising things with Python. > So you may understand that I was a bit triggered by your suggestion of > inefficiency initially. Sorry ;-) > No problems - I've seen people do some silly things and just wanted to be sure you weren't one of them ;-). Glad everything is working for you. Cheers, Kent. ^ permalink raw reply [flat|nested] 12+ messages in thread
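For reference, the uAPI change notifiers mentioned above can be used roughly as in the sketch below. Note that they report request/release/reconfigure events for lines on an already-present chip, not chips being added or removed (that is where udev comes in); the chip path and offset are placeholders.

/* Rough sketch: watch one line's info on a chip fd and read
 * gpio_v2_line_info_changed events from it.
 */
#include <fcntl.h>
#include <linux/gpio.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        struct gpio_v2_line_info info;
        struct gpio_v2_line_info_changed chg;
        int fd = open("/dev/gpiochip0", O_RDONLY);      /* placeholder */

        if (fd < 0)
                return 1;

        memset(&info, 0, sizeof(info));
        info.offset = 9;                                /* placeholder */
        if (ioctl(fd, GPIO_V2_GET_LINEINFO_WATCH_IOCTL, &info) < 0)
                return 1;

        /* The chip fd is now pollable; a blocking read also works. */
        while (read(fd, &chg, sizeof(chg)) == (ssize_t)sizeof(chg))
                printf("line %u: change event %u at %llu ns\n",
                       chg.info.offset, chg.event_type,
                       (unsigned long long)chg.timestamp_ns);

        close(fd);
        return 0;
}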
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 10:00 regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance David Jander 2025-03-11 10:21 ` Bartosz Golaszewski @ 2025-03-11 11:45 ` Bartosz Golaszewski 2025-03-11 12:30 ` David Jander 1 sibling, 1 reply; 12+ messages in thread From: Bartosz Golaszewski @ 2025-03-11 11:45 UTC (permalink / raw) To: David Jander Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio, linux-kernel
On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote: > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > on 6.12. >
Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist in libgpiod v2 but I'm running gpiodetect with and without reverting these changes and am getting roughly the same results: ~0.050s real time for 1 up to 4 chips.
Any idea why that could be? Can you reproduce it with libgpiod v2? (I don't know why that wouldn't be the case but worth double checking.)
Bart ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 11:45 ` Bartosz Golaszewski @ 2025-03-11 12:30 ` David Jander 2025-03-11 13:21 ` Bartosz Golaszewski 0 siblings, 1 reply; 12+ messages in thread From: David Jander @ 2025-03-11 12:30 UTC (permalink / raw) To: Bartosz Golaszewski Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio, linux-kernel On Tue, 11 Mar 2025 12:45:51 +0100 Bartosz Golaszewski <brgl@bgdev.pl> wrote: > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote: > > > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > > on 6.12. > > > > Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist > in libgpiod v2 but I'm running gpiodetect with and without reverting > these changes and am getting roughly the same results: ~0.050s real > time for 1 up to 4 chips. > > Any idea why that could be? Can you reproduce it with libgpiod v2 (I > don't know why that wouldn't be the case but worth double checking). I had libgpiod version 1.6.3 and I have now built libgpiod version 2.1.3. Here are my findings: 1. time gpiodetect on kernel 6.13 and gpiod 1.6.3: $ time gpiodetect gpiochip0 [GPIOA] (16 lines) gpiochip1 [GPIOB] (16 lines) gpiochip10 [GPIOK] (8 lines) gpiochip11 [GPIOZ] (8 lines) gpiochip12 [unknown] (22 lines) gpiochip13 [mcp23s17.0] (16 lines) gpiochip14 [0-0020] (16 lines) gpiochip15 [0-0021] (16 lines) gpiochip2 [GPIOC] (16 lines) gpiochip3 [GPIOD] (16 lines) gpiochip4 [GPIOE] (16 lines) gpiochip5 [GPIOF] (16 lines) gpiochip6 [GPIOG] (16 lines) gpiochip7 [GPIOH] (16 lines) gpiochip8 [GPIOI] (16 lines) gpiochip9 [GPIOJ] (16 lines) real 0m 0.19s user 0m 0.00s sys 0m 0.01s 2. time gpiodetect on kernel 6.13 and gpiod 2.1.3: $ time gpiodetect gpiochip0 [GPIOA] (16 lines) gpiochip1 [GPIOB] (16 lines) gpiochip2 [GPIOC] (16 lines) gpiochip3 [GPIOD] (16 lines) gpiochip4 [GPIOE] (16 lines) gpiochip5 [GPIOF] (16 lines) gpiochip6 [GPIOG] (16 lines) gpiochip7 [GPIOH] (16 lines) gpiochip8 [GPIOI] (16 lines) gpiochip9 [GPIOJ] (16 lines) gpiochip10 [GPIOK] (8 lines) gpiochip11 [GPIOZ] (8 lines) gpiochip12 [unknown] (22 lines) gpiochip13 [mcp23s17.0] (16 lines) gpiochip14 [0-0020] (16 lines) gpiochip15 [0-0021] (16 lines) real 0m 0.22s user 0m 0.00s sys 0m 0.06s (note that it became slightly slower from v1 -> v2) 3. time gpiodetect on kernel 6.12 and gpiod 1.6.3: $ time gpiodetect gpiochip0 [GPIOA] (16 lines) gpiochip1 [GPIOB] (16 lines) gpiochip10 [GPIOK] (8 lines) gpiochip11 [GPIOZ] (8 lines) gpiochip12 [unknown] (22 lines) gpiochip13 [mcp23s17.0] (16 lines) gpiochip14 [0-0020] (16 lines) gpiochip15 [0-0021] (16 lines) gpiochip2 [GPIOC] (16 lines) gpiochip3 [GPIOD] (16 lines) gpiochip4 [GPIOE] (16 lines) gpiochip5 [GPIOF] (16 lines) gpiochip6 [GPIOG] (16 lines) gpiochip7 [GPIOH] (16 lines) gpiochip8 [GPIOI] (16 lines) gpiochip9 [GPIOJ] (16 lines) real 0m 0.03s user 0m 0.00s sys 0m 0.01s 4. 
time gpiodetect on kernel 6.12 and gpiod 2.1.3: $ time gpiodetect gpiochip0 [GPIOA] (16 lines) gpiochip1 [GPIOB] (16 lines) gpiochip2 [GPIOC] (16 lines) gpiochip3 [GPIOD] (16 lines) gpiochip4 [GPIOE] (16 lines) gpiochip5 [GPIOF] (16 lines) gpiochip6 [GPIOG] (16 lines) gpiochip7 [GPIOH] (16 lines) gpiochip8 [GPIOI] (16 lines) gpiochip9 [GPIOJ] (16 lines) gpiochip10 [GPIOK] (8 lines) gpiochip11 [GPIOZ] (8 lines) gpiochip12 [unknown] (22 lines) gpiochip13 [mcp23s17.0] (16 lines) gpiochip14 [0-0020] (16 lines) gpiochip15 [0-0021] (16 lines) real 0m 0.07s user 0m 0.00s sys 0m 0.06s (roughly same speed difference from v1 -> v2). Can you describe your platform? Is it a multi-core or single-core CPU? What RCU implementation does it use? Tree or tiny? If it is multi-core, is there a difference if you disable all but one core? Maybe some kernel CONFIG option that makes a difference? I am not an expert in RCU (in fact I barely know what it does), so maybe I am missing something that makes this problem go away? Best regards, -- David Jander ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 12:30 ` David Jander @ 2025-03-11 13:21 ` Bartosz Golaszewski 2025-03-11 14:09 ` David Jander 0 siblings, 1 reply; 12+ messages in thread From: Bartosz Golaszewski @ 2025-03-11 13:21 UTC (permalink / raw) To: David Jander Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio, linux-kernel On Tue, Mar 11, 2025 at 1:30 PM David Jander <david@protonic.nl> wrote: > > On Tue, 11 Mar 2025 12:45:51 +0100 > Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote: > > > > > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > > > on 6.12. > > > > > > > Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist > > in libgpiod v2 but I'm running gpiodetect with and without reverting > > these changes and am getting roughly the same results: ~0.050s real > > time for 1 up to 4 chips. > > > > Any idea why that could be? Can you reproduce it with libgpiod v2 (I > > don't know why that wouldn't be the case but worth double checking). > > > Can you describe your platform? Is it a multi-core or single-core CPU? What > RCU implementation does it use? Tree or tiny? If it is multi-core, is there a > difference if you disable all but one core? > Maybe some kernel CONFIG option that makes a difference? I am not an expert in > RCU (in fact I barely know what it does), so maybe I am missing something that > makes this problem go away? > I'm testing on a qemu VM - SMP and single core. RCU algo is tree. In any case: I've just sent you an RFT patch that switches to using the raw notifier. Could you see what results you're getting with it? Bart ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance 2025-03-11 13:21 ` Bartosz Golaszewski @ 2025-03-11 14:09 ` David Jander 0 siblings, 0 replies; 12+ messages in thread From: David Jander @ 2025-03-11 14:09 UTC (permalink / raw) To: Bartosz Golaszewski Cc: Bartosz Golaszewski, Kent Gibson, Linus Walleij, linux-gpio, linux-kernel
On Tue, 11 Mar 2025 14:21:37 +0100 Bartosz Golaszewski <brgl@bgdev.pl> wrote: > On Tue, Mar 11, 2025 at 1:30 PM David Jander <david@protonic.nl> wrote: > > > > On Tue, 11 Mar 2025 12:45:51 +0100 > > Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > > > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@protonic.nl> wrote: > > > > > > > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > > > > on 6.12. > > > > > > > > > > Interestingly: I cannot reproduce it. Obviously gpiofind doesn't exist > > > in libgpiod v2 but I'm running gpiodetect with and without reverting > > > these changes and am getting roughly the same results: ~0.050s real > > > time for 1 up to 4 chips. > > > > > > Any idea why that could be? Can you reproduce it with libgpiod v2 (I > > > don't know why that wouldn't be the case but worth double checking). > > > > > > Can you describe your platform? Is it a multi-core or single-core CPU? What > > RCU implementation does it use? Tree or tiny? If it is multi-core, is there a > > difference if you disable all but one core? > > Maybe some kernel CONFIG option that makes a difference? I am not an expert in > > RCU (in fact I barely know what it does), so maybe I am missing something that > > makes this problem go away? > > > > I'm testing on a qemu VM - SMP and single core. RCU algo is tree.
I haven't dug deep into the RCU code, but I had the impression that the synchronize_rcu() code sets a completion and waits for it, so I suspect the bulk of the delay comes from context switches. Maybe on qemu this behaves differently than on real hardware in terms of timing and overhead?
Btw, there is also another way this can be made visible:
$ strace -r gpiodetect
0.000000 execve("/usr/bin/gpiodetect", ["gpiodetect"], 0x7eec5d94 /* 13 vars */) = 0
0.017348 brk(NULL) = 0xc21000
0.000689 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
...
0.000677 openat(AT_FDCWD, "/dev/gpiochip9", O_RDWR|O_CLOEXEC) = 18 0.000741 statx(AT_FDCWD, "/dev/gpiochip9", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0600, stx_size=0, ...}) = 0 0.000794 statx(AT_FDCWD, "/dev/gpiochip9", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0600, stx_size=0, ...}) = 0 0.000794 access("/sys/bus/gpio/devices/gpiochip9/dev", R_OK) = 0 0.000738 openat(AT_FDCWD, "/sys/bus/gpio/devices/gpiochip9/dev", O_RDONLY) = 19 0.000754 read(19, "254:9\n", 15) = 6 0.000597 close(19) = 0 0.000569 ioctl(18, GPIO_GET_CHIPINFO_IOCTL, {name="gpiochip9", label="GPIOJ", lines=16}) = 0 0.000631 statx(1, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFCHR|0600, stx_size=0, ...}) = 0 0.000778 ioctl(1, TCGETS, {c_iflag=ICRNL|IXON|IXOFF|IUTF8, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B1500000|CS8|CREAD|HUPCL|CLOCAL, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|ECHOKE, ...}) = 0 0.001941 write(1, "gpiochip0 [GPIOA] (16 lines)\n", 29gpiochip0 [GPIOA] (16 lines) ) = 29 0.000670 close(3) = 0 0.006162 write(1, "gpiochip1 [GPIOB] (16 lines)\n", 29gpiochip1 [GPIOB] (16 lines) ) = 29 0.000655 close(4) = 0 0.006250 write(1, "gpiochip10 [GPIOK] (8 lines)\n", 29gpiochip10 [GPIOK] (8 lines) ) = 29 0.000667 close(5) = 0 0.006338 write(1, "gpiochip11 [GPIOZ] (8 lines)\n", 29gpiochip11 [GPIOZ] (8 lines) ) = 29 0.000661 close(6) = 0 0.006321 write(1, "gpiochip12 [unknown] (22 lines)\n", 32gpiochip12 [unknown] (22 lines) ) = 32 0.000662 close(7) = 0 0.006365 write(1, "gpiochip13 [mcp23s17.0] (16 line"..., 35gpiochip13 [mcp23s17.0] (16 lines) ) = 35 0.000672 close(8) = 0 0.006389 write(1, "gpiochip14 [0-0020] (16 lines)\n", 31gpiochip14 [0-0020] (16 lines) ) = 31 0.000641 close(9) = 0 0.006268 write(1, "gpiochip15 [0-0021] (16 lines)\n", 31gpiochip15 [0-0021] (16 lines) ) = 31 0.000677 close(10) = 0 0.006330 write(1, "gpiochip2 [GPIOC] (16 lines)\n", 29gpiochip2 [GPIOC] (16 lines) ) = 29 0.000648 close(11) = 0 0.006358 write(1, "gpiochip3 [GPIOD] (16 lines)\n", 29gpiochip3 [GPIOD] (16 lines) ) = 29 0.000655 close(12) = 0 0.006333 write(1, "gpiochip4 [GPIOE] (16 lines)\n", 29gpiochip4 [GPIOE] (16 lines) ) = 29 0.000669 close(13) = 0 0.006332 write(1, "gpiochip5 [GPIOF] (16 lines)\n", 29gpiochip5 [GPIOF] (16 lines) ) = 29 0.000653 close(14) = 0 0.006359 write(1, "gpiochip6 [GPIOG] (16 lines)\n", 29gpiochip6 [GPIOG] (16 lines) ) = 29 0.000653 close(15) = 0 0.006332 write(1, "gpiochip7 [GPIOH] (16 lines)\n", 29gpiochip7 [GPIOH] (16 lines) ) = 29 0.000657 close(16) = 0 0.006344 write(1, "gpiochip8 [GPIOI] (16 lines)\n", 29gpiochip8 [GPIOI] (16 lines) ) = 29 0.000661 close(17) = 0 0.006343 write(1, "gpiochip9 [GPIOJ] (16 lines)\n", 29gpiochip9 [GPIOJ] (16 lines) ) = 29 0.000654 close(18) = 0 0.006394 exit_group(0) = ? 0.001335 +++ exited with 0 +++ I hope this is readable. Notice the long relative time-stamp of the syscall immediately preceding each of the close() calls that close a gpiochip chardev. For comparison, here's the output after applying your patch: ... 
0.000474 close(3) = 0 0.000472 write(1, "gpiochip1 [GPIOB] (16 lines)\n", 29gpiochip1 [GPIOB] (16 lines) ) = 29 0.000426 close(4) = 0 0.000406 write(1, "gpiochip10 [GPIOK] (8 lines)\n", 29gpiochip10 [GPIOK] (8 lines) ) = 29 0.000418 close(5) = 0 0.000354 write(1, "gpiochip11 [GPIOZ] (8 lines)\n", 29gpiochip11 [GPIOZ] (8 lines) ) = 29 0.000454 close(6) = 0 0.000359 write(1, "gpiochip12 [unknown] (22 lines)\n", 32gpiochip12 [unknown] (22 lines) ) = 32 0.000457 close(7) = 0 0.000433 write(1, "gpiochip13 [mcp23s17.0] (16 line"..., 35gpiochip13 [mcp23s17.0] (16 lines) ) = 35 0.000412 close(8) = 0 0.000422 write(1, "gpiochip14 [0-0020] (16 lines)\n", 31gpiochip14 [0-0020] (16 lines) ) = 31 0.000416 close(9) = 0 0.000416 write(1, "gpiochip15 [0-0021] (16 lines)\n", 31gpiochip15 [0-0021] (16 lines) ) = 31 0.000408 close(10) = 0 0.000358 write(1, "gpiochip2 [GPIOC] (16 lines)\n", 29gpiochip2 [GPIOC] (16 lines) ) = 29 0.000441 close(11) = 0 0.000359 write(1, "gpiochip3 [GPIOD] (16 lines)\n", 29gpiochip3 [GPIOD] (16 lines) ) = 29 0.000443 close(12) = 0 0.000359 write(1, "gpiochip4 [GPIOE] (16 lines)\n", 29gpiochip4 [GPIOE] (16 lines) ) = 29 0.000394 close(13) = 0 0.000413 write(1, "gpiochip5 [GPIOF] (16 lines)\n", 29gpiochip5 [GPIOF] (16 lines) ) = 29 0.000396 close(14) = 0 0.000479 write(1, "gpiochip6 [GPIOG] (16 lines)\n", 29gpiochip6 [GPIOG] (16 lines) ) = 29 0.000431 close(15) = 0 0.000431 write(1, "gpiochip7 [GPIOH] (16 lines)\n", 29gpiochip7 [GPIOH] (16 lines) ) = 29 0.000471 close(16) = 0 0.000381 write(1, "gpiochip8 [GPIOI] (16 lines)\n", 29gpiochip8 [GPIOI] (16 lines) ) = 29 0.000484 close(17) = 0 0.000374 write(1, "gpiochip9 [GPIOJ] (16 lines)\n", 29gpiochip9 [GPIOJ] (16 lines) ) = 29 0.000480 close(18) = 0 0.000418 exit_group(0) = ? 0.001373 +++ exited with 0 +++ > In > any case: I've just sent you an RFT patch that switches to using the > raw notifier. Could you see what results you're getting with it? Thanks! This was quick :-) Like I also replied to the patch email (kernel 6.13 and libgpiod 1.6.3): $ time gpiofind LPOUT0 gpiochip7 9 real 0m 0.02s user 0m 0.00s sys 0m 0.01s $ time gpiodetect gpiochip0 [GPIOA] (16 lines) gpiochip1 [GPIOB] (16 lines) gpiochip10 [GPIOK] (8 lines) gpiochip11 [GPIOZ] (8 lines) gpiochip12 [unknown] (22 lines) gpiochip13 [mcp23s17.0] (16 lines) gpiochip14 [0-0020] (16 lines) gpiochip15 [0-0021] (16 lines) gpiochip2 [GPIOC] (16 lines) gpiochip3 [GPIOD] (16 lines) gpiochip4 [GPIOE] (16 lines) gpiochip5 [GPIOF] (16 lines) gpiochip6 [GPIOG] (16 lines) gpiochip7 [GPIOH] (16 lines) gpiochip8 [GPIOI] (16 lines) gpiochip9 [GPIOJ] (16 lines) real 0m 0.03s user 0m 0.00s sys 0m 0.01s Best regards, -- David Jander ^ permalink raw reply [flat|nested] 12+ messages in thread