* [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
@ 2014-12-07 16:26 Ronny Meeus
  2014-12-16 17:36 ` Ronny Meeus
  0 siblings, 1 reply; 21+ messages in thread
From: Ronny Meeus @ 2014-12-07 16:26 UTC (permalink / raw)
To: xenomai

Hello,

We are using the xenomai-forge implementation.
From time to time we see the timer-internal thread consuming a complete
core. It happens when we send broadcast traffic (ARP) that needs to be
handled by the Linux kernel.

The priority of the kernel thread handling the packets lies between the
priority of the timer-internal thread and that of the application thread.
All threads run on the same core.
If the priority of the timer-internal thread is lowered below that of the
kernel thread, the load disappears immediately.
So it looks like there is some busy polling on a common resource that is
currently held by the application thread, which runs at the lowest priority.

I see that the timer lock is a mutex with priority inheritance, so I would
expect the priority of the application thread to be raised as soon as the
timer-internal thread tries to obtain the mutex.
It might be that this has nothing to do with the mutex; this is just my guess.

Has anybody seen similar issues before?

Best regards,
Ronny

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-07 16:26 [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu Ronny Meeus @ 2014-12-16 17:36 ` Ronny Meeus 2014-12-16 17:58 ` Philippe Gerum 0 siblings, 1 reply; 21+ messages in thread From: Ronny Meeus @ 2014-12-16 17:36 UTC (permalink / raw) To: xenomai On Sun, Dec 7, 2014 at 5:26 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote: > Hello > > we are using the xenomai-forge implementation. > We from time to time see an issue that the timer-internal thread is > consuming a complete core. It is seen when we send broadcast traffic that > needs to be handled by the Linux kernel (ARP). > > The kernel thread's priority handling the packets in the middle between the > timer-internal thread and the application thread's priority. All threads run > on the same core. > If the priority of the timer-internal is lowered below the kernel thread, > the load disappears immediately. > So it looks like there is some busy polling on a common resource that is > currently held by the application thread running at the lowest prio. > > I see that the timer lock being used is a mutex with priority inheritance so > I would expect that the prio of the application is raised as soon as the > timer-internal thread tries to obtain the mutex. After investigating this issue in more detail I have the impression that it has nothing to do with the mutex used to protect the timer but with the conditional variable used to implement the psos event interface. I found references on the web that explain an issue with the internal mutex used inside the posix library to implement a conditional variable. See: https://bugzilla.redhat.com/show_bug.cgi?id=438484 http://marc.info/?t=134688711000002&r=1&w=2 If this is indeed true, it means that the usage of conditional variables is not safe at all (from priority inheritance point of view). Did anybody experiences issues like this before? 
Are there any solutions / workarounds available (for example by avoiding using conditional variables and using PI mutexes instead)? > It might be that it has nothing to do with the mutex, this is just my guess. > > Has anybody seen similar issues before? > > Best regards, > Ronny ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-16 17:36 ` Ronny Meeus @ 2014-12-16 17:58 ` Philippe Gerum 2014-12-16 19:41 ` Ronny Meeus 0 siblings, 1 reply; 21+ messages in thread From: Philippe Gerum @ 2014-12-16 17:58 UTC (permalink / raw) To: Ronny Meeus, xenomai On 12/16/2014 06:36 PM, Ronny Meeus wrote: > On Sun, Dec 7, 2014 at 5:26 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote: >> Hello >> >> we are using the xenomai-forge implementation. >> We from time to time see an issue that the timer-internal thread is >> consuming a complete core. It is seen when we send broadcast traffic that >> needs to be handled by the Linux kernel (ARP). >> >> The kernel thread's priority handling the packets in the middle between the >> timer-internal thread and the application thread's priority. All threads run >> on the same core. >> If the priority of the timer-internal is lowered below the kernel thread, >> the load disappears immediately. >> So it looks like there is some busy polling on a common resource that is >> currently held by the application thread running at the lowest prio. >> >> I see that the timer lock being used is a mutex with priority inheritance so >> I would expect that the prio of the application is raised as soon as the >> timer-internal thread tries to obtain the mutex. > > After investigating this issue in more detail I have the impression that it has > nothing to do with the mutex used to protect the timer but with the conditional > variable used to implement the psos event interface. > > I found references on the web that explain an issue with the internal mutex used > inside the posix library to implement a conditional variable. See: > https://bugzilla.redhat.com/show_bug.cgi?id=438484 > http://marc.info/?t=134688711000002&r=1&w=2 > > If this is indeed true, it means that the usage of conditional > variables is not safe > at all (from priority inheritance point of view). 
Yes, condvars are known not to work nicely with PI mutexes on the glibc. > > Did anybody experiences issues like this before? > Are there any solutions / workarounds available (for example by avoiding using > conditional variables and using PI mutexes instead)? Disabling PI for mutexes is the only option. -- Philippe. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-16 17:58 ` Philippe Gerum @ 2014-12-16 19:41 ` Ronny Meeus 2014-12-16 20:07 ` Philippe Gerum 0 siblings, 1 reply; 21+ messages in thread From: Ronny Meeus @ 2014-12-16 19:41 UTC (permalink / raw) To: Philippe Gerum; +Cc: xenomai On Tue, Dec 16, 2014 at 6:58 PM, Philippe Gerum <rpm@xenomai.org> wrote: > On 12/16/2014 06:36 PM, Ronny Meeus wrote: >> On Sun, Dec 7, 2014 at 5:26 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote: >>> Hello >>> >>> we are using the xenomai-forge implementation. >>> We from time to time see an issue that the timer-internal thread is >>> consuming a complete core. It is seen when we send broadcast traffic that >>> needs to be handled by the Linux kernel (ARP). >>> >>> The kernel thread's priority handling the packets in the middle between the >>> timer-internal thread and the application thread's priority. All threads run >>> on the same core. >>> If the priority of the timer-internal is lowered below the kernel thread, >>> the load disappears immediately. >>> So it looks like there is some busy polling on a common resource that is >>> currently held by the application thread running at the lowest prio. >>> >>> I see that the timer lock being used is a mutex with priority inheritance so >>> I would expect that the prio of the application is raised as soon as the >>> timer-internal thread tries to obtain the mutex. >> >> After investigating this issue in more detail I have the impression that it has >> nothing to do with the mutex used to protect the timer but with the conditional >> variable used to implement the psos event interface. >> >> I found references on the web that explain an issue with the internal mutex used >> inside the posix library to implement a conditional variable. 
See:
>> https://bugzilla.redhat.com/show_bug.cgi?id=438484
>> http://marc.info/?t=134688711000002&r=1&w=2
>>
>> If this is indeed true, it means that the usage of conditional
>> variables is not safe
>> at all (from priority inheritance point of view).
>
> Yes, condvars are known not to work nicely with PI mutexes on the glibc.

Philippe,
I do not understand.

We just use the pSOS interface of xenomai-forge, which internally uses
conditional variables.
Does this mean that we cannot use the pSOS interface with glibc?

If the above statement is correct, which libc should we use to make it work?

>
>>
>> Did anybody experiences issues like this before?
>> Are there any solutions / workarounds available (for example by avoiding using
>> conditional variables and using PI mutexes instead)?
>
> Disabling PI for mutexes is the only option.
>
> --
> Philippe.

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-16 19:41 ` Ronny Meeus @ 2014-12-16 20:07 ` Philippe Gerum 2014-12-18 8:00 ` Ronny Meeus 0 siblings, 1 reply; 21+ messages in thread From: Philippe Gerum @ 2014-12-16 20:07 UTC (permalink / raw) To: Ronny Meeus; +Cc: xenomai On 12/16/2014 08:41 PM, Ronny Meeus wrote: > On Tue, Dec 16, 2014 at 6:58 PM, Philippe Gerum <rpm@xenomai.org> wrote: >> On 12/16/2014 06:36 PM, Ronny Meeus wrote: >>> On Sun, Dec 7, 2014 at 5:26 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote: >>>> Hello >>>> >>>> we are using the xenomai-forge implementation. >>>> We from time to time see an issue that the timer-internal thread is >>>> consuming a complete core. It is seen when we send broadcast traffic that >>>> needs to be handled by the Linux kernel (ARP). >>>> >>>> The kernel thread's priority handling the packets in the middle between the >>>> timer-internal thread and the application thread's priority. All threads run >>>> on the same core. >>>> If the priority of the timer-internal is lowered below the kernel thread, >>>> the load disappears immediately. >>>> So it looks like there is some busy polling on a common resource that is >>>> currently held by the application thread running at the lowest prio. >>>> >>>> I see that the timer lock being used is a mutex with priority inheritance so >>>> I would expect that the prio of the application is raised as soon as the >>>> timer-internal thread tries to obtain the mutex. >>> >>> After investigating this issue in more detail I have the impression that it has >>> nothing to do with the mutex used to protect the timer but with the conditional >>> variable used to implement the psos event interface. >>> >>> I found references on the web that explain an issue with the internal mutex used >>> inside the posix library to implement a conditional variable. 
See:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=438484
>>> http://marc.info/?t=134688711000002&r=1&w=2
>>>
>>> If this is indeed true, it means that the usage of conditional
>>> variables is not safe
>>> at all (from priority inheritance point of view).
>>
>> Yes, condvars are known not to work nicely with PI mutexes on the glibc.
>
> Philippe,
> I do not understand.
>
> We just use the pSOS interface of the xenomai-forge, which internally uses
> conditional variables.
> Does this mean that we cannot use the pSOS interface with glibc?
>
> If the above statement is correct, which libc should we use to make it work?
>

A release of the glibc that fixes this issue. I must admit that I have
not tracked this problem lately. Jan likely knows better here.

--
Philippe.

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
  2014-12-16 20:07 ` Philippe Gerum
@ 2014-12-18 8:00 ` Ronny Meeus
  2014-12-18 9:04 ` Jan Kiszka
  0 siblings, 1 reply; 21+ messages in thread
From: Ronny Meeus @ 2014-12-18 8:00 UTC (permalink / raw)
To: Philippe Gerum, jan.kiszka; +Cc: xenomai

>
> A release of the glibc that fixes this issue. I must admit that I did
> not track this problem lately. Jan likely knows better here.
>

Jan,

which glibc version fixes the priority inversion issue with conditional
variables?
I already tried glibc 2.18, but the issue is still there.

Regards,
Ronny

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 8:00 ` Ronny Meeus @ 2014-12-18 9:04 ` Jan Kiszka 2014-12-18 12:28 ` Ronny Meeus 0 siblings, 1 reply; 21+ messages in thread From: Jan Kiszka @ 2014-12-18 9:04 UTC (permalink / raw) To: Ronny Meeus, Philippe Gerum; +Cc: xenomai On 2014-12-18 09:00, Ronny Meeus wrote: >> >> A release of the glibc that fixes this issue. I must admit that I did >> not track this problem lately. Jan likely knows better here. >> > > Jan, > > what version glibc solves the priority inversion issue on conditional variables? > I already tried the glibc 2.18 but the issue is still there. The bug is still not fixed, and discussion stalled again, see https://sourceware.org/bugzilla/show_bug.cgi?id=11588 Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 9:04 ` Jan Kiszka @ 2014-12-18 12:28 ` Ronny Meeus 2014-12-18 13:35 ` Jan Kiszka 2014-12-18 14:12 ` Gilles Chanteperdrix 0 siblings, 2 replies; 21+ messages in thread From: Ronny Meeus @ 2014-12-18 12:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai On Thu, Dec 18, 2014 at 10:04 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2014-12-18 09:00, Ronny Meeus wrote: >>> >>> A release of the glibc that fixes this issue. I must admit that I did >>> not track this problem lately. Jan likely knows better here. >>> >> >> Jan, >> >> what version glibc solves the priority inversion issue on conditional variables? >> I already tried the glibc 2.18 but the issue is still there. > > The bug is still not fixed, and discussion stalled again, see > https://sourceware.org/bugzilla/show_bug.cgi?id=11588 > > Jan > > -- > Siemens AG, Corporate Technology, CT RTC ITP SES-DE > Corporate Competence Center Embedded Linux Philippe, Jan as long as this issue is not fixed in glibc, it is not OK to use conditional variables in application space for real-time applications in my opinion. Since the pSOS skin uses conditional variables to implement events and realtime priority threads to implement pSOS tasks, it is by definition broken and not useable for any real application. For example the internal-timer server, sending events to lower priority tasks, will be blocked until all middle prio tasks have completed. We have seen massive load consumed by the internal-timer server due to this. What happens is that the timer thread is blocked on the mutex currently owned by an thread running at normal (lower) priority. Every time a Linux timer expires, a signal is sent to the timer-server which will wake-up the task, return to the c-library which will re-invoke the futex call. In case a high number of timers is used, the overhead of this can be large. 
Since the timer-server runs at the highest priority (-100), we see all
kinds of strange crashes.

The same priority inversion affects our own drivers, since they run at
high priority as well.

Has replacing these conditional variables with some other POSIX mechanism
(like mutexes) ever been considered?

Ronny

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 12:28 ` Ronny Meeus @ 2014-12-18 13:35 ` Jan Kiszka 2014-12-18 14:17 ` Ronny Meeus 2014-12-18 14:12 ` Gilles Chanteperdrix 1 sibling, 1 reply; 21+ messages in thread From: Jan Kiszka @ 2014-12-18 13:35 UTC (permalink / raw) To: Ronny Meeus; +Cc: xenomai On 2014-12-18 13:28, Ronny Meeus wrote: > On Thu, Dec 18, 2014 at 10:04 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote: >> On 2014-12-18 09:00, Ronny Meeus wrote: >>>> >>>> A release of the glibc that fixes this issue. I must admit that I did >>>> not track this problem lately. Jan likely knows better here. >>>> >>> >>> Jan, >>> >>> what version glibc solves the priority inversion issue on conditional variables? >>> I already tried the glibc 2.18 but the issue is still there. >> >> The bug is still not fixed, and discussion stalled again, see >> https://sourceware.org/bugzilla/show_bug.cgi?id=11588 >> >> Jan >> >> -- >> Siemens AG, Corporate Technology, CT RTC ITP SES-DE >> Corporate Competence Center Embedded Linux > > Philippe, Jan > > as long as this issue is not fixed in glibc, it is not OK to use > conditional variables > in application space for real-time applications in my opinion. ...when combining them with PI mutexes, right. For real-time QEMU/KVM, I worked around this by using prio-ceiling mutexes. That is by far not optimal, performance-wise, but at least you avoid random lockups or the other side effects of that bug. > > Since the pSOS skin uses conditional variables to implement events and realtime > priority threads to implement pSOS tasks, it is by definition broken > and not useable > for any real application. > > For example the internal-timer server, sending events to lower priority tasks, > will be blocked until all middle prio tasks have completed. We have seen > massive load consumed by the internal-timer server due to this. 
> What happens is that the timer thread is blocked on the mutex currently owned > by an thread running at normal (lower) priority. Every time a Linux > timer expires, > a signal is sent to the timer-server which will wake-up the task, > return to the c-library > which will re-invoke the futex call. In case a high number of timers > is used, the overhead > of this can be large. Since the timer-server is running at the highest > priority (-100) we > see all kinds of strange crashes. > > The same priority inversion is true for our own drivers since they are > running at > high prio as well. > > Has the replacement of these conditional variables by some other POSIX mechanism > (like mutexes) ever been considered? Sometimes it is possible to design a algorithm that uses a semaphore for event signaling instead. Doesn't work for all cond-var scenarios, though. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 21+ messages in thread
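Editorial sketch of the semaphore idea mentioned above, for the simplest case of a one-shot wakeup. This is not code from copperplate or from the thread; the `struct event` type and the `event_*` names are hypothetical. It only uses the standard POSIX calls `sem_init`, `sem_post`, `sem_wait` and `sem_destroy`. Note that a semaphore counts posts and carries no predicate, which is one reason it cannot stand in for every condition-variable use.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical event object: a counting semaphore replaces the
 * condvar + mutex pair, so no PI mutex is involved in the wakeup. */
struct event {
	sem_t sem;
};

/* All helpers return 0 on success or a negative errno value. */
static int event_init(struct event *ev)
{
	assert(ev != NULL);
	/* pshared = 0: used by threads of one process; initial count = 0. */
	return sem_init(&ev->sem, 0, 0) ? -errno : 0;
}

static int event_signal(struct event *ev)
{
	/* Each post lets exactly one (current or future) waiter through. */
	return sem_post(&ev->sem) ? -errno : 0;
}

static int event_wait(struct event *ev)
{
	int ret;

	/* Restart the wait if a signal handler interrupted it. */
	do
		ret = sem_wait(&ev->sem);
	while (ret && errno == EINTR);

	return ret ? -errno : 0;
}

static int event_destroy(struct event *ev)
{
	return sem_destroy(&ev->sem) ? -errno : 0;
}
```

A signal posted before the waiter arrives is not lost, because the count stays at 1 until `event_wait()` consumes it. Whether this maps onto the pSOS event semantics is a separate question that the sketch does not answer.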
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
  2014-12-18 13:35 ` Jan Kiszka
@ 2014-12-18 14:17 ` Ronny Meeus
  0 siblings, 0 replies; 21+ messages in thread
From: Ronny Meeus @ 2014-12-18 14:17 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai

> Philippe, Jan
>>
>> as long as this issue is not fixed in glibc, it is not OK to use
>> conditional variables
>> in application space for real-time applications in my opinion.
>
> ...when combining them with PI mutexes, right. For real-time QEMU/KVM, I
> worked around this by using prio-ceiling mutexes. That is by far not
> optimal, performance-wise, but at least you avoid random lockups or the
> other side effects of that bug.
>

Jan,

to be clear:
We do not use PI mutexes in our application, we just use pSOS primitives.
Internally in the copperplate lib (see copperplate/syncobj.c), conditional
vars are used in combination with PI mutexes.

	static inline
	int monitor_wait_grant(struct syncobj *sobj,
			       struct threadobj *current,
			       const struct timespec *timeout)
	{
		if (timeout)
			return -pthread_cond_timedwait(&current->core.grant_sync,
						       &sobj->core.lock, timeout);

		return -pthread_cond_wait(&current->core.grant_sync,
					  &sobj->core.lock);
	}

where sobj->core.lock is a mutex with PI:

	pthread_mutexattr_init(&mattr);
	pthread_mutexattr_settype(&mattr, mutex_type_attribute);
	pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
	ret = __bt(-pthread_mutexattr_setpshared(&mattr, mutex_scope_attribute));

Ronny

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 12:28 ` Ronny Meeus 2014-12-18 13:35 ` Jan Kiszka @ 2014-12-18 14:12 ` Gilles Chanteperdrix 2014-12-18 14:58 ` Jan Kiszka 1 sibling, 1 reply; 21+ messages in thread From: Gilles Chanteperdrix @ 2014-12-18 14:12 UTC (permalink / raw) To: Ronny Meeus; +Cc: Jan Kiszka, xenomai On Thu, Dec 18, 2014 at 01:28:42PM +0100, Ronny Meeus wrote: > On Thu, Dec 18, 2014 at 10:04 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > > On 2014-12-18 09:00, Ronny Meeus wrote: > >>> > >>> A release of the glibc that fixes this issue. I must admit that I did > >>> not track this problem lately. Jan likely knows better here. > >>> > >> > >> Jan, > >> > >> what version glibc solves the priority inversion issue on conditional variables? > >> I already tried the glibc 2.18 but the issue is still there. > > > > The bug is still not fixed, and discussion stalled again, see > > https://sourceware.org/bugzilla/show_bug.cgi?id=11588 > > > > Jan > > > > -- > > Siemens AG, Corporate Technology, CT RTC ITP SES-DE > > Corporate Competence Center Embedded Linux > > Philippe, Jan > > as long as this issue is not fixed in glibc, it is not OK to use > conditional variables I believe xenomai cobalt does not suffer from the same issue, condition variables should work fine with priority inheritance. Otherwise, have you tried some alternate libc, such as musl: http://www.musl-libc.org/ The following blog: http://ewontfix.com/ Seems to show that the musl maintainers try and report glibc bugs and avoid them in their implementation. I have not tried xenomai with musl at all, so, maybe it does not even compile. But maybe just compiling a testcase for the condvar issue with that libc would help know if it has the same issue or not. -- Gilles. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 14:12 ` Gilles Chanteperdrix @ 2014-12-18 14:58 ` Jan Kiszka 2014-12-18 15:04 ` Gilles Chanteperdrix 0 siblings, 1 reply; 21+ messages in thread From: Jan Kiszka @ 2014-12-18 14:58 UTC (permalink / raw) To: Gilles Chanteperdrix, Ronny Meeus; +Cc: xenomai On 2014-12-18 15:12, Gilles Chanteperdrix wrote: > On Thu, Dec 18, 2014 at 01:28:42PM +0100, Ronny Meeus wrote: >> On Thu, Dec 18, 2014 at 10:04 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote: >>> On 2014-12-18 09:00, Ronny Meeus wrote: >>>>> >>>>> A release of the glibc that fixes this issue. I must admit that I did >>>>> not track this problem lately. Jan likely knows better here. >>>>> >>>> >>>> Jan, >>>> >>>> what version glibc solves the priority inversion issue on conditional variables? >>>> I already tried the glibc 2.18 but the issue is still there. >>> >>> The bug is still not fixed, and discussion stalled again, see >>> https://sourceware.org/bugzilla/show_bug.cgi?id=11588 >>> >>> Jan >>> >>> -- >>> Siemens AG, Corporate Technology, CT RTC ITP SES-DE >>> Corporate Competence Center Embedded Linux >> >> Philippe, Jan >> >> as long as this issue is not fixed in glibc, it is not OK to use >> conditional variables > > I believe xenomai cobalt does not suffer from the same issue, > condition variables should work fine with priority inheritance. Yes, this is a mercury-only issue. Cobalt is fine as its own implementation of posix mutexes and condvars is correct in this regard. > > Otherwise, have you tried some alternate libc, such as musl: > http://www.musl-libc.org/ > > The following blog: > http://ewontfix.com/ > > Seems to show that the musl maintainers try and report glibc bugs > and avoid them in their implementation. > > I have not tried xenomai with musl at all, so, maybe it does not > even compile. But maybe just compiling a testcase for the condvar > issue with that libc would help know if it has the same issue or > not. 
Well, like with many of those "light-weight" re-implementations, there are
"small" issues with bits required for real-time:

http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 14:58 ` Jan Kiszka @ 2014-12-18 15:04 ` Gilles Chanteperdrix 2014-12-18 15:25 ` Ronny Meeus 0 siblings, 1 reply; 21+ messages in thread From: Gilles Chanteperdrix @ 2014-12-18 15:04 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote: > On 2014-12-18 15:12, Gilles Chanteperdrix wrote: > > Otherwise, have you tried some alternate libc, such as musl: > > http://www.musl-libc.org/ > > > > The following blog: > > http://ewontfix.com/ > > > > Seems to show that the musl maintainers try and report glibc bugs > > and avoid them in their implementation. > > > > I have not tried xenomai with musl at all, so, maybe it does not > > even compile. But maybe just compiling a testcase for the condvar > > issue with that libc would help know if it has the same issue or > > not. > > Well, like with many of those "light-weight" re-implementations, the are > "small" issues with bits required for real-time: > > http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c On the other hand, no implementation with a clear ENOTSUPP is better than a partial and buggy implementation that can not be trusted anyway. -- Gilles. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 15:04 ` Gilles Chanteperdrix @ 2014-12-18 15:25 ` Ronny Meeus 2014-12-18 15:30 ` Gilles Chanteperdrix 0 siblings, 1 reply; 21+ messages in thread From: Ronny Meeus @ 2014-12-18 15:25 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai On Thu, Dec 18, 2014 at 4:04 PM, Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote: > On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote: >> On 2014-12-18 15:12, Gilles Chanteperdrix wrote: >> > Otherwise, have you tried some alternate libc, such as musl: >> > http://www.musl-libc.org/ >> > >> > The following blog: >> > http://ewontfix.com/ >> > >> > Seems to show that the musl maintainers try and report glibc bugs >> > and avoid them in their implementation. >> > >> > I have not tried xenomai with musl at all, so, maybe it does not >> > even compile. But maybe just compiling a testcase for the condvar >> > issue with that libc would help know if it has the same issue or >> > not. >> >> Well, like with many of those "light-weight" re-implementations, the are >> "small" issues with bits required for real-time: >> >> http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c > > On the other hand, no implementation with a clear ENOTSUPP is better > than a partial and buggy implementation that can not be trusted anyway. > > -- > Gilles. Gilles I agree. In the meantime I tried it already. This is indeed the trace I get when running my test application with musl. # ./cond_test_arm # hread_mutexattr_setprotocol: Not supported Cobalt is not an option for us either since in that case all Linux applications will run in low-priority. Next to that we also have a huge priority inversion each time a Linux system call is done. Do we have other options to fix forge? Ronny ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 15:25 ` Ronny Meeus @ 2014-12-18 15:30 ` Gilles Chanteperdrix 2014-12-18 15:35 ` Jan Kiszka 0 siblings, 1 reply; 21+ messages in thread From: Gilles Chanteperdrix @ 2014-12-18 15:30 UTC (permalink / raw) To: Ronny Meeus; +Cc: Jan Kiszka, xenomai On Thu, Dec 18, 2014 at 04:25:40PM +0100, Ronny Meeus wrote: > On Thu, Dec 18, 2014 at 4:04 PM, Gilles Chanteperdrix > <gilles.chanteperdrix@xenomai.org> wrote: > > On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote: > >> On 2014-12-18 15:12, Gilles Chanteperdrix wrote: > >> > Otherwise, have you tried some alternate libc, such as musl: > >> > http://www.musl-libc.org/ > >> > > >> > The following blog: > >> > http://ewontfix.com/ > >> > > >> > Seems to show that the musl maintainers try and report glibc bugs > >> > and avoid them in their implementation. > >> > > >> > I have not tried xenomai with musl at all, so, maybe it does not > >> > even compile. But maybe just compiling a testcase for the condvar > >> > issue with that libc would help know if it has the same issue or > >> > not. > >> > >> Well, like with many of those "light-weight" re-implementations, the are > >> "small" issues with bits required for real-time: > >> > >> http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c > > > > On the other hand, no implementation with a clear ENOTSUPP is better > > than a partial and buggy implementation that can not be trusted anyway. > > > > -- > > Gilles. > > Gilles I agree. > > In the meantime I tried it already. > > This is indeed the trace I get when running my test application with musl. > # ./cond_test_arm > # hread_mutexattr_setprotocol: Not supported > > Cobalt is not an option for us either since in that case all Linux > applications will run in low-priority. Next to that we also have a > huge priority inversion each time a Linux system call is done. > > Do we have other options to fix forge? 
Well, three options have been proposed if I followed this thread
correctly:
- stop using priority inheritance for these internal mutexes, at the
  risk of creating priority inversions
- switch to priority ceiling (but what will be the ceiling? 99?)
- use cobalt.

--
Gilles.

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu 2014-12-18 15:30 ` Gilles Chanteperdrix @ 2014-12-18 15:35 ` Jan Kiszka 2014-12-18 15:49 ` Ronny Meeus 0 siblings, 1 reply; 21+ messages in thread From: Jan Kiszka @ 2014-12-18 15:35 UTC (permalink / raw) To: Gilles Chanteperdrix, Ronny Meeus; +Cc: xenomai On 2014-12-18 16:30, Gilles Chanteperdrix wrote: > On Thu, Dec 18, 2014 at 04:25:40PM +0100, Ronny Meeus wrote: >> On Thu, Dec 18, 2014 at 4:04 PM, Gilles Chanteperdrix >> <gilles.chanteperdrix@xenomai.org> wrote: >>> On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote: >>>> On 2014-12-18 15:12, Gilles Chanteperdrix wrote: >>>>> Otherwise, have you tried some alternate libc, such as musl: >>>>> http://www.musl-libc.org/ >>>>> >>>>> The following blog: >>>>> http://ewontfix.com/ >>>>> >>>>> Seems to show that the musl maintainers try and report glibc bugs >>>>> and avoid them in their implementation. >>>>> >>>>> I have not tried xenomai with musl at all, so, maybe it does not >>>>> even compile. But maybe just compiling a testcase for the condvar >>>>> issue with that libc would help know if it has the same issue or >>>>> not. >>>> >>>> Well, like with many of those "light-weight" re-implementations, the are >>>> "small" issues with bits required for real-time: >>>> >>>> http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c >>> >>> On the other hand, no implementation with a clear ENOTSUPP is better >>> than a partial and buggy implementation that can not be trusted anyway. >>> >>> -- >>> Gilles. >> >> Gilles I agree. >> >> In the meantime I tried it already. >> >> This is indeed the trace I get when running my test application with musl. >> # ./cond_test_arm >> # hread_mutexattr_setprotocol: Not supported >> >> Cobalt is not an option for us either since in that case all Linux >> applications will run in low-priority. Next to that we also have a >> huge priority inversion each time a Linux system call is done. 
>>
>> Do we have other options to fix forge?
>
> Well, three options have been proposed if I followed this thread
> correctly:
> - stop using priority inheritance for these internal mutexes, at the
> risk of creating priority inversions
> - switch to priority ceiling (but what will be the ceiling? 99?)

Likely - part of the reason why that is no general solution.

> - use cobalt.

- use a patched glibc
- fix upstream glibc - non-trivial, as history shows, but long overdue

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
From: Ronny Meeus @ 2014-12-18 15:49 UTC
To: Jan Kiszka; +Cc: xenomai

On Thu, Dec 18, 2014 at 4:35 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2014-12-18 16:30, Gilles Chanteperdrix wrote:
>> On Thu, Dec 18, 2014 at 04:25:40PM +0100, Ronny Meeus wrote:
>>> On Thu, Dec 18, 2014 at 4:04 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote:
>>>>> On 2014-12-18 15:12, Gilles Chanteperdrix wrote:
>>>>>> Otherwise, have you tried some alternate libc, such as musl:
>>>>>> http://www.musl-libc.org/
>>>>>>
>>>>>> The following blog:
>>>>>> http://ewontfix.com/
>>>>>>
>>>>>> Seems to show that the musl maintainers try and report glibc bugs
>>>>>> and avoid them in their implementation.
>>>>>>
>>>>>> I have not tried xenomai with musl at all, so, maybe it does not
>>>>>> even compile. But maybe just compiling a testcase for the condvar
>>>>>> issue with that libc would help know if it has the same issue or
>>>>>> not.
>>>>>
>>>>> Well, like with many of those "light-weight" re-implementations, there are
>>>>> "small" issues with bits required for real-time:
>>>>>
>>>>> http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c
>>>>
>>>> On the other hand, no implementation with a clear ENOTSUPP is better
>>>> than a partial and buggy implementation that can not be trusted anyway.
>>>>
>>>> --
>>>> Gilles.
>>>
>>> Gilles I agree.
>>>
>>> In the meantime I tried it already.
>>>
>>> This is indeed the trace I get when running my test application with musl.
>>> # ./cond_test_arm
>>> # hread_mutexattr_setprotocol: Not supported
>>>
>>> Cobalt is not an option for us either since in that case all Linux
>>> applications will run in low-priority. Next to that we also have a
>>> huge priority inversion each time a Linux system call is done.
>>>
>>> Do we have other options to fix forge?
>>
>> Well, three options have been proposed if I followed this thread
>> correctly:
>> - stop using priority inheritance for these internal mutexes, at the
>>   risk of creating priority inversions
>> - switch to priority ceiling (but what will be the ceiling? 99?)
>
> Likely - part of the reason why that is no general solution.
>
>> - use cobalt.
>
> - use a patched glibc

I tried to apply the patch on glibc 2.20 but it looks like the issue is
still present. Even if the patch solved it, we would force all users of
forge to work with a patched version of glibc and to move to 2.20.
This might not always be easy.

> - fix upstream glibc - non-trivial, as history shows, but long overdue

Another option is to implement the priority boost in the copperplate lib.
Before the signal is sent, raise the priority of the waiting task to that
of the task signaling the condition variable (in case the prio of the
waiting task is lower).
Once the thread is unblocked, restore the original priority (of the
thread that receives the signal).

Ronny

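The boost proposed above can be sketched roughly as follows, as one reading of the proposal rather than anything taken from copperplate. The `struct waiter_rec` bookkeeping and the three function names are hypothetical; the sketch assumes the record is filled in and read while holding the mutex associated with the condition variable, and it simply compares `sched_priority` values, which is only meaningful when both threads use the same real-time policy.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical per-waiter bookkeeping (not a copperplate type). */
struct waiter_rec {
    pthread_t thread;          /* the waiting thread */
    int policy;                /* its original scheduling policy */
    struct sched_param param;  /* its original priority */
};

/* Waiter side: record our own scheduling settings before blocking. */
int waiter_record(struct waiter_rec *w)
{
    w->thread = pthread_self();
    return pthread_getschedparam(w->thread, &w->policy, &w->param);
}

/* Signaler side: just before pthread_cond_signal(), raise the waiter
 * to the signaler's policy/priority if the waiter's is lower. */
int boost_waiter(const struct waiter_rec *w)
{
    struct sched_param mine;
    int policy;
    int ret = pthread_getschedparam(pthread_self(), &policy, &mine);

    if (ret)
        return ret;
    if (w->param.sched_priority >= mine.sched_priority)
        return 0; /* nothing to do */
    return pthread_setschedparam(w->thread, policy, &mine);
}

/* Waiter side: once unblocked, drop back to the recorded settings. */
int waiter_restore(const struct waiter_rec *w)
{
    return pthread_setschedparam(w->thread, w->policy, &w->param);
}
```

The sketch makes the cost of the idea visible: the signaler has to know which thread is waiting and at what priority, and that record must stay consistent while priorities change elsewhere.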
* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
From: Jan Kiszka @ 2014-12-18 16:06 UTC
To: Ronny Meeus; +Cc: xenomai

On 2014-12-18 16:49, Ronny Meeus wrote:
> On Thu, Dec 18, 2014 at 4:35 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2014-12-18 16:30, Gilles Chanteperdrix wrote:
>>> On Thu, Dec 18, 2014 at 04:25:40PM +0100, Ronny Meeus wrote:
>>>> On Thu, Dec 18, 2014 at 4:04 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On Thu, Dec 18, 2014 at 03:58:52PM +0100, Jan Kiszka wrote:
>>>>>> On 2014-12-18 15:12, Gilles Chanteperdrix wrote:
>>>>>>> Otherwise, have you tried some alternate libc, such as musl:
>>>>>>> http://www.musl-libc.org/
>>>>>>>
>>>>>>> The following blog:
>>>>>>> http://ewontfix.com/
>>>>>>>
>>>>>>> Seems to show that the musl maintainers try and report glibc bugs
>>>>>>> and avoid them in their implementation.
>>>>>>>
>>>>>>> I have not tried xenomai with musl at all, so, maybe it does not
>>>>>>> even compile. But maybe just compiling a testcase for the condvar
>>>>>>> issue with that libc would help know if it has the same issue or
>>>>>>> not.
>>>>>>
>>>>>> Well, like with many of those "light-weight" re-implementations, there are
>>>>>> "small" issues with bits required for real-time:
>>>>>>
>>>>>> http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutexattr_setprotocol.c
>>>>>
>>>>> On the other hand, no implementation with a clear ENOTSUPP is better
>>>>> than a partial and buggy implementation that can not be trusted anyway.
>>>>>
>>>>> --
>>>>> Gilles.
>>>>
>>>> Gilles I agree.
>>>>
>>>> In the meantime I tried it already.
>>>>
>>>> This is indeed the trace I get when running my test application with musl.
>>>> # ./cond_test_arm
>>>> # hread_mutexattr_setprotocol: Not supported
>>>>
>>>> Cobalt is not an option for us either since in that case all Linux
>>>> applications will run in low-priority. Next to that we also have a
>>>> huge priority inversion each time a Linux system call is done.
>>>>
>>>> Do we have other options to fix forge?
>>>
>>> Well, three options have been proposed if I followed this thread
>>> correctly:
>>> - stop using priority inheritance for these internal mutexes, at the
>>>   risk of creating priority inversions
>>> - switch to priority ceiling (but what will be the ceiling? 99?)
>>
>> Likely - part of the reason why that is no general solution.
>>
>>> - use cobalt.
>>
>> - use a patched glibc
>
> I tried to apply the patch on glibc 2.20 but it looks like the issue is
> still present.

You will need to extend the existing condvar users and tell glibc that
those vars will be used in combination with PI mutexes
(pthread_condattr_setprotocol_np).

> Even if the patch solved it, we would force all users of
> forge to work with a patched version of glibc and to move to 2.20.
> This might not always be easy.

Yes, that is unhandy.

>
>> - fix upstream glibc - non-trivial, as history shows, but long overdue
>
> Another option is to implement the priority boost in the copperplate lib.
> Before the signal is sent, raise the priority of the waiting task to that
> of the task signaling the condition variable (in case the prio of the
> waiting task is lower).
> Once the thread is unblocked, restore the original priority (of the
> thread that receives the signal).

That implies you know the prios of the involved threads. Doesn't sound
like a generic solution either.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
From: Ronny Meeus @ 2014-12-18 17:21 UTC
To: Jan Kiszka; +Cc: xenomai

>
> That implies you know the prios of the involved threads. Doesn't sound
> like a generic solution either.
>
> Jan
>

I think Xenomai knows the involved threads.

Philippe,
is the list of waiting threads not kept in the thread object?

Ronny

* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
From: Ronny Meeus @ 2014-12-23 17:36 UTC
To: Philippe Gerum; +Cc: xenomai

On Thu, Dec 18, 2014 at 6:21 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote:
>>
>> That implies you know the prios of the involved threads. Doesn't sound
>> like a generic solution either.
>>
>> Jan
>>
>
> I think Xenomai knows the involved threads.
>
> Philippe,
> is the list of waiting threads not kept in the thread object?
>

Philippe,
any feedback on the discussion from your side?

Ronny

* Re: [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu
From: Philippe Gerum @ 2015-01-23 16:45 UTC
To: Ronny Meeus; +Cc: xenomai

On 12/23/2014 06:36 PM, Ronny Meeus wrote:
> On Thu, Dec 18, 2014 at 6:21 PM, Ronny Meeus <ronny.meeus@gmail.com> wrote:
>>>
>>> That implies you know the prios of the involved threads. Doesn't sound
>>> like a generic solution either.
>>>
>>> Jan
>>>
>>
>> I think Xenomai knows the involved threads.
>>
>> Philippe,
>> is the list of waiting threads not kept in the thread object?
>>
>
> Philippe,
> any feedback on the discussion from your side?
>

Since designing over a blatant glibc bug does not make any sense, the
only reasonable option is to work around this issue for Mercury
specifically. Cobalt implements PI-aware condvars properly, with an
additional optimization which makes them pretty efficient, so I don't
see the point in changing for a sub-optimal option, only to fix a long
overdue glibc issue.

I pushed a workaround following a brute force approach to the -next
branch, which applies a static temporary boost to the caller about to
signal or wait for a PI-enabled condvar. I won't bother with dynamic
tracking of priorities for this issue; this would introduce nasty races
and would only work for the syncobj abstraction. Besides, if the thread
signaling the condvar is the timer manager, the boost would always take
place anyway.

Hopefully this patch should help fix the issue on your end.

--
Philippe.

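To make the "static temporary boost" idea concrete, here is a minimal sketch of the general technique on the signaling side. It is not the patch pushed to -next, whose code is not shown in this thread; the function name `signal_with_static_boost` and the choice of passing the boost policy and priority as parameters are assumptions made for illustration. Raising a thread to a real-time priority normally requires the matching privileges, so the boost step can fail with EPERM.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: signal `cond` while the caller temporarily runs
 * at a fixed (static) policy and priority, then restore the caller's
 * original scheduling settings. Returns 0 or an errno value. */
int signal_with_static_boost(pthread_cond_t *cond,
                             int boost_policy, int boost_prio)
{
    struct sched_param saved;
    struct sched_param boosted = { .sched_priority = boost_prio };
    int saved_policy, ret, restore_ret;

    /* Remember where we came from. */
    ret = pthread_getschedparam(pthread_self(), &saved_policy, &saved);
    if (ret)
        return ret;

    /* Static boost: no lookup of the waiter's priority is needed. */
    ret = pthread_setschedparam(pthread_self(), boost_policy, &boosted);
    if (ret)
        return ret; /* typically EPERM without real-time privileges */

    ret = pthread_cond_signal(cond);

    /* Always try to drop back, even if the signal call failed. */
    restore_ret = pthread_setschedparam(pthread_self(), saved_policy, &saved);
    return ret ? ret : restore_ret;
}
```

Compared with tracking the waiter's priority, this trades precision for simplicity: the caller needs no knowledge of the other threads, at the price of briefly running higher than strictly necessary.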
end of thread

Thread overview: 21+ messages
2014-12-07 16:26 [Xenomai] [xenomai-forge] timer-internal consume a lot of cpu Ronny Meeus
2014-12-16 17:36 ` Ronny Meeus
2014-12-16 17:58 ` Philippe Gerum
2014-12-16 19:41 ` Ronny Meeus
2014-12-16 20:07 ` Philippe Gerum
2014-12-18  8:00 ` Ronny Meeus
2014-12-18  9:04 ` Jan Kiszka
2014-12-18 12:28 ` Ronny Meeus
2014-12-18 13:35 ` Jan Kiszka
2014-12-18 14:17 ` Ronny Meeus
2014-12-18 14:12 ` Gilles Chanteperdrix
2014-12-18 14:58 ` Jan Kiszka
2014-12-18 15:04 ` Gilles Chanteperdrix
2014-12-18 15:25 ` Ronny Meeus
2014-12-18 15:30 ` Gilles Chanteperdrix
2014-12-18 15:35 ` Jan Kiszka
2014-12-18 15:49 ` Ronny Meeus
2014-12-18 16:06 ` Jan Kiszka
2014-12-18 17:21 ` Ronny Meeus
2014-12-23 17:36 ` Ronny Meeus
2015-01-23 16:45 ` Philippe Gerum