* Re: udev regression from 167 to 168 on notion ink adam
2011-05-04 14:57 udev regression from 167 to 168 on notion ink adam Johannes Schauer
@ 2011-05-04 15:20 ` Marco d'Itri
2011-05-04 16:03 ` Kay Sievers
` (14 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Marco d'Itri @ 2011-05-04 15:20 UTC
To: linux-hotplug
On May 04, Johannes Schauer <j.schauer@email.de> wrote:
> now my question is: how do I debug this? how do I find out
The best way to debug this kind of problem is to boot with
init=/bin/bash, start a shell on tty2 or a serial port with openvt or
getty, and then exec /sbin/init.
Then you can find out from the second shell what the udevd processes are
waiting for.
OTOH, I remember that udevadm settle was supposed to print the uevents
still queued when the timeout expires. Did it stop doing this?
--
ciao,
Marco
* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-04 16:03 UTC
To: linux-hotplug
On Wed, May 4, 2011 at 17:20, Marco d'Itri <md@linux.it> wrote:
> On May 04, Johannes Schauer <j.schauer@email.de> wrote:
>
>> now my question is: how do I debug this? how do I find out
> The best way to debug this kind of problems is to boot with
> init=/bin/bash, start a shell on tty2 or a serial port with openvt or
> getty and then exec /sbin/init.
> Then you can find out from the second shell what the udevd processes are
> waiting for.
>
> OTOH, I remember that udevadm settle was supposed to print the uevents
> still queued when the timeout expires, did it stop doing this?
It should still do this.
Kay
* Re: udev regression from 167 to 168 on notion ink adam
From: Gabor Z. Papp @ 2011-05-04 16:31 UTC
To: linux-hotplug
* "Johannes Schauer" <j.schauer@email.de>:
| I had still an older rootfs of mine lying around with udev 167 which
| was working fine. After I upgraded it to 168 it showed the exact
| same behaviour as explained above.
The same happened to me with 2.6.27.x on x86, but I was too lazy to debug it,
since I don't use it daily.
* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-04 16:36 UTC
To: linux-hotplug
On Wed, May 4, 2011 at 16:57, Johannes Schauer <j.schauer@email.de> wrote:
> I'm running debian unstable on the notion ink adam tablet.
> It was all working fine until debian upgraded udev to 168.
> From then on the following three things happen with new rootfs
> builds:
>
> 1.) upon each boot udev would take 180 seconds to finish and
> allow the boot process to continue after the "waiting for /dev to
> be fully populated" message.
> 2.) when the system is booted doing "ps -e | grep udevd | wc -l"
> yields that I have 24 udevd processes running.
> 3.) when i start "top" I see that there is a udevd process running
> constantly consuming 100% cpu power.
Any chance you could attach strace -p <pid> to that process?
And what kernel version is that?
Maybe some poll() loop is spinning, or something like that.
Kay
* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-04 18:39 UTC
To: linux-hotplug
Hi,
On Wed, May 4, 2011 at 18:36, Kay Sievers <kay.sievers@vrfy.org> wrote:
>On Wed, May 4, 2011 at 16:57, Johannes Schauer <j.schauer@email.de> wrote:
>> 1.) upon each boot udev would take 180 seconds to finish and
>> allow the boot process to continue after the "waiting for /dev to
>> be fully populated" message.
>> 2.) when the system is booted doing "ps -e | grep udevd | wc -l"
>> yields that I have 24 udevd processes running.
>> 3.) when i start "top" I see that there is a udevd process running
>> constantly consuming 100% cpu power.
>
>Any chance to attach: strace -p <pid> to that process?
Ah, that was a good idea, and I did it. Indeed I got loads of "epoll_wait"
messages spamming my terminal. But that doesn't tell me anything yet,
right?
>And what kernel version is that?
2.6.32. Does it matter?
I was also about to try bisecting udev from 167 to 168, but I have to postpone
that, as the current Debian sid perl 5.12 transition doesn't yet allow me
to install the build dependencies of udev. Once that issue is resolved, if I
haven't found another solution to the problem by then, I will definitely
bisect it to find the source of the error.
cheers, josch
* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 8:33 UTC
To: linux-hotplug
Hi,
>On Wed, May 4, 2011 at 16:57, Johannes Schauer <j.schauer@email.de> wrote:
>> I'm running debian unstable on the notion ink adam tablet.
>> It was all working fine until debian upgraded udev to 168.
>> From then on the following three things happen with new rootfs
>> builds:
>>
>> 1.) upon each boot udev would take 180 seconds to finish and
>> allow the boot process to continue after the "waiting for /dev to
>> be fully populated" message.
>> 2.) when the system is booted doing "ps -e | grep udevd | wc -l"
>> yields that I have 24 udevd processes running.
>> 3.) when i start "top" I see that there is a udevd process running
>> constantly consuming 100% cpu power.
I bisected the commits between 167 and 168 and found that this commit is the
culprit that introduces the error I see:
commit ff2c503df091e6e4e9ab48cdb6df6ec8b7b525d0
Author: Kay Sievers <kay.sievers@vrfy.org>
Date: Tue, 12 Apr 2011 23:17:09 +0000
You can find a strace of the whole udev execution here:
http://mister-muffin.de/p/vPMW
Basically, these two lines repeat forever:
[pid 2353] epoll_wait(0xb, 0xbeb4d3b8, 0x8, 0xbb8) = 1
[pid 2353] SYS_366(0x4, 0, 0, 0x80000, 0) = -1 ENOSYS (Function not implemented)
Note that the ENOSYS errors didn't show up when I merely attached strace to
the udev process.
What is this SYS_366 message telling me? Why is it called and why is it missing?
My kernel has CONFIG_EPOLL=y and as you can see in the strace epoll_create1
and epoll_ctl work just fine.
It kind of looks like an error on my side, but shouldn't udev nevertheless check
for whatever it is and quit instead of running wild like that?
Thanks for helping!
cheers, josch
* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-05 9:33 UTC
To: linux-hotplug
On Thu, May 5, 2011 at 10:33, Johannes Schauer <j.schauer@email.de> wrote:
> basically it's these two messages repeating infinitely:
>
> [pid 2353] epoll_wait(0xb, 0xbeb4d3b8, 0x8, 0xbb8) = 1
> [pid 2353] SYS_366(0x4, 0, 0, 0x80000, 0) = -1 ENOSYS (Function not implemented)
That's the raw syscall number of accept4(). It should be available since
kernel 2.6.28 and glibc 2.10.
With proper kernel support and a more recent strace, it would look like:
accept4(3, 0, NULL, SOCK_CLOEXEC) = 11
Not sure how that can happen on your side. The last parameter, the
CLOEXEC flag, is definitely not 0.
> Note, that the ENOSYS errors didnt show up when i just attached strace to the
> udev process.
> What is this SYS_366 message telling me? Why is it called and why is it missing?
> My kernel has CONFIG_EPOLL=y and as you can see in the strace epoll_create1
> and epoll_ctl work just fine.
>
> It kinda looks like an error on my side but shouldn't udev nevertheless check for
> whatever it is and quit instead of running wild like that?
Yeah, we could try to catch these errors, but it seems there is
something that needs to be fixed in your kernel or glibc. Udev
requires at least kernel 2.6.32, and accept4() *should* work there.
Kay
* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-05 9:38 UTC
To: linux-hotplug
On Thu, May 5, 2011 at 11:33, Kay Sievers <kay.sievers@vrfy.org> wrote:
> On Thu, May 5, 2011 at 10:33, Johannes Schauer <j.schauer@email.de> wrote:
>> basically it's these two messages repeating infinitely:
>>
>> [pid 2353] epoll_wait(0xb, 0xbeb4d3b8, 0x8, 0xbb8) = 1
>> [pid 2353] SYS_366(0x4, 0, 0, 0x80000, 0) = -1 ENOSYS (Function not implemented)
>
> That's a syscall number and accept4(). It should be there since kernel
> 2.6.28 and glibc 2.10.
>
> With proper kernel support, and a more recent strace it would look like:
> accept4(3, 0, NULL, SOCK_CLOEXEC) = 11
>
> Not sure how that can happen on your side. The last parameter, the
> CLOEXEC flag is definitely not 0.
Ah, no, I counted wrong and missed that there are 5 arguments. The 4th
argument, 0x80000, is SOCK_CLOEXEC. So it looks like your
kernel does not support accept4. Is that really a 2.6.32 kernel?
Kay
* Re: udev regression from 167 to 168 on notion ink adam
From: Marco d'Itri @ 2011-05-05 9:56 UTC
To: linux-hotplug
On May 05, Kay Sievers <kay.sievers@vrfy.org> wrote:
> Ah, no counted wrong, missed that there are 5 arguments. The 4th
> argument, the 0x80000 is the SOCK_CLOEXEC. So it looks like your
> kernel does not support accept4. Is that really a 2.6.32 kernel?
Probably it is a libc bug; hppa had the same problem:
http://bugs.debian.org/cgi-bin/bugreport.cgi?buga7967
Johannes, can you confirm that the newer libc package works, so I can add
the appropriate conflicts once it is in unstable?
Still, udev should just explode if a syscall fails with ENOSYS.
--
ciao,
Marco
* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-05 10:06 UTC
To: linux-hotplug
On Thu, May 5, 2011 at 11:56, Marco d'Itri <md@linux.it> wrote:
> On May 05, Kay Sievers <kay.sievers@vrfy.org> wrote:
>
>> Ah, no counted wrong, missed that there are 5 arguments. The 4th
>> argument, the 0x80000 is the SOCK_CLOEXEC. So it looks like your
>> kernel does not support accept4. Is that really a 2.6.32 kernel?
> Probably it is a libc bug, hppa had the same problem:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?buga7967
>
> Johannes, can you confirm that the newer libc package works so I can add
> the appropriate conflicts when it will be in unstable?
>
> Still, udev should just explode if a syscall fails with ENOSYS.
Sure, but what should we do? We are in a poll() loop that will never
block if we don't consume whatever is pending on the file descriptor that
wakes us up. We could exit? We can certainly try to print something that is
easier to read than a strace.
But on the other hand we require a certain kernel version and its
symbols to work. There should never be an ENOSYS unless something is
broken somewhere else.
Kay
* Re: udev regression from 167 to 168 on notion ink adam
From: Marco d'Itri @ 2011-05-05 10:18 UTC
To: linux-hotplug
On May 05, Kay Sievers <kay.sievers@vrfy.org> wrote:
> Sure, but what should we do? We are in a poll() loop, that will never
> block if we don't get the stuff out of the file descriptor that wakes
> us up. We could exit? We can certainly try to print something that is
> easier to read than a strace.
Just aborting looks fine to me, at least it will be obvious that
something is wrong.
> But on the other hand we require a certain kernel version and it's
> symbols to work. There should never be a ENOSYS unless something is
> broken somewhere else.
This is why I think crashing is OK.
--
ciao,
Marco
* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 11:32 UTC
To: linux-hotplug
Hi Marco,
>On May 05, Kay Sievers <kay.sievers@vrfy.org> wrote:
>
>> Ah, no counted wrong, missed that there are 5 arguments. The 4th
>> argument, the 0x80000 is the SOCK_CLOEXEC. So it looks like your
>> kernel does not support accept4. Is that really a 2.6.32 kernel?
>Probably it is a libc bug, hppa had the same problem:
>
>http://bugs.debian.org/cgi-bin/bugreport.cgi?buga7967
>
>Johannes, can you confirm that the newer libc package works so I can add
>the appropriate conflicts when it will be in unstable?
I installed libc6_2.13-0exp5 and libc-bin_2.13-0exp5 from the experimental
repository, but the problem is still the same.
Thank you for your help,
josch
* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 12:04 UTC
To: linux-hotplug
Hi,
>But on the other hand we require a certain kernel version and it's
>symbols to work. There should never be a ENOSYS unless something is
>broken somewhere else.
It seems that indeed something is broken somewhere else. When I compile
this small C snippet:
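#define _GNU_SOURCE      /* accept4() is a GNU extension in glibc */
#include <sys/socket.h>
#include <stddef.h>      /* NULL */
#include <stdio.h>       /* perror() */
int main(void) {
    /* fd 0 is usually a terminal, not a socket: a kernel that implements
       accept4() fails with ENOTSOCK, one without it fails with ENOSYS. */
    accept4(0, NULL, NULL, 0);
    perror("accept4");
    return 0;
}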
then instead of
"accept4: Socket operation on non-socket"
I get
"accept4: Function not implemented".
I have no idea why, though, since I'm definitely on 2.6.32 and using
libc6 2.13, which should both be enough for accept4 to be there.
Thank you for your help so far, but it seems this is not a problem with
udev :)
cheers, josch
* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-05 15:13 UTC
To: linux-hotplug
On Thu, May 5, 2011 at 14:04, Johannes Schauer <j.schauer@email.de> wrote:
>>But on the other hand we require a certain kernel version and it's
>>symbols to work. There should never be a ENOSYS unless something is
>>broken somewhere else.
> It seems that indeed something is broken somewhere else. When I compile
> this small C snippet:
>
> #include <sys/socket.h>
> #include <stdlib.h>
> #include <errno.h>
>
> int main() {
> accept4(0, NULL, 0, 0);
> perror("accept4");
> }
>
> then instead of
> "accept4: Socket operation on non-socket"
> I get
> "accept4: Function not implemented".
>
> I'm clueless though why that is since i'm definitely on 2.6.32 and using
> libc6 2.13 which should both be enough for accept4 to be there.
If strace shows the error as the syscall's return value, it looks like it is
the kernel. Here it shows:
accept4(0, 0, NULL, 0) = -1 ENOTSOCK (Socket operation on non-socket)
> thank you for your help so far but it seems this is not a problem with
> udev :)
Yeah, looks weird. Maybe try a more recent kernel first and see whether the issue goes away.
Kay
* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 15:37 UTC
To: linux-hotplug
Hi,
>Yeah, looks weird. Maybe try a more recent kernel first, if the issue goes away.
>Kay
Well, it turns out it is not a libc issue but indeed a kernel issue.
The reason is that accept4 is not available on ARM in 2.6.32.
It was only added for ARM in 2.6.36; see this commit:
21d93e2e29722d7832f61cc56d73fb953ee6578e
I applied this simple patch to my 2.6.32 kernel, rebuilt it, and voilà! Everything
works as expected :)
So since version 168, udev requires more than 2.6.32 on ARM (2.6.36, to be
precise) because of the accept4 call.
Couldn't you just use the old accept() instead, so that udev also works on
non-x86 systems running 2.6.32? Then again, it would probably be simpler to just
raise the kernel requirement for udev from .32 to .36.
I'm very happy this is solved now :)
cheers, josch
* Re: udev regression from 167 to 168 on notion ink adam
From: Marco d'Itri @ 2011-05-05 15:42 UTC
To: linux-hotplug
On May 05, Johannes Schauer <j.schauer@email.de> wrote:
> Couldnt you just use the old accept instead so that udev also works with 2.6.32
> non-x86 systems? But it would probably be simpler to just raise the kernel
> requirements for udev from .32 to .36.
I think it would be a good idea to keep supporting 2.6.32 for a while, since it
is the current stable kernel for all the relevant distributions and it will
help upgrades.
But please open a Debian bug asking for that changeset to be backported
to stable.
--
ciao,
Marco