Linux Hotplug development

* Re: udev regression from 167 to 168 on notion ink adam
From: Marco d'Itri @ 2011-05-05 10:18 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <809580245.144501.1304521074555.JavaMail.fmail@mwmweb069>

On May 05, Kay Sievers <kay.sievers@vrfy.org> wrote:

> Sure, but what should we do? We are in a poll() loop, that will never
> block if we don't get the stuff out of the file descriptor that wakes
> us up. We could exit? We can certainly try to print something that is
> easier to read than a strace.
Just aborting looks fine to me; at least it will be obvious that
something is wrong.

> But on the other hand we require a certain kernel version and its
> symbols to work. There should never be an ENOSYS unless something is
> broken somewhere else.
This is why I think crashing is OK.
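Marco's preference for aborting rather than spinning in the poll() loop can be sketched in C. This is a hypothetical illustration, not udev's code: the helper names, the message text and the split between retryable and fatal errno values are all assumptions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: can an accept4() failure inside a poll() loop be
 * retried, or would retrying just spin because the listening fd stays
 * readable and poll() keeps waking us up? */
static int accept_error_is_fatal(int err)
{
    switch (err) {
    case EINTR:         /* interrupted by a signal: poll again */
    case EAGAIN:        /* spurious wakeup: nothing to accept yet */
    case ECONNABORTED:  /* peer gave up before we accepted */
        return 0;
    default:            /* ENOSYS and anything else unexpected */
        return 1;
    }
}

/* Print something easier to read than an strace before aborting. */
static void report_fatal_accept(int err)
{
    fprintf(stderr, "udevd: accept4() failed: %s; "
            "does this kernel support accept4?\n", strerror(err));
}
```

In the loop, a caller would check `accept_error_is_fatal(errno)` after a failed accept4() and, if it returns 1, call `report_fatal_accept(errno)` and exit instead of polling again.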

-- 
ciao,
Marco

* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 11:32 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <809580245.144501.1304521074555.JavaMail.fmail@mwmweb069>

Hi Marco,

>On May 05, Kay Sievers <kay.sievers@vrfy.org> wrote:
>
>> Ah, no counted wrong, missed that there are 5 arguments. The 4th
>> argument, the 0x80000 is the SOCK_CLOEXEC. So it looks like your
>> kernel does not support accept4. Is that really a 2.6.32 kernel?
>Probably it is a libc bug, hppa had the same problem:
>
>http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=617967
>
>Johannes, can you confirm that the newer libc package works so I can add
>the appropriate conflicts when it will be in unstable?
I installed libc6_2.13-0exp5 and libc-bin_2.13-0exp5 from the experimental
repositories but the problem was still the same.

thank you for your help

josch

* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 12:04 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <809580245.144501.1304521074555.JavaMail.fmail@mwmweb069>

Hi,

>But on the other hand we require a certain kernel version and its
>symbols to work. There should never be an ENOSYS unless something is
>broken somewhere else.
It seems that indeed something is broken somewhere else. When I compile
this small C snippet:

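#define _GNU_SOURCE        /* accept4() is a GNU extension */
#include <stdio.h>         /* perror() */
#include <stdlib.h>
#include <sys/socket.h>

int main(void) {
    /* stdin (fd 0) is not a socket: a working accept4() fails with
     * ENOTSOCK, a missing one with ENOSYS. */
    accept4(0, NULL, NULL, 0);
    perror("accept4");
    return 0;
}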

then instead of
"accept4: Socket operation on non-socket"
I get
"accept4: Function not implemented".

I'm clueless as to why that is, since I'm definitely on 2.6.32 and using
libc6 2.13, which should both be enough for accept4 to be there.
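One way to tell a libc problem from a kernel problem is to bypass the libc wrapper and issue the system call directly, which is also what strace observes. A minimal sketch, assuming glibc's `syscall()` and the `SYS_accept4` number from `<sys/syscall.h>`; the helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if the running kernel implements accept4, 0 if the raw system
 * call fails with ENOSYS, and -1 if the headers provide no SYS_accept4
 * number for this architecture. */
static int kernel_has_accept4(void)
{
#ifdef SYS_accept4
    /* With an invalid descriptor an implemented accept4 fails with EBADF;
     * only a kernel without the system call reports ENOSYS. */
    long r = syscall(SYS_accept4, -1, NULL, NULL, 0);
    if (r == -1 && errno == ENOSYS)
        return 0;
    return 1;
#else
    return -1;
#endif
}
```

On the ARM 2.6.32 kernel described in this thread, the raw call would be expected to report ENOSYS, so the helper would return 0 regardless of which libc is installed.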

thank you for your help so far but it seems this is not a problem with
udev :)

cheers, josch

* Re: udev regression from 167 to 168 on notion ink adam
From: Kay Sievers @ 2011-05-05 15:13 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <809580245.144501.1304521074555.JavaMail.fmail@mwmweb069>

On Thu, May 5, 2011 at 14:04, Johannes Schauer <j.schauer@email.de> wrote:
>>But on the other hand we require a certain kernel version and its
>>symbols to work. There should never be an ENOSYS unless something is
>>broken somewhere else.
> It seems that indeed something is broken somewhere else. When I compile
> this small C snippet:
>
> #include <sys/socket.h>
> #include <stdlib.h>
> #include <errno.h>
>
> int main() {
>     accept4(0, NULL, 0, 0);
>     perror("accept4");
> }
>
> then instead of
> "accept4: Socket operation on non-socket"
> I get
> "accept4: Function not implemented".
>
> I'm clueless as to why that is, since I'm definitely on 2.6.32 and using
> libc6 2.13, which should both be enough for accept4 to be there.

If it shows that return value in strace, it looks like it is the
kernel. Here I get:
  accept4(0, 0, NULL, 0) = -1 ENOTSOCK (Socket operation on non-socket)

> thank you for your help so far but it seems this is not a problem with
> udev :)

Yeah, looks weird. Maybe first try a more recent kernel, to see if the issue goes away.

Kay

* Re: udev regression from 167 to 168 on notion ink adam
From: Johannes Schauer @ 2011-05-05 15:37 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <809580245.144501.1304521074555.JavaMail.fmail@mwmweb069>

Hi,

>Yeah, looks weird. Maybe first try a more recent kernel, to see if the issue goes away.
>Kay
Well, it turned out it is not a libc issue but indeed a kernel issue.
The reason is that accept4 is not available on ARM with 2.6.32.
It was only added in 2.6.36; see this commit:

21d93e2e29722d7832f61cc56d73fb953ee6578e

I applied this simple patch to my 2.6.32 kernel, rebuilt it and voilà! Everything
works as expected :)

So since version 168, udev requires a kernel newer than 2.6.32 (2.6.36, to be
precise) because of the accept4 call.

Couldn't you just use the old accept() instead, so that udev also works on 2.6.32
non-x86 systems? But it would probably be simpler to just raise udev's kernel
requirement from .32 to .36.
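Falling back to the old accept() could look roughly like this. It is a sketch, not the udev implementation; `accept_cloexec` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept a connection with close-on-exec set,
 * falling back to accept() + fcntl() when the kernel lacks accept4. */
static int accept_cloexec(int listen_fd)
{
    int fd = accept4(listen_fd, NULL, NULL, SOCK_CLOEXEC);
    if (fd >= 0 || errno != ENOSYS)
        return fd;

    /* Fallback path: not atomic, so a fork()+exec() in another thread
     * between accept() and fcntl() could leak the descriptor. */
    fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The fallback gives up the atomicity that SOCK_CLOEXEC was introduced for; that trade-off is one argument for simply raising the kernel requirement instead.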

I'm very happy this is solved now :)

cheers, josch

* Re: udev regression from 167 to 168 on notion ink adam
From: Marco d'Itri @ 2011-05-05 15:42 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <809580245.144501.1304521074555.JavaMail.fmail@mwmweb069>

On May 05, Johannes Schauer <j.schauer@email.de> wrote:

> Couldn't you just use the old accept instead so that udev also works with 2.6.32
> non-x86 systems? But it would probably be simpler to just raise the kernel
> requirements for udev from .32 to .36.
I think it would be a good idea to stick to 2.6.32 for a while since it
is the current stable kernel for all relevant distributions and it will
help upgrades.
But please open a Debian bug asking for that changeset to be backported
to stable.

-- 
ciao,
Marco

* [PATCH] systemd: settle should not block basic.target
From: Tom Gundersen @ 2011-05-06 13:38 UTC (permalink / raw)
  To: linux-hotplug

Hi guys,

On machines using lvm, the lvm service must pull in udev-settle.service.
However, most other services don't need to wait for udev to settle. By
removing Before=basic.target from udev-settle.service I can speed up boot on
my machine by about two seconds.

A side-effect of this is that it no longer makes sense to enable udev-settle
unconditionally. However, it would be a bad idea in any case for a service to
assume udev-settle to be enabled, so all services that need to wait for
settle (such as lvm and sysv compat units) should make this explicit anyway.

What do you think?

Cheers,

Tom

From cd8413fa7defa94b7e6fa093626a142bb2788391 Mon Sep 17 00:00:00 2001
From: Tom Gundersen <teg@jklm.no>
Date: Fri, 6 May 2011 15:19:08 +0200
Subject: [PATCH] settle: do not block basic.target

Some services (such as lvm) still need to wait for udev to settle. However,
there is no need to block all other services as well.

This means that it is no longer possible to enable udev-settle.service
unconditionally; instead, every service that needs it must depend on it
explicitly.

Signed-off-by: Tom Gundersen <teg@jklm.no>
---
 init/udev-settle.service.in |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/init/udev-settle.service.in b/init/udev-settle.service.in
index d7d6f78..cb89b4d 100644
--- a/init/udev-settle.service.in
+++ b/init/udev-settle.service.in
@@ -13,13 +13,9 @@ Description=udev Wait for Complete Device Initialization
 DefaultDependencies=no
 Requires=udev.service
 After=udev-trigger.service
-Before=basic.target

 [Service]
 Type=oneshot
 TimeoutSec=180
 RemainAfterExit=yes
 ExecStart=@sbindir@/udevadm settle
-
-[Install]
-WantedBy=basic.target
-- 
1.7.5.1

* Persistent net/cd rules
From: Tom Gundersen @ 2011-05-06 14:48 UTC (permalink / raw)
  To: linux-hotplug

Hi guys,

I'm working on the initscripts for Archlinux and I have a question.

We used to copy /dev/.udev/tmp-rules--70-persistent-net.rules to
/etc/udev/rules.d/ when the latter was remounted rw. However, with the
move to /run/, we needed to update this, and I figured there should be
a better way to do it. I notice that systemd does not deal with this
at all (AFAICT).

My question is therefore: what is the proper mechanism to use to make
sure that net/cd device naming is preserved after reboot? Is there a
way to make udev merge the rule files in /run with the rule files in
/etc itself?

Cheers,

Tom

* Re: Persistent net/cd rules
From: Kay Sievers @ 2011-05-06 15:05 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <BANLkTinbMZ4gGEB8_8w4tJUpLhFOj4d7PQ@mail.gmail.com>

On Fri, May 6, 2011 at 16:48, Tom Gundersen <teg@jklm.no> wrote:
> I'm working on the initscripts for Archlinux and I have a question.
>
> We used to copy /dev/.udev/tmp-rules--70-persistent-net.rules to
> /etc/udev/rules.d/ when the latter was remounted rw. However, with the
> move to /run/, we needed to update this, and I figured there should be
> a better way to do it. I notice that systemd does not deal with this
> at all (AFAICT).

No, we skipped all that.

> My question is therefore: what is the proper mechanism to use to make
> sure that net/cd device naming is preserved after reboot? Is there a
> way to make udev merge the rule files in /run with the rule files in
> /etc itself?

The current plan is to get rid of all that. We don't want to rename
devices in the kernel namespace anymore. There are races with name
swaps which are impossible to fix. If people want persistent names,
they will need to configure the devices to have such names, and these
names will _not_ be the kernel names.

Once we get there and can provide a tool, or the mechanics for a
possible tool, to do that, all the rule_generator stuff will go away.
Hence we did not integrate any of the rule-copy-over-to-the-rw-rootfs
logic in systemd, and we don't plan to.

Same for the cdrom rules: if people need static names here, they will
need to configure them. Allocating a new number in this rules file in
/etc for every USB 3G modem with a fake CD-ROM is really not something we
want to continue either. So that will go away in the longer run too, and
will require explicit configuration through integration with some
system management tool whose design is yet to be worked out.

Kay

* Re: [PATCH] systemd: settle should not block basic.target
From: Kay Sievers @ 2011-05-06 15:11 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <BANLkTincvcmGMbxXoqtd=dw1vJ9nXCPdfA@mail.gmail.com>

On Fri, May 6, 2011 at 15:38, Tom Gundersen <teg@jklm.no> wrote:
> On machines using lvm, the lvm service must pull in
> udev-settle.service. However,
> most other services don't need to wait for udev to settle. By removing
> Before=basic.target
> from udev-settle.service I can speed up boot on my machine by about two seconds.

Yeah, the current idea is that _if_ something pulls it in, we want to
make sure it's done in basic.target. If the service is not pulled-in,
it will not be started at all, regardless of the basic.target sorting
or not.

It might not be needed, but we thought it would be safer to let it
block basic.target. If that thing runs on your system, you are in
legacy mode anyway. :)

> A side-effect of this is that it no longer makes sense to enable
> udev-settle unconditionally.

We don't do this. The section in the service file only allows:
  systemctl enable foo
to work, it does not enable anything by default. This section is just
nice to have, in case someone needs to enable it unconditionally, but
nothing should ever do that without need.

> However, it would anyway be a bad idea for a service to assume
> udev-settle to be enabled,

Right.

> so all services that needs to wait for settle (such as lvm and sysv
> compat units) should
> anyway make this explicit.

Yeah, they should pull it in; then it will be started. If they don't,
systemd will not even see the file.

Kay

* libudev queue finished seqnums
From: Sebastian Wiesner @ 2011-05-08 18:32 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

Hello,

I've got a question about libudev, concerning the function
"udev_queue_get_seqnum_is_finished()".  According to the
documentation, it returns a flag, indicating whether the given
sequence number has already been processed.  In my experiments
however, I found this function to always return 1, even for sequence
numbers which haven't occurred yet.  The attached test program gives
the following output on my system:

$ ./udev_queue_sequence_numbers
current seqnum: 1662
is previous seqnum finished? yes
is current seqnum finished? yes
is next seqnum finished? yes

In my understanding, the last line should have said "no", because the
number immediately following the current sequence number has
obviously not yet occurred.

Am I missing something?

Thanks for your help,
Sebastian Wiesner

[-- Attachment #2: udev_queue_sequence_numbers.c --]
[-- Type: text/x-csrc, Size: 900 bytes --]

#include <libudev.h>
#include <stdio.h>
#include <assert.h>

int main(int argc, char *argv[]) {
  struct udev *context = udev_new();
  assert(context);
  struct udev_queue *queue = udev_queue_new(context);
  assert(queue);
  unsigned long long int current_seqnum = udev_queue_get_udev_seqnum(queue);
  int is_current_finished = udev_queue_get_seqnum_is_finished(
      queue, current_seqnum);
  int is_previous_finished = udev_queue_get_seqnum_is_finished(
      queue, current_seqnum - 1l);
  int is_next_finished = udev_queue_get_seqnum_is_finished(
      queue, current_seqnum + 1l);
  printf("current seqnum: %llu\n", current_seqnum);
  printf("is previous seqnum finished? %s\n", is_previous_finished?
          "yes": "no");
  printf("is current seqnum finished? %s\n", is_current_finished? "yes": "no");
  printf("is next seqnum finished? %s\n", is_next_finished? "yes": "no");
  return 0;
}

* Re: libudev queue finished seqnums
From: Kay Sievers @ 2011-05-09  9:54 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <BANLkTi=H1sc2tnOYNTY3n8UyJt21mcLxzw@mail.gmail.com>

On Sun, May 8, 2011 at 20:32, Sebastian Wiesner
<lunaryorn@googlemail.com> wrote:
> I've got a question about libudev, concerning the function
> "udev_queue_get_seqnum_is_finished()".  According to the
> documentation, it returns a flag, indicating whether the given
> sequence number has already been processed.  In my experiments
> however, I found this function to always return 1, even for sequence
> numbers which haven't occurred yet.  The attached test program gives
> the following output on my system:
>
> $ ./udev_queue_sequence_numbers
> current seqnum: 1662
> is previous seqnum finished? yes
> is current seqnum finished? yes
> is next seqnum finished? yes
>
> In my understanding, the last line should have said "no", because the
> number immediately following the current sequence number has
> obviously not yet occurred.
>
> Am I missing something?

I guess you need to read the function name as
_seqnum_is_not_active(). It will return true for all event numbers
currently not queued, including numbers larger than the currently
handled one.
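Kay's reading can be modelled in a few lines. This is a toy model, not libudev code: the array stands in for udevd's queue, and the stricter "processed" check assumes the caller also knows the kernel's current seqnum (libudev of this era offered udev_queue_get_kernel_seqnum() for that; check your version's documentation). The function names are hypothetical.

```c
#include <stddef.h>

/* Toy model of the semantics Kay describes: "finished" means "not
 * currently queued", so any seqnum that is not in the queue, including
 * future ones, counts as finished. */
static int model_seqnum_is_finished(const unsigned long long *queued,
                                    size_t n, unsigned long long seqnum)
{
    for (size_t i = 0; i < n; i++)
        if (queued[i] == seqnum)
            return 0;   /* still queued or being handled */
    return 1;           /* not active, whether past or future */
}

/* A stricter "has been processed" check additionally requires that the
 * kernel has already emitted the seqnum. */
static int model_seqnum_is_processed(const unsigned long long *queued,
                                     size_t n, unsigned long long seqnum,
                                     unsigned long long kernel_seqnum)
{
    return seqnum <= kernel_seqnum &&
           model_seqnum_is_finished(queued, n, seqnum);
}
```

With an empty queue, the model answers "finished" for 1663 just as Sebastian's program did, while the stricter check returns 0 for any seqnum the kernel has not emitted yet.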

Kay

* Re: [PATCH] Set PCIE maxpayload for card during hotplug insertion
From: Jesse Barnes @ 2011-05-09 19:18 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <1301005267-31006-1-git-send-email-jordan_hargrave@dell.com>

On Wed, 4 May 2011 13:42:44 -0500
<Jordan_Hargrave@Dell.com> wrote:

> I have tested and verified that the patch works.
> Tested-by: jordan_hargrave@dell.com
> 

Great, thanks.  Can you re-send against my linux-next branch?  The last
patch seems to have been corrupted.

Kenji-san, can I add your reviewed-by?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

* r8712u driver
From: Serge Myronyuk @ 2011-05-10  4:47 UTC (permalink / raw)
  To: linux-hotplug

Dear sirs,

I upgraded my Ubuntu from 10.10 to 11.04.
Since then I have had an issue with my wireless connection:
the network connection is up for 10 minutes, then down for 10 minutes,
then up again for 10 minutes, down for 10, and so on.

I am using the r8712u driver.

Could you please help me and/or recommend a driver that works well?

Thank you very much in advance!

Best regards,
Serge Myronyuk

* Re: libudev queue finished seqnums
From: Sebastian Wiesner @ 2011-05-10 10:59 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <BANLkTi=H1sc2tnOYNTY3n8UyJt21mcLxzw@mail.gmail.com>

2011/5/9 Kay Sievers <kay.sievers@vrfy.org>:
> On Sun, May 8, 2011 at 20:32, Sebastian Wiesner
> <lunaryorn@googlemail.com> wrote:
>> I've got a question about libudev, concerning the function
>> "udev_queue_get_seqnum_is_finished()".  According to the
>> documentation, it returns a flag, indicating whether the given
>> sequence number has already been processed.  In my experiments
>> however, I found this function to always return 1, even for sequence
>> numbers which haven't ocurred yet.  The attached test program gives
>> the following output on my system:
>>
>> $ ./udev_queue_sequence_numbers
>> current seqnum: 1662
>> is previous seqnum finished? yes
>> is current seqnum finished? yes
>> is next seqnum finished? yes
>>
>> In my understanding, the last line should have said "no", because the
>> number immediately following the current sequence number has
>> obviously not yet occurred.
>>
>> Am I missing something?
>
> I guess you need to read the function name as
> _seqnum_is_not_active(). It will return true for all event numbers
> currently not queued, including numbers larger than the currently
> handled one.

Ok, that'd make sense.

Thanks for your help

* Re: [PATCH] Set PCIE maxpayload for card during hotplug insertion
From: Kenji Kaneshige @ 2011-05-10 22:13 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <1301005267-31006-1-git-send-email-jordan_hargrave@dell.com>

(2011/05/10 4:18), Jesse Barnes wrote:
> On Wed, 4 May 2011 13:42:44 -0500
> <Jordan_Hargrave@Dell.com>  wrote:
> 
>> I have tested and verified that the patch works.
>> Tested-by: jordan_hargrave@dell.com
>>
> 
> Great, thanks.  Can you re-send against my linux-next branch?  The last
> patch seems to have been corrupted.
> 
> Kenji-san, can I add your reviewed-by?

Sure.

Thanks
Kenji Kaneshige


* future of sysctls?
From: Ludwig Nussel @ 2011-05-12 15:41 UTC (permalink / raw)
  To: linux-hotplug

Hi,

I'm currently struggling to find a sane way to set
net.ipv6.conf.default.use_tempaddr.
Traditionally at some point during boot "sysctl -e -q -p /etc/sysctl.conf" is
called. That doesn't really work out anymore. The aforementioned setting needs
to be applied after the ipv6 module is loaded (could be compiled into the
kernel too though) otherwise it wouldn't apply. It needs to be set before a
network driver is loaded though as the default value is copied to
interface-specific settings at interface creation time. On top of
that there are also network interface specific sysctls that need to
be applied after an interface is created (e.g.
net.ipv6.conf.eth0.use_tempaddr).
Are there any plans to better deal with that?
Like e.g. emitting events when some part of the kernel registers a sysctl so
userspace can override the compiled in default value?
Or just offer sysfs attributes instead of sysctls?

cu
Ludwig

-- 
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.de/
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg) 

* Re: future of sysctls?
From: Lennart Poettering @ 2011-05-15 15:47 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <201105121741.27459.ludwig.nussel@suse.de>

On Thu, 12.05.11 17:41, Ludwig Nussel (ludwig.nussel@suse.de) wrote:

> Hi,
> 
> I'm currently struggling to find a sane way to set
> net.ipv6.conf.default.use_tempaddr.
> Traditionally at some point during boot "sysctl -e -q -p /etc/sysctl.conf" is
> called. That doesn't really work out anymore. The aforementioned setting needs
> to be applied after the ipv6 module is loaded (could be compiled into the
> kernel too though) otherwise it wouldn't apply. It needs to be set before a
> network driver is loaded though as the default value is copied to
> interface-specific settings at interface creation time. On top of
> that there are also network interface specific sysctls that need to
> be applied after an interface is created (e.g.
> net.ipv6.conf.eth0.use_tempaddr).

Something like this is kinda broken anyway, since it is racy: you can
apply the sysctl only after the interface is already available.

Might be a good idea to just ignore these kinds of settings. Or if this
is not possible, then set them from NM or whatever controls the network.
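If whatever controls the network applies per-interface settings, it would write them once the interface exists, for example on its "add" event. A minimal sketch; the helper names are hypothetical, and writing under /proc/sys requires root.

```c
#include <stdio.h>
#include <string.h>

/* Build the /proc/sys path of a per-interface IPv6 setting, e.g.
 * ("eth0", "use_tempaddr") -> "/proc/sys/net/ipv6/conf/eth0/use_tempaddr".
 * Returns 0 on success, -1 for an unusable name or a too-small buffer. */
static int ipv6_conf_path(char *buf, size_t size,
                          const char *ifname, const char *key)
{
    if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
        return -1;
    int n = snprintf(buf, size, "/proc/sys/net/ipv6/conf/%s/%s", ifname, key);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write the value; fails if the interface (or IPv6) is not there yet. */
static int set_ipv6_conf(const char *ifname, const char *key, const char *value)
{
    char path[256];
    if (ipv6_conf_path(path, sizeof path, ifname, key) < 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Using the /proc/sys path rather than the dotted sysctl name avoids ambiguity for interface names that themselves contain dots, such as VLAN devices like eth0.100.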

> Are there any plans to better deal with that?
> Like e.g. emitting events when some part of the kernel registers a sysctl so
> userspace can override the compiled in default value?
> Or just offer sysfs attributes instead of sysctls?

In a systemd world the ipv6 module is loaded very early and hence the
sysctl should always be available, no special setup needed. If the same
problem appears in real life with other modules too, then we could order
sysctl setting after module loading and fix things by this.

Can't tell you though what to do in a non-systemd world however.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

* udevadm settle persistently failing
From: Nix @ 2011-05-15 18:19 UTC (permalink / raw)
  To: linux-hotplug

I know that you're not supposed to rely on 'udevadm settle' anymore, but
I rely on it across the board for systems with root filesystems that
aren't expected to move around (i.e. all of them), because massively
reengineering working systems' boot processes is generally considered a
bad thing. And it's stopped working. Given how many things expect /dev
to be populated, this has fairly serious effects.

I can be certain that as of somewhere between udev 164 and 167, 'udevadm
settle' has stopped waiting for block devices to appear (though I
suspect others have vanished too). I'm booting udev as recommended in
the release notes, via

  udevd --daemon
  udevadm trigger --action=add --type=subsystems
  udevadm trigger --action=add --type=devices
  udevadm settle

but this is what I now see:

,----
| Creating initial device nodes...
| [    2.035253] <30>udevd[297]: starting version 168
| udevd[297]: specified group 'audio' unknown
| 
| [    2.151279] <30>udevd[297]: converting old udev database
| udevd[316]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv pci:v00001022d00002080sv00001022sd00002080bc06sc00i00': No such file or directory
| 
| umount: /run: device is busy.
|         (In some cases useful info about processes that use
|          the device is found by lsof(8) or fuser(1))
| Cannot find device "gordianet"
| fsck from util-linux 2.19
| udevd[334]: failed to exec[    2.457619] EXT2-fs (sda1): warning: mounting unchecked fs, running e2fsck is recommended
| ute '/sbin/modprobe' '/sbin/modprobe -bv platform:cs5535-gpio': No such file or directory
| 
| udevd[333]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:cs5535-acpi': No such file or directory
| 
| udevd[335]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:cs5535-pms': No such file or directory
| 
| udevd[336]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv sg': No such file or directory
| 
| udevd[339]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:rtc_cmos': No such file or directory
| 
| fsck.ext2: No such file or directory while trying to open /dev/sda1
| Possibly non-existent device?
`----

I have no clue why udev is trying to run modprobe when this is a
non-modular kernel with all necessary devices built in. But that's not
important.

By the time of that 'umount', 'udevadm settle' has returned. Shortly
afterwards you see fsck claiming that devices don't exist. Look at
the filesystem after the resulting failed half-boot and you see:

fold:~# ls -l /dev/sda1
brw-rw---- 1 root disk 8, 1 May 15 18:56 /dev/sda1

it's there. udevadm settle just didn't bother to wait for it to appear.
(This is not a device on a bus for which enumeration takes some time:
it's an SD card on an IDE-lookalike cs5535 bus. I've also seen this
on LVM-atop-RAID-atop-SATA and on ordinary LVM-atop-SATA, so it doesn't
require anything particularly unusual.)

Other things seem to have failed too. I have renaming rules:

SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a0", NAME="gordianet"
SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a1", NAME="adsl"
SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a3", NAME="wireless"

yet the devices were not renamed:

2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:00:24:cb:c6:a0 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:00:24:cb:c6:a1 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:00:24:cb:c6:a3 brd ff:ff:ff:ff:ff:ff

Put a 'sleep 1' after the udev call, and everything works.
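Until the settle problem is understood, a bounded wait on the specific node the next step needs is a less arbitrary stop-gap than `sleep 1`. A hypothetical sketch, not part of udev; it computes its deadline from CLOCK_MONOTONIC rather than using alarm(). Note that a node existing does not guarantee udev has finished processing it (permissions, symlinks).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/* Wait up to timeout_ms milliseconds for path to appear, checking every
 * 10 ms. Returns 0 if the path exists, -1 on timeout. */
static int wait_for_path(const char *path, long timeout_ms)
{
    struct timespec now, deadline, step = { 0, 10 * 1000 * 1000 };
    struct stat st;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        if (stat(path, &st) == 0)
            return 0;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec > deadline.tv_sec ||
            (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
            return -1;
        nanosleep(&step, NULL);
    }
}
```

A boot script would call this with, say, "/dev/sda1" right before fsck, and fail loudly on timeout instead of continuing with a missing device.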

Anyone know what's going on? I haven't bisected yet (the problem is
intermittent), but I strongly suspect that

commit ead7c62ab7641e150c6d668f939c102a6771ce60
Author: Kay Sievers <kay.sievers@vrfy.org>
Date:   Wed Apr 20 02:18:22 2011 +0200

    udevadm: settle - kill alarm()

commit 2181d30a342dd9fb168b7077ae5095849e328689
Author: Kay Sievers <kay.sievers@vrfy.org>
Date:   Wed Apr 20 01:53:03 2011 +0200

    timeout handling without alarm()

has broken udevadm settle and caused it to not wait at all. I'll
check that next time I reboot (with my heart in my mouth).

-- 
NULL && (void)

* Re: udevadm settle persistently failing
From: Tom Gundersen @ 2011-05-15 18:32 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <8739kfzv1j.fsf@spindle.srvr.nix>

On Sun, May 15, 2011 at 8:19 PM, Nix <nix@esperi.org.uk> wrote:
> I know that you're not supposed to rely on 'udevadm settle' anymore, but
> I rely on it across the board for systems with root filesystems that
> aren't expected to move around (i.e. all of them), because massively
> reengineering working systems' boot processes is generally considered a
> bad thing. And it's stopped working. Given how many things expect /dev
> to be populated, this has fairly serious effects.
>
> I can be certain that as of somewhere between udev 164 and 167, 'udevadm
> settle' has stopped waiting for block devices to appear (though I
> suspect others have vanished too). I'm booting udev as recommended in
> the release notes, via
>
>   udevd --daemon
>   udevadm trigger --action=add --type=subsystems
>   udevadm trigger --action=add --type=devices
>   udevadm settle

We are doing the same on Arch and today I started seeing bug reports
(after the upgrade to udev 168). So here are my two cents:

Most of the time the problem seems to be related to LVM, but I have
also seen regular block devices having problems. As would be expected
using devtmpfs greatly reduces (if not eliminates) the problem. My
guess was (like Nix said) that "udevadm settle" is somehow broken.

Some related bug reports:

Arch: <https://bugs.archlinux.org/task/24288>,
Debian: <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=624010>.

Cheers,

Tom

* Re: udevadm settle persistently failing
From: Kay Sievers @ 2011-05-15 23:25 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <8739kfzv1j.fsf@spindle.srvr.nix>

On Sun, May 15, 2011 at 20:19, Nix <nix@esperi.org.uk> wrote:
> I know that you're not supposed to rely on 'udevadm settle' anymore, but
> I rely on it across the board for systems with root filesystems that
> aren't expected to move around (i.e. all of them), because massively
> reengineering working systems' boot processes is generally considered a
> bad thing. And it's stopped working. Given how many things expect /dev
> to be populated, this has fairly serious effects.
>
> I can be certain that as of somewhere between udev 164 and 167, 'udevadm
> settle' has stopped waiting for block devices to appear (though I
> suspect others have vanished too). I'm booting udev as recommended in
> the release notes, via
>
>   udevd --daemon
>   udevadm trigger --action=add --type=subsystems
>   udevadm trigger --action=add --type=devices
>   udevadm settle
>
> but this is what I now see:
>
> ,----
> | Creating initial device nodes...
> | [    2.035253] <30>udevd[297]: starting version 168
> | udevd[297]: specified group 'audio' unknown
> |
> | [    2.151279] <30>udevd[297]: converting old udev database
> | udevd[316]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv pci:v00001022d00002080sv00001022sd00002080bc06sc00i00': No such file or directory
> |
> | umount: /run: device is busy.

What's 'umount /run' during bootup? That sounds really strange.

> |         (In some cases useful info about processes that use
> |          the device is found by lsof(8) or fuser(1))
> | Cannot find device "gordianet"
> | fsck from util-linux 2.19
> | udevd[334]: failed to exec[    2.457619] EXT2-fs (sda1): warning: mounting unchecked fs, running e2fsck is recommended
> | ute '/sbin/modprobe' '/sbin/modprobe -bv platform:cs5535-gpio': No such file or directory
> |
> | udevd[333]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:cs5535-acpi': No such file or directory
> |
> | udevd[335]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:cs5535-pms': No such file or directory
> |
> | udevd[336]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv sg': No such file or directory
> |
> | udevd[339]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:rtc_cmos': No such file or directory
> |
> | fsck.ext2: No such file or directory while trying to open /dev/sda1
> | Possibly non-existent device?
> `----
>
> I have no clue why udev is trying to run modprobe when this is a
> non-modular kernel with all necessary devices built in. But that's not
> important.

Udev has no idea what the kernel supports, it calls modprobe for all
devices with a 'modalias' but no attached driver.
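The condition Kay describes can be expressed as a small check on a sysfs device directory. A sketch under assumptions: the helper name is hypothetical and this is not udev's code; in real sysfs "driver" is a symlink, but the check only tests whether it exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* A device is a modprobe candidate if its sysfs directory has a
 * "modalias" attribute but no "driver" entry, i.e. nothing is bound to
 * it yet. Nothing here can tell that the matching code is built in. */
static int wants_modprobe(const char *syspath)
{
    char path[4096];
    struct stat st;

    snprintf(path, sizeof path, "%s/modalias", syspath);
    if (stat(path, &st) != 0)
        return 0;               /* no modalias: nothing to look up */
    snprintf(path, sizeof path, "%s/driver", syspath);
    if (lstat(path, &st) == 0)
        return 0;               /* a driver is already bound */
    return 1;
}
```

On a non-modular kernel, a device whose driver simply has not bound yet still satisfies this test, which would explain the failed modprobe calls in the log above.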

> By the time of that 'umount', 'udevadm settle' has returned. Shortly
> afterwards you see fsck claiming that devices don't exist. Look at
> the filesystem after the resulting failed half-boot and you see:
>
> fold:~# ls -l /dev/sda1
> brw-rw---- 1 root disk 8, 1 May 15 18:56 /dev/sda1
>
> it's there. udevadm settle just didn't bother to wait for it to appear.
> (This is not a device on a bus for which enumeration takes some time:
> it's an SD card on an IDE-lookalike cs5535 bus. I've also seen this
> on LVM-atop-RAID-atop-SATA and on ordinary LVM-atop-SATA, so it doesn't
> require anything particularly unusual.)

Sounds weird. Settle should not return at that point. You are not
altering the content of /run/udev/ or /dev/.udev/ in any way right?
And you provide the /run tmpfs before you start udevd and don't touch
it again, right?

> Other things seem to have failed too. I have renaming rules:
>
> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a0", NAME="gordianet"
> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a1", NAME="adsl"
> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a3", NAME="wireless"
>
> yet the devices were not renamed:

Hmm, that should be unrelated to the possible settle problem

> 2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 00:00:24:cb:c6:a0 brd ff:ff:ff:ff:ff:ff
> 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 00:00:24:cb:c6:a1 brd ff:ff:ff:ff:ff:ff
> 5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>     link/ether 00:00:24:cb:c6:a3 brd ff:ff:ff:ff:ff:ff
>
> Put a 'sleep 1' after the udev call, and everything works.

Which call? The trigger?

Kay

* Re: udevadm settle persistently failing
From: Kay Sievers @ 2011-05-15 23:33 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <8739kfzv1j.fsf@spindle.srvr.nix>

On Sun, May 15, 2011 at 20:32, Tom Gundersen <teg@jklm.no> wrote:
> On Sun, May 15, 2011 at 8:19 PM, Nix <nix@esperi.org.uk> wrote:
>> I know that you're not supposed to rely on 'udevadm settle' anymore, but
>> I rely on it across the board for systems with root filesystems that
>> aren't expected to move around (i.e. all of them), because massively
>> reengineering working systems' boot processes is generally considered a
>> bad thing. And it's stopped working. Given how many things expect /dev
>> to be populated, this has fairly serious effects.
>>
>> I can be certain that as of somewhere between udev 164 and 167, 'udevadm
>> settle' has stopped waiting for block devices to appear (though I
>> suspect others have vanished too). I'm booting udev as recommended in
>> the release notes, via
>>
>>   udevd --daemon
>>   udevadm trigger --action=add --type=subsystems
>>   udevadm trigger --action=add --type=devices
>>   udevadm settle
>
> We are doing the same on Arch and today I started seeing bug reports
> (after the upgrade to udev 168). So here are my two cents:
>
> Most of the time the problem seems to be related to LVM, but I have
> also seen regular block devices having problems. As would be expected
> using devtmpfs greatly reduces (if not eliminates) the problem. My
> guess was (like Nix said) that "udevadm settle" is somehow broken.

Are we sure it's not related to /run? We keep the state there and need
a tmpfs there before udevd is started, and it must not be touched, or
cleaned by some other stuff, that thinks /var/run (bind mount or
symlink) needs to be cleaned during boot.
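As a rough illustration of the precondition above (a tmpfs on /run before udevd starts), a boot script could refuse to start udevd unless /proc/mounts shows /run as tmpfs. This is a minimal sketch: `run_is_tmpfs` is a hypothetical helper, and its optional mounts-file argument exists only so the check can be tried against a sample file. It looks at the last /run entry because a later mount on the same directory hides the earlier one.

```sh
# Hypothetical helper: succeed only if DIR (default /run) is currently a
# tmpfs according to MOUNTS (default /proc/mounts).
# /proc/mounts fields: device mountpoint fstype options dump pass
run_is_tmpfs() {
    dir=${1:-/run}
    mounts=${2:-/proc/mounts}
    # Remember the fstype of the last entry for DIR: that is the visible mount.
    awk -v d="$dir" '$2 == d { fs = $3 }
        END { if (fs == "tmpfs") exit 0; exit 1 }' "$mounts"
}
```

Usage: `run_is_tmpfs || { echo "/run is not a tmpfs, not starting udevd" >&2; exit 1; }`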

Devtmpfs solves a lot of settle races, yeah. We should run fine on
tmpfs, but devtmpfs is the only config that is really tested these days, so
it might be a problem nobody else is really seeing.

The current settle wakes up at exactly the moment the last event is
gone, instead of sleep()ing in a loop. It might return a bit earlier
than it used to, but not before stuff has settled.

Does this work for you?
  time (udevadm trigger; udevadm settle)

It should not return immediately. You can watch the events with:
  udevadm monitor
at the same time and check if it really only returns after the last event.
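The check suggested above could be scripted roughly as follows, so the timing and the event log end up in one place. This is a sketch under assumptions: `UDEVADM` is a hypothetical override used only so the function can be exercised with a stub, the default log path is arbitrary, and `date +%s` is assumed to be available (GNU coreutils and busybox provide it). It also assumes `udevadm monitor` exits cleanly on SIGTERM so its buffered output reaches the file; if the log comes out empty, run the monitor in a second terminal instead.

```sh
# Hypothetical override so the function can be exercised with a stub.
UDEVADM=${UDEVADM:-udevadm}

# Log events with "udevadm monitor" while timing "trigger; settle".
# Prints the elapsed time in whole seconds; monitor output goes to $1.
time_trigger_settle() {
    monitor_log=${1:-/tmp/udev-monitor.log}
    "$UDEVADM" monitor > "$monitor_log" 2>&1 &
    monitor_pid=$!
    # Give the monitor a moment to subscribe before events start flowing.
    sleep 1
    start=$(date +%s)
    "$UDEVADM" trigger
    "$UDEVADM" settle
    end=$(date +%s)
    # Stop the monitor; ignore errors if it has already exited.
    kill "$monitor_pid" 2>/dev/null || true
    wait "$monitor_pid" 2>/dev/null || true
    echo $((end - start))
}
```

An elapsed time of 0 on a machine with many devices, together with events in the log that arrive after settle returned, would point at settle returning early.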

Kay

* Re: udevadm settle persistently failing
From: Nix @ 2011-05-15 23:38 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <8739kfzv1j.fsf@spindle.srvr.nix>

On 16 May 2011, Kay Sievers stated:

> On Sun, May 15, 2011 at 20:19, Nix <nix@esperi.org.uk> wrote:
>> ,----
>> | Creating initial device nodes...
>> | [    2.035253] <30>udevd[297]: starting version 168
>> | udevd[297]: specified group 'audio' unknown
>> |
>> | [    2.151279] <30>udevd[297]: converting old udev database
>> | udevd[316]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv pci:v00001022d00002080sv00001022sd00002080bc06sc00i00': No such file or directory
>> |
>> | umount: /run: device is busy.
>
> What's 'umount /run' during bootup? That sounds really strange.

Hm, I didn't notice that. Yes, that does seem very odd indeed.

I'll have a look soon, if probably not on that system (it's a headless
system that does my firewalling so quite hard to reboot). I see the
symptoms on other systems too.

>> | udevd[339]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:rtc_cmos': No such file or directory
>> |
>> | fsck.ext2: No such file or directory while trying to open /dev/sda1
>> | Possibly non-existent device?
>> `----
>>
>> I have no clue why udev is trying to run modprobe when this is a
>> non-modular kernel with all necessary devices built in. But that's not
>> important.
>
> Udev has no idea what the kernel supports, it calls modprobe for all
> devices with a 'modalias' but no attached driver.

Yeah, it's not terribly important (though why they even have a modalias
if this kernel does not have modules built in is also unclear. Anyway,
it can do no harm, since modprobe doesn't even exist on that system.)

>> fold:~# ls -l /dev/sda1
>> brw-rw---- 1 root disk 8, 1 May 15 18:56 /dev/sda1
>>
>> it's there. udevadm settle just didn't bother to wait for it to appear.
>> (This is not a device on a bus for which enumeration takes some time:
>> it's an SD card on an IDE-lookalike cs5535 bus. I've also seen this
>> on LVM-atop-RAID-atop-SATA and on ordinary LVM-atop-SATA, so it doesn't
>> require anything particularly unusual.)
>
> Sounds weird. Settle should not return at that point.

Agreed! (This is why I suspect the timeout stuff is simply timing out
immediately.)

>                                                       You are not
> altering the content of /run/udev/ or /dev/.udev/ in any way right?

Gods, no. Recipe for disaster.

> And you provide the /run tmpfs before you start udevd and don't touch
> it again, right?

ooo! possible bug. Here's my udev startup script:

mount -n /proc
mount -n /sys
mount -n /run
[...]
mount_tmpfs
mkdir -p $udev_root/.udev/db $udev_root/.udev/queue $udev_root/.udev/failed
echo "Creating initial device nodes..."
udevd --daemon
udevadm trigger --action=add --type=subsystems
udevadm trigger --action=add --type=devices
udevadm settle
sleep 1

*That* is going to cause udev to try to convert from the old to the new
udev database format every single time it starts, even though there *is*
no old udev database there, just skeletal directories. I wonder if
that's causing it?

(I've ripped that mkdir out. Let's see if that fixes things. If it does,
this suggests that udev needs further bulletproofing, because my udev
startup script was derived directly from one provided by earlier
versions of udev: a *lot* of people will have scripts like it.)
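For comparison, the udev part of such a script with the mkdir dropped might look like the sketch below. It assumes /proc, /sys and a tmpfs on /run are already mounted as in the script above; `start_udev`, `UDEVD` and `UDEVADM` are hypothetical names added only so the sequence can be tested with stubs. It deliberately creates nothing under `$udev_root/.udev`, since pre-created old-style directories make udevd think there is an old database to convert.

```sh
# Start udevd and populate /dev without touching udevd's private state.
# Precondition (not checked here): /proc, /sys and a tmpfs on /run are mounted.
start_udev() {
    udevd_cmd=${UDEVD:-udevd}
    udevadm_cmd=${UDEVADM:-udevadm}
    # No "mkdir -p $udev_root/.udev/..." here: old-style directories make
    # udevd believe there is a legacy database to convert on every boot.
    "$udevd_cmd" --daemon || return 1
    "$udevadm_cmd" trigger --action=add --type=subsystems
    "$udevadm_cmd" trigger --action=add --type=devices
    "$udevadm_cmd" settle
}
```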

>> Other things seem to have failed too. I have renaming rules:
>>
>> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a0", NAME="gordianet"
>> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a1", NAME="adsl"
>> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a3", NAME="wireless"
>>
>> yet the devices were not renamed:
>
> Hmm, that should be unrelated to the possible settle problem

Yeah. The rename-failure only seems to happen when the settle failure
happens, so perhaps it's related to parts of the startup script messing
with the interfaces and causing the kernel to ban renames because
they're in use. Obviously this doesn't happen if we're still sitting in
'udevadm settle' when this takes place.

>> 2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>     link/ether 00:00:24:cb:c6:a0 brd ff:ff:ff:ff:ff:ff
>> 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>     link/ether 00:00:24:cb:c6:a1 brd ff:ff:ff:ff:ff:ff
>> 5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>     link/ether 00:00:24:cb:c6:a3 brd ff:ff:ff:ff:ff:ff
>>
>> Put a 'sleep 1' after the udev call, and everything works.
>
> Which call? The trigger?

See above: it was right after the settle call.
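Until the cause is found, a bounded wait for the specific node the next boot step needs is a narrower workaround than an unconditional 'sleep 1'. A minimal sketch: `wait_for_node` is a hypothetical helper, and it only checks that the path exists, not that udev has finished processing the device.

```sh
# Hypothetical helper: wait up to TIMEOUT seconds (default 10) for PATH to
# exist. Returns 0 as soon as it appears, 1 if the timeout expires.
wait_for_node() {
    node=$1
    timeout=${2:-10}
    while [ ! -e "$node" ]; do
        if [ "$timeout" -le 0 ]; then
            return 1
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

Usage: `udevadm settle; wait_for_node /dev/sda1 5 || echo "/dev/sda1 still missing" >&2`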

Anyway, more tomorrow sometime after more testing. (It's past midnight
here.)

-- 
NULL && (void)

* Re: udevadm settle persistently failing
From: Nix @ 2011-05-15 23:47 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <8739kfzv1j.fsf@spindle.srvr.nix>

On 16 May 2011, Kay Sievers stated:
> We are sure it's not related to /run? We keep the state there and need
> a tmpfs there before udevd is started, and it must not be touched, or
> cleaned by some other stuff, that thinks /var/run (bind mount or
> symlink) needs to be cleaned during boot.

Just to confirm, I'm not cleaning out /run, or touching it at all except
to create /run/lock later in boot.

> Devtmpfs solves a lot of settle races, yeah. We should run fine on
> tmpfs, but it's the only config that is really tested these days, so
> it might be a problem nobody else is really seeing.

I could turn on devtmpfs, I suppose, except that I have an early
userspace and I'm not sure how I'd need to change it: devtmpfs gets
mounted on the rootfs, doesn't it? This problem isn't happening on the
rootfs, it's on the root filesystem that is overmounted over that. (I'm
using busybox mdev to populate the rootfs /dev because it works for such
a simple case, and never goes wrong because it's too simple to go wrong.
However it's also too simple to be much use for a running system.)

> The current settle wakes up in exactly the moment the last event is
> gone, instead of it sleep()ing in a loop. It might be a bit earlier,
> not really before stuff has settled though.
>
> Does this work for you?
>   time (udevadm trigger; udevadm settle)
>
> It should not return immediately. You can watch the events with:
> 
>   udevadm monitor
> at the same time and check if it really only returns after the last event.

Hm, well, that seems to be working, at least once the system is all the
way up. But *something* plainly isn't.

On my next boot I'll time the trigger-and-settle pair and see how long
it takes...

-- 
NULL && (void)

* Re: udevadm settle persistently failing
From: Kay Sievers @ 2011-05-15 23:51 UTC (permalink / raw)
  To: linux-hotplug
In-Reply-To: <8739kfzv1j.fsf@spindle.srvr.nix>

On Mon, May 16, 2011 at 01:38, Nix <nix@esperi.org.uk> wrote:
> On 16 May 2011, Kay Sievers stated:

>> What's 'umount /run' during bootup? That sounds really strange.
>
> Hm, I didn't notice that. Yes, that does seem very odd indeed.
>
> I'll have a look soon, if probably not on that system (it's a headless
> system that does my firewalling so quite hard to reboot). I see the
> symptoms on other systems too.
>
>>> | udevd[339]: failed to execute '/sbin/modprobe' '/sbin/modprobe -bv platform:rtc_cmos': No such file or directory
>>>
>>> I have no clue why udev is trying to run modprobe when this is a
>>> non-modular kernel with all necessary devices built in. But that's not
>>> important.
>>
>> Udev has no idea what the kernel supports, it calls modprobe for all
>> devices with a 'modalias' but no attached driver.
>
> Yeah, it's not terribly important (though why they even have a modalias
> if this kernel does not have modules built in is also unclear. Anyway,
> it can do no harm, since modprobe doesn't even exist on that system.)

Yeah, you could #ifdef the modalias export if the kernel does not
support module loading. This is not optimized.

It's probably just easier to delete the default udev rule that calls
modprobe, if you don't need it. :)
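One way to act on that suggestion without editing packaged files, sketched under assumptions: udev gives a rules file in /etc/udev/rules.d precedence over a same-named file in /lib/udev/rules.d, so an empty file with the packaged name masks it. Which packaged file carries the modprobe rule varies between versions, so the sketch greps for it first. Masking drops every rule in that file, not just the modprobe one. Both function names are hypothetical.

```sh
# Hypothetical helper: list rules files in SRC_DIR that mention modprobe.
find_modprobe_rules() {
    src_dir=${1:-/lib/udev/rules.d}
    for f in "$src_dir"/*.rules; do
        [ -f "$f" ] || continue
        if grep -q modprobe "$f"; then
            basename "$f"
        fi
    done
}

# Hypothetical helper: mask a packaged rules file NAME by creating an empty
# file of the same name in DEST_DIR (default /etc/udev/rules.d).
mask_rules_file() {
    name=$1
    dest_dir=${2:-/etc/udev/rules.d}
    mkdir -p "$dest_dir" || return 1
    : > "$dest_dir/$name"
}
```

Usage: `for r in $(find_modprobe_rules); do mask_rules_file "$r"; done`, after checking that nothing else in the matched file is needed.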

>>> fold:~# ls -l /dev/sda1
>>> brw-rw---- 1 root disk 8, 1 May 15 18:56 /dev/sda1
>>>
>>> it's there. udevadm settle just didn't bother to wait for it to appear.
>>> (This is not a device on a bus for which enumeration takes some time:
>>> it's an SD card on an IDE-lookalike cs5535 bus. I've also seen this
>>> on LVM-atop-RAID-atop-SATA and on ordinary LVM-atop-SATA, so it doesn't
>>> require anything particularly unusual.)
>>
>> Sounds weird. Settle should not return at that point.
>
> Agreed! (This is why I suspect the timeout stuff is simply timing out
> immediately.)

Yeah, that could happen too, if /run/udev/queue.bin were deleted
by some cleanup script, after udevd is started.

>>                                                       You are not
>> altering the content of /run/udev/ or /dev/.udev/ in any way right?
>
> Gods, no. Recipe for disaster.

Ok, fine.

>> And you provide the /run tmpfs before you start udevd and don't touch
>> it again, right?
>
> ooo! possible bug. Here's my udev startup script:
>
> mount -n /proc
> mount -n /sys
> mount -n /run
> [...]
> mount_tmpfs
> mkdir -p $udev_root/.udev/db $udev_root/.udev/queue $udev_root/.udev/failed

Yeah, never create any private directories of udevd. Most of them do
not even exist anymore in today's udev.

> echo "Creating initial device nodes..."
> udevd --daemon
> udevadm trigger --action=add --type=subsystems
> udevadm trigger --action=add --type=devices
> udevadm settle
> sleep 1
>
> *That* is going to cause udev to try to convert from the old to the new
> udev database format every single time it starts, even though there *is*
> no old udev database there, just skeletal directories. I wonder if
> that's causing it?
>
> (I've ripped that mkdir out. Let's see if that fixes things. If it does,
> this suggests that udev needs further bulletproofing, because my udev
> startup script was derived directly from one provided by earlier
> versions of udev: a *lot* of people will have scripts like it.)

Yeah, not a good idea to fiddle around with udev internals. The
existence of the old directory names indicates a need to convert. It
should not really break anything, just waste some time during udevd
startup.

>>> Other things seem to have failed too. I have renaming rules:
>>>
>>> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a0", NAME="gordianet"
>>> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a1", NAME="adsl"
>>> SUBSYSTEM="net", ATTR{address}="00:00:24:cb:c6:a3", NAME="wireless"
>>>
>>> yet the devices were not renamed:
>>
>> Hmm, that should be unrelated to the possible settle problem
>
> Yeah. The rename-failure only seems to happen when the settle failure
> happens, so perhaps it's related to parts of the startup script messing
> with the interfaces and causing the kernel to ban renames because
> they're in use. Obviously this doesn't happen if we're still sitting in
> 'udevadm settle' when this takes place.

Yeah, that could explain it. As soon as the netif is up, we can not
rename it anymore.
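To check whether that is what happened, the IFF_UP bit (0x1) of an interface can be read from sysfs, which exposes the interface flags as a hexadecimal string in /sys/class/net/<iface>/flags (the "state DOWN" interfaces above would show 0x1002, BROADCAST|MULTICAST without UP). A minimal sketch: `netif_is_up` is a hypothetical helper and its optional sysfs-root argument is only a test hook.

```sh
# Hypothetical helper: succeed if interface IFACE is administratively up,
# judged from the IFF_UP bit (0x1) in /sys/class/net/IFACE/flags.
# Returns 2 if the flags file cannot be read.
netif_is_up() {
    iface=$1
    sysfs=${2:-/sys}
    flags_file=$sysfs/class/net/$iface/flags
    [ -r "$flags_file" ] || return 2
    read -r flags < "$flags_file" || return 2
    # The file holds a hex string such as 0x1003; shell arithmetic accepts it.
    [ $(( $flags & 1 )) -ne 0 ]
}
```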

>>> Put a 'sleep 1' after the udev call, and everything works.
>>
>> Which call? The trigger?
>
> See above: it was right after the settle call.

I see. Hmm, no good idea why this would be.

> Anyway, more tomorrow sometime after more testing. (It's past midnight
> here.)

Sounds good, let me know if you find something out.

Kay
