public inbox for linux-kernel@vger.kernel.org

* race condition between udevd and modprobe (mtrr_add)
@ 2010-05-04  5:30 Kyle Hubert
  2010-05-05  5:18 ` Kay Sievers
  2010-05-06 22:59 ` Andrew Morton
  0 siblings, 2 replies; 3+ messages in thread
From: Kyle Hubert @ 2010-05-04  5:30 UTC
  To: linux-kernel

Hi, while booting an initrd image built with BusyBox on a thousand
nodes, we hit a race on a couple of nodes. They hang during the boot
process with the stack traces listed below. The very simple init
script in the initrd runs 'udevd --daemon' and then modprobes a device
driver. The driver needs to assign an MTRR to the PCI resource, but
instead the whole node hangs. Putting a 'sleep 1' between these two
commands prevents the hangs.
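For readers trying to reproduce the setup, the boot sequence described above can be sketched as a shell function. This is a reconstruction under assumptions, not the reporter's actual script: the function name `boot_node` and its module-name argument are hypothetical, and the `sleep 1` is the workaround from the report rather than a fix.

```sh
# Hypothetical sketch of the initrd boot sequence from the report.
# $1: name of the driver module whose init path assigns an MTRR
#     to its PCI resource (the modprobe trace ends in mtrr_add_page).
boot_node() {
    module=$1
    udevd --daemon || return 1   # start udevd in the background
    sleep 1                      # reported workaround: without it, a few nodes hang
    modprobe "$module"           # load the driver that calls into the MTRR code
}
```

Usage would be `boot_node <driver>`, where `<driver>` stands for the real module name, which the report does not give.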

mtrr_add_page and the buddy allocator code don't appear to share any
semaphores, and there isn't an obvious way in which this can hang.
Possibly the smp_call_function IPI isn't being handled by the other
cores; that's my best guess. Can anyone help sort this mess out?

Also, is there a better way to test that udevd is fully up? A 'sleep
1' is not the preferred solution here.

Thanks for your time,

-Kyle Hubert

>> ps
              ADDR    UID    PID   PPID  STATE     FLAGS CPU  NAME
===============================================================================
...
0xffff88061d26c720      0   1036      1      0  0x400140   -  udevd
0xffff88021e05c480      0   1037      1      0  0x400100   -  modprobe
0xffff88081d072440      0   1116   1036      0  0x400040   -  udevd
===============================================================================
135 active task structs found
>> bt 0xffff88021e05c480
================================================================
STACK TRACE FOR TASK: 0xffff88021e05c480(modprobe)

 0 <schedule?> [0x0]
 1 mtrr_add_page+494 [0xffffffff80219d9e]
 2 <unknown?>+<ERROR> [0xffffffffa0009a08]
================================================================
>> bt 0xffff88061d25f420
================================================================
STACK TRACE FOR TASK: 0xffff88061d25f420(udevd)

 0 <schedule?> [0x0]
 1 __alloc_pages_internal+241 [0xffffffff80292731]
 2 rmqueue_bulk+89 [0xffffffff80291b19]
 3 get_page_from_freelist+1430 [0xffffffff802922e6]
 4 __alloc_pages_internal+241 [0xffffffff80292731]
 5 alloc_pages_current+168 [0xffffffff802b0898]
 6 pte_alloc_one+49 [0xffffffff80229271]
 7 __pte_alloc+67 [0xffffffff8029e7d3]
 8 copy_page_range+1269 [0xffffffff802a11c5]
 9 alloc_pid+744 [0xffffffff80250a18]
10 copy_process+3057 [0xffffffff8023bcf1]
11 do_fork+118 [0xffffffff8023c4d6]
12 sys_clone+35 [0xffffffff80209c23]
13 ptregscall_common+103 [0xffffffff8020bda7]
================================================================
>> bt 0xffff88081de869e0
================================================================
STACK TRACE FOR TASK: 0xffff88081de869e0(udevd)

 0 <schedule?> [0x0]
 1 set_user_nice+324 [0xffffffff80234734]
 2 set_one_prio+113 [0xffffffff8024ea61]
 3 sys_setpriority+129 [0xffffffff8024eb41]
 4 system_call_fastpath+22 [0xffffffff8020ba3b]
================================================================


* Re: race condition between udevd and modprobe (mtrr_add)
  2010-05-04  5:30 race condition between udevd and modprobe (mtrr_add) Kyle Hubert
@ 2010-05-05  5:18 ` Kay Sievers
  2010-05-06 22:59 ` Andrew Morton
  1 sibling, 0 replies; 3+ messages in thread
From: Kay Sievers @ 2010-05-05  5:18 UTC
  To: Kyle Hubert; +Cc: linux-kernel

On Tue, May 4, 2010 at 07:30, Kyle Hubert <khubert@gmail.com> wrote:
> Also, is there a better way to test that udevd is fully up? A 'sleep
> 1' is not the preferred solution here.

"udevadm settle" might do it.

Kay
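A minimal sketch of an init fragment that uses `udevadm settle` in place of the fixed delay, assuming the initrd ships the `udevadm` binary alongside `udevd`. The function name `boot_node_settled` and its module argument are hypothetical. `udevadm settle` waits until the udev event queue is empty, and `--timeout` (in seconds) bounds that wait so a stuck event cannot block boot forever. Whether an empty queue also means udevd has finished its own startup is the open question in this thread, so this is something to test on the affected nodes rather than a guaranteed fix.

```sh
# Hypothetical sketch: start udevd, wait for its event queue to drain,
# then load the driver.
# $1: driver module name (placeholder; the real name is not given).
boot_node_settled() {
    module=$1
    udevd --daemon || return 1
    # Block until udevd has processed all queued events, for at most 30 s.
    if ! udevadm settle --timeout=30; then
        echo "udevadm settle timed out; loading $module anyway" >&2
    fi
    modprobe "$module"
}
```

Loading the module even after a timeout keeps the node booting; a stricter variant could return an error instead.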


* Re: race condition between udevd and modprobe (mtrr_add)
  2010-05-04  5:30 race condition between udevd and modprobe (mtrr_add) Kyle Hubert
  2010-05-05  5:18 ` Kay Sievers
@ 2010-05-06 22:59 ` Andrew Morton
  1 sibling, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2010-05-06 22:59 UTC
  To: Kyle Hubert; +Cc: linux-kernel

On Mon, 3 May 2010 22:30:01 -0700
Kyle Hubert <khubert@gmail.com> wrote:

> Hi, while booting an initrd image built with BusyBox on a thousand
> nodes, we hit a race on a couple of nodes. They hang during the boot
> process with the stack traces listed below. The very simple init
> script in the initrd runs 'udevd --daemon' and then modprobes a device
> driver. The driver needs to assign an MTRR to the PCI resource, but
> instead the whole node hangs. Putting a 'sleep 1' between these two
> commands prevents the hangs.
> 
> mtrr_add_page and the buddy allocator code don't appear to share any
> semaphores, and there isn't an obvious way in which this can hang.
> Possibly the smp_call_function IPI isn't being handled by the other
> cores; that's my best guess. Can anyone help sort this mess out?
> 
> Also, is there a better way to test that udevd is fully up? A 'sleep
> 1' is not the preferred solution here.
> 
> Thanks for your time,
> 

What kernel version are you using here?  It looks old - pre 2.6.31.

> 
> >> ps
>               ADDR    UID    PID   PPID  STATE     FLAGS CPU  NAME
> ===============================================================================
> ...
> 0xffff88061d26c720      0   1036      1      0  0x400140   -  udevd
> 0xffff88021e05c480      0   1037      1      0  0x400100   -  modprobe
> 0xffff88081d072440      0   1116   1036      0  0x400040   -  udevd
> ===============================================================================
> 135 active task structs found
> >> bt 0xffff88021e05c480
> ================================================================
> STACK TRACE FOR TASK: 0xffff88021e05c480(modprobe)
> 
>  0 <schedule?> [0x0]
>  1 mtrr_add_page+494 [0xffffffff80219d9e]
>  2 <unknown?>+<ERROR> [0xffffffffa0009a08]
> ================================================================
> >> bt 0xffff88061d25f420
> ================================================================
> STACK TRACE FOR TASK: 0xffff88061d25f420(udevd)
> 
>  0 <schedule?> [0x0]
>  1 __alloc_pages_internal+241 [0xffffffff80292731]
>  2 rmqueue_bulk+89 [0xffffffff80291b19]
>  3 get_page_from_freelist+1430 [0xffffffff802922e6]
>  4 __alloc_pages_internal+241 [0xffffffff80292731]
>  5 alloc_pages_current+168 [0xffffffff802b0898]
>  6 pte_alloc_one+49 [0xffffffff80229271]
>  7 __pte_alloc+67 [0xffffffff8029e7d3]
>  8 copy_page_range+1269 [0xffffffff802a11c5]
>  9 alloc_pid+744 [0xffffffff80250a18]
> 10 copy_process+3057 [0xffffffff8023bcf1]
> 11 do_fork+118 [0xffffffff8023c4d6]
> 12 sys_clone+35 [0xffffffff80209c23]
> 13 ptregscall_common+103 [0xffffffff8020bda7]

These traces look odd - the kernel shouldn't be calling schedule() from
below rmqueue_bulk()!

If possible, please try a more recent kernel.  If the problem occurs
there and if we manage to fix it, the fix can be backported into
whatever-kernel-version-you're-using.

Can you get a better trace?  The sysrq-T output would be good.  That's
known to work sufficiently well.  Please avoid word-wrapping it when sending.
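If a shell on the affected node still responds, the SysRq-T dump (every task's state and stack) can be requested from userspace by writing `t` to `/proc/sysrq-trigger` as root. On a fully wedged node, Alt-SysRq-T on the keyboard, or a BREAK on the serial console followed by `t`, does the same. The sketch below wraps the userspace route. The function name and default output path are hypothetical, and the trigger path is a parameter only so the function can be exercised without root. With many tasks the kernel log buffer can overflow, in which case booting with a larger `log_buf_len=` helps.

```sh
# Hypothetical helper: ask the kernel to log every task's state (SysRq-T),
# then save the kernel log to a file so it can be sent without word-wrapping.
# $1: trigger file (defaults to the real /proc path; writing it needs root)
# $2: output file (hypothetical default path)
dump_task_states() {
    trigger=${1:-/proc/sysrq-trigger}
    out=${2:-/tmp/sysrq-t.txt}
    echo t > "$trigger" || return 1
    dmesg > "$out"
}
```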

