public inbox for linux-kernel@vger.kernel.org
* [BUG] wait_for_devfsd_finished deadlock
@ 2001-12-02  3:11 William Lee Irwin III
  2001-12-02 19:01 ` Richard Gooch
  2001-12-04  5:04 ` Richard Gooch
  0 siblings, 2 replies; 5+ messages in thread
From: William Lee Irwin III @ 2001-12-02  3:11 UTC
  To: linux-kernel

While testing 2.4.17-pre1 with some other patches, a situation
reminiscent of a deadlock arose. mutt(1) would block indefinitely
while opening a large mbox, and all further calls to sys_open()
would block indefinitely.

After some further testing to isolate the problem, I reproduced
this behavior in the vanilla 2.4.17-pre1 kernel. The sysrq output showed
a number of processes with the following call trace in the devfs core:


Dec  1 17:29:26 holomorphy kernel: sh            D C02A2A00     0   821    806                     (NOTLB)
Dec  1 17:29:26 holomorphy kernel: Call Trace: [wait_for_devfsd_finished+170/208] [devfs_d_revalidate_wait+135/160] [devfs_lookup+409/448] [d_alloc+27/400] [real_lookup+83/192]
Dec  1 17:29:26 holomorphy kernel:    [link_path_walk+1310/1888] [path_walk+26/32] [open_namei+131/1488] [filp_open+59/96] [sys_open+56/192] [system_call+51/56]


And the following call traces elsewhere:


Dec  1 17:29:26 holomorphy kernel: cron          S E3540000     0   852    633   853     855   827 (NOTLB)
Dec  1 17:29:26 holomorphy kernel: Call Trace: [pipe_wait+124/176] [pipe_read+179/512] [sys_read+150/208] [system_call+51/56]


Dec  1 17:29:26 holomorphy kernel: procmail      S E2395ED8     0   881    880           882       (NOTLB)
Dec  1 17:29:26 holomorphy kernel: Call Trace: [interruptible_sleep_on_locked+116/192] [locks_block_on+28/48] [posix_lock_file+178/1232] [fcntl_setlk+335/512] [do_fcntl+318/512]


Dec  1 17:29:26 holomorphy kernel: exim          S 00000000     0   905    899   907               (NOTLB)
Dec  1 17:29:26 holomorphy kernel: Call Trace: [sys_wait4+862/912] [system_call+51/56]

 

Further diagnostic information is available upon request.

Mr. Gooch, your attention to this matter is much appreciated.



Thanks,
Bill
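
A note on reading the traces above: the mail client hard-wrapped the log at 80 columns, so several entries are split mid-token. The sketch below is not from the original post. It shows one way to rejoin the physical lines of a single log record and pull out the entries, assuming the [symbol+offset/size] notation with decimal numbers seen in these logs. The names WRAPPED, ENTRY and parse_call_trace are made up for illustration.

```python
import re

# Two physical lines of one log record, as wrapped in the mail above.
WRAPPED = [
    "Dec  1 17:29:26 holomorphy kernel: Call Trace: [pipe_wait+124/176] [pipe_read+17",
    "9/512] [sys_read+150/208] [system_call+51/56] ",
]

# One trace entry: [symbol+offset/size], numbers assumed to be decimal.
ENTRY = re.compile(r"\[([A-Za-z_]\w*)\+(\d+)/(\d+)\]")


def parse_call_trace(physical_lines):
    """Rejoin hard-wrapped lines of ONE log record and return
    a list of (symbol, offset, size) tuples."""
    # Plain concatenation undoes the wrap; it is only valid for
    # continuation lines, not for a new timestamped record.
    joined = "".join(line.rstrip("\n") for line in physical_lines)
    return [(sym, int(off), int(size)) for sym, off, size in ENTRY.findall(joined)]


trace = parse_call_trace(WRAPPED)
for symbol, offset, size in trace:
    print(f"{symbol:<12} +{offset}/{size}")
```

Lines that begin a new record (those starting with the syslog timestamp) would need to be grouped first; this sketch deliberately handles only the single-record case.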


* Re: [BUG] wait_for_devfsd_finished deadlock
  2001-12-02  3:11 [BUG] wait_for_devfsd_finished deadlock William Lee Irwin III
@ 2001-12-02 19:01 ` Richard Gooch
  2001-12-03  1:34   ` William Lee Irwin III
  2001-12-04  5:04 ` Richard Gooch
  1 sibling, 1 reply; 5+ messages in thread
From: Richard Gooch @ 2001-12-02 19:01 UTC
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin, III writes:
> While testing 2.4.17-pre1 with some other patches, a situation
> reminiscent of a deadlock arose. mutt(1) would block indefinitely
> while opening a large mbox, and all further calls to sys_open()
> would block indefinitely.
> 
> After some further testing to isolate the problem, I reproduced this
> behavior in the vanilla 2.4.17-pre1 kernel. The sysrq output showed
> a number of processes with the following call trace in the devfs
> core:

Your sh process appears to be hung in wait_for_devfsd_finished(). It
would be helpful to know what devfsd was doing at this time. If it
were hung internally (in user-space), it would account for this
behaviour. However, if devfsd crashes, then devfsd_close() will be
called, which will wake any waiters.

> And the following call traces elsewhere:

Are these related?
cron		->	pipe_wait()
procmail	->	interruptible_sleep_on_locked()
exim		->	sys_wait4()

Maybe these are just waiting on mutt(1)?

> Further diagnostic information is available upon request.

That's what I like to hear. Set CONFIG_DEVFS_DEBUG=y and boot with
"devfs=dall" and send me the (verbose) kernel logs. That should show
the sequence of events that led to this.

> Mr. Gooch, your attention to this matter is much appreciated.

Just "Richard" is fine. I'm not a fan of formality.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca


* Re: [BUG] wait_for_devfsd_finished deadlock
  2001-12-02 19:01 ` Richard Gooch
@ 2001-12-03  1:34   ` William Lee Irwin III
  2001-12-03  1:44     ` Richard Gooch
  0 siblings, 1 reply; 5+ messages in thread
From: William Lee Irwin III @ 2001-12-03  1:34 UTC
  To: Richard Gooch; +Cc: linux-kernel

On Sun, Dec 02, 2001 at 12:01:58PM -0700, Richard Gooch wrote:
> Your sh process appears to be hung in wait_for_devfsd_finished(). It
> would be helpful to know what devfsd was doing at this time. If it
> were hung internally (in user-space), it would account for this
> behaviour. However, if devfsd crashes, then devfsd_close() will be
> called, which will wake any waiters.

An oops trace from quite a bit further back in my logs makes this
appear to be an instance of an already reported problem. I apologize
for overlooking this detail in my prior post:

Dec  1 16:49:11 holomorphy kernel:  printing eip:
Dec  1 16:49:11 holomorphy kernel: c016536d
Dec  1 16:49:11 holomorphy kernel: Oops: 0002
Dec  1 16:49:11 holomorphy kernel: CPU:    0
Dec  1 16:49:11 holomorphy kernel: EIP:    0010:[devfs_put+13/192]    Not tainted
Dec  1 16:49:11 holomorphy kernel: EFLAGS: 00010206
Dec  1 16:49:11 holomorphy kernel: eax: 5a5a5a5a   ebx: 5a5a5a5a   ecx: 00000002   edx: 5a5a5a5a
Dec  1 16:49:11 holomorphy kernel: esi: 00000000   edi: 00000026   ebp: 00000000   esp: ef785f40
Dec  1 16:49:11 holomorphy kernel: ds: 0018   es: 0018   ss: 0018
Dec  1 16:49:11 holomorphy kernel: Process devfsd (pid: 20, stackpage=ef785000)
Dec  1 16:49:12 holomorphy kernel: Stack: 00000026 c01680dc 5a5a5a5a c1ca8d74 ffffffea 00000000 00000420 c1cc4800
Dec  1 16:49:12 holomorphy kernel:        c02a2a00 ef653308 5a5a5a5a 000003fa 00000000 00000000 00000001 00000000
Dec  1 16:49:12 holomorphy kernel:        ef784000 00000000 00000000 00000000 ef784000 c02a2a2c c02a2a2c c0130bd6
Dec  1 16:49:12 holomorphy kernel: Call Trace: [devfsd_read+860/992] [sys_read+150/208] [system_call+51/56]
Dec  1 16:49:12 holomorphy kernel:
Dec  1 16:49:12 holomorphy kernel: Code: ff 4b 04 0f 94 c0 84 c0 0f 84 9d 00 00 00 3b 1d d8 1b 30 c0


On Sun, Dec 02, 2001 at 12:01:58PM -0700, Richard Gooch wrote:
> Are these related?
> cron		->	pipe_wait()
> procmail	->	interruptible_sleep_on_locked()
> exim		->	sys_wait4()

> Maybe these are just waiting on mutt(1)?

I don't believe so. In my configuration, exim delivers to a file
through a procmail pipe, and to the best of my knowledge, mutt does
little more than monitor the files with poll or select, which should
not interfere with the completion of their operations. cron should be
entirely unrelated, as no mail-related activities are scheduled with it.

I suspect in the case I reported, the destruction of the devfsd thread
is responsible for the deadlocks.



Thanks,
Bill
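
A note on reading the oops above. This sketch is not from the original message. It decodes the "Oops: 0002" value using the i386 page-fault error-code bits (bit 0: protection fault rather than page not present, bit 1: write access, bit 2: user mode) and checks registers against a repeated 0x5a byte. The 0x5a value is assumed here to be the slab-debugging poison byte of 2.4-era kernels; treat that reading as a hypothesis about this trace, not a confirmed diagnosis. The function names are invented for illustration.

```python
def decode_i386_fault_code(code: int) -> dict:
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "protection_fault": bool(code & 0x1),  # 0 means page not present
        "write_access": bool(code & 0x2),      # 0 means read
        "user_mode": bool(code & 0x4),         # 0 means kernel mode
    }


def looks_poisoned(register: int, poison_byte: int = 0x5A) -> bool:
    """True if a 32-bit register holds the poison byte repeated four times.
    poison_byte=0x5A is an assumption about the slab debug pattern."""
    return register == int.from_bytes(bytes([poison_byte]) * 4, "little")


# Values copied from the oops in this message (the kernel prints them in hex).
fault = decode_i386_fault_code(int("0002", 16))
registers = {"eax": 0x5A5A5A5A, "ebx": 0x5A5A5A5A, "ecx": 0x00000002,
             "edx": 0x5A5A5A5A, "esp": 0xEF785F40}
suspect = sorted(name for name, value in registers.items() if looks_poisoned(value))

print(fault)
print("registers holding the poison pattern:", suspect)
```

Read this way, the oops is a kernel-mode write to a non-present page, with eax, ebx and edx all holding the 0x5a pattern, which would be consistent with devfs_put() being handed a pointer read from already-freed memory.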


* Re: [BUG] wait_for_devfsd_finished deadlock
  2001-12-03  1:34   ` William Lee Irwin III
@ 2001-12-03  1:44     ` Richard Gooch
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Gooch @ 2001-12-03  1:44 UTC
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin, III writes:
> On Sun, Dec 02, 2001 at 12:01:58PM -0700, Richard Gooch wrote:
> > Your sh process appears to be hung in wait_for_devfsd_finished(). It
> > would be helpful to know what devfsd was doing at this time. If it
> > were hung internally (in user-space), it would account for this
> > behaviour. However, if devfsd crashes, then devfsd_close() will be
> > called, which will wake any waiters.
> 
> An oops trace from quite a bit further back in my logs makes this
> appear to be an instance of an already reported problem. I apologize
> for overlooking this detail in my prior post:

You're referring to the problem reported by Pierre Rousselet?

> Dec  1 16:49:11 holomorphy kernel:  printing eip:
> Dec  1 16:49:11 holomorphy kernel: c016536d
> Dec  1 16:49:11 holomorphy kernel: Oops: 0002
> Dec  1 16:49:11 holomorphy kernel: CPU:    0
> Dec  1 16:49:11 holomorphy kernel: EIP:    0010:[devfs_put+13/192]    Not tainted
> Dec  1 16:49:11 holomorphy kernel: EFLAGS: 00010206
> Dec  1 16:49:11 holomorphy kernel: eax: 5a5a5a5a   ebx: 5a5a5a5a   ecx: 00000002   edx: 5a5a5a5a
> Dec  1 16:49:11 holomorphy kernel: esi: 00000000   edi: 00000026   ebp: 00000000   esp: ef785f40
> Dec  1 16:49:11 holomorphy kernel: ds: 0018   es: 0018   ss: 0018
> Dec  1 16:49:11 holomorphy kernel: Process devfsd (pid: 20, stackpage=ef785000)
> Dec  1 16:49:12 holomorphy kernel: Stack: 00000026 c01680dc 5a5a5a5a c1ca8d74 ffffffea 00000000 00000420 c1cc4800
> Dec  1 16:49:12 holomorphy kernel:        c02a2a00 ef653308 5a5a5a5a 000003fa 00000000 00000000 00000001 00000000
> Dec  1 16:49:12 holomorphy kernel:        ef784000 00000000 00000000 00000000 ef784000 c02a2a2c c02a2a2c c0130bd6
> Dec  1 16:49:12 holomorphy kernel: Call Trace: [devfsd_read+860/992] [sys_read+150/208] [system_call+51/56]
> Dec  1 16:49:12 holomorphy kernel:
> Dec  1 16:49:12 holomorphy kernel: Code: ff 4b 04 0f 94 c0 84 c0 0f 84 9d 00 00 00 3b 1d d8 1b 30 c0
> 
> 
> On Sun, Dec 02, 2001 at 12:01:58PM -0700, Richard Gooch wrote:
> > Are these related?
> > cron		->	pipe_wait()
> > procmail	->	interruptible_sleep_on_locked()
> > exim		->	sys_wait4()
> 
> > Maybe these are just waiting on mutt(1)?
> 
> I don't believe so. In my configuration, exim delivers to a file
> through a procmail pipe, and to the best of my knowledge, mutt does
> little more than monitor the files with poll or select, which should
> not interfere with the completion of their operations. cron should
> be entirely unrelated, as no mail-related activities are scheduled
> with it.

Odd, especially cron. My version of crond doesn't seem to look in /dev
at all.

> I suspect in the case I reported, the destruction of the devfsd thread
> is responsible for the deadlocks.

That would make sense. If devfsd gets an Oops, the release() method
won't be called, thus no waiters will be woken up. So it looks like
this is the same problem that Pierre reported. So now I'm going to ask
you for debugging help as well. The information I have from Pierre so
far hasn't led me to inspiration.

So, I want to see your complete kernel logs, booted with "devfs=dall"
and make sure you've compiled with:
CONFIG_DEVFS_DEBUG=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_BUGVERBOSE=y

I don't remember what kernel this was, but I'd prefer if you can
test/reproduce this with 2.4.17-pre2.

I'll also want lsmod output, your devfsd configuration and your
complete .config. Just send it to me only: no need to spam the list
with all this detail.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca
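
A note on the mechanism described in this message: waiters are woken only from the release path, so a daemon that dies without that path running leaves them asleep. The following is a user-space analogy only, not devfs code and not from the original message. A threading.Event stands in for the kernel wait queue, the release_called flag stands in for whether the release() method ran, and the short timeout exists purely so the demonstration terminates; the tasks in the report had no such timeout. All names are hypothetical.

```python
import threading


def waiter_wakes(release_called: bool, timeout: float = 0.2) -> bool:
    """Return True if a blocked waiter is woken, False if it stays blocked
    until the demo timeout expires."""
    finished = threading.Event()  # stand-in for the wait queue
    result = []

    def waiter():
        # Event.wait() returns True when woken, False on timeout.
        result.append(finished.wait(timeout))

    thread = threading.Thread(target=waiter)
    thread.start()

    if release_called:
        # Normal exit or crash handled by the release path: wake the waiters.
        finished.set()
    # Otherwise nothing ever signals the event, as when the daemon is
    # destroyed without its release path running.

    thread.join()
    return result[0]


print("release ran:    waiter woke =", waiter_wakes(True))
print("release missed: waiter woke =", waiter_wakes(False))
```

In the second case the waiter only returns because of the artificial timeout, which mirrors the observation that every later open() touching devfs blocked indefinitely.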


* Re: [BUG] wait_for_devfsd_finished deadlock
  2001-12-02  3:11 [BUG] wait_for_devfsd_finished deadlock William Lee Irwin III
  2001-12-02 19:01 ` Richard Gooch
@ 2001-12-04  5:04 ` Richard Gooch
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Gooch @ 2001-12-04  5:04 UTC
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin, III writes:
> While testing 2.4.17-pre1 with some other patches, a situation
> reminiscent of a deadlock arose. mutt(1) would block indefinitely
> while opening a large mbox, and all further calls to sys_open()
> would block indefinitely.
> 
> After some further testing to isolate the problem, I reproduced this
> behavior in the vanilla 2.4.17-pre1 kernel. The sysrq output showed
> a number of processes with the following call trace in the devfs
> core:
[...]

Just a followup: did you get around to trying devfs-patch-v199.2?
That should fix the problem.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca
