Strange 'zombie' problem both in 2.4 and 2.6

* Strange 'zombie' problem both in 2.4 and 2.6
@ 2004-04-01 10:42 Nikita V. Youshchenko
  2004-04-01 13:17 ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Nikita V. Youshchenko @ 2004-04-01 10:42 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3330 bytes --]

Hello.

Some time ago I ran into a strange problem with the 2.4 kernel.
I could reproduce it on only one system - a production 2-CPU server that is
used as an LTSP server here and also runs tons of services and MUST always
be up.

The problem is the following.
The server runs normally (uptime may already be several weeks, or may be
only several hours).
Suddenly something happens.
And the process table becomes full of zombies.
It looks like any thread created by any program becomes a zombie when it
finishes. The same programs (actually, the same running processes) join()ed
finished threads correctly before Something Happened. So it looks very
much like Something happens inside the kernel.
Affected programs include mozilla, clamav, mysqld, licq and anything else
that creates short-lived threads, or at least threads that live shorter
than the program itself.

It looks like at some moment the kernel loses the ability to inform
processes that their threads have finished. AFAIK, this is done by SIGCHLD?
Anyway, manually sending SIGCHLD to the parent of the zombies does not help.

After the problem happens, the server becomes unusable (because of process
table overflow) within several minutes. One time Something Else happened,
and all those zombies disappeared. In all other cases a reboot was required.

If the process that created those "zombie threads" is terminated (i.e. the
service is stopped), all its zombies disappear. However, after the service
is restarted, zombies start to appear again.

Although I tried, I could not find any correlation between the system
entering this "zombie-keeping" state and anything else happening on the
system. It looks like running java apps (with blackdown jdk) makes this
happen more often, but there is still no direct correlation.

The problem happened with official 2.4.23, 2.4.24 and 2.4.25 kernels, 
compiled from kernel.org sources.

Yesterday I got tired of this zombie problem (it arose twice during this
week), and decided to upgrade the server to kernel 2.6.
I installed the 2.6.4 kernel from the Debian kernel-image-2.6.4-1-k7-smp
package.

Unfortunately, this did not eliminate the problem: it happened again today.
The difference is that when running 2.6, most binaries use NPTL libs
from /lib/i686/cmov/, and do not seem to be affected by the problem (i.e. no
zombies from them). However, users need to run some statically-linked
binaries (without source available) that have non-NPTL libs statically
linked and so still use linuxthreads; those are affected (i.e. do create
zombies). So the problem no longer renders the server unusable (so it is no
longer that critical), but it still exists in the 2.6 kernel.

I can't reproduce the problem on any other host. And the affected system is 
a production server that is somewhat difficult to use for debugging :(
It is a dual-K7 server with a Tyan Tiger MPX S2466 motherboard and 2 GB of
RAM. Output of 'lspci -vv' and 'cat /proc/cpuinfo' is attached. I can
provide any other technical information.

I'm a seasoned unix developer and sysadmin, and have some kernel hacking 
experience. However, I don't work with the kernel currently, so I am not 
"in context of" kernel internals.
So I'm looking either for a fix :), or for some advice on what to do with 
this (i.e. where to look in the kernel code and what to look for).

Nikita Youshchenko,
sysadmin at lvk.cs.msu.su

[-- Attachment #2: info --]
[-- Type: text/plain, Size: 7991 bytes --]

> uname -a
Linux zigzag 2.6.4-1-k7-smp #1 SMP Sun Mar 14 00:19:02 EST 2004 i686 GNU/Linux

> cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 10
model name	: AMD Athlon(tm) MP 2800+
stepping	: 0
cpu MHz		: 2133.583
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips	: 4210.68

processor	: 1
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 10
model name	: AMD Athlon(tm) Processor
stepping	: 0
cpu MHz		: 2133.583
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips	: 4259.84

> lspci -vv
00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 20)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32
	Region 0: Memory at f4000000 (32-bit, prefetchable) [size=64M]
	Region 1: Memory at f0600000 (32-bit, prefetchable) [size=4K]
	Region 2: I/O ports at 1020 [disabled] [size=4]
	Capabilities: <available only to root>

00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 99
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: f0100000-f01fffff
	Prefetchable memory behind bridge: f8000000-fbffffff
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 05)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04) (prog-if 8a [Master SecP PriP])
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Region 4: I/O ports at f000 [size=16]

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:08.0 RAID bus controller: Promise Technology, Inc. PDC20271 (rev 02) (prog-if 85)
	Subsystem: Promise Technology, Inc.: Unknown device 4d68
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (1000ns min, 4500ns max), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 20
	Region 0: I/O ports at 1038 [size=8]
	Region 1: I/O ports at 1030 [size=4]
	Region 2: I/O ports at 1028 [size=8]
	Region 3: I/O ports at 1024 [size=4]
	Region 4: I/O ports at 1010 [size=16]
	Region 5: Memory at f0000000 (32-bit, non-prefetchable) [size=64K]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: <available only to root>

00:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI (rev 05) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 64
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=168
	I/O behind bridge: 00003000-00003fff
	Memory behind bridge: f0200000-f03fffff
	BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-

01:05.0 VGA compatible controller: ATI Technologies Inc Rage 128 RF/SG AGP (prog-if 00 [VGA])
	Subsystem: ATI Technologies Inc Magnum/Xpert128/X99/Xpert2000
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B+
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 66 (2000ns min), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
	Region 1: I/O ports at 2000 [size=256]
	Region 2: Memory at f0100000 (32-bit, non-prefetchable) [size=16K]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: <available only to root>

02:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-768 [Opus] USB (rev 07) (prog-if 10 [OHCI])
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+
	Latency: 64 (20000ns max)
	Interrupt: pin D routed to IRQ 19
	Region 0: Memory at f0300000 (32-bit, non-prefetchable) [size=4K]

02:04.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
	Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (2000ns min, 14000ns max), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f0301000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at 3480 [size=64]
	Region 2: Memory at f0200000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at <unassigned> [disabled] [size=1M]
	Capabilities: <available only to root>

02:06.0 SCSI storage controller: Tekram Technology Co.,Ltd. TRM-S1040 (rev 01)
	Subsystem: Tekram Technology Co.,Ltd. TRM-S1040
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64, Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 18
	Region 0: I/O ports at 3000 [size=256]
	Region 1: Memory at f0302000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: <available only to root>

02:07.0 Multimedia audio controller: Ensoniq 5880 AudioPCI (rev 02)
	Subsystem: Ensoniq Creative Sound Blaster AudioPCI128
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort+ <MAbort- >SERR- <PERR-
	Latency: 64 (3000ns min, 32000ns max)
	Interrupt: pin A routed to IRQ 19
	Region 0: I/O ports at 34c0 [size=64]
	Capabilities: <available only to root>

02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
	Subsystem: Tyan Computer: Unknown device 2466
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 80 (2500ns min, 2500ns max), Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin A routed to IRQ 19
	Region 0: I/O ports at 3400 [size=128]
	Region 1: Memory at f0303000 (32-bit, non-prefetchable) [size=128]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: <available only to root>


* Re: Strange 'zombie' problem both in 2.4 and 2.6
  2004-04-01 10:42 Strange 'zombie' problem both in 2.4 and 2.6 Nikita V. Youshchenko
@ 2004-04-01 13:17 ` Denis Vlasenko
  2004-04-01 15:20   ` Nikita V. Youshchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2004-04-01 13:17 UTC (permalink / raw)
  To: Nikita V. Youshchenko, linux-kernel

On Thursday 01 April 2004 13:42, Nikita V. Youshchenko wrote:
> Hello.
>
> Some time ago I was faced with a strange problem in 2.4 kernel.
> I could reproduce in only on one system - a production 2-CPU server that is
> used as LTSP server here and also runs tons of services and MUST be always
> up.
>
> The problem is the following.
> Server runs normally (and uptime may be already several weeks, but may be
> only several hours).
> Suddenly something happens.
> And process table becomes full of zombies.
> Looks like any thread created by any program becomes a zombie when
> finished. Same programs (actually, same running processes) join()ed
> finished threads correctly before Something Happened. So it looks very
> much that Something happens inside the kernel.
> Affected programs include mozilla, clamav, mysqld, licq and anything else
> that creates short-living threads, or at least threads that live shorter
> than program itself.

What does the output of ps -AH e look like?

> It looks like at some moment kernel looses the abitily to inform process
> that their threads are over. AKAIK, this is done by SIGCHLD? Anyway,
> manual sending SIGCHLD to the parent of zombies does not help.

Did you try stracing parent process? It can receive SIGCHLD but
ignore/mishandle it.

> After the problem happens, server becomes unusable (because of process
> table overflow) in several minutes. One time Something Else happened, and
> all those zombies disappeared. In all other cases a reboot was required.
>
> If the process that created those "zombie thread" is terminated (i.e.
> sevice stopped), all his zombies disappear. However, after service is
> restarted, zombies become to appear again.

Probably they get reparented to init and it wait()'s for them,
ending their afterlife. So SIGCHLD works (at least in this case).

> Athough I tried, I could not find any correlation between making system to
> this "zombie-keeping" state and anything else happenning with the system.
> Looks like that running java apps (with blackdown jdk) makes this happen
> more often, bot still no direct correlation.
>
> The problem happened with official 2.4.23, 2.4.24 and 2.4.25 kernels,
> compiled from kernel.org sources.
>
> Yestedray I was tired with this zombie problem (it arised twice during this
> week), and decided to upgrade server to kernel 2.6.
> I installed 2.6.4 kernel from the Debain kernel-image-2.6.4-1-k7-smp
> package.
>
> Unfortunately, this did not eliminate the problem: it happened today again.
> The difference is that when running in 2.6, most binaries use NPTL libs
> from /lib/i686/cmov/, and seem not to be affected by the problem (i.e. no
> zombies from them). However, users need to run some statically-linked
> binaries (without source available) that have non-NPTL libs statically
> linked and so still use linuxthreads; those are affected (i.e. do create
> zombies). So problem is not rendering server unusable (so it no longer
> that critical), but it still exists in the 2.6 kernel.

Sounds like a userspace problem in the threading libraries.
What version of glibc/linuxthreads was in use before?
Maybe post your report on the linuxthreads mailing list.

> I can't reproduce the problem on any other host. And the affected system is
> a production server that is somewhat difficult to use for debugging :(
> It is a dual-K7 server with Tyan Tiger MPX S2466 motherboard and 2 Gb of
> ram. Output of 'lspci -vv' and 'cat /proc/cpuinfo' is attached. I may
> provide any other technical information.
>
> I'm a seasoned unix developer and sysadmin, and have some kernel hacking
> experience. However, I don't work with the kernel currently, so I am not
> "in context of" kernel internals.
> So I'm looking either for a fix :), or for some advice on what to do with
> this (i.e. where to look in the kernel code and what to look for).
-- 
vda

* Re: Strange 'zombie' problem both in 2.4 and 2.6
  2004-04-01 13:17 ` Denis Vlasenko
@ 2004-04-01 15:20   ` Nikita V. Youshchenko
  2004-04-01 16:09     ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Nikita V. Youshchenko @ 2004-04-01 15:20 UTC (permalink / raw)
  To: Denis Vlasenko, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4380 bytes --]

> > Some time ago I was faced with a strange problem in 2.4 kernel.
> > I could reproduce in only on one system - a production 2-CPU server
> > that is used as LTSP server here and also runs tons of services and
> > MUST be always up.
> >
> > The problem is the following.
> > Server runs normally (and uptime may be already several weeks, but may
> > be only several hours).
> > Suddenly something happens.
> > And process table becomes full of zombies.
> > Looks like any thread created by any program becomes a zombie when
> > finished. Same programs (actually, same running processes) join()ed
> > finished threads correctly before Something Happened. So it looks very
> > much that Something happens inside the kernel.
> > Affected programs include mozilla, clamav, mysqld, licq and anything
> > else that creates short-living threads, or at least threads that live
> > shorter than program itself.
>
> How does ps -AH e looks like?

See output of "ps -lax" in attachment.

> > It looks like at some moment kernel looses the abitily to inform
> > process that their threads are over. AKAIK, this is done by SIGCHLD?
> > Anyway, manual sending SIGCHLD to the parent of zombies does not help.
>
> Did you try stracing parent process? It can receive SIGCHLD but
> ignore/mishandle it.

I tried strace -f, so all threads appear in the output. No signals
arrive, except those sent manually by kill().
Stracing the same binary on another host shows that SIGRT_1 arrives at the
parent.
I can send the strace logs, but they are somewhat large.
So the kernel really stops delivering signals.

As far as I understand, for threads SIGRT_1 is used instead of SIGCHLD.
So I tried sending SIGRT_1 to the parent manually. And the zombies
disappeared!
However, new zombies appear soon. They can still be removed by a manual
SIGRT_1, but that is not a solution for a kernel bug :).

> > After the problem happens, server becomes unusable (because of process
> > table overflow) in several minutes. One time Something Else happened,
> > and all those zombies disappeared. In all other cases a reboot was
> > required.
> >
> > If the process that created those "zombie thread" is terminated (i.e.
> > sevice stopped), all his zombies disappear. However, after service is
> > restarted, zombies become to appear again.
>
> Probably they get reparented to init and it wait()'s for them,
> ending their afterlife. So SIGCHLD works (at least in this case).

It seems that signal passing works only after the zombies are reparented.

> > Athough I tried, I could not find any correlation between making
> > system to this "zombie-keeping" state and anything else happenning
> > with the system. Looks like that running java apps (with blackdown
> > jdk) makes this happen more often, bot still no direct correlation.
> >
> > The problem happened with official 2.4.23, 2.4.24 and 2.4.25 kernels,
> > compiled from kernel.org sources.
> >
> > Yestedray I was tired with this zombie problem (it arised twice during
> > this week), and decided to upgrade server to kernel 2.6.
> > I installed 2.6.4 kernel from the Debain kernel-image-2.6.4-1-k7-smp
> > package.
> >
> > Unfortunately, this did not eliminate the problem: it happened today
> > again. The difference is that when running in 2.6, most binaries use
> > NPTL libs from /lib/i686/cmov/, and seem not to be affected by the
> > problem (i.e. no zombies from them). However, users need to run some
> > statically-linked binaries (without source available) that have
> > non-NPTL libs statically linked and so still use linuxthreads; those
> > are affected (i.e. do create zombies). So problem is not rendering
> > server unusable (so it no longer that critical), but it still exists
> > in the 2.6 kernel.
>
> Sounds like userspace problem in threading libraries.
> What version of glibc/linuxthreads was in use before?
> Maybe post your report on linuxthreads mailing list.

I doubt it is a userspace problem.
It happens with the same userspace libs and binaries (or even the same
running processes) with which it did not happen some time ago.
It happens at the same moment with different processes running under
different accounts.
Restarting processes doesn't help.
It is not reproducible on other hosts.
Sending the signal manually (kill -33 <parentpid>) removes already existing
zombies.
I can hardly imagine a userspace problem that behaves like this.

Nikita

[-- Attachment #2: ps --]
[-- Type: text/plain, Size: 2447 bytes --]

0   206 18992     1  16   0 50512 1600 schedu S    ?          0:00 /space/p2/n/donkey/donkey0.50.1 - ! -g -l
1   206 18993 18992  16   0 50512 1600 schedu S    ?          0:00 /space/p2/n/donkey/donkey0.50.1 - ! -g -l
1   206 18994 18993  15   0 50512 1600 schedu S    ?          0:00 /space/p2/n/donkey/donkey0.50.1 - ! -g -l
1   206 18995 18993  18   0 50512 1600 io_sch D    ?          0:31 /space/p2/n/donkey/donkey0.50.1 - ! -g -l
1   206 18996 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19021 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19068 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19069 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19070 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19071 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19072 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19073 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19074 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19077 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19078 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19090 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19091 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19092 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19093 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19094 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19095 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19096 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19107 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19108 18993  15   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
1   206 19111 18993  16   0     0    0 exit   Z    ?          0:00 [donkey0.50.1] <defunct>
0   206 19123 19080  15   0  2940  748 pipe_w S    pts/254    0:00 grep donkey

* Re: Strange 'zombie' problem both in 2.4 and 2.6
  2004-04-01 15:20   ` Nikita V. Youshchenko
@ 2004-04-01 16:09     ` Denis Vlasenko
  2004-04-01 20:25       ` Nikita V. Youshchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2004-04-01 16:09 UTC (permalink / raw)
  To: Nikita V. Youshchenko, linux-kernel

> > > It looks like at some moment kernel looses the abitily to inform
> > > process that their threads are over. AKAIK, this is done by SIGCHLD?
> > > Anyway, manual sending SIGCHLD to the parent of zombies does not help.
> >
> > Did you try stracing parent process? It can receive SIGCHLD but
> > ignore/mishandle it.
>
> I tried to use strace -f, so all threads exist in the output. No signals
> arrive, expect those send manually by kill().
> Stracing same binary on another host shows that SIGRT_1 arrives to the
> parent.
> I may send the strace logs, but they are somewhat large.
> So kernel really stops devivering signals.

Post reasonably small pieces of them.

> As far as I understand, in case of threads SIGRT_1 is used instead of
> SIGCHLD.
> So I tried to send SIGRT_1 to the parent manually. And zombies disappeared!
> However, new zombies appear soon. They may still be removed by manual
> SIGRT_1, but it is not a solution for a kernel bug :).

Maybe. Maybe not. I am no expert; I'd try to find out how SIGRT_1
is generated in the normal case (I suppose the kernel does not distinguish
between threads and processes, so maybe it's done by the threading libs?)

> > Probably they get reparented to init and it wait()'s for them,
> > ending their afterlife. So SIGCHLD works (at least in this case).
>
> Seems that signal passing works only after reparenting zombies.
>
> > > Unfortunately, this did not eliminate the problem: it happened today
> > > again. The difference is that when running in 2.6, most binaries use
> > > NPTL libs from /lib/i686/cmov/, and seem not to be affected by the
> > > problem (i.e. no zombies from them). However, users need to run some
> > > statically-linked binaries (without source available) that have
> > > non-NPTL libs statically linked and so still use linuxthreads; those
> > > are affected (i.e. do create zombies). So problem is not rendering
> > > server unusable (so it no longer that critical), but it still exists
> > > in the 2.6 kernel.
> >
> > Sounds like userspace problem in threading libraries.
> > What version of glibc/linuxthreads was in use before?
> > Maybe post your report on linuxthreads mailing list.
>
> I doubt it is a userspace problem.
> It happens with the same userspace libs and binaries (or even same running
> processes) with which it did not happen sometime ago.
> It happens at the same moment with different processes running from
> different accounts.
> Restarting processes doesn't help.
> It is not reprodusable on other hosts.
> Manual signal send (kill -33 <parentpid>) removes already existing zombies.
> I can hardly imagine a userspace problem that behaves like this.

I won't argue. One thing is clear: not enough info at this time :(

Try to instrument (printk("...")) parts of kernel responsible for
handling exit() etc.
--
vda
>
> Nikita


* Re: Strange 'zombie' problem both in 2.4 and 2.6
  2004-04-01 16:09     ` Denis Vlasenko
@ 2004-04-01 20:25       ` Nikita V. Youshchenko
  2004-04-01 20:46         ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Nikita V. Youshchenko @ 2004-04-01 20:25 UTC (permalink / raw)
  To: Denis Vlasenko, linux-kernel; +Cc: ghost, bahmurov

> > As far as I understand, in case of threads SIGRT_1 is used instead of
> > SIGCHLD.
> > So I tried to send SIGRT_1 to the parent manually. And zombies
> > disappeared! However, new zombies appear soon. They may still be
> > removed by manual SIGRT_1, but it is not a solution for a kernel bug
> > :).
>
> Maybe. Maybe not. I am no expert, I'd try to learn out how SIGRT_1
> is generated in normal case (I suppose kernel does not distinguish
> between threads and processes, maybe it's done by threading libs?)

I've looked at the kernel source.
This is what I found.

- looks like do_notify_parent() from kernel/signal.c is called to notify 
parent about child termination.

- do_notify_parent() calls __group_send_sig_info() to send the signal, and 
does not check the return code. However, __group_send_sig_info() may fail.

- __group_send_sig_info() calls send_signal()

- send_signal() contains the following code:

	struct sigqueue * q = NULL;
...
	if (atomic_read(&nr_queued_signals) < max_queued_signals)
		q = kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);
	if (q) {
...
	} else {
		if (sig >= SIGRTMIN && info && (unsigned long)info != 1
		   && info->si_code != SI_USER)
			return -EAGAIN;
...

SIGRT_1 = 33, which is greater than SIGRTMIN; info is definitely not 0 or 1,
and info->si_code is definitely not SI_USER on the path related to parent
process notification.

nr_queued_signals and sigqueue_cachep seem to be local to the
kernel/signal.c file, and the code is organized such that nr_queued_signals
shows exactly how many elements are allocated from sigqueue_cachep.
max_queued_signals equals 1024, so no more than 1024 elements may be
allocated from sigqueue_cachep.

sigqueue_cachep is initialized in signals_init():
	sigqueue_cachep =
		kmem_cache_create("sigqueue",
				  sizeof(struct sigqueue),
				  __alignof__(struct sigqueue),
				  0, NULL, NULL);

So I looked into /proc/slabinfo on the server running "zombie-loving" 
kernel, and found the following line:
nikita@zigzag:/proc> grep sigqueue slabinfo
sigqueue 1024   1107  144  27  1 : tunables  120  60  8 : slabdata 41 41  0

As far as I understand, the first number in this output is the number of 
elements allocated from "sigqueue" cache. That is, all 1024 elements are 
allocated!

So it looks like 'atomic_read(&nr_queued_signals) < max_queued_signals' is
false, so 'q' is not allocated, and send_signal() returns -EAGAIN while
trying to send SIGRT_1 to the parent process. This error code is passed
from __group_send_sig_info() to do_notify_parent(), and is simply ignored
there. So the signal is not delivered, and the dying process is left in the
zombie state.

So the "something" that happens in the kernel and makes it a "zombie-lover"
is sigqueue overflow.

Another question is why this ever happens on my server ...

* Re: Strange 'zombie' problem both in 2.4 and 2.6
  2004-04-01 20:25       ` Nikita V. Youshchenko
@ 2004-04-01 20:46         ` Denis Vlasenko
  0 siblings, 0 replies; 6+ messages in thread
From: Denis Vlasenko @ 2004-04-01 20:46 UTC (permalink / raw)
  To: Nikita V. Youshchenko, linux-kernel; +Cc: ghost, bahmurov

On Thursday 01 April 2004 23:25, Nikita V. Youshchenko wrote:
> > > As far as I understand, in case of threads SIGRT_1 is used instead of
> > > SIGCHLD.
> > > So I tried to send SIGRT_1 to the parent manually. And zombies
> > > disappeared! However, new zombies appear soon. They may still be
> > > removed by manual SIGRT_1, but it is not a solution for a kernel bug
> > >
> > > :).
> >
> > Maybe. Maybe not. I am no expert, I'd try to learn out how SIGRT_1
> > is generated in normal case (I suppose kernel does not distinguish
> > between threads and processes, maybe it's done by threading libs?)
>
> I've looked at the kernel source.
> This is what I found.

Good! :)

> - looks like do_notify_parent() from kernel/signal.c is called to notify
> parent about child termination.
>
> - do_notify_parent() calls __group_send_sig_info() to send the signal, and
> does not check the return code. However, __group_send_sig_info() may fail.
>
> - __group_send_sig_info() calls send_signal()
>
> - send_signal() contains the following code:
>
> 	struct sigqueue * q = NULL;
> ...
> 	if (atomic_read(&nr_queued_signals) < max_queued_signals)
> 		q = kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);
> 	if (q) {
> ...
> 	} else {
> 		if (sig >= SIGRTMIN && info && (unsigned long)info != 1
> 		   && info->si_code != SI_USER)
> 			return -EAGAIN;
> ...
>
> SIGRT_1 = 33, 33 is greater than SIGRTMIN, info is definitely not 0 or 1,
> and info->si_code is definitly not SI_USER on the path related to parent
> process notification.
>
> nr_queued_signals and sigqueue_cachep seem to be local for kernel/signal.c
> file, and code is organized such that nr_queued_signals shows exactly how
> many elements are allocated in sigqueue_cachep.
> max_queued_signals equals to 1024, so it is not allowed to allocate more
> than 1024 elements from sigqueue_cachep.
>
> sigqueue_cachep is initialized in signals_init():
> 	sigqueue_cachep =
> 		kmem_cache_create("sigqueue",
> 				  sizeof(struct sigqueue),
> 				  __alignof__(struct sigqueue),
> 				  0, NULL, NULL);
>
> So I looked into /proc/slabinfo on the server running "zombie-loving"
> kernel, and found the following line:
> nikita@zigzag:/proc> grep sigqueue slabinfo
> sigqueue 1024   1107  144  27  1 : tunables  120  60  8 : slabdata 41 41  0
>
> As far as I understand, the first number in this output is the number of
> elements allocated from "sigqueue" cache. That is, all 1024 elements are
> allocated!
>
> So looks like 'atomic_read(&nr_queued_signals) < max_queued_signals' is
> false, so 'q' is not allocated, and send_signal() returns -EAGAIN while
> trying to send SIGRT_1 to the parent process. This error code is passed
> from __group_send_sig_info() to do_notify_parent(), and just ignored
> there.

Hmm, what can it do there? Maybe only printk(). The question is why
sigqueue gets so big and does not shrink.

> So signal is not delivered, and dying process is left in zombie
> state.
>
> So "something" that happens in the kernel that makes it "zombie-lover" is
> sigqueue overflow.

You found an explanation for why there are zombies. Now, why does it start
to happen? Why does it persist? There must be some code which shrinks
sigqueue. It does not seem to work right.
--
vda

