From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261321AbVGVQjF (ORCPT ); Fri, 22 Jul 2005 12:39:05 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261322AbVGVQjF (ORCPT ); Fri, 22 Jul 2005 12:39:05 -0400
Received: from gear.torque.net ([204.138.244.1]:39563 "EHLO gear.torque.net") by vger.kernel.org with ESMTP id S261321AbVGVQjC (ORCPT ); Fri, 22 Jul 2005 12:39:02 -0400
Message-ID: <42E120BF.6090504@torque.net>
Date: Fri, 22 Jul 2005 12:37:19 -0400
From: Michael Harris
User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
Subject: PROBLEM: Failure to deliver SIGCHLD
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

[1.] Failure to deliver SIGCHLD

[2.] The problem occurs in a forking server similar in function to inetd. The server employs a very simple SIGCHLD handler that loops on wait(2) until all zombie processes have been collected. For no immediately apparent reason, the parent process begins to behave as if it no longer receives SIGCHLD. Manually sending the signal has no effect. Within minutes the server is incapacitated. If the forking server is terminated, init properly disposes of the zombie processes.

When I attach strace to the parent server, it immediately enters the signal handler and calls wait() for each zombie. Normal function then continues, even after strace is terminated.

We have only seen this problem manifest itself on x86_64 and EM64T multi-processor machines, which is not to say that I believe it is limited to those architectures.

[3.] Keywords: Signal handler, zombie, SIGCHLD, 64 BIT SMP

[4.] Kernel version (from /proc/version):
Linux version 2.6.5-7.97-smp (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Fri Jul 2 14:21:59 UTC 2004

[6.]
This is the code for the signal handler in the server application. I dare say it contains no bugs (famous last words).

void reaper_man (int signum)
{
    int stat;
    while ( waitpid(-1, &stat, WNOHANG) > 0 );
}

signal (SIGCHLD, reaper_man); /* from main() */

[7.1.] Software:
Gnu C                  3.3.3
Gnu make               3.80
binutils               2.15.90.0.1.1
util-linux             2.12
mount                  2.12
module-init-tools      3.0-pre10
e2fsprogs              1.34
jfsutils               1.1.6
xfsprogs               2.6.3
PPP                    2.4.2
isdn4k-utils           3.4
nfs-utils              1.0.6
Linux C Library        x 1 root root 1398085 Jun 30 2004 /lib64/tls/libc.so.6
Dynamic linker (ldd)   2.3.3
Linux C++ Library      5.0.6
Procps                 3.2.1
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1

[7.2.] Processor information (from /proc/cpuinfo):
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 248
stepping : 8
cpu MHz : 2193.511
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 4325.37
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 248
stepping : 8
cpu MHz : 2193.511
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 4374.52
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp

[7.3.]
Module information (from /proc/modules):
hid 45952 0
joydev 12416 0
sg 43576 0
st 43428 0
sr_mod 19492 0
usbserial 35704 0
parport_pc 41152 0
lp 13352 0
parport 47372 2 parport_pc,lp
hw_random 7336 0
ohci_hcd 22532 0
evdev 11776 0
thermal 15500 0
processor 20128 1 thermal
fan 5640 0
button 8480 0
battery 10760 0
ac 6664 0
ipv6 288760 19
tg3 75780 0
usbcore 124656 5 hid,usbserial,ohci_hcd
subfs 10496 1
dm_mod 57920 0
reiserfs 256752 4
mptscsih 47064 0
mptbase 49568 1 mptscsih
sd_mod 22400 0
scsi_mod 131584 5 sg,st,sr_mod,mptscsih,sd_

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

ioports:
0000-001f : dma1
0020-0021 : pic1
0040-005f : timer
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
03c0-03df : vesafb
03f6-03f6 : ide0
03f8-03ff : serial
0cf8-0cff : PCI conf1
1000-101f : 0000:00:07.2
1020-102f : 0000:00:07.1
  1020-1027 : ide0
  1028-102f : ide1
2000-2fff : PCI Bus #01
  2000-20ff : 0000:01:05.0
3000-3fff : PCI Bus #02
  3000-30ff : 0000:02:02.0
8008-800b : ACPI timer
8010-8015 : ACPI CPU throttle

iomem:
00000000-0009b7ff : System RAM
0009b800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000c97ff : Extension ROM
000c9800-000cafff : Extension ROM
000cb000-000cb7ff : Extension ROM
000f0000-000fffff : System ROM
00100000-fbf6ffff : System RAM
  00100000-003423ab : Kernel code
  003423ac-0041c4ec : Kernel data
fbf70000-fbf7afff : ACPI Tables
fbf7b000-fbf7ffff : ACPI Non-volatile Storage
fbf80000-fbffffff : reserved
fc000000-fc000fff : 0000:00:0a.1
fc001000-fc001fff : 0000:00:0b.1
fc100000-fdffffff : PCI Bus #01
  fc100000-fc100fff : 0000:01:00.0
    fc100000-fc100fff : ohci_hcd
  fc101000-fc101fff : 0000:01:00.1
    fc101000-fc101fff : ohci_hcd
  fc102000-fc102fff : 0000:01:05.0
  fd000000-fdffffff : 0000:01:05.0
    fd000000-fd7effff : vesafb
fe000000-fe0fffff : PCI Bus #02
  fe000000-fe00ffff : 0000:02:01.0
    fe000000-fe00ffff : tg3
  fe010000-fe01ffff : 0000:02:01.0
    fe010000-fe01ffff : tg3
  fe020000-fe02ffff : 0000:02:01.1
    fe020000-fe02ffff : tg3
  fe030000-fe03ffff : 0000:02:01.1
    fe030000-fe03ffff : tg3
  fe040000-fe04ffff : 0000:02:02.0
  fe050000-fe05ffff : 0000:02:02.0
fec00000-fec003ff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved

[7.5.] PCI information ('lspci -vvv' as root)

0000:00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- Reset- FastB2B-
	Capabilities: [c0] #08 [0086]
	Capabilities: [f0] #08 [8000]

0000:00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
	Subsystem: Advanced Micro Devices [AMD] AMD-8111 LPC
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Reset- FastB2B-
	Capabilities: [a0]
	Capabilities: [b8] #08 [8000]
	Capabilities: [c0] #08 [004a]

0000:00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01) (prog-if 10 [IO-APIC])
	Subsystem: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- Reset- FastB2B-
	Capabilities: [a0]
	Capabilities: [b8] #08 [8000]

0000:00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01) (prog-if 10 [IO-APIC])
	Subsystem: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort-
SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-