From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pete Zaitcev
Subject: CLD crash mystery
Date: Thu, 12 Nov 2009 12:02:27 -0700
Message-ID: <20091112120227.14c7a704@redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: Project Hail List

With the last fixes to the timer code, CLD does not crash often anymore,
but it still does, and it's still related to timers.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) where
#0  0x00000000 in ?? ()
#1  0x080511be in timers_run () at util.c:207
#2  0x0804edfa in main (argc=3, argv=0xbff95a14) at server.c:1040

Crash occurs when NULL callback is invoked here:

	tmp = exec_list;
	while (tmp) {
		timer = tmp->data;
		tmp = tmp->next;

		timer->fired = true;
		timer->cb(timer);	<===== crash
	}

This happens because the struct timer *timer is zero-filled,
although the GList does not seem to be corrupt in any way:

(gdb) display timer
5: timer = (struct timer *) 0x98e21c0
(gdb) display *timer
2: *timer = {fired = true, on_list = false, cb = 0, userdata = 0x0, expires = 0, name = '\0' }
(gdb) display exec_list
1: exec_list = (GList *) 0x9922750
(gdb) display * (GList *) 0x9922750
3: * (GList *) 0x9922750 = {data = 0x98e21c0, next = 0x0, prev = 0x0}
(gdb)

The main list looks perfectly OK too:

(gdb) display timer_list
6: timer_list = (GList *) 0x99224d0
(gdb) display * (GList *) 0x99224d0
7: * (GList *) 0x99224d0 = {data = 0x98806f4, next = 0x9922780, prev = 0x0}
(gdb) display * (GList *) 0x9922780
8: * (GList *) 0x9922780 = {data = 0x990229c, next = 0x9922550, prev = 0x99224d0}
(gdb) display * (GList *) 0x9922550
9: * (GList *) 0x9922550 = {data = 0x8054bd8, next = 0x0, prev = 0x9922780}
(gdb) display * (struct timer *) 0x98806f4
10: * (struct timer *) 0x98806f4 = {fired = false, on_list = true, cb = 0x8050620 , userdata = 0x9880678, expires = 1258047830, name = "session-timeout", '\0' }
(gdb) display * (struct timer *) 0x990229c
11: * (struct timer *) 0x990229c = {fired = false, on_list = true, cb = 0x8050620 , userdata = 0x9902220, expires = 1258047835, name = "session-timeout", '\0' }
(gdb) display * (struct timer *) 0x8054bd8
12: * (struct timer *) 0x8054bd8 = {fired = false, on_list = true, cb = 0x804d970 , userdata = 0x0, expires = 1258047951, name = "db4-checkpoint", '\0' }
(gdb)
(gdb) display now
13: now = 1258047814

I just don't see what may trip this. A mystery!

-- 
Pete