* wtf bug of the day.
@ 2013-06-29 2:24 Dave Jones
2013-06-29 6:31 ` Jann Horn
2013-07-01 8:18 ` Michael Ellerman
0 siblings, 2 replies; 3+ messages in thread
From: Dave Jones @ 2013-06-29 2:24 UTC (permalink / raw)
To: trinity
I've been holding off on cutting a new release of trinity until I've nailed
this one last bug[1].
When it happens, the watchdog process is in Z state, and the child processes
are all blocked on sockets (and no progress is made because the watchdog died).
In the one case I've managed to catch a core from the watchdog, it makes no damn sense..
Program terminated with signal 8, Arithmetic exception.
#0 check_shm_sanity () at watchdog.c:47
if (shm->running_childs == 0)
what the hell does that even mean ?
'shm' is valid, shm->running_childs is '4'.
Any ideas ?
Dave
[1] Until the next bug.
* Re: wtf bug of the day.
From: Jann Horn @ 2013-06-29 6:31 UTC (permalink / raw)
To: trinity
On Fri, Jun 28, 2013 at 10:24:20PM -0400, Dave Jones wrote:
> I've been holding off on cutting a new release of trinity until I've nailed
> this one last bug[1].
>
> When it happens, the watchdog process is in Z state, and the child processes
> are all blocked on sockets (and no progress is made because the watchdog died).
>
> In the one case I've managed to catch a core from the watchdog, it makes no damn sense..
>
> Program terminated with signal 8, Arithmetic exception.
> #0 check_shm_sanity () at watchdog.c:47
> if (shm->running_childs == 0)
>
> what the hell does that even mean ?
>
> 'shm' is valid, shm->running_childs is '4'.
>
> Any ideas ?
Could you post the disassembly of that function (including the start address)? Also, in case
you still have the coredump, could you post the exact address at which the error occurred?
That signal is usually only triggered by arithmetic errors (despite the name, integer faults
such as a division by zero count too), but maybe trinity somehow ended up doing a
kill(watchdog, 8) accidentally? I'm not very familiar with trinity, so I don't know whether
that's possible.
* Re: wtf bug of the day.
From: Michael Ellerman @ 2013-07-01 8:18 UTC (permalink / raw)
To: Dave Jones; +Cc: trinity
On Fri, 2013-06-28 at 22:24 -0400, Dave Jones wrote:
> I've been holding off on cutting a new release of trinity until I've nailed
> this one last bug[1].
>
> When it happens, the watchdog process is in Z state, and the child processes
> are all blocked on sockets (and no progress is made because the watchdog died).
>
> In the one case I've managed to catch a core from the watchdog, it makes no damn sense..
>
> Program terminated with signal 8, Arithmetic exception.
> #0 check_shm_sanity () at watchdog.c:47
> if (shm->running_childs == 0)
>
> what the hell does that even mean ?
>
> 'shm' is valid, shm->running_childs is '4'.
>
> Any ideas ?
You could add a SIGFPE handler and check whether it's coming from
another process or not.
Something like:
#include <signal.h>
#include <stdio.h>

/* si_pid and si_uid are only meaningful if the signal was sent by a process. */
void sighandler(int signal, siginfo_t *siginfo, void *ucontext)
{
	printf("Took signal %d\n", signal);
	printf("Sent by process %d (uid %d)\n",
	       siginfo->si_pid, siginfo->si_uid);
}

struct sigaction sigfpe_action = {
	.sa_sigaction = sighandler,
	.sa_flags = SA_SIGINFO,
};

/* Somewhere in the watchdog's startup code: */
if (sigaction(SIGFPE, &sigfpe_action, NULL)) {
	perror("sigaction");
	return 1;
}
cheers