* libvirt libxl timer handling issue
From: Jim Fehlig @ 2014-01-31 6:01 UTC
To: Ian Jackson, xen-devel@lists.xen.org
Hi Ian,
I hit a libvirtd segfault after ~7000 iterations of my test scripts.
Oddly, after restarting libvirtd, I now see the segfault after only a
few iterations. It seems to occur when shutting down a domain, and
always at the same spot:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff74567c5 in virClassIsDerivedFrom (klass=0x4545454545454545,
parent=0x5555558a1230)
at util/virobject.c:166
166 if (klass->magic == parent->magic)
(gdb) bt
#0 0x00007ffff74567c5 in virClassIsDerivedFrom
(klass=0x4545454545454545, parent=0x5555558a1230)
at util/virobject.c:166
#1 0x00007ffff7456f0a in virObjectIsClass (anyobj=0x5555559e78a0,
klass=0x5555558a1230)
at util/virobject.c:362
#2 0x00007ffff7456d63 in virObjectLock (anyobj=0x5555559e78a0) at
util/virobject.c:314
#3 0x00007fffe993d3ad in libxlDomainObjTimerCallback (timer=31,
timer_info=0x5555559cbed0)
at libxl/libxl_domain.c:214
#4 0x00007ffff742f5b3 in virEventPollDispatchTimeouts () at
util/vireventpoll.c:451
#5 0x00007ffff7430125 in virEventPollRunOnce () at util/vireventpoll.c:644
#6 0x00007ffff742e061 in virEventRunDefaultImpl () at util/virevent.c:306
#7 0x00007ffff75b7531 in virNetServerRun (srv=0x555555896360) at
rpc/virnetserver.c:1112
#8 0x000055555556b6f8 in main (argc=2, argv=0x7fffffffe2b8) at
libvirtd.c:1517
(gdb) f 3
(gdb) p *info
$1 = {next = 0x0, priv = 0x5555559e78a0, xl_priv = 0x5555559de360, id =
31, in_callback = false,
dereg = true}
(gdb) p *info->priv
$2 = {parent = {parent = {u = {dummy_align1 = 93824997010160,
dummy_align2 = 0x5555559e2af0, s = {
magic = 1436429040, refs = 21845}}, klass =
0x4545454545454545}, lock = {lock = {__data = {
__lock = 1162167621, __count = 1162167621, __owner =
1162167621, __nusers = 1162167621,
__kind = 1162167621, __spins = 17733, __elision = 17733,
__list = {
__prev = 0x4545454545454545, __next = 0x4545454545454545}},
__size = 'E' <repeats 40 times>, __align = 4991471925827290437}}},
logger_file = 0x4545454545454545, logger = 0xcbababababababa, ctx =
0x21, devs = 0x5555559e2b40,
deathW = 0x4545454545454545}
It's not clear to me how the for_app_registration blob is being
trampled. I did notice that the timeout_modify hook is called twice for
some timeouts, once from afterpoll_internal and once from
libxl__ev_time_deregister. Should libxl apps handle multiple calls to
timeout_modify for the same timer?
On the bright side, I seem to have the fd event handling issues sorted out.
Regards,
Jim
* Re: libvirt libxl timer handling issue
From: Ian Jackson @ 2014-01-31 12:17 UTC
To: Jim Fehlig; +Cc: xen-devel@lists.xen.org
Jim Fehlig writes ("libvirt libxl timer handling issue"):
> I hit a libvirtd segfault after ~7000 iterations of my test scripts.
> Oddly, after restarting libvirtd, I now see the segfault after only a
> few iterations. It seems to occur when shutting down a domain, and
> always at the same spot
...
> Its not clear to me how the for_app_registration blob is being
> trampled. I did notice that the timeout_modify hook is called twice for
> some timeouts, once from afterpoll_internal and once from
> libxl__ev_time_deregister. Should libxl apps handle multiple calls to
> timeout_modify for the same timer?
Yes, multiple calls to timeout_modify are supposed to work. Is that
possibly the root cause of your crash?
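To illustrate what "supposed to work" asks of the application, here is a minimal sketch of a hook that tolerates repeated calls. Everything in it is hypothetical: `struct app_timer`, `app_timeout_modify` and `app_timeout_deregister` are illustrative stand-ins, not libvirt's code and not the real libxl hook signatures. The assumption is simply that modify only overwrites the deadline, and that any teardown of the record that for_app_registration points to happens once, in deregister.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical application-side record that for_app_registration would point to. */
struct app_timer {
    struct timeval deadline;   /* absolute time at which the timer should fire */
    bool armed;                /* true while the timer is scheduled */
    int refs;                  /* references held on this record */
};

/*
 * Hypothetical stand-in for an application's timeout_modify hook.
 * It only overwrites the deadline. It never frees the record or drops a
 * reference, so calling it any number of times for the same timer leaves
 * the record in the same valid state.
 */
static int app_timeout_modify(struct app_timer *t, struct timeval abs)
{
    assert(t != NULL);
    t->deadline = abs;
    t->armed = true;
    return 0;
}

/* Teardown happens exactly once, in deregister, never in modify. */
static void app_timeout_deregister(struct app_timer *t)
{
    assert(t != NULL && t->refs > 0);
    t->armed = false;
    t->refs--;
}
```

With this shape, two modify calls for the same timer (say, one from afterpoll_internal and one from libxl__ev_time_deregister) leave `refs` untouched, and only the single deregister call releases the record.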
> On the bright side, I seem to have the fd event handling issues sorted out.
Good, I guess. Let me look at your crash stacktrace and the libxl
code in more detail...
Ian.
* Re: libvirt libxl timer handling issue
From: Ian Jackson @ 2014-01-31 15:19 UTC
To: Jim Fehlig, xen-devel@lists.xen.org
Ian Jackson writes ("Re: libvirt libxl timer handling issue"):
> Jim Fehlig writes ("libvirt libxl timer handling issue"):
> > On the bright side, I seem to have the fd event handling issues sorted out.
>
> Good, I guess. Let me look at your crash stacktrace and the libxl
> code in more detail...
I think this is due to libxl_event.c not clearing the ->func member of
its timeout structs when the timeout occurs. TBH it's surprising that
this hasn't caused more trouble. I haven't been able to test this,
so I'm not sure.
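As a rough illustration of the idea (a sketch only, NOT the patches on the branch mentioned below), the following assumes that a non-NULL ->func is what marks a timeout as still registered. All names here (`struct ev_time`, `ev_time_occurs`, `ev_time_deregister`, `hook_calls`, `sample_callback`) are hypothetical miniatures, not the libxl_event.c definitions. If ->func is cleared before the callback runs, a later deregister sees the timeout as already gone and does not call back into the application's hooks a second time.

```c
#include <assert.h>
#include <stddef.h>

struct ev_time;
typedef void ev_time_callback(struct ev_time *ev);

/* Hypothetical miniature of a timeout struct; func != NULL means "registered". */
struct ev_time {
    ev_time_callback *func;
    int hook_calls;            /* counts calls made into the application's hook */
};

/* Deregister: only touches the application's hook if still registered. */
static void ev_time_deregister(struct ev_time *ev)
{
    if (ev->func == NULL)
        return;                /* already fired or never registered */
    ev->hook_calls++;          /* stands in for calling the application's hook */
    ev->func = NULL;
}

/* Called when the timeout occurs. */
static void ev_time_occurs(struct ev_time *ev)
{
    ev_time_callback *func = ev->func;
    assert(func != NULL);
    ev->func = NULL;           /* clear BEFORE the callback: no longer registered */
    func(ev);
}

/* Sample callback that, as many callbacks do, deregisters its own timeout. */
static void sample_callback(struct ev_time *ev)
{
    ev_time_deregister(ev);    /* no-op here, because func was already cleared */
}
```

Without the `ev->func = NULL` line in `ev_time_occurs`, the deregister inside the callback would still see the timeout as registered and would reach the application's hook again for a timer that has already fired.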
But please take a look at
git://xenbits.xen.org/people/iwj/xen.git#wip.timeout-func0
The top two patches there are new; the rest is my fork fixup branch.
Note once again that I have compiled but NOT EXECUTED these two
patches. But since I'm about to be out of touch travelling and then
at FOSDEM I thought I'd send this to you right away.
Sorry if this too turns out to be my fault...
Regards,
Ian.
* Re: libvirt libxl timer handling issue
From: Jim Fehlig @ 2014-01-31 15:47 UTC
To: Ian Jackson; +Cc: xen-devel@lists.xen.org
Ian Jackson wrote:
> Ian Jackson writes ("Re: libvirt libxl timer handling issue"):
>
>> Jim Fehlig writes ("libvirt libxl timer handling issue"):
>>
>>> On the bright side, I seem to have the fd event handling issues sorted out.
>>>
>> Good, I guess. Let me look at your crash stacktrace and the libxl
>> code in more detail...
>>
>
> I think this is due to libxl_event.c not clearing the ->func member of
> its timeout structs when the timeout occurs. TBH it's surprising that
> this hasn't caused more trouble and I haven't been able to test this
> so I'm not sure.
>
> But please take a look at
> git://xenbits.xen.org/people/iwj/xen.git#wip.timeout-func0
>
> The top two patches there are new; the rest is my fork fixup branch.
> Note once again that I have compiled but NOT EXECUTED these two
> patches. But since I'm about to be out of touch travelling and then
> at FOSDEM I thought I'd send this to you right away.
>
Ok, thanks. I'll give this a try in a bit.
> Sorry if this too turns out to be my fault...
>
Well, I should spend some time becoming familiar with this part of libxl
so I can help fix issues too, instead of whining about them :).
Enjoy FOSDEM.
Regards,
Jim
* Re: libvirt libxl timer handling issue
From: Ian Jackson @ 2014-02-03 14:45 UTC
To: Jim Fehlig; +Cc: xen-devel@lists.xen.org
Jim Fehlig writes ("Re: libvirt libxl timer handling issue"):
> Ian Jackson wrote:
> > I think this is due to libxl_event.c not clearing the ->func member of
> > its timeout structs when the timeout occurs. TBH it's surprising that
> > this hasn't caused more trouble and I haven't been able to test this
> > so I'm not sure.
I have now tested this. I can confirm that this bug is real, and I
understand why we haven't spotted it before. My fix is correct, I
think.
> > But please take a look at
> > git://xenbits.xen.org/people/iwj/xen.git#wip.timeout-func0
I have rebased that ref.
> Ok, thanks. I'll give this a try in a bit.
Good luck and please let me know.
> Well, I should spend some time becoming familiar with this part of libxl
> so I can help fix issues too, instead of whining about them :).
TBH I'm hoping that this lot will only need debugging once ...
I'm going to post v3 of my event fixes series RSN.
(This time for sure...)
Ian.