xen-devel.lists.xenproject.org archive mirror
* libvirt libxl timer handling issue
From: Jim Fehlig @ 2014-01-31  6:01 UTC
  To: Ian Jackson, xen-devel@lists.xen.org

Hi Ian,

I hit a libvirtd segfault after ~7000 iterations of my test scripts. 
Oddly, after restarting libvirtd, I now see the segfault after only a
few iterations.  It seems to occur when shutting down a domain, and
always at the same spot:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff74567c5 in virClassIsDerivedFrom (klass=0x4545454545454545, parent=0x5555558a1230)
    at util/virobject.c:166
166             if (klass->magic == parent->magic)
(gdb) bt
#0  0x00007ffff74567c5 in virClassIsDerivedFrom (klass=0x4545454545454545, parent=0x5555558a1230)
    at util/virobject.c:166
#1  0x00007ffff7456f0a in virObjectIsClass (anyobj=0x5555559e78a0, klass=0x5555558a1230)
    at util/virobject.c:362
#2  0x00007ffff7456d63 in virObjectLock (anyobj=0x5555559e78a0) at util/virobject.c:314
#3  0x00007fffe993d3ad in libxlDomainObjTimerCallback (timer=31, timer_info=0x5555559cbed0)
    at libxl/libxl_domain.c:214
#4  0x00007ffff742f5b3 in virEventPollDispatchTimeouts () at util/vireventpoll.c:451
#5  0x00007ffff7430125 in virEventPollRunOnce () at util/vireventpoll.c:644
#6  0x00007ffff742e061 in virEventRunDefaultImpl () at util/virevent.c:306
#7  0x00007ffff75b7531 in virNetServerRun (srv=0x555555896360) at rpc/virnetserver.c:1112
#8  0x000055555556b6f8 in main (argc=2, argv=0x7fffffffe2b8) at libvirtd.c:1517
(gdb) f 3
(gdb) p *info
$1 = {next = 0x0, priv = 0x5555559e78a0, xl_priv = 0x5555559de360, id = 31, in_callback = false,
  dereg = true}
(gdb) p *info->priv
$2 = {parent = {parent = {u = {dummy_align1 = 93824997010160, dummy_align2 = 0x5555559e2af0, s = {
          magic = 1436429040, refs = 21845}}, klass = 0x4545454545454545}, lock = {lock = {__data = {
          __lock = 1162167621, __count = 1162167621, __owner = 1162167621, __nusers = 1162167621,
          __kind = 1162167621, __spins = 17733, __elision = 17733, __list = {
            __prev = 0x4545454545454545, __next = 0x4545454545454545}},
        __size = 'E' <repeats 40 times>, __align = 4991471925827290437}}},
  logger_file = 0x4545454545454545, logger = 0xcbababababababa, ctx = 0x21, devs = 0x5555559e2b40,
  deathW = 0x4545454545454545}

It's not clear to me how the for_app_registration blob is being
trampled.  I did notice that the timeout_modify hook is called twice for
some timeouts, once from afterpoll_internal and once from
libxl__ev_time_deregister.  Should libxl apps handle multiple calls to
timeout_modify for the same timer?

On the bright side, I seem to have the fd event handling issues sorted out.

Regards,
Jim

* Re: libvirt libxl timer handling issue
From: Ian Jackson @ 2014-01-31 12:17 UTC
  To: Jim Fehlig; +Cc: xen-devel@lists.xen.org

Jim Fehlig writes ("libvirt libxl timer handling issue"):
> I hit a libvirtd segfault after ~7000 iterations of my test scripts. 
> Oddly, after restarting libvirtd, I now see the segfault after only a
> few iterations.  It seems to occur when shutting down a domain, and
> always at the same spot
...
> It's not clear to me how the for_app_registration blob is being
> trampled.  I did notice that the timeout_modify hook is called twice for
> some timeouts, once from afterpoll_internal and once from
> libxl__ev_time_deregister.  Should libxl apps handle multiple calls to
> timeout_modify for the same timer?

Yes, multiple calls to timeout_modify are supposed to work.  Is that
possibly the root cause of your crash?

> On the bright side, I seem to have the fd event handling issues sorted out.

Good, I guess.  Let me look at your crash stacktrace and the libxl
code in more detail...

Ian.

* Re: libvirt libxl timer handling issue
From: Ian Jackson @ 2014-01-31 15:19 UTC
  To: Jim Fehlig, xen-devel@lists.xen.org

Ian Jackson writes ("Re: libvirt libxl timer handling issue"):
> Jim Fehlig writes ("libvirt libxl timer handling issue"):
> > On the bright side, I seem to have the fd event handling issues sorted out.
> 
> Good, I guess.  Let me look at your crash stacktrace and the libxl
> code in more detail...

I think this is due to libxl_event.c not clearing the ->func member of
its timeout structs when the timeout occurs.  TBH it's surprising that
this hasn't caused more trouble.  I haven't been able to test this, so
I'm not sure.

But please take a look at
  git://xenbits.xen.org/people/iwj/xen.git#wip.timeout-func0

The top two patches there are new; the rest is my fork fixup branch.
Note once again that I have compiled but NOT EXECUTED these two
patches.  But since I'm about to be out of touch travelling and then
at FOSDEM I thought I'd send this to you right away.

Sorry if this too turns out to be my fault...

Regards,
Ian.

* Re: libvirt libxl timer handling issue
From: Jim Fehlig @ 2014-01-31 15:47 UTC
  To: Ian Jackson; +Cc: xen-devel@lists.xen.org

Ian Jackson wrote:
> Ian Jackson writes ("Re: libvirt libxl timer handling issue"):
>> Jim Fehlig writes ("libvirt libxl timer handling issue"):
>>> On the bright side, I seem to have the fd event handling issues sorted out.
>> Good, I guess.  Let me look at your crash stacktrace and the libxl
>> code in more detail...
>
> I think this is due to libxl_event.c not clearing the ->func member of
> its timeout structs when the timeout occurs.  TBH it's surprising that
> this hasn't caused more trouble and I haven't been able to test this
> so I'm not sure.
>
> But please take a look at
>   git://xenbits.xen.org/people/iwj/xen.git#wip.timeout-func0
>
> The top two patches there are new; the rest is my fork fixup branch.
> Note once again that I have compiled but NOT EXECUTED these two
> patches.  But since I'm about to be out of touch travelling and then
> at FOSDEM I thought I'd send this to you right away.
>   

Ok, thanks. I'll give this a try in a bit.

> Sorry if this too turns out to be my fault...
>   

Well, I should spend some time becoming familiar with this part of libxl
so I can help fix issues too, instead of whining about them :).

Enjoy FOSDEM.

Regards,
Jim

* Re: libvirt libxl timer handling issue
From: Ian Jackson @ 2014-02-03 14:45 UTC
  To: Jim Fehlig; +Cc: xen-devel@lists.xen.org

Jim Fehlig writes ("Re: libvirt libxl timer handling issue"):
> Ian Jackson wrote:
> > I think this is due to libxl_event.c not clearing the ->func member of
> > its timeout structs when the timeout occurs.  TBH it's surprising that
> > this hasn't caused more trouble and I haven't been able to test this
> > so I'm not sure.

I have now tested this.  I can confirm that this bug is real, and I
understand why we haven't spotted it before.  My fix is correct, I
think.

> > But please take a look at
> >   git://xenbits.xen.org/people/iwj/xen.git#wip.timeout-func0

I have rebased that ref.

> Ok, thanks. I'll give this a try in a bit.

Good luck and please let me know.

> Well, I should spend some time becoming familiar with this part of libxl
> so I can help fix issues too, instead of whining about them :).

TBH I'm hoping that this lot will only need debugging once ...

I'm going to post v3 of my event fixes series RSN.
(This time for sure...)

Ian.
