From: Jim Fehlig <jfehlig@suse.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: xen-devel@lists.xensource.com, Ian Campbell <ian.campbell@citrix.com>
Subject: Re: [PATCH 00/12] libxl: fork: SIGCHLD flexibility
Date: Wed, 22 Jan 2014 21:05:39 -0700
Message-ID: <52E09513.6060603@suse.com>
In-Reply-To: <52DF57E2.2090602@suse.com>
Jim Fehlig wrote:
> Ian Jackson wrote:
>
>> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
>>
>>
>>> I let this run over the weekend and today noticed libvirtd was deadlocked
>>>
>>>
>> I have just retested xl with:
>> * my 3-patch 4.4 fixes series
>> * v2 of my fork series
>> * the extra mutex patch "libxl: fork: Fixup SIGCHLD sharing"
>> * "13/12" and "14/12" just posted
>> and it WFM.
>>
>> Of course I don't have the same setup as Jim.
>>
>> Jim: if it's not too much trouble, I'd appreciate it if you could try
>> that combination.
>>
>> For your convenience you can find a git branch of it at
>> http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=shortlog;h=refs/tags/wip.enumerate-pids-v2.1
>> aka
>> git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
>>
>>
>
> I've been testing this branch and notice an occasional libvirtd segfault
> that always occurs when calling libxl_domain_create_restore(). By
> occasional, I mean my save/restore script might cause the segfault after
> 2 iterations, or 20 iterations, or ... But the segfault always occurs
> in libxl_domain_create_restore()
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffeef59700 (LWP 12083)]
> 0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
> at util/virobject.c:362
> 362 return virClassIsDerivedFrom(obj->klass, klass);
> (gdb) bt
> #0 0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
> at util/virobject.c:362
> #1 0x00007ffff745765b in virObjectLock (anyobj=0x2f302f6e69616d6f) at
> util/virobject.c:314
> #2 0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
> hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> #3 0x00007fffe96f8fed in time_deregister (gc=0x7fffeef58220,
> ev=0x5555559eee48)
> at libxl_event.c:294
> #4 0x00007fffe96facfd in afterpoll_internal (egc=0x7fffeef58220,
> poller=0x5555559a4c70, nfds=3,
> fds=0x5555559c09d0, now=...) at libxl_event.c:1008
> #5 0x00007fffe96fc312 in eventloop_iteration (egc=0x7fffeef58220,
> poller=0x5555559a4c70)
> at libxl_event.c:1455
> #6 0x00007fffe96fce58 in libxl__ao_inprogress (ao=0x5555559e9690,
> file=0x7fffe970fadb "libxl_create.c", line=1356,
> func=0x7fffe97105f0 <__func__.16344> "do_domain_create") at
> libxl_event.c:1700
> #7 0x00007fffe96d711f in do_domain_create (ctx=0x5555559d9fa0,
> d_config=0x7fffeef58490,
> domid=0x7fffeef5840c, restore_fd=89, checkpointed_stream=0,
> ao_how=0x0, aop_console_how=0x0)
> at libxl_create.c:1356
> #8 0x00007fffe96d7238 in libxl_domain_create_restore
> (ctx=0x5555559d9fa0, d_config=0x7fffeef58490,
> domid=0x7fffeef5840c, restore_fd=89, params=0x7fffeef58400,
> ao_how=0x0, aop_console_how=0x0)
> at libxl_create.c:1387
> #...
> (gdb) f 2
> #2 0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
> hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> 302 virObjectLock(info->priv);
> (gdb) p info->priv
> $3 = (libxlDomainObjPrivatePtr) 0x2f302f6e69616d6f
> (gdb) f 9
> #9 0x00007fffe993f2c7 in libxlVmStart (driver=0x5555558c2e50,
> vm=0x5555558e6a50,
> start_paused=false, restore_fd=89) at libxl/libxl_driver.c:635
> 635 res = libxl_domain_create_restore(priv->ctx, &d_config,
> &domid,
> (gdb) p priv
> $2 = (libxlDomainObjPrivatePtr) 0x5555558fc310
>
> It looks like the libxlDomainObjPrivatePtr, stashed as part of
> for_app_registration_out when registering the timeout, has been
> trampled. Not sure if the problem is in libvirt or libxl, but it is
> late here and I'm calling it a night :).
>
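An aside on the bad pointer in the trace above: 0x2f302f6e69616d6f consists entirely of printable ASCII bytes. The sketch below (the helper name ptr_bytes_as_text is made up for illustration) lays the value out in memory order, least significant byte first as on x86-64. The result looks like a fragment of a path, which would be consistent with the freed block having been reused for a string, though that reading is an assumption and not something the trace proves.

```c
#include <stdint.h>

/* Hypothetical helper: copy the 8 bytes of a pointer-sized value into
 * out[] in memory order (least significant byte first, as on x86-64)
 * and NUL-terminate the result so it can be printed as text. */
static void ptr_bytes_as_text(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (char)((value >> (8 * i)) & 0xff);
    out[8] = '\0';
}
```

Calling ptr_bytes_as_text(0x2f302f6e69616d6fULL, buf) leaves "omain/0/" in buf.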
It appears the timeout_modify callback is invoked on a timeout that has
already been deregistered. I didn't notice the segfault when running
libvirtd under valgrind, but did see:
==14653== Invalid read of size 8
==14653== at 0x134ACD1C: libxlDomainObjTimeoutModifyEventHook (libxl_domain.c:309)
==14653== by 0x13730FEC: time_deregister (libxl_event.c:294)
==14653== by 0x13732CFC: afterpoll_internal (libxl_event.c:1008)
==14653== by 0x13734311: eventloop_iteration (libxl_event.c:1455)
==14653== by 0x13734E57: libxl__ao_inprogress (libxl_event.c:1700)
==14653== by 0x1370F11E: do_domain_create (libxl_create.c:1356)
==14653== by 0x1370F237: libxl_domain_create_restore (libxl_create.c:1387)
==14653== by 0x134AF332: libxlVmStart (libxl_driver.c:635)
==14653== by 0x134B382A: libxlDomainRestoreFlags (libxl_driver.c:2047)
==14653== by 0x134B3975: libxlDomainRestore (libxl_driver.c:2070)
==14653== by 0x53B5AC7: virDomainRestore (libvirt.c:2678)
==14653== by 0x130ADC: remoteDispatchDomainRestore (remote_dispatch.h:6657)
==14653== Address 0x18000178 is 8 bytes inside a block of size 32 free'd
==14653== at 0x4C28ADC: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14653== by 0x529B08F: virFree (viralloc.c:580)
==14653== by 0x134AC578: libxlDomainObjEventHookInfoFree (libxl_domain.c:110)
==14653== by 0x52BE3DB: virEventPollCleanupTimeouts (vireventpoll.c:535)
==14653== by 0x52BEA4C: virEventPollRunOnce (vireventpoll.c:651)
==14653== by 0x52BC960: virEventRunDefaultImpl (virevent.c:306)
which is consistent with the gdb findings. I've audited the timeout
handling code in libvirt and didn't notice any problems. I'll have some
time tomorrow to continue poking.
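To make the hazard concrete, here is a toy model of what the two traces suggest: the event loop frees the per-timeout hook info while the library still holds it as an opaque handle, and a later modify call dereferences the stale pointer. This is only a sketch under stated assumptions. It is not libvirt or libxl code, every name in it is hypothetical, and the reference counting it uses is a swapped-in illustration of one way to keep such a handle valid, not a fix proposed in this thread.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-timeout info that the application
 * hands to the library as an opaque handle. */
struct hook_info {
    int refs;     /* one for the library's handle, one for the event loop */
    bool active;  /* false once the event loop has dropped the timeout */
    void *priv;   /* application-private data the modify hook would lock */
};

static struct hook_info *hook_info_new(void *priv)
{
    struct hook_info *info = calloc(1, sizeof(*info));
    if (!info)
        return NULL;
    info->refs = 2;
    info->active = true;
    info->priv = priv;
    return info;
}

/* Drop one reference; returns true if the block was freed. */
static bool hook_info_unref(struct hook_info *info)
{
    if (--info->refs > 0)
        return false;
    free(info);
    return true;
}

/* Event-loop side: the timeout is gone, but the library may still hold
 * the handle, so only this side's reference is dropped. */
static bool event_loop_cleanup(struct hook_info *info)
{
    info->active = false;
    return hook_info_unref(info);
}

/* Library side: returns 0 if info->priv may be used, -1 if the timeout
 * was already dropped by the event loop. */
static int timeout_modify_hook(struct hook_info *info)
{
    if (!info->active)
        return -1;
    /* info->priv is still valid here and could be locked */
    return 0;
}
```

With a single owner and a plain free() in event_loop_cleanup(), the second timeout_modify_hook() call would read freed memory, which is the shape of the invalid read valgrind reports above.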
Regards,
Jim