* question about SIGSEGV in datacopier_readable in libxl_aoutil.c
@ 2013-09-03 7:01 Chunyan Liu
2013-09-03 7:56 ` Ian Campbell
0 siblings, 1 reply; 5+ messages in thread
From: Chunyan Liu @ 2013-09-03 7:01 UTC (permalink / raw)
To: xen-devel@lists.xensource.com
[-- Attachment #1.1: Type: text/plain, Size: 4003 bytes --]
Hi, List,
I'm trying to add migration APIs to libvirt libxl driver. In testing HVM
migration, on source side, when executing libxl_domain_suspend, often meet
SIGSEGV in libxl_aoutil.c: datacopier_readable, the malloc() function place:
if (!buf || buf->used >= sizeof(buf->buf)) {
buf = malloc(sizeof(*buf));
I doubt the heap is corrupted someway but couldn't confirm the root cause.
And I tried valgrind to find some clue, following is the info right before
the SIGSEGV.
#valgrind --leak-check=full /usr/sbin/libvirtd -l -d
[snip]
==7510== Syscall param read(buf) points to unaddressable byte(s)
==7510== at 0x8ECC76D: ??? (syscall-template.S:82)
==7510== by 0x14AB3070: datacopier_readable (unistd.h:45)
==7510== by 0x14AB833C: afterpoll_internal (libxl_event.c:995)
==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440)
==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685)
==7510== by 0x14A9ABF7: libxl_domain_suspend (libxl.c:785)
==7510== by 0x148404B3: libxlDomainMigratePerform3 (libxl_driver.c:5100)
==7510== by 0x5390CFA: virDomainMigratePerform3 (libvirt.c:7050)
==7510== by 0x12C262: remoteDispatchDomainMigratePerform3Helper
(remote.c:3507)
==7510== by 0x53EACBE: virNetServerProgramDispatch
(virnetserverprogram.c:435)
==7510== by 0x53EBCBD: virNetServerProcessMsg (virnetserver.c:165)
==7510== by 0x53EC912: virNetServerHandleJob (virnetserver.c:186)
==7510== Address 0x18a409ec is 0 bytes after a block of size 28 alloc'd
==7510== at 0x4C26FFB: calloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83)
==7510== by 0x14AB33B0: libxl__datacopier_prefixdata (libxl_aoutils.c:92)
==7510== by 0x14AA6CDC: libxl__domain_save_device_model
(libxl_dom.c:1447)
==7510== by 0x14AA8097: libxl__xc_domain_save_done (libxl_dom.c:1382)
==7510== by 0x14AB4268: helper_done (libxl_save_callout.c:332)
==7510== by 0x14AB4CA2: helper_exited (libxl_save_callout.c:317)
==7510== by 0x14ABB274: childproc_reaped (libxl_fork.c:264)
==7510== by 0x14ABB97A: libxl__fork_selfpipe_woken (libxl_fork.c:300)
==7510== by 0x14AB83A0: afterpoll_internal (libxl_event.c:1008)
==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440)
==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685)
==7510==
--7510-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) -
exiting
--7510-- si_code=80; Faulting address: 0x0; sp: 0x406ad5da0
I couldn't find a clear problem in the code, but after trying to change the
code a little, it turned to be working.
Following is the change.
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -89,7 +89,8 @@ void libxl__datacopier_prefixdata(libxl_
assert(len < dc->maxsz - dc->used);
- buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
+// buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
+ buf = libxl__zalloc(NOGC, sizeof(libxl__datacopier_buf));
buf->used = len;
memcpy(buf->buf, data, len);
@@ -141,10 +142,11 @@ static void datacopier_readable(libxl__e
libxl__datacopier_buf *buf =
LIBXL_TAILQ_LAST(&dc->bufs, libxl__datacopier_bufs);
if (!buf || buf->used >= sizeof(buf->buf)) {
- buf = malloc(sizeof(*buf));
- if (!buf) libxl__alloc_failed(CTX, __func__, 1, sizeof(*buf));
- buf->used = 0;
- LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
+ libxl__datacopier_buf *newbuf =
malloc(sizeof(libxl__datacopier_buf));
+ if (!newbuf) libxl__alloc_failed(CTX, __func__, 1,
sizeof(libxl__datacopier_buf));
+ newbuf->used = 0;
+ LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, newbuf, entry);
+ buf = newbuf;
}
int r = read(ev->fd,
buf->buf + buf->used,
Could anybody familiar with this part of code take a look at it?
Thanks,
Chunyan
[-- Attachment #1.2: Type: text/html, Size: 4797 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c
2013-09-03 7:01 question about SIGSEGV in datacopier_readable in libxl_aoutil.c Chunyan Liu
@ 2013-09-03 7:56 ` Ian Campbell
2013-09-03 9:23 ` Andrew Cooper
0 siblings, 1 reply; 5+ messages in thread
From: Ian Campbell @ 2013-09-03 7:56 UTC (permalink / raw)
To: Chunyan Liu; +Cc: xen-devel@lists.xensource.com, Ian Jackson
On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote:
> Hi, List,
Adding Ian J, who knows this bit of the code.
Which version of Xen (and therefore libxl) are you using?
> I'm trying to add migration APIs to libvirt libxl driver. In testing
> HVM migration, on source side, when executing libxl_domain_suspend,
> often meet SIGSEGV in libxl_aoutil.c: datacopier_readable, the
> malloc() function place:
> if (!buf || buf->used >= sizeof(buf->buf)) {
> buf = malloc(sizeof(*buf));
>
> I doubt the heap is corrupted someway but couldn't confirm the root
> cause. And I tried valgrind to find some clue, following is the info
> right before the SIGSEGV.
> #valgrind --leak-check=full /usr/sbin/libvirtd -l -d
Did you use a Xen aware version of valgrind?
http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/
> [snip]
>
> ==7510== Syscall param read(buf) points to unaddressable byte(s)
> ==7510== at 0x8ECC76D: ??? (syscall-template.S:82)
> ==7510== by 0x14AB3070: datacopier_readable (unistd.h:45)
> ==7510== by 0x14AB833C: afterpoll_internal (libxl_event.c:995)
> ==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440)
> ==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685)
> ==7510== by 0x14A9ABF7: libxl_domain_suspend (libxl.c:785)
> ==7510== by 0x148404B3: libxlDomainMigratePerform3 (libxl_driver.c:5100)
> ==7510== by 0x5390CFA: virDomainMigratePerform3 (libvirt.c:7050)
> ==7510== by 0x12C262: remoteDispatchDomainMigratePerform3Helper (remote.c:3507)
> ==7510== by 0x53EACBE: virNetServerProgramDispatch (virnetserverprogram.c:435)
> ==7510== by 0x53EBCBD: virNetServerProcessMsg (virnetserver.c:165)
> ==7510== by 0x53EC912: virNetServerHandleJob (virnetserver.c:186)
> ==7510== Address 0x18a409ec is 0 bytes after a block of size 28 alloc'd
> ==7510== at 0x4C26FFB: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83)
> ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata (libxl_aoutils.c:92)
> ==7510== by 0x14AA6CDC: libxl__domain_save_device_model (libxl_dom.c:1447)
> ==7510== by 0x14AA8097: libxl__xc_domain_save_done (libxl_dom.c:1382)
> ==7510== by 0x14AB4268: helper_done (libxl_save_callout.c:332)
> ==7510== by 0x14AB4CA2: helper_exited (libxl_save_callout.c:317)
> ==7510== by 0x14ABB274: childproc_reaped (libxl_fork.c:264)
> ==7510== by 0x14ABB97A: libxl__fork_selfpipe_woken (libxl_fork.c:300)
> ==7510== by 0x14AB83A0: afterpoll_internal (libxl_event.c:1008)
> ==7510== by 0x14AB8F16: eventloop_iteration (libxl_event.c:1440)
> ==7510== by 0x14AB9439: libxl__ao_inprogress (libxl_event.c:1685)
> ==7510==
> --7510-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
> --7510-- si_code=80; Faulting address: 0x0; sp: 0x406ad5da0
>
> I couldn't find a clear problem in the code, but after trying to change the code a little, it turned to be working.
> Following is the change.
>
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -89,7 +89,8 @@ void libxl__datacopier_prefixdata(libxl_
>
> assert(len < dc->maxsz - dc->used);
>
> - buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
> +// buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
> + buf = libxl__zalloc(NOGC, sizeof(libxl__datacopier_buf));
> buf->used = len;
> memcpy(buf->buf, data, len);
>
> @@ -141,10 +142,11 @@ static void datacopier_readable(libxl__e
> libxl__datacopier_buf *buf =
> LIBXL_TAILQ_LAST(&dc->bufs, libxl__datacopier_bufs);
> if (!buf || buf->used >= sizeof(buf->buf)) {
> - buf = malloc(sizeof(*buf));
> - if (!buf) libxl__alloc_failed(CTX, __func__, 1, sizeof(*buf));
> - buf->used = 0;
> - LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
> + libxl__datacopier_buf *newbuf = malloc(sizeof(libxl__datacopier_buf));
> + if (!newbuf) libxl__alloc_failed(CTX, __func__, 1, sizeof(libxl__datacopier_buf));
> + newbuf->used = 0;
> + LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, newbuf, entry);
> + buf = newbuf;
> }
> int r = read(ev->fd,
> buf->buf + buf->used,
>
>
> Could anybody familiar with this part of code take a look at it?
>
>
> Thanks,
> Chunyan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c
2013-09-03 7:56 ` Ian Campbell
@ 2013-09-03 9:23 ` Andrew Cooper
2013-09-03 14:31 ` question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages] Ian Jackson
0 siblings, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2013-09-03 9:23 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel@lists.xensource.com, Ian Jackson, Chunyan Liu
On 03/09/13 08:56, Ian Campbell wrote:
> On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote:
>> Hi, List,
> Adding Ian J, who knows this bit of the code.
>
> Which version of Xen (and therefore libxl) are you using?
>
>> I'm trying to add migration APIs to libvirt libxl driver. In testing
>> HVM migration, on source side, when executing libxl_domain_suspend,
>> often meet SIGSEGV in libxl_aoutil.c: datacopier_readable, the
>> malloc() function place:
>> if (!buf || buf->used >= sizeof(buf->buf)) {
>> buf = malloc(sizeof(*buf));
>>
>> I doubt the heap is corrupted someway but couldn't confirm the root
>> cause. And I tried valgrind to find some clue, following is the info
>> right before the SIGSEGV.
>> #valgrind --leak-check=full /usr/sbin/libvirtd -l -d
> Did you use a Xen aware version of valgrind?
> http://blog.xen.org/index.php/2013/01/18/using-valgrind-to-debug-xen-toolstacks/
In which case I really should post my valgrind patches for full
migration support.
I will try to get around to posting the series today.
~Andrew
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages]
2013-09-03 9:23 ` Andrew Cooper
@ 2013-09-03 14:31 ` Ian Jackson
2013-09-04 8:15 ` Chunyan Liu
0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2013-09-03 14:31 UTC (permalink / raw)
To: Andrew Cooper, Ian Campbell; +Cc: xen-devel@lists.xensource.com, Chunyan Liu
Ian Campbell writes ("Re: [Xen-devel] question about SIGSEGV in datacopier_readable in libxl_aoutil.c"):
> On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote:
> > if (!buf || buf->used >= sizeof(buf->buf)) {
> > buf = malloc(sizeof(*buf));
...
> > ==7510== Syscall param read(buf) points to unaddressable byte(s)
...
> > ==7510== Address 0x18a409ec is 0 bytes after a block of size 28 alloc'd
> > ==7510== at 0x4C26FFB: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83)
> > ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata (libxl_aoutils.c:92)
I think this is my fault. Please try this patch.
Thanks,
Ian.
commit 25cd65c97b733d5892b62c3ffae0887f426398ec
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Sep 3 13:41:46 2013 +0100
libxl: Do not generate short block in libxl__datacopier_prefixdata
libxl__datacopier_prefixdata would prepend a deliberately short block
(not just a half-full one, but one with a short buffer) to the
dc->bufs queue. However, this is wrong because datacopier_readable
will find it and try to continue to fill it up.
Instead, allocate a full-sized buffer.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 983a60a..b4eb6e5 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -89,7 +89,7 @@ void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc,
assert(len < dc->maxsz - dc->used);
- buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
+ buf = libxl__zalloc(NOGC, sizeof(*buf));
buf->used = len;
memcpy(buf->buf, data, len);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages]
2013-09-03 14:31 ` question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages] Ian Jackson
@ 2013-09-04 8:15 ` Chunyan Liu
0 siblings, 0 replies; 5+ messages in thread
From: Chunyan Liu @ 2013-09-04 8:15 UTC (permalink / raw)
To: Ian Jackson; +Cc: Andrew Cooper, xen-devel@lists.xensource.com, Ian Campbell
[-- Attachment #1.1: Type: text/plain, Size: 2102 bytes --]
2013/9/3 Ian Jackson <Ian.Jackson@eu.citrix.com>
> Ian Campbell writes ("Re: [Xen-devel] question about SIGSEGV in
> datacopier_readable in libxl_aoutil.c"):
> > On Tue, 2013-09-03 at 15:01 +0800, Chunyan Liu wrote:
> > > if (!buf || buf->used >= sizeof(buf->buf)) {
> > > buf = malloc(sizeof(*buf));
> ...
> > > ==7510== Syscall param read(buf) points to unaddressable byte(s)
> ...
> > > ==7510== Address 0x18a409ec is 0 bytes after a block of size 28
> alloc'd
> > > ==7510== at 0x4C26FFB: calloc (in
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> > > ==7510== by 0x14AAECB6: libxl__zalloc (libxl_internal.c:83)
> > > ==7510== by 0x14AB33B0: libxl__datacopier_prefixdata
> (libxl_aoutils.c:92)
>
> I think this is my fault. Please try this patch.
>
> Thanks,
> Ian.
>
> commit 25cd65c97b733d5892b62c3ffae0887f426398ec
> Author: Ian Jackson <ian.jackson@eu.citrix.com>
> Date: Tue Sep 3 13:41:46 2013 +0100
>
> libxl: Do not generate short block in libxl__datacopier_prefixdata
>
> libxl__datacopier_prefixdata would prepend a deliberately short block
> (not just a half-full one, but one with a short buffer) to the
> dc->bufs queue. However, this is wrong because datacopier_readable
> will find it and try to continue to fill it up.
>
> Instead, allocate a full-sized buffer.
>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index 983a60a..b4eb6e5 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -89,7 +89,7 @@ void libxl__datacopier_prefixdata(libxl__egc *egc,
> libxl__datacopier_state *dc,
>
> assert(len < dc->maxsz - dc->used);
>
> - buf = libxl__zalloc(NOGC, sizeof(*buf) - sizeof(buf->buf) + len);
> + buf = libxl__zalloc(NOGC, sizeof(*buf));
> buf->used = len;
> memcpy(buf->buf, data, len);
>
>
Tried the patch. It worked. Thanks.
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>
[-- Attachment #1.2: Type: text/html, Size: 3084 bytes --]
[-- Attachment #2: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-09-04 8:15 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-03 7:01 question about SIGSEGV in datacopier_readable in libxl_aoutil.c Chunyan Liu
2013-09-03 7:56 ` Ian Campbell
2013-09-03 9:23 ` Andrew Cooper
2013-09-03 14:31 ` question about SIGSEGV in datacopier_readable in libxl_aoutil.c [and 1 more messages] Ian Jackson
2013-09-04 8:15 ` Chunyan Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).