* [Qemu-devel] [STAGING]: Block migration segfaults
@ 2009-12-03 18:21 Luiz Capitulino
2009-12-03 18:36 ` [Qemu-devel] " Jan Kiszka
0 siblings, 1 reply; 6+ messages in thread
From: Luiz Capitulino @ 2009-12-03 18:21 UTC (permalink / raw)
To: jan.kiszka; +Cc: qemu-devel, lirans
Hi there,
Got this while testing block migration in staging:
"""
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
192 if (mon->mc && !mon->mc->print_enabled) {
"""
The problem here is that init_blk_migration() calls monitor_printf() with
a NULL 'mon' and the backtrace shows that this is true for the entire call
chain.
You probably didn't note it before because the lowest-level monitor
print function would just return if the 'mon' parameter was NULL.
A patch from me (4a29a in staging) changes a higher level monitor
function to touch 'mon' before passing it down and here's the segfault.
Now, the point is: I could give the old behavior back but I think we're
hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
Anyways, the following patch adds the old behavior back just in case
you want to see it working...
commit 3575196202d4e54c1fc63a631ca5bd1a7778e30d
Author: Luiz Capitulino <lcapitulino@redhat.com>
Date: Thu Dec 3 15:49:16 2009 -0200
monitor: Fix block migration segfault
The monitor_vprintf() function now touches the 'mon' variable
before calling monitor_puts(), this causes block migration
to segfault as its functions call monitor_printf() with a
NULL 'mon'.
This is probably hiding the real bug, but for some reason this
has been the behavior for a long time.
We also change monitor_print_object() to use monitor_print(),
so that monitor_puts() is only called by monitor_vprintf().
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
diff --git a/monitor.c b/monitor.c
index b035e0b..6d0b1dd 100644
--- a/monitor.c
+++ b/monitor.c
@@ -171,9 +171,6 @@ static void monitor_puts(Monitor *mon, const char *str)
{
char c;
- if (!mon)
- return;
-
for(;;) {
c = *str++;
if (c == '\0')
@@ -189,6 +186,9 @@ static void monitor_puts(Monitor *mon, const char *str)
void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
{
+ if (!mon)
+ return;
+
if (mon->mc && !mon->mc->print_enabled) {
qemu_error_new(QERR_UNDEFINED_ERROR);
} else {
@@ -269,7 +269,7 @@ static void monitor_print_qobject(Monitor *mon, const QObject *data)
break;
}
- monitor_puts(mon, "\n");
+ monitor_printf(mon, "\n");
}
static void monitor_json_emitter(Monitor *mon, const QObject *data)
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] Re: [STAGING]: Block migration segfaults
2009-12-03 18:21 [Qemu-devel] [STAGING]: Block migration segfaults Luiz Capitulino
@ 2009-12-03 18:36 ` Jan Kiszka
2009-12-03 18:59 ` Luiz Capitulino
2009-12-03 19:44 ` Anthony Liguori
0 siblings, 2 replies; 6+ messages in thread
From: Jan Kiszka @ 2009-12-03 18:36 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: qemu-devel@nongnu.org, lirans@il.ibm.com
Luiz Capitulino wrote:
> Hi there,
>
> Got this while testing block migration in staging:
>
> """
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
> ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
> 192 if (mon->mc && !mon->mc->print_enabled) {
> """
>
> The problem here is that init_blk_migration() calls monitor_printf() with
> a NULL 'mon' and the backtrace shows that this is true for the entire call
> chain.
What is the backtrace? And how did you start the migration?
>
> You probably didn't note it before because the lowest-level monitor
> print function would just return if the 'mon' parameter was NULL.
I was aware that mon might be NULL, but the existing code handled this
gracefully.
>
> A patch from me (4a29a in staging) changes a higher level monitor
> function to touch 'mon' before passing it down and here's the segfault.
>
> Now, the point is: I could give the old behavior back but I think we're
> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
If there is no monitor associated with the current context, it can be
NULL. This is mostly the case during early startup. One may also use
this to suppress output (though I don't recall any real case ATM).
>
> Anyways, the following patch adds the old behavior back just in case
> you want to see it working...
Yes, better restore the check. Still, your call stack would be
interesting. Maybe there is actual another bug behind it.
>
> commit 3575196202d4e54c1fc63a631ca5bd1a7778e30d
> Author: Luiz Capitulino <lcapitulino@redhat.com>
> Date: Thu Dec 3 15:49:16 2009 -0200
>
> monitor: Fix block migration segfault
Let's call it by it true name: Catch printing to non-existent monitor.
>
> The monitor_vprintf() function now touches the 'mon' variable
> before calling monitor_puts(), this causes block migration
> to segfault as its functions call monitor_printf() with a
> NULL 'mon'.
>
> This is probably hiding the real bug, but for some reason this
> has been the behavior for a long time.
>
> We also change monitor_print_object() to use monitor_print(),
> so that monitor_puts() is only called by monitor_vprintf().
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
>
> diff --git a/monitor.c b/monitor.c
> index b035e0b..6d0b1dd 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -171,9 +171,6 @@ static void monitor_puts(Monitor *mon, const char *str)
> {
> char c;
>
> - if (!mon)
> - return;
> -
> for(;;) {
> c = *str++;
> if (c == '\0')
> @@ -189,6 +186,9 @@ static void monitor_puts(Monitor *mon, const char *str)
>
> void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
> {
> + if (!mon)
> + return;
> +
> if (mon->mc && !mon->mc->print_enabled) {
> qemu_error_new(QERR_UNDEFINED_ERROR);
> } else {
> @@ -269,7 +269,7 @@ static void monitor_print_qobject(Monitor *mon, const QObject *data)
> break;
> }
>
> - monitor_puts(mon, "\n");
> + monitor_printf(mon, "\n");
> }
>
> static void monitor_json_emitter(Monitor *mon, const QObject *data)
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] Re: [STAGING]: Block migration segfaults
2009-12-03 18:36 ` [Qemu-devel] " Jan Kiszka
@ 2009-12-03 18:59 ` Luiz Capitulino
2009-12-03 19:10 ` Jan Kiszka
2009-12-03 19:44 ` Anthony Liguori
1 sibling, 1 reply; 6+ messages in thread
From: Luiz Capitulino @ 2009-12-03 18:59 UTC (permalink / raw)
To: Jan Kiszka; +Cc: qemu-devel@nongnu.org, lirans@il.ibm.com
On Thu, 03 Dec 2009 19:36:15 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Luiz Capitulino wrote:
> > Hi there,
> >
> > Got this while testing block migration in staging:
> >
> > """
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
> > ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
> > 192 if (mon->mc && !mon->mc->print_enabled) {
> > """
> >
> > The problem here is that init_blk_migration() calls monitor_printf() with
> > a NULL 'mon' and the backtrace shows that this is true for the entire call
> > chain.
>
> What is the backtrace? And how did you start the migration?
Started the source VM with:
# qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
-balloon virtio -m 1G -S
and the destination one with:
# qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
-balloon virtio -m 1G -S -incoming tcp:0:4444
Migration command issued:
(QEMU) migrate -d -b tcp:0:4444
I have no idea if this correct and wondered if specifying the same image
for the destination VM would be catastrophic. :)
The backtrace follows and it's the source VM which segfaults:
"""
#0 0x0000000000410c11 in monitor_vprintf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n",
#1 0x0000000000410d59 in monitor_printf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n")
#2 0x00000000004e584b in init_blk_migration (mon=0x0, f=0x2864130) at block-migration.c:254
#3 0x00000000004e5d8a in block_save_live (mon=0x0, f=0x2864130, stage=1, opaque=0xbea960)
#4 0x00000000004db61a in qemu_savevm_state_begin (mon=0x0, f=0x2864130, blk_enable=1, shared=0)
#5 0x00000000004d2448 in migrate_fd_connect (s=0x20bf470) at migration.c:279
#6 0x00000000004d2849 in tcp_wait_for_connect (opaque=0x20bf470) at migration-tcp.c:72
#7 0x000000000040c3f8 in main_loop_wait (timeout=5000) at /home/lcapitulino/src/aliguori-queue/vl.c:3875
#8 0x000000000040ca17 in main_loop () at /home/lcapitulino/src/aliguori-queue/vl.c:4095
#9 0x00000000004104ce in main (argc=10, argv=0x7fff6f838bf8, envp=0x7fff6f838c50)
"""
[...]
> > A patch from me (4a29a in staging) changes a higher level monitor
> > function to touch 'mon' before passing it down and here's the segfault.
> >
> > Now, the point is: I could give the old behavior back but I think we're
> > hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
>
> If there is no monitor associated with the current context, it can be
> NULL. This is mostly the case during early startup.
Why would one call monitor_printf() before monitor initialization?
> One may also use
> this to suppress output (though I don't recall any real case ATM).
I would prefer functions like monitor_disable()/monitor_enable()
for this...
> > Anyways, the following patch adds the old behavior back just in case
> > you want to see it working...
>
> Yes, better restore the check. Still, your call stack would be
> interesting. Maybe there is actual another bug behind it.
Ok, I'll send the fix because it's not that important right now,
but I'm not convinced this is the right thing to do.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] Re: [STAGING]: Block migration segfaults
2009-12-03 18:59 ` Luiz Capitulino
@ 2009-12-03 19:10 ` Jan Kiszka
2009-12-03 19:47 ` Anthony Liguori
0 siblings, 1 reply; 6+ messages in thread
From: Jan Kiszka @ 2009-12-03 19:10 UTC (permalink / raw)
To: Luiz Capitulino; +Cc: qemu-devel@nongnu.org, lirans@il.ibm.com
[-- Attachment #1: Type: text/plain, Size: 3728 bytes --]
Luiz Capitulino wrote:
> On Thu, 03 Dec 2009 19:36:15 +0100
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
>> Luiz Capitulino wrote:
>>> Hi there,
>>>
>>> Got this while testing block migration in staging:
>>>
>>> """
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
>>> ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
>>> 192 if (mon->mc && !mon->mc->print_enabled) {
>>> """
>>>
>>> The problem here is that init_blk_migration() calls monitor_printf() with
>>> a NULL 'mon' and the backtrace shows that this is true for the entire call
>>> chain.
>> What is the backtrace? And how did you start the migration?
>
> Started the source VM with:
>
> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
> -balloon virtio -m 1G -S
>
> and the destination one with:
>
> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
> -balloon virtio -m 1G -S -incoming tcp:0:4444
>
> Migration command issued:
>
> (QEMU) migrate -d -b tcp:0:4444
>
Ah, forgot '-d'! Yes, that was precisely the use case for suppressing
monitor output I was talking about. This depends on the monitor services
catching NULL properly, so please push the corresponding patch.
>
> I have no idea if this correct and wondered if specifying the same image
> for the destination VM would be catastrophic. :)
>
> The backtrace follows and it's the source VM which segfaults:
>
> """
> #0 0x0000000000410c11 in monitor_vprintf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n",
> #1 0x0000000000410d59 in monitor_printf (mon=0x0, fmt=0x5ada87 "Start full migration for %s\n")
> #2 0x00000000004e584b in init_blk_migration (mon=0x0, f=0x2864130) at block-migration.c:254
> #3 0x00000000004e5d8a in block_save_live (mon=0x0, f=0x2864130, stage=1, opaque=0xbea960)
> #4 0x00000000004db61a in qemu_savevm_state_begin (mon=0x0, f=0x2864130, blk_enable=1, shared=0)
> #5 0x00000000004d2448 in migrate_fd_connect (s=0x20bf470) at migration.c:279
> #6 0x00000000004d2849 in tcp_wait_for_connect (opaque=0x20bf470) at migration-tcp.c:72
> #7 0x000000000040c3f8 in main_loop_wait (timeout=5000) at /home/lcapitulino/src/aliguori-queue/vl.c:3875
> #8 0x000000000040ca17 in main_loop () at /home/lcapitulino/src/aliguori-queue/vl.c:4095
> #9 0x00000000004104ce in main (argc=10, argv=0x7fff6f838bf8, envp=0x7fff6f838c50)
> """
>
> [...]
>
>>> A patch from me (4a29a in staging) changes a higher level monitor
>>> function to touch 'mon' before passing it down and here's the segfault.
>>>
>>> Now, the point is: I could give the old behavior back but I think we're
>>> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
>> If there is no monitor associated with the current context, it can be
>> NULL. This is mostly the case during early startup.
>
> Why would one call monitor_printf() before monitor initialization?
Via shared code e.g.
>
>> One may also use
>> this to suppress output (though I don't recall any real case ATM).
>
> I would prefer functions like monitor_disable()/monitor_enable()
> for this...
They have different meanings.
>
>>> Anyways, the following patch adds the old behavior back just in case
>>> you want to see it working...
>> Yes, better restore the check. Still, your call stack would be
>> interesting. Maybe there is actual another bug behind it.
>
> Ok, I'll send the fix because it's not that important right now,
> but I'm not convinced this is the right thing to do.
Thanks,
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] Re: [STAGING]: Block migration segfaults
2009-12-03 18:36 ` [Qemu-devel] " Jan Kiszka
2009-12-03 18:59 ` Luiz Capitulino
@ 2009-12-03 19:44 ` Anthony Liguori
1 sibling, 0 replies; 6+ messages in thread
From: Anthony Liguori @ 2009-12-03 19:44 UTC (permalink / raw)
To: Jan Kiszka; +Cc: qemu-devel@nongnu.org, lirans@il.ibm.com, Luiz Capitulino
Jan Kiszka wrote:
> Luiz Capitulino wrote:
>
>> Hi there,
>>
>> Got this while testing block migration in staging:
>>
>> """
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
>> ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
>> 192 if (mon->mc && !mon->mc->print_enabled) {
>> """
>>
>> The problem here is that init_blk_migration() calls monitor_printf() with
>> a NULL 'mon' and the backtrace shows that this is true for the entire call
>> chain.
>>
>
> What is the backtrace? And how did you start the migration?
>
>
>> You probably didn't note it before because the lowest-level monitor
>> print function would just return if the 'mon' parameter was NULL.
>>
>
> I was aware that mon might be NULL, but the existing code handled this
> gracefully.
>
>
>> A patch from me (4a29a in staging) changes a higher level monitor
>> function to touch 'mon' before passing it down and here's the segfault.
>>
>> Now, the point is: I could give the old behavior back but I think we're
>> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
>>
>
> If there is no monitor associated with the current context, it can be
> NULL. This is mostly the case during early startup. One may also use
> this to suppress output (though I don't recall any real case ATM).
>
I'm a bit concerned with this explanation as there is no reason
something should be printing to the monitor unless it's in response to a
monitor command. I'd like to see the full call chain to see what's
happening too.
>> Anyways, the following patch adds the old behavior back just in case
>> you want to see it working...
>>
>
> Yes, better restore the check. Still, your call stack would be
> interesting. Maybe there is actual another bug behind it.
>
I think we should have an assert in this path because mon==NULL is
definitely wrong.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] Re: [STAGING]: Block migration segfaults
2009-12-03 19:10 ` Jan Kiszka
@ 2009-12-03 19:47 ` Anthony Liguori
0 siblings, 0 replies; 6+ messages in thread
From: Anthony Liguori @ 2009-12-03 19:47 UTC (permalink / raw)
To: Jan Kiszka; +Cc: qemu-devel@nongnu.org, lirans@il.ibm.com, Luiz Capitulino
Jan Kiszka wrote:
> Luiz Capitulino wrote:
>
>> On Thu, 03 Dec 2009 19:36:15 +0100
>> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>
>>
>>> Luiz Capitulino wrote:
>>>
>>>> Hi there,
>>>>
>>>> Got this while testing block migration in staging:
>>>>
>>>> """
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
>>>> ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
>>>> 192 if (mon->mc && !mon->mc->print_enabled) {
>>>> """
>>>>
>>>> The problem here is that init_blk_migration() calls monitor_printf() with
>>>> a NULL 'mon' and the backtrace shows that this is true for the entire call
>>>> chain.
>>>>
>>> What is the backtrace? And how did you start the migration?
>>>
>> Started the source VM with:
>>
>> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
>> -balloon virtio -m 1G -S
>>
>> and the destination one with:
>>
>> # qemu -hda disks/fedora-11-kratos-i386.img -enable-kvm -snapshot \
>> -balloon virtio -m 1G -S -incoming tcp:0:4444
>>
>> Migration command issued:
>>
>> (QEMU) migrate -d -b tcp:0:4444
>>
>>
>
> Ah, forgot '-d'! Yes, that was precisely the use case for suppressing
> monitor output I was talking about. This depends on the monitor services
> catching NULL properly, so please push the corresponding patch.
>
Man I really hate detach. Okay, I agree re: a patch.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-12-03 19:47 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-03 18:21 [Qemu-devel] [STAGING]: Block migration segfaults Luiz Capitulino
2009-12-03 18:36 ` [Qemu-devel] " Jan Kiszka
2009-12-03 18:59 ` Luiz Capitulino
2009-12-03 19:10 ` Jan Kiszka
2009-12-03 19:47 ` Anthony Liguori
2009-12-03 19:44 ` Anthony Liguori
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).