Re: [Qemu-devel] [RFC PATCH] qcow2: Fix race in cache invalidation

From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "libvir-list @ redhat . com" <libvir-list@redhat.com>,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH] qcow2: Fix race in cache invalidation
Date: Sun, 28 Sep 2014 21:14:13 +1000
Message-ID: <5427ED85.3040909@ozlabs.ru>
In-Reply-To: <54242145.6070808@ozlabs.ru>

On 09/26/2014 12:05 AM, Alexey Kardashevskiy wrote:
> On 09/25/2014 10:39 PM, Kevin Wolf wrote:
>> On 25.09.2014 at 14:29, Alexey Kardashevskiy wrote:
>>> On 09/25/2014 08:20 PM, Kevin Wolf wrote:
>>>> On 25.09.2014 at 11:55, Alexey Kardashevskiy wrote:
>>>>> Right. Cool. So is the below what was suggested? I am double-checking as it does
>>>>> not solve the original issue: the bottom half is called first and then
>>>>> nbd_trip() crashes in qcow2_co_flush_to_os().
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index d06dd51..1e6dfd1 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -5037,20 +5037,22 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
>>>>>      if (local_err) {
>>>>>          error_propagate(errp, local_err);
>>>>>          return;
>>>>>      }
>>>>>
>>>>>      ret = refresh_total_sectors(bs, bs->total_sectors);
>>>>>      if (ret < 0) {
>>>>>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
>>>>>          return;
>>>>>      }
>>>>> +
>>>>> +    bdrv_drain_all();
>>>>>  }
>>>>
>>>> Try moving the bdrv_drain_all() call to the top of the function (at
>>>> least it must be called before bs->drv->bdrv_invalidate_cache).
>>>
>>>
>>> Ok, I did. Did not help.
>>>
>>>
>>>>
>>>>> +static QEMUBH *migration_complete_bh;
>>>>> +static void process_incoming_migration_complete(void *opaque);
>>>>> +
>>>>>  static void process_incoming_migration_co(void *opaque)
>>>>>  {
>>>>>      QEMUFile *f = opaque;
>>>>> -    Error *local_err = NULL;
>>>>>      int ret;
>>>>>
>>>>>      ret = qemu_loadvm_state(f);
>>>>>      qemu_fclose(f);
>>>>
>>>> Paolo suggested moving everything starting from here, but as far as I
>>>> can tell, leaving the next few lines here shouldn't hurt.
>>>
>>>
>>> Ouch. I was looking at the wrong qcow2_fclose() all this time :)
>>> Anyway, what you suggested did not help:
>>> bdrv_co_flush() calls qemu_coroutine_yield() while this BH is being
>>> executed and the situation is still the same.
>>
>> Hm, do you have a backtrace? The idea with the BH was that it would be
>> executed _outside_ coroutine context and therefore wouldn't be able to
>> yield. If it's still executed in coroutine context, it would be
>> interesting to see who that caller is.
> 
> Like this?
> process_incoming_migration_complete
> bdrv_invalidate_cache_all
> bdrv_drain_all
> aio_dispatch
> node->io_read (which is nbd_read)
> nbd_trip
> bdrv_co_flush
> [...]


Ping? I do not know how to interpret this backtrace. In fact, in gdb at
the moment of the crash I only see frames up to nbd_trip and
coroutine_trampoline (below). What is the context here then?


Program received signal SIGSEGV, Segmentation fault.
0x000000001050a8d4 in qcow2_cache_flush (bs=0x100363531a0, c=0x0) at /home/alexey/p/qemu/block/qcow2-cache.c:174
(gdb) bt
#0  0x000000001050a8d4 in qcow2_cache_flush (bs=0x100363531a0, c=0x0) at /home/alexey/p/qemu/block/qcow2-cache.c:174
#1  0x00000000104fbc4c in qcow2_co_flush_to_os (bs=0x100363531a0) at /home/alexey/p/qemu/block/qcow2.c:2162
#2  0x00000000104c7234 in bdrv_co_flush (bs=0x100363531a0) at /home/alexey/p/qemu/block.c:4978
#3  0x00000000104b7e68 in nbd_trip (opaque=0x1003653e530) at /home/alexey/p/qemu/nbd.c:1260
#4  0x00000000104d7d84 in coroutine_trampoline (i0=0x100, i1=0x36549850) at /home/alexey/p/qemu/coroutine-ucontext.c:118
#5  0x000000804db01a9c in .__makecontext () from /lib64/libc.so.6
#6  0x0000000000000000 in ?? ()
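
Frame #0 above shows qcow2_cache_flush() entered with c=0x0, i.e. a NULL
cache pointer. One possible reading, given the subject of this thread, is
that the flush issued by nbd_trip() runs in the window where the image's
caches have been torn down for invalidation and not yet rebuilt. The sketch
below is a minimal model of that window and of a defensive NULL check. It is
not QEMU code: every name in it (ToyImage, ToyCache, toy_*) is hypothetical,
and the real teardown and rebuild steps are assumed rather than copied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real structures; illustrative only. */
typedef struct ToyCache {
    int dirty_entries;
} ToyCache;

typedef struct ToyImage {
    ToyCache *cache;   /* NULL while the image is being invalidated */
    ToyCache storage;  /* backing object the pointer refers to when valid */
} ToyImage;

/* First half of invalidation (assumed): drop the cache. */
static void toy_invalidate_begin(ToyImage *img)
{
    assert(img != NULL);
    img->cache = NULL;
}

/* Second half of invalidation (assumed): rebuild an empty cache. */
static void toy_invalidate_end(ToyImage *img)
{
    assert(img != NULL);
    img->storage.dirty_entries = 0;
    img->cache = &img->storage;
}

/*
 * Flush that tolerates a missing cache: returns -EINVAL instead of
 * dereferencing NULL, which is what frame #0 appears to do.
 */
static int toy_cache_flush(ToyImage *img)
{
    assert(img != NULL);
    ToyCache *c = img->cache;

    if (c == NULL) {
        return -EINVAL;
    }
    c->dirty_entries = 0;   /* pretend all dirty entries were written out */
    return 0;
}
```

Calling toy_cache_flush() between toy_invalidate_begin() and
toy_invalidate_end() returns -EINVAL in this model, where an unguarded
dereference would fault. Such a guard would only hide the symptom; it does
not answer the question of why a request is still being processed while the
invalidation is in progress.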



-- 
Alexey
