From: "Pavel Dovgalyuk" <dovgaluk@ispras.ru>
To: "'Vladimir Sementsov-Ogievskiy'" <vsementsov@virtuozzo.com>
Cc: kwolf@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com
Subject: RE: Race condition in overlayed qcow2?
Date: Thu, 20 Feb 2020 14:48:17 +0300
Message-ID: <003b01d5e7e3$a466a340$ed33e9c0$@ru>
In-Reply-To: <ea13d572-4840-3e88-bc7f-d7c4351cc345@virtuozzo.com>

> From: Vladimir Sementsov-Ogievskiy [mailto:vsementsov@virtuozzo.com]
> 20.02.2020 13:00, Pavel Dovgalyuk wrote:
> >> From: Vladimir Sementsov-Ogievskiy [mailto:vsementsov@virtuozzo.com]
> >> 20.02.2020 11:31, dovgaluk wrote:
> >>> Vladimir Sementsov-Ogievskiy wrote on 2020-02-19 19:07:
> >>>> 19.02.2020 17:32, dovgaluk wrote:
> >>>>> I encountered a problem with record/replay of QEMU execution and figured out the
> >>>>> following, when QEMU is started with one virtual disk connected to a qcow2 image with
> >>>>> the 'snapshot' option applied.
> >>>>>
> >>>>> The patch d710cf575ad5fb3ab329204620de45bfe50caa53 "block/qcow2: introduce parallel
> >>>>> subrequest handling in read and write" introduces some kind of race condition, which
> >>>>> causes differences in the data read from the disk.
> >>>>>
> >>>>> I detected this by adding the following code, which logs a checksum of each IO
> >>>>> operation. This checksum may differ between runs of the same recorded execution.
> >>>>>
> >>>>> logging in blk_aio_complete function:
> >>>>>           qemu_log("%"PRId64": blk_aio_complete\n", replay_get_current_icount());
> >>>>>           QEMUIOVector *qiov = acb->rwco.iobuf;
> >>>>>           if (qiov && qiov->iov) {
> >>>>>               size_t i, j;
> >>>>>               uint64_t sum = 0;
> >>>>>               int count = 0;
> >>>>>               for (i = 0 ; i < qiov->niov ; ++i) {
> >>>>>                   for (j = 0 ; j < qiov->iov[i].iov_len ; ++j) {
> >>>>>                       sum += ((uint8_t*)qiov->iov[i].iov_base)[j];
> >>>>>                       ++count;
> >>>>>                   }
> >>>>>               }
> >>>>>               qemu_log("--- iobuf offset %"PRIx64" len %x sum: %"PRIx64"\n",
> >>>>>                        acb->rwco.offset, count, sum);
> >>>>>           }
> >>>>>
> >>>>> I tried to get rid of the aio task by patching qcow2_co_preadv_part:
> >>>>> ret = qcow2_co_preadv_task(bs, ret, cluster_offset, offset, cur_bytes, qiov,
> >>>>>                            qiov_offset);
> >>>>>
> >>>>> That change fixed the bug, but I have no idea what to debug next to figure out the
> >>>>> exact cause of the failure.
> >>>>>
> >>>>> Do you have any ideas or hints?
> >>>>>
> >>>>
> >>>> Hi!
> >>>>
> >>>> Hmm, do you mean that a read from the disk may return wrong data? It would
> >>>> be very bad of course :(
> >>>> Could you provide a reproducer, so that I can look at it and debug?
> >>>
> >>> It is just a winxp-32 image. I record the execution and replay it with the following
> >>> command lines:
> >>>
> >>> qemu-system-i386 -icount shift=7,rr=record,rrfile=replay.bin -m 512M \
> >>>   -drive file=xp.qcow2,if=none,id=device-34-file,snapshot \
> >>>   -drive driver=blkreplay,if=none,image=device-34-file,id=device-34-driver \
> >>>   -device ide-hd,drive=device-34-driver,bus=ide.0,id=device-34 -net none
> >>>
> >>> qemu-system-i386 -icount shift=7,rr=replay,rrfile=replay.bin -m 512M \
> >>>   -drive file=xp.qcow2,if=none,id=device-34-file,snapshot \
> >>>   -drive driver=blkreplay,if=none,image=device-34-file,id=device-34-driver \
> >>>   -device ide-hd,drive=device-34-driver,bus=ide.0,id=device-34 -net none
> >>>
> >>> Replay stalls at some moment due to the non-determinism of the execution (probably
> >>> caused by the wrong data read).
> >>
> >> Hmm.. I tried it (with x86_64 qemu and a centos image). I waited for some time on the first
> >> command, then pressed Ctrl+C. After that replay.bin was 4M. Then I started the second
> >> command. It works, not failing, not finishing. Is it bad? What is the expected behavior and
> >> what is wrong?
> >
> > The second command should finish. There is no replay introspection yet (in master), but you
> > can stop qemu with gdb and inspect the replay_state.current_icount field. It should increase
> > with every virtual CPU instruction executed. If that counter has stopped, it means that
> > replay hangs.
> 
> It hangs for me even with QCOW2_MAX_WORKERS = 1..


There could be some other bugs in record/replay.
To be sure, try winxp on i386.
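
For anyone who wants to compare buffers outside QEMU, the byte-sum logging quoted above can be sketched as a standalone helper. This is only an illustration, assuming a POSIX struct iovec in place of QEMUIOVector; iov_byte_sum and struct iov_sum are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Total length and byte sum of a scatter/gather buffer. */
struct iov_sum {
    size_t len;
    uint64_t sum;
};

/*
 * Hypothetical helper (not part of QEMU): walk every element of the
 * iovec array and add up all bytes, mirroring the nested loop in the
 * blk_aio_complete logging quoted above.
 */
static struct iov_sum iov_byte_sum(const struct iovec *iov, size_t niov)
{
    struct iov_sum r = { 0, 0 };

    assert(iov != NULL || niov == 0);

    for (size_t i = 0; i < niov; ++i) {
        const uint8_t *p = iov[i].iov_base;

        for (size_t j = 0; j < iov[i].iov_len; ++j) {
            r.sum += p[j];
        }
        r.len += iov[i].iov_len;
    }
    return r;
}
```

A plain byte sum does not change when the same bytes land in a different order, so it can miss some corruptions. It is still enough to show that two runs read different data, which is all the logging is used for here.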

Pavel Dovgalyuk


