From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48824)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aYY69-0006Mz-Hv for qemu-devel@nongnu.org;
	Wed, 24 Feb 2016 06:59:18 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1aYY66-0006iG-J9 for qemu-devel@nongnu.org;
	Wed, 24 Feb 2016 06:59:17 -0500
Received: from mail.ispras.ru ([83.149.199.45]:47663)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aYY66-0006i8-73 for qemu-devel@nongnu.org;
	Wed, 24 Feb 2016 06:59:14 -0500
From: "Pavel Dovgalyuk"
References: <20160215093810.GC5244@noname.str.redhat.com>
	<004701d167f8$5cbe70f0$163b52d0$@ru>
	<20160215140635.GF5244@noname.str.redhat.com>
	<005501d167fc$8ed75030$ac85f090$@ru>
	<20160215150110.GG5244@noname.str.redhat.com>
	<000601d16882$c9637270$5c2a5750$@ru>
	<20160216100208.GA4920@noname.str.redhat.com>
	<000a01d168ac$09929500$1cb7bf00$@ru>
	<20160216125453.GC4920@noname.str.redhat.com>
	<000601d16bad$e93f9eb0$bbbedc10$@ru>
	<20160222110644.GD5387@noname.str.redhat.com>
In-Reply-To: <20160222110644.GD5387@noname.str.redhat.com>
Date: Wed, 24 Feb 2016 14:59:11 +0300
Message-ID: <002101d16efa$c74b3850$55e1a8f0$@ru>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Language: ru
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
To: 'Kevin Wolf'
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org,
	igor.rubinov@gmail.com, mark.burton@greensocs.com, real@ispras.ru,
	hines@cert.org, qemu-devel@nongnu.org, maria.klimushenkova@ispras.ru,
	stefanha@redhat.com, pbonzini@redhat.com, batuzovk@ispras.ru,
	alex.bennee@linaro.org, fred.konrad@greensocs.com

> From: Kevin Wolf [mailto:kwolf@redhat.com]
> On 20.02.2016 at 08:11, Pavel Dovgalyuk wrote:
> > > From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru]
> > > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > > On 16.02.2016 at 12:20, Pavel Dovgalyuk wrote:
> > > > > Coroutine                                     Replay
> > > > > bool *done = req_replayed_list_get(reqid) // NULL
> > > > >                                               co = req_completed_list_get(e.reqid); // NULL
> > > >
> > > > There was no yield, this context switch is impossible to happen. Same
> > > > for the switch back.
> > > >
> > > > > req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > > qemu_coroutine_yield();
> > > >
> > > > This is the point at which a context switch happens. The only other
> > > > point in my code is the qemu_coroutine_enter() in the other function.
> > >
> > > I've fixed aio_poll problem by disabling mutex lock for the replay_run_block_event()
> > > execution. Now virtual machine deterministically runs 4e8 instructions of Windows XP booting.
> > > But then one non-deterministic event happens.
> > > Callback after finishing coroutine may be called from different contexts.
>
> How does this happen? I'm not aware of callbacks being processed by any
> thread other than the I/O thread for that specific block device (unless
> you use dataplane, this is the main loop thread).
>
> > > apic_update_irq() function behaves differently being called from vcpu and io threads.
> > > In one case it sets CPU_INTERRUPT_POLL and in other - nothing happens.
> >
> > Kevin, do you have some ideas how to fix this issue?
> > This happens because of coroutines may be assigned to different threads.
> > Maybe there is some way of making this assignment more deterministic?
>
> Coroutines aren't randomly assigned to threads, but threads actively
> enter coroutines. To my knowledge this happens only when starting a
> request (either vcpu or I/O thread; consistent per device) or by a
> callback when some event happens (only I/O thread). I can't see any
> non-determinism here.

The behavior of coroutines looks strange to me.
Consider the code below (the co_readv function of the replay driver).
In record mode it somehow changes the thread it is assigned to.
The code at point A is executed in the CPU thread, and the code at point B
in some other thread.
Could this happen because the coroutine yields somewhere and its execution
is resumed by aio_poll, which is called from the iothread?
In that case the request-completion callback cannot be executed
deterministically (always in the CPU thread or always in the IO thread).

static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
    BDRVBlkreplayState *s = bs->opaque;
    uint32_t reqid = request_id++;
    Request *req;
    // A
    bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);

    if (replay_mode == REPLAY_MODE_RECORD) {
        replay_save_block_event(reqid);
    } else {
        assert(replay_mode == REPLAY_MODE_PLAY);
        if (reqid == current_request) {
            current_finished = true;
        } else {
            req = block_request_insert(reqid, bs, qemu_coroutine_self());
            qemu_coroutine_yield();
            block_request_remove(req);
        }
    }
    // B
    return 0;
}

Pavel Dovgalyuk
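
P.S. To make the suspicion above concrete, here is a minimal standalone
sketch. It is not QEMU code and does not use the QEMU coroutine API: the
names DemoCoroutine, demo_coroutine_enter(), demo_io_thread() and
demo_resumed_in_other_thread() are hypothetical, and the "coroutine" is
only a hand-rolled state machine. It assumes that the code after a yield
runs in whichever thread re-enters the coroutine, and shows how point A
and point B can then end up in different threads.

```c
/* Standalone sketch, not QEMU code: a hand-rolled stackless "coroutine"
 * that records which thread executes it before and after its yield. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct DemoCoroutine {
    int state;              /* 0 = not started, 1 = yielded, 2 = finished */
    pthread_t thread_at_a;  /* thread that ran the code before the yield */
    pthread_t thread_at_b;  /* thread that ran the code after the yield */
} DemoCoroutine;

/* Run the coroutine until its next yield or until it finishes.
 * Whichever thread calls this function executes the next part of the body. */
static void demo_coroutine_enter(DemoCoroutine *co)
{
    switch (co->state) {
    case 0:
        co->thread_at_a = pthread_self();   /* point A */
        co->state = 1;                      /* "yield" while the I/O is pending */
        return;
    case 1:
        co->thread_at_b = pthread_self();   /* point B */
        co->state = 2;
        return;
    default:
        return;                             /* already finished */
    }
}

/* Stand-in for an I/O thread whose event loop completes the request
 * and re-enters the waiting coroutine. */
static void *demo_io_thread(void *opaque)
{
    demo_coroutine_enter(opaque);
    return NULL;
}

/* Start the coroutine in the calling thread (playing the vcpu thread) and
 * let a second thread resume it. Returns true if point B ran in a
 * different thread than point A. */
static bool demo_resumed_in_other_thread(void)
{
    DemoCoroutine co = { .state = 0 };
    pthread_t io;
    int rc;

    demo_coroutine_enter(&co);              /* runs up to the yield */

    rc = pthread_create(&io, NULL, demo_io_thread, &co);
    assert(rc == 0);
    rc = pthread_join(io, NULL);            /* join orders the accesses to co */
    assert(rc == 0);

    return co.state == 2 && !pthread_equal(co.thread_at_a, co.thread_at_b);
}
```

Calling demo_resumed_in_other_thread() from main() (built with -pthread)
returns true, because the second half of the body is executed by the
thread that re-enters the coroutine. If the same holds for the real
coroutine here, anything done after point B would be running in the
iothread context.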