From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60010)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aVK4e-0003e0-LJ for qemu-devel@nongnu.org;
	Mon, 15 Feb 2016 09:24:25 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1aVK4b-0004LU-DS for qemu-devel@nongnu.org;
	Mon, 15 Feb 2016 09:24:24 -0500
Received: from mail.ispras.ru ([83.149.199.45]:60477)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1aVK4a-0004LP-WB for qemu-devel@nongnu.org;
	Mon, 15 Feb 2016 09:24:21 -0500
From: "Pavel Dovgalyuk"
References: <000601d163fb$4cbcae70$e6360b50$@Dovgaluk@ispras.ru>
	<20160210122816.GB5474@noname.redhat.com>
	<000a01d16401$be4d31d0$3ae79570$@Dovgaluk@ispras.ru>
	<20160210132545.GC5474@noname.redhat.com>
	<001201d16597$fa5de6a0$ef19b3e0$@Dovgaluk@ispras.ru>
	<20160212135820.GD4828@noname.redhat.com>
	<003301d167cc$4d7d9480$e878bd80$@ru>
	<003a01d167d1$42df95f0$c89ec1d0$@ru>
	<20160215093810.GC5244@noname.str.redhat.com>
	<004701d167f8$5cbe70f0$163b52d0$@ru>
	<20160215140635.GF5244@noname.str.redhat.com>
In-Reply-To: <20160215140635.GF5244@noname.str.redhat.com>
Date: Mon, 15 Feb 2016 17:24:18 +0300
Message-ID: <005501d167fc$8ed75030$ac85f090$@ru>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Language: ru
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
To: 'Kevin Wolf'
Cc: edgar.iglesias@xilinx.com, peter.maydell@linaro.org,
	igor.rubinov@gmail.com, mark.burton@greensocs.com, real@ispras.ru,
	hines@cert.org, qemu-devel@nongnu.org, maria.klimushenkova@ispras.ru,
	stefanha@redhat.com, pbonzini@redhat.com, batuzovk@ispras.ru,
	alex.bennee@linaro.org, fred.konrad@greensocs.com

> From: Kevin Wolf [mailto:kwolf@redhat.com]
> > >
> > > First of all, I'm not sure if running replay events from
> > > qemu_clock_get_ns() is such a great idea. This is not a function that
> > > callers expect to change any state. If you absolutely have to do it
> > > there instead of in the clock device emulations, maybe restricting it to
> > > replaying clock events could make it a bit more harmless.
> >
> > Only virtual clock is emulated, and host clock is read from the host
> > real time sources and therefore has to be saved into the log.
>
> Isn't the host clock invisible to the guest anyway?

It isn't. The host clock is used by the guest RTC.

> > There could be asynchronous events that occur in non-cpu threads.
> > For now these events are shutdown request and block task execution.
> > They may "hide" following clock (or another one) events. That is why
> > we process them until synchronous event (like clock, instructions
> > execution, or checkpoint) is met.
> >
> > > Anyway, what does "can't proceed" mean? The coroutine yields because
> > > it's waiting for I/O, but it is never reentered? Or is it hanging while
> > > trying to acquire a lock?
> >
> > I've solved this problem by slightly modifying the queue.
> > I haven't yet made BlockDriverState assignment to the request ids.
> > Therefore aio_poll was temporarily replaced with usleep.
> > Now execution starts and hangs at some random moment of OS loading.
> >
> > Here is the current version of blkreplay functions:
> >
> > static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
> >     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> > {
> >     uint32_t reqid = request_id++;
> >     Request *req;
> >     req = block_request_insert(reqid, bs, qemu_coroutine_self());
> >     bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
> >
> >     if (replay_mode == REPLAY_MODE_RECORD) {
> >         replay_save_block_event(reqid);
> >     } else {
> >         assert(replay_mode == REPLAY_MODE_PLAY);
> >         qemu_coroutine_yield();
> >     }
> >     block_request_remove(req);
> >
> >     return 0;
> > }
> >
> > void replay_run_block_event(uint32_t id)
> > {
> >     Request *req;
> >     if (replay_mode == REPLAY_MODE_PLAY) {
> >         while (!(req = block_request_find(id))) {
> >             //aio_poll(bdrv_get_aio_context(req->bs), true);
> >             usleep(1);
> >         }
>
> How is this loop supposed to make any progress?

This loop is not supposed to make any progress by itself.
It only waits until block_request_insert adds the request to the queue.

> And I still don't understand why aio_poll() doesn't work and where it
> hangs.

aio_poll hangs if the line
"req = block_request_insert(reqid, bs, qemu_coroutine_self());"
is executed after bdrv_co_readv.
When bdrv_co_readv yields, replay_run_block_event has no information about
the pending request and cannot jump to its coroutine.

Maybe I should call aio_poll there to make progress in that case?

> >         qemu_coroutine_enter(req->co, NULL);
> >     }
> > }

Pavel Dovgalyuk
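
To illustrate the ordering issue described above, here is a minimal,
self-contained sketch. It assumes a fixed-size table in place of the real
queue. The names req_insert, req_find and req_remove are hypothetical
stand-ins for block_request_insert, block_request_find and
block_request_remove, and the void * field stands in for the coroutine
pointer. None of this is the actual QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQUESTS 16

/* Hypothetical, simplified request record: the real one would also carry
 * the BlockDriverState and a Coroutine pointer. */
typedef struct Request {
    uint32_t id;   /* request id written to / read from the replay log */
    void *co;      /* stand-in for the coroutine to re-enter */
    int in_use;    /* slot occupancy flag */
} Request;

static Request request_table[MAX_REQUESTS];

/* Register a pending request. Returns NULL if the table is full. */
Request *req_insert(uint32_t id, void *co)
{
    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        if (!request_table[i].in_use) {
            request_table[i].id = id;
            request_table[i].co = co;
            request_table[i].in_use = 1;
            return &request_table[i];
        }
    }
    return NULL;
}

/* Look up a pending request by id. Returns NULL if it was never
 * registered (or was already removed). */
Request *req_find(uint32_t id)
{
    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        if (request_table[i].in_use && request_table[i].id == id) {
            return &request_table[i];
        }
    }
    return NULL;
}

/* Drop a request once its coroutine has completed. */
void req_remove(Request *req)
{
    assert(req != NULL && req->in_use);
    req->in_use = 0;
}
```

In this sketch a lookup made before req_insert returns NULL. That is the
state replay_run_block_event would keep seeing if the insert ran only after
bdrv_co_readv had already yielded, which is why the request is registered
before the read is issued.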