From: John Snow
Message-ID: <56140B5B.9090405@redhat.com>
Date: Tue, 6 Oct 2015 13:56:43 -0400
In-Reply-To: <49F235AC-AE58-435E-8C16-BD447AD81614@kamp.de>
Subject: Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
To: Peter Lieven
Cc: Kevin Wolf, stefanha@gmail.com, jcody@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org

On 10/06/2015 01:12 PM, Peter Lieven wrote:
>
>> On 06.10.2015 at 19:07, John Snow wrote:
>>
>>
>>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>>> On 06.10.2015 at 10:57, Kevin Wolf wrote:
>>>> On 05.10.2015 at 23:15, John Snow wrote:
>>>>>
>>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>>> This has two significant drawbacks. First, the main loop hangs until an
>>>>>> I/O request is completed, and secondly, if the I/O request does not
>>>>>> complete (e.g. due to unresponsive storage), QEMU hangs completely.
>>>>>>
>>>>>> Signed-off-by: Peter Lieven
>>>>>> ---
>>>>>>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>>> index 747f466..9257e1c 100644
>>>>>> --- a/hw/ide/atapi.c
>>>>>> +++ b/hw/ide/atapi.c
>>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>>      memset(buf, 0, 288);
>>>>>>  }
>>>>>>
>>>>>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>>  {
>>>>>> -    int ret;
>>>>>> +    IDEState *s = opaque;
>>>>>>
>>>>>> -    switch(sector_size) {
>>>>>> -    case 2048:
>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> -        break;
>>>>>> -    case 2352:
>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> -        if (ret < 0)
>>>>>> -            return ret;
>>>>>> -        cd_data_to_raw(buf, lba);
>>>>>> -        break;
>>>>>> -    default:
>>>>>> -        ret = -EIO;
>>>>>> -        break;
>>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> +
>>>>>> +    if (ret < 0) {
>>>>>> +        ide_atapi_io_error(s, ret);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (s->cd_sector_size == 2352) {
>>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>>      }
>>>>>> -    return ret;
>>>>>> +
>>>>>> +    s->lba++;
>>>>>> +    s->io_buffer_index = 0;
>>>>>> +    s->status &= ~BUSY_STAT;
>>>>>> +
>>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>>> +}
>>>>>> +
>>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>>>>> +{
>>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>>> +        return -EINVAL;
>>>>>> +    }
>>>>>> +
>>>>>> +    s->iov.iov_base = buf;
>>>>>> +    if (sector_size == 2352) {
>>>>>> +        buf += 4;
>>>>>> +    }
>>>> This doesn't look quite right, buf is never read after this.
>>>>
>>>> Also, why += 4 when it was originally buf + 16?
>>>
>>> You are right. I mixed that up.
>>>
>>>>
>>>>>> +
>>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>>> +
>>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>>> +        return -EIO;
>>>>>> +    }
>>>>>> +
>>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> +    s->status |= BUSY_STAT;
>>>>>> +    return 0;
>>>>>>  }
>>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>>>
>>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>>> s->end_transfer_func() non-blocking, which functions like ide_data_readw
>>>>> are not prepared to cope with.
>>>> I don't think that's a problem as long as BSY is set while the
>>>> asynchronous command is running and DRQ is cleared. The latter will
>>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>>> thing.
>>>
>>> I was thinking the same. Without the BSY it's not working at all.
>>>
>>>>
>>>> Or maybe I'm just missing what you're trying to say.
>>>>
>>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>>> (byte_count_limit) to avoid the problem.
>>>> No matter whether there is a problem or not, buffering more data at once
>>>> (and therefore doing fewer requests) is better for performance anyway.
>>>
>>> It's possible to do only one read in the backend and read the whole
>>> request into the IO buffer. I'll send a follow-up.
>>
>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>> and the READ10 cdb can request up to 128MiB! For performance, it might
>> be nice to always buffer something like:
>>
>> MIN(128K, nb_sectors * sector_size)
>
> isn't nb_sectors limited to CD_MAX_SECTORS (32)?
>
> Peter
>

CD_MAX_SECTORS is... (80 * 60 * 75 * 2048) / 512 --> 1440000, and
describes the maximum number of 512-byte sectors on a CD medium, not the
request size. Where'd you get the 32 number?

>
>>
>> and then as the guest drains the DRQ block of size byte_count_limit,
>> which can only be at largest 0xFFFE (we can fit at least two of these
>> per io_buffer refill), we can just shift the data_ptr and data_end
>> pointers to utilize io_buffer like a ring buffer.
>>
>> Because the guest can at most fetch 0xfffe bytes at a time, it will tend
>> to leave at least 4 bytes left over from a 64 block read. Luckily, we've
>> got 4 extra bytes in s->io_buffer, so with a ring buffer we can always
>> rebuffer *at least* two full DRQ blocks of data at a time.
>>
>> The routine would basically look like this:
>>
>> - No DRQ blocks buffered, so read up to 64 blocks or however many are
>>   left for our transfer
>> - If we have at least one full DRQ block allocated, start the transfer
>>   and send an interrupt
>> - If we ran out of DRQ blocks, go back to the top and buffer them.
>>
>> This would eliminate the need for code stanza #3 in
>> ide_atapi_cmd_reply_end, which re-starts a transfer without signaling to
>> the guest. We'd only have:
>>
>> ide_atapi_cmd_reply_end(...) {
>>     if (packet_transfer_size == 0) { end(...); return; }
>>     if (blocks_buffered < 1) { async_buffer_blocks(...); return; }
>>     ide_transfer_start(...);
>>     ide_set_irq(s->bus);
>> }
>>
>> which is a good deal simpler than what we have now, though I need to
>> look into the formatting of raw CD data a little more to make sure my
>> numbers make sense... it may not be quite so easy to buffer multiple DRQ
>> blocks in some cases, but so it goes -- we should always be able to
>> buffer at least one.
>>
>>> Do you maybe have a pointer to the test tool that John mentioned?
>>>
>>> Peter
>>>

-- 
--js
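
For illustration, here is a small standalone model of the rebuffering
scheme sketched above. It is not QEMU code: IO_BUFFER_SIZE, BYTE_COUNT_LIMIT
and the counters are made-up stand-ins for s->io_buffer (128 KiB + 4 bytes),
the per-transfer byte_count_limit (at most 0xFFFE) and the bookkeeping a real
implementation would need. It only plays out the arithmetic for a worst-case
READ(10) of 65535 x 2048-byte sectors: refill the buffer from the backend,
hand out DRQ blocks, and rebuffer once less than one full block is left.

#include <stdio.h>
#include <stdint.h>

#define IO_BUFFER_SIZE   (128 * 1024 + 4)  /* stands in for sizeof(s->io_buffer) */
#define BYTE_COUNT_LIMIT 0xFFFEu           /* largest DRQ block a guest can request */

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
    /* Worst-case READ(10): 65535 sectors of 2048 bytes, just under 128 MiB. */
    uint64_t remaining = 65535ull * 2048;
    uint64_t buffered  = 0;   /* bytes currently held in the ring buffer   */
    unsigned refills   = 0;   /* asynchronous reads issued to the backend  */
    unsigned drq_xfers = 0;   /* simulated ide_transfer_start()/IRQ cycles */

    while (remaining > 0 || buffered > 0) {
        /* Not even one full DRQ block (or the final tail) ready: rebuffer. */
        if (buffered < MIN(remaining + buffered, (uint64_t)BYTE_COUNT_LIMIT)) {
            uint64_t fill = MIN(remaining, IO_BUFFER_SIZE - buffered);
            buffered  += fill;
            remaining -= fill;
            refills++;
            continue;
        }
        /* Hand at most one byte_count_limit's worth to the guest. */
        uint64_t chunk = MIN(buffered, (uint64_t)BYTE_COUNT_LIMIT);
        buffered -= chunk;
        drq_xfers++;
    }

    printf("backend refills: %u, DRQ transfers: %u\n", refills, drq_xfers);
    return 0;
}

Run as-is it reports on the order of a thousand backend refills and roughly
two DRQ transfers per refill, in line with the observation above that each
io_buffer refill can hold at least two full DRQ blocks.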