From: John Snow
Message-ID: <56140B5B.9090405@redhat.com>
Date: Tue, 6 Oct 2015 13:56:43 -0400
In-Reply-To: <49F235AC-AE58-435E-8C16-BD447AD81614@kamp.de>
Subject: Re: [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
To: Peter Lieven
Cc: Kevin Wolf, stefanha@gmail.com, jcody@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org

On 10/06/2015 01:12 PM, Peter Lieven wrote:
>
>> On 06.10.2015 at 19:07, John Snow wrote:
>>
>>
>>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>>> On 06.10.2015 at 10:57, Kevin Wolf wrote:
>>>> On 05.10.2015 at 23:15, John Snow wrote:
>>>>>
>>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>>> This has two significant drawbacks. First, the main loop hangs until an
>>>>>> I/O request is completed, and secondly, if the I/O request does not
>>>>>> complete (e.g. due to unresponsive storage), QEMU hangs completely.
>>>>>>
>>>>>> Signed-off-by: Peter Lieven
>>>>>> ---
>>>>>>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>>> index 747f466..9257e1c 100644
>>>>>> --- a/hw/ide/atapi.c
>>>>>> +++ b/hw/ide/atapi.c
>>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>>      memset(buf, 0, 288);
>>>>>>  }
>>>>>>
>>>>>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>>  {
>>>>>> -    int ret;
>>>>>> +    IDEState *s = opaque;
>>>>>>
>>>>>> -    switch(sector_size) {
>>>>>> -    case 2048:
>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> -        break;
>>>>>> -    case 2352:
>>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> -        if (ret < 0)
>>>>>> -            return ret;
>>>>>> -        cd_data_to_raw(buf, lba);
>>>>>> -        break;
>>>>>> -    default:
>>>>>> -        ret = -EIO;
>>>>>> -        break;
>>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> +
>>>>>> +    if (ret < 0) {
>>>>>> +        ide_atapi_io_error(s, ret);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (s->cd_sector_size == 2352) {
>>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>>      }
>>>>>> -    return ret;
>>>>>> +
>>>>>> +    s->lba++;
>>>>>> +    s->io_buffer_index = 0;
>>>>>> +    s->status &= ~BUSY_STAT;
>>>>>> +
>>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>>> +}
>>>>>> +
>>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>>>>> +{
>>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>>> +        return -EINVAL;
>>>>>> +    }
>>>>>> +
>>>>>> +    s->iov.iov_base = buf;
>>>>>> +    if (sector_size == 2352) {
>>>>>> +        buf += 4;
>>>>>> +    }
>>>> This doesn't look quite right, buf is never read after this.
>>>>
>>>> Also, why += 4 when it was originally buf + 16?
>>>
>>> You are right. I mixed that up.
>>>
>>>>
>>>>>> +
>>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>>> +
>>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>>> +        return -EIO;
>>>>>> +    }
>>>>>> +
>>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>>> +    s->status |= BUSY_STAT;
>>>>>> +    return 0;
>>>>>>  }
>>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>>>
>>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>>> s->end_transfer_func() non-blocking, which functions like ide_data_readw
>>>>> are not prepared to cope with.
>>>> I don't think that's a problem as long as BSY is set while the
>>>> asynchronous command is running and DRQ is cleared. The latter will
>>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>>> thing.
>>>
>>> I was thinking the same. Without the BSY it's not working at all.
>>>
>>>>
>>>> Or maybe I'm just missing what you're trying to say.
>>>>
>>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>>> (byte_count_limit) to avoid the problem.
>>>> No matter whether there is a problem or not, buffering more data at once
>>>> (and therefore doing fewer requests) is better for performance anyway.
>>>
>>> It's possible to do only one read in the backend and read the whole
>>> request into the IO buffer. I'll send a follow-up.
>>
>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>> and the READ10 cdb can request up to 128MiB! For performance, it might
>> be nice to always buffer something like:
>>
>> MIN(128K, nb_sectors * sector_size)
>
> isn't nb_sectors limited to CD_MAX_SECTORS (32)?
>
> Peter
>

CD_MAX_SECTORS is... (80 * 60 * 75 * 2048) / 512 --> 1440000, and
describes the maximum number of 512-byte sectors on a CD medium, not the
request size. Where'd you get the 32 number?

>
>>
>> and then as the guest drains the DRQ block of size byte_count_limit,
>> which can only be at largest 0xFFFE (we can fit at least two of these
>> per io_buffer refill), we can just shift the data_ptr and data_end
>> pointers to utilize io_buffer like a ring buffer.
>>
>> Because the guest can at most fetch 0xfffe bytes at a time, it will tend
>> to leave at least 4 bytes left over from a 64 block read. Luckily, we've
>> got 4 extra bytes in s->io_buffer, so with a ring buffer we can always
>> rebuffer *at least* two full DRQ blocks of data at a time.
>>
>> The routine would basically look like this:
>>
>> - No DRQ blocks buffered, so read up to 64 blocks or however many are
>>   left for our transfer
>> - If we have at least one full DRQ block allocated, start the transfer
>>   and send an interrupt
>> - If we ran out of DRQ blocks, go back to the top and buffer them.
>>
>> This would eliminate the need for code stanza #3 in
>> ide_atapi_cmd_reply_end, which re-starts a transfer without signaling to
>> the guest. We'd only have:
>>
>> ide_atapi_cmd_reply_end(...) {
>>     if (packet_transfer_size == 0) { end(...); return; }
>>     if (blocks_buffered < 1) { async_buffer_blocks(...); return; }
>>     ide_transfer_start(...);
>>     ide_set_irq(s->bus);
>> }
>>
>> which is a good deal simpler than what we have now, though I need to
>> look into the formatting of raw CD data a little more to make sure my
>> numbers make sense... it may not be quite so easy to buffer multiple DRQ
>> blocks in some cases, but so it goes -- we should always be able to
>> buffer at least one.
>>
>>> Do you maybe have a pointer to the test tool that John mentioned?
>>>
>>> Peter
>>>

-- 
--js
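
For illustration, here is a small standalone model of the rebuffering
scheme sketched above. It is not QEMU code: IO_BUFFER_SIZE, BYTE_COUNT_LIMIT
and the counters are made-up stand-ins for s->io_buffer (128 KiB + 4 bytes),
the per-transfer byte_count_limit (at most 0xFFFE) and the bookkeeping a real
implementation would need. It only plays out the arithmetic for a worst-case
READ(10) of 65535 x 2048-byte sectors: refill the buffer from the backend,
hand out DRQ blocks, and rebuffer once less than one full block is left.

#include <stdio.h>
#include <stdint.h>

#define IO_BUFFER_SIZE   (128 * 1024 + 4)  /* stands in for sizeof(s->io_buffer) */
#define BYTE_COUNT_LIMIT 0xFFFEu           /* largest DRQ block a guest can request */

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
    /* Worst-case READ(10): 65535 sectors of 2048 bytes, just under 128 MiB. */
    uint64_t remaining = 65535ull * 2048;
    uint64_t buffered  = 0;   /* bytes currently held in the ring buffer   */
    unsigned refills   = 0;   /* asynchronous reads issued to the backend  */
    unsigned drq_xfers = 0;   /* simulated ide_transfer_start()/IRQ cycles */

    while (remaining > 0 || buffered > 0) {
        /* Not even one full DRQ block (or the final tail) ready: rebuffer. */
        if (buffered < MIN(remaining + buffered, (uint64_t)BYTE_COUNT_LIMIT)) {
            uint64_t fill = MIN(remaining, IO_BUFFER_SIZE - buffered);
            buffered  += fill;
            remaining -= fill;
            refills++;
            continue;
        }
        /* Hand at most one byte_count_limit's worth to the guest. */
        uint64_t chunk = MIN(buffered, (uint64_t)BYTE_COUNT_LIMIT);
        buffered -= chunk;
        drq_xfers++;
    }

    printf("backend refills: %u, DRQ transfers: %u\n", refills, drq_xfers);
    return 0;
}

Run as-is it reports on the order of a thousand backend refills and roughly
two DRQ transfers per refill, in line with the observation above that each
io_buffer refill can hold at least two full DRQ blocks.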