References: <AC10DBE9-1971-4863-9156-546D47FF9F2B@gmail.com>
	<20160314014248.GA2112@ad.usersys.redhat.com>
	<0205A5D4-9FC7-436C-B124-D0D0D3FD1A51@gmail.com>
	<CFC95BE4-4C8C-4B24-B275-2610A9265BA6@gmail.com>
From: John Snow <jsnow@redhat.com>
Message-ID: <570E8ADD.2010207@redhat.com>
Date: Wed, 13 Apr 2016 14:07:25 -0400
In-Reply-To: <CFC95BE4-4C8C-4B24-B275-2610A9265BA6@gmail.com>
Subject: Re: [Qemu-devel] about correctness of IDE emulation
To: "Huaicheng Li (coperd)" <lhcwhu@gmail.com>, qemu-devel@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>


On 04/13/2016 03:25 AM, Huaicheng Li (coperd) wrote:
>
>> On Mar 14, 2016, at 10:09 PM, Huaicheng Li <lhcwhu@gmail.com> wrote:
>>
>>
>>> On Mar 13, 2016, at 8:42 PM, Fam Zheng <famz@redhat.com> wrote:
>>>
>>> On Sun, 03/13 14:37, Huaicheng Li (coperd) wrote:
>>>> Hi all,
>>>>
>>>> What I’m confused about is this:
>>>>
>>>> If one I/O is too large and may need several rounds (say 2) of DMA transfers,
>>>> it seems the second-round transfer begins only after the first part completes,
>>>> by reading data from **IDEState**. But the IDEState info may have
>>>> been changed by VCPU threads (by writing new I/Os to it) by the time the first
>>>> transfer finishes. From the code, I see that the IDE r/w callback function will
>>>> continue the second transfer by referencing IDEState’s information. Wouldn’t
>>>> this be problematic? Am I missing anything here?
>>>
>>> Can you give a concrete example? I/O in VCPU threads that changes IDEState
>>> must also take care of the DMA transfers; for example, ide_reset() calls
>>> blk_aio_cancel and clears s->nsectors. If an I/O handler fails to do so, it is
>>> a bug.
>>>
>>> Fam
>>
>> I get it now. ide_exec_cmd() can only proceed when BUSY_STAT|DRQ_STAT is not set.
>> When the 2nd DMA transfer continues, BUSY_STAT | DRQ_STAT is already
>> set, i.e., no other new ide_exec_cmd() can enter. BUSY or DRQ is cleared only when
>> all DMA transfers are done, after which new writes to IDE are allowed. Thus it’s safe.
>>
>> Thanks, Fam & Stefan.
>
> Hi all, I have some further puzzles about IDE emulation:
>
>   (1). IDE can only handle I/Os one by one. So in the AIO queue there will always be only
>  **ONE** I/O from this IDE, right? Big I/Os which need to be split into several
> rounds of DMA transfers are also served one by one (after one DMA transfer [as an
> AIO] is finished, another DMA transfer will be submitted, and so on). What I want to confirm
> is that there is no batch submission in the IDE path at all. True?

Correct. DMA requests are generally fulfilled all at once, so each read
request to the IDE device is usually processed as one large DMA request.

I believe ATAPI DMA requests might be split into 2048-byte chunks, though.
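
To illustrate why only one command is in flight at a time, here is a
minimal C sketch (not QEMU's code) of the BUSY/DRQ gate discussed earlier
in this thread: a new command is refused while BSY or DRQ is set, and BSY
clears only once the whole transfer has completed. The struct and function
names are hypothetical; 0x80 and 0x08 are the standard ATA status-register
values that QEMU names BUSY_STAT and DRQ_STAT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ATA status-register bits (QEMU: BUSY_STAT / DRQ_STAT). */
#define ATA_STATUS_BSY 0x80
#define ATA_STATUS_DRQ 0x08

/* Hypothetical, stripped-down device state: the status register plus the
 * number of sectors still outstanding for the current command. */
struct toy_ide {
    uint8_t status;
    uint32_t sectors_left;
};

/* Accept a new command only if neither BSY nor DRQ is set.
 * Returns false when the command is dropped because a transfer is busy. */
bool toy_exec_cmd(struct toy_ide *s, uint32_t nsectors)
{
    if (s->status & (ATA_STATUS_BSY | ATA_STATUS_DRQ)) {
        return false;               /* previous transfer still in flight */
    }
    s->sectors_left = nsectors;
    s->status |= ATA_STATUS_BSY;
    return true;
}

/* Completion of one chunk of the transfer. BSY is cleared only after
 * every sector of the command has been transferred. */
void toy_dma_done(struct toy_ide *s, uint32_t sectors_done)
{
    if (sectors_done >= s->sectors_left) {
        s->sectors_left = 0;
    } else {
        s->sectors_left -= sectors_done;
    }
    if (s->sectors_left == 0) {
        s->status &= (uint8_t)~ATA_STATUS_BSY;
    }
}
```

With this model, calling toy_exec_cmd() again before the final
toy_dma_done() returns false, so the second half of a split transfer
cannot be clobbered by a new command.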

>   (2). When the guest kernel prepares to do a big I/O which needs multiple rounds of DMA
> transfers, will each DMA transfer round (one PRD entry) be trapped and trigger one IDE
> emulation, or will IDE handle all the PRDs in one shot?

The IDE emulator does not attempt to process the PRDs individually;
instead, it builds an SGList (scatter-gather list) that is passed down
through the AIO stack and eventually to Linux (the host kernel).
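
As a rough picture of building an SGList from the PRDs, here is a minimal
C sketch that walks a Bus Master IDE PRD table into a flat scatter-gather
list that a single AIO request could consume. It assumes the entries have
already been read from guest memory and converted to host byte order; the
struct and function names are hypothetical and this is not QEMU's
implementation. The decoding follows the BMDMA PRD layout: the byte count
is in bits 15:0 of the second dword (bit 0 is always 0, and a count of 0
means 64KB), and bit 31 is the end-of-table flag.

```c
#include <stddef.h>
#include <stdint.h>

/* One Bus Master IDE PRD entry (host byte order assumed). */
struct prd_entry {
    uint32_t addr;        /* physical base address of the region */
    uint32_t size_flags;  /* byte count in bits 15:0, EOT in bit 31 */
};

#define PRD_EOT        0x80000000u
#define PRD_COUNT_MASK 0x0000fffeu   /* bit 0 is reserved */

/* Hypothetical scatter-gather element. */
struct sg_elem {
    uint64_t base;
    uint32_t len;
};

/* Walk the PRD table once, appending each region to the SG list, and stop
 * at the entry carrying EOT (or after max_entries as a safety bound).
 * Returns the number of SG elements; *total receives the byte sum. */
size_t build_sglist(const struct prd_entry *prd, size_t max_entries,
                    struct sg_elem *sg, uint64_t *total)
{
    size_t n = 0;

    *total = 0;
    for (size_t i = 0; i < max_entries; i++) {
        uint32_t len = prd[i].size_flags & PRD_COUNT_MASK;

        if (len == 0) {
            len = 0x10000;           /* a zero count encodes 64 KiB */
        }
        sg[n].base = prd[i].addr;
        sg[n].len = len;
        *total += len;
        n++;
        if (prd[i].size_flags & PRD_EOT) {
            break;                   /* last entry of the table */
        }
    }
    return n;
}
```

The whole table is consumed in one pass, so the request reaches the block
layer as one scatter-gather I/O rather than one request per PRD.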

I'm not sure how Linux decides to process contiguous vs. noncontiguous
PRD entries.

However, the IDE emulator does not iterate per PRD except to build the
SGList. When the AIOCB completes and its callback is invoked, IDE expects
that all PRDs it submitted have been handled.

(For instance, AHCI has a per-PRD flag requesting that an interrupt be
signalled after *this PRD* has been processed. Unfortunately, QEMU
currently has no way to detect this, so I believe we ignore the flag.
AHCI describes this as an "opportunistic interrupt.")
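
For reference, a minimal C sketch of where that flag sits in an AHCI PRDT
entry, based on my reading of the AHCI spec: DW3 carries a 0-based byte
count in bits 21:0 and the interrupt-on-completion bit in bit 31. The
struct, macro, and function names are hypothetical, host byte order is
assumed, and this is not QEMU's AHCI code.

```c
#include <stddef.h>
#include <stdint.h>

/* AHCI PRDT entry (host byte order assumed). */
struct ahci_prdt_entry {
    uint32_t dba;    /* data base address, low 32 bits */
    uint32_t dbau;   /* data base address, upper 32 bits */
    uint32_t rsvd;
    uint32_t dw3;    /* bit 31: interrupt on completion; bits 21:0: DBC */
};

#define AHCI_PRDT_IOC      0x80000000u
#define AHCI_PRDT_DBC_MASK 0x003fffffu

/* Byte length of the region described by one entry (DBC is 0-based). */
uint32_t prdt_len(const struct ahci_prdt_entry *e)
{
    return (e->dw3 & AHCI_PRDT_DBC_MASK) + 1;
}

/* Collect the indices of entries that request an interrupt on completion.
 * Returns how many indices were written to out (sized for n entries). */
size_t prdt_ioc_indices(const struct ahci_prdt_entry *tbl, size_t n,
                        size_t *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].dw3 & AHCI_PRDT_IOC) {
            out[k++] = i;
        }
    }
    return k;
}
```

If the whole SG list completes in a single AIO callback, the emulator only
learns about the end of the request, so there is no natural moment at
which to raise an interrupt for the individual entries found here.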

>   (3). I traced the execution of my guest application with big I/Os (each read is 2MB),
> and in the IDE layer I found that it’s split into 512KB chunks, one per DMA transfer.
> Why 512KB?? According to the BMDMA spec, a PRD table can represent at most 64KB/8 bytes
> = 8192 buffers, each of which can be a contiguous buffer of at most 64KB. This would give
> us 8192*64KB = 512MB for each DMA.
>

The splitting you're seeing could be occurring in lots of different
places -- your host OS, QEMU's AIO handling itself, or the guest OS.
It's *not* happening in the IDE emulator, though.

The IDE emulator itself does not attempt to split requests into 512KB
chunks. You can test this yourself by putting a tracer in ide_dma_cb() in
hw/ide/core.c to see how many bytes IDE is requesting at a time; I was able
to ask for 1025 sectors in one shot using a modified version of
tests/ide-test.
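
For the arithmetic behind that test: 512KB is exactly 1024 sectors of 512
bytes, so a single 1025-sector request is already one sector past the
split you observed. The small C sketch below just encodes that, plus the
512MB ceiling from the PRD figures you quoted; the macro and function
names are illustrative only.

```c
#include <stdint.h>

#define SECTOR_SIZE     512u           /* bytes per ATA disk sector */
#define PRD_ENTRY_SIZE  8u             /* bytes per BMDMA PRD entry */
#define PRD_TABLE_BYTES (64u * 1024u)  /* PRD table size from the quote */
#define PRD_REGION_MAX  (64u * 1024u)  /* max bytes per PRD region */

/* Bytes covered by a request of n 512-byte sectors. */
uint64_t sectors_to_bytes(uint32_t n)
{
    return (uint64_t)n * SECTOR_SIZE;
}

/* Upper bound on one BMDMA transfer implied by the quoted figures:
 * (64 KiB / 8 B) entries * 64 KiB per entry. */
uint64_t bmdma_max_bytes(void)
{
    uint64_t entries = PRD_TABLE_BYTES / PRD_ENTRY_SIZE;  /* 8192 */
    return entries * PRD_REGION_MAX;                       /* 512 MiB */
}
```

1025 sectors is 524,800 bytes, which is 512 bytes more than the 524,288
bytes in 512KB.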

You can put a tracer in cmd_read_dma as well to see how many sectors the
guest is requesting from the IDE device at a time.
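
When reading such a trace, keep in mind that the ATA sector count is
encoded: a count of 0 means the maximum rather than zero, and 48-bit
commands add a high-order count byte. A minimal C sketch of that standard
decoding follows; the function name is hypothetical, though I believe QEMU
performs an equivalent conversion before the read path uses the count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the sector count programmed into the ATA task file.
 * 28-bit commands (e.g. READ DMA): one 8-bit register, 0 means 256.
 * 48-bit commands (e.g. READ DMA EXT): a high-order byte is added,
 * and 0 in both bytes means 65536. */
uint32_t ata_sector_count(uint8_t nsector, uint8_t hob_nsector, bool lba48)
{
    if (!lba48) {
        return nsector ? nsector : 256u;
    }
    if (nsector == 0 && hob_nsector == 0) {
        return 65536u;
    }
    return ((uint32_t)hob_nsector << 8) | nsector;
}
```

For example, a 1025-sector request needs a 48-bit command (nsector = 1,
hob_nsector = 4), since a 28-bit READ DMA tops out at 256 sectors (128KB).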

> Am I missing anything here?
>

Why do you want to use IDE? If you are looking for performance, why not
a virtio device?

> Thanks for your attention.
>
> Best,
> Huaicheng

--js