* [Qemu-devel] about correctness of IDE emulation
From: Huaicheng Li (coperd) @ 2016-03-13 19:37 UTC
To: qemu-devel, Stefan Hajnoczi, Fam Zheng

Hi all,

I am having some trouble understanding the IDE emulation:

(1) IDE I/O down path (in the VCPU thread):
Upon KVM_EXIT_IO, the corresponding disk ioport write function writes the
I/O information to IDEState. The IDE read callback function then splits it
into **several DMA transfers** and eventually submits them to the AIO
request list for handling.

(2) I/O up path (worker thread -> QEMU main loop thread):
When a request in the AIO request list has been handled successfully, the
worker thread signals the I/O completion event to the QEMU main thread,
which later handles it in its callback (posix_aio_read). posix_aio_read
eventually returns to the IDE callback function, where a virtual interrupt
is generated to signal I/O completion to the guest.

What I'm confused about is this:

If one I/O is too large and needs several rounds (say 2) of DMA transfers,
it seems the second round begins only after the first has completed, by
reading data from **IDEState**. But the IDEState info may have been changed
by VCPU threads (by writing new I/Os to it) by the time the first transfer
finishes. From the code, I see that the IDE r/w callback function continues
the second transfer by referencing the information in IDEState. Wouldn't
this be problematic? Am I missing anything here?

Thanks.

Best,
Huaicheng
* Re: [Qemu-devel] about correctness of IDE emulation
From: Fam Zheng @ 2016-03-14 1:42 UTC
To: Huaicheng Li (coperd); +Cc: qemu-devel, Stefan Hajnoczi

On Sun, 03/13 14:37, Huaicheng Li (coperd) wrote:
> Hi all,
>
> What I'm confused about is that:
>
> If one I/O is too large and may need several rounds (say 2) of DMA transfers,
> it seems the second round transfer begins only after the completion of the
> first part, by reading data from **IDEState**. But the IDEState info may have
> been changed by VCPU threads (by writing new I/Os to it) when the first
> transfer finishes. From the code, I see that IDE r/w call back function will
> continue the second transfer by referencing IDEState's information. Wouldn't
> this be problematic? Am I missing anything here?

Can you give a concrete example? I/O in VCPU threads that changes IDEState
must also take care of the DMA transfers. For example, ide_reset() calls
blk_aio_cancel and clears s->nsectors. If an I/O handler fails to do so, it
is a bug.

Fam
* Re: [Qemu-devel] about correctness of IDE emulation
From: Huaicheng Li @ 2016-03-15 3:09 UTC
To: Fam Zheng; +Cc: qemu-devel, Stefan Hajnoczi

> On Mar 13, 2016, at 8:42 PM, Fam Zheng <famz@redhat.com> wrote:
>
> Can you give an concrete example? I/O in VCPU threads that changes IDEState
> must also take care of the DMA transfers, for example ide_reset() has
> blk_aio_cancel and clears s->nsectors. If an I/O handler fails to do so, it is
> a bug.
>
> Fam

I get it now. ide_exec_cmd() can only proceed when neither BUSY_STAT nor
DRQ_STAT is set. When the second DMA transfer continues, BUSY_STAT | DRQ_STAT
is already set, i.e., no new ide_exec_cmd() can enter. BUSY or DRQ is cleared
only when all DMA transfers are done, after which new writes to IDE are
allowed. Thus it is safe.

Thanks, Fam & Stefan.
* Re: [Qemu-devel] about correctness of IDE emulation
From: Stefan Hajnoczi @ 2016-03-15 12:59 UTC
To: Huaicheng Li; +Cc: Fam Zheng, qemu-devel

On Mon, Mar 14, 2016 at 10:09:17PM -0500, Huaicheng Li wrote:
> I get it now. ide_exec_cmd() can only proceed when BUSY_STAT|DRQ_STAT is not set.
> When the 2nd DMA transfer continues, BUSY_STAT | DRQ_STAT is already
> set, i.e., no other new ide_exec_cmd() can enter. BUSY or DRQ is removed only when
> all DMA transfers are done, after which new writes to IDE are allowed. Thus it's safe.
>
> Thanks, Fam & Stefan.

Okay, happy to see that the case you were thinking of is already covered by
QEMU.

If you do notice anything in the code which looks incorrect, just let us
know.

Stefan
* Re: [Qemu-devel] about correctness of IDE emulation
From: Huaicheng Li (coperd) @ 2016-04-13 7:25 UTC
To: qemu-devel; +Cc: Stefan Hajnoczi, John Snow

> On Mar 14, 2016, at 10:09 PM, Huaicheng Li <lhcwhu@gmail.com> wrote:
>
> I get it now. ide_exec_cmd() can only proceed when BUSY_STAT|DRQ_STAT is not set.
> When the 2nd DMA transfer continues, BUSY_STAT | DRQ_STAT is already
> set, i.e., no other new ide_exec_cmd() can enter. BUSY or DRQ is removed only when
> all DMA transfers are done, after which new writes to IDE are allowed. Thus it's safe.
>
> Thanks, Fam & Stefan.

Hi all, I have some further questions about IDE emulation:

(1). IDE can only handle I/Os one by one. So in the AIO queue there will
always be only **ONE** I/O from this IDE device, right? Big I/Os that need
to be split into several rounds of DMA transfers are also served one by one
(after one DMA transfer [as an AIO] is finished, another DMA transfer is
submitted, and so on). What I want to confirm is that there is no batch
submission in the IDE path at all. True?

(2). When the guest kernel prepares a big I/O which needs multiple rounds
of DMA transfers, will each DMA transfer round (one PRD entry) be trapped
and trigger one IDE emulation, or will IDE handle all the PRDs in one shot?

(3). I traced the execution of my guest application with big I/Os (each
read is 2MB). In the IDE layer, I found that each one is split into 512KB
chunks, one per DMA transfer. Why 512KB? From the BMDMA spec, a PRD table
can represent at most 64KB/8 bytes = 8192 buffers, each of which can be at
most a 64KB contiguous buffer. This would give us 8192*64KB = 512MB for
each DMA.

Am I missing anything here?

Thanks for your attention.

Best,
Huaicheng
* Re: [Qemu-devel] about correctness of IDE emulation
From: John Snow @ 2016-04-13 18:07 UTC
To: Huaicheng Li (coperd), qemu-devel; +Cc: Stefan Hajnoczi

On 04/13/2016 03:25 AM, Huaicheng Li (coperd) wrote:
> Hi all, I have some further puzzles about IDE emulation:
>
> (1). IDE can only handle I/Os one by one. So in the AIO queue there will always be only
> **ONE** I/O from this IDE, right? For the big I/Os which need to be split into several
> rounds of DMA transfers, they are also served one by one. (after one DMA transfer [as an
> AIO] is finished, another DMA transfer will be submitted and so on). Here I want to convey
> that there is no batch submission in IDE path at all. True?

Correct. In general, DMA requests are fulfilled all at once, so each read
request to the IDE device is usually processed as one giant DMA request. I
believe ATAPI DMA requests might be split into 2048-byte chunks, though.

> (2). When the guest kernel prepares to do a big I/O which need multiple rounds of DMA
> transfers, will each DMA transfer round (one PRD entry) be trapped and trigger one IDE
> emulation, or IDE will handle all the PRD in one shot?

The IDE emulator does not attempt to process the PRDs individually.
Instead, it builds an SGList that is passed down through the AIO stack and
eventually to Linux. I'm not sure how Linux decides to process contiguous
vs. noncontiguous PRD entries.

The IDE emulator, however, does not iterate per PRD except to build the
SGList. When the AIOCB is invoked, IDE expects that all the PRDs it
submitted were handled.

(For instance, there is an AHCI flag for PRDs which requests that an
interrupt be signalled after *this PRD* was processed. Unfortunately, there
is currently no way to detect this in QEMU, so I believe we ignore this
flag. AHCI describes this as an "opportunistic interrupt.")

> (3). I traced the execution of my guest application with big I/Os (each time reads 2MB),
> then in the IDE layer, I found that it's split into 512KB chunks for each DMA transfer.
> Why is 512KB here?? From the BMDMA spec, PRD table can at most represent 64KB/8bytes
> = 8192 buffers, each of which can be at most a 64KB contiguous buffer. This would give
> us 8192*64KB=512MB for each DMA.

The splitting you're seeing could be occurring in lots of different places:
your host OS, QEMU's AIO handling itself, or the guest OS. It's *not*
happening in the IDE emulator, though.

The IDE emulator itself does not attempt to split requests into 512KB
chunks. You can test this yourself by putting a tracer in dma_cb() in
core.c to see how many bytes IDE is requesting at a time. I was able to ask
for 1025 sectors in one shot using a modified version of tests/ide-test.

You can put a tracer in cmd_read_dma as well to see how many sectors the
guest is requesting from the IDE device at a time.

> Am I missing anything here?

Why do you want to use IDE? If you are looking for performance, why not a
virtio device?

> Thanks for your attention.
>
> Best,
> Huaicheng

--js
* Re: [Qemu-devel] about correctness of IDE emulation
From: Huaicheng Li @ 2016-04-13 21:12 UTC
To: John Snow; +Cc: qemu-devel, Stefan Hajnoczi

> On Apr 13, 2016, at 1:07 PM, John Snow <jsnow@redhat.com> wrote:
>
> Why do you want to use IDE? If you are looking for performance, why not
> a virtio device?

I'm just trying to understand how IDE emulation works and see where the
overhead comes in.

Thank you for the detailed explanation. I really appreciate that.

Best,
Huaicheng
* Re: [Qemu-devel] about correctness of IDE emulation
From: Stefan Hajnoczi @ 2016-03-14 10:15 UTC
To: Huaicheng Li (coperd); +Cc: Fam Zheng, qemu-devel

On Sun, Mar 13, 2016 at 02:37:21PM -0500, Huaicheng Li (coperd) wrote:
> I meet some trouble in understanding IDE emulation:
>
> (1) IDE I/O Down Path (In VCPU thread):
> upon KVM_EXIT_IO, corresponding disk ioport write function will write IO info to IDEState, then ide read callback function will eventually split it into **several DMA transfers** and eventually submit them to the AIO request list for handling.
>
> (2). I/O Up Path (worker thread -> QEMU main loop thread)
> when the request in AIO request list has been successfully handled, the worker thread will signal the QEMU main thread this I/O completion event, which is later handled by its callback (posix_aio_read). posix_aio_read will then eventually return to IDE callback function, where virtual interrupt is generated to signal guest about I/O completion.
>
> What I'm confused about is that:
>
> If one I/O is too large and may need several rounds (say 2) of DMA transfers, it seems the second round transfer begins only after the completion of the first part, by reading data from **IDEState**. But the IDEState info may have been changed by VCPU threads (by writing new I/Os to it) when the first transfer finishes. From the code, I see that IDE r/w call back function will continue the second transfer by referencing IDEState's information. Wouldn't this be problematic? Am I missing anything here?

Yes, it would be problematic. Is the case you are thinking about protected
by the following code?

  void ide_exec_cmd(IDEBus *bus, uint32_t val)
  {
      ...
      /* Only RESET is allowed while BSY and/or DRQ are set,
       * and only to ATAPI devices. */
      if (s->status & (BUSY_STAT|DRQ_STAT)) {
          if (val != WIN_DEVICE_RESET || s->drive_kind != IDE_CD) {
              return;
          }
      }

If not, please try writing a test case or post the specific hardware
register accesses you have in mind.

Stefan