From: dovgaluk <dovgaluk@ispras.ru>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Pavel Dovgalyuk <pavel.dovgaluk@gmail.com>,
jsnow@redhat.com, qemu-devel@nongnu.org,
pavel.dovgaluk@ispras.ru, mreitz@redhat.com
Subject: Re: [PATCH] icount: make dma reads deterministic
Date: Tue, 03 Mar 2020 15:31:29 +0300
Message-ID: <a6067910aa1bb4eb512c50292734b566@ispras.ru>
In-Reply-To: <20200302161903.GF4965@linux.fritz.box>
Kevin Wolf wrote on 2020-03-02 19:19:
> On 02.03.2020 at 13:59, Pavel Dovgalyuk wrote:
>> Windows guest sometimes makes DMA requests with overlapping
>> target addresses. This leads to the following structure of iov for
>> the block driver:
>>
>> addr size1
>> addr size2
>> addr size3
>>
>> It means that three adjacent disk blocks should be read into the same
>> memory buffer. Windows does not expect anything from these bytes
>> (should it be data from the first block, or the last one, or some mix),
>> but uses them somehow. It leads to non-determinism of the guest
>> execution, because the block driver does not preserve any order of
>> reading.
>>
>> This situation was discussed on the mailing list at least twice:
>> https://lists.gnu.org/archive/html/qemu-devel/2010-09/msg01996.html
>> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05185.html
>>
>> This patch makes such disk reads deterministic in icount mode.
>> It skips SG parts that were already affected by prior reads
>> within the same request. Parts that are not identical but merely
>> overlap are trimmed.
>>
>> Examples for different SG part sequences:
>>
>> 1)
>> A1 1000
>> A1 1000
>> ->
>> A1 1000
>>
>> 2)
>> A1 1000
>> A2 1000
>> A1 1000
>> A3 1000
>> ->
>> Two requests with different offsets, because second A1/1000 should be
>> skipped.
>> A1 1000
>> A2 1000
>> --
>> A3 1000
>
> How is the "--" line represented in the code?
>
>> 3)
>> A1 800
>> A2 1000
>> A1 1000
>> ->
>> First 800 bytes of third SG are skipped.
>> A1 800
>> A2 1000
>> --
>> A1+800 800
>>
>> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
>> ---
>> dma-helpers.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 53 insertions(+), 4 deletions(-)
>>
>> diff --git a/dma-helpers.c b/dma-helpers.c
>> index e8a26e81e1..d71512f707 100644
>> --- a/dma-helpers.c
>> +++ b/dma-helpers.c
>> @@ -13,6 +13,7 @@
>> #include "trace-root.h"
>> #include "qemu/thread.h"
>> #include "qemu/main-loop.h"
>> +#include "sysemu/cpus.h"
>>
>> /* #define DEBUG_IOMMU */
>>
>> @@ -139,17 +140,65 @@ static void dma_blk_cb(void *opaque, int ret)
>> dma_blk_unmap(dbs);
>>
>> while (dbs->sg_cur_index < dbs->sg->nsg) {
>> + bool skip = false;
>> cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
>> cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
>> - mem = dma_memory_map(dbs->sg->as, cur_addr, &cur_len, dbs->dir);
>> - if (!mem)
>> - break;
>> - qemu_iovec_add(&dbs->iov, mem, cur_len);
>> +
>> + /*
>> + * Make reads deterministic in icount mode.
>> + * Windows sometimes issues disk read requests with
>> + * overlapping SGs. It leads to non-determinism, because
>> + * resulting buffer contents may be mixed from several
>> + * sectors.
>> + * This code crops SGs that were already read in this request.
>> + */
>
> Please make use of the full line length for the commit text, and add
> empty lines between paragraphs.
Ok
>
>> + if (use_icount
>> + && dbs->dir == DMA_DIRECTION_FROM_DEVICE) {
>
> This fits in a single line.
Ok
>> + }
>> +
>> dbs->sg_cur_byte += cur_len;
>> if (dbs->sg_cur_byte == dbs->sg->sg[dbs->sg_cur_index].len) {
>> dbs->sg_cur_byte = 0;
>> ++dbs->sg_cur_index;
>> }
>> +
>> + /*
>> + * All remaining SGs were skipped.
>> + * This is not the reschedule case, because we already
>> + * performed the reads, and the last SGs were skipped.
>> + */
>> + if (dbs->sg_cur_index == dbs->sg->nsg && dbs->iov.size == 0) {
>> + dma_complete(dbs, ret);
>> + return;
>> + }
>> }
>
> I think the concept of skipping SG list entries makes this patch
> relatively complex. Maybe one of these would work better:
>
> 1. Instead of skipping, add a temporary bounce buffer to the iovec.
>
> 2. Instead of skipping, just exit the loop and effectively split the
> request in multiple parts (like you already do in one case). Then the
> memory will still be written to twice, but deterministically so that
> the later SG list entry always wins.
>
> I think 2. sounds quite attractive because you don't have to manage any
> additional state. You can even simplify the loop to use
> ranges_overlap()

Thanks for this idea. Please check the new version.
I couldn't find a way to check the SG addresses without making the
comparisons too complex or storing extra data, so I pass the iov
pointers directly to ranges_overlap().
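
To make the idea concrete, here is a minimal standalone sketch of the
splitting approach. It is not the code from the new version and not QEMU
API: MappedSeg, segs_overlap() and batch_len() are hypothetical names, and
segs_overlap() is only a local stand-in for the ranges_overlap() helper
mentioned above. It assumes that base + len never wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped SG entry (host address + length). */
typedef struct {
    uintptr_t base;
    size_t len;
} MappedSeg;

/*
 * Local stand-in for an overlap check: true if [a, a + alen) and
 * [b, b + blen) share at least one byte. Assumes no address wrap-around.
 */
static bool segs_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return alen != 0 && blen != 0 && a < b + blen && b < a + alen;
}

/*
 * Return how many entries, starting at 'start', can be put into a single
 * request before an entry overlaps one that is already part of it.
 * The caller would submit that many entries, wait for completion, and
 * call again from the new start, so a later entry always overwrites an
 * earlier one in a fixed order.
 */
static size_t batch_len(const MappedSeg *sg, size_t nsg, size_t start)
{
    size_t i, j;

    assert(start <= nsg);
    for (i = start; i < nsg; i++) {
        for (j = start; j < i; j++) {
            if (segs_overlap(sg[j].base, sg[j].len, sg[i].base, sg[i].len)) {
                return i - start;   /* split the request before entry i */
            }
        }
    }
    return nsg - start;
}
```

For the sequence A1, A2, A1, A3 from example 2, batch_len() gives 2 for
the first call and 2 for the second, so the second read into A1 is only
issued after the first one has completed.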
Pavel Dovgalyuk