qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state
@ 2023-08-24 13:38 Fiona Ebner
  2023-08-24 13:38 ` [POC 2/2] add test exposing AHCI reset issue Fiona Ebner
  2023-08-24 15:07 ` [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state Philippe Mathieu-Daudé
  0 siblings, 2 replies; 7+ messages in thread
From: Fiona Ebner @ 2023-08-24 13:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: thuth, lvivier, pbonzini, srowe, mike.maslenkin, qemu-block,
	t.lamprecht, a.lauterer

If there is a pending DMA operation during ide_bus_reset(), the fact
that the IDEState is already reset before the operation is canceled
can be problematic. In particular, ide_dma_cb() might be called and
then use the reset IDEState which contains the signature after the
reset. When used to construct the IO operation this leads to
ide_get_sector() returning 0 and nsector being 1. This is particularly
bad, because a write command will thus destroy the first sector which
often contains a partition table or similar.

Traces showing the unsolicited write happening with IDEState
0x5595af6949d0 being used after reset:

> ahci_port_write ahci(0x5595af6923f0)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000300
> ahci_reset_port ahci(0x5595af6923f0)[0]: reset port
> ide_reset IDEstate 0x5595af6949d0
> ide_reset IDEstate 0x5595af694da8
> ide_bus_reset_aio aio_cancel
> dma_aio_cancel dbs=0x7f64600089a0
> dma_blk_cb dbs=0x7f64600089a0 ret=0
> dma_complete dbs=0x7f64600089a0 ret=0 cb=0x5595acd40b30
> ahci_populate_sglist ahci(0x5595af6923f0)[0]
> ahci_dma_prepare_buf ahci(0x5595af6923f0)[0]: prepare buf limit=512 prepared=512
> ide_dma_cb IDEState 0x5595af6949d0; sector_num=0 n=1 cmd=DMA WRITE
> dma_blk_io dbs=0x7f6420802010 bs=0x5595ae2c6c30 offset=0 to_dev=1
> dma_blk_cb dbs=0x7f6420802010 ret=0

> (gdb) p *qiov
> $11 = {iov = 0x7f647c76d840, niov = 1, {{nalloc = 1, local_iov = {iov_base = 0x0,
>       iov_len = 512}}, {__pad = "\001\000\000\000\000\000\000\000\000\000\000",
>       size = 512}}}
> (gdb) bt
> #0  blk_aio_pwritev (blk=0x5595ae2c6c30, offset=0, qiov=0x7f6420802070, flags=0,
>     cb=0x5595ace6f0b0 <dma_blk_cb>, opaque=0x7f6420802010)
>     at ../block/block-backend.c:1682
> #1  0x00005595ace6f185 in dma_blk_cb (opaque=0x7f6420802010, ret=<optimized out>)
>     at ../softmmu/dma-helpers.c:179
> #2  0x00005595ace6f778 in dma_blk_io (ctx=0x5595ae0609f0,
>     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>     io_func=io_func@entry=0x5595ace6ee30 <dma_blk_write_io_func>,
>     io_func_opaque=io_func_opaque@entry=0x5595ae2c6c30,
>     cb=0x5595acd40b30 <ide_dma_cb>, opaque=0x5595af6949d0,
>     dir=DMA_DIRECTION_TO_DEVICE) at ../softmmu/dma-helpers.c:244
> #3  0x00005595ace6f90a in dma_blk_write (blk=0x5595ae2c6c30,
>     sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>     cb=cb@entry=0x5595acd40b30 <ide_dma_cb>, opaque=opaque@entry=0x5595af6949d0)
>     at ../softmmu/dma-helpers.c:280
> #4  0x00005595acd40e18 in ide_dma_cb (opaque=0x5595af6949d0, ret=<optimized out>)
>     at ../hw/ide/core.c:953
> #5  0x00005595ace6f319 in dma_complete (ret=0, dbs=0x7f64600089a0)
>     at ../softmmu/dma-helpers.c:107
> #6  dma_blk_cb (opaque=0x7f64600089a0, ret=0) at ../softmmu/dma-helpers.c:127
> #7  0x00005595ad12227d in blk_aio_complete (acb=0x7f6460005b10)
>     at ../block/block-backend.c:1527
> #8  blk_aio_complete (acb=0x7f6460005b10) at ../block/block-backend.c:1524
> #9  blk_aio_write_entry (opaque=0x7f6460005b10) at ../block/block-backend.c:1594
> #10 0x00005595ad258cfb in coroutine_trampoline (i0=<optimized out>,
>     i1=<optimized out>) at ../util/coroutine-ucontext.c:177

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 hw/ide/core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index de48ff9f86..7e601a379a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2515,19 +2515,19 @@ static void ide_dummy_transfer_stop(IDEState *s)
 
 void ide_bus_reset(IDEBus *bus)
 {
-    bus->unit = 0;
-    bus->cmd = 0;
-    ide_reset(&bus->ifs[0]);
-    ide_reset(&bus->ifs[1]);
-    ide_clear_hob(bus);
-
-    /* pending async DMA */
+    /* pending async DMA - needs the IDEState before it is reset */
     if (bus->dma->aiocb) {
         trace_ide_bus_reset_aio();
         blk_aio_cancel(bus->dma->aiocb);
         bus->dma->aiocb = NULL;
     }
 
+    bus->unit = 0;
+    bus->cmd = 0;
+    ide_reset(&bus->ifs[0]);
+    ide_reset(&bus->ifs[1]);
+    ide_clear_hob(bus);
+
     /* reset dma provider too */
     if (bus->dma->ops->reset) {
         bus->dma->ops->reset(bus->dma);
-- 
2.39.2




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [POC 2/2] add test exposing AHCI reset issue
  2023-08-24 13:38 [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state Fiona Ebner
@ 2023-08-24 13:38 ` Fiona Ebner
  2023-08-24 15:09   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2023-08-24 15:07 ` [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state Philippe Mathieu-Daudé
  1 sibling, 3 replies; 7+ messages in thread
From: Fiona Ebner @ 2023-08-24 13:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: thuth, lvivier, pbonzini, srowe, mike.maslenkin, qemu-block,
	t.lamprecht, a.lauterer

Fails without the previous commit "hw/ide: reset: cancel async DMA
operation before reseting state".

I haven't ever written such a test before, but I wanted something to
expose the problem more easily. It hardcodes the behavior that the
pending write actually is done during reset, which might not be ideal.
It could just check that the first sector is still intact instead.

If I should make this a proper test, I'd be happy about some guidance,
but not sure if required for such a specific one-off issue. After all,
a different variation of the bug might have written to some other
sector not covered by this test.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 tests/qtest/ahci-test.c | 81 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index abab761c26..3ebeb4e255 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -1401,6 +1401,84 @@ static void test_max(void)
     ahci_shutdown(ahci);
 }
 
+static void test_reset_with_pending_callback(void)
+{
+    AHCIQState *ahci;
+
+    ahci = ahci_boot(NULL);
+    ahci_test_pci_spec(ahci);
+    ahci_pci_enable(ahci);
+
+    int bufsize = 512 * 1024;
+    int offset1 = 0;
+    int offset2 = bufsize / AHCI_SECTOR_SIZE;
+
+    ahci_test_hba_spec(ahci);
+    ahci_hba_enable(ahci);
+    ahci_test_identify(ahci);
+
+    uint8_t port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    unsigned char *tx1 = g_malloc(bufsize);
+    unsigned char *tx2 = g_malloc(bufsize);
+    unsigned char *rx1 = g_malloc0(bufsize);
+    unsigned char *rx2 = g_malloc0(bufsize);
+    uint64_t ptr1 = ahci_alloc(ahci, bufsize);
+    uint64_t ptr2 = ahci_alloc(ahci, bufsize);
+
+    g_assert(ptr1 && ptr2);
+
+    /* Need two different patterns. */
+    do {
+        generate_pattern(tx1, bufsize, AHCI_SECTOR_SIZE);
+        generate_pattern(tx2, bufsize, AHCI_SECTOR_SIZE);
+    } while (memcmp(tx1, tx2, bufsize) == 0);
+
+    qtest_bufwrite(ahci->parent->qts, ptr1, tx1, bufsize);
+    qtest_bufwrite(ahci->parent->qts, ptr2, tx2, bufsize);
+
+    /* Write to beginning of disk to check it wasn't overwritten later. */
+    ahci_guest_io(ahci, port, CMD_WRITE_DMA_EXT, ptr1, bufsize, offset1);
+
+    /* Issue asynchronously to get a pending callback during reset. */
+    AHCICommand *cmd = ahci_command_create(CMD_WRITE_DMA_EXT);
+    ahci_command_adjust(cmd, offset2, ptr2, bufsize, 0);
+    ahci_command_commit(ahci, cmd, port);
+    ahci_command_issue_async(ahci, cmd);
+
+    ahci_set(ahci, AHCI_GHC, AHCI_GHC_HR);
+
+    ahci_command_free(cmd);
+
+    /* Start again. */
+    ahci_clean_mem(ahci);
+    ahci_pci_enable(ahci);
+    ahci_hba_enable(ahci);
+    port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    /* Read and verify. */
+    ahci_guest_io(ahci, port, CMD_READ_DMA_EXT, ptr1, bufsize, offset1);
+    qtest_bufread(ahci->parent->qts, ptr1, rx1, bufsize);
+    g_assert_cmphex(memcmp(tx1, rx1, bufsize), ==, 0);
+
+    ahci_guest_io(ahci, port, CMD_READ_DMA_EXT, ptr2, bufsize, offset2);
+    qtest_bufread(ahci->parent->qts, ptr2, rx2, bufsize);
+    g_assert_cmphex(memcmp(tx2, rx2, bufsize), ==, 0);
+
+    ahci_free(ahci, ptr1);
+    ahci_free(ahci, ptr2);
+    g_free(tx1);
+    g_free(tx2);
+    g_free(rx1);
+    g_free(rx2);
+
+    ahci_clean_mem(ahci);
+
+    ahci_shutdown(ahci);
+}
+
 static void test_reset(void)
 {
     AHCIQState *ahci;
@@ -1915,6 +1993,9 @@ int main(int argc, char **argv)
     g_assert(fd >= 0);
     close(fd);
 
+    qtest_add_func("/ahci/reset_with_pending_callback",
+                   test_reset_with_pending_callback);
+
     /* Run the tests */
     qtest_add_func("/ahci/sanity",     test_sanity);
     qtest_add_func("/ahci/pci_spec",   test_pci_spec);
-- 
2.39.2




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state
  2023-08-24 13:38 [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state Fiona Ebner
  2023-08-24 13:38 ` [POC 2/2] add test exposing AHCI reset issue Fiona Ebner
@ 2023-08-24 15:07 ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-08-24 15:07 UTC (permalink / raw)
  To: Fiona Ebner, qemu-devel
  Cc: thuth, lvivier, pbonzini, srowe, mike.maslenkin, qemu-block,
	t.lamprecht, a.lauterer

On 24/8/23 15:38, Fiona Ebner wrote:
> If there is a pending DMA operation during ide_bus_reset(), the fact
> that the IDEState is already reset before the operation is canceled
> can be problematic. In particular, ide_dma_cb() might be called and
> then use the reset IDEState which contains the signature after the
> reset. When used to construct the IO operation this leads to
> ide_get_sector() returning 0 and nsector being 1. This is particularly
> bad, because a write command will thus destroy the first sector which
> often contains a partition table or similar.
> 
> Traces showing the unsolicited write happening with IDEState
> 0x5595af6949d0 being used after reset:
> 
>> ahci_port_write ahci(0x5595af6923f0)[0]: port write [reg:PxSCTL] @ 0x2c: 0x00000300
>> ahci_reset_port ahci(0x5595af6923f0)[0]: reset port
>> ide_reset IDEstate 0x5595af6949d0
>> ide_reset IDEstate 0x5595af694da8
>> ide_bus_reset_aio aio_cancel
>> dma_aio_cancel dbs=0x7f64600089a0
>> dma_blk_cb dbs=0x7f64600089a0 ret=0
>> dma_complete dbs=0x7f64600089a0 ret=0 cb=0x5595acd40b30
>> ahci_populate_sglist ahci(0x5595af6923f0)[0]
>> ahci_dma_prepare_buf ahci(0x5595af6923f0)[0]: prepare buf limit=512 prepared=512
>> ide_dma_cb IDEState 0x5595af6949d0; sector_num=0 n=1 cmd=DMA WRITE
>> dma_blk_io dbs=0x7f6420802010 bs=0x5595ae2c6c30 offset=0 to_dev=1
>> dma_blk_cb dbs=0x7f6420802010 ret=0
> 
>> (gdb) p *qiov
>> $11 = {iov = 0x7f647c76d840, niov = 1, {{nalloc = 1, local_iov = {iov_base = 0x0,
>>        iov_len = 512}}, {__pad = "\001\000\000\000\000\000\000\000\000\000\000",
>>        size = 512}}}
>> (gdb) bt
>> #0  blk_aio_pwritev (blk=0x5595ae2c6c30, offset=0, qiov=0x7f6420802070, flags=0,
>>      cb=0x5595ace6f0b0 <dma_blk_cb>, opaque=0x7f6420802010)
>>      at ../block/block-backend.c:1682
>> #1  0x00005595ace6f185 in dma_blk_cb (opaque=0x7f6420802010, ret=<optimized out>)
>>      at ../softmmu/dma-helpers.c:179
>> #2  0x00005595ace6f778 in dma_blk_io (ctx=0x5595ae0609f0,
>>      sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>>      io_func=io_func@entry=0x5595ace6ee30 <dma_blk_write_io_func>,
>>      io_func_opaque=io_func_opaque@entry=0x5595ae2c6c30,
>>      cb=0x5595acd40b30 <ide_dma_cb>, opaque=0x5595af6949d0,
>>      dir=DMA_DIRECTION_TO_DEVICE) at ../softmmu/dma-helpers.c:244
>> #3  0x00005595ace6f90a in dma_blk_write (blk=0x5595ae2c6c30,
>>      sg=sg@entry=0x5595af694d00, offset=offset@entry=0, align=align@entry=512,
>>      cb=cb@entry=0x5595acd40b30 <ide_dma_cb>, opaque=opaque@entry=0x5595af6949d0)
>>      at ../softmmu/dma-helpers.c:280
>> #4  0x00005595acd40e18 in ide_dma_cb (opaque=0x5595af6949d0, ret=<optimized out>)
>>      at ../hw/ide/core.c:953
>> #5  0x00005595ace6f319 in dma_complete (ret=0, dbs=0x7f64600089a0)
>>      at ../softmmu/dma-helpers.c:107
>> #6  dma_blk_cb (opaque=0x7f64600089a0, ret=0) at ../softmmu/dma-helpers.c:127
>> #7  0x00005595ad12227d in blk_aio_complete (acb=0x7f6460005b10)
>>      at ../block/block-backend.c:1527
>> #8  blk_aio_complete (acb=0x7f6460005b10) at ../block/block-backend.c:1524
>> #9  blk_aio_write_entry (opaque=0x7f6460005b10) at ../block/block-backend.c:1594
>> #10 0x00005595ad258cfb in coroutine_trampoline (i0=<optimized out>,
>>      i1=<optimized out>) at ../util/coroutine-ucontext.c:177
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>   hw/ide/core.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [POC 2/2] add test exposing AHCI reset issue
  2023-08-24 13:38 ` [POC 2/2] add test exposing AHCI reset issue Fiona Ebner
@ 2023-08-24 15:09   ` Philippe Mathieu-Daudé
  2023-08-24 15:52   ` Thomas Huth
  2023-08-25 10:17   ` Fiona Ebner
  2 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-08-24 15:09 UTC (permalink / raw)
  To: Fiona Ebner, qemu-devel
  Cc: thuth, lvivier, pbonzini, srowe, mike.maslenkin, qemu-block,
	t.lamprecht, a.lauterer

On 24/8/23 15:38, Fiona Ebner wrote:
> Fails without the previous commit "hw/ide: reset: cancel async DMA
> operation before reseting state".
> 
> I haven't ever written such a test before, but I wanted something to
> expose the problem more easily. It hardcodes the behavior that the
> pending write actually is done during reset, which might not be ideal.
> It could just check that the first sector is still intact instead.
> 
> If I should make this a proper test, I'd be happy about some guidance,
> but not sure if required for such a specific one-off issue. After all,
> a different variation of the bug might have written to some other
> sector not covered by this test.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>   tests/qtest/ahci-test.c | 81 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 81 insertions(+)

Per our style, variables are allocated on function prologue,
otherwise the test LGTM:

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [POC 2/2] add test exposing AHCI reset issue
  2023-08-24 13:38 ` [POC 2/2] add test exposing AHCI reset issue Fiona Ebner
  2023-08-24 15:09   ` Philippe Mathieu-Daudé
@ 2023-08-24 15:52   ` Thomas Huth
  2023-08-25 10:17   ` Fiona Ebner
  2 siblings, 0 replies; 7+ messages in thread
From: Thomas Huth @ 2023-08-24 15:52 UTC (permalink / raw)
  To: Fiona Ebner, qemu-devel
  Cc: lvivier, pbonzini, srowe, mike.maslenkin, qemu-block, t.lamprecht,
	a.lauterer

On 24/08/2023 15.38, Fiona Ebner wrote:
> Fails without the previous commit "hw/ide: reset: cancel async DMA
> operation before reseting state".
> 
> I haven't ever written such a test before, but I wanted something to
> expose the problem more easily. It hardcodes the behavior that the
> pending write actually is done during reset, which might not be ideal.
> It could just check that the first sector is still intact instead.
> 
> If I should make this a proper test, I'd be happy about some guidance,
> but not sure if required for such a specific one-off issue. After all,
> a different variation of the bug might have written to some other
> sector not covered by this test.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>   tests/qtest/ahci-test.c | 81 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 81 insertions(+)
> 
> diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
> index abab761c26..3ebeb4e255 100644
> --- a/tests/qtest/ahci-test.c
> +++ b/tests/qtest/ahci-test.c
> @@ -1401,6 +1401,84 @@ static void test_max(void)
>       ahci_shutdown(ahci);
>   }
>   
> +static void test_reset_with_pending_callback(void)
> +{
> +    AHCIQState *ahci;
> +
> +    ahci = ahci_boot(NULL);
> +    ahci_test_pci_spec(ahci);
> +    ahci_pci_enable(ahci);
> +
> +    int bufsize = 512 * 1024;
> +    int offset1 = 0;
> +    int offset2 = bufsize / AHCI_SECTOR_SIZE;
> +
> +    ahci_test_hba_spec(ahci);
> +    ahci_hba_enable(ahci);
> +    ahci_test_identify(ahci);
> +
> +    uint8_t port = ahci_port_select(ahci);
> +    ahci_port_clear(ahci, port);
> +
> +    unsigned char *tx1 = g_malloc(bufsize);
> +    unsigned char *tx2 = g_malloc(bufsize);
> +    unsigned char *rx1 = g_malloc0(bufsize);
> +    unsigned char *rx2 = g_malloc0(bufsize);
> +    uint64_t ptr1 = ahci_alloc(ahci, bufsize);
> +    uint64_t ptr2 = ahci_alloc(ahci, bufsize);

As Philippe already mentioned, please declare variables at the beginning of 
the functions to match our coding style.

I'd also oike to suggest to use g_autofree for the buffers that you 
malloced, so you can drop the g_frees at the end of the function.

> +    g_assert(ptr1 && ptr2);
> +
> +    /* Need two different patterns. */
> +    do {
> +        generate_pattern(tx1, bufsize, AHCI_SECTOR_SIZE);
> +        generate_pattern(tx2, bufsize, AHCI_SECTOR_SIZE);
> +    } while (memcmp(tx1, tx2, bufsize) == 0);
> +
> +    qtest_bufwrite(ahci->parent->qts, ptr1, tx1, bufsize);
> +    qtest_bufwrite(ahci->parent->qts, ptr2, tx2, bufsize);
> +
> +    /* Write to beginning of disk to check it wasn't overwritten later. */
> +    ahci_guest_io(ahci, port, CMD_WRITE_DMA_EXT, ptr1, bufsize, offset1);
> +
> +    /* Issue asynchronously to get a pending callback during reset. */
> +    AHCICommand *cmd = ahci_command_create(CMD_WRITE_DMA_EXT);
> +    ahci_command_adjust(cmd, offset2, ptr2, bufsize, 0);
> +    ahci_command_commit(ahci, cmd, port);
> +    ahci_command_issue_async(ahci, cmd);
> +
> +    ahci_set(ahci, AHCI_GHC, AHCI_GHC_HR);
> +
> +    ahci_command_free(cmd);
> +
> +    /* Start again. */
> +    ahci_clean_mem(ahci);
> +    ahci_pci_enable(ahci);
> +    ahci_hba_enable(ahci);
> +    port = ahci_port_select(ahci);
> +    ahci_port_clear(ahci, port);
> +
> +    /* Read and verify. */
> +    ahci_guest_io(ahci, port, CMD_READ_DMA_EXT, ptr1, bufsize, offset1);
> +    qtest_bufread(ahci->parent->qts, ptr1, rx1, bufsize);
> +    g_assert_cmphex(memcmp(tx1, rx1, bufsize), ==, 0);
> +
> +    ahci_guest_io(ahci, port, CMD_READ_DMA_EXT, ptr2, bufsize, offset2);
> +    qtest_bufread(ahci->parent->qts, ptr2, rx2, bufsize);
> +    g_assert_cmphex(memcmp(tx2, rx2, bufsize), ==, 0);
> +
> +    ahci_free(ahci, ptr1);
> +    ahci_free(ahci, ptr2);
> +    g_free(tx1);
> +    g_free(tx2);
> +    g_free(rx1);
> +    g_free(rx2);
> +
> +    ahci_clean_mem(ahci);
> +
> +    ahci_shutdown(ahci);
> +}
> +
>   static void test_reset(void)
>   {
>       AHCIQState *ahci;
> @@ -1915,6 +1993,9 @@ int main(int argc, char **argv)
>       g_assert(fd >= 0);
>       close(fd);
>   
> +    qtest_add_func("/ahci/reset_with_pending_callback",
> +                   test_reset_with_pending_callback);

It would make more sense to put this below the "Run the tests" comment below.

>       /* Run the tests */
>       qtest_add_func("/ahci/sanity",     test_sanity);
>       qtest_add_func("/ahci/pci_spec",   test_pci_spec);

  Thomas



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [POC 2/2] add test exposing AHCI reset issue
  2023-08-24 13:38 ` [POC 2/2] add test exposing AHCI reset issue Fiona Ebner
  2023-08-24 15:09   ` Philippe Mathieu-Daudé
  2023-08-24 15:52   ` Thomas Huth
@ 2023-08-25 10:17   ` Fiona Ebner
  2023-09-04  9:26     ` Kevin Wolf
  2 siblings, 1 reply; 7+ messages in thread
From: Fiona Ebner @ 2023-08-25 10:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: thuth, lvivier, pbonzini, srowe, mike.maslenkin, qemu-block,
	t.lamprecht, a.lauterer, Philippe Mathieu-Daudé

Am 24.08.23 um 15:38 schrieb Fiona Ebner:
> Fails without the previous commit "hw/ide: reset: cancel async DMA
> operation before reseting state".
> 
> I haven't ever written such a test before, but I wanted something to
> expose the problem more easily. It hardcodes the behavior that the
> pending write actually is done during reset, which might not be ideal.
> It could just check that the first sector is still intact instead.
> 
> If I should make this a proper test, I'd be happy about some guidance,
> but not sure if required for such a specific one-off issue. After all,
> a different variation of the bug might have written to some other
> sector not covered by this test.
> 

While trying to turn it into a proper test with Philippe's and Thomas's
suggestions, I wanted to add a comment about the buffer size. So I tried
figuring out what the "magic" value is. At the very beginning, I had
tried 4 KiB, but then the callback wouldn't be pending, so I just picked
512 KiB for my proof-of-concept. It turns out to be racy though, and
with a buffer size of 64 KiB, it is flaky whether or not the callback is
still pending on my system. Should I just pick a large enough buffer
size (maybe 4 MiB) and hope for the best?

Best Regards,
Fiona



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [POC 2/2] add test exposing AHCI reset issue
  2023-08-25 10:17   ` Fiona Ebner
@ 2023-09-04  9:26     ` Kevin Wolf
  0 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2023-09-04  9:26 UTC (permalink / raw)
  To: Fiona Ebner
  Cc: qemu-devel, thuth, lvivier, pbonzini, srowe, mike.maslenkin,
	qemu-block, t.lamprecht, a.lauterer, Philippe Mathieu-Daudé

Am 25.08.2023 um 12:17 hat Fiona Ebner geschrieben:
> Am 24.08.23 um 15:38 schrieb Fiona Ebner:
> > Fails without the previous commit "hw/ide: reset: cancel async DMA
> > operation before reseting state".
> > 
> > I haven't ever written such a test before, but I wanted something to
> > expose the problem more easily. It hardcodes the behavior that the
> > pending write actually is done during reset, which might not be ideal.
> > It could just check that the first sector is still intact instead.
> > 
> > If I should make this a proper test, I'd be happy about some guidance,
> > but not sure if required for such a specific one-off issue. After all,
> > a different variation of the bug might have written to some other
> > sector not covered by this test.
> > 
> 
> While trying to turn it into a proper test with Philippe's and Thomas's
> suggestions, I wanted to add a comment about the buffer size. So I tried
> figuring out what the "magic" value is. At the very beginning, I had
> tried 4 KiB, but then the callback wouldn't be pending, so I just picked
> 512 KiB for my proof-of-concept. It turns out to be racy though, and
> with a buffer size of 64 KiB, it is flaky whether or not the callback is
> still pending on my system. Should I just pick a large enough buffer
> size (maybe 4 MiB) and hope for the best?

If the problem is that the request may complete too fast, have you tried
using I/O throttling? This is a common approach in qemu-iotests.

Note however that a single big request won't be throttled. If you exceed
the limit, it's only the next request that has to wait until we made up
for the previous one. So you'll want to set the limit below the request
size of a first request (so that we do get some delay), but not much
lower (to avoid having to wait for too long), and then send the second
request that should be delayed for a bit.

Kevin



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-09-04  9:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-24 13:38 [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state Fiona Ebner
2023-08-24 13:38 ` [POC 2/2] add test exposing AHCI reset issue Fiona Ebner
2023-08-24 15:09   ` Philippe Mathieu-Daudé
2023-08-24 15:52   ` Thomas Huth
2023-08-25 10:17   ` Fiona Ebner
2023-09-04  9:26     ` Kevin Wolf
2023-08-24 15:07 ` [PATCH 1/2] hw/ide: reset: cancel async DMA operation before reseting state Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).