qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] misc AHCI cleanups
@ 2023-04-28 13:21 Niklas Cassel
  2023-04-28 13:21 ` [PATCH 1/9] hw/ide/ahci: remove stray backslash Niklas Cassel
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Hello John,

Here comes some misc AHCI cleanups.

Most are related to error handling.

Please review.

(I'm also working on a second series which will add support for
READ LOG EXT and READ LOG DMA EXT, but I will send that one out
once it is ready.)


Kind regards,
Niklas


Niklas Cassel (9):
  hw/ide/ahci: remove stray backslash
  hw/ide/core: set ERR_STAT in unsupported command completion
  hw/ide/ahci: write D2H FIS on when processing NCQ command
  hw/ide/ahci: simplify and document PxCI handling
  hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
  hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
  hw/ide/ahci: trigger either error IRQ or regular IRQ, not both
  hw/ide/ahci: fix ahci_write_fis_sdb()
  hw/ide/ahci: fix broken SError handling

 hw/ide/ahci.c | 111 +++++++++++++++++++++++++++++++++++---------------
 hw/ide/core.c |   2 +-
 2 files changed, 80 insertions(+), 33 deletions(-)

-- 
2.40.0



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/9] hw/ide/ahci: remove stray backslash
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-04-28 22:17   ` Philippe Mathieu-Daudé
  2023-05-17 17:14   ` John Snow
  2023-04-28 13:21 ` [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion Niklas Cassel
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

This backslash obviously does not belong here, so remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 55902e1df7..a36e3fb77c 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -690,7 +690,7 @@ static void ahci_reset_port(AHCIState *s, int port)
 
     s->dev[port].port_state = STATE_RUN;
     if (ide_state->drive_kind == IDE_CD) {
-        ahci_set_signature(d, SATA_SIGNATURE_CDROM);\
+        ahci_set_signature(d, SATA_SIGNATURE_CDROM);
         ide_state->status = SEEK_STAT | WRERR_STAT | READY_STAT;
     } else {
         ahci_set_signature(d, SATA_SIGNATURE_DISK);
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
  2023-04-28 13:21 ` [PATCH 1/9] hw/ide/ahci: remove stray backslash Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-05-17 21:12   ` John Snow
  2023-04-28 13:21 ` [PATCH 3/9] hw/ide/ahci: write D2H FIS on when processing NCQ command Niklas Cassel
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

Currently, the first time sending an unsupported command
(e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
Sending the unsupported command again, will correctly have ERR_STAT set.

When ide_cmd_permitted() returns false, it calls ide_abort_command().
ide_abort_command() first calls ide_transfer_stop(), which will call
ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
sets ERR_STAT in status.

ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
current status in the FIS, and raises an IRQ. (The status here will not
have ERR_STAT set!).

Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
ide_transfer_stop() will result in the FIS being written and an IRQ
being raised.

The reason why it works the second time, is that ERR_STAT will still
be set from the previous command, so when writing the FIS, the
completion will correctly have ERR_STAT set.

Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
raise an error IRQ correctly when receiving an unsupported command.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 45d14a25e9..c144d1155d 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -531,9 +531,9 @@ BlockAIOCB *ide_issue_trim(
 
 void ide_abort_command(IDEState *s)
 {
-    ide_transfer_stop(s);
     s->status = READY_STAT | ERR_STAT;
     s->error = ABRT_ERR;
+    ide_transfer_stop(s);
 }
 
 static void ide_set_retry(IDEState *s)
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/9] hw/ide/ahci: write D2H FIS on when processing NCQ command
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
  2023-04-28 13:21 ` [PATCH 1/9] hw/ide/ahci: remove stray backslash Niklas Cassel
  2023-04-28 13:21 ` [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-05-17 21:18   ` John Snow
  2023-04-28 13:21 ` [PATCH 4/9] hw/ide/ahci: simplify and document PxCI handling Niklas Cassel
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The way that BUSY + PxCI is cleared for NCQ (FPDMA QUEUED) commands is
described in SATA 3.5a Gold:

11.15 FPDMA QUEUED command protocol
DFPDMAQ2: ClearInterfaceBsy
"Transmit Register Device to Host FIS with the BSY bit cleared to zero
and the DRQ bit cleared to zero and Interrupt bit cleared to zero to
mark interface ready for the next command."

PxCI is currently cleared by handle_cmd(), but we don't write the D2H
FIS to the FIS Receive Area that actually caused PxCI to be cleared.

Similar to how ahci_pio_transfer() calls ahci_write_fis_pio() with an
additional parameter to write a PIO Setup FIS without raising an IRQ,
add a parameter to ahci_write_fis_d2h() so that ahci_write_fis_d2h()
also can write the FIS to the FIS Receive Area without raising an IRQ.

Change process_ncq_command() to call ahci_write_fis_d2h() without
raising an IRQ (similar to ahci_pio_transfer()), such that the FIS
Receive Area is in sync with the PxTFD shadow register.

E.g. Linux reads status and error fields from the FIS Receive Area
directly, so it is wise to keep the FIS Receive Area and the PxTFD
shadow register in sync.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index a36e3fb77c..62aebc8de7 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -43,7 +43,7 @@
 static void check_cmd(AHCIState *s, int port);
 static int handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
-static bool ahci_write_fis_d2h(AHCIDevice *ad);
+static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
 static void ahci_init_d2h(AHCIDevice *ad);
 static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
 static bool ahci_map_clb_address(AHCIDevice *ad);
@@ -618,7 +618,7 @@ static void ahci_init_d2h(AHCIDevice *ad)
         return;
     }
 
-    if (ahci_write_fis_d2h(ad)) {
+    if (ahci_write_fis_d2h(ad, true)) {
         ad->init_d2h_sent = true;
         /* We're emulating receiving the first Reg H2D Fis from the device;
          * Update the SIG register, but otherwise proceed as normal. */
@@ -850,7 +850,7 @@ static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t len, bool pio_fis_i)
     }
 }
 
-static bool ahci_write_fis_d2h(AHCIDevice *ad)
+static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i)
 {
     AHCIPortRegs *pr = &ad->port_regs;
     uint8_t *d2h_fis;
@@ -864,7 +864,7 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
     d2h_fis = &ad->res_fis[RES_FIS_RFIS];
 
     d2h_fis[0] = SATA_FIS_TYPE_REGISTER_D2H;
-    d2h_fis[1] = (1 << 6); /* interrupt bit */
+    d2h_fis[1] = d2h_fis_i ? (1 << 6) : 0; /* interrupt bit */
     d2h_fis[2] = s->status;
     d2h_fis[3] = s->error;
 
@@ -890,7 +890,10 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
         ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_TFES);
     }
 
-    ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
+    if (d2h_fis_i) {
+        ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
+    }
+
     return true;
 }
 
@@ -1120,6 +1123,8 @@ static void process_ncq_command(AHCIState *s, int port, const uint8_t *cmd_fis,
         return;
     }
 
+    ahci_write_fis_d2h(ad, false);
+
     ncq_tfs->used = 1;
     ncq_tfs->drive = ad;
     ncq_tfs->slot = slot;
@@ -1506,7 +1511,7 @@ static void ahci_cmd_done(const IDEDMA *dma)
     }
 
     /* update d2h status */
-    ahci_write_fis_d2h(ad);
+    ahci_write_fis_d2h(ad, true);
 
     if (ad->port_regs.cmd_issue && !ad->check_bh) {
         ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/9] hw/ide/ahci: simplify and document PxCI handling
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (2 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 3/9] hw/ide/ahci: write D2H FIS on when processing NCQ command Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-04-28 13:21 ` [PATCH 5/9] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set Niklas Cassel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

The AHCI spec states that:
For NCQ, PxCI is cleared on command queued successfully.

For non-NCQ, PxCI is cleared on command completed successfully.
(A non-NCQ command that completes with error does not clear PxCI.)

The current QEMU implementation either clears PxCI in check_cmd(),
or in ahci_cmd_done().

check_cmd() will clear PxCI for a command if handle_cmd() returns 0.
handle_cmd() will return -1 if BUSY or DRQ is set.

The QEMU implementation for NCQ commands will currently not set BUSY
or DRQ, so they will always have PxCI cleared by handle_cmd().
ahci_cmd_done() will never even get called for NCQ commands.

Non-NCQ commands are executed by ide_bus_exec_cmd().
Non-NCQ commands in QEMU are implemented either in a sync or in an async
way.

For non-NCQ commands implemented in a sync way, the command handler will
return true, and when ide_bus_exec_cmd() sees that a command handler
returns true, it will call ide_cmd_done() (which will call
ahci_cmd_done()). For a command implemented in a sync way,
ahci_cmd_done() will do nothing (since busy_slot is not set). Instead,
after ide_bus_exec_cmd() has finished, check_cmd() will clear PxCI for
these commands.

For non-NCQ commands implemented in an async way (using either aiocb or
pio_aiocb), the command handler will return false, ide_bus_exec_cmd()
will not call ide_cmd_done(), instead it is expected that the async
callback function will call ide_cmd_done() once the async command is
done. handle_cmd() will set busy_slot, if and only if BUSY or DRQ is
set, and this is checked _after_ ide_bus_exec_cmd() has returned.
handle_cmd() will return -1, so check_cmd() will not clear PxCI.
When the async callback calls ide_cmd_done() (which will call
ahci_cmd_done()), it will see that busy_slot is set, and
ahci_cmd_done() will clear PxCI.

This seems racy, since busy_slot is set _after_ ide_bus_exec_cmd() has
returned. The callback might come before busy_slot gets set. And it is
quite confusing that ahci_cmd_done() will be called for all non-NCQ
commands when the command is done, but will only clear PxCI in certain
cases, even though it will always write a D2H FIS and raise an IRQ.

Even worse, in the case where ahci_cmd_done() does not clear PxCI, it
still raises an IRQ. Host software might thus read an old PxCI value,
since PxCI is cleared (by check_cmd()) after the IRQ has been raised.

Try to simplify this by always setting busy_slot for non-NCQ commands,
such that ahci_cmd_done() will always be responsible for clearing PxCI
for non-NCQ commands.

For NCQ commands, clear PxCI when we receive the D2H FIS, but before
raising the IRQ, see AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and
RegFIS:ClearCI.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 70 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 20 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 62aebc8de7..9d79b071b8 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -41,9 +41,10 @@
 #include "trace.h"
 
 static void check_cmd(AHCIState *s, int port);
-static int handle_cmd(AHCIState *s, int port, uint8_t slot);
+static void handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
 static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
+static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t slot);
 static void ahci_init_d2h(AHCIDevice *ad);
 static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
 static bool ahci_map_clb_address(AHCIDevice *ad);
@@ -591,9 +592,8 @@ static void check_cmd(AHCIState *s, int port)
 
     if ((pr->cmd & PORT_CMD_START) && pr->cmd_issue) {
         for (slot = 0; (slot < 32) && pr->cmd_issue; slot++) {
-            if ((pr->cmd_issue & (1U << slot)) &&
-                !handle_cmd(s, port, slot)) {
-                pr->cmd_issue &= ~(1U << slot);
+            if (pr->cmd_issue & (1U << slot)) {
+                handle_cmd(s, port, slot);
             }
         }
     }
@@ -1123,6 +1123,22 @@ static void process_ncq_command(AHCIState *s, int port, const uint8_t *cmd_fis,
         return;
     }
 
+    /*
+     * A NCQ command clears the bit in PxCI after the command has been QUEUED
+     * successfully (ERROR not set, BUSY and DRQ cleared).
+     *
+     * For NCQ commands, PxCI will always be cleared here.
+     *
+     * (Once the NCQ command is COMPLETED, the device will send a SDB FIS with
+     * the interrupt bit set, which will clear PxSACT and raise an interrupt.)
+     */
+    ahci_clear_cmd_issue(ad, slot);
+
+    /*
+     * In reality, for NCQ commands, PxCI is cleared after receiving a D2H FIS
+     * without the interrupt bit set, but since ahci_write_fis_d2h() can raise
+     * an IRQ on error, we need to call them in reverse order.
+     */
     ahci_write_fis_d2h(ad, false);
 
     ncq_tfs->used = 1;
@@ -1197,6 +1213,7 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
 {
     IDEState *ide_state = &s->dev[port].port.ifs[0];
     AHCICmdHdr *cmd = get_cmd_header(s, port, slot);
+    AHCIDevice *ad = &s->dev[port];
     uint16_t opts = le16_to_cpu(cmd->opts);
 
     if (cmd_fis[1] & 0x0F) {
@@ -1273,11 +1290,19 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
     /* Reset transferred byte counter */
     cmd->status = 0;
 
+    /*
+     * A non-NCQ command clears the bit in PxCI after the command has COMPLETED
+     * successfully (ERROR not set, BUSY and DRQ cleared).
+     *
+     * For non-NCQ commands, PxCI will always be cleared by ahci_cmd_done().
+     */
+    ad->busy_slot = slot;
+
     /* We're ready to process the command in FIS byte 2. */
     ide_bus_exec_cmd(&s->dev[port].port, cmd_fis[2]);
 }
 
-static int handle_cmd(AHCIState *s, int port, uint8_t slot)
+static void handle_cmd(AHCIState *s, int port, uint8_t slot)
 {
     IDEState *ide_state;
     uint64_t tbl_addr;
@@ -1288,12 +1313,12 @@ static int handle_cmd(AHCIState *s, int port, uint8_t slot)
     if (s->dev[port].port.ifs[0].status & (BUSY_STAT|DRQ_STAT)) {
         /* Engine currently busy, try again later */
         trace_handle_cmd_busy(s, port);
-        return -1;
+        return;
     }
 
     if (!s->dev[port].lst) {
         trace_handle_cmd_nolist(s, port);
-        return -1;
+        return;
     }
     cmd = get_cmd_header(s, port, slot);
     /* remember current slot handle for later */
@@ -1303,7 +1328,7 @@ static int handle_cmd(AHCIState *s, int port, uint8_t slot)
     ide_state = &s->dev[port].port.ifs[0];
     if (!ide_state->blk) {
         trace_handle_cmd_badport(s, port);
-        return -1;
+        return;
     }
 
     tbl_addr = le64_to_cpu(cmd->tbl_addr);
@@ -1312,7 +1337,7 @@ static int handle_cmd(AHCIState *s, int port, uint8_t slot)
                              DMA_DIRECTION_TO_DEVICE, MEMTXATTRS_UNSPECIFIED);
     if (!cmd_fis) {
         trace_handle_cmd_badfis(s, port);
-        return -1;
+        return;
     } else if (cmd_len != 0x80) {
         ahci_trigger_irq(s, &s->dev[port], AHCI_PORT_IRQ_BIT_HBFS);
         trace_handle_cmd_badmap(s, port, cmd_len);
@@ -1336,15 +1361,6 @@ static int handle_cmd(AHCIState *s, int port, uint8_t slot)
 out:
     dma_memory_unmap(s->as, cmd_fis, cmd_len, DMA_DIRECTION_TO_DEVICE,
                      cmd_len);
-
-    if (s->dev[port].port.ifs[0].status & (BUSY_STAT|DRQ_STAT)) {
-        /* async command, complete later */
-        s->dev[port].busy_slot = slot;
-        return -1;
-    }
-
-    /* done handling the command */
-    return 0;
 }
 
 /* Transfer PIO data between RAM and device */
@@ -1498,6 +1514,16 @@ static int ahci_dma_rw_buf(const IDEDMA *dma, bool is_write)
     return 1;
 }
 
+static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t slot)
+{
+    IDEState *ide_state = &ad->port.ifs[0];
+
+    if (!(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
+        ad->port_regs.cmd_issue &= ~(1 << slot);
+    }
+}
+
+/* Non-NCQ command is done - This function is never called for NCQ commands. */
 static void ahci_cmd_done(const IDEDMA *dma)
 {
     AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
@@ -1506,11 +1532,15 @@ static void ahci_cmd_done(const IDEDMA *dma)
 
     /* no longer busy */
     if (ad->busy_slot != -1) {
-        ad->port_regs.cmd_issue &= ~(1 << ad->busy_slot);
+        ahci_clear_cmd_issue(ad, ad->busy_slot);
         ad->busy_slot = -1;
     }
 
-    /* update d2h status */
+    /*
+     * In reality, for non-NCQ commands, PxCI is cleared after receiving a D2H
+     * FIS with the interrupt bit set, but since ahci_write_fis_d2h() will raise
+     * an IRQ, we need to call them in reverse order.
+     */
     ahci_write_fis_d2h(ad, true);
 
     if (ad->port_regs.cmd_issue && !ad->check_bh) {
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/9] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (3 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 4/9] hw/ide/ahci: simplify and document PxCI handling Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-04-28 13:21 ` [PATCH 6/9] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared Niklas Cassel
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

For NCQ, PxCI is cleared on command queued successfully.
For non-NCQ, PxCI is cleared on command completed successfully.
Successfully means ERR_STAT, BUSY and DRQ are all cleared.

A command that has ERR_STAT set, does not get to clear PxCI.
See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
and 5.3.16.5 ERR:FatalTaskfile.

In the case of non-NCQ commands, not clearing PxCI is needed in order
for host software to be able to see which command slot that failed.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 9d79b071b8..366929132b 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1518,7 +1518,8 @@ static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t slot)
 {
     IDEState *ide_state = &ad->port.ifs[0];
 
-    if (!(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
+    if (!(ide_state->status & ERR_STAT) &&
+        !(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
         ad->port_regs.cmd_issue &= ~(1 << slot);
     }
 }
@@ -1527,6 +1528,7 @@ static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t slot)
 static void ahci_cmd_done(const IDEDMA *dma)
 {
     AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+    IDEState *ide_state = &ad->port.ifs[0];
 
     trace_ahci_cmd_done(ad->hba, ad->port_no);
 
@@ -1543,7 +1545,8 @@ static void ahci_cmd_done(const IDEDMA *dma)
      */
     ahci_write_fis_d2h(ad, true);
 
-    if (ad->port_regs.cmd_issue && !ad->check_bh) {
+    if (!(ide_state->status & ERR_STAT) &&
+        ad->port_regs.cmd_issue && !ad->check_bh) {
         ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
         qemu_bh_schedule(ad->check_bh);
     }
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/9] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (4 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 5/9] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-05-17 21:21   ` John Snow
  2023-04-28 13:21 ` [PATCH 7/9] hw/ide/ahci: trigger either error IRQ or regular IRQ, not both Niklas Cassel
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

According to AHCI 1.3.1 definition of PxSACT:
This field is cleared when PxCMD.ST is written from a '1' to a '0' by
software. This field is not cleared by a COMRESET or a software reset.

According to AHCI 1.3.1 definition of PxCI:
This field is also cleared when PxCMD.ST is written from a '1' to a '0'
by software.

Clearing PxCMD.ST is part of the error recovery procedure, see
AHCI 1.3.1, section "6.2 Error Recovery".

If we don't clear PxCI on error recovery, the previous command will
incorrectly still be marked as pending after error recovery.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 366929132b..2a59d0e0f5 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -329,6 +329,11 @@ static void ahci_port_write(AHCIState *s, int port, int offset, uint32_t val)
         ahci_check_irq(s);
         break;
     case AHCI_PORT_REG_CMD:
+        if ((pr->cmd & PORT_CMD_START) && !(val & PORT_CMD_START)) {
+            pr->scr_act = 0;
+            pr->cmd_issue = 0;
+        }
+
         /* Block any Read-only fields from being set;
          * including LIST_ON and FIS_ON.
          * The spec requires to set ICC bits to zero after the ICC change
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/9] hw/ide/ahci: trigger either error IRQ or regular IRQ, not both
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (5 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 6/9] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-05-17 21:22   ` John Snow
  2023-04-28 13:21 ` [PATCH 8/9] hw/ide/ahci: fix ahci_write_fis_sdb() Niklas Cassel
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

According to AHCI 1.3.1, 5.3.8.1 RegFIS:Entry, if ERR_STAT is set,
we jump to state ERR:FatalTaskfile, which will raise a TFES IRQ
unconditionally, regardless if the I bit is set in the FIS or not.

Thus, we should never raise a normal IRQ after having sent an error
IRQ.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 2a59d0e0f5..d88961b4c0 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -891,11 +891,10 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i)
     pr->tfdata = (ad->port.ifs[0].error << 8) |
         ad->port.ifs[0].status;
 
+    /* TFES IRQ is always raised if ERR_STAT is set, regardless of I bit. */
     if (d2h_fis[2] & ERR_STAT) {
         ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_TFES);
-    }
-
-    if (d2h_fis_i) {
+    } else if (d2h_fis_i) {
         ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
     }
 
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/9] hw/ide/ahci: fix ahci_write_fis_sdb()
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (6 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 7/9] hw/ide/ahci: trigger either error IRQ or regular IRQ, not both Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-04-28 13:21 ` [PATCH 9/9] hw/ide/ahci: fix broken SError handling Niklas Cassel
  2023-05-17 17:06 ` [PATCH 0/9] misc AHCI cleanups John Snow
  9 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

When there is an error, we need to raise a TFES error irq, see AHCI 1.3.1,
5.3.13.1 SDB:Entry.

If ERR_STAT is set, we jump to state ERR:FatalTaskfile, which will raise
a TFES IRQ unconditionally, regardless if the I bit is set in the FIS or
not.

Thus, we should never raise a normal IRQ after having sent an error IRQ.

It is valid to signal successfully completed commands as finished in the
same SDB FIS that generates the error IRQ. The important thing is that
commands that did not complete successfully (e.g. commands that were
aborted, do not get the finished bit set).

Before this commit, there was never a TFES IRQ raised on NCQ error.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index d88961b4c0..4950d3575e 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -806,8 +806,14 @@ static void ahci_write_fis_sdb(AHCIState *s, NCQTransferState *ncq_tfs)
     pr->scr_act &= ~ad->finished;
     ad->finished = 0;
 
-    /* Trigger IRQ if interrupt bit is set (which currently, it always is) */
-    if (sdb_fis->flags & 0x40) {
+    /*
+     * TFES IRQ is always raised if ERR_STAT is set, regardless of I bit.
+     * If ERR_STAT is not set, trigger SDBS IRQ if interrupt bit is set
+     * (which currently, it always is).
+     */
+    if (sdb_fis->status & ERR_STAT) {
+        ahci_trigger_irq(s, ad, AHCI_PORT_IRQ_BIT_TFES);
+    } else if (sdb_fis->flags & 0x40) {
         ahci_trigger_irq(s, ad, AHCI_PORT_IRQ_BIT_SDBS);
     }
 }
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 9/9] hw/ide/ahci: fix broken SError handling
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (7 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 8/9] hw/ide/ahci: fix ahci_write_fis_sdb() Niklas Cassel
@ 2023-04-28 13:21 ` Niklas Cassel
  2023-05-17 21:26   ` John Snow
  2023-05-17 17:06 ` [PATCH 0/9] misc AHCI cleanups John Snow
  9 siblings, 1 reply; 20+ messages in thread
From: Niklas Cassel @ 2023-04-28 13:21 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

From: Niklas Cassel <niklas.cassel@wdc.com>

When encountering an NCQ error, you should not write the NCQ tag to the
SError register. This is completely wrong.

The SError register has a clear definition, where each bit represents a
different error, see PxSERR definition in AHCI 1.3.1.

If we write a random value (like the NCQ tag) in SError, e.g. Linux will
read SError, and will trigger arbitrary error handling depending on the
NCQ tag that happened to be executing.

In case of success, ncq_cb() will call ncq_finish().
In case of error, ncq_cb() will call ncq_err() (which will clear
ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
sufficient to tell if finished should get set or not.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 hw/ide/ahci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4950d3575e..09a14408e3 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1011,7 +1011,6 @@ static void ncq_err(NCQTransferState *ncq_tfs)
 
     ide_state->error = ABRT_ERR;
     ide_state->status = READY_STAT | ERR_STAT;
-    ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
     qemu_sglist_destroy(&ncq_tfs->sglist);
     ncq_tfs->used = 0;
 }
@@ -1021,7 +1020,7 @@ static void ncq_finish(NCQTransferState *ncq_tfs)
     /* If we didn't error out, set our finished bit. Errored commands
      * do not get a bit set for the SDB FIS ACT register, nor do they
      * clear the outstanding bit in scr_act (PxSACT). */
-    if (!(ncq_tfs->drive->port_regs.scr_err & (1 << ncq_tfs->tag))) {
+    if (ncq_tfs->used) {
         ncq_tfs->drive->finished |= (1 << ncq_tfs->tag);
     }
 
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/9] hw/ide/ahci: remove stray backslash
  2023-04-28 13:21 ` [PATCH 1/9] hw/ide/ahci: remove stray backslash Niklas Cassel
@ 2023-04-28 22:17   ` Philippe Mathieu-Daudé
  2023-05-17 17:14   ` John Snow
  1 sibling, 0 replies; 20+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-04-28 22:17 UTC (permalink / raw)
  To: Niklas Cassel, John Snow
  Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On 28/4/23 15:21, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> This backslash obviously does not belong here, so remove it.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   hw/ide/ahci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] misc AHCI cleanups
  2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
                   ` (8 preceding siblings ...)
  2023-04-28 13:21 ` [PATCH 9/9] hw/ide/ahci: fix broken SError handling Niklas Cassel
@ 2023-05-17 17:06 ` John Snow
  2023-06-01 13:56   ` Niklas Cassel
  9 siblings, 1 reply; 20+ messages in thread
From: John Snow @ 2023-05-17 17:06 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> Hello John,
>

Hi Niklas!

I haven't been actively involved with AHCI for a while, so I am not
sure I can find the time to give this a deep scrub. I'm going to
assume based on '@wdc.com` that you probably know a thing or two more
about AHCI than I do, though. Can you tell me what testing you've
performed on this? As long as it doesn't cause any obvious
regressions, we might be able to push it through, but it might not be
up to me anymore. I can give it a review on technical merit, but with
regards to "correctness" I have to admit I am flying blind.

(I haven't looked at the patches yet, but if in your commit messages
you can point to the relevant sections of the relevant specifications,
that'd help immensely.)

> Here comes some misc AHCI cleanups.
>
> Most are related to error handling.

I've always found this the most difficult part to verify. In a real
system, the registers between AHCI and the actual hard disk are
*physically separate*, and they update at specific times based on the
transmission of the FIS packets. The model in QEMU doesn't bother with
a perfect reproduction of that, and so it's been a little tough to
verify correctness. I tried to improve it a bit back in the day, but
my understanding has always been a bit limited :)

Are there any vendor tools you're aware of that have test suites we
could use to verify behavior? Our AHCI tests are very rudimentary - I
wrote them straight out of undergrad as my first project at RH - and
likely gloss over and misunderstand many things about the
ATA/SATA/AHCI specifications.

>
> Please review.
>
> (I'm also working on a second series which will add support for
> READ LOG EXT and READ LOG DMA EXT, but I will send that one out
> once it is ready.)

Wow!

A question for you: is it worth solidifying which ATA specification we
conform to? I don't believe we adhere to any one specific model,
because a lot of the code is shared between PATA and SATA, and we "in
theory" support IDE hard drives for fairly old guest operating systems
that may or may not predate the DMA extensions. As a result, the
actual device emulation is kind of a mish-mash of different ATA
specifications, generally whichever version provided the
least-abstracted detail and was easy to implement.

If you're adding the logging features, that seems to push us towards
the newer end of the spectrum, but I'm not sure if this causes any
problems for guest operating systems doing probing to guess what kind
of device they're talking to.

Any input?

>
>
> Kind regards,
> Niklas
>
>
> Niklas Cassel (9):
>   hw/ide/ahci: remove stray backslash
>   hw/ide/core: set ERR_STAT in unsupported command completion
>   hw/ide/ahci: write D2H FIS on when processing NCQ command
>   hw/ide/ahci: simplify and document PxCI handling
>   hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
>   hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
>   hw/ide/ahci: trigger either error IRQ or regular IRQ, not both
>   hw/ide/ahci: fix ahci_write_fis_sdb()
>   hw/ide/ahci: fix broken SError handling
>
>  hw/ide/ahci.c | 111 +++++++++++++++++++++++++++++++++++---------------
>  hw/ide/core.c |   2 +-
>  2 files changed, 80 insertions(+), 33 deletions(-)
>
> --
> 2.40.0
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/9] hw/ide/ahci: remove stray backslash
  2023-04-28 13:21 ` [PATCH 1/9] hw/ide/ahci: remove stray backslash Niklas Cassel
  2023-04-28 22:17   ` Philippe Mathieu-Daudé
@ 2023-05-17 17:14   ` John Snow
  1 sibling, 0 replies; 20+ messages in thread
From: John Snow @ 2023-05-17 17:14 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> This backslash obviously does not belong here, so remove it.

Reviewed-by: John Snow <jsnow@redhat.com>

>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  hw/ide/ahci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 55902e1df7..a36e3fb77c 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -690,7 +690,7 @@ static void ahci_reset_port(AHCIState *s, int port)
>
>      s->dev[port].port_state = STATE_RUN;
>      if (ide_state->drive_kind == IDE_CD) {
> -        ahci_set_signature(d, SATA_SIGNATURE_CDROM);\
> +        ahci_set_signature(d, SATA_SIGNATURE_CDROM);
>          ide_state->status = SEEK_STAT | WRERR_STAT | READY_STAT;
>      } else {
>          ahci_set_signature(d, SATA_SIGNATURE_DISK);
> --
> 2.40.0
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion
  2023-04-28 13:21 ` [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion Niklas Cassel
@ 2023-05-17 21:12   ` John Snow
  2023-06-01 13:28     ` Niklas Cassel
  0 siblings, 1 reply; 20+ messages in thread
From: John Snow @ 2023-05-17 21:12 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> Currently, the first time sending an unsupported command
> (e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
> Sending the unsupported command again, will correctly have ERR_STAT set.
>
> When ide_cmd_permitted() returns false, it calls ide_abort_command().
> ide_abort_command() first calls ide_transfer_stop(), which will call
> ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
> sets ERR_STAT in status.
>
> ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
> current status in the FIS, and raises an IRQ. (The status here will not
> have ERR_STAT set!).
>
> Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
> ide_transfer_stop() will result in the FIS being written and an IRQ
> being raised.
>
> The reason why it works the second time, is that ERR_STAT will still
> be set from the previous command, so when writing the FIS, the
> completion will correctly have ERR_STAT set.
>
> Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
> raise an error IRQ correctly when receiving an unsupported command.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 45d14a25e9..c144d1155d 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -531,9 +531,9 @@ BlockAIOCB *ide_issue_trim(
>
>  void ide_abort_command(IDEState *s)
>  {
> -    ide_transfer_stop(s);
>      s->status = READY_STAT | ERR_STAT;
>      s->error = ABRT_ERR;
> +    ide_transfer_stop(s);
>  }
>
>  static void ide_set_retry(IDEState *s)
> --
> 2.40.0
>

Seems OK at a glance. Does this change the behavior of
ide_transfer_stop at all? I guess we've been using this order of
operations since 2008 at least. I didn't know C then.

ACK



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] hw/ide/ahci: write D2H FIS on when processing NCQ command
  2023-04-28 13:21 ` [PATCH 3/9] hw/ide/ahci: write D2H FIS on when processing NCQ command Niklas Cassel
@ 2023-05-17 21:18   ` John Snow
  0 siblings, 0 replies; 20+ messages in thread
From: John Snow @ 2023-05-17 21:18 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> The way that BUSY + PxCI is cleared for NCQ (FPDMA QUEUED) commands is
> described in SATA 3.5a Gold:
>
> 11.15 FPDMA QUEUED command protocol
> DFPDMAQ2: ClearInterfaceBsy
> "Transmit Register Device to Host FIS with the BSY bit cleared to zero
> and the DRQ bit cleared to zero and Interrupt bit cleared to zero to
> mark interface ready for the next command."
>
> PxCI is currently cleared by handle_cmd(), but we don't write the D2H
> FIS to the FIS Receive Area that actually caused PxCI to be cleared.
>
> Similar to how ahci_pio_transfer() calls ahci_write_fis_pio() with an
> additional parameter to write a PIO Setup FIS without raising an IRQ,
> add a parameter to ahci_write_fis_d2h() so that ahci_write_fis_d2h()
> also can write the FIS to the FIS Receive Area without raising an IRQ.
>
> Change process_ncq_command() to call ahci_write_fis_d2h() without
> raising an IRQ (similar to ahci_pio_transfer()), such that the FIS
> Receive Area is in sync with the PxTFD shadow register.
>
> E.g. Linux reads status and error fields from the FIS Receive Area
> directly, so it is wise to keep the FIS Receive Area and the PxTFD
> shadow register in sync.

I think for some time I wondered if this mattered, because I wasn't
sure when the guest CPU would actually regain control to check an
intermediate state in the memory area before we wrote the next FIS.
But, trusting your quoted blurb, I think this is more obviously
correct.

ACK

(Although, there seems to be a conflict on latest origin/master - can
you rebase, please?)

>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>  hw/ide/ahci.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index a36e3fb77c..62aebc8de7 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -43,7 +43,7 @@
>  static void check_cmd(AHCIState *s, int port);
>  static int handle_cmd(AHCIState *s, int port, uint8_t slot);
>  static void ahci_reset_port(AHCIState *s, int port);
> -static bool ahci_write_fis_d2h(AHCIDevice *ad);
> +static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
>  static void ahci_init_d2h(AHCIDevice *ad);
>  static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
>  static bool ahci_map_clb_address(AHCIDevice *ad);
> @@ -618,7 +618,7 @@ static void ahci_init_d2h(AHCIDevice *ad)
>          return;
>      }
>
> -    if (ahci_write_fis_d2h(ad)) {
> +    if (ahci_write_fis_d2h(ad, true)) {
>          ad->init_d2h_sent = true;
>          /* We're emulating receiving the first Reg H2D Fis from the device;
>           * Update the SIG register, but otherwise proceed as normal. */
> @@ -850,7 +850,7 @@ static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t len, bool pio_fis_i)
>      }
>  }
>
> -static bool ahci_write_fis_d2h(AHCIDevice *ad)
> +static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i)
>  {
>      AHCIPortRegs *pr = &ad->port_regs;
>      uint8_t *d2h_fis;
> @@ -864,7 +864,7 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
>      d2h_fis = &ad->res_fis[RES_FIS_RFIS];
>
>      d2h_fis[0] = SATA_FIS_TYPE_REGISTER_D2H;
> -    d2h_fis[1] = (1 << 6); /* interrupt bit */
> +    d2h_fis[1] = d2h_fis_i ? (1 << 6) : 0; /* interrupt bit */
>      d2h_fis[2] = s->status;
>      d2h_fis[3] = s->error;
>
> @@ -890,7 +890,10 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
>          ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_TFES);
>      }
>
> -    ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
> +    if (d2h_fis_i) {
> +        ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
> +    }
> +
>      return true;
>  }
>
> @@ -1120,6 +1123,8 @@ static void process_ncq_command(AHCIState *s, int port, const uint8_t *cmd_fis,
>          return;
>      }
>
> +    ahci_write_fis_d2h(ad, false);
> +
>      ncq_tfs->used = 1;
>      ncq_tfs->drive = ad;
>      ncq_tfs->slot = slot;
> @@ -1506,7 +1511,7 @@ static void ahci_cmd_done(const IDEDMA *dma)
>      }
>
>      /* update d2h status */
> -    ahci_write_fis_d2h(ad);
> +    ahci_write_fis_d2h(ad, true);
>
>      if (ad->port_regs.cmd_issue && !ad->check_bh) {
>          ad->check_bh = qemu_bh_new(ahci_check_cmd_bh, ad);
> --
> 2.40.0
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/9] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
  2023-04-28 13:21 ` [PATCH 6/9] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared Niklas Cassel
@ 2023-05-17 21:21   ` John Snow
  0 siblings, 0 replies; 20+ messages in thread
From: John Snow @ 2023-05-17 21:21 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:23 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> According to AHCI 1.3.1 definition of PxSACT:
> This field is cleared when PxCMD.ST is written from a '1' to a '0' by
> software. This field is not cleared by a COMRESET or a software reset.
>
> According to AHCI 1.3.1 definition of PxCI:
> This field is also cleared when PxCMD.ST is written from a '1' to a '0'
> by software.
>
> Clearing PxCMD.ST is part of the error recovery procedure, see
> AHCI 1.3.1, section "6.2 Error Recovery".
>
> If we don't clear PxCI on error recovery, the previous command will
> incorrectly still be marked as pending after error recovery.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

ACK.

> ---
>  hw/ide/ahci.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 366929132b..2a59d0e0f5 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -329,6 +329,11 @@ static void ahci_port_write(AHCIState *s, int port, int offset, uint32_t val)
>          ahci_check_irq(s);
>          break;
>      case AHCI_PORT_REG_CMD:
> +        if ((pr->cmd & PORT_CMD_START) && !(val & PORT_CMD_START)) {
> +            pr->scr_act = 0;
> +            pr->cmd_issue = 0;
> +        }
> +
>          /* Block any Read-only fields from being set;
>           * including LIST_ON and FIS_ON.
>           * The spec requires to set ICC bits to zero after the ICC change
> --
> 2.40.0
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/9] hw/ide/ahci: trigger either error IRQ or regular IRQ,  not both
  2023-04-28 13:21 ` [PATCH 7/9] hw/ide/ahci: trigger either error IRQ or regular IRQ, not both Niklas Cassel
@ 2023-05-17 21:22   ` John Snow
  0 siblings, 0 replies; 20+ messages in thread
From: John Snow @ 2023-05-17 21:22 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:23 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> According to AHCI 1.3.1, 5.3.8.1 RegFIS:Entry, if ERR_STAT is set,
> we jump to state ERR:FatalTaskfile, which will raise a TFES IRQ
> unconditionally, regardless if the I bit is set in the FIS or not.
>
> Thus, we should never raise a normal IRQ after having sent an error
> IRQ.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

ACK - and thanks for the spec pointers.

> ---
>  hw/ide/ahci.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 2a59d0e0f5..d88961b4c0 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -891,11 +891,10 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i)
>      pr->tfdata = (ad->port.ifs[0].error << 8) |
>          ad->port.ifs[0].status;
>
> +    /* TFES IRQ is always raised if ERR_STAT is set, regardless of I bit. */
>      if (d2h_fis[2] & ERR_STAT) {
>          ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_TFES);
> -    }
> -
> -    if (d2h_fis_i) {
> +    } else if (d2h_fis_i) {
>          ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
>      }
>
> --
> 2.40.0
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 9/9] hw/ide/ahci: fix broken SError handling
  2023-04-28 13:21 ` [PATCH 9/9] hw/ide/ahci: fix broken SError handling Niklas Cassel
@ 2023-05-17 21:26   ` John Snow
  0 siblings, 0 replies; 20+ messages in thread
From: John Snow @ 2023-05-17 21:26 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: qemu-block, qemu-devel, Damien Le Moal, Niklas Cassel

On Fri, Apr 28, 2023 at 9:23 AM Niklas Cassel <nks@flawful.org> wrote:
>
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> When encountering an NCQ error, you should not write the NCQ tag to the
> SError register. This is completely wrong.

Mea culpa ... !

>
> The SError register has a clear definition, where each bit represents a
> different error, see PxSERR definition in AHCI 1.3.1.
>
> If we write a random value (like the NCQ tag) in SError, e.g. Linux will
> read SError, and will trigger arbitrary error handling depending on the
> NCQ tag that happened to be executing.
>
> In case of success, ncq_cb() will call ncq_finish().
> In case of error, ncq_cb() will call ncq_err() (which will clear
> ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
> sufficient to tell if finished should get set or not.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>

The bulk of this series looks good, I think I'd be happy to take it,
the commit messages are extremely well written so if a regression
happens to surface, we'll be able to track down what went awry.

Want to rebase and resend it?

--js

> ---
>  hw/ide/ahci.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 4950d3575e..09a14408e3 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -1011,7 +1011,6 @@ static void ncq_err(NCQTransferState *ncq_tfs)
>
>      ide_state->error = ABRT_ERR;
>      ide_state->status = READY_STAT | ERR_STAT;
> -    ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
>      qemu_sglist_destroy(&ncq_tfs->sglist);
>      ncq_tfs->used = 0;
>  }
> @@ -1021,7 +1020,7 @@ static void ncq_finish(NCQTransferState *ncq_tfs)
>      /* If we didn't error out, set our finished bit. Errored commands
>       * do not get a bit set for the SDB FIS ACT register, nor do they
>       * clear the outstanding bit in scr_act (PxSACT). */
> -    if (!(ncq_tfs->drive->port_regs.scr_err & (1 << ncq_tfs->tag))) {
> +    if (ncq_tfs->used) {
>          ncq_tfs->drive->finished |= (1 << ncq_tfs->tag);
>      }
>
> --
> 2.40.0
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion
  2023-05-17 21:12   ` John Snow
@ 2023-06-01 13:28     ` Niklas Cassel
  0 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-06-01 13:28 UTC (permalink / raw)
  To: John Snow
  Cc: Niklas Cassel, qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Damien Le Moal

On Wed, May 17, 2023 at 05:12:57PM -0400, John Snow wrote:
> On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel <nks@flawful.org> wrote:
> >
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> >
> > Currently, the first time sending an unsupported command
> > (e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
> > Sending the unsupported command again, will correctly have ERR_STAT set.
> >
> > When ide_cmd_permitted() returns false, it calls ide_abort_command().
> > ide_abort_command() first calls ide_transfer_stop(), which will call
> > ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
> > sets ERR_STAT in status.
> >
> > ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
> > current status in the FIS, and raises an IRQ. (The status here will not
> > have ERR_STAT set!).
> >
> > Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
> > ide_transfer_stop() will result in the FIS being written and an IRQ
> > being raised.
> >
> > The reason why it works the second time, is that ERR_STAT will still
> > be set from the previous command, so when writing the FIS, the
> > completion will correctly have ERR_STAT set.
> >
> > Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
> > raise an error IRQ correctly when receiving an unsupported command.
> >
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >  hw/ide/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/ide/core.c b/hw/ide/core.c
> > index 45d14a25e9..c144d1155d 100644
> > --- a/hw/ide/core.c
> > +++ b/hw/ide/core.c
> > @@ -531,9 +531,9 @@ BlockAIOCB *ide_issue_trim(
> >
> >  void ide_abort_command(IDEState *s)
> >  {
> > -    ide_transfer_stop(s);
> >      s->status = READY_STAT | ERR_STAT;
> >      s->error = ABRT_ERR;
> > +    ide_transfer_stop(s);
> >  }
> >
> >  static void ide_set_retry(IDEState *s)
> > --
> > 2.40.0
> >
> 
> Seems OK at a glance. Does this change the behavior of
> ide_transfer_stop at all? I guess we've been using this order of
> operations since 2008 at least. I didn't know C then.

Hello John,

Not as far as I can see.


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] misc AHCI cleanups
  2023-05-17 17:06 ` [PATCH 0/9] misc AHCI cleanups John Snow
@ 2023-06-01 13:56   ` Niklas Cassel
  0 siblings, 0 replies; 20+ messages in thread
From: Niklas Cassel @ 2023-06-01 13:56 UTC (permalink / raw)
  To: John Snow
  Cc: Niklas Cassel, qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Damien Le Moal

On Wed, May 17, 2023 at 01:06:06PM -0400, John Snow wrote:
> On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel <nks@flawful.org> wrote:
> >
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> >
> > Hello John,
> >
> 
> Hi Niklas!
> 
> I haven't been actively involved with AHCI for a while, so I am not
> sure I can find the time to give this a deep scrub. I'm going to
> assume based on '@wdc.com` that you probably know a thing or two more
> about AHCI than I do, though. Can you tell me what testing you've
> performed on this? As long as it doesn't cause any obvious
> regressions, we might be able to push it through, but it might not be
> up to me anymore. I can give it a review on technical merit, but with
> regards to "correctness" I have to admit I am flying blind.

Hello John,

The testing is mostly using linux and injecting NCQ errors using some
additional QEMU patches that are not part of this series.

> 
> (I haven't looked at the patches yet, but if in your commit messages
> you can point to the relevant sections of the relevant specifications,
> that'd help immensely.)
> 
> > Here comes some misc AHCI cleanups.
> >
> > Most are related to error handling.
> 
> I've always found this the most difficult part to verify. In a real
> system, the registers between AHCI and the actual hard disk are
> *physically separate*, and they update at specific times based on the
> transmission of the FIS packets. The model in QEMU doesn't bother with
> a perfect reproduction of that, and so it's been a little tough to
> verify correctness. I tried to improve it a bit back in the day, but
> my understanding has always been a bit limited :)
> 
> Are there any vendor tools you're aware of that have test suites we
> could use to verify behavior?

Unfortunately, I don't know of any good test suite.

> A question for you: is it worth solidifying which ATA specification we
> conform to? I don't believe we adhere to any one specific model,
> because a lot of the code is shared between PATA and SATA, and we "in
> theory" support IDE hard drives for fairly old guest operating systems
> that may or may not predate the DMA extensions. As a result, the
> actual device emulation is kind of a mish-mash of different ATA
> specifications, generally whichever version provided the
> least-abstracted detail and was easy to implement.
> 
> If you're adding the logging features, that seems to push us towards
> the newer end of the spectrum, but I'm not sure if this causes any
> problems for guest operating systems doing probing to guess what kind
> of device they're talking to.
> 
> Any input?

I agree.

In my next series, after we have General Purpose Logging support,
I intend to bump the major version to indicate ACS-5 support.
I will need to verify that we are not missing any other major
feature from ACS-5 first though (other than GPL).


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-06-01 13:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-28 13:21 [PATCH 0/9] misc AHCI cleanups Niklas Cassel
2023-04-28 13:21 ` [PATCH 1/9] hw/ide/ahci: remove stray backslash Niklas Cassel
2023-04-28 22:17   ` Philippe Mathieu-Daudé
2023-05-17 17:14   ` John Snow
2023-04-28 13:21 ` [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion Niklas Cassel
2023-05-17 21:12   ` John Snow
2023-06-01 13:28     ` Niklas Cassel
2023-04-28 13:21 ` [PATCH 3/9] hw/ide/ahci: write D2H FIS on when processing NCQ command Niklas Cassel
2023-05-17 21:18   ` John Snow
2023-04-28 13:21 ` [PATCH 4/9] hw/ide/ahci: simplify and document PxCI handling Niklas Cassel
2023-04-28 13:21 ` [PATCH 5/9] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set Niklas Cassel
2023-04-28 13:21 ` [PATCH 6/9] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared Niklas Cassel
2023-05-17 21:21   ` John Snow
2023-04-28 13:21 ` [PATCH 7/9] hw/ide/ahci: trigger either error IRQ or regular IRQ, not both Niklas Cassel
2023-05-17 21:22   ` John Snow
2023-04-28 13:21 ` [PATCH 8/9] hw/ide/ahci: fix ahci_write_fis_sdb() Niklas Cassel
2023-04-28 13:21 ` [PATCH 9/9] hw/ide/ahci: fix broken SError handling Niklas Cassel
2023-05-17 21:26   ` John Snow
2023-05-17 17:06 ` [PATCH 0/9] misc AHCI cleanups John Snow
2023-06-01 13:56   ` Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).