Re: [Qemu-devel] [PATCH 2/2] scsi: esp: Improve consistency of RSTAT, RSEQ, and RINTR

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Guenter Roeck <linux@roeck-us.net>
To: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 2/2] scsi: esp: Improve consistency of RSTAT, RSEQ, and RINTR
Date: Thu, 29 Nov 2018 09:38:45 -0800	[thread overview]
Message-ID: <20181129173845.GA2929@roeck-us.net> (raw)
In-Reply-To: <734e8388-2f0f-1c5b-7767-29e43d261bcb@ilande.co.uk>

[-- Attachment #1: Type: text/plain, Size: 2486 bytes --]

Hi Mark,

On Thu, Nov 29, 2018 at 11:56:39AM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 09:58, Paolo Bonzini wrote:
> 
> > On 28/11/18 22:56, Guenter Roeck wrote:
> >> The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
> >> to reflect a consistent state. However, it is possible that the registers
> >> can change after RSTAT was read, but before RINTR is read.
> >>
> >> Guest OS		qemu
> >> --------		----
> >> Read RSTAT
> >> 			esp_command_complete()
> >> 			 RSTAT = STAT_ST
> >> 			 esp_dma_done()
> >> 			  RSTAT |= STAT_TC
> >> 			  RSEQ = 0
> >> 			  RINTR = INTR_BS
> >>
> >> Read RSEQ
> >> Read RINTR		RINTR = 0
> >> 			RSTAT &= ~STAT_TC
> >> 			RSEQ = SEQ_CD
> >>
> >> The guest OS would then try to handle INTR_BS combined with an old
> >> value of RSTAT. This sometimes resulted in lost events, spurious
> >> interrupts, guest OS confusion, and stalled SCSI operations.
> > 
> > The question is, why was the guest running the interrupt routine before
> > STAT_INT was set in RSTAT?  The code in esp_raise_irq seems good:
> > 
> >     if (!(s->rregs[ESP_RSTAT] & STAT_INT)) {
> >         s->rregs[ESP_RSTAT] |= STAT_INT;
> >         qemu_irq_raise(s->irq);
> >         trace_esp_raise_irq();
> >     }
> > 
> > Paolo
> 
> This patch is very interesting, as I have a long-running regression trying to boot
> NextSTEP 3.3 on qemu-system-sparc which I eventually bisected down to the commit that
> turned on iothread by default in QEMU.
> 
> The symptom is that ESP SCSI requests hang/timeout before the kernel is able to get
> to the userspace installer: however if you launch QEMU with "taskset –cpu-list 1
> qemu-system-sparc ..." then it works and you can complete the installation.
> 
> So certainly this suggests that there is a race condition still present in ESP
> somewhere. I've given this patch a spin, and in a few quick tests here I was able to
> consistently get further in kernel boot, but it still doesn't completely solve issue
> for me :/
> 

Can you try the attached patch ? It is a bit cleaner than the first version,
and works for me as well.

Note that this isn't perfect. Specifically, I see differences in handling
STAT_TC. The controller specification is a bit ambiguous in that regard,
but comparing the qemu code with real controller behavior shows that the
real controller does not reset STAT_TC when reading the interrupt status
register. That doesn't seem to matter for Linux, but it may influence
other guests.

Guenter

[-- Attachment #2: 0001-scsi-esp-Defer-command-completion-until-previous-int.patch --]
[-- Type: text/x-diff, Size: 5406 bytes --]

>From 751edd383f3c70d8a9195345b45aa3eb1ada7553 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Thu, 29 Nov 2018 09:17:42 -0800
Subject: [PATCH] scsi: esp: Defer command completion until previous interrupts
 have been handled

The guest OS reads RSTAT, RSEQ, and RINTR, and expects those registers
to reflect a consistent state. However, it is possible that the registers
can change after RSTAT was read, but before RINTR is read, when
esp_command_complete() is called.

Guest OS		qemu
--------		----
[handle interrupt]
Read RSTAT
			esp_command_complete()
			 RSTAT = STAT_ST
			 esp_dma_done()
			  RSTAT |= STAT_TC
			  RSEQ = 0
			  RINTR = INTR_BS

Read RSEQ
Read RINTR		RINTR = 0
			RSTAT &= ~STAT_TC
			RSEQ = SEQ_CD

The guest OS would then try to handle INTR_BS combined with an old
value of RSTAT. This sometimes resulted in lost events, spurious
interrupts, guest OS confusion, and stalled SCSI operations.
A typical guest error log (observed with various versions of Linux)
looks as follows.

scsi host1: Spurious irq, sreg=13.
...
scsi host1: Aborting command [84531f10:2a]
scsi host1: Current command [f882eea8:35]
scsi host1: Queued command [84531f10:2a]
scsi host1:  Active command [f882eea8:35]
scsi host1: Dumping command log
scsi host1: ent[15] CMD val[44] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[00] event[0c]
scsi host1: ent[16] CMD val[01] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
scsi host1: ent[17] CMD val[43] sreg[90] seqreg[00] sreg2[00] ireg[20] ss[02] event[0c]
scsi host1: ent[18] EVENT val[0d] sreg[92] seqreg[04] sreg2[00] ireg[18] ss[00] event[0c]
...

Defer handling command completion until previous interrupts have been
handled to fix the problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
This looks much cleaner than the previous patch and at the same time
identifies the problem. Enabling the new trace shows that the condition
is seen on a regular basis (on average 2-3 times per boot test with
qemu-system-hppa if the test does nothing but shut down immediately).

 hw/scsi/esp.c         | 29 ++++++++++++++++++++++++-----
 hw/scsi/trace-events  |  1 +
 include/hw/scsi/esp.h |  2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 630d923..e3d6cc5 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -286,11 +286,8 @@ static void esp_do_dma(ESPState *s)
     esp_dma_done(s);
 }
 
-void esp_command_complete(SCSIRequest *req, uint32_t status,
-                                 size_t resid)
+static void _esp_command_complete(ESPState *s, uint32_t status)
 {
-    ESPState *s = req->hba_private;
-
     trace_esp_command_complete();
     if (s->ti_size != 0) {
         trace_esp_command_complete_unexpected();
@@ -311,6 +308,23 @@ void esp_command_complete(SCSIRequest *req, uint32_t status,
     }
 }
 
+void esp_command_complete(SCSIRequest *req, uint32_t status,
+                          size_t resid)
+{
+    ESPState *s = req->hba_private;
+
+    if (s->rregs[ESP_RSTAT] & STAT_INT) {
+        /* Defer handling command complete until the previous
+         * interrupt has been handled.
+         */
+        trace_esp_command_complete_deferred();
+        s->deferred_status = status;
+        s->deferred_complete = true;
+        return;
+    }
+    _esp_command_complete(s, status);
+}
+
 void esp_transfer_data(SCSIRequest *req, uint32_t len)
 {
     ESPState *s = req->hba_private;
@@ -422,7 +436,10 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
         s->rregs[ESP_RSTAT] &= ~STAT_TC;
         s->rregs[ESP_RSEQ] = SEQ_CD;
         esp_lower_irq(s);
-
+        if (s->deferred_complete) {
+            _esp_command_complete(s, s->deferred_status);
+            s->deferred_complete = false;
+        }
         return old_val;
     case ESP_TCHI:
         /* Return the unique id if the value has never been written */
@@ -582,6 +599,8 @@ const VMStateDescription vmstate_esp = {
         VMSTATE_UINT32(ti_wptr, ESPState),
         VMSTATE_BUFFER(ti_buf, ESPState),
         VMSTATE_UINT32(status, ESPState),
+        VMSTATE_UINT32(deferred_status, ESPState),
+        VMSTATE_BOOL(deferred_complete, ESPState),
         VMSTATE_UINT32(dma, ESPState),
         VMSTATE_PARTIAL_BUFFER(cmdbuf, ESPState, 16),
         VMSTATE_BUFFER_START_MIDDLE_V(cmdbuf, ESPState, 16, 4),
diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
index 0fb6a99..2fe8a7c 100644
--- a/hw/scsi/trace-events
+++ b/hw/scsi/trace-events
@@ -167,6 +167,7 @@ esp_handle_satn_stop(uint32_t cmdlen) "cmdlen %d"
 esp_write_response(uint32_t status) "Transfer status (status=%d)"
 esp_do_dma(uint32_t cmdlen, uint32_t len) "command len %d + %d"
 esp_command_complete(void) "SCSI Command complete"
+esp_command_complete_deferred(void) "SCSI Command complete deferred"
 esp_command_complete_unexpected(void) "SCSI command completed unexpectedly"
 esp_command_complete_fail(void) "Command failed"
 esp_transfer_data(uint32_t dma_left, int32_t ti_size) "transfer %d/%d"
diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h
index 682a0d2..adab63d 100644
--- a/include/hw/scsi/esp.h
+++ b/include/hw/scsi/esp.h
@@ -23,6 +23,8 @@ struct ESPState {
     int32_t ti_size;
     uint32_t ti_rptr, ti_wptr;
     uint32_t status;
+    uint32_t deferred_status;
+    bool deferred_complete;
     uint32_t dma;
     uint8_t ti_buf[TI_BUFSZ];
     SCSIBus bus;
-- 
2.7.4

next prev parent reply	other threads:[~2018-11-29 17:43 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-28 21:56 [Qemu-devel] [PATCH 1/2] esp-pci: Fix status register write erase control Guenter Roeck
2018-11-28 21:56 ` [Qemu-devel] [PATCH 2/2] scsi: esp: Improve consistency of RSTAT, RSEQ, and RINTR Guenter Roeck
2018-11-29  9:58   ` Paolo Bonzini
2018-11-29 11:56     ` Mark Cave-Ayland
2018-11-29 15:42       ` Guenter Roeck
2018-11-29 17:38       ` Guenter Roeck [this message]
2018-11-29 17:53         ` Paolo Bonzini
2018-11-29 18:07         ` Mark Cave-Ayland
2018-11-29 19:00           ` Guenter Roeck
2018-11-29 19:33             ` Mark Cave-Ayland
2018-11-29 21:26               ` Guenter Roeck
2018-11-29 18:34         ` Mark Cave-Ayland
2018-11-29 19:07           ` Guenter Roeck
2018-11-29 19:38             ` Mark Cave-Ayland
2018-11-29 14:18     ` Guenter Roeck
2018-11-29  9:58 ` [Qemu-devel] [PATCH 1/2] esp-pci: Fix status register write erase control Paolo Bonzini

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:630d923 dfblob:e3d6cc5 dfblob:0fb6a99 dfblob:2fe8a7c
dfblob:682a0d2 dfblob:adab63d )
 OR (
bs:"scsi: esp: Defer command completion until previous interrupts" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181129173845.GA2929@roeck-us.net \
    --to=linux@roeck-us.net \
    --cc=famz@redhat.com \
    --cc=mark.cave-ayland@ilande.co.uk \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).