From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38398)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1Z9bHa-0007q5-5R
	for qemu-devel@nongnu.org; Mon, 29 Jun 2015 11:47:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1Z9bHZ-0004Fc-7W
	for qemu-devel@nongnu.org; Mon, 29 Jun 2015 11:47:42 -0400
Message-ID: <55916894.7030501@redhat.com>
Date: Mon, 29 Jun 2015 11:47:32 -0400
From: John Snow <jsnow@redhat.com>
MIME-Version: 1.0
References: <1435018875-22527-1-git-send-email-jsnow@redhat.com>
	<1435018875-22527-7-git-send-email-jsnow@redhat.com>
	<20150626153514.GC31186@stefanha-thinkpad.redhat.com>
	<558D999B.9090001@redhat.com>
	<20150629142446.GH32151@stefanha-thinkpad.redhat.com>
	<55916763.5000209@redhat.com>
In-Reply-To: <55916763.5000209@redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH 06/16] ahci: record ncq
	failures
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: kwolf@redhat.com, pbonzini@redhat.com, qemu-block@nongnu.org, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>


On 06/29/2015 11:42 AM, John Snow wrote:
> 
> 
> On 06/29/2015 10:24 AM, Stefan Hajnoczi wrote:
>> On Fri, Jun 26, 2015 at 02:27:39PM -0400, John Snow wrote:
>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
>>>
>>>
>>>
>>> On 06/26/2015 11:35 AM, Stefan Hajnoczi wrote:
>>>> On Mon, Jun 22, 2015 at 08:21:05PM -0400, John Snow wrote:
>>>>> Handle NCQ failures for cases where we want to halt the VM on
>>>>> IO errors.
>>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com> ---
>>>>> hw/ide/ahci.c | 17 +++++++++++++++-- hw/ide/ahci.h     |  1 +
>>>>> hw/ide/internal.h |  1 + 3 files changed, 17 insertions(+), 2
>>>>> deletions(-)
>>>>>
>>>>> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index
>>>>> 71b5085..a838317 100644 --- a/hw/ide/ahci.c +++
>>>>> b/hw/ide/ahci.c @@ -959,13 +959,25 @@ static void ncq_cb(void
>>>>> *opaque, int ret) return; }
>>>>>
>>>>> +    ncq_tfs->halt = false;
>>>>
>>>> Why does halt need to be cleared here?
>>>>
> 
> I remember my thinking now. I didn't want to leave it dangling after a
> successful command, so I wanted to clear it on the callback.
> 
> It's harmless, but it's weird to have it set afterwards and I wanted
> it to be very crystal clear what it meant when it was set. (With this
> usage case, 'halt' being set ALWAYS means that there's a command to
> retry -- you don't have to test other variables like 'used' to tell if
> it's a left over flag.)
> 
> I actually want to leave it here, now.
> 

I am literally not awake: if I clear it in execute_ncq_command like I
offered, it will get cleared on retry and we won't leave it hanging.

It takes hitting the 'send' button to realize this.

>>>
>>> Might make more sense to clear it just on the beginning of every 
>>> command, in execute().
>>>
>>> There's no strong reason here other than "If there's an error and
>>> it should be set, it'll get reset again pretty soon." It's just a
>>> default state.
>>>
>>> I could move it from process to execute.
>>
>> By the way, does ->halt need to be cleared in ahci_reset_port()?
>>
>> I'm thinking of a scenario where requests have failed and the port
>> is reset.  We should not try to reissue those commands after
>> reset.
>>
> 
> If I keep it like I have it now (where it clears itself after a
> successful command), we can clear it alongside the 'used' flag to
> protect against the case where the guest issues a reset almost
> simultaneously with an errored read command, where it might be that
> the NCQ command halts the VM, then we try to reset immediately after boot.
> 
> 
> (Perilously tangential side-note: what's the default error action if
> you don't set rerror=stop or werror=stop? the ide code didn't make it
> particularly clear to me if it was IGNORE or REPORT.)
> 

Still curious about this bit, though.