All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>
Subject: Re: [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (ppc64 host)
Date: Mon, 06 Jul 2015 15:21:12 -0400	[thread overview]
Message-ID: <559AD528.5030903@redhat.com> (raw)
In-Reply-To: <559AABAF.9020208@redhat.com>



On 07/06/2015 12:24 PM, Paolo Bonzini wrote:
> 
> 
> On 06/07/2015 17:41, John Snow wrote:
>> I'll take a look. More than possible there's a race in the wait
>> conditional that I have just never seen before.
>>
>> I tweaked it a little recently to wait on both NCQ and traditional
>> completion, but it doesn't have a timeout or anything. I'll go ahead and
>> fix that while I'm here.
>>
>> Should be something simple, hopefully.

The one Peter Maydell reported is indeed something simple. Sent out a patch.

> 
> My hang in test_halted_ncq is different:

... Great :)

> 
> (gdb) up
> #1  0x00007f3aaadb4476 in read (__nbytes=1, __buf=0x7ffc0652abbf,
> __fd=<optimized out>) at /usr/include/bits/unistd.h:44
> 44	  return __read_alias (__fd, __buf, __nbytes);
> (gdb)
> #2  qtest_qmp_receive (s=s@entry=0x7f3aac017360) at
> /home/work/upstream/qemu/tests/libqtest.c:371
> 371	        len = read(s->qmp_fd, &c, 1);
> (gdb)
> #3  0x00007f3aaadb4c48 in qtest_qmp_eventwait (s=0x7f3aac017360,
> event=event@entry=0x7f3aaadbdc68 "STOP") at
> /home/work/upstream/qemu/tests/libqtest.c:477
> 477	        response = qtest_qmp_receive(s);
> (gdb)
> #4  0x00007f3aaadb3514 in qmp_eventwait (event=0x7f3aaadbdc68 "STOP") at
> /home/work/upstream/qemu/tests/libqtest.h:498
> 498	    return qtest_qmp_eventwait(global_qtest, event);
> (gdb)
> #5  ahci_guest_io_halt (ahci=ahci@entry=0x7f3aac014820,
> port=port@entry=5 '\005', ide_cmd=ide_cmd@entry=97 'a',
> buffer=buffer@entry=1097728, bufsize=bufsize@entry=4096,
> sector=sector@entry=0)
>     at /home/work/upstream/qemu/tests/libqos/ahci.c:604
> 604	    qmp_eventwait("STOP");
> (gdb)
> #6  0x00007f3aaada9e09 in ahci_halted_io_test (cmd_read=<optimized out>,
> cmd_write=<optimized out>) at
> /home/work/upstream/qemu/tests/ahci-test.c:1244
> 1244	    cmd = ahci_guest_io_halt(ahci, port, cmd_write,
> 
> 
> In QEMU, current_run_state is RUN_STATE_RUNNING, so the halt didn't
> happen.  It's 100% reproducible with "make check-qtest-{i386,x86_64} V=1
> -j12".  This is the final part of the qtest log:
> 
> [R +0.066539] readl 0xe0000390
> [S +0.066553] OK 0x0000000000000000
> [R +0.066582] memset 0x10b000 0x100 0x00
> [S +0.066597] OK
> [R +0.067627] write 0x10c000 0x1000
[snip]
> [S +0.067748] OK
> [R +0.067782] readl 0xe00003b8
> [S +0.067796] OK 0x0000000000000000
> [R +0.067824] read 0x10a020 0x20
> [S +0.067843] OK
> 0x0000000000000000000000000000000000000000000000000000000000000000
> [R +0.067874] write 0x10a020 0x20
> 0x0000000000000000000000000000000000000000000000000000000000000000
> [S +0.067897] OK
> [R +0.067926] write 0x10a020 0x20
> 0x450401000000000000e010000000000000000000000000000000000000000000
> [S +0.067940] OK
> [R +0.067964] write 0x10e000 0x14 0x2780610800000040000000000800000000000000
> [S +0.067980] OK
> [R +0.068004] write 0x10e080 0x10 0x00c010000000000000000000ff0f0080
> [S +0.068018] OK
> [R +0.068042] writel 0xe00003b4 0x2
> [S +0.068056] OK
> [R +0.068079] writel 0xe00003b8 0x2
> [S +0.069500] OK
> 
> This is the /tmp/qtest-blkdebug.INHp7s file:
> 
> [inject-error]
> event = "write_aio"
> errno = "5"
> state = "1"
> immediately = "off"
> once = "on"
> [set-state]
> event = "write_aio"
> new_state = "2"
> 
> There is no trace of the write payload in the .qcow2 file.

Smells like the injected error prevented that successfully, but...

> 
> Paolo
> 

Can't reproduce: Fedora 21, 4.0.5, Today's QEMU HEAD (7edd8e56) ...
So it's a race condition of some sort.

Obvious questions: You've tried a distclean?
What was your configure invocation?

My attempt:
make distclean
../../configure --target-list="x86_64-softmmu i386-softmmu" --enable-debug
make -j9
make check-qtest-{i386,x86_64} V=1 -j12

Completes just fine for me.

That aside, I can't figure out why the machine HALT *and* the write
wouldn't occur. Can you stick a bunch of printfs inside the NCQ error
handling function and see if there's something goofing up in there, or
if it's even making it there?

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index bb6a92f..f1af6f9 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -921,6 +921,7 @@ static void ncq_cb(void *opaque, int ret)
     IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];

     if (ret == -ECANCELED) {
+        fprintf(stderr, "ncq_cb: -ECANCELED\n");
         return;
     }

@@ -928,12 +929,17 @@ static void ncq_cb(void *opaque, int ret)
         bool is_read = ncq_tfs->cmd == READ_FPDMA_QUEUED;
         BlockErrorAction action = blk_get_error_action(ide_state->blk,
                                                        is_read, -ret);
+        fprintf(stderr, "ncq_cb: ret (%d) is < 0; action is %s\n",
+                ret, BlockErrorAction_lookup[action]);
         if (action == BLOCK_ERROR_ACTION_STOP) {
             ncq_tfs->halt = true;
             ide_state->bus->error_status = IDE_RETRY_HBA;
+            fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_STOP\n");
         } else if (action == BLOCK_ERROR_ACTION_REPORT) {
             ncq_err(ncq_tfs);
+            fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_REPORT\n");
         }
+        fprintf(stderr, "ncq_cb: -> blk_error_action\n");
         blk_error_action(ide_state->blk, action, is_read, -ret);
     } else {
         ide_state->status = READY_STAT | SEEK_STAT;


My guess is I have goofed something up above there in the error handling
section of ncq_cb, but "worksforme," so I am not sure what's going on.

If no prints show up during testing, maybe turn on AHCI_DEBUG as well
and see if any prints are showing up in the success case.

--js

      reply	other threads:[~2015-07-06 19:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-06 15:35 [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (ppc64 host) Peter Maydell
2015-07-06 15:41 ` John Snow
2015-07-06 16:24   ` Paolo Bonzini
2015-07-06 19:21     ` John Snow [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=559AD528.5030903@redhat.com \
    --to=jsnow@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.