* [Qemu-devel] [PATCH]: fix QEMU SCSI lock up
@ 2008-09-24 22:59 Marcelo Tosatti
  2008-09-24 23:38 ` [Qemu-devel] " Anthony Liguori
  0 siblings, 1 reply; 5+ messages in thread
From: Marcelo Tosatti @ 2008-09-24 22:59 UTC
  To: qemu-devel, Anthony Liguori


From: Matteo Frigo <athena@fftw.org>
Date: Wed, 02 Apr 2008 20:41:24 -0400
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] QEMU/KVM SCSI lock up
Reply-To: qemu-devel@nongnu.org

kvm-64 hangs under heavy disk I/O with SCSI disks.  To reproduce,
create a fresh qcow2 disk, boot Linux, and execute

  dd if=/dev/sdX of=/dev/null bs=1M

on the fresh disk.  See also https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1895893&group_id=180599

I have attached a patch that appears to fix the problem.  The bug
seems to be the following.  scsi_read_data() performs these steps, in this order:

    bdrv_aio_read()
    r->sector += n;
    r->sector_count -= n;

For reasons that I do not fully understand, bdrv_aio_read() does
not return immediately, but instead it calls scsi_read_data()
recursively.  Since ``r->sector += n;'' has not been executed
yet, the re-entrant call triggers a read of the same sector, which
breaks the producer-consumer lockstep.  The fix is to reorder the
operations so that the counters are updated before the read is issued:

    r->sector += n;
    r->sector_count -= n;
    bdrv_aio_read()

A similar fix applies to scsi_write_data().
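
The effect of the reordering can be sketched with a small standalone C model. This is only an illustration under stated assumptions, not the QEMU code: `Request`, `mock_aio_read()`, `read_complete()`, `read_data()`, `count_repeated_reads()`, and the `MAX_READS` safety cap are all hypothetical names, and the mock block layer is assumed to invoke the completion callback before it returns, which is the behaviour reported above. In the real device the callback reaches scsi_read_data() less directly than it does here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_READS 16  /* safety cap so the buggy ordering terminates */

/* Hypothetical request state; not the real QEMU structure. */
typedef struct Request {
    int sector;        /* next sector to read */
    int sector_count;  /* sectors still to transfer */
    int reads_issued;  /* number of reads submitted so far */
    int repeats;       /* reads that started at the same sector as the previous one */
    int last_sector;   /* start sector of the previous read */
    int update_first;  /* 1 = counters updated before the read, 0 = after */
} Request;

typedef void (*Completion)(Request *r);

static void read_data(Request *r);

/* Mock block layer: completes synchronously, calling cb before returning. */
static void mock_aio_read(Request *r, int sector, Completion cb)
{
    if (r->reads_issued > 0 && sector == r->last_sector)
        r->repeats++;
    r->last_sector = sector;
    r->reads_issued++;
    cb(r);
}

/* Completion callback: immediately asks for the next chunk. */
static void read_complete(Request *r)
{
    read_data(r);
}

static void read_data(Request *r)
{
    assert(r != NULL);
    int n = r->sector_count < 4 ? r->sector_count : 4;
    if (n <= 0 || r->reads_issued >= MAX_READS)
        return;  /* transfer finished, or safety cap reached */

    if (r->update_first) {
        /* Patched ordering: advance the counters, then issue the read. */
        r->sector += n;
        r->sector_count -= n;
        mock_aio_read(r, r->sector - n, read_complete);
    } else {
        /* Original ordering: a re-entrant call sees stale counters. */
        mock_aio_read(r, r->sector, read_complete);
        r->sector += n;
        r->sector_count -= n;
    }
}

/* Runs one 8-sector request; returns how many reads repeated a sector. */
static int count_repeated_reads(int update_first)
{
    Request r = { .sector = 0, .sector_count = 8, .update_first = update_first };
    read_data(&r);
    return r.repeats;
}
```

In this model, `count_repeated_reads(0)` keeps re-reading sector 0 until the cap stops it, while `count_repeated_reads(1)` reads sectors 0 and 4 once each and then finishes.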

Thanks for developing kvm, it is truly an amazing piece of software.

Regards,
Matteo Frigo


diff -aur kvm-64.old/qemu/hw/scsi-disk.c kvm-64.new/qemu/hw/scsi-disk.c
--- kvm-64.old/qemu/hw/scsi-disk.c	2008-03-26 08:49:35.000000000 -0400
+++ kvm-64.new/qemu/hw/scsi-disk.c	2008-03-30 08:37:25.000000000 -0400
@@ -196,12 +196,12 @@
         n = SCSI_DMA_BUF_SIZE / 512;
 
     r->buf_len = n * 512;
-    r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
+    r->sector += n;
+    r->sector_count -= n;
+    r->aiocb = bdrv_aio_read(s->bdrv, r->sector - n, r->dma_buf, n,
                              scsi_read_complete, r);
     if (r->aiocb == NULL)
         scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-    r->sector += n;
-    r->sector_count -= n;
 }
 
 static void scsi_write_complete(void * opaque, int ret)
@@ -248,12 +248,12 @@
         BADF("Data transfer already in progress\n");
     n = r->buf_len / 512;
     if (n) {
-        r->aiocb = bdrv_aio_write(s->bdrv, r->sector, r->dma_buf, n,
+        r->sector += n;
+        r->sector_count -= n;
+        r->aiocb = bdrv_aio_write(s->bdrv, r->sector - n, r->dma_buf, n,
                                   scsi_write_complete, r);
         if (r->aiocb == NULL)
             scsi_command_complete(r, SENSE_HARDWARE_ERROR);
-        r->sector += n;
-        r->sector_count -= n;
     } else {
         /* Invoke completion routine to fetch data from host.  */
         scsi_write_complete(r, 0);

* [Qemu-devel] Re: [PATCH]: fix QEMU SCSI lock up
  2008-09-24 22:59 [Qemu-devel] [PATCH]: fix QEMU SCSI lock up Marcelo Tosatti
@ 2008-09-24 23:38 ` Anthony Liguori
  2008-09-25  8:00   ` Avi Kivity
  0 siblings, 1 reply; 5+ messages in thread
From: Anthony Liguori @ 2008-09-24 23:38 UTC
  To: Marcelo Tosatti; +Cc: qemu-devel

Marcelo Tosatti wrote:
> From: Matteo Frigo <athena@fftw.org>
> Date: Wed, 02 Apr 2008 20:41:24 -0400
> To: qemu-devel@nongnu.org
> Subject: [Qemu-devel] QEMU/KVM SCSI lock up
> Reply-To: qemu-devel@nongnu.org
>
> kvm-64 hangs under heavy disk I/O with SCSI disks.  To reproduce,
> create a fresh qcow2 disk, boot Linux, and execute
>
>   dd if=/dev/sdX of=/dev/null bs=1M
>
> on the fresh disk.  See also https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1895893&group_id=180599
>
> I have attached a patch that appears to fix the problem.  The bug
> seems to be the following.  scsi_read_data() performs these steps, in this order:
>
>     bdrv_aio_read()
>     r->sector += n;
>     r->sector_count -= n;
>
> For reasons that I do not fully understand, bdrv_aio_read() does
> not return immediately, but instead it calls scsi_read_data()
> recursively.

This bothers me.  bdrv_aio_read() should never immediately invoke the 
callback to prevent exactly this sort of problem.  Perhaps this was a 
bug that has since been fixed?  Is this still reproducible?

Regards,

Anthony Liguori

* Re: [Qemu-devel] Re: [PATCH]: fix QEMU SCSI lock up
  2008-09-24 23:38 ` [Qemu-devel] " Anthony Liguori
@ 2008-09-25  8:00   ` Avi Kivity
  2008-10-16 15:25     ` Avi Kivity
  0 siblings, 1 reply; 5+ messages in thread
From: Avi Kivity @ 2008-09-25  8:00 UTC
  To: qemu-devel; +Cc: Marcelo Tosatti

Anthony Liguori wrote:
>>
>> For reasons that I do not fully understand, bdrv_aio_read() does
>> not return immediately, but instead it calls scsi_read_data()
>> recursively.
>
> This bothers me.  bdrv_aio_read() should never immediately invoke the
> callback to prevent exactly this sort of problem.  Perhaps this was a
> bug that has since been fixed?  Is this still reproducible?

qcow2 metadata is synchronous, and if the disk is empty, there will be
no data I/O, so bdrv_aio_read() completes immediately and invokes the
callback before it returns.

Maybe we should fix this in qcow2 (and the other block formats) by
scheduling a BH.
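
A minimal sketch of the deferral idea, assuming a tiny hand-rolled queue in place of a real bottom half: `defer_call()`, `run_deferred()`, `mock_aio_read_deferred()`, `Submitter`, and `submit()` are hypothetical names and this is not QEMU's BH API. The point it illustrates is only that a request which needs no data I/O queues its completion instead of calling it directly, so the callback runs later from the main loop and never inside the submitter's stack frame.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*DeferredFn)(void *opaque);

typedef struct Deferred {
    DeferredFn fn;
    void *opaque;
} Deferred;

#define MAX_DEFERRED 8
static Deferred pending[MAX_DEFERRED];  /* FIFO of deferred calls */
static size_t pending_count;

/* Queue fn(opaque) to run later instead of calling it now. */
static void defer_call(DeferredFn fn, void *opaque)
{
    assert(pending_count < MAX_DEFERRED);
    pending[pending_count].fn = fn;
    pending[pending_count].opaque = opaque;
    pending_count++;
}

/* Meant to be called from the main loop, outside any submitter. */
static void run_deferred(void)
{
    while (pending_count > 0) {
        Deferred d = pending[0];
        for (size_t i = 1; i < pending_count; i++)
            pending[i - 1] = pending[i];
        pending_count--;
        d.fn(d.opaque);
    }
}

/* Hypothetical caller state used to detect re-entrancy. */
typedef struct Submitter {
    int in_submit;    /* nonzero while submit() is on the stack */
    int reentered;    /* set if the callback ran inside submit() */
    int completions;  /* number of completed requests */
} Submitter;

static void on_complete(void *opaque)
{
    Submitter *s = opaque;
    if (s->in_submit)
        s->reentered = 1;
    s->completions++;
}

/* Mock read with no data I/O to do: defer the callback rather than call it. */
static void mock_aio_read_deferred(DeferredFn cb, void *opaque)
{
    defer_call(cb, opaque);
}

static void submit(Submitter *s)
{
    s->in_submit = 1;
    mock_aio_read_deferred(on_complete, s);
    s->in_submit = 0;  /* state updates after the call are now safe */
}
```

After `submit()` returns, nothing has completed yet; the completion is delivered only when `run_deferred()` is called, at which point `reentered` stays 0.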

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

* Re: [Qemu-devel] Re: [PATCH]: fix QEMU SCSI lock up
  2008-09-25  8:00   ` Avi Kivity
@ 2008-10-16 15:25     ` Avi Kivity
  2008-10-21 14:57       ` Anthony Liguori
  0 siblings, 1 reply; 5+ messages in thread
From: Avi Kivity @ 2008-10-16 15:25 UTC
  To: qemu-devel; +Cc: Marcelo Tosatti

Avi Kivity wrote:
> Anthony Liguori wrote:
>   
>>> For reasons that I do not fully understand, bdrv_aio_read() does
>>> not return immediately, but instead it calls scsi_read_data()
>>> recursively.
>>>       
>> This bothers me.  bdrv_aio_read() should never immediately invoke the
>> callback to prevent exactly this sort of problem.  Perhaps this was a
>> bug that has since been fixed?  Is this still reproducible?
>>     
>
> qcow2 metadata is synchronous, and if the disk is empty, there will be
> no data I/O, so bdrv_aio_read() completes immediately and invokes the
> callback before it returns.
>
> Maybe we should fix this in qcow2 (and the other block formats) by
> scheduling a BH.
>   

FWIW, I was told this reproduces on kvm-77 (which has the latest qemu 
scsi bits).

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] Re: [PATCH]: fix QEMU SCSI lock up
  2008-10-16 15:25     ` Avi Kivity
@ 2008-10-21 14:57       ` Anthony Liguori
  0 siblings, 0 replies; 5+ messages in thread
From: Anthony Liguori @ 2008-10-21 14:57 UTC
  To: qemu-devel; +Cc: Marcelo Tosatti, Avi Kivity

Avi Kivity wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>  
>>>> For reasons that I do not fully understand, bdrv_aio_read() does
>>>> not return immediately, but instead it calls scsi_read_data()
>>>> recursively.
>>>>       
>>> This bothers me.  bdrv_aio_read() should never immediately invoke the
>>> callback to prevent exactly this sort of problem.  Perhaps this was a
>>> bug that has since been fixed?  Is this still reproducible?
>>>     
>>
>> qcow2 metadata is synchronous, and if the disk is empty, there will be
>> no data I/O, so bdrv_aio_read() completes immediately and invokes the
>> callback before it returns.
>>
>> Maybe we should fix this in qcow2 (and the other block formats) by
>> scheduling a BH.
>>   
>
> FWIW, I was told this reproduces on kvm-77 (which has the latest qemu 
> scsi bits).

qemu_aio_wait() will run bottom halves when emulating synchronous IO.  I
don't think this is exploitable, practically speaking, but it seems to me
like a major flaw.  I think the proper fix is what you describe:
modifying qcow2 to schedule a bottom half to read metadata.  Better yet
would be a full conversion that makes metadata reads and writes asynchronous.
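
The concern about synchronous emulation can be sketched as follows, again as a standalone model and not QEMU code: `defer_call()`, `mock_aio_wait()`, `mock_sync_read()`, and the two flags are hypothetical names, and the wait primitive is assumed to service pending deferred calls while it waits, as described above for qemu_aio_wait(). Under that assumption a deferred completion can still run while a synchronous call is on the stack.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*DeferredFn)(void *opaque);

static DeferredFn pending_fn;   /* single pending deferred call, if any */
static void *pending_opaque;
static int in_sync_io;          /* nonzero while a synchronous read is running */
static int ran_inside_sync_io;  /* set if a deferred call ran during one */

/* Queue one call to run later (one slot is enough for this sketch). */
static void defer_call(DeferredFn fn, void *opaque)
{
    assert(pending_fn == NULL);
    pending_fn = fn;
    pending_opaque = opaque;
}

/* Hypothetical wait primitive that also runs the pending deferred call. */
static void mock_aio_wait(void)
{
    if (pending_fn != NULL) {
        DeferredFn fn = pending_fn;
        pending_fn = NULL;
        fn(pending_opaque);
    }
}

/* Completion of some earlier asynchronous request. */
static void deferred_completion(void *opaque)
{
    (void)opaque;
    if (in_sync_io)
        ran_inside_sync_io = 1;
}

/* Emulated synchronous read: waits by calling the wait primitive. */
static void mock_sync_read(void)
{
    in_sync_io = 1;
    mock_aio_wait();  /* also runs whatever was deferred earlier */
    in_sync_io = 0;
}
```

If a completion is deferred and a synchronous read then starts before the main loop gets to run, the completion executes inside `mock_sync_read()`, which is the kind of nesting the deferral was meant to avoid.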

Regards,

Anthony Liguori
