From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1LQMuW-0003on-Vl for qemu-devel@nongnu.org; Fri, 23 Jan 2009 09:21:29 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1LQMuT-0003kM-Ar for qemu-devel@nongnu.org; Fri, 23 Jan 2009 09:21:28 -0500 Received: from [199.232.76.173] (port=39448 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LQMuS-0003k9-Sr for qemu-devel@nongnu.org; Fri, 23 Jan 2009 09:21:24 -0500 Received: from e6.ny.us.ibm.com ([32.97.182.146]:57280) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1LQMuS-0001RQ-JW for qemu-devel@nongnu.org; Fri, 23 Jan 2009 09:21:24 -0500 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e6.ny.us.ibm.com (8.13.1/8.13.1) with ESMTP id n0NELiW9005394 for ; Fri, 23 Jan 2009 09:21:44 -0500 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v9.1) with ESMTP id n0NELNaN166322 for ; Fri, 23 Jan 2009 09:21:23 -0500 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n0NEKSwK009103 for ; Fri, 23 Jan 2009 09:20:28 -0500 From: Ryan Harper Date: Fri, 23 Jan 2009 08:21:19 -0600 Message-Id: <1232720479-21398-3-git-send-email-ryanh@us.ibm.com> In-Reply-To: <1232720479-21398-1-git-send-email-ryanh@us.ibm.com> References: <1232720479-21398-1-git-send-email-ryanh@us.ibm.com> Subject: [Qemu-devel] [PATCH 2/2] lsi, scsi-disk: check for reentrance via tag matching Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Cc: Ryan Harper , kvm@vger.kernel.org SLES10 RC2 install, and other OSes have triggered the following error from the LSI device: lsi_scsi: error: Reselect with pending DMA After some debugging, I noticed that in the lsi code, the phase would change from PHASE_DO which corresponded to the current Write command we were handling, to PHASE_DI. After tracking down all of the places in lsi were we phase change to PHASE_DI, lsi_do_command was triggering this change after calling send command. This was a most odd code path. Then I saw it: lsi_do_command() device->send_command() scsi_command_complete() lsi_command_complete() lsi_resume_script() lsi_do_command() This can be detected by tracking s->current_tag when we start lsi_do_command() and checking against that once we return from send_command(). If the tags are different, then the previous command has completed and we exit the function. A similar reentrance bug happens in scsi-disk.c. scsi_send_command() scsi_command_complete() // we remove the request and put it on the freelist lsi_command_complete() lsi_resume_script() lsi_do_command() scsi_send_command() // fetchings recently free'd request // and updates r->sector_count // clobbering the value that was being // used by the previous user of // this request As with the lsi device, we can test for this by comparing the tag value from the start of the function to the request tag (r->tag) and if they differ, then the current function can exit as that command has already completed. Fixing these two re-entrance issues allows SLES10 SP2 installer to complete and guest function fully. Signed-off-by: Ryan Harper diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c index 912d7d5..34f50df 100644 --- a/hw/lsi53c895a.c +++ b/hw/lsi53c895a.c @@ -668,6 +668,7 @@ static void lsi_do_command(LSIState *s) { uint8_t buf[16]; int n; + uint32_t tag = s->current_tag; DPRINTF("Send command len=%d\n", s->dbc); if (s->dbc > 16) @@ -677,6 +678,15 @@ static void lsi_do_command(LSIState *s) s->command_complete = 0; n = s->current_dev->send_command(s->current_dev, s->current_tag, buf, s->current_lun); + /* send_command may trigger a completion and call lsi_command_complete() + * which can resume SCRIPTS without returning here and actually complete + * this current tag. We check the current_tag against a copy when we + * started this function and if we don't match, just exit the function. */ + if (tag != s->current_tag) { + DPRINTF("%s: tag=0x%x already completed.\n"); + return; + } + if (n > 0) { lsi_set_phase(s, PHASE_DI); s->current_dev->read_data(s->current_dev, s->current_tag); diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 744573e..4fa09a2 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -785,6 +785,12 @@ static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag, } if (r->sector_count == 0 && r->buf_len == 0) { scsi_command_complete(r, STATUS_GOOD, SENSE_NO_SENSE); + /* scsi_send_command can nest since scsi_command_complete calls + * the driver completion function which may proceed to call + * send_command a second time. If that happens the current + * task has completed and we can just exit send_command() */ + if (tag != r->tag) + return 0; } len = r->sector_count * 512 + r->buf_len; if (is_write) {