* [Qemu-devel] [PATCH 0/4] Improve emulated scsi write performance
From: Ryan Harper @ 2008-10-03 22:05 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
With the aio refactoring completed, we now have an aio backend that supports
submitting large amounts of io into the host system.  Curiously, write
performance through the scsi device is significantly slower than read
performance.  On our reference setup[1], we see about 10MB/s for writes and
40MB/s for reads.  Digging a bit deeper into the problem revealed that the
linux driver was queueing up to 16 requests into the device.  Enabling
debugging in the LSI device and scsi-disk layers showed that for reads the
LSI device would queue up to 16 commands, but that writes were never queued.

In lsi_do_command() the emulation submits the scsi command to the scsi layer
to decode the packet, and the result, n, determines whether the device issues
a read or write request: negative values represent writes, positive values
reads.
    n = s->current_dev->send_command(s->current_dev, s->current_tag, buf,
                                     s->current_lun);
    if (n > 0) {
        lsi_set_phase(s, PHASE_DI);
        s->current_dev->read_data(s->current_dev, s->current_tag);
    } else if (n < 0) {
        lsi_set_phase(s, PHASE_DO);
        s->current_dev->write_data(s->current_dev, s->current_tag);
    }
Subsequently, if the command has not already completed *AND* it was a read
operation, the emulation queues the command and continues to process incoming
requests.  The first thought is to simply change the logic here to queue
writes as well.  This is the right idea, but a couple of other issues prevent
it from working.
In lsi_queue_command(), as we're queuing the command we record whether this
operation is a read or write and store that in p->out:
    /* Add a command to the queue.  */
    static void lsi_queue_command(LSIState *s)
    {
        lsi_queue *p;

        DPRINTF("Queueing tag=0x%x\n", s->current_tag);
        if (s->queue_len == s->active_commands) {
            s->queue_len++;
            s->queue = qemu_realloc(s->queue,
                                    s->queue_len * sizeof(lsi_queue));
        }

        p = &s->queue[s->active_commands++];
        p->tag = s->current_tag;
        p->pending = 0;
        p->out = (s->sstat1 & PHASE_MASK) == PHASE_DO;
    }
If we change the state prior to queueing the command (via lsi_set_phase()),
then when we process the deferred command we end up issuing a dma in the
wrong direction (read versus write).  This isn't an issue when only queueing
reads, since the logic in lsi_reselect() that sets the direction:

    s->msg_action = p->out ? 2 : 3;

defaults to reads (3 = DATA IN) when the operation isn't a write.  However,
once we start queueing write operations, this needs to be corrected.  The
first patch in the series adds a dma direction parameter to
lsi_queue_command, since the caller is in a position to know what type of
operation it is queueing.
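With the direction parameter in place, the callers can pass the dma direction
explicitly; roughly (a sketch of the call sites patch 2 below introduces):

    if (n > 0)                    /* read operation  -> DATA IN  */
        lsi_queue_command(s, 0);
    else                          /* write operation -> DATA OUT */
        lsi_queue_command(s, 1);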
Now that we can queue reads and writes properly, we have to look at how the
scsi layer handles read and write operations.  In scsi-disk.c, the main
functions that submit io to the aio subsystem via bdrv_aio_read/write are
scsi_read_data() and scsi_write_data().  Considering the radical difference
between read and write performance, it was curious to see the read and write
code paths differ so greatly when, ultimately, the code should be very
similar with just a few different values to check.  In general, we've decoded
the scsi request at this point, we know how much to read/write, etc, and can
submit.  scsi_write_data(), however, goes through some extra hoops to
calculate r->buf_len when it can be derived from the request itself.  I
observed the following sequence:
    lsi_do_command()
      scsi_send_command()       /* returns -16384 -- write operation */
      scsi_write_data()         /* n == 0 since r->buf_len is 0 */
      scsi_write_complete(r, 0) /* set buf_len to
                                   min(r->sector_count * 512,
                                       SCSI_DMA_BUF_SIZE), call driver
                                   completion function */
      lsi_command_complete()    /* we're not done with the transfer,
                                   we're not waiting, and this isn't
                                   a different tag, so we set
                                   current_dma_len and mark
                                   command_complete = 1 */

    /* back in lsi_do_command() we check if s->command_complete is set;
       it is, so we do no queueing of writes, change the lsi state to
       PHASE_DI, and wait for the command to complete. */
This sequence effectively serializes all scsi write operations.  Comparing it
to a read operation reveals that the scsi_read_data() path neither delays
calculating the request's buf_len nor calls the driver completion function
before it has submitted the request into the aio subsystem.
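For contrast, the pre-patch read path derives buf_len directly from the
request and submits the aio before any completion callback runs (condensed
from scsi_read_data() in hw/scsi-disk.c; error handling omitted):

    n = r->sector_count;
    if (n > SCSI_DMA_BUF_SIZE / 512)
        n = SCSI_DMA_BUF_SIZE / 512;
    r->buf_len = n * 512;
    r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
                             scsi_read_complete, r);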
So, the fix should be straightforward: change the lsi device to allow
queueing of write operations, and change the scsi disk write path to match
the read path, and we should be Golden (TM).
Patches 1-3 implement this and deliver a good boost to write performance. In
observing the results it was noted that increasing the size of the requests
(16k versus 64k) didn't increase throughput even though we aren't bound by
cpu consumption.
Our observation was that for both reads and writes to the scsi device, we
break up requests that are larger than the default scsi dma buffer size.
Instead of breaking the larger request up into many smaller ones, we can
re-allocate the buffer and submit the request as-is, reducing the number of
ios we submit.  This change resulted in a huge boost in throughput and
lowered cpu consumption.
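The reallocation itself is small; a sketch of the idea (patch 4 below has the
real hunks, using the same qemu_free/qemu_memalign calls):

    r->buf_len = r->sector_count * 512;
    /* grow past the default dma buffer only when the request demands it */
    if (r->buf_len > SCSI_DMA_BUF_SIZE) {
        qemu_free(r->dma_buf);
        r->dma_buf = qemu_memalign(512, r->buf_len);
    }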
Below are the results comparing current qemu bits, the first 3 patches (all
the work needed for queueing write commands), and the 4th patch, which
reallocates the scsi buffer on-demand for larger requests, as well as how the
improved scsi emulation performs under qemu and kvm, contrasted with virtio
blk.  As I noted above, the setup is the same as for the aio subsystem
testing[1].  These patches apply cleanly to both qemu SVN and kvm's current
copy of qemu, tested on x86_64.
baremetal baseline:
---------------------------+-------+-------+--------------+------------+
Test scenarios | bandw | % CPU | ave submit | ave compl |
type, block size, iface | MB/s | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
write, 16k, lvm | 127.7 | 12 | 11.66 | 9.48 |
read , 16k, lvm | 171.1 | 15 | 10.69 | 7.09 |
write, 64k, lvm | 178.4 | 5 | 13.65 | 27.15 |
read , 64k, lvm | 195.7 | 4 | 12.73 | 24.76 |
---------------------------+-------+-------+--------------+------------+
qemu:
---------------------------+-------+-------+--------------+------------+
Test scenarios | bandw | % CPU | ave submit | ave compl |
type, block size, iface | MB/s | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
write, 16k, scsi, no patch | 12.8 | 43 | 188.79 | 84.45 |
read , 16k, scsi, no patch | 41.0 | 100 | 317.39 | 29.09 |
write, 64k, scsi, no patch | 26.6 | 31 | 280.25 | 181.75 |
read , 64k, scsi, no patch | 101.0 | 100 | 542.41 | 46.08 |
---------------------------+-------+-------+--------------+------------+
write, 16k, scsi, patch1-3 | 36.8 | 80 | 219.28 | 32.67 |
read , 16k, scsi, patch1-3 | 40.5 | 100 | 314.80 | 29.51 |
write, 64k, scsi, patch1-3 | 42.7 | 38 | 248.50 | 113.24 |
read , 64k, scsi, patch1-3 | 130.0 | 100 | 405.32 | 35.87 |
---------------------------+-------+-------+--------------+------------+
write, 16k, scsi, patch1-4 | 44.7 | 100 | 284.44 | 26.79 |
read , 16k, scsi, patch1-4 | 40.9 | 100 | 321.81 | 29.24 |
write, 64k, scsi, patch1-4 | 135.0 | 100 | 381.70 | 34.59 |
read , 64k, scsi, patch1-4 | 113.0 | 100 | 488.01 | 41.34 |
---------------------------+-------+-------+--------------+------------+
kvm:
---------------------------+-------+-------+--------------+------------+
Test scenarios | bandw | % CPU | ave submit | ave compl |
type, block size, iface | MB/s | usage | latency usec | latency ms |
---------------------------+-------+-------+--------------+------------+
write, 16k, scsi, no patch | 12.0 | 6 | 5.32 | 97.84 |
read , 16k, scsi, no patch | 104.0 | 28 | 21.99 | 11.34 |
write, 64k, scsi, no patch | 28.0 | 6 | 8.50 | 171.70 |
read , 64k, scsi, no patch | 127.0 | 26 | 9.81 | 37.22 |
---------------------------+-------+-------+--------------+------------+
write, 16k, scsi, patch1-3 | 43.0 | 6 | 12.86 | 28.14 |
read , 16k, scsi, patch1-3 | 103.0 | 30 | 20.64 | 11.43 |
write, 64k, scsi, patch1-3 | 39.5 | 6 | 10.53 | 27.81 |
read , 64k, scsi, patch1-3 | 125.1 | 25 | 9.51 | 37.78 |
---------------------------+-------+-------+--------------+------------+
write, 16k, scsi, patch1-4 | 130.0 | 47 | 12.06 | 9.07 |
read , 16k, scsi, patch1-4 | 155.0 | 50 | 66.99 | 7.57 |
write, 64k, scsi, patch1-4 | 155.0 | 26 | 10.38 | 30.54 |
read , 64k, scsi, patch1-4 | 188.0 | 35 | 10.79 | 25.18 |
---------------------------+-------+-------+--------------+------------+
write, 16k, virtio | 136.0 | 43 | 9.84 | 8.72 |
read , 16k, virtio | 182.0 | 44 | 8.79 | 6.49 |
write, 64k, virtio | 173.0 | 53 | 19.21 | 27.35 |
read , 64k, virtio | 190.0 | 64 | 17.56 | 24.94 |
---------------------------+-------+-------+--------------+------------+
1. http://lists.gnu.org/archive/html/qemu-devel/2008-09/msg01115.html
* [Qemu-devel] [PATCH 1/4] lsi_queue_command: add dma direction parameter
From: Ryan Harper @ 2008-10-03 22:05 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
This patch changes lsi_queue_command to take a parameter indicating whether
the queued command is a read or a write.  This is needed because the lsi
device may change phase before we've recorded the type of operation.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 53a2add..5c161a1 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -494,7 +494,7 @@ static void lsi_do_dma(LSIState *s, int out)
 
 /* Add a command to the queue.  */
-static void lsi_queue_command(LSIState *s)
+static void lsi_queue_command(LSIState *s, int out)
 {
     lsi_queue *p;
 
@@ -506,7 +506,9 @@ static void lsi_queue_command(LSIState *s)
     p = &s->queue[s->active_commands++];
     p->tag = s->current_tag;
     p->pending = 0;
-    p->out = (s->sstat1 & PHASE_MASK) == PHASE_DO;
+    /* the device may change state before we can queue the command;
+     * the caller knows if the command is input/output */
+    p->out = out;
 }
 
 /* Queue a byte for a MSG IN phase.  */
@@ -654,7 +656,7 @@ static void lsi_do_command(LSIState *s)
         /* wait data */
         lsi_set_phase(s, PHASE_MI);
         s->msg_action = 1;
-        lsi_queue_command(s);
+        lsi_queue_command(s, 0);
     } else {
         /* wait command complete */
         lsi_set_phase(s, PHASE_DI);
* [Qemu-devel] [PATCH 2/4] Refactor lsi_do_command to queue read and write ops
From: Ryan Harper @ 2008-10-03 22:05 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
This patch refactors lsi_do_command to queue both read and write operations;
previously the code only queued read operations.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 5c161a1..5c0535c 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -649,18 +649,23 @@ static void lsi_do_command(LSIState *s)
     }
 
     if (!s->command_complete) {
-        if (n) {
-            /* Command did not complete immediately so disconnect. */
-            lsi_add_msg_byte(s, 2); /* SAVE DATA POINTER */
-            lsi_add_msg_byte(s, 4); /* DISCONNECT */
-            /* wait data */
-            lsi_set_phase(s, PHASE_MI);
-            s->msg_action = 1;
-            lsi_queue_command(s, 0);
-        } else {
-            /* wait command complete */
+        if (n == 0) {
+            /* not a read or write command, just wait on completion */
             lsi_set_phase(s, PHASE_DI);
+            return;
         }
+
+        /* read or write command that did not complete immediately so
+         * inject a disconnect message, and queue the command */
+        lsi_add_msg_byte(s, 2); /* SAVE DATA POINTER */
+        lsi_add_msg_byte(s, 4); /* DISCONNECT */
+        /* wait data */
+        lsi_set_phase(s, PHASE_MI);
+        s->msg_action = 1;
+        if (n > 0) /* read operation */
+            lsi_queue_command(s, 0);
+        else /* write operation */
+            lsi_queue_command(s, 1);
     }
 }
* [Qemu-devel] [PATCH 3/4] Refactor scsi-disk layer for queue'ing writes
From: Ryan Harper @ 2008-10-03 22:05 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
Refactor the scsi write path to match the simpler, faster scsi read path.
There is no need to call into the write completion function to calculate the
request's buf length.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 16b3215..f2b4814 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -208,25 +208,14 @@ static void scsi_write_complete(void * opaque, int ret)
 {
     SCSIRequest *r = (SCSIRequest *)opaque;
     SCSIDeviceState *s = r->dev;
-    uint32_t len;
 
     if (ret) {
         fprintf(stderr, "scsi-disc: IO write error\n");
         exit(1);
     }
-
     r->aiocb = NULL;
-    if (r->sector_count == 0) {
-        scsi_command_complete(r, SENSE_NO_SENSE);
-    } else {
-        len = r->sector_count * 512;
-        if (len > SCSI_DMA_BUF_SIZE) {
-            len = SCSI_DMA_BUF_SIZE;
-        }
-        r->buf_len = len;
-        DPRINTF("Write complete tag=0x%x more=%d\n", r->tag, len);
-        s->completion(s->opaque, SCSI_REASON_DATA, r->tag, len);
-    }
+    DPRINTF("Write complete tag=0x%x more=%d\n", r->tag, r->buf_len);
+    s->completion(s->opaque, SCSI_REASON_DATA, r->tag, r->buf_len);
 }
 
 /* Write data to a scsi device.  Returns nonzero on failure.
@@ -235,7 +224,7 @@ static int scsi_write_data(SCSIDevice *d, uint32_t tag)
 {
     SCSIDeviceState *s = d->state;
     SCSIRequest *r;
-    uint32_t n;
+    uint32_t n, len;
 
     DPRINTF("Write data tag=0x%x\n", tag);
     r = scsi_find_request(s, tag);
@@ -244,8 +233,22 @@ static int scsi_write_data(SCSIDevice *d, uint32_t tag)
         scsi_command_complete(r, SENSE_HARDWARE_ERROR);
         return 1;
     }
+    DPRINTF("Write sector_count=%d\n", r->sector_count);
+    if (r->sector_count == 0) {
+        scsi_command_complete(r, SENSE_NO_SENSE);
+        return 0;
+    }
     if (r->aiocb)
         BADF("Data transfer already in progress\n");
+
+    /* we know the len of the request and with the max size of the scsi buffer,
+     * we can calculate buf_len needed */
+    len = r->sector_count * 512;
+    if (len > SCSI_DMA_BUF_SIZE)
+        len = SCSI_DMA_BUF_SIZE;
+    r->buf_len = len;
+
+    /* number of sectors to submit */
     n = r->buf_len / 512;
     if (n) {
         r->aiocb = bdrv_aio_write(s->bdrv, r->sector, r->dma_buf, n,
* [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed
From: Ryan Harper @ 2008-10-03 22:05 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
The default buffer size breaks up larger read/write requests unnecessarily.
When we encounter requests larger than the default dma buffer, reallocate the
buffer to support the request.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index f2b4814..a94f10a 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -192,10 +192,14 @@ static void scsi_read_data(SCSIDevice *d, uint32_t tag)
     }
 
     n = r->sector_count;
-    if (n > SCSI_DMA_BUF_SIZE / 512)
-        n = SCSI_DMA_BUF_SIZE / 512;
-
     r->buf_len = n * 512;
+
+    /* if the request is larger than the default, allocate a larger buffer */
+    if (r->buf_len > SCSI_DMA_BUF_SIZE) {
+        qemu_free(r->dma_buf);
+        r->dma_buf = qemu_memalign(512, r->buf_len);
+    }
+
     r->aiocb = bdrv_aio_read(s->bdrv, r->sector, r->dma_buf, n,
                              scsi_read_complete, r);
     if (r->aiocb == NULL)
@@ -224,7 +228,7 @@ static int scsi_write_data(SCSIDevice *d, uint32_t tag)
 {
     SCSIDeviceState *s = d->state;
     SCSIRequest *r;
-    uint32_t n, len;
+    uint32_t n;
 
     DPRINTF("Write data tag=0x%x\n", tag);
     r = scsi_find_request(s, tag);
@@ -243,10 +247,13 @@ static int scsi_write_data(SCSIDevice *d, uint32_t tag)
 
     /* we know the len of the request and with the max size of the scsi buffer,
      * we can calculate buf_len needed */
-    len = r->sector_count * 512;
-    if (len > SCSI_DMA_BUF_SIZE)
-        len = SCSI_DMA_BUF_SIZE;
-    r->buf_len = len;
+    r->buf_len = r->sector_count * 512;
+
+    /* if the request is larger than the default, allocate a larger buffer */
+    if (r->buf_len > SCSI_DMA_BUF_SIZE) {
+        qemu_free(r->dma_buf);
+        r->dma_buf = qemu_memalign(512, r->buf_len);
+    }
 
     /* number of sectors to submit */
     n = r->buf_len / 512;
* Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed
From: Paul Brook @ 2008-10-03 23:17 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
On Friday 03 October 2008, Ryan Harper wrote:
> The default buffer size breaks up larger read/write requests unnecessarily.
> When we encounter requests larger than the default dma buffer, reallocate
> the buffer to support the request.
Allocating unboundedly large host buffers based on guest input seems like a
bad idea.
Paul
* Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed
From: Anthony Liguori @ 2008-10-03 23:35 UTC
To: Paul Brook; +Cc: Anthony Liguori, Ryan Harper, qemu-devel, kvm
Paul Brook wrote:
> On Friday 03 October 2008, Ryan Harper wrote:
>
>> The default buffer size breaks up larger read/write requests unnecessarily.
>> When we encounter requests larger than the default dma buffer, reallocate
>> the buffer to support the request.
>>
>
> Allocating unboundedly large host buffers based on guest input seems like a
> bad idea.
>
Perhaps they could be at least bound to phys_ram_size.
In general, I don't think there's a correct size to bound them that's
less than phys_ram_size. The guest may be issuing really big IO requests.
Regards,
Anthony Liguori
> Paul
* Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed
From: Paul Brook @ 2008-10-04 0:00 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
On Saturday 04 October 2008, Anthony Liguori wrote:
> Paul Brook wrote:
> > On Friday 03 October 2008, Ryan Harper wrote:
> >> The default buffer size breaks up larger read/write requests
> >> unnecessarily. When we encounter requests larger than the default dma
> >> buffer, reallocate the buffer to support the request.
> >
> > Allocating unboundedly large host buffers based on guest input seems like
> > a bad idea.
>
> Perhaps they could be at least bound to phys_ram_size.
That's still way too large. It means that the maximum host footprint of qemu
is many times the size of the guest RAM. There's a good chance that the host
machine doesn't even have enough virtual address space to satisfy this
request.
I expect the only situation where you can avoid breaking up large transfers
is when you have zero-copy IO.  Previous zero-copy/vectored IO patches
suffered from a similar problem: it is not acceptable to allocate huge chunks
of host ram when you fall back to normal IO.
> In general, I don't think there's a correct size to bound them that's
> less than phys_ram_size. The guest may be issuing really big IO requests.
Qemu is perfectly capable of handling large IO requests by splitting them into
multiple smaller requests. Enlarging the size of this buffer is just a
secondary performance optimisation.
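A bounded variant along those lines might grow the buffer only up to a fixed
cap and keep splitting beyond it; a sketch (SCSI_DMA_BUF_MAX is hypothetical,
not an existing qemu constant):

    len = r->sector_count * 512;
    if (len > SCSI_DMA_BUF_MAX)
        len = SCSI_DMA_BUF_MAX;        /* too big: fall back to splitting */
    if (len > SCSI_DMA_BUF_SIZE) {     /* larger than the default buffer */
        qemu_free(r->dma_buf);
        r->dma_buf = qemu_memalign(512, len);
    }
    r->buf_len = len;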
Admittedly we don't currently limit the number of simultaneous commands a
guest can submit, but that's relatively easy to fix.
Paul
* Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed
From: Avi Kivity @ 2008-10-04 10:00 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, Paul Brook, kvm
Anthony Liguori wrote:
> Paul Brook wrote:
>> On Friday 03 October 2008, Ryan Harper wrote:
>>
>>> The default buffer size breaks up larger read/write requests
>>> unnecessarily.
>>> When we encounter requests larger than the default dma buffer,
>>> reallocate
>>> the buffer to support the request.
>>>
>>
>> Allocating unboundedly large host buffers based on guest input seems
>> like a bad idea.
>>
>
> Perhaps they could be at least bound to phys_ram_size.
>
So the guest could double the memory load on the host with just one
request?!
> In general, I don't think there's a correct size to bound them that's
> less than phys_ram_size. The guest may be issuing really big IO
> requests.
The correct fix is not to buffer at all but use scatter-gather. Until
this is done buffering has to be bounded.
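For reference, the shape of a scatter-gather submission -- purely an
illustration using the vectored aio API (bdrv_aio_writev/QEMUIOVector) that
qemu grew later, not something available at the time; nseg and seg[] stand in
for the guest's dma segment list:

    QEMUIOVector qiov;
    int i;

    /* hand the guest's own segments to the host, no bounce buffer */
    qemu_iovec_init(&qiov, nseg);
    for (i = 0; i < nseg; i++)
        qemu_iovec_add(&qiov, seg[i].base, seg[i].len);
    r->aiocb = bdrv_aio_writev(s->bdrv, r->sector, &qiov, qiov.size / 512,
                               scsi_write_complete, r);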
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [Qemu-devel] [PATCH 0/4] Improve emulated scsi write performance
From: Ryan Harper @ 2008-10-05 23:08 UTC
To: qemu-devel; +Cc: Anthony Liguori, Ryan Harper, kvm
Any comments/review for patches 1-3?  It seems that patch 4 is something that
we need to work out before committing, but I'd like to see patches 1-3 go in
since they provide a performance boost in and of themselves.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com