* [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests
@ 2015-01-28 18:49 Denis V. Lunev
2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
0 siblings, 1 reply; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 18:49 UTC (permalink / raw)
Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi
The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector size.
The difference is quite reliable.
I used the following program to test this:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <malloc.h>
#include <string.h>
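int main(int argc, char *argv[])
{
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    void *buf;
    int i = 0;

    /*
     * Allocate until we get a buffer that is NOT 4096-byte aligned.
     * For the aligned run, replace 512 with 4096 and drop the loop:
     * a 4096-aligned buffer never reaches the break.
     */
    do {
        buf = memalign(512, 4096);  /* <--- replace 512 with 4096 */
        if ((unsigned long)buf & 4095)
            break;
        i++;
    } while (1);
    printf("%d\n", i);

    memset(buf, 0x11, 4096);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);
    close(fd);
    return 0;
}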
    time for i in `seq 1 30` ; do a.out aa ; done
The file was placed on an 8 GB partition of the HDD below to avoid speed
changes due to different offsets on the disk. Results are reliable:
- 189 vs 180 seconds on Linux 3.16
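The test program above retries memalign(512, 4096) until it happens to get
a buffer that is not 4096-byte aligned. A minimal sketch of a deterministic
alternative follows. The helper name alloc_aligned_but_not and its
parameters are hypothetical and not part of the original program; it
assumes both alignments are powers of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the original test program): return a
 * pointer that is aligned to `want` bytes but deliberately NOT aligned to
 * `avoid` bytes. Both must be powers of two with want < avoid.
 * The caller frees *base, not the returned pointer.
 */
static void *alloc_aligned_but_not(size_t want, size_t avoid, size_t size,
                                   void **base)
{
    void *p = NULL;

    assert(want != 0 && want < avoid);
    assert((want & (want - 1)) == 0 && (avoid & (avoid - 1)) == 0);

    /* Over-allocate by `want` bytes so the shifted buffer still fits. */
    if (posix_memalign(&p, avoid, size + want) != 0)
        return NULL;
    *base = p;

    /* p is a multiple of `avoid`, so p + want is a multiple of `want` only. */
    return (char *)p + want;
}
```

Calling alloc_aligned_but_not(512, 4096, 4096, &base) gives the 512-byte
case in one allocation; the 4096-byte case needs only a plain
posix_memalign() call with alignment 4096.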
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
hades ~/src/qemu # hdparm -I /dev/sdg
/dev/sdg:
ATA device, with non-removable media
Model Number: WDC WD20EZRX-07D8PB0
Serial Number: WD-WCC4M5LVSAEP
Firmware Revision: 80.00A80
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 9 8 7 6 5
Likely used: 9
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 3907029168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size = unknown
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* NCQ priority information
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
unknown 206[14] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
276min for SECURITY ERASE UNIT. 276min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2b5da838c
NAA : 5
IEEE OUI : 0014ee
Unique ID : 2b5da838c
Checksum: correct
hades ~/src/qemu #
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests Denis V. Lunev
@ 2015-01-28 18:49 ` Denis V. Lunev
  2015-01-28 19:59   ` Denis V. Lunev
  2015-01-28 20:07   ` Paolo Bonzini
  0 siblings, 2 replies; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 18:49 UTC (permalink / raw)
Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c           | 4 ++--
 block/raw-posix.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
         bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
     } else {
-        bs->bl.opt_mem_alignment = 512;
+        bs->bl.opt_mem_alignment = 4096;
     }
 
     if (bs->backing_hd) {
@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
 
     bs->open_flags = flags;
     bs->guest_block_size = 512;
-    bs->request_alignment = 512;
+    bs->request_alignment = 4096;
     bs->zero_beyond_eof = true;
     open_flags = bdrv_open_flags(bs, flags);
     bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!s->buf_align) {
         size_t align;
         buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
                 s->buf_align = align;
                 break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!bs->request_alignment) {
         size_t align;
         buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf, align, 0) >= 0) {
                 bs->request_alignment = align;
                 break;
-- 
1.9.1

^ permalink raw reply related [flat|nested] 5+ messages in thread
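The first raw-posix.c hunk touches a probe loop that tries read buffers of
increasing alignment and records the first one the file descriptor accepts.
Below is a stand-alone sketch of that idea as I read the hunk. The name
probe_buf_align and its parameters are hypothetical, and plain
posix_memalign() stands in for qemu_memalign(). Starting such a loop at
4096 instead of 512 means a device that accepts 512-byte-aligned buffers is
still recorded as needing 4096.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-alone version of the buffer-alignment probe.
 * Tries read buffers of increasing alignment (powers of two from
 * min_align to max_align) and returns the first alignment for which
 * pread() succeeds, or 0 if none does.
 */
static size_t probe_buf_align(int fd, size_t min_align, size_t max_align)
{
    void *base = NULL;
    size_t align, found = 0;

    assert(min_align >= sizeof(void *));
    assert((min_align & (min_align - 1)) == 0);
    assert(max_align >= min_align && (max_align & (max_align - 1)) == 0);

    /* 2 * max_align bytes, so base + align plus max_align bytes still fits. */
    if (posix_memalign(&base, max_align, 2 * max_align) != 0)
        return 0;

    for (align = min_align; align <= max_align; align <<= 1) {
        /* base + align is aligned to `align` and, below max_align, no coarser. */
        if (pread(fd, (char *)base + align, max_align, 0) >= 0) {
            found = align;
            break;
        }
    }
    free(base);
    return found;
}
```

On a descriptor opened with O_DIRECT the early iterations are expected to
fail with EINVAL until the alignment is acceptable; on an ordinary buffered
descriptor the first iteration already succeeds.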
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
@ 2015-01-28 19:59   ` Denis V. Lunev
  2015-01-28 20:07   ` Paolo Bonzini
  1 sibling, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 19:59 UTC (permalink / raw)
Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/15 21:49, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>         write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on an HDD with 512/4096 logical/physical sector size.
>
> The difference is quite reliable.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c           | 4 ++--
>  block/raw-posix.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index d45e4dd..bc5d1e7 100644
> --- a/block.c
> +++ b/block.c
> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>          bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>          bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>      } else {
> -        bs->bl.opt_mem_alignment = 512;
> +        bs->bl.opt_mem_alignment = 4096;
>      }
>
>      if (bs->backing_hd) {
> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>
>      bs->open_flags = flags;
>      bs->guest_block_size = 512;
> -    bs->request_alignment = 512;
> +    bs->request_alignment = 4096;
>      bs->zero_beyond_eof = true;
>      open_flags = bdrv_open_flags(bs, flags);
>      bs->read_only = !(open_flags & BDRV_O_RDWR);
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index ec38fee..d1b3388 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!s->buf_align) {
>          size_t align;
>          buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>                  s->buf_align = align;
>                  break;
> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!bs->request_alignment) {
>          size_t align;
>          buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf, align, 0) >= 0) {
>                  bs->request_alignment = align;
>                  break;

Sorry, the patch is wrong. It breaks 'make check-block'. I will redo it
and perform more testing. The request-alignment related changes are
wrong :( I ran the tests without them but added them as an obvious
last-minute addition.

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
  2015-01-28 19:59   ` Denis V. Lunev
@ 2015-01-28 20:07   ` Paolo Bonzini
  2015-01-28 20:13     ` Denis V. Lunev
  1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2015-01-28 20:07 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/2015 19:49, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>         write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on an HDD with 512/4096 logical/physical sector size.
>
> The difference is quite reliable.

The 10% difference, however, is probably not enough to cover the cost of
providing a bounce buffer if a guest is (rightfully) using a 512-byte
aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
should be bs->bl.min_mem_alignment instead.

Instead, you probably should patch bdrv_opt_mem_align to return at least
4096, and leave the detection logic intact. This will let
qemu_blockalign return a properly aligned buffer to qemu-img and other
in-process allocations, without negatively affecting the guest.

Thanks,

Paolo

> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c           | 4 ++--
>  block/raw-posix.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index d45e4dd..bc5d1e7 100644
> --- a/block.c
> +++ b/block.c
> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>          bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>          bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>      } else {
> -        bs->bl.opt_mem_alignment = 512;
> +        bs->bl.opt_mem_alignment = 4096;
>      }
>
>      if (bs->backing_hd) {
> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>
>      bs->open_flags = flags;
>      bs->guest_block_size = 512;
> -    bs->request_alignment = 512;
> +    bs->request_alignment = 4096;
>      bs->zero_beyond_eof = true;
>      open_flags = bdrv_open_flags(bs, flags);
>      bs->read_only = !(open_flags & BDRV_O_RDWR);
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index ec38fee..d1b3388 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!s->buf_align) {
>          size_t align;
>          buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>                  s->buf_align = align;
>                  break;
> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!bs->request_alignment) {
>          size_t align;
>          buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf, align, 0) >= 0) {
>                  bs->request_alignment = align;
>                  break;
>

^ permalink raw reply [flat|nested] 5+ messages in thread
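Paolo's suggestion above (return at least 4096 from bdrv_opt_mem_align and
leave the probed value alone) can be sketched as follows. This is only an
illustration under stated assumptions: struct sketch_bs, struct
sketch_limits, SKETCH_MIN_OPT_ALIGN and sketch_opt_mem_align are
hypothetical stand-ins, not the QEMU types or the patch that was
eventually applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU types; only the field used here. */
struct sketch_limits { size_t opt_mem_alignment; };
struct sketch_bs { struct sketch_limits bl; };

#define SKETCH_MIN_OPT_ALIGN 4096

/*
 * Return the alignment to use for in-process allocations: the probed
 * value, but never less than 4096. The probed value is left untouched,
 * so buffers that are only 512-byte aligned are not affected.
 */
static size_t sketch_opt_mem_align(const struct sketch_bs *bs)
{
    size_t align;

    if (bs == NULL)
        return SKETCH_MIN_OPT_ALIGN;

    align = bs->bl.opt_mem_alignment;
    assert(align != 0 && (align & (align - 1)) == 0);
    return align < SKETCH_MIN_OPT_ALIGN ? SKETCH_MIN_OPT_ALIGN : align;
}
```

The point of the design is that only the allocator-facing value is raised,
while whatever the detection logic found stays as it was.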
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 20:07   ` Paolo Bonzini
@ 2015-01-28 20:13     ` Denis V. Lunev
  0 siblings, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 20:13 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/15 23:07, Paolo Bonzini wrote:
>
> On 28/01/2015 19:49, Denis V. Lunev wrote:
>> The following sequence
>>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>>     for (i = 0; i < 100000; i++)
>>         write(fd, buf, 4096);
>> performs 10% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on an HDD with 512/4096 logical/physical sector size.
>>
>> The difference is quite reliable.
> The 10% difference, however, is probably not enough to cover the cost of
> providing a bounce buffer if a guest is (rightfully) using a 512-byte
> aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
> should be bs->bl.min_mem_alignment instead.
>
> Instead, you probably should patch bdrv_opt_mem_align to return at least
> 4096, and leave the detection logic intact. This will let
> qemu_blockalign return a properly aligned buffer to qemu-img and other
> in-process allocations, without negatively affecting the guest.
>
> Thanks,
>
> Paolo

OK, this looks good to me :)

>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  block.c           | 4 ++--
>>  block/raw-posix.c | 4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index d45e4dd..bc5d1e7 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>>          bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>>          bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>>      } else {
>> -        bs->bl.opt_mem_alignment = 512;
>> +        bs->bl.opt_mem_alignment = 4096;
>>      }
>>
>>      if (bs->backing_hd) {
>> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>>
>>      bs->open_flags = flags;
>>      bs->guest_block_size = 512;
>> -    bs->request_alignment = 512;
>> +    bs->request_alignment = 4096;
>>      bs->zero_beyond_eof = true;
>>      open_flags = bdrv_open_flags(bs, flags);
>>      bs->read_only = !(open_flags & BDRV_O_RDWR);
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index ec38fee..d1b3388 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>      if (!s->buf_align) {
>>          size_t align;
>>          buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
>> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
>> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>>              if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>>                  s->buf_align = align;
>>                  break;
>> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>      if (!bs->request_alignment) {
>>          size_t align;
>>          buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
>> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
>> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>>              if (pread(fd, buf, align, 0) >= 0) {
>>                  bs->request_alignment = align;
>>                  break;
>>

^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-01-28 20:13 UTC | newest]

Thread overview: 5+ messages
2015-01-28 18:49 [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests Denis V. Lunev
2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
2015-01-28 19:59   ` Denis V. Lunev
2015-01-28 20:07   ` Paolo Bonzini
2015-01-28 20:13     ` Denis V. Lunev