* [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests
@ 2015-01-28 18:49 Denis V. Lunev
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
  0 siblings, 1 reply; 5+ messages in thread

From: Denis V. Lunev @ 2015-01-28 18:49 UTC
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector sizes.

The difference is quite reliable. I used the following program for
testing:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <malloc.h>
#include <string.h>

#define ALIGN 512   /* <--- replace 512 with 4096 */

int main(int argc, char *argv[])
{
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    void *buf;
    int i = 0;

    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* For ALIGN == 512, keep allocating until the buffer is 512-aligned
     * but NOT 4096-aligned; for ALIGN == 4096 the first buffer is used. */
    do {
        buf = memalign(ALIGN, 4096);
        if (buf == NULL) {
            perror("memalign");
            return 1;
        }
        if (ALIGN >= 4096 || ((unsigned long)buf & 4095))
            break;
        i++;
    } while (1);
    printf("%d\n", i);

    memset(buf, 0x11, 4096);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);

    close(fd);
    return 0;
}

time for i in `seq 1 30` ; do ./a.out aa ; done

The file was placed on an 8 GB partition of the HDD shown below, to
avoid speed variations caused by different on-disk offsets.

Results are reliable:
 - 189 vs 180 seconds on Linux 3.16

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>

hades ~/src/qemu # hdparm -I /dev/sdg

/dev/sdg:

ATA device, with non-removable media
        Model Number:       WDC WD20EZRX-07D8PB0
        Serial Number:      WD-WCC4M5LVSAEP
        Firmware Revision:  80.00A80
        Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Supported: 9 8 7 6 5
        Likely used: 9
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  3907029168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
        device size with M = 1024*1024:     1907729 MBytes
        device size with M = 1000*1000:     2000398 MBytes (2000 GB)
        cache/buffer size  = unknown
        Nominal Media Rotation Rate: 5400
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    64-bit World wide name
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
           *    NCQ priority information
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12] (vendor specific)
                unknown 206[13] (vendor specific)
                unknown 206[14] (vendor specific)
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
                supported: enhanced erase
        276min for SECURITY ERASE UNIT. 276min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2b5da838c
        NAA             : 5
        IEEE OUI        : 0014ee
        Unique ID       : 2b5da838c
Checksum: correct
hades ~/src/qemu #
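The test program obtains a 512-byte-aligned but not 4096-byte-aligned buffer by calling memalign() repeatedly and leaking every rejected buffer. A deterministic alternative, sketched here under the assumption that offsetting into a larger 4096-aligned allocation is acceptable for the benchmark, uses the same `buf + align` trick as the buffer-alignment probe in `raw_probe_alignment()`. `alloc_io_buf()` and `is_aligned()` are hypothetical helpers, not part of QEMU or of the original program:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p is aligned to 'align' bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Hypothetical helper: returns a pointer to at least 4096 usable bytes that
 * is aligned to 'align' (a power of two, 512..4096) and, when align < 4096,
 * deliberately NOT aligned to 4096. The pointer to free() goes to *base.
 */
static void *alloc_io_buf(size_t align, void **base)
{
    assert(align >= 512 && align <= 4096 && (align & (align - 1)) == 0);

    /* Two pages starting on a 4096-byte boundary... */
    if (posix_memalign(base, 4096, 2 * 4096) != 0)
        return NULL;

    /* ...then step forward by 'align' bytes (0 for the 4096-byte case). */
    return (char *)*base + (align % 4096);
}
```

In the benchmark, the returned pointer would replace the memalign() loop as the source of the 4096-byte writes, and `base` would be passed to free() afterwards.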
* [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests Denis V. Lunev
@ 2015-01-28 18:49 ` Denis V. Lunev
  2015-01-28 19:59   ` Denis V. Lunev
  2015-01-28 20:07   ` Paolo Bonzini
  0 siblings, 2 replies; 5+ messages in thread

From: Denis V. Lunev @ 2015-01-28 18:49 UTC
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector sizes.

The difference is quite reliable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c           | 4 ++--
 block/raw-posix.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
         bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
     } else {
-        bs->bl.opt_mem_alignment = 512;
+        bs->bl.opt_mem_alignment = 4096;
     }
 
     if (bs->backing_hd) {
@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
 
     bs->open_flags = flags;
     bs->guest_block_size = 512;
-    bs->request_alignment = 512;
+    bs->request_alignment = 4096;
     bs->zero_beyond_eof = true;
     open_flags = bdrv_open_flags(bs, flags);
     bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!s->buf_align) {
         size_t align;
         buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
                 s->buf_align = align;
                 break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!bs->request_alignment) {
         size_t align;
         buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf, align, 0) >= 0) {
                 bs->request_alignment = align;
                 break;
-- 
1.9.1
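Besides buffer alignment, the patch raises the default `request_alignment`, which constrains request offsets and lengths rather than buffer addresses. A request that is not aligned to it cannot be passed straight through; a common approach is to widen it to the enclosing aligned byte range (for writes, via read-modify-write). A minimal sketch of that rounding, with a hypothetical `align_request()` helper and `struct aligned_range` type that are not QEMU's API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the aligned byte range that fully covers a request. */
struct aligned_range {
    uint64_t offset;
    uint64_t bytes;
};

static struct aligned_range align_request(uint64_t offset, uint64_t bytes,
                                          uint64_t align)
{
    struct aligned_range r;
    uint64_t start, end;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = offset & ~(align - 1);                     /* round start down */
    end = (offset + bytes + align - 1) & ~(align - 1); /* round end up */
    r.offset = start;
    r.bytes = end - start;
    return r;
}
```

With an alignment of 512, a 512-byte request at offset 512 passes through unchanged; with an alignment of 4096, the same request becomes a 4096-byte operation at offset 0.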
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
@ 2015-01-28 19:59   ` Denis V. Lunev
  2015-01-28 20:07   ` Paolo Bonzini
  1 sibling, 0 replies; 5+ messages in thread

From: Denis V. Lunev @ 2015-01-28 19:59 UTC
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/15 21:49, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>         write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on an HDD with 512/4096 logical/physical sector sizes.
>
> The difference is quite reliable.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c           | 4 ++--
>  block/raw-posix.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index d45e4dd..bc5d1e7 100644
> --- a/block.c
> +++ b/block.c
> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>          bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>          bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>      } else {
> -        bs->bl.opt_mem_alignment = 512;
> +        bs->bl.opt_mem_alignment = 4096;
>      }
>
>      if (bs->backing_hd) {
> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>
>      bs->open_flags = flags;
>      bs->guest_block_size = 512;
> -    bs->request_alignment = 512;
> +    bs->request_alignment = 4096;
>      bs->zero_beyond_eof = true;
>      open_flags = bdrv_open_flags(bs, flags);
>      bs->read_only = !(open_flags & BDRV_O_RDWR);
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index ec38fee..d1b3388 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!s->buf_align) {
>          size_t align;
>          buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>                  s->buf_align = align;
>                  break;
> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!bs->request_alignment) {
>          size_t align;
>          buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf, align, 0) >= 0) {
>                  bs->request_alignment = align;
>                  break;

Sorry, the patch is wrong. It breaks 'make check-block'. I will redo it
and perform more testing.

The request-alignment related changes are wrong :( I ran the tests
without them, and added them as an obvious last-minute addition.
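Both probe loops in `raw_probe_alignment()` keep the first power of two for which `pread()` succeeds, so starting them at 4096 means they can never report 512, even on a disk that accepts 512-byte O_DIRECT requests. The sketch below simulates the request-alignment loop; `fake_pread_ok()` is a hypothetical stand-in for the real `pread()` on an O_DIRECT descriptor, and `MAX_BLOCKSIZE` is assumed to be 4096:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKSIZE 4096 /* assumed value for this sketch */

/*
 * Hypothetical stand-in for pread(fd, buf, len, 0) on an O_DIRECT fd:
 * succeeds only if len is a multiple of the device's logical block size.
 */
static int fake_pread_ok(size_t len, size_t logical_block_size)
{
    return len % logical_block_size == 0;
}

/*
 * Same loop shape as the request-alignment probe in the diff: return the
 * first power of two in [start, MAX_BLOCKSIZE] that succeeds, or 0 if none.
 */
static size_t probe_request_alignment(size_t start, size_t logical_block_size)
{
    size_t align;

    assert(start != 0 && (start & (start - 1)) == 0);
    for (align = start; align <= MAX_BLOCKSIZE; align <<= 1) {
        if (fake_pread_ok(align, logical_block_size))
            return align;
    }
    return 0;
}
```

For a disk with 512-byte logical sectors, such as the one in the cover letter, a probe starting at 512 reports 512, while a probe starting at 4096 reports 4096 and would therefore impose 4 KiB requests.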
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
  2015-01-28 19:59   ` Denis V. Lunev
@ 2015-01-28 20:07   ` Paolo Bonzini
  2015-01-28 20:13     ` Denis V. Lunev
  1 sibling, 1 reply; 5+ messages in thread

From: Paolo Bonzini @ 2015-01-28 20:07 UTC
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/2015 19:49, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>         write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on an HDD with 512/4096 logical/physical sector sizes.
>
> The difference is quite reliable.

The 10% difference, however, is probably not enough to cover the cost of
providing a bounce buffer if a guest is (rightfully) using a 512-byte
aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
should be bs->bl.min_mem_alignment instead.

Instead, you probably should patch bdrv_opt_mem_align to return at least
4096, and leave the detection logic intact. This will let
qemu_blockalign return a properly aligned buffer to qemu-img and other
in-process allocations, without negatively affecting the guest.

Thanks,

Paolo

> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block.c           | 4 ++--
>  block/raw-posix.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index d45e4dd..bc5d1e7 100644
> --- a/block.c
> +++ b/block.c
> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>          bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>          bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>      } else {
> -        bs->bl.opt_mem_alignment = 512;
> +        bs->bl.opt_mem_alignment = 4096;
>      }
>
>      if (bs->backing_hd) {
> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>
>      bs->open_flags = flags;
>      bs->guest_block_size = 512;
> -    bs->request_alignment = 512;
> +    bs->request_alignment = 4096;
>      bs->zero_beyond_eof = true;
>      open_flags = bdrv_open_flags(bs, flags);
>      bs->read_only = !(open_flags & BDRV_O_RDWR);
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index ec38fee..d1b3388 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!s->buf_align) {
>          size_t align;
>          buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>                  s->buf_align = align;
>                  break;
> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>      if (!bs->request_alignment) {
>          size_t align;
>          buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>              if (pread(fd, buf, align, 0) >= 0) {
>                  bs->request_alignment = align;
>                  break;
>
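Paolo's suggestion separates two limits: the minimum alignment the device requires, which decides when a guest buffer needs a bounce buffer, and the preferred alignment used when QEMU allocates buffers itself. A minimal sketch of that split, assuming a hypothetical `struct mem_limits` and helper functions (`opt_mem_align()`, `needs_bounce()`, `block_alloc()`) that only loosely mirror the roles of `bdrv_opt_mem_align()` and `qemu_blockalign()`; they are not QEMU code:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limits: only the detected minimum is stored. */
struct mem_limits {
    size_t min_mem_align; /* alignment the device actually requires */
};

/* Preferred alignment for in-process buffers: at least 4096 bytes. */
static size_t opt_mem_align(const struct mem_limits *l)
{
    return l->min_mem_align > 4096 ? l->min_mem_align : 4096;
}

/* A guest buffer needs bouncing only if it violates the minimum. */
static int needs_bounce(const struct mem_limits *l, const void *buf)
{
    assert(l->min_mem_align != 0 &&
           (l->min_mem_align & (l->min_mem_align - 1)) == 0);
    return ((uintptr_t)buf & (l->min_mem_align - 1)) != 0;
}

/* In-process allocation (e.g. for qemu-img) uses the preferred alignment. */
static void *block_alloc(const struct mem_limits *l, size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, opt_mem_align(l), size) != 0)
        return NULL;
    return p;
}
```

Raising only the preferred value keeps a guest's 512-byte-aligned buffer on the direct path; raising the minimum to 4096 instead would make `needs_bounce()` return true for the same buffer, which is the cost Paolo describes.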
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 20:07   ` Paolo Bonzini
@ 2015-01-28 20:13     ` Denis V. Lunev
  0 siblings, 0 replies; 5+ messages in thread

From: Denis V. Lunev @ 2015-01-28 20:13 UTC
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/15 23:07, Paolo Bonzini wrote:
>
> On 28/01/2015 19:49, Denis V. Lunev wrote:
>> The following sequence
>>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>>     for (i = 0; i < 100000; i++)
>>         write(fd, buf, 4096);
>> performs 10% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on an HDD with 512/4096 logical/physical sector sizes.
>>
>> The difference is quite reliable.
> The 10% difference, however, is probably not enough to cover the cost of
> providing a bounce buffer if a guest is (rightfully) using a 512-byte
> aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
> should be bs->bl.min_mem_alignment instead.
>
> Instead, you probably should patch bdrv_opt_mem_align to return at least
> 4096, and leave the detection logic intact. This will let
> qemu_blockalign return a properly aligned buffer to qemu-img and other
> in-process allocations, without negatively affecting the guest.
>
> Thanks,
>
> Paolo
OK, this looks good to me :)
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  block.c           | 4 ++--
>>  block/raw-posix.c | 4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index d45e4dd..bc5d1e7 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>>          bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>>          bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>>      } else {
>> -        bs->bl.opt_mem_alignment = 512;
>> +        bs->bl.opt_mem_alignment = 4096;
>>      }
>>
>>      if (bs->backing_hd) {
>> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>>
>>      bs->open_flags = flags;
>>      bs->guest_block_size = 512;
>> -    bs->request_alignment = 512;
>> +    bs->request_alignment = 4096;
>>      bs->zero_beyond_eof = true;
>>      open_flags = bdrv_open_flags(bs, flags);
>>      bs->read_only = !(open_flags & BDRV_O_RDWR);
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index ec38fee..d1b3388 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>      if (!s->buf_align) {
>>          size_t align;
>>          buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
>> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
>> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>>              if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>>                  s->buf_align = align;
>>                  break;
>> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>>      if (!bs->request_alignment) {
>>          size_t align;
>>          buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
>> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
>> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>>              if (pread(fd, buf, align, 0) >= 0) {
>>                  bs->request_alignment = align;
>>                  break;
>>
Thread overview: 5+ messages
2015-01-28 18:49 [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests Denis V. Lunev
2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
2015-01-28 19:59   ` Denis V. Lunev
2015-01-28 20:07   ` Paolo Bonzini
2015-01-28 20:13     ` Denis V. Lunev