* [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently @ 2010-09-12 21:42 Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 2/5] raw-posix: handle > 512 byte alignment correctly Christoph Hellwig ` (4 more replies) 0 siblings, 5 replies; 8+ messages in thread From: Christoph Hellwig @ 2010-09-12 21:42 UTC (permalink / raw) To: qemu-devel Use qemu_blockalign for all allocations in the block layer. This allows increasing the required alignment, which is need to support O_DIRECT on devices with large block sizes. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: qemu/hw/scsi-disk.c =================================================================== --- qemu.orig/hw/scsi-disk.c 2010-09-12 14:42:40.942759377 -0300 +++ qemu/hw/scsi-disk.c 2010-09-12 14:43:04.694759377 -0300 @@ -70,14 +70,15 @@ struct SCSIDiskState char *serial; }; -static SCSIDiskReq *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun) +static SCSIDiskReq *scsi_new_request(SCSIDiskState *s, uint32_t tag, + uint32_t lun) { SCSIRequest *req; SCSIDiskReq *r; - req = scsi_req_alloc(sizeof(SCSIDiskReq), d, tag, lun); + req = scsi_req_alloc(sizeof(SCSIDiskReq), &s->qdev, tag, lun); r = DO_UPCAST(SCSIDiskReq, req, req); - r->iov.iov_base = qemu_memalign(512, SCSI_DMA_BUF_SIZE); + r->iov.iov_base = qemu_blockalign(s->bs, SCSI_DMA_BUF_SIZE); return r; } @@ -939,7 +940,7 @@ static int32_t scsi_send_command(SCSIDev } /* ??? Tags are not unique for different luns. We only implement a single lun, so this should not matter. */ - r = scsi_new_request(d, tag, lun); + r = scsi_new_request(s, tag, lun); outbuf = (uint8_t *)r->iov.iov_base; is_write = 0; DPRINTF("Command: lun=%d tag=0x%x data=0x%02x", lun, tag, buf[0]); Index: qemu/hw/sd.c =================================================================== --- qemu.orig/hw/sd.c 2010-09-12 14:31:04.387759376 -0300 +++ qemu/hw/sd.c 2010-09-12 14:43:04.695759377 -0300 @@ -440,7 +440,7 @@ SDState *sd_init(BlockDriverState *bs, i SDState *sd; sd = (SDState *) qemu_mallocz(sizeof(SDState)); - sd->buf = qemu_memalign(512, 512); + sd->buf = qemu_blockalign(bs, 512); sd->spi = is_spi; sd->enable = 1; sd_reset(sd, bs); Index: qemu/qemu-io.c =================================================================== --- qemu.orig/qemu-io.c 2010-09-12 14:31:04.394759376 -0300 +++ qemu/qemu-io.c 2010-09-12 14:43:04.699759377 -0300 @@ -61,7 +61,7 @@ static void *qemu_io_alloc(size_t len, i if (misalign) len += MISALIGN_OFFSET; - buf = qemu_memalign(512, len); + buf = qemu_blockalign(bs, len); memset(buf, pattern, len); if (misalign) buf += MISALIGN_OFFSET; Index: qemu/qemu-nbd.c =================================================================== --- qemu.orig/qemu-nbd.c 2010-09-12 14:31:04.401759376 -0300 +++ qemu/qemu-nbd.c 2010-09-12 14:43:04.706759377 -0300 @@ -446,7 +446,7 @@ int main(int argc, char **argv) max_fd = sharing_fds[0]; nb_fds++; - data = qemu_memalign(512, NBD_BUFFER_SIZE); + data = qemu_blockalign(bs, NBD_BUFFER_SIZE); if (data == NULL) errx(EXIT_FAILURE, "Cannot allocate data buffer"); Index: qemu/posix-aio-compat.c =================================================================== --- qemu.orig/posix-aio-compat.c 2010-09-12 14:42:46.725759377 -0300 +++ qemu/posix-aio-compat.c 2010-09-12 14:43:04.711759377 -0300 @@ -270,7 +270,7 @@ static ssize_t handle_aiocb_rw(struct qe * Ok, we have to do it the hard way, copy all segments into * a single aligned buffer. */ - buf = qemu_memalign(512, aiocb->aio_nbytes); + buf = qemu_blockalign(aiocb->common.bs, aiocb->aio_nbytes); if (aiocb->aio_type & QEMU_AIO_WRITE) { char *p = buf; int i; ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 2/5] raw-posix: handle > 512 byte alignment correctly 2010-09-12 21:42 [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Christoph Hellwig @ 2010-09-12 21:43 ` Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 3/5] virtio-blk: propagate the required alignment Christoph Hellwig ` (3 subsequent siblings) 4 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2010-09-12 21:43 UTC (permalink / raw) To: qemu-devel Replace the hardcoded handling of 512 byte alignment with bs->buffer_alignment to handle larger sector size devices correctly. Note that we can not rely on it to be initialize in bdrv_open, so deal with the worst case there. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: qemu/block/raw-posix.c =================================================================== --- qemu.orig/block/raw-posix.c 2010-09-12 14:31:04.344759376 -0300 +++ qemu/block/raw-posix.c 2010-09-12 14:46:02.120759376 -0300 @@ -97,12 +97,12 @@ #define FTYPE_CD 1 #define FTYPE_FD 2 -#define ALIGNED_BUFFER_SIZE (32 * 512) - /* if the FD is not accessed during that time (in ms), we try to reopen it to see if the disk has been changed */ #define FD_OPEN_TIMEOUT 1000 +#define MAX_BLOCKSIZE 4096 + typedef struct BDRVRawState { int fd; int type; @@ -118,7 +118,8 @@ typedef struct BDRVRawState { int use_aio; void *aio_ctx; #endif - uint8_t* aligned_buf; + uint8_t *aligned_buf; + unsigned aligned_buf_size; } BDRVRawState; static int fd_open(BlockDriverState *bs); @@ -161,7 +162,12 @@ static int raw_open_common(BlockDriverSt s->aligned_buf = NULL; if ((bdrv_flags & BDRV_O_NOCACHE)) { - s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE); + /* + * Allocate a buffer for read/modify/write cycles. Chose the size + * pessimistically as we don't know the block size yet. + */ + s->aligned_buf_size = 32 * MAX_BLOCKSIZE; + s->aligned_buf = qemu_memalign(MAX_BLOCKSIZE, s->aligned_buf_size); if (s->aligned_buf == NULL) { goto out_close; } @@ -278,8 +284,9 @@ static int raw_pread_aligned(BlockDriver } /* - * offset and count are in bytes, but must be multiples of 512 for files - * opened with O_DIRECT. buf must be aligned to 512 bytes then. + * offset and count are in bytes, but must be multiples of the sector size + * for files opened with O_DIRECT. buf must be aligned to sector size bytes + * then. * * This function may be called without alignment if the caller ensures * that O_DIRECT is not in effect. @@ -316,24 +323,25 @@ static int raw_pread(BlockDriverState *b uint8_t *buf, int count) { BDRVRawState *s = bs->opaque; + unsigned sector_mask = bs->buffer_alignment - 1; int size, ret, shift, sum; sum = 0; if (s->aligned_buf != NULL) { - if (offset & 0x1ff) { - /* align offset on a 512 bytes boundary */ + if (offset & sector_mask) { + /* align offset on a sector size bytes boundary */ - shift = offset & 0x1ff; - size = (shift + count + 0x1ff) & ~0x1ff; - if (size > ALIGNED_BUFFER_SIZE) - size = ALIGNED_BUFFER_SIZE; + shift = offset & sector_mask; + size = (shift + count + sector_mask) & ~sector_mask; + if (size > s->aligned_buf_size) + size = s->aligned_buf_size; ret = raw_pread_aligned(bs, offset - shift, s->aligned_buf, size); if (ret < 0) return ret; - size = 512 - shift; + size = bs->buffer_alignment - shift; if (size > count) size = count; memcpy(buf, s->aligned_buf + shift, size); @@ -346,15 +354,15 @@ static int raw_pread(BlockDriverState *b if (count == 0) return sum; } - if (count & 0x1ff || (uintptr_t) buf & 0x1ff) { + if (count & sector_mask || (uintptr_t) buf & sector_mask) { /* read on aligned buffer */ while (count) { - size = (count + 0x1ff) & ~0x1ff; - if (size > ALIGNED_BUFFER_SIZE) - size = ALIGNED_BUFFER_SIZE; + size = (count + sector_mask) & ~sector_mask; + if (size > s->aligned_buf_size) + size = s->aligned_buf_size; ret = raw_pread_aligned(bs, offset, s->aligned_buf, size); if (ret < 0) { @@ -404,25 +412,28 @@ static int raw_pwrite(BlockDriverState * const uint8_t *buf, int count) { BDRVRawState *s = bs->opaque; + unsigned sector_mask = bs->buffer_alignment - 1; int size, ret, shift, sum; sum = 0; if (s->aligned_buf != NULL) { - if (offset & 0x1ff) { - /* align offset on a 512 bytes boundary */ - shift = offset & 0x1ff; - ret = raw_pread_aligned(bs, offset - shift, s->aligned_buf, 512); + if (offset & sector_mask) { + /* align offset on a sector size bytes boundary */ + shift = offset & sector_mask; + ret = raw_pread_aligned(bs, offset - shift, s->aligned_buf, + bs->buffer_alignment); if (ret < 0) return ret; - size = 512 - shift; + size = bs->buffer_alignment - shift; if (size > count) size = count; memcpy(s->aligned_buf + shift, buf, size); - ret = raw_pwrite_aligned(bs, offset - shift, s->aligned_buf, 512); + ret = raw_pwrite_aligned(bs, offset - shift, s->aligned_buf, + bs->buffer_alignment); if (ret < 0) return ret; @@ -434,12 +445,12 @@ static int raw_pwrite(BlockDriverState * if (count == 0) return sum; } - if (count & 0x1ff || (uintptr_t) buf & 0x1ff) { + if (count & sector_mask || (uintptr_t) buf & sector_mask) { - while ((size = (count & ~0x1ff)) != 0) { + while ((size = (count & ~sector_mask)) != 0) { - if (size > ALIGNED_BUFFER_SIZE) - size = ALIGNED_BUFFER_SIZE; + if (size > s->aligned_buf_size) + size = s->aligned_buf_size; memcpy(s->aligned_buf, buf, size); @@ -452,14 +463,16 @@ static int raw_pwrite(BlockDriverState * count -= ret; sum += ret; } - /* here, count < 512 because (count & ~0x1ff) == 0 */ + /* here, count < 512 because (count & ~sector_mask) == 0 */ if (count) { - ret = raw_pread_aligned(bs, offset, s->aligned_buf, 512); + ret = raw_pread_aligned(bs, offset, s->aligned_buf, + bs->buffer_alignment); if (ret < 0) return ret; memcpy(s->aligned_buf, buf, count); - ret = raw_pwrite_aligned(bs, offset, s->aligned_buf, 512); + ret = raw_pwrite_aligned(bs, offset, s->aligned_buf, + bs->buffer_alignment); if (ret < 0) return ret; if (count < ret) @@ -487,12 +500,12 @@ static int raw_write(BlockDriverState *b /* * Check if all memory in this vector is sector aligned. */ -static int qiov_is_aligned(QEMUIOVector *qiov) +static int qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov) { int i; for (i = 0; i < qiov->niov; i++) { - if ((uintptr_t) qiov->iov[i].iov_base % BDRV_SECTOR_SIZE) { + if ((uintptr_t) qiov->iov[i].iov_base % bs->buffer_alignment) { return 0; } } @@ -515,7 +528,7 @@ static BlockDriverAIOCB *raw_aio_submit( * driver that it needs to copy the buffer. */ if (s->aligned_buf) { - if (!qiov_is_aligned(qiov)) { + if (!qiov_is_aligned(bs, qiov)) { type |= QEMU_AIO_MISALIGNED; #ifdef CONFIG_LINUX_AIO } else if (s->use_aio) { ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 3/5] virtio-blk: propagate the required alignment 2010-09-12 21:42 [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 2/5] raw-posix: handle > 512 byte alignment correctly Christoph Hellwig @ 2010-09-12 21:43 ` Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 4/5] scsi-disk: " Christoph Hellwig ` (2 subsequent siblings) 4 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2010-09-12 21:43 UTC (permalink / raw) To: qemu-devel Signed-off-by: Christoph Hellwig <hch@lst.de> Index: qemu/hw/virtio-blk.c =================================================================== --- qemu.orig/hw/virtio-blk.c 2010-09-12 14:42:46.721759377 -0300 +++ qemu/hw/virtio-blk.c 2010-09-12 14:51:09.737759375 -0300 @@ -540,6 +540,7 @@ VirtIODevice *virtio_blk_init(DeviceStat register_savevm(dev, "virtio-blk", virtio_blk_id++, 2, virtio_blk_save, virtio_blk_load, s); bdrv_set_removable(s->bs, 0); + s->bs->buffer_alignment = conf->logical_block_size; return &s->vdev; } ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 4/5] scsi-disk: propagate the required alignment 2010-09-12 21:42 [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 2/5] raw-posix: handle > 512 byte alignment correctly Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 3/5] virtio-blk: propagate the required alignment Christoph Hellwig @ 2010-09-12 21:43 ` Christoph Hellwig 2010-09-12 21:44 ` [Qemu-devel] [PATCH 5/5] ide: " Christoph Hellwig 2010-09-13 19:02 ` [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Stefan Hajnoczi 4 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2010-09-12 21:43 UTC (permalink / raw) To: qemu-devel Signed-off-by: Christoph Hellwig <hch@lst.de> Index: qemu/hw/scsi-disk.c =================================================================== --- qemu.orig/hw/scsi-disk.c 2010-09-12 14:43:04.694759377 -0300 +++ qemu/hw/scsi-disk.c 2010-09-12 14:50:19.049759376 -0300 @@ -1178,6 +1178,7 @@ static int scsi_disk_initfn(SCSIDevice * s->qdev.blocksize = s->qdev.conf.logical_block_size; } s->cluster_size = s->qdev.blocksize / 512; + s->bs->buffer_alignment = s->qdev.blocksize; s->qdev.type = TYPE_DISK; qemu_add_vm_change_state_handler(scsi_dma_restart_cb, s); ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 5/5] ide: propagate the required alignment 2010-09-12 21:42 [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Christoph Hellwig ` (2 preceding siblings ...) 2010-09-12 21:43 ` [Qemu-devel] [PATCH 4/5] scsi-disk: " Christoph Hellwig @ 2010-09-12 21:44 ` Christoph Hellwig 2010-09-21 10:59 ` Kevin Wolf 2010-09-13 19:02 ` [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Stefan Hajnoczi 4 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2010-09-12 21:44 UTC (permalink / raw) To: qemu-devel IDE is a bit ugly in this respect. For one it doesn't really keep track of a sector size - most of the protocol is in units of 512 bytes, and we assume 2048 bytes for CDROMs which is correct most of the time. Second IDE allocates an I/O buffer long before we know if we're dealing with a CDROM or not, so increase the alignment for the io_buffer unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Index: qemu/hw/ide/core.c =================================================================== --- qemu.orig/hw/ide/core.c 2010-09-12 18:30:06.000000000 -0300 +++ qemu/hw/ide/core.c 2010-09-12 18:32:29.133759395 -0300 @@ -2645,6 +2645,7 @@ int ide_init_drive(IDEState *s, BlockDri if (bdrv_get_type_hint(bs) == BDRV_TYPE_CDROM) { s->drive_kind = IDE_CD; bdrv_set_change_cb(bs, cdrom_change_cb, s); + bs->buffer_alignment = 2048; } else { if (!bdrv_is_inserted(s->bs)) { error_report("Device needs media, but drive is empty"); @@ -2679,7 +2680,8 @@ static void ide_init1(IDEBus *bus, int u s->bus = bus; s->unit = unit; s->drive_serial = drive_serial++; - s->io_buffer = qemu_blockalign(s->bs, IDE_DMA_BUF_SECTORS*512 + 4); + /* we need at least 2k alignment for accessing CDROMs using O_DIRECT */ + s->io_buffer = qemu_memalign(2048, IDE_DMA_BUF_SECTORS*512 + 4); s->io_buffer_total_len = IDE_DMA_BUF_SECTORS*512 + 4; s->smart_selftest_data = qemu_blockalign(s->bs, 512); s->sector_write_timer = qemu_new_timer(vm_clock, ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 5/5] ide: propagate the required alignment 2010-09-12 21:44 ` [Qemu-devel] [PATCH 5/5] ide: " Christoph Hellwig @ 2010-09-21 10:59 ` Kevin Wolf 0 siblings, 0 replies; 8+ messages in thread From: Kevin Wolf @ 2010-09-21 10:59 UTC (permalink / raw) To: Christoph Hellwig; +Cc: qemu-devel Am 12.09.2010 23:44, schrieb Christoph Hellwig: > IDE is a bit ugly in this respect. For one it doesn't really keep track > of a sector size - most of the protocol is in units of 512 bytes, and we > assume 2048 bytes for CDROMs which is correct most of the time. > > Second IDE allocates an I/O buffer long before we know if we're dealing > with a CDROM or not, so increase the alignment for the io_buffer > unconditionally. > > Signed-off-by: Christoph Hellwig <hch@lst.de> I'm not very happy about the last three patches in this series because they mix guest device and backend properties. But it's probably the best we can get without major effort. Applied all to the block branch. Kevin ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently 2010-09-12 21:42 [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Christoph Hellwig ` (3 preceding siblings ...) 2010-09-12 21:44 ` [Qemu-devel] [PATCH 5/5] ide: " Christoph Hellwig @ 2010-09-13 19:02 ` Stefan Hajnoczi 2010-09-14 16:09 ` Christoph Hellwig 4 siblings, 1 reply; 8+ messages in thread From: Stefan Hajnoczi @ 2010-09-13 19:02 UTC (permalink / raw) To: Christoph Hellwig; +Cc: qemu-devel On Sun, Sep 12, 2010 at 10:42 PM, Christoph Hellwig <hch@lst.de> wrote: > Use qemu_blockalign for all allocations in the block layer. This allows > increasing the required alignment, which is need to support O_DIRECT on > devices with large block sizes. > > Signed-off-by: Christoph Hellwig <hch@lst.de> I noticed that the backing file may have different alignment requirements than the image file but qemu_blockalign() currently only takes into account the image file's requirements. Not necessarily something for this patch, but since you're in the area I wanted to mention it. It seems safe to take the max alignment (they should both be powers of two I think) of the image and backing files. Stefan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently 2010-09-13 19:02 ` [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Stefan Hajnoczi @ 2010-09-14 16:09 ` Christoph Hellwig 0 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2010-09-14 16:09 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Christoph Hellwig, qemu-devel On Mon, Sep 13, 2010 at 08:02:28PM +0100, Stefan Hajnoczi wrote: > On Sun, Sep 12, 2010 at 10:42 PM, Christoph Hellwig <hch@lst.de> wrote: > > Use qemu_blockalign for all allocations in the block layer. ?This allows > > increasing the required alignment, which is need to support O_DIRECT on > > devices with large block sizes. > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > I noticed that the backing file may have different alignment > requirements than the image file but qemu_blockalign() currently only > takes into account the image file's requirements. Not necessarily > something for this patch, but since you're in the area I wanted to > mention it. > > It seems safe to take the max alignment (they should both be powers of > two I think) of the image and backing files. Currently we check the required alignment based on the blocksize we present to the guest, which is set by the driver. That's a bit upside down, but the best we can do. The problem is that the blocksize of the device the image (and it's backing file reside on) are the lower bound of what we can support in the guest without massive read/modify/write cycles. If we use the page cache the kernel does the cycles for us, but for cache=none we would have to do it ourselves in a rather inefficient way. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-09-21 10:59 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-09-12 21:42 [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 2/5] raw-posix: handle > 512 byte alignment correctly Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 3/5] virtio-blk: propagate the required alignment Christoph Hellwig 2010-09-12 21:43 ` [Qemu-devel] [PATCH 4/5] scsi-disk: " Christoph Hellwig 2010-09-12 21:44 ` [Qemu-devel] [PATCH 5/5] ide: " Christoph Hellwig 2010-09-21 10:59 ` Kevin Wolf 2010-09-13 19:02 ` [Qemu-devel] [PATCH 1/5] use qemu_blockalign consistently Stefan Hajnoczi 2010-09-14 16:09 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).