* [PATCHSET] Add support for simplified async direct-io
@ 2016-11-14 17:28 Jens Axboe
2016-11-14 17:28 ` [PATCH 1/2] block: support any sized IO for simplified bdev direct-io Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 17:28 UTC (permalink / raw)
To: axboe, linux-block; +Cc: hch
This is on top of for-4.10/dio, which has Christoph's simplified sync
O_DIRECT support and the IO poll bits.
The restriction on 4 inline vecs is removed on the sync support; we
just allocate the bio_vec array if we have to. I realize this negates
part of the win of the patch for sync, but it's still a lot leaner
than the old code. And it means we use the same path for any type
of sync O_DIRECT, instead of potentially bouncing between the two.
The second patch adds async support for the code base. Lightly
tested. Performance results, from fio:
Before patch:
Threads x QD IOPS usr sys
=============================================
1x1 113K 19.8% 24.6%
1x4 321K 30.9% 65.9%
4x4 759K 19.9% 42.2%
After patch:
Threads x QD IOPS usr sys
=============================================
1x1 116K 21.8% 20.2%
1x4 365K 25.3% 72.9%
4x4 818K 20.7% 42.3%
--
Jens Axboe
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 1/2] block: support any sized IO for simplified bdev direct-io
2016-11-14 17:28 [PATCHSET] Add support for simplified async direct-io Jens Axboe
@ 2016-11-14 17:28 ` Jens Axboe
2016-11-14 17:28 ` [PATCH 2/2] block: add support for async simple direct-io for bdevs Jens Axboe
2016-11-14 17:37 ` [PATCHSET] Add support for simplified async direct-io Christoph Hellwig
2 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 17:28 UTC (permalink / raw)
To: axboe, linux-block; +Cc: hch, Jens Axboe
Just alloc the bio_vec array if we exceed the inline limit.
Signed-off-by: Jens Axboe <axboe@fb.com>
---
fs/block_dev.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7c3ec6049073..2010997fd326 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -193,7 +193,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
struct file *file = iocb->ki_filp;
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
unsigned blkbits = blksize_bits(bdev_logical_block_size(bdev));
- struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *bvec;
+ struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs, *bvec;
loff_t pos = iocb->ki_pos;
bool should_dirty = false;
struct bio bio;
@@ -204,9 +204,17 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
if ((pos | iov_iter_alignment(iter)) & ((1 << blkbits) - 1))
return -EINVAL;
+ if (nr_pages <= DIO_INLINE_BIO_VECS)
+ vecs = inline_vecs;
+ else {
+ vecs = kmalloc(nr_pages * sizeof(struct bio_vec), GFP_KERNEL);
+ if (!vecs)
+ return -ENOMEM;
+ }
+
bio_init(&bio);
bio.bi_max_vecs = nr_pages;
- bio.bi_io_vec = inline_vecs;
+ bio.bi_io_vec = vecs;
bio.bi_bdev = bdev;
bio.bi_iter.bi_sector = pos >> blkbits;
bio.bi_private = current;
@@ -243,6 +251,9 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
put_page(bvec->bv_page);
}
+ if (vecs != inline_vecs)
+ kfree(vecs);
+
if (unlikely(bio.bi_error))
return bio.bi_error;
iocb->ki_pos += ret;
@@ -259,7 +270,7 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
if (!nr_pages)
return 0;
- if (is_sync_kiocb(iocb) && nr_pages <= DIO_INLINE_BIO_VECS)
+ if (is_sync_kiocb(iocb))
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter,
blkdev_get_block, NULL, NULL,
--
2.7.4
* [PATCH 2/2] block: add support for async simple direct-io for bdevs
2016-11-14 17:28 [PATCHSET] Add support for simplified async direct-io Jens Axboe
2016-11-14 17:28 ` [PATCH 1/2] block: support any sized IO for simplified bdev direct-io Jens Axboe
@ 2016-11-14 17:28 ` Jens Axboe
2016-11-14 17:37 ` [PATCHSET] Add support for simplified async direct-io Christoph Hellwig
2 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 17:28 UTC (permalink / raw)
To: axboe, linux-block; +Cc: hch, Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com>
---
fs/block_dev.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 66 insertions(+), 10 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2010997fd326..62ca4ce21222 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -176,9 +176,68 @@ static struct inode *bdev_file_inode(struct file *file)
return file->f_mapping->host;
}
+static void blkdev_bio_end_io_async(struct bio *bio)
+{
+ struct kiocb *iocb = bio->bi_private;
+
+ iocb->ki_complete(iocb, bio->bi_error, 0);
+
+ if (bio_op(bio) == REQ_OP_READ)
+ bio_check_pages_dirty(bio);
+ else
+ bio_put(bio);
+}
+
+static ssize_t
+__blkdev_direct_IO_async(struct kiocb *iocb, struct iov_iter *iter,
+ int nr_pages)
+{
+ struct file *file = iocb->ki_filp;
+ struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+ unsigned blkbits = blksize_bits(bdev_logical_block_size(bdev));
+ loff_t pos = iocb->ki_pos;
+ struct bio *bio;
+ ssize_t ret;
+
+ if ((pos | iov_iter_alignment(iter)) & ((1 << blkbits) - 1))
+ return -EINVAL;
+
+ bio = bio_alloc(GFP_KERNEL, nr_pages);
+ if (!bio)
+ return -ENOMEM;
+
+ bio->bi_bdev = bdev;
+ bio->bi_iter.bi_sector = pos >> blkbits;
+ bio->bi_private = iocb;
+ bio->bi_end_io = blkdev_bio_end_io_async;
+
+ ret = bio_iov_iter_get_pages(bio, iter);
+ if (unlikely(ret))
+ return ret;
+
+ /*
+ * Overload bio size in error. If it gets set, we lose the
+ * size, but we don't need the size for that case. IO is limited
+ * to BIO_MAX_PAGES, so we can't overflow.
+ */
+ ret = bio->bi_error = bio->bi_iter.bi_size;
+
+ if (iov_iter_rw(iter) == READ) {
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+ bio_set_pages_dirty(bio);
+ } else {
+ bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);
+ task_io_account_write(ret);
+ }
+
+ submit_bio(bio);
+ iocb->ki_pos += ret;
+ return -EIOCBQUEUED;
+}
+
#define DIO_INLINE_BIO_VECS 4
-static void blkdev_bio_end_io_simple(struct bio *bio)
+static void blkdev_bio_end_io_sync(struct bio *bio)
{
struct task_struct *waiter = bio->bi_private;
@@ -187,8 +246,7 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
}
static ssize_t
-__blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
- int nr_pages)
+__blkdev_direct_IO_sync(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
{
struct file *file = iocb->ki_filp;
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
@@ -218,7 +276,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
bio.bi_bdev = bdev;
bio.bi_iter.bi_sector = pos >> blkbits;
bio.bi_private = current;
- bio.bi_end_io = blkdev_bio_end_io_simple;
+ bio.bi_end_io = blkdev_bio_end_io_sync;
ret = bio_iov_iter_get_pages(&bio, iter);
if (unlikely(ret))
@@ -263,18 +321,16 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
static ssize_t
blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = bdev_file_inode(file);
int nr_pages;
nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
if (!nr_pages)
return 0;
+
if (is_sync_kiocb(iocb))
- return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
- return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter,
- blkdev_get_block, NULL, NULL,
- DIO_SKIP_DIO_COUNT);
+ return __blkdev_direct_IO_sync(iocb, iter, nr_pages);
+
+ return __blkdev_direct_IO_async(iocb, iter, nr_pages);
}
int __sync_blockdev(struct block_device *bdev, int wait)
--
2.7.4
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 17:28 [PATCHSET] Add support for simplified async direct-io Jens Axboe
2016-11-14 17:28 ` [PATCH 1/2] block: support any sized IO for simplified bdev direct-io Jens Axboe
2016-11-14 17:28 ` [PATCH 2/2] block: add support for async simple direct-io for bdevs Jens Axboe
@ 2016-11-14 17:37 ` Christoph Hellwig
2016-11-14 17:47 ` Jens Axboe
2 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-14 17:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: axboe, linux-block, hch
On Mon, Nov 14, 2016 at 10:28:37AM -0700, Jens Axboe wrote:
> This is on top of for-4.10/dio, which has Christophs simplified sync
> O_DIRECT support and the IO poll bits.
>
> The restriction on 4 inline vecs is removed on the sync support, we
> just allocate the bio_vec array if we have to. I realize this negates
> parts of the win of the patch for sync, but it's still a lot leaner
> than the old code. And it means we use the same path for any type
> of sync O_DIRECT, instead of potentially bouncing between the two.
>
> Second patch is adding async support for the code base. Lightly
> tested. Performance results, from fio:
I'd much rather add a separate path based on the same scheme. And
because I prefer that so much, I actually did it a while ago, although
it will need some minor updates for the current tree:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
I'll see if I can get it quickly updated..
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 17:37 ` [PATCHSET] Add support for simplified async direct-io Christoph Hellwig
@ 2016-11-14 17:47 ` Jens Axboe
2016-11-14 18:00 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 17:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block
On 11/14/2016 10:37 AM, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 10:28:37AM -0700, Jens Axboe wrote:
>> This is on top of for-4.10/dio, which has Christophs simplified sync
>> O_DIRECT support and the IO poll bits.
>>
>> The restriction on 4 inline vecs is removed on the sync support, we
>> just allocate the bio_vec array if we have to. I realize this negates
>> parts of the win of the patch for sync, but it's still a lot leaner
>> than the old code. And it means we use the same path for any type
>> of sync O_DIRECT, instead of potentially bouncing between the two.
>>
>> Second patch is adding async support for the code base. Lightly
>> tested. Performance results, from fio:
>
> I'd much rather add a separate path based on the same scheme. And
> because I prefer that so much I actually did it a while ago, although
> it will need some minor updates for the current tree:
>
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>
> I'll see if I can get it quickly updated..
This seems less clean in basically all ways, not sure I agree with you.
We already have 4 vecs inlined in a generic bio, and we might as well
use the fs bioset instead of creating our own. You also add a smallish
dio to track things, I don't think we need that. Ditto for the ref count
- if we keep the sync and async parts separate, we don't need it.
I looked into sharing some code for the async and sync paths, but
there's really nothing to share. We can fold the alignment and account
parts if we wanted, but that's about it.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 17:47 ` Jens Axboe
@ 2016-11-14 18:00 ` Christoph Hellwig
2016-11-14 18:02 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-14 18:00 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, axboe, linux-block
On Mon, Nov 14, 2016 at 10:47:46AM -0700, Jens Axboe wrote:
> This seems less clean in basically all ways, not sure I agree with you.
> We already have 4 vecs inlined in a generic bio, and we might as well
> use the fs bioset instead of creating our own. You also add a smallish
> dio to track things, I don't think we need that.
We need it unless we want unbounded allocations for the biovec. With a
1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't
unheard of we'd be up to 64 pages, or an order 6 allocation, which will
take down the VM. We also need to pin down all the user memory while
doing the I/O instead of having throttling on the bio mempool before
doing the get_user_pages.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:00 ` Christoph Hellwig
@ 2016-11-14 18:02 ` Jens Axboe
2016-11-14 18:05 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 18:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block
On 11/14/2016 11:00 AM, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 10:47:46AM -0700, Jens Axboe wrote:
>> This seems less clean in basically all ways, not sure I agree with you.
>> We already have 4 vecs inlined in a generic bio, and we might as well
>> use the fs bioset instead of creating our own. You also add a smallish
>> dio to track things, I don't think we need that.
>
> We need it unless we want unbounded allocations for the biovec. With a
> 1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't
> unheard of we'd be up to 64 pages, or an order 6 allocation, which will
> take down the VM. We also need to pin down all the user memory while
> doing the I/O instead of having throttling on the bio mempool before
> doing the get_user_pages.
Just add the iterator and loop for every X pages? We can even put a plug
around it, if we have to.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:02 ` Jens Axboe
@ 2016-11-14 18:05 ` Christoph Hellwig
2016-11-14 18:08 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-14 18:05 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, axboe, linux-block
On Mon, Nov 14, 2016 at 11:02:47AM -0700, Jens Axboe wrote:
> > We need it unless we want unbounded allocations for the biovec. With a
> > 1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't
> > unheard of we'd be up to 64 pages, or an order 6 allocation, which will
> > take down the VM. We also need to pin down all the user memory while
> > doing the I/O instead of having throttling on the bio mempool before
> > doing the get_user_pages.
>
> Just add the iterator and loop for every X pages? We can even put a plug
> around it, if we have to.
That's pretty much what my patch does..
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:05 ` Christoph Hellwig
@ 2016-11-14 18:08 ` Jens Axboe
2016-11-14 18:11 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 18:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block
On 11/14/2016 11:05 AM, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 11:02:47AM -0700, Jens Axboe wrote:
>>> We need it unless we want unbounded allocations for the biovec. With a
>>> 1GB I/O we're at a page size allocation, and with 64MB I/Os that aren't
>>> unheard of we'd be up to 64 pages, or an order 6 allocation, which will
>>> take down the VM. We also need to pin down all the user memory while
>>> doing the I/O instead of having throttling on the bio mempool before
>>> doing the get_user_pages.
>>
>> Just add the iterator and loop for every X pages? We can even put a plug
>> around it, if we have to.
>
> That's pretty much what my patch does..
It'd be cleaner to loop one level out, and avoid all that 'dio' stuff
instead. And then still retain the separate parts of the sync and async.
There's nothing to share there imho, and it just makes the code harder
to read.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:08 ` Jens Axboe
@ 2016-11-14 18:11 ` Christoph Hellwig
2016-11-14 18:17 ` Jens Axboe
2016-11-16 16:09 ` Jens Axboe
0 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-14 18:11 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, axboe, linux-block
On Mon, Nov 14, 2016 at 11:08:46AM -0700, Jens Axboe wrote:
> It'd be cleaner to loop one level out, and avoid all that 'dio' stuff
> instead. And then still retain the separate parts of the sync and async.
> There's nothing to share there imho, and it just makes the code harder
> to read.
How do you avoid it for the async case? We can only call ki_complete
once all bios have finished, which means we need a tracking structure
for it. For the synchronous case we could in theory wait for the
previous bio before sending the next, but there are plenty of RAID
arrays that would prefer > 1MB I/O. And we can pretty much reuse the
async case for this anyway.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:11 ` Christoph Hellwig
@ 2016-11-14 18:17 ` Jens Axboe
2016-11-15 13:58 ` Christoph Hellwig
2016-11-16 16:09 ` Jens Axboe
1 sibling, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-14 18:17 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block
On 11/14/2016 11:11 AM, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 11:08:46AM -0700, Jens Axboe wrote:
>> It'd be cleaner to loop one level out, and avoid all that 'dio' stuff
>> instead. And then still retain the separate parts of the sync and async.
>> There's nothing to share there imho, and it just makes the code harder
>> to read.
>
> How do you avoid it for the async case? We can only call ki_complete
> once all bios have finished, which means we need a tracking structure
> for it. For the synchronous case we could in theory wait for the
> previous bio before sending the next, but there are plenty of RAID
> arrays that would prefer > 1MB I/O. And we can pretty much reuse the
> async case for this anyway.
Just make the limit reasonable - with an order 1 alloc, we have 256
pages, or 1MB of IO. Loop around that. We could play tricks with bio
reuse and using the ref count for knowing when it'd be done.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:17 ` Jens Axboe
@ 2016-11-15 13:58 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-15 13:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, axboe, linux-block
On Mon, Nov 14, 2016 at 11:17:09AM -0700, Jens Axboe wrote:
> Just make the limit reasonable - with an order 1 alloc, we have 256 pages, or
> 1MB of IO.
That's not reasonable for many RAID arrays.
> Loop around that.
How do you want to loop around that for AIO?
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-14 18:11 ` Christoph Hellwig
2016-11-14 18:17 ` Jens Axboe
@ 2016-11-16 16:09 ` Jens Axboe
2016-11-16 16:31 ` Christoph Hellwig
1 sibling, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-16 16:09 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block
On 11/14/2016 11:11 AM, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 11:08:46AM -0700, Jens Axboe wrote:
>> It'd be cleaner to loop one level out, and avoid all that 'dio' stuff
>> instead. And then still retain the separate parts of the sync and async.
>> There's nothing to share there imho, and it just makes the code harder
>> to read.
>
> How do you avoid it for the async case? We can only call ki_complete
> once all bios have finished, which means we need a tracking structure
> for it. For the synchronous case we could in theory wait for the
> previous bio before sending the next, but there are plenty of RAID
> arrays that would prefer > 1MB I/O. And we can pretty much reuse the
> async case for this anyway.
Another idea - just limit the max we can support in the simplified
version as a single bio. If the user is requesting more, fall back to
the old slow path. Doesn't really matter, since the transfer size will
be large enough to hopefully overcome any deficiencies in the old path.
That avoids having the atomic inc/dec in the fast path, and having to
loop for both aio/sync O_DIRECT.
Example below.
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7c3ec60..becc78e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -176,9 +176,68 @@ static struct inode *bdev_file_inode(struct file *file)
return file->f_mapping->host;
}
+static void blkdev_bio_end_io_async(struct bio *bio)
+{
+ struct kiocb *iocb = bio->bi_private;
+
+ iocb->ki_complete(iocb, bio->bi_error, 0);
+
+ if (bio_op(bio) == REQ_OP_READ)
+ bio_check_pages_dirty(bio);
+ else
+ bio_put(bio);
+}
+
+static ssize_t
+__blkdev_direct_IO_async(struct kiocb *iocb, struct iov_iter *iter,
+ int nr_pages)
+{
+ struct file *file = iocb->ki_filp;
+ struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+ unsigned blkbits = blksize_bits(bdev_logical_block_size(bdev));
+ loff_t pos = iocb->ki_pos;
+ struct bio *bio;
+ ssize_t ret;
+
+ if ((pos | iov_iter_alignment(iter)) & ((1 << blkbits) - 1))
+ return -EINVAL;
+
+ bio = bio_alloc(GFP_KERNEL, nr_pages);
+ if (!bio)
+ return -ENOMEM;
+
+ bio->bi_bdev = bdev;
+ bio->bi_iter.bi_sector = pos >> blkbits;
+ bio->bi_private = iocb;
+ bio->bi_end_io = blkdev_bio_end_io_async;
+
+ ret = bio_iov_iter_get_pages(bio, iter);
+ if (unlikely(ret))
+ return ret;
+
+ /*
+ * Overload bio size in error. If it gets set, we lose the
+ * size, but we don't need the size for that case. IO is limited
+ * to BIO_MAX_PAGES, so we can't overflow.
+ */
+ ret = bio->bi_error = bio->bi_iter.bi_size;
+
+ if (iov_iter_rw(iter) == READ) {
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+ bio_set_pages_dirty(bio);
+ } else {
+ bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE);
+ task_io_account_write(ret);
+ }
+
+ submit_bio(bio);
+ iocb->ki_pos += ret;
+ return -EIOCBQUEUED;
+}
+
#define DIO_INLINE_BIO_VECS 4
-static void blkdev_bio_end_io_simple(struct bio *bio)
+static void blkdev_bio_end_io_sync(struct bio *bio)
{
struct task_struct *waiter = bio->bi_private;
@@ -187,13 +246,12 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
}
static ssize_t
-__blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
- int nr_pages)
+__blkdev_direct_IO_sync(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
{
struct file *file = iocb->ki_filp;
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
unsigned blkbits = blksize_bits(bdev_logical_block_size(bdev));
- struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *bvec;
+ struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs, *bvec;
loff_t pos = iocb->ki_pos;
bool should_dirty = false;
struct bio bio;
@@ -204,13 +262,21 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
if ((pos | iov_iter_alignment(iter)) & ((1 << blkbits) - 1))
return -EINVAL;
+ if (nr_pages <= DIO_INLINE_BIO_VECS)
+ vecs = inline_vecs;
+ else {
+ vecs = kmalloc(nr_pages * sizeof(struct bio_vec), GFP_KERNEL);
+ if (!vecs)
+ return -ENOMEM;
+ }
+
bio_init(&bio);
bio.bi_max_vecs = nr_pages;
- bio.bi_io_vec = inline_vecs;
+ bio.bi_io_vec = vecs;
bio.bi_bdev = bdev;
bio.bi_iter.bi_sector = pos >> blkbits;
bio.bi_private = current;
- bio.bi_end_io = blkdev_bio_end_io_simple;
+ bio.bi_end_io = blkdev_bio_end_io_sync;
ret = bio_iov_iter_get_pages(&bio, iter);
if (unlikely(ret))
@@ -243,6 +309,9 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
put_page(bvec->bv_page);
}
+ if (vecs != inline_vecs)
+ kfree(vecs);
+
if (unlikely(bio.bi_error))
return bio.bi_error;
iocb->ki_pos += ret;
@@ -252,18 +321,25 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
static ssize_t
blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = bdev_file_inode(file);
int nr_pages;
- nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
+ nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
if (!nr_pages)
return 0;
- if (is_sync_kiocb(iocb) && nr_pages <= DIO_INLINE_BIO_VECS)
- return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
- return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter,
- blkdev_get_block, NULL, NULL,
- DIO_SKIP_DIO_COUNT);
+
+ if (nr_pages > BIO_MAX_PAGES) {
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = bdev_file_inode(file);
+
+ return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter,
+ blkdev_get_block, NULL, NULL,
+ DIO_SKIP_DIO_COUNT);
+ }
+
+ if (is_sync_kiocb(iocb))
+ return __blkdev_direct_IO_sync(iocb, iter, nr_pages);
+
+ return __blkdev_direct_IO_async(iocb, iter, nr_pages);
}
int __sync_blockdev(struct block_device *bdev, int wait)
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 16:09 ` Jens Axboe
@ 2016-11-16 16:31 ` Christoph Hellwig
2016-11-16 16:57 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-16 16:31 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, axboe, linux-block
On Wed, Nov 16, 2016 at 09:09:17AM -0700, Jens Axboe wrote:
> Another idea - just limit the max we can support in the simplified
> version as a single bio. If the user is requesting more, fall back to
> the old slow path. Doesn't really matter, since the transfer size will
> be large enough to hopefully overcome any deficiencies in the old path.
I really want to kill off the old code :) Maybe we could use my
iomap-based code instead of another block version if we delegate it
far enough away from the hot path with a patch like yours, but I really
wonder if avoiding the atomic inc/dec is that important once we are
above a minimal threshold. That being said, I think we could avoid
the atomic op for the single-bio case even with code that handles
multiple bios. Kent actually noted that on my full blown iomap
version.
But if we need another fast but not super fast case your patch below
looks reasonable. Although we really need to implement O_SYNC/O_DSYNC
support for it to avoid losing data.
> + /*
> + * Overload bio size in error. If it gets set, we lose the
> + * size, but we don't need the size for that case. IO is limited
> + * to BIO_MAX_PAGES, so we can't overflow.
> + */
> + ret = bio->bi_error = bio->bi_iter.bi_size;
Oh, that's a neat trick.
> + if (nr_pages <= DIO_INLINE_BIO_VECS)
> + vecs = inline_vecs;
> + else {
> + vecs = kmalloc(nr_pages * sizeof(struct bio_vec), GFP_KERNEL);
kmalloc_array?
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 16:31 ` Christoph Hellwig
@ 2016-11-16 16:57 ` Jens Axboe
2016-11-16 17:16 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-16 16:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, linux-block
On 11/16/2016 09:31 AM, Christoph Hellwig wrote:
> On Wed, Nov 16, 2016 at 09:09:17AM -0700, Jens Axboe wrote:
>> Another idea - just limit the max we can support in the simplified
>> version as a single bio. If the user is requesting more, fall back to
>> the old slow path. Doesn't really matter, since the transfer size will
>> be large enough to hopefully overcome any deficiencies in the old path.
>
> I really want to kill off the old code :) Maybe we could use my
> iomap-based code instead of another block version if we delegate it
> far enough away from the hot path with a patch like yours, but I really
> wonder if avoiding the atomic inc/dec is that important once we are
> above a minimal threshold. That being said, I think we could avoid
> the atomic op for the single-bio case even with code that handles
> multiple bios. Kent actually noted that on my full blown iomap
> version.
I'd love to kill it off too, but that might take a bit longer. At least
the bdev versions are nice and simple.
> But if we need another fast but not super fast case your patch below
> looks reasonable. Although we really need to implement O_SYNC/O_DSYNC
> support for it to avoid losing data.
I can add the flush support.
>> + /*
>> + * Overload bio size in error. If it gets set, we lose the
>> + * size, but we don't need the size for that case. IO is limited
>> + * to BIO_MAX_PAGES, so we can't overflow.
>> + */
>> + ret = bio->bi_error = bio->bi_iter.bi_size;
>
> Oh, that's a neat trick.
It's a bit iffy, but should be safe. And with that, we don't need an
extra allocation outside of the bio.
>> + if (nr_pages <= DIO_INLINE_BIO_VECS)
>> + vecs = inline_vecs;
>> + else {
>> + vecs = kmalloc(nr_pages * sizeof(struct bio_vec), GFP_KERNEL);
>
> kmalloc_array?
Good point, I'll make that change.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 16:57 ` Jens Axboe
@ 2016-11-16 17:16 ` Christoph Hellwig
2016-11-16 20:02 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-16 17:16 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block
I've pushed out a new version of my code that avoids atomic ops
for the single bio case:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
I think we should take your extension of the existing
__blkdev_direct_IO_simple to kmalloc up to BIO_MAX_PAGES and then use
the above implementation for AIO and large synchronous I/O.
I think I can also kill off blkdev_dio_pool by simply using bio_kmalloc
as we're only doing the allocation at the beginning of the call,
but I'd like to agree on an approach before spending too much time on
it.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 17:16 ` Christoph Hellwig
@ 2016-11-16 20:02 ` Jens Axboe
2016-11-16 20:36 ` Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-16 20:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/16/2016 10:16 AM, Christoph Hellwig wrote:
> I've pushed out a new version of my code that avoids atomic ops
> for the single bio case:
>
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>
> I think we should take your extension of the existing
> __blkdev_direct_IO_simple to kmalloc up to BIO_MAX_PAGES and then use
> the above implementation for AIO and large synchronous I/O.
>
> I think I can also kill off blkdev_dio_pool by simply using bio_kmalloc
> as we're only doing the allocation at the beginning of the call,
> but I'd like to agree on an approach before spending too much time on
> it.
>
I'm fine with this approach, but I would still REALLY like to see
blkdev_bio_end_io() split in two, one for sync and one for async. That
would be a lot cleaner, imho.
Let me know how you want to do it. I added SYNC support to mine; the
branch is here:
http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/dio
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 20:02 ` Jens Axboe
@ 2016-11-16 20:36 ` Jens Axboe
2016-11-16 20:51 ` Jens Axboe
2016-11-17 4:43 ` Jens Axboe
2016-11-17 12:46 ` Christoph Hellwig
2 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-16 20:36 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/16/2016 01:02 PM, Jens Axboe wrote:
> On 11/16/2016 10:16 AM, Christoph Hellwig wrote:
>> I've pushed out a new version of my code that avoids atomic ops
>> for the single bio case:
>>
>> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>>
>> I think we should take your extension of the existing
>> __blkdev_direct_IO_simple to kmalloc up to BIO_MAX_PAGES and then use
>> the above implementation for AIO and large synchronous I/O.
>>
>> I think I can also kill off blkdev_dio_pool by simply using bio_kmalloc
>> as we're only doing the allocation at the beginning of the call,
>> but I'd like to agree on an approach before spending too much time on
>> it.
>>
>
> I'm fine with this approach, but I would still REALLY like to see
> blkdev_bio_end_io() split in two, one for sync and one for async. That
> would be a lot cleaner, imho.
>
> Let me know how you want to do it. I added SYNC support for mine here,
> the branch is here:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/dio
Attaching two patches here that add the BIO_MAX_PAGES extension and the
SYNC support for your branch. The latter is untested... You can add my
Reviewed-by to:
7bed5be4f28cc86e3929c0c0fbba24ca0344f36c
57c2cf5c2595c0054855d7402d3796b5924fd05c
I'd like to get this in for 4.10.
--
Jens Axboe
[-- Attachment #2: 0002-block-add-suppor-for-flushing-cache-for-O_SYNC-DSYNC.patch --]
[-- Type: text/x-patch, Size: 3541 bytes --]
From f28e75e618e578b6763dfc7bb44df88619bfdbdf Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@fb.com>
Date: Wed, 16 Nov 2016 13:33:42 -0700
Subject: [PATCH 2/2] block: add suppor for flushing cache for O_SYNC/DSYNC
Signed-off-by: Jens Axboe <axboe@fb.com>
---
fs/block_dev.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 59 insertions(+), 8 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index d34ec663f538..052f1cf605cc 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -186,6 +186,18 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
wake_up_process(waiter);
}
+static bool dio_should_flush_cache(struct file *file)
+{
+ if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) {
+ struct block_device *bdev = I_BDEV(bdev_file_inode(file));
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ return test_bit(QUEUE_FLAG_WC, &q->queue_flags) != 0;
+ }
+
+ return false;
+}
+
static ssize_t
__blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
int nr_pages)
@@ -256,22 +268,52 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
if (unlikely(bio.bi_error))
return bio.bi_error;
- iocb->ki_pos += ret;
+
+ if (bio_op(&bio) == REQ_OP_WRITE && dio_should_flush_cache(file)) {
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
+ int err;
+
+ err = blkdev_issue_flush(bdev, GFP_KERNEL, NULL);
+ if (err && err != -EOPNOTSUPP)
+ ret = err;
+ }
+ }
+
+ if (ret > 0)
+ iocb->ki_pos += ret;
+
return ret;
}
struct blkdev_dio {
struct kiocb *iocb;
struct task_struct *waiter;
+ struct block_device *bdev;
size_t size;
atomic_t ref;
bool multi_bio : 1;
bool should_dirty : 1;
+ bool should_flush : 1;
+ struct work_struct work;
struct bio bio;
};
static struct bio_set *blkdev_dio_pool __read_mostly;
+static void blkdev_flush_cache_fn(struct work_struct *work)
+{
+ struct blkdev_dio *dio = container_of(work, struct blkdev_dio, work);
+ int ret;
+
+ ret = blkdev_issue_flush(dio->bdev, GFP_KERNEL, NULL);
+ if (!ret || ret == -EOPNOTSUPP)
+ ret = dio->size;
+
+ dio->iocb->ki_complete(dio->iocb, ret, 0);
+}
+
static void blkdev_bio_end_io(struct bio *bio)
{
struct blkdev_dio *dio = bio->bi_private;
@@ -282,15 +324,19 @@ static void blkdev_bio_end_io(struct bio *bio)
dio->bio.bi_error = bio->bi_error;
} else {
if (iocb) {
- ssize_t ret = dio->bio.bi_error;
+ if (dio->should_flush)
+ kblockd_schedule_work(&dio->work);
+ else {
+ ssize_t ret = dio->bio.bi_error;
+
+ if (likely(!ret)) {
+ ret = dio->size;
+ iocb->ki_pos += ret;
+ }
- if (likely(!ret)) {
- ret = dio->size;
- iocb->ki_pos += ret;
+ dio->iocb->ki_complete(iocb, ret, 0);
+ bio_put(&dio->bio);
}
-
- dio->iocb->ki_complete(iocb, ret, 0);
- bio_put(&dio->bio);
} else {
struct task_struct *waiter = dio->waiter;
@@ -335,10 +381,15 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
dio = container_of(bio, struct blkdev_dio, bio);
dio->iocb = is_sync ? NULL : iocb;
dio->waiter = current;
+ dio->bdev = bdev;
dio->size = 0;
dio->multi_bio = false;
dio->should_dirty = is_read && (iter->type == ITER_IOVEC);
+ dio->should_flush = !is_read && dio_should_flush_cache(file);
+ if (dio->should_flush)
+ INIT_WORK(&dio->work, blkdev_flush_cache_fn);
+
for (;;) {
bio->bi_bdev = bdev;
bio->bi_iter.bi_sector = pos >> blkbits;
--
2.7.4
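The flush decision in the patch above reduces to two conditions: the caller asked for sync semantics (O_DSYNC, or an O_SYNC inode) and the device actually has a volatile write cache. A userspace C sketch of the dio_should_flush_cache() logic, using hypothetical stand-in types rather than real kernel structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of dio_should_flush_cache() from the
 * patch above. A write needs an explicit cache flush only when the
 * caller requested sync semantics AND the queue advertises a volatile
 * write cache (QUEUE_FLAG_WC in the kernel). */
struct fake_file {
	bool o_dsync;        /* file opened with O_DSYNC */
	bool inode_is_sync;  /* models IS_SYNC() on the backing inode */
};

struct fake_queue {
	bool has_write_cache; /* models QUEUE_FLAG_WC */
};

static bool dio_should_flush_cache(const struct fake_file *f,
				   const struct fake_queue *q)
{
	if (f->o_dsync || f->inode_is_sync)
		return q->has_write_cache;
	return false;
}
```

This also shows why the async path defers the flush to a work item: blkdev_issue_flush() sleeps, and the bio end_io callback may run in interrupt context.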
[-- Attachment #3: 0001-block-support-any-sized-IO-for-simplified-bdev-direc.patch --]
[-- Type: text/x-patch, Size: 2162 bytes --]
From 84629aca0c25cab79b2e03f2e046b5691255bf17 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@fb.com>
Date: Mon, 14 Nov 2016 10:19:47 -0700
Subject: [PATCH 1/2] block: support any sized IO for simplified bdev direct-io
Just alloc the bio_vec array if we exceed the inline limit.
Signed-off-by: Jens Axboe <axboe@fb.com>
---
fs/block_dev.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6c8b6dfac1e5..d34ec663f538 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -193,7 +193,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
struct file *file = iocb->ki_filp;
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
unsigned blkbits = blksize_bits(bdev_logical_block_size(bdev));
- struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *bvec;
+ struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs, *bvec;
loff_t pos = iocb->ki_pos;
bool should_dirty = false;
struct bio bio;
@@ -204,9 +204,17 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
if ((pos | iov_iter_alignment(iter)) & ((1 << blkbits) - 1))
return -EINVAL;
+ if (nr_pages <= DIO_INLINE_BIO_VECS)
+ vecs = inline_vecs;
+ else {
+ vecs = kmalloc(nr_pages * sizeof(struct bio_vec), GFP_KERNEL);
+ if (!vecs)
+ return -ENOMEM;
+ }
+
bio_init(&bio);
bio.bi_max_vecs = nr_pages;
- bio.bi_io_vec = inline_vecs;
+ bio.bi_io_vec = vecs;
bio.bi_bdev = bdev;
bio.bi_iter.bi_sector = pos >> blkbits;
bio.bi_private = current;
@@ -243,6 +251,9 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
put_page(bvec->bv_page);
}
+ if (vecs != inline_vecs)
+ kfree(vecs);
+
if (unlikely(bio.bi_error))
return bio.bi_error;
iocb->ki_pos += ret;
@@ -408,7 +419,7 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
if (!nr_pages)
return 0;
- if (is_sync_kiocb(iocb) && nr_pages <= DIO_INLINE_BIO_VECS)
+ if (is_sync_kiocb(iocb))
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
return __blkdev_direct_IO(iocb, iter, nr_pages);
}
--
2.7.4
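The allocation strategy in the patch above is the classic small-buffer optimization: common small I/Os use an on-stack inline array, larger ones fall back to a heap allocation, and only the heap case is freed. A userspace C sketch (malloc standing in for kmalloc, plain ints for bio_vec entries; names are illustrative):

```c
#include <stdlib.h>

#define DIO_INLINE_VECS 4  /* stand-in for DIO_INLINE_BIO_VECS */

/* Returns 1 if the heap path was used, 0 for the inline path,
 * -1 on allocation failure (models -ENOMEM). */
static int submit_io(int nr_pages)
{
	int inline_vecs[DIO_INLINE_VECS];
	int *vecs;

	if (nr_pages <= DIO_INLINE_VECS) {
		vecs = inline_vecs;        /* small I/O: no allocation */
	} else {
		vecs = malloc(nr_pages * sizeof(*vecs));
		if (!vecs)
			return -1;
	}

	for (int i = 0; i < nr_pages; i++)
		vecs[i] = i;  /* stand-in for mapping pages into the bio */

	int used_heap = (vecs != inline_vecs);
	if (used_heap)
		free(vecs);  /* mirrors: if (vecs != inline_vecs) kfree(vecs); */
	return used_heap;
}
```

The pointer comparison against the inline array is what lets a single cleanup path serve both cases, exactly as the `if (vecs != inline_vecs) kfree(vecs);` hunk does.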
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 20:36 ` Jens Axboe
@ 2016-11-16 20:51 ` Jens Axboe
2016-11-17 3:44 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-16 20:51 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/16/2016 01:36 PM, Jens Axboe wrote:
> On 11/16/2016 01:02 PM, Jens Axboe wrote:
>> On 11/16/2016 10:16 AM, Christoph Hellwig wrote:
>>> I've pushed out a new version of my code that avoids atomic ops
>>> for the single bio case:
>>>
>>> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>>>
>>> I think we should take your extension of the existing
>>> __blkdev_direct_IO_simple to kmalloc up to BIO_MAX_PAGES and then use
>>> the above implementation for AIO and large synchronous I/O.
>>>
>>> I think I can also kill off blkdev_dio_pool by simply using bio_kmalloc
>>> as we're only doing the allocation at the beginning of the call,
>>> but I'd like to agree on an approach before spending too much time on
>>> it.
>>>
>>
>> I'm fine with this approach, but I would still REALLY like to see
>> blkdev_bio_end_io() split in two, one for sync and one for async. That
>> would be a lot cleaner, imho.
>>
>> Let me know how you want to do it. I added SYNC support for mine here,
>> the branch is here:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/dio
>
> Attaching two patches here that add the BIO_MAX_PAGES extension and the
> SYNC support for your branch. The latter is untested... You can add my
> Reviewed-by to:
>
> 7bed5be4f28cc86e3929c0c0fbba24ca0344f36c
> 57c2cf5c2595c0054855d7402d3796b5924fd05c
>
> I'd like to get this in for 4.10.
Tested the cache flush part, seems to work fine as well for both sync
and async O_DIRECT.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 20:51 ` Jens Axboe
@ 2016-11-17 3:44 ` Jens Axboe
0 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 3:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/16/2016 01:51 PM, Jens Axboe wrote:
> On 11/16/2016 01:36 PM, Jens Axboe wrote:
>> On 11/16/2016 01:02 PM, Jens Axboe wrote:
>>> On 11/16/2016 10:16 AM, Christoph Hellwig wrote:
>>>> I've pushed out a new version of my code that avoids atomic ops
>>>> for the single bio case:
>>>>
>>>> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>>>>
>>>>
>>>> I think we should take your extension of the existing
>>>> __blkdev_direct_IO_simple to kmalloc up to BIO_MAX_PAGES and then use
>>>> the above implementation for AIO and large synchronous I/O.
>>>>
>>>> I think I can also kill off blkdev_dio_pool by simply using bio_kmalloc
>>>> as we're only doing the allocation at the beginning of the call,
>>>> but I'd like to agree on an approach before spending too much time on
>>>> it.
>>>>
>>>
>>> I'm fine with this approach, but I would still REALLY like to see
>>> blkdev_bio_end_io() split in two, one for sync and one for async. That
>>> would be a lot cleaner, imho.
>>>
>>> Let me know how you want to do it. I added SYNC support for mine here,
>>> the branch is here:
>>>
>>> http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/dio
>>
>> Attaching two patches here that add the BIO_MAX_PAGES extension and the
>> SYNC support for your branch. The latter is untested... You can add my
>> Reviewed-by to:
>>
>> 7bed5be4f28cc86e3929c0c0fbba24ca0344f36c
>> 57c2cf5c2595c0054855d7402d3796b5924fd05c
>>
>> I'd like to get this in for 4.10.
>
> Tested the cache flush part, seems to work fine as well for both sync
> and async O_DIRECT.
Updated the last patch, since it had both a typo in the subject line and
some duplicated code after I added the dio_should_flush_cache() helper.
Top two patches here:
http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/hch-dio
which is on top of your dio branch.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 20:02 ` Jens Axboe
2016-11-16 20:36 ` Jens Axboe
@ 2016-11-17 4:43 ` Jens Axboe
2016-11-17 12:50 ` Christoph Hellwig
2016-11-17 12:46 ` Christoph Hellwig
2 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 4:43 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/16/2016 01:02 PM, Jens Axboe wrote:
> On 11/16/2016 10:16 AM, Christoph Hellwig wrote:
>> I've pushed out a new version of my code that avoids atomic ops
>> for the single bio case:
>>
>> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>>
>> I think we should take your extension of the existing
>> __blkdev_direct_IO_simple to kmalloc up to BIO_MAX_PAGES and then use
>> the above implementation for AIO and large synchronous I/O.
>>
>> I think I can also kill off blkdev_dio_pool by simply using bio_kmalloc
>> as we're only doing the allocation at the beginning of the call,
>> but I'd like to agree on an approach before spending too much time on
>> it.
>>
>
> I'm fine with this approach, but I would still REALLY like to see
> blkdev_bio_end_io() split in two, one for sync and one for async. That
> would be a lot cleaner, imho.
OK, I'm getting reasonably happy with it now:
http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/hch-dio
Sharing the FUA setting to avoid the manual flush between the two, and
removing the remnants of SYNC support from __blkdev_direct_IO().
I'll fire off some overnight testing now and leave it for the day.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-16 20:02 ` Jens Axboe
2016-11-16 20:36 ` Jens Axboe
2016-11-17 4:43 ` Jens Axboe
@ 2016-11-17 12:46 ` Christoph Hellwig
2016-11-17 13:26 ` Jens Axboe
2 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-17 12:46 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block
On Wed, Nov 16, 2016 at 01:02:05PM -0700, Jens Axboe wrote:
> I'm fine with this approach, but I would still REALLY like to see
> blkdev_bio_end_io() split in two, one for sync and one for async. That
> would be a lot cleaner, imho.
Do you still want that? It's not in your latest tree even with all
the other changes.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 4:43 ` Jens Axboe
@ 2016-11-17 12:50 ` Christoph Hellwig
2016-11-17 13:25 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-17 12:50 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block
On Wed, Nov 16, 2016 at 09:43:57PM -0700, Jens Axboe wrote:
> OK, I'm getting reasonably happy with it now:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/hch-dio
>
> Sharing the FUA setting to avoid the manual flush between the two, and
> removing the remnants of SYNC support from __blkdev_direct_IO().
>
> I'll fire off some overnight testing now and leave it for the day.
This looks reasonable, a couple questions and comments:
- why the 8 * BIO_MAX_PAGES in the call to iov_iter_npages,
that probably needs a comment
- I'll probably need to write up a proper changelog and fold
in at least the multi_bio patch.
- I'd just return the op from dio_bio_set_write_op instead of
passing the bio, that reads a lot easier.
- I should probably get rid of blkdev_dio_pool in favour
of bio_kmalloc
I can look into doing these fixups this afternoon and prepare a tree
for you.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 12:50 ` Christoph Hellwig
@ 2016-11-17 13:25 ` Jens Axboe
2016-11-17 15:03 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 13:25 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/17/2016 05:50 AM, Christoph Hellwig wrote:
> On Wed, Nov 16, 2016 at 09:43:57PM -0700, Jens Axboe wrote:
>> OK, I'm getting reasonably happy with it now:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/hch-dio
>>
>> Sharing the FUA setting to avoid the manual flush between the two, and
>> removing the remnants of SYNC support from __blkdev_direct_IO().
>>
>> I'll fire off some overnight testing now and leave it for the day.
>
> This looks reasonable, a couple questions and comments:
>
> - why the 8 * BIO_MAX_PAGES in the call to iov_iter_npages,
> that probably needs a comment
Looks like I forgot to push out the latest changes, that is gone. We're
just using + 1 now to see if we should use the multibio function or not.
> - I'll probably need to write up a proper changelog and fold
> in at least the multi_bio patch.
That's fine.
> - I'd just return the op from dio_bio_set_write_op instead of
> passing the bio, that reads a lot easier.
Fine with me.
> - I should probably get rid of blkdev_dio_pool in favour
> of bio_kmalloc
Agree, though not a requirement imho.
> I can look into doing these fixups this afternoon and prepare a tre
> for you.
Sounds good.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 12:46 ` Christoph Hellwig
@ 2016-11-17 13:26 ` Jens Axboe
0 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 13:26 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/17/2016 05:46 AM, Christoph Hellwig wrote:
> On Wed, Nov 16, 2016 at 01:02:05PM -0700, Jens Axboe wrote:
>> I'm fine with this approach, but I would still REALLY like to see
>> blkdev_bio_end_io() split in two, one for sync and one for async. That
>> would be a lot cleaner, imho.
>
> Do you still want that? It's not in your latest tree even with all
> the other changes.
I still think it'd clean it up, but not a merge stopper.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 13:25 ` Jens Axboe
@ 2016-11-17 15:03 ` Jens Axboe
2016-11-17 15:16 ` Christoph Hellwig
2016-11-17 20:17 ` Christoph Hellwig
0 siblings, 2 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 15:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/17/2016 06:25 AM, Jens Axboe wrote:
> On 11/17/2016 05:50 AM, Christoph Hellwig wrote:
>> On Wed, Nov 16, 2016 at 09:43:57PM -0700, Jens Axboe wrote:
>>> OK, I'm getting reasonably happy with it now:
>>>
>>> http://git.kernel.dk/cgit/linux-block/log/?h=for-4.10/hch-dio
>>>
>>> Sharing the FUA setting to avoid the manual flush between the two, and
>>> removing the remnants of SYNC support from __blkdev_direct_IO().
>>>
>>> I'll fire off some overnight testing now and leave it for the day.
>>
>> This looks reasonable, a couple questions and comments:
>>
>> - why the 8 * BIO_MAX_PAGES in the call to iov_iter_npages,
>> that probably needs a comment
>
> Looks like I forgot to push out the latest changes, that is gone. We're
> just using + 1 now to see if we should use the multibio function or not.
Just to clarify, the correct tip is:
commit 18eb8e94477813c190a2dfcd790c6707ab168377
Author: Jens Axboe <axboe@fb.com>
Date: Wed Nov 16 23:20:41 2016 -0700
block: save 8 bytes of space in struct blkdev_dio
That's what I ran in my overnight testing, and it looks fine from my
end.
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 15:03 ` Jens Axboe
@ 2016-11-17 15:16 ` Christoph Hellwig
2016-11-17 20:17 ` Christoph Hellwig
1 sibling, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-17 15:16 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block
On Thu, Nov 17, 2016 at 08:03:46AM -0700, Jens Axboe wrote:
> > Looks like I forgot to push out the latest changes, that is gone. We're
> > just using + 1 now to see if we should use the multibio function or not.
>
> Just to clarify, the correct tip is:
Re-checking for-4.10/dio now, everything seems to be fine. Not sure
how I managed to pick up an old head.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 15:03 ` Jens Axboe
2016-11-17 15:16 ` Christoph Hellwig
@ 2016-11-17 20:17 ` Christoph Hellwig
2016-11-17 20:30 ` Jens Axboe
1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-17 20:17 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block
Ok, the updated tree is here:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
This moves your O_SYNC patch earlier and makes it not directly
assign bi_opf.
Then comes the main direct I/O path with the various folds, and
a use after free fix ported from the iomap code in the completion
handler.
Last comes the I/O completion handler split - I don't think it's
worth it, but if you prefer it that way go ahead and fold it in.
I couldn't do the bio_kmalloc conversion as bio_kmalloc seems
to not like the front pad.
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 20:17 ` Christoph Hellwig
@ 2016-11-17 20:30 ` Jens Axboe
2016-11-17 21:04 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 20:30 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/17/2016 01:17 PM, Christoph Hellwig wrote:
> Ok, the updated tree is here:
>
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/new-dio
>
> This moves your O_SYNC patch earlier and makes it not directly
> assign bi_opf.
>
> Then comes the main direct I/O path with the various folds, and
> a use after free fix ported from the iomap code in the completion
> handler.
>
> Last comes the I/O completion handler split - I don't think it's
> worth it, but if you prefer it that way go ahead and fold it in.
The last patch seems to have some issues:
+ bool is_sync = is_sync = is_sync_kiocb(iocb);
and it's reverting part of the dio->is_sync that was part of my union
waiter/iocb change. So I'll leave that out for now, for those reasons.
> I couldn't do the bio_kmalloc conversion as bio_kmalloc seems
> to not like the front pad.
Yeah, bio_kmalloc() (the !bs path) doesn't use the front pad...
--
Jens Axboe
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 20:30 ` Jens Axboe
@ 2016-11-17 21:04 ` Christoph Hellwig
2016-11-17 21:06 ` Jens Axboe
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-11-17 21:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block
On Thu, Nov 17, 2016 at 01:30:12PM -0700, Jens Axboe wrote:
> The last patch seems to have some issues:
>
> + bool is_sync = is_sync = is_sync_kiocb(iocb);
>
> and it's reverting part of the dio->is_sync that was part of my union
> waiter/iocb change. So I'll leave that out for now, for those reasons.
dio->is_sync isn't needed once the completion handlers are split.
That being said, I wonder how the above statement even compiled;
there should obviously just be a single assignment.
Did I mention that I don't like the split anyway? :)
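(For the curious: the statement compiles because in C an identifier is in scope from the end of its own declarator, so the inner assignment legally targets the variable being declared; the outer initialization then just stores the same value again. A minimal userspace demonstration, with a hypothetical helper f() in place of is_sync_kiocb():)

```c
#include <stdbool.h>

static bool f(void) { return true; }

/* Models the questionable line from the patch:
 *     bool is_sync = is_sync = is_sync_kiocb(iocb);
 * The inner "is_sync = f()" assigns the just-declared variable and
 * yields that value, which the initializer then redundantly stores
 * again. Well-formed C, but clearly a leftover from editing; both
 * stores write the same value, so it behaves like a plain init. */
static bool doubled_init(void)
{
	bool is_sync = is_sync = f();
	return is_sync;
}
```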
* Re: [PATCHSET] Add support for simplified async direct-io
2016-11-17 21:04 ` Christoph Hellwig
@ 2016-11-17 21:06 ` Jens Axboe
0 siblings, 0 replies; 31+ messages in thread
From: Jens Axboe @ 2016-11-17 21:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
On 11/17/2016 02:04 PM, Christoph Hellwig wrote:
> On Thu, Nov 17, 2016 at 01:30:12PM -0700, Jens Axboe wrote:
>> The last patch seems to have some issues:
>>
>> + bool is_sync = is_sync = is_sync_kiocb(iocb);
>>
>> and it's reverting part of the dio->is_sync that was part of my union
>> waiter/iocb change. So I'll leave that out for now, for those reasons.
>
> dio->is_sync isn't needed once the completion handlers are split.
> That being said, I wonder how the above statement even compiled;
> there should obviously just be a single assignment.
>
> Did I mention that I don't like the split anyway? :)
Alright alright, let's drop the split ;-)
--
Jens Axboe
end of thread, other threads:[~2016-11-17 21:06 UTC | newest]
Thread overview: 31+ messages
-- links below jump to the message on this page --
2016-11-14 17:28 [PATCHSET] Add support for simplified async direct-io Jens Axboe
2016-11-14 17:28 ` [PATCH 1/2] block: support any sized IO for simplified bdev direct-io Jens Axboe
2016-11-14 17:28 ` [PATCH 2/2] block: add support for async simple direct-io for bdevs Jens Axboe
2016-11-14 17:37 ` [PATCHSET] Add support for simplified async direct-io Christoph Hellwig
2016-11-14 17:47 ` Jens Axboe
2016-11-14 18:00 ` Christoph Hellwig
2016-11-14 18:02 ` Jens Axboe
2016-11-14 18:05 ` Christoph Hellwig
2016-11-14 18:08 ` Jens Axboe
2016-11-14 18:11 ` Christoph Hellwig
2016-11-14 18:17 ` Jens Axboe
2016-11-15 13:58 ` Christoph Hellwig
2016-11-16 16:09 ` Jens Axboe
2016-11-16 16:31 ` Christoph Hellwig
2016-11-16 16:57 ` Jens Axboe
2016-11-16 17:16 ` Christoph Hellwig
2016-11-16 20:02 ` Jens Axboe
2016-11-16 20:36 ` Jens Axboe
2016-11-16 20:51 ` Jens Axboe
2016-11-17 3:44 ` Jens Axboe
2016-11-17 4:43 ` Jens Axboe
2016-11-17 12:50 ` Christoph Hellwig
2016-11-17 13:25 ` Jens Axboe
2016-11-17 15:03 ` Jens Axboe
2016-11-17 15:16 ` Christoph Hellwig
2016-11-17 20:17 ` Christoph Hellwig
2016-11-17 20:30 ` Jens Axboe
2016-11-17 21:04 ` Christoph Hellwig
2016-11-17 21:06 ` Jens Axboe
2016-11-17 12:46 ` Christoph Hellwig
2016-11-17 13:26 ` Jens Axboe