* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
[not found] ` <1430932106-17451-5-git-send-email-ming.lei@canonical.com>
@ 2015-05-07 7:24 ` Christoph Hellwig
2015-05-07 12:32 ` Ming Lei
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2015-05-07 7:24 UTC (permalink / raw)
To: Ming Lei
Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown,
Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro,
Tejun Heo, util-linux
> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
> mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
> lo->old_gfp_mask = mapping_gfp_mask(mapping);
> mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> +
> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> + if (lo->support_dio)
> + lo->use_aio = true;
> + else
> + lo->use_aio = false;
We need an explicit userspace op-in for this. For one direct I/O can't
handle sub-sector size access and people use the loop device as a
workaround for that. Second this doesn't give anyone seeing negative
results from aio a way to disable it easily.
It think the best way is to require a setup time flag, but enable it to
on in losetup versions that know about it.
> @@ -761,6 +826,13 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
> if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
> blk_queue_flush(lo->lo_queue, REQ_FLUSH);
>
> + /* use aio if it is possible */
> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> + if (lo->support_dio)
> + lo->use_aio = true;
> + else
> + lo->use_aio = false;
Please always factor out checks like this insted of duplicating them.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
2015-05-07 7:24 ` [PATCH v3 4/4] block: loop: support DIO & AIO Christoph Hellwig
@ 2015-05-07 12:32 ` Ming Lei
2015-05-07 22:20 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2015-05-07 12:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown,
Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo,
util-linux
On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
>> mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
>> lo->old_gfp_mask = mapping_gfp_mask(mapping);
>> mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
>> +
>> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
>> + if (lo->support_dio)
>> + lo->use_aio = true;
>> + else
>> + lo->use_aio = false;
>
> We need an explicit userspace op-in for this. For one direct I/O can't
Actually this patch is one simplified version, and my old version
has exported two sysfs files(use_aio, use_dio) which can control
if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
I think it isn't necessary because dio/aio works well from the tests,
and userspace shouldn't care if it is AIO or not if the performance
is good.
> handle sub-sector size access and people use the loop device as a
> workaround for that.
Yes, user can do that, could you explain a bit what the problem is?
> Second this doesn't give anyone seeing negative
> results from aio a way to disable it easily.
If there is the requirement, we can export the control interface via
sysfs files easily, but I suggest to not do that from the start for sake
of simplicity.
>
> It think the best way is to require a setup time flag, but enable it to
> on in losetup versions that know about it.
Could you explain a bit why we need losetup involved?
>
>> @@ -761,6 +826,13 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
>> if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
>> blk_queue_flush(lo->lo_queue, REQ_FLUSH);
>>
>> + /* use aio if it is possible */
>> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
>> + if (lo->support_dio)
>> + lo->use_aio = true;
>> + else
>> + lo->use_aio = false;
>
> Please always factor out checks like this insted of duplicating them.
OK, will do that.
Thanks,
Ming
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
2015-05-07 12:32 ` Ming Lei
@ 2015-05-07 22:20 ` Dave Chinner
2015-05-08 4:09 ` Ming Lei
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2015-05-07 22:20 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
Alexander Viro, Tejun Heo, util-linux
On Thu, May 07, 2015 at 08:32:39PM +0800, Ming Lei wrote:
> On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> >> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
> >> mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
> >> lo->old_gfp_mask = mapping_gfp_mask(mapping);
> >> mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> >> +
> >> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> >> + if (lo->support_dio)
> >> + lo->use_aio = true;
> >> + else
> >> + lo->use_aio = false;
> >
> > We need an explicit userspace op-in for this. For one direct I/O can't
>
> Actually this patch is one simplified version, and my old version
> has exported two sysfs files(use_aio, use_dio) which can control
> if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
> I think it isn't necessary because dio/aio works well from the tests,
> and userspace shouldn't care if it is AIO or not if the performance
> is good.
Performance won't always be good.
It looks to me that this has an unbound queue depth for AIO. What
throttles the amount of IO userspace can throw at an aio-enabled
loop device? If it's unbound, then userspace can throw gigabytes of
random write at the loop device and rather thanbe throttled at 128
outstanding IOs, the queue will just keep growing. That will have
adverse affects on dirty memory throttling, memory reclaim
algorithms, read and write latency, etc.
I suspect that if we are going to make the loop device use AIO, it
will needs a proper queue depth limit (i.e.
/sys/block/loop0/queue/nr_requests) enforced to avoid this sort of
problem...
> > handle sub-sector size access and people use the loop device as a
> > workaround for that.
>
> Yes, user can do that, could you explain a bit what the problem is?
I have a 4k sector backing device and a 512 byte sector filesystem
image. I can't do 512 byte direct IO to the filesystem image, so I
can't run tools that handle fs images in files using direct Io on
that file. Create a loop device with the filesystem image, and now I
can do 512 byte direct IO to the filesystem image, because all that
direct IO to the filesystem image is now buffered by the loop
device.
If the loop device does direct io in this situation, the backing
filesystem rejects direct IO from the loop device because it is not
sector (4k) sized/aligned. User now swears, shouts and curses you
from afar.
DIO and AIO behaviour needs to be configurable through losetup, and
most definitely not the default behaviour.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
2015-05-07 22:20 ` Dave Chinner
@ 2015-05-08 4:09 ` Ming Lei
2015-05-11 1:28 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2015-05-08 4:09 UTC (permalink / raw)
To: Dave Chinner
Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
Alexander Viro, Tejun Heo, util-linux
Hi Dave,
Thanks for your comment!
On Fri, May 8, 2015 at 6:20 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 07, 2015 at 08:32:39PM +0800, Ming Lei wrote:
>> On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> >> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
>> >> mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
>> >> lo->old_gfp_mask = mapping_gfp_mask(mapping);
>> >> mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
>> >> +
>> >> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
>> >> + if (lo->support_dio)
>> >> + lo->use_aio = true;
>> >> + else
>> >> + lo->use_aio = false;
>> >
>> > We need an explicit userspace op-in for this. For one direct I/O can't
>>
>> Actually this patch is one simplified version, and my old version
>> has exported two sysfs files(use_aio, use_dio) which can control
>> if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
>> I think it isn't necessary because dio/aio works well from the tests,
>> and userspace shouldn't care if it is AIO or not if the performance
>> is good.
>
> Performance won't always be good.
>
> It looks to me that this has an unbound queue depth for AIO. What
> throttles the amount of IO userspace can throw at an aio-enabled
> loop device? If it's unbound, then userspace can throw gigabytes of
> random write at the loop device and rather thanbe throttled at 128
> outstanding IOs, the queue will just keep growing. That will have
> adverse affects on dirty memory throttling, memory reclaim
> algorithms, read and write latency, etc.
>
> I suspect that if we are going to make the loop device use AIO, it
> will needs a proper queue depth limit (i.e.
> /sys/block/loop0/queue/nr_requests) enforced to avoid this sort of
> problem...
Loop has been converted to blk-mq, and the current queue depth is
128, so there isn't the problem you worried about, is there?
>> > handle sub-sector size access and people use the loop device as a
>> > workaround for that.
>>
>> Yes, user can do that, could you explain a bit what the problem is?
>
> I have a 4k sector backing device and a 512 byte sector filesystem
> image. I can't do 512 byte direct IO to the filesystem image, so I
> can't run tools that handle fs images in files using direct Io on
> that file. Create a loop device with the filesystem image, and now I
> can do 512 byte direct IO to the filesystem image, because all that
> direct IO to the filesystem image is now buffered by the loop
> device.
>
> If the loop device does direct io in this situation, the backing
> filesystem rejects direct IO from the loop device because it is not
> sector (4k) sized/aligned. User now swears, shouts and curses you
> from afar.
Yes, it is one problem, but looks it can be addressed by adding the
following in loop_set_fd():
if (inode->i_sb->s_bdev)
blk_queue_logical_block_size(lo->lo_queue,
bdev_io_min(inode->i_sb->s_bdev));
> DIO and AIO behaviour needs to be configurable through losetup, and
> most definitely not the default behaviour.
Could you share if there are other reasons for making it configurable via
losetup suppose the above issues can be fixed?
Thanks,
Ming Lei
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 4/4] block: loop: support DIO & AIO
2015-05-08 4:09 ` Ming Lei
@ 2015-05-11 1:28 ` Dave Chinner
0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2015-05-11 1:28 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp,
Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton,
Alexander Viro, Tejun Heo, util-linux
On Fri, May 08, 2015 at 12:09:04PM +0800, Ming Lei wrote:
> On Fri, May 8, 2015 at 6:20 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, May 07, 2015 at 08:32:39PM +0800, Ming Lei wrote:
> >> On Thu, May 7, 2015 at 3:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> >> >> @@ -441,6 +500,12 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
> >> >> mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
> >> >> lo->old_gfp_mask = mapping_gfp_mask(mapping);
> >> >> mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
> >> >> +
> >> >> + lo->support_dio = mapping->a_ops && mapping->a_ops->direct_IO;
> >> >> + if (lo->support_dio)
> >> >> + lo->use_aio = true;
> >> >> + else
> >> >> + lo->use_aio = false;
> >> >
> >> > We need an explicit userspace op-in for this. For one direct I/O can't
> >>
> >> Actually this patch is one simplified version, and my old version
> >> has exported two sysfs files(use_aio, use_dio) which can control
> >> if direct IO or AIO is used but only AIO is enabled if DIO is set. Finally
> >> I think it isn't necessary because dio/aio works well from the tests,
> >> and userspace shouldn't care if it is AIO or not if the performance
> >> is good.
> >
> > Performance won't always be good.
> >
> > It looks to me that this has an unbound queue depth for AIO. What
...
> Loop has been converted to blk-mq, and the current queue depth is
> 128, so there isn't the problem you worried about, is there?
Oh, OK, that's new. I didn't realise this conversion had already
been done.
> >> > handle sub-sector size access and people use the loop device as a
> >> > workaround for that.
> >>
> >> Yes, user can do that, could you explain a bit what the problem is?
> >
> > I have a 4k sector backing device and a 512 byte sector filesystem
> > image. I can't do 512 byte direct IO to the filesystem image, so I
> > can't run tools that handle fs images in files using direct Io on
> > that file. Create a loop device with the filesystem image, and now I
> > can do 512 byte direct IO to the filesystem image, because all that
> > direct IO to the filesystem image is now buffered by the loop
> > device.
> >
> > If the loop device does direct io in this situation, the backing
> > filesystem rejects direct IO from the loop device because it is not
> > sector (4k) sized/aligned. User now swears, shouts and curses you
> > from afar.
>
> Yes, it is one problem, but looks it can be addressed by adding the
> following in loop_set_fd():
>
> if (inode->i_sb->s_bdev)
> blk_queue_logical_block_size(lo->lo_queue,
> bdev_io_min(inode->i_sb->s_bdev));
How does that address the problem of not being able to do 512 byte
IOs to a loop device that does direct IO to a file hosted on a 4k
sector filesystem?
AFAICT, in that case bdev_io_min(inode->i_sb->s_bdev) will return 4k
because it comes from the backing filesystem, and so the minimum IO
size is still going to be 4k for such configurations...
> > DIO and AIO behaviour needs to be configurable through losetup, and
> > most definitely not the default behaviour.
>
> Could you share if there are other reasons for making it configurable via
> losetup suppose the above issues can be fixed?
It's not about "fixing issues" - losetup is the tool we use for
configuring loop device behaviour. If a user wants to crete a loop
device with a specific configuration, then that is the tool they
use. You add direct IO for the loop device, that needs to be
configurable because switching to direct IO will significantly
change performance for many workloads loops devices are used for.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-05-11 1:29 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1430932106-17451-1-git-send-email-ming.lei@canonical.com>
[not found] ` <1430932106-17451-5-git-send-email-ming.lei@canonical.com>
2015-05-07 7:24 ` [PATCH v3 4/4] block: loop: support DIO & AIO Christoph Hellwig
2015-05-07 12:32 ` Ming Lei
2015-05-07 22:20 ` Dave Chinner
2015-05-08 4:09 ` Ming Lei
2015-05-11 1:28 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox