* [RFC 0/2] Support for high-priority block device flag
@ 2016-05-12 17:43 Jon Derrick
2016-05-12 17:43 ` [RFC 1/2] block: allow other bd i_node flags when DAX is disabled Jon Derrick
2016-05-12 17:43 ` [RFC 2/2] block: Introduce S_HIPRI inode flag Jon Derrick
0 siblings, 2 replies; 8+ messages in thread
From: Jon Derrick @ 2016-05-12 17:43 UTC (permalink / raw)
To: linux-block
Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
This set intends to recreate block polling (now HIPRI) behavior that was
present in 4.5, where all IO on a queue could be selected to use
block polling behavior. The set allows a block device file to subscribe
to block polling on a block device granularity, rather than a per-queue
granularity.
There have been few-to-no arguments in support of the per-queue,
always-poll functionality that 4.5 offered, moreso in favor of enabling
polling on the entire block device (or indivual IOs as 4.6 offers).
I've been made aware that streams and ioprio may supercede this
functionality in the future, but I'm hoping this is an acceptable
stopgap in the meantime.
This set applies against 4.6-rc7 as well as Jens' for-4.7/core
(I've also been made aware that it may not apply cleanly to 4.7 after
several DAX changes)
Jon Derrick (2):
block: allow other bd i_node flags when DAX is disabled
block: Introduce S_HIPRI inode flag
block/ioctl.c | 33 +++++++++++++++++++++++++++++++++
fs/block_dev.c | 5 ++++-
include/linux/fs.h | 2 ++
include/uapi/linux/fs.h | 2 ++
4 files changed, 41 insertions(+), 1 deletion(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC 1/2] block: allow other bd i_node flags when DAX is disabled
2016-05-12 17:43 [RFC 0/2] Support for high-priority block device flag Jon Derrick
@ 2016-05-12 17:43 ` Jon Derrick
2016-05-13 9:18 ` Carlos Maiolino
2016-05-13 13:25 ` Dan Williams
2016-05-12 17:43 ` [RFC 2/2] block: Introduce S_HIPRI inode flag Jon Derrick
1 sibling, 2 replies; 8+ messages in thread
From: Jon Derrick @ 2016-05-12 17:43 UTC (permalink / raw)
To: linux-block
Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
When DAX is not compiled into the kernel or the device does not support
direct-access, the block device file's inode flags are fully cleared.
This patch changes it to only clear the S_DAX flag when DAX is disabled.
This reverts to i_flags behavior prior to
bbab37ddc20bae4709bca8745c128c4f46fe63c5
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
fs/block_dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 20a2c02..d4fa725 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
bdev->bd_inode->i_flags = S_DAX;
else
- bdev->bd_inode->i_flags = 0;
+ bdev->bd_inode->i_flags &= ~S_DAX;
if (!partno) {
ret = -ENXIO;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC 2/2] block: Introduce S_HIPRI inode flag
2016-05-12 17:43 [RFC 0/2] Support for high-priority block device flag Jon Derrick
2016-05-12 17:43 ` [RFC 1/2] block: allow other bd i_node flags when DAX is disabled Jon Derrick
@ 2016-05-12 17:43 ` Jon Derrick
1 sibling, 0 replies; 8+ messages in thread
From: Jon Derrick @ 2016-05-12 17:43 UTC (permalink / raw)
To: linux-block
Cc: Jon Derrick, Jens Axboe, Alexander Viro, linux-fsdevel,
Dan Williams, Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
S_HIPRI is a hint that indicates the file (currently only block devices)
is a high priority file. This hint allows direct-io to the block device
to poll for completions if polling is available to the block device.
The motivation for this patch comes from tiered caching solutions. A
user may wish to have low-latency block devices act as a cache for
higher-latency storage media.
With the introduction of block polling, polling could be enabled on a
queue of a block device. The preadv2/pwritev2 sets allowed a user to
specify per-io polling, but removed the ability to poll per-queue.
Instead of having a user modify their software to use preadv2/pwritev2,
this patch allows a user to set S_HIPRI on a block device file to request
all direct-io for this file to be polled.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
block/ioctl.c | 33 +++++++++++++++++++++++++++++++++
fs/block_dev.c | 3 +++
include/linux/fs.h | 2 ++
include/uapi/linux/fs.h | 2 ++
4 files changed, 40 insertions(+)
diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..5c7f1bd 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -520,6 +520,35 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
return ret;
}
+/* set the hipri flag */
+static int blkdev_hipriset(struct block_device *bdev, fmode_t mode,
+ int __user *argp)
+{
+ int n;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
+ if (!argp)
+ return -EINVAL;
+ if (get_user(n, argp))
+ return -EFAULT;
+
+ if (!(mode & FMODE_EXCL)) {
+ bdgrab(bdev);
+ if (blkdev_get(bdev, mode | FMODE_EXCL, &bdev) < 0)
+ return -EBUSY;
+ }
+
+ if (n)
+ bdev->bd_inode->i_flags |= S_HIPRI;
+ else
+ bdev->bd_inode->i_flags &= ~S_HIPRI;
+
+ if (!(mode & FMODE_EXCL))
+ blkdev_put(bdev, mode | FMODE_EXCL);
+ return 0;
+}
+
/*
* always keep this in sync with compat_blkdev_ioctl()
*/
@@ -601,6 +630,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
case BLKDAXGET:
return put_int(arg, !!(bdev->bd_inode->i_flags & S_DAX));
break;
+ case BLKHIPRISET:
+ return blkdev_hipriset(bdev, mode, argp);
+ case BLKHIPRIGET:
+ return put_int(arg, !!(bdev->bd_inode->i_flags & S_HIPRI));
case IOC_PR_REGISTER:
return blkdev_pr_register(bdev, argp);
case IOC_PR_RESERVE:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index d4fa725..6fa81c0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -170,6 +170,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
if (IS_DAX(inode))
return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
NULL, DIO_SKIP_DIO_COUNT);
+
+ if (IS_HIPRI(inode))
+ iocb->ki_flags |= IOCB_HIPRI;
return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
blkdev_get_block, NULL, NULL,
DIO_SKIP_DIO_COUNT);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 70e61b5..8ae39ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1788,6 +1788,7 @@ struct super_operations {
#else
#define S_DAX 0 /* Make all the DAX code disappear */
#endif
+#define S_HIPRI 16384 /* IO for this file has high priority */
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1826,6 +1827,7 @@ struct super_operations {
#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
#define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
#define IS_DAX(inode) ((inode)->i_flags & S_DAX)
+#define IS_HIPRI(inode) ((inode)->i_flags & S_HIPRI)
#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index a079d50..d6e262c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -223,6 +223,8 @@ struct fsxattr {
#define BLKROTATIONAL _IO(0x12,126)
#define BLKZEROOUT _IO(0x12,127)
#define BLKDAXGET _IO(0x12,129)
+#define BLKHIPRISET _IOW(0x12,130,int)
+#define BLKHIPRIGET _IO(0x12,131)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC 1/2] block: allow other bd i_node flags when DAX is disabled
2016-05-12 17:43 ` [RFC 1/2] block: allow other bd i_node flags when DAX is disabled Jon Derrick
@ 2016-05-13 9:18 ` Carlos Maiolino
2016-05-13 13:25 ` Dan Williams
1 sibling, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2016-05-13 9:18 UTC (permalink / raw)
To: linux-fsdevel
On Thu, May 12, 2016 at 11:43:05AM -0600, Jon Derrick wrote:
> When DAX is not compiled into the kernel or the device does not support
> direct-access, the block device file's inode flags are fully cleared.
> This patch changes it to only clear the S_DAX flag when DAX is disabled.
>
This patch makes sense to me, despite of the acceptance or not of the S_HIPRI flag, you can add:
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> This reverts to i_flags behavior prior to
> bbab37ddc20bae4709bca8745c128c4f46fe63c5
>
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> ---
> fs/block_dev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 20a2c02..d4fa725 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
> if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
> bdev->bd_inode->i_flags = S_DAX;
> else
> - bdev->bd_inode->i_flags = 0;
> + bdev->bd_inode->i_flags &= ~S_DAX;
>
> if (!partno) {
> ret = -ENXIO;
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Carlos
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC 1/2] block: allow other bd i_node flags when DAX is disabled
2016-05-12 17:43 ` [RFC 1/2] block: allow other bd i_node flags when DAX is disabled Jon Derrick
2016-05-13 9:18 ` Carlos Maiolino
@ 2016-05-13 13:25 ` Dan Williams
2016-05-13 17:33 ` Jeff Moyer
1 sibling, 1 reply; 8+ messages in thread
From: Dan Williams @ 2016-05-13 13:25 UTC (permalink / raw)
To: Jon Derrick
Cc: linux-block, Jens Axboe, Alexander Viro, linux-fsdevel,
Jeff Moyer, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
On Thu, May 12, 2016 at 10:43 AM, Jon Derrick
<jonathan.derrick@intel.com> wrote:
> When DAX is not compiled into the kernel or the device does not support
> direct-access, the block device file's inode flags are fully cleared.
> This patch changes it to only clear the S_DAX flag when DAX is disabled.
>
> This reverts to i_flags behavior prior to
> bbab37ddc20bae4709bca8745c128c4f46fe63c5
>
> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> ---
> fs/block_dev.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 20a2c02..d4fa725 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
> if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
> bdev->bd_inode->i_flags = S_DAX;
> else
> - bdev->bd_inode->i_flags = 0;
> + bdev->bd_inode->i_flags &= ~S_DAX;
Setting S_DAX is atomic, but the above change makes it a non-atomic
read-modify-write. Do we need exclusion / locking in this path?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC 1/2] block: allow other bd i_node flags when DAX is disabled
2016-05-13 13:25 ` Dan Williams
@ 2016-05-13 17:33 ` Jeff Moyer
2016-05-13 17:53 ` Dan Williams
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Moyer @ 2016-05-13 17:33 UTC (permalink / raw)
To: Dan Williams
Cc: Jon Derrick, linux-block, Jens Axboe, Alexander Viro,
linux-fsdevel, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
Dan Williams <dan.j.williams@intel.com> writes:
> On Thu, May 12, 2016 at 10:43 AM, Jon Derrick
> <jonathan.derrick@intel.com> wrote:
>> When DAX is not compiled into the kernel or the device does not support
>> direct-access, the block device file's inode flags are fully cleared.
>> This patch changes it to only clear the S_DAX flag when DAX is disabled.
>>
>> This reverts to i_flags behavior prior to
>> bbab37ddc20bae4709bca8745c128c4f46fe63c5
>>
>> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
>> ---
>> fs/block_dev.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 20a2c02..d4fa725 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
>> if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
>> bdev->bd_inode->i_flags = S_DAX;
>> else
>> - bdev->bd_inode->i_flags = 0;
>> + bdev->bd_inode->i_flags &= ~S_DAX;
>
> Setting S_DAX is atomic, but the above change makes it a non-atomic
> read-modify-write. Do we need exclusion / locking in this path?
First, I don't see how you'd ever get a read-modify-write here, as S_DAX
surely won't ever be set if direct_access isn't supported. Second,
there is existing code that already does this very thing. See further
down in __blkdev_get:
if (!ret) {
bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
if (!blkdev_dax_capable(bdev))
bdev->bd_inode->i_flags &= ~S_DAX;
}
What relies on S_DAX being set or cleared atomically?
Cheers,
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC 1/2] block: allow other bd i_node flags when DAX is disabled
2016-05-13 17:33 ` Jeff Moyer
@ 2016-05-13 17:53 ` Dan Williams
2016-05-13 18:01 ` Jon Derrick
0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2016-05-13 17:53 UTC (permalink / raw)
To: Jeff Moyer
Cc: Jon Derrick, linux-block, Jens Axboe, Alexander Viro,
linux-fsdevel, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
On Fri, May 13, 2016 at 10:33 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
>
>> On Thu, May 12, 2016 at 10:43 AM, Jon Derrick
>> <jonathan.derrick@intel.com> wrote:
>>> When DAX is not compiled into the kernel or the device does not support
>>> direct-access, the block device file's inode flags are fully cleared.
>>> This patch changes it to only clear the S_DAX flag when DAX is disabled.
>>>
>>> This reverts to i_flags behavior prior to
>>> bbab37ddc20bae4709bca8745c128c4f46fe63c5
>>>
>>> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
>>> ---
>>> fs/block_dev.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>>> index 20a2c02..d4fa725 100644
>>> --- a/fs/block_dev.c
>>> +++ b/fs/block_dev.c
>>> @@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
>>> if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
>>> bdev->bd_inode->i_flags = S_DAX;
>>> else
>>> - bdev->bd_inode->i_flags = 0;
>>> + bdev->bd_inode->i_flags &= ~S_DAX;
>>
>> Setting S_DAX is atomic, but the above change makes it a non-atomic
>> read-modify-write. Do we need exclusion / locking in this path?
>
> First, I don't see how you'd ever get a read-modify-write here, as S_DAX
> surely won't ever be set if direct_access isn't supported. Second,
> there is existing code that already does this very thing. See further
> down in __blkdev_get:
>
> if (!ret) {
> bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
> if (!blkdev_dax_capable(bdev))
> bdev->bd_inode->i_flags &= ~S_DAX;
> }
>
> What relies on S_DAX being set or cleared atomically?
So, when I wrote the above "&= ~S_DAX" I was worried about the
non-atomic update with respect to the BLKDAXSET ioctl, but we killed
BLKDAXSET so the worry went away. Now an inode flag setting ioctl is
back for the S_HIPRI case. Can that collide with these other flag
touches?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC 1/2] block: allow other bd i_node flags when DAX is disabled
2016-05-13 17:53 ` Dan Williams
@ 2016-05-13 18:01 ` Jon Derrick
0 siblings, 0 replies; 8+ messages in thread
From: Jon Derrick @ 2016-05-13 18:01 UTC (permalink / raw)
To: Dan Williams
Cc: Jeff Moyer, linux-block, Jens Axboe, Alexander Viro,
linux-fsdevel, Stephen Bates, Keith Busch, linux-nvme,
Christoph Hellwig
On Fri, May 13, 2016 at 10:53:37AM -0700, Dan Williams wrote:
> On Fri, May 13, 2016 at 10:33 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Dan Williams <dan.j.williams@intel.com> writes:
> >
> >> On Thu, May 12, 2016 at 10:43 AM, Jon Derrick
> >> <jonathan.derrick@intel.com> wrote:
> >>> When DAX is not compiled into the kernel or the device does not support
> >>> direct-access, the block device file's inode flags are fully cleared.
> >>> This patch changes it to only clear the S_DAX flag when DAX is disabled.
> >>>
> >>> This reverts to i_flags behavior prior to
> >>> bbab37ddc20bae4709bca8745c128c4f46fe63c5
> >>>
> >>> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> >>> ---
> >>> fs/block_dev.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/fs/block_dev.c b/fs/block_dev.c
> >>> index 20a2c02..d4fa725 100644
> >>> --- a/fs/block_dev.c
> >>> +++ b/fs/block_dev.c
> >>> @@ -1208,7 +1208,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
> >>> if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
> >>> bdev->bd_inode->i_flags = S_DAX;
> >>> else
> >>> - bdev->bd_inode->i_flags = 0;
> >>> + bdev->bd_inode->i_flags &= ~S_DAX;
> >>
> >> Setting S_DAX is atomic, but the above change makes it a non-atomic
> >> read-modify-write. Do we need exclusion / locking in this path?
> >
> > First, I don't see how you'd ever get a read-modify-write here, as S_DAX
> > surely won't ever be set if direct_access isn't supported. Second,
> > there is existing code that already does this very thing. See further
> > down in __blkdev_get:
> >
> > if (!ret) {
> > bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
> > if (!blkdev_dax_capable(bdev))
> > bdev->bd_inode->i_flags &= ~S_DAX;
> > }
> >
> > What relies on S_DAX being set or cleared atomically?
>
> So, when I wrote the above "&= ~S_DAX" I was worried about the
> non-atomic update with respect to the BLKDAXSET ioctl, but we killed
> BLKDAXSET so the worry went away. Now an inode flag setting ioctl is
> back for the S_HIPRI case. Can that collide with these other flag
> touches?
The S_DAX case is unclear to me, but the new ioctl certainly does need a i_mutex lock and to use inode_set_flags.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-05-13 18:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-12 17:43 [RFC 0/2] Support for high-priority block device flag Jon Derrick
2016-05-12 17:43 ` [RFC 1/2] block: allow other bd i_node flags when DAX is disabled Jon Derrick
2016-05-13 9:18 ` Carlos Maiolino
2016-05-13 13:25 ` Dan Williams
2016-05-13 17:33 ` Jeff Moyer
2016-05-13 17:53 ` Dan Williams
2016-05-13 18:01 ` Jon Derrick
2016-05-12 17:43 ` [RFC 2/2] block: Introduce S_HIPRI inode flag Jon Derrick
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).