From: Jon Derrick <jonathan.derrick@intel.com>
To: linux-block@vger.kernel.org
Cc: Jon Derrick <jonathan.derrick@intel.com>,
	"Jens Axboe" <axboe@fb.com>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jeff Moyer" <jmoyer@redhat.com>,
	"Stephen Bates" <stephen.bates@microsemi.com>,
	"Keith Busch" <keith.busch@intel.com>,
	linux-nvme@lists.infradead.org,
	"Christoph Hellwig" <hch@infradead.org>
Subject: [RFC 2/2] block: Introduce S_HIPRI inode flag
Date: Thu, 12 May 2016 11:43:06 -0600
Message-ID: <1463074986-3070-3-git-send-email-jonathan.derrick@intel.com>
In-Reply-To: <1463074986-3070-1-git-send-email-jonathan.derrick@intel.com>

S_HIPRI is a hint that marks a file (currently only block devices) as
high priority. When it is set, direct-io to the block device polls for
completions if the block device supports polling.

The motivation for this patch comes from tiered caching solutions: a
user may wish to use low-latency block devices as a cache for
higher-latency storage media.

With the introduction of block polling, polling could be enabled per
queue of a block device. The preadv2/pwritev2 patches then allowed a
user to request polling per I/O, but removed the ability to poll per
queue. Instead of requiring users to modify their software to use
preadv2/pwritev2, this patch allows a user to set S_HIPRI on a block
device file to request that all direct-io to that file be polled.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
---
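For comparison, here is what the per-I/O route from the commit message asks of every application. The program must open the device with O_DIRECT and pass RWF_HIPRI on each call.

This is a minimal sketch under a few assumptions:
- glibc 2.26 or later, which provides the preadv2() wrapper and RWF_HIPRI in <sys/uio.h> under _GNU_SOURCE.
- A 4096-byte logical block size.
- read_polled() is a hypothetical helper name.

```c
#define _GNU_SOURCE		/* O_DIRECT, preadv2(), RWF_HIPRI */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read 'len' bytes from offset 0 of 'path' using
 * O_DIRECT and ask the kernel to poll for this one I/O (RWF_HIPRI).
 * 'len' must be a multiple of the logical block size (4096 assumed).
 * Returns the number of bytes read, or a negative errno.
 */
static ssize_t read_polled(const char *path, size_t len)
{
	struct iovec iov;
	void *buf;
	ssize_t ret;
	int fd;

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0)
		return -errno;

	/* O_DIRECT needs a suitably aligned user buffer. */
	if (posix_memalign(&buf, 4096, len) != 0) {
		close(fd);
		return -ENOMEM;
	}

	iov.iov_base = buf;
	iov.iov_len = len;
	ret = preadv2(fd, &iov, 1, 0, RWF_HIPRI);	/* per-I/O polling hint */
	if (ret < 0)
		ret = -errno;	/* capture errno before free()/close() */

	free(buf);
	close(fd);
	return ret;
}
```

Every read or write path in the application needs this change. A per-file flag set once by an administrator avoids exactly that.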
 block/ioctl.c           | 33 +++++++++++++++++++++++++++++++++
 fs/block_dev.c          |  3 +++
 include/linux/fs.h      |  2 ++
 include/uapi/linux/fs.h |  2 ++
 4 files changed, 40 insertions(+)
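With this patch applied, a tool can instead toggle the flag once through the two new ioctls. Below is a minimal userspace sketch.

Assumptions and caveats:
- bdev_set_hipri() and bdev_get_hipri() are hypothetical helper names.
- BLKHIPRISET and BLKHIPRIGET exist only in this RFC, so the sketch defines them locally with the values from the include/uapi/linux/fs.h hunk.
- Setting the flag requires CAP_SYS_ADMIN (otherwise blkdev_hipriset() returns -EACCES).
- blkdev_hipriset() returns -EBUSY if it cannot claim the device exclusively, for example while another holder such as a mounted filesystem has it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values proposed by this RFC (include/uapi/linux/fs.h hunk); they are
 * not in mainline headers, so define them here if missing. */
#ifndef BLKHIPRISET
#define BLKHIPRISET _IOW(0x12, 130, int)
#endif
#ifndef BLKHIPRIGET
#define BLKHIPRIGET _IO(0x12, 131)
#endif

/* Hypothetical helper: set (on != 0) or clear S_HIPRI on a block device.
 * Returns 0 on success or a negative errno (e.g. -EACCES without
 * CAP_SYS_ADMIN, -EBUSY if the device cannot be claimed exclusively). */
static int bdev_set_hipri(const char *path, int on)
{
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	/* blkdev_hipriset() reads an int through the user pointer. */
	ret = ioctl(fd, BLKHIPRISET, &on) < 0 ? -errno : 0;
	close(fd);
	return ret;
}

/* Hypothetical helper: returns 1 if S_HIPRI is set, 0 if not, or a
 * negative errno. BLKHIPRIGET writes an int through the pointer. */
static int bdev_get_hipri(const char *path)
{
	int fd, ret, val = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	ret = ioctl(fd, BLKHIPRIGET, &val) < 0 ? -errno : val;
	close(fd);
	return ret;
}
```

For example, calling bdev_set_hipri() with 1 on an NVMe namespace node and then calling bdev_get_hipri() on the same path should report 1. All later O_DIRECT I/O to that device is then submitted with IOCB_HIPRI, with no change to the application.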
diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..5c7f1bd 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -520,6 +520,35 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
 	return ret;
 }
 
+/* set the hipri flag */
+static int blkdev_hipriset(struct block_device *bdev, fmode_t mode,
+		int __user *argp)
+{
+	int n;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+	if (!argp)
+		return -EINVAL;
+	if (get_user(n, argp))
+		return -EFAULT;
+
+	if (!(mode & FMODE_EXCL)) {
+		bdgrab(bdev);
+		if (blkdev_get(bdev, mode | FMODE_EXCL, &bdev) < 0)
+			return -EBUSY;
+	}
+
+	if (n)
+		bdev->bd_inode->i_flags |= S_HIPRI;
+	else
+		bdev->bd_inode->i_flags &= ~S_HIPRI;
+
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
+	return 0;
+}
+
 /*
  * always keep this in sync with compat_blkdev_ioctl()
  */
@@ -601,6 +630,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKDAXGET:
 		return put_int(arg, !!(bdev->bd_inode->i_flags & S_DAX));
 		break;
+	case BLKHIPRISET:
+		return blkdev_hipriset(bdev, mode, argp);
+	case BLKHIPRIGET:
+		return put_int(arg, !!(bdev->bd_inode->i_flags & S_HIPRI));
 	case IOC_PR_REGISTER:
 		return blkdev_pr_register(bdev, argp);
 	case IOC_PR_RESERVE:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index d4fa725..6fa81c0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -170,6 +170,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
 	if (IS_DAX(inode))
 		return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
 				NULL, DIO_SKIP_DIO_COUNT);
+
+	if (IS_HIPRI(inode))
+		iocb->ki_flags |= IOCB_HIPRI;
 	return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
 				    blkdev_get_block, NULL, NULL,
 				    DIO_SKIP_DIO_COUNT);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 70e61b5..8ae39ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1788,6 +1788,7 @@ struct super_operations {
 #else
 #define S_DAX		0	/* Make all the DAX code disappear */
 #endif
+#define S_HIPRI		16384	/* IO for this file has high priority */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1826,6 +1827,7 @@ struct super_operations {
 #define IS_AUTOMOUNT(inode)	((inode)->i_flags & S_AUTOMOUNT)
 #define IS_NOSEC(inode)		((inode)->i_flags & S_NOSEC)
 #define IS_DAX(inode)		((inode)->i_flags & S_DAX)
+#define IS_HIPRI(inode)		((inode)->i_flags & S_HIPRI)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index a079d50..d6e262c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -223,6 +223,8 @@ struct fsxattr {
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
 #define BLKDAXGET _IO(0x12,129)
+#define BLKHIPRISET _IOW(0x12,130,int)
+#define BLKHIPRIGET _IO(0x12,131)
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
--
1.8.3.1
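On the kernel side, the whole effect of the patch is a single bit handed from the block device inode to each direct-I/O kiocb. The following userspace model sketches that hand-off.

Assumptions:
- struct model_inode, struct model_kiocb and the model_* functions are hypothetical stand-ins.
- S_HIPRI uses the value from the include/linux/fs.h hunk.
- The IOCB_HIPRI value below is chosen only for illustration.

```c
/* Hypothetical stand-ins: only the fields this patch touches. */
#define S_HIPRI    16384	/* value from the include/linux/fs.h hunk */
#define IOCB_HIPRI (1 << 3)	/* assumed value, for illustration only */

#define IS_HIPRI(inode) ((inode)->i_flags & S_HIPRI)

struct model_inode {
	unsigned int i_flags;
};

struct model_kiocb {
	int ki_flags;
};

/* Mirrors the set/clear branch of blkdev_hipriset(). */
static void model_hipriset(struct model_inode *inode, int n)
{
	if (n)
		inode->i_flags |= S_HIPRI;
	else
		inode->i_flags &= ~S_HIPRI;
}

/* Mirrors BLKHIPRIGET: report the flag as 0 or 1. */
static int model_hipriget(const struct model_inode *inode)
{
	return !!IS_HIPRI(inode);
}

/* Mirrors the blkdev_direct_IO() hunk: a HIPRI inode ORs IOCB_HIPRI
 * into each direct-I/O kiocb; it never clears the bit. */
static void model_direct_io(const struct model_inode *inode,
			    struct model_kiocb *iocb)
{
	if (IS_HIPRI(inode))
		iocb->ki_flags |= IOCB_HIPRI;
}
```

The model makes one design consequence visible. blkdev_direct_IO() only ORs IOCB_HIPRI in, so clearing S_HIPRI turns off the per-file default. It never suppresses polling that an application requested per I/O with RWF_HIPRI.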