* [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc()
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
@ 2023-06-30 15:25 ` David Howells
2023-07-06 15:21 ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
` (9 subsequent siblings)
10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
Fix references to iov_iter_get_pages/pages_alloc() in comments to refer to
the *2 interfaces instead.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/ceph/file.c | 4 ++--
include/linux/mm_types.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index b1925232dc08..3bb27b9ce751 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -75,7 +75,7 @@ static __le32 ceph_flags_sys2wire(u32 flags)
*/
/*
- * How many pages to get in one call to iov_iter_get_pages(). This
+ * How many pages to get in one call to iov_iter_get_pages2(). This
* determines the size of the on-stack array used as a buffer.
*/
#define ITER_GET_BVECS_PAGES 64
@@ -115,7 +115,7 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
}
/*
- * iov_iter_get_pages() only considers one iov_iter segment, no matter
+ * iov_iter_get_pages2() only considers one iov_iter segment, no matter
* what maxsize or maxpages are given. For ITER_BVEC that is a single
* page.
*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de10fc797c8e..f49029c943b0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1249,7 +1249,7 @@ enum {
/*
* FOLL_LONGTERM indicates that the page will be held for an indefinite
* time period _often_ under userspace control. This is in contrast to
- * iov_iter_get_pages(), whose usages are transient.
+ * iov_iter_get_pages2(), whose usages are transient.
*/
FOLL_LONGTERM = 1 << 8,
/* split huge pmd before returning */
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc()
2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
@ 2023-07-06 15:21 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:21 UTC (permalink / raw)
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christian Brauner
On Fri, Jun 30, 2023 at 04:25:14PM +0100, David Howells wrote:
> /*
> * FOLL_LONGTERM indicates that the page will be held for an indefinite
> * time period _often_ under userspace control. This is in contrast to
> - * iov_iter_get_pages(), whose usages are transient.
> + * iov_iter_get_pages2(), whose usages are transient.
> */
I don't think this should refer to iov_iter_get_pages* at all. The
flag should document that actual get/pin_user interfaces and not refer
to a (deprecated) interface built on top of it.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
@ 2023-06-30 15:25 ` David Howells
2023-07-06 15:22 ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
` (8 subsequent siblings)
10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
operation to the VFS, but it isn't set by, say, the write() system call.
Fix this by adding an extra argument to init_sync_kiocb() to indicate the
direction and setting that to READ or WRITE, which will cause IOCB_WRITE to
be set as appropriate.
Whilst we're at it, rename init_sync_kiocb() to init_kiocb().
This will allow drivers to use IOCB_WRITE instead of the iterator data
source to determine the I/O direction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/btrfs/ioctl.c | 4 ++--
fs/read_write.c | 10 +++++-----
fs/seq_file.c | 2 +-
fs/splice.c | 2 +-
include/linux/fs.h | 6 +++++-
mm/filemap.c | 2 +-
mm/page_io.c | 4 ++--
7 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..15870337dd26 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4422,7 +4422,7 @@ static int btrfs_ioctl_encoded_read(struct file *file, void __user *argp,
if (ret < 0)
goto out_iov;
- init_sync_kiocb(&kiocb, file);
+ init_kiocb(&kiocb, file, READ);
kiocb.ki_pos = pos;
ret = btrfs_encoded_read(&kiocb, &iter, &args);
@@ -4523,7 +4523,7 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool
if (ret < 0)
goto out_end_write;
- init_sync_kiocb(&kiocb, file);
+ init_kiocb(&kiocb, file, WRITE);
ret = kiocb_set_rw_flags(&kiocb, 0);
if (ret)
goto out_end_write;
diff --git a/fs/read_write.c b/fs/read_write.c
index b07de77ef126..6fe517047095 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -382,7 +382,7 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
struct iov_iter iter;
ssize_t ret;
- init_sync_kiocb(&kiocb, filp);
+ init_kiocb(&kiocb, filp, READ);
kiocb.ki_pos = (ppos ? *ppos : 0);
iov_iter_ubuf(&iter, ITER_DEST, buf, len);
@@ -422,7 +422,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
if (unlikely(!file->f_op->read_iter || file->f_op->read))
return warn_unsupported(file, "read");
- init_sync_kiocb(&kiocb, file);
+ init_kiocb(&kiocb, file, READ);
kiocb.ki_pos = pos ? *pos : 0;
iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
ret = file->f_op->read_iter(&kiocb, &iter);
@@ -484,7 +484,7 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
struct iov_iter iter;
ssize_t ret;
- init_sync_kiocb(&kiocb, filp);
+ init_kiocb(&kiocb, filp, WRITE);
kiocb.ki_pos = (ppos ? *ppos : 0);
iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len);
@@ -512,7 +512,7 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po
if (unlikely(!file->f_op->write_iter || file->f_op->write))
return warn_unsupported(file, "write");
- init_sync_kiocb(&kiocb, file);
+ init_kiocb(&kiocb, file, WRITE);
kiocb.ki_pos = pos ? *pos : 0;
ret = file->f_op->write_iter(&kiocb, from);
if (ret > 0) {
@@ -723,7 +723,7 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
struct kiocb kiocb;
ssize_t ret;
- init_sync_kiocb(&kiocb, filp);
+ init_kiocb(&kiocb, filp, type);
ret = kiocb_set_rw_flags(&kiocb, flags);
if (ret)
return ret;
diff --git a/fs/seq_file.c b/fs/seq_file.c
index f5fdaf3b1572..1ee6ffc630da 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -155,7 +155,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
struct iov_iter iter;
ssize_t ret;
- init_sync_kiocb(&kiocb, file);
+ init_kiocb(&kiocb, file, READ);
iov_iter_init(&iter, ITER_DEST, &iov, 1, size);
kiocb.ki_pos = *ppos;
diff --git a/fs/splice.c b/fs/splice.c
index 004eb1c4ce31..867357ebb2c3 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -362,7 +362,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
/* Do the I/O */
iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
- init_sync_kiocb(&kiocb, in);
+ init_kiocb(&kiocb, in, READ);
kiocb.ki_pos = *ppos;
ret = call_read_iter(in, &kiocb, &to);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d4b67bdeb53e..466eba253502 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2017,13 +2017,17 @@ static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
!vfsgid_valid(i_gid_into_vfsgid(idmap, inode));
}
-static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
+static inline void init_kiocb(struct kiocb *kiocb, struct file *filp,
+ unsigned int rw)
{
*kiocb = (struct kiocb) {
.ki_filp = filp,
.ki_flags = filp->f_iocb_flags,
.ki_ioprio = get_current_ioprio(),
};
+
+ if (rw == WRITE)
+ kiocb->ki_flags |= IOCB_WRITE;
}
static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..cd763122d2a2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2905,7 +2905,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
if (unlikely(*ppos >= in->f_mapping->host->i_sb->s_maxbytes))
return 0;
- init_sync_kiocb(&iocb, in);
+ init_kiocb(&iocb, in, READ);
iocb.ki_pos = *ppos;
/* Work out how much data we can actually add into the pipe */
diff --git a/mm/page_io.c b/mm/page_io.c
index 684cd3c7b59b..85cbadaf7395 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -312,7 +312,7 @@ static void swap_writepage_fs(struct page *page, struct writeback_control *wbc)
}
if (!sio) {
sio = mempool_alloc(sio_pool, GFP_NOIO);
- init_sync_kiocb(&sio->iocb, swap_file);
+ init_kiocb(&sio->iocb, swap_file, WRITE);
sio->iocb.ki_complete = sio_write_complete;
sio->iocb.ki_pos = pos;
sio->pages = 0;
@@ -443,7 +443,7 @@ static void swap_readpage_fs(struct page *page,
}
if (!sio) {
sio = mempool_alloc(sio_pool, GFP_KERNEL);
- init_sync_kiocb(&sio->iocb, sis->swap_file);
+ init_kiocb(&sio->iocb, sis->swap_file, READ);
sio->iocb.ki_pos = pos;
sio->iocb.ki_complete = sio_read_complete;
sio->pages = 0;
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from
2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
@ 2023-07-06 15:22 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:22 UTC (permalink / raw)
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
@ 2023-06-30 15:25 ` David Howells
2023-06-30 15:39 ` Jens Axboe
` (2 more replies)
2023-06-30 15:25 ` [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction David Howells
` (7 subsequent siblings)
10 siblings, 3 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
A number of places that generate kiocbs didn't use init_sync_kiocb() to
initialise the new kiocb. Fix these to always use init_kiocb().
Note that aio and io_uring pass information in through ki_filp through an
overlaid union before I can call init_kiocb(), so that gets reinitialised.
I don't think it clobbers anything else.
After this point, IOCB_WRITE is only set by init_kiocb().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
drivers/block/loop.c | 11 ++++++-----
drivers/nvme/target/io-cmd-file.c | 5 +++--
drivers/target/target_core_file.c | 2 +-
fs/aio.c | 9 ++++-----
fs/cachefiles/io.c | 10 ++++------
io_uring/rw.c | 10 +++++-----
6 files changed, 23 insertions(+), 24 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 37511d2b2caf..ea92235c5ba2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -439,16 +439,17 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
}
atomic_set(&cmd->ref, 2);
- iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
+ iov_iter_bvec(&iter, rw == WRITE ? ITER_SOURCE : ITER_DEST,
+ bvec, nr_bvec, blk_rq_bytes(rq));
iter.iov_offset = offset;
+ init_kiocb(&cmd->iocb, file, rw);
cmd->iocb.ki_pos = pos;
- cmd->iocb.ki_filp = file;
cmd->iocb.ki_complete = lo_rw_aio_complete;
cmd->iocb.ki_flags = IOCB_DIRECT;
cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
- if (rw == ITER_SOURCE)
+ if (rw == WRITE)
ret = call_write_iter(file, &cmd->iocb, &iter);
else
ret = call_read_iter(file, &cmd->iocb, &iter);
@@ -490,12 +491,12 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
case REQ_OP_WRITE:
if (cmd->use_aio)
- return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
+ return lo_rw_aio(lo, cmd, pos, WRITE);
else
return lo_write_simple(lo, rq, pos);
case REQ_OP_READ:
if (cmd->use_aio)
- return lo_rw_aio(lo, cmd, pos, ITER_DEST);
+ return lo_rw_aio(lo, cmd, pos, READ);
else
return lo_read_simple(lo, rq, pos);
default:
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 2d068439b129..0b6577d51b69 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -85,17 +85,18 @@ static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos,
if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
ki_flags |= IOCB_DSYNC;
call_iter = req->ns->file->f_op->write_iter;
+ init_kiocb(iocb, req->ns->file, WRITE);
rw = ITER_SOURCE;
} else {
call_iter = req->ns->file->f_op->read_iter;
+ init_kiocb(iocb, req->ns->file, READ);
rw = ITER_DEST;
}
iov_iter_bvec(&iter, rw, req->f.bvec, nr_segs, count);
+ iocb->ki_flags |= ki_flags;
iocb->ki_pos = pos;
- iocb->ki_filp = req->ns->file;
- iocb->ki_flags = ki_flags | iocb->ki_filp->f_iocb_flags;
return call_iter(iocb, &iter);
}
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index ce0e000b74fc..d70cf89959dc 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -287,11 +287,11 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
}
iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len);
+ init_kiocb(&aio_cmd->iocb, file, is_write);
aio_cmd->cmd = cmd;
aio_cmd->len = len;
aio_cmd->iocb.ki_pos = cmd->t_task_lba * dev->dev_attrib.block_size;
- aio_cmd->iocb.ki_filp = file;
aio_cmd->iocb.ki_complete = cmd_rw_aio_complete;
aio_cmd->iocb.ki_flags = IOCB_DIRECT;
diff --git a/fs/aio.c b/fs/aio.c
index 77e33619de40..26e173be9448 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1461,14 +1461,14 @@ static void aio_complete_rw(struct kiocb *kiocb, long res)
iocb_put(iocb);
}
-static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
+static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb, int rw)
{
int ret;
+ init_kiocb(req, req->ki_filp, rw);
req->ki_complete = aio_complete_rw;
req->private = NULL;
req->ki_pos = iocb->aio_offset;
- req->ki_flags = req->ki_filp->f_iocb_flags;
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
@@ -1539,7 +1539,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
struct file *file;
int ret;
- ret = aio_prep_rw(req, iocb);
+ ret = aio_prep_rw(req, iocb, READ);
if (ret)
return ret;
file = req->ki_filp;
@@ -1566,7 +1566,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
struct file *file;
int ret;
- ret = aio_prep_rw(req, iocb);
+ ret = aio_prep_rw(req, iocb, WRITE);
if (ret)
return ret;
file = req->ki_filp;
@@ -1592,7 +1592,6 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
sb_start_write(file_inode(file)->i_sb);
__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}
- req->ki_flags |= IOCB_WRITE;
aio_rw_done(req, call_write_iter(file, req, &iter));
}
kfree(iovec);
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 175a25fcade8..2c47788f38d2 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -134,11 +134,10 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
if (!ki)
goto presubmission_error;
+ init_kiocb(&ki->iocb, file, READ);
refcount_set(&ki->ki_refcnt, 2);
- ki->iocb.ki_filp = file;
+ ki->iocb.ki_flags |= IOCB_DIRECT;
ki->iocb.ki_pos = start_pos + skipped;
- ki->iocb.ki_flags = IOCB_DIRECT;
- ki->iocb.ki_ioprio = get_current_ioprio();
ki->skipped = skipped;
ki->object = object;
ki->inval_counter = cres->inval_counter;
@@ -306,10 +305,9 @@ int __cachefiles_write(struct cachefiles_object *object,
}
refcount_set(&ki->ki_refcnt, 2);
- ki->iocb.ki_filp = file;
+ init_kiocb(&ki->iocb, file, WRITE);
ki->iocb.ki_pos = start_pos;
- ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE;
- ki->iocb.ki_ioprio = get_current_ioprio();
+ ki->iocb.ki_flags |= IOCB_DIRECT;
ki->object = object;
ki->start = start_pos;
ki->len = len;
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..1cade1567162 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -655,12 +655,13 @@ static bool need_complete_io(struct io_kiocb *req)
S_ISBLK(file_inode(req->file)->i_mode);
}
-static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
+static int io_rw_init_file(struct io_kiocb *req, unsigned int io_direction)
{
struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
struct kiocb *kiocb = &rw->kiocb;
struct io_ring_ctx *ctx = req->ctx;
struct file *file = req->file;
+ fmode_t mode = (io_direction == WRITE) ? FMODE_WRITE : FMODE_READ;
int ret;
if (unlikely(!file || !(file->f_mode & mode)))
@@ -669,7 +670,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
if (!(req->flags & REQ_F_FIXED_FILE))
req->flags |= io_file_get_flags(file);
- kiocb->ki_flags = file->f_iocb_flags;
+ init_kiocb(kiocb, file, io_direction);
ret = kiocb_set_rw_flags(kiocb, rw->flags);
if (unlikely(ret))
return ret;
@@ -738,7 +739,7 @@ int io_read(struct io_kiocb *req, unsigned int issue_flags)
iov_iter_restore(&s->iter, &s->iter_state);
iovec = NULL;
}
- ret = io_rw_init_file(req, FMODE_READ);
+ ret = io_rw_init_file(req, READ);
if (unlikely(ret)) {
kfree(iovec);
return ret;
@@ -870,7 +871,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
iov_iter_restore(&s->iter, &s->iter_state);
iovec = NULL;
}
- ret = io_rw_init_file(req, FMODE_WRITE);
+ ret = io_rw_init_file(req, WRITE);
if (unlikely(ret)) {
kfree(iovec);
return ret;
@@ -914,7 +915,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
__sb_writers_release(file_inode(req->file)->i_sb,
SB_FREEZE_WRITE);
}
- kiocb->ki_flags |= IOCB_WRITE;
if (likely(req->file->f_op->write_iter))
ret2 = call_write_iter(req->file, kiocb, &s->iter);
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
@ 2023-06-30 15:39 ` Jens Axboe
2023-06-30 16:00 ` David Howells
2023-07-06 15:29 ` Christoph Hellwig
2 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2023-06-30 15:39 UTC (permalink / raw)
To: David Howells, Al Viro, Christoph Hellwig
Cc: Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand,
Jason Gunthorpe, Logan Gunthorpe, Hillf Danton, Christian Brauner,
linux-fsdevel, linux-block, linux-kernel, linux-mm,
Christoph Hellwig, Christian Brauner
On 6/30/23 9:25?AM, David Howells wrote:
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 1bce2208b65c..1cade1567162 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -655,12 +655,13 @@ static bool need_complete_io(struct io_kiocb *req)
> S_ISBLK(file_inode(req->file)->i_mode);
> }
>
> -static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
> +static int io_rw_init_file(struct io_kiocb *req, unsigned int io_direction)
> {
> struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> struct kiocb *kiocb = &rw->kiocb;
> struct io_ring_ctx *ctx = req->ctx;
> struct file *file = req->file;
> + fmode_t mode = (io_direction == WRITE) ? FMODE_WRITE : FMODE_READ;
> int ret;
>
> if (unlikely(!file || !(file->f_mode & mode)))
Not ideal to add a branch here, probably better to just pass in both?
> @@ -870,7 +871,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
> iov_iter_restore(&s->iter, &s->iter_state);
> iovec = NULL;
> }
> - ret = io_rw_init_file(req, FMODE_WRITE);
> + ret = io_rw_init_file(req, WRITE);
> if (unlikely(ret)) {
> kfree(iovec);
> return ret;
> @@ -914,7 +915,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
> __sb_writers_release(file_inode(req->file)->i_sb,
> SB_FREEZE_WRITE);
> }
> - kiocb->ki_flags |= IOCB_WRITE;
>
> if (likely(req->file->f_op->write_iter))
> ret2 = call_write_iter(req->file, kiocb, &s->iter);
>
One concern here is that we're using IOCB_WRITE here to tell if
sb_start_write() has been done or not, and hence whether
kiocb_end_write() needs to be called. You know set it earlier, which
means if we get a failure if we need to setup async data, then we know
have IOCB_WRITE set at that point even though we did not call
sb_start_write().
--
Jens Axboe
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
2023-06-30 15:39 ` Jens Axboe
@ 2023-06-30 16:00 ` David Howells
2023-06-30 16:05 ` Jens Axboe
2023-07-06 15:29 ` Christoph Hellwig
2 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 16:00 UTC (permalink / raw)
To: Jens Axboe
Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner
Jens Axboe <axboe@kernel.dk> wrote:
> One concern here is that we're using IOCB_WRITE here to tell if
> sb_start_write() has been done or not, and hence whether
> kiocb_end_write() needs to be called. You know set it earlier, which
> means if we get a failure if we need to setup async data, then we know
> have IOCB_WRITE set at that point even though we did not call
> sb_start_write().
Hmmm... It's set earlier in a number of places anyway - __cachefiles_write()
for example.
Btw, can you please put some comments on the IOCB_* constants? I have to
guess at what they mean and how they're meant to be used. Or better still,
get Christoph to write Documentation/core-api/iocb.rst describing the API? ;-)
David
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
2023-06-30 16:00 ` David Howells
@ 2023-06-30 16:05 ` Jens Axboe
0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2023-06-30 16:05 UTC (permalink / raw)
To: David Howells
Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
On 6/30/23 10:00?AM, David Howells wrote:
> Jens Axboe <axboe@kernel.dk> wrote:
>
>> One concern here is that we're using IOCB_WRITE here to tell if
>> sb_start_write() has been done or not, and hence whether
>> kiocb_end_write() needs to be called. You know set it earlier, which
>> means if we get a failure if we need to setup async data, then we know
>> have IOCB_WRITE set at that point even though we did not call
>> sb_start_write().
>
> Hmmm... It's set earlier in a number of places anyway -
> __cachefiles_write() for example.
Not sure how that's relevant, that's a private kiocb and not related to
the private one that io_uring uses?
> Btw, can you please put some comments on the IOCB_* constants? I have
> to guess at what they mean and how they're meant to be used. Or
> better still, get Christoph to write Documentation/core-api/iocb.rst
> describing the API? ;-)
The ones I have added do have comments, mostly, though it's not a lot of
commentary for sure... Which ones are confusing and need better
comments? Would be happy to do that. I do think the comments belong in
there rather than have a separate doc for the kiocb. Though one thing
that's confusing is the ki_private ownership. You'd think it belongs to
the owner of the kiocb, but nope, it has random uses in iomap and ocfs2
at least.
--
Jens Axboe
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
2023-06-30 15:39 ` Jens Axboe
2023-06-30 16:00 ` David Howells
@ 2023-07-06 15:29 ` Christoph Hellwig
2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:29 UTC (permalink / raw)
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner
On Fri, Jun 30, 2023 at 04:25:16PM +0100, David Howells wrote:
> A number of places that generate kiocbs didn't use init_sync_kiocb() to
> initialise the new kiocb. Fix these to always use init_kiocb().
>
> Note that aio and io_uring pass information in through ki_filp through an
> overlaid union before I can call init_kiocb(), so that gets reinitialised.
> I don't think it clobbers anything else.
>
> After this point, IOCB_WRITE is only set by init_kiocb().
Nothing in this patch touches the VFS, so the subject line is
wrong. And I think we're better off splitting it into one per
subsystem, which also allows documenting the exact changes.
Which includes now setting the flags from f_iocb_flags and setting
and I/O priority. Please explain why this is harmless or even useful.
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 37511d2b2caf..ea92235c5ba2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -439,16 +439,17 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> }
> atomic_set(&cmd->ref, 2);
>
> - iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
> + iov_iter_bvec(&iter, rw == WRITE ? ITER_SOURCE : ITER_DEST,
> + bvec, nr_bvec, blk_rq_bytes(rq));
Given the cover letter I expect this is going to go away, but the
changes would probably a lot more readable if you had a helper
to convert from READ/WRITE to the iter flags inbetween.
Or maybe do it the other way - add a helper to init the
> @@ -490,12 +491,12 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
> return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
> case REQ_OP_WRITE:
> if (cmd->use_aio)
> - return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
> + return lo_rw_aio(lo, cmd, pos, WRITE);
> else
> return lo_write_simple(lo, rq, pos);
> case REQ_OP_READ:
> if (cmd->use_aio)
> - return lo_rw_aio(lo, cmd, pos, ITER_DEST);
> + return lo_rw_aio(lo, cmd, pos, READ);
I don't think there is any need to pass the rw argument at all,
lo_rw_aio can just do an op_is_write(req_op(rq))
> -static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
> +static int io_rw_init_file(struct io_kiocb *req, unsigned int io_direction)
> {
> struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> struct kiocb *kiocb = &rw->kiocb;
> struct io_ring_ctx *ctx = req->ctx;
> struct file *file = req->file;
> + fmode_t mode = (io_direction == WRITE) ? FMODE_WRITE : FMODE_READ;
> int ret;
>
> if (unlikely(!file || !(file->f_mode & mode)))
I'd just move this check into the two callers, that way you can hard
code the mode instead of adding a conversion.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (2 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
@ 2023-06-30 15:25 ` David Howells
2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
` (6 subsequent siblings)
10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
If a kiocb is available, use the IOCB_WRITE flag instead of the iterator
direction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
block/fops.c | 8 ++++----
fs/9p/vfs_addr.c | 2 +-
fs/affs/file.c | 4 ++--
fs/ceph/file.c | 2 +-
fs/dax.c | 2 +-
fs/direct-io.c | 22 +++++++++++-----------
fs/exfat/inode.c | 6 +++---
fs/ext2/inode.c | 2 +-
fs/f2fs/file.c | 10 +++++-----
fs/fat/inode.c | 4 ++--
fs/fuse/dax.c | 2 +-
fs/fuse/file.c | 8 ++++----
fs/hfs/inode.c | 2 +-
fs/hfsplus/inode.c | 2 +-
fs/iomap/direct-io.c | 2 +-
fs/jfs/inode.c | 2 +-
fs/nfs/direct.c | 2 +-
fs/nilfs2/inode.c | 2 +-
fs/ntfs3/inode.c | 2 +-
fs/ocfs2/aops.c | 2 +-
fs/orangefs/inode.c | 2 +-
fs/reiserfs/inode.c | 2 +-
fs/udf/inode.c | 2 +-
include/linux/fs.h | 10 ++++++++++
24 files changed, 57 insertions(+), 47 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index a286bf3325c5..e70a5ef4d447 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,7 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
return -ENOMEM;
}
- if (iov_iter_rw(iter) == READ) {
+ if (iocb_is_read(iocb)) {
bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
if (user_backed_iter(iter))
should_dirty = true;
@@ -88,7 +88,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
goto out;
ret = bio.bi_iter.bi_size;
- if (iov_iter_rw(iter) == WRITE)
+ if (iocb_is_write(iocb))
task_io_account_write(ret);
if (iocb->ki_flags & IOCB_NOWAIT)
@@ -174,7 +174,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
struct blk_plug plug;
struct blkdev_dio *dio;
struct bio *bio;
- bool is_read = (iov_iter_rw(iter) == READ), is_sync;
+ bool is_read = iocb_is_read(iocb), is_sync;
blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
loff_t pos = iocb->ki_pos;
int ret = 0;
@@ -311,7 +311,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
unsigned int nr_pages)
{
struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
- bool is_read = iov_iter_rw(iter) == READ;
+ bool is_read = iocb_is_read(iocb);
blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
struct blkdev_dio *dio;
struct bio *bio;
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 8a635999a7d6..18bce4f8a158 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -257,7 +257,7 @@ v9fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
ssize_t n;
int err = 0;
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
n = p9_client_write(file->private_data, pos, iter, &err);
if (n) {
struct inode *inode = file_inode(file);
diff --git a/fs/affs/file.c b/fs/affs/file.c
index e43f2f007ac1..ad70ff27e5f2 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -400,7 +400,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
ssize_t ret;
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
loff_t size = offset + count;
if (AFFS_I(inode)->mmu_private < size)
@@ -408,7 +408,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
}
ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block);
- if (ret < 0 && iov_iter_rw(iter) == WRITE)
+ if (ret < 0 && iocb_is_write(iocb))
affs_write_failed(mapping, offset + count);
return ret;
}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3bb27b9ce751..453429e9b9e6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1280,7 +1280,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(iter);
loff_t pos = iocb->ki_pos;
- bool write = iov_iter_rw(iter) == WRITE;
+ bool write = iocb_is_write(iocb);
bool should_dirty = !write && user_backed_iter(iter);
if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP)
diff --git a/fs/dax.c b/fs/dax.c
index 2ababb89918d..5c6ebe64a3fd 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1573,7 +1573,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!iomi.len)
return 0;
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
lockdep_assert_held_write(&iomi.inode->i_rwsem);
iomi.flags |= IOMAP_WRITE;
} else {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7bc494ee56b9..2b517cc5b32d 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1125,7 +1125,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
*/
/* watch out for a 0 len io from a tricksy fs */
- if (iov_iter_rw(iter) == READ && !count)
+ if (iocb_is_read(iocb) && !count)
return 0;
dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
@@ -1139,7 +1139,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
memset(dio, 0, offsetof(struct dio, pages));
dio->flags = flags;
- if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+ if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) {
/* will be released by direct_io_worker */
inode_lock(inode);
}
@@ -1147,7 +1147,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
/* Once we sampled i_size check for reads beyond EOF */
dio->i_size = i_size_read(inode);
- if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
+ if (iocb_is_read(iocb) && offset >= dio->i_size) {
retval = 0;
goto fail_dio;
}
@@ -1160,7 +1160,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
goto fail_dio;
}
- if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+ if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) {
struct address_space *mapping = iocb->ki_filp->f_mapping;
retval = filemap_write_and_wait_range(mapping, offset, end - 1);
@@ -1176,13 +1176,13 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
*/
if (is_sync_kiocb(iocb))
dio->is_async = false;
- else if (iov_iter_rw(iter) == WRITE && end > i_size_read(inode))
+ else if (iocb_is_write(iocb) && end > i_size_read(inode))
dio->is_async = false;
else
dio->is_async = true;
dio->inode = inode;
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
dio->opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
if (iocb->ki_flags & IOCB_NOWAIT)
dio->opf |= REQ_NOWAIT;
@@ -1194,7 +1194,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
* For AIO O_(D)SYNC writes we need to defer completions to a workqueue
* so that we can call ->fsync.
*/
- if (dio->is_async && iov_iter_rw(iter) == WRITE) {
+ if (dio->is_async && iocb_is_write(iocb)) {
retval = 0;
if (iocb_is_dsync(iocb))
retval = dio_set_defer_completion(dio);
@@ -1230,7 +1230,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
spin_lock_init(&dio->bio_lock);
dio->refcount = 1;
- dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ;
+ dio->should_dirty = user_backed_iter(iter) && iocb_is_read(iocb);
sdio.iter = iter;
sdio.final_block_in_request = end >> blkbits;
@@ -1287,7 +1287,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
* we can let i_mutex go now that its achieved its purpose
* of protecting us from looking up uninitialized blocks.
*/
- if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
+ if (iocb_is_read(iocb) && (dio->flags & DIO_LOCKING))
inode_unlock(dio->inode);
/*
@@ -1299,7 +1299,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
*/
BUG_ON(retval == -EIOCBQUEUED);
if (dio->is_async && retval == 0 && dio->result &&
- (iov_iter_rw(iter) == READ || dio->result == count))
+ (iocb_is_read(iocb) || dio->result == count))
retval = -EIOCBQUEUED;
else
dio_await_completion(dio);
@@ -1312,7 +1312,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
return retval;
fail_dio:
- if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
+ if (dio->flags & DIO_LOCKING && iocb_is_read(iocb))
inode_unlock(inode);
kmem_cache_free(dio_cache, dio);
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 481dd338f2b8..706db9cd1b4e 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -411,10 +411,10 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
loff_t size = iocb->ki_pos + iov_iter_count(iter);
- int rw = iov_iter_rw(iter);
+ bool writing = iocb_is_write(iocb);
ssize_t ret;
- if (rw == WRITE) {
+ if (writing) {
/*
* FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
* so we need to update the ->i_size_aligned to block boundary.
@@ -433,7 +433,7 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* condition of exfat_get_block() and ->truncate().
*/
ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block);
- if (ret < 0 && (rw & WRITE))
+ if (ret < 0 && writing)
exfat_write_failed(mapping, size);
return ret;
}
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 26f135e7ffce..b239c16ab7ee 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -919,7 +919,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
ssize_t ret;
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
- if (ret < 0 && iov_iter_rw(iter) == WRITE)
+ if (ret < 0 && iocb_is_write(iocb))
ext2_write_failed(mapping, offset + count);
return ret;
}
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2435111a8532..7ef596b901d9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -805,7 +805,7 @@ int f2fs_truncate(struct inode *inode)
return 0;
}
-static bool f2fs_force_buffered_io(struct inode *inode, int rw)
+static bool f2fs_force_buffered_io(struct inode *inode, bool writing)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -823,9 +823,9 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
* for blkzoned device, fallback direct IO to buffered IO, so
* all IOs can be serialized by log-structured write.
*/
- if (f2fs_sb_has_blkzoned(sbi) && (rw == WRITE))
+ if (f2fs_sb_has_blkzoned(sbi) && writing)
return true;
- if (f2fs_lfs_mode(sbi) && rw == WRITE && F2FS_IO_ALIGNED(sbi))
+ if (f2fs_lfs_mode(sbi) && writing && F2FS_IO_ALIGNED(sbi))
return true;
if (is_sbi_flag_set(sbi, SBI_CP_DISABLED))
return true;
@@ -861,7 +861,7 @@ int f2fs_getattr(struct mnt_idmap *idmap, const struct path *path,
unsigned int bsize = i_blocksize(inode);
stat->result_mask |= STATX_DIOALIGN;
- if (!f2fs_force_buffered_io(inode, WRITE)) {
+ if (!f2fs_force_buffered_io(inode, true)) {
stat->dio_mem_align = bsize;
stat->dio_offset_align = bsize;
}
@@ -4280,7 +4280,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
if (!(iocb->ki_flags & IOCB_DIRECT))
return false;
- if (f2fs_force_buffered_io(inode, iov_iter_rw(iter)))
+ if (f2fs_force_buffered_io(inode, iocb_is_write(iocb)))
return false;
/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d99b8549ec8f..237e20891df2 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -261,7 +261,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
ssize_t ret;
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
/*
* FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
* so we need to update the ->mmu_private to block boundary.
@@ -281,7 +281,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* condition of fat_get_block() and ->truncate().
*/
ret = blockdev_direct_IO(iocb, inode, iter, fat_get_block);
- if (ret < 0 && iov_iter_rw(iter) == WRITE)
+ if (ret < 0 && iocb_is_write(iocb))
fat_write_failed(mapping, offset + count);
return ret;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 8e74f278a3f6..7bfbe1783462 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -720,7 +720,7 @@ static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
- return (iov_iter_rw(from) == WRITE &&
+ return (iocb_is_write(iocb) &&
((iocb->ki_pos) >= i_size_read(inode) ||
(iocb->ki_pos + iov_iter_count(from) > i_size_read(inode))));
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bc4115288eec..f43a39faa8e4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2911,7 +2911,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
inode = file->f_mapping->host;
i_size = i_size_read(inode);
- if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
+ if (iocb_is_read(iocb) && (offset >= i_size))
return 0;
io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
@@ -2923,7 +2923,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
io->bytes = -1;
io->size = 0;
io->offset = offset;
- io->write = (iov_iter_rw(iter) == WRITE);
+ io->write = iocb_is_write(iocb);
io->err = 0;
/*
* By default, we want to optimize all I/Os with async request
@@ -2956,7 +2956,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
io->done = &wait;
}
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
ret = fuse_direct_io(io, iter, &pos, FUSE_DIO_WRITE);
fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
} else {
@@ -2979,7 +2979,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
kref_put(&io->refcnt, fuse_io_release);
- if (iov_iter_rw(iter) == WRITE) {
+ if (iocb_is_write(iocb)) {
fuse_write_update_attr(inode, pos, ret);
/* For extending writes we already hold exclusive lock */
if (ret < 0 && offset + count > i_size)
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 441d7fc952e3..d95ac1f559b5 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -141,7 +141,7 @@ static ssize_t hfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely(iocb_is_write(iocb) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 7d1a675e037d..9851a6cb9e3b 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -138,7 +138,7 @@ static ssize_t hfsplus_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely(iocb_is_write(iocb) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..a56099470185 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -510,7 +510,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iocb->ki_flags & IOCB_NOWAIT)
iomi.flags |= IOMAP_NOWAIT;
- if (iov_iter_rw(iter) == READ) {
+ if (iocb_is_read(iocb)) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 8ac10e396050..0d1f94ac9488 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -334,7 +334,7 @@ static ssize_t jfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely(iocb_is_write(iocb) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 9a18c5a69ace..5f4522dd05b5 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -133,7 +133,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
- if (iov_iter_rw(iter) == READ)
+ if (iocb_is_read(iocb))
ret = nfs_file_direct_read(iocb, iter, true);
else
ret = nfs_file_direct_write(iocb, iter, true);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index a8ce522ac747..93c61fcb5e91 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -289,7 +289,7 @@ nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct inode *inode = file_inode(iocb->ki_filp);
- if (iov_iter_rw(iter) == WRITE)
+ if (iocb_is_write(iocb))
return 0;
/* Needs synchronization with the cleaner */
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 6c560245eef4..5b50c4653ff8 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -774,7 +774,7 @@ static ssize_t ntfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
struct ntfs_inode *ni = ntfs_i(inode);
loff_t vbo = iocb->ki_pos;
loff_t end;
- int wr = iov_iter_rw(iter) & WRITE;
+ bool wr = iocb_is_write(iocb);
size_t iter_count = iov_iter_count(iter);
loff_t valid;
ssize_t ret;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 8dfc284e85f0..e8afc0c13f71 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2456,7 +2456,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
!ocfs2_supports_append_dio(osb))
return 0;
- if (iov_iter_rw(iter) == READ)
+ if (iocb_is_read(iocb))
get_block = ocfs2_lock_get_block;
else
get_block = ocfs2_dio_wr_get_block;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 9014bbcc8031..120a9a6c0f3b 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -513,7 +513,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
*/
struct file *file = iocb->ki_filp;
loff_t pos = iocb->ki_pos;
- enum ORANGEFS_io_type type = iov_iter_rw(iter) == WRITE ?
+ enum ORANGEFS_io_type type = iocb_is_write(iocb) ?
ORANGEFS_IO_WRITE : ORANGEFS_IO_READ;
loff_t *offset = &pos;
struct inode *inode = file->f_mapping->host;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 77bd3b27059f..ca57930178f4 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3248,7 +3248,7 @@ static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
* In case of error extending write may have instantiated a few
* blocks outside i_size. Trim these off again.
*/
- if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+ if (unlikely(iocb_is_write(iocb) && ret < 0)) {
loff_t isize = i_size_read(inode);
loff_t end = iocb->ki_pos + count;
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 1e71e04ae8f6..d49c12c1f4d9 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -311,7 +311,7 @@ static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (UDF_I(inode)->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
return 0;
ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block);
- if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE))
+ if (unlikely(ret < 0 && iocb_is_write(iocb)))
udf_write_failed(mapping, iocb->ki_pos + count);
return ret;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 466eba253502..e344db819498 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -368,6 +368,16 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb)
return kiocb->ki_complete == NULL;
}
+static inline bool iocb_is_write(const struct kiocb *kiocb)
+{
+ return kiocb->ki_flags & IOCB_WRITE;
+}
+
+static inline bool iocb_is_read(const struct kiocb *kiocb)
+{
+ return !iocb_is_write(kiocb);
+}
+
struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*read_folio)(struct file *, struct folio *);
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE rather than iterator direction
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (3 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction David Howells
@ 2023-06-30 15:25 ` David Howells
2023-07-06 15:30 ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
` (5 subsequent siblings)
10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
If an iomap_iter is available, use the IOMAP_WRITE flag instead of the
iterator direction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/dax.c | 4 ++--
fs/iomap/direct-io.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 5c6ebe64a3fd..d49480675fc0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1438,7 +1438,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
loff_t pos = iomi->pos;
struct dax_device *dax_dev = iomap->dax_dev;
loff_t end = pos + length, done = 0;
- bool write = iov_iter_rw(iter) == WRITE;
+ bool write = iomi->flags & IOMAP_WRITE;
bool cow = write && iomap->flags & IOMAP_F_SHARED;
ssize_t ret = 0;
size_t xfer;
@@ -1498,7 +1498,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
DAX_ACCESS, &kaddr, NULL);
- if (map_len == -EIO && iov_iter_rw(iter) == WRITE) {
+ if (map_len == -EIO && write) {
map_len = dax_direct_access(dax_dev, pgoff,
PHYS_PFN(size), DAX_RECOVERY_WRITE,
&kaddr, NULL);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index a56099470185..3095091af680 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -587,7 +587,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
* Revert iter to a state corresponding to that as some callers (such
* as the splice code) rely on it.
*/
- if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size)
+ if (!(iomi.flags & IOMAP_WRITE) && iomi.pos >= dio->i_size)
iov_iter_revert(iter, iomi.pos - dio->i_size);
if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) {
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE rather than iterator direction
2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
@ 2023-07-06 15:30 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:30 UTC (permalink / raw)
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner
On Fri, Jun 30, 2023 at 04:25:18PM +0100, David Howells wrote:
> If an iomap_iter is available, use the IOMAP_WRITE flag instead of the
> iterator direction.
This is really two patches, one for dax and one for iomap, and not one
for the iov_iter code.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 06/11] iov_iter: Use op_is_write() rather than iterator direction
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (4 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
@ 2023-06-30 15:25 ` David Howells
2023-07-06 15:30 ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw() David Howells
` (4 subsequent siblings)
10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
If a request struct is available, use op_is_write() instead of the iterator
direction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
block/blk-map.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-map.c b/block/blk-map.c
index 44d74a30ddac..f8fe114ae433 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -205,7 +205,7 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
/*
* success
*/
- if ((iov_iter_rw(iter) == WRITE &&
+ if ((op_is_write(rq->cmd_flags) &&
(!map_data || !map_data->null_mapped)) ||
(map_data && map_data->from_user)) {
ret = bio_copy_from_iter(bio, iter);
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 06/11] iov_iter: Use op_is_write() rather than iterator direction
2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
@ 2023-07-06 15:30 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:30 UTC (permalink / raw)
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner
On Fri, Jun 30, 2023 at 04:25:19PM +0100, David Howells wrote:
> If a request struct is available, use op_is_write() instead of the iterator
> direction.
s/iov_iter/block/ in the subject.
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw()
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (5 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
@ 2023-06-30 15:25 ` David Howells
2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
` (3 subsequent siblings)
10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Steve French, Paulo Alcantara, Ronnie Sahlberg,
Shyam Prasad N, Tom Talpey, Christoph Hellwig, Christian Brauner,
linux-cifs
smbd_recv() has a check that the iterator is the correct direction. Drop
this check as we're getting rid of iov_iter_rw().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@cjr.nz>
cc: Ronnie Sahlberg <lsahlber@redhat.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-cifs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/smb/client/smbdirect.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
index 223e17c16b60..672078d00207 100644
--- a/fs/smb/client/smbdirect.c
+++ b/fs/smb/client/smbdirect.c
@@ -1906,14 +1906,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
unsigned int to_read, page_offset;
int rc;
- if (iov_iter_rw(&msg->msg_iter) == WRITE) {
- /* It's a bug in upper layer to get there */
- cifs_dbg(VFS, "Invalid msg iter dir %u\n",
- iov_iter_rw(&msg->msg_iter));
- rc = -EINVAL;
- goto out;
- }
-
switch (iov_iter_type(&msg->msg_iter)) {
case ITER_KVEC:
buf = msg->msg_iter.kvec->iov_base;
@@ -1935,7 +1927,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
rc = -EINVAL;
}
-out:
/* SMBDirect will read it all or nothing */
if (rc > 0)
msg->msg_iter.count = 0;
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (6 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw() David Howells
@ 2023-06-30 15:25 ` David Howells
2023-07-06 15:31 ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages() David Howells
` (2 subsequent siblings)
10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner
Drop the last usage of iov_iter_rw() into __iov_iter_get_pages_alloc().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
include/linux/uio.h | 5 -----
lib/iov_iter.c | 2 +-
2 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index ff81e5ccaef2..70e12f536f8f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -136,11 +136,6 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i)
return iov_iter_type(i) == ITER_XARRAY;
}
-static inline unsigned char iov_iter_rw(const struct iov_iter *i)
-{
- return i->data_source ? WRITE : READ;
-}
-
static inline bool user_backed_iter(const struct iov_iter *i)
{
return i->user_backed;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b667b1e2f688..b8c52231a6ff 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1097,7 +1097,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
unsigned long addr;
int res;
- if (iov_iter_rw(i) != WRITE)
+ if (i->data_source == ITER_SOURCE)
gup_flags |= FOLL_WRITE;
if (i->nofault)
gup_flags |= FOLL_NOFAULT;
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user
2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
@ 2023-07-06 15:31 ` Christoph Hellwig
0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:31 UTC (permalink / raw)
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner
On Fri, Jun 30, 2023 at 04:25:21PM +0100, David Howells wrote:
> Drop the last usage of iov_iter_rw() into __iov_iter_get_pages_alloc().
Well, if we can't just drop this check entirely, the rest of the prep
work is just churn and not actually very useful. Shouldn't we go
all the way and kill ITER_DEST/SOURCE?
^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages()
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (7 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
@ 2023-06-30 15:25 ` David Howells
2023-06-30 15:25 ` [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate David Howells
2023-06-30 15:25 ` [RFC PATCH 11/11] scsi: Use extract_iter_to_sg() David Howells
10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Christoph Hellwig, Christian Brauner, Herbert Xu,
linux-crypto
Define flags to pass into iov_iter_extract_pages() to indicate I/O
direction. A warning is issued and the function fails if neither or both
flags are set. The flag is also checked against the iterator's data_source
flag.
Also make extract_iter_to_sg() check the flags and propagate them to
iov_iter_extract_pages().
Finally, make the callers pass the flags into iov_iter_extract_pages() and
extract_iter_to_sg().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-crypto@vger.kernel.org
---
block/bio.c | 6 ++++++
block/blk-map.c | 3 +++
crypto/af_alg.c | 5 +++--
crypto/algif_hash.c | 3 ++-
fs/direct-io.c | 6 +++++-
include/linux/bio.h | 18 ++++++++++++++++--
include/linux/uio.h | 5 ++++-
lib/iov_iter.c | 12 ++++++++++--
lib/scatterlist.c | 12 ++++++++++--
9 files changed, 59 insertions(+), 11 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 8672179213b9..440d4889ba13 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1242,6 +1242,8 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
* will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
* For a multi-segment *iter, this function only adds pages from the next
* non-empty segment of the iov iterator.
+ *
+ * The I/O direction is determined from the bio operation type.
*/
static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
{
@@ -1263,6 +1265,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
+ extraction_flags |= bio_is_write(bio) ? WRITE_FROM_ITER : READ_INTO_ITER;
+
if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
extraction_flags |= ITER_ALLOW_P2PDMA;
@@ -1332,6 +1336,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
* fit into the bio, or are requested in @iter, whatever is smaller. If
* MM encounters an error pinning the requested pages, it stops. Error
* is returned only if 0 pages could be pinned.
+ *
+ * The bio operation indicates the data direction.
*/
int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
{
diff --git a/block/blk-map.c b/block/blk-map.c
index f8fe114ae433..71de2cf9bf4e 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -279,6 +279,9 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
if (bio == NULL)
return -ENOMEM;
+ extraction_flags |=
+ bio_is_write(bio) ? WRITE_FROM_ITER : READ_INTO_ITER;
+
if (blk_queue_pci_p2pdma(rq->q))
extraction_flags |= ITER_ALLOW_P2PDMA;
if (iov_iter_extract_will_pin(iter))
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 6218c773d71c..c62ac5e32aeb 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1042,7 +1042,8 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
};
plen = extract_iter_to_sg(&msg->msg_iter, len, &sgtable,
- MAX_SGL_ENTS - sgl->cur, 0);
+ MAX_SGL_ENTS - sgl->cur,
+ WRITE_FROM_ITER);
if (plen < 0) {
err = plen;
goto unlock;
@@ -1247,7 +1248,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
sg_init_table(rsgl->sgl.sgt.sgl, ALG_MAX_PAGES);
err = extract_iter_to_sg(&msg->msg_iter, seglen, &rsgl->sgl.sgt,
- ALG_MAX_PAGES, 0);
+ ALG_MAX_PAGES, READ_INTO_ITER);
if (err < 0) {
rsgl->sg_num_bytes = 0;
return err;
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 0ab43e149f0e..b343c4973dbf 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -115,7 +115,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
ctx->sgl.need_unpin = iov_iter_extract_will_pin(&msg->msg_iter);
err = extract_iter_to_sg(&msg->msg_iter, LONG_MAX,
- &ctx->sgl.sgt, npages, 0);
+ &ctx->sgl.sgt, npages,
+ WRITE_FROM_ITER);
if (err < 0)
goto unlock_free;
len = err;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 2b517cc5b32d..3316d2f8f21c 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -168,10 +168,14 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
{
struct page **pages = dio->pages;
const enum req_op dio_op = dio->opf & REQ_OP_MASK;
+ unsigned int extraction_flags;
ssize_t ret;
+ extraction_flags =
+ op_is_write(dio_op) ? WRITE_FROM_ITER : READ_INTO_ITER;
+
ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX,
- DIO_PAGES, 0, &sdio->from);
+ DIO_PAGES, extraction_flags, &sdio->from);
if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
/*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c4f5b5228105..d4b4c477e69b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -42,11 +42,25 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
#define bio_sectors(bio) bvec_iter_sectors((bio)->bi_iter)
#define bio_end_sector(bio) bvec_iter_end_sector((bio)->bi_iter)
+/**
+ * bio_is_write - Query if the I/O direction is towards the disk
+ * @bio: The bio to query
+ *
+ * Return true if this is some sort of write operation - ie. the data is going
+ * towards the disk.
+ */
+static inline bool bio_is_write(const struct bio *bio)
+{
+ return op_is_write(bio_op(bio));
+}
+
/*
* Return the data direction, READ or WRITE.
*/
-#define bio_data_dir(bio) \
- (op_is_write(bio_op(bio)) ? WRITE : READ)
+static inline int bio_data_dir(const struct bio *bio)
+{
+ return bio_is_write(bio) ? WRITE : READ;
+}
/*
* Check whether this bio carries any data or not. A NULL bio is allowed.
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 70e12f536f8f..6ea7c1b589c1 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -379,8 +379,11 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
};
}
/* Flags for iov_iter_get/extract_pages*() */
+/* Indicate if we are going to be writing from the buffer or reading into it. */
+#define WRITE_FROM_ITER ((__force iov_iter_extraction_t)0x01) // == WRITE
+#define READ_INTO_ITER ((__force iov_iter_extraction_t)0x02)
/* Allow P2PDMA on the extracted pages */
-#define ITER_ALLOW_P2PDMA ((__force iov_iter_extraction_t)0x01)
+#define ITER_ALLOW_P2PDMA ((__force iov_iter_extraction_t)0x04)
ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
size_t maxsize, unsigned int maxpages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b8c52231a6ff..d31f40b69da5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1791,8 +1791,10 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
* that the caller allocated a page list at least @maxpages in size and this
* will be filled in.
*
- * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
- * be allowed on the pages extracted.
+ * @extraction_flags should have either WRITE_FROM_ITER or READ_INTO_ITER to
+ * indicate the direction the data is intended to flow to/from the buffer and
+ * can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be allowed on the
+ * pages extracted.
*
* The iov_iter_extract_will_pin() function can be used to query how cleanup
* should be performed.
@@ -1823,6 +1825,12 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
iov_iter_extraction_t extraction_flags,
size_t *offset0)
{
+ unsigned int dir = extraction_flags & (READ_INTO_ITER | WRITE_FROM_ITER);
+
+ if (WARN_ON_ONCE(dir != READ_INTO_ITER && dir != WRITE_FROM_ITER) ||
+ WARN_ON_ONCE((dir & WRITE) != i->data_source))
+ return -EFAULT;
+
maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
if (!maxsize)
return 0;
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index e97d7060329e..116ce320eb36 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1326,8 +1326,10 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
*
* No end mark is placed on the scatterlist; that's left to the caller.
*
- * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
- * be allowed on the pages extracted.
+ * @extraction_flags should have either WRITE_FROM_ITER or READ_INTO_ITER to
+ * indicate the direction the data is intended to flow to/from the buffer and
+ * can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be allowed on the
+ * pages extracted.
*
* If successful, @sgtable->nents is updated to include the number of elements
* added and the number of bytes added is returned. @sgtable->orig_nents is
@@ -1340,6 +1342,12 @@ ssize_t extract_iter_to_sg(struct iov_iter *iter, size_t maxsize,
struct sg_table *sgtable, unsigned int sg_max,
iov_iter_extraction_t extraction_flags)
{
+ unsigned int dir = extraction_flags & (READ_INTO_ITER | WRITE_FROM_ITER);
+
+ if (WARN_ON_ONCE(dir != READ_INTO_ITER && dir != WRITE_FROM_ITER) ||
+ WARN_ON_ONCE((dir & WRITE) != iter->data_source))
+ return -EFAULT;
+
if (maxsize == 0)
return 0;
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (8 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages() David Howells
@ 2023-06-30 15:25 ` David Howells
2023-06-30 15:25 ` [RFC PATCH 11/11] scsi: Use extract_iter_to_sg() David Howells
10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, Dominique Martinet, Eric Van Hensbergen,
Latchesar Ionkov, Christian Schoenebeck, v9fs-developer
Convert the 9p filesystem to use iov_iter_extract_pages() instead of
iov_iter_get_pages(). This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.
The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs-developer@lists.sourceforge.net
---
net/9p/trans_common.c | 8 ++--
net/9p/trans_common.h | 2 +-
net/9p/trans_virtio.c | 92 ++++++++++++++-----------------------------
3 files changed, 34 insertions(+), 68 deletions(-)
diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index c827f694551c..4342de18f08b 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -9,16 +9,16 @@
#include "trans_common.h"
/**
- * p9_release_pages - Release pages after the transaction.
+ * p9_unpin_pages - Unpin pages after the transaction.
* @pages: array of pages to be put
* @nr_pages: size of array
*/
-void p9_release_pages(struct page **pages, int nr_pages)
+void p9_unpin_pages(struct page **pages, int nr_pages)
{
int i;
for (i = 0; i < nr_pages; i++)
if (pages[i])
- put_page(pages[i]);
+ unpin_user_page(pages[i]);
}
-EXPORT_SYMBOL(p9_release_pages);
+EXPORT_SYMBOL(p9_unpin_pages);
diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
index 32134db6abf3..fd94c48aba5b 100644
--- a/net/9p/trans_common.h
+++ b/net/9p/trans_common.h
@@ -4,4 +4,4 @@
* Author Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
*/
-void p9_release_pages(struct page **pages, int nr_pages);
+void p9_unpin_pages(struct page **pages, int nr_pages);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 3c27ffb781e3..93569de2bdba 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -310,71 +310,35 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
struct iov_iter *data,
int count,
size_t *offs,
- int *need_drop)
+ bool *need_unpin,
+ iov_iter_extraction_t extraction_flags)
{
int nr_pages;
int err;
+ int n;
if (!iov_iter_count(data))
return 0;
- if (!iov_iter_is_kvec(data)) {
- int n;
- /*
- * We allow only p9_max_pages pinned. We wait for the
- * Other zc request to finish here
- */
- if (atomic_read(&vp_pinned) >= chan->p9_max_pages) {
- err = wait_event_killable(vp_wq,
- (atomic_read(&vp_pinned) < chan->p9_max_pages));
- if (err == -ERESTARTSYS)
- return err;
- }
- n = iov_iter_get_pages_alloc2(data, pages, count, offs);
- if (n < 0)
- return n;
- *need_drop = 1;
- nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
- atomic_add(nr_pages, &vp_pinned);
- return n;
- } else {
- /* kernel buffer, no need to pin pages */
- int index;
- size_t len;
- void *p;
-
- /* we'd already checked that it's non-empty */
- while (1) {
- len = iov_iter_single_seg_count(data);
- if (likely(len)) {
- p = data->kvec->iov_base + data->iov_offset;
- break;
- }
- iov_iter_advance(data, 0);
- }
- if (len > count)
- len = count;
-
- nr_pages = DIV_ROUND_UP((unsigned long)p + len, PAGE_SIZE) -
- (unsigned long)p / PAGE_SIZE;
-
- *pages = kmalloc_array(nr_pages, sizeof(struct page *),
- GFP_NOFS);
- if (!*pages)
- return -ENOMEM;
-
- *need_drop = 0;
- p -= (*offs = offset_in_page(p));
- for (index = 0; index < nr_pages; index++) {
- if (is_vmalloc_addr(p))
- (*pages)[index] = vmalloc_to_page(p);
- else
- (*pages)[index] = kmap_to_page(p);
- p += PAGE_SIZE;
- }
- iov_iter_advance(data, len);
- return len;
+ /*
+ * We allow only p9_max_pages pinned. We wait for the
+ * Other zc request to finish here
+ */
+ if (atomic_read(&vp_pinned) >= chan->p9_max_pages) {
+ err = wait_event_killable(vp_wq,
+ (atomic_read(&vp_pinned) < chan->p9_max_pages));
+ if (err == -ERESTARTSYS)
+ return err;
}
+
+ n = iov_iter_extract_pages(data, pages, count, INT_MAX,
+ extraction_flags, offs);
+ if (n < 0)
+ return n;
+ *need_unpin = iov_iter_extract_will_pin(data);
+ nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
+ atomic_add(nr_pages, &vp_pinned);
+ return n;
}
static void handle_rerror(struct p9_req_t *req, int in_hdr_len,
@@ -429,7 +393,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
struct virtio_chan *chan = client->trans;
struct scatterlist *sgs[4];
size_t offs;
- int need_drop = 0;
+ bool need_unpin;
int kicked = 0;
p9_debug(P9_DEBUG_TRANS, "virtio request\n");
@@ -437,7 +401,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
if (uodata) {
__le32 sz;
int n = p9_get_mapped_pages(chan, &out_pages, uodata,
- outlen, &offs, &need_drop);
+ outlen, &offs, &need_unpin,
+ WRITE_FROM_ITER);
if (n < 0) {
err = n;
goto err_out;
@@ -456,7 +421,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
} else if (uidata) {
int n = p9_get_mapped_pages(chan, &in_pages, uidata,
- inlen, &offs, &need_drop);
+ inlen, &offs, &need_unpin,
+ READ_INTO_ITER);
if (n < 0) {
err = n;
goto err_out;
@@ -542,13 +508,13 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
* Non kernel buffers are pinned, unpin them
*/
err_out:
- if (need_drop) {
+ if (need_unpin) {
if (in_pages) {
- p9_release_pages(in_pages, in_nr_pages);
+ p9_unpin_pages(in_pages, in_nr_pages);
atomic_sub(in_nr_pages, &vp_pinned);
}
if (out_pages) {
- p9_release_pages(out_pages, out_nr_pages);
+ p9_unpin_pages(out_pages, out_nr_pages);
atomic_sub(out_nr_pages, &vp_pinned);
}
/* wakeup anybody waiting for slots to pin pages */
^ permalink raw reply related [flat|nested] 21+ messages in thread* [RFC PATCH 11/11] scsi: Use extract_iter_to_sg()
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
` (9 preceding siblings ...)
2023-06-30 15:25 ` [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate David Howells
@ 2023-06-30 15:25 ` David Howells
10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
To: Jens Axboe, Al Viro, Christoph Hellwig
Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
linux-mm, James E . J . Bottomley, Martin K . Petersen,
Christoph Hellwig, linux-scsi
Use extract_iter_to_sg() to build a scatterlist from an iterator.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: James E.J. Bottomley <jejb@linux.ibm.com>
cc: Martin K. Petersen <martin.petersen@oracle.com>
cc: Christoph Hellwig <hch@lst.de>
cc: linux-scsi@vger.kernel.org
---
drivers/vhost/scsi.c | 79 +++++++++++++-------------------------------
1 file changed, 23 insertions(+), 56 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index bb10fa4bb4f6..7bb41e2a0d64 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -75,6 +75,9 @@ struct vhost_scsi_cmd {
u32 tvc_prot_sgl_count;
/* Saved unpacked SCSI LUN for vhost_scsi_target_queue_cmd() */
u32 tvc_lun;
+ /* Cleanup modes for scatterlists */
+ unsigned int tvc_need_unpin;
+ unsigned int tvc_prot_need_unpin;
/* Pointer to the SGL formatted memory from virtio-scsi */
struct scatterlist *tvc_sgl;
struct scatterlist *tvc_prot_sgl;
@@ -327,14 +330,13 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
struct vhost_scsi_inflight *inflight = tv_cmd->inflight;
int i;
- if (tv_cmd->tvc_sgl_count) {
+ if (tv_cmd->tvc_need_unpin && tv_cmd->tvc_sgl_count)
for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
- put_page(sg_page(&tv_cmd->tvc_sgl[i]));
- }
- if (tv_cmd->tvc_prot_sgl_count) {
+ unpin_user_page(sg_page(&tv_cmd->tvc_sgl[i]));
+
+ if (tv_cmd->tvc_prot_need_unpin && tv_cmd->tvc_prot_sgl_count)
for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++)
- put_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
- }
+ unpin_user_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag);
vhost_scsi_put_inflight(inflight);
@@ -606,38 +608,6 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
return cmd;
}
-/*
- * Map a user memory range into a scatterlist
- *
- * Returns the number of scatterlist entries used or -errno on error.
- */
-static int
-vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd,
- struct iov_iter *iter,
- struct scatterlist *sgl,
- bool write)
-{
- struct page **pages = cmd->tvc_upages;
- struct scatterlist *sg = sgl;
- ssize_t bytes;
- size_t offset;
- unsigned int npages = 0;
-
- bytes = iov_iter_get_pages2(iter, pages, LONG_MAX,
- VHOST_SCSI_PREALLOC_UPAGES, &offset);
- /* No pages were pinned */
- if (bytes <= 0)
- return bytes < 0 ? bytes : -EFAULT;
-
- while (bytes) {
- unsigned n = min_t(unsigned, PAGE_SIZE - offset, bytes);
- sg_set_page(sg++, pages[npages++], n, offset);
- bytes -= n;
- offset = 0;
- }
- return npages;
-}
-
static int
vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls)
{
@@ -661,24 +631,19 @@ vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls)
static int
vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool write,
struct iov_iter *iter,
- struct scatterlist *sg, int sg_count)
+ struct scatterlist *sg, int sg_count,
+ unsigned int *need_unpin)
{
- struct scatterlist *p = sg;
- int ret;
+ struct sg_table sgt = { .sgl = sg };
+ ssize_t ret;
- while (iov_iter_count(iter)) {
- ret = vhost_scsi_map_to_sgl(cmd, iter, sg, write);
- if (ret < 0) {
- while (p < sg) {
- struct page *page = sg_page(p++);
- if (page)
- put_page(page);
- }
- return ret;
- }
- sg += ret;
- }
- return 0;
+ ret = extract_iter_to_sg(iter, LONG_MAX, &sgt, sg_count,
+ write ? WRITE_FROM_ITER : READ_INTO_ITER);
+ if (ret > 0)
+ sg_mark_end(sg + sgt.nents - 1);
+
+ *need_unpin = iov_iter_extract_will_pin(iter);
+ return ret;
}
static int
@@ -702,7 +667,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd,
ret = vhost_scsi_iov_to_sgl(cmd, write, prot_iter,
cmd->tvc_prot_sgl,
- cmd->tvc_prot_sgl_count);
+ cmd->tvc_prot_sgl_count,
+ &cmd->tvc_prot_need_unpin);
if (ret < 0) {
cmd->tvc_prot_sgl_count = 0;
return ret;
@@ -719,7 +685,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd,
cmd->tvc_sgl, cmd->tvc_sgl_count);
ret = vhost_scsi_iov_to_sgl(cmd, write, data_iter,
- cmd->tvc_sgl, cmd->tvc_sgl_count);
+ cmd->tvc_sgl, cmd->tvc_sgl_count,
+ &cmd->tvc_need_unpin);
if (ret < 0) {
cmd->tvc_sgl_count = 0;
return ret;
^ permalink raw reply related [flat|nested] 21+ messages in thread