* [PATCH v8 1/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
@ 2025-08-15 23:29 ` Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 2/9] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:29 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Previously nfs_local_probe() was made to disable and then attempt to
re-enable LOCALIO (via the LOCALIO protocol handshake) if/when it was
called and LOCALIO was already enabled.
The vague recollection of _why_ this was done is that it was useful
if/when a local NFS server was restarted with a local NFS
client connected to it.
But as it happens this causes an absurd amount of LOCALIO flapping
which has a side-effect of too much IO being needlessly sent to NFSD
(using RPC over the loopback network interface). This is the
definition of "serious performance loss" (that negates the point of
having LOCALIO).
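So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
server is restarted (which is an extremely rare thing to do). The
open-failure path in __nfs_local_open_fh() now disables LOCALIO
explicitly before calling nfs_local_probe(), so revalidation still
happens there. Will revisit testing the server restart scenario, but
in the meantime this patch restores the full benefit of LOCALIO.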
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
---
fs/nfs/localio.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index bdb82a19136aa..97abf62f109d2 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
return;
}
- if (nfs_client_is_local(clp)) {
- /* If already enabled, disable and re-enable */
- nfs_localio_disable_client(clp);
- }
+ if (nfs_client_is_local(clp))
+ return;
if (!nfs_uuid_begin(&clp->cl_uuid))
return;
@@ -244,7 +242,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
case -ENOMEM:
case -ENXIO:
case -ENOENT:
- /* Revalidate localio, will disable if unsupported */
+ /* Revalidate localio */
+ nfs_localio_disable_client(clp);
nfs_local_probe(clp);
}
}
--
2.44.0
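The behaviour change in patch 1/9 can be modelled in userspace. This is
only a minimal sketch: struct client_model and the model_*() helpers
are hypothetical stand-ins for struct nfs_client, nfs_local_probe() and
nfs_localio_disable_client(), and the handshake is reduced to a
counter so that flapping becomes visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nfs_client; tracks LOCALIO state only. */
struct client_model {
	bool localio_enabled;	/* models nfs_client_is_local() */
	bool server_is_local;	/* whether a handshake would succeed */
	unsigned int handshakes; /* number of handshakes attempted */
};

/* Models nfs_localio_disable_client(). */
static void model_disable(struct client_model *c)
{
	assert(c);
	c->localio_enabled = false;
}

/* Models the patched nfs_local_probe(): a no-op if already enabled. */
static void model_probe(struct client_model *c)
{
	assert(c);
	if (c->localio_enabled)
		return;
	c->handshakes++;
	c->localio_enabled = c->server_is_local;
}

/* Models the open-failure path: disable explicitly, then re-probe. */
static void model_open_failed(struct client_model *c)
{
	model_disable(c);
	model_probe(c);
}
```

With this model, repeated model_probe() calls on an enabled client
perform no further handshakes; only model_open_failed() forces one.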
* [PATCH v8 2/9] nfs/localio: make trace_nfs_local_open_fh more useful
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 1/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
@ 2025-08-15 23:29 ` Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 3/9] nfs/localio: avoid issuing misaligned IO using O_DIRECT Mike Snitzer
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:29 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Always trigger the trace event when LOCALIO opens a file, not only
when the open fails. The result is now printed last in the event
output.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 5 +++--
fs/nfs/nfstrace.h | 6 +++---
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 97abf62f109d2..42ea50d42c995 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -231,13 +231,13 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfsd_file __rcu **pnf,
const fmode_t mode)
{
+ int status = 0;
struct nfsd_file *localio;
localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
cred, fh, nfl, pnf, mode);
if (IS_ERR(localio)) {
- int status = PTR_ERR(localio);
- trace_nfs_local_open_fh(fh, mode, status);
+ status = PTR_ERR(localio);
switch (status) {
case -ENOMEM:
case -ENXIO:
@@ -247,6 +247,7 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
nfs_local_probe(clp);
}
}
+ trace_nfs_local_open_fh(fh, mode, status);
return localio;
}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 96b1323318c2f..4ec66d5e9cc6c 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1712,10 +1712,10 @@ TRACE_EVENT(nfs_local_open_fh,
),
TP_printk(
- "error=%d fhandle=0x%08x mode=%s",
- __entry->error,
+ "fhandle=0x%08x mode=%s result=%d",
__entry->fhandle,
- show_fs_fmode_flags(__entry->fmode)
+ show_fs_fmode_flags(__entry->fmode),
+ __entry->error
)
);
--
2.44.0
* [PATCH v8 3/9] nfs/localio: avoid issuing misaligned IO using O_DIRECT
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 1/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 2/9] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
@ 2025-08-15 23:29 ` Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 4/9] nfs/localio: refactor iocb and iov_iter_bvec initialization Mike Snitzer
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:29 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO
using O_DIRECT.
Also introduce nfs_iov_iter_aligned_bvec() which is a variant of
iov_iter_aligned_bvec() that also verifies the offset associated with
an iov_iter is DIO-aligned.
NOTE: in a parallel effort, iov_iter_aligned_bvec() is being removed
along with iov_iter_is_aligned().
Lastly, add WARN_ON_ONCE if the underlying filesystem returns -EINVAL
because it was made to try O_DIRECT for IO that is not DIO-aligned
(shouldn't happen, so it's best to be loud if it does).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 66 +++++++++++++++++++++++++++++++++++---
fs/nfsd/localio.c | 11 +++++++
include/linux/nfslocalio.h | 2 ++
3 files changed, 74 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 42ea50d42c995..9b12ddc19485f 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -322,12 +322,10 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
return NULL;
}
+ init_sync_kiocb(&iocb->kiocb, file);
if (localio_O_DIRECT_semantics &&
- test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
- iocb->kiocb.ki_filp = file;
+ test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
iocb->kiocb.ki_flags = IOCB_DIRECT;
- } else
- init_sync_kiocb(&iocb->kiocb, file);
iocb->kiocb.ki_pos = hdr->args.offset;
iocb->hdr = hdr;
@@ -337,6 +335,30 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
return iocb;
}
+static bool nfs_iov_iter_aligned_bvec(const struct iov_iter *i,
+ loff_t offset, unsigned addr_mask, unsigned len_mask)
+{
+ const struct bio_vec *bvec = i->bvec;
+ unsigned skip = i->iov_offset;
+ size_t size = i->count;
+
+ if ((offset | size) & len_mask)
+ return false;
+ do {
+ size_t len = bvec->bv_len;
+
+ if (len > size)
+ len = size;
+ if ((unsigned long)(bvec->bv_offset + skip) & addr_mask)
+ return false;
+ bvec++;
+ size -= len;
+ skip = 0;
+ } while (size);
+
+ return true;
+}
+
static void
nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
{
@@ -346,6 +368,26 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
hdr->args.count + hdr->args.pgbase);
if (hdr->args.pgbase != 0)
iov_iter_advance(i, hdr->args.pgbase);
+
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ u32 nf_dio_mem_align, nf_dio_offset_align, nf_dio_read_offset_align;
+ /* Verify the IO is DIO-aligned as required */
+ nfs_to->nfsd_file_dio_alignment(iocb->localio, &nf_dio_mem_align,
+ &nf_dio_offset_align,
+ &nf_dio_read_offset_align);
+ if (dir == READ)
+ nf_dio_offset_align = nf_dio_read_offset_align;
+
+ if (nf_dio_mem_align && nf_dio_offset_align &&
+ nfs_iov_iter_aligned_bvec(i, hdr->args.offset,
+ nf_dio_mem_align - 1,
+ nf_dio_offset_align - 1))
+ return; /* is DIO-aligned */
+
+ /* Fallback to using buffered for this misaligned IO */
+ iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
+ iocb->kiocb.ki_filp->f_flags &= ~O_DIRECT;
+ }
}
static void
@@ -406,6 +448,14 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
struct nfs_pgio_header *hdr = iocb->hdr;
struct file *filp = iocb->kiocb.ki_filp;
+ if (status < 0) {
+ /* Underlying FS will return -EINVAL if misaligned
+ * DIO is attempted because it shouldn't be.
+ */
+ WARN_ON_ONCE((iocb->kiocb.ki_flags & IOCB_DIRECT) &&
+ status == -EINVAL);
+ }
+
nfs_local_pgio_done(hdr, status);
/*
@@ -607,8 +657,14 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
status = -ENOSPC;
}
- if (status < 0)
+ if (status < 0) {
nfs_reset_boot_verifier(inode);
+ /* Underlying FS will return -EINVAL if misaligned
+ * DIO is attempted because it shouldn't be.
+ */
+ WARN_ON_ONCE((iocb->kiocb.ki_flags & IOCB_DIRECT) &&
+ status == -EINVAL);
+ }
nfs_local_pgio_done(hdr, status);
}
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 269fa9391dc46..be710d809a3ba 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -117,12 +117,23 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
return localio;
}
+static void nfsd_file_dio_alignment(struct nfsd_file *nf,
+ u32 *nf_dio_mem_align,
+ u32 *nf_dio_offset_align,
+ u32 *nf_dio_read_offset_align)
+{
+ *nf_dio_mem_align = nf->nf_dio_mem_align;
+ *nf_dio_offset_align = nf->nf_dio_offset_align;
+ *nf_dio_read_offset_align = nf->nf_dio_read_offset_align;
+}
+
static const struct nfsd_localio_operations nfsd_localio_ops = {
.nfsd_net_try_get = nfsd_net_try_get,
.nfsd_net_put = nfsd_net_put,
.nfsd_open_local_fh = nfsd_open_local_fh,
.nfsd_file_put_local = nfsd_file_put_local,
.nfsd_file_file = nfsd_file_file,
+ .nfsd_file_dio_alignment = nfsd_file_dio_alignment,
};
void nfsd_localio_ops_init(void)
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 59ea90bd136b6..3d91043254e64 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -64,6 +64,8 @@ struct nfsd_localio_operations {
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
struct file *(*nfsd_file_file)(struct nfsd_file *);
+ void (*nfsd_file_dio_alignment)(struct nfsd_file *,
+ u32 *, u32 *, u32 *);
} ____cacheline_aligned;
extern void nfsd_localio_ops_init(void);
--
2.44.0
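The alignment test that patch 3/9 adds in nfs_iov_iter_aligned_bvec()
can be tried outside the kernel. The sketch below assumes a
hypothetical struct seg in place of struct bio_vec and power-of-two
alignments; like the patch, it treats an alignment of zero as "unknown"
and reports the IO as not DIO-aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct bio_vec. */
struct seg {
	uint32_t offset;	/* offset of the data within its page */
	uint32_t len;		/* bytes in this segment */
};

/*
 * Return true if 'total' bytes described by 'segs', starting at
 * 'file_offset', satisfy both the memory and the offset alignment.
 * Segments are assumed to be non-empty and to cover 'total' bytes.
 */
static bool dio_aligned(const struct seg *segs, size_t total,
			uint64_t file_offset,
			uint32_t mem_align, uint32_t offset_align)
{
	uint32_t addr_mask, len_mask;

	/* Zero means the alignment is unknown: fall back to buffered IO */
	if (!mem_align || !offset_align)
		return false;
	assert((mem_align & (mem_align - 1)) == 0);
	assert((offset_align & (offset_align - 1)) == 0);
	addr_mask = mem_align - 1;
	len_mask = offset_align - 1;

	/* File offset and total length must both be offset-aligned */
	if ((file_offset | total) & len_mask)
		return false;

	while (total) {
		size_t len = segs->len < total ? segs->len : total;

		assert(len > 0);
		/* Each segment must start on a memory-aligned boundary */
		if (segs->offset & addr_mask)
			return false;
		total -= len;
		segs++;
	}
	return true;
}
```

A caller would clear its "direct" flag and use buffered IO whenever
dio_aligned() returns false, which is what nfs_local_iter_init() does
with IOCB_DIRECT in the patch.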
* [PATCH v8 4/9] nfs/localio: refactor iocb and iov_iter_bvec initialization
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
` (2 preceding siblings ...)
2025-08-15 23:29 ` [PATCH v8 3/9] nfs/localio: avoid issuing misaligned IO using O_DIRECT Mike Snitzer
@ 2025-08-15 23:29 ` Mike Snitzer
2025-08-15 23:29 ` [PATCH v8 5/9] nfs/localio: refactor iocb initialization Mike Snitzer
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:29 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
nfs_local_iter_init() is updated to follow the same pattern for
initializing LOCALIO's iov_iter_bvec that was established by
nfsd_iter_read().
Other LOCALIO iocb initialization refactoring in this commit offers
incremental cleanup that will be taken further by the next commit.
No functional change.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 69 ++++++++++++++++++++++--------------------------
1 file changed, 32 insertions(+), 37 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 9b12ddc19485f..a2df099b188c4 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -282,23 +282,6 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
}
EXPORT_SYMBOL_GPL(nfs_local_open_fh);
-static struct bio_vec *
-nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
- unsigned int npages, gfp_t flags)
-{
- struct bio_vec *bvec, *p;
-
- bvec = kmalloc_array(npages, sizeof(*bvec), flags);
- if (bvec != NULL) {
- for (p = bvec; npages > 0; p++, pagevec++, npages--) {
- p->bv_page = *pagevec;
- p->bv_len = PAGE_SIZE;
- p->bv_offset = 0;
- }
- }
- return bvec;
-}
-
static void
nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
{
@@ -315,8 +298,9 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
iocb = kmalloc(sizeof(*iocb), flags);
if (iocb == NULL)
return NULL;
- iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
- hdr->page_array.npages, flags);
+
+ iocb->bvec = kmalloc_array(hdr->page_array.npages,
+ sizeof(struct bio_vec), flags);
if (iocb->bvec == NULL) {
kfree(iocb);
return NULL;
@@ -360,14 +344,27 @@ static bool nfs_iov_iter_aligned_bvec(const struct iov_iter *i,
}
static void
-nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int rw)
{
struct nfs_pgio_header *hdr = iocb->hdr;
+ struct page **pagevec = hdr->page_array.pagevec;
+ unsigned long v, total;
+ unsigned int base;
+ size_t len;
- iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
- hdr->args.count + hdr->args.pgbase);
- if (hdr->args.pgbase != 0)
- iov_iter_advance(i, hdr->args.pgbase);
+ v = 0;
+ total = hdr->args.count;
+ base = hdr->args.pgbase;
+ while (total) {
+ len = min_t(size_t, total, PAGE_SIZE - base);
+ bvec_set_page(&iocb->bvec[v], *(pagevec++), len, base);
+ total -= len;
+ ++v;
+ base = 0;
+ }
+ WARN_ON_ONCE(v != hdr->page_array.npages);
+
+ iov_iter_bvec(i, rw, iocb->bvec, v, hdr->args.count);
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
u32 nf_dio_mem_align, nf_dio_offset_align, nf_dio_read_offset_align;
@@ -375,7 +372,7 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
nfs_to->nfsd_file_dio_alignment(iocb->localio, &nf_dio_mem_align,
&nf_dio_offset_align,
&nf_dio_read_offset_align);
- if (dir == READ)
+ if (rw == ITER_DEST)
nf_dio_offset_align = nf_dio_read_offset_align;
if (nf_dio_mem_align && nf_dio_offset_align &&
@@ -500,7 +497,11 @@ static void nfs_local_call_read(struct work_struct *work)
save_cred = override_creds(filp->f_cred);
- nfs_local_iter_init(&iter, iocb, READ);
+ nfs_local_iter_init(&iter, iocb, ITER_DEST);
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+ iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+ }
status = filp->f_op->read_iter(&iocb->kiocb, &iter);
@@ -535,11 +536,6 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
- if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
- iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
- iocb->aio_complete_work = nfs_local_read_aio_complete_work;
- }
-
INIT_WORK(&iocb->work, nfs_local_call_read);
queue_work(nfslocaliod_workqueue, &iocb->work);
@@ -700,7 +696,11 @@ static void nfs_local_call_write(struct work_struct *work)
current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
save_cred = override_creds(filp->f_cred);
- nfs_local_iter_init(&iter, iocb, WRITE);
+ nfs_local_iter_init(&iter, iocb, ITER_SOURCE);
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+ iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+ }
file_start_write(filp);
status = filp->f_op->write_iter(&iocb->kiocb, &iter);
@@ -751,11 +751,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
- if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
- iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
- iocb->aio_complete_work = nfs_local_write_aio_complete_work;
- }
-
INIT_WORK(&iocb->work, nfs_local_call_write);
queue_work(nfslocaliod_workqueue, &iocb->work);
--
2.44.0
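The per-page loop that patch 4/9 introduces in nfs_local_iter_init()
can be sketched in userspace as follows. struct seg and fill_segments()
are hypothetical names; the page size is passed in as a parameter
rather than taken from PAGE_SIZE, and a bound on the output array is
added that the kernel code does not need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one bio_vec entry (page pointer omitted). */
struct seg {
	size_t offset;	/* offset within the page */
	size_t len;	/* bytes used in the page */
};

/*
 * Split 'count' bytes that start 'pgbase' bytes into the first page
 * into per-page segments. Only the first segment has a non-zero
 * offset. Returns the number of segments written.
 */
static size_t fill_segments(struct seg *segs, size_t max_segs,
			    size_t count, size_t pgbase, size_t page_size)
{
	size_t v = 0;
	size_t base = pgbase;

	assert(page_size > 0 && pgbase < page_size);
	while (count && v < max_segs) {
		size_t len = page_size - base;

		if (len > count)
			len = count;
		segs[v].offset = base;
		segs[v].len = len;
		count -= len;
		v++;
		base = 0;	/* later pages are used from their start */
	}
	return v;
}
```

For example, 10000 bytes with pgbase 100 and 4096-byte pages yield
three segments of 3996, 4096 and 1908 bytes, so the iterator can be
set up with exactly count bytes and no iov_iter_advance() step.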
* [PATCH v8 5/9] nfs/localio: refactor iocb initialization
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
` (3 preceding siblings ...)
2025-08-15 23:29 ` [PATCH v8 4/9] nfs/localio: refactor iocb and iov_iter_bvec initialization Mike Snitzer
@ 2025-08-15 23:29 ` Mike Snitzer
2025-08-15 23:30 ` [PATCH v8 6/9] nfs/direct: add misaligned READ handling Mike Snitzer
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:29 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
The goal of this commit's refactoring is to have LOCALIO's per-IO
initialization occur in process context so that we don't get into a
situation where IO fails to be issued from the workqueue (e.g. due to
lack of memory). It is better to have LOCALIO's iocb initialization
fail early.
There isn't an immediate need, but this commit makes it possible for
LOCALIO to fall back to the NFS pagelist code in process context to
allow for immediate retry over RPC.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 95 ++++++++++++++++++++++++++++--------------------
1 file changed, 56 insertions(+), 39 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index a2df099b188c4..b219999afee18 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -36,6 +36,7 @@ struct nfs_local_kiocb {
struct nfs_pgio_header *hdr;
struct work_struct work;
void (*aio_complete_work)(struct work_struct *);
+ struct iov_iter iter ____cacheline_aligned;
struct nfsd_file *localio;
};
@@ -418,12 +419,18 @@ nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
}
static void
-nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+nfs_local_iocb_release(struct nfs_local_kiocb *iocb)
{
- struct nfs_pgio_header *hdr = iocb->hdr;
-
nfs_local_file_put(iocb->localio);
nfs_local_iocb_free(iocb);
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ nfs_local_iocb_release(iocb);
nfs_local_hdr_release(hdr, hdr->task.tk_ops);
}
@@ -492,18 +499,16 @@ static void nfs_local_call_read(struct work_struct *work)
container_of(work, struct nfs_local_kiocb, work);
struct file *filp = iocb->kiocb.ki_filp;
const struct cred *save_cred;
- struct iov_iter iter;
ssize_t status;
save_cred = override_creds(filp->f_cred);
- nfs_local_iter_init(&iter, iocb, ITER_DEST);
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
iocb->aio_complete_work = nfs_local_read_aio_complete_work;
}
- status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iter);
revert_creds(save_cred);
@@ -514,25 +519,14 @@ static void nfs_local_call_read(struct work_struct *work)
}
static int
-nfs_do_local_read(struct nfs_pgio_header *hdr,
- struct nfsd_file *localio,
+nfs_local_do_read(struct nfs_local_kiocb *iocb,
const struct rpc_call_ops *call_ops)
{
- struct nfs_local_kiocb *iocb;
- struct file *file = nfs_to->nfsd_file_file(localio);
-
- /* Don't support filesystems without read_iter */
- if (!file->f_op->read_iter)
- return -EAGAIN;
+ struct nfs_pgio_header *hdr = iocb->hdr;
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
- iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
- if (iocb == NULL)
- return -ENOMEM;
- iocb->localio = localio;
-
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
@@ -690,20 +684,18 @@ static void nfs_local_call_write(struct work_struct *work)
struct file *filp = iocb->kiocb.ki_filp;
unsigned long old_flags = current->flags;
const struct cred *save_cred;
- struct iov_iter iter;
ssize_t status;
current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
save_cred = override_creds(filp->f_cred);
- nfs_local_iter_init(&iter, iocb, ITER_SOURCE);
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
iocb->aio_complete_work = nfs_local_write_aio_complete_work;
}
file_start_write(filp);
- status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iter);
file_end_write(filp);
revert_creds(save_cred);
@@ -717,26 +709,15 @@ static void nfs_local_call_write(struct work_struct *work)
}
static int
-nfs_do_local_write(struct nfs_pgio_header *hdr,
- struct nfsd_file *localio,
+nfs_local_do_write(struct nfs_local_kiocb *iocb,
const struct rpc_call_ops *call_ops)
{
- struct nfs_local_kiocb *iocb;
- struct file *file = nfs_to->nfsd_file_file(localio);
-
- /* Don't support filesystems without write_iter */
- if (!file->f_op->write_iter)
- return -EAGAIN;
+ struct nfs_pgio_header *hdr = iocb->hdr;
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
- iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
- if (iocb == NULL)
- return -ENOMEM;
- iocb->localio = localio;
-
switch (hdr->args.stable) {
default:
break;
@@ -757,32 +738,68 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
return 0;
}
+static struct nfs_local_kiocb *
+nfs_local_iocb_init(struct nfs_pgio_header *hdr, struct nfsd_file *localio)
+{
+ struct file *file = nfs_to->nfsd_file_file(localio);
+ struct nfs_local_kiocb *iocb;
+ gfp_t gfp_mask;
+ int rw;
+
+ if (hdr->rw_mode & FMODE_READ) {
+ if (!file->f_op->read_iter)
+ return ERR_PTR(-EOPNOTSUPP);
+ gfp_mask = GFP_KERNEL;
+ rw = ITER_DEST;
+ } else {
+ if (!file->f_op->write_iter)
+ return ERR_PTR(-EOPNOTSUPP);
+ gfp_mask = GFP_NOIO;
+ rw = ITER_SOURCE;
+ }
+
+ iocb = nfs_local_iocb_alloc(hdr, file, gfp_mask);
+ if (iocb == NULL)
+ return ERR_PTR(-ENOMEM);
+ iocb->hdr = hdr;
+ iocb->localio = localio;
+
+ nfs_local_iter_init(&iocb->iter, iocb, rw);
+
+ return iocb;
+}
+
int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
+ struct nfs_local_kiocb *iocb;
int status = 0;
if (!hdr->args.count)
return 0;
+ iocb = nfs_local_iocb_init(hdr, localio);
+ if (IS_ERR(iocb))
+ return PTR_ERR(iocb);
+
switch (hdr->rw_mode) {
case FMODE_READ:
- status = nfs_do_local_read(hdr, localio, call_ops);
+ status = nfs_local_do_read(iocb, call_ops);
break;
case FMODE_WRITE:
- status = nfs_do_local_write(hdr, localio, call_ops);
+ status = nfs_local_do_write(iocb, call_ops);
break;
default:
dprintk("%s: invalid mode: %d\n", __func__,
hdr->rw_mode);
- status = -EINVAL;
+ status = -EOPNOTSUPP;
}
if (status != 0) {
if (status == -EAGAIN)
nfs_localio_disable_client(clp);
- nfs_local_file_put(localio);
+ nfs_local_iocb_release(iocb);
hdr->task.tk_status = status;
nfs_local_hdr_release(hdr, call_ops);
}
--
2.44.0
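The "fail early, in process context" shape that patch 5/9 aims for can
be illustrated with a small userspace model. Everything here is
hypothetical (struct file_model, struct io_ctx, io_ctx_init(),
io_ctx_release()); it only mirrors the ordering in
nfs_local_iocb_init(): check that the operation is supported, then
allocate, and return an error before any work is queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum io_dir { IO_READ, IO_WRITE };

/* Hypothetical stand-in for the file's f_op capabilities. */
struct file_model {
	bool has_read_iter;
	bool has_write_iter;
};

/* Hypothetical per-IO context, fully set up before it is queued. */
struct io_ctx {
	enum io_dir dir;
	size_t nsegs;
	size_t *seg_len;
};

/* All checks and allocations happen here, so the caller sees failures. */
static int io_ctx_init(struct io_ctx **out, const struct file_model *f,
		       enum io_dir dir, size_t nsegs)
{
	struct io_ctx *ctx;

	assert(out && f && nsegs > 0);
	*out = NULL;
	if (dir == IO_READ ? !f->has_read_iter : !f->has_write_iter)
		return -EOPNOTSUPP;

	ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return -ENOMEM;
	ctx->seg_len = calloc(nsegs, sizeof(*ctx->seg_len));
	if (!ctx->seg_len) {
		free(ctx);
		return -ENOMEM;
	}
	ctx->dir = dir;
	ctx->nsegs = nsegs;
	*out = ctx;
	return 0;
}

/* Single release helper, usable on both success and error paths. */
static void io_ctx_release(struct io_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->seg_len);
	free(ctx);
}
```

Because the context is complete before it is handed to a worker, the
worker only has to issue the IO, and a caller that gets an error back
is still in a position to retry by another path.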
* [PATCH v8 6/9] nfs/direct: add misaligned READ handling
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
` (4 preceding siblings ...)
2025-08-15 23:29 ` [PATCH v8 5/9] nfs/localio: refactor iocb initialization Mike Snitzer
@ 2025-08-15 23:30 ` Mike Snitzer
2025-08-15 23:30 ` [PATCH v8 7/9] nfs/direct: add misaligned WRITE handling Mike Snitzer
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:30 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Because the NFS client will already happily handle misaligned O_DIRECT
IO (by sending it out to NFSD via RPC) this commit's new capabilities
are for the benefit of LOCALIO and require the nfs modparam:
localio_O_DIRECT_align_misaligned_IO=Y
Add the 'localio_O_DIRECT_align_misaligned_IO' modparam, which depends
on localio_O_DIRECT_semantics=Y, to control whether LOCALIO will make a
best effort to transform misaligned IO to DIO-aligned extents when
possible.
When enabled, a misaligned DIO READ is split into a head, middle and
tail as needed. The large middle extent is DIO-aligned and the head
and/or tail are misaligned (due to each being a partial page).
The misaligned head and/or tail extents are issued using buffered IO
and the DIO-aligned middle is issued using O_DIRECT.
A new 'pg_doio_now' flag is added to the nfs_pageio_descriptor struct.
If it is set, nfs_pageio_add_request() will issue all IO up to and
including the nfs_page being added. This allows NFS DIRECT to issue
the misaligned head and/or tail and the middle extents separately.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/direct.c | 89 ++++++++++++++++++++++++++++++++++++----
fs/nfs/internal.h | 13 ++++++
fs/nfs/localio.c | 11 +++++
fs/nfs/pagelist.c | 9 +++-
include/linux/nfs_page.h | 1 +
5 files changed, 113 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 48d89716193a7..fc011571c5d29 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -363,9 +363,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
rsize, &pgbase);
if (result < 0)
break;
-
- bytes = result;
- npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
+ /* Limit the amount of bytes serviced each iteration to aligned batches */
+ if (pos < dreq->middle_offset && dreq->start_len)
+ bytes = min_t(size_t, dreq->start_len, result);
+ else if (pos < dreq->end_offset && dreq->middle_len)
+ bytes = min_t(size_t, dreq->middle_len, result);
+ else
+ bytes = result;
+ npages = (bytes + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
for (i = 0; i < npages; i++) {
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
@@ -376,15 +383,35 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
result = PTR_ERR(req);
break;
}
+
+ pgbase = 0;
+ result -= req_len;
+ bytes -= req_len;
+ requested_bytes += req_len;
+ pos += req_len;
+
+ /* Issue IO if this req was the end of the start or middle */
+ if (bytes == 0) {
+ if ((dreq->start_len &&
+ pos == dreq->middle_offset && result >= dreq->middle_len) ||
+ (dreq->end_len &&
+ pos == dreq->end_offset && result == dreq->end_len))
+ desc.pg_doio_now = 1;
+ }
+
if (!nfs_pageio_add_request(&desc, req)) {
+ desc.pg_doio_now = 0;
result = desc.pg_error;
nfs_release_request(req);
break;
}
- pgbase = 0;
- bytes -= req_len;
- requested_bytes += req_len;
- pos += req_len;
+
+ if (desc.pg_doio_now) {
+ /* Reset and handle iter to next aligned boundary */
+ iov_iter_revert(iter, result);
+ desc.pg_doio_now = 0;
+ break;
+ }
}
nfs_direct_release_pages(pagevec, npages);
kvfree(pagevec);
@@ -409,6 +436,47 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
return requested_bytes;
}
+/*
+ * If localio_O_DIRECT_align_misaligned_IO enabled, split misaligned
+ * IO to a DIO-aligned middle and misaligned head and/or tail.
+ */
+static bool nfs_analyze_dio(loff_t offset, ssize_t len,
+ struct nfs_direct_req *dreq)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ /* Hardcoded to PAGE_SIZE (since don't have LOCALIO nfsd_file's
+ * dio_alignment), works for smaller alignment too (e.g. 512b).
+ */
+ u32 dio_blocksize = PAGE_SIZE;
+ loff_t start_end, orig_end, middle_end;
+
+ /* Return early if feature disabled, if IO is irreparably
+ * misaligned (len < PAGE_SIZE) or if IO is already DIO-aligned.
+ */
+ if (!nfs_localio_O_DIRECT_align_misaligned_IO() ||
+ unlikely(len < dio_blocksize) ||
+ (((offset | len) & (dio_blocksize-1)) == 0))
+ return false;
+
+ start_end = round_up(offset, dio_blocksize);
+ orig_end = offset + len;
+ middle_end = round_down(orig_end, dio_blocksize);
+
+ dreq->io_start = offset;
+ dreq->max_count = orig_end - offset;
+
+ dreq->start_len = start_end - offset;
+ dreq->middle_offset = start_end;
+ dreq->middle_len = middle_end - start_end;
+ dreq->end_offset = middle_end;
+ dreq->end_len = orig_end - middle_end;
+
+ return true;
+#else
+ return false;
+#endif
+}
+
/**
* nfs_file_direct_read - file direct read operation for NFS files
* @iocb: target I/O control block
@@ -456,8 +524,11 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
goto out;
dreq->inode = inode;
- dreq->max_count = count;
- dreq->io_start = iocb->ki_pos;
+ if (swap || !nfs_analyze_dio(iocb->ki_pos, count, dreq)) {
+ dreq->max_count = count;
+ dreq->io_start = iocb->ki_pos;
+ }
+
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 522011eea5f2f..a9b03a5df243d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -469,6 +469,7 @@ extern int nfs_local_commit(struct nfsd_file *,
struct nfs_commit_data *,
const struct rpc_call_ops *, int);
extern bool nfs_server_is_local(const struct nfs_client *clp);
+extern bool nfs_localio_O_DIRECT_align_misaligned_IO(void);
#else /* CONFIG_NFS_LOCALIO */
static inline void nfs_local_probe(struct nfs_client *clp) {}
@@ -497,6 +498,10 @@ static inline bool nfs_server_is_local(const struct nfs_client *clp)
{
return false;
}
+static inline bool nfs_localio_O_DIRECT_align_misaligned_IO(void)
+{
+ return false;
+}
#endif /* CONFIG_NFS_LOCALIO */
/* super.c */
@@ -987,4 +992,12 @@ struct nfs_direct_req {
/* for read */
#define NFS_ODIRECT_SHOULD_DIRTY (3) /* dirty user-space page after read */
#define NFS_ODIRECT_DONE INT_MAX /* write verification failed */
+
+ /* State for expanding/splitting misaligned IO to be DIO-aligned (for LOCALIO) */
+ struct bio_vec * start_extra_bvec;
+ loff_t middle_offset;
+ loff_t end_offset;
+ ssize_t start_len;
+ ssize_t middle_len;
+ ssize_t end_len;
};
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index b219999afee18..89d505e4ef359 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -55,6 +55,11 @@ module_param(localio_O_DIRECT_semantics, bool, 0644);
MODULE_PARM_DESC(localio_O_DIRECT_semantics,
"LOCALIO will use O_DIRECT semantics to filesystem.");
+static bool localio_O_DIRECT_align_misaligned_IO __read_mostly = true;
+module_param(localio_O_DIRECT_align_misaligned_IO, bool, 0644);
+MODULE_PARM_DESC(localio_O_DIRECT_align_misaligned_IO,
+ "If LOCALIO_O_DIRECT_semantics=Y make best effort to transform misaligned IO to DIO-aligned.");
+
static inline bool nfs_client_is_local(const struct nfs_client *clp)
{
return !!rcu_access_pointer(clp->cl_uuid.net);
@@ -66,6 +71,12 @@ bool nfs_server_is_local(const struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
+bool nfs_localio_O_DIRECT_align_misaligned_IO(void)
+{
+ return localio_O_DIRECT_align_misaligned_IO;
+}
+EXPORT_SYMBOL_GPL(nfs_localio_O_DIRECT_align_misaligned_IO);
+
/*
* UUID_IS_LOCAL XDR functions
*/
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 11968dcb72431..b30b1e8f9ff4b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -824,6 +824,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
int io_flags)
{
desc->pg_moreio = 0;
+ desc->pg_doio_now = 0;
desc->pg_inode = inode;
desc->pg_ops = pg_ops;
desc->pg_completion_ops = compl_ops;
@@ -1190,8 +1191,11 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
size = nfs_pageio_do_add_request(desc, subreq);
if (size == subreq_size) {
/* We successfully submitted a request */
- if (subreq == req)
+ if (subreq == req) {
+ if (desc->pg_doio_now)
+ goto doio_now;
break;
+ }
req->wb_pgbase += size;
req->wb_bytes -= size;
req->wb_offset += size;
@@ -1207,12 +1211,15 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
nfs_page_group_lock(req);
}
if (!size) {
+doio_now:
/* Can't coalesce any more, so do I/O */
nfs_page_group_unlock(req);
desc->pg_moreio = 1;
nfs_pageio_doio(desc);
if (desc->pg_error < 0 || mirror->pg_recoalesce)
return 0;
+ if (desc->pg_doio_now)
+ return 1;
/* retry add_request for this subreq */
nfs_page_group_lock(req);
continue;
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 169b4ae30ff47..2e88dc2ff3fea 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -117,6 +117,7 @@ struct nfs_pageio_descriptor {
u32 pg_mirror_idx; /* current mirror */
unsigned short pg_maxretrans;
unsigned char pg_moreio : 1;
+ unsigned char pg_doio_now : 1;
};
/* arbitrarily selected limit to number of mirrors */
--
2.44.0
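The head/middle/tail arithmetic from nfs_analyze_dio() in patch 6/9
can be checked with a userspace sketch. struct dio_split and
analyze_dio() are hypothetical names; the block size is a parameter
here, whereas the patch hardcodes it to PAGE_SIZE, and the modparam
check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields the patch stores in dreq. */
struct dio_split {
	uint64_t start_len;	/* misaligned head, issued buffered */
	uint64_t middle_offset;	/* first aligned offset */
	uint64_t middle_len;	/* DIO-aligned middle, issued O_DIRECT */
	uint64_t end_offset;	/* where the misaligned tail begins */
	uint64_t end_len;	/* misaligned tail, issued buffered */
};

/*
 * Returns false if the IO is shorter than one block or is already
 * aligned; otherwise fills 's' and returns true.
 */
static bool analyze_dio(uint64_t offset, uint64_t len, uint64_t blocksize,
			struct dio_split *s)
{
	uint64_t start_end, orig_end, middle_end;

	assert(s && blocksize && (blocksize & (blocksize - 1)) == 0);
	if (len < blocksize || ((offset | len) & (blocksize - 1)) == 0)
		return false;

	start_end = (offset + blocksize - 1) & ~(blocksize - 1); /* round up */
	orig_end = offset + len;
	middle_end = orig_end & ~(blocksize - 1);		  /* round down */

	s->start_len = start_end - offset;
	s->middle_offset = start_end;
	s->middle_len = middle_end - start_end;
	s->end_offset = middle_end;
	s->end_len = orig_end - middle_end;
	return true;
}
```

For offset 1000 and length 20000 with 4096-byte blocks this gives a
3096-byte head, a 16384-byte middle starting at 4096, and a 520-byte
tail starting at 20480; the three lengths sum to the original length.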
* [PATCH v8 7/9] nfs/direct: add misaligned WRITE handling
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
` (5 preceding siblings ...)
2025-08-15 23:30 ` [PATCH v8 6/9] nfs/direct: add misaligned READ handling Mike Snitzer
@ 2025-08-15 23:30 ` Mike Snitzer
2025-08-15 23:30 ` [PATCH v8 8/9] nfs/direct: add tracepoints for misaligned DIO READ and WRITE support Mike Snitzer
2025-08-15 23:30 ` [PATCH v8 9/9] NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support Mike Snitzer
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:30 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
The NFS client already handles misaligned O_DIRECT IO (by sending it
to NFSD via RPC), so this commit's new capabilities only benefit
LOCALIO. They require the nfs module parameter:
localio_O_DIRECT_align_misaligned_IO=Y
When enabled, a misaligned DIO WRITE is split into a head, middle and
tail as needed. The large middle extent is DIO-aligned and the head
and/or tail are misaligned (due to each being a partial page).
The misaligned head and/or tail extents are issued using buffered IO
and the DIO-aligned middle is issued using O_DIRECT.
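
For illustration, the head/middle/tail split described above can be sketched in userspace C. This is a sketch under stated assumptions, not the code from this series: the alignment is fixed at 4096 bytes (the patches hardcode PAGE_SIZE), and the names sketch_dio_split and sketch_split_dio are hypothetical. It assumes the split rounds the start of the IO up and the end of the IO down to the alignment. Returning false when no aligned middle exists is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed alignment for this sketch; the patches hardcode PAGE_SIZE. */
#define SKETCH_DIO_ALIGN 4096ULL

/* Hypothetical container for the three extents of one IO. */
struct sketch_dio_split {
	uint64_t start_offset, start_len;	/* misaligned head: buffered IO */
	uint64_t middle_offset, middle_len;	/* aligned middle: O_DIRECT */
	uint64_t end_offset, end_len;		/* misaligned tail: buffered IO */
};

/*
 * Split [offset, offset + len) into head, middle and tail.
 * Returns false when the IO does not contain an aligned middle,
 * in which case the caller would treat the IO as one extent.
 */
static bool sketch_split_dio(uint64_t offset, uint64_t len,
			     struct sketch_dio_split *s)
{
	uint64_t orig_end = offset + len;
	/* Round the start up and the end down to the alignment. */
	uint64_t middle_start = (offset + SKETCH_DIO_ALIGN - 1) /
				SKETCH_DIO_ALIGN * SKETCH_DIO_ALIGN;
	uint64_t middle_end = orig_end / SKETCH_DIO_ALIGN * SKETCH_DIO_ALIGN;

	if (middle_start >= middle_end)
		return false;

	s->start_offset = offset;
	s->start_len = middle_start - offset;
	s->middle_offset = middle_start;
	s->middle_len = middle_end - middle_start;
	s->end_offset = middle_end;
	s->end_len = orig_end - middle_end;
	return true;
}
```

With these assumptions, a 20000-byte WRITE at offset 1000 splits into a 3096-byte head at 1000, a 16384-byte middle at 4096 and a 520-byte tail at 20480.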
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/direct.c | 40 +++++++++++++++++++++++++++++++++++-----
1 file changed, 35 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index fc011571c5d29..3803289a94793 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -963,8 +963,15 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
if (result < 0)
break;
- bytes = result;
- npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+ /* Limit the number of bytes serviced each iteration to aligned batches */
+ if (pos < dreq->middle_offset && dreq->start_len)
+ bytes = min_t(size_t, dreq->start_len, result);
+ else if (pos < dreq->end_offset && dreq->middle_len)
+ bytes = min_t(size_t, dreq->middle_len, result);
+ else
+ bytes = result;
+ npages = (bytes + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
for (i = 0; i < npages; i++) {
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
@@ -983,6 +990,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
}
pgbase = 0;
+ result -= req_len;
bytes -= req_len;
requested_bytes += req_len;
pos += req_len;
@@ -992,9 +1000,28 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
continue;
}
+ /* Issue IO if this req was the end of the start or middle */
+ if (bytes == 0) {
+ if ((dreq->start_len &&
+ pos == dreq->middle_offset && result >= dreq->middle_len) ||
+ (dreq->end_len &&
+ pos == dreq->end_offset && result == dreq->end_len))
+ desc.pg_doio_now = 1;
+ }
+
nfs_lock_request(req);
- if (nfs_pageio_add_request(&desc, req))
+ if (nfs_pageio_add_request(&desc, req)) {
+ if (desc.pg_doio_now) {
+ /* Revert the unconsumed bytes so iter resumes at the aligned boundary */
+ iov_iter_revert(iter, result);
+ desc.pg_doio_now = 0;
+ break;
+ }
continue;
+ }
+
+ if (unlikely(desc.pg_doio_now))
+ desc.pg_doio_now = 0;
/* Exit on hard errors */
if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) {
@@ -1092,8 +1119,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
goto out;
dreq->inode = inode;
- dreq->max_count = count;
- dreq->io_start = pos;
+ if (swap || !nfs_analyze_dio(pos, count, dreq)) {
+ dreq->max_count = count;
+ dreq->io_start = pos;
+ }
+
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
--
2.44.0
* [PATCH v8 8/9] nfs/direct: add tracepoints for misaligned DIO READ and WRITE support
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
` (6 preceding siblings ...)
2025-08-15 23:30 ` [PATCH v8 7/9] nfs/direct: add misaligned WRITE handling Mike Snitzer
@ 2025-08-15 23:30 ` Mike Snitzer
2025-08-15 23:30 ` [PATCH v8 9/9] NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support Mike Snitzer
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:30 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Add nfs_analyze_dio_class and use it to create nfs_analyze_read_dio
and nfs_analyze_write_dio trace events.
These trace events show how the NFS client splits a given misaligned
IO into a mix of misaligned head and/or tail extents and a DIO-aligned
middle extent. The misaligned head and/or tail extents are issued
using buffered IO and the DIO-aligned middle is issued using O_DIRECT.
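
As a reading aid, the extent portion of these events can be reproduced in userspace C. This is only a sketch: sketch_extents and sketch_format_extents are hypothetical names, the layout mirrors the start=/middle=/end= fields of the TP_printk in this patch, and the sample values in the usage note are computed by hand assuming a 4096-byte alignment. They are not captured from a real trace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the extent fields recorded by the trace event. */
struct sketch_extents {
	unsigned long long start, start_len;	/* misaligned head */
	unsigned long long middle, middle_len;	/* DIO-aligned middle */
	unsigned long long end, end_len;	/* misaligned tail */
};

/*
 * Format the extents as "offset+length" pairs, in the same order as
 * the trace event prints them. Returns what snprintf() returns.
 */
static int sketch_format_extents(char *buf, size_t size,
				 const struct sketch_extents *e)
{
	return snprintf(buf, size,
			"start=%llu+%llu middle=%llu+%llu end=%llu+%llu",
			e->start, e->start_len,
			e->middle, e->middle_len,
			e->end, e->end_len);
}
```

For a 20000-byte IO at offset 1000, the extents { 1000, 3096, 4096, 16384, 20480, 520 } format as "start=1000+3096 middle=4096+16384 end=20480+520". Each pair reads as offset+length, and the three lengths sum to the original count.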
This combination of trace events is useful for LOCALIO DIO READs:
echo 1 > /sys/kernel/tracing/events/nfs/nfs_analyze_read_dio/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_read/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_readpage_done/enable
echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable
This combination of trace events is useful for LOCALIO DIO WRITEs:
echo 1 > /sys/kernel/tracing/events/nfs/nfs_analyze_write_dio/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable
echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/direct.c | 10 +++++---
fs/nfs/nfstrace.h | 58 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 3803289a94793..012f5bfa15c21 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -441,7 +441,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
* IO to a DIO-aligned middle and misaligned head and/or tail.
*/
static bool nfs_analyze_dio(loff_t offset, ssize_t len,
- struct nfs_direct_req *dreq)
+ struct nfs_direct_req *dreq, int rw)
{
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
/* Hardcoded to PAGE_SIZE (since don't have LOCALIO nfsd_file's
@@ -471,6 +471,10 @@ static bool nfs_analyze_dio(loff_t offset, ssize_t len,
dreq->end_offset = middle_end;
dreq->end_len = orig_end - middle_end;
+ if (rw == READ)
+ trace_nfs_analyze_read_dio(offset, len, dreq);
+ else
+ trace_nfs_analyze_write_dio(offset, len, dreq);
return true;
#else
return false;
@@ -524,7 +528,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
goto out;
dreq->inode = inode;
- if (swap || !nfs_analyze_dio(iocb->ki_pos, count, dreq)) {
+ if (swap || !nfs_analyze_dio(iocb->ki_pos, count, dreq, READ)) {
dreq->max_count = count;
dreq->io_start = iocb->ki_pos;
}
@@ -1119,7 +1123,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
goto out;
dreq->inode = inode;
- if (swap || !nfs_analyze_dio(pos, count, dreq)) {
+ if (swap || !nfs_analyze_dio(pos, count, dreq, WRITE)) {
dreq->max_count = count;
dreq->io_start = pos;
}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 4ec66d5e9cc6c..ec4c0f073361a 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1598,6 +1598,64 @@ DEFINE_NFS_DIRECT_REQ_EVENT(nfs_direct_write_completion);
DEFINE_NFS_DIRECT_REQ_EVENT(nfs_direct_write_schedule_iovec);
DEFINE_NFS_DIRECT_REQ_EVENT(nfs_direct_write_reschedule_io);
+DECLARE_EVENT_CLASS(nfs_analyze_dio_class,
+ TP_PROTO(
+ loff_t offset,
+ ssize_t count,
+ const struct nfs_direct_req *dreq
+ ),
+ TP_ARGS(offset, count, dreq),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(u64, fileid)
+ __field(u32, fhandle)
+ __field(loff_t, offset)
+ __field(ssize_t, count)
+ __field(loff_t, start)
+ __field(ssize_t, start_len)
+ __field(loff_t, middle)
+ __field(ssize_t, middle_len)
+ __field(loff_t, end)
+ __field(ssize_t, end_len)
+ ),
+ TP_fast_assign(
+ const struct inode *inode = dreq->inode;
+ const struct nfs_inode *nfsi = NFS_I(inode);
+ const struct nfs_fh *fh = &nfsi->fh;
+
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->fileid = nfsi->fileid;
+ __entry->fhandle = nfs_fhandle_hash(fh);
+ __entry->offset = offset;
+ __entry->count = count;
+ __entry->start = dreq->io_start;
+ __entry->start_len = dreq->start_len;
+ __entry->middle = dreq->middle_offset;
+ __entry->middle_len = dreq->middle_len;
+ __entry->end = dreq->end_offset;
+ __entry->end_len = dreq->end_len;
+ ),
+ TP_printk("fileid=%02x:%02x:%llu fhandle=0x%08x "
+ "offset=%lld count=%zd "
+ "start=%lld+%zd middle=%lld+%zd end=%lld+%zd",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long long)__entry->fileid,
+ __entry->fhandle, __entry->offset, __entry->count,
+ __entry->start, __entry->start_len,
+ __entry->middle, __entry->middle_len,
+ __entry->end, __entry->end_len)
+)
+
+#define DEFINE_NFS_ANALYZE_DIO_EVENT(name) \
+DEFINE_EVENT(nfs_analyze_dio_class, nfs_analyze_##name##_dio, \
+ TP_PROTO(loff_t offset, \
+ ssize_t count, \
+ const struct nfs_direct_req *dreq), \
+ TP_ARGS(offset, count, dreq))
+
+DEFINE_NFS_ANALYZE_DIO_EVENT(read);
+DEFINE_NFS_ANALYZE_DIO_EVENT(write);
+
TRACE_EVENT(nfs_fh_to_dentry,
TP_PROTO(
const struct super_block *sb,
--
2.44.0
* [PATCH v8 9/9] NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
2025-08-15 23:29 [PATCH v8 0/9] NFS DIRECT: align misaligned DIO for LOCALIO Mike Snitzer
` (7 preceding siblings ...)
2025-08-15 23:30 ` [PATCH v8 8/9] nfs/direct: add tracepoints for misaligned DIO READ and WRITE support Mike Snitzer
@ 2025-08-15 23:30 ` Mike Snitzer
8 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2025-08-15 23:30 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
NFS doesn't have DIO alignment constraints, so have NFS respond with
accommodating DIO alignment attributes (rather than plumb in GETATTR
support for STATX_DIOALIGN and STATX_DIO_READ_ALIGN).
The most coarse-grained dio_offset_align is the most accommodating
(e.g. PAGE_SIZE; larger values may be supported in future).
With this support in place, NFS reexport can handle unaligned DIO
(NFSD's NFSD_IO_DIRECT support requires that the underlying filesystem
support STATX_DIOALIGN and/or STATX_DIO_READ_ALIGN).
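
To show how a consumer might apply the reported attributes, here is a small userspace sketch in C. It assumes the values this patch reports (a memory alignment of 4 and an offset alignment of PAGE_SIZE, taken as 4096 here). The names sketch_dioalign and sketch_dio_ok are hypothetical. A real caller would obtain the values from statx(2) with STATX_DIOALIGN in the request mask rather than hardcoding them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copy of the DIO alignment attributes reported via statx. */
struct sketch_dioalign {
	uint32_t mem_align;	/* required alignment of the buffer address */
	uint32_t offset_align;	/* required alignment of offset and length */
};

/*
 * Return true if an IO with this buffer address, file offset and
 * length satisfies the reported alignment. A zero alignment means
 * direct IO is not supported on the file, so nothing qualifies.
 */
static bool sketch_dio_ok(const struct sketch_dioalign *a, uintptr_t buf,
			  uint64_t offset, uint64_t len)
{
	if (!a->mem_align || !a->offset_align)
		return false;

	return buf % a->mem_align == 0 &&
	       offset % a->offset_align == 0 &&
	       len % a->offset_align == 0;
}
```

With { .mem_align = 4, .offset_align = 4096 }, an 8192-byte IO at offset 4096 from a 4-byte-aligned buffer qualifies, while the same IO at offset 1000 does not. This is why the coarsest offset alignment is the most accommodating answer: an IO that satisfies PAGE_SIZE alignment also satisfies any finer power-of-two alignment.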
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/inode.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 338ef77ae4230..7866d60b18452 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1066,6 +1066,21 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
if (S_ISDIR(inode->i_mode))
stat->blksize = NFS_SERVER(inode)->dtsize;
stat->btime = NFS_I(inode)->btime;
+
+ /* Special handling for STATX_DIOALIGN and STATX_DIO_READ_ALIGN
+ * - NFS doesn't have DIO alignment constraints, avoid getting
+ * these DIO attrs from remote and just respond with most
+ * accommodating limits (so client will issue supported DIO).
+ * - this is unintuitive, but the most coarse-grained
+ * dio_offset_align is the most accommodating.
+ */
+ if ((request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN)) &&
+ S_ISREG(inode->i_mode)) {
+ stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
+ stat->dio_mem_align = 4; /* 4-byte alignment */
+ stat->dio_offset_align = PAGE_SIZE;
+ stat->dio_read_offset_align = stat->dio_offset_align;
+ }
out:
trace_nfs_getattr_exit(inode, err);
return err;
--
2.44.0