* [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts
@ 2025-09-24 16:20 Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
` (10 more replies)
0 siblings, 11 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
This patch series introduces support for striped layouts:
The first two patches are simple preparation changes. They should have
no functional impact on the code.
The 3rd patch refactors the nfs4_ff_layout_mirror struct to have an
array of a new nfs4_ff_layout_ds_stripe type. The
nfs4_ff_layout_ds_stripe has all the contents of ff_data_server4 per
the flexfiles RFC (RFC 8435). I called it ds_stripe because ds was
already taken by the deviceid side of the code.
Patches 4-8 update various paths to be dss_id aware. Most of this
consists of either adding a new parameter to a function or adding a
loop, depending on which is appropriate.
The final patch 9 updates the layout creation path to populate the
array and turns the feature on.
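For readers new to striping, here is a rough sketch of what "dss_id
aware" can mean in practice: mapping a file offset to a stripe index
within one mirror. The round-robin formula and the helper name
sketch_offset_to_dss_id are assumptions for illustration only; they are
not taken from the patches in this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of this series: pick the data server
 * stripe that holds a given byte offset, assuming plain round-robin
 * striping in units of stripe_unit bytes.
 */
static uint32_t sketch_offset_to_dss_id(uint64_t offset, uint32_t stripe_unit,
					uint32_t dss_count)
{
	assert(stripe_unit != 0 && dss_count != 0);
	/* Keep the dividend 64-bit: file offsets do not fit in 32 bits. */
	return (uint32_t)((offset / stripe_unit) % dss_count);
}
```

With dss_count == 1 this always yields stripe 0, which matches the
unstriped behaviour the preparation patches preserve.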
v1:
- Fixes function parameter 'dss_id' not described in
'nfs4_ff_layout_prepare_ds'
v2:
- Fixes layout stat error reporting path for commit to properly
calculate dss_id.
v3:
- Fixes do_div dividend to be u64.
v4:
- Use regular division operators for u32 commit path math.
- Fix mirror null check in ff_rw_layout_has_available_ds.
Jonathan Curley (9):
NFSv4/flexfiles: Remove cred local variable dependency
NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
NFSv4/flexfiles: Add data structure support for striped layouts
NFSv4/flexfiles: Update low level helper functions to be DS stripe
aware.
NFSv4/flexfiles: Read path updates for striped layouts
NFSv4/flexfiles: Commit path updates for striped layouts
NFSv4/flexfiles: Write path updates for striped layouts
NFSv4/flexfiles: Update layout stats & error paths for striped layouts
NFSv4/flexfiles: Add support for striped layouts
fs/nfs/flexfilelayout/flexfilelayout.c | 778 +++++++++++++++-------
fs/nfs/flexfilelayout/flexfilelayout.h | 64 +-
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 105 +--
fs/nfs/write.c | 2 +-
4 files changed, 635 insertions(+), 314 deletions(-)
--
2.34.1
* [RFC PATCH v4 1/9] NFSv4/flexfiles: Remove cred local variable dependency
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit Jonathan Curley
` (9 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
No-op preparation change that removes the dependency on the cred local
variable. The subsequent striping patch has one cred per stripe, so this
local variable can no longer be trusted to hold the right value.
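The swap pattern in question can be illustrated in userspace with C11
atomics standing in for the kernel's xchg() and rcu_assign_pointer().
This is only a sketch: the struct and function names below are
hypothetical, and the value handed to the existing mirror is read from
the new mirror itself rather than from a separate local variable.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a mirror that owns one credential pointer. */
struct sketch_cred_holder {
	_Atomic(const void *) ro_cred;
};

/*
 * Give the existing holder the fresh credential and park the old one
 * in the fresh holder, so that freeing the fresh holder releases the
 * old credential.
 */
static void sketch_swap_cred(struct sketch_cred_holder *existing,
			     struct sketch_cred_holder *fresh)
{
	const void *old;

	assert(existing && fresh);
	old = atomic_exchange(&existing->ro_cred,
			      atomic_load(&fresh->ro_cred));
	atomic_store(&fresh->ro_cred, old);
}
```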
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 4bea008dbebd..a437d20ebcdf 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -532,10 +532,10 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
if (mirror != fls->mirror_array[i]) {
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, cred);
+ cred = xchg(&mirror->ro_cred, fls->mirror_array[i]->ro_cred);
rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
} else {
- cred = xchg(&mirror->rw_cred, cred);
+ cred = xchg(&mirror->rw_cred, fls->mirror_array[i]->rw_cred);
rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
}
ff_layout_free_mirror(fls->mirror_array[i]);
--
2.34.1
* [RFC PATCH v4 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 3/9] NFSv4/flexfiles: Add data structure support for striped layouts Jonathan Curley
` (8 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Correct this path to use ds_commit_idx. This is another no-op
preparation change: in the current code commit_idx == mirror_idx, but
that will no longer be true once striping is enabled.
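To see why the two indices diverge, consider one plausible numbering
scheme in which every (mirror, stripe) pair gets its own commit index.
The flattening below and all three helper names are assumptions made
for illustration; the series may compute ds_commit_idx differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: flatten (mirror_idx, dss_id) into one commit index. */
static uint32_t sketch_commit_idx(uint32_t mirror_idx, uint32_t dss_id,
				  uint32_t dss_count)
{
	assert(dss_count != 0 && dss_id < dss_count);
	return mirror_idx * dss_count + dss_id;
}

/* Hypothetical inverse: recover the mirror index. */
static uint32_t sketch_mirror_idx_from_commit(uint32_t commit_idx,
					      uint32_t dss_count)
{
	assert(dss_count != 0);
	return commit_idx / dss_count;
}

/* Hypothetical inverse: recover the stripe index. */
static uint32_t sketch_dss_id_from_commit(uint32_t commit_idx,
					  uint32_t dss_count)
{
	assert(dss_count != 0);
	return commit_idx % dss_count;
}
```

Under this assumed scheme, dss_count == 1 gives commit_idx ==
mirror_idx, which is the unstriped case described above.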
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 374fc6b34c79..422bb817cc85 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -977,7 +977,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
req->wb_nio = 0;
memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
nfs_mark_request_commit(req, hdr->lseg, &cinfo,
- hdr->pgio_mirror_idx);
+ hdr->ds_commit_idx);
goto next;
}
remove_req:
--
2.34.1
* [RFC PATCH v4 3/9] NFSv4/flexfiles: Add data structure support for striped layouts
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware Jonathan Curley
` (7 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Adds a new struct nfs4_ff_layout_ds_stripe that represents a data
server stripe within a layout. A new dynamically allocated array of
this type has been added to nfs4_ff_layout_mirror, and per-stripe
configuration information has been moved from the mirror type to the
stripe type, following the RFC.
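The resulting shape, a mirror that owns a dynamically allocated array
of stripes, can be sketched in userspace as follows. The types and
functions are simplified, hypothetical stand-ins (calloc/free instead of
kcalloc/kfree, only two illustrative fields) and are not the kernel
definitions from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stripe: per-stripe state lives here. */
struct sketch_ds_stripe {
	uint32_t efficiency;
	uint32_t fh_versions_cnt;
};

/* Hypothetical, trimmed-down mirror: owns dss_count stripes. */
struct sketch_striped_mirror {
	uint32_t dss_count;
	struct sketch_ds_stripe *dss;
};

/* Allocate a mirror and its zeroed stripe array; NULL on failure. */
static struct sketch_striped_mirror *sketch_alloc_mirror(uint32_t dss_count)
{
	struct sketch_striped_mirror *m;

	assert(dss_count > 0);
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->dss = calloc(dss_count, sizeof(*m->dss));
	if (!m->dss) {
		/* Do not leave a mirror around without its stripes. */
		free(m);
		return NULL;
	}
	m->dss_count = dss_count;
	return m;
}

/* Free the stripe array before the mirror that points at it. */
static void sketch_free_mirror(struct sketch_striped_mirror *m)
{
	if (!m)
		return;
	free(m->dss);
	free(m);
}
```

Checking the array allocation before touching dss[0] is the point of
the sketch: every per-stripe access assumes the array exists.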
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 134 ++++++++++++----------
fs/nfs/flexfilelayout/flexfilelayout.h | 27 +++--
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 54 ++++-----
3 files changed, 117 insertions(+), 98 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index a437d20ebcdf..46a765bf05c3 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -171,7 +171,7 @@ ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
- return nfs_local_open_fh(clp, cred, fh, &mirror->nfl, mode);
+ return nfs_local_open_fh(clp, cred, fh, &mirror->dss[0].nfl, mode);
#else
return NULL;
#endif
@@ -182,13 +182,13 @@ static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
{
int i, j;
- if (m1->fh_versions_cnt != m2->fh_versions_cnt)
+ if (m1->dss[0].fh_versions_cnt != m2->dss[0].fh_versions_cnt)
return false;
- for (i = 0; i < m1->fh_versions_cnt; i++) {
+ for (i = 0; i < m1->dss[0].fh_versions_cnt; i++) {
bool found_fh = false;
- for (j = 0; j < m2->fh_versions_cnt; j++) {
- if (nfs_compare_fh(&m1->fh_versions[i],
- &m2->fh_versions[j]) == 0) {
+ for (j = 0; j < m2->dss[0].fh_versions_cnt; j++) {
+ if (nfs_compare_fh(&m1->dss[0].fh_versions[i],
+ &m2->dss[0].fh_versions[j]) == 0) {
found_fh = true;
break;
}
@@ -209,7 +209,8 @@ ff_layout_add_mirror(struct pnfs_layout_hdr *lo,
spin_lock(&inode->i_lock);
list_for_each_entry(pos, &ff_layout->mirrors, mirrors) {
- if (memcmp(&mirror->devid, &pos->devid, sizeof(pos->devid)) != 0)
+ if (memcmp(&mirror->dss[0].devid, &pos->dss[0].devid,
+ sizeof(pos->dss[0].devid)) != 0)
continue;
if (!ff_mirror_match_fh(mirror, pos))
continue;
@@ -246,23 +247,27 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
- nfs_localio_file_init(&mirror->nfl);
+ nfs_localio_file_init(&mirror->dss[0].nfl);
}
return mirror;
}
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
- const struct cred *cred;
+ const struct cred *cred;
+ int dss_id = 0;
ff_layout_remove_mirror(mirror);
- kfree(mirror->fh_versions);
- nfs_close_local_fh(&mirror->nfl);
- cred = rcu_access_pointer(mirror->ro_cred);
+
+ kfree(mirror->dss[dss_id].fh_versions);
+ nfs_close_local_fh(&mirror->dss[dss_id].nfl);
+ cred = rcu_access_pointer(mirror->dss[dss_id].ro_cred);
put_cred(cred);
- cred = rcu_access_pointer(mirror->rw_cred);
+ cred = rcu_access_pointer(mirror->dss[dss_id].rw_cred);
put_cred(cred);
- nfs4_ff_layout_put_deviceid(mirror->mirror_ds);
+ nfs4_ff_layout_put_deviceid(mirror->dss[dss_id].mirror_ds);
+
+ kfree(mirror->dss);
kfree(mirror);
}
@@ -372,8 +377,8 @@ static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
for (j = i + 1; j < fls->mirror_array_cnt; j++)
- if (fls->mirror_array[i]->efficiency <
- fls->mirror_array[j]->efficiency)
+ if (fls->mirror_array[i]->dss[0].efficiency <
+ fls->mirror_array[j]->dss[0].efficiency)
swap(fls->mirror_array[i],
fls->mirror_array[j]);
}
@@ -427,23 +432,25 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
fls->mirror_array_cnt = mirror_array_cnt;
fls->stripe_unit = stripe_unit;
+ u32 dss_count = 0;
for (i = 0; i < fls->mirror_array_cnt; i++) {
struct nfs4_ff_layout_mirror *mirror;
struct cred *kcred;
const struct cred __rcu *cred;
kuid_t uid;
kgid_t gid;
- u32 ds_count, fh_count, id;
- int j;
+ u32 fh_count, id;
+ int j, dss_id = 0;
rc = -EIO;
p = xdr_inline_decode(&stream, 4);
if (!p)
goto out_err_free;
- ds_count = be32_to_cpup(p);
+
+ dss_count = be32_to_cpup(p);
/* FIXME: allow for striping? */
- if (ds_count != 1)
+ if (dss_count != 1)
goto out_err_free;
fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags);
@@ -452,10 +459,13 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
goto out_err_free;
}
- fls->mirror_array[i]->ds_count = ds_count;
+ fls->mirror_array[i]->dss_count = dss_count;
+ fls->mirror_array[i]->dss =
+ kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
+ gfp_flags);
/* deviceid */
- rc = decode_deviceid(&stream, &fls->mirror_array[i]->devid);
+ rc = decode_deviceid(&stream, &fls->mirror_array[i]->dss[dss_id].devid);
if (rc)
goto out_err_free;
@@ -464,10 +474,10 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
p = xdr_inline_decode(&stream, 4);
if (!p)
goto out_err_free;
- fls->mirror_array[i]->efficiency = be32_to_cpup(p);
+ fls->mirror_array[i]->dss[dss_id].efficiency = be32_to_cpup(p);
/* stateid */
- rc = decode_pnfs_stateid(&stream, &fls->mirror_array[i]->stateid);
+ rc = decode_pnfs_stateid(&stream, &fls->mirror_array[i]->dss[dss_id].stateid);
if (rc)
goto out_err_free;
@@ -478,22 +488,22 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
goto out_err_free;
fh_count = be32_to_cpup(p);
- fls->mirror_array[i]->fh_versions =
- kcalloc(fh_count, sizeof(struct nfs_fh),
- gfp_flags);
- if (fls->mirror_array[i]->fh_versions == NULL) {
+ fls->mirror_array[i]->dss[dss_id].fh_versions =
+ kcalloc(fh_count, sizeof(struct nfs_fh),
+ gfp_flags);
+ if (fls->mirror_array[i]->dss[dss_id].fh_versions == NULL) {
rc = -ENOMEM;
goto out_err_free;
}
for (j = 0; j < fh_count; j++) {
rc = decode_nfs_fh(&stream,
- &fls->mirror_array[i]->fh_versions[j]);
+ &fls->mirror_array[i]->dss[dss_id].fh_versions[j]);
if (rc)
goto out_err_free;
}
- fls->mirror_array[i]->fh_versions_cnt = fh_count;
+ fls->mirror_array[i]->dss[dss_id].fh_versions_cnt = fh_count;
/* user */
rc = decode_name(&stream, &id);
@@ -524,19 +534,21 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
cred = RCU_INITIALIZER(kcred);
if (lgr->range.iomode == IOMODE_READ)
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
else
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, fls->mirror_array[i]->ro_cred);
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ cred = xchg(&mirror->dss[dss_id].ro_cred,
+ fls->mirror_array[i]->dss[dss_id].ro_cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
} else {
- cred = xchg(&mirror->rw_cred, fls->mirror_array[i]->rw_cred);
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ cred = xchg(&mirror->dss[dss_id].rw_cred,
+ fls->mirror_array[i]->dss[dss_id].rw_cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
@@ -624,8 +636,8 @@ nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
struct nfs4_flexfile_layout *ffl = FF_LAYOUT_FROM_HDR(mirror->layout);
nfs4_ff_start_busy_timer(&layoutstat->busy_timer, now);
- if (!mirror->start_time)
- mirror->start_time = now;
+ if (!mirror->dss[0].start_time)
+ mirror->dss[0].start_time = now;
if (mirror->report_interval != 0)
report_interval = (s64)mirror->report_interval * 1000LL;
else if (layoutstats_timer != 0)
@@ -680,8 +692,8 @@ nfs4_ff_layout_stat_io_start_read(struct inode *inode,
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror, &mirror->read_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->read_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].read_stat, now);
+ nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].read_stat, requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -696,7 +708,7 @@ nfs4_ff_layout_stat_io_end_read(struct rpc_task *task,
__u64 completed)
{
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->read_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].read_stat,
requested, completed,
ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
@@ -711,8 +723,8 @@ nfs4_ff_layout_stat_io_start_write(struct inode *inode,
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror , &mirror->write_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->write_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].write_stat, now);
+ nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].write_stat, requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -731,7 +743,7 @@ nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
requested = completed = 0;
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->write_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].write_stat,
requested, completed, ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -773,7 +785,7 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
continue;
if (check_device &&
- nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node))
+ nfs4_test_deviceid_unavailable(&mirror->dss[0].mirror_ds->id_node))
continue;
*best_idx = idx;
@@ -879,7 +891,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
pgm = &pgio->pg_mirrors[0];
- pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
+ pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].rsize;
pgio->pg_mirror_idx = ds_idx;
return;
@@ -951,7 +963,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
goto retry;
}
pgm = &pgio->pg_mirrors[i];
- pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
+ pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].wsize;
}
if (NFS_SERVER(pgio->pg_inode)->flags &
@@ -2021,7 +2033,7 @@ select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
/* FIXME: Assume that there is only one NFS version available
* for the DS.
*/
- return &flseg->mirror_array[i]->fh_versions[0];
+ return &flseg->mirror_array[i]->dss[0].fh_versions[0];
}
static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
@@ -2137,10 +2149,10 @@ static void ff_layout_cancel_io(struct pnfs_layout_segment *lseg)
for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
mirror = flseg->mirror_array[idx];
- mirror_ds = mirror->mirror_ds;
+ mirror_ds = mirror->dss[0].mirror_ds;
if (IS_ERR_OR_NULL(mirror_ds))
continue;
- ds = mirror->mirror_ds->ds;
+ ds = mirror->dss[0].mirror_ds->ds;
if (!ds)
continue;
ds_clp = ds->ds_clp;
@@ -2541,8 +2553,8 @@ ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
struct nfs4_ff_layout_mirror *mirror)
{
struct nfs4_pnfs_ds_addr *da;
- struct nfs4_pnfs_ds *ds = mirror->mirror_ds->ds;
- struct nfs_fh *fh = &mirror->fh_versions[0];
+ struct nfs4_pnfs_ds *ds = mirror->dss[0].mirror_ds->ds;
+ struct nfs_fh *fh = &mirror->dss[0].fh_versions[0];
__be32 *p;
da = list_first_entry(&ds->ds_addrs, struct nfs4_pnfs_ds_addr, da_node);
@@ -2555,12 +2567,12 @@ ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
xdr_encode_opaque(p, fh->data, fh->size);
/* ff_io_latency4 read */
spin_lock(&mirror->lock);
- ff_layout_encode_io_latency(xdr, &mirror->read_stat.io_stat);
+ ff_layout_encode_io_latency(xdr, &mirror->dss[0].read_stat.io_stat);
/* ff_io_latency4 write */
- ff_layout_encode_io_latency(xdr, &mirror->write_stat.io_stat);
+ ff_layout_encode_io_latency(xdr, &mirror->dss[0].write_stat.io_stat);
spin_unlock(&mirror->lock);
/* nfstime4 */
- ff_layout_encode_nfstime(xdr, ktime_sub(ktime_get(), mirror->start_time));
+ ff_layout_encode_nfstime(xdr, ktime_sub(ktime_get(), mirror->dss[0].start_time));
/* bool */
p = xdr_reserve_space(xdr, 4);
*p = cpu_to_be32(false);
@@ -2607,7 +2619,7 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
list_for_each_entry(mirror, &ff_layout->mirrors, mirrors) {
if (i >= dev_limit)
break;
- if (IS_ERR_OR_NULL(mirror->mirror_ds))
+ if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
continue;
if (!test_and_clear_bit(NFS4_FF_MIRROR_STAT_AVAIL,
&mirror->flags) &&
@@ -2616,15 +2628,15 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
/* mirror refcount put in cleanup_layoutstats */
if (!refcount_inc_not_zero(&mirror->ref))
continue;
- dev = &mirror->mirror_ds->id_node;
+ dev = &mirror->dss[0].mirror_ds->id_node;
memcpy(&devinfo->dev_id, &dev->deviceid, NFS4_DEVICEID4_SIZE);
devinfo->offset = 0;
devinfo->length = NFS4_MAX_UINT64;
spin_lock(&mirror->lock);
- devinfo->read_count = mirror->read_stat.io_stat.ops_completed;
- devinfo->read_bytes = mirror->read_stat.io_stat.bytes_completed;
- devinfo->write_count = mirror->write_stat.io_stat.ops_completed;
- devinfo->write_bytes = mirror->write_stat.io_stat.bytes_completed;
+ devinfo->read_count = mirror->dss[0].read_stat.io_stat.ops_completed;
+ devinfo->read_bytes = mirror->dss[0].read_stat.io_stat.bytes_completed;
+ devinfo->write_count = mirror->dss[0].write_stat.io_stat.ops_completed;
+ devinfo->write_bytes = mirror->dss[0].write_stat.io_stat.bytes_completed;
spin_unlock(&mirror->lock);
devinfo->layout_type = LAYOUT_FLEX_FILES;
devinfo->ld_private.ops = &layoutstat_ops;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 095df09017a5..14640452713b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -71,12 +71,12 @@ struct nfs4_ff_layoutstat {
struct nfs4_ff_busy_timer busy_timer;
};
-struct nfs4_ff_layout_mirror {
- struct pnfs_layout_hdr *layout;
- struct list_head mirrors;
- u32 ds_count;
- u32 efficiency;
+struct nfs4_ff_layout_mirror;
+
+struct nfs4_ff_layout_ds_stripe {
+ struct nfs4_ff_layout_mirror *mirror;
struct nfs4_deviceid devid;
+ u32 efficiency;
struct nfs4_ff_layout_ds *mirror_ds;
u32 fh_versions_cnt;
struct nfs_fh *fh_versions;
@@ -84,12 +84,19 @@ struct nfs4_ff_layout_mirror {
const struct cred __rcu *ro_cred;
const struct cred __rcu *rw_cred;
struct nfs_file_localio nfl;
- refcount_t ref;
- spinlock_t lock;
- unsigned long flags;
struct nfs4_ff_layoutstat read_stat;
struct nfs4_ff_layoutstat write_stat;
ktime_t start_time;
+};
+
+struct nfs4_ff_layout_mirror {
+ struct pnfs_layout_hdr *layout;
+ struct list_head mirrors;
+ u32 dss_count;
+ struct nfs4_ff_layout_ds_stripe *dss;
+ refcount_t ref;
+ spinlock_t lock;
+ unsigned long flags;
u32 report_interval;
};
@@ -155,7 +162,7 @@ FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror != NULL) {
- struct nfs4_ff_layout_ds *mirror_ds = mirror->mirror_ds;
+ struct nfs4_ff_layout_ds *mirror_ds = mirror->dss[0].mirror_ds;
if (!IS_ERR_OR_NULL(mirror_ds))
return &mirror_ds->id_node;
@@ -184,7 +191,7 @@ ff_layout_no_read_on_rw(struct pnfs_layout_segment *lseg)
static inline int
nfs4_ff_layout_ds_version(const struct nfs4_ff_layout_mirror *mirror)
{
- return mirror->mirror_ds->ds_versions[0].version;
+ return mirror->dss[0].mirror_ds->ds_versions[0].version;
}
struct nfs4_ff_layout_ds *
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index 656d5c50bbce..f8ac9d8bd380 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -259,7 +259,7 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
if (status == 0)
return 0;
- if (IS_ERR_OR_NULL(mirror->mirror_ds))
+ if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
return -EINVAL;
dserr = kmalloc(sizeof(*dserr), gfp_flags);
@@ -271,8 +271,8 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
dserr->length = length;
dserr->status = status;
dserr->opnum = opnum;
- nfs4_stateid_copy(&dserr->stateid, &mirror->stateid);
- memcpy(&dserr->deviceid, &mirror->mirror_ds->id_node.deviceid,
+ nfs4_stateid_copy(&dserr->stateid, &mirror->dss[0].stateid);
+ memcpy(&dserr->deviceid, &mirror->dss[0].mirror_ds->id_node.deviceid,
NFS4_DEVICEID4_SIZE);
spin_lock(&flo->generic_hdr.plh_inode->i_lock);
@@ -287,9 +287,9 @@ ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode)
const struct cred *cred, __rcu **pcred;
if (iomode == IOMODE_READ)
- pcred = &mirror->ro_cred;
+ pcred = &mirror->dss[0].ro_cred;
else
- pcred = &mirror->rw_cred;
+ pcred = &mirror->dss[0].rw_cred;
rcu_read_lock();
do {
@@ -307,7 +307,7 @@ struct nfs_fh *
nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror)
{
/* FIXME: For now assume there is only 1 version available for the DS */
- return &mirror->fh_versions[0];
+ return &mirror->dss[0].fh_versions[0];
}
void
@@ -315,7 +315,7 @@ nfs4_ff_layout_select_ds_stateid(const struct nfs4_ff_layout_mirror *mirror,
nfs4_stateid *stateid)
{
if (nfs4_ff_layout_ds_version(mirror) == 4)
- nfs4_stateid_copy(stateid, &mirror->stateid);
+ nfs4_stateid_copy(stateid, &mirror->dss[0].stateid);
}
static bool
@@ -324,23 +324,23 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
{
if (mirror == NULL)
goto outerr;
- if (mirror->mirror_ds == NULL) {
+ if (mirror->dss[0].mirror_ds == NULL) {
struct nfs4_deviceid_node *node;
struct nfs4_ff_layout_ds *mirror_ds = ERR_PTR(-ENODEV);
node = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode),
- &mirror->devid, lo->plh_lc_cred,
+ &mirror->dss[0].devid, lo->plh_lc_cred,
GFP_KERNEL);
if (node)
mirror_ds = FF_LAYOUT_MIRROR_DS(node);
/* check for race with another call to this function */
- if (cmpxchg(&mirror->mirror_ds, NULL, mirror_ds) &&
+ if (cmpxchg(&mirror->dss[0].mirror_ds, NULL, mirror_ds) &&
mirror_ds != ERR_PTR(-ENODEV))
nfs4_put_deviceid_node(node);
}
- if (IS_ERR(mirror->mirror_ds))
+ if (IS_ERR(mirror->dss[0].mirror_ds))
goto outerr;
return true;
@@ -379,7 +379,7 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
if (!ff_layout_init_mirror_ds(lseg->pls_layout, mirror))
goto noconnect;
- ds = mirror->mirror_ds->ds;
+ ds = mirror->dss[0].mirror_ds->ds;
if (READ_ONCE(ds->ds_clp))
goto out;
/* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
@@ -388,10 +388,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* FIXME: For now we assume the server sent only one version of NFS
* to use for the DS.
*/
- status = nfs4_pnfs_ds_connect(s, ds, &mirror->mirror_ds->id_node,
+ status = nfs4_pnfs_ds_connect(s, ds, &mirror->dss[0].mirror_ds->id_node,
dataserver_timeo, dataserver_retrans,
- mirror->mirror_ds->ds_versions[0].version,
- mirror->mirror_ds->ds_versions[0].minor_version);
+ mirror->dss[0].mirror_ds->ds_versions[0].version,
+ mirror->dss[0].mirror_ds->ds_versions[0].minor_version);
/* connect success, check rsize/wsize limit */
if (!status) {
@@ -404,10 +404,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
- if (mirror->mirror_ds->ds_versions[0].rsize > max_payload)
- mirror->mirror_ds->ds_versions[0].rsize = max_payload;
- if (mirror->mirror_ds->ds_versions[0].wsize > max_payload)
- mirror->mirror_ds->ds_versions[0].wsize = max_payload;
+ if (mirror->dss[0].mirror_ds->ds_versions[0].rsize > max_payload)
+ mirror->dss[0].mirror_ds->ds_versions[0].rsize = max_payload;
+ if (mirror->dss[0].mirror_ds->ds_versions[0].wsize > max_payload)
+ mirror->dss[0].mirror_ds->ds_versions[0].wsize = max_payload;
goto out;
}
noconnect:
@@ -430,7 +430,7 @@ ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
{
const struct cred *cred;
- if (mirror && !mirror->mirror_ds->ds_versions[0].tightly_coupled) {
+ if (mirror && !mirror->dss[0].mirror_ds->ds_versions[0].tightly_coupled) {
cred = ff_layout_get_mirror_cred(mirror, range->iomode);
if (!cred)
cred = get_cred(mdscred);
@@ -453,7 +453,7 @@ struct rpc_clnt *
nfs4_ff_find_or_create_ds_client(struct nfs4_ff_layout_mirror *mirror,
struct nfs_client *ds_clp, struct inode *inode)
{
- switch (mirror->mirror_ds->ds_versions[0].version) {
+ switch (mirror->dss[0].mirror_ds->ds_versions[0].version) {
case 3:
/* For NFSv3 DS, flavor is set when creating DS connections */
return ds_clp->cl_rpcclient;
@@ -564,11 +564,11 @@ static bool ff_read_layout_has_available_ds(struct pnfs_layout_segment *lseg)
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror) {
- if (!mirror->mirror_ds)
+ if (!mirror->dss[0].mirror_ds)
return true;
- if (IS_ERR(mirror->mirror_ds))
+ if (IS_ERR(mirror->dss[0].mirror_ds))
continue;
- devid = &mirror->mirror_ds->id_node;
+ devid = &mirror->dss[0].mirror_ds->id_node;
if (!nfs4_test_deviceid_unavailable(devid))
return true;
}
@@ -585,11 +585,11 @@ static bool ff_rw_layout_has_available_ds(struct pnfs_layout_segment *lseg)
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- if (!mirror || IS_ERR(mirror->mirror_ds))
+ if (!mirror || IS_ERR(mirror->dss[0].mirror_ds))
return false;
- if (!mirror->mirror_ds)
+ if (!mirror->dss[0].mirror_ds)
continue;
- devid = &mirror->mirror_ds->id_node;
+ devid = &mirror->dss[0].mirror_ds->id_node;
if (nfs4_test_deviceid_unavailable(devid))
return false;
}
--
2.34.1
* [RFC PATCH v4 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (2 preceding siblings ...)
2025-09-24 16:20 ` [RFC PATCH v4 3/9] NFSv4/flexfiles: Add data structure support for striped layouts Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 5/9] NFSv4/flexfiles: Read path updates for striped layouts Jonathan Curley
` (6 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Updates common helper functions to be dss_id aware. Most cases simply
add a dss_id parameter. The has_available functions have been updated
with a loop.
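As an illustration of the "add a loop" case, the sketch below extends a
read/write availability check from one data server per mirror to every
stripe of every mirror. The state enum, the types, and the rule that any
unavailable stripe fails the whole check are assumptions modelled on the
existing single-stripe logic, not a copy of the patched functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stripe state, standing in for mirror_ds checks. */
enum sketch_ds_state {
	SKETCH_DS_UNSET,	/* device not looked up yet */
	SKETCH_DS_OK,		/* device known and reachable */
	SKETCH_DS_UNAVAILABLE,	/* lookup failed or device marked down */
};

struct sketch_rw_mirror {
	uint32_t dss_count;
	const enum sketch_ds_state *dss;
};

/* Return false as soon as any stripe of any mirror is unavailable. */
static bool sketch_rw_has_available_ds(const struct sketch_rw_mirror *mirrors,
				       size_t mirror_count)
{
	assert(mirrors != NULL || mirror_count == 0);
	for (size_t idx = 0; idx < mirror_count; idx++) {
		for (uint32_t dss_id = 0; dss_id < mirrors[idx].dss_count;
		     dss_id++) {
			if (mirrors[idx].dss[dss_id] == SKETCH_DS_UNAVAILABLE)
				return false;
		}
	}
	return true;
}
```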
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 56 ++++++------
fs/nfs/flexfilelayout/flexfilelayout.h | 39 +++++---
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 105 ++++++++++++----------
3 files changed, 115 insertions(+), 85 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 46a765bf05c3..a2a3821f190c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -164,14 +164,14 @@ decode_name(struct xdr_stream *xdr, u32 *id)
}
static struct nfsd_file *
-ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx, u32 dss_id,
struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, fmode_t mode)
{
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
- return nfs_local_open_fh(clp, cred, fh, &mirror->dss[0].nfl, mode);
+ return nfs_local_open_fh(clp, cred, fh, &mirror->dss[dss_id].nfl, mode);
#else
return NULL;
#endif
@@ -752,7 +752,7 @@ nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
static void
ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
if (devid)
nfs4_mark_deviceid_unavailable(devid);
@@ -761,7 +761,7 @@ ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx)
static void
ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
if (devid)
nfs4_mark_deviceid_available(devid);
@@ -780,7 +780,7 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
/* mirrors are initially sorted by efficiency */
for (idx = start_idx; idx < fls->mirror_array_cnt; idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, false);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
if (!ds)
continue;
@@ -953,7 +953,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
for (i = 0; i < pgio->pg_mirror_count; i++) {
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
- ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror, true);
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror, 0, true);
if (!ds) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
@@ -1125,7 +1125,7 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
{
struct pnfs_layout_hdr *lo = lseg->pls_layout;
struct inode *inode = lo->plh_inode;
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
switch (op_status) {
@@ -1224,7 +1224,7 @@ static int ff_layout_async_handle_error_v3(struct rpc_task *task,
struct pnfs_layout_segment *lseg,
u32 idx)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
switch (op_status) {
case NFS_OK:
@@ -1354,7 +1354,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
mirror = FF_LAYOUT_COMP(lseg, idx);
err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
- mirror, offset, length, status, opnum,
+ mirror, 0, offset, length, status, opnum,
nfs_io_gfp_mask());
switch (status) {
@@ -1885,20 +1885,20 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->args.pgbase, (size_t)hdr->args.count, offset);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, false);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode);
+ hdr->inode, 0);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror);
+ vers = nfs4_ff_layout_ds_version(mirror, 0);
dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
ds->ds_remotestr, refcount_read(&ds->ds_clp->cl_count), vers);
@@ -1906,11 +1906,11 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->pgio_done_cb = ff_layout_read_done_cb;
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- fh = nfs4_ff_layout_select_ds_fh(mirror);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -1920,7 +1920,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Start IO accounting for local read */
- localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
+ localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh, FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
@@ -1959,20 +1959,20 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
u32 idx = hdr->pgio_mirror_idx;
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, true);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode);
+ hdr->inode, 0);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror);
+ vers = nfs4_ff_layout_ds_version(mirror, 0);
dprintk("%s ino %lu sync %d req %zu@%llu DS: %s cl_count %d vers %d\n",
__func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
@@ -1983,11 +1983,11 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
hdr->ds_commit_idx = idx;
- fh = nfs4_ff_layout_select_ds_fh(mirror);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -1996,7 +1996,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Start IO accounting for local write */
- localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
hdr->task.tk_start = ktime_get();
@@ -2054,20 +2054,20 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, true);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
if (!ds)
goto out_err;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- data->inode);
+ data->inode, 0);
if (IS_ERR(ds_clnt))
goto out_err;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred, 0);
if (!ds_cred)
goto out_err;
- vers = nfs4_ff_layout_ds_version(mirror);
+ vers = nfs4_ff_layout_ds_version(mirror, 0);
dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
data->inode->i_ino, how, refcount_read(&ds->ds_clp->cl_count),
@@ -2081,7 +2081,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;
/* Start IO accounting for local commit */
- localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
data->task.tk_start = ktime_get();
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 14640452713b..142324d6d5c5 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -157,12 +157,12 @@ FF_LAYOUT_COMP(struct pnfs_layout_segment *lseg, u32 idx)
}
static inline struct nfs4_deviceid_node *
-FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
+FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx, u32 dss_id)
{
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror != NULL) {
- struct nfs4_ff_layout_ds *mirror_ds = mirror->dss[0].mirror_ds;
+ struct nfs4_ff_layout_ds *mirror_ds = mirror->dss[dss_id].mirror_ds;
if (!IS_ERR_OR_NULL(mirror_ds))
return &mirror_ds->id_node;
@@ -189,9 +189,22 @@ ff_layout_no_read_on_rw(struct pnfs_layout_segment *lseg)
}
static inline int
-nfs4_ff_layout_ds_version(const struct nfs4_ff_layout_mirror *mirror)
+nfs4_ff_layout_ds_version(const struct nfs4_ff_layout_mirror *mirror, u32 dss_id)
{
- return mirror->dss[0].mirror_ds->ds_versions[0].version;
+ return mirror->dss[dss_id].mirror_ds->ds_versions[0].version;
+}
+
+static inline u32
+nfs4_ff_layout_calc_dss_id(const u64 stripe_unit, const u32 dss_count, const loff_t offset)
+{
+ u64 tmp = offset;
+
+ if (dss_count == 1 || stripe_unit == 0)
+ return 0;
+
+ do_div(tmp, stripe_unit);
+
+ return do_div(tmp, dss_count);
}
struct nfs4_ff_layout_ds *
@@ -200,9 +213,9 @@ nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
- struct nfs4_ff_layout_mirror *mirror, u64 offset,
- u64 length, int status, enum nfs_opnum4 opnum,
- gfp_t gfp_flags);
+ struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id, u64 offset, u64 length, int status,
+ enum nfs_opnum4 opnum, gfp_t gfp_flags);
void ff_layout_send_layouterror(struct pnfs_layout_segment *lseg);
int ff_layout_encode_ds_ioerr(struct xdr_stream *xdr, const struct list_head *head);
void ff_layout_free_ds_ioerr(struct list_head *head);
@@ -211,23 +224,27 @@ unsigned int ff_layout_fetch_ds_ioerr(struct pnfs_layout_hdr *lo,
struct list_head *head,
unsigned int maxnum);
struct nfs_fh *
-nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror);
+nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror, u32 dss_id);
void
nfs4_ff_layout_select_ds_stateid(const struct nfs4_ff_layout_mirror *mirror,
- nfs4_stateid *stateid);
+ u32 dss_id,
+ nfs4_stateid *stateid);
struct nfs4_pnfs_ds *
nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
bool fail_return);
struct rpc_clnt *
nfs4_ff_find_or_create_ds_client(struct nfs4_ff_layout_mirror *mirror,
struct nfs_client *ds_clp,
- struct inode *inode);
+ struct inode *inode,
+ u32 dss_id);
const struct cred *ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
const struct pnfs_layout_range *range,
- const struct cred *mdscred);
+ const struct cred *mdscred,
+ u32 dss_id);
bool ff_layout_avoid_mds_available_ds(struct pnfs_layout_segment *lseg);
bool ff_layout_avoid_read_on_rw(struct pnfs_layout_segment *lseg);
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index f8ac9d8bd380..d6b6198db8e5 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -250,16 +250,16 @@ ff_layout_add_ds_error_locked(struct nfs4_flexfile_layout *flo,
}
int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
- struct nfs4_ff_layout_mirror *mirror, u64 offset,
- u64 length, int status, enum nfs_opnum4 opnum,
- gfp_t gfp_flags)
+ struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id, u64 offset, u64 length, int status,
+ enum nfs_opnum4 opnum, gfp_t gfp_flags)
{
struct nfs4_ff_layout_ds_err *dserr;
if (status == 0)
return 0;
- if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
+ if (IS_ERR_OR_NULL(mirror->dss[dss_id].mirror_ds))
return -EINVAL;
dserr = kmalloc(sizeof(*dserr), gfp_flags);
@@ -271,8 +271,8 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
dserr->length = length;
dserr->status = status;
dserr->opnum = opnum;
- nfs4_stateid_copy(&dserr->stateid, &mirror->dss[0].stateid);
- memcpy(&dserr->deviceid, &mirror->dss[0].mirror_ds->id_node.deviceid,
+ nfs4_stateid_copy(&dserr->stateid, &mirror->dss[dss_id].stateid);
+ memcpy(&dserr->deviceid, &mirror->dss[dss_id].mirror_ds->id_node.deviceid,
NFS4_DEVICEID4_SIZE);
spin_lock(&flo->generic_hdr.plh_inode->i_lock);
@@ -282,14 +282,14 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
}
static const struct cred *
-ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode)
+ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode, u32 dss_id)
{
const struct cred *cred, __rcu **pcred;
if (iomode == IOMODE_READ)
- pcred = &mirror->dss[0].ro_cred;
+ pcred = &mirror->dss[dss_id].ro_cred;
else
- pcred = &mirror->dss[0].rw_cred;
+ pcred = &mirror->dss[dss_id].rw_cred;
rcu_read_lock();
do {
@@ -304,43 +304,45 @@ ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode)
}
struct nfs_fh *
-nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror)
+nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror, u32 dss_id)
{
/* FIXME: For now assume there is only 1 version available for the DS */
- return &mirror->dss[0].fh_versions[0];
+ return &mirror->dss[dss_id].fh_versions[0];
}
void
nfs4_ff_layout_select_ds_stateid(const struct nfs4_ff_layout_mirror *mirror,
- nfs4_stateid *stateid)
+ u32 dss_id,
+ nfs4_stateid *stateid)
{
- if (nfs4_ff_layout_ds_version(mirror) == 4)
- nfs4_stateid_copy(stateid, &mirror->dss[0].stateid);
+ if (nfs4_ff_layout_ds_version(mirror, dss_id) == 4)
+ nfs4_stateid_copy(stateid, &mirror->dss[dss_id].stateid);
}
static bool
ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
- struct nfs4_ff_layout_mirror *mirror)
+ struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id)
{
if (mirror == NULL)
goto outerr;
- if (mirror->dss[0].mirror_ds == NULL) {
+ if (mirror->dss[dss_id].mirror_ds == NULL) {
struct nfs4_deviceid_node *node;
struct nfs4_ff_layout_ds *mirror_ds = ERR_PTR(-ENODEV);
node = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode),
- &mirror->dss[0].devid, lo->plh_lc_cred,
+ &mirror->dss[dss_id].devid, lo->plh_lc_cred,
GFP_KERNEL);
if (node)
mirror_ds = FF_LAYOUT_MIRROR_DS(node);
/* check for race with another call to this function */
- if (cmpxchg(&mirror->dss[0].mirror_ds, NULL, mirror_ds) &&
+ if (cmpxchg(&mirror->dss[dss_id].mirror_ds, NULL, mirror_ds) &&
mirror_ds != ERR_PTR(-ENODEV))
nfs4_put_deviceid_node(node);
}
- if (IS_ERR(mirror->dss[0].mirror_ds))
+ if (IS_ERR(mirror->dss[dss_id].mirror_ds))
goto outerr;
return true;
@@ -352,6 +354,7 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
* nfs4_ff_layout_prepare_ds - prepare a DS connection for an RPC call
* @lseg: the layout segment we're operating on
* @mirror: layout mirror describing the DS to use
+ * @dss_id: DS stripe id to select stripe to use
* @fail_return: return layout on connect failure?
*
* Try to prepare a DS connection to accept an RPC call. This involves
@@ -368,6 +371,7 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
struct nfs4_pnfs_ds *
nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
bool fail_return)
{
struct nfs4_pnfs_ds *ds = NULL;
@@ -376,10 +380,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
unsigned int max_payload;
int status;
- if (!ff_layout_init_mirror_ds(lseg->pls_layout, mirror))
+ if (!ff_layout_init_mirror_ds(lseg->pls_layout, mirror, dss_id))
goto noconnect;
- ds = mirror->dss[0].mirror_ds->ds;
+ ds = mirror->dss[dss_id].mirror_ds->ds;
if (READ_ONCE(ds->ds_clp))
goto out;
/* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
@@ -388,10 +392,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* FIXME: For now we assume the server sent only one version of NFS
* to use for the DS.
*/
- status = nfs4_pnfs_ds_connect(s, ds, &mirror->dss[0].mirror_ds->id_node,
+ status = nfs4_pnfs_ds_connect(s, ds, &mirror->dss[dss_id].mirror_ds->id_node,
dataserver_timeo, dataserver_retrans,
- mirror->dss[0].mirror_ds->ds_versions[0].version,
- mirror->dss[0].mirror_ds->ds_versions[0].minor_version);
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].version,
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].minor_version);
/* connect success, check rsize/wsize limit */
if (!status) {
@@ -404,15 +408,15 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
- if (mirror->dss[0].mirror_ds->ds_versions[0].rsize > max_payload)
- mirror->dss[0].mirror_ds->ds_versions[0].rsize = max_payload;
- if (mirror->dss[0].mirror_ds->ds_versions[0].wsize > max_payload)
- mirror->dss[0].mirror_ds->ds_versions[0].wsize = max_payload;
+ if (mirror->dss[dss_id].mirror_ds->ds_versions[0].rsize > max_payload)
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].rsize = max_payload;
+ if (mirror->dss[dss_id].mirror_ds->ds_versions[0].wsize > max_payload)
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].wsize = max_payload;
goto out;
}
noconnect:
ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
- mirror, lseg->pls_range.offset,
+ mirror, dss_id, lseg->pls_range.offset,
lseg->pls_range.length, NFS4ERR_NXIO,
OP_ILLEGAL, GFP_NOIO);
ff_layout_send_layouterror(lseg);
@@ -426,12 +430,13 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
const struct cred *
ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
const struct pnfs_layout_range *range,
- const struct cred *mdscred)
+ const struct cred *mdscred,
+ u32 dss_id)
{
const struct cred *cred;
- if (mirror && !mirror->dss[0].mirror_ds->ds_versions[0].tightly_coupled) {
- cred = ff_layout_get_mirror_cred(mirror, range->iomode);
+ if (mirror && !mirror->dss[dss_id].mirror_ds->ds_versions[0].tightly_coupled) {
+ cred = ff_layout_get_mirror_cred(mirror, range->iomode, dss_id);
if (!cred)
cred = get_cred(mdscred);
} else {
@@ -445,15 +450,17 @@ ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
* @mirror: pointer to the mirror
* @ds_clp: nfs_client for the DS
* @inode: pointer to inode
+ * @dss_id: DS stripe id
*
* Find or create a DS rpc client with th MDS server rpc client auth flavor
* in the nfs_client cl_ds_clients list.
*/
struct rpc_clnt *
nfs4_ff_find_or_create_ds_client(struct nfs4_ff_layout_mirror *mirror,
- struct nfs_client *ds_clp, struct inode *inode)
+ struct nfs_client *ds_clp, struct inode *inode,
+ u32 dss_id)
{
- switch (mirror->dss[0].mirror_ds->ds_versions[0].version) {
+ switch (mirror->dss[dss_id].mirror_ds->ds_versions[0].version) {
case 3:
/* For NFSv3 DS, flavor is set when creating DS connections */
return ds_clp->cl_rpcclient;
@@ -559,16 +566,18 @@ static bool ff_read_layout_has_available_ds(struct pnfs_layout_segment *lseg)
{
struct nfs4_ff_layout_mirror *mirror;
struct nfs4_deviceid_node *devid;
- u32 idx;
+ u32 idx, dss_id;
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- if (mirror) {
- if (!mirror->dss[0].mirror_ds)
+ if (!mirror)
+ continue;
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ if (!mirror->dss[dss_id].mirror_ds)
return true;
- if (IS_ERR(mirror->dss[0].mirror_ds))
+ if (IS_ERR(mirror->dss[dss_id].mirror_ds))
continue;
- devid = &mirror->dss[0].mirror_ds->id_node;
+ devid = &mirror->dss[dss_id].mirror_ds->id_node;
if (!nfs4_test_deviceid_unavailable(devid))
return true;
}
@@ -581,17 +590,21 @@ static bool ff_rw_layout_has_available_ds(struct pnfs_layout_segment *lseg)
{
struct nfs4_ff_layout_mirror *mirror;
struct nfs4_deviceid_node *devid;
- u32 idx;
+ u32 idx, dss_id;
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- if (!mirror || IS_ERR(mirror->dss[0].mirror_ds))
- return false;
- if (!mirror->dss[0].mirror_ds)
- continue;
- devid = &mirror->dss[0].mirror_ds->id_node;
- if (nfs4_test_deviceid_unavailable(devid))
+ if (!mirror)
return false;
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ if (IS_ERR(mirror->dss[dss_id].mirror_ds))
+ return false;
+ if (!mirror->dss[dss_id].mirror_ds)
+ continue;
+ devid = &mirror->dss[dss_id].mirror_ds->id_node;
+ if (nfs4_test_deviceid_unavailable(devid))
+ return false;
+ }
}
return FF_LAYOUT_MIRROR_COUNT(lseg) != 0;
--
2.34.1
* [RFC PATCH v4 5/9] NFSv4/flexfiles: Read path updates for striped layouts
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
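Update the read path to calculate dss_id from the request offset and use
it to direct I/O to the appropriate stripe DS. Add ff_layout_pg_test() so
that read requests are not coalesced across a stripe boundary.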
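The stripe selection can be illustrated with a small userspace model. This is only a sketch and not the kernel code: the model_* names are hypothetical, plain 64-bit division stands in for do_div(), and the second helper assumes an offset relative to the start of the layout segment, as ff_layout_pg_test() uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of nfs4_ff_layout_calc_dss_id() (hypothetical name):
 * stripe index = offset / stripe_unit, dss_id = stripe index % dss_count.
 */
static uint32_t model_calc_dss_id(uint64_t stripe_unit, uint32_t dss_count,
				  uint64_t offset)
{
	assert(dss_count != 0);

	/* Unstriped layouts always use stripe 0. */
	if (dss_count == 1 || stripe_unit == 0)
		return 0;

	return (uint32_t)((offset / stripe_unit) % dss_count);
}

/*
 * Bytes left before the next stripe boundary, clamped to the request
 * size. Models the final step of ff_layout_pg_test(); the offset is
 * assumed to be relative to the start of the layout segment.
 */
static uint32_t model_bytes_left_in_stripe(uint64_t seg_relative_offset,
					   uint32_t stripe_unit, uint32_t size)
{
	uint32_t in_stripe;
	uint32_t left;

	assert(stripe_unit != 0);

	in_stripe = (uint32_t)(seg_relative_offset % stripe_unit);
	left = stripe_unit - in_stripe;

	return left < size ? left : size;
}
```

With a 4096-byte stripe unit and four stripes, offsets 0, 4096 and 16384 map to dss_id 0, 1 and 0. The model takes a 64-bit offset because, with a stripe count that is not a power of two, the result for offsets at or above 4 GiB depends on the upper 32 bits.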
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 122 ++++++++++++++++++++-----
1 file changed, 98 insertions(+), 24 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index a2a3821f190c..79700c18762c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -770,6 +770,7 @@ ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx)
static struct nfs4_pnfs_ds *
ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id,
bool check_device)
{
struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
@@ -780,12 +781,16 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
/* mirrors are initially sorted by efficiency */
for (idx = start_idx; idx < fls->mirror_array_cnt; idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
+ *dss_id = nfs4_ff_layout_calc_dss_id(
+ fls->stripe_unit,
+ fls->mirror_array[idx]->dss_count,
+ offset);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, *dss_id, false);
if (!ds)
continue;
if (check_device &&
- nfs4_test_deviceid_unavailable(&mirror->dss[0].mirror_ds->id_node))
+ nfs4_test_deviceid_unavailable(&mirror->dss[*dss_id].mirror_ds->id_node))
continue;
*best_idx = idx;
@@ -797,42 +802,52 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
static struct nfs4_pnfs_ds *
ff_layout_choose_any_ds_for_read(struct pnfs_layout_segment *lseg,
- u32 start_idx, u32 *best_idx)
+ u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id)
{
- return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx, false);
+ return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id, false);
}
static struct nfs4_pnfs_ds *
ff_layout_choose_valid_ds_for_read(struct pnfs_layout_segment *lseg,
- u32 start_idx, u32 *best_idx)
+ u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id)
{
- return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx, true);
+ return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id, true);
}
static struct nfs4_pnfs_ds *
ff_layout_choose_best_ds_for_read(struct pnfs_layout_segment *lseg,
- u32 start_idx, u32 *best_idx)
+ u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id)
{
struct nfs4_pnfs_ds *ds;
- ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx);
+ ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id);
if (ds)
return ds;
- return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx);
+ return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id);
}
static struct nfs4_pnfs_ds *
ff_layout_get_ds_for_read(struct nfs_pageio_descriptor *pgio,
- u32 *best_idx)
+ u32 *best_idx,
+ loff_t offset,
+ u32 *dss_id)
{
struct pnfs_layout_segment *lseg = pgio->pg_lseg;
struct nfs4_pnfs_ds *ds;
ds = ff_layout_choose_best_ds_for_read(lseg, pgio->pg_mirror_idx,
- best_idx);
+ best_idx, offset, dss_id);
if (ds || !pgio->pg_mirror_idx)
return ds;
- return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx);
+ return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx,
+ offset, dss_id);
}
static void
@@ -851,6 +866,56 @@ ff_layout_pg_get_read(struct nfs_pageio_descriptor *pgio,
}
}
+static bool
+ff_layout_lseg_is_striped(const struct nfs4_ff_layout_segment *fls)
+{
+ return fls->mirror_array[0]->dss_count > 1;
+}
+
+/*
+ * ff_layout_pg_test(). Called by nfs_can_coalesce_requests()
+ *
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
+ */
+static size_t
+ff_layout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
+ struct nfs_page *req)
+{
+ unsigned int size;
+ u64 p_stripe, r_stripe;
+ u32 stripe_offset;
+ u64 segment_offset = pgio->pg_lseg->pls_range.offset;
+ u32 stripe_unit = FF_LAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
+
+ /* calls nfs_generic_pg_test */
+ size = pnfs_generic_pg_test(pgio, prev, req);
+ if (!size)
+ return 0;
+ else if (!ff_layout_lseg_is_striped(FF_LAYOUT_LSEG(pgio->pg_lseg)))
+ return size;
+
+ /* see if req and prev are in the same stripe */
+ if (prev) {
+ p_stripe = (u64)req_offset(prev) - segment_offset;
+ r_stripe = (u64)req_offset(req) - segment_offset;
+ do_div(p_stripe, stripe_unit);
+ do_div(r_stripe, stripe_unit);
+
+ if (p_stripe != r_stripe)
+ return 0;
+ }
+
+ /* calculate remaining bytes in the current stripe */
+ div_u64_rem((u64)req_offset(req) - segment_offset,
+ stripe_unit,
+ &stripe_offset);
+ WARN_ON_ONCE(stripe_offset > stripe_unit);
+ if (stripe_offset >= stripe_unit)
+ return 0;
+ return min(stripe_unit - (unsigned int)stripe_offset, size);
+}
+
static void
ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs_page *req)
@@ -858,7 +923,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs_pgio_mirror *pgm;
struct nfs4_ff_layout_mirror *mirror;
struct nfs4_pnfs_ds *ds;
- u32 ds_idx;
+ u32 ds_idx, dss_id;
if (NFS_SERVER(pgio->pg_inode)->flags &
(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
@@ -879,7 +944,8 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
/* Reset wb_nio, since getting layout segment was successful */
req->wb_nio = 0;
- ds = ff_layout_get_ds_for_read(pgio, &ds_idx);
+ ds = ff_layout_get_ds_for_read(pgio, &ds_idx,
+ req_offset(req), &dss_id);
if (!ds) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
@@ -891,7 +957,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
pgm = &pgio->pg_mirrors[0];
- pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].rsize;
+ pgm->pg_bsize = mirror->dss[dss_id].mirror_ds->ds_versions[0].rsize;
pgio->pg_mirror_idx = ds_idx;
return;
@@ -1029,7 +1095,7 @@ ff_layout_pg_get_mirror_write(struct nfs_pageio_descriptor *desc, u32 idx)
static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
.pg_init = ff_layout_pg_init_read,
- .pg_test = pnfs_generic_pg_test,
+ .pg_test = ff_layout_pg_test,
.pg_doio = pnfs_generic_pg_readpages,
.pg_cleanup = pnfs_generic_pg_cleanup,
};
@@ -1084,8 +1150,10 @@ static void ff_layout_resend_pnfs_read(struct nfs_pgio_header *hdr)
{
u32 idx = hdr->pgio_mirror_idx + 1;
u32 new_idx = 0;
+ u32 dss_id = 0;
- if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx))
+ if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx,
+ hdr->args.offset, &dss_id))
ff_layout_send_layouterror(hdr->lseg);
else
pnfs_error_mark_layout_for_return(hdr->inode, hdr->lseg);
@@ -1879,26 +1947,31 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
u32 idx = hdr->pgio_mirror_idx;
int vers;
struct nfs_fh *fh;
+ u32 dss_id;
dprintk("--> %s ino %lu pgbase %u req %zu@%llu\n",
__func__, hdr->inode->i_ino,
hdr->args.pgbase, (size_t)hdr->args.count, offset);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(lseg)->stripe_unit,
+ mirror->dss_count,
+ offset);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, dss_id, false);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode, 0);
+ hdr->inode, dss_id);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, dss_id);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror, 0);
+ vers = nfs4_ff_layout_ds_version(mirror, dss_id);
dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
ds->ds_remotestr, refcount_read(&ds->ds_clp->cl_count), vers);
@@ -1906,11 +1979,11 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->pgio_done_cb = ff_layout_read_done_cb;
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, dss_id);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, dss_id, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -1920,7 +1993,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Start IO accounting for local read */
- localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh, FMODE_READ);
+ localio = ff_local_open_fh(lseg, idx, dss_id, ds->ds_clp, ds_cred, fh,
+ FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
--
2.34.1
* [RFC PATCH v4 6/9] NFSv4/flexfiles: Commit path updates for striped layouts
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Update the commit path to be stripe aware. This requires ds_commit_idx
to encode both the mirror and the stripe:
ds_commit_idx == mirror_idx * dss_count + dss_id.
Update the code paths that consume ds_commit_idx to derive mirror_idx
and dss_id from it where needed, so that the commit is sent to the
correct DS. Size the commit array as mirror_array_cnt * dss_count to
match.
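The index encoding above can be checked with a short userspace model. This is a sketch under stated assumptions rather than the kernel code: the model_* names and the struct are hypothetical, and it assumes every mirror has the same dss_count, as the helpers in this patch do by reading mirror_array[0]->dss_count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair used only for this illustration. */
struct model_commit_slot {
	uint32_t mirror_idx;
	uint32_t dss_id;
};

/* Encode: ds_commit_idx == mirror_idx * dss_count + dss_id. */
static uint32_t model_commit_idx(uint32_t mirror_idx, uint32_t dss_id,
				 uint32_t dss_count)
{
	assert(dss_count != 0 && dss_id < dss_count);

	return mirror_idx * dss_count + dss_id;
}

/* Decode: the inverse mapping, using division and remainder. */
static struct model_commit_slot model_decode_commit_idx(uint32_t commit_idx,
							 uint32_t dss_count)
{
	struct model_commit_slot slot;

	assert(dss_count != 0);

	slot.mirror_idx = commit_idx / dss_count;
	slot.dss_id = commit_idx % dss_count;

	return slot;
}
```

For example, with dss_count == 4, mirror 2 and stripe 1 map to ds_commit_idx 9, and decoding 9 gives back mirror 2 and stripe 1. A layout with 3 mirrors then needs 12 commit slots, indexed 0 through 11.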
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 41 ++++++++++++++++----------
1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 79700c18762c..3e04de09c3c2 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -605,6 +605,18 @@ ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
_ff_layout_free_lseg(fls);
}
+static u32 calc_mirror_idx_from_commit(struct pnfs_layout_segment *lseg,
+ u32 commit_index)
+{
+ return commit_index / FF_LAYOUT_LSEG(lseg)->mirror_array[0]->dss_count;
+}
+
+static u32 calc_dss_id_from_commit(struct pnfs_layout_segment *lseg,
+ u32 commit_index)
+{
+ return commit_index % FF_LAYOUT_LSEG(lseg)->mirror_array[0]->dss_count;
+}
+
static void
nfs4_ff_start_busy_timer(struct nfs4_ff_busy_timer *timer, ktime_t now)
{
@@ -2094,20 +2106,15 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
return PNFS_NOT_ATTEMPTED;
}
-static u32 calc_ds_index_from_commit(struct pnfs_layout_segment *lseg, u32 i)
-{
- return i;
-}
-
static struct nfs_fh *
-select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
+select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i, u32 dss_id)
{
struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
/* FIXME: Assume that there is only one NFS version available
* for the DS.
*/
- return &flseg->mirror_array[i]->dss[0].fh_versions[0];
+ return &flseg->mirror_array[i]->dss[dss_id].fh_versions[0];
}
static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
@@ -2118,7 +2125,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
struct nfsd_file *localio;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
- u32 idx;
+ u32 idx, dss_id;
int vers, ret;
struct nfs_fh *fh;
@@ -2126,22 +2133,23 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags)))
goto out_err;
- idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
+ idx = calc_mirror_idx_from_commit(lseg, data->ds_commit_index);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
+ dss_id = calc_dss_id_from_commit(lseg, data->ds_commit_index);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, dss_id, true);
if (!ds)
goto out_err;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- data->inode, 0);
+ data->inode, dss_id);
if (IS_ERR(ds_clnt))
goto out_err;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred, 0);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred, dss_id);
if (!ds_cred)
goto out_err;
- vers = nfs4_ff_layout_ds_version(mirror, 0);
+ vers = nfs4_ff_layout_ds_version(mirror, dss_id);
dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
data->inode->i_ino, how, refcount_read(&ds->ds_clp->cl_count),
@@ -2150,12 +2158,12 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
data->cred = ds_cred;
refcount_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
- fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
+ fh = select_ds_fh_from_commit(lseg, idx, dss_id);
if (fh)
data->args.fh = fh;
/* Start IO accounting for local commit */
- localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, dss_id, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
data->task.tk_start = ktime_get();
@@ -2259,8 +2267,9 @@ ff_layout_setup_ds_info(struct pnfs_ds_commit_info *fl_cinfo,
struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
struct inode *inode = lseg->pls_layout->plh_inode;
struct pnfs_commit_array *array, *new;
+ u32 size = flseg->mirror_array_cnt * flseg->mirror_array[0]->dss_count;
- new = pnfs_alloc_commit_array(flseg->mirror_array_cnt,
+ new = pnfs_alloc_commit_array(size,
nfs_io_gfp_mask());
if (new) {
spin_lock(&inode->i_lock);
--
2.34.1
* [RFC PATCH v4 7/9] NFSv4/flexfiles: Write path updates for striped layouts
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (5 preceding siblings ...)
2025-09-24 16:20 ` [RFC PATCH v4 6/9] NFSv4/flexfiles: Commit " Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 8/9] NFSv4/flexfiles: Update layout stats & error paths " Jonathan Curley
` (3 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Updates write path to calculate and use dss_id to direct IO to the
appropriate stripe DS.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
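Reviewer note (below the "---", so not part of the commit log): a minimal
user-space sketch of the index arithmetic the write path depends on.
sketch_commit_idx() follows the calc_commit_idx() helper added in this patch.
The two inverse helpers and the offset-to-stripe mapping are assumptions about
what calc_mirror_idx_from_commit(), calc_dss_id_from_commit() and
nfs4_ff_layout_calc_dss_id() compute, since their bodies are not shown in this
patch. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flatten (mirror_idx, dss_id) into one commit bucket index, following
 * calc_commit_idx() in this patch. */
static uint32_t sketch_commit_idx(uint32_t mirror_idx, uint32_t dss_count,
				  uint32_t dss_id)
{
	assert(dss_count != 0 && dss_id < dss_count);
	return mirror_idx * dss_count + dss_id;
}

/* Assumed inverse: recover the mirror index from a commit bucket index. */
static uint32_t sketch_mirror_idx_from_commit(uint32_t commit_idx,
					      uint32_t dss_count)
{
	assert(dss_count != 0);
	return commit_idx / dss_count;
}

/* Assumed inverse: recover the stripe id from a commit bucket index. */
static uint32_t sketch_dss_id_from_commit(uint32_t commit_idx,
					  uint32_t dss_count)
{
	assert(dss_count != 0);
	return commit_idx % dss_count;
}

/* Assumed round-robin mapping of a file offset to a stripe id.
 * An unstriped mirror (one stripe, or no stripe unit) always maps to 0. */
static uint32_t sketch_dss_id_from_offset(uint32_t stripe_unit,
					  uint32_t dss_count, uint64_t offset)
{
	assert(dss_count != 0);
	if (dss_count == 1 || stripe_unit == 0)
		return 0;
	return (uint32_t)((offset / stripe_unit) % dss_count);
}
```

Under these assumptions the packing is only reversible if every mirror has the
same dss_count, which matches the limitation stated in patch 9/9.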
fs/nfs/flexfilelayout/flexfilelayout.c | 42 ++++++++++++++++++--------
1 file changed, 30 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 3e04de09c3c2..95a5779c32c5 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -605,6 +605,14 @@ ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
_ff_layout_free_lseg(fls);
}
+static u32 calc_commit_idx(struct pnfs_layout_segment *lseg,
+ u32 mirror_idx, u32 dss_id)
+{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+
+ return (mirror_idx * flseg->mirror_array[0]->dss_count) + dss_id;
+}
+
static u32 calc_mirror_idx_from_commit(struct pnfs_layout_segment *lseg,
u32 commit_index)
{
@@ -1006,7 +1014,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
struct nfs4_ff_layout_mirror *mirror;
struct nfs_pgio_mirror *pgm;
struct nfs4_pnfs_ds *ds;
- u32 i;
+ u32 i, dss_id;
retry:
pnfs_generic_pg_check_layout(pgio, req);
@@ -1031,7 +1039,12 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
for (i = 0; i < pgio->pg_mirror_count; i++) {
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
- ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror, 0, true);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(pgio->pg_lseg)->stripe_unit,
+ mirror->dss_count,
+ req_offset(req));
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror,
+ dss_id, true);
if (!ds) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
@@ -1041,7 +1054,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
goto retry;
}
pgm = &pgio->pg_mirrors[i];
- pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].wsize;
+ pgm->pg_bsize = mirror->dss[dss_id].mirror_ds->ds_versions[0].wsize;
}
if (NFS_SERVER(pgio->pg_inode)->flags &
@@ -1114,7 +1127,7 @@ static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
static const struct nfs_pageio_ops ff_layout_pg_write_ops = {
.pg_init = ff_layout_pg_init_write,
- .pg_test = pnfs_generic_pg_test,
+ .pg_test = ff_layout_pg_test,
.pg_doio = pnfs_generic_pg_writepages,
.pg_get_mirror_count = ff_layout_pg_get_mirror_count_write,
.pg_cleanup = pnfs_generic_pg_cleanup,
@@ -2043,22 +2056,27 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
int vers;
struct nfs_fh *fh;
u32 idx = hdr->pgio_mirror_idx;
+ u32 dss_id;
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(lseg)->stripe_unit,
+ mirror->dss_count,
+ offset);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, dss_id, true);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode, 0);
+ hdr->inode, dss_id);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, dss_id);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror, 0);
+ vers = nfs4_ff_layout_ds_version(mirror, dss_id);
dprintk("%s ino %lu sync %d req %zu@%llu DS: %s cl_count %d vers %d\n",
__func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
@@ -2068,12 +2086,12 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->pgio_done_cb = ff_layout_write_done_cb;
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- hdr->ds_commit_idx = idx;
- fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
+ hdr->ds_commit_idx = calc_commit_idx(lseg, idx, dss_id);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, dss_id);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, dss_id, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -2082,7 +2100,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Start IO accounting for local write */
- localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, dss_id, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
hdr->task.tk_start = ktime_get();
--
2.34.1
* [RFC PATCH v4 8/9] NFSv4/flexfiles: Update layout stats & error paths for striped layouts
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (6 preceding siblings ...)
2025-09-24 16:20 ` [RFC PATCH v4 7/9] NFSv4/flexfiles: Write " Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-09-24 16:20 ` [RFC PATCH v4 9/9] NFSv4/flexfiles: Add support " Jonathan Curley
` (2 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Updates the layout stats logic to be stripe aware. Read and write
stats are accumulated on a per DS stripe basis. Also updates error
paths to use dss_id where appropriate.
Limitations:
1. The layout stats structure is still statically sized to 4 entries, and
there is no deduplication logic for deviceids that appear more than once
in a striped layout.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
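Reviewer note (below the "---", so not part of the commit log): a simplified
user-space sketch of how per-stripe stats are flattened into a fixed number of
report slots, to illustrate the limitation above. It leaves out the locking,
the mirror refcount and the NFS4_FF_MIRROR_STAT_AVAIL check that
ff_layout_mirror_prepare_stats() performs. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-stripe counters and the mirror. */
struct sketch_stripe_stat {
	uint64_t read_bytes;
	uint64_t write_bytes;
};

struct sketch_mirror {
	uint32_t dss_count;
	const struct sketch_stripe_stat *dss;
};

/* Copy one entry per (mirror, stripe) pair into out[] until dev_limit
 * slots are used. Returns the number of slots filled. */
static int sketch_prepare_stats(const struct sketch_mirror *mirrors,
				size_t mirror_cnt,
				struct sketch_stripe_stat *out, int dev_limit)
{
	int i = 0;

	assert(out != NULL || dev_limit <= 0);
	for (size_t m = 0; m < mirror_cnt && i < dev_limit; m++) {
		for (uint32_t d = 0;
		     d < mirrors[m].dss_count && i < dev_limit; d++)
			out[i++] = mirrors[m].dss[d];
	}
	return i;
}
```

With two mirrors of three stripes each and a limit of 4, only the first four
(mirror, stripe) pairs get a slot; the remaining stripes are not reported in
that pass.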
fs/nfs/flexfilelayout/flexfilelayout.c | 312 +++++++++++++++++--------
1 file changed, 209 insertions(+), 103 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 95a5779c32c5..7b95ab1cd140 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -47,7 +47,7 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
int dev_limit, enum nfs4_ff_op_type type);
static void ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
const struct nfs42_layoutstat_devinfo *devinfo,
- struct nfs4_ff_layout_mirror *mirror);
+ struct nfs4_ff_layout_ds_stripe *dss_info);
static struct pnfs_layout_hdr *
ff_layout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
@@ -649,6 +649,7 @@ nfs4_ff_end_busy_timer(struct nfs4_ff_busy_timer *timer, ktime_t now)
static bool
nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
struct nfs4_ff_layoutstat *layoutstat,
ktime_t now)
{
@@ -656,8 +657,8 @@ nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
struct nfs4_flexfile_layout *ffl = FF_LAYOUT_FROM_HDR(mirror->layout);
nfs4_ff_start_busy_timer(&layoutstat->busy_timer, now);
- if (!mirror->dss[0].start_time)
- mirror->dss[0].start_time = now;
+ if (!mirror->dss[dss_id].start_time)
+ mirror->dss[dss_id].start_time = now;
if (mirror->report_interval != 0)
report_interval = (s64)mirror->report_interval * 1000LL;
else if (layoutstats_timer != 0)
@@ -707,13 +708,16 @@ nfs4_ff_layout_stat_io_update_completed(struct nfs4_ff_layoutstat *layoutstat,
static void
nfs4_ff_layout_stat_io_start_read(struct inode *inode,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested, ktime_t now)
{
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].read_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].read_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(
+ mirror, dss_id, &mirror->dss[dss_id].read_stat, now);
+ nfs4_ff_layout_stat_io_update_requested(
+ &mirror->dss[dss_id].read_stat, requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -724,11 +728,12 @@ nfs4_ff_layout_stat_io_start_read(struct inode *inode,
static void
nfs4_ff_layout_stat_io_end_read(struct rpc_task *task,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested,
__u64 completed)
{
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].read_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[dss_id].read_stat,
requested, completed,
ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
@@ -738,13 +743,20 @@ nfs4_ff_layout_stat_io_end_read(struct rpc_task *task,
static void
nfs4_ff_layout_stat_io_start_write(struct inode *inode,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested, ktime_t now)
{
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].write_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].write_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(
+ mirror,
+ dss_id,
+ &mirror->dss[dss_id].write_stat,
+ now);
+ nfs4_ff_layout_stat_io_update_requested(
+ &mirror->dss[dss_id].write_stat,
+ requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -755,6 +767,7 @@ nfs4_ff_layout_stat_io_start_write(struct inode *inode,
static void
nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested,
__u64 completed,
enum nfs3_stable_how committed)
@@ -763,25 +776,25 @@ nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
requested = completed = 0;
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].write_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[dss_id].write_stat,
requested, completed, ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
}
static void
-ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx)
+ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx, u32 dss_id)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
if (devid)
nfs4_mark_deviceid_unavailable(devid);
}
static void
-ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx)
+ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx, u32 dss_id)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
if (devid)
nfs4_mark_deviceid_available(devid);
@@ -1214,11 +1227,11 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
- u32 idx)
+ u32 idx, u32 dss_id)
{
struct pnfs_layout_hdr *lo = lseg->pls_layout;
struct inode *inode = lo->plh_inode;
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
switch (op_status) {
@@ -1315,9 +1328,9 @@ static int ff_layout_async_handle_error_v3(struct rpc_task *task,
u32 op_status,
struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
- u32 idx)
+ u32 idx, u32 dss_id)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
switch (op_status) {
case NFS_OK:
@@ -1381,12 +1394,12 @@ static int ff_layout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
- u32 idx)
+ u32 idx, u32 dss_id)
{
int vers = clp->cl_nfs_mod->rpc_vers->number;
if (task->tk_status >= 0) {
- ff_layout_mark_ds_reachable(lseg, idx);
+ ff_layout_mark_ds_reachable(lseg, idx, dss_id);
return 0;
}
@@ -1397,10 +1410,10 @@ static int ff_layout_async_handle_error(struct rpc_task *task,
switch (vers) {
case 3:
return ff_layout_async_handle_error_v3(task, op_status, clp,
- lseg, idx);
+ lseg, idx, dss_id);
case 4:
return ff_layout_async_handle_error_v4(task, op_status, state,
- clp, lseg, idx);
+ clp, lseg, idx, dss_id);
default:
/* should never happen */
WARN_ON_ONCE(1);
@@ -1409,7 +1422,7 @@ static int ff_layout_async_handle_error(struct rpc_task *task,
}
static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
- u32 idx, u64 offset, u64 length,
+ u32 idx, u32 dss_id, u64 offset, u64 length,
u32 *op_status, int opnum, int error)
{
struct nfs4_ff_layout_mirror *mirror;
@@ -1447,7 +1460,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
mirror = FF_LAYOUT_COMP(lseg, idx);
err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
- mirror, 0, offset, length, status, opnum,
+ mirror, dss_id, offset, length, status, opnum,
nfs_io_gfp_mask());
switch (status) {
@@ -1456,7 +1469,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
case NFS4ERR_PERM:
break;
case NFS4ERR_NXIO:
- ff_layout_mark_ds_unreachable(lseg, idx);
+ ff_layout_mark_ds_unreachable(lseg, idx, dss_id);
/*
* Don't return the layout if this is a read and we still
* have layouts to try
@@ -1476,10 +1489,16 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
static int ff_layout_read_done_cb(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(hdr->lseg);
+ u32 dss_id = nfs4_ff_layout_calc_dss_id(
+ flseg->stripe_unit,
+ flseg->mirror_array[hdr->pgio_mirror_idx]->dss_count,
+ hdr->args.offset);
int err;
if (task->tk_status < 0) {
- ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
+ ff_layout_io_track_ds_error(hdr->lseg,
+ hdr->pgio_mirror_idx, dss_id,
hdr->args.offset, hdr->args.count,
&hdr->res.op_status, OP_READ,
task->tk_status);
@@ -1489,7 +1508,8 @@ static int ff_layout_read_done_cb(struct rpc_task *task,
err = ff_layout_async_handle_error(task, hdr->res.op_status,
hdr->args.context->state,
hdr->ds_clp, hdr->lseg,
- hdr->pgio_mirror_idx);
+ hdr->pgio_mirror_idx,
+ dss_id);
trace_nfs4_pnfs_read(hdr, err);
clear_bit(NFS_IOHDR_RESEND_PNFS, &hdr->flags);
@@ -1545,23 +1565,47 @@ ff_layout_set_layoutcommit(struct inode *inode,
static void ff_layout_read_record_layoutstats_start(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (test_and_set_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_start_read(hdr->inode,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count,
- task->tk_start);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_start_read(
+ hdr->inode,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ task->tk_start);
}
static void ff_layout_read_record_layoutstats_done(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (!test_and_clear_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_end_read(task,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count,
- hdr->res.count);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_end_read(
+ task,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ hdr->res.count);
set_bit(NFS_LSEG_LAYOUTRETURN, &hdr->lseg->pls_flags);
}
@@ -1649,11 +1693,17 @@ static void ff_layout_read_release(void *data)
static int ff_layout_write_done_cb(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(hdr->lseg);
+ u32 dss_id = nfs4_ff_layout_calc_dss_id(
+ flseg->stripe_unit,
+ flseg->mirror_array[hdr->pgio_mirror_idx]->dss_count,
+ hdr->args.offset);
loff_t end_offs = 0;
int err;
if (task->tk_status < 0) {
- ff_layout_io_track_ds_error(hdr->lseg, hdr->pgio_mirror_idx,
+ ff_layout_io_track_ds_error(hdr->lseg,
+ hdr->pgio_mirror_idx, dss_id,
hdr->args.offset, hdr->args.count,
&hdr->res.op_status, OP_WRITE,
task->tk_status);
@@ -1663,7 +1713,8 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
err = ff_layout_async_handle_error(task, hdr->res.op_status,
hdr->args.context->state,
hdr->ds_clp, hdr->lseg,
- hdr->pgio_mirror_idx);
+ hdr->pgio_mirror_idx,
+ dss_id);
trace_nfs4_pnfs_write(hdr, err);
clear_bit(NFS_IOHDR_RESEND_PNFS, &hdr->flags);
@@ -1701,9 +1752,11 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
struct nfs_commit_data *data)
{
int err;
+ u32 idx = calc_mirror_idx_from_commit(data->lseg, data->ds_commit_index);
+ u32 dss_id = calc_dss_id_from_commit(data->lseg, data->ds_commit_index);
if (task->tk_status < 0) {
- ff_layout_io_track_ds_error(data->lseg, data->ds_commit_index,
+ ff_layout_io_track_ds_error(data->lseg, idx, dss_id,
data->args.offset, data->args.count,
&data->res.op_status, OP_COMMIT,
task->tk_status);
@@ -1711,8 +1764,8 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
}
err = ff_layout_async_handle_error(task, data->res.op_status,
- NULL, data->ds_clp, data->lseg,
- data->ds_commit_index);
+ NULL, data->ds_clp, data->lseg, idx,
+ dss_id);
trace_nfs4_pnfs_commit_ds(data, err);
switch (err) {
@@ -1731,30 +1784,54 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
}
ff_layout_set_layoutcommit(data->inode, data->lseg, data->lwb);
-
return 0;
}
static void ff_layout_write_record_layoutstats_start(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (test_and_set_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_start_write(hdr->inode,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count,
- task->tk_start);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_start_write(
+ hdr->inode,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ task->tk_start);
}
static void ff_layout_write_record_layoutstats_done(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (!test_and_clear_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_end_write(task,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count, hdr->res.count,
- hdr->res.verf->committed);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_end_write(
+ task,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ hdr->res.count,
+ hdr->res.verf->committed);
set_bit(NFS_LSEG_LAYOUTRETURN, &hdr->lseg->pls_flags);
}
@@ -1837,10 +1914,16 @@ static void ff_layout_write_release(void *data)
static void ff_layout_commit_record_layoutstats_start(struct rpc_task *task,
struct nfs_commit_data *cdata)
{
+ u32 idx, dss_id;
+
if (test_and_set_bit(NFS_IOHDR_STAT, &cdata->flags))
return;
+
+ idx = calc_mirror_idx_from_commit(cdata->lseg, cdata->ds_commit_index);
+ dss_id = calc_dss_id_from_commit(cdata->lseg, cdata->ds_commit_index);
nfs4_ff_layout_stat_io_start_write(cdata->inode,
- FF_LAYOUT_COMP(cdata->lseg, cdata->ds_commit_index),
+ FF_LAYOUT_COMP(cdata->lseg, idx),
+ dss_id,
0, task->tk_start);
}
@@ -1849,6 +1932,7 @@ static void ff_layout_commit_record_layoutstats_done(struct rpc_task *task,
{
struct nfs_page *req;
__u64 count = 0;
+ u32 idx, dss_id;
if (!test_and_clear_bit(NFS_IOHDR_STAT, &cdata->flags))
return;
@@ -1857,8 +1941,12 @@ static void ff_layout_commit_record_layoutstats_done(struct rpc_task *task,
list_for_each_entry(req, &cdata->pages, wb_list)
count += req->wb_bytes;
}
+
+ idx = calc_mirror_idx_from_commit(cdata->lseg, cdata->ds_commit_index);
+ dss_id = calc_dss_id_from_commit(cdata->lseg, cdata->ds_commit_index);
nfs4_ff_layout_stat_io_end_write(task,
- FF_LAYOUT_COMP(cdata->lseg, cdata->ds_commit_index),
+ FF_LAYOUT_COMP(cdata->lseg, idx),
+ dss_id,
count, count, NFS_FILE_SYNC);
set_bit(NFS_LSEG_LAYOUTRETURN, &cdata->lseg->pls_flags);
}
@@ -2245,25 +2333,28 @@ static void ff_layout_cancel_io(struct pnfs_layout_segment *lseg)
struct nfs4_pnfs_ds *ds;
struct nfs_client *ds_clp;
struct rpc_clnt *clnt;
- u32 idx;
+ u32 idx, dss_id;
for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
mirror = flseg->mirror_array[idx];
- mirror_ds = mirror->dss[0].mirror_ds;
- if (IS_ERR_OR_NULL(mirror_ds))
- continue;
- ds = mirror->dss[0].mirror_ds->ds;
- if (!ds)
- continue;
- ds_clp = ds->ds_clp;
- if (!ds_clp)
- continue;
- clnt = ds_clp->cl_rpcclient;
- if (!clnt)
- continue;
- if (!rpc_cancel_tasks(clnt, -EAGAIN, ff_layout_match_io, lseg))
- continue;
- rpc_clnt_disconnect(clnt);
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ mirror_ds = mirror->dss[dss_id].mirror_ds;
+ if (IS_ERR_OR_NULL(mirror_ds))
+ continue;
+ ds = mirror->dss[dss_id].mirror_ds->ds;
+ if (!ds)
+ continue;
+ ds_clp = ds->ds_clp;
+ if (!ds_clp)
+ continue;
+ clnt = ds_clp->cl_rpcclient;
+ if (!clnt)
+ continue;
+ if (!rpc_cancel_tasks(clnt, -EAGAIN,
+ ff_layout_match_io, lseg))
+ continue;
+ rpc_clnt_disconnect(clnt);
+ }
}
}
@@ -2651,11 +2742,11 @@ ff_layout_encode_io_latency(struct xdr_stream *xdr,
static void
ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
const struct nfs42_layoutstat_devinfo *devinfo,
- struct nfs4_ff_layout_mirror *mirror)
+ struct nfs4_ff_layout_ds_stripe *dss_info)
{
struct nfs4_pnfs_ds_addr *da;
- struct nfs4_pnfs_ds *ds = mirror->dss[0].mirror_ds->ds;
- struct nfs_fh *fh = &mirror->dss[0].fh_versions[0];
+ struct nfs4_pnfs_ds *ds = dss_info->mirror_ds->ds;
+ struct nfs_fh *fh = &dss_info->fh_versions[0];
__be32 *p;
da = list_first_entry(&ds->ds_addrs, struct nfs4_pnfs_ds_addr, da_node);
@@ -2667,13 +2758,17 @@ ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
p = xdr_reserve_space(xdr, 4 + fh->size);
xdr_encode_opaque(p, fh->data, fh->size);
/* ff_io_latency4 read */
- spin_lock(&mirror->lock);
- ff_layout_encode_io_latency(xdr, &mirror->dss[0].read_stat.io_stat);
+ spin_lock(&dss_info->mirror->lock);
+ ff_layout_encode_io_latency(xdr,
+ &dss_info->read_stat.io_stat);
/* ff_io_latency4 write */
- ff_layout_encode_io_latency(xdr, &mirror->dss[0].write_stat.io_stat);
- spin_unlock(&mirror->lock);
+ ff_layout_encode_io_latency(xdr,
+ &dss_info->write_stat.io_stat);
+ spin_unlock(&dss_info->mirror->lock);
/* nfstime4 */
- ff_layout_encode_nfstime(xdr, ktime_sub(ktime_get(), mirror->dss[0].start_time));
+ ff_layout_encode_nfstime(xdr,
+ ktime_sub(ktime_get(),
+ dss_info->start_time));
/* bool */
p = xdr_reserve_space(xdr, 4);
*p = cpu_to_be32(false);
@@ -2697,7 +2792,8 @@ ff_layout_encode_layoutstats(struct xdr_stream *xdr, const void *args,
static void
ff_layout_free_layoutstats(struct nfs4_xdr_opaque_data *opaque)
{
- struct nfs4_ff_layout_mirror *mirror = opaque->data;
+ struct nfs4_ff_layout_ds_stripe *dss_info = opaque->data;
+ struct nfs4_ff_layout_mirror *mirror = dss_info->mirror;
ff_layout_put_mirror(mirror);
}
@@ -2714,37 +2810,47 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
{
struct nfs4_flexfile_layout *ff_layout = FF_LAYOUT_FROM_HDR(lo);
struct nfs4_ff_layout_mirror *mirror;
+ struct nfs4_ff_layout_ds_stripe *dss_info;
struct nfs4_deviceid_node *dev;
- int i = 0;
+ int i = 0, dss_id;
list_for_each_entry(mirror, &ff_layout->mirrors, mirrors) {
- if (i >= dev_limit)
- break;
- if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
- continue;
- if (!test_and_clear_bit(NFS4_FF_MIRROR_STAT_AVAIL,
- &mirror->flags) &&
- type != NFS4_FF_OP_LAYOUTRETURN)
- continue;
- /* mirror refcount put in cleanup_layoutstats */
- if (!refcount_inc_not_zero(&mirror->ref))
- continue;
- dev = &mirror->dss[0].mirror_ds->id_node;
- memcpy(&devinfo->dev_id, &dev->deviceid, NFS4_DEVICEID4_SIZE);
- devinfo->offset = 0;
- devinfo->length = NFS4_MAX_UINT64;
- spin_lock(&mirror->lock);
- devinfo->read_count = mirror->dss[0].read_stat.io_stat.ops_completed;
- devinfo->read_bytes = mirror->dss[0].read_stat.io_stat.bytes_completed;
- devinfo->write_count = mirror->dss[0].write_stat.io_stat.ops_completed;
- devinfo->write_bytes = mirror->dss[0].write_stat.io_stat.bytes_completed;
- spin_unlock(&mirror->lock);
- devinfo->layout_type = LAYOUT_FLEX_FILES;
- devinfo->ld_private.ops = &layoutstat_ops;
- devinfo->ld_private.data = mirror;
-
- devinfo++;
- i++;
+ for (dss_id = 0; dss_id < mirror->dss_count; ++dss_id) {
+ dss_info = &mirror->dss[dss_id];
+ if (i >= dev_limit)
+ break;
+ if (IS_ERR_OR_NULL(dss_info->mirror_ds))
+ continue;
+ if (!test_and_clear_bit(NFS4_FF_MIRROR_STAT_AVAIL,
+ &mirror->flags) &&
+ type != NFS4_FF_OP_LAYOUTRETURN)
+ continue;
+ /* mirror refcount put in cleanup_layoutstats */
+ if (!refcount_inc_not_zero(&mirror->ref))
+ continue;
+ dev = &dss_info->mirror_ds->id_node;
+ memcpy(&devinfo->dev_id,
+ &dev->deviceid,
+ NFS4_DEVICEID4_SIZE);
+ devinfo->offset = 0;
+ devinfo->length = NFS4_MAX_UINT64;
+ spin_lock(&mirror->lock);
+ devinfo->read_count =
+ dss_info->read_stat.io_stat.ops_completed;
+ devinfo->read_bytes =
+ dss_info->read_stat.io_stat.bytes_completed;
+ devinfo->write_count =
+ dss_info->write_stat.io_stat.ops_completed;
+ devinfo->write_bytes =
+ dss_info->write_stat.io_stat.bytes_completed;
+ spin_unlock(&mirror->lock);
+ devinfo->layout_type = LAYOUT_FLEX_FILES;
+ devinfo->ld_private.ops = &layoutstat_ops;
+ devinfo->ld_private.data = &mirror->dss[dss_id];
+
+ devinfo++;
+ i++;
+ }
}
return i;
}
--
2.34.1
* [RFC PATCH v4 9/9] NFSv4/flexfiles: Add support for striped layouts
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (7 preceding siblings ...)
2025-09-24 16:20 ` [RFC PATCH v4 8/9] NFSv4/flexfiles: Update layout stats & error paths " Jonathan Curley
@ 2025-09-24 16:20 ` Jonathan Curley
2025-10-07 14:05 ` [RFC PATCH v4 0/9] " Mike Snitzer
2025-10-15 13:09 ` [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Mike Snitzer
10 siblings, 0 replies; 17+ messages in thread
From: Jonathan Curley @ 2025-09-24 16:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker
Cc: Jonathan Curley, Luis Chamberlain, linux-nfs
Updates lseg creation path to parse and add striped layouts. Enable
support for striped layouts.
Limitations:
1. All mirrors must have the same number of stripes.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 247 ++++++++++++++++---------
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
2 files changed, 157 insertions(+), 92 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 7b95ab1cd140..4546711a7117 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -177,18 +177,19 @@ ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx, u32 dss_id,
#endif
}
-static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
- const struct nfs4_ff_layout_mirror *m2)
+static bool ff_dss_match_fh(const struct nfs4_ff_layout_ds_stripe *dss1,
+ const struct nfs4_ff_layout_ds_stripe *dss2)
{
int i, j;
- if (m1->dss[0].fh_versions_cnt != m2->dss[0].fh_versions_cnt)
+ if (dss1->fh_versions_cnt != dss2->fh_versions_cnt)
return false;
- for (i = 0; i < m1->dss[0].fh_versions_cnt; i++) {
+
+ for (i = 0; i < dss1->fh_versions_cnt; i++) {
bool found_fh = false;
- for (j = 0; j < m2->dss[0].fh_versions_cnt; j++) {
- if (nfs_compare_fh(&m1->dss[0].fh_versions[i],
- &m2->dss[0].fh_versions[j]) == 0) {
+ for (j = 0; j < dss2->fh_versions_cnt; j++) {
+ if (nfs_compare_fh(&dss1->fh_versions[i],
+ &dss2->fh_versions[j]) == 0) {
found_fh = true;
break;
}
@@ -199,6 +200,38 @@ static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
return true;
}
+static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
+ const struct nfs4_ff_layout_mirror *m2)
+{
+ u32 dss_id;
+
+ if (m1->dss_count != m2->dss_count)
+ return false;
+
+ for (dss_id = 0; dss_id < m1->dss_count; dss_id++)
+ if (!ff_dss_match_fh(&m1->dss[dss_id], &m2->dss[dss_id]))
+ return false;
+
+ return true;
+}
+
+static bool ff_mirror_match_devid(const struct nfs4_ff_layout_mirror *m1,
+ const struct nfs4_ff_layout_mirror *m2)
+{
+ u32 dss_id;
+
+ if (m1->dss_count != m2->dss_count)
+ return false;
+
+ for (dss_id = 0; dss_id < m1->dss_count; dss_id++)
+ if (memcmp(&m1->dss[dss_id].devid,
+ &m2->dss[dss_id].devid,
+ sizeof(m1->dss[dss_id].devid)) != 0)
+ return false;
+
+ return true;
+}
+
static struct nfs4_ff_layout_mirror *
ff_layout_add_mirror(struct pnfs_layout_hdr *lo,
struct nfs4_ff_layout_mirror *mirror)
@@ -209,8 +242,7 @@ ff_layout_add_mirror(struct pnfs_layout_hdr *lo,
spin_lock(&inode->i_lock);
list_for_each_entry(pos, &ff_layout->mirrors, mirrors) {
- if (memcmp(&mirror->dss[0].devid, &pos->dss[0].devid,
- sizeof(pos->dss[0].devid)) != 0)
+ if (!ff_mirror_match_devid(mirror, pos))
continue;
if (!ff_mirror_match_fh(mirror, pos))
continue;
@@ -241,13 +273,15 @@ ff_layout_remove_mirror(struct nfs4_ff_layout_mirror *mirror)
static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
{
struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
mirror = kzalloc(sizeof(*mirror), gfp_flags);
if (mirror != NULL) {
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
- nfs_localio_file_init(&mirror->dss[0].nfl);
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++)
+ nfs_localio_file_init(&mirror->dss[dss_id].nfl);
}
return mirror;
}
@@ -255,17 +289,19 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
const struct cred *cred;
- int dss_id = 0;
+ u32 dss_id;
ff_layout_remove_mirror(mirror);
- kfree(mirror->dss[dss_id].fh_versions);
- nfs_close_local_fh(&mirror->dss[dss_id].nfl);
- cred = rcu_access_pointer(mirror->dss[dss_id].ro_cred);
- put_cred(cred);
- cred = rcu_access_pointer(mirror->dss[dss_id].rw_cred);
- put_cred(cred);
- nfs4_ff_layout_put_deviceid(mirror->dss[dss_id].mirror_ds);
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ kfree(mirror->dss[dss_id].fh_versions);
+ cred = rcu_access_pointer(mirror->dss[dss_id].ro_cred);
+ put_cred(cred);
+ cred = rcu_access_pointer(mirror->dss[dss_id].rw_cred);
+ put_cred(cred);
+ nfs_close_local_fh(&mirror->dss[dss_id].nfl);
+ nfs4_ff_layout_put_deviceid(mirror->dss[dss_id].mirror_ds);
+ }
kfree(mirror->dss);
kfree(mirror);
@@ -371,14 +407,24 @@ ff_layout_add_lseg(struct pnfs_layout_hdr *lo,
free_me);
}
+static u32 ff_mirror_efficiency_sum(const struct nfs4_ff_layout_mirror *mirror)
+{
+ u32 dss_id, sum = 0;
+
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++)
+ sum += mirror->dss[dss_id].efficiency;
+
+ return sum;
+}
+
static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
{
int i, j;
for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
for (j = i + 1; j < fls->mirror_array_cnt; j++)
- if (fls->mirror_array[i]->dss[0].efficiency <
- fls->mirror_array[j]->dss[0].efficiency)
+ if (ff_mirror_efficiency_sum(fls->mirror_array[i]) <
+ ff_mirror_efficiency_sum(fls->mirror_array[j]))
swap(fls->mirror_array[i],
fls->mirror_array[j]);
}
@@ -398,6 +444,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
u32 mirror_array_cnt;
__be32 *p;
int i, rc;
+ struct nfs4_ff_layout_ds_stripe *dss_info;
dprintk("--> %s\n", __func__);
scratch = alloc_page(gfp_flags);
@@ -440,17 +487,24 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
kuid_t uid;
kgid_t gid;
u32 fh_count, id;
- int j, dss_id = 0;
+ int j, dss_id;
rc = -EIO;
p = xdr_inline_decode(&stream, 4);
if (!p)
goto out_err_free;
- dss_count = be32_to_cpup(p);
+ // Ensure all mirrors have same stripe count.
+ if (dss_count == 0)
+ dss_count = be32_to_cpup(p);
+ else if (dss_count != be32_to_cpup(p))
+ goto out_err_free;
+
+ if (dss_count > NFS4_FLEXFILE_LAYOUT_MAX_STRIPE_CNT ||
+ dss_count == 0)
+ goto out_err_free;
- /* FIXME: allow for striping? */
- if (dss_count != 1)
+ if (dss_count > 1 && stripe_unit == 0)
goto out_err_free;
fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags);
@@ -464,91 +518,100 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
gfp_flags);
- /* deviceid */
- rc = decode_deviceid(&stream, &fls->mirror_array[i]->dss[dss_id].devid);
- if (rc)
- goto out_err_free;
+ for (dss_id = 0; dss_id < dss_count; dss_id++) {
+ dss_info = &fls->mirror_array[i]->dss[dss_id];
+ dss_info->mirror = fls->mirror_array[i];
- /* efficiency */
- rc = -EIO;
- p = xdr_inline_decode(&stream, 4);
- if (!p)
- goto out_err_free;
- fls->mirror_array[i]->dss[dss_id].efficiency = be32_to_cpup(p);
+ /* deviceid */
+ rc = decode_deviceid(&stream, &dss_info->devid);
+ if (rc)
+ goto out_err_free;
- /* stateid */
- rc = decode_pnfs_stateid(&stream, &fls->mirror_array[i]->dss[dss_id].stateid);
- if (rc)
- goto out_err_free;
+ /* efficiency */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ dss_info->efficiency = be32_to_cpup(p);
- /* fh */
- rc = -EIO;
- p = xdr_inline_decode(&stream, 4);
- if (!p)
- goto out_err_free;
- fh_count = be32_to_cpup(p);
+ /* stateid */
+ rc = decode_pnfs_stateid(&stream, &dss_info->stateid);
+ if (rc)
+ goto out_err_free;
- fls->mirror_array[i]->dss[dss_id].fh_versions =
- kcalloc(fh_count, sizeof(struct nfs_fh),
- gfp_flags);
- if (fls->mirror_array[i]->dss[dss_id].fh_versions == NULL) {
- rc = -ENOMEM;
- goto out_err_free;
- }
+ /* fh */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ fh_count = be32_to_cpup(p);
- for (j = 0; j < fh_count; j++) {
- rc = decode_nfs_fh(&stream,
- &fls->mirror_array[i]->dss[dss_id].fh_versions[j]);
+ dss_info->fh_versions =
+ kcalloc(fh_count, sizeof(struct nfs_fh),
+ gfp_flags);
+ if (dss_info->fh_versions == NULL) {
+ rc = -ENOMEM;
+ goto out_err_free;
+ }
+
+ for (j = 0; j < fh_count; j++) {
+ rc = decode_nfs_fh(&stream,
+ &dss_info->fh_versions[j]);
+ if (rc)
+ goto out_err_free;
+ }
+
+ dss_info->fh_versions_cnt = fh_count;
+
+ /* user */
+ rc = decode_name(&stream, &id);
if (rc)
goto out_err_free;
- }
- fls->mirror_array[i]->dss[dss_id].fh_versions_cnt = fh_count;
+ uid = make_kuid(&init_user_ns, id);
- /* user */
- rc = decode_name(&stream, &id);
- if (rc)
- goto out_err_free;
+ /* group */
+ rc = decode_name(&stream, &id);
+ if (rc)
+ goto out_err_free;
- uid = make_kuid(&init_user_ns, id);
+ gid = make_kgid(&init_user_ns, id);
- /* group */
- rc = decode_name(&stream, &id);
- if (rc)
- goto out_err_free;
+ if (gfp_flags & __GFP_FS)
+ kcred = prepare_kernel_cred(&init_task);
+ else {
+ unsigned int nofs_flags = memalloc_nofs_save();
- gid = make_kgid(&init_user_ns, id);
+ kcred = prepare_kernel_cred(&init_task);
+ memalloc_nofs_restore(nofs_flags);
+ }
+ rc = -ENOMEM;
+ if (!kcred)
+ goto out_err_free;
+ kcred->fsuid = uid;
+ kcred->fsgid = gid;
+ cred = RCU_INITIALIZER(kcred);
- if (gfp_flags & __GFP_FS)
- kcred = prepare_kernel_cred(&init_task);
- else {
- unsigned int nofs_flags = memalloc_nofs_save();
- kcred = prepare_kernel_cred(&init_task);
- memalloc_nofs_restore(nofs_flags);
+ if (lgr->range.iomode == IOMODE_READ)
+ rcu_assign_pointer(dss_info->ro_cred, cred);
+ else
+ rcu_assign_pointer(dss_info->rw_cred, cred);
}
- rc = -ENOMEM;
- if (!kcred)
- goto out_err_free;
- kcred->fsuid = uid;
- kcred->fsgid = gid;
- cred = RCU_INITIALIZER(kcred);
-
- if (lgr->range.iomode == IOMODE_READ)
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
- else
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
- /* swap cred ptrs so free_mirror will clean up old */
- if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->dss[dss_id].ro_cred,
- fls->mirror_array[i]->dss[dss_id].ro_cred);
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
- } else {
- cred = xchg(&mirror->dss[dss_id].rw_cred,
- fls->mirror_array[i]->dss[dss_id].rw_cred);
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
+ for (dss_id = 0; dss_id < dss_count; dss_id++) {
+ dss_info = &fls->mirror_array[i]->dss[dss_id];
+ /* swap cred ptrs so free_mirror will clean up old */
+ if (lgr->range.iomode == IOMODE_READ) {
+ cred = xchg(&mirror->dss[dss_id].ro_cred,
+ dss_info->ro_cred);
+ rcu_assign_pointer(dss_info->ro_cred, cred);
+ } else {
+ cred = xchg(&mirror->dss[dss_id].rw_cred,
+ dss_info->rw_cred);
+ rcu_assign_pointer(dss_info->rw_cred, cred);
+ }
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 142324d6d5c5..17a008c8e97c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -21,6 +21,8 @@
* due to network error etc. */
#define NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT 4096
+#define NFS4_FLEXFILE_LAYOUT_MAX_STRIPE_CNT 4096
+
/* LAYOUTSTATS report interval in ms */
#define FF_LAYOUTSTATS_REPORT_INTERVAL (60000L)
#define FF_LAYOUTSTATS_MAXDEV 4
--
2.34.1
* Re: [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (8 preceding siblings ...)
2025-09-24 16:20 ` [RFC PATCH v4 9/9] NFSv4/flexfiles: Add support " Jonathan Curley
@ 2025-10-07 14:05 ` Mike Snitzer
2025-10-07 14:50 ` Mike Snitzer
2025-10-15 13:09 ` [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Mike Snitzer
10 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2025-10-07 14:05 UTC (permalink / raw)
To: Jonathan Curley
Cc: Trond Myklebust, Anna Schumaker, Luis Chamberlain, linux-nfs
On Wed, Sep 24, 2025 at 04:20:41PM +0000, Jonathan Curley wrote:
> This patch series introduces support for striped layouts:
>
> The first 2 patches are simple preparation changes. There should be
> no logical impact to the code.
>
> The 3rd patch refactors the nfs4_ff_layout_mirror struct to have an
> array of a new nfs4_ff_layout_ds_stripe type. The
> nfs4_ff_layout_ds_stripe has all the contents of ff_data_server4 per
> the flexfile rfc. I called it ds_stripe because ds was already taken
> by the deviceid side of the code.
>
> The patches 4-8 update various paths to be dss_id aware. Most of this
> consists of either adding a new parameter to the function or adding a
> loop. Depending on which is appropriate.
>
> The final patch 9 updates the layout creation path to populate the
> array and turns the feature on.
>
> v1:
> - Fixes function parameter 'dss_id' not described in
> 'nfs4_ff_layout_prepare_ds'
>
> v2:
> - Fixes layout stat error reporting path for commit to properly
> calculate dss_id.
>
> v3:
> - Fixes do_div dividend to be u64.
>
> v4:
> - Use regular division operators for u32 commit path math.
> - Fix mirror null check in ff_rw_layout_has_available_ds.
>
> Jonathan Curley (9):
> NFSv4/flexfiles: Remove cred local variable dependency
> NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
> NFSv4/flexfiles: Add data structure support for striped layouts
> NFSv4/flexfiles: Update low level helper functions to be DS stripe
> aware.
> NFSv4/flexfiles: Read path updates for striped layouts
> NFSv4/flexfiles: Commit path updates for striped layouts
> NFSv4/flexfiles: Write path updates for striped layouts
> NFSv4/flexfiles: Update layout stats & error paths for striped layouts
> NFSv4/flexfiles: Add support for striped layouts
>
> fs/nfs/flexfilelayout/flexfilelayout.c | 778 +++++++++++++++-------
> fs/nfs/flexfilelayout/flexfilelayout.h | 64 +-
> fs/nfs/flexfilelayout/flexfilelayoutdev.c | 105 +--
> fs/nfs/write.c | 2 +-
> 4 files changed, 635 insertions(+), 314 deletions(-)
>
> --
> 2.34.1
>
>
Hi Jonathan,
Testing the latest 'nfs-for-6.18-1' tag (now merged into Linus' tree),
with your flexfiles striped layout changes, using NFS LOCALIO results
in NFSD shutdown hanging due to nfsd refcount.
I'm using a 4.2 flexfiles client that connects to the local system's DS
via NFS v3 over LOCALIO, and then simply issuing very brief IO with dd:
dd if=/dev/zero of=/mnt/hs_test/dd_thisisa.test bs=47008 count=2 oflag=direct
dd if=/mnt/hs_test/dd_thisisa.test of=/dev/null bs=47008 count=2 iflag=direct
followed by umount of the filesystem, and then attempt to stop all
NFS/NFSD services so that all related kernel modules may be unloaded:
umount /mnt/hs_test
systemctl stop rpc-statd.service
systemctl stop rpc-statd-notify.service
systemctl stop var-lib-nfs-rpc_pipefs.mount
systemctl stop proc-fs-nfsd.mount
systemctl stop nfs-server.service
# ^ this hangs below...
systemctl stop rpcbind
systemctl stop rpcbind.socket
systemctl stop nfsdcld.service
systemctl stop nfs-client.target
/var/log/messages shows:
Oct 6 18:08:20 plsm121c-06 systemd[1]: mnt-hs_test.mount: Deactivated successfully.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
Oct 6 18:08:20 plsm121c-06 systemd[1]: rpc-statd.service: Deactivated successfully.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
Oct 6 18:08:20 plsm121c-06 systemd[1]: rpc-statd-notify.service: Deactivated successfully.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped Notify NFS peers of a restart.
Oct 6 18:08:20 plsm121c-06 rpc.idmapd[8982]: exiting on signal 15
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFSv4 ID-name mapping service...
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFSv4 Client Tracking Daemon...
Oct 6 18:08:20 plsm121c-06 systemd[1]: nfs-idmapd.service: Deactivated successfully.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped NFSv4 ID-name mapping service.
Oct 6 18:08:20 plsm121c-06 systemd[1]: nfsdcld.service: Deactivated successfully.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped NFSv4 Client Tracking Daemon.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped target rpc_pipefs.target.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Unmounting RPC Pipe File System...
Oct 6 18:08:20 plsm121c-06 systemd[1]: var-lib-nfs-rpc_pipefs.mount: Deactivated successfully.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Unmounted RPC Pipe File System.
Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFS server and services...
Oct 6 18:08:22 plsm121c-06 rpc.mountd[8988]: v4.2 client detached: 0x68a8fa0468e404e3 from "192.168.0.105:853"
Oct 6 18:09:50 plsm121c-06 systemd[1]: nfs-server.service: Stopping timed out. Terminating.
Oct 6 18:10:00 plsm121c-06 systemd[1]: Starting system activity accounting tool...
Oct 6 18:10:00 plsm121c-06 systemd[1]: sysstat-collect.service: Deactivated successfully.
Oct 6 18:10:00 plsm121c-06 systemd[1]: Finished system activity accounting tool.
Oct 6 18:11:21 plsm121c-06 systemd[1]: nfs-server.service: State 'stop-sigterm' timed out. Killing.
Oct 6 18:11:21 plsm121c-06 systemd[1]: nfs-server.service: Killing process 9669 (rpc.nfsd) with signal SIGKILL.
Oct 6 18:11:26 plsm121c-06 kernel: INFO: task rpc.nfsd:9669 blocked for more than 122 seconds.
Oct 6 18:11:26 plsm121c-06 kernel: Not tainted 6.12.24.23.hs.snitm+ #44
Oct 6 18:11:26 plsm121c-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 6 18:11:26 plsm121c-06 kernel: task:rpc.nfsd state:D stack:0 pid:9669 tgid:9669 ppid:1 flags:0x00004006
Oct 6 18:11:26 plsm121c-06 kernel: Call Trace:
Oct 6 18:11:26 plsm121c-06 kernel: <TASK>
Oct 6 18:11:26 plsm121c-06 kernel: __schedule+0x26d/0x530
Oct 6 18:11:26 plsm121c-06 kernel: schedule+0x27/0xa0
Oct 6 18:11:26 plsm121c-06 kernel: schedule_timeout+0x14e/0x160
Oct 6 18:11:26 plsm121c-06 kernel: ? svc_destroy+0xce/0x160 [sunrpc]
Oct 6 18:11:26 plsm121c-06 kernel: ? lockd_put+0x5f/0x90 [lockd]
Oct 6 18:11:26 plsm121c-06 kernel: __wait_for_common+0x8f/0x1d0
Oct 6 18:11:26 plsm121c-06 kernel: ? __pfx_schedule_timeout+0x10/0x10
Oct 6 18:11:26 plsm121c-06 kernel: nfsd_destroy_serv+0x138/0x1a0 [nfsd]
Oct 6 18:11:26 plsm121c-06 kernel: nfsd_svc+0xe0/0x170 [nfsd]
Oct 6 18:11:26 plsm121c-06 kernel: write_threads+0xc3/0x190 [nfsd]
Oct 6 18:11:26 plsm121c-06 kernel: ? simple_transaction_get+0xc2/0xe0
Oct 6 18:11:26 plsm121c-06 kernel: ? __pfx_write_threads+0x10/0x10 [nfsd]
Oct 6 18:11:26 plsm121c-06 kernel: nfsctl_transaction_write+0x47/0x80 [nfsd]
Oct 6 18:11:26 plsm121c-06 kernel: vfs_write+0xfa/0x420
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
Oct 6 18:11:26 plsm121c-06 kernel: ksys_write+0x63/0xe0
Oct 6 18:11:26 plsm121c-06 kernel: do_syscall_64+0x7d/0x160
Oct 6 18:11:26 plsm121c-06 kernel: ? __x64_sys_close+0x3c/0x80
Oct 6 18:11:26 plsm121c-06 kernel: ? kmem_cache_free+0x347/0x450
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
Oct 6 18:11:26 plsm121c-06 kernel: ? list_lru_add+0x142/0x190
Oct 6 18:11:26 plsm121c-06 kernel: ? task_lookup_next_fdget_rcu+0x91/0xd0
Oct 6 18:11:26 plsm121c-06 kernel: ? __pfx_proc_fd_instantiate+0x10/0x10
Oct 6 18:11:26 plsm121c-06 kernel: ? proc_readfd_common+0xf5/0x1f0
Oct 6 18:11:26 plsm121c-06 kernel: ? atime_needs_update+0x61/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? touch_atime+0x1e/0x100
Oct 6 18:11:26 plsm121c-06 kernel: ? iterate_dir+0x18f/0x220
Oct 6 18:11:26 plsm121c-06 kernel: ? __x64_sys_getdents64+0xf7/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
Oct 6 18:11:26 plsm121c-06 kernel: ? do_user_addr_fault+0x341/0x6b0
Oct 6 18:11:26 plsm121c-06 kernel: ? exc_page_fault+0x70/0x160
Oct 6 18:11:26 plsm121c-06 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 6 18:11:26 plsm121c-06 kernel: RIP: 0033:0x7fc17ecfd617
Oct 6 18:11:26 plsm121c-06 kernel: RSP: 002b:00007ffc321ff1c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Oct 6 18:11:26 plsm121c-06 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc17ecfd617
Oct 6 18:11:26 plsm121c-06 kernel: RDX: 0000000000000002 RSI: 000056109cba2c20 RDI: 0000000000000003
Oct 6 18:11:26 plsm121c-06 kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffc321ff060
Oct 6 18:11:26 plsm121c-06 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000056109cba2c20
Oct 6 18:11:26 plsm121c-06 kernel: R13: 00007fc17ee086c8 R14: 00007ffc321ff290 R15: 00000000ffffffff
Oct 6 18:11:26 plsm121c-06 kernel: </TASK>
I started to try to bisect but if I only apply the first 3 commits in
your series:
fec80afc41af NFSv4/flexfiles: Remove cred local variable dependency
eb71428e1a7f NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
d442670c0f63 NFSv4/flexfiles: Add data structure support for striped layouts
and rerun the test I get a kernel panic (which I don't yet have a
crashdump for, kdump appears misconfigured on my system).
Mike
* Re: [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts
2025-10-07 14:05 ` [RFC PATCH v4 0/9] " Mike Snitzer
@ 2025-10-07 14:50 ` Mike Snitzer
2025-10-07 16:10 ` [PATCH] NFSv4/flexfiles: fix to allocate mirror->dss before use Mike Snitzer
0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2025-10-07 14:50 UTC (permalink / raw)
To: Jonathan Curley
Cc: Trond Myklebust, Anna Schumaker, Luis Chamberlain, linux-nfs
On Tue, Oct 07, 2025 at 10:05:00AM -0400, Mike Snitzer wrote:
> On Wed, Sep 24, 2025 at 04:20:41PM +0000, Jonathan Curley wrote:
> > This patch series introduces support for striped layouts:
> >
> > The first 2 patches are simple preparation changes. There should be
> > no logical impact to the code.
> >
> > The 3rd patch refactors the nfs4_ff_layout_mirror struct to have an
> > array of a new nfs4_ff_layout_ds_stripe type. The
> > nfs4_ff_layout_ds_stripe has all the contents of ff_data_server4 per
> > the flexfile rfc. I called it ds_stripe because ds was already taken
> > by the deviceid side of the code.
> >
> > The patches 4-8 update various paths to be dss_id aware. Most of this
> > consists of either adding a new parameter to the function or adding a
> > loop. Depending on which is appropriate.
> >
> > The final patch 9 updates the layout creation path to populate the
> > array and turns the feature on.
> >
> > v1:
> > - Fixes function parameter 'dss_id' not described in
> > 'nfs4_ff_layout_prepare_ds'
> >
> > v2:
> > - Fixes layout stat error reporting path for commit to properly
> > calculate dss_id.
> >
> > v3:
> > - Fixes do_div dividend to be u64.
> >
> > v4:
> > - Use regular division operators for u32 commit path math.
> > - Fix mirror null check in ff_rw_layout_has_available_ds.
> >
> > Jonathan Curley (9):
> > NFSv4/flexfiles: Remove cred local variable dependency
> > NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
> > NFSv4/flexfiles: Add data structure support for striped layouts
> > NFSv4/flexfiles: Update low level helper functions to be DS stripe
> > aware.
> > NFSv4/flexfiles: Read path updates for striped layouts
> > NFSv4/flexfiles: Commit path updates for striped layouts
> > NFSv4/flexfiles: Write path updates for striped layouts
> > NFSv4/flexfiles: Update layout stats & error paths for striped layouts
> > NFSv4/flexfiles: Add support for striped layouts
> >
> > fs/nfs/flexfilelayout/flexfilelayout.c | 778 +++++++++++++++-------
> > fs/nfs/flexfilelayout/flexfilelayout.h | 64 +-
> > fs/nfs/flexfilelayout/flexfilelayoutdev.c | 105 +--
> > fs/nfs/write.c | 2 +-
> > 4 files changed, 635 insertions(+), 314 deletions(-)
> >
> > --
> > 2.34.1
> >
> >
>
> Hi Jonathan,
>
> Testing the latest 'nfs-for-6.18-1' tag (now merged into Linus' tree),
> with your flexfiles striped layout changes, using NFS LOCALIO results
> in NFSD shutdown hanging due to nfsd refcount.
>
> I'm using a 4.2 flexfiles client that connects to the local system's DS
> via NFS v3 over LOCALIO, and then simply issuing very brief IO with dd:
>
> dd if=/dev/zero of=/mnt/hs_test/dd_thisisa.test bs=47008 count=2 oflag=direct
> dd if=/mnt/hs_test/dd_thisisa.test of=/dev/null bs=47008 count=2 iflag=direct
>
> followed by umount of the filesystem, and then attempt to stop all
> NFS/NFSD services so that all related kernel modules may be unloaded:
>
> umount /mnt/hs_test
> systemctl stop rpc-statd.service
> systemctl stop rpc-statd-notify.service
> systemctl stop var-lib-nfs-rpc_pipefs.mount
> systemctl stop proc-fs-nfsd.mount
> systemctl stop nfs-server.service
> # ^ this hangs below...
>
> systemctl stop rpcbind
> systemctl stop rpcbind.socket
> systemctl stop nfsdcld.service
> systemctl stop nfs-client.target
>
> /var/log/messages shows:
>
> Oct 6 18:08:20 plsm121c-06 systemd[1]: mnt-hs_test.mount: Deactivated successfully.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
> Oct 6 18:08:20 plsm121c-06 systemd[1]: rpc-statd.service: Deactivated successfully.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> Oct 6 18:08:20 plsm121c-06 systemd[1]: rpc-statd-notify.service: Deactivated successfully.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped Notify NFS peers of a restart.
> Oct 6 18:08:20 plsm121c-06 rpc.idmapd[8982]: exiting on signal 15
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFSv4 ID-name mapping service...
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFSv4 Client Tracking Daemon...
> Oct 6 18:08:20 plsm121c-06 systemd[1]: nfs-idmapd.service: Deactivated successfully.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped NFSv4 ID-name mapping service.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: nfsdcld.service: Deactivated successfully.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped NFSv4 Client Tracking Daemon.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopped target rpc_pipefs.target.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Unmounting RPC Pipe File System...
> Oct 6 18:08:20 plsm121c-06 systemd[1]: var-lib-nfs-rpc_pipefs.mount: Deactivated successfully.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Unmounted RPC Pipe File System.
> Oct 6 18:08:20 plsm121c-06 systemd[1]: Stopping NFS server and services...
> Oct 6 18:08:22 plsm121c-06 rpc.mountd[8988]: v4.2 client detached: 0x68a8fa0468e404e3 from "192.168.0.105:853"
>
> Oct 6 18:09:50 plsm121c-06 systemd[1]: nfs-server.service: Stopping timed out. Terminating.
> Oct 6 18:10:00 plsm121c-06 systemd[1]: Starting system activity accounting tool...
> Oct 6 18:10:00 plsm121c-06 systemd[1]: sysstat-collect.service: Deactivated successfully.
> Oct 6 18:10:00 plsm121c-06 systemd[1]: Finished system activity accounting tool.
>
> Oct 6 18:11:21 plsm121c-06 systemd[1]: nfs-server.service: State 'stop-sigterm' timed out. Killing.
> Oct 6 18:11:21 plsm121c-06 systemd[1]: nfs-server.service: Killing process 9669 (rpc.nfsd) with signal SIGKILL.
> Oct 6 18:11:26 plsm121c-06 kernel: INFO: task rpc.nfsd:9669 blocked for more than 122 seconds.
> Oct 6 18:11:26 plsm121c-06 kernel: Not tainted 6.12.24.23.hs.snitm+ #44
> Oct 6 18:11:26 plsm121c-06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 6 18:11:26 plsm121c-06 kernel: task:rpc.nfsd state:D stack:0 pid:9669 tgid:9669 ppid:1 flags:0x00004006
> Oct 6 18:11:26 plsm121c-06 kernel: Call Trace:
> Oct 6 18:11:26 plsm121c-06 kernel: <TASK>
> Oct 6 18:11:26 plsm121c-06 kernel: __schedule+0x26d/0x530
> Oct 6 18:11:26 plsm121c-06 kernel: schedule+0x27/0xa0
> Oct 6 18:11:26 plsm121c-06 kernel: schedule_timeout+0x14e/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: ? svc_destroy+0xce/0x160 [sunrpc]
> Oct 6 18:11:26 plsm121c-06 kernel: ? lockd_put+0x5f/0x90 [lockd]
> Oct 6 18:11:26 plsm121c-06 kernel: __wait_for_common+0x8f/0x1d0
> Oct 6 18:11:26 plsm121c-06 kernel: ? __pfx_schedule_timeout+0x10/0x10
> Oct 6 18:11:26 plsm121c-06 kernel: nfsd_destroy_serv+0x138/0x1a0 [nfsd]
> Oct 6 18:11:26 plsm121c-06 kernel: nfsd_svc+0xe0/0x170 [nfsd]
> Oct 6 18:11:26 plsm121c-06 kernel: write_threads+0xc3/0x190 [nfsd]
> Oct 6 18:11:26 plsm121c-06 kernel: ? simple_transaction_get+0xc2/0xe0
> Oct 6 18:11:26 plsm121c-06 kernel: ? __pfx_write_threads+0x10/0x10 [nfsd]
> Oct 6 18:11:26 plsm121c-06 kernel: nfsctl_transaction_write+0x47/0x80 [nfsd]
> Oct 6 18:11:26 plsm121c-06 kernel: vfs_write+0xfa/0x420
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
> Oct 6 18:11:26 plsm121c-06 kernel: ksys_write+0x63/0xe0
> Oct 6 18:11:26 plsm121c-06 kernel: do_syscall_64+0x7d/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: ? __x64_sys_close+0x3c/0x80
> Oct 6 18:11:26 plsm121c-06 kernel: ? kmem_cache_free+0x347/0x450
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
> Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: ? list_lru_add+0x142/0x190
> Oct 6 18:11:26 plsm121c-06 kernel: ? task_lookup_next_fdget_rcu+0x91/0xd0
> Oct 6 18:11:26 plsm121c-06 kernel: ? __pfx_proc_fd_instantiate+0x10/0x10
> Oct 6 18:11:26 plsm121c-06 kernel: ? proc_readfd_common+0xf5/0x1f0
> Oct 6 18:11:26 plsm121c-06 kernel: ? atime_needs_update+0x61/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? touch_atime+0x1e/0x100
> Oct 6 18:11:26 plsm121c-06 kernel: ? iterate_dir+0x18f/0x220
> Oct 6 18:11:26 plsm121c-06 kernel: ? __x64_sys_getdents64+0xf7/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
> Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
> Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_work+0xf3/0x120
> Oct 6 18:11:26 plsm121c-06 kernel: ? syscall_exit_to_user_mode+0x32/0x1b0
> Oct 6 18:11:26 plsm121c-06 kernel: ? do_syscall_64+0x89/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: ? do_user_addr_fault+0x341/0x6b0
> Oct 6 18:11:26 plsm121c-06 kernel: ? exc_page_fault+0x70/0x160
> Oct 6 18:11:26 plsm121c-06 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Oct 6 18:11:26 plsm121c-06 kernel: RIP: 0033:0x7fc17ecfd617
> Oct 6 18:11:26 plsm121c-06 kernel: RSP: 002b:00007ffc321ff1c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> Oct 6 18:11:26 plsm121c-06 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc17ecfd617
> Oct 6 18:11:26 plsm121c-06 kernel: RDX: 0000000000000002 RSI: 000056109cba2c20 RDI: 0000000000000003
> Oct 6 18:11:26 plsm121c-06 kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffc321ff060
> Oct 6 18:11:26 plsm121c-06 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000056109cba2c20
> Oct 6 18:11:26 plsm121c-06 kernel: R13: 00007fc17ee086c8 R14: 00007ffc321ff290 R15: 00000000ffffffff
> Oct 6 18:11:26 plsm121c-06 kernel: </TASK>
>
> I started to try to bisect but if I only apply the first 3 commits in
> your series:
>
> fec80afc41af NFSv4/flexfiles: Remove cred local variable dependency
> eb71428e1a7f NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
> d442670c0f63 NFSv4/flexfiles: Add data structure support for striped layouts
I was mistaken, I had also applied the 4th commit:
a1491919c880 NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
> and rerun the test I get a kernel panic (which I don't yet have a
> crashdump for, kdump appears misconfigured on my system).
I also get a crash if I apply all but the last commit in the series,
and this time I was able to get a crashdump (it is the same crash I
hit in the previous case, with only the first 4 commits applied):
[ 301.706108] BUG: kernel NULL pointer dereference, address: 0000000000000060
[ 301.706124] #PF: supervisor write access in kernel mode
[ 301.706134] #PF: error_code(0x0002) - not-present page
[ 301.706143] PGD 80a3680067 P4D 0
[ 301.706150] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 301.706159] CPU: 27 UID: 0 PID: 4299 Comm: dd Kdump: loaded Tainted: G O ------- --- 6.12.24.23.hs.snitm+ #50
[ 301.706176] Tainted: [O]=OOT_MODULE
[ 301.706181] Hardware name: Supermicro SYS-121C-TN10R/X13DDW-A, BIOS 2.7 07/23/2025
[ 301.706191] RIP: 0010:ff_layout_alloc_lseg+0x1d8/0x7b0 [nfs_layout_flexfiles]
[ 301.706205] Code: 85 c0 0f 84 72 05 00 00 48 8d 50 08 c7 40 28 01 00 00 00 48 89 50 08 48 89 50 10 48 8b 50 20 c7 40 2c 00 00 00 00 48 8d 4a 70 <48> c7 42 60 00 00 00 00 48 89 4a 70 48 c7 42 68 00 00 00 00 48 89
[ 301.706228] RSP: 0018:ff825cf2df0876c0 EFLAGS: 00010286
[ 301.706236] RAX: ff373fab2dbf16c0 RBX: 0000000000000001 RCX: 0000000000000070
[ 301.706246] RDX: 0000000000000000 RSI: ffffffffc1b4db5d RDI: ff373fab2dbf1700
[ 301.706529] RBP: ff825cf2df0877c8 R08: 0000000000000040 R09: ff373fab2dbf16c0
[ 301.706759] R10: ff825cf2df0876c0 R11: 0000000000000005 R12: ff37402c1f9c0e80
[ 301.706979] R13: 0000000000000000 R14: ff37402c0269f080 R15: 0000000000000000
[ 301.707194] FS: 00007fa060a0f740(0000) GS:ff3740a87f180000(0000) knlGS:0000000000000000
[ 301.707409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.707617] CR2: 0000000000000060 CR3: 000000809d2a0001 CR4: 0000000000f73ef0
[ 301.707822] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 301.708023] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 301.708223] PKRU: 55555554
[ 301.708420] Call Trace:
[ 301.708616] <TASK>
[ 301.708811] ? __pfx_nfs_init_locked+0x10/0x10 [nfs]
[ 301.709043] pnfs_layout_process+0xc9/0x3c0 [nfsv4]
[ 301.709283] pnfs_parse_lgopen+0x5b/0x120 [nfsv4]
[ 301.709511] _nfs4_open_and_get_state+0x173/0x2b0 [nfsv4]
[ 301.709734] ? nfs4_opendata_alloc+0x26d/0x400 [nfsv4]
[ 301.709956] _nfs4_do_open.isra.0+0x168/0x470 [nfsv4]
[ 301.710176] nfs4_do_open+0xcc/0x210 [nfsv4]
[ 301.710393] ? __memcg_slab_post_alloc_hook+0x220/0x3d0
[ 301.710595] nfs4_atomic_open+0x10b/0x140 [nfsv4]
[ 301.710818] ? alloc_nfs_open_context+0x2e/0x190 [nfs]
[ 301.711047] nfs_atomic_open+0x209/0x6b0 [nfs]
[ 301.711270] lookup_open.isra.0+0x394/0x620
[ 301.711473] open_last_lookups+0x1f6/0x470
[ 301.711677] path_openat+0x88/0x280
[ 301.711875] ? folio_add_file_rmap_ptes+0x38/0xb0
[ 301.712076] do_filp_open+0xae/0x150
[ 301.712273] ? syscall_exit_to_user_mode+0x32/0x1b0
[ 301.712474] ? __check_object_size.part.0+0x5e/0x140
[ 301.712672] do_sys_openat2+0x96/0xd0
[ 301.712871] __x64_sys_openat+0x57/0xa0
[ 301.713066] do_syscall_64+0x7d/0x160
[ 301.713260] ? __count_memcg_events+0x53/0xf0
[ 301.713449] ? handle_mm_fault+0x245/0x340
[ 301.713635] ? do_user_addr_fault+0x341/0x6b0
[ 301.713823] ? exc_page_fault+0x70/0x160
[ 301.714008] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 301.714195] RIP: 0033:0x7fa0608fd2cb
[ 301.714376] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[ 301.714749] RSP: 002b:00007ffe82758370 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 301.714938] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fa0608fd2cb
[ 301.715127] RDX: 0000000000004241 RSI: 00007ffe82759ce1 RDI: 00000000ffffff9c
[ 301.715311] RBP: 00007ffe82759ce1 R08: 0000000000000000 R09: 0000000000000002
[ 301.715490] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000004241
[ 301.715667] R13: 0000000000004241 R14: 00007ffe82759ce1 R15: 00007ffe82759d15
[ 301.715843] </TASK>
crash> l *ff_layout_alloc_lseg+0x1d8
0xffffffffc170eb88 is in ff_layout_alloc_lseg (./include/linux/nfs_fs.h:90).
85 };
86
87 static inline void nfs_localio_file_init(struct nfs_file_localio *nfl)
88 {
89 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
90 nfl->ro_file = NULL;
91 nfl->rw_file = NULL;
92 INIT_LIST_HEAD(&nfl->list);
93 nfl->nfs_uuid = NULL;
94 #endif
pnfs_layout_process
-> ff_layout_alloc_mirror
-> nfs_localio_file_init(&mirror->dss[0].nfl)
So given the crash with:
BUG: kernel NULL pointer dereference, address: 0000000000000060
It would appear mirror->dss is NULL (so &mirror->dss[0].nfl resolves to 0x60), given the nfl member is at offset 0x60:
crash> struct -o nfs4_ff_layout_ds_stripe
struct nfs4_ff_layout_ds_stripe {
[0x0] struct nfs4_ff_layout_mirror *mirror;
[0x8] struct nfs4_deviceid devid;
[0x18] u32 efficiency;
[0x20] struct nfs4_ff_layout_ds *mirror_ds;
[0x28] u32 fh_versions_cnt;
[0x30] struct nfs_fh *fh_versions;
[0x38] nfs4_stateid stateid;
[0x50] const struct cred *ro_cred;
[0x58] const struct cred *rw_cred;
[0x60] struct nfs_file_localio nfl;
[0x88] struct nfs4_ff_layoutstat read_stat;
[0xd0] struct nfs4_ff_layoutstat write_stat;
[0x118] ktime_t start_time;
}
SIZE: 0x120
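The offset arithmetic behind that conclusion can be checked in userspace. The sketch below is illustrative only: mock_localio, mock_ds_stripe and ro_file_store_address() are hypothetical stand-ins that reproduce just the 0x60 offset reported by crash, not the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nfs_file_localio; only the field order matters. */
struct mock_localio {
	void *ro_file;
	void *rw_file;
	void *list[2];
	void *nfs_uuid;
};

/* Hypothetical stand-in for nfs4_ff_layout_ds_stripe: opaque bytes place nfl at 0x60. */
struct mock_ds_stripe {
	unsigned char before_nfl[0x60];
	struct mock_localio nfl;
};

static_assert(offsetof(struct mock_ds_stripe, nfl) == 0x60,
	      "nfl must sit at offset 0x60, as in the crash output");

/*
 * Address written by "dss[dss_id].nfl.ro_file = NULL" for a given base
 * address of the dss array. Pure integer math, so no NULL pointer is
 * ever dereferenced here.
 */
static uintptr_t ro_file_store_address(uintptr_t dss_base, size_t dss_id)
{
	return dss_base + dss_id * sizeof(struct mock_ds_stripe)
	       + offsetof(struct mock_ds_stripe, nfl)
	       + offsetof(struct mock_localio, ro_file);
}
```

With dss_base = 0 and dss_id = 0 the result is 0x60, which matches the faulting address in the BUG line and is consistent with the dss array never having been allocated.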
* [PATCH] NFSv4/flexfiles: fix to allocate mirror->dss before use
2025-10-07 14:50 ` Mike Snitzer
@ 2025-10-07 16:10 ` Mike Snitzer
2025-10-07 17:39 ` [PATCH v2] " Mike Snitzer
0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2025-10-07 16:10 UTC (permalink / raw)
To: Jonathan Curley, Anna Schumaker
Cc: Trond Myklebust, Luis Chamberlain, linux-nfs
Move mirror_array's dss_count initialization and dss allocation to
ff_layout_alloc_mirror(), just before the loop that initializes each
nfs4_ff_layout_ds_stripe's nfs_file_localio.
This resolves dangling nfsd_serv refcount issues seen when using NFS
LOCALIO.
Fixes: 20b1d75fb840 ("NFSv4/flexfiles: Add support for striped layouts")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index fedd7d90e12f..364e16708ca7 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -270,7 +270,8 @@ ff_layout_remove_mirror(struct nfs4_ff_layout_mirror *mirror)
mirror->layout = NULL;
}
-static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
+static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags,
+ int dss_count)
{
struct nfs4_ff_layout_mirror *mirror;
u32 dss_id;
@@ -280,6 +281,12 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
+
+ mirror->dss_count = dss_count;
+ mirror->dss =
+ kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
+ gfp_flags);
+
for (dss_id = 0; dss_id < mirror->dss_count; dss_id++)
nfs_localio_file_init(&mirror->dss[dss_id].nfl);
}
@@ -507,17 +514,12 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
if (dss_count > 1 && stripe_unit == 0)
goto out_err_free;
- fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags);
+ fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags, dss_count);
if (fls->mirror_array[i] == NULL) {
rc = -ENOMEM;
goto out_err_free;
}
- fls->mirror_array[i]->dss_count = dss_count;
- fls->mirror_array[i]->dss =
- kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
- gfp_flags);
-
for (dss_id = 0; dss_id < dss_count; dss_id++) {
dss_info = &fls->mirror_array[i]->dss[dss_id];
dss_info->mirror = fls->mirror_array[i];
* [PATCH v2] NFSv4/flexfiles: fix to allocate mirror->dss before use
2025-10-07 16:10 ` [PATCH] NFSv4/flexfiles: fix to allocate mirror->dss before use Mike Snitzer
@ 2025-10-07 17:39 ` Mike Snitzer
2025-10-07 18:03 ` Jon Curley
0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2025-10-07 17:39 UTC (permalink / raw)
To: Jonathan Curley, Anna Schumaker
Cc: Trond Myklebust, Luis Chamberlain, linux-nfs
Move mirror_array's dss_count initialization and dss allocation to
ff_layout_alloc_mirror(), just before the loop that initializes each
nfs4_ff_layout_ds_stripe's nfs_file_localio.
Also handle NULL return from kcalloc() and remove one level of indent
in ff_layout_alloc_mirror().
This commit fixes dangling nfsd_serv refcount issues seen when using
NFS LOCALIO and then attempting to stop the NFSD service.
Fixes: 20b1d75fb840 ("NFSv4/flexfiles: Add support for striped layouts")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
v2: check for NULL return from kcalloc() and remove one level of indent in ff_layout_alloc_mirror()
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index fedd7d90e12f..b7d2c0ef25fe 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -270,19 +270,31 @@ ff_layout_remove_mirror(struct nfs4_ff_layout_mirror *mirror)
mirror->layout = NULL;
}
-static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
+static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(u32 dss_count,
+ gfp_t gfp_flags)
{
struct nfs4_ff_layout_mirror *mirror;
- u32 dss_id;
mirror = kzalloc(sizeof(*mirror), gfp_flags);
- if (mirror != NULL) {
- spin_lock_init(&mirror->lock);
- refcount_set(&mirror->ref, 1);
- INIT_LIST_HEAD(&mirror->mirrors);
- for (dss_id = 0; dss_id < mirror->dss_count; dss_id++)
- nfs_localio_file_init(&mirror->dss[dss_id].nfl);
+ if (mirror == NULL)
+ return NULL;
+
+ spin_lock_init(&mirror->lock);
+ refcount_set(&mirror->ref, 1);
+ INIT_LIST_HEAD(&mirror->mirrors);
+
+ mirror->dss_count = dss_count;
+ mirror->dss =
+ kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
+ gfp_flags);
+ if (mirror->dss == NULL) {
+ kfree(mirror);
+ return NULL;
}
+
+ for (u32 dss_id = 0; dss_id < mirror->dss_count; dss_id++)
+ nfs_localio_file_init(&mirror->dss[dss_id].nfl);
+
return mirror;
}
@@ -507,17 +519,12 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
if (dss_count > 1 && stripe_unit == 0)
goto out_err_free;
- fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags);
+ fls->mirror_array[i] = ff_layout_alloc_mirror(dss_count, gfp_flags);
if (fls->mirror_array[i] == NULL) {
rc = -ENOMEM;
goto out_err_free;
}
- fls->mirror_array[i]->dss_count = dss_count;
- fls->mirror_array[i]->dss =
- kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
- gfp_flags);
-
for (dss_id = 0; dss_id < dss_count; dss_id++) {
dss_info = &fls->mirror_array[i]->dss[dss_id];
dss_info->mirror = fls->mirror_array[i];
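The ordering this patch establishes (set the count, allocate the array, bail out on allocation failure, and only then initialize each element) can be sketched in userspace as follows. This is a minimal illustration, not the kernel code: demo_stripe, demo_mirror, demo_alloc_mirror() and demo_free_mirror() are hypothetical names, calloc()/free() stand in for kzalloc()/kcalloc()/kfree(), and rejecting a count of zero is an assumption of the sketch rather than something the patch does.

```c
#include <stdlib.h>

/* Hypothetical per-stripe state; stands in for the nfl member of each stripe. */
struct demo_stripe {
	void *ro_file;
	void *rw_file;
};

/* Hypothetical mirror that owns an array of dss_count stripes. */
struct demo_mirror {
	unsigned int dss_count;
	struct demo_stripe *dss;
};

static struct demo_mirror *demo_alloc_mirror(unsigned int dss_count)
{
	struct demo_mirror *mirror;

	/* Sketch-only assumption: treat an empty stripe array as invalid. */
	if (dss_count == 0)
		return NULL;

	mirror = calloc(1, sizeof(*mirror));
	if (mirror == NULL)
		return NULL;

	/* Allocate the array before anything walks it. */
	mirror->dss_count = dss_count;
	mirror->dss = calloc(dss_count, sizeof(*mirror->dss));
	if (mirror->dss == NULL) {
		/* Undo the first allocation so the caller sees a clean failure. */
		free(mirror);
		return NULL;
	}

	/* Per-stripe initialization is now safe for every index below dss_count. */
	for (unsigned int i = 0; i < mirror->dss_count; i++) {
		mirror->dss[i].ro_file = NULL;
		mirror->dss[i].rw_file = NULL;
	}

	return mirror;
}

static void demo_free_mirror(struct demo_mirror *mirror)
{
	if (mirror == NULL)
		return;
	free(mirror->dss);
	free(mirror);
}
```

Keeping both allocations inside one constructor means the caller has a single NULL check to make and never observes a mirror whose count is set while its array pointer is still NULL.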
* Re: [PATCH v2] NFSv4/flexfiles: fix to allocate mirror->dss before use
2025-10-07 17:39 ` [PATCH v2] " Mike Snitzer
@ 2025-10-07 18:03 ` Jon Curley
0 siblings, 0 replies; 17+ messages in thread
From: Jon Curley @ 2025-10-07 18:03 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Anna Schumaker, Trond Myklebust, Luis Chamberlain, linux-nfs
LGTM, thanks for fixing this.
* Re: [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts
2025-09-24 16:20 [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (9 preceding siblings ...)
2025-10-07 14:05 ` [RFC PATCH v4 0/9] " Mike Snitzer
@ 2025-10-15 13:09 ` Mike Snitzer
2025-10-20 19:58 ` Mike Snitzer
10 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2025-10-15 13:09 UTC (permalink / raw)
To: Jonathan Curley
Cc: Trond Myklebust, Anna Schumaker, Luis Chamberlain, linux-nfs
Hi Jon,
I got a report that TLS no longer works with flexfiles. For context, I
made TLS possible with flexfiles with commit 04a15263662a
("pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses
TLS").
If I revert your flexfiles striped patchset then TLS works with
flexfiles again.
I haven't looked closely to try to find the issue yet, but I wanted
to let you (and others) know about this regression.
Mike
* Re: [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts
2025-10-15 13:09 ` [RFC PATCH v4 0/9] NFSv4/flexfiles: Add support for striped layouts Mike Snitzer
@ 2025-10-20 19:58 ` Mike Snitzer
0 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2025-10-20 19:58 UTC (permalink / raw)
To: Jonathan Curley
Cc: Trond Myklebust, Anna Schumaker, Luis Chamberlain, linux-nfs
FYI, I really don't think your flexfiles striped changes have anything
to do with the reported mtls client mount issue. Sorry for the noise.
Mike