* [PATCH 0/9] NFSv4/flexfiles: Add support for striped layouts
@ 2025-08-18 22:07 Jonathan Curley
2025-08-18 22:07 ` [PATCH 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
This patch series introduces support for striped layouts:
The first two patches are simple preparatory changes. They should have
no functional impact on the code.
The 3rd patch refactors the nfs4_ff_layout_mirror struct to have an
array of a new nfs4_ff_layout_ds_stripe type. The
nfs4_ff_layout_ds_stripe holds all the contents of ff_data_server4 per
the flexfiles RFC (RFC 8435). I called it ds_stripe because ds was
already taken by the deviceid side of the code.
Patches 4-8 update various paths to be dss_id aware. Most of this
consists of either adding a new parameter to a function or adding a
loop, depending on which is appropriate.
The final patch (9) updates the layout creation path to populate the
array and turns the feature on.
Jonathan Curley (9):
NFSv4/flexfiles: Remove cred local variable dependency
NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
NFSv4/flexfiles: Add data structure support for striped layouts
NFSv4/flexfiles: Update low level helper functions to be DS stripe
aware.
NFSv4/flexfiles: Read path updates for striped layouts
NFSv4/flexfiles: Commit path updates for striped layouts
NFSv4/flexfiles: Write path updates for striped layouts
NFSv4/flexfiles: Update layout stats & error paths for striped layouts
NFSv4/flexfiles: Add support for striped layouts
fs/nfs/flexfilelayout/flexfilelayout.c | 774 +++++++++++++++-------
fs/nfs/flexfilelayout/flexfilelayout.h | 64 +-
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 110 +--
fs/nfs/write.c | 2 +-
4 files changed, 636 insertions(+), 314 deletions(-)
--
2.34.1
* [PATCH 1/9] NFSv4/flexfiles: Remove cred local variable dependency
2025-08-18 22:07 [PATCH 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
@ 2025-08-18 22:07 ` Jonathan Curley
2025-08-18 22:07 ` [PATCH 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit Jonathan Curley
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
No-op preparatory change that removes the dependency on the local cred
variable. The subsequent striping patches keep a cred per stripe, so
this local variable can no longer be trusted to hold the right cred.
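
For readers following along, here is a minimal userspace sketch of the
swap this patch touches. It is an illustration only: the struct and
function names are hypothetical, and C11 atomic_exchange() stands in
for the kernel's xchg() and rcu_assign_pointer(), so no RCU semantics
are modelled. The point is that the incoming cred is read from the
freshly decoded mirror itself rather than from a local variable.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct cred. */
struct sketch_cred {
	int id;
};

/* Hypothetical, heavily reduced mirror: only the read-only cred slot. */
struct sketch_mirror {
	_Atomic(const struct sketch_cred *) ro_cred;
};

/*
 * Give the already-registered mirror the cred carried by the freshly
 * decoded one, and park the old cred in the fresh mirror so that
 * freeing the fresh mirror releases the old cred.
 */
void sketch_swap_ro_cred(struct sketch_mirror *existing,
			 struct sketch_mirror *fresh)
{
	/* Read the incoming cred from the fresh mirror, not from a local. */
	const struct sketch_cred *incoming = atomic_load(&fresh->ro_cred);
	const struct sketch_cred *old =
		atomic_exchange(&existing->ro_cred, incoming);

	atomic_store(&fresh->ro_cred, old);
}
```

With one cred per stripe, the same swap would presumably be repeated
for each stripe slot, which is why a single local variable stops being
a reliable source.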
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 4bea008dbebd..a437d20ebcdf 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -532,10 +532,10 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
if (mirror != fls->mirror_array[i]) {
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, cred);
+ cred = xchg(&mirror->ro_cred, fls->mirror_array[i]->ro_cred);
rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
} else {
- cred = xchg(&mirror->rw_cred, cred);
+ cred = xchg(&mirror->rw_cred, fls->mirror_array[i]->rw_cred);
rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
}
ff_layout_free_mirror(fls->mirror_array[i]);
--
2.34.1
* [PATCH 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
2025-08-18 22:07 [PATCH 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
2025-08-18 22:07 ` [PATCH 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
@ 2025-08-18 22:07 ` Jonathan Curley
2025-08-18 22:07 ` [PATCH 3/9] NFSv4/flexfiles: Add data structure support for striped layouts Jonathan Curley
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Correct this path to use ds_commit_idx. This is another no-op
preparatory change: in the current code ds_commit_idx ==
pgio_mirror_idx, but once striping is enabled that will no longer hold.
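
To illustrate why the two indices can diverge, here is a small sketch
under a loudly stated assumption: the flat numbering below is
hypothetical and is NOT taken from this series. It only shows that any
scheme giving each (mirror, stripe) pair its own commit bucket
collapses to the mirror index when there is exactly one stripe per
mirror.

```c
#include <stdint.h>

/*
 * Hypothetical flat commit-bucket index for a (mirror, stripe) pair.
 * Assumption for illustration only; the series may number buckets
 * differently.
 */
uint32_t sketch_commit_idx(uint32_t mirror_idx, uint32_t dss_count,
			   uint32_t dss_id)
{
	return mirror_idx * dss_count + dss_id;
}
```

With dss_count == 1 and dss_id == 0 this returns mirror_idx unchanged,
which matches the no-op claim above for today's non-striped layouts.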
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/write.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 374fc6b34c79..422bb817cc85 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -977,7 +977,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
req->wb_nio = 0;
memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
nfs_mark_request_commit(req, hdr->lseg, &cinfo,
- hdr->pgio_mirror_idx);
+ hdr->ds_commit_idx);
goto next;
}
remove_req:
--
2.34.1
* [PATCH 3/9] NFSv4/flexfiles: Add data structure support for striped layouts
2025-08-18 22:07 [PATCH 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
2025-08-18 22:07 ` [PATCH 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
2025-08-18 22:07 ` [PATCH 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit Jonathan Curley
@ 2025-08-18 22:07 ` Jonathan Curley
2025-08-18 22:07 ` [PATCH 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware Jonathan Curley
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Add a new struct nfs4_ff_layout_ds_stripe that represents a data
server stripe within a layout. nfs4_ff_layout_mirror gains a
dynamically allocated array of this type, and the per-stripe
configuration information moves from the mirror type to the stripe
type, following the RFC.
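
The ownership model can be sketched in userspace C as follows. This is
a reduced illustration, not the kernel code: the sketch_ names are
hypothetical, calloc()/free() stand in for kcalloc()/kfree(), and most
fields are omitted. Setting the back-pointer at allocation time and
rejecting a zero stripe count are assumptions of the sketch.

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_mirror;

/* Reduced per-stripe state; the real struct carries much more. */
struct sketch_ds_stripe {
	struct sketch_mirror *mirror;	/* back-pointer to the owner */
	uint32_t efficiency;
	uint32_t fh_versions_cnt;
};

/* Reduced mirror: owns a dynamically sized array of stripes. */
struct sketch_mirror {
	uint32_t dss_count;
	struct sketch_ds_stripe *dss;
};

/* Allocate a mirror with dss_count zeroed stripes, or return NULL. */
struct sketch_mirror *sketch_mirror_alloc(uint32_t dss_count)
{
	struct sketch_mirror *m;
	uint32_t i;

	if (dss_count == 0)
		return NULL;
	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->dss = calloc(dss_count, sizeof(*m->dss));
	if (!m->dss) {
		free(m);
		return NULL;
	}
	m->dss_count = dss_count;
	for (i = 0; i < dss_count; i++)
		m->dss[i].mirror = m;
	return m;
}

/* Release the stripe array before the mirror that owns it. */
void sketch_mirror_free(struct sketch_mirror *m)
{
	if (!m)
		return;
	free(m->dss);
	free(m);
}
```

The sketch checks the result of the array allocation before touching
any stripe, since every mirror->dss[...] access depends on that array
existing.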
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 134 ++++++++++++----------
fs/nfs/flexfilelayout/flexfilelayout.h | 27 +++--
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 54 ++++-----
3 files changed, 117 insertions(+), 98 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index a437d20ebcdf..46a765bf05c3 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -171,7 +171,7 @@ ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
- return nfs_local_open_fh(clp, cred, fh, &mirror->nfl, mode);
+ return nfs_local_open_fh(clp, cred, fh, &mirror->dss[0].nfl, mode);
#else
return NULL;
#endif
@@ -182,13 +182,13 @@ static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
{
int i, j;
- if (m1->fh_versions_cnt != m2->fh_versions_cnt)
+ if (m1->dss[0].fh_versions_cnt != m2->dss[0].fh_versions_cnt)
return false;
- for (i = 0; i < m1->fh_versions_cnt; i++) {
+ for (i = 0; i < m1->dss[0].fh_versions_cnt; i++) {
bool found_fh = false;
- for (j = 0; j < m2->fh_versions_cnt; j++) {
- if (nfs_compare_fh(&m1->fh_versions[i],
- &m2->fh_versions[j]) == 0) {
+ for (j = 0; j < m2->dss[0].fh_versions_cnt; j++) {
+ if (nfs_compare_fh(&m1->dss[0].fh_versions[i],
+ &m2->dss[0].fh_versions[j]) == 0) {
found_fh = true;
break;
}
@@ -209,7 +209,8 @@ ff_layout_add_mirror(struct pnfs_layout_hdr *lo,
spin_lock(&inode->i_lock);
list_for_each_entry(pos, &ff_layout->mirrors, mirrors) {
- if (memcmp(&mirror->devid, &pos->devid, sizeof(pos->devid)) != 0)
+ if (memcmp(&mirror->dss[0].devid, &pos->dss[0].devid,
+ sizeof(pos->dss[0].devid)) != 0)
continue;
if (!ff_mirror_match_fh(mirror, pos))
continue;
@@ -246,23 +247,27 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
- nfs_localio_file_init(&mirror->nfl);
+ nfs_localio_file_init(&mirror->dss[0].nfl);
}
return mirror;
}
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
- const struct cred *cred;
+ const struct cred *cred;
+ int dss_id = 0;
ff_layout_remove_mirror(mirror);
- kfree(mirror->fh_versions);
- nfs_close_local_fh(&mirror->nfl);
- cred = rcu_access_pointer(mirror->ro_cred);
+
+ kfree(mirror->dss[dss_id].fh_versions);
+ nfs_close_local_fh(&mirror->dss[dss_id].nfl);
+ cred = rcu_access_pointer(mirror->dss[dss_id].ro_cred);
put_cred(cred);
- cred = rcu_access_pointer(mirror->rw_cred);
+ cred = rcu_access_pointer(mirror->dss[dss_id].rw_cred);
put_cred(cred);
- nfs4_ff_layout_put_deviceid(mirror->mirror_ds);
+ nfs4_ff_layout_put_deviceid(mirror->dss[dss_id].mirror_ds);
+
+ kfree(mirror->dss);
kfree(mirror);
}
@@ -372,8 +377,8 @@ static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
for (j = i + 1; j < fls->mirror_array_cnt; j++)
- if (fls->mirror_array[i]->efficiency <
- fls->mirror_array[j]->efficiency)
+ if (fls->mirror_array[i]->dss[0].efficiency <
+ fls->mirror_array[j]->dss[0].efficiency)
swap(fls->mirror_array[i],
fls->mirror_array[j]);
}
@@ -427,23 +432,25 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
fls->mirror_array_cnt = mirror_array_cnt;
fls->stripe_unit = stripe_unit;
+ u32 dss_count = 0;
for (i = 0; i < fls->mirror_array_cnt; i++) {
struct nfs4_ff_layout_mirror *mirror;
struct cred *kcred;
const struct cred __rcu *cred;
kuid_t uid;
kgid_t gid;
- u32 ds_count, fh_count, id;
- int j;
+ u32 fh_count, id;
+ int j, dss_id = 0;
rc = -EIO;
p = xdr_inline_decode(&stream, 4);
if (!p)
goto out_err_free;
- ds_count = be32_to_cpup(p);
+
+ dss_count = be32_to_cpup(p);
/* FIXME: allow for striping? */
- if (ds_count != 1)
+ if (dss_count != 1)
goto out_err_free;
fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags);
@@ -452,10 +459,13 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
goto out_err_free;
}
- fls->mirror_array[i]->ds_count = ds_count;
+ fls->mirror_array[i]->dss_count = dss_count;
+ fls->mirror_array[i]->dss =
+ kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
+ gfp_flags);
/* deviceid */
- rc = decode_deviceid(&stream, &fls->mirror_array[i]->devid);
+ rc = decode_deviceid(&stream, &fls->mirror_array[i]->dss[dss_id].devid);
if (rc)
goto out_err_free;
@@ -464,10 +474,10 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
p = xdr_inline_decode(&stream, 4);
if (!p)
goto out_err_free;
- fls->mirror_array[i]->efficiency = be32_to_cpup(p);
+ fls->mirror_array[i]->dss[dss_id].efficiency = be32_to_cpup(p);
/* stateid */
- rc = decode_pnfs_stateid(&stream, &fls->mirror_array[i]->stateid);
+ rc = decode_pnfs_stateid(&stream, &fls->mirror_array[i]->dss[dss_id].stateid);
if (rc)
goto out_err_free;
@@ -478,22 +488,22 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
goto out_err_free;
fh_count = be32_to_cpup(p);
- fls->mirror_array[i]->fh_versions =
- kcalloc(fh_count, sizeof(struct nfs_fh),
- gfp_flags);
- if (fls->mirror_array[i]->fh_versions == NULL) {
+ fls->mirror_array[i]->dss[dss_id].fh_versions =
+ kcalloc(fh_count, sizeof(struct nfs_fh),
+ gfp_flags);
+ if (fls->mirror_array[i]->dss[dss_id].fh_versions == NULL) {
rc = -ENOMEM;
goto out_err_free;
}
for (j = 0; j < fh_count; j++) {
rc = decode_nfs_fh(&stream,
- &fls->mirror_array[i]->fh_versions[j]);
+ &fls->mirror_array[i]->dss[dss_id].fh_versions[j]);
if (rc)
goto out_err_free;
}
- fls->mirror_array[i]->fh_versions_cnt = fh_count;
+ fls->mirror_array[i]->dss[dss_id].fh_versions_cnt = fh_count;
/* user */
rc = decode_name(&stream, &id);
@@ -524,19 +534,21 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
cred = RCU_INITIALIZER(kcred);
if (lgr->range.iomode == IOMODE_READ)
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
else
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, fls->mirror_array[i]->ro_cred);
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ cred = xchg(&mirror->dss[dss_id].ro_cred,
+ fls->mirror_array[i]->dss[dss_id].ro_cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
} else {
- cred = xchg(&mirror->rw_cred, fls->mirror_array[i]->rw_cred);
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ cred = xchg(&mirror->dss[dss_id].rw_cred,
+ fls->mirror_array[i]->dss[dss_id].rw_cred);
+ rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
@@ -624,8 +636,8 @@ nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
struct nfs4_flexfile_layout *ffl = FF_LAYOUT_FROM_HDR(mirror->layout);
nfs4_ff_start_busy_timer(&layoutstat->busy_timer, now);
- if (!mirror->start_time)
- mirror->start_time = now;
+ if (!mirror->dss[0].start_time)
+ mirror->dss[0].start_time = now;
if (mirror->report_interval != 0)
report_interval = (s64)mirror->report_interval * 1000LL;
else if (layoutstats_timer != 0)
@@ -680,8 +692,8 @@ nfs4_ff_layout_stat_io_start_read(struct inode *inode,
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror, &mirror->read_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->read_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].read_stat, now);
+ nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].read_stat, requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -696,7 +708,7 @@ nfs4_ff_layout_stat_io_end_read(struct rpc_task *task,
__u64 completed)
{
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->read_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].read_stat,
requested, completed,
ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
@@ -711,8 +723,8 @@ nfs4_ff_layout_stat_io_start_write(struct inode *inode,
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror , &mirror->write_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->write_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].write_stat, now);
+ nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].write_stat, requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -731,7 +743,7 @@ nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
requested = completed = 0;
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->write_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].write_stat,
requested, completed, ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -773,7 +785,7 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
continue;
if (check_device &&
- nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node))
+ nfs4_test_deviceid_unavailable(&mirror->dss[0].mirror_ds->id_node))
continue;
*best_idx = idx;
@@ -879,7 +891,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
pgm = &pgio->pg_mirrors[0];
- pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
+ pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].rsize;
pgio->pg_mirror_idx = ds_idx;
return;
@@ -951,7 +963,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
goto retry;
}
pgm = &pgio->pg_mirrors[i];
- pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
+ pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].wsize;
}
if (NFS_SERVER(pgio->pg_inode)->flags &
@@ -2021,7 +2033,7 @@ select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
/* FIXME: Assume that there is only one NFS version available
* for the DS.
*/
- return &flseg->mirror_array[i]->fh_versions[0];
+ return &flseg->mirror_array[i]->dss[0].fh_versions[0];
}
static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
@@ -2137,10 +2149,10 @@ static void ff_layout_cancel_io(struct pnfs_layout_segment *lseg)
for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
mirror = flseg->mirror_array[idx];
- mirror_ds = mirror->mirror_ds;
+ mirror_ds = mirror->dss[0].mirror_ds;
if (IS_ERR_OR_NULL(mirror_ds))
continue;
- ds = mirror->mirror_ds->ds;
+ ds = mirror->dss[0].mirror_ds->ds;
if (!ds)
continue;
ds_clp = ds->ds_clp;
@@ -2541,8 +2553,8 @@ ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
struct nfs4_ff_layout_mirror *mirror)
{
struct nfs4_pnfs_ds_addr *da;
- struct nfs4_pnfs_ds *ds = mirror->mirror_ds->ds;
- struct nfs_fh *fh = &mirror->fh_versions[0];
+ struct nfs4_pnfs_ds *ds = mirror->dss[0].mirror_ds->ds;
+ struct nfs_fh *fh = &mirror->dss[0].fh_versions[0];
__be32 *p;
da = list_first_entry(&ds->ds_addrs, struct nfs4_pnfs_ds_addr, da_node);
@@ -2555,12 +2567,12 @@ ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
xdr_encode_opaque(p, fh->data, fh->size);
/* ff_io_latency4 read */
spin_lock(&mirror->lock);
- ff_layout_encode_io_latency(xdr, &mirror->read_stat.io_stat);
+ ff_layout_encode_io_latency(xdr, &mirror->dss[0].read_stat.io_stat);
/* ff_io_latency4 write */
- ff_layout_encode_io_latency(xdr, &mirror->write_stat.io_stat);
+ ff_layout_encode_io_latency(xdr, &mirror->dss[0].write_stat.io_stat);
spin_unlock(&mirror->lock);
/* nfstime4 */
- ff_layout_encode_nfstime(xdr, ktime_sub(ktime_get(), mirror->start_time));
+ ff_layout_encode_nfstime(xdr, ktime_sub(ktime_get(), mirror->dss[0].start_time));
/* bool */
p = xdr_reserve_space(xdr, 4);
*p = cpu_to_be32(false);
@@ -2607,7 +2619,7 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
list_for_each_entry(mirror, &ff_layout->mirrors, mirrors) {
if (i >= dev_limit)
break;
- if (IS_ERR_OR_NULL(mirror->mirror_ds))
+ if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
continue;
if (!test_and_clear_bit(NFS4_FF_MIRROR_STAT_AVAIL,
&mirror->flags) &&
@@ -2616,15 +2628,15 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
/* mirror refcount put in cleanup_layoutstats */
if (!refcount_inc_not_zero(&mirror->ref))
continue;
- dev = &mirror->mirror_ds->id_node;
+ dev = &mirror->dss[0].mirror_ds->id_node;
memcpy(&devinfo->dev_id, &dev->deviceid, NFS4_DEVICEID4_SIZE);
devinfo->offset = 0;
devinfo->length = NFS4_MAX_UINT64;
spin_lock(&mirror->lock);
- devinfo->read_count = mirror->read_stat.io_stat.ops_completed;
- devinfo->read_bytes = mirror->read_stat.io_stat.bytes_completed;
- devinfo->write_count = mirror->write_stat.io_stat.ops_completed;
- devinfo->write_bytes = mirror->write_stat.io_stat.bytes_completed;
+ devinfo->read_count = mirror->dss[0].read_stat.io_stat.ops_completed;
+ devinfo->read_bytes = mirror->dss[0].read_stat.io_stat.bytes_completed;
+ devinfo->write_count = mirror->dss[0].write_stat.io_stat.ops_completed;
+ devinfo->write_bytes = mirror->dss[0].write_stat.io_stat.bytes_completed;
spin_unlock(&mirror->lock);
devinfo->layout_type = LAYOUT_FLEX_FILES;
devinfo->ld_private.ops = &layoutstat_ops;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 095df09017a5..14640452713b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -71,12 +71,12 @@ struct nfs4_ff_layoutstat {
struct nfs4_ff_busy_timer busy_timer;
};
-struct nfs4_ff_layout_mirror {
- struct pnfs_layout_hdr *layout;
- struct list_head mirrors;
- u32 ds_count;
- u32 efficiency;
+struct nfs4_ff_layout_mirror;
+
+struct nfs4_ff_layout_ds_stripe {
+ struct nfs4_ff_layout_mirror *mirror;
struct nfs4_deviceid devid;
+ u32 efficiency;
struct nfs4_ff_layout_ds *mirror_ds;
u32 fh_versions_cnt;
struct nfs_fh *fh_versions;
@@ -84,12 +84,19 @@ struct nfs4_ff_layout_mirror {
const struct cred __rcu *ro_cred;
const struct cred __rcu *rw_cred;
struct nfs_file_localio nfl;
- refcount_t ref;
- spinlock_t lock;
- unsigned long flags;
struct nfs4_ff_layoutstat read_stat;
struct nfs4_ff_layoutstat write_stat;
ktime_t start_time;
+};
+
+struct nfs4_ff_layout_mirror {
+ struct pnfs_layout_hdr *layout;
+ struct list_head mirrors;
+ u32 dss_count;
+ struct nfs4_ff_layout_ds_stripe *dss;
+ refcount_t ref;
+ spinlock_t lock;
+ unsigned long flags;
u32 report_interval;
};
@@ -155,7 +162,7 @@ FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror != NULL) {
- struct nfs4_ff_layout_ds *mirror_ds = mirror->mirror_ds;
+ struct nfs4_ff_layout_ds *mirror_ds = mirror->dss[0].mirror_ds;
if (!IS_ERR_OR_NULL(mirror_ds))
return &mirror_ds->id_node;
@@ -184,7 +191,7 @@ ff_layout_no_read_on_rw(struct pnfs_layout_segment *lseg)
static inline int
nfs4_ff_layout_ds_version(const struct nfs4_ff_layout_mirror *mirror)
{
- return mirror->mirror_ds->ds_versions[0].version;
+ return mirror->dss[0].mirror_ds->ds_versions[0].version;
}
struct nfs4_ff_layout_ds *
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index 656d5c50bbce..f8ac9d8bd380 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -259,7 +259,7 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
if (status == 0)
return 0;
- if (IS_ERR_OR_NULL(mirror->mirror_ds))
+ if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
return -EINVAL;
dserr = kmalloc(sizeof(*dserr), gfp_flags);
@@ -271,8 +271,8 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
dserr->length = length;
dserr->status = status;
dserr->opnum = opnum;
- nfs4_stateid_copy(&dserr->stateid, &mirror->stateid);
- memcpy(&dserr->deviceid, &mirror->mirror_ds->id_node.deviceid,
+ nfs4_stateid_copy(&dserr->stateid, &mirror->dss[0].stateid);
+ memcpy(&dserr->deviceid, &mirror->dss[0].mirror_ds->id_node.deviceid,
NFS4_DEVICEID4_SIZE);
spin_lock(&flo->generic_hdr.plh_inode->i_lock);
@@ -287,9 +287,9 @@ ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode)
const struct cred *cred, __rcu **pcred;
if (iomode == IOMODE_READ)
- pcred = &mirror->ro_cred;
+ pcred = &mirror->dss[0].ro_cred;
else
- pcred = &mirror->rw_cred;
+ pcred = &mirror->dss[0].rw_cred;
rcu_read_lock();
do {
@@ -307,7 +307,7 @@ struct nfs_fh *
nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror)
{
/* FIXME: For now assume there is only 1 version available for the DS */
- return &mirror->fh_versions[0];
+ return &mirror->dss[0].fh_versions[0];
}
void
@@ -315,7 +315,7 @@ nfs4_ff_layout_select_ds_stateid(const struct nfs4_ff_layout_mirror *mirror,
nfs4_stateid *stateid)
{
if (nfs4_ff_layout_ds_version(mirror) == 4)
- nfs4_stateid_copy(stateid, &mirror->stateid);
+ nfs4_stateid_copy(stateid, &mirror->dss[0].stateid);
}
static bool
@@ -324,23 +324,23 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
{
if (mirror == NULL)
goto outerr;
- if (mirror->mirror_ds == NULL) {
+ if (mirror->dss[0].mirror_ds == NULL) {
struct nfs4_deviceid_node *node;
struct nfs4_ff_layout_ds *mirror_ds = ERR_PTR(-ENODEV);
node = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode),
- &mirror->devid, lo->plh_lc_cred,
+ &mirror->dss[0].devid, lo->plh_lc_cred,
GFP_KERNEL);
if (node)
mirror_ds = FF_LAYOUT_MIRROR_DS(node);
/* check for race with another call to this function */
- if (cmpxchg(&mirror->mirror_ds, NULL, mirror_ds) &&
+ if (cmpxchg(&mirror->dss[0].mirror_ds, NULL, mirror_ds) &&
mirror_ds != ERR_PTR(-ENODEV))
nfs4_put_deviceid_node(node);
}
- if (IS_ERR(mirror->mirror_ds))
+ if (IS_ERR(mirror->dss[0].mirror_ds))
goto outerr;
return true;
@@ -379,7 +379,7 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
if (!ff_layout_init_mirror_ds(lseg->pls_layout, mirror))
goto noconnect;
- ds = mirror->mirror_ds->ds;
+ ds = mirror->dss[0].mirror_ds->ds;
if (READ_ONCE(ds->ds_clp))
goto out;
/* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
@@ -388,10 +388,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* FIXME: For now we assume the server sent only one version of NFS
* to use for the DS.
*/
- status = nfs4_pnfs_ds_connect(s, ds, &mirror->mirror_ds->id_node,
+ status = nfs4_pnfs_ds_connect(s, ds, &mirror->dss[0].mirror_ds->id_node,
dataserver_timeo, dataserver_retrans,
- mirror->mirror_ds->ds_versions[0].version,
- mirror->mirror_ds->ds_versions[0].minor_version);
+ mirror->dss[0].mirror_ds->ds_versions[0].version,
+ mirror->dss[0].mirror_ds->ds_versions[0].minor_version);
/* connect success, check rsize/wsize limit */
if (!status) {
@@ -404,10 +404,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
- if (mirror->mirror_ds->ds_versions[0].rsize > max_payload)
- mirror->mirror_ds->ds_versions[0].rsize = max_payload;
- if (mirror->mirror_ds->ds_versions[0].wsize > max_payload)
- mirror->mirror_ds->ds_versions[0].wsize = max_payload;
+ if (mirror->dss[0].mirror_ds->ds_versions[0].rsize > max_payload)
+ mirror->dss[0].mirror_ds->ds_versions[0].rsize = max_payload;
+ if (mirror->dss[0].mirror_ds->ds_versions[0].wsize > max_payload)
+ mirror->dss[0].mirror_ds->ds_versions[0].wsize = max_payload;
goto out;
}
noconnect:
@@ -430,7 +430,7 @@ ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
{
const struct cred *cred;
- if (mirror && !mirror->mirror_ds->ds_versions[0].tightly_coupled) {
+ if (mirror && !mirror->dss[0].mirror_ds->ds_versions[0].tightly_coupled) {
cred = ff_layout_get_mirror_cred(mirror, range->iomode);
if (!cred)
cred = get_cred(mdscred);
@@ -453,7 +453,7 @@ struct rpc_clnt *
nfs4_ff_find_or_create_ds_client(struct nfs4_ff_layout_mirror *mirror,
struct nfs_client *ds_clp, struct inode *inode)
{
- switch (mirror->mirror_ds->ds_versions[0].version) {
+ switch (mirror->dss[0].mirror_ds->ds_versions[0].version) {
case 3:
/* For NFSv3 DS, flavor is set when creating DS connections */
return ds_clp->cl_rpcclient;
@@ -564,11 +564,11 @@ static bool ff_read_layout_has_available_ds(struct pnfs_layout_segment *lseg)
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror) {
- if (!mirror->mirror_ds)
+ if (!mirror->dss[0].mirror_ds)
return true;
- if (IS_ERR(mirror->mirror_ds))
+ if (IS_ERR(mirror->dss[0].mirror_ds))
continue;
- devid = &mirror->mirror_ds->id_node;
+ devid = &mirror->dss[0].mirror_ds->id_node;
if (!nfs4_test_deviceid_unavailable(devid))
return true;
}
@@ -585,11 +585,11 @@ static bool ff_rw_layout_has_available_ds(struct pnfs_layout_segment *lseg)
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- if (!mirror || IS_ERR(mirror->mirror_ds))
+ if (!mirror || IS_ERR(mirror->dss[0].mirror_ds))
return false;
- if (!mirror->mirror_ds)
+ if (!mirror->dss[0].mirror_ds)
continue;
- devid = &mirror->mirror_ds->id_node;
+ devid = &mirror->dss[0].mirror_ds->id_node;
if (nfs4_test_deviceid_unavailable(devid))
return false;
}
--
2.34.1
* [PATCH 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
2025-08-18 22:07 [PATCH 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
` (2 preceding siblings ...)
2025-08-18 22:07 ` [PATCH 3/9] NFSv4/flexfiles: Add data structure support for striped layouts Jonathan Curley
@ 2025-08-18 22:07 ` Jonathan Curley
2025-08-20 1:52 ` kernel test robot
2025-08-18 22:07 ` [PATCH 5/9] NFSv4/flexfiles: Read path updates for striped layouts Jonathan Curley
` (4 subsequent siblings)
8 siblings, 1 reply; 11+ messages in thread
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Update the common helper functions to be dss_id aware. Most helpers
simply gain a dss_id parameter; the has_available functions gain a
loop instead.
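
The two patterns can be sketched in isolation as below. All names are
hypothetical and the types are reduced to what the example needs. The
availability policy shown (a mirror counts as available only if none
of its stripes is marked unavailable) is an assumption made for the
sketch, not a statement of what the has_available functions decide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced stripe: DS version plus an availability flag. */
struct ex_stripe {
	int version;
	bool unavailable;
};

struct ex_mirror {
	uint32_t dss_count;
	struct ex_stripe *dss;
};

/* Parameter pattern: the caller names the stripe instead of dss[0]. */
int ex_ds_version(const struct ex_mirror *m, uint32_t dss_id)
{
	return m->dss[dss_id].version;
}

/* Loop pattern: visit every stripe of the mirror (assumed policy). */
bool ex_mirror_has_available_ds(const struct ex_mirror *m)
{
	uint32_t dss_id;

	for (dss_id = 0; dss_id < m->dss_count; dss_id++) {
		if (m->dss[dss_id].unavailable)
			return false;
	}
	return true;
}
```

Call sites that are not yet stripe aware can keep passing 0 for
dss_id, as the hunks in this patch do.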
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 56 +++++------
fs/nfs/flexfilelayout/flexfilelayout.h | 39 +++++---
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 110 ++++++++++++----------
3 files changed, 116 insertions(+), 89 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 46a765bf05c3..a2a3821f190c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -164,14 +164,14 @@ decode_name(struct xdr_stream *xdr, u32 *id)
}
static struct nfsd_file *
-ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
+ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx, u32 dss_id,
struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, fmode_t mode)
{
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
- return nfs_local_open_fh(clp, cred, fh, &mirror->dss[0].nfl, mode);
+ return nfs_local_open_fh(clp, cred, fh, &mirror->dss[dss_id].nfl, mode);
#else
return NULL;
#endif
@@ -752,7 +752,7 @@ nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
static void
ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
if (devid)
nfs4_mark_deviceid_unavailable(devid);
@@ -761,7 +761,7 @@ ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx)
static void
ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
if (devid)
nfs4_mark_deviceid_available(devid);
@@ -780,7 +780,7 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
/* mirrors are initially sorted by efficiency */
for (idx = start_idx; idx < fls->mirror_array_cnt; idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, false);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
if (!ds)
continue;
@@ -953,7 +953,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
for (i = 0; i < pgio->pg_mirror_count; i++) {
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
- ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror, true);
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror, 0, true);
if (!ds) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
@@ -1125,7 +1125,7 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
{
struct pnfs_layout_hdr *lo = lseg->pls_layout;
struct inode *inode = lo->plh_inode;
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
switch (op_status) {
@@ -1224,7 +1224,7 @@ static int ff_layout_async_handle_error_v3(struct rpc_task *task,
struct pnfs_layout_segment *lseg,
u32 idx)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
switch (op_status) {
case NFS_OK:
@@ -1354,7 +1354,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
mirror = FF_LAYOUT_COMP(lseg, idx);
err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
- mirror, offset, length, status, opnum,
+ mirror, 0, offset, length, status, opnum,
nfs_io_gfp_mask());
switch (status) {
@@ -1885,20 +1885,20 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->args.pgbase, (size_t)hdr->args.count, offset);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, false);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode);
+ hdr->inode, 0);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror);
+ vers = nfs4_ff_layout_ds_version(mirror, 0);
dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
ds->ds_remotestr, refcount_read(&ds->ds_clp->cl_count), vers);
@@ -1906,11 +1906,11 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->pgio_done_cb = ff_layout_read_done_cb;
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- fh = nfs4_ff_layout_select_ds_fh(mirror);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -1920,7 +1920,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Start IO accounting for local read */
- localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
+ localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh, FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
@@ -1959,20 +1959,20 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
u32 idx = hdr->pgio_mirror_idx;
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, true);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode);
+ hdr->inode, 0);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror);
+ vers = nfs4_ff_layout_ds_version(mirror, 0);
dprintk("%s ino %lu sync %d req %zu@%llu DS: %s cl_count %d vers %d\n",
__func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
@@ -1983,11 +1983,11 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
hdr->ds_commit_idx = idx;
- fh = nfs4_ff_layout_select_ds_fh(mirror);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -1996,7 +1996,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Start IO accounting for local write */
- localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
hdr->task.tk_start = ktime_get();
@@ -2054,20 +2054,20 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, true);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
if (!ds)
goto out_err;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- data->inode);
+ data->inode, 0);
if (IS_ERR(ds_clnt))
goto out_err;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred, 0);
if (!ds_cred)
goto out_err;
- vers = nfs4_ff_layout_ds_version(mirror);
+ vers = nfs4_ff_layout_ds_version(mirror, 0);
dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
data->inode->i_ino, how, refcount_read(&ds->ds_clp->cl_count),
@@ -2081,7 +2081,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;
/* Start IO accounting for local commit */
- localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
data->task.tk_start = ktime_get();
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 14640452713b..142324d6d5c5 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -157,12 +157,12 @@ FF_LAYOUT_COMP(struct pnfs_layout_segment *lseg, u32 idx)
}
static inline struct nfs4_deviceid_node *
-FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx)
+FF_LAYOUT_DEVID_NODE(struct pnfs_layout_segment *lseg, u32 idx, u32 dss_id)
{
struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror != NULL) {
- struct nfs4_ff_layout_ds *mirror_ds = mirror->dss[0].mirror_ds;
+ struct nfs4_ff_layout_ds *mirror_ds = mirror->dss[dss_id].mirror_ds;
if (!IS_ERR_OR_NULL(mirror_ds))
return &mirror_ds->id_node;
@@ -189,9 +189,22 @@ ff_layout_no_read_on_rw(struct pnfs_layout_segment *lseg)
}
static inline int
-nfs4_ff_layout_ds_version(const struct nfs4_ff_layout_mirror *mirror)
+nfs4_ff_layout_ds_version(const struct nfs4_ff_layout_mirror *mirror, u32 dss_id)
{
- return mirror->dss[0].mirror_ds->ds_versions[0].version;
+ return mirror->dss[dss_id].mirror_ds->ds_versions[0].version;
+}
+
+static inline u32
+nfs4_ff_layout_calc_dss_id(const u64 stripe_unit, const u32 dss_count, const loff_t offset)
+{
+ u64 tmp = offset;
+
+ if (dss_count == 1 || stripe_unit == 0)
+ return 0;
+
+ tmp = div64_u64(tmp, stripe_unit);
+
+ return do_div(tmp, dss_count);
}
struct nfs4_ff_layout_ds *
@@ -200,9 +213,9 @@ nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev,
void nfs4_ff_layout_put_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
void nfs4_ff_layout_free_deviceid(struct nfs4_ff_layout_ds *mirror_ds);
int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
- struct nfs4_ff_layout_mirror *mirror, u64 offset,
- u64 length, int status, enum nfs_opnum4 opnum,
- gfp_t gfp_flags);
+ struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id, u64 offset, u64 length, int status,
+ enum nfs_opnum4 opnum, gfp_t gfp_flags);
void ff_layout_send_layouterror(struct pnfs_layout_segment *lseg);
int ff_layout_encode_ds_ioerr(struct xdr_stream *xdr, const struct list_head *head);
void ff_layout_free_ds_ioerr(struct list_head *head);
@@ -211,23 +224,27 @@ unsigned int ff_layout_fetch_ds_ioerr(struct pnfs_layout_hdr *lo,
struct list_head *head,
unsigned int maxnum);
struct nfs_fh *
-nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror);
+nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror, u32 dss_id);
void
nfs4_ff_layout_select_ds_stateid(const struct nfs4_ff_layout_mirror *mirror,
- nfs4_stateid *stateid);
+ u32 dss_id,
+ nfs4_stateid *stateid);
struct nfs4_pnfs_ds *
nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
bool fail_return);
struct rpc_clnt *
nfs4_ff_find_or_create_ds_client(struct nfs4_ff_layout_mirror *mirror,
struct nfs_client *ds_clp,
- struct inode *inode);
+ struct inode *inode,
+ u32 dss_id);
const struct cred *ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
const struct pnfs_layout_range *range,
- const struct cred *mdscred);
+ const struct cred *mdscred,
+ u32 dss_id);
bool ff_layout_avoid_mds_available_ds(struct pnfs_layout_segment *lseg);
bool ff_layout_avoid_read_on_rw(struct pnfs_layout_segment *lseg);
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index f8ac9d8bd380..e6623ab6742d 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -250,16 +250,16 @@ ff_layout_add_ds_error_locked(struct nfs4_flexfile_layout *flo,
}
int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
- struct nfs4_ff_layout_mirror *mirror, u64 offset,
- u64 length, int status, enum nfs_opnum4 opnum,
- gfp_t gfp_flags)
+ struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id, u64 offset, u64 length, int status,
+ enum nfs_opnum4 opnum, gfp_t gfp_flags)
{
struct nfs4_ff_layout_ds_err *dserr;
if (status == 0)
return 0;
- if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
+ if (IS_ERR_OR_NULL(mirror->dss[dss_id].mirror_ds))
return -EINVAL;
dserr = kmalloc(sizeof(*dserr), gfp_flags);
@@ -271,8 +271,8 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
dserr->length = length;
dserr->status = status;
dserr->opnum = opnum;
- nfs4_stateid_copy(&dserr->stateid, &mirror->dss[0].stateid);
- memcpy(&dserr->deviceid, &mirror->dss[0].mirror_ds->id_node.deviceid,
+ nfs4_stateid_copy(&dserr->stateid, &mirror->dss[dss_id].stateid);
+ memcpy(&dserr->deviceid, &mirror->dss[dss_id].mirror_ds->id_node.deviceid,
NFS4_DEVICEID4_SIZE);
spin_lock(&flo->generic_hdr.plh_inode->i_lock);
@@ -282,14 +282,14 @@ int ff_layout_track_ds_error(struct nfs4_flexfile_layout *flo,
}
static const struct cred *
-ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode)
+ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode, u32 dss_id)
{
const struct cred *cred, __rcu **pcred;
if (iomode == IOMODE_READ)
- pcred = &mirror->dss[0].ro_cred;
+ pcred = &mirror->dss[dss_id].ro_cred;
else
- pcred = &mirror->dss[0].rw_cred;
+ pcred = &mirror->dss[dss_id].rw_cred;
rcu_read_lock();
do {
@@ -304,43 +304,45 @@ ff_layout_get_mirror_cred(struct nfs4_ff_layout_mirror *mirror, u32 iomode)
}
struct nfs_fh *
-nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror)
+nfs4_ff_layout_select_ds_fh(struct nfs4_ff_layout_mirror *mirror, u32 dss_id)
{
/* FIXME: For now assume there is only 1 version available for the DS */
- return &mirror->dss[0].fh_versions[0];
+ return &mirror->dss[dss_id].fh_versions[0];
}
void
nfs4_ff_layout_select_ds_stateid(const struct nfs4_ff_layout_mirror *mirror,
- nfs4_stateid *stateid)
+ u32 dss_id,
+ nfs4_stateid *stateid)
{
- if (nfs4_ff_layout_ds_version(mirror) == 4)
- nfs4_stateid_copy(stateid, &mirror->dss[0].stateid);
+ if (nfs4_ff_layout_ds_version(mirror, dss_id) == 4)
+ nfs4_stateid_copy(stateid, &mirror->dss[dss_id].stateid);
}
static bool
ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
- struct nfs4_ff_layout_mirror *mirror)
+ struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id)
{
if (mirror == NULL)
goto outerr;
- if (mirror->dss[0].mirror_ds == NULL) {
+ if (mirror->dss[dss_id].mirror_ds == NULL) {
struct nfs4_deviceid_node *node;
struct nfs4_ff_layout_ds *mirror_ds = ERR_PTR(-ENODEV);
node = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode),
- &mirror->dss[0].devid, lo->plh_lc_cred,
+ &mirror->dss[dss_id].devid, lo->plh_lc_cred,
GFP_KERNEL);
if (node)
mirror_ds = FF_LAYOUT_MIRROR_DS(node);
/* check for race with another call to this function */
- if (cmpxchg(&mirror->dss[0].mirror_ds, NULL, mirror_ds) &&
+ if (cmpxchg(&mirror->dss[dss_id].mirror_ds, NULL, mirror_ds) &&
mirror_ds != ERR_PTR(-ENODEV))
nfs4_put_deviceid_node(node);
}
- if (IS_ERR(mirror->dss[0].mirror_ds))
+ if (IS_ERR(mirror->dss[dss_id].mirror_ds))
goto outerr;
return true;
@@ -368,6 +370,7 @@ ff_layout_init_mirror_ds(struct pnfs_layout_hdr *lo,
struct nfs4_pnfs_ds *
nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
bool fail_return)
{
struct nfs4_pnfs_ds *ds = NULL;
@@ -376,10 +379,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
unsigned int max_payload;
int status;
- if (!ff_layout_init_mirror_ds(lseg->pls_layout, mirror))
+ if (!ff_layout_init_mirror_ds(lseg->pls_layout, mirror, dss_id))
goto noconnect;
- ds = mirror->dss[0].mirror_ds->ds;
+ ds = mirror->dss[dss_id].mirror_ds->ds;
if (READ_ONCE(ds->ds_clp))
goto out;
/* matching smp_wmb() in _nfs4_pnfs_v3/4_ds_connect */
@@ -388,10 +391,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* FIXME: For now we assume the server sent only one version of NFS
* to use for the DS.
*/
- status = nfs4_pnfs_ds_connect(s, ds, &mirror->dss[0].mirror_ds->id_node,
+ status = nfs4_pnfs_ds_connect(s, ds, &mirror->dss[dss_id].mirror_ds->id_node,
dataserver_timeo, dataserver_retrans,
- mirror->dss[0].mirror_ds->ds_versions[0].version,
- mirror->dss[0].mirror_ds->ds_versions[0].minor_version);
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].version,
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].minor_version);
/* connect success, check rsize/wsize limit */
if (!status) {
@@ -404,15 +407,15 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
- if (mirror->dss[0].mirror_ds->ds_versions[0].rsize > max_payload)
- mirror->dss[0].mirror_ds->ds_versions[0].rsize = max_payload;
- if (mirror->dss[0].mirror_ds->ds_versions[0].wsize > max_payload)
- mirror->dss[0].mirror_ds->ds_versions[0].wsize = max_payload;
+ if (mirror->dss[dss_id].mirror_ds->ds_versions[0].rsize > max_payload)
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].rsize = max_payload;
+ if (mirror->dss[dss_id].mirror_ds->ds_versions[0].wsize > max_payload)
+ mirror->dss[dss_id].mirror_ds->ds_versions[0].wsize = max_payload;
goto out;
}
noconnect:
ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
- mirror, lseg->pls_range.offset,
+ mirror, dss_id, lseg->pls_range.offset,
lseg->pls_range.length, NFS4ERR_NXIO,
OP_ILLEGAL, GFP_NOIO);
ff_layout_send_layouterror(lseg);
@@ -426,12 +429,13 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
const struct cred *
ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
const struct pnfs_layout_range *range,
- const struct cred *mdscred)
+ const struct cred *mdscred,
+ u32 dss_id)
{
const struct cred *cred;
- if (mirror && !mirror->dss[0].mirror_ds->ds_versions[0].tightly_coupled) {
- cred = ff_layout_get_mirror_cred(mirror, range->iomode);
+ if (mirror && !mirror->dss[dss_id].mirror_ds->ds_versions[0].tightly_coupled) {
+ cred = ff_layout_get_mirror_cred(mirror, range->iomode, dss_id);
if (!cred)
cred = get_cred(mdscred);
} else {
@@ -445,15 +449,17 @@ ff_layout_get_ds_cred(struct nfs4_ff_layout_mirror *mirror,
* @mirror: pointer to the mirror
* @ds_clp: nfs_client for the DS
* @inode: pointer to inode
+ * @dss_id: DS stripe id
*
* Find or create a DS rpc client with th MDS server rpc client auth flavor
* in the nfs_client cl_ds_clients list.
*/
struct rpc_clnt *
nfs4_ff_find_or_create_ds_client(struct nfs4_ff_layout_mirror *mirror,
- struct nfs_client *ds_clp, struct inode *inode)
+ struct nfs_client *ds_clp, struct inode *inode,
+ u32 dss_id)
{
- switch (mirror->dss[0].mirror_ds->ds_versions[0].version) {
+ switch (mirror->dss[dss_id].mirror_ds->ds_versions[0].version) {
case 3:
/* For NFSv3 DS, flavor is set when creating DS connections */
return ds_clp->cl_rpcclient;
@@ -559,18 +565,20 @@ static bool ff_read_layout_has_available_ds(struct pnfs_layout_segment *lseg)
{
struct nfs4_ff_layout_mirror *mirror;
struct nfs4_deviceid_node *devid;
- u32 idx;
+ u32 idx, dss_id;
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
if (mirror) {
- if (!mirror->dss[0].mirror_ds)
- return true;
- if (IS_ERR(mirror->dss[0].mirror_ds))
- continue;
- devid = &mirror->dss[0].mirror_ds->id_node;
- if (!nfs4_test_deviceid_unavailable(devid))
- return true;
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ if (!mirror->dss[dss_id].mirror_ds)
+ return true;
+ if (IS_ERR(mirror->dss[dss_id].mirror_ds))
+ continue;
+ devid = &mirror->dss[dss_id].mirror_ds->id_node;
+ if (!nfs4_test_deviceid_unavailable(devid))
+ return true;
+ }
}
}
@@ -581,17 +589,21 @@ static bool ff_rw_layout_has_available_ds(struct pnfs_layout_segment *lseg)
{
struct nfs4_ff_layout_mirror *mirror;
struct nfs4_deviceid_node *devid;
- u32 idx;
+ u32 idx, dss_id;
for (idx = 0; idx < FF_LAYOUT_MIRROR_COUNT(lseg); idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- if (!mirror || IS_ERR(mirror->dss[0].mirror_ds))
- return false;
- if (!mirror->dss[0].mirror_ds)
- continue;
- devid = &mirror->dss[0].mirror_ds->id_node;
- if (nfs4_test_deviceid_unavailable(devid))
- return false;
+ if (!mirror)
+ return false;
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ if (IS_ERR(mirror->dss[dss_id].mirror_ds))
+ return false;
+ if (!mirror->dss[dss_id].mirror_ds)
+ continue;
+ devid = &mirror->dss[dss_id].mirror_ds->id_node;
+ if (nfs4_test_deviceid_unavailable(devid))
+ return false;
+ }
}
return FF_LAYOUT_MIRROR_COUNT(lseg) != 0;
--
2.34.1
* [PATCH 5/9] NFSv4/flexfiles: Read path updates for striped layouts
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
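Update the read path to calculate dss_id from the request offset and
use it to direct I/O to the appropriate stripe DS. Add
ff_layout_pg_test() so that read requests are not coalesced across a
stripe boundary.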
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
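Note for reviewers: below is a minimal user-space sketch of the
offset-to-stripe mapping this patch relies on. It is modelled on
nfs4_ff_layout_calc_dss_id() from patch 4 and on the clamp at the end of
ff_layout_pg_test(). The names calc_dss_id() and bytes_left_in_stripe()
are illustrative only and do not exist in the kernel, and plain 64-bit
division stands in for do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nfs4_ff_layout_calc_dss_id(). */
static uint32_t calc_dss_id(uint64_t stripe_unit, uint32_t dss_count,
			    uint64_t offset)
{
	assert(dss_count > 0);

	/* An unstriped mirror has a single entry in dss[]. */
	if (dss_count == 1 || stripe_unit == 0)
		return 0;

	/* Stripe number, then round-robin over the stripe DSes. */
	return (uint32_t)((offset / stripe_unit) % dss_count);
}

/*
 * Hypothetical helper: how many of @want bytes fit before the stripe
 * containing @offset ends. Offsets are taken relative to @seg_offset,
 * as ff_layout_pg_test() does.
 */
static uint64_t bytes_left_in_stripe(uint64_t stripe_unit, uint64_t seg_offset,
				     uint64_t offset, uint64_t want)
{
	uint64_t left;

	assert(stripe_unit > 0 && offset >= seg_offset);
	left = stripe_unit - (offset - seg_offset) % stripe_unit;
	return want < left ? want : left;
}
```

With stripe_unit = 1 MiB and dss_count = 3, an offset of 4 GiB + 1 MiB
falls in stripe 4097 and so maps to dss_id 2. The same offset truncated
to 32 bits would fall in stripe 1 and map to dss_id 1, so the offset has
to be carried as a 64-bit value all the way down.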
fs/nfs/flexfilelayout/flexfilelayout.c | 122 ++++++++++++++++++++-----
1 file changed, 98 insertions(+), 24 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index a2a3821f190c..79700c18762c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -770,6 +770,7 @@ ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx)
static struct nfs4_pnfs_ds *
ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id,
bool check_device)
{
struct nfs4_ff_layout_segment *fls = FF_LAYOUT_LSEG(lseg);
@@ -780,12 +781,16 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
/* mirrors are initially sorted by efficiency */
for (idx = start_idx; idx < fls->mirror_array_cnt; idx++) {
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
+ *dss_id = nfs4_ff_layout_calc_dss_id(
+ fls->stripe_unit,
+ fls->mirror_array[idx]->dss_count,
+ offset);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, *dss_id, false);
if (!ds)
continue;
if (check_device &&
- nfs4_test_deviceid_unavailable(&mirror->dss[0].mirror_ds->id_node))
+ nfs4_test_deviceid_unavailable(&mirror->dss[*dss_id].mirror_ds->id_node))
continue;
*best_idx = idx;
@@ -797,42 +802,52 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
static struct nfs4_pnfs_ds *
ff_layout_choose_any_ds_for_read(struct pnfs_layout_segment *lseg,
- u32 start_idx, u32 *best_idx)
+ u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id)
{
- return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx, false);
+ return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id, false);
}
static struct nfs4_pnfs_ds *
ff_layout_choose_valid_ds_for_read(struct pnfs_layout_segment *lseg,
- u32 start_idx, u32 *best_idx)
+ u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id)
{
- return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx, true);
+ return ff_layout_choose_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id, true);
}
static struct nfs4_pnfs_ds *
ff_layout_choose_best_ds_for_read(struct pnfs_layout_segment *lseg,
- u32 start_idx, u32 *best_idx)
+ u32 start_idx, u32 *best_idx,
+ loff_t offset, u32 *dss_id)
{
struct nfs4_pnfs_ds *ds;
- ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx);
+ ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id);
if (ds)
return ds;
- return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx);
+ return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx,
+ offset, dss_id);
}
static struct nfs4_pnfs_ds *
ff_layout_get_ds_for_read(struct nfs_pageio_descriptor *pgio,
- u32 *best_idx)
+ u32 *best_idx,
+ loff_t offset,
+ u32 *dss_id)
{
struct pnfs_layout_segment *lseg = pgio->pg_lseg;
struct nfs4_pnfs_ds *ds;
ds = ff_layout_choose_best_ds_for_read(lseg, pgio->pg_mirror_idx,
- best_idx);
+ best_idx, offset, dss_id);
if (ds || !pgio->pg_mirror_idx)
return ds;
- return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx);
+ return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx,
+ offset, dss_id);
}
static void
@@ -851,6 +866,56 @@ ff_layout_pg_get_read(struct nfs_pageio_descriptor *pgio,
}
}
+static bool
+ff_layout_lseg_is_striped(const struct nfs4_ff_layout_segment *fls)
+{
+ return fls->mirror_array[0]->dss_count > 1;
+}
+
+/*
+ * ff_layout_pg_test(). Called by nfs_can_coalesce_requests()
+ *
+ * Return 0 if @req cannot be coalesced into @pgio, otherwise return the number
+ * of bytes (maximum @req->wb_bytes) that can be coalesced.
+ */
+static size_t
+ff_layout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
+ struct nfs_page *req)
+{
+ unsigned int size;
+ u64 p_stripe, r_stripe;
+ u32 stripe_offset;
+ u64 segment_offset = pgio->pg_lseg->pls_range.offset;
+ u32 stripe_unit = FF_LAYOUT_LSEG(pgio->pg_lseg)->stripe_unit;
+
+ /* calls nfs_generic_pg_test */
+ size = pnfs_generic_pg_test(pgio, prev, req);
+ if (!size)
+ return 0;
+ else if (!ff_layout_lseg_is_striped(FF_LAYOUT_LSEG(pgio->pg_lseg)))
+ return size;
+
+ /* see if req and prev are in the same stripe */
+ if (prev) {
+ p_stripe = (u64)req_offset(prev) - segment_offset;
+ r_stripe = (u64)req_offset(req) - segment_offset;
+ do_div(p_stripe, stripe_unit);
+ do_div(r_stripe, stripe_unit);
+
+ if (p_stripe != r_stripe)
+ return 0;
+ }
+
+ /* calculate remaining bytes in the current stripe */
+ div_u64_rem((u64)req_offset(req) - segment_offset,
+ stripe_unit,
+ &stripe_offset);
+ WARN_ON_ONCE(stripe_offset > stripe_unit);
+ if (stripe_offset >= stripe_unit)
+ return 0;
+ return min(stripe_unit - (unsigned int)stripe_offset, size);
+}
+
static void
ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs_page *req)
@@ -858,7 +923,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
struct nfs_pgio_mirror *pgm;
struct nfs4_ff_layout_mirror *mirror;
struct nfs4_pnfs_ds *ds;
- u32 ds_idx;
+ u32 ds_idx, dss_id;
if (NFS_SERVER(pgio->pg_inode)->flags &
(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
@@ -879,7 +944,8 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
/* Reset wb_nio, since getting layout segment was successful */
req->wb_nio = 0;
- ds = ff_layout_get_ds_for_read(pgio, &ds_idx);
+ ds = ff_layout_get_ds_for_read(pgio, &ds_idx,
+ req_offset(req), &dss_id);
if (!ds) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
@@ -891,7 +957,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, ds_idx);
pgm = &pgio->pg_mirrors[0];
- pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].rsize;
+ pgm->pg_bsize = mirror->dss[dss_id].mirror_ds->ds_versions[0].rsize;
pgio->pg_mirror_idx = ds_idx;
return;
@@ -1029,7 +1095,7 @@ ff_layout_pg_get_mirror_write(struct nfs_pageio_descriptor *desc, u32 idx)
static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
.pg_init = ff_layout_pg_init_read,
- .pg_test = pnfs_generic_pg_test,
+ .pg_test = ff_layout_pg_test,
.pg_doio = pnfs_generic_pg_readpages,
.pg_cleanup = pnfs_generic_pg_cleanup,
};
@@ -1084,8 +1150,10 @@ static void ff_layout_resend_pnfs_read(struct nfs_pgio_header *hdr)
{
u32 idx = hdr->pgio_mirror_idx + 1;
u32 new_idx = 0;
+ u32 dss_id = 0;
- if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx))
+ if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx,
+ hdr->args.offset, &dss_id))
ff_layout_send_layouterror(hdr->lseg);
else
pnfs_error_mark_layout_for_return(hdr->inode, hdr->lseg);
@@ -1879,26 +1947,31 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
u32 idx = hdr->pgio_mirror_idx;
int vers;
struct nfs_fh *fh;
+ u32 dss_id;
dprintk("--> %s ino %lu pgbase %u req %zu@%llu\n",
__func__, hdr->inode->i_ino,
hdr->args.pgbase, (size_t)hdr->args.count, offset);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, false);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(lseg)->stripe_unit,
+ mirror->dss_count,
+ offset);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, dss_id, false);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode, 0);
+ hdr->inode, dss_id);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, dss_id);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror, 0);
+ vers = nfs4_ff_layout_ds_version(mirror, dss_id);
dprintk("%s USE DS: %s cl_count %d vers %d\n", __func__,
ds->ds_remotestr, refcount_read(&ds->ds_clp->cl_count), vers);
@@ -1906,11 +1979,11 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->pgio_done_cb = ff_layout_read_done_cb;
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, dss_id);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, dss_id, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -1920,7 +1993,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Start IO accounting for local read */
- localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh, FMODE_READ);
+ localio = ff_local_open_fh(lseg, idx, dss_id, ds->ds_clp, ds_cred, fh,
+ FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
--
2.34.1
* [PATCH 6/9] NFSv4/flexfiles: Commit path updates for striped layouts
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Make the commit path stripe aware. ds_commit_idx now identifies both
the mirror and the stripe DS:
ds_commit_idx == mirror_idx * dss_count + dss_id.
Update the code paths that consume ds_commit_idx to derive mirror_idx
and dss_id from it, so that each commit is sent to the correct DS, and
size the commit array as mirror_array_cnt * dss_count.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
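Note for reviewers: the ds_commit_idx packing described above can be
sketched in user space as follows. struct commit_slot and the three
helper names are hypothetical and exist only for this illustration. As
in the patch, the sketch assumes every mirror has the same dss_count as
mirror_array[0].

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: the pair that ds_commit_idx encodes. */
struct commit_slot {
	uint32_t mirror_idx;
	uint32_t dss_id;
};

/* ds_commit_idx == mirror_idx * dss_count + dss_id */
static uint32_t commit_idx_encode(struct commit_slot slot, uint32_t dss_count)
{
	assert(dss_count > 0 && slot.dss_id < dss_count);
	return slot.mirror_idx * dss_count + slot.dss_id;
}

/* Inverse mapping: quotient is the mirror, remainder is the stripe DS. */
static struct commit_slot commit_idx_decode(uint32_t commit_idx,
					    uint32_t dss_count)
{
	struct commit_slot slot;

	assert(dss_count > 0);
	slot.mirror_idx = commit_idx / dss_count;
	slot.dss_id = commit_idx % dss_count;
	return slot;
}

/* One commit bucket per (mirror, stripe DS) pair. */
static uint32_t commit_array_size(uint32_t mirror_cnt, uint32_t dss_count)
{
	return mirror_cnt * dss_count;
}
```

For a layout with 3 mirrors of 4 stripe DSes each, mirror 2 / stripe 3
lands in bucket 11, the last of the 12 buckets. With dss_count == 1 the
index degenerates to mirror_idx, which matches the old behaviour.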
fs/nfs/flexfilelayout/flexfilelayout.c | 49 +++++++++++++++++---------
1 file changed, 33 insertions(+), 16 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 79700c18762c..b0d870359536 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -605,6 +605,26 @@ ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
_ff_layout_free_lseg(fls);
}
+static u32 calc_mirror_idx_from_commit(struct pnfs_layout_segment *lseg,
+ u32 commit_index)
+{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+
+ /*
+ * Both operands are u32: divide directly, do_div() needs a u64 dividend.
+ */
+ return commit_index / flseg->mirror_array[0]->dss_count;
+}
+
+static u32 calc_dss_id_from_commit(struct pnfs_layout_segment *lseg,
+ u32 commit_index)
+{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+ u32 dss_count = flseg->mirror_array[0]->dss_count;
+
+ return commit_index % dss_count;
+}
+
static void
nfs4_ff_start_busy_timer(struct nfs4_ff_busy_timer *timer, ktime_t now)
{
@@ -2094,20 +2114,15 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
return PNFS_NOT_ATTEMPTED;
}
-static u32 calc_ds_index_from_commit(struct pnfs_layout_segment *lseg, u32 i)
-{
- return i;
-}
-
static struct nfs_fh *
-select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i)
+select_ds_fh_from_commit(struct pnfs_layout_segment *lseg, u32 i, u32 dss_id)
{
struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
/* FIXME: Assume that there is only one NFS version available
* for the DS.
*/
- return &flseg->mirror_array[i]->dss[0].fh_versions[0];
+ return &flseg->mirror_array[i]->dss[dss_id].fh_versions[0];
}
static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
@@ -2118,7 +2133,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
struct nfsd_file *localio;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
- u32 idx;
+ u32 idx, dss_id;
int vers, ret;
struct nfs_fh *fh;
@@ -2126,22 +2141,23 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
test_bit(NFS_LSEG_LAYOUTRETURN, &lseg->pls_flags)))
goto out_err;
- idx = calc_ds_index_from_commit(lseg, data->ds_commit_index);
+ idx = calc_mirror_idx_from_commit(lseg, data->ds_commit_index);
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
+ dss_id = calc_dss_id_from_commit(lseg, data->ds_commit_index);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, dss_id, true);
if (!ds)
goto out_err;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- data->inode, 0);
+ data->inode, dss_id);
if (IS_ERR(ds_clnt))
goto out_err;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred, 0);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, data->cred, dss_id);
if (!ds_cred)
goto out_err;
- vers = nfs4_ff_layout_ds_version(mirror, 0);
+ vers = nfs4_ff_layout_ds_version(mirror, dss_id);
dprintk("%s ino %lu, how %d cl_count %d vers %d\n", __func__,
data->inode->i_ino, how, refcount_read(&ds->ds_clp->cl_count),
@@ -2150,12 +2166,12 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
data->cred = ds_cred;
refcount_inc(&ds->ds_clp->cl_count);
data->ds_clp = ds->ds_clp;
- fh = select_ds_fh_from_commit(lseg, data->ds_commit_index);
+ fh = select_ds_fh_from_commit(lseg, idx, dss_id);
if (fh)
data->args.fh = fh;
/* Start IO accounting for local commit */
- localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, dss_id, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
data->task.tk_start = ktime_get();
@@ -2259,8 +2275,9 @@ ff_layout_setup_ds_info(struct pnfs_ds_commit_info *fl_cinfo,
struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
struct inode *inode = lseg->pls_layout->plh_inode;
struct pnfs_commit_array *array, *new;
+ u32 size = flseg->mirror_array_cnt * flseg->mirror_array[0]->dss_count;
- new = pnfs_alloc_commit_array(flseg->mirror_array_cnt,
+ new = pnfs_alloc_commit_array(size,
nfs_io_gfp_mask());
if (new) {
spin_lock(&inode->i_lock);
--
2.34.1
* [PATCH 7/9] NFSv4/flexfiles: Write path updates for striped layouts
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Updates the write path to calculate dss_id from the request offset and
use it to direct I/O to the appropriate stripe DS.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
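Reader note: the index arithmetic this patch relies on can be sketched
in user-space C as follows. This is an illustration under stated
assumptions, not the kernel code. All sketch_* names are hypothetical.
The body of nfs4_ff_layout_calc_dss_id() is not part of this patch, so
the round-robin mapping (offset / stripe_unit) % dss_count is an
assumption, as are the two inverse helpers. Only
sketch_calc_commit_idx() follows calc_commit_idx() from this patch
directly.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for nfs4_ff_layout_calc_dss_id().
 * ASSUMPTION: plain round-robin striping over dss_count stripes.
 */
static uint32_t sketch_calc_dss_id(uint64_t stripe_unit, uint32_t dss_count,
				   uint64_t offset)
{
	/* An unstriped layout always maps to stripe 0. */
	if (dss_count <= 1 || stripe_unit == 0)
		return 0;
	return (uint32_t)((offset / stripe_unit) % dss_count);
}

/* Same arithmetic as calc_commit_idx(): row-major (mirror, stripe). */
static uint32_t sketch_calc_commit_idx(uint32_t dss_count, uint32_t mirror_idx,
				       uint32_t dss_id)
{
	return mirror_idx * dss_count + dss_id;
}

/* ASSUMPTION: the inverse helpers divide and take the remainder. */
static uint32_t sketch_mirror_idx_from_commit(uint32_t dss_count,
					      uint32_t commit_idx)
{
	return commit_idx / dss_count;
}

static uint32_t sketch_dss_id_from_commit(uint32_t dss_count,
					  uint32_t commit_idx)
{
	return commit_idx % dss_count;
}
```

Under these assumptions, with a 1 MiB stripe unit and four stripes, an
offset of 5 MiB + 17 lands on stripe 1, and (mirror 2, stripe 3)
flattens to commit index 11. The flattening only round-trips if every
mirror has the same dss_count, which is why the helper in this patch
reads the count from mirror_array[0].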
fs/nfs/flexfilelayout/flexfilelayout.c | 42 ++++++++++++++++++--------
1 file changed, 30 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index b0d870359536..696589d191e5 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -605,6 +605,14 @@ ff_layout_free_lseg(struct pnfs_layout_segment *lseg)
_ff_layout_free_lseg(fls);
}
+static u32 calc_commit_idx(struct pnfs_layout_segment *lseg,
+ u32 mirror_idx, u32 dss_id)
+{
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+
+ return (mirror_idx * flseg->mirror_array[0]->dss_count) + dss_id;
+}
+
static u32 calc_mirror_idx_from_commit(struct pnfs_layout_segment *lseg,
u32 commit_index)
{
@@ -1014,7 +1022,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
struct nfs4_ff_layout_mirror *mirror;
struct nfs_pgio_mirror *pgm;
struct nfs4_pnfs_ds *ds;
- u32 i;
+ u32 i, dss_id;
retry:
pnfs_generic_pg_check_layout(pgio, req);
@@ -1039,7 +1047,12 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
for (i = 0; i < pgio->pg_mirror_count; i++) {
mirror = FF_LAYOUT_COMP(pgio->pg_lseg, i);
- ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror, 0, true);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(pgio->pg_lseg)->stripe_unit,
+ mirror->dss_count,
+ req_offset(req));
+ ds = nfs4_ff_layout_prepare_ds(pgio->pg_lseg, mirror,
+ dss_id, true);
if (!ds) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
@@ -1049,7 +1062,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
goto retry;
}
pgm = &pgio->pg_mirrors[i];
- pgm->pg_bsize = mirror->dss[0].mirror_ds->ds_versions[0].wsize;
+ pgm->pg_bsize = mirror->dss[dss_id].mirror_ds->ds_versions[0].wsize;
}
if (NFS_SERVER(pgio->pg_inode)->flags &
@@ -1122,7 +1135,7 @@ static const struct nfs_pageio_ops ff_layout_pg_read_ops = {
static const struct nfs_pageio_ops ff_layout_pg_write_ops = {
.pg_init = ff_layout_pg_init_write,
- .pg_test = pnfs_generic_pg_test,
+ .pg_test = ff_layout_pg_test,
.pg_doio = pnfs_generic_pg_writepages,
.pg_get_mirror_count = ff_layout_pg_get_mirror_count_write,
.pg_cleanup = pnfs_generic_pg_cleanup,
@@ -2051,22 +2064,27 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
int vers;
struct nfs_fh *fh;
u32 idx = hdr->pgio_mirror_idx;
+ u32 dss_id;
mirror = FF_LAYOUT_COMP(lseg, idx);
- ds = nfs4_ff_layout_prepare_ds(lseg, mirror, 0, true);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(lseg)->stripe_unit,
+ mirror->dss_count,
+ offset);
+ ds = nfs4_ff_layout_prepare_ds(lseg, mirror, dss_id, true);
if (!ds)
goto out_failed;
ds_clnt = nfs4_ff_find_or_create_ds_client(mirror, ds->ds_clp,
- hdr->inode, 0);
+ hdr->inode, dss_id);
if (IS_ERR(ds_clnt))
goto out_failed;
- ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, 0);
+ ds_cred = ff_layout_get_ds_cred(mirror, &lseg->pls_range, hdr->cred, dss_id);
if (!ds_cred)
goto out_failed;
- vers = nfs4_ff_layout_ds_version(mirror, 0);
+ vers = nfs4_ff_layout_ds_version(mirror, dss_id);
dprintk("%s ino %lu sync %d req %zu@%llu DS: %s cl_count %d vers %d\n",
__func__, hdr->inode->i_ino, sync, (size_t) hdr->args.count,
@@ -2076,12 +2094,12 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->pgio_done_cb = ff_layout_write_done_cb;
refcount_inc(&ds->ds_clp->cl_count);
hdr->ds_clp = ds->ds_clp;
- hdr->ds_commit_idx = idx;
- fh = nfs4_ff_layout_select_ds_fh(mirror, 0);
+ hdr->ds_commit_idx = calc_commit_idx(lseg, idx, dss_id);
+ fh = nfs4_ff_layout_select_ds_fh(mirror, dss_id);
if (fh)
hdr->args.fh = fh;
- nfs4_ff_layout_select_ds_stateid(mirror, 0, &hdr->args.stateid);
+ nfs4_ff_layout_select_ds_stateid(mirror, dss_id, &hdr->args.stateid);
/*
* Note that if we ever decide to split across DSes,
@@ -2090,7 +2108,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Start IO accounting for local write */
- localio = ff_local_open_fh(lseg, idx, 0, ds->ds_clp, ds_cred, fh,
+ localio = ff_local_open_fh(lseg, idx, dss_id, ds->ds_clp, ds_cred, fh,
FMODE_READ|FMODE_WRITE);
if (localio) {
hdr->task.tk_start = ktime_get();
--
2.34.1
* [PATCH 8/9] NFSv4/flexfiles: Update layout stats & error paths for striped layouts
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Updates the layout stats logic to be stripe aware. Read and write
stats are accumulated on a per-DS-stripe basis. Also updates error
paths to use dss_id where appropriate.
Limitations:
1. The layout stats structure is still statically sized to 4 and there
is no deduplication logic for deviceids that may appear more than once
in a striped layout.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
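Reader note: the per-stripe accounting and the stated limitation can be
illustrated with a small user-space C sketch. It is not the kernel
code: the sketch_* types and functions are hypothetical, the device id
is reduced to a single integer, and locking and refcounting are left
out. SKETCH_STATS_DEV_LIMIT reflects the "statically sized to 4" note
in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STATS_DEV_LIMIT 4	/* fixed report size noted in the commit message */

struct sketch_stripe_stat {
	uint64_t devid;		/* simplified stand-in for a deviceid */
	uint64_t read_bytes;
	uint64_t write_bytes;
};

struct sketch_mirror {
	uint32_t dss_count;
	struct sketch_stripe_stat *dss;
};

/* Account a completed write against the stripe that served it. */
static void sketch_account_write(struct sketch_mirror *m, uint32_t dss_id,
				 uint64_t bytes)
{
	m->dss[dss_id].write_bytes += bytes;
}

/*
 * Walk every stripe of every mirror and copy out at most dev_limit
 * entries. There is no deduplication: a device that backs two stripes
 * is reported twice.
 */
static size_t sketch_prepare_stats(const struct sketch_mirror *mirrors,
				   size_t mirror_cnt,
				   struct sketch_stripe_stat *out,
				   size_t dev_limit)
{
	size_t n = 0;

	for (size_t i = 0; i < mirror_cnt; i++) {
		for (uint32_t d = 0; d < mirrors[i].dss_count; d++) {
			if (n >= dev_limit)
				return n;
			out[n++] = mirrors[i].dss[d];
		}
	}
	return n;
}
```

With two mirrors of three stripes each, only the first four of the six
stripes fit in the report, and a device that serves stripes 0 and 2 of
the first mirror takes up two of those four slots.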
fs/nfs/flexfilelayout/flexfilelayout.c | 300 +++++++++++++++++--------
1 file changed, 201 insertions(+), 99 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 696589d191e5..24d0eef0b6a4 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -47,7 +47,7 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
int dev_limit, enum nfs4_ff_op_type type);
static void ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
const struct nfs42_layoutstat_devinfo *devinfo,
- struct nfs4_ff_layout_mirror *mirror);
+ struct nfs4_ff_layout_ds_stripe *dss_info);
static struct pnfs_layout_hdr *
ff_layout_alloc_layout_hdr(struct inode *inode, gfp_t gfp_flags)
@@ -657,6 +657,7 @@ nfs4_ff_end_busy_timer(struct nfs4_ff_busy_timer *timer, ktime_t now)
static bool
nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
struct nfs4_ff_layoutstat *layoutstat,
ktime_t now)
{
@@ -664,8 +665,8 @@ nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
struct nfs4_flexfile_layout *ffl = FF_LAYOUT_FROM_HDR(mirror->layout);
nfs4_ff_start_busy_timer(&layoutstat->busy_timer, now);
- if (!mirror->dss[0].start_time)
- mirror->dss[0].start_time = now;
+ if (!mirror->dss[dss_id].start_time)
+ mirror->dss[dss_id].start_time = now;
if (mirror->report_interval != 0)
report_interval = (s64)mirror->report_interval * 1000LL;
else if (layoutstats_timer != 0)
@@ -715,13 +716,16 @@ nfs4_ff_layout_stat_io_update_completed(struct nfs4_ff_layoutstat *layoutstat,
static void
nfs4_ff_layout_stat_io_start_read(struct inode *inode,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested, ktime_t now)
{
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].read_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].read_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(
+ mirror, dss_id, &mirror->dss[dss_id].read_stat, now);
+ nfs4_ff_layout_stat_io_update_requested(
+ &mirror->dss[dss_id].read_stat, requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -732,11 +736,12 @@ nfs4_ff_layout_stat_io_start_read(struct inode *inode,
static void
nfs4_ff_layout_stat_io_end_read(struct rpc_task *task,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested,
__u64 completed)
{
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].read_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[dss_id].read_stat,
requested, completed,
ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
@@ -746,13 +751,20 @@ nfs4_ff_layout_stat_io_end_read(struct rpc_task *task,
static void
nfs4_ff_layout_stat_io_start_write(struct inode *inode,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested, ktime_t now)
{
bool report;
spin_lock(&mirror->lock);
- report = nfs4_ff_layoutstat_start_io(mirror, &mirror->dss[0].write_stat, now);
- nfs4_ff_layout_stat_io_update_requested(&mirror->dss[0].write_stat, requested);
+ report = nfs4_ff_layoutstat_start_io(
+ mirror,
+ dss_id,
+ &mirror->dss[dss_id].write_stat,
+ now);
+ nfs4_ff_layout_stat_io_update_requested(
+ &mirror->dss[dss_id].write_stat,
+ requested);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
@@ -763,6 +775,7 @@ nfs4_ff_layout_stat_io_start_write(struct inode *inode,
static void
nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
struct nfs4_ff_layout_mirror *mirror,
+ u32 dss_id,
__u64 requested,
__u64 completed,
enum nfs3_stable_how committed)
@@ -771,25 +784,25 @@ nfs4_ff_layout_stat_io_end_write(struct rpc_task *task,
requested = completed = 0;
spin_lock(&mirror->lock);
- nfs4_ff_layout_stat_io_update_completed(&mirror->dss[0].write_stat,
+ nfs4_ff_layout_stat_io_update_completed(&mirror->dss[dss_id].write_stat,
requested, completed, ktime_get(), task->tk_start);
set_bit(NFS4_FF_MIRROR_STAT_AVAIL, &mirror->flags);
spin_unlock(&mirror->lock);
}
static void
-ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx)
+ff_layout_mark_ds_unreachable(struct pnfs_layout_segment *lseg, u32 idx, u32 dss_id)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
if (devid)
nfs4_mark_deviceid_unavailable(devid);
}
static void
-ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx)
+ff_layout_mark_ds_reachable(struct pnfs_layout_segment *lseg, u32 idx, u32 dss_id)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
if (devid)
nfs4_mark_deviceid_available(devid);
@@ -1222,11 +1235,11 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
- u32 idx)
+ u32 idx, u32 dss_id)
{
struct pnfs_layout_hdr *lo = lseg->pls_layout;
struct inode *inode = lo->plh_inode;
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
struct nfs4_slot_table *tbl = &clp->cl_session->fc_slot_table;
switch (op_status) {
@@ -1323,9 +1336,9 @@ static int ff_layout_async_handle_error_v3(struct rpc_task *task,
u32 op_status,
struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
- u32 idx)
+ u32 idx, u32 dss_id)
{
- struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, 0);
+ struct nfs4_deviceid_node *devid = FF_LAYOUT_DEVID_NODE(lseg, idx, dss_id);
switch (op_status) {
case NFS_OK:
@@ -1389,12 +1402,17 @@ static int ff_layout_async_handle_error(struct rpc_task *task,
struct nfs4_state *state,
struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
- u32 idx)
+ u32 idx, u64 offset)
{
int vers = clp->cl_nfs_mod->rpc_vers->number;
+ struct nfs4_ff_layout_segment *flseg = FF_LAYOUT_LSEG(lseg);
+ u32 dss_id = nfs4_ff_layout_calc_dss_id(
+ flseg->stripe_unit,
+ flseg->mirror_array[idx]->dss_count,
+ offset);
if (task->tk_status >= 0) {
- ff_layout_mark_ds_reachable(lseg, idx);
+ ff_layout_mark_ds_reachable(lseg, idx, dss_id);
return 0;
}
@@ -1405,10 +1423,10 @@ static int ff_layout_async_handle_error(struct rpc_task *task,
switch (vers) {
case 3:
return ff_layout_async_handle_error_v3(task, op_status, clp,
- lseg, idx);
+ lseg, idx, dss_id);
case 4:
return ff_layout_async_handle_error_v4(task, op_status, state,
- clp, lseg, idx);
+ clp, lseg, idx, dss_id);
default:
/* should never happen */
WARN_ON_ONCE(1);
@@ -1423,6 +1441,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
struct nfs4_ff_layout_mirror *mirror;
u32 status = *op_status;
int err;
+ u32 dss_id;
if (status == 0) {
switch (error) {
@@ -1454,8 +1473,11 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
}
mirror = FF_LAYOUT_COMP(lseg, idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(FF_LAYOUT_LSEG(lseg)->stripe_unit,
+ mirror->dss_count,
+ offset);
err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
- mirror, 0, offset, length, status, opnum,
+ mirror, dss_id, offset, length, status, opnum,
nfs_io_gfp_mask());
switch (status) {
@@ -1464,7 +1486,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
case NFS4ERR_PERM:
break;
case NFS4ERR_NXIO:
- ff_layout_mark_ds_unreachable(lseg, idx);
+ ff_layout_mark_ds_unreachable(lseg, idx, dss_id);
/*
* Don't return the layout if this is a read and we still
* have layouts to try
@@ -1497,7 +1519,8 @@ static int ff_layout_read_done_cb(struct rpc_task *task,
err = ff_layout_async_handle_error(task, hdr->res.op_status,
hdr->args.context->state,
hdr->ds_clp, hdr->lseg,
- hdr->pgio_mirror_idx);
+ hdr->pgio_mirror_idx,
+ hdr->args.offset);
trace_nfs4_pnfs_read(hdr, err);
clear_bit(NFS_IOHDR_RESEND_PNFS, &hdr->flags);
@@ -1553,23 +1576,47 @@ ff_layout_set_layoutcommit(struct inode *inode,
static void ff_layout_read_record_layoutstats_start(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (test_and_set_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_start_read(hdr->inode,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count,
- task->tk_start);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_start_read(
+ hdr->inode,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ task->tk_start);
}
static void ff_layout_read_record_layoutstats_done(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (!test_and_clear_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_end_read(task,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count,
- hdr->res.count);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_end_read(
+ task,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ hdr->res.count);
set_bit(NFS_LSEG_LAYOUTRETURN, &hdr->lseg->pls_flags);
}
@@ -1671,7 +1718,8 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
err = ff_layout_async_handle_error(task, hdr->res.op_status,
hdr->args.context->state,
hdr->ds_clp, hdr->lseg,
- hdr->pgio_mirror_idx);
+ hdr->pgio_mirror_idx,
+ hdr->args.offset);
trace_nfs4_pnfs_write(hdr, err);
clear_bit(NFS_IOHDR_RESEND_PNFS, &hdr->flags);
@@ -1709,9 +1757,10 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
struct nfs_commit_data *data)
{
int err;
+ u32 idx = calc_mirror_idx_from_commit(data->lseg, data->ds_commit_index);
if (task->tk_status < 0) {
- ff_layout_io_track_ds_error(data->lseg, data->ds_commit_index,
+ ff_layout_io_track_ds_error(data->lseg, idx,
data->args.offset, data->args.count,
&data->res.op_status, OP_COMMIT,
task->tk_status);
@@ -1719,7 +1768,7 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
}
err = ff_layout_async_handle_error(task, data->res.op_status,
- NULL, data->ds_clp, data->lseg,
+ NULL, data->ds_clp, data->lseg, idx,
data->ds_commit_index);
trace_nfs4_pnfs_commit_ds(data, err);
@@ -1739,30 +1788,54 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
}
ff_layout_set_layoutcommit(data->inode, data->lseg, data->lwb);
-
return 0;
}
static void ff_layout_write_record_layoutstats_start(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (test_and_set_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_start_write(hdr->inode,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count,
- task->tk_start);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_start_write(
+ hdr->inode,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ task->tk_start);
}
static void ff_layout_write_record_layoutstats_done(struct rpc_task *task,
struct nfs_pgio_header *hdr)
{
+ struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
+
if (!test_and_clear_bit(NFS_IOHDR_STAT, &hdr->flags))
return;
- nfs4_ff_layout_stat_io_end_write(task,
- FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx),
- hdr->args.count, hdr->res.count,
- hdr->res.verf->committed);
+
+ mirror = FF_LAYOUT_COMP(hdr->lseg, hdr->pgio_mirror_idx);
+ dss_id = nfs4_ff_layout_calc_dss_id(
+ FF_LAYOUT_LSEG(hdr->lseg)->stripe_unit,
+ mirror->dss_count,
+ hdr->args.offset);
+
+ nfs4_ff_layout_stat_io_end_write(
+ task,
+ mirror,
+ dss_id,
+ hdr->args.count,
+ hdr->res.count,
+ hdr->res.verf->committed);
set_bit(NFS_LSEG_LAYOUTRETURN, &hdr->lseg->pls_flags);
}
@@ -1845,10 +1918,16 @@ static void ff_layout_write_release(void *data)
static void ff_layout_commit_record_layoutstats_start(struct rpc_task *task,
struct nfs_commit_data *cdata)
{
+ u32 idx, dss_id;
+
if (test_and_set_bit(NFS_IOHDR_STAT, &cdata->flags))
return;
+
+ idx = calc_mirror_idx_from_commit(cdata->lseg, cdata->ds_commit_index);
+ dss_id = calc_dss_id_from_commit(cdata->lseg, cdata->ds_commit_index);
nfs4_ff_layout_stat_io_start_write(cdata->inode,
- FF_LAYOUT_COMP(cdata->lseg, cdata->ds_commit_index),
+ FF_LAYOUT_COMP(cdata->lseg, idx),
+ dss_id,
0, task->tk_start);
}
@@ -1857,6 +1936,7 @@ static void ff_layout_commit_record_layoutstats_done(struct rpc_task *task,
{
struct nfs_page *req;
__u64 count = 0;
+ u32 idx, dss_id;
if (!test_and_clear_bit(NFS_IOHDR_STAT, &cdata->flags))
return;
@@ -1865,8 +1945,12 @@ static void ff_layout_commit_record_layoutstats_done(struct rpc_task *task,
list_for_each_entry(req, &cdata->pages, wb_list)
count += req->wb_bytes;
}
+
+ idx = calc_mirror_idx_from_commit(cdata->lseg, cdata->ds_commit_index);
+ dss_id = calc_dss_id_from_commit(cdata->lseg, cdata->ds_commit_index);
nfs4_ff_layout_stat_io_end_write(task,
- FF_LAYOUT_COMP(cdata->lseg, cdata->ds_commit_index),
+ FF_LAYOUT_COMP(cdata->lseg, idx),
+ dss_id,
count, count, NFS_FILE_SYNC);
set_bit(NFS_LSEG_LAYOUTRETURN, &cdata->lseg->pls_flags);
}
@@ -2253,25 +2337,28 @@ static void ff_layout_cancel_io(struct pnfs_layout_segment *lseg)
struct nfs4_pnfs_ds *ds;
struct nfs_client *ds_clp;
struct rpc_clnt *clnt;
- u32 idx;
+ u32 idx, dss_id;
for (idx = 0; idx < flseg->mirror_array_cnt; idx++) {
mirror = flseg->mirror_array[idx];
- mirror_ds = mirror->dss[0].mirror_ds;
- if (IS_ERR_OR_NULL(mirror_ds))
- continue;
- ds = mirror->dss[0].mirror_ds->ds;
- if (!ds)
- continue;
- ds_clp = ds->ds_clp;
- if (!ds_clp)
- continue;
- clnt = ds_clp->cl_rpcclient;
- if (!clnt)
- continue;
- if (!rpc_cancel_tasks(clnt, -EAGAIN, ff_layout_match_io, lseg))
- continue;
- rpc_clnt_disconnect(clnt);
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ mirror_ds = mirror->dss[dss_id].mirror_ds;
+ if (IS_ERR_OR_NULL(mirror_ds))
+ continue;
+ ds = mirror->dss[dss_id].mirror_ds->ds;
+ if (!ds)
+ continue;
+ ds_clp = ds->ds_clp;
+ if (!ds_clp)
+ continue;
+ clnt = ds_clp->cl_rpcclient;
+ if (!clnt)
+ continue;
+ if (!rpc_cancel_tasks(clnt, -EAGAIN,
+ ff_layout_match_io, lseg))
+ continue;
+ rpc_clnt_disconnect(clnt);
+ }
}
}
@@ -2659,11 +2746,11 @@ ff_layout_encode_io_latency(struct xdr_stream *xdr,
static void
ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
const struct nfs42_layoutstat_devinfo *devinfo,
- struct nfs4_ff_layout_mirror *mirror)
+ struct nfs4_ff_layout_ds_stripe *dss_info)
{
struct nfs4_pnfs_ds_addr *da;
- struct nfs4_pnfs_ds *ds = mirror->dss[0].mirror_ds->ds;
- struct nfs_fh *fh = &mirror->dss[0].fh_versions[0];
+ struct nfs4_pnfs_ds *ds = dss_info->mirror_ds->ds;
+ struct nfs_fh *fh = &dss_info->fh_versions[0];
__be32 *p;
da = list_first_entry(&ds->ds_addrs, struct nfs4_pnfs_ds_addr, da_node);
@@ -2675,13 +2762,17 @@ ff_layout_encode_ff_layoutupdate(struct xdr_stream *xdr,
p = xdr_reserve_space(xdr, 4 + fh->size);
xdr_encode_opaque(p, fh->data, fh->size);
/* ff_io_latency4 read */
- spin_lock(&mirror->lock);
- ff_layout_encode_io_latency(xdr, &mirror->dss[0].read_stat.io_stat);
+ spin_lock(&dss_info->mirror->lock);
+ ff_layout_encode_io_latency(xdr,
+ &dss_info->read_stat.io_stat);
/* ff_io_latency4 write */
- ff_layout_encode_io_latency(xdr, &mirror->dss[0].write_stat.io_stat);
- spin_unlock(&mirror->lock);
+ ff_layout_encode_io_latency(xdr,
+ &dss_info->write_stat.io_stat);
+ spin_unlock(&dss_info->mirror->lock);
/* nfstime4 */
- ff_layout_encode_nfstime(xdr, ktime_sub(ktime_get(), mirror->dss[0].start_time));
+ ff_layout_encode_nfstime(xdr,
+ ktime_sub(ktime_get(),
+ dss_info->start_time));
/* bool */
p = xdr_reserve_space(xdr, 4);
*p = cpu_to_be32(false);
@@ -2705,7 +2796,8 @@ ff_layout_encode_layoutstats(struct xdr_stream *xdr, const void *args,
static void
ff_layout_free_layoutstats(struct nfs4_xdr_opaque_data *opaque)
{
- struct nfs4_ff_layout_mirror *mirror = opaque->data;
+ struct nfs4_ff_layout_ds_stripe *dss_info = opaque->data;
+ struct nfs4_ff_layout_mirror *mirror = dss_info->mirror;
ff_layout_put_mirror(mirror);
}
@@ -2722,37 +2814,47 @@ ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
{
struct nfs4_flexfile_layout *ff_layout = FF_LAYOUT_FROM_HDR(lo);
struct nfs4_ff_layout_mirror *mirror;
+ struct nfs4_ff_layout_ds_stripe *dss_info;
struct nfs4_deviceid_node *dev;
- int i = 0;
+ int i = 0, dss_id;
list_for_each_entry(mirror, &ff_layout->mirrors, mirrors) {
- if (i >= dev_limit)
- break;
- if (IS_ERR_OR_NULL(mirror->dss[0].mirror_ds))
- continue;
- if (!test_and_clear_bit(NFS4_FF_MIRROR_STAT_AVAIL,
- &mirror->flags) &&
- type != NFS4_FF_OP_LAYOUTRETURN)
- continue;
- /* mirror refcount put in cleanup_layoutstats */
- if (!refcount_inc_not_zero(&mirror->ref))
- continue;
- dev = &mirror->dss[0].mirror_ds->id_node;
- memcpy(&devinfo->dev_id, &dev->deviceid, NFS4_DEVICEID4_SIZE);
- devinfo->offset = 0;
- devinfo->length = NFS4_MAX_UINT64;
- spin_lock(&mirror->lock);
- devinfo->read_count = mirror->dss[0].read_stat.io_stat.ops_completed;
- devinfo->read_bytes = mirror->dss[0].read_stat.io_stat.bytes_completed;
- devinfo->write_count = mirror->dss[0].write_stat.io_stat.ops_completed;
- devinfo->write_bytes = mirror->dss[0].write_stat.io_stat.bytes_completed;
- spin_unlock(&mirror->lock);
- devinfo->layout_type = LAYOUT_FLEX_FILES;
- devinfo->ld_private.ops = &layoutstat_ops;
- devinfo->ld_private.data = mirror;
-
- devinfo++;
- i++;
+ for (dss_id = 0; dss_id < mirror->dss_count; ++dss_id) {
+ dss_info = &mirror->dss[dss_id];
+ if (i >= dev_limit)
+ break;
+ if (IS_ERR_OR_NULL(dss_info->mirror_ds))
+ continue;
+ if (!test_and_clear_bit(NFS4_FF_MIRROR_STAT_AVAIL,
+ &mirror->flags) &&
+ type != NFS4_FF_OP_LAYOUTRETURN)
+ continue;
+ /* mirror refcount put in cleanup_layoutstats */
+ if (!refcount_inc_not_zero(&mirror->ref))
+ continue;
+ dev = &dss_info->mirror_ds->id_node;
+ memcpy(&devinfo->dev_id,
+ &dev->deviceid,
+ NFS4_DEVICEID4_SIZE);
+ devinfo->offset = 0;
+ devinfo->length = NFS4_MAX_UINT64;
+ spin_lock(&mirror->lock);
+ devinfo->read_count =
+ dss_info->read_stat.io_stat.ops_completed;
+ devinfo->read_bytes =
+ dss_info->read_stat.io_stat.bytes_completed;
+ devinfo->write_count =
+ dss_info->write_stat.io_stat.ops_completed;
+ devinfo->write_bytes =
+ dss_info->write_stat.io_stat.bytes_completed;
+ spin_unlock(&mirror->lock);
+ devinfo->layout_type = LAYOUT_FLEX_FILES;
+ devinfo->ld_private.ops = &layoutstat_ops;
+ devinfo->ld_private.data = &mirror->dss[dss_id];
+
+ devinfo++;
+ i++;
+ }
}
return i;
}
--
2.34.1
* [PATCH 9/9] NFSv4/flexfiles: Add support for striped layouts
From: Jonathan Curley @ 2025-08-18 22:07 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Jonathan Curley, linux-nfs
Updates the lseg creation path to parse and add striped layouts, and
enables support for striped layouts.
Limitations:
1. All mirrors must have the same number of stripes.
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
---
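Reader note: the stripe-count checks applied while decoding each mirror
can be summarised in a user-space C sketch. It is an illustration, not
the kernel code. sketch_stripe_counts_valid() is a hypothetical name,
and SKETCH_MAX_STRIPE_CNT is a made-up value, because the value of
NFS4_FLEXFILE_LAYOUT_MAX_STRIPE_CNT is not shown in this part of the
series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: placeholder for NFS4_FLEXFILE_LAYOUT_MAX_STRIPE_CNT. */
#define SKETCH_MAX_STRIPE_CNT 16

/*
 * Return true if a layout with the given per-mirror stripe counts would
 * pass the checks in this patch: each count is between 1 and the
 * maximum, all mirrors agree on the count, and a striped layout carries
 * a non-zero stripe unit.
 */
static bool sketch_stripe_counts_valid(const uint32_t *counts,
				       size_t mirror_cnt,
				       uint64_t stripe_unit)
{
	uint32_t expected = 0;

	for (size_t i = 0; i < mirror_cnt; i++) {
		if (counts[i] == 0 || counts[i] > SKETCH_MAX_STRIPE_CNT)
			return false;
		if (expected == 0)
			expected = counts[i];	/* first mirror sets the count */
		else if (counts[i] != expected)
			return false;		/* mirrors disagree */
		if (counts[i] > 1 && stripe_unit == 0)
			return false;		/* striping needs a stripe unit */
	}
	return true;
}
```

A layout with stripe counts {4, 4, 4} and a 64 KiB stripe unit passes,
{4, 2} is rejected, and an unstriped layout {1, 1} is still accepted
with a stripe unit of zero.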
fs/nfs/flexfilelayout/flexfilelayout.c | 247 ++++++++++++++++---------
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
2 files changed, 157 insertions(+), 92 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 24d0eef0b6a4..444267938081 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -177,18 +177,19 @@ ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx, u32 dss_id,
#endif
}
-static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
- const struct nfs4_ff_layout_mirror *m2)
+static bool ff_dss_match_fh(const struct nfs4_ff_layout_ds_stripe *dss1,
+ const struct nfs4_ff_layout_ds_stripe *dss2)
{
int i, j;
- if (m1->dss[0].fh_versions_cnt != m2->dss[0].fh_versions_cnt)
+ if (dss1->fh_versions_cnt != dss2->fh_versions_cnt)
return false;
- for (i = 0; i < m1->dss[0].fh_versions_cnt; i++) {
+
+ for (i = 0; i < dss1->fh_versions_cnt; i++) {
bool found_fh = false;
- for (j = 0; j < m2->dss[0].fh_versions_cnt; j++) {
- if (nfs_compare_fh(&m1->dss[0].fh_versions[i],
- &m2->dss[0].fh_versions[j]) == 0) {
+ for (j = 0; j < dss2->fh_versions_cnt; j++) {
+ if (nfs_compare_fh(&dss1->fh_versions[i],
+ &dss2->fh_versions[j]) == 0) {
found_fh = true;
break;
}
@@ -199,6 +200,38 @@ static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
return true;
}
+static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
+ const struct nfs4_ff_layout_mirror *m2)
+{
+ u32 dss_id;
+
+ if (m1->dss_count != m2->dss_count)
+ return false;
+
+ for (dss_id = 0; dss_id < m1->dss_count; dss_id++)
+ if (!ff_dss_match_fh(&m1->dss[dss_id], &m2->dss[dss_id]))
+ return false;
+
+ return true;
+}
+
+static bool ff_mirror_match_devid(const struct nfs4_ff_layout_mirror *m1,
+ const struct nfs4_ff_layout_mirror *m2)
+{
+ u32 dss_id;
+
+ if (m1->dss_count != m2->dss_count)
+ return false;
+
+ for (dss_id = 0; dss_id < m1->dss_count; dss_id++)
+ if (memcmp(&m1->dss[dss_id].devid,
+ &m2->dss[dss_id].devid,
+ sizeof(m1->dss[dss_id].devid)) != 0)
+ return false;
+
+ return true;
+}
+
static struct nfs4_ff_layout_mirror *
ff_layout_add_mirror(struct pnfs_layout_hdr *lo,
struct nfs4_ff_layout_mirror *mirror)
@@ -209,8 +242,7 @@ ff_layout_add_mirror(struct pnfs_layout_hdr *lo,
spin_lock(&inode->i_lock);
list_for_each_entry(pos, &ff_layout->mirrors, mirrors) {
- if (memcmp(&mirror->dss[0].devid, &pos->dss[0].devid,
- sizeof(pos->dss[0].devid)) != 0)
+ if (!ff_mirror_match_devid(mirror, pos))
continue;
if (!ff_mirror_match_fh(mirror, pos))
continue;
@@ -241,13 +273,15 @@ ff_layout_remove_mirror(struct nfs4_ff_layout_mirror *mirror)
static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
{
struct nfs4_ff_layout_mirror *mirror;
+ u32 dss_id;
mirror = kzalloc(sizeof(*mirror), gfp_flags);
if (mirror != NULL) {
spin_lock_init(&mirror->lock);
refcount_set(&mirror->ref, 1);
INIT_LIST_HEAD(&mirror->mirrors);
- nfs_localio_file_init(&mirror->dss[0].nfl);
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++)
+ nfs_localio_file_init(&mirror->dss[dss_id].nfl);
}
return mirror;
}
@@ -255,17 +289,19 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
const struct cred *cred;
- int dss_id = 0;
+ u32 dss_id;
ff_layout_remove_mirror(mirror);
- kfree(mirror->dss[dss_id].fh_versions);
- nfs_close_local_fh(&mirror->dss[dss_id].nfl);
- cred = rcu_access_pointer(mirror->dss[dss_id].ro_cred);
- put_cred(cred);
- cred = rcu_access_pointer(mirror->dss[dss_id].rw_cred);
- put_cred(cred);
- nfs4_ff_layout_put_deviceid(mirror->dss[dss_id].mirror_ds);
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++) {
+ kfree(mirror->dss[dss_id].fh_versions);
+ cred = rcu_access_pointer(mirror->dss[dss_id].ro_cred);
+ put_cred(cred);
+ cred = rcu_access_pointer(mirror->dss[dss_id].rw_cred);
+ put_cred(cred);
+ nfs_close_local_fh(&mirror->dss[dss_id].nfl);
+ nfs4_ff_layout_put_deviceid(mirror->dss[dss_id].mirror_ds);
+ }
kfree(mirror->dss);
kfree(mirror);
@@ -371,14 +407,24 @@ ff_layout_add_lseg(struct pnfs_layout_hdr *lo,
free_me);
}
+static u32 ff_mirror_efficiency_sum(const struct nfs4_ff_layout_mirror *mirror)
+{
+ u32 dss_id, sum = 0;
+
+ for (dss_id = 0; dss_id < mirror->dss_count; dss_id++)
+ sum += mirror->dss[dss_id].efficiency;
+
+ return sum;
+}
+
static void ff_layout_sort_mirrors(struct nfs4_ff_layout_segment *fls)
{
int i, j;
for (i = 0; i < fls->mirror_array_cnt - 1; i++) {
for (j = i + 1; j < fls->mirror_array_cnt; j++)
- if (fls->mirror_array[i]->dss[0].efficiency <
- fls->mirror_array[j]->dss[0].efficiency)
+ if (ff_mirror_efficiency_sum(fls->mirror_array[i]) <
+ ff_mirror_efficiency_sum(fls->mirror_array[j]))
swap(fls->mirror_array[i],
fls->mirror_array[j]);
}
@@ -398,6 +444,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
u32 mirror_array_cnt;
__be32 *p;
int i, rc;
+ struct nfs4_ff_layout_ds_stripe *dss_info;
dprintk("--> %s\n", __func__);
scratch = alloc_page(gfp_flags);
@@ -440,17 +487,24 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
kuid_t uid;
kgid_t gid;
u32 fh_count, id;
- int j, dss_id = 0;
+ int j, dss_id;
rc = -EIO;
p = xdr_inline_decode(&stream, 4);
if (!p)
goto out_err_free;
- dss_count = be32_to_cpup(p);
+ // Ensure all mirrors have same stripe count.
+ if (dss_count == 0)
+ dss_count = be32_to_cpup(p);
+ else if (dss_count != be32_to_cpup(p))
+ goto out_err_free;
+
+ if (dss_count > NFS4_FLEXFILE_LAYOUT_MAX_STRIPE_CNT ||
+ dss_count == 0)
+ goto out_err_free;
- /* FIXME: allow for striping? */
- if (dss_count != 1)
+ if (dss_count > 1 && stripe_unit == 0)
goto out_err_free;
fls->mirror_array[i] = ff_layout_alloc_mirror(gfp_flags);
@@ -464,91 +518,100 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
kcalloc(dss_count, sizeof(struct nfs4_ff_layout_ds_stripe),
gfp_flags);
- /* deviceid */
- rc = decode_deviceid(&stream, &fls->mirror_array[i]->dss[dss_id].devid);
- if (rc)
- goto out_err_free;
+ for (dss_id = 0; dss_id < dss_count; dss_id++) {
+ dss_info = &fls->mirror_array[i]->dss[dss_id];
+ dss_info->mirror = fls->mirror_array[i];
- /* efficiency */
- rc = -EIO;
- p = xdr_inline_decode(&stream, 4);
- if (!p)
- goto out_err_free;
- fls->mirror_array[i]->dss[dss_id].efficiency = be32_to_cpup(p);
+ /* deviceid */
+ rc = decode_deviceid(&stream, &dss_info->devid);
+ if (rc)
+ goto out_err_free;
- /* stateid */
- rc = decode_pnfs_stateid(&stream, &fls->mirror_array[i]->dss[dss_id].stateid);
- if (rc)
- goto out_err_free;
+ /* efficiency */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ dss_info->efficiency = be32_to_cpup(p);
- /* fh */
- rc = -EIO;
- p = xdr_inline_decode(&stream, 4);
- if (!p)
- goto out_err_free;
- fh_count = be32_to_cpup(p);
+ /* stateid */
+ rc = decode_pnfs_stateid(&stream, &dss_info->stateid);
+ if (rc)
+ goto out_err_free;
- fls->mirror_array[i]->dss[dss_id].fh_versions =
- kcalloc(fh_count, sizeof(struct nfs_fh),
- gfp_flags);
- if (fls->mirror_array[i]->dss[dss_id].fh_versions == NULL) {
- rc = -ENOMEM;
- goto out_err_free;
- }
+ /* fh */
+ rc = -EIO;
+ p = xdr_inline_decode(&stream, 4);
+ if (!p)
+ goto out_err_free;
+ fh_count = be32_to_cpup(p);
- for (j = 0; j < fh_count; j++) {
- rc = decode_nfs_fh(&stream,
- &fls->mirror_array[i]->dss[dss_id].fh_versions[j]);
+ dss_info->fh_versions =
+ kcalloc(fh_count, sizeof(struct nfs_fh),
+ gfp_flags);
+ if (dss_info->fh_versions == NULL) {
+ rc = -ENOMEM;
+ goto out_err_free;
+ }
+
+ for (j = 0; j < fh_count; j++) {
+ rc = decode_nfs_fh(&stream,
+ &dss_info->fh_versions[j]);
+ if (rc)
+ goto out_err_free;
+ }
+
+ dss_info->fh_versions_cnt = fh_count;
+
+ /* user */
+ rc = decode_name(&stream, &id);
if (rc)
goto out_err_free;
- }
- fls->mirror_array[i]->dss[dss_id].fh_versions_cnt = fh_count;
+ uid = make_kuid(&init_user_ns, id);
- /* user */
- rc = decode_name(&stream, &id);
- if (rc)
- goto out_err_free;
+ /* group */
+ rc = decode_name(&stream, &id);
+ if (rc)
+ goto out_err_free;
- uid = make_kuid(&init_user_ns, id);
+ gid = make_kgid(&init_user_ns, id);
- /* group */
- rc = decode_name(&stream, &id);
- if (rc)
- goto out_err_free;
+ if (gfp_flags & __GFP_FS)
+ kcred = prepare_kernel_cred(&init_task);
+ else {
+ unsigned int nofs_flags = memalloc_nofs_save();
- gid = make_kgid(&init_user_ns, id);
+ kcred = prepare_kernel_cred(&init_task);
+ memalloc_nofs_restore(nofs_flags);
+ }
+ rc = -ENOMEM;
+ if (!kcred)
+ goto out_err_free;
+ kcred->fsuid = uid;
+ kcred->fsgid = gid;
+ cred = RCU_INITIALIZER(kcred);
- if (gfp_flags & __GFP_FS)
- kcred = prepare_kernel_cred(&init_task);
- else {
- unsigned int nofs_flags = memalloc_nofs_save();
- kcred = prepare_kernel_cred(&init_task);
- memalloc_nofs_restore(nofs_flags);
+ if (lgr->range.iomode == IOMODE_READ)
+ rcu_assign_pointer(dss_info->ro_cred, cred);
+ else
+ rcu_assign_pointer(dss_info->rw_cred, cred);
}
- rc = -ENOMEM;
- if (!kcred)
- goto out_err_free;
- kcred->fsuid = uid;
- kcred->fsgid = gid;
- cred = RCU_INITIALIZER(kcred);
-
- if (lgr->range.iomode == IOMODE_READ)
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
- else
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
- /* swap cred ptrs so free_mirror will clean up old */
- if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->dss[dss_id].ro_cred,
- fls->mirror_array[i]->dss[dss_id].ro_cred);
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].ro_cred, cred);
- } else {
- cred = xchg(&mirror->dss[dss_id].rw_cred,
- fls->mirror_array[i]->dss[dss_id].rw_cred);
- rcu_assign_pointer(fls->mirror_array[i]->dss[dss_id].rw_cred, cred);
+ for (dss_id = 0; dss_id < dss_count; dss_id++) {
+ dss_info = &fls->mirror_array[i]->dss[dss_id];
+ /* swap cred ptrs so free_mirror will clean up old */
+ if (lgr->range.iomode == IOMODE_READ) {
+ cred = xchg(&mirror->dss[dss_id].ro_cred,
+ dss_info->ro_cred);
+ rcu_assign_pointer(dss_info->ro_cred, cred);
+ } else {
+ cred = xchg(&mirror->dss[dss_id].rw_cred,
+ dss_info->rw_cred);
+ rcu_assign_pointer(dss_info->rw_cred, cred);
+ }
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 142324d6d5c5..17a008c8e97c 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -21,6 +21,8 @@
* due to network error etc. */
#define NFS4_FLEXFILE_LAYOUT_MAX_MIRROR_CNT 4096
+#define NFS4_FLEXFILE_LAYOUT_MAX_STRIPE_CNT 4096
+
/* LAYOUTSTATS report interval in ms */
#define FF_LAYOUTSTATS_REPORT_INTERVAL (60000L)
#define FF_LAYOUTSTATS_MAXDEV 4
--
2.34.1
* Re: [PATCH 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
2025-08-18 22:07 ` [PATCH 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware Jonathan Curley
@ 2025-08-20 1:52 ` kernel test robot
0 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-08-20 1:52 UTC (permalink / raw)
To: Jonathan Curley, Trond Myklebust, Anna Schumaker
Cc: oe-kbuild-all, Jonathan Curley, linux-nfs
Hi Jonathan,
kernel test robot noticed the following build warnings:
[auto build test WARNING on v6.16]
[cannot apply to trondmy-nfs/linux-next v6.17-rc2 v6.17-rc1 linus/master next-20250819]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jonathan-Curley/NFSv4-flexfiles-Remove-cred-local-variable-dependency/20250819-061041
base: v6.16
patch link: https://lore.kernel.org/r/20250818220750.47085-5-jcurley%40purestorage.com
patch subject: [PATCH 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
config: powerpc-randconfig-002-20250820 (https://download.01.org/0day-ci/archive/20250820/202508200945.mrU2bex2-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250820/202508200945.mrU2bex2-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508200945.mrU2bex2-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> Warning: fs/nfs/flexfilelayout/flexfilelayoutdev.c:374 function parameter 'dss_id' not described in 'nfs4_ff_layout_prepare_ds'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 11+ messages
2025-08-18 22:07 [PATCH 0/9] NFSv4/flexfiles: Add support for striped layouts Jonathan Curley
2025-08-18 22:07 ` [PATCH 1/9] NFSv4/flexfiles: Remove cred local variable dependency Jonathan Curley
2025-08-18 22:07 ` [PATCH 2/9] NFSv4/flexfiles: Use ds_commit_idx when marking a write commit Jonathan Curley
2025-08-18 22:07 ` [PATCH 3/9] NFSv4/flexfiles: Add data structure support for striped layouts Jonathan Curley
2025-08-18 22:07 ` [PATCH 4/9] NFSv4/flexfiles: Update low level helper functions to be DS stripe aware Jonathan Curley
2025-08-20 1:52 ` kernel test robot
2025-08-18 22:07 ` [PATCH 5/9] NFSv4/flexfiles: Read path updates for striped layouts Jonathan Curley
2025-08-18 22:07 ` [PATCH 6/9] NFSv4/flexfiles: Commit " Jonathan Curley
2025-08-18 22:07 ` [PATCH 7/9] NFSv4/flexfiles: Write " Jonathan Curley
2025-08-18 22:07 ` [PATCH 8/9] NFSv4/flexfiles: Update layout stats & error paths " Jonathan Curley
2025-08-18 22:07 ` [PATCH 9/9] NFSv4/flexfiles: Add support " Jonathan Curley