* [PATCH 4/4] nfs: remove fileid field from struct nfs_inode
From: Jeff Layton @ 2026-05-12 16:12 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan
Cc: linux-nfs, linux-kernel, linux-doc, Jeff Layton
In-Reply-To: <20260512-nfsino-v1-0-284720522f4c@kernel.org>
Now that all NFS client code uses inode->i_ino directly to store and
access the 64-bit NFS fileid, the separate fileid field in struct
nfs_inode is unused. Remove it to save 8 bytes per NFS inode.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
include/linux/nfs_fs.h | 5 -----
1 file changed, 5 deletions(-)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 83063f4ab488..ec17e602c979 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -145,11 +145,6 @@ struct nfs4_xattr_cache;
* nfs fs inode data in memory
*/
struct nfs_inode {
- /*
- * The 64bit 'inode number'
- */
- __u64 fileid;
-
/*
* NFS file handle
*/
--
2.54.0
^ permalink raw reply related
* [PATCH 3/4] nfs: replace NFS_FILEID() and nfsi->fileid with inode->i_ino
From: Jeff Layton @ 2026-05-12 16:12 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan
Cc: linux-nfs, linux-kernel, linux-doc, Jeff Layton
In-Reply-To: <20260512-nfsino-v1-0-284720522f4c@kernel.org>
Now that inode->i_ino stores the full 64-bit NFS fileid, replace all
uses of NFS_FILEID(), set_nfs_fileid(), and direct nfsi->fileid
accesses with inode->i_ino throughout the NFS client.
Remove the NFS_FILEID() and set_nfs_fileid() helper functions from
include/linux/nfs_fs.h since they are no longer needed.
Also fix two pre-existing truncation bugs in nfs4trace.h where fileid
trace fields were declared as u32 instead of u64.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfs/export.c | 6 +--
fs/nfs/filelayout/filelayout.c | 4 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +--
fs/nfs/inode.c | 16 +++----
fs/nfs/nfs4proc.c | 4 +-
fs/nfs/nfs4trace.h | 79 ++++++++++++++------------------
fs/nfs/nfstrace.h | 84 +++++++++++++++++-----------------
fs/nfs/pagelist.c | 2 +-
fs/nfs/pnfs.c | 2 +-
fs/nfs/unlink.c | 2 +-
fs/nfs/write.c | 2 +-
include/linux/nfs_fs.h | 10 ----
12 files changed, 99 insertions(+), 118 deletions(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index a10dd5f9d078..8fb08bce0623 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -49,14 +49,14 @@ nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
return FILEID_INVALID;
}
- p[FILEID_HIGH_OFF] = NFS_FILEID(inode) >> 32;
- p[FILEID_LOW_OFF] = NFS_FILEID(inode);
+ p[FILEID_HIGH_OFF] = inode->i_ino >> 32;
+ p[FILEID_LOW_OFF] = inode->i_ino;
p[FILE_I_TYPE_OFF] = inode->i_mode & S_IFMT;
p[len - 1] = 0; /* Padding */
nfs_copy_fh(clnt_fh, server_fh);
*max_len = len;
dprintk("%s: result fh fileid %llu mode %u size %d\n",
- __func__, NFS_FILEID(inode), inode->i_mode, *max_len);
+ __func__, inode->i_ino, inode->i_mode, *max_len);
return *max_len;
}
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index e85380e3b11d..f0f53f5dc871 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -95,7 +95,7 @@ static void filelayout_reset_write(struct nfs_pgio_header *hdr)
"(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
hdr->task.tk_pid,
hdr->inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(hdr->inode),
+ (unsigned long long)hdr->inode->i_ino,
hdr->args.count,
(unsigned long long)hdr->args.offset);
@@ -112,7 +112,7 @@ static void filelayout_reset_read(struct nfs_pgio_header *hdr)
"(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
hdr->task.tk_pid,
hdr->inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(hdr->inode),
+ (unsigned long long)hdr->inode->i_ino,
hdr->args.count,
(unsigned long long)hdr->args.offset);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 8b1559171fe3..6a84d85e0651 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1230,7 +1230,7 @@ static void ff_layout_reset_write(struct nfs_pgio_header *hdr, bool retry_pnfs)
"(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
hdr->task.tk_pid,
hdr->inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(hdr->inode),
+ (unsigned long long)hdr->inode->i_ino,
hdr->args.count,
(unsigned long long)hdr->args.offset);
@@ -1243,7 +1243,7 @@ static void ff_layout_reset_write(struct nfs_pgio_header *hdr, bool retry_pnfs)
"(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
hdr->task.tk_pid,
hdr->inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(hdr->inode),
+ (unsigned long long)hdr->inode->i_ino,
hdr->args.count,
(unsigned long long)hdr->args.offset);
@@ -1283,7 +1283,7 @@ static void ff_layout_reset_read(struct nfs_pgio_header *hdr)
"(req %s/%llu, %u bytes @ offset %llu)\n", __func__,
hdr->task.tk_pid,
hdr->inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(hdr->inode),
+ (unsigned long long)hdr->inode->i_ino,
hdr->args.count,
(unsigned long long)hdr->args.offset);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index a21ed1c7f89d..0d9451a2ad8e 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -303,7 +303,7 @@ nfs_find_actor(struct inode *inode, void *opaque)
struct nfs_fh *fh = desc->fh;
struct nfs_fattr *fattr = desc->fattr;
- if (NFS_FILEID(inode) != fattr->fileid)
+ if (inode->i_ino != fattr->fileid)
return 0;
if (inode_wrong_type(inode, fattr->mode))
return 0;
@@ -320,7 +320,7 @@ nfs_init_locked(struct inode *inode, void *opaque)
struct nfs_find_desc *desc = opaque;
struct nfs_fattr *fattr = desc->fattr;
- set_nfs_fileid(inode, fattr->fileid);
+ inode->i_ino = fattr->fileid;
inode->i_mode = fattr->mode;
nfs_copy_fh(NFS_FH(inode), desc->fh);
return 0;
@@ -580,7 +580,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
}
dprintk("NFS: nfs_fhget(%s/%Lu fh_crc=0x%08x ct=%d)\n",
inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(inode),
+ (unsigned long long)inode->i_ino,
nfs_display_fhandle_hash(fh),
icount_read(inode));
@@ -1343,7 +1343,7 @@ __nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
struct nfs_inode *nfsi = NFS_I(inode);
dfprintk(PAGECACHE, "NFS: revalidating (%s/%Lu)\n",
- inode->i_sb->s_id, (unsigned long long)NFS_FILEID(inode));
+ inode->i_sb->s_id, (unsigned long long)inode->i_ino);
trace_nfs_revalidate_inode_enter(inode);
@@ -1373,7 +1373,7 @@ __nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
if (status != 0) {
dfprintk(PAGECACHE, "nfs_revalidate_inode: (%s/%Lu) getattr failed, error=%d\n",
inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(inode), status);
+ (unsigned long long)inode->i_ino, status);
switch (status) {
case -ETIMEDOUT:
/* A soft timeout occurred. Use cached information? */
@@ -1393,7 +1393,7 @@ __nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
if (status) {
dfprintk(PAGECACHE, "nfs_revalidate_inode: (%s/%Lu) refresh failed, error=%d\n",
inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(inode), status);
+ (unsigned long long)inode->i_ino, status);
goto out;
}
@@ -1404,7 +1404,7 @@ __nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
dfprintk(PAGECACHE, "NFS: (%s/%Lu) revalidation complete\n",
inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(inode));
+ (unsigned long long)inode->i_ino);
out:
nfs_free_fattr(fattr);
@@ -1453,7 +1453,7 @@ static int nfs_invalidate_mapping(struct inode *inode, struct address_space *map
dfprintk(PAGECACHE, "NFS: (%s/%Lu) data cache invalidated\n",
inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(inode));
+ (unsigned long long)inode->i_ino);
return 0;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a9b8d482d289..60024b978ee0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -377,7 +377,7 @@ static void nfs4_setup_readdir(u64 cookie, __be32 *verifier, struct dentry *dent
*p++ = htonl(attrs); /* bitmap */
*p++ = htonl(12); /* attribute buffer length */
*p++ = htonl(NF4DIR);
- p = xdr_encode_hyper(p, NFS_FILEID(d_inode(dentry)));
+ p = xdr_encode_hyper(p, d_inode(dentry)->i_ino);
}
*p++ = xdr_one; /* next */
@@ -391,7 +391,7 @@ static void nfs4_setup_readdir(u64 cookie, __be32 *verifier, struct dentry *dent
*p++ = htonl(12); /* attribute buffer length */
*p++ = htonl(NF4DIR);
spin_lock(&dentry->d_lock);
- p = xdr_encode_hyper(p, NFS_FILEID(d_inode(dentry->d_parent)));
+ p = xdr_encode_hyper(p, d_inode(dentry->d_parent)->i_ino);
spin_unlock(&dentry->d_lock);
readdir->pgbase = (char *)p - (char *)start;
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index c939533b9881..1ed677810d9d 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -597,13 +597,13 @@ DECLARE_EVENT_CLASS(nfs4_open_event,
__entry->openstateid_hash = 0;
}
if (inode != NULL) {
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
} else {
__entry->fileid = 0;
__entry->fhandle = 0;
}
- __entry->dir = NFS_FILEID(d_inode(ctx->dentry->d_parent));
+ __entry->dir = d_inode(ctx->dentry->d_parent)->i_ino;
__assign_str(name);
),
@@ -658,7 +658,7 @@ TRACE_EVENT(nfs4_cached_open,
const struct inode *inode = state->inode;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->fmode = (__force unsigned int)state->state;
__entry->stateid_seq =
@@ -703,7 +703,7 @@ TRACE_EVENT(nfs4_close,
const struct inode *inode = state->inode;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->fmode = (__force unsigned int)state->state;
__entry->error = error < 0 ? -error : 0;
@@ -759,7 +759,7 @@ DECLARE_EVENT_CLASS(nfs4_lock_event,
__entry->start = request->fl_start;
__entry->end = request->fl_end;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->stateid_seq =
be32_to_cpu(state->stateid.seqid);
@@ -831,7 +831,7 @@ TRACE_EVENT(nfs4_set_lock,
__entry->start = request->fl_start;
__entry->end = request->fl_end;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->stateid_seq =
be32_to_cpu(state->stateid.seqid);
@@ -922,7 +922,7 @@ TRACE_EVENT(nfs4_state_lock_reclaim,
const struct inode *inode = state->inode;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->state_flags = state->flags;
__entry->lock_flags = lock->ls_flags;
@@ -960,7 +960,7 @@ DECLARE_EVENT_CLASS(nfs4_set_delegation_event,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->fmode = (__force unsigned int)fmode;
),
@@ -1087,7 +1087,7 @@ DECLARE_EVENT_CLASS(nfs4_test_stateid_event,
__entry->error = error < 0 ? -error : 0;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->stateid_seq =
be32_to_cpu(state->stateid.seqid);
@@ -1137,7 +1137,7 @@ DECLARE_EVENT_CLASS(nfs4_lookup_event,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->error = -error;
__assign_str(name);
),
@@ -1185,7 +1185,7 @@ TRACE_EVENT(nfs4_lookupp,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->ino = NFS_FILEID(inode);
+ __entry->ino = inode->i_ino;
__entry->error = error < 0 ? -error : 0;
),
@@ -1220,8 +1220,8 @@ TRACE_EVENT(nfs4_rename,
TP_fast_assign(
__entry->dev = olddir->i_sb->s_dev;
- __entry->olddir = NFS_FILEID(olddir);
- __entry->newdir = NFS_FILEID(newdir);
+ __entry->olddir = olddir->i_ino;
+ __entry->newdir = newdir->i_ino;
__entry->error = error < 0 ? -error : 0;
__assign_str(oldname);
__assign_str(newname);
@@ -1258,7 +1258,7 @@ DECLARE_EVENT_CLASS(nfs4_inode_event,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->error = error < 0 ? -error : 0;
),
@@ -1311,7 +1311,7 @@ DECLARE_EVENT_CLASS(nfs4_inode_stateid_event,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->error = error < 0 ? -error : 0;
__entry->stateid_seq =
@@ -1421,7 +1421,7 @@ DECLARE_EVENT_CLASS(nfs4_inode_callback_event,
__entry->error = error < 0 ? -error : 0;
__entry->fhandle = nfs_fhandle_hash(fhandle);
if (!IS_ERR_OR_NULL(inode)) {
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
} else {
__entry->fileid = 0;
@@ -1478,7 +1478,7 @@ DECLARE_EVENT_CLASS(nfs4_inode_stateid_callback_event,
__entry->error = error < 0 ? -error : 0;
__entry->fhandle = nfs_fhandle_hash(fhandle);
if (!IS_ERR_OR_NULL(inode)) {
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
} else {
__entry->fileid = 0;
@@ -1655,7 +1655,7 @@ DECLARE_EVENT_CLASS(nfs4_read_event,
const struct pnfs_layout_segment *lseg = hdr->lseg;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
__entry->offset = hdr->args.offset;
__entry->arg_count = hdr->args.count;
@@ -1727,7 +1727,7 @@ DECLARE_EVENT_CLASS(nfs4_write_event,
const struct pnfs_layout_segment *lseg = hdr->lseg;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
__entry->offset = hdr->args.offset;
__entry->arg_count = hdr->args.count;
@@ -1795,7 +1795,7 @@ DECLARE_EVENT_CLASS(nfs4_commit_event,
const struct pnfs_layout_segment *lseg = data->lseg;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
__entry->offset = data->args.offset;
__entry->count = data->args.count;
@@ -1857,7 +1857,7 @@ TRACE_EVENT(nfs4_layoutget,
const struct inode *inode = d_inode(ctx->dentry);
const struct nfs4_state *state = ctx->state;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->iomode = args->iomode;
__entry->offset = args->offset;
@@ -1957,7 +1957,7 @@ TRACE_EVENT(pnfs_update_layout,
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->pos = pos;
__entry->count = count;
@@ -2012,7 +2012,7 @@ DECLARE_EVENT_CLASS(pnfs_layout_event,
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->pos = pos;
__entry->count = count;
@@ -2194,7 +2194,7 @@ DECLARE_EVENT_CLASS(nfs4_flexfiles_io_event,
__entry->error = -error;
__entry->nfs_error = hdr->res.op_status;
__entry->fhandle = nfs_fhandle_hash(hdr->args.fh);
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->offset = hdr->args.offset;
__entry->count = hdr->args.count;
@@ -2258,7 +2258,7 @@ TRACE_EVENT(ff_layout_commit_error,
__entry->error = -error;
__entry->nfs_error = data->res.op_status;
__entry->fhandle = nfs_fhandle_hash(data->args.fh);
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->offset = data->args.offset;
__entry->count = data->args.count;
@@ -2423,7 +2423,7 @@ TRACE_EVENT(nfs4_llseek,
TP_STRUCT__entry(
__field(unsigned long, error)
__field(u32, fhandle)
- __field(u32, fileid)
+ __field(u64, fileid)
__field(dev_t, dev)
__field(int, stateid_seq)
__field(u32, stateid_hash)
@@ -2434,10 +2434,9 @@ TRACE_EVENT(nfs4_llseek,
),
TP_fast_assign(
- const struct nfs_inode *nfsi = NFS_I(inode);
const struct nfs_fh *fh = args->sa_fh;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->fhandle = nfs_fhandle_hash(fh);
__entry->offset_s = args->sa_offset;
@@ -2499,7 +2498,7 @@ DECLARE_EVENT_CLASS(nfs4_sparse_event,
__entry->offset = args->falloc_offset;
__entry->len = args->falloc_length;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__entry->stateid_seq =
be32_to_cpu(args->falloc_stateid.seqid);
@@ -2568,14 +2567,11 @@ TRACE_EVENT(nfs4_copy,
),
TP_fast_assign(
- const struct nfs_inode *src_nfsi = NFS_I(src_inode);
- const struct nfs_inode *dst_nfsi = NFS_I(dst_inode);
-
- __entry->src_fileid = src_nfsi->fileid;
+ __entry->src_fileid = src_inode->i_ino;
__entry->src_dev = src_inode->i_sb->s_dev;
__entry->src_fhandle = nfs_fhandle_hash(args->src_fh);
__entry->src_offset = args->src_pos;
- __entry->dst_fileid = dst_nfsi->fileid;
+ __entry->dst_fileid = dst_inode->i_ino;
__entry->dst_dev = dst_inode->i_sb->s_dev;
__entry->dst_fhandle = nfs_fhandle_hash(args->dst_fh);
__entry->dst_offset = args->dst_pos;
@@ -2666,14 +2662,11 @@ TRACE_EVENT(nfs4_clone,
),
TP_fast_assign(
- const struct nfs_inode *src_nfsi = NFS_I(src_inode);
- const struct nfs_inode *dst_nfsi = NFS_I(dst_inode);
-
- __entry->src_fileid = src_nfsi->fileid;
+ __entry->src_fileid = src_inode->i_ino;
__entry->src_dev = src_inode->i_sb->s_dev;
__entry->src_fhandle = nfs_fhandle_hash(args->src_fh);
__entry->src_offset = args->src_offset;
- __entry->dst_fileid = dst_nfsi->fileid;
+ __entry->dst_fileid = dst_inode->i_ino;
__entry->dst_dev = dst_inode->i_sb->s_dev;
__entry->dst_fhandle = nfs_fhandle_hash(args->dst_fh);
__entry->dst_offset = args->dst_offset;
@@ -2724,7 +2717,7 @@ TRACE_EVENT(nfs4_copy_notify,
TP_STRUCT__entry(
__field(unsigned long, error)
__field(u32, fhandle)
- __field(u32, fileid)
+ __field(u64, fileid)
__field(dev_t, dev)
__field(int, stateid_seq)
__field(u32, stateid_hash)
@@ -2733,9 +2726,7 @@ TRACE_EVENT(nfs4_copy_notify,
),
TP_fast_assign(
- const struct nfs_inode *nfsi = NFS_I(inode);
-
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->dev = inode->i_sb->s_dev;
__entry->fhandle = nfs_fhandle_hash(args->cna_src_fh);
__entry->stateid_seq =
@@ -2830,7 +2821,7 @@ DECLARE_EVENT_CLASS(nfs4_xattr_event,
TP_fast_assign(
__entry->error = error < 0 ? -error : 0;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
__assign_str(name);
),
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index ff467959f733..4ada21f4eebd 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -80,7 +80,7 @@ DECLARE_EVENT_CLASS(nfs_inode_event,
TP_fast_assign(
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(inode);
__entry->cache_validity = nfsi->cache_validity;
@@ -121,7 +121,7 @@ DECLARE_EVENT_CLASS(nfs_inode_event_done,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->error = error < 0 ? -error : 0;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->type = nfs_umode_to_dtype(inode->i_mode);
__entry->version = inode_peek_iversion_raw(inode);
@@ -211,7 +211,7 @@ TRACE_EVENT(nfs_access_exit,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->error = error < 0 ? -error : 0;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->type = nfs_umode_to_dtype(inode->i_mode);
__entry->version = inode_peek_iversion_raw(inode);
@@ -265,7 +265,7 @@ DECLARE_EVENT_CLASS(nfs_update_size_class,
__entry->dev = inode->i_sb->s_dev;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->version = inode_peek_iversion_raw(inode);
__entry->cur_size = i_size_read(inode);
__entry->new_size = new_size;
@@ -317,7 +317,7 @@ DECLARE_EVENT_CLASS(nfs_inode_range_event,
__entry->dev = inode->i_sb->s_dev;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->version = inode_peek_iversion_raw(inode);
__entry->range_start = range_start;
__entry->range_end = range_end;
@@ -371,7 +371,7 @@ DECLARE_EVENT_CLASS(nfs_readdir_event,
const struct nfs_inode *nfsi = NFS_I(dir);
__entry->dev = dir->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = dir->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(dir);
if (cookie != 0)
@@ -429,9 +429,9 @@ DECLARE_EVENT_CLASS(nfs_lookup_event,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->flags = flags;
- __entry->fileid = d_is_negative(dentry) ? 0 : NFS_FILEID(d_inode(dentry));
+ __entry->fileid = d_is_negative(dentry) ? 0 : d_inode(dentry)->i_ino;
__assign_str(name);
),
@@ -476,10 +476,10 @@ DECLARE_EVENT_CLASS(nfs_lookup_event_done,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->error = error < 0 ? -error : 0;
__entry->flags = flags;
- __entry->fileid = d_is_negative(dentry) ? 0 : NFS_FILEID(d_inode(dentry));
+ __entry->fileid = d_is_negative(dentry) ? 0 : d_inode(dentry)->i_ino;
__assign_str(name);
),
@@ -532,7 +532,7 @@ TRACE_EVENT(nfs_atomic_open_enter,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->flags = flags;
__entry->fmode = (__force unsigned long)ctx->mode;
__assign_str(name);
@@ -571,7 +571,7 @@ TRACE_EVENT(nfs_atomic_open_exit,
TP_fast_assign(
__entry->error = -error;
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->flags = flags;
__entry->fmode = (__force unsigned long)ctx->mode;
__assign_str(name);
@@ -608,7 +608,7 @@ TRACE_EVENT(nfs_create_enter,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->flags = flags;
__assign_str(name);
),
@@ -644,7 +644,7 @@ TRACE_EVENT(nfs_create_exit,
TP_fast_assign(
__entry->error = -error;
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->flags = flags;
__assign_str(name);
),
@@ -676,7 +676,7 @@ DECLARE_EVENT_CLASS(nfs_directory_event,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__assign_str(name);
),
@@ -714,7 +714,7 @@ DECLARE_EVENT_CLASS(nfs_directory_event_done,
TP_fast_assign(
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->error = error < 0 ? -error : 0;
__assign_str(name);
),
@@ -768,8 +768,8 @@ TRACE_EVENT(nfs_link_enter,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
- __entry->dir = NFS_FILEID(dir);
+ __entry->fileid = inode->i_ino;
+ __entry->dir = dir->i_ino;
__assign_str(name);
),
@@ -803,8 +803,8 @@ TRACE_EVENT(nfs_link_exit,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = NFS_FILEID(inode);
- __entry->dir = NFS_FILEID(dir);
+ __entry->fileid = inode->i_ino;
+ __entry->dir = dir->i_ino;
__entry->error = error < 0 ? -error : 0;
__assign_str(name);
),
@@ -840,8 +840,8 @@ DECLARE_EVENT_CLASS(nfs_rename_event,
TP_fast_assign(
__entry->dev = old_dir->i_sb->s_dev;
- __entry->old_dir = NFS_FILEID(old_dir);
- __entry->new_dir = NFS_FILEID(new_dir);
+ __entry->old_dir = old_dir->i_ino;
+ __entry->new_dir = new_dir->i_ino;
__assign_str(old_name);
__assign_str(new_name);
),
@@ -889,8 +889,8 @@ DECLARE_EVENT_CLASS(nfs_rename_event_done,
TP_fast_assign(
__entry->dev = old_dir->i_sb->s_dev;
__entry->error = -error;
- __entry->old_dir = NFS_FILEID(old_dir);
- __entry->new_dir = NFS_FILEID(new_dir);
+ __entry->old_dir = old_dir->i_ino;
+ __entry->new_dir = new_dir->i_ino;
__assign_str(old_name);
__assign_str(new_name);
),
@@ -943,7 +943,7 @@ TRACE_EVENT(nfs_sillyrename_unlink,
struct inode *dir = d_inode(data->dentry->d_parent);
size_t len = data->args.name.len;
__entry->dev = dir->i_sb->s_dev;
- __entry->dir = NFS_FILEID(dir);
+ __entry->dir = dir->i_ino;
__entry->error = -error;
memcpy(__get_str(name),
data->args.name.name, len);
@@ -981,7 +981,7 @@ DECLARE_EVENT_CLASS(nfs_folio_event,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(inode);
__entry->offset = offset;
@@ -1031,7 +1031,7 @@ DECLARE_EVENT_CLASS(nfs_folio_event_done,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(inode);
__entry->offset = offset;
@@ -1109,7 +1109,7 @@ DECLARE_EVENT_CLASS(nfs_kiocb_event,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(inode);
__entry->offset = iocb->ki_pos;
@@ -1160,7 +1160,7 @@ TRACE_EVENT(nfs_aop_readahead,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(inode);
__entry->offset = pos;
@@ -1199,7 +1199,7 @@ TRACE_EVENT(nfs_aop_readahead_done,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->version = inode_peek_iversion_raw(inode);
__entry->nr_pages = nr_pages;
@@ -1239,7 +1239,7 @@ TRACE_EVENT(nfs_initiate_read,
__entry->offset = hdr->args.offset;
__entry->count = hdr->args.count;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1284,7 +1284,7 @@ TRACE_EVENT(nfs_readpage_done,
__entry->res_count = hdr->res.count;
__entry->eof = hdr->res.eof;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1330,7 +1330,7 @@ TRACE_EVENT(nfs_readpage_short,
__entry->res_count = hdr->res.count;
__entry->eof = hdr->res.eof;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1377,7 +1377,7 @@ TRACE_EVENT(nfs_pgio_error,
__entry->arg_count = hdr->args.count;
__entry->res_count = hdr->res.count;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1416,7 +1416,7 @@ TRACE_EVENT(nfs_initiate_write,
__entry->count = hdr->args.count;
__entry->stable = hdr->args.stable;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1467,7 +1467,7 @@ TRACE_EVENT(nfs_writeback_done,
&verf->verifier,
NFS4_VERIFIER_SIZE);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1507,7 +1507,7 @@ DECLARE_EVENT_CLASS(nfs_page_class,
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->req = req;
__entry->offset = req_offset(req);
@@ -1555,7 +1555,7 @@ DECLARE_EVENT_CLASS(nfs_page_error_class,
TP_fast_assign(
const struct nfs_inode *nfsi = NFS_I(inode);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
__entry->offset = req_offset(req);
__entry->count = req->wb_bytes;
@@ -1609,7 +1609,7 @@ TRACE_EVENT(nfs_initiate_commit,
__entry->offset = data->args.offset;
__entry->count = data->args.count;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1655,7 +1655,7 @@ TRACE_EVENT(nfs_commit_done,
&verf->verifier,
NFS4_VERIFIER_SIZE);
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
),
@@ -1701,7 +1701,7 @@ DECLARE_EVENT_CLASS(nfs_direct_req_class,
const struct nfs_fh *fh = &nfsi->fh;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
__entry->offset = dreq->io_start;
__entry->count = dreq->count;
@@ -1765,7 +1765,7 @@ DECLARE_EVENT_CLASS(nfs_local_dio_class,
const struct nfs_fh *fh = &nfsi->fh;
__entry->dev = inode->i_sb->s_dev;
- __entry->fileid = nfsi->fileid;
+ __entry->fileid = inode->i_ino;
__entry->fhandle = nfs_fhandle_hash(fh);
__entry->offset = offset;
__entry->count = count;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 4a87b2fdb2e6..7dd478ffc2fa 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -759,7 +759,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
dprintk("NFS: initiated pgio call "
"(req %s/%llu, %u bytes @ offset %llu)\n",
hdr->inode->i_sb->s_id,
- (unsigned long long)NFS_FILEID(hdr->inode),
+ (unsigned long long)hdr->inode->i_ino,
hdr->args.count,
(unsigned long long)hdr->args.offset);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 743467e9ba20..fdedeff5f6cc 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2373,7 +2373,7 @@ pnfs_update_layout(struct inode *ino,
dprintk("%s: inode %s/%llu pNFS layout segment %s for "
"(%s, offset: %llu, length: %llu)\n",
__func__, ino->i_sb->s_id,
- (unsigned long long)NFS_FILEID(ino),
+ (unsigned long long)ino->i_ino,
IS_ERR_OR_NULL(lseg) ? "not found" : "found",
iomode==IOMODE_RW ? "read/write" : "read-only",
(unsigned long long)pos,
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index df3ca4669df6..7f2e84eaaa9f 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -461,7 +461,7 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
if (dentry->d_flags & DCACHE_NFSFS_RENAMED)
goto out;
- fileid = NFS_FILEID(d_inode(dentry));
+ fileid = d_inode(dentry)->i_ino;
sdentry = NULL;
do {
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3134bb17f3e3..9035bb8a0216 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1817,7 +1817,7 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
dprintk("NFS: commit (%s/%llu %d@%lld)",
nfs_req_openctx(req)->dentry->d_sb->s_id,
- (unsigned long long)NFS_FILEID(d_inode(nfs_req_openctx(req)->dentry)),
+ (unsigned long long)d_inode(nfs_req_openctx(req)->dentry)->i_ino,
req->wb_bytes,
(long long)req_offset(req));
if (status < 0) {
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 6d6fa62ede10..83063f4ab488 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -394,16 +394,6 @@ static inline int NFS_STALE(const struct inode *inode)
return test_bit(NFS_INO_STALE, &NFS_I(inode)->flags);
}
-static inline __u64 NFS_FILEID(const struct inode *inode)
-{
- return inode->i_ino;
-}
-
-static inline void set_nfs_fileid(struct inode *inode, __u64 fileid)
-{
- inode->i_ino = fileid;
-}
-
static inline void nfs_mark_for_revalidate(struct inode *inode)
{
struct nfs_inode *nfsi = NFS_I(inode);
--
2.54.0
^ permalink raw reply related
* [PATCH 2/4] nfs: remove nfs_compat_user_ino64() and deprecate enable_ino64
From: Jeff Layton @ 2026-05-12 16:12 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan
Cc: linux-nfs, linux-kernel, linux-doc, Jeff Layton
In-Reply-To: <20260512-nfsino-v1-0-284720522f4c@kernel.org>
Now that inode->i_ino stores the full 64-bit NFS fileid, the
nfs_compat_user_ino64() function is no longer needed.
generic_fillattr() already copies inode->i_ino into stat->ino, so the
explicit override in nfs_getattr() is also redundant.
Also remove the now-unused nfs_fileid_to_ino_t() and
nfs_fattr_to_ino_t() helper functions that were used to XOR-fold
64-bit fileids into the old unsigned long i_ino.
Keep the enable_ino64 module parameter as a deprecated stub that
accepts but ignores the value, logging a notice when set. This avoids
breaking existing configurations that pass nfs.enable_ino64 on the
kernel command line or in modprobe.d.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 7 ----
fs/nfs/dir.c | 2 +-
fs/nfs/inode.c | 50 +++++++------------------
include/linux/nfs_fs.h | 10 -----
4 files changed, 15 insertions(+), 54 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4d0f545fb3ec..1e75cfb81d9a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4308,13 +4308,6 @@ Kernel parameters
Only applies if the softerr mount option is enabled,
and the specified value is >= 0.
- nfs.enable_ino64=
- [NFS] enable 64-bit inode numbers.
- If zero, the NFS client will fake up a 32-bit inode
- number for the readdir() and stat() syscalls instead
- of returning the full 64-bit number.
- The default is to return 64-bit inode numbers.
-
nfs.idmap_cache_timeout=
[NFS] set the maximum lifetime for idmapper cache
entries.
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 5f8c3ea0bce3..659e39b1562b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1106,7 +1106,7 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
ent = &array->array[i];
if (!dir_emit(desc->ctx, ent->name, ent->name_len,
- nfs_compat_user_ino64(ent->ino), ent->d_type)) {
+ ent->ino, ent->d_type)) {
desc->eob = true;
break;
}
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index dd9e378c36fb..a21ed1c7f89d 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -57,21 +57,23 @@
#define NFSDBG_FACILITY NFSDBG_VFS
-#define NFS_64_BIT_INODE_NUMBERS_ENABLED 1
+static bool enable_ino64;
-/* Default is to see 64-bit inode numbers */
-static bool enable_ino64 = NFS_64_BIT_INODE_NUMBERS_ENABLED;
+static int param_set_enable_ino64(const char *val, const struct kernel_param *kp)
+{
+ pr_notice("enable_ino64 is deprecated and has no effect\n");
+ return 0;
+}
+
+static const struct kernel_param_ops param_ops_enable_ino64 = {
+ .set = param_set_enable_ino64,
+ .get = param_get_bool,
+};
static int nfs_update_inode(struct inode *, struct nfs_fattr *);
static struct kmem_cache * nfs_inode_cachep;
-static inline u64
-nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
-{
- return fattr->fileid;
-}
-
int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
{
if (unlikely(nfs_current_task_exiting()))
@@ -83,29 +85,6 @@ int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
}
EXPORT_SYMBOL_GPL(nfs_wait_bit_killable);
-/**
- * nfs_compat_user_ino64 - returns the user-visible inode number
- * @fileid: 64-bit fileid
- *
- * This function returns a 32-bit inode number if the boot parameter
- * nfs.enable_ino64 is zero.
- */
-u64 nfs_compat_user_ino64(u64 fileid)
-{
-#ifdef CONFIG_COMPAT
- compat_ulong_t ino;
-#else
- unsigned long ino;
-#endif
-
- if (enable_ino64)
- return fileid;
- ino = fileid;
- if (sizeof(ino) < sizeof(fileid))
- ino ^= fileid >> (sizeof(fileid)-sizeof(ino)) * 8;
- return ino;
-}
-
int nfs_drop_inode(struct inode *inode)
{
return NFS_STALE(inode) || inode_generic_drop(inode);
@@ -418,7 +397,7 @@ nfs_ilookup(struct super_block *sb, struct nfs_fattr *fattr, struct nfs_fh *fh)
!(fattr->valid & NFS_ATTR_FATTR_TYPE))
return NULL;
- hash = nfs_fattr_to_ino_t(fattr);
+ hash = fattr->fileid;
inode = ilookup5(sb, hash, nfs_find_actor, &desc);
dprintk("%s: returning %p\n", __func__, inode);
@@ -466,7 +445,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0)
goto out_no_inode;
- hash = nfs_fattr_to_ino_t(fattr);
+ hash = fattr->fileid;
inode = iget5_locked(sb, hash, nfs_find_actor, nfs_init_locked, &desc);
if (inode == NULL) {
@@ -1061,7 +1040,6 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
stat->result_mask = nfs_get_valid_attrmask(inode) | request_mask;
generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
- stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
stat->change_cookie = inode_peek_iversion_raw(inode);
stat->attributes_mask |= STATX_ATTR_CHANGE_MONOTONIC;
if (server->change_attr_type != NFS4_CHANGE_TYPE_IS_UNDEFINED)
@@ -2793,7 +2771,7 @@ static void __exit exit_nfs_fs(void)
MODULE_AUTHOR("Olaf Kirch <okir@monad.swb.de>");
MODULE_DESCRIPTION("NFS client support");
MODULE_LICENSE("GPL");
-module_param(enable_ino64, bool, 0644);
+module_param_cb(enable_ino64, ¶m_ops_enable_ino64, &enable_ino64, 0644);
module_init(init_nfs_fs)
module_exit(exit_nfs_fs)
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 8e48053b3069..6d6fa62ede10 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -473,7 +473,6 @@ extern void nfs_file_set_open_context(struct file *filp, struct nfs_open_context
extern void nfs_file_clear_open_context(struct file *flip);
extern struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx);
extern void nfs_put_lock_context(struct nfs_lock_context *l_ctx);
-extern u64 nfs_compat_user_ino64(u64 fileid);
extern void nfs_fattr_init(struct nfs_fattr *fattr);
extern void nfs_fattr_set_barrier(struct nfs_fattr *fattr);
extern unsigned long nfs_inc_attr_generation_counter(void);
@@ -668,15 +667,6 @@ static inline loff_t nfs_size_to_loff_t(__u64 size)
return min_t(u64, size, OFFSET_MAX);
}
-static inline ino_t
-nfs_fileid_to_ino_t(u64 fileid)
-{
- ino_t ino = (ino_t) fileid;
- if (sizeof(ino_t) < sizeof(u64))
- ino ^= fileid >> (sizeof(u64)-sizeof(ino_t)) * 8;
- return ino;
-}
-
static inline void nfs_ooo_clear(struct nfs_inode *nfsi)
{
nfsi->cache_validity &= ~NFS_INO_DATA_INVAL_DEFER;
--
2.54.0
^ permalink raw reply related
* [PATCH 1/4] nfs: store the full NFS fileid in inode->i_ino
From: Jeff Layton @ 2026-05-12 16:12 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan
Cc: linux-nfs, linux-kernel, linux-doc, Jeff Layton
In-Reply-To: <20260512-nfsino-v1-0-284720522f4c@kernel.org>
Now that inode->i_ino is a 64-bit value, store the full NFS fileid in
it directly instead of an XOR-folded hash. This makes NFS_FILEID() and
set_nfs_fileid() operate on inode->i_ino rather than the separate
nfsi->fileid field.
Since iget5_locked() and ilookup5() now accept a u64 hashval, pass the
full fileid as the hash parameter directly.
Convert direct nfsi->fileid accesses in nfs_check_inode_attributes(),
nfs_update_inode(), and nfs_same_file() to use inode->i_ino.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfs/dir.c | 2 +-
fs/nfs/inode.c | 25 ++++++++++---------------
include/linux/nfs_fs.h | 4 ++--
3 files changed, 13 insertions(+), 18 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e9ce1883288c..5f8c3ea0bce3 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -650,7 +650,7 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
return 0;
nfsi = NFS_I(inode);
- if (entry->fattr->fileid != nfsi->fileid)
+ if (entry->fattr->fileid != inode->i_ino)
return 0;
if (entry->fh->size && nfs_compare_fh(entry->fh, &nfsi->fh) != 0)
return 0;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e26030e73696..dd9e378c36fb 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -66,10 +66,10 @@ static int nfs_update_inode(struct inode *, struct nfs_fattr *);
static struct kmem_cache * nfs_inode_cachep;
-static inline unsigned long
+static inline u64
nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
{
- return nfs_fileid_to_ino_t(fattr->fileid);
+ return fattr->fileid;
}
int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
@@ -313,8 +313,7 @@ struct nfs_find_desc {
};
/*
- * In NFSv3 we can have 64bit inode numbers. In order to support
- * this, and re-exported directories (also seen in NFSv2)
+ * For re-exported directories (also seen in NFSv2)
* we are forced to allow 2 different inodes to have the same
* i_ino.
*/
@@ -413,7 +412,7 @@ nfs_ilookup(struct super_block *sb, struct nfs_fattr *fattr, struct nfs_fh *fh)
.fattr = fattr,
};
struct inode *inode;
- unsigned long hash;
+ u64 hash;
if (!(fattr->valid & NFS_ATTR_FATTR_FILEID) ||
!(fattr->valid & NFS_ATTR_FATTR_TYPE))
@@ -456,7 +455,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
};
struct inode *inode = ERR_PTR(-ENOENT);
u64 fattr_supported = NFS_SB(sb)->fattr_valid;
- unsigned long hash;
+ u64 hash;
nfs_attr_check_mountpoint(sb, fattr);
@@ -479,10 +478,6 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr)
struct nfs_inode *nfsi = NFS_I(inode);
unsigned long now = jiffies;
- /* We set i_ino for the few things that still rely on it,
- * such as stat(2) */
- inode->i_ino = hash;
-
/* We can't support update_atime(), since the server will reset it */
inode->i_flags |= S_NOATIME|S_NOCMTIME;
inode->i_mode = fattr->mode;
@@ -1672,10 +1667,10 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
if (fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID)
return 0;
/* Has the inode gone and changed behind our back? */
- } else if (nfsi->fileid != fattr->fileid) {
+ } else if (inode->i_ino != fattr->fileid) {
/* Is this perhaps the mounted-on fileid? */
if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
- nfsi->fileid == fattr->mounted_on_fileid)
+ inode->i_ino == fattr->mounted_on_fileid)
return 0;
return -ESTALE;
}
@@ -2262,15 +2257,15 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
if (fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID)
return 0;
/* Has the inode gone and changed behind our back? */
- } else if (nfsi->fileid != fattr->fileid) {
+ } else if (inode->i_ino != fattr->fileid) {
/* Is this perhaps the mounted-on fileid? */
if ((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) &&
- nfsi->fileid == fattr->mounted_on_fileid)
+ inode->i_ino == fattr->mounted_on_fileid)
return 0;
printk(KERN_ERR "NFS: server %s error: fileid changed\n"
"fsid %s: expected fileid 0x%Lx, got 0x%Lx\n",
NFS_SERVER(inode)->nfs_client->cl_hostname,
- inode->i_sb->s_id, (long long)nfsi->fileid,
+ inode->i_sb->s_id, (long long)inode->i_ino,
(long long)fattr->fileid);
goto out_err;
}
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 4623262da3c0..8e48053b3069 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -396,12 +396,12 @@ static inline int NFS_STALE(const struct inode *inode)
static inline __u64 NFS_FILEID(const struct inode *inode)
{
- return NFS_I(inode)->fileid;
+ return inode->i_ino;
}
static inline void set_nfs_fileid(struct inode *inode, __u64 fileid)
{
- NFS_I(inode)->fileid = fileid;
+ inode->i_ino = fileid;
}
static inline void nfs_mark_for_revalidate(struct inode *inode)
--
2.54.0
^ permalink raw reply related
* [PATCH 0/4] nfs: remove the fileid field from struct nfs_inode
From: Jeff Layton @ 2026-05-12 16:12 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Jonathan Corbet, Shuah Khan
Cc: linux-nfs, linux-kernel, linux-doc, Jeff Layton
v7.1-rc1 contains patches to make inode->i_ino to be a u64. With this
change, there is no need to keep a separate "fileid" field in struct
nfs_inode.
This patchset eliminiates that field, and the inode number hashing
machinery that is no longer needed. This shaves 8 bytes off of each
nfs_inode.
Trond/Anna: please consider this for v7.2.
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Jeff Layton (4):
nfs: store the full NFS fileid in inode->i_ino
nfs: remove nfs_compat_user_ino64() and deprecate enable_ino64
nfs: replace NFS_FILEID() and nfsi->fileid with inode->i_ino
nfs: remove fileid field from struct nfs_inode
Documentation/admin-guide/kernel-parameters.txt | 7 --
fs/nfs/dir.c | 4 +-
fs/nfs/export.c | 6 +-
fs/nfs/filelayout/filelayout.c | 4 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +-
fs/nfs/inode.c | 87 +++++++++----------------
fs/nfs/nfs4proc.c | 4 +-
fs/nfs/nfs4trace.h | 79 ++++++++++------------
fs/nfs/nfstrace.h | 84 ++++++++++++------------
fs/nfs/pagelist.c | 2 +-
fs/nfs/pnfs.c | 2 +-
fs/nfs/unlink.c | 2 +-
fs/nfs/write.c | 2 +-
include/linux/nfs_fs.h | 25 -------
14 files changed, 123 insertions(+), 191 deletions(-)
---
base-commit: 5d6919055dec134de3c40167a490f33c74c12581
change-id: 20260512-nfsino-1f9a8ca2f3ed
Best regards,
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply
* Re: [PATCH v12 05/11] iio: core: add decimal value formatting into 64-bit value
From: Rodrigo Alencar @ 2026-05-12 16:09 UTC (permalink / raw)
To: Andy Shevchenko, rodrigo.alencar
Cc: linux-kernel, linux-iio, devicetree, linux-doc, Jonathan Cameron,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan
In-Reply-To: <agM6uzhdn7o8g9v5@ashevche-desk.local>
On 26/05/12 05:35PM, Andy Shevchenko wrote:
> On Sun, May 10, 2026 at 01:42:23PM +0100, Rodrigo Alencar via B4 Relay wrote:
>
> > Create new format types for iio values (IIO_VAL_DECIMAL64_*), which
> > defines the representation of fixed decimal point values into a single
> > 64-bit number. This new format increases the range of represented values,
> > allowing for integer parts greater than 2^32, as bits are not "wasted"
> > in the fractional part, which can be seen in IIO_VAL_INT_PLUS_MICRO and
> > IIO_VAL_INT_PLUS_NANO. Helpers are created to compose and decompose 64-bit
> > decimals into integer values used in IIO formatting interfaces, which
> > creates consistency and avoid error-prone manual assignments when using
> > wordpart macros. When doing the parsing, kstrtodec64() is used with the
> > scale defined by the specific decimal format type.
...
> > + tmp2 = div64_s64_rem(iio_val_s64_from_array(vals),
> > + int_pow(10, scale), &frac);
> > + if (tmp2 == 0 && frac < 0)
> > + return sysfs_emit_at(buf, offset, "-0.%0*lld", scale,
> > + abs(frac));
> > + else
> > + return sysfs_emit_at(buf, offset, "%lld.%0*lld", tmp2,
> > + scale, abs(frac));
> > + }
>
> What about
>
> /* Print a leading '-' for negative fractions */
> if (tmp2 == 0 && frac < 0)
> offset += sysfs_emit_at(buf, offset, "-");
>
> return sysfs_emit_at(buf, offset, "%lld.%0*lld", tmp2, scale, abs(frac));
>
> Also note this won't work with the frac that are == S64_MIN. It's UB (undefined
> behaviour), see the comment at abs() implementation. Maybe a time to add abs()
> corner case tests...
frac cannot be S64_MIN, it is always and remainder of a power of 10 modulus.
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* Re: [PATCH v4 0/5] Add OneXPlayer Configuration HID Driver
From: Jiri Kosina @ 2026-05-12 15:56 UTC (permalink / raw)
To: Derek J. Clark
Cc: Benjamin Tissoires, Pierre-Loup A . Griffais, Lambert Fan,
Zhouwang Huang, linux-input, linux-doc, linux-kernel
In-Reply-To: <20260419042624.625746-1-derekjohn.clark@gmail.com>
On Sat, 18 Apr 2026, Derek J. Clark wrote:
> Adds an HID driver for OneXPlayer HID configuration devices. There are
> currently 2 generations of OneXPlayer HID protocol. The first (OneXPlayer
> F1 series) only provides an RGB control interface over HID. The Second
> (X1 mini series, G1 series, AOKZOE A1X) also includes a hardware level
> button mapping interface, vibration intensity settings, and the ability
> to switch output between xinput and a debug mode that can be used to debug
> the button mapping. Some devices (G1 Series, APEX) use a hybrid of Gen1
> RGB control and Gen 2 controller settings. To ensure there is no conflicts
> when the driver is loaded, we skip creating the RGB interface for Gen 2
> devices if there is a DMI match.
>
> I'll also add a note that Gen 1 devices also have an interface for
> setting the key map and debug mode, but that is done entirely over a
> serial TTY device so it is not able to be added to this driver. There
> are also some "Gen 0" devices (OneXPlayer 2 Series) also use it, but
> the TTY interface also handles the RGB control so no support is
> provided by this driver for those interfaces.
>
> Signed-off-by: Derel J. Clark <derekjohn.clark@gmail.com>
> ---
> v4:
> - Make all delayed work part of drvdata & ensure they are canceled
> during remove.
>
> v3: https://lore.kernel.org/linux-input/20260412213444.2231505-1-derekjohn.clark@gmail.com/
> - Ensure default button map is properly init during probe.
>
> v2: https://lore.kernel.org/linux-input/20260407041354.2283201-1-derekjohn.clark@gmail.com/
> - Add DMI quirks for certain devices that ship with both GEN1 and GEN2
> MCU to avoid clashing when initializing the RGB interface.
> - Add left & right vibration intensity attributes.
> - Add additional mappings for keyboard inputs.
> - Add a delayed work trigger to re-apply settings after the MCU
> completes initializing after a suspend/resume cycle.
>
> v1: https://lore.kernel.org/linux-input/20260322031615.1524307-1-derekjohn.clark@gmail.com/
Now in hid.git#for-7.2/oxp, thanks.
--
Jiri Kosina
SUSE Labs
^ permalink raw reply
* Re: [PATCH v10 4/6] iio: adc: ad4691: add SPI offload support
From: Jonathan Cameron @ 2026-05-12 15:49 UTC (permalink / raw)
To: Radu Sabau via B4 Relay
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, David Lechner,
Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
Jonathan Corbet, Shuah Khan, linux-iio, devicetree, linux-kernel,
linux-pwm, linux-gpio, linux-doc
In-Reply-To: <20260511-ad4692-multichannel-sar-adc-driver-v10-4-e1fbb1744e38@analog.com>
On Mon, 11 May 2026 14:54:16 +0300
Radu Sabau via B4 Relay <devnull+radu.sabau.analog.com@kernel.org> wrote:
> From: Radu Sabau <radu.sabau@analog.com>
>
> Add SPI offload support to enable DMA-based, CPU-independent data
> acquisition using the SPI Engine offload framework.
>
> When an SPI offload is available (devm_spi_offload_get() succeeds),
> the driver registers a DMA engine IIO buffer and uses dedicated buffer
> setup operations. If no offload is available the existing software
> triggered buffer path is used unchanged.
>
> Both CNV Burst Mode and Manual Mode support offload, but use different
> trigger mechanisms:
>
> CNV Burst Mode: the SPI Engine is triggered by the ADC's DATA_READY
> signal on the GP pin specified by the trigger-source consumer reference
> in the device tree (one cell = GP pin number 0-3). For this mode the
> driver acts as both an SPI offload consumer (DMA RX stream, message
> optimization) and a trigger source provider: it registers the
> GP/DATA_READY output via devm_spi_offload_trigger_register() so the
> offload framework can match the '#trigger-source-cells' phandle and
> automatically fire the SPI Engine DMA transfer at end-of-conversion.
>
> Manual Mode: the SPI Engine is triggered by a periodic trigger at
> the configured sampling frequency. The pre-built SPI message uses
> the pipelined CNV-on-CS protocol: N+1 16-bit transfers are issued
> for N active channels (the first result is discarded as garbage from
> the pipeline flush) and the remaining N results are captured by DMA.
>
> All offload transfers use 16-bit frames (bits_per_word=16, len=2).
> The channel scan_type (storagebits=16, shift=0) is shared between the
> software triggered-buffer and offload paths; no separate scan_type or
> channel array is needed for the offload case at this stage. Oversampling
> support and mode-specific channel array distinctions are introduced in
> the following commit.
>
> IIO_BUFFER_DMAENGINE is selected because the offload path uses
> devm_iio_dmaengine_buffer_setup_with_handle() to allocate and
> attach the DMA RX buffer to the IIO device.
More stuff from sashiko - it also not noticed the BE marking is
in a later patch - but given you are doing 16bit words here likely
the chan spec needs to be different.
Hohum. It then seems to have timed out. Maybe that should be a new
rule - if your patch set is too big for sashiko it's too big and
complex for reviewers too ;)
>
> Signed-off-by: Radu Sabau <radu.sabau@analog.com>
I didn't really spot anything myself on this pass so just a question
of analysing the bot output and groaning at what I missed :(
Jonathan
^ permalink raw reply
* Re: [PATCH 2/3] mm/zswap: Implement proactive writeback
From: Nhat Pham @ 2026-05-12 15:47 UTC (permalink / raw)
To: Hao Jia
Cc: Yosry Ahmed, akpm, tj, hannes, shakeel.butt, mhocko, mkoutny,
chengming.zhou, muchun.song, roman.gushchin, cgroups, linux-mm,
linux-kernel, linux-doc, Hao Jia, Alexandre Ghiti
In-Reply-To: <12e4784e-2add-d849-7e54-bde8abfa6e78@gmail.com>
On Tue, May 12, 2026 at 2:32 AM Hao Jia <jiahao.kernel@gmail.com> wrote:
>
>
>
> On 2026/5/12 03:57, Yosry Ahmed wrote:
> > On Mon, May 11, 2026 at 12:49 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >>
> >> On Mon, May 11, 2026 at 3:52 AM Hao Jia <jiahao.kernel@gmail.com> wrote:
> >>>
> >>> From: Hao Jia <jiahao1@lixiang.com>
> >>>
> >>> Zswap currently writes back pages to backing swap devices reactively,
> >>> triggered either by memory pressure via the shrinker or by the pool
> >>> reaching its size limit. This reactive approach offers no precise
> >>> control over when writeback happens, which can disturb latency-sensitive
> >>> workloads, and it cannot direct writeback at a specific memory cgroup.
> >>> However, there are scenarios where users might want to proactively
> >>> write back cold pages from zswap to the backing swap device, for
> >>> example, to free up memory for other applications or to prepare for
> >>> upcoming memory-intensive workloads.
> >>>
> >>> Therefore, implement a proactive writeback mechanism for zswap by
> >>> adding a new cgroup interface file memory.zswap.proactive_writeback
> >>> within the memory controller.
> >>
>
> Thanks Nhat, Yosry — let me address both comments together.
>
> >>
> >> We already have memory.reclaim, no? Would that not work to create
> >> headroom generally for your use case? Is there a reason why we are
> >> treating zswap memory as special here?
> >
>
> Apologies for the lack of detailed explanation in the patch description,
> which led to the confusion.
>
> While we are already utilizing memory.reclaim, it does not fully address
> our requirements.
>
> Our deployment runs a userspace proactive reclaimer that drives
> memory.reclaim based on the system's runtime state (memory/CPU/IO
> pressure, refault rate, ...) and workload-specific
> policy. That first stage compresses cold anon pages into zswap. Entries
> that then remain in zswap past a policy-defined age threshold are
> considered "twice cold", and the reclaimer wants
> to write them back to the backing swap device at a moment of its own
> choosing, to further reclaim the DRAM still held by the compressed data.
>
> This is the "second-level offloading" pattern described in Meta's TMO
> paper [1]. zswap proactive writeback is what this series introduces to
> address that second-level offloading stage.
>
> [1] https://www.pdl.cmu.edu/ftp/NVM/tmo_asplos22.pdf
Yeah that's what we've been trying to work on as well :) We are
working on a couple of improvements to the mechanism side of this path
(cc Alex) - hopefully it will help your use case too!
Anyway, back to my original inquiry: I understand your use case. It's
pretty similar to our goal. What I'm not getting is why is
memory.reclaim (which you already use) not sufficient for zswap ->
disk swap offloading too?
Zswap objects are organized into LRU and exposed to the shrinker
interface. Echo-ing to memory.reclaim should also offload some zswap
entries, correct? Are there still cold zswap entries that escape this,
somehow?
Furthermore, we already have a way to detect the "twice cold" entries
you mentioned: the referenced bit. This is analogous to the way we
treat uncompressed pages.
>
>
> > +1, why do we need to specifically proactively reclaim the compressed memory?
> >
> > Also, if we do need to minimize the compressed memory and force higher
> > writeback rates, we can do so with memory.zswap.max, right?
>
> Here are a few reasons why memory.zswap.max is not enough:
>
> 1. Writing memory.zswap.max itself does not trigger any writeback
> immediately. For a memcg that has reached steady state (on which the
> userspace reclaimer is no longer invoking
> memory.reclaim), after enough time has passed, the reclaimer has no good
> way to trigger proactive writeback for second-level offloading by
> lowering memory.zswap.max, because in steady
> state nothing drives the zswap_store() -> shrink_memcg() path. The
> userspace reclaimer still has no control over when proactive writeback
> happens.
>
> 2. memory.zswap.max currently triggers zswap writeback via zswap_store()
> -> shrink_memcg(), and each over-limit event can write back at most
> NR_NODES entries. If zswap residency is far
> above memory.zswap.max, converging to the target size requires at least
> O(over-limit pages / NR_NODES) zswap_store() events, with no batching —
> proactive writeback therefore has
> significant latency.
>
> 3. memory.zswap.max is a stateful interface. If the userspace reclaimer
> crashes for any reason mid-operation, it may leave memory.zswap.max at
> some set value, putting the application in a
> persistently throttled bad state.
>
> 4. Once the userspace reclaimer has lowered memory.zswap.max, if the
> workload is rapidly expanding and triggers memory reclaim via
> memory.high / kswapd / etc., the actual amount written
> back can exceed what was intended.
One more reason: IIRC, when you set memory.zswap.max to a value other
than 0 max, every zswap store incurs a pretty expensive check
(obj_cgroup_may_zswap), which does a force flush
(__mem_cgroup_flush_stats). That was pretty expensive last time some
of our internal services played with it. So yeah, it's not ideal...
(if you're using this, might wanna profile this as well).
>
> Thanks,
> Hao
^ permalink raw reply
* Re: [PATCH] docs/zh_CN: update admin-guide/index.rst translation
From: Yan Zhu @ 2026-05-12 15:46 UTC (permalink / raw)
To: kernel test robot, corbet, alexs, si.yanteng, kees
Cc: oe-kbuild-all, skhan, dzm91, tony.luck, gpiccoli, frederic,
jani.nikula, longman, mchehab+huawei, linux-doc, linux-kernel
In-Reply-To: <202605111009.hlpiVkT6-lkp@intel.com>
Hi,
On 5/11/2026 4:39 PM, kernel test robot wrote:
> Hi Yan,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on lwn/docs-next]
> [also build test WARNING on linus/master v7.1-rc3 next-20260508]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Yan-Zhu/docs-zh_CN-update-admin-guide-index-rst-translation/20260511-102406
> base: git://git.lwn.net/linux.git docs-next
> patch link: https://lore.kernel.org/r/tencent_7ADF2D1EBD8EAD2028BC93BA7858EA655D0A%40qq.com
> patch subject: [PATCH] docs/zh_CN: update admin-guide/index.rst translation
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
> reproduce: (https://download.01.org/0day-ci/archive/20260511/202605111009.hlpiVkT6-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202605111009.hlpiVkT6-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> Checksumming on output with GSO
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [docutils]
> MAINTAINERS:40: WARNING: Inline strong start-string without end-string. [docutils]
>>> Documentation/translations/zh_CN/admin-guide/index.rst:114: WARNING: toctree contains reference to nonexisting document 'translations/zh_CN/admin-guide/module-signing' [toc.not_readable]
> Documentation/userspace-api/landlock:504: ./security/landlock/errata/abi-4.h:5: ERROR: Unexpected section title.
>
This is a false alarm. I generated a patch based on the following
warehouse. It exists in the module-signing document:
base:git://git.kernel.org/pub/scm/linux/kernel/git/alexs/linux.git
docs-next
Thanks for your attention.
Yan Zhu
>
> vim +114 Documentation/translations/zh_CN/admin-guide/index.rst
>
> 113
> > 114 .. toctree::
> 115 :maxdepth: 1
> 116
> 117 cpu-load
> 118 mm/index
> 119 module-signing
> 120 numastat
> 121
> 122
> 123 Todolist:
> 124
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
--
Yan Zhu
^ permalink raw reply
* Re: [PATCH v10 3/6] iio: adc: ad4691: add triggered buffer support
From: Jonathan Cameron @ 2026-05-12 15:45 UTC (permalink / raw)
To: Radu Sabau via B4 Relay
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, David Lechner,
Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
Jonathan Corbet, Shuah Khan, linux-iio, devicetree, linux-kernel,
linux-pwm, linux-gpio, linux-doc
In-Reply-To: <20260511-ad4692-multichannel-sar-adc-driver-v10-3-e1fbb1744e38@analog.com>
On Mon, 11 May 2026 14:54:15 +0300
Radu Sabau via B4 Relay <devnull+radu.sabau.analog.com@kernel.org> wrote:
> From: Radu Sabau <radu.sabau@analog.com>
>
> Add buffered capture support using the IIO triggered buffer framework.
>
> CNV Burst Mode: the GP pin identified by interrupt-names in the device
> tree is configured as DATA_READY output. The IRQ handler stops
> conversions and fires the IIO trigger; the trigger handler executes a
> pre-built SPI message that reads all active channels from the AVG_IN
> accumulator registers and then resets accumulator state and restarts
> conversions for the next cycle.
>
> Manual Mode: CNV is tied to SPI CS so each transfer simultaneously
> reads the previous result and starts the next conversion (pipelined
> N+1 scheme). At preenable time a pre-built, optimised SPI message of
> N+1 transfers is constructed (N channel reads plus one NOOP to drain
> the pipeline). The trigger handler executes the message in a single
> spi_sync() call and collects the results. An external trigger (e.g.
> iio-trig-hrtimer) is required to drive the trigger at the desired
> sample rate.
>
> Both modes share the same trigger handler and push a complete scan —
> one u16 slot per active channel, densely paccked in scan_index order,
> followed by a timestamp.
>
> The CNV Burst Mode sampling frequency (PWM period) is exposed as a
> buffer-level attribute via IIO_DEVICE_ATTR.
>
> Signed-off-by: Radu Sabau <radu.sabau@analog.com>
Sashiko pointed out you have a buffer that is big endian but
chan_spec doesn't reflect that. That should have generated obvious
garbage output (unless you are actually testing on a be machine!)
Various other things came up, some of which I thought were in previous
reviews - but maybe I'm confusing drivers.
Thanks
Jonathan
> @@ -204,7 +230,14 @@ static const struct ad4691_chip_info ad4694_chip_info = {
> struct ad4691_state {
> const struct ad4691_chip_info *info;
> struct regmap *regmap;
> + struct spi_device *spi;
> +
> + struct pwm_device *conv_trigger;
> + int irq;
> int vref_uV;
> + u32 cnv_period_ns;
> +
> + bool manual_mode;
> bool refbuf_en;
> bool ldo_en;
> /*
> @@ -212,8 +245,56 @@ struct ad4691_state {
> * atomicity of consecutive SPI operations.
> */
> struct mutex lock;
> + /*
> + * Per-buffer-enable lifetime resources:
> + * Manual Mode - a pre-built SPI message that clocks out N+1
> + * transfers in one go.
> + * CNV Burst Mode - a pre-built SPI message that clocks out 2*N
> + * transfers in one go.
> + */
> + struct spi_message scan_msg;
> + /*
> + * max 16 + 1 NOOP (manual) or 2*16 + 1 state-reset (CNV burst).
> + */
> + struct spi_transfer scan_xfers[34];
> + /*
> + * CNV burst: 16 AVG_IN addresses = 16. Manual: 16 channel cmds +
> + * 1 NOOP = 17. Stored as native u16; put_unaligned_be16() fills each
> + * slot so the SPI controller (bits_per_word=8) sends bytes MSB-first.
> + */
> + u16 scan_tx[17] __aligned(IIO_DMA_MINALIGN);
> + /*
> + * CNV burst state-reset: 4-byte write [addr_hi, addr_lo,
> + * STATE_RESET_ALL, OSC_EN=1]. CS is asserted throughout, so
> + * ADDR_DESCENDING writes byte[3]=1 to OSC_EN_REG (0x180) as a
> + * deliberate side-write, keeping the oscillator enabled. Shared
> + * with the offload path (mutually exclusive at probe).
> + */
> + u8 scan_tx_reset[4] __aligned(IIO_DMA_MINALIGN);
> + /*
> + * Scan buffer: one BE16 slot per active channel, plus timestamp.
> + * DMA-aligned because scan_xfers point rx_buf directly into vals[].
> + */
> + IIO_DECLARE_DMA_BUFFER_WITH_TS(__be16, vals, 16);
As per comments elsewhere: If it is big endian why is the channel not marked
as such in the iio_chan_spec?
> };
> +
> +static int ad4691_cnv_burst_buffer_preenable(struct iio_dev *indio_dev)
> +{
> + struct ad4691_state *st = iio_priv(indio_dev);
> + unsigned int acc_mask;
> + unsigned int k, i;
> + int ret;
> +
> + memset(st->scan_xfers, 0, sizeof(st->scan_xfers));
> + memset(st->scan_tx, 0, sizeof(st->scan_tx));
> +
> + spi_message_init(&st->scan_msg);
> +
> + /*
> + * Each AVG_IN read needs two transfers: a 2-byte address write phase
> + * followed by a 2-byte data read phase. CS toggles between channels
> + * (cs_change=1 on the read phase of all but the last channel).
> + */
> + k = 0;
> + iio_for_each_active_channel(indio_dev, i) {
> + put_unaligned_be16(0x8000 | AD4691_AVG_IN(i), &st->scan_tx[k]);
> + st->scan_xfers[2 * k].tx_buf = &st->scan_tx[k];
> + st->scan_xfers[2 * k].len = sizeof(st->scan_tx[k]);
> + spi_message_add_tail(&st->scan_xfers[2 * k], &st->scan_msg);
> + st->scan_xfers[2 * k + 1].rx_buf = &st->vals[k];
> + st->scan_xfers[2 * k + 1].len = sizeof(st->scan_tx[k]);
> + st->scan_xfers[2 * k + 1].cs_change = 1;
> + spi_message_add_tail(&st->scan_xfers[2 * k + 1], &st->scan_msg);
> + k++;
> + }
> +
> + /*
> + * Append a 4-byte state-reset transfer [addr_hi, addr_lo,
> + * STATE_RESET_ALL, OSC_EN=1]. CS is asserted throughout, so
> + * ADDR_DESCENDING writes byte[3]=1 to OSC_EN_REG (0x180) as a
> + * deliberate side-write, keeping the oscillator enabled.
> + */
> + put_unaligned_be16(AD4691_STATE_RESET_REG, st->scan_tx_reset);
> + st->scan_tx_reset[2] = AD4691_STATE_RESET_ALL;
> + st->scan_tx_reset[3] = 1;
> + st->scan_xfers[2 * k].tx_buf = st->scan_tx_reset;
> + st->scan_xfers[2 * k].len = sizeof(st->scan_tx_reset);
> + st->scan_xfers[2 * k].cs_change = 1;
Our old friend - cs_change = 1 is very rarely the right thing to do on a
final message. I thought this came up in an earlier version.
> + spi_message_add_tail(&st->scan_xfers[2 * k], &st->scan_msg);
> +
> + ret = spi_optimize_message(st->spi, &st->scan_msg);
> + if (ret)
> + return ret;
> +
> + ret = regmap_write(st->regmap, AD4691_STD_SEQ_CONFIG,
> + bitmap_read(indio_dev->active_scan_mask, 0,
> + iio_get_masklength(indio_dev)));
> + if (ret)
> + goto err_unoptimize;
> +
> + acc_mask = ~bitmap_read(indio_dev->active_scan_mask, 0,
> + iio_get_masklength(indio_dev)) & GENMASK(15, 0);
> + ret = regmap_write(st->regmap, AD4691_ACC_MASK_REG, acc_mask);
> + if (ret)
> + goto err_unoptimize;
> +
> + ret = ad4691_enter_conversion_mode(st);
> + if (ret)
> + goto err_unoptimize;
> +
> + return 0;
> +
> +err_unoptimize:
> + spi_unoptimize_message(&st->scan_msg);
> + return ret;
> +}
> +
> +static const struct iio_trigger_ops ad4691_trigger_ops = {
> + .validate_device = iio_trigger_validate_own_device,
Same thing about manual mode as mentioned below.
> +};
> +
> +static irqreturn_t ad4691_trigger_handler(int irq, void *p)
> +{
> + struct iio_poll_func *pf = p;
> + struct iio_dev *indio_dev = pf->indio_dev;
> + struct ad4691_state *st = iio_priv(indio_dev);
> +
> + ad4691_read_scan(indio_dev, pf->timestamp);
> + if (!st->manual_mode)
> + enable_irq(st->irq);
Maybe it was a different driver but I thought I commented on this before.
There are a bunch of races if you reenable this here - needs to be
in the trigger reenable callback.
(Sashiko is pointing this out as well with more detail on what those
races are) The short story is that you can race and have a trigger between
the enable and the notify_done which will be dropped on the floor meaning
we never get in here again - IIRC there is (rather convoluted) code to handle that
corner case in via the reenable callback and a work item.
> + iio_trigger_notify_done(indio_dev->trig);
> + return IRQ_HANDLED;
> +}
> +
> static const struct iio_info ad4691_info = {
> .read_raw = &ad4691_read_raw,
> .write_raw = &ad4691_write_raw,
> .read_avail = &ad4691_read_avail,
> .debugfs_reg_access = &ad4691_reg_access,
> + .validate_trigger = iio_validate_own_trigger,
Sashiko noted that this seems odd. If we only support our own
trigger how does manual mode work given it isn't registering one?
You will need to do something more clever to handle that.
Either pick between info structures or use custom validation callbacks.
> };
...
> @@ -651,6 +1119,75 @@ static int ad4691_config(struct ad4691_state *st)
> return 0;
> }
>
> +static int ad4691_setup_triggered_buffer(struct iio_dev *indio_dev,
> + struct ad4691_state *st)
> +{
> + struct device *dev = regmap_get_device(st->regmap);
> + struct iio_trigger *trig;
> + unsigned int i;
> + int irq, ret;
> +
> + trig = devm_iio_trigger_alloc(dev, "%s-dev%d", indio_dev->name,
> + iio_device_id(indio_dev));
> + if (!trig)
> + return -ENOMEM;
> +
> + trig->ops = &ad4691_trigger_ops;
> + iio_trigger_set_drvdata(trig, st);
> +
> + ret = devm_iio_trigger_register(dev, trig);
> + if (ret)
> + return dev_err_probe(dev, ret, "IIO trigger register failed\n");
> +
> + indio_dev->trig = iio_trigger_get(trig);
Sashiko correctly points out the general bug we have right now about leaking
trigger context on any error.
Need to solve that for all drivers though so don't worry about it here.
It did give more complex reasoning than normal however!
> +
> + if (st->manual_mode)
> + return devm_iio_triggered_buffer_setup(dev, indio_dev,
> + &iio_pollfunc_store_time,
> + &ad4691_trigger_handler,
> + &ad4691_manual_buffer_setup_ops);
> +
> + /*
> + * The GP pin named in interrupt-names asserts at end-of-conversion.
> + * The IRQ handler stops conversions and fires the IIO trigger so
> + * the trigger handler can read and push the sample to the buffer.
> + * The IRQ is kept disabled until the buffer is enabled.
> + */
> + irq = -ENXIO;
> + for (i = 0; i < ARRAY_SIZE(ad4691_gp_names); i++) {
> + irq = fwnode_irq_get_byname(dev_fwnode(dev),
> + ad4691_gp_names[i]);
> + if (irq > 0 || irq == -EPROBE_DEFER)
> + break;
> + }
> + if (irq < 0)
> + return dev_err_probe(dev, irq, "failed to get GP interrupt\n");
> +
> + st->irq = irq;
> +
> + ret = ad4691_gpio_setup(st, i);
> + if (ret)
> + return ret;
> +
> + /*
> + * IRQ is kept disabled until the buffer is enabled to prevent
> + * spurious DATA_READY events before the SPI message is set up.
> + */
> + ret = devm_request_threaded_irq(dev, irq, NULL,
> + &ad4691_irq,
> + IRQF_ONESHOT | IRQF_NO_AUTOEN,
> + indio_dev->name, indio_dev);
> + if (ret)
> + return ret;
> +
> + return devm_iio_triggered_buffer_setup_ext(dev, indio_dev,
> + &iio_pollfunc_store_time,
> + &ad4691_trigger_handler,
> + IIO_BUFFER_DIRECTION_IN,
> + &ad4691_cnv_burst_buffer_setup_ops,
> + ad4691_buffer_attrs);
> +}
> +
> static int ad4691_probe(struct spi_device *spi)
> {
> struct device *dev = &spi->dev;
> @@ -663,6 +1200,7 @@ static int ad4691_probe(struct spi_device *spi)
> return -ENOMEM;
>
> st = iio_priv(indio_dev);
> + st->spi = spi;
> st->info = spi_get_device_match_data(spi);
> if (!st->info)
> return -ENODEV;
> @@ -692,8 +1230,9 @@ static int ad4691_probe(struct spi_device *spi)
> indio_dev->info = &ad4691_info;
> indio_dev->modes = INDIO_DIRECT_MODE;
>
> - indio_dev->channels = st->info->sw_info->channels;
> - indio_dev->num_channels = st->info->sw_info->num_channels;
You've lost me here. Where are these now set?
> + ret = ad4691_setup_triggered_buffer(indio_dev, st);
> + if (ret)
> + return ret;
>
> return devm_iio_device_register(dev, indio_dev);
> }
>
^ permalink raw reply
* Re: [PATCH mm-unstable v17 11/14] mm/khugepaged: Introduce mTHP collapse support
From: Wei Yang @ 2026-05-12 15:44 UTC (permalink / raw)
To: Nico Pache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
zokeefe
In-Reply-To: <20260511185817.686831-12-npache@redhat.com>
On Mon, May 11, 2026 at 12:58:11PM -0600, Nico Pache wrote:
>Enable khugepaged to collapse to mTHP orders. This patch implements the
>main scanning logic using a bitmap to track occupied pages and a stack
>structure that allows us to find optimal collapse sizes.
>
>Previous to this patch, PMD collapse had 3 main phases, a light weight
>scanning phase (mmap_read_lock) that determines a potential PMD
>collapse, an alloc phase (mmap unlocked), then finally heavier collapse
>phase (mmap_write_lock).
>
>To enabled mTHP collapse we make the following changes:
>
>During PMD scan phase, track occupied pages in a bitmap. When mTHP
>orders are enabled, we remove the restriction of max_ptes_none during the
>scan phase to avoid missing potential mTHP collapse candidates. Once we
>have scanned the full PMD range and updated the bitmap to track occupied
>pages, we use the bitmap to find the optimal mTHP size.
>
>Implement collapse_scan_bitmap() to perform binary recursion on the bitmap
>and determine the best eligible order for the collapse. A stack structure
>is used instead of traditional recursion to manage the search. This also
>prevents a traditional recursive approach when the kernel stack struct is
>limited. The algorithm recursively splits the bitmap into smaller chunks to
>find the highest order mTHPs that satisfy the collapse criteria. We start
>by attempting the PMD order, then moved on the consecutively lower orders
>(mTHP collapse). The stack maintains a pair of variables (offset, order),
>indicating the number of PTEs from the start of the PMD, and the order of
>the potential collapse candidate.
>
>The algorithm for consuming the bitmap works as such:
> 1) push (0, HPAGE_PMD_ORDER) onto the stack
> 2) pop the stack
> 3) check if the number of set bits in that (offset,order) pair
> statisfy the max_ptes_none threshold for that order
> 4) if yes, attempt collapse
> 5) if no (or collapse fails), push two new stack items representing
> the left and right halves of the current bitmap range, at the
> next lower order
> 6) repeat at step (2) until stack is empty.
>
>Below is a diagram representing the algorithm and stack items:
>
> offset mid_offset
> | |
> | |
> v v
> ____________________________________
> | PTE Page Table |
> --------------------------------------
> <-------><------->
> order-1 order-1
>
>mTHP collapses reject regions containing swapped out or shared pages.
>This is because adding new entries can lead to new none pages, and these
>may lead to constant promotion into a higher order mTHP. A similar
>issue can occur with "max_ptes_none > HPAGE_PMD_NR/2" due to a collapse
>introducing at least 2x the number of pages, and on a future scan will
>satisfy the promotion condition once again. This issue is prevented via
>the collapse_max_ptes_none() function which imposes the max_ptes_none
>restrictions above.
>
>We currently only support mTHP collapse for max_ptes_none values of 0
>and HPAGE_PMD_NR - 1. resulting in the following behavior:
>
> - max_ptes_none=0: Never introduce new empty pages during collapse
> - max_ptes_none=HPAGE_PMD_NR-1: Always try collapse to the highest
> available mTHP order
>
>Any other max_ptes_none value will emit a warning and skip mTHP collapse
>attempts. There should be no behavior change for PMD collapse.
>
>Once we determine what mTHP sizes fits best in that PMD range a collapse
>is attempted. A minimum collapse order of 2 is used as this is the lowest
>order supported by anon memory as defined by THP_ORDERS_ALL_ANON.
>
>Currently madv_collapse is not supported and will only attempt PMD
>collapse.
>
>We can also remove the check for is_khugepaged inside the PMD scan as
>the collapse_max_ptes_none() function handles this logic now.
>
>Signed-off-by: Nico Pache <npache@redhat.com>
[...]
>+static int mthp_collapse(struct mm_struct *mm, unsigned long address,
>+ int referenced, int unmapped, struct collapse_control *cc,
>+ unsigned long enabled_orders)
>+{
>+ unsigned int nr_occupied_ptes, nr_ptes;
>+ int max_ptes_none, collapsed = 0, stack_size = 0;
>+ unsigned long collapse_address;
>+ struct mthp_range range;
>+ u16 offset;
>+ u8 order;
>+
>+ collapse_mthp_stack_push(cc, &stack_size, 0, HPAGE_PMD_ORDER);
>+
>+ while (stack_size) {
>+ range = collapse_mthp_stack_pop(cc, &stack_size);
>+ order = range.order;
>+ offset = range.offset;
>+ nr_ptes = 1UL << order;
>+
>+ if (!test_bit(order, &enabled_orders))
>+ goto next_order;
>+
>+ max_ptes_none = collapse_max_ptes_none(cc, NULL, order);
I am thinking whether there is a behavioral change for userfaultfd_armed(vma).
collapse_single_pmd()
collapse_scan_pmd
max_ptes_none = collapse_max_ptes_none(cc, vma)
max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT --- (1)
mthp_collapse
max_ptes_none = collapse_max_ptes_none(cc, NULL) --- (2)
collapse_huge_page(mm)
hugepage_vma_revalidate(&vma)
__collapse_huge_page_isolate(vma)
max_ptes_none = collapse_max_ptes_none(cc, vma)
Before mthp_collapse() introduced, userfaultfd_armed(vma) is skipped if there
is any pte_none_or_zero() in collapse_scan_pmd().
But now, max_ptes_none could be set to KHUGEPAGED_MAX_PTES_LIMIT at (1), so
that we can scan all the pte to get the bitmap. This means
userfaultfd_armed(vma) could continue even with pte_none_or_zero().
Then in mthp_collapse(), collapse_max_ptes_none() at (2) ignores
userfaultfd_armed(vma), which means it will continue to collapse a
userfaultfd_armed(vma) when there is pte_none_or_zero().
The good news is we will stop at __collapse_huge_page_isolate(), where we
get collapse_max_ptes_none() with vma. But we already did a lot of work.
Not sure if I missed something.
>+
>+ if (max_ptes_none < 0)
>+ return collapsed;
>+
>+ nr_occupied_ptes = collapse_mthp_count_present(cc, offset,
>+ nr_ptes);
>+
>+ if (nr_occupied_ptes >= nr_ptes - max_ptes_none) {
>+ int ret;
>+
>+ collapse_address = address + offset * PAGE_SIZE;
>+ ret = collapse_huge_page(mm, collapse_address, referenced,
>+ unmapped, cc, order);
>+ if (ret == SCAN_SUCCEED) {
>+ collapsed += nr_ptes;
>+ continue;
>+ }
>+ }
>+
>+next_order:
>+ if (order > KHUGEPAGED_MIN_MTHP_ORDER) {
>+ const u8 next_order = order - 1;
>+ const u16 mid_offset = offset + (nr_ptes / 2);
>+
>+ collapse_mthp_stack_push(cc, &stack_size, mid_offset,
>+ next_order);
>+ collapse_mthp_stack_push(cc, &stack_size, offset,
>+ next_order);
>+ }
>+ }
>+ return collapsed;
>+}
>+
> static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> struct vm_area_struct *vma, unsigned long start_addr,
> bool *lock_dropped, struct collapse_control *cc)
> {
>- const int max_ptes_none = collapse_max_ptes_none(cc, vma, HPAGE_PMD_ORDER);
>+ int max_ptes_none = collapse_max_ptes_none(cc, vma, HPAGE_PMD_ORDER);
> const unsigned int max_ptes_shared = collapse_max_ptes_shared(cc, HPAGE_PMD_ORDER);
> const unsigned int max_ptes_swap = collapse_max_ptes_swap(cc, HPAGE_PMD_ORDER);
>+ enum tva_type tva_flags = cc->is_khugepaged ? TVA_KHUGEPAGED : TVA_FORCED_COLLAPSE;
> pmd_t *pmd;
>- pte_t *pte, *_pte;
>- int none_or_zero = 0, shared = 0, referenced = 0;
>+ pte_t *pte, *_pte, pteval;
>+ int i;
>+ int none_or_zero = 0, shared = 0, nr_collapsed = 0, referenced = 0;
> enum scan_result result = SCAN_FAIL;
> struct page *page = NULL;
> struct folio *folio = NULL;
> unsigned long addr;
>+ unsigned long enabled_orders;
> spinlock_t *ptl;
> int node = NUMA_NO_NODE, unmapped = 0;
>
>@@ -1429,8 +1579,19 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> goto out;
> }
>
>+ bitmap_zero(cc->mthp_bitmap, MAX_PTRS_PER_PTE);
> memset(cc->node_load, 0, sizeof(cc->node_load));
> nodes_clear(cc->alloc_nmask);
>+
>+ enabled_orders = collapse_allowable_orders(vma, vma->vm_flags, tva_flags);
Would it be 0 at this point?
>+
>+ /*
>+ * If PMD is the only enabled order, enforce max_ptes_none, otherwise
>+ * scan all pages to populate the bitmap for mTHP collapse.
>+ */
>+ if (enabled_orders != BIT(HPAGE_PMD_ORDER))
>+ max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT;
>+
> pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
> if (!pte) {
> cc->progress++;
>@@ -1438,11 +1599,13 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> goto out;
> }
>
>- for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>- _pte++, addr += PAGE_SIZE) {
>+ for (i = 0; i < HPAGE_PMD_NR; i++) {
>+ _pte = pte + i;
>+ addr = start_addr + i * PAGE_SIZE;
>+ pteval = ptep_get(_pte);
>+
> cc->progress++;
>
>- pte_t pteval = ptep_get(_pte);
> if (pte_none_or_zero(pteval)) {
> if (++none_or_zero > max_ptes_none) {
> result = SCAN_EXCEED_NONE_PTE;
>@@ -1522,6 +1685,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> }
> }
>
>+ /* Set bit for occupied pages */
>+ __set_bit(i, cc->mthp_bitmap);
> /*
> * Record which node the original page is from and save this
> * information to cc->node_load[].
>@@ -1580,10 +1745,11 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> if (result == SCAN_SUCCEED) {
> /* collapse_huge_page expects the lock to be dropped before calling */
> mmap_read_unlock(mm);
>- result = collapse_huge_page(mm, start_addr, referenced,
>- unmapped, cc, HPAGE_PMD_ORDER);
>+ nr_collapsed = mthp_collapse(mm, start_addr, referenced, unmapped,
>+ cc, enabled_orders);
> /* collapse_huge_page will return with the mmap_lock released */
collapse_huge_page will return with mmap_lock released, but mthp_collapse()
may not?
> *lock_dropped = true;
>+ result = nr_collapsed ? SCAN_SUCCEED : SCAN_FAIL;
> }
> out:
> trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
>--
>2.54.0
--
Wei Yang
Help you, Help me
^ permalink raw reply
* Re: [PATCH v13 3/4] gpio: rpmsg: add generic rpmsg GPIO driver
From: Mathieu Poirier @ 2026-05-12 15:41 UTC (permalink / raw)
To: tanmay.shah
Cc: Arnaud POULIQUEN, Beleswar Prasad Padhi, Shenwei Wang,
Andrew Lunn, Linus Walleij, Bartosz Golaszewski, Jonathan Corbet,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson,
Frank Li, Sascha Hauer, Shuah Khan, linux-gpio@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Pengutronix Kernel Team, Fabio Estevam, Peng Fan,
devicetree@vger.kernel.org, linux-remoteproc@vger.kernel.org,
imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
dl-linux-imx, Bartosz Golaszewski
In-Reply-To: <ae5cfa0f-e929-4abb-b27b-8eb27586d183@amd.com>
On Mon, May 11, 2026 at 04:35:46PM -0500, Shah, Tanmay wrote:
>
>
> On 5/11/2026 12:58 PM, Mathieu Poirier wrote:
> > On Mon, 11 May 2026 at 10:47, Shah, Tanmay <tanmays@amd.com> wrote:
> >>
> >>
> >>
> >> On 5/5/2026 10:52 AM, Shah, Tanmay wrote:
> >>>
> >>>
> >>> On 5/5/2026 4:28 AM, Arnaud POULIQUEN wrote:
> >>>> Hi Tanmay,
> >>>>
> >>>> On 5/4/26 21:19, Shah, Tanmay wrote:
> >>>>>
> >>>>> Hello all,
> >>>>>
> >>>>> I have started reviewing this work as well.
> >>>>> Thanks Shenwei for this work.
> >>>>>
> >>>>> I have gone through only the current revision, and would like to provide
> >>>>> idea on how to achieve GPIO number multiplexing with the RPMsg protocol.
> >>>>> Also, have some bindings related question.
> >>>>>
> >>>>> Please see below:
> >>>>>
> >>>>> On 4/30/2026 11:40 AM, Arnaud POULIQUEN wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 4/30/26 14:56, Beleswar Prasad Padhi wrote:
> >>>>>>> Hello Arnaud,
> >>>>>>>
> >>>>>>> On 30/04/26 13:05, Arnaud POULIQUEN wrote:
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> On 4/29/26 21:20, Mathieu Poirier wrote:
> >>>>>>>>> On Wed, 29 Apr 2026 at 12:07, Padhi, Beleswar <b-padhi@ti.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Mathieu,
> >>>>>>>>>>
> >>>>>>>>>> On 4/29/2026 11:03 PM, Mathieu Poirier wrote:
> >>>>>>>>>>> On Wed, 29 Apr 2026 at 10:53, Shenwei Wang <shenwei.wang@nxp.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Mathieu Poirier <mathieu.poirier@linaro.org>
> >>>>>>>>>>>>> Sent: Wednesday, April 29, 2026 10:42 AM
> >>>>>>>>>>>>> To: Shenwei Wang <shenwei.wang@nxp.com>
> >>>>>>>>>>>>> Cc: Andrew Lunn <andrew@lunn.ch>; Padhi, Beleswar <b-
> >>>>>>>>>>>>> padhi@ti.com>; Linus
> >>>>>>>>>>>>> Walleij <linusw@kernel.org>; Bartosz Golaszewski
> >>>>>>>>>>>>> <brgl@kernel.org>; Jonathan
> >>>>>>>>>>>>> Corbet <corbet@lwn.net>; Rob Herring <robh@kernel.org>;
> >>>>>>>>>>>>> Krzysztof Kozlowski
> >>>>>>>>>>>>> <krzk+dt@kernel.org>; Conor Dooley <conor+dt@kernel.org>; Bjorn
> >>>>>>>>>>>>> Andersson
> >>>>>>>>>>>>> <andersson@kernel.org>; Frank Li <frank.li@nxp.com>; Sascha Hauer
> >>>>>>>>>>>>> <s.hauer@pengutronix.de>; Shuah Khan
> >>>>>>>>>>>>> <skhan@linuxfoundation.org>; linux-
> >>>>>>>>>>>>> gpio@vger.kernel.org; linux-doc@vger.kernel.org; linux-
> >>>>>>>>>>>>> kernel@vger.kernel.org;
> >>>>>>>>>>>>> Pengutronix Kernel Team <kernel@pengutronix.de>; Fabio Estevam
> >>>>>>>>>>>>> <festevam@gmail.com>; Peng Fan <peng.fan@nxp.com>;
> >>>>>>>>>>>>> devicetree@vger.kernel.org; linux-remoteproc@vger.kernel.org;
> >>>>>>>>>>>>> imx@lists.linux.dev; linux-arm-kernel@lists.infradead.org; dl-
> >>>>>>>>>>>>> linux-imx <linux-
> >>>>>>>>>>>>> imx@nxp.com>; Bartosz Golaszewski <brgl@bgdev.pl>
> >>>>>>>>>>>>> Subject: [EXT] Re: [PATCH v13 3/4] gpio: rpmsg: add generic
> >>>>>>>>>>>>> rpmsg GPIO driver
> >>>>>>>>>>>>> On Tue, Apr 28, 2026 at 03:24:59PM +0000, Shenwei Wang wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>> From: Andrew Lunn <andrew@lunn.ch>
> >>>>>>>>>>>>>>> Sent: Monday, April 27, 2026 3:49 PM
> >>>>>>>>>>>>>>> To: Shenwei Wang <shenwei.wang@nxp.com>
> >>>>>>>>>>>>>>> Cc: Padhi, Beleswar <b-padhi@ti.com>; Linus Walleij
> >>>>>>>>>>>>>>> <linusw@kernel.org>; Bartosz Golaszewski <brgl@kernel.org>;
> >>>>>>>>>>>>>>> Jonathan
> >>>>>>>>>>>>>>> Corbet <corbet@lwn.net>; Rob Herring <robh@kernel.org>;
> >>>>>>>>>>>>>>> Krzysztof
> >>>>>>>>>>>>>>> Kozlowski <krzk+dt@kernel.org>; Conor Dooley
> >>>>>>>>>>>>>>> <conor+dt@kernel.org>;
> >>>>>>>>>>>>>>> Bjorn Andersson <andersson@kernel.org>; Mathieu Poirier
> >>>>>>>>>>>>>>> <mathieu.poirier@linaro.org>; Frank Li <frank.li@nxp.com>;
> >>>>>>>>>>>>>>> Sascha
> >>>>>>>>>>>>>>> Hauer <s.hauer@pengutronix.de>; Shuah Khan
> >>>>>>>>>>>>>>> <skhan@linuxfoundation.org>; linux-gpio@vger.kernel.org; linux-
> >>>>>>>>>>>>>>> doc@vger.kernel.org; linux-kernel@vger.kernel.org; Pengutronix
> >>>>>>>>>>>>>>> Kernel Team <kernel@pengutronix.de>; Fabio Estevam
> >>>>>>>>>>>>>>> <festevam@gmail.com>; Peng Fan <peng.fan@nxp.com>;
> >>>>>>>>>>>>>>> devicetree@vger.kernel.org; linux- remoteproc@vger.kernel.org;
> >>>>>>>>>>>>>>> imx@lists.linux.dev; linux-arm- kernel@lists.infradead.org;
> >>>>>>>>>>>>>>> dl-linux-imx <linux-imx@nxp.com>; Bartosz Golaszewski
> >>>>>>>>>>>>>>> <brgl@bgdev.pl>
> >>>>>>>>>>>>>>> Subject: [EXT] Re: [PATCH v13 3/4] gpio: rpmsg: add generic
> >>>>>>>>>>>>>>> rpmsg
> >>>>>>>>>>>>>>> GPIO driver
> >>>>>>>>>>>>>>>>> struct virtio_gpio_response {
> >>>>>>>>>>>>>>>>> __u8 status;
> >>>>>>>>>>>>>>>>> __u8 value;
> >>>>>>>>>>>>>>>>> };
> >>>>>>>>>>>>>>>> It is the same message format. Please see the message
> >>>>>>>>>>>>>>>> definition
> >>>>>>>>>>>>>>> (GET_DIRECTION) below:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> + +-----+-----+-----+-----+-----+----+
> >>>>>>>>>>>>>>>> + |0x00 |0x01 |0x02 |0x03 |0x04 |0x05|
> >>>>>>>>>>>>>>>> + | 1 | 2 |port |line | err | dir|
> >>>>>>>>>>>>>>>> + +-----+-----+-----+-----+-----+----+
> >>>>>>>>>>>>>>> Sorry, but i don't see how two u8 vs six u8 are the same
> >>>>>>>>>>>>>>> message format.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Some changes to the message format are necessary.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Virtio uses two communication channels (virtqueues): one for
> >>>>>>>>>>>>>> requests and
> >>>>>>>>>>>>> replies, and a second one for events.
> >>>>>>>>>>>>>> In contrast, rpmsg provides only a single communication
> >>>>>>>>>>>>>> channel, so a
> >>>>>>>>>>>>>> type field is required to distinguish between different kinds
> >>>>>>>>>>>>>> of messages.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Since rpmsg replies and events share the same message format,
> >>>>>>>>>>>>>> an additional
> >>>>>>>>>>>>> line is introduced to handle both cases.
> >>>>>>>>>>>>>> Finally, rpmsg supports multiple GPIO controllers, so a port
> >>>>>>>>>>>>>> field is added to
> >>>>>>>>>>>>> uniquely identify the target controller.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have commented on this before - RPMSG is already providing
> >>>>>>>>>>>>> multiplexing
> >>>>>>>>>>>>> capability by way of endpoints. There is no need for a port
> >>>>>>>>>>>>> field. One endpoint,
> >>>>>>>>>>>>> one GPIO controller.
> >>>>>>>>>>>>>
> >>>>>>>>>>>> You still need a way to let the remote side know which port the
> >>>>>>>>>>>> endpoint maps to, either
> >>>>>>>>>>>> by embedding the port information in the message (the current
> >>>>>>>>>>>> way), or by sending it
> >>>>>>>>>>>> separately.
> >>>>>>>>>>>>
> >>>>>>>>>>> An endpoint is created with every namespace request. There
> >>>>>>>>>>> should be
> >>>>>>>>>>> one namespace request for every GPIO controller, which yields a
> >>>>>>>>>>> unique
> >>>>>>>>>>> endpoint for each controller and eliminates the need for an extra
> >>>>>>>>>>> field to identify them.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Right, but this can still be done by just having one namespace
> >>>>>>>>>> request.
> >>>>>>>>>> We can create new endpoints bound to an existing namespace/
> >>>>>>>>>> channel by
> >>>>>>>>>> invoking rpmsg_create_ept(). This is what I suggested here too:
> >>>>>>>>>> https://lore.kernel.org/all/29485742-6e49-482e-
> >>>>>>>>>> b73d-228295daaeec@ti.com/
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I will look at your suggestion (i.e link above) later this week or
> >>>>>>>>> next week.
> >>>>>>>>>
> >>>>>>>>>> My mental model looks like this for the complete picture:
> >>>>>>>>>>
> >>>>>>>>>> 1. namespace/channel#1 = rpmsg-io
> >>>>>>>>>> a. ept1 -> gpio-controller@1
> >>>>>>>>>> b. ept2 -> gpio-controller@2
> >>>>>>>>>>
> >>>>>
> >>>>> If my understanding of what gpio-controller is right, than this won't
> >>>>> work. We need one rpmsg channel per gpio-controller, and in most cases
> >>>>> there will be only one GPIO-controller on the remote side. If there are
> >>>>> multiple or multiple instances of same controller, than we need separate
> >>>>> channel name for that controller just like we would have separate device
> >>>>> on the Linux.
> >>>>
> >>>> As done in ehe rpmsg_tty driver it could be instantiated several times with
> >>>> the same channel/service name. This would imply a specific rpmsg to
> >>>> retreive
> >>>> the gpio controller index from the remote side.
> >>>>>
> >>>>>>>>>
> >>>>>>>>> I've asked for one endpoint per GPIO controller since the very
> >>>>>>>>> beginning. I don't yet have a strong opinion on whether to use one
> >>>>>>>>> namespace request per GPIO controller or a single request that spins
> >>>>>>>>> off multiple endpoints. I'll have to look at your link and
> >>>>>>>>> reflect on
> >>>>>>>>> that. Regardless of how we proceed on that front, multiplexing needs
> >>>>>>>>> to happen at the endpoint level rather than the packet level.
> >>>>>>>>> This is
> >>>>>>>>> the only way this work can move forward.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I would be more in favor of Mathieu’s proposal: “An endpoint is
> >>>>>>>> created with every namespace request.”
> >>>>>>>>
> >>>>>>>> If the endpoint is created only on the Linux side, how do we match
> >>>>>>>> the Linux endpoint address with the local port field on the remote
> >>>>>>>> side?
> >>>>>>>
> >>>>>>>
> >>>>>>> Simply by sending a message to the remote containing the newly created
> >>>>>>> endpoint and the port idx. Note that is this done just one time, after
> >>>>>>> this
> >>>>>>> Linux need not have the port field in the message everytime its sending
> >>>>>>> a message.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> With a multi-namespace approach, the namespace could be rpmsg-io-
> >>>>>>>> [addr], where [addr] corresponds to the GPIO controller address in
> >>>>>>>> the DT. This would:
> >>>>>>>
> >>>>>>>
> >>>>>>> You will face the same problem in this case also that you asked above:
> >>>>>>> "how do we match the Linux endpoint address with the local port field
> >>>>>>> on the remote side?"
> >>>>>>
> >>>>>> Sorry I probably introduced confusion here
> >>>>>> my sentence should be;
> >>>>>> With a multi-namespace approach, the namespace could be rpmsg-io-
> >>>>>> [port],
> >>>>>> where [port] corresponds to the GPIO controller port in the DT.
> >>>>>>
> >>>>>>
> >>>>>> For instance:
> >>>>>>
> >>>>>> rpmsg {
> >>>>>> rpmsg-io {
> >>>>>> #address-cells = <1>;
> >>>>>> #size-cells = <0>;
> >>>>>>
> >>>>>> gpio@25 {
> >>>>>> compatible = "rpmsg-gpio";
> >>>>>> reg = <25>;
> >>>>>> gpio-controller;
> >>>>>> #gpio-cells = <2>;
> >>>>>> #interrupt-cells = <2>;
> >>>>>> interrupt-controller;
> >>>>>> };
> >>>>>>
> >>>>>> gpio@32 {
> >>>>>> compatible = "rpmsg-gpio";
> >>>>>> reg = <32>;
> >>>>>> gpio-controller;
> >>>>>> #gpio-cells = <2>;
> >>>>>> #interrupt-cells = <2>;
> >>>>>> interrupt-controller;
> >>>>>> };
> >>>>>> };
> >>>>>> };
> >>>>>>
> >>>>>> rpmsg-io-25 would match with gpio@25
> >>>>>> rpmsg-io-32 would match with gpio@32
> >>>>>>
> >>>>>
> >>>>> The problem with this approach is, we will endup creating way too many
> >>>>> RPMsg devices/channels. i.e. one channel per one GPIO. That limits how
> >>>>> many GPIOs can be handled by remote from memory perspective. At
> >>>>> somepoint we might just run-out of number ept & channels created by the
> >>>>> remote. As of now, open-amp library supports 128 epts I think.
> >>>>
> >>>> Right, I proposed a solution in my previous answer to Beleswar who has
> >>>> the same concern.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Because the endpoint that is created on a namespace request is also
> >>>>>>> dynamic in nature. How will the remote know which endpoint addr
> >>>>>>> Linux allocated for a namespace that it announced?
> >>>>>>>
> >>>>>>> As an example/PoC, I created a firmware example which announces
> >>>>>>> 2 name services to Linux, one is the standard "rpmsg_chrdev" and
> >>>>>>> the other is a TI specific name service "ti.ipc4.ping-pong". You can
> >>>>>>> see it created 2 different addresses (0x400 and 0x401) for each of
> >>>>>>> the name service request from the same firmware:
> >>>>>>>
> >>>>>>> root@j784s4-evm:~# dmesg | grep virtio0 | grep -i channel
> >>>>>>> [ 9.290275] virtio_rpmsg_bus virtio0: creating channel
> >>>>>>> ti.ipc4.ping-pong addr 0xd
> >>>>>>> [ 9.311230] virtio_rpmsg_bus virtio0: creating channel rpmsg_chrdev
> >>>>>>> addr 0xe
> >>>>>>> [ 9.496645] rpmsg_chrdev virtio0.rpmsg_chrdev.-1.14: DEBUG: Channel
> >>>>>>> formed from src = 0x400 to dst = 0xe
> >>>>>>> [ 9.707255] rpmsg_client_sample virtio0.ti.ipc4.ping-pong.-1.13:
> >>>>>>> new channel: 0x401 -> 0xd!
> >>>>>>>
> >>>>>>> So in this case, rpmsg-io-1 can have different ept addr than rpmsg-io-2
> >>>>>>> Back to same problem. Simple solution is to reply to remote with the
> >>>>>>> created ept addr and the index.
> >>>>>>
> >>>>>> That why I would like to suggest to use the name service field to
> >>>>>> identify the port/controller, instead of the endpoint address.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> - match the RPMsg probe with the DT,
> >>>>>>>
> >>>>>>>
> >>>>>>> We can probe from all controllers with a single name service
> >>>>>>> announcement too.
> >>>>>>>
> >>>>>>>> - provide a simple mapping between the port and the endpoint on both
> >>>>>>>> sides,
> >>>>>>>
> >>>>>>>
> >>>>>>> We are trying to get rid of this mapping from Linux side to adapt
> >>>>>>> the gpio-virtio design.
> >>>>>>>
> >>>>>>>> - allow multiple endpoints on the remote side,
> >>>>>>>
> >>>>>>>
> >>>>>>> We can support this as well with single nameservice model.
> >>>>>>> There is no limitation. Remote has to send a message with
> >>>>>>> its newly created ept that's all.
> >>>>>>>
> >>>>>>>> - provide a simple discovery mechanism for remote capabilities.
> >>>>>>>
> >>>>>>>
> >>>>>>> A single announcement: "rpmsg-io" is also discovery mechanism.
> >>>>>>>
> >>>>>>> Feel free to let me know if you have concerns with any of the
> >>>>>>> suggestions!
> >>>>>>
> >>>>>> My only concern, whatever the solution, is that we find a smart
> >>>>>> solution to associate the correct endpoint with the correct GPIO
> >>>>>> port/controller defined in the DT.
> >>>>>>
> >>>>>> I may have misunderstood your solution. Could you please help me
> >>>>>> understand your proposal by explaining how you would handle three
> >>>>>> GPIO ports defined in the DT, considering that the endpoint
> >>>>>> addresses on the Linux side can be random?
> >>>>>> If I assume there is a unique endpoint on the remote side,
> >>>>>> I do not understand how you can match, on the firmware side,
> >>>>>> the Linux endpoint address to the GPIO port.
> >>>>>>
> >>>>>> Thanks and Regards,Arnaud
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Beleswar
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Arnaud
> >>>>>>>>
> >>>>>>>>>> 2. namespace/channel#2 = rpmsg-i2c
> >>>>>>>>>> a. ept1 -> i2c@1
> >>>>>>>>>> b. ept2 -> i2c@2
> >>>>>>>>>> c. ept3 -> i2c@3
> >>>>>>>>>>
> >>>>>>>>>> etc...
> >>>>>>>>>>
> >>>>>
> >>>>> Just want to clear-up few terms before I jump to the solution:
> >>>>>
> >>>>> **RPMsg channel/device**:
> >>>>> - These are devices announced by the remote processor, and created by
> >>>>> linux. They are created at: /sys/bus/rpmsg/devices
> >>>>> - The channel format: <name>.<src ept>.<dst ept>
> >>>>>
> >>>>> **RPMsg endpoint**:
> >>>>> - Endpoint is differnt than channel. Single channel can have multiple
> >>>>> endpoints, and represented in the linux with: /dev/rpmsg? devices.
> >>>>>
> >>>>> To create endpoint device, we have rpmsg_create_ept API, which takes
> >>>>> channel information as input, which has src-ept, dst-ept.
> >>>>>
> >>>>> Following is proposed solution:
> >>>>>
> >>>>> 1) Assign RPMsg channel/device per rpmsg-gpio controller (Not per GPIO
> >>>>> pin/port).
> >>>>> - In our case that would be, single rpmsg-io node. (That makes me
> >>>>> question if bindings are correct or not).
> >>>>>
> >>>>> 2) Assign GPIO number as src ept.
> >>>>>
> >>>>> i.e. *rpmsg-io.<GPIO number>.<dst ept>*. Do not randomly assign src
> >>>>> endpoint.
> >>>>>
> >>>>> Now, RPMSG channel by spec reserves first 1024 endpoints [1], so we can
> >>>>> add 1024 offset to the GPIO number:
> >>>>>
> >>>>> so, when calling rpmsg_create_ept() API, we assing src_endpoint as:
> >>>>> (GPIO_NUMBER + RPMSG_RESERVED_ADDRESSES)
> >>>>>
> >>>>> Now on the remote side, there is single channel and only single-endpoint
> >>>>> is needed that is mapped to the rpmsg-io channel callback.
> >>>>>
> >>>>> That callback will receive all the payloads from the Linux, which will
> >>>>> have src-ept i.e. (RPMSG_RESERVED_ADDRESSES + GPIO_NUMBER).
> >>>>
> >>>>
> >>>> Interesting approach. I also tried to find a similar solution.
> >>>>
> >>>> The question here is: how can we guarantee continuous addresses? Given
> >>>> the static and dynamic allocation of endpoint addresses that are
> >>>> implemented, my conclusion was that it is not reliable enough.
> >>>>
> >>>> but perhaps I missed something...
> >>>>
> >>>>>
> >>>>> It can retrieve GPIO_NUMBER easily, and convert to appropriate pin based
> >>>>> on platform specific logic.
> >>>>>
> >>>>> This doesn't need PORT information at all. Also it makes sure that
> >>>>> remote is using only single-endpoint so not much memory is used.
> >>>>>
> >>>>> *Example*:
> >>>>> If only rpmsg-gpio channel is created by the remote side, than following
> >>>>> is the representation of the devices when GPIO 25, 26, 27 is assigned to
> >>>>> the rpmsg-io controller:
> >>>>>
> >>>>> Linux Remote
> >>>>>
> >>>>> rpmsg-channel: rpmsg-gpio.0x400.0x400
> >>>>>
> >>>>> /dev/rpmsg0 - GPIO25 ept (rpmsg-gpio.0x419.0x400)-|
> >>>>> |
> >>>>> /dev/rpmsg1 - GPIO26 ept (rpmsg-gpio.0x41a.0x400)-|-> rpmsg-gpio.*.0x400
> >>>>> |
> >>>>> /dev/rpmsg2 - GPIO27 ept (rpmsg-gpio.0x41b.0x400)-| 0x400 ept callback.
> >>>>>
> >>>>>
> >>>>> *On remote side*:
> >>>>>
> >>>>> ept_0x400_callback(..., int src_ept, ...,)
> >>>>> {
> >>>>> int gpio_num = src_ept - RPMSG_RESERVED_ADDRESSES;
> >>>>> // platform specific logic to convert gpio num to proper pin,
> >>>>> // just like you would convert gpio num to pin on a linux gpio
> >>>>> controller.
> >>>>> }
> >>>>>
> >>>>> My question on the binding:
> >>>>>
> >>>>> Why each GPIO is represented with the separate node? I think rpmsg-gpio
> >>>>> can be represented just any other GPIO controller? Please let me know if
> >>>>> I am missing something. So rpmsg channel/rpmsg device is not created per
> >>>>> GPIO, but per controller. GPIO number multiplexing should be done with
> >>>>> rpmsg src ept, that removes the need of having each GPIO as a separate
> >>>>> node.
> >>>>>
> >>>>>
> >>>>> rpmsg_gpio: rpmsg-gpio@0 {
> >>>>> compatible = "rpmsg-gpio";
> >>>>> reg = <0>;
> >>>>> gpio-controller;
> >>>>> #gpio-cells = <2>;
> >>>>> #interrupt-cells = <2>;
> >>>>> interrupt-controller;
> >>>>> };
> >>>>>
> >>>>> Then in DT, use like regular GPIO, but with the rpmsg-gpio controller:
> >>>>>
> >>>>> rpmsg-gpios = <&rpmsg_gpio (GPIO NUM) (flags)>;
> >>>>>
> >>>>> If the intent to create separate gpio nodes was only for the channel
> >>>>> creation, then it's not really needed.
> >>>>>
> >>>>> [1]
> >>>>> https://github.com/torvalds/linux/
> >>>>> blob/6d35786de28116ecf78797a62b84e6bf3c45aa5a/drivers/rpmsg/
> >>>>> virtio_rpmsg_bus.c#L136
> >>>>>
> >>>>
> >>>> It is already the case. bindings declare GPIO controllers, not directly
> >>>> GPIOs in:
> >>>>
> >>>> [PATCH v13 2/4] dt-bindings: remoteproc: imx_rproc: Add "rpmsg" subnode
> >>>> support
> >>>>
> >>>> The discussion is around having an unique RPmsg endpoint for all
> >>>> GPIO controller or one RPmsg endpoint per GPIO controller.
> >>>>
> >>>
> >>> Endpoint where remote side or linux side?
> >>>
> >>> If unique endpoint on remote side per gpio controller then it makes sense.
> >>>
> >>> Unique endpoint on linux side doesn't make sense. Instead, unique
> >>> channel per gpio controller makes sense, and each channel will have
> >>> multiple endpoints on linux side. As I replied to Beleswar on the other
> >>> email, I will copy past my answer here too:
> >>>
> >>>
> >>> To be more specific:
> >>>
> >>> Linux: remote:
> >>>
> >>> ch1: rpmsg-gpio.-1.1024 -> gpio-controller@1024
> >>> - gpio-line ept1
> >>> - gpio-line ept2 -> They all map to same callback_ept_1024.
> >>> - gpio-line ept3
> >>>
> >>> ch2: rpmsg-gpio.-1.1025 -> gpio-controller@1025
> >>> - gpio-line ept1
> >>> - gpio-line ept2 -> They all map to same callback_ept_1025.
> >>> - gpio-line ept3
> >>>
> >>
> >>
> >> Hi Mathieu,
> >>
> >> So upon more brain storming in this approach I found limitation:
> >>
> >> This approach won't work if host OS is any other OS but Linux. For
> >> example, if the remote OS is zephyr/baremetal using open-amp, then Only
> >> Linux <-> zephyr combination will work, and we won't be able to re-use
> >> this approach for zephyr <-> zephyr use case. The concept of rpmsg
> >> channel/device exist only in the linux kernel implementation. This
> >> brings another question: Should the protocol we decide work on other use
> >> cases as well? Or Linux must be the Host OS for this protocol ?
> >>
> >
> > Linux and Zephyr are very distinct OS, each with their own subsystems
> > and characteristics. The design we choose here involves RPMSG and,
> > inherently, Linux. We can't make decisions based on what may
> > potentially happen in Zephyr.
> >
> >>
> >> I think your & Arnaud's proposed approach of single endpoint per
> >> gpio-controller on both side makes more sense, as it will work
> >> regardless of any OS on host or remote side.
> >>
> >
> > Arnaud, Beleswar, Andrew and I are all advocating for one endpoint per
> > GPIO controller. The remaining issue it about the best way to work
> > out source and destination addresses between Linux and the remote
> > processor. I'm running out of time for today but I'll return to this
> > thread with a final analysis by the end of the week.
> >
>
> Okay. Then that means multiple endpoints on Linux side can be considered.
If there are multiple GPIO controllers then yes, there will be more than one
endpoint. At this time I do now want to condiser other bus architectures (i2c,
spi, ...) to avoid muddying an already difficult conversation.
>
> If we decide to go single-endpoint per device on both side, then for
> that here is the proposal to represent src ept and dst ept:
I do not understand what you mean by "per device" - please be more specific.
>
> When we represent any device under rpmsg bus node, I think it should be
> considered remote's view of the adddress space. So ideally we can
> convert it to Linux view of the address space, via 'ranges' property.
There is no address space to consider since there is no GPIO controller memory
space to access. All that is done by the driver (remote processor) and
completely hidden from Linux by rpmsg-virtio-gpio.
>
> So bindings should include 'ranges' property in the parent node. Then
> linux view of the start address becomes src ept, and remote view of the
> start address becomes dest ept. The remote view of the start address is
> expected to be the static src endpoint on the remote side.
>
> Following representation of the rpmsg devices (gpio, i2c, spi or any other):
>
> rpmsg {
> #address-cells = <1>;
> #size-cells = <1>;
>
> rpmsg-io {
> compatible = "rpmsg-io-bus";
> ranges = <remote_view_addr(dst ept) linux_view_addr(src ept) size>;
> #address-cells = <1>;
> #size-cells = <1>;
>
> gpio@remote_view_addr(or dst ept) {
> compatible = "rpmsg-io";
> reg = <remote_view_addr addr_space_size>;
> gpio-controller;
> #gpio-cells = <2>;
> interrupt-controller;
> #interrupt-cells = <2>;
> };
>
> ...
>
> };
>
> };
>
> Example device-tree:
>
> rpmsg {
> #address-cells = <1>;
> #size-cells = <1>;
>
> rpmsg-io {
> compatible = "rpmsg-io-bus";
> ranges = <0x10000 0x50000 0x1000>,
> <0x20000 0x60000 0x1000>;
> #address-cells = <1>;
> #size-cells = <1>;
>
> gpio@10000 {
> compatible = "rpmsg-io";
> reg = <0x10000 0x1000>;
> gpio-controller;
> #gpio-cells = <2>;
> interrupt-controller;
> #interrupt-cells = <2>;
> };
>
> gpio@20000 {
> compatible = "rpmsg-io";
> reg = <0x20000 0x1000>;
> gpio-controller;
> #gpio-cells = <2>;
> interrupt-controller;
> #interrupt-cells = <2>;
> };
>
> };
>
> };
>
>
> Thanks,
> Tanmay
>
>
> >> To be more specific this will look like following:
> >>
> >> Host (Linux) Remote (baremetal/RTOS)
> >>
> >> rpmsg ch/device 1:
> >> - rpmsg ept 1 <------> rpmsg ept 1 gpio-controller 0
> >>
> >> rpmsg ch/device 2:
> >> - rpmsg ept 2 <------> rpmsg ept 2 gpio-controller 1
> >>
> >>
> >> The question is, how to decide src ept, and dest ept on both sides?
> >> I still think it should be static endpoints.
> >>
> >> I will get back with more reasoning on that.
> >>
> >>> On the remote side, we have to hardcode Which rpmsg controller is mapped
> >>> to which endpoint.
> >>>
> >>>> Or did I misunderstand your questions?
> >>>>
> >>>> Thanks,
> >>>> Arnaud
> >>>>
> >>>
> >>>
> >>> I gave this patch more time yesterday, and I think the 'reg' property
> >>> should represent remote endpoint, instead of the gpio-controller index.
> >>>
> >>> So in this approach remote implementation is expected to provide
> >>> hard-coded (static) endpoints for each gpio-controller instance, and
> >>> that same number should be represented with the 'reg' property.
> >>>
> >>> On remote side:
> >>>
> >>> #define RPMSG_GPIO_0_CONTROLLER_EPT (RPMSG_RESERVED_ADDRESSES + 1) // 1024
> >>>
> >>> ept_1024_callback() {
> >>>
> >>> // handle appropriate gpio port ()
> >>>
> >>> }
> >>>
> >>> On linux side:
> >>>
> >>> So new representation of controller:
> >>>
> >>> rpmsg_gpio_0: gpio@1024 {
> >>> compatible = "rpmsg-gpio";
> >>> reg = <1024>;
> >>> gpio-controller;
> >>> #gpio-cells = <2>;
> >>> #interrupt-cells = <2>;
> >>> interrupt-controller;
> >>> };
> >>>
> >>> rpmsg_gpio_1: gpio@1025 {
> >>> compatible = "rpmsg-gpio";
> >>> reg = <1025>;
> >>> gpio-controller;
> >>> #gpio-cells = <2>;
> >>> #interrupt-cells = <2>;
> >>> interrupt-controller;
> >>> };
> >>>
> >>> gpios = <&rpmsg_gpio_0 (GPIO NUM or PIN) flags>,
> >>> <&rpmsg_gpio_1 (GPIO NUM or PIN) flags>;
> >>>
> >>> Now in the linux driver:
> >>>
> >>> You can easily retrieve destination endpoint when we want to send the
> >>> command to the gpio controller via device's "reg" property.
> >>>
> >>> This approach also provides built-in security as well. Because now
> >>> gpio-controller instance is hardcoded with the endpoint callback, it
> >>> can't be modified/addressed without changing the 'reg' property.
> >>>
> >>> Just like you wouldn't change device address for the instance of the
> >>> gpio-controller right?
> >>>
> >>> This approach can be easily adapted to all the other rpmsg controllers
> >>> as well.
> >>>
> >>> So, dynamic endpoint allocation doesn't make sense in this case. Dynamic
> >>> endpoint allocation makes more sense for user-space apps which don't
> >>> really care about endpoints and only payloads.
> >>>
> >>> But, here we are multiplexing device-addresses with endpoints, and so it
> >>> has to be fixed, and presented via 'reg' property. So, firmware can't
> >>> change device-address without Linux knowing it.
> >>>
> >>> Thanks,
> >>> Tanmay
> >>>
> >>>
> >>>>
> >>>>>>>>>> This way device groups are isolated with each channel/namespace, and
> >>>>>>>>>> instances within each device groups are also respected with specific
> >>>>>>>>>> endpoints.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Beleswar
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
^ permalink raw reply
* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Liviu Dudau @ 2026-05-12 15:38 UTC (permalink / raw)
To: Boris Brezillon
Cc: Marcin Ślusarz, Ketil Johnsen, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, Shuah Khan, Sumit Semwal, Benjamin Gaignard,
Brian Starkey, John Stultz, T.J. Mercier, Christian König,
Steven Price, Daniel Almeida, Alice Ryhl, Matthias Brugger,
AngeloGioacchino Del Regno, dri-devel, linux-doc, linux-kernel,
linux-media, linaro-mm-sig, linux-arm-kernel, linux-mediatek,
Florent Tomasin, nd
In-Reply-To: <20260512161111.0cb7000e@fedora>
On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:
> On Tue, 12 May 2026 14:47:27 +0100
> Liviu Dudau <liviu.dudau@arm.com> wrote:
>
> > On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:
> > > On Thu, 7 May 2026 11:02:26 +0200
> > > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > >
> > > > On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:
> > > > > > @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > > > + /* If a protected heap name is specified but not found, defer the probe until created */
> > > > > > + if (protected_heap_name && strlen(protected_heap_name)) {
> > > > >
> > > > > Do we really need this strlen() > 0? Won't dma_heap_find() fail is the
> > > > > name is "" already?
> > > >
> > > > If dma_heap_find() will fail, then the whole probe with fail too.
> > > > This check prevents that.
> > >
> > > Yeah, that's also a questionable design choice. I mean, we can
> > > currently probe and boot the FW even though we never setup the
> > > protected FW sections, so why should we defer the probe here? Can't we
> > > just retry the next time a group with the protected bit is created and
> > > fail if we can find a protected heap?
> >
> > The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > time only. One of the steps is preparing its internal structures for when it enters protected
> > mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > process when we have a group with protected mode set.
>
> No, but we can force a full/slow reset and have that thing
> re-initialized, can't we? I mean, that's basically what we do when a
> fast reset fails: we re-initialize all the sections and reset again, at
> which point the FW should start from a fresh state, and be able to
> properly initialize the protected-related stuff if protected sections
> are populated. Am I missing something?
Right, we can do that. For some reason I keep associating the reset with the
error handling and not with "normal" operations.
Best regards,
Liviu
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
^ permalink raw reply
* Re: [PATCH v10 2/6] iio: adc: ad4691: add initial driver for AD4691 family
From: Jonathan Cameron @ 2026-05-12 15:25 UTC (permalink / raw)
To: Radu Sabau via B4 Relay
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, David Lechner,
Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
Jonathan Corbet, Shuah Khan, linux-iio, devicetree, linux-kernel,
linux-pwm, linux-gpio, linux-doc
In-Reply-To: <20260511-ad4692-multichannel-sar-adc-driver-v10-2-e1fbb1744e38@analog.com>
On Mon, 11 May 2026 14:54:14 +0300
Radu Sabau via B4 Relay <devnull+radu.sabau.analog.com@kernel.org> wrote:
> From: Radu Sabau <radu.sabau@analog.com>
>
> Add support for the Analog Devices AD4691 family of high-speed,
> low-power multichannel SAR ADCs: AD4691 (16-ch, 500 kSPS),
> AD4692 (16-ch, 1 MSPS), AD4693 (8-ch, 500 kSPS) and
> AD4694 (8-ch, 1 MSPS).
>
> The driver implements a custom regmap layer over raw SPI to handle the
> device's mixed 1/2/3/4-byte register widths and uses the standard IIO
> read_raw/write_raw interface for single-channel reads.
>
> The chip idles in Autonomous Mode so that single-shot read_raw can use
> the internal oscillator without disturbing the hardware configuration.
>
> Three voltage supply domains are managed: avdd (required), vio, and a
> reference supply on either the REF pin (ref-supply, external buffer)
> or the REFIN pin (refin-supply, uses the on-chip reference buffer;
> REFBUF_EN is set accordingly). Hardware reset is performed via
> the reset controller framework; a software reset through SPI_CONFIG_A
> is used as fallback when no hardware reset is available.
>
> Accumulator channel masking for single-shot reads uses ACC_MASK_REG via
> an ADDR_DESCENDING SPI write, which covers both mask bytes in a single
> 16-bit transfer.
>
> IIO_CHAN_INFO_SAMP_FREQ is exposed as info_mask_shared_by_all because
> the AD4691 family has a single internal oscillator whose frequency
> register is shared across all channels. Writing sampling_frequency for
> any one channel necessarily changes the conversion rate for every other
> channel, so the shared annotation correctly reflects the hardware
> behaviour.
Sashiko correctly points out that this last paragraph doesn't correspond to
the driver. Needs an update.
https://sashiko.dev/#/patchset/20260511-ad4692-multichannel-sar-adc-driver-v10-0-e1fbb1744e38%40analog.com
Otherwise just a few more sashiko things inline and my
comments on whether they are true - I think they are in this case...
>
> Reviewed-by: David Lechner <dlechner@baylibre.com>
> Signed-off-by: Radu Sabau <radu.sabau@analog.com>
> diff --git a/drivers/iio/adc/ad4691.c b/drivers/iio/adc/ad4691.c
> new file mode 100644
> index 000000000000..5b72216bca80
> --- /dev/null
> +++ b/drivers/iio/adc/ad4691.c
> +
> +static int ad4691_reg_read(void *context, unsigned int reg, unsigned int *val)
> +{
> + struct spi_device *spi = context;
> + u8 tx[2], rx[4];
> + int ret;
> +
> + /* Set bit 15 to mark the operation as READ. */
> + put_unaligned_be16(0x8000 | reg, tx);
> +
> + switch (reg) {
> + case 0 ... AD4691_OSC_FREQ_REG:
> + case AD4691_SPARE_CONTROL ... AD4691_ACC_MASK_REG - 1:
> + case AD4691_ACC_MASK_REG + 1 ... AD4691_ACC_SAT_OVR_REG(15):
> + ret = spi_write_then_read(spi, tx, sizeof(tx), rx, 1);
> + if (ret)
> + return ret;
> + *val = rx[0];
> + return 0;
> + case AD4691_ACC_MASK_REG:
> + case AD4691_STD_SEQ_CONFIG:
> + case AD4691_AVG_IN(0) ... AD4691_AVG_IN(15):
> + ret = spi_write_then_read(spi, tx, sizeof(tx), rx, 2);
Just to check - (another sashiko one) is it a problem if via debugfs we
read addresses that aren't the base ones of these bigger reads?
> + if (ret)
> + return ret;
> + *val = get_unaligned_be16(rx);
> + return 0;
> + case AD4691_AVG_STS_IN(0) ... AD4691_AVG_STS_IN(15):
> + case AD4691_ACC_IN(0) ... AD4691_ACC_IN(15):
> + ret = spi_write_then_read(spi, tx, sizeof(tx), rx, 3);
> + if (ret)
> + return ret;
> + *val = get_unaligned_be24(rx);
> + return 0;
> + case AD4691_ACC_STS_DATA(0) ... AD4691_ACC_STS_DATA(15):
> + ret = spi_write_then_read(spi, tx, sizeof(tx), rx, 4);
> + if (ret)
> + return ret;
> + *val = get_unaligned_be32(rx);
> + return 0;
> + default:
> + return -EINVAL;
> + }
> +}
> +}
> +
> +static bool ad4691_volatile_reg(struct device *dev, unsigned int reg)
> +{
> + switch (reg) {
> + case AD4691_STATUS_REG:
> + case AD4691_CLAMP_STATUS1_REG:
> + case AD4691_CLAMP_STATUS2_REG:
> + case AD4691_GPIO_READ:
> + case AD4691_ACC_STATUS_FULL1_REG ... AD4691_ACC_STATUS_SAT2_REG:
> + case AD4691_ACC_SAT_OVR_REG(0) ... AD4691_ACC_SAT_OVR_REG(15):
> + case AD4691_AVG_IN(0) ... AD4691_AVG_IN(15):
> + case AD4691_AVG_STS_IN(0) ... AD4691_AVG_STS_IN(15):
> + case AD4691_ACC_IN(0) ... AD4691_ACC_IN(15):
> + case AD4691_ACC_STS_DATA(0) ... AD4691_ACC_STS_DATA(15):
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +static bool ad4691_readable_reg(struct device *dev, unsigned int reg)
I think this will all end up easier to follow if you add some checks in
this and the next one to exclude the latter parts of multi address registers.
Neater to do it here than in the read and write calls themselves.
> +{
> + switch (reg) {
> + case 0 ... AD4691_OSC_FREQ_REG:
> + case AD4691_SPARE_CONTROL ... AD4691_ACC_SAT_OVR_REG(15):
> + case AD4691_STD_SEQ_CONFIG:
> + case AD4691_AVG_IN(0) ... AD4691_AVG_IN(15):
> + case AD4691_AVG_STS_IN(0) ... AD4691_AVG_STS_IN(15):
> + case AD4691_ACC_IN(0) ... AD4691_ACC_IN(15):
> + case AD4691_ACC_STS_DATA(0) ... AD4691_ACC_STS_DATA(15):
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +static bool ad4691_writeable_reg(struct device *dev, unsigned int reg)
> +{
> + switch (reg) {
> + case 0 ... AD4691_OSC_FREQ_REG:
> + case AD4691_STD_SEQ_CONFIG:
> + case AD4691_SPARE_CONTROL ... AD4691_GPIO_MODE2_REG:
> + return true;
> + default:
> + return false;
> + }
> +}
>
> +
> +static int ad4691_get_sampling_freq(struct ad4691_state *st, int *val)
> +{
> + unsigned int reg_val;
> + int ret;
> +
I think sashiko's comment on locking here is a false positive because you'll
always get the value from regcache. Maybe worth a comment on that though.
Sashiko suggests the direct read goes away later anyway.
> + ret = regmap_read(st->regmap, AD4691_OSC_FREQ_REG, ®_val);
> + if (ret)
> + return ret;
> +
> + *val = ad4691_osc_freqs_Hz[FIELD_GET(AD4691_OSC_FREQ_MASK, reg_val)];
> + return IIO_VAL_INT;
> +}
> +static int ad4691_reg_access(struct iio_dev *indio_dev, unsigned int reg,
> + unsigned int writeval, unsigned int *readval)
> +{
> + struct ad4691_state *st = iio_priv(indio_dev);
> +
> + guard(mutex)(&st->lock);
> +
> + if (readval)
> + return regmap_read(st->regmap, reg, readval);
> +
> + return regmap_write(st->regmap, reg, writeval);
This is where we perhaps have too much freedom given the effective gaps
in register addresses due for the larger 'registers'.
Using those wrong addresses might not be a problem belong filling the
regcache with garbage. If we can just screen them out in the readable
writeable checks that would be better.
> +}
...
> +static int ad4691_config(struct ad4691_state *st)
> +{
> + struct device *dev = regmap_get_device(st->regmap);
> + enum ad4691_ref_ctrl ref_val;
> + unsigned int val;
> + int ret;
> +
> + switch (st->vref_uV) {
> + case AD4691_VREF_uV_MIN ... AD4691_VREF_2P5_uV_MAX:
> + ref_val = AD4691_VREF_2P5;
> + break;
> + case AD4691_VREF_2P5_uV_MAX + 1 ... AD4691_VREF_3P0_uV_MAX:
> + ref_val = AD4691_VREF_3P0;
> + break;
> + case AD4691_VREF_3P0_uV_MAX + 1 ... AD4691_VREF_3P3_uV_MAX:
> + ref_val = AD4691_VREF_3P3;
> + break;
> + case AD4691_VREF_3P3_uV_MAX + 1 ... AD4691_VREF_4P096_uV_MAX:
> + ref_val = AD4691_VREF_4P096;
> + break;
> + case AD4691_VREF_4P096_uV_MAX + 1 ... AD4691_VREF_uV_MAX:
> + ref_val = AD4691_VREF_5P0;
> + break;
> + default:
> + return dev_err_probe(dev, -EINVAL,
> + "Unsupported vref voltage: %d uV\n",
> + st->vref_uV);
> + }
> +
> + val = FIELD_PREP(AD4691_REF_CTRL_MASK, ref_val);
> + if (st->refbuf_en)
> + val |= AD4691_REFBUF_EN;
> +
> + ret = regmap_write(st->regmap, AD4691_REF_CTRL, val);
> + if (ret)
> + return dev_err_probe(dev, ret, "Failed to write REF_CTRL\n");
> +
> + ret = regmap_assign_bits(st->regmap, AD4691_DEVICE_SETUP,
> + AD4691_LDO_EN, st->ldo_en);
> + if (ret)
> + return dev_err_probe(dev, ret, "Failed to write DEVICE_SETUP\n");
> +
> + /*
> + * Set the internal oscillator to the highest rate this chip supports.
> + * Index 0 (1 MHz) exceeds the 500 kHz max of AD4691/AD4693, so those
> + * chips start at index 1 (500 kHz).
> + */
> + ret = regmap_write(st->regmap, AD4691_OSC_FREQ_REG,
As per the comment above - I think this is why we don't have a bug reading
back the frequency during an ADC sampling sequence. This ensures we have
the value cached.
> + ad4691_samp_freq_start(st->info));
> + if (ret)
> + return dev_err_probe(dev, ret, "Failed to write OSC_FREQ\n");
> +
> + ret = regmap_update_bits(st->regmap, AD4691_ADC_SETUP,
> + AD4691_ADC_MODE_MASK, AD4691_AUTONOMOUS_MODE);
> + if (ret)
> + return dev_err_probe(dev, ret, "Failed to write ADC_SETUP\n");
> +
> + return 0;
> +}
^ permalink raw reply
* RE: [RFC net-next 0/4] devlink: Add boot-time defaults
From: Parav Pandit @ 2026-05-12 15:25 UTC (permalink / raw)
To: Jiri Pirko
Cc: Mark Bloch, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
Andrew Lunn, David S. Miller, Jonathan Corbet, Shuah Khan,
Simon Horman, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Morton, Borislav Petkov (AMD), Randy Dunlap, Dave Hansen,
Christian Brauner, Petr Mladek, Peter Zijlstra (Intel),
Thomas Gleixner, Pawan Gupta, Dapeng Mi, Kees Cook, Marco Elver,
Eric Biggers, NBU-Contact-Li Rongqing (EXTERNAL),
Paul E. McKenney, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org
In-Reply-To: <agM0DsiaAH8-Ox7N@FV6GYCPJ69>
> From: Jiri Pirko <jiri@resnulli.us>
> Sent: 12 May 2026 07:37 PM
>
> Tue, May 12, 2026 at 03:48:32PM CEST, parav@nvidia.com wrote:
> >
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Sent: 12 May 2026 02:16 PM
> >>
> >> Mon, May 11, 2026 at 08:21:37PM +0200, parav@nvidia.com wrote:
> >> >
> >> >> From: Mark Bloch <mbloch@nvidia.com>
> >> >> Sent: 10 May 2026 06:02 PM
> >> >>
> >> >
> >> >[..]
> >> >
> >> >> > I look at it from the perspective that from some CX generation,
> >> >> > switchdev mode should be default. So that is a device-based decision.
> >> >> > I believe as such it can optionally be permanenty configured (nv config)
> >> >> > on older device. Why not?
> >> >>
> >> >Because sometimes switchdev_inactive is needed and sometimes not.
> >> >Such knob is not device decision.
> >>
> >> That is what I would call corner case. In that, user can use userspace
> >> configuration to change the mode in runtime.
> >>
> >Corner vs common depends on users one talks to. :)
> >If fw has switchdev(active) as default, and then
> >And user needs to run switchdev_inactive, it will actually break their switching applications.
>
> Can you describe the actutal breakage please?
>
Driver default was switchdev so all the traffic is forwarded to the switch,
and user didn't have chance to setup the fdb rules.
So packets are dropped but user didn't expect the traffic to be forwarded.
With this RFC, the device would start in the switchdev_inactive.
And user's goal is achieved.
> >
> >So, one needs to invent switchdev_inactive in the FW.
> >
> >Jakub's suggestion in this RFC is covering both the scenarios uniformly without above problems.
> >Single uapi for all the cases, so looks good to me.
> >
> >Moreover, do not understand how alternative solves such problems.
> >i.e. user is unable to configure the fw because driver is not yet loaded/up.
>
> See my other reply in this thread. I don't think there is a need to
> configure anything in FW. If we fix the behaviour in switchdev mode for
> non-sriov user and change the default, no fw knob needed. What am I
> missing?
>
If I understood your suggestion right, is it the devlinkd based solution?
If yes, then Mark explained that it has the issue of all drivers to be loaded, followed by user space to start.
^ permalink raw reply
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Andy Shevchenko @ 2026-05-12 15:21 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: Andy Shevchenko, Jonathan Cameron, Rodrigo Alencar via B4 Relay,
rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <ur6brs3yjzyb4mtelabmcglxjltddqvjxtgl3lkdkmbjlkmnsq@bwd6rz7gided>
On Tue, May 12, 2026 at 6:11 PM Rodrigo Alencar
<455.rodrigo.alencar@gmail.com> wrote:
> On 26/05/12 05:43PM, Andy Shevchenko wrote:
> > On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > > >
> > > > > > > > Add helpers that parses decimal numbers into 64-bit number, i.e., decimal
> > > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > > are ignored.
> > > > > > >
> > > > > > > Whilst Rodrigo has already replied to say there will be another version
> > > > > > > I'd like to request final feedback from those who were involved in the parser
> > > > > > > discussions.
> > > > > > >
> > > > > > > They got very involved and I'm far from an expert in the right way to do
> > > > > > > this stuff.
> > > > > > >
> > > > > > > I don't think David Laight was +CC so I've added that.
> > > > > > > David, Andy - I think you two were most involved in that discussion:
> > > > > > > Any objections to the end result?
> > > > > >
> > > > > > I already said a few times about the naming. I do not like the kstrto*()
> > > > > > be semantically different on how they treat the input. Second point is
> > > > > > to avoid code duplication, but this one is less of a concern since the
> > > > > > new code is in the library close to the other potentially duplicate code
> > > > > > piece and hence can be addressed later.
> > > > >
> > > > > I suppose I reached into kstrtodec64() and kstrtoudec64() because it aligns
> > > > > with your expectations for kstrto*() semantics, no? Those include:
> > > > > - overflow check;
> > > > > - extensive input validation;
> > > > > - optional '\n' in the end;
> > > > > - mandatory nul-termination.
> > > > >
> > > > > am I missing anything?
> > > >
> > > > When we add scale we basically make that not true. Moreover the code in this
> > > > patch makes scale == number_of_characters which I think a bit fragile, however
> > > > it's about the fractional part when the amount of digits is equal to scale.
> > >
> > > That is not really the case. It is being set as a limit, so it does check for
> > > truncation and zero-padding.
> >
> > I do not see it happens in _parse_integer_limit(). It doesn't try to parse more
> > characters than it's requested in max_chars. It doesn't check if there are more
> > character nor their converted values.
> >
> > > > To make this work as expected we need to add an additional call like
> > > > kstrtoull() (and perhaps drop that \n and NUL-terminator checks) and see
> > > > if that overflows or not. Since it's a fractional part it must have less
> > > > than 20 (decimal) digits there, so we check the rv (or how many digits
> > > > were parsed successfully) and compare to 20. If it's more, we got too many
> > > > decimal digits.
> > >
> > > For overflow it checks the KSTRTOX_OVERFLOW flag and leverages check_mul_overflow()
> > > and check_add_overflow() when combining fractional and integer parts. The amount
> > > of characters is not really important there. The scale cannot be bigger than 19 and
> > > that makes sure that int_pow() does not overflow. The code uses _parse_integer_limit()
> > > due to the nature of input and to avoid 64-bit division, kstrtoull() at any point
> > > (parsing integer or fractional parts) does not make much sense.
> >
> > Under 'like kstrotoull()' I meant something that repeats needed functionality.
> > I believe it's parse_integer() (without limit).
>
> I think we are going in circles here and we could look at the code instead:
> - integer parsing with _parse_integer()
> - overflow check and validation of the return value
> - fractional parsing with _parse_integer_limit()
> - overflow check and validation of the return value
No, this is not fully true. That's what my whole point is about. The
max_chars parameter limits the input check, then it skips an arbitrary
number of digits and only *then* it checks for \n and \0. What will be
the result of the
0.00000000000000000000000000000000423 in your case? Whatever scale you
gave it will return 0 without checking on how many digits were
supplied. All the same for 0.9999999999999999999999999999999000423. My
point is that we should limit this by 19 digits.
On top of that, what about -0.9(19 times) ? the fraction should be u64
in this case and it's fine. The sign applies to the combined value.
> - extra scaling and truncation happening outside if needed.
Right, but the given input may be way too long and still needs more validation.
> - check for input termination
> - combination of integer and fractional parts with check_mul_overflow() and check_add_overflow()
>
> > > > Maybe I'm missing these checks already performed?
> > > >
> > > > > > Having the test cases is a big benefit, and that part I like the most.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v13 3/4] gpio: rpmsg: add generic rpmsg GPIO driver
From: Mathieu Poirier @ 2026-05-12 15:21 UTC (permalink / raw)
To: Andrew Lunn
Cc: tanmay.shah, Arnaud POULIQUEN, Beleswar Prasad Padhi,
Shenwei Wang, Linus Walleij, Bartosz Golaszewski, Jonathan Corbet,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson,
Frank Li, Sascha Hauer, Shuah Khan, linux-gpio@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Pengutronix Kernel Team, Fabio Estevam, Peng Fan,
devicetree@vger.kernel.org, linux-remoteproc@vger.kernel.org,
imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
dl-linux-imx, Bartosz Golaszewski
In-Reply-To: <4ae35920-2539-4b12-8dea-efd407b8aaeb@lunn.ch>
On Mon, 11 May 2026 at 12:18, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Arnaud, Beleswar, Andrew and I are all advocating for one endpoint per
> > GPIO controller. The remaining issue it about the best way to work
> > out source and destination addresses between Linux and the remote
> > processor. I'm running out of time for today but I'll return to this
> > thread with a final analysis by the end of the week.
>
> How many of the participants here will be in Minneapolis next week for
> the Embedded Linux Conference? There is even a talk about this:
>
> https://osselcna2026.sched.com/event/2JQpx/building-virtual-drivers-with-rpmsg-key-design-principles-challenges-trade-offs-beleswar-prasad-padhi-texas-instruments?iframe=yes&w=100%&sidebar=yes&bg=no
>
> Maybe we can get together and decide on the final design after the
> session.
>
I will not be in Minneapolis next week. At this point I think things
are converging into 2 main takeaways:
1) A serious refactoring of the protocol to include only what is
available in the virtio-gpio specification [1].
2) The specification of GPIO controller number in an extension of the
namespace announcement [2].
Shenwei proposed embedding the GPIO controller number in the
endpoint's source address [3], something I'm ambivalent about and
still have to look into. I also have to read Tanmay's latest
comments. I'm hoping to be done with all that by the end of the week.
With the above (1) and (2), a new patchset will be required to reset
this thread.
Thanks,
Mathieu
[1]. https://lwn.net/ml/all/afjyH5JT0JS2j0L5@p14s/
[2]. https://lwn.net/ml/all/afzIABSh1xtMEGbf%40p14s/
[3]. https://lwn.net/ml/all/PAXPR04MB9185BFA6E7375FAD0B15B021893C2@PAXPR04MB9185.eurprd04.prod.outlook.com/
> Andrew
^ permalink raw reply
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Rodrigo Alencar @ 2026-05-12 15:11 UTC (permalink / raw)
To: Andy Shevchenko, Rodrigo Alencar
Cc: Jonathan Cameron, Rodrigo Alencar via B4 Relay, rodrigo.alencar,
linux-kernel, linux-iio, devicetree, linux-doc, David Lechner,
Andy Shevchenko, Lars-Peter Clausen, Michael Hennerich,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
Andrew Morton, Petr Mladek, Steven Rostedt, Rasmus Villemoes,
Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <agM8pWrM6j_XksvN@ashevche-desk.local>
On 26/05/12 05:43PM, Andy Shevchenko wrote:
> On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > >
> > > > > > > Add helpers that parses decimal numbers into 64-bit number, i.e., decimal
> > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > are ignored.
> > > > > >
> > > > > > Whilst Rodrigo has already replied to say there will be another version
> > > > > > I'd like to request final feedback from those who were involved in the parser
> > > > > > discussions.
> > > > > >
> > > > > > They got very involved and I'm far from an expert in the right way to do
> > > > > > this stuff.
> > > > > >
> > > > > > I don't think David Laight was +CC so I've added that.
> > > > > > David, Andy - I think you two were most involved in that discussion:
> > > > > > Any objections to the end result?
> > > > >
> > > > > I already said a few times about the naming. I do not like the kstrto*()
> > > > > be semantically different on how they treat the input. Second point is
> > > > > to avoid code duplication, but this one is less of a concern since the
> > > > > new code is in the library close to the other potentially duplicate code
> > > > > piece and hence can be addressed later.
> > > >
> > > > I suppose I reached into kstrtodec64() and kstrtoudec64() because it aligns
> > > > with your expectations for kstrto*() semantics, no? Those include:
> > > > - overflow check;
> > > > - extensive input validation;
> > > > - optional '\n' in the end;
> > > > - mandatory nul-termination.
> > > >
> > > > am I missing anything?
> > >
> > > When we add scale we basically make that not true. Moreover the code in this
> > > patch makes scale == number_of_characters which I think a bit fragile, however
> > > it's about the fractional part when the amount of digits is equal to scale.
> >
> > That is not really the case. It is being set as a limit, so it does check for
> > truncation and zero-padding.
>
> I do not see it happens in _parse_integer_limit(). It doesn't try to parse more
> characters than it's requested in max_chars. It doesn't check if there are more
> character nor their converted values.
>
> > > To make this work as expected we need to add an additional call like
> > > kstrtoull() (and perhaps drop that \n and NUL-terminator checks) and see
> > > if that overflows or not. Since it's a fractional part it must have less
> > > than 20 (decimal) digits there, so we check the rv (or how many digits
> > > were parsed successfully) and compare to 20. If it's more, we got too many
> > > decimal digits.
> >
> > For overflow it checks the KSTRTOX_OVERFLOW flag and leverages check_mul_overflow()
> > and check_add_overflow() when combining fractional and integer parts. The amount
> > of characters is not really important there. The scale cannot be bigger than 19 and
> > that makes sure that int_pow() does not overflow. The code uses _parse_integer_limit()
> > due to the nature of input and to avoid 64-bit division, kstrtoull() at any point
> > (parsing integer or fractional parts) does not make much sense.
>
> Under 'like kstrotoull()' I meant something that repeats needed functionality.
> I believe it's parse_integer() (without limit).
I think we are going in circles here and we could look at the code instead:
- integer parsing with _parse_integer()
- overflow check and validation of the return value
- fractional parsing with _parse_integer_limit()
- overflow check and validation of the return value
- extra scaling and truncation happening outside if needed.
- check for input termination
- combination of integer and fractional parts with check_mul_overflow() and check_add_overflow()
> > > Maybe I'm missing these checks already performed?
> > >
> > > > > Having the test cases is a big benefit, and that part I like the most.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* Re: [PATCH v9 0/4] kunit: Add support for suppressing warning backtraces
From: Albert Esteve @ 2026-05-12 15:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Alexandre Ghiti, linux-kernel,
linux-arch, linux-kselftest, kunit-dev, dri-devel, workflows,
linux-riscv, linux-doc, peterz, Alessandro Carminati,
Guenter Roeck, Kees Cook, Linux Kernel Functional Testing,
Maíra Canal, Dan Carpenter, Simona Vetter
In-Reply-To: <20260508165203.1cd1b27e664754d18dbea899@linux-foundation.org>
On Sat, May 9, 2026 at 1:52 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Fri, 08 May 2026 17:02:44 +0200 Albert Esteve <aesteve@redhat.com> wrote:
>
> > Some unit tests intentionally trigger warning backtraces by passing bad
> > parameters to kernel API functions. Such unit tests typically check the
> > return value from such calls, not the existence of the warning backtrace.
> >
> > ...
> >
> > Solve the problem by providing a means to suppress warning backtraces
> > originating from the current kthread while executing test code.
> > Since each KUnit test runs in its own kthread, this effectively scopes
> > suppression to the test that enabled it, without requiring any
> > architecture-specific code.
>
> Thanks. AI review has a bunch of questions:
> https://sashiko.dev/#/patchset/20260508-kunit_add_support-v9-0-99df7aa880f6@redhat.com
>
Hi! Most look like valid concerns/issues. I will send a new version
addressing them then, and monitor sashiko to ensure I clear
everything.
Thanks for the link.
BR,
Albert.
^ permalink raw reply
* [PATCH] docs: reporting-issues: clarify advice wording
From: Chen-Shi-Hong @ 2026-05-12 15:04 UTC (permalink / raw)
To: linux; +Cc: corbet, skhan, linux-doc, linux-kernel, Chen-Shi-Hong
A previous change replaced "these advices" with "this advice", but that
wording can be read too narrowly and may seem to refer only to a single
recommendation.
Use "all of this advice" instead to make it clearer that the sentence
refers to the broader set of recommendations in the paragraph.
Signed-off-by: Chen-Shi-Hong <eric039eric@gmail.com>
---
Documentation/admin-guide/reporting-issues.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 731865b5e8ff..87dd874fffcf 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -129,7 +129,7 @@ After these preparations you'll now enter the main part:
situations; during the merge window that actually might be even the best
approach, but in that development phase it can be an even better idea to
suspend your efforts for a few days anyway. Whatever version you choose,
- ideally use a 'vanilla' build. Ignoring this advice will dramatically
+ ideally use a 'vanilla' build. Ignoring all of this advice will dramatically
increase the risk your report will be rejected or ignored.
* Ensure the kernel you just installed does not 'taint' itself when
@@ -795,7 +795,7 @@ Install a fresh kernel for testing
situations; during the merge window that actually might be even the best
approach, but in that development phase it can be an even better idea to
suspend your efforts for a few days anyway. Whatever version you choose,
- ideally use a 'vanilla' built. Ignoring this advice will dramatically
+ ideally use a 'vanilla' built. Ignoring all of this advice will dramatically
increase the risk your report will be rejected or ignored.*
As mentioned in the detailed explanation for the first step already: Like most
--
2.53.0
^ permalink raw reply related
* Re: [kees:for-next/hardening 1/1] htmldocs: Documentation/driver-api/basics:127: ./include/linux/stddef.h:110: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
From: Kees Cook @ 2026-05-12 14:56 UTC (permalink / raw)
To: Gustavo A. R. Silva; +Cc: kernel test robot, oe-kbuild-all, linux-doc
In-Reply-To: <202605120507.9iQRMgKR-lkp@intel.com>
On Tue, May 12, 2026 at 05:06:30AM +0200, kernel test robot wrote:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/hardening
> head: 3c74955937520e6aabc0ec921b1bfe01734c6abc
> commit: 3c74955937520e6aabc0ec921b1bfe01734c6abc [1/1] stddef: Document designated initializer semantics for __TRAILING_OVERLAP()
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
> reproduce: (https://download.01.org/0day-ci/archive/20260512/202605120507.9iQRMgKR-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202605120507.9iQRMgKR-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> --------------------------------------------------------------------------------------------^
> Documentation/driver-api/basics:42: ./kernel/time/time.c:370: WARNING: Duplicate C declaration, also defined at driver-api/basics:436.
> Declaration is '.. c:function:: unsigned int jiffies_to_msecs (const unsigned long j)'. [duplicate_declaration.c]
> Documentation/driver-api/basics:42: ./kernel/time/time.c:393: WARNING: Duplicate C declaration, also defined at driver-api/basics:453.
> Declaration is '.. c:function:: unsigned int jiffies_to_usecs (const unsigned long j)'. [duplicate_declaration.c]
> >> Documentation/driver-api/basics:127: ./include/linux/stddef.h:110: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/driver-api/basics:127: ./include/linux/stddef.h:115: ERROR: Unexpected indentation. [docutils]
> >> Documentation/driver-api/basics:127: ./include/linux/stddef.h:116: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
> Documentation/driver-api/basics:127: ./include/linux/stddef.h:117: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/driver-api/basics:127: ./include/linux/stddef.h:122: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/driver-api/basics:127: ./include/linux/stddef.h:124: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/driver-api/basics:127: ./include/linux/stddef.h:139: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
> Documentation/driver-api/basics:127: ./include/linux/stddef.h:140: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Oh, hrm, there are a lot of errors in stddef.h for the "htmldocs" make
target. Gustavo, can you see what's needed to fix these?
-Kees
--
Kees Cook
^ permalink raw reply
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Andy Shevchenko @ 2026-05-12 14:43 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: Jonathan Cameron, Rodrigo Alencar via B4 Relay, rodrigo.alencar,
linux-kernel, linux-iio, devicetree, linux-doc, David Lechner,
Andy Shevchenko, Lars-Peter Clausen, Michael Hennerich,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
Andrew Morton, Petr Mladek, Steven Rostedt, Rasmus Villemoes,
Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <dxjg2sdyxb7ieb4abmeyyye7qok6cczrxabpsjyjhcbehwoec3@sbbqoo4wmzre>
On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > >
> > > > > > Add helpers that parses decimal numbers into 64-bit number, i.e., decimal
> > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > are ignored.
> > > > >
> > > > > Whilst Rodrigo has already replied to say there will be another version
> > > > > I'd like to request final feedback from those who were involved in the parser
> > > > > discussions.
> > > > >
> > > > > They got very involved and I'm far from an expert in the right way to do
> > > > > this stuff.
> > > > >
> > > > > I don't think David Laight was +CC so I've added that.
> > > > > David, Andy - I think you two were most involved in that discussion:
> > > > > Any objections to the end result?
> > > >
> > > > I already said a few times about the naming. I do not like the kstrto*()
> > > > be semantically different on how they treat the input. Second point is
> > > > to avoid code duplication, but this one is less of a concern since the
> > > > new code is in the library close to the other potentially duplicate code
> > > > piece and hence can be addressed later.
> > >
> > > I suppose I reached into kstrtodec64() and kstrtoudec64() because it aligns
> > > with your expectations for kstrto*() semantics, no? Those include:
> > > - overflow check;
> > > - extensive input validation;
> > > - optional '\n' in the end;
> > > - mandatory nul-termination.
> > >
> > > am I missing anything?
> >
> > When we add scale we basically make that not true. Moreover the code in this
> > patch makes scale == number_of_characters which I think a bit fragile, however
> > it's about the fractional part when the amount of digits is equal to scale.
>
> That is not really the case. It is being set as a limit, so it does check for
> truncation and zero-padding.
I do not see it happens in _parse_integer_limit(). It doesn't try to parse more
characters than it's requested in max_chars. It doesn't check if there are more
character nor their converted values.
> > To make this work as expected we need to add an additional call like
> > kstrtoull() (and perhaps drop that \n and NUL-terminator checks) and see
> > if that overflows or not. Since it's a fractional part it must have less
> > than 20 (decimal) digits there, so we check the rv (or how many digits
> > were parsed successfully) and compare to 20. If it's more, we got too many
> > decimal digits.
>
> For overflow it checks the KSTRTOX_OVERFLOW flag and leverages check_mul_overflow()
> and check_add_overflow() when combining fractional and integer parts. The amount
> of characters is not really important there. The scale cannot be bigger than 19 and
> that makes sure that int_pow() does not overflow. The code uses _parse_integer_limit()
> due to the nature of input and to avoid 64-bit division, kstrtoull() at any point
> (parsing integer or fractional parts) does not make much sense.
Under 'like kstrotoull()' I meant something that repeats needed functionality.
I believe it's parse_integer() (without limit).
> > Maybe I'm missing these checks already performed?
> >
> > > > Having the test cases is a big benefit, and that part I like the most.
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* [RFC PATCH v2 28/28] Docs/admin-guide/mm/damon/usage: update for memcg damon filter
From: SeongJae Park @ 2026-05-12 14:36 UTC (permalink / raw)
Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
linux-kernel, linux-mm
In-Reply-To: <20260512143645.113201-1-sj@kernel.org>
Update DAMON usage document for the newly added belonging memory cgroup
attribute monitoring feature.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 465bcdf89b182..84741b4cd1877 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -74,7 +74,7 @@ comma (",").
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ │ :ref:`probes <damon_usage_sysfs_probes>`/nr_probes
│ │ │ │ │ │ │ 0/filters/nr_filters
- │ │ │ │ │ │ │ │ │ 0/type,matching,allow
+ │ │ │ │ │ │ │ │ │ 0/type,matching,allow,path
│ │ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
@@ -289,7 +289,9 @@ the data attribute for the probe.
In the beginning, ``filters`` directory has only one file, ``nr_filters``.
Writing a number (``N``) to the file creates the number of child directories
named ``0`` to ``N-1``. Each directory represents each filter and work in a
-way similar to that for :ref:`DAMOS filter <sysfs_filters>`.
+way similar to that for :ref:`DAMOS filter <sysfs_filters>`. When the filter
+``type`` is ``memcg``, ``path`` file works the role of ``memcg_path`` for
+:ref:`DAMOS filter <sysfs_filters>`.
.. _sysfs_targets:
--
2.47.3
^ permalink raw reply related
* [RFC PATCH v2 27/28] Docs/mm/damon/design: update for memcg damon filter
From: SeongJae Park @ 2026-05-12 14:36 UTC (permalink / raw)
Cc: SeongJae Park, Liam R. Howlett, Andrew Morton, David Hildenbrand,
Jonathan Corbet, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, damon, linux-doc,
linux-kernel, linux-mm
In-Reply-To: <20260512143645.113201-1-sj@kernel.org>
Update DAMON design document for the newly added belonging memory cgroup
attribute monitoring feature.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/mm/damon/design.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 887b45cbeb716..a24f9f00d1837 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -293,8 +293,8 @@ registration is made by specifying a probe per attribute. Each of the probe
specifies a rule to determine if a given memory region has the related
attribute. The rule is constructed with multiple filters. The filters work
same to :ref:`DAMOS filters <damon_design_damos_filters>` except the supported
-filter types. Currently only ``anon`` filter type is supported for data
-attributes monitoring.
+filter types. Currently only ``anon`` and ``memcg`` filter types are supported
+for data attributes monitoring.
If such probes are registered, DAMON executes the probes for each region's
sampling memory when it does the access :ref:`sampling
--
2.47.3
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox