* [PATCH v3 21/28] nfsd: allow nfsd4_encode_fattr4_change() to work with no export
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
In the context of a CB_NOTIFY callback, we may not have easy access to
a svc_export. nfsd will not currently grant a delegation on a the V4 root
however, so this should be safe.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4xdr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b9b173ec7421..91681aea9d7f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3257,7 +3257,7 @@ static __be32 nfsd4_encode_fattr4_change(struct xdr_stream *xdr,
{
const struct svc_export *exp = args->exp;
- if (unlikely(exp->ex_flags & NFSEXP_V4ROOT)) {
+ if (exp && unlikely(exp->ex_flags & NFSEXP_V4ROOT)) {
u32 flush_time = convert_to_wallclock(exp->cd->flush_time);
if (xdr_stream_encode_u32(xdr, flush_time) != XDR_UNIT)
--
2.54.0
^ permalink raw reply related
* [PATCH v3 22/28] nfsd: send basic file attributes in CB_NOTIFY
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
In addition to the filename, send attributes about the inode in a
CB_NOTIFY event. This patch just adds a the basic inode information that
can be acquired via GETATTR.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4xdr.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 91681aea9d7f..f85581ebbd10 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4104,12 +4104,21 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
goto out;
}
+#define CB_NOTIFY_STATX_REQUEST_MASK (STATX_BASIC_STATS | \
+ STATX_BTIME | \
+ STATX_CHANGE_COOKIE)
+
static bool
nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
struct dentry *dentry, struct nfs4_delegation *dp,
struct nfsd_file *nf, char *name, u32 namelen)
{
+ struct path path = { .mnt = nf->nf_file->f_path.mnt,
+ .dentry = dentry };
+ struct nfsd4_fattr_args args = { };
uint32_t *attrmask;
+ __be32 status;
+ int ret;
/* Reserve space for attrmask */
attrmask = xdr_reserve_space(xdr, 3 * sizeof(uint32_t));
@@ -4120,6 +4129,41 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
ne->ne_file.len = namelen;
ne->ne_attrs.attrmask.element = attrmask;
+ /* FIXME: d_find_alias for inode ? */
+ if (!path.dentry || !d_inode(path.dentry))
+ goto noattrs;
+
+ /*
+ * It is possible that the client was granted a delegation when a file
+ * was created. Note that we don't issue a CB_GETATTR here since stale
+ * attributes are presumably ok.
+ */
+ ret = vfs_getattr(&path, &args.stat, CB_NOTIFY_STATX_REQUEST_MASK, AT_STATX_SYNC_AS_STAT);
+ if (ret)
+ goto noattrs;
+
+ args.change_attr = nfsd4_change_attribute(&args.stat);
+
+ attrmask[0] = FATTR4_WORD0_TYPE | FATTR4_WORD0_CHANGE |
+ FATTR4_WORD0_SIZE | FATTR4_WORD0_FILEID;
+ attrmask[1] = FATTR4_WORD1_MODE | FATTR4_WORD1_NUMLINKS | FATTR4_WORD1_RAWDEV |
+ FATTR4_WORD1_SPACE_USED | FATTR4_WORD1_TIME_ACCESS |
+ FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_MODIFY;
+ attrmask[2] = 0;
+
+ if (args.stat.result_mask & STATX_BTIME)
+ attrmask[1] |= FATTR4_WORD1_TIME_CREATE;
+
+ ne->ne_attrs.attrmask.count = 2;
+ ne->ne_attrs.attr_vals.data = (u8 *)xdr->p;
+
+ status = nfsd4_encode_attr_vals(xdr, attrmask, &args);
+ if (status != nfs_ok)
+ goto noattrs;
+
+ ne->ne_attrs.attr_vals.len = (u8 *)xdr->p - ne->ne_attrs.attr_vals.data;
+ return true;
+noattrs:
attrmask[0] = 0;
attrmask[1] = 0;
attrmask[2] = 0;
--
2.54.0
^ permalink raw reply related
* [PATCH v3 23/28] nfsd: allow encoding a filehandle into fattr4 without a svc_fh
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
The current fattr4 encoder requires a svc_fh in order to encode the
filehandle. This is not available in a CB_NOTIFY callback. Add a a new
"fhandle" field to struct nfsd4_fattr_args and copy the filehandle into
there from the svc_fh. CB_NOTIFY will populate it via other means.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4xdr.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f85581ebbd10..a9cdf7a3f8b3 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2701,7 +2701,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
}
static __be32 nfsd4_encode_nfs_fh4(struct xdr_stream *xdr,
- struct knfsd_fh *fh_handle)
+ const struct knfsd_fh *fh_handle)
{
return nfsd4_encode_opaque(xdr, fh_handle->fh_raw, fh_handle->fh_size);
}
@@ -3144,6 +3144,7 @@ struct nfsd4_fattr_args {
struct svc_fh *fhp;
struct svc_export *exp;
struct dentry *dentry;
+ struct knfsd_fh fhandle;
struct kstat stat;
struct kstatfs statfs;
struct nfs4_acl *acl;
@@ -3359,7 +3360,7 @@ static __be32 nfsd4_encode_fattr4_acl(struct xdr_stream *xdr,
static __be32 nfsd4_encode_fattr4_filehandle(struct xdr_stream *xdr,
const struct nfsd4_fattr_args *args)
{
- return nfsd4_encode_nfs_fh4(xdr, &args->fhp->fh_handle);
+ return nfsd4_encode_nfs_fh4(xdr, &args->fhandle);
}
static __be32 nfsd4_encode_fattr4_fileid(struct xdr_stream *xdr,
@@ -3969,19 +3970,23 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
if (err)
goto out_nfserr;
}
- if ((attrmask[0] & (FATTR4_WORD0_FILEHANDLE | FATTR4_WORD0_FSID)) &&
- !fhp) {
- tempfh = kmalloc_obj(struct svc_fh);
- status = nfserr_jukebox;
- if (!tempfh)
- goto out;
- fh_init(tempfh, NFS4_FHSIZE);
- status = fh_compose(tempfh, exp, dentry, NULL);
- if (status)
- goto out;
- args.fhp = tempfh;
- } else
- args.fhp = fhp;
+
+ args.fhp = fhp;
+ if ((attrmask[0] & (FATTR4_WORD0_FILEHANDLE | FATTR4_WORD0_FSID))) {
+ if (!args.fhp) {
+ tempfh = kmalloc_obj(struct svc_fh);
+ status = nfserr_jukebox;
+ if (!tempfh)
+ goto out;
+ fh_init(tempfh, NFS4_FHSIZE);
+ status = fh_compose(tempfh, exp, dentry, NULL);
+ if (status)
+ goto out;
+ args.fhp = tempfh;
+ }
+ if (args.fhp)
+ fh_copy_shallow(&args.fhandle, &args.fhp->fh_handle);
+ }
if (attrmask[0] & FATTR4_WORD0_ACL) {
err = nfsd4_get_nfs4_acl(rqstp, dentry, &args.acl);
--
2.54.0
^ permalink raw reply related
* [PATCH v3 24/28] nfsd: add a fi_connectable flag to struct nfs4_file
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
When encoding a filehandle for a CB_NOTIFY, there is no svc_export
available, but the server needs to know whether to encode a connectable
filehandle. Add a flag to the nfs4_file that tells whether the
svc_export under which a directory delegation was acquired has subtree
checking enabled, in which case it needs connectable filehandles.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4state.c | 1 +
fs/nfsd/state.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f00aba95753c..3050980a778c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5167,6 +5167,7 @@ static void nfsd4_file_init(const struct svc_fh *fh, struct nfs4_file *fp)
memset(fp->fi_access, 0, sizeof(fp->fi_access));
fp->fi_aliased = false;
fp->fi_inode = d_inode(fh->fh_dentry);
+ fp->fi_connectable = !(fh->fh_export->ex_flags & NFSEXP_NOSUBTREECHECK);
#ifdef CONFIG_NFSD_PNFS
INIT_LIST_HEAD(&fp->fi_lo_states);
atomic_set(&fp->fi_lo_recalls, 0);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 570d66fc8297..caa3f5f78dc1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -747,6 +747,7 @@ struct nfs4_file {
int fi_delegees;
struct knfsd_fh fi_fhandle;
bool fi_had_conflict;
+ bool fi_connectable;
#ifdef CONFIG_NFSD_PNFS
struct list_head fi_lo_states;
atomic_t fi_lo_recalls;
--
2.54.0
^ permalink raw reply related
* [PATCH v3 25/28] nfsd: add the filehandle to returned attributes in CB_NOTIFY
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
nfsd's usual fh_compose routine requires a svc_export and fills out a
svc_fh. In the context of a CB_NOTIFY there is no such export to
consult.
Add a new routine that composes a filehandle with only a parent
filehandle and nfs4_file. Use that to fill out the fhandle field in the
nfsd4_fattr_args.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4xdr.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index a9cdf7a3f8b3..5d7d8545c904 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4109,6 +4109,39 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
goto out;
}
+static bool
+setup_notify_fhandle(struct dentry *dentry, struct nfs4_file *fi,
+ struct nfsd_file *nf, struct nfsd4_fattr_args *args)
+{
+ int fileid_type, fsid_len, maxsize, flags = 0;
+ struct knfsd_fh *fhp = &args->fhandle;
+ struct inode *inode = d_inode(dentry);
+ struct inode *parent = NULL;
+ struct fid *fid;
+
+ fsid_len = key_len(fi->fi_fhandle.fh_fsid_type);
+ fhp->fh_size = 4 + fsid_len;
+
+ /* Copy first 4 bytes + fsid */
+ memcpy(&fhp->fh_raw, &fi->fi_fhandle.fh_raw, fhp->fh_size);
+
+ fid = (struct fid *)(fh_fsid(fhp) + fsid_len/4);
+ maxsize = (NFS4_FHSIZE - fhp->fh_size)/4;
+
+ if (fi->fi_connectable && !S_ISDIR(inode->i_mode)) {
+ parent = d_inode(nf->nf_file->f_path.dentry);
+ flags = EXPORT_FH_CONNECTABLE;
+ }
+
+ fileid_type = exportfs_encode_inode_fh(inode, fid, &maxsize, parent, flags);
+ if (fileid_type < 0)
+ return false;
+
+ fhp->fh_fileid_type = fileid_type;
+ fhp->fh_size += maxsize * 4;
+ return true;
+}
+
#define CB_NOTIFY_STATX_REQUEST_MASK (STATX_BASIC_STATS | \
STATX_BTIME | \
STATX_CHANGE_COOKIE)
@@ -4118,6 +4151,7 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
struct dentry *dentry, struct nfs4_delegation *dp,
struct nfsd_file *nf, char *name, u32 namelen)
{
+ struct nfs4_file *fi = dp->dl_stid.sc_file;
struct path path = { .mnt = nf->nf_file->f_path.mnt,
.dentry = dentry };
struct nfsd4_fattr_args args = { };
@@ -4156,6 +4190,9 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_MODIFY;
attrmask[2] = 0;
+ if (setup_notify_fhandle(dentry, fi, nf, &args))
+ attrmask[0] |= FATTR4_WORD0_FILEHANDLE;
+
if (args.stat.result_mask & STATX_BTIME)
attrmask[1] |= FATTR4_WORD1_TIME_CREATE;
--
2.54.0
^ permalink raw reply related
* [PATCH v3 26/28] nfsd: properly track requested child attributes
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
Track the union of requested and supported child attributes in the
delegation, and only encode the attributes in that union when sending
add/remove/rename updates.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4proc.c | 2 ++
fs/nfsd/nfs4state.c | 18 ++++++++++++++++++
fs/nfsd/nfs4xdr.c | 15 ++++++---------
fs/nfsd/state.h | 3 +++
4 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 01e3bf9e1839..a807a55dddf9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2553,6 +2553,8 @@ nfsd4_get_dir_delegation(struct svc_rqst *rqstp,
gdd->gddrnf_status = GDD4_OK;
memcpy(&gdd->gddr_stateid, &dd->dl_stid.sc_stateid, sizeof(gdd->gddr_stateid));
+ gdd->gddr_child_attributes[0] = dd->dl_child_attrs[0];
+ gdd->gddr_child_attributes[1] = dd->dl_child_attrs[1];
nfs4_put_stid(&dd->dl_stid);
return nfs_ok;
}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3050980a778c..d2b3283c0d31 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -9808,6 +9808,21 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry *dentry,
return status;
}
+#define GDD_WORD0_CHILD_ATTRS (FATTR4_WORD0_TYPE | \
+ FATTR4_WORD0_CHANGE | \
+ FATTR4_WORD0_SIZE | \
+ FATTR4_WORD0_FILEID | \
+ FATTR4_WORD0_FILEHANDLE)
+
+#define GDD_WORD1_CHILD_ATTRS (FATTR4_WORD1_MODE | \
+ FATTR4_WORD1_NUMLINKS | \
+ FATTR4_WORD1_RAWDEV | \
+ FATTR4_WORD1_SPACE_USED | \
+ FATTR4_WORD1_TIME_ACCESS | \
+ FATTR4_WORD1_TIME_METADATA | \
+ FATTR4_WORD1_TIME_MODIFY | \
+ FATTR4_WORD1_TIME_CREATE)
+
/**
* nfsd_get_dir_deleg - attempt to get a directory delegation
* @cstate: compound state
@@ -9877,6 +9892,9 @@ nfsd_get_dir_deleg(struct nfsd4_compound_state *cstate,
dp->dl_stid.sc_export =
exp_get(cstate->current_fh.fh_export);
+ dp->dl_child_attrs[0] = gdd->gdda_child_attributes[0] & GDD_WORD0_CHILD_ATTRS;
+ dp->dl_child_attrs[1] = gdd->gdda_child_attributes[1] & GDD_WORD1_CHILD_ATTRS;
+
fl = nfs4_alloc_init_lease(dp, gdd->gddr_notification[0]);
if (!fl)
goto out_put_stid;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5d7d8545c904..bd1142590d2b 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4183,18 +4183,15 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
args.change_attr = nfsd4_change_attribute(&args.stat);
- attrmask[0] = FATTR4_WORD0_TYPE | FATTR4_WORD0_CHANGE |
- FATTR4_WORD0_SIZE | FATTR4_WORD0_FILEID;
- attrmask[1] = FATTR4_WORD1_MODE | FATTR4_WORD1_NUMLINKS | FATTR4_WORD1_RAWDEV |
- FATTR4_WORD1_SPACE_USED | FATTR4_WORD1_TIME_ACCESS |
- FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_MODIFY;
+ attrmask[0] = dp->dl_child_attrs[0];
+ attrmask[1] = dp->dl_child_attrs[1];
attrmask[2] = 0;
- if (setup_notify_fhandle(dentry, fi, nf, &args))
- attrmask[0] |= FATTR4_WORD0_FILEHANDLE;
+ if (!setup_notify_fhandle(dentry, fi, nf, &args))
+ attrmask[0] &= ~FATTR4_WORD0_FILEHANDLE;
- if (args.stat.result_mask & STATX_BTIME)
- attrmask[1] |= FATTR4_WORD1_TIME_CREATE;
+ if (!(args.stat.result_mask & STATX_BTIME))
+ attrmask[1] &= ~FATTR4_WORD1_TIME_CREATE;
ne->ne_attrs.attrmask.count = 2;
ne->ne_attrs.attr_vals.data = (u8 *)xdr->p;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index caa3f5f78dc1..cb1ac3248fe8 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -284,6 +284,9 @@ struct nfs4_delegation {
struct timespec64 dl_atime;
struct timespec64 dl_mtime;
struct timespec64 dl_ctime;
+
+ /* For dir delegations */
+ uint32_t dl_child_attrs[2];
};
static inline bool deleg_is_read(u32 dl_type)
--
2.54.0
^ permalink raw reply related
* [PATCH v3 27/28] nfsd: track requested dir attributes
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
Track the union of requested and supported dir attributes in the
delegation, and only encode the attributes in that union when sending
add/remove/rename updates.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4proc.c | 9 ++++++---
fs/nfsd/nfs4state.c | 14 +++++++++++++-
fs/nfsd/state.h | 2 ++
3 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a807a55dddf9..e4717e1e3d93 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2506,9 +2506,10 @@ nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return status == nfserr_same ? nfs_ok : status;
}
-#define SUPPORTED_NOTIFY_MASK (BIT(NOTIFY4_REMOVE_ENTRY) | \
- BIT(NOTIFY4_ADD_ENTRY) | \
- BIT(NOTIFY4_RENAME_ENTRY) | \
+#define SUPPORTED_NOTIFY_MASK (BIT(NOTIFY4_CHANGE_DIR_ATTRS) | \
+ BIT(NOTIFY4_REMOVE_ENTRY) | \
+ BIT(NOTIFY4_ADD_ENTRY) | \
+ BIT(NOTIFY4_RENAME_ENTRY) | \
BIT(NOTIFY4_GFLAG_EXTEND))
static __be32
@@ -2555,6 +2556,8 @@ nfsd4_get_dir_delegation(struct svc_rqst *rqstp,
memcpy(&gdd->gddr_stateid, &dd->dl_stid.sc_stateid, sizeof(gdd->gddr_stateid));
gdd->gddr_child_attributes[0] = dd->dl_child_attrs[0];
gdd->gddr_child_attributes[1] = dd->dl_child_attrs[1];
+ gdd->gddr_dir_attributes[0] = dd->dl_dir_attrs[0];
+ gdd->gddr_dir_attributes[1] = dd->dl_dir_attrs[1];
nfs4_put_stid(&dd->dl_stid);
return nfs_ok;
}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d2b3283c0d31..b60aa2cf1eba 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -9823,6 +9823,15 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry *dentry,
FATTR4_WORD1_TIME_MODIFY | \
FATTR4_WORD1_TIME_CREATE)
+#define GDD_WORD0_DIR_ATTRS (FATTR4_WORD0_CHANGE | \
+ FATTR4_WORD0_SIZE)
+
+#define GDD_WORD1_DIR_ATTRS (FATTR4_WORD1_NUMLINKS | \
+ FATTR4_WORD1_SPACE_USED | \
+ FATTR4_WORD1_TIME_ACCESS | \
+ FATTR4_WORD1_TIME_METADATA | \
+ FATTR4_WORD1_TIME_MODIFY)
+
/**
* nfsd_get_dir_deleg - attempt to get a directory delegation
* @cstate: compound state
@@ -9892,10 +9901,13 @@ nfsd_get_dir_deleg(struct nfsd4_compound_state *cstate,
dp->dl_stid.sc_export =
exp_get(cstate->current_fh.fh_export);
+ dp->dl_notify_mask = gdd->gddr_notification[0];
dp->dl_child_attrs[0] = gdd->gdda_child_attributes[0] & GDD_WORD0_CHILD_ATTRS;
dp->dl_child_attrs[1] = gdd->gdda_child_attributes[1] & GDD_WORD1_CHILD_ATTRS;
+ dp->dl_dir_attrs[0] = gdd->gdda_dir_attributes[0] & GDD_WORD0_DIR_ATTRS;
+ dp->dl_dir_attrs[1] = gdd->gdda_dir_attributes[1] & GDD_WORD1_DIR_ATTRS;
- fl = nfs4_alloc_init_lease(dp, gdd->gddr_notification[0]);
+ fl = nfs4_alloc_init_lease(dp, dp->dl_notify_mask);
if (!fl)
goto out_put_stid;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index cb1ac3248fe8..4c5848285378 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -286,7 +286,9 @@ struct nfs4_delegation {
struct timespec64 dl_ctime;
/* For dir delegations */
+ uint32_t dl_notify_mask;
uint32_t dl_child_attrs[2];
+ uint32_t dl_dir_attrs[2];
};
static inline bool deleg_is_read(u32 dl_type)
--
2.54.0
^ permalink raw reply related
* [PATCH v3 28/28] nfsd: add support to CB_NOTIFY for dir attribute changes
From: Jeff Layton @ 2026-04-28 7:10 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Chuck Lever,
Alexander Aring, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
Anna Schumaker, Amir Goldstein
Cc: Calum Mackay, linux-fsdevel, linux-kernel, linux-trace-kernel,
linux-doc, linux-nfs, Jeff Layton
In-Reply-To: <20260428-dir-deleg-v3-0-5a0780ba9def@kernel.org>
If the client requested dir attribute change notifications, send those
alongside any set of add/remove/rename events. Note that the server will
still recall the delegation on a SETATTR, so these are only sent for
changes to child dirents.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4state.c | 25 ++++++++++++++++++++--
fs/nfsd/nfs4xdr.c | 61 +++++++++++++++++++++++++++++++++++++++++++++--------
fs/nfsd/xdr4.h | 2 ++
3 files changed, 77 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b60aa2cf1eba..bb9093e3933f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3479,10 +3479,15 @@ nfsd4_cb_notify_prepare(struct nfsd4_callback *cb)
struct nfsd_notify_event *events[NOTIFY4_EVENT_QUEUE_SIZE];
struct xdr_buf xdr = { .buflen = PAGE_SIZE * NOTIFY4_PAGE_ARRAY_SIZE,
.pages = ncn->ncn_pages };
+ int limit = NOTIFY4_EVENT_QUEUE_SIZE;
struct xdr_stream stream;
struct nfsd_file *nf;
- int count, i;
bool error = false;
+ int count, i;
+
+ /* Save a slot for dir attr update if requested */
+ if (dp->dl_notify_mask & BIT(NOTIFY4_CHANGE_DIR_ATTRS))
+ --limit;
xdr_init_encode_pages(&stream, &xdr);
@@ -3496,7 +3501,7 @@ nfsd4_cb_notify_prepare(struct nfsd4_callback *cb)
}
/* we can't keep up! */
- if (count > NOTIFY4_EVENT_QUEUE_SIZE) {
+ if (count > limit) {
spin_unlock(&ncn->ncn_lock);
goto out_recall;
}
@@ -3543,6 +3548,22 @@ nfsd4_cb_notify_prepare(struct nfsd4_callback *cb)
nfsd_notify_event_put(nne);
}
if (!error) {
+ if (dp->dl_notify_mask & BIT(NOTIFY4_CHANGE_DIR_ATTRS)) {
+ u32 *maskp = (u32 *)xdr_reserve_space(&stream, sizeof(*maskp));
+
+ if (maskp) {
+ u8 *p = nfsd4_encode_dir_attr_change(&stream, dp, nf);
+
+ if (p) {
+ *maskp = BIT(NOTIFY4_CHANGE_DIR_ATTRS);
+ ncn->ncn_nf[count].notify_mask.count = 1;
+ ncn->ncn_nf[count].notify_mask.element = maskp;
+ ncn->ncn_nf[count].notify_vals.data = p;
+ ncn->ncn_nf[count].notify_vals.len = (u8 *)stream.p - p;
+ ++count;
+ }
+ }
+ }
ncn->ncn_nf_cnt = count;
nfsd_file_put(nf);
return true;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index bd1142590d2b..73f2fdf929ed 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4152,11 +4152,11 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
struct nfsd_file *nf, char *name, u32 namelen)
{
struct nfs4_file *fi = dp->dl_stid.sc_file;
- struct path path = { .mnt = nf->nf_file->f_path.mnt,
- .dentry = dentry };
+ struct path path = nf->nf_file->f_path;
struct nfsd4_fattr_args args = { };
uint32_t *attrmask;
__be32 status;
+ bool parent;
int ret;
/* Reserve space for attrmask */
@@ -4168,6 +4168,9 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
ne->ne_file.len = namelen;
ne->ne_attrs.attrmask.element = attrmask;
+ parent = (dentry == path.dentry);
+ path.dentry = dentry;
+
/* FIXME: d_find_alias for inode ? */
if (!path.dentry || !d_inode(path.dentry))
goto noattrs;
@@ -4183,15 +4186,20 @@ nfsd4_setup_notify_entry4(struct notify_entry4 *ne, struct xdr_stream *xdr,
args.change_attr = nfsd4_change_attribute(&args.stat);
- attrmask[0] = dp->dl_child_attrs[0];
- attrmask[1] = dp->dl_child_attrs[1];
- attrmask[2] = 0;
+ if (parent) {
+ attrmask[0] = dp->dl_dir_attrs[0];
+ attrmask[1] = dp->dl_dir_attrs[1];
+ } else {
+ attrmask[0] = dp->dl_child_attrs[0];
+ attrmask[1] = dp->dl_child_attrs[1];
- if (!setup_notify_fhandle(dentry, fi, nf, &args))
- attrmask[0] &= ~FATTR4_WORD0_FILEHANDLE;
+ if (!setup_notify_fhandle(dentry, fi, nf, &args))
+ attrmask[0] &= ~FATTR4_WORD0_FILEHANDLE;
- if (!(args.stat.result_mask & STATX_BTIME))
- attrmask[1] &= ~FATTR4_WORD1_TIME_CREATE;
+ if (!(args.stat.result_mask & STATX_BTIME))
+ attrmask[1] &= ~FATTR4_WORD1_TIME_CREATE;
+ }
+ attrmask[2] = 0;
ne->ne_attrs.attrmask.count = 2;
ne->ne_attrs.attr_vals.data = (u8 *)xdr->p;
@@ -4308,6 +4316,41 @@ u8 *nfsd4_encode_notify_event(struct xdr_stream *xdr, struct nfsd_notify_event *
return NULL;
}
+/**
+ * nfsd4_encode_dir_attr_change
+ * @xdr: stream to which to encode the fattr4
+ * @dp: delegation where the event occurred
+ * @nf: nfsd_file opened on the directory
+ *
+ * Encode a dir attr change event.
+ */
+u8 *nfsd4_encode_dir_attr_change(struct xdr_stream *xdr, struct nfs4_delegation *dp,
+ struct nfsd_file *nf)
+{
+ struct dentry *dentry = nf->nf_file->f_path.dentry;
+ struct notify_attr4 na = { };
+ struct name_snapshot n;
+ bool ret;
+ u8 *p = NULL;
+
+ if (!(dp->dl_notify_mask & BIT(NOTIFY4_CHANGE_DIR_ATTRS)))
+ return NULL;
+
+ take_dentry_name_snapshot(&n, dentry);
+ ret = nfsd4_setup_notify_entry4(&na.na_changed_entry, xdr,
+ dentry, dp, nf, (char *)n.name.name,
+ n.name.len);
+
+ /* Don't bother with the event if we're not encoding attrs */
+ if (ret && na.na_changed_entry.ne_attrs.attr_vals.len) {
+ p = (u8 *)xdr->p;
+ if (!xdrgen_encode_notify_attr4(xdr, &na))
+ p = NULL;
+ }
+ release_dentry_name_snapshot(&n);
+ return p;
+}
+
static void svcxdr_init_encode_from_buffer(struct xdr_stream *xdr,
struct xdr_buf *buf, __be32 *p, int bytes)
{
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index d276840aca50..cf7f0df68d63 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -958,6 +958,8 @@ __be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
u8 *nfsd4_encode_notify_event(struct xdr_stream *xdr, struct nfsd_notify_event *nne,
struct nfs4_delegation *dd, struct nfsd_file *nf,
u32 *notify_mask);
+u8 *nfsd4_encode_dir_attr_change(struct xdr_stream *xdr, struct nfs4_delegation *dp,
+ struct nfsd_file *nf);
extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp,
struct nfsd4_compound_state *, union nfsd4_op_u *u);
extern __be32 nfsd4_setclientid_confirm(struct svc_rqst *rqstp,
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v2] Documentation/rv: Replace stale website link
From: Randy Dunlap @ 2026-04-28 7:16 UTC (permalink / raw)
To: Gabriele Monaco, Steven Rostedt, Jonathan Corbet,
linux-trace-kernel, linux-doc, linux-kernel
Cc: matteo.martelli, skhan
In-Reply-To: <75eeb7196b3e65b4b5d5144f87aabd8b57793ebc.camel@redhat.com>
On 4/28/26 12:06 AM, Gabriele Monaco wrote:
> On Mon, 2026-04-27 at 09:50 -0700, Randy Dunlap wrote:
>> Tested-by: Randy Dunlap <rdunlap@infradead.org>
>> Acked-by: Randy Dunlap <rdunlap@infradead.org>
>
> Thanks for the ack!
>
>> although I don't care for the "J. Syst. Archit." abbreviation.
>> Does JSA use that? Not that I can see.
>
> That's the citation format I got from semanticscholar.org , it's indeed
> a bit ugly but it's apparently the ISO 4 abbreviation [1].
OK then. Leave it.
Thanks.
> Not sure if it would be neater to just use JSA which looks more
> official.
>
> Thanks,
> Gabriele
>
> [1] - https://dblp.org/db/journals/jsa/index.html
>
--
~Randy
^ permalink raw reply
* Re: [PATCH v2 2/2] module/kallsyms: sort function symbols and use binary search
From: Stanislaw Gruszka @ 2026-04-28 8:23 UTC (permalink / raw)
To: Petr Pavlu
Cc: linux-modules, Sami Tolvanen, Luis Chamberlain, linux-kernel,
linux-trace-kernel, live-patching, Daniel Gomez, Aaron Tomlin,
Steven Rostedt, Masami Hiramatsu, Jordan Rome, Viktor Malik
In-Reply-To: <88ae41dc-e5e0-442a-9b95-5125adf31e75@suse.com>
On Mon, Apr 27, 2026 at 03:51:31PM +0200, Petr Pavlu wrote:
> On 4/24/26 11:13 AM, Stanislaw Gruszka wrote:
> > On Thu, Apr 23, 2026 at 04:00:04PM +0200, Petr Pavlu wrote:
> >> On 3/27/26 12:00 PM, Stanislaw Gruszka wrote:
> [...]
> >>> diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
> >>> index f23126d804b2..d69e99e67707 100644
> >>> --- a/kernel/module/kallsyms.c
> >>> +++ b/kernel/module/kallsyms.c
> >>> @@ -10,6 +10,7 @@
> >>> #include <linux/kallsyms.h>
> >>> #include <linux/buildid.h>
> >>> #include <linux/bsearch.h>
> >>> +#include <linux/sort.h>
> >>> #include "internal.h"
> >>>
> >>> /* Lookup exported symbol in given range of kernel_symbols */
> >>> @@ -103,6 +104,95 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
> >>> return true;
> >>> }
> >>>
> >>> +static inline bool is_func_symbol(const Elf_Sym *sym)
> >>> +{
> >>> + return sym->st_shndx != SHN_UNDEF && sym->st_size != 0 &&
> >>> + ELF_ST_TYPE(sym->st_info) == STT_FUNC;
> >>> +}
> >>> +
> >>> +static unsigned int bsearch_func_symbol(struct mod_kallsyms *kallsyms,
> >>> + unsigned long addr,
> >>> + unsigned long *bestval,
> >>> + unsigned long *nextval)
> >>> +
> >>> +{
> >>> + unsigned int mid, low = 1, high = kallsyms->num_func_syms + 1;
> >>> + unsigned int best = 0;
> >>> + unsigned long thisval;
> >>> +
> >>> + while (low < high) {
> >>> + mid = low + (high - low) / 2;
> >>> + thisval = kallsyms_symbol_value(&kallsyms->symtab[mid]);
> >>> +
> >>> + if (thisval <= addr) {
> >>> + *bestval = thisval;
> >>> + best = mid;
> >>> + low = mid + 1;
> >>
> >> If thisval == addr, the search moves to the right and finds the last
> >> symbol with the same address. I believe it should do the opposite and
> >> return the first symbol to match the behavior of
> >> search_kallsyms_symbol().
> >
> > In the case of multiple symbols sharing the same address, we have
> > to pick one and ignore the others. I don’t think it matters much which
> > one is chosen in practice. Also, I expect function symbol addresses
> > to be unique, so this shouldn’t be a real issue.
>
> I think that the code should consistently pick the same answer. If
> someone uses aliases for their functions, the logic shouldn't
> arbitrarily return one of them, but preferably the first one, which
> should normally be the actual implementation.
>
> >
> >>> + } else {
> >>> + *nextval = thisval;
> >>> + high = mid;
> >>> + }
> >>> + }
> >>> +
> >>> + return best;
> >>> +}
> >>> +
> >>> +static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms,
> >>> + unsigned int symnum)
> >>> +{
> >>> + return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> >>> +}
> >>> +
> >>> +static unsigned int search_kallsyms_symbol(struct mod_kallsyms *kallsyms,
> >>> + unsigned long addr,
> >>> + unsigned long *bestval,
> >>> + unsigned long *nextval)
> >>> +{
> >>> + unsigned int i, best = 0;
> >>> +
> >>> + /*
> >>> + * Scan for closest preceding symbol and next symbol. (ELF starts
> >>> + * real symbols at 1). Skip the initial function symbols range
> >>> + * if num_func_syms is non-zero, those are handled separately for
> >>> + * the core TEXT segment lookup.
> >>> + */
> >>> + for (i = 1 + kallsyms->num_func_syms; i < kallsyms->num_symtab; i++) {
> >>> + const Elf_Sym *sym = &kallsyms->symtab[i];
> >>> + unsigned long thisval = kallsyms_symbol_value(sym);
> >>> +
> >>> + if (sym->st_shndx == SHN_UNDEF)
> >>> + continue;
> >>> +
> >>> + /*
> >>> + * We ignore unnamed symbols: they're uninformative
> >>> + * and inserted at a whim.
> >>> + */
> >>> + if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
> >>> + is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
> >>> + continue;
> >>> +
> >>> + if (thisval <= addr && thisval > *bestval) {
> >>> + best = i;
> >>> + *bestval = thisval;
> >>> + }
> >>> + if (thisval > addr && thisval < *nextval)
> >>> + *nextval = thisval;
> >>> + }
> >>> +
> >>> + return best;
> >>> +}
> >>> +
> >>> +static int elf_sym_cmp(const void *a, const void *b)
> >>> +{
> >>> + unsigned long val_a = kallsyms_symbol_value((const Elf_Sym *)a);
> >>> + unsigned long val_b = kallsyms_symbol_value((const Elf_Sym *)b);
> >>> +
> >>> + if (val_a < val_b)
> >>> + return -1;
> >>> +
> >>> + return val_a > val_b;
> >>
> >> Does this comparison function and the sort() call result in stable
> >> sorting? If val_a and val_b are the same, the sorting should preserve
> >> the original order.
> >
> > The kernel’s sort() implementation is not stable.
>
> Ok, I see it is a heapsort. It would require additional data to keep
> information about the original indexes for elf_sym_cmp() to use as
> a tiebreaker.
>
> >
> >>> +}
> >>> +
> >>> /*
> >>> * We only allocate and copy the strings needed by the parts of symtab
> >>> * we keep. This is simple, but has the effect of making multiple
> >>> @@ -115,9 +205,10 @@ void layout_symtab(struct module *mod, struct load_info *info)
> >>> Elf_Shdr *symsect = info->sechdrs + info->index.sym;
> >>> Elf_Shdr *strsect = info->sechdrs + info->index.str;
> >>> const Elf_Sym *src;
> >>> - unsigned int i, nsrc, ndst, strtab_size = 0;
> >>> + unsigned int i, nsrc, ndst, nfunc, strtab_size = 0;
> >>> struct module_memory *mod_mem_data = &mod->mem[MOD_DATA];
> >>> struct module_memory *mod_mem_init_data = &mod->mem[MOD_INIT_DATA];
> >>> + bool is_lp_mod = is_livepatch_module(mod);
> >>>
> >>> /* Put symbol section at end of init part of module. */
> >>> symsect->sh_flags |= SHF_ALLOC;
> >>> @@ -129,12 +220,14 @@ void layout_symtab(struct module *mod, struct load_info *info)
> >>> nsrc = symsect->sh_size / sizeof(*src);
> >>>
> >>> /* Compute total space required for the core symbols' strtab. */
> >>> - for (ndst = i = 0; i < nsrc; i++) {
> >>> - if (i == 0 || is_livepatch_module(mod) ||
> >>> + for (ndst = nfunc = i = 0; i < nsrc; i++) {
> >>> + if (i == 0 || is_lp_mod ||
> >>> is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
> >>> info->index.pcpu)) {
> >>> strtab_size += strlen(&info->strtab[src[i].st_name]) + 1;
> >>> ndst++;
> >>> + if (!is_lp_mod && is_func_symbol(src + i))
> >>> + nfunc++;
> >>> }
> >>> }
> >>>
> >>> @@ -156,6 +249,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
> >>> mod_mem_init_data->size = ALIGN(mod_mem_init_data->size,
> >>> __alignof__(struct mod_kallsyms));
> >>> info->mod_kallsyms_init_off = mod_mem_init_data->size;
> >>> + info->num_func_syms = nfunc;
> >>>
> >>> mod_mem_init_data->size += sizeof(struct mod_kallsyms);
> >>> info->init_typeoffs = mod_mem_init_data->size;
> >>> @@ -169,7 +263,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
> >>> */
> >>> void add_kallsyms(struct module *mod, const struct load_info *info)
> >>> {
> >>> - unsigned int i, ndst;
> >>> + unsigned int i, di, nfunc, ndst;
> >>> const Elf_Sym *src;
> >>> Elf_Sym *dst;
> >>> char *s;
> >>> @@ -178,6 +272,7 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
> >>> void *data_base = mod->mem[MOD_DATA].base;
> >>> void *init_data_base = mod->mem[MOD_INIT_DATA].base;
> >>> struct mod_kallsyms *kallsyms;
> >>> + bool is_lp_mod = is_livepatch_module(mod);
> >>>
> >>> kallsyms = init_data_base + info->mod_kallsyms_init_off;
> >>
> >> This code is followed by the initialization of kallsyms:
> >>
> >> kallsyms->symtab = (void *)symsec->sh_addr;
> >> kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym);
> >> /* Make sure we get permanent strtab: don't use info->strtab. */
> >> kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
> >> kallsyms->typetab = init_data_base + info->init_typeoffs;
> >>
> >> I suggest adding 'kallsyms->num_func_syms = 0;' after the initialization
> >> of kallsyms->num_symtab.
> >
> > I relied on zeroed memory initialization, but I can add this explicitly
> > for clarity.
> >
> >>> @@ -194,19 +289,28 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
> >>> mod->core_kallsyms.symtab = dst = data_base + info->symoffs;
> >>> mod->core_kallsyms.strtab = s = data_base + info->stroffs;
> >>> mod->core_kallsyms.typetab = data_base + info->core_typeoffs;
> >>> +
> >>> strtab_size = info->core_typeoffs - info->stroffs;
> >>> src = kallsyms->symtab;
> >>> - for (ndst = i = 0; i < kallsyms->num_symtab; i++) {
> >>> + ndst = info->num_func_syms + 1;
> >>> +
> >>> + for (nfunc = i = 0; i < kallsyms->num_symtab; i++) {
> >>> kallsyms->typetab[i] = elf_type(src + i, info);
> >>> - if (i == 0 || is_livepatch_module(mod) ||
> >>> + if (i == 0 || is_lp_mod ||
> >>> is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
> >>> info->index.pcpu)) {
> >>> ssize_t ret;
> >>>
> >>> - mod->core_kallsyms.typetab[ndst] =
> >>> - kallsyms->typetab[i];
> >>> - dst[ndst] = src[i];
> >>> - dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
> >>> + if (i == 0)
> >>> + di = 0;
> >>> + else if (!is_lp_mod && is_func_symbol(src + i))
> >>> + di = 1 + nfunc++;
> >>> + else
> >>> + di = ndst++;
> >>> +
> >>> + mod->core_kallsyms.typetab[di] = kallsyms->typetab[i];
> >>> + dst[di] = src[i];
> >>> + dst[di].st_name = s - mod->core_kallsyms.strtab;
> >>> ret = strscpy(s, &kallsyms->strtab[src[i].st_name],
> >>> strtab_size);
> >>> if (ret < 0)
> >>> @@ -216,9 +320,13 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
> >>> }
> >>> }
> >>>
> >>> + WARN_ON_ONCE(nfunc != info->num_func_syms);
> >>> + sort(dst + 1, nfunc, sizeof(Elf_Sym), elf_sym_cmp, NULL);
> >>> +
> >>
> >> The code sorts mod->core_kallsyms.symtab but mod->core_kallsyms.typetab
> >> is not reordered accordingly.
> >
> > Right, but for function symbols the typetab entries are all 't',
> > so swapping them does not change the type value. The 'T' vs 't'
> > distinction is handled later when printing (based on export status).
> > But the comment explaining skiping adjusting of
> > mod->core_kallsyms.typetab is needed.
>
> Modules can also contain weak functions with elf_type() = 'w'.
Good point.
> >>> /* Set up to point into init section. */
> >>> rcu_assign_pointer(mod->kallsyms, kallsyms);
> >>> mod->core_kallsyms.num_symtab = ndst;
> >>> + mod->core_kallsyms.num_func_syms = nfunc;
> >>> }
> >>>
> >>> #if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
> >>> @@ -241,11 +349,6 @@ void init_build_id(struct module *mod, const struct load_info *info)
> >>> }
> >>> #endif
> >>>
> >>> -static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned int symnum)
> >>> -{
> >>> - return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> >>> -}
> >>> -
> >>> /*
> >>> * Given a module and address, find the corresponding symbol and return its name
> >>> * while providing its size and offset if needed.
> >>> @@ -255,7 +358,10 @@ static const char *find_kallsyms_symbol(struct module *mod,
> >>> unsigned long *size,
> >>> unsigned long *offset)
> >>> {
> >>> - unsigned int i, best = 0;
> >>> + unsigned int (*search)(struct mod_kallsyms *kallsyms,
> >>> + unsigned long addr, unsigned long *bestval,
> >>> + unsigned long *nextval);
> >>> + unsigned int best;
> >>> unsigned long nextval, bestval;
> >>> struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms);
> >>> struct module_memory *mod_mem = NULL;
> >>> @@ -266,6 +372,11 @@ static const char *find_kallsyms_symbol(struct module *mod,
> >>> continue;
> >>> #endif
> >>> if (within_module_mem_type(addr, mod, type)) {
> >>> + if (type == MOD_TEXT && kallsyms->num_func_syms > 0)
> >>> + search = bsearch_func_symbol;
> >>
> >> I'm not sure if it is ok to limit the search only to function symbols
> >> when the address lies in MOD_TEXT. The text can theoretically contain
> >> non-function symbols.
> >
> > Yes, the patch assumes that the only valid symbols in the MOD_TEXT
> > are functions. If there are defined OBJECT symbols in .text, the patch
> > would break lookup for those.
> >
> > While it’s theoretically possible (e.g. hand-written assembly placing
> > data in .text ?), I’m not sure this is a practical concern. In general,
> > having data in executable segments is discouraged for security reasons.
> >
> >> Could this optimization be adjusted to sort all
> >> MOD_TEXT symbols (excluding anonymous and mapping symbols) and move them
> >> to the front of the symbol table?
> >
> > That’s possible. We could track .text sections indices in
> > __layout_sections() and include all valid symbols from those sections,
> > and also reorder typetab accordingly.
> >
> > However, this adds complexity. I would prefer to first confirm whether
> > OBJECT symbols in MOD_TEXT is a real issue before going in that direction.
>
> I'm not aware of specific OBJECT symbols that end up in MOD_TEXT.
> Nonetheless, it is a valid case and it is preferable that an
> optimization doesn't break their lookup by address.
>
> In general, I'm worried about the several edge cases and inconsistencies
> that this optimization introduces. This also includes the fact that it
> doesn't work for livepatch modules.
>
> An alternative could be to keep the symbol table untouched and have
> a separate array with symbol indexes that is sorted by their addresses,
> but it requires evaluation if the additional memory usage is worth it.
I personally prefer not to over-engineer for cases that may never occur.
That said, your concerns about edge cases, consistency, and livepatch
handling are valid.
Let me take some time to think this through and experiment with possible
solutions that handle these cases properly. I can’t work on this
immediately, but I’ll get back to it.
Regards
Stanislaw
^ permalink raw reply
* [PATCH] tracing: fprobe: Remove __packed from generic __fprobe_header
From: Markus Schneider-Pargmann (The Capable Hub) @ 2026-04-28 8:30 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Heiko Carstens
Cc: linux-kernel, linux-trace-kernel,
Markus Schneider-Pargmann (The Capable Hub)
fp pointer and unsigned long have the same size on all relevant
architectures that build Linux. Furthermore this struct is only used in
architectures that do not set ARCH_DEFINE_ENCODE_FPROBE_HEADER which is
set only for 64bit architectures (apart from LoongArch).
Both fields are aligned on these architectures so the struct with
__packed and without it are the same.
Remove the __packed as it is unnecessary.
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Signed-off-by: Markus Schneider-Pargmann (The Capable Hub) <msp@baylibre.com>
---
kernel/trace/fprobe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index cc49ebd2a773..21751dcdb7b9 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -181,7 +181,7 @@ static inline void read_fprobe_header(unsigned long *stack,
struct __fprobe_header {
struct fprobe *fp;
unsigned long size_words;
-} __packed;
+};
#define FPROBE_HEADER_SIZE_IN_LONG SIZE_IN_LONG(sizeof(struct __fprobe_header))
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260427-topic-fprobe-packed-v7-1-f44f9bbdedf6
Best regards,
--
Markus Schneider-Pargmann (The Capable Hub) <msp@baylibre.com>
^ permalink raw reply related
* Re: [PATCH v2] Documentation/rv: Replace stale website link
From: Gabriele Monaco @ 2026-04-28 9:39 UTC (permalink / raw)
To: Matteo Martelli, rdunlap, Steven Rostedt, Jonathan Corbet,
linux-trace-kernel, linux-doc, linux-kernel
Cc: skhan
In-Reply-To: <d0a66f985640388ab13fda6f7d66c5ad@codethink.co.uk>
Hi Matteo,
On Tue, 2026-04-28 at 11:11 +0200, Matteo Martelli wrote:
> Thanks for addressing this. FWIW this looks good to me, however I've
> just noticed that the same article is also referenced in
> rv/runtime-verification.rst with a slightly different format and no
> link. I think that it might be more clear if we had a References
> section in rv/runtime-verification.rst, add links to publicly
> available articles, and then let the other pages point to that
> section when needed instead of duplicating the references like in
> this case. But that could probably be addressed as a further clean-up
> patch.
>
> Acked-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
> Tested-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
Thanks for the ack!
That's a good point, I agree it's better to go ahead with this small
patch and I can make a deeper documentation cleanup later.
Thanks,
Gabriele
^ permalink raw reply
* Re: [PATCH] kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
From: Masami Hiramatsu @ 2026-04-28 9:43 UTC (permalink / raw)
To: Jianpeng Chang
Cc: naveen, davem, catalin.marinas, mark.rutland, linux-kernel,
linux-trace-kernel, stable
In-Reply-To: <20260427073545.3656835-1-jianpeng.chang.cn@windriver.com>
Hi,
On Mon, 27 Apr 2026 15:35:44 +0800
Jianpeng Chang <jianpeng.chang.cn@windriver.com> wrote:
> When kprobe_add_area_blacklist() iterates through a section like
> .kprobes.text, the start address may not correspond to a named symbol.
> On ARM64 with CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y (introduced by
> commit baaf553d3bc3 ("arm64: Implement
> HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")), the compiler flag
> -fpatchable-function-entry=4,2 inserts 2 NOPs before each function entry
> point for ftrace call_ops. These pre-function NOPs sit at the section base
> address, before the first named function symbol. The compiler emits a $x
> mapping symbol at offset 0x00 to mark the start of code, but
> find_kallsyms_symbol() ignores mapping symbols.
>
> Without CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS (e.g. defconfig), no
> pre-function NOPs are inserted, the first function starts at offset
> 0x00, and the bug does not trigger.
>
> This only affects modules that have a .kprobes.text section (i.e. those
> using the __kprobes annotation). Modules using NOKPROBE_SYMBOL() instead
> (like kretprobe_example.ko) blacklist exact function addresses via the
> _kprobe_blacklist section and are not affected.
>
> For kprobe_example.ko on ARM64 with -fpatchable-function-entry=4,2,
> the .kprobes.text section layout is:
>
> offset 0x00: $x + 2 NOPs (mapping symbol + ftrace preamble)
> offset 0x08: handler_post (64 bytes)
> offset 0x50: handler_pre (68 bytes)
Ah, OK. It is for __kprobes attribute. I recommend user to use NOKPROBE_SYMBOL()
but I understand the situation.
>
> kprobe_add_area_blacklist() starts iterating from the section base
> address (offset 0x00), which only has the $x mapping symbol.
> kprobe_add_ksym_blacklist() then calls kallsyms_lookup_size_offset()
> for this address, which goes through:
>
> kallsyms_lookup_size_offset()
> -> module_address_lookup()
> -> find_kallsyms_symbol()
>
> find_kallsyms_symbol() scans all module symbols to find the closest
> preceding symbol.
>
> Since no named text symbol exists at offset 0x00,
> find_kallsyms_symbol() picks __UNIQUE_ID_vermagic (a .modinfo symbol
> whose address is in the temporary image) as the "best" match. The
> computed "size" = next_text_symbol - modinfo_symbol spans across
> these two unrelated memory regions, creating a blacklist entry with
> a bogus range of tens of terabytes.
>
> Whether this causes a visible failure depends on address randomization,
> here is what happens on Raspberry Pi 4/5:
>
> - On RPi5, the bogus size was ~35 TB. start + size stayed within
> 64-bit range, so the blacklist entry covered the entire kernel
> text. register_kprobe() in the module's own init function failed
> with -EINVAL.
>
> - On RPi4, the bogus size was ~75 TB. start + size overflowed
> 64 bits and wrapped to a small address near zero. The range
> check (addr >= start && addr < end) then failed because end
> wrapped around, so the bogus entry was accidentally harmless
> and kprobes worked by luck.
>
> The same bug exists on both machines, but randomization determines whether
> the integer overflow masks it or not.
>
> Fix this by checking the offset returned by kallsyms_lookup_size_offset().
> A non-zero offset means the address is not at a symbol boundary, so skip
> forward to the next symbol instead of creating a blacklist entry with a
> wrong size.
>
> Fixes: baaf553d3bc3 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")
> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com>
> ---
> Hi,
>
> This patch skips non-symbol addresses, fixes the bogus blacklist entry,
> but leaves the NOP gap at the start of .kprobes.text unblacklisted.
That is OK because those NOPs are not executed in kprobe handler.
>
> We can continue alloc the ent without return to add the gap to
> blacklist, or do some more works to add the gap to the first symbol in
> blacklist. I'm not sure if is this necessary, or is there a better way?
Are there any compiler option or attribute to avoid inserting these
NOPs to the specific section? (like notrace?)
Also, as you can see there is an alias symbol whose size is 0. and
in that case, we move the entry + 1 and call kprobe_add_ksym_blacklist()
again. Thus, the offset becomes 1. Please make sure it is correctly
handled.
Thanks,
>
> Thanks,
> Jianpeng
>
> kernel/kprobes.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index bfc89083daa9..be700fb03198 100644
> --- a/kernel/kprobes.c>
> +++ b/kernel/kprobes.c
> @@ -2503,6 +2503,10 @@ int kprobe_add_ksym_blacklist(unsigned long entry)
> !kallsyms_lookup_size_offset(entry, &size, &offset))
> return -EINVAL;
>
> + /* Not on a symbol boundary -- skip to the next symbol */
> + if (offset)
> + return (int)(size - offset);
> +
> ent = kmalloc_obj(*ent);
> if (!ent)
> return -ENOMEM;
> --
> 2.54.0
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH v2] Documentation/rv: Replace stale website link
From: Matteo Martelli @ 2026-04-28 9:11 UTC (permalink / raw)
To: Gabriele Monaco, rdunlap, Steven Rostedt, Gabriele Monaco,
Jonathan Corbet, linux-trace-kernel, linux-doc, linux-kernel
Cc: skhan
In-Reply-To: <20260427131709.170505-2-gmonaco@redhat.com>
Hi Gabriele,
On Mon, 27 Apr 2026 15:17:09 +0200, Gabriele Monaco <gmonaco@redhat.com> wrote:
> The sched monitor page was linking to Daniel's website which is now
> down. The main purpose of the link was to point to a source for the
> models from the original author and that can be found also in his
> published paper.
>
> Replace the link with a reference to Daniel's "A thread synchronization
> model for the PREEMPT_RT Linux kernel" which can be found online and
> includes the models definitions as well as the work behind them (not the
> original patches but since they're based on a 5.0 kernel and are mostly
> included upstream, there's little value in keeping them in the docs).
>
> Fixes: 03abeaa63c08 ("Documentation/rv: Add docs for the sched monitors")
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
> V2: Add link to the PDF and fixed RST references
>
> Documentation/trace/rv/monitor_sched.rst | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
> index 0b96d6e147c6..d3ba7edc202f 100644
> --- a/Documentation/trace/rv/monitor_sched.rst
> +++ b/Documentation/trace/rv/monitor_sched.rst
> @@ -36,7 +36,7 @@ Specifications
> --------------
>
> The specifications included in sched are currently a work in progress, adapting the ones
> -defined in by Daniel Bristot in [1].
> +defined by Daniel Bristot in [1]_.
>
> Currently we included the following:
>
> @@ -365,4 +365,7 @@ constraints when processing the events::
> References
> ----------
>
> -[1] - https://bristot.me/linux-task-model
> +.. [1] Daniel Bristot de Oliveira et al.:
> + `A thread synchronization model for the PREEMPT_RT Linux kernel
> + <https://www.iris.sssup.it/bitstream/11382/533630/1/Elsevier-JSA-2020.pdf>`_,
> + J. Syst. Archit., 2020.
>
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
> --
> 2.53.0
>
>
Thanks for addressing this. FWIW this looks good to me, however I've just
noticed that the same article is also referenced in rv/runtime-verification.rst
with a slightly different format and no link. I think that it might be more
clear if we had a References section in rv/runtime-verification.rst, add links
to publicly available articles, and then let the other pages point to that
section when needed instead of duplicating the references like in this case. But
that could probably be addressed as a further clean-up patch.
Acked-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
Tested-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
Best regards,
Matteo Martelli
^ permalink raw reply
* Re: [PATCH v2] tracing: export live module tracepoint strings in printk_formats
From: Steven Rostedt @ 2026-04-28 12:39 UTC (permalink / raw)
To: Cao Ruichuang
Cc: petr.pavlu, linux-trace-kernel, linux-kernel, Luis Chamberlain,
Daniel Gomez, Sami Tolvanen
In-Reply-To: <20260420061911.97066-1-create0818@163.com>
[ Adding more module maintainers ]
Note, tracing subsystem uses capital letters in subject:
[PATCH v2] tracing: Export live module tracepoint strings in printk_formats
On Mon, 20 Apr 2026 14:19:11 +0800
Cao Ruichuang <create0818@163.com> wrote:
> tracepoint_string() documents that its strings are exported through
> printk_formats so that user space can decode pointer fields recorded in
> trace buffers.
>
> That already works for built-in __tracepoint_str entries, but module
> __tracepoint_str sections are not collected or exported today. As a
> result, module tracepoint_string() users still show raw pointer values
> in printk_formats consumers such as trace.dat decoders.
>
> Record module __tracepoint_str sections when modules are loaded, expose
> their live ranges through printk_formats, and teach
> trace_is_tracepoint_string() to accept those live module strings too.
>
> Keep the lifetime semantics tied to the module itself. This does not
> copy strings into tracing-owned storage and does not preserve the
> mappings after module unload.
>
> On MODULE_STATE_GOING, the live module string ranges are removed again.
> This relies on the existing tracing module notifier ordering: trace
> event teardown runs first and resets module event buffers before these
> auxiliary string mappings are dropped.
>
> If the small auxiliary registry allocation fails, warn and continue
> loading the module. printk_formats exposure is degraded in that case,
> but tracing should not fail module load for missing debug metadata.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217196
> Assisted-by: Codex:GPT-5.4
> Signed-off-by: Cao Ruichuang <create0818@163.com>
> ---
> v2:
> - replace the previous copied-string approach with live module section ranges
> - record module __tracepoint_str ranges in struct module
> - export only live module tracepoint strings in printk_formats
> - remove module mappings on MODULE_STATE_GOING
> - keep auxiliary registry allocation failure non-fatal and warn instead
> - add explicit notifier priority and document the teardown ordering dependency
>
> Tested in QEMU:
> - basic repro showing module tracepoint_string() entries in printk_formats
> - load/unload validation confirming mappings are removed after rmmod
> - failed module init after MODULE_STATE_COMING with no stale mapping left
> - targeted failslab injection on the notifier-time auxiliary allocation,
> confirming module load still succeeds, a warning is emitted, and the
> module mapping is not exported
>
> include/linux/module.h | 2 +
> kernel/module/main.c | 4 +
> kernel/trace/trace_printk.c | 153 ++++++++++++++++++++++++++++++++++--
> 3 files changed, 152 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 14f391b186c..e475466a785 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -515,6 +515,8 @@ struct module {
> #ifdef CONFIG_TRACING
> unsigned int num_trace_bprintk_fmt;
> const char **trace_bprintk_fmt_start;
> + unsigned int num_tracepoint_strings;
> + const char **tracepoint_strings_start;
Would be more compact on 64 bit systems to put the two ints together.
unsigned int num_trace_bprintk_fmt;
unsigned int num_tracepoint_strings;
const char **trace_bprintk_fmt_start;
const char **tracepoint_strings_start;
> #endif
> #ifdef CONFIG_EVENT_TRACING
> struct trace_event_call **trace_events;
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index c3ce106c70a..d7d890138ac 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2672,6 +2672,10 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> mod->trace_bprintk_fmt_start = section_objs(info, "__trace_printk_fmt",
> sizeof(*mod->trace_bprintk_fmt_start),
> &mod->num_trace_bprintk_fmt);
> + mod->tracepoint_strings_start =
> + section_objs(info, "__tracepoint_str",
> + sizeof(*mod->tracepoint_strings_start),
> + &mod->num_tracepoint_strings);
> #endif
> #ifdef CONFIG_DYNAMIC_FTRACE
> /* sechdrs[0].sh_size is always zero */
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index 5ea5e0d76f0..2d41b0a63b3 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -13,6 +13,7 @@
> #include <linux/string.h>
> #include <linux/module.h>
> #include <linux/mutex.h>
> +#include <linux/rcupdate.h>
> #include <linux/ctype.h>
> #include <linux/list.h>
> #include <linux/slab.h>
> @@ -24,10 +25,15 @@
> /*
> * modules trace_printk()'s formats are autosaved in struct trace_bprintk_fmt
> * which are queued on trace_bprintk_fmt_list.
> + *
> + * modules tracepoint_string() entries are kept as ranges into the owning
> + * module's __tracepoint_str section and are removed again when the module
> + * goes away.
> */
> static LIST_HEAD(trace_bprintk_fmt_list);
> +static LIST_HEAD(tracepoint_str_list);
>
> -/* serialize accesses to trace_bprintk_fmt_list */
> +/* serialize accesses to module trace printk and tracepoint string lists */
> static DEFINE_MUTEX(btrace_mutex);
>
> struct trace_bprintk_fmt {
> @@ -35,6 +41,13 @@ struct trace_bprintk_fmt {
> const char *fmt;
> };
>
> +struct tracepoint_mod_str {
> + struct list_head list;
Perhaps add a rcu_head here?
> + struct module *mod;
> + const char **start;
> + unsigned int num;
> +};
> +
> static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
> {
> struct trace_bprintk_fmt *pos;
> @@ -85,16 +98,70 @@ void hold_module_trace_bprintk_format(const char **start, const char **end)
> mutex_unlock(&btrace_mutex);
> }
>
> +static void hold_module_tracepoint_strings(struct module *mod)
> +{
> + struct tracepoint_mod_str *tp_str;
> +
> + if (!mod->num_tracepoint_strings)
> + return;
> +
> + tp_str = kmalloc_obj(*tp_str);
> + if (!tp_str) {
> + pr_warn("tracing: Failed to expose module tracepoint strings for %s\n",
> + mod->name);
> + return;
> + }
> +
> + tp_str->mod = mod;
> + tp_str->start = mod->tracepoint_strings_start;
> + tp_str->num = mod->num_tracepoint_strings;
> +
> + mutex_lock(&btrace_mutex);
> + list_add_tail_rcu(&tp_str->list, &tracepoint_str_list);
> + mutex_unlock(&btrace_mutex);
> +}
> +
> +static void release_module_tracepoint_strings(struct module *mod)
> +{
> + struct tracepoint_mod_str *tp_str, *next;
> + struct tracepoint_mod_str *found = NULL;
> +
> + mutex_lock(&btrace_mutex);
> + list_for_each_entry_safe(tp_str, next, &tracepoint_str_list, list) {
> + if (tp_str->mod != mod)
> + continue;
> +
> + list_del_rcu(&tp_str->list);
Does this really need to be rcu protected? (See below).
> + found = tp_str;
> + break;
> + }
> + mutex_unlock(&btrace_mutex);
> +
> + if (found) {
> + synchronize_rcu();
> + kfree(found);
Should use kfree_rcu() to not slow down module unload anymore than needed
(hence the need for the rcu_head above). But if we keep it used under the
mutex instead of RCU, we don't even need to do that.
And honestly, the iteration happens only on module load, so the only time
the read would intersect with the write is if a module was being loaded at
the same time another module was being unloaded. I don't think that can
happen.
> + }
> +}
> +
> static int module_trace_bprintk_format_notify(struct notifier_block *self,
> unsigned long val, void *data)
> {
> struct module *mod = data;
> - if (mod->num_trace_bprintk_fmt) {
> - const char **start = mod->trace_bprintk_fmt_start;
> - const char **end = start + mod->num_trace_bprintk_fmt;
>
> - if (val == MODULE_STATE_COMING)
> + switch (val) {
> + case MODULE_STATE_COMING:
> + if (mod->num_trace_bprintk_fmt) {
> + const char **start = mod->trace_bprintk_fmt_start;
> + const char **end = start + mod->num_trace_bprintk_fmt;
> +
> hold_module_trace_bprintk_format(start, end);
> + }
> + hold_module_tracepoint_strings(mod);
> + break;
> + case MODULE_STATE_GOING:
> + /* trace event teardown runs first and clears module event buffers. */
> + release_module_tracepoint_strings(mod);
> + break;
> }
> return NOTIFY_OK;
> }
> @@ -159,6 +226,55 @@ find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> return &mod_fmt->fmt;
> }
>
> +static int count_mod_formats(void)
> +{
> + struct trace_bprintk_fmt *p;
> + int count = 0;
> +
> + list_for_each_entry(p, &trace_bprintk_fmt_list, list)
> + count++;
This counter needs to be stored somewhere and not calculated every time.
int trace_bprintk_fmt_cnt;
And updated on creation and deletion.
> +
> + return count;
> +}
> +
> +static const char **
> +find_next_mod_tracepoint_str(int start_index, loff_t *pos)
> +{
> + struct tracepoint_mod_str *tp_str;
> + int index = start_index;
> + unsigned int i;
> +
> + list_for_each_entry(tp_str, &tracepoint_str_list, list) {
> + for (i = 0; i < tp_str->num; i++) {
> + if (index == *pos)
> + return tp_str->start + i;
> + index++;
> + }
The above is horribly inefficient. Should have:
list_for_each_entry(tp_str, &tracepoint_str_list, list) {
if (index + tp_str->num <= *pos) {
index += tp_str->num;
continue;
}
for (i = 0; i < tp_str->num; i++) {
if (index == *pos)
return tp_str->start + i;
index++;
}
/* Should not hit here */
WARN_ONCE(1);
return NULL;
}
> + }
> +
> + return NULL;
> +}
> +
> +static bool is_module_tracepoint_string(const char *str)
> +{
> + struct tracepoint_mod_str *tp_str;
> + unsigned int i;
> + bool found = false;
> +
> + rcu_read_lock();
This is not a fast path, why not take the btrace_mutex?
Either way, for rcu, this should be:
guard(rcu)();
or for the mutex:
guard(mutex)(&btrace_mutex);
> + list_for_each_entry_rcu(tp_str, &tracepoint_str_list, list) {
> + for (i = 0; i < tp_str->num; i++) {
> + if (str == tp_str->start[i]) {
> + found = true;
This should be:
return true;
> + goto out;
> + }
> + }
> + }
Remove the below and have:
return false;
-- Steve
> +out:
> + rcu_read_unlock();
> + return found;
> +}
> +
> static void format_mod_start(void)
> {
> mutex_lock(&btrace_mutex);
> @@ -181,6 +297,22 @@ find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> {
> return NULL;
> }
> +
> +static inline int count_mod_formats(void)
> +{
> + return 0;
> +}
> +
> +static inline const char **
> +find_next_mod_tracepoint_str(int start_index, loff_t *pos)
> +{
> + return NULL;
> +}
> +
> +static inline bool is_module_tracepoint_string(const char *str)
> +{
> + return false;
> +}
> static inline void format_mod_start(void) { }
> static inline void format_mod_stop(void) { }
> #endif /* CONFIG_MODULES */
> @@ -195,6 +327,7 @@ void trace_printk_control(bool enabled)
> __initdata_or_module static
> struct notifier_block module_trace_bprintk_format_nb = {
> .notifier_call = module_trace_bprintk_format_notify,
> + .priority = 0,
> };
>
> int __trace_bprintk(unsigned long ip, const char *fmt, ...)
> @@ -259,12 +392,13 @@ bool trace_is_tracepoint_string(const char *str)
> if (str == *ptr)
> return true;
> }
> - return false;
> + return is_module_tracepoint_string(str);
> }
>
> static const char **find_next(void *v, loff_t *pos)
> {
> const char **fmt = v;
> + int mod_formats;
> int start_index;
> int last_index;
>
> @@ -292,7 +426,12 @@ static const char **find_next(void *v, loff_t *pos)
> return __start___tracepoint_str + (*pos - last_index);
>
> start_index += last_index;
> - return find_next_mod_format(start_index, v, fmt, pos);
> + mod_formats = count_mod_formats();
> + if (*pos < start_index + mod_formats)
> + return find_next_mod_format(start_index, v, fmt, pos);
> +
> + start_index += mod_formats;
> + return find_next_mod_tracepoint_str(start_index, pos);
> }
>
> static void *
^ permalink raw reply
* Re: [RFC PATCH 00/12] rv: Add selftests to tools and KUnit tests
From: Wen Yang @ 2026-04-28 15:09 UTC (permalink / raw)
To: Gabriele Monaco, linux-trace-kernel, linux-kernel
Cc: Steven Rostedt, Nam Cao, Thomas Weissschuh, Tomas Glozar,
John Kacur
In-Reply-To: <20260427151134.192971-1-gmonaco@redhat.com>
On 4/27/26 23:11, Gabriele Monaco wrote:
> This series adds support to the make check target in the rv userspace
> tool and the rvgen script, this allows to quickly validate its
> functionality. The selftest framework is inspired by the one used in
> RTLA.
>
> A few bugs in both tools were also discovered and are fixed as part of
> this series.
>
> Additionally it adds unit tests for models. This is achieved by running
> the handlers functions directly within KUnit, emulating all modules
> paths as if real kernel events fired.
>
> Unit tests emulate a series of events that are expected to trigger
> violations and checks that a reaction occurred, stub structs and
> functions are used so the kernel is not affected by the test.
>
Hi Gabriele,
Good direction overall. The approach of calling event handlers directly
inside KUnit is clean and avoids the complexity of setting up real
tracepoints. Patches 1-4 (bug fixes) look correct.
We are planning to build the KUnit and selftest coverage for the tlob
monitor on top of this infrastructure, so getting this merged would be
useful for us as well.
- One issue found in the KUnit patches:
patch 10: nomiss test
kernel/trace/rv/monitors/nomiss/nomiss.c:
udelay(10 / 1000);
The compiler folds it silently as udelay(0).
Presumably intended as udelay(10)?
- minor: copyright year range
rv_monitors_test.c: Copyright (C) 2025-2028
kunit_stubs.h: Copyright (C) 2026-2029
Kernel copyright entries conventionally use only the year(s) the work
was actually created,eg:
https://lkml.indiana.edu/2510.0/01897.html
Reviewed-by: Wen Yang <wen.yang@linux.dev>
--
Best wishes,
Wen
> To: linux-trace-kernel@vger.kernel.org
> To: linux-kernel@vger.kernel.org
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Nam Cao <namcao@linutronix.de>
> Cc: Thomas Weissschuh <thomas.weissschuh@linutronix.de>
> Cc: Tomas Glozar <tglozar@redhat.com>
> Cc: John Kacur <jkacur@redhat.com>
> Cc: Wen Yang <wen.yang@linux.dev>
>
> Gabriele Monaco (12):
> tools/rv: Fix substring match bug in monitor name search
> tools/rv: Fix substring match when listing container monitors
> tools/rv: Fix exit status when monitor execution fails
> tools/rv: Fix cleanup after failed trace setup
> tools/rv: Add selftests
> verification/rvgen: Fix options shared among commands
> verification/rvgen: Add golden and spec folders for tests
> verification/rvgen: Add selftests
> rv: Add KUnit stub to rv_react() and rv_*_task_monitor_slot()
> rv: Add KUnit tests for some DA/HA monitors
> rv: Add KUnit stubs for current and smp_processor_id()
> rv: Add KUnit tests for some LTL monitors
>
> include/rv/da_monitor.h | 32 +++
> include/rv/kunit_stubs.h | 17 ++
> include/rv/ltl_monitor.h | 32 +++
> kernel/trace/rv/Kconfig | 14 +
> kernel/trace/rv/Makefile | 3 +
> kernel/trace/rv/monitors/nomiss/nomiss.c | 30 +++
> kernel/trace/rv/monitors/opid/opid.c | 27 ++
> .../trace/rv/monitors/pagefault/pagefault.c | 26 +-
> kernel/trace/rv/monitors/sco/sco.c | 23 ++
> kernel/trace/rv/monitors/sleep/sleep.c | 64 ++++-
> kernel/trace/rv/monitors/sssw/sssw.c | 27 ++
> kernel/trace/rv/monitors/sts/sts.c | 35 +++
> kernel/trace/rv/rv.c | 5 +
> kernel/trace/rv/rv_monitors_test.c | 99 +++++++
> kernel/trace/rv/rv_monitors_test.h | 90 +++++++
> kernel/trace/rv/rv_reactors.c | 3 +
> tools/verification/rv/Makefile | 5 +-
> tools/verification/rv/src/in_kernel.c | 58 ++--
> tools/verification/rv/src/rv.c | 2 +-
> tools/verification/rv/tests/rv_list.t | 48 ++++
> tools/verification/rv/tests/rv_mon.t | 95 +++++++
> tools/verification/rvgen/Makefile | 4 +
> tools/verification/rvgen/__main__.py | 10 +-
> .../rvgen/tests/golden/da_global/Kconfig | 9 +
> .../rvgen/tests/golden/da_global/da_global.c | 95 +++++++
> .../rvgen/tests/golden/da_global/da_global.h | 47 ++++
> .../tests/golden/da_global/da_global_trace.h | 15 ++
> .../tests/golden/da_perobj_parent/Kconfig | 11 +
> .../da_perobj_parent/da_perobj_parent.c | 110 ++++++++
> .../da_perobj_parent/da_perobj_parent.h | 64 +++++
> .../da_perobj_parent/da_perobj_parent_trace.h | 15 ++
> .../tests/golden/da_pertask_desc/Kconfig | 9 +
> .../golden/da_pertask_desc/da_pertask_desc.c | 105 ++++++++
> .../golden/da_pertask_desc/da_pertask_desc.h | 64 +++++
> .../da_pertask_desc/da_pertask_desc_trace.h | 15 ++
> .../rvgen/tests/golden/ha_percpu/Kconfig | 9 +
> .../rvgen/tests/golden/ha_percpu/ha_percpu.c | 244 +++++++++++++++++
> .../rvgen/tests/golden/ha_percpu/ha_percpu.h | 72 +++++
> .../tests/golden/ha_percpu/ha_percpu_trace.h | 19 ++
> .../rvgen/tests/golden/ltl_pertask/Kconfig | 9 +
> .../tests/golden/ltl_pertask/ltl_pertask.c | 107 ++++++++
> .../tests/golden/ltl_pertask/ltl_pertask.h | 108 ++++++++
> .../golden/ltl_pertask/ltl_pertask_trace.h | 14 +
> .../rvgen/tests/golden/test_container/Kconfig | 5 +
> .../golden/test_container/test_container.c | 35 +++
> .../golden/test_container/test_container.h | 3 +
> .../rvgen/tests/golden/test_da/Kconfig | 9 +
> .../rvgen/tests/golden/test_da/test_da.c | 95 +++++++
> .../rvgen/tests/golden/test_da/test_da.h | 47 ++++
> .../tests/golden/test_da/test_da_trace.h | 15 ++
> .../rvgen/tests/golden/test_ha/Kconfig | 9 +
> .../rvgen/tests/golden/test_ha/test_ha.c | 247 ++++++++++++++++++
> .../rvgen/tests/golden/test_ha/test_ha.h | 72 +++++
> .../tests/golden/test_ha/test_ha_trace.h | 19 ++
> .../rvgen/tests/golden/test_ltl/Kconfig | 11 +
> .../rvgen/tests/golden/test_ltl/test_ltl.c | 108 ++++++++
> .../rvgen/tests/golden/test_ltl/test_ltl.h | 108 ++++++++
> .../tests/golden/test_ltl/test_ltl_trace.h | 14 +
> .../rvgen/tests/rvgen_container.t | 20 ++
> .../verification/rvgen/tests/rvgen_monitor.t | 87 ++++++
> .../rvgen/tests/specs/test_da.dot | 16 ++
> .../rvgen/tests/specs/test_da2.dot | 18 ++
> .../rvgen/tests/specs/test_ha.dot | 27 ++
> .../rvgen/tests/specs/test_invalid.dot | 8 +
> .../rvgen/tests/specs/test_invalid.ltl | 1 +
> .../rvgen/tests/specs/test_invalid_ha.dot | 16 ++
> .../rvgen/tests/specs/test_ltl.ltl | 1 +
> tools/verification/tests/engine.sh | 156 +++++++++++
> 68 files changed, 2993 insertions(+), 44 deletions(-)
> create mode 100644 include/rv/kunit_stubs.h
> create mode 100644 kernel/trace/rv/rv_monitors_test.c
> create mode 100644 kernel/trace/rv/rv_monitors_test.h
> create mode 100644 tools/verification/rv/tests/rv_list.t
> create mode 100644 tools/verification/rv/tests/rv_mon.t
> create mode 100644 tools/verification/rvgen/tests/golden/da_global/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/da_global/da_global.c
> create mode 100644 tools/verification/rvgen/tests/golden/da_global/da_global.h
> create mode 100644 tools/verification/rvgen/tests/golden/da_global/da_global_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/da_perobj_parent/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/da_perobj_parent/da_perobj_parent.c
> create mode 100644 tools/verification/rvgen/tests/golden/da_perobj_parent/da_perobj_parent.h
> create mode 100644 tools/verification/rvgen/tests/golden/da_perobj_parent/da_perobj_parent_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/da_pertask_desc/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/da_pertask_desc/da_pertask_desc.c
> create mode 100644 tools/verification/rvgen/tests/golden/da_pertask_desc/da_pertask_desc.h
> create mode 100644 tools/verification/rvgen/tests/golden/da_pertask_desc/da_pertask_desc_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/ha_percpu/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/ha_percpu/ha_percpu.c
> create mode 100644 tools/verification/rvgen/tests/golden/ha_percpu/ha_percpu.h
> create mode 100644 tools/verification/rvgen/tests/golden/ha_percpu/ha_percpu_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/ltl_pertask/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/ltl_pertask/ltl_pertask.c
> create mode 100644 tools/verification/rvgen/tests/golden/ltl_pertask/ltl_pertask.h
> create mode 100644 tools/verification/rvgen/tests/golden/ltl_pertask/ltl_pertask_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_container/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/test_container/test_container.c
> create mode 100644 tools/verification/rvgen/tests/golden/test_container/test_container.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_da/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/test_da/test_da.c
> create mode 100644 tools/verification/rvgen/tests/golden/test_da/test_da.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_da/test_da_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_ha/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/test_ha/test_ha.c
> create mode 100644 tools/verification/rvgen/tests/golden/test_ha/test_ha.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_ha/test_ha_trace.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_ltl/Kconfig
> create mode 100644 tools/verification/rvgen/tests/golden/test_ltl/test_ltl.c
> create mode 100644 tools/verification/rvgen/tests/golden/test_ltl/test_ltl.h
> create mode 100644 tools/verification/rvgen/tests/golden/test_ltl/test_ltl_trace.h
> create mode 100644 tools/verification/rvgen/tests/rvgen_container.t
> create mode 100644 tools/verification/rvgen/tests/rvgen_monitor.t
> create mode 100644 tools/verification/rvgen/tests/specs/test_da.dot
> create mode 100644 tools/verification/rvgen/tests/specs/test_da2.dot
> create mode 100644 tools/verification/rvgen/tests/specs/test_ha.dot
> create mode 100644 tools/verification/rvgen/tests/specs/test_invalid.dot
> create mode 100644 tools/verification/rvgen/tests/specs/test_invalid.ltl
> create mode 100644 tools/verification/rvgen/tests/specs/test_invalid_ha.dot
> create mode 100644 tools/verification/rvgen/tests/specs/test_ltl.ltl
> create mode 100644 tools/verification/tests/engine.sh
>
>
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
^ permalink raw reply
* Re: [RFC PATCH 00/12] rv: Add selftests to tools and KUnit tests
From: Gabriele Monaco @ 2026-04-28 15:27 UTC (permalink / raw)
To: Wen Yang, linux-trace-kernel, linux-kernel
Cc: Steven Rostedt, Nam Cao, Thomas Weissschuh, Tomas Glozar,
John Kacur
In-Reply-To: <d0c1fe9d-8d51-4c4f-897b-45444de1286d@linux.dev>
Hi Wen,
thanks for the review!
On Tue, 2026-04-28 at 23:09 +0800, Wen Yang wrote:
> Hi Gabriele,
>
> Good direction overall. The approach of calling event handlers
> directly inside KUnit is clean and avoids the complexity of setting
> up real tracepoints. Patches 1-4 (bug fixes) look correct.
>
> We are planning to build the KUnit and selftest coverage for the tlob
> monitor on top of this infrastructure, so getting this merged would
> be useful for us as well.
Great, for the time being, feel free to either base your work on this
series or only focus on the test cases, whichever is more comfortable
for you.
The RV kernel-side selftests are already upstream so that won't be a
blocker anyway.
> - One issue found in the KUnit patches:
>
> patch 10: nomiss test
> kernel/trace/rv/monitors/nomiss/nomiss.c:
> udelay(10 / 1000);
>
> The compiler folds it silently as udelay(0).
> Presumably intended as udelay(10)?
>
Yes, that was quite a dumb oversight..
> - minor: copyright year range
> rv_monitors_test.c: Copyright (C) 2025-2028
> kunit_stubs.h: Copyright (C) 2026-2029
>
> Kernel copyright entries conventionally use only the year(s) the
> work
> was actually created,eg:
> https://lkml.indiana.edu/2510.0/01897.html
>
Right, I didn't notice that! Will update for the next submission.
>
> Reviewed-by: Wen Yang <wen.yang@linux.dev>
>
Thanks,
Gabriele
> --
> Best wishes,
> Wen
>
>
> > To: linux-trace-kernel@vger.kernel.org
> > To: linux-kernel@vger.kernel.org
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Nam Cao <namcao@linutronix.de>
> > Cc: Thomas Weissschuh <thomas.weissschuh@linutronix.de>
> > Cc: Tomas Glozar <tglozar@redhat.com>
> > Cc: John Kacur <jkacur@redhat.com>
> > Cc: Wen Yang <wen.yang@linux.dev>
> >
> > Gabriele Monaco (12):
> > tools/rv: Fix substring match bug in monitor name search
> > tools/rv: Fix substring match when listing container monitors
> > tools/rv: Fix exit status when monitor execution fails
> > tools/rv: Fix cleanup after failed trace setup
> > tools/rv: Add selftests
> > verification/rvgen: Fix options shared among commands
> > verification/rvgen: Add golden and spec folders for tests
> > verification/rvgen: Add selftests
> > rv: Add KUnit stub to rv_react() and rv_*_task_monitor_slot()
> > rv: Add KUnit tests for some DA/HA monitors
> > rv: Add KUnit stubs for current and smp_processor_id()
> > rv: Add KUnit tests for some LTL monitors
> >
> > include/rv/da_monitor.h | 32 +++
> > include/rv/kunit_stubs.h | 17 ++
> > include/rv/ltl_monitor.h | 32 +++
> > kernel/trace/rv/Kconfig | 14 +
> > kernel/trace/rv/Makefile | 3 +
> > kernel/trace/rv/monitors/nomiss/nomiss.c | 30 +++
> > kernel/trace/rv/monitors/opid/opid.c | 27 ++
> > .../trace/rv/monitors/pagefault/pagefault.c | 26 +-
> > kernel/trace/rv/monitors/sco/sco.c | 23 ++
> > kernel/trace/rv/monitors/sleep/sleep.c | 64 ++++-
> > kernel/trace/rv/monitors/sssw/sssw.c | 27 ++
> > kernel/trace/rv/monitors/sts/sts.c | 35 +++
> > kernel/trace/rv/rv.c | 5 +
> > kernel/trace/rv/rv_monitors_test.c | 99 +++++++
> > kernel/trace/rv/rv_monitors_test.h | 90 +++++++
> > kernel/trace/rv/rv_reactors.c | 3 +
> > tools/verification/rv/Makefile | 5 +-
> > tools/verification/rv/src/in_kernel.c | 58 ++--
> > tools/verification/rv/src/rv.c | 2 +-
> > tools/verification/rv/tests/rv_list.t | 48 ++++
> > tools/verification/rv/tests/rv_mon.t | 95 +++++++
> > tools/verification/rvgen/Makefile | 4 +
> > tools/verification/rvgen/__main__.py | 10 +-
> > .../rvgen/tests/golden/da_global/Kconfig | 9 +
> > .../rvgen/tests/golden/da_global/da_global.c | 95 +++++++
> > .../rvgen/tests/golden/da_global/da_global.h | 47 ++++
> > .../tests/golden/da_global/da_global_trace.h | 15 ++
> > .../tests/golden/da_perobj_parent/Kconfig | 11 +
> > .../da_perobj_parent/da_perobj_parent.c | 110 ++++++++
> > .../da_perobj_parent/da_perobj_parent.h | 64 +++++
> > .../da_perobj_parent/da_perobj_parent_trace.h | 15 ++
> > .../tests/golden/da_pertask_desc/Kconfig | 9 +
> > .../golden/da_pertask_desc/da_pertask_desc.c | 105 ++++++++
> > .../golden/da_pertask_desc/da_pertask_desc.h | 64 +++++
> > .../da_pertask_desc/da_pertask_desc_trace.h | 15 ++
> > .../rvgen/tests/golden/ha_percpu/Kconfig | 9 +
> > .../rvgen/tests/golden/ha_percpu/ha_percpu.c | 244
> > +++++++++++++++++
> > .../rvgen/tests/golden/ha_percpu/ha_percpu.h | 72 +++++
> > .../tests/golden/ha_percpu/ha_percpu_trace.h | 19 ++
> > .../rvgen/tests/golden/ltl_pertask/Kconfig | 9 +
> > .../tests/golden/ltl_pertask/ltl_pertask.c | 107 ++++++++
> > .../tests/golden/ltl_pertask/ltl_pertask.h | 108 ++++++++
> > .../golden/ltl_pertask/ltl_pertask_trace.h | 14 +
> > .../rvgen/tests/golden/test_container/Kconfig | 5 +
> > .../golden/test_container/test_container.c | 35 +++
> > .../golden/test_container/test_container.h | 3 +
> > .../rvgen/tests/golden/test_da/Kconfig | 9 +
> > .../rvgen/tests/golden/test_da/test_da.c | 95 +++++++
> > .../rvgen/tests/golden/test_da/test_da.h | 47 ++++
> > .../tests/golden/test_da/test_da_trace.h | 15 ++
> > .../rvgen/tests/golden/test_ha/Kconfig | 9 +
> > .../rvgen/tests/golden/test_ha/test_ha.c | 247
> > ++++++++++++++++++
> > .../rvgen/tests/golden/test_ha/test_ha.h | 72 +++++
> > .../tests/golden/test_ha/test_ha_trace.h | 19 ++
> > .../rvgen/tests/golden/test_ltl/Kconfig | 11 +
> > .../rvgen/tests/golden/test_ltl/test_ltl.c | 108 ++++++++
> > .../rvgen/tests/golden/test_ltl/test_ltl.h | 108 ++++++++
> > .../tests/golden/test_ltl/test_ltl_trace.h | 14 +
> > .../rvgen/tests/rvgen_container.t | 20 ++
> > .../verification/rvgen/tests/rvgen_monitor.t | 87 ++++++
> > .../rvgen/tests/specs/test_da.dot | 16 ++
> > .../rvgen/tests/specs/test_da2.dot | 18 ++
> > .../rvgen/tests/specs/test_ha.dot | 27 ++
> > .../rvgen/tests/specs/test_invalid.dot | 8 +
> > .../rvgen/tests/specs/test_invalid.ltl | 1 +
> > .../rvgen/tests/specs/test_invalid_ha.dot | 16 ++
> > .../rvgen/tests/specs/test_ltl.ltl | 1 +
> > tools/verification/tests/engine.sh | 156 +++++++++++
> > 68 files changed, 2993 insertions(+), 44 deletions(-)
> > create mode 100644 include/rv/kunit_stubs.h
> > create mode 100644 kernel/trace/rv/rv_monitors_test.c
> > create mode 100644 kernel/trace/rv/rv_monitors_test.h
> > create mode 100644 tools/verification/rv/tests/rv_list.t
> > create mode 100644 tools/verification/rv/tests/rv_mon.t
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_global/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_global/da_global.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_global/da_global.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_global/da_global_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_perobj_parent/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_perobj_parent/da_perobj_pa
> > rent.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_perobj_parent/da_perobj_pa
> > rent.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_perobj_parent/da_perobj_pa
> > rent_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_pertask_desc/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_pertask_desc/da_pertask_de
> > sc.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_pertask_desc/da_pertask_de
> > sc.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/da_pertask_desc/da_pertask_de
> > sc_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ha_percpu/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ha_percpu/ha_percpu.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ha_percpu/ha_percpu.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ha_percpu/ha_percpu_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ltl_pertask/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ltl_pertask/ltl_pertask.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ltl_pertask/ltl_pertask.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/ltl_pertask/ltl_pertask_trace
> > .h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_container/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_container/test_container
> > .c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_container/test_container
> > .h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_da/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_da/test_da.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_da/test_da.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_da/test_da_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ha/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ha/test_ha.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ha/test_ha.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ha/test_ha_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ltl/Kconfig
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ltl/test_ltl.c
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ltl/test_ltl.h
> > create mode 100644
> > tools/verification/rvgen/tests/golden/test_ltl/test_ltl_trace.h
> > create mode 100644
> > tools/verification/rvgen/tests/rvgen_container.t
> > create mode 100644 tools/verification/rvgen/tests/rvgen_monitor.t
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_da.dot
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_da2.dot
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_ha.dot
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_invalid.dot
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_invalid.ltl
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_invalid_ha.dot
> > create mode 100644
> > tools/verification/rvgen/tests/specs/test_ltl.ltl
> > create mode 100644 tools/verification/tests/engine.sh
> >
> >
> > base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
^ permalink raw reply
* [PATCH] tracing/probes: Limit size of event probe to 3K
From: Steven Rostedt @ 2026-04-28 16:23 UTC (permalink / raw)
To: LKML, Linux Trace Kernel; +Cc: Masami Hiramatsu, Mathieu Desnoyers
From: Steven Rostedt <rostedt@goodmis.org>
There currently isn't a max limit an event probe can be. One could make an
event greater than PAGE_SIZE, which makes the event useless because if
it's bigger than the max event that can be recorded into the ring buffer,
then it will never be recorded.
A event probe should never need to be greater than 3K, so make that the
max size. As long as the max is less than the max that can be recorded
onto the ring buffer, it should be fine.
Cc: stable@vger.kernel.org
Fixes: 93ccae7a22274 ("tracing/kprobes: Support basic types on dynamic events")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace_probe.c | 6 ++++++
kernel/trace/trace_probe.h | 4 +++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e1c73065dae5..e0d3a0da26af 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1523,6 +1523,12 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
parg->offset = *size;
*size += parg->type->size * (parg->count ?: 1);
+ if (*size > MAX_PROBE_EVENT_SIZE) {
+ ret = -E2BIG;
+ trace_probe_log_err(ctx->offset, EVENT_TOO_BIG);
+ goto fail;
+ }
+
if (parg->count) {
len = strlen(parg->type->fmttype) + 6;
parg->fmt = kmalloc(len, GFP_KERNEL);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 9fc56c937130..262d8707a3df 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -38,6 +38,7 @@
#define MAX_BTF_ARGS_LEN 128
#define MAX_DENTRY_ARGS_LEN 256
#define MAX_STRING_SIZE PATH_MAX
+#define MAX_PROBE_EVENT_SIZE 3072
/* Reserved field names */
#define FIELD_STRING_IP "__probe_ip"
@@ -561,7 +562,8 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(BAD_TYPE4STR, "This type does not fit for string."),\
C(NEED_STRING_TYPE, "$comm and immediate-string only accepts string type"),\
C(TOO_MANY_ARGS, "Too many arguments are specified"), \
- C(TOO_MANY_EARGS, "Too many entry arguments specified"),
+ C(TOO_MANY_EARGS, "Too many entry arguments specified"), \
+ C(EVENT_TOO_BIG, "Event too big (too many fields?)"),
#undef C
#define C(a, b) TP_ERR_##a
--
2.53.0
^ permalink raw reply related
* Re: [PATCH] trace: remove the dead IS_ERR() check in trace_pipe_open()
From: Steven Rostedt @ 2026-04-28 17:11 UTC (permalink / raw)
To: Yash Suthar; +Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel
In-Reply-To: <20260420101236.223919-1-yashsuthar983@gmail.com>
On Mon, 20 Apr 2026 15:42:36 +0530
Yash Suthar <yashsuthar983@gmail.com> wrote:
"remove the dead"? What is this? a horror flick? ;-)
The subsystem is "tracing" not "trace" and the first word should be
capitalized. I changed the subject to:
[PATCH] tracing: Remove redundant IS_ERR() check in trace_pipe_open()
And pulled it in.
When submitting changes to a subsystem, please do a git log --no-merges to
see how other commits are done in that subsystem.
Thanks,
-- Steve
> in trace_pipe_open() already check the IS_ERR(iter) and
> return early on error,so iter after will be valid and
> it is safe to return 0 at end.
>
> Signed-off-by: Yash Suthar <yashsuthar983@gmail.com>
> ---
> kernel/trace/trace_remote.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
> index d6c3f94d67cd..2a6cc000ec98 100644
> --- a/kernel/trace/trace_remote.c
> +++ b/kernel/trace/trace_remote.c
> @@ -602,7 +602,7 @@ static int trace_pipe_open(struct inode *inode, struct file *filp)
>
> filp->private_data = iter;
>
> - return IS_ERR(iter) ? PTR_ERR(iter) : 0;
> + return 0;
> }
>
> static int trace_pipe_release(struct inode *inode, struct file *filp)
^ permalink raw reply
* Re: [PATCH v5] tracing: Bound synthetic-field strings with seq_buf
From: Steven Rostedt @ 2026-04-28 17:32 UTC (permalink / raw)
To: Pengpeng Hou
Cc: Masami Hiramatsu, Mathieu Desnoyers, Tom Zanussi,
linux-trace-kernel, linux-kernel
In-Reply-To: <20260424070104.1-tracing-synth-v5-pengpeng@iscas.ac.cn>
I should have also mentioned. Please, new versions of a patch should always
start a new thread. Don't reply to the previous version. That makes it very
difficult to find which is the latest version.
On Thu, 23 Apr 2026 23:33:00 +0800
Pengpeng Hou <pengpeng@iscas.ac.cn> wrote:
> The synthetic field helpers build a prefixed synthetic variable name and
> a generated hist command in fixed MAX_FILTER_STR_VAL buffers. The
> current code appends those strings with raw strcat(), so long key lists,
> field names, or saved filters can run past the end of the staging
> buffers.
>
> Build both strings with seq_buf and propagate -E2BIG if either the
> synthetic variable name or the generated command exceeds
> MAX_FILTER_STR_VAL. This keeps the existing tracing-side limit while
> using the helper intended for bounded command construction.
>
> Fixes: 02205a6752f2 ("tracing: Add support for 'field variables'")
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> ---
> Changes since v4:
To keep a link to the previous version, use lore links to access it (the
message ID of the previous version attached to
"https://lore.kernel.org/all/$message-id")
Changes since v4: https://lore.kernel.org/all/20260417223001.1-tracing-synth-v4-pengpeng@iscas.ac.cn/
> - add the requested blank lines around seq_buf_str() comments
> - add the seq_buf_str() comment for the generated command buffer too
> - keep saved_filter scoped next to its point of use
>
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 0dbbf6cca9bc..87429567417f 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -8,6 +8,7 @@
> #include <linux/module.h>
> #include <linux/kallsyms.h>
> #include <linux/security.h>
> +#include <linux/seq_buf.h>
> #include <linux/mutex.h>
> #include <linux/slab.h>
> #include <linux/stacktrace.h>
> @@ -2968,14 +2969,24 @@ find_synthetic_field_var(struct hist_trigger_data *target_hist_data,
> char *system, char *event_name, char *field_name)
> {
> struct hist_field *event_var;
> + struct seq_buf s;
> char *synthetic_name;
>
> synthetic_name = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
> if (!synthetic_name)
> return ERR_PTR(-ENOMEM);
>
> - strcpy(synthetic_name, "synthetic_");
> - strcat(synthetic_name, field_name);
> + seq_buf_init(&s, synthetic_name, MAX_FILTER_STR_VAL);
> + seq_buf_puts(&s, "synthetic_");
> + seq_buf_puts(&s, field_name);
Hmm, the above is probably simpler with:
seq_buf_printf(&s, "synthetic_%s", field_name);
> +
> + /* Terminate synthetic_name with a NUL. */
> + seq_buf_str(&s);
> +
> + if (seq_buf_has_overflowed(&s)) {
> + kfree(synthetic_name);
> + return ERR_PTR(-E2BIG);
> + }
>
> event_var = find_event_var(target_hist_data, system, event_name, synthetic_name);
>
> @@ -3020,7 +3031,7 @@ create_field_var_hist(struct hist_trigger_data *target_hist_data,
> struct trace_event_file *file;
> struct hist_field *key_field;
> struct hist_field *event_var;
> - char *saved_filter;
Don't remove this for an anonymous block.
> + struct seq_buf s;
> char *cmd;
> int ret;
>
> @@ -3065,28 +3076,37 @@ create_field_var_hist(struct hist_trigger_data *target_hist_data,
> return ERR_PTR(-ENOMEM);
> }
>
> + seq_buf_init(&s, cmd, MAX_FILTER_STR_VAL);
> +
> /* Use the same keys as the compatible histogram */
> - strcat(cmd, "keys=");
> + seq_buf_puts(&s, "keys=");
>
> for_each_hist_key_field(i, hist_data) {
> key_field = hist_data->fields[i];
> if (!first)
> - strcat(cmd, ",");
> - strcat(cmd, key_field->field->name);
> + seq_buf_putc(&s, ',');
> + seq_buf_puts(&s, key_field->field->name);
> first = false;
> }
>
> /* Create the synthetic field variable specification */
> - strcat(cmd, ":synthetic_");
> - strcat(cmd, field_name);
> - strcat(cmd, "=");
> - strcat(cmd, field_name);
> + seq_buf_printf(&s, ":synthetic_%s=%s", field_name, field_name);
>
> /* Use the same filter as the compatible histogram */
> - saved_filter = find_trigger_filter(hist_data, file);
> - if (saved_filter) {
> - strcat(cmd, " if ");
> - strcat(cmd, saved_filter);
> + {
We don't add anonymous blocks in the kernel (with the exception of needing
to be around #ifdefs).
-- Steve
> + char *saved_filter = find_trigger_filter(hist_data, file);
> +
> + if (saved_filter)
> + seq_buf_printf(&s, " if %s", saved_filter);
> + }
> +
> + /* Terminate cmd with a NUL. */
> + seq_buf_str(&s);
> +
> + if (seq_buf_has_overflowed(&s)) {
> + kfree(cmd);
> + kfree(var_hist);
> + return ERR_PTR(-E2BIG);
> }
>
> var_hist->cmd = kstrdup(cmd, GFP_KERNEL);
^ permalink raw reply
* Re: [RFC PATCH 16/19] mm/damon: trace probe_hits
From: Steven Rostedt @ 2026-04-28 18:17 UTC (permalink / raw)
To: SeongJae Park
Cc: Andrew Morton, Masami Hiramatsu, Mathieu Desnoyers, damon,
linux-kernel, linux-mm, linux-trace-kernel
In-Reply-To: <20260426205222.93895-17-sj@kernel.org>
On Sun, 26 Apr 2026 13:52:17 -0700
SeongJae Park <sj@kernel.org> wrote:
> Introduce a new tracepoint for exposing the per-region per-probe
> positive sample count via tracefs.
>
> Signed-off-by: SeongJae Park <sj@kernel.org>
> ---
> include/trace/events/damon.h | 41 ++++++++++++++++++++++++++++++++++++
> mm/damon/core.c | 1 +
> 2 files changed, 42 insertions(+)
>
> diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
> index 7e25f4469b81b..121d7bc3a2c27 100644
> --- a/include/trace/events/damon.h
> +++ b/include/trace/events/damon.h
> @@ -130,6 +130,47 @@ TRACE_EVENT(damon_monitor_intervals_tune,
> TP_printk("sample_us=%lu", __entry->sample_us)
> );
>
> +TRACE_EVENT(damon_aggregated_v2,
> +
> + TP_PROTO(unsigned int target_id, struct damon_region *r,
> + unsigned int nr_regions),
> +
> + TP_ARGS(target_id, r, nr_regions),
> +
> + TP_STRUCT__entry(
> + __field(unsigned long, target_id)
> + __field(unsigned int, nr_regions)
Move the nr_regions to after "end" as on 64 bit machines, this creates a 4
byte hole.
-- Steve
> + __field(unsigned long, start)
> + __field(unsigned long, end)
> + __field(unsigned int, nr_accesses)
> + __field(unsigned int, age)
> + __field(unsigned char, probe_hit0)
> + __field(unsigned char, probe_hit1)
> + __field(unsigned char, probe_hit2)
> + __field(unsigned char, probe_hit3)
> + ),
> +
> + TP_fast_assign(
> + __entry->target_id = target_id;
> + __entry->nr_regions = nr_regions;
> + __entry->start = r->ar.start;
> + __entry->end = r->ar.end;
> + __entry->nr_accesses = r->nr_accesses;
> + __entry->age = r->age;
> + __entry->probe_hit0 = r->probe_hits[0];
> + __entry->probe_hit1 = r->probe_hits[1];
> + __entry->probe_hit2 = r->probe_hits[2];
> + __entry->probe_hit3 = r->probe_hits[3];
> + ),
> +
> + TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u %u %hhu %hhu %hhu %hhu",
> + __entry->target_id, __entry->nr_regions,
> + __entry->start, __entry->end,
> + __entry->nr_accesses, __entry->age,
> + __entry->probe_hit0, __entry->probe_hit1,
> + __entry->probe_hit2, __entry->probe_hit3)
> +);
> +
> TRACE_EVENT(damon_aggregated,
>
> TP_PROTO(unsigned int target_id, struct damon_region *r,
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index fe14971d72747..54834b74efef4 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1924,6 +1924,7 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
> int i;
>
> trace_damon_aggregated(ti, r, damon_nr_regions(t));
> + trace_damon_aggregated_v2(ti, r, damon_nr_regions(t));
> damon_warn_fix_nr_accesses_corruption(r);
> r->last_nr_accesses = r->nr_accesses;
> r->nr_accesses = 0;
^ permalink raw reply
* Re: [PATCH v2] mm/page_alloc: trace PCP refills and PCP zone lock usage
From: Steven Rostedt @ 2026-04-28 18:23 UTC (permalink / raw)
To: Bunyod Suvonov
Cc: akpm, vbabka, linux-mm, mhiramat, mathieu.desnoyers,
linux-trace-kernel, linux-kernel, surenb, mhocko, jackmanb,
hannes, ziy, david, vishal.moola, corbet, skhan, linux-doc
In-Reply-To: <20260427060142.131055-1-b.suvonov@sjtu.edu.cn>
On Mon, 27 Apr 2026 14:01:42 +0800
Bunyod Suvonov <b.suvonov@sjtu.edu.cn> wrote:
> + TP_STRUCT__entry(
> + __field(int, nid)
> + __field(int, zid)
> + __field(unsigned long, nr_pages)
> + ),
> +
> + TP_fast_assign(
> + __entry->nid = nid;
> + __entry->zid = zid;
> + __entry->nr_pages = nr_pages;
> + ),
> +
> + TP_printk("nid=%d zid=%d nr_pages=%lu",
> + __entry->nid, __entry->zid, __entry->nr_pages)
> +);
> +
> +DEFINE_EVENT(mm_page_pcpu_zone_locked, mm_page_pcpu_refill_zone_locked,
> +
> + TP_PROTO(int nid, int zid, unsigned long nr_pages),
> +
> + TP_ARGS(nid, zid, nr_pages)
> +);
> +
> +DEFINE_EVENT(mm_page_pcpu_zone_locked, mm_page_pcpu_drain_zone_locked,
> +
> + TP_PROTO(int nid, int zid, unsigned long nr_pages),
> +
> + TP_ARGS(nid, zid, nr_pages)
> +);
> +
> +DECLARE_EVENT_CLASS(mm_page_pcpu,
>
> TP_PROTO(struct page *page, unsigned int order, int migratetype),
>
> TP_ARGS(page, order, migratetype),
>
> TP_STRUCT__entry(
> - __field( unsigned long, pfn )
> - __field( unsigned int, order )
> - __field( int, migratetype )
> + __field(unsigned long, pfn)
> + __field(unsigned int, order)
> + __field(int, migratetype)
Why this change? It makes it much harder to understand.
The above is not a normal macro. Ignore any checkpatch warnings about it.
The proper way to do the TP_STRUCT__entry() is to make it just like a struct:
struct {
unsigned long pfn;
unsigned int order;
int migratetype;
};
Thus, the macro should be:
TP_STRUCT__entry(
__field( unsigned long, pfn )
__field( unsigned int, order )
__field( int, migratetype )
),
-- Steve
> ),
>
^ permalink raw reply
* Re: [PATCH] fprobe: Add unregister_fprobe_sync() for synchronous unregistration
From: Steven Rostedt @ 2026-04-28 18:27 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Mathieu Desnoyers, Jonathan Corbet, linux-kernel,
linux-trace-kernel, linux-doc
In-Reply-To: <177729179863.401400.6063130067239479972.stgit@mhiramat.tok.corp.google.com>
On Mon, 27 Apr 2026 21:09:58 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> +/**
> + * unregister_fprobe_sync() - Unregister fprobe synchronously with RCU grace period.
> + * @fp: A fprobe data structure to be unregistered.
> + *
> + * Unregister fprobe (and remove ftrace hooks from the function entries) and
> + * wait for the RCU grace period to finish. This is useful for preventing
> + * the fprobe from being used after it is unregistered.
> + *
> + * Return 0 if @fp is unregistered successfully, -errno if not.
> + */
> +int unregister_fprobe_sync(struct fprobe *fp)
> +{
> + int ret;
> +
> + guard(mutex)(&fprobe_mutex);
> + if (!fp || !fprobe_registered(fp))
> + return -EINVAL;
> +
> + ret = unregister_fprobe_nolock(fp);
> + if (ret)
> + return ret;
> +
> + synchronize_rcu();
Hmm, do we really need to hold the fprobe_mutex when doing the
synchronize_rcu()? This could cause other updates to have to wait longer
too.
-- Steve
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(unregister_fprobe_sync);
^ permalink raw reply
* Re: [PATCH v3 1/4] tracing/preemptirq: Optimize preempt_disable/enable() tracepoint overhead
From: Steven Rostedt @ 2026-04-28 18:51 UTC (permalink / raw)
To: Wander Lairson Costa
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton,
open list:SCHEDULER, open list:TRACING, acme, williams, gmonaco
In-Reply-To: <abQsFM9tFPr3Mrtt@fedora>
On Fri, 13 Mar 2026 12:36:10 -0300
Wander Lairson Costa <wander@redhat.com> wrote:
> On Fri, Mar 13, 2026 at 10:04:04AM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 12, 2026 at 02:19:15PM -0300, Wander Lairson Costa wrote:
> >
> > > > That's significant bloat, for really very little gain. Realistically
> > > > nobody is going to need these.
> > > >
> > >
> > > Of course, I can't speak for others, but more than once I debugged issues
> > > that those tracepoints had made my life far easier. Those cases convinced
> > > me that such a feature would be worth it. But if you don't see
> > > value and will reject the patches no matter what, nothing can be done,
> > > and I will have to accept defeat.
> >
> > If distros are going to enable this, I suppose I'm not going to stop
> > this. But I do very much worry about the general bloat of things, there
> > are a *LOT* of preempt_{dis,en}able() sites.
> >
>
> We plan to enable these tracepoints in the RHEL kernel-rt to track
> extended non-preemptible states that cause high latencies. These
> issues occasionally surface in customer OpenShift deployments, where
> deploying a custom debug kernel is highly impractical. Having these
> tracepoints available in the distribution kernel would be handful for
> debugging these production systems. That said, I expect enabling this
> feature to be the exception rather than the rule — most distribution
> kernels would leave it disabled.
Is this work going to continue? Or should I just change the status to
"reject" in patchwork?
-- Steve
^ permalink raw reply
* Re: [PATCH v3 4/4] trace/preemptirq: Implement trace_irqflags hooks
From: Steven Rostedt @ 2026-04-28 18:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Wander Lairson Costa, Ingo Molnar, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
Masami Hiramatsu, Mathieu Desnoyers, Andrew Morton, open list,
open list:TRACING, acme, williams, gmonaco
In-Reply-To: <20260311194305.GT606826@noisy.programming.kicks-ass.net>
On Wed, 11 Mar 2026 20:43:05 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Mar 11, 2026 at 09:50:18AM -0300, Wander Lairson Costa wrote:
> > +#define local_irq_enable() \
> > + do { \
> > + if (tracepoint_enabled(irq_enable)) \
> > + trace_local_irq_enable(); \
>
> I'm thinking you didn't even look at the assembly generated :/
>
> Otherwise you would have written this like:
>
> if (tracepoint_enabled(irq_enable))
> __do_trace_local_irq_enable();
>
> > + raw_local_irq_enable(); \
> > + } while (0)
>
> Again, this was one instruction, and you clearly didn't bother looking
> at the mess you've generated :/
We now have trace_call__#name(), thus the above can be:
if (tracepoint_enabled(irq_enable))
trace_call__local_irq_enable();
See commit 677a3d82b6407 ("tracepoint: Add trace_call__##name() API")
Just in case there's a v4.
-- Steve
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox