CRIU (Checkpoint/Restore in Userspace) mailing list
* [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue
@ 2026-04-08  9:51 Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue Mathura_Kumar
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Mathura_Kumar @ 2026-04-08  9:51 UTC (permalink / raw)
  To: criu; +Cc: academic1mathura, avagin, ptikhomirov, rstoyanov


Patch series overview:

  1. Add the new system call mq_timedreceive2() and its handler, do_mq_timedreceive2()
  2. Add the system call number for the most common architectures
  3. Add documentation and a selftest
  4. Add entries to the perf tools syscall tables for the most common architectures
  5. Add a bitmask flag and the user struct size as inputs for extensibility
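
Since there is no libc wrapper, a caller would have to go through syscall(2).
The following is only a minimal sketch under stated assumptions, not the
definitive interface: the syscall numbers are the ones proposed in patch 2/5
(548 on x86_64, 472 elsewhere) and exist in no released kernel, the MQ_PEEK
value is the one defined in ipc/mqueue.c by patch 1/5, and the struct mirrors
the patch 1/5 layout (patch 5/5 may change it). All sketch_* names are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: numbers proposed in patch 2/5; not part of any released kernel. */
#if defined(__x86_64__)
#define SKETCH_NR_mq_timedreceive2 548
#else
#define SKETCH_NR_mq_timedreceive2 472
#endif

/* Assumption: value of MQ_PEEK as defined in ipc/mqueue.c by patch 1/5. */
#define SKETCH_MQ_PEEK 2u

/* Mirrors struct mq_timedreceive2_args as laid out in patch 1/5. */
struct sketch_mq_args {
	size_t msg_len;          /* size of the buffer behind msg_ptr */
	unsigned int *msg_prio;  /* may be NULL if the priority is not wanted */
	char *msg_ptr;           /* destination buffer for the message body */
};

/* Fill the argument struct; kept separate so it can be checked on its own. */
void sketch_fill_args(struct sketch_mq_args *a, char *buf, size_t len,
		      unsigned int *prio)
{
	assert(a != NULL && buf != NULL);
	a->msg_len = len;
	a->msg_prio = prio;
	a->msg_ptr = buf;
}

/*
 * Peek at the message at position `index` without removing it.
 * Returns the syscall result: a byte count on success, or -1 with errno
 * set (ENOSYS on kernels that do not carry this series).
 * mqdes is an int here because mqd_t is an int on Linux.
 */
long sketch_mq_peek(int mqdes, char *buf, size_t len, unsigned int *prio,
		    unsigned long index)
{
	struct sketch_mq_args a;

	sketch_fill_args(&a, buf, len, prio);
	/* abs_timeout is NULL: the documentation says peek ignores it. */
	return syscall(SKETCH_NR_mq_timedreceive2, mqdes, &a,
		       SKETCH_MQ_PEEK, index, NULL);
}
```

A caller would pass a buffer of at least mq_msgsize bytes, for example
sketch_mq_peek(mqdes, buf, sizeof(buf), &prio, 0) to look at the head of the
queue, and should treat ENOSYS as "series not applied".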

Changes since v2:
 - added a bitmask flag, with a check for valid operations
 - added the userspace struct size argument for extensibility
 - added a new test
 - updated the documentation accordingly
 - increased the total system call count in the perf tools as well
 - v2 Link: https://lore.kernel.org/linux-arch/20260320052340.6696-1-academic1mathura@gmail.com/T/#t
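
For reviewers, the index semantics of the peek path (mq_peek_index() in
patch 1/5) can be modeled in plain userspace C. This is only an illustrative
model, not kernel code: struct model_msg and model_peek_index() are
hypothetical names, and the model assumes messages of equal priority are kept
in arrival order. Index 0 is the oldest message of the highest priority; an
index past the last queued message yields NULL, which corresponds to the
-ENOENT return in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one queued message; not a kernel type. */
struct model_msg {
	unsigned int prio;  /* POSIX priority, larger means more urgent */
	unsigned int seq;   /* arrival order, unique, smaller arrived earlier */
	const char *body;
};

/*
 * Return the message a peek at `index` would select, or NULL when fewer
 * than index + 1 messages are queued.
 *
 * The rank of a message is the number of messages that sort before it:
 * those with a higher priority, or with the same priority and an earlier
 * arrival. The message whose rank equals `index` is the one selected.
 */
const struct model_msg *model_peek_index(const struct model_msg *msgs,
					 size_t n, unsigned long index)
{
	assert(msgs != NULL || n == 0);

	if (index >= n)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		unsigned long rank = 0;

		for (size_t j = 0; j < n; j++) {
			if (msgs[j].prio > msgs[i].prio ||
			    (msgs[j].prio == msgs[i].prio &&
			     msgs[j].seq < msgs[i].seq))
				rank++;
		}
		if (rank == index)
			return &msgs[i];
	}
	return NULL;
}
```

The quadratic scan is chosen for brevity only; the kernel walks the rbtree
from rb_last() downwards and then each per-priority list in order.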


  Thanks for reviewing.
  

 Documentation/userspace-api/index.rst         |   1 +
 Documentation/userspace-api/ipc.rst           | 228 +++++
 arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
 arch/arm/tools/syscall.tbl                    |   1 +
 arch/arm64/tools/syscall_32.tbl               |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
 arch/s390/kernel/syscalls/syscall.tbl         |   1 +
 arch/sh/kernel/syscalls/syscall.tbl           |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
 include/linux/compat.h                        |   6 +-
 include/linux/syscalls.h                      |   6 +
 include/uapi/asm-generic/unistd.h             |   7 +-
 include/uapi/linux/mqueue.h                   |  14 +-
 ipc/mqueue.c                                  | 211 ++++-
 ipc/msg.c                                     |   2 +-
 ipc/msgutil.c                                 |  48 +-
 ipc/util.h                                    |   3 +-
 kernel/sys_ni.c                               |   3 +
 scripts/syscall.tbl                           |   1 +
 tools/include/uapi/asm-generic/unistd.h       |   6 +-
 .../arch/alpha/entry/syscalls/syscall.tbl     |   1 +
 .../perf/arch/arm/entry/syscalls/syscall.tbl  |   1 +
 .../arch/arm64/entry/syscalls/syscall_32.tbl  |   1 +
 .../arch/mips/entry/syscalls/syscall_n64.tbl  |   1 +
 .../arch/parisc/entry/syscalls/syscall.tbl    |   1 +
 .../arch/powerpc/entry/syscalls/syscall.tbl   |   1 +
 .../perf/arch/s390/entry/syscalls/syscall.tbl |   1 +
 tools/perf/arch/sh/entry/syscalls/syscall.tbl |   1 +
 .../arch/sparc/entry/syscalls/syscall.tbl     |   1 +
 .../arch/x86/entry/syscalls/syscall_32.tbl    |   1 +
 .../arch/x86/entry/syscalls/syscall_64.tbl    |   1 +
 .../arch/xtensa/entry/syscalls/syscall.tbl    |   1 +
 tools/scripts/syscall.tbl                     |   1 +
 tools/testing/selftests/ipc/.gitignore        |   1 +
 tools/testing/selftests/ipc/Makefile          |   9 +-
 tools/testing/selftests/ipc/mq_peek.c         | 813 ++++++++++++++++++
 45 files changed, 1343 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/userspace-api/ipc.rst
 create mode 100644 tools/testing/selftests/ipc/mq_peek.c

-- 
2.43.0



* [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue
  2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
@ 2026-04-08  9:51 ` Mathura_Kumar
  2026-04-13 16:34   ` Andrei Vagin
  2026-04-08  9:51 ` [PATCH v3 2/5]IPC: Added system call number in all most common arch Mathura_Kumar
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Mathura_Kumar @ 2026-04-08  9:51 UTC (permalink / raw)
  To: criu; +Cc: academic1mathura, avagin, ptikhomirov, rstoyanov

Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
---
 include/linux/compat.h            |   6 +-
 include/linux/syscalls.h          |   6 +
 include/uapi/asm-generic/unistd.h |   7 +-
 include/uapi/linux/mqueue.h       |  14 ++-
 ipc/mqueue.c                      | 186 ++++++++++++++++++++++++++++--
 ipc/msg.c                         |   2 +-
 ipc/msgutil.c                     |  48 ++++----
 ipc/util.h                        |   3 +-
 kernel/sys_ni.c                   |   1 +
 9 files changed, 231 insertions(+), 42 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 56cebaff0c91..9f5ca26e76d8 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -22,6 +22,7 @@
 #include <asm/compat.h>
 #include <asm/siginfo.h>
 #include <asm/signal.h>
+#include <linux/mqueue.h>
 
 #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
 /*
@@ -801,8 +802,9 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
 		const struct iovec __user *vec,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-
-
+asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct compat_mq_timedreceive2_args __user *uargs,
+											unsigned int flags, unsigned long index,
+											struct old_timespec32 __user *abs_timeout);
 /*
  * Deprecated system calls which are still defined in
  * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 02bd6ddb6278..993e570c90ab 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -79,6 +79,7 @@ struct mnt_id_req;
 struct ns_id_req;
 struct xattr_args;
 struct file_attr;
+struct mq_timedreceive2_args;
 
 #include <linux/types.h>
 #include <linux/aio_abi.h>
@@ -93,6 +94,7 @@ struct file_attr;
 #include <linux/key.h>
 #include <linux/personality.h>
 #include <trace/syscall.h>
+#include <linux/mqueue.h>
 
 #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
 /*
@@ -746,6 +748,10 @@ asmlinkage long sys_mq_timedsend_time32(mqd_t mqdes,
 			const char __user *u_msg_ptr,
 			unsigned int msg_len, unsigned int msg_prio,
 			const struct old_timespec32 __user *u_abs_timeout);
+asmlinkage long
+sys_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args __user *uargs,
+		     unsigned int flags, unsigned long index,
+		     struct __kernel_timespec __user *abs_timeout);
 asmlinkage long sys_msgget(key_t key, int msgflg);
 asmlinkage long sys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
 asmlinkage long sys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a627acc8fb5f..200ee7fde5c4 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -863,9 +863,12 @@ __SYSCALL(__NR_listns, sys_listns)
 #define __NR_rseq_slice_yield 471
 __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
 
-#undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_mq_timedreceive2 472
+__SC_COMP(__NR_mq_timedreceive2, sys_mq_timedreceive2,
+		compat_sys_mq_timedreceive2)
 
+#undef __NR_syscalls
+#define __NR_syscalls 473
 /*
  * 32 bit systems traditionally used different
  * syscalls for off_t and loff_t arguments, while
diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h
index b516b66840ad..7cdced63f5d2 100644
--- a/include/uapi/linux/mqueue.h
+++ b/include/uapi/linux/mqueue.h
@@ -18,8 +18,8 @@
 
 #ifndef _LINUX_MQUEUE_H
 #define _LINUX_MQUEUE_H
-
 #include <linux/types.h>
+#include <asm/compat.h>
 
 #define MQ_PRIO_MAX 	32768
 /* per-uid limit of kernel memory used by mqueue, in bytes */
@@ -33,6 +33,18 @@ struct mq_attr {
 	__kernel_long_t	__reserved[4];	/* ignored for input, zeroed for output */
 };
 
+struct mq_timedreceive2_args {
+	size_t msg_len;
+	unsigned int  *msg_prio;
+	char  *msg_ptr;
+};
+
+struct compat_mq_timedreceive2_args {
+	compat_size_t msg_len;
+	compat_uptr_t msg_prio;
+	compat_uptr_t msg_ptr;
+};
+
 /*
  * SIGEV_THREAD implementation:
  * SIGEV_THREAD must be implemented in user space. If SIGEV_THREAD is passed
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4798b375972b..78dc414967a2 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -53,6 +53,7 @@ struct mqueue_fs_context {
 
 #define SEND		0
 #define RECV		1
+#define MQ_PEEK     2
 
 #define STATE_NONE	0
 #define STATE_READY	1
@@ -1230,6 +1231,115 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 	return ret;
 }
 
+static struct msg_msg *mq_peek_index(struct mqueue_inode_info *info, int index)
+{
+	struct rb_node *node;
+	struct posix_msg_tree_node *leaf;
+	struct msg_msg *msg;
+
+	int count = 0;
+
+	/* Start from highest priority */
+	node = rb_last(&info->msg_tree);
+	while (node) {
+		leaf = rb_entry(node, struct posix_msg_tree_node, rb_node);
+		list_for_each_entry(msg, &leaf->msg_list, m_list) {
+			if (count == index)
+				return msg;
+			count++;
+		}
+
+		node = rb_prev(node);
+	}
+
+	return NULL;
+}
+
+static int do_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *args,
+			       unsigned int flags, unsigned long index,
+			       struct timespec64 *ts)
+{
+	ssize_t ret;
+	struct msg_msg *msg_ptr, *k_msg_buffer;
+	long k_m_type;
+	size_t k_m_ts;
+	struct inode *inode;
+	struct mqueue_inode_info *info;
+
+	if (!(flags & MQ_PEEK)) {
+		return do_mq_timedreceive(mqdes, args->msg_ptr, args->msg_len,
+					  args->msg_prio, ts);
+	}
+	audit_mq_sendrecv(mqdes, args->msg_len, 0, ts);
+	CLASS(fd, f)(mqdes);
+	if (fd_empty(f))
+		return -EBADF;
+
+	inode = file_inode(fd_file(f));
+	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+		return -EBADF;
+	info = MQUEUE_I(inode);
+	audit_file(fd_file(f));
+
+	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
+		return -EBADF;
+
+	if (unlikely(args->msg_len < info->attr.mq_msgsize))
+		return -EMSGSIZE;
+	if (index >= (unsigned long)info->attr.mq_maxmsg)
+		return -ENOENT;
+
+	spin_lock(&info->lock);
+	if (info->attr.mq_curmsgs == 0) {
+		spin_unlock(&info->lock);
+		return -EAGAIN;
+	}
+	msg_ptr = mq_peek_index(info, index);
+	if (!msg_ptr) {
+		spin_unlock(&info->lock);
+		return -ENOENT;
+	}
+	k_m_type = msg_ptr->m_type;
+	k_m_ts = msg_ptr->m_ts;
+	spin_unlock(&info->lock);
+
+	k_msg_buffer = alloc_msg(k_m_ts);
+	if (!k_msg_buffer)
+		return -ENOMEM;
+
+	/*
+	 * info->lock is taken twice here. This avoids an atomic memory
+	 * allocation under the spinlock and avoids allocating before
+	 * confirming that a message actually exists to peek.
+	 */
+	spin_lock(&info->lock);
+	msg_ptr = mq_peek_index(info, index);
+	if (!msg_ptr || msg_ptr->m_type != k_m_type ||
+	    msg_ptr->m_ts != k_m_ts) {
+		spin_unlock(&info->lock);
+		free_msg(k_msg_buffer);
+		return -EAGAIN;
+	}
+	if (IS_ERR(copy_msg(msg_ptr, k_msg_buffer, k_m_ts))) {
+		spin_unlock(&info->lock);
+		free_msg(k_msg_buffer);
+		return -EINVAL;
+	}
+	spin_unlock(&info->lock);
+
+	ret = k_msg_buffer->m_ts;
+	if (args->msg_prio && put_user(k_m_type, args->msg_prio)) {
+		free_msg(k_msg_buffer);
+		return -EFAULT;
+	}
+	if (store_msg(args->msg_ptr, k_msg_buffer, k_m_ts)) {
+		free_msg(k_msg_buffer);
+		return -EFAULT;
+	}
+	free_msg(k_msg_buffer);
+	return ret;
+}
+
 SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
 		size_t, msg_len, unsigned int, msg_prio,
 		const struct __kernel_timespec __user *, u_abs_timeout)
@@ -1258,6 +1368,27 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
 	return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
 }
 
+SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
+		struct mq_timedreceive2_args __user *, uargs, unsigned int,
+		flags, const unsigned long, index,
+		const struct __kernel_timespec __user *, u_abs_timeout)
+{
+	struct mq_timedreceive2_args args;
+	struct timespec64 ts, *p = NULL;
+
+	if (copy_from_user(&args, uargs, sizeof(args)))
+		return -EFAULT;
+
+	if (u_abs_timeout) {
+		int res = prepare_timeout(u_abs_timeout, &ts);
+
+		if (res)
+			return res;
+		p = &ts;
+	}
+	return do_mq_timedreceive2(mqdes, &args, flags, index, p);
+}
+
 /*
  * Notes: the case when user wants us to deregister (with NULL as pointer)
  * and he isn't currently owner of notification, will be silently discarded.
@@ -1449,6 +1580,17 @@ SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
 	return 0;
 }
 
+#ifdef CONFIG_COMPAT_32BIT_TIME
+static int compat_prepare_timeout(const struct old_timespec32 __user *p,
+								struct timespec64 *ts)
+{
+	if (get_old_timespec32(ts, p))
+		return -EFAULT;
+	if (!timespec64_valid(ts))
+		return -EINVAL;
+	return 0;
+}
+
 #ifdef CONFIG_COMPAT
 
 struct compat_mq_attr {
@@ -1490,6 +1632,22 @@ static inline int put_compat_mq_attr(const struct mq_attr *attr,
 	return 0;
 }
 
+static inline int get_compat_mq_args(struct mq_timedreceive2_args *args,
+									struct compat_mq_timedreceive2_args __user *uargs)
+{
+	struct compat_mq_timedreceive2_args v;
+
+	if (copy_from_user(&v, uargs, sizeof(v)))
+		return -EFAULT;
+
+	memset(args, 0, sizeof(*args));
+	args->msg_len = (size_t)v.msg_len;
+	args->msg_prio = (unsigned int *)compat_ptr(v.msg_prio);
+	args->msg_ptr = (char *)compat_ptr(v.msg_ptr);
+
+	return 0;
+}
+
 COMPAT_SYSCALL_DEFINE4(mq_open, const char __user *, u_name,
 		       int, oflag, compat_mode_t, mode,
 		       struct compat_mq_attr __user *, u_attr)
@@ -1541,19 +1699,30 @@ COMPAT_SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
 		return -EFAULT;
 	return 0;
 }
-#endif
 
-#ifdef CONFIG_COMPAT_32BIT_TIME
-static int compat_prepare_timeout(const struct old_timespec32 __user *p,
-				   struct timespec64 *ts)
+COMPAT_SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
+		       struct compat_mq_timedreceive2_args __user *, uargs,
+		       unsigned int, flags, const unsigned long, index,
+		       const struct old_timespec32 __user *, u_abs_timeout)
 {
-	if (get_old_timespec32(ts, p))
+	struct mq_timedreceive2_args args;
+	struct timespec64 ts, *p = NULL;
+
+	if (get_compat_mq_args(&args, uargs))
 		return -EFAULT;
-	if (!timespec64_valid(ts))
-		return -EINVAL;
-	return 0;
+
+	if (u_abs_timeout) {
+		int res = compat_prepare_timeout(u_abs_timeout, &ts);
+
+		if (res)
+			return res;
+		p = &ts;
+	}
+	return do_mq_timedreceive2(mqdes, &args, flags, index, p);
 }
 
+#endif
+
 SYSCALL_DEFINE5(mq_timedsend_time32, mqd_t, mqdes,
 		const char __user *, u_msg_ptr,
 		unsigned int, msg_len, unsigned int, msg_prio,
@@ -1583,6 +1752,7 @@ SYSCALL_DEFINE5(mq_timedreceive_time32, mqd_t, mqdes,
 	}
 	return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
 }
+
 #endif
 
 static const struct inode_operations mqueue_dir_inode_operations = {
diff --git a/ipc/msg.c b/ipc/msg.c
index 62996b97f0ac..6392b11dd7f7 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -1156,7 +1156,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 			 * not update queue parameters.
 			 */
 			if (msgflg & MSG_COPY) {
-				msg = copy_msg(msg, copy);
+				msg = copy_msg(msg, copy, msg->m_ts);
 				goto out_unlock0;
 			}
 
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index e28f0cecb2ec..c5536ce47cc2 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -51,7 +51,7 @@ static int __init init_msg_buckets(void)
 }
 subsys_initcall(init_msg_buckets);
 
-static struct msg_msg *alloc_msg(size_t len)
+struct msg_msg *alloc_msg(size_t len)
 {
 	struct msg_msg *msg;
 	struct msg_msgseg **pseg;
@@ -122,39 +122,33 @@ struct msg_msg *load_msg(const void __user *src, size_t len)
 	free_msg(msg);
 	return ERR_PTR(err);
 }
-#ifdef CONFIG_CHECKPOINT_RESTORE
-struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
+
+struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len)
 {
-	struct msg_msgseg *dst_pseg, *src_pseg;
-	size_t len = src->m_ts;
-	size_t alen;
+	struct msg_msgseg *src_seg, *dst_seg;
+	size_t remaining, chunk;
 
-	if (src->m_ts > dst->m_ts)
+	if (len > src->m_ts)
 		return ERR_PTR(-EINVAL);
-
-	alen = min(len, DATALEN_MSG);
-	memcpy(dst + 1, src + 1, alen);
-
-	for (dst_pseg = dst->next, src_pseg = src->next;
-	     src_pseg != NULL;
-	     dst_pseg = dst_pseg->next, src_pseg = src_pseg->next) {
-
-		len -= alen;
-		alen = min(len, DATALEN_SEG);
-		memcpy(dst_pseg + 1, src_pseg + 1, alen);
+	chunk = min(len, DATALEN_MSG);
+	memcpy(dst + 1, src + 1, chunk);
+	remaining = len - chunk;
+	src_seg = src->next;
+	dst_seg = dst->next;
+	while (remaining > 0 && src_seg && dst_seg) {
+		chunk = min(remaining, DATALEN_SEG);
+		memcpy(dst_seg + 1, src_seg + 1, chunk);
+		remaining -= chunk;
+		src_seg = src_seg->next;
+		dst_seg = dst_seg->next;
 	}
-
+	if (remaining != 0)
+		return ERR_PTR(-EINVAL);
 	dst->m_type = src->m_type;
-	dst->m_ts = src->m_ts;
-
+	dst->m_ts   = src->m_ts;
 	return dst;
 }
-#else
-struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
-{
-	return ERR_PTR(-ENOSYS);
-}
-#endif
+
 int store_msg(void __user *dest, struct msg_msg *msg, size_t len)
 {
 	size_t alen;
diff --git a/ipc/util.h b/ipc/util.h
index a55d6cebe6d3..374abeee79b3 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -197,8 +197,9 @@ int ipc_parse_version(int *cmd);
 
 extern void free_msg(struct msg_msg *msg);
 extern struct msg_msg *load_msg(const void __user *src, size_t len);
-extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
+extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len);
 extern int store_msg(void __user *dest, struct msg_msg *msg, size_t len);
+extern struct msg_msg *alloc_msg(size_t len);
 
 static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int id)
 {
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index add3032da16f..658d6b8274b3 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -391,6 +391,7 @@ COND_SYSCALL(setuid16);
 /* restartable sequence */
 COND_SYSCALL(rseq);
 COND_SYSCALL(rseq_slice_yield);
+COND_SYSCALL(mq_timedreceive2);
 
 COND_SYSCALL(uretprobe);
 COND_SYSCALL(uprobe);
-- 
2.43.0



* [PATCH v3 2/5]IPC: Added system call number in all  most common  arch.
  2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue Mathura_Kumar
@ 2026-04-08  9:51 ` Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 3/5]IPC: Prepared Documentation and test Mathura_Kumar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Mathura_Kumar @ 2026-04-08  9:51 UTC (permalink / raw)
  To: criu; +Cc: academic1mathura, avagin, ptikhomirov, rstoyanov

Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/tools/syscall_32.tbl             | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
 scripts/syscall.tbl                         | 1 +
 17 files changed, 17 insertions(+)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index f31b7afffc34..0ff42d241419 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -511,3 +511,4 @@
 579	common	file_setattr			sys_file_setattr
 580	common	listns				sys_listns
 581	common	rseq_slice_yield		sys_rseq_slice_yield
+582	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 94351e22bfcf..c00a185fbe50 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -486,3 +486,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2          compat_sys_mq_timedreceive2
diff --git a/arch/arm64/tools/syscall_32.tbl b/arch/arm64/tools/syscall_32.tbl
index 62d93d88e0fe..4eac85c25658 100644
--- a/arch/arm64/tools/syscall_32.tbl
+++ b/arch/arm64/tools/syscall_32.tbl
@@ -483,3 +483,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2     	sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 248934257101..6e336289eaf4 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -471,3 +471,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 223d26303627..6b617de0442e 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -477,3 +477,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 7430714e2b8f..1f761cf9cfb0 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -410,3 +410,4 @@
 469	n32	file_setattr			sys_file_setattr
 470	n32	listns				sys_listns
 471	n32	rseq_slice_yield		sys_rseq_slice_yield
+472	n32	mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 630aab9e5425..75de6ee2df94 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -386,3 +386,4 @@
 469	n64	file_setattr			sys_file_setattr
 470	n64	listns				sys_listns
 471	n64	rseq_slice_yield		sys_rseq_slice_yield
+472	n64	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 128653112284..f11be7e34d93 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -459,3 +459,4 @@
 469	o32	file_setattr			sys_file_setattr
 470	o32	listns				sys_listns
 471	o32	rseq_slice_yield		sys_rseq_slice_yield
+472	o32	mq_timedreceive2		sys_mq_timedreceive2	compat_sys_mq_timedreceive2
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index f6e2d0379d57..1b22b8d0edcc 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -470,3 +470,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 4fcc7c58a105..ab57aa18b24a 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -562,3 +562,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	nospu	rseq_slice_yield		sys_rseq_slice_yield
+472 common	mq_timedreceive2	sys_mq_timedreceive2	compat_sys_mq_timedreceive2
\ No newline at end of file
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 09a7ef04d979..34df682061d9 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -398,3 +398,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 70b315cbe710..cac787dbef1a 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -475,3 +475,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 7e71bf7fcd14..06bcea3622cb 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -517,3 +517,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472 common	mq_timedreceive2	sys_mq_timedreceive2	compat_sys_mq_timedreceive2
\ No newline at end of file
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f832ebd2d79b..79154690058f 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -477,3 +477,4 @@
 469	i386	file_setattr		sys_file_setattr
 470	i386	listns			sys_listns
 471	i386	rseq_slice_yield	sys_rseq_slice_yield
+472	i386	mq_timedreceive2	sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 524155d655da..a76f9a2f043a 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -441,3 +441,4 @@
 547	x32	pwritev2		compat_sys_pwritev64v2
 # This is the end of the legacy x32 range.  Numbers 548 and above are
 # not special and are not to be used for x32-specific syscalls.
+548	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index a9bca4e484de..ae106ac97418 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 7a42b32b6577..282bfff83d59 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -412,3 +412,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
-- 
2.43.0



* [PATCH v3 3/5]IPC: Prepared Documentation and test
  2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 2/5]IPC: Added system call number in all most common arch Mathura_Kumar
@ 2026-04-08  9:51 ` Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 4/5]IPC:Added entry in performance tools for new system call Mathura_Kumar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Mathura_Kumar @ 2026-04-08  9:51 UTC (permalink / raw)
  To: criu; +Cc: academic1mathura, avagin, ptikhomirov, rstoyanov


Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
---
 Documentation/userspace-api/index.rst  |   1 +
 Documentation/userspace-api/ipc.rst    | 222 +++++++
 tools/testing/selftests/ipc/.gitignore |   1 +
 tools/testing/selftests/ipc/Makefile   |   9 +-
 tools/testing/selftests/ipc/mq_peek.c  | 785 +++++++++++++++++++++++++
 5 files changed, 1017 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/userspace-api/ipc.rst
 create mode 100644 tools/testing/selftests/ipc/mq_peek.c

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index a68b1bea57a8..bf747a91e74a 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -22,6 +22,7 @@ System calls
    ioctl/index
    mseal
    rseq
+   ipc
 
 Security-related interfaces
 ===========================
diff --git a/Documentation/userspace-api/ipc.rst b/Documentation/userspace-api/ipc.rst
new file mode 100644
index 000000000000..14a5f0ce7230
--- /dev/null
+++ b/Documentation/userspace-api/ipc.rst
@@ -0,0 +1,222 @@
+mq_timedreceive2 system call
+=============================
+
+This document describes the mq_timedreceive2() system call. It provides
+an overview of the feature, interface specification, design, and
+test specification.
+
+Contents
+--------
+
+        1) Overview
+        2) Functional Specification
+        3) Design
+        4) Implementation Notes
+        5) Test Specification
+
+1) Overview
+-----------
+
+POSIX message queues on Linux provide mq_timedreceive() for consuming
+messages from a queue. This interface requires the caller to pass the
+message buffer, length and priority pointer as individual arguments to
+the system call. This imposes a fixed calling convention that cannot be
+extended without breaking the ABI.
+
+mq_timedreceive2() introduces a new system call entry point that accepts
+message buffer parameters via a struct argument rather than as individual
+syscall arguments. This frees the remaining syscall argument slots for
+new functionality flags and a message index, enabling non-destructive
+peek and indexed access semantics that are not possible with the
+original interface.
+
+One 64-bit variant is provided with compat handling:
+    mq_timedreceive2()
+
+2) Functional Specification
+---------------------------
+
+NAME
+        mq_timedreceive2 - receive or peek at a message from a
+        POSIX message queue
+
+SYNOPSIS
+
+.. code-block:: c
+
+        #include <mqueue.h>
+
+        struct mq_timedreceive2_args {
+                size_t         msg_len;
+                unsigned int  *msg_prio;
+                char          *msg_ptr;
+        };
+
+        ssize_t mq_timedreceive2(mqd_t mqdes,
+                                 struct mq_timedreceive2_args *uargs,
+                                 unsigned int flags,
+                                 unsigned long index,
+                                 const struct timespec *abs_timeout);
+
+Note: No glibc wrapper exists for this syscall. Callers must invoke it
+directly using syscall(2).
+
+DESCRIPTION
+        mq_timedreceive2() receives or peeks at a message from the
+        message queue referred to by the descriptor mqdes.
+
+        The uargs structure provides the message buffer parameters:
+
+        ``msg_ptr``
+                Userspace buffer to receive the message body.
+
+        ``msg_len``
+                Size in bytes of the buffer at msg_ptr. Must be greater
+                than or equal to the mq_msgsize attribute of the queue.
+
+        ``msg_prio``
+                If not NULL, the priority of the received message is
+                stored here.
+
+        The flags argument controls receive behavior. The following
+        flag is defined:
+
+        ``MQ_PEEK``
+                Copy the message into msg_ptr without removing it from
+                the queue. The queue is not modified. If this flag is
+                not set, behavior is identical to mq_timedreceive() and
+                the message is consumed.
+
+        The index argument selects which message to operate on within
+        the priority-ordered queue. index 0 refers to the highest
+        priority message; an out-of-range index fails with ENOENT.
+        When MQ_PEEK is not set, index must be 0.
+
+        The abs_timeout argument specifies an absolute timeout. When
+        MQ_PEEK is set, abs_timeout is ignored since peek is a
+        non-blocking snapshot operation. When MQ_PEEK is not set,
+        abs_timeout behaves identically to mq_timedreceive().
+
+RETURN VALUE
+        On success, returns the number of bytes copied into msg_ptr.
+        On failure, returns -1 and sets errno.
+
+ERRORS
+        ``EAGAIN``
+                Queue is empty and MQ_PEEK is set. Peek is always
+                non-blocking and returns immediately on empty queue.
+
+        ``EBADF``
+                mqdes is not a valid message queue descriptor open
+                for reading.
+
+        ``EFAULT``
+                uargs, msg_ptr, msg_prio, or abs_timeout points to
+                an invalid address.
+
+        ``EINVAL``
+                flags contains an unknown value, or index is nonzero
+                and MQ_PEEK is not set.
+
+        ``EMSGSIZE``
+                msg_len is less than the mq_msgsize attribute of
+                the queue.
+
+        ``ETIMEDOUT``
+                Pop path only. The call timed out before a message
+                became available. Never returned on peek path.
+
+3) Design
+---------
+
+3.1 Struct-based argument passing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The message buffer parameters (msg_ptr, msg_len, msg_prio) are
+consolidated into struct mq_timedreceive2_args rather than passed
+as individual syscall arguments. A system call can take at most six
+arguments, and the original mq_timedreceive() already uses five,
+leaving room for only one more. Consolidating the buffer parameters
+into a struct recovers two argument slots for flags and index while
+keeping the interface clean and forward-compatible.
+
+Future extensions can be made by adding new flag bits without
+requiring a new syscall number.
+
+3.2 Compat handling
+~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+        struct compat_mq_timedreceive2_args {
+                compat_size_t  msg_len;
+                compat_uptr_t  msg_prio;
+                compat_uptr_t  msg_ptr;
+        };
+
+The compat entry point performs the necessary conversions before
+calling the shared do_mq_timedreceive2() implementation.
+
+3.3 Peek implementation
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When MQ_PEEK is set, the implementation locates the target message
+in the priority tree but does not remove it. Two locks are taken:
+the first confirms a message exists before any allocation is
+attempted, avoiding allocation on empty queues. The second protects
+the kernel temporary buffer copy operation. The message is copied to
+userspace and both locks are released with the queue unmodified.
+
+3.4 Index argument
+~~~~~~~~~~~~~~~~~~~
+
+The priority tree is walked to the node at position index within
+the priority ordering. index 0 is always the highest priority
+message, consistent with what an unconditional mq_timedreceive()
+would return.
+
+4) Implementation Notes
+-----------------------
+
+The implementation lives in ipc/mqueue.c. The native and compat
+mq_timedreceive2 entry points are thin wrappers that validate and convert
+arguments before calling the shared internal function do_mq_timedreceive2().
+
+Syscall numbers are assigned for the most common architectures. Refer
+to the respective syscall table files under arch/ for per-architecture
+numbers.
+
+5) Test Specification
+---------------------
+
+Tests for mq_timedreceive2() should cover the following:
+
+1) Basic receive: verify that without MQ_PEEK the message is consumed
+   and queue depth decreases by one. Verify message body and priority
+   are correct.
+
+2) Peek semantics: verify that with MQ_PEEK the message body and
+   priority are returned correctly and the queue depth is unchanged
+   after the call.
+
+3) Repeated peek: verify that calling mq_timedreceive2() with MQ_PEEK
+   multiple times on the same queue returns the same message each time
+   without modifying the queue.
+
+4) Index argument: verify that index 0 returns the highest priority
+   message. Verify that out-of-range index returns ENOENT.
+
+5) Empty queue: verify that peek on an empty queue returns EAGAIN
+   immediately without blocking.
+
+6) Memory behavior: verify that both small and large message copies
+   work correctly without corruption.
+
+7) Invalid arguments: verify EBADF for invalid mqdes, EFAULT for bad
+   pointers in uargs, EINVAL for unknown flags, EMSGSIZE for msg_len
+   smaller than queue mq_msgsize.
+
+8) Concurrent access: verify that simultaneous peek from multiple
+   threads returns consistent results and does not corrupt queue
+   state. Verify that a concurrent mq_receive() and an MQ_PEEK call
+   do not race.
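
Editorial sketch, not part of the patch: the index and error rules documented above can be modelled in plain userspace C, which may help when reasoning about what the tests should expect. Everything named model_* is hypothetical, the MQ_PEEK value is the one the selftest in this series defines, and the order in which the checks run is an assumption. The model covers message selection and error codes only; it does not consume messages and treats an empty queue as never blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MQ_PEEK 2U /* value used by the selftest in this series */

/* Hypothetical userspace stand-in for one queued message. */
struct model_msg {
	unsigned int prio;	/* higher value = higher priority */
	unsigned long seq;	/* unique arrival order; lower = older */
	const char *body;
};

/* True if a would be received before b: priority first, then FIFO. */
static int model_before(const struct model_msg *a, const struct model_msg *b)
{
	if (a->prio != b->prio)
		return a->prio > b->prio;
	return a->seq < b->seq;
}

/* Return the message with exactly `index` messages ahead of it, or NULL. */
static const struct model_msg *model_select(const struct model_msg *q,
					    size_t n, unsigned long index)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		unsigned long ahead = 0;

		for (j = 0; j < n; j++)
			if (j != i && model_before(&q[j], &q[i]))
				ahead++;
		if (ahead == index)
			return &q[i];
	}
	return NULL;
}

/*
 * Model of the documented error rules. Returns 0 and sets *out on
 * success, or a negative errno. The check order is an assumption.
 */
static int model_receive2(const struct model_msg *q, size_t n,
			  size_t mq_msgsize, size_t msg_len,
			  unsigned int flags, unsigned long index,
			  const struct model_msg **out)
{
	assert(out != NULL);
	*out = NULL;

	if (flags & ~MQ_PEEK)
		return -EINVAL;		/* unknown flag bit */
	if (!(flags & MQ_PEEK) && index != 0)
		return -EINVAL;		/* index is only valid with peek */
	if (msg_len < mq_msgsize)
		return -EMSGSIZE;
	if (n == 0)
		return -EAGAIN;		/* model: empty queue never blocks */
	*out = model_select(q, n, index);
	return *out ? 0 : -ENOENT;	/* index past the last message */
}
```

With two messages at the same priority, the older one is reported at the lower index, which is the FIFO behaviour the selftest checks; an index equal to the queue depth yields -ENOENT.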
diff --git a/tools/testing/selftests/ipc/.gitignore b/tools/testing/selftests/ipc/.gitignore
index 9ed280e4c704..fe609d98ecb2 100644
--- a/tools/testing/selftests/ipc/.gitignore
+++ b/tools/testing/selftests/ipc/.gitignore
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 msgque_test
 msgque
+mq_peek
diff --git a/tools/testing/selftests/ipc/Makefile b/tools/testing/selftests/ipc/Makefile
index 50e9c299fc4a..1a8b27127ca8 100644
--- a/tools/testing/selftests/ipc/Makefile
+++ b/tools/testing/selftests/ipc/Makefile
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 uname_M := $(shell uname -m 2>/dev/null || echo not)
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
+CC = gcc
+CFLAGS = -static -O2
 ifeq ($(ARCH),i386)
         ARCH := x86
 	CFLAGS := -DCONFIG_X86_32 -D__i386__
@@ -9,10 +11,15 @@ ifeq ($(ARCH),x86_64)
 	ARCH := x86
 	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
 endif
+ifeq ($(ARCH),aarch64)
+        ARCH := arm64
+        CFLAGS := -DCONFIG_ARM64
+endif
 
 CFLAGS += $(KHDR_INCLUDES)
 
-TEST_GEN_PROGS := msgque
+TEST_GEN_PROGS := msgque mq_peek
+LDLIBS += -lrt -lpthread
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/ipc/mq_peek.c b/tools/testing/selftests/ipc/mq_peek.c
new file mode 100644
index 000000000000..d08596ae6e1c
--- /dev/null
+++ b/tools/testing/selftests/ipc/mq_peek.c
@@ -0,0 +1,785 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*  - Correct priority-rank indexed access (index 0 = highest priority)
+ *  - FIFO ordering within same priority level
+ *  - Concurrent peek + receive races
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <time.h>
+#include <mqueue.h>
+#include "kselftest.h"
+#include <linux/types.h>
+
+#ifndef __NR_mq_timedreceive2
+
+#if defined(__alpha__)
+#define __NR_mq_timedreceive2 582
+
+#elif defined(__x86_64__)
+#define __NR_mq_timedreceive2 548
+
+#else
+/*
+ * Common value used by many architectures: arm, m68k, microblaze,
+ * mips, parisc, x86_32, and others.
+ */
+#define __NR_mq_timedreceive2 472
+
+#endif
+#endif
+
+#define MQ_PEEK 2U
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#endif
+
+struct mq_timedreceive2_args {
+	size_t msg_len;
+	unsigned int *msg_prio;
+	char *msg_ptr;
+};
+
+static long mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *uargs,
+			     unsigned int flags, unsigned long index,
+			     const struct timespec *timeout)
+{
+	return syscall(__NR_mq_timedreceive2, (long)mqdes, uargs, (long)flags,
+		       index, timeout);
+}
+
+#define MQ_NAME_PREFIX "/mq_peek_test_"
+#define MAX_MSG_SIZE 128
+#define LARGE_MSG_SIZE 4064
+#define MQ_MAXMSG 16
+
+#define PRIO_HIGH 9
+#define PRIO_MED 5
+#define PRIO_LOW 1
+
+static mqd_t open_queue(const char *suffix, long msgsize)
+{
+	char name[64];
+	struct mq_attr attr = {
+		.mq_flags = 0,
+		.mq_maxmsg = MQ_MAXMSG,
+		.mq_msgsize = msgsize,
+		.mq_curmsgs = 0,
+	};
+	mqd_t mqd;
+
+	snprintf(name, sizeof(name), "%s%s", MQ_NAME_PREFIX, suffix);
+	mq_unlink(name);
+
+	mqd = mq_open(name, O_NONBLOCK | O_RDWR | O_CREAT | O_EXCL, 0600, &attr);
+	if (mqd == (mqd_t)-1) {
+		ksft_test_result_fail("mq_open(%s): %m\n", name);
+		ksft_exit_fail();
+	}
+
+	mq_unlink(name);
+	return mqd;
+}
+
+static void send_msg(mqd_t mqd, unsigned int prio, const char *text, size_t len)
+{
+	if (mq_send(mqd, text, len, prio) != 0) {
+		ksft_test_result_fail("mq_send(prio=%u): %m\n", prio);
+		ksft_exit_fail();
+	}
+}
+
+static long peek(mqd_t mqd, unsigned long index, char *buf, size_t bufsz,
+		 unsigned int *prio)
+{
+	struct mq_timedreceive2_args args = {
+		.msg_len = bufsz,
+		.msg_prio = prio,
+		.msg_ptr = buf,
+	};
+	return mq_timedreceive2(mqd, &args, MQ_PEEK, index, NULL);
+}
+
+static long queue_depth(mqd_t mqd)
+{
+	struct mq_attr attr;
+
+	if (mq_getattr(mqd, &attr) != 0)
+		return -1;
+	return attr.mq_curmsgs;
+}
+
+static void test_peek_empty_queue(void)
+{
+	mqd_t mqd = open_queue("empty", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+
+	ret = peek(mqd, 0, buf, sizeof(buf), &prio);
+	if (ret == -1 && errno == EAGAIN)
+		ksft_test_result_pass("peek on empty queue [EAGAIN]\n");
+	else
+		ksft_test_result_fail("peek on empty queue: expected EAGAIN, got ret=%ld errno=%d (%m)\n",
+								ret, errno);
+
+	mq_close(mqd);
+}
+
+static void test_peek_invalid_fd(void)
+{
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+
+	ret = peek((mqd_t)-1, 0, buf, sizeof(buf), &prio);
+	if (ret == -1 && errno == EBADF)
+		ksft_test_result_pass("peek invalid fd [EBADF]\n");
+	else
+		ksft_test_result_fail("peek invalid fd: expected EBADF, got ret=%ld errno=%d\n",
+							   ret, errno);
+}
+
+static void test_peek_non_mqueue_fd(void)
+{
+	int pipefd[2];
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+
+	if (pipe(pipefd) != 0) {
+		ksft_test_result_skip("pipe() failed, skipping non-mqueue-fd test\n");
+		return;
+	}
+
+	ret = peek((mqd_t)pipefd[0], 0, buf, sizeof(buf), &prio);
+	if (ret == -1 && errno == EBADF)
+		ksft_test_result_pass("peek on pipe fd [EBADF]\n");
+	else
+		ksft_test_result_fail("peek non-mqueue fd: expected EBADF, got ret=%ld errno=%d\n",
+								ret, errno);
+
+	close(pipefd[0]);
+	close(pipefd[1]);
+}
+
+static void test_peek_writeonly_fd(void)
+{
+	char name[] = "/ksft_mq_peek_wo";
+	struct mq_attr attr = { .mq_maxmsg = 4, .mq_msgsize = MAX_MSG_SIZE };
+	mqd_t rw, wo;
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+
+	mq_unlink(name);
+	rw = mq_open(name, O_RDWR | O_CREAT, 0600, &attr);
+	if (rw == (mqd_t)-1) {
+		ksft_test_result_skip("mq_open RW failed: %m\n");
+		return;
+	}
+	wo = mq_open(name, O_WRONLY);
+	mq_unlink(name);
+
+	if (wo == (mqd_t)-1) {
+		ksft_test_result_skip("mq_open WO failed: %m\n");
+		mq_close(rw);
+		return;
+	}
+
+	send_msg(rw, PRIO_HIGH, "x", 1);
+
+	ret = peek(wo, 0, buf, sizeof(buf), &prio);
+	if (ret == -1 && errno == EBADF)
+		ksft_test_result_pass("peek on O_WRONLY fd [EBADF]\n");
+	else
+		ksft_test_result_fail("peek WO fd: expected EBADF, got ret=%ld errno=%d\n",
+								ret, errno);
+
+	mq_close(wo);
+	mq_close(rw);
+}
+
+static void test_peek_buffer_too_small(void)
+{
+	mqd_t mqd = open_queue("small", MAX_MSG_SIZE);
+	char tiny[1]; /* deliberately too small */
+	unsigned int prio;
+	struct mq_timedreceive2_args args = {
+		.msg_len = sizeof(tiny),
+		.msg_prio = &prio,
+		.msg_ptr = tiny,
+	};
+	long ret;
+
+	send_msg(mqd, PRIO_HIGH, "hello", 5);
+
+	ret = mq_timedreceive2(mqd, &args, MQ_PEEK, 0, NULL);
+	if (ret == -1 && errno == EMSGSIZE)
+		ksft_test_result_pass("peek with small buf [EMSGSIZE]\n");
+	else
+		ksft_test_result_fail("peek small buf: expected EMSGSIZE, got ret=%ld errno=%d\n",
+								ret, errno);
+
+	mq_close(mqd);
+}
+
+static void test_peek_bad_msg_ptr(void)
+{
+	mqd_t mqd = open_queue("bad_ptr", MAX_MSG_SIZE);
+	unsigned int prio;
+
+	struct mq_timedreceive2_args args = {
+		.msg_len = MAX_MSG_SIZE,
+		.msg_prio = &prio,
+		.msg_ptr = (char *)0x1,
+	};
+
+	long ret;
+
+	send_msg(mqd, PRIO_HIGH, "payload", 7);
+	ret = mq_timedreceive2(mqd, &args, MQ_PEEK, 0, NULL);
+	if (ret == -1 && errno == EFAULT)
+		ksft_test_result_pass("peek bad msg_ptr [EFAULT]\n");
+	else
+		ksft_test_result_fail("peek bad msg_ptr: expected EFAULT, got ret=%ld errno=%d\n",
+								ret, errno);
+
+	mq_close(mqd);
+}
+
+static void test_peek_index_out_of_range(void)
+{
+	mqd_t mqd = open_queue("oob", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+
+	send_msg(mqd, PRIO_MED, "one", 3);
+	ret = peek(mqd, 1, buf, sizeof(buf), &prio);
+	if (ret == -1 && errno == ENOENT)
+		ksft_test_result_pass("peek OOB index [ENOENT]\n");
+	else
+		ksft_test_result_fail("peek OOB: expected ENOENT, got ret=%ld errno=%d\n",
+								ret, errno);
+
+	mq_close(mqd);
+}
+
+static void test_peek_basic_data(void)
+{
+	mqd_t mqd = open_queue("basic", MAX_MSG_SIZE);
+	const char *payload = "peek-test-payload";
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio = 0;
+	long ret;
+
+	send_msg(mqd, PRIO_HIGH, payload, strlen(payload));
+
+	memset(buf, 0, sizeof(buf));
+	ret = peek(mqd, 0, buf, sizeof(buf), &prio);
+
+	if (ret < 0) {
+		ksft_test_result_fail("basic peek failed: ret=%ld errno=%d (%m)\n", ret, errno);
+		goto out;
+	}
+	if ((size_t)ret != strlen(payload)) {
+		ksft_test_result_fail("basic peek: wrong size %ld (expected %zu)\n", ret, strlen(payload));
+		goto out;
+	}
+	if (memcmp(buf, payload, strlen(payload)) != 0) {
+		ksft_test_result_fail("basic peek: payload mismatch\n");
+		goto out;
+	}
+	if (prio != PRIO_HIGH) {
+		ksft_test_result_fail("basic peek: wrong prio %u (expected %d)\n", prio, PRIO_HIGH);
+		goto out;
+	}
+	ksft_test_result_pass("basic peek: correct data and priority\n");
+out:
+	mq_close(mqd);
+}
+
+static void test_peek_nondestructive(void)
+{
+	mqd_t mqd = open_queue("nodestr", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	int i;
+
+	send_msg(mqd, PRIO_HIGH, "A", 1);
+	send_msg(mqd, PRIO_MED, "B", 1);
+	send_msg(mqd, PRIO_LOW, "C", 1);
+
+	if (queue_depth(mqd) != 3) {
+		ksft_test_result_fail("initial depth != 3\n");
+		mq_close(mqd);
+		return;
+	}
+
+	for (i = 0; i < 10; i++) {
+		peek(mqd, 0, buf, sizeof(buf), &prio);
+		peek(mqd, 1, buf, sizeof(buf), &prio);
+		peek(mqd, 2, buf, sizeof(buf), &prio);
+	}
+
+	if (queue_depth(mqd) == 3)
+		ksft_test_result_pass("peek is non-destructive (depth stays 3)\n");
+	else
+		ksft_test_result_fail("peek modified queue: depth=%ld (expected 3)\n", queue_depth(mqd));
+
+	mq_close(mqd);
+}
+
+static void test_peek_priority_order(void)
+{
+	mqd_t mqd = open_queue("prio_order", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio = 0;
+	long ret;
+	int pass = 1;
+
+	send_msg(mqd, PRIO_LOW, "low", 3);
+	send_msg(mqd, PRIO_HIGH, "high", 4);
+	send_msg(mqd, PRIO_MED, "med", 3);
+
+	/* index 0 must return highest priority */
+	ret = peek(mqd, 0, buf, sizeof(buf), &prio);
+	if (ret < 0 || prio != PRIO_HIGH) {
+		ksft_test_result_fail("prio_order index0: prio=%u ret=%ld errno=%d\n", prio,
+								ret, errno);
+		pass = 0;
+	}
+	if (pass && memcmp(buf, "high", 4) != 0) {
+		ksft_test_result_fail("prio_order index0: wrong payload\n");
+		pass = 0;
+	}
+
+	/* index 1 must return medium priority */
+	ret = peek(mqd, 1, buf, sizeof(buf), &prio);
+	if (pass && (ret < 0 || prio != PRIO_MED)) {
+		ksft_test_result_fail("prio_order index1: prio=%u ret=%ld\n",
+				      prio, ret);
+		pass = 0;
+	}
+	if (pass && memcmp(buf, "med", 3) != 0) {
+		ksft_test_result_fail("prio_order index1: wrong payload\n");
+		pass = 0;
+	}
+
+	/* index 2 must return lowest priority */
+	ret = peek(mqd, 2, buf, sizeof(buf), &prio);
+	if (pass && (ret < 0 || prio != PRIO_LOW)) {
+		ksft_test_result_fail("prio_order index2: prio=%u ret=%ld\n",
+				      prio, ret);
+		pass = 0;
+	}
+	if (pass && memcmp(buf, "low", 3) != 0) {
+		ksft_test_result_fail("prio_order index2: wrong payload\n");
+		pass = 0;
+	}
+
+	if (pass)
+		ksft_test_result_pass("priority ordering: index0=HIGH, index1=MED, index2=LOW\n");
+
+	mq_close(mqd);
+}
+
+static void test_peek_fifo_within_priority(void)
+{
+	mqd_t mqd = open_queue("fifo", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+	int pass = 1;
+
+	send_msg(mqd, PRIO_HIGH, "first", 5);
+	send_msg(mqd, PRIO_HIGH, "second", 6);
+	send_msg(mqd, PRIO_HIGH, "third", 5);
+
+	memset(buf, 0, sizeof(buf));
+	ret = peek(mqd, 0, buf, sizeof(buf), &prio);
+	if (ret < 0 || memcmp(buf, "first", 5) != 0) {
+		ksft_test_result_fail("FIFO peek[0]: expected 'first', got '%.*s' ret=%ld\n",
+								(int)ret, buf, ret);
+		pass = 0;
+	}
+
+	if (pass) {
+		char rbuf[MAX_MSG_SIZE];
+		unsigned int rprio;
+
+		ssize_t r = mq_receive(mqd, rbuf, sizeof(rbuf), &rprio);
+
+		if (r < 0 || memcmp(rbuf, "first", 5) != 0) {
+			ksft_test_result_fail("mq_receive 'first' failed\n");
+			pass = 0;
+		}
+	}
+
+	if (pass) {
+		/*
+		 * "first" has been consumed above, so "second" is now
+		 * the oldest PRIO_HIGH message and must be at index 0.
+		 */
+		memset(buf, 0, sizeof(buf));
+		ret = peek(mqd, 0, buf, sizeof(buf), &prio);
+		if (ret < 0 || memcmp(buf, "second", 6) != 0) {
+			ksft_test_result_fail("FIFO peek after receive: expected 'second', got '%.*s'\n",
+									(int)ret, buf);
+			pass = 0;
+		}
+	}
+
+	if (pass)
+		ksft_test_result_pass("FIFO within same priority is correct\n");
+
+	mq_close(mqd);
+}
+
+static void test_peek_all_indices(void)
+{
+	const unsigned int prios[] = { 2, 7, 4, 9, 1, 6 };
+	const int N = ARRAY_SIZE(prios);
+	mqd_t mqd = open_queue("all_idx", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	char expected_payload[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+	int i, pass = 1;
+	unsigned int sorted[6];
+
+	for (i = 0; i < N; i++) {
+		snprintf(expected_payload, sizeof(expected_payload),
+			 "msg_prio_%u", prios[i]);
+		send_msg(mqd, prios[i], expected_payload,
+			 strlen(expected_payload));
+		sorted[i] = prios[i];
+	}
+
+	for (i = 0; i < N - 1; i++) {
+		int j;
+
+		for (j = i + 1; j < N; j++) {
+			if (sorted[j] > sorted[i]) {
+
+				unsigned int tmp = sorted[i];
+				sorted[i] = sorted[j];
+				sorted[j] = tmp;
+			}
+		}
+	}
+
+	for (i = 0; i < N && pass; i++) {
+		memset(buf, 0, sizeof(buf));
+
+		ret = peek(mqd, (unsigned long)i, buf, sizeof(buf), &prio);
+
+		if (ret < 0) {
+			ksft_test_result_fail("all_indices peek[%d] failed: ret=%ld errno=%d\n",
+								i, ret, errno);
+			pass = 0;
+			break;
+		}
+		if (prio != sorted[i]) {
+			ksft_test_result_fail("all_indices peek[%d]: prio=%u expected=%u\n",
+								i, prio, sorted[i]);
+			pass = 0;
+		}
+
+		snprintf(expected_payload, sizeof(expected_payload),
+			 "msg_prio_%u", sorted[i]);
+		if (memcmp(buf, expected_payload, strlen(expected_payload))) {
+			ksft_test_result_fail("all_indices peek[%d]: payload mismatch\n", i);
+			pass = 0;
+		}
+	}
+
+	if (pass && queue_depth(mqd) != N) {
+		ksft_test_result_fail("all_indices: depth=%ld expected=%d after peek\n",
+							queue_depth(mqd), N);
+		pass = 0;
+	}
+
+	if (pass) {
+		ret = peek(mqd, (unsigned long)N, buf, sizeof(buf), &prio);
+		if (!(ret == -1 && errno == ENOENT)) {
+			ksft_test_result_fail("all_indices OOB[%d]: expected ENOENT, got ret=%ld errno=%d\n",
+								N, ret, errno);
+			pass = 0;
+		}
+	}
+
+	if (pass)
+		ksft_test_result_pass("all-indices: correct prio order + OOB ENOENT\n");
+
+	mq_close(mqd);
+}
+
+static void test_peek_large_message(void)
+{
+	mqd_t mqd = open_queue("large", LARGE_MSG_SIZE);
+	char *send_buf, *recv_buf;
+	unsigned int prio = 0;
+	long ret;
+	int pass = 1;
+
+	send_buf = malloc(LARGE_MSG_SIZE);
+	recv_buf = calloc(1, LARGE_MSG_SIZE);
+	if (!send_buf || !recv_buf) {
+		ksft_test_result_skip("OOM allocating large message buffers\n");
+		goto out;
+	}
+
+	for (int i = 0; i < LARGE_MSG_SIZE; i++)
+		send_buf[i] = (char)(i & 0xFF);
+
+	send_msg(mqd, PRIO_HIGH, send_buf, LARGE_MSG_SIZE);
+
+	ret = peek(mqd, 0, recv_buf, LARGE_MSG_SIZE, &prio);
+	if (ret != LARGE_MSG_SIZE) {
+		ksft_test_result_fail("large msg peek: ret=%ld expected=%d\n",
+				      ret, LARGE_MSG_SIZE);
+		pass = 0;
+	}
+	if (pass && memcmp(send_buf, recv_buf, LARGE_MSG_SIZE) != 0) {
+		ksft_test_result_fail("large msg peek: payload mismatch\n");
+		pass = 0;
+	}
+	if (pass && prio != PRIO_HIGH) {
+		ksft_test_result_fail("large msg peek: prio=%u expected=%d\n", prio, PRIO_HIGH);
+		pass = 0;
+	}
+	if (pass && queue_depth(mqd) != 1) {
+		ksft_test_result_fail("large msg peek: queue modified (depth=%ld)\n", queue_depth(mqd));
+		pass = 0;
+	}
+	if (pass)
+		ksft_test_result_pass("large (%d B) multi-segment peek: correct\n", LARGE_MSG_SIZE);
+out:
+	free(send_buf);
+	free(recv_buf);
+	mq_close(mqd);
+}
+
+static void test_no_peek_flag_is_receive(void)
+{
+	mqd_t mqd = open_queue("nopeek", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio = 0;
+	struct mq_timedreceive2_args args = {
+		.msg_len = sizeof(buf),
+		.msg_prio = &prio,
+		.msg_ptr = buf,
+	};
+	long ret;
+
+	send_msg(mqd, PRIO_HIGH, "consume-me", 10);
+
+	ret = mq_timedreceive2(mqd, &args, 0, 0, NULL);
+	if (ret < 0) {
+		ksft_test_result_fail("no-peek receive failed: ret=%ld errno=%d\n", ret, errno);
+		mq_close(mqd);
+		return;
+	}
+	if (queue_depth(mqd) != 0)
+		ksft_test_result_fail("no-peek: queue still has messages (depth=%ld)\n", queue_depth(mqd));
+	else
+		ksft_test_result_pass("without MQ_PEEK the message is consumed normally\n");
+
+	mq_close(mqd);
+}
+
+struct race_ctx {
+	mqd_t mqd;
+	int errors;
+};
+
+static void *receiver_thread(void *arg)
+{
+	struct race_ctx *ctx = arg;
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	ssize_t r;
+
+	while ((r = mq_receive(ctx->mqd, buf, sizeof(buf), &prio)) > 0)
+		;
+
+	return NULL;
+}
+
+static void test_peek_concurrent_receive(void)
+{
+	struct race_ctx ctx;
+	pthread_t tid;
+	char buf[MAX_MSG_SIZE];
+	unsigned int prio;
+	long ret;
+	int i;
+
+	ctx.mqd = open_queue("concurrent", MAX_MSG_SIZE);
+	ctx.errors = 0;
+
+	for (i = 0; i < MQ_MAXMSG; i++) {
+		char payload[32];
+
+		snprintf(payload, sizeof(payload), "msg%d", i);
+		send_msg(ctx.mqd, (unsigned int)(i % 5) + 1, payload,
+			 strlen(payload));
+	}
+
+	if (pthread_create(&tid, NULL, receiver_thread, &ctx) != 0) {
+		ksft_test_result_skip("pthread_create failed\n");
+		mq_close(ctx.mqd);
+		return;
+	}
+
+	/*
+	 * Peek repeatedly. The queue is being drained concurrently.
+	 */
+	for (i = 0; i < 200; i++) {
+		ret = peek(ctx.mqd, (unsigned long)(i % 4), buf, sizeof(buf),
+			   &prio);
+		if (ret < 0 && errno != EAGAIN && errno != ENOENT) {
+			ctx.errors++;
+		}
+	}
+
+	pthread_join(tid, NULL);
+
+	if (ctx.errors == 0)
+		ksft_test_result_pass("concurrent peek+receive: no unexpected errors\n");
+	else
+		ksft_test_result_fail("concurrent peek+receive: %d unexpected errors\n", ctx.errors);
+
+	mq_close(ctx.mqd);
+}
+
+static void test_peek_null_prio_ptr(void)
+{
+	mqd_t mqd = open_queue("null_prio", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	struct mq_timedreceive2_args args = {
+		.msg_len = sizeof(buf),
+		.msg_prio = NULL,
+		.msg_ptr = buf,
+	};
+	long ret;
+
+	send_msg(mqd, PRIO_MED, "no-prio-needed", 14);
+
+	ret = mq_timedreceive2(mqd, &args, MQ_PEEK, 0, NULL);
+	if (ret >= 0)
+		ksft_test_result_pass("peek with NULL msg_prio ptr: OK\n");
+	else
+		ksft_test_result_fail("peek NULL msg_prio: ret=%ld errno=%d (%m)\n", ret, errno);
+
+	mq_close(mqd);
+}
+
+static void test_peek_priority_matches_receive(void)
+{
+	mqd_t mqd = open_queue("prio_match", MAX_MSG_SIZE);
+	char peek_buf[MAX_MSG_SIZE], recv_buf[MAX_MSG_SIZE];
+	unsigned int peek_prio = 0, recv_prio = 0;
+	long peek_ret;
+	ssize_t recv_ret;
+	int pass = 1;
+
+	send_msg(mqd, PRIO_MED, "consistent-prio", 15);
+
+	peek_ret = peek(mqd, 0, peek_buf, sizeof(peek_buf), &peek_prio);
+	if (peek_ret < 0) {
+		ksft_test_result_fail("peek failed: %m\n");
+		mq_close(mqd);
+		return;
+	}
+
+	recv_ret = mq_receive(mqd, recv_buf, sizeof(recv_buf), &recv_prio);
+	if (recv_ret < 0) {
+		ksft_test_result_fail("mq_receive failed: %m\n");
+		mq_close(mqd);
+		return;
+	}
+
+	if (peek_prio != recv_prio) {
+		ksft_test_result_fail("prio mismatch: peek=%u receive=%u\n",
+							peek_prio, recv_prio);
+		pass = 0;
+	}
+	if (pass && peek_ret != recv_ret) {
+		ksft_test_result_fail("size mismatch: peek=%ld receive=%zd\n",
+				      peek_ret, recv_ret);
+		pass = 0;
+	}
+	if (pass && memcmp(peek_buf, recv_buf, (size_t)recv_ret) != 0) {
+		ksft_test_result_fail("payload mismatch between peek and receive\n");
+		pass = 0;
+	}
+	if (pass)
+		ksft_test_result_pass("peeked priority/payload matches mq_receive output\n");
+
+	mq_close(mqd);
+}
+
+static const struct {
+	const char *name;
+	void (*fn)(void);
+} tests[] = {
+	{ "empty queue → EAGAIN", test_peek_empty_queue },
+	{ "invalid fd → EBADF", test_peek_invalid_fd },
+	{ "non-mqueue fd → EBADF", test_peek_non_mqueue_fd },
+	{ "O_WRONLY fd → EBADF", test_peek_writeonly_fd },
+	{ "buffer too small → EMSGSIZE", test_peek_buffer_too_small },
+	{ "bad msg_ptr → EFAULT", test_peek_bad_msg_ptr },
+	{ "OOB index → ENOENT", test_peek_index_out_of_range },
+	{ "basic data+prio correctness", test_peek_basic_data },
+	{ "non-destructive semantics", test_peek_nondestructive },
+	{ "priority ordering across indices", test_peek_priority_order },
+	{ "FIFO within same priority", test_peek_fifo_within_priority },
+	{ "all distinct priority indices", test_peek_all_indices },
+	{ "large multi-segment message", test_peek_large_message },
+	{ "no MQ_PEEK → normal receive", test_no_peek_flag_is_receive },
+	{ "concurrent peek + receive", test_peek_concurrent_receive },
+	{ "NULL msg_prio ptr", test_peek_null_prio_ptr },
+	{ "peeked prio matches mq_receive",
+	  test_peek_priority_matches_receive },
+};
+
+int main(void)
+{
+	unsigned int i;
+	long sc_ret;
+
+	ksft_print_header();
+	ksft_set_plan(ARRAY_SIZE(tests));
+
+	{
+		struct mq_timedreceive2_args probe_args = { 0 };
+
+		sc_ret = mq_timedreceive2((mqd_t)-1, &probe_args, MQ_PEEK, 0,
+					  NULL);
+		if (sc_ret == -1 && errno == ENOSYS)
+			ksft_exit_skip("mq_timedreceive2 syscall not available (NR=%d ENOSYS) — is the kernel too old?\n",
+						__NR_mq_timedreceive2);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		ksft_print_msg("--- [%02u] %s ---\n", i + 1, tests[i].name);
+		tests[i].fn();
+	}
+
+	return ksft_get_fail_cnt() ? 1 : 0;
+}
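
Editorial sketch, not part of the patch: test_peek_all_indices builds its expected ordering with a hand-written exchange sort. The same descending order could be produced with qsort(3), as below. The helper names are hypothetical. Because qsort() is not guaranteed to be stable, this only fits the case the test uses, where every priority is distinct.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort(3) comparator: the larger priority sorts first. */
static int cmp_prio_desc(const void *a, const void *b)
{
	unsigned int pa = *(const unsigned int *)a;
	unsigned int pb = *(const unsigned int *)b;

	return (pa < pb) - (pa > pb);
}

/*
 * Hypothetical helper: reorder `prios` in place into the order in
 * which peek indices 0..n-1 are expected to report them.
 */
static void expected_peek_order(unsigned int *prios, size_t n)
{
	assert(prios != NULL || n == 0);
	qsort(prios, n, sizeof(*prios), cmp_prio_desc);
}
```

For the priorities the test sends, { 2, 7, 4, 9, 1, 6 }, the helper yields 9, 7, 6, 4, 2, 1, matching the order in which indices 0 through 5 are checked.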
-- 
2.43.0



* [PATCH v3 4/5]IPC:Added entry in performance tools for new system call
  2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
                   ` (2 preceding siblings ...)
  2026-04-08  9:51 ` [PATCH v3 3/5]IPC: Prepared Documentation and test Mathura_Kumar
@ 2026-04-08  9:51 ` Mathura_Kumar
  2026-04-08  9:51 ` [PATCH v3 5/5]Added bitmask flag and user struct argument size as input for extensibility Mathura_Kumar
  2026-04-13 16:36 ` [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Andrei Vagin
  5 siblings, 0 replies; 14+ messages in thread
From: Mathura_Kumar @ 2026-04-08  9:51 UTC (permalink / raw)
  To: criu; +Cc: academic1mathura, avagin, ptikhomirov, rstoyanov

Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
---
 tools/include/uapi/asm-generic/unistd.h             | 6 +++++-
 tools/perf/arch/alpha/entry/syscalls/syscall.tbl    | 1 +
 tools/perf/arch/arm/entry/syscalls/syscall.tbl      | 1 +
 tools/perf/arch/arm64/entry/syscalls/syscall_32.tbl | 1 +
 tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl | 1 +
 tools/perf/arch/parisc/entry/syscalls/syscall.tbl   | 1 +
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl  | 1 +
 tools/perf/arch/s390/entry/syscalls/syscall.tbl     | 1 +
 tools/perf/arch/sh/entry/syscalls/syscall.tbl       | 1 +
 tools/perf/arch/sparc/entry/syscalls/syscall.tbl    | 1 +
 tools/perf/arch/x86/entry/syscalls/syscall_32.tbl   | 1 +
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl   | 1 +
 tools/perf/arch/xtensa/entry/syscalls/syscall.tbl   | 1 +
 tools/scripts/syscall.tbl                           | 1 +
 14 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h
index a627acc8fb5f..e186695f1f6c 100644
--- a/tools/include/uapi/asm-generic/unistd.h
+++ b/tools/include/uapi/asm-generic/unistd.h
@@ -863,8 +863,12 @@ __SYSCALL(__NR_listns, sys_listns)
 #define __NR_rseq_slice_yield 471
 __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
 
+#define __NR_mq_timedreceive2 472
+__SC_COMP(__NR_mq_timedreceive2, sys_mq_timedreceive2,
+		compat_sys_mq_timedreceive2)
+
 #undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_syscalls 473
 
 /*
  * 32 bit systems traditionally used different
diff --git a/tools/perf/arch/alpha/entry/syscalls/syscall.tbl b/tools/perf/arch/alpha/entry/syscalls/syscall.tbl
index 74720667fe09..95893d9ec401 100644
--- a/tools/perf/arch/alpha/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/alpha/entry/syscalls/syscall.tbl
@@ -502,3 +502,4 @@
 570	common	lsm_set_self_attr		sys_lsm_set_self_attr
 571	common	lsm_list_modules		sys_lsm_list_modules
 572	common  mseal				sys_mseal
+582	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/tools/perf/arch/arm/entry/syscalls/syscall.tbl b/tools/perf/arch/arm/entry/syscalls/syscall.tbl
index 94351e22bfcf..df82865ca7cf 100644
--- a/tools/perf/arch/arm/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/arm/entry/syscalls/syscall.tbl
@@ -486,3 +486,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/tools/perf/arch/arm64/entry/syscalls/syscall_32.tbl b/tools/perf/arch/arm64/entry/syscalls/syscall_32.tbl
index 9a37930d4e26..0e9ff6deb3fa 100644
--- a/tools/perf/arch/arm64/entry/syscalls/syscall_32.tbl
+++ b/tools/perf/arch/arm64/entry/syscalls/syscall_32.tbl
@@ -474,3 +474,4 @@
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
 462	common	mseal				sys_mseal
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
\ No newline at end of file
diff --git a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
index 630aab9e5425..75de6ee2df94 100644
--- a/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
+++ b/tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl
@@ -386,3 +386,4 @@
 469	n64	file_setattr			sys_file_setattr
 470	n64	listns				sys_listns
 471	n64	rseq_slice_yield		sys_rseq_slice_yield
+472	n64	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/tools/perf/arch/parisc/entry/syscalls/syscall.tbl b/tools/perf/arch/parisc/entry/syscalls/syscall.tbl
index 66dc406b12e4..f1741102631b 100644
--- a/tools/perf/arch/parisc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/parisc/entry/syscalls/syscall.tbl
@@ -461,3 +461,4 @@
 460	common	lsm_set_self_attr		sys_lsm_set_self_attr
 461	common	lsm_list_modules		sys_lsm_list_modules
 462	common	mseal				sys_mseal
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
index 4fcc7c58a105..f93db75b79e6 100644
--- a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -562,3 +562,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	nospu	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/tools/perf/arch/s390/entry/syscalls/syscall.tbl b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
index 09a7ef04d979..577ef18fdf42 100644
--- a/tools/perf/arch/s390/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/s390/entry/syscalls/syscall.tbl
@@ -398,3 +398,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/tools/perf/arch/sh/entry/syscalls/syscall.tbl b/tools/perf/arch/sh/entry/syscalls/syscall.tbl
index 70b315cbe710..cac787dbef1a 100644
--- a/tools/perf/arch/sh/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/sh/entry/syscalls/syscall.tbl
@@ -475,3 +475,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/tools/perf/arch/sparc/entry/syscalls/syscall.tbl b/tools/perf/arch/sparc/entry/syscalls/syscall.tbl
index 7e71bf7fcd14..626f4c8402cf 100644
--- a/tools/perf/arch/sparc/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/sparc/entry/syscalls/syscall.tbl
@@ -517,3 +517,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_32.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_32.tbl
index f832ebd2d79b..577092a9aef6 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_32.tbl
@@ -477,3 +477,4 @@
 469	i386	file_setattr		sys_file_setattr
 470	i386	listns			sys_listns
 471	i386	rseq_slice_yield	sys_rseq_slice_yield
+472	i386	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
index 524155d655da..f02327c465b3 100644
--- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
@@ -441,3 +441,4 @@
 547	x32	pwritev2		compat_sys_pwritev64v2
 # This is the end of the legacy x32 range.  Numbers 548 and above are
 # not special and are not to be used for x32-specific syscalls.
+548	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
diff --git a/tools/perf/arch/xtensa/entry/syscalls/syscall.tbl b/tools/perf/arch/xtensa/entry/syscalls/syscall.tbl
index a9bca4e484de..ae106ac97418 100644
--- a/tools/perf/arch/xtensa/entry/syscalls/syscall.tbl
+++ b/tools/perf/arch/xtensa/entry/syscalls/syscall.tbl
@@ -442,3 +442,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2
diff --git a/tools/scripts/syscall.tbl b/tools/scripts/syscall.tbl
index 7a42b32b6577..7bf197b306da 100644
--- a/tools/scripts/syscall.tbl
+++ b/tools/scripts/syscall.tbl
@@ -412,3 +412,4 @@
 469	common	file_setattr			sys_file_setattr
 470	common	listns				sys_listns
 471	common	rseq_slice_yield		sys_rseq_slice_yield
+472	common	mq_timedreceive2		sys_mq_timedreceive2		compat_sys_mq_timedreceive2
-- 
2.43.0



* [PATCH v3 5/5]Added bitmask flag and user struct argument size as input for extensibility
  2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
                   ` (3 preceding siblings ...)
  2026-04-08  9:51 ` [PATCH v3 4/5]IPC:Added entry in performance tools for new system call Mathura_Kumar
@ 2026-04-08  9:51 ` Mathura_Kumar
  2026-04-13 16:36 ` [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Andrei Vagin
  5 siblings, 0 replies; 14+ messages in thread
From: Mathura_Kumar @ 2026-04-08  9:51 UTC (permalink / raw)
  To: criu; +Cc: academic1mathura, avagin, ptikhomirov, rstoyanov

Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
---
 Documentation/userspace-api/ipc.rst   |  24 ++--
 include/linux/compat.h                |   4 +-
 include/linux/syscalls.h              |   4 +-
 ipc/mqueue.c                          | 179 +++++++++++++++-----------
 kernel/sys_ni.c                       |   2 +
 tools/testing/selftests/ipc/mq_peek.c |  54 ++++++--
 6 files changed, 163 insertions(+), 104 deletions(-)
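
For readers skimming the series, the flag handling this patch introduces in
do_mq_timedreceive2() can be modelled in plain userspace C. This is only a
sketch under stated assumptions: sketch_classify_flags() and enum sketch_op are
hypothetical names used for illustration, and only the MQ_* values and the
order of the checks are taken from the diff below.

```c
#include <assert.h>
#include <errno.h>

/* Values as defined in ipc/mqueue.c by this patch. */
#define MQ_PEEK        0x02
#define MQ_RECV        0x04
#define MQ_VALID_FLAGS (MQ_PEEK | MQ_RECV)

/* Hypothetical result type, not part of the patch. */
enum sketch_op { SKETCH_OP_PEEK, SKETCH_OP_RECV };

/*
 * Returns 0 and stores the selected operation in *op, or -EINVAL.
 * The checks run in the same order as in do_mq_timedreceive2().
 */
static int sketch_classify_flags(unsigned int flags, enum sketch_op *op)
{
	assert(op);

	if (flags & ~MQ_VALID_FLAGS)		/* unknown bits */
		return -EINVAL;
	if ((flags & MQ_PEEK) && (flags & MQ_RECV))	/* contradictory request */
		return -EINVAL;
	if (flags & MQ_PEEK) {
		*op = SKETCH_OP_PEEK;
		return 0;
	}
	if (flags & MQ_RECV) {
		*op = SKETCH_OP_RECV;
		return 0;
	}
	return -EINVAL;				/* no operation selected */
}
```

Note that with these checks flags == 0 is rejected, whereas in v2 a call
without MQ_PEEK fell through to the plain receive path.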

diff --git a/Documentation/userspace-api/ipc.rst b/Documentation/userspace-api/ipc.rst
index 14a5f0ce7230..3fc46621b5c6 100644
--- a/Documentation/userspace-api/ipc.rst
+++ b/Documentation/userspace-api/ipc.rst
@@ -54,13 +54,10 @@ SYNOPSIS
 
         ssize_t mq_timedreceive2(mqd_t mqdes,
                                  struct mq_timedreceive2_args *uargs,
-                                 unsigned int flags,
+                                 size_t usize, unsigned int flags,
                                  unsigned long index,
                                  const struct timespec *abs_timeout);
 
-Note: No glibc wrapper exists for this syscall. Callers must invoke it
-directly using syscall(2).
-
 DESCRIPTION
         mq_timedreceive2() receives or peeks at a message from the
         message queue referred to by the descriptor mqdes.
@@ -78,14 +75,19 @@ DESCRIPTION
                 If not NULL, the priority of the received message is
                 stored here.
 
+        The usize argument is the size, in bytes, of the struct pointed to by
+        uargs. It allows mq_timedreceive2_args to be extended in the future.
+
         The flags argument controls receive behavior. The following
         flag is defined:
 
         ``MQ_PEEK``
                 Copy the message into msg_ptr without removing it from
-                the queue. The queue is not modified. If this flag is
-                not set, behavior is identical to mq_timedreceive() and
-                the message is consumed.
+                the queue. The queue is not modified.
+
+        ``MQ_RECV``
+                Copy the message into msg_ptr and remove it from the queue,
+                as mq_timedreceive() does.
 
         The index argument selects which message to operate on within
         the priority-ordered queue. index 0 refers to the highest
@@ -116,7 +118,7 @@ ERRORS
 
         ``EINVAL``
                 flags contains an unknown value, or index is nonzero
-                and MQ_PEEK is not set.
+                and contradictory flags were passed.
 
         ``EMSGSIZE``
                 msg_len is less than the mq_msgsize attribute of
@@ -137,7 +139,7 @@ consolidated into struct mq_timedreceive2_args rather than passed
 as individual syscall arguments. Due to limited six arguments,
 The original mq_timedreceive() consumes all six slots,
 leaving no room for extension. Consolidating the buffer parameters
-into a struct recovers two argument slots for flags and index while
+into a struct recovers three argument slots for flags, size and index while
 keeping the interface clean and forward-compatible.
 
 Future extensions can be made by adding new flag bits without
@@ -220,3 +222,7 @@ Tests for mq_timedreceive2() should cover the following:
    threads returns consistent results and does not corrupt queue
    state. Verify that a concurrent mq_receive() and mq_peek() do
    not race.
+
+9) Contradictory operation: verify that passing mutually exclusive flags
+   together (MQ_PEEK | MQ_RECV) fails with EINVAL.
+
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 9f5ca26e76d8..837a59559fa5 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -803,8 +803,8 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
 		unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
 asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct compat_mq_timedreceive2_args __user *uargs,
-											unsigned int flags, unsigned long index,
-											struct old_timespec32 __user *abs_timeout);
+											size_t usize, unsigned int flags, const unsigned long index,
+											const struct old_timespec32 __user *abs_timeout);
 /*
  * Deprecated system calls which are still defined in
  * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 993e570c90ab..e5b544464a44 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -750,8 +750,8 @@ asmlinkage long sys_mq_timedsend_time32(mqd_t mqdes,
 			const struct old_timespec32 __user *u_abs_timeout);
 asmlinkage long
 sys_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args __user *uargs,
-		     unsigned int flags, unsigned long index,
-		     struct __kernel_timespec __user *abs_timeout);
+					size_t usize, unsigned int flags, const unsigned long index,
+					const struct __kernel_timespec __user *abs_timeout);
 asmlinkage long sys_msgget(key_t key, int msgflg);
 asmlinkage long sys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
 asmlinkage long sys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 78dc414967a2..2d4c285e4aa1 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -38,6 +38,7 @@
 #include <linux/sched/wake_q.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/user.h>
+#include <linux/uaccess.h>
 
 #include <net/sock.h>
 #include "util.h"
@@ -53,7 +54,10 @@ struct mqueue_fs_context {
 
 #define SEND		0
 #define RECV		1
-#define MQ_PEEK     2
+
+#define MQ_PEEK     0x02
+#define MQ_RECV     0x04
+#define MQ_VALID_FLAGS (MQ_PEEK | MQ_RECV)
 
 #define STATE_NONE	0
 #define STATE_READY	1
@@ -1266,78 +1270,85 @@ static int do_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *args,
 	struct inode *inode;
 	struct mqueue_inode_info *info;
 
-	if (!(flags & MQ_PEEK)) {
-		return do_mq_timedreceive(mqdes, args->msg_ptr, args->msg_len,
-					  args->msg_prio, ts);
-	}
-	audit_mq_sendrecv(mqdes, args->msg_len, 0, ts);
-	CLASS(fd, f)(mqdes);
-	if (fd_empty(f))
-		return -EBADF;
-
-	inode = file_inode(fd_file(f));
-	if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
-		return -EBADF;
-	info = MQUEUE_I(inode);
-	audit_file(fd_file(f));
+	if (flags & (~MQ_VALID_FLAGS))
+		return -EINVAL;
+	if ((flags & MQ_RECV) && (flags & MQ_PEEK))
+		return -EINVAL;
+	if (flags & MQ_PEEK) {
+		audit_mq_sendrecv(mqdes, args->msg_len, 0, ts);
+		CLASS(fd, f)(mqdes);
+		if (fd_empty(f))
+			return -EBADF;
+
+		inode = file_inode(fd_file(f));
+		if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+			return -EBADF;
+		info = MQUEUE_I(inode);
+		audit_file(fd_file(f));
 
-	if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
-		return -EBADF;
+		if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
+			return -EBADF;
 
-	if (unlikely(args->msg_len < info->attr.mq_msgsize))
-		return -EMSGSIZE;
-	if (index >= (unsigned long)info->attr.mq_maxmsg)
-		return -ENOENT;
+		if (unlikely(args->msg_len < info->attr.mq_msgsize))
+			return -EMSGSIZE;
+		if (index >= (unsigned long)info->attr.mq_maxmsg)
+			return -ENOENT;
 
-	spin_lock(&info->lock);
-	if (info->attr.mq_curmsgs == 0) {
-		spin_unlock(&info->lock);
-		return -EAGAIN;
-	}
-	msg_ptr = mq_peek_index(info, index);
-	if (!msg_ptr) {
+		spin_lock(&info->lock);
+		if (info->attr.mq_curmsgs == 0) {
+			spin_unlock(&info->lock);
+			return -EAGAIN;
+		}
+		msg_ptr = mq_peek_index(info, index);
+		if (!msg_ptr) {
+			spin_unlock(&info->lock);
+			return -ENOENT;
+		}
+		k_m_type = msg_ptr->m_type;
+		k_m_ts = msg_ptr->m_ts;
 		spin_unlock(&info->lock);
-		return -ENOENT;
-	}
-	k_m_type = msg_ptr->m_type;
-	k_m_ts = msg_ptr->m_ts;
-	spin_unlock(&info->lock);
 
-	k_msg_buffer = alloc_msg(k_m_ts);
-	if (!k_msg_buffer)
-		return -ENOMEM;
+		k_msg_buffer = alloc_msg(k_m_ts);
+		if (!k_msg_buffer)
+			return -ENOMEM;
 
 	/*
 	 * Two spin locks are necessary here. We are avoiding atomic memory
 	 * allocation and premature allocation before confirming that
 	 * a message actually exists to peek.
 	 */
-	spin_lock(&info->lock);
-	msg_ptr = mq_peek_index(info, index);
-	if (!msg_ptr || msg_ptr->m_type != k_m_type ||
-	    msg_ptr->m_ts != k_m_ts) {
-		spin_unlock(&info->lock);
-		free_msg(k_msg_buffer);
-		return -EAGAIN;
-	}
-	if (IS_ERR(copy_msg(msg_ptr, k_msg_buffer, k_m_ts))) {
+		spin_lock(&info->lock);
+		msg_ptr = mq_peek_index(info, index);
+		if (!msg_ptr || msg_ptr->m_type != k_m_type ||
+			msg_ptr->m_ts != k_m_ts) {
+			spin_unlock(&info->lock);
+			free_msg(k_msg_buffer);
+			return -EAGAIN;
+		}
+		if (IS_ERR(copy_msg(msg_ptr, k_msg_buffer, k_m_ts))) {
+			spin_unlock(&info->lock);
+			free_msg(k_msg_buffer);
+			return -EINVAL;
+		}
 		spin_unlock(&info->lock);
-		free_msg(k_msg_buffer);
-		return -EINVAL;
-	}
-	spin_unlock(&info->lock);
 
-	ret = k_msg_buffer->m_ts;
-	if (args->msg_prio && put_user(k_m_type, args->msg_prio)) {
+		ret = k_msg_buffer->m_ts;
+		if (args->msg_prio && put_user(k_m_type, args->msg_prio)) {
+			free_msg(k_msg_buffer);
+			return -EFAULT;
+		}
+		if (store_msg(args->msg_ptr, k_msg_buffer, k_m_ts)) {
+			free_msg(k_msg_buffer);
+			return -EFAULT;
+		}
 		free_msg(k_msg_buffer);
-		return -EFAULT;
+		return ret;
 	}
-	if (store_msg(args->msg_ptr, k_msg_buffer, k_m_ts)) {
-		free_msg(k_msg_buffer);
-		return -EFAULT;
+	if (flags & MQ_RECV) {
+		return do_mq_timedreceive(mqdes, args->msg_ptr, args->msg_len, args->msg_prio, ts);
 	}
-	free_msg(k_msg_buffer);
-	return ret;
+
+	return -EINVAL;
 }
 
 SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
@@ -1368,17 +1379,22 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
 	return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
 }
 
-SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
-		struct mq_timedreceive2_args __user *, uargs, unsigned int,
-		flags, const unsigned long, index,
-		const struct __kernel_timespec __user *, u_abs_timeout)
+SYSCALL_DEFINE6(mq_timedreceive2, mqd_t, mqdes,
+				struct mq_timedreceive2_args __user *, uargs, size_t, usize,
+				unsigned int, flags, const unsigned long, index,
+				const struct __kernel_timespec __user *, u_abs_timeout)
 {
-	struct mq_timedreceive2_args args;
+	struct mq_timedreceive2_args k_args;
 	struct timespec64 ts, *p = NULL;
+	int err;
 
-	if (copy_from_user(&args, uargs, sizeof(args)))
-		return -EFAULT;
-
+	if (unlikely(usize < sizeof(struct mq_timedreceive2_args)))
+		return -EINVAL;
+	if (unlikely(usize > PAGE_SIZE))
+		return -E2BIG;
+	err = copy_struct_from_user(&k_args, sizeof(k_args), uargs, usize);
+	if (err)
+		return err;
 	if (u_abs_timeout) {
 		int res = prepare_timeout(u_abs_timeout, &ts);
 
@@ -1386,7 +1402,7 @@ SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
 			return res;
 		p = &ts;
 	}
-	return do_mq_timedreceive2(mqdes, &args, flags, index, p);
+	return do_mq_timedreceive2(mqdes, &k_args, flags, index, p);
 }
 
 /*
@@ -1633,12 +1649,18 @@ static inline int put_compat_mq_attr(const struct mq_attr *attr,
 }
 
 static inline int get_compat_mq_args(struct mq_timedreceive2_args *args,
-									struct compat_mq_timedreceive2_args __user *uargs)
+									struct compat_mq_timedreceive2_args __user *uargs, size_t usize)
 {
 	struct compat_mq_timedreceive2_args v;
+	int err;
 
-	if (copy_from_user(&v, uargs, sizeof(v)))
-		return -EFAULT;
+	if (unlikely(usize < sizeof(struct compat_mq_timedreceive2_args)))
+		return -EINVAL;
+	if (unlikely(usize > PAGE_SIZE))
+		return -E2BIG;
+	err = copy_struct_from_user(&v, sizeof(v), uargs, usize);
+	if (err)
+		return err;
 
 	memset(args, 0, sizeof(*args));
 	args->msg_len = (size_t)v.msg_len;
@@ -1700,17 +1722,18 @@ COMPAT_SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
 	return 0;
 }
 
-COMPAT_SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
-		       struct compat_mq_timedreceive2_args __user *, uargs,
-		       unsigned int, flags, const unsigned long, index,
-		       const struct old_timespec32 __user *, u_abs_timeout)
+COMPAT_SYSCALL_DEFINE6(mq_timedreceive2, mqd_t, mqdes,
+					struct compat_mq_timedreceive2_args __user *, uargs, size_t, usize,
+					unsigned int, flags, const unsigned long, index,
+					const struct old_timespec32 __user *, u_abs_timeout)
 {
-	struct mq_timedreceive2_args args;
+	struct mq_timedreceive2_args k_args;
 	struct timespec64 ts, *p = NULL;
+	int err;
 
-	if (get_compat_mq_args(&args, uargs))
-		return -EFAULT;
-
+	err = get_compat_mq_args(&k_args, uargs, usize);
+	if (err)
+		return err;
 	if (u_abs_timeout) {
 		int res = compat_prepare_timeout(u_abs_timeout, &ts);
 
@@ -1718,7 +1741,7 @@ COMPAT_SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
 			return res;
 		p = &ts;
 	}
-	return do_mq_timedreceive2(mqdes, &args, flags, index, p);
+	return do_mq_timedreceive2(mqdes, &k_args, flags, index, p);
 }
 
 #endif
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 658d6b8274b3..af956bc4e6fe 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -391,6 +391,8 @@ COND_SYSCALL(setuid16);
 /* restartable sequence */
 COND_SYSCALL(rseq);
 COND_SYSCALL(rseq_slice_yield);
+
+/* ipc */
 COND_SYSCALL(mq_timedreceive2);
 
 COND_SYSCALL(uretprobe);
diff --git a/tools/testing/selftests/ipc/mq_peek.c b/tools/testing/selftests/ipc/mq_peek.c
index d08596ae6e1c..0ec3f2fca82c 100644
--- a/tools/testing/selftests/ipc/mq_peek.c
+++ b/tools/testing/selftests/ipc/mq_peek.c
@@ -40,7 +40,8 @@
 #endif
 #endif
 
-#define MQ_PEEK 2U
+#define MQ_PEEK 0x02
+#define MQ_RECV 0x04
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
@@ -52,11 +53,13 @@ struct mq_timedreceive2_args {
 	char *msg_ptr;
 };
 
-static long mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *uargs,
-			     unsigned int flags, unsigned long index,
+#define u_struct_size sizeof(struct mq_timedreceive2_args)
+
+static long mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *uargs, size_t usize,
+			     unsigned int flags, const unsigned long index,
 			     const struct timespec *timeout)
 {
-	return syscall(__NR_mq_timedreceive2, (long)mqdes, uargs, (long)flags,
+	return syscall(__NR_mq_timedreceive2, mqdes, uargs, usize, flags,
 		       index, timeout);
 }
 
@@ -109,7 +112,7 @@ static long peek(mqd_t mqd, unsigned long index, char *buf, size_t bufsz,
 		.msg_prio = prio,
 		.msg_ptr = buf,
 	};
-	return mq_timedreceive2(mqd, &args, MQ_PEEK, index, NULL);
+	return mq_timedreceive2(mqd, &args, u_struct_size, MQ_PEEK, index, NULL);
 }
 
 static long queue_depth(mqd_t mqd)
@@ -226,7 +229,7 @@ static void test_peek_buffer_too_small(void)
 
 	send_msg(mqd, PRIO_HIGH, "hello", 5);
 
-	ret = mq_timedreceive2(mqd, &args, MQ_PEEK, 0, NULL);
+	ret = mq_timedreceive2(mqd, &args, u_struct_size, MQ_PEEK, 0, NULL);
 	if (ret == -1 && errno == EMSGSIZE)
 		ksft_test_result_pass("peek with small buf [EMSGSIZE]\n");
 	else
@@ -246,11 +249,10 @@ static void test_peek_bad_msg_ptr(void)
 		.msg_prio = &prio,
 		.msg_ptr = (char *)0x1,
 	};
-
 	long ret;
 
 	send_msg(mqd, PRIO_HIGH, "payload", 7);
-	ret = mq_timedreceive2(mqd, &args, MQ_PEEK, 0, NULL);
+	ret = mq_timedreceive2(mqd, &args, u_struct_size, MQ_PEEK, 0, NULL);
 	if (ret == -1 && errno == EFAULT)
 		ksft_test_result_pass("peek bad msg_ptr [EFAULT]\n");
 	else
@@ -588,7 +590,7 @@ static void test_no_peek_flag_is_receive(void)
 
 	send_msg(mqd, PRIO_HIGH, "consume-me", 10);
 
-	ret = mq_timedreceive2(mqd, &args, 0, 0, NULL);
+	ret = mq_timedreceive2(mqd, &args, u_struct_size, MQ_RECV, 0, NULL);
 	if (ret < 0) {
 		ksft_test_result_fail("no-peek receive failed: ret=%ld errno=%d\n", ret, errno);
 		mq_close(mqd);
@@ -615,8 +617,7 @@ static void *receiver_thread(void *arg)
 	ssize_t r;
 
 	while ((r = mq_receive(ctx->mqd, buf, sizeof(buf), &prio)) > 0)
-		;
-
+			;
 	return NULL;
 }
 
@@ -680,7 +681,7 @@ static void test_peek_null_prio_ptr(void)
 
 	send_msg(mqd, PRIO_MED, "no-prio-needed", 14);
 
-	ret = mq_timedreceive2(mqd, &args, MQ_PEEK, 0, NULL);
+	ret = mq_timedreceive2(mqd, &args, u_struct_size, MQ_PEEK, 0, NULL);
 	if (ret >= 0)
 		ksft_test_result_pass("peek with NULL msg_prio ptr: OK\n");
 	else
@@ -734,6 +735,32 @@ static void test_peek_priority_matches_receive(void)
 	mq_close(mqd);
 }
 
+static void test_contradictory_operation(void)
+{
+	mqd_t mqd = open_queue("contra_op", MAX_MSG_SIZE);
+	char buf[MAX_MSG_SIZE];
+	struct mq_timedreceive2_args args = {
+		.msg_len = sizeof(buf),
+		.msg_prio = NULL,
+		.msg_ptr = buf,
+	};
+	long ret;
+
+	send_msg(mqd, PRIO_HIGH, "A", 1);
+	send_msg(mqd, PRIO_MED, "B", 1);
+	send_msg(mqd, PRIO_LOW, "C", 1);
+
+	ret = mq_timedreceive2(mqd, &args, u_struct_size, MQ_PEEK | MQ_RECV, 0, NULL);
+
+	if (ret == -1 && errno == EINVAL)
+		ksft_test_result_pass("Handler rejected invalid/contradictory request [EINVAL]\n");
+	else
+		ksft_test_result_fail("Handler did not reject invalid request: ret=%ld errno=%d (%m)\n",
+							ret, errno);
+
+	mq_close(mqd);
+}
+
 static const struct {
 	const char *name;
 	void (*fn)(void);
@@ -756,6 +783,7 @@ static const struct {
 	{ "NULL msg_prio ptr", test_peek_null_prio_ptr },
 	{ "peeked prio matches mq_receive",
 	  test_peek_priority_matches_receive },
+	{ "two_contradictory_operation_together", test_contradictory_operation },
 };
 
 int main(void)
@@ -769,7 +797,7 @@ int main(void)
 	{
 		struct mq_timedreceive2_args probe_args = { 0 };
 
-		sc_ret = mq_timedreceive2((mqd_t)-1, &probe_args, MQ_PEEK, 0,
+		sc_ret = mq_timedreceive2((mqd_t)-1, &probe_args, u_struct_size, MQ_PEEK, 0,
 					  NULL);
 		if (sc_ret == -1 && errno == ENOSYS)
 			ksft_exit_skip("mq_timedreceive2 syscall not available (NR=%d ENOSYS) — is the kernel too old?\n",
-- 
2.43.0
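
The size handling added to the syscall entry points above can be illustrated
with a small userspace model. It is a sketch, not the kernel code:
sketch_copy_args() and SKETCH_PAGE_SIZE are hypothetical, a plain pointer
stands in for the __user pointer, and the zero-tail check is my understanding
of what copy_struct_from_user() does when usize is larger than the kernel's
struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Same layout as the struct declared in the selftest. */
struct mq_timedreceive2_args {
	size_t msg_len;
	unsigned int *msg_prio;
	char *msg_ptr;
};

/*
 * Copy a caller-supplied struct of usize bytes into *dst.
 * Returns 0 on success or a negative errno value.
 */
static int sketch_copy_args(struct mq_timedreceive2_args *dst,
			    const void *src, size_t usize)
{
	const unsigned char *p = src;
	size_t ksize = sizeof(*dst);
	size_t i;

	assert(dst && src);

	if (usize < ksize)		/* caller's struct is too small */
		return -EINVAL;
	if (usize > SKETCH_PAGE_SIZE)	/* implausibly large struct */
		return -E2BIG;

	/* A larger (newer) struct is accepted only if the unknown tail is zero. */
	for (i = ksize; i < usize; i++)
		if (p[i] != 0)
			return -E2BIG;

	memcpy(dst, src, ksize);
	return 0;
}
```

In this model a caller built against a later, larger struct still works as long
as it leaves the fields this kernel does not know about zeroed.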



* Re: [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue
  2026-04-08  9:51 ` [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue Mathura_Kumar
@ 2026-04-13 16:34   ` Andrei Vagin
  2026-04-14  0:04     ` Mathura
  0 siblings, 1 reply; 14+ messages in thread
From: Andrei Vagin @ 2026-04-13 16:34 UTC (permalink / raw)
  To: Mathura_Kumar; +Cc: criu, ptikhomirov, rstoyanov

Hi Mathura_Kumar,

First of all, please read
https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html.

You need to write a detailed commit message for each patch.

Before sending a patch, it is recommended to run checkpatch:
$ ./scripts/checkpatch.pl `git format-patch HEAD~1`
WARNING: Missing commit description - Add an appropriate one

WARNING: line length of 107 exceeds 100 columns
#38: FILE: include/linux/compat.h:805:
+asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct compat_mq_timedreceive2_args __user *uargs,

WARNING: line length of 128 exceeds 100 columns
#39: FILE: include/linux/compat.h:806:
+ unsigned int flags, unsigned long index,

WARNING: line length of 131 exceeds 100 columns
#40: FILE: include/linux/compat.h:807:
+ struct old_timespec32 __user *abs_timeout);

WARNING: Use #include <linux/compat.h> instead of <asm/compat.h>
#104: FILE: include/uapi/linux/mqueue.h:22:
+#include <asm/compat.h>

WARNING: line length of 122 exceeds 100 columns
#306: FILE: ipc/mqueue.c:1636:
+ struct compat_mq_timedreceive2_args __user *uargs)

total: 0 errors, 6 warnings, 408 lines checked


On Wed, Apr 8, 2026 at 2:53 AM Mathura_Kumar <academic1mathura@gmail.com> wrote:
>
> Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
> ---
>  include/linux/compat.h            |   6 +-
>  include/linux/syscalls.h          |   6 +
>  include/uapi/asm-generic/unistd.h |   7 +-
>  include/uapi/linux/mqueue.h       |  14 ++-
>  ipc/mqueue.c                      | 186 ++++++++++++++++++++++++++++--
>  ipc/msg.c                         |   2 +-
>  ipc/msgutil.c                     |  48 ++++----
>  ipc/util.h                        |   3 +-
>  kernel/sys_ni.c                   |   1 +
>  9 files changed, 231 insertions(+), 42 deletions(-)
>
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index 56cebaff0c91..9f5ca26e76d8 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -22,6 +22,7 @@
>  #include <asm/compat.h>
>  #include <asm/siginfo.h>
>  #include <asm/signal.h>
> +#include <linux/mqueue.h>
>
>  #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
>  /*
> @@ -801,8 +802,9 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
>                 const struct iovec __user *vec,
>                 unsigned long vlen, loff_t pos, rwf_t flags);
>  #endif
> -
> -
> +asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct compat_mq_timedreceive2_args __user *uargs,
> +                                                                                       unsigned int flags, unsigned long index,
> +                                                                                       struct old_timespec32 __user *abs_timeout);
>  /*
>   * Deprecated system calls which are still defined in
>   * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 02bd6ddb6278..993e570c90ab 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -79,6 +79,7 @@ struct mnt_id_req;
>  struct ns_id_req;
>  struct xattr_args;
>  struct file_attr;
> +struct mq_timedreceive2_args;
>
>  #include <linux/types.h>
>  #include <linux/aio_abi.h>
> @@ -93,6 +94,7 @@ struct file_attr;
>  #include <linux/key.h>
>  #include <linux/personality.h>
>  #include <trace/syscall.h>
> +#include <linux/mqueue.h>
>
>  #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
>  /*
> @@ -746,6 +748,10 @@ asmlinkage long sys_mq_timedsend_time32(mqd_t mqdes,
>                         const char __user *u_msg_ptr,
>                         unsigned int msg_len, unsigned int msg_prio,
>                         const struct old_timespec32 __user *u_abs_timeout);
> +asmlinkage long
> +sys_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args __user *uargs,
> +                    unsigned int flags, unsigned long index,
> +                    struct __kernel_timespec __user *abs_timeout);
>  asmlinkage long sys_msgget(key_t key, int msgflg);
>  asmlinkage long sys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
>  asmlinkage long sys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index a627acc8fb5f..200ee7fde5c4 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -863,9 +863,12 @@ __SYSCALL(__NR_listns, sys_listns)
>  #define __NR_rseq_slice_yield 471
>  __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
>
> -#undef __NR_syscalls
> -#define __NR_syscalls 472
> +#define __NR_mq_timedreceive2 472
> +__SC_COMP(__NR_mq_timedreceive2, sys_mq_timedreceive2,
> +               compat_sys_mq_timedreceive2)
>
> +#undef __NR_syscalls
> +#define __NR_syscalls 473
>  /*
>   * 32 bit systems traditionally used different
>   * syscalls for off_t and loff_t arguments, while
> diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h
> index b516b66840ad..7cdced63f5d2 100644
> --- a/include/uapi/linux/mqueue.h
> +++ b/include/uapi/linux/mqueue.h
> @@ -18,8 +18,8 @@
>
>  #ifndef _LINUX_MQUEUE_H
>  #define _LINUX_MQUEUE_H
> -
>  #include <linux/types.h>
> +#include <asm/compat.h>
>
>  #define MQ_PRIO_MAX    32768
>  /* per-uid limit of kernel memory used by mqueue, in bytes */
> @@ -33,6 +33,18 @@ struct mq_attr {
>         __kernel_long_t __reserved[4];  /* ignored for input, zeroed for output */
>  };
>
> +struct mq_timedreceive2_args {
> +       size_t msg_len;
> +       unsigned int  *msg_prio;
> +       char  *msg_ptr;
> +};
> +
> +struct compat_mq_timedreceive2_args {
> +       compat_size_t msg_len;
> +       compat_uptr_t msg_prio;
> +       compat_uptr_t msg_ptr;
> +};
> +
>  /*
>   * SIGEV_THREAD implementation:
>   * SIGEV_THREAD must be implemented in user space. If SIGEV_THREAD is passed
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4798b375972b..78dc414967a2 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -53,6 +53,7 @@ struct mqueue_fs_context {
>
>  #define SEND           0
>  #define RECV           1
> +#define MQ_PEEK     2
>
>  #define STATE_NONE     0
>  #define STATE_READY    1
> @@ -1230,6 +1231,115 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
>         return ret;
>  }
>
> +static struct msg_msg *mq_peek_index(struct mqueue_inode_info *info, int index)
> +{
> +       struct rb_node *node;
> +       struct posix_msg_tree_node *leaf;
> +       struct msg_msg *msg;
> +
> +       int count = 0;
> +
> +       /* Start from highest priority */
> +       node = rb_last(&info->msg_tree);
> +       while (node) {
> +               leaf = rb_entry(node, struct posix_msg_tree_node, rb_node);
> +               list_for_each_entry(msg, &leaf->msg_list, m_list) {
> +                       if (count == index)
> +                               return msg;
> +                       count++;
> +               }
> +
> +               node = rb_prev(node);
> +       }
> +
> +       return NULL;
> +}
> +
> +static int do_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *args,
> +                              unsigned int flags, unsigned long index,
> +                              struct timespec64 *ts)
> +{
> +       ssize_t ret;
> +       struct msg_msg *msg_ptr, *k_msg_buffer;
> +       long k_m_type;
> +       size_t k_m_ts;
> +       struct inode *inode;
> +       struct mqueue_inode_info *info;
> +
> +       if (!(flags & MQ_PEEK)) {
> +               return do_mq_timedreceive(mqdes, args->msg_ptr, args->msg_len,
> +                                         args->msg_prio, ts);
> +       }
> +       audit_mq_sendrecv(mqdes, args->msg_len, 0, ts);
> +       CLASS(fd, f)(mqdes);
> +       if (fd_empty(f))
> +               return -EBADF;
> +
> +       inode = file_inode(fd_file(f));
> +       if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
> +               return -EBADF;
> +       info = MQUEUE_I(inode);
> +       audit_file(fd_file(f));
> +
> +       if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
> +               return -EBADF;
> +
> +       if (unlikely(args->msg_len < info->attr.mq_msgsize))
> +               return -EMSGSIZE;
> +       if (index >= (unsigned long)info->attr.mq_maxmsg)
> +               return -ENOENT;
> +
> +       spin_lock(&info->lock);
> +       if (info->attr.mq_curmsgs == 0) {
> +               spin_unlock(&info->lock);
> +               return -EAGAIN;

Should this be ENOENT?

> +       }
> +       msg_ptr = mq_peek_index(info, index);
> +       if (!msg_ptr) {
> +               spin_unlock(&info->lock);
> +               return -ENOENT;
> +       }
> +       k_m_type = msg_ptr->m_type;
> +       k_m_ts = msg_ptr->m_ts;
> +       spin_unlock(&info->lock);
> +
> +       k_msg_buffer = alloc_msg(k_m_ts);
> +       if (!k_msg_buffer)
> +               return -ENOMEM;
> +
> +       /*
> +        * Two spin locks are necessary here. We are avoiding atomic memory
> +        * allocation and premature allocation before confirming that
> +        * a message actually exists to peek.
> +        */

MSG_COPY doesn't need to take the spinlock twice, so why can't we
do the same thing here?
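
A minimal userspace sketch of the single-lock idea, assuming the scratch
buffer can be sized from the queue's maximum message size before the lock is
taken. Everything here (struct sketch_queue, sketch_peek(), the SKETCH_*
limits, a pthread mutex standing in for info->lock) is hypothetical and only
illustrates the locking shape:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define SKETCH_MAXMSG  8
#define SKETCH_MSGSIZE 64

struct sketch_msg {
	unsigned int prio;
	size_t len;
	char data[SKETCH_MSGSIZE];
};

struct sketch_queue {
	pthread_mutex_t lock;
	size_t curmsgs;
	/* Kept sorted, highest priority first. */
	struct sketch_msg msgs[SKETCH_MAXMSG];
};

/* Returns the message length, or a negative errno value. */
static ssize_t sketch_peek(struct sketch_queue *q, size_t index,
			   char *out, size_t out_len, unsigned int *prio)
{
	char *tmp;
	ssize_t ret;

	assert(q && out);

	if (out_len < SKETCH_MSGSIZE)
		return -EMSGSIZE;

	/* Allocate for the worst case up front, so nothing is allocated under the lock. */
	tmp = malloc(SKETCH_MSGSIZE);
	if (!tmp)
		return -ENOMEM;

	pthread_mutex_lock(&q->lock);
	if (index >= q->curmsgs) {
		pthread_mutex_unlock(&q->lock);
		free(tmp);
		return -ENOENT;
	}
	ret = (ssize_t)q->msgs[index].len;
	memcpy(tmp, q->msgs[index].data, q->msgs[index].len);
	if (prio)
		*prio = q->msgs[index].prio;
	pthread_mutex_unlock(&q->lock);

	/* Copy out to the caller only after the lock has been dropped. */
	memcpy(out, tmp, (size_t)ret);
	free(tmp);
	return ret;
}
```

The lock is taken once, and the message cannot change between lookup and copy,
so no re-validation step is needed.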

> +       spin_lock(&info->lock);
> +       msg_ptr = mq_peek_index(info, index);
> +       if (!msg_ptr || msg_ptr->m_type != k_m_type ||
> +           msg_ptr->m_ts != k_m_ts) {
> +               spin_unlock(&info->lock);
> +               free_msg(k_msg_buffer);
> +               return -EAGAIN;
> +       }
> +       if (IS_ERR(copy_msg(msg_ptr, k_msg_buffer, k_m_ts))) {
> +               spin_unlock(&info->lock);
> +               free_msg(k_msg_buffer);
> +               return -EINVAL;

You probably need to return the error code returned by copy_msg() here.
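
A minimal userspace sketch of that propagation pattern, for illustration
only. ERR_PTR(), IS_ERR() and PTR_ERR() are re-implemented here as
stand-ins for the kernel helpers, and fake_copy_msg() and peek_copy() are
hypothetical names, not functions from the patch. The point is that the
caller keeps the returned pointer and forwards the encoded error instead
of hard-coding -EINVAL:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (illustration only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for copy_msg(): fails if len exceeds the source size. */
static void *fake_copy_msg(size_t src_len, void *dst, size_t len)
{
	assert(dst != NULL);
	if (len > src_len)
		return ERR_PTR(-EINVAL);
	return dst;
}

/* Forward whatever error the callee encoded instead of picking a new one. */
static long peek_copy(size_t src_len, void *dst, size_t len)
{
	void *res = fake_copy_msg(src_len, dst, len);

	if (IS_ERR(res))
		return PTR_ERR(res);
	return (long)len;
}
```

In the patch this would presumably mean saving the result of copy_msg()
in a local variable before the IS_ERR() check, so that PTR_ERR() can be
returned after unlocking and freeing the buffer.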

> +       }
> +       spin_unlock(&info->lock);
> +
> +       ret = k_msg_buffer->m_ts;
> +       if (args->msg_prio && put_user(k_m_type, args->msg_prio)) {
> +               free_msg(k_msg_buffer);
> +               return -EFAULT;
> +       }
> +       if (store_msg(args->msg_ptr, k_msg_buffer, k_m_ts)) {
> +               free_msg(k_msg_buffer);
> +               return -EFAULT;
> +       }
> +       free_msg(k_msg_buffer);
> +       return ret;
> +}
> +
>  SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
>                 size_t, msg_len, unsigned int, msg_prio,
>                 const struct __kernel_timespec __user *, u_abs_timeout)
> @@ -1258,6 +1368,27 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
>         return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
>  }
>
> +SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
> +               struct mq_timedreceive2_args __user *, uargs, unsigned int,
> +               flags, const unsigned long, index,
> +               const struct __kernel_timespec __user *, u_abs_timeout)
> +{
> +       struct mq_timedreceive2_args args;
> +       struct timespec64 ts, *p = NULL;
> +
> +       if (copy_from_user(&args, uargs, sizeof(args)))
> +               return -EFAULT;
> +
> +       if (u_abs_timeout) {
> +               int res = prepare_timeout(u_abs_timeout, &ts);
> +
> +               if (res)
> +                       return res;
> +               p = &ts;
> +       }
> +       return do_mq_timedreceive2(mqdes, &args, flags, index, p);
> +}
> +
>  /*
>   * Notes: the case when user wants us to deregister (with NULL as pointer)
>   * and he isn't currently owner of notification, will be silently discarded.
> @@ -1449,6 +1580,17 @@ SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
>         return 0;
>  }
>
> +#ifdef CONFIG_COMPAT_32BIT_TIME
> +static int compat_prepare_timeout(const struct old_timespec32 __user *p,
> +                                                               struct timespec64 *ts)
> +{
> +       if (get_old_timespec32(ts, p))
> +               return -EFAULT;
> +       if (!timespec64_valid(ts))
> +               return -EINVAL;
> +       return 0;
> +}
> +
>  #ifdef CONFIG_COMPAT
>
>  struct compat_mq_attr {
> @@ -1490,6 +1632,22 @@ static inline int put_compat_mq_attr(const struct mq_attr *attr,
>         return 0;
>  }
>
> +static inline int get_compat_mq_args(struct mq_timedreceive2_args *args,
> +                                                                       struct compat_mq_timedreceive2_args __user *uargs)
> +{
> +       struct compat_mq_timedreceive2_args v;
> +
> +       if (copy_from_user(&v, uargs, sizeof(v)))
> +               return -EFAULT;
> +
> +       memset(args, 0, sizeof(*args));
> +       args->msg_len = (size_t)v.msg_len;
> +       args->msg_prio = (unsigned int *)compat_ptr(v.msg_prio);
> +       args->msg_ptr = (char *)compat_ptr(v.msg_ptr);
> +
> +       return 0;
> +}
> +
>  COMPAT_SYSCALL_DEFINE4(mq_open, const char __user *, u_name,
>                        int, oflag, compat_mode_t, mode,
>                        struct compat_mq_attr __user *, u_attr)
> @@ -1541,19 +1699,30 @@ COMPAT_SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
>                 return -EFAULT;
>         return 0;
>  }
> -#endif
>
> -#ifdef CONFIG_COMPAT_32BIT_TIME
> -static int compat_prepare_timeout(const struct old_timespec32 __user *p,
> -                                  struct timespec64 *ts)
> +COMPAT_SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
> +                      struct compat_mq_timedreceive2_args __user *, uargs,
> +                      unsigned int, flags, const unsigned long, index,
> +                      const struct old_timespec32 __user *, u_abs_timeout)
>  {
> -       if (get_old_timespec32(ts, p))
> +       struct mq_timedreceive2_args args;
> +       struct timespec64 ts, *p = NULL;
> +
> +       if (get_compat_mq_args(&args, uargs))
>                 return -EFAULT;
> -       if (!timespec64_valid(ts))
> -               return -EINVAL;
> -       return 0;
> +
> +       if (u_abs_timeout) {
> +               int res = compat_prepare_timeout(u_abs_timeout, &ts);
> +
> +               if (res)
> +                       return res;
> +               p = &ts;
> +       }
> +       return do_mq_timedreceive2(mqdes, &args, flags, index, p);
>  }
>
> +#endif
> +
>  SYSCALL_DEFINE5(mq_timedsend_time32, mqd_t, mqdes,
>                 const char __user *, u_msg_ptr,
>                 unsigned int, msg_len, unsigned int, msg_prio,
> @@ -1583,6 +1752,7 @@ SYSCALL_DEFINE5(mq_timedreceive_time32, mqd_t, mqdes,
>         }
>         return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
>  }
> +
>  #endif
>
>  static const struct inode_operations mqueue_dir_inode_operations = {
> diff --git a/ipc/msg.c b/ipc/msg.c
> index 62996b97f0ac..6392b11dd7f7 100644
> --- a/ipc/msg.c
> +++ b/ipc/msg.c
> @@ -1156,7 +1156,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
>                          * not update queue parameters.
>                          */
>                         if (msgflg & MSG_COPY) {
> -                               msg = copy_msg(msg, copy);
> +                               msg = copy_msg(msg, copy, msg->m_ts);
>                                 goto out_unlock0;
>                         }
>
> diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> index e28f0cecb2ec..c5536ce47cc2 100644
> --- a/ipc/msgutil.c
> +++ b/ipc/msgutil.c
> @@ -51,7 +51,7 @@ static int __init init_msg_buckets(void)
>  }
>  subsys_initcall(init_msg_buckets);
>
> -static struct msg_msg *alloc_msg(size_t len)
> +struct msg_msg *alloc_msg(size_t len)
>  {
>         struct msg_msg *msg;
>         struct msg_msgseg **pseg;
> @@ -122,39 +122,33 @@ struct msg_msg *load_msg(const void __user *src, size_t len)
>         free_msg(msg);
>         return ERR_PTR(err);
>  }
> -#ifdef CONFIG_CHECKPOINT_RESTORE
> -struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
> +
> +struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len)
>  {
> -       struct msg_msgseg *dst_pseg, *src_pseg;
> -       size_t len = src->m_ts;
> -       size_t alen;
> +       struct msg_msgseg *src_seg, *dst_seg;
> +       size_t remaining, chunk;
>
> -       if (src->m_ts > dst->m_ts)
> +       if (len > src->m_ts)
>                 return ERR_PTR(-EINVAL);
> -
> -       alen = min(len, DATALEN_MSG);
> -       memcpy(dst + 1, src + 1, alen);
> -
> -       for (dst_pseg = dst->next, src_pseg = src->next;
> -            src_pseg != NULL;
> -            dst_pseg = dst_pseg->next, src_pseg = src_pseg->next) {
> -
> -               len -= alen;
> -               alen = min(len, DATALEN_SEG);
> -               memcpy(dst_pseg + 1, src_pseg + 1, alen);
> +       chunk = min(len, DATALEN_MSG);
> +       memcpy(dst + 1, src + 1, chunk);
> +       remaining = len - chunk;
> +       src_seg = src->next;
> +       dst_seg = dst->next;
> +       while (remaining > 0 && src_seg && dst_seg) {
> +               chunk = min(remaining, DATALEN_SEG);
> +               memcpy(dst_seg + 1, src_seg + 1, chunk);
> +               remaining -= chunk;
> +               src_seg = src_seg->next;
> +               dst_seg = dst_seg->next;
>         }
> -
> +       if (remaining != 0)
> +               return ERR_PTR(-EINVAL);
>         dst->m_type = src->m_type;
> -       dst->m_ts = src->m_ts;
> -
> +       dst->m_ts   = src->m_ts;
>         return dst;
>  }
> -#else
> -struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
> -{
> -       return ERR_PTR(-ENOSYS);
> -}
> -#endif
> +
>  int store_msg(void __user *dest, struct msg_msg *msg, size_t len)
>  {
>         size_t alen;
> diff --git a/ipc/util.h b/ipc/util.h
> index a55d6cebe6d3..374abeee79b3 100644
> --- a/ipc/util.h
> +++ b/ipc/util.h
> @@ -197,8 +197,9 @@ int ipc_parse_version(int *cmd);
>
>  extern void free_msg(struct msg_msg *msg);
>  extern struct msg_msg *load_msg(const void __user *src, size_t len);
> -extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
> +extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len);
>  extern int store_msg(void __user *dest, struct msg_msg *msg, size_t len);
> +extern struct msg_msg *alloc_msg(size_t len);
>
>  static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int id)
>  {
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index add3032da16f..658d6b8274b3 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -391,6 +391,7 @@ COND_SYSCALL(setuid16);
>  /* restartable sequence */
>  COND_SYSCALL(rseq);
>  COND_SYSCALL(rseq_slice_yield);
> +COND_SYSCALL(mq_timedreceive2);
>
>  COND_SYSCALL(uretprobe);
>  COND_SYSCALL(uprobe);
> --
> 2.43.0
>

* Re: [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue
  2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
                   ` (4 preceding siblings ...)
  2026-04-08  9:51 ` [PATCH v3 5/5]Added bitmask flag and user struct argument size as input for extensibility Mathura_Kumar
@ 2026-04-13 16:36 ` Andrei Vagin
       [not found]   ` <CA+QNo220nhDmyazh=JQi0OEfhsGAc7az+GRkM=G5KdZj4+SyNg@mail.gmail.com>
  5 siblings, 1 reply; 14+ messages in thread
From: Andrei Vagin @ 2026-04-13 16:36 UTC (permalink / raw)
  To: Mathura_Kumar; +Cc: criu, ptikhomirov, rstoyanov

On Wed, Apr 8, 2026 at 2:53 AM Mathura_Kumar <academic1mathura@gmail.com> wrote:
>
>
> Patch series overview:
>
>   1. Add New system call do_mq_timedreceive2() and handler implementation
>   2. Add system call number in all most common arch.
>   3. Prepared Documentation and test
>   4. Add entry in performance tools all most common file
>   5. Add bitmask flag and user struct size as input for extensibility

Could you please explain why MSG_COPY can't be used to implement C/R?
It was introduced by 4a674f34ba04a ("ipc: introduce message queue copy feature")
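
For reference, a rough sketch of how MSG_COPY is used from userspace on
a System V queue. The struct peek_buf layout and the sysv_peek() name
are made up for this example. The fallback define only covers libcs
that hide MSG_COPY behind _GNU_SOURCE:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#ifndef MSG_COPY
#define MSG_COPY 040000 /* value from include/uapi/linux/msg.h */
#endif

/* Hypothetical buffer layout; msgrcv() only requires a leading long mtype. */
struct peek_buf {
	long mtype;
	char mtext[8192];
};

/*
 * Copy the message at ordinal position 'index' without dequeuing it.
 * Returns the payload length, or -1 with errno set (for example ENOMSG
 * when there is no message at that position).
 */
static ssize_t sysv_peek(int msqid, long index, struct peek_buf *buf)
{
	assert(buf != NULL);
	/* With MSG_COPY, msgtyp is an index and IPC_NOWAIT is mandatory. */
	return msgrcv(msqid, buf, sizeof(buf->mtext), index,
		      MSG_COPY | IPC_NOWAIT);
}
```

A dump loop would call sysv_peek() with index 0, 1, 2, ... until it
fails with ENOMSG. The kernel side needs CONFIG_CHECKPOINT_RESTORE for
this flag to work.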

>
> change since v2:
>  - bitmask flag is added and check for valid operation
>  - userspace struct argument size is added for extensibility
>  - new test is added
>  - documentation is updated approximately
>  - increased the total system call count in performance tool as well
>  - v2 Link: https://lore.kernel.org/linux-arch/20260320052340.6696-1-academic1mathura@gmail.com/T/#t
>
>
>   Thanks for reviewing.
>
>
>  Documentation/userspace-api/index.rst         |   1 +
>  Documentation/userspace-api/ipc.rst           | 228 +++++
>  arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
>  arch/arm/tools/syscall.tbl                    |   1 +
>  arch/arm64/tools/syscall_32.tbl               |   1 +
>  arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
>  arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
>  arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
>  arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
>  arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
>  arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
>  arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
>  arch/s390/kernel/syscalls/syscall.tbl         |   1 +
>  arch/sh/kernel/syscalls/syscall.tbl           |   1 +
>  arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
>  arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>  arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
>  include/linux/compat.h                        |   6 +-
>  include/linux/syscalls.h                      |   6 +
>  include/uapi/asm-generic/unistd.h             |   7 +-
>  include/uapi/linux/mqueue.h                   |  14 +-
>  ipc/mqueue.c                                  | 211 ++++-
>  ipc/msg.c                                     |   2 +-
>  ipc/msgutil.c                                 |  48 +-
>  ipc/util.h                                    |   3 +-
>  kernel/sys_ni.c                               |   3 +
>  scripts/syscall.tbl                           |   1 +
>  tools/include/uapi/asm-generic/unistd.h       |   6 +-
>  .../arch/alpha/entry/syscalls/syscall.tbl     |   1 +
>  .../perf/arch/arm/entry/syscalls/syscall.tbl  |   1 +
>  .../arch/arm64/entry/syscalls/syscall_32.tbl  |   1 +
>  .../arch/mips/entry/syscalls/syscall_n64.tbl  |   1 +
>  .../arch/parisc/entry/syscalls/syscall.tbl    |   1 +
>  .../arch/powerpc/entry/syscalls/syscall.tbl   |   1 +
>  .../perf/arch/s390/entry/syscalls/syscall.tbl |   1 +
>  tools/perf/arch/sh/entry/syscalls/syscall.tbl |   1 +
>  .../arch/sparc/entry/syscalls/syscall.tbl     |   1 +
>  .../arch/x86/entry/syscalls/syscall_32.tbl    |   1 +
>  .../arch/x86/entry/syscalls/syscall_64.tbl    |   1 +
>  .../arch/xtensa/entry/syscalls/syscall.tbl    |   1 +
>  tools/scripts/syscall.tbl                     |   1 +
>  tools/testing/selftests/ipc/.gitignore        |   1 +
>  tools/testing/selftests/ipc/Makefile          |   9 +-
>  tools/testing/selftests/ipc/mq_peek.c         | 813 ++++++++++++++++++
>  45 files changed, 1343 insertions(+), 45 deletions(-)
>  create mode 100644 Documentation/userspace-api/ipc.rst
>  create mode 100644 tools/testing/selftests/ipc/mq_peek.c
>
> --
> 2.43.0
>

* Re: [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue
       [not found]   ` <CA+QNo220nhDmyazh=JQi0OEfhsGAc7az+GRkM=G5KdZj4+SyNg@mail.gmail.com>
@ 2026-04-13 23:30     ` Mathura
  2026-04-14  0:11       ` Andrei Vagin
  0 siblings, 1 reply; 14+ messages in thread
From: Mathura @ 2026-04-13 23:30 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: criu, Pavel Tikhomirov, rstoyanov

Hi Andrei,

Thanks for the review and your time.
I looked at the code you pointed to. Here are the constraints I see:

1) The "kernel will figure it out" dilemma: the current MSG_COPY
implementation is tightly tied to System V message queue semantics. To
handle POSIX queues as well, we would have to mix the POSIX queue code
into the current System V path, which creates cross-subsystem coupling.
msgrcv() would need to detect whether its msqid argument is a file
descriptor for a POSIX queue or the integer ID of a System V kernel
object (POSIX queues are implemented on a virtual file system), and
then route to the SysV path or the POSIX path. That requires extra
lookup attempts and leads to ambiguous failure modes.
Both are plain integers, so in rare cases the values may overlap, and
we could end up copying unrelated data to user space.

2) Readability: many things need to be reconciled before merging the
two, such as wakeup semantics, data structures, and the priority and
FIFO handling of POSIX messages.

3) Backward-compatibility risk: changing msgrcv() to accept FDs and to
behave differently based on detection could break existing programs
and introduce silent misbehavior.

4) The existing interface is non-trivial to extend: as the snippet
below shows, msgrcv() takes no input for the size of a user-space
struct; "msgsz" is only the buffer size, i.e. how many bytes will
actually be copied. If we later need to add a new input member to a
struct, the missing struct size may cause compatibility issues across
architectures, and handling 32-bit versus 64-bit struct layouts becomes
tricky.

SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp,
size_t, msgsz, long, msgtyp, int, msgflg)
 {
  return ksys_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg);
}

and static long do_msgrcv(int msqid, void __user *buf, size_t bufsz,
long msgtyp, int msgflg,
      long (*msg_handler)(void __user *, struct msg_msg *, size_t)){}
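
To illustrate what the struct size buys us, here is a userspace sketch
modeled on the semantics of the kernel's copy_struct_from_user() helper.
The struct peek_args layout and the copy_struct_sized() name are
hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical v1 argument layout with fixed-width fields. */
struct peek_args {
	unsigned long long msg_ptr;
	unsigned long long msg_prio;
	unsigned long long msg_len;
};

/*
 * Copy a caller struct of 'usize' bytes into a struct of 'ksize' bytes.
 * Older (smaller) callers get the missing tail zero-filled; newer
 * (larger) callers are accepted only if every byte we do not understand
 * is zero, otherwise -E2BIG is returned.
 */
static int copy_struct_sized(void *dst, size_t ksize,
			     const void *src, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;
	const unsigned char *tail = (const unsigned char *)src + n;
	size_t i;

	assert(dst != NULL && src != NULL);

	/* Runs only when the caller's struct is larger than ours. */
	for (i = 0; i < usize - n; i++)
		if (tail[i] != 0)
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

With this scheme a new member can be appended later without a new
system call, and fixed-width fields keep the layout identical for 32-bit
and 64-bit callers. msgrcv() has no room for such a size argument.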


Thanks,
Mathura


>
> On Mon, 13 Apr 2026 at 22:06, Andrei Vagin <avagin@gmail.com> wrote:
>>
>> On Wed, Apr 8, 2026 at 2:53 AM Mathura_Kumar <academic1mathura@gmail.com> wrote:
>> >
>> >
>> > Patch series overview:
>> >
>> >   1. Add New system call do_mq_timedreceive2() and handler implementation
>> >   2. Add system call number in all most common arch.
>> >   3. Prepared Documentation and test
>> >   4. Add entry in performance tools all most common file
>> >   5. Add bitmask flag and user struct size as input for extensibility
>>
>> Could you please explain why MSG_COPY can't be used to implement C/R?
>> It was introduced by 4a674f34ba04a ("ipc: introduce message queue copy feature")
>>
>> >
>> > change since v2:
>> >  - bitmask flag is added and check for valid operation
>> >  - userspace struct argument size is added for extensibility
>> >  - new test is added
>> >  - documentation is updated approximately
>> >  - increased the total system call count in performance tool as well
>> >  - v2 Link: https://lore.kernel.org/linux-arch/20260320052340.6696-1-academic1mathura@gmail.com/T/#t
>> >
>> >
>> >   Thanks for reviewing.
>> >
>> >
>> >  Documentation/userspace-api/index.rst         |   1 +
>> >  Documentation/userspace-api/ipc.rst           | 228 +++++
>> >  arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
>> >  arch/arm/tools/syscall.tbl                    |   1 +
>> >  arch/arm64/tools/syscall_32.tbl               |   1 +
>> >  arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
>> >  arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
>> >  arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
>> >  arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
>> >  arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
>> >  arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
>> >  arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
>> >  arch/s390/kernel/syscalls/syscall.tbl         |   1 +
>> >  arch/sh/kernel/syscalls/syscall.tbl           |   1 +
>> >  arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
>> >  arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
>> >  arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
>> >  arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
>> >  include/linux/compat.h                        |   6 +-
>> >  include/linux/syscalls.h                      |   6 +
>> >  include/uapi/asm-generic/unistd.h             |   7 +-
>> >  include/uapi/linux/mqueue.h                   |  14 +-
>> >  ipc/mqueue.c                                  | 211 ++++-
>> >  ipc/msg.c                                     |   2 +-
>> >  ipc/msgutil.c                                 |  48 +-
>> >  ipc/util.h                                    |   3 +-
>> >  kernel/sys_ni.c                               |   3 +
>> >  scripts/syscall.tbl                           |   1 +
>> >  tools/include/uapi/asm-generic/unistd.h       |   6 +-
>> >  .../arch/alpha/entry/syscalls/syscall.tbl     |   1 +
>> >  .../perf/arch/arm/entry/syscalls/syscall.tbl  |   1 +
>> >  .../arch/arm64/entry/syscalls/syscall_32.tbl  |   1 +
>> >  .../arch/mips/entry/syscalls/syscall_n64.tbl  |   1 +
>> >  .../arch/parisc/entry/syscalls/syscall.tbl    |   1 +
>> >  .../arch/powerpc/entry/syscalls/syscall.tbl   |   1 +
>> >  .../perf/arch/s390/entry/syscalls/syscall.tbl |   1 +
>> >  tools/perf/arch/sh/entry/syscalls/syscall.tbl |   1 +
>> >  .../arch/sparc/entry/syscalls/syscall.tbl     |   1 +
>> >  .../arch/x86/entry/syscalls/syscall_32.tbl    |   1 +
>> >  .../arch/x86/entry/syscalls/syscall_64.tbl    |   1 +
>> >  .../arch/xtensa/entry/syscalls/syscall.tbl    |   1 +
>> >  tools/scripts/syscall.tbl                     |   1 +
>> >  tools/testing/selftests/ipc/.gitignore        |   1 +
>> >  tools/testing/selftests/ipc/Makefile          |   9 +-
>> >  tools/testing/selftests/ipc/mq_peek.c         | 813 ++++++++++++++++++
>> >  45 files changed, 1343 insertions(+), 45 deletions(-)
>> >  create mode 100644 Documentation/userspace-api/ipc.rst
>> >  create mode 100644 tools/testing/selftests/ipc/mq_peek.c
>> >
>> > --
>> > 2.43.0
>> >

* Re: [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue
  2026-04-13 16:34   ` Andrei Vagin
@ 2026-04-14  0:04     ` Mathura
  2026-04-14  0:52       ` Andrei Vagin
  0 siblings, 1 reply; 14+ messages in thread
From: Mathura @ 2026-04-14  0:04 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: criu, ptikhomirov, rstoyanov

Hi Andrei,
Thanks for the review.

I take the lock twice. Here is my reasoning:

The first time, I only look into the queue to check whether there is
any message to peek at all, because I prefer to allocate memory only
after that is confirmed. After allocating the temporary buffer, I take
the lock a second time to verify that the message seen earlier still
exists and is intact, and to make sure that no other system call can
consume the same message or alter anything until we have finished
copying it.
If I do not take the lock a second time, what happens if someone else
tries to consume the same message in between?

Another approach: pre-allocate the memory before confirming that a
message exists, and then take the lock only once, for the copy into the
temporary buffer.
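
A toy userspace model of that second approach, only to make the idea
concrete. struct toy_queue, toy_peek() and the field names are invented
for this sketch, and a pthread mutex stands in for the spinlock. The
buffer is sized for the largest possible message (the analogue of
mq_msgsize, which is fixed at queue creation), so nothing has to be
re-checked after the allocation:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define QUEUE_CAP 8

/* Toy queue; names are hypothetical, not the kernel's. */
struct toy_msg {
	size_t len;
	char data[64];
};

struct toy_queue {
	pthread_mutex_t lock;
	size_t msgsize_max;	/* fixed at creation, like mq_msgsize */
	size_t count;
	struct toy_msg msgs[QUEUE_CAP];
};

/*
 * Peek message 'index' with a single lock acquisition.
 * Returns the message length, or a negative errno value.
 */
static ssize_t toy_peek(struct toy_queue *q, size_t index,
			char *out, size_t out_len)
{
	char *tmp;
	ssize_t ret;

	assert(q != NULL && out != NULL);
	if (out_len < q->msgsize_max)
		return -EMSGSIZE;

	/* Allocate for the worst case before locking. */
	tmp = malloc(q->msgsize_max);
	if (!tmp)
		return -ENOMEM;

	pthread_mutex_lock(&q->lock);
	if (index >= q->count) {
		ret = -ENOENT;
	} else {
		ret = (ssize_t)q->msgs[index].len;
		memcpy(tmp, q->msgs[index].data, q->msgs[index].len);
	}
	pthread_mutex_unlock(&q->lock);

	/* The copy to the caller happens outside the lock. */
	if (ret >= 0)
		memcpy(out, tmp, (size_t)ret);
	free(tmp);
	return ret;
}
```

The cost is one wasted allocation when the queue turns out to be empty
or the index does not exist.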

How can we ensure atomicity and the existence of the data during the
copy if we do not take the lock?

copy_msg(msg_ptr, k_msg_buffer, k_m_ts) only makes a temporary copy
into the kernel buffer while the second lock is held; the final copy
to user space is lock-free.

Please share more insight on this.

Mathura

On Mon, 13 Apr 2026 at 22:04, Andrei Vagin <avagin@gmail.com> wrote:
>
> Hi Mathura_Kumar,
>
> First of all, please read
> https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html.
>
> You need to write a detailed commit message for each patch.
>
> Before sending a patch, it is recommended to run checkpatch:
> $ ./scripts/checkpatch.pl `git format-patch HEAD~1`
> WARNING: Missing commit description - Add an appropriate one
>
> WARNING: line length of 107 exceeds 100 columns
> #38: FILE: include/linux/compat.h:805:
> +asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct
> compat_mq_timedreceive2_args __user *uargs,
>
> WARNING: line length of 128 exceeds 100 columns
> #39: FILE: include/linux/compat.h:806:
> + unsigned int flags, unsigned long index,
>
> WARNING: line length of 131 exceeds 100 columns
> #40: FILE: include/linux/compat.h:807:
> + struct old_timespec32 __user *abs_timeout);
>
> WARNING: Use #include <linux/compat.h> instead of <asm/compat.h>
> #104: FILE: include/uapi/linux/mqueue.h:22:
> +#include <asm/compat.h>
>
> WARNING: line length of 122 exceeds 100 columns
> #306: FILE: ipc/mqueue.c:1636:
> + struct compat_mq_timedreceive2_args __user *uargs)
>
> total: 0 errors, 6 warnings, 408 lines checked
>
>
> On Wed, Apr 8, 2026 at 2:53 AM Mathura_Kumar <academic1mathura@gmail.com> wrote:
> >
> > Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
> > ---
> >  include/linux/compat.h            |   6 +-
> >  include/linux/syscalls.h          |   6 +
> >  include/uapi/asm-generic/unistd.h |   7 +-
> >  include/uapi/linux/mqueue.h       |  14 ++-
> >  ipc/mqueue.c                      | 186 ++++++++++++++++++++++++++++--
> >  ipc/msg.c                         |   2 +-
> >  ipc/msgutil.c                     |  48 ++++----
> >  ipc/util.h                        |   3 +-
> >  kernel/sys_ni.c                   |   1 +
> >  9 files changed, 231 insertions(+), 42 deletions(-)
> >
> > diff --git a/include/linux/compat.h b/include/linux/compat.h
> > index 56cebaff0c91..9f5ca26e76d8 100644
> > --- a/include/linux/compat.h
> > +++ b/include/linux/compat.h
> > @@ -22,6 +22,7 @@
> >  #include <asm/compat.h>
> >  #include <asm/siginfo.h>
> >  #include <asm/signal.h>
> > +#include <linux/mqueue.h>
> >
> >  #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> >  /*
> > @@ -801,8 +802,9 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
> >                 const struct iovec __user *vec,
> >                 unsigned long vlen, loff_t pos, rwf_t flags);
> >  #endif
> > -
> > -
> > +asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct compat_mq_timedreceive2_args __user *uargs,
> > +                                                                                       unsigned int flags, unsigned long index,
> > +                                                                                       struct old_timespec32 __user *abs_timeout);
> >  /*
> >   * Deprecated system calls which are still defined in
> >   * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 02bd6ddb6278..993e570c90ab 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -79,6 +79,7 @@ struct mnt_id_req;
> >  struct ns_id_req;
> >  struct xattr_args;
> >  struct file_attr;
> > +struct mq_timedreceive2_args;
> >
> >  #include <linux/types.h>
> >  #include <linux/aio_abi.h>
> > @@ -93,6 +94,7 @@ struct file_attr;
> >  #include <linux/key.h>
> >  #include <linux/personality.h>
> >  #include <trace/syscall.h>
> > +#include <linux/mqueue.h>
> >
> >  #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> >  /*
> > @@ -746,6 +748,10 @@ asmlinkage long sys_mq_timedsend_time32(mqd_t mqdes,
> >                         const char __user *u_msg_ptr,
> >                         unsigned int msg_len, unsigned int msg_prio,
> >                         const struct old_timespec32 __user *u_abs_timeout);
> > +asmlinkage long
> > +sys_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args __user *uargs,
> > +                    unsigned int flags, unsigned long index,
> > +                    struct __kernel_timespec __user *abs_timeout);
> >  asmlinkage long sys_msgget(key_t key, int msgflg);
> >  asmlinkage long sys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
> >  asmlinkage long sys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
> > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > index a627acc8fb5f..200ee7fde5c4 100644
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -863,9 +863,12 @@ __SYSCALL(__NR_listns, sys_listns)
> >  #define __NR_rseq_slice_yield 471
> >  __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
> >
> > -#undef __NR_syscalls
> > -#define __NR_syscalls 472
> > +#define __NR_mq_timedreceive2 472
> > +__SC_COMP(__NR_mq_timedreceive2, sys_mq_timedreceive2,
> > +               compat_sys_mq_timedreceive2)
> >
> > +#undef __NR_syscalls
> > +#define __NR_syscalls 473
> >  /*
> >   * 32 bit systems traditionally used different
> >   * syscalls for off_t and loff_t arguments, while
> > diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h
> > index b516b66840ad..7cdced63f5d2 100644
> > --- a/include/uapi/linux/mqueue.h
> > +++ b/include/uapi/linux/mqueue.h
> > @@ -18,8 +18,8 @@
> >
> >  #ifndef _LINUX_MQUEUE_H
> >  #define _LINUX_MQUEUE_H
> > -
> >  #include <linux/types.h>
> > +#include <asm/compat.h>
> >
> >  #define MQ_PRIO_MAX    32768
> >  /* per-uid limit of kernel memory used by mqueue, in bytes */
> > @@ -33,6 +33,18 @@ struct mq_attr {
> >         __kernel_long_t __reserved[4];  /* ignored for input, zeroed for output */
> >  };
> >
> > +struct mq_timedreceive2_args {
> > +       size_t msg_len;
> > +       unsigned int  *msg_prio;
> > +       char  *msg_ptr;
> > +};
> > +
> > +struct compat_mq_timedreceive2_args {
> > +       compat_size_t msg_len;
> > +       compat_uptr_t msg_prio;
> > +       compat_uptr_t msg_ptr;
> > +};
> > +
> >  /*
> >   * SIGEV_THREAD implementation:
> >   * SIGEV_THREAD must be implemented in user space. If SIGEV_THREAD is passed
> > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > index 4798b375972b..78dc414967a2 100644
> > --- a/ipc/mqueue.c
> > +++ b/ipc/mqueue.c
> > @@ -53,6 +53,7 @@ struct mqueue_fs_context {
> >
> >  #define SEND           0
> >  #define RECV           1
> > +#define MQ_PEEK     2
> >
> >  #define STATE_NONE     0
> >  #define STATE_READY    1
> > @@ -1230,6 +1231,115 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
> >         return ret;
> >  }
> >
> > +static struct msg_msg *mq_peek_index(struct mqueue_inode_info *info, int index)
> > +{
> > +       struct rb_node *node;
> > +       struct posix_msg_tree_node *leaf;
> > +       struct msg_msg *msg;
> > +
> > +       int count = 0;
> > +
> > +       /* Start from highest priority */
> > +       node = rb_last(&info->msg_tree);
> > +       while (node) {
> > +               leaf = rb_entry(node, struct posix_msg_tree_node, rb_node);
> > +               list_for_each_entry(msg, &leaf->msg_list, m_list) {
> > +                       if (count == index)
> > +                               return msg;
> > +                       count++;
> > +               }
> > +
> > +               node = rb_prev(node);
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +static int do_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *args,
> > +                              unsigned int flags, unsigned long index,
> > +                              struct timespec64 *ts)
> > +{
> > +       ssize_t ret;
> > +       struct msg_msg *msg_ptr, *k_msg_buffer;
> > +       long k_m_type;
> > +       size_t k_m_ts;
> > +       struct inode *inode;
> > +       struct mqueue_inode_info *info;
> > +
> > +       if (!(flags & MQ_PEEK)) {
> > +               return do_mq_timedreceive(mqdes, args->msg_ptr, args->msg_len,
> > +                                         args->msg_prio, ts);
> > +       }
> > +       audit_mq_sendrecv(mqdes, args->msg_len, 0, ts);
> > +       CLASS(fd, f)(mqdes);
> > +       if (fd_empty(f))
> > +               return -EBADF;
> > +
> > +       inode = file_inode(fd_file(f));
> > +       if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
> > +               return -EBADF;
> > +       info = MQUEUE_I(inode);
> > +       audit_file(fd_file(f));
> > +
> > +       if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
> > +               return -EBADF;
> > +
> > +       if (unlikely(args->msg_len < info->attr.mq_msgsize))
> > +               return -EMSGSIZE;
> > +       if (index >= (unsigned long)info->attr.mq_maxmsg)
> > +               return -ENOENT;
> > +
> > +       spin_lock(&info->lock);
> > +       if (info->attr.mq_curmsgs == 0) {
> > +               spin_unlock(&info->lock);
> > +               return -EAGAIN;
>
> should it be ENOENT?
>
> > +       }
> > +       msg_ptr = mq_peek_index(info, index);
> > +       if (!msg_ptr) {
> > +               spin_unlock(&info->lock);
> > +               return -ENOENT;
> > +       }
> > +       k_m_type = msg_ptr->m_type;
> > +       k_m_ts = msg_ptr->m_ts;
> > +       spin_unlock(&info->lock);
> > +
> > +       k_msg_buffer = alloc_msg(k_m_ts);
> > +       if (!k_msg_buffer)
> > +               return -ENOMEM;
> > +
> > +       /*
> > +        * Two spin locks are necessary here. We are avoiding atomic memory
> > +        * allocation and premature allocation before confirming that
> > +        * a message actually exists to peek.
> > +        */
>
> MSG_COPY doesn't require to lock the spinlock twice, so why can't we
> do the same thing here?
>
> > +       spin_lock(&info->lock);
> > +       msg_ptr = mq_peek_index(info, index);
> > +       if (!msg_ptr || msg_ptr->m_type != k_m_type ||
> > +           msg_ptr->m_ts != k_m_ts) {
> > +               spin_unlock(&info->lock);
> > +               free_msg(k_msg_buffer);
> > +               return -EAGAIN;
> > +       }
> > +       if (IS_ERR(copy_msg(msg_ptr, k_msg_buffer, k_m_ts))) {
> > +               spin_unlock(&info->lock);
> > +               free_msg(k_msg_buffer);
> > +               return -EINVAL;
>
> you probably need to return the code returned by copy_msg...
>
> > +       }
> > +       spin_unlock(&info->lock);
> > +
> > +       ret = k_msg_buffer->m_ts;
> > +       if (args->msg_prio && put_user(k_m_type, args->msg_prio)) {
> > +               free_msg(k_msg_buffer);
> > +               return -EFAULT;
> > +       }
> > +       if (store_msg(args->msg_ptr, k_msg_buffer, k_m_ts)) {
> > +               free_msg(k_msg_buffer);
> > +               return -EFAULT;
> > +       }
> > +       free_msg(k_msg_buffer);
> > +       return ret;
> > +}
> > +
> >  SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
> >                 size_t, msg_len, unsigned int, msg_prio,
> >                 const struct __kernel_timespec __user *, u_abs_timeout)
> > @@ -1258,6 +1368,27 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
> >         return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
> >  }
> >
> > +SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
> > +               struct mq_timedreceive2_args __user *, uargs, unsigned int,
> > +               flags, const unsigned long, index,
> > +               const struct __kernel_timespec __user *, u_abs_timeout)
> > +{
> > +       struct mq_timedreceive2_args args;
> > +       struct timespec64 ts, *p = NULL;
> > +
> > +       if (copy_from_user(&args, uargs, sizeof(args)))
> > +               return -EFAULT;
> > +
> > +       if (u_abs_timeout) {
> > +               int res = prepare_timeout(u_abs_timeout, &ts);
> > +
> > +               if (res)
> > +                       return res;
> > +               p = &ts;
> > +       }
> > +       return do_mq_timedreceive2(mqdes, &args, flags, index, p);
> > +}
> > +
> >  /*
> >   * Notes: the case when user wants us to deregister (with NULL as pointer)
> >   * and he isn't currently owner of notification, will be silently discarded.
> > @@ -1449,6 +1580,17 @@ SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
> >         return 0;
> >  }
> >
> > +#ifdef CONFIG_COMPAT_32BIT_TIME
> > +static int compat_prepare_timeout(const struct old_timespec32 __user *p,
> > +                                                               struct timespec64 *ts)
> > +{
> > +       if (get_old_timespec32(ts, p))
> > +               return -EFAULT;
> > +       if (!timespec64_valid(ts))
> > +               return -EINVAL;
> > +       return 0;
> > +}
> > +
> >  #ifdef CONFIG_COMPAT
> >
> >  struct compat_mq_attr {
> > @@ -1490,6 +1632,22 @@ static inline int put_compat_mq_attr(const struct mq_attr *attr,
> >         return 0;
> >  }
> >
> > +static inline int get_compat_mq_args(struct mq_timedreceive2_args *args,
> > +                                                                       struct compat_mq_timedreceive2_args __user *uargs)
> > +{
> > +       struct compat_mq_timedreceive2_args v;
> > +
> > +       if (copy_from_user(&v, uargs, sizeof(v)))
> > +               return -EFAULT;
> > +
> > +       memset(args, 0, sizeof(*args));
> > +       args->msg_len = (size_t)v.msg_len;
> > +       args->msg_prio = (unsigned int *)compat_ptr(v.msg_prio);
> > +       args->msg_ptr = (char *)compat_ptr(v.msg_ptr);
> > +
> > +       return 0;
> > +}
> > +
> >  COMPAT_SYSCALL_DEFINE4(mq_open, const char __user *, u_name,
> >                        int, oflag, compat_mode_t, mode,
> >                        struct compat_mq_attr __user *, u_attr)
> > @@ -1541,19 +1699,30 @@ COMPAT_SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
> >                 return -EFAULT;
> >         return 0;
> >  }
> > -#endif
> >
> > -#ifdef CONFIG_COMPAT_32BIT_TIME
> > -static int compat_prepare_timeout(const struct old_timespec32 __user *p,
> > -                                  struct timespec64 *ts)
> > +COMPAT_SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
> > +                      struct compat_mq_timedreceive2_args __user *, uargs,
> > +                      unsigned int, flags, const unsigned long, index,
> > +                      const struct old_timespec32 __user *, u_abs_timeout)
> >  {
> > -       if (get_old_timespec32(ts, p))
> > +       struct mq_timedreceive2_args args;
> > +       struct timespec64 ts, *p = NULL;
> > +
> > +       if (get_compat_mq_args(&args, uargs))
> >                 return -EFAULT;
> > -       if (!timespec64_valid(ts))
> > -               return -EINVAL;
> > -       return 0;
> > +
> > +       if (u_abs_timeout) {
> > +               int res = compat_prepare_timeout(u_abs_timeout, &ts);
> > +
> > +               if (res)
> > +                       return res;
> > +               p = &ts;
> > +       }
> > +       return do_mq_timedreceive2(mqdes, &args, flags, index, p);
> >  }
> >
> > +#endif
> > +
> >  SYSCALL_DEFINE5(mq_timedsend_time32, mqd_t, mqdes,
> >                 const char __user *, u_msg_ptr,
> >                 unsigned int, msg_len, unsigned int, msg_prio,
> > @@ -1583,6 +1752,7 @@ SYSCALL_DEFINE5(mq_timedreceive_time32, mqd_t, mqdes,
> >         }
> >         return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
> >  }
> > +
> >  #endif
> >
> >  static const struct inode_operations mqueue_dir_inode_operations = {
> > diff --git a/ipc/msg.c b/ipc/msg.c
> > index 62996b97f0ac..6392b11dd7f7 100644
> > --- a/ipc/msg.c
> > +++ b/ipc/msg.c
> > @@ -1156,7 +1156,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
> >                          * not update queue parameters.
> >                          */
> >                         if (msgflg & MSG_COPY) {
> > -                               msg = copy_msg(msg, copy);
> > +                               msg = copy_msg(msg, copy, msg->m_ts);
> >                                 goto out_unlock0;
> >                         }
> >
> > diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> > index e28f0cecb2ec..c5536ce47cc2 100644
> > --- a/ipc/msgutil.c
> > +++ b/ipc/msgutil.c
> > @@ -51,7 +51,7 @@ static int __init init_msg_buckets(void)
> >  }
> >  subsys_initcall(init_msg_buckets);
> >
> > -static struct msg_msg *alloc_msg(size_t len)
> > +struct msg_msg *alloc_msg(size_t len)
> >  {
> >         struct msg_msg *msg;
> >         struct msg_msgseg **pseg;
> > @@ -122,39 +122,33 @@ struct msg_msg *load_msg(const void __user *src, size_t len)
> >         free_msg(msg);
> >         return ERR_PTR(err);
> >  }
> > -#ifdef CONFIG_CHECKPOINT_RESTORE
> > -struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
> > +
> > +struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len)
> >  {
> > -       struct msg_msgseg *dst_pseg, *src_pseg;
> > -       size_t len = src->m_ts;
> > -       size_t alen;
> > +       struct msg_msgseg *src_seg, *dst_seg;
> > +       size_t remaining, chunk;
> >
> > -       if (src->m_ts > dst->m_ts)
> > +       if (len > src->m_ts)
> >                 return ERR_PTR(-EINVAL);
> > -
> > -       alen = min(len, DATALEN_MSG);
> > -       memcpy(dst + 1, src + 1, alen);
> > -
> > -       for (dst_pseg = dst->next, src_pseg = src->next;
> > -            src_pseg != NULL;
> > -            dst_pseg = dst_pseg->next, src_pseg = src_pseg->next) {
> > -
> > -               len -= alen;
> > -               alen = min(len, DATALEN_SEG);
> > -               memcpy(dst_pseg + 1, src_pseg + 1, alen);
> > +       chunk = min(len, DATALEN_MSG);
> > +       memcpy(dst + 1, src + 1, chunk);
> > +       remaining = len - chunk;
> > +       src_seg = src->next;
> > +       dst_seg = dst->next;
> > +       while (remaining > 0 && src_seg && dst_seg) {
> > +               chunk = min(remaining, DATALEN_SEG);
> > +               memcpy(dst_seg + 1, src_seg + 1, chunk);
> > +               remaining -= chunk;
> > +               src_seg = src_seg->next;
> > +               dst_seg = dst_seg->next;
> >         }
> > -
> > +       if (remaining != 0)
> > +               return ERR_PTR(-EINVAL);
> >         dst->m_type = src->m_type;
> > -       dst->m_ts = src->m_ts;
> > -
> > +       dst->m_ts   = src->m_ts;
> >         return dst;
> >  }
> > -#else
> > -struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
> > -{
> > -       return ERR_PTR(-ENOSYS);
> > -}
> > -#endif
> > +
> >  int store_msg(void __user *dest, struct msg_msg *msg, size_t len)
> >  {
> >         size_t alen;
> > diff --git a/ipc/util.h b/ipc/util.h
> > index a55d6cebe6d3..374abeee79b3 100644
> > --- a/ipc/util.h
> > +++ b/ipc/util.h
> > @@ -197,8 +197,9 @@ int ipc_parse_version(int *cmd);
> >
> >  extern void free_msg(struct msg_msg *msg);
> >  extern struct msg_msg *load_msg(const void __user *src, size_t len);
> > -extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
> > +extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len);
> >  extern int store_msg(void __user *dest, struct msg_msg *msg, size_t len);
> > +extern struct msg_msg *alloc_msg(size_t len);
> >
> >  static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int id)
> >  {
> > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> > index add3032da16f..658d6b8274b3 100644
> > --- a/kernel/sys_ni.c
> > +++ b/kernel/sys_ni.c
> > @@ -391,6 +391,7 @@ COND_SYSCALL(setuid16);
> >  /* restartable sequence */
> >  COND_SYSCALL(rseq);
> >  COND_SYSCALL(rseq_slice_yield);
> > +COND_SYSCALL(mq_timedreceive2);
> >
> >  COND_SYSCALL(uretprobe);
> >  COND_SYSCALL(uprobe);
> > --
> > 2.43.0
> >

* Re: [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue
  2026-04-13 23:30     ` Mathura
@ 2026-04-14  0:11       ` Andrei Vagin
  2026-04-14  0:40         ` Mathura
  0 siblings, 1 reply; 14+ messages in thread
From: Andrei Vagin @ 2026-04-14  0:11 UTC (permalink / raw)
  To: Mathura; +Cc: criu, Pavel Tikhomirov, rstoyanov

On Mon, Apr 13, 2026 at 4:30 PM Mathura <academic1mathura@gmail.com> wrote:
>
> Hi Andrei,
>
> Thanks for the review and your time.
> I looked at the code you pointed to; here is the constraint I currently see:

That was my mistake. It happens when I am rushing through some
unfamiliar code. I wasn't aware that IPC queues and POSIX mqueues
shared some kernel logic. However, I still think it’s worth looking at the
MSG_COPY implementation for IPC queues as a reference.
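
For readers who have not used it, MSG_COPY is the msgrcv() flag that
peeks at a System V message by position without dequeuing it. Below is
a minimal userspace sketch of that behaviour, not the kernel code under
discussion. The names demo_msg, send_text and msg_copy_demo are made up
for illustration, and the fallback #define is only there for C
libraries that do not expose the flag. Kernels built without
CONFIG_CHECKPOINT_RESTORE fail the call with ENOSYS, which the sketch
reports as "not available" rather than as an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#ifndef MSG_COPY
#define MSG_COPY 040000	/* value from the Linux UAPI header linux/msg.h */
#endif

/* Hypothetical message layout used only by this demo. */
struct demo_msg {
	long mtype;
	char mtext[32];
};

static int send_text(int id, const char *text)
{
	struct demo_msg m = { .mtype = 1 };

	assert(strlen(text) < sizeof(m.mtext));
	strcpy(m.mtext, text);
	return msgsnd(id, &m, sizeof(m.mtext), IPC_NOWAIT);
}

/*
 * Returns 0 if the peek worked and left the queue untouched, 1 if SysV
 * queues or MSG_COPY are not usable here, -1 if the result was wrong.
 */
static int msg_copy_demo(void)
{
	struct demo_msg rcv = { 0 };
	struct msqid_ds ds;
	int ret = 1;
	int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

	if (id < 0)
		return 1;
	if (send_text(id, "first") < 0 || send_text(id, "second") < 0)
		goto out;

	/* With MSG_COPY, msgtyp is a 0-based position, not a type. */
	if (msgrcv(id, &rcv, sizeof(rcv.mtext), 1,
		   MSG_COPY | IPC_NOWAIT) < 0)
		goto out;	/* for example ENOSYS */

	ret = -1;
	if (strcmp(rcv.mtext, "second") != 0)
		goto out;
	/* A peek must not dequeue anything: both messages remain. */
	if (msgctl(id, IPC_STAT, &ds) == 0 && ds.msg_qnum == 2)
		ret = 0;
out:
	msgctl(id, IPC_RMID, NULL);
	return ret;
}
```

MSG_COPY has to be combined with IPC_NOWAIT, and the position is
passed in the msgtyp argument, which is the closest existing analogue
to the index argument proposed in this series.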

Thanks,
Andrei

* Re: [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue
  2026-04-14  0:11       ` Andrei Vagin
@ 2026-04-14  0:40         ` Mathura
  2026-04-14 16:39           ` Andrei Vagin
  0 siblings, 1 reply; 14+ messages in thread
From: Mathura @ 2026-04-14  0:40 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: criu, Pavel Tikhomirov, rstoyanov

OK, I will look at that for reference.
Before that, a quick question from my side:
which is the better and cleaner approach?

1) The current implementation copies one message to user space per
system call. The message is selected by an index into the POSIX mqueue
RB tree, ordered by priority and then FIFO. This goes beyond a plain
peek: passing a valid index also allows inspecting messages deeper in
the queue for other use cases.

2) Dump all messages and their data to user space in a single call.
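
To make the ordering in option 1 concrete, here is a small userspace
model of an index-based peek: index 0 is the oldest message of the
highest priority, and messages of equal priority are numbered in
arrival order. This is only a sketch under those assumptions. The
names model_msg, prio_queue, model_send and model_peek_index are
hypothetical, and a flat array stands in for the kernel's rbtree of
per-priority lists.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_MSGS 8

struct model_msg {
	unsigned int prio;
	const char *text;
};

/* Flat array in arrival order; the kernel keeps an rbtree of lists. */
struct prio_queue {
	struct model_msg msgs[MODEL_MAX_MSGS];
	size_t count;
};

static int model_send(struct prio_queue *q, unsigned int prio,
		      const char *text)
{
	if (q->count == MODEL_MAX_MSGS)
		return -1;
	q->msgs[q->count].prio = prio;
	q->msgs[q->count].text = text;
	q->count++;
	return 0;
}

/*
 * Return the message at position `index` in receive order (highest
 * priority first, FIFO within one priority), or NULL if the index is
 * past the end. The queue itself is never modified.
 */
static const struct model_msg *model_peek_index(const struct prio_queue *q,
						size_t index)
{
	size_t i, j;

	assert(q != NULL);
	for (i = 0; i < q->count; i++) {
		size_t rank = 0;

		/* Count the messages that would be received before i. */
		for (j = 0; j < q->count; j++) {
			if (q->msgs[j].prio > q->msgs[i].prio ||
			    (q->msgs[j].prio == q->msgs[i].prio && j < i))
				rank++;
		}
		if (rank == index)
			return &q->msgs[i];
	}
	return NULL;
}
```

In this model, option 2 is just a loop over index 0, 1, 2, ... until
NULL comes back. With one system call per index the queue can change
between calls, whereas a single-call dump would see one consistent
snapshot.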


Thanks,
Mathura


On Tue, 14 Apr 2026 at 05:41, Andrei Vagin <avagin@gmail.com> wrote:
>
> On Mon, Apr 13, 2026 at 4:30 PM Mathura <academic1mathura@gmail.com> wrote:
> >
> > Hi Andrei,
> >
> > Thanks for the review and your time.
> > I looked at the code you pointed to; here is the constraint I currently see:
>
> That was my mistake. It happens when I am rushing through some
> unfamiliar code. I wasn't aware that IPC queues and POSIX mqueues
> shared some kernel logic. However, I still think it’s worth looking at the
> MSG_COPY implementation for IPC queues as a reference.
>
> Thanks,
> Andrei

* Re: [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue
  2026-04-14  0:04     ` Mathura
@ 2026-04-14  0:52       ` Andrei Vagin
  0 siblings, 0 replies; 14+ messages in thread
From: Andrei Vagin @ 2026-04-14  0:52 UTC (permalink / raw)
  To: Mathura; +Cc: criu, ptikhomirov, rstoyanov

On Mon, Apr 13, 2026 at 5:04 PM Mathura <academic1mathura@gmail.com> wrote:
>
> Hi Andrei,
> Thanks for the review.
>
> I take the lock twice. My reasoning was simple:
>
> The first time, I look into the queue to check whether there is any
> message to peek at all (I preferred to allocate memory only after
> confirming that). After allocating the temporary buffer, I take the
> lock a second time to verify that the message I saw still exists and
> is intact, and that no other system call can consume the same message
> or alter anything until we finish copying it.
> If I do not take the second lock, what happens if someone else
> consumes the same message in between?
>
> Another approach: pre-allocate the memory before confirming that a
> message exists, then take the lock once, only while copying into the
> temporary buffer.

I think this is what you need to do.
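
As a rough illustration of that pattern (allocate first, then take the
lock once), here is a userspace analogue with a pthread mutex standing
in for the spinlock. It is a sketch, not the kernel code: locked_queue
and model_peek are hypothetical names, the queue holds at most one
message, and it assumes, as the EMSGSIZE check in the patch implies,
that no message is larger than the queue's fixed message size.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot queue; cur_len == 0 means "empty". */
struct locked_queue {
	pthread_mutex_t lock;
	size_t msgsize;		/* fixed at creation, like mq_msgsize */
	size_t cur_len;
	char data[64];
};

/* Copy the stored message to user_buf without removing it. */
static long model_peek(struct locked_queue *q, char *user_buf,
		       size_t user_len)
{
	char *tmp;
	size_t len;

	assert(q != NULL && user_buf != NULL);
	if (user_len < q->msgsize)
		return -EMSGSIZE;

	/* Allocate for the worst case before taking the lock. */
	tmp = malloc(q->msgsize);
	if (!tmp)
		return -ENOMEM;

	/* Single critical section: check, then copy, then unlock. */
	pthread_mutex_lock(&q->lock);
	len = q->cur_len;
	if (len == 0) {
		pthread_mutex_unlock(&q->lock);
		free(tmp);
		return -ENOENT;
	}
	memcpy(tmp, q->data, len);
	pthread_mutex_unlock(&q->lock);

	/* The copy out to the caller runs without the lock held. */
	memcpy(user_buf, tmp, len);
	free(tmp);
	return (long)len;
}
```

The cost is an allocation that is wasted when the queue turns out to
be empty. In exchange there is no window between "look" and "copy", so
nothing needs to be re-validated after the allocation.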

>
> How do we ensure atomicity and the existence of the data during the
> copy if we do not take the lock?

I don't suggest copying data without holding a lock...

>
> copy_msg(msg_ptr, k_msg_buffer, k_m_ts) only makes a temporary copy
> into the kernel buffer while the lock is held the second time; the
> final copy to user space is lock-free.
>
> Please share more insight on this.
>
> Mathura
>
> On Mon, 13 Apr 2026 at 22:04, Andrei Vagin <avagin@gmail.com> wrote:
> >
> > Hi Mathura_Kumar,
> >
> > First of all, please read
> > https://www.kernel.org/doc/html/v4.10/process/submitting-patches.html.
> >
> > You need to write a detailed commit message for each patch.
> >
> > Before sending a patch, it is recommended to run checkpatch:
> > $ ./scripts/checkpatch.pl `git format-patch HEAD~1`
> > WARNING: Missing commit description - Add an appropriate one
> >
> > WARNING: line length of 107 exceeds 100 columns
> > #38: FILE: include/linux/compat.h:805:
> > +asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct
> > compat_mq_timedreceive2_args __user *uargs,
> >
> > WARNING: line length of 128 exceeds 100 columns
> > #39: FILE: include/linux/compat.h:806:
> > + unsigned int flags, unsigned long index,
> >
> > WARNING: line length of 131 exceeds 100 columns
> > #40: FILE: include/linux/compat.h:807:
> > + struct old_timespec32 __user *abs_timeout);
> >
> > WARNING: Use #include <linux/compat.h> instead of <asm/compat.h>
> > #104: FILE: include/uapi/linux/mqueue.h:22:
> > +#include <asm/compat.h>
> >
> > WARNING: line length of 122 exceeds 100 columns
> > #306: FILE: ipc/mqueue.c:1636:
> > + struct compat_mq_timedreceive2_args __user *uargs)
> >
> > total: 0 errors, 6 warnings, 408 lines checked
> >
> >
> > On Wed, Apr 8, 2026 at 2:53 AM Mathura_Kumar <academic1mathura@gmail.com> wrote:
> > >
> > > Signed-off-by: Mathura_Kumar <academic1mathura@gmail.com>
> > > ---
> > >  include/linux/compat.h            |   6 +-
> > >  include/linux/syscalls.h          |   6 +
> > >  include/uapi/asm-generic/unistd.h |   7 +-
> > >  include/uapi/linux/mqueue.h       |  14 ++-
> > >  ipc/mqueue.c                      | 186 ++++++++++++++++++++++++++++--
> > >  ipc/msg.c                         |   2 +-
> > >  ipc/msgutil.c                     |  48 ++++----
> > >  ipc/util.h                        |   3 +-
> > >  kernel/sys_ni.c                   |   1 +
> > >  9 files changed, 231 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/include/linux/compat.h b/include/linux/compat.h
> > > index 56cebaff0c91..9f5ca26e76d8 100644
> > > --- a/include/linux/compat.h
> > > +++ b/include/linux/compat.h
> > > @@ -22,6 +22,7 @@
> > >  #include <asm/compat.h>
> > >  #include <asm/siginfo.h>
> > >  #include <asm/signal.h>
> > > +#include <linux/mqueue.h>
> > >
> > >  #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> > >  /*
> > > @@ -801,8 +802,9 @@ asmlinkage long compat_sys_pwritev64v2(unsigned long fd,
> > >                 const struct iovec __user *vec,
> > >                 unsigned long vlen, loff_t pos, rwf_t flags);
> > >  #endif
> > > -
> > > -
> > > +asmlinkage long compat_sys_mq_timedreceive2(mqd_t mqdes, struct compat_mq_timedreceive2_args __user *uargs,
> > > +                                                                                       unsigned int flags, unsigned long index,
> > > +                                                                                       struct old_timespec32 __user *abs_timeout);
> > >  /*
> > >   * Deprecated system calls which are still defined in
> > >   * include/uapi/asm-generic/unistd.h and wanted by >= 1 arch
> > > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > > index 02bd6ddb6278..993e570c90ab 100644
> > > --- a/include/linux/syscalls.h
> > > +++ b/include/linux/syscalls.h
> > > @@ -79,6 +79,7 @@ struct mnt_id_req;
> > >  struct ns_id_req;
> > >  struct xattr_args;
> > >  struct file_attr;
> > > +struct mq_timedreceive2_args;
> > >
> > >  #include <linux/types.h>
> > >  #include <linux/aio_abi.h>
> > > @@ -93,6 +94,7 @@ struct file_attr;
> > >  #include <linux/key.h>
> > >  #include <linux/personality.h>
> > >  #include <trace/syscall.h>
> > > +#include <linux/mqueue.h>
> > >
> > >  #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
> > >  /*
> > > @@ -746,6 +748,10 @@ asmlinkage long sys_mq_timedsend_time32(mqd_t mqdes,
> > >                         const char __user *u_msg_ptr,
> > >                         unsigned int msg_len, unsigned int msg_prio,
> > >                         const struct old_timespec32 __user *u_abs_timeout);
> > > +asmlinkage long
> > > +sys_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args __user *uargs,
> > > +                    unsigned int flags, unsigned long index,
> > > +                    struct __kernel_timespec __user *abs_timeout);
> > >  asmlinkage long sys_msgget(key_t key, int msgflg);
> > >  asmlinkage long sys_old_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
> > >  asmlinkage long sys_msgctl(int msqid, int cmd, struct msqid_ds __user *buf);
> > > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > > index a627acc8fb5f..200ee7fde5c4 100644
> > > --- a/include/uapi/asm-generic/unistd.h
> > > +++ b/include/uapi/asm-generic/unistd.h
> > > @@ -863,9 +863,12 @@ __SYSCALL(__NR_listns, sys_listns)
> > >  #define __NR_rseq_slice_yield 471
> > >  __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
> > >
> > > -#undef __NR_syscalls
> > > -#define __NR_syscalls 472
> > > +#define __NR_mq_timedreceive2 472
> > > +__SC_COMP(__NR_mq_timedreceive2, sys_mq_timedreceive2,
> > > +               compat_sys_mq_timedreceive2)
> > >
> > > +#undef __NR_syscalls
> > > +#define __NR_syscalls 473
> > >  /*
> > >   * 32 bit systems traditionally used different
> > >   * syscalls for off_t and loff_t arguments, while
> > > diff --git a/include/uapi/linux/mqueue.h b/include/uapi/linux/mqueue.h
> > > index b516b66840ad..7cdced63f5d2 100644
> > > --- a/include/uapi/linux/mqueue.h
> > > +++ b/include/uapi/linux/mqueue.h
> > > @@ -18,8 +18,8 @@
> > >
> > >  #ifndef _LINUX_MQUEUE_H
> > >  #define _LINUX_MQUEUE_H
> > > -
> > >  #include <linux/types.h>
> > > +#include <asm/compat.h>
> > >
> > >  #define MQ_PRIO_MAX    32768
> > >  /* per-uid limit of kernel memory used by mqueue, in bytes */
> > > @@ -33,6 +33,18 @@ struct mq_attr {
> > >         __kernel_long_t __reserved[4];  /* ignored for input, zeroed for output */
> > >  };
> > >
> > > +struct mq_timedreceive2_args {
> > > +       size_t msg_len;
> > > +       unsigned int  *msg_prio;
> > > +       char  *msg_ptr;
> > > +};
> > > +
> > > +struct compat_mq_timedreceive2_args {
> > > +       compat_size_t msg_len;
> > > +       compat_uptr_t msg_prio;
> > > +       compat_uptr_t msg_ptr;
> > > +};
> > > +
> > >  /*
> > >   * SIGEV_THREAD implementation:
> > >   * SIGEV_THREAD must be implemented in user space. If SIGEV_THREAD is passed
> > > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > > index 4798b375972b..78dc414967a2 100644
> > > --- a/ipc/mqueue.c
> > > +++ b/ipc/mqueue.c
> > > @@ -53,6 +53,7 @@ struct mqueue_fs_context {
> > >
> > >  #define SEND           0
> > >  #define RECV           1
> > > +#define MQ_PEEK     2
> > >
> > >  #define STATE_NONE     0
> > >  #define STATE_READY    1
> > > @@ -1230,6 +1231,115 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
> > >         return ret;
> > >  }
> > >
> > > +static struct msg_msg *mq_peek_index(struct mqueue_inode_info *info, int index)
> > > +{
> > > +       struct rb_node *node;
> > > +       struct posix_msg_tree_node *leaf;
> > > +       struct msg_msg *msg;
> > > +
> > > +       int count = 0;
> > > +
> > > +       /* Start from highest priority */
> > > +       node = rb_last(&info->msg_tree);
> > > +       while (node) {
> > > +               leaf = rb_entry(node, struct posix_msg_tree_node, rb_node);
> > > +               list_for_each_entry(msg, &leaf->msg_list, m_list) {
> > > +                       if (count == index)
> > > +                               return msg;
> > > +                       count++;
> > > +               }
> > > +
> > > +               node = rb_prev(node);
> > > +       }
> > > +
> > > +       return NULL;
> > > +}
> > > +
> > > +static int do_mq_timedreceive2(mqd_t mqdes, struct mq_timedreceive2_args *args,
> > > +                              unsigned int flags, unsigned long index,
> > > +                              struct timespec64 *ts)
> > > +{
> > > +       ssize_t ret;
> > > +       struct msg_msg *msg_ptr, *k_msg_buffer;
> > > +       long k_m_type;
> > > +       size_t k_m_ts;
> > > +       struct inode *inode;
> > > +       struct mqueue_inode_info *info;
> > > +
> > > +       if (!(flags & MQ_PEEK)) {
> > > +               return do_mq_timedreceive(mqdes, args->msg_ptr, args->msg_len,
> > > +                                         args->msg_prio, ts);
> > > +       }
> > > +       audit_mq_sendrecv(mqdes, args->msg_len, 0, ts);
> > > +       CLASS(fd, f)(mqdes);
> > > +       if (fd_empty(f))
> > > +               return -EBADF;
> > > +
> > > +       inode = file_inode(fd_file(f));
> > > +       if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
> > > +               return -EBADF;
> > > +       info = MQUEUE_I(inode);
> > > +       audit_file(fd_file(f));
> > > +
> > > +       if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
> > > +               return -EBADF;
> > > +
> > > +       if (unlikely(args->msg_len < info->attr.mq_msgsize))
> > > +               return -EMSGSIZE;
> > > +       if (index >= (unsigned long)info->attr.mq_maxmsg)
> > > +               return -ENOENT;
> > > +
> > > +       spin_lock(&info->lock);
> > > +       if (info->attr.mq_curmsgs == 0) {
> > > +               spin_unlock(&info->lock);
> > > +               return -EAGAIN;
> >
> > should it be ENOENT?
> >
> > > +       }
> > > +       msg_ptr = mq_peek_index(info, index);
> > > +       if (!msg_ptr) {
> > > +               spin_unlock(&info->lock);
> > > +               return -ENOENT;
> > > +       }
> > > +       k_m_type = msg_ptr->m_type;
> > > +       k_m_ts = msg_ptr->m_ts;
> > > +       spin_unlock(&info->lock);
> > > +
> > > +       k_msg_buffer = alloc_msg(k_m_ts);
> > > +       if (!k_msg_buffer)
> > > +               return -ENOMEM;
> > > +
> > > +       /*
> > > +        * Two spin locks are necessary here. We are avoiding atomic memory
> > > +        * allocation and premature allocation before confirming that
> > > +        * a message actually exists to peek.
> > > +        */
> >
> > MSG_COPY doesn't require to lock the spinlock twice, so why can't we
> > do the same thing here?
> >
> > > +       spin_lock(&info->lock);
> > > +       msg_ptr = mq_peek_index(info, index);
> > > +       if (!msg_ptr || msg_ptr->m_type != k_m_type ||
> > > +           msg_ptr->m_ts != k_m_ts) {
> > > +               spin_unlock(&info->lock);
> > > +               free_msg(k_msg_buffer);
> > > +               return -EAGAIN;
> > > +       }
> > > +       if (IS_ERR(copy_msg(msg_ptr, k_msg_buffer, k_m_ts))) {
> > > +               spin_unlock(&info->lock);
> > > +               free_msg(k_msg_buffer);
> > > +               return -EINVAL;
> >
> > you probably need to return the code returned by copy_msg...
> >
> > > +       }
> > > +       spin_unlock(&info->lock);
> > > +
> > > +       ret = k_msg_buffer->m_ts;
> > > +       if (args->msg_prio && put_user(k_m_type, args->msg_prio)) {
> > > +               free_msg(k_msg_buffer);
> > > +               return -EFAULT;
> > > +       }
> > > +       if (store_msg(args->msg_ptr, k_msg_buffer, k_m_ts)) {
> > > +               free_msg(k_msg_buffer);
> > > +               return -EFAULT;
> > > +       }
> > > +       free_msg(k_msg_buffer);
> > > +       return ret;
> > > +}
> > > +
> > >  SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
> > >                 size_t, msg_len, unsigned int, msg_prio,
> > >                 const struct __kernel_timespec __user *, u_abs_timeout)
> > > @@ -1258,6 +1368,27 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
> > >         return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
> > >  }
> > >
> > > +SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
> > > +               struct mq_timedreceive2_args __user *, uargs, unsigned int,
> > > +               flags, const unsigned long, index,
> > > +               const struct __kernel_timespec __user *, u_abs_timeout)
> > > +{
> > > +       struct mq_timedreceive2_args args;
> > > +       struct timespec64 ts, *p = NULL;
> > > +
> > > +       if (copy_from_user(&args, uargs, sizeof(args)))
> > > +               return -EFAULT;
> > > +
> > > +       if (u_abs_timeout) {
> > > +               int res = prepare_timeout(u_abs_timeout, &ts);
> > > +
> > > +               if (res)
> > > +                       return res;
> > > +               p = &ts;
> > > +       }
> > > +       return do_mq_timedreceive2(mqdes, &args, flags, index, p);
> > > +}
> > > +
> > >  /*
> > >   * Notes: the case when user wants us to deregister (with NULL as pointer)
> > >   * and he isn't currently owner of notification, will be silently discarded.
> > > @@ -1449,6 +1580,17 @@ SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
> > >         return 0;
> > >  }
> > >
> > > +#ifdef CONFIG_COMPAT_32BIT_TIME
> > > +static int compat_prepare_timeout(const struct old_timespec32 __user *p,
> > > +                                                               struct timespec64 *ts)
> > > +{
> > > +       if (get_old_timespec32(ts, p))
> > > +               return -EFAULT;
> > > +       if (!timespec64_valid(ts))
> > > +               return -EINVAL;
> > > +       return 0;
> > > +}
> > > +
> > >  #ifdef CONFIG_COMPAT
> > >
> > >  struct compat_mq_attr {
> > > @@ -1490,6 +1632,22 @@ static inline int put_compat_mq_attr(const struct mq_attr *attr,
> > >         return 0;
> > >  }
> > >
> > > +static inline int get_compat_mq_args(struct mq_timedreceive2_args *args,
> > > +                                                                       struct compat_mq_timedreceive2_args __user *uargs)
> > > +{
> > > +       struct compat_mq_timedreceive2_args v;
> > > +
> > > +       if (copy_from_user(&v, uargs, sizeof(v)))
> > > +               return -EFAULT;
> > > +
> > > +       memset(args, 0, sizeof(*args));
> > > +       args->msg_len = (size_t)v.msg_len;
> > > +       args->msg_prio = (unsigned int *)compat_ptr(v.msg_prio);
> > > +       args->msg_ptr = (char *)compat_ptr(v.msg_ptr);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  COMPAT_SYSCALL_DEFINE4(mq_open, const char __user *, u_name,
> > >                        int, oflag, compat_mode_t, mode,
> > >                        struct compat_mq_attr __user *, u_attr)
> > > @@ -1541,19 +1699,30 @@ COMPAT_SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
> > >                 return -EFAULT;
> > >         return 0;
> > >  }
> > > -#endif
> > >
> > > -#ifdef CONFIG_COMPAT_32BIT_TIME
> > > -static int compat_prepare_timeout(const struct old_timespec32 __user *p,
> > > -                                  struct timespec64 *ts)
> > > +COMPAT_SYSCALL_DEFINE5(mq_timedreceive2, mqd_t, mqdes,
> > > +                      struct compat_mq_timedreceive2_args __user *, uargs,
> > > +                      unsigned int, flags, const unsigned long, index,
> > > +                      const struct old_timespec32 __user *, u_abs_timeout)
> > >  {
> > > -       if (get_old_timespec32(ts, p))
> > > +       struct mq_timedreceive2_args args;
> > > +       struct timespec64 ts, *p = NULL;
> > > +
> > > +       if (get_compat_mq_args(&args, uargs))
> > >                 return -EFAULT;
> > > -       if (!timespec64_valid(ts))
> > > -               return -EINVAL;
> > > -       return 0;
> > > +
> > > +       if (u_abs_timeout) {
> > > +               int res = compat_prepare_timeout(u_abs_timeout, &ts);
> > > +
> > > +               if (res)
> > > +                       return res;
> > > +               p = &ts;
> > > +       }
> > > +       return do_mq_timedreceive2(mqdes, &args, flags, index, p);
> > >  }
> > >
> > > +#endif
> > > +
> > >  SYSCALL_DEFINE5(mq_timedsend_time32, mqd_t, mqdes,
> > >                 const char __user *, u_msg_ptr,
> > >                 unsigned int, msg_len, unsigned int, msg_prio,
> > > @@ -1583,6 +1752,7 @@ SYSCALL_DEFINE5(mq_timedreceive_time32, mqd_t, mqdes,
> > >         }
> > >         return do_mq_timedreceive(mqdes, u_msg_ptr, msg_len, u_msg_prio, p);
> > >  }
> > > +
> > >  #endif
> > >
> > >  static const struct inode_operations mqueue_dir_inode_operations = {
> > > diff --git a/ipc/msg.c b/ipc/msg.c
> > > index 62996b97f0ac..6392b11dd7f7 100644
> > > --- a/ipc/msg.c
> > > +++ b/ipc/msg.c
> > > @@ -1156,7 +1156,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
> > >                          * not update queue parameters.
> > >                          */
> > >                         if (msgflg & MSG_COPY) {
> > > -                               msg = copy_msg(msg, copy);
> > > +                               msg = copy_msg(msg, copy, msg->m_ts);
> > >                                 goto out_unlock0;
> > >                         }
> > >
> > > diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> > > index e28f0cecb2ec..c5536ce47cc2 100644
> > > --- a/ipc/msgutil.c
> > > +++ b/ipc/msgutil.c
> > > @@ -51,7 +51,7 @@ static int __init init_msg_buckets(void)
> > >  }
> > >  subsys_initcall(init_msg_buckets);
> > >
> > > -static struct msg_msg *alloc_msg(size_t len)
> > > +struct msg_msg *alloc_msg(size_t len)
> > >  {
> > >         struct msg_msg *msg;
> > >         struct msg_msgseg **pseg;
> > > @@ -122,39 +122,33 @@ struct msg_msg *load_msg(const void __user *src, size_t len)
> > >         free_msg(msg);
> > >         return ERR_PTR(err);
> > >  }
> > > -#ifdef CONFIG_CHECKPOINT_RESTORE
> > > -struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
> > > +
> > > +struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len)
> > >  {
> > > -       struct msg_msgseg *dst_pseg, *src_pseg;
> > > -       size_t len = src->m_ts;
> > > -       size_t alen;
> > > +       struct msg_msgseg *src_seg, *dst_seg;
> > > +       size_t remaining, chunk;
> > >
> > > -       if (src->m_ts > dst->m_ts)
> > > +       if (len > src->m_ts)
> > >                 return ERR_PTR(-EINVAL);
> > > -
> > > -       alen = min(len, DATALEN_MSG);
> > > -       memcpy(dst + 1, src + 1, alen);
> > > -
> > > -       for (dst_pseg = dst->next, src_pseg = src->next;
> > > -            src_pseg != NULL;
> > > -            dst_pseg = dst_pseg->next, src_pseg = src_pseg->next) {
> > > -
> > > -               len -= alen;
> > > -               alen = min(len, DATALEN_SEG);
> > > -               memcpy(dst_pseg + 1, src_pseg + 1, alen);
> > > +       chunk = min(len, DATALEN_MSG);
> > > +       memcpy(dst + 1, src + 1, chunk);
> > > +       remaining = len - chunk;
> > > +       src_seg = src->next;
> > > +       dst_seg = dst->next;
> > > +       while (remaining > 0 && src_seg && dst_seg) {
> > > +               chunk = min(remaining, DATALEN_SEG);
> > > +               memcpy(dst_seg + 1, src_seg + 1, chunk);
> > > +               remaining -= chunk;
> > > +               src_seg = src_seg->next;
> > > +               dst_seg = dst_seg->next;
> > >         }
> > > -
> > > +       if (remaining != 0)
> > > +               return ERR_PTR(-EINVAL);
> > >         dst->m_type = src->m_type;
> > > -       dst->m_ts = src->m_ts;
> > > -
> > > +       dst->m_ts   = src->m_ts;
> > >         return dst;
> > >  }
> > > -#else
> > > -struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst)
> > > -{
> > > -       return ERR_PTR(-ENOSYS);
> > > -}
> > > -#endif
> > > +
> > >  int store_msg(void __user *dest, struct msg_msg *msg, size_t len)
> > >  {
> > >         size_t alen;
> > > diff --git a/ipc/util.h b/ipc/util.h
> > > index a55d6cebe6d3..374abeee79b3 100644
> > > --- a/ipc/util.h
> > > +++ b/ipc/util.h
> > > @@ -197,8 +197,9 @@ int ipc_parse_version(int *cmd);
> > >
> > >  extern void free_msg(struct msg_msg *msg);
> > >  extern struct msg_msg *load_msg(const void __user *src, size_t len);
> > > -extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst);
> > > +extern struct msg_msg *copy_msg(struct msg_msg *src, struct msg_msg *dst, size_t len);
> > >  extern int store_msg(void __user *dest, struct msg_msg *msg, size_t len);
> > > +extern struct msg_msg *alloc_msg(size_t len);
> > >
> > >  static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int id)
> > >  {
> > > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> > > index add3032da16f..658d6b8274b3 100644
> > > --- a/kernel/sys_ni.c
> > > +++ b/kernel/sys_ni.c
> > > @@ -391,6 +391,7 @@ COND_SYSCALL(setuid16);
> > >  /* restartable sequence */
> > >  COND_SYSCALL(rseq);
> > >  COND_SYSCALL(rseq_slice_yield);
> > > +COND_SYSCALL(mq_timedreceive2);
> > >
> > >  COND_SYSCALL(uretprobe);
> > >  COND_SYSCALL(uprobe);
> > > --
> > > 2.43.0
> > >
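
The reworked copy_msg() in the quoted hunk copies at most `len` bytes across the segment chain and returns -EINVAL if either chain ends before `len` bytes have been copied. A minimal userspace sketch of that loop follows. It is a simplified model, not the kernel code: `struct seg`, `SEG_DATA`, and `copy_segments()` are hypothetical names, and the head message is treated as just another segment, whereas the kernel uses DATALEN_MSG for the head and DATALEN_SEG for the rest.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SEG_DATA 4	/* hypothetical stand-in for DATALEN_MSG / DATALEN_SEG */

/* Hypothetical model of a message segment: a link plus a fixed payload. */
struct seg {
	struct seg *next;
	unsigned char data[SEG_DATA];
};

/*
 * Copy exactly len bytes from the src chain into the dst chain.
 * src_len is the total payload size of the source (m_ts in the patch).
 * Returns 0 on success, -EINVAL if len exceeds the source size or if
 * either chain is too short to hold len bytes.
 */
static int copy_segments(const struct seg *src, size_t src_len,
			 struct seg *dst, size_t len)
{
	size_t remaining = len;
	size_t chunk;

	assert(src != NULL && dst != NULL);	/* both heads must exist */

	if (len > src_len)
		return -EINVAL;

	while (remaining > 0 && src && dst) {
		chunk = remaining < SEG_DATA ? remaining : SEG_DATA;
		memcpy(dst->data, src->data, chunk);
		remaining -= chunk;
		src = src->next;
		dst = dst->next;
	}

	/* A chain ended before everything was copied. */
	return remaining ? -EINVAL : 0;
}
```

The final `remaining` check is what distinguishes this loop from the removed version, which walked the source chain without verifying that the destination had a matching segment.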


* Re: [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue
  2026-04-14  0:40         ` Mathura
@ 2026-04-14 16:39           ` Andrei Vagin
  0 siblings, 0 replies; 14+ messages in thread
From: Andrei Vagin @ 2026-04-14 16:39 UTC (permalink / raw)
  To: Mathura
  Cc: Andrei Vagin, criu, Pavel Tikhomirov, rstoyanov,
	Alexander Mikhalitsyn

On Mon, Apr 13, 2026 at 5:40 PM Mathura <academic1mathura@gmail.com> wrote:
>
> Well, I will look at that for reference.
> Before that, here is a quick question from my side:
> which is the better and cleaner approach?
>
> 1) The current implementation copies one message to user space per
> system call, selected by priority and FIFO order using a given index
> into the POSIX IPC RB tree. This lets us go beyond a plain peek to
> early inspection in other use cases by passing a valid index.
>
> 2) Just dump all the data to user space in one call.

It probably will not work reliably. I think we can do something
inspired by recvmmsg:

mq_recvmmsg(mqdes, iovec, iovec_len, attrs, attrs_len, flags)

where attrs is an extensible structure:
struct mq_recvmmsg_attrs {
  uint64_t peek_offset; // MSG_PEEK
  /* new fields can be added in the future */
};

It is just an idea...

Thanks,
Andrei
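
An extensible attrs structure passed together with its size (attrs_len in the proposal above) is typically handled with a size-versioned copy. The sketch below models that in userspace. It assumes semantics similar to the kernel's copy_struct_from_user(): a shorter struct from an older caller is zero-filled, and a longer struct from a newer caller is accepted only if the unknown tail is all zero. `copy_attrs()` is a hypothetical helper, and `struct mq_recvmmsg_attrs` is only the layout suggested in this message, not an existing ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from the proposal; hypothetical, not an existing kernel ABI. */
struct mq_recvmmsg_attrs {
	uint64_t peek_offset;	/* MSG_PEEK */
	/* new fields can be added in the future */
};

/*
 * Userspace model of a size-versioned struct copy.
 * usize is the struct size the caller was built against.
 * Returns 0 on success, -E2BIG if the caller set bytes this
 * version of the struct does not know about.
 */
static int copy_attrs(struct mq_recvmmsg_attrs *dst, const void *src,
		      size_t usize)
{
	const unsigned char *p = src;
	size_t ksize = sizeof(*dst);
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	assert(dst != NULL && src != NULL);

	/* Newer caller: every byte past the known fields must be zero. */
	for (i = ksize; i < usize; i++)
		if (p[i] != 0)
			return -E2BIG;

	/* Older caller: fields it did not supply default to zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

With this pattern, fields can be appended to the structure later without a new system call number, provided zero remains the "feature not requested" value for every new field.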


end of thread, 2026-04-14 16:40 UTC

Thread overview: 14+ messages
2026-04-08  9:51 [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Mathura_Kumar
2026-04-08  9:51 ` [PATCH v3 1/5]IPC: Added New system call do_mq_timedreceive2() for non-destructive peek on posix mqueue Mathura_Kumar
2026-04-13 16:34   ` Andrei Vagin
2026-04-14  0:04     ` Mathura
2026-04-14  0:52       ` Andrei Vagin
2026-04-08  9:51 ` [PATCH v3 2/5]IPC: Added system call number in all most common arch Mathura_Kumar
2026-04-08  9:51 ` [PATCH v3 3/5]IPC: Prepared Documentation and test Mathura_Kumar
2026-04-08  9:51 ` [PATCH v3 4/5]IPC:Added entry in performance tools for new system call Mathura_Kumar
2026-04-08  9:51 ` [PATCH v3 5/5]Added bitmask flag and user struct argument size as input for extensibility Mathura_Kumar
2026-04-13 16:36 ` [PATCH v3 0/5] Add new system call for non-destructive peek and inspection to posix ipc mqueue Andrei Vagin
     [not found]   ` <CA+QNo220nhDmyazh=JQi0OEfhsGAc7az+GRkM=G5KdZj4+SyNg@mail.gmail.com>
2026-04-13 23:30     ` Mathura
2026-04-14  0:11       ` Andrei Vagin
2026-04-14  0:40         ` Mathura
2026-04-14 16:39           ` Andrei Vagin
