linux-security-module.vger.kernel.org archive mirror
* [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
@ 2023-11-22 14:15 Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

Background
==========

In our containerized environment, we've identified unexpected OOM events
where the OOM-killer terminates tasks even though the system has ample free
memory. We traced this to tasks within a container using mbind(2) to bind
their memory to a specific NUMA node. When the memory on that node is
exhausted, the OOM-killer selects a victim based on oom_score and may kill
a task that holds no memory on that node at all.
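
For illustration, a minimal sketch of what such a task does, assuming raw
syscall(2) access (no libnuma) and a single target node; the helper name
bind_anon_region_to_node() is hypothetical. No capability is required for
the call itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from include/uapi/linux/mempolicy.h, defined locally so the
 * sketch does not depend on libnuma or recent kernel headers. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/*
 * Map `len` bytes of anonymous memory and bind them to one NUMA node
 * with mbind(2). Returns 0 on success or a positive errno value.
 */
static int bind_anon_region_to_node(size_t len, int node)
{
	unsigned long nodemask;
	void *addr;
	int err = 0;

	/* The kernel reads only maxnode - 1 bits, so leave the top bit unused. */
	assert(node >= 0 && node < (int)(8 * sizeof(nodemask)) - 1);
	nodemask = 1UL << node;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return errno;

	if (syscall(SYS_mbind, addr, len, (unsigned long)MPOL_BIND, &nodemask,
		    (unsigned long)(8 * sizeof(nodemask)), 0UL) != 0)
		err = errno;	/* capture before munmap() can clobber errno */

	munmap(addr, len);
	return err;
}
```

Calling bind_anon_region_to_node(4096, 0) should return 0 on a NUMA-enabled
kernel with node 0 online; a kernel built without CONFIG_NUMA returns ENOSYS.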

The Challenge
=============
In a containerized environment, independent memory binding by a user can
lead to unexpected system issues or disrupt tasks being run by other users
on the same server. If a user genuinely requires memory binding, we will
allocate dedicated servers to them by leveraging kubelet deployment.

Currently, users possess the ability to autonomously bind their memory to
specific nodes without explicit agreement or authorization from our end.
It's imperative that we establish a method to prevent this behavior.

Proposed Solutions
==================

- Introduce Capability to Disable MPOL_BIND
  Currently, any task can use MPOL_BIND without any capability.
  Requiring CAP_SYS_RESOURCE or CAP_SYS_NICE is an option, but
  capabilities are coarse-grained: granting one to a container that
  legitimately needs MPOL_BIND would also grant it unrelated privileges.
  An alternative without such side effects is preferable.

- Use LSM BPF to Disable MPOL_BIND
  Introduce a common LSM hook, invoked from mbind(2) and set_mempolicy(2),
  so that MPOL_BIND can be denied. This approach is more flexible and
  allows fine-grained control without unintended consequences. A sample
  LSM BPF program is included, demonstrating how this can be applied in
  practice.

- seccomp
  seccomp is relatively heavyweight, making it less suitable for
  enabling in our production environment:
  - Both kubelet and containers need adaptation to support it.
  - Dynamically altering security policies for individual containers
    without interrupting their operations isn't straightforward.
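
The LSM BPF option boils down to a small decision function. A minimal
user-space sketch, assuming a hypothetical policy that refuses MPOL_BIND
for every cgroup except an allowlist (e.g. cgroups placed on dedicated
servers); the function name and allowlist are illustrative, not part of
this series. A BPF program attached to lsm/set_mempolicy could make the
same decision, taking the cgroup ID from bpf_get_current_cgroup_id().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/*
 * Hypothetical policy: deny MPOL_BIND unless the caller's cgroup is on
 * the allowlist; allow every other mode. The hook receives the mode with
 * the MPOL_F_* flags already split off, so a plain comparison works.
 * Returns 0 to allow or -EPERM to deny, like an int-returning LSM hook.
 */
static int mempolicy_hook_decision(unsigned long mode, uint64_t cgroup_id,
				   const uint64_t *allowed, size_t n_allowed)
{
	assert(allowed != NULL || n_allowed == 0);

	if (mode != MPOL_BIND)
		return 0;

	for (size_t i = 0; i < n_allowed; i++)
		if (allowed[i] == cgroup_id)
			return 0;

	return -EPERM;
}
```

Keeping the allowlist outside the decision logic (in a BPF map, in the
real program) is what allows the policy to be changed per container
without restarting it.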

Future Considerations
=====================

In addition, the OOM-killer could be improved for cases involving
CONSTRAINT_MEMORY_POLICY: it would be better to prefer a victim that has
allocated memory on the constrained NUMA node. Searching lore, I found a
related proposal[0], although consensus on it seems elusive so far. In any
case, this topic is beyond the scope of the current patchset.

[0] https://lore.kernel.org/lkml/20220512044634.63586-1-ligang.bdlg@bytedance.com/
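
To make the idea concrete, a toy sketch assuming a hypothetical per-task
record with an OOM score and per-node page counts. It is neither the
kernel's oom_badness() logic nor the implementation proposed in [0]: it
prefers the highest-scoring task that holds pages on the exhausted node
and falls back to the highest score overall.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical per-task summary used only by this sketch. */
struct toy_task {
	long oom_score;                     /* higher = better victim */
	unsigned long pages[TOY_MAX_NODES]; /* resident pages per node */
};

/*
 * Return the index of the victim for an OOM constrained to `node`, or -1
 * if there are no tasks. Tasks with pages on `node` are preferred, since
 * killing them frees memory where it is actually needed.
 */
static int toy_select_victim(const struct toy_task *tasks, size_t n, int node)
{
	int best_local = -1, best_any = -1;

	assert(node >= 0 && node < TOY_MAX_NODES);

	for (size_t i = 0; i < n; i++) {
		if (best_any < 0 || tasks[i].oom_score > tasks[best_any].oom_score)
			best_any = (int)i;
		if (tasks[i].pages[node] == 0)
			continue;
		if (best_local < 0 ||
		    tasks[i].oom_score > tasks[best_local].oom_score)
			best_local = (int)i;
	}

	return best_local >= 0 ? best_local : best_any;
}
```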


Changes:
- RFC v1 -> RFC v2:
  - Refine the commit log to avoid being misleading
  - Use one common LSM hook instead and add a comment for it
  - Add SELinux implementation
  - Other improvements in mempolicy
- RFC v1: https://lwn.net/Articles/951188/

Yafang Shao (6):
  mm, doc: Add doc for MPOL_F_NUMA_BALANCING
  mm: mempolicy: Revise comment regarding mempolicy mode flags
  mm, security: Fix missed security_task_movememory() in mbind(2)
  mm, security: Add lsm hook for memory policy adjustment
  security: selinux: Implement set_mempolicy hook
  selftests/bpf: Add selftests for set_mempolicy with a lsm prog

 .../admin-guide/mm/numa_memory_policy.rst     | 27 +++++++
 include/linux/lsm_hook_defs.h                 |  3 +
 include/linux/security.h                      |  9 +++
 include/uapi/linux/mempolicy.h                |  2 +-
 mm/mempolicy.c                                | 17 +++-
 security/security.c                           | 13 +++
 security/selinux/hooks.c                      |  8 ++
 security/selinux/include/classmap.h           |  2 +-
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/set_mempolicy.c  | 79 +++++++++++++++++++
 .../selftests/bpf/progs/test_set_mempolicy.c  | 29 +++++++
 11 files changed, 187 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c

-- 
2.30.1 (Apple Git-130)



* [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING
  2023-11-22 14:15 [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
@ 2023-11-22 14:15 ` Yafang Shao
  2023-11-23  6:37   ` Huang, Ying
  2023-11-22 14:15 ` [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao,
	Huang, Ying

The documentation for MPOL_F_NUMA_BALANCING was inadvertently omitted from
the initial commit bda420b98505 ("numa balancing: migrate on fault among
multiple bound nodes").

Let's ensure its inclusion.
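
As an illustration of the documented behavior, a hedged sketch of how a
program might request the flag and fall back to plain MPOL_BIND when the
kernel rejects it with EINVAL. It uses raw syscall(2); the helper name
bind_with_numa_balancing() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from include/uapi/linux/mempolicy.h. */
#ifndef MPOL_DEFAULT
#define MPOL_DEFAULT 0
#endif
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING (1 << 13)
#endif

/*
 * Bind the calling thread's future allocations to the nodes in `mask`,
 * asking for NUMA balancing within them. A kernel without support for
 * the flag fails with EINVAL, in which case retry with plain MPOL_BIND.
 * Returns 1 if the flag was accepted, 0 if only MPOL_BIND was applied,
 * or -errno on failure.
 */
static int bind_with_numa_balancing(const unsigned long *mask,
				    unsigned long maxnode)
{
	assert(mask != NULL);

	if (syscall(SYS_set_mempolicy,
		    (unsigned long)(MPOL_BIND | MPOL_F_NUMA_BALANCING),
		    mask, maxnode) == 0)
		return 1;
	if (errno != EINVAL)
		return -errno;

	if (syscall(SYS_set_mempolicy, (unsigned long)MPOL_BIND,
		    mask, maxnode) == 0)
		return 0;
	return -errno;
}
```

The policy applies to the whole calling thread, so an experiment should
restore MPOL_DEFAULT afterwards.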

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
---
 .../admin-guide/mm/numa_memory_policy.rst     | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index eca38fa81e0f..19071b71979c 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -332,6 +332,33 @@ MPOL_F_RELATIVE_NODES
 	MPOL_PREFERRED policies that were created with an empty nodemask
 	(local allocation).
 
+MPOL_F_NUMA_BALANCING (since Linux 5.12)
+        When operating in MPOL_BIND mode, enables NUMA balancing for tasks,
+        contingent upon kernel support. This feature optimizes page
+        placement within the confines of the specified memory binding
+        policy. The addition of the MPOL_F_NUMA_BALANCING flag augments the
+        control mechanism for NUMA balancing:
+
+        - The sysctl knob numa_balancing governs global activation or
+          deactivation of NUMA balancing.
+
+        - Even if sysctl numa_balancing is enabled, NUMA balancing remains
+          disabled by default for memory areas or applications utilizing
+          explicit memory policies.
+
+        - The MPOL_F_NUMA_BALANCING flag facilitates NUMA balancing
+          activation for applications employing explicit memory policies
+          (MPOL_BIND).
+
+        This flag enables various optimizations for page placement through
+        NUMA balancing. For instance, when an application's memory is bound
+        to multiple nodes (MPOL_BIND), the hint page fault handler attempts
+        to migrate accessed pages to reduce cross-node access if the
+        accessing node aligns with the policy nodemask.
+
+        If the flag isn't supported by the kernel, or is used with a mode
+        other than MPOL_BIND, -1 is returned and errno is set to EINVAL.
+
 Memory Policy Reference Counting
 ================================
 
-- 
2.30.1 (Apple Git-130)



* [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags
  2023-11-22 14:15 [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
@ 2023-11-22 14:15 ` Yafang Shao
  2023-11-23  6:30   ` Huang, Ying
  2023-11-22 14:15 ` [RFC PATCH v2 3/6] mm, security: Fix missed security_task_movememory() in mbind(2) Yafang Shao
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao,
	Eric Dumazet, Huang, Ying

MPOL_F_STATIC_NODES, MPOL_F_RELATIVE_NODES, and MPOL_F_NUMA_BALANCING are
mode flags applicable to both set_mempolicy(2) and mbind(2) system calls.
It's worth noting that MPOL_F_NUMA_BALANCING was initially introduced in
commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
nodes") exclusively for set_mempolicy(2). However, it was later made a
shared flag for both set_mempolicy(2) and mbind(2) following
commit 6d2aec9e123b ("mm/mempolicy: do not allow illegal
MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()").

Update the comment to reflect that these flags apply to both system calls.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
---
 include/uapi/linux/mempolicy.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index a8963f7ef4c2..afed4a45f5b9 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -26,7 +26,7 @@ enum {
 	MPOL_MAX,	/* always last member of enum */
 };
 
-/* Flags for set_mempolicy */
+/* Flags for set_mempolicy() or mbind() */
 #define MPOL_F_STATIC_NODES	(1 << 15)
 #define MPOL_F_RELATIVE_NODES	(1 << 14)
 #define MPOL_F_NUMA_BALANCING	(1 << 13) /* Optimize with NUMA balancing if possible */
-- 
2.30.1 (Apple Git-130)



* [RFC PATCH v2 3/6] mm, security: Fix missed security_task_movememory() in mbind(2)
  2023-11-22 14:15 [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
@ 2023-11-22 14:15 ` Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 4/6] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

mbind(2) with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL can migrate pages, just
like migrate_pages(2) and move_pages(2), which already call
security_task_movememory(). Add the same check to mbind(2) so that this
path is covered as well. The omission was identified during code review.
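
From user space, the path covered by this check looks like the hedged
sketch below: fault in a page, then ask mbind(2) to migrate it with
MPOL_MF_MOVE. With this patch, an LSM that denies the move makes the call
fail before any page is migrated, with whatever error the LSM returns.
The helper name move_page_to_node() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from include/uapi/linux/mempolicy.h. */
#ifndef MPOL_BIND
#define MPOL_BIND	2
#endif
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE	(1 << 1)	/* move pages owned only by this process */
#endif

/*
 * Populate one page, then rebind it to `node` and request migration.
 * Returns 0 or a positive errno; EPERM or EACCES may mean an LSM refused.
 */
static int move_page_to_node(int node)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long nodemask;
	char *addr;
	int err = 0;

	assert(page > 0);
	assert(node >= 0 && node < (int)(8 * sizeof(nodemask)) - 1);
	nodemask = 1UL << node;

	addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return errno;
	memset(addr, 0xa5, (size_t)page);	/* fault the page in */

	if (syscall(SYS_mbind, addr, (unsigned long)page,
		    (unsigned long)MPOL_BIND, &nodemask,
		    (unsigned long)(8 * sizeof(nodemask)),
		    (unsigned long)MPOL_MF_MOVE) != 0)
		err = errno;

	munmap(addr, (size_t)page);
	return err;
}
```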

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/mempolicy.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..ded2e0e62e24 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1259,8 +1259,15 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (!new)
 		flags |= MPOL_MF_DISCONTIG_OK;
 
-	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+		err = security_task_movememory(current);
+		if (err) {
+			mpol_put(new);
+			return err;
+		}
 		lru_cache_disable();
+	}
+
 	{
 		NODEMASK_SCRATCH(scratch);
 		if (scratch) {
-- 
2.30.1 (Apple Git-130)



* [RFC PATCH v2 4/6] mm, security: Add lsm hook for memory policy adjustment
  2023-11-22 14:15 [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
                   ` (2 preceding siblings ...)
  2023-11-22 14:15 ` [RFC PATCH v2 3/6] mm, security: Fix missed security_task_movememory() in mbind(2) Yafang Shao
@ 2023-11-22 14:15 ` Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 5/6] security: selinux: Implement set_mempolicy hook Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 6/6] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

In a containerized environment, independent memory binding by a user can
lead to unexpected system issues or disrupt tasks being run by other users
on the same server. If a user genuinely requires memory binding, we will
allocate dedicated servers to them by leveraging kubelet deployment.

At present, users can bind their memory to a specific node without
explicit agreement or authorization from us. Introduce a new LSM hook to
mitigate this; it allows us to exercise fine-grained control over memory
policy adjustments within our container environment.
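
The hook is declared with a default return value of 0 and invoked through
call_int_hook(), so every LSM that registers set_mempolicy is consulted in
order and the first non-zero return wins. A user-space sketch of that
dispatch, assuming a plain array in place of the kernel's hook list and a
stand-in for nodemask_t; all names here are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/* Stand-in for the kernel's nodemask_t, for this sketch only. */
typedef struct { unsigned long bits[1]; } toy_nodemask_t;

/* Same parameter list as the set_mempolicy LSM hook in this patch. */
typedef int (*set_mempolicy_hook_fn)(unsigned long mode,
				     unsigned short mode_flags,
				     toy_nodemask_t *nmask, unsigned int flags);

/*
 * Model of call_int_hook(set_mempolicy, 0, ...): start from 0, call each
 * registered hook in order, stop at the first non-zero return.
 */
static int toy_security_set_mempolicy(const set_mempolicy_hook_fn *hooks,
				      size_t n, unsigned long mode,
				      unsigned short mode_flags,
				      toy_nodemask_t *nmask, unsigned int flags)
{
	int rc = 0;

	assert(hooks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		rc = hooks[i](mode, mode_flags, nmask, flags);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Example hooks: one allows everything, one denies MPOL_BIND. */
static int allow_all(unsigned long mode, unsigned short mode_flags,
		     toy_nodemask_t *nmask, unsigned int flags)
{
	(void)mode; (void)mode_flags; (void)nmask; (void)flags;
	return 0;
}

static int deny_bind(unsigned long mode, unsigned short mode_flags,
		     toy_nodemask_t *nmask, unsigned int flags)
{
	(void)mode_flags; (void)nmask; (void)flags;
	return mode == MPOL_BIND ? -EPERM : 0;
}
```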

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/lsm_hook_defs.h |  3 +++
 include/linux/security.h      |  9 +++++++++
 mm/mempolicy.c                |  8 ++++++++
 security/security.c           | 13 +++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ff217a5ce552..558012719f98 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -419,3 +419,6 @@ LSM_HOOK(int, 0, uring_override_creds, const struct cred *new)
 LSM_HOOK(int, 0, uring_sqpoll, void)
 LSM_HOOK(int, 0, uring_cmd, struct io_uring_cmd *ioucmd)
 #endif /* CONFIG_IO_URING */
+
+LSM_HOOK(int, 0, set_mempolicy, unsigned long mode, unsigned short mode_flags,
+	 nodemask_t *nmask, unsigned int flags)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1d1df326c881..cc4a19a0888c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -484,6 +484,8 @@ int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
 int security_locked_down(enum lockdown_reason what);
+int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+			   nodemask_t *nmask, unsigned int flags);
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1395,6 +1397,13 @@ static inline int security_locked_down(enum lockdown_reason what)
 {
 	return 0;
 }
+
+static inline int
+security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+		       nodemask_t *nmask, unsigned int flags)
+{
+	return 0;
+}
 #endif	/* CONFIG_SECURITY */
 
 #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ded2e0e62e24..aa09198cbd29 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1490,6 +1490,10 @@ static long kernel_mbind(unsigned long start, unsigned long len,
 	if (err)
 		return err;
 
+	err = security_set_mempolicy(lmode, mode_flags, &nodes, flags);
+	if (err)
+		return err;
+
 	return do_mbind(start, len, lmode, mode_flags, &nodes, flags);
 }
 
@@ -1584,6 +1588,10 @@ static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
 	if (err)
 		return err;
 
+	err = security_set_mempolicy(lmode, mode_flags, &nodes, 0);
+	if (err)
+		return err;
+
 	return do_set_mempolicy(lmode, mode_flags, &nodes);
 }
 
diff --git a/security/security.c b/security/security.c
index dcb3e7014f9b..685ad7993753 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5337,3 +5337,16 @@ int security_uring_cmd(struct io_uring_cmd *ioucmd)
 	return call_int_hook(uring_cmd, 0, ioucmd);
 }
 #endif /* CONFIG_IO_URING */
+
+/**
+ * security_set_mempolicy() - Check if memory policy can be adjusted
+ * @mode: The memory policy mode to be set
+ * @mode_flags: optional MPOL_F_* mode flags
+ * @nmask: nodemask to which the mode applies
+ * @flags: MPOL_MF_* flags for mbind(2); 0 for set_mempolicy(2)
+ */
+int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+			   nodemask_t *nmask, unsigned int flags)
+{
+	return call_int_hook(set_mempolicy, 0, mode, mode_flags, nmask, flags);
+}
-- 
2.30.1 (Apple Git-130)



* [RFC PATCH v2 5/6] security: selinux: Implement set_mempolicy hook
  2023-11-22 14:15 [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
                   ` (3 preceding siblings ...)
  2023-11-22 14:15 ` [RFC PATCH v2 4/6] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
@ 2023-11-22 14:15 ` Yafang Shao
  2023-11-22 14:15 ` [RFC PATCH v2 6/6] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

Add an SELinux access control check for the newly introduced set_mempolicy
LSM hook. A new permission, "setmempolicy", is defined under the "process"
class for it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 security/selinux/hooks.c            | 8 ++++++++
 security/selinux/include/classmap.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index feda711c6b7b..1528d4dcfa03 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4238,6 +4238,13 @@ static int selinux_userns_create(const struct cred *cred)
 			USER_NAMESPACE__CREATE, NULL);
 }
 
+static int selinux_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+				 nodemask_t *nmask, unsigned int flags)
+{
+	return avc_has_perm(current_sid(), task_sid_obj(current), SECCLASS_PROCESS,
+			    PROCESS__SETMEMPOLICY, NULL);
+}
+
 /* Returns error only if unable to parse addresses */
 static int selinux_parse_skb_ipv4(struct sk_buff *skb,
 			struct common_audit_data *ad, u8 *proto)
@@ -7072,6 +7079,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(task_kill, selinux_task_kill),
 	LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
 	LSM_HOOK_INIT(userns_create, selinux_userns_create),
+	LSM_HOOK_INIT(set_mempolicy, selinux_set_mempolicy),
 
 	LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
 	LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index a3c380775d41..c280d92a409f 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -51,7 +51,7 @@ const struct security_class_mapping secclass_map[] = {
 	    "getattr", "setexec", "setfscreate", "noatsecure", "siginh",
 	    "setrlimit", "rlimitinh", "dyntransition", "setcurrent",
 	    "execmem", "execstack", "execheap", "setkeycreate",
-	    "setsockcreate", "getrlimit", NULL } },
+	    "setsockcreate", "getrlimit", "setmempolicy", NULL } },
 	{ "process2",
 	  { "nnp_transition", "nosuid_transition", NULL } },
 	{ "system",
-- 
2.30.1 (Apple Git-130)



* [RFC PATCH v2 6/6] selftests/bpf: Add selftests for set_mempolicy with a lsm prog
  2023-11-22 14:15 [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
                   ` (4 preceding siblings ...)
  2023-11-22 14:15 ` [RFC PATCH v2 5/6] security: selinux: Implement set_mempolicy hook Yafang Shao
@ 2023-11-22 14:15 ` Yafang Shao
  5 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-11-22 14:15 UTC (permalink / raw)
  To: akpm, paul, jmorris, serge, omosnace, mhocko
  Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao

The test results are as follows:
  #261/1   set_mempolicy/MPOL_BIND_with_lsm:OK
  #261/2   set_mempolicy/MPOL_DEFAULT_with_lsm:OK
  #261/3   set_mempolicy/MPOL_BIND_without_lsm:OK
  #261/4   set_mempolicy/MPOL_DEFAULT_without_lsm:OK
  #261     set_mempolicy:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/set_mempolicy.c  | 79 +++++++++++++++++++
 .../selftests/bpf/progs/test_set_mempolicy.c  | 29 +++++++
 3 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9c27b67bc7b1..3c3c3b7d5dcd 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,7 +35,7 @@ CFLAGS += -g $(OPT_FLAGS) -rdynamic					\
 	  -I$(CURDIR) -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR)		\
 	  -I$(TOOLSINCDIR) -I$(APIDIR) -I$(OUTPUT)
 LDFLAGS += $(SAN_LDFLAGS)
-LDLIBS += -lelf -lz -lrt -lpthread
+LDLIBS += -lelf -lz -lrt -lpthread -lnuma
 
 ifneq ($(LLVM),)
 # Silence some warnings when compiled with clang
diff --git a/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
new file mode 100644
index 000000000000..0dc3391b29fb
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <numaif.h>
+#include <test_progs.h>
+#include "test_set_mempolicy.skel.h"
+
+#define SIZE 4096
+
+static void mempolicy_bind(bool success)
+{
+	unsigned long mask = 1;
+	char *addr;
+	int err;
+
+	addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (!ASSERT_OK_PTR(addr, "mmap"))
+		return;
+
+	err = mbind(addr, SIZE, MPOL_BIND, &mask, sizeof(mask), 0);
+	if (success)
+		ASSERT_OK(err, "mbind_success");
+	else
+		ASSERT_ERR(err, "mbind_fail");
+
+	munmap(addr, SIZE);
+}
+
+static void mempolicy_default(void)
+{
+	char *addr;
+	int err;
+
+	addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+	if (!ASSERT_OK_PTR(addr, "mmap"))
+		return;
+
+	err = mbind(addr, SIZE, MPOL_DEFAULT, NULL, 0, 0);
+	ASSERT_OK(err, "mbind_success");
+
+	munmap(addr, SIZE);
+}
+void test_set_mempolicy(void)
+{
+	struct test_set_mempolicy *skel;
+	int err;
+
+	skel = test_set_mempolicy__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	skel->bss->target_pid = getpid();
+
+	err = test_set_mempolicy__load(skel);
+	if (!ASSERT_OK(err, "load"))
+		goto destroy;
+
+	/* Attach LSM prog first */
+	err = test_set_mempolicy__attach(skel);
+	if (!ASSERT_OK(err, "attach"))
+		goto destroy;
+
+	/* syscall to adjust memory policy */
+	if (test__start_subtest("MPOL_BIND_with_lsm"))
+		mempolicy_bind(false);
+	if (test__start_subtest("MPOL_DEFAULT_with_lsm"))
+		mempolicy_default();
+
+destroy:
+	test_set_mempolicy__destroy(skel);
+
+	if (test__start_subtest("MPOL_BIND_without_lsm"))
+		mempolicy_bind(true);
+	if (test__start_subtest("MPOL_DEFAULT_without_lsm"))
+		mempolicy_default();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_set_mempolicy.c b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
new file mode 100644
index 000000000000..31eeaa580a17
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+int target_pid;
+
+static int mem_policy_adjustment(u64 mode)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+
+	if (task->pid != target_pid)
+		return 0;
+
+	if (mode != MPOL_BIND)
+		return 0;
+	return -1;
+}
+
+SEC("lsm/set_mempolicy")
+int BPF_PROG(setmempolicy, u64 mode, u16 mode_flags, nodemask_t *nmask, u32 flags)
+{
+	return mem_policy_adjustment(mode);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.1 (Apple Git-130)



* Re: [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags
  2023-11-22 14:15 ` [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
@ 2023-11-23  6:30   ` Huang, Ying
  2023-11-23 12:21     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Huang, Ying @ 2023-11-23  6:30 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, paul, jmorris, serge, omosnace, mhocko, linux-mm,
	linux-security-module, bpf, ligang.bdlg, Eric Dumazet

Yafang Shao <laoar.shao@gmail.com> writes:

> MPOL_F_STATIC_NODES, MPOL_F_RELATIVE_NODES, and MPOL_F_NUMA_BALANCING are
> mode flags applicable to both set_mempolicy(2) and mbind(2) system calls.
> It's worth noting that MPOL_F_NUMA_BALANCING was initially introduced in
> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
> nodes") exclusively for set_mempolicy(2). However, it was later made a
> shared flag for both set_mempolicy(2) and mbind(2) following
> commit 6d2aec9e123b ("mm/mempolicy: do not allow illegal
> MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()").
>
> This revised version aims to clarify the details regarding the mode flags.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: "Huang, Ying" <ying.huang@intel.com>

Thanks for fixing this.

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

And, please revise the manpage for mbind() too.  As we have done for
set_mempolicy(),

https://lore.kernel.org/all/20210120061235.148637-3-ying.huang@intel.com/

--
Best Regards,
Huang, Ying

> ---
>  include/uapi/linux/mempolicy.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index a8963f7ef4c2..afed4a45f5b9 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -26,7 +26,7 @@ enum {
>  	MPOL_MAX,	/* always last member of enum */
>  };
>  
> -/* Flags for set_mempolicy */
> +/* Flags for set_mempolicy() or mbind() */
>  #define MPOL_F_STATIC_NODES	(1 << 15)
>  #define MPOL_F_RELATIVE_NODES	(1 << 14)
>  #define MPOL_F_NUMA_BALANCING	(1 << 13) /* Optimize with NUMA balancing if possible */


* Re: [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING
  2023-11-22 14:15 ` [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
@ 2023-11-23  6:37   ` Huang, Ying
  0 siblings, 0 replies; 10+ messages in thread
From: Huang, Ying @ 2023-11-23  6:37 UTC (permalink / raw)
  To: Yafang Shao
  Cc: akpm, paul, jmorris, serge, omosnace, mhocko, linux-mm,
	linux-security-module, bpf, ligang.bdlg

Yafang Shao <laoar.shao@gmail.com> writes:

> The documentation for MPOL_F_NUMA_BALANCING was inadvertently omitted from
> the initial commit bda420b98505 ("numa balancing: migrate on fault among
> multiple bound nodes").
>
> Let's ensure its inclusion.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: "Huang, Ying" <ying.huang@intel.com>

LGTM, Thanks!

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

> ---
>  .../admin-guide/mm/numa_memory_policy.rst     | 27 +++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index eca38fa81e0f..19071b71979c 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -332,6 +332,33 @@ MPOL_F_RELATIVE_NODES
>  	MPOL_PREFERRED policies that were created with an empty nodemask
>  	(local allocation).
>  
> +MPOL_F_NUMA_BALANCING (since Linux 5.12)
> +        When operating in MPOL_BIND mode, enables NUMA balancing for tasks,
> +        contingent upon kernel support. This feature optimizes page
> +        placement within the confines of the specified memory binding
> +        policy. The addition of the MPOL_F_NUMA_BALANCING flag augments the
> +        control mechanism for NUMA balancing:
> +
> +        - The sysctl knob numa_balancing governs global activation or
> +          deactivation of NUMA balancing.
> +
> +        - Even if sysctl numa_balancing is enabled, NUMA balancing remains
> +          disabled by default for memory areas or applications utilizing
> +          explicit memory policies.
> +
> +        - The MPOL_F_NUMA_BALANCING flag facilitates NUMA balancing
> +          activation for applications employing explicit memory policies
> +          (MPOL_BIND).
> +
> +        This flag enables various optimizations for page placement through
> +        NUMA balancing. For instance, when an application's memory is bound
> +        to multiple nodes (MPOL_BIND), the hint page fault handler attempts
> +        to migrate accessed pages to reduce cross-node access if the
> +        accessing node aligns with the policy nodemask.
> +
> +        If the flag isn't supported by the kernel, or is used with a mode
> +        other than MPOL_BIND, -1 is returned and errno is set to EINVAL.
> +
>  Memory Policy Reference Counting
>  ================================

--
Best Regards,
Huang, Ying


* Re: [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags
  2023-11-23  6:30   ` Huang, Ying
@ 2023-11-23 12:21     ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-11-23 12:21 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, paul, jmorris, serge, omosnace, mhocko, linux-mm,
	linux-security-module, bpf, ligang.bdlg, Eric Dumazet

On Thu, Nov 23, 2023 at 2:32 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Yafang Shao <laoar.shao@gmail.com> writes:
>
> > MPOL_F_STATIC_NODES, MPOL_F_RELATIVE_NODES, and MPOL_F_NUMA_BALANCING are
> > mode flags applicable to both set_mempolicy(2) and mbind(2) system calls.
> > It's worth noting that MPOL_F_NUMA_BALANCING was initially introduced in
> > commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
> > nodes") exclusively for set_mempolicy(2). However, it was later made a
> > shared flag for both set_mempolicy(2) and mbind(2) following
> > commit 6d2aec9e123b ("mm/mempolicy: do not allow illegal
> > MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()").
> >
> > This revised version aims to clarify the details regarding the mode flags.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: "Huang, Ying" <ying.huang@intel.com>
>
> Thanks for fixing this.
>
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
>
> And, please revise the manpage for mbind() too.  As we have done for
> set_mempolicy(),
>
> https://lore.kernel.org/all/20210120061235.148637-3-ying.huang@intel.com/

Thanks for your review. will do it.

-- 
Regards
Yafang

