* [PATCH v4 0/2] refactor sqthread cpu binding logic
@ 2021-09-01 12:43 Hao Xu
[not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-02 16:48 ` [PATCH v4 0/2] refactor sqthread cpu binding logic Michal Koutný
0 siblings, 2 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:43 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring, cgroups, Joseph Qi
This patchset enhances the sqthread cpu binding logic; previously we
didn't take the cgroup setting into account. In a container environment,
the sqthread is, theoretically, in its container's task group, so it
shouldn't occupy cpus outside of its container. Otherwise it may cause
problems, for example: a resource balancer may control cpu resource
allocation based on a container's current cpu usage, and the sqthread
should be counted in that usage.
v1-->v2
- add helper to do cpu in current-cpuset test
v2-->v3
- split it into 2 patches, the first as a prep patch, the second as its consumer
- remove the warning; since the cpuset may change at any time we cannot
catch all cases, so make the behaviour consistent
v3-->v4
- remove sqthread cpu binding helper since the logic is now very simple
Hao Xu (2):
cpuset: add a helper to check if cpu in cpuset of current task
io_uring: consider cgroup setting when binding sqpoll cpu
fs/io_uring.c | 11 ++++++-----
include/linux/cpuset.h | 7 +++++++
kernel/cgroup/cpuset.c | 11 +++++++++++
3 files changed, 24 insertions(+), 5 deletions(-)
--
2.24.4
^ permalink raw reply	[flat|nested] 14+ messages in thread

[parent not found: <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* [PATCH 1/2] cpuset: add a helper to check if cpu in cpuset of current task
  [not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-01 12:43   ` Hao Xu
  2021-09-01 12:43   ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:43 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Check whether a cpu is in the cpuset of the current task; this does not
guarantee that the cpu is online.

Signed-off-by: Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
 include/linux/cpuset.h |  7 +++++++
 kernel/cgroup/cpuset.c | 11 +++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 04c20de66afc..fad77c91bc1f 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -116,6 +116,8 @@ static inline int cpuset_do_slab_mem_spread(void)
 
 extern bool current_cpuset_is_being_rebound(void);
 
+extern bool test_cpu_in_current_cpuset(int cpu);
+
 extern void rebuild_sched_domains(void);
 
 extern void cpuset_print_current_mems_allowed(void);
@@ -257,6 +259,11 @@ static inline bool current_cpuset_is_being_rebound(void)
 	return false;
 }
 
+static inline bool test_cpu_in_current_cpuset(int cpu)
+{
+	return false;
+}
+
 static inline void rebuild_sched_domains(void)
 {
 	partition_sched_domains(1, NULL, NULL);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index adb5190c4429..a63c27e9430e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1849,6 +1849,17 @@ bool current_cpuset_is_being_rebound(void)
 	return ret;
 }
 
+bool test_cpu_in_current_cpuset(int cpu)
+{
+	bool ret;
+
+	rcu_read_lock();
+	ret = cpumask_test_cpu(cpu, task_cs(current)->effective_cpus);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static int update_relax_domain_level(struct cpuset *cs, s64 val)
 {
 #ifdef CONFIG_SMP
--
2.24.4
* [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
  2021-09-01 12:43   ` [PATCH 1/2] cpuset: add a helper to check if cpu in cpuset of current task Hao Xu
@ 2021-09-01 12:43   ` Hao Xu
  [not found]   ` <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:43 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Since the sqthread is now a userspace-like thread, it should respect the
cgroup setting; thus we should consider the currently allowed cpuset
when binding the sqthread to a cpu.

Signed-off-by: Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
 fs/io_uring.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cc458e0b636..fb7890077ede 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include <linux/pagemap.h>
 #include <linux/io_uring.h>
 #include <linux/tracehook.h>
+#include <linux/cpuset.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -7112,11 +7113,9 @@ static int io_sq_thread(void *data)
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
-
-	if (sqd->sq_cpu != -1)
+	if (sqd->sq_cpu != -1 && test_cpu_in_current_cpuset(sqd->sq_cpu))
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
-		set_cpus_allowed_ptr(current, cpu_online_mask);
+
 	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
@@ -8310,8 +8309,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		int cpu = p->sq_thread_cpu;
 
 		ret = -EINVAL;
-		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
+		    !test_cpu_in_current_cpuset(cpu))
 			goto err_sqpoll;
+
 		sqd->sq_cpu = cpu;
 	} else {
 		sqd->sq_cpu = -1;
--
2.24.4
[parent not found: <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-01 16:41   ` Tejun Heo
  2021-09-01 16:42   ` Tejun Heo
  [not found]   ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
  0 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2021-09-01 16:41 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Hello,

On Wed, Sep 01, 2021 at 08:43:22PM +0800, Hao Xu wrote:
> @@ -7112,11 +7113,9 @@ static int io_sq_thread(void *data)
> 
>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>  	set_task_comm(current, buf);
> +	if (sqd->sq_cpu != -1 && test_cpu_in_current_cpuset(sqd->sq_cpu))
>  		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
> +

Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
afterwards?

> @@ -8310,8 +8309,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
>  		int cpu = p->sq_thread_cpu;
> 
>  		ret = -EINVAL;
> -		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> +		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
> +		    !test_cpu_in_current_cpuset(cpu))
>  			goto err_sqpoll;
> +

Failing operations on transient conditions like this may be confusing.
Let's ignore cpuset for now. CPU hotplug is sometimes driven automatically
for power saving purposes, so failing the operation based on whether a cpu
is online means that the success or failure of the operation can seem
arbitrary. If the operation takes place while the cpu happens to be online,
it succeeds and the thread gets unbound and rebound automatically as the
cpu goes offline and online. If the operation takes place while the cpu
happens to be offline, the operation fails.

I don't know what the intended behavior here should be and we haven't been
pretty bad at defining reasonable behavior around cpu hotplug, so it'd
probably be worthwhile to consider what the behavior should be.

Thanks.

--
tejun
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  2021-09-01 16:41 ` Tejun Heo
@ 2021-09-01 16:42   ` Tejun Heo
  [not found]   ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2021-09-01 16:42 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov, io-uring,
    cgroups, Joseph Qi

On Wed, Sep 01, 2021 at 06:41:34AM -1000, Tejun Heo wrote:
> I don't know what the intended behavior here should be and we haven't been
                                                              ^ have
> pretty bad at defining reasonable behavior around cpu hotplug, so it'd
> probably be worthwhile to consider what the behavior should be.

--
tejun
[parent not found: <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
@ 2021-09-03 15:04   ` Hao Xu
  [not found]   ` <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-03 15:04 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

On 2021/9/2 12:41 AM, Tejun Heo wrote:

Hi Tejun,
> Hello,
>
> On Wed, Sep 01, 2021 at 08:43:22PM +0800, Hao Xu wrote:
>> @@ -7112,11 +7113,9 @@ static int io_sq_thread(void *data)
>>
>>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>>  	set_task_comm(current, buf);
>> +	if (sqd->sq_cpu != -1 && test_cpu_in_current_cpuset(sqd->sq_cpu))
>>  		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
>> +
>
> Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
> afterwards?
Do you mean:

	if (sqd->sq_cpu != -1 &&
	    !set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu)))

I'm not familiar with set_cpus_allowed_ptr(); do you mean it contains
logic similar to test_cpu_in_current_cpuset()?
>
>> @@ -8310,8 +8309,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
>>  		int cpu = p->sq_thread_cpu;
>>
>>  		ret = -EINVAL;
>> -		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
>> +		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
>> +		    !test_cpu_in_current_cpuset(cpu))
>>  			goto err_sqpoll;
>> +
>
> Failing operations on transient conditions like this may be confusing. Let's
> ignore cpuset for now. CPU hotplug is sometimes driven automatically for
> power saving purposes, so failing operation based on whether a cpu is online
> means that the success or failure of the operation can seem arbitrary. If
> the operation takes place while the cpu happens to be online, it succeeds
> and the thread gets unbound and rebound automatically as the cpu goes
This is a bit beyond my knowledge: do you mean that if the cpu comes
back online, the task will automatically be scheduled onto it again? If
that's true, I think the code logic here is fine.
> offline and online. If the operation takes place while the cpu happens to be
> offline, the operation fails.
It's ok that it fails; we leave the option of retrying to the users
themselves.
>
> I don't know what the intended behavior here should be and we haven't been
> pretty bad at defining reasonable behavior around cpu hotplug, so it'd
> probably be worthwhile to consider what the behavior should be.
>
> Thanks.
>
Thanks,
Hao
[parent not found: <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-07 16:54   ` Tejun Heo
  [not found]   ` <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2021-09-07 16:54 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Hello,

On Fri, Sep 03, 2021 at 11:04:07PM +0800, Hao Xu wrote:
> > Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
> > afterwards?
> Do you mean: if (sqd->sq_cpu != -1 && !set_cpus_allowed_ptr(current,
> cpumask_of(sqd->sq_cpu)))
>
> I'm not familiar with set_cpus_allowed_ptr(), you mean it contains the
> similar logic of test_cpu_in_current_cpuset?

It's kinda muddy unfortunately. I think it rejects if the target cpu is
offline but accepts and ignores if the cpu is excluded by cpuset.

> This is a bit beyond of my knowledge, so you mean if the cpu back
> online, the task will automatically schedule to this cpu? if it's true,
> I think the code logic here is fine.
>
> > offline and online. If the operation takes place while the cpu happens to be
> > offline, the operation fails.
> It's ok that it fails, we leave the option of retry to users themselves.

I think the first thing to do is defining the desired behavior, hopefully
in a consistent manner, rather than letting it be defined by the
implementation. e.g. If the desired behavior is the per-cpu helper failing,
then it should probably exit when the target cpu isn't available for
whatever reason. If the desired behavior is best effort when the cpu goes
away (ie. ignore affinity), the creation likely shouldn't fail when the
target cpu is unavailable but can become available in the future.

Thanks.

--
tejun
[parent not found: <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
@ 2021-09-07 19:28   ` Hao Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-07 19:28 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

On 2021/9/8 12:54 AM, Tejun Heo wrote:
> Hello,
>
> On Fri, Sep 03, 2021 at 11:04:07PM +0800, Hao Xu wrote:
>>> Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
>>> afterwards?
>> Do you mean: if (sqd->sq_cpu != -1 && !set_cpus_allowed_ptr(current,
>> cpumask_of(sqd->sq_cpu)))
>>
>> I'm not familiar with set_cpus_allowed_ptr(), you mean it contains the
>> similar logic of test_cpu_in_current_cpuset?
>
> It's kinda muddy unfortunately. I think it rejects if the target cpu is
> offline but accept and ignores if the cpu is excluded by cpuset.
>
>> This is a bit beyond of my knowledge, so you mean if the cpu back
>> online, the task will automatically schedule to this cpu? if it's true,
>> I think the code logic here is fine.
>>
>>> offline and online. If the operation takes place while the cpu happens to be
>>> offline, the operation fails.
>> It's ok that it fails, we leave the option of retry to users themselves.
>
> I think the first thing to do is defining the desired behavior, hopefully in
> a consistent manner, rather than letting it be defined by implementation.
> e.g. If the desired behavior is the per-cpu helper failing, then it should
> probably exit when the target cpu isn't available for whatever reason. If
> the desired behavior is best effort when cpu goes away (ie. ignore
Hmm, I see.
First, I think we should move the set_cpus_allowed_ptr() call to where
the sqthread is created, not to when it is already running (not sure why
it is currently at the beginning of the sqthread itself); then we can
have consistent behaviour. (If we do the check at the sqthread's run
time, then whether we kill it or still allow it to run when the
cpu_online check fails, it's hard to let users know the result of their
cpu binding, since users don't know the exact time the sqthread wakes up
and begins to run, at which point they could check the binding result.)
Second, I think a user's cpu binding is a kind of 'advice', not a
'command'. So whether the cpu_online check succeeds or fails, we still
let the sqthread run, and meanwhile return the cpu binding result to
userspace. Anyway, I'd like to know Jens' thoughts on this.
> affinity), the creation likely shouldn't fail when the target cpu is
> unavailable but can become available in the future.
>
> Thanks.
>
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  2021-09-01 12:43 [PATCH v4 0/2] refactor sqthread cpu binding logic Hao Xu
  [not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-02 16:48   ` Michal Koutný
  [not found]   ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
  2021-09-03 14:43   ` Hao Xu
  1 sibling, 2 replies; 14+ messages in thread
From: Michal Koutný @ 2021-09-02 16:48 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring, cgroups, Joseph Qi

Hello Hao.

On Wed, Sep 01, 2021 at 08:43:20PM +0800, Hao Xu <haoxu@linux.alibaba.com> wrote:
> This patchset is to enhance sqthread cpu binding logic, we didn't
> consider cgroup setting before. In container environment, theoretically
> sqthread is in its container's task group, it shouldn't occupy cpu out
> of its container.

I see in the discussions that there's a struggle to make
set_cpus_allowed_ptr() do what's intended under the given constraints.

IIUC, set_cpus_allowed_ptr() is conventionally used for kernel threads
[1]. But does the sqthread fall into this category? You want to have it
_directly_ associated with a container and its cgroups. It looks to me
more like a userspace thread (from this perspective, not literally). Or
is there a different intention?

It seems to me that reusing sched_setaffinity() (with all its checks
and race pains/solutions) would be a more universal approach. (I don't
mean calling sched_setaffinity() directly; some parts would need to be
factored out separately to this end.) WDYT?

Regards,
Michal

[1] Not only spending their life in kernel but providing some
delocalized kernel service.
[parent not found: <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>]
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  [not found] ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2021-09-02 18:00   ` Jens Axboe
  [not found]   ` <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-09-02 18:00 UTC (permalink / raw)
To: Michal Koutný, Hao Xu
Cc: Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

On 9/2/21 10:48 AM, Michal Koutný wrote:
> Hello Hao.
>
> On Wed, Sep 01, 2021 at 08:43:20PM +0800, Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org> wrote:
>> This patchset is to enhance sqthread cpu binding logic, we didn't
>> consider cgroup setting before. In container environment, theoretically
>> sqthread is in its container's task group, it shouldn't occupy cpu out
>> of its container.
>
> I see in the discussions that there's struggle to make
> set_cpus_allowed_ptr() do what's intended under the given constraints.
>
> IIUC, set_cpus_allowed_ptr() is conventionally used for kernel threads
> [1]. But does the sqthread fall into this category? You want to have it
> _directly_ associated with a container and its cgroups. It looks to me
> more like a userspace thread (from this perspective, not literally). Or
> is there a different intention?

It's an io thread, which is kind of a hybrid: it's a kernel thread in
the sense that it never exits to userspace (ever), but it's a regular
thread in the sense that it's set up like one.

> It seems to me that reusing the sched_setaffinity() (with all its
> checks and race pains/solutions) would be a more universal approach.
> (I don't mean calling sched_setaffinity() directly, some parts would
> need to be factored separately to this end.) WDYT?

We already have this API to set the affinity based on when these were
regular kernel threads, so it needs to work with that too. As such they
are marked PF_NO_SETAFFINITY.

> [1] Not only spending their life in kernel but providing some
> delocalized kernel service.

That's what they do...

--
Jens Axboe
[parent not found: <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>]
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  [not found] ` <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
@ 2021-09-09 12:34   ` Michal Koutný
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Koutný @ 2021-09-09 12:34 UTC (permalink / raw)
To: Jens Axboe, Hao Xu
Cc: Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Thanks for the answer and the context explanations.

On Thu, Sep 02, 2021 at 12:00:33PM -0600, Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> wrote:
> We already have this API to set the affinity based on when these were
> regular kernel threads, so it needs to work with that too. As such they
> are marked PF_NO_SETAFFINITY.

I see. The current implementation "allows" at most one binding (via the
passed sq_cpu arg) to a CPU and then "locks" it by setting
PF_NO_SETAFFINITY subsequently (after set_cpus_allowed_ptr()). And it
actually doesn't check whether that succeeded or not (Hao suggests in
another subthread that sq_cpu is a mere hint).

Nevertheless, you likely don't want to "trespass" the boundary of a
cpuset, and I think that would end up as a loop checking against hotplug
races. That is already implemented in __sched_setaffinity(); it'd be IMO
good to have it in one place only.

> > [1] Not only spending their life in kernel but providing some
> > delocalized kernel service.
>
> That's what they do...

(I assume that answers the "life in kernel" part, and that the IO
threads serve only the originating cpuset (container), i.e. they are
(co)localized to it, not delocalized like some kernel workers.)

Cheers,
Michal
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  2021-09-02 16:48 ` [PATCH v4 0/2] refactor sqthread cpu binding logic Michal Koutný
  [not found] ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2021-09-03 14:43   ` Hao Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-03 14:43 UTC (permalink / raw)
To: Michal Koutný
Cc: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring, cgroups, Joseph Qi

On 2021/9/3 12:48 AM, Michal Koutný wrote:
> Hello Hao.
>
> On Wed, Sep 01, 2021 at 08:43:20PM +0800, Hao Xu <haoxu@linux.alibaba.com> wrote:
>> This patchset is to enhance sqthread cpu binding logic, we didn't
>> consider cgroup setting before. In container environment, theoretically
>> sqthread is in its container's task group, it shouldn't occupy cpu out
>> of its container.
>
> I see in the discussions that there's struggle to make
> set_cpus_allowed_ptr() do what's intended under the given constraints.
>
> IIUC, set_cpus_allowed_ptr() is conventionally used for kernel threads
> [1]. But does the sqthread fall into this category? You want to have it
> _directly_ associated with a container and its cgroups. It looks to me
The sqthread is in its creator's task group, so it is like a userspace
thread from this perspective. In a container environment the sqthread
naturally belongs to the container which also contains its creator, and
it has the same cgroup settings as its creator by default.
> more like a userspace thread (from this perspective, not literally). Or
> is there a different intention?
>
> It seems to me that reusing the sched_setaffinity() (with all its
> checks and race pains/solutions) would be a more universal approach.
> (I don't mean calling sched_setaffinity() directly, some parts would
> need to be factored separately to this end.) WDYT?
>
> Regards,
> Michal
>
> [1] Not only spending their life in kernel but providing some
> delocalized kernel service.
>
* [PATCH v3 0/2] refactor sqthread cpu binding logic
@ 2021-09-01 10:18 Hao Xu
[not found] ` <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-01 10:18 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Joseph Qi
This patchset enhances the sqthread cpu binding logic; previously we
didn't take the cgroup setting into account. In a container environment,
the sqthread is, theoretically, in its container's task group, so it
shouldn't occupy cpus outside of its container. Otherwise it may cause
problems, for example: a resource balancer may control cpu resource
allocation based on a container's current cpu usage, and the sqthread
should be counted in that usage.
v1-->v2
- add helper to do cpu in current-cpuset test
v2-->v3
- split it into 2 patches, the first as a prep patch, the second as its consumer
- remove the warning; since the cpuset may change at any time we cannot
catch all cases, so make the behaviour consistent
Hao Xu (2):
cpuset: add a helper to check if cpu in cpuset of current task
io_uring: consider cgroup setting when binding sqpoll cpu
fs/io_uring.c | 19 ++++++++++++++-----
include/linux/cpuset.h | 7 +++++++
kernel/cgroup/cpuset.c | 11 +++++++++++
3 files changed, 32 insertions(+), 5 deletions(-)
--
2.24.4
[parent not found: <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-01 10:18   ` Hao Xu
  2021-09-01 12:31   ` Hao Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-01 10:18 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Since the sqthread is now a userspace-like thread, it should respect the
cgroup setting; thus we should consider the currently allowed cpuset
when binding the sqthread to a cpu.

Signed-off-by: Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
 fs/io_uring.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cc458e0b636..414dfedf79a7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include <linux/pagemap.h>
 #include <linux/io_uring.h>
 #include <linux/tracehook.h>
+#include <linux/cpuset.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -7102,6 +7103,14 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
 	return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
 }
 
+static inline int io_sq_bind_cpu(int cpu)
+{
+	if (test_cpu_in_current_cpuset(cpu))
+		set_cpus_allowed_ptr(current, cpumask_of(cpu));
+
+	return 0;
+}
+
 static int io_sq_thread(void *data)
 {
 	struct io_sq_data *sqd = data;
@@ -7112,11 +7121,9 @@ static int io_sq_thread(void *data)
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
-
 	if (sqd->sq_cpu != -1)
-		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
-		set_cpus_allowed_ptr(current, cpu_online_mask);
+		io_sq_bind_cpu(sqd->sq_cpu);
+
 	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
@@ -8310,8 +8317,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		int cpu = p->sq_thread_cpu;
 
 		ret = -EINVAL;
-		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
+		    !test_cpu_in_current_cpuset(cpu))
 			goto err_sqpoll;
+
 		sqd->sq_cpu = cpu;
 	} else {
 		sqd->sq_cpu = -1;
--
2.24.4
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  2021-09-01 10:18 ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
@ 2021-09-01 12:31   ` Hao Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:31 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring, cgroups, Joseph Qi

On 2021/9/1 6:18 PM, Hao Xu wrote:
> Since sqthread is userspace like thread now, it should respect cgroup
> setting, thus we should consider current allowed cpuset when doing
> cpu binding for sqthread.
>
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7cc458e0b636..414dfedf79a7 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -79,6 +79,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/io_uring.h>
>  #include <linux/tracehook.h>
> +#include <linux/cpuset.h>
>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/io_uring.h>
> @@ -7102,6 +7103,14 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
>  	return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
>  }
>
> +static inline int io_sq_bind_cpu(int cpu)
> +{
> +	if (test_cpu_in_current_cpuset(cpu))
> +		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +
> +	return 0;
Ah, there is no need to return a value anymore; in fact there is no need
to have this function at all. I'll resend a new version.
> +}
> +
>  static int io_sq_thread(void *data)
>  {
>  	struct io_sq_data *sqd = data;
> @@ -7112,11 +7121,9 @@ static int io_sq_thread(void *data)
>
>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>  	set_task_comm(current, buf);
> -
>  	if (sqd->sq_cpu != -1)
> -		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
> -	else
> -		set_cpus_allowed_ptr(current, cpu_online_mask);
> +		io_sq_bind_cpu(sqd->sq_cpu);
> +
>  	current->flags |= PF_NO_SETAFFINITY;
>
>  	mutex_lock(&sqd->lock);
> @@ -8310,8 +8317,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
>  		int cpu = p->sq_thread_cpu;
>
>  		ret = -EINVAL;
> -		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> +		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
> +		    !test_cpu_in_current_cpuset(cpu))
>  			goto err_sqpoll;
> +
>  		sqd->sq_cpu = cpu;
>  	} else {
>  		sqd->sq_cpu = -1;
end of thread, other threads:[~2021-09-09 12:34 UTC | newest]
Thread overview: 14+ messages
2021-09-01 12:43 [PATCH v4 0/2] refactor sqthread cpu binding logic Hao Xu
[not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-01 12:43 ` [PATCH 1/2] cpuset: add a helper to check if cpu in cpuset of current task Hao Xu
2021-09-01 12:43 ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
[not found] ` <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-01 16:41 ` Tejun Heo
2021-09-01 16:42 ` Tejun Heo
[not found] ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2021-09-03 15:04 ` Hao Xu
[not found] ` <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-07 16:54 ` Tejun Heo
[not found] ` <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2021-09-07 19:28 ` Hao Xu
2021-09-02 16:48 ` [PATCH v4 0/2] refactor sqthread cpu binding logic Michal Koutný
[not found] ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2021-09-02 18:00 ` Jens Axboe
[not found] ` <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2021-09-09 12:34 ` Michal Koutný
2021-09-03 14:43 ` Hao Xu
-- strict thread matches above, loose matches on Subject: below --
2021-09-01 10:18 [PATCH v3 " Hao Xu
[not found] ` <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-01 10:18 ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
2021-09-01 12:31 ` Hao Xu