* [PATCH v4 0/2] refactor sqthread cpu binding logic
@ 2021-09-01 12:43 Hao Xu
[not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-02 16:48 ` [PATCH v4 0/2] refactor sqthread cpu binding logic Michal Koutný
0 siblings, 2 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:43 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring, cgroups, Joseph Qi
This patchset enhances the sqthread cpu binding logic; previously we
didn't take the cgroup setting into account. In a container environment,
the sqthread is, theoretically, in its container's task group, so it
shouldn't occupy cpus outside of its container. Otherwise it may cause
problems, for example: a resource balancer may control cpu resource
allocation based on a container's current cpu usage, and the sqthread
should be counted in that usage.
v1-->v2
- add helper to do cpu in current-cpuset test
v2-->v3
- split it into 2 patches, the first as a prep patch, the second as its consumer
- remove the warning; since the cpuset may change at any time we cannot
catch all cases, so make the behaviour consistent
v3-->v4
- remove sqthread cpu binding helper since the logic is now very simple
Hao Xu (2):
cpuset: add a helper to check if cpu in cpuset of current task
io_uring: consider cgroup setting when binding sqpoll cpu
fs/io_uring.c | 11 ++++++-----
include/linux/cpuset.h | 7 +++++++
kernel/cgroup/cpuset.c | 11 +++++++++++
3 files changed, 24 insertions(+), 5 deletions(-)
--
2.24.4
^ permalink raw reply	[flat|nested] 14+ messages in thread

[parent not found: <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* [PATCH 1/2] cpuset: add a helper to check if cpu in cpuset of current task
  [not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-01 12:43   ` Hao Xu
  2021-09-01 12:43   ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:43 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Check whether a cpu is in the cpuset of the current task; this does not
guarantee that the cpu is online.

Signed-off-by: Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
 include/linux/cpuset.h |  7 +++++++
 kernel/cgroup/cpuset.c | 11 +++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 04c20de66afc..fad77c91bc1f 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -116,6 +116,8 @@ static inline int cpuset_do_slab_mem_spread(void)
 
 extern bool current_cpuset_is_being_rebound(void);
 
+extern bool test_cpu_in_current_cpuset(int cpu);
+
 extern void rebuild_sched_domains(void);
 
 extern void cpuset_print_current_mems_allowed(void);
@@ -257,6 +259,11 @@ static inline bool current_cpuset_is_being_rebound(void)
 	return false;
 }
 
+static inline bool test_cpu_in_current_cpuset(int cpu)
+{
+	return false;
+}
+
 static inline void rebuild_sched_domains(void)
 {
 	partition_sched_domains(1, NULL, NULL);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index adb5190c4429..a63c27e9430e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1849,6 +1849,17 @@ bool current_cpuset_is_being_rebound(void)
 	return ret;
 }
 
+bool test_cpu_in_current_cpuset(int cpu)
+{
+	bool ret;
+
+	rcu_read_lock();
+	ret = cpumask_test_cpu(cpu, task_cs(current)->effective_cpus);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static int update_relax_domain_level(struct cpuset *cs, s64 val)
 {
 #ifdef CONFIG_SMP
--
2.24.4
* [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
  2021-09-01 12:43   ` [PATCH 1/2] cpuset: add a helper to check if cpu in cpuset of current task Hao Xu
@ 2021-09-01 12:43   ` Hao Xu
  [not found]   ` <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:43 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Since the sqthread is now a userspace-like thread, it should respect the
cgroup setting; thus we should consider the currently allowed cpuset
when binding the sqthread to a cpu.

Signed-off-by: Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
 fs/io_uring.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cc458e0b636..fb7890077ede 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include <linux/pagemap.h>
 #include <linux/io_uring.h>
 #include <linux/tracehook.h>
+#include <linux/cpuset.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -7112,11 +7113,9 @@ static int io_sq_thread(void *data)
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
-
-	if (sqd->sq_cpu != -1)
+	if (sqd->sq_cpu != -1 && test_cpu_in_current_cpuset(sqd->sq_cpu))
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
-		set_cpus_allowed_ptr(current, cpu_online_mask);
+
 	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
@@ -8310,8 +8309,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		int cpu = p->sq_thread_cpu;
 
 		ret = -EINVAL;
-		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
+		    !test_cpu_in_current_cpuset(cpu))
 			goto err_sqpoll;
+
 		sqd->sq_cpu = cpu;
 	} else {
 		sqd->sq_cpu = -1;
--
2.24.4
[parent not found: <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-01 16:41   ` Tejun Heo
  2021-09-01 16:42   ` Tejun Heo
  [not found]   ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
  0 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2021-09-01 16:41 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Hello,

On Wed, Sep 01, 2021 at 08:43:22PM +0800, Hao Xu wrote:
> @@ -7112,11 +7113,9 @@ static int io_sq_thread(void *data)
> 
>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>  	set_task_comm(current, buf);
> +	if (sqd->sq_cpu != -1 && test_cpu_in_current_cpuset(sqd->sq_cpu))
>  		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
> +

Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
afterwards?

> @@ -8310,8 +8309,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
>  		int cpu = p->sq_thread_cpu;
> 
>  		ret = -EINVAL;
> -		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> +		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
> +		    !test_cpu_in_current_cpuset(cpu))
>  			goto err_sqpoll;
> +

Failing operations on transient conditions like this may be confusing.
Let's ignore cpuset for now. CPU hotplug is sometimes driven automatically
for power saving purposes, so failing the operation based on whether a cpu
is online means that the success or failure of the operation can seem
arbitrary. If the operation takes place while the cpu happens to be online,
it succeeds and the thread gets unbound and rebound automatically as the
cpu goes offline and online. If the operation takes place while the cpu
happens to be offline, the operation fails.

I don't know what the intended behavior here should be and we haven't been
pretty bad at defining reasonable behavior around cpu hotplug, so it'd
probably be worthwhile to consider what the behavior should be.

Thanks.

--
tejun
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  2021-09-01 16:41 ` Tejun Heo
@ 2021-09-01 16:42   ` Tejun Heo
  [not found]   ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2021-09-01 16:42 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov, io-uring,
    cgroups, Joseph Qi

On Wed, Sep 01, 2021 at 06:41:34AM -1000, Tejun Heo wrote:
> I don't know what the intended behavior here should be and we haven't been
                                                              ^ have
> pretty bad at defining reasonable behavior around cpu hotplug, so it'd
> probably be worthwhile to consider what the behavior should be.

--
tejun
[parent not found: <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
@ 2021-09-03 15:04   ` Hao Xu
  [not found]   ` <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-03 15:04 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

On 2021/9/2 12:41 AM, Tejun Heo wrote:

Hi Tejun,
> Hello,
>
> On Wed, Sep 01, 2021 at 08:43:22PM +0800, Hao Xu wrote:
>> @@ -7112,11 +7113,9 @@ static int io_sq_thread(void *data)
>>
>>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>>  	set_task_comm(current, buf);
>> +	if (sqd->sq_cpu != -1 && test_cpu_in_current_cpuset(sqd->sq_cpu))
>>  		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
>> +
>
> Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
> afterwards?
Do you mean:

	if (sqd->sq_cpu != -1 &&
	    !set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu)))

I'm not familiar with set_cpus_allowed_ptr(); do you mean it contains
logic similar to test_cpu_in_current_cpuset()?
>
>> @@ -8310,8 +8309,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
>>  		int cpu = p->sq_thread_cpu;
>>
>>  		ret = -EINVAL;
>> -		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
>> +		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
>> +		    !test_cpu_in_current_cpuset(cpu))
>>  			goto err_sqpoll;
>> +
>
> Failing operations on transient conditions like this may be confusing. Let's
> ignore cpuset for now. CPU hotplug is sometimes driven automatically for
> power saving purposes, so failing operation based on whether a cpu is online
> means that the success or failure of the operation can seem arbitrary. If
> the operation takes place while the cpu happens to be online, it succeeds
> and the thread gets unbound and rebound automatically as the cpu goes
This is a bit beyond my knowledge: do you mean that if the cpu comes
back online, the task will automatically be scheduled onto it again? If
that's true, I think the code logic here is fine.
> offline and online. If the operation takes place while the cpu happens to be
> offline, the operation fails.
It's ok that it fails; we leave the option of retrying to the users
themselves.
>
> I don't know what the intended behavior here should be and we haven't been
> pretty bad at defining reasonable behavior around cpu hotplug, so it'd
> probably be worthwhile to consider what the behavior should be.
>
> Thanks.
>
Thanks,
Hao
[parent not found: <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-07 16:54   ` Tejun Heo
  [not found]   ` <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2021-09-07 16:54 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Hello,

On Fri, Sep 03, 2021 at 11:04:07PM +0800, Hao Xu wrote:
> > Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
> > afterwards?
> Do you mean: if (sqd->sq_cpu != -1 && !set_cpus_allowed_ptr(current,
> cpumask_of(sqd->sq_cpu)))
>
> I'm not familiar with set_cpus_allowed_ptr(), you mean it contains the
> similar logic of test_cpu_in_current_cpuset?

It's kinda muddy unfortunately. I think it rejects if the target cpu is
offline but accepts and ignores if the cpu is excluded by cpuset.

> This is a bit beyond of my knowledge, so you mean if the cpu back
> online, the task will automatically schedule to this cpu? if it's true,
> I think the code logic here is fine.
>
> > offline and online. If the operation takes place while the cpu happens to be
> > offline, the operation fails.
> It's ok that it fails, we leave the option of retry to users themselves.

I think the first thing to do is defining the desired behavior, hopefully
in a consistent manner, rather than letting it be defined by the
implementation. e.g. If the desired behavior is the per-cpu helper failing,
then it should probably exit when the target cpu isn't available for
whatever reason. If the desired behavior is best effort when the cpu goes
away (ie. ignore affinity), the creation likely shouldn't fail when the
target cpu is unavailable but can become available in the future.

Thanks.

--
tejun
[parent not found: <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
@ 2021-09-07 19:28   ` Hao Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-07 19:28 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, Zefan Li, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

On 2021/9/8 12:54 AM, Tejun Heo wrote:
> Hello,
>
> On Fri, Sep 03, 2021 at 11:04:07PM +0800, Hao Xu wrote:
>>> Would it make sense to just test whether set_cpus_allowed_ptr() succeeded
>>> afterwards?
>> Do you mean: if (sqd->sq_cpu != -1 && !set_cpus_allowed_ptr(current,
>> cpumask_of(sqd->sq_cpu)))
>>
>> I'm not familiar with set_cpus_allowed_ptr(), you mean it contains the
>> similar logic of test_cpu_in_current_cpuset?
>
> It's kinda muddy unfortunately. I think it rejects if the target cpu is
> offline but accept and ignores if the cpu is excluded by cpuset.
>
>> This is a bit beyond of my knowledge, so you mean if the cpu back
>> online, the task will automatically schedule to this cpu? if it's true,
>> I think the code logic here is fine.
>>
>>> offline and online. If the operation takes place while the cpu happens to be
>>> offline, the operation fails.
>> It's ok that it fails, we leave the option of retry to users themselves.
>
> I think the first thing to do is defining the desired behavior, hopefully in
> a consistent manner, rather than letting it be defined by implementation.
> e.g. If the desired behavior is the per-cpu helper failing, then it should
> probably exit when the target cpu isn't available for whatever reason. If
> the desired behavior is best effort when cpu goes away (ie. ignore
Hmm, I see.
First, I think we should move the set_cpus_allowed_ptr() call to where
the sqthread is created, not to when it is already running (not sure why
it is currently at the beginning of the sqthread itself); then we can
have consistent behaviour. (If we do the check at the sqthread's run
time, then whether we kill it or still allow it to run when the
cpu_online check fails, it's hard to let users know the result of their
cpu binding, since users don't know the exact time the sqthread wakes up
and begins to run, at which point they could check the binding result.)
Second, I think a user's cpu binding is a kind of 'advice', not a
'command'. So whether the cpu_online check succeeds or fails, we still
let the sqthread run, and meanwhile return the cpu binding result to
userspace. Anyway, I'd like to know Jens' thoughts on this.
> affinity), the creation likely shouldn't fail when the target cpu is
> unavailable but can become available in the future.
>
> Thanks.
>
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  2021-09-01 12:43 [PATCH v4 0/2] refactor sqthread cpu binding logic Hao Xu
  [not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-02 16:48   ` Michal Koutný
  [not found]   ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
  2021-09-03 14:43   ` Hao Xu
  1 sibling, 2 replies; 14+ messages in thread
From: Michal Koutný @ 2021-09-02 16:48 UTC (permalink / raw)
To: Hao Xu
Cc: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring, cgroups, Joseph Qi

Hello Hao.

On Wed, Sep 01, 2021 at 08:43:20PM +0800, Hao Xu <haoxu@linux.alibaba.com> wrote:
> This patchset is to enhance sqthread cpu binding logic, we didn't
> consider cgroup setting before. In container environment, theoretically
> sqthread is in its container's task group, it shouldn't occupy cpu out
> of its container.

I see in the discussions that there's a struggle to make
set_cpus_allowed_ptr() do what's intended under the given constraints.

IIUC, set_cpus_allowed_ptr() is conventionally used for kernel threads
[1]. But does the sqthread fall into this category? You want to have it
_directly_ associated with a container and its cgroups. It looks to me
more like a userspace thread (from this perspective, not literally). Or
is there a different intention?

It seems to me that reusing sched_setaffinity() (with all its checks
and race pains/solutions) would be a more universal approach. (I don't
mean calling sched_setaffinity() directly; some parts would need to be
factored out separately to this end.) WDYT?

Regards,
Michal

[1] Not only spending their life in kernel but providing some
delocalized kernel service.
[parent not found: <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>]
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  [not found] ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2021-09-02 18:00   ` Jens Axboe
  [not found]   ` <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-09-02 18:00 UTC (permalink / raw)
To: Michal Koutný, Hao Xu
Cc: Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

On 9/2/21 10:48 AM, Michal Koutný wrote:
> Hello Hao.
>
> On Wed, Sep 01, 2021 at 08:43:20PM +0800, Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org> wrote:
>> This patchset is to enhance sqthread cpu binding logic, we didn't
>> consider cgroup setting before. In container environment, theoretically
>> sqthread is in its container's task group, it shouldn't occupy cpu out
>> of its container.
>
> I see in the discussions that there's struggle to make
> set_cpus_allowed_ptr() do what's intended under the given constraints.
>
> IIUC, set_cpus_allowed_ptr() is conventionally used for kernel threads
> [1]. But does the sqthread fall into this category? You want to have it
> _directly_ associated with a container and its cgroups. It looks to me
> more like a userspace thread (from this perspective, not literally). Or
> is there a different intention?

It's an io thread, which is kind of a hybrid: it's a kernel thread in
the sense that it never exits to userspace (ever), but it's a regular
thread in the sense that it's set up like one.

> It seems to me that reusing the sched_setaffinity() (with all its
> checks and race pains/solutions) would be a more universal approach.
> (I don't mean calling sched_setaffinity() directly, some parts would
> need to be factored separately to this end.) WDYT?

We already have this API to set the affinity based on when these were
regular kernel threads, so it needs to work with that too. As such they
are marked PF_NO_SETAFFINITY.

> [1] Not only spending their life in kernel but providing some
> delocalized kernel service.

That's what they do...

--
Jens Axboe
[parent not found: <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>]
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  [not found] ` <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
@ 2021-09-09 12:34   ` Michal Koutný
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Koutný @ 2021-09-09 12:34 UTC (permalink / raw)
To: Jens Axboe, Hao Xu
Cc: Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Thanks for the answer and the context explanations.

On Thu, Sep 02, 2021 at 12:00:33PM -0600, Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> wrote:
> We already have this API to set the affinity based on when these were
> regular kernel threads, so it needs to work with that too. As such they
> are marked PF_NO_SETAFFINITY.

I see. The current implementation "allows" at most one binding (via the
passed sq_cpu arg) to a CPU and then "locks" it by setting
PF_NO_SETAFFINITY subsequently (after set_cpus_allowed_ptr()). And it
actually doesn't check whether that succeeded or not (Hao suggests in
another subthread that sq_cpu is a mere hint).

Nevertheless, you likely don't want to "trespass" the boundary of a
cpuset, and I think that would end up as a loop checking against hotplug
races. That is already implemented in __sched_setaffinity(); it'd be IMO
good to have it in one place only.

> > [1] Not only spending their life in kernel but providing some
> > delocalized kernel service.
>
> That's what they do...

(I assume that answers the "life in kernel" part, and that the IO
threads serve only the originating cpuset (container), i.e. they are
(co)localized to it, not delocalized like some kernel workers.)

Cheers,
Michal
* Re: [PATCH v4 0/2] refactor sqthread cpu binding logic
  2021-09-02 16:48 ` [PATCH v4 0/2] refactor sqthread cpu binding logic Michal Koutný
  [not found] ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
@ 2021-09-03 14:43   ` Hao Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-03 14:43 UTC (permalink / raw)
To: Michal Koutný
Cc: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov,
    io-uring, cgroups, Joseph Qi

On 2021/9/3 12:48 AM, Michal Koutný wrote:
> Hello Hao.
>
> On Wed, Sep 01, 2021 at 08:43:20PM +0800, Hao Xu <haoxu@linux.alibaba.com> wrote:
>> This patchset is to enhance sqthread cpu binding logic, we didn't
>> consider cgroup setting before. In container environment, theoretically
>> sqthread is in its container's task group, it shouldn't occupy cpu out
>> of its container.
>
> I see in the discussions that there's struggle to make
> set_cpus_allowed_ptr() do what's intended under the given constraints.
>
> IIUC, set_cpus_allowed_ptr() is conventionally used for kernel threads
> [1]. But does the sqthread fall into this category? You want to have it
> _directly_ associated with a container and its cgroups. It looks to me
The sqthread is in its creator's task group, so it is like a userspace
thread from this perspective. In a container environment the sqthread
naturally belongs to the container which also contains its creator, and
it has the same cgroup settings as its creator by default.
> more like a userspace thread (from this perspective, not literally). Or
> is there a different intention?
>
> It seems to me that reusing the sched_setaffinity() (with all its
> checks and race pains/solutions) would be a more universal approach.
> (I don't mean calling sched_setaffinity() directly, some parts would
> need to be factored separately to this end.) WDYT?
>
> Regards,
> Michal
>
> [1] Not only spending their life in kernel but providing some
> delocalized kernel service.
>
* [PATCH v3 0/2] refactor sqthread cpu binding logic
@ 2021-09-01 10:18 Hao Xu
[not found] ` <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-01 10:18 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
Joseph Qi
This patchset enhances the sqthread cpu binding logic; previously we
didn't take the cgroup setting into account. In a container environment,
the sqthread is, theoretically, in its container's task group, so it
shouldn't occupy cpus outside of its container. Otherwise it may cause
problems, for example: a resource balancer may control cpu resource
allocation based on a container's current cpu usage, and the sqthread
should be counted in that usage.
v1-->v2
- add helper to do cpu in current-cpuset test
v2-->v3
- split it into 2 patches, the first as a prep patch, the second as its consumer
- remove the warning; since the cpuset may change at any time we cannot
catch all cases, so make the behaviour consistent
Hao Xu (2):
cpuset: add a helper to check if cpu in cpuset of current task
io_uring: consider cgroup setting when binding sqpoll cpu
fs/io_uring.c | 19 ++++++++++++++-----
include/linux/cpuset.h | 7 +++++++
kernel/cgroup/cpuset.c | 11 +++++++++++
3 files changed, 32 insertions(+), 5 deletions(-)
--
2.24.4
[parent not found: <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>]
* [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  [not found] ` <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
@ 2021-09-01 10:18   ` Hao Xu
  2021-09-01 12:31   ` Hao Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Hao Xu @ 2021-09-01 10:18 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
    Joseph Qi

Since the sqthread is now a userspace-like thread, it should respect the
cgroup setting; thus we should consider the currently allowed cpuset
when binding the sqthread to a cpu.

Signed-off-by: Hao Xu <haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
 fs/io_uring.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cc458e0b636..414dfedf79a7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include <linux/pagemap.h>
 #include <linux/io_uring.h>
 #include <linux/tracehook.h>
+#include <linux/cpuset.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -7102,6 +7103,14 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
 	return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
 }
 
+static inline int io_sq_bind_cpu(int cpu)
+{
+	if (test_cpu_in_current_cpuset(cpu))
+		set_cpus_allowed_ptr(current, cpumask_of(cpu));
+
+	return 0;
+}
+
 static int io_sq_thread(void *data)
 {
 	struct io_sq_data *sqd = data;
@@ -7112,11 +7121,9 @@ static int io_sq_thread(void *data)
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
-
 	if (sqd->sq_cpu != -1)
-		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
-		set_cpus_allowed_ptr(current, cpu_online_mask);
+		io_sq_bind_cpu(sqd->sq_cpu);
+
 	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
@@ -8310,8 +8317,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		int cpu = p->sq_thread_cpu;
 
 		ret = -EINVAL;
-		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
+		    !test_cpu_in_current_cpuset(cpu))
 			goto err_sqpoll;
+
 		sqd->sq_cpu = cpu;
 	} else {
 		sqd->sq_cpu = -1;
--
2.24.4
* Re: [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu
  2021-09-01 10:18 ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
@ 2021-09-01 12:31   ` Hao Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-09-01 12:31 UTC (permalink / raw)
To: Jens Axboe, Zefan Li, Tejun Heo, Johannes Weiner, Pavel Begunkov
Cc: io-uring, cgroups, Joseph Qi

On 2021/9/1 6:18 PM, Hao Xu wrote:
> Since sqthread is userspace like thread now, it should respect cgroup
> setting, thus we should consider current allowed cpuset when doing
> cpu binding for sqthread.
>
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7cc458e0b636..414dfedf79a7 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -79,6 +79,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/io_uring.h>
>  #include <linux/tracehook.h>
> +#include <linux/cpuset.h>
>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/io_uring.h>
> @@ -7102,6 +7103,14 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
>  	return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
>  }
>
> +static inline int io_sq_bind_cpu(int cpu)
> +{
> +	if (test_cpu_in_current_cpuset(cpu))
> +		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +
> +	return 0;
Ah, there is no need to return a value anymore; in fact there is no need
to have this function at all. I'll resend a new version.
> +}
> +
>  static int io_sq_thread(void *data)
>  {
>  	struct io_sq_data *sqd = data;
> @@ -7112,11 +7121,9 @@ static int io_sq_thread(void *data)
>
>  	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
>  	set_task_comm(current, buf);
> -
>  	if (sqd->sq_cpu != -1)
> -		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
> -	else
> -		set_cpus_allowed_ptr(current, cpu_online_mask);
> +		io_sq_bind_cpu(sqd->sq_cpu);
> +
>  	current->flags |= PF_NO_SETAFFINITY;
>
>  	mutex_lock(&sqd->lock);
> @@ -8310,8 +8317,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
>  		int cpu = p->sq_thread_cpu;
>
>  		ret = -EINVAL;
> -		if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> +		if (cpu >= nr_cpu_ids || !cpu_online(cpu) ||
> +		    !test_cpu_in_current_cpuset(cpu))
>  			goto err_sqpoll;
> +
>  		sqd->sq_cpu = cpu;
>  	} else {
>  		sqd->sq_cpu = -1;
end of thread, other threads:[~2021-09-09 12:34 UTC | newest]
Thread overview: 14+ messages
2021-09-01 12:43 [PATCH v4 0/2] refactor sqthread cpu binding logic Hao Xu
[not found] ` <20210901124322.164238-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-01 12:43 ` [PATCH 1/2] cpuset: add a helper to check if cpu in cpuset of current task Hao Xu
2021-09-01 12:43 ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
[not found] ` <20210901124322.164238-3-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-01 16:41 ` Tejun Heo
2021-09-01 16:42 ` Tejun Heo
[not found] ` <YS+tPq1eiQLx4P3M-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2021-09-03 15:04 ` Hao Xu
[not found] ` <c49d9b26-1c74-316a-c933-e6964695a286-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-07 16:54 ` Tejun Heo
[not found] ` <YTeZUnshr+mgf5GS-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2021-09-07 19:28 ` Hao Xu
2021-09-02 16:48 ` [PATCH v4 0/2] refactor sqthread cpu binding logic Michal Koutný
[not found] ` <20210902164808.GA10014-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2021-09-02 18:00 ` Jens Axboe
[not found] ` <efd3c387-9c7c-c0d8-1306-f722da2a9ba1-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2021-09-09 12:34 ` Michal Koutný
2021-09-03 14:43 ` Hao Xu
-- strict thread matches above, loose matches on Subject: below --
2021-09-01 10:18 [PATCH v3 " Hao Xu
[not found] ` <20210901101833.69535-1-haoxu-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
2021-09-01 10:18 ` [PATCH 2/2] io_uring: consider cgroup setting when binding sqpoll cpu Hao Xu
2021-09-01 12:31 ` Hao Xu