* [RFC PATCH v3 0/3] genirq/cpuhotplug: Adjust managed interrupts according to change of housekeeping cpumask
@ 2024-09-16 12:20 Costa Shulyupin
2024-09-16 12:20 ` [RFC PATCH v3 1/3] sched/isolation: Add infrastructure for dynamic CPU isolation Costa Shulyupin
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Costa Shulyupin @ 2024-09-16 12:20 UTC (permalink / raw)
To: longman, ming.lei, pauld, juri.lelli, vschneid, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Thomas Gleixner,
Zefan Li, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Costa Shulyupin,
Bjorn Helgaas, linuxppc-dev, linux-kernel, cgroups
The housekeeping CPU masks, set up by the "isolcpus" and "nohz_full"
boot command line options, are used at boot time to exclude selected
CPUs from running some kernel housekeeping subsystems to minimize
disturbance to latency-sensitive userspace applications such as DPDK.
These options can only be changed with a reboot. This is a problem for
containerized workloads running on OpenShift/Kubernetes where a
mix of low latency and "normal" workloads can be created/destroyed
dynamically and the number of CPUs allocated to each workload is often
not known at boot time.
Theoretically, complete CPU offlining/onlining could be used for
housekeeping adjustments, but this approach is not practical.
Telco companies use Linux to run DPDK in OpenShift/Kubernetes containers.
DPDK requires isolated CPUs to run real-time processes.
Kubernetes manages allocation of resources for containers.
Unfortunately Kubernetes doesn't support dynamic CPU offlining/onlining:
https://github.com/kubernetes/kubernetes/issues/67500
and is not planning to support it.
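For reference, the offlining/onlining path dismissed above is the standard
sysfs CPU hotplug interface; a minimal sketch (CPU 1 is an arbitrary
example, and since actually offlining a CPU requires root, the sketch only
prints the commands it would run):

```shell
# Standard sysfs CPU hotplug interface (illustration only; CPU 1 is an
# arbitrary example). The echo lines print the commands rather than run them.
cpu=1
ctl=/sys/devices/system/cpu/cpu$cpu/online
echo "offline: echo 0 > $ctl"
echo "online:  echo 1 > $ctl"
```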
Addressing this issue at the application level appears to be even
less straightforward than addressing it at the kernel level.
This series of patches is based on series
isolation: Exclude dynamically isolated CPUs from housekeeping masks:
https://lore.kernel.org/lkml/20240821142312.236970-1-longman@redhat.com/
Its purpose is to exclude dynamically isolated CPUs from some
housekeeping masks so that subsystems that check the housekeeping masks
at run time will not use those isolated CPUs.
However, some subsystems may still use obsolete housekeeping CPU masks.
Therefore, to prevent the use of these isolated CPUs, it is necessary to
explicitly propagate changes of the housekeeping masks to all subsystems
depending on the mask.
Changes in v2:
- Focus this patch series on interrupts only.
v1:
- https://lore.kernel.org/lkml/20240516190437.3545310-1-costa.shul@redhat.com/
References:
- Linux Kernel Dynamic CPU Isolation: https://pretalx.com/devconf-us-2024/talk/AZBQLE/
Costa Shulyupin (3):
sched/isolation: Add infrastructure for dynamic CPU isolation
genirq/cpuhotplug: Adjust managed irqs according to change of
housekeeping CPU
DO NOT MERGE: test for managed irqs adjustment
include/linux/irq.h | 2 +
kernel/cgroup/cpuset.c | 1 +
kernel/irq/cpuhotplug.c | 95 ++++++++++++++++++++++++++++++++
kernel/sched/isolation.c | 46 ++++++++++++++--
tests/managed_irq.sh | 113 +++++++++++++++++++++++++++++++++++++++
5 files changed, 254 insertions(+), 3 deletions(-)
create mode 100755 tests/managed_irq.sh
--
2.45.0
* [RFC PATCH v3 1/3] sched/isolation: Add infrastructure for dynamic CPU isolation
2024-09-16 12:20 [RFC PATCH v3 0/3] genirq/cpuhotplug: Adjust managed interrupts according to change of housekeeping cpumask Costa Shulyupin
@ 2024-09-16 12:20 ` Costa Shulyupin
2024-10-02 9:44 ` Thomas Gleixner
2024-09-16 12:20 ` [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs according to change of housekeeping CPU Costa Shulyupin
2024-09-16 12:20 ` [RFC PATCH v3 3/3] DO NOT MERGE: test for managed irqs adjustment Costa Shulyupin
2 siblings, 1 reply; 6+ messages in thread
From: Costa Shulyupin @ 2024-09-16 12:20 UTC (permalink / raw)
To: longman, ming.lei, pauld, juri.lelli, vschneid, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Thomas Gleixner,
Zefan Li, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Costa Shulyupin,
Bjorn Helgaas, linuxppc-dev, linux-kernel, cgroups
Introduce the infrastructure function housekeeping_update() to change
housekeeping cpumasks at runtime and to adjust the configuration of
dependent subsystems.
Configuration adjustments of subsystems follow in subsequent patches.
Parent patch:
sched/isolation: Exclude dynamically isolated CPUs from housekeeping masks
https://lore.kernel.org/lkml/20240821142312.236970-1-longman@redhat.com/
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
---
Changes in v2:
- remove unnecessary `err` variable
- add for_each_clear_bit... to clear isolated CPUs
- Address Gleixner's comments:
- use WRITE_ONCE to change housekeeping.flags
- use `struct cpumask *update` in signature of housekeeping_update
v1:
- https://lore.kernel.org/lkml/20240516190437.3545310-2-costa.shul@redhat.com/
---
kernel/sched/isolation.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 3018ba81eb65d..3f24921b929a0 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -124,6 +124,36 @@ static void __init housekeeping_setup_type(enum hk_type type,
housekeeping_staging);
}
+/*
+ * housekeeping_update - change housekeeping.cpumasks[type] and propagate the
+ * change.
+ */
+static int housekeeping_update(enum hk_type type, const struct cpumask *update)
+{
+ struct {
+ struct cpumask changed;
+ struct cpumask enable;
+ struct cpumask disable;
+ } *masks;
+
+ masks = kmalloc(sizeof(*masks), GFP_KERNEL);
+ if (!masks)
+ return -ENOMEM;
+
+ lockdep_assert_cpus_held();
+ cpumask_xor(&masks->changed, housekeeping_cpumask(type), update);
+ cpumask_and(&masks->enable, &masks->changed, update);
+ cpumask_andnot(&masks->disable, &masks->changed, update);
+ cpumask_copy(housekeeping.cpumasks[type], update);
+ WRITE_ONCE(housekeeping.flags, housekeeping.flags | BIT(type));
+ if (!static_branch_unlikely(&housekeeping_overridden))
+ static_key_enable_cpuslocked(&housekeeping_overridden.key);
+
+ kfree(masks);
+
+ return 0;
+}
+
static int __init housekeeping_setup(char *str, unsigned long flags)
{
cpumask_var_t non_housekeeping_mask, housekeeping_staging;
@@ -327,8 +357,11 @@ int housekeeping_exlude_isolcpus(const struct cpumask *isolcpus, unsigned long f
/*
* Reset housekeeping to bootup default
*/
+
+ for_each_clear_bit(type, &boot_hk_flags, HK_TYPE_MAX)
+ housekeeping_update(type, cpu_possible_mask);
for_each_set_bit(type, &boot_hk_flags, HK_TYPE_MAX)
- cpumask_copy(housekeeping.cpumasks[type], boot_hk_cpumask);
+ housekeeping_update(type, boot_hk_cpumask);
WRITE_ONCE(housekeeping.flags, boot_hk_flags);
if (!boot_hk_flags && static_key_enabled(&housekeeping_overridden))
@@ -355,9 +388,8 @@ int housekeeping_exlude_isolcpus(const struct cpumask *isolcpus, unsigned long f
cpumask_andnot(tmp_mask, src_mask, isolcpus);
if (!cpumask_intersects(tmp_mask, cpu_online_mask))
return -EINVAL; /* Invalid isolated CPUs */
- cpumask_copy(housekeeping.cpumasks[type], tmp_mask);
+ housekeeping_update(type, tmp_mask);
}
- WRITE_ONCE(housekeeping.flags, boot_hk_flags | flags);
excluded = true;
if (!static_key_enabled(&housekeeping_overridden))
static_key_enable_cpuslocked(&housekeeping_overridden.key);
--
2.45.0
* [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs according to change of housekeeping CPU
2024-09-16 12:20 [RFC PATCH v3 0/3] genirq/cpuhotplug: Adjust managed interrupts according to change of housekeeping cpumask Costa Shulyupin
2024-09-16 12:20 ` [RFC PATCH v3 1/3] sched/isolation: Add infrastructure for dynamic CPU isolation Costa Shulyupin
@ 2024-09-16 12:20 ` Costa Shulyupin
2024-10-02 10:09 ` Thomas Gleixner
2024-09-16 12:20 ` [RFC PATCH v3 3/3] DO NOT MERGE: test for managed irqs adjustment Costa Shulyupin
2 siblings, 1 reply; 6+ messages in thread
From: Costa Shulyupin @ 2024-09-16 12:20 UTC (permalink / raw)
To: longman, ming.lei, pauld, juri.lelli, vschneid, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Thomas Gleixner,
Zefan Li, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Costa Shulyupin,
Bjorn Helgaas, linuxppc-dev, linux-kernel, cgroups
Interrupts disturb real-time tasks on affined CPUs.
To ensure CPU isolation for real-time tasks, interrupt handling must
be adjusted accordingly.
Non-managed interrupts can be configured from userspace,
while managed interrupts require adjustments in kernelspace.
Adjust the status of managed interrupts according to changes of the
housekeeping CPU mask to support dynamic CPU isolation.
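For contrast, the userspace knob for non-managed interrupts mentioned above
is the regular procfs affinity interface; a sketch assuming an arbitrary
IRQ number (the final echo only prints the command instead of running it):

```shell
irq=42                                  # arbitrary example IRQ number
cpus_mask=$(( (1 << 0) | (1 << 2) ))    # allow CPUs 0 and 2
hexmask=$(printf '%x' "$cpus_mask")     # smp_affinity takes a hex mask
echo "would run: echo $hexmask > /proc/irq/$irq/smp_affinity"
```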
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
---
The following code is a proof of concept to validate the approach and to
allow review of its correctness.
C++-style comments denote temporary notes.
v2:
- refactor irq_affinity_adjust():
- add more comments
- add managed_irq_isolate() derived from migrate_one_irq(),
irq_needs_fixup() and irq_fixup_move_pending()
- use irq_set_affinity() instead of irq_set_affinity_locked
- Addressed Gleixner's comments:
- use `struct cpumask *` instead of `cpumask_var_t` in function signature
- remove locking in irq_affinity_adjust()
v1:
- https://lore.kernel.org/lkml/20240516190437.3545310-5-costa.shul@redhat.com/
---
include/linux/irq.h | 2 +
kernel/cgroup/cpuset.c | 1 +
kernel/irq/cpuhotplug.c | 95 ++++++++++++++++++++++++++++++++++++++++
kernel/sched/isolation.c | 10 ++++-
4 files changed, 107 insertions(+), 1 deletion(-)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index fa711f80957b6..4eb2e765dbd95 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -619,6 +619,8 @@ extern int irq_affinity_online_cpu(unsigned int cpu);
# define irq_affinity_online_cpu NULL
#endif
+int managed_irq_affinity_adjust(struct cpumask *enable_mask);
+
#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_PENDING_IRQ)
void __irq_move_irq(struct irq_data *data);
static inline void irq_move_irq(struct irq_data *data)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index bf60bdc973dd6..73b06b2cd91e3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -90,6 +90,7 @@ static struct list_head remote_children;
#define HOUSEKEEPING_FLAGS (BIT(HK_TYPE_TIMER) | BIT(HK_TYPE_RCU) |\
BIT(HK_TYPE_SCHED) | BIT(HK_TYPE_MISC) |\
BIT(HK_TYPE_DOMAIN) | BIT(HK_TYPE_WQ) |\
+ BIT(HK_TYPE_MANAGED_IRQ) |\
BIT(HK_TYPE_KTHREAD))
/*
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index ec2cdcd20bee7..adbe1f3e5bd22 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -252,3 +252,98 @@ int irq_affinity_online_cpu(unsigned int cpu)
return 0;
}
+
+/*
+ * managed_irq_isolate() - Deactivate managed interrupts if necessary
+ */
+// derived from migrate_one_irq, irq_needs_fixup, irq_fixup_move_pending
+static int managed_irq_isolate(struct irq_desc *desc)
+{
+ struct irq_data *d = irq_desc_get_irq_data(desc);
+ struct irq_chip *chip = irq_data_get_irq_chip(d);
+ const struct cpumask *a;
+ bool maskchip;
+ int err;
+
+ /*
+ * Deactivate if:
+ * - Interrupt is managed
+ * - Interrupt is not per cpu
+ * - Interrupt is started
+ * - Effective affinity mask includes isolated CPUs
+ */
+ if (!irqd_affinity_is_managed(d) || irqd_is_per_cpu(d) || !irqd_is_started(d)
+ || cpumask_subset(irq_data_get_effective_affinity_mask(d),
+ housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
+ return 0;
+	// TBD: is it required?
+ /*
+ * Complete an eventually pending irq move cleanup. If this
+ * interrupt was moved in hard irq context, then the vectors need
+ * to be cleaned up. It can't wait until this interrupt actually
+ * happens and this CPU was involved.
+ */
+ irq_force_complete_move(desc);
+
+ if (irqd_is_setaffinity_pending(d)) {
+ irqd_clr_move_pending(d);
+ if (cpumask_intersects(desc->pending_mask,
+ housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
+ a = irq_desc_get_pending_mask(desc);
+ } else {
+ a = irq_data_get_affinity_mask(d);
+ }
+
+ maskchip = chip->irq_mask && !irq_can_move_pcntxt(d) && !irqd_irq_masked(d);
+ if (maskchip)
+ chip->irq_mask(d);
+
+ if (!cpumask_intersects(a, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ))) {
+ /*
+ * Shut managed interrupt down and leave the affinity untouched.
+ * The effective affinity is reset to the first online CPU.
+ */
+ irqd_set_managed_shutdown(d);
+ irq_shutdown_and_deactivate(desc);
+ return 0;
+ }
+
+ /*
+ * Do not set the force argument of irq_do_set_affinity() as this
+ * disables the masking of offline CPUs from the supplied affinity
+ * mask and therefore might keep/reassign the irq to the outgoing
+ * CPU.
+ */
+ err = irq_do_set_affinity(d, a, false);
+ if (err)
+ pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
+ d->irq, err);
+
+ if (maskchip)
+ chip->irq_unmask(d);
+
+ return err;
+}
+
+/** managed_irq_affinity_adjust() - Deactivate or restore managed interrupts
+ * according to changes of the housekeeping cpumask.
+ *
+ * @enable_mask: CPUs for which interrupts should be restored
+ */
+int managed_irq_affinity_adjust(struct cpumask *enable_mask)
+{
+ unsigned int irq;
+
+ for_each_active_irq(irq) {
+ struct irq_desc *desc = irq_to_desc(irq);
+ unsigned int cpu;
+
+ for_each_cpu(cpu, enable_mask)
+ irq_restore_affinity_of_irq(desc, cpu);
+ raw_spin_lock(&desc->lock);
+ managed_irq_isolate(desc);
+ raw_spin_unlock(&desc->lock);
+ }
+
+ return 0;
+}
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 3f24921b929a0..cd72300ec8b99 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -130,6 +130,8 @@ static void __init housekeeping_setup_type(enum hk_type type,
*/
static int housekeeping_update(enum hk_type type, const struct cpumask *update)
{
+ int err = 0;
+
struct {
struct cpumask changed;
struct cpumask enable;
@@ -149,9 +151,15 @@ static int housekeeping_update(enum hk_type type, const struct cpumask *update)
if (!static_branch_unlikely(&housekeeping_overridden))
static_key_enable_cpuslocked(&housekeeping_overridden.key);
+ switch (type) {
+ case HK_TYPE_MANAGED_IRQ:
+ err = managed_irq_affinity_adjust(&masks->enable);
+ break;
+	default:
+		break;
+	}
kfree(masks);
- return 0;
+ return err;
}
static int __init housekeeping_setup(char *str, unsigned long flags)
--
2.45.0
* [RFC PATCH v3 3/3] DO NOT MERGE: test for managed irqs adjustment
2024-09-16 12:20 [RFC PATCH v3 0/3] genirq/cpuhotplug: Adjust managed interrupts according to change of housekeeping cpumask Costa Shulyupin
2024-09-16 12:20 ` [RFC PATCH v3 1/3] sched/isolation: Add infrastructure for dynamic CPU isolation Costa Shulyupin
2024-09-16 12:20 ` [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs according to change of housekeeping CPU Costa Shulyupin
@ 2024-09-16 12:20 ` Costa Shulyupin
2 siblings, 0 replies; 6+ messages in thread
From: Costa Shulyupin @ 2024-09-16 12:20 UTC (permalink / raw)
To: longman, ming.lei, pauld, juri.lelli, vschneid, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Thomas Gleixner,
Zefan Li, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Costa Shulyupin,
Bjorn Helgaas, linuxppc-dev, linux-kernel, cgroups
Shell script for testing managed interrupt status adjustments.
Targets: managed_irq_affinity_adjust(),
irq_restore_affinity_of_irq(), managed_irq_isolate()
Managed interrupts can be created in various ways. One of them:
1. create qcow2 image
2. run
virtme-ng -v --cpus 4 --rw --user root \
--qemu-opts '\-drive id=d1,if=none,file=image.qcow2 \
\-device nvme,id=i1,drive=d1,serial=1,bootindex=2'
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
---
Changes in v2:
- use shell script only
v1:
- https://lore.kernel.org/lkml/20240516190437.3545310-8-costa.shul@redhat.com/
---
tests/managed_irq.sh | 113 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 113 insertions(+)
create mode 100755 tests/managed_irq.sh
diff --git a/tests/managed_irq.sh b/tests/managed_irq.sh
new file mode 100755
index 0000000000000..bd97f47991a0b
--- /dev/null
+++ b/tests/managed_irq.sh
@@ -0,0 +1,113 @@
+#!/bin/zsh
+
+# shell script for testing IRQ status adjustment.
+# Targets: managed_irq_affinity_adjust(),
+# irq_restore_affinity_of_irq(), managed_irq_isolate()
+
+# cpu# to isolate
+#
+isolate=1
+
+managed_affined=$(
+ cd /sys/kernel/debug/irq/irqs/;
+ grep -l -e "affinity: $isolate$" /dev/null $(grep -l IRQD_AFFINITY_MANAGED *)
+)
+test_irq=${managed_affined%% *}
+
+[ -z "$test_irq" ] && { echo "No managed IRQs found"; exit 1; }
+
+rm -rf 0.irqs
+cp -R /sys/kernel/debug/irq/irqs 0.irqs
+
+cd /sys/fs/cgroup/
+echo +cpuset > cgroup.subtree_control
+mkdir -p test
+echo isolated > test/cpuset.cpus.partition
+
+effective_affinity=/proc/irq/$test_irq/effective_affinity
+test_irq_debug=/sys/kernel/debug/irq/irqs/$test_irq
+
+errors=0
+
+check()
+{
+ local _status=$?
+ if [[ $_status == 0 ]]
+ then
+ echo PASS
+ else
+ let errors+=1
+ echo FAIL:
+ cat $test_irq_debug
+ fi
+ return $_status
+}
+
+check_activated()
+{
+ echo "Check normal irq affinity"
+ test 0 -ne $((0x$(cat $effective_affinity) & 1 << $isolate))
+ check
+ grep -q IRQD_ACTIVATED $test_irq_debug
+ check
+ grep -q IRQD_IRQ_STARTED $test_irq_debug
+ check
+ ! grep -q IRQD_IRQ_DISABLED $test_irq_debug
+ check
+ ! grep -q IRQD_IRQ_MASKED $test_irq_debug
+ check
+ ! grep -q IRQD_MANAGED_SHUTDOWN $test_irq_debug
+ check
+}
+
+check_shutdown()
+{
+ echo "Check that irq affinity doesn't contain isolated cpu."
+ test 0 -eq $((0x$(cat $effective_affinity) & 1 << $isolate))
+ check
+ ! grep -q IRQD_ACTIVATED $test_irq_debug
+ check
+ ! grep -q IRQD_IRQ_STARTED $test_irq_debug
+ check
+ grep -q IRQD_IRQ_DISABLED $test_irq_debug
+ check
+ grep -q IRQD_IRQ_MASKED $test_irq_debug
+ check
+ grep -q IRQD_MANAGED_SHUTDOWN $test_irq_debug
+ check
+}
+
+echo "Isolating CPU #$isolate"
+echo $isolate > test/cpuset.cpus
+
+check_shutdown
+
+echo Reset cpuset
+echo "" > test/cpuset.cpus
+
+check_activated
+
+echo "Isolating CPU #$isolate again"
+echo $isolate > test/cpuset.cpus
+
+check_shutdown
+
+echo "Isolating CPU #3 and restoring CPU #$isolate"
+echo 3 > test/cpuset.cpus
+
+check_activated
+
+echo Reset cpuset
+echo "" > test/cpuset.cpus
+
+rmdir test
+cd -
+
+rm -rf final.irqs
+cp -R /sys/kernel/debug/irq/irqs final.irqs
+
+echo Expected to be without major differences:
+diff -r 0.irqs final.irqs && echo No differences
+
+echo errors=$errors
+exit $errors
--
2.45.0
* Re: [RFC PATCH v3 1/3] sched/isolation: Add infrastructure for dynamic CPU isolation
2024-09-16 12:20 ` [RFC PATCH v3 1/3] sched/isolation: Add infrastructure for dynamic CPU isolation Costa Shulyupin
@ 2024-10-02 9:44 ` Thomas Gleixner
0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2024-10-02 9:44 UTC (permalink / raw)
To: Costa Shulyupin, longman, ming.lei, pauld, juri.lelli, vschneid,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Zefan Li, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Costa Shulyupin,
Bjorn Helgaas, linuxppc-dev, linux-kernel, cgroups
On Mon, Sep 16 2024 at 15:20, Costa Shulyupin wrote:
> +/*
> + * housekeeping_update - change housekeeping.cpumasks[type] and propagate the
> + * change.
> + */
> +static int housekeeping_update(enum hk_type type, const struct cpumask *update)
> +{
> + struct {
> + struct cpumask changed;
> + struct cpumask enable;
> + struct cpumask disable;
> + } *masks;
> +
> + masks = kmalloc(sizeof(*masks), GFP_KERNEL);
> + if (!masks)
> + return -ENOMEM;
> +
> + lockdep_assert_cpus_held();
> + cpumask_xor(&masks->changed, housekeeping_cpumask(type), update);
> + cpumask_and(&masks->enable, &masks->changed, update);
> + cpumask_andnot(&masks->disable, &masks->changed, update);
> + cpumask_copy(housekeeping.cpumasks[type], update);
> + WRITE_ONCE(housekeeping.flags, housekeeping.flags | BIT(type));
So this sets the bit for the type
> + if (!static_branch_unlikely(&housekeeping_overridden))
> + static_key_enable_cpuslocked(&housekeeping_overridden.key);
What's the point of doing this on every iteration?
> + kfree(masks);
> +
> + return 0;
> +}
> +
> static int __init housekeeping_setup(char *str, unsigned long flags)
> {
> cpumask_var_t non_housekeeping_mask, housekeeping_staging;
> @@ -327,8 +357,11 @@ int housekeeping_exlude_isolcpus(const struct cpumask *isolcpus, unsigned long f
> /*
> * Reset housekeeping to bootup default
> */
> +
> + for_each_clear_bit(type, &boot_hk_flags, HK_TYPE_MAX)
> + housekeeping_update(type, cpu_possible_mask);
Even for those which are clear
> for_each_set_bit(type, &boot_hk_flags, HK_TYPE_MAX)
> - cpumask_copy(housekeeping.cpumasks[type], boot_hk_cpumask);
> + housekeeping_update(type, boot_hk_cpumask);
>
> WRITE_ONCE(housekeeping.flags, boot_hk_flags);
Just to overwrite them with boot_hk_flags afterwards. That does not make
any sense at all.
Thanks,
tglx
* Re: [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs according to change of housekeeping CPU
2024-09-16 12:20 ` [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs according to change of housekeeping CPU Costa Shulyupin
@ 2024-10-02 10:09 ` Thomas Gleixner
0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2024-10-02 10:09 UTC (permalink / raw)
To: Costa Shulyupin, longman, ming.lei, pauld, juri.lelli, vschneid,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Zefan Li, Tejun Heo, Johannes Weiner, Michal Koutný,
Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Costa Shulyupin,
Bjorn Helgaas, linuxppc-dev, linux-kernel, cgroups
On Mon, Sep 16 2024 at 15:20, Costa Shulyupin wrote:
> Interrupts disturb real-time tasks on affined cpus.
> To ensure CPU isolation for real-time tasks, interrupt handling must
> be adjusted accordingly.
> Non-managed interrupts can be configured from userspace,
> while managed interrupts require adjustments in kernelspace.
>
> Adjust status of managed interrupts according change
> of housekeeping CPUs to support dynamic CPU isolation.
What means 'adjust status' ?
> +
> +/*
> + * managed_irq_isolate() - Deactivate managed interrupts if necessary
> + */
> +// derived from migrate_one_irq, irq_needs_fixup, irq_fixup_move_pending
If at all then this needs to be integrated with migrate_one_irq()
> +static int managed_irq_isolate(struct irq_desc *desc)
> +{
> + struct irq_data *d = irq_desc_get_irq_data(desc);
> + struct irq_chip *chip = irq_data_get_irq_chip(d);
> + const struct cpumask *a;
> + bool maskchip;
> + int err;
> +
> + /*
> + * Deactivate if:
> + * - Interrupt is managed
> + * - Interrupt is not per cpu
> + * - Interrupt is started
> + * - Effective affinity mask includes isolated CPUs
> + */
> + if (!irqd_affinity_is_managed(d) || irqd_is_per_cpu(d) || !irqd_is_started(d)
> + || cpumask_subset(irq_data_get_effective_affinity_mask(d),
> + housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
> + return 0;
> + // TBD: it is required?
> + /*
> + * Complete an eventually pending irq move cleanup. If this
> + * interrupt was moved in hard irq context, then the vectors need
> + * to be cleaned up. It can't wait until this interrupt actually
> + * happens and this CPU was involved.
> + */
> + irq_force_complete_move(desc);
> +
> + if (irqd_is_setaffinity_pending(d)) {
> + irqd_clr_move_pending(d);
> + if (cpumask_intersects(desc->pending_mask,
> + housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
> + a = irq_desc_get_pending_mask(desc);
> + } else {
> + a = irq_data_get_affinity_mask(d);
> + }
> +
> + maskchip = chip->irq_mask && !irq_can_move_pcntxt(d) && !irqd_irq_masked(d);
> + if (maskchip)
> + chip->irq_mask(d);
> +
> + if (!cpumask_intersects(a, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ))) {
> + /*
> + * Shut managed interrupt down and leave the affinity untouched.
> + * The effective affinity is reset to the first online CPU.
> + */
> + irqd_set_managed_shutdown(d);
> + irq_shutdown_and_deactivate(desc);
> + return 0;
Seriously? The interrupt is active and the queue might have outstanding
requests which will never complete because the interrupt is taken away.
On CPU hotplug the related subsystem has shut down the device queue and
drained all outstanding requests. But none of this happens here.
> + }
> +
> + /*
> + * Do not set the force argument of irq_do_set_affinity() as this
> + * disables the masking of offline CPUs from the supplied affinity
> + * mask and therefore might keep/reassign the irq to the outgoing
> + * CPU.
Which outgoing CPU?
> + */
> + err = irq_do_set_affinity(d, a, false);
> + if (err)
> + pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
> + d->irq, err);
> +
> + if (maskchip)
> + chip->irq_unmask(d);
> +
> + return err;
> +}
> +
> +/** managed_irq_affinity_adjust() - Deactivate of restore managed interrupts
> + * according to change of housekeeping cpumask.
> + *
> + * @enable_mask: CPUs for which interrupts should be restored
> + */
> +int managed_irq_affinity_adjust(struct cpumask *enable_mask)
> +{
> + unsigned int irq;
> +
> + for_each_active_irq(irq) {
What ensures that this iteration is safe?
> + struct irq_desc *desc = irq_to_desc(irq);
And that the descriptor is valid?
> + unsigned int cpu;
> +
> + for_each_cpu(cpu, enable_mask)
> + irq_restore_affinity_of_irq(desc, cpu);
And what protects irq_restore_affinity_of_irq() against other operations
on @desc?
> + raw_spin_lock(&desc->lock);
What disables interrupts here in the runtime case?
> + managed_irq_isolate(desc);
> + raw_spin_unlock(&desc->lock);
> + }
> +
> + return 0;
That return value has which purpose?
None of this can work at runtime.
Thanks,
tglx