* [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
@ 2025-03-17 10:42 Ingo Molnar
  2025-03-17 10:42 ` [PATCH 1/5] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() Ingo Molnar
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 10:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
in all the major Linux distributions:

   /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y

The reason is that while CONFIG_SCHED_DEBUG originally started
out as a debugging feature, over the years (decades ...) it has
grown various bits of statistics, instrumentation and
control knobs that are useful for system administration and
general software development purposes as well.

But within the kernel we still pretend that there's a choice,
and code that is seemingly 'debug only' sometimes creates
overhead that in reality should be optimized away.

So make it all official and make CONFIG_SCHED_DEBUG unconditional.
This gets rid of a large number of #ifdefs, so good riddance ...
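
As a representative example of the scaffolding being zapped, this is
the SCHED_WARN_ON() definition from kernel/sched/sched.h, removed by
patch #1 of this series:

   #ifdef CONFIG_SCHED_DEBUG
   # define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
   #else
   # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
   #endif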

Ingo Molnar (5):
  sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  sched/debug: Remove CONFIG_SCHED_DEBUG

 Documentation/scheduler/sched-debug.rst                         |  2 +-
 Documentation/scheduler/sched-design-CFS.rst                    |  2 +-
 Documentation/scheduler/sched-domains.rst                       |  5 +-
 Documentation/scheduler/sched-ext.rst                           |  3 +-
 Documentation/scheduler/sched-stats.rst                         |  2 +-
 Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst |  2 +-
 fs/proc/base.c                                                  |  7 ---
 include/linux/energy_model.h                                    |  2 -
 include/linux/sched/debug.h                                     |  2 -
 include/linux/sched/topology.h                                  |  4 --
 include/trace/events/sched.h                                    |  2 -
 kernel/sched/build_utility.c                                    |  4 +-
 kernel/sched/core.c                                             | 46 ++++++----------
 kernel/sched/core_sched.c                                       |  2 +-
 kernel/sched/deadline.c                                         | 14 +++--
 kernel/sched/ext.c                                              |  2 +-
 kernel/sched/fair.c                                             | 64 +++++++++++-----------
 kernel/sched/rt.c                                               |  7 +--
 kernel/sched/sched.h                                            | 83 +++++------------------------
 kernel/sched/stats.h                                            |  2 +-
 kernel/sched/topology.c                                         | 13 -----
 lib/Kconfig.debug                                               |  9 ----
 22 files changed, 79 insertions(+), 200 deletions(-)

-- 
2.45.2



* [PATCH 1/5] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
@ 2025-03-17 10:42 ` Ingo Molnar
  2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2025-03-17 10:42 ` [PATCH 2/5] sched/debug: Make 'const_debug' tunables unconditional __read_mostly Ingo Molnar
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 10:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

The scheduler has this special SCHED_WARN_ON() facility that
depends on CONFIG_SCHED_DEBUG.

Since CONFIG_SCHED_DEBUG is getting removed, convert
SCHED_WARN_ON() to WARN_ON_ONCE().

Note that the warning output isn't 100% equivalent:

   #define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)

This is because SCHED_WARN_ON() outputs the stringified 'x'
condition as well, while WARN_ON_ONCE() only shows a backtrace.

Hopefully these warnings are rare enough to not really matter.

If they do matter, we should probably introduce a new WARN_ON()
variant that outputs the condition in stringified form,
or improve WARN_ON() itself.
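
A minimal sketch of what such a variant could look like (illustrative
only, not part of this series), reusing the existing WARN_ONCE() helper:

   #define WARN_ON_ONCE_STR(x)   WARN_ONCE(x, "%s", #x)

Passing #x through "%s" rather than using it as the format string also
avoids mis-parsing conditions that happen to contain a '%' character.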

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c       | 24 ++++++++++++------------
 kernel/sched/core_sched.c |  2 +-
 kernel/sched/deadline.c   | 12 ++++++------
 kernel/sched/ext.c        |  2 +-
 kernel/sched/fair.c       | 58 +++++++++++++++++++++++++++++-----------------------------
 kernel/sched/rt.c         |  2 +-
 kernel/sched/sched.h      | 16 +++++-----------
 kernel/sched/stats.h      |  2 +-
 8 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 03d7b63dc3e5..2da197b2968b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -801,7 +801,7 @@ void update_rq_clock(struct rq *rq)
 
 #ifdef CONFIG_SCHED_DEBUG
 	if (sched_feat(WARN_DOUBLE_CLOCK))
-		SCHED_WARN_ON(rq->clock_update_flags & RQCF_UPDATED);
+		WARN_ON_ONCE(rq->clock_update_flags & RQCF_UPDATED);
 	rq->clock_update_flags |= RQCF_UPDATED;
 #endif
 	clock = sched_clock_cpu(cpu_of(rq));
@@ -1719,7 +1719,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 
 	bucket = &uc_rq->bucket[uc_se->bucket_id];
 
-	SCHED_WARN_ON(!bucket->tasks);
+	WARN_ON_ONCE(!bucket->tasks);
 	if (likely(bucket->tasks))
 		bucket->tasks--;
 
@@ -1739,7 +1739,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	 * Defensive programming: this should never happen. If it happens,
 	 * e.g. due to future modification, warn and fix up the expected value.
 	 */
-	SCHED_WARN_ON(bucket->value > rq_clamp);
+	WARN_ON_ONCE(bucket->value > rq_clamp);
 	if (bucket->value >= rq_clamp) {
 		bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
 		uclamp_rq_set(rq, clamp_id, bkt_clamp);
@@ -2121,7 +2121,7 @@ void activate_task(struct rq *rq, struct task_struct *p, int flags)
 
 void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	SCHED_WARN_ON(flags & DEQUEUE_SLEEP);
+	WARN_ON_ONCE(flags & DEQUEUE_SLEEP);
 
 	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
 	ASSERT_EXCLUSIVE_WRITER(p->on_rq);
@@ -2726,7 +2726,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 	 * XXX do further audits, this smells like something putrid.
 	 */
 	if (ctx->flags & SCA_MIGRATE_DISABLE)
-		SCHED_WARN_ON(!p->on_cpu);
+		WARN_ON_ONCE(!p->on_cpu);
 	else
 		lockdep_assert_held(&p->pi_lock);
 
@@ -4195,7 +4195,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 *  - we're serialized against set_special_state() by virtue of
 		 *    it disabling IRQs (this allows not taking ->pi_lock).
 		 */
-		SCHED_WARN_ON(p->se.sched_delayed);
+		WARN_ON_ONCE(p->se.sched_delayed);
 		if (!ttwu_state_match(p, state, &success))
 			goto out;
 
@@ -4489,7 +4489,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	INIT_LIST_HEAD(&p->se.group_node);
 
 	/* A delayed task cannot be in clone(). */
-	SCHED_WARN_ON(p->se.sched_delayed);
+	WARN_ON_ONCE(p->se.sched_delayed);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq			= NULL;
@@ -5745,7 +5745,7 @@ static void sched_tick_remote(struct work_struct *work)
 			 * we are always sure that there is no proxy (only a
 			 * single task is running).
 			 */
-			SCHED_WARN_ON(rq->curr != rq->donor);
+			WARN_ON_ONCE(rq->curr != rq->donor);
 			update_rq_clock(rq);
 
 			if (!is_idle_task(curr)) {
@@ -5965,7 +5965,7 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
 		preempt_count_set(PREEMPT_DISABLED);
 	}
 	rcu_sleep_check();
-	SCHED_WARN_ON(ct_state() == CT_STATE_USER);
+	WARN_ON_ONCE(ct_state() == CT_STATE_USER);
 
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
@@ -6811,7 +6811,7 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * deadlock if the callback attempts to acquire a lock which is
 	 * already acquired.
 	 */
-	SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);
+	WARN_ON_ONCE(current->__state & TASK_RTLOCK_WAIT);
 
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
@@ -9202,7 +9202,7 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 	unsigned int clamps;
 
 	lockdep_assert_held(&uclamp_mutex);
-	SCHED_WARN_ON(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held());
 
 	css_for_each_descendant_pre(css, top_css) {
 		uc_parent = css_tg(css)->parent
@@ -10537,7 +10537,7 @@ static void task_mm_cid_work(struct callback_head *work)
 	struct mm_struct *mm;
 	int weight, cpu;
 
-	SCHED_WARN_ON(t != container_of(work, struct task_struct, cid_work));
+	WARN_ON_ONCE(t != container_of(work, struct task_struct, cid_work));
 
 	work->next = work;	/* Prevent double-add */
 	if (t->flags & PF_EXITING)
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 1ef98a93eb1d..c4606ca89210 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -65,7 +65,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	 * a cookie until after we've removed it, we must have core scheduling
 	 * enabled here.
 	 */
-	SCHED_WARN_ON((p->core_cookie || cookie) && !sched_core_enabled(rq));
+	WARN_ON_ONCE((p->core_cookie || cookie) && !sched_core_enabled(rq));
 
 	if (sched_core_enqueued(p))
 		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ff4df16b5186..b18c80272f86 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -249,8 +249,8 @@ void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->running_bw += dl_bw;
-	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
-	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+	WARN_ON_ONCE(dl_rq->running_bw < old); /* overflow */
+	WARN_ON_ONCE(dl_rq->running_bw > dl_rq->this_bw);
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
 	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
 }
@@ -262,7 +262,7 @@ void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->running_bw -= dl_bw;
-	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
+	WARN_ON_ONCE(dl_rq->running_bw > old); /* underflow */
 	if (dl_rq->running_bw > old)
 		dl_rq->running_bw = 0;
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
@@ -276,7 +276,7 @@ void __add_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->this_bw += dl_bw;
-	SCHED_WARN_ON(dl_rq->this_bw < old); /* overflow */
+	WARN_ON_ONCE(dl_rq->this_bw < old); /* overflow */
 }
 
 static inline
@@ -286,10 +286,10 @@ void __sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->this_bw -= dl_bw;
-	SCHED_WARN_ON(dl_rq->this_bw > old); /* underflow */
+	WARN_ON_ONCE(dl_rq->this_bw > old); /* underflow */
 	if (dl_rq->this_bw > old)
 		dl_rq->this_bw = 0;
-	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+	WARN_ON_ONCE(dl_rq->running_bw > dl_rq->this_bw);
 }
 
 static inline
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da199cfc7..953a5b9ec0cd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2341,7 +2341,7 @@ static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq,
 {
 	int cpu = cpu_of(rq);
 
-	SCHED_WARN_ON(task_cpu(p) == cpu);
+	WARN_ON_ONCE(task_cpu(p) == cpu);
 
 	/*
 	 * If @p has migration disabled, @p->cpus_ptr is updated to contain only
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9dafb374d76d..89609ebd4904 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -399,7 +399,7 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 
 static inline void assert_list_leaf_cfs_rq(struct rq *rq)
 {
-	SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
+	WARN_ON_ONCE(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
 }
 
 /* Iterate through all leaf cfs_rq's on a runqueue */
@@ -696,7 +696,7 @@ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	s64 vlag, limit;
 
-	SCHED_WARN_ON(!se->on_rq);
+	WARN_ON_ONCE(!se->on_rq);
 
 	vlag = avg_vruntime(cfs_rq) - se->vruntime;
 	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
@@ -3317,7 +3317,7 @@ static void task_numa_work(struct callback_head *work)
 	bool vma_pids_skipped;
 	bool vma_pids_forced = false;
 
-	SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
+	WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));
 
 	work->next = work;
 	/*
@@ -4036,7 +4036,7 @@ static inline bool load_avg_is_decayed(struct sched_avg *sa)
 	 * Make sure that rounding and/or propagation of PELT values never
 	 * break this.
 	 */
-	SCHED_WARN_ON(sa->load_avg ||
+	WARN_ON_ONCE(sa->load_avg ||
 		      sa->util_avg ||
 		      sa->runnable_avg);
 
@@ -5460,7 +5460,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	clear_buddies(cfs_rq, se);
 
 	if (flags & DEQUEUE_DELAYED) {
-		SCHED_WARN_ON(!se->sched_delayed);
+		WARN_ON_ONCE(!se->sched_delayed);
 	} else {
 		bool delay = sleep;
 		/*
@@ -5470,7 +5470,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		if (flags & DEQUEUE_SPECIAL)
 			delay = false;
 
-		SCHED_WARN_ON(delay && se->sched_delayed);
+		WARN_ON_ONCE(delay && se->sched_delayed);
 
 		if (sched_feat(DELAY_DEQUEUE) && delay &&
 		    !entity_eligible(cfs_rq, se)) {
@@ -5551,7 +5551,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	}
 
 	update_stats_curr_start(cfs_rq, se);
-	SCHED_WARN_ON(cfs_rq->curr);
+	WARN_ON_ONCE(cfs_rq->curr);
 	cfs_rq->curr = se;
 
 	/*
@@ -5592,7 +5592,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	if (sched_feat(PICK_BUDDY) &&
 	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
 		/* ->next will never be delayed */
-		SCHED_WARN_ON(cfs_rq->next->sched_delayed);
+		WARN_ON_ONCE(cfs_rq->next->sched_delayed);
 		return cfs_rq->next;
 	}
 
@@ -5628,7 +5628,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
-	SCHED_WARN_ON(cfs_rq->curr != prev);
+	WARN_ON_ONCE(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
 }
 
@@ -5851,7 +5851,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 
 			cfs_rq->throttled_clock_self = 0;
 
-			if (SCHED_WARN_ON((s64)delta < 0))
+			if (WARN_ON_ONCE((s64)delta < 0))
 				delta = 0;
 
 			cfs_rq->throttled_clock_self_time += delta;
@@ -5871,7 +5871,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
 		list_del_leaf_cfs_rq(cfs_rq);
 
-		SCHED_WARN_ON(cfs_rq->throttled_clock_self);
+		WARN_ON_ONCE(cfs_rq->throttled_clock_self);
 		if (cfs_rq->nr_queued)
 			cfs_rq->throttled_clock_self = rq_clock(rq);
 	}
@@ -5980,7 +5980,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	 * throttled-list.  rq->lock protects completion.
 	 */
 	cfs_rq->throttled = 1;
-	SCHED_WARN_ON(cfs_rq->throttled_clock);
+	WARN_ON_ONCE(cfs_rq->throttled_clock);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock = rq_clock(rq);
 	return true;
@@ -6136,7 +6136,7 @@ static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 	}
 
 	/* Already enqueued */
-	if (SCHED_WARN_ON(!list_empty(&cfs_rq->throttled_csd_list)))
+	if (WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_csd_list)))
 		return;
 
 	first = list_empty(&rq->cfsb_csd_list);
@@ -6155,7 +6155,7 @@ static void unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 {
 	lockdep_assert_rq_held(rq_of(cfs_rq));
 
-	if (SCHED_WARN_ON(!cfs_rq_throttled(cfs_rq) ||
+	if (WARN_ON_ONCE(!cfs_rq_throttled(cfs_rq) ||
 	    cfs_rq->runtime_remaining <= 0))
 		return;
 
@@ -6191,7 +6191,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 			goto next;
 
 		/* By the above checks, this should never be true */
-		SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
+		WARN_ON_ONCE(cfs_rq->runtime_remaining > 0);
 
 		raw_spin_lock(&cfs_b->lock);
 		runtime = -cfs_rq->runtime_remaining + 1;
@@ -6212,7 +6212,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 				 * We currently only expect to be unthrottling
 				 * a single cfs_rq locally.
 				 */
-				SCHED_WARN_ON(!list_empty(&local_unthrottle));
+				WARN_ON_ONCE(!list_empty(&local_unthrottle));
 				list_add_tail(&cfs_rq->throttled_csd_list,
 					      &local_unthrottle);
 			}
@@ -6237,7 +6237,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 
 		rq_unlock_irqrestore(rq, &rf);
 	}
-	SCHED_WARN_ON(!list_empty(&local_unthrottle));
+	WARN_ON_ONCE(!list_empty(&local_unthrottle));
 
 	rcu_read_unlock();
 
@@ -6789,7 +6789,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 
-	SCHED_WARN_ON(task_rq(p) != rq);
+	WARN_ON_ONCE(task_rq(p) != rq);
 
 	if (rq->cfs.h_nr_queued > 1) {
 		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
@@ -6900,8 +6900,8 @@ requeue_delayed_entity(struct sched_entity *se)
 	 * Because a delayed entity is one that is still on
 	 * the runqueue competing until elegibility.
 	 */
-	SCHED_WARN_ON(!se->sched_delayed);
-	SCHED_WARN_ON(!se->on_rq);
+	WARN_ON_ONCE(!se->sched_delayed);
+	WARN_ON_ONCE(!se->on_rq);
 
 	if (sched_feat(DELAY_ZERO)) {
 		update_entity_lag(cfs_rq, se);
@@ -7161,8 +7161,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		rq->next_balance = jiffies;
 
 	if (p && task_delayed) {
-		SCHED_WARN_ON(!task_sleep);
-		SCHED_WARN_ON(p->on_rq != 1);
+		WARN_ON_ONCE(!task_sleep);
+		WARN_ON_ONCE(p->on_rq != 1);
 
 		/* Fix-up what dequeue_task_fair() skipped */
 		hrtick_update(rq);
@@ -8740,7 +8740,7 @@ static inline void set_task_max_allowed_capacity(struct task_struct *p) {}
 static void set_next_buddy(struct sched_entity *se)
 {
 	for_each_sched_entity(se) {
-		if (SCHED_WARN_ON(!se->on_rq))
+		if (WARN_ON_ONCE(!se->on_rq))
 			return;
 		if (se_is_idle(se))
 			return;
@@ -12484,7 +12484,7 @@ static void set_cpu_sd_state_busy(int cpu)
 
 void nohz_balance_exit_idle(struct rq *rq)
 {
-	SCHED_WARN_ON(rq != this_rq());
+	WARN_ON_ONCE(rq != this_rq());
 
 	if (likely(!rq->nohz_tick_stopped))
 		return;
@@ -12520,7 +12520,7 @@ void nohz_balance_enter_idle(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	SCHED_WARN_ON(cpu != smp_processor_id());
+	WARN_ON_ONCE(cpu != smp_processor_id());
 
 	/* If this CPU is going down, then nothing needs to be done: */
 	if (!cpu_active(cpu))
@@ -12603,7 +12603,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 	int balance_cpu;
 	struct rq *rq;
 
-	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
+	WARN_ON_ONCE((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
 	/*
 	 * We assume there will be no idle load after this update and clear
@@ -13043,7 +13043,7 @@ bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
 	struct cfs_rq *cfs_rqb;
 	s64 delta;
 
-	SCHED_WARN_ON(task_rq(b)->core != rq->core);
+	WARN_ON_ONCE(task_rq(b)->core != rq->core);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
@@ -13246,7 +13246,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
-	SCHED_WARN_ON(p->se.sched_delayed);
+	WARN_ON_ONCE(p->se.sched_delayed);
 
 	attach_task_cfs_rq(p);
 
@@ -13281,7 +13281,7 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 	if (!first)
 		return;
 
-	SCHED_WARN_ON(se->sched_delayed);
+	WARN_ON_ONCE(se->sched_delayed);
 
 	if (hrtick_enabled_fair(rq))
 		hrtick_start_fair(rq, p);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4b8e33c615b1..926281ac3ac0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1713,7 +1713,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
 	BUG_ON(idx >= MAX_RT_PRIO);
 
 	queue = array->queue + idx;
-	if (SCHED_WARN_ON(list_empty(queue)))
+	if (WARN_ON_ONCE(list_empty(queue)))
 		return NULL;
 	next = list_entry(queue->next, struct sched_rt_entity, run_list);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0212a0c5534a..189f7b033dab 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -91,12 +91,6 @@ struct cpuidle_state;
 #include "cpupri.h"
 #include "cpudeadline.h"
 
-#ifdef CONFIG_SCHED_DEBUG
-# define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
-#else
-# define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
-#endif
-
 /* task_struct::on_rq states: */
 #define TASK_ON_RQ_QUEUED	1
 #define TASK_ON_RQ_MIGRATING	2
@@ -1571,7 +1565,7 @@ static inline void update_idle_core(struct rq *rq) { }
 
 static inline struct task_struct *task_of(struct sched_entity *se)
 {
-	SCHED_WARN_ON(!entity_is_task(se));
+	WARN_ON_ONCE(!entity_is_task(se));
 	return container_of(se, struct task_struct, se);
 }
 
@@ -1652,7 +1646,7 @@ static inline void assert_clock_updated(struct rq *rq)
 	 * The only reason for not seeing a clock update since the
 	 * last rq_pin_lock() is if we're currently skipping updates.
 	 */
-	SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
+	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
 }
 
 static inline u64 rq_clock(struct rq *rq)
@@ -1699,7 +1693,7 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
 static inline void rq_clock_start_loop_update(struct rq *rq)
 {
 	lockdep_assert_rq_held(rq);
-	SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+	WARN_ON_ONCE(rq->clock_update_flags & RQCF_ACT_SKIP);
 	rq->clock_update_flags |= RQCF_ACT_SKIP;
 }
 
@@ -1774,7 +1768,7 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
 # ifdef CONFIG_SMP
-	SCHED_WARN_ON(rq->balance_callback && rq->balance_callback != &balance_push_callback);
+	WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
 # endif
 #endif
 }
@@ -2685,7 +2679,7 @@ static inline void idle_set_state(struct rq *rq,
 
 static inline struct cpuidle_state *idle_get_state(struct rq *rq)
 {
-	SCHED_WARN_ON(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held());
 
 	return rq->idle_state;
 }
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 19cdbe96f93d..452826df6ae1 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -144,7 +144,7 @@ static inline void psi_enqueue(struct task_struct *p, int flags)
 
 	if (p->se.sched_delayed) {
 		/* CPU migration of "sleeping" task */
-		SCHED_WARN_ON(!(flags & ENQUEUE_MIGRATED));
+		WARN_ON_ONCE(!(flags & ENQUEUE_MIGRATED));
 		if (p->in_memstall)
 			set |= TSK_MEMSTALL;
 		if (p->in_iowait)
-- 
2.45.2



* [PATCH 2/5] sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
  2025-03-17 10:42 ` [PATCH 1/5] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() Ingo Molnar
@ 2025-03-17 10:42 ` Ingo Molnar
  2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2025-03-17 10:42 ` [PATCH 3/5] sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional Ingo Molnar
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 10:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

With CONFIG_SCHED_DEBUG becoming unconditional, remove the
extra 'const_debug' indirection towards __read_mostly.
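
For reference, this is the indirection being removed, as defined
in kernel/sched/sched.h:

   #ifdef CONFIG_SCHED_DEBUG
   # define const_debug __read_mostly
   #else
   # define const_debug const
   #endif

With CONFIG_SCHED_DEBUG always enabled, 'const_debug' always expands
to __read_mostly anyway, so spell that out directly.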

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  |  4 ++--
 kernel/sched/fair.c  |  2 +-
 kernel/sched/sched.h | 15 +++++----------
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2da197b2968b..d6833a85e561 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -128,7 +128,7 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
  */
 #define SCHED_FEAT(name, enabled)	\
 	(1UL << __SCHED_FEAT_##name) * enabled |
-const_debug unsigned int sysctl_sched_features =
+__read_mostly unsigned int sysctl_sched_features =
 #include "features.h"
 	0;
 #undef SCHED_FEAT
@@ -148,7 +148,7 @@ __read_mostly int sysctl_resched_latency_warn_once = 1;
  * Number of tasks to iterate in a single balance run.
  * Limited because this is done with IRQs disabled.
  */
-const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
+__read_mostly unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
 
 __read_mostly int scheduler_running;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89609ebd4904..35ee8d9d78d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -79,7 +79,7 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
 unsigned int sysctl_sched_base_slice			= 750000ULL;
 static unsigned int normalized_sysctl_sched_base_slice	= 750000ULL;
 
-const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
+__read_mostly unsigned int sysctl_sched_migration_cost	= 500000UL;
 
 static int __init setup_sched_thermal_decay_shift(char *str)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 189f7b033dab..187a22800577 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2194,13 +2194,8 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 }
 
 /*
- * Tunables that become constants when CONFIG_SCHED_DEBUG is off:
+ * Tunables:
  */
-#ifdef CONFIG_SCHED_DEBUG
-# define const_debug __read_mostly
-#else
-# define const_debug const
-#endif
 
 #define SCHED_FEAT(name, enabled)	\
 	__SCHED_FEAT_##name ,
@@ -2218,7 +2213,7 @@ enum {
  * To support run-time toggling of sched features, all the translation units
  * (but core.c) reference the sysctl_sched_features defined in core.c.
  */
-extern const_debug unsigned int sysctl_sched_features;
+extern __read_mostly unsigned int sysctl_sched_features;
 
 #ifdef CONFIG_JUMP_LABEL
 
@@ -2249,7 +2244,7 @@ extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
  */
 #define SCHED_FEAT(name, enabled)	\
 	(1UL << __SCHED_FEAT_##name) * enabled |
-static const_debug __maybe_unused unsigned int sysctl_sched_features =
+static __read_mostly __maybe_unused unsigned int sysctl_sched_features =
 #include "features.h"
 	0;
 #undef SCHED_FEAT
@@ -2837,8 +2832,8 @@ extern void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags);
 # define SCHED_NR_MIGRATE_BREAK 32
 #endif
 
-extern const_debug unsigned int sysctl_sched_nr_migrate;
-extern const_debug unsigned int sysctl_sched_migration_cost;
+extern __read_mostly unsigned int sysctl_sched_nr_migrate;
+extern __read_mostly unsigned int sysctl_sched_migration_cost;
 
 extern unsigned int sysctl_sched_base_slice;
 
-- 
2.45.2



* [PATCH 3/5] sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
  2025-03-17 10:42 ` [PATCH 1/5] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() Ingo Molnar
  2025-03-17 10:42 ` [PATCH 2/5] sched/debug: Make 'const_debug' tunables unconditional __read_mostly Ingo Molnar
@ 2025-03-17 10:42 ` Ingo Molnar
  2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2025-03-17 10:42 ` [PATCH 4/5] sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation Ingo Molnar
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 10:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

All the big Linux distros enable CONFIG_SCHED_DEBUG, because
the various features it provides help not just with kernel
development, but with system administration and user-space
software development as well.

Reflect this reality and enable this functionality
unconditionally.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 fs/proc/base.c                 |  7 -------
 include/linux/energy_model.h   |  2 --
 include/linux/sched/debug.h    |  2 --
 include/linux/sched/topology.h |  4 ----
 include/trace/events/sched.h   |  2 --
 kernel/sched/build_utility.c   |  4 +---
 kernel/sched/core.c            | 18 +++---------------
 kernel/sched/deadline.c        |  2 --
 kernel/sched/fair.c            |  4 ----
 kernel/sched/rt.c              |  5 +----
 kernel/sched/sched.h           | 54 ++++--------------------------------------------------
 kernel/sched/topology.c        | 13 -------------
 12 files changed, 9 insertions(+), 108 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index cd89e956c322..61526420d0ee 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1489,7 +1489,6 @@ static const struct file_operations proc_fail_nth_operations = {
 #endif
 
 
-#ifdef CONFIG_SCHED_DEBUG
 /*
  * Print out various scheduling related per-task fields:
  */
@@ -1539,8 +1538,6 @@ static const struct file_operations proc_pid_sched_operations = {
 	.release	= single_release,
 };
 
-#endif
-
 #ifdef CONFIG_SCHED_AUTOGROUP
 /*
  * Print out autogroup related information:
@@ -3331,9 +3328,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	ONE("status",     S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits",	  S_IRUGO, proc_pid_limits),
-#ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
-#endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
 #endif
@@ -3682,9 +3677,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("status",    S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits",	 S_IRUGO, proc_pid_limits),
-#ifdef CONFIG_SCHED_DEBUG
 	REG("sched",     S_IRUGO|S_IWUSR, proc_pid_sched_operations),
-#endif
 	NOD("comm",      S_IFREG|S_IRUGO|S_IWUSR,
 			 &proc_tid_comm_inode_operations,
 			 &proc_pid_set_comm_operations, {}),
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 78318d49276d..65efc0f5ea2e 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -240,9 +240,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	struct em_perf_state *ps;
 	int i;
 
-#ifdef CONFIG_SCHED_DEBUG
 	WARN_ONCE(!rcu_read_lock_held(), "EM: rcu read lock needed\n");
-#endif
 
 	if (!sum_util)
 		return 0;
diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
index b5035afa2396..35ed4577a6cc 100644
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -35,12 +35,10 @@ extern void show_stack(struct task_struct *task, unsigned long *sp,
 
 extern void sched_show_task(struct task_struct *p);
 
-#ifdef CONFIG_SCHED_DEBUG
 struct seq_file;
 extern void proc_sched_show_task(struct task_struct *p,
 				 struct pid_namespace *ns, struct seq_file *m);
 extern void proc_sched_set_task(struct task_struct *p);
-#endif
 
 /* Attach to any functions which should be ignored in wchan output. */
 #define __sched		__section(".sched.text")
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7f3dbafe1817..7894653bc70b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -25,16 +25,12 @@ enum {
 };
 #undef SD_FLAG
 
-#ifdef CONFIG_SCHED_DEBUG
-
 struct sd_flag_debug {
 	unsigned int meta_flags;
 	char *name;
 };
 extern const struct sd_flag_debug sd_flag_debug[];
 
-#endif
-
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9ea4c404bd4e..bfd97cce40a1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -193,9 +193,7 @@ static inline long __trace_sched_switch_state(bool preempt,
 {
 	unsigned int state;
 
-#ifdef CONFIG_SCHED_DEBUG
 	BUG_ON(p != current);
-#endif /* CONFIG_SCHED_DEBUG */
 
 	/*
 	 * Preemption ignores task state, therefore preempted tasks are always
diff --git a/kernel/sched/build_utility.c b/kernel/sched/build_utility.c
index 80a3df49ab47..bf9d8db94b70 100644
--- a/kernel/sched/build_utility.c
+++ b/kernel/sched/build_utility.c
@@ -68,9 +68,7 @@
 # include "cpufreq_schedutil.c"
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
-# include "debug.c"
-#endif
+#include "debug.c"
 
 #ifdef CONFIG_SCHEDSTATS
 # include "stats.c"
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d6833a85e561..598b7f241dda 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -118,7 +118,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
-#ifdef CONFIG_SCHED_DEBUG
 /*
  * Debugging: various feature bits
  *
@@ -142,7 +141,6 @@ __read_mostly unsigned int sysctl_sched_features =
  */
 __read_mostly int sysctl_resched_latency_warn_ms = 100;
 __read_mostly int sysctl_resched_latency_warn_once = 1;
-#endif /* CONFIG_SCHED_DEBUG */
 
 /*
  * Number of tasks to iterate in a single balance run.
@@ -799,11 +797,10 @@ void update_rq_clock(struct rq *rq)
 	if (rq->clock_update_flags & RQCF_ACT_SKIP)
 		return;
 
-#ifdef CONFIG_SCHED_DEBUG
 	if (sched_feat(WARN_DOUBLE_CLOCK))
 		WARN_ON_ONCE(rq->clock_update_flags & RQCF_UPDATED);
 	rq->clock_update_flags |= RQCF_UPDATED;
-#endif
+
 	clock = sched_clock_cpu(cpu_of(rq));
 	scx_rq_clock_update(rq, clock);
 
@@ -3291,7 +3288,6 @@ void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
-#ifdef CONFIG_SCHED_DEBUG
 	unsigned int state = READ_ONCE(p->__state);
 
 	/*
@@ -3329,7 +3325,6 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	WARN_ON_ONCE(!cpu_online(new_cpu));
 
 	WARN_ON_ONCE(is_migration_disabled(p));
-#endif
 
 	trace_sched_migrate_task(p, new_cpu);
 
@@ -5577,7 +5572,6 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	return ns;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
 static u64 cpu_resched_latency(struct rq *rq)
 {
 	int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms);
@@ -5622,9 +5616,6 @@ static int __init setup_resched_latency_warn_ms(char *str)
 	return 1;
 }
 __setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
-#else
-static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
-#endif /* CONFIG_SCHED_DEBUG */
 
 /*
  * This function gets called by the timer code, with HZ frequency.
@@ -6718,9 +6709,7 @@ static void __sched notrace __schedule(int sched_mode)
 picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-#ifdef CONFIG_SCHED_DEBUG
 	rq->last_seen_need_resched_ns = 0;
-#endif
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
@@ -7094,7 +7083,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
 			  void *key)
 {
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
+	WARN_ON_ONCE(wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
 	return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);
@@ -7764,10 +7753,9 @@ void show_state_filter(unsigned int state_filter)
 			sched_show_task(p);
 	}
 
-#ifdef CONFIG_SCHED_DEBUG
 	if (!state_filter)
 		sysrq_sched_debug_show();
-#endif
+
 	rcu_read_unlock();
 	/*
 	 * Only show locks if all tasks are dumped:
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b18c80272f86..d352b57f31cf 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3567,9 +3567,7 @@ void dl_bw_free(int cpu, u64 dl_bw)
 }
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
 void print_dl_stats(struct seq_file *m, int cpu)
 {
 	print_dl_rq(m, cpu, &cpu_rq(cpu)->dl);
 }
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 35ee8d9d78d5..a0c4cd26ee07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -983,7 +983,6 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	return best;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
@@ -1010,7 +1009,6 @@ int sched_update_scaling(void)
 	return 0;
 }
 #endif
-#endif
 
 static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
@@ -13668,7 +13666,6 @@ DEFINE_SCHED_CLASS(fair) = {
 #endif
 };
 
-#ifdef CONFIG_SCHED_DEBUG
 void print_cfs_stats(struct seq_file *m, int cpu)
 {
 	struct cfs_rq *cfs_rq, *pos;
@@ -13702,7 +13699,6 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
 	rcu_read_unlock();
 }
 #endif /* CONFIG_NUMA_BALANCING */
-#endif /* CONFIG_SCHED_DEBUG */
 
 __init void init_sched_fair_class(void)
 {
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 926281ac3ac0..8f7c3bfb49ef 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -169,9 +169,8 @@ static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
 
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
-#ifdef CONFIG_SCHED_DEBUG
 	WARN_ON_ONCE(!rt_entity_is_task(rt_se));
-#endif
+
 	return container_of(rt_se, struct task_struct, rt);
 }
 
@@ -2967,7 +2966,6 @@ static int sched_rr_handler(const struct ctl_table *table, int write, void *buff
 }
 #endif /* CONFIG_SYSCTL */
 
-#ifdef CONFIG_SCHED_DEBUG
 void print_rt_stats(struct seq_file *m, int cpu)
 {
 	rt_rq_iter_t iter;
@@ -2978,4 +2976,3 @@ void print_rt_stats(struct seq_file *m, int cpu)
 		print_rt_rq(m, cpu, rt_rq);
 	rcu_read_unlock();
 }
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 187a22800577..ac68db706b7c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1174,10 +1174,8 @@ struct rq {
 
 	atomic_t		nr_iowait;
 
-#ifdef CONFIG_SCHED_DEBUG
 	u64 last_seen_need_resched_ns;
 	int ticks_without_resched;
-#endif
 
 #ifdef CONFIG_MEMBARRIER
 	int membarrier_state;
@@ -1706,14 +1704,12 @@ static inline void rq_clock_stop_loop_update(struct rq *rq)
 struct rq_flags {
 	unsigned long flags;
 	struct pin_cookie cookie;
-#ifdef CONFIG_SCHED_DEBUG
 	/*
 	 * A copy of (rq::clock_update_flags & RQCF_UPDATED) for the
 	 * current pin context is stashed here in case it needs to be
 	 * restored in rq_repin_lock().
 	 */
 	unsigned int clock_update_flags;
-#endif
 };
 
 extern struct balance_callback balance_push_callback;
@@ -1764,21 +1760,18 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	rf->cookie = lockdep_pin_lock(__rq_lockp(rq));
 
-#ifdef CONFIG_SCHED_DEBUG
 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
-# ifdef CONFIG_SMP
+#ifdef CONFIG_SMP
 	WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
-# endif
 #endif
 }
 
 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 {
-#ifdef CONFIG_SCHED_DEBUG
 	if (rq->clock_update_flags > RQCF_ACT_SKIP)
 		rf->clock_update_flags = RQCF_UPDATED;
-#endif
+
 	scx_rq_clock_invalidate(rq);
 	lockdep_unpin_lock(__rq_lockp(rq), rf->cookie);
 }
@@ -1787,12 +1780,10 @@ static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	lockdep_repin_lock(__rq_lockp(rq), rf->cookie);
 
-#ifdef CONFIG_SCHED_DEBUG
 	/*
 	 * Restore the value we stashed in @rf for this pin context.
 	 */
 	rq->clock_update_flags |= rf->clock_update_flags;
-#endif
 }
 
 extern
@@ -2066,9 +2057,7 @@ struct sched_group_capacity {
 	unsigned long		next_update;
 	int			imbalance;		/* XXX unrelated to capacity but shared group state */
 
-#ifdef CONFIG_SCHED_DEBUG
 	int			id;
-#endif
 
 	unsigned long		cpumask[];		/* Balance mask */
 };
@@ -2108,13 +2097,8 @@ static inline struct cpumask *group_balance_mask(struct sched_group *sg)
 
 extern int group_balance_cpu(struct sched_group *sg);
 
-#ifdef CONFIG_SCHED_DEBUG
 extern void update_sched_domain_debugfs(void);
 extern void dirty_sched_domain_sysctl(int cpu);
-#else
-static inline void update_sched_domain_debugfs(void) { }
-static inline void dirty_sched_domain_sysctl(int cpu) { }
-#endif
 
 extern int sched_update_scaling(void);
 
@@ -2207,8 +2191,6 @@ enum {
 
 #undef SCHED_FEAT
 
-#ifdef CONFIG_SCHED_DEBUG
-
 /*
  * To support run-time toggling of sched features, all the translation units
  * (but core.c) reference the sysctl_sched_features defined in core.c.
@@ -2235,24 +2217,6 @@ extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
 
 #endif /* !CONFIG_JUMP_LABEL */
 
-#else /* !SCHED_DEBUG: */
-
-/*
- * Each translation unit has its own copy of sysctl_sched_features to allow
- * constants propagation at compile time and compiler optimization based on
- * features default.
- */
-#define SCHED_FEAT(name, enabled)	\
-	(1UL << __SCHED_FEAT_##name) * enabled |
-static __read_mostly __maybe_unused unsigned int sysctl_sched_features =
-#include "features.h"
-	0;
-#undef SCHED_FEAT
-
-#define sched_feat(x) !!(sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
-
-#endif /* !SCHED_DEBUG */
-
 extern struct static_key_false sched_numa_balancing;
 extern struct static_key_false sched_schedstats;
 
@@ -2837,7 +2801,6 @@ extern __read_mostly unsigned int sysctl_sched_migration_cost;
 
 extern unsigned int sysctl_sched_base_slice;
 
-#ifdef CONFIG_SCHED_DEBUG
 extern int sysctl_resched_latency_warn_ms;
 extern int sysctl_resched_latency_warn_once;
 
@@ -2848,7 +2811,6 @@ extern unsigned int sysctl_numa_balancing_scan_period_min;
 extern unsigned int sysctl_numa_balancing_scan_period_max;
 extern unsigned int sysctl_numa_balancing_scan_size;
 extern unsigned int sysctl_numa_balancing_hot_threshold;
-#endif
 
 #ifdef CONFIG_SCHED_HRTICK
 
@@ -2921,7 +2883,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
 }
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
 /*
  * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to
  * acquire rq lock instead of rq_lock(). So at the end of these two functions
@@ -2936,9 +2897,6 @@ static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
 	rq2->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 #endif
 }
-#else
-static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2) { }
-#endif
 
 #define DEFINE_LOCK_GUARD_2(name, type, _lock, _unlock, ...)				\
 __DEFINE_UNLOCK_GUARD(name, type, _unlock, type *lock2; __VA_ARGS__)			\
@@ -3151,7 +3109,6 @@ extern struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq);
 
-#ifdef	CONFIG_SCHED_DEBUG
 extern bool sched_debug_verbose;
 
 extern void print_cfs_stats(struct seq_file *m, int cpu);
@@ -3162,15 +3119,12 @@ extern void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq);
 extern void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq);
 
 extern void resched_latency_warn(int cpu, u64 latency);
-# ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_NUMA_BALANCING
 extern void show_numa_stats(struct task_struct *p, struct seq_file *m);
 extern void
 print_numa_stats(struct seq_file *m, int node, unsigned long tsf,
 		 unsigned long tpf, unsigned long gsf, unsigned long gpf);
-# endif /* CONFIG_NUMA_BALANCING */
-#else /* !CONFIG_SCHED_DEBUG: */
-static inline void resched_latency_warn(int cpu, u64 latency) { }
-#endif /* !CONFIG_SCHED_DEBUG */
+#endif /* CONFIG_NUMA_BALANCING */
 
 extern void init_cfs_rq(struct cfs_rq *cfs_rq);
 extern void init_rt_rq(struct rt_rq *rt_rq);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..cb0769820b0b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -11,8 +11,6 @@ DEFINE_MUTEX(sched_domains_mutex);
 static cpumask_var_t sched_domains_tmpmask;
 static cpumask_var_t sched_domains_tmpmask2;
 
-#ifdef CONFIG_SCHED_DEBUG
-
 static int __init sched_debug_setup(char *str)
 {
 	sched_debug_verbose = true;
@@ -151,15 +149,6 @@ static void sched_domain_debug(struct sched_domain *sd, int cpu)
 			break;
 	}
 }
-#else /* !CONFIG_SCHED_DEBUG */
-
-# define sched_debug_verbose 0
-# define sched_domain_debug(sd, cpu) do { } while (0)
-static inline bool sched_debug(void)
-{
-	return false;
-}
-#endif /* CONFIG_SCHED_DEBUG */
 
 /* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
 #define SD_FLAG(name, mflags) (name * !!((mflags) & SDF_NEEDS_GROUPS)) |
@@ -2275,9 +2264,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
 			if (!sgc)
 				return -ENOMEM;
 
-#ifdef CONFIG_SCHED_DEBUG
 			sgc->id = j;
-#endif
 
 			*per_cpu_ptr(sdd->sgc, j) = sgc;
 		}
-- 
2.45.2



* [PATCH 4/5] sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
                   ` (2 preceding siblings ...)
  2025-03-17 10:42 ` [PATCH 3/5] sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional Ingo Molnar
@ 2025-03-17 10:42 ` Ingo Molnar
  2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2025-03-17 10:42 ` [PATCH 5/5] sched/debug: Remove CONFIG_SCHED_DEBUG Ingo Molnar
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 10:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

Since the functionality is now enabled unconditionally, remove (most)
references to CONFIG_SCHED_DEBUG from the documentation.

(I left out translations in languages I cannot read.)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/scheduler/sched-debug.rst                         | 2 +-
 Documentation/scheduler/sched-design-CFS.rst                    | 2 +-
 Documentation/scheduler/sched-domains.rst                       | 5 ++---
 Documentation/scheduler/sched-ext.rst                           | 3 +--
 Documentation/scheduler/sched-stats.rst                         | 2 +-
 Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst | 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/Documentation/scheduler/sched-debug.rst b/Documentation/scheduler/sched-debug.rst
index 4d3d24f2a439..b5a92a39eccd 100644
--- a/Documentation/scheduler/sched-debug.rst
+++ b/Documentation/scheduler/sched-debug.rst
@@ -2,7 +2,7 @@
 Scheduler debugfs
 =================
 
-Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+Booting a kernel with debugfs enabled will give access to
 scheduler specific debug files under /sys/kernel/debug/sched. Some of
 those files are described below.
 
diff --git a/Documentation/scheduler/sched-design-CFS.rst b/Documentation/scheduler/sched-design-CFS.rst
index 8786f219fc73..b574a2644c77 100644
--- a/Documentation/scheduler/sched-design-CFS.rst
+++ b/Documentation/scheduler/sched-design-CFS.rst
@@ -96,7 +96,7 @@ picked and the current task is preempted.
 CFS uses nanosecond granularity accounting and does not rely on any jiffies or
 other HZ detail.  Thus the CFS scheduler has no notion of "timeslices" in the
 way the previous scheduler had, and has no heuristics whatsoever.  There is
-only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
+only one central tunable:
 
    /sys/kernel/debug/sched/base_slice_ns
 
diff --git a/Documentation/scheduler/sched-domains.rst b/Documentation/scheduler/sched-domains.rst
index 5e996fe973b1..15e3a4cb304a 100644
--- a/Documentation/scheduler/sched-domains.rst
+++ b/Documentation/scheduler/sched-domains.rst
@@ -73,9 +73,8 @@ Architectures may override the generic domain builder and the default SD flags
 for a given topology level by creating a sched_domain_topology_level array and
 calling set_sched_topology() with this array as the parameter.
 
-The sched-domains debugging infrastructure can be enabled by enabling
-CONFIG_SCHED_DEBUG and adding 'sched_verbose' to your cmdline. If you
-forgot to tweak your cmdline, you can also flip the
+The sched-domains debugging infrastructure can be enabled by adding
+'sched_verbose' to your cmdline. If you forgot to, you can also flip the
 /sys/kernel/debug/sched/verbose knob. This enables an error checking parse of
 the sched domains which should catch most possible errors (described above). It
 also prints out the domain structure in a visual format.
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index c4672d7df2f7..5788a3319630 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -107,8 +107,7 @@ detailed information:
     nr_rejected   : 0
     enable_seq    : 1
 
-If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
-be determined as follows:
+Whether a given task is on sched_ext can be determined as follows:
 
 .. code-block:: none
 
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index caea83d91c67..08b6bc9a315c 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -88,7 +88,7 @@ One of these is produced per domain for each cpu described. (Note that if
 CONFIG_SMP is not defined, *no* domains are utilized and these lines
 will not appear in the output. <name> is an extension to the domain field
 that prints the name of the corresponding sched domain. It can appear in
-schedstat version 17 and above, and requires CONFIG_SCHED_DEBUG.)
+schedstat version 17 and above.)
 
 domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
 
diff --git a/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst b/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst
index dc728c739e28..b35d24464be9 100644
--- a/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst
+++ b/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst
@@ -112,7 +112,7 @@ CFS usa una granularidad de nanosegundos y no depende de ningún
 jiffy o detalles como HZ. De este modo, el gestor de tareas CFS no tiene
 noción de "ventanas de tiempo" de la forma en que tenía el gestor de
 tareas previo, y tampoco tiene heurísticos. Únicamente hay un parámetro
-central ajustable (se ha de cambiar en CONFIG_SCHED_DEBUG):
+central ajustable:
 
    /sys/kernel/debug/sched/base_slice_ns
 
-- 
2.45.2



* [PATCH 5/5] sched/debug: Remove CONFIG_SCHED_DEBUG
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
                   ` (3 preceding siblings ...)
  2025-03-17 10:42 ` [PATCH 4/5] sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation Ingo Molnar
@ 2025-03-17 10:42 ` Ingo Molnar
  2025-03-20  8:59   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
  2025-03-17 21:39 ` [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Linus Torvalds
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 10:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
in all the major Linux distributions:

   /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y

The reason is that while CONFIG_SCHED_DEBUG originally started
out as a debugging feature, over the years (decades ...) it has
grown various bits of statistics, instrumentation and
control knobs that are useful for system administration and
general software development purposes as well.

But within the kernel we still pretend that there's a choice,
and code that is seemingly 'debug only' sometimes creates
overhead that in reality should be optimized away.

So make it all official and make CONFIG_SCHED_DEBUG unconditional.

Now that all uses of CONFIG_SCHED_DEBUG are removed from
the code by previous patches, remove the Kconfig option as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 lib/Kconfig.debug | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06..a2ab693d008d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1301,15 +1301,6 @@ endmenu # "Debug lockups and hangs"
 
 menu "Scheduler Debugging"
 
-config SCHED_DEBUG
-	bool "Collect scheduler debugging info"
-	depends on DEBUG_KERNEL && DEBUG_FS
-	default y
-	help
-	  If you say Y here, the /sys/kernel/debug/sched file will be provided
-	  that can help debug the scheduler. The runtime overhead of this
-	  option is minimal.
-
 config SCHED_INFO
 	bool
 	default n
-- 
2.45.2



* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
                   ` (4 preceding siblings ...)
  2025-03-17 10:42 ` [PATCH 5/5] sched/debug: Remove CONFIG_SCHED_DEBUG Ingo Molnar
@ 2025-03-17 21:39 ` Linus Torvalds
  2025-03-17 22:24   ` Ingo Molnar
  2025-03-19  8:49 ` Valentin Schneider
  2025-03-19 12:48 ` Shrikanth Hegde
  7 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2025-03-17 21:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot

Ingo, please fix your mail setup..

These were all in my spam-box, because you used

    From: Ingo Molnar <mingo@kernel.org>

but sent it using gmail, so the DKIM signature looks like

    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; [...]

and then that results in

       dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE)
header.from=kernel.org;

because the DKIM signature - while a valid signature for gmail - does
not match the kernel.org signature.
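
For a kernel.org From: address, DMARC alignment needs the DKIM
signing domain to match the From: domain, i.e. a signature of
the form:

    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=kernel.org; [...]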

So you need to use 'mail.kernel.org' to send the email to get the
right signature, as documented in

    https://korg.docs.kernel.org/mail.html

otherwise any sane setup will mark all those things as spam.

              Linus


* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-17 21:39 ` [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Linus Torvalds
@ 2025-03-17 22:24   ` Ingo Molnar
  2025-03-17 22:42     ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 22:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Dietmar Eggemann, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Ingo, please fix your mail setup..
> 
> These were all in my spam-box, because you used
> 
>     From: Ingo Molnar <mingo@kernel.org>
> 
> but sent it using gmail, so the DKIM signature looks like
> 
>     DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
>         d=gmail.com; [...]
> 
> and then that results in
> 
>        dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE)
> header.from=kernel.org;
> 
> because the DKIM signature - while a valid signature for gmail - does
> not match the kernel.org signature.
> 
> So you need to use 'mail.kernel.org' to send the email to get the
> right signature, as documented in
> 
>     https://korg.docs.kernel.org/mail.html
> 
> otherwise any sane setup will mark all those things as spam.

Sorry about that!

(And I just sent out another series with the same flawed script ...)

I thought I had fixed that all up, but apparently only for my main 
Mutt setup, not for some of my older Git-patchbomb scripts that used 
.gitconfig's [sendemail]. :-/
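
For reference, the korg setup boils down to a [sendemail] stanza along
these lines (illustrative values, not my literal config):

    [sendemail]
        smtpServer = mail.kernel.org
        smtpServerPort = 587
        smtpEncryption = tls
        smtpUser = mingo

so that git-send-email relays through mail.kernel.org and the DKIM d=
domain matches the kernel.org From: address.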

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-17 22:24   ` Ingo Molnar
@ 2025-03-17 22:42     ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2025-03-17 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Dietmar Eggemann, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Ingo, please fix your mail setup..
> > 
> > These were all in my spam-box, because you used
> > 
> >     From: Ingo Molnar <mingo@kernel.org>
> > 
> > but sent it using gmail, so the DKIM signature looks like
> > 
> >     DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
> >         d=gmail.com; [...]
> > 
> > and then that results in
> > 
> >        dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE)
> > header.from=kernel.org;
> > 
> > because the DKIM signature - while a valid signature for gmail - does
> > not match the kernel.org signature.
> > 
> > So you need to use 'mail.kernel.org' to send the email to get the
> > right signature, as documented in
> > 
> >     https://korg.docs.kernel.org/mail.html
> > 
> > otherwise any sane setup will mark all those things as spam.
> 
> Sorry about that!
> 
> (And I just sent out another series with the same flawed script ...)
> 
> I thought I had fixed that all up, but apparently only for my main 
> Mutt setup, not for some of my older Git-patchbomb scripts that used 
> .gitconfig's [sendemail]. :-/

BTW., the reason I didn't notice this sooner is that I read lkml via 
a local maildir representation of the Lore Git archive (all hail Konstantin),
where these mails showed up just fine.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
                   ` (5 preceding siblings ...)
  2025-03-17 21:39 ` [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Linus Torvalds
@ 2025-03-19  8:49 ` Valentin Schneider
  2025-03-19 21:09   ` Ingo Molnar
  2025-03-19 12:48 ` Shrikanth Hegde
  7 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2025-03-19  8:49 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Shrikanth Hegde,
	Thomas Gleixner, Steven Rostedt, Mel Gorman, Vincent Guittot

On 17/03/25 11:42, Ingo Molnar wrote:
> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> in all the major Linux distributions:
>
>    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
>
> The reason is that while originally CONFIG_SCHED_DEBUG started
> out as a debugging feature, over the years (decades ...) it has
> grown various bits of statistics, instrumentation and
> control knobs that are useful for sysadmin and general software
> development purposes as well.
>
> But within the kernel we still pretend that there's a choice,
> and sometimes code that is seemingly 'debug only' creates overhead
> that should be optimized in reality.
>
> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> This gets rid of a large amount of #ifdefs, so good riddance ...
>

Pretty much every distro I'm aware of has CONFIG_SCHED_DEBUG=y; a quick check
tells me it's been like so for RHEL since at least 2013, and that's from a
commit copying configs from RHEL-6 to RHEL-7.

Two things however come to mind:

1) What does this mean for the debug stuff we've repeatedly said wasn't ABI
   because it was under CONFIG_SCHED_DEBUG? I've been burned by making
   sched_domain.flags read-only, and there's still writable stuff:

   # ls -al /sys/kernel/debug/sched/domains/cpu0/domain0/
   total 0
   drwxr-xr-x. 2 root root 0 Mar 19 04:36 .
   drwxr-xr-x. 3 root root 0 Mar 19 04:36 ..
   -rw-r--r--. 1 root root 0 Mar 19 04:36 busy_factor
   -rw-r--r--. 1 root root 0 Mar 19 04:36 cache_nice_tries
   -r--r--r--. 1 root root 0 Mar 19 04:36 flags
   -r--r--r--. 1 root root 0 Mar 19 04:36 groups_flags
   -rw-r--r--. 1 root root 0 Mar 19 04:36 imbalance_pct
   -r--r--r--. 1 root root 0 Mar 19 04:36 level
   -rw-r--r--. 1 root root 0 Mar 19 04:36 max_interval
   -rw-r--r--. 1 root root 0 Mar 19 04:36 max_newidle_lb_cost
   -rw-r--r--. 1 root root 0 Mar 19 04:36 min_interval
   -r--r--r--. 1 root root 0 Mar 19 04:36 name

   + all the non topology related debug knobs.

2) Peter mentioned a few times that, last time it was benchmarked, there
   were noticeable perf differences between CONFIG_SCHED_DEBUG=n and
   CONFIG_SCHED_DEBUG=y. This would be an occasion to re-measure that and
   potentially move (some of) these checks to e.g. a sched_debug_verbose
   static key.
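
   Roughly the shape I have in mind (hypothetical key name, not actual
   kernel code):

      DEFINE_STATIC_KEY_FALSE(sched_debug_verbose_key);

      static inline void sched_debug_clock_check(struct rq *rq)
      {
              /* Patched down to a NOP when the key is disabled: */
              if (static_branch_unlikely(&sched_debug_verbose_key))
                      WARN_ON_ONCE(rq->clock_update_flags & RQCF_UPDATED);
      }

   i.e. the checks would stay available at runtime but cost (nearly)
   nothing when debugging is off.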


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
                   ` (6 preceding siblings ...)
  2025-03-19  8:49 ` Valentin Schneider
@ 2025-03-19 12:48 ` Shrikanth Hegde
  2025-03-19 21:14   ` Ingo Molnar
  7 siblings, 1 reply; 30+ messages in thread
From: Shrikanth Hegde @ 2025-03-19 12:48 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: Dietmar Eggemann, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Valentin Schneider, Steven Rostedt, Mel Gorman, Vincent Guittot



On 3/17/25 16:12, Ingo Molnar wrote:
> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> in all the major Linux distributions:
> 
>     /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
> 
> The reason is that while originally CONFIG_SCHED_DEBUG started
> out as a debugging feature, over the years (decades ...) it has
> grown various bits of statistics, instrumentation and
> control knobs that are useful for sysadmin and general software
> development purposes as well.

A tunable like base_slice, which is the only tunable available for EEVDF, sits behind the debug config.
So one option is to get rid of CONFIG_SCHED_DEBUG and make it available to all.
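
For example, with debugfs mounted (value shown is illustrative):

   # cat /sys/kernel/debug/sched/base_slice_ns
   3000000
   # echo 3500000 > /sys/kernel/debug/sched/base_slice_ns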

We had seen a performance regression when the domains folder was rebuilt on cpu hotplug.
Later that was changed to happen only if verbose is enabled. Maybe something like that
can be done if something is hurting performance.

> 
> But within the kernel we still pretend that there's a choice,
> and sometimes code that is seemingly 'debug only' creates overhead
> that should be optimized in reality.
> 
> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> This gets rid of a large amount of #ifdefs, so good riddance ...
> 

There are some references in selftests like these; maybe remove them as well?

tools/testing/selftests/sched_ext/config:CONFIG_SCHED_DEBUG=y
tools/testing/selftests/sched/config:CONFIG_SCHED_DEBUG=y
tools/testing/selftests/wireguard/qemu/debug.config:CONFIG_SCHED_DEBUG=y


Also ran unixbench and hackbench on an 80 CPU system (1 NUMA node) with and without CONFIG_SCHED_DEBUG.
hackbench numbers are almost the same.
For unixbench, process creation/context switching show a 1-2% improvement with CONFIG_SCHED_DEBUG=n.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-19  8:49 ` Valentin Schneider
@ 2025-03-19 21:09   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2025-03-19 21:09 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Shrikanth Hegde, Thomas Gleixner, Steven Rostedt, Mel Gorman,
	Vincent Guittot


* Valentin Schneider <vschneid@redhat.com> wrote:

> On 17/03/25 11:42, Ingo Molnar wrote:
> > For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> > in all the major Linux distributions:
> >
> >    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
> >
> > The reason is that while originally CONFIG_SCHED_DEBUG started
> > out as a debugging feature, over the years (decades ...) it has
> > grown various bits of statistics, instrumentation and
> > control knobs that are useful for sysadmin and general software
> > development purposes as well.
> >
> > But within the kernel we still pretend that there's a choice,
> > and sometimes code that is seemingly 'debug only' creates overhead
> > that should be optimized in reality.
> >
> > So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> > This gets rid of a large amount of #ifdefs, so good riddance ...
> >
> 
> Pretty much every distro I'm aware of has CONFIG_SCHED_DEBUG=y; a quick check
> tells me it's been like so for RHEL since at least 2013, and that's from a
> commit copying configs from RHEL-6 to RHEL-7.
> 
> Two things however come to mind:
> 
> 1) What does this mean for the debug stuff we've repeatedly said wasn't ABI
>    because it was under CONFIG_SCHED_DEBUG? I've been burned by making
>    sched_domain.flags read-only, and there's still writable stuff:
> 
>    # ls -al /sys/kernel/debug/sched/domains/cpu0/domain0/
>    total 0
>    drwxr-xr-x. 2 root root 0 Mar 19 04:36 .
>    drwxr-xr-x. 3 root root 0 Mar 19 04:36 ..
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 busy_factor
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 cache_nice_tries
>    -r--r--r--. 1 root root 0 Mar 19 04:36 flags
>    -r--r--r--. 1 root root 0 Mar 19 04:36 groups_flags
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 imbalance_pct
>    -r--r--r--. 1 root root 0 Mar 19 04:36 level
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 max_interval
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 max_newidle_lb_cost
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 min_interval
>    -r--r--r--. 1 root root 0 Mar 19 04:36 name
> 
>    + all the non topology related debug knobs.

Yeah, I don't think these or other sysctls are as contentious as 
previously thought. We might want to put '/debug/' into the directory 
name above, or we could move it over to debugfs entirely - but we 
should make it clear via the name that these are debugging knobs in 
essence.

> 2) Peter mentioned a few times that, last time it was benchmarked, there
>    were noticeable perf differences between CONFIG_SCHED_DEBUG=n and
>    CONFIG_SCHED_DEBUG=y. This would be an occasion to re-measure that and
>    potentially move (some of) these checks to e.g. a sched_debug_verbose
>    static key.

Yeah, and this is an argument strongly *in favor* of eliminating 
CONFIG_SCHED_DEBUG: in a way the CONFIG_SCHED_DEBUG "option" created a 
false sense of "it's only debug code". But it's not a genuine debug 
option; it's actual overhead for the vast majority of Linux distros and 
users.

So let's just eliminate SCHED_DEBUG, and fix any overhead. It's exactly 
what we should do anyway - nothing changes IMHO, just the appearance of 
urgency. :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-19 12:48 ` Shrikanth Hegde
@ 2025-03-19 21:14   ` Ingo Molnar
  2025-03-20  4:41     ` Shrikanth Hegde
  2025-03-20  9:00     ` [tip: sched/core] sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files tip-bot2 for Ingo Molnar
  0 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2025-03-19 21:14 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot


* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:

> 
> 
> On 3/17/25 16:12, Ingo Molnar wrote:
> > For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> > in all the major Linux distributions:
> > 
> >     /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
> > 
> > The reason is that while originally CONFIG_SCHED_DEBUG started
> > out as a debugging feature, over the years (decades ...) it has
> > grown various bits of statistics, instrumentation and
> > control knobs that are useful for sysadmin and general software
> > development purposes as well.
> 
> A tunable like base_slice, which is the only tunable available for EEVDF, sits behind the debug config.
> So one option is to get rid of CONFIG_SCHED_DEBUG and make it available to all.
> 
> We had seen a performance regression when the domains folder was rebuilt on cpu hotplug.
> Later that was changed to happen only if verbose is enabled. Maybe something like that
> can be done if something is hurting performance.
> 
> > 
> > But within the kernel we still pretend that there's a choice,
> > and sometimes code that is seemingly 'debug only' creates overhead
> > that should be optimized in reality.
> > 
> > So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> > This gets rid of a large amount of #ifdefs, so good riddance ...
> > 
> 
> There are some references in selftests like these; maybe remove them as well?
> 
> tools/testing/selftests/sched_ext/config:CONFIG_SCHED_DEBUG=y
> tools/testing/selftests/sched/config:CONFIG_SCHED_DEBUG=y
> tools/testing/selftests/wireguard/qemu/debug.config:CONFIG_SCHED_DEBUG=y

Indeed - fixed.

I left out all the defconfigs from the patches, because there are a lot 
of them (~79 reference CONFIG_SCHED_DEBUG ...) and they get refreshed 
naturally in any case.

> Also ran unixbench and hackbench on an 80 CPU system (1 NUMA node) with
> and without CONFIG_SCHED_DEBUG. hackbench numbers are almost the same.
>
> For unixbench, process creation/context switching show a 1-2%
> improvement with CONFIG_SCHED_DEBUG=n.

Thank you for the testing! I'll add:

  Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>

to the series if you don't mind.

And irrespective of this series, we should probably look at that 1-2% 
context-switching overhead in unixbench; maybe there's some low-hanging 
fruit in the debug code.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional
  2025-03-19 21:14   ` Ingo Molnar
@ 2025-03-20  4:41     ` Shrikanth Hegde
  2025-03-20  9:00     ` [tip: sched/core] sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Shrikanth Hegde @ 2025-03-20  4:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Dietmar Eggemann, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Valentin Schneider, Steven Rostedt, Mel Gorman,
	Vincent Guittot



On 3/20/25 02:44, Ingo Molnar wrote:
> 
> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> 
>>
>>
>> On 3/17/25 16:12, Ingo Molnar wrote:
>>> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
>>> in all the major Linux distributions:
>>>
>>>      /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
>>>
>>> The reason is that while originally CONFIG_SCHED_DEBUG started
>>> out as a debugging feature, over the years (decades ...) it has
>>> grown various bits of statistics, instrumentation and
>>> control knobs that are useful for sysadmin and general software
>>> development purposes as well.
>>
>> A tunable like base_slice, which is the only tunable available for EEVDF, sits behind the debug config.
>> So one option is to get rid of CONFIG_SCHED_DEBUG and make it available to all.
>>
>> We had seen a performance regression when the domains folder was rebuilt on cpu hotplug.
>> Later that was changed to happen only if verbose is enabled. Maybe something like that
>> can be done if something is hurting performance.
>>
>>>
>>> But within the kernel we still pretend that there's a choice,
>>> and sometimes code that is seemingly 'debug only' creates overhead
>>> that should be optimized in reality.
>>>
>>> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
>>> This gets rid of a large amount of #ifdefs, so good riddance ...
>>>
>>
>> There are some references in selftests like these; maybe remove them as well?
>>
>> tools/testing/selftests/sched_ext/config:CONFIG_SCHED_DEBUG=y
>> tools/testing/selftests/sched/config:CONFIG_SCHED_DEBUG=y
>> tools/testing/selftests/wireguard/qemu/debug.config:CONFIG_SCHED_DEBUG=y
> 
> Indeed - fixed.
> 
> I left out all the defconfigs from the patches, because there are a lot
> of them (~79 reference CONFIG_SCHED_DEBUG ...) and they get refreshed
> naturally in any case.
> 
>> Also ran unixbench and hackbench on an 80 CPU system (1 NUMA node) with
>> and without CONFIG_SCHED_DEBUG. hackbench numbers are almost the same.
>>
>> For unixbench, process creation/context switching show a 1-2%
>> improvement with CONFIG_SCHED_DEBUG=n.
> 
> Thank you for the testing! I'll add:
> 
>    Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> 
> to the series if you don't mind.

It was minimal testing. If that suffices, I am okay with the tag.

> 
> And irrespective of this series, we should probably look at that 1-2%
> context-switching overhead in unixbench; maybe there's some low-hanging
> fruit in the debug code.
> 
> 	Ingo

OK, let me run perf record and see what shows up.
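
Something along these lines, once per kernel (workload and file names
illustrative):

   perf record -g -o perf.data.debug -- hackbench -l 10000
   perf record -g -o perf.data.nodebug -- hackbench -l 10000
   perf diff perf.data.debug perf.data.nodebug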

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug: Remove CONFIG_SCHED_DEBUG
  2025-03-17 10:42 ` [PATCH 5/5] sched/debug: Remove CONFIG_SCHED_DEBUG Ingo Molnar
@ 2025-03-20  8:59   ` tip-bot2 for Ingo Molnar
  2025-03-24 11:57     ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2025-03-20  8:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Shrikanth Hegde, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b52173065e0aad82a31863bb5f63ebe46f7eb657
Gitweb:        https://git.kernel.org/tip/b52173065e0aad82a31863bb5f63ebe46f7eb657
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Mon, 17 Mar 2025 11:42:56 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:23:24 +01:00

sched/debug: Remove CONFIG_SCHED_DEBUG

For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
in all the major Linux distributions:

   /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y

The reason is that while originally CONFIG_SCHED_DEBUG started
out as a debugging feature, over the years (decades ...) it has
grown various bits of statistics, instrumentation and
control knobs that are useful for sysadmin and general software
development purposes as well.

But within the kernel we still pretend that there's a choice,
and sometimes code that is seemingly 'debug only' creates overhead
that should be optimized in reality.

So make it all official and make CONFIG_SCHED_DEBUG unconditional.

Now that all uses of CONFIG_SCHED_DEBUG are removed from
the code by previous patches, remove the Kconfig option as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-6-mingo@kernel.org
---
 lib/Kconfig.debug |  9 ---------
 1 file changed, 9 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a..a2ab693 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1301,15 +1301,6 @@ endmenu # "Debug lockups and hangs"
 
 menu "Scheduler Debugging"
 
-config SCHED_DEBUG
-	bool "Collect scheduler debugging info"
-	depends on DEBUG_KERNEL && DEBUG_FS
-	default y
-	help
-	  If you say Y here, the /sys/kernel/debug/sched file will be provided
-	  that can help debug the scheduler. The runtime overhead of this
-	  option is minimal.
-
 config SCHED_INFO
 	bool
 	default n

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
  2025-03-19 21:14   ` Ingo Molnar
  2025-03-20  4:41     ` Shrikanth Hegde
@ 2025-03-20  9:00     ` tip-bot2 for Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2025-03-20  9:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     14d281db78b2e5af1bdce793910ce1ea74520d05
Gitweb:        https://git.kernel.org/tip/14d281db78b2e5af1bdce793910ce1ea74520d05
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Wed, 19 Mar 2025 22:13:15 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:23:24 +01:00

sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files

We leave most of the defconfigs alone (there are over 70 of them),
but let's remove CONFIG_SCHED_DEBUG from the scheduler self-test
Kconfig files.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z9szt3MpQmQ56TRd@gmail.com
---
 tools/testing/selftests/sched/config                | 2 +-
 tools/testing/selftests/sched_ext/config            | 1 -
 tools/testing/selftests/wireguard/qemu/debug.config | 1 -
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/sched/config b/tools/testing/selftests/sched/config
index e8b09aa..1bb8bf6 100644
--- a/tools/testing/selftests/sched/config
+++ b/tools/testing/selftests/sched/config
@@ -1 +1 @@
-CONFIG_SCHED_DEBUG=y
+# empty
diff --git a/tools/testing/selftests/sched_ext/config b/tools/testing/selftests/sched_ext/config
index 0de9b4e..aa901b0 100644
--- a/tools/testing/selftests/sched_ext/config
+++ b/tools/testing/selftests/sched_ext/config
@@ -1,4 +1,3 @@
-CONFIG_SCHED_DEBUG=y
 CONFIG_SCHED_CLASS_EXT=y
 CONFIG_CGROUPS=y
 CONFIG_CGROUP_SCHED=y
diff --git a/tools/testing/selftests/wireguard/qemu/debug.config b/tools/testing/selftests/wireguard/qemu/debug.config
index 139fd9a..c305d2f 100644
--- a/tools/testing/selftests/wireguard/qemu/debug.config
+++ b/tools/testing/selftests/wireguard/qemu/debug.config
@@ -27,7 +27,6 @@ CONFIG_DEBUG_KMEMLEAK=y
 CONFIG_DEBUG_STACK_USAGE=y
 CONFIG_DEBUG_SHIRQ=y
 CONFIG_WQ_WATCHDOG=y
-CONFIG_SCHED_DEBUG=y
 CONFIG_SCHED_INFO=y
 CONFIG_SCHEDSTATS=y
 CONFIG_SCHED_STACK_END_CHECK=y

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  2025-03-17 10:42 ` [PATCH 4/5] sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation Ingo Molnar
@ 2025-03-20  9:00   ` tip-bot2 for Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2025-03-20  9:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Shrikanth Hegde, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     1b68a6aba00efbee9804c11d85019646b2e2646f
Gitweb:        https://git.kernel.org/tip/1b68a6aba00efbee9804c11d85019646b2e2646f
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Mon, 17 Mar 2025 11:42:55 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:20:54 +01:00

sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation

Since it's enabled unconditionally now, remove all references to it.

(Left out languages I cannot read.)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-5-mingo@kernel.org
---
 Documentation/scheduler/sched-debug.rst                         | 2 +-
 Documentation/scheduler/sched-design-CFS.rst                    | 2 +-
 Documentation/scheduler/sched-domains.rst                       | 5 ++---
 Documentation/scheduler/sched-ext.rst                           | 3 +--
 Documentation/scheduler/sched-stats.rst                         | 2 +-
 Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst | 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/Documentation/scheduler/sched-debug.rst b/Documentation/scheduler/sched-debug.rst
index 4d3d24f..b5a92a3 100644
--- a/Documentation/scheduler/sched-debug.rst
+++ b/Documentation/scheduler/sched-debug.rst
@@ -2,7 +2,7 @@
 Scheduler debugfs
 =================
 
-Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+Booting a kernel with debugfs enabled will give access to
 scheduler specific debug files under /sys/kernel/debug/sched. Some of
 those files are described below.
 
diff --git a/Documentation/scheduler/sched-design-CFS.rst b/Documentation/scheduler/sched-design-CFS.rst
index 8786f21..b574a26 100644
--- a/Documentation/scheduler/sched-design-CFS.rst
+++ b/Documentation/scheduler/sched-design-CFS.rst
@@ -96,7 +96,7 @@ picked and the current task is preempted.
 CFS uses nanosecond granularity accounting and does not rely on any jiffies or
 other HZ detail.  Thus the CFS scheduler has no notion of "timeslices" in the
 way the previous scheduler had, and has no heuristics whatsoever.  There is
-only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
+only one central tunable:
 
    /sys/kernel/debug/sched/base_slice_ns
 
diff --git a/Documentation/scheduler/sched-domains.rst b/Documentation/scheduler/sched-domains.rst
index 5e996fe..15e3a4c 100644
--- a/Documentation/scheduler/sched-domains.rst
+++ b/Documentation/scheduler/sched-domains.rst
@@ -73,9 +73,8 @@ Architectures may override the generic domain builder and the default SD flags
 for a given topology level by creating a sched_domain_topology_level array and
 calling set_sched_topology() with this array as the parameter.
 
-The sched-domains debugging infrastructure can be enabled by enabling
-CONFIG_SCHED_DEBUG and adding 'sched_verbose' to your cmdline. If you
-forgot to tweak your cmdline, you can also flip the
+The sched-domains debugging infrastructure can be enabled by adding 'sched_verbose'
+to your cmdline. If you forgot to tweak your cmdline, you can also flip the
 /sys/kernel/debug/sched/verbose knob. This enables an error checking parse of
 the sched domains which should catch most possible errors (described above). It
 also prints out the domain structure in a visual format.
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index c4672d7..5788a33 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -107,8 +107,7 @@ detailed information:
     nr_rejected   : 0
     enable_seq    : 1
 
-If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
-be determined as follows:
+Whether a given task is on sched_ext can be determined as follows:
 
 .. code-block:: none
 
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index caea83d..08b6bc9 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -88,7 +88,7 @@ One of these is produced per domain for each cpu described. (Note that if
 CONFIG_SMP is not defined, *no* domains are utilized and these lines
 will not appear in the output. <name> is an extension to the domain field
 that prints the name of the corresponding sched domain. It can appear in
-schedstat version 17 and above, and requires CONFIG_SCHED_DEBUG.)
+schedstat version 17 and above.
 
 domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
 
diff --git a/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst b/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst
index dc728c7..b35d244 100644
--- a/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst
+++ b/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst
@@ -112,7 +112,7 @@ CFS usa una granularidad de nanosegundos y no depende de ningún
 jiffy o detalles como HZ. De este modo, el gestor de tareas CFS no tiene
 noción de "ventanas de tiempo" de la forma en que tenía el gestor de
 tareas previo, y tampoco tiene heurísticos. Únicamente hay un parámetro
-central ajustable (se ha de cambiar en CONFIG_SCHED_DEBUG):
+central ajustable:
 
    /sys/kernel/debug/sched/base_slice_ns
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  2025-03-17 10:42 ` [PATCH 3/5] sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional Ingo Molnar
@ 2025-03-20  9:00   ` tip-bot2 for Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2025-03-20  9:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Shrikanth Hegde, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     dd5bdaf2b72da81d57f4f99e518af80002b6562e
Gitweb:        https://git.kernel.org/tip/dd5bdaf2b72da81d57f4f99e518af80002b6562e
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Mon, 17 Mar 2025 11:42:54 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:20:53 +01:00

sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional

All the big Linux distros enable CONFIG_SCHED_DEBUG, because
the various features it provides help not just with kernel
development, but with system administration and user-space
software development as well.

Reflect this reality and enable this functionality
unconditionally.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-4-mingo@kernel.org
---
 fs/proc/base.c                 |  7 +----
 include/linux/energy_model.h   |  2 +-
 include/linux/sched/debug.h    |  2 +-
 include/linux/sched/topology.h |  4 +--
 include/trace/events/sched.h   |  2 +-
 kernel/sched/build_utility.c   |  4 +--
 kernel/sched/core.c            | 18 +----------
 kernel/sched/deadline.c        |  2 +-
 kernel/sched/fair.c            |  4 +--
 kernel/sched/rt.c              |  5 +---
 kernel/sched/sched.h           | 54 ++-------------------------------
 kernel/sched/topology.c        | 13 +--------
 12 files changed, 9 insertions(+), 108 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index cd89e95..6152642 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1489,7 +1489,6 @@ static const struct file_operations proc_fail_nth_operations = {
 #endif
 
 
-#ifdef CONFIG_SCHED_DEBUG
 /*
  * Print out various scheduling related per-task fields:
  */
@@ -1539,8 +1538,6 @@ static const struct file_operations proc_pid_sched_operations = {
 	.release	= single_release,
 };
 
-#endif
-
 #ifdef CONFIG_SCHED_AUTOGROUP
 /*
  * Print out autogroup related information:
@@ -3331,9 +3328,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	ONE("status",     S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits",	  S_IRUGO, proc_pid_limits),
-#ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
-#endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
 #endif
@@ -3682,9 +3677,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("status",    S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits",	 S_IRUGO, proc_pid_limits),
-#ifdef CONFIG_SCHED_DEBUG
 	REG("sched",     S_IRUGO|S_IWUSR, proc_pid_sched_operations),
-#endif
 	NOD("comm",      S_IFREG|S_IRUGO|S_IWUSR,
 			 &proc_tid_comm_inode_operations,
 			 &proc_pid_set_comm_operations, {}),
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 78318d4..65efc0f 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -240,9 +240,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	struct em_perf_state *ps;
 	int i;
 
-#ifdef CONFIG_SCHED_DEBUG
 	WARN_ONCE(!rcu_read_lock_held(), "EM: rcu read lock needed\n");
-#endif
 
 	if (!sum_util)
 		return 0;
diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
index b5035af..35ed457 100644
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -35,12 +35,10 @@ extern void show_stack(struct task_struct *task, unsigned long *sp,
 
 extern void sched_show_task(struct task_struct *p);
 
-#ifdef CONFIG_SCHED_DEBUG
 struct seq_file;
 extern void proc_sched_show_task(struct task_struct *p,
 				 struct pid_namespace *ns, struct seq_file *m);
 extern void proc_sched_set_task(struct task_struct *p);
-#endif
 
 /* Attach to any functions which should be ignored in wchan output. */
 #define __sched		__section(".sched.text")
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 51f7b81..7b4301b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -25,16 +25,12 @@ enum {
 };
 #undef SD_FLAG
 
-#ifdef CONFIG_SCHED_DEBUG
-
 struct sd_flag_debug {
 	unsigned int meta_flags;
 	char *name;
 };
 extern const struct sd_flag_debug sd_flag_debug[];
 
-#endif
-
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9ea4c40..bfd97cc 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -193,9 +193,7 @@ static inline long __trace_sched_switch_state(bool preempt,
 {
 	unsigned int state;
 
-#ifdef CONFIG_SCHED_DEBUG
 	BUG_ON(p != current);
-#endif /* CONFIG_SCHED_DEBUG */
 
 	/*
 	 * Preemption ignores task state, therefore preempted tasks are always
diff --git a/kernel/sched/build_utility.c b/kernel/sched/build_utility.c
index 80a3df4..bf9d8db 100644
--- a/kernel/sched/build_utility.c
+++ b/kernel/sched/build_utility.c
@@ -68,9 +68,7 @@
 # include "cpufreq_schedutil.c"
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
-# include "debug.c"
-#endif
+#include "debug.c"
 
 #ifdef CONFIG_SCHEDSTATS
 # include "stats.c"
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3589abc..9a4109f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -118,7 +118,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 
-#ifdef CONFIG_SCHED_DEBUG
 /*
  * Debugging: various feature bits
  *
@@ -142,7 +141,6 @@ __read_mostly unsigned int sysctl_sched_features =
  */
 __read_mostly int sysctl_resched_latency_warn_ms = 100;
 __read_mostly int sysctl_resched_latency_warn_once = 1;
-#endif /* CONFIG_SCHED_DEBUG */
 
 /*
  * Number of tasks to iterate in a single balance run.
@@ -799,11 +797,10 @@ void update_rq_clock(struct rq *rq)
 	if (rq->clock_update_flags & RQCF_ACT_SKIP)
 		return;
 
-#ifdef CONFIG_SCHED_DEBUG
 	if (sched_feat(WARN_DOUBLE_CLOCK))
 		WARN_ON_ONCE(rq->clock_update_flags & RQCF_UPDATED);
 	rq->clock_update_flags |= RQCF_UPDATED;
-#endif
+
 	clock = sched_clock_cpu(cpu_of(rq));
 	scx_rq_clock_update(rq, clock);
 
@@ -3291,7 +3288,6 @@ void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
 
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
-#ifdef CONFIG_SCHED_DEBUG
 	unsigned int state = READ_ONCE(p->__state);
 
 	/*
@@ -3329,7 +3325,6 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	WARN_ON_ONCE(!cpu_online(new_cpu));
 
 	WARN_ON_ONCE(is_migration_disabled(p));
-#endif
 
 	trace_sched_migrate_task(p, new_cpu);
 
@@ -5577,7 +5572,6 @@ unsigned long long task_sched_runtime(struct task_struct *p)
 	return ns;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
 static u64 cpu_resched_latency(struct rq *rq)
 {
 	int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms);
@@ -5622,9 +5616,6 @@ static int __init setup_resched_latency_warn_ms(char *str)
 	return 1;
 }
 __setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
-#else
-static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
-#endif /* CONFIG_SCHED_DEBUG */
 
 /*
  * This function gets called by the timer code, with HZ frequency.
@@ -6718,9 +6709,7 @@ static void __sched notrace __schedule(int sched_mode)
 picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-#ifdef CONFIG_SCHED_DEBUG
 	rq->last_seen_need_resched_ns = 0;
-#endif
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
@@ -7094,7 +7083,7 @@ asmlinkage __visible void __sched preempt_schedule_irq(void)
 int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
 			  void *key)
 {
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_SCHED_DEBUG) && wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
+	WARN_ON_ONCE(wake_flags & ~(WF_SYNC|WF_CURRENT_CPU));
 	return try_to_wake_up(curr->private, mode, wake_flags);
 }
 EXPORT_SYMBOL(default_wake_function);
@@ -7811,10 +7800,9 @@ void show_state_filter(unsigned int state_filter)
 			sched_show_task(p);
 	}
 
-#ifdef CONFIG_SCHED_DEBUG
 	if (!state_filter)
 		sysrq_sched_debug_show();
-#endif
+
 	rcu_read_unlock();
 	/*
 	 * Only show locks if all tasks are dumped:
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d4f7cbf..03a33b5 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3574,9 +3574,7 @@ void dl_bw_free(int cpu, u64 dl_bw)
 }
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
 void print_dl_stats(struct seq_file *m, int cpu)
 {
 	print_dl_rq(m, cpu, &cpu_rq(cpu)->dl);
 }
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 35ee8d9..a0c4cd2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -983,7 +983,6 @@ found:
 	return best;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
@@ -1010,7 +1009,6 @@ int sched_update_scaling(void)
 	return 0;
 }
 #endif
-#endif
 
 static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
@@ -13668,7 +13666,6 @@ DEFINE_SCHED_CLASS(fair) = {
 #endif
 };
 
-#ifdef CONFIG_SCHED_DEBUG
 void print_cfs_stats(struct seq_file *m, int cpu)
 {
 	struct cfs_rq *cfs_rq, *pos;
@@ -13702,7 +13699,6 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
 	rcu_read_unlock();
 }
 #endif /* CONFIG_NUMA_BALANCING */
-#endif /* CONFIG_SCHED_DEBUG */
 
 __init void init_sched_fair_class(void)
 {
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8b8d2c1..a477415 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -169,9 +169,8 @@ static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
 
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
-#ifdef CONFIG_SCHED_DEBUG
 	WARN_ON_ONCE(!rt_entity_is_task(rt_se));
-#endif
+
 	return container_of(rt_se, struct task_struct, rt);
 }
 
@@ -2969,7 +2968,6 @@ static int sched_rr_handler(const struct ctl_table *table, int write, void *buff
 }
 #endif /* CONFIG_SYSCTL */
 
-#ifdef CONFIG_SCHED_DEBUG
 void print_rt_stats(struct seq_file *m, int cpu)
 {
 	rt_rq_iter_t iter;
@@ -2980,4 +2978,3 @@ void print_rt_stats(struct seq_file *m, int cpu)
 		print_rt_rq(m, cpu, rt_rq);
 	rcu_read_unlock();
 }
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d8e4040..47972f3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1174,10 +1174,8 @@ struct rq {
 
 	atomic_t		nr_iowait;
 
-#ifdef CONFIG_SCHED_DEBUG
 	u64 last_seen_need_resched_ns;
 	int ticks_without_resched;
-#endif
 
 #ifdef CONFIG_MEMBARRIER
 	int membarrier_state;
@@ -1706,14 +1704,12 @@ static inline void rq_clock_stop_loop_update(struct rq *rq)
 struct rq_flags {
 	unsigned long flags;
 	struct pin_cookie cookie;
-#ifdef CONFIG_SCHED_DEBUG
 	/*
 	 * A copy of (rq::clock_update_flags & RQCF_UPDATED) for the
 	 * current pin context is stashed here in case it needs to be
 	 * restored in rq_repin_lock().
 	 */
 	unsigned int clock_update_flags;
-#endif
 };
 
 extern struct balance_callback balance_push_callback;
@@ -1764,21 +1760,18 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	rf->cookie = lockdep_pin_lock(__rq_lockp(rq));
 
-#ifdef CONFIG_SCHED_DEBUG
 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
-# ifdef CONFIG_SMP
+#ifdef CONFIG_SMP
 	WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
-# endif
 #endif
 }
 
 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 {
-#ifdef CONFIG_SCHED_DEBUG
 	if (rq->clock_update_flags > RQCF_ACT_SKIP)
 		rf->clock_update_flags = RQCF_UPDATED;
-#endif
+
 	scx_rq_clock_invalidate(rq);
 	lockdep_unpin_lock(__rq_lockp(rq), rf->cookie);
 }
@@ -1787,12 +1780,10 @@ static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	lockdep_repin_lock(__rq_lockp(rq), rf->cookie);
 
-#ifdef CONFIG_SCHED_DEBUG
 	/*
 	 * Restore the value we stashed in @rf for this pin context.
 	 */
 	rq->clock_update_flags |= rf->clock_update_flags;
-#endif
 }
 
 extern
@@ -2066,9 +2057,7 @@ struct sched_group_capacity {
 	unsigned long		next_update;
 	int			imbalance;		/* XXX unrelated to capacity but shared group state */
 
-#ifdef CONFIG_SCHED_DEBUG
 	int			id;
-#endif
 
 	unsigned long		cpumask[];		/* Balance mask */
 };
@@ -2108,13 +2097,8 @@ static inline struct cpumask *group_balance_mask(struct sched_group *sg)
 
 extern int group_balance_cpu(struct sched_group *sg);
 
-#ifdef CONFIG_SCHED_DEBUG
 extern void update_sched_domain_debugfs(void);
 extern void dirty_sched_domain_sysctl(int cpu);
-#else
-static inline void update_sched_domain_debugfs(void) { }
-static inline void dirty_sched_domain_sysctl(int cpu) { }
-#endif
 
 extern int sched_update_scaling(void);
 
@@ -2207,8 +2191,6 @@ enum {
 
 #undef SCHED_FEAT
 
-#ifdef CONFIG_SCHED_DEBUG
-
 /*
  * To support run-time toggling of sched features, all the translation units
  * (but core.c) reference the sysctl_sched_features defined in core.c.
@@ -2235,24 +2217,6 @@ extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
 
 #endif /* !CONFIG_JUMP_LABEL */
 
-#else /* !SCHED_DEBUG: */
-
-/*
- * Each translation unit has its own copy of sysctl_sched_features to allow
- * constants propagation at compile time and compiler optimization based on
- * features default.
- */
-#define SCHED_FEAT(name, enabled)	\
-	(1UL << __SCHED_FEAT_##name) * enabled |
-static __read_mostly __maybe_unused unsigned int sysctl_sched_features =
-#include "features.h"
-	0;
-#undef SCHED_FEAT
-
-#define sched_feat(x) !!(sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
-
-#endif /* !SCHED_DEBUG */
-
 extern struct static_key_false sched_numa_balancing;
 extern struct static_key_false sched_schedstats;
 
@@ -2837,7 +2801,6 @@ extern __read_mostly unsigned int sysctl_sched_migration_cost;
 
 extern unsigned int sysctl_sched_base_slice;
 
-#ifdef CONFIG_SCHED_DEBUG
 extern int sysctl_resched_latency_warn_ms;
 extern int sysctl_resched_latency_warn_once;
 
@@ -2848,7 +2811,6 @@ extern unsigned int sysctl_numa_balancing_scan_period_min;
 extern unsigned int sysctl_numa_balancing_scan_period_max;
 extern unsigned int sysctl_numa_balancing_scan_size;
 extern unsigned int sysctl_numa_balancing_hot_threshold;
-#endif
 
 #ifdef CONFIG_SCHED_HRTICK
 
@@ -2921,7 +2883,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
 }
 #endif
 
-#ifdef CONFIG_SCHED_DEBUG
 /*
  * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to
  * acquire rq lock instead of rq_lock(). So at the end of these two functions
@@ -2936,9 +2897,6 @@ static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
 	rq2->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 #endif
 }
-#else
-static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2) { }
-#endif
 
 #define DEFINE_LOCK_GUARD_2(name, type, _lock, _unlock, ...)				\
 __DEFINE_UNLOCK_GUARD(name, type, _unlock, type *lock2; __VA_ARGS__)			\
@@ -3151,7 +3109,6 @@ extern struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq);
 
-#ifdef	CONFIG_SCHED_DEBUG
 extern bool sched_debug_verbose;
 
 extern void print_cfs_stats(struct seq_file *m, int cpu);
@@ -3162,15 +3119,12 @@ extern void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq);
 extern void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq);
 
 extern void resched_latency_warn(int cpu, u64 latency);
-# ifdef CONFIG_NUMA_BALANCING
+#ifdef CONFIG_NUMA_BALANCING
 extern void show_numa_stats(struct task_struct *p, struct seq_file *m);
 extern void
 print_numa_stats(struct seq_file *m, int node, unsigned long tsf,
 		 unsigned long tpf, unsigned long gsf, unsigned long gpf);
-# endif /* CONFIG_NUMA_BALANCING */
-#else /* !CONFIG_SCHED_DEBUG: */
-static inline void resched_latency_warn(int cpu, u64 latency) { }
-#endif /* !CONFIG_SCHED_DEBUG */
+#endif /* CONFIG_NUMA_BALANCING */
 
 extern void init_cfs_rq(struct cfs_rq *cfs_rq);
 extern void init_rt_rq(struct rt_rq *rt_rq);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 95bde79..f1ebc60 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -19,8 +19,6 @@ void sched_domains_mutex_unlock(void)
 static cpumask_var_t sched_domains_tmpmask;
 static cpumask_var_t sched_domains_tmpmask2;
 
-#ifdef CONFIG_SCHED_DEBUG
-
 static int __init sched_debug_setup(char *str)
 {
 	sched_debug_verbose = true;
@@ -159,15 +157,6 @@ static void sched_domain_debug(struct sched_domain *sd, int cpu)
 			break;
 	}
 }
-#else /* !CONFIG_SCHED_DEBUG */
-
-# define sched_debug_verbose 0
-# define sched_domain_debug(sd, cpu) do { } while (0)
-static inline bool sched_debug(void)
-{
-	return false;
-}
-#endif /* CONFIG_SCHED_DEBUG */
 
 /* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
 #define SD_FLAG(name, mflags) (name * !!((mflags) & SDF_NEEDS_GROUPS)) |
@@ -2283,9 +2272,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
 			if (!sgc)
 				return -ENOMEM;
 
-#ifdef CONFIG_SCHED_DEBUG
 			sgc->id = j;
-#endif
 
 			*per_cpu_ptr(sdd->sgc, j) = sgc;
 		}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  2025-03-17 10:42 ` [PATCH 2/5] sched/debug: Make 'const_debug' tunables unconditional __read_mostly Ingo Molnar
@ 2025-03-20  9:00   ` tip-bot2 for Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2025-03-20  9:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Shrikanth Hegde, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     57903f72f270a3deb9de408ac61001a3fd94bf2f
Gitweb:        https://git.kernel.org/tip/57903f72f270a3deb9de408ac61001a3fd94bf2f
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Mon, 17 Mar 2025 11:42:53 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:20:53 +01:00

sched/debug: Make 'const_debug' tunables unconditional __read_mostly

With CONFIG_SCHED_DEBUG becoming unconditional, remove the
extra 'const_debug' indirection towards __read_mostly.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-3-mingo@kernel.org
---
 kernel/sched/core.c  |  4 ++--
 kernel/sched/fair.c  |  2 +-
 kernel/sched/sched.h | 15 +++++----------
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6f666b4..3589abc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -128,7 +128,7 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
  */
 #define SCHED_FEAT(name, enabled)	\
 	(1UL << __SCHED_FEAT_##name) * enabled |
-const_debug unsigned int sysctl_sched_features =
+__read_mostly unsigned int sysctl_sched_features =
 #include "features.h"
 	0;
 #undef SCHED_FEAT
@@ -148,7 +148,7 @@ __read_mostly int sysctl_resched_latency_warn_once = 1;
  * Number of tasks to iterate in a single balance run.
  * Limited because this is done with IRQs disabled.
  */
-const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
+__read_mostly unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;
 
 __read_mostly int scheduler_running;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 89609eb..35ee8d9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -79,7 +79,7 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
 unsigned int sysctl_sched_base_slice			= 700000ULL;
 static unsigned int normalized_sysctl_sched_base_slice	= 700000ULL;
 
-const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
+__read_mostly unsigned int sysctl_sched_migration_cost	= 500000UL;
 
 static int __init setup_sched_thermal_decay_shift(char *str)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fadaabe..d8e4040 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2194,13 +2194,8 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 }
 
 /*
- * Tunables that become constants when CONFIG_SCHED_DEBUG is off:
+ * Tunables:
  */
-#ifdef CONFIG_SCHED_DEBUG
-# define const_debug __read_mostly
-#else
-# define const_debug const
-#endif
 
 #define SCHED_FEAT(name, enabled)	\
 	__SCHED_FEAT_##name ,
@@ -2218,7 +2213,7 @@ enum {
  * To support run-time toggling of sched features, all the translation units
  * (but core.c) reference the sysctl_sched_features defined in core.c.
  */
-extern const_debug unsigned int sysctl_sched_features;
+extern __read_mostly unsigned int sysctl_sched_features;
 
 #ifdef CONFIG_JUMP_LABEL
 
@@ -2249,7 +2244,7 @@ extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
  */
 #define SCHED_FEAT(name, enabled)	\
 	(1UL << __SCHED_FEAT_##name) * enabled |
-static const_debug __maybe_unused unsigned int sysctl_sched_features =
+static __read_mostly __maybe_unused unsigned int sysctl_sched_features =
 #include "features.h"
 	0;
 #undef SCHED_FEAT
@@ -2837,8 +2832,8 @@ extern void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags);
 # define SCHED_NR_MIGRATE_BREAK 32
 #endif
 
-extern const_debug unsigned int sysctl_sched_nr_migrate;
-extern const_debug unsigned int sysctl_sched_migration_cost;
+extern __read_mostly unsigned int sysctl_sched_nr_migrate;
+extern __read_mostly unsigned int sysctl_sched_migration_cost;
 
 extern unsigned int sysctl_sched_base_slice;
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  2025-03-17 10:42 ` [PATCH 1/5] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() Ingo Molnar
@ 2025-03-20  9:00   ` tip-bot2 for Ingo Molnar
  2025-03-24 11:59     ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: tip-bot2 for Ingo Molnar @ 2025-03-20  9:00 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Shrikanth Hegde, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     f7d2728cc032a23fccb5ecde69793a38eb30ba5c
Gitweb:        https://git.kernel.org/tip/f7d2728cc032a23fccb5ecde69793a38eb30ba5c
Author:        Ingo Molnar <mingo@kernel.org>
AuthorDate:    Mon, 17 Mar 2025 11:42:52 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:20:53 +01:00

sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()

The scheduler has this special SCHED_WARN_ON() facility that
depends on CONFIG_SCHED_DEBUG.

Since CONFIG_SCHED_DEBUG is getting removed, convert
SCHED_WARN_ON() to WARN_ON_ONCE().

Note that the warning output isn't 100% equivalent:

   #define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)

Because SCHED_WARN_ON() would output the 'x' condition
as well, while WARN_ON_ONCE() will only show a backtrace.

Hopefully these are rare enough to not really matter.

If it does, we should probably introduce a new WARN_ON()
variant that outputs the condition in stringified form,
or improve WARN_ON() itself.
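
(For illustration, such a variant could be as simple as, under a
hypothetical name:

    #define WARN_ON_ONCE_STR(x)	WARN_ONCE((x), "%s", #x)

which is essentially what SCHED_WARN_ON() already was, minus the
CONFIG_SCHED_DEBUG dependency.)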

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-2-mingo@kernel.org
---
 kernel/sched/core.c       | 24 ++++++++--------
 kernel/sched/core_sched.c |  2 +-
 kernel/sched/deadline.c   | 12 ++++----
 kernel/sched/ext.c        |  2 +-
 kernel/sched/fair.c       | 58 +++++++++++++++++++-------------------
 kernel/sched/rt.c         |  2 +-
 kernel/sched/sched.h      | 16 +++-------
 kernel/sched/stats.h      |  2 +-
 8 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index affa99f..6f666b4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -801,7 +801,7 @@ void update_rq_clock(struct rq *rq)
 
 #ifdef CONFIG_SCHED_DEBUG
 	if (sched_feat(WARN_DOUBLE_CLOCK))
-		SCHED_WARN_ON(rq->clock_update_flags & RQCF_UPDATED);
+		WARN_ON_ONCE(rq->clock_update_flags & RQCF_UPDATED);
 	rq->clock_update_flags |= RQCF_UPDATED;
 #endif
 	clock = sched_clock_cpu(cpu_of(rq));
@@ -1719,7 +1719,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 
 	bucket = &uc_rq->bucket[uc_se->bucket_id];
 
-	SCHED_WARN_ON(!bucket->tasks);
+	WARN_ON_ONCE(!bucket->tasks);
 	if (likely(bucket->tasks))
 		bucket->tasks--;
 
@@ -1739,7 +1739,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	 * Defensive programming: this should never happen. If it happens,
 	 * e.g. due to future modification, warn and fix up the expected value.
 	 */
-	SCHED_WARN_ON(bucket->value > rq_clamp);
+	WARN_ON_ONCE(bucket->value > rq_clamp);
 	if (bucket->value >= rq_clamp) {
 		bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
 		uclamp_rq_set(rq, clamp_id, bkt_clamp);
@@ -2121,7 +2121,7 @@ void activate_task(struct rq *rq, struct task_struct *p, int flags)
 
 void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	SCHED_WARN_ON(flags & DEQUEUE_SLEEP);
+	WARN_ON_ONCE(flags & DEQUEUE_SLEEP);
 
 	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
 	ASSERT_EXCLUSIVE_WRITER(p->on_rq);
@@ -2726,7 +2726,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 	 * XXX do further audits, this smells like something putrid.
 	 */
 	if (ctx->flags & SCA_MIGRATE_DISABLE)
-		SCHED_WARN_ON(!p->on_cpu);
+		WARN_ON_ONCE(!p->on_cpu);
 	else
 		lockdep_assert_held(&p->pi_lock);
 
@@ -4195,7 +4195,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 *  - we're serialized against set_special_state() by virtue of
 		 *    it disabling IRQs (this allows not taking ->pi_lock).
 		 */
-		SCHED_WARN_ON(p->se.sched_delayed);
+		WARN_ON_ONCE(p->se.sched_delayed);
 		if (!ttwu_state_match(p, state, &success))
 			goto out;
 
@@ -4489,7 +4489,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	INIT_LIST_HEAD(&p->se.group_node);
 
 	/* A delayed task cannot be in clone(). */
-	SCHED_WARN_ON(p->se.sched_delayed);
+	WARN_ON_ONCE(p->se.sched_delayed);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq			= NULL;
@@ -5745,7 +5745,7 @@ static void sched_tick_remote(struct work_struct *work)
 			 * we are always sure that there is no proxy (only a
 			 * single task is running).
 			 */
-			SCHED_WARN_ON(rq->curr != rq->donor);
+			WARN_ON_ONCE(rq->curr != rq->donor);
 			update_rq_clock(rq);
 
 			if (!is_idle_task(curr)) {
@@ -5965,7 +5965,7 @@ static inline void schedule_debug(struct task_struct *prev, bool preempt)
 		preempt_count_set(PREEMPT_DISABLED);
 	}
 	rcu_sleep_check();
-	SCHED_WARN_ON(ct_state() == CT_STATE_USER);
+	WARN_ON_ONCE(ct_state() == CT_STATE_USER);
 
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
@@ -6811,7 +6811,7 @@ static inline void sched_submit_work(struct task_struct *tsk)
 	 * deadlock if the callback attempts to acquire a lock which is
 	 * already acquired.
 	 */
-	SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);
+	WARN_ON_ONCE(current->__state & TASK_RTLOCK_WAIT);
 
 	/*
 	 * If we are going to sleep and we have plugged IO queued,
@@ -9249,7 +9249,7 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 	unsigned int clamps;
 
 	lockdep_assert_held(&uclamp_mutex);
-	SCHED_WARN_ON(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held());
 
 	css_for_each_descendant_pre(css, top_css) {
 		uc_parent = css_tg(css)->parent
@@ -10584,7 +10584,7 @@ static void task_mm_cid_work(struct callback_head *work)
 	struct mm_struct *mm;
 	int weight, cpu;
 
-	SCHED_WARN_ON(t != container_of(work, struct task_struct, cid_work));
+	WARN_ON_ONCE(t != container_of(work, struct task_struct, cid_work));
 
 	work->next = work;	/* Prevent double-add */
 	if (t->flags & PF_EXITING)
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 1ef98a9..c4606ca 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -65,7 +65,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	 * a cookie until after we've removed it, we must have core scheduling
 	 * enabled here.
 	 */
-	SCHED_WARN_ON((p->core_cookie || cookie) && !sched_core_enabled(rq));
+	WARN_ON_ONCE((p->core_cookie || cookie) && !sched_core_enabled(rq));
 
 	if (sched_core_enqueued(p))
 		sched_core_dequeue(rq, p, DEQUEUE_SAVE);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5dca336..d4f7cbf 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -249,8 +249,8 @@ void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->running_bw += dl_bw;
-	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
-	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+	WARN_ON_ONCE(dl_rq->running_bw < old); /* overflow */
+	WARN_ON_ONCE(dl_rq->running_bw > dl_rq->this_bw);
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
 	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
 }
@@ -262,7 +262,7 @@ void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->running_bw -= dl_bw;
-	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
+	WARN_ON_ONCE(dl_rq->running_bw > old); /* underflow */
 	if (dl_rq->running_bw > old)
 		dl_rq->running_bw = 0;
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
@@ -276,7 +276,7 @@ void __add_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->this_bw += dl_bw;
-	SCHED_WARN_ON(dl_rq->this_bw < old); /* overflow */
+	WARN_ON_ONCE(dl_rq->this_bw < old); /* overflow */
 }
 
 static inline
@@ -286,10 +286,10 @@ void __sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
 	dl_rq->this_bw -= dl_bw;
-	SCHED_WARN_ON(dl_rq->this_bw > old); /* underflow */
+	WARN_ON_ONCE(dl_rq->this_bw > old); /* underflow */
 	if (dl_rq->this_bw > old)
 		dl_rq->this_bw = 0;
-	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+	WARN_ON_ONCE(dl_rq->running_bw > dl_rq->this_bw);
 }
 
 static inline
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da19..953a5b9 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2341,7 +2341,7 @@ static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq,
 {
 	int cpu = cpu_of(rq);
 
-	SCHED_WARN_ON(task_cpu(p) == cpu);
+	WARN_ON_ONCE(task_cpu(p) == cpu);
 
 	/*
 	 * If @p has migration disabled, @p->cpus_ptr is updated to contain only
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9dafb37..89609eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -399,7 +399,7 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 
 static inline void assert_list_leaf_cfs_rq(struct rq *rq)
 {
-	SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
+	WARN_ON_ONCE(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
 }
 
 /* Iterate through all leaf cfs_rq's on a runqueue */
@@ -696,7 +696,7 @@ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	s64 vlag, limit;
 
-	SCHED_WARN_ON(!se->on_rq);
+	WARN_ON_ONCE(!se->on_rq);
 
 	vlag = avg_vruntime(cfs_rq) - se->vruntime;
 	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
@@ -3317,7 +3317,7 @@ static void task_numa_work(struct callback_head *work)
 	bool vma_pids_skipped;
 	bool vma_pids_forced = false;
 
-	SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_work));
+	WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));
 
 	work->next = work;
 	/*
@@ -4036,7 +4036,7 @@ static inline bool load_avg_is_decayed(struct sched_avg *sa)
 	 * Make sure that rounding and/or propagation of PELT values never
 	 * break this.
 	 */
-	SCHED_WARN_ON(sa->load_avg ||
+	WARN_ON_ONCE(sa->load_avg ||
 		      sa->util_avg ||
 		      sa->runnable_avg);
 
@@ -5460,7 +5460,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	clear_buddies(cfs_rq, se);
 
 	if (flags & DEQUEUE_DELAYED) {
-		SCHED_WARN_ON(!se->sched_delayed);
+		WARN_ON_ONCE(!se->sched_delayed);
 	} else {
 		bool delay = sleep;
 		/*
@@ -5470,7 +5470,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		if (flags & DEQUEUE_SPECIAL)
 			delay = false;
 
-		SCHED_WARN_ON(delay && se->sched_delayed);
+		WARN_ON_ONCE(delay && se->sched_delayed);
 
 		if (sched_feat(DELAY_DEQUEUE) && delay &&
 		    !entity_eligible(cfs_rq, se)) {
@@ -5551,7 +5551,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	}
 
 	update_stats_curr_start(cfs_rq, se);
-	SCHED_WARN_ON(cfs_rq->curr);
+	WARN_ON_ONCE(cfs_rq->curr);
 	cfs_rq->curr = se;
 
 	/*
@@ -5592,7 +5592,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	if (sched_feat(PICK_BUDDY) &&
 	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
 		/* ->next will never be delayed */
-		SCHED_WARN_ON(cfs_rq->next->sched_delayed);
+		WARN_ON_ONCE(cfs_rq->next->sched_delayed);
 		return cfs_rq->next;
 	}
 
@@ -5628,7 +5628,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
-	SCHED_WARN_ON(cfs_rq->curr != prev);
+	WARN_ON_ONCE(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
 }
 
@@ -5851,7 +5851,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 
 			cfs_rq->throttled_clock_self = 0;
 
-			if (SCHED_WARN_ON((s64)delta < 0))
+			if (WARN_ON_ONCE((s64)delta < 0))
 				delta = 0;
 
 			cfs_rq->throttled_clock_self_time += delta;
@@ -5871,7 +5871,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
 		list_del_leaf_cfs_rq(cfs_rq);
 
-		SCHED_WARN_ON(cfs_rq->throttled_clock_self);
+		WARN_ON_ONCE(cfs_rq->throttled_clock_self);
 		if (cfs_rq->nr_queued)
 			cfs_rq->throttled_clock_self = rq_clock(rq);
 	}
@@ -5980,7 +5980,7 @@ done:
 	 * throttled-list.  rq->lock protects completion.
 	 */
 	cfs_rq->throttled = 1;
-	SCHED_WARN_ON(cfs_rq->throttled_clock);
+	WARN_ON_ONCE(cfs_rq->throttled_clock);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock = rq_clock(rq);
 	return true;
@@ -6136,7 +6136,7 @@ static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 	}
 
 	/* Already enqueued */
-	if (SCHED_WARN_ON(!list_empty(&cfs_rq->throttled_csd_list)))
+	if (WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_csd_list)))
 		return;
 
 	first = list_empty(&rq->cfsb_csd_list);
@@ -6155,7 +6155,7 @@ static void unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 {
 	lockdep_assert_rq_held(rq_of(cfs_rq));
 
-	if (SCHED_WARN_ON(!cfs_rq_throttled(cfs_rq) ||
+	if (WARN_ON_ONCE(!cfs_rq_throttled(cfs_rq) ||
 	    cfs_rq->runtime_remaining <= 0))
 		return;
 
@@ -6191,7 +6191,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 			goto next;
 
 		/* By the above checks, this should never be true */
-		SCHED_WARN_ON(cfs_rq->runtime_remaining > 0);
+		WARN_ON_ONCE(cfs_rq->runtime_remaining > 0);
 
 		raw_spin_lock(&cfs_b->lock);
 		runtime = -cfs_rq->runtime_remaining + 1;
@@ -6212,7 +6212,7 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 				 * We currently only expect to be unthrottling
 				 * a single cfs_rq locally.
 				 */
-				SCHED_WARN_ON(!list_empty(&local_unthrottle));
+				WARN_ON_ONCE(!list_empty(&local_unthrottle));
 				list_add_tail(&cfs_rq->throttled_csd_list,
 					      &local_unthrottle);
 			}
@@ -6237,7 +6237,7 @@ next:
 
 		rq_unlock_irqrestore(rq, &rf);
 	}
-	SCHED_WARN_ON(!list_empty(&local_unthrottle));
+	WARN_ON_ONCE(!list_empty(&local_unthrottle));
 
 	rcu_read_unlock();
 
@@ -6789,7 +6789,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 
-	SCHED_WARN_ON(task_rq(p) != rq);
+	WARN_ON_ONCE(task_rq(p) != rq);
 
 	if (rq->cfs.h_nr_queued > 1) {
 		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
@@ -6900,8 +6900,8 @@ requeue_delayed_entity(struct sched_entity *se)
 	 * Because a delayed entity is one that is still on
 	 * the runqueue competing until elegibility.
 	 */
-	SCHED_WARN_ON(!se->sched_delayed);
-	SCHED_WARN_ON(!se->on_rq);
+	WARN_ON_ONCE(!se->sched_delayed);
+	WARN_ON_ONCE(!se->on_rq);
 
 	if (sched_feat(DELAY_ZERO)) {
 		update_entity_lag(cfs_rq, se);
@@ -7161,8 +7161,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		rq->next_balance = jiffies;
 
 	if (p && task_delayed) {
-		SCHED_WARN_ON(!task_sleep);
-		SCHED_WARN_ON(p->on_rq != 1);
+		WARN_ON_ONCE(!task_sleep);
+		WARN_ON_ONCE(p->on_rq != 1);
 
 		/* Fix-up what dequeue_task_fair() skipped */
 		hrtick_update(rq);
@@ -8740,7 +8740,7 @@ static inline void set_task_max_allowed_capacity(struct task_struct *p) {}
 static void set_next_buddy(struct sched_entity *se)
 {
 	for_each_sched_entity(se) {
-		if (SCHED_WARN_ON(!se->on_rq))
+		if (WARN_ON_ONCE(!se->on_rq))
 			return;
 		if (se_is_idle(se))
 			return;
@@ -12484,7 +12484,7 @@ unlock:
 
 void nohz_balance_exit_idle(struct rq *rq)
 {
-	SCHED_WARN_ON(rq != this_rq());
+	WARN_ON_ONCE(rq != this_rq());
 
 	if (likely(!rq->nohz_tick_stopped))
 		return;
@@ -12520,7 +12520,7 @@ void nohz_balance_enter_idle(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	SCHED_WARN_ON(cpu != smp_processor_id());
+	WARN_ON_ONCE(cpu != smp_processor_id());
 
 	/* If this CPU is going down, then nothing needs to be done: */
 	if (!cpu_active(cpu))
@@ -12603,7 +12603,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 	int balance_cpu;
 	struct rq *rq;
 
-	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
+	WARN_ON_ONCE((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
 	/*
 	 * We assume there will be no idle load after this update and clear
@@ -13043,7 +13043,7 @@ bool cfs_prio_less(const struct task_struct *a, const struct task_struct *b,
 	struct cfs_rq *cfs_rqb;
 	s64 delta;
 
-	SCHED_WARN_ON(task_rq(b)->core != rq->core);
+	WARN_ON_ONCE(task_rq(b)->core != rq->core);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
@@ -13246,7 +13246,7 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
-	SCHED_WARN_ON(p->se.sched_delayed);
+	WARN_ON_ONCE(p->se.sched_delayed);
 
 	attach_task_cfs_rq(p);
 
@@ -13281,7 +13281,7 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 	if (!first)
 		return;
 
-	SCHED_WARN_ON(se->sched_delayed);
+	WARN_ON_ONCE(se->sched_delayed);
 
 	if (hrtick_enabled_fair(rq))
 		hrtick_start_fair(rq, p);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8cebe71..8b8d2c1 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1713,7 +1713,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
 	BUG_ON(idx >= MAX_RT_PRIO);
 
 	queue = array->queue + idx;
-	if (SCHED_WARN_ON(list_empty(queue)))
+	if (WARN_ON_ONCE(list_empty(queue)))
 		return NULL;
 	next = list_entry(queue->next, struct sched_rt_entity, run_list);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5d853f9..fadaabe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -91,12 +91,6 @@ struct cpuidle_state;
 #include "cpupri.h"
 #include "cpudeadline.h"
 
-#ifdef CONFIG_SCHED_DEBUG
-# define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
-#else
-# define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
-#endif
-
 /* task_struct::on_rq states: */
 #define TASK_ON_RQ_QUEUED	1
 #define TASK_ON_RQ_MIGRATING	2
@@ -1571,7 +1565,7 @@ static inline void update_idle_core(struct rq *rq) { }
 
 static inline struct task_struct *task_of(struct sched_entity *se)
 {
-	SCHED_WARN_ON(!entity_is_task(se));
+	WARN_ON_ONCE(!entity_is_task(se));
 	return container_of(se, struct task_struct, se);
 }
 
@@ -1652,7 +1646,7 @@ static inline void assert_clock_updated(struct rq *rq)
 	 * The only reason for not seeing a clock update since the
 	 * last rq_pin_lock() is if we're currently skipping updates.
 	 */
-	SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
+	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
 }
 
 static inline u64 rq_clock(struct rq *rq)
@@ -1699,7 +1693,7 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
 static inline void rq_clock_start_loop_update(struct rq *rq)
 {
 	lockdep_assert_rq_held(rq);
-	SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+	WARN_ON_ONCE(rq->clock_update_flags & RQCF_ACT_SKIP);
 	rq->clock_update_flags |= RQCF_ACT_SKIP;
 }
 
@@ -1774,7 +1768,7 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
 	rf->clock_update_flags = 0;
 # ifdef CONFIG_SMP
-	SCHED_WARN_ON(rq->balance_callback && rq->balance_callback != &balance_push_callback);
+	WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
 # endif
 #endif
 }
@@ -2685,7 +2679,7 @@ static inline void idle_set_state(struct rq *rq,
 
 static inline struct cpuidle_state *idle_get_state(struct rq *rq)
 {
-	SCHED_WARN_ON(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held());
 
 	return rq->idle_state;
 }
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 19cdbe9..452826d 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -144,7 +144,7 @@ static inline void psi_enqueue(struct task_struct *p, int flags)
 
 	if (p->se.sched_delayed) {
 		/* CPU migration of "sleeping" task */
-		SCHED_WARN_ON(!(flags & ENQUEUE_MIGRATED));
+		WARN_ON_ONCE(!(flags & ENQUEUE_MIGRATED));
 		if (p->in_memstall)
 			set |= TSK_MEMSTALL;
 		if (p->in_iowait)

^ permalink raw reply related	[flat|nested] 30+ messages in thread
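
To make the output difference above concrete, here is a minimal
userspace sketch of the suggested "WARN_ON() variant that outputs the
condition in stringified form"; the macro name and program are
illustrative only, not kernel API (it uses a GNU statement expression,
as the kernel itself does):

  #include <stdio.h>

  /* Fires at most once and prints the condition text, in the spirit
   * of the removed SCHED_WARN_ON(x) == WARN_ONCE(x, #x). */
  #define WARN_STR_ONCE(cond) ({                                \
          static int __warned;                                  \
          int __ret = !!(cond);                                 \
          if (__ret && !__warned) {                             \
                  __warned = 1;                                 \
                  fprintf(stderr, "WARNING: '%s' at %s:%d\n",   \
                          #cond, __FILE__, __LINE__);           \
          }                                                     \
          __ret;                                                \
  })

  int main(void)
  {
          int on_rq = 0;

          /* Prints "WARNING: '!on_rq' at <file>:<line>", once only. */
          if (WARN_STR_ONCE(!on_rq))
                  return 1;
          return 0;
  }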

* Re: [tip: sched/core] sched/debug: Remove CONFIG_SCHED_DEBUG
  2025-03-20  8:59   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
@ 2025-03-24 11:57     ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2025-03-24 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Ingo Molnar, Shrikanth Hegde, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86

On Thu, Mar 20, 2025 at 08:59:56AM -0000, tip-bot2 for Ingo Molnar wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     b52173065e0aad82a31863bb5f63ebe46f7eb657
> Gitweb:        https://git.kernel.org/tip/b52173065e0aad82a31863bb5f63ebe46f7eb657
> Author:        Ingo Molnar <mingo@kernel.org>
> AuthorDate:    Mon, 17 Mar 2025 11:42:56 +01:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Wed, 19 Mar 2025 22:23:24 +01:00
> 
> sched/debug: Remove CONFIG_SCHED_DEBUG
> 
> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> in all the major Linux distributions:
> 
>    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
> 
> The reason is that while originally CONFIG_SCHED_DEBUG started
> out as a debugging feature, over the years (decades ...) it has
> grown various bits of statistics, instrumentation and
> control knobs that are useful for sysadmin and general software
> development purposes as well.
> 
> But within the kernel we still pretend that there's a choice,
> and sometimes code that is seemingly 'debug only' creates overhead
> that should be optimized in reality.
> 
> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> 
> Now that all uses of CONFIG_SCHED_DEBUG are removed from
> the code by previous patches, remove the Kconfig option as well.


I really don't much like this :-(

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [tip: sched/core] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
@ 2025-03-24 11:59     ` Peter Zijlstra
  2025-03-25  9:37       ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2025-03-24 11:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Ingo Molnar, Shrikanth Hegde, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86

On Thu, Mar 20, 2025 at 09:00:05AM -0000, tip-bot2 for Ingo Molnar wrote:
> The following commit has been merged into the sched/core branch of tip:
> 
> Commit-ID:     f7d2728cc032a23fccb5ecde69793a38eb30ba5c
> Gitweb:        https://git.kernel.org/tip/f7d2728cc032a23fccb5ecde69793a38eb30ba5c
> Author:        Ingo Molnar <mingo@kernel.org>
> AuthorDate:    Mon, 17 Mar 2025 11:42:52 +01:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Wed, 19 Mar 2025 22:20:53 +01:00
> 
> sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
> 
> The scheduler has this special SCHED_WARN_ON() facility that
> depends on CONFIG_SCHED_DEBUG.
> 
> Since CONFIG_SCHED_DEBUG is getting removed, convert
> SCHED_WARN_ON() to WARN_ON_ONCE().
> 
> Note that the warning output isn't 100% equivalent:
> 
>    #define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
> 
> Because SCHED_WARN_ON() would output the 'x' condition
> as well, while WARN_ON_ONCE() will only show a backtrace.
> 
> Hopefully these are rare enough to not really matter.
> 
> If it does, we should probably introduce a new WARN_ON()
> variant that outputs the condition in stringified form,
> or improve WARN_ON() itself.

So those strings really were useful; the trouble is WARN_ONCE() generates
utter crap code compared to WARN_ON_ONCE(), but since it was SCHED_DEBUG
code, that doesn't really matter.

Also, last time I measured, there was a measurable performance
difference between SCHED_DEBUG=n and SCHED_DEBUG=y.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [tip: sched/core] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  2025-03-24 11:59     ` Peter Zijlstra
@ 2025-03-25  9:37       ` Ingo Molnar
  2025-03-25 11:18         ` [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-25  9:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-tip-commits, Shrikanth Hegde, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Mar 20, 2025 at 09:00:05AM -0000, tip-bot2 for Ingo Molnar wrote:
> > The following commit has been merged into the sched/core branch of tip:
> > 
> > Commit-ID:     f7d2728cc032a23fccb5ecde69793a38eb30ba5c
> > Gitweb:        https://git.kernel.org/tip/f7d2728cc032a23fccb5ecde69793a38eb30ba5c
> > Author:        Ingo Molnar <mingo@kernel.org>
> > AuthorDate:    Mon, 17 Mar 2025 11:42:52 +01:00
> > Committer:     Ingo Molnar <mingo@kernel.org>
> > CommitterDate: Wed, 19 Mar 2025 22:20:53 +01:00
> > 
> > sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
> > 
> > The scheduler has this special SCHED_WARN_ON() facility that
> > depends on CONFIG_SCHED_DEBUG.
> > 
> > Since CONFIG_SCHED_DEBUG is getting removed, convert
> > SCHED_WARN_ON() to WARN_ON_ONCE().
> > 
> > Note that the warning output isn't 100% equivalent:
> > 
> >    #define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
> > 
> > Because SCHED_WARN_ON() would output the 'x' condition
> > as well, while WARN_ON_ONCE() will only show a backtrace.
> > 
> > Hopefully these are rare enough to not really matter.
> > 
> > If it does, we should probably introduce a new WARN_ON()
> > variant that outputs the condition in stringified form,
> > or improve WARN_ON() itself.
> 
> So those strings really were useful; the trouble is WARN_ONCE() generates
> utter crap code compared to WARN_ON_ONCE(), but since it was SCHED_DEBUG
> code, that doesn't really matter.

Why wouldn't it matter? CONFIG_SCHED_DEBUG was turned on for 99.9999% 
of Linux users, i.e. we generated crap code for most of our users.

And as a side effect of using the standard WARN_ON_ONCE() primitive we 
now generate better code, at the expense of harder to interpret debug 
output, right?

I.e. CONFIG_SCHED_DEBUG has obfuscated crappy code generation under the 
"it's only debugging code" pretense, right?

> Also, last time I measured, there was a measurable performance
> difference between SCHED_DEBUG=n and SCHED_DEBUG=y.

Which 99.9999% of Linux users are affected by. The config option 
basically did nothing for them but hide this overhead...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
  2025-03-25  9:37       ` Ingo Molnar
@ 2025-03-25 11:18         ` Ingo Molnar
  2025-03-25 12:36           ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-25 11:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-tip-commits, Shrikanth Hegde, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86


* Ingo Molnar <mingo@kernel.org> wrote:

> > >    #define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
> > > 
> > > Because SCHED_WARN_ON() would output the 'x' condition
> > > as well, while WARN_ON_ONCE() will only show a backtrace.
> > > 
> > > Hopefully these are rare enough to not really matter.
> > > 
> > > If it does, we should probably introduce a new WARN_ON()
> > > variant that outputs the condition in stringified form,
> > > or improve WARN_ON() itself.
> > 
> > So those strings really were useful; the trouble is WARN_ONCE() generates
> > utter crap code compared to WARN_ON_ONCE(), but since it was SCHED_DEBUG
> > code, that doesn't really matter.
> 
> Why wouldn't it matter? CONFIG_SCHED_DEBUG was turned on for 99.9999% 
> of Linux users, ie. we generated crap code for most of our users.
> 
> And as a side effect of using the standard WARN_ON_ONCE() primitive we 
> now generate better code, at the expense of harder to interpret debug 
> output, right?
> 
> Ie. CONFIG_SCHED_DEBUG has obfuscated crappy code generation under the 
> "it's only debugging code" pretense, right?

So, to argue this via code, we'd like to have something like the patch below?

When enabled it will warn in the following fashion:

  static void super_perfect_kernel_function(void *ptr)
  {
	...
	WARN_ON_ONCE(ptr == 0 && 1);
	...
  }


  ------------[ cut here ]------------
  FAIL: 'ptr == 0 && 1' is true
  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:8511 sched_init+0x44/0x430
  ...

But the real question is, how do we keep distros from enabling 
CONFIG_DEBUG_BUGVERBOSE_EXTRA=y?

It does bloat the defconfig by about +144k .text and ~64k data, so 
maybe that's deterrence enough.

The BSS shift is due to it not using the clever x86 UD2 tricks, right?

Thanks,

	Ingo

=================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 25 Mar 2025 11:35:20 +0100
Subject: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions

      text         data	    bss	     dec	    hex	filename
  29522704	7926322	1389904	38838930	250a292	vmlinux.before
  29667392	8017958	1363024	39048374	253d4b6	vmlinux.after

Totally-Not-Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/asm-generic/bug.h |  7 +++++++
 kernel/sched/core.c       |  2 ++
 lib/Kconfig.debug         | 12 ++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 387720933973..5475258a99dc 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -92,6 +92,11 @@ void warn_slowpath_fmt(const char *file, const int line, unsigned taint,
 		       const char *fmt, ...);
 extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 
+#ifdef CONFIG_DEBUG_BUGVERBOSE_EXTRA
+#define WARN_ON_ONCE(condition)						\
+	DO_ONCE_LITE_IF(condition, WARN, 1, "FAIL: '%s' is true", #condition)
+#endif
+
 #ifndef __WARN_FLAGS
 #define __WARN()		__WARN_printf(TAINT_WARN, NULL)
 #define __WARN_printf(taint, arg...) do {				\
@@ -107,6 +112,7 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 		__WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
 		instrumentation_end();					\
 	} while (0)
+#ifndef WARN_ON_ONCE
 #define WARN_ON_ONCE(condition) ({				\
 	int __ret_warn_on = !!(condition);			\
 	if (unlikely(__ret_warn_on))				\
@@ -115,6 +121,7 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 	unlikely(__ret_warn_on);				\
 })
 #endif
+#endif
 
 /* used internally by panic.c */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87540217fc09..71bf94bf68f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8508,6 +8508,8 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	WARN_ON_ONCE(ptr == 0 && 1);
+
 	/* Make sure the linker didn't screw up */
 #ifdef CONFIG_SMP
 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b1b92a9a8f24..88f215f712f8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -206,6 +206,18 @@ config DEBUG_BUGVERBOSE
 	  of the BUG call as well as the EIP and oops trace.  This aids
 	  debugging but costs about 70-100K of memory.
 
+config DEBUG_BUGVERBOSE_EXTRA
+	bool "Extra verbose WARN_ON() reporting" if DEBUG_BUGVERBOSE
+	default n
+	help
+	  Say Y here to make WARN_ON() warnings extra verbose, printing
+	  the condition they warn about.
+
+	  This aids debugging but uses up some memory and causes some
+	  runtime overhead due to worse code generation.
+
+	  If unsure, say N.
+
 endmenu # "printk and dmesg options"
 
 config DEBUG_KERNEL


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
  2025-03-25 11:18         ` [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions Ingo Molnar
@ 2025-03-25 12:36           ` Peter Zijlstra
  2025-03-25 17:48             ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2025-03-25 12:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-tip-commits, Shrikanth Hegde, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Linus Torvalds, x86

On Tue, Mar 25, 2025 at 12:18:39PM +0100, Ingo Molnar wrote:

> So, to argue this via code, we'd like to have something like the patch below?

I would do it differently. If we know the thing is a simple string, we
can stick it in bug_entry and print from __report_bug() without causing
horrific shite at the call site.

The problem with WARN() is that it is a format string, which must be
filled out in situ. Resulting in calls to snprintf() and arguments and
whatnot.

^ permalink raw reply	[flat|nested] 30+ messages in thread
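
To illustrate the bug_entry idea Peter sketches above, here is a rough
userspace model of the two call-site shapes; the struct, macros and
reporter below are hypothetical stand-ins, not the kernel's actual
implementation:

  #include <stdio.h>

  /* Side-table entry, loosely analogous to struct bug_entry. */
  struct warn_entry {
          const char *cond;       /* static string, no in-situ formatting */
          const char *file;
          int line;
  };

  static void report_warn(const struct warn_entry *e)
  {
          fprintf(stderr, "WARNING: [%s] %s:%d\n", e->cond, e->file, e->line);
  }

  /* Cheap shape: the call site only tests the condition and passes a
   * pointer to a constant entry to an out-of-line reporter. */
  #define WARN_TABLE(cond)                                              \
  do {                                                                  \
          static const struct warn_entry __e =                          \
                  { #cond, __FILE__, __LINE__ };                        \
          if (cond)                                                     \
                  report_warn(&__e);                                    \
  } while (0)

  /* Expensive shape: the format string must be filled out in situ,
   * pulling printf machinery and arguments into every call site. */
  #define WARN_FMT(cond, fmt, ...)                                      \
  do {                                                                  \
          if (cond)                                                     \
                  fprintf(stderr, "WARNING: " fmt "\n", ##__VA_ARGS__); \
  } while (0)

  int main(void)
  {
          int nr_running = -1;

          WARN_TABLE(nr_running < 0);
          WARN_FMT(nr_running < 0, "nr_running is %d", nr_running);
          return 0;
  }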

* Re: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
  2025-03-25 12:36           ` Peter Zijlstra
@ 2025-03-25 17:48             ` Linus Torvalds
  2025-03-25 18:46               ` Peter Zijlstra
  2025-03-25 22:42               ` [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output Ingo Molnar
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2025-03-25 17:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, linux-tip-commits, Shrikanth Hegde,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, x86

On Tue, 25 Mar 2025 at 05:36, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The problem with WARN() is that it is a format string, which must be
> filled out in situ. Resulting in calls to snprintf() and arguments and
> whatnot.

A fair number of warnings do want the format string, so that you can
print out more information about what went wrong if the warning
triggered.

That said, I do think that the "just give a fixed string that is the
warning condition" is probably the right thing 90% of the time, and is
the much simpler interface to use and causes much less code
(exactly because it's just a single hardcoded string at compile time).

So I think we end up wanting both.

But I *don't* like Ingo's suggestion of DEBUG_BUGVERBOSE_EXTRA,
because it does that "both" by making the simple case complicated.

How about going a different route instead? Right now we have that
CONFIG_DEBUG_BUGVERBOSE thing which adds the file name and line number
information. That has been very good.

But maybe that should be extended to also always take the compile-time
'#condition' string?

So then all warnings would have the warning condition string (assuming
you end up enabling DEBUG_BUGVERBOSE, of course, which I think
everybody pretty much does). With no extra code.

And then the _dynamic_ string - and associated code generation - would
be only for when you want to print out the actual values that caused
the warning.

Hmm?

             Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
  2025-03-25 17:48             ` Linus Torvalds
@ 2025-03-25 18:46               ` Peter Zijlstra
  2025-03-25 22:42               ` [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output Ingo Molnar
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2025-03-25 18:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, linux-kernel, linux-tip-commits, Shrikanth Hegde,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, x86

On Tue, Mar 25, 2025 at 10:48:49AM -0700, Linus Torvalds wrote:
> Hmm?

That is indeed what I was thinking of; far better articulated :-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output
  2025-03-25 17:48             ` Linus Torvalds
  2025-03-25 18:46               ` Peter Zijlstra
@ 2025-03-25 22:42               ` Ingo Molnar
  2025-03-25 23:12                 ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2025-03-25 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, linux-kernel, linux-tip-commits, Shrikanth Hegde,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, x86


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> How about going a different route instead? Right now we have that 
> CONFIG_DEBUG_BUGVERBOSE thing which adds the file name and line 
> number information. That has been very good.
> 
> But maybe that should be extended to also always take the 
> compile-time '#condition' string?
> 
> So then all warnings would have the warning condition string 
> (assuming you end up enabling DEBUG_BUGVERBOSE, of course, which I 
> think everybody pretty much does). With no extra code.

So something like the patch below?

Testcase:

  @@ -8508,6 +8508,8 @@ void __init sched_init(void)
          unsigned long ptr = 0;
          int i;
 
  +       WARN_ON_ONCE(ptr == 0 && 1);
  +

Before:

  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:8511 sched_init+0x20/0x410

After:

  WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
                            ^^^^^^^^^^^^^^^

I concatenated the condition string with the file string, so I didn't
have to extend the 'struct bug_entry' backend, and it could be shared
with the regular WARN() and BUG*() code as well without modifying their
output.
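
This works because C merges adjacent string literals at compile time.
A tiny standalone sketch of the same idea, with a hypothetical macro
name:

  #include <stdio.h>

  /* "[" #cond "] " and __FILE__ are concatenated into one static
   * string at compile time, so no extra backend field is needed. */
  #define COND_FILE_STR(cond)     "[" #cond "] " __FILE__

  int main(void)
  {
          /* Prints: [ptr == 0 && 1] <this file> */
          puts(COND_FILE_STR(ptr == 0 && 1));
          return 0;
  }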

The .text impact is zero, as hoped for:

       text       data        bss         dec        hex    filename
   29523998    7926322    1389904    38840224    250a7a0    vmlinux.before
   29523998    8024626    1389904    38938528    25227a0    vmlinux.after

So this does have the debugging advantages of SCHED_WARN_ON() and the 
code generation benefits of WARN_ON_ONCE().

Note that the patch still has the maturity of a Labradoodle puppy: it 
won't build on the majority of non-x86 architectures, has only been 
built and booted once, etc. - so it's not signed off on.

Thanks,

	Ingo

===================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 25 Mar 2025 12:18:44 +0100
Subject: [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output

       text       data        bss         dec        hex    filename
   29523998    7926322    1389904    38840224    250a7a0    vmlinux.before
   29523998    8024626    1389904    38938528    25227a0    vmlinux.after

Before:

  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:8511 sched_init+0x20/0x410

After:

  WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
                            ^^^^^^^^^^^^^^^
Not-Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Z-KRD3ODxT9f8Yjw@gmail.com
---
 arch/x86/include/asm/bug.h | 14 +++++++-------
 include/asm-generic/bug.h  |  7 ++++---
 kernel/sched/core.c        |  2 ++
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index f0e9acf72547..e966199c8ef7 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -39,7 +39,7 @@
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 
-#define _BUG_FLAGS(ins, flags, extra)					\
+#define _BUG_FLAGS(cond_str, ins, flags, extra)				\
 do {									\
 	asm_inline volatile("1:\t" ins "\n"				\
 		     ".pushsection __bug_table,\"aw\"\n"		\
@@ -50,14 +50,14 @@ do {									\
 		     "\t.org 2b+%c3\n"					\
 		     ".popsection\n"					\
 		     extra						\
-		     : : "i" (__FILE__), "i" (__LINE__),		\
+		     : : "i" (cond_str __FILE__), "i" (__LINE__),		\
 			 "i" (flags),					\
 			 "i" (sizeof(struct bug_entry)));		\
 } while (0)
 
 #else /* !CONFIG_DEBUG_BUGVERBOSE */
 
-#define _BUG_FLAGS(ins, flags, extra)					\
+#define _BUG_FLAGS(cond_str, ins, flags, extra)				\
 do {									\
 	asm_inline volatile("1:\t" ins "\n"				\
 		     ".pushsection __bug_table,\"aw\"\n"		\
@@ -74,7 +74,7 @@ do {									\
 
 #else
 
-#define _BUG_FLAGS(ins, flags, extra)  asm volatile(ins)
+#define _BUG_FLAGS(cond_str, ins, flags, extra)  asm volatile(ins)
 
 #endif /* CONFIG_GENERIC_BUG */
 
@@ -82,7 +82,7 @@ do {									\
 #define BUG()							\
 do {								\
 	instrumentation_begin();				\
-	_BUG_FLAGS(ASM_UD2, 0, "");				\
+	_BUG_FLAGS("", ASM_UD2, 0, "");				\
 	__builtin_unreachable();				\
 } while (0)
 
@@ -92,11 +92,11 @@ do {								\
  * were to trigger, we'd rather wreck the machine in an attempt to get the
  * message out than not know about it.
  */
-#define __WARN_FLAGS(flags)					\
+#define __WARN_FLAGS(cond_str, flags)				\
 do {								\
 	__auto_type __flags = BUGFLAG_WARNING|(flags);		\
 	instrumentation_begin();				\
-	_BUG_FLAGS(ASM_UD2, __flags, ANNOTATE_REACHABLE(1b));	\
+	_BUG_FLAGS(cond_str, ASM_UD2, __flags, ANNOTATE_REACHABLE(1b)); \
 	instrumentation_end();					\
 } while (0)
 
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 387720933973..c8e7126bc26e 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -100,17 +100,18 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 		instrumentation_end();					\
 	} while (0)
 #else
-#define __WARN()		__WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
+#define __WARN()		__WARN_FLAGS("", BUGFLAG_TAINT(TAINT_WARN))
 #define __WARN_printf(taint, arg...) do {				\
 		instrumentation_begin();				\
 		__warn_printk(arg);					\
-		__WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
+		__WARN_FLAGS("", BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
 		instrumentation_end();					\
 	} while (0)
 #define WARN_ON_ONCE(condition) ({				\
 	int __ret_warn_on = !!(condition);			\
 	if (unlikely(__ret_warn_on))				\
-		__WARN_FLAGS(BUGFLAG_ONCE |			\
+		__WARN_FLAGS("["#condition"] ",			\
+			     BUGFLAG_ONCE |			\
 			     BUGFLAG_TAINT(TAINT_WARN));	\
 	unlikely(__ret_warn_on);				\
 })
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87540217fc09..71bf94bf68f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8508,6 +8508,8 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	WARN_ON_ONCE(ptr == 0 && 1);
+
 	/* Make sure the linker didn't screw up */
 #ifdef CONFIG_SMP
 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output
  2025-03-25 22:42               ` [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output Ingo Molnar
@ 2025-03-25 23:12                 ` Linus Torvalds
  2025-03-26  7:42                   ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2025-03-25 23:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, linux-kernel, linux-tip-commits, Shrikanth Hegde,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, x86

On Tue, 25 Mar 2025 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
>
> So something like the patch below?
> [...]
> After:
>
>   WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
>                             ^^^^^^^^^^^^^^^

Hmm. Is that the prettiest output ever? No. But it does seem workable,
and the patch is simple.

And I think the added condition string is useful, in that I often end
up looking up warnings that other people report and where the line
numbers have changed enough that it's not immediately obvious exactly
which warning it is. Not only does it disambiguate which warning it
is, it would probably often obviate having to look it up
entirely because the warning message is now more useful.

So I think I like it. Let's see how it works in practice.

(I actually think the "CPU: 0 PID: 0" is likely the least useful part
of that warning string, and maybe *that* should be moved away to make
things a bit more legible, but I think that discussion might as well
be part of that "Let's see how it works")

            Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output
  2025-03-25 23:12                 ` Linus Torvalds
@ 2025-03-26  7:42                   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2025-03-26  7:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, linux-kernel, linux-tip-commits, Shrikanth Hegde,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, x86


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 25 Mar 2025 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So something like the patch below?
> > [...]
> > After:
> >
> >   WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
> >                             ^^^^^^^^^^^^^^^
> 
> Hmm. Is that the prettiest output ever? No. But it does seem workable,
> and the patch is simple.
> 
> And I think the added condition string is useful, in that I often end
> up looking up warnings that other people report and where the line
> numbers have changed enough that it's not immediately obvious exactly
> which warning it is. Not only does it disambiguate which warning it
> is, it would probably often obviate having to look it up
> entirely because the warning message is now more useful.

Yeah, that exactly was the original motivation for SCHED_WARN_ON(): 
core kernel code often gets backported and changed by distributions, 
so line numbers are fuzzy and with large functions it's sometimes 
unclear exactly where the warning originated from.

> So I think I like it. Let's see how it works in practice.
> 
> (I actually think the "CPU: 0 PID: 0" is likely the least useful part 
> of that warning string, and maybe *that* should be moved away and 
> make things a bit more legible, but I think that discussion might as 
> well be part of that "Let's see how it works")

Okay!

The CPU and PID part is particularly useless, given that it's repeated 
in the splat a few lines later:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-01616-g94d7af2844aa #4 PREEMPT(undef)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  RIP: 0010:sched_init+0x20/0x410

So I'll just remove it, which will turn this into:

  WARNING: [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410

Which is actually pretty nicely formatted IMHO and orders the 
information by expected entropy: most constant, most valuable 
information comes first.

BTW., there's also another option we still have open: by using a unique 
character separator that isn't 0 we could split up the single string 
into cond_str and FILE_str parts, and leave formatting to 
architectures. But I don't think it's needed if we get rid of the "CPU: 
PID:" noise though.

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread
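
The separator option Ingo mentions above can be sketched in plain C
with a hypothetical non-zero separator byte; none of these names are
kernel API:

  #include <stdio.h>
  #include <string.h>

  #define SEP                     "\x01"  /* hypothetical separator byte */
  #define COND_AND_FILE(cond)     #cond SEP __FILE__

  /* Split the combined static string back into condition and file
   * parts, so each architecture could format them as it likes. */
  static void print_warn(const char *s, int line)
  {
          const char *sep = strchr(s, '\x01');

          if (sep)
                  printf("WARNING: [%.*s] %s:%d\n",
                         (int)(sep - s), s, sep + 1, line);
          else
                  printf("WARNING: %s:%d\n", s, line);
  }

  int main(void)
  {
          /* Prints: WARNING: [ptr == 0 && 1] <this file>:8511 */
          print_warn(COND_AND_FILE(ptr == 0 && 1), 8511);
          return 0;
  }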

end of thread

Thread overview: 30+ messages
2025-03-17 10:42 [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Ingo Molnar
2025-03-17 10:42 ` [PATCH 1/5] sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() Ingo Molnar
2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2025-03-24 11:59     ` Peter Zijlstra
2025-03-25  9:37       ` Ingo Molnar
2025-03-25 11:18         ` [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions Ingo Molnar
2025-03-25 12:36           ` Peter Zijlstra
2025-03-25 17:48             ` Linus Torvalds
2025-03-25 18:46               ` Peter Zijlstra
2025-03-25 22:42               ` [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output Ingo Molnar
2025-03-25 23:12                 ` Linus Torvalds
2025-03-26  7:42                   ` Ingo Molnar
2025-03-17 10:42 ` [PATCH 2/5] sched/debug: Make 'const_debug' tunables unconditional __read_mostly Ingo Molnar
2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2025-03-17 10:42 ` [PATCH 3/5] sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional Ingo Molnar
2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2025-03-17 10:42 ` [PATCH 4/5] sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation Ingo Molnar
2025-03-20  9:00   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2025-03-17 10:42 ` [PATCH 5/5] sched/debug: Remove CONFIG_SCHED_DEBUG Ingo Molnar
2025-03-20  8:59   ` [tip: sched/core] " tip-bot2 for Ingo Molnar
2025-03-24 11:57     ` Peter Zijlstra
2025-03-17 21:39 ` [PATCH 0/5] sched: Make CONFIG_SCHED_DEBUG features unconditional Linus Torvalds
2025-03-17 22:24   ` Ingo Molnar
2025-03-17 22:42     ` Ingo Molnar
2025-03-19  8:49 ` Valentin Schneider
2025-03-19 21:09   ` Ingo Molnar
2025-03-19 12:48 ` Shrikanth Hegde
2025-03-19 21:14   ` Ingo Molnar
2025-03-20  4:41     ` Shrikanth Hegde
2025-03-20  9:00     ` [tip: sched/core] sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files tip-bot2 for Ingo Molnar
