[PATCH 01/12] sched_debug: Unify parsing methods for task

public inbox for linux-rt-users@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 01/12] sched_debug: Unify parsing methods for task_info
@ 2025-10-17  2:24 Clark Williams
  2025-10-17  2:24 ` [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering Clark Williams
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Clark Williams, Derek Barbosa, wander,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

In the sched_debug backend code, there are two logical paths for parsing
the sched_debug file's task_info "running tasks". These are currently
divided into "OLD" and "NEW" parsing functions, each with their own
logic.

Unify these branching code paths by creating a line-based "word" parser
that stores the word-offset of the neccessary fields in a struct.
Accomodate "legacy" behavior where needed using an enumerated type.

parse_task_lines() now can parse multiple formats of output from the
debugfs file which on modern systems is located at
/sys/kernel/debug/sched/debug, and on 3.X "legacy" systems, is located
at /proc/sched_debug.

The detect_task_format() function records field offsets which are used
by parse_task_lines to pull fields out of the "running tasks:" section.

Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Derek Barbosa <debarbos@redhat.com>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/sched_debug.c | 405 +++++++++++++++++++++-------------------------
 src/sched_debug.h |  65 +++++++-
 2 files changed, 244 insertions(+), 226 deletions(-)

diff --git a/src/sched_debug.c b/src/sched_debug.c
index fa2f74bf36ed..180932ca7aa3 100644
--- a/src/sched_debug.c
+++ b/src/sched_debug.c
@@ -24,6 +24,9 @@
  */
 static int config_task_format;
 
+static struct task_format_offsets
+    config_task_format_offsets  = { 0, 0, 0, 0 };
+
 /*
  * Read the contents of sched_debug into the input buffer.
  */
@@ -88,8 +91,12 @@ static char *get_next_cpu_info_start(char *start)
 {
 	const char *next_cpu = "cpu#";
 
-	/* Skip the current CPU definition. */
-	start += 10;
+        /*
+         * Skip the current CPU definition.
+         * We want to move our "cursor" past the current "cpu#" definition.
+         * This number is arbitrary. It is purely to assist strstr().
+         */
+        start += 10;
 
 	return strstr(start, next_cpu);
 }
@@ -143,9 +150,13 @@ static inline char *skipchars(char *str)
 	return str;
 }
 
+/*
+ * Note, for our purposes newline is *not* a space
+ * and we want to stop when we hit it
+ */
 static inline char *skipspaces(char *str)
 {
-	while (*str && isspace(*str))
+	while (*str && isspace(*str) && (*str != '\n'))
 		str++;
 	return str;
 }
@@ -156,6 +167,20 @@ static inline char *nextline(char *str)
 	return ptr ? ptr+1 : NULL;
 }
 
+/*
+ * skip a specified number of words on a task line
+ */
+
+static inline char *skipwords(char *ptr, int nwords)
+{
+	int i;
+	for (i=0; i < nwords; i++) {
+		ptr = skipspaces(ptr);
+		ptr = skipchars(ptr);
+	}
+	return ptr;
+}
+
 /*
  * Read sched_debug and figure out if it's old or new format
  * done once so if we fail just exit the program.
@@ -173,6 +198,7 @@ static int detect_task_format(void)
 	char *ptr;
 	int status;
 	int fd;
+	int i, count=0;
 
 	bufsiz = bufincrement = BUFFER_PAGES * page_size;
 
@@ -204,122 +230,63 @@ static int detect_task_format(void)
 
 	ptr = strstr(buffer, TASK_MARKER);
 	if (ptr == NULL) {
-		fprintf(stderr, "unable to find 'runnable tasks' in buffer, invalid input\n");
+		die("unable to find 'runnable tasks' in buffer, invalid input\n");
 		exit(-1);
 	}
 
-	ptr += strlen(TASK_MARKER) + 1;
-	ptr = skipspaces(ptr);
+	ptr = nextline(ptr);
+	i = 0;
 
-	if (strncmp(ptr, "task", 4) == 0) {
-		retval = OLD_TASK_FORMAT;
-		log_msg("detected old task format\n");
-	} else if (strncmp(ptr, "S", 1) == 0) {
+	/*
+	 * Determine the TASK_FORMAT from the first "word" in the header
+	 * line.
+	 */
+	ptr = skipspaces(ptr);
+	if (strncmp(ptr, "S", strlen("S")) == 0) {
+		log_msg("detect_task_format: NEW_TASK_FORMAT detected\n");
 		retval = NEW_TASK_FORMAT;
-		log_msg("detected new task format\n");
 	}
-
-	free(buffer);
-	return retval;
-}
-
-/*
- * Parse the new sched_debug format.
- *
- * Example:
- * ' S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep'
- * '-----------------------------------------------------------------------------------------------------------'
- * ' I         rcu_gp     3        13.973264         2   100         0.000000         0.004469         0.000000 0 0 /
- */
-static int parse_new_task_format(char *buffer, struct task_info *task_info, int nr_entries)
-{
-	char *R, *X, *start = buffer;
-	struct task_info *task;
-	int tasks = 0;
-	int comm_size;
-	char *end;
+	else {
+		log_msg("detect_task_format: OLD_TASK_FORMAT detected\n");
+		retval = OLD_TASK_FORMAT;
+	}
 
 	/*
-	 * If we have less than two tasks on the CPU there is no
-	 * possibility of a stall.
+	 * Look for our header keywords and store their offset
+	 * we'll use the offsets when we actually parse the task
+	 * line data
 	 */
-	if (nr_entries < 2)
-		return 0;
-
-	while (tasks < nr_entries) {
-		task = &task_info[tasks];
-
-		/*
-		 * Runnable tasks.
-		 */
-		R = strstr(start, "\n R");
-
-		/*
-		 * Dying tasks.
-		 */
-		X = strstr(start, "\n X");
-
-		/*
-		 * Get the first one, the only one, or break.
-		 */
-		if (X && R) {
-			start = R < X ? R : X;
-		} else if (X || R) {
-			start = R ? R : X;
-		} else {
-			break;
+	while (*ptr != '\n') {
+		ptr = skipspaces(ptr);
+		if (strncmp(ptr, "task", strlen("task")) == 0) {
+			config_task_format_offsets.task = i;
+			count++;
+			log_msg("detect_task_format: found 'task' at word %d\n", i);
 		}
-
-		/* Skip '\n R' || '\n X'. */
-		start = &start[3];
-
-		/* Skip the spaces. */
-		start = skipspaces(start);
-
-		/* Find the end of the string. */
-		end = skipchars(start);
-
-		comm_size = end - start;
-
-		if (comm_size >= COMM_SIZE) {
-			warn("comm_size is too large: %d\n", comm_size);
-			comm_size = COMM_SIZE - 1;
+		else if (strncmp(ptr, "PID", strlen("PID")) == 0) {
+			config_task_format_offsets.pid = i;
+			count++;
+			log_msg("detect_task_format: found 'PID' at word %d\n", i);
 		}
-
-		strncpy(task->comm, start, comm_size);
-
-		task->comm[comm_size] = '\0';
-
-		/* Go to the end of the task comm. */
-		start=end;
-
-		task->pid = strtol(start, &end, 10);
-
-		/* Get the id of the thread group leader. */
-		task->tgid = get_tgid(task->pid);
-
-		/* Go to the end of the pid. */
-		start=end;
-
-		/* Skip the tree-key. */
-		start = skipspaces(start);
-		start = skipchars(start);
-
-		task->ctxsw = strtol(start, &end, 10);
-
-		start = end;
-
-		task->prio = strtol(start, &end, 10);
-
-		task->since = time(NULL);
-
-		/* Go to the end and try to find the next occurrence. */
-		start = end;
-
-		tasks++;
+		else if (strncmp(ptr, "switches", strlen("switches")) == 0) {
+			config_task_format_offsets.switches = i;
+			count++;
+			log_msg("detect_task_format: found 'switches' at word %d\n", i);
+		}
+		else if (strncmp(ptr, "prio", strlen("prio")) == 0) {
+			config_task_format_offsets.prio = i;
+			count++;
+			log_msg("detect_task_format: found 'prio' at word %d\n", i);
+		}
+		ptr = skipchars(ptr);
+		i++;
 	}
 
-	return tasks;
+	if (count != 4)
+		die("detect_task_format: did not detect all task line fields we need\n");
+
+	free(buffer);
+	return retval;
 }
 
 /*
@@ -387,104 +354,80 @@ static int is_runnable(int pid)
 	return runnable;
 }
 
-static int count_task_lines(char *buffer)
-{
-	int lines = 0;
-	char *ptr;
-	int len;
-
-	len = strlen(buffer);
-
-	/* Find the runnable tasks: header. */
-	ptr = strstr(buffer, TASK_MARKER);
-	if (ptr == NULL)
-		return 0;
-
-	/* Skip to the end of the dashed line separator. */
-	ptr = strstr(ptr, "-\n");
-	if (ptr == NULL)
-		return 0;
-
-	ptr += 2;
-	while(*ptr && ptr < (buffer+len)) {
-		lines++;
-		ptr = strchr(ptr, '\n');
-		if (ptr == NULL)
-			break;
-		ptr++;
-	}
-	return lines;
-}
-
-/*
- * Parse the old sched debug format:
- *
- * Example:
- * '            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
- * ' ----------------------------------------------------------------------------------------------------------
- * '     watchdog/35   296       -11.731402      4081     0         0.000000        44.052473         0.000000 /
- */
-static int parse_old_task_format(char *buffer, struct task_info *task_info, int nr_entries)
+static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_entries)
 {
 	int pid, ctxsw, prio, comm_size;
-	char *start, *end, *buffer_end;
+	char *ptr, *line, *end;
 	struct task_info *task;
 	char comm[COMM_SIZE];
-	int waiting_tasks = 0;
-
-	start = buffer;
-	start = strstr(start, TASK_MARKER);
-	start = strstr(start, "-\n");
-	start++;
+	int tasks = 0;
 
-	buffer_end = buffer + strlen(buffer);
+	if ((ptr = strstr(buffer, TASK_MARKER)) == NULL)
+		die ("no runnable task section found!\n");
 
 	/*
-	 * We can't short-circuit using nr_entries, we have to scan the
-	 * entire list of processes that is on this CPU.
+	 * If we have less than two tasks on the CPU there is no
+	 * possibility of a stall.
 	 */
-	while (*start && start < buffer_end) {
-		task = &task_info[waiting_tasks];
+	if (nr_entries < 2)
+		return 0;
+	line = ptr;
+
+	/* skip header and divider */
+	line = nextline(line);
+	line = nextline(line);
+	
+	/* now loop over the task info */
+	while (tasks < nr_entries) {
+		task = &task_info[tasks];
 
-		/* Only care about tasks that are not R (running on a CPU). */
-		if (start[0] == 'R') {
+		/*
+		 * In 3.X kernels, only the singular RUNNING task receives
+		 * a "running state" label. Therefore, only care about
+		 * tasks that are not R (running on a CPU).
+		 */
+		if ((config_task_format == OLD_TASK_FORMAT) &&
+			(*ptr == 'R')) {
 			/* Go to the end of the line and ignore this task. */
-			start = strchr(start, '\n');
-			start++;
+			ptr = strchr(ptr, '\n');
+			ptr++;
 			continue;
 		}
 
-		/* Pick up the comm field. */
-		start = skipspaces(start);
-		end = skipchars(start);
-		comm_size = end - start;
+		/* get the task field */
+		ptr = skipwords(line, config_task_format_offsets.task);
+
+		/* Find the end of the task field */
+		end = skipchars(ptr);
+		comm_size = end - ptr;
+
+		/* make sure we don't overflow the comm array */
 		if (comm_size >= COMM_SIZE) {
 			warn("comm_size is too large: %d\n", comm_size);
 			comm_size = COMM_SIZE - 1;
 		}
-		strncpy(comm, start, comm_size);
-		comm[comm_size] = 0;
-
-		/* Go to the end of the task comm. */
-		start=end;
-
-		/* Now pick up the pid. */
-		pid = strtol(start, &end, 10);
-
-		/* Go to the end of the pid. */
-		start=end;
-
-		/* Skip the tree-key. */
-		start = skipspaces(start);
-		start = skipchars(start);
-
-		/* Pick up the context switch count. */
-		ctxsw = strtol(start, &end, 10);
-		start = end;
-
-		/* Get the priority. */
-		prio = strtol(start, &end, 10);
-		if (is_runnable(pid)) {
+		strncpy(comm, ptr, comm_size);
+		comm[comm_size] = '\0';
+		ptr = end;
+
+		/* get the PID field */
+		ptr = skipwords(line, config_task_format_offsets.pid);
+		pid = strtol(ptr, NULL, 10);
+
+		/* get the context switches field */
+		ptr = skipwords(line, config_task_format_offsets.switches);
+		ctxsw = strtol(ptr, NULL, 10);
+
+		/* get the prio field */
+		ptr = skipwords(line, config_task_format_offsets.prio);
+		prio = strtol(ptr, NULL, 10);
+
+                /*
+                 * In older formats, we must check to
+                 * see if the process is runnable prior to storing header
+                 * fields and incrementing task processing
+                 */
+                if ((config_task_format == NEW_TASK_FORMAT) || (is_runnable(pid))) {
 			strncpy(task->comm, comm, comm_size);
 			task->comm[comm_size] = 0;
 			task->pid = pid;
@@ -492,18 +435,44 @@ static int parse_old_task_format(char *buffer, struct task_info *task_info, int
 			task->ctxsw = ctxsw;
 			task->prio = prio;
 			task->since = time(NULL);
-			waiting_tasks++;
+			/* increment the count of tasks processed */
+			tasks++;
+		} else {
+			continue;
 		}
 
-		if ((start = nextline(start)) == NULL)
-			break;
+	}
+	return tasks;
+}
 
-		if (waiting_tasks >= nr_entries) {
+
+static int count_task_lines(char *buffer)
+{
+	int lines = 0;
+	char *ptr;
+	int len;
+
+	len = strlen(buffer);
+
+	/* Find the runnable tasks: header. */
+	ptr = strstr(buffer, TASK_MARKER);
+	if (ptr == NULL)
+		return 0;
+
+	/* Skip to the end of the dashed line separator. */
+	ptr = strstr(ptr, "-\n");
+	if (ptr == NULL)
+		return 0;
+
+	ptr += 2;
+	while(*ptr && ptr < (buffer+len)) {
+		lines++;
+		ptr = strchr(ptr, '\n');
+		if (ptr == NULL)
 			break;
-		}
+		ptr++;
 	}
-
-	return waiting_tasks;
+	return lines;
 }
 
 static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
@@ -515,36 +484,23 @@ static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
 		warn("NULL cpu_info pointer!\n");
 		return 0;
 	}
-	nr_entries = cpu_info->nr_running;
-
-	switch (config_task_format) {
-	case NEW_TASK_FORMAT:
-		cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
-		if (cpu_info->starving == NULL) {
-			warn("failed to malloc %d task_info structs", nr_entries);
-			return 0;
-		}
-		nr_waiting = parse_new_task_format(buffer, cpu_info->starving, nr_entries);
-		break;
-	case OLD_TASK_FORMAT:
-		/*
-		 * The old task format does not output a correct value for
-		 * nr_running (the initializer for nr_entries) so count the
-		 * task lines for this CPU data and use that instead.
-		 */
+
+	if (config_task_format == OLD_TASK_FORMAT)
 		nr_entries = count_task_lines(buffer);
-		if (nr_entries <= 0)
-			return 0;
-		cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
-		if (cpu_info->starving == NULL) {
-			warn("failed to malloc %d task_info structs", nr_entries);
-			return 0;
-		}
-		nr_waiting = parse_old_task_format(buffer, cpu_info->starving, nr_entries);
-		break;
-	default:
-		die("invalid value for config_task_format: %d\n", config_task_format);
+	else
+		nr_entries = cpu_info->nr_running;
+
+	if (nr_entries <= 0)
+		return 0;
+
+	cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
+	if (cpu_info->starving == NULL) {
+		warn("failed to malloc %d task_info structs", nr_entries);
+		return 0;
 	}
+
+	nr_waiting = parse_task_lines(buffer, cpu_info->starving, nr_entries);
+
 	return nr_waiting;
 }
 
@@ -574,7 +530,7 @@ static int sched_debug_parse(struct cpu_info *cpu_info, char *buffer, size_t buf
 	}
 
 	/*
-	 * The NEW_TASK_FORMAT produces useful output values for nr_running and
+	 * NEW_TASK_FORMAT and produces useful output values for nr_running and
 	 * rt_nr_running, so in this case use them. For the old format just leave
 	 * them initialized to zero.
 	 */
@@ -613,7 +569,8 @@ static int sched_debug_has_starving_task(struct cpu_info *cpu)
 static int sched_debug_init(void)
 {
 	find_sched_debug_path();
-	config_task_format = detect_task_format();
+	if ((config_task_format = detect_task_format()) == TASK_FORMAT_UNKNOWN)
+		die("Can't handle task format!\n");
 	return 0;
 }
 
diff --git a/src/sched_debug.h b/src/sched_debug.h
index 21f9da27866e..4b12c39bf7fe 100644
--- a/src/sched_debug.h
+++ b/src/sched_debug.h
@@ -1,6 +1,67 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
-#define OLD_TASK_FORMAT  1
-#define NEW_TASK_FORMAT  2
 #define TASK_MARKER	"runnable tasks:"
+#define TASK_DIVIDER	"-\n"
+
+/*
+ * Over time, the various 'runnable task' output in SCHED_DEBUG has
+ * changed significantly.
+ *
+ * Depending on the version of the running kernel, the task formats can
+ * differ greatly.
+ *
+ * For example, in 3.X kernels, the sched_debug running tasks format denotes the current
+ * running task on the current CPU with a singular state label, 'R'. Other tasks do not
+ * receive a state label.
+ *
+ * example:
+ * '          task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep'
+ * ' ----------------------------------------------------------------------------------------------------------'
+ * '      watchdog/5    33        -8.984472       151     0         0.000000         0.535614         0.000000 0 /'
+ * ' R          less  9542      2382.087644        56   120         0.000000        16.444493         0.000000 0 /'
+ *
+ * In 4.18+ kernels, the sched_debug format running tasks format included an additional 'S'
+ * state field to denote the state of the running tasks on said CPU.
+ *
+ * example:
+ * ' S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep'
+ * '-----------------------------------------------------------------------------------------------------------'
+ * ' I         rcu_gp     3        13.973264         2   100         0.000000         0.004469         0.000000 0 0 /'
+ *
+ * Introduced in 6.12+, 2cab4bd024d2 sched/debug: Fix the runnable tasks
+ * output, the sched_debug running tasks format was changed to include
+ * four new EEVDF fields.
+ *
+ * Example:
+ *  'S            task   PID       vruntime   eligible    deadline             slice          sum-exec      switches  prio         wait-time        sum-sleep       sum-block  node   group-id  group-path'
+ *  '-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------'
+ *  ' I kworker/R-rcu_g     4        -1.048576   E          -1.040501           0.700000         0.000000         2     100         0.000000         0.000000         0.000000   0      0        /'
+ *
+ * As there are considerable differences in the location of the fields
+ * needed to boost task prioriy, handle the logical code differences with
+ * an enumerated type.
+ */
+enum task_format {
+	TASK_FORMAT_UNKNOWN =0,
+	OLD_TASK_FORMAT,	// 3.10 kernel
+	NEW_TASK_FORMAT,	// 4.18+ kernel
+	TASK_FORMAT_LIMIT
+};
+
+
+/*
+ * set of offsets in a task format line based on offsets
+ * discovered by discover_task_format
+ *
+ * Note: These are *NOT* character offsets, these are "word" offsets.
+ * Requiring consumers of this struct to parse through the individual
+ * lines.
+ */
+struct task_format_offsets {
+	int task;
+	int pid;
+	int switches;
+	int prio;
+	int wait_time;
+};
 
 extern struct stalld_backend sched_debug_backend;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 15:58   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 03/12] sched_debug: Fix double-free crash in fill_waiting_task() Clark Williams
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Claude, Clark Williams, wander, debarbos,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

Refactor parse_task_lines() to correctly parse runnable tasks from
sched_debug output with improved state filtering and iteration.

Key fixes:
- Rename skipwords() to skip2word() and fix logic to position pointer
  at the start of the target word (not after whitespace)
- Fix NEW_TASK_FORMAT detection to account for 'S' column offset in
  task field positions
- Add proper state filtering for NEW_TASK_FORMAT: skip '>R' (running),
  non-'R', and non-'X' states
- Fix loop termination: iterate to (nr_entries-1) since running task
  is excluded
- Fix buffer allocation size to include null terminator (+1)
- Add comprehensive comments explaining parsing state machine
- Initialize ptr=NULL to prevent uninitialized use warnings

This resolves task parsing failures where stalld was incorrectly
reading task fields or missing runnable tasks on newer kernels.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/sched_debug.c | 89 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 66 insertions(+), 23 deletions(-)

diff --git a/src/sched_debug.c b/src/sched_debug.c
index 180932ca7aa3..04a12ef2a7c2 100644
--- a/src/sched_debug.c
+++ b/src/sched_debug.c
@@ -122,7 +122,8 @@ static char *alloc_and_fill_cpu_buffer(int cpu, char *sched_dbg, int sched_dbg_s
 	if (!next_cpu_start)
 		next_cpu_start = sched_dbg + sched_dbg_size;
 
-	size = next_cpu_start - cpu_start;
+	/* add one for the null terminator */
+	size = next_cpu_start - cpu_start + 1;
 
 	if (size <= 0)
 		return NULL;
@@ -171,12 +172,13 @@ static inline char *nextline(char *str)
  * skip a specified number of words on a task line
  */
 
-static inline char *skipwords(char *ptr, int nwords)
+static inline char *skip2word(char *ptr, int nwords)
 {
 	int i;
-	for (i=0; i < nwords; i++) {
-		ptr = skipspaces(ptr);
+	ptr = skipspaces(ptr);
+	for (i=1; i < nwords; i++) {
 		ptr = skipchars(ptr);
+		ptr = skipspaces(ptr);
 	}
 	return ptr;
 }
@@ -228,12 +230,14 @@ static int detect_task_format(void)
 	config_buffer_size = bufsiz;
 	log_msg("initial config_buffer_size set to %zu\n", config_buffer_size);
 
+	/* find the delimiter for task information */
 	ptr = strstr(buffer, TASK_MARKER);
 	if (ptr == NULL) {
 		die("unable to find 'runnable tasks' in buffer, invalid input\n");
 		exit(-1);
 	}
 
+	/* move to the column header line */
 	ptr = nextline(ptr);
 	i = 0;
 
@@ -245,6 +249,8 @@ static int detect_task_format(void)
 	if (strncmp(ptr, "S", strlen("S")) == 0) {
 		log_msg("detect_task_format: NEW_TASK_FORMAT detected\n");
 		retval = NEW_TASK_FORMAT;
+		/* move the word offset by one */
+		i++;
 	}
 	else {
 		log_msg("detect_task_format: OLD_TASK_FORMAT detected\n");
@@ -357,45 +363,81 @@ static int is_runnable(int pid)
 static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_entries)
 {
 	int pid, ctxsw, prio, comm_size;
-	char *ptr, *line, *end;
+	char *ptr=NULL, *line = buffer, *end;
+	char *buffer_end = buffer + strlen(buffer);
 	struct task_info *task;
 	char comm[COMM_SIZE];
 	int tasks = 0;
 
-	if ((ptr = strstr(buffer, TASK_MARKER)) == NULL)
-		die ("no runnable task section found!\n");
-
 	/*
 	 * If we have less than two tasks on the CPU there is no
 	 * possibility of a stall.
-	 */
+ 	 */
 	if (nr_entries < 2)
 		return 0;
+
+
+	/* search for the task marker header */
+	ptr = strstr(buffer, TASK_MARKER);
+	if (ptr == NULL)
+		die ("no runnable task section found!\n");
+
 	line = ptr;
 
-	/* skip header and divider */
+	/* skip "runnable tasks:" */
+ 	line = nextline(line);
+
+	/* skip header lines */
 	line = nextline(line);
+
+	/* skip divider line */
 	line = nextline(line);
-	
-	/* now loop over the task info */
-	while (tasks < nr_entries) {
+	/* at this point, line should point to the start of a task line */
+
+	/* now loop over the task info
+	 * note that we always discount the task that's on the cpu, so the
+	 * number of waiting tasks will always be at least one less than
+	 * nr_entries.
+	 */
+	while ((line < buffer_end) && tasks < (nr_entries-1)) {
 		task = &task_info[tasks];
 
+		/* move ptr to the first word of the line */
+		ptr = skipspaces(line);
+
 		/*
 		 * In 3.X kernels, only the singular RUNNING task receives
 		 * a "running state" label. Therefore, only care about
-		 * tasks that are not R (running on a CPU).
+		 * tasks that are not R (runnable on a CPU).
 		 */
 		if ((config_task_format == OLD_TASK_FORMAT) &&
 			(*ptr == 'R')) {
 			/* Go to the end of the line and ignore this task. */
-			ptr = strchr(ptr, '\n');
-			ptr++;
+			line = nextline(line);
 			continue;
 		}
 
+		/*
+		 * in newer kernels (>=4.x) every task info line has a state
+		 * but the actual running tasks has a '>R' to denote it.
+		 * since we don't care about the currently running tasks
+		 * skip it.
+		 * Also, we don't care about any states other than 'R' (runnable)
+		 * and 'X' (dying)
+		 */
+		if (config_task_format == NEW_TASK_FORMAT) {
+			if (*ptr == '>' || (*ptr != 'R' && *ptr != 'X')) {
+				line = nextline(line);
+				continue;
+			}
+		}
+
+		/*
+		 * At this point we have a task line to record
+		 */
+		
 		/* get the task field */
-		ptr = skipwords(line, config_task_format_offsets.task);
+		ptr = skip2word(line, config_task_format_offsets.task);
 
 		/* Find the end of the task field */
 		end = skipchars(ptr);
@@ -408,20 +450,21 @@ static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_en
 		}
 		strncpy(comm, ptr, comm_size);
 		comm[comm_size] = '\0';
-		ptr = end;
 
 		/* get the PID field */
-		ptr = skipwords(line, config_task_format_offsets.pid);
+		ptr = skip2word(line, config_task_format_offsets.pid);
 		pid = strtol(ptr, NULL, 10);
 
 		/* get the context switches field */
-		ptr = skipwords(line, config_task_format_offsets.switches);
+		ptr = skip2word(line, config_task_format_offsets.switches);
 		ctxsw = strtol(ptr, NULL, 10);
 
 		/* get the prio field */
-		ptr = skipwords(line, config_task_format_offsets.prio);
+		ptr = skip2word(line, config_task_format_offsets.prio);
 		prio = strtol(ptr, NULL, 10);
 
+		/*log_msg("DEBUG: task%d comm:%s pid:%d ctxsw:%d prio:%d\n", tasks, comm, pid, ctxsw, prio);*/
+
                 /*
                  * In older formats, we must check to
                  * see if the process is runnable prior to storing header
@@ -437,10 +480,10 @@ static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_en
 			task->since = time(NULL);
 			/* increment the count of tasks processed */
 			tasks++;
-		} else {
-			continue;
 		}
 
+		/* move our line pointer to the next availble line */
+		line = nextline(line);
 	}
 	return tasks;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/12] sched_debug: Fix double-free crash in fill_waiting_task()
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
  2025-10-17  2:24 ` [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 16:01   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 04/12] stalld.c: remove noisy idle report and added report to should_skip_idle_cpus() Clark Williams
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Claude, Clark Williams, wander, debarbos,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

Fix a critical double-free bug that occurs when fill_waiting_task()
returns early without allocating new memory for cpu_info->starving.

Root cause:
In sched_debug_parse(), the code saves old_tasks = cpu_info->starving,
then calls fill_waiting_task(). If fill_waiting_task() returns early
(nr_entries <= 0 or cpu_info == NULL), it doesn't allocate new memory
or update cpu_info->starving. This leaves cpu_info->starving pointing
to the old memory. When the code later frees old_tasks, it's freeing
the same memory that cpu_info->starving still points to. On the next
iteration, old_tasks again points to this already-freed memory, causing
a double-free crash.

Example crash sequence:
1. First parse: cpu_info->starving = malloc(memory A)
2. Second parse: old_tasks = memory A
                 nr_entries = 5, so cpu_info->starving = malloc(memory B)
                 free(old_tasks) → frees memory A
3. Third parse:  old_tasks = memory B
                 nr_entries = 0, so fill_waiting_task() returns early
                 cpu_info->starving still points to memory B
                 free(old_tasks) → frees memory B
4. Fourth parse: old_tasks = memory B (already freed!)
                 nr_entries > 0, so cpu_info->starving = malloc(memory C)
                 free(old_tasks) → DOUBLE FREE of memory B → CRASH

The fix:
Explicitly set cpu_info->starving = NULL in fill_waiting_task() when
returning early without allocation. This ensures that cpu_info->starving
never points to freed memory, preventing the double-free.

Impact:
- Prevents random crashes in sched_debug backend
- Enables stable operation with varying numbers of runnable tasks
- Critical for test suite reliability

Testing:
- Verified no crash with: sudo stalld -f -v -N -b sched_debug -t 3 -c 0 -l
- Runs stably for extended periods without crashes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/sched_debug.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/sched_debug.c b/src/sched_debug.c
index 04a12ef2a7c2..102a4e02d57d 100644
--- a/src/sched_debug.c
+++ b/src/sched_debug.c
@@ -525,6 +525,7 @@ static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
 
 	if (cpu_info == NULL) {
 		warn("NULL cpu_info pointer!\n");
+		cpu_info->starving = NULL;
 		return 0;
 	}
 
@@ -533,8 +534,10 @@ static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
 	else
 		nr_entries = cpu_info->nr_running;
 
-	if (nr_entries <= 0)
+	if (nr_entries <= 0) {
+		cpu_info->starving = NULL;
 		return 0;
+	}
 
 	cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
 	if (cpu_info->starving == NULL) {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/12] stalld.c: remove noisy idle report and added report to should_skip_idle_cpus()
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
  2025-10-17  2:24 ` [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering Clark Williams
  2025-10-17  2:24 ` [PATCH 03/12] sched_debug: Fix double-free crash in fill_waiting_task() Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 16:03   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 05/12] stalld.c: initialize cpu_info->idle_time to be -1 Clark Williams
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Clark Williams, wander, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

comment out the log_verbose() call in get_cpu_busy_list(), since
it's not helpful for production and chews up a lot of screen real-
estate when running tests. Added a report in should_skip_idle_cpus()
which reports that there were busy cpus and that a run-queue parse
will be done.

Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/stalld.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/stalld.c b/src/stalld.c
index 4c00158cf239..e8c927750ee0 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -289,9 +289,10 @@ int get_cpu_busy_list(struct cpu_info *cpus, int nr_cpus, char *busy_cpu_list)
 			continue;
 		}
 
-		log_verbose ("\t cpu %d had %ld idle time, and now has %ld\n",
-			     cpu->id, cpu->idle_time, idle_time);
-
+		/*
+		 * log_verbose ("\t cpu %d had %ld idle time, and now has %ld\n",
+		 *	     cpu->id, cpu->idle_time, idle_time);
+		 */
 		/* If the idle time did not change, the CPU is busy. */
 		if (cpu->idle_time == idle_time) {
 			busy_cpu_list[i] = 1;
@@ -795,7 +796,7 @@ static int should_skip_idle_cpus(struct cpu_info *cpus, int nr_cpus, char *busy_
 		log_verbose("all CPUs had idle time, skipping parse\n");
 		return 1;
 	}
-
+	log_verbose("some cpus ran, so run-queues should be parsed\n");
 	return 0;
 }
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/12] stalld.c: initialize cpu_info->idle_time to be -1
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (2 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 04/12] stalld.c: remove noisy idle report and added report to should_skip_idle_cpus() Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 16:15   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 06/12] stalld.c: get rid of misleading print about DL-Server Clark Williams
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Clark Williams, wander, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

Initialize the idle_time value of a cpu_info structure to be -1 so
that it will be presumed busy for the first time through the loop,
which will force an initial check of the run queues.

Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/stalld.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/stalld.c b/src/stalld.c
index e8c927750ee0..6cd4bf68d04a 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -293,6 +293,15 @@ int get_cpu_busy_list(struct cpu_info *cpus, int nr_cpus, char *busy_cpu_list)
 		 * log_verbose ("\t cpu %d had %ld idle time, and now has %ld\n",
 		 *	     cpu->id, cpu->idle_time, idle_time);
 		 */
+
+		/* If this is the first check (idle_time was -1), assume busy to be safe */
+		if (cpu->idle_time == -1) {
+			cpu->idle_time = idle_time;
+			busy_cpu_list[i] = 1;
+			busy_count++;
+			continue;
+		}
+
 		/* If the idle time did not change, the CPU is busy. */
 		if (cpu->idle_time == idle_time) {
 			busy_cpu_list[i] = 1;
@@ -1168,6 +1177,7 @@ int main(int argc, char **argv)
 	for (i = 0; i < config_nr_cpus; i++) {
 		cpus[i].buffer = allocate_memory(1, config_buffer_size);
 		cpus[i].buffer_size = config_buffer_size;
+		cpus[i].idle_time = -1;  /* Initialize to -1 so first check doesn't skip busy CPUs */
 	}
 
 	if (config_log_syslog)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/12] stalld.c: get rid of misleading print about DL-Server
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (3 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 05/12] stalld.c: initialize cpu_info->idle_time to be -1 Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 16:16   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 07/12] stalld.c: Add starvation logging in single-threaded log-only mode Clark Williams
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Clark Williams, wander, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

Get rid of the assertion that stalld was forced into log-only mode

Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/stalld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/stalld.c b/src/stalld.c
index 6cd4bf68d04a..6adc3e9a8cea 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -1149,7 +1149,7 @@ int main(int argc, char **argv)
 		set_cpu_affinity(config_affinity_cpus);
 
 	if (!check_dl_server_dir_exists())
-		log_msg("DL-server detected. Operating in log-only mode.\n");
+		log_msg("DL-server detected.\n");
 
 	/*
 	 * Check RT throttling:
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/12] stalld.c: Add starvation logging in single-threaded log-only mode
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (4 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 06/12] stalld.c: get rid of misleading print about DL-Server Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 16:27   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 08/12] stalld: Add -N/--no_idle_detect flag to disable idle detection Clark Williams
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Claude, Clark Williams, wander, debarbos,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

In single-threaded mode with -l/--log_only flag, stalld was not logging
the "starved" message when tasks reached the starvation threshold. This
was because the config_log_only check occurred before any logging, and
the "starved" message (present in check_starving_tasks() for per-CPU
threading modes) was missing from boost_cpu_starving_vector().

This caused test_log_only.sh to fail when looking for the "starved"
message in the log output.

Changes to boost_cpu_starving_vector():
- Move config_log_only check after starvation detection
- Add log_msg() call to output "starved" message when threshold reached
- Reset timestamp (cpu->since) in log-only mode to prevent continuous
  logging of the same starvation event
- Change threshold comparison from <= to < for consistency with
  check_starving_tasks()

This ensures consistent logging behavior between single-threaded mode
and per-CPU threading modes when using log-only mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/stalld.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/stalld.c b/src/stalld.c
index 6adc3e9a8cea..ebca23105ce9 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -924,12 +924,22 @@ int boost_cpu_starving_vector(struct cpu_starving_task_info *vector, int nr_cpus
 			log_verbose("\t cpu %d: pid: %d starving for %llu\n",
 				    i, cpu->pid, (now - cpu->since));
 
-		if (config_log_only)
+		/* Skip if no task or not starving long enough */
+		if (cpu->pid == 0 || (now - cpu->since) < config_starving_threshold)
 			continue;
 
-		/* Skip if no task or not starving long enough */
-		if (cpu->pid == 0 || (now - cpu->since) <= config_starving_threshold)
+		/* Log when task has reached starvation threshold */
+		if ((now - cpu->since) >= config_starving_threshold) {
+			log_msg("%s-%d starved on CPU %d for %d seconds\n",
+				cpu->task.comm, cpu->pid, i,
+				(now - cpu->since));
+		}
+
+		if (config_log_only) {
+			/* Reset timestamp to avoid continuous logging */
+			cpu->since = now;
 			continue;
+		}
 
 		/* Skip if task is on denylist */
 		if (config_ignore && !check_task_ignore(&cpu->task))
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/12] stalld: Add -N/--no_idle_detect flag to disable idle detection
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (5 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 07/12] stalld.c: Add starvation logging in single-threaded log-only mode Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 16:33   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 09/12] stalld: Add defensive checks in print_boosted_info Clark Williams
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Claude, Clark Williams, wander, debarbos,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

Idle detection is an optimization that skips parsing CPUs that have
increasing idle time. However, this causes issues in controlled test
environments where the starvation generator may not prevent CPUs from
appearing idle.

The new flag allows disabling this optimization for:
- Testing scenarios where CPUs need to be monitored regardless of idle state
- Debugging starvation detection issues
- Ensuring all CPUs are always checked

Usage: stalld -N or --no_idle_detect

Implementation:
- Added -N option to getopt_long short options
- Added no_idle_detect to long_options array
- Sets config_idle_detection = 0 when specified
- Added help text: "disable idle CPU detection optimization (for testing)"

The config_idle_detection variable already exists and is checked
throughout the code, so no other changes are needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/utils.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/utils.c b/src/utils.c
index 91c99c0a1270..8f1fe9266db9 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -767,6 +767,7 @@ static void print_usage(void)
 		"                               starving_threshold time, dispatch a specialized thread to monitor",
 		"                               it.",
 		"	   -O/--power_mode: works as a single threaded tool. Saves CPU, but loses precision.",
+		"	   -N/--no_idle_detect: disable idle CPU detection optimization (for testing)",
 		"	   -g/--granularity: set the granularity at which stalld checks for starving threads",
 		"	   -R/--reservation: percentage of CPU time reserved to stalld using SCHED_DEADLINE.",
 		"	   -a/--affinity: limit stalld's affinity",
@@ -996,6 +997,7 @@ int parse_args(int argc, char **argv)
 			{"aggressive_mode",	no_argument,	   0, 'A'},
 			{"power_mode",		no_argument,	   0, 'O'},
 			{"adaptive_mode",	no_argument,	   0, 'M'},
+			{"no_idle_detect",	no_argument,	   0, 'N'},
 			{"help",		no_argument,	   0, 'h'},
 			{"boost_period",	required_argument, 0, 'p'},
 			{"boost_runtime",	required_argument, 0, 'r'},
@@ -1017,7 +1019,7 @@ int parse_args(int argc, char **argv)
 		/* getopt_long stores the option index here. */
 		int option_index = 0;
 
-		c = getopt_long(argc, argv, "lvkfAOMhsp:r:d:t:c:FVSg:i:I:R:b:a:",
+		c = getopt_long(argc, argv, "lvkfAOMNhsp:r:d:t:c:FVSg:i:I:R:b:a:",
 				 long_options, &option_index);
 
 		/* Detect the end of the options. */
@@ -1138,6 +1140,10 @@ int parse_args(int argc, char **argv)
 			config_single_threaded = 0;
 			config_aggressive = 0;
 			break;
+		case 'N':
+			config_idle_detection = 0;
+			log_msg("idle detection disabled\n");
+			break;
 		case 'R':
 			config_reservation = get_long_from_str(optarg);
 			if (config_reservation < 10 || config_reservation > 90)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/12] stalld: Add defensive checks in print_boosted_info
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (6 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 08/12] stalld: Add -N/--no_idle_detect flag to disable idle detection Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 17:36   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 10/12] Makefile: Add support for legacy kernels Clark Williams
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Claude, Clark Williams, wander, debarbos,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

This adds validation and bounds checking in print_boosted_info() to
prevent segmentation faults when boosting tasks in FIFO mode.

Changes:
- Add NULL check for 'type' parameter
- Handle fill_process_comm() failures gracefully with fallback
- Validate cpu->id is within valid range before dereferencing
- Provide informative error messages for invalid states

Issue: Test results show stalld crashes with SIGSEGV when using -F
(force FIFO) flag in aggressive mode (-A). The crash occurs during
or immediately after the first boost attempt.

This is a partial fix that adds defensive programming. Further
investigation needed to identify the root cause of the crashes
in FIFO boosting mode.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/stalld.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/stalld.c b/src/stalld.c
index ebca23105ce9..466b6db15668 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -435,10 +435,21 @@ void print_boosted_info(int tgid, int pid, struct cpu_info *cpu, char *type)
 {
 	char comm[COMM_SIZE];
 
-	fill_process_comm(tgid, pid, comm, COMM_SIZE);
+	/* Validate inputs to prevent crashes */
+	if (!type) {
+		warn("print_boosted_info called with NULL type\n");
+		return;
+	}
+
+	if (fill_process_comm(tgid, pid, comm, COMM_SIZE) != 0) {
+		/* If we can't get the comm, use a placeholder */
+		snprintf(comm, COMM_SIZE, "<unknown>");
+	}
 
-	if (cpu)
+	if (cpu && cpu->id >= 0 && cpu->id < config_nr_cpus)
 		log_msg("boosted pid %d (%s) (cpu %d) using %s\n", pid, comm, cpu->id, type);
+	else if (cpu)
+		log_msg("boosted pid %d (%s) (cpu <invalid>) using %s\n", pid, comm, type);
 	else
 		log_msg("boosted pid %d (%s) using %s\n", pid, comm, type);
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/12] Makefile: Add support for legacy kernels
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (7 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 09/12] stalld: Add defensive checks in print_boosted_info Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-17 12:50   ` Derek Barbosa
  2025-10-21 17:43   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 11/12] scripts: fix run-local if bashism Clark Williams
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Derek Barbosa, Clark Williams, Clark Williams, wander,
	marco.chiappero, chris.friesen, luochunsheng

From: Derek Barbosa <debarbos@redhat.com>

In the sched_debug backend code, we conditionally parse the running
tasks block of the sched_debug file to support the older task format (as
denoted by OLD_TASK_FORMAT).

This task format is only present on legacy 3.X kernels. However, on said
legacy kernels, the current Makefile directives and #if macros in the
stalld.c code do not fully support compilation of stalld.

Add a LEGACY check to the makefile, which parses `uname -r` output for
the major kernel version, and adds a LEGACY compilation flag to the
CFLAGS. Use the compilation flag to default to sched_debug_backend in
the stalld.c file to avoid missing references to the queue_track code.

Also, disable Intel CET -fcf-protection and BPF support, remove
annocheck targets, and default to the c99 compiler (if present).

Signed-off-by: Derek Barbosa <debarbos@redhat.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 Makefile     | 40 +++++++++++++++++++++++++++++++++++++++-
 src/stalld.c |  6 ++++--
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 7234a46976aa..c09d5743b696 100644
--- a/Makefile
+++ b/Makefile
@@ -9,6 +9,14 @@ ARCH=$(shell uname -m)
 endif
 $(info ARCH=$(ARCH))
 
+LEGACY_VER := 3
+ifeq ($(strip $(KERNEL_VER)),)
+KERNEL_VER=$(shell uname -r | cut -f 1 -d .)
+endif
+$(info Kernel Major Version is $(KERNEL_VER))
+
+IS_LEGACY:= $(shell test $(KERNEL_VER) -le $(LEGACY_VER) && echo "true" || echo "false")
+
 USE_BPF := 1
 FCF_PROTECTION := -fcf-protection
 MTUNE	:= -mtune=generic
@@ -33,6 +41,10 @@ ifeq ($(ARCH),powerpc)
 USE_BPF := 0
 MTUNE := -mtune=powerpc
 endif
+ifeq ($(IS_LEGACY),true)
+USE_BPF := 0
+FCF_PROTECTION := 
+endif
 
 $(info USE_BPF=$(USE_BPF))
 $(info FCF_PROTECTION=$(FCF_PROTECTION))
@@ -57,6 +69,7 @@ DEFS	:=	-DUSE_BPF=$(USE_BPF) -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -DDEBUG_S
 
 # note that RPMCFLAGS and RPMLDFLAGS are variables that come from the specfile when
 # building for Fedora/CentOS/RHEL/et al
+#
 
 ifeq ($(RPMCFLAGS),)
 CFLAGS	:=	-DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(SOPTS) $(DEFS)
@@ -64,6 +77,12 @@ else
 CFLAGS	:=	 $(RPMCFLAGS) $(DEFS)
 endif
 
+ifeq ($(IS_LEGACY),true)
+$(info Compiling with Legacy Mode enabled)
+$(info Overwriting RPMCFLAGS...)
+CFLAGS	:=	-DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(DEFS) -std=c99 -DLEGACY
+endif
+
 ifeq ($(DEBUG),0)
 CFLAGS	+=	-g -O2
 else
@@ -200,7 +219,7 @@ install: stalld
 	rm -f $(DESTDIR)$(BINDIR)/throttlectl
 	make -C systemd DESTDIR=$(INSPATH) uninstall
 
-.PHONY: clean tarball systemd push annocheck
+.PHONY: clean tarball systemd push annocheck help
 clean:
 	@test ! -f stalld || rm stalld
 	@test ! -f stalld-static || rm stalld-static
@@ -221,3 +240,22 @@ tarball:  clean
 
 annocheck: stalld
 	annocheck --ignore-unknown --verbose --profile=el10 --debug-dir=/usr/lib/debug/ ./stalld
+
+help:
+	@echo  'Cleaning targets:'
+	@echo  '  clean            - Clean the main executable, tests, objects, and tarballs.'
+	@echo  ''
+	@echo 'Building targets:'
+	@echo '  stalld            - Build the main executable (stalld).'
+	@echo '  static            - Build a statically linked executable (stalld-static).'
+	@echo '  tests             - Run tests in the "tests" subdirectory.'
+	@echo '  tarball           - Create a source tarball: $(NAME)-$(VERSION).tar.$(CEXT)'
+	@echo ''
+	@echo 'Installation targets:'
+	@echo '  install           - Install the executable, man page, docs, and licenses.'
+	@echo '  uninstall         - Remove all installed files.'
+	@echo ''
+	@echo 'Utility targets:'
+	@echo '  annocheck         - Run the annocheck security analysis tool on the stalld executable.'
+	@echo '  help              - Display this help message.'
+
diff --git a/src/stalld.c b/src/stalld.c
index 466b6db15668..305618e577fc 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -153,14 +153,16 @@ int config_reservation = 0;
 
 /*
  * Select a backend.
- * Note that eBPF not supported on i686 and powerpc
+ * Note that eBPF not supported on i686, powerpc, and on x86 3.X kernels.
  */
-#if !__powerpc__ && !__i386__
+#if !__powerpc__ && !__i386__ && !LEGACY
 struct stalld_backend *backend = &queue_track_backend;
 #else
 struct stalld_backend *backend = &sched_debug_backend;
 #endif
 
+
+
 /*
  * Set of CPUs in which stalld should run.
  */
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/12] scripts: fix run-local if bashism
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (8 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 10/12] Makefile: Add support for legacy kernels Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 17:45   ` Wander Lairson Costa
  2025-10-17  2:24 ` [PATCH 12/12] Fix segfault in adaptive/aggressive modes Clark Williams
  2025-10-21 15:54 ` [PATCH 01/12] sched_debug: Unify parsing methods for task_info Wander Lairson Costa
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Derek Barbosa, Clark Williams, Clark Williams, wander,
	marco.chiappero, chris.friesen, luochunsheng

From: Derek Barbosa <debarbos@redhat.com>

In older versions of Bash, mainly 4.2.X, a semicolon (or newline) is
requied after a conditional statement and before the `then` keyword.

Add the semicolon to allow testing and quick "local runs" on legacy
kernels.

Signed-off-by: Derek Barbosa <debarbos@redhat.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 scripts/run-local.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/run-local.sh b/scripts/run-local.sh
index 0ce147634472..e65f34dd46a4 100755
--- a/scripts/run-local.sh
+++ b/scripts/run-local.sh
@@ -44,7 +44,7 @@ parse_args()
 # script start
 #
 
-if [[ ! -x ./stalld ]] then
+if [[ ! -x ./stalld ]]; then
    echo "No stalld executable in the current directory!"
    exit 1
 fi
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/12] Fix segfault in adaptive/aggressive modes
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (9 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 11/12] scripts: fix run-local if bashism Clark Williams
@ 2025-10-17  2:24 ` Clark Williams
  2025-10-21 17:45   ` Wander Lairson Costa
  2025-10-21 15:54 ` [PATCH 01/12] sched_debug: Unify parsing methods for task_info Wander Lairson Costa
  11 siblings, 1 reply; 25+ messages in thread
From: Clark Williams @ 2025-10-17  2:24 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, Claude, Clark Williams, wander, debarbos,
	marco.chiappero, chris.friesen, luochunsheng

From: Clark Williams <williams@redhat.com>

The merge_taks_info() function was unconditionally calling
update_cpu_starving_vector() at line 389, but cpu_starving_vector
is only allocated in single_threaded_main() (line 1007). This caused
segmentation faults when running in adaptive or aggressive threading
modes.

The fix guards both update_cpu_starving_vector() calls with
config_single_threaded checks, since this vector is only used
in single-threaded/power mode.

Root cause: cpu_starving_vector is a global array used exclusively
by single-threaded mode to track one starving task per CPU. The
merge_taks_info() function is called by both backends during parsing,
regardless of threading mode. In adaptive/aggressive modes, the
vector remains NULL, causing crashes when accessed.

Crash backtrace:
  conservative_main() → get_cpu_and_parse() → sched_debug_parse() →
  merge_taks_info() → update_cpu_starving_vector() → SEGFAULT at
  line 362: if (cpu_info->pid)

This fix allows test_starvation_threshold.sh to pass on sched_debug
backend with adaptive mode (-M flag).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Clark Williams <clrkwllms@kernel.org>
Signed-off-by: Clark Williams <williams@redhat.com>
---
 src/stalld.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/stalld.c b/src/stalld.c
index 305618e577fc..130a7d4eaf9d 100644
--- a/src/stalld.c
+++ b/src/stalld.c
@@ -386,7 +386,8 @@ void merge_taks_info(int cpu, struct task_info *old_tasks, int nr_old, struct ta
 	int i;
 	int j;

-	update_cpu_starving_vector(cpu, &notask);
+	if (config_single_threaded)
+		update_cpu_starving_vector(cpu, &notask);

 	for (i = 0; i < nr_old; i++) {
 		old_task = &old_tasks[i];
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 10/12] Makefile: Add support for legacy kernels
  2025-10-17  2:24 ` [PATCH 10/12] Makefile: Add support for legacy kernels Clark Williams
@ 2025-10-17 12:50   ` Derek Barbosa
  2025-10-21 17:43   ` Wander Lairson Costa
  1 sibling, 0 replies; 25+ messages in thread
From: Derek Barbosa @ 2025-10-17 12:50 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, wander, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:42PM -0500, Clark Williams wrote:
> From: Derek Barbosa <debarbos@redhat.com>
> 
> In the sched_debug backend code, we conditionally parse the running
> tasks block of the sched_debug file to support the older task format (as
> denoted by OLD_TASK_FORMAT).
> 
> This task format is only present on legacy 3.X kernels. However, on said
> legacy kernels, the current Makefile directives and #if macros in the
> stalld.c code do not fully support compilation of stalld.
> 
> Add a LEGACY check to the makefile, which parses `uname -r` output for
> the major kernel version, and adds a LEGACY compilation flag to the
> CFLAGS. Use the compilation flag to default to sched_debug_backend in
> the stalld.c file to avoid missing references to the queue_track code.
> 
> Also, disable Intel CET -fcf-protection and BPF support, remove
> annocheck targets, and default to the c99 compiler (if present).
> 
> Signed-off-by: Derek Barbosa <debarbos@redhat.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  Makefile     | 40 +++++++++++++++++++++++++++++++++++++++-
>  src/stalld.c |  6 ++++--
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 7234a46976aa..c09d5743b696 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -9,6 +9,14 @@ ARCH=$(shell uname -m)
>  endif
>  $(info ARCH=$(ARCH))
>  
> +LEGACY_VER := 3
> +ifeq ($(strip $(KERNEL_VER)),)
> +KERNEL_VER=$(shell uname -r | cut -f 1 -d .)
> +endif
> +$(info Kernel Major Version is $(KERNEL_VER))
> +
> +IS_LEGACY:= $(shell test $(KERNEL_VER) -le $(LEGACY_VER) && echo "true" || echo "false")
> +
>  USE_BPF := 1
>  FCF_PROTECTION := -fcf-protection
>  MTUNE	:= -mtune=generic
> @@ -33,6 +41,10 @@ ifeq ($(ARCH),powerpc)
>  USE_BPF := 0
>  MTUNE := -mtune=powerpc
>  endif
> +ifeq ($(IS_LEGACY),true)
> +USE_BPF := 0
> +FCF_PROTECTION := 
> +endif
>  
>  $(info USE_BPF=$(USE_BPF))
>  $(info FCF_PROTECTION=$(FCF_PROTECTION))
> @@ -57,6 +69,7 @@ DEFS	:=	-DUSE_BPF=$(USE_BPF) -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -DDEBUG_S
>  
>  # note that RPMCFLAGS and RPMLDFLAGS are variables that come from the specfile when
>  # building for Fedora/CentOS/RHEL/et al
> +#
>  
>  ifeq ($(RPMCFLAGS),)
>  CFLAGS	:=	-DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(SOPTS) $(DEFS)
> @@ -64,6 +77,12 @@ else
>  CFLAGS	:=	 $(RPMCFLAGS) $(DEFS)
>  endif
>  
> +ifeq ($(IS_LEGACY),true)
> +$(info Compiling with Legacy Mode enabled)
> +$(info Overwriting RPMCFLAGS...)
> +CFLAGS	:=	-DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(DEFS) -std=c99 -DLEGACY
> +endif
> +
>  ifeq ($(DEBUG),0)
>  CFLAGS	+=	-g -O2
>  else
> @@ -200,7 +219,7 @@ install: stalld
>  	rm -f $(DESTDIR)$(BINDIR)/throttlectl
>  	make -C systemd DESTDIR=$(INSPATH) uninstall
>  
> -.PHONY: clean tarball systemd push annocheck
> +.PHONY: clean tarball systemd push annocheck help
>  clean:
>  	@test ! -f stalld || rm stalld
>  	@test ! -f stalld-static || rm stalld-static
> @@ -221,3 +240,22 @@ tarball:  clean
>  
>  annocheck: stalld
>  	annocheck --ignore-unknown --verbose --profile=el10 --debug-dir=/usr/lib/debug/ ./stalld
> +
> +help:
> +	@echo  'Cleaning targets:'
> +	@echo  '  clean            - Clean the main executable, tests, objects, and tarballs.'
> +	@echo  ''
> +	@echo 'Building targets:'
> +	@echo '  stalld            - Build the main executable (stalld).'
> +	@echo '  static            - Build a statically linked executable (stalld-static).'
> +	@echo '  tests             - Run tests in the "tests" subdirectory.'
> +	@echo '  tarball           - Create a source tarball: $(NAME)-$(VERSION).tar.$(CEXT)'
> +	@echo ''
> +	@echo 'Installation targets:'
> +	@echo '  install           - Install the executable, man page, docs, and licenses.'
> +	@echo '  uninstall         - Remove all installed files.'
> +	@echo ''
> +	@echo 'Utility targets:'
> +	@echo '  annocheck         - Run the annocheck security analysis tool on the stalld executable.'
> +	@echo '  help              - Display this help message.'
> +
> diff --git a/src/stalld.c b/src/stalld.c
> index 466b6db15668..305618e577fc 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -153,14 +153,16 @@ int config_reservation = 0;
>  
>  /*
>   * Select a backend.
> - * Note that eBPF not supported on i686 and powerpc
> + * Note that eBPF not supported on i686, powerpc, and on x86 3.X kernels.
>   */
> -#if !__powerpc__ && !__i386__
> +#if !__powerpc__ && !__i386__ && !LEGACY
>  struct stalld_backend *backend = &queue_track_backend;
>  #else
>  struct stalld_backend *backend = &sched_debug_backend;
>  #endif
>  
> +
> +
>  /*
>   * Set of CPUs in which stalld should run.
>   */
> -- 
> 2.51.0
> 

Hey Clark.

I still have to implement Crystal's suggestions[0] which also means grabbing
access to a compatible box. I haven't had time today, but I don't think we
should take this patch as-is at the moment.

Would you also mind replying to a few of the questions there? Specifically:

>
> Speaking of SOPTS, is this tool intended to be Red Hat specific
> (and require redhat-rpm-config even if no RPM building is intended)?
>

> Of course, we could always go the direction of just dropping the functionality
> of parsing said "old" sched_debug files. Clark said that this may make some
> people upset though :P


Cheers,

[0] https://lore.kernel.org/linux-rt-users/367f7c89f0915f586a133914fe509d2913941cc4.camel@redhat.com/T/#u

-- 
Derek <debarbos@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 01/12] sched_debug: Unify parsing methods for task_info
  2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
                   ` (10 preceding siblings ...)
  2025-10-17  2:24 ` [PATCH 12/12] Fix segfault in adaptive/aggressive modes Clark Williams
@ 2025-10-21 15:54 ` Wander Lairson Costa
  11 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 15:54 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Derek Barbosa, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:33PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> In the sched_debug backend code, there are two logical paths for parsing
> the sched_debug file's task_info "running tasks". These are currently
> divided into "OLD" and "NEW" parsing functions, each with their own
> logic.
> 
> Unify these branching code paths by creating a line-based "word" parser
> that stores the word-offset of the neccessary fields in a struct.
> Accomodate "legacy" behavior where needed using an enumerated type.
> 
> parse_task_lines() now can parse multiple formats of output from the
> debugfs file which on modern systems is located at
> /sys/kernel/debug/sched/debug, and on 3.X "legacy" systems, is located
> at /proc/sched_debug.
> 
> The detect_task_format() function records field offsets which are used
> by parse_task_lines to pull fields out of the "running tasks:" section.
> 
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Derek Barbosa <debarbos@redhat.com>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/sched_debug.c | 405 +++++++++++++++++++++-------------------------
>  src/sched_debug.h |  65 +++++++-
>  2 files changed, 244 insertions(+), 226 deletions(-)
> 
> diff --git a/src/sched_debug.c b/src/sched_debug.c
> index fa2f74bf36ed..180932ca7aa3 100644
> --- a/src/sched_debug.c
> +++ b/src/sched_debug.c
> @@ -24,6 +24,9 @@
>   */
>  static int config_task_format;
>  
> +static struct task_format_offsets
> +    config_task_format_offsets  = { 0, 0, 0, 0 };
> +
>  /*
>   * Read the contents of sched_debug into the input buffer.
>   */
> @@ -88,8 +91,12 @@ static char *get_next_cpu_info_start(char *start)
>  {
>  	const char *next_cpu = "cpu#";
>  
> -	/* Skip the current CPU definition. */
> -	start += 10;
> +        /*
> +         * Skip the current CPU definition.
> +         * We want to move our "cursor" past the current "cpu#" definition.
> +         * This number is arbitrary. It is purely to assist strstr().
> +         */
> +        start += 10;
>  
>  	return strstr(start, next_cpu);
>  }
> @@ -143,9 +150,13 @@ static inline char *skipchars(char *str)
>  	return str;
>  }
>  
> +/*
> + * Note, for our purposes newline is *not* a space
> + * and we want to stop when we hit it
> + */
>  static inline char *skipspaces(char *str)
>  {
> -	while (*str && isspace(*str))
> +	while (*str && isspace(*str) && (*str != '\n'))
>  		str++;
>  	return str;
>  }
> @@ -156,6 +167,20 @@ static inline char *nextline(char *str)
>  	return ptr ? ptr+1 : NULL;
>  }
>  
> +/*
> + * skip a specified number of words on a task line
> + */
> +
> +static inline char *skipwords(char *ptr, int nwords)
> +{
> +	int i;
> +	for (i=0; i < nwords; i++) {
> +		ptr = skipspaces(ptr);
> +		ptr = skipchars(ptr);
> +	}
> +	return ptr;
> +}
> +
>  /*
>   * Read sched_debug and figure out if it's old or new format
>   * done once so if we fail just exit the program.
> @@ -173,6 +198,7 @@ static int detect_task_format(void)
>  	char *ptr;
>  	int status;
>  	int fd;
> +	int i, count=0;
>  
>  	bufsiz = bufincrement = BUFFER_PAGES * page_size;
>  
> @@ -204,122 +230,63 @@ static int detect_task_format(void)
>  
>  	ptr = strstr(buffer, TASK_MARKER);
>  	if (ptr == NULL) {
> -		fprintf(stderr, "unable to find 'runnable tasks' in buffer, invalid input\n");
> +		die("unable to find 'runnable tasks' in buffer, invalid input\n");
>  		exit(-1);
>  	}
>  
> -	ptr += strlen(TASK_MARKER) + 1;
> -	ptr = skipspaces(ptr);
> +	ptr = nextline(ptr);
> +	i = 0;
>  
> -	if (strncmp(ptr, "task", 4) == 0) {
> -		retval = OLD_TASK_FORMAT;
> -		log_msg("detected old task format\n");
> -	} else if (strncmp(ptr, "S", 1) == 0) {
> +	/*
> +	 * Determine the TASK_FORMAT from the first "word" in the header
> +	 * line.
> +	 */
> +	ptr = skipspaces(ptr);
> +	if (strncmp(ptr, "S", strlen("S")) == 0) {
> +		log_msg("detect_task_format: NEW_TASK_FORMAT detected\n");
>  		retval = NEW_TASK_FORMAT;
> -		log_msg("detected new task format\n");
>  	}
> -
> -	free(buffer);
> -	return retval;
> -}
> -
> -/*
> - * Parse the new sched_debug format.
> - *
> - * Example:
> - * ' S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep'
> - * '-----------------------------------------------------------------------------------------------------------'
> - * ' I         rcu_gp     3        13.973264         2   100         0.000000         0.004469         0.000000 0 0 /
> - */
> -static int parse_new_task_format(char *buffer, struct task_info *task_info, int nr_entries)
> -{
> -	char *R, *X, *start = buffer;
> -	struct task_info *task;
> -	int tasks = 0;
> -	int comm_size;
> -	char *end;
> +	else {
> +		log_msg("detect_task_format: OLD_TASK_FORMAT detected\n");
> +		retval = OLD_TASK_FORMAT;
> +	}
>  
>  	/*
> -	 * If we have less than two tasks on the CPU there is no
> -	 * possibility of a stall.
> +	 * Look for our header keywords and store their offset
> +	 * we'll use the offsets when we actually parse the task
> +	 * line data
>  	 */
> -	if (nr_entries < 2)
> -		return 0;
> -
> -	while (tasks < nr_entries) {
> -		task = &task_info[tasks];
> -
> -		/*
> -		 * Runnable tasks.
> -		 */
> -		R = strstr(start, "\n R");
> -
> -		/*
> -		 * Dying tasks.
> -		 */
> -		X = strstr(start, "\n X");
> -
> -		/*
> -		 * Get the first one, the only one, or break.
> -		 */
> -		if (X && R) {
> -			start = R < X ? R : X;
> -		} else if (X || R) {
> -			start = R ? R : X;
> -		} else {
> -			break;
> +	while (*ptr != '\n') {
> +		ptr = skipspaces(ptr);
> +		if (strncmp(ptr, "task", strlen("task")) == 0) {
> +			config_task_format_offsets.task = i;
> +			count++;
> +			log_msg("detect_task_format: found 'task' at word %d\n", i);
>  		}
> -
> -		/* Skip '\n R' || '\n X'. */
> -		start = &start[3];
> -
> -		/* Skip the spaces. */
> -		start = skipspaces(start);
> -
> -		/* Find the end of the string. */
> -		end = skipchars(start);
> -
> -		comm_size = end - start;
> -
> -		if (comm_size >= COMM_SIZE) {
> -			warn("comm_size is too large: %d\n", comm_size);
> -			comm_size = COMM_SIZE - 1;
> +		else if (strncmp(ptr, "PID", strlen("PID")) == 0) {
> +			config_task_format_offsets.pid = i;
> +			count++;
> +			log_msg("detect_task_format: found 'PID' at word %d\n", i);
>  		}
> -
> -		strncpy(task->comm, start, comm_size);
> -
> -		task->comm[comm_size] = '\0';
> -
> -		/* Go to the end of the task comm. */
> -		start=end;
> -
> -		task->pid = strtol(start, &end, 10);
> -
> -		/* Get the id of the thread group leader. */
> -		task->tgid = get_tgid(task->pid);
> -
> -		/* Go to the end of the pid. */
> -		start=end;
> -
> -		/* Skip the tree-key. */
> -		start = skipspaces(start);
> -		start = skipchars(start);
> -
> -		task->ctxsw = strtol(start, &end, 10);
> -
> -		start = end;
> -
> -		task->prio = strtol(start, &end, 10);
> -
> -		task->since = time(NULL);
> -
> -		/* Go to the end and try to find the next occurrence. */
> -		start = end;
> -
> -		tasks++;
> +		else if (strncmp(ptr, "switches", strlen("switches")) == 0) {
> +			config_task_format_offsets.switches = i;
> +			count++;
> +			log_msg("detect_task_format: found 'switches' at word %d\n", i);
> +		}
> +		else if (strncmp(ptr, "prio", strlen("prio")) == 0) {
> +			config_task_format_offsets.prio = i;
> +			count++;
> +			log_msg("detect_task_format: found 'prio' at word %d\n", i);
> +		}
> +		ptr = skipchars(ptr);
> +		i++;
>  	}
>  
> -	return tasks;
> +	if (count != 4)
> +		die("detect_task_format: did not detect all task line fields we need\n");
> +
> +	free(buffer);
> +	return retval;
>  }
>  
>  /*
> @@ -387,104 +354,80 @@ static int is_runnable(int pid)
>  	return runnable;
>  }
>  
> -static int count_task_lines(char *buffer)
> -{
> -	int lines = 0;
> -	char *ptr;
> -	int len;
> -
> -	len = strlen(buffer);
> -
> -	/* Find the runnable tasks: header. */
> -	ptr = strstr(buffer, TASK_MARKER);
> -	if (ptr == NULL)
> -		return 0;
> -
> -	/* Skip to the end of the dashed line separator. */
> -	ptr = strstr(ptr, "-\n");
> -	if (ptr == NULL)
> -		return 0;
> -
> -	ptr += 2;
> -	while(*ptr && ptr < (buffer+len)) {
> -		lines++;
> -		ptr = strchr(ptr, '\n');
> -		if (ptr == NULL)
> -			break;
> -		ptr++;
> -	}
> -	return lines;
> -}
> -
> -/*
> - * Parse the old sched debug format:
> - *
> - * Example:
> - * '            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
> - * ' ----------------------------------------------------------------------------------------------------------
> - * '     watchdog/35   296       -11.731402      4081     0         0.000000        44.052473         0.000000 /
> - */
> -static int parse_old_task_format(char *buffer, struct task_info *task_info, int nr_entries)
> +static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_entries)
>  {
>  	int pid, ctxsw, prio, comm_size;
> -	char *start, *end, *buffer_end;
> +	char *ptr, *line, *end;
>  	struct task_info *task;
>  	char comm[COMM_SIZE];
> -	int waiting_tasks = 0;
> -
> -	start = buffer;
> -	start = strstr(start, TASK_MARKER);
> -	start = strstr(start, "-\n");
> -	start++;
> +	int tasks = 0;
>  
> -	buffer_end = buffer + strlen(buffer);
> +	if ((ptr = strstr(buffer, TASK_MARKER)) == NULL)
> +		die ("no runnable task section found!\n");
>  
>  	/*
> -	 * We can't short-circuit using nr_entries, we have to scan the
> -	 * entire list of processes that is on this CPU.
> +	 * If we have less than two tasks on the CPU there is no
> +	 * possibility of a stall.
>  	 */
> -	while (*start && start < buffer_end) {
> -		task = &task_info[waiting_tasks];
> +	if (nr_entries < 2)
> +		return 0;
> +	line = ptr;
> +
> +	/* skip header and divider */
> +	line = nextline(line);
> +	line = nextline(line);
> +	
> +	/* now loop over the task info */
> +	while (tasks < nr_entries) {
> +		task = &task_info[tasks];
>  
> -		/* Only care about tasks that are not R (running on a CPU). */
> -		if (start[0] == 'R') {
> +		/*
> +		 * In 3.X kernels, only the singular RUNNING task receives
> +		 * a "running state" label. Therefore, only care about
> +		 * tasks that are not R (running on a CPU).
> +		 */
> +		if ((config_task_format == OLD_TASK_FORMAT) &&
> +			(*ptr == 'R')) {
>  			/* Go to the end of the line and ignore this task. */
> -			start = strchr(start, '\n');
> -			start++;
> +			ptr = strchr(ptr, '\n');
> +			ptr++;
>  			continue;
>  		}
>  
> -		/* Pick up the comm field. */
> -		start = skipspaces(start);
> -		end = skipchars(start);
> -		comm_size = end - start;
> +		/* get the task field */
> +		ptr = skipwords(line, config_task_format_offsets.task);
> +
> +		/* Find the end of the task field */
> +		end = skipchars(ptr);
> +		comm_size = end - ptr;
> +
> +		/* make sure we don't overflow the comm array */
>  		if (comm_size >= COMM_SIZE) {
>  			warn("comm_size is too large: %d\n", comm_size);
>  			comm_size = COMM_SIZE - 1;
>  		}
> -		strncpy(comm, start, comm_size);
> -		comm[comm_size] = 0;
> -
> -		/* Go to the end of the task comm. */
> -		start=end;
> -
> -		/* Now pick up the pid. */
> -		pid = strtol(start, &end, 10);
> -
> -		/* Go to the end of the pid. */
> -		start=end;
> -
> -		/* Skip the tree-key. */
> -		start = skipspaces(start);
> -		start = skipchars(start);
> -
> -		/* Pick up the context switch count. */
> -		ctxsw = strtol(start, &end, 10);
> -		start = end;
> -
> -		/* Get the priority. */
> -		prio = strtol(start, &end, 10);
> -		if (is_runnable(pid)) {
> +		strncpy(comm, ptr, comm_size);
> +		comm[comm_size] = '\0';
> +		ptr = end;
> +
> +		/* get the PID field */
> +		ptr = skipwords(line, config_task_format_offsets.pid);
> +		pid = strtol(ptr, NULL, 10);
> +
> +		/* get the context switches field */
> +		ptr = skipwords(line, config_task_format_offsets.switches);
> +		ctxsw = strtol(ptr, NULL, 10);
> +
> +		/* get the prio field */
> +		ptr = skipwords(line, config_task_format_offsets.prio);
> +		prio = strtol(ptr, NULL, 10);
> +
> +                /*
> +                 * In older formats, we must check to
> +                 * see if the process is runnable prior to storing header
> +                 * fields and incrementing task processing
> +                 */
> +                if ((config_task_format == NEW_TASK_FORMAT) || (is_runnable(pid))) {
>  			strncpy(task->comm, comm, comm_size);
>  			task->comm[comm_size] = 0;
>  			task->pid = pid;
> @@ -492,18 +435,44 @@ static int parse_old_task_format(char *buffer, struct task_info *task_info, int
>  			task->ctxsw = ctxsw;
>  			task->prio = prio;
>  			task->since = time(NULL);
> -			waiting_tasks++;
> +			/* increment the count of tasks processed */
> +			tasks++;
> +		} else {
> +			continue;
>  		}
>  
> -		if ((start = nextline(start)) == NULL)
> -			break;
> +	}
> +	return tasks;
> +}
>  
> -		if (waiting_tasks >= nr_entries) {
> +
> +static int count_task_lines(char *buffer)
> +{
> +	int lines = 0;
> +	char *ptr;
> +	int len;
> +
> +	len = strlen(buffer);
> +
> +	/* Find the runnable tasks: header. */
> +	ptr = strstr(buffer, TASK_MARKER);
> +	if (ptr == NULL)
> +		return 0;
> +
> +	/* Skip to the end of the dashed line separator. */
> +	ptr = strstr(ptr, "-\n");
> +	if (ptr == NULL)
> +		return 0;
> +
> +	ptr += 2;
> +	while(*ptr && ptr < (buffer+len)) {
> +		lines++;
> +		ptr = strchr(ptr, '\n');
> +		if (ptr == NULL)
>  			break;
> -		}
> +		ptr++;
>  	}
> -
> -	return waiting_tasks;
> +	return lines;
>  }
>  
>  static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
> @@ -515,36 +484,23 @@ static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
>  		warn("NULL cpu_info pointer!\n");
>  		return 0;
>  	}
> -	nr_entries = cpu_info->nr_running;
> -
> -	switch (config_task_format) {
> -	case NEW_TASK_FORMAT:
> -		cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
> -		if (cpu_info->starving == NULL) {
> -			warn("failed to malloc %d task_info structs", nr_entries);
> -			return 0;
> -		}
> -		nr_waiting = parse_new_task_format(buffer, cpu_info->starving, nr_entries);
> -		break;
> -	case OLD_TASK_FORMAT:
> -		/*
> -		 * The old task format does not output a correct value for
> -		 * nr_running (the initializer for nr_entries) so count the
> -		 * task lines for this CPU data and use that instead.
> -		 */
> +
> +	if (config_task_format == OLD_TASK_FORMAT)
>  		nr_entries = count_task_lines(buffer);
> -		if (nr_entries <= 0)
> -			return 0;
> -		cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
> -		if (cpu_info->starving == NULL) {
> -			warn("failed to malloc %d task_info structs", nr_entries);
> -			return 0;
> -		}
> -		nr_waiting = parse_old_task_format(buffer, cpu_info->starving, nr_entries);
> -		break;
> -	default:
> -		die("invalid value for config_task_format: %d\n", config_task_format);
> +	else
> +		nr_entries = cpu_info->nr_running;
> +
> +	if (nr_entries <= 0)
> +		return 0;
> +
> +	cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
> +	if (cpu_info->starving == NULL) {
> +		warn("failed to malloc %d task_info structs", nr_entries);
> +		return 0;
>  	}
> +
> +	nr_waiting = parse_task_lines(buffer, cpu_info->starving, nr_entries);
> +
>  	return nr_waiting;
>  }
>  
> @@ -574,7 +530,7 @@ static int sched_debug_parse(struct cpu_info *cpu_info, char *buffer, size_t buf
>  	}
>  
>  	/*
> -	 * The NEW_TASK_FORMAT produces useful output values for nr_running and
> +	 * NEW_TASK_FORMAT and produces useful output values for nr_running and
>  	 * rt_nr_running, so in this case use them. For the old format just leave
>  	 * them initialized to zero.
>  	 */
> @@ -613,7 +569,8 @@ static int sched_debug_has_starving_task(struct cpu_info *cpu)
>  static int sched_debug_init(void)
>  {
>  	find_sched_debug_path();
> -	config_task_format = detect_task_format();
> +	if ((config_task_format = detect_task_format()) == TASK_FORMAT_UNKNOWN)
> +		die("Can't handle task format!\n");
>  	return 0;
>  }
>  
> diff --git a/src/sched_debug.h b/src/sched_debug.h
> index 21f9da27866e..4b12c39bf7fe 100644
> --- a/src/sched_debug.h
> +++ b/src/sched_debug.h
> @@ -1,6 +1,67 @@
>  /* SPDX-License-Identifier: GPL-2.0-or-later */
> -#define OLD_TASK_FORMAT  1
> -#define NEW_TASK_FORMAT  2
>  #define TASK_MARKER	"runnable tasks:"
> +#define TASK_DIVIDER	"-\n"
> +
> +/*
> + * Over time, the various 'runnable task' output in SCHED_DEBUG has
> + * changed significantly.
> + *
> + * Depending on the version of the running kernel, the task formats can
> + * differ greatly.
> + *
> + * For example, in 3.X kernels, the sched_debug running tasks format denotes the current
> + * running task on the current CPU with a singular state label, 'R'. Other tasks do not
> + * receive a state label.
> + *
> + * example:
> + * '          task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep'
> + * ' ----------------------------------------------------------------------------------------------------------'
> + * '      watchdog/5    33        -8.984472       151     0         0.000000         0.535614         0.000000 0 /'
> + * ' R          less  9542      2382.087644        56   120         0.000000        16.444493         0.000000 0 /'
> + *
> + * In 4.18+ kernels, the sched_debug format running tasks format included an additional 'S'
> + * state field to denote the state of the running tasks on said CPU.
> + *
> + * example:
> + * ' S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep'
> + * '-----------------------------------------------------------------------------------------------------------'
> + * ' I         rcu_gp     3        13.973264         2   100         0.000000         0.004469         0.000000 0 0 /'
> + *
> + * Introduced in 6.12+, 2cab4bd024d2 sched/debug: Fix the runnable tasks
> + * output, the sched_debug running tasks format was changed to include
> + * four new EEVDF fields.
> + *
> + * Example:
> + *  'S            task   PID       vruntime   eligible    deadline             slice          sum-exec      switches  prio         wait-time        sum-sleep       sum-block  node   group-id  group-path'
> + *  '-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------'
> + *  ' I kworker/R-rcu_g     4        -1.048576   E          -1.040501           0.700000         0.000000         2     100         0.000000         0.000000         0.000000   0      0        /'
> + *
> + * As there are considerable differences in the location of the fields
> + * needed to boost task prioriy, handle the logical code differences with
> + * an enumerated type.
> + */
> +enum task_format {
> +	TASK_FORMAT_UNKNOWN =0,
> +	OLD_TASK_FORMAT,	// 3.10 kernel
> +	NEW_TASK_FORMAT,	// 4.18+ kernel
> +	TASK_FORMAT_LIMIT
> +};
> +
> +
> +/*
> + * set of offsets in a task format line based on offsets
> + * discovered by discover_task_format
> + *
> + * Note: These are *NOT* character offsets, these are "word" offsets.
> + * Requiring consumers of this struct to parse through the individual
> + * lines.
> + */
> +struct task_format_offsets {
> +	int task;
> +	int pid;
> +	int switches;
> +	int prio;
> +	int wait_time;
> +};

I wonder if it is possible to simplify the code using sscanf. For
example:

int detect_task_format(void)
{
    ...
    nextline(ptr);
    skipspaces(ptr);
    return *ptr == 'S' ? NEW_TASK_FORMAT : OLD_TASK_FORMAT;
}

...

void parse_task(void)
{
    ...
    switch (detect_task_format())
    {
        case NEW_TASK_FORMAT:
            sscanf(...);
            break;
        case OLD_TASK_FORMAT:
            sscanf(...);
            break;
        default:
            die();
    }
}

For an example of what I mean, see[1].

[1] https://git.kernel.org/pub/scm/utils/stalld/stalld.git/commit/?h=main&id=f9fe90f6c0b867be0c3fd245d6dd8b99686bb4ae

>  
>  extern struct stalld_backend sched_debug_backend;
> -- 
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering
  2025-10-17  2:24 ` [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering Clark Williams
@ 2025-10-21 15:58   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 15:58 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Claude, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:34PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> Refactor parse_task_lines() to correctly parse runnable tasks from
> sched_debug output with improved state filtering and iteration.
> 
> Key fixes:
> - Rename skipwords() to skip2word() and fix logic to position pointer
>   at the start of the target word (not after whitespace)
> - Fix NEW_TASK_FORMAT detection to account for 'S' column offset in
>   task field positions
> - Add proper state filtering for NEW_TASK_FORMAT: skip '>R' (running),
>   non-'R', and non-'X' states
> - Fix loop termination: iterate to (nr_entries-1) since running task
>   is excluded
> - Fix buffer allocation size to include null terminator (+1)
> - Add comprehensive comments explaining parsing state machine
> - Initialize ptr=NULL to prevent uninitialized use warnings
> 
> This resolves task parsing failures where stalld was incorrectly
> reading task fields or missing runnable tasks on newer kernels.
> 
> ???? Generated with [Claude Code](https://claude.com/claude-code)
> 
> Co-Authored-By: Claude <noreply@anthropic.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/sched_debug.c | 89 +++++++++++++++++++++++++++++++++++------------
>  1 file changed, 66 insertions(+), 23 deletions(-)
> 
> diff --git a/src/sched_debug.c b/src/sched_debug.c
> index 180932ca7aa3..04a12ef2a7c2 100644
> --- a/src/sched_debug.c
> +++ b/src/sched_debug.c
> @@ -122,7 +122,8 @@ static char *alloc_and_fill_cpu_buffer(int cpu, char *sched_dbg, int sched_dbg_s
>  	if (!next_cpu_start)
>  		next_cpu_start = sched_dbg + sched_dbg_size;
>  
> -	size = next_cpu_start - cpu_start;
> +	/* add one for the null terminator */
> +	size = next_cpu_start - cpu_start + 1;
>  
>  	if (size <= 0)
>  		return NULL;
> @@ -171,12 +172,13 @@ static inline char *nextline(char *str)
>   * skip a specified number of words on a task line
>   */
>  
> -static inline char *skipwords(char *ptr, int nwords)
> +static inline char *skip2word(char *ptr, int nwords)
>  {
>  	int i;
> -	for (i=0; i < nwords; i++) {
> -		ptr = skipspaces(ptr);
> +	ptr = skipspaces(ptr);
> +	for (i=1; i < nwords; i++) {
>  		ptr = skipchars(ptr);
> +		ptr = skipspaces(ptr);
>  	}
>  	return ptr;
>  }
> @@ -228,12 +230,14 @@ static int detect_task_format(void)
>  	config_buffer_size = bufsiz;
>  	log_msg("initial config_buffer_size set to %zu\n", config_buffer_size);
>  
> +	/* find the delimiter for task information */
>  	ptr = strstr(buffer, TASK_MARKER);
>  	if (ptr == NULL) {
>  		die("unable to find 'runnable tasks' in buffer, invalid input\n");
>  		exit(-1);
>  	}
>  
> +	/* move to the column header line */
>  	ptr = nextline(ptr);
>  	i = 0;
>  
> @@ -245,6 +249,8 @@ static int detect_task_format(void)
>  	if (strncmp(ptr, "S", strlen("S")) == 0) {
>  		log_msg("detect_task_format: NEW_TASK_FORMAT detected\n");
>  		retval = NEW_TASK_FORMAT;
> +		/* move the word offset by one */
> +		i++;
>  	}
>  	else {
>  		log_msg("detect_task_format: OLD_TASK_FORMAT detected\n");
> @@ -357,45 +363,81 @@ static int is_runnable(int pid)
>  static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_entries)
>  {
>  	int pid, ctxsw, prio, comm_size;
> -	char *ptr, *line, *end;
> +	char *ptr=NULL, *line = buffer, *end;
> +	char *buffer_end = buffer + strlen(buffer);
>  	struct task_info *task;
>  	char comm[COMM_SIZE];
>  	int tasks = 0;
>  
> -	if ((ptr = strstr(buffer, TASK_MARKER)) == NULL)
> -		die ("no runnable task section found!\n");
> -
>  	/*
>  	 * If we have less than two tasks on the CPU there is no
>  	 * possibility of a stall.
> -	 */
> + 	 */
>  	if (nr_entries < 2)
>  		return 0;
> +
> +
> +	/* search for the task marker header */
> +	ptr = strstr(buffer, TASK_MARKER);
> +	if (ptr == NULL)
> +		die ("no runnable task section found!\n");
> +
>  	line = ptr;
>  
> -	/* skip header and divider */
> +	/* skip "runnable tasks:" */
> + 	line = nextline(line);
> +
> +	/* skip header lines */
>  	line = nextline(line);
> +
> +	/* skip divider line */
>  	line = nextline(line);
> -	
> -	/* now loop over the task info */
> -	while (tasks < nr_entries) {
> +	/* at this point, line should point to the start of a task line */
> +
> +	/* now loop over the task info
> +	 * note that we always discount the task that's on the cpu, so the
> +	 * number of waiting tasks will always be at least one less than
> +	 * nr_entries.
> +	 */
> +	while ((line < buffer_end) && tasks < (nr_entries-1)) {
>  		task = &task_info[tasks];
>  
> +		/* move ptr to the first word of the line */
> +		ptr = skipspaces(line);
> +
>  		/*
>  		 * In 3.X kernels, only the singular RUNNING task receives
>  		 * a "running state" label. Therefore, only care about
> -		 * tasks that are not R (running on a CPU).
> +		 * tasks that are not R (runnable on a CPU).
>  		 */
>  		if ((config_task_format == OLD_TASK_FORMAT) &&
>  			(*ptr == 'R')) {
>  			/* Go to the end of the line and ignore this task. */
> -			ptr = strchr(ptr, '\n');
> -			ptr++;
> +			line = nextline(line);
>  			continue;
>  		}
>  
> +		/*
> +		 * in newer kernels (>=4.x) every task info line has a state
> +		 * but the actual running tasks has a '>R' to denote it.
> +		 * since we don't care about the currently running tasks
> +		 * skip it.
> +		 * Also, we don't care about any states other than 'R' (runnable)
> +		 * and 'X' (dying)
> +		 */
> +		if (config_task_format == NEW_TASK_FORMAT) {
> +			if (*ptr == '>' || (*ptr != 'R' && *ptr != 'X')) {
> +				line = nextline(line);
> +				continue;
> +			}
> +		}
> +
> +		/*
> +		 * At this point we have a task line to record
> +		 */
> +		
>  		/* get the task field */
> -		ptr = skipwords(line, config_task_format_offsets.task);
> +		ptr = skip2word(line, config_task_format_offsets.task);
>  
>  		/* Find the end of the task field */
>  		end = skipchars(ptr);
> @@ -408,20 +450,21 @@ static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_en
>  		}
>  		strncpy(comm, ptr, comm_size);
>  		comm[comm_size] = '\0';
> -		ptr = end;
>  
>  		/* get the PID field */
> -		ptr = skipwords(line, config_task_format_offsets.pid);
> +		ptr = skip2word(line, config_task_format_offsets.pid);
>  		pid = strtol(ptr, NULL, 10);
>  
>  		/* get the context switches field */
> -		ptr = skipwords(line, config_task_format_offsets.switches);
> +		ptr = skip2word(line, config_task_format_offsets.switches);
>  		ctxsw = strtol(ptr, NULL, 10);
>  
>  		/* get the prio field */
> -		ptr = skipwords(line, config_task_format_offsets.prio);
> +		ptr = skip2word(line, config_task_format_offsets.prio);
>  		prio = strtol(ptr, NULL, 10);
>  
> +		/*log_msg("DEBUG: task%d comm:%s pid:%d ctxsw:%d prio:%d\n", tasks, comm, pid, ctxsw, prio);*/
> +
>                  /*
>                   * In older formats, we must check to
>                   * see if the process is runnable prior to storing header
> @@ -437,10 +480,10 @@ static int parse_task_lines(char *buffer, struct task_info *task_info, int nr_en
>  			task->since = time(NULL);
>  			/* increment the count of tasks processed */
>  			tasks++;
> -		} else {
> -			continue;
>  		}
>  
> +		/* move our line pointer to the next availble line */
> +		line = nextline(line);
>  	}
>  	return tasks;
>  }
> -- 
> 2.51.0
> 

I think it would be better to squash it with patch 1.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/12] sched_debug: Fix double-free crash in fill_waiting_task()
  2025-10-17  2:24 ` [PATCH 03/12] sched_debug: Fix double-free crash in fill_waiting_task() Clark Williams
@ 2025-10-21 16:01   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 16:01 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Claude, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:35PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> Fix a critical double-free bug that occurs when fill_waiting_task()
> returns early without allocating new memory for cpu_info->starving.
> 
> Root cause:
> In sched_debug_parse(), the code saves old_tasks = cpu_info->starving,
> then calls fill_waiting_task(). If fill_waiting_task() returns early
> (nr_entries <= 0 or cpu_info == NULL), it doesn't allocate new memory
> or update cpu_info->starving. This leaves cpu_info->starving pointing
> to the old memory. When the code later frees old_tasks, it's freeing
> the same memory that cpu_info->starving still points to. On the next
> iteration, old_tasks again points to this already-freed memory, causing
> a double-free crash.
> 
> Example crash sequence:
> 1. First parse: cpu_info->starving = malloc(memory A)
> 2. Second parse: old_tasks = memory A
>                  nr_entries = 5, so cpu_info->starving = malloc(memory B)
>                  free(old_tasks) ??? frees memory A
> 3. Third parse:  old_tasks = memory B
>                  nr_entries = 0, so fill_waiting_task() returns early
>                  cpu_info->starving still points to memory B
>                  free(old_tasks) ??? frees memory B
> 4. Fourth parse: old_tasks = memory B (already freed!)
>                  nr_entries > 0, so cpu_info->starving = malloc(memory C)
>                  free(old_tasks) ??? DOUBLE FREE of memory B ??? CRASH
> 
> The fix:
> Explicitly set cpu_info->starving = NULL in fill_waiting_task() when
> returning early without allocation. This ensures that cpu_info->starving
> never points to freed memory, preventing the double-free.
> 
> Impact:
> - Prevents random crashes in sched_debug backend
> - Enables stable operation with varying numbers of runnable tasks
> - Critical for test suite reliability
> 
> Testing:
> - Verified no crash with: sudo stalld -f -v -N -b sched_debug -t 3 -c 0 -l
> - Runs stably for extended periods without crashes
> 
> ???? Generated with [Claude Code](https://claude.com/claude-code)
> 
> Co-Authored-By: Claude <noreply@anthropic.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/sched_debug.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/sched_debug.c b/src/sched_debug.c
> index 04a12ef2a7c2..102a4e02d57d 100644
> --- a/src/sched_debug.c
> +++ b/src/sched_debug.c
> @@ -525,6 +525,7 @@ static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
>  
>  	if (cpu_info == NULL) {
>  		warn("NULL cpu_info pointer!\n");
> +		cpu_info->starving = NULL;
>  		return 0;
>  	}
>  
> @@ -533,8 +534,10 @@ static int fill_waiting_task(char *buffer, struct cpu_info *cpu_info)
>  	else
>  		nr_entries = cpu_info->nr_running;
>  
> -	if (nr_entries <= 0)
> +	if (nr_entries <= 0) {
> +		cpu_info->starving = NULL;
>  		return 0;
> +	}
>  
>  	cpu_info->starving = malloc(sizeof(struct task_info) * nr_entries);
>  	if (cpu_info->starving == NULL) {
> -- 
> 2.51.0
> 

Reviewed-by: Wander Lairson Costa <wander@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/12] stalld.c: remove noisy idle report and added report to should_skip_idle_cpus()
  2025-10-17  2:24 ` [PATCH 04/12] stalld.c: remove noisy idle report and added report to should_skip_idle_cpus() Clark Williams
@ 2025-10-21 16:03   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 16:03 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:36PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> comment out the log_verbose() call in get_cpu_busy_list(), since
> it's not helpful for production and chews up a lot of screen real-
> estate when running tests. Added a report in should_skip_idle_cpus()
> which reports that there were busy cpus and that a run-queue parse
> will be done.
> 
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/stalld.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/stalld.c b/src/stalld.c
> index 4c00158cf239..e8c927750ee0 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -289,9 +289,10 @@ int get_cpu_busy_list(struct cpu_info *cpus, int nr_cpus, char *busy_cpu_list)
>  			continue;
>  		}
>  
> -		log_verbose ("\t cpu %d had %ld idle time, and now has %ld\n",
> -			     cpu->id, cpu->idle_time, idle_time);
> -
> +		/*
> +		 * log_verbose ("\t cpu %d had %ld idle time, and now has %ld\n",
> +		 *	     cpu->id, cpu->idle_time, idle_time);
> +		 */

Any particular reason of this code is commented out instead of deleted?

>  		/* If the idle time did not change, the CPU is busy. */
>  		if (cpu->idle_time == idle_time) {
>  			busy_cpu_list[i] = 1;
> @@ -795,7 +796,7 @@ static int should_skip_idle_cpus(struct cpu_info *cpus, int nr_cpus, char *busy_
>  		log_verbose("all CPUs had idle time, skipping parse\n");
>  		return 1;
>  	}
> -
> +	log_verbose("some cpus ran, so run-queues should be parsed\n");
>  	return 0;
>  }
>  
> -- 
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/12] stalld.c: initialize cpu_info->idle_time to be -1
  2025-10-17  2:24 ` [PATCH 05/12] stalld.c: initialize cpu_info->idle_time to be -1 Clark Williams
@ 2025-10-21 16:15   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 16:15 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:37PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> Initialize the idle_time value of a cpu_info structure to be -1 so
> that it will be presumed busy for the first time through the loop,
> which will force an initial check of the run queues.
> 
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/stalld.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/src/stalld.c b/src/stalld.c
> index e8c927750ee0..6cd4bf68d04a 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -293,6 +293,15 @@ int get_cpu_busy_list(struct cpu_info *cpus, int nr_cpus, char *busy_cpu_list)
>  		 * log_verbose ("\t cpu %d had %ld idle time, and now has %ld\n",
>  		 *	     cpu->id, cpu->idle_time, idle_time);
>  		 */
> +
> +		/* If this is the first check (idle_time was -1), assume busy to be safe */
> +		if (cpu->idle_time == -1) {
> +			cpu->idle_time = idle_time;
> +			busy_cpu_list[i] = 1;
> +			busy_count++;
> +			continue;
> +		}
> +
>  		/* If the idle time did not change, the CPU is busy. */
>  		if (cpu->idle_time == idle_time) {
>  			busy_cpu_list[i] = 1;
> @@ -1168,6 +1177,7 @@ int main(int argc, char **argv)
>  	for (i = 0; i < config_nr_cpus; i++) {
>  		cpus[i].buffer = allocate_memory(1, config_buffer_size);
>  		cpus[i].buffer_size = config_buffer_size;
> +		cpus[i].idle_time = -1;  /* Initialize to -1 so first check doesn't skip busy CPUs */
>  	}
>  
>  	if (config_log_syslog)
> -- 
> 2.51.0
> 

Reviewed-by: Wander Lairson Costa <wander@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 06/12] stalld.c: get rid of misleading print about DL-Server
  2025-10-17  2:24 ` [PATCH 06/12] stalld.c: get rid of misleading print about DL-Server Clark Williams
@ 2025-10-21 16:16   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 16:16 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:38PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> Get rid of the assertion that stalld was forced into log-only mode
> 
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/stalld.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/stalld.c b/src/stalld.c
> index 6cd4bf68d04a..6adc3e9a8cea 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -1149,7 +1149,7 @@ int main(int argc, char **argv)
>  		set_cpu_affinity(config_affinity_cpus);
>  
>  	if (!check_dl_server_dir_exists())
> -		log_msg("DL-server detected. Operating in log-only mode.\n");
> +		log_msg("DL-server detected.\n");
>  
>  	/*
>  	 * Check RT throttling:
> -- 
> 2.51.0
> 

Reviewed-by: Wander Lairson Costa <wander@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/12] stalld.c: Add starvation logging in single-threaded log-only mode
  2025-10-17  2:24 ` [PATCH 07/12] stalld.c: Add starvation logging in single-threaded log-only mode Clark Williams
@ 2025-10-21 16:27   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 16:27 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Claude, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:39PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> In single-threaded mode with -l/--log_only flag, stalld was not logging
> the "starved" message when tasks reached the starvation threshold. This
> was because the config_log_only check occurred before any logging, and
> the "starved" message (present in check_starving_tasks() for per-CPU
> threading modes) was missing from boost_cpu_starving_vector().
> 
> This caused test_log_only.sh to fail when looking for the "starved"
> message in the log output.
> 
> Changes to boost_cpu_starving_vector():
> - Move config_log_only check after starvation detection
> - Add log_msg() call to output "starved" message when threshold reached
> - Reset timestamp (cpu->since) in log-only mode to prevent continuous
>   logging of the same starvation event
> - Change threshold comparison from <= to < for consistency with
>   check_starving_tasks()
> 
> This ensures consistent logging behavior between single-threaded mode
> and per-CPU threading modes when using log-only mode.
> 
> ???? Generated with [Claude Code](https://claude.com/claude-code)
> 
> Co-Authored-By: Claude <noreply@anthropic.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/stalld.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/src/stalld.c b/src/stalld.c
> index 6adc3e9a8cea..ebca23105ce9 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -924,12 +924,22 @@ int boost_cpu_starving_vector(struct cpu_starving_task_info *vector, int nr_cpus
>  			log_verbose("\t cpu %d: pid: %d starving for %llu\n",
>  				    i, cpu->pid, (now - cpu->since));
>  
> -		if (config_log_only)
> +		/* Skip if no task or not starving long enough */
> +		if (cpu->pid == 0 || (now - cpu->since) < config_starving_threshold)
>  			continue;
>  
> -		/* Skip if no task or not starving long enough */
> -		if (cpu->pid == 0 || (now - cpu->since) <= config_starving_threshold)
> +		/* Log when task has reached starvation threshold */
> +		if ((now - cpu->since) >= config_starving_threshold) {

Is this "if" necessary? It wouldn't reach this point if the condition is
false, would it?

> +			log_msg("%s-%d starved on CPU %d for %d seconds\n",
> +				cpu->task.comm, cpu->pid, i,
> +				(now - cpu->since));
> +		}
> +
> +		if (config_log_only) {
> +			/* Reset timestamp to avoid continuous logging */
> +			cpu->since = now;
>  			continue;
> +		}
>  
>  		/* Skip if task is on denylist */
>  		if (config_ignore && !check_task_ignore(&cpu->task))
> -- 
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 08/12] stalld: Add -N/--no_idle_detect flag to disable idle detection
  2025-10-17  2:24 ` [PATCH 08/12] stalld: Add -N/--no_idle_detect flag to disable idle detection Clark Williams
@ 2025-10-21 16:33   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 16:33 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Claude, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:40PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> Idle detection is an optimization that skips parsing CPUs that have
> increasing idle time. However, this causes issues in controlled test
> environments where the starvation generator may not prevent CPUs from
> appearing idle.
> 
> The new flag allows disabling this optimization for:
> - Testing scenarios where CPUs need to be monitored regardless of idle state
> - Debugging starvation detection issues
> - Ensuring all CPUs are always checked
> 
> Usage: stalld -N or --no_idle_detect
> 
> Implementation:
> - Added -N option to getopt_long short options
> - Added no_idle_detect to long_options array
> - Sets config_idle_detection = 0 when specified
> - Added help text: "disable idle CPU detection optimization (for testing)"
> 
> The config_idle_detection variable already exists and is checked
> throughout the code, so no other changes are needed.
> 
> ???? Generated with [Claude Code](https://claude.com/claude-code)
> 
> Co-Authored-By: Claude <noreply@anthropic.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/utils.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/src/utils.c b/src/utils.c
> index 91c99c0a1270..8f1fe9266db9 100644
> --- a/src/utils.c
> +++ b/src/utils.c
> @@ -767,6 +767,7 @@ static void print_usage(void)
>  		"                               starving_threshold time, dispatch a specialized thread to monitor",
>  		"                               it.",
>  		"	   -O/--power_mode: works as a single threaded tool. Saves CPU, but loses precision.",
> +		"	   -N/--no_idle_detect: disable idle CPU detection optimization (for testing)",
>  		"	   -g/--granularity: set the granularity at which stalld checks for starving threads",
>  		"	   -R/--reservation: percentage of CPU time reserved to stalld using SCHED_DEADLINE.",
>  		"	   -a/--affinity: limit stalld's affinity",
> @@ -996,6 +997,7 @@ int parse_args(int argc, char **argv)
>  			{"aggressive_mode",	no_argument,	   0, 'A'},
>  			{"power_mode",		no_argument,	   0, 'O'},
>  			{"adaptive_mode",	no_argument,	   0, 'M'},
> +			{"no_idle_detect",	no_argument,	   0, 'N'},
>  			{"help",		no_argument,	   0, 'h'},
>  			{"boost_period",	required_argument, 0, 'p'},
>  			{"boost_runtime",	required_argument, 0, 'r'},
> @@ -1017,7 +1019,7 @@ int parse_args(int argc, char **argv)
>  		/* getopt_long stores the option index here. */
>  		int option_index = 0;
>  
> -		c = getopt_long(argc, argv, "lvkfAOMhsp:r:d:t:c:FVSg:i:I:R:b:a:",
> +		c = getopt_long(argc, argv, "lvkfAOMNhsp:r:d:t:c:FVSg:i:I:R:b:a:",
>  				 long_options, &option_index);
>  
>  		/* Detect the end of the options. */
> @@ -1138,6 +1140,10 @@ int parse_args(int argc, char **argv)
>  			config_single_threaded = 0;
>  			config_aggressive = 0;
>  			break;
> +		case 'N':
> +			config_idle_detection = 0;
> +			log_msg("idle detection disabled\n");
> +			break;
>  		case 'R':
>  			config_reservation = get_long_from_str(optarg);
>  			if (config_reservation < 10 || config_reservation > 90)
> -- 
> 2.51.0
> 

Reviewed-by: Wander Lairson Costa <wander@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 09/12] stalld: Add defensive checks in print_boosted_info
  2025-10-17  2:24 ` [PATCH 09/12] stalld: Add defensive checks in print_boosted_info Clark Williams
@ 2025-10-21 17:36   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 17:36 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Claude, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:41PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> This adds validation and bounds checking in print_boosted_info() to
> prevent segmentation faults when boosting tasks in FIFO mode.
> 
> Changes:
> - Add NULL check for 'type' parameter
> - Handle fill_process_comm() failures gracefully with fallback
> - Validate cpu->id is within valid range before dereferencing
> - Provide informative error messages for invalid states
> 
> Issue: Test results show stalld crashes with SIGSEGV when using -F
> (force FIFO) flag in aggressive mode (-A). The crash occurs during
> or immediately after the first boost attempt.
> 
> This is a partial fix that adds defensive programming. Further
> investigation needed to identify the root cause of the crashes
> in FIFO boosting mode.
> 
> Co-Authored-By: Claude <noreply@anthropic.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/stalld.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/src/stalld.c b/src/stalld.c
> index ebca23105ce9..466b6db15668 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -435,10 +435,21 @@ void print_boosted_info(int tgid, int pid, struct cpu_info *cpu, char *type)
>  {
>  	char comm[COMM_SIZE];
>  
> -	fill_process_comm(tgid, pid, comm, COMM_SIZE);
> +	/* Validate inputs to prevent crashes */
> +	if (!type) {
> +		warn("print_boosted_info called with NULL type\n");
> +		return;
> +	}

If !type must not happen with correct program logic, I would let it
crash. If we are screwing things up, it is better to have a core dump
than a buggy process running.

> +
> +	if (fill_process_comm(tgid, pid, comm, COMM_SIZE) != 0) {
> +		/* If we can't get the comm, use a placeholder */
> +		snprintf(comm, COMM_SIZE, "<unknown>");
> +	}
>  
> -	if (cpu)
> +	if (cpu && cpu->id >= 0 && cpu->id < config_nr_cpus)
>  		log_msg("boosted pid %d (%s) (cpu %d) using %s\n", pid, comm, cpu->id, type);
> +	else if (cpu)
> +		log_msg("boosted pid %d (%s) (cpu <invalid>) using %s\n", pid, comm, type);
>  	else
>  		log_msg("boosted pid %d (%s) using %s\n", pid, comm, type);
>  }

The same comment applies here.

> -- 
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 10/12] Makefile: Add support for legacy kernels
  2025-10-17  2:24 ` [PATCH 10/12] Makefile: Add support for legacy kernels Clark Williams
  2025-10-17 12:50   ` Derek Barbosa
@ 2025-10-21 17:43   ` Wander Lairson Costa
  1 sibling, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 17:43 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Derek Barbosa, Clark Williams, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:42PM -0500, Clark Williams wrote:
> From: Derek Barbosa <debarbos@redhat.com>
> 
> In the sched_debug backend code, we conditionally parse the running
> tasks block of the sched_debug file to support the older task format (as
> denoted by OLD_TASK_FORMAT).
> 
> This task format is only present on legacy 3.X kernels. However, on said
> legacy kernels, the current Makefile directives and #if macros in the
> stalld.c code do not fully support compilation of stalld.
> 
> Add a LEGACY check to the makefile, which parses `uname -r` output for
> the major kernel version, and adds a LEGACY compilation flag to the
> CFLAGS. Use the compilation flag to default to sched_debug_backend in
> the stalld.c file to avoid missing references to the queue_track code.
> 
> Also, disable Intel CET -fcf-protection and BPF support, remove
> annocheck targets, and default to the c99 compiler (if present).
> 
> Signed-off-by: Derek Barbosa <debarbos@redhat.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  Makefile     | 40 +++++++++++++++++++++++++++++++++++++++-
>  src/stalld.c |  6 ++++--
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 7234a46976aa..c09d5743b696 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -9,6 +9,14 @@ ARCH=$(shell uname -m)
>  endif
>  $(info ARCH=$(ARCH))
>  
> +LEGACY_VER := 3
> +ifeq ($(strip $(KERNEL_VER)),)

Is the ifeq here to let the user overwrite KERNEL_VER?

> +KERNEL_VER=$(shell uname -r | cut -f 1 -d .)
> +endif
> +$(info Kernel Major Version is $(KERNEL_VER))
> +
> +IS_LEGACY:= $(shell test $(KERNEL_VER) -le $(LEGACY_VER) && echo "true" || echo "false")
> +
>  USE_BPF := 1
>  FCF_PROTECTION := -fcf-protection
>  MTUNE	:= -mtune=generic
> @@ -33,6 +41,10 @@ ifeq ($(ARCH),powerpc)
>  USE_BPF := 0
>  MTUNE := -mtune=powerpc
>  endif
> +ifeq ($(IS_LEGACY),true)
> +USE_BPF := 0
> +FCF_PROTECTION := 
> +endif
>  
>  $(info USE_BPF=$(USE_BPF))
>  $(info FCF_PROTECTION=$(FCF_PROTECTION))
> @@ -57,6 +69,7 @@ DEFS	:=	-DUSE_BPF=$(USE_BPF) -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -DDEBUG_S
>  
>  # note that RPMCFLAGS and RPMLDFLAGS are variables that come from the specfile when
>  # building for Fedora/CentOS/RHEL/et al
> +#

typo?

>  
>  ifeq ($(RPMCFLAGS),)
>  CFLAGS	:=	-DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(SOPTS) $(DEFS)
> @@ -64,6 +77,12 @@ else
>  CFLAGS	:=	 $(RPMCFLAGS) $(DEFS)
>  endif
>  
> +ifeq ($(IS_LEGACY),true)
> +$(info Compiling with Legacy Mode enabled)
> +$(info Overwriting RPMCFLAGS...)
> +CFLAGS	:=	-DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(DEFS) -std=c99 -DLEGACY
> +endif
> +
>  ifeq ($(DEBUG),0)
>  CFLAGS	+=	-g -O2
>  else
> @@ -200,7 +219,7 @@ install: stalld
>  	rm -f $(DESTDIR)$(BINDIR)/throttlectl
>  	make -C systemd DESTDIR=$(INSPATH) uninstall
>  
> -.PHONY: clean tarball systemd push annocheck
> +.PHONY: clean tarball systemd push annocheck help
>  clean:
>  	@test ! -f stalld || rm stalld
>  	@test ! -f stalld-static || rm stalld-static
> @@ -221,3 +240,22 @@ tarball:  clean
>  
>  annocheck: stalld
>  	annocheck --ignore-unknown --verbose --profile=el10 --debug-dir=/usr/lib/debug/ ./stalld
> +
> +help:
> +	@echo  'Cleaning targets:'
> +	@echo  '  clean            - Clean the main executable, tests, objects, and tarballs.'
> +	@echo  ''
> +	@echo 'Building targets:'
> +	@echo '  stalld            - Build the main executable (stalld).'
> +	@echo '  static            - Build a statically linked executable (stalld-static).'
> +	@echo '  tests             - Run tests in the "tests" subdirectory.'
> +	@echo '  tarball           - Create a source tarball: $(NAME)-$(VERSION).tar.$(CEXT)'
> +	@echo ''
> +	@echo 'Installation targets:'
> +	@echo '  install           - Install the executable, man page, docs, and licenses.'
> +	@echo '  uninstall         - Remove all installed files.'
> +	@echo ''
> +	@echo 'Utility targets:'
> +	@echo '  annocheck         - Run the annocheck security analysis tool on the stalld executable.'
> +	@echo '  help              - Display this help message.'
> +
> diff --git a/src/stalld.c b/src/stalld.c
> index 466b6db15668..305618e577fc 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -153,14 +153,16 @@ int config_reservation = 0;
>  
>  /*
>   * Select a backend.
> - * Note that eBPF not supported on i686 and powerpc
> + * Note that eBPF not supported on i686, powerpc, and on x86 3.X kernels.
>   */
> -#if !__powerpc__ && !__i386__
> +#if !__powerpc__ && !__i386__ && !LEGACY
>  struct stalld_backend *backend = &queue_track_backend;
>  #else
>  struct stalld_backend *backend = &sched_debug_backend;
>  #endif
>  
> +
> +
>  /*
>   * Set of CPUs in which stalld should run.
>   */
> -- 
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 11/12] scripts: fix run-local if bashism
  2025-10-17  2:24 ` [PATCH 11/12] scripts: fix run-local if bashism Clark Williams
@ 2025-10-21 17:45   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 17:45 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Derek Barbosa, Clark Williams, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:43PM -0500, Clark Williams wrote:
> From: Derek Barbosa <debarbos@redhat.com>
> 
> In older versions of Bash, mainly 4.2.X, a semicolon (or newline) is
> requied after a conditional statement and before the `then` keyword.
> 
> Add the semicolon to allow testing and quick "local runs" on legacy
> kernels.
> 
> Signed-off-by: Derek Barbosa <debarbos@redhat.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  scripts/run-local.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/run-local.sh b/scripts/run-local.sh
> index 0ce147634472..e65f34dd46a4 100755
> --- a/scripts/run-local.sh
> +++ b/scripts/run-local.sh
> @@ -44,7 +44,7 @@ parse_args()
>  # script start
>  #
>  
> -if [[ ! -x ./stalld ]] then
> +if [[ ! -x ./stalld ]]; then
>     echo "No stalld executable in the current directory!"
>     exit 1
>  fi
> -- 
> 2.51.0
> 

Reviewed-by: Wander Lairson Costa <wander@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 12/12] Fix segfault in adaptive/aggressive modes
  2025-10-17  2:24 ` [PATCH 12/12] Fix segfault in adaptive/aggressive modes Clark Williams
@ 2025-10-21 17:45   ` Wander Lairson Costa
  0 siblings, 0 replies; 25+ messages in thread
From: Wander Lairson Costa @ 2025-10-21 17:45 UTC (permalink / raw)
  To: Clark Williams
  Cc: linux-rt-users, Clark Williams, Claude, debarbos, marco.chiappero,
	chris.friesen, luochunsheng

On Thu, Oct 16, 2025 at 09:24:44PM -0500, Clark Williams wrote:
> From: Clark Williams <williams@redhat.com>
> 
> The merge_taks_info() function was unconditionally calling
> update_cpu_starving_vector() at line 389, but cpu_starving_vector
> is only allocated in single_threaded_main() (line 1007). This caused
> segmentation faults when running in adaptive or aggressive threading
> modes.
> 
> The fix guards both update_cpu_starving_vector() calls with
> config_single_threaded checks, since this vector is only used
> in single-threaded/power mode.
> 
> Root cause: cpu_starving_vector is a global array used exclusively
> by single-threaded mode to track one starving task per CPU. The
> merge_taks_info() function is called by both backends during parsing,
> regardless of threading mode. In adaptive/aggressive modes, the
> vector remains NULL, causing crashes when accessed.
> 
> Crash backtrace:
>   conservative_main() ??? get_cpu_and_parse() ??? sched_debug_parse() ???
>   merge_taks_info() ??? update_cpu_starving_vector() ??? SEGFAULT at
>   line 362: if (cpu_info->pid)
> 
> This fix allows test_starvation_threshold.sh to pass on sched_debug
> backend with adaptive mode (-M flag).
> 
> ???? Generated with [Claude Code](https://claude.com/claude-code)
> 
> Co-Authored-By: Claude <noreply@anthropic.com>
> Signed-off-by: Clark Williams <clrkwllms@kernel.org>
> Signed-off-by: Clark Williams <williams@redhat.com>
> ---
>  src/stalld.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/stalld.c b/src/stalld.c
> index 305618e577fc..130a7d4eaf9d 100644
> --- a/src/stalld.c
> +++ b/src/stalld.c
> @@ -386,7 +386,8 @@ void merge_taks_info(int cpu, struct task_info *old_tasks, int nr_old, struct ta
>  	int i;
>  	int j;
>  
> -	update_cpu_starving_vector(cpu, &notask);
> +	if (config_single_threaded)
> +		update_cpu_starving_vector(cpu, &notask);
>  
>  	for (i = 0; i < nr_old; i++) {
>  		old_task = &old_tasks[i];
> -- 
> 2.51.0
> 

Reviewed-by: Wander Lairson Costa <wander@redhat.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-10-21 17:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-17  2:24 [PATCH 01/12] sched_debug: Unify parsing methods for task_info Clark Williams
2025-10-17  2:24 ` [PATCH 02/12] sched_debug: Fix runqueue task parsing logic and state filtering Clark Williams
2025-10-21 15:58   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 03/12] sched_debug: Fix double-free crash in fill_waiting_task() Clark Williams
2025-10-21 16:01   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 04/12] stalld.c: remove noisy idle report and added report to should_skip_idle_cpus() Clark Williams
2025-10-21 16:03   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 05/12] stalld.c: initialize cpu_info->idle_time to be -1 Clark Williams
2025-10-21 16:15   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 06/12] stalld.c: get rid of misleading print about DL-Server Clark Williams
2025-10-21 16:16   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 07/12] stalld.c: Add starvation logging in single-threaded log-only mode Clark Williams
2025-10-21 16:27   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 08/12] stalld: Add -N/--no_idle_detect flag to disable idle detection Clark Williams
2025-10-21 16:33   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 09/12] stalld: Add defensive checks in print_boosted_info Clark Williams
2025-10-21 17:36   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 10/12] Makefile: Add support for legacy kernels Clark Williams
2025-10-17 12:50   ` Derek Barbosa
2025-10-21 17:43   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 11/12] scripts: fix run-local if bashism Clark Williams
2025-10-21 17:45   ` Wander Lairson Costa
2025-10-17  2:24 ` [PATCH 12/12] Fix segfault in adaptive/aggressive modes Clark Williams
2025-10-21 17:45   ` Wander Lairson Costa
2025-10-21 15:54 ` [PATCH 01/12] sched_debug: Unify parsing methods for task_info Wander Lairson Costa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox