public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	 linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
	sched-ext@lists.linux.dev,  aubrey.li@linux.intel.com,
	yu.c.chen@intel.com
Subject: Re: [linus:master] [sched/core]  704069649b: kernel-selftests.kvm.hardware_disable_test.fail
Date: Fri, 13 Feb 2026 14:14:14 -0800
Message-ID: <aY-iNnrhEj2XuRcH@google.com>
In-Reply-To: <20260213144417.GL3016024@noisy.programming.kicks-ass.net>

On Fri, Feb 13, 2026, Peter Zijlstra wrote:
> On Thu, Feb 12, 2026 at 10:08:04PM +0800, kernel test robot wrote:
> > Hello,
> > 
> > we found the kernel-selftests.kvm.hardware_disable_test failed consistently upon
> > this commit but pass on parent. unfortunately, we didn't find many useful
> > information in dmesg. this report is just FYI what we observed in our tests.
> > 
> > kernel test robot noticed "kernel-selftests.kvm.hardware_disable_test.fail" on:
> 
> With the caveat of PEBKAC (it is Friday after all); I can't reproduce.
> 
> That is, ./hardware_disable_test as build from cee73b1e840c, doesn't
> work for me on 704069649b5b^1 either.
> 
> Sean; is there a magic trick to operating that test, or is it a known
> trouble spot?

Hmm, shouldn't require any magic, and hasn't been known to be flaky.

This very decisively points at 704069649b5b ("sched/core: Rework
sched_class::wakeup_preempt() and rq_modified_*()") on my end as well.  With
that commit reverted, the below runs in ~40ms total.  With 704069649b5b present,
the test constantly stalls for multiple seconds at sem_timedwait().
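
To put numbers on the stall instead of eyeballing it, a small timing helper
can be wrapped around the wait in the parent.  This is only a sketch:
elapsed_ms() is a hypothetical helper and is not part of the selftest or of
the reproducer.

```c
#include <time.h>

/*
 * Hypothetical helper (not in the selftest): milliseconds elapsed
 * between two timestamps taken from the same clock.
 */
static double elapsed_ms(const struct timespec *start,
			 const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) * 1000.0 +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}
```

Usage would be along the lines of taking clock_gettime(CLOCK_MONOTONIC, ...)
before and after the wait and printing elapsed_ms() for each iteration.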

AFAICT, the key is to have the busy_loop() pthread affined to the same CPU as
its parent.  The KVM pieces of the selftest have nothing to do with the failure.

Here's a minimal reproducer that you can build without selftests goo :-)
E.g. `gcc -pthread -o busy busy.c` should work.

// SPDX-License-Identifier: GPL-2.0-only
#define _GNU_SOURCE

#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

sem_t *sem;

static void *busy_loop(void *arg)
{
	for (;;)
		;

	return NULL;
}

static void run_test(uint32_t run)
{
	pthread_t thread;
	cpu_set_t cpuset;

	CPU_ZERO(&cpuset);
	CPU_SET(sched_getcpu(), &cpuset);

	printf("%s: [%d] spawn busy thread\n", __func__, run);
	if (pthread_create(&thread, NULL, busy_loop, (void *)NULL))
		exit(-1);

	if (pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset))
		exit(-1);

	printf("%s: [%d] thread launched\n", __func__, run);
	sem_post(sem);

	pthread_join(thread, NULL);
	printf("child pthread exited prematurely\n");
	exit(-1);
}

void wait_for_child_setup(pid_t pid)
{
	/*
	 * Wait for the child to post to the semaphore, but wake up periodically
	 * to check if the child exited prematurely.
	 */
	for (;;) {
		const struct timespec wait_period = { .tv_sec = 1 };
		int status;

		if (!sem_timedwait(sem, &wait_period))
			return;

		/* Child is still running, keep waiting. */
		if (pid != waitpid(pid, &status, WNOHANG))
			continue;

		/*
		 * Child is no longer running, which is not expected.
		 *
		 * If it exited with a non-zero status, we explicitly forward
		 * the child's status in case it exited with KSFT_SKIP.
		 */
		if (WIFEXITED(status))
			exit(WEXITSTATUS(status));

		printf("Child exited unexpectedly\n");
		exit(-1);
	}
}

int main(int argc, char **argv)
{
	uint32_t i;
	int s, r;
	pid_t pid;

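	sem = sem_open("vm_sem", O_CREAT | O_EXCL, 0644, 0);
	if (sem == SEM_FAILED) {
		/* E.g. a stale "vm_sem" left behind by a previous run. */
		perror("sem_open");
		exit(-1);
	}
	sem_unlink("vm_sem");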

	for (i = 0; i < 512; ++i) {
		pid = fork();
		if (pid < 0)
			exit(-1);
		if (pid == 0)
			run_test(i); /* This function always exits */

		printf("%s: [%d] waiting semaphore\n", __func__, i);
		wait_for_child_setup(pid);

		printf("%s: [%d] do waitpid\n", __func__, i);
		r = waitpid(pid, &s, WNOHANG);
		if (r == pid) {
			printf("%s: [%d] child exited unexpectedly status: [%d]\n",
			       __func__, i, s);
			exit(-1);
		}
		printf("%s: [%d] killing child\n", __func__, i);
		kill(pid, SIGKILL);
	}

	/* Named semaphores from sem_open() are released with sem_close(). */
	sem_close(sem);
	exit(0);
}
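
One detail to keep in mind when reading wait_for_child_setup(): POSIX defines
the sem_timedwait() timeout as an absolute CLOCK_REALTIME time, so
{ .tv_sec = 1 } is a deadline that has long since passed, and the call should
fail with ETIMEDOUT right away whenever the semaphore can't be taken
immediately.  The loop is carried over as-is from the selftest and is
deliberately left alone here.  If a genuine one-second wait is wanted, and
assuming that doesn't hide the problem (untested), the deadline could be
built roughly like this.  deadline_after() is a hypothetical helper, not
something the selftest provides.

```c
#include <time.h>

/*
 * Hypothetical helper (not in the selftest): turn a relative timeout
 * into the absolute deadline that sem_timedwait() expects.  Assumes
 * 0 <= nsec < 1000000000 and a normalized 'now'.
 */
static struct timespec deadline_after(struct timespec now,
				      time_t sec, long nsec)
{
	struct timespec deadline = now;

	deadline.tv_sec += sec;
	deadline.tv_nsec += nsec;

	/* Keep tv_nsec in [0, 1e9), as sem_timedwait() requires. */
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += deadline.tv_nsec / 1000000000L;
		deadline.tv_nsec %= 1000000000L;
	}

	return deadline;
}
```

The caller would read the current time with clock_gettime(CLOCK_REALTIME,
&now) and pass deadline_after(now, 1, 0) to sem_timedwait() instead of
wait_period.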


Thread overview: 8+ messages
2026-02-12 14:08 [linus:master] [sched/core] 704069649b: kernel-selftests.kvm.hardware_disable_test.fail kernel test robot
2026-02-13 10:40 ` Peter Zijlstra
2026-02-13 22:06   ` Sean Christopherson
2026-02-13 14:44 ` Peter Zijlstra
2026-02-13 22:14   ` Sean Christopherson [this message]
2026-02-18  9:40     ` Peter Zijlstra
2026-02-18 16:33       ` Peter Zijlstra
2026-02-23 10:25         ` [tip: sched/urgent] sched/core: Fix wakeup_preempt's next_class tracking tip-bot2 for Peter Zijlstra
