Date: Thu, 29 Jan 2026 22:20:46 +0100
Message-ID: <20260129210219.452851594@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Ihor Solodrai, Shrikanth Hegde, Peter Zijlstra, Mathieu Desnoyers, Michael Jeanson
Subject: [patch 0/4] sched/mmcid: Cure mode transition woes
X-Mailing-List: linux-kernel@vger.kernel.org

Ihor and Shrikanth reported hard lockups which can be traced back to the
recent rewrite of the MM_CID management code.

  1) The transition from task to CPU ownership lacks the intermediate
     transition mode, which can lead to CID pool exhaustion and a
     subsequent livelock. That intermediate mode was already implemented
     for the reverse operation, but was omitted for this transition as
     the original analysis missed a few possible scheduling scenarios.

  2) Weakly ordered architectures can observe inconsistent state, which
     causes them to make the wrong decision. That leads to the same
     problem as #1.

The following series addresses these issues and fixes another, albeit
harmless, inconsistent state hiccup which was found when analysing the
above issues.

With these issues addressed, the last change optimizes the bitmap
utilization in the transition modes.

The series applies on Linus' tree and passes the selftests and a thread
pool emulator which stress tests the ownership transitions.

Thanks,

	tglx
---
 include/linux/rseq_types.h |    7 -
 kernel/sched/core.c        |  170 +++++++++++++++++++++++++++++----------------
 kernel/sched/sched.h       |   45 +++++++++--
 3 files changed, 151 insertions(+), 71 deletions(-)
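
For readers unfamiliar with the class of problem in #2, the following is
a minimal userspace sketch of how a mode word and the data it guards can
be published so that a reader on a weakly ordered machine does not see a
new mode combined with stale data. It is an illustration only and not
the code from this series: the demo_* names, the three modes and the
'limit' field are all hypothetical, and it uses C11 atomics rather than
the kernel's own barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ownership modes, named for illustration only. */
enum demo_mode { DEMO_PER_TASK, DEMO_TRANSIT, DEMO_PER_CPU };

/* Hypothetical state: 'limit' is the data which 'mode' guards. */
struct demo_state {
	_Atomic unsigned int limit;
	_Atomic int mode;
};

static void demo_init(struct demo_state *s)
{
	atomic_init(&s->limit, 0);
	atomic_init(&s->mode, DEMO_PER_TASK);
}

/*
 * Writer side: update the data first, then publish the mode with
 * release semantics so the data store cannot be reordered after it.
 */
static void demo_publish(struct demo_state *s, enum demo_mode mode,
			 unsigned int limit)
{
	assert(mode >= DEMO_PER_TASK && mode <= DEMO_PER_CPU);
	atomic_store_explicit(&s->limit, limit, memory_order_relaxed);
	atomic_store_explicit(&s->mode, mode, memory_order_release);
}

/*
 * Reader side: load the mode with acquire semantics. A reader which
 * observes a published mode then also observes the data that was
 * written before that mode was published (or something newer).
 */
static void demo_snapshot(struct demo_state *s, enum demo_mode *mode,
			  unsigned int *limit)
{
	*mode = (enum demo_mode)atomic_load_explicit(&s->mode,
						     memory_order_acquire);
	*limit = atomic_load_explicit(&s->limit, memory_order_relaxed);
}
```

If both sides used relaxed ordering instead, a weakly ordered CPU could
legitimately observe the new mode together with the old limit and act on
that combination.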