From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 7 Apr 2026 10:20:18 +0200
From: Peter Zijlstra
To: Ritesh Harjani
Cc: Andres Freund, Salvatore Dipietro, linux-kernel@vger.kernel.org,
	alisaidi@amazon.com, blakgeof@amazon.com, abuehaze@amazon.de,
	dipietro.salvatore@gmail.com, Thomas Gleixner, Valentin Schneider,
	Sebastian Andrzej Siewior, Mark Rutland
Subject: Re: [PATCH 0/1] sched: Restore PREEMPT_NONE as default
Message-ID: <20260407082018.GC3738010@noisy.programming.kicks-ass.net>
References: <20260403191942.21410-1-dipiets@amazon.it>
	<20260403213207.GF2872@noisy.programming.kicks-ass.net>
	<1pgulz0k.ritesh.list@gmail.com>
In-Reply-To: <1pgulz0k.ritesh.list@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Apr 05, 2026 at 11:38:59AM +0530, Ritesh Harjani wrote:

> However, for curiosity, I was hoping if someone more familiar with the
> scheduler area can explain why PREEMPT_LAZY v/s PREEMPT_NONE, causes
> performance regression w/o huge pages?
>
> Minor page fault handling has micro-secs latency, where as sched ticks
> is in milli-secs. Besides, both preemption models should anyway
> schedule() if TIF_NEED_RESCHED is set on return to userspace, right?
>
> So was curious to understand how is the preemption model causing
> performance regression with no hugepages in this case?

So yes, everything can schedule on return-to-user (very much including
NONE). Which is why rseq slice ext is heavily recommended for anything
attempting user space spinlocks.

The thing where the other preemption modes differ is the scheduling
while in kernel mode. So if the workload is spending significant time in
the kernel, this could cause more scheduling.

As you already mentioned, no huge pages, gives us more overhead on #PF
(and TLB miss, but that's mostly hidden in access latency rather than
immediate system time). This gives more system time, and more room to
schedule.

If we get preempted in the middle of a #PF, rather than finishing it,
this increases the #PF completion time and if userspace is trying to
access this page concurrently.... But we should see that in mmap_lock
contention/idle time :/

I'm not sure I can explain any of this.
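
To put rough numbers on the "preempted in the middle of a #PF" argument
above, here is a back-of-envelope sketch. It is a toy model and not
something taken from the thread: the function names, the 2 us fault
cost, the 1 ms off-CPU delay and the 0.1% preemption probability are all
made-up assumptions, and treating PREEMPT_NONE as "never preempted
in-kernel" is a simplification. It only illustrates how a rare
millisecond-scale delay can dominate a microsecond-scale fault.

```python
def fault_completion_us(service_us, preempt_prob, preempt_delay_us):
    """Expected wall-clock time to finish one page fault, in microseconds.

    Toy model (an assumption, not kernel behaviour): with probability
    preempt_prob the handler is preempted once while in kernel mode and
    stays off-CPU for preempt_delay_us before it can finish.
    """
    return service_us + preempt_prob * preempt_delay_us


def slowdown(service_us, preempt_prob, preempt_delay_us):
    """Expected completion time relative to the unpreempted service time."""
    expected = fault_completion_us(service_us, preempt_prob, preempt_delay_us)
    return expected / service_us


# Hypothetical inputs, chosen only to show the scale mismatch.
SERVICE_US = 2.0          # assumed cost of one minor fault
DELAY_US = 1000.0         # assumed time off-CPU when preempted mid-fault

none_like = fault_completion_us(SERVICE_US, 0.0, DELAY_US)    # no in-kernel preemption
lazy_like = fault_completion_us(SERVICE_US, 0.001, DELAY_US)  # preempted in 0.1% of faults

print("no in-kernel preemption:", none_like, "us")
print("rare in-kernel preemption:", lazy_like, "us")
print("slowdown:", slowdown(SERVICE_US, 0.001, DELAY_US))
```

Any thread touching the same page concurrently would wait for roughly
that completion time too, which is why the effect should then be visible
as mmap_lock contention or idle time.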