From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <775a5fd2-394d-4409-81d3-4dc6ef8209f0@arm.com>
Date: Fri, 8 May 2026 11:08:23 +0200
Subject: Re: [PATCH] x86/xen: Fix lazy mmu handling across context switch
From: Kevin Brodsky
To: Jürgen Groß, linux-kernel@vger.kernel.org, x86@kernel.org,
 linux-mm@kvack.org
Cc: mmarek@invisiblethingslab.com, Boris Ostrovsky, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 xen-devel@lists.xenproject.org
References: <20260508080514.454607-1-jgross@suse.com>
 <5cb54bd1-5981-4a46-9083-f7b527ca342f@suse.com>
In-Reply-To: <5cb54bd1-5981-4a46-9083-f7b527ca342f@suse.com>
Content-Type: text/plain; charset=UTF-8

On 08/05/2026 10:33, Jürgen Groß wrote:
> Please disregard this patch. It isn't fixing the real problem.

That's what I would expect, see below.

>
> On 08.05.26 10:05, Juergen Gross wrote:
>> The recent rework of mmu lazy mode has resulted in problems when
>> running as a Xen PV guest. Enabling lazy mmu mode for the new context
>> during context switch is done from the arch_end_context_switch() hook,
>> but when calling this hook current hasn't been changed yet, so the
>> lazy mmu mode state of the wrong task is modified.

Currently xen_end_context_switch() checks if next has lazy MMU mode
enabled and if so calls arch_enter_lazy_mmu_mode(), i.e.
enter_lazy(XEN_LAZY_MMU).
This does *not* modify any task state; rather, it writes to the
xen_lazy_mode percpu variable.

I've thought about this from various angles when reworking lazy MMU, and
the conclusion I reached is that arch_{start,end}_context_switch() have
no reason to change any task state. On arm64, for instance, we do
nothing at all on context switch, since everything lazy MMU-related is
tracked in task_struct and therefore already switched. Xen is trickier
because it tracks lazy MMU/CPU state in a percpu variable, so these
hooks do need to do something about it. This is entirely Xen-internal
though, and there's no reason to be calling generic functions like
lazy_mmu_mode_pause() that modify task state.

The idea behind commit 291b3abed657 ("x86/xen: use lazy_mmu_state when
context-switching") is that TIF_LAZY_MMU_UPDATES now duplicates
lazy_mmu_state in task_struct and we can therefore replace the former
with the latter. More specifically, the assumption is that
TIF_LAZY_MMU_UPDATES is set if and only if the task has been scheduled
out and __task_lazy_mmu_mode_active(task) is true. Clearly there is
something wrong with this assumption, but I still can't put my finger
on it.

For now I would suggest reverting this commit if that solves the issue
Marek reported; the intention was not to introduce any functional
change, but only a (minor) optimisation.

- Kevin

>>
>> Additionally it is much cleaner to use lazy_mmu_mode_pause() and
>> lazy_mmu_mode_resume() in the Xen context switch hooks, as it avoids
>> conditionals in those hooks.
>>
>> In order not having to add another hook to be called after switching
>> current, modify lazy_mmu_mode_resume() to use a new sub-function which
>> takes a task pointer as parameter. This new sub-function can then be
>> used in the xen_end_context_switch() hook.
>>
>> Fixes: 291b3abed657 ("x86/xen: use lazy_mmu_state when context-switching")
>> Signed-off-by: Juergen Gross
>> ---
>>  arch/x86/xen/enlighten_pv.c |  7 ++-----
>>  include/linux/pgtable.h     | 33 ++++++++++++++++++++++++---------
>>  2 files changed, 26 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index ed2d7a3756ce..67bb6bf6d240 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -424,9 +424,7 @@ static void xen_start_context_switch(struct task_struct *prev)
>>  {
>>      BUG_ON(preemptible());
>>
>> -    if (this_cpu_read(xen_lazy_mode) == XEN_LAZY_MMU) {
>> -        arch_leave_lazy_mmu_mode();
>> -    }
>> +    lazy_mmu_mode_pause();
>>      enter_lazy(XEN_LAZY_CPU);
>>  }
>>
>> @@ -436,8 +434,7 @@ static void xen_end_context_switch(struct task_struct *next)
>>
>>      xen_mc_flush();
>>      leave_lazy(XEN_LAZY_CPU);
>> -    if (__task_lazy_mmu_mode_active(next))
>> -        arch_enter_lazy_mmu_mode();
>> +    lazy_mmu_mode_resume_task(next);
>>  }
>>
>>  static unsigned long xen_store_tr(void)
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index cdd68ed3ae1a..83a099bf2038 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -326,6 +326,28 @@ static inline void lazy_mmu_mode_pause(void)
>>          arch_leave_lazy_mmu_mode();
>>  }
>>
>> +/**
>> + * lazy_mmu_mode_resume_task() - Resume the lazy MMU mode for a specific task.
>> + *
>> + * Like lazy_mmu_mode_resume() below, but with a task specified.
>> + * Must be called only by lazy_mmu_mode_resume() or during context switch.
>> + * Must never be called in interrupt context.
>> + *
>> + * Must match a call to lazy_mmu_mode_pause().
>> + *
>> + * Has no effect if called:
>> + * - While paused (inside another pause()/resume() pair)
>> + */
>> +static inline void lazy_mmu_mode_resume_task(struct task_struct *task)
>> +{
>> +    struct lazy_mmu_state *state = &task->lazy_mmu_state;
>> +
>> +    VM_WARN_ON_ONCE(state->pause_count == 0);
>> +
>> +    if (--state->pause_count == 0 && state->enable_count > 0)
>> +        arch_enter_lazy_mmu_mode();
>> +}
>> +
>>  /**
>>   * lazy_mmu_mode_resume() - Resume the lazy MMU mode.
>>   *
>> @@ -341,15 +363,8 @@ static inline void lazy_mmu_mode_pause(void)
>>   */
>>  static inline void lazy_mmu_mode_resume(void)
>>  {
>> -    struct lazy_mmu_state *state = &current->lazy_mmu_state;
>> -
>> -    if (in_interrupt())
>> -        return;
>> -
>> -    VM_WARN_ON_ONCE(state->pause_count == 0);
>> -
>> -    if (--state->pause_count == 0 && state->enable_count > 0)
>> -        arch_enter_lazy_mmu_mode();
>> +    if (!in_interrupt())
>> +        lazy_mmu_mode_resume_task(current);
>>  }
>>  #else
>>  static inline void lazy_mmu_mode_enable(void) {}
>