From: Kevin Brodsky
Date: Fri, 8 May 2026 11:54:17 +0200
Subject: Re: kernel BUG around vmap/vfree - xen_enter_lazy_mmu()/xen_leave_lazy_mmu() - Linux 7.0-rc1
To: Juergen Gross, Marek Marczykowski-Górecki
Cc: Andrew Cooper, xen-devel, Boris Ostrovsky
Message-ID: <7f123733-2ec2-436e-bb0c-67b3e9f80735@arm.com>
In-Reply-To: <15645d19-f19d-4955-8315-0188aa834eb6@suse.com>

On 08/05/2026 10:53, Juergen Gross wrote:
> [...]
>
> But now I think I have found the real culprit in lazy_mmu_mode_enable():
>
> static inline void lazy_mmu_mode_enable(void)
> {
>         struct lazy_mmu_state *state = &current->lazy_mmu_state;
>
>         if (in_interrupt() || state->pause_count > 0)
>                 return;
>
>         VM_WARN_ON_ONCE(state->enable_count == U8_MAX);
>
>         if (state->enable_count++ == 0)
>                 arch_enter_lazy_mmu_mode();
> }
>
> Consider a preemption just before calling arch_enter_lazy_mmu_mode(). The
> enable_count will be 1 now, but there was no switch to lazy mode yet.
>
> When the task becomes active again, context switch handling will see lazy
> mode enabled (enable_count > 0), so it will call
> arch_enter_lazy_mmu_mode(). And then the task resumes and is calling
> arch_enter_lazy_mmu_mode() another time.

Agreed, this must be the problem. I did wonder whether the lack of
atomicity between the increment and the call to arch_enter_lazy_mmu_mode()
would cause trouble...

arm64 isn't impacted because it tracks the related state in task_struct
only. powerpc and sparc do use per-CPU variables, but that shouldn't
matter as they disable preemption for the entire lazy MMU section.

>
> The only chance I'm seeing to avoid that would be to disable preemption
> around all instances of testing a condition and then enabling or
> disabling lazy mmu mode.

I don't immediately see why we would need such a big hammer. If we revert
commit 291b3abed657 ("x86/xen: use lazy_mmu_state when
context-switching"), then arch_{start,end}_context_switch() should once
again do the right thing for Xen, since the TIF_LAZY_MMU_UPDATES flag is
separate from lazy_mmu_state. I think the sequence would then look like
this:

lazy_mmu_mode_enable()
    state->enable_count++
            arch_start_context_switch()
             xen_lazy_mode == XEN_LAZY_NONE -> do nothing

            arch_end_context_switch()
             TIF_LAZY_MMU_UPDATES not set -> do nothing
    enter_lazy(XEN_LAZY_MMU)

Nothing else should be checking lazy MMU state during the context switch.
Does that make sense?

- Kevin