From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen, Andrew Morton, "Liam R. Howlett", linux-mm@kvack.org,
    Lorenzo Stoakes, Shakeel Butt, Suren Baghdasaryan, Vlastimil Babka
From: Dave Hansen
Date: Wed, 29 Apr 2026 11:19:59 -0700
References: <20260429181954.F50224AE@davehans-spike.ostc.intel.com>
In-Reply-To: <20260429181954.F50224AE@davehans-spike.ostc.intel.com>
Message-Id: <20260429181959.BC9DABC5@davehans-spike.ostc.intel.com>

== Background ==

There are basically two parallel ways to look up a VMA: the traditional
way, which is protected by mmap_lock, and the per-VMA lock way, which
is based on RCU and refcounts.

== Problems ==

The mmap_lock one is more straightforward to use, but it has a big
disadvantage: it cannot be mixed with page faults, since those can take
mmap_lock for read. The RCU one can be mixed with faults, but it is not
available in all configs, so all RCU users need to be able to fall back
to the traditional way.

== Solution ==

Add a variant of the RCU-based lookup that waits for writers. This is
basically the same as the existing RCU-based lookup, but it also takes
mmap_lock for read and waits for writers to finish before returning the
VMA. This has two big advantages:

 1. Callers do not need to have a fallback path for when they collide
    with writers.
 2. It can be used in contexts where page faults can happen, because it
    can take the mmap_lock for read but never *holds* it.

== Discussion ==

I am not married to the naming here at all. Naming suggestions would be
much appreciated.

This basically uses mmap_lock to wait for writers, nothing else. The
VMA is obviously stable under mmap_read_lock(), and the code _can_
likely take advantage of that and possibly even remove the goto. For
instance, it could (probably) bump the VMA refcount and exclude future
writers, which would eliminate the goto. But the approach as-is is
probably the smallest line count and arguably the simplest. It is a
good place to start a conversation if nothing else.

Signed-off-by: Dave Hansen
Cc: Suren Baghdasaryan
Cc: Andrew Morton
Cc: "Liam R. Howlett"
Cc: Lorenzo Stoakes
Cc: Vlastimil Babka
Cc: Shakeel Butt
Cc: linux-mm@kvack.org
---

 b/include/linux/mmap_lock.h |    2 ++
 b/mm/mmap_lock.c            |   43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff -puN include/linux/mmap_lock.h~lock-vma-under-rcu-wait include/linux/mmap_lock.h
--- a/include/linux/mmap_lock.h~lock-vma-under-rcu-wait	2026-04-29 11:18:50.633628887 -0700
+++ b/include/linux/mmap_lock.h	2026-04-29 11:18:50.707631737 -0700
@@ -470,6 +470,8 @@ static inline void vma_mark_detached(str

 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
                                           unsigned long address);
+struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
+                                               unsigned long address);

 /*
  * Locks next vma pointed by the iterator. Confirms the locked vma has not
diff -puN mm/mmap_lock.c~lock-vma-under-rcu-wait mm/mmap_lock.c
--- a/mm/mmap_lock.c~lock-vma-under-rcu-wait	2026-04-29 11:18:50.704631622 -0700
+++ b/mm/mmap_lock.c	2026-04-29 11:18:50.707631737 -0700
@@ -340,6 +340,49 @@ inval:
 	return NULL;
 }

+/*
+ * Find the VMA covering 'address' and lock it for reading. Waits for
+ * writers to finish if the VMA is being modified. Returns NULL if
+ * there is no VMA covering 'address'.
+ *
+ * The fast path does not take mmap lock.
+ */
+struct vm_area_struct *lock_vma_under_rcu_wait(struct mm_struct *mm,
+                                               unsigned long address)
+{
+	struct vm_area_struct *vma;
+
+retry:
+	vma = lock_vma_under_rcu(mm, address);
+	/* Fast path: return stable VMA covering 'address': */
+	if (vma)
+		return vma;
+
+	/*
+	 * Slow path: the VMA covering 'address' is being modified,
+	 * or there is no VMA covering 'address'. Rule out the
+	 * possibility that the VMA is being modified:
+	 */
+	mmap_read_lock(mm);
+	vma = vma_lookup(mm, address);
+	mmap_read_unlock(mm);
+
+	/* There was for sure no VMA covering 'address': */
+	if (!vma)
+		return NULL;
+
+	/*
+	 * VMA was likely being modified during RCU lookup. Try again.
+	 * mmap_read_lock() waited for the writer to complete and the
+	 * writer is now done.
+	 *
+	 * There is no guarantee that any single retry will succeed,
+	 * and it is possible but highly unlikely this will loop
+	 * forever.
+	 */
+	goto retry;
+}
+
 static struct vm_area_struct *lock_next_vma_under_mmap_lock(struct mm_struct *mm,
                                                             struct vma_iterator *vmi,
                                                             unsigned long from_addr)
_