From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from fhigh-a4-smtp.messagingengine.com (fhigh-a4-smtp.messagingengine.com [103.168.172.155]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B2B335DA67; Thu, 19 Mar 2026 14:30:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.168.172.155 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773930645; cv=none; b=bLekP9pZy3cUKP0Oq71qX+lKoKgRzfBwc+fDOkADQpOaY6vBj7uGtyCVBazE4ElD76dwzwurUR0DvMvpwG8Z+GfljKl5e45+8VDB2f9m6vUXgBq1+nqU4hbRC7fq5k6klWo0tkLs1U7oitOM+Y97dLd5pf/ujoaCkN4sfHMRGMU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773930645; c=relaxed/simple; bh=Tc1rDkH3+UiS6QBT1k2JXzUFq1OvHy6pwiSd1/PUFxg=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=koPa0vzgitRGfjBXNAfNO5c+D9eS0G2oZJYEkXQ95zIk5gF+hasnn0U+9Wc/kANdP/ri0iB/N/PyEqVENXh68ueF+mjMduoJxWoBFgGlDxZOeB2MS69RW0aR5M9WgJGB/2tgtmIfKbK9ZEmMJdHwXXcqwLDTNB23STqJKF1fd5w= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=shazbot.org; spf=pass smtp.mailfrom=shazbot.org; dkim=pass (2048-bit key) header.d=shazbot.org header.i=@shazbot.org header.b=JCbdqHdE; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=yyJkujny; arc=none smtp.client-ip=103.168.172.155 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=shazbot.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=shazbot.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=shazbot.org header.i=@shazbot.org header.b="JCbdqHdE"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="yyJkujny" Received: from phl-compute-05.internal (phl-compute-05.internal [10.202.2.45]) by mailfhigh.phl.internal (Postfix) with ESMTP id 4A8151400204; Thu, 19 Mar 2026 10:30:41 -0400 (EDT) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-05.internal (MEProxy); Thu, 19 Mar 2026 10:30:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shazbot.org; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1773930641; x=1774017041; bh=d8H4MdzzItKEf3ZYCkEG3VbvnFnz+wY00IZL+Df1u5Q=; b= JCbdqHdEm5582iLKjH4zLt7GL3O6rxvYSTEUXBLo6xbMLHF5NJitkG/UiaedfD4k 9xOWSf9twNxfNFrLy6tMKizYfnGiJLHbtiadarQifbEZPMPU7uMtbTkd2HsJHwOS m0l/WqEuWaa6DaMMVz5kJnfZVELz1JEucjUbvtatTXBX8YN3xHl4XBhSPJ8SX1w0 7DyAoeHqTCmmgDq4Kymw4A+BCqQbRe7pljJORShbLcVyKm5ImEe4igIL+MqD0izS 6ZPFxZM3UUUWuohOb2EcuxyYbKuZCLuVzhllrdzHva9QnnpQ06U0RhftF2Q7eKyF x51STR3QyfDSpy9fI4ZCLQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1773930641; x= 1774017041; bh=d8H4MdzzItKEf3ZYCkEG3VbvnFnz+wY00IZL+Df1u5Q=; b=y yJkujnyo+GYnZY5NyPW0k/KebDotyHYON4PnY7dsIs+8F2enWMC1cWh/bcnvdcBf K3iv3qraxQXUr0E6BvSpznWE/hVxHe+KlgCyeqWC+GExpdwaDcQrMrhdZSrjQ3MG sQzKpHuwDPB/q4np5JNH9KxZNeyVTUqsKc1jmGYsR+fUY1VkNDl+FHzr6coJSemJ pU3oe0uAB62wX0EsLFGe86SAvu0m1RFGLiqrsgPHOOKuowplOK2vCG2ZB2y0XIqo mNntX0vLdgibLoqzx6DjRTT4GX9uQU4N+I7A/OO2j+tApeKFZOX7kZXNyxJudbkw svUsQA0zKYX6Gbr3Q2IBQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdeftdejvdeiucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkjghfofggtgfgsehtqheftdertdejnecuhfhrohhmpeetlhgvgicu hghilhhlihgrmhhsohhnuceorghlvgigsehshhgriigsohhtrdhorhhgqeenucggtffrrg htthgvrhhnpeeihfduieffvefhffehffejjeetveetgeeugeevtddvieehveduteetfeei tefgffenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grlhgvgiesshhhrgiisghothdrohhrghdpnhgspghrtghpthhtohepjedpmhhouggvpehs mhhtphhouhhtpdhrtghpthhtohepuggrvhhiugeskhgvrhhnvghlrdhorhhgpdhrtghpth htohepmhgsohhonhgvsegrkhgrmhgrihdrtghomhdprhgtphhtthhopeguvghvnhhulhhl odhmsghoohhnvgdrrghkrghmrghirdgtohhmsehkvghrnhgvlhdrohhrghdprhgtphhtth hopehkvhhmsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheplhhinhhugidq khgvrhhnvghlsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtohepshhtrggslh gvsehvghgvrhdrkhgvrhhnvghlrdhorhhgpdhrtghpthhtoheprghlvgigsehshhgriigs ohhtrdhorhhg X-ME-Proxy: Feedback-ID: i03f14258:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 19 Mar 2026 10:30:40 -0400 (EDT) Date: Thu, 19 Mar 2026 08:30:39 -0600 From: Alex Williamson To: "David Hildenbrand (Arm)" Cc: "Boone, Max" , Max Boone via B4 Relay , "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "stable@vger.kernel.org" , alex@shazbot.org Subject: Re: [PATCH] vfio/type1: Retry follow_pfnmap_start() when PFNMAP is zapped Message-ID: <20260319083039.03865989@shazbot.org> In-Reply-To: <45e50068-751c-4e8c-a6b0-62cf8d1e58e6@kernel.org> References: <20260317-retry-pin-on-reclaimed-pud-v1-1-1f0d0a23f78d@akamai.com> <20260318152249.43eb81f6@shazbot.org> <3C8F924E-CA2D-4368-83DF-3CCCD4BA49FF@akamai.com> <45e50068-751c-4e8c-a6b0-62cf8d1e58e6@kernel.org> X-Mailer: Claws Mail 4.3.1 (GTK 3.24.51; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Thu, 19 Mar 2026 14:18:49 +0100 "David Hildenbrand (Arm)" wrote: > On 3/19/26 09:36, Boone, Max wrote: > >=20 > > =20 > >> On Mar 18, 2026, at 10:22=E2=80=AFPM, Alex Williamson wrote: > >> > >> [=E2=80=A6] > >> =20 > >>> + /* > >>> + * follow_pfnmap_start() returns -EINVAL for > >>> + * invalid parameters and non-present entries. > >>> + * If that happens here after a successful > >>> + * fixup_user_fault(), it is likely that the > >>> + * pfnmap has been zapped. Retry instead of > >>> + * failing. > >>> + */ =20 > >> > >> It's a little stronger than that, right? We're betting that the only > >> remaining non-zero return is due to a race and we can introduce what > >> appears to be potential for an infinite loop here because -EAGAIN will > >> get kicked out to redo the vma_lookup() and fixup_user_fault() should > >> return a genuine error if we're completely in the weeds. Should we > >> make this a little stronger and more specific? Thanks, =20 > >=20 > > I=E2=80=99d say that the best case would be to have follow_pfnmap_start= () return > > -EINVAL or -ENOENT w.r.t. which of the two return values it is. But then > > again, we could theoretically run into an infinite loop I guess - as th= e zap > > and faulting could run in lockstep (the race window is extremely small > > though). =20 >=20 > Well, in theory :) To hit that race repeatedly, you'd really have to be > quite lucky I guess. >=20 > But the real question is: if user space triggered the pinning, and user > space keeps hurting itself to make progress, is that a real problem? I'd say no, that's not a problem. It's really just that masking all non-zero returns as -EAGAIN makes some assumptions about the other error conditions that could change over time, so minimally those dependencies should be clearly stated. Even better would be if we didn't need to make those assumptions. Thanks, Alex > I guess the crucial part would be to >=20 > a) Have some cond_resched(() in there? > b) Checking for fatal signals somewhere? > c) Possibly drop locks (mmap lock?) every now and then? >=20 > For GUP, a) and b) are in place in __get_user_pages(). >=20 > c) might be done, but I think it's less deterministic. >=20 > >=20 > > We could make the retry above bounded, and bubble up a -EBUSY such > > that users of the ioctl can decide to retry instead of fail? =20 >=20 > Would that be a possible ABI break? You'd really have to only do that in > a case where user space does stupid things, I guess. >=20 > >=20 > > David, you mentioned that gup already has retry logic that we don=E2=80= =99t have > > with follow_fault_pfn() -> follow_pfnmap_start(). Would we potentially = run > > into an infinite loop with this change? =20 >=20 > GUP triggers page faults through faultin_page(). If handle_mm_fault() > returns >=20 > * VM_FAULT_COMPLETED we return -EAGAIN > * VM_FAULT_ERROR we return the error > * VM_FAULT_RETRY we return -EBUSY > * Otherwise 0 >=20 > In the caller __get_user_pages(), we > * Retry immediately with ret =3D=3D 0 > * Return to the GUP caller (letting it retry) with -EBUSY/-EAGAIN >=20 > Having at least a) and b) sounds reasonable. Not sure about having c), > might be tricky if we are not allowed to drop the lock. >=20