Date: Thu, 26 Feb 2026 14:40:50 -0800
In-Reply-To: <20260226190757.GA44359@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org
References:
<20260225075211.3353194-1-aik@amd.com> <20260226190757.GA44359@ziepe.ca>
Message-ID:
Subject: Re: [RFC PATCH kernel] iommufd: Allow mapping from KVM's guest_memfd
From: Sean Christopherson
To: Jason Gunthorpe
Cc: Ackerley Tng, Alexey Kardashevskiy, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Paolo Bonzini, Steve Sistare, Nicolin Chen,
	iommu@lists.linux.dev, linux-coco@lists.linux.dev, Dan Williams,
	Santosh Shukla, "Pratik R. Sampat", Fuad Tabba, Xu Yilun,
	"Aneesh Kumar K. V", michael.roth@amd.com, vannapurve@google.com
Content-Type: text/plain; charset="us-ascii"

On Thu, Feb 26, 2026, Jason Gunthorpe wrote:
> On Thu, Feb 26, 2026 at 12:19:52AM -0800, Ackerley Tng wrote:
> > Sean Christopherson writes:
> >
> > > On Wed, Feb 25, 2026, Alexey Kardashevskiy wrote:
> > >> For the new guest_memfd type, no additional reference is taken as
> > >> pinning is guaranteed by the KVM guest_memfd library.
> > >>
> > >> There is no KVM-GMEMFD->IOMMUFD direct notification mechanism as
> > >> the assumption is that:
> > >> 1) page state change events will be handled by the VMM, which is
> > >>    going to call IOMMUFD to remap pages;
> > >> 2) shrinking the GMEMFD is equivalent to VM memory unplug, and the
> > >>    VMM is going to handle it.
> > >
> > > The VMM is outside of the kernel's effective TCB.  Assuming the VMM
> > > will always do the right thing is a non-starter.
> >
> > I think looking up the guest_memfd file from the userspace address
> > (uptr) is a good start
>
> Please no, if we need complicated things like notifiers then it is
> better to start directly with the struct file interface and get
> immediately into some guestmemfd API instead of trying to get there
> from a VMA.  A VMA doesn't help in any way and just complicates things.

+1000.  Anything that _requires_ a VMA to do something with guest_memfd is
broken by design.
> > I didn't think of this before LPC, but forcing unmapping during
> > truncation (aka shrinking guest_memfd) is probably necessary for
> > overall system stability and correctness, so notifying and having
> > guest_memfd track where its pages were mapped in the IOMMU is
> > necessary.  Whether or not to unmap during conversions could be an
> > arch-specific thing, but all architectures would want the memory
> > unmapped if the memory is removed from guest_memfd ownership.
>
> Things like truncate are a bit easier to handle, you do need a
> protective notifier, but if it detects truncate while an iommufd area
> still covers the truncated region it can just revoke the whole
> area.  Userspace made a mistake and gets burned, but the kernel is
> safe.  We don't need something complicated kernel side to automatically
> handle removing just the slice of truncated guestmemfd, for example.

Yeah, as long as the behavior is well-documented from time zero, we can
probably get away with fairly draconian behavior.

> If guestmemfd is fully pinned and cannot free memory outside of
> truncate that may be good enough (though somehow I think that is not
> the case)

With in-place conversion, PUNCH_HOLE and private=>shared conversions are the
only two ways to partially "remove" memory from guest_memfd, so it may really
be that simple.