Date: Tue, 16 Jun 2026 06:38:15 +0000
From: Pranjal Shrivastava
To: Samiullah Khawaja
Cc: Jason Gunthorpe, linux-pci@vger.kernel.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Bjorn Helgaas,
 Logan Gunthorpe, Alex Williamson, Kevin Tian, Ankit Agrawal,
 Matt Evans, Vivek Kasireddy, Leon Romanovsky, Shivaji Kant
Subject: Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration
References: <20260610151853.3608948-1-praan@google.com>
 <20260610162848.GO2764304@ziepe.ca>
 <20260611221447.GH1066031@ziepe.ca>

On Tue, Jun 16, 2026 at 12:42:19AM +0000, Samiullah Khawaja wrote:
> On Fri, Jun 12, 2026 at 02:50:18PM +0000, Pranjal Shrivastava wrote:
> > On Thu, Jun 11, 2026 at 07:14:47PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Jun 11, 2026 at 02:40:17PM +0000, Pranjal Shrivastava wrote:
> > > > On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> > > > > > > [snip]
> > Yea, that's going to be tricky.. I'm thinking if we can have a zap model
> > there somehow? If the device is gone / going through a reset, we can
> > handle the refcounts accordingly?
>
> IIUC zapping will only work if userspace is using these, but if you feed
> this memory into another device through NFS and the pages are pinned by
> gup (or that device) then the dmabuf move_notify/revoke logic on device
> reset will be tricky as now the pages for that device BAR are pinned.

Yes, it would be tricky. However, the zap is still needed, since
userspace is the entity creating the file I/O that leads to those pins.
The user would mmap the BAR and pass the buffer into a POSIX read() /
write(), where the filesystem (like NFS) would extract the iovs and call
GUP to pin them.

By zapping the userspace mappings first, we prevent any new read() /
write() calls from making progress and halt the creation of additional
GUP pins. (Note that if GUP doesn't see a PTE for the page, it manually
invokes the page fault handler and waits for the page fault to be
serviced, where it would then block on the vdev->memory_lock held by the
reset thread.)

I agree it will be tricky, but we just need a multi-stage sequence. To
recap the standard workflow: userspace mmaps the BAR and passes the
buffer to the filesystem via the POSIX file API. The filesystem then
pins the pages via GUP for the duration of the synchronous DMA.

My plan for RFC v2 is as follows:

a) Zap the userspace mappings first to prevent new requests.

b) Wait for in-flight DMA: just as we currently use
   dma_resv_wait_timeout to wait for HW fences, we'll first wait for
   the page refcounts to drop. An important thing to note is that
   filesystems can't pin these pages long term, i.e. FOLL_PCI_P2PDMA
   and FOLL_LONGTERM can't be requested together for a single pin, as
   mandated by gup [1]. Thus, filesystems using ZONE_DEVICE memory (via
   ITER_ALLOW_P2PDMA) simply hold the pins for the DMA duration.

c) Only once the refcounts hit zero do we proceed with move_notify and
   the hardware reset.

This is going to be the first stab; I guess it'll definitely evolve
further.
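To make the ordering in (a)-(c) concrete, here is a rough userspace
model of the sequence. It is only a sketch under stated assumptions:
struct bar_model, the model_* helpers and the MODEL_FOLL_* values are
made up for illustration and are not kernel APIs, and a fault that
would block on the memory_lock is modelled as a failed pin rather than
a sleeping one.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; these are NOT the kernel's FOLL_* bits. */
#define MODEL_FOLL_LONGTERM   0x1u
#define MODEL_FOLL_PCI_P2PDMA 0x2u

/* Hypothetical stand-in for the state of one mmap'ed BAR. */
struct bar_model {
	bool mapped;        /* userspace PTEs are present */
	bool reset_pending; /* reset thread "holds" the memory lock */
	bool revoked;       /* move_notify has been delivered */
	int pins;           /* outstanding short-term GUP pins */
};

/* Long-term pins of P2PDMA pages are rejected, mirroring the gup rule. */
static bool model_gup_flags_ok(unsigned int flags)
{
	unsigned int both = MODEL_FOLL_LONGTERM | MODEL_FOLL_PCI_P2PDMA;

	return (flags & both) != both;
}

/* Models a filesystem pinning the buffer for one synchronous DMA. */
static bool model_pin(struct bar_model *b, unsigned int flags)
{
	if (!model_gup_flags_ok(flags))
		return false;
	/* No PTE: GUP would fault, and the fault waits behind the reset. */
	if (!b->mapped && b->reset_pending)
		return false;
	b->mapped = true; /* a fault outside of reset restores the PTE */
	b->pins++;
	return true;
}

/* DMA finished, the filesystem drops its pin. */
static void model_unpin(struct bar_model *b)
{
	assert(b->pins > 0);
	b->pins--;
}

/* Stage (a): zap the userspace mappings so no new pins can appear. */
static void model_reset_begin(struct bar_model *b)
{
	b->reset_pending = true;
	b->mapped = false;
}

/*
 * Stages (b) and (c): refuse to proceed while pins remain; once they
 * have drained, revoke the importers and let the reset complete.
 */
static bool model_reset_finish(struct bar_model *b)
{
	if (b->pins > 0)
		return false;       /* (b) in-flight DMA still pending */
	b->revoked = true;          /* (c) move_notify */
	b->reset_pending = false;   /* (c) hardware reset done */
	return true;
}
```

The point of splitting the reset into a begin and a finish step is
that the caller can retry (or time out) the finish step while the
existing pins drain, without ever admitting new ones.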
I'll try to implement this in RFC v2 and attempt to address these
concerns.

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v7.1-rc6/source/mm/gup.c#L2538