Date: Tue, 23 Jun 2026 20:55:32 +0000
From: Pranjal Shrivastava
To: David Laight
Cc: David Hu, Sumit Semwal, Christian König, Jason Gunthorpe,
    Nicolin Chen, Leon Romanovsky, Kevin Tian, Ankit Agrawal,
    Alex Williamson, linux-media@vger.kernel.org,
    dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
    linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
    jmoroni@google.com, kpberry@google.com, chriscli@google.com,
    sashiko-bot@kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] dma-buf: Split sgl into page-aligned 2G chunks
References: <20260621222130.1667453-1-xuehaohu@google.com>
    <20260623015459.1153884-1-xuehaohu@google.com>
    <20260623094446.4a8fc2ed@pumpkin>
In-Reply-To: <20260623094446.4a8fc2ed@pumpkin>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 23, 2026 at 09:44:46AM +0100, David Laight wrote:

Hi David,

> On Tue, 23 Jun 2026 01:54:59 +0000
> David Hu wrote:
>
> > Currently, `fill_sg_entry()` splits the scatterlist using `UINT_MAX`.
> > This creates a non-page-aligned DMA length (`0xFFFFFFFF`) for the
> > first entry, resulting in non-page-aligned DMA addresses for all
> > subsequent entries.
>
> There is a separate issue of whether this code is even needed at all.
> Where can transfers over 2G (never mind 4G) actually come from.
>
> The read, write and similar system calls limit transfers to INT_MAX
> (even on 64bit) and a lot of driver code will need fixing it longer
> lengths are allowed though.
> io_uring better enforce the same limits.
> So the transfers can come directly from userspace.
>
> Not only that but you also need a single physically contiguous buffer.
> Good luck allocating that!
>
> Now maybe there are some peer-to-peer places where the large buffer
> is device memory, but they will be unusual and probably need
> special treatment anyway.
>

I agree that traditional VFS read/write paths are bound by MAX_RW_COUNT
(~2GB), and that io_uring has its own limits, but I'm a little confused
by the push to enforce these limits here in the SGL code. File I/O seems
to be only one side of the picture.

In my view, this fix is necessary and has a concrete use case. For
example, the RDMA subsystem can import dmabufs [1], which gives rise to
dmabuf use cases beyond standard file ops (via VFS/io_uring).

In these scenarios, GPU HBM can be exported as dmabufs. With recent
GPUs, HBM capacity can be on the order of hundreds of GBs [2]. RDMA can
use infrastructure like the vfio-dmabuf-exporter [3] or similar dmabuf
exporters to frequently move huge blocks of data via P2PDMA.

If we restrict incoming dmabuf transfers to fit within VFS-centric
limits (2GB), we impose unnecessary overhead on the RDMA stack, forcing
it to manage a significantly higher number of memory registrations. By
cleanly splitting these massive contiguous device buffers into
page-aligned SGL entries, we directly improve the efficiency of P2P
transfers and memory registration.

Since this change doesn't seem to have a negative impact on standard
file I/O or break existing VFS constraints, I'm curious why we shouldn't
support splitting these >4GB P2P transfers. Am I missing something?
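
To make the alignment point from the commit message concrete, here is a
minimal userspace sketch. It is not the patch and not kernel code: it
assumes 4 KiB pages, and `split_region()`, `struct split_stats` and the
`SKETCH_*` constants are hypothetical names made up for illustration.
It only walks a contiguous region in pieces of at most `max_seg` bytes
and counts how many pieces start at a non-page-aligned address.

```c
#include <limits.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 2 GiB chunk size. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_CHUNK_2G  (1ULL << 31)

/* Hypothetical result type: what a given split size produces. */
struct split_stats {
	uint64_t entries;          /* number of pieces produced */
	uint64_t unaligned_starts; /* pieces whose start is not page-aligned */
};

/*
 * Walk the contiguous region [base, base + len) in pieces of at most
 * max_seg bytes and record how many pieces begin at an address that is
 * not a multiple of SKETCH_PAGE_SIZE.
 */
static struct split_stats split_region(uint64_t base, uint64_t len,
				       uint64_t max_seg)
{
	struct split_stats st = { 0, 0 };
	uint64_t addr = base;

	while (len) {
		uint64_t seg = len < max_seg ? len : max_seg;

		if (addr % SKETCH_PAGE_SIZE)
			st.unaligned_starts++;
		st.entries++;
		addr += seg;
		len -= seg;
	}
	return st;
}
```

Under these assumptions, splitting a page-aligned 16 GiB region with
`max_seg = UINT_MAX` yields 5 pieces, 4 of which start at
non-page-aligned addresses, while `max_seg = SKETCH_CHUNK_2G` yields 8
pieces that all start page-aligned.
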

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v7.1.1/source/drivers/infiniband/core/umem_dmabuf.c#L174
[2] https://nvdam.widen.net/s/fdvdqvfvj2/hopper-h200-nvl-product-brief (Table 2-2)
[3] https://elixir.bootlin.com/linux/v7.1.1/source/drivers/vfio/pci/vfio_pci_dmabuf.c#L297