Date: Tue, 9 Jun 2026 07:58:29 -0700
From: Bobby Eshleman
To: Christian König
Cc: Donald Hunter, Jakub Kicinski, "David S. Miller", Eric Dumazet,
 Paolo Abeni, Simon Horman, Andrew Lunn, Gerd Hoffmann, Vivek Kasireddy,
 Sumit Semwal, Shuah Khan, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
 linux-kselftest@vger.kernel.org, sdf@fomichev.me, razor@blackwall.org,
 daniel@iogearbox.net, almasrymina@google.com, matttbe@kernel.org,
 skhawaja@google.com, dw@davidwei.uk, Bobby Eshleman
Subject: Re: [PATCH net-next 2/4] udmabuf: emit one sg entry per pinned folio
References: <20260603-tcpdm-large-niovs-v1-0-f37a4ac6726c@meta.com>
 <20260603-tcpdm-large-niovs-v1-2-f37a4ac6726c@meta.com>
 <0c86f5d3-b5e9-4cac-aa9d-30c5c8ecca66@amd.com>

On Mon, Jun 08, 2026 at 03:59:04PM +0200, Christian König wrote:
> On 6/8/26 15:55, Bobby Eshleman wrote:
> >
> > On Sun, Jun 7, 2026 at 11:42 PM Christian König wrote:
> >
> > On 6/5/26 20:44, Bobby Eshleman wrote:
> > > On Fri, Jun 05, 2026 at 11:30:07AM +0200, Christian König wrote:
> > >> On 6/4/26 02:42, Bobby Eshleman wrote:
> > >>> From: Bobby Eshleman
> > >>>
> > >>> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> > >>> underlying folio was larger.
> > >>>
> > >>> Instead, walk folios[] and emit one sg entry per folio. When folios
> > >>> represent large pages (as is the case for MFD_HUGETLB), each sg entry
> > >>> is a large page. Normal PAGE_SIZE sg tables are unchanged.
> > >>>
> > >>> Required by net/core/devmem to support rx-buf-size > PAGE_SIZE with
> > >>> udmabuf.
> > >>
> > >> That doesn't explain why this is required.
> > >
> > > Sure, can definitely add. Devmem currently requires dmabuf sg entries to
> > > be length and size aligned when it allocates niovs for NIC page pools.
> > > Though udmabuf is not violating any dmabuf contract by emitting
> > > PAGE_SIZE entries and the above restriction is probably more a
> > > shortcoming of devmem, by emitting a single entry per folio this patch
> > > allows udmabuf to be used by devmem for large pages.
> > >
> > >>
> > >> Please note that accessing the pages/folio of an sg-table returned by DMA-buf is illegal and strictly forbidden!
> > >>
> > >> Regards,
> > >> Christian.
> > >
> > > It seems both devmem and io_uring zcrx at least introspect through to
> > > the sg-table to build NIC page pools (not accessing the memory itself,
> > > however). Is there a better way?
> >
> > That's an absolute NO-GO! We need to stop that immediately.
> >
> > Touching the underlying struct page of a DMA-buf exported sg-table is strictly forbidden.
> >
> > We even have code to wrap the sg_table and hide the struct pages on debug builds to catch those issues, see function dma_buf_wrap_sg_table().
> >
> > My last status is that the NIC page pools are built directly from the DMA addresses exposed by the sg_table.
> >
> > Was there any change I'm not aware of?
> >
> > Regards,
> > Christian.
> >
> >
> > Oh no change, your mental model is still current.
> > They just go through each sg and use sg_dma_address() on each.
>
> Ah, thanks! That was a near heart attack :D
>
> Yeah that is perfectly correct, question is do you then still really need this udmabuf change? I mean the DMA API usually merges together contiguous DMA addresses.
>
> Regards,
> Christian.
>

Hey Christian, sorry for the delay, I just wanted to double-check what
I'm seeing...

I reverted the udmabuf patch and confirmed that devmem still ends up
with 4K pages even for a hugepage udmabuf. I see that the
dma_map_direct() path is being taken, which, if I am reading the code
correctly, results in sg_dma_len(sg) inheriting sg->length directly
(set by udmabuf's sg_set_folio(..., PAGE_SIZE) call), whereas the
iommu_dma_map_phys() path looks like it does merge when possible.

Best,
Bobby
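
To illustrate the difference described above, here is a rough userspace
model in C. It is only a sketch, not kernel code: struct model_sg,
model_fill(), model_map_direct() and model_map_merging() are hypothetical
names invented for this example, the identity mapping of physical to DMA
addresses is an assumption, and the merging function only mimics the
general idea of coalescing contiguous segments, not what the IOMMU path
actually does.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096UL
#define MODEL_HUGE_SIZE   (2UL * 1024UL * 1024UL)
#define MODEL_MAX_ENTRIES 512

/* Hypothetical stand-in for one scatterlist entry (not struct scatterlist). */
struct model_sg {
	unsigned long phys;     /* physical address chosen by the exporter */
	unsigned long length;   /* segment length chosen by the exporter */
	unsigned long dma_addr; /* filled in by the mapping step */
	unsigned long dma_len;  /* filled in by the mapping step */
};

/* Describe `total` contiguous bytes starting at `base` as seg_len-sized entries. */
static size_t model_fill(struct model_sg *sg, unsigned long base,
			 unsigned long total, unsigned long seg_len)
{
	size_t n = total / seg_len;

	assert(n <= MODEL_MAX_ENTRIES);
	for (size_t i = 0; i < n; i++) {
		sg[i].phys = base + i * seg_len;
		sg[i].length = seg_len;
		sg[i].dma_addr = 0;
		sg[i].dma_len = 0;
	}
	return n;
}

/*
 * Direct-style mapping: every entry is mapped on its own, so the DMA
 * length is simply whatever length the exporter put in the entry.
 * Identity phys -> DMA translation is assumed for simplicity.
 */
static size_t model_map_direct(struct model_sg *sg, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		sg[i].dma_addr = sg[i].phys;
		sg[i].dma_len = sg[i].length;
	}
	return n;
}

/*
 * Merging-style mapping: an entry that starts exactly where the previous
 * DMA segment ends is folded into that segment. Returns the number of
 * DMA segments left at the front of the table.
 */
static size_t model_map_merging(struct model_sg *sg, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    sg[out - 1].dma_addr + sg[out - 1].dma_len == sg[i].phys) {
			sg[out - 1].dma_len += sg[i].length;
		} else {
			sg[out].dma_addr = sg[i].phys;
			sg[out].dma_len = sg[i].length;
			out++;
		}
	}
	return out;
}
```

In this model, a 2 MiB region filled with MODEL_PAGE_SIZE entries stays
at 512 entries of 4096 bytes after model_map_direct(), while
model_map_merging() collapses it to one 2 MiB segment. Filling the same
region with a single MODEL_HUGE_SIZE entry gives one 2 MiB segment even
on the direct-style path, which is the effect the udmabuf change is
after.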