Date: Wed, 19 Apr 2023 09:45:06 -0700
From: Jakub Kicinski
To: Christoph Hellwig
Cc: Xuan Zhuo, netdev@vger.kernel.org, Björn Töpel, Magnus Karlsson,
 Maciej Fijalkowski, Jonathan Lemon, "David S. Miller", Eric Dumazet,
 Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, bpf@vger.kernel.org,
 virtualization@lists.linux-foundation.org, "Michael S. Tsirkin",
 Guenter Roeck, Gerd Hoffmann, Jason Wang, Greg Kroah-Hartman,
 Jens Axboe, Linus Torvalds
Subject: Re: [PATCH net-next] xsk: introduce xsk_dma_ops
Message-ID: <20230419094506.2658b73f@kernel.org>
References: <1681711081.378984-2-xuanzhuo@linux.alibaba.com>
 <20230417115610.7763a87c@kernel.org>
 <20230417115753.7fb64b68@kernel.org>
 <20230417181950.5db68526@kernel.org>
 <1681784379.909136-2-xuanzhuo@linux.alibaba.com>
 <20230417195400.482cfe75@kernel.org>
 <20230417231947.3972f1a8@kernel.org>
List-ID: netdev@vger.kernel.org

On Tue, 18 Apr 2023 22:16:53 -0700 Christoph Hellwig wrote:
> On Mon, Apr 17, 2023 at 11:19:47PM -0700, Jakub Kicinski wrote:
> > Damn, that's unfortunate. Thinking aloud -- that means that if we
> > want to continue to pull memory management out of networking drivers
> > to improve it for all, cross-optimize with the rest of the stack and
> > allow various upcoming forms of zero copy -- then we need to add an
> > equivalent of dma_ops and DMA API locally in networking?
>
> Can you explain what the actual use case is?
>
> From the original patchset I suspect it is dma mapping something very
> long term and then maybe doing syncs on it as needed?
In this case yes: pinned user memory gets sliced up into MTU-sized
chunks, fed into an Rx queue of the device, and the user can see
packets without any copies.

A quite similar use case, #2, is the upcoming io_uring / "direct
placement" patches (the former from Meta, the latter from Google),
which will try to receive just the TCP data into pinned user memory.

And, as I think Olek mentioned, #3 is page_pool - which allocates 4k
pages, manages the DMA mappings, gives them to the device, and tries
to recycle them back to the device once TCP is done with them
(avoiding the unmapping and even the atomic ops on the refcount, since
in the good case the page refcount is always 1). See
page_pool_return_skb_page() for the recycling flow.

In all those cases it's more flexible (and faster) to hide the DMA
mapping from the driver. All the cases are also opt-in, so we don't
need to worry about complete oddball devices. And to answer your
question: in all cases we hope that mapping/unmapping will be
relatively rare while syncing will be frequent.

AFAIU the patch we're discussing implements custom dma_ops for case
#1, but the same thing will be needed for #2 and #3. The question to
me is whether we need a netdev-wide net_dma_ops or whether the device
model can provide us with a DMA API that'd work for SoC/PCIe/virt
devices. Rough sketches of all three points below.
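To make the map-rarely / sync-often point concrete, here's a minimal
sketch of the case #1 pattern against today's DMA API. The struct and
helper names are made up for illustration, error handling is trimmed:

#include <linux/dma-mapping.h>

/* One chunk of the pinned user memory (hypothetical struct). */
struct umem_chunk {
	struct page *page;
	dma_addr_t dma;
};

/* Setup path: map the page once, long term (rare). */
static int umem_chunk_map(struct device *dev, struct umem_chunk *c)
{
	c->dma = dma_map_page(dev, c->page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	return dma_mapping_error(dev, c->dma) ? -ENOMEM : 0;
}

/* Datapath: sync before the CPU looks at a received packet (frequent). */
static void umem_chunk_sync_rx(struct device *dev, struct umem_chunk *c,
			       unsigned int len)
{
	dma_sync_single_for_cpu(dev, c->dma, len, DMA_FROM_DEVICE);
}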
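For case #3, this is roughly what the driver-side flow looks like
today when the pool owns the DMA mappings (PP_FLAG_DMA_MAP); the
pool_size value and function names local to the driver are
illustrative:

#include <net/page_pool.h>

/* Per-Rx-queue pool; the pool maps pages and stores the dma_addr_t. */
static struct page_pool *rxq_pool_create(struct device *dev)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,	/* pool maps/unmaps */
		.pool_size	= 1024,			/* illustrative */
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}

/* Refill: pages come back pre-mapped; in the good case refcount is 1. */
static dma_addr_t rxq_refill_one(struct page_pool *pool, struct page **pg)
{
	*pg = page_pool_dev_alloc_pages(pool);
	return *pg ? page_pool_get_dma_addr(*pg) : 0;
}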
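And to illustrate the question at the end - a hypothetical shape for a
netdev-local ops table, mirroring what the xsk_dma_ops patch does for
case #1 but shared by all three users. Nothing like this exists today;
it's the question, not the answer:

#include <linux/dma-mapping.h>

/* Hypothetical: per-provider DMA ops the core would call instead of
 * the DMA API directly, so umem/io_uring/page_pool can each plug in.
 */
struct net_dma_ops {
	dma_addr_t (*map_page)(struct device *dev, struct page *page,
			       size_t offset, size_t size,
			       enum dma_data_direction dir);
	void (*unmap_page)(struct device *dev, dma_addr_t addr,
			   size_t size, enum dma_data_direction dir);
	void (*sync_for_cpu)(struct device *dev, dma_addr_t addr,
			     size_t size, enum dma_data_direction dir);
	void (*sync_for_device)(struct device *dev, dma_addr_t addr,
				size_t size, enum dma_data_direction dir);
};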