From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Jul 2023 15:06:29 -0300
From: Jason Gunthorpe
To: Mina Almasry
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org,
 linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
 linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
 Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
 Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan
Subject: Re: [RFC PATCH 00/10] Device Memory TCP
References: <20230710223304.1174642-1-almasrymina@google.com>
 <12393cd2-4b09-4956-fff0-93ef3929ee37@kernel.org>
X-Mailing-List: linux-arch@vger.kernel.org

On Tue, Jul 18, 2023 at 10:36:52AM -0700, Mina Almasry wrote:

> That is specific to this proposal, and will likely be very different
> in future ones. I thought the dma-buf pages approach was extensible
> and the uapi belonged somewhere in dma-buf. Clearly not. The next
> proposal, I think, will program the rxq via some net uapi and will
> take the dma-buf as input. Probably some netlink api (not sure if
> ethtool family or otherwise). I'm working out details of this
> non-paged networking first.

In practice you want the application to start up, get itself some 3/5
tuples and then request the kernel to set up the flow steering and
provision the NIC queues.

This is the right moment for the application to provide the backing
for the rx queue memory via a DMABUF handle.

Ideally this would all be accessible to unprivileged applications as
well, so I think you'd want some kind of system call that sets all
this up and takes in a FD for the 3/5-tuple socket (to prove ownership
over the steering) and the DMABUF FD.

The queues and steering should exist only as long as the application
is still running (whatever that means). Otherwise you have a big mess
to clean up whenever anything crashes.

netlink feels like a weird API choice for that; in particular, it
would be really wrong to somehow bind the lifecycle of a netlink
object to a process.

Further, if you are going to all the trouble of doing this, it seems
to me you should make it work with any kind of memory, including CPU
memory.
Get a consistent approach to zero-copy TCP RX. So also allow a memfd
or similar to be passed in as the backing storage.

Jason