From: Mina Almasry
Date: Fri, 22 Mar 2024 10:54:54 -0700
Subject: Re: [RFC PATCH net-next v6 02/15] net: page_pool: create hooks for custom page providers
To: Christoph Hellwig
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
    linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
    sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, bpf@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
    dri-devel@lists.freedesktop.org, "David S. Miller", Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Richard Henderson,
    Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
    "James E.J. Bottomley", Helge Deller, Andreas Larsson,
    Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
    Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann,
    Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
    David Ahern, Willem de Bruijn, Shuah Khan, Sumit Semwal,
    Christian König, Pavel Begunkov, David Wei, Jason Gunthorpe,
    Yunsheng Lin, Shailend Chand, Harshitha Ramamurthy, Shakeel Butt,
    Jeroen de Borst, Praveen Kaligineedi
References: <20240305020153.2787423-1-almasrymina@google.com>
    <20240305020153.2787423-3-almasrymina@google.com>

On Sun, Mar 17, 2024 at 7:03 PM Christoph Hellwig wrote:
>
> On Mon, Mar 04, 2024 at 06:01:37PM -0800, Mina Almasry wrote:
> > From: Jakub Kicinski
> >
> > The page providers which try to reuse the same pages will
> > need to hold onto the ref, even if page gets released from
> > the pool - as in releasing the page from the pp just transfers
> > the "ownership" reference from pp to the provider, and provider
> > will wait for other references
> > to be gone before feeding this
> > page back into the pool.
>
> The word hook always rings a giant warning bell for me, and looking into
> this series I am concerned indeed.
>
> The only provider provided here is the dma-buf one, and that basically
> is the only sensible one for the documented design.

Sorry, I don't mean to argue, but as David mentioned, there are some
plans in the works, and some not yet in the works, to extend this to
other memory types. David mentioned io_uring and Jakub's huge page use
cases, which may want to reuse this design. I have an additional one in
mind: extending devmem TCP to storage devices. Currently storage devices
do not support dmabuf, and my understanding is that it's very hard to
make them do so; NVMe uses pci_p2pdma instead. I wonder if it's possible
to extend devmem TCP in the future to support pci_p2pdma, and through it
NVMe devices.

Additionally, I've been thinking about a use case of limiting the amount
of memory the net stack can use. Currently the page pool is free to
allocate as much memory as it wants from the buddy allocator. This may
be undesirable in very low-memory setups such as overcommitted VMs. We
can imagine a memory provider that allows allocation only while the
page_pool is below a certain limit. We can also imagine a memory
provider that preallocates memory and only uses that pinned pool.

None of these are in the works at the moment, but they are examples of
how this can be (reasonably?) extended.

> So instead of
> adding hooks that random proprietary crap can hook into,

To be completely honest, I'm unsure how one would design hooks for
proprietary code to hook into. I think that would be done on the basis
of EXPORT_SYMBOL? We do not export these hooks, nor do we plan to at the
moment.

> why not hard
> code the dma buf provide and just use a flag?  That'll also avoid
> expensive indirect calls.
>

Thankfully the indirect calls do not seem to be an issue.
We've been able to hit 95% of line rate with devmem TCP, and I think the
remaining 5% is a bottleneck unrelated to the indirect calls. The
page_pool benchmarks show a very minor degradation in the fast path, so
small that it may just be noise in the measurement (may!):

https://lore.kernel.org/netdev/20240305020153.2787423-1-almasrymina@google.com/T/#m1c308df9665724879947a345c4b1ec3b51ff6856

This is because the code path that makes the indirect allocation calls
is the slow path. The page_pool recycles netmem aggressively, so the fast
path rarely needs to call into the provider at all.

-- 
Thanks,
Mina