From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Mon, 19 Aug 2024 08:52:58 -0700
Subject: Re: [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org
In-Reply-To: <5905bad4-8a98-4f9d-9bd6-b9764e299ac7@huawei.com>
References: <20240808123714.462740-1-linyunsheng@huawei.com> <20240808123714.462740-12-linyunsheng@huawei.com> <7f06fa30-fa7c-4cf2-bd8e-52ea1c78f8aa@huawei.com> <5905bad4-8a98-4f9d-9bd6-b9764e299ac7@huawei.com>
On Fri, Aug 16, 2024 at 5:01 AM Yunsheng Lin wrote:
>
> On 2024/8/15 23:25, Alexander Duyck wrote:
> > On Wed, Aug 14, 2024 at 8:05 PM Yunsheng Lin wrote:
> >>
> >> On 2024/8/15 5:00, Alexander H Duyck wrote:
> >
> > ...
> >
> >>>> +static inline void page_frag_alloc_abort(struct page_frag_cache *nc,
> >>>> +					  unsigned int fragsz)
> >>>> +{
> >>>> +	nc->pagecnt_bias++;
> >>>> +	nc->remaining += fragsz;
> >>>> +}
> >>>> +
> >>>
> >>> This doesn't add up. Why would you need abort if you have commit? Isn't
> >>> this more of a revert? I wouldn't think that would be valid as it is
> >>> possible you took some sort of action that might have resulted in this
> >>> memory already being shared. We shouldn't allow rewinding the offset
> >>> pointer without knowing that there are no other entities sharing the
> >>> page.
> >>
> >> This is used for __tun_build_skb() in drivers/net/tun.c as below, mainly
> >> used to avoid the performance penalty for the XDP drop case:
> >
> > Yeah, I reviewed that patch. As I said there, rewinding the offset
> > should be avoided unless you can verify you are the only owner of the
> > page, as you have no guarantees that somebody else didn't take a
> > reference to the page/data to send it off somewhere else. Once you
> > expose the page to any other entity it should be written off, or
> > committed in your case, and you should move on to the next block.
>
> Yes, the expectation is that nobody else takes a reference to the
> page/data to send it off somewhere else between page_frag_alloc_va()
> and page_frag_alloc_abort(). Did you see that expectation broken in that
> patch? If yes, we should fix that by using the page_frag_free_va()
> related API instead of page_frag_alloc_abort().

The problem is that when you expose it to XDP there are a number of
different paths it can take. As such you shouldn't expect XDP to not do
something like that. Basically you have to check the reference count
before you can rewind the page.
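To make the ownership argument concrete, here is a minimal userspace sketch of the refcount-guarded rewind Alexander is asking for. The `mock_*` structs and helpers are stand-ins invented for illustration, not the kernel's code: the kernel charges allocations against a pre-biased page refcount (`pagecnt_bias`), while this model takes an explicit reference per fragment so the sole-ownership invariant is easy to see.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_page {
	int refcount;		/* stand-in for the page's reference count */
};

struct mock_frag_cache {
	struct mock_page *page;
	unsigned int remaining;	/* bytes still free in the current page */
};

/* Hand out fragsz bytes; the caller now holds a reference on the page. */
static bool mock_frag_alloc(struct mock_frag_cache *nc, unsigned int fragsz)
{
	if (nc->remaining < fragsz)
		return false;
	nc->page->refcount++;
	nc->remaining -= fragsz;
	return true;
}

/* Rewind the most recent allocation, but only while the caller is the
 * sole outstanding user: refcount must be exactly the cache's own
 * reference plus the caller's fragment. If anyone else (e.g. an XDP
 * path) took its own reference, refuse to rewind. */
static bool mock_frag_abort(struct mock_frag_cache *nc, unsigned int fragsz)
{
	if (nc->page->refcount != 2)
		return false;
	nc->page->refcount--;
	nc->remaining += fragsz;
	return true;
}
```

The key difference from the quoted `page_frag_alloc_abort()` is that the rewind can fail, forcing the caller to handle the shared-page case instead of silently handing out overlapping fragments.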
> >
> >
> >>
> >>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc,
> >>>> +					     gfp_t gfp_mask)
> >>>>  {
> >>>> +	struct page *page;
> >>>> +
> >>>>  	if (likely(nc->encoded_va)) {
> >>>> -		if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias))
> >>>> +		page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias);
> >>>> +		if (page)
> >>>>  			goto out;
> >>>>  	}
> >>>>
> >>>> -	if (unlikely(!__page_frag_cache_refill(nc, gfp_mask)))
> >>>> -		return false;
> >>>> +	page = __page_frag_cache_refill(nc, gfp_mask);
> >>>> +	if (unlikely(!page))
> >>>> +		return NULL;
> >>>>
> >>>>  out:
> >>>>  	/* reset page count bias and remaining to start of new frag */
> >>>>  	nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >>>>  	nc->remaining = page_frag_cache_page_size(nc->encoded_va);
> >>>> -	return true;
> >>>> +	return page;
> >>>> +}
> >>>> +
> >>>
> >>> None of the functions above need to be returning page.
> >>
> >> Are you still suggesting to always use virt_to_page() even when it is
> >> not really necessary? Why not return the page here to avoid the
> >> virt_to_page()?
> >
> > Yes. The likelihood of you needing to pass this out as a page should
> > be low, as most cases will just be using the virtual address
> > anyway. You are essentially trading off branching for not having to
> > use virt_to_page. It is unnecessary optimization.
>
> As I understand it, I am not trading off branching for not having to
> use virt_to_page; the branching is already needed whether we utilize
> it to avoid calling virt_to_page() or not. Please be more specific about
> which branching is traded off for not having to use virt_to_page() here.

The virt_to_page overhead isn't that high. It would be better to just
use a consistent path rather than try to optimize for an unlikely
branch in your datapath.
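For readers unfamiliar with why the virt_to_page() cost is considered low: it is essentially address arithmetic plus a memmap lookup. A hedged userspace sketch of that cost profile, using an invented `mock_virt_to_page()` that recovers the page base from any address inside the page by masking (the real kernel helper returns a `struct page *`, not an address, but the branch-free arithmetic character is similar):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_PAGE_SIZE 4096u

/* Userspace stand-in for virt_to_page(): map an address anywhere inside
 * a page back to the page's base by masking off the low bits. No extra
 * branches are added to the caller's path. */
static uintptr_t mock_virt_to_page(const void *va)
{
	return (uintptr_t)va & ~(uintptr_t)(MOCK_PAGE_SIZE - 1);
}
```

This is the "consistent path" argument in miniature: callers that hold a virtual address can always derive the page cheaply, so the allocator does not need a separate page-returning variant.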
> >
> >
> >>
> >>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
> >>>> +				unsigned int *offset, unsigned int fragsz,
> >>>> +				gfp_t gfp)
> >>>> +{
> >>>> +	unsigned int remaining = nc->remaining;
> >>>> +	struct page *page;
> >>>> +
> >>>> +	VM_BUG_ON(!fragsz);
> >>>> +	if (likely(remaining >= fragsz)) {
> >>>> +		unsigned long encoded_va = nc->encoded_va;
> >>>> +
> >>>> +		*offset = page_frag_cache_page_size(encoded_va) -
> >>>> +				remaining;
> >>>> +
> >>>> +		return virt_to_page((void *)encoded_va);
> >>>> +	}
> >>>> +
> >>>> +	if (unlikely(fragsz > PAGE_SIZE))
> >>>> +		return NULL;
> >>>> +
> >>>> +	page = __page_frag_cache_reload(nc, gfp);
> >>>> +	if (unlikely(!page))
> >>>> +		return NULL;
> >>>> +
> >>>> +	*offset = 0;
> >>>> +	nc->remaining = remaining - fragsz;
> >>>> +	nc->pagecnt_bias--;
> >>>> +
> >>>> +	return page;
> >>>>  }
> >>>> +EXPORT_SYMBOL(page_frag_alloc_pg);
> >>>
> >>> Again, this isn't returning a page. It is essentially returning a
> >>> bio_vec without calling it as such. You might as well pass the bio_vec
> >>> pointer as an argument and just have it populate it directly.
> >>
> >> I really don't think your bio_vec suggestion makes much sense for now,
> >> for the reason mentioned below:
> >>
> >> "Through a quick look, there seem to be at least three structs which have
> >> similar values: struct bio_vec & struct skb_frag & struct page_frag.
> >>
> >> As per your above argument about using bio_vec, it seems it is ok to use any
> >> one of them, as each one of them seems to have almost all the values we
> >> are using?
> >>
> >> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag'
> >> > 'struct bio_vec', as the naming of 'struct page_frag' seems to best match
> >> the page_frag API; 'struct skb_frag' is the second preference because we
> >> mostly need to fill skb frags anyway, and 'struct bio_vec' is the last
> >> preference because it just happens to have almost all the values needed."
> >
> > That is why I said I would be okay with us passing page_frag in patch
> > 12 after looking closer at the code. The fact is it should make the
> > review of that patch set much easier if you essentially just pass the
> > page_frag back out of the call. Then it could be used in exactly the
> > same way it was before and should reduce the total number of lines of
> > code that need to be changed.
>
> So your suggestion changed to something like below?
>
> int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag);
>
> The API naming of 'page_frag_alloc_pfrag' seems a little odd to me; any
> better one in your mind?

Well, at this point we are populating/getting/pulling a page frag from
the page frag cache. Maybe look for a word other than alloc, such as
populate. Essentially what you are doing is populating the pfrag from
the frag cache, although I thought there was a size value you passed
for that, isn't there?
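A userspace sketch of the single-call shape being discussed, under the assumption that the final API also takes a fragment size (the "size value" Alexander mentions). The `mock_*` names are hypothetical: a `void *` stands in for `struct page *`, `mock_page_frag` mirrors the kernel's `struct page_frag` fields (page, offset, size), and no refill path is modeled. It also shows the offset arithmetic from the quoted `page_frag_alloc_pg()`: fragments grow from the start of the page, so the new fragment's offset is "page size minus bytes remaining".

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u

/* Mirrors the shape of the kernel's struct page_frag. */
struct mock_page_frag {
	void *page;
	unsigned int offset;
	unsigned int size;
};

struct mock_frag_cache {
	void *page;		/* backing page of the cache */
	unsigned int remaining;	/* bytes still free in that page */
};

/* Hypothetical "populate" form of the API: fill a page_frag-style struct
 * in one call instead of returning a page and writing the offset through
 * a pointer. Returns 0 on success, -1 if fragsz does not fit. */
static int mock_frag_populate(struct mock_frag_cache *nc,
			      struct mock_page_frag *pfrag,
			      unsigned int fragsz)
{
	if (nc->remaining < fragsz)
		return -1;
	pfrag->page = nc->page;
	pfrag->offset = MOCK_PAGE_SIZE - nc->remaining;
	pfrag->size = fragsz;
	nc->remaining -= fragsz;
	return 0;
}
```

Handing the caller a fully populated frag struct also addresses the earlier review point: the function no longer pretends to return "a page" when it is really returning a page-plus-offset-plus-size triple.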