Date: Wed, 17 May 2023 16:17:59 +0200
From: Lorenzo Bianconi
To: Yunsheng Lin
Cc: Lorenzo Bianconi, Maciej Fijalkowski, netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, ast@kernel.org, daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com
Subject: Re: [RFC net-next] net: veth: reduce page_pool memory footprint using half page per-buffer
References: <62654fa5-d3a2-4b81-af70-59c9e90db842@huawei.com>

> On 2023/5/17 6:52, Lorenzo Bianconi wrote:
> >> On Mon, May 15, 2023 at 01:24:20PM +0200, Lorenzo Bianconi wrote:
> >>>> On 2023/5/12 21:08, Lorenzo Bianconi wrote:
> >>>>> In order to reduce page_pool memory footprint, rely on the
> >>>>> page_pool_dev_alloc_frag routine and reduce the buffer size
> >>>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2 in order to consume one page
> >>>>
> >>>> Is there any performance improvement beside the memory saving? As it
> >>>> should reduce TLB misses, I wonder if the TLB miss reduction can even
> >>>> out the cost of the extra frag reference count handling for the
> >>>> frag support?
> >>>
> >>> reducing the requested headroom to 192 (from 256) we have a nice improvement in
> >>> the 1500B frame case, while it is mostly the same in the case of paged skbs
> >>> (e.g. MTU 8000B).
> >>
> >> Can you define 'nice improvement' ? ;)
> >> Show us numbers or improvement in %.
> >
> > I am testing this RFC patch in the scenario reported below:
> >
> > iperf tcp tx --> veth0 --> veth1 (xdp_pass) --> iperf tcp rx
> >
> > - 6.4.0-rc1 net-next:
> >   MTU 1500B: ~ 7.07 Gbps
> >   MTU 8000B: ~ 14.7 Gbps
> >
> > - 6.4.0-rc1 net-next + page_pool frag support in veth:
> >   MTU 1500B: ~ 8.57 Gbps
> >   MTU 8000B: ~ 14.5 Gbps
> >
>
> Thanks for sharing the data.
> Maybe using the new frag interface introduced in [1] will bring
> back the performance for the MTU 8000B case.
>
> 1. https://patchwork.kernel.org/project/netdevbpf/cover/20230516124801.2465-1-linyunsheng@huawei.com/
>
> I drafted a patch for veth to use the new frag interface, maybe that
> will show how veth can make use of it. Would you give it a try to see
> if there is any performance improvement for the MTU 8000B case? Thanks.
>
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -737,8 +737,8 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>  	    skb_shinfo(skb)->nr_frags ||
>  	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>  		u32 size, len, max_head_size, off;
> +		struct page_pool_frag *pp_frag;
>  		struct sk_buff *nskb;
> -		struct page *page;
>  		int i, head_off;
>
>  		/* We need a private copy of the skb and data buffers since
> @@ -752,14 +752,20 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>  		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>  			goto drop;
>
> +		size = min_t(u32, skb->len, max_head_size);
> +		size += VETH_XDP_HEADROOM;
> +		size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
>  		/* Allocate skb head */
> -		page = page_pool_dev_alloc_pages(rq->page_pool);
> -		if (!page)
> +		pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
> +		if (!pp_frag)
>  			goto drop;
>
> -		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> +		nskb = napi_build_skb(page_address(pp_frag->page) + pp_frag->offset,
> +				      pp_frag->truesize);
>  		if (!nskb) {
> -			page_pool_put_full_page(rq->page_pool, page, true);
> +			page_pool_put_full_page(rq->page_pool, pp_frag->page,
> +						true);
>  			goto drop;
>  		}
>
> @@ -782,16 +788,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>  		len = skb->len - off;
>
>  		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> -			page = page_pool_dev_alloc_pages(rq->page_pool);
> -			if (!page) {
> +			size = min_t(u32, len, PAGE_SIZE);
> +
> +			pp_frag = page_pool_dev_alloc_frag(rq->page_pool, size);
> +			if (!pp_frag) {
>  				consume_skb(nskb);
>  				goto drop;
>  			}
>
> -			size = min_t(u32, len, PAGE_SIZE);
> -			skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> -			if (skb_copy_bits(skb, off, page_address(page),
> -					  size)) {
> +			skb_add_rx_frag(nskb, i, pp_frag->page, pp_frag->offset,
> +					size, pp_frag->truesize);
> +			if (skb_copy_bits(skb, off, page_address(pp_frag->page) +
> +					  pp_frag->offset, size)) {
>  				consume_skb(nskb);
>  				goto drop;
>  			}
> @@ -1047,6 +1055,8 @@ static int veth_create_page_pool(struct veth_rq *rq)
>  		return err;
>  	}

IIUC, the code here uses a variable length for the linear part (at most
one page) while it always uses a full page (except for the last fragment)
for the paged area, correct? I have not tested it yet, but I do not think
we will get a significant improvement, since with the MTU set to 8000B my
tests show mostly the same throughput (14.5 Gbps vs 14.7 Gbps) whether we
use page_pool fragments or full pages. Am I missing something?

What Jesper and I are discussing is trying to allocate an order-3 page
from the pool and relying on page_pool fragments, similar to what
page_frag_cache does. I will look into it if there are no strong
'red flags'.

Regards,
Lorenzo

>
> +	page_pool_set_max_frag_size(rq->page_pool, PAGE_SIZE / 2);
> +
>  	return 0;
>  }
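For reference, a minimal sketch of the order-3 idea mentioned above, built
only on the frag API already in mainline as of 6.4 (PP_FLAG_PAGE_FRAG plus
page_pool_dev_alloc_frag) rather than the page_pool_frag struct from the
RFC; the helper names and the pool_size parameter are hypothetical, not
actual veth code:

#include <linux/numa.h>
#include <net/page_pool.h>

/* Back the pool with order-3 compound pages (32KB with 4KB PAGE_SIZE) and
 * carve per-buffer fragments out of them, so a single page allocation
 * serves several buffers before the page returns to the pool. */
static struct page_pool *veth_create_order3_pool(unsigned int pool_size)
{
	struct page_pool_params pp_params = {
		.order		= 3,			/* 8 contiguous pages per allocation */
		.flags		= PP_FLAG_PAGE_FRAG,	/* enable sub-page frag accounting */
		.pool_size	= pool_size,
		.nid		= NUMA_NO_NODE,
	};

	return page_pool_create(&pp_params);
}

/* Allocate one buffer as a fragment of the current order-3 page; the page
 * is recycled only once every fragment carved from it has been freed. */
static void *veth_alloc_frag(struct page_pool *pool, unsigned int size,
			     struct page **pagep, unsigned int *offset)
{
	struct page *page = page_pool_dev_alloc_frag(pool, offset, size);

	if (!page)
		return NULL;

	*pagep = page;
	return page_address(page) + *offset;
}

The trade-off is the one Yunsheng raises above: fewer page allocations
and TLB misses, versus the extra frag reference count handling on free.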