Message-ID: <8e80507c685be94256e0e457ee97622c4487716c.camel@gmail.com>
Subject: Re: [PATCH net-next v9 03/13] mm: page_frag: use initial zero offset for page_frag_alloc_align()
From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Date: Mon, 01 Jul 2024 16:27:48 -0700
In-Reply-To: <20240625135216.47007-4-linyunsheng@huawei.com>
References: <20240625135216.47007-1-linyunsheng@huawei.com>
 <20240625135216.47007-4-linyunsheng@huawei.com>

On Tue, 2024-06-25 at 21:52 +0800, Yunsheng Lin wrote:
> We are above to use page_frag_alloc_*() API to not just

"about to use", not "above to use"

> allocate memory for skb->data, but also use them to do
> the memory allocation for skb frag too. Currently the
> implementation of page_frag in mm subsystem is running
> the offset as a countdown rather than count-up value,
> there may have several advantages to that as mentioned
> in [1], but it may have some disadvantages, for example,
> it may disable skb frag coaleasing and more correct cache
> prefetching
>
> We have a trade-off to make in order to have a unified
> implementation and API for page_frag, so use a initial zero
> offset in this patch, and the following patch will try to
> make some optimization to aovid the disadvantages as much
> as possible.
>
> As offsets is added due to alignment requirement before
> actually checking if the cache is enough, which might make
> it exploitable if caller passes a align value bigger than
> 32K mistakenly. As we are allowing order 3 page allocation
> to fail easily under low memory condition, align value bigger
> than PAGE_SIZE is not really allowed, so add a 'align >
> PAGE_SIZE' checking in page_frag_alloc_va_align() to catch
> that.
>
> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/page_frag_cache.h |  2 +-
>  include/linux/skbuff.h          |  4 ++--
>  mm/page_frag_cache.c            | 26 +++++++++++---------------
>  3 files changed, 14 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 3a44bfc99750..b9411f0db25a 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -32,7 +32,7 @@ static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
>                                             unsigned int fragsz, gfp_t gfp_mask,
>                                             unsigned int align)
>  {
> -        WARN_ON_ONCE(!is_power_of_2(align));
> +        WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
>          return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
>  }
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index eb8ae8292c48..d1fea23ec386 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3320,7 +3320,7 @@ static inline void *netdev_alloc_frag(unsigned int fragsz)
>  static inline void *netdev_alloc_frag_align(unsigned int fragsz,
>                                              unsigned int align)
>  {
> -        WARN_ON_ONCE(!is_power_of_2(align));
> +        WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
>          return __netdev_alloc_frag_align(fragsz, -align);
>  }
>
> @@ -3391,7 +3391,7 @@ static inline void *napi_alloc_frag(unsigned int fragsz)
>  static inline void *napi_alloc_frag_align(unsigned int fragsz,
>                                            unsigned int align)
>  {
> -        WARN_ON_ONCE(!is_power_of_2(align));
> +        WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
>          return __napi_alloc_frag_align(fragsz, -align);
>  }
>
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index 88f567ef0e29..da244851b8a4 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -72,10 +72,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                  if (!page)
>                          return NULL;
>
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -                /* if size can vary use size else just use PAGE_SIZE */
> -                size = nc->size;
> -#endif
>                  /* Even if we own the page, we do not use atomic_set().
>                   * This would break get_page_unless_zero() users.
>                   */
> @@ -84,11 +80,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>                  /* reset page count bias and offset to start of new frag */
>                  nc->pfmemalloc = page_is_pfmemalloc(page);
>                  nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -                nc->offset = size;
> +                nc->offset = 0;
>          }
>
> -        offset = nc->offset - fragsz;
> -        if (unlikely(offset < 0)) {
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +        /* if size can vary use size else just use PAGE_SIZE */
> +        size = nc->size;
> +#endif
> +
> +        offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
> +        if (unlikely(offset + fragsz > size)) {

The fragsz check below could be moved to here.
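To illustrate, a rough sketch of that reordering, assuming everything
else in this branch stays as in this patch (the elided recycle path is
unchanged):

        offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
        if (unlikely(offset + fragsz > size)) {
                /* A request larger than a single page is never
                 * guaranteed to succeed since the order-3 refill is
                 * allowed to fail, so don't bother recycling or
                 * refilling the cache for it.
                 */
                if (unlikely(fragsz > PAGE_SIZE))
                        return NULL;

                page = virt_to_page(nc->va);
                if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
                        goto refill;

                /* ... rest of the recycle path as in this patch ... */
        }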
>                  page = virt_to_page(nc->va);
>
>                  if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>                          goto refill;
> @@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -                /* if size can vary use size else just use PAGE_SIZE */
> -                size = nc->size;
> -#endif
>                  /* OK, page count is 0, we can safely set it */
>                  set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>
>                  /* reset page count bias and offset to start of new frag */
>                  nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -                offset = size - fragsz;
> -                if (unlikely(offset < 0)) {
> +                offset = 0;
> +                if (unlikely(fragsz > PAGE_SIZE)) {

Since we aren't taking advantage of the flag that is left after the
subtraction, we might just want to look at moving this piece up to just
after the offset + fragsz check. That should prevent us from trying to
refill if we have a request that is larger than a single page. In
addition we could probably just drop the 3 PAGE_SIZE checks above as
they would be redundant; see the sketch at the end of this mail.

>                          /*
>                           * The caller is trying to allocate a fragment
>                           * with fragsz > PAGE_SIZE but the cache isn't big
> @@ -124,8 +121,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>          }
>
>          nc->pagecnt_bias--;
> -        offset &= align_mask;
> -        nc->offset = offset;
> +        nc->offset = offset + fragsz;
>
>          return nc->va + offset;
>  }
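For reference, a sketch of what dropping those checks would look like
for page_frag_alloc_align(); netdev_alloc_frag_align() and
napi_alloc_frag_align() would revert the same way. This is just the
pre-patch form of the helper, shown under the assumption that the core
allocator fails fragsz > PAGE_SIZE requests up front as suggested above:

static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
                                          unsigned int fragsz, gfp_t gfp_mask,
                                          unsigned int align)
{
        /* If __page_frag_alloc_align() itself rejects fragsz > PAGE_SIZE
         * before refilling, the align > PAGE_SIZE test added by this
         * patch arguably becomes redundant and can be dropped again.
         */
        WARN_ON_ONCE(!is_power_of_2(align));
        return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
}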