linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>,
	davem@davemloft.net, kuba@kernel.org,  pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH net-next v2 09/15] mm: page_frag: reuse MSB of 'size' field for pfmemalloc
Date: Wed, 17 Apr 2024 08:11:23 -0700	[thread overview]
Message-ID: <17066b6a4f941eea3ef567767450b311096da22b.camel@gmail.com> (raw)
In-Reply-To: <8b7361c2-6f45-72e8-5aca-92e8a41a7e5e@huawei.com>

On Wed, 2024-04-17 at 21:19 +0800, Yunsheng Lin wrote:
> On 2024/4/17 0:22, Alexander H Duyck wrote:
> > On Mon, 2024-04-15 at 21:19 +0800, Yunsheng Lin wrote:
> > > The '(PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)' case is for the
> > > system with page size less than 32KB, which is 0x8000 bytes
> > > requiring 16 bits space, change 'size' to 'size_mask' to avoid
> > > using the MSB, and change 'pfmemalloc' field to reuse the that
> > > MSB, so that we remove the orginal space needed by 'pfmemalloc'.
> > > 
> > > For another case, the MSB of 'offset' is reused for 'pfmemalloc'.
> > > 
> > > Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> > > ---
> > >  include/linux/page_frag_cache.h | 13 ++++++++-----
> > >  mm/page_frag_cache.c            |  5 +++--
> > >  2 files changed, 11 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> > > index fe5faa80b6c3..40a7d6da9ef0 100644
> > > --- a/include/linux/page_frag_cache.h
> > > +++ b/include/linux/page_frag_cache.h
> > > @@ -12,15 +12,16 @@ struct page_frag_cache {
> > >  	void *va;
> > >  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> > >  	__u16 offset;
> > > -	__u16 size;
> > > +	__u16 size_mask:15;
> > > +	__u16 pfmemalloc:1;
> > >  #else
> > > -	__u32 offset;
> > > +	__u32 offset:31;
> > > +	__u32 pfmemalloc:1;
> > >  #endif
> > 
> > This seems like a really bad idea. Using a bit-field like this seems
> > like a waste as it means that all the accesses now have to add
> > additional operations to access either offset or size. It wasn't as if
> > this is an oversized struct, or one that we are allocating a ton of. As
> > such I am not sure why we need to optmize for size like this.
> 
> For the old 'struct page_frag' use case, there is one 'struct page_frag'
> for every socket and task_struct, there may be tens of thousands of
> them even in a 32 bit sysmem, which might mean a lof of memory even for
> a single byte saving, see patch 13.
> 

Yeah, I finally had time to finish getting through the patch set last
night. Sorry for quick firing reviews but lately I haven't had much
time to work on upstream work, and as you mentioned last time I only
got to 3 patches so I was trying to speed through.

I get that you are trying to reduce the size but in the next patch you
actually end up overshooting that on x86_64 systems. I am assuming that
is to try to account for the 32b use case? On 64b I am pretty sure you
don't get any gain since a long followed by two u16s and an int will
still be 16B. What we probably need to watch out for is the
optimization for size versus having to add instructions to extract and
insert the data back into the struct.

Anyway as far as this layout I am not sure it is the best way to go.
You are combining pfmemalloc with either size *OR* offset, and then
combining the pagecnt_bias with the va. I'm wondering if it wouldn't
make more sense to look at putting together the structure something
like:

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
typedef u16 page_frag_bias_t;
#else
typedef u32 page_frag_bias_t;
#endif

struct page_frag_cache {
	/* page address and offset */
	void *va;
	page_frag_bias_t pagecnt_bias;
	u8 pfmemalloc;
	u8 page_frag_order;
}

The basic idea would be that we would be able to replace the size mask
with just a shift value representing the page order of the page being
fragmented. With that we can reduce the size to just a single byte. In
addition we could probably leave it there regardless of build as the
order should be initialized to 0 when this is allocated to it would be
correct even in the case where it isn't used (and there isn't much we
can do about the hole anyway).

In addition by combining the virtual address with the offset we can
just use the combined result for what we need. The only item that has
to be worked out is how to deal with the end of a page in the count up
case. However the combination seems the most logical one since they are
meant to be combined ultimately anyway. It does put limits on when we
can align things as we don't want to align ourselves into the next
page, but I think it makes more sense then the other limits that had to
be put on allocations and such in order to allow us to squeeze
pagecnt_bias into the virtual address.

Anyway I pulled in your patches and plan to do a bit of testing, after
I figure out what the nvme disk ID regression is I am seeing. My main
concern can be summed up as the NIC driver use case
(netdev/napi_alloc_frag callers) versus the socket/vhost use case. The
main thing in the case of the NIC driver callers is that we have a need
for isolation and guarantees that we won't lose cache line alignment. I
think those are the callers you are missing in your benchmarks, but
arguably that might be something you cannot test as I don't know what
NICs you have access to and if you have any that are using those calls.


  reply	other threads:[~2024-04-17 15:11 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240415131941.51153-1-linyunsheng@huawei.com>
2024-04-15 13:19 ` [PATCH net-next v2 01/15] mm: page_frag: add a test module for page_frag Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 03/15] mm: page_frag: use free_unref_page() to free page fragment Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 04/15] mm: move the page fragment allocator from page_alloc into its own file Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 05/15] mm: page_frag: use initial zero offset for page_frag_alloc_align() Yunsheng Lin
2024-04-15 23:55   ` Alexander H Duyck
2024-04-16 13:11     ` Yunsheng Lin
2024-04-16 15:51       ` Alexander H Duyck
2024-04-17 13:17         ` Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 06/15] mm: page_frag: change page_frag_alloc_* API to accept align param Yunsheng Lin
2024-04-16 16:08   ` Alexander Duyck
2024-04-17 13:18     ` Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 07/15] mm: page_frag: add '_va' suffix to page_frag API Yunsheng Lin
2024-04-16 16:12   ` Alexander H Duyck
2024-04-17 13:18     ` Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 08/15] mm: page_frag: add two inline helper for " Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 09/15] mm: page_frag: reuse MSB of 'size' field for pfmemalloc Yunsheng Lin
2024-04-16 16:22   ` Alexander H Duyck
2024-04-17 13:19     ` Yunsheng Lin
2024-04-17 15:11       ` Alexander H Duyck [this message]
2024-04-18  9:39         ` Yunsheng Lin
2024-04-26  9:38           ` Yunsheng Lin
2024-04-29 14:49             ` Alexander Duyck
2024-04-30 12:05               ` Yunsheng Lin
2024-04-30 14:54                 ` Alexander Duyck
2024-05-06 12:33                   ` Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 10/15] mm: page_frag: reuse existing bit field of 'va' for pagecnt_bias Yunsheng Lin
2024-04-16 16:33   ` Alexander H Duyck
2024-04-17 13:23     ` Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 12/15] mm: page_frag: introduce prepare/commit API for page_frag Yunsheng Lin
2024-04-15 13:19 ` [PATCH net-next v2 14/15] mm: page_frag: update documentation " Yunsheng Lin
2024-04-16  6:13   ` Bagas Sanjaya
2024-04-16 13:11     ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=17066b6a4f941eea3ef567767450b311096da22b.camel@gmail.com \
    --to=alexander.duyck@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linyunsheng@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).