Re: [RFC 2/2] page_frag_cache: Store metadata in struct page

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>,
	linux-mm@vger.kernel.org, Netdev <netdev@vger.kernel.org>,
	Matthew Wilcox <mawilcox@microsoft.com>
Subject: Re: [RFC 2/2] page_frag_cache: Store metadata in struct page
Date: Fri, 16 Mar 2018 14:05:00 -0700	[thread overview]
Message-ID: <20180316210500.GH27498@bombadil.infradead.org> (raw)
In-Reply-To: <CAKgT0Uf0qWbo7T8j00Owt_hEj6vZSY-MOsqUeZCek1x62DKz4Q@mail.gmail.com>

On Thu, Mar 15, 2018 at 02:26:28PM -0700, Alexander Duyck wrote:
> On Thu, Mar 15, 2018 at 12:53 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> >
> > Shrink page_frag_cache from 24 to 8 bytes (a single pointer to the
> > currently-in-use struct page) by using the page's refcount directly
> > (instead of maintaining a bias) and storing our current progress through
> > the page in the same bits currently used for page->index.  We no longer
> > need to reflect the page pfmemalloc state if we're storing the page
> > directly.
> >
> > On the downside, we now call page_address() on every allocation, and we
> > do an atomic_inc() rather than a non-atomic decrement, but we should
> > touch the same number of cachelines and there is far less code (and
> > the code is less complex).
> >
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> 
> So my concern with this patch is that it is essentially trading off
> CPU performance for reduced size. One of the reasons for getting away
> from using the page pointer is because it is expensive to access the
> page when the ref_count is bouncing on multiple cache lines.

I'm not really interested in trading off SMP scalability for reduced
memory usage; if we see more than a trivial penalty then I'll withdraw
this RFC.  I read & understand the commits that you made to get us to
the current codebase, but there's no history of this particular variation
in the tree.

0e39250845c0f91acc64264709b25f7f9b85c2c3
9451980a6646ed487efce04a9df28f450935683e
540eb7bf0bbedb65277d68ab89ae43cdec3fd6ba

I think that by moving the 'offset' into the struct page, we end up updating
only one cacheline instead of two, which should be a performance win.
I understand your concern about the cacheline bouncing between the freeing
and allocating CPUs.  Is cross-CPU freeing a frequent occurrence?  From
looking at its current usage, it seemed like the allocation and freeing
were usually on the same CPU.

> In
> addition converting from a page to a virtual address is actually more
> expensive then you would think it should be. I went through and
> optimized that the best I could, but there is still a pretty
> significant cost to the call.

Were you able to break down where that cost occurred?  It looks
pretty cheap on x86-32 nohighmem:

    343e:       a1 00 00 00 00          mov    0x0,%eax
                        343f: R_386_32  mem_map
    3443:       29 f3                   sub    %esi,%ebx
    3445:       89 5a 08                mov    %ebx,0x8(%edx)
    3448:       29 c2                   sub    %eax,%edx
    344a:       c1 fa 05                sar    $0x5,%edx
    344d:       c1 e2 0c                shl    $0xc,%edx
    3450:       8d 84 13 00 00 00 c0    lea    -0x40000000(%ebx,%edx,1),%eax
    3457:       8b 5d f8                mov    -0x8(%ebp),%ebx
    345a:       8b 75 fc                mov    -0x4(%ebp),%esi
    345d:       89 ec                   mov    %ebp,%esp
    345f:       5d                      pop    %ebp
    3460:       c3                      ret    

I did code up a variant which stored the virtual address of the offset
in page->index, but I threw that away after seeing how much more code
it turned into.  Fortunately I had a better idea of how to implement that
early this morning.  I haven't even tested it yet, but it looks like better
generated code:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 28d9a4c9c5fd..ec38204eb903 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4357,8 +4357,8 @@ struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 		unsigned int offset = PAGE_SIZE << compound_order(old);
 
 		if (offset > size) {
-			old->offset = offset;
-			return old;
+			page = old;
+			goto reset;
 		}
 	}
 
@@ -4379,7 +4379,10 @@ struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 	if (old)
 		put_page(old);
 	nc->page = page;
-	page->offset = PAGE_SIZE << compound_order(page);
+	page->private = (PAGE_SIZE << compound_order(page)) - 1;
+reset:
+	page->index = (unsigned long)page_to_virt(page) +
+					(PAGE_SIZE << compound_order(page));
 	return page;
 }
 
@@ -4402,20 +4405,26 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		      unsigned int size, gfp_t gfp_mask)
 {
 	struct page *page = nc->page;
-	unsigned int offset = page->offset;
+	unsigned long addr = addr;
+	unsigned int offset = 0;
 
-	if (unlikely(!page || offset < size)) {
+	if (page) {
+		addr = page->index;
+		offset = addr & page->private;
+	}
+
+	if (unlikely(offset < size)) {
 		page = __page_frag_cache_refill(nc, size, gfp_mask);
 		if (!page)
 			return NULL;
-		offset = page->offset;
+		addr = page->index;
 	}
 
 	page_ref_inc(page);
-	offset -= size;
-	page->offset = offset;
+	addr -= size;
+	page->index = addr;
 
-	return page_address(page) + offset;
+	return (void *)addr;
 }
 EXPORT_SYMBOL(page_frag_alloc);
 

Obviously if the problem turns out to be the cacheline thrashing rather
than the call to page_to_virt, then this is pointless to test.

> I won't be able to test the patches until next week, but I expect I
> will probably see a noticeable regression when performing a small
> packet routing test.

I really appreciate you being willing to try this for me.  I need to
get myself a dual-socket machine to test things like this.

next prev parent reply	other threads:[~2018-03-16 21:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-15 19:53 [RFC 0/2] Shrink page_frag_cache Matthew Wilcox
2018-03-15 19:53 ` [RFC 1/2] mm: Use page->mapping to indicate pfmemalloc Matthew Wilcox
2018-03-15 19:53 ` [RFC 2/2] page_frag_cache: Store metadata in struct page Matthew Wilcox
2018-03-15 21:03   ` Eric Dumazet
2018-03-15 21:26   ` Alexander Duyck
2018-03-16 21:05     ` Matthew Wilcox [this message]
2018-03-16 21:32       ` Eric Dumazet
2018-03-19  8:27       ` Jesper Dangaard Brouer
2018-03-17 20:54     ` David Miller
2018-03-17  2:17   ` [page_frag_cache] 47b0eaa4b5: BUG:unable_to_handle_kernel kernel test robot
2018-03-20 20:47   ` [RFC 2/2] page_frag_cache: Store metadata in struct page Alexander Duyck
2018-03-15 20:02 ` [RFC 0/2] Shrink page_frag_cache Alexander Duyck

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:28d9a4c9c5f dfblob:ec38204eb90 )
 OR (
bs:"Re: [RFC 2/2] page_frag_cache: Store metadata in struct page" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180316210500.GH27498@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@intel.com \
    --cc=linux-mm@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).