From: Matthew Wilcox <willy@infradead.org>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>,
Netdev <netdev@vger.kernel.org>, linux-mm <linux-mm@kvack.org>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset
Date: Thu, 22 Mar 2018 09:41:57 -0700 [thread overview]
Message-ID: <20180322164157.GE28468@bombadil.infradead.org> (raw)
In-Reply-To: <CAKgT0UfcYLm3UZcq536cNOczVhR60qoFDHh_gcXqqyqdViuLzw@mail.gmail.com>
On Thu, Mar 22, 2018 at 09:22:31AM -0700, Alexander Duyck wrote:
> On Thu, Mar 22, 2018 at 8:31 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > By combining 'va' and 'offset' into 'addr' and using a mask instead,
> > we can save a compare-and-branch in the fast-path of the allocator.
> > This removes 4 instructions on x86 (both 32 and 64 bit).
>
> What is the point of renaming "va"? I'm seeing a lot of unneeded
> renaming in these patches that doesn't really seem needed and is just
> making things harder to review.
By renaming 'va', I made sure that I saw everywhere that 'va' was touched,
and reviewed it to be sure it was still correct with the new meaning.
The advantage of keeping that in the patch submission, rather than
renaming it back again, is that you can see everywhere that it's been
touched and verify that for yourself.
> > We can avoid storing the mask at all if we know that we're only allocating
> > a single page. This shrinks page_frag_cache from 12 to 8 bytes on 32-bit
> > CONFIG_BASE_SMALL build.
> >
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
>
> So I am not really a fan of CONFIG_BASE_SMALL in general, so
> advertising gains in size is just going back down the reducing size at
> the expense of performance train of thought.
There's no tradeoff for performance *in this patch* with
CONFIG_BASE_SMALL. Indeed, being able to assume that the cache contains a
single PAGE_SIZE page reduces the number of instructions by two on x86-64
(and is neutral on x86-32). IIRC it saves a register, so there's one fewer
'push' at the beginning of the function and one fewer 'pop' at the end.
I think the more compelling argument for conditioning the number of pages
allocated on CONFIG_BASE_SMALL is that a machine which needs to shrink
its data structures so badly isn't going to have 32k of memory available,
nor want to spend it on a networking allocation. Eric's commit which
introduced NETDEV_FRAG_PAGE_MAX_ORDER back in 2012 (69b08f62e174) didn't
mention small machines as a consideration.
> Do we know for certain that a higher order page is always aligned to
> the size of the higher order page itself? That is one thing I have
> never been certain about. I know for example there are head and tail
> pages so I was never certain if it was possible to create a higher
> order page that is not aligned to to the size of the page itself.
It's intrinsic to the buddy allocator that pages are naturally aligned
to their order. There's a lot of code in the kernel which relies on
it, including much of the mm (particularly THP). I suppose one could
construct a non-aligned compound page, but it'd be really weird, and you'd
have to split it up manually before handing it back to the page allocator.
I don't see this ever changing.
> > struct page_frag_cache {
> > - void * va;
> > + void *addr;
> > #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> > - __u16 offset;
> > - __u16 size;
> > -#else
> > - __u32 offset;
> > + unsigned int mask;
>
> So this is just an akward layout. You now have essentially:
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> #else
> unsigned int mask;
> #endif
Huh? There's a '-' in front of the '#else'. It looks like this:
struct page_frag_cache {
void *addr;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
unsigned int mask;
#endif
/* we maintain a pagecount bias, so that we dont dirty cache line
* containing page->_refcount every time we allocate a fragment.
*/
unsigned int pagecnt_bias;
};
> > @@ -4364,27 +4361,24 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
> > PAGE_FRAG_CACHE_MAX_ORDER);PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> > if (page)
> > size = PAGE_FRAG_CACHE_MAX_SIZE;
> > - pfc->size = size;
> > + pfc->mask = size - 1;
> > #endif
> > if (unlikely(!page))
> > page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> > if (!page) {
> > - pfc->va = NULL;
> > + pfc->addr = NULL;
> > return NULL;
> > }
> >
> > - pfc->va = page_address(page);
> > -
> > /* Using atomic_set() would break get_page_unless_zero() users. */
> > page_ref_add(page, size - 1);
>
> You could just use the pfc->mask here instead of size - 1 just to
> avoid having to do the subtraction more than once assuming the
> compiler doesn't optimize it.
Either way I'm assuming a compiler optimisation -- that it won't reload
from memory, or that it'll remember the subtraction. I don't much care
which, and I'll happily use the page_frag_cache_mask() if that reads better
for you.
next prev parent reply other threads:[~2018-03-22 16:41 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool Matthew Wilcox
2018-03-22 16:39 ` Alexander Duyck
2018-03-22 17:08 ` Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 2/8] page_frag_cache: Move slowpath code from page_frag_alloc Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 3/8] page_frag_cache: Rename 'nc' to 'pfc' Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 4/8] page_frag_cache: Rename fragsz to size Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 5/8] page_frag_cache: Save memory on small machines Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset Matthew Wilcox
2018-03-22 16:22 ` Alexander Duyck
2018-03-22 16:41 ` Matthew Wilcox [this message]
2018-03-22 17:31 ` Alexander Duyck
2018-03-22 17:34 ` Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 7/8] page_frag: Update documentation Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 8/8] page_frag: Account allocations Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180322164157.GE28468@bombadil.infradead.org \
--to=willy@infradead.org \
--cc=alexander.duyck@gmail.com \
--cc=brouer@redhat.com \
--cc=eric.dumazet@gmail.com \
--cc=linux-mm@kvack.org \
--cc=mawilcox@microsoft.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).