Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Dave Hansen <dave@sr71.net>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Glauber Costa <glommer@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Jiang Liu <jiang.liu@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/5] mm, page_alloc: support multiple pages allocation
Date: Thu, 11 Jul 2013 15:12:01 +0900	[thread overview]
Message-ID: <20130711061201.GA2400@lge.com> (raw)
In-Reply-To: <51DE44CC.2070700@sr71.net>

On Wed, Jul 10, 2013 at 10:38:20PM -0700, Dave Hansen wrote:
> On 07/10/2013 06:02 PM, Joonsoo Kim wrote:
> > On Wed, Jul 10, 2013 at 03:52:42PM -0700, Dave Hansen wrote:
> >> On 07/03/2013 01:34 AM, Joonsoo Kim wrote:
> >>> -		if (page)
> >>> +		do {
> >>> +			page = buffered_rmqueue(preferred_zone, zone, order,
> >>> +							gfp_mask, migratetype);
> >>> +			if (!page)
> >>> +				break;
> >>> +
> >>> +			if (!nr_pages) {
> >>> +				count++;
> >>> +				break;
> >>> +			}
> >>> +
> >>> +			pages[count++] = page;
> >>> +			if (count >= *nr_pages)
> >>> +				break;
> >>> +
> >>> +			mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
> >>> +			if (!zone_watermark_ok(zone, order, mark,
> >>> +					classzone_idx, alloc_flags))
> >>> +				break;
> >>> +		} while (1);
> >>
> >> I'm really surprised this works as well as it does.  Calling
> >> buffered_rmqueue() a bunch of times enables/disables interrupts a bunch
> >> of times, and mucks with the percpu pages lists a whole bunch.
> >> buffered_rmqueue() is really meant for _single_ pages, not to be called
> >> a bunch of times in a row.
> >>
> >> Why not just do a single rmqueue_bulk() call?
> > 
> > rmqueue_bulk() needs a zone lock. If we allocate not so many pages,
> > for example, 2 or 3 pages, it can have much more overhead rather than
> > allocationg 1 page multiple times. So, IMHO, it is better that
> > multiple pages allocation is supported on top of percpu pages list.
> 
> It makes _zero_ sense to be doing a number of buffered_rmqueue() calls
> that are approaching the size of the percpu pages batches.  If you end
> up doing that, you pay both the overhead in manipulating both the percpu
> page data _and_ the buddy lists.
> 
> You're probably right for small numbers of pages.  But, if we're talking
> about things that are more than, say, 100 pages (isn't the pcp batch
> size clamped to 128 4k pages?) you surely don't want to be doing
> buffered_rmqueue().

Yes, you are right.
Firstly, I thought that I can use this for readahead. On my machine,
readahead reads (maximum) 32 pages in advance if faulted. And batch size
of percpu pages list is close to or larger than 32 pages
on today's machine. So I didn't consider more than 32 pages before.
But to cope with a request for more pages, using rmqueue_bulk() is
a right way. How about using rmqueue_bulk() conditionally?

> 
> I'd also like to see some scalability numbers on this.  How do your
> tests look when all the CPUs on the system are hammering away?

What test do you mean?
Please elaborate on this more.

> > And I think that enables/disables interrupts a bunch of times help
> > to avoid a latency problem. If we disable interrupts until the whole works
> > is finished, interrupts can be handled too lately.
> > free_hot_cold_page_list() already do enables/disalbed interrupts a bunch of
> > times.
> 
> I don't think interrupt latency is going to be a problem on the scale
> we're talking about here.  There are much, much, much worse offenders in
> the kernel.

Hmm, rmqueue_bulk() doesn't stop until all requested pages are allocated.
If we request too many pages (1024 pages or more), interrupt latency can
be a problem.

Thanks!!

> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-07-11  6:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-03  8:34 [RFC PATCH 0/5] Support multiple pages allocation Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 1/5] mm, page_alloc: support " Joonsoo Kim
2013-07-03 15:57   ` Christoph Lameter
2013-07-04  4:29     ` Joonsoo Kim
2013-07-10 22:52   ` Dave Hansen
2013-07-11  1:02     ` Joonsoo Kim
2013-07-11  5:38       ` Dave Hansen
2013-07-11  6:12         ` Joonsoo Kim [this message]
2013-07-11 15:51           ` Dave Hansen
2013-07-16  0:26             ` Joonsoo Kim
2013-07-12 16:31           ` Dave Hansen
2013-07-16  0:37             ` Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 2/5] mm, page_alloc: introduce alloc_pages_exact_node_multiple() Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 3/5] radix-tree: introduce radix_tree_[next/prev]_present() Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 4/5] readahead: remove end range check Joonsoo Kim
2013-07-03  8:34 ` [RFC PATCH 5/5] readhead: support multiple pages allocation for readahead Joonsoo Kim
2013-07-03 15:28 ` [RFC PATCH 0/5] Support multiple pages allocation Michal Hocko
2013-07-03 15:51   ` Zhang Yanfei
2013-07-03 16:01     ` Zhang Yanfei
2013-07-04  4:24       ` Joonsoo Kim
2013-07-04 10:00         ` Michal Hocko
2013-07-10  0:31           ` Joonsoo Kim
2013-07-10  1:20             ` Zhang Yanfei
2013-07-10  9:56               ` Joonsoo Kim
2013-07-10  9:17             ` Michal Hocko
2013-07-10  9:55               ` Joonsoo Kim
2013-07-10 11:27                 ` Michal Hocko
2013-07-11  1:05                   ` Joonsoo Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130711061201.GA2400@lge.com \
    --to=iamjoonsoo.kim@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=dave@sr71.net \
    --cc=glommer@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jiang.liu@huawei.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).