From: mel@skynet.ie (Mel Gorman)
To: Andi Kleen <andi@firstfloor.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
akpm@osdl.org, torvalds@osdl.org, linux-kernel@vger.kernel.org,
linux-mm@vger.kernel.org
Subject: Re: Major mke2fs slowdown (reproducable, bisected)
Date: Wed, 14 Nov 2007 11:25:31 +0000
Message-ID: <20071114112531.GC9566@skynet.ie>
In-Reply-To: <p736406yuxm.fsf@bingen.suse.de>

On (13/11/07 17:25), Andi Kleen didst pronounce:
> Alexey Dobriyan <adobriyan@gmail.com> writes:
> >
> > +/* Return the page with the lowest PFN in the list */
> > +static struct page *min_page(struct list_head *list)
> > +{
> > + unsigned long min_pfn = -1UL;
> > + struct page *min_page = NULL, *page;;
> > +
> > + list_for_each_entry(page, list, lru) {
> > + unsigned long pfn = page_to_pfn(page);
> > + if (pfn < min_pfn) {
> > + min_pfn = pfn;
> > + min_page = page;
> > + }
> > + }
> > +
> > + return min_page;
> > +}
> > +
> > /* Remove an element from the buddy allocator from the fallback list */
> > static struct page *__rmqueue_fallback(struct zone *zone, int order,
> > int start_migratetype)
> > @@ -795,8 +812,11 @@ retry:
> > if (list_empty(&area->free_list[migratetype]))
> > continue;
> >
> > + /* Bias kernel allocations towards low pfns */
> > page = list_entry(area->free_list[migratetype].next,
> > struct page, lru);
> > + if (unlikely(start_migratetype != MIGRATE_MOVABLE))
> > + page = min_page(&area->free_list[migratetype]);
>
> Do I misread this, or does it really turn the O(1) buddy allocation into
> a "search whole free list" algorithm? Even as fallback that looks like
> a quite extreme thing to do.
>
It's extreme but not *quite* as extreme as you imply. The whole of the
free-lists is not searched, just the single list for one migrate type at a
specific order, so it's "search a portion of the free-lists". It happens only
for non-movable allocations (usually the minority) and then only in fallback
(in itself quite rare in almost all cases I've seen).
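
To make the cost difference concrete, here is a minimal userspace sketch of
the two strategies. It is an illustration only, not kernel code: struct
fake_page, pick_head() and pick_min_pfn() are hypothetical names, a plain
singly linked list stands in for one free_list[migratetype] at one order, and
the "visited" counter exists only to show how many entries each approach
touches.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page: a PFN plus a list link. */
struct fake_page {
	unsigned long pfn;
	struct fake_page *next;
};

/* O(1): take whatever entry sits at the head of the list. */
static struct fake_page *pick_head(struct fake_page *head,
				   unsigned long *visited)
{
	*visited = head ? 1 : 0;
	return head;
}

/* O(n): walk the whole list for the lowest PFN, like min_page() above. */
static struct fake_page *pick_min_pfn(struct fake_page *head,
				      unsigned long *visited)
{
	unsigned long min_pfn = -1UL;
	struct fake_page *min = NULL;
	struct fake_page *p;

	*visited = 0;
	for (p = head; p != NULL; p = p->next) {
		(*visited)++;
		if (p->pfn < min_pfn) {
			min_pfn = p->pfn;
			min = p;
		}
	}
	return min;
}
```

With n entries on that one list, pick_head() touches a single entry while
pick_min_pfn() touches all n, which is where the extra time goes when the
list is long and fallbacks are frequent.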

I did not detect the problem earlier because triggering it takes more than
creating a large number of pinned allocations; it also depends on the type
of workload preceding it. If mke2fs were long-lived, the slowdown might not
even have been noticed. When it is run more than once, the fallbacks have
all been dealt with already and the timings go back to normal.

The patch is now reverted and I don't expect to try bringing it back.
There are ways to bias the placement of the pages as the patch intended
without doing an expensive search.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab