From: Mel Gorman <mel@csn.ul.ie>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Nick Piggin <npiggin@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lin Ming <ming.m.lin@intel.com>,
	Zhang Yanmin <yanmin_zhang@linux.intel.com>
Subject: Re: [PATCH 04/20] Convert gfp_zone() to use a table of precalculated value
Date: Tue, 24 Feb 2009 11:36:19 +0000
Message-ID: <20090224113619.GA25151@csn.ul.ie>
In-Reply-To: <20090224103226.e9e2766f.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, Feb 24, 2009 at 10:32:26AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 23 Feb 2009 16:40:47 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Mon, Feb 23, 2009 at 10:43:20AM -0500, Christoph Lameter wrote:
> > > On Tue, 24 Feb 2009, Nick Piggin wrote:
> > > 
> > > > > Are you sure that this is a benefit? Jumps are forward and pretty short
> > > > > and the compiler is optimizing a branch away in the current code.
> > > >
> > > > Pretty easy to mispredict there, though, especially as you can tend
> > > > to get allocations interleaved between kernel and movable (or simply
> > > > if the branch predictor is cold there are a lot of branches on x86-64).
> > > >
> > > > I would be interested to know if there is a measured improvement.
> > 
> > Not in kernbench at least, but that is no surprise. It's a small
> > percentage of the overall cost. It'll appear in the noise for anything
> > other than micro-benchmarks.
> > 
> > > > It
> > > > adds an extra dcache line to the footprint, but OTOH the instructions
> > > > you quote is more than one icache line, and presumably Mel's code will
> > > > be a lot shorter.
> > > 
> > 
> > Yes, it's an index lookup of a shared read-only cache line versus a lot
> > of code with branches to mispredict. I wasn't happy with the cache line
> > consumption but it was the first obvious alternative.
> > 
> > > Maybe we can come up with a version of gfp_zone that has no branches and
> > > no lookup?
> > > 
> > 
> > Ideally, yes, but I didn't spot any obvious way of figuring it out at
> > compile time then or now. Suggestions?
> > 
> 
> 
> Assume
>   ZONE_DMA=0
>   ZONE_DMA32=1
>   ZONE_NORMAL=2
>   ZONE_HIGHMEM=3
>   ZONE_MOVABLE=4
> 
> #define __GFP_DMA       ((__force gfp_t)0x01u)
> #define __GFP_DMA32     ((__force gfp_t)0x02u)
> #define __GFP_HIGHMEM   ((__force gfp_t)0x04u)
> #define __GFP_MOVABLE   ((__force gfp_t)0x08u)
> 
> #define GFP_MAGIC (0400030102)  /* depends on config */
> 
> gfp_zone(mask) = ((GFP_MAGIC >> ((mask & 0xf) * 3)) & 0x7)
> 

Clever. I can see how this can be made to work for __GFP_DMA, __GFP_DMA32 and
__GFP_HIGHMEM. However, I don't currently see how __GFP_MOVABLE can be handled
properly and quickly. In the above scheme, __GFP_MOVABLE on its own would
return zone 4, which appears right but is not. Only
__GFP_MOVABLE|__GFP_HIGHMEM should return 4.
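
As a worked illustration of that point, the sketch below decodes the quoted
constant one octal digit (3 bits) at a time. It is a userspace approximation,
not kernel code: the FLAG_* names are stand-ins for the __GFP_* bits, the zone
numbering is the one assumed in the quoted mail, magic_gfp_zone() is a
hypothetical name, and the ULL suffix is added here so that shifts of up to
45 bits are well defined.

```c
/* Stand-ins for __GFP_DMA, __GFP_DMA32, __GFP_HIGHMEM and __GFP_MOVABLE. */
#define FLAG_DMA      0x01u
#define FLAG_DMA32    0x02u
#define FLAG_HIGHMEM  0x04u
#define FLAG_MOVABLE  0x08u

/* Zone numbering as assumed in the quoted proposal. */
enum { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/*
 * The proposed constant. Octal digit N (counting from the right, starting
 * at 0) holds the zone returned when the low four flag bits equal N.
 * ULL is an assumption of this sketch, not part of the quoted mail.
 */
#define GFP_MAGIC 0400030102ULL

/* Hypothetical helper: select the 3-bit field indexed by the flag bits. */
static inline int magic_gfp_zone(unsigned int mask)
{
	return (int)((GFP_MAGIC >> ((mask & 0xfu) * 3)) & 0x7);
}
```

With these definitions, magic_gfp_zone(FLAG_MOVABLE) reads digit 8 and yields
4 (ZONE_MOVABLE), while magic_gfp_zone(FLAG_MOVABLE | FLAG_HIGHMEM) reads
digit 12, which the constant leaves at 0.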

To make that work, you end up with something like the following:

#define GFP_DMA_ZONEMAGIC       0000000100
#define GFP_DMA32_ZONEMAGIC     0000010000
#define GFP_NORMAL_ZONEMAGIC    0000000002
#define GFP_HIGHMEM_ZONEMAGIC   0000000200
#define GFP_MOVABLE_ZONEMAGIC   040000000000ULL
#define GFP_MAGIC (GFP_DMA_ZONEMAGIC|GFP_DMA32_ZONEMAGIC|GFP_NORMAL_ZONEMAGIC|GFP_HIGHMEM_ZONEMAGIC|GFP_MOVABLE_ZONEMAGIC)

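static inline int new_gfp_zone(gfp_t flags) {
        /* __GFP_MOVABLE only selects ZONE_MOVABLE together with __GFP_HIGHMEM */
        if ((flags & __GFP_MOVABLE) && !(flags & __GFP_HIGHMEM))
                flags &= ~__GFP_MOVABLE;
        /* Each 3-bit field of GFP_MAGIC holds the zone for one flag combination */
        return (GFP_MAGIC >> ((flags & 0xf) * 3)) & 0x7;
}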

so we are back to branches and mask checks. Mind you, I also ended up with
a different GFP magic value when actually implementing this, so I may be
missing something about how your suggestion works.
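
One possible way to avoid the fixup branch, offered only as a sketch under
stated assumptions, is to give every combination of the four flag bits its own
3-bit entry, so that __GFP_MOVABLE alone and __GFP_MOVABLE|__GFP_HIGHMEM index
different fields. The FLAG_* names, ENTRY(), SKETCH_ZONE_TABLE and
table_gfp_zone() below are hypothetical userspace stand-ins; the zone numbering
follows the quoted mail, and combinations that set more than one of DMA, DMA32
and HIGHMEM are assumed invalid and left at 0.

```c
#include <stdint.h>

/* Stand-ins for __GFP_DMA, __GFP_DMA32, __GFP_HIGHMEM and __GFP_MOVABLE. */
#define FLAG_DMA      0x01u
#define FLAG_DMA32    0x02u
#define FLAG_HIGHMEM  0x04u
#define FLAG_MOVABLE  0x08u

/* Zone numbering as assumed in the quoted proposal. */
enum { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define ZONE_BITS 3

/* Place a zone number in the 3-bit field indexed by a flag combination. */
#define ENTRY(flags, zone) ((uint64_t)(zone) << ((flags) * ZONE_BITS))

/*
 * 16 fields of 3 bits need 48 bits, so the table must be a 64-bit constant.
 * FLAG_MOVABLE on its own maps to the same zone as no flag at all; only
 * FLAG_MOVABLE | FLAG_HIGHMEM maps to ZONE_MOVABLE.
 */
#define SKETCH_ZONE_TABLE ( \
	ENTRY(0,                           ZONE_NORMAL)  | \
	ENTRY(FLAG_DMA,                    ZONE_DMA)     | \
	ENTRY(FLAG_DMA32,                  ZONE_DMA32)   | \
	ENTRY(FLAG_HIGHMEM,                ZONE_HIGHMEM) | \
	ENTRY(FLAG_MOVABLE,                ZONE_NORMAL)  | \
	ENTRY(FLAG_MOVABLE | FLAG_DMA,     ZONE_DMA)     | \
	ENTRY(FLAG_MOVABLE | FLAG_DMA32,   ZONE_DMA32)   | \
	ENTRY(FLAG_MOVABLE | FLAG_HIGHMEM, ZONE_MOVABLE))

/* One mask, one multiply, one shift, one mask: no branch, no memory load. */
static inline int table_gfp_zone(unsigned int flags)
{
	return (int)((SKETCH_ZONE_TABLE >> ((flags & 0xfu) * ZONE_BITS)) & 0x7);
}
```

Whether a single 64-bit shift is actually cheaper than the branch is not
something this thread measures; the sketch only shows that the table itself
can encode the __GFP_MOVABLE rule.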

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
