Re: [PATCH 20/20] Get rid of the concept of hot/cold page freeing

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, penberg@cs.helsinki.fi, riel@redhat.com,
	kosaki.motohiro@jp.fujitsu.com, cl@linux-foundation.org,
	hannes@cmpxchg.org, npiggin@suse.de,
	linux-kernel@vger.kernel.org, ming.m.lin@intel.com,
	yanmin_zhang@linux.intel.com
Subject: Re: [PATCH 20/20] Get rid of the concept of hot/cold page freeing
Date: Tue, 24 Feb 2009 11:51:26 +0000	[thread overview]
Message-ID: <20090224115126.GB25151@csn.ul.ie> (raw)
In-Reply-To: <20090223155313.abd41881.akpm@linux-foundation.org>

On Mon, Feb 23, 2009 at 03:53:13PM -0800, Andrew Morton wrote:
> On Mon, 23 Feb 2009 23:30:30 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Mon, Feb 23, 2009 at 01:37:23AM -0800, Andrew Morton wrote:
> > > On Sun, 22 Feb 2009 23:17:29 +0000 Mel Gorman <mel@csn.ul.ie> wrote:
> > > 
> > > > Currently an effort is made to determine if a page is hot or cold when
> > > > it is being freed so that cache hot pages can be allocated to callers if
> > > > possible. However, the reasoning used whether to mark something hot or
> > > > cold is a bit spurious. A profile run of kernbench showed that "cold"
> > > > pages were never freed so it either doesn't happen generally or is so
> > > > rare, it's barely measurable.
> > > > 
> > > > It's dubious as to whether pages are being correctly marked hot and cold
> > > > anyway. Things like page cache and pages being truncated are are considered
> > > > "hot" but there is no guarantee that these pages have been recently used
> > > > and are cache hot. Pages being reclaimed from the LRU are considered
> > > > cold which is logical because they cannot have been referenced recently
> > > > but if the system is reclaiming pages, then we have entered allocator
> > > > slowpaths and are not going to notice any potential performance boost
> > > > because a "hot" page was freed.
> > > > 
> > > > This patch just deletes the concept of freeing hot or cold pages and
> > > > just frees them all as hot.
> > > > 
> > > 
> > > Well yes.  We waffled for months over whether to merge that code originally.
> > > 
> > > What tipped the balance was a dopey microbenchmark which I wrote which
> > > sat in a loop extending (via write()) and then truncating the same file
> > > by 32 kbytes (or thereabouts).  Its performance was increased by a lot
> > > (2x or more, iirc) and no actual regressions were demonstrable, so we
> > > merged it.
> > > 
> > > Could you check that please?  I'd suggest trying various values of 32k,
> > > too.
> > > 
> > 
> > I dug around the archives but hadn't much luck finding the original
> > discussion. I saw some results from around the 2.5.40-mm timeframe that talked
> > about ~60% difference with this benchmark (http://lkml.org/lkml/2002/10/6/174)
> > but didn't find the source. The more solid benchmark reports was
> > https://lwn.net/Articles/14761/ where you talked about 1-2% kernel compile
> > improvements, good SpecWEB and a big hike on performance with SDET.
> > 
> > It's not clearcut. I tried reproducing your original benchmark rather than
> > whinging about not finding yours :) . The source is below so maybe you can
> > tell me if it's equivalent? I only ran it on one CPU which also may be a
> > factor. The results were
> > 
> >     size      with   without difference
> >       64  0.216033  0.558803 -158.67%
> >      128  0.158551  0.150673   4.97%
> >      256  0.153240  0.153488  -0.16%
> >      512  0.156502  0.158769  -1.45%
> >     1024  0.162146  0.163302  -0.71%
> >     2048  0.167001  0.169573  -1.54%
> >     4096  0.175376  0.178882  -2.00%
> >     8192  0.237618  0.243385  -2.43%
> >    16384  0.735053  0.351040  52.24%
> >    32768  0.524731  0.583863 -11.27%
> >    65536  1.149310  1.227855  -6.83%
> >   131072  2.160248  2.084981   3.48%
> >   262144  3.858264  4.046389  -4.88%
> >   524288  8.228358  8.259957  -0.38%
> >  1048576 16.228190 16.288308  -0.37%
> > 
> > with    == Using hot/cold information to place pages at the front or end of
> >         the freelist
> > without == Consider all pages being freed as hot
> 
> My head is spinning.  Smaller is better, right? 

Right. It's measured in time so smaller == faster == better.

> So for 16384-byte
> writes, current mainline is slower?
> 
> That's odd.
> 

Indeed.

> > The results are a bit all over the place but mostly negative but nowhere near
> > 60% of a difference so the benchmark might be wrong. Oddly, 64 shows massive
> > regressions but 16384 shows massive improvements. With profiling enabled, it's
> > 
> >       64  0.214873  0.196666   8.47%
> >      128  0.166807  0.162612   2.51%
> >      256  0.170776  0.161861   5.22%
> >      512  0.175772  0.164903   6.18%
> >     1024  0.178835  0.168695   5.67%
> >     2048  0.183769  0.174317   5.14%
> >     4096  0.191877  0.183343   4.45%
> >     8192  0.262511  0.254148   3.19%
> >    16384  0.388201  0.371461   4.31%
> >    32768  0.655402  0.611528   6.69%
> >    65536  1.325445  1.193961   9.92%
> >   131072  2.218135  2.209091   0.41%
> >   262144  4.117233  4.116681   0.01%
> >   524288  8.514915  8.590700  -0.89%
> >  1048576 16.657330 16.708367  -0.31%
> > 
> > Almost the opposite with steady improvements almost all the way through.
> > 
> > With the patch applied, we are still using hot/cold information on the
> > allocation side so I'm somewhat surprised the patch even makes much of a
> > difference. I'd have expected the pages being freed to be mostly hot.
> 
> Oh yeah.  Back in the ancient days, hot-cold-pages was using separate
> magazines for hot and cold pages.  Then Christoph went and mucked with
> it, using a single queue.  That might have affected things.
> 

It might have. The impact is that requests for cold pages can get hot pages
if there are not enough cold pages in the queue so readahead could prevent
an active process getting cache hot pages. I don't think that would have
showed up in the microbenchmark though.

> It would be interesting to go back to a suitably-early kernel to see if
> we broke it sometime after the early quantitative testing.  But I could
> understand you not being so terribly interested ;)
> 

I'm interested, but I now want to put it off for a second or third pass at
giving the page allocator "go faster stripes". It's pure chicken, but the other
patches are lower-hanging fruit but hefty enough to deal with on their own.

> > Kernbench was no help figuring this out either.
> > 
> > with:    Elapsed: 74.1625s User: 253.85s System: 27.1s CPU: 378.5%
> > without: Elapsed: 74.0525s User: 252.9s System: 27.3675s CPU: 378.25%
> > 
> > Improvements on elapsed and user time but a regression on system time.
> > 
> > The issue is sufficiently cloudy that I'm just going to drop the patch
> > for now. Hopefully the rest of the patchset is more clear-cut. I'll pick
> > it up again at a later time.
> 
> Well...  if the benefits of the existing code are dubious then we
> should default to deleting it.
> 

Some things haven't changed since 2002 - I can't convince myself it's
either good or bad at the moment so am leaning towards leaving it alone
for now.

> > Here is the microbenchmark I used
> > 
> > Thanks.
> > 
> > /*
> >  * write-truncate.c
> >  * Microbenchmark that tests the speed of write/truncate of small files.
> >  * 
> >  * Suggested by Andrew Morton
> >  * Written by Mel Gorman 2009
> >  */
> > #include <stdio.h>
> > #include <limits.h>
> > #include <unistd.h>
> > #include <sys/types.h>
> > #include <sys/time.h>
> > #include <fcntl.h>
> > #include <stdlib.h>
> > #include <string.h>
> > 
> > #define TESTFILE "./write-truncate-testfile.dat"
> > #define ITERATIONS 10000
> > #define STARTSIZE 32
> > #define SIZES 15
> > 
> > #ifndef MIN
> > #define MIN(x,y) ((x)<(y)?(x):(y))
> > #endif
> > #ifndef MAX
> > #define MAX(x,y) ((x)>(y)?(x):(y))
> > #endif
> > 
> > double whattime()
> > {
> >         struct timeval tp;
> >         int i;
> > 
> > 	if (gettimeofday(&tp,NULL) == -1) {
> > 		perror("gettimeofday");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> >         return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 );
> > }
> > 
> > int main(void)
> > {
> > 	int fd;
> > 	int bufsize, sizes, iteration;
> > 	char *buf;
> > 	double t;
> > 
> > 	/* Create test file */
> > 	fd = open(TESTFILE, O_RDWR|O_CREAT|O_EXCL);
> > 	if (fd == -1) {
> > 		perror("open");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> > 	/* Unlink now for cleanup */
> > 	if (unlink(TESTFILE) == -1) {
> > 		perror("unlinke");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> > 	/* Go through a series of sizes */
> > 	bufsize = STARTSIZE;
> > 	for (sizes = 1; sizes <= SIZES; sizes++) {
> > 		bufsize *= 2;
> > 		buf = malloc(bufsize);
> > 		if (buf == NULL) {
> > 			printf("ERROR: Malloc failed\n");
> > 			exit(EXIT_FAILURE);
> > 		}
> > 		memset(buf, 0xE0, bufsize);
> > 
> > 		t = whattime();
> > 		for (iteration = 0; iteration < ITERATIONS; iteration++) {
> > 			size_t written = 0, thiswrite;
> > 			
> > 			while (written != bufsize) {
> > 				thiswrite = write(fd, buf, bufsize);
> 
> (it should write bufsize-written ;))
> 

D'oh - and write the buffer from buf + written. Not that it matters in this
case as it's all just fake data. I wonder did this mistake cause the spike
at 16384 hmmm....

> > 				if (thiswrite == -1) {
> > 					perror("write");
> > 					exit(EXIT_FAILURE);
> > 				}
> > 				written += thiswrite;
> > 			}
> > 
> > 			if (ftruncate(fd, 0) == -1) {
> > 				perror("ftruncate");
> > 				exit(EXIT_FAILURE);
> > 			}
> > 
> > 			if (lseek(fd, 0, SEEK_SET) != 0) {
> > 				perror("lseek");
> > 				exit(EXIT_FAILURE);
> > 			}
> > 		}
> 
> yup, I think that captures the same idea.
> 
> > 		t = whattime() - t;
> > 		free(buf);
> > 
> > 		printf("%d %f\n", bufsize, t);
> > 	}
> > 
> > 	if (close(fd) == -1) {
> > 		perror("close");
> > 		exit(EXIT_FAILURE);
> > 	}
> > 
> > 	exit(EXIT_SUCCESS);
> > }

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-02-24 11:51 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-22 23:17 [RFC PATCH 00/20] Cleanup and optimise the page allocator Mel Gorman
2009-02-22 23:17 ` [PATCH 01/20] Replace __alloc_pages_internal() with __alloc_pages_nodemask() Mel Gorman
2009-02-22 23:17 ` [PATCH 02/20] Do not sanity check order in the fast path Mel Gorman
2009-02-22 23:17 ` [PATCH 03/20] Do not check NUMA node ID when the caller knows the node is valid Mel Gorman
2009-02-23 15:01   ` Christoph Lameter
2009-02-23 16:24     ` Mel Gorman
2009-02-22 23:17 ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated values Mel Gorman
2009-02-23 11:55   ` [PATCH] mm: clean up __GFP_* flags a bit Peter Zijlstra
2009-02-23 18:01     ` Mel Gorman
2009-02-23 20:27       ` Vegard Nossum
2009-02-23 15:23   ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated values Christoph Lameter
2009-02-23 15:41     ` Nick Piggin
2009-02-23 15:43       ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated value Christoph Lameter
2009-02-23 16:40         ` Mel Gorman
2009-02-23 17:03           ` Christoph Lameter
2009-02-24  1:32           ` KAMEZAWA Hiroyuki
2009-02-24  3:59             ` Nick Piggin
2009-02-24  5:20               ` KAMEZAWA Hiroyuki
2009-02-24 11:36             ` Mel Gorman
2009-02-23 16:33     ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated values Mel Gorman
2009-02-23 16:33       ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated value Christoph Lameter
2009-02-23 17:41         ` Mel Gorman
2009-02-22 23:17 ` [PATCH 05/20] Check only once if the zonelist is suitable for the allocation Mel Gorman
2009-02-22 23:17 ` [PATCH 06/20] Break up the allocator entry point into fast and slow paths Mel Gorman
2009-02-22 23:17 ` [PATCH 07/20] Simplify the check on whether cpusets are a factor or not Mel Gorman
2009-02-23  7:14   ` Pekka J Enberg
2009-02-23  9:07     ` Peter Zijlstra
2009-02-23  9:13       ` Pekka Enberg
2009-02-23 11:39         ` Mel Gorman
2009-02-23 13:19           ` Pekka Enberg
2009-02-23  9:14   ` Li Zefan
2009-02-22 23:17 ` [PATCH 08/20] Move check for disabled anti-fragmentation out of fastpath Mel Gorman
2009-02-22 23:17 ` [PATCH 09/20] Calculate the preferred zone for allocation only once Mel Gorman
2009-02-22 23:17 ` [PATCH 10/20] Calculate the migratetype " Mel Gorman
2009-02-22 23:17 ` [PATCH 11/20] Inline get_page_from_freelist() in the fast-path Mel Gorman
2009-02-23  7:21   ` Pekka Enberg
2009-02-23 11:42     ` Mel Gorman
2009-02-23 15:32   ` Nick Piggin
2009-02-24 13:32     ` Mel Gorman
2009-02-24 14:08       ` Nick Piggin
2009-02-24 15:03         ` Mel Gorman
2009-02-22 23:17 ` [PATCH 12/20] Inline __rmqueue_smallest() Mel Gorman
2009-02-22 23:17 ` [PATCH 13/20] Inline buffered_rmqueue() Mel Gorman
2009-02-23  7:24   ` Pekka Enberg
2009-02-23 11:44     ` Mel Gorman
2009-02-22 23:17 ` [PATCH 14/20] Do not call get_pageblock_migratetype() more than necessary Mel Gorman
2009-02-22 23:17 ` [PATCH 15/20] Do not disable interrupts in free_page_mlock() Mel Gorman
2009-02-23  9:19   ` Peter Zijlstra
2009-02-23 12:23     ` Mel Gorman
2009-02-23 12:44       ` Peter Zijlstra
2009-02-23 14:25         ` Mel Gorman
2009-02-22 23:17 ` [PATCH 16/20] Do not setup zonelist cache when there is only one node Mel Gorman
2009-02-22 23:17 ` [PATCH 17/20] Do not double sanity check page attributes during allocation Mel Gorman
2009-02-22 23:17 ` [PATCH 18/20] Split per-cpu list into one-list-per-migrate-type Mel Gorman
2009-02-22 23:17 ` [PATCH 19/20] Batch free pages from migratetype per-cpu lists Mel Gorman
2009-02-22 23:17 ` [PATCH 20/20] Get rid of the concept of hot/cold page freeing Mel Gorman
2009-02-23  9:37   ` Andrew Morton
2009-02-23 23:30     ` Mel Gorman
2009-02-23 23:53       ` Andrew Morton
2009-02-24 11:51         ` Mel Gorman [this message]
2009-02-25  0:01           ` Andrew Morton
2009-02-25 16:01             ` Mel Gorman
2009-02-25 16:19               ` Andrew Morton
2009-02-26 16:37                 ` Mel Gorman
2009-02-26 17:00                   ` Christoph Lameter
2009-02-26 17:15                     ` Mel Gorman
2009-02-26 17:30                       ` Christoph Lameter
2009-02-27 11:33                         ` Nick Piggin
2009-02-27 15:40                           ` Christoph Lameter
2009-03-03 13:52                             ` Mel Gorman
2009-03-03 18:53                               ` Christoph Lameter
2009-02-27 11:38                       ` Nick Piggin
2009-03-01 10:37                         ` KOSAKI Motohiro
2009-02-25 18:33               ` Christoph Lameter
2009-02-22 23:57 ` [RFC PATCH 00/20] Cleanup and optimise the page allocator Andi Kleen
2009-02-23 12:34   ` Mel Gorman
2009-02-23 15:34   ` [RFC PATCH 00/20] Cleanup and optimise the page allocato Christoph Lameter
2009-02-23  0:02 ` [RFC PATCH 00/20] Cleanup and optimise the page allocator Andi Kleen
2009-02-23 14:32   ` Mel Gorman
2009-02-23 17:49     ` Andi Kleen
2009-02-24 14:32       ` Mel Gorman
2009-02-23  7:29 ` Pekka Enberg
2009-02-23  8:34   ` Zhang, Yanmin
2009-02-23  9:10   ` KOSAKI Motohiro
2009-02-23 11:55 ` [PATCH] mm: gfp_to_alloc_flags() Peter Zijlstra
2009-02-23 14:00   ` Pekka Enberg
2009-02-23 18:17   ` Mel Gorman
2009-02-23 20:09     ` Peter Zijlstra
2009-02-23 22:59   ` Andrew Morton
2009-02-24  8:59     ` Peter Zijlstra
2009-02-23 14:38 ` [RFC PATCH 00/20] Cleanup and optimise the page allocator Christoph Lameter
2009-02-23 14:46 ` Nick Piggin
2009-02-23 15:00   ` Mel Gorman
2009-02-23 15:22     ` Nick Piggin
2009-02-23 20:26       ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090224115126.GB25151@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ming.m.lin@intel.com \
    --cc=npiggin@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=riel@redhat.com \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox