Re: Performance regression in scsi sequential throughput (iozone) due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLD is set"

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mel@csn.ul.ie>
To: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: arayananu Gopalakrishnan <narayanan.g@samsung.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	epasch@de.ibm.com, SCHILLIG@de.ibm.com,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	christof.schmitt@de.ibm.com, thoss@de.ibm.com
Subject: Re: Performance regression in scsi sequential throughput (iozone) due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLD is set"
Date: Fri, 11 Dec 2009 11:20:10 +0000	[thread overview]
Message-ID: <20091211112009.GC30670@csn.ul.ie> (raw)
In-Reply-To: <4B210754.2020601@linux.vnet.ibm.com>

On Thu, Dec 10, 2009 at 03:36:04PM +0100, Christian Ehrhardt wrote:
> Keeping the old discussion in the mail tail, adding the new information  
> up here were everyone finds them :-)
>
> Things I was able to confirm so far summarized:
> - The controller doesn't care about pfn ordering in any way (proved by  
> HW statistics)
> - regression appears in sequential AND random workloads -> also without  
> readahead
> - oprofile & co are no option atm.
>  The effective consumed cpu cycles per transferred kb are almost the  
> same so I would not expect sampling would give us huge insights.
>  Therefore I expect that it is more a matter of lost time (latency) than 
> more expensive tasks (cpu consumption)

But earlier, you said that latency was lower - "latency statistics clearly
state that your patch is working as intended - the latency from entering
the controller until the interrupt to linux device driver is ~30% lower!."

Also, if the controller is doing no merging of IO requests, why is the
interrupt rate lower?

>  I don't want to preclude it completely, but sampling has to wait as  
> long as we have better tracks to follow
>
> So the question is where time is lost in Linux. I used blktrace to  
> create latency summaries.
> I only list the random case for discussion as the effects are more clear  
> int hat data.
> Abbreviations are (like the blkparse man page explains) - sorted in  
> order it would appear per request:
>       A -- remap For stacked devices, incoming i/o is remapped to device 
> below it in the i/o stack. The remap action details what exactly is being 
> remapped to what.
>       G -- get request To send any type of request to a block device, a  
> struct request container must be allocated first.
>       I -- inserted A request is being sent to the i/o scheduler for  
> addition to the internal queue and later service by the driver. The  
> request is fully formed at this time.
>       D -- issued A request that previously resided on the block layer  
> queue or in the i/o scheduler has been sent to the driver.
>       C -- complete A previously issued request has been completed.  The 
> output will detail the sector and size of that request, as well as the 
> success or failure of it.
>
> The following table shows the average latencies from A to G, G to I and  
> so on.
> C2A is special and tries to summarize how long it takes after completing  
> an I/O until the next one arrives in the block device layer.
>
>                     avg-A2G    avg-G2I    avg-I2D   avg-D2C    avg-C2A-in-avg+-stddev    %C2A-in-avg+-stddev
> deviation good->bad    -3.48%    -0.56%    -1.57%    -1.31%         128.69%                 97.26%
>
> It clearly shows that all latencies once block device layer and device  
> driver are involved are almost equal. Remember that the throughput of  
> good vs. bad case is more than x3.
> But we can also see that the value of C2A increases by a huge amount.  
> That huge C2A increase let me assume that the time is actually lost  
> "above" the block device layer.
>

To be clear. As C is "completion" and "A" is remapping new IO, it
implies that time is being lost between when one IO completes and
another starts, right?

> I don't expect the execution speed of iozone as user process itself is  
> affected by commit e084b,

Not by this much anyway. Lets say cache hotness is a problem, I would
expect some performance loss but not this much.
> so the question is where the time is lost  
> between the "read" issued by iozone and entering the block device layer.
> Actually I expect it somewhere in the area of getting a page cache page  
> for the I/O. On one hand page handling is what commit e084b changes and  
> on the other hand pages are under pressure (systat vm effectiveness  
> ~100%, >40% scanned directly in both cases).
>
> I'll continue hunting down the lost time - maybe with ftrace if it is  
> not concealing the effect by its invasiveness -, any further  
> ideas/comments welcome.
>

One way of finding out if cache hottness was the problem would be to profile
for cache misses and see if there are massive differences with and without
the patch. Is that an option?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

next prev parent reply	other threads:[~2009-12-11 11:20 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-07 14:39 Performance regression in scsi sequential throughput (iozone) due to "e084b - page-allocator: preserve PFN ordering when __GFP_COLD is set" Christian Ehrhardt
2009-12-07 15:09 ` Mel Gorman
2009-12-08 17:59   ` Christian Ehrhardt
2009-12-10 14:36     ` Christian Ehrhardt
2009-12-11 11:20       ` Mel Gorman [this message]
2009-12-11 14:47         ` Christian Ehrhardt
2009-12-18 13:38           ` Christian Ehrhardt
2009-12-18 17:42             ` Mel Gorman
2010-01-14 12:30               ` Christian Ehrhardt
2010-01-19 11:33                 ` Mel Gorman
2010-02-05 15:51                   ` Christian Ehrhardt
2010-02-05 17:49                     ` Mel Gorman
2010-02-08 14:01                       ` Christian Ehrhardt
2010-02-08 15:21                         ` Mel Gorman
2010-02-08 16:55                           ` Mel Gorman
2010-02-09  6:23                           ` Christian Ehrhardt
2010-02-09 15:52                           ` Christian Ehrhardt
2010-02-09 17:57                             ` Mel Gorman
2010-02-11 16:11                               ` Christian Ehrhardt
2010-02-12 10:05                                 ` Nick Piggin
2010-02-15  6:59                                   ` Nick Piggin
2010-02-15 15:46                                   ` Christian Ehrhardt
2010-02-16 11:25                                     ` Mel Gorman
2010-02-16 16:47                                       ` Christian Ehrhardt
2010-02-17  9:55                                         ` Christian Ehrhardt
2010-02-17 10:03                                           ` Christian Ehrhardt
2010-02-18 11:43                                           ` Mel Gorman
2010-02-18 16:09                                             ` Christian Ehrhardt
2010-02-19 11:19                                               ` Christian Ehrhardt
2010-02-19 15:19                                                 ` Mel Gorman
2010-02-22 15:42                                                   ` Christian Ehrhardt
2010-02-25 15:13                                                     ` Christian Ehrhardt
2010-02-26 11:18                                                       ` Nick Piggin
2010-03-02  6:52                                                   ` Nick Piggin
2010-03-02 10:04                                                     ` Mel Gorman
2010-03-02 10:36                                                       ` Nick Piggin
2010-03-02 11:01                                                         ` Mel Gorman
2010-03-02 11:18                                                           ` Nick Piggin
2010-03-02 11:24                                                             ` Mel Gorman
2010-03-03  6:51                                                               ` Christian Ehrhardt
2010-02-08 15:02                       ` Christian Ehrhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091211112009.GC30670@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=SCHILLIG@de.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=christof.schmitt@de.ibm.com \
    --cc=ehrhardt@linux.vnet.ibm.com \
    --cc=epasch@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=narayanan.g@samsung.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=thoss@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).