From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"kosaki.motohiro@jp.fujitsu.com" <kosaki.motohiro@jp.fujitsu.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>
Subject: Re: [PATCH] readahead:add blk_run_backing_dev
Date: Wed, 27 May 2009 11:55:05 +0800
Message-ID: <20090527035505.GA16916@localhost>
In-Reply-To: <20090526193601.b825af5f.akpm@linux-foundation.org>

On Wed, May 27, 2009 at 10:36:01AM +0800, Andrew Morton wrote:
> On Wed, 27 May 2009 11:21:53 +0900 Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> wrote:
> 
> > 
> > At 11:09 09/05/27, Wu Fengguang wrote:
> > >On Wed, May 27, 2009 at 08:25:04AM +0800, Hisashi Hifumi wrote:
> > >> 
> > >> At 08:42 09/05/27, Andrew Morton wrote:
> > >> >On Fri, 22 May 2009 10:33:23 +0800
> > >> >Wu Fengguang <fengguang.wu@intel.com> wrote:
> > >> >
> > >> >> > I tested above patch, and I got same performance number.
> > >> >> > I wonder why if (PageUptodate(page)) check is there...
> > >> >> 
> > >> >> Thanks!  This is an interesting micro timing behavior that
> > >> >> demands some research work.  The above check is to confirm if it's
> > >> >> the PageUptodate() case that makes the difference. So why that case
> > >> >> happens so frequently so as to impact the performance? Will it also
> > >> >> happen in NFS?
> > >> >> 
> > >> >> The problem is readahead IO pipeline is not running smoothly, which is
> > >> >> undesirable and not well understood for now.
> > >> >
> > >> >The patch causes a remarkably large performance increase.  A 9%
> > >> >reduction in time for a linear read? I'd be surprised if the workload
> > >> 
> > >> Hi Andrew.
> > >> Yes, I tested this with dd.
> > >> 
> > >> >even consumed 9% of a CPU, so where on earth has the kernel gone to?
> > >> >
> > >> >Have you been able to reproduce this in your testing?
> > >> 
> > >> Yes, this test on my environment is reproducible.
> > >
> > >Hisashi, does your environment have some special configurations?
> > 
> > Hi.
> > My testing environment is as follows:
> > Hardware: HP DL580 
> > CPU:Xeon 3.2GHz *4 HT enabled
> > Memory:8GB
> > Storage: Dothill SANNet2 FC (7Disks RAID-0 Array)
> > 
> > I did dd to this disk-array and got improved performance number.
> > 
> > I noticed that when a disk is just one HDD, performance improvement
> > is very small.
> > 
> 
> Ah.  So it's likely to be some strange interaction with the RAID setup.

In the normal case, if page N becomes uptodate at time T(N), then
T(N) <= T(N+1) holds. But for RAID, the data arrival time depends on
the runtime status of the individual disks, which breaks that ordering.
So in do_generic_file_read(), just after submitting the async readahead
IO request, the current page may well be uptodate already, in which case
the page won't be locked and the block device won't be implicitly unplugged:

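                if (PageReadahead(page))
                        page_cache_async_readahead();   /* submits the readahead IO */
                if (!PageUptodate(page))
                        goto page_not_up_to_date;
                /* page already uptodate: no lock_page, so no implicit unplug */
                //...
page_not_up_to_date:
                lock_page_killable(page);       /* implicitly unplugs the device */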


Therefore explicit unplugging can help, so

        Acked-by: Wu Fengguang <fengguang.wu@intel.com> 

The only question is whether we should avoid the double unplug by doing this:

---
 mm/readahead.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- linux.orig/mm/readahead.c
+++ linux/mm/readahead.c
@@ -490,5 +490,15 @@ page_cache_async_readahead(struct addres
 
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, true, offset, req_size);
+
+	/*
+	 * Normally the current page is !uptodate and lock_page() will be
+	 * immediately called to implicitly unplug the device. However this
+	 * is not always true for RAID configurations, where data does not
+	 * arrive strictly in submission order. In this case we need to
+	 * explicitly kick off the IO.
+	 */
+	if (PageUptodate(page))
+		blk_run_backing_dev(mapping->backing_dev_info, NULL);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
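
For illustration only, the reasoning above can be sketched as a small
userspace model. This is not kernel code: struct toy_queue, toy_submit(),
toy_unplug() and toy_read_step() are hypothetical names invented for this
sketch, and the model ignores the other paths (timers, request thresholds)
that may unplug a real queue. It only shows that when the current page is
already uptodate, nothing on the read path releases the plugged readahead
IO unless it is kicked explicitly, and that kicking only in the uptodate
case keeps it to one unplug per pass.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical and are not kernel APIs. */
struct toy_queue {
	int pending;     /* requests submitted but still held by the plug */
	int dispatched;  /* requests handed to the device */
	int unplugs;     /* how many times the queue was unplugged */
};

/* Queue one request behind the plug (stands in for readahead submission). */
static void toy_submit(struct toy_queue *q)
{
	q->pending++;
}

/* Release everything held by the plug. */
static void toy_unplug(struct toy_queue *q)
{
	q->dispatched += q->pending;
	q->pending = 0;
	q->unplugs++;
}

/*
 * One pass through the read loop. Returns the number of requests still
 * stuck behind the plug afterwards.
 */
static int toy_read_step(struct toy_queue *q, bool page_uptodate,
			 bool explicit_kick)
{
	toy_submit(q);                  /* async readahead submits IO */

	if (explicit_kick && page_uptodate)
		toy_unplug(q);          /* the explicit kick added by the patch */

	if (!page_uptodate)
		toy_unplug(q);          /* lock_page() path: implicit unplug */

	return q->pending;
}
```

In this model, repeated calls with page_uptodate = true and
explicit_kick = false let the return value grow, while explicit_kick = true
drains the queue with exactly one unplug per pass.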
