From: Jens Axboe <jens.axboe@oracle.com>
To: Shan Wei <shanwei@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: CFQ is worse than other IO schedulers in some cases
Date: Wed, 18 Feb 2009 12:37:04 +0100
Message-ID: <20090218113704.GW30821@kernel.dk>
In-Reply-To: <499BA413.2010705@cn.fujitsu.com>

On Wed, Feb 18 2009, Shan Wei wrote:
> I found that CFQ performs worse than the other IO schedulers in some cases.
> I confirmed this when I ran the dump command and sysbench on 2.6.28.
> 
> 
> In dump(version:dump-0.4b41-2.fc6), I confirmed 
> the speed under CFQ is slower than other IO schedulers.
> 
> 
> The Test Result(dump):
>    UNIT:Mb/sec
>     +------------+--------+
>     |   IO       |        |
>     | scheduler  |  Speed |
>     +------------+--------+
>     |cfq         | 24.310 |
>     |noop        | 36.885 |
>     |anticipatory| 34.956 |
>     |deadline    | 36.758 |
>     +------------+--------+
> 
> 
> Steps to reproduce(dump):
>   #dump -0uf /dev/null /dev/sda6

The dump issue is a known one; it has to do with how dump uses separate
processes to interleave IO to the 'same' location. Jeff Moyer posted a
fix for that some time ago, and you can find references to the
discussion and progress right here on lkml. For reference, the patch is
included below.

> In sysbench(version:sysbench-0.4.10), I confirmed the following:
>   - CFQ performs worse than the other IO schedulers only in
>     multi-threaded tests.
>     (There is no difference in the single-thread test.)
>   - It is worse than the other IO schedulers when
>     I used read mode. (No regression in write mode.)
>   - There is no difference among the other IO schedulers (e.g. noop, deadline).
> 
> 
> The Test Result(sysbench):
>    UNIT:Mb/sec
>     +------------+-----------------------------------+
>     |   IO       |      thread  number               |
>     | scheduler  +------+-------+------+------+------+
>     |            |  1   |  3    |  5   |   7  |   9  |
>     +------------+------+-------+------+------+------+
>     |cfq         | 77.8 |  32.4 | 43.3 | 55.8 | 58.5 |
>     |noop        | 78.2 |  79.0 | 78.2 | 77.2 | 77.0 |
>     |anticipatory| 78.2 |  78.6 | 78.4 | 77.8 | 78.1 |
>     |deadline    | 76.9 |  78.4 | 77.0 | 78.4 | 77.9 |
>     +------------+------+-------+------+------+------+

What kind of storage hardware did you use?

------

Hi,

dump performs poorly when run under the CFQ I/O scheduler.  The reason
for this is that the dump command interleaves I/O between two (or
three?) cooperating processes.  This is about the worst case scenario
you can get for CFQ, as the I/O access pattern within each process is
sequential.  Thus, CFQ will idle for a number of milliseconds waiting
for the current process to issue more I/O before switching to the next.

Now, this behaviour can be changed with tuning.  However, if the dump
command simply shared I/O contexts between cooperating processes, CFQ
could make more intelligent decisions about I/O scheduling.
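
As an illustration of the tuning route: CFQ exposes a slice_idle tunable
under /sys/block/<dev>/queue/iosched/, and writing 0 to it disables the
idle wait. Whether that is the tuning meant above is an assumption. The
helper below is a minimal sketch; write_tunable is a hypothetical name,
not part of dump or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an integer value to a sysfs-style tunable
 * file.  Returns 0 on success and -1 on failure.
 */
static int write_tunable(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (f == NULL)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;

	/* Buffered write errors may only surface when the stream is closed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A call such as write_tunable("/sys/block/sda/queue/iosched/slice_idle", 0)
would need root and CFQ active on that device; sda is taken from the
report above. Disabling idling can hurt other workloads, which is one
reason a fix in dump itself is the more attractive option.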

So, here are the numbers, running under 2.6.28-rc3.

deadline    82241 kB/s
cfq         34143 kB/s
cfq-shared  82241 kB/s

cfq-shared denotes that the dump utility was patched with the attached
patch to share I/O contexts.  As you can see, with very little code
change, we can drastically increase the performance of dump under
CFQ (which is the default I/O scheduler used in a number of
distributions).
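
The idea can be tried outside of dump with a small standalone sketch. It
issues the raw clone system call with CLONE_IO and a zero child stack,
which gives fork-like semantics while the child shares the parent's I/O
context. The names fork_shared_io and run_child are hypothetical, the
fallback to fork() is an added assumption, and the argument order assumes
an architecture such as x86-64 where the flags come first.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* must precede all includes to expose CLONE_IO */
#endif
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: behaves like fork(), but asks the kernel to let
 * the child share the caller's I/O context.
 */
static pid_t fork_shared_io(void)
{
#if defined(SYS_clone) && defined(CLONE_IO)
	/* flags, child_stack, parent_tidptr, child_tidptr, tls */
	long ret = syscall(SYS_clone,
			   (unsigned long)(SIGCHLD | CLONE_IO),
			   0UL, NULL, NULL, 0UL);
	if (ret >= 0)
		return (pid_t)ret;
	/* The clone call failed; fall through to plain fork(). */
#endif
	return fork();
}

/*
 * Hypothetical demo: create one child that exits with `code` and return
 * the exit status seen by the parent, or -1 on error.
 */
static int run_child(int code)
{
	int status;
	pid_t pid;

	assert(code >= 0 && code <= 255);

	pid = fork_shared_io();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);	/* child: real work would issue its I/O here */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because SIGCHLD is passed as the exit signal, the parent can reap the
child with waitpid() exactly as it would after fork().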

For more information on the underlying problems, you can refer to the
following kernel discussion:
  http://lkml.org/lkml/2008/11/9/133

Comments are appreciated.

Cheers,

Jeff

diff -up ./dump/tape.c.orig ./dump/tape.c
--- ./dump/tape.c.orig	2005-08-20 17:00:48.000000000 -0400
+++ ./dump/tape.c	2008-11-17 16:40:42.575792509 -0500
@@ -187,6 +187,40 @@ static sigjmp_buf jmpbuf;	/* where to ju
 static int gtperr = 0;
 #endif
 
+/*
+ * Determine if we can use Linux' clone system call.  If so, call it
+ * with the CLONE_IO flag so that all processes will share the same I/O
+ * context, allowing the I/O schedulers to make better scheduling decisions.
+ */
+#ifdef __linux__
+#include <syscall.h>
+
+#ifndef SYS_clone
+#define fork_clone_io fork
+#else /* SYS_clone */
+#include <linux/version.h>
+ 
+/*
+ * Kernel 2.5.49 introduced two extra parameters to the clone system call.
+ * Neither is useful in our case, so this is easy to handle.
+ */
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,49)
+/* clone_flags, child_stack, parent_tidptr, child_tidptr */
+#define CLONE_ARGS SIGCHLD|CLONE_IO, 0, NULL, NULL
+#else
+#define CLONE_ARGS SIGCHLD|CLONE_IO, 0
+#endif /* LINUX_VERSION_CODE */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <unistd.h>
+#undef _GNU_SOURCE
+pid_t fork_clone_io(void);
+#endif /* SYS_clone */
+#else /* __linux__ not defined */
+#define fork_clone_io fork
+#endif /* __linux__ */
+
 int
 alloctape(void)
 {
@@ -755,6 +789,16 @@ rollforward(void)
 #endif
 }
 
+#ifdef __linux__
+#ifdef SYS_clone
+pid_t
+fork_clone_io(void)
+{
+	return syscall(SYS_clone, CLONE_ARGS);
+}
+#endif
+#endif
+
 /*
  * We implement taking and restoring checkpoints on the tape level.
  * When each tape is opened, a new process is created by forking; this
@@ -801,7 +845,7 @@ restore_check_point:
 	/*
 	 *	All signals are inherited...
 	 */
-	childpid = fork();
+	childpid = fork_clone_io();
 	if (childpid < 0) {
 		msg("Context save fork fails in parent %d\n", parentpid);
 		Exit(X_ABORT);
@@ -1017,7 +1061,7 @@ enslave(void)
 		}
 
 		if (socketpair(AF_UNIX, SOCK_STREAM, 0, cmd) < 0 ||
-		    (slaves[i].pid = fork()) < 0)
+		    (slaves[i].pid = fork_clone_io()) < 0)
 			quit("too many slaves, %d (recompile smaller): %s\n",
 			    i, strerror(errno));
 

-- 
Jens Axboe


