From: Fengguang Wu <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>, Tejun Heo <tj@kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-mm@kvack.org, sjayaraman@suse.com, andrea@betterlinux.com,
	jmoyer@redhat.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kamezawa.hiroyu@jp.fujitsu.com,
	lizefan@huawei.com, containers@lists.linux-foundation.org,
	cgroups@vger.kernel.org, ctalbott@google.com, rni@google.com,
	lsf@lists.linux-foundation.org
Subject: Re: [RFC] writeback and cgroup
Date: Wed, 25 Apr 2012 20:05:02 +0800	[thread overview]
Message-ID: <20120425120502.GA18819@localhost> (raw)
In-Reply-To: <20120425090156.GB12568@quack.suse.cz>

> > So the cfq behavior is pretty much nondeterministic. I more or less
> > realized this from the experiments. For example, when starting 2+
> > "dd oflag=direct" tasks in a single cgroup, they _sometimes_ progress at
> > different rates. See the attached graphs for two such examples on XFS.
> > ext4 is fine.
> > 
> > The 2-dd test case is:
> > 
> > mkdir /cgroup/dd
> > echo $$ > /cgroup/dd/tasks
> > 
> > dd if=/dev/zero of=/fs/zero1 bs=1M oflag=direct &
> > dd if=/dev/zero of=/fs/zero2 bs=1M oflag=direct &
> > 
> > The 6-dd test case is similar.
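
For reference, the 6-dd case is presumably just the same single cgroup
with six writers, along these lines (the file names are illustrative):

mkdir /cgroup/dd
echo $$ > /cgroup/dd/tasks

for i in $(seq 1 6); do
    dd if=/dev/zero of=/fs/zero$i bs=1M oflag=direct &
done
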
>   Hum, interesting. I would not expect that. Maybe it's because the files
> are allocated in different areas of the disk. But even then the difference
> should not be *that* big.

Agreed.

> > > > Look at this graph: the 4 dd tasks are granted the same weight (2 of
> > > > them are buffered writes). I guess the 2 buffered dd tasks managed to
> > > > progress much faster than the 2 direct dd tasks simply because the
> > > > async IOs are much more efficient than the bs=64k direct IOs.
> > >   Likely because 64k is too low to get good bandwidth with direct IO. If
> > > it were 4M, I believe you would get similar throughput for buffered and
> > > direct IO. So essentially you are right: small IOs benefit from caching
> > > effects, since they allow you to submit larger requests to the device,
> > > which is more efficient.
> > 
> > I didn't directly compare the effects; however, here is an example of
> > doing 1M, 64k and 4k direct writes in parallel. It _seems_ bs=1M has only
> > marginal benefits over 64k, assuming cfq is behaving well.
> > 
> > https://github.com/fengguang/io-controller-tests/raw/master/log/snb/ext4/direct-write-1M-64k-4k.2012-04-19-10-50/balance_dirty_pages-task-bw.png
> > 
> > The test case is:
> > 
> > # cgroup 1
> > echo 500 > /cgroup/cp/blkio.weight
> > 
> > dd if=/dev/zero of=/fs/zero-1M bs=1M oflag=direct &
> > 
> > # cgroup 2
> > echo 1000 > /cgroup/dd/blkio.weight
> > 
> > dd if=/dev/zero of=/fs/zero-64k bs=64k oflag=direct &
> > dd if=/dev/zero of=/fs/zero-4k  bs=4k  oflag=direct &
>   Um, I'm not completely sure what you were trying to test with the above
> case.

Yeah, it's not a good test case. I've changed it to run the 3 dd tasks
in 3 cgroups with equal weight. The new results are attached (they look
the same as the original ones).
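
The revised setup is roughly the following (a sketch, not the exact
script: the cgroup names, weight value and mount points are illustrative,
and $BASHPID assumes bash):

# three cgroups with equal blkio weight, one dd writer per cgroup
for bs in 1M 64k 4k; do
    mkdir /cgroup/dd-$bs
    echo 500 > /cgroup/dd-$bs/blkio.weight
done

for bs in 1M 64k 4k; do
    # move the subshell into its cgroup, then exec dd so the PID stays the same
    (echo $BASHPID > /cgroup/dd-$bs/tasks
     exec dd if=/dev/zero of=/fs/zero-$bs bs=$bs oflag=direct) &
done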

> What I wanted to point out is that direct IO is not necessarily less
> efficient than buffered IO. Look:
> xen-node0:~ # uname -a
> Linux xen-node0 3.3.0-rc4-xen+ #6 SMP PREEMPT Tue Apr 17 06:48:08 UTC 2012
> x86_64 x86_64 x86_64 GNU/Linux
> xen-node0:~ # dd if=/dev/zero of=/mnt/file bs=1M count=1024 conv=fsync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 10.5304 s, 102 MB/s
> xen-node0:~ # dd if=/dev/zero of=/mnt/file bs=1M count=1024 oflag=direct conv=fsync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 10.3678 s, 104 MB/s
> 
> So both direct and buffered IO are about the same. Note that I used the
> conv=fsync flag to erase the effect of part of the buffered write still
> remaining in the cache when dd is done writing, which would be unfair to
> the direct writer...

OK, I also find direct writes to be a bit faster than buffered writes:

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=1M count=1024 conv=fsync

1073741824 bytes (1.1 GB) copied, 10.4039 s, 103 MB/s
1073741824 bytes (1.1 GB) copied, 10.4143 s, 103 MB/s

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=1M count=1024 oflag=direct conv=fsync

1073741824 bytes (1.1 GB) copied, 9.9006 s, 108 MB/s
1073741824 bytes (1.1 GB) copied, 9.55173 s, 112 MB/s

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=64k count=16384 oflag=direct conv=fsync

1073741824 bytes (1.1 GB) copied, 9.83902 s, 109 MB/s
1073741824 bytes (1.1 GB) copied, 9.61725 s, 112 MB/s

> And actually 64k vs 1M makes a big difference on my machine:
> xen-node0:~ # dd if=/dev/zero of=/mnt/file bs=64k count=16384 oflag=direct conv=fsync
> 16384+0 records in
> 16384+0 records out
> 1073741824 bytes (1.1 GB) copied, 19.3176 s, 55.6 MB/s

Interestingly, my 64k direct writes are as fast as 1M direct writes...
and 4k writes run at ~1/4 speed:

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=4k count=$((256<<10)) oflag=direct conv=fsync

1073741824 bytes (1.1 GB) copied, 42.0726 s, 25.5 MB/s
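
For reference, the three block-size comparisons above can be run back to
back with a small loop like this (a sketch; the bs:count pairs just keep
the total size at 1 GiB):

# 1 GiB of direct writes at each block size; dd prints the throughput
for spec in 1M:1024 64k:16384 4k:262144; do
    bs=${spec%:*} count=${spec#*:}
    dd if=/dev/zero of=/mnt/file bs=$bs count=$count oflag=direct conv=fsync
done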

Thanks,
Fengguang

[-- Attachment #2: balance_dirty_pages-task-bw.png --]
[-- Type: image/png, Size: 61279 bytes --]
