From: Dave Chinner <david@fromorbit.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>, Chris Mason <chris.mason@oracle.com>,
Christoph Hellwig <hch@lst.de>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
"Theodore Ts'o" <tytso@mit.edu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Mel Gorman <mel@csn.ul.ie>, Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Greg Thelen <gthelen@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Vivek Goyal <vgoyal@redhat.com>,
Andrea Righi <arighi@develer.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
linux-mm <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 06/27] btrfs: lower the dirty balance poll interval
Date: Fri, 4 Mar 2011 17:22:17 +1100
Message-ID: <20110304062217.GE25368@dastard>
In-Reply-To: <20110303074949.419321686@intel.com>

On Thu, Mar 03, 2011 at 02:45:11PM +0800, Wu Fengguang wrote:
> Call balance_dirty_pages_ratelimit_nr() on every 32 pages dirtied.
>
> Tests show that original larger intervals can easily make the bdi
> dirty limit exceeded on 100 concurrent dd.
>
> CC: Chris Mason <chris.mason@oracle.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
> fs/btrfs/file.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> --- linux-next.orig/fs/btrfs/file.c 2011-03-02 20:15:19.000000000 +0800
> +++ linux-next/fs/btrfs/file.c 2011-03-02 20:35:07.000000000 +0800
> @@ -949,9 +949,8 @@ static ssize_t btrfs_file_aio_write(stru
>  	}
>  
>  	iov_iter_init(&i, iov, nr_segs, count, num_written);
> -	nrptrs = min((iov_iter_count(&i) + PAGE_CACHE_SIZE - 1) /
> -		     PAGE_CACHE_SIZE, PAGE_CACHE_SIZE /
> -		     (sizeof(struct page *)));
> +	nrptrs = min(DIV_ROUND_UP(iov_iter_count(&i), PAGE_CACHE_SIZE),
> +		     min(32UL, PAGE_CACHE_SIZE / sizeof(struct page *)));

You're basically hardcoding the maximum to 32 pages here, because
PAGE_CACHE_SIZE / sizeof(page *) is always going to be much larger
than 32.
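
To make the clamp concrete, here is a minimal user-space sketch of the two
expressions from the quoted hunk. It is not kernel code: PAGE_CACHE_SIZE is
assumed to be 4096 bytes, and min_ul(), nrptrs_old(), nrptrs_new() and
show_caps() are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for illustration: 4 KiB page cache pages. */
#define PAGE_CACHE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct page; /* opaque here; only the pointer size matters */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Batch size before the patch: capped at one page worth of page pointers. */
static unsigned long nrptrs_old(unsigned long count)
{
	return min_ul(DIV_ROUND_UP(count, PAGE_CACHE_SIZE),
		      PAGE_CACHE_SIZE / sizeof(struct page *));
}

/* Batch size after the patch: the inner min() picks 32 on any common ABI. */
static unsigned long nrptrs_new(unsigned long count)
{
	return min_ul(DIV_ROUND_UP(count, PAGE_CACHE_SIZE),
		      min_ul(32UL, PAGE_CACHE_SIZE / sizeof(struct page *)));
}

/* Print both caps for a write of `count` bytes. */
static void show_caps(unsigned long count)
{
	unsigned long before = nrptrs_old(count);
	unsigned long after = nrptrs_new(count);

	assert(after <= before && after <= 32);
	printf("%lu bytes: %lu pages before, %lu pages after\n",
	       count, before, after);
}
```

Under these assumptions, with 8-byte pointers the old cap is 4096 / 8 = 512
pages, so any write larger than 128 KiB is limited by the constant 32 rather
than by the size of the pointer array.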

This means that you are effectively neutering the large write
efficiencies of btrfs - you're reducing the delayed allocation sizes
from 512 * PAGE_CACHE_SIZE down to 32 * PAGE_CACHE_SIZE. This will
increase the overhead of the write process for btrfs for large IOs.
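
As a rough illustration of that overhead, the sketch below counts how many
batches a single large write is split into for a given batch size. It assumes
4 KiB pages and that the per-batch work (reservation, page preparation, copy)
runs once per batch; batches_for_write() is a hypothetical helper, not a btrfs
function.

```c
#include <stdio.h>

/* Assumption for illustration: 4 KiB page cache pages. */
#define PAGE_CACHE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of passes through the write loop needed to cover `bytes`
 * when each pass handles at most `pages_per_batch` pages.
 */
static unsigned long batches_for_write(unsigned long bytes,
				       unsigned long pages_per_batch)
{
	unsigned long pages = DIV_ROUND_UP(bytes, PAGE_CACHE_SIZE);

	return DIV_ROUND_UP(pages, pages_per_batch);
}

/* Print the batch counts for the two batch sizes discussed above. */
static void show_batches(unsigned long bytes)
{
	printf("%lu bytes: %lu batches at 512 pages, %lu batches at 32 pages\n",
	       bytes, batches_for_write(bytes, 512),
	       batches_for_write(bytes, 32));
}
```

For a 64 MiB write (16384 pages) this gives 32 batches at 512 pages per batch
and 512 batches at 32 pages per batch, i.e. 16 times as many trips through the
per-batch setup under these assumptions.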

Also, I've got some multipage write modifications that allow 1024
pages at a time between mapping/allocation calls with XFS - once
again for improving the efficiencies of the extent
mapping/allocations in the write path. If the new writeback
throttling algorithms don't work with large numbers of pages being
copied in a single go, then that's a problem.

As it is, if 100 concurrent dd's can overrun the dirty limit w/ 512
pages at a time, then 1000 concurrent dd's w/ 32 pages at a time is
just as likely to overrun it, too. We support 4096 CPU systems, so a
few thousand concurrent writers is not out of the question. Hence I
don't think just reducing the number of pages between dirty balance
calls is a sufficient solution....
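
The scaling argument can be put in numbers with a small sketch. It assumes a
simplified worst case in which every writer dirties a full batch before any of
them reaches a dirty balance check; worst_case_unchecked_pages() is a
hypothetical helper used only for this arithmetic.

```c
#include <stdio.h>

/*
 * Simplified worst case: pages that can be dirtied between balance
 * checks if all writers fill a whole batch at the same time.
 */
static unsigned long worst_case_unchecked_pages(unsigned long writers,
						unsigned long pages_per_check)
{
	return writers * pages_per_check;
}

/* Print the bound for one writer count and batch size. */
static void show_bound(unsigned long writers, unsigned long pages_per_check)
{
	printf("%lu writers x %lu pages = %lu unchecked pages\n",
	       writers, pages_per_check,
	       worst_case_unchecked_pages(writers, pages_per_check));
}
```

Under that assumption, 100 writers at 512 pages give 51200 pages, 1000 writers
at 32 pages give 32000 pages, and 4096 writers at 32 pages give 131072 pages.
The smaller interval lowers the bound per writer, but the total still grows
linearly with the number of writers.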

Cheers,
Dave..
--
Dave Chinner
david@fromorbit.com