linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, davej@redhat.com,
	viro@zeniv.linux.org.uk, jack@suse.cz, glommer@parallels.com
Subject: Re: [PATCH 01/11] writeback: plug writeback at a high level
Date: Wed, 31 Jul 2013 16:40:19 +0200
Message-ID: <20130731144019.GC22930@quack.suse.cz>
In-Reply-To: <1375244150-27296-2-git-send-email-david@fromorbit.com>

On Wed 31-07-13 14:15:40, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Doing writeback on lots of little files causes terrible IOPS storms
> because of the per-mapping writeback plugging we do. This
> essentially causes immediate dispatch of IO for each mapping,
> regardless of the context in which writeback is occurring.
> 
> IOWs, running a concurrent workload that writes lots of small 4k
> files using fs_mark on XFS results in a huge number of IOPS being
> issued for data writes.  Metadata writes are sorted and plugged at a
> high level by XFS, so they aggregate nicely into large IOs. However,
> data writeback is dispatched as individual 4k IOs, even when the
> blocks of two consecutively written files are adjacent.
> 
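A minimal userspace model of the effect described above, assuming each small file produces exactly one 4k write, consecutively written files are contiguous on disk, and a hypothetical 512 KiB maximum request size. It only back-merges into the most recent request and ignores the plug list's request-count flush, so it sketches the merging argument rather than the block layer itself; the `model_*` names are invented for illustration. Finishing the plug after every mapping (`plug_span = 1`) yields one request per file, while one plug around the whole loop lets adjacent writes merge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical maximum request size; real limits depend on the device. */
#define MODEL_MAX_REQ_BYTES (512u * 1024u)

/* One data write issued by writeback: a byte range on the device. */
struct model_io {
	unsigned long long offset;
	unsigned int len;
};

/* Lay out n writes of len bytes back to back, like consecutively
 * written small files whose blocks are adjacent. */
static void model_fill_adjacent(struct model_io *ios, size_t n, unsigned int len)
{
	for (size_t i = 0; i < n; i++) {
		ios[i].offset = (unsigned long long)i * len;
		ios[i].len = len;
	}
}

/*
 * Count the requests dispatched when the plug is started and finished
 * around every plug_span writes.  Within one plug, a write that starts
 * where the previous request ends is back-merged into it, as long as the
 * merged request stays within MODEL_MAX_REQ_BYTES.  Simplifications: only
 * the most recent request is a merge candidate, and the plug list's
 * request-count flush is ignored.
 */
static size_t model_count_requests(const struct model_io *ios, size_t n,
				   size_t plug_span)
{
	size_t requests = 0;
	unsigned long long cur_end = 0;	/* end offset of the open request */
	unsigned int cur_len = 0;	/* size of the open request */
	int have_cur = 0;

	assert(plug_span > 0);
	for (size_t i = 0; i < n; i++) {
		/* plug boundary: the open request is dispatched */
		if (i % plug_span == 0 && have_cur) {
			requests++;
			have_cur = 0;
		}
		if (have_cur && ios[i].offset == cur_end &&
		    cur_len + ios[i].len <= MODEL_MAX_REQ_BYTES) {
			cur_end += ios[i].len;	/* back merge */
			cur_len += ios[i].len;
			continue;
		}
		if (have_cur)
			requests++;	/* cannot merge: previous request is final */
		cur_end = ios[i].offset + ios[i].len;
		cur_len = ios[i].len;
		have_cur = 1;
	}
	if (have_cur)
		requests++;	/* final flush when the last plug is finished */
	return requests;
}
```

With 1024 adjacent 4k files, the per-mapping case dispatches 1024 requests, while a single plug dispatches 8, since 128 writes of 4k fill each 512 KiB request.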
> Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
> metadata CRCs enabled.
> 
> Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)
> 
> Test:
> 
> $ ./fs_mark -D 10000 -S0 -n 10000 -s 4096 -L 120 \
>     -d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 \
>     -d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 \
>     -d /mnt/scratch/6 -d /mnt/scratch/7
> 
> Result:
> 
>              wall      sys       create rate     Physical write IO
>              time      CPU       (avg files/s)   IOPS      Bandwidth
>              ------    ------    -------------   ------    ---------
> unpatched    6m56s     15m47s    24,000+/-500    26,000    130MB/s
> patched      5m06s     13m28s    32,800+/-600     1,500    180MB/s
> improvement  -26.44%   -14.68%   +36.67%         -94.23%   +38.46%
> 
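The improvement row can be re-derived as (patched - unpatched) / unpatched, with the times converted to seconds (6m56s = 416 s, 5m06s = 306 s, and so on). A small C check of that arithmetic, using hypothetical helper names:

```c
#include <assert.h>

/* Convert an "XmYYs" duration from the table into seconds. */
static double model_secs(int minutes, int seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage change from the unpatched to the patched measurement. */
static double model_pct_change(double unpatched, double patched)
{
	assert(unpatched != 0.0);
	return (patched - unpatched) / unpatched * 100.0;
}

/* True if a is within tol of b; the table's figures are rounded to
 * two decimal places, so a tolerance of 0.01 is enough to compare. */
static int model_close_to(double a, double b, double tol)
{
	double d = a - b;
	return d < tol && d > -tol;
}
```

For example, `model_pct_change(model_secs(6, 56), model_secs(5, 6))` is -110 / 416 * 100, which rounds to the -26.44% in the table.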
> If I use zero-length files, this workload runs at about 500 IOPS, so
> plugging drops the data IOs from roughly 25,500/s to 1,000/s.
> Three lines of code, roughly 35% better throughput for 15% less CPU.
> 
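The data-IO figures follow from subtracting the zero-length-file rate, on the assumption that it approximates the metadata component of both 4k-file runs: 26,000 - 500 = 25,500 and 1,500 - 500 = 1,000, roughly a 25x reduction. As a sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Data-write IOPS, assuming the zero-length-file run (about 500 IOPS)
 * approximates the metadata-only component of the 4k-file run. */
static long model_data_iops(long total_iops, long metadata_iops)
{
	assert(total_iops >= metadata_iops);
	return total_iops - metadata_iops;
}
```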
> The benefits of plugging at this layer are likely to be higher for
> spinning media, as the IO patterns for this workload are going to make
> a much bigger difference on high-IO-latency devices.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

  Just one question: won't this cause a regression when files are, say,
2 MB in size? Then we generate maximum-sized requests for these files with
per-inode plugging anyway, and they will needlessly sit in the plug list
until it fills up (that is, after 16 requests). Granted, the delay
shouldn't be long, but with fast storage it may be measurable...
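
A rough model of this concern, assuming a hypothetical 512 KiB maximum request size (so a 2 MB file becomes four requests that cannot merge further) and a flush once the plug list holds 16 requests, as stated above. Delay is counted in later request submissions rather than in time, and `model_max_wait()` is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Plug-list flush threshold: "after 16 requests", as stated above. */
#define MODEL_PLUG_FLUSH_COUNT 16

/*
 * Requests are assumed to be maximum-sized already, so none merge.
 * The plug is flushed when it holds MODEL_PLUG_FLUSH_COUNT requests or
 * when the current plug scope ends (every reqs_per_scope requests, or at
 * the last request).  Returns the largest number of later submissions
 * any request sat behind in the plug list.
 */
static size_t model_max_wait(size_t total_reqs, size_t reqs_per_scope)
{
	size_t pending = 0;	/* requests currently in the plug list */
	size_t in_scope = 0;	/* requests submitted in this plug scope */
	size_t max_wait = 0;

	assert(reqs_per_scope > 0);
	for (size_t i = 0; i < total_reqs; i++) {
		pending++;
		in_scope++;
		int scope_end = (in_scope == reqs_per_scope) ||
				(i + 1 == total_reqs);

		if (pending == MODEL_PLUG_FLUSH_COUNT || scope_end) {
			/* the oldest plugged request waited for pending - 1
			 * later submissions before this flush */
			if (pending - 1 > max_wait)
				max_wait = pending - 1;
			pending = 0;
		}
		if (scope_end)
			in_scope = 0;	/* like blk_finish_plug(); new scope */
	}
	return max_wait;
}
```

For 16 such files (64 requests), per-inode plugging (a plug scope of 4 requests) holds a request behind at most 3 later submissions, while a single plug around the whole loop holds it behind up to 15.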

Now, if we have a maximum-sized request in the plug list, maybe we could
just dispatch it right away, but that's another story.
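
The idea of dispatching a maximum-sized request right away could be modelled as below. This is a hypothetical policy sketch, not existing block-layer behavior: requests flagged as maximum-sized bypass the plug, and everything else waits until 16 requests are plugged or the plug is finished. The function name is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PLUG_FLUSH_COUNT 16	/* assumed plug-list flush threshold */

/*
 * Hypothetical policy: a request that is already maximum-sized cannot grow
 * by merging, so it is dispatched immediately instead of being plugged.
 * Smaller requests still wait in the plug list until it holds
 * MODEL_PLUG_FLUSH_COUNT requests, or until the plug is finished after
 * the last submission.  Returns the longest wait, counted in later
 * submissions, of any request.
 */
static size_t model_max_wait_early_dispatch(const bool *is_max_sized, size_t n)
{
	size_t pending = 0;	/* small requests in the plug list */
	size_t oldest = 0;	/* submission index of the oldest one */
	size_t max_wait = 0;

	for (size_t i = 0; i < n; i++) {
		if (is_max_sized[i])
			continue;	/* dispatched at once: waits 0 */
		if (pending == 0)
			oldest = i;
		pending++;
		if (pending == MODEL_PLUG_FLUSH_COUNT) {
			if (i - oldest > max_wait)
				max_wait = i - oldest;
			pending = 0;
		}
	}
	/* final flush when the plug is finished after submission n - 1 */
	if (pending > 0 && n - 1 - oldest > max_wait)
		max_wait = n - 1 - oldest;
	return max_wait;
}
```

When every request is maximum-sized the longest wait drops to 0; small requests still wait exactly as they would in a plain plug.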


							Honza
> ---
>  fs/fs-writeback.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 68851ff..1d23d9a 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -589,7 +589,9 @@ static long writeback_sb_inodes(struct super_block *sb,
>  	unsigned long start_time = jiffies;
>  	long write_chunk;
>  	long wrote = 0;  /* count both pages and inodes */
> +	struct blk_plug plug;
>  
> +	blk_start_plug(&plug);
>  	while (!list_empty(&wb->b_io)) {
>  		struct inode *inode = wb_inode(wb->b_io.prev);
>  
> @@ -686,6 +688,7 @@ static long writeback_sb_inodes(struct super_block *sb,
>  				break;
>  		}
>  	}
> +	blk_finish_plug(&plug);
>  	return wrote;
>  }
>  
> -- 
> 1.8.3.2
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 32+ messages
2013-07-31  4:15 [PATCH 00/11] Sync and VFS scalability improvements Dave Chinner
2013-07-31  4:15 ` [PATCH 01/11] writeback: plug writeback at a high level Dave Chinner
2013-07-31 14:40   ` Jan Kara [this message]
2013-08-01  5:48     ` Dave Chinner
2013-08-01  8:34       ` Jan Kara
2013-07-31  4:15 ` [PATCH 02/11] inode: add IOP_NOTHASHED to avoid inode hash lock in evict Dave Chinner
2013-07-31 14:44   ` Jan Kara
2013-08-01  8:12   ` Christoph Hellwig
2013-08-02  1:11     ` Dave Chinner
2013-08-02 14:32       ` Christoph Hellwig
2013-07-31  4:15 ` [PATCH 03/11] inode: convert inode_sb_list_lock to per-sb Dave Chinner
2013-07-31 14:48   ` Jan Kara
2013-07-31  4:15 ` [PATCH 04/11] sync: serialise per-superblock sync operations Dave Chinner
2013-07-31 15:12   ` Jan Kara
2013-07-31  4:15 ` [PATCH 05/11] inode: rename i_wb_list to i_io_list Dave Chinner
2013-07-31 14:51   ` Jan Kara
2013-07-31  4:15 ` [PATCH 06/11] bdi: add a new writeback list for sync Dave Chinner
2013-07-31 15:11   ` Jan Kara
2013-08-01  5:59     ` Dave Chinner
2013-07-31  4:15 ` [PATCH 07/11] writeback: periodically trim the writeback list Dave Chinner
2013-07-31 15:15   ` Jan Kara
2013-08-01  6:16     ` Dave Chinner
2013-08-01  9:03       ` Jan Kara
2013-07-31  4:15 ` [PATCH 08/11] inode: convert per-sb inode list to a list_lru Dave Chinner
2013-08-01  8:19   ` Christoph Hellwig
2013-08-02  1:06     ` Dave Chinner
2013-07-31  4:15 ` [PATCH 09/11] fs: Use RCU lookups for inode cache Dave Chinner
2013-07-31  4:15 ` [PATCH 10/11] list_lru: don't need node lock in list_lru_count_node Dave Chinner
2013-07-31  4:15 ` [PATCH 11/11] list_lru: don't lock during add/del if unnecessary Dave Chinner
2013-07-31  6:48 ` [PATCH 00/11] Sync and VFS scalability improvements Sedat Dilek
2013-08-01  6:19   ` Dave Chinner
2013-08-01  6:31     ` Sedat Dilek
