From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-btrfs-owner@vger.kernel.org>
Received: from aserp1040.oracle.com ([141.146.126.69]:24474 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751553Ab3KZHbZ (ORCPT
	<rfc822;linux-btrfs@vger.kernel.org>);
	Tue, 26 Nov 2013 02:31:25 -0500
Date: Tue, 26 Nov 2013 15:31:10 +0800
From: Liu Bo <bo.li.liu@oracle.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: Chris Mason <chris.mason@fusionio.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3 00/17] Replace btrfs_workers with kernel workqueue
 based btrfs_workqueue_struct
Message-ID: <20131126073109.GD29771@localhost.localdomain>
Reply-To: bo.li.liu@oracle.com
References: <1383803527-23736-1-git-send-email-quwenruo@cn.fujitsu.com>
 <20131107175456.3802.35292@localhost.localdomain>
 <5293FBEF.6050309@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <5293FBEF.6050309@cn.fujitsu.com>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: <linux-btrfs.vger.kernel.org>

On Tue, Nov 26, 2013 at 09:39:59AM +0800, Qu Wenruo wrote:
> On Thu, 7 Nov 2013 12:54:56 -0500, Chris Mason wrote:
> >Quoting Qu Wenruo (2013-11-07 00:51:50)
> >>Add a new btrfs_workqueue_struct, which uses the kernel workqueue to implement
> >>most of the original btrfs_workers functionality, to replace btrfs_workers.
> >>
> >>With this patchset, redundant workqueue code is replaced with the kernel
> >>workqueue infrastructure, which reduces not only the code size but also the
> >>effort to maintain it.
> >>
> >>More performance tests are ongoing; the results from sysbench show a minor
> >>improvement on the following server:
> >>CPU: two-way Xeon X5660
> >>RAM: 4G
> >>HDD: SAS HDD, 150G total, 40G partition for btrfs test
> >>
> >>Test result:
> >>Mode|Num_threads|block size|extra flags|performance change vs 3.11 kernel
> >>rndrd   1       4K      none            +1.22%
> >>rndrd   1       32K     none            +1.00%
> >>rndrd   8       32K     sync            +1.35%
> >>seqrd   8       4K      direct          +5.56%
> >>seqwr   8       4K      none            -1.26%
> >>seqwr   8       32K     sync            +1.20%
> >>
> >>Changes below 1% are not mentioned.
> >>Overall the patchset doesn't change the performance on HDD.
> >>
> >>Since more tests are needed, more test results are welcome.
> >Thanks for working on this, it's really good to move toward a single set
> >of workqueues in the kernel.
> >
> >Have you benchmarked with compression on?  Especially on modern
> >hardware, the crcs don't exercise the workqueues very much.
> >
> >-chris
> >
> The results with compression on are quite interesting.
> Overall there is a minor improvement in random read and
> mixed but still minor changes in sequential read.
> There are some impressive improvements and small regressions in random write,
> as well as some improvement in sequential write.
> 
> But overall, the test results with compression are not as stable as the
> ones without compression (some results can vary by up to 15%
> on the same kernel),
> and the results seem good overall, even with regressions in some tests.
> 
> I think the test machine should be modern enough; its configuration is as follows:
> CPU: Two way Xeon X5660  @ 2.80GHz(24 cores when full load)
> RAM: 4G(with mem=4G in kernel cmdline, physical RAM is 8G)
> HDD: SAS 150G HDD, test btrfs partition is 40G
> 
> The detailed test results are as follows (only changes over 1%
> are listed):
> 
> Mode|Num_threads|block size|extra flags|performance change vs 3.11 kernel
> rndrd	1	32K	async		+1.98%
> rndrd	1	32K	none		+2.77%
> rndrd	8	4K	async		+5.16%
> rndrd	8	4K	none		+5.57%
> rndrd	8	32K	async		+5.11%
> seqrd	1	4K	none		+3.84%
> seqrd	1	32K	async		-2.84%
> seqrd	1	32K	none		+1.87%
> seqrd	8	4K	none		+4.75%
> seqrd	8	32K	async		+1.02%
> seqrd	8	32K	none		-1.38%
> rndwr	1	4K	direct		-7.84%
> rndwr	1	4K	none		+30.21% (*1)
> rndwr	1	32K	async		-7.84%
> rndwr	1	32K	none		-1.59%
> rndwr	8	4K	async		+32.60% (*2)
> rndwr	8	4K	none		+20.34% (*3)
> rndwr	8	32K	async		+1.06%
> rndwr	8	32K	none		-14.64% (*4)
> seqwr	1	4K	async		-1.87%
> seqwr	1	4K	none		+4.65%
> seqwr	1	32K	async		+1.72%
> seqwr	1	32K	none		+9.65%
> seqwr	8	4K	async		+6.47%
> seqwr	8	4K	none		-6.38%
> seqwr	8	32K	async		+15.14%
> seqwr	8	32K	none		+9.38%
> 
> *1: On the original kernel the throughput varies between 35~45 MBytes/s.
> On the patched kernel, the result tends to be about 70 MBytes/s (about 50% chance),
> but sometimes it drops to 35~45 MBytes/s (also about 50% chance).
> 
> *2: Much like *1: with the patched kernel, the result is more unstable and has a high chance of
> being better. Even the worst result with the patched kernel is still on par
> with the original kernel.
> 
> *3: Much like *1 and *2, but this time the original kernel also has a chance of getting a better result,
> although that chance is much smaller than with the patched kernel.
> 
> *4: Sadly, this time the patched kernel is more unstable and has a high chance of getting a worse result.
> 
> *1~*4 differ only in the chance of unstable good/bad results; the stable results seem on par.

Can you verify whether this is caused by overcommit?

I'm not sure whether 40G is large enough for the metadata that gets created.

thanks,
-liubo