From: Martin Mailand
Subject: Re: Btrfs slowdown with ceph (how to reproduce)
Date: Tue, 24 Jan 2012 20:15:58 +0100
Message-ID: <4F1F036E.9030801@tuxadero.com>
References: <20120123181928.GA3724@localhost.localdomain> <20120123185040.GH4387@shiny>
In-Reply-To: <20120123185040.GH4387@shiny>
Reply-To: martin@tuxadero.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
To: Chris Mason, Josef Bacik, Christian Brunner, linux-btrfs@vger.kernel.org, ceph-devel@vger.kernel.org

Hi,

I tried the branch on one of my ceph OSDs, and it makes a big difference
in performance. The average request size stayed high, but after around an
hour the kernel crashed.

IOstat: http://pastebin.com/xjuriJ6J
Kernel trace: http://pastebin.com/SYE95GgH

-martin

On 23.01.2012 19:50, Chris Mason wrote:
> On Mon, Jan 23, 2012 at 01:19:29PM -0500, Josef Bacik wrote:
>> On Fri, Jan 20, 2012 at 01:13:37PM +0100, Christian Brunner wrote:
>>> As you might know, I have been seeing btrfs slowdowns in our ceph
>>> cluster for quite some time. Even with the latest btrfs code for 3.3
>>> I'm still seeing these problems. To make things reproducible, I've now
>>> written a small test that imitates ceph's behavior:
>>>
>>> On a freshly created btrfs filesystem (2 TB, mounted with
>>> "noatime,nodiratime,compress=lzo,space_cache,inode_cache") I open
>>> 100 files. After that I do random writes on these files, with a
>>> sync_file_range after each write (each write is 100 bytes) and an
>>> ioctl(BTRFS_IOC_SYNC) after every 100 writes.
>>>
>>> After approximately 20 minutes, write activity suddenly increases
>>> fourfold and the average request size decreases (see chart in the
>>> attachment).
>>>
>>> You can find iostat output here: http://pastebin.com/Smbfg1aG
>>>
>>> I hope you are able to track down the problem with the test
>>> program in the attachment.
>>
>> Ran it, saw the problem, tried the dangerdonteveruse branch in Chris's
>> tree, and formatted the fs with 64k node and leaf sizes, and the problem
>> appeared to go away. So, surprise surprise, fragmentation is biting us
>> in the ass. If you can, try running that branch with 64k node and leaf
>> sizes on your ceph cluster and see how that works out. Of course you
>> should only do that if you don't mind losing everything :). Thanks,
>
> Please keep in mind this branch is only out there for development, and
> it really might have huge flaws. Scrub doesn't work correctly with it
> right now, and the I/O error recovery code is probably broken too.
>
> Long term, though, I think the bigger block sizes are going to make a
> huge difference in these workloads.
>
> If you use the very dangerous code:
>
>   mkfs.btrfs -l 64k -n 64k /dev/xxx
>
> (-l is leaf size, -n is node size).
>
> 64K is the max right now; 32K may help just as much at a lower CPU cost.
>
> -chris
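
The test program Christian refers to was sent as an attachment and is not
included in this archive. A minimal sketch of the workload he describes
(100 open files, 100-byte random writes, sync_file_range after each write,
BTRFS_IOC_SYNC after every 100 writes) might look like the following; the
per-file size, file names, and the sync_file_range flags are assumptions,
not taken from the original program.

/*
 * Hypothetical reproducer sketch -- NOT Christian's original test program
 * (his attachment is not part of this archive).  File size, file names,
 * and the sync_file_range flags are assumptions.
 *
 * Build: gcc -O2 -o btrfs-smallwrites btrfs-smallwrites.c
 * Run:   ./btrfs-smallwrites /mnt/btrfs/testdir
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>

#ifndef BTRFS_IOC_SYNC
#define BTRFS_IOC_SYNC _IO(0x94, 8)     /* commit the current btrfs transaction */
#endif

#define NFILES    100                   /* "I open 100 files" */
#define WRITESZ   100                   /* "each write is 100 bytes" */
#define FILESIZE  (4 * 1024 * 1024)     /* assumed per-file size for the random offsets */

int main(int argc, char **argv)
{
	const char *dir = argc > 1 ? argv[1] : ".";
	int fds[NFILES];
	char buf[WRITESZ];
	char path[4096];
	long i, writes = 0;

	memset(buf, 'x', sizeof(buf));
	srandom((unsigned int)time(NULL));

	for (i = 0; i < NFILES; i++) {
		snprintf(path, sizeof(path), "%s/testfile.%ld", dir, i);
		fds[i] = open(path, O_CREAT | O_RDWR, 0644);
		if (fds[i] < 0) {
			perror("open");
			return 1;
		}
	}

	/* Run until interrupted; the slowdown reportedly shows up after ~20 minutes. */
	for (;;) {
		int fd = fds[random() % NFILES];
		off_t off = (off_t)(random() % (FILESIZE / WRITESZ)) * WRITESZ;

		if (pwrite(fd, buf, WRITESZ, off) != WRITESZ) {
			perror("pwrite");
			return 1;
		}

		/* flush only the range we just dirtied (flag choice is a guess) */
		sync_file_range(fd, off, WRITESZ, SYNC_FILE_RANGE_WRITE);

		if (++writes % 100 == 0)
			ioctl(fds[0], BTRFS_IOC_SYNC);
	}
	return 0;
}

Running something along these lines against a directory on a freshly
created btrfs mount (with the mount options quoted above) while watching
iostat should show the drop in average request size that Christian reports.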