Date: Fri, 9 Dec 2011 16:17:21 +0100
From: Christian Brunner
Reply-To: chb@muc.de
To: Alexandre Oliva
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 02/20] Btrfs: initialize new bitmaps' list

2011/12/7 Christian Brunner:
> 2011/12/1 Christian Brunner:
>> 2011/12/1 Alexandre Oliva:
>>> On Nov 29, 2011, Christian Brunner wrote:
>>>
>>>> When I'm doing heavy reading in our ceph cluster, the load and
>>>> wait-io on the patched servers is higher than on the unpatched
>>>> ones.
>>>
>>> That's unexpected.
>
> In the meantime I know that it's not related to the reads.
>
>>> I suppose I could wave my hands while explaining that you're getting
>>> higher data throughput, so it's natural that it would take up more
>>> resources, but that explanation doesn't satisfy me.  I suppose
>>> allocation might have got slightly more CPU intensive in some cases,
>>> as we now use bitmaps where before we'd only use the
>>> cheaper-to-allocate extents.  But that's unsatisfying as well.
>>
>> I must admit that I do not completely understand the difference
>> between bitmaps and extents.
>>
>> From what I see on my servers, I can tell that the degradation over
>> time is gone. (Rebooting the servers every day is no longer needed.
>> This is a real plus.) But compared to a freshly booted, unpatched
>> server, performance with my ceph workload is much worse.
>>
>> I wonder if it would make sense to initialize the list field only
>> when the cluster setup fails? This would avoid the fallback to
>> unclustered allocation and would still give us the
>> cheaper-to-allocate extents.
>
> I've now tried various combinations of your patches and I can really
> nail it down to this one line.
>
> With this patch applied I get much higher write-io values than
> without it. Some of the other patches help to reduce the effect, but
> it's still significant.
>
> iostat on an unpatched node is giving me:
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda      105.90    0.37  15.42  14.48  2657.33  560.13   107.61     1.89   62.75   6.26  18.71
>
> while on a node with this patch it's:
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda      128.20    0.97  11.10  57.15  3376.80  552.80    57.58    20.58  296.33   4.16  28.36
>
> Also interesting is the fact that the average request size on the
> patched node is much smaller.
>
> Josef was telling me that this could be related to the number of
> bitmaps we write out, but I've no idea how to trace this.
>
> I would be very happy if someone could give me a hint on what to do
> next, as this is one of the last remaining issues with our ceph
> cluster.

This is still bugging me, and I just remembered something that might
be helpful (though I hope it is not misleading): back in 2.6.38 we
were running ceph without any btrfs performance degradation.
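Before I go on: to make the "one line" above concrete, this is the
change as I understand it (I'm sketching it from memory, so the
surrounding lines may not match the actual patch exactly). It adds an
INIT_LIST_HEAD() to add_new_bitmap() in fs/btrfs/free-space-cache.c:

    static void add_new_bitmap(struct btrfs_free_space_ctl *ctl,
                               struct btrfs_free_space *info, u64 offset)
    {
            info->offset = offset_to_bitmap(ctl, offset);
            info->bytes = 0;
            INIT_LIST_HEAD(&info->list);    /* the one added line */
            link_free_space(ctl, info);
            ctl->total_bitmaps++;

            ctl->op->recalc_thresholds(ctl);
    }

If I read the code correctly, the entry is allocated zeroed, so
without the INIT_LIST_HEAD() the list field holds NULL pointers and
list_empty() reports it as non-empty. setup_cluster_no_bitmap() only
queues bitmap entries whose list field is empty, so before this change
bitmaps were effectively never offered to the cluster code, and with
it they are.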
I found a thread on the list where similar problems were reported:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10346.html

In that thread someone bisected the issue to this commit:

From 4e69b598f6cfb0940b75abf7e179d6020e94ad1e Mon Sep 17 00:00:00 2001
From: Josef Bacik
Date: Mon, 21 Mar 2011 10:11:24 -0400
Subject: [PATCH] Btrfs: cleanup how we setup free space clusters

That commit changed the bitmap handling, so I thought it might be
related. I'm still hoping that someone with a deeper understanding of
btrfs can take a look at this.

Thanks,
Christian
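P.S.: For anyone digging into this, my (possibly imperfect) reading of
how cluster setup has worked since that commit:
btrfs_find_space_cluster() first tries to build a cluster from plain
extents and only falls back to bitmaps if that fails, roughly like
this (paraphrased, not a verbatim quote of the kernel source):

    INIT_LIST_HEAD(&bitmaps);
    ret = setup_cluster_no_bitmap(block_group, cluster, &bitmaps,
                                  offset, bytes, min_bytes);
    if (ret)
            ret = setup_cluster_bitmap(block_group, cluster, &bitmaps,
                                       offset, bytes, min_bytes);

    /* drop the temporary list of candidate bitmaps */
    list_for_each_entry_safe(entry, tmp, &bitmaps, list)
            list_del_init(&entry->list);

If that reading is right, it would at least be consistent with what I
am seeing: once the list field is initialized, the bitmap fallback can
actually run, and allocations served from bitmaps might explain the
smaller average request size on the patched node.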