From: Wido den Hollander
Subject: Re: Higher OSD disk util due to RBD snapshots from Dumpling to Firefly
Date: Thu, 08 Jan 2015 08:55:20 +0100
Message-ID: <54AE37E8.5000004@42on.com>
References: <54A42280.60607@42on.com>
To: Dan van der Ster
Cc: ceph-devel

On 01/07/2015 05:51 PM, Dan van der Ster wrote:
> Hi Wido,
> I've been trying to reproduce this but haven't been able to yet.
>
> What I've tried so far is using fio rbd with a 0.80.7 client connected
> to a 0.80.7 cluster. I created a 10GB format 2 block device, then
> measured the 4k randwrite iops before and after creating snaps. I
> measured around 2000 iops to the image before any snapshots, then
> created 200 snapshots on the device and ran fio again. Initially the
> iops were low (I guess this is from the 4MB CoW resulting from the
> first 4k write to each underlying object). But eventually the speed
> stabilized at around 2000 iops again. Actually, the initial slowdown
> was the same whether I created 1 snapshot or 200.
>
> This was just a quick subjective test so far, since from your report I
> was expecting something obvious to stick out. But it appears pretty
> OK, no? Would you have expected something different from these tests?
>

Well, I'm not sure what to expect. But what I did notice is that when I
removed all the snapshots, the slow requests were gone and the disk
utilization on the OSDs dropped.
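A rough back-of-envelope sketch of the copy-on-write cost Dan suspects in his test. It rests on assumptions that are not stated in the thread: 4 MiB RADOS objects (the RBD default object size), that the first 4k write to each object after a snapshot copies the whole object, that "10GB" means 10 GiB, and that fio's random offsets can be treated as independent and uniform. The helpers `object_count`, `cow_amplification` and `expected_untouched_fraction` are hypothetical illustrations, not Ceph or fio APIs.

```python
"""Back-of-envelope model of the RBD copy-on-write cost after a snapshot.

Assumptions (hypothetical, not taken from the thread): 4 MiB objects,
one full-object copy on the first write to each object after a snapshot,
and independent, uniformly random write offsets.
"""

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def object_count(image_bytes: int, object_bytes: int = 4 * MIB) -> int:
    """Number of objects backing the image (ceiling division)."""
    return -(-image_bytes // object_bytes)


def cow_amplification(write_bytes: int = 4 * KIB, object_bytes: int = 4 * MIB) -> int:
    """How many times larger the object copy is than the write that triggers it."""
    return object_bytes // write_bytes


def expected_untouched_fraction(writes: int, objects: int) -> float:
    """Expected share of objects not yet written since the snapshot (so still
    owing a full-object copy) after `writes` independent uniform random writes."""
    return (1 - 1 / objects) ** writes


if __name__ == "__main__":
    n = object_count(10 * GIB)
    print(n)                    # 2560 objects in a 10 GiB image
    print(cow_amplification())  # 1024: a 4 MiB copy for a 4 KiB write
    for w in (1_000, 10_000, 50_000):
        # Share of objects that would still trigger a copy on their next write.
        print(w, round(expected_untouched_fraction(w, n), 3))
```

Under these assumptions, early writes after the snapshot mostly land on objects that still need a copy, and that share decays geometrically as writes accumulate, which would be consistent with the initial dip Dan saw followed by recovery to about 2000 iops. The model says nothing about why Firefly behaved differently from Dumpling.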
Wido

> Cheers,
> Dan
>
>
> On Wed, Dec 31, 2014 at 5:21 PM, Wido den Hollander wrote:
>> Hi,
>>
>> Last week I upgraded a 250 OSD cluster from Dumpling 0.67.10 to Firefly
>> 0.80.7, and after the upgrade there was a severe performance drop on the
>> cluster.
>>
>> It started raining slow requests after the upgrade, and most of them
>> included a 'snapc' in the request.
>>
>> That led me to investigate the RBD snapshots, and I found that a rogue
>> process had created ~1800 snapshots spread out over 200 volumes.
>>
>> One image even had 181 snapshots!
>>
>> As the snapshots weren't used, I removed them all, and after the snapshots
>> were removed the performance of the cluster came back to normal again.
>>
>> I'm wondering what changed between Dumpling and Firefly that caused
>> this. I saw OSDs constantly spiking to 100% disk util under Firefly,
>> where this didn't happen with Dumpling.
>>
>> Did something change in the way OSDs handle RBD snapshots that causes
>> them to generate more disk I/O?
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on