To: linux-btrfs@vger.kernel.org
From: Martin
Subject: Re: Btrfs filesystem freezing during snapshots
Date: Mon, 26 May 2014 16:20:55 +0100

On 26/05/14 13:28, David Bloquel wrote:
> Hi,
>
> I have a problem with my btrfs filesystem, which freezes when I am
> taking snapshots.
>
> I have a cron job that snapshots around 70 subvolumes every ten
> minutes. The subvolumes being snapshotted are the folders of the
> containers running in my virtual environment.
> These subvolumes are not that big (from 500MB to 10GB max, usually
> around 3GB), but there is a lot of IO on the filesystem because of
> the intensive use of the CTs and VMs.
>
> At some point the snapshot process becomes really slow. At first it
> snapshots around one folder per second, but after a while it can
> take 30 seconds or even a few minutes to snapshot a single subvolume.
> The subvolumes are very similar to each other in size and number of
> files, so there is no reason for one subvolume to take 1 second and
> another to take 3 minutes.
>
> Moreover, when my snapshot cron job is running, all my VMs and
> containers slow down until the whole filesystem freezes, which leaves
> the CTs and VMs frozen (a real problem for me).
>
> I can also see that my CPU load is really high during the process.
>
> When I look at dmesg there are a lot of messages of this kind:
>
> [96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[...]

That looks to be running on top of DRBD, which will add a network write
overhead (unless you are dangerously running asynchronously!). Hence you
will hit IO speed limits a little sooner...

However, I would guess that your primary problem is fragmentation
accumulating from adding ever more snapshots every 10 minutes for the
VMs/containers.

There are other people here far more practised than I, but some things
to try are:

Use "nocow" (chattr +C) for the VM images (and container images);

Try the btrfs autodefrag mount option (beware your IO speed limit vs.
the size of the files to be defragmented);

Avoid accumulating too many versions of any one snapshot.

Note also the "experimental" status of btrfs... I'm sure you will have
noticed the previous race problems with deleting snapshots.

Aside: I've held off from using kernels 3.12 and 3.13 due to curious
happenings on my test system. Kernel 3.14.4 is behaving well so far.

Hope that gives a few clues.

Good luck,
Martin
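To put a number on the "avoid accumulating too many versions" suggestion (this arithmetic and sketch are not part of the original reply): at 70 subvolumes every 10 minutes, the cron job creates 70 × 6 × 24 = 10,080 snapshots per day. The Python sketch below is a minimal illustration, assuming a hypothetical naming scheme "<subvolume>@<YYYYmmdd-HHMM>" in which the timestamp strings sort chronologically. It only decides which snapshots a "keep the newest N per subvolume" policy would prune; actually removing them (for example with `btrfs subvolume delete`) is deliberately left out.

```python
from collections import defaultdict

SUBVOLUMES = 70          # figure from the original report
SNAPSHOTS_PER_HOUR = 6   # one snapshot every 10 minutes


def snapshots_per_day(subvolumes: int, per_hour: int) -> int:
    """Number of new snapshots the cron job creates in 24 hours."""
    return subvolumes * per_hour * 24


def snapshots_to_prune(names: list[str], keep: int) -> list[str]:
    """Return the snapshots to delete, keeping the newest `keep` per subvolume.

    Assumes the hypothetical naming scheme "<subvolume>@<YYYYmmdd-HHMM>",
    so sorting the timestamp strings sorts them chronologically.
    """
    by_subvol: dict[str, list[str]] = defaultdict(list)
    for name in names:
        # split on the last "@" so subvolume names may themselves contain "@"
        subvol, _, stamp = name.rpartition("@")
        by_subvol[subvol].append(stamp)

    doomed: list[str] = []
    for subvol, stamps in sorted(by_subvol.items()):
        stamps.sort()
        # everything except the newest `keep` timestamps is pruned;
        # keep == 0 needs special handling because stamps[:-0] is empty
        old = stamps[:-keep] if keep > 0 else stamps
        doomed.extend(f"{subvol}@{stamp}" for stamp in old)
    return doomed


if __name__ == "__main__":
    print(snapshots_per_day(SUBVOLUMES, SNAPSHOTS_PER_HOUR))
    names = [  # hypothetical container snapshot names
        "ct101@20140526-1000",
        "ct101@20140526-1010",
        "ct101@20140526-1020",
        "ct102@20140526-1000",
    ]
    print(snapshots_to_prune(names, keep=2))
```

Keeping only a small, fixed number of snapshots per subvolume bounds how many versions btrfs has to track, whereas the unpruned schedule grows by roughly ten thousand snapshots a day.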