From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: how long should "btrfs device delete missing ..." take?
Date: Thu, 11 Sep 2014 19:31:56 +0000 (UTC)

Tomasz Chmielewski posted on Thu, 11 Sep 2014 17:22:15 +0200 as excerpted:

> After a disk died and was replaced, "btrfs device delete missing" is
> taking more than 10 days on an otherwise idle server:
>
> # btrfs fi show /home
> Label: none  uuid: 84d087aa-3a32-46da-844f-a233237cf04f
>         Total devices 3 FS bytes used 362.44GiB
>         devid 2 size 1.71TiB used 365.03GiB path /dev/sdb4
>         devid 3 size 1.71TiB used 58.00GiB path /dev/sda4
>         *** Some devices missing
>
> Btrfs v3.16
>
> So far, it has copied 58 GB out of 365 GB - and it took 10 days. At
> this speed, the whole operation will take 2-3 months (assuming that
> the only healthy disk doesn't die in the meantime).
>
> Is this expected time for btrfs RAID-1?

Device delete definitely takes time. For the sub-TiB usage shown above,
10 days for 58 GiB out of 365 does seem excessive, but there are extreme
cases where it isn't entirely out of line.
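To put that in perspective, here's a rough sanity check using only the figures quoted above (58 GiB copied in about 10 days, out of roughly 365 GiB allocated):

```shell
# Back-of-the-envelope arithmetic on the reported figures.
copied_gib=58
total_gib=365
days=10

# Average copy rate in KiB/s (integer arithmetic, so truncated).
rate_kib_s=$(( copied_gib * 1024 * 1024 / (days * 86400) ))
# Projected total duration at that rate, in days.
est_days=$(( total_gib * days / copied_gib ))

echo "~${rate_kib_s} KiB/s average; ~${est_days} days projected"
```

That works out to roughly 70 KiB/s and a ~62-day projection, which matches the "2-3 months" estimate in the report.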
See below.

> There are no errors in dmesg/smart, performance of both disks is fine:
>
> # btrfs sub list /home | wc -l
> 260
>
> I've tried running this on the latest 3.16.x kernel earlier, but since
> the progress was so slow, rebooted after about a week to see if the
> latest RC will be any faster.

The good thing is that once a block group is copied over, it should be
fine and won't need to be re-copied if the process is stopped over a
reboot and restarted on a new kernel, etc. The bad thing is that if I'm
interpreting your report correctly, that likely means 7+10=17 days for
that 58 gig. =:^(

Questions:

* Presumably most of those 260 subvolumes are snapshots, correct? What
  was your snapshotting schedule, and did you have old-snapshot cleanup
  (deletion) scheduled as well?

* Do you run with autodefrag, or was the system otherwise regularly
  defragged?

* Do you have large (GiB-plus) database or virtual machine image files
  on that filesystem? If so, had you properly set the NOCOW file
  attribute (chattr +C) on them, and were they on dedicated subvolumes?

200+ snapshots is somewhat high and could be part of the issue, tho
it's nothing like the extremes (thousands) we've seen posted in the
past. Were it me, I'd have tried deleting as many as possible before
the device delete missing, in order to simplify the process and
eliminate as much "extra" data as possible.

The real issue is going to be fragmentation, on spinning-rust drives.
Run filefrag on some of your gig-plus files that get written to
frequently (VM images and database files are the classic cases) and see
how many extents are reported. (Note that filefrag doesn't understand
btrfs compression and won't be accurate in that case. Also, because the
btrfs data-chunk size of 1 GiB caps the extent size, multi-gig files
will typically show two extents more than their number of gigs: the
remainder of the current chunk, N whole-gig chunks, and the file tail.)
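As a sketch of that check (the path below is a made-up example; filefrag comes from e2fsprogs):

```shell
# Check fragmentation of a large, frequently-rewritten file.
f=/home/vm/disk.img          # hypothetical path; substitute your own
if [ -e "$f" ]; then
  filefrag "$f"              # reports the extent count for the file
fi

# Rule of thumb from above: the 1 GiB btrfs data-chunk size caps extent
# length, so even an unfragmented N-GiB file reports about N+2 extents
# (rest of the current chunk + N whole-gig chunks + the file tail).
min_extents() { echo $(( $1 + 2 )); }
echo "baseline for a 5 GiB file: $(min_extents 5) extents"
```

Anything orders of magnitude above that baseline (tens of thousands of extents on a multi-gig file is common for unprotected VM images) is the kind of fragmentation that makes a device delete crawl.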
The NOCOW file attribute (which must be set while the file is zero-sized
to be effective; see discussion elsewhere) can help with that, but
snapshotting an actively rewritten nocow file more or less defeats the
purpose of nocow, since the snapshot locks the existing data in place
and the first rewrite to each block must then be COWed anyway. Putting
those files on dedicated subvolumes and then not snapshotting those
subvolumes is a workaround, however.

I wouldn't try defragging now, but it might be worthwhile to stop the
device delete (rebooting to do so, since I don't think there's a
cancel) and delete as many snapshots as possible. That should help
matters. Additionally, if you have recent backups of highly fragmented
files such as the VM images and DBs I mentioned, you might consider
simply deleting them, thus eliminating that fragment processing from
the device delete. I don't know that making a backup now would go much
faster than the device delete, however, so I don't know whether to
recommend that or not.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
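P.S. A dry-run sketch of the snapshot pruning I suggested above. The "snap-" path prefix and the sample listing are assumptions for illustration; in practice you would pipe real `btrfs subvolume list /home` output through the awk filter, eyeball the printed commands, and only then run them:

```shell
# Stand-in for `btrfs subvolume list /home`; sample lines are assumed,
# in the format that command emits.
list_snapshots() {
  cat <<'EOF'
ID 257 gen 100 top level 5 path snap-2014-08-01
ID 258 gen 101 top level 5 path snap-2014-08-02
ID 259 gen 102 top level 5 path data
EOF
}

# Print, but do not run, one delete command per matching snapshot.
cmds=$(list_snapshots | awk '/path snap-/ { print "btrfs subvolume delete /home/" $NF }')
echo "$cmds"
```

Only the snapshot paths match the pattern, so the ordinary "data" subvolume is left alone.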