From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from verein.lst.de ([213.95.11.211]:58235 "EHLO newverein.lst.de"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1751023AbdAMRnC (ORCPT ); Fri, 13 Jan 2017 12:43:02 -0500
Date: Fri, 13 Jan 2017 18:43:00 +0100
From: Christoph Hellwig
Subject: Re: [PATCH, RFC] xfs: use per-AG reservations for the finobt
Message-ID: <20170113174300.GA6418@lst.de>
References: <1483107598-6779-1-git-send-email-hch@lst.de> <20170103192426.GA14031@birch.djwong.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170103192426.GA14031@birch.djwong.org>
Sender: linux-xfs-owner@vger.kernel.org
List-Id: xfs
To: "Darrick J. Wong"
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org, bfoster@redhat.com

On Tue, Jan 03, 2017 at 11:24:26AM -0800, Darrick J. Wong wrote:
> ...and so here we calculate the number of blocks needed to store the
> maximum number of finobt records possible for an AG. IIRC, each *inobt
> record refers to a single chunk of 64 inodes (or at least a theoretical
> chunk in the spinodes=1 case), so I think we can reduce the reservation
> to...
>
>     nr = m_sb.sb_agblocks * m_sb.sb_inopblock / XFS_INODES_PER_CHUNK;
>     return xfs_inobt_calc_size(mp, nr);
>
> ...right?

Yes, that should reduce the reservation quite a bit.

> This requires us to traverse all the blocks in the finobt at mount time,
> which isn't necessarily quick. For refcount/rmap we cache the number of
> tree blocks in the AGF to speed this up... but it was easy to sneak that
> into the disk format. :)

But for finobt it's too late to do that without another incompatible
feature flag.

> For finobt I wonder if one could defer the block counting work to a
> separate thread if the AG has enough free blocks to cover, say, 10x the
> maximum reservation? Though that could be racy and maybe finobts are
> small enough that the impact on mount time is low anyway?

Usually they are small.
And if they aren't - well that's life. I don't think async counting
for a reservation is a good idea. If we see a problem with the time
needed to count in practice we'll have to keep a count and introduce a
feature flag.

> There's also the unsolved problem of what happens if we mount and find
> agf_freeblks < (sum(ask) - sum(used)) -- right now we eat that state and
> hope that we don't later ENOSPC and crash.

Yes. Which is exactly the situation we would have without this patch
anyway..

> But as for retroactively adding AG reservations for an existing tree, I
> guess we'll have to come up with a strategy for dealing with
> insufficient free blocks. I suppose one could try to use xfs_fsr to
> move large contiguous extents to a less full AG, if there are any...

Eww. We could just fall back to the old code before this patch, which
would then eventually shut down..
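
For illustration only, the arithmetic behind the reduced reservation
discussed above can be sketched in standalone C. This is not code from
the patch: max_finobt_records() follows the quoted formula, while
btree_blocks() is a hypothetical stand-in for xfs_inobt_calc_size(),
whose actual implementation is not shown in this thread. The minimum
per-block record counts are assumed inputs, not values taken from the
kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for XFS_INODES_PER_CHUNK: one record covers 64 inodes. */
#define INODES_PER_CHUNK 64

/*
 * Upper bound on finobt records in one AG, as in the quoted formula:
 * assume every block of the AG holds inodes, then divide by chunk size.
 */
static uint64_t max_finobt_records(uint64_t agblocks, uint64_t inopblock)
{
    return agblocks * inopblock / INODES_PER_CHUNK;
}

/*
 * Hypothetical helper (NOT the kernel's xfs_inobt_calc_size): worst-case
 * block count for a btree holding nrecs records, assuming every leaf holds
 * only leaf_minrecs records and every node only node_minrecs pointers.
 */
static uint64_t btree_blocks(uint64_t nrecs, uint64_t leaf_minrecs,
                             uint64_t node_minrecs)
{
    uint64_t blocks = 0;
    uint64_t len = nrecs;
    uint64_t per_block = leaf_minrecs;

    /* node_minrecs >= 2 guarantees each level is smaller than the last. */
    assert(leaf_minrecs >= 1 && node_minrecs >= 2);

    if (nrecs == 0)
        return 1; /* sketch assumption: an empty tree still has a root */

    for (;;) {
        /* Blocks needed at this level, rounded up. */
        len = (len + per_block - 1) / per_block;
        blocks += len;
        if (len == 1)
            return blocks; /* reached the root */
        per_block = node_minrecs;
    }
}
```

As a worked case under assumed geometry (1048576 blocks per AG and 8
inodes per block), max_finobt_records() gives 131072 records. With
assumed minimum fills of 126 records per leaf and 252 pointers per
node, btree_blocks() then gives 1041 + 5 + 1 = 1047 blocks.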