From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 99B8DC433EF for ; Thu, 24 Mar 2022 05:24:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242166AbiCXF0G (ORCPT ); Thu, 24 Mar 2022 01:26:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53216 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229911AbiCXF0G (ORCPT ); Thu, 24 Mar 2022 01:26:06 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E262939BA for ; Wed, 23 Mar 2022 22:24:31 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9890260A56; Thu, 24 Mar 2022 05:24:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F03FBC340EC; Thu, 24 Mar 2022 05:24:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1648099470; bh=yTttBokpLAiVFeWg7ukJDuK3yZ3MpX6DE9wfuL3GXds=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=PhdMWv7ZRkyhT2jqnkN22zAquT5nAP5Ps9wnfy75y51kbApgU+b++s7JkcPyzTKTT ORE/eH8+ADPQVPFEWNRsowqAQPXtqQJWgs9p86zdeILwalshrhNKCWoR91GLTjbcIt vhgs2UXww444mMGzsBjmTsFhHobjs+Bd1CF54SMjhCTmjWtAWm+aiYjCyBcuLv2mCx ZetAPNQoVytmFSqeFtbC2pxTDWvoL57fHLnuUrmJU7S5z4Y9Gpe/BFGnH20PxhGNNK sXu3OK27/bjyxnjyTgNTVPFX9SQpsT2Ut4gKXDhRA3CzEShWNkBkTrBOz84Xh8np35 F1ywzhrkQp5DA== Date: Wed, 23 Mar 2022 22:24:29 -0700 From: "Darrick J. Wong" To: Dave Chinner , g@magnolia Cc: Brian Foster , linux-xfs@vger.kernel.org Subject: Re: [PATCH 4/6] xfs: fix infinite loop when reserving free block pool Message-ID: <20220324052429.GS8241@magnolia> References: <164779460699.550479.5112721232994728564.stgit@magnolia> <164779462946.550479.3987400627869935198.stgit@magnolia> <20220323211147.GX1544202@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220323211147.GX1544202@dread.disaster.area> Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Thu, Mar 24, 2022 at 08:11:47AM +1100, Dave Chinner wrote: > On Sun, Mar 20, 2022 at 09:43:49AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong > > > > Don't spin in an infinite loop trying to reserve blocks -- if we can't > > do it after 30 tries, we're racing with a nearly full filesystem, so > > just give up. > > > > Cc: Brian Foster > > Signed-off-by: Darrick J. Wong > > Reviewed-by: Brian Foster > > --- > > fs/xfs/xfs_fsops.c | 12 ++++++++++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > index 710e857bb825..615334e4f689 100644 > > --- a/fs/xfs/xfs_fsops.c > > +++ b/fs/xfs/xfs_fsops.c > > @@ -379,6 +379,7 @@ xfs_reserve_blocks( > > int64_t fdblks_delta = 0; > > uint64_t request; > > int64_t free; > > + unsigned int tries; > > int error = 0; > > > > /* If inval is null, report current values and return */ > > @@ -430,9 +431,16 @@ xfs_reserve_blocks( > > * If the request is larger than the current reservation, reserve the > > * blocks before we update the reserve counters. Sample m_fdblocks and > > * perform a partial reservation if the request exceeds free space. > > + * > > + * The loop body estimates how many blocks it can request from fdblocks > > + * to stash in the reserve pool. This is a classic TOCTOU race since > > + * fdblocks updates are not always coordinated via m_sb_lock. We also > > + * cannot tell if @free remaining unchanged between iterations is due > > + * to an idle system or freed blocks being consumed immediately, so > > + * we'll try a finite number of times to satisfy the request. > > */ > > error = -ENOSPC; > > - do { > > + for (tries = 0; tries < 30 && error == -ENOSPC; tries++) { > > free = percpu_counter_sum(&mp->m_fdblocks) - > > xfs_fdblocks_unavailable(mp); > > if (free <= 0) > > @@ -459,7 +467,7 @@ xfs_reserve_blocks( > > spin_unlock(&mp->m_sb_lock); > > error = xfs_mod_fdblocks(mp, -fdblks_delta, 0); > > spin_lock(&mp->m_sb_lock); > > - } while (error == -ENOSPC); > > + } > > So the problem here is that if we don't get all of the reservation, > we get none of it, so we try again. This seems like we should simply > punch the error back out to userspace and let it retry rather than > try a few times and then fail anyway. Either way, userspace has to > handle failure to reserve enough blocks. > > The other alternative here is that we just change the reserve block > limit when ENOSPC occurs and allow the xfs_mod_fdblocks() refill > code just fill up the pool as space is freed. That would greatly > simplify the code - we do a single attempt to reserve the new space, > and if it fails we don't care because the reserve space gets > refilled before free space is made available again. I think that's > preferable to having an arbitrary number of retries like this that's > going to end up failing anyway. How about I try the following tomorrow? spin_lock(...); mp->m_resblks = request; free = percpu_counter_sum(&mp->m_fdblocks) - xfs_fdblocks_unavailable(mp); if (request < mp->m_resblks_avail && free > 0) { ask = min(request - mp->m_resblks_avail, free); spin_unlock(...); error = xfs_mod_fdblocks(mp, -ask, 0); spin_lock(...); if (!error) mp->m_resblks_avail += ask; } spin_unlock(...); return error; --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com