From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73169C7EE2A for ; Wed, 17 May 2023 00:05:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229751AbjEQAFE (ORCPT ); Tue, 16 May 2023 20:05:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229638AbjEQAFB (ORCPT ); Tue, 16 May 2023 20:05:01 -0400 Received: from mail-pg1-x52b.google.com (mail-pg1-x52b.google.com [IPv6:2607:f8b0:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40D31527B for ; Tue, 16 May 2023 17:04:57 -0700 (PDT) Received: by mail-pg1-x52b.google.com with SMTP id 41be03b00d2f7-51b33c72686so26471a12.1 for ; Tue, 16 May 2023 17:04:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20221208.gappssmtp.com; s=20221208; t=1684281897; x=1686873897; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=5HKIHNG8XkdEXJw7znBRUctCr+tvvKFVnG6cPSoxeb4=; b=f3o3e6QlYT0Mex0hP4/2PXdqAtIzknfsw81yougpXlAEb5w7EEEWla+STUODF0ytSi yCGQKyvN2yV0xTRKX62Z/Bn2hxUEUPlJXtp+eg7NahOIEZHnHyy5lmin5rtzDuwLGNxT Tza/H/te7bWZasnw+V13H+HoNWNsz1p88i+81NOvYSLxG1Os4DDdYYl82+jgxg7P5BfF ij98eQZ2QLadKlBgqyHN4RMmrxMKLiRgFfrg+EMpldnw5S0b/gBZx5KeXFyFJl2+WX8d 97/1p/3qbe0/PdKLZJZHHLrscUO7H0VKwwXYp04g3ycqa26XGa6c3SbYw1mlc745HDAZ yuCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684281897; x=1686873897; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5HKIHNG8XkdEXJw7znBRUctCr+tvvKFVnG6cPSoxeb4=; b=AL41z4SNmUhwcFRCiz03mhOe4On3JnX7KSAWIhrZsmgchHfKJ5LzYxxw1tQt6xg7OO tdaCJYwyPCGNpWZAAOfN2ChPojA+2U9upoDMGMUYLFwjsxxbSLYSZqWex7KEWF0XbLOs kPXZ+VIIJxkHNHLGERuCFpfya6/wsMwokTP8zjrWxU3ZX1BU1flIHD3J3eWeJ+RTImPd UuVR3Bs7hp/CDe5DKboIl6jPDNHswyJkuum8kXEvZTAXUG43SwSJ0kmiMVCZO1pL9m4z H/0Is9Ulcldlcnx6npJ98bo90Am0rWq8m+SNFdazrGZwChj0YCs/LxwAQ/a2w7yzuE7e r7gA== X-Gm-Message-State: AC+VfDyUufs5PEfsBRlkTa2TRbMoVbmiIYtc95edhL8e6D5iFQtMn58m 6y13EnJh/0HPqYLmZflXv4hwtIlChqNr0qJsJmI= X-Google-Smtp-Source: ACHHUZ5EIzS2CQqrhDD+TufYItl0TDCZG3XYZSNSEexsBAbsHRCjT3KsmAbwG5uNCYxVSVQJvzyBNQ== X-Received: by 2002:a05:6a20:72a9:b0:f2:57da:7f45 with SMTP id o41-20020a056a2072a900b000f257da7f45mr47555448pzk.8.1684281896644; Tue, 16 May 2023 17:04:56 -0700 (PDT) Received: from dread.disaster.area (pa49-179-0-188.pa.nsw.optusnet.com.au. [49.179.0.188]) by smtp.gmail.com with ESMTPSA id f24-20020a17090ace1800b002405d3bbe42sm184690pju.0.2023.05.16.17.04.55 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 May 2023 17:04:55 -0700 (PDT) Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1pz4ey-000Lbo-37 for linux-xfs@vger.kernel.org; Wed, 17 May 2023 10:04:52 +1000 Received: from dave by devoid.disaster.area with local (Exim 4.96) (envelope-from ) id 1pz4ey-00Gu85-1U for linux-xfs@vger.kernel.org; Wed, 17 May 2023 10:04:52 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 1/4] xfs: buffer pins need to hold a buffer reference Date: Wed, 17 May 2023 10:04:46 +1000 Message-Id: <20230517000449.3997582-2-david@fromorbit.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230517000449.3997582-1-david@fromorbit.com> References: <20230517000449.3997582-1-david@fromorbit.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun Signed-off-by: Dave Chinner --- fs/xfs/xfs_buf_item.c | 88 ++++++++++++++++++++++++++++++++----------- 1 file changed, 65 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index df7322ed73fa..b2d211730fd2 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -452,10 +452,18 @@ xfs_buf_item_format( * This is called to pin the buffer associated with the buf log item in memory * so it cannot be written out. * - * We also always take a reference to the buffer log item here so that the bli - * is held while the item is pinned in memory. This means that we can - * unconditionally drop the reference count a transaction holds when the - * transaction is completed. + * We take a reference to the buffer log item here so that the BLI life cycle + * extends at least until the buffer is unpinned via xfs_buf_item_unpin() and + * inserted into the AIL. + * + * We also need to take a reference to the buffer itself as the BLI unpin + * processing requires accessing the buffer after the BLI has dropped the final + * BLI reference. See xfs_buf_item_unpin() for an explanation. + * If unpins race to drop the final BLI reference and only the + * BLI owns a reference to the buffer, then the loser of the race can have the + * buffer fgreed from under it (e.g. on shutdown). Taking a buffer reference per + * pin count ensures the life cycle of the buffer extends for as + * long as we hold the buffer pin reference in xfs_buf_item_unpin(). */ STATIC void xfs_buf_item_pin( @@ -470,13 +478,30 @@ xfs_buf_item_pin( trace_xfs_buf_item_pin(bip); + xfs_buf_hold(bip->bli_buf); atomic_inc(&bip->bli_refcount); atomic_inc(&bip->bli_buf->b_pin_count); } /* - * This is called to unpin the buffer associated with the buf log item which - * was previously pinned with a call to xfs_buf_item_pin(). + * This is called to unpin the buffer associated with the buf log item which was + * previously pinned with a call to xfs_buf_item_pin(). We enter this function + * with a buffer pin count, a buffer reference and a BLI reference. + * + * We must drop the BLI reference before we unpin the buffer because the AIL + * doesn't acquire a BLI reference whenever it accesses it. Therefore if the + * refcount drops to zero, the bli could still be AIL resident and the buffer + * submitted for I/O at any point before we return. This can result in IO + * completion freeing the buffer while we are still trying to access it here. + * This race condition can also occur in shutdown situations where we abort and + * unpin buffers from contexts other that journal IO completion. + * + * Hence we have to hold a buffer reference per pin count to ensure that the + * buffer cannot be freed until we have finished processing the unpin operation. + * The reference is taken in xfs_buf_item_pin(), and we must hold it until we + * are done processing the buffer state. In the case of an abort (remove = + * true) then we re-use the current pin reference as the IO reference we hand + * off to IO failure handling. */ STATIC void xfs_buf_item_unpin( @@ -493,24 +518,18 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin(bip); - /* - * Drop the bli ref associated with the pin and grab the hold required - * for the I/O simulation failure in the abort case. We have to do this - * before the pin count drops because the AIL doesn't acquire a bli - * reference. Therefore if the refcount drops to zero, the bli could - * still be AIL resident and the buffer submitted for I/O (and freed on - * completion) at any point before we return. This can be removed once - * the AIL properly holds a reference on the bli. - */ freed = atomic_dec_and_test(&bip->bli_refcount); - if (freed && !stale && remove) - xfs_buf_hold(bp); if (atomic_dec_and_test(&bp->b_pin_count)) wake_up_all(&bp->b_waiters); - /* nothing to do but drop the pin count if the bli is active */ - if (!freed) + /* + * Nothing to do but drop the buffer pin reference if the BLI is + * still active + */ + if (!freed) { + xfs_buf_rele(bp); return; + } if (stale) { ASSERT(bip->bli_flags & XFS_BLI_STALE); @@ -522,6 +541,15 @@ xfs_buf_item_unpin( trace_xfs_buf_item_unpin_stale(bip); + /* + * The buffer has been locked and referenced since it was marked + * stale so we own both lock and reference exclusively here. We + * do not need the pin reference any more, so drop it now so + * that we only have one reference to drop once item completion + * processing is complete. + */ + xfs_buf_rele(bp); + /* * If we get called here because of an IO error, we may or may * not have the item on the AIL. xfs_trans_ail_delete() will @@ -538,16 +566,30 @@ xfs_buf_item_unpin( ASSERT(bp->b_log_item == NULL); } xfs_buf_relse(bp); - } else if (remove) { + return; + } + + if (remove) { /* - * The buffer must be locked and held by the caller to simulate - * an async I/O failure. We acquired the hold for this case - * before the buffer was unpinned. + * We need to simulate an async IO failures here to ensure that + * the correct error completion is run on this buffer. This + * requires a reference to the buffer and for the buffer to be + * locked. We can safely pass ownership of the pin reference to + * the IO to ensure that nothing can free the buffer while we + * wait for the lock and then run the IO failure completion. */ xfs_buf_lock(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_ioend_fail(bp); + return; } + + /* + * BLI has no more active references - it will be moved to the AIL to + * manage the remaining BLI/buffer life cycle. There is nothing left for + * us to do here so drop the pin reference to the buffer. + */ + xfs_buf_rele(bp); } STATIC uint -- 2.40.1