From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1697C7618E for ; Thu, 20 Apr 2023 22:54:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232719AbjDTWyY (ORCPT ); Thu, 20 Apr 2023 18:54:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231562AbjDTWyX (ORCPT ); Thu, 20 Apr 2023 18:54:23 -0400 Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 673333C06 for ; Thu, 20 Apr 2023 15:54:22 -0700 (PDT) Received: by mail-pf1-x435.google.com with SMTP id d2e1a72fcca58-63b87d23729so1403116b3a.0 for ; Thu, 20 Apr 2023 15:54:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20221208.gappssmtp.com; s=20221208; t=1682031262; x=1684623262; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=kZSbuoiPSP9G0UkYrhhATJ8XJbsqlTPc25y6VuDNBBU=; b=BXdN/s0wDns+blju+1CObEYpPdX0++IBchDldvurjIRMo7PxUPq0JG0nniSuhezvUF 3ozre2zIIb1OHYoHEFkYa7On5yds2N1A/+ORfiWuyj+v3FQGphMK2Lu/DaIKlRajZtfh 4SFmT5Hw3PQFRSjoUOPLh2nAA7GO4b9ZE/QBQ4rW/x8F++cmlqGdSBq7ZnzN2PQBoJmg gf7zO8pmrVWr0yBOpwU5ykMSkRpPv0VJTUyHqd8e78Dk91NsyIo9mj6JDyqTytVfwpjI ABSO4AsMjkmoFg+53jAUud3ZtkyI2EJurntLW11JpocZaQGMeEWkDI38NuQ3tme8y52c 2aDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682031262; x=1684623262; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kZSbuoiPSP9G0UkYrhhATJ8XJbsqlTPc25y6VuDNBBU=; b=bYimziKoaEhkq4wvVvp/Y6BV8Y/yT1n0LVoLRJkeMdoWbj4pe9ie2BNgbKiN+sFF5w y/7dqWbOKtVi9EG4MvgZd7cWRT+BQlX2aFTaFMVGaL81nfYgr9W2L6tq6BGLvES+S1CA cRpRZ7eEbG1TmlkLwz37JbmbTrp4VXh0IaIFWyZR23jLGH/4IVec8KkGlq4q/dj9qHvB eLNEYWZ0I42Czr8x5ykplB8dPHsbyyR4qaMqYqj9ISDo7EtHNGG21Z3gQacPEHCdjPIB NrC6Ame7/nkCZBj6dlcRddhc2Z0niJnUh4hvrSQULBAY0sMcqwvxb+FIp/gG4cCkyEXW mgPQ== X-Gm-Message-State: AAQBX9dMi8YNta4O7aN5KQKQtVBE23p1VZ79UwZqqQGISxv7Fn0Ugm7c i4zhsU8J5p6iT0PrX4EOlJOp0g== X-Google-Smtp-Source: AKy350YZ0/3HotQWDxefjmf+bI6pCOywKKW0sdCuF92oafYn1AR/lsRaQb2xcud+Ma9uUvzBYPutaQ== X-Received: by 2002:a05:6a00:1143:b0:63b:6149:7ad6 with SMTP id b3-20020a056a00114300b0063b61497ad6mr3716465pfm.34.1682031261819; Thu, 20 Apr 2023 15:54:21 -0700 (PDT) Received: from dread.disaster.area (pa49-180-41-174.pa.nsw.optusnet.com.au. [49.180.41.174]) by smtp.gmail.com with ESMTPSA id fd22-20020a056a002e9600b0063b6e3e5a39sm1734764pfb.52.2023.04.20.15.54.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 20 Apr 2023 15:54:21 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1ppdAQ-005pXv-5u; Fri, 21 Apr 2023 08:54:18 +1000 Date: Fri, 21 Apr 2023 08:54:18 +1000 From: Dave Chinner To: Wengang Wang Cc: "linux-xfs@vger.kernel.org" Subject: Re: [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents Message-ID: <20230420225418.GY3223426@dread.disaster.area> References: <20230414225836.8952-1-wen.gang.wang@oracle.com> <20230414225836.8952-3-wen.gang.wang@oracle.com> <20230420003050.GX3223426@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Thu, Apr 20, 2023 at 05:10:42PM +0000, Wengang Wang wrote: > > > > On Apr 19, 2023, at 5:30 PM, Dave Chinner wrote: > > > > On Fri, Apr 14, 2023 at 03:58:36PM -0700, Wengang Wang wrote: > >> At log recovery stage, we need to split EFIs with multiple extents. For each > >> orginal multiple-extent EFI, split it into new EFIs each including one extent > >> from the original EFI. By that we avoid deadlock when allocating blocks for > >> AGFL waiting for the held busy extents by current transaction to be flushed. > >> > >> For the original EFI, the process is > >> 1. Create and log new EFIs each covering one extent from the > >> original EFI. > >> 2. Don't free extent with the original EFI. > >> 3. Log EFD for the original EFI. > >> Make sure we log the new EFIs and original EFD in this order: > >> new EFI 1 > >> new EFI 2 > >> ... > >> new EFI N > >> original EFD > >> The original extents are freed with the new EFIs. > > > > We may not have the log space available during recovery to explode a > > single EFI out into many EFIs like this. The EFI only had enough > > space reserved for processing a single EFI, and exploding a single > > EFI out like this requires an individual log reservation for each > > new EFI. Hence this de-multiplexing process risks running out of log > > space and deadlocking before we've been able to process anything. > > > > Oh, yes, got it. > > > Hence the only option we really have here is to replicate how CUIs > > are handled. We must process the first extent with a whole EFD and > > a new EFI containing the remaining unprocessed extents as defered > > operations. i.e. > > > > 1. free the first extent in the original EFI > > 2. log an EFD for the original EFI > > 3. Add all the remaining extents in the original EFI to an xefi chain > > 4. Call xfs_defer_ops_capture_and_commit() to create a new EFI from > > the xefi chain and commit the current transaction. > > > > xfs_defer_ops_capture_and_commit() will then add a work item to the > > defered list which will come back to the new EFI and process it > > through the normal runtime deferred ops intent processing path. > > > > So you meant this? > > Orig EFI with extent1 extent2 extent3 > free first extent1 > Full EFD to orig EFI > transaction roll, > xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3 No. We do not need a transaction roll there if we rebuild a new xefi list with the remaining extents from the original efi. At that point, we call: xfs_defer_ops_capture_and_commit() xfs_defer_ops_capture() xfs_defer_create_intents() for each tp->t_dfops ->create intent xfs_extent_free_create_intent() create new EFI walk each xefi and add it to the new intent xfs_trans_commit() i.e. xfs_defer_ops_capture_and_commit() can builds new EFI for us from the xefi list as part of defering the work that remains to be done. Once it has done that, it queues the remaining work and commits the transaction. Hence all we need to do in recovery of the first extent is free it, create the xefi list and log the full EFD. xfs_defer_ops_capture_and_commit() does the rest. > If so, I don’t think it’s safe. > Consider that case that kernel panic happened after the transaction roll, > during next log replay, the original EFI has the matching EFD, so this EFI > is ignored, but actually extent2 and extent3 are not freed. > > If you didn’t mean above, but instead this: > > Orig EFI with extent1 extent2 extent3 > free first extent1 > New EFI extent2 extent3 > Full EFD to orig EFI > transaction roll, > xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3 > > The problem will comeback to the log space issue, are we ensured we have > the space for the new EFI? Yes, because logging the full EFD cancels the original EFI and so it no longer consumes log space. hence we can log a new EFI using the space the original EFI consumed. > > The first patch changed that path to only create intents with a > > single extent, so the continued defer ops would then do the right > > thing with that change in place. However, I think that we also need > > the runtime code to process a single extent per intent per commit in > > the same manner as above. i.e. we process the first extent in the > > intent, then relog all the remaining unprocessed extents as a single > > new intent. > > > > Note that this is similar to how we already relog intents to roll > > them forward in the journal. The only difference for single extent > > processing is that an intent relog duplicates the entire extent list > > in the EFD and the new EFI, whilst what we want is the new EFI to > > contain all the extents except the one we just processed... > > > > The problem to me is that where we place the new EFI, it can’t be after the EFD. > I explained why above. Yes it can. The only thing that matters is that the EFD and new EFI are committed in the same transaction. Remember: transactions are atomic change sets - either all the changes in the transaction are replayed on recovery, or none of them are. -Dave. -- Dave Chinner david@fromorbit.com