From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA911EB64D7 for ; Tue, 20 Jun 2023 05:47:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230448AbjFTFrO (ORCPT ); Tue, 20 Jun 2023 01:47:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjFTFrN (ORCPT ); Tue, 20 Jun 2023 01:47:13 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13E50120 for ; Mon, 19 Jun 2023 22:47:12 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id 98e67ed59e1d1-25e836b733eso2163661a91.0 for ; Mon, 19 Jun 2023 22:47:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20221208.gappssmtp.com; s=20221208; t=1687240031; x=1689832031; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Bs4HgfeqjgJt5nfZJr9dS3obRKN7AF2do59UTgm42Z4=; b=GWlkwYlzXExP8E6L0UAF0jGJDgm9vEnntYbKZOr+a2b3ErGtoW0W0ysHI4XvsRFUGc f5VhiobwLyeNgmqBd9t+o48UfB/djqczUPUK3hSX1wM5hCsMLQ4cdrZ5wYEmjVXmIIB3 WPISWkOSQUgUTwnKB83WHtXjGyOWGmVhPx0kOlxqhXFvVkTQyefYHrSKL4854i37qN4B 0SAR/b99QR5I20bMXf6eIAfTr8iZVrjyaosJKQmvP8MJjJCnV+JOW5s43F82PAdDWDEe NygNqQzv68PAyDmQ8B/wzfDphGH/usX807zPoHseAcMK1RvVThB76pydeZJOJYEs/Zj0 muag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687240031; x=1689832031; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Bs4HgfeqjgJt5nfZJr9dS3obRKN7AF2do59UTgm42Z4=; b=XuhYFkah24wKALnHRo/+5ekGk6mXlrSMmTr3/Gg77ECprAVBpW+k6fS4BjOBgS4fea zTiREOjO7WXYbnp+52vMp/N0IvR2hIcEAWHHlg/3jrBR9axgo65KRZFZgwoAYRyGpMLg dED/FMODu4DPc7mxf4PI1gM19v71bRJ9ywIPmZ1j2dNPS4KPzlsQG/2PEyYnjsF1JGX8 AZYKlBuMLukO6gX32Aq1GGU7BqoZ8GE8R8kBOpc38IcNe//zkgPc/xHi9dFk3jZyzBlD gv4psuoWC4K3DzCZ2oqnLC7OezWpH5V+78j6i6UikwddAeJcdtGvTRLvIeb+V4jryiOU 4dNA== X-Gm-Message-State: AC+VfDzAQf06Fln5s9oLYXIkMCVQkudBPf9EVdX1XxnpWc9+S1vju+4B LOsCxSSNnZs/pGvdLq5gurfmKDHD/NKDw462up8= X-Google-Smtp-Source: ACHHUZ7cB2FcpYfViv5BTYL0g/30ZwbVk/EjxZH5ei3GJ97IbJelegcqHR5DuFlTFGyMIP0WfyeSwA== X-Received: by 2002:a17:90a:c087:b0:25e:91ef:8b24 with SMTP id o7-20020a17090ac08700b0025e91ef8b24mr7653880pjs.24.1687240031510; Mon, 19 Jun 2023 22:47:11 -0700 (PDT) Received: from dread.disaster.area (pa49-180-13-202.pa.nsw.optusnet.com.au. [49.180.13.202]) by smtp.gmail.com with ESMTPSA id q21-20020a170902b11500b001b414fae374sm681613plr.291.2023.06.19.22.47.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Jun 2023 22:47:10 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1qBUCq-00DwEa-0I; Tue, 20 Jun 2023 15:47:08 +1000 Date: Tue, 20 Jun 2023 15:47:08 +1000 From: Dave Chinner To: "Darrick J. Wong" Cc: linux-xfs@vger.kernel.org Subject: Re: [PATCH 8/9] xfs: reap large AG metadata extents when possible Message-ID: References: <168506055606.3728180.16225214578338421625.stgit@frogsfrogsfrogs> <168506055733.3728180.3134566782464969180.stgit@frogsfrogsfrogs> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <168506055733.3728180.3134566782464969180.stgit@frogsfrogsfrogs> Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Thu, May 25, 2023 at 05:45:03PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > When we're freeing extents that have been set in a bitmap, break the > bitmap extent into multiple sub-extents organized by fate, and reap the > extents. This enables us to dispose of old resources more efficiently > than doing them block by block. > > While we're at it, rename the reaping functions to make it clear that > they're reaping per-AG extents. > > Signed-off-by: Darrick J. Wong > --- ..... > +xreap_agextent_binval( > + struct xreap_state *rs, > + xfs_agblock_t agbno, > + xfs_extlen_t *aglenp) > { > - struct xfs_buf *bp = NULL; > - int error; > + struct xfs_scrub *sc = rs->sc; > + struct xfs_perag *pag = sc->sa.pag; > + struct xfs_mount *mp = sc->mp; > + xfs_agnumber_t agno = sc->sa.pag->pag_agno; > + xfs_agblock_t agbno_next = agbno + *aglenp; > + xfs_agblock_t bno = agbno; > > /* > - * If there's an incore buffer for exactly this block, invalidate it. > * Avoid invalidating AG headers and post-EOFS blocks because we never > * own those. > */ > - if (!xfs_verify_fsbno(sc->mp, fsbno)) > + if (!xfs_verify_agbno(pag, agbno) || > + !xfs_verify_agbno(pag, agbno_next - 1)) > return; > > /* > - * We assume that the lack of any other known owners means that the > - * buffer can be locked without risk of deadlocking. > + * If there are incore buffers for these blocks, invalidate them. We > + * assume that the lack of any other known owners means that the buffer > + * can be locked without risk of deadlocking. The buffer cache cannot > + * detect aliasing, so employ nested loops to scan for incore buffers > + * of any plausible size. > */ > - error = xfs_buf_incore(sc->mp->m_ddev_targp, > - XFS_FSB_TO_DADDR(sc->mp, fsbno), > - XFS_FSB_TO_BB(sc->mp, 1), XBF_BCACHE_SCAN, &bp); > - if (error) > - return; > - > - xfs_trans_bjoin(sc->tp, bp); > - xfs_trans_binval(sc->tp, bp); > + while (bno < agbno_next) { > + xfs_agblock_t fsbcount; > + xfs_agblock_t max_fsbs; > + > + /* > + * Max buffer size is the max remote xattr buffer size, which > + * is one fs block larger than 64k. > + */ > + max_fsbs = min_t(xfs_agblock_t, agbno_next - bno, > + xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX)); > + > + for (fsbcount = 1; fsbcount < max_fsbs; fsbcount++) { > + struct xfs_buf *bp = NULL; > + xfs_daddr_t daddr; > + int error; > + > + daddr = XFS_AGB_TO_DADDR(mp, agno, bno); > + error = xfs_buf_incore(mp->m_ddev_targp, daddr, > + XFS_FSB_TO_BB(mp, fsbcount), > + XBF_BCACHE_SCAN, &bp); > + if (error) > + continue; > + > + xfs_trans_bjoin(sc->tp, bp); > + xfs_trans_binval(sc->tp, bp); > + rs->invalidated++; Hmmm. O(N^2) brute force lookup walk to find any buffer at that specific daddr? That's going to have an impact on running systems by hammering the perag hash lock. I didn't know this was being done before I suggested XBF_ANY_MATCH, but now I'm wondering how can we even have multiple buffers in the cache at a given address? The whole point of the ASSERT() in the match function is to indicate this should not ever happen. i.e. xfs_buf_find_insert() uses rhashtable_lookup_get_insert_fast(), which will return an existing buffer only if it has a match length. The compare function used at insert time will throw an assert fail if any other buffer exists at that address that has a mismatched length that is not stale. There is no way to avoid that ASSERT check on insert. Hence, AFAICT, the only way we can get multiple buffers into the cache at the same daddr with different lengths is for all the existing buffers at that daddr to all be stale at insert time. In which case, why do we need to invalidate buffers that are already stale (i.e. been invalidated)? What am I missing here? i.e. this appears to cater for behaviour that doesn't currently exist in the buffer cache, and I'm not sure we even want to allow to occur in the buffer cache given that it generally indicates in-memory metadata corruption..... Hmmmm. What if the buffer was already stale? The lookup will then clear all the stale state from it, leave it with no valid contents which we then invalidate again and log a new buffer cancel item for it. What are the implications of that? (My brain is too full of stuff trying to understand what this code is doing to be able to think this through right now.) -Dave. -- Dave Chinner david@fromorbit.com