From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1095EC433F5 for ; Sun, 3 Apr 2022 12:01:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349340AbiDCMD0 (ORCPT ); Sun, 3 Apr 2022 08:03:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235886AbiDCMDZ (ORCPT ); Sun, 3 Apr 2022 08:03:25 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:e::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF11D2E9D2 for ; Sun, 3 Apr 2022 05:01:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=FgMxoR1d+CZADER7W5bzGH90tel/fpt7LnR3WTfIajY=; b=Xz0D8zyvUmRFf1dy7UZuhShyEL KxOYguvJtwPwrsiqgFrh9cpPgTgtOcUZ9rYUscNSOod2Zs8UVT/WYrfKA7xmQ+FlmcVTGGl7ev2uU PQ8SsEB12AmVSd0AP7gmSsDsoDhKhHU0jxPVCbc3FQTLFofL84ikhh9MDAZNckWrveGjKIpjFcWqc Wz9Jwy5bR99VZNTBd6vGdK9GBTm5Uj8IqlmpwqAkkcgP66Q2IPV0fKWPreR4f3WA2IrbGudRkPaS/ Nde09TFzgBkZ6N9a3ZltxjtRQ/T0MY2qAb2tozHqijnVKf3dzEVaB3EcG1p8EnxQE0WfnH0JAv+D6 02Muxrsg==; Received: from [2001:4bb8:184:7553:31f9:976f:c3b1:7920] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nayvC-00BK3H-Ve; Sun, 03 Apr 2022 12:01:31 +0000 From: Christoph Hellwig To: linux-xfs@vger.kernel.org Cc: Dave Chinner Subject: [PATCH 3/5] xfs: remove a superflous hash lookup when inserting new buffers Date: Sun, 3 Apr 2022 14:01:17 +0200 Message-Id: <20220403120119.235457-4-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220403120119.235457-1-hch@lst.de> References: <20220403120119.235457-1-hch@lst.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org xfs_buf_get_map has a bit of a strange structure where the xfs_buf_find helper is called twice before we actually insert a new buffer on a cache miss. Given that the rhashtable has an interface to insert a new entry and return the found one on a conflict we can easily get rid of the double lookup by using that. To get a straight code path this requires folding the xfs_buf_find helper into xfs_buf_get_map, but the counter side we can now easily move the logic for the slow path cache insert into a new helper. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_buf.c | 207 +++++++++++++++++++++++------------------------ 1 file changed, 99 insertions(+), 108 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 0951b7cbd79675..ef645e15935369 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -504,12 +504,66 @@ xfs_buf_hash_destroy( rhashtable_destroy(&pag->pag_buf_hash); } +static int +xfs_buf_insert( + struct xfs_buftarg *btp, + struct xfs_buf_map *map, + int nmaps, + xfs_buf_flags_t flags, + struct xfs_perag *pag, + struct xfs_buf **bpp) +{ + int error; + struct xfs_buf *bp; + + error = _xfs_buf_alloc(btp, map, nmaps, flags, &bp); + if (error) + return error; + + /* + * For buffers that fit entirely within a single page, first attempt to + * allocate the memory from the heap to minimise memory usage. If we + * can't get heap memory for these small buffers, we fall back to using + * the page allocator. + */ + if (BBTOB(bp->b_length) >= PAGE_SIZE || + xfs_buf_alloc_kmem(bp, flags) < 0) { + error = xfs_buf_alloc_pages(bp, flags); + if (error) + goto out_free_buf; + } + + /* the buffer keeps the perag reference until it is freed */ + bp->b_pag = pag; + + spin_lock(&pag->pag_buf_lock); + *bpp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash, + &bp->b_rhash_head, xfs_buf_hash_params); + if (IS_ERR(*bpp)) { + error = PTR_ERR(*bpp); + goto out_unlock; + } + if (*bpp) { + /* found an existing buffer */ + atomic_inc(&(*bpp)->b_hold); + error = -EEXIST; + goto out_unlock; + } + spin_unlock(&pag->pag_buf_lock); + *bpp = bp; + return 0; + +out_unlock: + spin_unlock(&pag->pag_buf_lock); +out_free_buf: + xfs_buf_free(bp); + return error; +} + /* - * Look up a buffer in the buffer cache and return it referenced and locked - * in @found_bp. - * - * If @new_bp is supplied and we have a lookup miss, insert @new_bp into the - * cache. + * Assemble a buffer covering the specified range. The code is optimised for + * cache hits, as metadata intensive workloads will see 3 orders of magnitude + * more hits than misses. * * If XBF_TRYLOCK is set in @flags, only try to lock the buffer and return * -EAGAIN if we fail to lock it. @@ -517,27 +571,26 @@ xfs_buf_hash_destroy( * Return values are: * -EFSCORRUPTED if have been supplied with an invalid address * -EAGAIN on trylock failure - * -ENOENT if we fail to find a match and @new_bp was NULL - * 0, with @found_bp: - * - @new_bp if we inserted it into the cache - * - the buffer we found and locked. + * -ENOENT if we fail to find a match and alloc was %false + * 0 if a buffer was found or newly created */ -static int -xfs_buf_find( +int +xfs_buf_get_map( struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps, xfs_buf_flags_t flags, - struct xfs_buf *new_bp, - struct xfs_buf **found_bp) + struct xfs_buf **bpp) { + struct xfs_mount *mp = btp->bt_mount; struct xfs_perag *pag; struct xfs_buf *bp; struct xfs_buf_map cmap = { .bm_bn = map[0].bm_bn }; + int error = 0; xfs_daddr_t eofs; int i; - *found_bp = NULL; + *bpp = NULL; for (i = 0; i < nmaps; i++) cmap.bm_len += map[i].bm_len; @@ -550,54 +603,47 @@ xfs_buf_find( * Corrupted block numbers can get through to here, unfortunately, so we * have to check that the buffer falls within the filesystem bounds. */ - eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks); - if (cmap.bm_bn < 0 || cmap.bm_bn >= eofs) { - xfs_alert(btp->bt_mount, - "%s: daddr 0x%llx out of range, EOFS 0x%llx", + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks); + if (WARN_ON_ONCE(cmap.bm_bn < 0) || WARN_ON_ONCE(cmap.bm_bn >= eofs)) { + xfs_alert(mp, "%s: daddr 0x%llx out of range, EOFS 0x%llx", __func__, cmap.bm_bn, eofs); - WARN_ON(1); return -EFSCORRUPTED; } - pag = xfs_perag_get(btp->bt_mount, - xfs_daddr_to_agno(btp->bt_mount, cmap.bm_bn)); + pag = xfs_perag_get(mp, xfs_daddr_to_agno(mp, cmap.bm_bn)); spin_lock(&pag->pag_buf_lock); bp = rhashtable_lookup_fast(&pag->pag_buf_hash, &cmap, xfs_buf_hash_params); - if (bp) { + if (bp) atomic_inc(&bp->b_hold); - goto found; - } - - /* No match found */ - if (!new_bp) { - XFS_STATS_INC(btp->bt_mount, xb_miss_locked); - spin_unlock(&pag->pag_buf_lock); - xfs_perag_put(pag); - return -ENOENT; - } - - /* the buffer keeps the perag reference until it is freed */ - new_bp->b_pag = pag; - rhashtable_insert_fast(&pag->pag_buf_hash, &new_bp->b_rhash_head, - xfs_buf_hash_params); spin_unlock(&pag->pag_buf_lock); - *found_bp = new_bp; - return 0; -found: - spin_unlock(&pag->pag_buf_lock); + if (unlikely(!bp)) { + if (flags & XBF_NOALLOC) { + XFS_STATS_INC(mp, xb_miss_locked); + xfs_perag_put(pag); + return -ENOENT; + } + + error = xfs_buf_insert(btp, map, nmaps, flags, pag, &bp); + if (!error) + goto finish; + if (error != -EEXIST) { + xfs_perag_put(pag); + return error; + } + } xfs_perag_put(pag); if (!xfs_buf_trylock(bp)) { if (flags & XBF_TRYLOCK) { xfs_buf_rele(bp); - XFS_STATS_INC(btp->bt_mount, xb_busy_locked); + XFS_STATS_INC(mp, xb_busy_locked); return -EAGAIN; } xfs_buf_lock(bp); - XFS_STATS_INC(btp->bt_mount, xb_get_locked_waited); + XFS_STATS_INC(mp, xb_get_locked_waited); } /* @@ -612,64 +658,12 @@ xfs_buf_find( } trace_xfs_buf_find(bp, flags, _RET_IP_); - XFS_STATS_INC(btp->bt_mount, xb_get_locked); - *found_bp = bp; - return 0; -} - -/* - * Assembles a buffer covering the specified range. The code is optimised for - * cache hits, as metadata intensive workloads will see 3 orders of magnitude - * more hits than misses. - */ -int -xfs_buf_get_map( - struct xfs_buftarg *target, - struct xfs_buf_map *map, - int nmaps, - xfs_buf_flags_t flags, - struct xfs_buf **bpp) -{ - struct xfs_buf *bp; - struct xfs_buf *new_bp; - int error; - - *bpp = NULL; - error = xfs_buf_find(target, map, nmaps, flags, NULL, &bp); - if (!error) - goto found; - if (error != -ENOENT || (flags & XBF_NOALLOC)) - return error; - - error = _xfs_buf_alloc(target, map, nmaps, flags, &new_bp); - if (error) - return error; - - /* - * For buffers that fit entirely within a single page, first attempt to - * allocate the memory from the heap to minimise memory usage. If we - * can't get heap memory for these small buffers, we fall back to using - * the page allocator. - */ - if (BBTOB(new_bp->b_length) >= PAGE_SIZE || - xfs_buf_alloc_kmem(new_bp, flags) < 0) { - error = xfs_buf_alloc_pages(new_bp, flags); - if (error) - goto out_free_buf; - } - - error = xfs_buf_find(target, map, nmaps, flags, new_bp, &bp); - if (error) - goto out_free_buf; - - if (bp != new_bp) - xfs_buf_free(new_bp); - -found: + XFS_STATS_INC(mp, xb_get_locked); +finish: if (!bp->b_addr) { error = _xfs_buf_map_pages(bp, flags); if (unlikely(error)) { - xfs_warn_ratelimited(target->bt_mount, + xfs_warn_ratelimited(mp, "%s: failed to map %u pages", __func__, bp->b_page_count); xfs_buf_relse(bp); @@ -684,13 +678,10 @@ xfs_buf_get_map( if (!(flags & XBF_READ)) xfs_buf_ioerror(bp, 0); - XFS_STATS_INC(target->bt_mount, xb_get); + XFS_STATS_INC(mp, xb_get); trace_xfs_buf_get(bp, flags, _RET_IP_); *bpp = bp; return 0; -out_free_buf: - xfs_buf_free(new_bp); - return error; } int @@ -962,12 +953,12 @@ xfs_buf_rele( /* * We grab the b_lock here first to serialise racing xfs_buf_rele() * calls. The pag_buf_lock being taken on the last reference only - * serialises against racing lookups in xfs_buf_find(). IOWs, the second - * to last reference we drop here is not serialised against the last - * reference until we take bp->b_lock. Hence if we don't grab b_lock - * first, the last "release" reference can win the race to the lock and - * free the buffer before the second-to-last reference is processed, - * leading to a use-after-free scenario. + * serialises against racing lookups in xfs_buf_get_map(). IOWs, the + * second to last reference we drop here is not serialised against the + * last reference until we take bp->b_lock. Hence if we don't grab + * b_lock first, the last "release" reference can win the race to the + * lock and free the buffer before the second-to-last reference is + * processed, leading to a use-after-free scenario. */ spin_lock(&bp->b_lock); release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock); -- 2.30.2