From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=CyxS=WS=vger.kernel.org=linux-btrfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
	SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DB167C3A5A1
	for <linux-btrfs@archiver.kernel.org>; Thu, 22 Aug 2019 19:19:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id B3FC52339F
	for <linux-btrfs@archiver.kernel.org>; Thu, 22 Aug 2019 19:19:15 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=toxicpanda-com.20150623.gappssmtp.com header.i=@toxicpanda-com.20150623.gappssmtp.com header.b="sRJ2PgaG"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S2404053AbfHVTTO (ORCPT <rfc822;linux-btrfs@archiver.kernel.org>);
        Thu, 22 Aug 2019 15:19:14 -0400
Received: from mail-yb1-f195.google.com ([209.85.219.195]:34269 "EHLO
        mail-yb1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1731916AbfHVTTO (ORCPT
        <rfc822;linux-btrfs@vger.kernel.org>);
        Thu, 22 Aug 2019 15:19:14 -0400
Received: by mail-yb1-f195.google.com with SMTP id u68so2983098ybg.1
        for <linux-btrfs@vger.kernel.org>; Thu, 22 Aug 2019 12:19:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=toxicpanda-com.20150623.gappssmtp.com; s=20150623;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=AeR79+V8X3o9J1ucanG1LzY/IGLduXL++6IRQRmHGIE=;
        b=sRJ2PgaGBlszTXL2a2xRFH2uvv1byIQHOYOuYscxwFz5RnoSw165kuvSrWxckVSL+S
         ZmDXE9JArHZZm7tMq3XZo35yZrdDY+YE1njba6zpeW3G+qce/6KjeDuM2Q+cuaU4sr28
         EfEnQGq4MolafBYJ1OX0HismwI/P4VThsIzcOT6MgSIk2E3737VKKefyLOPoJcORoXKq
         qsa1HI1KshBXG8qaIXY+eaa5yZjihOdIZxKtE3RqfK0LmUqhdIru1dlqWbBmH71fAId6
         ULc/C1vhz/llwsmDlbnibtCf4MVK0U6wlDZGG8Ar8jGn1Oc7r8orkUigwwAeMGVTYSDy
         MxKw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=AeR79+V8X3o9J1ucanG1LzY/IGLduXL++6IRQRmHGIE=;
        b=NdH1e0Pmsghr6qKEVHQGUSQQXX9OtGNmVMNKFyFqk6ojot1zBgAzCjCLqr4BcVv3Eh
         zmmSvXUU1a6IvbebkQTHBghFGUZOZIJ5ED8TpZzzBy1rmRSKKx1wyBZsEROFahteP3Xh
         bN2X5bwVQlXaBr/gr43qSpyHi4F/hFFhYNFA2Gahk0iqwy9nfvz7tY34ZjSnUk6tpXzn
         yAmo12P+gfSct/zXx/kACKrlznjiCwRrRtx3pHuCFzXLdZRxdUUfFg0pbmanfU45yufr
         Iv+QCVde8EpAGCBQif/M83bA0l/wxZOgShAmhfjVeZkdNRV7aQJnx0nEK5rbmzH6MxVK
         jB0w==
X-Gm-Message-State: APjAAAVE21o41cIORhPvOGouhlms/gWTc5mKkzN1vBG91Zd+jhIa7338
        DB50Kw9P8VBUY72p/1aQJtrunOOJ+iCnyQ==
X-Google-Smtp-Source: APXvYqwT4p6aCqIydgmxHjFuF6ZBMOxYPzq3v69O/nuup741wLZxk8zNRZGhDmhM9iotCpU1jfJgXg==
X-Received: by 2002:a05:6902:4f1:: with SMTP id w17mr446478ybs.36.1566501553243;
        Thu, 22 Aug 2019 12:19:13 -0700 (PDT)
Received: from localhost ([107.15.81.208])
        by smtp.gmail.com with ESMTPSA id y3sm105861ywa.47.2019.08.22.12.19.12
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 22 Aug 2019 12:19:12 -0700 (PDT)
From:   Josef Bacik <josef@toxicpanda.com>
To:     kernel-team@fb.com, linux-btrfs@vger.kernel.org
Cc:     Nikolay Borisov <nborisov@suse.com>
Subject: [PATCH 4/5] btrfs: do not account global reserve in can_overcommit
Date:   Thu, 22 Aug 2019 15:19:03 -0400
Message-Id: <20190822191904.13939-5-josef@toxicpanda.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190822191904.13939-1-josef@toxicpanda.com>
References: <20190822191904.13939-1-josef@toxicpanda.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-btrfs.vger.kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

We ran into a problem in production where a box with plenty of space was
getting wedged doing ENOSPC flushing.  These boxes only had 20% of the
disk allocated, but their metadata space + global reserve was right at
the size of their metadata chunk.

In this case can_overcommit should be allowing allocations without
problem, but there's logic in can_overcommit that doesn't allow us to
overcommit if there's not enough real space to satisfy the global
reserve.

This is for historical reasons.  Before there were only certain places
we could allocate chunks.  We could go to commit the transaction and not
have enough space for our pending delayed refs and such and be unable to
allocate a new chunk.  This would result in a abort because of ENOSPC.
This code was added to solve this problem.

However since then we've gained the ability to always be able to
allocate a chunk.  So we can easily overcommit in these cases without
risking a transaction abort because of ENOSPC.

Also prior to now the global reserve really would be used because that's
the space we relied on for delayed refs.  With delayed refs being
tracked separately we no longer have to worry about running out of
delayed refs space while committing.  We are much less likely to
exhaust our global reserve space during transaction commit.

Fix the can_overcommit code to simply see if our current usage + what we
want is less than our current free space plus whatever slack space we
have in the disk is.  This solves the problem we were seeing in
production and keeps us from flushing as aggressively as we approach our
actual metadata size usage.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/space-info.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index a43f6287074b..3053b3e91b34 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -165,9 +165,7 @@ static int can_overcommit(struct btrfs_fs_info *fs_info,
 			  enum btrfs_reserve_flush_enum flush,
 			  bool system_chunk)
 {
-	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	u64 profile;
-	u64 space_size;
 	u64 avail;
 	u64 used;
 	int factor;
@@ -181,22 +179,7 @@ static int can_overcommit(struct btrfs_fs_info *fs_info,
 	else
 		profile = btrfs_metadata_alloc_profile(fs_info);
 
-	used = btrfs_space_info_used(space_info, false);
-
-	/*
-	 * We only want to allow over committing if we have lots of actual space
-	 * free, but if we don't have enough space to handle the global reserve
-	 * space then we could end up having a real enospc problem when trying
-	 * to allocate a chunk or some other such important allocation.
-	 */
-	spin_lock(&global_rsv->lock);
-	space_size = calc_global_rsv_need_space(global_rsv);
-	spin_unlock(&global_rsv->lock);
-	if (used + space_size >= space_info->total_bytes)
-		return 0;
-
-	used += space_info->bytes_may_use;
-
+	used = btrfs_space_info_used(space_info, true);
 	avail = atomic64_read(&fs_info->free_chunk_space);
 
 	/*
-- 
2.21.0