From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB141C35242 for ; Sat, 8 Feb 2020 18:35:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D095D2253D for ; Sat, 8 Feb 2020 18:35:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728093AbgBHSff (ORCPT ); Sat, 8 Feb 2020 13:35:35 -0500 Received: from shadbolt.e.decadent.org.uk ([88.96.1.126]:34056 "EHLO shadbolt.e.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727850AbgBHS3n (ORCPT ); Sat, 8 Feb 2020 13:29:43 -0500 Received: from [192.168.4.242] (helo=deadeye) by shadbolt.decadent.org.uk with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1j0UrJ-0003fL-Us; Sat, 08 Feb 2020 18:29:37 +0000 Received: from ben by deadeye with local (Exim 4.93) (envelope-from ) id 1j0UrJ-000CRk-LG; Sat, 08 Feb 2020 18:29:37 +0000 Content-Type: text/plain; charset="UTF-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0 From: Ben Hutchings To: linux-kernel@vger.kernel.org, stable@vger.kernel.org CC: akpm@linux-foundation.org, Denis Kirjanov , "Josef Bacik" , "David Sterba" , "Filipe Manana" Date: Sat, 08 Feb 2020 18:20:41 +0000 Message-ID: X-Mailer: LinuxStableQueue (scripts by bwh) X-Patchwork-Hint: ignore Subject: [PATCH 3.16 102/148] Btrfs: fix negative subv_writers counter and data space leak after buffered write In-Reply-To: X-SA-Exim-Connect-IP: 192.168.4.242 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on shadbolt.decadent.org.uk); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 3.16.82-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: Filipe Manana commit a0e248bb502d5165b3314ac3819e888fdcdf7d9f upstream. When doing a buffered write it's possible to leave the subv_writers counter of the root, used for synchronization between buffered nocow writers and snapshotting. This happens in an exceptional case like the following: 1) We fail to allocate data space for the write, since there's not enough available data space nor enough unallocated space for allocating a new data block group; 2) Because of that failure, we try to go to NOCOW mode, which succeeds and therefore we set the local variable 'only_release_metadata' to true and set the root's sub_writers counter to 1 through the call to btrfs_start_write_no_snapshotting() made by check_can_nocow(); 3) The call to btrfs_copy_from_user() returns zero, which is very unlikely to happen but not impossible; 4) No pages are copied because btrfs_copy_from_user() returned zero; 5) We call btrfs_end_write_no_snapshotting() which decrements the root's subv_writers counter to 0; 6) We don't set 'only_release_metadata' back to 'false' because we do it only if 'copied', the value returned by btrfs_copy_from_user(), is greater than zero; 7) On the next iteration of the while loop, which processes the same page range, we are now able to allocate data space for the write (we got enough data space released in the meanwhile); 8) After this if we fail at btrfs_delalloc_reserve_metadata(), because now there isn't enough free metadata space, or in some other place further below (prepare_pages(), lock_and_cleanup_extent_if_need(), btrfs_dirty_pages()), we break out of the while loop with 'only_release_metadata' having a value of 'true'; 9) Because 'only_release_metadata' is 'true' we end up decrementing the root's subv_writers counter to -1 (through a call to btrfs_end_write_no_snapshotting()), and we also end up not releasing the data space previously reserved through btrfs_check_data_free_space(). As a consequence the mechanism for synchronizing NOCOW buffered writes with snapshotting gets broken. Fix this by always setting 'only_release_metadata' to false at the start of each iteration. Fixes: 8257b2dc3c1a ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume") Fixes: 7ee9e4405f26 ("Btrfs: check if we can nocow if we don't have data space") Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- fs/btrfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1540,6 +1540,7 @@ static noinline ssize_t __btrfs_buffered break; } + only_release_metadata = false; reserve_bytes = num_pages << PAGE_CACHE_SHIFT; ret = btrfs_check_data_free_space(inode, reserve_bytes); if (ret == -ENOSPC && @@ -1671,7 +1672,6 @@ again: set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend, EXTENT_NORESERVE, NULL, NULL, GFP_NOFS); - only_release_metadata = false; } btrfs_drop_pages(pages, num_pages);