From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E5EEFC77B7F for ; Mon, 24 Apr 2023 13:21:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232036AbjDXNVk (ORCPT ); Mon, 24 Apr 2023 09:21:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232017AbjDXNVi (ORCPT ); Mon, 24 Apr 2023 09:21:38 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 004F9524F for ; Mon, 24 Apr 2023 06:21:24 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 83B3C62247 for ; Mon, 24 Apr 2023 13:21:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 93A61C4339C; Mon, 24 Apr 2023 13:21:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1682342483; bh=mLSytBvQlfwB1tZGNfXw6XNAS4ua8mSIQH5mtpOa0vQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=EB/h5pgUZPqMboy+MjeoxZU9LkNdVswYznItH3wRx2lpTTrggPx5rF6YgHAZjElWG K2kckE8Yjgm+b65l5JFrxj7EZV5663yg6+01i6mv9Mf1t+AjsKt90dQgc1793pVL5z PcIQDS92V9rTssTjnEKZn8KGV9x+UuXlASPt0JRs= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Jiachen Zhang , Miklos Szeredi , Yang Bo Subject: [PATCH 5.15 57/73] fuse: fix deadlock between atomic O_TRUNC and page invalidation Date: Mon, 24 Apr 2023 15:17:11 +0200 Message-Id: <20230424131131.133700175@linuxfoundation.org> X-Mailer: git-send-email 2.40.0 In-Reply-To: <20230424131129.040707961@linuxfoundation.org> References: <20230424131129.040707961@linuxfoundation.org> User-Agent: quilt/0.67 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Miklos Szeredi commit 2fdbb8dd01556e1501132b5ad3826e8f71e24a8b upstream. fuse_finish_open() will be called with FUSE_NOWRITE set in case of atomic O_TRUNC open(), so commit 76224355db75 ("fuse: truncate pagecache on atomic_o_trunc") replaced invalidate_inode_pages2() by truncate_pagecache() in such a case to avoid the A-A deadlock. However, we found another A-B-B-A deadlock related to the case above, which will cause the xfstests generic/464 testcase hung in our virtio-fs test environment. For example, consider two processes concurrently open one same file, one with O_TRUNC and another without O_TRUNC. The deadlock case is described below, if open(O_TRUNC) is already set_nowrite(acquired A), and is trying to lock a page (acquiring B), open() could have held the page lock (acquired B), and waiting on the page writeback (acquiring A). This would lead to deadlocks. open(O_TRUNC) ---------------------------------------------------------------- fuse_open_common inode_lock [C acquire] fuse_set_nowrite [A acquire] fuse_finish_open truncate_pagecache lock_page [B acquire] truncate_inode_page unlock_page [B release] fuse_release_nowrite [A release] inode_unlock [C release] ---------------------------------------------------------------- open() ---------------------------------------------------------------- fuse_open_common fuse_finish_open invalidate_inode_pages2 lock_page [B acquire] fuse_launder_page fuse_wait_on_page_writeback [A acquire & release] unlock_page [B release] ---------------------------------------------------------------- Besides this case, all calls of invalidate_inode_pages2() and invalidate_inode_pages2_range() in fuse code also can deadlock with open(O_TRUNC). Fix by moving the truncate_pagecache() call outside the nowrite protected region. The nowrite protection is only for delayed writeback (writeback_cache) case, where inode lock does not protect against truncation racing with writes on the server. Write syscalls racing with page cache truncation still get the inode lock protection. This patch also changes the order of filemap_invalidate_lock() vs. fuse_set_nowrite() in fuse_open_common(). This new order matches the order found in fuse_file_fallocate() and fuse_do_setattr(). Reported-by: Jiachen Zhang Tested-by: Jiachen Zhang Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC") Cc: Signed-off-by: Miklos Szeredi Signed-off-by: Yang Bo Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dir.c | 7 ++++++- fs/fuse/file.c | 29 +++++++++++++++++------------ 2 files changed, 23 insertions(+), 13 deletions(-) --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -476,6 +476,7 @@ static int fuse_create_open(struct inode struct fuse_entry_out outentry; struct fuse_inode *fi; struct fuse_file *ff; + bool trunc = flags & O_TRUNC; /* Userspace expects S_IFREG in create mode */ BUG_ON((mode & S_IFMT) != S_IFREG); @@ -500,7 +501,7 @@ static int fuse_create_open(struct inode inarg.mode = mode; inarg.umask = current_umask(); - if (fm->fc->handle_killpriv_v2 && (flags & O_TRUNC) && + if (fm->fc->handle_killpriv_v2 && trunc && !(flags & O_EXCL) && !capable(CAP_FSETID)) { inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID; } @@ -549,6 +550,10 @@ static int fuse_create_open(struct inode } else { file->private_data = ff; fuse_finish_open(inode, file); + if (fm->fc->atomic_o_trunc && trunc) + truncate_pagecache(inode, 0); + else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) + invalidate_inode_pages2(inode->i_mapping); } return err; --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -210,12 +210,9 @@ void fuse_finish_open(struct inode *inod fi->attr_version = atomic64_inc_return(&fc->attr_version); i_size_write(inode, 0); spin_unlock(&fi->lock); - truncate_pagecache(inode, 0); fuse_invalidate_attr(inode); if (fc->writeback_cache) file_update_time(file); - } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) { - invalidate_inode_pages2(inode->i_mapping); } if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache) @@ -240,30 +237,38 @@ int fuse_open_common(struct inode *inode if (err) return err; - if (is_wb_truncate || dax_truncate) { + if (is_wb_truncate || dax_truncate) inode_lock(inode); - fuse_set_nowrite(inode); - } if (dax_truncate) { filemap_invalidate_lock(inode->i_mapping); err = fuse_dax_break_layouts(inode, 0, 0); if (err) - goto out; + goto out_inode_unlock; } + if (is_wb_truncate || dax_truncate) + fuse_set_nowrite(inode); + err = fuse_do_open(fm, get_node_id(inode), file, isdir); if (!err) fuse_finish_open(inode, file); -out: + if (is_wb_truncate || dax_truncate) + fuse_release_nowrite(inode); + if (!err) { + struct fuse_file *ff = file->private_data; + + if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) + truncate_pagecache(inode, 0); + else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) + invalidate_inode_pages2(inode->i_mapping); + } if (dax_truncate) filemap_invalidate_unlock(inode->i_mapping); - - if (is_wb_truncate | dax_truncate) { - fuse_release_nowrite(inode); +out_inode_unlock: + if (is_wb_truncate || dax_truncate) inode_unlock(inode); - } return err; }