From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19404C43457 for ; Fri, 16 Oct 2020 12:28:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AAAF72073A for ; Fri, 16 Oct 2020 12:28:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="FXKKcMQz" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2407479AbgJPM2M (ORCPT ); Fri, 16 Oct 2020 08:28:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2405601AbgJPM2M (ORCPT ); Fri, 16 Oct 2020 08:28:12 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1B99C061755 for ; Fri, 16 Oct 2020 05:28:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=F5azy1C+DHm5sWEovkALlRq6rf8uP+uxnGoxIcsol1Y=; b=FXKKcMQzE/3gOVuBZ+SUutLO8h P6iDD5g0oWQMrNdhYk2KAllWnu49bPdTlHFgWEuD5Tcpfj60oMPox/nMiNQse2WzRBhmaZ0zYKc9P vtAxxyo+mUHHqKrLVHHujm76H/6KkGu2r5EzMo00NhvZWmW8BxdXULjqJd+NY9l0SQtHFk6S65Rbx fKNySLt9Ur6U0mXMt0GL3rdWhz7HDzCQZktD64PfaAA59OXfo5ZPKVkwGKE2MdDhg008cC/5Jk5hu KoNEHdt7CcE3XCA7wTUn8fps8GXVfwoC48WNBFZUswFwoppezge+RYMbVTQR4yZe0T6XMS0uIAxfE JSq5ISHw==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kTOpy-0007mL-BV; Fri, 16 Oct 2020 12:27:58 +0000 Date: Fri, 16 Oct 2020 13:27:58 +0100 From: Matthew Wilcox To: Miklos Szeredi Cc: Linus Torvalds , Vivek Goyal , Qian Cai , Hugh Dickins , "Kirill A . Shutemov" , Linux-MM , Andrew Morton , linux-fsdevel , Amir Goldstein Subject: Re: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..) Message-ID: <20201016122758.GE20115@casper.infradead.org> References: <4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw> <20201015151606.GA226448@redhat.com> <20201015195526.GC226448@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Fri, Oct 16, 2020 at 12:02:21PM +0200, Miklos Szeredi wrote: > This was added by commit ea9b9907b82a ("fuse: implement > perform_write") in v2.6.26 and remains essentially unchanged, AFAICS. > So this is an old bug indeed. > > So what is the page lock protecting? I think not truncation, because > inode_lock should be sufficient protection. > > What it does after sending a synchronous WRITE and before unlocking > the pages is set the PG_uptodate flag, but only if the complete page > was really written, which is what the uptodate flag really says: this > page is in sync with the underlying fs. Not in sync with. Uptodate means "every byte on this page is at least as new as the bytes on storage". Dirty means "at least one byte is newer than the bytes on storage". > So I think the page lock here is trying to protect against concurrent > reads/faults on not uptodate pages. I.e. until the WRITE request > completes it is unknown whether the page was really written or not, so > any reads must block until this state becomes known. This logic falls > down on already cached pages, since they start out uptodate and the > write does not clear this flag. That's not how the page cache should work -- if a write() has touched every byte in a page, then the page should be marked as Uptodate, and it can immediately satisfy read()s without even touching the backing store. > So keeping the pages locked has dubious value: short writes don't seem > to work correctly anyway. Which means that we can probably just set > the page uptodate right after filling it from the user buffer, and > unlock the page immediately. > > Am I missing something? I haven't looked at fuse in detail -- are you handling partial page writes? That is, if someone writes to bytes 5-15 of a file, are you first reading bytes 0-4 and 16-4095 from the backing store? If so, you can mark the page Uptodate as soon as you've copied bytes 5-15 from the user.