Date: Thu, 15 Oct 2020 11:16:06 -0400
From: Vivek Goyal
To: Linus Torvalds
Cc: Qian Cai, Hugh Dickins, Matthew Wilcox, "Kirill A . Shutemov",
    Linux-MM, Andrew Morton, linux-fsdevel, Amir Goldstein, Miklos Szeredi
Subject: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..)
Message-ID: <20201015151606.GA226448@redhat.com>
References: <4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw>

On Wed, Oct 14, 2020 at 07:44:16PM -0700, Linus Torvalds wrote:
> On Wed, Oct 14, 2020 at 6:48 PM Qian Cai wrote:
> >
> > While on this topic, I just want to bring up a bug report that we are chasing an
> > issue that a process is stuck in the loop of wait_on_page_bit_common() for more
> > than 10 minutes before I gave up.
>
> Judging by call trace, that looks like a deadlock rather than a missed wakeup.
>
> The trace isn't reliable, but I find it suspicious that the call trace
> just before the fault contains that
> "iov_iter_copy_from_user_atomic()".
>
> IOW, I think you're in fuse_fill_write_pages(), which has allocated
> the page, locked it, and then it takes a page fault.
>
> And the page fault waits on a page that is locked.
>
> This is a classic deadlock.
>
> The *intent* is that iov_iter_copy_from_user_atomic() returns zero,
> and you retry without the page lock held.
>
> HOWEVER.
>
> That's not what fuse actually does. Fuse will do multiple pages, and
> it will unlock only the _last_ page. It keeps the other pages locked,
> and puts them in an array:
>
>                 ap->pages[ap->num_pages] = page;
>
> And after the iov_iter_copy_from_user_atomic() fails, it does that
> "unlock" and repeat.
>
> But while the _last_ page was unlocked, the *previous* pages are still
> locked in that array. Deadlock.
>
> I really don't think this has anything at all to do with page locking,
> and everything to do with fuse_fill_write_pages() having a deadlock if
> the source of data is a mmap of one of the pages it is trying to write
> to (just with an offset, so that it's not the last page).
>
> See a similar code sequence in generic_perform_write(), but notice how
> that code only has *one* page that it locks, and never holds an array
> of pages around over that iov_iter_fault_in_readable() thing.

Indeed. This is a deadlock in fuse. Thanks for the analysis. I can now
trivially reproduce it with the following program.

Thanks
Vivek

#define _GNU_SOURCE	/* for pwritev() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/uio.h>

int main(int argc, char *argv[])
{
	int fd;
	ssize_t ret;
	void *map_addr;
	size_t map_length = 2 * 4096;
	char *buf_out = "Hello World";
	struct iovec iov[2];

	if (argc != 2) {
		printf("Usage: %s <file>\n", argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDWR | O_TRUNC);
	if (fd == -1) {
		fprintf(stderr, "Failed to open file %s:%s, errno=%d\n",
			argv[1], strerror(errno), errno);
		exit(1);
	}

	map_addr = mmap(NULL, map_length, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (map_addr == MAP_FAILED) {
		fprintf(stderr, "mmap failed %s, errno=%d\n",
			strerror(errno), errno);
		exit(1);
	}

	/* Write first page and second page */
	pwrite(fd, buf_out, strlen(buf_out), 0);
	pwrite(fd, buf_out, strlen(buf_out), 4096);

	/* Copy from first page and then second page */
	iov[0].iov_base = map_addr;
	iov[0].iov_len = strlen(buf_out) + 1;
	iov[1].iov_base = (char *)map_addr + 4096;
	iov[1].iov_len = strlen(buf_out) + 1;

	/*
	 * Write at offset (4K - 12) within the second page, reading from
	 * the first page and then the second page. In the first iteration
	 * we should fault in the first page and lock the second page. In
	 * the second iteration we should try to fault in the second page,
	 * which is locked. Deadlock.
	 */
	ret = pwritev(fd, iov, 2, 4096 + 4096 - 12);
	printf("write() returned=%zd\n", ret);

	munmap(map_addr, map_length);
	close(fd);
}