From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9012C2D0E2 for ; Tue, 22 Sep 2020 11:44:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E2B062388B for ; Tue, 22 Sep 2020 11:44:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="immUtW4w" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E2B062388B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0D5C5900068; Tue, 22 Sep 2020 07:44:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 08819900063; Tue, 22 Sep 2020 07:44:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EDFB4900068; Tue, 22 Sep 2020 07:44:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0066.hostedemail.com [216.40.44.66]) by kanga.kvack.org (Postfix) with ESMTP id D8825900063 for ; Tue, 22 Sep 2020 07:44:55 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 9B101362B for ; Tue, 22 Sep 2020 11:44:55 +0000 (UTC) X-FDA: 77290515750.12.jewel68_0b163012714d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 7303D18054D2D for ; Tue, 22 Sep 2020 11:44:55 +0000 (UTC) X-HE-Tag: jewel68_0b163012714d X-Filterd-Recvd-Size: 4254 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Tue, 22 Sep 2020 11:44:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=Btsk3TWkzwusSqiy5IFMXQB0H19hg3MCpZrC15s/Q/Q=; b=immUtW4wZOBSurJnINZK4zOPXR fCnuH2cw3F9w/DHb9I9QM1AIfvrDxut0ooauKIZfA+OTLYzZgjmzHBmHuahSW3GFo0A+RlCXhvzKs Mepl+PDXagba7KQgObhrDyPANHcbKerdbz5PyQ2QdvW8NGKjqHueQark+XdZlWfZB6lcIt9zvCHO6 BEde8TkWKqZZKZJoYhGpBVd/2YgQthRqC6f4zbZ/OgUBesbyKcOTlS/VO3iwu48J6RjFZLLdfJQIE AAOs3SvZBabdOWFvAINAk0ndcSTHL8IPZ4tyDeTSjT/8usRJu/u2amN8eyegVkhwjowRL9nsH8VdU 3F6alXSQ==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kKgj5-0006aJ-F6; Tue, 22 Sep 2020 11:44:51 +0000 Date: Tue, 22 Sep 2020 12:44:51 +0100 From: Matthew Wilcox To: Vlastimil Babka Cc: linux-mm@kvack.org, Ira Weiny , "Kirill A. Shutemov" Subject: Re: Rare memory leakage Message-ID: <20200922114451.GC32101@casper.infradead.org> References: <20200922031215.GZ32101@casper.infradead.org> <8f294420-93c9-618a-6128-432b7035642b@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8f294420-93c9-618a-6128-432b7035642b@suse.cz> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Sep 22, 2020 at 01:22:12PM +0200, Vlastimil Babka wrote: > On 9/22/20 5:12 AM, Matthew Wilcox wrote: > > This is a fun little race. > > > > Dramatis personae: Pages P0, P1 are consecutive and aligned. > > Threads A, B, C. > > > > Page P0 is allocated to the page cache. > > Page P1 is free. > > > > Thread A calls find_get_entry() > > P0 is returned from xas_load() > > > > Thread B removes the page from the page cache (eg truncate, invalidatepage). > > P0 is buddy-merged with P1. > > > > Thread C calls alloc_pages, order 1, does not specify GFP_COMP. P0 now > > has refcount 1. > > > > Thread A calls page_cache_get_speculative(). P0 has refcount 2. > > > > Thread C calls __free_page(P0, 1) > > put_page_testzero is _false_. Do not call free_the_page(). > > > > Thread A calls put_page(P0) > > We free P0 and nobody knows to free P1. > > > > > > Weird solution: In __free_page(), if put_page_testzero() fails and page > > is not PageHead, convert it to a compound page. Then the put_page() > > by Thread A will free P1. > > I can imagine doing the conversion in a manner that deals with races properly > will be rather tricky... Yes. We can't just look at the return from put_page_testzero() because by then thread A may have put its ref too and the page is already winding its way through the page freeing mechanism. We could try to conditionally get the reference back again, but we're then looking at the same race. I've just kicked off a test run using this: +/* + * Have to be careful when freeing a non-compound allocation in case somebody + * else takes a temporary reference on the first page and then calls put_page() + */ void __free_pages(struct page *page, unsigned int order) { - if (put_page_testzero(page)) - free_the_page(page, order); + if (likely(page_ref_freeze(page, 1))) + goto free; + if (likely(order == 0 || PageHead(page))) { + if (put_page_testzero(page)) + goto free; + return; + } + + prep_compound_page(page, order); + put_page(page); + return; +free: + free_the_page(page, order); } EXPORT_SYMBOL(__free_pages); > > Better ideas? > > IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In > that case it would be probably best to refcount each base page separately. I > don't know how many assumptions this would break :/ You sound like a man who's never dealt with device drivers. Multiorder, non-compound allocations are the norm in that world.