From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0230C2D0A8 for ; Wed, 23 Sep 2020 11:26:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6655C20709 for ; Wed, 23 Sep 2020 11:26:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="lR5r1jk8" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6655C20709 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9950A6B0003; Wed, 23 Sep 2020 07:26:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 91F056B0055; Wed, 23 Sep 2020 07:26:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7C0236B005A; Wed, 23 Sep 2020 07:26:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0059.hostedemail.com [216.40.44.59]) by kanga.kvack.org (Postfix) with ESMTP id 640436B0003 for ; Wed, 23 Sep 2020 07:26:56 -0400 (EDT) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 25F248249980 for ; Wed, 23 Sep 2020 11:26:56 +0000 (UTC) X-FDA: 77294099232.04.story84_2609c6127156 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin04.hostedemail.com (Postfix) with ESMTP id 0A32F8005A74 for ; Wed, 23 Sep 2020 11:26:56 +0000 (UTC) X-HE-Tag: story84_2609c6127156 X-Filterd-Recvd-Size: 5526 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf50.hostedemail.com (Postfix) with ESMTP for ; Wed, 23 Sep 2020 11:26:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=n9zsDXQuPfAz/dK2tNt3Cbhp3NRSC1ZrO9GrpkkIwco=; b=lR5r1jk8KN7dI1seABnKetM6Dn a9AGxyr7MVl14flNHrjEXDJ5E1Qn3MAMN9B8Jw8IliolGTedr2lSVlfnemnl3HIu2mCRaetM5MQCs N35OWk09KuHzOh1MuSXPCv9wJcujeaTihT9nFQV53s0fAKNOZ8LPBi4dS5jMiy60kyvqMaYWGOQ8h XCUE/+ErrBRjmTsrhB9nTHkMVfBfB6cUB/caDcM1q6K4+SQ8XOKCaJQN1bLdsaTL32SX+HRypTgGQ NWj2DepRpZT+gB79RNsjzkSoaRCoJRiwbARJRbOvRh51FwAIQdXf0iBZujwJttK3cXqr6MDD17TDg tbLBhvQw==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kL2vC-0007ZI-Un; Wed, 23 Sep 2020 11:26:51 +0000 Date: Wed, 23 Sep 2020 12:26:50 +0100 From: Matthew Wilcox To: "Kirill A. Shutemov" Cc: Vlastimil Babka , linux-mm@kvack.org, Ira Weiny Subject: Re: Rare memory leakage Message-ID: <20200923112650.GN32101@casper.infradead.org> References: <20200922031215.GZ32101@casper.infradead.org> <8f294420-93c9-618a-6128-432b7035642b@suse.cz> <20200922114451.GC32101@casper.infradead.org> <20200922131219.GE32101@casper.infradead.org> <20200923064606.l3ajwbbkewvvaimc@black.fi.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200923064606.l3ajwbbkewvvaimc@black.fi.intel.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 23, 2020 at 09:46:06AM +0300, Kirill A. Shutemov wrote: > On Tue, Sep 22, 2020 at 02:12:19PM +0100, Matthew Wilcox wrote: > > On Tue, Sep 22, 2020 at 12:44:51PM +0100, Matthew Wilcox wrote: > > > I've just kicked off a test run using this: > > > > > > +/* > > > + * Have to be careful when freeing a non-compound allocation in case somebody > > > + * else takes a temporary reference on the first page and then calls put_page() > > > + */ > > > void __free_pages(struct page *page, unsigned int order) > > > { > > > - if (put_page_testzero(page)) > > > - free_the_page(page, order); > > > + if (likely(page_ref_freeze(page, 1))) > > > + goto free; > > > + if (likely(order == 0 || PageHead(page))) { > > > + if (put_page_testzero(page)) > > > + goto free; > > > + return; > > > + } > > > + > > > + prep_compound_page(page, order); > > > + put_page(page); > > > + return; > > > +free: > > > + free_the_page(page, order); > > > } > > > EXPORT_SYMBOL(__free_pages); > > > > > > > > Better ideas? > > > > > > > > IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In > > > > that case it would be probably best to refcount each base page separately. I > > > > don't know how many assumptions this would break :/ > > > > > > You sound like a man who's never dealt with device drivers. Multiorder, > > > non-compound allocations are the norm in that world. > > > > This seems easier to reason about: > > > > +/* > > + * If we free a non-compound allocation, another thread may have a > > + * transient reference to the first page. It has no way of knowing > > + * about the rest of the allocation, so we have to free all but the > > + * first page here. > > + */ > > void __free_pages(struct page *page, unsigned int order) > > { > > if (put_page_testzero(page)) > > free_the_page(page, order); > > + else > > + while (order-- > 0) > > + free_the_page(page + 1 << order, order); > > } > > EXPORT_SYMBOL(__free_pages); > > To be honest, I have not dealt with device drivers much myself, but this > looks bogus. We free pages beynd the first one if the first is pinned. > How is it correct? >From the device driver's point of view, it's just freeing some memory. Let's look at an example: static struct page *i8xx_alloc_pages(void) { struct page *page; page = alloc_pages(GFP_KERNEL | GFP_DMA32, 2); if (page == NULL) return NULL; if (set_pages_uc(page, 4) < 0) { set_pages_wb(page, 4); __free_pages(page, 2); return NULL; } atomic_inc(&agp_bridge->current_memory_agp); return page; } So we allocate four consecutive pages but they're not a compound page. If this mysterious call to set_pages_uc() fails, we call __free_pages() and intend to free all four pages. If the page cache has a speculative reference on the first page, we will free none of them. When the page cache drops its speculative reference, it will free only the first one. So we have to free all-but-the-first one here. As far as I can tell, you're concerned that the driver might do something like this: struct page *page = alloc_pages(GFP_KERNEL, 2); get_page(page); __free_pages(page, 2); __free_pages(page, 2); Drivers don't do that, in my experience, and if there are any, we'll see a noisy complaint in page_expected_state() when it checks _mapcount on each of the freed pages and it's PageBuddy instead of -1.