* Rare memory leakage @ 2020-09-22 3:12 Matthew Wilcox 2020-09-22 3:43 ` Ira Weiny 2020-09-22 11:22 ` Vlastimil Babka 0 siblings, 2 replies; 8+ messages in thread From: Matthew Wilcox @ 2020-09-22 3:12 UTC (permalink / raw) To: linux-mm This is a fun little race. Dramatis personae: Pages P0, P1 are consecutive and aligned. Threads A, B, C. Page P0 is allocated to the page cache. Page P1 is free. Thread A calls find_get_entry() P0 is returned from xas_load() Thread B removes the page from the page cache (eg truncate, invalidatepage). P0 is buddy-merged with P1. Thread C calls alloc_pages, order 1, does not specify GFP_COMP. P0 now has refcount 1. Thread A calls page_cache_get_speculative(). P0 has refcount 2. Thread C calls __free_page(P0, 1) put_page_testzero is _false_. Do not call free_the_page(). Thread A calls put_page(P0) We free P0 and nobody knows to free P1. Weird solution: In __free_page(), if put_page_testzero() fails and page is not PageHead, convert it to a compound page. Then the put_page() by Thread A will free P1. Better ideas? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-22 3:12 Rare memory leakage Matthew Wilcox @ 2020-09-22 3:43 ` Ira Weiny 2020-09-22 11:04 ` Matthew Wilcox 2020-09-22 11:22 ` Vlastimil Babka 1 sibling, 1 reply; 8+ messages in thread From: Ira Weiny @ 2020-09-22 3:43 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-mm On Tue, Sep 22, 2020 at 04:12:15AM +0100, Matthew Wilcox wrote: > This is a fun little race. > > Dramatis personae: Pages P0, P1 are consecutive and aligned. > Threads A, B, C. > > Page P0 is allocated to the page cache. > Page P1 is free. > > Thread A calls find_get_entry() > P0 is returned from xas_load() > > Thread B removes the page from the page cache (eg truncate, invalidatepage). > P0 is buddy-merged with P1. > > Thread C calls alloc_pages, order 1, does not specify GFP_COMP. P0 now > has refcount 1. > > Thread A calls page_cache_get_speculative(). P0 has refcount 2. > > Thread C calls __free_page(P0, 1) > put_page_testzero is _false_. Do not call free_the_page(). > > Thread A calls put_page(P0) > We free P0 and nobody knows to free P1. > > > Weird solution: In __free_page(), if put_page_testzero() fails and page > is not PageHead, I'm not an expert, so I'm probably off here, but if Thread C calls __free_page(P0, 1) wouldn't that be on PageHead? So how would you know to convert it to a compound page? Ira > convert it to a compound page. Then the put_page() > by Thread A will free P1. > > Better ideas? > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-22 3:43 ` Ira Weiny @ 2020-09-22 11:04 ` Matthew Wilcox 0 siblings, 0 replies; 8+ messages in thread From: Matthew Wilcox @ 2020-09-22 11:04 UTC (permalink / raw) To: Ira Weiny; +Cc: linux-mm On Mon, Sep 21, 2020 at 08:43:14PM -0700, Ira Weiny wrote: > On Tue, Sep 22, 2020 at 04:12:15AM +0100, Matthew Wilcox wrote: > > This is a fun little race. > > > > Dramatis personae: Pages P0, P1 are consecutive and aligned. > > Threads A, B, C. > > > > Page P0 is allocated to the page cache. > > Page P1 is free. > > > > Thread A calls find_get_entry() > > P0 is returned from xas_load() > > > > Thread B removes the page from the page cache (eg truncate, invalidatepage). > > P0 is buddy-merged with P1. > > > > Thread C calls alloc_pages, order 1, does not specify GFP_COMP. P0 now > > has refcount 1. > > > > Thread A calls page_cache_get_speculative(). P0 has refcount 2. > > > > Thread C calls __free_page(P0, 1) > > put_page_testzero is _false_. Do not call free_the_page(). > > > > Thread A calls put_page(P0) > > We free P0 and nobody knows to free P1. > > > > > > Weird solution: In __free_page(), if put_page_testzero() fails and page > > is not PageHead, > > I'm not an expert, so I'm probably off here, but if Thread C calls > __free_page(P0, 1) wouldn't that be on PageHead? So how would you know to > convert it to a compound page? Only compound pages get PageHead set. A regular multiorder allocation without GFP_COMP set does not have PageHead. void prep_compound_page(struct page *page, unsigned int order) { ... __SetPageHead(page); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-22 3:12 Rare memory leakage Matthew Wilcox 2020-09-22 3:43 ` Ira Weiny @ 2020-09-22 11:22 ` Vlastimil Babka 2020-09-22 11:44 ` Matthew Wilcox 1 sibling, 1 reply; 8+ messages in thread From: Vlastimil Babka @ 2020-09-22 11:22 UTC (permalink / raw) To: Matthew Wilcox, linux-mm; +Cc: Ira Weiny, Kirill A. Shutemov On 9/22/20 5:12 AM, Matthew Wilcox wrote: > This is a fun little race. > > Dramatis personae: Pages P0, P1 are consecutive and aligned. > Threads A, B, C. > > Page P0 is allocated to the page cache. > Page P1 is free. > > Thread A calls find_get_entry() > P0 is returned from xas_load() > > Thread B removes the page from the page cache (eg truncate, invalidatepage). > P0 is buddy-merged with P1. > > Thread C calls alloc_pages, order 1, does not specify GFP_COMP. P0 now > has refcount 1. > > Thread A calls page_cache_get_speculative(). P0 has refcount 2. > > Thread C calls __free_page(P0, 1) > put_page_testzero is _false_. Do not call free_the_page(). > > Thread A calls put_page(P0) > We free P0 and nobody knows to free P1. > > > Weird solution: In __free_page(), if put_page_testzero() fails and page > is not PageHead, convert it to a compound page. Then the put_page() > by Thread A will free P1. I can imagine doing the conversion in a manner that deals with races properly will be rather tricky... > Better ideas? IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In that case it would be probably best to refcount each base page separately. I don't know how many assumptions this would break :/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-22 11:22 ` Vlastimil Babka @ 2020-09-22 11:44 ` Matthew Wilcox 2020-09-22 13:12 ` Matthew Wilcox 0 siblings, 1 reply; 8+ messages in thread From: Matthew Wilcox @ 2020-09-22 11:44 UTC (permalink / raw) To: Vlastimil Babka; +Cc: linux-mm, Ira Weiny, Kirill A. Shutemov On Tue, Sep 22, 2020 at 01:22:12PM +0200, Vlastimil Babka wrote: > On 9/22/20 5:12 AM, Matthew Wilcox wrote: > > This is a fun little race. > > > > Dramatis personae: Pages P0, P1 are consecutive and aligned. > > Threads A, B, C. > > > > Page P0 is allocated to the page cache. > > Page P1 is free. > > > > Thread A calls find_get_entry() > > P0 is returned from xas_load() > > > > Thread B removes the page from the page cache (eg truncate, invalidatepage). > > P0 is buddy-merged with P1. > > > > Thread C calls alloc_pages, order 1, does not specify GFP_COMP. P0 now > > has refcount 1. > > > > Thread A calls page_cache_get_speculative(). P0 has refcount 2. > > > > Thread C calls __free_page(P0, 1) > > put_page_testzero is _false_. Do not call free_the_page(). > > > > Thread A calls put_page(P0) > > We free P0 and nobody knows to free P1. > > > > > > Weird solution: In __free_page(), if put_page_testzero() fails and page > > is not PageHead, convert it to a compound page. Then the put_page() > > by Thread A will free P1. > > I can imagine doing the conversion in a manner that deals with races properly > will be rather tricky... Yes. We can't just look at the return from put_page_testzero() because by then thread A may have put its ref too and the page is already winding its way through the page freeing mechanism. We could try to conditionally get the reference back again, but we're then looking at the same race. I've just kicked off a test run using this: +/* + * Have to be careful when freeing a non-compound allocation in case somebody + * else takes a temporary reference on the first page and then calls put_page() + */ void __free_pages(struct page *page, unsigned int order) { - if (put_page_testzero(page)) - free_the_page(page, order); + if (likely(page_ref_freeze(page, 1))) + goto free; + if (likely(order == 0 || PageHead(page))) { + if (put_page_testzero(page)) + goto free; + return; + } + + prep_compound_page(page, order); + put_page(page); + return; +free: + free_the_page(page, order); } EXPORT_SYMBOL(__free_pages); > > Better ideas? > > IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In > that case it would be probably best to refcount each base page separately. I > don't know how many assumptions this would break :/ You sound like a man who's never dealt with device drivers. Multiorder, non-compound allocations are the norm in that world. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-22 11:44 ` Matthew Wilcox @ 2020-09-22 13:12 ` Matthew Wilcox 2020-09-23 6:46 ` Kirill A. Shutemov 0 siblings, 1 reply; 8+ messages in thread From: Matthew Wilcox @ 2020-09-22 13:12 UTC (permalink / raw) To: Vlastimil Babka; +Cc: linux-mm, Ira Weiny, Kirill A. Shutemov On Tue, Sep 22, 2020 at 12:44:51PM +0100, Matthew Wilcox wrote: > I've just kicked off a test run using this: > > +/* > + * Have to be careful when freeing a non-compound allocation in case somebody > + * else takes a temporary reference on the first page and then calls put_page() > + */ > void __free_pages(struct page *page, unsigned int order) > { > - if (put_page_testzero(page)) > - free_the_page(page, order); > + if (likely(page_ref_freeze(page, 1))) > + goto free; > + if (likely(order == 0 || PageHead(page))) { > + if (put_page_testzero(page)) > + goto free; > + return; > + } > + > + prep_compound_page(page, order); > + put_page(page); > + return; > +free: > + free_the_page(page, order); > } > EXPORT_SYMBOL(__free_pages); > > > > Better ideas? > > > > IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In > > that case it would be probably best to refcount each base page separately. I > > don't know how many assumptions this would break :/ > > You sound like a man who's never dealt with device drivers. Multiorder, > non-compound allocations are the norm in that world. This seems easier to reason about: +/* + * If we free a non-compound allocation, another thread may have a + * transient reference to the first page. It has no way of knowing + * about the rest of the allocation, so we have to free all but the + * first page here. + */ void __free_pages(struct page *page, unsigned int order) { if (put_page_testzero(page)) free_the_page(page, order); + else + while (order-- > 0) + free_the_page(page + 1 << order, order); } EXPORT_SYMBOL(__free_pages); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-22 13:12 ` Matthew Wilcox @ 2020-09-23 6:46 ` Kirill A. Shutemov 2020-09-23 11:26 ` Matthew Wilcox 0 siblings, 1 reply; 8+ messages in thread From: Kirill A. Shutemov @ 2020-09-23 6:46 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Vlastimil Babka, linux-mm, Ira Weiny On Tue, Sep 22, 2020 at 02:12:19PM +0100, Matthew Wilcox wrote: > On Tue, Sep 22, 2020 at 12:44:51PM +0100, Matthew Wilcox wrote: > > I've just kicked off a test run using this: > > > > +/* > > + * Have to be careful when freeing a non-compound allocation in case somebody > > + * else takes a temporary reference on the first page and then calls put_page() > > + */ > > void __free_pages(struct page *page, unsigned int order) > > { > > - if (put_page_testzero(page)) > > - free_the_page(page, order); > > + if (likely(page_ref_freeze(page, 1))) > > + goto free; > > + if (likely(order == 0 || PageHead(page))) { > > + if (put_page_testzero(page)) > > + goto free; > > + return; > > + } > > + > > + prep_compound_page(page, order); > > + put_page(page); > > + return; > > +free: > > + free_the_page(page, order); > > } > > EXPORT_SYMBOL(__free_pages); > > > > > > Better ideas? > > > > > > IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In > > > that case it would be probably best to refcount each base page separately. I > > > don't know how many assumptions this would break :/ > > > > You sound like a man who's never dealt with device drivers. Multiorder, > > non-compound allocations are the norm in that world. > > This seems easier to reason about: > > +/* > + * If we free a non-compound allocation, another thread may have a > + * transient reference to the first page. It has no way of knowing > + * about the rest of the allocation, so we have to free all but the > + * first page here. > + */ > void __free_pages(struct page *page, unsigned int order) > { > if (put_page_testzero(page)) > free_the_page(page, order); > + else > + while (order-- > 0) > + free_the_page(page + 1 << order, order); > } > EXPORT_SYMBOL(__free_pages); To be honest, I have not dealt with device drivers much myself, but this looks bogus. We free pages beynd the first one if the first is pinned. How is it correct? -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Rare memory leakage 2020-09-23 6:46 ` Kirill A. Shutemov @ 2020-09-23 11:26 ` Matthew Wilcox 0 siblings, 0 replies; 8+ messages in thread From: Matthew Wilcox @ 2020-09-23 11:26 UTC (permalink / raw) To: Kirill A. Shutemov; +Cc: Vlastimil Babka, linux-mm, Ira Weiny On Wed, Sep 23, 2020 at 09:46:06AM +0300, Kirill A. Shutemov wrote: > On Tue, Sep 22, 2020 at 02:12:19PM +0100, Matthew Wilcox wrote: > > On Tue, Sep 22, 2020 at 12:44:51PM +0100, Matthew Wilcox wrote: > > > I've just kicked off a test run using this: > > > > > > +/* > > > + * Have to be careful when freeing a non-compound allocation in case somebody > > > + * else takes a temporary reference on the first page and then calls put_page() > > > + */ > > > void __free_pages(struct page *page, unsigned int order) > > > { > > > - if (put_page_testzero(page)) > > > - free_the_page(page, order); > > > + if (likely(page_ref_freeze(page, 1))) > > > + goto free; > > > + if (likely(order == 0 || PageHead(page))) { > > > + if (put_page_testzero(page)) > > > + goto free; > > > + return; > > > + } > > > + > > > + prep_compound_page(page, order); > > > + put_page(page); > > > + return; > > > +free: > > > + free_the_page(page, order); > > > } > > > EXPORT_SYMBOL(__free_pages); > > > > > > > > Better ideas? > > > > > > > > IMHO, alloc_pages() with order > 0 and without __GFP_COMP is a weird beast. In > > > > that case it would be probably best to refcount each base page separately. I > > > > don't know how many assumptions this would break :/ > > > > > > You sound like a man who's never dealt with device drivers. Multiorder, > > > non-compound allocations are the norm in that world. > > > > This seems easier to reason about: > > > > +/* > > + * If we free a non-compound allocation, another thread may have a > > + * transient reference to the first page. It has no way of knowing > > + * about the rest of the allocation, so we have to free all but the > > + * first page here. > > + */ > > void __free_pages(struct page *page, unsigned int order) > > { > > if (put_page_testzero(page)) > > free_the_page(page, order); > > + else > > + while (order-- > 0) > > + free_the_page(page + 1 << order, order); > > } > > EXPORT_SYMBOL(__free_pages); > > To be honest, I have not dealt with device drivers much myself, but this > looks bogus. We free pages beynd the first one if the first is pinned. > How is it correct? From the device driver's point of view, it's just freeing some memory. Let's look at an example: static struct page *i8xx_alloc_pages(void) { struct page *page; page = alloc_pages(GFP_KERNEL | GFP_DMA32, 2); if (page == NULL) return NULL; if (set_pages_uc(page, 4) < 0) { set_pages_wb(page, 4); __free_pages(page, 2); return NULL; } atomic_inc(&agp_bridge->current_memory_agp); return page; } So we allocate four consecutive pages but they're not a compound page. If this mysterious call to set_pages_uc() fails, we call __free_pages() and intend to free all four pages. If the page cache has a speculative reference on the first page, we will free none of them. When the page cache drops its speculative reference, it will free only the first one. So we have to free all-but-the-first one here. As far as I can tell, you're concerned that the driver might do something like this: struct page *page = alloc_pages(GFP_KERNEL, 2); get_page(page); __free_pages(page, 2); __free_pages(page, 2); Drivers don't do that, in my experience, and if there are any, we'll see a noisy complaint in page_expected_state() when it checks _mapcount on each of the freed pages and it's PageBuddy instead of -1. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-09-23 11:26 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-09-22 3:12 Rare memory leakage Matthew Wilcox 2020-09-22 3:43 ` Ira Weiny 2020-09-22 11:04 ` Matthew Wilcox 2020-09-22 11:22 ` Vlastimil Babka 2020-09-22 11:44 ` Matthew Wilcox 2020-09-22 13:12 ` Matthew Wilcox 2020-09-23 6:46 ` Kirill A. Shutemov 2020-09-23 11:26 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).