* [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
From: Glauber Costa @ 2012-08-02 13:11 UTC
To: linux-kernel
Cc: Andrew Morton, linux-mm, Glauber Costa, David Rientjes, Pekka Enberg, Christoph Lameter

The slab allocators provide their users with memory regions, with very
few placement guarantees. No user should assume an actual page is given
by kmalloc calls that are a multiple of a page in size. This means that
we can be sure that every sane user of the interface would not mess with
the page reference counting of the underlying page.

When freeing objects, the slub allocator will most of the time free
empty pages by calling __free_pages(). But high-order kmalloc will be
disposed of by means of put_page() instead. It makes no sense to call
put_page() on kernel pages that are not reference counted, which is the
case here.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: David Rientjes <rientjes@google.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Christoph Lameter <cl@linux.com>
---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index e517d43..9ca4e20 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3453,7 +3453,7 @@ void kfree(const void *x)
 	if (unlikely(!PageSlab(page))) {
 		BUG_ON(!PageCompound(page));
 		kmemleak_free(x);
-		put_page(page);
+		__free_pages(page, compound_order(page));
 		return;
 	}
 	slab_free(page->slab, page, object, _RET_IP_);
--
1.7.11.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
From: Christoph Lameter @ 2012-08-02 14:06 UTC
To: Glauber Costa
Cc: linux-kernel, Andrew Morton, linux-mm, David Rientjes, Pekka Enberg

On Thu, 2 Aug 2012, Glauber Costa wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index e517d43..9ca4e20 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>  	if (unlikely(!PageSlab(page))) {
>  		BUG_ON(!PageCompound(page));
>  		kmemleak_free(x);
> -		put_page(page);
> +		__free_pages(page, compound_order(page));

Hmmm... put_page would have called put_compound_page(), which would have
called the dtor function. dtor is set to __free_pages() ok which does
mlock checks and verifies that the page is in a proper condition for
freeing. Then it calls free_one_page().

__free_pages() decrements the refcount and then calls __free_pages_ok().

So we lose the checking and the dtor stuff with this patch. Guess that is
ok?

Acked-by: Christoph Lameter <cl@linux.com>
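
The two paths compared in the message above can be sketched in a small userspace model. This is only an illustration under stated assumptions: the `mpage` structure and every `mpage_*` function are hypothetical stand-ins, not kernel code, and the model captures nothing beyond the call order described above (the last reference runs the destructor in the put_page() case, and goes directly to the final free routine in the __free_pages() case).

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>

struct mpage;
typedef void (*mpage_dtor_t)(struct mpage *);

struct mpage {
	int refcount;        /* stands in for the page reference count */
	unsigned int order;  /* stands in for compound_order(page) */
	mpage_dtor_t dtor;   /* stands in for the compound page destructor */
	int dtor_calls;      /* how often the destructor ran */
	int free_ok_calls;   /* how often the final free routine ran */
};

/* Models the final free routine shared by both paths. */
void mpage_free_ok(struct mpage *p, unsigned int order)
{
	(void)order;
	p->free_ok_calls++;
}

/* Models a destructor that simply forwards to the final free routine. */
void mpage_compound_dtor(struct mpage *p)
{
	p->dtor_calls++;
	mpage_free_ok(p, p->order);
}

/* Models put_page() on a compound page: last reference runs the dtor. */
void mpage_put_page(struct mpage *p)
{
	if (--p->refcount == 0)
		p->dtor(p);
}

/* Models __free_pages(): last reference skips the dtor entirely. */
void mpage_free_pages(struct mpage *p, unsigned int order)
{
	if (--p->refcount == 0)
		mpage_free_ok(p, order);
}

/* Builds a model page holding one reference, as after a large kmalloc. */
struct mpage mpage_kmalloc_large(unsigned int order)
{
	struct mpage p = { 1, order, mpage_compound_dtor, 0, 0 };
	return p;
}
```

In this model both paths reach `mpage_free_ok()` exactly once; the only difference is whether `dtor_calls` is incremented on the way, which is the "dtor stuff" the message refers to.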

* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
From: Johannes Weiner @ 2012-08-02 16:42 UTC
To: Christoph Lameter
Cc: Glauber Costa, linux-kernel, Andrew Morton, linux-mm, David Rientjes, Pekka Enberg

On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
> On Thu, 2 Aug 2012, Glauber Costa wrote:
>
> > diff --git a/mm/slub.c b/mm/slub.c
> > index e517d43..9ca4e20 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> >  	if (unlikely(!PageSlab(page))) {
> >  		BUG_ON(!PageCompound(page));
> >  		kmemleak_free(x);
> > -		put_page(page);
> > +		__free_pages(page, compound_order(page));
>
> Hmmm... put_page would have called put_compound_page(), which would have
> called the dtor function. dtor is set to __free_pages() ok which does
> mlock checks and verifies that the page is in a proper condition for
> freeing. Then it calls free_one_page().
>
> __free_pages() decrements the refcount and then calls __free_pages_ok().
>
> So we lose the checking and the dtor stuff with this patch. Guess that is
> ok?

The changelog is not correct, however.  People DO get pages underlying
slab objects and even free the slab objects before returning the page.
See recent fix:

commit 5bf5f03c271907978489868a4c72aeb42b5127d2
Author: Pravin B Shelar <pshelar@nicira.com>
Date:   Tue May 29 15:06:49 2012 -0700

    mm: fix slab->page flags corruption

    Transparent huge pages can change page->flags (PG_compound_lock) without
    taking Slab lock.  Since THP can not break slab pages we can safely access
    compound page without taking compound lock.

    Specifically this patch fixes a race between compound_unlock() and slab
    functions which perform page-flags updates.  This can occur when
    get_page()/put_page() is called on a page from slab.

* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
From: Glauber Costa @ 2012-08-02 16:51 UTC
To: Johannes Weiner
Cc: Christoph Lameter, linux-kernel, Andrew Morton, linux-mm, David Rientjes, Pekka Enberg

On 08/02/2012 08:42 PM, Johannes Weiner wrote:
> On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
>> On Thu, 2 Aug 2012, Glauber Costa wrote:
>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index e517d43..9ca4e20 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>>>  	if (unlikely(!PageSlab(page))) {
>>>  		BUG_ON(!PageCompound(page));
>>>  		kmemleak_free(x);
>>> -		put_page(page);
>>> +		__free_pages(page, compound_order(page));
>>
>> Hmmm... put_page would have called put_compound_page(), which would have
>> called the dtor function. dtor is set to __free_pages() ok which does
>> mlock checks and verifies that the page is in a proper condition for
>> freeing. Then it calls free_one_page().
>>
>> __free_pages() decrements the refcount and then calls __free_pages_ok().
>>
>> So we lose the checking and the dtor stuff with this patch. Guess that is
>> ok?
>
> The changelog is not correct, however.  People DO get pages underlying
> slab objects and even free the slab objects before returning the page.
> See recent fix:

Well, yes, in the sense that slab objects are page-backed.

The point is that a user of kmalloc/kfree should not treat a memory area
as if it were a page, even if it is page-sized.

If it is just the Changelog you are unhappy about, I can do another
submission rewording it.

> commit 5bf5f03c271907978489868a4c72aeb42b5127d2
> Author: Pravin B Shelar <pshelar@nicira.com>
> Date:   Tue May 29 15:06:49 2012 -0700
>
>     mm: fix slab->page flags corruption
>
>     Transparent huge pages can change page->flags (PG_compound_lock) without
>     taking Slab lock.  Since THP can not break slab pages we can safely access
>     compound page without taking compound lock.
>
>     Specifically this patch fixes a race between compound_unlock() and slab
>     functions which perform page-flags updates.  This can occur when
>     get_page()/put_page() is called on a page from slab.

This is just another argument not to do put_page on slab pages!

* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
From: Johannes Weiner @ 2012-08-02 17:10 UTC
To: Glauber Costa
Cc: Christoph Lameter, linux-kernel, Andrew Morton, linux-mm, David Rientjes, Pekka Enberg, Andrea Arcangeli

On Thu, Aug 02, 2012 at 08:51:31PM +0400, Glauber Costa wrote:
> On 08/02/2012 08:42 PM, Johannes Weiner wrote:
> > On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
> >> On Thu, 2 Aug 2012, Glauber Costa wrote:
> >>
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index e517d43..9ca4e20 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> >>>  	if (unlikely(!PageSlab(page))) {
> >>>  		BUG_ON(!PageCompound(page));
> >>>  		kmemleak_free(x);
> >>> -		put_page(page);
> >>> +		__free_pages(page, compound_order(page));
> >>
> >> Hmmm... put_page would have called put_compound_page(), which would have
> >> called the dtor function. dtor is set to __free_pages() ok which does
> >> mlock checks and verifies that the page is in a proper condition for
> >> freeing. Then it calls free_one_page().
> >>
> >> __free_pages() decrements the refcount and then calls __free_pages_ok().
> >>
> >> So we lose the checking and the dtor stuff with this patch. Guess that is
> >> ok?
> >
> > The changelog is not correct, however.  People DO get pages underlying
> > slab objects and even free the slab objects before returning the page.
> > See recent fix:
>
> Well, yes, in the sense that slab objects are page-backed.
>
> The point is that a user of kmalloc/kfree should not treat a memory area
> as if it were a page, even if it is page-sized.

I whole-heartedly agree.  But it's hard to verify there aren't any
doing that.  And even though it's ugly to do, it's technically
working, no?  No longer supporting it would be a regression.

> If it is just the Changelog you are unhappy about, I can do another
> submission rewording it.

__free_pages still respects the refcount, so I think the Changelog is
not actually appropriate for the change you're making.  You're just
changing what Christoph outlined above, the compound page handling.
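
The point that __free_pages still respects the refcount can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: `model_page` and the `model_*` functions are hypothetical names, not kernel definitions, and the model keeps only the behaviour stated in this thread, namely that both put_page() and __free_pages() drop one reference and free the page only when the count reaches zero.

```c
/* Userspace model only; these are NOT the kernel's definitions. */
#include <assert.h>
#include <stdbool.h>

struct model_page {
	int refcount;  /* stands in for the page reference count */
	bool freed;    /* set once the model returns the page to the allocator */
};

/* Models an allocation: a fresh page starts with one reference. */
void model_alloc(struct model_page *p)
{
	p->refcount = 1;
	p->freed = false;
}

/* Models get_page(): an outside user pins the page. */
void model_get_page(struct model_page *p)
{
	p->refcount++;
}

/* Drops one reference and reports whether it was the last one. */
bool model_put_testzero(struct model_page *p)
{
	assert(p->refcount > 0);
	return --p->refcount == 0;
}

/* Models put_page(): frees only when the last reference goes away. */
void model_put_page(struct model_page *p)
{
	if (model_put_testzero(p))
		p->freed = true;
}

/* Models __free_pages(): same refcount rule, different free path. */
void model_free_pages(struct model_page *p)
{
	if (model_put_testzero(p))
		p->freed = true;
}
```

Under these assumptions, a page that some other user has pinned with `model_get_page()` survives a `model_free_pages()` call from the kfree side and is released only by the later `model_put_page()`, so the reference counting is unchanged by the patch and only the free path differs.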

* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
From: Glauber Costa @ 2012-08-02 17:24 UTC
To: Johannes Weiner
Cc: Christoph Lameter, linux-kernel, Andrew Morton, linux-mm, David Rientjes, Pekka Enberg, Andrea Arcangeli

On 08/02/2012 09:10 PM, Johannes Weiner wrote:
> On Thu, Aug 02, 2012 at 08:51:31PM +0400, Glauber Costa wrote:
>> On 08/02/2012 08:42 PM, Johannes Weiner wrote:
>>> On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
>>>> On Thu, 2 Aug 2012, Glauber Costa wrote:
>>>>
>>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>>> index e517d43..9ca4e20 100644
>>>>> --- a/mm/slub.c
>>>>> +++ b/mm/slub.c
>>>>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>>>>>  	if (unlikely(!PageSlab(page))) {
>>>>>  		BUG_ON(!PageCompound(page));
>>>>>  		kmemleak_free(x);
>>>>> -		put_page(page);
>>>>> +		__free_pages(page, compound_order(page));
>>>>
>>>> Hmmm... put_page would have called put_compound_page(), which would have
>>>> called the dtor function. dtor is set to __free_pages() ok which does
>>>> mlock checks and verifies that the page is in a proper condition for
>>>> freeing. Then it calls free_one_page().
>>>>
>>>> __free_pages() decrements the refcount and then calls __free_pages_ok().
>>>>
>>>> So we lose the checking and the dtor stuff with this patch. Guess that is
>>>> ok?
>>>
>>> The changelog is not correct, however.  People DO get pages underlying
>>> slab objects and even free the slab objects before returning the page.
>>> See recent fix:
>>
>> Well, yes, in the sense that slab objects are page-backed.
>>
>> The point is that a user of kmalloc/kfree should not treat a memory area
>> as if it were a page, even if it is page-sized.
>
> I whole-heartedly agree.  But it's hard to verify there aren't any
> doing that.  And even though it's ugly to do, it's technically
> working, no?  No longer supporting it would be a regression.

I've done an extensive audit per Christoph's request, and although of
course this is not enough to guarantee it 100%, it should at least be
enough to sustain a belief that it should be reasonably safe.

About regressions, yes, it is working. But as you know, this area is
undergoing change by myself. For kmemcg to work, we need to
explicitly mark instances of __free_pages that are accounted. With this
patch, this is trivial. Without this patch, I need to come up with a
quite ugly hack to mark put_page() calls as well, which would exist for
no reason aside from "avoid touching this".

I could of course just bundle this in my series, but since this is an
independent change, it is better to send it separately so it gets better
review, testing and validation.

>> If it is just the Changelog you are unhappy about, I can do another
>> submission rewording it.
>
> __free_pages still respects the refcount, so I think the Changelog is
> not actually appropriate for the change you're making.  You're just
> changing what Christoph outlined above, the compound page handling.

I can update the Changelog, no problem.