* [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
@ 2012-08-02 13:11 Glauber Costa
2012-08-02 14:06 ` Christoph Lameter
0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-08-02 13:11 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, linux-mm, Glauber Costa, David Rientjes,
Pekka Enberg, Christoph Lameter
The slab allocators provide their users with memory regions, with very few
placement guarantees. No user should assume an actual page is given by
kmalloc calls that are a multiple of a page in size. This means that we
can be sure that every sane user of the interface would not mess with
the page reference counting of the underlying page.

When freeing objects, the slub allocator will most of the time free
empty pages by calling __free_pages(). But high-order kmalloc allocations
will be disposed of by means of put_page() instead.

It makes no sense to call put_page() on kernel pages that are not
reference counted, which is the case here.
Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: David Rientjes <rientjes@google.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Christoph Lameter <cl@linux.com>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index e517d43..9ca4e20 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3453,7 +3453,7 @@ void kfree(const void *x)
 	if (unlikely(!PageSlab(page))) {
 		BUG_ON(!PageCompound(page));
 		kmemleak_free(x);
-		put_page(page);
+		__free_pages(page, compound_order(page));
 		return;
 	}
 	slab_free(page->slab, page, object, _RET_IP_);
--
1.7.11.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
2012-08-02 13:11 [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation Glauber Costa
@ 2012-08-02 14:06 ` Christoph Lameter
2012-08-02 16:42 ` Johannes Weiner
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2012-08-02 14:06 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-kernel, Andrew Morton, linux-mm, David Rientjes,
Pekka Enberg
On Thu, 2 Aug 2012, Glauber Costa wrote:
> diff --git a/mm/slub.c b/mm/slub.c
> index e517d43..9ca4e20 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> if (unlikely(!PageSlab(page))) {
> BUG_ON(!PageCompound(page));
> kmemleak_free(x);
> - put_page(page);
> + __free_pages(page, compound_order(page));
Hmmm... put_page would have called put_compound_page(), which would have
called the dtor function. dtor is set to __free_pages() ok which does
mlock checks and verifies that the page is in a proper condition for
freeing. Then it calls free_one_page().
__free_pages() decrements the refcount and then calls __free_pages_ok().
So we lose the checking and the dtor stuff with this patch. Guess that is
ok?
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
2012-08-02 14:06 ` Christoph Lameter
@ 2012-08-02 16:42 ` Johannes Weiner
2012-08-02 16:51 ` Glauber Costa
0 siblings, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2012-08-02 16:42 UTC (permalink / raw)
To: Christoph Lameter
Cc: Glauber Costa, linux-kernel, Andrew Morton, linux-mm,
David Rientjes, Pekka Enberg
On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
> On Thu, 2 Aug 2012, Glauber Costa wrote:
>
> > diff --git a/mm/slub.c b/mm/slub.c
> > index e517d43..9ca4e20 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> > if (unlikely(!PageSlab(page))) {
> > BUG_ON(!PageCompound(page));
> > kmemleak_free(x);
> > - put_page(page);
> > + __free_pages(page, compound_order(page));
>
> Hmmm... put_page would have called put_compound_page(), which would have
> called the dtor function. dtor is set to __free_pages() ok which does
> mlock checks and verifies that the page is in a proper condition for
> freeing. Then it calls free_one_page().
>
> __free_pages() decrements the refcount and then calls __free_pages_ok().
>
> So we lose the checking and the dtor stuff with this patch. Guess that is
> ok?
The changelog is not correct, however. People DO get pages underlying
slab objects and even free the slab objects before returning the page.
See recent fix:
commit 5bf5f03c271907978489868a4c72aeb42b5127d2
Author: Pravin B Shelar <pshelar@nicira.com>
Date: Tue May 29 15:06:49 2012 -0700
mm: fix slab->page flags corruption
Transparent huge pages can change page->flags (PG_compound_lock) without
taking Slab lock. Since THP can not break slab pages we can safely access
compound page without taking compound lock.
Specifically this patch fixes a race between compound_unlock() and slab
functions which perform page-flags updates. This can occur when
get_page()/put_page() is called on a page from slab.
* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
2012-08-02 16:42 ` Johannes Weiner
@ 2012-08-02 16:51 ` Glauber Costa
2012-08-02 17:10 ` Johannes Weiner
0 siblings, 1 reply; 6+ messages in thread
From: Glauber Costa @ 2012-08-02 16:51 UTC (permalink / raw)
To: Johannes Weiner
Cc: Christoph Lameter, linux-kernel, Andrew Morton, linux-mm,
David Rientjes, Pekka Enberg
On 08/02/2012 08:42 PM, Johannes Weiner wrote:
> On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
>> On Thu, 2 Aug 2012, Glauber Costa wrote:
>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index e517d43..9ca4e20 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>>> if (unlikely(!PageSlab(page))) {
>>> BUG_ON(!PageCompound(page));
>>> kmemleak_free(x);
>>> - put_page(page);
>>> + __free_pages(page, compound_order(page));
>>
>> Hmmm... put_page would have called put_compound_page(), which would have
>> called the dtor function. dtor is set to __free_pages() ok which does
>> mlock checks and verifies that the page is in a proper condition for
>> freeing. Then it calls free_one_page().
>>
>> __free_pages() decrements the refcount and then calls __free_pages_ok().
>>
>> So we lose the checking and the dtor stuff with this patch. Guess that is
>> ok?
>
> The changelog is not correct, however. People DO get pages underlying
> slab objects and even free the slab objects before returning the page.
> See recent fix:
Well, yes, in the sense that slab objects are page-backed.
The point is that a user of kmalloc/kfree should not treat a memory area
as if it were a page, even if it is page-sized.
If it is just the Changelog you are unhappy about, I can do another
submission rewording it.
> commit 5bf5f03c271907978489868a4c72aeb42b5127d2
> Author: Pravin B Shelar <pshelar@nicira.com>
> Date: Tue May 29 15:06:49 2012 -0700
>
> mm: fix slab->page flags corruption
>
> Transparent huge pages can change page->flags (PG_compound_lock) without
> taking Slab lock. Since THP can not break slab pages we can safely access
> compound page without taking compound lock.
>
> Specifically this patch fixes a race between compound_unlock() and slab
> functions which perform page-flags updates. This can occur when
> get_page()/put_page() is called on a page from slab.
This is just another argument not to do put_page on slab pages!
* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
2012-08-02 16:51 ` Glauber Costa
@ 2012-08-02 17:10 ` Johannes Weiner
2012-08-02 17:24 ` Glauber Costa
0 siblings, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2012-08-02 17:10 UTC (permalink / raw)
To: Glauber Costa
Cc: Christoph Lameter, linux-kernel, Andrew Morton, linux-mm,
David Rientjes, Pekka Enberg, Andrea Arcangeli
On Thu, Aug 02, 2012 at 08:51:31PM +0400, Glauber Costa wrote:
> On 08/02/2012 08:42 PM, Johannes Weiner wrote:
> > On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
> >> On Thu, 2 Aug 2012, Glauber Costa wrote:
> >>
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index e517d43..9ca4e20 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
> >>> if (unlikely(!PageSlab(page))) {
> >>> BUG_ON(!PageCompound(page));
> >>> kmemleak_free(x);
> >>> - put_page(page);
> >>> + __free_pages(page, compound_order(page));
> >>
> >> Hmmm... put_page would have called put_compound_page(), which would have
> >> called the dtor function. dtor is set to __free_pages() ok which does
> >> mlock checks and verifies that the page is in a proper condition for
> >> freeing. Then it calls free_one_page().
> >>
> >> __free_pages() decrements the refcount and then calls __free_pages_ok().
> >>
> >> So we lose the checking and the dtor stuff with this patch. Guess that is
> >> ok?
> >
> > The changelog is not correct, however. People DO get pages underlying
> > slab objects and even free the slab objects before returning the page.
> > See recent fix:
>
> Well, yes, in the sense that slab objects are page-backed.
>
> The point is that a user of kmalloc/kfree should not treat a memory area
> as if it were a page, even if it is page-sized.
I whole-heartedly agree. But it's hard to verify there aren't any
doing that. And even though it's ugly to do, it's technically
working, no? No longer supporting it would be a regression.
> If it is just the Changelog you are unhappy about, I can do another
> submission rewording it.
__free_pages still respects the refcount, so I think the Changelog is
not actually appropriate for the change you're making. You're just
changing what Christoph outlined above, the compound page handling.
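
To make the refcount point concrete, here is a minimal user-space sketch in C. It is not kernel code: every `model_*` name is hypothetical and only mirrors the helpers named in this thread, and it assumes, as described above, that both put_page() and __free_pages() drop one reference and release the pages only when the count reaches zero, differing in whether the compound destructor runs.

```c
#include <stdbool.h>

/* User-space model only; nothing here is the real struct page. */
struct model_page {
	int refcount;	/* stands in for the page reference count */
	unsigned order;	/* compound order of the allocation */
	bool freed;	/* set once the pages go back to the allocator */
	bool via_dtor;	/* set if the compound destructor did the freeing */
};

struct model_result {
	bool freed_after_kfree;	/* were the pages released by kfree() itself? */
	bool freed_at_end;	/* were they released by the end of the scenario? */
	bool via_dtor;		/* did the release go through the destructor? */
};

/* Drop one reference; returns true when the last reference was dropped. */
static bool model_put_page_testzero(struct model_page *page)
{
	return --page->refcount == 0;
}

/* Stand-in for handing the pages back to the page allocator. */
static void model_free_pages_ok(struct model_page *page)
{
	page->freed = true;
}

/* Stand-in for the compound page destructor. */
static void model_compound_dtor(struct model_page *page)
{
	page->via_dtor = true;
	model_free_pages_ok(page);
}

/* Models put_page() on a compound page (kfree() before the patch). */
static void model_put_page(struct model_page *page)
{
	if (model_put_page_testzero(page))
		model_compound_dtor(page);
}

/* Models __free_pages(page, order) (kfree() after the patch). */
static void model_free_pages(struct model_page *page, unsigned order)
{
	(void)order;	/* the order does not matter for the refcount logic */
	if (model_put_page_testzero(page))
		model_free_pages_ok(page);
}

/* One large kmalloc() buffer, optionally with a user-held page reference. */
static struct model_result model_scenario(bool patched, bool extra_ref)
{
	struct model_page page = { .refcount = 1, .order = 2 };
	struct model_result res;

	if (extra_ref)
		page.refcount++;	/* models a user calling get_page() */

	/* kfree() on the large allocation */
	if (patched)
		model_free_pages(&page, page.order);
	else
		model_put_page(&page);
	res.freed_after_kfree = page.freed;

	/* the user drops its own reference after the kfree() */
	if (extra_ref)
		model_put_page(&page);

	res.freed_at_end = page.freed;
	res.via_dtor = page.via_dtor;
	return res;
}
```

In this model, calling `model_scenario(true, true)` and `model_scenario(false, true)` gives the same outcome: the pages survive the kfree() and are released by the user's final put. Only the case without an extra reference differs, in that the patched path skips the destructor.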
* Re: [PATCH] slub: use free_page instead of put_page for freeing kmalloc allocation
2012-08-02 17:10 ` Johannes Weiner
@ 2012-08-02 17:24 ` Glauber Costa
0 siblings, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-08-02 17:24 UTC (permalink / raw)
To: Johannes Weiner
Cc: Christoph Lameter, linux-kernel, Andrew Morton, linux-mm,
David Rientjes, Pekka Enberg, Andrea Arcangeli
On 08/02/2012 09:10 PM, Johannes Weiner wrote:
> On Thu, Aug 02, 2012 at 08:51:31PM +0400, Glauber Costa wrote:
>> On 08/02/2012 08:42 PM, Johannes Weiner wrote:
>>> On Thu, Aug 02, 2012 at 09:06:41AM -0500, Christoph Lameter wrote:
>>>> On Thu, 2 Aug 2012, Glauber Costa wrote:
>>>>
>>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>>> index e517d43..9ca4e20 100644
>>>>> --- a/mm/slub.c
>>>>> +++ b/mm/slub.c
>>>>> @@ -3453,7 +3453,7 @@ void kfree(const void *x)
>>>>> if (unlikely(!PageSlab(page))) {
>>>>> BUG_ON(!PageCompound(page));
>>>>> kmemleak_free(x);
>>>>> - put_page(page);
>>>>> + __free_pages(page, compound_order(page));
>>>>
>>>> Hmmm... put_page would have called put_compound_page(), which would have
>>>> called the dtor function. dtor is set to __free_pages() ok which does
>>>> mlock checks and verifies that the page is in a proper condition for
>>>> freeing. Then it calls free_one_page().
>>>>
>>>> __free_pages() decrements the refcount and then calls __free_pages_ok().
>>>>
>>>> So we lose the checking and the dtor stuff with this patch. Guess that is
>>>> ok?
>>>
>>> The changelog is not correct, however. People DO get pages underlying
>>> slab objects and even free the slab objects before returning the page.
>>> See recent fix:
>>
>> Well, yes, in the sense that slab objects are page-backed.
>>
>> The point is that a user of kmalloc/kfree should not treat a memory area
>> as if it were a page, even if it is page-sized.
>
> I whole-heartedly agree. But it's hard to verify there aren't any
> doing that. And even though it's ugly to do, it's technically
> working, no? No longer supporting it would be a regression.
I've done an extensive audit per Christoph's request, and although of
course this is not enough to guarantee it 100%, it should at least be
enough to sustain a belief that it should be reasonably safe.
About regressions, yes, it is working. But as you know, this area is
undergoing change by me. For kmemcg to work, we need to
explicitly mark instances of __free_pages that are accounted. With this
patch, this is trivial. Without this patch, I need to come up with a
quite ugly hack to mark put_page as well, that would exist for no
reason aside from "avoid touching this".
I could of course just bundle this in my series, but since this is an
independent change, it is better to send it separately so it gets better
review, testing and validation.
>> If it is just the Changelog you are unhappy about, I can do another
>> submission rewording it.
>
> __free_pages still respects the refcount, so I think the Changelog is
> not actually appropriate for the change you're making. You're just
> changing what Christoph outlined above, the compound page handling.
I can update the Changelog, no problem.