linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Any reason to use put_page in slub.c?
@ 2012-07-27 12:19 Glauber Costa
  2012-07-27 15:55 ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Glauber Costa @ 2012-07-27 12:19 UTC (permalink / raw)
  To: linux-mm; +Cc: Pekka Enberg, David Rientjes, Christoph Lameter, Andrew Morton

Hi,

I've recently came across a bug in my kmemcg slab implementation, where memory
wasn't being unaccounted every time I expected it to be.  (bugs found by myself
are becoming a lot lot rarer, for the record)

I tracked it down to be due to the fact that we are now unaccounting at the
page allocator by calling __free_accounted_pages instead of normal
__free_pages.

However, higher order kmalloc allocations in the slub doesn't do that.  They
call put_page() instead, and I missed the conversion spot when converting
__free_pages() to __free_accounted_pages().

Now, although of course I can come up with put_accounted_page(), this is a bit
more awkward: first, it is in everybody's interest in keeping changes to the
page allocator to a minimum; also, put_page will not necessarily free the page,
so the semantics can get a bit complicated.

Since we are not doing any kind of page sharing with those pages in the slub -
and are already doing compound checks ourselves, I was wondering why couldn't
we just use __free_pages() instead. I see no reason not to. Replacing it with
__free_page() seems to work - my patched kernel is up and running, and doing
fine.

But I am still wondering if there is anything I am overlooking.

Do you guys think the following patch is safe?

---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index a136a75..a8fffeb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3399,7 +3399,7 @@ void kfree(const void *x)
 	if (unlikely(!PageSlab(page))) {
 		BUG_ON(!PageCompound(page));
 		kmemleak_free(x);
-		put_page(page);
+		__free_pages(page);
 		return;
 	}
 	slab_free(page->slab, page, object, _RET_IP_);
-- 
1.7.10.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-27 12:19 Any reason to use put_page in slub.c? Glauber Costa
@ 2012-07-27 15:55 ` Christoph Lameter
  2012-07-30  7:53   ` Glauber Costa
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2012-07-27 15:55 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On Fri, 27 Jul 2012, Glauber Costa wrote:

> But I am still wondering if there is anything I am overlooking.

put_page() is necessary because other subsystems may still be holding a
refcount on the page (if f.e. there is DMA still pending to that page).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-27 15:55 ` Christoph Lameter
@ 2012-07-30  7:53   ` Glauber Costa
  2012-07-30 19:23     ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Glauber Costa @ 2012-07-30  7:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On 07/27/2012 07:55 PM, Christoph Lameter wrote:
> On Fri, 27 Jul 2012, Glauber Costa wrote:
> 
>> But I am still wondering if there is anything I am overlooking.
> 
> put_page() is necessary because other subsystems may still be holding a
> refcount on the page (if f.e. there is DMA still pending to that page).
> 

Humm, this seems to be extremely unsafe in my read.
If you do kmalloc, the API - AFAIK - does not provide us with any
guarantee that the object (it's not even a page, in the strict sense!)
allocated is reference counted internally. So relying on kfree to do it
doesn't bode well. For one thing, slab doesn't go to the page allocator
for high order allocations, and this code would crash miserably if
running with the slab.

Or am I missing something ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-30  7:53   ` Glauber Costa
@ 2012-07-30 19:23     ` Christoph Lameter
  2012-07-31  8:25       ` Glauber Costa
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2012-07-30 19:23 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On Mon, 30 Jul 2012, Glauber Costa wrote:

> On 07/27/2012 07:55 PM, Christoph Lameter wrote:
> > On Fri, 27 Jul 2012, Glauber Costa wrote:
> >
> >> But I am still wondering if there is anything I am overlooking.
> >
> > put_page() is necessary because other subsystems may still be holding a
> > refcount on the page (if f.e. there is DMA still pending to that page).
> >
>
> Humm, this seems to be extremely unsafe in my read.

I do not like it either. Hopefully these usecases have been removed in the
meantime but that used to be an issue.

> If you do kmalloc, the API - AFAIK - does not provide us with any
> guarantee that the object (it's not even a page, in the strict sense!)
> allocated is reference counted internally. So relying on kfree to do it
> doesn't bode well. For one thing, slab doesn't go to the page allocator
> for high order allocations, and this code would crash miserably if
> running with the slab.
>
> Or am I missing something ?

Yes the refcounting is done at the page level by the page allocator. It is
safe. The slab allocator can free a page removing all references from its
internal structure while the subsystem page reference will hold off the
page allocator from actually freeing the page until the subsystem itself
drops the page count.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-30 19:23     ` Christoph Lameter
@ 2012-07-31  8:25       ` Glauber Costa
  2012-07-31 14:09         ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Glauber Costa @ 2012-07-31  8:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On 07/30/2012 11:23 PM, Christoph Lameter wrote:
> On Mon, 30 Jul 2012, Glauber Costa wrote:
> 
>> On 07/27/2012 07:55 PM, Christoph Lameter wrote:
>>> On Fri, 27 Jul 2012, Glauber Costa wrote:
>>>
>>>> But I am still wondering if there is anything I am overlooking.
>>>
>>> put_page() is necessary because other subsystems may still be holding a
>>> refcount on the page (if f.e. there is DMA still pending to that page).
>>>
>>
>> Humm, this seems to be extremely unsafe in my read.
> 
> I do not like it either. Hopefully these usecases have been removed in the
> meantime but that used to be an issue.
> 
>> If you do kmalloc, the API - AFAIK - does not provide us with any
>> guarantee that the object (it's not even a page, in the strict sense!)
>> allocated is reference counted internally. So relying on kfree to do it
>> doesn't bode well. For one thing, slab doesn't go to the page allocator
>> for high order allocations, and this code would crash miserably if
>> running with the slab.
>>
>> Or am I missing something ?
> 
> Yes the refcounting is done at the page level by the page allocator. It is
> safe. The slab allocator can free a page removing all references from its
> internal structure while the subsystem page reference will hold off the
> page allocator from actually freeing the page until the subsystem itself
> drops the page count.
> 

pages, yes. But when you do kfree, you don't free a page. You free an
object. The allocator is totally free to keep the page around and pass
it on to someone else.

The use case that put_page protect against, would be totally and
absolutely broken with every other allocator. They could give an object
in the same address to another user in the very next moment.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31 14:09         ` Christoph Lameter
@ 2012-07-31 14:09           ` Glauber Costa
  2012-07-31 14:17             ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Glauber Costa @ 2012-07-31 14:09 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On 07/31/2012 06:09 PM, Christoph Lameter wrote:
> That is understood. Typically these object where page sized though and
> various assumptions (pretty dangerous ones as you are finding out) are
> made regarding object reuse. The fallback of SLUB for higher order allocs
> to the page allocator avoids these problems for higher order pages.
omg...

I am curious how slab handles this, since it doesn't seem to refcount in
the same way slub does?

Now, I am still left with the original problem:
__free_pages() here would be a superior solution, and the right thing to
do. Should we just convert it - and then fix whoever we find to be
abusing it (it doesn't mean anything, but I am running it on my systems
since then - 0 problems), or should I just create a hacky
put_accounted_page()?

I really, really dislike the later.

Anyone else would care to comment on this ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31  8:25       ` Glauber Costa
@ 2012-07-31 14:09         ` Christoph Lameter
  2012-07-31 14:09           ` Glauber Costa
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2012-07-31 14:09 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On Tue, 31 Jul 2012, Glauber Costa wrote:

> >> Or am I missing something ?
> >
> > Yes the refcounting is done at the page level by the page allocator. It is
> > safe. The slab allocator can free a page removing all references from its
> > internal structure while the subsystem page reference will hold off the
> > page allocator from actually freeing the page until the subsystem itself
> > drops the page count.
> >
>
> pages, yes. But when you do kfree, you don't free a page. You free an
> object. The allocator is totally free to keep the page around and pass
> it on to someone else.

That is understood. Typically these object where page sized though and
various assumptions (pretty dangerous ones as you are finding out) are
made regarding object reuse. The fallback of SLUB for higher order allocs
to the page allocator avoids these problems for higher order pages.

It would be better and cleaner if all callers would not use slab
allocators but the page allocators directly for any page that requires an
increased refcount for DMA operations.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31 14:09           ` Glauber Costa
@ 2012-07-31 14:17             ` Christoph Lameter
  2012-07-31 14:18               ` Glauber Costa
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2012-07-31 14:17 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On Tue, 31 Jul 2012, Glauber Costa wrote:

> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
> > That is understood. Typically these object where page sized though and
> > various assumptions (pretty dangerous ones as you are finding out) are
> > made regarding object reuse. The fallback of SLUB for higher order allocs
> > to the page allocator avoids these problems for higher order pages.
> omg...

I would be very thankful if you would go through the tree and check for
any remaining use cases like that. Would take care of your problem.

> I am curious how slab handles this, since it doesn't seem to refcount in
> the same way slub does?

Slabs are not refcounting in general. With slab larger sized free pages
may be queued for awhile on the freelist. I guess this has taken care of
these issues in the past.

> Now, I am still left with the original problem:
> __free_pages() here would be a superior solution, and the right thing to
> do. Should we just convert it - and then fix whoever we find to be
> abusing it (it doesn't mean anything, but I am running it on my systems
> since then - 0 problems), or should I just create a hacky
> put_accounted_page()?
>
> I really, really dislike the later.

So do I. If you can verify that this no longer occurs then your patch wil
be fine and we can get rid of the put_page().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31 14:17             ` Christoph Lameter
@ 2012-07-31 14:18               ` Glauber Costa
  2012-07-31 14:31                 ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Glauber Costa @ 2012-07-31 14:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On 07/31/2012 06:17 PM, Christoph Lameter wrote:
> On Tue, 31 Jul 2012, Glauber Costa wrote:
> 
>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>> That is understood. Typically these object where page sized though and
>>> various assumptions (pretty dangerous ones as you are finding out) are
>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>> to the page allocator avoids these problems for higher order pages.
>> omg...
> 
> I would be very thankful if you would go through the tree and check for
> any remaining use cases like that. Would take care of your problem.

I would be happy to do it. Do you have any example of any user that
behaved like this in the past, so I can search for something similar?

This can potentially take many forms, and auditing every kfree out there
is not humanly possible. The best I can do is to search for known
patterns here...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31 14:18               ` Glauber Costa
@ 2012-07-31 14:31                 ` Christoph Lameter
  2012-07-31 14:52                   ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Lameter @ 2012-07-31 14:31 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Pekka Enberg, David Rientjes, Andrew Morton

On Tue, 31 Jul 2012, Glauber Costa wrote:

> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
> > On Tue, 31 Jul 2012, Glauber Costa wrote:
> >
> >> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
> >>> That is understood. Typically these object where page sized though and
> >>> various assumptions (pretty dangerous ones as you are finding out) are
> >>> made regarding object reuse. The fallback of SLUB for higher order allocs
> >>> to the page allocator avoids these problems for higher order pages.
> >> omg...
> >
> > I would be very thankful if you would go through the tree and check for
> > any remaining use cases like that. Would take care of your problem.
>
> I would be happy to do it. Do you have any example of any user that
> behaved like this in the past, so I can search for something similar?
>
> This can potentially take many forms, and auditing every kfree out there
> is not humanly possible. The best I can do is to search for known
> patterns here...

The basic problem is that someone will take the address of an object that
is allocated via slab and then access the page struct to increase the page
count.

So you would see

page = virt_to_page(<slab_object>);

get_page(page);


The main cuprit in the past has been the DMA code in the SCSI layer. I
think it was the first 512 byte control block for the device that was the
main issue. There was a discussion betwen Hugh Dickins and me when SLUB
was first released about this issue and it resulted in some changes so
that certain fields in the page struct were not touched by SLUB since they
were needed for I/O.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31 14:31                 ` Christoph Lameter
@ 2012-07-31 14:52                   ` James Bottomley
  2012-08-01 12:42                     ` Glauber Costa
  0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2012-07-31 14:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Glauber Costa, linux-mm, Pekka Enberg, David Rientjes,
	Andrew Morton

On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
> On Tue, 31 Jul 2012, Glauber Costa wrote:
> 
> > On 07/31/2012 06:17 PM, Christoph Lameter wrote:
> > > On Tue, 31 Jul 2012, Glauber Costa wrote:
> > >
> > >> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
> > >>> That is understood. Typically these object where page sized though and
> > >>> various assumptions (pretty dangerous ones as you are finding out) are
> > >>> made regarding object reuse. The fallback of SLUB for higher order allocs
> > >>> to the page allocator avoids these problems for higher order pages.
> > >> omg...
> > >
> > > I would be very thankful if you would go through the tree and check for
> > > any remaining use cases like that. Would take care of your problem.
> >
> > I would be happy to do it. Do you have any example of any user that
> > behaved like this in the past, so I can search for something similar?
> >
> > This can potentially take many forms, and auditing every kfree out there
> > is not humanly possible. The best I can do is to search for known
> > patterns here...
> 
> The basic problem is that someone will take the address of an object that
> is allocated via slab and then access the page struct to increase the page
> count.
> 
> So you would see
> 
> page = virt_to_page(<slab_object>);
> 
> get_page(page);
> 
> 
> The main cuprit in the past has been the DMA code in the SCSI layer. I
> think it was the first 512 byte control block for the device that was the
> main issue. There was a discussion betwen Hugh Dickins and me when SLUB
> was first released about this issue and it resulted in some changes so
> that certain fields in the page struct were not touched by SLUB since they
> were needed for I/O.

Hey, don't try to pin this on me.  We don't use get_page() at all on the
ordinary DMA route.  There are four get_page() calls in the whole of
drivers/scsi.  One is in the sg.c fault path, which looks genuine.  The
other three are in fcoe and iSCSI ... what they're trying to do is to
ensure that the page hangs around until the device sees the data in a
network tx path.

I can't see why any of these pages would come from kmalloc() or any
other slab object since they should all be user pages.

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-07-31 14:52                   ` James Bottomley
@ 2012-08-01 12:42                     ` Glauber Costa
  2012-08-01 18:10                       ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Glauber Costa @ 2012-08-01 12:42 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Lameter, linux-mm, Pekka Enberg, David Rientjes,
	Andrew Morton

On 07/31/2012 06:52 PM, James Bottomley wrote:
> On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>
>>> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
>>>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>>>
>>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>>>>> That is understood. Typically these object where page sized though and
>>>>>> various assumptions (pretty dangerous ones as you are finding out) are
>>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>>>>> to the page allocator avoids these problems for higher order pages.
>>>>> omg...
>>>>
>>>> I would be very thankful if you would go through the tree and check for
>>>> any remaining use cases like that. Would take care of your problem.
>>>
>>> I would be happy to do it. Do you have any example of any user that
>>> behaved like this in the past, so I can search for something similar?
>>>
>>> This can potentially take many forms, and auditing every kfree out there
>>> is not humanly possible. The best I can do is to search for known
>>> patterns here...
>>
>> The basic problem is that someone will take the address of an object that
>> is allocated via slab and then access the page struct to increase the page
>> count.
>>
>> So you would see
>>
>> page = virt_to_page(<slab_object>);
>>
>> get_page(page);
>>
>>
>> The main cuprit in the past has been the DMA code in the SCSI layer. I
>> think it was the first 512 byte control block for the device that was the
>> main issue. There was a discussion betwen Hugh Dickins and me when SLUB
>> was first released about this issue and it resulted in some changes so
>> that certain fields in the page struct were not touched by SLUB since they
>> were needed for I/O.
> 
> Hey, don't try to pin this on me.  We don't use get_page() at all on the
> ordinary DMA route.  There are four get_page() calls in the whole of
> drivers/scsi.  One is in the sg.c fault path, which looks genuine.  The
> other three are in fcoe and iSCSI ... what they're trying to do is to
> ensure that the page hangs around until the device sees the data in a
> network tx path.
> 
> I can't see why any of these pages would come from kmalloc() or any
> other slab object since they should all be user pages.
> 
> James
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

On 07/31/2012 06:52 PM, James Bottomley wrote:
> On Tue, 2012-07-31 at 09:31 -0500, Christoph Lameter wrote:
>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>
>>> On 07/31/2012 06:17 PM, Christoph Lameter wrote:
>>>> On Tue, 31 Jul 2012, Glauber Costa wrote:
>>>>
>>>>> On 07/31/2012 06:09 PM, Christoph Lameter wrote:
>>>>>> That is understood. Typically these object where page sized though and
>>>>>> various assumptions (pretty dangerous ones as you are finding out) are
>>>>>> made regarding object reuse. The fallback of SLUB for higher order allocs
>>>>>> to the page allocator avoids these problems for higher order pages.
>>>>> omg...
>>>>
>>>> I would be very thankful if you would go through the tree and check for
>>>> any remaining use cases like that. Would take care of your problem.
>>>
>>> I would be happy to do it. Do you have any example of any user that
>>> behaved like this in the past, so I can search for something similar?
>>>
>>> This can potentially take many forms, and auditing every kfree out there
>>> is not humanly possible. The best I can do is to search for known
>>> patterns here...
>>
>> The basic problem is that someone will take the address of an object that
>> is allocated via slab and then access the page struct to increase the page
>> count.
>>
>> So you would see
>>
>> page = virt_to_page(<slab_object>);
>>
>> get_page(page);
>>
>>
>> The main cuprit in the past has been the DMA code in the SCSI layer. I
>> think it was the first 512 byte control block for the device that was the
>> main issue. There was a discussion betwen Hugh Dickins and me when SLUB
>> was first released about this issue and it resulted in some changes so
>> that certain fields in the page struct were not touched by SLUB since they
>> were needed for I/O.
> 
> Hey, don't try to pin this on me.  We don't use get_page() at all on the
> ordinary DMA route.  There are four get_page() calls in the whole of
> drivers/scsi.  One is in the sg.c fault path, which looks genuine.  The
> other three are in fcoe and iSCSI ... what they're trying to do is to
> ensure that the page hangs around until the device sees the data in a
> network tx path.
> 
> I can't see why any of these pages would come from kmalloc() or any
> other slab object since they should all be user pages.
> 

I've audited all users of get_page() in the drivers/ directory for
patterns like this. In general, they kmalloc something like a table of
entries, and then get_page() the entries. The entries are either user
pages, pages allocated by the page allocator, or physical addresses
through their pfn (in 2 cases from the vga ones...)

I took a look about some other instances where virt_to_page occurs
together with kmalloc as well, and they all seem to fall in the same
category.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-08-01 12:42                     ` Glauber Costa
@ 2012-08-01 18:10                       ` Christoph Lameter
  2012-08-02  7:55                         ` Glauber Costa
  2012-08-02  8:07                         ` James Bottomley
  0 siblings, 2 replies; 15+ messages in thread
From: Christoph Lameter @ 2012-08-01 18:10 UTC (permalink / raw)
  To: Glauber Costa
  Cc: James Bottomley, linux-mm, Pekka Enberg, David Rientjes,
	Andrew Morton

On Wed, 1 Aug 2012, Glauber Costa wrote:

> I've audited all users of get_page() in the drivers/ directory for
> patterns like this. In general, they kmalloc something like a table of
> entries, and then get_page() the entries. The entries are either user
> pages, pages allocated by the page allocator, or physical addresses
> through their pfn (in 2 cases from the vga ones...)
>
> I took a look about some other instances where virt_to_page occurs
> together with kmalloc as well, and they all seem to fall in the same
> category.

The case that was notorious in the past was a scsi control structure
allocated from slab that was then written to the device via DMA. And it
was not on x86 but some esoteric platform (powerpc?),

A reference to the discussion of this issue in 2007:

http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0424.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-08-01 18:10                       ` Christoph Lameter
@ 2012-08-02  7:55                         ` Glauber Costa
  2012-08-02  8:07                         ` James Bottomley
  1 sibling, 0 replies; 15+ messages in thread
From: Glauber Costa @ 2012-08-02  7:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: James Bottomley, linux-mm, Pekka Enberg, David Rientjes,
	Andrew Morton

On 08/01/2012 10:10 PM, Christoph Lameter wrote:
> On Wed, 1 Aug 2012, Glauber Costa wrote:
> 
>> I've audited all users of get_page() in the drivers/ directory for
>> patterns like this. In general, they kmalloc something like a table of
>> entries, and then get_page() the entries. The entries are either user
>> pages, pages allocated by the page allocator, or physical addresses
>> through their pfn (in 2 cases from the vga ones...)
>>
>> I took a look about some other instances where virt_to_page occurs
>> together with kmalloc as well, and they all seem to fall in the same
>> category.
> 
> The case that was notorious in the past was a scsi control structure
> allocated from slab that was then written to the device via DMA. And it
> was not on x86 but some esoteric platform (powerpc?),
> 
> A reference to the discussion of this issue in 2007:
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0424.html
> 
Thanks.

So again, I've scanned across that thread, and found some very useful
excerpts from it, that can only argue in favor of my patch =)

"There are no kmalloced pages. There is only kmalloced memory. You
allocate pages from the page allocator. Its a layering violation to
expect a page struct operation on a slab object to work."

"So someone played loose ball with the slab, was successful and that
makes it right now?"

Looking at the code again, I see that page_mapping(), that ends up being
called to do the translation in those pathological cases now features a
VM_BUG_ON(), put in place by yourself. This dates back from 2007, giving
me enough reason to believe that whatever issue still existed back then
is already sorted out - or nobody really cares.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Any reason to use put_page in slub.c?
  2012-08-01 18:10                       ` Christoph Lameter
  2012-08-02  7:55                         ` Glauber Costa
@ 2012-08-02  8:07                         ` James Bottomley
  1 sibling, 0 replies; 15+ messages in thread
From: James Bottomley @ 2012-08-02  8:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Glauber Costa, linux-mm, Pekka Enberg, David Rientjes,
	Andrew Morton

On Wed, 2012-08-01 at 13:10 -0500, Christoph Lameter wrote:
> On Wed, 1 Aug 2012, Glauber Costa wrote:
> 
> > I've audited all users of get_page() in the drivers/ directory for
> > patterns like this. In general, they kmalloc something like a table of
> > entries, and then get_page() the entries. The entries are either user
> > pages, pages allocated by the page allocator, or physical addresses
> > through their pfn (in 2 cases from the vga ones...)
> >
> > I took a look about some other instances where virt_to_page occurs
> > together with kmalloc as well, and they all seem to fall in the same
> > category.
> 
> The case that was notorious in the past was a scsi control structure
> allocated from slab that was then written to the device via DMA. And it
> was not on x86 but some esoteric platform (powerpc?),
> 
> A reference to the discussion of this issue in 2007:
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/0706.3/0424.html

What you wrote above bears no relation to the thread.  The thread is a
long argument about what flush_dcache_page() should be doing if called
on slab memory.  Hugh told you about five times: "nothing".  Which is
the correct answer since flush_dcache_page() is our user to kernel
memory coherence API ... it's actually nothing to do with DMA although
it can be used to cause coherence for the purposes for DMA, but often
it's simply used to allow the kernel to write to or read from user
memory.

What you seem to be worried about is DMA cache line interference caused
by object misalignment?  But even in what you wrote above it's clear you
don't understand what that actually is.

As long as you flush correctly, you can never actually get DMA cache
line interference on memory being sent to a device via DMA ... however
misaligned it is.  The killer case is unresolvable incoherencies when
you DMA *from* a device.  For this case, if you have a cache line
overlapping an object like this

--------------------------------
  OBJ     | Other OBJ
--------------------------------
  | CPU Cache line |
--------------------------------

If OBJ gets its underlying page altered by DMA at the same time someone
writes to other OBJ (causing the CPU to pull in the cache line with the
old pre-DMA value for OBJ and then dirty the component for Other OBJ),
you have both a dirty cache line and altered (i.e. dirty) underlying
memory.  Here an invalidate will destroy the update to other OBJ and a
flush will destroy the DMA update to OBJ, so the alias is unresolvable.

Is that what the worry is?

James


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-08-02  8:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-27 12:19 Any reason to use put_page in slub.c? Glauber Costa
2012-07-27 15:55 ` Christoph Lameter
2012-07-30  7:53   ` Glauber Costa
2012-07-30 19:23     ` Christoph Lameter
2012-07-31  8:25       ` Glauber Costa
2012-07-31 14:09         ` Christoph Lameter
2012-07-31 14:09           ` Glauber Costa
2012-07-31 14:17             ` Christoph Lameter
2012-07-31 14:18               ` Glauber Costa
2012-07-31 14:31                 ` Christoph Lameter
2012-07-31 14:52                   ` James Bottomley
2012-08-01 12:42                     ` Glauber Costa
2012-08-01 18:10                       ` Christoph Lameter
2012-08-02  7:55                         ` Glauber Costa
2012-08-02  8:07                         ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).