public inbox for linux-kernel@vger.kernel.org
* Question on debugging use-after-free memory issues.
@ 2011-06-27 18:12 Ben Greear
  2011-06-28 22:00 ` Jiri Kosina
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2011-06-27 18:12 UTC (permalink / raw)
  To: Linux Kernel Mailing List


I have a case where deleted memory is being passed into an RPC
callback.  I enabled SLUB memory poisoning and verified that the
data pointed to has 0x6b...6b value.

Unfortunately, the rpc code is a giant maze of callbacks, and I'm
having a difficult time figuring out where this data could be erroneously
deleted.

So first question:

Given a pointer to memory, and with SLUB memory debugging on (and/or other
debugging options if applicable), is there a way to get any info about where
the memory was last deleted?

Second:  Any other suggestions for how to go about debugging this?

I hit this problem under load after multiple hours, so just adding printks
in random places may not be feasible...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Question on debugging use-after-free memory issues.
  2011-06-27 18:12 Question on debugging use-after-free memory issues Ben Greear
@ 2011-06-28 22:00 ` Jiri Kosina
  2011-06-29  5:41   ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Jiri Kosina @ 2011-06-28 22:00 UTC (permalink / raw)
  To: Ben Greear; +Cc: Linux Kernel Mailing List

On Mon, 27 Jun 2011, Ben Greear wrote:

> I have a case where deleted memory is being passed into an RPC callback.  
> I enabled SLUB memory poisoning and verified that the data pointed to 
> has 0x6b...6b value.
> 
> Unfortunately, the rpc code is a giant maze of callbacks, and I'm having 
> a difficult time figuring out where this data could be erroneously 
> deleted.
> 
> So first question:
> 
> Given a pointer to memory, and with SLUB memory debugging on (and/or 
> other debugging options if applicable), is there a way to get any info 
> about where the memory was last deleted?
> 
> Second:  Any other suggestions for how to go about debugging this?
> 
> I hit this problem under load after multiple hours, so just adding 
> printks in random places may not be feasible...

First, this is not really the proper list for such questions. I'd suggest 
the kernelnewbies community next time.

Anyway, I'd propose starting with kmemcheck (see 
Documentation/kmemcheck.txt). It could pinpoint the problematic spot 
immediately (or it might not).
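
For the first question specifically: with CONFIG_SLUB_DEBUG built in,
SLUB's user tracking records the last alloc/free caller per object, and
you can read the records back through sysfs. A sketch (untested here; the
cache name below is only an example, use whichever cache the object
actually lives in):

```shell
# Boot with poisoning plus alloc/free user tracking:
#     slub_debug=PU
# Then dump where objects of a given cache were last freed
# (cache name is an example):
cat /sys/kernel/slab/kmalloc-192/free_calls
# ...and where they were last allocated:
cat /sys/kernel/slab/kmalloc-192/alloc_calls
```

This only tells you the last legitimate kfree() caller, not who touches
the object afterwards, but it can narrow down the maze of callbacks.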

-- 
Jiri Kosina
SUSE Labs



* Re: Question on debugging use-after-free memory issues.
  2011-06-28 22:00 ` Jiri Kosina
@ 2011-06-29  5:41   ` Eric Dumazet
  2011-06-29  6:01     ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2011-06-29  5:41 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Ben Greear, Linux Kernel Mailing List

On Wednesday, 29 June 2011 at 00:00 +0200, Jiri Kosina wrote:
> On Mon, 27 Jun 2011, Ben Greear wrote:
> 
> > I have a case where deleted memory is being passed into an RPC callback.  
> > I enabled SLUB memory poisoning and verified that the data pointed to 
> > has 0x6b...6b value.
> > 
> > Unfortunately, the rpc code is a giant maze of callbacks, and I'm having 
> > a difficult time figuring out where this data could be erroneously 
> > deleted.
> > 
> > So first question:
> > 
> > Given a pointer to memory, and with SLUB memory debugging on (and/or 
> > other debugging options if applicable), is there a way to get any info 
> > about where the memory was last deleted?
> > 
> > Second:  Any other suggestions for how to go about debugging this?
> > 
> > I hit this problem under load after multiple hours, so just adding 
> > printks in random places may not be feasible...
> 
> First, this is not really the proper list for such questions. I'd suggest 
> the kernelnewbies community next time.
> 

LKML is definitely a place for such questions.

> Anyway, I'd propose starting with kmemcheck (see 
> Documentation/kmemcheck.txt). It could pinpoint the problematic spot 
> immediately (or it might not).
> 

kmemcheck is fine as long as the problem isn't an SMP-only bug. Also,
kmemcheck is so slow that it can make a rare bug very hard to trigger.


Ben, given that you know that RPC might have a problem with a given small
object (struct rpcbind_args), you could afford to change the
kmalloc()/kfree() used to allocate/free such objects to calls to the page
allocator, and then not free the page but unmap it from the kernel mapping
so that any further read/write access triggers a fault. You can then get a
more precise idea of what's happening, without slowing down the whole
kernel. Of course there is a memory leak for each "struct rpcbind_args"
allocated, so this is a debugging aid only.

DEBUG_PAGEALLOC might be too expensive, so try this patch (untested; you
might need to complete it):

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 9a80a92..9b4dbaf 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -158,7 +158,7 @@ static void rpcb_map_release(void *data)
 	rpcb_wake_rpcbind_waiters(map->r_xprt, map->r_status);
 	xprt_put(map->r_xprt);
 	kfree(map->r_addr);
-	kfree(map);
+	kernel_map_pages(virt_to_page(map), 1, 0);
 }
 
 /*
@@ -668,7 +668,7 @@ void rpcb_getport_async(struct rpc_task *task)
 		goto bailout_nofree;
 	}
 
-	map = kzalloc(sizeof(struct rpcbind_args), GFP_ATOMIC);
+	map = (struct rpcbind_args *)__get_free_page(GFP_ATOMIC | __GFP_ZERO);
 	if (!map) {
 		status = -ENOMEM;
 		dprintk("RPC: %5u %s: no memory available\n",




* Re: Question on debugging use-after-free memory issues.
  2011-06-29  5:41   ` Eric Dumazet
@ 2011-06-29  6:01     ` Ben Greear
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Greear @ 2011-06-29  6:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jiri Kosina, Linux Kernel Mailing List

On 06/28/2011 10:41 PM, Eric Dumazet wrote:
> On Wednesday, 29 June 2011 at 00:00 +0200, Jiri Kosina wrote:
>> On Mon, 27 Jun 2011, Ben Greear wrote:

>> Anyway, I'd propose starting with kmemcheck (see
>> Documentation/kmemcheck.txt). It could pinpoint the problematic spot
>> immediately (or it might not).
>>
>
> kmemcheck is fine as long as the problem isn't an SMP-only bug. Also,
> kmemcheck is so slow that it can make a rare bug very hard to trigger.

I think I've pretty much verified that deleted memory is passed down
a certain call path with the slub patches I posted.
What I can't figure out is how that came to be.

> Ben, given that you know that RPC might have a problem with a given small
> object (struct rpcbind_args), you could afford to change the
> kmalloc()/kfree() used to allocate/free such objects to calls to the page
> allocator, and then not free the page but unmap it from the kernel mapping
> so that any further read/write access triggers a fault. You can then get a
> more precise idea of what's happening, without slowing down the whole
> kernel. Of course there is a memory leak for each "struct rpcbind_args"
> allocated, so this is a debugging aid only.
>
> DEBUG_PAGEALLOC might be too expensive, so try this patch (untested; you
> might need to complete it):
>
> diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
> index 9a80a92..9b4dbaf 100644
> --- a/net/sunrpc/rpcb_clnt.c
> +++ b/net/sunrpc/rpcb_clnt.c
> @@ -158,7 +158,7 @@ static void rpcb_map_release(void *data)
>   	rpcb_wake_rpcbind_waiters(map->r_xprt, map->r_status);
>   	xprt_put(map->r_xprt);
>   	kfree(map->r_addr);
> -	kfree(map);
> +	kernel_map_pages(virt_to_page(map), 1, 0);
>   }
>
>   /*
> @@ -668,7 +668,7 @@ void rpcb_getport_async(struct rpc_task *task)
>   		goto bailout_nofree;
>   	}
>
> -	map = kzalloc(sizeof(struct rpcbind_args), GFP_ATOMIC);
> +	map = (struct rpcbind_args *)__get_free_page(GFP_ATOMIC | __GFP_ZERO);
>   	if (!map) {
>   		status = -ENOMEM;
>   		dprintk("RPC: %5u %s: no memory available\n",
>

It takes possibly hours of heavy load to hit the problem, so I think
I cannot afford to leak that much memory.

Interestingly, I added this code below, and haven't hit the problem since.
I'm not sure if it just changed the timing, or what...or maybe I'll
hit it overnight...

I tried setting this (below) to 0x6b instead of 0x0 (mempool doesn't
really kmalloc/free too often, so the slub poisoning doesn't help),
but never did hit the bug again.

I suspect that the task object is somehow still on the work-queue when it
is deleted, but since the 0x6b and 0x0 poisoning didn't cause any funny
crashes, I could easily be wrong about that.

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 17c3e3a..d94f009 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -859,6 +859,12 @@ static void rpc_free_task(struct rpc_task *task)

         if (task->tk_flags & RPC_TASK_DYNAMIC) {
                 dprintk("RPC: %5u freeing task\n", task->tk_pid);
+               /* HACK:  Have been seeing use-after-free of calldata.  Zero this memory
+                * so that it cannot happen here.  Seems to have fixed the problem
+                * in 3.0 kernel, but maybe it just adjusted timing..either way,
+                * it's not a real fix. --Ben
+                */
+               memset(task, 0, sizeof(*task));
                 mempool_free(task, rpc_task_mempool);
         }
         rpc_release_calldata(tk_ops, calldata);


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
