Serious memory leak in TI EDMA driver (drivers/dma/edma.c)

linux-omap.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
@ 2015-03-16 19:26 Petr Kulhavy
  2015-03-16 19:27 ` Tony Lindgren
  2015-03-17 12:38 ` Peter Ujfalusi
  0 siblings, 2 replies; 12+ messages in thread
From: Petr Kulhavy @ 2015-03-16 19:26 UTC (permalink / raw)
  To: linux-omap

Hi,

I have found a memory leak in the TI EDMA driver, which happens every 
time a DMA transfer is performed.
The leak is in kernel 3.17, however the same problem seems to exist also 
in 3.19.

In particular this was found on our custom TI AM1808 based hardware 
while accessing the MMC/SD card interface.
When extensively using the SD card (e.g. downloading files to it) you 
can virtually see the "SUnreclaim" memory in /proc/meminfo growing few 
kB every few seconds.
After few days of operation a device with 128MB of RAM renders unusable 
(lack of memory, system slow, processes being killed, etc.), the 
unreclaimed SLAB memory is over 50MB.

The kernel memory leak debug mechanism revealed the leak to happen in 
edma_prep_slave_sg(), however the same pattern repeats all over the 
edma.c file (see below).

unreferenced object 0xc5abe3c0 (size 128):
   comm "mmcqd/0", pid 1099, jiffies 4294948151 (age 5865.330s)
   hex dump (first 32 bytes):
     b7 02 00 00 03 00 00 00 00 00 00 00 80 bb 81 c7  ................
     18 b4 23 c0 00 00 00 00 00 00 00 00 00 00 00 00  ..#.............
   backtrace:
     [<c023c8d0>] edma_prep_slave_sg+0x98/0x344
     [<c030b350>] mmc_davinci_request+0x3d4/0x53c
     [<c02f86c8>] mmc_start_request+0xc4/0xe8
     [<c02f9654>] mmc_start_req+0x18c/0x354
     [<c0307c84>] mmc_blk_issue_rw_rq+0xc0/0xc94
     [<c0308a0c>] mmc_blk_issue_rq+0x1b4/0x4f4
     [<c0309648>] mmc_queue_thread+0xb8/0x168
     [<c0034930>] kthread+0xb4/0xd0
     [<c0009730>] ret_from_fork+0x14/0x24
     [<ffffffff>] 0xffffffff

The structure edma_desc is allocated using kzalloc in the 
edma_prep_slave_sg() function, then a pointer to a member of its 
substructure (dma_async_tx_descriptor) is returned.
Therefore the edma_desc structure cannot be freed since the allocated 
address is nowhere stored and therefore lost.
I also haven't found that the dma_async_tx_descriptor would be freed, 
but not sure whether the kernel does this in some other place?

Basically every time there is edma_prep_slave_sg 128 bytes of memory is 
allocated but it's never freed.
I'm not sure what is the right way to fix this issue, but it seems to me 
that the driver needs a more significant change to keep e.g. a pool of 
resources which is reused and eventually freed, like some other EDMA 
drivers do.

Could you please advise what to do.

Thank you
Petr

-- 
Petr Kulhavy, MSc
System Architect
Barix AG, Zürich, Switzerland

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
@ 2015-03-16 19:27 Petr Kulhavy
  0 siblings, 0 replies; 12+ messages in thread
From: Petr Kulhavy @ 2015-03-16 19:27 UTC (permalink / raw)
  To: linux-omap

Hi,

I have found a memory leak in the TI EDMA driver, which happens every 
time a DMA transfer is performed.
The leak is in kernel 3.17, however the same problem seems to exist also 
in 3.19.

In particular this was found on our custom TI AM1808 based hardware 
while accessing the MMC/SD card interface.
When extensively using the SD card (e.g. downloading files to it) you 
can virtually see the "SUnreclaim" memory in /proc/meminfo growing few 
kB every few seconds.
After few days of operation a device with 128MB of RAM renders unusable 
(lack of memory, system slow, processes being killed, etc.), the 
unreclaimed SLAB memory is over 50MB.

The kernel memory leak debug mechanism revealed the leak to happen in 
edma_prep_slave_sg(), however the same pattern repeats all over the 
edma.c file (see below).

unreferenced object 0xc5abe3c0 (size 128):
   comm "mmcqd/0", pid 1099, jiffies 4294948151 (age 5865.330s)
   hex dump (first 32 bytes):
     b7 02 00 00 03 00 00 00 00 00 00 00 80 bb 81 c7  ................
     18 b4 23 c0 00 00 00 00 00 00 00 00 00 00 00 00  ..#.............
   backtrace:
     [<c023c8d0>] edma_prep_slave_sg+0x98/0x344
     [<c030b350>] mmc_davinci_request+0x3d4/0x53c
     [<c02f86c8>] mmc_start_request+0xc4/0xe8
     [<c02f9654>] mmc_start_req+0x18c/0x354
     [<c0307c84>] mmc_blk_issue_rw_rq+0xc0/0xc94
     [<c0308a0c>] mmc_blk_issue_rq+0x1b4/0x4f4
     [<c0309648>] mmc_queue_thread+0xb8/0x168
     [<c0034930>] kthread+0xb4/0xd0
     [<c0009730>] ret_from_fork+0x14/0x24
     [<ffffffff>] 0xffffffff

The structure edma_desc is allocated using kzalloc in the 
edma_prep_slave_sg() function, then a pointer to a member of its 
substructure (dma_async_tx_descriptor) is returned.
Therefore the edma_desc structure cannot be freed since the allocated 
address is nowhere stored and therefore lost.
I also haven't found that the dma_async_tx_descriptor would be freed, 
but not sure whether the kernel does this in some other place?

Basically every time there is edma_prep_slave_sg 128 bytes of memory is 
allocated but it's never freed.
I'm not sure what is the right way to fix this issue, but it seems to me 
that the driver needs a more significant change to keep e.g. a pool of 
resources which is reused and eventually freed, like some other EDMA 
drivers do.

Could you please advise what to do.

Thank you
Petr

-- 
Petr Kulhavy, MSc
System Architect
Barix AG, Zürich, Switzerland

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-16 19:26 Serious memory leak in TI EDMA driver (drivers/dma/edma.c) Petr Kulhavy
@ 2015-03-16 19:27 ` Tony Lindgren
  2015-03-17 12:38 ` Peter Ujfalusi
  1 sibling, 0 replies; 12+ messages in thread
From: Tony Lindgren @ 2015-03-16 19:27 UTC (permalink / raw)
  To: Petr Kulhavy; +Cc: linux-omap, Peter Ujfalusi

Hi,

* Petr Kulhavy <petr@barix.com> [150316 12:26]:
> Hi,
> 
> I have found a memory leak in the TI EDMA driver, which happens every time a
> DMA transfer is performed.
> The leak is in kernel 3.17, however the same problem seems to exist also in
> 3.19.
> 
> In particular this was found on our custom TI AM1808 based hardware while
> accessing the MMC/SD card interface.
> When extensively using the SD card (e.g. downloading files to it) you can
> virtually see the "SUnreclaim" memory in /proc/meminfo growing few kB every
> few seconds.
> After few days of operation a device with 128MB of RAM renders unusable
> (lack of memory, system slow, processes being killed, etc.), the unreclaimed
> SLAB memory is over 50MB.
> 
> The kernel memory leak debug mechanism revealed the leak to happen in
> edma_prep_slave_sg(), however the same pattern repeats all over the edma.c
> file (see below).
> 
> unreferenced object 0xc5abe3c0 (size 128):
>   comm "mmcqd/0", pid 1099, jiffies 4294948151 (age 5865.330s)
>   hex dump (first 32 bytes):
>     b7 02 00 00 03 00 00 00 00 00 00 00 80 bb 81 c7  ................
>     18 b4 23 c0 00 00 00 00 00 00 00 00 00 00 00 00  ..#.............
>   backtrace:
>     [<c023c8d0>] edma_prep_slave_sg+0x98/0x344
>     [<c030b350>] mmc_davinci_request+0x3d4/0x53c
>     [<c02f86c8>] mmc_start_request+0xc4/0xe8
>     [<c02f9654>] mmc_start_req+0x18c/0x354
>     [<c0307c84>] mmc_blk_issue_rw_rq+0xc0/0xc94
>     [<c0308a0c>] mmc_blk_issue_rq+0x1b4/0x4f4
>     [<c0309648>] mmc_queue_thread+0xb8/0x168
>     [<c0034930>] kthread+0xb4/0xd0
>     [<c0009730>] ret_from_fork+0x14/0x24
>     [<ffffffff>] 0xffffffff
> 
> 
> The structure edma_desc is allocated using kzalloc in the
> edma_prep_slave_sg() function, then a pointer to a member of its
> substructure (dma_async_tx_descriptor) is returned.
> Therefore the edma_desc structure cannot be freed since the allocated
> address is nowhere stored and therefore lost.
> I also haven't found that the dma_async_tx_descriptor would be freed, but
> not sure whether the kernel does this in some other place?
> 
> Basically every time there is edma_prep_slave_sg 128 bytes of memory is
> allocated but it's never freed.
> I'm not sure what is the right way to fix this issue, but it seems to me
> that the driver needs a more significant change to keep e.g. a pool of
> resources which is reused and eventually freed, like some other EDMA drivers
> do.
> 
> Could you please advise what to do.

Thanks for reporting it. This sounds like something for Peter to look at.

Regards,

Tony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-16 19:26 Serious memory leak in TI EDMA driver (drivers/dma/edma.c) Petr Kulhavy
  2015-03-16 19:27 ` Tony Lindgren
@ 2015-03-17 12:38 ` Peter Ujfalusi
  2015-03-17 19:02   ` Petr Kulhavy
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Ujfalusi @ 2015-03-17 12:38 UTC (permalink / raw)
  To: Petr Kulhavy, linux-omap

Hi,

On 03/16/2015 09:26 PM, Petr Kulhavy wrote:
> Hi,
> 
> I have found a memory leak in the TI EDMA driver, which happens every time a
> DMA transfer is performed.
> The leak is in kernel 3.17, however the same problem seems to exist also in 3.19.

I have issues booting the 3.17, 3.18 and 3.19 on my am335x-evmsk so I could
only test this with 4.0-rc4 and linux-next.

> In particular this was found on our custom TI AM1808 based hardware while
> accessing the MMC/SD card interface.
> When extensively using the SD card (e.g. downloading files to it) you can
> virtually see the "SUnreclaim" memory in /proc/meminfo growing few kB every
> few seconds.

I've done the test dd-ing to/from the mmc, running a recursive grep on the
filesystem on the mmc. This should have generated enough edma requests.

> After few days of operation a device with 128MB of RAM renders unusable (lack
> of memory, system slow, processes being killed, etc.), the unreclaimed SLAB
> memory is over 50MB.
> 
> The kernel memory leak debug mechanism revealed the leak to happen in
> edma_prep_slave_sg(), however the same pattern repeats all over the edma.c
> file (see below).
> 
> unreferenced object 0xc5abe3c0 (size 128):
>   comm "mmcqd/0", pid 1099, jiffies 4294948151 (age 5865.330s)
>   hex dump (first 32 bytes):
>     b7 02 00 00 03 00 00 00 00 00 00 00 80 bb 81 c7  ................
>     18 b4 23 c0 00 00 00 00 00 00 00 00 00 00 00 00  ..#.............
>   backtrace:
>     [<c023c8d0>] edma_prep_slave_sg+0x98/0x344
>     [<c030b350>] mmc_davinci_request+0x3d4/0x53c
>     [<c02f86c8>] mmc_start_request+0xc4/0xe8
>     [<c02f9654>] mmc_start_req+0x18c/0x354
>     [<c0307c84>] mmc_blk_issue_rw_rq+0xc0/0xc94
>     [<c0308a0c>] mmc_blk_issue_rq+0x1b4/0x4f4
>     [<c0309648>] mmc_queue_thread+0xb8/0x168
>     [<c0034930>] kthread+0xb4/0xd0
>     [<c0009730>] ret_from_fork+0x14/0x24
>     [<ffffffff>] 0xffffffff

But I have not seen a single report from kmemleak suggesting edma.

> The structure edma_desc is allocated using kzalloc in the edma_prep_slave_sg()
> function, then a pointer to a member of its substructure
> (dma_async_tx_descriptor) is returned.
> Therefore the edma_desc structure cannot be freed since the allocated address
> is nowhere stored and therefore lost.

the allocated edesc is freed up in edma_desc_free(), which is going to be
called either from vchan_dma_desc_free_list() or vchan_cookie_complete() when
we terminate the dma transfer or when the transfer is completed.

> I also haven't found that the dma_async_tx_descriptor would be freed, but not
> sure whether the kernel does this in some other place?

It is freed when the edesc is freed up since the dma_async_tx_descriptor is
part of the edma_desc :

struct edma_desc {
	struct virt_dma_desc		vdesc;
...
};

struct virt_dma_desc {
	struct dma_async_tx_descriptor tx;
	/* protected by vc.lock */
	struct list_head node;
};

and the &vdesc->tx is returned from vchan_tx_prep().

> Basically every time there is edma_prep_slave_sg 128 bytes of memory is
> allocated but it's never freed.
> I'm not sure what is the right way to fix this issue, but it seems to me that
> the driver needs a more significant change to keep e.g. a pool of resources
> which is reused and eventually freed, like some other EDMA drivers do.
> 
> Could you please advise what to do.

I can not reproduce the leak from edma driver, but I could get leaks from the
ethernet:
unreferenced object 0xcbe2f400 (size 176):
  comm "softirq", pid 0, jiffies 358465 (age 84.320s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 98 99 cb 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<c04fc4c8>] __alloc_rx_skb+0x58/0xdc
    [<c04fc564>] __netdev_alloc_skb+0x18/0x40
    [<c045c750>] cpsw_rx_handler+0x70/0x1c0
    [<c04599f8>] __cpdma_chan_process+0xf0/0x130
    [<c0459a74>] cpdma_chan_process+0x3c/0x5c
    [<c045bd20>] cpsw_poll+0x28/0xd8
    [<c050ce34>] net_rx_action+0x1d4/0x334
    [<c0042404>] __do_softirq+0xd4/0x348
    [<c0042998>] irq_exit+0xbc/0x130
    [<c0090b10>] __handle_domain_irq+0x6c/0xe0
    [<c00086fc>] omap_intc_handle_irq+0xb4/0xc4
    [<c05e3724>] __irq_svc+0x44/0x5c
    [<c05e2f0c>] _raw_spin_unlock_irqrestore+0x34/0x44
    [<c05e2f0c>] _raw_spin_unlock_irqrestore+0x34/0x44
    [<c014fe94>] scan_gray_list+0x150/0x18c
    [<c01500ec>] kmemleak_scan+0x21c/0x4d8

by just pinging the board (ping -s 2000 192.168.1.120).

It might be possible that you are seeing this cpdma leak in the edma driver.
If you download and store it to mmc, this might be something which is plausible.

-- 
Péter
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-17 12:38 ` Peter Ujfalusi
@ 2015-03-17 19:02   ` Petr Kulhavy
  2015-03-18 13:56     ` Peter Ujfalusi
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Kulhavy @ 2015-03-17 19:02 UTC (permalink / raw)
  To: Peter Ujfalusi, linux-omap

Hi Peter,

thanks a lot for the details.
I believe it's not an Ethernet issue, it's really related to the SD 
card. If we use the USB storage instead of the SD card on our device we 
don't see the leaks.

I enabled the dynamic debug and added a debug message for the kzalloc() 
in edma_prep_slave_sg() and for the kfree() in the edma_desc_free() both 
to print the pointer address. And it gives an interesting result, see below.

You can see that after every alloc (i.e.edma_prep_slave_sg()) 
edma_execute() is called ("file transfer starting..."), however not all 
of them end with "Transfer complete". And exactly those are also not freed.

Unfortunately I do not know how exactly the edma mechanism works with 
all the callbacks, states, etc.
But does it make any sense for you? Can you help me to debug more?

Thanks
Petr

ALLOC edesc c65d5c80
first transfer starting on channel 65565
ALLOC edesc c5b69640
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5b69640
ALLOC edesc c58ec580
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c58ec580
ALLOC edesc c5103d80
first transfer starting on channel 65565
ALLOC edesc c61e78c0
first transfer starting on channel 65565
ALLOC edesc c65d6f80
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c65d6f80
ALLOC edesc c5b698c0
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5b698c0
ALLOC edesc c52244c0
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c52244c0
ALLOC edesc c52244c0
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c52244c0
ALLOC edesc c52244c0
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c52244c0
ALLOC edesc c52244c0
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c52244c0
ALLOC edesc c58ec580
first transfer starting on channel 65565
ALLOC edesc c5b698c0
first transfer starting on channel 65565
ALLOC edesc c5103480
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5103480
ALLOC edesc c5b69640
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5b69640
ALLOC edesc c61e62c0
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c61e62c0
ALLOC edesc c5227440
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5227440
ALLOC edesc c5b69640
first transfer starting on channel 65565
ALLOC edesc c5b69b40
first transfer starting on channel 65565
ALLOC edesc c5233000
first transfer starting on channel 65565
ALLOC edesc c5233dc0
first transfer starting on channel 65565
ALLOC edesc c5233140
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5233140
ALLOC edesc c5233140
first transfer starting on channel 65565
ALLOC edesc c5233280
first transfer starting on channel 65565
Transfer complete, stopping channel 29
FREE edesc c5233280




On 17.03.2015 13:38, Peter Ujfalusi wrote:
> Hi,
>
> On 03/16/2015 09:26 PM, Petr Kulhavy wrote:
>> Hi,
>>
>> I have found a memory leak in the TI EDMA driver, which happens every time a
>> DMA transfer is performed.
>> The leak is in kernel 3.17, however the same problem seems to exist also in 3.19.
> I have issues booting the 3.17, 3.18 and 3.19 on my am335x-evmsk so I could
> only test this with 4.0-rc4 and linux-next.
>
>> In particular this was found on our custom TI AM1808 based hardware while
>> accessing the MMC/SD card interface.
>> When extensively using the SD card (e.g. downloading files to it) you can
>> virtually see the "SUnreclaim" memory in /proc/meminfo growing few kB every
>> few seconds.
> I've done the test dd-ing to/from the mmc, running a recursive grep on the
> filesystem on the mmc. This should have generated enough edma requests.
>
>> After few days of operation a device with 128MB of RAM renders unusable (lack
>> of memory, system slow, processes being killed, etc.), the unreclaimed SLAB
>> memory is over 50MB.
>>
>> The kernel memory leak debug mechanism revealed the leak to happen in
>> edma_prep_slave_sg(), however the same pattern repeats all over the edma.c
>> file (see below).
>>
>> unreferenced object 0xc5abe3c0 (size 128):
>>    comm "mmcqd/0", pid 1099, jiffies 4294948151 (age 5865.330s)
>>    hex dump (first 32 bytes):
>>      b7 02 00 00 03 00 00 00 00 00 00 00 80 bb 81 c7  ................
>>      18 b4 23 c0 00 00 00 00 00 00 00 00 00 00 00 00  ..#.............
>>    backtrace:
>>      [<c023c8d0>] edma_prep_slave_sg+0x98/0x344
>>      [<c030b350>] mmc_davinci_request+0x3d4/0x53c
>>      [<c02f86c8>] mmc_start_request+0xc4/0xe8
>>      [<c02f9654>] mmc_start_req+0x18c/0x354
>>      [<c0307c84>] mmc_blk_issue_rw_rq+0xc0/0xc94
>>      [<c0308a0c>] mmc_blk_issue_rq+0x1b4/0x4f4
>>      [<c0309648>] mmc_queue_thread+0xb8/0x168
>>      [<c0034930>] kthread+0xb4/0xd0
>>      [<c0009730>] ret_from_fork+0x14/0x24
>>      [<ffffffff>] 0xffffffff
> But I have not seen a single report from kmemleak suggesting edma.
>
>> The structure edma_desc is allocated using kzalloc in the edma_prep_slave_sg()
>> function, then a pointer to a member of its substructure
>> (dma_async_tx_descriptor) is returned.
>> Therefore the edma_desc structure cannot be freed since the allocated address
>> is nowhere stored and therefore lost.
> the allocated edesc is freed up in edma_desc_free(), which is going to be
> called either from vchan_dma_desc_free_list() or vchan_cookie_complete() when
> we terminate the dma transfer or when the transfer is completed.
>
>> I also haven't found that the dma_async_tx_descriptor would be freed, but not
>> sure whether the kernel does this in some other place?
> It is freed when the edesc is freed up since the dma_async_tx_descriptor is
> part of the edma_desc :
>
> struct edma_desc {
> 	struct virt_dma_desc		vdesc;
> ...
> };
>
> struct virt_dma_desc {
> 	struct dma_async_tx_descriptor tx;
> 	/* protected by vc.lock */
> 	struct list_head node;
> };
>
> and the &vdesc->tx is returned from vchan_tx_prep().
>
>> Basically every time there is edma_prep_slave_sg 128 bytes of memory is
>> allocated but it's never freed.
>> I'm not sure what is the right way to fix this issue, but it seems to me that
>> the driver needs a more significant change to keep e.g. a pool of resources
>> which is reused and eventually freed, like some other EDMA drivers do.
>>
>> Could you please advise what to do.
> I can not reproduce the leak from edma driver, but I could get leaks from the
> ethernet:
> unreferenced object 0xcbe2f400 (size 176):
>    comm "softirq", pid 0, jiffies 358465 (age 84.320s)
>    hex dump (first 32 bytes):
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>      00 00 00 00 00 98 99 cb 00 00 00 00 00 00 00 00  ................
>    backtrace:
>      [<c04fc4c8>] __alloc_rx_skb+0x58/0xdc
>      [<c04fc564>] __netdev_alloc_skb+0x18/0x40
>      [<c045c750>] cpsw_rx_handler+0x70/0x1c0
>      [<c04599f8>] __cpdma_chan_process+0xf0/0x130
>      [<c0459a74>] cpdma_chan_process+0x3c/0x5c
>      [<c045bd20>] cpsw_poll+0x28/0xd8
>      [<c050ce34>] net_rx_action+0x1d4/0x334
>      [<c0042404>] __do_softirq+0xd4/0x348
>      [<c0042998>] irq_exit+0xbc/0x130
>      [<c0090b10>] __handle_domain_irq+0x6c/0xe0
>      [<c00086fc>] omap_intc_handle_irq+0xb4/0xc4
>      [<c05e3724>] __irq_svc+0x44/0x5c
>      [<c05e2f0c>] _raw_spin_unlock_irqrestore+0x34/0x44
>      [<c05e2f0c>] _raw_spin_unlock_irqrestore+0x34/0x44
>      [<c014fe94>] scan_gray_list+0x150/0x18c
>      [<c01500ec>] kmemleak_scan+0x21c/0x4d8
>
> by just pinging the board (ping -s 2000 192.168.1.120).
>
> It might be possible that you are seeing this cpdma leak in the edma driver.
> If you download and store it to mmc, this might be something which is plausible.
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-17 19:02   ` Petr Kulhavy
@ 2015-03-18 13:56     ` Peter Ujfalusi
  2015-03-18 21:33       ` Petr Kulhavy
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Ujfalusi @ 2015-03-18 13:56 UTC (permalink / raw)
  To: Petr Kulhavy, linux-omap

Petr,

On 03/17/2015 09:02 PM, Petr Kulhavy wrote:
> Hi Peter,
> 
> thanks a lot for the details.
> I believe it's not an Ethernet issue, it's really related to the SD card. If
> we use the USB storage instead of the SD card on our device we don't see the
> leaks.

I believe you are not booting with DT. The two eDMA is only supported when
booting in legacy mode.

> I enabled the dynamic debug and added a debug message for the kzalloc() in
> edma_prep_slave_sg() and for the kfree() in the edma_desc_free() both to print
> the pointer address. And it gives an interesting result, see below.
> 
> You can see that after every alloc (i.e.edma_prep_slave_sg()) edma_execute()
> is called ("file transfer starting..."), however not all of them end with
> "Transfer complete". And exactly those are also not freed.

I did the same on am335x-evmsk and I don't see the same issue (linux-next). It
does not mean that we do not have the issue you describe, but somehow it is
not that easy to reproduce. I will try my OMAP-L138 board, which is closer to
AM1808 than AM335x.

> Unfortunately I do not know how exactly the edma mechanism works with all the
> callbacks, states, etc.
> But does it make any sense for you? Can you help me to debug more?

Not sure at the moment, but I would try to print also in the error cases in
the callback and track the edesc pointer in every prints to see where the code
goes. I would enable the prints in the edma_execute as well to see what is
going on there.

For reference I have these prints:
[  924.127638] ALLOC edesc: 0xcd948c40 (channel: 25, sg_len: 1, rounds: 1)
[  924.134598] first transfer starting on channel 25 (edesc: 0xcd948c40)
[  924.145505] Transfer complete, stopping channel: 25 (edesc: 0xcd948c40)
[  924.152561] FREE edesc: 0xcd948c40 (channel: 25)
[  924.159223] ALLOC edesc: 0xc916f800 (channel: 25, sg_len: 30, rounds: 2)
[  924.166611] first transfer starting on channel 25 (edesc: 0xc916f800)
[  924.180541] Intermediate transfer complete on channel: 25 (edesc: 0xc916f800)
[  924.188054] chan: 25: completed 30 elements, resuming (edesc: 0xc916f800)
[  924.195208] Error occurred on channel: 25 (edesc: 0xc916f800), TRIGGERING
[  924.204117] Transfer complete, stopping channel: 25 (edesc: 0xc916f800)
[  924.211158] FREE edesc: 0xc916f800 (channel: 25)
[  924.218407] ALLOC edesc: 0xc6cbce00 (channel: 25, sg_len: 8, rounds: 1)
[  924.225396] first transfer starting on channel 25 (edesc: 0xc6cbce00)
[  924.236530] Transfer complete, stopping channel: 25 (edesc: 0xc6cbce00)
[  924.243506] FREE edesc: 0xc6cbce00 (channel: 25)
[  924.248817] ALLOC edesc: 0xcaa5ec00 (channel: 25, sg_len: 12, rounds: 1)
[  924.256068] first transfer starting on channel 25 (edesc: 0xcaa5ec00)
[  924.268277] Transfer complete, stopping channel: 25 (edesc: 0xcaa5ec00)
[  924.275265] FREE edesc: 0xcaa5ec00 (channel: 25)



> Thanks
> Petr
> 
> ALLOC edesc c65d5c80
> first transfer starting on channel 65565

The channel number does not seams right here.

> ALLOC edesc c5b69640

Would be nice to see the channel as well here. But sure the FREE for c65d5c80
is missing. Strange, I don't see how this happens.
I have enabled PREEMPT and still can not reproduce - this was my first idea.

> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5b69640
> ALLOC edesc c58ec580
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c58ec580
> ALLOC edesc c5103d80
> first transfer starting on channel 65565
> ALLOC edesc c61e78c0
> first transfer starting on channel 65565
> ALLOC edesc c65d6f80
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c65d6f80
> ALLOC edesc c5b698c0
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5b698c0
> ALLOC edesc c52244c0
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c52244c0
> ALLOC edesc c52244c0
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c52244c0
> ALLOC edesc c52244c0
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c52244c0
> ALLOC edesc c52244c0
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c52244c0
> ALLOC edesc c58ec580
> first transfer starting on channel 65565
> ALLOC edesc c5b698c0
> first transfer starting on channel 65565
> ALLOC edesc c5103480
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5103480
> ALLOC edesc c5b69640
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5b69640
> ALLOC edesc c61e62c0
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c61e62c0
> ALLOC edesc c5227440
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5227440
> ALLOC edesc c5b69640
> first transfer starting on channel 65565
> ALLOC edesc c5b69b40
> first transfer starting on channel 65565
> ALLOC edesc c5233000
> first transfer starting on channel 65565
> ALLOC edesc c5233dc0
> first transfer starting on channel 65565
> ALLOC edesc c5233140
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5233140
> ALLOC edesc c5233140
> first transfer starting on channel 65565
> ALLOC edesc c5233280
> first transfer starting on channel 65565
> Transfer complete, stopping channel 29
> FREE edesc c5233280
> 

-- 
Péter
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-18 13:56     ` Peter Ujfalusi
@ 2015-03-18 21:33       ` Petr Kulhavy
  2015-03-20 13:59         ` Peter Ujfalusi
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Kulhavy @ 2015-03-18 21:33 UTC (permalink / raw)
  To: Peter Ujfalusi, linux-omap

Hi Peter,

Yes, we do not use DT.
Our board design is very close to the da850evm reference board. So if 
you have a chance of getting hold of it you could try on that one.

The "wrong" channel number justmeans that it's on the second controller 
(pdev->id is 1), the debug is just printing the compound number 
channel+controller.

I believe that the problem is in edma_terminate_all(), the active edesc 
is not freed by the vchan_dma_desc_free_list() according to the below 
debug trace. The first alloc/free pair is correct, the other alloc has 
no free - see the "Terminate all" line. I also enabled debug in 
common/edma.c and virt-dma.c. Note that the "Terminate all" message is 
printed at the end of edma_terminate_all() function just before return. 
Printing it earlier was disturbing my test.

So the critical sequence is:
- submit transaction
- issue pending
- IRQ handler (int 3 ???) - doesn't seem to be relevant
- terminate all transfers -> dma stop - this doesn't free
- IRQ handler (int 29) -> callback - probably harmless

The issue is that edma_execute() calls vchan_next_desc()  which removes 
the  request from vc->dest_issued list
and while the transaction is being processed it's not on any list and 
terminate_all_transfers calling vchan_get_all_descriptors doesn't find 
it and therefore doesn't free it.

Is that a plausible explanation?

Cheers
Petr



Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: ALLOC edesc: 0xc34d5b80 (channel: 65565, sg_len: 9)
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: vchan c781bc70: txd c34d5b80[1c91]: submitted
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: first transfer starting on channel 65565
Mar 18 20:38:09 barix user.debug kernel: EDMA: ER0 00000000
Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 20000000
Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 00000008
Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 20000000
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Edma callback: edesc 0xc34d5b80
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Pausing channel 29
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Transfer complete, stopping channel 29
Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 00000000
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Setting edesc 0xc34d5b80 to NULL
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: txd c34d5b80: freeing
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: FREE edesc: 0xc34d5b80 (channel: 65565)
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: ALLOC edesc: 0xc2f53900 (channel: 65565, sg_len: 15)
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: vchan c781bc70: txd c2f53900[1c92]: submitted
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: first transfer starting on channel 65565
Mar 18 20:38:09 barix user.debug kernel: EDMA: ER0 00000000
Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 20000000
Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 00000008
Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 00000000
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Terminate all, setting edesc 0xc2f53900 to NULL
Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 20000000
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Edma callback: edesc 0x  (null)
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: Pausing channel 29
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: ALLOC edesc: 0xc2f52140 (channel: 65565, sg_len: 16)
Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine 
edma-dma-engine.1: vchan c781bc70: txd c2f52140[1c93]: submitted



On 18.03.2015 14:56, Peter Ujfalusi wrote:
> Petr,
>
> On 03/17/2015 09:02 PM, Petr Kulhavy wrote:
>> Hi Peter,
>>
>> thanks a lot for the details.
>> I believe it's not an Ethernet issue, it's really related to the SD card. If
>> we use the USB storage instead of the SD card on our device we don't see the
>> leaks.
> I believe you are not booting with DT. The two eDMA is only supported when
> booting in legacy mode.
>
>> I enabled the dynamic debug and added a debug message for the kzalloc() in
>> edma_prep_slave_sg() and for the kfree() in the edma_desc_free() both to print
>> the pointer address. And it gives an interesting result, see below.
>>
>> You can see that after every alloc (i.e.edma_prep_slave_sg()) edma_execute()
>> is called ("file transfer starting..."), however not all of them end with
>> "Transfer complete". And exactly those are also not freed.
> I did the same on am335x-evmsk and I don't see the same issue (linux-next). It
> does not mean that we do not have the issue you describe, but somehow it is
> not that easy to reproduce. I will try my OMAP-L138 board, which is closer to
> AM1808 than AM335x.
>
>> Unfortunately I do not know how exactly the edma mechanism works with all the
>> callbacks, states, etc.
>> But does it make any sense for you? Can you help me to debug more?
> Not sure at the moment, but I would try to print also in the error cases in
> the callback and track the edesc pointer in every prints to see where the code
> goes. I would enable the prints in the edma_execute as well to see what is
> going on there.
>
> For reference I have these prints:
> [  924.127638] ALLOC edesc: 0xcd948c40 (channel: 25, sg_len: 1, rounds: 1)
> [  924.134598] first transfer starting on channel 25 (edesc: 0xcd948c40)
> [  924.145505] Transfer complete, stopping channel: 25 (edesc: 0xcd948c40)
> [  924.152561] FREE edesc: 0xcd948c40 (channel: 25)
> [  924.159223] ALLOC edesc: 0xc916f800 (channel: 25, sg_len: 30, rounds: 2)
> [  924.166611] first transfer starting on channel 25 (edesc: 0xc916f800)
> [  924.180541] Intermediate transfer complete on channel: 25 (edesc: 0xc916f800)
> [  924.188054] chan: 25: completed 30 elements, resuming (edesc: 0xc916f800)
> [  924.195208] Error occurred on channel: 25 (edesc: 0xc916f800), TRIGGERING
> [  924.204117] Transfer complete, stopping channel: 25 (edesc: 0xc916f800)
> [  924.211158] FREE edesc: 0xc916f800 (channel: 25)
> [  924.218407] ALLOC edesc: 0xc6cbce00 (channel: 25, sg_len: 8, rounds: 1)
> [  924.225396] first transfer starting on channel 25 (edesc: 0xc6cbce00)
> [  924.236530] Transfer complete, stopping channel: 25 (edesc: 0xc6cbce00)
> [  924.243506] FREE edesc: 0xc6cbce00 (channel: 25)
> [  924.248817] ALLOC edesc: 0xcaa5ec00 (channel: 25, sg_len: 12, rounds: 1)
> [  924.256068] first transfer starting on channel 25 (edesc: 0xcaa5ec00)
> [  924.268277] Transfer complete, stopping channel: 25 (edesc: 0xcaa5ec00)
> [  924.275265] FREE edesc: 0xcaa5ec00 (channel: 25)
>
>
>
>> Thanks
>> Petr
>>
>> ALLOC edesc c65d5c80
>> first transfer starting on channel 65565
> The channel number does not seams right here.
>
>> ALLOC edesc c5b69640
> Would be nice to see the channel as well here. But sure the FREE for c65d5c80
> is missing. Strange, I don't see how this happens.
> I have enabled PREEMPT and still can not reproduce - this was my first idea.
>
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5b69640
>> ALLOC edesc c58ec580
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c58ec580
>> ALLOC edesc c5103d80
>> first transfer starting on channel 65565
>> ALLOC edesc c61e78c0
>> first transfer starting on channel 65565
>> ALLOC edesc c65d6f80
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c65d6f80
>> ALLOC edesc c5b698c0
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5b698c0
>> ALLOC edesc c52244c0
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c52244c0
>> ALLOC edesc c52244c0
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c52244c0
>> ALLOC edesc c52244c0
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c52244c0
>> ALLOC edesc c52244c0
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c52244c0
>> ALLOC edesc c58ec580
>> first transfer starting on channel 65565
>> ALLOC edesc c5b698c0
>> first transfer starting on channel 65565
>> ALLOC edesc c5103480
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5103480
>> ALLOC edesc c5b69640
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5b69640
>> ALLOC edesc c61e62c0
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c61e62c0
>> ALLOC edesc c5227440
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5227440
>> ALLOC edesc c5b69640
>> first transfer starting on channel 65565
>> ALLOC edesc c5b69b40
>> first transfer starting on channel 65565
>> ALLOC edesc c5233000
>> first transfer starting on channel 65565
>> ALLOC edesc c5233dc0
>> first transfer starting on channel 65565
>> ALLOC edesc c5233140
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5233140
>> ALLOC edesc c5233140
>> first transfer starting on channel 65565
>> ALLOC edesc c5233280
>> first transfer starting on channel 65565
>> Transfer complete, stopping channel 29
>> FREE edesc c5233280
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-18 21:33       ` Petr Kulhavy
@ 2015-03-20 13:59         ` Peter Ujfalusi
  2015-03-20 21:53           ` Petr Kulhavy
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Ujfalusi @ 2015-03-20 13:59 UTC (permalink / raw)
  To: Petr Kulhavy, linux-omap

Petr,

On 03/18/2015 11:33 PM, Petr Kulhavy wrote:
> Hi Peter,
> 
> Yes, we do not use DT.
> Our board design is very close to the da850evm reference board. So if you have
> a chance of getting hold of it you could try on that one.

I have been trying to reproduce this on my OMAP-L138-EVM (da850evm) but was
not able.

There is a big difference in your an my setup: the MMC on my EVM is connected
to MMC/SD0 interface while in your setup it seams to be connected to MMC/SD1.
The DMA requests for MMC/SD0 is eDMA CC0 16/17, while the MMC/SD1 is eDMA CC1
28/29.
So they are handled by different channel controllers. I would not rule out
that the support for the second CC has issues.
The information from you logs just points to this direction also:
You are writing to the MMC, so CC1 ch29 is active most of the time.
In your log the channel number sometimes 65565 ((1 << 16) | 29) sometimes it
is 29. In my case it is 17 every time.
I think there is something funny regarding to how these channels on CC1 are
working and this might be causing the leak for you.

Unfortunately I do not have HW where I could use any channel on CC1 so I can
not debug this further, but I'll look at the code to see if anything obvious
stands out.
And something does stand out:
arch/arm/common/edma.c: dma_irq_handler()
It calls the callback with a wrong interrupt number since it does not take
into account the different CC.
Can you try something similar to see if it helps:

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 6c49887d326e..e1d413c61e9e 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -405,7 +405,8 @@ static irqreturn_t dma_irq_handler(int irq, void *data)
 					BIT(slot));
 			if (edma_cc[ctlr]->intr_data[channel].callback)
 				edma_cc[ctlr]->intr_data[channel].callback(
-					channel, EDMA_DMA_COMPLETE,
+					EDMA_CTLR_CHAN(ctlr, channel),
+					EDMA_DMA_COMPLETE,
 					edma_cc[ctlr]->intr_data[channel].data);
 		}
 	} while (sh_ipr);

-- 
Péter

> The "wrong" channel number justmeans that it's on the second controller
> (pdev->id is 1), the debug is just printing the compound number
> channel+controller.
> 
> I believe that the problem is in edma_terminate_all(), the active edesc is not
> freed by the vchan_dma_desc_free_list() according to the below debug trace.
> The first alloc/free pair is correct, the other alloc has no free - see the
> "Terminate all" line. I also enabled debug in common/edma.c and virt-dma.c.
> Note that the "Terminate all" message is printed at the end of
> edma_terminate_all() function just before return. Printing it earlier was
> disturbing my test.
> 
> So the critical sequence is:
> - submit transaction
> - issue pending
> - IRQ handler (int 3 ???) - doesn't seem to be relevant
> - terminate all transfers -> dma stop - this doesn't free
> - IRQ handler (int 29) -> callback - probably harmless
> 
> The issue is that edma_execute() calls vchan_next_desc()  which removes the 
> request from vc->dest_issued list
> and while the transaction is being processed it's not on any list and
> terminate_all_transfers calling vchan_get_all_descriptors doesn't find it and
> therefore doesn't free it.
> 
> Is that a plausible explanation?
> 
> Cheers
> Petr
> 
> 
> 
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> ALLOC edesc: 0xc34d5b80 (channel: 65565, sg_len: 9)
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> vchan c781bc70: txd c34d5b80[1c91]: submitted
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> first transfer starting on channel 65565
> Mar 18 20:38:09 barix user.debug kernel: EDMA: ER0 00000000
> Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 20000000
> Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
> Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 00000008
> Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
> Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 20000000
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Edma callback: edesc 0xc34d5b80
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Pausing channel 29
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Transfer complete, stopping channel 29
> Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 00000000
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Setting edesc 0xc34d5b80 to NULL
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> txd c34d5b80: freeing
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> FREE edesc: 0xc34d5b80 (channel: 65565)
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> ALLOC edesc: 0xc2f53900 (channel: 65565, sg_len: 15)
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> vchan c781bc70: txd c2f53900[1c92]: submitted
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> first transfer starting on channel 65565
> Mar 18 20:38:09 barix user.debug kernel: EDMA: ER0 00000000
> Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 20000000
> Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
> Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 00000008
> Mar 18 20:38:09 barix user.debug kernel: EDMA: EER0 00000000
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Terminate all, setting edesc 0xc2f53900 to NULL
> Mar 18 20:38:09 barix user.debug kernel: edma edma: dma_irq_handler
> Mar 18 20:38:09 barix user.debug kernel: edma edma: IPR0 20000000
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Edma callback: edesc 0x  (null)
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> Pausing channel 29
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> ALLOC edesc: 0xc2f52140 (channel: 65565, sg_len: 16)
> Mar 18 20:38:09 barix user.debug kernel: edma-dma-engine edma-dma-engine.1:
> vchan c781bc70: txd c2f52140[1c93]: submitted
> 
> 
> 
> On 18.03.2015 14:56, Peter Ujfalusi wrote:
>> Petr,
>>
>> On 03/17/2015 09:02 PM, Petr Kulhavy wrote:
>>> Hi Peter,
>>>
>>> thanks a lot for the details.
>>> I believe it's not an Ethernet issue, it's really related to the SD card. If
>>> we use the USB storage instead of the SD card on our device we don't see the
>>> leaks.
>> I believe you are not booting with DT. The two eDMA is only supported when
>> booting in legacy mode.
>>
>>> I enabled the dynamic debug and added a debug message for the kzalloc() in
>>> edma_prep_slave_sg() and for the kfree() in the edma_desc_free() both to print
>>> the pointer address. And it gives an interesting result, see below.
>>>
>>> You can see that after every alloc (i.e.edma_prep_slave_sg()) edma_execute()
>>> is called ("file transfer starting..."), however not all of them end with
>>> "Transfer complete". And exactly those are also not freed.
>> I did the same on am335x-evmsk and I don't see the same issue (linux-next). It
>> does not mean that we do not have the issue you describe, but somehow it is
>> not that easy to reproduce. I will try my OMAP-L138 board, which is closer to
>> AM1808 than AM335x.
>>
>>> Unfortunately I do not know how exactly the edma mechanism works with all the
>>> callbacks, states, etc.
>>> But does it make any sense for you? Can you help me to debug more?
>> Not sure at the moment, but I would try to print also in the error cases in
>> the callback and track the edesc pointer in every prints to see where the code
>> goes. I would enable the prints in the edma_execute as well to see what is
>> going on there.
>>
>> For reference I have these prints:
>> [  924.127638] ALLOC edesc: 0xcd948c40 (channel: 25, sg_len: 1, rounds: 1)
>> [  924.134598] first transfer starting on channel 25 (edesc: 0xcd948c40)
>> [  924.145505] Transfer complete, stopping channel: 25 (edesc: 0xcd948c40)
>> [  924.152561] FREE edesc: 0xcd948c40 (channel: 25)
>> [  924.159223] ALLOC edesc: 0xc916f800 (channel: 25, sg_len: 30, rounds: 2)
>> [  924.166611] first transfer starting on channel 25 (edesc: 0xc916f800)
>> [  924.180541] Intermediate transfer complete on channel: 25 (edesc:
>> 0xc916f800)
>> [  924.188054] chan: 25: completed 30 elements, resuming (edesc: 0xc916f800)
>> [  924.195208] Error occurred on channel: 25 (edesc: 0xc916f800), TRIGGERING
>> [  924.204117] Transfer complete, stopping channel: 25 (edesc: 0xc916f800)
>> [  924.211158] FREE edesc: 0xc916f800 (channel: 25)
>> [  924.218407] ALLOC edesc: 0xc6cbce00 (channel: 25, sg_len: 8, rounds: 1)
>> [  924.225396] first transfer starting on channel 25 (edesc: 0xc6cbce00)
>> [  924.236530] Transfer complete, stopping channel: 25 (edesc: 0xc6cbce00)
>> [  924.243506] FREE edesc: 0xc6cbce00 (channel: 25)
>> [  924.248817] ALLOC edesc: 0xcaa5ec00 (channel: 25, sg_len: 12, rounds: 1)
>> [  924.256068] first transfer starting on channel 25 (edesc: 0xcaa5ec00)
>> [  924.268277] Transfer complete, stopping channel: 25 (edesc: 0xcaa5ec00)
>> [  924.275265] FREE edesc: 0xcaa5ec00 (channel: 25)
>>
>>
>>
>>> Thanks
>>> Petr
>>>
>>> ALLOC edesc c65d5c80
>>> first transfer starting on channel 65565
>> The channel number does not seams right here.
>>
>>> ALLOC edesc c5b69640
>> Would be nice to see the channel as well here. But sure the FREE for c65d5c80
>> is missing. Strange, I don't see how this happens.
>> I have enabled PREEMPT and still can not reproduce - this was my first idea.
>>
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5b69640
>>> ALLOC edesc c58ec580
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c58ec580
>>> ALLOC edesc c5103d80
>>> first transfer starting on channel 65565
>>> ALLOC edesc c61e78c0
>>> first transfer starting on channel 65565
>>> ALLOC edesc c65d6f80
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c65d6f80
>>> ALLOC edesc c5b698c0
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5b698c0
>>> ALLOC edesc c52244c0
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c52244c0
>>> ALLOC edesc c52244c0
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c52244c0
>>> ALLOC edesc c52244c0
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c52244c0
>>> ALLOC edesc c52244c0
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c52244c0
>>> ALLOC edesc c58ec580
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5b698c0
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5103480
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5103480
>>> ALLOC edesc c5b69640
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5b69640
>>> ALLOC edesc c61e62c0
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c61e62c0
>>> ALLOC edesc c5227440
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5227440
>>> ALLOC edesc c5b69640
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5b69b40
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5233000
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5233dc0
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5233140
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5233140
>>> ALLOC edesc c5233140
>>> first transfer starting on channel 65565
>>> ALLOC edesc c5233280
>>> first transfer starting on channel 65565
>>> Transfer complete, stopping channel 29
>>> FREE edesc c5233280
>>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-20 13:59         ` Peter Ujfalusi
@ 2015-03-20 21:53           ` Petr Kulhavy
  2015-03-23 15:28             ` Peter Ujfalusi
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Kulhavy @ 2015-03-20 21:53 UTC (permalink / raw)
  To: Peter Ujfalusi, linux-omap

Hi Peter,

yes, that is one of the differences to the EVM that the SD card is on 
MMCSD1. This is due to the pin multiplexer and other peripherals we're 
using.

Your patch is correct, however the edma_callback()  is using the channel 
number parameter for debug messages only. For the actual work 
echan->ch_num is used. BTW is that correct? Could there be a mismatch 
between the echan->ch_num and the ch_num parameter?

Something else seems to be odd in edma_alloc_chan_resources():

/* Alloc channel resources */
static int edma_alloc_chan_resources(struct dma_chan *chan)
{
         struct edma_chan *echan = to_edma_chan(chan);
         struct device *dev = chan->device->dev;
         int ret;
         int a_ch_num;
         LIST_HEAD(descs);

         a_ch_num = edma_alloc_channel(echan->ch_num, edma_callback,
                                         chan, EVENTQ_DEFAULT);


The third parameter to edma_alloc_channel() should be echan, not chan, 
since the edma_callback() interprets the callback data parameter as 
struct edma_chan *.

Let me know if you find something or if you have an advice for more debug.

Cheers
Petr



On 20.03.2015 14:59, Peter Ujfalusi wrote:
> Petr,
>
> On 03/18/2015 11:33 PM, Petr Kulhavy wrote:
>> Hi Peter,
>>
>> Yes, we do not use DT.
>> Our board design is very close to the da850evm reference board. So if you have
>> a chance of getting hold of it you could try on that one.
> I have been trying to reproduce this on my OMAP-L138-EVM (da850evm) but was
> not able.
>
> There is a big difference in your an my setup: the MMC on my EVM is connected
> to MMC/SD0 interface while in your setup it seams to be connected to MMC/SD1.
> The DMA requests for MMC/SD0 is eDMA CC0 16/17, while the MMC/SD1 is eDMA CC1
> 28/29.
> So they are handled by different channel controllers. I would not rule out
> that the support for the second CC has issues.
> The information from you logs just points to this direction also:
> You are writing to the MMC, so CC1 ch29 is active most of the time.
> In your log the channel number sometimes 65565 ((1 << 16) | 29) sometimes it
> is 29. In my case it is 17 every time.
> I think there is something funny regarding to how these channels on CC1 are
> working and this might be causing the leak for you.
>
> Unfortunately I do not have HW where I could use any channel on CC1 so I can
> not debug this further, but I'll look at the code to see if anything obvious
> stands out.
> And something does stand out:
> arch/arm/common/edma.c: dma_irq_handler()
> It calls the callback with a wrong interrupt number since it does not take
> into account the different CC.
> Can you try something similar to see if it helps:
>
> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
> index 6c49887d326e..e1d413c61e9e 100644
> --- a/arch/arm/common/edma.c
> +++ b/arch/arm/common/edma.c
> @@ -405,7 +405,8 @@ static irqreturn_t dma_irq_handler(int irq, void *data)
>   					BIT(slot));
>   			if (edma_cc[ctlr]->intr_data[channel].callback)
>   				edma_cc[ctlr]->intr_data[channel].callback(
> -					channel, EDMA_DMA_COMPLETE,
> +					EDMA_CTLR_CHAN(ctlr, channel),
> +					EDMA_DMA_COMPLETE,
>   					edma_cc[ctlr]->intr_data[channel].data);
>   		}
>   	} while (sh_ipr);
>

-- 


-- 
Petr Kulhavy, MSc
System Architect

*BARIX*

petr@barix.com <mailto:petr@barix.com> | Skype: brain.barix

Barix AG, Seefeldstrasse 303 | 8008 Zurich, Switzerland
T +41 43 43322 11 | www.barix.com <http://www.barix.com>

You have received this email because of your relationship Barix AG and 
its affiliated companies. Barix AG and its affiliated companies do not 
sell or exchange email addresses, or any other personal contact 
information provided by you with any third parties. All email 
distributions are managed and controlled by Barix AG and its affiliated 
companies.
Barix AG, Seefeldstr. 303, 8008 Zürich, Switzerland. Company Reg. No: 
CH-020.3.023.869-8, VAT Reg. No: CHE-105.687.663.

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-20 21:53           ` Petr Kulhavy
@ 2015-03-23 15:28             ` Peter Ujfalusi
  2015-03-23 15:45               ` Petr Kulhavy
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Ujfalusi @ 2015-03-23 15:28 UTC (permalink / raw)
  To: Petr Kulhavy, linux-omap

On 03/20/2015 11:53 PM, Petr Kulhavy wrote:
> Hi Peter,
> 
> yes, that is one of the differences to the EVM that the SD card is on MMCSD1.
> This is due to the pin multiplexer and other peripherals we're using.
> 
> Your patch is correct, however the edma_callback()  is using the channel
> number parameter for debug messages only. For the actual work echan->ch_num is
> used. BTW is that correct? Could there be a mismatch between the echan->ch_num
> and the ch_num parameter?
> 
> Something else seems to be odd in edma_alloc_chan_resources():
> 
> /* Alloc channel resources */
> static int edma_alloc_chan_resources(struct dma_chan *chan)
> {
>         struct edma_chan *echan = to_edma_chan(chan);
>         struct device *dev = chan->device->dev;
>         int ret;
>         int a_ch_num;
>         LIST_HEAD(descs);
> 
>         a_ch_num = edma_alloc_channel(echan->ch_num, edma_callback,
>                                         chan, EVENTQ_DEFAULT);

Hah, yes this is wrong but worked so far fine because:
struct edma_chan {
	struct virt_dma_chan		vchan;
...
};

struct virt_dma_chan {
	struct dma_chan	chan;
...
};

so &edma_chan == &edma_chan.vchan.chan;

But this is not why you see the leak.

What I did on my board is that I have swapped in SW the cc0 and cc1, now mmc0
is on cc1 from the sw point of view, but still can not reproduce the issue.

> 
> The third parameter to edma_alloc_channel() should be echan, not chan, since
> the edma_callback() interprets the callback data parameter as struct edma_chan *.
> 
> Let me know if you find something or if you have an advice for more debug.

From the log I would go and see what happens in the
vchan_get_all_descriptors() and vchan_dma_desc_free_list(), is it so that at
terminate_all time the list we got is empty? But why it is?

It seams you got the terminate_all call before the transfer finished, this is
also interesting. I'll look at at this more deeply.
What I see is that at terminate_all we clear the echan->edesc and for some
reason the vchan code will not free the desc. Later the completion interrupt
comes, but since the echan->edesc is NULL we do nothing. This causes the leak.

The question to all of this why and how to reproduce it?

-- 
Péter
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-23 15:28             ` Peter Ujfalusi
@ 2015-03-23 15:45               ` Petr Kulhavy
  2015-03-24 12:59                 ` Peter Ujfalusi
  0 siblings, 1 reply; 12+ messages in thread
From: Petr Kulhavy @ 2015-03-23 15:45 UTC (permalink / raw)
  To: Peter Ujfalusi, linux-omap

Hi Peter,

I've just posted a patch to the community, you should have received 
another email from me just a few minutes ago.
Basically there should be a kfree in terminate_all() just before 
echan->edesc is set to NULL.
The reason is that at that point the edesc is in none of the vchan lists 
(edma_execute() removes it from the "issued" list)
and vchan_get_all_descriptors() doesn't find it.

With the extra kfree the mem leak is not seen any more.
It's a good question why terminate_all is called before the transfer 
finished, but I thought it could be called at any time?

I'm also not sure if the echan->edesc shouldn't be free also in 
edma_execute(), could you please check that? :

static void edma_execute(struct edma_chan *echan)
{
     struct virt_dma_desc *vdesc;
     struct edma_desc *edesc;
     struct device *dev = echan->vchan.chan.device->dev;
     int i, j, left, nslots;

     /* If either we processed all psets or we're still not started */
     if (!echan->edesc ||
         echan->edesc->pset_nr == echan->edesc->processed) {
         /* Get next vdesc */
         vdesc = vchan_next_desc(&echan->vchan);
         if (!vdesc) {
             dev_dbg(dev, "Setting edesc 0x%p to NULL\n",echan->edesc);
             echan->edesc = NULL;
#warning "possible memory leak here ?"
             return;
         }
         list_del(&vdesc->node);        // at this point node was in issued
         echan->edesc = to_edma_desc(&vdesc->tx);
     }

I'll post another patch for the wrong type in edma_alloc_chan_resources()


Regards
Petr


On 23.03.2015 16:28, Peter Ujfalusi wrote:
> On 03/20/2015 11:53 PM, Petr Kulhavy wrote:
>> Hi Peter,
>>
>> yes, that is one of the differences to the EVM that the SD card is on MMCSD1.
>> This is due to the pin multiplexer and other peripherals we're using.
>>
>> Your patch is correct, however the edma_callback()  is using the channel
>> number parameter for debug messages only. For the actual work echan->ch_num is
>> used. BTW is that correct? Could there be a mismatch between the echan->ch_num
>> and the ch_num parameter?
>>
>> Something else seems to be odd in edma_alloc_chan_resources():
>>
>> /* Alloc channel resources */
>> static int edma_alloc_chan_resources(struct dma_chan *chan)
>> {
>>          struct edma_chan *echan = to_edma_chan(chan);
>>          struct device *dev = chan->device->dev;
>>          int ret;
>>          int a_ch_num;
>>          LIST_HEAD(descs);
>>
>>          a_ch_num = edma_alloc_channel(echan->ch_num, edma_callback,
>>                                          chan, EVENTQ_DEFAULT);
> Hah, yes this is wrong but worked so far fine because:
> struct edma_chan {
> 	struct virt_dma_chan		vchan;
> ...
> };
>
> struct virt_dma_chan {
> 	struct dma_chan	chan;
> ...
> };
>
> so &edma_chan == &edma_chan.vchan.chan;
>
> But this is not why you see the leak.
>
> What I did on my board is that I have swapped in SW the cc0 and cc1, now mmc0
> is on cc1 from the sw point of view, but still can not reproduce the issue.
>
>> The third parameter to edma_alloc_channel() should be echan, not chan, since
>> the edma_callback() interprets the callback data parameter as struct edma_chan *.
>>
>> Let me know if you find something or if you have an advice for more debug.
>  From the log I would go and see what happens in the
> vchan_get_all_descriptors() and vchan_dma_desc_free_list(), is it so that at
> terminate_all time the list we got is empty? But why it is?
>
> It seams you got the terminate_all call before the transfer finished, this is
> also interesting. I'll look at at this more deeply.
> What I see is that at terminate_all we clear the echan->edesc and for some
> reason the vchan code will not free the desc. Later the completion interrupt
> comes, but since the echan->edesc is NULL we do nothing. This causes the leak.
>
> The question to all of this why and how to reproduce it?
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Serious memory leak in TI EDMA driver (drivers/dma/edma.c)
  2015-03-23 15:45               ` Petr Kulhavy
@ 2015-03-24 12:59                 ` Peter Ujfalusi
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Ujfalusi @ 2015-03-24 12:59 UTC (permalink / raw)
  To: Petr Kulhavy, linux-omap

On 03/23/2015 05:45 PM, Petr Kulhavy wrote:
> Hi Peter,
> 
> I've just posted a patch to the community, you should have received another
> email from me just a few minutes ago.
> Basically there should be a kfree in terminate_all() just before echan->edesc
> is set to NULL.
> The reason is that at that point the edesc is in none of the vchan lists
> (edma_execute() removes it from the "issued" list)
> and vchan_get_all_descriptors() doesn't find it.

And this is the main issue in my view. The desc should be in one of the lists
IMHO.

> With the extra kfree the mem leak is not seen any more.
> It's a good question why terminate_all is called before the transfer finished,
> but I thought it could be called at any time?
> 
> I'm also not sure if the echan->edesc shouldn't be free also in
> edma_execute(), could you please check that? :
> 
> static void edma_execute(struct edma_chan *echan)
> {
>     struct virt_dma_desc *vdesc;
>     struct edma_desc *edesc;
>     struct device *dev = echan->vchan.chan.device->dev;
>     int i, j, left, nslots;
> 
>     /* If either we processed all psets or we're still not started */
>     if (!echan->edesc ||
>         echan->edesc->pset_nr == echan->edesc->processed) {
>         /* Get next vdesc */
>         vdesc = vchan_next_desc(&echan->vchan);
>         if (!vdesc) {
>             dev_dbg(dev, "Setting edesc 0x%p to NULL\n",echan->edesc);
>             echan->edesc = NULL;
> #warning "possible memory leak here ?"

Yes, it would be, but in fact this will never happen since the edma_execute()
will not be called when echan->edesc is not NULL _and_ (echan->edesc->pset_nr
== echan->edesc->processed).

if we have echan->edesc, we are in a middle of transfer, moving from one batch
of sg to another batch.

I do have cleanup patches for this part to clarify the situation.

>             return;
>         }
>         list_del(&vdesc->node);        // at this point node was in issued
>         echan->edesc = to_edma_desc(&vdesc->tx);
>     }
> 
> I'll post another patch for the wrong type in edma_alloc_chan_resources()
> 
> 
> Regards
> Petr
> 
> 
> On 23.03.2015 16:28, Peter Ujfalusi wrote:
>> On 03/20/2015 11:53 PM, Petr Kulhavy wrote:
>>> Hi Peter,
>>>
>>> yes, that is one of the differences to the EVM that the SD card is on MMCSD1.
>>> This is due to the pin multiplexer and other peripherals we're using.
>>>
>>> Your patch is correct, however the edma_callback()  is using the channel
>>> number parameter for debug messages only. For the actual work echan->ch_num is
>>> used. BTW is that correct? Could there be a mismatch between the echan->ch_num
>>> and the ch_num parameter?
>>>
>>> Something else seems to be odd in edma_alloc_chan_resources():
>>>
>>> /* Alloc channel resources */
>>> static int edma_alloc_chan_resources(struct dma_chan *chan)
>>> {
>>>          struct edma_chan *echan = to_edma_chan(chan);
>>>          struct device *dev = chan->device->dev;
>>>          int ret;
>>>          int a_ch_num;
>>>          LIST_HEAD(descs);
>>>
>>>          a_ch_num = edma_alloc_channel(echan->ch_num, edma_callback,
>>>                                          chan, EVENTQ_DEFAULT);
>> Hah, yes this is wrong but worked so far fine because:
>> struct edma_chan {
>>     struct virt_dma_chan        vchan;
>> ...
>> };
>>
>> struct virt_dma_chan {
>>     struct dma_chan    chan;
>> ...
>> };
>>
>> so &edma_chan == &edma_chan.vchan.chan;
>>
>> But this is not why you see the leak.
>>
>> What I did on my board is that I have swapped in SW the cc0 and cc1, now mmc0
>> is on cc1 from the sw point of view, but still can not reproduce the issue.
>>
>>> The third parameter to edma_alloc_channel() should be echan, not chan, since
>>> the edma_callback() interprets the callback data parameter as struct
>>> edma_chan *.
>>>
>>> Let me know if you find something or if you have an advice for more debug.
>>  From the log I would go and see what happens in the
>> vchan_get_all_descriptors() and vchan_dma_desc_free_list(), is it so that at
>> terminate_all time the list we got is empty? But why it is?
>>
>> It seams you got the terminate_all call before the transfer finished, this is
>> also interesting. I'll look at at this more deeply.
>> What I see is that at terminate_all we clear the echan->edesc and for some
>> reason the vchan code will not free the desc. Later the completion interrupt
>> comes, but since the echan->edesc is NULL we do nothing. This causes the leak.
>>
>> The question to all of this why and how to reproduce it?
>>
> 


-- 
Péter
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-03-24 12:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-16 19:26 Serious memory leak in TI EDMA driver (drivers/dma/edma.c) Petr Kulhavy
2015-03-16 19:27 ` Tony Lindgren
2015-03-17 12:38 ` Peter Ujfalusi
2015-03-17 19:02   ` Petr Kulhavy
2015-03-18 13:56     ` Peter Ujfalusi
2015-03-18 21:33       ` Petr Kulhavy
2015-03-20 13:59         ` Peter Ujfalusi
2015-03-20 21:53           ` Petr Kulhavy
2015-03-23 15:28             ` Peter Ujfalusi
2015-03-23 15:45               ` Petr Kulhavy
2015-03-24 12:59                 ` Peter Ujfalusi
  -- strict thread matches above, loose matches on Subject: below --
2015-03-16 19:27 Petr Kulhavy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).