How to measure performance inside Kernel?

kernelnewbies.kernelnewbies.org archive mirror

* How to measure performance inside Kernel?
@ 2012-02-09 12:58 Peter Senna Tschudin
  2012-02-09 18:12 ` michi1 at michaelblizek.twilightparadox.com
  2012-02-10 21:47 ` Peter Senna Tschudin
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-09 12:58 UTC (permalink / raw)
  To: kernelnewbies

Dear list,

I'm looking for a way to compare the performance of two different
pieces of code inside the kernel. I was able to do some comparison in
user land, but I want to test the specific portion of code inside the
kernel.

I want to compare the code at line 1195 of drivers/media/video/videobuf2-core.c:
/*
 * Reinitialize all buffers for next use.
 */
for (i = 0; i < q->num_buffers; ++i)
       q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;

With:

/* buf2 */
/*
 * Reinitialize all buffers for next use.
 */
buf_ptr_end = q->bufs[q->num_buffers];

for (buf_ptr = q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
       buf_ptr->state = VB2_BUF_STATE_DEQUEUED;
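
A note on the second variant: the struct definition quoted later in this thread shows that bufs is an array of pointers to struct vb2_buffer. That means q->bufs[q->num_buffers] reads a pointer from one slot past the last valid one, and ++buf_ptr steps through memory as if the buffers themselves were contiguous. A pointer-walking loop that matches that layout would step through the pointer array instead. The sketch below is a user-space illustration only; the struct layouts, the enum values, the VIDEO_MAX_FRAME value and the function names are simplified stand-ins, not the real videobuf2 definitions.

```c
#include <assert.h>
#include <stddef.h>

#define VIDEO_MAX_FRAME 32	/* stand-in value for this sketch */

enum vb2_buffer_state {		/* simplified stand-in */
	VB2_BUF_STATE_DEQUEUED,
	VB2_BUF_STATE_QUEUED,
};

struct vb2_buffer {		/* simplified stand-in */
	enum vb2_buffer_state state;
};

struct vb2_queue {		/* simplified stand-in */
	struct vb2_buffer *bufs[VIDEO_MAX_FRAME];
	unsigned int num_buffers;
};

/* Indexed form, as in the original loop. */
void reset_indexed(struct vb2_queue *q)
{
	unsigned int i;

	assert(q->num_buffers <= VIDEO_MAX_FRAME);
	for (i = 0; i < q->num_buffers; ++i)
		q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;
}

/* Pointer-walking form: iterate over the array of pointers. */
void reset_ptr_walk(struct vb2_queue *q)
{
	struct vb2_buffer **pp;
	struct vb2_buffer **end;

	assert(q->num_buffers <= VIDEO_MAX_FRAME);
	end = q->bufs + q->num_buffers;	/* one past the last used slot */
	for (pp = q->bufs; pp < end; ++pp)
		(*pp)->state = VB2_BUF_STATE_DEQUEUED;
}
```

Both functions touch exactly the same buffers, so any timing difference between them would come from code generation rather than from doing different work.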

To test in user land I created two separate C source files, compiled
them with gcc -O2, and then ran the "perf" tool on the entire
application. With num_buffers = 131072:

$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-references,cache-misses -r 2048 ./buf1

Performance counter stats for './buf1' (2048 runs):

16,538,039 cycles                   #  0.000 GHz                    (+-0.06%) [80.23%]
 6,917,411 stalled-cycles-frontend  # 41.83% frontend cycles idle   (+-0.14%) [80.25%]
 4,686,384 stalled-cycles-backend   # 28.34% backend  cycles idle   (+-0.14%) [80.28%]
   148,990 cache-references                                         (+-0.38%) [80.24%]
    71,180 cache-misses             # 47.775 % of all cache refs    (+-0.22%) [88.14%]

0.005234340 seconds time elapsed

$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-references,cache-misses -r 2048 ./buf2
Performance counter stats for './buf2' (2048 runs):

14,740,563 cycles                   #  0.000 GHz                    (+-0.04%) [77.89%]
 5,187,716 stalled-cycles-frontend  # 35.19% frontend cycles idle   (+-0.14%) [77.81%]
 3,383,748 stalled-cycles-backend   #
   101,894 cache-references                                         (+-0.23%) [84.60%]
    66,647 cache-misses             # 65.408 % of all cache refs    (+-0.14%) [90.52%]

0.004661826 seconds time elapsed                             (+-0.06%)

But I want to repeat the tests on a specific portion of code, not on
the entire application. Is there a safe way to do something like:

start_bench ( ?? ); /* start measurement */

buf_ptr_end = q->bufs[q->num_buffers];

for (buf_ptr = q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
       buf_ptr->state = VB2_BUF_STATE_DEQUEUED;

end_bench ( ?? ); /* end measurement */

And is this the correct approach for testing the performance of a
specific portion of kernel code?

Thank you!

Peter



-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36

^ permalink raw reply	[flat|nested] 14+ messages in thread

* How to measure performance inside Kernel?
  2012-02-09 12:58 How to measure performance inside Kernel? Peter Senna Tschudin
@ 2012-02-09 18:12 ` michi1 at michaelblizek.twilightparadox.com
  2012-02-10 21:47 ` Peter Senna Tschudin
  1 sibling, 0 replies; 14+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2012-02-09 18:12 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 10:58 Thu 09 Feb     , Peter Senna Tschudin wrote:
...
> But I want to repeat the tests on a specific portion of code, not on
> the entire application. Is there a safe way to do something like:
> 
> start_bench ( ?? ); /* start measurement */
> 
> buf_ptr_end = q->bufs[q->num_buffers];
> 
> for (buf_ptr = q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
>        buf_ptr->state = VB2_BUF_STATE_DEQUEUED;
> 
> end_bench ( ?? ); /* end measurement */

Yes, you can do this. If you are looking for a way to measure time, take a
look at include/linux/ktime.h.

> And is this the correct approach for testing the performance of a
> specific portion of kernel code?

Why not? I would be a bit worried about how the CPU cache affects the
measurement, especially because your benchmark already showed that a
significant amount of time is spent there.

	-Michi
-- 
programing a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com


* How to measure performance inside Kernel?
  2012-02-09 12:58 How to measure performance inside Kernel? Peter Senna Tschudin
  2012-02-09 18:12 ` michi1 at michaelblizek.twilightparadox.com
@ 2012-02-10 21:47 ` Peter Senna Tschudin
  2012-02-10 22:06   ` Jeff Haran
  2012-02-11  7:22   ` michi1 at michaelblizek.twilightparadox.com
  1 sibling, 2 replies; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-10 21:47 UTC (permalink / raw)
  To: kernelnewbies

Dear list,

As Michi suggested, I did some testing with ktime.h, but I found a
simpler solution with time.h.

I'm not sure if it is correct, and I would like to have some help... :-)

The code that I'm using for execution time measurement is:

#include <linux/time.h>

struct timespec ts_start, ts_end, diff;

getnstimeofday(&ts_start); /* stopwatch start */

for (i = 0; i < q->num_buffers; ++i)
	q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;

getnstimeofday(&ts_end); /* stopwatch stop */

diff = timespec_sub(ts_end, ts_start);

printk("%ld,", diff.tv_nsec);

Am I doing anything wrong? Can mysterious stuff like the out-of-order
execution engine break the stopwatch?

The full module code is at: http://goo.gl/cCMIa

Thank you!

Peter

On Thu, Feb 9, 2012 at 10:58 AM, Peter Senna Tschudin
<peter.senna@gmail.com> wrote:
> Dear list,
>
> I'm looking for a way to compare the performance of two different
> pieces of code inside the kernel. I was able to do some comparison in
> user land, but I want to test the specific portion of code inside the
> kernel.

-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


* How to measure performance inside Kernel?
  2012-02-10 21:47 ` Peter Senna Tschudin
@ 2012-02-10 22:06   ` Jeff Haran
  2012-02-11  0:22     ` Peter Senna Tschudin
  2012-02-11  7:22   ` michi1 at michaelblizek.twilightparadox.com
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Haran @ 2012-02-10 22:06 UTC (permalink / raw)
  To: kernelnewbies

> -----Original Message-----
> From: kernelnewbies-bounces at kernelnewbies.org [mailto:kernelnewbies-
> bounces at kernelnewbies.org] On Behalf Of Peter Senna Tschudin
> Sent: Friday, February 10, 2012 1:48 PM
> To: kernelnewbies at kernelnewbies.org
> Subject: Re: How to measure performance inside Kernel?
> 
> Dear list,
> 
> As Michi suggested, I did some testing with ktime.h, but I found a
> simpler solution with time.h.
> 
> I'm not sure if it is correct, and I would like to have some help... :-)
> 
> The code that I'm using for execution time measurement is:
> 
> #include <linux/time.h>
> 
> struct timespec ts_start, ts_end, diff;
> 
> getnstimeofday(&ts_start); /* stopwatch start */
> 
> for (i = 0; i < q->num_buffers; ++i)
> 	q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;
> 
> getnstimeofday(&ts_end); /* stopwatch stop */
> 
> diff = timespec_sub(ts_end, ts_start);
> 
> printk("%ld,", diff.tv_nsec);
> 
> Am I doing anything wrong? Can mysterious stuff like the out-of-order
> execution engine break the stopwatch?
> 
> The full module code is at: http://goo.gl/cCMIa
> 
> Thank you!
> 
> Peter

If you didn't disable interrupts before executing the above, the timing
of the above loop would include any time spent servicing interrupts.
Likewise if there were context switches or soft IRQs running. All would
inflate the perceived time to execute your loop.

Jeff Haran
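
Jeff's point can be sketched as a tiny self-contained module that masks local interrupts around the timed region. This is only an illustration under assumptions: the module and function names, the bench_data array and BENCH_ELEMS are made up for the example, ktime_get() is used as a monotonic clock, and the timed region has to stay very short because interrupts are off on the local CPU while it runs.

```c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/ktime.h>
#include <linux/hrtimer.h>
#include <linux/irqflags.h>

#define BENCH_ELEMS 32	/* hypothetical workload size */

/* volatile so the compiler cannot drop the stores being timed */
static volatile int bench_data[BENCH_ELEMS];

static int __init bench_init(void)
{
	unsigned long flags;
	ktime_t start, end;
	s64 ns;
	int i;

	local_irq_save(flags);		/* no interrupts on this CPU from here */
	start = ktime_get();		/* stopwatch start */

	for (i = 0; i < BENCH_ELEMS; ++i)
		bench_data[i] = 1;	/* code under test goes here */

	end = ktime_get();		/* stopwatch stop */
	local_irq_restore(flags);

	ns = ktime_to_ns(ktime_sub(end, start));
	pr_info("bench: %lld ns\n", (long long)ns);
	return 0;
}

static void __exit bench_exit(void)
{
}

module_init(bench_init);
module_exit(bench_exit);
MODULE_LICENSE("GPL");
```

Each load of the module would print one sample to the kernel log. Collecting many samples needs a loop around the timed region, with interrupts re-enabled between iterations.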


* How to measure performance inside Kernel?
  2012-02-10 22:06   ` Jeff Haran
@ 2012-02-11  0:22     ` Peter Senna Tschudin
  2012-02-11  3:44       ` Graeme Russ
  2012-02-11  7:34       ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-11  0:22 UTC (permalink / raw)
  To: kernelnewbies

Jeff,

Thanks for the fast reply. My goal is to determine if the code:

/* * */
buf_ptr_end = q->bufs[q->num_buffers];

for (buf_ptr = q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
	buf_ptr->state = VB2_BUF_STATE_DEQUEUED;
/* * */

is faster than:

/* * */
for (i = 0; i < q->num_buffers; ++i)
	q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;
/* * */

The second code is found at line 1195 of
drivers/media/video/videobuf2-core.c. I'm doing the tests to see
whether it is worth making a patch.

> If you didn't disable interrupts before executing the above, the timing
> of the above loop would include any time spent servicing interrupts.
> Likewise if there were context switches or soft IRQs running. All would
> inflate the perceived time to execute your loop.

I'm measuring the running time of that portion of code 512 times. Then
calculate the geometrical mean of results. See one output example:

Original_code:,
514,110,92,104,107,101,101,99,105,99,87,105,99,105,105,102,108,123,105,113,104,128,107,92,117,117,120,102,105,105,90,105,101,119,119,101,141,111,96,99,93,80,102,78,93,93,108,108,129,105,107,113,104,104,93,95,104,125,104,95,90,108,117,126,93,87,102,99,93,108,96,104,113,89,92,134,107,98,95,104,102,143,102,126,99,111,99,117,105,102,108,108,98,107,104,98,96,81,108,120,102,87,93,93,93,119,104,96,108,93,108,108,111,98,98,116,122,98,101,87,99,99,105,102,108,111,99,96,111,111,102,98,110,125,107,110,132,102,99,105,87,84,108,120,90,101,95,113,101,107,111,105,108,114,126,102,92,110,104,101,99,108,135,105,123,111,108,102,102,110,110,92,98,104,119,102,113,95,107,104,116,131,153,164,152,125,107,101,93,105,96,123,96,111,111,99,93,101,92,107,89,95,108,111,99,111,114,108,99,117,129,107,105,87,87,93,102,99,83,96,84,102,96,90,110,101,116,89,98,119,125,114,99,126,123,102,123,111,102,101,110,107,111,90,105,111,96,105,102,113,104,158,101,87,102,96,108,111,138,102,120,87,90,102,104,107,101,84,102,99,96,111,99,105,102,99,104,131,116,104,104,105,126,105,116,128,107,101,105,120,132,111,90,90,114,99,86,110,95,81,120,96,126,99,108,114,120,102,120,125,95,104,96,108,105,105,114,123,111,93,104,83,113,107,99,99,99,105,90,78,119,113,98,98,90,99,90,129,96,101,110,77,110,125,101,102,87,87,117,126,117,108,126,108,96,108,99,105,105,114,123,104,104,110,105,84,96,105,96,120,111,120,101,110,110,87,105,114,102,87,108,135,117,132,141,105,113,95,98,84,96,87,98,89,108,105,102,99,99,105,126,99,101,92,98,75,102,102,129,102,99,102,99,108,92,110,125,107,110,102,96,96,117,72,108,123,105,120,120,99,120,98,104,89,102,117,129,123,105,119,107,101,87,117,111,99,108,117,114,114,90,122,113,95,104,125,113,102,108,120,90,108,93,89,86,87,90,84,83,108,99,102,90,108,90,108,87,95,86,90,123,135,93,126,93,102,99,123,108,117,105,102,105,98,107,122,119,125,96,108,131,99,114,104,93,96,95,83,99,84,92,87,

The geometrical mean of the values is: 104.7623578604

Isn't it enough?

Thanks! :-)

-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


* How to measure performance inside Kernel?
  2012-02-11  0:22     ` Peter Senna Tschudin
@ 2012-02-11  3:44       ` Graeme Russ
  2012-02-11 13:14         ` Peter Senna Tschudin
  2012-02-11  7:34       ` michi1 at michaelblizek.twilightparadox.com
  1 sibling, 1 reply; 14+ messages in thread
From: Graeme Russ @ 2012-02-11  3:44 UTC (permalink / raw)
  To: kernelnewbies

Hi Peter,

On 02/11/2012 11:22 AM, Peter Senna Tschudin wrote:
> Jeff,
>
> Thanks for the fast reply. My goal is to determine if the code:
>
> /* * */
> buf_ptr_end = q->bufs[q->num_buffers];
>
> for (buf_ptr = q->bufs[0]; buf_ptr < buf_ptr_end; ++buf_ptr)
> 	buf_ptr->state = VB2_BUF_STATE_DEQUEUED;
> /* * */
>
> is faster than:
>
> /* * */
> for (i = 0; i < q->num_buffers; ++i)
> 	q->bufs[i]->state = VB2_BUF_STATE_DEQUEUED;
> /* * */

For starters, I would not be surprised if the compiler produces identical
code, and if not, code that has the same performance...

> I'm measuring the running time of that portion of code 512 times. Then
> calculate the geometrical mean of results. See one output example:
>
> Original_code:,
> 514,110,92,104,107,101,101,
...
>
> The geometrical mean of the values is: 104.7623578604
>
> Isn't it enough?

That sounds like a perfectly good methodology - It would be interesting to
see how many reps are required before the mean settles down to a constant
value - Maybe you don't need to run it so many times, or maybe you need to
run more...

Regards,

Graeme
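
Graeme's question about how many repetitions are needed can be checked offline on the recorded samples. The sketch below is a user-space illustration with a made-up function name; for simplicity it tracks the arithmetic mean rather than the geometric mean used in this thread, and the tolerance is an assumption to be chosen by whoever runs it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the smallest sample count k (1-based) such that the running mean
 * of the first j samples stays within rel_tol of the final mean for every
 * j >= k.
 */
size_t reps_until_stable(const double *samples, size_t n, double rel_tol)
{
	double sum = 0.0, running = 0.0, final_mean, limit, diff;
	size_t i, stable_from = 1;

	assert(n > 0);
	for (i = 0; i < n; ++i)
		sum += samples[i];
	final_mean = sum / (double)n;
	limit = rel_tol * (final_mean < 0 ? -final_mean : final_mean);

	for (i = 0; i < n; ++i) {
		running += samples[i];
		diff = running / (double)(i + 1) - final_mean;
		if (diff < 0)
			diff = -diff;
		if (diff > limit)
			stable_from = i + 2;	/* first count after this miss */
	}
	return stable_from;
}
```

If the returned count is close to the total number of samples, the mean has not really settled and more runs are probably needed.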


* How to measure performance inside Kernel?
  2012-02-10 21:47 ` Peter Senna Tschudin
  2012-02-10 22:06   ` Jeff Haran
@ 2012-02-11  7:22   ` michi1 at michaelblizek.twilightparadox.com
  2012-02-11 13:29     ` Peter Senna Tschudin
  1 sibling, 1 reply; 14+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2012-02-11  7:22 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 19:47 Fri 10 Feb     , Peter Senna Tschudin wrote:
...
> #include <linux/time.h>
> 
> getnstimeofday(&ts_start); /* stopwatch start */
...
> getnstimeofday(&ts_end); /* stopwatch stop */
> 
> diff = timespec_sub(ts_end, ts_start);
> 
> printk("%ld,", diff.tv_nsec);
> 
> Am I doing anything wrong? Can mysterious stuff like the out-of-order
> execution engine break the stopwatch?

Why don't you print the tv_sec part?

You might also want to replace getnstimeofday with getrawmonotonic or any
other monotonic time source. If ntp or something else decides to change system
time during the measurement, you would probably get weird results.

	-Michi
-- 
programing a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com
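
Both of Michi's points can be illustrated in user space: compute the difference from the full timestamp (seconds and nanoseconds), and read a monotonic clock so that wall-clock adjustments cannot distort the interval. The sketch below uses POSIX clock_gettime(CLOCK_MONOTONIC) as a user-space stand-in for the in-kernel monotonic sources mentioned above; the helper names are made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two timestamps, including the seconds part. */
int64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
	return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
	       + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Read the monotonic clock; it is not affected by settimeofday() jumps. */
struct timespec monotonic_now(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;	/* keep the build quiet when NDEBUG is defined */
	return ts;
}
```

Printing only tv_nsec works as long as the interval never crosses a second boundary; folding tv_sec into one 64-bit value removes that assumption.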


* How to measure performance inside Kernel?
  2012-02-11  0:22     ` Peter Senna Tschudin
  2012-02-11  3:44       ` Graeme Russ
@ 2012-02-11  7:34       ` michi1 at michaelblizek.twilightparadox.com
  1 sibling, 0 replies; 14+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2012-02-11  7:34 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 22:22 Fri 10 Feb     , Peter Senna Tschudin wrote:
...
> I'm measuring the running time of that portion of code 512 times. Then
> calculate the geometrical mean of results. See one output example:
> 
> Original_code:,
> 514,110,92,104,107,101,101,
...
> The geometrical mean of the values is: 104.7623578604
> 
> Isn't it enough?

It should reduce the influence of the scheduler, but you can see a different
effect here: The first run takes ~5 times longer than any run which follows.
This is most likely caused by CPU cache effects. The question is now whether
you can expect the data to be in the cpu cache when this code is run in the
real world. If not, you might want to add prefetch instructions (look for
"__builtin_prefetch"). These instructions will make the first run faster, but
further runs slower.

	-Michi
-- 
programing a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com
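
Michi's observation suggests keeping the first (cold) sample apart from the rest instead of folding it into one average. A minimal user-space sketch, with a made-up struct and function name, and using the arithmetic mean of the warm samples for simplicity rather than the geometric mean used in this thread:

```c
#include <assert.h>
#include <stddef.h>

struct run_summary {
	unsigned long cold;	/* first sample, taken with cold caches */
	double warm_mean;	/* arithmetic mean of the remaining samples */
};

/* Split a series of timings into the cold first run and the warm rest. */
struct run_summary summarize_runs(const unsigned long *ns, size_t n)
{
	struct run_summary s;
	unsigned long long sum = 0;
	size_t i;

	assert(n >= 2);
	s.cold = ns[0];
	for (i = 1; i < n; ++i)
		sum += ns[i];
	s.warm_mean = (double)sum / (double)(n - 1);
	return s;
}
```

Which of the two numbers matters depends on whether the data is expected to be in the cache when the code runs for real.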


* How to measure performance inside Kernel?
  2012-02-11  3:44       ` Graeme Russ
@ 2012-02-11 13:14         ` Peter Senna Tschudin
  2012-02-11 13:57           ` Peter Senna Tschudin
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-11 13:14 UTC (permalink / raw)
  To: kernelnewbies

Hi Graeme,

I'm doing the tests on Fedora 15, using its provided
/lib/modules/2.6.41.10-3.fc15.x86_64/build/Makefile to compile the
module.

> For starters, I would not be surprised if the compiler produces identical
> code, and if not, code that has the same performance...

I've measured huge differences. Geometrical Mean of 512 values on Core i7:
Original_code:		98.2421756187
Proposed_code:		17.9710892854

> That sounds like a perfectly good methodology - It would be interesting to
> see how many reps are required before the mean settles down to a constant
> value - Maybe you don't need to run it so many times, or maybe you need to
> run more...

During the run, measuring the original code reports some values that
are much higher than the average.
The graph is at: http://imgur.com/2WXAq

The full source of the module I'm using is at: http://goo.gl/cCMIa

What's wrong?

[]'s

Peter

-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


* How to measure performance inside Kernel?
  2012-02-11  7:22   ` michi1 at michaelblizek.twilightparadox.com
@ 2012-02-11 13:29     ` Peter Senna Tschudin
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-11 13:29 UTC (permalink / raw)
  To: kernelnewbies

Hi Michi,

> Why don't you print the tv_sec part?

It is expected that tv_sec is always zero. But I'll fix the code to
check if it really is.

>
> You might also want to replace getnstimeofday with getrawmonotonic or any
> other monotonic time source. If ntp or something else decides to change system
> time during the measurement, you would probably get weird results.

I got the getnstimeofday() from line 333 of <linux/ktime.h>
/* Get the real (wall-) time in timespec format: */
#define ktime_get_real_ts(ts)	getnstimeofday(ts)

If the clock changes at the worst possible time, it may affect only
one result. But I'll check the other functions.

Thanks!

Peter


-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


* How to measure performance inside Kernel?
  2012-02-11 13:14         ` Peter Senna Tschudin
@ 2012-02-11 13:57           ` Peter Senna Tschudin
  2012-02-12 11:46             ` Mulyadi Santosa
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-11 13:57 UTC (permalink / raw)
  To: kernelnewbies

Graeme,

I found a problem in my code. I was calling kmalloc() only once for
both portions of code. As a result, the first loop that accessed
the memory paid a penalty. Now I'm calling kmalloc() independently
for each test.

The latest results are:
Proposed_code:	13.3940560108
Original_code:	37.8944950594

The graph is at:
http://i.imgur.com/nPjTE.jpg

Source:
http://goo.gl/cCMIa

Peter
-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


* How to measure performance inside Kernel?
  2012-02-11 13:57           ` Peter Senna Tschudin
@ 2012-02-12 11:46             ` Mulyadi Santosa
  2012-02-13 23:13               ` Peter Senna Tschudin
  0 siblings, 1 reply; 14+ messages in thread
From: Mulyadi Santosa @ 2012-02-12 11:46 UTC (permalink / raw)
  To: kernelnewbies

Hi Peter...

On Sat, Feb 11, 2012 at 20:57, Peter Senna Tschudin
<peter.senna@gmail.com> wrote:
> Graeme,
>
> I found a problem in my code. I was calling kmalloc() only once for
> both portions of code. As a result, the first loop that accessed
> the memory paid a penalty. Now I'm calling kmalloc() independently
> for each test.

Sorry for jumping into the middle of the discussion :)

I read your code and I think kmalloc can be streamlined here. I
recommend that kmalloc() allocate total memory needed to handle whole
q->buf[] array. something like (CMIIW):

q->buf=kmalloc(sizeof(struct vb_buffer)*q->num_buffers,GFP_KERNEL)

then access q->buf[1], q->buf[2] etc.

This way, AFAIK, you will likely get not only virtually contiguous
pages, but also physically contiguous pages. And that will ease
prefetching into the L1/L2 cache.

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com


* How to measure performance inside Kernel?
  2012-02-12 11:46             ` Mulyadi Santosa
@ 2012-02-13 23:13               ` Peter Senna Tschudin
  2012-02-14  3:43                 ` Mulyadi Santosa
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Senna Tschudin @ 2012-02-13 23:13 UTC (permalink / raw)
  To: kernelnewbies

Hi Mulyadi,

> I read your code and I think kmalloc can be streamlined here. I
> recommend that kmalloc() allocate total memory needed to handle whole
> q->buf[] array. something like (CMIIW):
>
> q->buf=kmalloc(sizeof(struct vb_buffer)*q->num_buffers,GFP_KERNEL)
>
> then access q->buf[1], q->buf[2] etc.
>
struct vb2_queue {
	struct vb2_buffer		*bufs[VIDEO_MAX_FRAME];
	unsigned int			num_buffers;
};

bufs is an array of pointers to struct vb2_buffer. I was not able to
use your kmalloc code. I get incompatible types errors when trying it.
Any ideas?

-- 
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36


* How to measure performance inside Kernel?
  2012-02-13 23:13               ` Peter Senna Tschudin
@ 2012-02-14  3:43                 ` Mulyadi Santosa
  0 siblings, 0 replies; 14+ messages in thread
From: Mulyadi Santosa @ 2012-02-14  3:43 UTC (permalink / raw)
  To: kernelnewbies

Hi :)

On Tue, Feb 14, 2012 at 06:13, Peter Senna Tschudin
<peter.senna@gmail.com> wrote:
> bufs is an array of pointers to struct vb2_buffer. I was not able to
> use your kmalloc code. I get incompatible types errors when trying it.
> Any ideas?

Hmmm, okay I admit I am not that good in C....

How about, casting the kmalloc result to void first?


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
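
The compile error Peter reports follows from the declaration he quotes: bufs is a fixed array of pointers, so the array itself cannot be assigned the result of kmalloc(), and a cast does not change that. One way to get the contiguous layout Mulyadi describes while keeping that declaration is to allocate a single block and point each bufs[i] at its slot. The sketch below is a user-space illustration: calloc() stands in for the kernel allocator, the struct layouts and the VIDEO_MAX_FRAME value are simplified stand-ins, and queue_alloc_contiguous() is a made-up helper name.

```c
#include <assert.h>
#include <stdlib.h>

#define VIDEO_MAX_FRAME 32	/* stand-in value for this sketch */

struct vb2_buffer {		/* simplified stand-in */
	int state;
};

struct vb2_queue {		/* simplified stand-in */
	struct vb2_buffer *bufs[VIDEO_MAX_FRAME];
	unsigned int num_buffers;
};

/*
 * Allocate all buffers as one contiguous block and point each bufs[i]
 * at its slot. Returns the block (to be freed by the caller) or NULL.
 */
struct vb2_buffer *queue_alloc_contiguous(struct vb2_queue *q, unsigned int n)
{
	struct vb2_buffer *block;
	unsigned int i;

	assert(n <= VIDEO_MAX_FRAME);
	/* in a module this would be kcalloc(n, sizeof(*block), GFP_KERNEL) */
	block = calloc(n, sizeof(*block));
	if (!block)
		return NULL;
	for (i = 0; i < n; ++i)
		q->bufs[i] = &block[i];
	q->num_buffers = n;
	return block;
}
```

With this layout, a loop that walks a struct vb2_buffer pointer from q->bufs[0] up to q->bufs[0] + q->num_buffers stays inside one allocation, which is the assumption the pointer-walking variant in this thread relies on.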
