[PATCH] oprofile per-cpu buffer overrun

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] oprofile per-cpu buffer overrun
@ 2004-01-26  2:37 Philippe Elie
  2004-01-26  4:07 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Philippe Elie @ 2004-01-26  2:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, John Levon

Hi Andrew,

In a ring buffer controlled by a read and write positions we
can't use buffer_size but only buffer_size - 1 entry, the last
free entry act as a guard to avoid write pos overrun. This bug
was hidden because the writer, oprofile_add_sample(), request
one more entry than really needed.

regards,
Phil

Index: drivers/oprofile/cpu_buffer.c
===================================================================
RCS file: /usr/local/cvsroot/linux-2.5/drivers/oprofile/cpu_buffer.c,v
retrieving revision 1.9
diff -u -p -r1.9 cpu_buffer.c
--- drivers/oprofile/cpu_buffer.c	26 May 2003 04:42:54 -0000	1.9
+++ drivers/oprofile/cpu_buffer.c	24 Jan 2004 21:07:03 -0000
@@ -86,9 +86,9 @@ static unsigned long nr_available_slots(
 	unsigned long tail = b->tail_pos;
 
 	if (tail > head)
-		return tail - head;
+		return (tail - head) - 1;
 
-	return tail + (b->buffer_size - head);
+	return tail + (b->buffer_size - head) - 1;
 }
 
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] oprofile per-cpu buffer overrun
  2004-01-26  2:37 [PATCH] oprofile per-cpu buffer overrun Philippe Elie
@ 2004-01-26  4:07 ` Andrew Morton
  2004-01-26  5:56   ` Anton Blanchard
  2004-01-26 10:32   ` John Levon
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2004-01-26  4:07 UTC (permalink / raw)
  To: Philippe Elie; +Cc: linux-kernel, levon

Philippe Elie <phil.el@wanadoo.fr> wrote:
>
> Hi Andrew,
> 
> In a ring buffer controlled by a read and write positions we
> can't use buffer_size but only buffer_size - 1 entry,

you can, actually.

> the last
> free entry act as a guard to avoid write pos overrun. This bug
> was hidden because the writer, oprofile_add_sample(), request
> one more entry than really needed.
> 
>...
> diff -u -p -r1.9 cpu_buffer.c
> --- drivers/oprofile/cpu_buffer.c	26 May 2003 04:42:54 -0000	1.9
> +++ drivers/oprofile/cpu_buffer.c	24 Jan 2004 21:07:03 -0000
> @@ -86,9 +86,9 @@ static unsigned long nr_available_slots(
>  	unsigned long tail = b->tail_pos;
>  
>  	if (tail > head)
> -		return tail - head;
> +		return (tail - head) - 1;
>  
> -	return tail + (b->buffer_size - head);
> +	return tail + (b->buffer_size - head) - 1;
>  }

When implementing a circular buffer it is better to not constrain the head
and tail indices - just let them grow and wrap without bound.  You only need
to bring them in-bounds when you actually use them to index the buffer.

This way,

- head-tail is always the amount of used space, no need to futz around
  handling the case where one has wrapped and the other hasn't.

- you get to use all of the buffer, because the cases head-tail == 0
  (empty) and head-tail == bufsize (full) are now distinguishable.

  It helps if the buffer size is a power of two, of course, but integer
  modulus is pretty quick.

All the net drivers and the printk log buffer implement their ring buffers
in this way; it works nicely.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] oprofile per-cpu buffer overrun
  2004-01-26  4:07 ` Andrew Morton
@ 2004-01-26  5:56   ` Anton Blanchard
  2004-01-26 10:32   ` John Levon
  1 sibling, 0 replies; 6+ messages in thread
From: Anton Blanchard @ 2004-01-26  5:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Philippe Elie, linux-kernel, levon

 
>   It helps if the buffer size is a power of two, of course, but integer
>   modulus is pretty quick.

If its something thats hit real hard, the way the modulus is done can
make a difference. We've seen it in various places (eg disabling tigon 1
support in the acenic changes the driver from using a variable to a
compile time constant and you can actually see it in the profile):

quickest == power of 2 compile time constant (results in a mask)
quickish == compile time constant (results in the multiplication by
		an inverse trick)
slow == run time (results in a divide)

Of course you can do like the printk stuff and use a variable but
enforce a power of 2 value and & it.

Anton

--

unsigned long i, j, k;

int quickest()
{
        j = i % 2;
}

int quickish()
{
        j = i % 3;
}

int slow()
{
        j = i % k;
}

--

quickest:
        lis 9,i@ha
        lwz 0,i@l(9)
        lis 9,j@ha
        rlwinm 0,0,0,31,31
        stw 0,j@l(9)
        blr
quickish:
        lis 9,i@ha
        lis 0,0xaaaa
        lwz 11,i@l(9)
        ori 0,0,43691
        lis 9,j@ha
        mulhwu 0,11,0
        srwi 0,0,1
        mulli 0,0,3
        subf 11,0,11
        stw 11,j@l(9)
        blr
slow:
        lis 9,i@ha
        lis 11,k@ha
        lwz 10,i@l(9)
        lwz 9,k@l(11)
        divwu 0,10,9
        mullw 0,0,9
        lis 9,j@ha
        subf 10,0,10
        stw 10,j@l(9)
        blr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] oprofile per-cpu buffer overrun
  2004-01-26  4:07 ` Andrew Morton
  2004-01-26  5:56   ` Anton Blanchard
@ 2004-01-26 10:32   ` John Levon
  2004-01-26 11:07     ` Nick Piggin
  2004-01-26 15:52     ` Philippe Elie
  1 sibling, 2 replies; 6+ messages in thread
From: John Levon @ 2004-01-26 10:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Philippe Elie, linux-kernel

On Sun, Jan 25, 2004 at 08:07:01PM -0800, Andrew Morton wrote:

> When implementing a circular buffer it is better to not constrain the head
> and tail indices - just let them grow and wrap without bound.  You only need
> to bring them in-bounds when you actually use them to index the buffer.

I'm not sure why that's better.

> - head-tail is always the amount of used space, no need to futz around
>   handling the case where one has wrapped and the other hasn't.

I admit I have a hangover, but it seems to me it would actually be more
complicated to make damn sure that the integer overflow case is
definitely handled properly.

I clearly can't write functioning ring buffers :), so I'd prefer it to
stay as simple as possible.

regards,
john
-- 
Khendon's Law:
If the same point is made twice by the same person, the thread is over.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] oprofile per-cpu buffer overrun
  2004-01-26 10:32   ` John Levon
@ 2004-01-26 11:07     ` Nick Piggin
  2004-01-26 15:52     ` Philippe Elie
  1 sibling, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2004-01-26 11:07 UTC (permalink / raw)
  To: John Levon; +Cc: Andrew Morton, Philippe Elie, linux-kernel



John Levon wrote:

>On Sun, Jan 25, 2004 at 08:07:01PM -0800, Andrew Morton wrote:
>
>
>>When implementing a circular buffer it is better to not constrain the head
>>and tail indices - just let them grow and wrap without bound.  You only need
>>to bring them in-bounds when you actually use them to index the buffer.
>>
>
>I'm not sure why that's better.
>
>
>>- head-tail is always the amount of used space, no need to futz around
>>  handling the case where one has wrapped and the other hasn't.
>>
>
>I admit I have a hangover, but it seems to me it would actually be more
>complicated to make damn sure that the integer overflow case is
>definitely handled properly.
>

You needn't worry about integer overflow - it is handled properly ;)



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] oprofile per-cpu buffer overrun
  2004-01-26 10:32   ` John Levon
  2004-01-26 11:07     ` Nick Piggin
@ 2004-01-26 15:52     ` Philippe Elie
  1 sibling, 0 replies; 6+ messages in thread
From: Philippe Elie @ 2004-01-26 15:52 UTC (permalink / raw)
  To: John Levon; +Cc: Andrew Morton, linux-kernel

John Levon wrote:
> On Sun, Jan 25, 2004 at 08:07:01PM -0800, Andrew Morton wrote:
> 
> 
>>When implementing a circular buffer it is better to not constrain the head
>>and tail indices - just let them grow and wrap without bound.  You only need
>>to bring them in-bounds when you actually use them to index the buffer.

neat!

> I'm not sure why that's better.

We win in increment_head/increment_tail:

static void increment_head(struct oprofile_cpu_buffer * b)
{
- 
unsigned long new_head = b->head_pos + 1;
	wmb();
- 
if (new_head < (b->buffer_size))
- 
	b->head_pos = new_head;
- 
else
- 
	b->head_pos = 0;
+ 
b->head_pos++;
}

for this added cost when accessing the buffer:

b->buffer[b->head & b->buffer_size_mask];

Modulo use is not worth but with buffer_size a power of 2
it's probably a win, I'll try and measure this later, not
urgent since the problem is fixed, I added it in our todo.

regards,
Phil


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2004-01-26 14:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-01-26  2:37 [PATCH] oprofile per-cpu buffer overrun Philippe Elie
2004-01-26  4:07 ` Andrew Morton
2004-01-26  5:56   ` Anton Blanchard
2004-01-26 10:32   ` John Levon
2004-01-26 11:07     ` Nick Piggin
2004-01-26 15:52     ` Philippe Elie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox