public inbox for linux-kernel@vger.kernel.org
* likely() vs. unlikely()
@ 2010-12-15  6:42 Daniel Kopko
  2010-12-15 14:30 ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Kopko @ 2010-12-15  6:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: srostedt

Hello, Mr. Rostedt, LKML,

I've noticed the patch series by Steven Rostedt.  I am a bit of a lurker here, 
but I noticed something that I could perhaps contribute to.  Mr. Rostedt has 
done some great work deducing exactly whether or not these clauses meet their 
stated presumptions of "likeliness".  However, I think there may be some cases 
where accurately codifying branch biases based on literal likeliness might 
produce worse performance overall.  An example:

if(X)
    some_expensive_code();   //takes 500 ms
else
    some_very_cheap_code();  //takes 100 us

Now, let's say X is true 90% of the time.  The literal encoding of that would be 
"if(likely(X))".  However, it may make much more sense to encode it *wrongly* 
for the sake of cheapening the already cheap code, as the delay of the branch 
misprediction may be readily "absorbed" into the more expensive code.  In which 
case, even with X being likely, we may want to encode it as "if(unlikely(X))".  
(Also, to avoid obscuring things, please keep in mind that the bodies of the two 
halves of the branch above need not actually be function calls.)
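
(As an aside for readers outside the kernel: likely()/unlikely() are thin wrappers around gcc's __builtin_expect(); the sketch below is a simplified version of the definitions in include/linux/compiler.h.)

```c
/* Simplified sketch of the kernel's branch annotations (see
 * include/linux/compiler.h).  The !! normalizes any truthy value
 * to exactly 1, so it can be compared against the expected
 * constant that __builtin_expect() takes. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
```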

I think that this type of thing may be most noticeable around any branches where 
there is a fastpath that may be run if ideal conditions are met, but which are 
met less than 50% of the time.  In such cases, the likely()/unlikely() may be 
used "wrongly" to cause the branch misprediction to occur in the 
already-high-latency (some_expensive_function()) case, and lower latencies in 
the already-low-latency (some_very_cheap_function()) case.  This would lead to 
lower attainable latencies overall (by at least the cost of a branch miss which 
would otherwise have been spent in the fast code), and further encourage coding 
to meet the ideal conditions of the fastpath.
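
A back-of-envelope model makes the trade-off concrete.  All numbers below are illustrative assumptions matching the 90%/500ms/100us example above, with an invented misprediction penalty:

```c
/* Expected-cost sketch for the inversion argument above.
 *   p      - probability the expensive side is taken
 *   t_slow - cost of the expensive side (same units throughout)
 *   t_fast - cost of the cheap side
 *   miss   - branch misprediction penalty
 *   honest - 1 if annotated likely(X) (slow side predicted),
 *            0 if inverted via unlikely(X) (fast side predicted)
 * Whichever side is NOT predicted pays the misprediction penalty. */
double expected_cost(double p, double t_slow, double t_fast,
                     double miss, int honest)
{
	double slow = t_slow + (honest ? 0.0 : miss);
	double fast = t_fast + (honest ? miss : 0.0);
	return p * slow + (1.0 - p) * fast;
}

/* Latency seen on the cheap path alone. */
double fast_path_latency(double t_fast, double miss, int honest)
{
	return t_fast + (honest ? miss : 0.0);
}
```

Running the 90%/500000us/100us numbers through this shows the inversion raises the *average* slightly (the expensive path absorbs the extra miss) while shaving the full penalty off the cheap path, which is exactly the trade being argued for.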

So, several points:
1)  Please let me know if any of the above is outright wrong.
2)  I don't know if any such cases occur in the likely()/unlikely() patch 
series.  A place where it obviously DOESN'T occur would be:
http://marc.info/?l=linux-kernel&m=129229014528892&w=2
A place where I thought it MAY occur:
http://marc.info/?l=linux-kernel&m=129228728125413&w=2
3)  If there is overall agreement on the above, then I would also suggest that 
perhaps some additional macro names would be appropriate for the 
__builtin_expect() use (for cases where we want __builtin_expect(!!(X),1), but 
for which it isn't truly "likely", and for cases where we want 
__builtin_expect((X), 0), but for which it isn't truly "unlikely").  These would 
be parallel to likely()/unlikely() and have the same implementations, but 
different titles, to better document the intent of the code where they're used.  
Names maybe slow_branch_path() and fast_branch_path()? 
slow_branch()/fast_branch()?
4) I'm very sorry if this winds up ill-formatted.  I have a yahoo webmail 
client.  Open to suggestions for different free email providers on this front.


Thanks,

Daniel Kopko



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: likely() vs. unlikely()
  2010-12-15  6:42 likely() vs. unlikely() Daniel Kopko
@ 2010-12-15 14:30 ` Steven Rostedt
  2010-12-17  6:07   ` Daniel Kopko
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2010-12-15 14:30 UTC (permalink / raw)
  To: Daniel Kopko; +Cc: linux-kernel

On Tue, 2010-12-14 at 22:42 -0800, Daniel Kopko wrote:
> Hello, Mr. Rostedt, LKML,
> 
> I've noticed the patch series by Steven Rostedt.  I am a bit of a lurker here, 
> but I noticed something that I could perhaps contribute to.  Mr. Rostedt has 
> done some great work deducing exactly whether or not these clauses meet their 
> stated presumptions of "likeliness".  However, I think there may be some cases 
> where accurately codifying branch biases based on literal likeliness might 
> produce worse performance overall.  An example:
> 
> if(X)
>     some_expensive_code();   //takes 500 ms

Nothing in the kernel proper should ever take 500ms.

> else
>     some_very_cheap_code();  //takes 100 us
> 
> Now, let's say X is true 90% of the time.  The literal encoding of that would be 
> "if(likely(X))".  However, it may make much more sense to encode it *wrongly* 
> for the sake of cheapening the already cheap code, as the delay of the branch 
> misprediction may be readily "absorbed" into the more expensive code.  In which 
> case, even with X being likely, we may want to encode it as "if(unlikely(X))".  
> (Also, to avoid obscuring things, please keep in mind that the bodies of the two 
> halves of the branch above need not actually be function calls.)

Doesn't matter if they are function calls or not.

> 
> I think that this type of thing may be most noticeable around any branches where 
> there is a fastpath that may be run if ideal conditions are met, but which are 
> met less than 50% of the time.

Then that's not a fastpath. The definition of a fastpath has nothing to do
with the amount of time that path takes. We call something a fastpath
when it is hit 90% of the time and hit often. We want that to be as fast
as possible, even if it takes 500ms compared to 100us for the 10% case. If
you look at the big picture (the entire running system), adding a missed
branch prediction(*) to the 90% side of a single branch is going to cost
more than adding it to the branch that is only 10% taken.

Also note, I honestly believe that most of the branch annotations should
be removed unless they are correct 90% of the time. But I do not remove
them blindly, so it takes a bit of work for each and every change.

>   In such cases, the likely()/unlikely() may be 
> used "wrongly" to cause the branch misprediction to occur in the 
> already-high-latency (some_expensive_function()) case, and lower latencies in 
> the already-low-latency (some_very_cheap_function()) case.  This would lead to 
> lower attainable latencies overall (by at least the cost of a branch miss which 
> would otherwise have been spent in the fast code), and further encourage coding 
> to meet the ideal conditions of the fastpath.

Which is not what we call a fast path.

> 
> So, several points:
> 1)  Please let me know if any of the above is outright wrong.

Already stated ;-)

> 2)  I don't know if any such cases occur in the likely()/unlikely() patch 
> series.  A place where it obviously DOESN'T occur would be:
> http://marc.info/?l=linux-kernel&m=129229014528892&w=2
> A place where I thought it MAY occur:
> http://marc.info/?l=linux-kernel&m=129228728125413&w=2
> 3)  If there is overall agreement on the above, then I would also suggest that 
> perhaps some additional macro names would be appropriate for the 
> __builtin_expect() use (for cases where we want __builtin_expect(!!(X),1), but 
> for which it isn't truly "likely", and for cases where we want 
> __builtin_expect((X), 0), but for which it isn't truly "unlikely").  These would 
> be parallel to likely()/unlikely() and have the same implementations, but 
> different titles, to better document the intent of the code where they're used.  
> Names maybe slow_branch_path() and fast_branch_path()? 
> slow_branch()/fast_branch()?
> 4) I'm very sorry if this winds up ill-formatted.  I have a yahoo webmail 
> client.  Open to suggestions for different free email providers on this front.

Let's look at a very short path that appears all over the place:

	if (unlikely(mypointer == NULL))
		return;

It fits your definition of a fast path, because all it does is end the
function, whereas if we were to continue, the path could be much longer.
But if this function is called 1000 times a second, we want all branches
to be as little of a hindrance as possible.

-- Steve

(*) the likely() and unlikely()'s do not really do anything with branch
prediction on most archs. Some archs allow a flag to be added to a branch
instruction in assembly to indicate which way you expect it to go; Intel
does not. But the annotation does affect the way gcc orders code, which in
turn affects instruction cache misses. And there are default static
predictions we want to play to: a compare-and-branch defaults to not
taking the branch.

	cmp x
	be  1:
	[...]
1:	

The CPU will be pulling in instructions from memory and placing them
into the instruction cache. Branch prediction helps when the CPU sees
the branch: if there's already a prediction made, it can pull in the
instructions at the destination of that branch, if it has not done so
already.

Lets say the cache size is 128 bytes, and that branch was pulled in at
the beginning of the cache line. We then have 124 bytes of instructions
after it even if that branch is expected to be taken.

If we have the following code:

	if (x)
		r = 1;
	a = 2;

If we consider likely(x) then we will have this:

	cmp x
	be  1: ;; if (x == 0) branch
	ld r, 1
1:	ld a, 2

But if x is false 90% of the time, we don't want to even bother jumping
around the store; for the 10% of the time it is true, it's OK to be slower.

	cmp x
	bne 2: ;; if (x != 0) branch
1:	ld a, 2
[...]

2:	ld r, 1
	jmp 1b

We just removed a line of code from our cache line that we don't want in
there in the first place. A lot of error handling is like this: we test
for errors all the time, but we do not expect the error path to ever be
taken, so we want it out of our cache lines as much as possible.
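
A minimal, hypothetical example of that idiom (the helper name is invented for illustration; likely()/unlikely() defined as in the kernel):

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper showing the idiom above: the NULL check is
 * annotated unlikely(), so gcc lays the early return out of line
 * and the hot path falls straight through, keeping the cache line
 * free of code we never expect to execute. */
int double_value(const int *p)
{
	if (unlikely(p == NULL))
		return -1;	/* error path, placed out of line */
	return *p * 2;		/* hot path, falls through */
}
```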



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: likely() vs. unlikely()
  2010-12-15 14:30 ` Steven Rostedt
@ 2010-12-17  6:07   ` Daniel Kopko
  2010-12-17 14:12     ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Kopko @ 2010-12-17  6:07 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel


Thanks for the thorough response.  I think maybe I didn't make my point well 
enough the first time, so please see below.



----- Original Message ----
> From: Steven Rostedt <srostedt@redhat.com>
> To: Daniel Kopko <dk_fedorabugs@yahoo.com>
> Cc: linux-kernel@vger.kernel.org
> Sent: Wed, December 15, 2010 9:30:18 AM
> Subject: Re: likely() vs. unlikely()
> 
> On Tue, 2010-12-14 at 22:42 -0800, Daniel Kopko wrote:
> > Hello, Mr. Rostedt, LKML,
> > 
> > I've noticed the patch series by Steven Rostedt.  I am a bit of a lurker here, 
> > but I noticed something that I could perhaps contribute to.  Mr. Rostedt has 
> > done some great work deducing exactly whether or not these clauses meet their 
> > stated presumptions of "likeliness".  However, I think there may be some cases 
> > where accurately codifying branch biases based on literal likeliness might 
> > produce worse performance overall.  An example:
> > 
> > if(X)
> >     some_expensive_code();   //takes 500 ms
> 
> Nothing in the kernel proper should ever take 500ms.

Agreed.  I was just trying to make the example extreme enough to try and 
illustrate the point.

> 
> > else
> >     some_very_cheap_code();  //takes 100 us
> > 
> > Now, let's say X is true 90% of the time.  The literal encoding of that would be 
> > "if(likely(X))".  However, it may make much more sense to encode it *wrongly* 
> > for the sake of cheapening the already cheap code, as the delay of the branch 
> > misprediction may be readily "absorbed" into the more expensive code.  In which 
> > case, even with X being likely, we may want to encode it as "if(unlikely(X))".  
> > (Also, to avoid obscuring things, please keep in mind that the bodies of the two 
> > halves of the branch above need not actually be function calls.)
> 
> Doesn't matter if they are function calls or not.

Right, that's what I was trying to say.

> 
> > 
> > I think that this type of thing may be most noticeable around any branches where 
> > there is a fastpath that may be run if ideal conditions are met, but which are 
> > met less than 50% of the time.
> 
> Then that's not a fastpath. A definition of a fastpath has nothing to do
> with the amount of time that path takes. We call something a fastpath
> when it is hit 90% of the time and hit often. We want that to be as fast
> as possible, even if it takes 500ms compared to the 10% of 100us. If you
> look at the big picture (the entire running system) adding a missed
> branch prediction(*) to 90% of a single branch is going to be larger
> than having it hit the branch that is only 10% taken.

OK, perhaps I'm misusing "fastpath" here.  I just mean the path where latency is 
more of a concern.  Also, I agree with your last point here, but I have a couple 
of more concrete counter-examples below for which I don't think the last 
statement holds true.

> 
> Also note, I honestly believe that most of the branch annotations should
> be removed unless they are correct 90% of the time. But I do not remove
> them blindly, so it takes a bit of work for each and every change.
> 
> >   In such cases, the likely()/unlikely() may be 
> > used "wrongly" to cause the branch misprediction to occur in the 
> > already-high-latency (some_expensive_function()) case, and lower latencies in 
> > the already-low-latency (some_very_cheap_function()) case.  This would lead to 
> > lower attainable latencies overall (by at least the cost of a branch miss which 
> > would otherwise have been spent in the fast code), and further encourage coding 
> > to meet the ideal conditions of the fastpath.
> 
> Which is not what we call a fast path.

Please let that be the "path-of-greatest-latency-concern", then.

> 
> > 
> > So, several points:
> > 1)  Please let me know if any of the above is outright wrong.
> 
> Already stated ;-)
> 
> > 2)  I don't know if any such cases occur in the likely()/unlikely() patch 
> > series.  A place where it obviously DOESN'T occur would be:
> > http://marc.info/?l=linux-kernel&m=129229014528892&w=2
> > A place where I thought it MAY occur:
> > http://marc.info/?l=linux-kernel&m=129228728125413&w=2
> > 3)  If there is overall agreement on the above, then I would also suggest that 
> > perhaps some additional macro names would be appropriate for the 
> > __builtin_expect() use (for cases where we want __builtin_expect(!!(X),1), but 
> > for which it isn't truly "likely", and for cases where we want 
> > __builtin_expect((X), 0), but for which it isn't truly "unlikely").  These would 
> > be parallel to likely()/unlikely() and have the same implementations, but 
> > different titles, to better document the intent of the code where they're used.  
> > Names maybe slow_branch_path() and fast_branch_path()? 
> > slow_branch()/fast_branch()?
> > 4) I'm very sorry if this winds up ill-formatted.  I have a yahoo webmail 
> > client.  Open to suggestions for different free email providers on this front.
> 
> Let's look at a very short path that is done all over the place:
> 
> 	if (unlikely(mypointer == NULL))
> 		return;
> 
> This is done all over the place. And it fits your definition of a fast
> path, because all it does is end the function, where if we were to
> continue, the path could be much longer. But if this function is called
> 1000 times a second, we want all branches to be as little of a hindrance
> as possible.

Yes, agreed, this isn't a case where I'd suggest an inversion at all.

> 
> --  Steve
> 
<clipped explanation of gcc's handling of likely()/unlikely()>
(This was informative, thank you.)


OK, so here are two examples that perhaps better illustrate my point:

1)  A spinlock.  This one is not one from the kernel, just one implemented in 
userspace:

#define CACHE_PAUSE() __asm__ __volatile__ ("pause" : :)
#define SPIN_ACQUIRE_LOCK(x) \
	do { while (unlikely(__sync_lock_test_and_set(&(x), 1))) { CACHE_PAUSE(); } } while (0)

Let's assume, for sake of argument, that this lock is known to be highly 
contended.  (And let's ignore any better alternatives to spinlocks for this 
case, this is just illustrative.)  Due to the high contention, the 
__sync_lock_test_and_set() call will in the vast majority of cases return 1, 
indicating the lock was already taken.  However, even though the result is 
*likely 1*, we do not want to use likely() here, but instead unlikely().  The 
reason is that the penalty of the incorrect unlikely() usage occurs on a 
path where we have nothing better to do anyway; we're just going to pause 
for a moment waiting for the lock to become free.  However, in the smaller 
fraction of time in which the lock is available, we want to suffer as little 
penalty as possible.  It seems that it is correct here to invert/abuse 
unlikely() to cause the penalty to be paid in say 90% of cases (which are just 
going to wait anyway), in order to not pay the penalty at all in the 10% of 
cases where work can actually progress.
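
For completeness, here is a compilable sketch of the lock with an assumed release side.  SPIN_RELEASE_LOCK is an invented counterpart using the matching gcc builtin, and CACHE_PAUSE is reduced here to a plain compiler barrier so it builds on any arch, where the original used the x86 `pause` instruction:

```c
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Portable stand-in: the original CACHE_PAUSE() used x86 "pause". */
#define CACHE_PAUSE() __asm__ __volatile__ ("" ::: "memory")

/* Spin until the test-and-set returns 0 (lock was free).  The
 * deliberate unlikely() biases the branch toward the uncontended
 * acquire, per the argument above. */
#define SPIN_ACQUIRE_LOCK(x) \
	do { while (unlikely(__sync_lock_test_and_set(&(x), 1))) CACHE_PAUSE(); } while (0)

/* Assumed counterpart: releases the lock with the matching builtin. */
#define SPIN_RELEASE_LOCK(x) __sync_lock_release(&(x))

volatile int demo_lock;	/* 0 = free, 1 = held */
```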

2) Suppose there is a message processing system which has two classifications of 
messages:  LOWEST_LATENCY, NORMAL.  When a message is submitted into this 
system, it can be submitted with a flag/enum which indicates which type of 
message it is.  The logic then goes something like this:

if (message_type == LOWEST_LATENCY) {
	/* retry immediately; errno is only meaningful after a failure */
	do { ret = sendmsg(msg); } while (ret < 0 && errno == EAGAIN);
} else {
	enqueue_message_for_subsequent_send(msg);
}

Setting all messages to LOWEST_LATENCY would actually probably degrade 
performance, but let's say roughly 5-10% are set with this flag, and the rest of 
the messages are NORMAL.  I would argue that even though it is actually 
*unlikely* for message_type to be LOWEST_LATENCY, it would be appropriate to use 
likely(message_type == LOWEST_LATENCY) here.  The reason is that we want to 
optimize latency for the messages requesting LOWEST_LATENCY, and permit the 
latency degradation which is thereby caused to messages which are indifferent to 
latency (NORMAL transmissions).  It doesn't even matter whether or not our 
average latency across all messages has increased, what matters is that for the 
messages marked as sensitive to latency, average latency has decreased.

Now, I am aware that as latency increases elsewhere in the system (the NORMAL 
handling), that this may bog down the processing loop and ultimately impact the 
LOWEST_LATENCY handling as well.  However, this is *not necessarily* the case.  
Nothing says that the processing loop must be handling so many messages that the 
likely()/unlikely() inversion costs so much that the loop must slow down.  In 
fact, the processing loop may have enough intervals where it has nothing to do 
that it can readily absorb the additional latency of the NORMAL handling in the 
time it would have otherwise spent waiting.  And indeed, for some applications, 
this is the case.


These are the only two cases which I've personally come across.  In these cases, 
I tested performance with and without the inversions, and was satisfied by the 
improvements that the inversions gave me.  I hadn't ever heard anyone discuss 
something like this, so I figured I should air it on LKML.  If there's yet a 
flaw in my reasoning, I'm sure you (or some other expert) will let me know.  :)  
Even if there isn't, I do not know enough to say if any situations like the 
above even occur in the kernel code to where an inversion would be appropriate.


Thanks for your time,

Daniel Kopko



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: likely() vs. unlikely()
  2010-12-17  6:07   ` Daniel Kopko
@ 2010-12-17 14:12     ` Steven Rostedt
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2010-12-17 14:12 UTC (permalink / raw)
  To: Daniel Kopko; +Cc: linux-kernel

On Thu, 2010-12-16 at 22:07 -0800, Daniel Kopko wrote:

> OK, perhaps I'm misusing "fastpath" here.  I just mean the path where latency is 
> more of a concern.  Also, I agree with your last point here, but I have a couple 
> of more concrete counter-examples below for which I don't think the last 
> statement holds true.

Actually, we have used unlikely() for latency concerns and not just
"what is more likely", but that is usually discouraged; leaving the
branch unannotated is recommended instead.


> 
> OK, so here are two examples that perhaps better illustrate my point:
> 
> 1)  A spinlock.  This one is not one from the kernel, just one implemented in 
> userspace:

I'll ignore the fact that userspace spin locks are an extremely bad idea.
They are prone to live locks when used on RT systems. This is why the
futex (fast userspace mutex) was made. It is just as fast as userspace
implemented spin locks, but it does not have the live lock problems.

> 
> #define CACHE_PAUSE() __asm__ __volatile__ ("pause" : :)
> #define SPIN_ACQUIRE_LOCK(x) \
> 	do { while (unlikely(__sync_lock_test_and_set(&(x), 1))) { CACHE_PAUSE(); } } while (0)
> 
> Let's assume, for sake of argument, that this lock is known to be highly 
> contended.  (And let's ignore any better alternatives to spinlocks for this 
> case, this is just illustrative.)  Due to the high contention, the 
> __sync_lock_test_and_set() call will in the vast majority of cases return 1, 
> indicating the lock was already taken.  However, even though the result is 
> *likely 1*, we do not want to use likely() here, but instead unlikely().  The 
> reason being is that the penalty of the incorrect unlikely() usage occurs on a 
> path for which we have nothing better to do anyway, we're just going to pause 
> for a moment waiting for the lock to become free.  However, in the smaller 
> fraction of time in which the lock is available, we want to suffer as little 
> penalty as possible.  It seems that it is correct here to invert/abuse 
> unlikely() to cause the penalty to be paid in say 90% of cases (which are just 
> going to wait anyway), in order to not pay the penalty at all in the 10% of 
> cases where work can actually progress.

Actually, the best thing to do in this case is to simply not use
likely() or unlikely() at all. But I do understand the point.


> 
> 2) Suppose there is message processing system which has two classifications of 
> messages:  LOWEST_LATENCY, NORMAL.  When a message is submitted into this 
> system, it can be submitted with a flag/enum which indicates which type of 
> message it is.  The logic then goes something like this:
> 
> if (message_type == LOWEST_LATENCY) {
> 	/* retry immediately; errno is only meaningful after a failure */
> 	do { ret = sendmsg(msg); } while (ret < 0 && errno == EAGAIN);
> } else {
> 	enqueue_message_for_subsequent_send(msg);
> }
> 
> Setting all messages to LOWEST_LATENCY would actually probably degrade 
> performance, but let's say roughly 5-10% are set with this flag, and the rest of 
> the messages are NORMAL.  I would argue that even though it is actually 
> *unlikely* for message_type to be LOWEST_LATENCY, it would be appropriate to use 
> likely(message_type == LOWEST_LATENCY) here.  The reason being that we want to 
> optimize for latency for things requesting LOWEST_LATENCY, and permit the 
> latency degradation which is thereby caused to messages which are indifferent to 
> latency (NORMAL transmissions).  It doesn't even matter whether or not our 
> average latency across all messages has increased, what matters is that for the 
> messages marked as sensitive to latency, average latency has decreased.
> 
> Now, I am aware that as latency increases elsewhere in the system (the NORMAL 
> handling), that this may bog down the processing loop and ultimately impact the 
> LOWEST_LATENCY handling as well.  However, this is *not necessarily* the case.  
> Nothing says that the processing loop must be handling so many messages that the 
> likely()/unlikely() inversion costs so much that the loop must slow down.  In 
> fact, the processing loop may have enough intervals where it has nothing to do 
> that it can readily absorb the additional latency of the NORMAL handling in the 
> time it would have otherwise spent waiting.  And indeed, for some applications, 
> this is the case.

We usually don't have this worry in the kernel, since the kernel itself
tries to be fair. The cases where we do use likely() or unlikely() for
latency reasons require a comment more than a new interface. This sounds
like a case by case instance, not something that would be documented by an
"inverse_unlikely()" notation. Simply using the normal likely() and
unlikely() and commenting above the usage is good enough.
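
Following that suggestion, the earlier message-routing example could carry nothing more than a comment.  This is a sketch with invented names; the stand-in functions just mark which path was taken:

```c
#define likely(x) __builtin_expect(!!(x), 1)

enum msg_type { NORMAL, LOWEST_LATENCY };

int send_now(void) { return 1; }	/* stand-in: immediate send */
int enqueue(void)  { return 0; }	/* stand-in: deferred send  */

int route(enum msg_type type)
{
	/*
	 * Deliberately inverted: LOWEST_LATENCY messages are the
	 * minority (~5-10%), but they are the latency-sensitive
	 * side, so the branch is biased in their favor.
	 */
	if (likely(type == LOWEST_LATENCY))
		return send_now();
	return enqueue();
}
```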

> 
> 
> These are the only two cases which I've personally come across.  In these cases, 
> I tested performance with and without the inversions, and was satisfied by the 
> improvements that the inversions gave me.  I hadn't ever heard anyone discuss 
> something like this, so I figured I should air it on LKML.  If there's yet a 
> flaw in my reasoning, I'm sure you (or some other expert) will let me know.  :)  
> Even if there isn't, I do not know enough to say if any situations like the 
> above even occur in the kernel code to where an inversion would be appropriate.

Again, these sound like case by case exceptions to the rule, for which a
good comment is much better than any new interface.

-- Steve



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-12-17 14:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
2010-12-15  6:42 likely() vs. unlikely() Daniel Kopko
2010-12-15 14:30 ` Steven Rostedt
2010-12-17  6:07   ` Daniel Kopko
2010-12-17 14:12     ` Steven Rostedt
