* [RFC] act_cpu: redirect skb receiving to a special CPU.
From: Changli Gao @ 2010-06-05 10:56 UTC
  To: Eric Dumazet, David S. Miller, Tom Herbert, Jamal Hadi Salim
  Cc: Linux Netdev List

I am going to implement a CPU action, which can be used with the ingress
qdisc to redirect skb receiving to a specific CPU. It is much like RPS,
but more flexible:

* choose the hash function with the help of cls_flow.c
* pin specific traffic to a dedicated CPU
* weighted packet distribution

act_cpu will use the enqueue_to_backlog() function supplied by RPS to
redirect skb receiving, and will take two kinds of parameters:

* cpu CPUID - the ID of the CPU which handles this traffic
* map OFFSET - map the mirror class ID to a CPU ID: CPUID = mirror class ID + OFFSET

sch_ingress will be enhanced to support a class tree.
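
As a rough illustration, the action's execute path might look like the
sketch below. Everything in it is an assumption: act_cpu and struct
tcf_cpu are hypothetical, and enqueue_to_backlog() is currently static
in net/core/dev.c, so this presumes it would be exported with its RPS
signature (skb, cpu, qtail):

/* hypothetical action private data - not existing kernel API */
struct tcf_cpu {
    struct tcf_common common;
    int               cpu;    /* fixed target CPU, or -1 for map mode */
    int               offset; /* classid-to-CPU offset for map mode */
};

static int tcf_cpu_act(struct sk_buff *skb, struct tc_action *a,
                       struct tcf_result *res)
{
    struct tcf_cpu *p = a->priv;
    unsigned int qtail;
    int cpu;

    if (p->cpu >= 0)
        cpu = p->cpu;                             /* "cpu CPUID" mode */
    else
        cpu = TC_H_MIN(res->classid) + p->offset; /* "map OFFSET" mode */

    if (!cpu_online(cpu))
        return TC_ACT_SHOT;

    /* hand the skb to the remote CPU's backlog, as RPS does */
    enqueue_to_backlog(skb, cpu, &qtail);
    return TC_ACT_STOLEN;
}

TC_ACT_STOLEN tells the caller the skb has been consumed, the same way
a redirecting action like act_mirred reports it.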

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: jamal @ 2010-06-05 13:07 UTC
  To: Changli Gao; +Cc: Eric Dumazet, David S. Miller, Tom Herbert, Linux Netdev List

Changli,

I like the idea.

My preference would be to not change the ingress qdisc to have queues.
The cpuid should be sufficient to map to a remote cpu queue, no?
Now, if you could represent each cpu as a netdevice, then we wouldn't
need any change;-> And we could have multiple ways to redirect
to cpus instead of just doing IPIs - for example, I've always thought of
sending over something like HT (I think it would be a lot cheaper).

I didn't quite understand the map OFFSET part. Is this part of RFS?

cheers,
jamal

On Sat, 2010-06-05 at 18:56 +0800, Changli Gao wrote:
> I am going to implement a CPU action, which can be used with the ingress
> qdisc to redirect skb receiving to a specific CPU. It is much like RPS,
> but more flexible:
> 
> * choose the hash function with the help of cls_flow.c
> * pin specific traffic to a dedicated CPU
> * weighted packet distribution
> 
> act_cpu will use the enqueue_to_backlog() function supplied by RPS to
> redirect skb receiving, and will take two kinds of parameters:
> 
> * cpu CPUID - the ID of the CPU which handles this traffic
> * map OFFSET - map the mirror class ID to a CPU ID: CPUID = mirror class ID + OFFSET
> 
> sch_ingress will be enhanced to support a class tree.
> 



* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: Changli Gao @ 2010-06-05 13:26 UTC
  To: hadi; +Cc: Eric Dumazet, David S. Miller, Tom Herbert, Linux Netdev List

On Sat, Jun 5, 2010 at 9:07 PM, jamal <hadi@cyberus.ca> wrote:
> Changli,
>
> I like the idea.
>
> My preference would be to not change the ingress qdisc to have queues.

ingress won't have any inner qdiscs, just a class tree. The ingress_queue
path will be something like this:

while (1) {
    result = tc_classify(..., &res);     /* run the filters at this level */
    cl = ingress_find(res.classid, ...); /* look the class up in the tree */
    if (!cl->level)
        break;                           /* reached a leaf class: done */
    ...                                  /* descend to the subclass */
}

Then we can classify skbs in a tree manner.
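
A slightly fuller sketch of that loop, with the caveat that
ingress_find(), struct ingress_class and the per-class filter lists are
all hypothetical pieces of the proposed class-tree support (only
tc_classify() and the ingress private data exist today):

static int ingress_classify(struct sk_buff *skb, struct Qdisc *sch)
{
    struct ingress_qdisc_data *p = qdisc_priv(sch);
    struct tcf_proto *tp = p->filter_list;
    struct ingress_class *cl;
    struct tcf_result res;
    int result;

    for (;;) {
        /* run the filters attached at the current tree level */
        result = tc_classify(skb, tp, &res);
        if (result < 0)
            return TC_ACT_UNSPEC;   /* no match anywhere */

        cl = ingress_find(res.classid, sch);
        if (cl == NULL || !cl->level)
            break;                  /* leaf class reached, stop */

        /* inner class: descend to the subclass's own filters */
        tp = cl->filter_list;
    }
    return result;
}

The point is that each level's filter list only needs to distinguish
among its own children, which is what makes the lookup cheaper than one
flat rule list.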

> The cpuid should be sufficient to map to a remote cpu queue, no?

It should be sufficient, but it isn't efficient. With the map option, we
can use cls_flow to map traffic to a classid, and then use the act_cpu
map to map the classid to a cpuid.

> Now, if you could represent each cpu as a netdevice, then we wouldn't
> need any change;-> And we could have multiple ways to redirect
> to cpus instead of just doing IPIs - for example, I've always thought of
> sending over something like HT (I think it would be a lot cheaper).

I won't implement a new netdevice, but will reuse softnet. I'll even
reuse the enqueue_to_backlog() introduced by RPS and, of course, use
IPIs as RPS does. Is there another way to trigger an IRQ on a remote CPU?

>
> I didn't quite understand the map OFFSET part. Is this part of RFS?
>

No. As class IDs start from 1, but CPU IDs start from 0, I need to
subtract or add an offset from/to the class IDs to map them to CPU
IDs.
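
In other words, the mapping is a single addition. A hypothetical helper
(only TC_H_MIN() is real; it extracts the minor, i.e. class, part of a
classid):

/* e.g. offset = -1 maps classes :1, :2, :3 onto CPUs 0, 1, 2 */
static inline int classid_to_cpu(u32 classid, int offset)
{
    return (int)TC_H_MIN(classid) + offset;
}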

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: jamal @ 2010-06-05 13:54 UTC
  To: Changli Gao; +Cc: Eric Dumazet, David S. Miller, Tom Herbert, Linux Netdev List

On Sat, 2010-06-05 at 21:26 +0800, Changli Gao wrote:

> ingress won't have any inner qdiscs, just a class tree. The ingress_queue
> path will be something like this:
[..]

> Then we can classify skbs in a tree manner.
[..]
> > The cpuid should be sufficient to map to a remote cpu queue, no?
> 
> It should be sufficient, but it isn't efficient. With the map option, we
> can use cls_flow to map traffic to a classid, and then use the act_cpu
> map to map the classid to a cpuid.

I am missing something; I would see the flow as:
-->ethx/lo/etc->ingress qdisc->classify-->action(redirect to cpuidX)
Why/when do you need the tree variant? If you are thinking of maybe
rate limiting to a specific CPU, then would passing it to a policer
first not be sufficient? IOW, the classid is not very useful.

> I won't implement a new netdevice, but will reuse softnet. I'll even
> reuse the enqueue_to_backlog() introduced by RPS and, of course, use
> IPIs as RPS does. Is there another way to trigger an IRQ on a remote CPU?

I would look at it as "messaging a remote CPU", which may not result
in an IRQ. I am pretty sure that if you tried hard you could use HT on
AMD hardware - the remote CPU may still have an IRQ triggered, but it
won't be as expensive as an IPI.

cheers,
jamal



* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: Changli Gao @ 2010-06-05 14:15 UTC
  To: hadi; +Cc: Eric Dumazet, David S. Miller, Tom Herbert, Linux Netdev List

On Sat, Jun 5, 2010 at 9:54 PM, jamal <hadi@cyberus.ca> wrote:
> On Sat, 2010-06-05 at 21:26 +0800, Changli Gao wrote:
>
>> ingress won't have any inner qdiscs, just a class tree. The ingress_queue
>> path will be something like this:
> [..]
>
>> Then we can classify skbs in a tree manner.
> [..]
>> > The cpuid should be sufficient to map to a remote cpu queue, no?
>>
>> It should be sufficient, but it isn't efficient. With the map option, we
>> can use cls_flow to map traffic to a classid, and then use the act_cpu
>> map to map the classid to a cpuid.
>
> I am missing something; I would see the flow as:
> -->ethx/lo/etc->ingress qdisc->classify-->action(redirect to cpuidX)
> Why/when do you need the tree variant? If you are thinking of maybe
> rate limiting to a specific CPU, then would passing it to a policer
> first not be sufficient? IOW, the classid is not very useful.

For instance: there are 4 CPUs, and I want to redirect traffic to CPUs 1-3
evenly. If the qdisc is linear, the rules would be:

flow classify (flow classid ffff:2-4) | tc_index 2 action cpu 1 |
tc_index 3 action cpu 2 | tc_index 4 action cpu 3

A tree variant:

class ffff:1 : flow classify (flow classid ffff:2-4)
class ffff:2 parent ffff:1 : action cpu 1
class ffff:3 parent ffff:1 : action cpu 2
class ffff:4 parent ffff:1 : action cpu 3

ingress_classify: use flow classification to get the subclass ID, then find
the corresponding class and execute its action.

When there are lots of CPUs, the tree is more efficient.
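
To make the spread concrete: if I read cls_flow correctly, with a
baseclass of ffff:2 and a divisor of 3 it picks classid = baseclass +
(hash % divisor), so a standalone demo of the whole chain (the act_cpu
map step being the hypothetical part) would be:

#include <stdio.h>

int main(void)
{
    const int baseclass = 2, divisor = 3, offset = -1;
    unsigned int hash;

    for (hash = 0; hash < 6; hash++) {
        int classid = baseclass + (int)(hash % divisor); /* cls_flow */
        int cpu = classid + offset; /* act_cpu map: CPUID = classid + OFFSET */
        printf("hash %u -> classid ffff:%d -> cpu %d\n",
               hash, classid, cpu);
    }
    return 0;
}

Flows hash evenly over ffff:2-ffff:4 and land on CPUs 1-3.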

>
>> I won't implement a new netdevice, but will reuse softnet. I'll even
>> reuse the enqueue_to_backlog() introduced by RPS and, of course, use
>> IPIs as RPS does. Is there another way to trigger an IRQ on a remote CPU?
>
> I would look at it as "messaging a remote CPU", which may not result
> in an IRQ. I am pretty sure that if you tried hard you could use HT on
> AMD hardware - the remote CPU may still have an IRQ triggered, but it
> won't be as expensive as an IPI.
>

It seems AMD specific. Why don't the AMD guys use it to implement an
async smp_call_function(), if it is as useful as you say?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: jamal @ 2010-06-05 14:26 UTC
  To: Changli Gao; +Cc: Eric Dumazet, David S. Miller, Tom Herbert, Linux Netdev List

On Sat, 2010-06-05 at 22:15 +0800, Changli Gao wrote:

> For instance: there are 4 CPUs, and I want to redirect traffic to CPUs 1-3
> evenly. If the qdisc is linear, the rules would be:
> 
> flow classify (flow classid ffff:2-4) | tc_index 2 action cpu 1 |
> tc_index 3 action cpu 2 | tc_index 4 action cpu 3
> 
> A tree variant:
> 
> class ffff:1 : flow classify (flow classid ffff:2-4)
> class ffff:2 parent ffff:1 : action cpu 1
> class ffff:3 parent ffff:1 : action cpu 2
> class ffff:4 parent ffff:1 : action cpu 3
> 
> ingress_classify: use flow classification to get the subclass ID, then find
> the corresponding class and execute its action.
> 
> When there are lots of CPUs, the tree is more efficient.

I still didn't follow...
Even if I had a million CPUs, a classifier matches some filter
and an action already bound to that filter is executed. So the expensive
part is the classifier lookup.

> It seems AMD specific. Why don't the AMD guys use it to implement an
> async smp_call_function(), if it is as useful as you say?

Indeed it is AMD specific - but my view is that if I were using AMD, that
would be a more efficient way of doing it; i.e., the IPI is the lowest
common denominator which works on all archs. Essentially, what I am saying
is that this would be an "inter-CPU messaging netdev", and I could swap
its send/recv parts from what we do in the RPS path right now to ones
that use AMD HyperTransport etc.

cheers,
jamal



* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: Eric Dumazet @ 2010-06-05 15:00 UTC
  To: hadi; +Cc: Changli Gao, David S. Miller, Tom Herbert, Linux Netdev List

On Saturday, 05 June 2010 at 10:26 -0400, jamal wrote:

> Indeed it is AMD specific - but my view is that if I were using AMD, that
> would be a more efficient way of doing it; i.e., the IPI is the lowest
> common denominator which works on all archs. Essentially, what I am saying
> is that this would be an "inter-CPU messaging netdev", and I could swap
> its send/recv parts from what we do in the RPS path right now to ones
> that use AMD HyperTransport etc.

You do realize this should be discussed on lkml, of course?

* Re: [RFC] act_cpu: redirect skb receiving to a special CPU.
From: Andi Kleen @ 2010-06-07  8:43 UTC
  To: hadi
  Cc: Changli Gao, Eric Dumazet, David S. Miller, Tom Herbert,
	Linux Netdev List

jamal <hadi@cyberus.ca> writes:
>
> I would look at it as "messaging a remote CPU", which may not result
> in an IRQ. I am pretty sure that if you tried hard you could use HT on
> AMD hardware - the remote CPU may still have an IRQ triggered, but it
> won't be as expensive as an IPI.

It's unlikely you'll find any way on x86 to do an IPI that is cheaper
than a standard IPI. That is, unless you dedicate the receiver to poll
or monitor.
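
For illustration, the polling variant could be as simple as a shared
mailbox that a dedicated receiver spins on - no IPI at all, at the cost
of burning a CPU. All names below are made up; this is plain C11, not
kernel code:

#include <stdatomic.h>
#include <stddef.h>

struct mailbox {
    _Atomic(void *) work;   /* NULL when empty */
};

/* producer: just a store; the poller will observe it without any IPI */
static int mailbox_post(struct mailbox *mb, void *item)
{
    void *expected = NULL;
    return atomic_compare_exchange_strong(&mb->work, &expected, item);
}

/* consumer: pinned to the target CPU, spinning while idle */
static void *mailbox_poll(struct mailbox *mb)
{
    void *item;

    while ((item = atomic_exchange(&mb->work, NULL)) == NULL)
        ;   /* a real version would cpu_relax() or use MONITOR/MWAIT */
    return item;
}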

On recent higher-end Intel CPUs, x2APIC IPIs will be somewhat cheaper
than classic APIC IPIs.

But for a normal "IPI user" like networking it all looks the same;
it's hidden by the architecture code.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


