Netdev List
* Re: [PATCH] mac8390: fix pr_info() calls and change return code
From: David Miller @ 2010-04-16 20:28 UTC (permalink / raw)
  To: fthain; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.OSX.2.00.1004162340370.271@localhost>

From: Finn Thain <fthain@telegraphics.com.au>
Date: Fri, 16 Apr 2010 23:57:34 +1000 (EST)

> 
> On Thu, 15 Apr 2010, Joe Perches wrote:
> 
>> ...Why is it better to use -EBUSY?
> 
> Nubus slots are geographically addressed and their irqs are equally 
> inflexible. -EAGAIN is misleading because retrying will not help fix 
> whatever bug caused the irq to be unavailable.

This is exactly the kind of background information and verbose
explanation that belongs in the commit message.

Yet in your recent version of the patch, you're still being extremely
terse about the reasoning for using -EBUSY.

Just saying it's "misleading" doesn't tell anyone anything if they
have to go back in the commit history and try to figure out why this
change was made if it's causing problems later.

Please put the verbose and complete explanation in your commit
message, and resubmit your patch.

I just want to point out that with all the trouble you gave about
Joe's work, you're having one heck of a time even submitting your
changes properly. :-)

Thanks.


* Re: [PATCH net-2.6] packet : remove init_net restriction
From: David Miller @ 2010-04-16 20:23 UTC (permalink / raw)
  To: daniel.lezcano; +Cc: netdev
In-Reply-To: <4BC87C7C.4060407@free.fr>

From: Daniel Lezcano <daniel.lezcano@free.fr>
Date: Fri, 16 Apr 2010 17:04:28 +0200

> Shall I send it against net-next-2.6 ?

No, I'll likely add it to net-2.6, I just haven't gotten around
to it yet.

Thanks.


* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev
In-Reply-To: <20100415.233334.242114544.davem@davemloft.net>

Le jeudi 15 avril 2010 à 23:33 -0700, David Miller a écrit :
> From: Tom Herbert <therbert@google.com>
> Date: Thu, 15 Apr 2010 22:47:08 -0700 (PDT)
> 
> > Version 5 of RFS:
> > - Moved rps_sock_flow_sysctl into net/core/sysctl_net_core.c as a
> > static function.
> > - Apply limits to rps_sock_flow_entries sysctl and rps_flow_cnt
> > sysfs variable.
> 
> I've read this over a few times and I think it's ready to go into
> net-next-2.6, we can tweak things as-needed from here on out.
> 
> Eric, what do you think?

I think I can give my Signed-off-by, and we have time to fully test it
and tweak it if necessary.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Tom !




* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 18:53 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <l2m65634d661004161135h1c1466afi54787022bfc2ce12@mail.gmail.com>

Le vendredi 16 avril 2010 à 11:35 -0700, Tom Herbert a écrit :
> Results with "tbench 16" on an 8 core Intel machine.
> 
> No RPS/RFS:     2155 MB/sec
> RPS (0ff mask): 1700 MB/sec
> RFS:            1097 MB/sec
> 
> I am not particularly surprised by the results: the loopback
> interface already provides good parallelism, and RPS/RFS really would
> only add overhead and more trips between CPUs (the last part is why
> RPS < RFS, I suspect). I guess this is why we've never enabled RPS on
> loopback :-)
> 
> Eric, do you have a particular concern that this could affect a real workload?
> 

I was expecting RFS to be better than RPS at least, for this particular
workload (tcp over loopback).

With RPS, the hash of (127.0.0.1, port1, 127.0.0.1, port2) is different
from the hash of (127.0.0.1, port2, 127.0.0.1, port1), so basically we
force the server to run on a different processor than the client.

However, I was expecting that with RFS, client and server would run on
the same cpu.

Maybe we could change (for a test) the hash function to use (sport ^ dport)
instead of (sport << 16) + dport.
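To illustrate the suggestion above, here is a minimal userspace sketch, not the kernel's actual flow hash. It only compares the two port-combining expressions mentioned: XOR gives the same value for both directions of a connection, while shift-and-add does not. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: direction-sensitive combination, (sport << 16) + dport. */
uint32_t ports_shift_add(uint16_t sport, uint16_t dport)
{
	return ((uint32_t)sport << 16) + dport;
}

/* Hypothetical helper: direction-insensitive combination, sport ^ dport. */
uint32_t ports_xor(uint16_t sport, uint16_t dport)
{
	return (uint32_t)(sport ^ dport);
}
```

With the XOR form, ports_xor(port1, port2) equals ports_xor(port2, port1), so over loopback (where both addresses are 127.0.0.1) the client-to-server and server-to-client packets would feed the same value into the hash.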





* Re: [PATCH v5] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-16 18:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1271401007.16881.3762.camel@edumazet-laptop>

Results with "tbench 16" on an 8 core Intel machine.

No RPS/RFS:     2155 MB/sec
RPS (0ff mask): 1700 MB/sec
RFS:            1097 MB/sec

I am not particularly surprised by the results: the loopback
interface already provides good parallelism, and RPS/RFS really would
only add overhead and more trips between CPUs (the last part is why
RPS < RFS, I suspect). I guess this is why we've never enabled RPS on
loopback :-)

Eric, do you have a particular concern that this could affect a real workload?

Tom


On Thu, Apr 15, 2010 at 11:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 15 avril 2010 à 23:33 -0700, David Miller a écrit :
>> From: Tom Herbert <therbert@google.com>
>> Date: Thu, 15 Apr 2010 22:47:08 -0700 (PDT)
>>
>> > Version 5 of RFS:
>> > - Moved rps_sock_flow_sysctl into net/core/sysctl_net_core.c as a
>> > static function.
>> > - Apply limits to rps_sock_flow_entries sysctl and rps_flow_cnt
>> > sysfs variable.
>>
>> I've read this over a few times and I think it's ready to go into
>> net-next-2.6, we can tweak things as-needed from here on out.
>>
>> Eric, what do you think?
>
> I read the patch and found no error.
>
> I booted a test machine and performed some tests.
>
> I am a bit worried about a tbench regression I am looking at right now.
>
> If RFS is disabled, tbench 16 -> 4408.63 MB/sec
>
>
> # grep . /sys/class/net/lo/queues/rx-0/*
> /sys/class/net/lo/queues/rx-0/rps_cpus:00000000
> /sys/class/net/lo/queues/rx-0/rps_flow_cnt:8192
> # cat /proc/sys/net/core/rps_sock_flow_entries
> 8192
>
>
> echo ffff >/sys/class/net/lo/queues/rx-0/rps_cpus
>
> tbench 16 -> 2336.32 MB/sec
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>   PerfTop:   14561 irqs/sec  kernel:86.3% [1000Hz cycles],  (all, 16 CPUs)
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>             samples  pcnt function                       DSO
>             _______ _____ ______________________________ __________________________________________________________
>
>             2664.00  5.1% copy_user_generic_string       /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             2323.00  4.4% acpi_os_read_port              /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1641.00  3.1% _raw_spin_lock_irqsave         /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1260.00  2.4% schedule                       /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1159.00  2.2% _raw_spin_lock                 /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>             1051.00  2.0% tcp_ack                        /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              991.00  1.9% tcp_sendmsg                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              922.00  1.8% tcp_recvmsg                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              821.00  1.6% child_run                      /usr/bin/tbench
>              766.00  1.5% all_string_sub                 /usr/bin/tbench
>              630.00  1.2% __switch_to                    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              608.00  1.2% __GI_strchr                    /lib/tls/libc-2.3.4.so
>              606.00  1.2% ipt_do_table                   /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              600.00  1.1% __GI_strstr                    /lib/tls/libc-2.3.4.so
>              556.00  1.1% __netif_receive_skb            /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              504.00  1.0% tcp_transmit_skb               /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              502.00  1.0% tick_nohz_stop_sched_tick      /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              481.00  0.9% _raw_spin_unlock_irqrestore    /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              473.00  0.9% next_token                     /usr/bin/tbench
>              449.00  0.9% ip_rcv                         /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              423.00  0.8% call_function_single_interrupt /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              422.00  0.8% ia32_sysenter_target           /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              420.00  0.8% compat_sys_socketcall          /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              401.00  0.8% mod_timer                      /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              400.00  0.8% process_backlog                /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              399.00  0.8% ip_queue_xmit                  /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              387.00  0.7% select_task_rq_fair            /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              377.00  0.7% _raw_spin_lock_bh              /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>              360.00  0.7% tcp_v4_rcv                     /lib/modules/2.6.34-rc3-03375-ga4fbf84-dirty/build/vmlinux
>
> But if RFS is on, why does activating rps_cpus change the tbench result?
>
>
>
>


* Re: [PATCH v4] rfs: Receive Flow Steering
From: Rick Jones @ 2010-04-16 18:32 UTC (permalink / raw)
  To: Paul Turner
  Cc: Tom Herbert, Stephen Hemminger, davem, netdev, eric.dumazet,
	Ingo Molnar
In-Reply-To: <i2oed628a921004161059z65a3cf1aq5f3cd2194f40a811@mail.gmail.com>

> Even under a hybrid model I think phrasing it as networking leading
> the scheduler here is a little strong.  The scheduler is in both cases
> the most 'informed' place to make these decisions, but I think it
> could benefit from more knowledge.  In the 'virgin' single flow case
> without any steering the network stack is currently able to implicitly
> hint to the scheduler where flows could be most efficiently served due
> to wake-affine balancing behaviors.  This is a natural side-effect of
> wake-ups being sourced by the networking cpus.

Hinting to the scheduler is fine - so long as the final say is the scheduler. 
Presumably it is the thing that knows about the other forces tugging at where to 
run the thread - where its memory is allocated, what other flows are coming to 
it etc.

rick jones


* Re: [PATCH v5] rfs: Receive Flow Steering
From: Eric Dumazet @ 2010-04-16 18:15 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <w2t65634d661004160835z4a604ee7pb5f9d395fe61b5db@mail.gmail.com>

Le vendredi 16 avril 2010 à 08:35 -0700, Tom Herbert a écrit :
> Eric, thanks for testing that.  Admittedly, we have looked at enabling
> RFS/RPS over loopback.   I'll look at that today also.
> 
> 

Hi Tom

I am sorry, but I could not work on this today. I hope I can find some
time a bit later.




* Re: [PATCH v4] rfs: Receive Flow Steering
From: Paul Turner @ 2010-04-16 17:59 UTC (permalink / raw)
  To: Rick Jones
  Cc: Tom Herbert, Stephen Hemminger, davem, netdev, eric.dumazet,
	Ingo Molnar
In-Reply-To: <4BC89F6D.2080604@hp.com>

On Fri, Apr 16, 2010 at 10:33 AM, Rick Jones <rick.jones2@hp.com> wrote:
>>
>> This is true.  There is a fundamental question of whether scheduler
>> should lead networking or vice versa.  The advantages of networking
>> following scheduler seem to become more apparent on heavily loaded
>> systems or with threads that handle more than one flow.
>
> I will confess to being in the networking should follow the scheduler camp
> :)
>
>> I'm not sure these two models have to be mutually exclusive, we are
>> looking at some ways to make a hybrid model.
>
> It is perhaps too speculative on my part, but if the host has no control
> over the remote addressing of the connections to/from it, doesn't that
> suggest that allowing networking to lead the scheduler gives "external
> forces" more say in intra-system resource consumption than we might want
> them to have?
>
> rick jones
>

Even under a hybrid model I think phrasing it as networking leading
the scheduler here is a little strong.  The scheduler is in both cases
the most 'informed' place to make these decisions, but I think it
could benefit from more knowledge.  In the 'virgin' single flow case
without any steering the network stack is currently able to implicitly
hint to the scheduler where flows could be most efficiently served due
to wake-affine balancing behaviors.  This is a natural side-effect of
wake-ups being sourced by the networking cpus.

I think the win here would be allowing this (naturally existing)
hinting to be a little more explicit so that the scheduler and
load-balancer are able to gracefully 'collapse' back down onto the
network cpu socket under low stress conditions, even if previous
processing was balanced away from it due to load.

This would actually then look very much like today's model under loads
where you don't need scaling via parallelism.  One way to think about
making it an explicit hint could be: should the rx cpu sourcing the
wake-up in this case be the target for wake-affine as opposed to the
current bottom-half delegate?

- Paul


* Re: [PATCH v4] rfs: Receive Flow Steering
From: Rick Jones @ 2010-04-16 17:33 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Stephen Hemminger, davem, netdev, eric.dumazet, Ingo Molnar,
	Paul Turner
In-Reply-To: <o2k65634d661004160851wc00c609p7136a22fd07503c1@mail.gmail.com>

> 
> This is true.  There is a fundamental question of whether scheduler
> should lead networking or vice versa.  The advantages of networking
> following scheduler seem to become more apparent on heavily loaded
> systems or with threads that handle more than one flow.

I will confess to being in the networking should follow the scheduler camp :)

> I'm not sure these two models have to be mutually exclusive, we are
> looking at some ways to make a hybrid model.

It is perhaps too speculative on my part, but if the host has no control over 
the remote addressing of the connections to/from it, doesn't that suggest that 
allowing networking to lead the scheduler gives "external forces" more say in 
intra-system resource consumption than we might want them to have?

rick jones


* Re: rps perfomance WAS(Re: rps: question
From: Tom Herbert @ 2010-04-16 15:57 UTC (permalink / raw)
  To: hadi; +Cc: Eric Dumazet, netdev, robert, David Miller, Changli Gao,
	Andi Kleen
In-Reply-To: <1271271222.4567.51.camel@bigi>

> It would be valuable to have something like Documentation/networking/rps
> to detail things a little more.
>

Working on it.  Will try to post data for several platforms soon.

> cheers,
> jamal
>
>


* Re: [PATCH v4] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-16 15:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, eric.dumazet, Ingo Molnar, Paul Turner
In-Reply-To: <20100412171205.561a1aec@nehalam>

> There are two sometimes conflicting models:
>
> One model is to have the flows be dispersed and let the scheduler
> be smarter about running the applications on the right CPUs where
> the packets arrive.
>
> The other is to have the flows redirected to the CPU where the application
> previously ran which is what RFS does.
>
> For benchmarks and private fixed configuration systems it is tempting
> to just nail everything down: i.e. use hard SMP affinity, for hardware, processes,
> and flows.  But this is the wrong solution for general purpose systems with
> varying workloads and requirements.  How well does RFS really work when
> applications, processes, and sockets come and go or get migrated among
> CPUs by the scheduler? My concern is this is overlapping scheduler
> design and might be a step backwards.
>
This is true.  There is a fundamental question of whether scheduler
should lead networking or vice versa.  The advantages of networking
following scheduler seem to become more apparent on heavily loaded
systems or with threads that handle more than one flow.

I'm not sure these two models have to be mutually exclusive, we are
looking at some ways to make a hybrid model.

The statement about pinning down resources is also true; we are
actively trying to squash any instances of this in our applications!

Tom

>
> --
>


* Re: [PATCH v5] rfs: Receive Flow Steering
From: Tom Herbert @ 2010-04-16 15:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1271401007.16881.3762.camel@edumazet-laptop>

Eric, thanks for testing that.  Admittedly, we have looked at enabling
RFS/RPS over loopback.   I'll look at that today also.



* Re: [PATCH v5] rfs: Receive Flow Steering
From: Andi Kleen @ 2010-04-16 15:28 UTC (permalink / raw)
  To: jamal; +Cc: Andi Kleen, Andi Kleen, Tom Herbert, davem, netdev, eric.dumazet
In-Reply-To: <1271426715.4606.97.camel@bigi>

On Fri, Apr 16, 2010 at 10:05:15AM -0400, jamal wrote:
> On Fri, 2010-04-16 at 15:42 +0200, Andi Kleen wrote:
> > On Fri, Apr 16, 2010 at 09:32:06AM -0400, jamal wrote:
> > > How are you going to schedule the net softirq on an empty queue if you
> > > do this?
> > 
> > Sorry don't understand the question? 
> > 
> > You can always do the flow as if rps was not there.
> 
> Meaning you schedule the other side netrx softirq if queue is empty?

You handle the packet as if rps weren't enabled: the softirq runs on the
current CPU and queues the packet on the socket.

> > I meant an IPI to a sibling is not useful. You send the IPI
> > to get cache locality in the target, but if the target has the same
> > cache locality as you, you can just as well avoid the cost of the IPI
> > and process directly.
> > 
> 
> Isn't the purpose of the IPI to signal the remote side that there's something
> for it to do? 

The current CPU can queue on that socket as well.

The whole point of the IPI is to do it with cache locality.
But if cache locality is already there on the current CPU you don't
need the IPI.

> Does it also sync the remote cache?

No, the caches are always coherent.

> 
> > For thread sibling I'm pretty sure it's useless. Not fully sure about
> > socket sibling. Maybe.
> > 
> 
> Agreed, the SMT threads share L2. All the cores share L3. And it is
> inclusive, so if a line is in the L1 of one thread it must be
> present in the shared L2 as well as in L3. Across the QPI I don't think
> that is true.
> But if you special-case this, aren't you being specific to Nehalem?

Other CPUs have SMT too (Niagara, POWER 6/7, mips, ...). It should
be the same there.

Assuming L3 affinity helps, it might need to be a CPU-specific tunable,
yes. The scheduler has some information about this.
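The decision described above could be sketched roughly as follows. This is a userspace illustration under stated assumptions, not kernel code: struct cpu_topo, same_core(), want_ipi() and the llc_affinity_helps flag are all hypothetical names, with the flag standing in for the CPU-specific tunable.

```c
#include <stdbool.h>

/* Hypothetical topology description: which core and socket each CPU is on. */
struct cpu_topo {
	int core_id;	/* SMT threads of one core share this id */
	int socket_id;	/* cores sharing a last-level cache share this id */
};

/* True when both CPUs are SMT threads of the same core. */
bool same_core(const struct cpu_topo *t, int a, int b)
{
	return t[a].socket_id == t[b].socket_id && t[a].core_id == t[b].core_id;
}

/*
 * Decide whether steering a packet from CPU cur to CPU target is worth an IPI.
 * llc_affinity_helps is the assumed tunable: set it when sharing the
 * last-level cache is considered good enough locality.
 */
bool want_ipi(const struct cpu_topo *t, int cur, int target,
	      bool llc_affinity_helps)
{
	if (cur == target || same_core(t, cur, target))
		return false;	/* cache locality is already there */
	if (llc_affinity_helps && t[cur].socket_id == t[target].socket_id)
		return false;	/* shared last-level cache assumed sufficient */
	return true;		/* otherwise pay for the IPI */
}
```

When want_ipi() returns false, the packet would be handled on the current CPU as if rps were not enabled.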

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* TCP keepalive question
From: Flavio Leitner @ 2010-04-16 15:06 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]


Hi,


I'm reading RFC 1122 and it says the following:
...
           Keep-alive packets MUST only be sent when no data or
           acknowledgement packets have been received for the
           connection within an interval.  
...

The receiving acknowledgement part seems to be ok and handled 
by tcp_keepalive_timer() when it does 
elapsed = tcp_time_stamp - tp->rcv_tstamp;

However, if one side just receives data and replies with ACKs, the
keepalive probes are sent anyway (2.6.32.9-70.fc12.i686.PAE).

Is there any reason not to reset the keepalive timer when data is received?
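To make the question concrete, here is a small sketch of the two idle-time computations being compared. It assumes, as the message implies, that rcv_tstamp records only the last received ACK. The function names and the last_data_rx parameter are hypothetical and are not kernel identifiers.

```c
#include <stdint.h>

/* Idle time as in the quoted line: only the last received ACK counts. */
uint32_t idle_since_ack(uint32_t now, uint32_t last_ack_rx)
{
	return now - last_ack_rx;	/* unsigned math survives counter wrap */
}

/* Idle time if received data also counted, as the RFC text suggests. */
uint32_t idle_since_any_rx(uint32_t now, uint32_t last_ack_rx,
			   uint32_t last_data_rx)
{
	uint32_t since_ack = now - last_ack_rx;
	uint32_t since_data = now - last_data_rx;

	return since_ack < since_data ? since_ack : since_data;
}
```

A receive-only side keeps refreshing last_data_rx, so the second form would stay below the keepalive idle threshold while the first keeps growing.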

Socket options used:
SO_KEEPALIVE, TCP_KEEPIDLE=40, TCP_KEEPCNT=6, TCP_KEEPINTVL=5

Traffic dump attached.

thanks!
-- 
Flavio

[-- Attachment #2: keepalive.pcap.gz --]
[-- Type: application/octet-stream, Size: 552 bytes --]


* Re: [PATCH net-2.6] packet : remove init_net restriction
From: Daniel Lezcano @ 2010-04-16 15:04 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1271322674-21726-1-git-send-email-daniel.lezcano@free.fr>

Daniel Lezcano wrote:
> The af_packet protocol is used by Perl to do ioctls as reported by
> Stephane Riviere:
>
> "Net::RawIP relies on SIOCGIFADDR et SIOCGIFHWADDR to get the IP and MAC
> addresses of the network interface."
>
> But in a new network namespace these ioctls fail because they are disabled
> for any namespace other than the init_net_ns.
>
> These two lines should not be there, as af_inet and af_packet have been
> namespace aware for a long time now. I suppose we forgot to remove these
> lines because we sent af_packet first, before af_inet was supported.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
> Reported-by: Stephane Riviere <stephane.riviere@regis-dgac.net>
> ---
>  net/packet/af_packet.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index cc90363..243946d 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2169,8 +2169,6 @@ static int packet_ioctl(struct socket *sock, unsigned int cmd,
>  	case SIOCGIFDSTADDR:
>  	case SIOCSIFDSTADDR:
>  	case SIOCSIFFLAGS:
> -		if (!net_eq(sock_net(sk), &init_net))
> -			return -ENOIOCTLCMD;
>  		return inet_dgram_ops.ioctl(sock, cmd, arg);
>  #endif
>  
>   
Shall I send it against net-next-2.6 ?

Thanks
  -- Daniel

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Changli Gao @ 2010-04-16 14:58 UTC (permalink / raw)
  To: hadi; +Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1271429025.4606.149.camel@bigi>

On Fri, Apr 16, 2010 at 10:43 PM, jamal <hadi@cyberus.ca> wrote:
> On Fri, 2010-04-16 at 22:10 +0800, Changli Gao wrote:
>> On Fri, Apr 16, 2010 at 9:49 PM, jamal <hadi@cyberus.ca> wrote:
>
>> > my observation is:
>> > s->total is the sum of all packets received by cpu (some directly from
>> > ethernet)
>>
>> It is meaningless currently. If rps is enabled, it may be twice of the
>> number of the packets received, because one packet may be count twice:
>> one in enqueue_to_backlog(), and the other in __netif_receive_skb().
>
> You are probably right - you made me look at my collected data ;->
> i will look closely later, but it seems they are accounting for
> different cpus, no?
> Example, attached are some of the stats i captured when i was running
> the tests redirecting from CPU0 to CPU1 1M packets at about 20Kpps (just
> cut to the first and last two columns):
>
> cpu   Total     |rps_recv |rps_ipi
> -----+----------+---------+---------
> cpu0 | 002dc7f1 |00000000 |000f4246
> cpu1 | 002dc804 |000f4240 |00000000
> -------------------------------------
>
> So: cpu0 receive 0x2dc7f1 pkts accummulative over time and
> redirected to cpu1 (mostly, the extra 5 maybe to leftover since i clear
> the data) and for the test 0xf4246 times it generated an IPI. It can be
> seen that total running for CPU1 is 0x2dc804 but in this one run it
> received 1M packets (0xf4240).

I remember you redirected all the traffic from cpu0 to cpu1, and the data shows:

about 0x2dc7f1 packets were processed, and about 0xf4240 IPIs were generated.

> i.e i dont see the double accounting..
>

A single packet is counted twice, once by CPU0 and once by CPU1. If you change
the RPS setting with:

echo 1 > ..../rps_cpus

you will find that the total is doubled.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-16 14:43 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <r2q412e6f7f1004160710j6d575f36t8e39a283328cf2d7@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1331 bytes --]

On Fri, 2010-04-16 at 22:10 +0800, Changli Gao wrote:
> On Fri, Apr 16, 2010 at 9:49 PM, jamal <hadi@cyberus.ca> wrote:

> > my observation is:
> > s->total is the sum of all packets received by cpu (some directly from
> > ethernet)
> 
> It is meaningless currently. If rps is enabled, it may be twice of the
> number of the packets received, because one packet may be count twice:
> one in enqueue_to_backlog(), and the other in __netif_receive_skb(). 

You are probably right - you made me look at my collected data ;->
i will look closely later, but it seems they are accounting for
different cpus, no? 
Example, attached are some of the stats i captured when i was running
the tests redirecting from CPU0 to CPU1 1M packets at about 20Kpps (just
cut to the first and last two columns):

cpu   Total     |rps_recv |rps_ipi
-----+----------+---------+---------
cpu0 | 002dc7f1 |00000000 |000f4246
cpu1 | 002dc804 |000f4240 |00000000
-------------------------------------

So: cpu0 received 0x2dc7f1 pkts cumulatively over time and
redirected them to cpu1 (mostly; the extra 5 may be left over, since i clear
the data) and for the test it generated an IPI 0xf4246 times. It can be
seen that the running total for CPU1 is 0x2dc804, but in this one run it
received 1M packets (0xf4240).
i.e i dont see the double accounting..

cheers,
jamal

[-- Attachment #2: st1 --]
[-- Type: text/plain, Size: 792 bytes --]

002dc7f1 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 000f4246
002dc804 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 000f4240 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000006 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000003c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000003e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
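
The rows in the attachment above can be decoded with a few lines of C. This
is only a sketch: it assumes the 11-column layout produced by the patched
seq_printf() discussed in this thread (total first, received_rps and ipi_rps
last), and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* The three columns discussed in the thread; names are illustrative. */
struct softnet_row {
	unsigned int total;		/* column 0 */
	unsigned int received_rps;	/* column 9 */
	unsigned int ipi_rps;		/* column 10, added by the patch */
};

/* Parse one row of 11 hex columns. Returns 0 on success, -1 if malformed. */
int parse_softnet_row(const char *line, struct softnet_row *row)
{
	unsigned int col[11];
	int n;

	assert(line && row);

	n = sscanf(line, "%x %x %x %x %x %x %x %x %x %x %x",
		   &col[0], &col[1], &col[2], &col[3], &col[4], &col[5],
		   &col[6], &col[7], &col[8], &col[9], &col[10]);
	if (n != 11)
		return -1;

	row->total = col[0];
	row->received_rps = col[9];
	row->ipi_rps = col[10];
	return 0;
}
```

Fed the first two rows, this gives ipi_rps = 0xf4246 (1000006 decimal) for
cpu0 and received_rps = 0xf4240 (1000000 decimal) for cpu1.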

^ permalink raw reply

* [PATCH] mac8390: change an error return code and some cleanup
From: Finn Thain @ 2010-04-16 14:14 UTC (permalink / raw)
  To: David Miller; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.OSX.2.00.1004161403160.271@localhost>


Change an error return from EAGAIN to EBUSY since the former is 
misleading. Also promote the log message. Likewise some other KERN_INFO 
log messages.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

--- a/drivers/net/mac8390.c	2010-04-16 13:31:04.000000000 +1000
+++ b/drivers/net/mac8390.c	2010-04-16 23:50:39.000000000 +1000
@@ -554,7 +554,7 @@
 	case MAC8390_APPLE:
 		switch (mac8390_testio(dev->mem_start)) {
 		case ACCESS_UNKNOWN:
-			pr_info("Don't know how to access card memory!\n");
+			pr_err("Don't know how to access card memory!\n");
 			return -ENODEV;
 			break;
 
@@ -643,8 +643,8 @@
 {
 	__ei_open(dev);
 	if (request_irq(dev->irq, __ei_interrupt, 0, "8390 Ethernet", dev)) {
-		pr_info("%s: unable to get IRQ %d.\n", dev->name, dev->irq);
-		return -EAGAIN;
+		pr_err("%s: unable to get IRQ %d.\n", dev->name, dev->irq);
+		return -EBUSY;
 	}
 	return 0;
 }
@@ -660,7 +660,7 @@
 {
 	ei_status.txing = 0;
 	if (ei_debug > 1)
-		pr_info("reset not supported\n");
+		printk(KERN_DEBUG pr_fmt("reset not supported\n"));
 	return;
 }
 
@@ -668,11 +668,11 @@
 {
 	unsigned char *target = nubus_slot_addr(IRQ2SLOT(dev->irq));
 	if (ei_debug > 1)
-		pr_info("Need to reset the NS8390 t=%lu...", jiffies);
+		printk(KERN_DEBUG pr_fmt("Need to reset the NS8390 t=%lu..."), jiffies);
 	ei_status.txing = 0;
 	target[0xC0000] = 0;
 	if (ei_debug > 1)
-		pr_cont("reset complete\n");
+		printk(KERN_CONT "reset complete\n");
 	return;
 }
 

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: Changli Gao @ 2010-04-16 14:10 UTC (permalink / raw)
  To: hadi; +Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <1271425753.4606.65.camel@bigi>

On Fri, Apr 16, 2010 at 9:49 PM, jamal <hadi@cyberus.ca> wrote:
> On Fri, 2010-04-16 at 21:34 +0800, Changli Gao wrote:
>
>
> my observation is:
> s->total is the sum of all packets received by cpu (some directly from
> ethernet)

It is meaningless currently. If rps is enabled, it may be twice the
number of packets received, because one packet may be counted twice:
once in enqueue_to_backlog(), and again in __netif_receive_skb(). I
have posted a patch to solve this problem.

http://patchwork.ozlabs.org/patch/50217/

If you don't apply my patch, you'd better refer to /proc/net/dev for
the total number.
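
The double counting described above can be illustrated with a toy model.
This is not the kernel code: it is a userspace sketch that assumes, as
stated in this mail, that the steering cpu bumps its total at enqueue time
and the target cpu bumps its total again at receive time. All toy_* names
are hypothetical.

```c
#include <assert.h>

#define TOY_NR_CPUS 2

/* Per-cpu "total" counters of the toy model. */
static unsigned int toy_total[TOY_NR_CPUS];

void toy_reset(void)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_total[cpu] = 0;
}

/*
 * Account one packet that arrived on rx_cpu.
 * With RPS the enqueue step counts it on rx_cpu and the receive step
 * counts it again on target_cpu. Without RPS only the receive step runs.
 */
void toy_rx(int rx_cpu, int target_cpu, int rps_enabled)
{
	assert(rx_cpu >= 0 && rx_cpu < TOY_NR_CPUS);
	assert(target_cpu >= 0 && target_cpu < TOY_NR_CPUS);

	if (rps_enabled) {
		toy_total[rx_cpu]++;		/* enqueue step */
		toy_total[target_cpu]++;	/* receive step */
	} else {
		toy_total[rx_cpu]++;		/* receive step only */
	}
}

/* Account n packets with the same steering decision. */
void toy_rx_many(unsigned int n, int rx_cpu, int target_cpu, int rps_enabled)
{
	while (n--)
		toy_rx(rx_cpu, target_cpu, rps_enabled);
}

/* Sum of the per-cpu totals, as a reader of the stats would add them up. */
unsigned int toy_sum(void)
{
	unsigned int sum = 0;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += toy_total[cpu];
	return sum;
}
```

In this model, steering 1000 packets from cpu0 to cpu1 leaves 1000 in each
per-cpu total, and steering them from cpu0 to cpu0 leaves 2000 in cpu0's
total, although only 1000 packets arrived in either case.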

> s->received_rps was what the count receiver cpu saw incoming if they
> were sent by another cpu.

Maybe its name confused you.

/* Called from hardirq (IPI) context */
static void trigger_softirq(void *data)
{
        struct softnet_data *queue = data;
        __napi_schedule(&queue->backlog);
        __get_cpu_var(netdev_rx_stat).received_rps++;
}

The function above is called from the hardirq handler of the IPI, so it
counts the number of IPIs received. It is actually ipi_rps you need.

> s-> ipi_rps is the times we tried to enq to remote cpu but found it to
> be empty and had to send an IPI.
> ipi_rps can be < received_rps if we receive > 1 packet without
> generating an IPI. What did i miss?
>


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH v5] rfs: Receive Flow Steering
From: jamal @ 2010-04-16 14:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, Tom Herbert, davem, netdev, eric.dumazet
In-Reply-To: <20100416134236.GA18855@one.firstfloor.org>

On Fri, 2010-04-16 at 15:42 +0200, Andi Kleen wrote:
> On Fri, Apr 16, 2010 at 09:32:06AM -0400, jamal wrote:
> > How are you going to schedule the net softirq on an empty queue if you
> > do this?
> 
> Sorry don't understand the question? 
> 
> You can always do the flow as if rps was not there.

Meaning you schedule the other side netrx softirq if queue is empty?

> I meant an IPI to a sibling is not useful. You send it to the IPI
> to get cache locality in the target, but if the target has the same
> cache locality as you you can as well avoid the cost of the IPI
> and process directly.
> 

Isn't the purpose of the IPI to signal the remote side that there's something
for it to do? Does it also sync the remote cache?

> For thread sibling I'm pretty sure it's useless. Not full sure about
> socket sibling. Maybe.
> 

Agreed, the SMT threads share L2. All the cores share L3. And it is
inclusive, so if a line is in the L1 of one thread it must be
present in the shared L2 as well as L3. Across the QPI i dont think
that is true.
But if you special-case this - arent you being specific to Nehalem?

cheers,
jamal


^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-16 13:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Changli Gao, Eric Dumazet, Rick Jones, David Miller, therbert,
	netdev, robert
In-Reply-To: <20100416133707.GZ18855@one.firstfloor.org>

On Fri, 2010-04-16 at 15:37 +0200, Andi Kleen wrote:
> On Fri, Apr 16, 2010 at 09:27:35AM -0400, jamal wrote:

> > So you are saying that the old implementation of IPI (likely what i
> > tried pre-napi and as recent as 2-3 years ago) was bad because of a
> > single lock?
> 
> Yes.

> The old implementation of smp_call_function. Also in the really old
> days there was no smp_call_function_single() so you tended to broadcast.
> 
> Jens did a lot of work on this for his block device work IPI implementation.

Nice - thanks for that info! So not only has h/ware improved, but
implementation as well..

> > On IPIs:
> > Is anyone familiar with what is going on with Nehalem? Why is it this
> > good? I expect things will get a lot nastier with other hardware like
> > xeon based or even Nehalem with rps going across QPI.
> 
> Nehalem is just fast. I don't know why it's fast in your specific
> case. It might be simply because it has lots of bandwidth everywhere.
> Atomic operations are also faster than on previous Intel CPUs.

Well, the cache architecture is nicer. The on-die MC is nice. No more
shared MC hub/FSB. The 3 MC channels are nice. Intel finally beating
AMD ;-> someone did a measurement of the memory timings (L1, L2, L3, MM)
and the results were impressive; i have the numbers somewhere.

> 
> > Here's why i think IPIs are bad, please correct me if i am wrong:
> > - they are synchronous. i.e an IPI issuer has to wait for an ACK (which
> > is in the form of an IPI).
> 
> In the hardware there's no ack, but in the Linux implementation there
> is usually (because need to know when to free the stack state used
> to pass information)
>
> However there's also now support for queued IPI
> with a special API (I believe Tom is using that)
> 

Which is the non-queued-IPI call?

> > - data cache has to be synced to main memory
> > - the instruction pipeline is flushed
> 
> At least on Nehalem data transfer can be often through the cache.

I thought you have to go all the way to MM in case of IPIs.

> IPIs involve APIC accesses which are not very fast (so overall
> it's far more than a pipeline worth of work), but it's still
> not a incredible expensive operation.
> 
> There's also X2APIC now which should be slightly faster, but it's 
> likely not in your Nehalem (this is only in the highend Xeon versions)
> 

Ok, true - forgot about the APIC as well...

> > Do you know any specs i could read up which will tell me a little more?
> 
> If you're just interested in IPI and cache line transfer performance it's
> probably best to just measure it.

There are tools like benchit which would give me L1,2,3,MM measurements;
for IPI the ping + rps test i did may be sufficient.

> Some general information is always in the Intel optimization guide.

Thanks Andi!

cheers,
jamal


^ permalink raw reply

* Re: [PATCH] mac8390: fix pr_info() calls and change return code
From: Finn Thain @ 2010-04-16 13:57 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <1271392454.2298.37.camel@Joe-Laptop.home>


On Thu, 15 Apr 2010, Joe Perches wrote:

> ...Why is it better to use -EBUSY?

Nubus slots are geographically addressed and their irqs are equally 
inflexible. -EAGAIN is misleading because retrying will not help fix 
whatever bug caused the irq to be unavailable.
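
To illustrate why the error code matters to a caller, here is a userspace
sketch of a retry loop that treats -EAGAIN as "try again". Nothing here is
taken from the driver: open_with_retry() and the two fake open functions
are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>

typedef int (*open_fn)(void);

/*
 * Call open_dev() until it returns something other than -EAGAIN, or until
 * max_tries attempts have been made. Stores the number of attempts in
 * *tries_out and returns the last result.
 */
int open_with_retry(open_fn open_dev, int max_tries, int *tries_out)
{
	int err = -EAGAIN;
	int tries = 0;

	assert(max_tries > 0);

	while (tries < max_tries) {
		tries++;
		err = open_dev();
		if (err != -EAGAIN)
			break;	/* success, or an error not worth retrying */
	}
	*tries_out = tries;
	return err;
}

/* Stand-in for an open routine that always reports -EAGAIN. */
int fake_open_eagain(void)
{
	return -EAGAIN;
}

/* Stand-in for an open routine that reports -EBUSY instead. */
int fake_open_ebusy(void)
{
	return -EBUSY;
}
```

With the -EAGAIN variant the loop burns all of its attempts on a condition
that retrying cannot fix; with -EBUSY it gives up after the first call.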

> ...It'd be better to prefix this with the driver name
> or use something like netdev_dbg with #define DEBUG
> otherwise it's "huh? what device emits this message?"
> when reading the logs.
> 
> Something like:
> 	printk(KERN_DEBUG pr_fmt("reset not supported\n"));

Thanks for the suggestion. I'll resend again.

> ...unnecessary conversion.

I guess some prefer consistency, some prefer symmetry.

Finn

^ permalink raw reply

* Re: [PATCH] rdma/cm: Randomize local port allocation.
From: Tetsuo Handa @ 2010-04-16 13:54 UTC (permalink / raw)
  To: amwang-H+wXaHxf7aLQT0dZR+AlfA, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: opurdila-+zzKsuq53OdBDgjK7y7TUQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	rolandd-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4BC7C9CF.20403-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Cong Wang wrote:
> Sean Hefty wrote:
> > I like this version, thanks!  I'm not sure which tree to merge it through.
> > Are you needing this for 2.6.34, or is 2.6.35 okay?
> > 
> 
> As soon as possible, so 2.6.34. :)
> 
Cong, the merge window for 2.6.34 has already closed.
You need to target your patchset at 2.6.35 (using the net-next-2.6 tree)
rather than 2.6.34 (using the linux-2.6 tree). Therefore, having this patch
queued for 2.6.35 (through the net-next-2.6 tree) should be okay for you.

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: jamal @ 2010-04-16 13:49 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, Rick Jones, David Miller, therbert, netdev, robert,
	andi
In-Reply-To: <k2h412e6f7f1004160634v88440075p6e4cb2404abdb006@mail.gmail.com>

On Fri, 2010-04-16 at 21:34 +0800, Changli Gao wrote:

> 
> +	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
>  		   s->total, s->dropped, s->time_squeeze, 0,
>  		   0, 0, 0, 0, /* was fastroute */
> -		   s->cpu_collision, s->received_rps);
> +		   s->cpu_collision, s->received_rps, s->ipi_rps);
> 
> Do you mean that received_rps is equal to ipi_rps? received_rps is the
> number of IPI used by RPS. And ipi_rps is the number of IPIs sent by
> function generic_exec_single(). If there isn't other user of
> generic_exec_single(), received_rps should be equal to ipi_rps.
> 

my observation is:
s->total is the sum of all packets received by cpu (some directly from
ethernet)
s->received_rps is the count of packets the receiving cpu saw that
were sent to it by another cpu.
s->ipi_rps is the number of times we tried to enqueue to a remote cpu,
found its queue empty, and had to send an IPI.
ipi_rps can be < received_rps if we receive > 1 packet without
generating an IPI. What did i miss?
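
The reading above, where an IPI is only sent when the remote queue was
empty, can be sketched as a toy model. This is not the kernel code; the
struct and the toy_* functions are hypothetical and only mirror the
description in this mail.

```c
#include <assert.h>

/* Toy view of one remote cpu's backlog; field names are illustrative. */
struct toy_backlog {
	unsigned int queued;	/* packets waiting on the remote cpu */
	unsigned int ipi_rps;	/* IPIs sent to the remote cpu */
	unsigned int received;	/* packets the remote cpu has dequeued */
};

/* Enqueue one packet; kick the remote cpu only if its queue was empty. */
void toy_enqueue(struct toy_backlog *q)
{
	assert(q);

	if (q->queued == 0)
		q->ipi_rps++;
	q->queued++;
}

/* The remote cpu runs and processes everything that is queued. */
void toy_drain(struct toy_backlog *q)
{
	assert(q);

	q->received += q->queued;
	q->queued = 0;
}
```

A burst of three packets before the remote cpu drains its queue costs one
IPI in this model, while packets that arrive one at a time with a drain in
between cost one IPI each.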

cheers,
jamal


^ permalink raw reply

* Re: [PATCH v5] rfs: Receive Flow Steering
From: Andi Kleen @ 2010-04-16 13:42 UTC (permalink / raw)
  To: jamal; +Cc: Andi Kleen, Tom Herbert, davem, netdev, eric.dumazet
In-Reply-To: <1271424726.4606.42.camel@bigi>

On Fri, Apr 16, 2010 at 09:32:06AM -0400, jamal wrote:
> How are you going to schedule the net softirq on an empty queue if you
> do this?

Sorry don't understand the question? 

You can always do the flow as if rps was not there.

> BTW, in my tests sending an IPI to an SMT sibling or to another core
> didnt make any difference in terms of latency - still 5 microsecs.
> I dont have dual Nehalem where we have to cross QPI - there i suspect
> it will be longer than 5 microsecs.

I meant an IPI to a sibling is not useful. You send the IPI
to get cache locality in the target, but if the target has the same
cache locality as you, you might as well avoid the cost of the IPI
and process directly.

For a thread sibling I'm pretty sure it's useless. Not fully sure about
a socket sibling. Maybe.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

