Netdev List
* RE: Question on "net: allocate skbs on local node"
From: Eric Dumazet @ 2011-04-07  4:58 UTC
  To: Wei Gu; +Cc: netdev, Alexander Duyck, Jeff Kirsher
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48B9BEB@ESGSCCMS0001.eapac.ericsson.se>

On Thursday, 7 April 2011 at 10:16 +0800, Wei Gu wrote:
> Hi Eric,
> Testing with the Linux 2.6.38 ixgbe driver:
> We get slightly better throughput with this driver, but it does
> not seem to scale at all; one CPU core (24) is always saturated.
> In the perf report for ksoftirqd/24, the most expensive function
> is still "_raw_spin_unlock_irqrestore" and the IRQ/s rate is huge, which
> somewhat conflicts with the design of NAPI. On Linux 2.6.32, when the CPU
> was stressed, the IRQ rate decreased as NAPI spent more time
> in polling mode. I don't know why the IRQ rate keeps
> increasing on 2.6.38.


CC netdev and Intel guys, since they said it should not happen (TM)

If you don't use DCA (make sure the ioatdma module is not loaded), how come
alloc_iova() is called at all?

If you do use DCA, how come it's called, since the same CPU serves a given
interrupt?



>
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
>
> PerfTop:  512417 irqs/sec  kernel:91.3%  exact:  0.0% [1000Hz cpu-clock-msecs],  (all, 64 CPUs)
> ----------------------------------------------------------------------------------------------------
> -      0.82%     ksoftirqd/24  [kernel.kallsyms]          [k] _raw_spin_unlock_irqrestore
>    - _raw_spin_unlock_irqrestore
>       - 44.27% alloc_iova
>            intel_alloc_iova
>            __intel_map_single
>            intel_map_page
>          - ixgbe_init_interrupt_scheme
>             - 59.97% ixgbe_alloc_rx_buffers
>                  ixgbe_clean_rx_irq
>                  0xffffffffa033a5
>                  net_rx_action
>                  __do_softirq
>                + call_softirq
>             - 40.03% ixgbe_change_mtu
>                  ixgbe_change_mtu
>                  dev_hard_start_xmit
>                  sch_direct_xmit
>                  dev_queue_xmit
>                  vlan_dev_hard_start_xmit
>                  hook_func
>                  nf_iterate
>                  nf_hook_slow
>                  NF_HOOK.clone.1
>                  ip_rcv
>                  __netif_receive_skb
>                  __netif_receive_skb
>                  netif_receive_skb
>                  napi_skb_finish
>                  napi_gro_receive
>                  ixgbe_clean_rx_irq
>                  0xffffffffa033a5
>                  net_rx_action
>                  __do_softirq
>                + call_softirq
>       + 35.85% find_iova
>       + 19.44% add_unmap
>  
>  
> Thanks
> WeiGu
>  




* Re: [PATCH net-next 1/5] be2net: add rxhash support
From: Eric Dumazet @ 2011-04-07  5:05 UTC
  To: Ajit Khaparde; +Cc: netdev
In-Reply-To: <20110407040743.GA4183@akhaparde-VBox>

On Wednesday, 6 April 2011 at 23:07 -0500, Ajit Khaparde wrote:
> Add rxhash support,
> Based on initial work by Eric Dumazet.
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
> ---
>  drivers/net/benet/be.h         |    5 +++++
>  drivers/net/benet/be_ethtool.c |   13 +++++++++++++
>  drivers/net/benet/be_main.c    |   17 ++++++++++++-----
>  3 files changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
> index 0899d91..8941b98 100644
> --- a/drivers/net/benet/be.h
> +++ b/drivers/net/benet/be.h
> @@ -485,6 +485,11 @@ static inline void be_vf_eth_addr_generate(struct be_adapter *adapter, u8 *mac)
>  	memcpy(mac, adapter->netdev->dev_addr, 3);
>  }
>  
> +static inline bool be_multi_rxq(struct be_adapter *adapter)

static inline bool be_multi_rxq(const struct be_adapter *adapter)


> +{
> +	return (adapter->num_rx_qs > 1);

	return adapter->num_rx_qs > 1;

> +}
> +

Other parts seem fine, thanks!




* RE: Question on "net: allocate skbs on local node"
From: Eric Dumazet @ 2011-04-07  5:16 UTC
  To: Wei Gu; +Cc: netdev, Alexander Duyck, Jeff Kirsher
In-Reply-To: <1302152327.2701.50.camel@edumazet-laptop>

On Thursday, 7 April 2011 at 06:58 +0200, Eric Dumazet wrote:
> On Thursday, 7 April 2011 at 10:16 +0800, Wei Gu wrote:
> > Hi Eric,
> > Testing with the Linux 2.6.38 ixgbe driver:
> > We get slightly better throughput with this driver, but it does
> > not seem to scale at all; one CPU core (24) is always saturated.
> > In the perf report for ksoftirqd/24, the most expensive function
> > is still "_raw_spin_unlock_irqrestore" and the IRQ/s rate is huge, which
> > somewhat conflicts with the design of NAPI. On Linux 2.6.32, when the CPU
> > was stressed, the IRQ rate decreased as NAPI spent more time
> > in polling mode. I don't know why the IRQ rate keeps
> > increasing on 2.6.38.
> 
> 
> CC netdev and Intel guys, since they said it should not happen (TM)
> 
> If you don't use DCA (make sure the ioatdma module is not loaded), how come
> alloc_iova() is called at all?
> 
> If you do use DCA, how come it's called, since the same CPU serves a given
> interrupt?
> 
> 

But then, maybe you forgot to set the CPU affinity of the IRQs?

A high-performance routing setup is tricky, since you probably want to
disable many features that are on by default: most machines act as
end hosts.
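
For readers unfamiliar with IRQ affinity: pinning an IRQ to one CPU is done by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. The following is a minimal C sketch of how such a mask could be built, assuming the usual layout of comma-separated 32-bit words with the most significant word first. The helper name irq_affinity_mask() is hypothetical and is not part of any kernel or driver API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a mask with only bit `cpu` set, as
 * comma-separated 32-bit hex words, most significant word first.
 * Returns 0 on success, -1 on invalid arguments or a too-small buffer.
 */
static int irq_affinity_mask(unsigned int cpu, unsigned int nr_cpus,
			     char *buf, size_t len)
{
	unsigned int words, w;
	size_t off = 0;

	if (nr_cpus == 0 || cpu >= nr_cpus)
		return -1;

	words = (nr_cpus + 31) / 32;
	/* Each word needs 8 hex digits plus a comma or the final NUL. */
	if (len < (size_t)words * 9)
		return -1;

	/* Walk from the most significant word down to word 0. */
	for (w = words; w-- > 0; ) {
		unsigned int val = (cpu / 32 == w) ? (1u << (cpu % 32)) : 0;

		off += (size_t)snprintf(buf + off, len - off, "%08x%s",
					val, w ? "," : "");
	}
	return 0;
}
```

On the 64-CPU machine discussed in this thread, calling the helper with cpu 24 would yield a string with only bit 24 set in the low word; that string would then be written to the smp_affinity file of the queue's IRQ. A running irqbalance daemon may overwrite such manual settings, so it is usually stopped first.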





* Re: problem of "ipv4: revert Set rt->rt_iif more sanely on output routes."
From: David Miller @ 2011-04-07  5:34 UTC
  To: hirofumi; +Cc: netdev
In-Reply-To: <87ipuqsmwl.fsf@devron.myhome.or.jp>

From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Date: Thu, 07 Apr 2011 13:31:06 +0900

> I'm not entirely sure, but the output message is
> 
> 	ip_finish_output2: No header cache and no neighbour!
> 
> I haven't debugged this yet, but
> 
> static inline bool rt_is_output_route(struct rtable *rt)
> {
> 	return rt->rt_iif == 0;
> }
> 
> from review I guess the above is one of the causes.

arp_bind_neighbour() is only called if rt_is_output_route() is true
or the route is unicast.

If a packet is sent using a route for which arp_bind_neighbour() has not
been called, you will see that warning message.


* Re: problem of "ipv4: revert Set rt->rt_iif more sanely on output routes."
From: David Miller @ 2011-04-07  5:42 UTC
  To: hirofumi; +Cc: netdev
In-Reply-To: <20110406.223400.71127145.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 06 Apr 2011 22:34:00 -0700 (PDT)

> From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Date: Thu, 07 Apr 2011 13:31:06 +0900
> 
>> I'm not entirely sure, but the output message is
>> 
>> 	ip_finish_output2: No header cache and no neighbour!
>> 
>> I haven't debugged this yet, but
>> 
>> static inline bool rt_is_output_route(struct rtable *rt)
>> {
>> 	return rt->rt_iif == 0;
>> }
>> 
>> from review I guess the above is one of the causes.
> 
> arp_bind_neighbour() is only called if rt_is_output_route() is true
> or the route is unicast.
> 
> If a packet is sent using a route for which arp_bind_neighbour() has not
> been called, you will see that warning message.

Ok, the problem is that, for output routes in the original code:

1) the user's flow device index is stored in rt->rt_iif

2) meanwhile, the arp_bind_neighbour() tests used rt->fl.iif

So we do need, for now, to add a new member.  But I think for
correct semantics it needs to have the inverse meaning to the one
you added in your RFC patch.

So the fix is something like:

1) Add "int rt_route_iif;" to struct rtable

2) For input routes, always set rt_route_iif to the same value as rt_iif

3) For output routes, always set rt_route_iif to zero.  Set rt_iif
   as it is done currently.

4) Change rt_is_{output,input}_route() to test rt_route_iif

This should fix the bug and not introduce new regressions.
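
The four steps above can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions, not the actual kernel patch: struct rtable_sketch is a hypothetical, trimmed-down stand-in for struct rtable, and the two rt_sketch_init_*() helpers are invented here to show where the assignments would go.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two relevant members of struct rtable. */
struct rtable_sketch {
	int rt_iif;        /* unchanged: may hold the user's flow device index */
	int rt_route_iif;  /* step 1: new member, zero for output routes */
};

/* Step 2: input routes mirror rt_iif into rt_route_iif. */
static inline void rt_sketch_init_input(struct rtable_sketch *rt, int in_ifindex)
{
	rt->rt_iif = in_ifindex;
	rt->rt_route_iif = rt->rt_iif;
}

/*
 * Step 3: output routes keep whatever value the current code stores in
 * rt_iif (passed in here as current_iif_value) and zero rt_route_iif.
 */
static inline void rt_sketch_init_output(struct rtable_sketch *rt,
					 int current_iif_value)
{
	rt->rt_iif = current_iif_value;
	rt->rt_route_iif = 0;
}

/* Step 4: the predicates test rt_route_iif instead of rt_iif. */
static inline bool rt_is_input_route(const struct rtable_sketch *rt)
{
	return rt->rt_route_iif != 0;
}

static inline bool rt_is_output_route(const struct rtable_sketch *rt)
{
	return rt->rt_route_iif == 0;
}
```

The point of the split is that an output route whose rt_iif is non-zero is still classified as an output route, because the predicates no longer look at rt_iif.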

Can you write and test such a patch with your test case?

Thank you!


* RE: [PATCHv3 NEXT 1/1] net: ethtool support to configure number of channels
From: Amit Salecha @ 2011-04-07  6:03 UTC
  To: David Miller, bhutchings@solarflare.com
  Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty
In-Reply-To: <20110406.133019.212692386.davem@davemloft.net>


>
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Wed, 06 Apr 2011 21:20:47 +0100
>
> > Amit, I told you already that you must not use my Signed-off-by line
> > since you are changing the patch significantly.
>
> Amit, this is a very serious infraction.
>
> You must not ever add someone else's signed-off-by when you make
> changes to a patch, unless you have their very clear and explicit
> permission to do so.

As this patch is based on Ben's patch, I thought it was my duty to add his Signed-off-by.
Then I misinterpreted Ben's comment and thought he wanted me to explain his contribution.
Sorry, Ben.

-Amit




* Re: [PATCH net-next] net: fix skb_add_data_nocache() to calc csum correctly
From: David Miller @ 2011-04-07  6:05 UTC
  To: therbert; +Cc: yjwei, netdev
In-Reply-To: <BANLkTikspQHnjagFme0S3GdPUo-zw48zBw@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Wed, 6 Apr 2011 21:50:55 -0700

> Nice catch.
> 
> Acked-by: Tom Herbert <therbert@google.com>
> 
> On Wed, Apr 6, 2011 at 9:40 PM, Wei Yongjun <yjwei@cn.fujitsu.com> wrote:
>> commit c6e1a0d12ca7b4f22c58e55a16beacfb7d3d8462
>>  (net: Allow no-cache copy from user on transmit)
>> broke the checksum calculation, which may cause some TCP packets to be
>> dropped because of an incorrect checksum. ssh does not work under
>> today's net-next-2.6 tree.
>>
>> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>

Applied, thanks everyone.


* RE: Question on "net: allocate skbs on local node"
From: Eric Dumazet @ 2011-04-07  6:16 UTC
  To: Wei Gu; +Cc: netdev, Alexander Duyck, Jeff Kirsher
In-Reply-To: <1302153412.2701.64.camel@edumazet-laptop>

On Thursday, 7 April 2011 at 07:16 +0200, Eric Dumazet wrote:
> On Thursday, 7 April 2011 at 06:58 +0200, Eric Dumazet wrote:
> > On Thursday, 7 April 2011 at 10:16 +0800, Wei Gu wrote:
> > > Hi Eric,
> > > Testing with the Linux 2.6.38 ixgbe driver:
> > > We get slightly better throughput with this driver, but it does
> > > not seem to scale at all; one CPU core (24) is always saturated.
> > > In the perf report for ksoftirqd/24, the most expensive function
> > > is still "_raw_spin_unlock_irqrestore" and the IRQ/s rate is huge, which
> > > somewhat conflicts with the design of NAPI. On Linux 2.6.32, when the CPU
> > > was stressed, the IRQ rate decreased as NAPI spent more time
> > > in polling mode. I don't know why the IRQ rate keeps
> > > increasing on 2.6.38.
> > 
> > 
> > CC netdev and Intel guys, since they said it should not happen (TM)
> > 
> > If you don't use DCA (make sure the ioatdma module is not loaded), how come
> > alloc_iova() is called at all?
> > 
> > If you do use DCA, how come it's called, since the same CPU serves a given
> > interrupt?
> > 
> > 
> 
> But then, maybe you forgot to set the CPU affinity of the IRQs?
> 
> A high-performance routing setup is tricky, since you probably want to
> disable many features that are on by default: most machines act as
> end hosts.
> 
> 

Please don't send me any more private mails. I do think the issue you have
is in your setup, not in a particular optimization done in the network stack.


Copy of your private mail:

> On 2.6.38, I got a lot of "rx_missed_errors" on the NIC, which means the
> rx loop was really busy fetching packets from the receive ring. Usually
> in this case it shouldn't exit the softirq and should keep polling in order
> to decrease the interrupt rate.
> 
> On 2.6.32, I can Rx and Tx 2.3Mpps with no packet loss (no errors on the NIC),
> but on 2.6.38 I can only reach 50kpps with a lot of
> "rx_missed_errors", and all the bound CPU cores were at 100% in SI. I
> don't think there was any optimization in it.

I hope you understand there is something wrong with your setup?

50.000 pps on a 64-CPU machine is a bad joke.

We can reach 10.000.000+ pps on a 16-CPU one.





* Re: nfs client doesn't work [was: mmotm 2011-03-31-14-48 uploaded]
From: Jiri Slaby @ 2011-04-07  6:42 UTC
  To: Myklebust, Trond
  Cc: Jiri Slaby, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, ML netdev,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1302122693.16786.0.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>

On 04/06/2011 10:44 PM, Myklebust, Trond wrote:
> On Sat, 2011-04-02 at 10:56 +0200, Jiri Slaby wrote:
>> On 03/31/2011 11:48 PM, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:
>> > The mm-of-the-moment snapshot 2011-03-31-14-48 has been uploaded to
>>
>> Hi, the nfs client is defunct in this kernel. Tcpdump says:
>> 10:51:55.489717 IP 10.20.11.33.759945860 > 10.20.3.2.2049: 132 getattr
>> fh 0,0/24
>> 10:51:55.515927 IP 10.20.3.2.2049 > 10.20.11.33.759945860: reply ok 44
>> getattr ERROR: Operation not permitted
>> 10:51:55.515949 IP 10.20.11.33.921 > 10.20.3.2.2049: Flags [.], ack
>> 3569361440, win 115, options [nop,nop,TS val 599750 ecr 255058541],
>> length 0
>> 10:52:04.130310 IP 10.20.11.33.793500292 > 10.20.3.2.2049: 76 getattr fh
>> 0,0/24
>> 10:52:04.152178 IP 10.20.3.2.2049 > 10.20.11.33.793500292: reply ok 44
>> getattr ERROR: Operation not permitted
>>
>> If I run the same mount command (mount -oro,intr host:dir mountpoint)
>> from within a virtual machine with 2.6.38.2 there, everything mounts OK.
> 
> Does the attached patch help?

No, I still get "Operation not permitted" in the tcpdump output, and the mount fails.

thanks,
-- 
js
suse labs


* Re: [PATCH] xen: drop anti-dependency on X86_VISWS
From: Ian Campbell @ 2011-04-07  6:58 UTC
  To: David Miller
  Cc: eric.dumazet@gmail.com, mirq-linux@rere.qmqm.pl,
	netdev@vger.kernel.org, Jeremy Fitzhardinge,
	konrad.wilk@oracle.com, xen-devel@lists.xensource.com,
	virtualization@lists.linux-foundation.org,
	randy.dunlap@oracle.com, pazke@donpac.ru,
	linux-visws-devel@lists.sf.net, tglx@linutronix.de,
	mingo@redhat.com, hpa@zytor.com
In-Reply-To: <20110406.144515.235693855.davem@davemloft.net>

On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
> From: Ian Campbell <Ian.Campbell@eu.citrix.com>
> Date: Mon, 4 Apr 2011 10:55:55 +0100
> 
> > You mean the "!X86_VISWS" I presume? It doesn't make sense to me either.
> 
> No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC dependency.

TSC is a real dependency of the Xen interfaces.

> And, well, you could type "make allmodconfig" on your tree and see for
> yourself instead of asking me :-)

True.

X86_TSC not being enabled appears to be due to CONFIG_ELAN being enabled,
which causes the processor selection option (which defaults to M686,
which is a sane choice and enables TSC etc.) to be gated at the top level
in arch/x86/Kconfig.cpu. Disabling the ELAN option then leaves X86_TSC
gated on !CONFIG_NUMAQ, but removing that results in a generally useful
looking config.

It's a shame that these sorts of minority options cause allmodconfig to
omit support for more interesting configurations, such as modern
processors. Other than negating the semantics of such options I'm not
really sure what can be done about it though. On the other hand
compiling all the unusual stuff in an allmodconfig is probably a
positive thing.

I'm not sure why ELAN belongs in the EXTENDED_PLATFORM option space
rather than in the CPU choice option, since its only impact seems to be
on -march, MODULE_PROC_FAMILY and some cpufreq drivers, which doesn't
sound like an extended platform to me, but it does appear to be
deliberate (see 9e111f3e167a "x86: move ELAN to the
NON_STANDARD_PLATFORM section"; that was the old name for
EXTENDED_PLATFORM).

Hrm, what about the following? (It doesn't actually make a difference to
Xen, since allmodconfig chooses HIGHMEM4G instead of HIGHMEM64G in the
!NUMAQ case, but I stopped worrying about that several paragraphs ago.)

8<--------

x86: invert X86_EXTENDED_PLATFORM to X86_STANDARD_PLATFORM

Having the =y choice be the more "standard" configuration causes
all*config to provide greater coverage of usual configurations.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc6c53a..6d8a404 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -299,15 +299,15 @@ config X86_BIGSMP
 	  This option is needed for the systems that have more than 8 CPUs
 
 if X86_32
-config X86_EXTENDED_PLATFORM
-	bool "Support for extended (non-PC) x86 platforms"
+config X86_STANDARD_PLATFORM
+	bool "Restrict support to standard (PC) x86 platforms"
 	default y
 	---help---
-	  If you disable this option then the kernel will only support
+	  If you enable this option then the kernel will only support
 	  standard PC platforms. (which covers the vast majority of
 	  systems out there.)
 
-	  If you enable this option then you'll be able to select support
+	  If you disable this option then you'll be able to select support
 	  for the following (non-PC) 32 bit x86 platforms:
 		AMD Elan
 		NUMAQ (IBM/Sequent)
@@ -318,25 +318,25 @@ config X86_EXTENDED_PLATFORM
 		Moorestown MID devices
 
 	  If you have one of these systems, or if you want to build a
-	  generic distribution kernel, say Y here - otherwise say N.
+	  generic distribution kernel, say N here - otherwise say Y.
 endif
 
 if X86_64
-config X86_EXTENDED_PLATFORM
-	bool "Support for extended (non-PC) x86 platforms"
+config X86_STANDARD_PLATFORM
+	bool "Restrict support to standard (PC) x86 platforms"
 	default y
 	---help---
-	  If you disable this option then the kernel will only support
+	  If you enable this option then the kernel will only support
 	  standard PC platforms. (which covers the vast majority of
 	  systems out there.)
 
-	  If you enable this option then you'll be able to select support
+	  If you disable this option then you'll be able to select support
 	  for the following (non-PC) 64 bit x86 platforms:
 		ScaleMP vSMP
 		SGI Ultraviolet
 
 	  If you have one of these systems, or if you want to build a
-	  generic distribution kernel, say Y here - otherwise say N.
+	  generic distribution kernel, say N here - otherwise say Y.
 endif
 # This is an alphabetically sorted list of 64 bit extended platforms
 # Please maintain the alphabetic order if and when there are additions
@@ -346,7 +346,7 @@ config X86_VSMP
 	select PARAVIRT_GUEST
 	select PARAVIRT
 	depends on X86_64 && PCI
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	---help---
 	  Support for ScaleMP vSMP systems.  Say 'Y' here if this kernel is
 	  supposed to run on these EM64T-based machines.  Only choose this option
@@ -355,7 +355,7 @@ config X86_VSMP
 config X86_UV
 	bool "SGI Ultraviolet"
 	depends on X86_64
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	depends on NUMA
 	depends on X86_X2APIC
 	---help---
@@ -368,7 +368,7 @@ config X86_UV
 config X86_ELAN
 	bool "AMD Elan"
 	depends on X86_32
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	---help---
 	  Select this for an AMD Elan processor.
 
@@ -381,7 +381,7 @@ config X86_INTEL_CE
 	depends on PCI
 	depends on PCI_GODIRECT
 	depends on X86_32
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	select X86_REBOOTFIXUPS
 	select OF
 	select OF_EARLY_FLATTREE
@@ -395,7 +395,7 @@ config X86_MRST
 	depends on PCI
 	depends on PCI_GOANY
 	depends on X86_32
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	depends on X86_IO_APIC
 	select APB_TIMER
 	select I2C
@@ -413,7 +413,7 @@ config X86_MRST
 config X86_RDC321X
 	bool "RDC R-321x SoC"
 	depends on X86_32
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	select M486
 	select X86_REBOOTFIXUPS
 	---help---
@@ -424,7 +424,7 @@ config X86_RDC321X
 config X86_32_NON_STANDARD
 	bool "Support non-standard 32-bit SMP architectures"
 	depends on X86_32 && SMP
-	depends on X86_EXTENDED_PLATFORM
+	depends on !X86_STANDARD_PLATFORM
 	---help---
 	  This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default
 	  subarchitectures.  It is intended for a generic binary kernel.




* Re: [Patch] isdn: remove deprecated ISDN_CAPI_CAPIFS
From: Cong Wang @ 2011-04-07  7:06 UTC
  To: David Miller; +Cc: netdev, isdn, jan.kiszka
In-Reply-To: <20110406.131216.15250040.davem@davemloft.net>

On 2011-04-07 04:12, David Miller wrote:
> From: Amerigo Wang<amwang@redhat.com>
> Date: Wed,  6 Apr 2011 17:05:39 +0800
>
>> Cc: Jan Kiszka<jan.kiszka@web.de>
>> Cc: Karsten Keil<isdn@linux-pingi.de>
>> Signed-off-by: WANG Cong<amwang@redhat.com>
>
> capi.c still includes capifs.h, which you are deleting here.
>
> How did you build test this?

Oops! I definitely used the wrong .config. :-/

Thanks for fixing it, Jan!


* Re: [Patch] iwlwifi: remove obsoleted module alias and parameters
From: Cong Wang @ 2011-04-07  7:17 UTC
  To: Guy, Wey-Yi
  Cc: linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	Intel Linux Wireless, Berg, Johannes, John W. Linville,
	Stanislaw Gruszka, Venkataraman, Meenakshi, Larry Finger
In-Reply-To: <1302098755.14995.112.camel@wwguy-huron>

On 2011-04-06 22:05, Guy, Wey-Yi wrote:
> On Wed, 2011-04-06 at 02:49 -0700, Amerigo Wang wrote:
>> As scheduled in Documentation/feature-removal-schedule.txt,
>> remove "*50", "disable_hw_scan" module parameters and MODULE_ALIAS("iwl4965").
>>
>> Cc: Intel Linux Wireless<ilw@linux.intel.com>
>> Cc: Johannes Berg<johannes.berg@intel.com>
>> Cc: "John W. Linville"<linville@tuxdriver.com>
>> Cc: Wey-Yi Guy<wey-yi.w.guy@intel.com>
>> Cc: Stanislaw Gruszka<sgruszka@redhat.com>
>> Cc: Meenakshi Venkataraman<meenakshi.venkataraman@intel.com>
>> Cc: Larry Finger<Larry.Finger@lwfinger.net>
>> Signed-off-by: WANG Cong<amwang@redhat.com>
>>
>> ---
> Which tree are you based on?
> Please check commit 7eaa6a5e964f1ab02d849bda36950c0d30be8ce2 in
> wireless-next-2.6

The latest Linus tree; sorry, I didn't know wireless has its own tree.
I just checked wireless-next-2.6 but couldn't find any commit matching
that commit ID. I assume you mean you already sent the same patch?
If so, feel free to discard mine.

Thanks.


* Re: problem of "ipv4: revert Set rt->rt_iif more sanely on output routes."
From: OGAWA Hirofumi @ 2011-04-07  7:19 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20110406.224244.104071339.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> So fix is something like:
>
> 1) Add "int rt_route_iif;" to struct rtable
>
> 2) For input routes, always set rt_route_iif to same value as rt_iif
>
> 3) For output routes, always set rt_route_iif to zero.  Set rt_iif
>    as it is done currently.
>
> 4) Change rt_is_{output,input}_route() to test rt_route_iif
>
> This should fix the bug and not introduce new regressions.
>
> Can you write and test such a patch with your test case?

Ok. I'll try, but I'm not sure I understand the above correctly. Well,
I'll send the patch after testing.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-07  7:22 UTC
  To: netdev, Alexander Duyck, Jeff Kirsher
In-Reply-To: <1302157012.2701.73.camel@edumazet-laptop>

Hi guys,
As I discussed with Eric, I get very low performance on the Linux 2.6.38 kernel with the Intel ixgbe-3.2.10 driver.
I tested different rx buffer sizes on the Intel 10G NIC, set with ethtool -G rx 4096.
I get the lowest performance (~50Kpps Rx&Tx) with rx set to 4096.
Once I decrease rx to 512 (the default), I can get at most 250Kpps Rx&Tx on 1 NIC.

I was running this test on an HP DL580 with 4 CPU sockets and a full memory configuration.
modprobe ixgbe RSS=8,8,8,8,8,8,8,8 FdirMode=0,0,0,0,0,0,0,0 Node=0,0,1,1,2,2,3,3
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 65525 MB
node 0 free: 63053 MB
node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 1 size: 65536 MB
node 1 free: 63388 MB
node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 2 size: 65536 MB
node 2 free: 63344 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 65535 MB
node 3 free: 63376 MB

Then I bound eth10's rx and tx IRQs to cores "24 25 26 27 28 29 30 31", one by one, which means 1 rx and 1 tx queue share 1 core.


I did the same test on the 2.6.32 kernel and got >2.5M tx&rx with the same setup on RHEL6 (2.6.32), but never reached 10.000.000 rx&tx on a single NIC :)

I also tested the ixgbe driver shipped with 2.6.38; it has the same problem.

This is a perf record with the Linux-shipped ixgbe driver; it looks like it has a very high irq/s rate, and the softirq is busy in alloc_iova.


PerfTop:  512417 irqs/sec  kernel:91.3%  exact:  0.0% [1000Hz cpu-clock-msecs],  (all, 64 CPUs)
----------------------------------------------------------------------------------------------------
-      0.82%     ksoftirqd/24  [kernel.kallsyms]          [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      - 44.27% alloc_iova
           intel_alloc_iova
           __intel_map_single
           intel_map_page
         - ixgbe_init_interrupt_scheme
            - 59.97% ixgbe_alloc_rx_buffers
                 ixgbe_clean_rx_irq
                 0xffffffffa033a5
                 net_rx_action
                 __do_softirq
               + call_softirq
            - 40.03% ixgbe_change_mtu
                 ixgbe_change_mtu
                 dev_hard_start_xmit
                 sch_direct_xmit
                 dev_queue_xmit
                 vlan_dev_hard_start_xmit
                 hook_func
                 nf_iterate
                 nf_hook_slow
                 NF_HOOK.clone.1
                 ip_rcv
                 __netif_receive_skb
                 __netif_receive_skb
                 netif_receive_skb
                 napi_skb_finish
                 napi_gro_receive
                 ixgbe_clean_rx_irq
                 0xffffffffa033a5
                 net_rx_action
                 __do_softirq
               + call_softirq
      + 35.85% find_iova
      + 19.44% add_unmap


Thanks
WeiGu


-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Thursday, April 07, 2011 2:17 PM
To: Wei Gu
Cc: netdev; Alexander Duyck; Jeff Kirsher
Subject: RE: Question on "net: allocate skbs on local node"

Le jeudi 07 avril 2011 à 07:16 +0200, Eric Dumazet a écrit :
> Le jeudi 07 avril 2011 à 06:58 +0200, Eric Dumazet a écrit :
> > Le jeudi 07 avril 2011 à 10:16 +0800, Wei Gu a écrit :
> > > Hi Eric,
> > > Testing with ixgbe Linux 2.6.38 driver:
> > > We have a little better thruput figure with this driver, but it
> > > looks not scalling at all, I always stressed one CPU core/24.
> > > And when look the perf report for ksoftirqd/24, the most cost
> > > function is still "_raw_spin_unlock_irqstore" and the IRQ/s is
> > > huge, it's somehow conflicts with desgin of NAPI. On linux 2.6.32
> > > while the CPU was stressed the IRQ will descreased while the NAPI
> > > will running much on the polling mode. I don't know why on 2.6.38
> > > the IRQ was keep increasing.
> >
> >
> > CC netdev and Intel guys, since they said it should not happen (TM)
> >
> > IF you dont use DCA (make sure ioatdma module is not loaded), how
> > comes
> > alloc_iova() is called at all ?
> >
> > IF you use DCA, how comes its called, since the same CPU serves a
> > given interrupt ?
> >
> >
>
> But then, maybe you forgot to cpu affine IRQS ?
>
> High performance routing setup is tricky, since you probably want to
> disable many features that are ON by default : Most machines act as a
> end host.
>
>

Please don't send me any more private mails. I do think the issue you have is in your setup, not in a particular optimization done in the network stack.


Copy of your private mail :

> On 2.6.38, I got a lot of "rx_missed_errors" on the NIC, which means the
> rx loop was really busy fetching packets from the receive ring. Usually
> in this case it shouldn't exit the softirq but keep polling in order
> to decrease the interrupts.
>
> On 2.6.32, I can Rx and Tx 2.3Mpps with no packet loss (no errors on the NIC),
> but on 2.6.38 I can only reach 50kpps with a lot of
> "rx_missed_errors", and all the bound CPU cores were 100% in SI. I
> don't think there were any optimizations in it.

I hope you understand there is something wrong with your setup ?

50.000 pps on a 64 cpu machine is a bad joke.

We can reach +10.000.000 on a 16 cpus one.





* [PatchV3 1/3] usb: plusb: Whitespace
From: Simon Wood @ 2011-04-07  7:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sergei Shtylyov, davem, linux-usb, netdev, linux-kernel,
	Simon Wood
In-Reply-To: <1301456667-1648-1-git-send-email-simon@mungewell.org>

From: simon <simon@ubuntu.(none)>

This patch cleans up a couple of instances of incorrect whitespace

Signed-off-by: Simon Wood <simon@mungewell.org>
---
 drivers/net/usb/plusb.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/plusb.c b/drivers/net/usb/plusb.c
index 823c537..2fe1bb5 100644
--- a/drivers/net/usb/plusb.c
+++ b/drivers/net/usb/plusb.c
@@ -134,13 +134,13 @@ static struct usb_driver plusb_driver = {
 
 static int __init plusb_init(void)
 {
- 	return usb_register(&plusb_driver);
+	return usb_register(&plusb_driver);
 }
 module_init(plusb_init);
 
 static void __exit plusb_exit(void)
 {
- 	usb_deregister(&plusb_driver);
+	usb_deregister(&plusb_driver);
 }
 module_exit(plusb_exit);
 
-- 
1.7.4.1


* [PatchV3 2/3] usb: plusb: Add support for PL-25A1
From: Simon Wood @ 2011-04-07  7:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sergei Shtylyov, davem, linux-usb, netdev, linux-kernel,
	Simon Wood
In-Reply-To: <1302162015-22504-1-git-send-email-simon@mungewell.org>

From: simon <simon@ubuntu.(none)>

This patch adds support for the PL-25A1 by adding the appropriate
USB ID's. This chip is used in the Belkin 'Windows Easy Transfer'
Cables.

Signed-off-by: Simon Wood <simon@mungewell.org>
---
 drivers/net/usb/Kconfig |    2 +-
 drivers/net/usb/plusb.c |   22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index 3ec22c3..9d4f911 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -258,7 +258,7 @@ config USB_NET_NET1080
 	  optionally with LEDs that indicate traffic
 
 config USB_NET_PLUSB
-	tristate "Prolific PL-2301/2302 based cables"
+	tristate "Prolific PL-2301/2302/25A1 based cables"
 	# if the handshake/init/reset problems, from original 'plusb',
 	# are ever resolved ... then remove "experimental"
 	depends on USB_USBNET && EXPERIMENTAL
diff --git a/drivers/net/usb/plusb.c b/drivers/net/usb/plusb.c
index 2fe1bb5..f46aa07 100644
--- a/drivers/net/usb/plusb.c
+++ b/drivers/net/usb/plusb.c
@@ -45,6 +45,14 @@
  * seems to get wedged under load.  Prolific docs are weak, and
  * don't identify differences between PL2301 and PL2302, much less
  * anything to explain the different PL2302 versions observed.
+ *
+ * NOTE:  pl2501 has several modes, including pl2301 and pl2302
+ * compatibility.   Some docs suggest the difference between 2301
+ * and 2302 is only to make MS-Windows use a different driver...
+ *
+ * pl25a1 glue based on patch from Tony Gibbs.  Prolific "docs" on
+ * this chip are as usual incomplete about what control messages
+ * are supported.
  */
 
 /*
@@ -95,7 +103,7 @@ static int pl_reset(struct usbnet *dev)
 }
 
 static const struct driver_info	prolific_info = {
-	.description =	"Prolific PL-2301/PL-2302",
+	.description =	"Prolific PL-2301/PL-2302/PL-25A1",
 	.flags =	FLAG_POINTTOPOINT | FLAG_NO_SETINT,
 		/* some PL-2302 versions seem to fail usb_set_interface() */
 	.reset =	pl_reset,
@@ -111,6 +119,7 @@ static const struct driver_info	prolific_info = {
 
 static const struct usb_device_id	products [] = {
 
+/* full speed cables */
 {
 	USB_DEVICE(0x067b, 0x0000),	// PL-2301
 	.driver_info =	(unsigned long) &prolific_info,
@@ -119,6 +128,15 @@ static const struct usb_device_id	products [] = {
 	.driver_info =	(unsigned long) &prolific_info,
 },
 
+/* high speed cables */
+{
+	USB_DEVICE(0x067b, 0x25a1),     /* PL-25A1, no eeprom */
+	.driver_info =  (unsigned long) &prolific_info,
+}, {
+	USB_DEVICE(0x050d, 0x258a),     /* Belkin F5U258/F5U279 (PL-25A1) */
+	.driver_info =  (unsigned long) &prolific_info,
+},
+
 	{ },		// END
 };
 MODULE_DEVICE_TABLE(usb, products);
@@ -145,5 +163,5 @@ static void __exit plusb_exit(void)
 module_exit(plusb_exit);
 
 MODULE_AUTHOR("David Brownell");
-MODULE_DESCRIPTION("Prolific PL-2301/2302 USB Host to Host Link Driver");
+MODULE_DESCRIPTION("Prolific PL-2301/2302/25A1 USB Host to Host Link Driver");
 MODULE_LICENSE("GPL");
-- 
1.7.4.1


* [PatchV3 3/3] usb: plusb: Add debug to reset function
From: Simon Wood @ 2011-04-07  7:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sergei Shtylyov, davem, linux-usb, netdev, linux-kernel,
	Simon Wood
In-Reply-To: <1302162015-22504-1-git-send-email-simon@mungewell.org>

From: simon <simon@ubuntu.(none)>

This patch adds some debug to the reset function to print out the
reason why it fails.

Signed-off-by: Simon Wood <simon@mungewell.org>
---
 drivers/net/usb/plusb.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/plusb.c b/drivers/net/usb/plusb.c
index f46aa07..217aec8 100644
--- a/drivers/net/usb/plusb.c
+++ b/drivers/net/usb/plusb.c
@@ -94,11 +94,15 @@ pl_set_QuickLink_features(struct usbnet *dev, int val)
 
 static int pl_reset(struct usbnet *dev)
 {
+	int status;
+
 	/* some units seem to need this reset, others reject it utterly.
 	 * FIXME be more like "naplink" or windows drivers.
 	 */
-	(void) pl_set_QuickLink_features(dev,
+	status = pl_set_QuickLink_features(dev,
 		PL_S_EN|PL_RESET_OUT|PL_RESET_IN|PL_PEER_E);
+	if (status != 0 && netif_msg_probe(dev))
+		netif_dbg(dev, link, dev->net, "pl_reset --> %d\n", status);
 	return 0;
 }
 
-- 
1.7.4.1


* [PATCH] iproute2: parse flag XFRM_POLICY_ICMP
From: Ulrich Weber @ 2011-04-07  7:37 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

parse flag XFRM_POLICY_ICMP

Signed-off-by: Ulrich Weber <uweber@astaro.com>
---
 ip/ipxfrm.c      |    1 +
 ip/xfrm_policy.c |    4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index a276c0b..7a9a681 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -980,6 +980,7 @@ void xfrm_policy_info_print(struct xfrm_userpolicy_info *xpinfo,
 
 		fprintf(fp, "flag ");
 		XFRM_FLAG_PRINT(fp, flags, XFRM_POLICY_LOCALOK, "localok");
+		XFRM_FLAG_PRINT(fp, flags, XFRM_POLICY_ICMP, "icmp");
 		if (flags)
 			fprintf(fp, "%x", flags);
 	}
diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 9ef5c09..7827f91 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -77,7 +77,7 @@ static void usage(void)
 	//fprintf(stderr, "PRIORITY - priority value(default=0)\n");
 
 	fprintf(stderr, "FLAG-LIST := [ FLAG-LIST ] FLAG\n");
-	fprintf(stderr, "FLAG := [ localok ]\n");
+	fprintf(stderr, "FLAG := [ localok | icmp ]\n");
 
 	fprintf(stderr, "LIMIT-LIST := [ LIMIT-LIST ] | [ limit LIMIT ]\n");
 	fprintf(stderr, "LIMIT := [ [time-soft|time-hard|time-use-soft|time-use-hard] SECONDS ] |\n");
@@ -156,6 +156,8 @@ static int xfrm_policy_flag_parse(__u8 *flags, int *argcp, char ***argvp)
 		while (1) {
 			if (strcmp(*argv, "localok") == 0)
 				*flags |= XFRM_POLICY_LOCALOK;
+			else if (strcmp(*argv, "icmp") == 0)
+				*flags |= XFRM_POLICY_ICMP;
 			else {
 				PREV_ARG(); /* back track */
 				break;
-- 
1.7.1



* Re: [PATCH 07/19] timberdale: mfd_cell is now implicitly available to drivers
From: Grant Likely @ 2011-04-07  8:04 UTC (permalink / raw)
  To: Greg KH
  Cc: Andres Salomon, Samuel Ortiz, linux-kernel, Mark Brown, khali,
	ben-linux, Peter Korsgaard, Mauro Carvalho Chehab, David Brownell,
	linux-i2c, linux-media, netdev, spi-devel-general,
	Mocean Laboratories
In-Reply-To: <20110406183854.GA10058-l3A5Bk7waGM@public.gmane.org>

On Wed, Apr 06, 2011 at 11:38:54AM -0700, Greg KH wrote:
> On Wed, Apr 06, 2011 at 11:25:57AM -0700, Andres Salomon wrote:
> > > > We've been faced with the problem of being able to pass both MFD
> > > > related data and a platform_data pointer to some of those drivers.
> > > > Squeezing the MFD bits in the sub driver platform_data pointer
> > > > doesn't work for drivers that know nothing about MFDs. It also adds
> > > > an additional dependency on the MFD API to all MFD sub drivers.
> > > > That prevents any of those drivers to eventually be used as plain
> > > > platform device drivers.
> > > 
> > > Then they shouldn't be "plain" platform drivers, that should only be
> > > reserved for drivers that are the "lowest" type.  Just make them MFD
> > > devices and go from there.
> > 
> > 
> > The problem is of mixing "plain" platform devices and MFD devices.
> 
> Then don't do that.

From my perspective, MFD devices are little more than a bag of
platform_devices, with the MFD layer providing infrastructure for
managing it.  It isn't that there are 'plain' platform devices and
'mfd' devices.  There are only platform_devices, but some of the
drivers use additional data stored in a struct mfd.

Personally, I'm not thrilled with the approach of using struct mfd, or
more specifically making it available to drivers, but on the ugly
scale it isn't very high.

However, the changes on how struct mfd is passed that were merged in
2.6.39 were actively dangerous and are going to be reverted.  Yet
a method is still needed to pass the struct mfd in a safe way.  I
don't have a problem with adding the mfd pointer to struct
platform_device, even if it should just be a stop gap to something
better.

Independently, I have been experimenting with typesafe methods for
attaching data to devices which may very well be the long term
approach, but for the short term I see no problem with adding the mfd
pointer, particularly because it is by far safer than any of the other
immediately available options.

g.


* Re: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-07  8:07 UTC (permalink / raw)
  To: Wei Gu; +Cc: netdev, Alexander Duyck, Jeff Kirsher
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48B9E82@ESGSCCMS0001.eapac.ericsson.se>

Le jeudi 07 avril 2011 à 15:22 +0800, Wei Gu a écrit :
> Hi guys,
> As I discussed with Eric, I get very low performance on the Linux 2.6.38 kernel with the Intel ixgbe-3.2.10 driver.
> I tested different rx ring sizes on the Intel 10G NIC, set with ethtool -G (e.g. rx 4096).
> I get the lowest performance (~50Kpps Rx&Tx) with rx=4096.
> Once I decrease the Rx ring to 512 (default), I can get at most 250Kpps Rx&Tx on 1 NIC.
> 
> I was running this test on an HP DL580 with 4 CPU sockets and a full memory configuration.
> modprobe ixgbe RSS=8,8,8,8,8,8,8,8 FdirMode=0,0,0,0,0,0,0,0 Node=0,0,1,1,2,2,3,3
> numactl --hardware
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
> node 0 size: 65525 MB
> node 0 free: 63053 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
> node 1 size: 65536 MB
> node 1 free: 63388 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
> node 2 size: 65536 MB
> node 2 free: 63344 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
> node 3 size: 65535 MB
> node 3 free: 63376 MB
> 
> Then I bound eth10's rx and tx IRQs to cores "24 25 26 27 28 29 30 31", one by one, which means 1 rx and 1 tx IRQ share 1 core.
> 
> 
> I did the same test on the 2.6.32 kernel: I can get >2.5M tx&rx with the same setup on RHEL6 (2.6.32) Linux. But it never reaches 10.000.000 rx&tx on a single NIC :)
> 
> I also tested the ixgbe driver shipped with 2.6.38. It has the same problem.
> 
> This is a perf record with the Linux-shipped ixgbe driver. It shows a very high irq/s rate, and the softirq is busy in alloc_iova.
> 
> 
> PerfTop:  512417 irqs/sec  kernel:91.3%  exact:  0.0% [1000Hz cpu-clock-msecs],  (all, 64 CPUs)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> -      0.82%     ksoftirqd/24  [kernel.kallsyms]          [k] _raw_spin_unlock_irqrestore
> ▒   - _raw_spin_unlock_irqrestore
> ▒      - 44.27% alloc_iova
> ▒           intel_alloc_iova
> ▒           __intel_map_single
> ▒           intel_map_page
> ▒         - ixgbe_init_interrupt_scheme
> ▒            - 59.97% ixgbe_alloc_rx_buffers
> ▒                 ixgbe_clean_rx_irq
> ▒                 0xffffffffa033a5
> ▒                 net_rx_action
> ▒                 __do_softirq
> ▒               + call_softirq
> ▒            - 40.03% ixgbe_change_mtu
> ▒                 ixgbe_change_mtu
> ▒                 dev_hard_start_xmit
> ▒                 sch_direct_xmit
> ▒                 dev_queue_xmit
> ▒                 vlan_dev_hard_start_xmit
> ▒                 hook_func
> ▒                 nf_iterate
> ▒                 nf_hook_slow
> ▒                 NF_HOOK.clone.1
> ▒                 ip_rcv
> ▒                 __netif_receive_skb
> ▒                 __netif_receive_skb
> ▒                 netif_receive_skb
> ▒                 napi_skb_finish
> ▒                 napi_gro_receive
> ▒                 ixgbe_clean_rx_irq
> ▒                 0xffffffffa033a5
> ▒                 net_rx_action
> ▒                 __do_softirq
> ▒               + call_softirq
> ▒      + 35.85% find_iova
> ▒      + 19.44% add_unmap
> 
> 
> Thanks
> WeiGu

What about using the driver as provided in 2.6.38 ?

No custom module parameter, only play with irq affinities

Say you have 64 queues but want only 8 cpus (24 -> 31) receiving traffic

for i in `seq 0 7`
do
 echo 01000000 >/proc/irq/*/eth1-fp-$i/../smp_affinity
done

for i in `seq 8 15`
do
 echo 02000000 >/proc/irq/*/eth1-fp-$i/../smp_affinity
done

...

for i in `seq 56 63`
do
 echo 80000000 >/proc/irq/*/eth1-fp-$i/../smp_affinity
done
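
The masks in these loops are simply `1 << cpu` printed as eight hex digits, so CPU 24 gives 01000000 and CPU 31 gives 80000000. A minimal sketch that generates the same mapping, assuming 64 queues spread over CPUs 24 to 31 in groups of eight. The helper names `cpu_mask` and `queue_cpu` are illustrative, and the loop only prints the plan instead of writing to /proc/irq:

```shell
#!/bin/sh
# Hex smp_affinity mask with only the bit for CPU $1 set.
# Valid for CPUs 0-31; higher CPUs need comma-separated 32-bit groups.
cpu_mask() {
    printf '%08x\n' "$((1 << $1))"
}

# CPU serving queue $1 when 64 queues are split over CPUs 24..31,
# eight consecutive queues per CPU (0-7 -> 24, 8-15 -> 25, ...).
queue_cpu() {
    echo "$((24 + $1 / 8))"
}

# Dry run: print the plan instead of writing to /proc/irq.
q=0
while [ "$q" -lt 64 ]; do
    cpu=$(queue_cpu "$q")
    echo "queue $q -> cpu $cpu, mask $(cpu_mask "$cpu")"
    q=$((q + 1))
done
```

Printing the plan first makes it easy to compare against /proc/interrupts before touching any smp_affinity file.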


Why is ixgbe_change_mtu() seen on your profile ?
It's damn expensive, since it must call ixgbe_reinit_locked()

Are you using custom code in the kernel ?





* Re: [PATCH 07/19] timberdale: mfd_cell is now implicitly available to drivers
From: Felipe Balbi @ 2011-04-07  8:09 UTC (permalink / raw)
  To: Greg KH
  Cc: Felipe Balbi, Samuel Ortiz, Grant Likely, Andres Salomon,
	linux-kernel, Mark Brown, khali, ben-linux, Peter Korsgaard,
	Mauro Carvalho Chehab, David Brownell, linux-i2c, linux-media,
	netdev, spi-devel-general, Mocean Laboratories
In-Reply-To: <20110406220900.GA16117@suse.de>

Hi,

On Wed, Apr 06, 2011 at 03:09:00PM -0700, Greg KH wrote:
> On Wed, Apr 06, 2011 at 09:59:02PM +0300, Felipe Balbi wrote:
> > Hi,
> > 
> > On Wed, Apr 06, 2011 at 08:47:34PM +0200, Samuel Ortiz wrote:
> > > > > > What is a "MFD cell pointer" and why is it needed in struct device?
> > > > > An MFD cell is an MFD instantiated device.
> > > > > MFD (Multi Function Device) drivers instantiate platform devices. Those
> > > > > devices drivers sometimes need a platform data pointer, sometimes an MFD
> > > > > specific pointer, and sometimes both. Also, some of those drivers have been
> > > > > implemented as MFD sub drivers, while others know nothing about MFD and just
> > > > > expect a plain platform_data pointer.
> > > > 
> > > > That sounds like a bug in those drivers, why not fix them to properly
> > > > pass in the correct pointer?
> > > Because they're drivers for generic IPs, not MFD ones. By forcing them to use
> > > MFD specific structure and APIs, we make it more difficult for platform code
> > > to instantiate them.
> > 
> > I agree. What I do on those cases is to have a simple platform_device
> > for the core IP driver and use platform_device_id tables to do runtime
> > checks of the small differences. If one platform X doesn't use a
> > platform_bus, it uses e.g. PCI, then you make a PCI "bridge" which
> > allocates a platform_device with the correct name and adds that to the
> > driver model.
> > 
> > See [1] (for the core driver) and [2] (for a PCI bridge driver) for an
> > example of what I'm talking about.
> 
> Yes, thanks for providing a real example, this is the best way to handle
> this.

no problem.

ps: that's the driver for the USB3 controller which will come on OMAP5.
The driver is being validated on a pre-silicon platform right now :-D In a few
weeks I'll send the driver for integration.

-- 
balbi


* Re: [PATCH net-next 2/5] be2net: use common method to check for sriov function type
From: Ben Hutchings @ 2011-04-07  8:14 UTC (permalink / raw)
  To: Ajit Khaparde; +Cc: netdev
In-Reply-To: <20110407040801.GA4199@akhaparde-VBox>

On Wed, 2011-04-06 at 23:08 -0500, Ajit Khaparde wrote:
> Lancer and BE can both use SLI_INTF_REG to check a VF or a PF.
[...]

This seems pretty unreliable (both in the previous and the current
version).  You cannot rely on the whole of PCI config space being mapped
to a VM guest.  KVM certainly didn't do this when I used PCI pass-
through.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH net-next] cxgb4: don't hold RTNL during ethtool phys_id
From: Dimitris Michailidis @ 2011-04-07  8:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Casey Leedom, Ben Hutchings, David Miller, netdev
In-Reply-To: <20110406173308.4737e9d4@nehalam>

Stephen Hemminger wrote:
> On Wed, 6 Apr 2011 17:20:29 -0700
> Casey Leedom <leedom@chelsio.com> wrote:
> 
>> | From: Stephen Hemminger <shemminger@linux-foundation.org>
>> | Date: Wednesday, April 06, 2011 05:09 pm
>> | 
>> | The Chelsio cxgb4 drivers implement blinking in a unique way by
>> | waiting on the mailbox. This patch cleans it up slightly by no longer
>> | holding the system wide network configuration lock during the process.
>> | 
>> | The patch also uses correct semantics for the time argument
>> | which is supposed to be in seconds; and zero is supposed
>> | to signify infinite blinking.
>> | 
>> | This is still a bad firmware interface design for this
>> | since it means the board is basically hung while doing the blink.
>> | But fixing it correctly would require hardware and firmware
>> | documentation. With that information the device could be converted
>> | to the new set_phys_id.
>> | 
>> | Compile tested only.
>> | 
>> | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>
>>   Are you assuming that the firmware won't respond with a command completion 
>> until the LED blinking is complete?  If so, that's a bad assumption.  The 
>> firmware runs as an asynchronous real-time OS.  The LED blinking simply becomes 
>> a thread of activity within the OS and the command completes immediately.
>>
>> Casey
> 
> Then how is LED blinking stopped?

You can pass 0 as blinks to cancel your request, which may or may not cancel 
the LED blinking depending on what other drivers have concurrent blinking 
requests in progress.  But you can't pass UINT_MAX as the patch does.  I'll 
fix it up to use the new ethtool interface this week.


* Re: problem of "ipv4: revert Set rt->rt_iif more sanely on output routes."
From: OGAWA Hirofumi @ 2011-04-07  8:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <877hb6sf43.fsf@devron.myhome.or.jp>

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:

> David Miller <davem@davemloft.net> writes:
>
>> So fix is something like:
>>
>> 1) Add "int rt_route_iif;" to struct rtable
>>
>> 2) For input routes, always set rt_route_iif to same value as rt_iif
>>
>> 3) For output routes, always set rt_route_iif to zero.  Set rt_iif
>>    as it is done currently.
>>
>> 4) Change rt_is_{output,input}_route() to test rt_route_iif
>>
>> This should fix the bug and not introduce new regressions.
>>
>> Can you write and test such a patch with your test case?
>
> Ok. I'll try, but I'm not sure I understand the above correctly. Well,
> I'll send the patch after testing.

This patch seems to work for avahi-daemon without any warning.

BTW, did the above mean changing from "fl.iif" (which was there before) to
"rt_route_iif"? If so, this patch is not enough. I'm not sure whether

+	rth->rt_route_iif = 0;
+	rth->rt_iif	= oldflp4->flowi4_oif ? : dev_out->ifindex;

is the correct one or not. Please review.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


