Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Andi Kleen @ 2010-05-02 21:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Eric Dumazet, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100502105418.7abf83a7@infradead.org>

On Sun, May 02, 2010 at 10:54:18AM -0700, Arjan van de Ven wrote:
> On Sun, 02 May 2010 16:27:28 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > C2 latency seems to be 64  (us ?), while C1 seems to be 1
> 
> the processor_idle module has a "latency_factor" module parameter.
> The default is 2, but sometimes people think 6 is a better value...
> .. any chance you can try that value ?
> 
> Also, I'm starting to wonder if Andi's patch to use io_schedule() needs
> to be replaced with a net_schedule() kind of thing. The cpuidle code
> currently has a weight factor for IO (based on measuring/experiments),
> and maybe networking really needs another factor... so just having a
> parallel concept with a different weight could be the right answer for
> that.

We definitely need a net_schedule() for other reasons too: to avoid the blkio 
wait code and then also because networking needs a short "fast idle" timeout 
because the delays are not bounded.  

Otherwise a sender that suddenly stops sending could break all your power 
saving.

I think the reference count used in io_schedule is not the right model for 
this, probably needs a per cpu timeout ("be fast until this time"). Possibly 
a dynamic one feed by the measured input rate.

-Andi

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Andi Kleen @ 2010-05-02 21:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, hadi, xiaosuo, therbert, shemminger, netdev, lenb,
	arjan
In-Reply-To: <1272818131.2173.127.camel@edumazet-laptop>

> You mean we can wakeup a thread with something else than an IPI ?

It's pointless to send an IPI to your thread sibling for this. 
Everything it could do you can do yourself too with the same performance.

-Andi

^ permalink raw reply

* Re: [PATCH] net: fix compile error due to double return type in SOCK_DEBUG
From: David Miller @ 2010-05-02 20:43 UTC (permalink / raw)
  To: jengelh; +Cc: netdev
In-Reply-To: <1272823202-25957-1-git-send-email-jengelh@medozas.de>

From: Jan Engelhardt <jengelh@medozas.de>
Date: Sun,  2 May 2010 20:00:01 +0200

> Fix this one:
> include/net/sock.h: error: two or more data types in declaration specifiers
> 
> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

Applied, thanks Jan.

Shows how many people actually try to mess with that ifdef.
I think we should just kill it off.

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-05-02 19:22 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100502105418.7abf83a7@infradead.org>

Le dimanche 02 mai 2010 à 10:54 -0700, Arjan van de Ven a écrit :
> On Sun, 02 May 2010 16:27:28 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > C2 latency seems to be 64  (us ?), while C1 seems to be 1
> 
> the processor_idle module has a "latency_factor" module parameter.
> The default is 2, but sometimes people think 6 is a better value...
> .. any chance you can try that value ?
> 

I tried 6 and 20, nothing changed ;(

> Also, I'm starting to wonder if Andi's patch to use io_schedule() needs
> to be replaced with a net_schedule() kind of thing. The cpuidle code
> currently has a weight factor for IO (based on measuring/experiments),
> and maybe networking really needs another factor... so just having a
> parallel concept with a different weight could be the right answer for
> that.
> 

But a task blocked on disk IO is probably blocked for a small amount of
time, while on network, it can be for a long time. I am not sure its the
right metric.

I was expecting something based on recent history.
Say if we have 20.000 wakeups per second, most likely we should not
enter C2/C3 states...

> 
> we'll fix powertop to report the marketing name soon.
> 
> 

Ah, I see, thanks :)



^ permalink raw reply

* [PATCH] net: fix compile error due to double return type in SOCK_DEBUG
From: Jan Engelhardt @ 2010-05-02 18:00 UTC (permalink / raw)
  To: davem; +Cc: netdev

Fix this one:
include/net/sock.h: error: two or more data types in declaration specifiers

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/net/sock.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 56df440..0f0b6d6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -74,7 +74,7 @@
 					printk(KERN_DEBUG msg); } while (0)
 #else
 /* Validate arguments and do nothing */
-static void inline int __attribute__ ((format (printf, 2, 3)))
+static inline void __attribute__ ((format (printf, 2, 3)))
 SOCK_DEBUG(struct sock *sk, const char *msg, ...)
 {
 }
-- 
1.7.0.5


^ permalink raw reply related

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Arjan van de Ven @ 2010-05-02 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <1272810448.2173.31.camel@edumazet-laptop>

On Sun, 02 May 2010 16:27:28 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> C2 latency seems to be 64  (us ?), while C1 seems to be 1

the processor_idle module has a "latency_factor" module parameter.
The default is 2, but sometimes people think 6 is a better value...
.. any chance you can try that value ?

Also, I'm starting to wonder if Andi's patch to use io_schedule() needs
to be replaced with a net_schedule() kind of thing. The cpuidle code
currently has a weight factor for IO (based on measuring/experiments),
and maybe networking really needs another factor... so just having a
parallel concept with a different weight could be the right answer for
that.



> 
> Your CPU supports the following C-states : C1 C2 C3 
> Your BIOS reports the following C-states : C1 C2 
> 
> C3 seems to be disabled in BIOS

btw this C2 == marketing name C3, and C3 == marketing name C6

(too many translations ;-)

we'll fix powertop to report the marketing name soon.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-05-02 17:47 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100502104304.6c6b2764@infradead.org>

Le dimanche 02 mai 2010 à 10:43 -0700, Arjan van de Ven a écrit :
> On Sun, 02 May 2010 18:35:31 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote
> > 
> > 
> > You mean we can wakeup a thread with something else than an IPI ?
> > 
> 
> actually we can.
> 
> mwait is not only "go idle", it is "go idle until someone writes to
> <THIS> cacheline". where <THIS> is set up with a "monitor" instruction.
> We don't need to send an ipi per se.. all we need is to write to the
> right cacheline that we're monitoring.
> 
> 

Thats a bit x86 specific, isnt it ?

But we want to eventually send a 'signal' to a cpu, even if not blocked
in idle, so that it can do following action :

/* Called from hardirq (IPI) context */
static void rps_trigger_softirq(void *data)
{
        struct softnet_data *sd = data;

        __napi_schedule(&sd->backlog);
        __get_cpu_var(netdev_rx_stat).received_rps++;
}

And it also should be portable ;)

If something else than an IPI is available, please let us know !

Thanks



^ permalink raw reply

* mmotm 2010-04-28 - RCU whinges
From: Valdis.Kletnieks @ 2010-05-02 17:46 UTC (permalink / raw)
  To: Andrew Morton, Peter Zijlstra, Patrick McHardy, David S. Miller
  Cc: linux-kernel, netfilter-devel, netdev
In-Reply-To: <201004290021.o3T0L04Y028017@imap1.linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]

On Wed, 28 Apr 2010 16:53:32 PDT, akpm@linux-foundation.org said:
> The mm-of-the-moment snapshot 2010-04-28-16-53 has been uploaded to
> 
>    http://userweb.kernel.org/~akpm/mmotm/

I thought we swatted all these, hit another one...

[    9.131490] ctnetlink v0.93: registering with nfnetlink.
[    9.131535]
[    9.131535] ===================================================
[    9.131704] [ INFO: suspicious rcu_dereference_check() usage. ]
[    9.131794] ---------------------------------------------------
[    9.131883] net/netfilter/nf_conntrack_ecache.c:88 invoked rcu_dereference_check() without protection!
[    9.131977]
[    9.131977] other info that might help us debug this:
[    9.131978]
[    9.132218]
[    9.132219] rcu_scheduler_active = 1, debug_locks = 0
[    9.132434] 1 lock held by swapper/1:
[    9.132519]  #0:  (nf_ct_ecache_mutex){+.+...}, at: [<ffffffff8148922d>] nf_conntrack_register_notifier+0x1a/0x75
[    9.132938]
[    9.132939] stack backtrace:
[    9.133129] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #1
[    9.133220] Call Trace:
[    9.133319]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[    9.133410]  [<ffffffff81489250>] nf_conntrack_register_notifier+0x3d/0x75
[    9.133521]  [<ffffffff81b5a157>] ctnetlink_init+0x71/0xd5
[    9.133627]  [<ffffffff81b5a0e6>] ? ctnetlink_init+0x0/0xd5
[    9.133735]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[    9.133843]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[    9.133949]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[    9.134060]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
[    9.134196]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[    9.134328]  [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
[    9.134530] ip_tables: (C) 2000-2006 Netfilter Core Team
[    9.134655] TCP bic registered


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Arjan van de Ven @ 2010-05-02 17:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <1272818131.2173.127.camel@edumazet-laptop>

On Sun, 02 May 2010 18:35:31 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote
> 
> 
> You mean we can wakeup a thread with something else than an IPI ?
> 

actually we can.

mwait is not only "go idle", it is "go idle until someone writes to
<THIS> cacheline". where <THIS> is set up with a "monitor" instruction.
We don't need to send an ipi per se.. all we need is to write to the
right cacheline that we're monitoring.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply

* Re: Kernel panic in fib_rules_lookup (kernel 2.6.32)
From: "Oleg A. Arkhangelsky" @ 2010-05-02 17:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1272817864.2173.123.camel@edumazet-laptop>

Hello,

02.05.10, 18:31, "Eric Dumazet" <eric.dumazet@gmail.com>:

> Le dimanche 02 mai 2010 à 14:46 +0400, "Oleg A. Arkhangelsky" a écrit :

>  > Got the same panic, at the same place (fib_rules_lookup+0xa2/0xd0).
>  > Looks like the problem with NULL dereference is somewhere in
>  > list_for_each_entry_rcu macro. But I don't understand how this can be.
>  > 
>  > Any thoughts? :(
>  > 
>  
>  Do you have any modify rules activity ?

It can be a nice clue, but unfortunately no. Moreover, we don't use rules at
all. Only default template:

0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Also, there are no "floating" interfaces (such as ppp) on this PC. Only routes
are changing periodically (Quagga with full-view BGP sessions).

-- 
wbr, Oleg.

^ permalink raw reply

* Re: [PATCH v3] net: fix softnet_stat
From: Eric Dumazet @ 2010-05-02 16:43 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1272814936-4902-1-git-send-email-xiaosuo@gmail.com>

Le dimanche 02 mai 2010 à 23:42 +0800, Changli Gao a écrit :
> fix softnet_stat
> 
> Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
> without any protection. And enqueue_to_backlog should update the netdev_rx_stat
> of the target CPU.
> 
> This patch renames softnet_data.total to softnet_data.processed: the number of
> packets processed in uppper levels(IP stacks).
> 
> softnet_stat data is moved into softnet_data.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Thats a fine patch, thanks Changli.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>


>  
> @@ -5869,6 +5866,7 @@ static int __init net_dev_init(void)
>  	for_each_possible_cpu(i) {
>  		struct softnet_data *sd = &per_cpu(softnet_data, i);
>  
> +		memset(sd, 0, sizeof(*sd));
>  		skb_queue_head_init(&sd->input_pkt_queue);
>  		skb_queue_head_init(&sd->process_queue);
>  		sd->completion_queue = NULL;

Minor note : 
- You could have removed some NULL initializations after the memset()
- Also, per_cpu data is already 0 initialized, so memset() is not
strictly needed ;)

This is very minor, since we dont have a bss section in per_cpu data, so
it doesnt matter.




^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-05-02 16:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Miller, hadi, xiaosuo, therbert, shemminger, netdev, lenb,
	arjan
In-Reply-To: <20100502154649.GA18063@gargoyle.fritz.box>

Le dimanche 02 mai 2010 à 17:46 +0200, Andi Kleen a écrit :
> > In this test, all 16 queues of one BCM57711E NIC (1Gb link) delivers
> >  packets at about 1.300.000 pps to 16 cpus (one cpu per queue) and these
> > packets are then redistributed by RPS to same 16 cpus, generating about
> > 650.000 IPI per second.
> 
> BTW if rps was SMT aware it could avoid a lot of the IPIs in the first place.

RPS do what you want, just stick a good cpumask, not a unware one :)

In my test, I specifically do something 'stupid' like :

echo fffe >/sys/class/net/bond0.2240/queues/rx-0/rps_cpus
echo fffd >/sys/class/net/bond0.2240/queues/rx-1/rps_cpus
echo fffb >/sys/class/net/bond0.2240/queues/rx-2/rps_cpus
echo fff7 >/sys/class/net/bond0.2240/queues/rx-3/rps_cpus

echo ffef >/sys/class/net/bond0.2240/queues/rx-4/rps_cpus
echo ffdf >/sys/class/net/bond0.2240/queues/rx-5/rps_cpus
echo ffbf >/sys/class/net/bond0.2240/queues/rx-6/rps_cpus
echo ff7f >/sys/class/net/bond0.2240/queues/rx-7/rps_cpus

echo feff >/sys/class/net/bond0.2240/queues/rx-8/rps_cpus
echo fdff >/sys/class/net/bond0.2240/queues/rx-9/rps_cpus
echo fbff >/sys/class/net/bond0.2240/queues/rx-10/rps_cpus
echo f7ff >/sys/class/net/bond0.2240/queues/rx-11/rps_cpus

echo efff >/sys/class/net/bond0.2240/queues/rx-12/rps_cpus
echo dfff >/sys/class/net/bond0.2240/queues/rx-13/rps_cpus
echo bfff >/sys/class/net/bond0.2240/queues/rx-14/rps_cpus
echo 7fff >/sys/class/net/bond0.2240/queues/rx-15/rps_cpus

echo 0001 >/proc/irq/*/eth1-fp-0/../smp_affinity
echo 0002 >/proc/irq/*/eth1-fp-1/../smp_affinity
echo 0004 >/proc/irq/*/eth1-fp-2/../smp_affinity
echo 0008 >/proc/irq/*/eth1-fp-3/../smp_affinity
echo 0010 >/proc/irq/*/eth1-fp-4/../smp_affinity
echo 0020 >/proc/irq/*/eth1-fp-5/../smp_affinity
echo 0040 >/proc/irq/*/eth1-fp-6/../smp_affinity
echo 0080 >/proc/irq/*/eth1-fp-7/../smp_affinity
echo 0100 >/proc/irq/*/eth1-fp-8/../smp_affinity
echo 0200 >/proc/irq/*/eth1-fp-9/../smp_affinity
echo 0400 >/proc/irq/*/eth1-fp-10/../smp_affinity
echo 0800 >/proc/irq/*/eth1-fp-11/../smp_affinity
echo 1000 >/proc/irq/*/eth1-fp-12/../smp_affinity
echo 2000 >/proc/irq/*/eth1-fp-13/../smp_affinity
echo 4000 >/proc/irq/*/eth1-fp-14/../smp_affinity
echo 8000 >/proc/irq/*/eth1-fp-15/../smp_affinity


You mean we can wakeup a thread with something else than an IPI ?




^ permalink raw reply

* Re: Re: Kernel panic in fib_rules_lookup (kernel 2.6.32)
From: Eric Dumazet @ 2010-05-02 16:31 UTC (permalink / raw)
  To: "Oleg A. Arkhangelsky"; +Cc: netdev
In-Reply-To: <382091272797186@web84.yandex.ru>

Le dimanche 02 mai 2010 à 14:46 +0400, "Oleg A. Arkhangelsky" a écrit :
> Hello,
> 
> 09.03.10, 17:09, "Eric Dumazet" <eric.dumazet@gmail.com>:
> 
> > Le mardi 09 mars 2010 à 10:44 +0300, "Oleg A. Arkhangelsky" a écrit :
> >  > Hello,
> >  > 
> >  > Got this kernel panic tomorrow. This PC is rather heavy loaded router with BGP full view (> 300K routes).
> >  > We're using FIB_TRIE. Last time we got similar panic about 1 month ago. Please, let me know if you
> >  > need additional information to debug (e.g. objdump). Thanks!
> >  > 
> >  > Mar  9 10:08:55 bras-1 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
> >  > Mar  9 10:08:55 bras-1 kernel: IP: [] fib_rules_lookup+0xa2/0xd0
> >  > Mar  9 10:08:55 bras-1 kernel: *pde = 00000000
> >  > Mar  9 10:08:55 bras-1 kernel: Thread overran stack, or stack corrupted
> >  
> >  Hmm...
> 
> Got the same panic, at the same place (fib_rules_lookup+0xa2/0xd0). Looks like the problem with NULL dereference is somewhere in list_for_each_entry_rcu macro. But I don't understand how this can be.
> 
> Any thoughts? :(
> 

Do you have any modify rules activity ?

I dont understand why we need synchronize_rcu() in fib_nl_delrule() but
this certainly not a bug.




^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Andi Kleen @ 2010-05-02 15:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, hadi, xiaosuo, therbert, shemminger, netdev, lenb,
	arjan
In-Reply-To: <1272797690.2173.26.camel@edumazet-laptop>

> In this test, all 16 queues of one BCM57711E NIC (1Gb link) delivers
>  packets at about 1.300.000 pps to 16 cpus (one cpu per queue) and these
> packets are then redistributed by RPS to same 16 cpus, generating about
> 650.000 IPI per second.

BTW if rps was SMT aware it could avoid a lot of the IPIs in the first place.

-Andi


^ permalink raw reply

* [PATCH v3] net: fix softnet_stat
From: Changli Gao @ 2010-05-02 15:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, netdev, Changli Gao

fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch renames softnet_data.total to softnet_data.processed: the number of
packets processed in uppper levels(IP stacks).

softnet_stat data is moved into softnet_data.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |   17 +++++++----------
 net/core/dev.c            |   26 ++++++++++++--------------
 net/sched/sch_generic.c   |    2 +-
 3 files changed, 20 insertions(+), 25 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 40d4c20..c39938f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -218,16 +218,6 @@ struct neighbour;
 struct neigh_parms;
 struct sk_buff;
 
-struct netif_rx_stats {
-	unsigned total;
-	unsigned dropped;
-	unsigned time_squeeze;
-	unsigned cpu_collision;
-	unsigned received_rps;
-};
-
-DECLARE_PER_CPU(struct netif_rx_stats, netdev_rx_stat);
-
 struct netdev_hw_addr {
 	struct list_head	list;
 	unsigned char		addr[MAX_ADDR_LEN];
@@ -1390,6 +1380,12 @@ struct softnet_data {
 	struct sk_buff		*completion_queue;
 	struct sk_buff_head	process_queue;
 
+	/* stats */
+	unsigned		processed;
+	unsigned		time_squeeze;
+	unsigned		cpu_collision;
+	unsigned		received_rps;
+
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
 
@@ -1399,6 +1395,7 @@ struct softnet_data {
 	unsigned int		cpu;
 	unsigned int		input_queue_head;
 #endif
+	unsigned		dropped;
 	struct sk_buff_head	input_pkt_queue;
 	struct napi_struct	backlog;
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 100dcbd..36d53be 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2205,8 +2205,6 @@ int netdev_max_backlog __read_mostly = 1000;
 int netdev_budget __read_mostly = 300;
 int weight_p __read_mostly = 64;            /* old backlog weight */
 
-DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
-
 #ifdef CONFIG_RPS
 
 /* One global table that all flow-based protocols share. */
@@ -2366,7 +2364,7 @@ static void rps_trigger_softirq(void *data)
 	struct softnet_data *sd = data;
 
 	__napi_schedule(&sd->backlog);
-	__get_cpu_var(netdev_rx_stat).received_rps++;
+	sd->received_rps++;
 }
 
 #endif /* CONFIG_RPS */
@@ -2405,7 +2403,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	sd = &per_cpu(softnet_data, cpu);
 
 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;
 
 	rps_lock(sd);
 	if (skb_queue_len(&sd->input_pkt_queue) <= netdev_max_backlog) {
@@ -2429,9 +2426,9 @@ enqueue:
 		goto enqueue;
 	}
 
+	sd->dropped++;
 	rps_unlock(sd);
 
-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);
 
 	kfree_skb(skb);
@@ -2806,7 +2803,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 			skb->dev = master;
 	}
 
-	__get_cpu_var(netdev_rx_stat).total++;
+	__get_cpu_var(softnet_data).processed++;
 
 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
@@ -3490,7 +3487,7 @@ out:
 	return;
 
 softnet_break:
-	__get_cpu_var(netdev_rx_stat).time_squeeze++;
+	sd->time_squeeze++;
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 	goto out;
 }
@@ -3691,17 +3688,17 @@ static int dev_seq_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
-static struct netif_rx_stats *softnet_get_online(loff_t *pos)
+static struct softnet_data *softnet_get_online(loff_t *pos)
 {
-	struct netif_rx_stats *rc = NULL;
+	struct softnet_data *sd = NULL;
 
 	while (*pos < nr_cpu_ids)
 		if (cpu_online(*pos)) {
-			rc = &per_cpu(netdev_rx_stat, *pos);
+			sd = &per_cpu(softnet_data, *pos);
 			break;
 		} else
 			++*pos;
-	return rc;
+	return sd;
 }
 
 static void *softnet_seq_start(struct seq_file *seq, loff_t *pos)
@@ -3721,12 +3718,12 @@ static void softnet_seq_stop(struct seq_file *seq, void *v)
 
 static int softnet_seq_show(struct seq_file *seq, void *v)
 {
-	struct netif_rx_stats *s = v;
+	struct softnet_data *sd = v;
 
 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   sd->processed, sd->dropped, sd->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
-		   s->cpu_collision, s->received_rps);
+		   sd->cpu_collision, sd->received_rps);
 	return 0;
 }
 
@@ -5869,6 +5866,7 @@ static int __init net_dev_init(void)
 	for_each_possible_cpu(i) {
 		struct softnet_data *sd = &per_cpu(softnet_data, i);
 
+		memset(sd, 0, sizeof(*sd));
 		skb_queue_head_init(&sd->input_pkt_queue);
 		skb_queue_head_init(&sd->process_queue);
 		sd->completion_queue = NULL;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index aeddabf..a969b11 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -94,7 +94,7 @@ static inline int handle_dev_cpu_collision(struct sk_buff *skb,
 		 * Another cpu is holding lock, requeue & delay xmits for
 		 * some time.
 		 */
-		__get_cpu_var(netdev_rx_stat).cpu_collision++;
+		__get_cpu_var(softnet_data).cpu_collision++;
 		ret = dev_requeue_skb(skb, q);
 	}
 

^ permalink raw reply related

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-05-02 15:32 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <1272810448.2173.31.camel@edumazet-laptop>

Le dimanche 02 mai 2010 à 16:27 +0200, Eric Dumazet a écrit :
> Le dimanche 02 mai 2010 à 07:13 -0700, Arjan van de Ven a écrit :
> > > 
> > > Cn                Avg residency       P-states (frequencies)
> > > C0 (cpu running)        (68.9%)         2.93 Ghz    46.5%
> > > polling           0.0ms ( 0.0%)         2.80 Ghz     5.1%
> > > C1 mwait          0.0ms ( 0.0%)         2.53 Ghz     3.0%
> > > C2 mwait          0.0ms (31.1%)         2.13 Ghz     2.8%
> > >                                         1.60 Ghz    38.2%
> > 
> > I bet your system advertizes C2 with the same latency as C1,
> > but with lower power... which means Linux will pretty much never
> > pick C1.... no matter how much you take Andi's patch.
> > 
> > this is a bios thing... and until we put in the patch to override the
> > bios values (I can dust it off but it might need a bit of tweaking
> > since it was against .31) Andi's patch alone won't cut it... you also
> > need a non-lying bios ;)
> > 
> > 
> > 
> # pwd
> /sys/devices/system/cpu/cpu15/cpuidle
> # grep . */*
> state0/desc:CPUIDLE CORE POLL IDLE
> state0/latency:0
> state0/name:C0
> state0/power:4294967295
> state0/time:0
> state0/usage:0
> state1/desc:ACPI FFH INTEL MWAIT 0x0
> state1/latency:1
> state1/name:C1
> state1/power:1000
> state1/time:433855186
> state1/usage:126869
> state2/desc:ACPI FFH INTEL MWAIT 0x10
> state2/latency:64
> state2/name:C2
> state2/power:500
> state2/time:198095020416
> state2/usage:76287744
> 
> C2 latency seems to be 64  (us ?), while C1 seems to be 1
> 
> BIOS Information
> 	Vendor: HP
> 	Version: I24
> 	Release Date: 10/01/2009
> 
> # powertop
> PowerTOP 1.11   (C) 2007, 2008 Intel Corporation 
> 
> Collecting data for 5 seconds 
> 
> 
> Your CPU supports the following C-states : C1 C2 C3 
> Your BIOS reports the following C-states : C1 C2 
> 
> C3 seems to be disabled in BIOS
> 

I took a look at BIOS settings and enabled the minimum sleep state to be
C6 (instead of C3, the default). Now we see C3 being available...

No changes, only more IPI delivered during the test, and more overhead
in clockevents_notify()

# grep . */*
state0/desc:CPUIDLE CORE POLL IDLE
state0/latency:0
state0/name:C0
state0/power:4294967295
state0/time:0
state0/usage:0
state1/desc:ACPI FFH INTEL MWAIT 0x0
state1/latency:1
state1/name:C1
state1/power:1000
state1/time:39432
state1/usage:119
state2/desc:ACPI FFH INTEL MWAIT 0x10
state2/latency:64
state2/name:C2
state2/power:500
state2/time:3170745
state2/usage:11177
state3/desc:ACPI FFH INTEL MWAIT 0x20
state3/latency:96
state3/name:C3
state3/power:350
state3/time:1030987453
state3/usage:14047019

---------------------------------------------------------------------------------------------------------------------------
   PerfTop:   15984 irqs/sec  kernel:98.5% [1000Hz cycles],  (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                       DSO
             _______ _____ ______________________________ _______

            23822.00 40.2% _raw_spin_lock_irqsave         vmlinux
             4413.00  7.4% acpi_os_read_port              vmlinux
             1426.00  2.4% _raw_spin_lock                 vmlinux
             1284.00  2.2% _raw_spin_unlock_irqrestore    vmlinux
             1247.00  2.1% schedule                       vmlinux
             1137.00  1.9% bnx2x_rx_int                   vmlinux
              643.00  1.1% tick_broadcast_oneshot_control vmlinux
              597.00  1.0% copy_user_generic_string       vmlinux
              595.00  1.0% __napi_complete                vmlinux
              550.00  0.9% call_function_single_interrupt vmlinux
              548.00  0.9% bnx2x_msix_fp_int              vmlinux
              486.00  0.8% __netif_receive_skb            vmlinux
              461.00  0.8% bnx2x_poll                     vmlinux
              433.00  0.7% eth_type_trans                 vmlinux
              428.00  0.7% acpi_idle_enter_bm             vmlinux
              422.00  0.7% sock_recv_ts_and_drops         vmlinux
              382.00  0.6% __udp4_lib_lookup              vmlinux
              369.00  0.6% __slab_free                    vmlinux
              357.00  0.6% ip_route_input                 vmlinux
              341.00  0.6% kfree                          vmlinux
              335.00  0.6% ipt_do_table                   vmlinux
              334.00  0.6% ip_rcv                         vmlinux
              332.00  0.6% udp_recvmsg                    vmlinux
              317.00  0.5% __kmalloc_node_track_caller    vmlinux

    37.46%            init  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
                      |
                      --- _raw_spin_lock_irqsave
                         |          
                         |--95.58%-- clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                         |--3.27%-- tick_broadcast_oneshot_control
                         |          tick_notify
                         |          notifier_call_chain
                         |          __raw_notifier_call_chain
                         |          raw_notifier_call_chain
                         |          clockevents_do_notify
                         |          clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm



^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-05-02 14:27 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100502071324.1a95fdad@infradead.org>

Le dimanche 02 mai 2010 à 07:13 -0700, Arjan van de Ven a écrit :
> > 
> > Cn                Avg residency       P-states (frequencies)
> > C0 (cpu running)        (68.9%)         2.93 Ghz    46.5%
> > polling           0.0ms ( 0.0%)         2.80 Ghz     5.1%
> > C1 mwait          0.0ms ( 0.0%)         2.53 Ghz     3.0%
> > C2 mwait          0.0ms (31.1%)         2.13 Ghz     2.8%
> >                                         1.60 Ghz    38.2%
> 
> I bet your system advertizes C2 with the same latency as C1,
> but with lower power... which means Linux will pretty much never
> pick C1.... no matter how much you take Andi's patch.
> 
> this is a bios thing... and until we put in the patch to override the
> bios values (I can dust it off but it might need a bit of tweaking
> since it was against .31) Andi's patch alone won't cut it... you also
> need a non-lying bios ;)
> 
> 
> 
# pwd
/sys/devices/system/cpu/cpu15/cpuidle
# grep . */*
state0/desc:CPUIDLE CORE POLL IDLE
state0/latency:0
state0/name:C0
state0/power:4294967295
state0/time:0
state0/usage:0
state1/desc:ACPI FFH INTEL MWAIT 0x0
state1/latency:1
state1/name:C1
state1/power:1000
state1/time:433855186
state1/usage:126869
state2/desc:ACPI FFH INTEL MWAIT 0x10
state2/latency:64
state2/name:C2
state2/power:500
state2/time:198095020416
state2/usage:76287744

C2 latency seems to be 64  (us ?), while C1 seems to be 1

BIOS Information
	Vendor: HP
	Version: I24
	Release Date: 10/01/2009

# powertop
PowerTOP 1.11   (C) 2007, 2008 Intel Corporation 

Collecting data for 5 seconds 


Your CPU supports the following C-states : C1 C2 C3 
Your BIOS reports the following C-states : C1 C2 

C3 seems to be disabled in BIOS



^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Arjan van de Ven @ 2010-05-02 14:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <1272797690.2173.26.camel@edumazet-laptop>

> 
> Cn                Avg residency       P-states (frequencies)
> C0 (cpu running)        (68.9%)         2.93 Ghz    46.5%
> polling           0.0ms ( 0.0%)         2.80 Ghz     5.1%
> C1 mwait          0.0ms ( 0.0%)         2.53 Ghz     3.0%
> C2 mwait          0.0ms (31.1%)         2.13 Ghz     2.8%
>                                         1.60 Ghz    38.2%

I bet your system advertizes C2 with the same latency as C1,
but with lower power... which means Linux will pretty much never
pick C1.... no matter how much you take Andi's patch.

this is a bios thing... and until we put in the patch to override the
bios values (I can dust it off but it might need a bit of tweaking
since it was against .31) Andi's patch alone won't cut it... you also
need a non-lying bios ;)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply

* Re: [PATCH 0/3] [RFC] ptp: IEEE 1588 clock support
From: Wolfgang Grandegger @ 2010-05-02 11:51 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <20100429153401.GA26741@riccoc20.at.omicron.at>

Richard Cochran wrote:
> On Thu, Apr 29, 2010 at 02:02:59PM +0200, Wolfgang Grandegger wrote:
>> I realized two other netdev drivers already supporting PTP timestamping:
>> igb and bfin_mac. From the PTP developer point of view, the interface
>> looks rather complete to me and it works fine on my MPC8313 setup.
> 
> Do you know whether these two also have PTP clocks? If so, is the API
> that I suggested going to work for controlling those clocks, too?
> 
>> The only thing I stumbled over was that PTP clock registration
>> failed when PTP support is statically linked into the kernel.
> 
> Okay, will look into that...

With subsys_initcall(), ptp_gianfar_init() is called very early and
*before* ptp_init(). It works fine with module_init(). The
ptp_gianfar_init() is then called after gianfar_init().

There is another minor issue with module init/probe ordering.
gianfar_ptp_probe() does overtake the time from the system clock, which
may happen before the RTC is probed and initialized. But syncing with
the system time is a separate issue anyway.

Wolfgang.

^ permalink raw reply

* Re: ep93xx_eth stopps receiving packages
From: Lennert Buytenhek @ 2010-05-02 10:43 UTC (permalink / raw)
  To: Stefan Agner; +Cc: netdev
In-Reply-To: <20100419173813.7750395f4fkkmrk0@limpopo.deheime.ch>

On Mon, Apr 19, 2010 at 05:38:13PM +0200, Stefan Agner wrote:

> I'm using Linux 2.6.32.9 on a technologic systems TS-7250 SBC board, with
> the ep93xx_eth driver for networking. On three identical, but independent
> systems I noted that the system is unreachable after a while. On a serial
> terminal I noted that only the TX counter counts onward, RX stays where it 
> is,
> no matter if i try to ping from or to the system. Wireshark tells me exactly
> that too: I see helpless ARP requests which gets answered, but no ICMP. The
> system doesnt receive the ARP requests, and just sends another one.

(So does the board or does it not respond to ARP requests for its IP?)


> With a simple program which sends small packages in a fast pace I can
> reproduce the problem after several seconds (additional CPU load seem to
> provoke the problem even more). Remove and replug the network cable doesn't
> solve the problem, but ifup/down does. I don't see any messages in dmesg,
> memory is still available.

Do you see interrupts increasing in /proc/interrupts when this happens?

^ permalink raw reply

* Re: Re: Kernel panic in fib_rules_lookup (kernel 2.6.32)
From: "Oleg A. Arkhangelsky" @ 2010-05-02 10:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1268154568.3113.19.camel@edumazet-laptop>

Hello,

09.03.10, 17:09, "Eric Dumazet" <eric.dumazet@gmail.com>:

> Le mardi 09 mars 2010 à 10:44 +0300, "Oleg A. Arkhangelsky" a écrit :
>  > Hello,
>  > 
>  > Got this kernel panic tomorrow. This PC is rather heavy loaded router with BGP full view (> 300K routes).
>  > We're using FIB_TRIE. Last time we got similar panic about 1 month ago. Please, let me know if you
>  > need additional information to debug (e.g. objdump). Thanks!
>  > 
>  > Mar  9 10:08:55 bras-1 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
>  > Mar  9 10:08:55 bras-1 kernel: IP: [] fib_rules_lookup+0xa2/0xd0
>  > Mar  9 10:08:55 bras-1 kernel: *pde = 00000000
>  > Mar  9 10:08:55 bras-1 kernel: Thread overran stack, or stack corrupted
>  
>  Hmm...

Got the same panic, at the same place (fib_rules_lookup+0xa2/0xd0). Looks like the problem with NULL dereference is somewhere in list_for_each_entry_rcu macro. But I don't understand how this can be.

Any thoughts? :(

-- 
wbr, Oleg.

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Eric Dumazet @ 2010-05-02 10:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Miller, hadi, xiaosuo, therbert, shemminger, netdev, lenb,
	arjan
In-Reply-To: <20100502092020.GA9655@gargoyle.fritz.box>

Le dimanche 02 mai 2010 à 11:20 +0200, Andi Kleen a écrit :
> > I tried it on the right spot (since my bench was only doing recvmsg()
> > calls, I had to patch wait_for_packet() in net/core/datagram.c
> > 
> > udp_recvmsg -> __skb_recv_datagram -> wait_for_packet ->
> > schedule_timeout
> > 
> > Unfortunatly, using io_schedule_timeout() did not solve the problem.
> 
> Hmm, too bad. Weird.
> 
> > 
> > Tell me if you need some traces or something.
> 
> I'll try to reproduce it and see what I can do.
> 

Here the perf report on the latest test done, I confirm I am using
io_schedule_timeout() in this kernel.

In this test, all 16 queues of one BCM57711E NIC (1Gb link) delivers
 packets at about 1.300.000 pps to 16 cpus (one cpu per queue) and these
packets are then redistributed by RPS to same 16 cpus, generating about
650.000 IPI per second.

top says :
Cpu(s):  3.0%us, 17.3%sy,  0.0%ni, 22.4%id, 28.2%wa,  0.0%hi, 29.1%si,
0.0%st


# Samples: 321362570767
#
# Overhead         Command                 Shared Object  Symbol
# ........  ..............  ............................  ......
#
    25.08%            init  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
                      |
                      --- _raw_spin_lock_irqsave
                         |          
                         |--93.47%-- clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                         |--4.70%-- tick_broadcast_oneshot_control
                         |          tick_notify
                         |          notifier_call_chain
                         |          __raw_notifier_call_chain
                         |          raw_notifier_call_chain
                         |          clockevents_do_notify
                         |          clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                         |--0.64%-- generic_exec_single
                         |          __smp_call_function_single
                         |          net_rps_action_and_irq_enable
...
     9.72%            init  [kernel.kallsyms]             [k] acpi_os_read_port
                      |
                      --- acpi_os_read_port
                         |          
                         |--99.45%-- acpi_hw_read_port
                         |          acpi_hw_read
                         |          acpi_hw_read_multiple
                         |          acpi_hw_register_read
                         |          acpi_read_bit_register
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                          --0.55%-- acpi_hw_read
                                    acpi_hw_read_multiple

powertop says :
     PowerTOP version 1.11      (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (68.9%)         2.93 Ghz    46.5%
polling           0.0ms ( 0.0%)         2.80 Ghz     5.1%
C1 mwait          0.0ms ( 0.0%)         2.53 Ghz     3.0%
C2 mwait          0.0ms (31.1%)         2.13 Ghz     2.8%
                                        1.60 Ghz    38.2%

Wakeups-from-idle per second : 45177.8  interval: 5.0s
no ACPI power usage estimate available

Top causes for wakeups:
   9.9% (40863.0)       <interrupt> : eth1-fp-7 
   9.9% (40861.0)       <interrupt> : eth1-fp-8 
   9.9% (40858.0)       <interrupt> : eth1-fp-5 
   9.9% (40855.2)       <interrupt> : eth1-fp-10 
   9.9% (40847.6)       <interrupt> : eth1-fp-14 
   9.9% (40847.2)       <interrupt> : eth1-fp-12 
   9.9% (40835.0)       <interrupt> : eth1-fp-1 
   9.9% (40834.2)       <interrupt> : eth1-fp-3 
   9.9% (40834.0)       <interrupt> : eth1-fp-6 
   9.9% (40829.6)       <interrupt> : eth1-fp-4 
   1.0% (4002.0)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 
   0.4% (1725.6)       <interrupt> : extra timer interrupt 
   0.0% (  4.0)     <kernel core> : usb_hcd_poll_rh_status (rh_timer_func)
   0.0% (  2.0)     <kernel core> : clocksource_watchdog (clocksource_watchdog)
   0.0% (  2.0)             snmpd : hrtimer_start_range_ns (hrtimer_wakeup)



^ permalink raw reply

* Re: [PATCH 1/3] ptp: Added a brand new class driver for ptp clocks.
From: Wolfgang Grandegger @ 2010-05-02 10:50 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <20100429091936.GA6703@riccoc20.at.omicron.at>

Hi Richard,

Richard Cochran wrote:
> This patch adds an infrastructure for hardware clocks that implement
> IEEE 1588, the Precision Time Protocol (PTP). A class driver offers a
> registration method to particular hardware clock drivers. Each clock is
> exposed to user space as a character device with ioctls that allow tuning
> of the PTP clock.
> 
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
...
> diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
> new file mode 100644
> index 0000000..a5acac4
> --- /dev/null
> +++ b/drivers/ptp/ptp_clock.c
...
> +static int ptp_open(struct inode *inode, struct file *fp)
> +{
> +	struct ptp_clock *ptp;
> +	ptp = container_of(inode->i_cdev, struct ptp_clock, cdev);
> +
> +	if (mutex_lock_interruptible(&ptp->mux))
> +		return -ERESTARTSYS;
> +
> +	fp->private_data = ptp;
> +
> +	return 0;
> +}
...
> +static int ptp_release(struct inode *inode, struct file *fp)
> +{
> +	struct ptp_clock *ptp = fp->private_data;
> +	mutex_unlock(&ptp->mux);
> +	return 0;
> +}

As long as the device is in use by an application, no other can access
it, because the mutex is locked. Other application may want to read the
PTP clock time while ptpd is running, though.

Wolfgang.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: eth_type_trans() should inline skb_pull()
From: David Miller @ 2010-05-02 10:03 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, therbert, hadi
In-Reply-To: <1272783032.2173.8.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 02 May 2010 08:50:32 +0200

> Excellent !

Great, here is the commit message I will use:

--------------------
net: Inline skb_pull() in eth_type_trans().

In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot")
we uninlined skb_pull.

But in some critical paths it makes sense to inline this thing
and it helps performance significantly.

Create an skb_pull_inline() so that we can do this in a way that
serves also as annotation.

Based upon a patch by Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
--------------------

> Could we assume all eth_type_trans() must call it with initial
> skb->len >= (46 + 12) or not ?  (According to ethernet specs, all
> frames should have a minimum payload of 46 bytes)
>
> If not sure, maybe we should issue a WARN_ON_ONCE()
> 
> If yes, tests could be removed and we could gain two cycles ;)

Isn't the minimum ETH_ZLEN?

But yes, regardless of whether the minimum ethernet frame is 58 or 60
bytes, we should require it's at least that big, and use that test
consistently throughout.

Anything smaller is a runt packet and should be tossed or marked as an
error in some other way by the hardware.  They should never make it to
eth_type_trans().

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Andi Kleen @ 2010-05-02  9:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, hadi, xiaosuo, therbert, shemminger, netdev, lenb,
	arjan
In-Reply-To: <1272783366.2173.13.camel@edumazet-laptop>

> I tried it on the right spot (since my bench was only doing recvmsg()
> calls, I had to patch wait_for_packet() in net/core/datagram.c
> 
> udp_recvmsg -> __skb_recv_datagram -> wait_for_packet ->
> schedule_timeout
> 
> Unfortunatly, using io_schedule_timeout() did not solve the problem.

Hmm, too bad. Weird.

> 
> Tell me if you need some traces or something.

I'll try to reproduce it and see what I can do.

-Andi


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox