Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [GIT PULL] vhost-net fix for 2.6.34-rc4
From: Michael S. Tsirkin @ 2010-04-15  4:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, hch
In-Reply-To: <20100414.155715.210946931.davem@davemloft.net>

On Wed, Apr 14, 2010 at 03:57:15PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Wed, 14 Apr 2010 17:17:33 +0300
> 
> > Christoph Hellwig (1):
> >       vhost: fix sparse warnings
> 
> Do the sparse warnings show actual real bugs, or are we just
> adding annotations to make the tool happy?
> 
> Unless the bugs being fixed are earth shattering, this belongs
> in net-next-2.6 not net-2.6

No, these are just annotations to make the tool happy.
Pls queue for net-next-2.6. Thanks!

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Eric Dumazet @ 2010-04-15  4:14 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271302846.16881.1828.camel@edumazet-laptop>

Le jeudi 15 avril 2010 à 05:40 +0200, Eric Dumazet a écrit :
> Le jeudi 15 avril 2010 à 05:28 +0200, Eric Dumazet a écrit :
> > >  
> > > +	per_cpu(netdev_rx_stat, cpu).dropped++;
> > 
> > This is a bit expensive, and could be a queue->rx_stat.dropped++
> > if netdev_rx_stat is moved inside queue structure.
> 
> Hmm, this is not really needed, this event is the slow path and should
> almost never happen. This part of your fix is fine.
> 
> 
> -

Oh well, I revert this comment, we need a mutual exclusion between cpus
if we dont want to miss some increments, even in a slow path.

__get_cpu_var(netdev_rx_stat).field++; is safe because updated only by
this cpu.

Once you use per_cpu(netdev_rx_stat, cpu).field++; you need a
synchronization since this is not atomic.

so queue->dropped++ is safer and faster too (inside the rps_lock
section)

^ permalink raw reply

* Re: [PATCH] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  4:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Tom Herbert, netdev
In-Reply-To: <1271304846.16881.1881.camel@edumazet-laptop>

On Thu, Apr 15, 2010 at 12:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 15 avril 2010 à 05:40 +0200, Eric Dumazet a écrit :
>
> Oh well, I revert this comment, we need a mutual exclusion between cpus
> if we dont want to miss some increments, even in a slow path.
>
> __get_cpu_var(netdev_rx_stat).field++; is safe because updated only by
> this cpu.
>
> Once you use per_cpu(netdev_rx_stat, cpu).field++; you need a
> synchronization since this is not atomic.
>

rps_unlock() is a synchronization if needed.

> so queue->dropped++ is safer and faster too (inside the rps_lock
> section)
>

It involves more lines of code, moving netdev_rx_stat into softdata.

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH v2] net: fix softnet_stat
From: Changli Gao @ 2010-04-15  5:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Herbert, Eric Dumazet, netdev, Changli Gao

fix softnet_stat

Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.

This patch splits softnet_data.total into softnet_data.received and
softnet_data.dropped internally, in order to avoid it sharing between IRQ and
SoftIRQ. The old ABI softnet_data.total is kept by summing softnet_data.received
and softnet_data.dropped when exporting it to userland. And softnet_data.dropped
is protected in the corresponding input_pkt_queue.lock, if RPS is enabled.

This patch also fixed a bug: packets received by enqueue_to_backlog() were
counted twice: one in enqueue_to_backlog(), and the other in
__netif_receive_skb(), although they maybe not on the same CPUs, if RPS is
involved.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |    7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d1a21b5..394e850 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -219,7 +219,7 @@ struct neigh_parms;
 struct sk_buff;

 struct netif_rx_stats {
-	unsigned total;
+	unsigned received;
 	unsigned dropped;
 	unsigned time_squeeze;
 	unsigned cpu_collision;
diff --git a/net/core/dev.c b/net/core/dev.c
index a10a216..5817f0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2336,7 +2336,6 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
 	queue = &per_cpu(softnet_data, cpu);

 	local_irq_save(flags);
-	__get_cpu_var(netdev_rx_stat).total++;

 	rps_lock(queue);
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -2366,9 +2365,9 @@ enqueue:
 		goto enqueue;
 	}

+	per_cpu(netdev_rx_stat, cpu).dropped++;
 	rps_unlock(queue);

-	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);

 	kfree_skb(skb);
@@ -2679,7 +2678,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 			skb->dev = master;
 	}

-	__get_cpu_var(netdev_rx_stat).total++;
+	__get_cpu_var(netdev_rx_stat).received++;

 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
@@ -3565,7 +3564,7 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
 	struct netif_rx_stats *s = v;

 	seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
-		   s->total, s->dropped, s->time_squeeze, 0,
+		   s->received + s->dropped, s->dropped, s->time_squeeze, 0,
 		   0, 0, 0, 0, /* was fastroute */
 		   s->cpu_collision, s->received_rps);
 	return 0;

^ permalink raw reply related

* RE: [PATCH] Infiniband: Randomize local port allocation.
From: Tetsuo Handa @ 2010-04-15  5:46 UTC (permalink / raw)
  To: sean.hefty; +Cc: netdev, rolandd, amwang
In-Reply-To: <195EB1AD388F455AA386EB214FCD09F4@amr.corp.intel.com>

Tetsuo Handa wrote:
> Sean Hefty wrote:
> > >+	remaining = (high - low) + 1;
> > >+	rover = net_random() % remaining + low;
> > >+	do {
> > >+		rover++;
> > >+		if ((rover < low) || (rover > high))
> > >+			rover = low;
> > 
> > Assuming that we're likely to pick a valid port on the first try, it would be
> > more efficient to move the above 3 lines to the end of the while loop.
> > 
> Indeed. I moved these lines to "if (--remaining) { ... }" block.

Oh, I made more changes than needed.
Adding "goto start;" and "start:" are sufficient.
Please use whichever you like.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   71 +++++++++++++++---------------------------
 1 file changed, 26 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,34 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	goto start;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+start:
+		if (last_used_port != rover &&
+		    !idr_find(ps, (unsigned short) rover)) {
+			int ret = cma_alloc_port(ps, id_priv, rover);
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			if (!ret)
+				last_used_port = rover;
+			if (ret != -EADDRNOTAVAIL)
+				return ret;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2981,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)

^ permalink raw reply

* Re: [GIT PULL] vhost-net fix for 2.6.34-rc4
From: David Miller @ 2010-04-15  5:53 UTC (permalink / raw)
  To: mst; +Cc: netdev, hch
In-Reply-To: <20100415040557.GA12108@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 15 Apr 2010 07:05:57 +0300

> On Wed, Apr 14, 2010 at 03:57:15PM -0700, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Wed, 14 Apr 2010 17:17:33 +0300
>> 
>> > Christoph Hellwig (1):
>> >       vhost: fix sparse warnings
>> 
>> Do the sparse warnings show actual real bugs, or are we just
>> adding annotations to make the tool happy?
>> 
>> Unless the bugs being fixed are earth shattering, this belongs
>> in net-next-2.6 not net-2.6
> 
> No, these are just annotations to make the tool happy.
> Pls queue for net-next-2.6. Thanks!

Done, thanks Michael.

^ permalink raw reply

* Re: [linux-2.6.34-rc3] drivers/net: ks8851 MLL ethernet network driver
From: David Miller @ 2010-04-15  6:38 UTC (permalink / raw)
  To: David.Choi; +Cc: netdev

I cannot apply this, there are too many issues:

@@ -378,34 +384,34 @@ union ks_tx_hdr {

 /**
  * struct ks_net - KS8851 driver private data
- * @net_device 	: The network device we're bound to
+ * @net_device	: The network device we're bound to
  * @hw_addr	: start address of data register.

Do not mix lots of coding style and functional changes.

If you want to fix coding style, do it later in a seperate
patch after you implement your functional change.

+	if (!reg_data & CCR_ENDIAN) {

This test will never do what you want it to do.

First the compiler will evaluate "!reg_data" which will be either "0"
or "1", then it will and it with the CCR_ENDIAN mask, which is (1 <<
10).  And this can never be true.

I also do not like all of the endian ifdefs peppered all over the
driver.  Try consolidate this by creating a header file that contains
the structure and macro definitions, and in there create helper inline
functions which do the ifdefs.  This way the ks8851_mll.c file can
stay ifdef clean.

Thanks.

^ permalink raw reply

* Re: [PATCH] drivers/net: remove network drivers' last few uses of IRQF_SAMPLE_RANDOM
From: David Miller @ 2010-04-15  6:42 UTC (permalink / raw)
  To: cpeterso; +Cc: netdev, linux-kernel, mpm
In-Reply-To: <1270877380-23813-1-git-send-email-cpeterso@cpeterso.com>

From: Chris Peterson <cpeterso@cpeterso.com>
Date: Sat, 10 Apr 2010 01:29:40 -0400

> feature-removal-schedule.txt says a new /dev/random entropy model is planned 
> (for July 2009). Until then, this patch removes the few remaining uses of 
> IRQF_SAMPLE_RANDOM from drivers/net/*. 12 network drivers are affected, one 
> more than last year (netxen_nic_main.c). If idle servers need more entropy, 
> they ought to use a hardware RNG or an entropy-gathering userspace daemon.
> 
> Signed-off-by: Chris Peterson <cpeterso@cpeterso.com>

Well, that feature-removal-schedule.txt entry states that new
add_*_randomness() interfaces will be created such that things
like network devices can indicate what kind of entropy they
are providing.

Those interfaces have not yet been created as far as I can tell, and
the idea is that network drivers could use those not-yet-written new
interfaces.

Until that time I feel it does more harm than good to remove these
annotations since frankly there never ever was good agreement about
whether we should do this or not.

It may seem stupid and pointless to just do nothing, but at this time
I think that's still the best thing to do.

Therefore, I'm not applying your patch, sorry.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  6:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271056697.16881.7.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

It looks like Denys is still getting crashes even with this patch
applied.  And I also think there is some meric to some of Krishna's
analysis.

To me it seems to make more sense to validate the SKB's queue against
the real actual choosen device's range.

The socket queue index will catch up and eventually become valid
because the dst reset will invalidate the queue setting, and we'll
thus recompute it as needed, as Krishna stated.

So I'm tossing this patch for now since it doesn't even aparently
fix the bug.

^ permalink raw reply

* Re: [PATCH 2.6.34-git] 8139too: fix Coding Styles
From: David Miller @ 2010-04-15  7:08 UTC (permalink / raw)
  To: jblanco; +Cc: netdev
In-Reply-To: <ba865f7b49c5df1b71bc393ad42ff40a.squirrel@neurowork.net>

From: "Javier Blanco de Torres (Neurowork)" <jblanco@neurowork.net>
Date: Mon, 12 Apr 2010 09:36:43 +0200

> Fixed coding styles in the 8139too net driver.
> 
> Signed-off-by: Javier Blanco de Torres <jblanco@neurowork.net>
> Signed-off-by: Alejandro Sánchez Acosta <asanchez@neurowork.net>

This is why I absolutely hate pure checkpatch.pl patches, people just
try to make the tool happy and don't think about what the tool is
trying to tell them.

The worst of this is this "typedef enum" part of your changes:

-typedef enum {
+enum {
 	RTL8139 = 0,
 	RTL8129,
 } board_t;

and checkpatch was telling you:

WARNING: do not add new typedefs
#220: FILE: net/8139too.c:220:
+typedef enum {

Well, you're still adding a new type!  Getting rid of the type name is
what it's telling you to stop doing.

It's still a newly named type after your change, it wants you to get
rid of the "board_t" thing altogether.

Give the enum a real "enum" name like:

enum rtl8139_board_t {

Then use _THAT_ in the sources:

	enum rtl8139_board_t x;

The typedef section of Documentation/CodingStyle makes this very
clear.

But your entire patch is like this, the changes are largely pointless
and many of them are false interpreations of what checkpatch complains
about.

Therefore I really don't encourage that you pursue this any further,
sorry.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, eparis, netdev
In-Reply-To: <1271142857.16881.193.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 09:14:17 +0200

> [PATCH net-next-2.6] net: netif_rx() must disable preemption
> 
> Eric Paris reported netif_rx() is calling smp_processor_id() from
> preemptible context, in particular when caller is
> ip_dev_loopback_xmit().
> 
> RPS commit added this smp_processor_id() call, this patch makes sure
> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
> can dot it a bit earlier.
> 
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

I've applied this with some coding style fixups.

Thanks!

--------------------
net: netif_rx() must disable preemption

Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().

RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..e8041eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
  */
 static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 {
@@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
 	u8 ip_proto;
 	u32 addr1, addr2, ports, ihl;
 
-	rcu_read_lock();
-
 	if (skb_rx_queue_recorded(skb)) {
 		u16 index = skb_get_rx_queue(skb);
 		if (unlikely(index >= dev->num_rx_queues)) {
@@ -2296,7 +2295,6 @@ got_hash:
 	}
 
 done:
-	rcu_read_unlock();
 	return cpu;
 }
 
@@ -2392,7 +2390,7 @@ enqueue:
 
 int netif_rx(struct sk_buff *skb)
 {
-	int cpu;
+	int ret;
 
 	/* if netpoll wants it, pretend we never saw it */
 	if (netpoll_rx(skb))
@@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
 		net_timestamp(skb);
 
 #ifdef CONFIG_RPS
-	cpu = get_rps_cpu(skb->dev, skb);
-	if (cpu < 0)
-		cpu = smp_processor_id();
+	{
+		int cpu;
+
+		rcu_read_lock();
+		cpu = get_rps_cpu(skb->dev, skb);
+		if (cpu < 0)
+			cpu = smp_processor_id();
+		ret = enqueue_to_backlog(skb, cpu);
+		rcu_read_unlock();
+	}
 #else
-	cpu = smp_processor_id();
+	ret = enqueue_to_backlog(skb, get_cpu());
+	put_cpu();
 #endif
-
-	return enqueue_to_backlog(skb, cpu);
+	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH] CONFIG_SMP should be CONFIG_RPS
From: David Miller @ 2010-04-15  7:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, netdev, therbert
In-Reply-To: <1271141947.16881.177.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 08:59:07 +0200

> Thanks Changli, this is part of RFS patches :)
> 

I'll apply this, Tom has to respin RFS for other reasons
already...

^ permalink raw reply

* [PATCH 1/2] igb: double increment nr_frags
From: Koki Sanagi @ 2010-04-15  7:20 UTC (permalink / raw)
  To: netdev, e1000-devel
  Cc: davem, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	peter.p.waskiewicz.jr, john.ronciak

There is no need to increment nr_frags becasue skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
  drivers/net/igb/igb_main.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 78cc742..8bde6c3 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -5254,7 +5254,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
  				       PAGE_SIZE / 2, PCI_DMA_FROMDEVICE);
  			buffer_info->page_dma = 0;
  
-			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
  						buffer_info->page,
  						buffer_info->page_offset,
  						length);


^ permalink raw reply related

* [PATCH 2/2] igbvf: dobule increment nr_frags
From: Koki Sanagi @ 2010-04-15  7:23 UTC (permalink / raw)
  To: netdev, e1000-devel
  Cc: davem, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	peter.p.waskiewicz.jr, john.ronciak

There is no need to increment nr_frags becasue skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
 drivers/net/igbvf/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igbvf/netdev.c b/drivers/net/igbvf/netdev.c
index ea8abf5..97cfc60 100644
--- a/drivers/net/igbvf/netdev.c
+++ b/drivers/net/igbvf/netdev.c
@@ -288,7 +288,7 @@ static bool igbvf_clean_rx_irq(struct igbvf_adapter *adapter,
 			               PCI_DMA_FROMDEVICE);
 			buffer_info->page_dma = 0;
 
-			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
+			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
 			                   buffer_info->page,
 			                   buffer_info->page_offset,
 			                   length);


^ permalink raw reply related

* Re: [PATCH 1/2] igb: double increment nr_frags
From: David Miller @ 2010-04-15  7:28 UTC (permalink / raw)
  To: sanagi.koki
  Cc: netdev, e1000-devel, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <4BC6BE3C.6070204@jp.fujitsu.com>

Your email client corrupted your patch by applying text
formatting to it.

Please disable all text formatting in your email client
and resend your changes.

You can read Documentation/email-clients.txt for help in
this area.

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  7:30 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.001446.244372815.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:14 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Apr 2010 09:14:17 +0200
>
>> [PATCH net-next-2.6] net: netif_rx() must disable preemption
>>
>> Eric Paris reported netif_rx() is calling smp_processor_id() from
>> preemptible context, in particular when caller is
>> ip_dev_loopback_xmit().
>>
>> RPS commit added this smp_processor_id() call, this patch makes sure
>> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
>> can dot it a bit earlier.
>>
>> Reported-by: Eric Paris <eparis@redhat.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> I've applied this with some coding style fixups.
>
> Thanks!
>
> --------------------
> net: netif_rx() must disable preemption
>
> Eric Paris reported netif_rx() is calling smp_processor_id() from
> preemptible context, in particular when caller is
> ip_dev_loopback_xmit().
>
> RPS commit added this smp_processor_id() call, this patch makes sure
> preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
> can dot it a bit earlier.
>
> Reported-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/core/dev.c |   25 +++++++++++++++----------
>  1 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 876b111..e8041eb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2206,6 +2206,7 @@ DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };
>  /*
>  * get_rps_cpu is called from netif_receive_skb and returns the target
>  * CPU from the RPS map of the receiving queue for a given skb.
> + * rcu_read_lock must be held on entry.
>  */
>  static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
>  {
> @@ -2217,8 +2218,6 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb)
>        u8 ip_proto;
>        u32 addr1, addr2, ports, ihl;
>
> -       rcu_read_lock();
> -
>        if (skb_rx_queue_recorded(skb)) {
>                u16 index = skb_get_rx_queue(skb);
>                if (unlikely(index >= dev->num_rx_queues)) {
> @@ -2296,7 +2295,6 @@ got_hash:
>        }
>
>  done:
> -       rcu_read_unlock();
>        return cpu;
>  }
>
> @@ -2392,7 +2390,7 @@ enqueue:
>
>  int netif_rx(struct sk_buff *skb)
>  {
> -       int cpu;
> +       int ret;
>
>        /* if netpoll wants it, pretend we never saw it */
>        if (netpoll_rx(skb))
> @@ -2402,14 +2400,21 @@ int netif_rx(struct sk_buff *skb)
>                net_timestamp(skb);
>
>  #ifdef CONFIG_RPS
> -       cpu = get_rps_cpu(skb->dev, skb);
> -       if (cpu < 0)
> -               cpu = smp_processor_id();
> +       {
> +               int cpu;
> +
> +               rcu_read_lock();
> +               cpu = get_rps_cpu(skb->dev, skb);
> +               if (cpu < 0)
> +                       cpu = smp_processor_id();
> +               ret = enqueue_to_backlog(skb, cpu);
> +               rcu_read_unlock();
> +       }
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
>  #endif
> -
> -       return enqueue_to_backlog(skb, cpu);
> +       return ret;
>  }
>  EXPORT_SYMBOL(netif_rx);
>

Should netif_rx() be used only when preemption is disabled? If not,
netif_rx_ni() should be used instead.?


-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:37 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <l2p412e6f7f1004150030o8a406133s91d6614cbb106796@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 15:30:44 +0800

> Should netif_rx() be used only when preemption is disabled? If not,
> netif_rx_ni() should be used instead.?

netif_rx() must be invoked from a hardware or software interrupt,
which implies preemption disabled.

In netif_rx_ni(), the "ni" means "not interrupt".

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.003711.159334670.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
> From: Changli Gao <xiaosuo@gmail.com>
> Date: Thu, 15 Apr 2010 15:30:44 +0800
>
>> Should netif_rx() be used only when preemption is disabled? If not,
>> netif_rx_ni() should be used instead.?
>
> netif_rx() must be invoked from a hardware or software interrupt,
> which implies preemption disabled.
>
> In netif_rx_ni(), the "ni" means "not interrupt".
>

yea, I know netif_rx_ni()'s meaning. It means that the following
changes aren't necessary.

 #else
-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

ret = enqueue_to_backlog(skb, smp_processor_id()); should be OK.

 #endif
-
-       return enqueue_to_backlog(skb, cpu);
+       return ret;
 }

^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  7:57 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <u2o412e6f7f1004150047nc9250ffuaef68be066a7575c@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 15:47:26 +0800

> On Thu, Apr 15, 2010 at 3:37 PM, David Miller <davem@davemloft.net> wrote:
>> From: Changli Gao <xiaosuo@gmail.com>
>> Date: Thu, 15 Apr 2010 15:30:44 +0800
>>
>>> Should netif_rx() be used only when preemption is disabled? If not,
>>> netif_rx_ni() should be used instead.?
>>
>> netif_rx() must be invoked from a hardware or software interrupt,
>> which implies preemption disabled.
>>
>> In netif_rx_ni(), the "ni" means "not interrupt".
>>
> 
> yea, I know netif_rx_ni()'s meaning. It means that the following
> changes aren't necessary.
> 
>  #else
> -       cpu = smp_processor_id();
> +       ret = enqueue_to_backlog(skb, get_cpu());
> +       put_cpu();
> 

Why?  If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.

Changli, I find it very frustrating to communicate with you, you are
very terse in your descriptions and analysis and you make many simple
errors that would be avoided if you spent more time thinking about
things before sending your emails. :-/

Instead of just showing some pseudo patch, state WHY it is needed.
Talk about the execution state of environment and what rules or other
things are being violated which must be corrected.

^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: Eric Dumazet @ 2010-04-15  8:02 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <20100414.235256.190096561.davem@davemloft.net>

Le mercredi 14 avril 2010 à 23:52 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 12 Apr 2010 09:18:17 +0200
> 
> > [PATCH] net: dev_pick_tx() fix
> > 
> > When dev_pick_tx() caches tx queue_index on a socket, we must check
> > socket dst_entry matches skb one, or risk a crash later, as reported by
> > Denys Fedorysychenko, if old packets are in flight during a route
> > change, involving devices with different number of queues.
> > 
> > Bug introduced by commit a4ee3ce3
> > (net: Use sk_tx_queue_mapping for connected sockets)
> > 
> > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> It looks like Denys is still getting crashes even with this patch
> applied.  And I also think there is some meric to some of Krishna's
> analysis.
> 


> To me it seems to make more sense to validate the SKB's queue against
> the real actual choosen device's range.
> 
> The socket queue index will catch up and eventually become valid
> because the dst reset will invalidate the queue setting, and we'll
> thus recompute it as needed, as Krishna stated.
> 

??? 


> So I'm tossing this patch for now since it doesn't even aparently
> fix the bug.

I am a bit lost here David.

Denys got a crash that we cannot explain yet. He said he has no
multiqueue devices, so obviously my patch cant help him.

But this patch was fixing a real issue, I believe I pointed it twice
already...

I'll try to setup an environment to trigger this bug for real, but this
will take time, my dev machines are not multiqueue.




^ permalink raw reply

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
From: David Miller @ 2010-04-15  8:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat
In-Reply-To: <1271318553.16881.2161.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:02:33 +0200

> Denys got a crash that we cannot explain yet. He said he has no
> multiqueue devices, so obviously my patch cant help him.
> 
> But this patch was fixing a real issue, I believe I pointed it twice
> already...

Ok, I thought your patch was specifically meant to fix Denys's bug.

The confusion comes from the fact that you mention Denys's crash in
your commit message:

--------------------
When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.
--------------------

Anyways, I studied your patch once more and read the thread discussion
with Krishna again and your patch looks fine.  I'll apply it to
net-2.6, thanks!


^ permalink raw reply

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: Changli Gao @ 2010-04-15  8:27 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <20100415.005726.205428265.davem@davemloft.net>

On Thu, Apr 15, 2010 at 3:57 PM, David Miller <davem@davemloft.net> wrote:
>
> Why?  If we are in an interrupt (either soft or hard) then
> smp_processor_id() is stable, and valid.
>
> Changli, I find it very frustrating to communicate with you, you are
> very terse in your descriptions and analysis and you make many simple
> errors that would be avoided if you spent more time thinking about
> things before sending your emails. :-/
>
> Instead of just showing some pseudo patch, state WHY it is needed.
> Talk about the execution state of environment and what rules or other
> things are being violated which must be corrected.
>

Sorry. English isn't my native language, so sometimes I can't describe
myself clearly.

I think the following patch from Eric should be applied instead.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c65f18e..d1bcc9f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
       newskb->pkt_type = PACKET_LOOPBACK;
       newskb->ip_summed = CHECKSUM_UNNECESSARY;
       WARN_ON(!skb_dst(newskb));
-       netif_rx(newskb);
+       netif_rx_ni(newskb);
       return 0;
 }

As you know  "netif_rx() must be invoked from a hardware or software
interrupt, which implies preemption disabled.", obviously
ip_dev_loopback_xmit() doesn't obey this rule, so the crash isn't the
fault of net_rx(). If there are other users, who don't obey this rule,
they should be fixed too.

For this patch:

-       cpu = smp_processor_id();
+       ret = enqueue_to_backlog(skb, get_cpu());
+       put_cpu();

You said: " If we are in an interrupt (either soft or hard) then
smp_processor_id() is stable, and valid.". so we don't need to call
get_cpu() instead of smp_processor_id(). get_cpu() brings no good but
additional cost preempt_disable().

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply related

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon: caller is netif_rx
From: David Miller @ 2010-04-15  8:33 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, therbert, eparis, netdev
In-Reply-To: <k2y412e6f7f1004150127u4e7ec668r80f066bfb3efea81@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 15 Apr 2010 16:27:19 +0800

> I think the following patch from Eric should be applied instead.
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index c65f18e..d1bcc9f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -120,7 +120,7 @@ static int ip_dev_loopback_xmit(struct sk_buff *newskb)
>        newskb->pkt_type = PACKET_LOOPBACK;
>        newskb->ip_summed = CHECKSUM_UNNECESSARY;
>        WARN_ON(!skb_dst(newskb));
> -       netif_rx(newskb);
> +       netif_rx_ni(newskb);
>        return 0;
>  }

Yes, this looks more reasonable.  Eric if you agree please (re-)submit
this formally, I must have missed this somehow, sorry.

And this is a bug fix in any kernel, not just one's that have RPS
patches applied.

If we are not called from some interrupt context, there is no sure
trigger to make sure software interrupts will be executed after the
packet is queued locally.  netif_rx_ni() makes sure that any pending
software interrupts will run in such cases.

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:42 UTC (permalink / raw)
  To: hadi; +Cc: therbert, eric.dumazet, netdev, robert, xiaosuo, andi
In-Reply-To: <1271245986.3943.55.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 07:53:06 -0400

> I base tested against no rps being used and a kernel which didnt
> have any RPS config on.  [BTW, I had to hand-edit the .config since
> i couldnt do it from menuconfig (Is there any reason for it to be
> so?)]

The RPS config is merely an indirect dependency on SMP as we have it
coded up in the Kconfig files, it's not meant to be user selectable
and is intended to be unconditionally on for SMP builds.

^ permalink raw reply

* Re: rps perfomance WAS(Re: rps: question
From: David Miller @ 2010-04-15  8:48 UTC (permalink / raw)
  To: hadi; +Cc: eric.dumazet, therbert, netdev, robert, xiaosuo, andi
In-Reply-To: <1271271222.4567.51.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Apr 2010 14:53:42 -0400

> On Wed, 2010-04-14 at 20:04 +0200, Eric Dumazet wrote:
> 
>> Yes, multiqueue is far better of course, but in case of hardware lacking
>> multiqueue, RPS can help many workloads, where application has _some_
>> work to do, not only counting frames or so...
> 
> Agreed. So to enumerate, the benefits come in if:
> a) you have many processors
> b) you have single-queue nic
> c) at sub-threshold traffic you dont care about a little latency
> d) you have a specific cache hierachy
> e) app is working hard to process incoming messages

A single-queue NIC is actually not a requirement, RPS helps also in
cases where you have 'N' application threads and N is less than the
number of CPUs your multi-queue NIC is distributing traffic to.

Moving the bulk of the input packet processing to the cpus where
the applications actually sit had a non-trivial benefit.  RFS takes
this aspect to yet another level.

> I think the main challenge for my pedantic mind is missing details. Is
> there a paper on rps? Example for #d above, the commit log mentions that
> rps benefits if you have certain types of "cache hierachy". Probably
> some arch with large shared L2/3 (maybe inclusive) cache will benefit.
> example: it does well on Nehalem and probably opterons as long (as you
> dont start stacking these things on some interconnect like QPI or HT).
> But what happens when you have FSB sharing across cores (still a very
> common setup)? etc etc

I think for the case where application locality is important,
RPS/RFS can help regardless of cache details.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox