Netdev List
 help / color / mirror / Atom feed
* Re: kvm networking todo wiki
From: Michael S. Tsirkin @ 2010-10-10 11:35 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Sridhar Samudrala, David Stevens, sri, Anthony Liguori,
	Rusty Russell, Krishna Kumar2, Shirley Ma, Xin, Xiaohui, jdike,
	herbert, lmr, akong, kvm, qemu-devel, netdev
In-Reply-To: <AANLkTikzU6g2pYKoyXOWfGPE64VZusPaYdn+TQ14tE7_@mail.gmail.com>

On Sun, Oct 10, 2010 at 01:37:45PM +0200, Dragos Tatulea wrote:
> Hi,
> 
> > More importantly: anyone's going to work on this?
> 
> I'd like to work on this. Might need some assistance though.
> 
> Thanks,
> Dragos

Cool, you can add this info to the wiki.
Thanks!

-- 
MST

^ permalink raw reply

* Re: tbf/htb qdisc limitations
From: Jarek Poplawski @ 2010-10-10 11:23 UTC (permalink / raw)
  To: Steven Brudenell; +Cc: netdev
In-Reply-To: <AANLkTik6_C5pzx=yS=C0VQR2xfg0S57AZAn9WYiGvxe5@mail.gmail.com>

Steven Brudenell wrote:
> hi folks,
> 
> i was disappointed recently to find that i can't set the "burst"
> parameters very high on the tbf or htb qdiscs. the actual limit of the
> burst parameters varies, according to the rate parameter. at the
> relatively low rate i want to set, i want to have the burst parameter
> be several gigabytes, but i'm actually limited to only a few
> megabytes.
> 
> (motivation: a fully-automated way to stay inside the monthly transfer
> limits imposed by many ISPs these days, without resorting to a
> constant rate limit. for example, comcast limits its customers to
> 250GB/month, which is about 101KB/s; many cellular data plans in the
> US limit to 5GB/month =~ 2KB/s).

I'm not sure you checked how the "burst" works, and doubt it could
help you here. Anyway, do you think: rate 2KB/s with burst 5GB
config would be useful for you?

> 
> i'll gladly code a patch, but i'd like the list's advice on whether
> this is necessary, and a little bit about how to proceed:
> 
> 1) what is the purpose of the "rate tables" used in these qdiscs --
> why use them in favor of dividing bytes by time to compute a rate? i
> assume the answer has something to do with restrictions on using
> floating point math (maybe even integer division?) at different places
> / interruptability states in the kernel. maybe this is documented on
> kernelnewbies somewhere but i couldn't find it.
> 
> 2) is there an established procedure for versioning a netlink
> interface? today the netlink interface for tbf and htb is horribly
> implementation-coupled (the "burst" parameters need to be munged
> according to the "rate" parameters and kernel tick rate). i think i
> would need to change these interfaces in order to change the
> accounting implementation in the corresponding qdisc. however, i
> probably want to remain compatible with old userspace.

My proposal is you don't bother with 1) and 2), but first do the
hack in tbf or htb directly, using or omitting rate tables, how
you like, and test this idea.

But it seems the right way is to collect monthly stats with some
userspace tool and change qdisc config dynamically. You might
look at network admins' lists for small ISPs exemplary scripts
doing such nasty things to their users, or have a look at ppp
accounting tools.
 
Jarek P.

^ permalink raw reply

* Re: [Bugme-new] [Bug 19692] New: linux-2.6.36-rc5 crash with gianfar ethernet at full line rate traffic
From: Jarek Poplawski @ 2010-10-10 10:32 UTC (permalink / raw)
  To: emin ak
  Cc: Andrew Morton, netdev, bugzilla-daemon, bugme-daemon,
	Anton Vorontsov
In-Reply-To: <AANLkTi=EsfQbkt-YycXfuVpBDviCxKFuUdz4qZTXV+eG@mail.gmail.com>

On Sat, Oct 09, 2010 at 03:10:25PM +0300, emin ak wrote:
> Hi Jarek,
> My test setup is at my office so that I will try your patch on Monday
> immediately and turn you the results.

No need to hurry, especially on Mondays, but thanks for the info.

Jarek P.

^ permalink raw reply

* Re: [PATCH] Add IRQF_SAMPLE_RANDOM Flag to forcedeth
From: Eric Dumazet @ 2010-10-10  8:57 UTC (permalink / raw)
  To: Patrick Simmons; +Cc: David Miller, bhutchings, netdev
In-Reply-To: <4CB13195.2070007@netscape.net>

Le samedi 09 octobre 2010 à 21:23 -0600, Patrick Simmons a écrit :
> On 10/09/10 21:15, David Miller wrote:
> > From: Ben Hutchings<bhutchings@solarflare.com>
> > Date: Sun, 10 Oct 2010 02:09:24 +0100
> >
> >> Patrick Simmons wrote:
> >>> This patch adds the IRQF_SAMPLE_RANDOM flag to the forcedeth driver,
> >>> allowing the interrupt timing for forcedeth to be used for entropy
> >>> generation.  This should help /dev/random generate more secure random
> >>> numbers on machines using this driver.
> >> [...]
> >>
> >> We don't enable this for network drivers any more because:
> >>
> >> 1. At high packet rates, interrupt moderation makes interrupts very
> >> regular.
> >> 2. At low packet rates, a malicious sender can control the interrupt
> >> timing.
> >
> > Agreed on all counts, I'm not applying this patch.
> 
> It's enabled for other network drivers, which is where I got the idea 
> from.  Has anyone actually done an experiment to see whether these two 
> concerns are valid?

Several attemps in the past tried to go into one direction or another

(Add the flag to some driver, then remove it from others)

Please read commit 9d9b8fb0e5ebf4b0398e579
http://lkml.org/lkml/2009/4/6/283

A third reason not adding is : At moderate packet rates, _no_ entropy is
feeded at all because add_interrupt_randomness()/add_timer_randomness is
_very_ conservative, with first, second-order and third-order estimates.

credit_entropy_bits() is called with 0 bit

Adding this stuff has a high cost, I can see it in profiles on machines
with tg3 nics. I often remove the IRQF_SAMPLE_RANDOM flag localy.




^ permalink raw reply

* [PATCH net-next V2] igb: fix stats handling
From: Eric Dumazet @ 2010-10-10  8:14 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: Kirsher, Jeffrey T, Jesper Dangaard Brouer, Duyck, Alexander H,
	Jesper Dangaard Brouer, David S. Miller, netdev, Wyborny, Carolyn
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A1366027AC5DB5@rrsmsx501.amr.corp.intel.com>

Le samedi 09 octobre 2010 à 17:57 -0600, Tantilov, Emil S a écrit :
> Eric Dumazet wrote:
> > Le mercredi 06 octobre 2010 à 05:28 +0200, Eric Dumazet a écrit :
> > 
> >> I'll let Intel guys doing the backporting work, but for old kernels,
> >> you'll probably need to use "unsigned long" instead of "u64"
> >> 
> >> My plan is :
> >> 
> >> - Provide 64bit counters even on 32bit arch
> >> - with proper synchro (include/linux/u64_stats_sync.h)
> >> - Add a spinlock so we can apply Jesper patch.
> > 
> > Here is the net-next-2.6 patch, I am currently enable to test it, the
> > dev machine with IGB NIC cannot be restarted until tomorrow, my son
> > Nicolas is currently using it ;)
> > 
> > Could you and/or Jesper test it, possibly on 32 and 64 bit kernels ?
> > 
> > Thanks !
> > 
> > [PATCH net-next] igb: fix stats handling
> > 
> > There are currently some problems with igb.
> > 
> > - On 32bit arches, maintaining 64bit counters without proper
> > synchronization between writers and readers.
> > 
> > - Stats updated every two seconds, as reported by Jesper.
> >    (Jesper provided a patch for this)
> > 
> > - Potential problem between worker thread and ethtool -S
> > 
> > This patch uses u64_stats_sync, and convert everything to be 64bit
> > safe, 
> > SMP safe, even on 32bit arches.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> >  drivers/net/igb/igb.h         |    7 +-
> >  drivers/net/igb/igb_ethtool.c |   10 +-
> >  drivers/net/igb/igb_main.c    |  111 +++++++++++++++++++++++---------
> >  3 files changed, 94 insertions(+), 34 deletions(-)
> 
> This patch is causing a hang when testing with 2 sessions in a while loop reading /proc/net/dev/ and ethtool -S. I think even just reading /proc/net/dev/ is sufficient, but have not confirmed it yet. I have seen the hang somewhere between 15 min to an hour. Without the patch same test ran 24+ hours without issues.
> 
> There was no trace on the screen, I got this with magic sysrq:
> 
> [15388.393579] SysRq : Show Regs 
> [15388.397341] Modules linked in: igb [last unloaded: scsi_wait_scan] [15388.404846]  
> [15388.406889] Pid: 16218, comm: kworker/4:1 Not tainted 2.6.36-rc3-net-next-igb-100810+ #2 S5520HC/S5520HC 
> [15388.418393] EIP: 0060:[<c13fead2>] EFLAGS: 00000297 CPU: 4 
> [15388.424908] EIP is at _raw_spin_lock+0x13/0x19 
> [15388.430257] EAX: f6eab55c EBX: f6eab380 ECX: 00000001 EDX: 00004e4a [15388.437629] ESI: f6eab000 EDI: f6eab41c EBP: f3d9bf4c ESP: f3d9bf4c [15388.445011]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 
> [15388.451422] Process kworker/4:1 (pid: 16218, ti=f3d9a000 task=f5ce0ed0 task.ti=f3d9a000) 
> [15388.461173] Stack: 
> [15388.463796]  f3d9bf64 f8586c57 f3d9bf5c f6eab41c f6901e00 c8703cc0 f3d9bf90 c1041379 
> [15388.473116] <0> 00000004 00000000 f8586b16 00000000 c8707b05 c8707b00 f6901e00 c8703cc4 
> [15388.483379] <0> c8703cc0 f3d9bfb8 c1042879 c16bb640 c8703cc4 00000c54 c16bb640 f6901e10 
> [15388.494219] Call Trace: 
> [15388.497336]  [<f8586c57>] ? igb_watchdog_task+0x141/0x21a [igb] 
> [15388.504336]  [<c1041379>] ? process_one_work+0x18e/0x265 
> [15388.510643]  [<f8586b16>] ? igb_watchdog_task+0x0/0x21a [igb] 
> [15388.517455]  [<c1042879>] ? worker_thread+0xf3/0x1ef 
> [15388.523384]  [<c1042786>] ? worker_thread+0x0/0x1ef 
> [15388.529222]  [<c104506b>] ? kthread+0x62/0x67 
> [15388.534475]  [<c1045009>] ? kthread+0x0/0x67 
> [15388.539623]  [<c1002d36>] ? kernel_thread_helper+0x6/0x10 
> [15388.546034] Code: 00 75 05 f0 66 0f b1 0a 0f 94 c1 0f b6 c1 85 c0 0f 95 c0 0f b6 c0 5d c3 55 ba 00 01 00 00 89 e5 f0 66 0f c1 10 38 f2 74 06 f3 90 <8a> 10 eb f6 5d c3 55 89 e5 9c 59 fa ba 00 01 00 00 f0 66 0f c1
> 
> Thanks,
> Emil
> 

Hi Emil, thanks for testing.

It seems one one the u64_stats_sync invariant is not met.

I believe problem comes from "restart_queue"

This one can be updated in // by two cpus.

So we can lose one of the u64_stats_update_begin() /
u64_stats_update_end() increment and freeze a reader.

So igb had a previous bug here, un-noticed :)

restart_queue can be updated by start_xmit() (so only one cpu can be
there, because of txqueue lock serialization), but it also can be
updated by the igb_clean_tx_irq() function (one cpu there too).

One solution to this problem is to use two separate counters, with two
separate syncp.

[PATCH net-next V2] igb: fix stats handling

There are currently some problems with igb.

- On 32bit arches, maintaining 64bit counters without proper
synchronization between writers and readers.

- Stats updated every two seconds, as reported by Jesper.
   (Jesper provided a patch for this)

- Potential problem between worker thread and ethtool -S

This patch uses u64_stats_sync, and convert everything to be 64bit safe,
SMP safe, even on 32bit arches. It integrates Jesper idea of providing
accurate stats at the time user reads them.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
V2: Add a second restart_queue field, with separate syncp, because
original restart_queue was potentially udpated by two cpus.
   Corrected igb_get_ethtool_stats() to also use appropriate syncp,
   and do the sum of two restart_queue fields.

 drivers/net/igb/igb.h         |    9 ++
 drivers/net/igb/igb_ethtool.c |   52 ++++++++++----
 drivers/net/igb/igb_main.c    |  113 +++++++++++++++++++++++---------
 3 files changed, 129 insertions(+), 45 deletions(-)

diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 44e0ff1..edab9c4 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -159,6 +159,7 @@ struct igb_tx_queue_stats {
 	u64 packets;
 	u64 bytes;
 	u64 restart_queue;
+	u64 restart_queue2;
 };
 
 struct igb_rx_queue_stats {
@@ -210,11 +211,14 @@ struct igb_ring {
 		/* TX */
 		struct {
 			struct igb_tx_queue_stats tx_stats;
+			struct u64_stats_sync tx_syncp;
+			struct u64_stats_sync tx_syncp2;
 			bool detect_tx_hung;
 		};
 		/* RX */
 		struct {
 			struct igb_rx_queue_stats rx_stats;
+			struct u64_stats_sync rx_syncp;
 			u32 rx_buffer_len;
 		};
 	};
@@ -288,6 +292,9 @@ struct igb_adapter {
 	struct timecompare compare;
 	struct hwtstamp_config hwtstamp_config;
 
+	spinlock_t stats64_lock;
+	struct rtnl_link_stats64 stats64;
+
 	/* structs defined in e1000_hw.h */
 	struct e1000_hw hw;
 	struct e1000_hw_stats stats;
@@ -357,7 +364,7 @@ extern netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *, struct igb_ring *);
 extern void igb_unmap_and_free_tx_resource(struct igb_ring *,
 					   struct igb_buffer *);
 extern void igb_alloc_rx_buffers_adv(struct igb_ring *, int);
-extern void igb_update_stats(struct igb_adapter *);
+extern void igb_update_stats(struct igb_adapter *, struct rtnl_link_stats64 *);
 extern bool igb_has_link(struct igb_adapter *adapter);
 extern void igb_set_ethtool_ops(struct net_device *);
 extern void igb_power_up_link(struct igb_adapter *);
diff --git a/drivers/net/igb/igb_ethtool.c b/drivers/net/igb/igb_ethtool.c
index 26bf6a1..a70e16b 100644
--- a/drivers/net/igb/igb_ethtool.c
+++ b/drivers/net/igb/igb_ethtool.c
@@ -90,8 +90,8 @@ static const struct igb_stats igb_gstrings_stats[] = {
 
 #define IGB_NETDEV_STAT(_net_stat) { \
 	.stat_string = __stringify(_net_stat), \
-	.sizeof_stat = FIELD_SIZEOF(struct net_device_stats, _net_stat), \
-	.stat_offset = offsetof(struct net_device_stats, _net_stat) \
+	.sizeof_stat = FIELD_SIZEOF(struct rtnl_link_stats64, _net_stat), \
+	.stat_offset = offsetof(struct rtnl_link_stats64, _net_stat) \
 }
 static const struct igb_stats igb_gstrings_net_stats[] = {
 	IGB_NETDEV_STAT(rx_errors),
@@ -111,8 +111,9 @@ static const struct igb_stats igb_gstrings_net_stats[] = {
 	(sizeof(igb_gstrings_net_stats) / sizeof(struct igb_stats))
 #define IGB_RX_QUEUE_STATS_LEN \
 	(sizeof(struct igb_rx_queue_stats) / sizeof(u64))
-#define IGB_TX_QUEUE_STATS_LEN \
-	(sizeof(struct igb_tx_queue_stats) / sizeof(u64))
+
+#define IGB_TX_QUEUE_STATS_LEN 3 /* packets, bytes, restart_queue */
+
 #define IGB_QUEUE_STATS_LEN \
 	((((struct igb_adapter *)netdev_priv(netdev))->num_rx_queues * \
 	  IGB_RX_QUEUE_STATS_LEN) + \
@@ -2070,12 +2071,14 @@ static void igb_get_ethtool_stats(struct net_device *netdev,
 				  struct ethtool_stats *stats, u64 *data)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
-	struct net_device_stats *net_stats = &netdev->stats;
-	u64 *queue_stat;
-	int i, j, k;
+	struct rtnl_link_stats64 *net_stats = &adapter->stats64;
+	unsigned int start;
+	struct igb_ring *ring;
+	int i, j;
 	char *p;
 
-	igb_update_stats(adapter);
+	spin_lock(&adapter->stats64_lock);
+	igb_update_stats(adapter, net_stats);
 
 	for (i = 0; i < IGB_GLOBAL_STATS_LEN; i++) {
 		p = (char *)adapter + igb_gstrings_stats[i].stat_offset;
@@ -2088,15 +2091,36 @@ static void igb_get_ethtool_stats(struct net_device *netdev,
 			sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
 	}
 	for (j = 0; j < adapter->num_tx_queues; j++) {
-		queue_stat = (u64 *)&adapter->tx_ring[j]->tx_stats;
-		for (k = 0; k < IGB_TX_QUEUE_STATS_LEN; k++, i++)
-			data[i] = queue_stat[k];
+		u64	restart2;
+
+		ring = adapter->tx_ring[j];
+		do {
+			start = u64_stats_fetch_begin_bh(&ring->tx_syncp);
+			data[i]   = ring->tx_stats.packets;
+			data[i+1] = ring->tx_stats.bytes;
+			data[i+2] = ring->tx_stats.restart_queue;
+		} while (u64_stats_fetch_retry_bh(&ring->tx_syncp, start));
+		do {
+			start = u64_stats_fetch_begin_bh(&ring->tx_syncp2);
+			restart2  = ring->tx_stats.restart_queue2;
+		} while (u64_stats_fetch_retry_bh(&ring->tx_syncp2, start));
+		data[i+2] += restart2;
+
+		i += IGB_TX_QUEUE_STATS_LEN;
 	}
 	for (j = 0; j < adapter->num_rx_queues; j++) {
-		queue_stat = (u64 *)&adapter->rx_ring[j]->rx_stats;
-		for (k = 0; k < IGB_RX_QUEUE_STATS_LEN; k++, i++)
-			data[i] = queue_stat[k];
+		ring = adapter->rx_ring[j];
+		do {
+			start = u64_stats_fetch_begin_bh(&ring->rx_syncp);
+			data[i]   = ring->rx_stats.packets;
+			data[i+1] = ring->rx_stats.bytes;
+			data[i+2] = ring->rx_stats.drops;
+			data[i+3] = ring->rx_stats.csum_err;
+			data[i+4] = ring->rx_stats.alloc_failed;
+		} while (u64_stats_fetch_retry_bh(&ring->rx_syncp, start));
+		i += IGB_RX_QUEUE_STATS_LEN;
 	}
+	spin_unlock(&adapter->stats64_lock);
 }
 
 static void igb_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 55edcb7..c8f1249 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -96,7 +96,6 @@ static int igb_setup_all_rx_resources(struct igb_adapter *);
 static void igb_free_all_tx_resources(struct igb_adapter *);
 static void igb_free_all_rx_resources(struct igb_adapter *);
 static void igb_setup_mrqc(struct igb_adapter *);
-void igb_update_stats(struct igb_adapter *);
 static int igb_probe(struct pci_dev *, const struct pci_device_id *);
 static void __devexit igb_remove(struct pci_dev *pdev);
 static int igb_sw_init(struct igb_adapter *);
@@ -113,7 +112,8 @@ static void igb_update_phy_info(unsigned long);
 static void igb_watchdog(unsigned long);
 static void igb_watchdog_task(struct work_struct *);
 static netdev_tx_t igb_xmit_frame_adv(struct sk_buff *skb, struct net_device *);
-static struct net_device_stats *igb_get_stats(struct net_device *);
+static struct rtnl_link_stats64 *igb_get_stats64(struct net_device *dev,
+						 struct rtnl_link_stats64 *stats);
 static int igb_change_mtu(struct net_device *, int);
 static int igb_set_mac(struct net_device *, void *);
 static void igb_set_uta(struct igb_adapter *adapter);
@@ -1536,7 +1536,9 @@ void igb_down(struct igb_adapter *adapter)
 	netif_carrier_off(netdev);
 
 	/* record the stats before reset*/
-	igb_update_stats(adapter);
+	spin_lock(&adapter->stats64_lock);
+	igb_update_stats(adapter, &adapter->stats64);
+	spin_unlock(&adapter->stats64_lock);
 
 	adapter->link_speed = 0;
 	adapter->link_duplex = 0;
@@ -1689,7 +1691,7 @@ static const struct net_device_ops igb_netdev_ops = {
 	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
 	.ndo_start_xmit		= igb_xmit_frame_adv,
-	.ndo_get_stats		= igb_get_stats,
+	.ndo_get_stats64	= igb_get_stats64,
 	.ndo_set_rx_mode	= igb_set_rx_mode,
 	.ndo_set_multicast_list	= igb_set_rx_mode,
 	.ndo_set_mac_address	= igb_set_mac,
@@ -2276,6 +2278,7 @@ static int __devinit igb_sw_init(struct igb_adapter *adapter)
 	adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
 	adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
 
+	spin_lock_init(&adapter->stats64_lock);
 #ifdef CONFIG_PCI_IOV
 	if (hw->mac.type == e1000_82576)
 		adapter->vfs_allocated_count = (max_vfs > 7) ? 7 : max_vfs;
@@ -3483,7 +3486,9 @@ static void igb_watchdog_task(struct work_struct *work)
 		}
 	}
 
-	igb_update_stats(adapter);
+	spin_lock(&adapter->stats64_lock);
+	igb_update_stats(adapter, &adapter->stats64);
+	spin_unlock(&adapter->stats64_lock);
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igb_ring *tx_ring = adapter->tx_ring[i];
@@ -3550,6 +3555,8 @@ static void igb_update_ring_itr(struct igb_q_vector *q_vector)
 	int new_val = q_vector->itr_val;
 	int avg_wire_size = 0;
 	struct igb_adapter *adapter = q_vector->adapter;
+	struct igb_ring *ring;
+	unsigned int packets;
 
 	/* For non-gigabit speeds, just fix the interrupt rate at 4000
 	 * ints/sec - ITR timer value of 120 ticks.
@@ -3559,16 +3566,21 @@ static void igb_update_ring_itr(struct igb_q_vector *q_vector)
 		goto set_itr_val;
 	}
 
-	if (q_vector->rx_ring && q_vector->rx_ring->total_packets) {
-		struct igb_ring *ring = q_vector->rx_ring;
-		avg_wire_size = ring->total_bytes / ring->total_packets;
+	ring = q_vector->rx_ring;
+	if (ring) {
+		packets = ACCESS_ONCE(ring->total_packets);
+
+		if (packets) 
+			avg_wire_size = ring->total_bytes / packets;
 	}
 
-	if (q_vector->tx_ring && q_vector->tx_ring->total_packets) {
-		struct igb_ring *ring = q_vector->tx_ring;
-		avg_wire_size = max_t(u32, avg_wire_size,
-		                      (ring->total_bytes /
-		                       ring->total_packets));
+	ring = q_vector->tx_ring;
+	if (ring) {
+		packets = ACCESS_ONCE(ring->total_packets);
+
+		if (packets)
+			avg_wire_size = max_t(u32, avg_wire_size,
+			                      ring->total_bytes / packets);
 	}
 
 	/* if avg_wire_size isn't set no work was done */
@@ -4077,7 +4089,11 @@ static int __igb_maybe_stop_tx(struct igb_ring *tx_ring, int size)
 
 	/* A reprieve! */
 	netif_wake_subqueue(netdev, tx_ring->queue_index);
-	tx_ring->tx_stats.restart_queue++;
+
+	u64_stats_update_begin(&tx_ring->tx_syncp2);
+	tx_ring->tx_stats.restart_queue2++;
+	u64_stats_update_end(&tx_ring->tx_syncp2);
+
 	return 0;
 }
 
@@ -4214,16 +4230,22 @@ static void igb_reset_task(struct work_struct *work)
 }
 
 /**
- * igb_get_stats - Get System Network Statistics
+ * igb_get_stats64 - Get System Network Statistics
  * @netdev: network interface device structure
+ * @stats: rtnl_link_stats64 pointer
  *
- * Returns the address of the device statistics structure.
- * The statistics are actually updated from the timer callback.
  **/
-static struct net_device_stats *igb_get_stats(struct net_device *netdev)
+static struct rtnl_link_stats64 *igb_get_stats64(struct net_device *netdev,
+						 struct rtnl_link_stats64 *stats)
 {
-	/* only return the current stats */
-	return &netdev->stats;
+	struct igb_adapter *adapter = netdev_priv(netdev);
+
+	spin_lock(&adapter->stats64_lock);
+	igb_update_stats(adapter, &adapter->stats64);
+	memcpy(stats, &adapter->stats64, sizeof(*stats));
+	spin_unlock(&adapter->stats64_lock);
+
+	return stats;
 }
 
 /**
@@ -4305,15 +4327,17 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
  * @adapter: board private structure
  **/
 
-void igb_update_stats(struct igb_adapter *adapter)
+void igb_update_stats(struct igb_adapter *adapter,
+		      struct rtnl_link_stats64 *net_stats)
 {
-	struct net_device_stats *net_stats = igb_get_stats(adapter->netdev);
 	struct e1000_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
 	u32 reg, mpc;
 	u16 phy_tmp;
 	int i;
 	u64 bytes, packets;
+	unsigned int start;
+	u64 _bytes, _packets;
 
 #define PHY_IDLE_ERROR_COUNT_MASK 0x00FF
 
@@ -4331,10 +4355,17 @@ void igb_update_stats(struct igb_adapter *adapter)
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		u32 rqdpc_tmp = rd32(E1000_RQDPC(i)) & 0x0FFF;
 		struct igb_ring *ring = adapter->rx_ring[i];
+
 		ring->rx_stats.drops += rqdpc_tmp;
 		net_stats->rx_fifo_errors += rqdpc_tmp;
-		bytes += ring->rx_stats.bytes;
-		packets += ring->rx_stats.packets;
+		
+		do {
+			start = u64_stats_fetch_begin_bh(&ring->rx_syncp);
+			_bytes = ring->rx_stats.bytes;
+			_packets = ring->rx_stats.packets;
+		} while (u64_stats_fetch_retry_bh(&ring->rx_syncp, start));
+		bytes += _bytes;
+		packets += _packets;
 	}
 
 	net_stats->rx_bytes = bytes;
@@ -4344,8 +4375,13 @@ void igb_update_stats(struct igb_adapter *adapter)
 	packets = 0;
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct igb_ring *ring = adapter->tx_ring[i];
-		bytes += ring->tx_stats.bytes;
-		packets += ring->tx_stats.packets;
+		do {
+			start = u64_stats_fetch_begin_bh(&ring->tx_syncp);
+			_bytes = ring->tx_stats.bytes;
+			_packets = ring->tx_stats.packets;
+		} while (u64_stats_fetch_retry_bh(&ring->tx_syncp, start));
+		bytes += _bytes;
+		packets += _packets;
 	}
 	net_stats->tx_bytes = bytes;
 	net_stats->tx_packets = packets;
@@ -5397,7 +5433,10 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 		if (__netif_subqueue_stopped(netdev, tx_ring->queue_index) &&
 		    !(test_bit(__IGB_DOWN, &adapter->state))) {
 			netif_wake_subqueue(netdev, tx_ring->queue_index);
+
+			u64_stats_update_begin(&tx_ring->tx_syncp);
 			tx_ring->tx_stats.restart_queue++;
+			u64_stats_update_end(&tx_ring->tx_syncp);
 		}
 	}
 
@@ -5437,8 +5476,10 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 	}
 	tx_ring->total_bytes += total_bytes;
 	tx_ring->total_packets += total_packets;
+	u64_stats_update_begin(&tx_ring->tx_syncp);
 	tx_ring->tx_stats.bytes += total_bytes;
 	tx_ring->tx_stats.packets += total_packets;
+	u64_stats_update_end(&tx_ring->tx_syncp);
 	return count < tx_ring->count;
 }
 
@@ -5480,9 +5521,11 @@ static inline void igb_rx_checksum_adv(struct igb_ring *ring,
 		 * packets, (aka let the stack check the crc32c)
 		 */
 		if ((skb->len == 60) &&
-		    (ring->flags & IGB_RING_FLAG_RX_SCTP_CSUM))
+		    (ring->flags & IGB_RING_FLAG_RX_SCTP_CSUM)) {
+			u64_stats_update_begin(&ring->rx_syncp);
 			ring->rx_stats.csum_err++;
-
+			u64_stats_update_end(&ring->rx_syncp);
+		}
 		/* let the stack verify checksum errors */
 		return;
 	}
@@ -5669,8 +5712,10 @@ next_desc:
 
 	rx_ring->total_packets += total_packets;
 	rx_ring->total_bytes += total_bytes;
+	u64_stats_update_begin(&rx_ring->rx_syncp);
 	rx_ring->rx_stats.packets += total_packets;
 	rx_ring->rx_stats.bytes += total_bytes;
+	u64_stats_update_end(&rx_ring->rx_syncp);
 	return cleaned;
 }
 
@@ -5698,8 +5743,10 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 		if ((bufsz < IGB_RXBUFFER_1024) && !buffer_info->page_dma) {
 			if (!buffer_info->page) {
 				buffer_info->page = netdev_alloc_page(netdev);
-				if (!buffer_info->page) {
+				if (unlikely(!buffer_info->page)) {
+					u64_stats_update_begin(&rx_ring->rx_syncp);
 					rx_ring->rx_stats.alloc_failed++;
+					u64_stats_update_end(&rx_ring->rx_syncp);
 					goto no_buffers;
 				}
 				buffer_info->page_offset = 0;
@@ -5714,7 +5761,9 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 			if (dma_mapping_error(rx_ring->dev,
 					      buffer_info->page_dma)) {
 				buffer_info->page_dma = 0;
+				u64_stats_update_begin(&rx_ring->rx_syncp);
 				rx_ring->rx_stats.alloc_failed++;
+				u64_stats_update_end(&rx_ring->rx_syncp);
 				goto no_buffers;
 			}
 		}
@@ -5722,8 +5771,10 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 		skb = buffer_info->skb;
 		if (!skb) {
 			skb = netdev_alloc_skb_ip_align(netdev, bufsz);
-			if (!skb) {
+			if (unlikely(!skb)) {
+				u64_stats_update_begin(&rx_ring->rx_syncp);
 				rx_ring->rx_stats.alloc_failed++;
+				u64_stats_update_end(&rx_ring->rx_syncp);
 				goto no_buffers;
 			}
 
@@ -5737,7 +5788,9 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 			if (dma_mapping_error(rx_ring->dev,
 					      buffer_info->dma)) {
 				buffer_info->dma = 0;
+				u64_stats_update_begin(&rx_ring->rx_syncp);
 				rx_ring->rx_stats.alloc_failed++;
+				u64_stats_update_end(&rx_ring->rx_syncp);
 				goto no_buffers;
 			}
 		}



^ permalink raw reply related

* Documenting UNIX domain autobind
From: Michael Kerrisk @ 2010-10-10  5:48 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev, mzxreary

Hello Tetsuo,

I'm the Linux man-pages mainatiner. I write to you because I see that
you recently (http://kerneltrap.org/mailarchive/linux-netdev/2010/8/30/6284106/thread#mid-6284106)
did some work patchiing Linux unix_autobind(), so you may know the
answer to this question. But, also others on the CC may know.

I recently noticed this feature in the kernel, and so added some
documentation to the unix(7) man page. That text reads as follows:

   Autobind Feature
       If a bind() call specifies addrlen as  sizeof(sa_family_t),  or
       the  SO_PASSCRED  socket option was specified for a socket that
       was not explicitly bound to an  address,  then  the  socket  is
       autobound  to  an  abstract address.  The address consists of a
       null byte followed by 5 bytes in the  character  set  [0-9a-f].
       (Thus, there is a limit of 2^20 autobind addresses.)

I think this text correctly documents the technical details (but let
me know if you see errors). What is lacking is an explanation of why
this feature exists. Is someone able to explain where this feature is
used and why?

thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply

* Re: [PATCH] Add IRQF_SAMPLE_RANDOM Flag to forcedeth
From: Patrick Simmons @ 2010-10-10  3:23 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev
In-Reply-To: <20101009.201501.214209347.davem@davemloft.net>

On 10/09/10 21:15, David Miller wrote:
> From: Ben Hutchings<bhutchings@solarflare.com>
> Date: Sun, 10 Oct 2010 02:09:24 +0100
>
>> Patrick Simmons wrote:
>>> This patch adds the IRQF_SAMPLE_RANDOM flag to the forcedeth driver,
>>> allowing the interrupt timing for forcedeth to be used for entropy
>>> generation.  This should help /dev/random generate more secure random
>>> numbers on machines using this driver.
>> [...]
>>
>> We don't enable this for network drivers any more because:
>>
>> 1. At high packet rates, interrupt moderation makes interrupts very
>> regular.
>> 2. At low packet rates, a malicious sender can control the interrupt
>> timing.
>
> Agreed on all counts, I'm not applying this patch.

It's enabled for other network drivers, which is where I got the idea 
from.  Has anyone actually done an experiment to see whether these two 
concerns are valid?

--Patrick

-- 
If I'm not here, I've gone out to find myself.  If I get back before I 
return, please keep me here.

^ permalink raw reply

* Re: [PATCH] Add IRQF_SAMPLE_RANDOM Flag to forcedeth
From: David Miller @ 2010-10-10  3:15 UTC (permalink / raw)
  To: bhutchings; +Cc: linuxrocks123, netdev
In-Reply-To: <20101010010924.GB15074@solarflare.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Sun, 10 Oct 2010 02:09:24 +0100

> Patrick Simmons wrote:
>> This patch adds the IRQF_SAMPLE_RANDOM flag to the forcedeth driver, 
>> allowing the interrupt timing for forcedeth to be used for entropy 
>> generation.  This should help /dev/random generate more secure random 
>> numbers on machines using this driver.
> [...]
> 
> We don't enable this for network drivers any more because:
> 
> 1. At high packet rates, interrupt moderation makes interrupts very
> regular.
> 2. At low packet rates, a malicious sender can control the interrupt
> timing.

Agreed on all counts, I'm not applying this patch.

^ permalink raw reply

* Re: [PATCH] net: Fix sk_dst_check() to reset the obsolete dst_entry of a socket.
From: Chung-Yih Wang (王崇懿) @ 2010-10-10  1:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, timo.teras
In-Reply-To: <AANLkTikz8=H9=O2RR9J0=rHm7pcwabnMupKfiSfGnt1k@mail.gmail.com>

After I gave it a try, that patch worked. Please ignore my patch then. Thanks!

On Thu, Oct 7, 2010 at 12:37 PM, Chung-Yih Wang (王崇懿) <cywang@google.com> wrote:
> As I am testing the l2tp/ipsec client(it is working fine on 2.6.32 but
> failed on 2.6.35 with the same client). Please see the following log
> dump for sk_dst_check().
>
> <2>[  201.390289] ==== sk_dst_check sk=C7485800 dst=C717AC60
> obsolete=FFFFFFFF cookie=0 check=C0296510
> <2>[  211.247467] ==== sk_dst_check sk=C7485000 dst=C717AC60
> obsolete=FFFFFFFF cookie=0 check=C0296510
>
> [Basically, the ipsec tunnel is built and then free the dst_entry for
> this l2tp socket. Therefore, the obsolete entry should be reset in
> sk_dst_check(). However, the kernel 2.6.35 will do nothing here since
> the ipv4_dst_check still return the obsolete entry even if it is
> obsolete(dst->obsolete=2)]
>
> <2>[  216.571350] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <6>[  218.069396] alg: No test for authenc(hmac(sha1),cbc(des3_ede))
> (authenc(hmac(sha1-generic),cbc(des3_ede-generic)))
> <6>[  218.164764] batt:  96%, 4114 mV, 0 mA (-6 avg), 27.2 C, 1305 mAh
> <2>[  218.575561] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  220.580535] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  222.585754] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  224.591522] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  226.599212] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  228.602600] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  230.608062] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  232.613464] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  234.618896] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  236.623504] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  238.628936] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  240.634338] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  242.639709] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  244.645111] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  246.648864] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  248.654693] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  250.660125] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
> <2>[  252.665527] ==== sk_dst_check sk=C7485400 dst=C6F670E0
> obsolete=00000002 cookie=0 check=C0296510
>

^ permalink raw reply

* Re: [PATCH] Add IRQF_SAMPLE_RANDOM Flag to forcedeth
From: Ben Hutchings @ 2010-10-10  1:09 UTC (permalink / raw)
  To: Patrick Simmons; +Cc: netdev
In-Reply-To: <4CB0FEA6.3030206@netscape.net>

Patrick Simmons wrote:
> This patch adds the IRQF_SAMPLE_RANDOM flag to the forcedeth driver, 
> allowing the interrupt timing for forcedeth to be used for entropy 
> generation.  This should help /dev/random generate more secure random 
> numbers on machines using this driver.
[...]

We don't enable this for network drivers any more because:

1. At high packet rates, interrupt moderation makes interrupts very
regular.
2. At low packet rates, a malicious sender can control the interrupt
timing.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [PATCH net-next] igb: fix stats handling
From: Tantilov, Emil S @ 2010-10-09 23:57 UTC (permalink / raw)
  To: Eric Dumazet, Kirsher, Jeffrey T
  Cc: Jesper Dangaard Brouer, Duyck, Alexander H,
	Jesper Dangaard Brouer, David S. Miller, netdev, Wyborny, Carolyn
In-Reply-To: <1286339791.4861.26.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le mercredi 06 octobre 2010 à 05:28 +0200, Eric Dumazet a écrit :
> 
>> I'll let Intel guys doing the backporting work, but for old kernels,
>> you'll probably need to use "unsigned long" instead of "u64"
>> 
>> My plan is :
>> 
>> - Provide 64bit counters even on 32bit arch
>> - with proper synchro (include/linux/u64_stats_sync.h)
>> - Add a spinlock so we can apply Jesper patch.
> 
> Here is the net-next-2.6 patch, I am currently enable to test it, the
> dev machine with IGB NIC cannot be restarted until tomorrow, my son
> Nicolas is currently using it ;)
> 
> Could you and/or Jesper test it, possibly on 32 and 64 bit kernels ?
> 
> Thanks !
> 
> [PATCH net-next] igb: fix stats handling
> 
> There are currently some problems with igb.
> 
> - On 32bit arches, maintaining 64bit counters without proper
> synchronization between writers and readers.
> 
> - Stats updated every two seconds, as reported by Jesper.
>    (Jesper provided a patch for this)
> 
> - Potential problem between worker thread and ethtool -S
> 
> This patch uses u64_stats_sync, and convert everything to be 64bit
> safe, 
> SMP safe, even on 32bit arches.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  drivers/net/igb/igb.h         |    7 +-
>  drivers/net/igb/igb_ethtool.c |   10 +-
>  drivers/net/igb/igb_main.c    |  111 +++++++++++++++++++++++---------
>  3 files changed, 94 insertions(+), 34 deletions(-)

This patch is causing a hang when testing with 2 sessions in a while loop reading /proc/net/dev/ and ethtool -S. I think even just reading /proc/net/dev/ is sufficient, but have not confirmed it yet. I have seen the hang somewhere between 15 min to an hour. Without the patch same test ran 24+ hours without issues.

There was no trace on the screen, I got this with magic sysrq:

[15388.393579] SysRq : Show Regs 
[15388.397341] Modules linked in: igb [last unloaded: scsi_wait_scan] [15388.404846]  
[15388.406889] Pid: 16218, comm: kworker/4:1 Not tainted 2.6.36-rc3-net-next-igb-100810+ #2 S5520HC/S5520HC 
[15388.418393] EIP: 0060:[<c13fead2>] EFLAGS: 00000297 CPU: 4 
[15388.424908] EIP is at _raw_spin_lock+0x13/0x19 
[15388.430257] EAX: f6eab55c EBX: f6eab380 ECX: 00000001 EDX: 00004e4a [15388.437629] ESI: f6eab000 EDI: f6eab41c EBP: f3d9bf4c ESP: f3d9bf4c [15388.445011]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 
[15388.451422] Process kworker/4:1 (pid: 16218, ti=f3d9a000 task=f5ce0ed0 task.ti=f3d9a000) 
[15388.461173] Stack: 
[15388.463796]  f3d9bf64 f8586c57 f3d9bf5c f6eab41c f6901e00 c8703cc0 f3d9bf90 c1041379 
[15388.473116] <0> 00000004 00000000 f8586b16 00000000 c8707b05 c8707b00 f6901e00 c8703cc4 
[15388.483379] <0> c8703cc0 f3d9bfb8 c1042879 c16bb640 c8703cc4 00000c54 c16bb640 f6901e10 
[15388.494219] Call Trace: 
[15388.497336]  [<f8586c57>] ? igb_watchdog_task+0x141/0x21a [igb] 
[15388.504336]  [<c1041379>] ? process_one_work+0x18e/0x265 
[15388.510643]  [<f8586b16>] ? igb_watchdog_task+0x0/0x21a [igb] 
[15388.517455]  [<c1042879>] ? worker_thread+0xf3/0x1ef 
[15388.523384]  [<c1042786>] ? worker_thread+0x0/0x1ef 
[15388.529222]  [<c104506b>] ? kthread+0x62/0x67 
[15388.534475]  [<c1045009>] ? kthread+0x0/0x67 
[15388.539623]  [<c1002d36>] ? kernel_thread_helper+0x6/0x10 
[15388.546034] Code: 00 75 05 f0 66 0f b1 0a 0f 94 c1 0f b6 c1 85 c0 0f 95 c0 0f b6 c0 5d c3 55 ba 00 01 00 00 89 e5 f0 66 0f c1 10 38 f2 74 06 f3 90 <8a> 10 eb f6 5d c3 55 89 e5 9c 59 fa ba 00 01 00 00 f0 66 0f c1

Thanks,
Emil


^ permalink raw reply

* [PATCH] Add IRQF_SAMPLE_RANDOM Flag to forcedeth
From: Patrick Simmons @ 2010-10-09 23:45 UTC (permalink / raw)
  To: netdev

This patch adds the IRQF_SAMPLE_RANDOM flag to the forcedeth driver, 
allowing the interrupt timing for forcedeth to be used for entropy 
generation.  This should help /dev/random generate more secure random 
numbers on machines using this driver.

Signed-off-by: Patrick Simmons <linuxrocks123@netscape.net>

Please CC me with any comments as I am not subscribed to the list.

--- linux/drivers/net/forcedeth.c.orig    2010-10-09 17:12:08.400000015 
-0600
+++ linux/drivers/net/forcedeth.c    2010-10-09 17:14:44.880000015 -0600
@@ -3819,7 +3819,7 @@ static int nv_request_irq(struct net_dev
                  /* Request irq for rx handling */
                  sprintf(np->name_rx, "%s-rx", dev->name);
                  if 
(request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector,
-                        nv_nic_irq_rx, IRQF_SHARED, np->name_rx, dev) 
!= 0) {
+                        nv_nic_irq_rx, IRQF_SHARED | 
IRQF_SAMPLE_RANDOM, np->name_rx, dev) != 0) {
                      printk(KERN_INFO "forcedeth: request_irq failed 
for rx %d\n", ret);
                      pci_disable_msix(np->pci_dev);
                      np->msi_flags &= ~NV_MSI_X_ENABLED;
@@ -3828,7 +3828,7 @@ static int nv_request_irq(struct net_dev
                  /* Request irq for tx handling */
                  sprintf(np->name_tx, "%s-tx", dev->name);
                  if 
(request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_TX].vector,
-                        nv_nic_irq_tx, IRQF_SHARED, np->name_tx, dev) 
!= 0) {
+                        nv_nic_irq_tx, IRQF_SHARED | 
IRQF_SAMPLE_RANDOM, np->name_tx, dev) != 0) {
                      printk(KERN_INFO "forcedeth: request_irq failed 
for tx %d\n", ret);
                      pci_disable_msix(np->pci_dev);
                      np->msi_flags &= ~NV_MSI_X_ENABLED;
@@ -3837,7 +3837,7 @@ static int nv_request_irq(struct net_dev
                  /* Request irq for link and timer handling */
                  sprintf(np->name_other, "%s-other", dev->name);
                  if 
(request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_OTHER].vector,
-                        nv_nic_irq_other, IRQF_SHARED, np->name_other, 
dev) != 0) {
+                        nv_nic_irq_other, IRQF_SHARED | 
IRQF_SAMPLE_RANDOM, np->name_other, dev) != 0) {
                      printk(KERN_INFO "forcedeth: request_irq failed 
for link %d\n", ret);
                      pci_disable_msix(np->pci_dev);
                      np->msi_flags &= ~NV_MSI_X_ENABLED;
@@ -3851,7 +3851,7 @@ static int nv_request_irq(struct net_dev
                  set_msix_vector_map(dev, NV_MSI_X_VECTOR_OTHER, 
NVREG_IRQ_OTHER);
              } else {
                  /* Request irq for all interrupts */
-                if 
(request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector, handler, 
IRQF_SHARED, dev->name, dev) != 0) {
+                if 
(request_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector, handler, 
IRQF_SHARED | IRQF_SAMPLE_RANDOM, dev->name, dev) != 0) {
                      printk(KERN_INFO "forcedeth: request_irq failed 
%d\n", ret);
                      pci_disable_msix(np->pci_dev);
                      np->msi_flags &= ~NV_MSI_X_ENABLED;
@@ -3868,7 +3868,7 @@ static int nv_request_irq(struct net_dev
          if ((ret = pci_enable_msi(np->pci_dev)) == 0) {
              np->msi_flags |= NV_MSI_ENABLED;
              dev->irq = np->pci_dev->irq;
-            if (request_irq(np->pci_dev->irq, handler, IRQF_SHARED, 
dev->name, dev) != 0) {
+            if (request_irq(np->pci_dev->irq, handler, IRQF_SHARED | 
IRQF_SAMPLE_RANDOM, dev->name, dev) != 0) {
                  printk(KERN_INFO "forcedeth: request_irq failed %d\n", 
ret);
                  pci_disable_msi(np->pci_dev);
                  np->msi_flags &= ~NV_MSI_ENABLED;
@@ -3884,7 +3884,7 @@ static int nv_request_irq(struct net_dev
          }
      }
      if (ret != 0) {
-        if (request_irq(np->pci_dev->irq, handler, IRQF_SHARED, 
dev->name, dev) != 0)
+        if (request_irq(np->pci_dev->irq, handler, IRQF_SHARED | 
IRQF_SAMPLE_RANDOM, dev->name, dev) != 0)
              goto out_err;

      }


^ permalink raw reply

* Re: [PATCH -next] sundance: Add initial ethtool stats support
From: Jeff Garzik @ 2010-10-09 21:48 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: netdev, davem, Eric Dumazet, Ben Hutchings
In-Reply-To: <4CB0C56C.2000106@kernel.org>

On 10/09/2010 03:41 PM, Denis Kirjanov wrote:
> On 10/09/2010 05:27 PM, Ben Hutchings wrote:
>> Denis Kirjanov wrote:
>>> Add initial ethtool statistics support
>> [...]
>>> +static void get_ethtool_stats(struct net_device *dev,
>>> +		struct ethtool_stats *stats, u64 *data)
>>> +{
>>> +	struct net_device_stats *netdev_stats = get_stats(dev);
>>> +	int i = 0;
>>> +
>>> +	data[i++] = netdev_stats->tx_packets;
>>> +	data[i++] = netdev_stats->tx_bytes;
>>> +	data[i++] = netdev_stats->rx_packets;
>>> +	data[i++] = netdev_stats->rx_bytes;
>>> +	data[i++] = netdev_stats->tx_errors;
>>> +	data[i++] = netdev_stats->tx_dropped;
>>> +	data[i++] = netdev_stats->rx_errors;
>>> +}
>> [...]
>>
>> There is no point in adding ethtool stats that merely mirror the baseline
>> net device stats.
>>
>> Ben.
>>
>
> [PATCH -next v2] sundance: Add ethtool stats support
>
> Add ethtool stats support
>
> Signed-off-by: Denis Kirjanov<dkirjanov@kernel.org>
> ---
> V2:
>   check for the ETH_SS_STATS in get_string()
>   use xstats struct for ethtool stats
>
> drivers/net/sundance.c |   90 ++++++++++++++++++++++++++++++++++++++++++++----
>   1 files changed, 83 insertions(+), 7 deletions(-)

Acked-by: Jeff Garzik <jgarzik@redhat.com>

glad somebody tackled this...



^ permalink raw reply

* Re: [PATCH] forcedeth: reconfigure multicast packet filter only when needed
From: Jindřich Makovička @ 2010-10-09 20:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-kernel, netdev, davem, aabdulla, mditto
In-Reply-To: <20101009113213.62c2f131@nehalam>

[-- Attachment #1: Type: text/plain, Size: 661 bytes --]

2010/10/9 Stephen Hemminger <shemminger@vyatta.com>:
> On Sat, 9 Oct 2010 13:26:05 +0200
> Jindřich Makovička <makovick@gmail.com> wrote:
>
>> +
>> +     /* current packet filter state */
>> +     u32 cur_pff;
>> +     u32 cur_addr[2];
>> +     u32 cur_mask[2];
>>  };
>
> No big deal, but couldn't you just put those temporary variables
> on the stack. and reread the current value before stopping.

Sure, this version should do the same (still untested, I don't have a
machine with nForce here at the moment). I just wanted to avoid more
NIC register accesses, but it's probably a premature optimization.

Regards,
-- 
Jindrich Makovicka

[-- Attachment #2: forcedeth2.diff --]
[-- Type: text/x-patch, Size: 2001 bytes --]

--- forcedeth.c.orig	2010-10-07 08:56:55.564511153 +0200
+++ forcedeth.c	2010-10-09 22:34:53.151523511 +0200
@@ -3027,9 +3027,15 @@
 {
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
-	u32 addr[2];
-	u32 mask[2];
-	u32 pff = readl(base + NvRegPacketFilterFlags) & NVREG_PFF_PAUSE_RX;
+	u32 addr[2], prev_addr[2];
+	u32 mask[2], prev_mask[2];
+	u32 prev_pff = readl(base + NvRegPacketFilterFlags);
+	u32 pff = prev_pff & NVREG_PFF_PAUSE_RX;
+
+	prev_addr[0] = readl(base + NvRegMulticastAddrA);
+	prev_addr[1] = readl(base + NvRegMulticastAddrB);
+	prev_mask[0] = readl(base + NvRegMulticastMaskA);
+	prev_mask[1] = readl(base + NvRegMulticastMaskB);
 
 	memset(addr, 0, sizeof(addr));
 	memset(mask, 0, sizeof(mask));
@@ -3072,17 +3078,25 @@
 	}
 	addr[0] |= NVREG_MCASTADDRA_FORCE;
 	pff |= NVREG_PFF_ALWAYS;
-	spin_lock_irq(&np->lock);
-	nv_stop_rx(dev);
-	writel(addr[0], base + NvRegMulticastAddrA);
-	writel(addr[1], base + NvRegMulticastAddrB);
-	writel(mask[0], base + NvRegMulticastMaskA);
-	writel(mask[1], base + NvRegMulticastMaskB);
-	writel(pff, base + NvRegPacketFilterFlags);
-	dprintk(KERN_INFO "%s: reconfiguration for multicast lists.\n",
-		dev->name);
-	nv_start_rx(dev);
-	spin_unlock_irq(&np->lock);
+	if (prev_pff != pff
+	    || memcmp(prev_addr, addr, sizeof(prev_addr)) != 0
+	    || memcmp(prev_mask, mask, sizeof(prev_mask)) != 0)
+	{
+		dprintk(KERN_INFO "%s: reconfiguration for multicast lists.\n",
+			dev->name);
+		spin_lock_irq(&np->lock);
+		nv_stop_rx(dev);
+		writel(addr[0], base + NvRegMulticastAddrA);
+		writel(addr[1], base + NvRegMulticastAddrB);
+		writel(mask[0], base + NvRegMulticastMaskA);
+		writel(mask[1], base + NvRegMulticastMaskB);
+		writel(pff, base + NvRegPacketFilterFlags);
+		nv_start_rx(dev);
+		spin_unlock_irq(&np->lock);
+	} else {
+		dprintk(KERN_INFO "%s: pff state unchanged - skipping reconfiguration.\n",
+			dev->name);
+	}
 }
 
 static void nv_update_pause(struct net_device *dev, u32 pause_flags)

^ permalink raw reply

* Re: [PATCH -next] sundance: Add initial ethtool stats support
From: Denis Kirjanov @ 2010-10-09 19:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, Eric Dumazet, Ben Hutchings
In-Reply-To: <20101009132749.GA15074@solarflare.com>

On 10/09/2010 05:27 PM, Ben Hutchings wrote:
> Denis Kirjanov wrote:
>> Add initial ethtool statistics support 
> [...]
>> +static void get_ethtool_stats(struct net_device *dev,
>> +		struct ethtool_stats *stats, u64 *data)
>> +{
>> +	struct net_device_stats *netdev_stats = get_stats(dev);
>> +	int i = 0;
>> +
>> +	data[i++] = netdev_stats->tx_packets;
>> +	data[i++] = netdev_stats->tx_bytes;
>> +	data[i++] = netdev_stats->rx_packets;
>> +	data[i++] = netdev_stats->rx_bytes;
>> +	data[i++] = netdev_stats->tx_errors;
>> +	data[i++] = netdev_stats->tx_dropped;
>> +	data[i++] = netdev_stats->rx_errors;
>> +}
> [...]
> 
> There is no point in adding ethtool stats that merely mirror the baseline
> net device stats.
> 
> Ben.
> 

[PATCH -next v2] sundance: Add ethtool stats support

Add ethtool stats support

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---
V2: 
 check for the ETH_SS_STATS in get_string()
 use xstats struct for ethtool stats

drivers/net/sundance.c |   90 ++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/drivers/net/sundance.c b/drivers/net/sundance.c
index 4283cc5..4e3ff71 100644
--- a/drivers/net/sundance.c
+++ b/drivers/net/sundance.c
@@ -363,6 +363,19 @@ struct netdev_private {
         dma_addr_t tx_ring_dma;
         dma_addr_t rx_ring_dma;
 	struct timer_list timer;		/* Media monitoring timer. */
+	/* ethtool extra stats */
+	struct {
+		unsigned long tx_multiple_collisions;
+		unsigned long tx_single_collisions;
+		unsigned long tx_late_collisions;
+		unsigned long tx_deffered;
+		unsigned long tx_deffered_excessive;
+		unsigned long tx_aborted;
+		unsigned long tx_bcasts;
+		unsigned long rx_bcasts;
+		unsigned long tx_mcasts;
+		unsigned long rx_mcasts;
+	} xstats;
 	/* Frequently used values: keep some adjacent for cache effect. */
 	spinlock_t lock;
 	int msg_enable;
@@ -1486,7 +1499,6 @@ static struct net_device_stats *get_stats(struct net_device *dev)
 {
 	struct netdev_private *np = netdev_priv(dev);
 	void __iomem *ioaddr = np->base;
-	int i;
 	unsigned long flags;
 
 	spin_lock_irqsave(&np->statlock, flags);
@@ -1494,13 +1506,23 @@ static struct net_device_stats *get_stats(struct net_device *dev)
 	dev->stats.rx_missed_errors	+= ioread8(ioaddr + RxMissed);
 	dev->stats.tx_packets += ioread16(ioaddr + TxFramesOK);
 	dev->stats.rx_packets += ioread16(ioaddr + RxFramesOK);
-	dev->stats.collisions += ioread8(ioaddr + StatsLateColl);
-	dev->stats.collisions += ioread8(ioaddr + StatsMultiColl);
-	dev->stats.collisions += ioread8(ioaddr + StatsOneColl);
 	dev->stats.tx_carrier_errors += ioread8(ioaddr + StatsCarrierError);
-	ioread8(ioaddr + StatsTxDefer);
-	for (i = StatsTxDefer; i <= StatsMcastRx; i++)
-		ioread8(ioaddr + i);
+
+	np->xstats.tx_multiple_collisions += ioread8(ioaddr + StatsMultiColl);
+	np->xstats.tx_single_collisions += ioread8(ioaddr + StatsOneColl);
+	np->xstats.tx_late_collisions += ioread8(ioaddr + StatsLateColl);
+	dev->stats.collisions += np->xstats.tx_multiple_collisions
+		+ np->xstats.tx_single_collisions
+		+ np->xstats.tx_late_collisions;
+
+	np->xstats.tx_deffered += ioread8(ioaddr + StatsTxDefer);
+	np->xstats.tx_deffered_excessive += ioread8(ioaddr + StatsTxXSDefer);
+	np->xstats.tx_aborted += ioread8(ioaddr + StatsTxAbort);
+	np->xstats.tx_bcasts += ioread8(ioaddr + StatsBcastTx);
+	np->xstats.rx_bcasts += ioread8(ioaddr + StatsBcastRx);
+	np->xstats.tx_mcasts += ioread8(ioaddr + StatsMcastTx);
+	np->xstats.rx_mcasts += ioread8(ioaddr + StatsMcastRx);
+
 	dev->stats.tx_bytes += ioread16(ioaddr + TxOctetsLow);
 	dev->stats.tx_bytes += ioread16(ioaddr + TxOctetsHigh) << 16;
 	dev->stats.rx_bytes += ioread16(ioaddr + RxOctetsLow);
@@ -1566,6 +1588,21 @@ static int __set_mac_addr(struct net_device *dev)
 	return 0;
 }
 
+static const struct {
+	const char name[ETH_GSTRING_LEN];
+} sundance_stats[] = {
+	{ "tx_multiple_collisions" },
+	{ "tx_single_collisions" },
+	{ "tx_late_collisions" },
+	{ "tx_deffered" },
+	{ "tx_deffered_excessive" },
+	{ "tx_aborted" },
+	{ "tx_bcasts" },
+	{ "rx_bcasts" },
+	{ "tx_mcasts" },
+	{ "rx_mcasts" },
+};
+
 static int check_if_running(struct net_device *dev)
 {
 	if (!netif_running(dev))
@@ -1624,6 +1661,42 @@ static void set_msglevel(struct net_device *dev, u32 val)
 	np->msg_enable = val;
 }
 
+static void get_strings(struct net_device *dev, u32 stringset,
+		u8 *data)
+{
+	if (stringset == ETH_SS_STATS)
+		memcpy(data, sundance_stats, sizeof(sundance_stats));
+}
+
+static int get_sset_count(struct net_device *dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return ARRAY_SIZE(sundance_stats);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void get_ethtool_stats(struct net_device *dev,
+		struct ethtool_stats *stats, u64 *data)
+{
+	struct netdev_private *np = netdev_priv(dev);
+	int i = 0;
+
+	get_stats(dev);
+	data[i++] = np->xstats.tx_multiple_collisions;
+	data[i++] = np->xstats.tx_single_collisions;
+	data[i++] = np->xstats.tx_late_collisions;
+	data[i++] = np->xstats.tx_deffered;
+	data[i++] = np->xstats.tx_deffered_excessive;
+	data[i++] = np->xstats.tx_aborted;
+	data[i++] = np->xstats.tx_bcasts;
+	data[i++] = np->xstats.rx_bcasts;
+	data[i++] = np->xstats.tx_mcasts;
+	data[i++] = np->xstats.rx_mcasts;
+}
+
 static const struct ethtool_ops ethtool_ops = {
 	.begin = check_if_running,
 	.get_drvinfo = get_drvinfo,
@@ -1633,6 +1706,9 @@ static const struct ethtool_ops ethtool_ops = {
 	.get_link = get_link,
 	.get_msglevel = get_msglevel,
 	.set_msglevel = set_msglevel,
+	.get_strings = get_strings,
+	.get_sset_count = get_sset_count,
+	.get_ethtool_stats = get_ethtool_stats,
 };
 
 static int netdev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
-- 
1.7.0


^ permalink raw reply related

* Re: [PATCH] forcedeth: reconfigure multicast packet filter only when needed
From: Stephen Hemminger @ 2010-10-09 18:32 UTC (permalink / raw)
  To: Jindřich Makovička; +Cc: linux-kernel, netdev, davem, aabdulla, ditto
In-Reply-To: <AANLkTinbgud9enM1XDQrdZVBtGnTFaUeoEL-D8Z2FmJh@mail.gmail.com>

On Sat, 9 Oct 2010 13:26:05 +0200
Jindřich Makovička <makovick@gmail.com> wrote:

> +
> +	/* current packet filter state */
> +	u32 cur_pff;
> +	u32 cur_addr[2];
> +	u32 cur_mask[2];
>  };

No big deal, but couldn't you just put those temporary variables
on the stack. and reread the current value before stopping.

-- 

^ permalink raw reply

* Re: [PATCH] ehea: Fix a checksum issue on the receive path
From: Stephen Hemminger @ 2010-10-09 17:31 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, leitao, netdev, fubar
In-Reply-To: <20101009.092043.58419374.davem@davemloft.net>

On Sat, 09 Oct 2010 09:20:43 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 08 Oct 2010 16:36:17 +0200
> 
> > I am pretty sure most (if not all) netdev drivers pass the packet with
> > invalid checksum to upper stack, so that we can increment appropriate
> > SNMP counters, in IP stack or UDP/TCP/whatever stack.
> > 
> > tg3, bnx2, e1000, skge, sky2, bnx2x, niu, r8169, igb, ... seems to do
> > that.
> 
> Drivers _must_ send up all packets, even those with bad checksums,
> without exception.
> 
> Otherwise protocol statistics get lost, netfilter log entries go
> missing, etc.

Also hardware checksum can be wrong/broken. By passing up a packet
which the driver thinks is bad, the software can still work.

^ permalink raw reply

* Re: [PATCH net-next] net: percpu net_device refcount
From: Paul E. McKenney @ 2010-10-09 16:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, David Miller, netdev
In-Reply-To: <1286605396.2692.10.camel@edumazet-laptop>

On Sat, Oct 09, 2010 at 08:23:16AM +0200, Eric Dumazet wrote:
> Le vendredi 08 octobre 2010 à 14:56 -0700, Paul E. McKenney a écrit :
> > On Thu, Oct 07, 2010 at 10:30:51AM -0700, Stephen Hemminger wrote:
> > > On Thu, 07 Oct 2010 19:12:35 +0200
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > 
> > > > We tried very hard to remove all possible dev_hold()/dev_put() pairs in
> > > > network stack, using RCU conversions.
> > > > 
> > > > There is still an unavoidable device refcount change for every dst we
> > > > create/destroy, and this can slow down some workloads (routers or some
> > > > app servers)
> > > > 
> > > > We can switch to a percpu refcount implementation, now dynamic per_cpu
> > > > infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
> > > > per device.
> > > 
> > > It makes sense, but what about 256 cores and 1024 Vlans?
> > > That adds up to 4M of memory which is might be noticeable.
> > 
> > I bet that systems that have 256 cores have >100GB of memory, at which
> > point 4MB is way down in the noise.
> 
> Well, first its 1MB added, and secondly we added percpu stats for vlan
> devices, and this consumed 8x more :
> 
> (struct vlan_rx_stats is 32 bytes per cpu and per vlan
> 32*256*1024  ->  8 Mbytes
> 
> Some strange machines have many cores sharing a small amount of memory,
> but I am not sure they want to run many net devices ;)

I do have to admit that the rapid growth rate in the data required might
well be cause for concern.  But only if it continues.  ;-)

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH] net/tg3: simplify conditional
From: David Miller @ 2010-10-09 16:23 UTC (permalink / raw)
  To: mcarlson; +Cc: nikai, mchan, netdev, linux-kernel
In-Reply-To: <20101008174312.GA3562@mcarlson.broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Fri, 8 Oct 2010 10:43:12 -0700

> On Fri, Oct 08, 2010 at 02:29:27AM -0700, Nicolas Kaiser wrote:
>> Simplify: ((a && !b) || (!a && b)) => (a != b)
>> 
>> Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
 ...
> Looks good to me.

Applied.

Matt, you can give an "Acked-by: " tag to indicate your approval
of a patch in the future :-)

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] sundance: get_stats proper locking
From: David Miller @ 2010-10-09 16:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dkirjanov, netdev
In-Reply-To: <1286626621.2692.18.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 09 Oct 2010 14:17:01 +0200

> Le samedi 09 octobre 2010 à 09:53 +0000, Denis Kirjanov a écrit :
>> Add initial ethtool statistics support 
> 
> OK, I guess its time to add proper locking into sundance after all ;)
> 
> [PATCH net-next] sundance: get_stats proper locking
> 
> sundance get_stats() should not be run concurrently, add a lock to avoid
> potential losses.
> 
> Note: Remove unused rx_lock field
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net_sched: use __TCA_HTB_MAX and TCA_HTB_MAX
From: David Miller @ 2010-10-09 16:23 UTC (permalink / raw)
  To: xiaosuo; +Cc: hadi, netdev
In-Reply-To: <1285863464-5299-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Fri,  1 Oct 2010 00:17:44 +0800

> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ehea: simplify conditional
From: David Miller @ 2010-10-09 16:21 UTC (permalink / raw)
  To: leitao; +Cc: nikai, netdev, linux-kernel
In-Reply-To: <4CAE5C71.50304@linux.vnet.ibm.com>

From: Breno Leitao <leitao@linux.vnet.ibm.com>
Date: Thu, 07 Oct 2010 20:49:05 -0300

> On 10/07/2010 08:14 PM, Nicolas Kaiser wrote:
>> Simplify: ((a&&  b) || (!a&&  !b)) =>  (a == b)
>>
>> Signed-off-by: Nicolas Kaiser<nikai@nikai.net>
> Acked-by: Breno Leitao <leitao@linux.vnet.ibm.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ehea: Fix a checksum issue on the receive path
From: David Miller @ 2010-10-09 16:20 UTC (permalink / raw)
  To: eric.dumazet; +Cc: leitao, netdev, fubar
In-Reply-To: <1286548577.2959.412.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Oct 2010 16:36:17 +0200

> I am pretty sure most (if not all) netdev drivers pass the packet with
> invalid checksum to upper stack, so that we can increment appropriate
> SNMP counters, in IP stack or UDP/TCP/whatever stack.
> 
> tg3, bnx2, e1000, skge, sky2, bnx2x, niu, r8169, igb, ... seems to do
> that.

Drivers _must_ send up all packets, even those with bad checksums,
without exception.

Otherwise protocol statistics get lost, netfilter log entries go
missing, etc.

^ permalink raw reply

* Re: [PATCH 2/2] r8169: use device model DMA API
From: David Miller @ 2010-10-09 16:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sgruszka, romieu, netdev
In-Reply-To: <1286611056.2692.12.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 09 Oct 2010 09:57:35 +0200

> Le vendredi 08 octobre 2010 à 16:25 +0200, Stanislaw Gruszka a écrit :
>> Use DMA API as PCI equivalents will be deprecated. This change also
>> allow to allocate with GFP_KERNEL where possible.
>> 
>> Tested-by: Neal Becker <ndbecker2@gmail.com>
>> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
>> ---
>>  drivers/net/r8169.c |   53 +++++++++++++++++++++++++++-----------------------
>>  1 files changed, 29 insertions(+), 24 deletions(-)
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] r8169: allocate with GFP_KERNEL flag when able to sleep
From: David Miller @ 2010-10-09 16:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sgruszka, romieu, netdev
In-Reply-To: <1286610844.2692.11.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 09 Oct 2010 09:54:04 +0200

> Le vendredi 08 octobre 2010 à 16:25 +0200, Stanislaw Gruszka a écrit :
>> We have fedora bug report where driver fail to initialize after
>> suspend/resume because of memory allocation errors:
>> https://bugzilla.redhat.com/show_bug.cgi?id=629158
>> 
>> To fix use GFP_KERNEL allocation where possible.
>> 
>> Tested-by: Neal Becker <ndbecker2@gmail.com>
>> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox