Netdev List

* Best route for re-implementing TCPHA
From: RichardFliam @ 2011-04-13 23:08 UTC
  To: netdev

TCPHA (http://dragon.linux-vs.org/~dragonfly/htm/tcpha.htm) provided
several neat features for content- and health-aware load balancing. I
am looking to re-implement it on the 2.6 kernel, but I am undecided
about a few key features.

In particular, the original project created its own polling methods for
TCP sockets based on fs/select.c and tcp_poll, but to me this seems
inelegant. I am wondering whether there is a "correct" way to poll
sockets in the kernel, or whether I should simply call sock_map_fd on
the kernel socket.

After extensive searching I did find this post
http://permalink.gmane.org/gmane.linux.network/180354 to this mailing
list, but it does not seem to answer the question of the correct
direction for polling TCP sockets in the kernel.

--
--Richard Fliam

* kernel panic, 2.6.38.2, gretap
From: Denys Fedoryshchenko @ 2011-04-13 22:58 UTC
  To: netdev

 I set up the following to route incoming traffic (arriving on eth0) over
 a gretap interface.

 Bringing up the interface:
 ip link add eoip1 type gretap remote X.X.X.X local Y.Y.Y.Y nopmtudisc
 ifconfig eoip1 10.255.254.1 netmask 255.255.255.252 up mtu 1500

 Source routing rule:
 32000:  from all iif eth0 lookup 203

 Some routes were added to table 203.

 After a few (1-3) seconds of running at around 30-40 Mbps, I get a kernel
 panic:

 Notes: I have a VLAN on the same interface, eth0.2023, which carries the
 rest of the traffic; this VLAN is shaped by HTB. It is not involved in the
 gretap operation.
 On eth0 I have a huge bfifo:
 qdisc bfifo 8001: dev eth0 root refcnt 9 limit 100000000b
  Sent 14652829681 bytes 15646355 pkt (dropped 0, overlimits 0 requeues 8)
  backlog 0b 0p requeues 8


 [  658.492347] skb_over_panic: text:f80f37d4 len:3028 put:1514 head:d1af2000 data:d1af20a4 tail:0xd1af2c78 end:0xd1af2700 dev:eth0.2022
 [  658.492975] ------------[ cut here ]------------
 [  658.493264] Kernel BUG at c0377eaf [verbose debug info unavailable]
 [  658.493317] invalid opcode: 0000 [#1] SMP
 [  658.493317] last sysfs file: /sys/devices/virtual/net/eth0.2022/address
 [  658.493317] Modules linked in: ip_gre gre netconsole ipmi_si tun configfs cls_u32 sch_htb 8021q garp stp llc iptable_filter ipt_addrtype xt_dscp xt_string xt_owner xt_multiport xt_iprange xt_hashlimit xt_conntrack xt_DSCP xt_NFQUEUE xt_mark xt_connmark nf_conntrack ip_tables x_tables bnx2 ipmi_devintf ipmi_msghandler processor ata_piix i5k_amb iTCO_wdt pata_acpi hwmon [last unloaded: netconsole]
 [  658.493317]
 [  658.493317] Pid: 0, comm: kworker/0:1 Not tainted 2.6.38.2-devel2 #2 Dell Inc. PowerEdge 1950/0D8635
 [  658.493317] EIP: 0060:[<c0377eaf>] EFLAGS: 00010282 CPU: 3
 [  658.493317] EIP is at skb_put+0x7f/0x89
 [  658.493317] EAX: 0000008e EBX: d1af2c78 ECX: f64b5e40 EDX: c05032e8
 [  658.493317] ESI: 000005ea EDI: f5f28380 EBP: 006d006d ESP: f64b5e3c
 [  658.493317]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 [  658.493317] Process kworker/0:1 (pid: 0, ti=f64b4000 task=f64a4a80 task.ti=f64b0000)
 [  658.493317] Stack:
 [  658.493317]  c05032e8 f80f37d4 00000bd4 000005ea d1af2000 d1af20a4 d1af2c78 d1af2700
 [  658.493317]  f5e54000 00000000 eee81e00 f80f37d4 00000604 00000002 00000000 e5602500
 [  658.493317]  00000001 f6b02cb8 0000004d 512e75c0 eef28480 00000000 f5f28400 f5f28380
 [  658.493317] Call Trace:
 [  658.493317]  [<f80f37d4>] ? bnx2_poll_work+0x980/0xf48 [bnx2]
 [  658.493317]  [<f80f37d4>] ? bnx2_poll_work+0x980/0xf48 [bnx2]
 [  658.493317]  [<c0140e49>] ? hrtimer_start+0x20/0x25
 [  658.493317]  [<f826ffd1>] ? htb_dequeue+0x757/0x770 [sch_htb]
 [  658.493317]  [<f80f3f27>] ? bnx2_poll+0xf7/0x1d9 [bnx2]
 [  658.493317]  [<c037f564>] ? net_rx_action+0x8c/0x176
 [  658.493317]  [<c012f28f>] ? __do_softirq+0x6b/0x104
 [  658.493317]  [<c012f224>] ? __do_softirq+0x0/0x104
 [  658.493317]  <IRQ>
 [  658.493317]  [<c012f17e>] ? irq_exit+0x26/0x59
 [  658.493317]  [<c0103b3d>] ? do_IRQ+0x81/0x95
 [  658.493317]  [<c0102ca9>] ? common_interrupt+0x29/0x30
 [  658.493317]  [<c010807a>] ? mwait_idle+0x51/0x56
 [  658.493317]  [<c0101a97>] ? cpu_idle+0x41/0x5e
 [  658.493317] Code: 24 14 8b 81 a4 00 00 00 89 74 24 0c 89 44 24 10 8b 41 4c c7 04 24 e8 32 50 c0 89 44 24 08 8b 44 24 2c 89 44 24 04 e8 51 85 07 00 <0f> 0b eb fe 83 c4 24 5b 5e c3 55 57 56 53 83 ec 24 fc 89 c5 89
 [  658.493317] EIP: [<c0377eaf>] skb_put+0x7f/0x89 SS:ESP 0068:f64b5e3c
 [  658.512472] ---[ end trace d06a076521439891 ]---
 [  658.512750] Kernel panic - not syncing: Fatal exception in interrupt
 [  658.514034] Rebooting in 5 seconds..



* RE: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval for physical identification
From: Allan, Bruce W @ 2011-04-13 22:55 UTC
  To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1302734679.2873.23.camel@bwh-desktop>

>-----Original Message-----
>From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>Sent: Wednesday, April 13, 2011 3:45 PM
>To: Allan, Bruce W
>Cc: netdev@vger.kernel.org
>Subject: RE: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval
>for physical identification
>
>On Wed, 2011-04-13 at 15:39 -0700, Allan, Bruce W wrote:
>>
>> >-----Original Message-----
>> >From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>> >Sent: Wednesday, April 13, 2011 1:25 PM
>> >To: Allan, Bruce W
>> >Cc: netdev@vger.kernel.org
>> >Subject: Re: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval
>> >for physical identification
>> >
>> >I'm sure there ought to be a clearer way to do this, and to avoid any
>> >weird effects from integer overflow in the multiplication.  How about
>> >using an inner loop for each second:
>> >
>> >		/* Driver expects to be called at twice the frequency in rc */
>> >		int n = rc * 2, i, interval = HZ / n;
>> >
>
>		/* Count down seconds */
>> >		do {
>			/* Count down iterations per second */
>> >			i = n;
>> >			do {
>> >	 			rtnl_lock();
>> > 				rc = dev->ethtool_ops->set_phys_id(
>> >					dev, (i & 1) ? ETHTOOL_ID_OFF : ETHTOOL_ID_ON);
>> >	 			rtnl_unlock();
>> > 				if (rc)
>> > 					break;
>> >				schedule_timeout_interruptible(interval);
>> >			} while (!signal_pending(current) && --i != 0);
>> > 		} while (!signal_pending(current) &&
>> >			 (id.data == 0 || --id.data != 0));
>> >
>> >Ben.
>>
>> OK, if that is clearer to you...v3 forthcoming.
>
>I guess it wouldn't hurt to add comments too.  Would you agree that it's
>clear with the additions above?
>
>Ben.

Sure, makes sense to me.

Thanks,
Bruce.

* RE: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval for physical identification
From: Ben Hutchings @ 2011-04-13 22:44 UTC
  To: Allan, Bruce W; +Cc: netdev@vger.kernel.org
In-Reply-To: <8DD2590731AB5D4C9DBF71A877482A90018A3427B6@orsmsx509.amr.corp.intel.com>

On Wed, 2011-04-13 at 15:39 -0700, Allan, Bruce W wrote:
> 
> >-----Original Message-----
> >From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> >Sent: Wednesday, April 13, 2011 1:25 PM
> >To: Allan, Bruce W
> >Cc: netdev@vger.kernel.org
> >Subject: Re: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval
> >for physical identification
> >
> >I'm sure there ought to be a clearer way to do this, and to avoid any
> >weird effects from integer overflow in the multiplication.  How about
> >using an inner loop for each second:
> >
> >		/* Driver expects to be called at twice the frequency in rc */
> >		int n = rc * 2, i, interval = HZ / n;
> >

		/* Count down seconds */
> >		do {
			/* Count down iterations per second */
> >			i = n;
> >			do {
> >	 			rtnl_lock();
> > 				rc = dev->ethtool_ops->set_phys_id(
> >					dev, (i & 1) ? ETHTOOL_ID_OFF : ETHTOOL_ID_ON);
> >	 			rtnl_unlock();
> > 				if (rc)
> > 					break;
> >				schedule_timeout_interruptible(interval);
> >			} while (!signal_pending(current) && --i != 0);
> > 		} while (!signal_pending(current) &&
> >			 (id.data == 0 || --id.data != 0));
> >
> >Ben.
> 
> OK, if that is clearer to you...v3 forthcoming.

I guess it wouldn't hurt to add comments too.  Would you agree that it's
clear with the additions above?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* RE: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval for physical identification
From: Allan, Bruce W @ 2011-04-13 22:39 UTC
  To: Ben Hutchings; +Cc: netdev@vger.kernel.org
In-Reply-To: <1302726313.2873.18.camel@bwh-desktop>



>-----Original Message-----
>From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>Sent: Wednesday, April 13, 2011 1:25 PM
>To: Allan, Bruce W
>Cc: netdev@vger.kernel.org
>Subject: Re: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval
>for physical identification
>
>I'm sure there ought to be a clearer way to do this, and to avoid any
>weird effects from integer overflow in the multiplication.  How about
>using an inner loop for each second:
>
>		/* Driver expects to be called at twice the frequency in rc */
>		int n = rc * 2, i, interval = HZ / n;
>
>		do {
>			i = n;
>			do {
>	 			rtnl_lock();
> 				rc = dev->ethtool_ops->set_phys_id(
>					dev, (i & 1) ? ETHTOOL_ID_OFF : ETHTOOL_ID_ON);
>	 			rtnl_unlock();
> 				if (rc)
> 					break;
>				schedule_timeout_interruptible(interval);
>			} while (!signal_pending(current) && --i != 0);
> 		} while (!signal_pending(current) &&
>			 (id.data == 0 || --id.data != 0));
>
>Ben.

OK, if that is clearer to you...v3 forthcoming.

Thanks,
Bruce.

* Re: [PATCHv2 net-next-2.6] rndis_host: Poll status before control channel where necessary
From: David Miller @ 2011-04-13 21:49 UTC
  To: ben-/+tVBieCtBitmTQ+vhA3Yw
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, vzeeaxwl-ubggFOsnOr3gwBMGfI3FeA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1302670523.5282.610.camel@localhost>

From: Ben Hutchings <ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org>
Date: Wed, 13 Apr 2011 05:55:23 +0100

> Some RNDIS devices don't respond on the control channel until polled
> on the status channel.  In particular, this was reported to be the
> case for the 2Wire HomePortal 1000SW and for some Windows Mobile
> devices.
> 
> This is roughly based on a patch by John Carr <john.carr-3P/l8hQepEe9FHfhHBbuYA@public.gmane.org>
> which is currently applied by Mandriva.
> 
> Reported-by: Mark Glassberg <vzeeaxwl-ubggFOsnOr3gwBMGfI3FeA@public.gmane.org>
> Signed-off-by: Ben Hutchings <ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org>
> ---
> The first version made this behaviour unconditional and had to be
> reverted.  This version adds a quirk flag instead.

Applied, thanks Ben.

The feedback about whether to use the point-to-point flag or not should
be addressed, but separately.

* Re: [PATCH] bridge: reset IPCB in br_parse_ip_options
From: David Miller @ 2011-04-13 21:48 UTC
  To: eric.dumazet; +Cc: lkml, shemminger, shimoda.hiroaki, netdev
In-Reply-To: <1302708487.3725.0.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 13 Apr 2011 17:28:07 +0200

> Don't worry, Stephen or I will send it ASAP.

I'm looking forward to it :)

* [RFC][PATCH] Zero-copy receive from socket into bio
From: Andreas Gruenbacher @ 2011-04-13 21:39 UTC
  To: David S. Miller, netdev; +Cc: linux-kernel

Hello,

I'm currently looking into supporting zero-copy receive in drbd.

The basic idea is this: drbd transmits bios via sockets.  An ideal sender
sends the packet header and data in separate packets, and the network driver
supports RX_COPYBREAK and receives them into separate socket buffers.  The
socket buffers then end up properly aligned, so we can add their pages to
bios and submit them without any copying.

The existing infrastructure doesn't seem to support this scenario.  Does
this patch make sense?

Thanks,
Andreas

---

[PATCH] Add a generic zero-copy-receive primitive

This requires a network driver which supports header-data split, i.e.,
receiving small header packets and big data packets into different
buffers so that the data will end up aligned well enough for consumption
by the block layer (search for RX_COPYBREAK in the drivers).

diff --git a/tcp_recvbio.c b/tcp_recvbio.c
new file mode 100644
index 0000000..38342e9
--- /dev/null
+++ b/tcp_recvbio.c
@@ -0,0 +1,185 @@
+#include <linux/kernel.h>
+#include <net/tcp.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
+#include <linux/fs.h>
+#include "tcp_recvbio.h"
+
+static int tcp_recvbio_add(struct sk_buff *skb, struct bio *bio,
+			   struct bio_vec *last)
+{
+	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct sk_buff **frag_list = &skb_shinfo(skb)->frag_list;
+	int ret;
+
+	/*
+	 * Reject fragmented skbs: there should be no need to support them.  We
+	 * use frag_list to keep track of the skbs attached to a bio instead.
+	 */
+	if (*frag_list && skb != (struct sk_buff *)bio->bi_private)
+		return false;
+
+	if (!blk_rq_aligned(q, last->bv_offset, last->bv_len))
+		return false;
+	ret = bio_add_page(bio, last->bv_page, last->bv_len, last->bv_offset);
+
+	if (ret && !*frag_list) {
+		/* Tell the network layer to leave @skb alone.  */
+		skb_get(skb);
+
+		/* Put this skb on the list.  */
+		*frag_list = (struct sk_buff *)bio->bi_private;
+		bio->bi_private = skb;
+	}
+	return ret;
+}
+
+static int tcp_recvbio_data(read_descriptor_t *rd_desc, struct sk_buff *skb,
+			    unsigned int offset, size_t len)
+{
+	struct bio *bio = rd_desc->arg.data;
+	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	int start = skb_headlen(skb), consumed = 0, i;
+	struct bio_vec last = { };
+
+	/* Cannot zero-copy from the header.  */
+	if (offset < start)
+		goto give_up;
+
+	/* Give up if the payload is unaligned.  */
+	if (!blk_rq_aligned(q, offset - start, 0))
+		goto give_up;
+
+	/* Do not consume more data than we need.  */
+	if (len > rd_desc->count - rd_desc->written)
+		len = rd_desc->count - rd_desc->written;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i];
+		int end, frag_len;
+
+		WARN_ON(start > offset + len);
+
+		end = start + frag->size;
+		frag_len = end - offset;
+		if (frag_len > 0) {
+			bool merged = false;
+			unsigned int page_offset;
+
+			if (frag_len > len)
+				frag_len = len;
+
+			page_offset = frag->page_offset + offset - start;
+			if (last.bv_page == frag->page &&
+			    last.bv_offset + last.bv_len == page_offset) {
+				/* Merge with the previous fragment.  */
+				last.bv_len += frag_len;
+				merged = true;
+			}
+			len -= frag_len;
+			offset += frag_len;
+			if (!len || !merged) {
+				if (last.bv_page) {
+					if (!tcp_recvbio_add(skb, bio, &last))
+						goto give_up;
+					consumed += last.bv_len;
+				}
+				if (!len)
+					goto out;
+				last.bv_page = frag->page;
+				last.bv_offset = page_offset;
+				last.bv_len = frag_len;
+			}
+		}
+		start = end;
+	}
+
+	/*
+	 * We don't care if there are additional blocks in the skb's frag_list
+	 * that are zero-copyable: at worst, we end up copying too many blocks.
+	 * (See skb_copy_bits() for an example of walking the frag_list.)
+	 */
+
+out:
+	rd_desc->written += consumed;
+	return consumed;
+
+give_up:
+	rd_desc->count = 0;
+	goto out;
+}
+
+/**
+ * tcp_recvbio  -  zero-copy receive a bio from a socket
+ * @sk: socket to receive from
+ * @bio: bio to add socket data to
+ * @size: bytes to receive
+ * @list: singly linked list of skbs added to @bio
+ *
+ * Zero-copy receive data from @sk into @bio by directly using the socket
+ * buffer pages, bypassing the page cache.  To keep the network layer from
+ * modifying the socket buffers while in use by @bio, we skb_get() them and
+ * return a list of skbs that @bio now references.  The caller is
+ * responsible for releasing @list with consume_skbs() once done.
+ *
+ * Returns the number of bytes received into @bio.
+ */
+int tcp_recvbio(struct sock *sk, struct bio *bio, size_t size,
+		struct sk_buff **list)
+{
+	read_descriptor_t rd_desc = {
+		.count = size,
+		.arg = { .data = bio },
+	};
+	void *old_bi_private;
+	int err = 0;
+
+	/* Temporarily build referenced skb list in bi_private.  */
+	old_bi_private = bio->bi_private;
+	bio->bi_private = NULL;
+
+	lock_sock(sk);
+	while (rd_desc.written < rd_desc.count) {
+		long timeo = sock_rcvtimeo(sk, 0);
+
+		sk_wait_data(sk, &timeo);
+		if (signal_pending(current)) {
+			err = sock_intr_errno(timeo);
+			break;
+		}
+		if (!timeo) {
+			if (!rd_desc.written)
+				err = -EAGAIN;
+			break;
+		}
+		read_lock(&sk->sk_callback_lock);
+		err = tcp_read_sock(sk, &rd_desc, tcp_recvbio_data);
+		read_unlock(&sk->sk_callback_lock);
+		if (err < 0)
+			break;
+	}
+	release_sock(sk);
+
+	*list = (struct sk_buff *)bio->bi_private;
+	bio->bi_private = old_bi_private;
+
+	if (err < 0)
+		return err;
+	return rd_desc.written;
+}
+
+/**
+ * consume_skbs  -  consume a list of skbs
+ *
+ * This assumes that the skbs are linked on frag_list, as the @list returned
+ * from tcp_recvbio().
+ */
+void consume_skbs(struct sk_buff **skb)
+{
+	while (*skb) {
+		struct sk_buff *tmp = *skb;
+		*skb = skb_shinfo(tmp)->frag_list;
+		skb_shinfo(tmp)->frag_list = NULL;
+		consume_skb(tmp);
+	}
+}
diff --git a/tcp_recvbio.h b/tcp_recvbio.h
new file mode 100644
index 0000000..0ba30ee
--- /dev/null
+++ b/tcp_recvbio.h
@@ -0,0 +1,9 @@
+#ifndef __TCP_RECVBIO_H
+#define __TCP_RECVBIO_H
+
+
+extern int tcp_recvbio(struct sock *, struct bio *, size_t, struct sk_buff **);
+extern void consume_skbs(struct sk_buff **);
+
+
+#endif  /* __TCP_RECVBIO_H */
-- 
1.7.4.1.415.g5e839

* RE: SMSC 8720a/MDIO/PHY help.
From: ANDY KENNEDY @ 2011-04-13 21:38 UTC
  To: michael, netdev
In-Reply-To: <1302729564.2742.28.camel@malcolm>

> -----Original Message-----
> From: Michael Riesch [mailto:michael@riesch.at]
> Sent: Wednesday, April 13, 2011 4:19 PM
> To: netdev@vger.kernel.org
> Cc: ANDY KENNEDY
> Subject: Re: SMSC 8720a/MDIO/PHY help.
> 
> 
> > If you have an idea of something for me to try, I'd love to entertain
> > it.
> 
> I am rather new to PHYLIB, but these are my ideas:
> 
>  1) make sure phy_connect is executed (AFAIK called by MDIO bus
> driver)

Going through phy.txt under Documentation/networking ("PHY Abstraction
Layer", updated 2008-04-08), which may be a bit out of date, I did see
what you are talking about.  What I'm hung up on at the moment is the
behavior of adjust_link().  It appears that I only need to start the
queues, but I'm not sure.

> 
>  2) maybe you need to call phy_start / phy_stop (AFAIK from the PHY
> driver's open / close function)

Currently, when I do this, all I get is adjust_link() being called over
and over again.

> 
> HTH,
> Michael

Thanks for the help!

Andy

* Re: [PATCH 1/1] ipv6: ignore looped-back NA while dad is running
From: David Miller @ 2011-04-13 21:30 UTC
  To: dwalter; +Cc: netdev, linux-kernel
In-Reply-To: <1302706963.8923.25.camel@localhost>

From: Daniel Walter <dwalter@barracuda.com>
Date: Wed, 13 Apr 2011 17:02:43 +0200

> This message and any attached files are confidential and intended
> solely for the addressee(s). Any publication, transmission or other
> use of the information by a person or entity other than the intended
> addressee is prohibited. If you receive this in error please contact
> the sender and delete the material. The sender does not accept
> liability for any errors or omissions as a result of the
> transmission.

I'm not applying patches that have legal disclaimers like this.

It has no place in a posting made on a public mailing list where open
and unrestricted discussions are essential.

* Re: SMSC 8720a/MDIO/PHY help.
From: Michael Riesch @ 2011-04-13 21:19 UTC
  To: netdev; +Cc: ANDY KENNEDY
In-Reply-To: <9AC3F0E75060224C8BBC5BA2DDC8853A1FA8E632@EXV1.corp.adtran.com>

> If you have an idea of something for me to try, I'd love to entertain
> it.

I am rather new to PHYLIB, but these are my ideas:

 1) make sure phy_connect is executed (AFAIK called by MDIO bus driver)

 2) maybe you need to call phy_start / phy_stop (AFAIK from the PHY
driver's open / close function)

HTH,
Michael

* Re: [net-next-2.6 RFC PATCH v2 12/13] sky2: set ethtool set_phys_id on/off cycle frequency to 1/sec
From: Stephen Hemminger @ 2011-04-13 21:00 UTC
  To: Bruce Allan; +Cc: netdev
In-Reply-To: <20110413195949.25901.86878.stgit@gitlad.jf.intel.com>

On Wed, 13 Apr 2011 12:59:49 -0700
Bruce Allan <bruce.w.allan@intel.com> wrote:

> Physical identification frequency based on how it was done prior to the
> introduction of set_phys_id.  Compile tested only.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Cc: Stephen Hemminger <shemminger@linux-foundation.org>

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Assume the same for skge.

* [PATCH net-next 3/5] tg3: Automatically size stat/test string arrays
From: Matt Carlson @ 2011-04-13 21:05 UTC
  To: davem; +Cc: netdev, mcarlson, Benjamin Li

This patch reimplements the preprocessor constants for the sizes of the
stats and ethtool test string arrays.  The sizes are now derived at
compile time from the arrays themselves, via ARRAY_SIZE, rather than from
a separate struct size or a hard-coded constant.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/tg3.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index b61b52f..9975cdb 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -165,11 +165,6 @@
 
 #define TG3_RAW_IP_ALIGN 2
 
-/* number of ETHTOOL_GSTATS u64's */
-#define TG3_NUM_STATS		(sizeof(struct tg3_ethtool_stats)/sizeof(u64))
-
-#define TG3_NUM_TEST		6
-
 #define TG3_FW_UPDATE_TIMEOUT_SEC	5
 
 #define FIRMWARE_TG3		"tigon/tg3.bin"
@@ -279,7 +274,7 @@ MODULE_DEVICE_TABLE(pci, tg3_pci_tbl);
 
 static const struct {
 	const char string[ETH_GSTRING_LEN];
-} ethtool_stats_keys[TG3_NUM_STATS] = {
+} ethtool_stats_keys[] = {
 	{ "rx_octets" },
 	{ "rx_fragments" },
 	{ "rx_ucast_packets" },
@@ -358,9 +353,12 @@ static const struct {
 	{ "nic_tx_threshold_hit" }
 };
 
+#define TG3_NUM_STATS	ARRAY_SIZE(ethtool_stats_keys)
+
+
 static const struct {
 	const char string[ETH_GSTRING_LEN];
-} ethtool_test_keys[TG3_NUM_TEST] = {
+} ethtool_test_keys[] = {
 	{ "nvram test     (online) " },
 	{ "link test      (online) " },
 	{ "register test  (offline)" },
@@ -369,6 +367,9 @@ static const struct {
 	{ "interrupt test (offline)" },
 };
 
+#define TG3_NUM_TEST	ARRAY_SIZE(ethtool_test_keys)
+
+
 static void tg3_write32(struct tg3 *tp, u32 off, u32 val)
 {
 	writel(val, tp->regs + off);
-- 
1.7.3.4



* [PATCH net-next 2/5] tg3: Dump registers when status block shows errors
From: Matt Carlson @ 2011-04-13 21:05 UTC
  To: davem; +Cc: netdev, mcarlson, Michael Chan

This patch monitors the error bit of the status word within the status
block.  If it is set, the driver checks the flow attention, MSI status
and DMA status registers to confirm a real error; if one is found, it
dumps the driver state and schedules a chip reset.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/tg3.c |   40 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/tg3.h |    3 +++
 2 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 7274435..b61b52f 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -5259,6 +5259,40 @@ tx_recovery:
 	return work_done;
 }
 
+static void tg3_process_error(struct tg3 *tp)
+{
+	u32 val;
+	bool real_error = false;
+
+	if (tp->tg3_flags & TG3_FLAG_ERROR_PROCESSED)
+		return;
+
+	/* Check Flow Attention register */
+	val = tr32(HOSTCC_FLOW_ATTN);
+	if (val & ~HOSTCC_FLOW_ATTN_MBUF_LWM) {
+		netdev_err(tp->dev, "FLOW Attention error.  Resetting chip.\n");
+		real_error = true;
+	}
+
+	if (tr32(MSGINT_STATUS) & ~MSGINT_STATUS_MSI_REQ) {
+		netdev_err(tp->dev, "MSI Status error.  Resetting chip.\n");
+		real_error = true;
+	}
+
+	if (tr32(RDMAC_STATUS) || tr32(WDMAC_STATUS)) {
+		netdev_err(tp->dev, "DMA Status error.  Resetting chip.\n");
+		real_error = true;
+	}
+
+	if (!real_error)
+		return;
+
+	tg3_dump_state(tp);
+
+	tp->tg3_flags |= TG3_FLAG_ERROR_PROCESSED;
+	schedule_work(&tp->reset_task);
+}
+
 static int tg3_poll(struct napi_struct *napi, int budget)
 {
 	struct tg3_napi *tnapi = container_of(napi, struct tg3_napi, napi);
@@ -5267,6 +5301,9 @@ static int tg3_poll(struct napi_struct *napi, int budget)
 	struct tg3_hw_status *sblk = tnapi->hw_status;
 
 	while (1) {
+		if (sblk->status & SD_STATUS_ERROR)
+			tg3_process_error(tp);
+
 		tg3_poll_link(tp);
 
 		work_done = tg3_poll_work(tnapi, work_done, budget);
@@ -7316,7 +7353,8 @@ static int tg3_chip_reset(struct tg3 *tp)
 
 	tg3_restore_pci_state(tp);
 
-	tp->tg3_flags &= ~TG3_FLAG_CHIP_RESETTING;
+	tp->tg3_flags &= ~(TG3_FLAG_CHIP_RESETTING |
+			   TG3_FLAG_ERROR_PROCESSED);
 
 	val = 0;
 	if (tp->tg3_flags2 & TG3_FLG2_5780_CLASS)
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 9912010..b3ccfcc 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -1201,6 +1201,7 @@
 #define HOSTCC_STATS_BLK_NIC_ADDR	0x00003c40
 #define HOSTCC_STATUS_BLK_NIC_ADDR	0x00003c44
 #define HOSTCC_FLOW_ATTN		0x00003c48
+#define HOSTCC_FLOW_ATTN_MBUF_LWM	 0x00000040
 /* 0x3c4c --> 0x3c50 unused */
 #define HOSTCC_JUMBO_CON_IDX		0x00003c50
 #define HOSTCC_STD_CON_IDX		0x00003c54
@@ -1611,6 +1612,7 @@
 #define  MSGINT_MODE_ONE_SHOT_DISABLE	 0x00000020
 #define  MSGINT_MODE_MULTIVEC_EN	 0x00000080
 #define MSGINT_STATUS			0x00006004
+#define  MSGINT_STATUS_MSI_REQ		 0x00000001
 #define MSGINT_FIFO			0x00006008
 /* 0x600c --> 0x6400 unused */
 
@@ -2886,6 +2888,7 @@ struct tg3 {
 #define TG3_FLAG_TAGGED_STATUS		0x00000001
 #define TG3_FLAG_TXD_MBOX_HWBUG		0x00000002
 #define TG3_FLAG_USE_LINKCHG_REG	0x00000008
+#define TG3_FLAG_ERROR_PROCESSED	0x00000010
 #define TG3_FLAG_ENABLE_ASF		0x00000020
 #define TG3_FLAG_ASPM_WORKAROUND	0x00000040
 #define TG3_FLAG_POLL_SERDES		0x00000080
-- 
1.7.3.4



* [PATCH net-next 5/5] tg3: Add support for extended VPD blocks
From: Matt Carlson @ 2011-04-13 21:05 UTC
  To: davem; +Cc: netdev, mcarlson

On some devices, the VPD block is relocated to a different area of
NVRAM.  The original location may still contain old, but still valid,
VPD data.  This patch changes the code to look for an extended VPD block
in NVRAM.  If one is found, that block is used for all VPD operations
instead.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/tg3.c |  125 ++++++++++++++++++++++++++++++++++-------------------
 drivers/net/tg3.h |    2 +
 2 files changed, 83 insertions(+), 44 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 52dd516..10fa476 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -10416,6 +10416,81 @@ static void tg3_get_ethtool_stats(struct net_device *dev,
 	memcpy(tmp_stats, tg3_get_estats(tp), sizeof(tp->estats));
 }
 
+static __be32 * tg3_vpd_readblock(struct tg3 *tp)
+{
+	int i;
+	__be32 *buf;
+	u32 offset = 0, len = 0;
+	u32 magic, val;
+
+	if ((tp->tg3_flags3 & TG3_FLG3_NO_NVRAM) ||
+	    tg3_nvram_read(tp, 0, &magic))
+		return NULL;
+
+	if (magic == TG3_EEPROM_MAGIC) {
+		for (offset = TG3_NVM_DIR_START;
+		     offset < TG3_NVM_DIR_END;
+		     offset += TG3_NVM_DIRENT_SIZE) {
+			if (tg3_nvram_read(tp, offset, &val))
+				return NULL;
+
+			if ((val >> TG3_NVM_DIRTYPE_SHIFT) ==
+			    TG3_NVM_DIRTYPE_EXTVPD)
+				break;
+		}
+
+		if (offset != TG3_NVM_DIR_END) {
+			len = (val & TG3_NVM_DIRTYPE_LENMSK) * 4;
+			if (tg3_nvram_read(tp, offset + 4, &offset))
+				return NULL;
+
+			offset = tg3_nvram_logical_addr(tp, offset);
+		}
+	}
+
+	if (!offset || !len) {
+		offset = TG3_NVM_VPD_OFF;
+		len = TG3_NVM_VPD_LEN;
+	}
+
+	buf = kmalloc(len, GFP_KERNEL);
+	if (buf == NULL)
+		return NULL;
+
+	if (magic == TG3_EEPROM_MAGIC) {
+		for (i = 0; i < len; i += 4) {
+			/* The data is in little-endian format in NVRAM.
+			 * Use the big-endian read routines to preserve
+			 * the byte order as it exists in NVRAM.
+			 */
+			if (tg3_nvram_read_be32(tp, offset + i, &buf[i/4]))
+				goto error;
+		}
+	} else {
+		u8 *ptr;
+		ssize_t cnt;
+		unsigned int pos = 0;
+
+		ptr = (u8 *)&buf[0];
+		for (i = 0; pos < len && i < 3; i++, pos += cnt, ptr += cnt) {
+			cnt = pci_read_vpd(tp->pdev, pos,
+					   len - pos, ptr);
+			if (cnt == -ETIMEDOUT || cnt == -EINTR)
+				cnt = 0;
+			else if (cnt < 0)
+				goto error;
+		}
+		if (pos != len)
+			goto error;
+	}
+
+	return buf;
+
+error:
+	kfree(buf);
+	return NULL;
+}
+
 #define NVRAM_TEST_SIZE 0x100
 #define NVRAM_SELFBOOT_FORMAT1_0_SIZE	0x14
 #define NVRAM_SELFBOOT_FORMAT1_2_SIZE	0x18
@@ -10555,14 +10630,11 @@ static int tg3_test_nvram(struct tg3 *tp)
 	if (csum != le32_to_cpu(buf[0xfc/4]))
 		goto out;
 
-	for (i = 0; i < TG3_NVM_VPD_LEN; i += 4) {
-		/* The data is in little-endian format in NVRAM.
-		 * Use the big-endian read routines to preserve
-		 * the byte order as it exists in NVRAM.
-		 */
-		if (tg3_nvram_read_be32(tp, TG3_NVM_VPD_OFF + i, &buf[i/4]))
-			goto out;
-	}
+	kfree(buf);
+
+	buf = tg3_vpd_readblock(tp);
+	if (!buf)
+		return -ENOMEM;
 
 	i = pci_vpd_find_tag((u8 *)buf, 0, TG3_NVM_VPD_LEN,
 			     PCI_VPD_LRDT_RO_DATA);
@@ -12905,46 +12977,11 @@ static void __devinit tg3_read_vpd(struct tg3 *tp)
 	u8 *vpd_data;
 	unsigned int block_end, rosize, len;
 	int j, i = 0;
-	u32 magic;
-
-	if ((tp->tg3_flags3 & TG3_FLG3_NO_NVRAM) ||
-	    tg3_nvram_read(tp, 0x0, &magic))
-		goto out_no_vpd;
 
-	vpd_data = kmalloc(TG3_NVM_VPD_LEN, GFP_KERNEL);
+	vpd_data = (u8 *)tg3_vpd_readblock(tp);
 	if (!vpd_data)
 		goto out_no_vpd;
 
-	if (magic == TG3_EEPROM_MAGIC) {
-		for (i = 0; i < TG3_NVM_VPD_LEN; i += 4) {
-			u32 tmp;
-
-			/* The data is in little-endian format in NVRAM.
-			 * Use the big-endian read routines to preserve
-			 * the byte order as it exists in NVRAM.
-			 */
-			if (tg3_nvram_read_be32(tp, TG3_NVM_VPD_OFF + i, &tmp))
-				goto out_not_found;
-
-			memcpy(&vpd_data[i], &tmp, sizeof(tmp));
-		}
-	} else {
-		ssize_t cnt;
-		unsigned int pos = 0;
-
-		for (; pos < TG3_NVM_VPD_LEN && i < 3; i++, pos += cnt) {
-			cnt = pci_read_vpd(tp->pdev, pos,
-					   TG3_NVM_VPD_LEN - pos,
-					   &vpd_data[pos]);
-			if (cnt == -ETIMEDOUT || cnt == -EINTR)
-				cnt = 0;
-			else if (cnt < 0)
-				goto out_not_found;
-		}
-		if (pos != TG3_NVM_VPD_LEN)
-			goto out_not_found;
-	}
-
 	i = pci_vpd_find_tag(vpd_data, 0, TG3_NVM_VPD_LEN,
 			     PCI_VPD_LRDT_RO_DATA);
 	if (i < 0)
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index b3ccfcc..224c3e0 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2009,7 +2009,9 @@
 #define TG3_NVM_DIR_END			0x78
 #define TG3_NVM_DIRENT_SIZE		0xc
 #define TG3_NVM_DIRTYPE_SHIFT		24
+#define TG3_NVM_DIRTYPE_LENMSK		0x003fffff
 #define TG3_NVM_DIRTYPE_ASFINI		1
+#define TG3_NVM_DIRTYPE_EXTVPD		20
 #define TG3_NVM_PTREV_BCVER		0x94
 #define TG3_NVM_BCVER_MAJMSK		0x0000ff00
 #define TG3_NVM_BCVER_MAJSFT		8
-- 
1.7.3.4



^ permalink raw reply related

* [PATCH net-next 4/5] tg3: Add jumbo frame loopback tests to selftest
From: Matt Carlson @ 2011-04-13 21:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, mcarlson

This patch adds jumbo frame loopback test support to the ethtool
selftest.
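The size check this patch adds to tg3_run_loopback() can be tried outside the driver. The following is a minimal userspace sketch, not driver code. The ETH_* values are the standard ones from <linux/if_ether.h>. RX_STD_DMA_SZ (1536) is an assumed stand-in for TG3_RX_STD_DMA_SZ, and the failure-bit values are also assumed. loopback_err() is a hypothetical helper that shows how the `<< 2` shift keeps jumbo-size failures separate from standard-size ones.

```c
#include <assert.h>

/* Standard Ethernet sizes, as defined in <linux/if_ether.h>. */
#define ETH_HLEN      14
#define ETH_FRAME_LEN 1514
#define ETH_FCS_LEN   4

/* ASSUMPTION: size of a standard RX buffer; the real TG3_RX_STD_DMA_SZ
 * lives in tg3.h and may differ. */
#define RX_STD_DMA_SZ 1536

/* ASSUMPTION: base failure bits; the patch reports jumbo-size failures
 * by shifting the base bit left by 2. */
#define MAC_LOOPBACK_FAILED 1u
#define PHY_LOOPBACK_FAILED 2u

enum rx_ring { RING_STD, RING_JUMBO };

/* Same test as the patch: a frame that still fits in a standard buffer
 * once its 4-byte FCS is added should come back on the standard ring. */
static enum rx_ring expected_ring(unsigned int pktsz)
{
	return pktsz <= RX_STD_DMA_SZ - ETH_FCS_LEN ? RING_STD : RING_JUMBO;
}

/* Hypothetical helper: fold the standard-size and jumbo-size results
 * for one loopback mode into the selftest's error bitmask. */
static unsigned int loopback_err(unsigned int fail_bit, int std_failed,
				 int jumbo_failed)
{
	unsigned int err = 0;

	assert(fail_bit != 0);
	if (std_failed)
		err |= fail_bit;
	if (jumbo_failed)
		err |= fail_bit << 2;
	return err;
}
```

With these assumed values, a 1514-byte frame maps to the standard ring and a 9014-byte frame (9000 + ETH_HLEN) maps to the jumbo ring.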

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/tg3.c |   34 +++++++++++++++++++++++++---------
 1 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 9975cdb..52dd516 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -10935,7 +10935,7 @@ static int tg3_test_memory(struct tg3 *tp)
 #define TG3_MAC_LOOPBACK	0
 #define TG3_PHY_LOOPBACK	1
 
-static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
+static int tg3_run_loopback(struct tg3 *tp, u32 pktsz, int loopback_mode)
 {
 	u32 mac_mode, rx_start_idx, rx_idx, tx_idx, opaque_key;
 	u32 desc_idx, coal_now;
@@ -11033,7 +11033,7 @@ static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 
 	err = -EIO;
 
-	tx_len = 1514;
+	tx_len = pktsz;
 	skb = netdev_alloc_skb(tp->dev, tx_len);
 	if (!skb)
 		return -ENOMEM;
@@ -11042,7 +11042,7 @@ static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 	memcpy(tx_data, tp->dev->dev_addr, 6);
 	memset(tx_data + 6, 0x0, 8);
 
-	tw32(MAC_RX_MTU_SIZE, tx_len + 4);
+	tw32(MAC_RX_MTU_SIZE, tx_len + ETH_FCS_LEN);
 
 	for (i = 14; i < tx_len; i++)
 		tx_data[i] = (u8) (i & 0xff);
@@ -11098,8 +11098,6 @@ static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 	desc = &rnapi->rx_rcb[rx_start_idx];
 	desc_idx = desc->opaque & RXD_OPAQUE_INDEX_MASK;
 	opaque_key = desc->opaque & RXD_OPAQUE_RING_MASK;
-	if (opaque_key != RXD_OPAQUE_RING_STD)
-		goto out;
 
 	if ((desc->err_vlan & RXD_ERR_MASK) != 0 &&
 	    (desc->err_vlan != RXD_ERR_ODD_NIBBLE_RCVD_MII))
@@ -11109,9 +11107,20 @@ static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 	if (rx_len != tx_len)
 		goto out;
 
-	rx_skb = tpr->rx_std_buffers[desc_idx].skb;
+	if (pktsz <= TG3_RX_STD_DMA_SZ - ETH_FCS_LEN) {
+		if (opaque_key != RXD_OPAQUE_RING_STD)
+			goto out;
+
+		rx_skb = tpr->rx_std_buffers[desc_idx].skb;
+		map = dma_unmap_addr(&tpr->rx_std_buffers[desc_idx], mapping);
+	} else {
+		if (opaque_key != RXD_OPAQUE_RING_JUMBO)
+			goto out;
+
+		rx_skb = tpr->rx_jmb_buffers[desc_idx].skb;
+		map = dma_unmap_addr(&tpr->rx_jmb_buffers[desc_idx], mapping);
+	}
 
-	map = dma_unmap_addr(&tpr->rx_std_buffers[desc_idx], mapping);
 	pci_dma_sync_single_for_cpu(tp->pdev, map, rx_len, PCI_DMA_FROMDEVICE);
 
 	for (i = 14; i < tx_len; i++) {
@@ -11177,9 +11186,13 @@ static int tg3_test_loopback(struct tg3 *tp)
 				  CPMU_CTRL_LINK_AWARE_MODE));
 	}
 
-	if (tg3_run_loopback(tp, TG3_MAC_LOOPBACK))
+	if (tg3_run_loopback(tp, ETH_FRAME_LEN, TG3_MAC_LOOPBACK))
 		err |= TG3_MAC_LOOPBACK_FAILED;
 
+	if ((tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) &&
+	    tg3_run_loopback(tp, 9000 + ETH_HLEN, TG3_MAC_LOOPBACK))
+		err |= (TG3_MAC_LOOPBACK_FAILED << 2);
+
 	if (tp->tg3_flags & TG3_FLAG_CPMU_PRESENT) {
 		tw32(TG3_CPMU_CTRL, cpmuctrl);
 
@@ -11189,8 +11202,11 @@ static int tg3_test_loopback(struct tg3 *tp)
 
 	if (!(tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
 	    !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB)) {
-		if (tg3_run_loopback(tp, TG3_PHY_LOOPBACK))
+		if (tg3_run_loopback(tp, ETH_FRAME_LEN, TG3_PHY_LOOPBACK))
 			err |= TG3_PHY_LOOPBACK_FAILED;
+		if ((tp->tg3_flags & TG3_FLAG_JUMBO_RING_ENABLE) &&
+		    tg3_run_loopback(tp, 9000 + ETH_HLEN, TG3_PHY_LOOPBACK))
+			err |= (TG3_PHY_LOOPBACK_FAILED << 2);
 	}
 
 	/* Re-enable gphy autopowerdown. */
-- 
1.7.3.4



^ permalink raw reply related

* [PATCH net-next 0/5] tg3: Add more selftest and debug support
From: Matt Carlson @ 2011-04-13 21:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, mcarlson

This patchset adds register dump capabilities for first failure debugging,
a jumbo frame loopback test mode, and extended VPD block handling.

^ permalink raw reply

* [PATCH net-next 1/5] tg3: Provide full regdump on tx timeout
From: Matt Carlson @ 2011-04-13 21:05 UTC (permalink / raw)
  To: davem; +Cc: netdev, mcarlson, Michael Chan

The current amount of information provided in the output of a tx timeout
is insufficient to determine a root cause.  This patch replaces the
terse, four-register status output with a more complete body of
information.  For PCIe devices, the full register space is dumped.  For
other devices, select registers are dumped instead.
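The dump format used by the new tg3_dump_state() can be sketched in userspace. It prints four words per line, puts the byte offset first, and skips rows that are entirely zero. This is an illustration only: dump_regs() is a hypothetical name, and the caller passes in the register snapshot instead of reading it with tr32().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Print a register snapshot four 32-bit words per line, prefixed with the
 * byte offset of the first word, skipping rows that are entirely zero.
 * len_words must be a multiple of 4. Returns the number of rows printed. */
static int dump_regs(FILE *out, const uint32_t *regs, size_t len_words)
{
	int rows = 0;
	size_t i;

	assert(len_words % 4 == 0);
	for (i = 0; i < len_words; i += 4) {
		if (!regs[i] && !regs[i + 1] && !regs[i + 2] && !regs[i + 3])
			continue;	/* an all-zero row carries no information */

		fprintf(out, "0x%08zx: 0x%08x, 0x%08x, 0x%08x, 0x%08x\n",
			i * 4,	/* word index -> byte offset */
			(unsigned int)regs[i], (unsigned int)regs[i + 1],
			(unsigned int)regs[i + 2], (unsigned int)regs[i + 3]);
		rows++;
	}
	return rows;
}
```

Skipping zero rows keeps a 32 KB register block down to a readable number of log lines when most of the space is unimplemented or idle.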

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/tg3.c |  189 ++++++++++++++++++++++++++++++++++-------------------
 drivers/net/tg3.h |    2 +
 2 files changed, 123 insertions(+), 68 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 9d7defc..7274435 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4459,6 +4459,123 @@ static inline int tg3_irq_sync(struct tg3 *tp)
 	return tp->irq_sync;
 }
 
+static inline void tg3_rd32_loop(struct tg3 *tp, u32 *dst, u32 off, u32 len)
+{
+	int i;
+
+	dst = (u32 *)((u8 *)dst + off);
+	for (i = 0; i < len; i += sizeof(u32))
+		*dst++ = tr32(off + i);
+}
+
+static void tg3_dump_legacy_regs(struct tg3 *tp, u32 *regs)
+{
+	tg3_rd32_loop(tp, regs, TG3PCI_VENDOR, 0xb0);
+	tg3_rd32_loop(tp, regs, MAILBOX_INTERRUPT_0, 0x200);
+	tg3_rd32_loop(tp, regs, MAC_MODE, 0x4f0);
+	tg3_rd32_loop(tp, regs, SNDDATAI_MODE, 0xe0);
+	tg3_rd32_loop(tp, regs, SNDDATAC_MODE, 0x04);
+	tg3_rd32_loop(tp, regs, SNDBDS_MODE, 0x80);
+	tg3_rd32_loop(tp, regs, SNDBDI_MODE, 0x48);
+	tg3_rd32_loop(tp, regs, SNDBDC_MODE, 0x04);
+	tg3_rd32_loop(tp, regs, RCVLPC_MODE, 0x20);
+	tg3_rd32_loop(tp, regs, RCVLPC_SELLST_BASE, 0x15c);
+	tg3_rd32_loop(tp, regs, RCVDBDI_MODE, 0x0c);
+	tg3_rd32_loop(tp, regs, RCVDBDI_JUMBO_BD, 0x3c);
+	tg3_rd32_loop(tp, regs, RCVDBDI_BD_PROD_IDX_0, 0x44);
+	tg3_rd32_loop(tp, regs, RCVDCC_MODE, 0x04);
+	tg3_rd32_loop(tp, regs, RCVBDI_MODE, 0x20);
+	tg3_rd32_loop(tp, regs, RCVCC_MODE, 0x14);
+	tg3_rd32_loop(tp, regs, RCVLSC_MODE, 0x08);
+	tg3_rd32_loop(tp, regs, MBFREE_MODE, 0x08);
+	tg3_rd32_loop(tp, regs, HOSTCC_MODE, 0x100);
+
+	if (tp->tg3_flags & TG3_FLAG_SUPPORT_MSIX)
+		tg3_rd32_loop(tp, regs, HOSTCC_RXCOL_TICKS_VEC1, 0x180);
+
+	tg3_rd32_loop(tp, regs, MEMARB_MODE, 0x10);
+	tg3_rd32_loop(tp, regs, BUFMGR_MODE, 0x58);
+	tg3_rd32_loop(tp, regs, RDMAC_MODE, 0x08);
+	tg3_rd32_loop(tp, regs, WDMAC_MODE, 0x08);
+	tg3_rd32_loop(tp, regs, RX_CPU_MODE, 0x04);
+	tg3_rd32_loop(tp, regs, RX_CPU_STATE, 0x04);
+	tg3_rd32_loop(tp, regs, RX_CPU_PGMCTR, 0x04);
+	tg3_rd32_loop(tp, regs, RX_CPU_HWBKPT, 0x04);
+
+	if (!(tp->tg3_flags2 & TG3_FLG2_5705_PLUS)) {
+		tg3_rd32_loop(tp, regs, TX_CPU_MODE, 0x04);
+		tg3_rd32_loop(tp, regs, TX_CPU_STATE, 0x04);
+		tg3_rd32_loop(tp, regs, TX_CPU_PGMCTR, 0x04);
+	}
+
+	tg3_rd32_loop(tp, regs, GRCMBOX_INTERRUPT_0, 0x110);
+	tg3_rd32_loop(tp, regs, FTQ_RESET, 0x120);
+	tg3_rd32_loop(tp, regs, MSGINT_MODE, 0x0c);
+	tg3_rd32_loop(tp, regs, DMAC_MODE, 0x04);
+	tg3_rd32_loop(tp, regs, GRC_MODE, 0x4c);
+
+	if (tp->tg3_flags & TG3_FLAG_NVRAM)
+		tg3_rd32_loop(tp, regs, NVRAM_CMD, 0x24);
+}
+
+static void tg3_dump_state(struct tg3 *tp)
+{
+	int i;
+	u32 *regs;
+
+	regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
+	if (!regs) {
+		netdev_err(tp->dev, "Failed allocating register dump buffer\n");
+		return;
+	}
+
+	if (tp->tg3_flags2 & TG3_FLG2_PCI_EXPRESS) {
+		/* Read up to but not including private PCI registers */
+		for (i = 0; i < TG3_PCIE_TLDLPL_PORT; i += sizeof(u32))
+			regs[i / sizeof(u32)] = tr32(i);
+	} else
+		tg3_dump_legacy_regs(tp, regs);
+
+	for (i = 0; i < TG3_REG_BLK_SIZE / sizeof(u32); i += 4) {
+		if (!regs[i + 0] && !regs[i + 1] &&
+		    !regs[i + 2] && !regs[i + 3])
+			continue;
+
+		netdev_err(tp->dev, "0x%08x: 0x%08x, 0x%08x, 0x%08x, 0x%08x\n",
+			   i * 4,
+			   regs[i + 0], regs[i + 1], regs[i + 2], regs[i + 3]);
+	}
+
+	kfree(regs);
+
+	for (i = 0; i < tp->irq_cnt; i++) {
+		struct tg3_napi *tnapi = &tp->napi[i];
+
+		/* SW status block */
+		netdev_err(tp->dev,
+			 "%d: Host status block [%08x:%08x:(%04x:%04x:%04x):(%04x:%04x)]\n",
+			   i,
+			   tnapi->hw_status->status,
+			   tnapi->hw_status->status_tag,
+			   tnapi->hw_status->rx_jumbo_consumer,
+			   tnapi->hw_status->rx_consumer,
+			   tnapi->hw_status->rx_mini_consumer,
+			   tnapi->hw_status->idx[0].rx_producer,
+			   tnapi->hw_status->idx[0].tx_consumer);
+
+		netdev_err(tp->dev,
+		"%d: NAPI info [%08x:%08x:(%04x:%04x:%04x):%04x:(%04x:%04x:%04x:%04x)]\n",
+			   i,
+			   tnapi->last_tag, tnapi->last_irq_tag,
+			   tnapi->tx_prod, tnapi->tx_cons, tnapi->tx_pending,
+			   tnapi->rx_rcb_ptr,
+			   tnapi->prodring.rx_std_prod_idx,
+			   tnapi->prodring.rx_std_cons_idx,
+			   tnapi->prodring.rx_jmb_prod_idx,
+			   tnapi->prodring.rx_jmb_cons_idx);
+	}
+}
+
 /* This is called whenever we suspect that the system chipset is re-
  * ordering the sequence of MMIO to the tx send mailbox. The symptom
  * is bogus tx completions. We try to recover by setting the
@@ -5516,21 +5633,13 @@ out:
 		tg3_phy_start(tp);
 }
 
-static void tg3_dump_short_state(struct tg3 *tp)
-{
-	netdev_err(tp->dev, "DEBUG: MAC_TX_STATUS[%08x] MAC_RX_STATUS[%08x]\n",
-		   tr32(MAC_TX_STATUS), tr32(MAC_RX_STATUS));
-	netdev_err(tp->dev, "DEBUG: RDMAC_STATUS[%08x] WDMAC_STATUS[%08x]\n",
-		   tr32(RDMAC_STATUS), tr32(WDMAC_STATUS));
-}
-
 static void tg3_tx_timeout(struct net_device *dev)
 {
 	struct tg3 *tp = netdev_priv(dev);
 
 	if (netif_msg_tx_err(tp)) {
 		netdev_err(dev, "transmit timed out, resetting\n");
-		tg3_dump_short_state(tp);
+		tg3_dump_state(tp);
 	}
 
 	schedule_work(&tp->reset_task);
@@ -9624,82 +9733,26 @@ static void tg3_set_rx_mode(struct net_device *dev)
 	tg3_full_unlock(tp);
 }
 
-#define TG3_REGDUMP_LEN		(32 * 1024)
-
 static int tg3_get_regs_len(struct net_device *dev)
 {
-	return TG3_REGDUMP_LEN;
+	return TG3_REG_BLK_SIZE;
 }
 
 static void tg3_get_regs(struct net_device *dev,
 		struct ethtool_regs *regs, void *_p)
 {
-	u32 *p = _p;
 	struct tg3 *tp = netdev_priv(dev);
-	u8 *orig_p = _p;
-	int i;
 
 	regs->version = 0;
 
-	memset(p, 0, TG3_REGDUMP_LEN);
+	memset(_p, 0, TG3_REG_BLK_SIZE);
 
 	if (tp->phy_flags & TG3_PHYFLG_IS_LOW_POWER)
 		return;
 
 	tg3_full_lock(tp, 0);
 
-#define __GET_REG32(reg)	(*(p)++ = tr32(reg))
-#define GET_REG32_LOOP(base, len)		\
-do {	p = (u32 *)(orig_p + (base));		\
-	for (i = 0; i < len; i += 4)		\
-		__GET_REG32((base) + i);	\
-} while (0)
-#define GET_REG32_1(reg)			\
-do {	p = (u32 *)(orig_p + (reg));		\
-	__GET_REG32((reg));			\
-} while (0)
-
-	GET_REG32_LOOP(TG3PCI_VENDOR, 0xb0);
-	GET_REG32_LOOP(MAILBOX_INTERRUPT_0, 0x200);
-	GET_REG32_LOOP(MAC_MODE, 0x4f0);
-	GET_REG32_LOOP(SNDDATAI_MODE, 0xe0);
-	GET_REG32_1(SNDDATAC_MODE);
-	GET_REG32_LOOP(SNDBDS_MODE, 0x80);
-	GET_REG32_LOOP(SNDBDI_MODE, 0x48);
-	GET_REG32_1(SNDBDC_MODE);
-	GET_REG32_LOOP(RCVLPC_MODE, 0x20);
-	GET_REG32_LOOP(RCVLPC_SELLST_BASE, 0x15c);
-	GET_REG32_LOOP(RCVDBDI_MODE, 0x0c);
-	GET_REG32_LOOP(RCVDBDI_JUMBO_BD, 0x3c);
-	GET_REG32_LOOP(RCVDBDI_BD_PROD_IDX_0, 0x44);
-	GET_REG32_1(RCVDCC_MODE);
-	GET_REG32_LOOP(RCVBDI_MODE, 0x20);
-	GET_REG32_LOOP(RCVCC_MODE, 0x14);
-	GET_REG32_LOOP(RCVLSC_MODE, 0x08);
-	GET_REG32_1(MBFREE_MODE);
-	GET_REG32_LOOP(HOSTCC_MODE, 0x100);
-	GET_REG32_LOOP(MEMARB_MODE, 0x10);
-	GET_REG32_LOOP(BUFMGR_MODE, 0x58);
-	GET_REG32_LOOP(RDMAC_MODE, 0x08);
-	GET_REG32_LOOP(WDMAC_MODE, 0x08);
-	GET_REG32_1(RX_CPU_MODE);
-	GET_REG32_1(RX_CPU_STATE);
-	GET_REG32_1(RX_CPU_PGMCTR);
-	GET_REG32_1(RX_CPU_HWBKPT);
-	GET_REG32_1(TX_CPU_MODE);
-	GET_REG32_1(TX_CPU_STATE);
-	GET_REG32_1(TX_CPU_PGMCTR);
-	GET_REG32_LOOP(GRCMBOX_INTERRUPT_0, 0x110);
-	GET_REG32_LOOP(FTQ_RESET, 0x120);
-	GET_REG32_LOOP(MSGINT_MODE, 0x0c);
-	GET_REG32_1(DMAC_MODE);
-	GET_REG32_LOOP(GRC_MODE, 0x4c);
-	if (tp->tg3_flags & TG3_FLAG_NVRAM)
-		GET_REG32_LOOP(NVRAM_CMD, 0x24);
-
-#undef __GET_REG32
-#undef GET_REG32_LOOP
-#undef GET_REG32_1
+	tg3_dump_legacy_regs(tp, (u32 *)_p);
 
 	tg3_full_unlock(tp);
 }
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 829a84a..9912010 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -1954,6 +1954,8 @@
 #define TG3_PCIE_PL_LO_PHYCTL5		 0x00000014
 #define TG3_PCIE_PL_LO_PHYCTL5_DIS_L2CLKREQ	  0x80000000
 
+#define TG3_REG_BLK_SIZE		0x00008000
+
 /* OTP bit definitions */
 #define TG3_OTP_AGCTGT_MASK		0x000000e0
 #define TG3_OTP_AGCTGT_SHIFT		1
-- 
1.7.3.4



^ permalink raw reply related

* Re: [PATCH] NFS: Fix infinite loop in gss_create_upcall()
From: Bryan Schumaker @ 2011-04-13 20:42 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Trond Myklebust, Jiri Slaby, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, ML netdev,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4DA49F7F.8060005-AlSwsSmVLrQ@public.gmane.org>

On 04/12/2011 02:52 PM, Jiri Slaby wrote:
> On 04/12/2011 08:43 PM, Bryan Schumaker wrote:
>> On 04/12/2011 02:34 PM, Jiri Slaby wrote:
>>> On 04/12/2011 08:31 PM, Trond Myklebust wrote:
>>>>> Yes, it fixes the problem. But it waits 15s before it times out. This is
>>>>> unacceptable for automounted NFS dirs.
>>>>
>>>> I'm still confused as to why you are hitting it at all. In the normal
>>>> autonegotiation case, the client should be trying to use AUTH_SYS first
>>>> and then trying rpcsec_gss if and only if that fails.
>>>>
>>>> Are you really exporting a filesystem using AUTH_NULL as the only
>>>> supported flavour?
>>>
>>> I don't know, I connect to a nfs server which is not maintained by me.
>>> It looks like that. How can I find out?
>>
>> If you're not using gss for anything, you could try rmmod-ing rpcsec_gss_krb5 (and other rpcsec_gss_* modules).
> 
> I don't have NFS in modules. It's all built-in. And this one is
> unconditionally selected because of CONFIG_NFS_V4.

Does this patch help?

- Bryan

We should attempt an AUTH_NULL style mount before
trying gss flavors.  This should prevent a hang if
gss modules are loaded but the userspace daemon
(rpc.gssd) isn't running.
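The reordered probe list that the patch below builds in nfs4_proc_get_root() can be modelled as follows. This is a userspace sketch under stated assumptions. list_gss_flavors() is a hypothetical stand-in for gss_mech_list_pseudoflavors() that always reports the three Kerberos v5 pseudoflavors from RFC 2623. The real list depends on which gss mechanisms are registered. MAX_SECFLAVORS is an assumed stand-in for NFS_MAX_SECFLAVORS.

```c
#include <assert.h>
#include <stddef.h>

/* Flavor numbers: AUTH_NONE/AUTH_SYS from RFC 5531, Kerberos v5
 * pseudoflavors from RFC 2623. */
enum {
	FLAVOR_NULL  = 0,
	FLAVOR_UNIX  = 1,
	FLAVOR_KRB5  = 390003,
	FLAVOR_KRB5I = 390004,
	FLAVOR_KRB5P = 390005,
};

#define MAX_SECFLAVORS 12	/* ASSUMPTION: stand-in for NFS_MAX_SECFLAVORS */

/* Hypothetical stand-in for gss_mech_list_pseudoflavors(): writes the
 * registered GSS pseudoflavors to out and returns how many it wrote. */
static int list_gss_flavors(int *out)
{
	static const int gss[] = { FLAVOR_KRB5, FLAVOR_KRB5I, FLAVOR_KRB5P };
	size_t i;

	for (i = 0; i < sizeof(gss) / sizeof(gss[0]); i++)
		out[i] = gss[i];
	return (int)i;
}

/* Build the probe order used after the patch: AUTH_UNIX, then AUTH_NULL,
 * then every GSS pseudoflavor. Returns the number of entries filled. */
static int build_flavor_order(int flav[MAX_SECFLAVORS + 2])
{
	int len;

	flav[0] = FLAVOR_UNIX;
	flav[1] = FLAVOR_NULL;
	len = list_gss_flavors(&flav[2]);
	assert(len <= MAX_SECFLAVORS);
	return len + 2;	/* the two fixed slots plus the GSS entries */
}
```

Because AUTH_NULL is tried before any gss flavor, a server that accepts it is mounted without ever triggering a gss upcall.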

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9bf41ea..4e3c16b 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2218,8 +2218,8 @@ static int nfs4_proc_get_root(struct nfs_server *server, struct nfs_fh *fhandle,
 	rpc_authflavor_t flav_array[NFS_MAX_SECFLAVORS + 2];
 
 	flav_array[0] = RPC_AUTH_UNIX;
-	len = gss_mech_list_pseudoflavors(&flav_array[1]);
-	flav_array[1+len] = RPC_AUTH_NULL;
+	flav_array[1] = RPC_AUTH_NULL;
+	len = gss_mech_list_pseudoflavors(&flav_array[2]);
 	len += 2;
 
 	for (i = 0; i < len; i++) {


> 
> regards,


^ permalink raw reply related

* Re: [PATCH] Add Qualcomm Gobi 2000/3000 driver.
From: David Miller @ 2011-04-13 20:37 UTC (permalink / raw)
  To: ellyjones; +Cc: netdev, dcbw, mjg59, jglasgow, trond
In-Reply-To: <20110413190023.GC1652@google.com>

From: Elly Jones <ellyjones@google.com>
Date: Wed, 13 Apr 2011 15:00:24 -0400

> +void qcusbnet_put(struct qcusbnet *dev)
> +{
> +	mutex_lock(&qcusbnet_lock);
> +	kref_put(&dev->refcount, free_dev);
> +	mutex_unlock(&qcusbnet_lock);
> +}

This locking looks excessive, and shouldn't be needed simply to
release a reference to an object.
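The point about qcusbnet_put() is that the atomic decrement inside kref_put() already decides which caller drops the last reference. A userspace analog using C11 atomics is shown below. It is not the kernel kref API; struct obj and its helpers are hypothetical. One caveat, stated as an assumption about the driver: a lock is still needed if some lookup path can take a new reference while the count may be reaching zero. The quoted code does not show whether qcusbnet has such a path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Userspace analog of a kref-managed object (not the kernel API). */
struct obj {
	atomic_int refcount;
};

static int objects_freed;	/* counts releases, for demonstration only */

static struct obj *obj_new(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcount, 1);
	return o;
}

static void obj_get(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/* Drop one reference. Exactly one caller sees the count go from 1 to 0,
 * and that caller frees the object; no global mutex is needed for the
 * put itself. Returns 1 if the object was freed, like kref_put(). */
static int obj_put(struct obj *o)
{
	assert(atomic_load(&o->refcount) > 0);	/* debug check only */
	if (atomic_fetch_sub_explicit(&o->refcount, 1,
				      memory_order_acq_rel) == 1) {
		free(o);
		objects_freed++;
		return 1;
	}
	return 0;
}
```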

> +int qc_suspend(struct usb_interface *iface, pm_message_t event)
> +{
> +	struct usbnet *usbnet;
> +	struct qcusbnet *dev;
> +
> +	if (!iface)
> +		return -ENOMEM;

When is qc_suspend() called with a NULL iface argument?

> +static int qc_resume(struct usb_interface *iface)
> +{
> +	struct usbnet *usbnet;
> +	struct qcusbnet *dev;
> +	int ret;
> +	int oldstate;
> +
> +	if (iface == 0)
> +		return -ENOMEM;

Likewise, and if it is needed use consistent tests for NULL.  Testing
against the integer "0" is definitely the wrong way.

> +		if (usb_endpoint_dir_in(&endpoint->desc)
> +		&&  !usb_endpoint_xfer_int(&endpoint->desc)) {

Please do it like this:

	if (A &&
	    B) {

Not like:

	if (A
	&&  B

the latter looks awful at best.

> +	if (!usbnet || !usbnet->net) {
> +		DBG("failed to get usbnet device\n");
> +		return;
> +	}
> +
> +	dev = (struct qcusbnet *)usbnet->data[0];
> +	if (!dev) {
> +		DBG("failed to get QMIDevice\n");
> +		return;
> +	}

These NULL checks are everywhere!  Do we really _ever_ create a full
registered netdev with any of these things being NULL?  I severely
doubt it.

> +static int qcnet_worker(void *arg)
> +{
> +	struct list_head *node, *tmp;
> +	unsigned long activeflags, listflags;
> +	struct urbreq *req;
> +	int status;
> +	struct usb_device *usbdev;
> +	struct worker *worker = arg;
> +	if (!worker) {
> +		DBG("passed null pointer\n");
> +		return -EINVAL;
> +	}

This NULL check is impossible, you register the worker function with an
explicit &dev->worker argument, so seeing NULL here is impossible.

> +static int qcnet_startxmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> +	unsigned long listflags;
> +	struct qcusbnet *dev;
> +	struct worker *worker;
> +	struct urbreq *req;
> +	void *data;
> +	struct usbnet *usbnet = netdev_priv(netdev);
> +
> +	DBG("\n");
> +
> +	if (!usbnet || !usbnet->net) {
> +		DBG("failed to get usbnet device\n");
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	dev = (struct qcusbnet *)usbnet->data[0];
> +	if (!dev) {

Again, kill this NULL check noise, all of it can't be necessary.

> +	netdev->trans_start = jiffies;

Setting netdev->trans_start in drivers is expensive and deprecated;
please set netdev_queue->trans_start instead.

> +static int qcnet_open(struct net_device *netdev)
> +{
> +	int status = 0;
> +	struct qcusbnet *dev;
> +	struct usbnet *usbnet = netdev_priv(netdev);
> +
> +	if (!usbnet) {
> +		DBG("failed to get usbnet device\n");
> +		return -ENXIO;
> +	}
> +
> +	dev = (struct qcusbnet *)usbnet->data[0];
> +	if (!dev) {
> +		DBG("failed to get QMIDevice\n");
> +		return -ENXIO;
> +	}

Again, excessive NULL checks.

> +int qcnet_stop(struct net_device *netdev)
> +{
> +	struct qcusbnet *dev;
> +	struct usbnet *usbnet = netdev_priv(netdev);
> +
> +	if (!usbnet || !usbnet->net) {
> +		DBG("failed to get netdevice\n");
> +		return -ENXIO;
> +	}
> +
> +	dev = (struct qcusbnet *)usbnet->data[0];
> +	if (!dev) {
> +		DBG("failed to get QMIDevice\n");
> +		return -ENXIO;
> +	}

Here too.

> +static u8 nibble(unsigned char c)
> +{
> +	if (likely(isdigit(c)))
> +		return c - '0';
> +	c = toupper(c);
> +	if (likely(isxdigit(c)))
> +		return 10 + c - 'A';
> +	return 0;
> +}

Remove this function and use hex_to_bin() instead.
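hex_to_bin() returns the value of a hex digit, or -1 if the character is not one. The quoted nibble() instead silently maps invalid input to 0. The sketch below is a userspace stand-in written to that contract. hex_to_bin_sketch() and parse_hex_byte() are hypothetical names, used to show how a caller can reject bad input instead of accepting it as zero.

```c
#include <assert.h>
#include <ctype.h>

/* Userspace sketch of the kernel's hex_to_bin() contract: return the
 * value 0..15 of a hex digit, or -1 if ch is not a hex digit. */
static int hex_to_bin_sketch(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	ch = (char)tolower((unsigned char)ch);
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}

/* Hypothetical caller: parse two hex digits into one byte, rejecting
 * invalid input. Returns 0 on success, -1 on error. */
static int parse_hex_byte(const char *s, unsigned char *out)
{
	int hi = hex_to_bin_sketch(s[0]);
	int lo;

	if (hi < 0)
		return -1;	/* also stops before reading past a '\0' */
	lo = hex_to_bin_sketch(s[1]);
	if (lo < 0)
		return -1;
	*out = (unsigned char)(hi << 4 | lo);
	return 0;
}
```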

^ permalink raw reply

* Re: [net-next-2.6 RFC PATCH v2 00/13] ethtool: allow custom interval for
From: David Miller @ 2011-04-13 20:25 UTC (permalink / raw)
  To: bhutchings; +Cc: bruce.w.allan, netdev
In-Reply-To: <1302725464.2873.7.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 13 Apr 2011 21:11:04 +0100

> On Wed, 2011-04-13 at 12:58 -0700, Bruce Allan wrote:
>> physical identification
>> 
>> The following series changes the recently added ethtool set_phys_id
>> functions to allow drivers to provide a frequency at which to cycle
>> through an on/off identifier via software if/when the capability is
>> not provided by hardware.
> [...]
> 
> The first patch leaves all the drivers broken temporarily.  Since the
> change in each driver is trivial, I think you can squash this all into
> one patch.

Agreed.

^ permalink raw reply

* Re: [net-next-2.6 RFC PATCH v2 01/13] ethtool: allow custom interval for physical identification
From: Ben Hutchings @ 2011-04-13 20:25 UTC (permalink / raw)
  To: Bruce Allan; +Cc: netdev
In-Reply-To: <20110413195851.25901.8139.stgit@gitlad.jf.intel.com>

On Wed, 2011-04-13 at 12:58 -0700, Bruce Allan wrote:
> When physical identification of an adapter is done by toggling the
> mechanism on and off through software utilizing the set_phys_id operation,
> it is done with a fixed duration for both on and off states.  Some drivers
> may want to set a custom duration for the on/off intervals.  This patch
> changes the API so the return code from the driver's entry point when it
> is called with ETHTOOL_ID_ACTIVE can specify the frequency at which to
> cycle the on/off states.
[...]
> @@ -1655,23 +1655,26 @@ static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
>  		schedule_timeout_interruptible(
>  			id.data ? (id.data * HZ) : MAX_SCHEDULE_TIMEOUT);
>  	} else {
> -		/* Driver expects to be called periodically */
> +		/* Driver expects to be called using the frequency in rc */
> +		int i = 0, interval = (HZ / (rc * 2));
> +
>  		do {
>  			rtnl_lock();
>  			rc = dev->ethtool_ops->set_phys_id(dev, ETHTOOL_ID_ON);
>  			rtnl_unlock();
>  			if (rc)
>  				break;
> -			schedule_timeout_interruptible(HZ / 2);
> +			schedule_timeout_interruptible(interval);
>  
>  			rtnl_lock();
>  			rc = dev->ethtool_ops->set_phys_id(dev, ETHTOOL_ID_OFF);
>  			rtnl_unlock();
>  			if (rc)
>  				break;
> -			schedule_timeout_interruptible(HZ / 2);
> +			schedule_timeout_interruptible(interval);
>  		} while (!signal_pending(current) &&
> -			 (id.data == 0 || --id.data != 0));
> +			 (id.data == 0 ||
> +			  (++i * 2 * interval) < (id.data * HZ)));
[...]

I'm sure there ought to be a clearer way to do this, and to avoid any
weird effects from integer overflow in the multiplication.  How about
using an inner loop for each second:

		/* Driver expects to be called at twice the frequency in rc */
		int n = rc * 2, i, interval = HZ / n;

		do {
			i = n;
			do {
				rtnl_lock();
				rc = dev->ethtool_ops->set_phys_id(
					dev, (i & 1) ? ETHTOOL_ID_OFF : ETHTOOL_ID_ON);
				rtnl_unlock();
				if (rc)
					break;
				schedule_timeout_interruptible(interval);
			} while (!signal_pending(current) && --i != 0);
		} while (!signal_pending(current) &&
			 (id.data == 0 || --id.data != 0));

Ben.
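The nested loop proposed above can be checked without sleeping. The simulation below is a sketch, not ethtool code. It assumes the driver returned rc on/off cycles per second and that id.data is nonzero, and it replaces set_phys_id() with a recorded state. ID_ON and ID_OFF are hypothetical stand-ins for ETHTOOL_ID_ON and ETHTOOL_ID_OFF. Each second has n = 2 * rc steps of HZ / n ticks, so the duration is counted in whole seconds and no growing counter is multiplied by the interval.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ETHTOOL_ID_ON / ETHTOOL_ID_OFF. */
enum id_state { ID_ON, ID_OFF };

/* Simulate the proposed loop structure: the outer loop counts seconds,
 * the inner loop makes n = 2 * rc calls per second. Records the state
 * sequence (if seq is non-NULL), stores the per-step interval in ticks,
 * and returns the number of set_phys_id() calls made. */
static int simulate_blink(int rc, int seconds, int hz, enum id_state *seq,
			  int *interval_out)
{
	int n = rc * 2, i, calls = 0;

	assert(rc > 0 && seconds > 0);
	*interval_out = hz / n;
	do {
		i = n;	/* n is even, so each second starts with ON */
		do {
			enum id_state s = (i & 1) ? ID_OFF : ID_ON;

			if (seq)
				seq[calls] = s;
			calls++;
		} while (--i != 0);
	} while (--seconds != 0);
	return calls;
}
```

With rc = 2 and HZ = 1000, each step lasts 250 ticks and a 3-second run makes 12 calls, starting with ON. Note that HZ / n truncates when n does not divide HZ, so each second can run slightly short.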

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [Bug 32772] New: PROBLEM: kernel BUG at net/ipv4/inetpeer.c:386
From: David Miller @ 2011-04-13 20:24 UTC (permalink / raw)
  To: dimetrios; +Cc: eric.dumazet, shemminger, netdev
In-Reply-To: <BANLkTi=PTqcYd1wO_QzQTtg_PWEq2fAJMg@mail.gmail.com>

From: Dmitry Novikov <dimetrios@gmail.com>
Date: Wed, 13 Apr 2011 23:14:03 +0300

> Crash again after 7 days of uptime. slub_nomerge is set

Looks like the stack is too deep; try this patch, which is in net-2.6:

--------------------
inetpeer: reduce stack usage

On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Let's share the AVL stack to save ~376 bytes.
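The technique in this patch is to let the outermost caller own the scratch array of link pointers and pass it down. That way unlink_from_pool() no longer declares a second PEER_MAXDEPTH array on top of the one inet_getpeer() already has. The sketch below shows the same pattern on a plain binary search tree, with no AVL rebalancing and no RCU. struct node, lookup() and unlink_leaf() are hypothetical, and MAXDEPTH is an assumed stand-in for PEER_MAXDEPTH.

```c
#include <assert.h>
#include <stddef.h>

#define MAXDEPTH 8	/* ASSUMPTION: stand-in for PEER_MAXDEPTH */

struct node {
	int key;
	struct node *left, *right;
};

/* Walk the tree, recording the address of every link followed in the
 * caller-supplied stack. Returns the matching node or NULL; *depth
 * receives the number of links recorded. */
static struct node *lookup(struct node **root, int key,
			   struct node **stack[MAXDEPTH], int *depth)
{
	struct node **link = root;
	int d = 0;

	while (*link) {
		assert(d < MAXDEPTH);
		stack[d++] = link;
		if (key == (*link)->key)
			break;
		link = key < (*link)->key ? &(*link)->left : &(*link)->right;
	}
	*depth = d;
	return *link;
}

/* Remove a leaf using the caller's scratch stack, so this function needs
 * no large local array of its own. Returns 1 if a leaf was removed. */
static int unlink_leaf(struct node **root, int key,
		       struct node **stack[MAXDEPTH])
{
	int depth;
	struct node *n = lookup(root, key, stack, &depth);

	if (!n || n->left || n->right)
		return 0;
	*stack[depth - 1] = NULL;	/* clear the parent's link to the leaf */
	return 1;
}
```

Only the top-level caller declares `struct node **stack[MAXDEPTH]`, so peak stack use along the call chain stays at one array.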

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Scot Doyle <lkml@scotdoyle.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Reviewed-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/inetpeer.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index dd1b20e..9df4e63 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -354,7 +354,8 @@ static void inetpeer_free_rcu(struct rcu_head *head)
 }
 
 /* May be called with local BH enabled. */
-static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
+static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base,
+			     struct inet_peer __rcu **stack[PEER_MAXDEPTH])
 {
 	int do_free;
 
@@ -368,7 +369,6 @@ static void unlink_from_pool(struct inet_peer *p, struct inet_peer_base *base)
 	 * We use refcnt=-1 to alert lockless readers this entry is deleted.
 	 */
 	if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
-		struct inet_peer __rcu **stack[PEER_MAXDEPTH];
 		struct inet_peer __rcu ***stackptr, ***delp;
 		if (lookup(&p->daddr, stack, base) != p)
 			BUG();
@@ -422,7 +422,7 @@ static struct inet_peer_base *peer_to_base(struct inet_peer *p)
 }
 
 /* May be called with local BH enabled. */
-static int cleanup_once(unsigned long ttl)
+static int cleanup_once(unsigned long ttl, struct inet_peer __rcu **stack[PEER_MAXDEPTH])
 {
 	struct inet_peer *p = NULL;
 
@@ -454,7 +454,7 @@ static int cleanup_once(unsigned long ttl)
 		 * happen because of entry limits in route cache. */
 		return -1;
 
-	unlink_from_pool(p, peer_to_base(p));
+	unlink_from_pool(p, peer_to_base(p), stack);
 	return 0;
 }
 
@@ -524,7 +524,7 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 
 	if (base->total >= inet_peer_threshold)
 		/* Remove one less-recently-used entry. */
-		cleanup_once(0);
+		cleanup_once(0, stack);
 
 	return p;
 }
@@ -540,6 +540,7 @@ static void peer_check_expire(unsigned long dummy)
 {
 	unsigned long now = jiffies;
 	int ttl, total;
+	struct inet_peer __rcu **stack[PEER_MAXDEPTH];
 
 	total = compute_total();
 	if (total >= inet_peer_threshold)
@@ -548,7 +549,7 @@ static void peer_check_expire(unsigned long dummy)
 		ttl = inet_peer_maxttl
 				- (inet_peer_maxttl - inet_peer_minttl) / HZ *
 					total / inet_peer_threshold * HZ;
-	while (!cleanup_once(ttl)) {
+	while (!cleanup_once(ttl, stack)) {
 		if (jiffies != now)
 			break;
 	}
-- 
1.7.4.3


^ permalink raw reply related

* Re: [Bug 32772] New: PROBLEM: kernel BUG at net/ipv4/inetpeer.c:386
From: Dmitry Novikov @ 2011-04-13 20:14 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, shemminger, netdev
In-Reply-To: <20110406.111649.193697123.davem@davemloft.net>

Hello.

Crash again after 7 days of uptime. slub_nomerge is set

[559353.216526] ------------[ cut here ]------------
[559353.217494] kernel BUG at net/ipv4/inetpeer.c:386!
[559353.217494] invalid opcode: 0000 [#1] SMP
[559353.217494] last sysfs file: /sys/module/nf_conntrack_pptp/initstate
[559353.217494] Modules linked in: nf_nat_pptp nf_nat_proto_gre
nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_ftp nf_conntrack_ftp
ipt_REJECT xt_state xt_tcpudp xt_multiport ip_set iptable_filter
iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack
nf_defrag_ipv4 ip_tables x_tables act_police cls_u32 sch_ingress
sch_tbf 8021q garp bridge ipv6 stp llc loop intel_agp intel_gtt
agpgart rng_core pcspkr i2c_i801 i2c_core processor thermal_sys
parport_pc evdev parport serio_raw tpm_tis tpm button tpm_bios ext3
jbd mbcache sd_mod crc_t10dif ata_generic ata_piix libata scsi_mod
uhci_hcd ide_pci_generic e1000e ehci_hcd igb r8169 ide_core dca mii
usbcore nls_base [last unloaded: scsi_wait_scan]
[559353.217494]
[559353.217494] Pid: 0, comm: kworker/0:0 Not tainted 2.6.38-demyan-1.1demyan #1 Gigabyte Technology Co., Ltd. G41MT-ES2L/G41MT-ES2L
[559353.217494] EIP: 0060:[<c11e0caa>] EFLAGS: 00010287 CPU: 1
[559353.217494] EIP is at unlink_from_pool+0x85/0x14a
[559353.217494] EAX: c125ff04 EBX: efcb09c0 ECX: abfd6970 EDX: ee6d77c4
[559353.217494] ESI: c1333338 EDI: f4c91bfc EBP: abfea42e ESP: f4c91ba8
[559353.217494]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[559353.217494] Process kworker/0:0 (pid: 0, ti=f4c90000 task=f4c6a400 task.ti=f4c8c000)
[559353.217494] Stack:
[559353.217494]  f351c790 00000001 abfd6970 c133333c c1333338 efc6b384 efe2af80 efcd1c04
[559353.217494]  f3cc2784 ef3f62c4 f251da80 ef054b40 efcdc300 f0373dc0 f429a144 ef254a80
[559353.217494]  ed4e6340 f0705f40 efcdb580 f05261c0 ee6d77c4 f4c91cb4 f351c790 f4c91c78
[559353.217494] Call Trace:
[559353.217494]  [<c120f068>] ? fib4_rule_action+0x40/0x4d
[559353.217494]  [<c11d1be3>] ? fib_rules_lookup+0x8d/0xe4
[559353.217494]  [<c11e0de9>] ? cleanup_once+0x7a/0x7f
[559353.217494]  [<c11e0fa9>] ? inet_getpeer+0x1bb/0x1dc
[559353.217494]  [<c11dc073>] ? nf_ct_attach+0x12/0x13
[559353.217494]  [<c1202404>] ? icmp_glue_bits+0x65/0x6a
[559353.217494]  [<c11e4109>] ? ip_append_data+0x595/0x850
[559353.217494]  [<c11e025d>] ? rt_bind_peer+0x1d/0x3d
[559353.217494]  [<c11e029f>] ? __ip_select_ident+0x22/0xa6
[559353.217494]  [<c11e4f60>] ? ip_push_pending_frames+0x206/0x2cb
[559353.217494]  [<c120301b>] ? icmp_send+0x4fe/0x523
[559353.217494]  [<f81a6b09>] ? ____nf_conntrack_find+0xfa/0x142 [nf_conntrack]
[559353.217494]  [<f81a8069>] ? nf_conntrack_in+0x4f3/0x5e3 [nf_conntrack]
[559353.217494]  [<f815c536>] ? ipt_do_table+0x4bc/0x4eb [ip_tables]
[559353.217494]  [<c11e2949>] ? ip_forward+0x2ef/0x316
[559353.217494]  [<c11e13da>] ? ip_rcv_finish+0x2fa/0x31f
[559353.217494]  [<c11c1b3c>] ? __netif_receive_skb+0x405/0x42c
[559353.217494]  [<c11c1a63>] ? __netif_receive_skb+0x32c/0x42c
[559353.217494]  [<c1047585>] ? ktime_get_real+0x10/0x2d
[559353.217494]  [<c11c2547>] ? netif_receive_skb+0x5a/0x5f
[559353.217494]  [<c11c25ff>] ? napi_skb_finish+0x1b/0x30
[559353.217494]  [<f8104723>] ? igb_poll+0x649/0x94a [igb]
[559353.217494]  [<c1007765>] ? sched_clock+0x9/0xd
[559353.217494]  [<c1030094>] ? wait_consider_task+0x977/0xa91
[559353.217494]  [<c104438f>] ? sched_clock_local+0x17/0x13d
[559353.217494]  [<c11c2b7b>] ? net_rx_action+0x90/0x150
[559353.217494]  [<c1031f12>] ? __do_softirq+0x75/0x10e
[559353.217494]  [<c1031e9d>] ? __do_softirq+0x0/0x10e
[559353.217494]  <IRQ>
[559353.217494]  [<c1031df3>] ? irq_exit+0x31/0x64
[559353.217494]  [<c1004397>] ? do_IRQ+0x73/0x84
[559353.217494]  [<c1003429>] ? common_interrupt+0x29/0x30
[559353.217494]  [<c10089b4>] ? mwait_idle+0x4f/0x59
[559353.217494]  [<c10021ef>] ? cpu_idle+0x46/0x63
[559353.217494] Code: 24 08 39 cd 75 09 42 3b 54 24 04 7c e9 eb 18 3b 6c 24 08 8d 50 04 0f 42 d0 89 17 83 c7 04 8b 02 3d 04 ff 25 c1 75 bb 39 d8 74 04 <0f> 0b eb fe 8d 6f fc 81 3b 04 ff 25 c1 89 6c 24 08 75 0d 8b 47
[559353.217494] EIP: [<c11e0caa>] unlink_from_pool+0x85/0x14a SS:ESP 0068:f4c91ba8
[559354.302112] ---[ end trace 55cdab910854890a ]---
[559354.316239] Kernel panic - not syncing: Fatal exception in interrupt
[559354.335557] Pid: 0, comm: kworker/0:0 Tainted: G      D 2.6.38-demyan-1.1demyan #1
[559354.359578] Call Trace:
[559354.367198]  [<c1231f71>] ? panic+0x4d/0x137
[559354.380274]  [<c1005722>] ? oops_end+0x8e/0x99
[559354.393871]  [<c1003a0e>] ? do_invalid_op+0x0/0x75
[559354.408509]  [<c1003a7a>] ? do_invalid_op+0x6c/0x75
[559354.423407]  [<c11e0caa>] ? unlink_from_pool+0x85/0x14a
[559354.439345]  [<c120f068>] ? fib4_rule_action+0x40/0x4d
[559354.455022]  [<c11d1be3>] ? fib_rules_lookup+0x8d/0xe4
[559354.470700]  [<c120f122>] ? fib_lookup+0x31/0x3f
[559354.484818]  [<c11ca4f1>] ? neigh_lookup+0x8e/0x96
[559354.499454]  [<c123464e>] ? error_code+0x5a/0x60
[559354.513571]  [<c1003a0e>] ? do_invalid_op+0x0/0x75
[559354.528208]  [<c11e0caa>] ? unlink_from_pool+0x85/0x14a
[559354.544146]  [<c120f068>] ? fib4_rule_action+0x40/0x4d
[559354.559823]  [<c11d1be3>] ? fib_rules_lookup+0x8d/0xe4
[559354.575500]  [<c11e0de9>] ? cleanup_once+0x7a/0x7f
[559354.590137]  [<c11e0fa9>] ? inet_getpeer+0x1bb/0x1dc
[559354.605297]  [<c11dc073>] ? nf_ct_attach+0x12/0x13
[559354.619934]  [<c1202404>] ? icmp_glue_bits+0x65/0x6a
[559354.635090]  [<c11e4109>] ? ip_append_data+0x595/0x850
[559354.650767]  [<c11e025d>] ? rt_bind_peer+0x1d/0x3d
[559354.665405]  [<c11e029f>] ? __ip_select_ident+0x22/0xa6
[559354.681344]  [<c11e4f60>] ? ip_push_pending_frames+0x206/0x2cb
[559354.699099]  [<c120301b>] ? icmp_send+0x4fe/0x523
[559354.713479]  [<f81a6b09>] ? ____nf_conntrack_find+0xfa/0x142 [nf_conntrack]
[559354.734615]  [<f81a8069>] ? nf_conntrack_in+0x4f3/0x5e3 [nf_conntrack]
[559354.754452]  [<f815c536>] ? ipt_do_table+0x4bc/0x4eb [ip_tables]
[559354.772731]  [<c11e2949>] ? ip_forward+0x2ef/0x316
[559354.787366]  [<c11e13da>] ? ip_rcv_finish+0x2fa/0x31f
[559354.802785]  [<c11c1b3c>] ? __netif_receive_skb+0x405/0x42c
[559354.819762]  [<c11c1a63>] ? __netif_receive_skb+0x32c/0x42c
[559354.836738]  [<c1047585>] ? ktime_get_real+0x10/0x2d
[559354.851901]  [<c11c2547>] ? netif_receive_skb+0x5a/0x5f
[559354.867835]  [<c11c25ff>] ? napi_skb_finish+0x1b/0x30
[559354.883254]  [<f8104723>] ? igb_poll+0x649/0x94a [igb]
[559354.898930]  [<c1007765>] ? sched_clock+0x9/0xd
[559354.912786]  [<c1030094>] ? wait_consider_task+0x977/0xa91
[559354.929502]  [<c104438f>] ? sched_clock_local+0x17/0x13d
[559354.945701]  [<c11c2b7b>] ? net_rx_action+0x90/0x150
[559354.960857]  [<c1031f12>] ? __do_softirq+0x75/0x10e
[559354.975756]  [<c1031e9d>] ? __do_softirq+0x0/0x10e
[559354.990393]  <IRQ>  [<c1031df3>] ? irq_exit+0x31/0x64
[559355.005862]  [<c1004397>] ? do_IRQ+0x73/0x84
[559355.018941]  [<c1003429>] ? common_interrupt+0x29/0x30
[559355.034618]  [<c10089b4>] ? mwait_idle+0x4f/0x59
[559355.048734]  [<c10021ef>] ? cpu_idle+0x46/0x63
[559355.062333] Rebooting in 5 seconds..


* Re: [net-next-2.6 RFC PATCH v2 00/13] ethtool: allow custom interval for physical identification
From: Ben Hutchings @ 2011-04-13 20:11 UTC
  To: Bruce Allan; +Cc: netdev
In-Reply-To: <20110413195146.25901.72193.stgit@gitlad.jf.intel.com>

On Wed, 2011-04-13 at 12:58 -0700, Bruce Allan wrote:
> physical identification
> 
> The following series changes the recently added ethtool set_phys_id
> functions to allow drivers to provide a frequency at which to cycle
> through an on/off identifier via software if/when the capability is
> not provided by hardware.
[...]

The first patch leaves all the drivers broken temporarily.  Since the
change in each driver is trivial, I think you can squash this all into
one patch.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
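The cover letter quoted above describes a software fallback for physical identification. When the hardware cannot blink its identification LED by itself, the driver reports a frequency and the core cycles the indicator on and off. The following is a minimal userspace C sketch of that idea, written under stated assumptions. All names (`enum id_state`, `set_phys_id_fn`, `identify`, `struct demo_dev`, `demo_hook`) are hypothetical and are not the kernel's ethtool API. The return convention (0 means the hardware handles blinking, a positive value means on/off cycles per second, a negative value is an error) is likewise an assumption made for illustration. To stay testable, the loop records toggle timestamps instead of sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical indicator states (assumption, not the kernel's enum). */
enum id_state { ID_ACTIVE, ID_ON, ID_OFF, ID_INACTIVE };

/* Hypothetical driver hook. For ID_ACTIVE it returns 0 if the hardware
 * blinks by itself, a positive frequency (on/off cycles per second) to
 * request software cycling, or a negative error code. */
typedef int (*set_phys_id_fn)(void *dev, enum id_state state);

/* One recorded hook invocation and the simulated time (ms) it fired. */
struct id_event {
	enum id_state state;
	unsigned int at_ms;
};

/* Simulates the software fallback for `seconds` seconds. Rather than
 * sleeping, it computes when each toggle would happen and logs it.
 * Returns the number of events stored (at most `max`) or the hook's
 * negative error code. */
static int identify(void *dev, set_phys_id_fn hook, unsigned int seconds,
		    struct id_event *events, size_t max)
{
	size_t n = 0;
	int freq;

	assert(events != NULL || max == 0);
	freq = hook(dev, ID_ACTIVE);
	if (freq < 0)
		return freq;	/* activation failed: nothing to undo */

	if (freq > 0) {
		/* One cycle is ON followed by OFF, so toggle every half period. */
		unsigned int half_ms = 1000u / (2u * (unsigned int)freq);
		unsigned int t;
		int on = 1;

		if (half_ms == 0)
			half_ms = 1;	/* clamp very high frequencies */
		for (t = 0; t < seconds * 1000u && n < max; t += half_ms) {
			enum id_state s = on ? ID_ON : ID_OFF;

			hook(dev, s);
			events[n].state = s;
			events[n].at_ms = t;
			n++;
			on = !on;
		}
	}

	/* Always hand the indicator back to its normal state at the end. */
	hook(dev, ID_INACTIVE);
	if (n < max) {
		events[n].state = ID_INACTIVE;
		events[n].at_ms = seconds * 1000u;
		n++;
	}
	return (int)n;
}

/* Hypothetical example driver: reports a fixed frequency and tracks a
 * simulated LED. */
struct demo_dev {
	int freq;	/* value returned for ID_ACTIVE */
	int calls;	/* how many times the hook ran */
	int lit;	/* simulated LED state: 1 = on */
};

static int demo_hook(void *dev, enum id_state state)
{
	struct demo_dev *d = dev;

	d->calls++;
	switch (state) {
	case ID_ACTIVE:
		return d->freq;
	case ID_ON:
		d->lit = 1;
		break;
	case ID_OFF:
	case ID_INACTIVE:
		d->lit = 0;
		break;
	}
	return 0;
}
```

With `freq = 2` and a one-second run, the sketch calls the hook with ON/OFF at 0, 250, 500 and 750 ms, then once with `ID_INACTIVE` to restore the LED. A driver that returns 0 sees only the activate and deactivate calls. A real implementation would wait half a period between toggles instead of computing timestamps.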


