Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v2] sctp: Fix a race between ICMP protocol unreachable and connect()
From: Vlad Yasevich @ 2010-05-05 19:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp, Vlad Yasevich
In-Reply-To: <1273087783-18250-1-git-send-email-vladislav.yasevich@hp.com>

[.. removed leftover debuggin printk. should probably be queued for stable
 as well... ]

ICMP protocol unreachable handling completely disregarded
the fact that the user may have locket the socket.  It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association.  This resulted
in the following:

Attempt to release alive inet socket f6afcc00

=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
 (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c

stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
 [<c1232266>] ? printk+0xf/0x11
 [<c1038553>] debug_check_no_locks_freed+0xce/0xff
 [<c10620b4>] kmem_cache_free+0x21/0x66
 [<c1185f25>] __sk_free+0x9d/0xab
 [<c1185f9c>] sk_free+0x1c/0x1e
 [<c1216e38>] sctp_association_put+0x32/0x89
 [<c1220865>] __sctp_connect+0x36d/0x3f4
 [<c122098a>] ? sctp_connect+0x13/0x4c
 [<c102d073>] ? autoremove_wake_function+0x0/0x33
 [<c12209a8>] sctp_connect+0x31/0x4c
 [<c11d1e80>] inet_dgram_connect+0x4b/0x55
 [<c11834fa>] sys_connect+0x54/0x71
 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c11847ab>] sys_socketcall+0x6d/0x178
 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1002959>] syscall_call+0x7/0xb

This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.

The simplest solution is to start a very short timer in case the socket
is owned by user.  When the timer expires, we can do some verification
and be able to do the release properly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 include/net/sctp/sm.h      |    1 +
 include/net/sctp/structs.h |    3 +++
 net/sctp/input.c           |   22 ++++++++++++++++++----
 net/sctp/sm_sideeffect.c   |   35 +++++++++++++++++++++++++++++++++++
 net/sctp/transport.c       |    2 ++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 851c813..61d73e3 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -279,6 +279,7 @@ int sctp_do_sm(sctp_event_t event_type, sctp_subtype_t subtype,
 /* 2nd level prototypes */
 void sctp_generate_t3_rtx_event(unsigned long peer);
 void sctp_generate_heartbeat_event(unsigned long peer);
+void sctp_generate_proto_unreach_event(unsigned long peer);
 
 void sctp_ootb_pkt_free(struct sctp_packet *);
 
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 597f8e2..219043a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1010,6 +1010,9 @@ struct sctp_transport {
 	/* Heartbeat timer is per destination. */
 	struct timer_list hb_timer;
 
+	/* Timer to handle ICMP proto unreachable envets */
+	struct timer_list proto_unreach_timer;
+
 	/* Since we're using per-destination retransmission timers
 	 * (see above), we're also using per-destination "transmitted"
 	 * queues.  This probably ought to be a private struct
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 2a57018..ea21924 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -440,11 +440,25 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
 {
 	SCTP_DEBUG_PRINTK("%s\n",  __func__);
 
-	sctp_do_sm(SCTP_EVENT_T_OTHER,
-		   SCTP_ST_OTHER(SCTP_EVENT_ICMP_PROTO_UNREACH),
-		   asoc->state, asoc->ep, asoc, t,
-		   GFP_ATOMIC);
+	if (sock_owned_by_user(sk)) {
+		if (timer_pending(&t->proto_unreach_timer))
+			return;
+		else {
+			if (!mod_timer(&t->proto_unreach_timer,
+						jiffies + (HZ/20)))
+				sctp_association_hold(asoc);
+		}
+			
+	} else {
+		if (timer_pending(&t->proto_unreach_timer) &&
+		    del_timer(&t->proto_unreach_timer))
+			sctp_association_put(asoc);
 
+		sctp_do_sm(SCTP_EVENT_T_OTHER,
+			   SCTP_ST_OTHER(SCTP_EVENT_ICMP_PROTO_UNREACH),
+			   asoc->state, asoc->ep, asoc, t,
+			   GFP_ATOMIC);
+	}
 }
 
 /* Common lookup code for icmp/icmpv6 error handler. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index d5ae450..eb1f42f 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -397,6 +397,41 @@ out_unlock:
 	sctp_transport_put(transport);
 }
 
+/* Handle the timeout of the ICMP protocol unreachable timer.  Trigger
+ * the correct state machine transition that will close the association.
+ */
+void sctp_generate_proto_unreach_event(unsigned long data)
+{
+	struct sctp_transport *transport = (struct sctp_transport *) data;
+	struct sctp_association *asoc = transport->asoc;
+	
+	sctp_bh_lock_sock(asoc->base.sk);
+	if (sock_owned_by_user(asoc->base.sk)) {
+		SCTP_DEBUG_PRINTK("%s:Sock is busy.\n", __func__);
+
+		/* Try again later.  */
+		if (!mod_timer(&transport->proto_unreach_timer,
+				jiffies + (HZ/20)))
+			sctp_association_hold(asoc);
+		goto out_unlock;
+	}
+
+	/* Is this structure just waiting around for us to actually
+	 * get destroyed?
+	 */
+	if (asoc->base.dead)
+		goto out_unlock;
+
+	sctp_do_sm(SCTP_EVENT_T_OTHER,
+		   SCTP_ST_OTHER(SCTP_EVENT_ICMP_PROTO_UNREACH),
+		   asoc->state, asoc->ep, asoc, transport, GFP_ATOMIC);
+
+out_unlock:
+	sctp_bh_unlock_sock(asoc->base.sk);
+	sctp_association_put(asoc);
+}
+
+
 /* Inject a SACK Timeout event into the state machine.  */
 static void sctp_generate_sack_event(unsigned long data)
 {
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index be4d63d..4a36803 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -108,6 +108,8 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
 			(unsigned long)peer);
 	setup_timer(&peer->hb_timer, sctp_generate_heartbeat_event,
 			(unsigned long)peer);
+	setup_timer(&peer->proto_unreach_timer,
+		    sctp_generate_proto_unreach_event, (unsigned long)peer);
 
 	/* Initialize the 64-bit random nonce sent with heartbeat. */
 	get_random_bytes(&peer->hb_nonce, sizeof(peer->hb_nonce));
-- 
1.6.0.4


^ permalink raw reply related

* [PATCH] sctp: Fix a race between ICMP protocol unreachable and connect()
From: Vlad Yasevich @ 2010-05-05 19:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp, Vlad Yasevich

ICMP protocol unreachable handling completely disregarded
the fact that the user may have locket the socket.  It proceeded
to destroy the association, even though the user may have
held the lock and had a ref on the association.  This resulted
in the following:

Attempt to release alive inet socket f6afcc00

=========================
[ BUG: held lock freed! ]
-------------------------
somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
there!
 (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
1 lock held by somenu/2672:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c

stack backtrace:
Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
Call Trace:
 [<c1232266>] ? printk+0xf/0x11
 [<c1038553>] debug_check_no_locks_freed+0xce/0xff
 [<c10620b4>] kmem_cache_free+0x21/0x66
 [<c1185f25>] __sk_free+0x9d/0xab
 [<c1185f9c>] sk_free+0x1c/0x1e
 [<c1216e38>] sctp_association_put+0x32/0x89
 [<c1220865>] __sctp_connect+0x36d/0x3f4
 [<c122098a>] ? sctp_connect+0x13/0x4c
 [<c102d073>] ? autoremove_wake_function+0x0/0x33
 [<c12209a8>] sctp_connect+0x31/0x4c
 [<c11d1e80>] inet_dgram_connect+0x4b/0x55
 [<c11834fa>] sys_connect+0x54/0x71
 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c1054026>] ? might_fault+0x42/0x7c
 [<c11847ab>] sys_socketcall+0x6d/0x178
 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1002959>] syscall_call+0x7/0xb

This was because the sctp_wait_for_connect() would aqcure the socket
lock and then proceed to release the last reference count on the
association, thus cause the fully destruction path to finish freeing
the socket.

The simplest solution is to start a very short timer in case the socket
is owned by user.  When the timer expires, we can do some verification
and be able to do the release properly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
 include/net/sctp/sm.h      |    1 +
 include/net/sctp/structs.h |    3 +++
 net/sctp/input.c           |   23 +++++++++++++++++++----
 net/sctp/sm_sideeffect.c   |   35 +++++++++++++++++++++++++++++++++++
 net/sctp/transport.c       |    2 ++
 5 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 851c813..61d73e3 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -279,6 +279,7 @@ int sctp_do_sm(sctp_event_t event_type, sctp_subtype_t subtype,
 /* 2nd level prototypes */
 void sctp_generate_t3_rtx_event(unsigned long peer);
 void sctp_generate_heartbeat_event(unsigned long peer);
+void sctp_generate_proto_unreach_event(unsigned long peer);
 
 void sctp_ootb_pkt_free(struct sctp_packet *);
 
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 597f8e2..219043a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1010,6 +1010,9 @@ struct sctp_transport {
 	/* Heartbeat timer is per destination. */
 	struct timer_list hb_timer;
 
+	/* Timer to handle ICMP proto unreachable envets */
+	struct timer_list proto_unreach_timer;
+
 	/* Since we're using per-destination retransmission timers
 	 * (see above), we're also using per-destination "transmitted"
 	 * queues.  This probably ought to be a private struct
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 2a57018..94b2eb2 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -440,11 +440,25 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
 {
 	SCTP_DEBUG_PRINTK("%s\n",  __func__);
 
-	sctp_do_sm(SCTP_EVENT_T_OTHER,
-		   SCTP_ST_OTHER(SCTP_EVENT_ICMP_PROTO_UNREACH),
-		   asoc->state, asoc->ep, asoc, t,
-		   GFP_ATOMIC);
+	if (sock_owned_by_user(sk)) {
+		if (timer_pending(&t->proto_unreach_timer))
+			return;
+		else {
+			if (!mod_timer(&t->proto_unreach_timer,
+						jiffies + (HZ/20)))
+				sctp_association_hold(asoc);
+		}
+			
+	} else {
+		if (timer_pending(&t->proto_unreach_timer) &&
+		    del_timer(&t->proto_unreach_timer))
+			sctp_association_put(asoc);
 
+		sctp_do_sm(SCTP_EVENT_T_OTHER,
+			   SCTP_ST_OTHER(SCTP_EVENT_ICMP_PROTO_UNREACH),
+			   asoc->state, asoc->ep, asoc, t,
+			   GFP_ATOMIC);
+	}
 }
 
 /* Common lookup code for icmp/icmpv6 error handler. */
@@ -599,6 +613,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 		}
 		else {
 			if (ICMP_PROT_UNREACH == code) {
+				printk("processing code unreach\n");
 				sctp_icmp_proto_unreachable(sk, asoc,
 							    transport);
 				goto out_unlock;
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index d5ae450..eb1f42f 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -397,6 +397,41 @@ out_unlock:
 	sctp_transport_put(transport);
 }
 
+/* Handle the timeout of the ICMP protocol unreachable timer.  Trigger
+ * the correct state machine transition that will close the association.
+ */
+void sctp_generate_proto_unreach_event(unsigned long data)
+{
+	struct sctp_transport *transport = (struct sctp_transport *) data;
+	struct sctp_association *asoc = transport->asoc;
+	
+	sctp_bh_lock_sock(asoc->base.sk);
+	if (sock_owned_by_user(asoc->base.sk)) {
+		SCTP_DEBUG_PRINTK("%s:Sock is busy.\n", __func__);
+
+		/* Try again later.  */
+		if (!mod_timer(&transport->proto_unreach_timer,
+				jiffies + (HZ/20)))
+			sctp_association_hold(asoc);
+		goto out_unlock;
+	}
+
+	/* Is this structure just waiting around for us to actually
+	 * get destroyed?
+	 */
+	if (asoc->base.dead)
+		goto out_unlock;
+
+	sctp_do_sm(SCTP_EVENT_T_OTHER,
+		   SCTP_ST_OTHER(SCTP_EVENT_ICMP_PROTO_UNREACH),
+		   asoc->state, asoc->ep, asoc, transport, GFP_ATOMIC);
+
+out_unlock:
+	sctp_bh_unlock_sock(asoc->base.sk);
+	sctp_association_put(asoc);
+}
+
+
 /* Inject a SACK Timeout event into the state machine.  */
 static void sctp_generate_sack_event(unsigned long data)
 {
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index be4d63d..4a36803 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -108,6 +108,8 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
 			(unsigned long)peer);
 	setup_timer(&peer->hb_timer, sctp_generate_heartbeat_event,
 			(unsigned long)peer);
+	setup_timer(&peer->proto_unreach_timer,
+		    sctp_generate_proto_unreach_event, (unsigned long)peer);
 
 	/* Initialize the 64-bit random nonce sent with heartbeat. */
 	get_random_bytes(&peer->hb_nonce, sizeof(peer->hb_nonce));
-- 
1.6.0.4


^ permalink raw reply related

* linear sk_buff
From: Mark Ryden @ 2010-05-05 19:12 UTC (permalink / raw)
  To: netdev

Hello,
 I would appreciate if someone in this mailing list  can say in a
sentence or two what is a linear
sk_buff and what is a non linear sk_buff; does it has to do with fragmentation?
(I am sure that many know the answer, but I am confused and googling
made me overconfused)

Regards, and sorry for the noise,

Mark

^ permalink raw reply

* [PATCH/RFC] cxgb4: Add MAINTAINERS info
From: Roland Dreier @ 2010-05-05 19:01 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, dm-ut6Up61K2wZBDgjK7y7TUQ
In-Reply-To: <20100428103356.931ba48c.randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Hi guys, does this info for cxgb4/iw_cxgb4 (pretty much copied from
cxgb3, except with Dimitris instead of Divy) look right?  If so I'll add
it to my tree.

Thanks,
  Roland
---
diff --git a/MAINTAINERS b/MAINTAINERS
index 7a9ccda..a00231b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1719,6 +1719,20 @@ W:	http://www.openfabrics.org
 S:	Supported
 F:	drivers/infiniband/hw/cxgb3/
 
+CXGB4 ETHERNET DRIVER (CXGB4)
+M:	Dimitris Michailidis <dm-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
+L:	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+W:	http://www.chelsio.com
+S:	Supported
+F:	drivers/net/cxgb4/
+
+CXGB4 IWARP RNIC DRIVER (IW_CXGB4)
+M:	Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
+L:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+W:	http://www.openfabrics.org
+S:	Supported
+F:	drivers/infiniband/hw/cxgb4/
+
 CYBERPRO FB DRIVER
 M:	Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
 L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Pankaj Thakkar @ 2010-05-05 19:00 UTC (permalink / raw)
  To: Chris Wright
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, pv-drivers@vmware.com,
	Shreyas Bhatewara, kvm@vger.kernel.org
In-Reply-To: <20100505005852.GA28034@sequoia.sous-sol.org>

On Tue, May 04, 2010 at 05:58:52PM -0700, Chris Wright wrote:
> Date: Tue, 4 May 2010 17:58:52 -0700
> From: Chris Wright <chrisw@sous-sol.org>
> To: Pankaj Thakkar <pthakkar@vmware.com>
> CC: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
> 	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
> 	"virtualization@lists.linux-foundation.org"
>  <virtualization@lists.linux-foundation.org>,
> 	"pv-drivers@vmware.com" <pv-drivers@vmware.com>,
> 	Shreyas Bhatewara <sbhatewara@vmware.com>,
> 	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
> 
> * Pankaj Thakkar (pthakkar@vmware.com) wrote:
> > We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
> > Linux users can exploit the benefits provided by passthrough devices in a
> > seamless manner while retaining the benefits of virtualization. The document
> > below tries to answer most of the questions which we anticipated. Please let us
> > know your comments and queries.
> 
> How does the throughput, latency, and host CPU utilization for normal
> data path compare with say NetQueue?

NetQueue is really for scaling across multiple VMs. NPA allows similar scaling
and also helps in improving the CPU efficiency for a single VM since the
hypervisor is bypassed. Througput wise both emulation and passthrough (NPA) can
obtain line rates on 10gig but passthrough saves upto 40% cpu based on the
workload. We did a demo at IDF 2009 where we compared 8 VMs running on NetQueue
v/s 8 VMs running on NPA (using Niantic) and we obtained similar CPU efficiency
gains.

> 
> And does this obsolete your UPT implementation?

NPA and UPT share a lot of code in the hypervisor. UPT was adopted only by very
limited IHVs and hence NPA is our way forward to have all IHVs onboard.

> How many cards actually support this NPA interface?  What does it look
> like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
> one).

We have it working internally with Intel Niantic (10G) and Kawela (1G) SR-IOV
NIC. We are also working with upcoming Broadcom 10G card and plan to support
other IHVs. This is unlike UPT so we don't dictate the register sets or rings
like we did in UPT. Rather we have guidelines like that the card should have an
embedded switch for inter VF switching or should support programming (rx
filters, VLAN, etc) though the PF driver rather than the VF driver.

> How do you handle hardware which has a more symmetric view of the
> SR-IOV world (SR-IOV is only PCI sepcification, not a network driver
> specification)?  Or hardware which has multiple functions per physical
> port (multiqueue, hw filtering, embedded switch, etc.)?

I am not sure what do you mean by symmetric view of SR-IOV world?

NPA allows multi-queue VFs and requires an embedded switch currently. As far as
the PF driver is concerned we require IHVs to support all existing and upcoming
features like NetQueue, FCoE, etc. The PF driver is considered special and is
used to drive the traffic for the emulated/paravirtualized VMs and is also used
to program things on behalf of the VFs through the hypervisor. If the hardware
has multiple physical functions they are treated as separate adapters (with
their own set of VFs) and we require the embedded switch to maintain that
distinction as well.

> > NPA offers several benefits:
> > 1. Performance: Critical performance sensitive paths are not trapped and the
> > guest can directly drive the hardware without incurring virtualization
> > overheads.
> 
> Can you demonstrate with data?

The setup is 2.667Ghz Nehalem server running SLES11 VM talking to a 2.33Ghz
Barcelona client box running RHEL 5.1. We had netperf streams with 16k msg size
over 64k socket size running between server VM and client and they are using
Intel Niantic 10G cards. In both cases (NPA and regular) the VM was CPU
saturated (used one full core).

TX: regular vmxnet3 = 3085.5 Mbps/GHz; NPA vmxnet3 = 4397.2 Mbps/GHz
RX: regular vmxnet3 = 1379.6 Mbps/GHz; NPA vmxnet3 = 2349.7 Mbps/GHz

We have similar results for other configuration and in general we have seen NPA
is better in terms of CPU cost and can save upto 40% of CPU cost.

> 
> > 2. Hypervisor control: All control operations from the guest such as programming
> > MAC address go through the hypervisor layer and hence can be subjected to
> > hypervisor policies. The PF driver can be further used to put policy decisions
> > like which VLAN the guest should be on.
> 
> This can happen without NPA as well.  VF simply needs to request
> the change via the PF (in fact, hw does that right now).  Also, we
> already have a host side management interface via PF (see, for example,
> RTM_SETLINK IFLA_VF_MAC interface).
> 
> What is control plane interface?  Just something like a fixed register set?

All operations other than TX/RX go through the vmxnet3 shell to the vmxnet3
device emulation. So the control plane is really the vmxnet3 device emulation
as far as the guest is concerned.

> 
> > 3. Guest Management: No hardware specific drivers need to be installed in the
> > guest virtual machine and hence no overheads are incurred for guest management.
> > All software for the driver (including the PF driver and the plugin) is
> > installed in the hypervisor.
> 
> So we have a plugin per hardware VF implementation?  And the hypervisor
> injects this code into the guest?

One guest-agnostic plugin per VF implementation. Yes, the plugin is injected
into the guest by the hypervisor.

> > The plugin image is provided by the IHVs along with the PF driver and is
> > packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> > either into a Linux VM or a Windows VM. The plugin is written against the Shell
> 
> And it will need to be GPL AFAICT from what you've said thus far.  It
> does sound worrisome, although I suppose hw firmware isn't particularly
> different.

Yes it would be GPL and we are thinking of enforcing the license in the
hypervisor as well as in the shell.

> How does the shell switch back to emulated mode for live migration?

The hypervisor sends a notification to the shell to switch out of passthrough
and it quiesces the VF and tears down the mapping between VF and the guest. The
shell free's up the buffers and other resources on behalf of the plugin and
reinitializes the s/w vmxnet3 emulation plugin.

> Please make this shell API interface and the PF/VF requirments available.

We have an internal prototype working but we are not yet ready to post the
patch to LKML. We are still in the process of making changes to our windows
driver and want to ensure that we take into account all changes that could
happen.

Thanks,

-pankaj

^ permalink raw reply

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Eric Dumazet @ 2010-05-05 18:53 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev
In-Reply-To: <g2p571fb4001005051103w67e1b9ddn3e8f7feb84d0559@mail.gmail.com>

Le mercredi 05 mai 2010 à 23:33 +0530, Bhaskar Dutta a écrit :

> Hi,
> 
> TSO, GSO and SG are already turned off.
> rx/tx checksumming is on, but that shouldn't matter, right?
> 
> # ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: off
> 
> The bad packets are very small in size, most have no data at all (<300 bytes).
> 
> After adding some logs to kernel 2.6.31-12, it seems that
> tcp_v4_md5_hash_skb (function that calculates the md5 hash) is
> (might?) getting corrupt.
> 
> The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
> and len fields get modified to different values towards the end of the
> tcp_v4_md5_hash_skb function whenever there is a checksum error.
> 
> The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
> filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).
> 
> Using a local copy of the tcp4_pseudohdr in the same function
> tcp_v4_md5_hash_skb (copied all fields from the original
> tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
> checksum with the local  tcp4_pseudohdr seems to solve the issue
> (don't see bad packets for a hours in load tests, and without the
> change I can see them instantaneously in the load tests).
> 
> I am still unable to figure out how this is happening. Please let me
> know if you have any pointers.

I am not familiar with this code, but I suspect same per_cpu data can be
used at both time by a sender (process context) and by a receiver
(softirq context).

To trigger this, you need at least two active md5 sockets.

tcp_get_md5sig_pool() should probably disable bh to make sure current
cpu wont be preempted by softirq processing


Something like :

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fb5c66b..e232123 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
 	struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
 	if (!ret)
 		put_cpu();
+	else
+		local_bh_disable();
 	return ret;
 }
 
 static inline void             tcp_put_md5sig_pool(void)
 {
 	__tcp_put_md5sig_pool();
+	local_bh_enable();
 	put_cpu();
 }



^ permalink raw reply related

* [PATCH 4/4 v2] ks8851: read/write MAC address on companion eeprom through debugfs
From: Sebastien Jan @ 2010-05-05 18:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, Abraham Arce, Ben Dooks, Tristram.Ha, Sebastien Jan
In-Reply-To: <1273085155-1260-1-git-send-email-s-jan@ti.com>

A more elegant alternative to ethtool for updating the ks8851
MAC address stored on its companion eeprom.
Using this debugfs interface does not require any knowledge on the
ks8851 companion eeprom organization to update the MAC address.

Example to write 01:23:45:67:89:AB MAC address to the companion
eeprom (assuming debugfs is mounted in /sys/kernel/debug):
$ echo "01:23:45:67:89:AB" > /sys/kernel/debug/ks8851/mac_eeprom

Signed-off-by: Sebastien Jan <s-jan@ti.com>
---
 drivers/net/ks8851.c |  216 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 216 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index 8e38c36..1b62ec5 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -22,6 +22,11 @@
 
 #include <linux/spi/spi.h>
 
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#include <linux/ctype.h>
+#endif
+
 #include "ks8851.h"
 
 /**
@@ -1550,6 +1555,214 @@ static int ks8851_read_selftest(struct ks8851_net *ks)
 	return 0;
 }
 
+#ifdef CONFIG_DEBUG_FS
+static struct dentry *ks8851_dir;
+
+static int ks8851_mac_eeprom_open(struct inode *inode, struct file *file)
+{
+	file->private_data = inode->i_private;
+	return 0;
+}
+
+static int ks8851_mac_eeprom_release(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static loff_t ks8851_mac_eeprom_seek(struct file *file, loff_t off, int whence)
+{
+	return 0;
+}
+
+/**
+ * ks8851_mac_eeprom_read - Read a MAC address in ks8851 companion EEPROM
+ *
+ * Warning: The READ feature is not supported on ks8851 revision 0.
+ */
+static ssize_t ks8851_mac_eeprom_read(struct file *filep, char __user *buff,
+						size_t count, loff_t *offp)
+{
+	ssize_t ret;
+	struct net_device *dev = filep->private_data;
+	char str[50];
+	unsigned int data;
+	unsigned char mac_addr[6];
+
+	if (*offp > 0) {
+		ret = 0;
+		goto ks8851_cnt_rd_bk;
+	}
+
+	data = ks8851_eeprom_read(dev, 1);
+	mac_addr[5] = data & 0xFF;
+	mac_addr[4] = (data >> 8) & 0xFF;
+	data = ks8851_eeprom_read(dev, 2);
+	mac_addr[3] = data & 0xFF;
+	mac_addr[2] = (data >> 8) & 0xFF;
+	data = ks8851_eeprom_read(dev, 3);
+	mac_addr[1] = data & 0xFF;
+	mac_addr[0] = (data >> 8) & 0xFF;
+
+	sprintf(str, "%02x:%02x:%02x:%02x:%02x:%02x\n", mac_addr[0],
+			mac_addr[1], mac_addr[2], mac_addr[3], mac_addr[4],
+								mac_addr[5]);
+
+	ret = strlen(str);
+
+	if (copy_to_user((void __user *)buff, str, ret)) {
+		dev_err(&dev->dev, "ks8851 - copy_to_user failed\n");
+		ret = 0;
+	} else {
+		*offp = ret;
+	}
+
+ks8851_cnt_rd_bk:
+	return ret;
+}
+
+/*
+ * Split the buffer `buf' into ':'-separated words.
+ * Return the number of words or <0 on error.
+ */
+#define isdelimiter(c)	((c) == ':')
+static int ks8851_debug_tokenize(char *buf, char *words[], int maxwords)
+{
+	int nwords = 0;
+
+	while (*buf) {
+		char *end;
+
+		/* Skip leading whitespace */
+		while (*buf && isspace(*buf))
+			buf++;
+		if (!*buf)
+			break;	/* oh, it was trailing whitespace */
+
+		/* Run `end' over a word */
+		for (end = buf ; *end && !isdelimiter(*end) ; end++)
+			;
+		/* `buf' is the start of the word, `end' is one past the end */
+
+		if (nwords == maxwords)
+			return -EINVAL;	/* ran out of words[] before bytes */
+		if (*end)
+			*end++ = '\0';	/* terminate the word */
+		words[nwords++] = buf;
+		buf = end;
+	}
+	return nwords;
+}
+
+/**
+ * ks8851_mac_eeprom_write - Write a MAC address in ks8851 companion EEPROM
+ *
+ */
+static ssize_t ks8851_mac_eeprom_write(struct file *filep,
+			const char __user *buff, size_t count, loff_t *offp)
+{
+	struct net_device *dev = filep->private_data;
+	ssize_t ret;
+#define MAXWORDS 6
+	int nwords, i;
+	char *words[MAXWORDS];
+	char tmpbuf[256];
+	unsigned long mac_addr[6];
+
+	if (count == 0)
+		return 0;
+	if (count > sizeof(tmpbuf)-1)
+		return -E2BIG;
+	if (copy_from_user(tmpbuf, buff, count))
+		return -EFAULT;
+	tmpbuf[count] = '\0';
+	dev_dbg(&dev->dev, "%s: read %d bytes from userspace\n",
+			__func__, (int)count);
+
+	nwords = ks8851_debug_tokenize(tmpbuf, words, MAXWORDS);
+	if (nwords != 6) {
+		dev_warn(&dev->dev,
+		"ks8851 MAC address write to EEPROM requires a MAC address " \
+						"like 01:23:45:67:89:AB\n");
+		return -EINVAL;
+	}
+
+	for (i = 0; i < 6; i++)
+		strict_strtoul(words[i], 16, &mac_addr[i]);
+
+	ks8851_eeprom_write(dev, EEPROM_OP_EWEN, 0, 0);
+
+	ks8851_eeprom_write(dev, EEPROM_OP_WRITE, 1,
+						mac_addr[4] << 8 | mac_addr[5]);
+	mdelay(EEPROM_WRITE_TIME);
+	ks8851_eeprom_write(dev, EEPROM_OP_WRITE, 2,
+						mac_addr[2] << 8 | mac_addr[3]);
+	mdelay(EEPROM_WRITE_TIME);
+	ks8851_eeprom_write(dev, EEPROM_OP_WRITE, 3,
+						mac_addr[0] << 8 | mac_addr[1]);
+	mdelay(EEPROM_WRITE_TIME);
+
+	ks8851_eeprom_write(dev, EEPROM_OP_EWDS, 0, 0);
+
+	dev_dbg(&dev->dev, "MAC address %02lx.%02lx.%02lx.%02lx.%02lx.%02lx "\
+			"written to EEPROM\n", mac_addr[0], mac_addr[1],
+			mac_addr[2], mac_addr[3], mac_addr[4], mac_addr[5]);
+
+	ret = count;
+	*offp += count;
+	return ret;
+}
+
+static const struct file_operations ks8851_mac_eeprom_fops = {
+	.open		= ks8851_mac_eeprom_open,
+	.read		= ks8851_mac_eeprom_read,
+	.write		= ks8851_mac_eeprom_write,
+	.llseek		= ks8851_mac_eeprom_seek,
+	.release	= ks8851_mac_eeprom_release,
+};
+
+int __init ks8851_debug_init(struct net_device *dev)
+{
+	struct ks8851_net *ks = netdev_priv(dev);
+	mode_t mac_access = S_IWUGO;
+
+	if (ks->rc_ccr & CCR_EEPROM) {
+		ks8851_dir = debugfs_create_dir("ks8851", NULL);
+		if (IS_ERR(ks8851_dir))
+			return PTR_ERR(ks8851_dir);
+
+		/* Check ks8851 version and features */
+		mutex_lock(&ks->lock);
+		if (CIDER_REV_GET(ks8851_rdreg16(ks, KS_CIDER)) > 0)
+			mac_access |= S_IRUGO;
+		mutex_unlock(&ks->lock);
+
+		debugfs_create_file("mac_eeprom", mac_access, ks8851_dir, dev,
+						&ks8851_mac_eeprom_fops);
+		debugfs_create_u32("eeprom_size", S_IRUGO | S_IWUGO,
+					ks8851_dir, (u32 *) &(ks->eeprom_size));
+	} else {
+		ks8851_dir = NULL;
+	}
+	return 0;
+}
+
+void ks8851_debug_exit(void)
+{
+	if (ks8851_dir)
+		debugfs_remove_recursive(ks8851_dir);
+}
+#else
+int __init ks8851_debug_init(struct net_device *dev)
+{
+	return 0;
+};
+
+void ks8851_debug_exit(void)
+{
+};
+#endif /* CONFIG_DEBUG_FS */
+
+
 /* driver bus management functions */
 
 static int __devinit ks8851_probe(struct spi_device *spi)
@@ -1649,6 +1862,8 @@ static int __devinit ks8851_probe(struct spi_device *spi)
 		goto err_netdev;
 	}
 
+	ks8851_debug_init(ndev);
+
 	dev_info(&spi->dev, "revision %d, MAC %pM, IRQ %d\n",
 		 CIDER_REV_GET(ks8851_rdreg16(ks, KS_CIDER)),
 		 ndev->dev_addr, ndev->irq);
@@ -1672,6 +1887,7 @@ static int __devexit ks8851_remove(struct spi_device *spi)
 	if (netif_msg_drv(priv))
 		dev_info(&spi->dev, "remove");
 
+	ks8851_debug_exit();
 	unregister_netdev(priv->netdev);
 	free_irq(spi->irq, priv);
 	free_netdev(priv->netdev);
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH 2/4 v2] ks8851: Low level functions for read/write to companion eeprom
From: Sebastien Jan @ 2010-05-05 18:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, Abraham Arce, Ben Dooks, Tristram.Ha, Sebastien Jan
In-Reply-To: <1273085155-1260-1-git-send-email-s-jan@ti.com>

Low-level functions provide 16bits words read and write capability
to ks8851 companion eeprom.

Signed-off-by: Sebastien Jan <s-jan@ti.com>
---
 drivers/net/ks8851.c |  228 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/ks8851.h |   14 +++-
 2 files changed, 241 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index a84e500..787f9df 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -1044,6 +1044,234 @@ static const struct net_device_ops ks8851_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 };
 
+/* Companion eeprom access */
+
+enum {	/* EEPROM programming states */
+	EEPROM_CONTROL,
+	EEPROM_ADDRESS,
+	EEPROM_DATA,
+	EEPROM_COMPLETE
+};
+
+/**
+ * ks8851_eeprom_read - read a 16bits word in ks8851 companion EEPROM
+ * @dev: The network device the PHY is on.
+ * @addr: EEPROM address to read
+ *
+ * eeprom_size: used to define the data coding length. Can be changed
+ * through debug-fs.
+ *
+ * Programs a read on the EEPROM using ks8851 EEPROM SW access feature.
+ * Warning: The READ feature is not supported on ks8851 revision 0.
+ *
+ * Rough programming model:
+ *  - on period start: set clock high and read value on bus
+ *  - on period / 2: set clock low and program value on bus
+ *  - start on period / 2
+ */
+unsigned int ks8851_eeprom_read(struct net_device *dev, unsigned int addr)
+{
+	struct ks8851_net *ks = netdev_priv(dev);
+	int eepcr;
+	int ctrl = EEPROM_OP_READ;
+	int state = EEPROM_CONTROL;
+	int bit_count = EEPROM_OP_LEN - 1;
+	unsigned int data = 0;
+	int dummy;
+	unsigned int addr_len;
+
+	addr_len = (ks->eeprom_size == 128) ? 6 : 8;
+
+	/* start transaction: chip select high, authorize write */
+	mutex_lock(&ks->lock);
+	eepcr = EEPCR_EESA | EEPCR_EESRWA;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	eepcr |= EEPCR_EECS;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	mutex_unlock(&ks->lock);
+
+	while (state != EEPROM_COMPLETE) {
+		/* falling clock period starts... */
+		/* set EED_IO pin for control and address */
+		eepcr &= ~EEPCR_EEDO;
+		switch (state) {
+		case EEPROM_CONTROL:
+			eepcr |= ((ctrl >> bit_count) & 1) << 2;
+			if (bit_count-- <= 0) {
+				bit_count = addr_len - 1;
+				state = EEPROM_ADDRESS;
+			}
+			break;
+		case EEPROM_ADDRESS:
+			eepcr |= ((addr >> bit_count) & 1) << 2;
+			bit_count--;
+			break;
+		case EEPROM_DATA:
+			/* Change to receive mode */
+			eepcr &= ~EEPCR_EESRWA;
+			break;
+		}
+
+		/* lower clock  */
+		eepcr &= ~EEPCR_EESCK;
+
+		mutex_lock(&ks->lock);
+		ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+		mutex_unlock(&ks->lock);
+
+		/* waitread period / 2 */
+		udelay(EEPROM_SK_PERIOD / 2);
+
+		/* rising clock period starts... */
+
+		/* raise clock */
+		mutex_lock(&ks->lock);
+		eepcr |= EEPCR_EESCK;
+		ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+		mutex_unlock(&ks->lock);
+
+		/* Manage read */
+		switch (state) {
+		case EEPROM_ADDRESS:
+			if (bit_count < 0) {
+				bit_count = EEPROM_DATA_LEN - 1;
+				state = EEPROM_DATA;
+			}
+			break;
+		case EEPROM_DATA:
+			mutex_lock(&ks->lock);
+			dummy = ks8851_rdreg16(ks, KS_EEPCR);
+			mutex_unlock(&ks->lock);
+			data |= ((dummy >> EEPCR_EESB_OFFSET) & 1) << bit_count;
+			if (bit_count-- <= 0)
+				state = EEPROM_COMPLETE;
+			break;
+		}
+
+		/* wait period / 2 */
+		udelay(EEPROM_SK_PERIOD / 2);
+	}
+
+	/* close transaction */
+	mutex_lock(&ks->lock);
+	eepcr &= ~EEPCR_EECS;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	eepcr = 0;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	mutex_unlock(&ks->lock);
+
+	return data;
+}
+
+/**
+ * ks8851_eeprom_write - write a 16bits word in ks8851 companion EEPROM
+ * @dev: The network device the PHY is on.
+ * @op: operand (can be WRITE, EWEN, EWDS)
+ * @addr: EEPROM address to write
+ * @data: data to write
+ *
+ * eeprom_size: used to define the data coding length. Can be changed
+ * through debug-fs.
+ *
+ * Programs a write on the EEPROM using ks8851 EEPROM SW access feature.
+ *
+ * Note that a write enable is required before writing data.
+ *
+ * Rough programming model:
+ *  - on period start: set clock high
+ *  - on period / 2: set clock low and program value on bus
+ *  - start on period / 2
+ */
+void ks8851_eeprom_write(struct net_device *dev, unsigned int op,
+					unsigned int addr, unsigned int data)
+{
+	struct ks8851_net *ks = netdev_priv(dev);
+	int eepcr;
+	int state = EEPROM_CONTROL;
+	int bit_count = EEPROM_OP_LEN - 1;
+	unsigned int addr_len;
+
+	addr_len = (ks->eeprom_size == 128) ? 6 : 8;
+
+	switch (op) {
+	case EEPROM_OP_EWEN:
+		addr = 0x30;
+	break;
+	case EEPROM_OP_EWDS:
+		addr = 0;
+		break;
+	}
+
+	/* start transaction: chip select high, authorize write */
+	mutex_lock(&ks->lock);
+	eepcr = EEPCR_EESA | EEPCR_EESRWA;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	eepcr |= EEPCR_EECS;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	mutex_unlock(&ks->lock);
+
+	while (state != EEPROM_COMPLETE) {
+		/* falling clock period starts... */
+		/* set EED_IO pin for control and address */
+		eepcr &= ~EEPCR_EEDO;
+		switch (state) {
+		case EEPROM_CONTROL:
+			eepcr |= ((op >> bit_count) & 1) << 2;
+			if (bit_count-- <= 0) {
+				bit_count = addr_len - 1;
+				state = EEPROM_ADDRESS;
+			}
+			break;
+		case EEPROM_ADDRESS:
+			eepcr |= ((addr >> bit_count) & 1) << 2;
+			if (bit_count-- <= 0) {
+				if (op == EEPROM_OP_WRITE) {
+					bit_count = EEPROM_DATA_LEN - 1;
+					state = EEPROM_DATA;
+				} else {
+					state = EEPROM_COMPLETE;
+				}
+			}
+			break;
+		case EEPROM_DATA:
+			eepcr |= ((data >> bit_count) & 1) << 2;
+			if (bit_count-- <= 0)
+				state = EEPROM_COMPLETE;
+			break;
+		}
+
+		/* lower clock  */
+		eepcr &= ~EEPCR_EESCK;
+
+		mutex_lock(&ks->lock);
+		ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+		mutex_unlock(&ks->lock);
+
+		/* wait period / 2 */
+		udelay(EEPROM_SK_PERIOD / 2);
+
+		/* rising clock period starts... */
+
+		/* raise clock */
+		eepcr |= EEPCR_EESCK;
+		mutex_lock(&ks->lock);
+		ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+		mutex_unlock(&ks->lock);
+
+		/* wait period / 2 */
+		udelay(EEPROM_SK_PERIOD / 2);
+	}
+
+	/* close transaction */
+	mutex_lock(&ks->lock);
+	eepcr &= ~EEPCR_EECS;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	eepcr = 0;
+	ks8851_wrreg16(ks, KS_EEPCR, eepcr);
+	mutex_unlock(&ks->lock);
+
+}
+
 /* ethtool support */
 
 static void ks8851_get_drvinfo(struct net_device *dev,
diff --git a/drivers/net/ks8851.h b/drivers/net/ks8851.h
index f52c312..537fb06 100644
--- a/drivers/net/ks8851.h
+++ b/drivers/net/ks8851.h
@@ -25,12 +25,24 @@
 #define OBCR_ODS_16mA				(1 << 6)
 
 #define KS_EEPCR				0x22
+#define EEPCR_EESRWA				(1 << 5)
 #define EEPCR_EESA				(1 << 4)
-#define EEPCR_EESB				(1 << 3)
+#define EEPCR_EESB_OFFSET			3
+#define EEPCR_EESB				(1 << EEPCR_EESB_OFFSET)
 #define EEPCR_EEDO				(1 << 2)
 #define EEPCR_EESCK				(1 << 1)
 #define EEPCR_EECS				(1 << 0)
 
+#define EEPROM_OP_LEN				3	/* bits:*/
+#define EEPROM_OP_READ				0x06
+#define EEPROM_OP_EWEN				0x04
+#define EEPROM_OP_WRITE				0x05
+#define EEPROM_OP_EWDS				0x14
+
+#define EEPROM_DATA_LEN				16	/* 16 bits EEPROM */
+#define EEPROM_WRITE_TIME			4	/* wrt ack time in ms */
+#define EEPROM_SK_PERIOD			400	/* in us */
+
 #define KS_MBIR					0x24
 #define MBIR_TXMBF				(1 << 12)
 #define MBIR_TXMBFA				(1 << 11)
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH 1/4 v2] ks8851: Add caching of CCR register
From: Sebastien Jan @ 2010-05-05 18:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, Abraham Arce, Ben Dooks, Tristram.Ha, Sebastien Jan
In-Reply-To: <1273085155-1260-1-git-send-email-s-jan@ti.com>

CCR register contains information on companion eeprom availability.

Signed-off-by: Sebastien Jan <s-jan@ti.com>
---
 drivers/net/ks8851.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index 9e9f9b3..a84e500 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -76,7 +76,9 @@ union ks8851_tx_hdr {
  * @msg_enable: The message flags controlling driver output (see ethtool).
  * @fid: Incrementing frame id tag.
  * @rc_ier: Cached copy of KS_IER.
+ * @rc_ccr: Cached copy of KS_CCR.
  * @rc_rxqcr: Cached copy of KS_RXQCR.
+ * @eeprom_size: Companion eeprom size in Bytes, 0 if no eeprom
  *
  * The @lock ensures that the chip is protected when certain operations are
  * in progress. When the read or write packet transfer is in progress, most
@@ -107,6 +109,8 @@ struct ks8851_net {
 
 	u16			rc_ier;
 	u16			rc_rxqcr;
+	u16			rc_ccr;
+	u16			eeprom_size;
 
 	struct mii_if_info	mii;
 	struct ks8851_rxctrl	rxctrl;
@@ -1279,6 +1283,14 @@ static int __devinit ks8851_probe(struct spi_device *spi)
 		goto err_id;
 	}
 
+	/* cache the contents of the CCR register for EEPROM, etc. */
+	ks->rc_ccr = ks8851_rdreg16(ks, KS_CCR);
+
+	if (ks->rc_ccr & CCR_EEPROM)
+		ks->eeprom_size = 128;
+	else
+		ks->eeprom_size = 0;
+
 	ks8851_read_selftest(ks);
 	ks8851_init_mac(ks);
 
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH 0/4 v2] ks8851: support for read/write MAC address from eeprom
From: Sebastien Jan @ 2010-05-05 18:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, Abraham Arce, Ben Dooks, Tristram.Ha, Sebastien Jan

Provides the ability to update the ks8851 companion eeprom content (including the
MAC address) through ethtool or debugfs (for MAC address only).

Differences with v1:
 - added the support of eeprom access through ethtool
 - different patches split

Patch set tested on OMAP4-based test board. Build tested on net-2.6 head.

Applies on current net-2.6 head.

Sebastien Jan (4):
  ks8851: Add caching of CCR register
  ks8851: Low level functions for read/write to companion eeprom
  ks8851: companion eeprom access through ethtool
  ks8851: read/write MAC address on companion eeprom through debugfs

 drivers/net/ks8851.c |  570 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/ks8851.h |   14 ++-
 2 files changed, 583 insertions(+), 1 deletions(-)


^ permalink raw reply

* [PATCH 3/4 v2] ks8851: companion eeprom access through ethtool
From: Sebastien Jan @ 2010-05-05 18:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, Abraham Arce, Ben Dooks, Tristram.Ha, Sebastien Jan
In-Reply-To: <1273085155-1260-1-git-send-email-s-jan@ti.com>

Accessing ks8851 companion eeprom permits modifying the ks8851 stored
MAC address.

Example how to change the MAC address using ethtool, to set the
01:23:45:67:89:AB MAC address:
$ echo "0:AB8976452301" | xxd -r > mac.bin
$ sudo ethtool -E eth0 magic 0x8870 offset 2 < mac.bin

Signed-off-by: Sebastien Jan <s-jan@ti.com>
---
 drivers/net/ks8851.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
index 787f9df..8e38c36 100644
--- a/drivers/net/ks8851.c
+++ b/drivers/net/ks8851.c
@@ -1318,6 +1318,117 @@ static int ks8851_nway_reset(struct net_device *dev)
 	return mii_nway_restart(&ks->mii);
 }
 
+static int ks8851_get_eeprom_len(struct net_device *dev)
+{
+	struct ks8851_net *ks = netdev_priv(dev);
+	return ks->eeprom_size;
+}
+
+static int ks8851_get_eeprom(struct net_device *dev,
+			    struct ethtool_eeprom *eeprom, u8 *bytes)
+{
+	struct ks8851_net *ks = netdev_priv(dev);
+	u16 *eeprom_buff;
+	int first_word;
+	int last_word;
+	int ret_val = 0;
+	u16 i;
+
+	if (eeprom->len == 0)
+		return -EINVAL;
+
+	if (eeprom->len > ks->eeprom_size)
+		return -EINVAL;
+
+	eeprom->magic = ks8851_rdreg16(ks, KS_CIDER);
+
+	first_word = eeprom->offset >> 1;
+	last_word = (eeprom->offset + eeprom->len - 1) >> 1;
+
+	eeprom_buff = kmalloc(sizeof(u16) *
+			(last_word - first_word + 1), GFP_KERNEL);
+	if (!eeprom_buff)
+		return -ENOMEM;
+
+	for (i = 0; i < last_word - first_word + 1; i++)
+		eeprom_buff[i] = ks8851_eeprom_read(dev, first_word + 1);
+
+	/* Device's eeprom is little-endian, word addressable */
+	for (i = 0; i < last_word - first_word + 1; i++)
+		le16_to_cpus(&eeprom_buff[i]);
+
+	memcpy(bytes, (u8 *)eeprom_buff + (eeprom->offset & 1), eeprom->len);
+	kfree(eeprom_buff);
+
+	return ret_val;
+}
+
+static int ks8851_set_eeprom(struct net_device *dev,
+			    struct ethtool_eeprom *eeprom, u8 *bytes)
+{
+	struct ks8851_net *ks = netdev_priv(dev);
+	u16 *eeprom_buff;
+	void *ptr;
+	int max_len;
+	int first_word;
+	int last_word;
+	int ret_val = 0;
+	u16 i;
+
+	if (eeprom->len == 0)
+		return -EOPNOTSUPP;
+
+	if (eeprom->len > ks->eeprom_size)
+		return -EINVAL;
+
+	if (eeprom->magic != ks8851_rdreg16(ks, KS_CIDER))
+		return -EFAULT;
+
+	first_word = eeprom->offset >> 1;
+	last_word = (eeprom->offset + eeprom->len - 1) >> 1;
+	max_len = (last_word - first_word + 1) * 2;
+	eeprom_buff = kmalloc(max_len, GFP_KERNEL);
+	if (!eeprom_buff)
+		return -ENOMEM;
+
+	ptr = (void *)eeprom_buff;
+
+	if (eeprom->offset & 1) {
+		/* need read/modify/write of first changed EEPROM word */
+		/* only the second byte of the word is being modified */
+		eeprom_buff[0] = ks8851_eeprom_read(dev, first_word);
+		ptr++;
+	}
+	if ((eeprom->offset + eeprom->len) & 1)
+		/* need read/modify/write of last changed EEPROM word */
+		/* only the first byte of the word is being modified */
+		eeprom_buff[last_word - first_word] =
+					ks8851_eeprom_read(dev, last_word);
+
+
+	/* Device's eeprom is little-endian, word addressable */
+	le16_to_cpus(&eeprom_buff[0]);
+	le16_to_cpus(&eeprom_buff[last_word - first_word]);
+
+	memcpy(ptr, bytes, eeprom->len);
+
+	for (i = 0; i < last_word - first_word + 1; i++)
+		eeprom_buff[i] = cpu_to_le16(eeprom_buff[i]);
+
+	ks8851_eeprom_write(dev, EEPROM_OP_EWEN, 0, 0);
+
+	for (i = 0; i < last_word - first_word + 1; i++) {
+		ks8851_eeprom_write(dev, EEPROM_OP_WRITE, first_word + i,
+							eeprom_buff[i]);
+		mdelay(EEPROM_WRITE_TIME);
+	}
+
+	ks8851_eeprom_write(dev, EEPROM_OP_EWDS, 0, 0);
+
+	kfree(eeprom_buff);
+	return ret_val;
+}
+
 static const struct ethtool_ops ks8851_ethtool_ops = {
 	.get_drvinfo	= ks8851_get_drvinfo,
 	.get_msglevel	= ks8851_get_msglevel,
@@ -1326,6 +1437,9 @@ static const struct ethtool_ops ks8851_ethtool_ops = {
 	.set_settings	= ks8851_set_settings,
 	.get_link	= ks8851_get_link,
 	.nway_reset	= ks8851_nway_reset,
+	.get_eeprom_len	= ks8851_get_eeprom_len,
+	.get_eeprom	= ks8851_get_eeprom,
+	.set_eeprom	= ks8851_set_eeprom,
 };
 
 /* MII interface controls */
-- 
1.6.3.3


^ permalink raw reply related

* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Bhaskar Dutta @ 2010-05-05 18:03 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Hutchings, netdev
In-Reply-To: <20100504101301.5f4dd9c2@nehalam>

On Tue, May 4, 2010 at 10:43 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
>
> On Tue, 4 May 2010 22:38:49 +0530
> Bhaskar Dutta <bhaskie@gmail.com> wrote:
>
> > On Tue, May 4, 2010 at 9:42 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > > On Tue, 4 May 2010 19:58:32 +0530
> > > Bhaskar Dutta <bhaskie@gmail.com> wrote:
> > >
> > >> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> > >> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> > >> >> Hi,
> > >> >>
> > >> >> I am observing intermittent TCP-MD5 checksum failures
> > >> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> > >> >>
> > >> >> The problem is only seen in multi-core 64 bit machines.
> > >> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
> > >> >> that the percpu allocator has been re-implemented in 2.6.33) that
> > >> >> might cause a corruption in 64 bit SMP machines?
> > >> >>
> > >> >> Any pointers would be appreciated.
> > >> >
> > >> > There was another recent report of incorrect MD5 signatures in
> > >> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> > >> > response.
> > >> >
> > >> > Ben.
> > >> >
> > >>
> > >> I found another thread posted back in Jan 2007 with a similar bug
> > >> (x86_64 on 2.6.20) but no replies to that as well.
> > >> http://lkml.org/lkml/2007/1/20/56
> > >
> > > 2.6.20 had lots of other MD5 bugs. Your problem might be related to
> > > GRO.  MD5 may not handle multi-fragment packets.
> > > --
> >
> > I am getting the issue on 2.6.31 and 2.6.28 (gro infrastructure was
> > added in 2.6.29).
> > Also, both segmentation offloading as well as receive offloading
> > (gso/gro) are turned off.
> >
> > Moreover outgoing TCP packets are the ones with the corrupt checksums.
> > Both tcpdump on my local machine and the BGP router on the other side
> > complain of the bad checksums with the same packet.
> >
> > I am trying to figure out if there is something in the per-cpu
> > implementation that might be causing a corruption (SMP and x86_64) but
> > I am not really getting anywhere.
>
> I seriously doubt the per-cpu stuff is the issue.
>
> > I am trying to reproduce the bad checksums with the latest kernel
> > sources since it has a new implementation of the percpu allocator.
>
> First turn off all offload settings on the device (TSO,GSO,SG,CSUM)
> then check that size of the bad packets. Are they fragmented or
> just simple linear packets?
>
> --

Hi,

TSO, GSO and SG are already turned off.
rx/tx checksumming is on, but that shouldn't matter, right?

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

The bad packets are very small in size, most have no data at all (<300 bytes).

After adding some logs to kernel 2.6.31-12, it seems that
tcp_v4_md5_hash_skb (function that calculates the md5 hash) is
(might?) getting corrupt.

The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
and len fields get modified to different values towards the end of the
tcp_v4_md5_hash_skb function whenever there is a checksum error.

The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).

Using a local copy of the tcp4_pseudohdr in the same function
tcp_v4_md5_hash_skb (copied all fields from the original
tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
checksum with the local  tcp4_pseudohdr seems to solve the issue
(don't see bad packets for a hours in load tests, and without the
change I can see them instantaneously in the load tests).

I am still unable to figure out how this is happening. Please let me
know if you have any pointers.

Thanks a lot!
Bhaskar

^ permalink raw reply

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Avi Kivity @ 2010-05-05 17:59 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara
In-Reply-To: <20100504230225.GP8323@vmware.com>

On 05/05/2010 02:02 AM, Pankaj Thakkar wrote:
> 2. Hypervisor control: All control operations from the guest such as programming
> MAC address go through the hypervisor layer and hence can be subjected to
> hypervisor policies. The PF driver can be further used to put policy decisions
> like which VLAN the guest should be on.
>    

Is this enforced?  Since you pass the hardware through, you can't rely 
on the guest actually doing this, yes?

> The plugin image is provided by the IHVs along with the PF driver and is
> packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> either into a Linux VM or a Windows VM. The plugin is written against the Shell
> API interface which the shell is responsible for implementing. The API
> interface allows the plugin to do TX and RX only by programming the hardware
> rings (along with things like buffer allocation and basic initialization). The
> virtual machine comes up in paravirtualized/emulated mode when it is booted.
> The hypervisor allocates the VF and other resources and notifies the shell of
> the availability of the VF. The hypervisor injects the plugin into memory
> location specified by the shell. The shell initializes the plugin by calling
> into a known entry point and the plugin initializes the data path. The control
> path is already initialized by the PF driver when the VF is allocated. At this
> point the shell switches to using the loaded plugin to do all further TX and RX
> operations. The guest networking stack does not participate in these operations
> and continues to function normally. All the control operations continue being
> trapped by the hypervisor and are directed to the PF driver as needed. For
> example, if the MAC address changes the hypervisor updates its internal state
> and changes the state of the embedded switch as well through the PF control
> API.
>    

This is essentially a miniature network stack with a its own mini 
bonding layer, mini hotplug, and mini API, except s/API/ABI/.  Is this a 
correct view?

If so, the Linuxy approach would be to use the ordinary drivers and the 
Linux networking API, and hide the bond setup using namespaces.  The 
bond driver, or perhaps a new, similar, driver can be enhanced to 
propagate ethtool commands to its (hidden) components, and to have a 
control channel with the hypervisor.

This would make the approach hypervisor agnostic, you're just pairing 
two devices and presenting them to the rest of the stack as a single device.

> We have reworked our existing Linux vmxnet3 driver to accomodate NPA by
> splitting the driver into two parts: Shell and Plugin. The new split driver is
>    

So the Shell would be the reworked or new bond driver, and Plugins would 
be ordinary Linux network drivers.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply

* Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Stephen Hemminger @ 2010-05-05 17:52 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Christoph Hellwig, pv-drivers@vmware.com, Pankaj Thakkar,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <20100505173951.GA8388@infradead.org>

On Wed, 5 May 2010 13:39:51 -0400
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, May 05, 2010 at 10:35:28AM -0700, Dmitry Torokhov wrote:
> > Yes, with the exception that the only body of code that will be
> > accepted by the shell should be GPL-licensed and thus open and available
> > for examining. This is not different from having a standard kernel
> > module that is loaded normally and plugs into a certain subsystem.
> > The difference is that the binary resides not on guest filesystem
> > but elsewhere.
> 
> Forget about the licensing.  Loading binary blobs written to a shim
> layer is a complete pain in the ass and totally unsupportable, and
> also uninteresting because of the overhead.
> 
> If you have any interesting in developing this further, do:
> 
>  (1) move the limited VF drivers directly into the kernel tree,
>      talk to them through a normal ops vector
>  (2) get rid of the whole shim crap and instead integrate the limited
>      VF driver with the full VF driver we already have, instead of
>      duplicating the code
>  (3) don't make the PV to VF integration VMware-specific but also
>      provide an open reference implementation like virtio.  We're not
>      going to add massive amount of infrastructure that is not actually
>      useable in a free software stack.

Let me put it bluntly. Any design that allows external code to run
in the kernel is not going to be accepted.  Out of tree kernel modules are enough
of a pain already, why do you expect the developers to add another
interface.

^ permalink raw reply

* RE: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Pankaj Thakkar @ 2010-05-05 17:47 UTC (permalink / raw)
  To: Christoph Hellwig, Dmitry Torokhov
  Cc: pv-drivers@vmware.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <20100505173951.GA8388@infradead.org>



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Wednesday, May 05, 2010 10:40 AM
> To: Dmitry Torokhov
> Cc: Christoph Hellwig; pv-drivers@vmware.com; Pankaj Thakkar;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> virtualization@lists.linux-foundation.org
> Subject: Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for
> vmxnet3
> 
> On Wed, May 05, 2010 at 10:35:28AM -0700, Dmitry Torokhov wrote:
> > Yes, with the exception that the only body of code that will be
> > accepted by the shell should be GPL-licensed and thus open and
> available
> > for examining. This is not different from having a standard kernel
> > module that is loaded normally and plugs into a certain subsystem.
> > The difference is that the binary resides not on guest filesystem
> > but elsewhere.
> 
> Forget about the licensing.  Loading binary blobs written to a shim
> layer is a complete pain in the ass and totally unsupportable, and
> also uninteresting because of the overhead.

[PT] Why do you think it is unsupportable? How different is it from any module written against a well maintained interface? What overhead are you talking about?

> 
> If you have any interesting in developing this further, do:
> 
>  (1) move the limited VF drivers directly into the kernel tree,
>      talk to them through a normal ops vector
[PT] This assumes that all the VF drivers would always be available. Also we have to support windows and our current design supports it nicely in an OS agnostic manner.

>  (2) get rid of the whole shim crap and instead integrate the limited
>      VF driver with the full VF driver we already have, instead of
>      duplicating the code
[PT] Having a full VF driver adds a lot of dependency on the guest VM and this is what NPA tries to avoid.

>  (3) don't make the PV to VF integration VMware-specific but also
>      provide an open reference implementation like virtio.  We're not
>      going to add massive amount of infrastructure that is not actually
>      useable in a free software stack.
[PT] Today this is tied to vmxnet3 device and is intended to work on ESX hypervisor only (vmxnet3 works on VMware hypervisor only). All the loading support is inside the ESX hypervisor. I am going to post the interface between the shell and the plugin soon and you can see that there is not a whole lot of dependency or infrastructure requirements from the Linux kernel. Please keep in mind that we don't use Linux as a hypervisor but as a guest VM.


^ permalink raw reply

* Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Christoph Hellwig @ 2010-05-05 17:39 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Christoph Hellwig, pv-drivers@vmware.com, Pankaj Thakkar,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <201005051035.29831.dtor@vmware.com>

On Wed, May 05, 2010 at 10:35:28AM -0700, Dmitry Torokhov wrote:
> Yes, with the exception that the only body of code that will be
> accepted by the shell should be GPL-licensed and thus open and available
> for examining. This is not different from having a standard kernel
> module that is loaded normally and plugs into a certain subsystem.
> The difference is that the binary resides not on guest filesystem
> but elsewhere.

Forget about the licensing.  Loading binary blobs written to a shim
layer is a complete pain in the ass and totally unsupportable, and
also uninteresting because of the overhead.

If you have any interesting in developing this further, do:

 (1) move the limited VF drivers directly into the kernel tree,
     talk to them through a normal ops vector
 (2) get rid of the whole shim crap and instead integrate the limited
     VF driver with the full VF driver we already have, instead of
     duplicating the code
 (3) don't make the PV to VF integration VMware-specific but also
     provide an open reference implementation like virtio.  We're not
     going to add massive amount of infrastructure that is not actually
     useable in a free software stack.

^ permalink raw reply

* Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Dmitry Torokhov @ 2010-05-05 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: pv-drivers@vmware.com, Pankaj Thakkar, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <20100505173120.GA1752@infradead.org>

On Wednesday 05 May 2010 10:31:20 am Christoph Hellwig wrote:
> On Wed, May 05, 2010 at 10:29:40AM -0700, Dmitry Torokhov wrote:
> > > We're not going to add any kind of loader for binry blobs into kernel
> > > space, sorry.  Don't even bother wasting your time on this.
> > 
> > It would not be a binary blob but software properly released under GPL.
> > The current plan is for the shell to enforce GPL requirement on the
> > plugin code, similar to what module loaded does for regular kernel
> > modules.
> 
> The mechanism described in the document is loading a binary blob
> coded to an abstract API.

Yes, with the exception that the only body of code that will be
accepted by the shell should be GPL-licensed and thus open and available
for examining. This is not different from having a standard kernel
module that is loaded normally and plugs into a certain subsystem.
The difference is that the binary resides not on guest filesystem
but elsewhere.

> 
> That's something entirely different from having normal modules for
> the Virtual Functions, which we already have for various pieces of
> hardware anyway.

-- 
Dmitry

^ permalink raw reply

* Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Christoph Hellwig @ 2010-05-05 17:31 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: pv-drivers, Christoph Hellwig, Pankaj Thakkar,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <201005051029.42052.dtor@vmware.com>

On Wed, May 05, 2010 at 10:29:40AM -0700, Dmitry Torokhov wrote:
> > We're not going to add any kind of loader for binry blobs into kernel
> > space, sorry.  Don't even bother wasting your time on this.
> > 
> 
> It would not be a binary blob but software properly released under GPL.
> The current plan is for the shell to enforce GPL requirement on the
> plugin code, similar to what module loaded does for regular kernel
> modules.

The mechanism described in the document is loading a binary blob
coded to an abstract API.

That's something entirely different from having normal modules for
the Virtual Functions, which we already have for various pieces of
hardware anyway.

^ permalink raw reply

* Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Dmitry Torokhov @ 2010-05-05 17:29 UTC (permalink / raw)
  To: pv-drivers
  Cc: Christoph Hellwig, Pankaj Thakkar, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <20100505172316.GA25338@infradead.org>

On Wednesday 05 May 2010 10:23:16 am Christoph Hellwig wrote:
> On Tue, May 04, 2010 at 04:02:25PM -0700, Pankaj Thakkar wrote:
> > The plugin image is provided by the IHVs along with the PF driver and is
> > packaged in the hypervisor. The plugin image is OS agnostic and can be
> > loaded either into a Linux VM or a Windows VM. The plugin is written
> > against the Shell API interface which the shell is responsible for
> > implementing. The API
> 
> We're not going to add any kind of loader for binry blobs into kernel
> space, sorry.  Don't even bother wasting your time on this.
> 

It would not be a binary blob but software properly released under GPL.
The current plan is for the shell to enforce GPL requirement on the
plugin code, similar to what module loaded does for regular kernel
modules.

-- 
Dmitry

^ permalink raw reply

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Christoph Hellwig @ 2010-05-05 17:23 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel, netdev, virtualization, pv-drivers, sbhatewara
In-Reply-To: <20100504230225.GP8323@vmware.com>

On Tue, May 04, 2010 at 04:02:25PM -0700, Pankaj Thakkar wrote:
> The plugin image is provided by the IHVs along with the PF driver and is
> packaged in the hypervisor. The plugin image is OS agnostic and can be loaded
> either into a Linux VM or a Windows VM. The plugin is written against the Shell
> API interface which the shell is responsible for implementing. The API

We're not going to add any kind of loader for binry blobs into kernel
space, sorry.  Don't even bother wasting your time on this.


^ permalink raw reply

* Re: Fw: [Bug 15907] New: IP_ADD_SOURCE_MEMBERSHIP after IP_ADD_MEMBERSHIP join on same multicast-group dont return EINVAL
From: David Stevens @ 2010-05-05 17:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Herbert Xu, netdev, netdev-owner
In-Reply-To: <20100505082138.6cb93aa1@nehalam>

This particular failure  may be  just a matter of translating the
EADDRINUSE check in IP_ADD_SOURCE_MEMBERSHIP to
return EINVAL rather than ignoring it. The more general change of
segregating SSM and ASM should go further than that (e.g., a boolean
to tell you which way the membership was added and checking in
all the operations).

The code predates this informational RFC and allows you to
change back and forth between ASM and SSM where it makes
sense (a more liberal interpretation).

Of course, if existing applications are mixing them already, they
would break, and I'm not sure I agree it's a good thing to have
to destroy an existing membership and recreate it if you want to
switch from ASM to SSM.

I can look at this, but not for a few days at least; can review if
someone else does before I do.

                                        +-DLS


netdev-owner@vger.kernel.org wrote on 05/05/2010 08:21:38 AM:

> 
> 
> Begin forwarded message:
> 
> Date: Wed, 5 May 2010 09:48:40 GMT
> From: bugzilla-daemon@bugzilla.kernel.org
> To: shemminger@linux-foundation.org
> Subject: [Bug 15907] New: IP_ADD_SOURCE_MEMBERSHIP after 
IP_ADD_MEMBERSHIP 
> join on same multicast-group dont return EINVAL
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=15907
> 
>            Summary: IP_ADD_SOURCE_MEMBERSHIP after IP_ADD_MEMBERSHIP 
join
>                     on same multicast-group dont return EINVAL
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.34-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: mail@fholler.de
>         Regression: No
> 
> 
> Created an attachment (id=26225)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=26225)
> asm+ssm join test program
> 
> When an SSM IP_ADD_SOURCE_MEMBERSHIP is done after an ASM 
IP_ADD_MEMBERSHIP
> join on the same group(& same interface) the setsockopt operation should 
return
> EINVAL.
> 
> The linux implementation returns successfull
> 
> 
> 
> https://www3.tools.ietf.org/html/rfc3678#section-4.1.3
> 
> I attached an simple C test program.
> 
> -- 
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are the assignee for the bug.
> 
> 
> -- 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* RE: [RFC PATCH net-next] drivers/net/ks*: Use netdev_<level>, netif_<level> and pr_<level>
From: Ha, Tristram @ 2010-05-05 17:06 UTC (permalink / raw)
  To: David Miller; +Cc: ben, Choi, David, richard.rojfors.ext, netdev, joe
In-Reply-To: <20100316.213133.102673436.davem@davemloft.net>

David Miller wrote:
> From: Joe Perches <joe@perches.com>
> Date: Sat, 27 Feb 2010 16:43:51 -0800
> 
>> I'm not sure this is correct.
>> 
>> It changes logging macros from:
>> 	dev_<level>(&ks->spidev->dev,
>> to
>> 	netdev_<level>(ks->netdev,
> 
> I'm just applying this, more than a week to get a review is more than
enough.
> 
> Thanks Joe.

I have no objection about the patch as it just uses the latest kernel
debug call.  Is this new code available in current 2.6.34 release?  Or
will it be in the next version?

I generally download the latest 2.6.34-rc release to generate patches
for submission.

^ permalink raw reply

* RTL-8110SC lockup with r8169
From: Pádraig Brady @ 2010-05-05 16:05 UTC (permalink / raw)
  To: netdev; +Cc: Francois Romieu, Glen Gray

[-- Attachment #1: Type: text/plain, Size: 1812 bytes --]

Hi,

We're having an issue with the r8169 driver, where very often
(1 in 10 boots) it will lockup and our netboot system will hang.
On this hardware previously, we used FC5 with the r1000 driver without issue.

The interesting (problematic) thing is that the system needs to
be _power cycled_ to get the link detected again.
A reboot does not suffice. The symptoms sound like:

http://kerneltrap.org/mailarchive/linux-netdev/2009/3/6/5110184
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.32.y.git;a=commitdiff;h=ea8dbdd17099a9a5864ebd4c87e01e657b19c7ab

However the above code wasn't in the 2.6.32.10-90.fc12 driver we used.
Also I've back-ported the latest r8169 driver from git to our kernel
and it still has the same issue. The inconsequential diff
between our driver and the latest in git are attached just in case.

In the following dmesg you can see a successful boot.
Why does the link go down twice even in that case?
On lockup one does not see the "link up" message and
a power cycle is required, as the link is not detected
by the netboot ROM on a reboot.

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:01:04.0: PCI INT A -> Link[LNKA] -> GSI 11 (level, low) -> IRQ 11
r8169 0000:01:04.0: no PCI Express capability
eth0: RTL8169sc/8110sc at 0xde87c000, 00:60:ef:08:49:f5, XID 98000000 IRQ 11
r8169: eth0: link down
r8169: eth0: link down
r8169: eth0: link up

I'm in a good position to test any patches/ideas you may have.

cheers,
Pádraig.

Ancillary version details are...

# dmesg | grep 8169
# lspci -n | grep -v 8086:
01:04.0 0200: 10ec:8167 (rev 10)

# ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version:
bus-info: 0000:01:04.0

# uname -a
Linux Unit-00:60:ef:08:49:f5 2.6.32.10-90.fc12.i686 #1 SMP Tue Mar 23 10:21:29 UTC 2010 i686 i686 i386 GNU/Linux

[-- Attachment #2: r8169-unimportant.diff --]
[-- Type: text/x-patch, Size: 11726 bytes --]

--- /home/padraig/kernel/r8169.c.latest	2010-05-05 11:42:54.345452634 +0100
+++ /home/padraig/kernel/r8169.c.new	2010-05-05 16:36:36.805627050 +0100
@@ -168,7 +168,7 @@
 static void rtl_hw_start_8168(struct net_device *);
 static void rtl_hw_start_8101(struct net_device *);
 
-static DEFINE_PCI_DEVICE_TABLE(rtl8169_pci_tbl) = {
+static struct pci_device_id rtl8169_pci_tbl[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8129), 0, 0, RTL_CFG_0 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8136), 0, 0, RTL_CFG_2 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8167), 0, 0, RTL_CFG_0 },
@@ -749,10 +749,12 @@
 	spin_lock_irqsave(&tp->lock, flags);
 	if (tp->link_ok(ioaddr)) {
 		netif_carrier_on(dev);
-		netif_info(tp, ifup, dev, "link up\n");
+		if (netif_msg_ifup(tp))
+			printk(KERN_INFO PFX "%s: link up\n", dev->name);
 	} else {
+		if (netif_msg_ifdown(tp))
+			printk(KERN_INFO PFX "%s: link down\n", dev->name);
 		netif_carrier_off(dev);
-		netif_info(tp, ifdown, dev, "link down\n");
 	}
 	spin_unlock_irqrestore(&tp->lock, flags);
 }
@@ -865,8 +867,11 @@
 	} else if (autoneg == AUTONEG_ENABLE)
 		RTL_W32(TBICSR, reg | TBINwEnable | TBINwRestart);
 	else {
-		netif_warn(tp, link, dev,
-			   "incorrect speed setting refused in TBI mode\n");
+		if (netif_msg_link(tp)) {
+			printk(KERN_WARNING "%s: "
+			       "incorrect speed setting refused in TBI mode\n",
+			       dev->name);
+		}
 		ret = -EOPNOTSUPP;
 	}
 
@@ -901,9 +906,9 @@
 		    (tp->mac_version != RTL_GIGA_MAC_VER_15) &&
 		    (tp->mac_version != RTL_GIGA_MAC_VER_16)) {
 			giga_ctrl |= ADVERTISE_1000FULL | ADVERTISE_1000HALF;
-		} else {
-			netif_info(tp, link, dev,
-				   "PHY does not support 1000Mbps\n");
+		} else if (netif_msg_link(tp)) {
+			printk(KERN_INFO "%s: PHY does not support 1000Mbps.\n",
+			       dev->name);
 		}
 
 		bmcr = BMCR_ANENABLE | BMCR_ANRESTART;
@@ -2705,7 +2710,8 @@
 	if (tp->link_ok(ioaddr))
 		goto out_unlock;
 
-	netif_warn(tp, link, dev, "PHY reset until link up\n");
+	if (netif_msg_link(tp))
+		printk(KERN_WARNING "%s: PHY reset until link up\n", dev->name);
 
 	tp->phy_reset_enable(ioaddr);
 
@@ -2776,7 +2782,8 @@
 			return;
 		msleep(1);
 	}
-	netif_err(tp, link, dev, "PHY reset failed\n");
+	if (netif_msg_link(tp))
+		printk(KERN_ERR "%s: PHY reset failed.\n", dev->name);
 }
 
 static void rtl8169_init_phy(struct net_device *dev, struct rtl8169_private *tp)
@@ -2810,8 +2817,8 @@
 	 */
 	rtl8169_set_speed(dev, AUTONEG_ENABLE, SPEED_1000, DUPLEX_FULL);
 
-	if (RTL_R8(PHYstatus) & TBI_Enable)
-		netif_info(tp, link, dev, "TBI auto-negotiating\n");
+	if ((RTL_R8(PHYstatus) & TBI_Enable) && netif_msg_link(tp))
+		printk(KERN_INFO PFX "%s: TBI auto-negotiating\n", dev->name);
 }
 
 static void rtl_rar_set(struct rtl8169_private *tp, u8 *addr)
@@ -3016,33 +3023,41 @@
 	/* enable device (incl. PCI PM wakeup and hotplug setup) */
 	rc = pci_enable_device(pdev);
 	if (rc < 0) {
-		netif_err(tp, probe, dev, "enable failure\n");
+		if (netif_msg_probe(tp))
+			dev_err(&pdev->dev, "enable failure\n");
 		goto err_out_free_dev_1;
 	}
 
 	if (pci_set_mwi(pdev) < 0)
-		netif_info(tp, probe, dev, "Mem-Wr-Inval unavailable\n");
+		if (netif_msg_probe(tp)) {
+			dev_err(&pdev->dev, "Mem-Wr-Inval unavailable\n");
+		}
 
 	/* make sure PCI base addr 1 is MMIO */
 	if (!(pci_resource_flags(pdev, region) & IORESOURCE_MEM)) {
-		netif_err(tp, probe, dev,
-			  "region #%d not an MMIO resource, aborting\n",
-			  region);
+		if (netif_msg_probe(tp)) {
+			dev_err(&pdev->dev,
+				"region #%d not an MMIO resource, aborting\n",
+				region);
+		}
 		rc = -ENODEV;
 		goto err_out_mwi_2;
 	}
 
 	/* check for weird/broken PCI region reporting */
 	if (pci_resource_len(pdev, region) < R8169_REGS_SIZE) {
-		netif_err(tp, probe, dev,
-			  "Invalid PCI region size(s), aborting\n");
+		if (netif_msg_probe(tp)) {
+			dev_err(&pdev->dev,
+				"Invalid PCI region size(s), aborting\n");
+		}
 		rc = -ENODEV;
 		goto err_out_mwi_2;
 	}
 
 	rc = pci_request_regions(pdev, MODULENAME);
 	if (rc < 0) {
-		netif_err(tp, probe, dev, "could not request regions\n");
+		if (netif_msg_probe(tp))
+			dev_err(&pdev->dev, "could not request regions.\n");
 		goto err_out_mwi_2;
 	}
 
@@ -3055,7 +3070,10 @@
 	} else {
 		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (rc < 0) {
-			netif_err(tp, probe, dev, "DMA configuration failed\n");
+			if (netif_msg_probe(tp)) {
+				dev_err(&pdev->dev,
+					"DMA configuration failed.\n");
+			}
 			goto err_out_free_res_3;
 		}
 	}
@@ -3063,14 +3081,15 @@
 	/* ioremap MMIO region */
 	ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
 	if (!ioaddr) {
-		netif_err(tp, probe, dev, "cannot remap MMIO, aborting\n");
+		if (netif_msg_probe(tp))
+			dev_err(&pdev->dev, "cannot remap MMIO, aborting\n");
 		rc = -EIO;
 		goto err_out_free_res_3;
 	}
 
 	tp->pcie_cap = pci_find_capability(pdev, PCI_CAP_ID_EXP);
-	if (!tp->pcie_cap)
-		netif_info(tp, probe, dev, "no PCI Express capability\n");
+	if (!tp->pcie_cap && netif_msg_probe(tp))
+		dev_info(&pdev->dev, "no PCI Express capability\n");
 
 	RTL_W16(IntrMask, 0x0000);
 
@@ -3093,8 +3112,10 @@
 
 	/* Use appropriate default if unknown */
 	if (tp->mac_version == RTL_GIGA_MAC_NONE) {
-		netif_notice(tp, probe, dev,
-			     "unknown MAC, using family default\n");
+		if (netif_msg_probe(tp)) {
+			dev_notice(&pdev->dev,
+				   "unknown MAC, using family default\n");
+		}
 		tp->mac_version = cfg->default_ver;
 	}
 
@@ -3176,10 +3197,19 @@
 
 	pci_set_drvdata(pdev, dev);
 
-	netif_info(tp, probe, dev, "%s at 0x%lx, %pM, XID %08x IRQ %d\n",
-		   rtl_chip_info[tp->chipset].name,
-		   dev->base_addr, dev->dev_addr,
-		   (u32)(RTL_R32(TxConfig) & 0x9cf0f8ff), dev->irq);
+	if (netif_msg_probe(tp)) {
+		u32 xid = RTL_R32(TxConfig) & 0x9cf0f8ff;
+
+		printk(KERN_INFO "%s: %s at 0x%lx, "
+		       "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x, "
+		       "XID %08x IRQ %d\n",
+		       dev->name,
+		       rtl_chip_info[tp->chipset].name,
+		       dev->base_addr,
+		       dev->dev_addr[0], dev->dev_addr[1],
+		       dev->dev_addr[2], dev->dev_addr[3],
+		       dev->dev_addr[4], dev->dev_addr[5], xid, dev->irq);
+	}
 
 	rtl8169_init_phy(dev, tp);
 
@@ -3231,8 +3261,8 @@
 	unsigned int max_frame = mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
 
 	if (max_frame != 16383)
-		printk(KERN_WARNING PFX "WARNING! Changing of MTU on this "
-			"NIC may lead to frame reception errors!\n");
+		printk(KERN_WARNING "WARNING! Changing of MTU on this NIC "
+			"May lead to frame reception errors!\n");
 
 	tp->rx_buf_sz = (max_frame > RX_BUF_SIZE) ? max_frame : RX_BUF_SIZE;
 }
@@ -4131,10 +4161,10 @@
 
 	ret = rtl8169_open(dev);
 	if (unlikely(ret < 0)) {
-		if (net_ratelimit())
-			netif_err(tp, drv, dev,
-				  "reinit failure (status = %d). Rescheduling\n",
-				  ret);
+		if (net_ratelimit() && netif_msg_drv(tp)) {
+			printk(KERN_ERR PFX "%s: reinit failure (status = %d)."
+			       " Rescheduling.\n", dev->name, ret);
+		}
 		rtl8169_schedule_work(dev, rtl8169_reinit_task);
 	}
 
@@ -4164,8 +4194,10 @@
 		netif_wake_queue(dev);
 		rtl8169_check_link_status(dev, tp, tp->mmio_addr);
 	} else {
-		if (net_ratelimit())
-			netif_emerg(tp, intr, dev, "Rx buffers shortage\n");
+		if (net_ratelimit() && netif_msg_intr(tp)) {
+			printk(KERN_EMERG PFX "%s: Rx buffers shortage\n",
+			       dev->name);
+		}
 		rtl8169_schedule_work(dev, rtl8169_reset_task);
 	}
 
@@ -4253,7 +4285,11 @@
 	u32 opts1;
 
 	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
-		netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
+		if (netif_msg_drv(tp)) {
+			printk(KERN_ERR
+			       "%s: BUG! Tx Ring full when queue awake!\n",
+			       dev->name);
+		}
 		goto err_stop;
 	}
 
@@ -4315,8 +4351,11 @@
 	pci_read_config_word(pdev, PCI_COMMAND, &pci_cmd);
 	pci_read_config_word(pdev, PCI_STATUS, &pci_status);
 
-	netif_err(tp, intr, dev, "PCI error (cmd = 0x%04x, status = 0x%04x)\n",
-		  pci_cmd, pci_status);
+	if (netif_msg_intr(tp)) {
+		printk(KERN_ERR
+		       "%s: PCI error (cmd = 0x%04x, status = 0x%04x).\n",
+		       dev->name, pci_cmd, pci_status);
+	}
 
 	/*
 	 * The recovery sequence below admits a very elaborated explanation:
@@ -4340,7 +4379,8 @@
 
 	/* The infamous DAC f*ckup only happens at boot time */
 	if ((tp->cp_cmd & PCIDAC) && !tp->dirty_rx && !tp->cur_rx) {
-		netif_info(tp, intr, dev, "disabling PCI DAC\n");
+		if (netif_msg_intr(tp))
+			printk(KERN_INFO "%s: disabling PCI DAC.\n", dev->name);
 		tp->cp_cmd &= ~PCIDAC;
 		RTL_W16(CPlusCmd, tp->cp_cmd);
 		dev->features &= ~NETIF_F_HIGHDMA;
@@ -4432,12 +4472,13 @@
 	if (pkt_size >= rx_copybreak)
 		goto out;
 
-	skb = netdev_alloc_skb_ip_align(tp->dev, pkt_size);
+	skb = netdev_alloc_skb(tp->dev, pkt_size + NET_IP_ALIGN);
 	if (!skb)
 		goto out;
 
 	pci_dma_sync_single_for_cpu(tp->pci_dev, addr, pkt_size,
 				    PCI_DMA_FROMDEVICE);
+	skb_reserve(skb, NET_IP_ALIGN);
 	skb_copy_from_linear_data(*sk_buff, skb->data, pkt_size);
 	*sk_buff = skb;
 	done = true;
@@ -4475,8 +4516,11 @@
 		if (status & DescOwn)
 			break;
 		if (unlikely(status & RxRES)) {
-			netif_info(tp, rx_err, dev, "Rx ERROR. status = %08x\n",
-				   status);
+			if (netif_msg_rx_err(tp)) {
+				printk(KERN_INFO
+				       "%s: Rx ERROR. status = %08x\n",
+				       dev->name, status);
+			}
 			dev->stats.rx_errors++;
 			if (status & (RxRWT | RxRUNT))
 				dev->stats.rx_length_errors++;
@@ -4543,8 +4587,8 @@
 	tp->cur_rx = cur_rx;
 
 	delta = rtl8169_rx_fill(tp, dev, tp->dirty_rx, tp->cur_rx);
-	if (!delta && count)
-		netif_info(tp, intr, dev, "no Rx buffer allocated\n");
+	if (!delta && count && netif_msg_intr(tp))
+		printk(KERN_INFO "%s: no Rx buffer allocated\n", dev->name);
 	tp->dirty_rx += delta;
 
 	/*
@@ -4554,8 +4598,8 @@
 	 *   after refill ?
 	 * - how do others driver handle this condition (Uh oh...).
 	 */
-	if (tp->dirty_rx + NUM_RX_DESC == tp->cur_rx)
-		netif_emerg(tp, intr, dev, "Rx buffers exhausted\n");
+	if ((tp->dirty_rx + NUM_RX_DESC == tp->cur_rx) && netif_msg_intr(tp))
+		printk(KERN_EMERG "%s: Rx buffers exhausted\n", dev->name);
 
 	return count;
 }
@@ -4610,9 +4654,10 @@
 
 			if (likely(napi_schedule_prep(&tp->napi)))
 				__napi_schedule(&tp->napi);
-			else
-				netif_info(tp, intr, dev,
-					   "interrupt %04x in poll\n", status);
+			else if (netif_msg_intr(tp)) {
+				printk(KERN_INFO "%s: interrupt %04x in poll\n",
+				dev->name, status);
+			}
 		}
 
 		/* We only get a new MSI interrupt when all active irq
@@ -4748,22 +4793,27 @@
 
 	if (dev->flags & IFF_PROMISC) {
 		/* Unconditionally log net taps. */
-		netif_notice(tp, link, dev, "Promiscuous mode enabled\n");
+		if (netif_msg_link(tp)) {
+			printk(KERN_NOTICE "%s: Promiscuous mode enabled.\n",
+			       dev->name);
+		}
 		rx_mode =
 		    AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
 		    AcceptAllPhys;
 		mc_filter[1] = mc_filter[0] = 0xffffffff;
-	} else if ((netdev_mc_count(dev) > multicast_filter_limit) ||
-		   (dev->flags & IFF_ALLMULTI)) {
+	} else if ((dev->mc_count > multicast_filter_limit)
+		   || (dev->flags & IFF_ALLMULTI)) {
 		/* Too many to filter perfectly -- accept all multicasts. */
 		rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
 		mc_filter[1] = mc_filter[0] = 0xffffffff;
 	} else {
 		struct dev_mc_list *mclist;
+		unsigned int i;
 
 		rx_mode = AcceptBroadcast | AcceptMyPhys;
 		mc_filter[1] = mc_filter[0] = 0;
-		netdev_for_each_mc_addr(mclist, dev) {
+		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
+		     i++, mclist = mclist->next) {
 			int bit_nr = ether_crc(ETH_ALEN, mclist->dmi_addr) >> 26;
 			mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
 			rx_mode |= AcceptMulticast;

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: Brian Haley @ 2010-05-05 15:36 UTC (permalink / raw)
  To: David Miller; +Cc: dlstevens, enh, netdev, netdev-owner
In-Reply-To: <20100504.144647.157477097.davem@davemloft.net>

David Miller wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 04 May 2010 10:40:58 -0400
> 
>> Specifying -1 for setsockopt(IPV6_MULTICAST_HOPS) should set the socket
>> value back to the system default value of IPV6_DEFAULT_MCASTHOPS (1).
>>
>> Signed-off-by: Brian Haley <brian.haley@hp.com>
> 
> In cast it wasn't clear from my other reply, I'm not applying this
> patch because I intentionally left this behavior there based upon
> some comments from Elliot in that this lets developers get the
> old default by asking for "-1" explicitly with a setsockopt.

I now see that in Elliot's email, but I think it's incorrect.  The RFC
says that setting it to -1 should get you the kernel default, which is
now 1.  Without this change, setting it to -1 will get you 64, the
old behavior.  If the user wants to, they can always just set it to
64 themselves, that's better than assuming when you set it to -1
you're going to get 64.

I'm just trying to make this follow the RFC and behave like other OSes
for consistency.

-Brian

^ permalink raw reply

* Fw: [Bug 15907] New: IP_ADD_SOURCE_MEMBERSHIP after IP_ADD_MEMBERSHIP join on same multicast-group dont return EINVAL
From: Stephen Hemminger @ 2010-05-05 15:21 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev



Begin forwarded message:

Date: Wed, 5 May 2010 09:48:40 GMT
From: bugzilla-daemon@bugzilla.kernel.org
To: shemminger@linux-foundation.org
Subject: [Bug 15907] New: IP_ADD_SOURCE_MEMBERSHIP after IP_ADD_MEMBERSHIP join on same multicast-group dont return EINVAL


https://bugzilla.kernel.org/show_bug.cgi?id=15907

           Summary: IP_ADD_SOURCE_MEMBERSHIP after IP_ADD_MEMBERSHIP join
                    on same multicast-group dont return EINVAL
           Product: Networking
           Version: 2.5
    Kernel Version: 2.6.34-rc6
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: mail@fholler.de
        Regression: No


Created an attachment (id=26225)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=26225)
asm+ssm join test program

When an SSM IP_ADD_SOURCE_MEMBERSHIP is done after an ASM IP_ADD_MEMBERSHIP
join on the same group(& same interface) the setsockopt operation should return
EINVAL.

The linux implementation returns successfull



https://www3.tools.ietf.org/html/rfc3678#section-4.1.3

I attached an simple C test program.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


-- 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox