Netdev List
* [net-next-2.6 PATCH 2/2] e1000e: Reset 82577/82578 PHY before first PHY register read
From: Jeff Kirsher @ 2010-05-06  8:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Bruce Allan, Jeff Kirsher
In-Reply-To: <20100506075959.8910.13493.stgit@localhost.localdomain>

From: Bruce Allan <bruce.w.allan@intel.com>

Reset the PHY before first accessing it.  Doing so ensures that the PHY is
in a known good state before we read/write PHY registers. This fixes a
driver probe failure.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/ich8lan.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 0bfef8e..b8c4dce 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -294,6 +294,16 @@ static s32 e1000_init_phy_params_pchlan(struct e1000_hw *hw)
 		msleep(50);
 	}
 
+	/*
+	 * Reset the PHY before any access to it.  Doing so ensures that
+	 * the PHY is in a known good state before we read/write PHY registers.
+	 * The generic reset is sufficient here, because we haven't determined
+	 * the PHY type yet.
+	 */
+	ret_val = e1000e_phy_hw_reset_generic(hw);
+	if (ret_val)
+		goto out;
+
 	phy->id = e1000_phy_unknown;
 	ret_val = e1000e_get_phy_id(hw);
 	if (ret_val)


^ permalink raw reply related

* Re: [PATCH 4/4 v2] ks8851: read/write MAC address on companion eeprom through debugfs
From: Sebastien Jan @ 2010-05-06  8:01 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, linux-omap@vger.kernel.org, Arce, Abraham,
	ben-linux@fluff.org, Tristram.Ha@micrel.com
In-Reply-To: <20100506.002508.241926083.davem@davemloft.net>

Hi David,

On Thursday 06 May 2010 09:25:08 David Miller wrote:
> From: Sebastien Jan <s-jan@ti.com>
> Date: Wed,  5 May 2010 20:45:55 +0200
> 
> > A more elegant alternative to ethtool for updating the ks8851
> > MAC address stored on its companion eeprom.
> > Using this debugfs interface does not require any knowledge of the
> > ks8851 companion eeprom organization to update the MAC address.
> >
> > Example to write 01:23:45:67:89:AB MAC address to the companion
> > eeprom (assuming debugfs is mounted in /sys/kernel/debug):
> > $ echo "01:23:45:67:89:AB" > /sys/kernel/debug/ks8851/mac_eeprom
> >
> > Signed-off-by: Sebastien Jan <s-jan@ti.com>
> 
> Elegant?  This commit message is the biggest lie ever told.
> 
> What makes your ethernet driver so god-damn special that it deserves
> to have a private, completely unique, and obscure interface for
> setting the permanent ethernet address of a network device?
> 
> Tell me how damn elegant it is that users have to learn about this
> special, unique, and common with no other driver, interface for
> performing this task?
> 
> Tell me how damn elegant it is when another driver wants to provide
> users with a way to do this too, and they (like you) come up with
> their own unique and different interface for doing this.
> 
> No, this is the most inelegant patch ever conceived because it totally
> ignores the way in which we handle issues like this.
> 
> There is no way in the world I'm applying this garbage patch, sorry.
> 
> We have an ETHTOOL_GPERMADDR, add a new ETHTOOL_SPERMADDR operation
> and then any driver (not just yours) can portably provide this
> facility and users will have one, and only one, way of performing this
> task.
> 

I agree that my commit message was probably too provocative, sorry for that.

Thank you for shedding some light on ETHTOOL_GPERMADDR and ETHTOOL_SPERMADDR. 
I will look into these interfaces for a proper and generic implementation.

^ permalink raw reply

* Re: [PATCH net-next-2.6] rps: consistent rxhash
From: David Miller @ 2010-05-06  8:06 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, franco, xiaosuo, netdev
In-Reply-To: <g2m65634d661004211212t13714cccyd27936c520515684@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Wed, 21 Apr 2010 12:12:41 -0700

> On Tue, Apr 20, 2010 at 2:41 PM, David Miller <davem@davemloft.net> wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 20 Apr 2010 16:57:01 +0200
>>
>>> I know many applications using TCP on loopback, they are real :)
>>
>> This is all true and I support your hashing patch and all of that.
>>
>> But if we really want TCP over loopback to go fast, there are much
>> better ways to do this.
>>
>> Eric, do you remember that "TCP friends" rough patch I sent you last
>> year that essentially made TCP sockets over loopback behave like
>> AF_UNIX ones and just queue the SKBs directly to the destination
>> socket without doing any protocol work?
>>
> This sounds very interesting!  Could you post a patch? :-)

I was finally able to unearth a copy, it's completely raw, it's at least
a year old, and it's not fully implemented at all.

But you asked for it :-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 299ec4b..7f855d3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -206,6 +206,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@mac_header: Link layer header
  *	@dst: destination entry
  *	@sp: the security path, used for xfrm
+ *	@friend: loopback friend socket
  *	@cb: Control buffer. Free for use by every layer. Put private vars here
  *	@len: Length of actual data
  *	@data_len: Data length
@@ -262,6 +263,7 @@ struct sk_buff {
 		struct  rtable		*rtable;
 	};
 	struct	sec_path	*sp;
+	struct sock		*friend;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index b220b5f..52b2f7a 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -53,6 +53,7 @@ struct request_sock {
 	unsigned long			expires;
 	const struct request_sock_ops	*rsk_ops;
 	struct sock			*sk;
+	struct sock			*friend;
 	u32				secid;
 	u32				peer_secid;
 };
diff --git a/include/net/sock.h b/include/net/sock.h
index dc42b44..3e86190 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -137,6 +137,7 @@ struct sock_common {
   *	@sk_userlocks: %SO_SNDBUF and %SO_RCVBUF settings
   *	@sk_lock:	synchronizer
   *	@sk_rcvbuf: size of receive buffer in bytes
+  *	@sk_friend: loopback friend socket
   *	@sk_sleep: sock wait queue
   *	@sk_dst_cache: destination cache
   *	@sk_dst_lock: destination cache lock
@@ -227,6 +228,7 @@ struct sock {
 		struct sk_buff *head;
 		struct sk_buff *tail;
 	} sk_backlog;
+	struct sock		*sk_friend;
 	wait_queue_head_t	*sk_sleep;
 	struct dst_entry	*sk_dst_cache;
 	struct xfrm_policy	*sk_policy[2];
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4fe605f..0eef90a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -435,6 +435,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 #ifdef CONFIG_INET
 	new->sp			= secpath_get(old->sp);
 #endif
+	new->friend		= old->friend;
 	memcpy(new->cb, old->cb, sizeof(old->cb));
 	new->csum_start		= old->csum_start;
 	new->csum_offset	= old->csum_offset;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 828ea21..375dc2e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -503,6 +503,8 @@ struct sock *inet_csk_clone(struct sock *sk, const struct request_sock *req,
 	if (newsk != NULL) {
 		struct inet_connection_sock *newicsk = inet_csk(newsk);
 
+		newsk->sk_friend = req->friend;
+
 		newsk->sk_state = TCP_SYN_RECV;
 		newicsk->icsk_bind_hash = NULL;
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 58ac838..042ee1d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -474,7 +474,8 @@ static inline int forced_push(struct tcp_sock *tp)
 	return after(tp->write_seq, tp->pushed_seq + (tp->max_window >> 1));
 }
 
-static inline void skb_entail(struct sock *sk, struct sk_buff *skb)
+static inline void skb_entail(struct sock *sk, struct sk_buff *skb,
+			      struct sk_buff_head *friend_queue)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
@@ -484,7 +485,10 @@ static inline void skb_entail(struct sock *sk, struct sk_buff *skb)
 	tcb->flags   = TCPCB_FLAG_ACK;
 	tcb->sacked  = 0;
 	skb_header_release(skb);
-	tcp_add_write_queue_tail(sk, skb);
+	if (sk->sk_friend)
+		__skb_queue_tail(friend_queue, skb);
+	else
+		tcp_add_write_queue_tail(sk, skb);
 	sk->sk_wmem_queued += skb->truesize;
 	sk_mem_charge(sk, skb->truesize);
 	if (tp->nonagle & TCP_NAGLE_PUSH)
@@ -501,7 +505,7 @@ static inline void tcp_mark_urg(struct tcp_sock *tp, int flags,
 }
 
 static inline void tcp_push(struct sock *sk, int flags, int mss_now,
-			    int nonagle)
+			    int nonagle, struct sk_buff_head *friend_queue)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
@@ -512,6 +516,19 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
 		tcp_mark_urg(tp, flags, skb);
 		__tcp_push_pending_frames(sk, mss_now,
 					  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
+	} else if (sk->sk_friend) {
+		struct sock *friend = sk->sk_friend;
+		struct sk_buff *skb;
+		unsigned int len;
+
+		spin_lock_bh(&friend->sk_lock.slock);
+		len = 0;
+		while ((skb = __skb_dequeue(friend_queue)) != NULL) {
+			len += skb->len;
+			__skb_queue_tail(&sk->sk_receive_queue, skb);
+		}
+		sk->sk_data_ready(friend, len);
+		spin_unlock_bh(&friend->sk_lock.slock);
 	}
 }
 
@@ -658,6 +675,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 			 size_t psize, int flags)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff_head friend_queue;
 	int mss_now, size_goal;
 	int err;
 	ssize_t copied;
@@ -674,6 +692,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 	size_goal = tp->xmit_size_goal;
 	copied = 0;
 
+	skb_queue_head_init(&friend_queue);
+
 	err = -EPIPE;
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto do_error;
@@ -694,7 +714,7 @@ new_segment:
 			if (!skb)
 				goto wait_for_memory;
 
-			skb_entail(sk, skb);
+			skb_entail(sk, skb, &friend_queue);
 			copy = size_goal;
 		}
 
@@ -749,7 +769,8 @@ wait_for_sndbuf:
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
 		if (copied)
-			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH,
+				 &friend_queue);
 
 		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
 			goto do_error;
@@ -760,7 +781,7 @@ wait_for_memory:
 
 out:
 	if (copied)
-		tcp_push(sk, flags, mss_now, tp->nonagle);
+		tcp_push(sk, flags, mss_now, tp->nonagle, &friend_queue);
 	return copied;
 
 do_error:
@@ -817,6 +838,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct sock *sk = sock->sk;
 	struct iovec *iov;
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff_head friend_queue;
 	struct sk_buff *skb;
 	int iovlen, flags;
 	int mss_now, size_goal;
@@ -849,6 +871,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto do_error;
 
+	skb_queue_head_init(&friend_queue);
 	while (--iovlen >= 0) {
 		int seglen = iov->iov_len;
 		unsigned char __user *from = iov->iov_base;
@@ -881,7 +904,7 @@ new_segment:
 				if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
 					skb->ip_summed = CHECKSUM_PARTIAL;
 
-				skb_entail(sk, skb);
+				skb_entail(sk, skb, &friend_queue);
 				copy = size_goal;
 			}
 
@@ -995,7 +1018,8 @@ wait_for_sndbuf:
 			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
 			if (copied)
-				tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+				tcp_push(sk, flags & ~MSG_MORE, mss_now,
+					 TCP_NAGLE_PUSH, &friend_queue);
 
 			if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
 				goto do_error;
@@ -1007,7 +1031,7 @@ wait_for_memory:
 
 out:
 	if (copied)
-		tcp_push(sk, flags, mss_now, tp->nonagle);
+		tcp_push(sk, flags, mss_now, tp->nonagle, &friend_queue);
 	TCP_CHECK_TIMER(sk);
 	release_sock(sk);
 	return copied;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cdc051b..eb6f914 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4998,6 +4998,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		 *    state to ESTABLISHED..."
 		 */
 
+		sk->sk_friend = skb->friend;
 		TCP_ECN_rcv_synack(tp, th);
 
 		tp->snd_wl1 = TCP_SKB_CB(skb)->seq;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7766151..4d91ff4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1289,6 +1289,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (!req)
 		goto drop;
 
+	req->friend = skb->friend;
 #ifdef CONFIG_TCP_MD5SIG
 	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
 #endif
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index debf235..a4d4c14 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -577,6 +577,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	}
 
 	if (unlikely(tcb->flags & TCPCB_FLAG_SYN)) {
+		skb->friend = sk;
 		tcp_syn_build_options((__be32 *)(th + 1),
 				      tcp_advertise_mss(sk),
 				      (sysctl_flags & SYSCTL_FLAG_TSTAMPS),
@@ -1006,6 +1007,8 @@ unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
 		xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
 		xmit_size_goal -= (xmit_size_goal % mss_now);
 	}
+	if (sk->sk_friend)
+		xmit_size_goal = ~(u16)0;
 	tp->xmit_size_goal = xmit_size_goal;
 
 	return mss_now;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 715965f..c79d3ea 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1280,6 +1280,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (req == NULL)
 		goto drop;
 
+	req->friend = skb->friend;
 #ifdef CONFIG_TCP_MD5SIG
 	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv6_ops;
 #endif

^ permalink raw reply related
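
The tcp_push() branch in the patch above drains a per-call friend queue onto
a receive queue and reports the total byte count. A minimal userspace sketch
of that drain step follows. All names here (demo_buf, demo_queue,
demo_push_to_friend) are hypothetical stand-ins, not kernel APIs, and locking
and the data-ready callback are omitted. The sketch appends to the peer's
queue, following the prose description of the idea; the thread does not
confirm that the raw patch does exactly that.

```c
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: only a link and a length. */
struct demo_buf {
	struct demo_buf *next;
	unsigned int len;
};

/* Hypothetical stand-in for sk_buff_head: a singly linked FIFO. */
struct demo_queue {
	struct demo_buf *head;
	struct demo_buf *tail;
};

static void demo_queue_init(struct demo_queue *q)
{
	q->head = NULL;
	q->tail = NULL;
}

/* Append one buffer at the tail of the queue. */
static void demo_queue_tail(struct demo_queue *q, struct demo_buf *b)
{
	b->next = NULL;
	if (q->tail)
		q->tail->next = b;
	else
		q->head = b;
	q->tail = b;
}

/* Remove and return the buffer at the head, or NULL if the queue is empty. */
static struct demo_buf *demo_dequeue(struct demo_queue *q)
{
	struct demo_buf *b = q->head;

	if (b) {
		q->head = b->next;
		if (!q->head)
			q->tail = NULL;
		b->next = NULL;
	}
	return b;
}

/*
 * Move every pending buffer onto the peer's receive queue, preserving
 * order, and return the number of bytes moved so the caller can signal
 * the peer once for the whole batch.
 */
static unsigned int demo_push_to_friend(struct demo_queue *pending,
					struct demo_queue *friend_rx)
{
	struct demo_buf *b;
	unsigned int len = 0;

	while ((b = demo_dequeue(pending)) != NULL) {
		len += b->len;
		demo_queue_tail(friend_rx, b);
	}
	return len;
}
```

Batching on a local queue and draining once means the peer is woken a single
time per send call rather than once per buffer.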

* Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Gleb Natapov @ 2010-05-06  8:19 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: Christoph Hellwig, Dmitry Torokhov, pv-drivers@vmware.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org
In-Reply-To: <F1354E79A137A24CBA60059AA65CB1B802A235C535@EXCH-MBX-2.vmware.com>

On Wed, May 05, 2010 at 10:47:10AM -0700, Pankaj Thakkar wrote:
> 
> 
> > -----Original Message-----
> > From: Christoph Hellwig [mailto:hch@infradead.org]
> > Sent: Wednesday, May 05, 2010 10:40 AM
> > To: Dmitry Torokhov
> > Cc: Christoph Hellwig; pv-drivers@vmware.com; Pankaj Thakkar;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> > virtualization@lists.linux-foundation.org
> > Subject: Re: [Pv-drivers] RFC: Network Plugin Architecture (NPA) for
> > vmxnet3
> > 
> > On Wed, May 05, 2010 at 10:35:28AM -0700, Dmitry Torokhov wrote:
> > > Yes, with the exception that the only body of code that will be
> > > accepted by the shell should be GPL-licensed and thus open and
> > available
> > > for examining. This is not different from having a standard kernel
> > > module that is loaded normally and plugs into a certain subsystem.
> > > The difference is that the binary resides not on guest filesystem
> > > but elsewhere.
> > 
> > Forget about the licensing.  Loading binary blobs written to a shim
> > layer is a complete pain in the ass and totally unsupportable, and
> > also uninteresting because of the overhead.
> 
> [PT] Why do you think it is unsupportable? How different is it from any module
> written against a well maintained interface? What overhead are you talking about?
> 
Overhead of interpreting the bytecode the plugin is written in. Or are you
saying the plugin is x86 assembly (32-bit or 64-bit, btw?) and other arches
will need an in-kernel x86 emulator to use the plugin (like some
of them had for vgabios)?

--
			Gleb.

^ permalink raw reply

* [PATCH] netpoll: Use 'bool' for netpoll_rx() return type.
From: David Miller @ 2010-05-06  8:20 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---

I noticed this while validating and auditing Eric Dumazet's vnet
fix...  applied to net-next-2.6

 include/linux/netpoll.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 017e604..3688c83 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -55,19 +55,19 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
 
 
 #ifdef CONFIG_NETPOLL
-static inline int netpoll_rx(struct sk_buff *skb)
+static inline bool netpoll_rx(struct sk_buff *skb)
 {
 	struct netpoll_info *npinfo = skb->dev->npinfo;
 	unsigned long flags;
-	int ret = 0;
+	int ret = false;
 
 	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
-		return 0;
+		return false;
 
 	spin_lock_irqsave(&npinfo->rx_lock, flags);
 	/* check rx_flags again with the lock held */
 	if (npinfo->rx_flags && __netpoll_rx(skb))
-		ret = 1;
+		ret = true;
 	spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 
 	return ret;
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] net: emaclite: Use resource_size
From: Tobias Klauser @ 2010-05-06  8:12 UTC (permalink / raw)
  To: netdev, davem; +Cc: john.linn, kernel-janitors, Tobias Klauser

Use the resource_size function instead of manually calculating the
resource size.  This reduces the chance of introducing off-by-one
errors.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 drivers/net/xilinx_emaclite.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xilinx_emaclite.c b/drivers/net/xilinx_emaclite.c
index e9381fe..93828d5 100644
--- a/drivers/net/xilinx_emaclite.c
+++ b/drivers/net/xilinx_emaclite.c
@@ -1171,7 +1171,7 @@ static int __devinit xemaclite_of_probe(struct of_device *ofdev,
 	}
 
 	/* Get the virtual base address for the device */
-	lp->base_addr = ioremap(r_mem.start, r_mem.end - r_mem.start + 1);
+	lp->base_addr = ioremap(r_mem.start, resource_size(&r_mem));
 	if (NULL == lp->base_addr) {
 		dev_err(dev, "EmacLite: Could not allocate iomem\n");
 		rc = -EIO;
@@ -1224,7 +1224,7 @@ static int __devinit xemaclite_of_probe(struct of_device *ofdev,
 	return 0;
 
 error1:
-	release_mem_region(ndev->mem_start, r_mem.end - r_mem.start + 1);
+	release_mem_region(ndev->mem_start, resource_size(&r_mem));
 
 error2:
 	xemaclite_remove_ndev(ndev);
-- 
1.6.3.3


^ permalink raw reply related
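
The off-by-one risk mentioned in the commit message comes from the resource
range being inclusive at both ends. A minimal userspace sketch of the same
calculation follows. demo_resource and demo_resource_size are hypothetical
names used only for illustration; the kernel helper operates on struct
resource.

```c
#include <stdint.h>

/* Userspace stand-in for a resource range; [start, end] is inclusive. */
struct demo_resource {
	uint64_t start;
	uint64_t end;
};

/*
 * Size of an inclusive range. Forgetting the "+ 1" is the off-by-one
 * that a shared helper avoids at every call site.
 */
static uint64_t demo_resource_size(const struct demo_resource *res)
{
	return res->end - res->start + 1;
}
```

For a range of 0x1000 to 0x1fff this gives 0x1000 bytes, which is the length
that a mapping or release call over the whole range would need.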

* Re: [net-next-2.6 PATCH 1/2] e1000e: reset MAC-PHY interconnect on 82577/82578 during Sx->S0
From: David Miller @ 2010-05-06  8:22 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20100506075959.8910.13493.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 06 May 2010 01:00:06 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> During Sx->S0 transitions, the interconnect between the MAC and PHY on
> 82577/82578 can remain in SMBus mode instead of transitioning to the
> PCIe-like mode required during normal operation.  Toggling the LANPHYPC
> Value bit essentially resets the interconnect forcing it to the correct
> mode.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/2] e1000e: Reset 82577/82578 PHY before first PHY register read
From: David Miller @ 2010-05-06  8:23 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20100506080025.8910.45557.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 06 May 2010 01:00:27 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> Reset the PHY before first accessing it.  Doing so ensures that the PHY is
> in a known good state before we read/write PHY registers. This fixes a
> driver probe failure.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: emaclite: Use resource_size
From: David Miller @ 2010-05-06  8:23 UTC (permalink / raw)
  To: tklauser; +Cc: netdev, john.linn, kernel-janitors
In-Reply-To: <1273133540-6894-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Thu,  6 May 2010 10:12:20 +0200

> Use the resource_size function instead of manually calculating the
> resource size.  This reduces the chance of introducing off-by-one
> errors.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] netpoll: Use 'bool' for netpoll_rx() return type.
From: Changli Gao @ 2010-05-06  8:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100506.012051.149835101.davem@davemloft.net>

On Thu, May 6, 2010 at 4:20 PM, David Miller <davem@davemloft.net> wrote:
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>
> I noticed this while validating and auditing Eric Dumazet's vnet
> fix...  applied to net-next-2.6
>
>  include/linux/netpoll.h |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> index 017e604..3688c83 100644
> --- a/include/linux/netpoll.h
> +++ b/include/linux/netpoll.h
> @@ -55,19 +55,19 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
>
>
>  #ifdef CONFIG_NETPOLL
> -static inline int netpoll_rx(struct sk_buff *skb)
> +static inline bool netpoll_rx(struct sk_buff *skb)
>  {
>        struct netpoll_info *npinfo = skb->dev->npinfo;
>        unsigned long flags;
> -       int ret = 0;
> +       int ret = false;
>

bool ret = false.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] netpoll: Use 'bool' for netpoll_rx() return type.
From: David Miller @ 2010-05-06  8:30 UTC (permalink / raw)
  To: xiaosuo; +Cc: netdev
In-Reply-To: <g2k412e6f7f1005060127g11f7b1e6vddd448178e1ddff9@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Thu, 6 May 2010 16:27:44 +0800

> bool ret = false.

Thanks for catching that, fixed.

^ permalink raw reply
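
In the patch under discussion, ret only ever holds 0 or 1, so the int/bool
mismatch is a consistency issue rather than a behavioural one. The types do
differ in general, though: converting to bool collapses any nonzero value to
true, while int keeps the raw value. A small sketch, with hypothetical helper
names and an arbitrary flag mask chosen for illustration:

```c
#include <stdbool.h>

/* Returns the masked bit itself: 0 or 4 for this mask. */
static int demo_flag_as_int(unsigned int flags)
{
	return flags & 0x4;
}

/* Conversion to bool normalises any nonzero value to true (1). */
static bool demo_flag_as_bool(unsigned int flags)
{
	return flags & 0x4;
}
```

A caller comparing the result against 1 would see different answers from the
two helpers, which is one reason to keep the local variable and the return
type the same.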

* Re: [PATCH/RFC] cxgb4: Add MAINTAINERS info
From: Dimitris Michailidis @ 2010-05-06  8:43 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW
In-Reply-To: <adawrvijhpq.fsf_-_-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>

Roland Dreier wrote:
> Hi guys, does this info for cxgb4/iw_cxgb4 (pretty much copied from
> cxgb3, except with Dimitris instead of Divy) look right?  If so I'll add
> it to my tree.

Yes, it's fine with me.  Thanks.

> 
> Thanks,
>   Roland
> ---
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7a9ccda..a00231b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1719,6 +1719,20 @@ W:	http://www.openfabrics.org
>  S:	Supported
>  F:	drivers/infiniband/hw/cxgb3/
>  
> +CXGB4 ETHERNET DRIVER (CXGB4)
> +M:	Dimitris Michailidis <dm-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> +L:	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +W:	http://www.chelsio.com
> +S:	Supported
> +F:	drivers/net/cxgb4/
> +
> +CXGB4 IWARP RNIC DRIVER (IW_CXGB4)
> +M:	Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
> +L:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +W:	http://www.openfabrics.org
> +S:	Supported
> +F:	drivers/infiniband/hw/cxgb4/
> +
>  CYBERPRO FB DRIVER
>  M:	Russell King <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
>  L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)


^ permalink raw reply

* Re: 3 packet TCP window limit?
From: dormando @ 2010-05-06  8:51 UTC (permalink / raw)
  To: Lars Eggert; +Cc: Rick Jones, Brian Bloniarz, netdev@vger.kernel.org
In-Reply-To: <0C1FD143-5588-4455-B08F-85ADD19E0E02@nokia.com>

> On 2010-5-5, at 23:31, dormando wrote:
> > The RFC clearly states "around 4k",
>
> no, it doesn't. RFC3390 gives a very precise formula for calculating the initial window:
>
> 	min (4*MSS, max (2*MSS, 4380 bytes))
>
> Please see the RFC for why. More reading at http://www.icir.org/floyd/tcp_init_win.html I believe that Linux implements this behavior pretty faithfully.

Sorry, paraphrasing :) Web nerds have been working around this for a long
time now. Google talks about using HTTP chunked encoding responses to send
an initial "frame" of a webpage in under 3 packets. Which immediately
gives the browser something to render and primes the TCP connection for
more web junk.

> I'm surprised to hear that OpenBSD doesn't follow the RFC. Can you share a measurement? Are you sure the box you are measuring is using the default configuration?

Yeah, default config. OBSD was giving me back 4 packets in the first
window, while linux always gives back 3. The Big/IP is based on linux
2.4.21. If that kernel didn't have it wrong, they tuned it.
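
The RFC 3390 formula quoted earlier in this message can be worked through
directly. The sketch below is only the arithmetic, with hypothetical function
names; it assumes a nonzero MSS and says nothing about how any particular
stack implements it.

```c
#include <stdint.h>

/*
 * RFC 3390 upper bound on the initial window, in bytes:
 *   min(4*MSS, max(2*MSS, 4380))
 */
static uint32_t rfc3390_initial_window(uint32_t mss)
{
	uint32_t lower = 2 * mss > 4380 ? 2 * mss : 4380;
	uint32_t upper = 4 * mss;

	return upper < lower ? upper : lower;
}

/* Number of full-sized segments that fit in that window (mss must be > 0). */
static uint32_t rfc3390_initial_segments(uint32_t mss)
{
	return rfc3390_initial_window(mss) / mss;
}
```

With an MSS of 1460 the window is 4380 bytes, i.e. 3 full segments, which
matches the 3 packets seen from Linux. Four segments only come out of the
formula when the MSS is 1095 bytes or less.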

Already nuked my dumps. If you're curious I'll re-create.

> I don't think the RFC can be misread (it's pretty clear), and the
> formula is also not exactly complicated. My guess would be that some
> vendors have convinced themselves that using a slightly larger value is
> OK, esp. if they can show customers that "their" TCP is "faster" than
> some competitors' TCPs. An arms race between vendors in this space would
> really not be good for anyone - it's clear that at some point, problems
> due to overshoot will occur.

I clearly remember some vendors bragging about doing this. That was a long
time ago? Perhaps they stopped? If it's true they've been doing it for
half a decade or more, and haven't broken anything someone would notice.

The only reason why I set about tuning this is because our latency jumped
while moving traffic from a commercial machine to a linux machine, and I
had to figure out what they changed to do that. I've since turned the
setting *back* to the standard, having confirmed what they did.

Almost tempted to test this against a bunch of websites...

> (We can definitely argue about whether the current RFC-recommended value
> is too low, and Google and others are gathering data in support of
> making a convincing and backed-up argument for increasing the initial
> window to the IETF. Which is exactly the correct way of going about
> this.)

This sounds like fun. We have some diverse traffic, so I'm hoping we can
contribute to that conversation. Still have a lot of reading to catch up
with first :)

^ permalink raw reply

* Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
From: Avi Kivity @ 2010-05-06  8:58 UTC (permalink / raw)
  To: Pankaj Thakkar
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, pv-drivers@vmware.com,
	Shreyas Bhatewara
In-Reply-To: <20100505194417.GV8323@vmware.com>

On 05/05/2010 10:44 PM, Pankaj Thakkar wrote:
> On Wed, May 05, 2010 at 10:59:51AM -0700, Avi Kivity wrote:
>    
>> Date: Wed, 5 May 2010 10:59:51 -0700
>> From: Avi Kivity<avi@redhat.com>
>> To: Pankaj Thakkar<pthakkar@vmware.com>
>> CC: "linux-kernel@vger.kernel.org"<linux-kernel@vger.kernel.org>,
>> 	"netdev@vger.kernel.org"<netdev@vger.kernel.org>,
>> 	"virtualization@lists.linux-foundation.org"
>>   <virtualization@lists.linux-foundation.org>,
>> 	"pv-drivers@vmware.com"<pv-drivers@vmware.com>,
>> 	Shreyas Bhatewara<sbhatewara@vmware.com>
>> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
>>
>> On 05/05/2010 02:02 AM, Pankaj Thakkar wrote:
>>      
>>> 2. Hypervisor control: All control operations from the guest such as programming
>>> MAC address go through the hypervisor layer and hence can be subjected to
>>> hypervisor policies. The PF driver can be further used to put policy decisions
>>> like which VLAN the guest should be on.
>>>
>>>        
>> Is this enforced?  Since you pass the hardware through, you can't rely
>> on the guest actually doing this, yes?
>>      
> We don't pass the whole VF to the guest. Only the BAR which is responsible for
> TX/RX/intr is mapped into guest space.

Does the SR/IOV spec guarantee that you will have such a separation?



>
>>
>>> We have reworked our existing Linux vmxnet3 driver to accommodate NPA by
>>> splitting the driver into two parts: Shell and Plugin. The new split driver is
>>>
>>>        
>> So the Shell would be the reworked or new bond driver, and Plugins would
>> be ordinary Linux network drivers.
>>      
> In NPA we do not rely on the guest OS to provide any of these services like
> bonding or PCI hotplug.

Well the Shell does some sort of bonding (there are two links and the 
shell selects which one to exercise) and some sort of hotplug.  Since 
the Shell is part of the guest OS, you do rely on it.

It's certainly simpler than PCI hotplug or ordinary bonding.

> We don't rely on the guest OS to unmap a VF and switch
> a VM out of passthrough. In a bonding approach that becomes an issue: you can't
> just yank a device from underneath, you have to wait for the OS to process the
> request and switch from using VF to the emulated device and this makes the
> hypervisor dependent on the guest OS.

How can you unmap the VF without guest cooperation?  If you're executing 
Plugin code, you can't yank anything out.

Are plugins executed with preemption/interrupts disabled?

> Also we don't rely on the presence of all
> the drivers inside the guest OS (be it Linux or Windows), the ESX hypervisor
> carries all the plugins and the PF drivers and injects the right one as needed.
> These plugins are guest agnostic and the IHVs do not have to write plugins for
> different OS.
>    

What ISAs do those plugins support?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* [PATCH net-next-2.6] rps: Various optimizations
From: Eric Dumazet @ 2010-05-06  8:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert

Introduce ____napi_schedule() helper for callers in irq disabled
contexts. rps_trigger_softirq() becomes a leaf function.

Use container_of() in process_backlog() instead of accessing per_cpu
address.

Use a custom inlined version of __napi_complete() in process_backlog()
to avoid one locked instruction :

 only current cpu owns and manipulates this napi,
 and NAPI_STATE_SCHED is the only possible flag set on backlog.
 we can use a plain write instead of clear_bit(),
 and we don't need an smp_mb() memory barrier, since RPS is on,
 backlog is protected by a spinlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   28 ++++++++++++++++++++++------
 1 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 36d53be..c6861e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2205,6 +2205,14 @@ int netdev_max_backlog __read_mostly = 1000;
 int netdev_budget __read_mostly = 300;
 int weight_p __read_mostly = 64;            /* old backlog weight */
 
+/* Called with irq disabled */
+static inline void ____napi_schedule(struct softnet_data *sd,
+				     struct napi_struct *napi)
+{
+	list_add_tail(&napi->poll_list, &sd->poll_list);
+	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+}
+
 #ifdef CONFIG_RPS
 
 /* One global table that all flow-based protocols share. */
@@ -2363,7 +2371,7 @@ static void rps_trigger_softirq(void *data)
 {
 	struct softnet_data *sd = data;
 
-	__napi_schedule(&sd->backlog);
+	____napi_schedule(sd, &sd->backlog);
 	sd->received_rps++;
 }
 
@@ -2421,7 +2429,7 @@ enqueue:
 		/* Schedule NAPI for backlog device */
 		if (napi_schedule_prep(&sd->backlog)) {
 			if (!rps_ipi_queued(sd))
-				__napi_schedule(&sd->backlog);
+				____napi_schedule(sd, &sd->backlog);
 		}
 		goto enqueue;
 	}
@@ -3280,7 +3288,7 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 static int process_backlog(struct napi_struct *napi, int quota)
 {
 	int work = 0;
-	struct softnet_data *sd = &__get_cpu_var(softnet_data);
+	struct softnet_data *sd = container_of(napi, struct softnet_data, backlog);
 
 #ifdef CONFIG_RPS
 	/* Check if we have pending ipi, its better to send them now,
@@ -3313,7 +3321,16 @@ static int process_backlog(struct napi_struct *napi, int quota)
 						   &sd->process_queue);
 		}
 		if (qlen < quota - work) {
-			__napi_complete(napi);
+			/*
+			 * Inline a custom version of __napi_complete().
+			 * only current cpu owns and manipulates this napi,
+			 * and NAPI_STATE_SCHED is the only possible flag set on backlog.
+			 * we can use a plain write instead of clear_bit(),
+			 * and we dont need an smp_mb() memory barrier.
+			 */
+			list_del(&napi->poll_list);
+			napi->state = 0;
+
 			quota = work + qlen;
 		}
 		rps_unlock(sd);
@@ -3334,8 +3351,7 @@ void __napi_schedule(struct napi_struct *n)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	____napi_schedule(&__get_cpu_var(softnet_data), n);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(__napi_schedule);
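
For readers less familiar with the container_of() idiom that process_backlog() switches to, here is a minimal userspace sketch. The structures are heavily trimmed, hypothetical mock-ups (only the embedded backlog member mirrors the patch), and the macro is a simplified stand-in for the kernel's own definition, so treat this as an illustration rather than the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed mock-ups of the kernel structures. */
struct napi_struct {
	unsigned long state;
};

struct softnet_data {
	int received_rps;
	struct napi_struct backlog;	/* embedded, as in the patch */
};

/* Recover the enclosing softnet_data from its embedded backlog napi. */
static struct softnet_data *sd_from_backlog(struct napi_struct *napi)
{
	assert(napi != NULL);
	return container_of(napi, struct softnet_data, backlog);
}
```

The pointer arithmetic needs only the napi pointer that the poll function already receives, which is why no per-cpu lookup is required in this sketch.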




* tip: origin tree boot crash (in the new micrel phy driver)
From: Ingo Molnar @ 2010-05-06  9:59 UTC (permalink / raw)
  To: David J. Choi; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20100505.012750.246538504.davem@davemloft.net>


* David Miller <davem@davemloft.net> wrote:

>       drivers/net/phy: micrel phy driver

FYI, -tip testing started triggering this boot crash today (x86, 64-bit):

bus: 'mdio_bus': add driver STe101p
initcall ste10Xp_init+0x0/0x22 returned 0 after 52 usecs
calling  ksphy_init+0x0/0x5e @ 1
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff813b1e98>] strcmp+0x6/0x21
PGD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file:
CPU 1
Pid: 1, comm: swapper Not tainted 2.6.34-rc5 #3328 A8N-E/System Product Name
RIP: 0010:[<ffffffff813b1e98>]  [<ffffffff813b1e98>] strcmp+0x6/0x21
RSP: 0018:ffff88003f33fe00  EFLAGS: 00010286
Call Trace:
 [<ffffffff813ae59d>] kset_find_obj+0x3d/0x81
 [<ffffffff814e664f>] driver_find+0x1f/0x32
 [<ffffffff814e6732>] driver_register+0x64/0x103
 [<ffffffff83ca853f>] ? ksphy_init+0x0/0x5e
 [<ffffffff816adcf1>] phy_driver_register+0x3e/0x92
 [<ffffffff83ca853f>] ? ksphy_init+0x0/0x5e
 [<ffffffff83ca8553>] ksphy_init+0x14/0x5e
 [<ffffffff810001f9>] do_one_initcall+0x5e/0x15e
 [<ffffffff83c706bb>] kernel_init+0x17d/0x206
 [<ffffffff81002f24>] kernel_thread_helper+0x4/0x10
 [<ffffffff81d8d450>] ? restore_args+0x0/0x30
 [<ffffffff83c7053e>] ? kernel_init+0x0/0x206
 [<ffffffff81002f20>] ? kernel_thread_helper+0x0/0x10
Code: c1 80 39 00 75 f8 eb 0d 48 ff c1 48 ff ca 75 05 c6 01 00 eb 0e 40 8a 3e 48 ff c6 40 84 ff 40 88 39 75 e5 c9 c3 55 48 89 e5 8a 07 <8a> 16 48 ff c7 48 ff c6 38 d0 74 07 19 c0 83 c8 01 eb 06 84 c0
RIP  [<ffffffff813b1e98>] strcmp+0x6/0x21
 RSP <ffff88003f33fe00>
CR2: 0000000000000000
---[ end trace 73aaba243cb4fa42 ]---

I bisected it back to the following commit:

d05070091849015f8c5b7d55cd75b86ebb61b3ec is the first bad commit
commit d05070091849015f8c5b7d55cd75b86ebb61b3ec
Author: David J. Choi <david.choi@micrel.com>
Date:   Thu Apr 29 06:12:41 2010 +0000

    drivers/net/phy: micrel phy driver
    
    This is the first version of phy driver from Micrel Inc.
    
    Signed-off-by: David J. Choi <david.choi@micrel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

the config had:

  CONFIG_MICREL_PHY=y

Disabling the driver fixes the crash.

Thanks,

	Ingo


* Re: tip: origin tree boot crash (in the new micrel phy driver)
From: David Miller @ 2010-05-06 10:10 UTC (permalink / raw)
  To: mingo; +Cc: david.choi, torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20100506095940.GA12711@elte.hu>

From: Ingo Molnar <mingo@elte.hu>
Date: Thu, 6 May 2010 11:59:40 +0200

> I bisected it back to the following commit:
> 
> d05070091849015f8c5b7d55cd75b86ebb61b3ec is the first bad commit
> commit d05070091849015f8c5b7d55cd75b86ebb61b3ec
> Author: David J. Choi <david.choi@micrel.com>
> Date:   Thu Apr 29 06:12:41 2010 +0000
> 
>     drivers/net/phy: micrel phy driver
>     
>     This is the first version of phy driver from Micrel Inc.
>     
>     Signed-off-by: David J. Choi <david.choi@micrel.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> the config had:
> 
>   CONFIG_MICREL_PHY=y
> 
> Disabling the driver fixes the crash.

David please fix this crash, thanks.


* Re: tip: origin tree boot crash (in the new micrel phy driver)
From: David Miller @ 2010-05-06 10:14 UTC (permalink / raw)
  To: mingo; +Cc: david.choi, torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20100506.031040.186321896.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 06 May 2010 03:10:40 -0700 (PDT)

> From: Ingo Molnar <mingo@elte.hu>
> Date: Thu, 6 May 2010 11:59:40 +0200
> 
>> I bisected it back to the following commit:
>> 
>> d05070091849015f8c5b7d55cd75b86ebb61b3ec is the first bad commit
>> commit d05070091849015f8c5b7d55cd75b86ebb61b3ec
>> Author: David J. Choi <david.choi@micrel.com>
>> Date:   Thu Apr 29 06:12:41 2010 +0000
>> 
>>     drivers/net/phy: micrel phy driver
 ...
>> Disabling the driver fixes the crash.
> 
> David please fix this crash, thanks.

Nevermind, Ingo please test this fix:

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 0cd80e4..e67691d 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -32,6 +32,7 @@ static int kszphy_config_init(struct phy_device *phydev)
 
 static struct phy_driver ks8001_driver = {
 	.phy_id		= PHY_ID_KS8001,
+	.name		= "Micrel KS8001",
 	.phy_id_mask	= 0x00fffff0,
 	.features	= PHY_BASIC_FEATURES,
 	.flags		= PHY_POLL,
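
The call trace reported in this thread ends in strcmp(), reached through driver_find(), and the fix adds the missing .name field. As a rough userspace illustration of why a NULL name is a problem for a name-based lookup, the sketch below uses a hypothetical demo_driver table and a hypothetical demo_find() helper. Note that it guards against NULL names, which is a different technique from the fix above (the patch supplies the name instead); none of these identifiers exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver descriptor; not the kernel's struct. */
struct demo_driver {
	const char *name;	/* stays NULL when the initializer omits it */
	unsigned int phy_id;
};

/*
 * Look up a driver by name.  Entries with a NULL name are skipped instead
 * of being handed to strcmp(), which would dereference the NULL pointer.
 */
static const struct demo_driver *demo_find(const struct demo_driver *tbl,
					   size_t n, const char *name)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	if (!name)
		return NULL;
	for (i = 0; i < n; i++) {
		if (tbl[i].name && strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	}
	return NULL;
}
```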


* Re: tip: origin tree boot crash (in the new micrel phy driver)
From: Ingo Molnar @ 2010-05-06 10:44 UTC (permalink / raw)
  To: David Miller; +Cc: david.choi, torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20100506.031435.235688827.davem@davemloft.net>


* David Miller <davem@davemloft.net> wrote:

> From: David Miller <davem@davemloft.net>
> Date: Thu, 06 May 2010 03:10:40 -0700 (PDT)
> 
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Thu, 6 May 2010 11:59:40 +0200
> > 
> >> I bisected it back to the following commit:
> >> 
> >> d05070091849015f8c5b7d55cd75b86ebb61b3ec is the first bad commit
> >> commit d05070091849015f8c5b7d55cd75b86ebb61b3ec
> >> Author: David J. Choi <david.choi@micrel.com>
> >> Date:   Thu Apr 29 06:12:41 2010 +0000
> >> 
> >>     drivers/net/phy: micrel phy driver
>  ...
> >> Disabling the driver fixes the crash.
> > 
> > David please fix this crash, thanks.
> 
> Nevermind, Ingo please test this fix:
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 0cd80e4..e67691d 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -32,6 +32,7 @@ static int kszphy_config_init(struct phy_device *phydev)
>  
>  static struct phy_driver ks8001_driver = {
>  	.phy_id		= PHY_ID_KS8001,
> +	.name		= "Micrel KS8001",
>  	.phy_id_mask	= 0x00fffff0,
>  	.features	= PHY_BASIC_FEATURES,
>  	.flags		= PHY_POLL,

This fixes the crash, thanks Dave!

Tested-by: Ingo Molnar <mingo@elte.hu>

	Ingo


* Re: kernel panic when using netns+bridges+tc(netem)
From: Martín Ferrari @ 2010-05-06 11:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Arnd Bergmann, netdev, Mathieu Lacage, David Miller
In-Reply-To: <1273129187.2304.14.camel@edumazet-laptop>

Eric,

On Thu, May 6, 2010 at 08:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Could you please try following patch ?
> David, this is a stable candidate, once tested and acked, thanks !
>
> [PATCH] veth: Dont kfree_skb() after dev_forward_skb()

I have been running a bunch of benchmarks and it didn't cause any
trouble. The fix seems good.

Thanks a lot!!

-- 
Martín Ferrari


* [PATCH net-next-2.6] net: adjust handle_macvlan to pass port struct to hook
From: Jiri Pirko @ 2010-05-06 11:33 UTC (permalink / raw)
  To: netdev; +Cc: davem, kaber

Now there's a NULL check here and again in the hook. In the similar bridge
code, the port structure is rcu_dereferenced right away in handle_bridge
and passed to the hook. That looks nicer.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9a939d8..1b78c00 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,19 +145,15 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
+					    struct sk_buff *skb)
 {
 	const struct ethhdr *eth = eth_hdr(skb);
-	const struct macvlan_port *port;
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
-	port = rcu_dereference(skb->dev->macvlan_port);
-	if (port == NULL)
-		return skb;
-
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index b78a712..9ea047a 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -85,6 +85,7 @@ extern netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev);
 
 
-extern struct sk_buff *(*macvlan_handle_frame_hook)(struct sk_buff *);
+extern struct sk_buff *(*macvlan_handle_frame_hook)(struct macvlan_port *,
+						    struct sk_buff *);
 
 #endif /* _LINUX_IF_MACVLAN_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 36d53be..f316a37 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2605,7 +2605,8 @@ static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
 #endif
 
 #if defined(CONFIG_MACVLAN) || defined(CONFIG_MACVLAN_MODULE)
-struct sk_buff *(*macvlan_handle_frame_hook)(struct sk_buff *skb) __read_mostly;
+struct sk_buff *(*macvlan_handle_frame_hook)(struct macvlan_port *p,
+					     struct sk_buff *skb) __read_mostly;
 EXPORT_SYMBOL_GPL(macvlan_handle_frame_hook);
 
 static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
@@ -2613,14 +2614,17 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
 					     int *ret,
 					     struct net_device *orig_dev)
 {
-	if (skb->dev->macvlan_port == NULL)
+	struct macvlan_port *port;
+
+	port = rcu_dereference(skb->dev->macvlan_port);
+	if (!port)
 		return skb;
 
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
 	}
-	return macvlan_handle_frame_hook(skb);
+	return macvlan_handle_frame_hook(port, skb);
 }
 #else
 #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
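
The shape of this change (load the port pointer once in the caller, check it once, hand it to the hook) can be sketched in plain userspace C. All demo_* names below are hypothetical mock-ups, and the plain pointer load stands in for rcu_dereference(), which has no userspace equivalent here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock-ups; the real structures live in the kernel. */
struct demo_port { int frames; };
struct demo_dev  { struct demo_port *port; };
struct demo_skb  { struct demo_dev *dev; };

/* The hook receives an already-validated port, so it needs no NULL check. */
static struct demo_skb *demo_handle_frame(struct demo_port *port,
					  struct demo_skb *skb)
{
	(void)skb;		/* unused in this sketch */
	assert(port != NULL);
	port->frames++;
	return NULL;		/* frame consumed */
}

static struct demo_skb *(*demo_hook)(struct demo_port *, struct demo_skb *) =
	demo_handle_frame;

/* The caller loads the pointer once, checks it once, then passes it on. */
static struct demo_skb *demo_handle_macvlan(struct demo_skb *skb)
{
	struct demo_port *port = skb->dev->port; /* rcu_dereference() in kernel */

	if (!port)
		return skb;	/* not a macvlan port: leave skb untouched */
	return demo_hook(port, skb);
}
```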


* [PATCH] xfrm: fix policy unreferencing on larval drop
From: Timo Teras @ 2010-05-06 11:52 UTC (permalink / raw)
  To: netdev; +Cc: Timo Teras

I mistakenly had the error path use num_pols to decide how
many policies we need to drop (cruft from an earlier version of the
patch set, which did not handle socket policies right).

This is wrong since normally we do not keep explicit references
(instead we hold a reference to the cache entry, which holds references
to the policies). drop_pols is set to num_pols if we are holding the
references, so use that. Otherwise we eventually BUG_ON inside
xfrm_policy_destroy due to premature policy deletion.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
---
 net/xfrm/xfrm_policy.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 31f4ba4..f4ea3a0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1805,7 +1805,7 @@ restart:
 			/* EREMOTE tells the caller to generate
 			 * a one-shot blackhole route. */
 			dst_release(dst);
-			xfrm_pols_put(pols, num_pols);
+			xfrm_pols_put(pols, drop_pols);
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTNOSTATES);
 			return -EREMOTE;
 		}
-- 
1.6.3.3
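
To make the num_pols versus drop_pols distinction concrete, here is a small userspace sketch with a hypothetical refcounted demo_policy. It assumes drop_pols is 0 whenever the cache entry owns the references (the description above only states the num_pols case), so read it as an illustration of the accounting, not as the xfrm code.

```c
#include <assert.h>

/* Hypothetical refcounted policy; not the kernel's struct xfrm_policy. */
struct demo_policy { int refcnt; };

/* Drop one reference on each of the first n policies. */
static void demo_pols_put(struct demo_policy **pols, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		assert(pols[i]->refcnt > 0);	/* catches over-release */
		pols[i]->refcnt--;
	}
}

/*
 * Error path: release only the references this caller took itself.
 * Assumption: drop_pols is 0 when a cache entry owns the references,
 * and equals num_pols when the caller holds them.
 */
static void demo_error_path(struct demo_policy **pols, int num_pols,
			    int drop_pols)
{
	assert(drop_pols == 0 || drop_pols == num_pols);
	demo_pols_put(pols, drop_pols);
}
```

Passing num_pols unconditionally would decrement counts the caller never incremented, which corresponds to the premature release described above.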



* Re: TCP-MD5 checksum failure on x86_64 SMP
From: Bhaskar Dutta @ 2010-05-06 11:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Ben Hutchings, netdev
In-Reply-To: <1273085598.2367.233.camel@edumazet-laptop>

On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 05 mai 2010 à 23:33 +0530, Bhaskar Dutta a écrit :
>
>> Hi,
>>
>> TSO, GSO and SG are already turned off.
>> rx/tx checksumming is on, but that shouldn't matter, right?
>>
>> # ethtool -k eth0
>> Offload parameters for eth0:
>> rx-checksumming: on
>> tx-checksumming: on
>> scatter-gather: off
>> tcp segmentation offload: off
>> udp fragmentation offload: off
>> generic segmentation offload: off
>>
>> The bad packets are very small in size, most have no data at all (<300 bytes).
>>
>> After adding some logs to kernel 2.6.31-12, it seems that
>> tcp_v4_md5_hash_skb (function that calculates the md5 hash) is
>> (might?) getting corrupt.
>>
>> The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
>> and len fields get modified to different values towards the end of the
>> tcp_v4_md5_hash_skb function whenever there is a checksum error.
>>
>> The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
>> filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).
>>
>> Using a local copy of the tcp4_pseudohdr in the same function
>> tcp_v4_md5_hash_skb (copied all fields from the original
>> tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
>> checksum with the local tcp4_pseudohdr seems to solve the issue
>> (don't see bad packets for hours in load tests, and without the
>> change I can see them instantaneously in the load tests).
>>
>> I am still unable to figure out how this is happening. Please let me
>> know if you have any pointers.
>
> I am not familiar with this code, but I suspect the same per_cpu data can
> be used at the same time by a sender (process context) and by a receiver
> (softirq context).
>
> To trigger this, you need at least two active md5 sockets.
>
> tcp_get_md5sig_pool() should probably disable bh to make sure the current
> cpu won't be preempted by softirq processing.
>
>
> Something like :
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index fb5c66b..e232123 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
>        struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
>        if (!ret)
>                put_cpu();
> +       else
> +               local_bh_disable();
>        return ret;
>  }
>
>  static inline void             tcp_put_md5sig_pool(void)
>  {
>        __tcp_put_md5sig_pool();
> +       local_bh_enable();
>        put_cpu();
>  }
>
>
>

I put in the above change and ran some load tests with around 50
active TCP connections doing MD5.
I could see only 1 bad packet in 30 min (earlier the problem used to
occur instantaneously and repeatedly).

I think there is another possibility of being preempted when calling
tcp_alloc_md5sig_pool(): this function releases the spinlock when
calling __tcp_alloc_md5sig_pool().

I will run some more tests after changing tcp_alloc_md5sig_pool()
and see if the problem is completely resolved.

Thanks a lot for your help!
Bhaskar
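
The suspected failure mode (a shared scratch pseudo-header overwritten mid-computation) and the local-copy workaround mentioned in the quoted text can be mimicked in a single-threaded userspace sketch. Everything named demo_* is hypothetical, the "softirq" is just a function call placed where the interruption would occur, and an XOR stands in for MD5; it only shows why an on-stack copy is immune to the clobbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pseudo-header, loosely modelled on tcp4_pseudohdr. */
struct demo_pseudohdr {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t len;
};

/* Stand-in for the per-CPU scratch area shared by sender and receiver. */
static struct demo_pseudohdr demo_scratch;

/* Simulates a softirq that reuses the same scratch area mid-computation. */
static void demo_softirq(void)
{
	demo_scratch.saddr = 0xdeadbeef;
	demo_scratch.daddr = 0xdeadbeef;
	demo_scratch.len = 0xffff;
}

/* Toy checksum; the real code feeds these fields to MD5. */
static uint32_t demo_sum(const struct demo_pseudohdr *bp)
{
	assert(bp != NULL);
	return bp->saddr ^ bp->daddr ^ bp->len;
}

/* Racy variant: reads the shared scratch area after the "interrupt". */
static uint32_t demo_hash_shared(uint32_t s, uint32_t d, uint16_t len)
{
	demo_scratch = (struct demo_pseudohdr){ s, d, len };
	demo_softirq();
	return demo_sum(&demo_scratch);
}

/* Local-copy variant: the on-stack header cannot be clobbered. */
static uint32_t demo_hash_local(uint32_t s, uint32_t d, uint16_t len)
{
	struct demo_pseudohdr local = { s, d, len };

	demo_softirq();
	return demo_sum(&local);
}
```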


* Re: RTL-8110SC lockup with r8169
From: Pádraig Brady @ 2010-05-06 12:00 UTC (permalink / raw)
  To: netdev; +Cc: Francois Romieu, Glen Gray
In-Reply-To: <4BE1973D.8080502@draigBrady.com>

On 05/05/10 17:05, Pádraig Brady wrote:
> Hi,
> 
> We're having an issue with the r8169 driver, where very often
> (1 in 10 boots) it will lockup and our netboot system will hang.
> On this hardware previously, we used FC5 with the r1000 driver without issue.

I should have clarified that the lockup is at boot
when trying to bring up the interface.

I've now tried the Realtek r8169-6.013.00.tar.bz2 driver
dated April 28 2010, and it does _not_ have the issue.

cheers,
Pádraig.


* [PATCH net-next-2.6] net: Consistent skb timestamping
From: Eric Dumazet @ 2010-05-06 12:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert

With RPS inclusion, skb timestamping is not consistent in the RX path.

If netif_receive_skb() is used, it's deferred until after RPS dispatch.

If netif_rx() is used, it's done before RPS dispatch.

This can give strange tcpdump timestamp results.

I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (i.e. timestamps taken at the time the
packet was delivered by the NIC driver to our stack), even if NAPI can
already defer timestamping a bit (RPS can help to reduce the gap).

Remove timestamping from __netif_receive_skb(), and add it to
netif_receive_skb(), before RPS.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 36d53be..3278003 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1454,7 +1454,7 @@ void net_disable_timestamp(void)
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 
-static inline void net_timestamp(struct sk_buff *skb)
+static inline void net_timestamp_set(struct sk_buff *skb)
 {
 	if (atomic_read(&netstamp_needed))
 		__net_timestamp(skb);
@@ -1462,6 +1462,12 @@ static inline void net_timestamp(struct sk_buff *skb)
 		skb->tstamp.tv64 = 0;
 }
 
+static inline void net_timestamp_check(struct sk_buff *skb)
+{
+	if (!skb->tstamp.tv64 && atomic_read(&netstamp_needed))
+		__net_timestamp(skb);
+}
+
 /**
  * dev_forward_skb - loopback an skb to another netif
  *
@@ -1509,9 +1515,9 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 
 #ifdef CONFIG_NET_CLS_ACT
 	if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
-		net_timestamp(skb);
+		net_timestamp_set(skb);
 #else
-	net_timestamp(skb);
+	net_timestamp_set(skb);
 #endif
 
 	rcu_read_lock();
@@ -2458,8 +2464,7 @@ int netif_rx(struct sk_buff *skb)
 	if (netpoll_rx(skb))
 		return NET_RX_DROP;
 
-	if (!skb->tstamp.tv64)
-		net_timestamp(skb);
+	net_timestamp_check(skb);
 
 #ifdef CONFIG_RPS
 	{
@@ -2780,9 +2785,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	int ret = NET_RX_DROP;
 	__be16 type;
 
-	if (!skb->tstamp.tv64)
-		net_timestamp(skb);
-
 	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
 		return NET_RX_SUCCESS;
 
@@ -2899,23 +2901,27 @@ out:
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
+	net_timestamp_check(skb);
+
 #ifdef CONFIG_RPS
-	struct rps_dev_flow voidflow, *rflow = &voidflow;
-	int cpu, ret;
+	{
+		struct rps_dev_flow voidflow, *rflow = &voidflow;
+		int cpu, ret;
 
-	rcu_read_lock();
+		rcu_read_lock();
 
-	cpu = get_rps_cpu(skb->dev, skb, &rflow);
+		cpu = get_rps_cpu(skb->dev, skb, &rflow);
 
-	if (cpu >= 0) {
-		ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
-		rcu_read_unlock();
-	} else {
-		rcu_read_unlock();
-		ret = __netif_receive_skb(skb);
-	}
+		if (cpu >= 0) {
+			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+			rcu_read_unlock();
+		} else {
+			rcu_read_unlock();
+			ret = __netif_receive_skb(skb);
+		}
 
-	return ret;
+		return ret;
+	}
 #else
 	return __netif_receive_skb(skb);
 #endif
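
The difference between the two helpers introduced above can be shown with a small userspace mock. The demo_* names, the fake clock and the boolean flag are all hypothetical stand-ins (the kernel uses skb->tstamp, __net_timestamp() and the netstamp_needed atomic), so this only sketches the "set always" versus "stamp once if needed" behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's skb and netstamp_needed. */
struct demo_skb { int64_t tstamp; };	/* 0 means "not stamped yet" */

static bool demo_netstamp_needed;
static int64_t demo_clock = 100;	/* fake, strictly increasing clock */

static void demo_stamp(struct demo_skb *skb)
{
	skb->tstamp = demo_clock++;
}

/* Like net_timestamp_set(): always overwrite, clearing if not needed. */
static void demo_timestamp_set(struct demo_skb *skb)
{
	assert(skb != NULL);
	if (demo_netstamp_needed)
		demo_stamp(skb);
	else
		skb->tstamp = 0;
}

/* Like net_timestamp_check(): stamp only if needed and not yet stamped. */
static void demo_timestamp_check(struct demo_skb *skb)
{
	assert(skb != NULL);
	if (!skb->tstamp && demo_netstamp_needed)
		demo_stamp(skb);
}
```

Because the check variant leaves an existing stamp alone, calling it early in both receive entry points keeps the first timestamp in this mock, whichever path the packet takes afterwards.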




