Netdev List


* Re: Bridging behavior apparently changed around the Fedora 14 time
From: Ben Greear @ 2011-07-11 21:16 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Greg Scott, netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <20110711141028.19f0de46@nehalam.ftrdhcpuser.net>

On 07/11/2011 02:10 PM, Stephen Hemminger wrote:
> On Mon, 11 Jul 2011 16:08:14 -0500
> "Greg Scott"<GregScott@Infrasupport.com>  wrote:
>
>>> What about console dmesg output.
>>
>> I should probably turn off all firewall logging so I don't fill the ring
>> buffer with my log messages in, like, the first couple minutes after a
>> boot.  :)
>>
>>> Please retest with a standard upstream kernel (like 2.6.39.2).
>>
>> That's gonna take a while to put together a whole test environment with
>> the latest kernel.org kernel.
>>
>>> The bridge itself puts the device into promiscuous mode already.
>
> The bridge code calls dev_set_promiscuity() which should
> be changing device mode. But it could be that netdev core is
> resetting/changing/breaking that.

Last time I checked, 'ifconfig' and similar output didn't
show promisc when NIC was actually promisc, unless the user
specified the promisc-ness.

You can read /sys/class/net/eth0/flags and
see if flag 0x100 is set. If so, it's promisc.
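
A minimal userspace sketch of that check, assuming the flags file is
/sys/class/net/<ifname>/flags and holds a single hex value such as
"0x1103". The helper names are hypothetical; 0x100 is the IFF_PROMISC
bit mentioned above.

```c
#include <stdio.h>
#include <stdlib.h>

#define PROMISC_FLAG 0x100UL	/* IFF_PROMISC */

/* Parse the text of a sysfs "flags" file (e.g. "0x1103\n").
 * Returns 1 if the promiscuous bit is set, 0 if not, -1 on bad input. */
int flags_text_is_promisc(const char *text)
{
	char *end;
	unsigned long flags = strtoul(text, &end, 16);

	if (end == text)
		return -1;
	return (flags & PROMISC_FLAG) ? 1 : 0;
}

/* Read /sys/class/net/<ifname>/flags (path layout is an assumption).
 * Returns -1 if the file cannot be read. */
int iface_is_promisc(const char *ifname)
{
	char path[128], buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return flags_text_is_promisc(buf);
}
```

Calling iface_is_promisc("eth0") on each bridge port would show whether
the bridge really switched the physical device into promiscuous mode.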

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* RE: Bridging behavior apparently changed around the Fedora 14 time
From: Greg Scott @ 2011-07-11 21:16 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <20110711141028.19f0de46@nehalam.ftrdhcpuser.net>

> The bridge code calls dev_set_promiscuity() which should
> be changing device mode. But it could be that netdev core is 
> resetting/changing/breaking that.

Is it supposed to change the physical ethnn devices or the br device?

Here is what I do to set up the bridging. I do it myself right in the
script so I can control all the details.

.
.
.
#
# Setup bridging
#

echo "Setting up bridge $BR_IFACE to bridge $INET_IFACE with $TRUSTED1_IFACE"

$BRCTL addbr $BR_IFACE
$BRCTL addif $BR_IFACE $INET_IFACE
$BRCTL addif $BR_IFACE $TRUSTED1_IFACE

echo "  Adding $BR_IP_SLASH and $TRUSTED1_IP_SLASH IP Addresses to $BR_IFACE"
/sbin/ip addr add $BR_IP_SLASH broadcast $BR_BCAST_ADDRESS dev $BR_IFACE
/sbin/ip addr add $TRUSTED1_IP_SLASH broadcast $TRUSTED1_BCAST_ADDRESS dev $BR_IFACE
/sbin/ip link set $BR_IFACE up

echo "  Removing $INET_IP_SLASH and $TRUSTED1_IP_SLASH from $INET_IFACE and $TRUSTED1_IFACE"
/sbin/ip addr del $INET_IP_SLASH dev $INET_IFACE
/sbin/ip addr del $INET_IP_SLASH dev $INET_IFACE
/sbin/ip addr del $TRUSTED1_IP_SLASH dev $TRUSTED1_IFACE
/sbin/ip addr del $TRUSTED1_IP_SLASH dev $TRUSTED1_IFACE

echo "  Putting $BR_IFACE into promiscuous mode"
# This fixes a bug forwarding packets bound for external IP Addresses
# from the private LAN.

ip link set $BR_IFACE promisc on

#
# Set up aliases for public IP addresses
#
.
.
.


- Greg




* Re: sch_generic warn_on (timed out)
From: David Miller @ 2011-07-11 21:17 UTC (permalink / raw)
  To: davej; +Cc: netdev
In-Reply-To: <20110711204834.GA4950@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Mon, 11 Jul 2011 16:48:34 -0400

> We've received quite a few bug reports in Fedora recently concerning this warning in
> sch_generic..
> 
>             WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
>                       dev->name, netdev_drivername(dev, drivername, 64), i);
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=702723 is our 'master bug' that we're
> duping all others against. It seems to be showing up on a variety of different
> hardware (r8169, atl1c, ipheth, e1000e, 8139too). Do all these drivers need
> fixing ? or is it just 'crap hardware' ?
> 
> note that I've only been looking through fedora 15 bugs so far (which is still on 2.6.38),
> but looking at the commit log for sch_generic, it doesn't seem that there's anything
> obvious that needs backporting.

It means the transmitter stopped sending packets for several seconds.

I would track this on a per-device basis if I were you, instead of
combining them all into one super-bug.
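
As a rough model of when that warning fires: the watchdog complains when
a transmit queue is stopped and nothing has been sent within the
device's timeout. The sketch below is a userspace approximation under
that assumption; the function names are hypothetical, and the wrap-safe
comparison follows the style of the kernel's time_after().

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b" for a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True if a stopped queue has gone longer than `timeout` ticks since
 * its last transmit, i.e. the condition the watchdog warns about. */
bool tx_queue_timed_out(bool queue_stopped, unsigned long now,
			unsigned long last_tx, unsigned long timeout)
{
	return queue_stopped && tick_after(now, last_tx + timeout);
}
```

Because the check is per queue and per device, a stall on an r8169 and
a stall on an e1000e trip the same warning for unrelated reasons.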


* Re: sch_generic warn_on (timed out)
From: Eric Dumazet @ 2011-07-11 21:20 UTC (permalink / raw)
  To: David Miller; +Cc: davej, netdev
In-Reply-To: <20110711.141701.1453197953333902027.davem@davemloft.net>

Le lundi 11 juillet 2011 à 14:17 -0700, David Miller a écrit :
> From: Dave Jones <davej@redhat.com>
> Date: Mon, 11 Jul 2011 16:48:34 -0400
> 
> > We've received quite a few bug reports in Fedora recently concerning this warning in
> > sch_generic..
> > 
> >             WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
> >                       dev->name, netdev_drivername(dev, drivername, 64), i);
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=702723 is our 'master bug' that we're
> > duping all others against. It seems to be showing up on a variety of different
> > hardware (r8169, atl1c, ipheth, e1000e, 8139too). Do all these drivers need
> > fixing ? or is it just 'crap hardware' ?
> > 
> > note that I've only been looking through fedora 15 bugs so far (which is still on 2.6.38),
> > but looking at the commit log for sch_generic, it doesn't seem that there's anything
> > obvious that needs backporting.
> 
> It means the transmitter stopped sending packets for several seconds.
> 
> I would track this on a per-device basis if I were you, instead of
> combining them all into one super-bug.

Last time I took a look (on one r8169 NIC), it wasn't clear whether this
could be a PAUSE problem.




* Re: [PATCH 3/5] ath5k: Add missing breaks in switch/case
From: Nick Kossifidis @ 2011-07-11 21:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: Jiri Slaby, Luis R. Rodriguez, Bob Copeland, John W. Linville,
	linux-wireless, ath5k-devel, netdev, linux-kernel
In-Reply-To: <e554bed5c753a3fe85d3be34983a15e6110a6e35.1310289795.git.joe@perches.com>

2011/7/10 Joe Perches <joe@perches.com>:
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  drivers/net/wireless/ath/ath5k/desc.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath5k/desc.c b/drivers/net/wireless/ath/ath5k/desc.c
> index 62172d5..f82383b 100644
> --- a/drivers/net/wireless/ath/ath5k/desc.c
> +++ b/drivers/net/wireless/ath/ath5k/desc.c
> @@ -107,10 +107,13 @@ ath5k_hw_setup_2word_tx_desc(struct ath5k_hw *ah, struct ath5k_desc *desc,
>                case AR5K_PKT_TYPE_BEACON:
>                case AR5K_PKT_TYPE_PROBE_RESP:
>                        frame_type = AR5K_AR5210_TX_DESC_FRAME_TYPE_NO_DELAY;
> +                       break;
>                case AR5K_PKT_TYPE_PIFS:
>                        frame_type = AR5K_AR5210_TX_DESC_FRAME_TYPE_PIFS;
> +                       break;
>                default:
>                        frame_type = type;
> +                       break;
>                }
>
>                tx_ctl->tx_control_0 |=

Acked-by: Nick Kossifidis <mickflemm@gmail.com>



-- 
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick


* Re: Bridging behavior apparently changed around the Fedora 14 time
From: Stephen Hemminger @ 2011-07-11 21:24 UTC (permalink / raw)
  To: Greg Scott; +Cc: netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <925A849792280C4E80C5461017A4B8A2A040F7@mail733.InfraSupportEtc.com>

On Mon, 11 Jul 2011 16:16:40 -0500
"Greg Scott" <GregScott@Infrasupport.com> wrote:

> > The bridge code calls dev_set_promiscuity() which should
> > be changing device mode. But it could be that netdev core is 
> > resetting/changing/breaking that.
> 
> Is it supposed to change the physical ethnn devices or the br device?

The physical device ethnn.

> Here is what I do to set up the bridging. I do it myself right in the
> script so I can control all the details.
> 
> .
> .
> .
> #
> # Setup bridging
> #
> 
> echo "Setting up bridge $BR_IFACE to bridge $INET_IFACE with $TRUSTED1_IFACE"
> 
> $BRCTL addbr $BR_IFACE
> $BRCTL addif $BR_IFACE $INET_IFACE
> $BRCTL addif $BR_IFACE $TRUSTED1_IFACE
> 
> echo "  Adding $BR_IP_SLASH and $TRUSTED1_IP_SLASH IP Addresses to $BR_IFACE"
> /sbin/ip addr add $BR_IP_SLASH broadcast $BR_BCAST_ADDRESS dev $BR_IFACE
> /sbin/ip addr add $TRUSTED1_IP_SLASH broadcast $TRUSTED1_BCAST_ADDRESS dev $BR_IFACE
> /sbin/ip link set $BR_IFACE up
> 
> echo "  Removing $INET_IP_SLASH and $TRUSTED1_IP_SLASH from $INET_IFACE and $TRUSTED1_IFACE"
> /sbin/ip addr del $INET_IP_SLASH dev $INET_IFACE
> /sbin/ip addr del $INET_IP_SLASH dev $INET_IFACE
> /sbin/ip addr del $TRUSTED1_IP_SLASH dev $TRUSTED1_IFACE
> /sbin/ip addr del $TRUSTED1_IP_SLASH dev $TRUSTED1_IFACE
> 
> echo "  Putting $BR_IFACE into promiscuous mode"
> # This fixes a bug forwarding packets bound for external IP Addresses
> # from the private LAN.
> 
> ip link set $BR_IFACE promisc on
> 

What is supposed to happen is that the bridge adds all the interface
MAC addresses to the forwarding table as permanent entries. To show the
forwarding table:
  # brctl showmacs br0

port no	mac addr		is local?	ageing timer
  1	c6:eb:2a:0c:12:eb	yes		   0.00

Then when a packet arrives with that MAC address, it is handed up to
netfilter and, if not firewalled, goes on to the IP stack.

There may be protections against a packet going back out the same
interface that are getting tripped by all the rewriting.
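
A toy model of that lookup, assuming a flat table and hypothetical names
(fdb_entry, bridge_verdict); the kernel's real data structures differ.
It only illustrates the decision: a destination MAC that matches a local
entry goes up the stack, a learned entry is forwarded to its port, and
an unknown destination is flooded.

```c
#include <stddef.h>
#include <string.h>

struct fdb_entry {
	unsigned char mac[6];
	int port;
	int is_local;	/* the "is local?" column of brctl showmacs */
};

enum verdict { VERDICT_DELIVER_LOCAL, VERDICT_FORWARD, VERDICT_FLOOD };

/* Decide what to do with a frame addressed to `dst`. */
enum verdict bridge_verdict(const struct fdb_entry *tbl, size_t n,
			    const unsigned char dst[6])
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (memcmp(tbl[i].mac, dst, 6) == 0)
			return tbl[i].is_local ? VERDICT_DELIVER_LOCAL
					       : VERDICT_FORWARD;
	}
	return VERDICT_FLOOD;
}
```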





* [PATCH net-next-2.6 1/2] net: introduce __netdev_alloc_skb_ip_align
From: Eric Dumazet @ 2011-07-11 21:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

RX rings should use GFP_KERNEL allocations if possible, add
__netdev_alloc_skb_ip_align() helper to ease this.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 32ada53..a24218c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1579,16 +1579,22 @@ static inline struct sk_buff *netdev_alloc_skb(struct net_device *dev,
 	return __netdev_alloc_skb(dev, length, GFP_ATOMIC);
 }
 
-static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
-		unsigned int length)
+static inline struct sk_buff *__netdev_alloc_skb_ip_align(struct net_device *dev,
+		unsigned int length, gfp_t gfp)
 {
-	struct sk_buff *skb = netdev_alloc_skb(dev, length + NET_IP_ALIGN);
+	struct sk_buff *skb = __netdev_alloc_skb(dev, length + NET_IP_ALIGN, gfp);
 
 	if (NET_IP_ALIGN && skb)
 		skb_reserve(skb, NET_IP_ALIGN);
 	return skb;
 }
 
+static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
+		unsigned int length)
+{
+	return __netdev_alloc_skb_ip_align(dev, length, GFP_ATOMIC);
+}
+
 /**
  *	__netdev_alloc_page - allocate a page for ps-rx on a specific device
  *	@dev: network device to receive on




* [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
From: Eric Dumazet @ 2011-07-11 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ben Greear, Bruce Allan

Note: This patch is untested; I don't have the hardware.

Thanks

[PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time

In process and sleep allowed context, favor GFP_KERNEL allocations over
GFP_ATOMIC ones.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ben Greear <greearb@candelatech.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
---
 drivers/net/e1000e/e1000.h  |    2 +-
 drivers/net/e1000e/netdev.c |   33 +++++++++++++++++----------------
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index c1e7f94..2de6fc8 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -341,7 +341,7 @@ struct e1000_adapter {
 			  int *work_done, int work_to_do)
 						____cacheline_aligned_in_smp;
 	void (*alloc_rx_buf) (struct e1000_adapter *adapter,
-			      int cleaned_count);
+			      int cleaned_count, gfp_t gfp);
 	struct e1000_ring *rx_ring;
 
 	u32 rx_int_delay;
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index ed7a93d..365a324 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -523,7 +523,7 @@ static void e1000_rx_checksum(struct e1000_adapter *adapter, u32 status_err,
  * @adapter: address of board private structure
  **/
 static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
-				   int cleaned_count)
+				   int cleaned_count, gfp_t gfp)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
@@ -544,7 +544,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 			goto map_skb;
 		}
 
-		skb = netdev_alloc_skb_ip_align(netdev, bufsz);
+		skb = __netdev_alloc_skb_ip_align(netdev, bufsz, gfp);
 		if (!skb) {
 			/* Better luck next round */
 			adapter->alloc_rx_buff_failed++;
@@ -589,7 +589,7 @@ map_skb:
  * @adapter: address of board private structure
  **/
 static void e1000_alloc_rx_buffers_ps(struct e1000_adapter *adapter,
-				      int cleaned_count)
+				      int cleaned_count, gfp_t gfp)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
@@ -615,7 +615,7 @@ static void e1000_alloc_rx_buffers_ps(struct e1000_adapter *adapter,
 				continue;
 			}
 			if (!ps_page->page) {
-				ps_page->page = alloc_page(GFP_ATOMIC);
+				ps_page->page = alloc_page(gfp);
 				if (!ps_page->page) {
 					adapter->alloc_rx_buff_failed++;
 					goto no_buffers;
@@ -641,8 +641,9 @@ static void e1000_alloc_rx_buffers_ps(struct e1000_adapter *adapter,
 			    cpu_to_le64(ps_page->dma);
 		}
 
-		skb = netdev_alloc_skb_ip_align(netdev,
-						adapter->rx_ps_bsize0);
+		skb = __netdev_alloc_skb_ip_align(netdev,
+						  adapter->rx_ps_bsize0,
+						  gfp);
 
 		if (!skb) {
 			adapter->alloc_rx_buff_failed++;
@@ -692,7 +693,7 @@ no_buffers:
  **/
 
 static void e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
-                                         int cleaned_count)
+					 int cleaned_count, gfp_t gfp)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
@@ -713,7 +714,7 @@ static void e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 			goto check_page;
 		}
 
-		skb = netdev_alloc_skb_ip_align(netdev, bufsz);
+		skb = __netdev_alloc_skb_ip_align(netdev, bufsz, gfp);
 		if (unlikely(!skb)) {
 			/* Better luck next round */
 			adapter->alloc_rx_buff_failed++;
@@ -724,7 +725,7 @@ static void e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
 check_page:
 		/* allocate a new page if necessary */
 		if (!buffer_info->page) {
-			buffer_info->page = alloc_page(GFP_ATOMIC);
+			buffer_info->page = alloc_page(gfp);
 			if (unlikely(!buffer_info->page)) {
 				adapter->alloc_rx_buff_failed++;
 				break;
@@ -888,7 +889,7 @@ next_desc:
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
-			adapter->alloc_rx_buf(adapter, cleaned_count);
+			adapter->alloc_rx_buf(adapter, cleaned_count, GFP_ATOMIC);
 			cleaned_count = 0;
 		}
 
@@ -900,7 +901,7 @@ next_desc:
 
 	cleaned_count = e1000_desc_unused(rx_ring);
 	if (cleaned_count)
-		adapter->alloc_rx_buf(adapter, cleaned_count);
+		adapter->alloc_rx_buf(adapter, cleaned_count, GFP_ATOMIC);
 
 	adapter->total_rx_bytes += total_rx_bytes;
 	adapter->total_rx_packets += total_rx_packets;
@@ -1230,7 +1231,7 @@ next_desc:
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
-			adapter->alloc_rx_buf(adapter, cleaned_count);
+			adapter->alloc_rx_buf(adapter, cleaned_count, GFP_ATOMIC);
 			cleaned_count = 0;
 		}
 
@@ -1244,7 +1245,7 @@ next_desc:
 
 	cleaned_count = e1000_desc_unused(rx_ring);
 	if (cleaned_count)
-		adapter->alloc_rx_buf(adapter, cleaned_count);
+		adapter->alloc_rx_buf(adapter, cleaned_count, GFP_ATOMIC);
 
 	adapter->total_rx_bytes += total_rx_bytes;
 	adapter->total_rx_packets += total_rx_packets;
@@ -1411,7 +1412,7 @@ next_desc:
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (unlikely(cleaned_count >= E1000_RX_BUFFER_WRITE)) {
-			adapter->alloc_rx_buf(adapter, cleaned_count);
+			adapter->alloc_rx_buf(adapter, cleaned_count, GFP_ATOMIC);
 			cleaned_count = 0;
 		}
 
@@ -1423,7 +1424,7 @@ next_desc:
 
 	cleaned_count = e1000_desc_unused(rx_ring);
 	if (cleaned_count)
-		adapter->alloc_rx_buf(adapter, cleaned_count);
+		adapter->alloc_rx_buf(adapter, cleaned_count, GFP_ATOMIC);
 
 	adapter->total_rx_bytes += total_rx_bytes;
 	adapter->total_rx_packets += total_rx_packets;
@@ -3105,7 +3106,7 @@ static void e1000_configure(struct e1000_adapter *adapter)
 	e1000_configure_tx(adapter);
 	e1000_setup_rctl(adapter);
 	e1000_configure_rx(adapter);
-	adapter->alloc_rx_buf(adapter, e1000_desc_unused(adapter->rx_ring));
+	adapter->alloc_rx_buf(adapter, e1000_desc_unused(adapter->rx_ring), GFP_KERNEL);
 }
 
 /**




* [PATCH RFC] vhost: address fixme in vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-07-11 22:04 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1309991321.10209.26.camel@localhost.localdomain>

So the following should do it, on top of Shirley's patch, I think.  I'm
not quite sure about using vq->upend_idx - vq->done_idx to check the
number of outstanding DMAs. Shirley, what do you think?
Untested.
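
One point that may be worth checking in that test: the patch advances
upend_idx modulo UIO_MAXIOV, so once upend_idx wraps, the plain
difference goes negative and can never exceed VHOST_MAX_PEND. The sketch
below assumes both indices stay in [0, UIO_MAXIOV), that done_idx wraps
the same way, and that the ring has 1024 slots; the helper names are
hypothetical.

```c
#define RING_SLOTS 1024	/* assumed to match UIO_MAXIOV */

/* Advance an index the way the patch advances upend_idx. */
int ring_next(int idx)
{
	return (idx + 1) % RING_SLOTS;
}

/* Plain subtraction, as written in the RFC. */
int outstanding_naive(int upend_idx, int done_idx)
{
	return upend_idx - done_idx;
}

/* Wrap-aware count of slots between done_idx and upend_idx. */
int outstanding_wrapped(int upend_idx, int done_idx)
{
	return (upend_idx - done_idx + RING_SLOTS) % RING_SLOTS;
}
```

With done_idx at 1020 and upend_idx at 5, nine buffers are in flight,
but the naive difference is -1015.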

I'm also thinking about making the use of this conditional
on a module parameter, off by default, to reduce
stability risk while still enabling more people to
test the feature.
Thoughts?

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7de0c6e..cf8deb3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -156,8 +156,7 @@ static void handle_tx(struct vhost_net *net)
 
 	for (;;) {
 		/* Release DMAs done buffers first */
-		if (atomic_read(&vq->refcnt) > VHOST_MAX_PEND)
-			vhost_zerocopy_signal_used(vq);
+		vhost_zerocopy_signal_used(vq);
 
 		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 					 ARRAY_SIZE(vq->iov),
@@ -175,7 +174,7 @@ static void handle_tx(struct vhost_net *net)
 				break;
 			}
 			/* If more outstanding DMAs, queue the work */
-			if (atomic_read(&vq->refcnt) > VHOST_MAX_PEND) {
+			if (vq->upend_idx - vq->done_idx > VHOST_MAX_PEND) {
 				tx_poll_start(net, sock);
 				set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
 				break;
@@ -214,12 +213,12 @@ static void handle_tx(struct vhost_net *net)
 
 				vq->heads[vq->upend_idx].len = len;
 				ubuf->callback = vhost_zerocopy_callback;
-				ubuf->arg = vq;
+				ubuf->arg = vq->ubufs;
 				ubuf->desc = vq->upend_idx;
 				msg.msg_control = ubuf;
 				msg.msg_controllen = sizeof(ubuf);
+				kref_get(&vq->ubufs->kref);
 			}
-			atomic_inc(&vq->refcnt);
 			vq->upend_idx = (vq->upend_idx + 1) % UIO_MAXIOV;
 		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
@@ -646,6 +645,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 {
 	struct socket *sock, *oldsock;
 	struct vhost_virtqueue *vq;
+	struct vhost_ubuf_ref *ubufs, *oldubufs = NULL;
 	int r;
 
 	mutex_lock(&n->dev.mutex);
@@ -675,6 +675,13 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	oldsock = rcu_dereference_protected(vq->private_data,
 					    lockdep_is_held(&vq->mutex));
 	if (sock != oldsock) {
+		ubufs = vhost_ubuf_alloc(vq, sock);
+		if (IS_ERR(ubufs)) {
+			r = PTR_ERR(ubufs);
+			goto err_ubufs;
+		}
+		oldubufs = vq->ubufs;
+		vq->ubufs = ubufs;
 		vhost_net_disable_vq(n, vq);
 		rcu_assign_pointer(vq->private_data, sock);
 		vhost_net_enable_vq(n, vq);
@@ -682,6 +689,9 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 
 	mutex_unlock(&vq->mutex);
 
+	if (oldubufs)
+		vhost_ubuf_put_and_wait(oldubufs);
+
 	if (oldsock) {
 		vhost_net_flush_vq(n, index);
 		fput(oldsock->file);
@@ -690,6 +700,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	mutex_unlock(&n->dev.mutex);
 	return 0;
 
+err_ubufs:
+	fput(sock->file);
 err_vq:
 	mutex_unlock(&vq->mutex);
 err:
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index db242b1..81b1dd7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -181,7 +181,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->log_ctx = NULL;
 	vq->upend_idx = 0;
 	vq->done_idx = 0;
-	atomic_set(&vq->refcnt, 0);
+	vq->ubufs = NULL;
 }
 
 static int vhost_worker(void *data)
@@ -401,7 +401,7 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
  * guest used idx.
  */
-void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
 {
 	int i, j = 0;
 
@@ -414,10 +414,9 @@ void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
 		} else
 			break;
 	}
-	if (j) {
+	if (j)
 		vq->done_idx = i;
-		atomic_sub(j, &vq->refcnt);
-	}
+	return j;
 }
 
 /* Caller should have device mutex */
@@ -430,9 +429,13 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 			vhost_poll_stop(&dev->vqs[i].poll);
 			vhost_poll_flush(&dev->vqs[i].poll);
 		}
-		/* Wait for all lower device DMAs done (busywait FIXME) */
-		while (atomic_read(&dev->vqs[i].refcnt))
-			vhost_zerocopy_signal_used(&dev->vqs[i]);
+		/* Wait for all lower device DMAs done. */
+		if (dev->vqs[i].ubufs)
+			vhost_ubuf_put_and_wait(dev->vqs[i].ubufs);
+
+		/* Signal guest as appropriate. */
+		vhost_zerocopy_signal_used(&dev->vqs[i]);
+
 		if (dev->vqs[i].error_ctx)
 			eventfd_ctx_put(dev->vqs[i].error_ctx);
 		if (dev->vqs[i].error)
@@ -645,11 +648,6 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 
 	mutex_lock(&vq->mutex);
 
-	/* clean up lower device outstanding DMAs, before setting ring
-	   busywait FIXME */
-	while (atomic_read(&vq->refcnt))
-		vhost_zerocopy_signal_used(vq);
-
 	switch (ioctl) {
 	case VHOST_SET_VRING_NUM:
 		/* Resizing ring with an active backend?
@@ -1525,12 +1523,46 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	}
 }
 
+static void vhost_zerocopy_done_signal(struct kref *kref)
+{
+	struct vhost_ubuf_ref *ubufs = container_of(kref, struct vhost_ubuf_ref,
+						    kref);
+	wake_up(&ubufs->wait);
+}
+
+struct vhost_ubuf_ref *vhost_ubuf_alloc(struct vhost_virtqueue *vq,
+					void * private_data)
+{
+	struct vhost_ubuf_ref *ubufs;
+	/* No backend? Nothing to count. */
+	if (!private_data)
+		return NULL;
+	ubufs = kmalloc(sizeof *ubufs, GFP_KERNEL);
+	if (!ubufs)
+		return ERR_PTR(-ENOMEM);
+	kref_init(&ubufs->kref);
+	kref_get(&ubufs->kref);
+	init_waitqueue_head(&ubufs->wait);
+	ubufs->vq = vq;
+	return ubufs;
+}
+
+void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
+{
+	kref_put(&ubufs->kref, vhost_zerocopy_done_signal); 
+	wait_event(ubufs->wait, !atomic_read(&ubufs->kref.refcount));
+	kfree(ubufs);
+}
+
 void vhost_zerocopy_callback(void *arg)
 {
 	struct ubuf_info *ubuf = (struct ubuf_info *)arg;
+	struct vhost_ubuf_ref *ubufs;
 	struct vhost_virtqueue *vq;
 
-	vq = (struct vhost_virtqueue *)ubuf->arg;
+	ubufs = ubuf->arg;
+	vq = ubufs->vq;
 	/* set len = 1 to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
+	kref_put(&ubufs->kref, vhost_zerocopy_done_signal); 
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 883688c..b42b126 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -55,6 +55,17 @@ struct vhost_log {
 	u64 len;
 };
 
+struct vhost_virtqueue;
+
+struct vhost_ubuf_ref {
+	struct kref kref;
+	wait_queue_head_t wait;
+	struct vhost_virtqueue *vq;
+};
+
+struct vhost_ubuf_ref *vhost_ubuf_alloc(struct vhost_virtqueue *, void *);
+void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *);
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -127,6 +138,9 @@ struct vhost_virtqueue {
 	int done_idx;
 	/* an array of userspace buffers info */
 	struct ubuf_info *ubuf_info;
+	/* Reference counting for outstanding ubufs.
+	 * Protected by vq mutex. Writers must also take device mutex. */
+	struct vhost_ubuf_ref *ubufs;
 };
 
 struct vhost_dev {
@@ -174,7 +188,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len);
 void vhost_zerocopy_callback(void *arg);
-void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
+int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
 
 #define vq_err(vq, fmt, ...) do {                                  \
 		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \


* Re: [PATCH net-next-2.6 1/2] net: introduce __netdev_alloc_skb_ip_align
From: Ben Hutchings @ 2011-07-11 22:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1310421164.2860.21.camel@edumazet-laptop>

On Mon, 2011-07-11 at 23:52 +0200, Eric Dumazet wrote:
> RX rings should use GFP_KERNEL allocations if possible, add
> __netdev_alloc_skb_ip_align() helper to ease this.
[...]

When is it possible, other than when starting an interface?  RX refill
normally has to be done in NAPI context or at least synchronised with
NAPI, so GFP_ATOMIC is required.  Not sure how much point there is in
making a special case for the initial fill.
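
For readers following the thread, the shape of the helper under
discussion can be modelled in userspace C. This is only an analogy:
malloc() stands in for the skb allocator, the alloc_mode enum stands in
for GFP_ATOMIC / GFP_KERNEL, the 2-byte pad is an assumed NET_IP_ALIGN
value, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define IP_ALIGN_PAD 2	/* assumed stand-in for NET_IP_ALIGN */
#define ETH_HDR_LEN 14	/* Ethernet header length */

enum alloc_mode { ALLOC_ATOMIC, ALLOC_MAY_SLEEP };

struct rx_buf {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* where the frame will be written */
	enum alloc_mode mode;
};

/* Analogue of __netdev_alloc_skb_ip_align(): the caller picks the mode. */
int rx_buf_alloc_mode(struct rx_buf *b, size_t len, enum alloc_mode mode)
{
	b->head = malloc(len + IP_ALIGN_PAD);
	if (!b->head)
		return -1;
	/* Reserve the pad so the IP header lands on a 4-byte boundary. */
	b->data = b->head + IP_ALIGN_PAD;
	b->mode = mode;
	return 0;
}

/* Analogue of netdev_alloc_skb_ip_align(): existing callers stay atomic. */
int rx_buf_alloc(struct rx_buf *b, size_t len)
{
	return rx_buf_alloc_mode(b, len, ALLOC_ATOMIC);
}
```

Only a caller that knows it may sleep, such as the initial ring fill,
would pass ALLOC_MAY_SLEEP; the refill path keeps the atomic default.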

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH net-next-2.6 1/2] net: introduce __netdev_alloc_skb_ip_align
From: Michał Mirosław @ 2011-07-11 23:12 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eric Dumazet, David Miller, netdev
In-Reply-To: <1310425028.2707.12.camel@bwh-desktop>

2011/7/12 Ben Hutchings <bhutchings@solarflare.com>:
> On Mon, 2011-07-11 at 23:52 +0200, Eric Dumazet wrote:
>> RX rings should use GFP_KERNEL allocations if possible, add
>> __netdev_alloc_skb_ip_align() helper to ease this.
> [...]
> When is it possible, other than when starting an interface?  RX refill
> normally has to be done in NAPI context or at least synchronised with
> NAPI, so GFP_ATOMIC is required.  Not sure how much point there is in
> making a special case for the initial fill.

Drivers can also refill buffers later, even from a separate thread. Ring
refilling can be made lockless provided proper access ordering is preserved.

Best Regards,
Michał Mirosław


* Re: [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
From: Jeff Kirsher @ 2011-07-11 23:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Ben Greear, Bruce Allan
In-Reply-To: <1310421182.2860.22.camel@edumazet-laptop>

On Mon, Jul 11, 2011 at 14:53, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Note: This patch is untested; I don't have the hardware.
>
> Thanks
>
> [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
>
> In process and sleep allowed context, favor GFP_KERNEL allocations over
> GFP_ATOMIC ones.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Ben Greear <greearb@candelatech.com>
> CC: Bruce Allan <bruce.w.allan@intel.com>
> ---
>  drivers/net/e1000e/e1000.h  |    2 +-
>  drivers/net/e1000e/netdev.c |   33 +++++++++++++++++----------------
>  2 files changed, 18 insertions(+), 17 deletions(-)
>

Thanks Eric!  I have added the patch to my queue.

-- 
Cheers,
Jeff


* RE: [net-next 5/5] ixgbe: convert to ndo_fix_features
From: Skidmore, Donald C @ 2011-07-11 23:53 UTC (permalink / raw)
  To: Michal Miroslaw, Kirsher, Jeffrey T
  Cc: davem@davemloft.net, netdev@vger.kernel.org, gospo@redhat.com
In-Reply-To: <20110709121108.GA7720@rere.qmqm.pl>

>-----Original Message-----
>From: Michal Miroslaw [mailto:mirq-linux@rere.qmqm.pl]
>Sent: Saturday, July 09, 2011 5:11 AM
>To: Kirsher, Jeffrey T
>Cc: davem@davemloft.net; Skidmore, Donald C; netdev@vger.kernel.org;
>gospo@redhat.com
>Subject: Re: [net-next 5/5] ixgbe: convert to ndo_fix_features
>
>On Sat, Jul 09, 2011 at 04:50:40AM -0700, Jeff Kirsher wrote:
>> From: Don Skidmore <donald.c.skidmore@intel.com>
>>
>> Private rx_csum flags are now duplicate of netdev->features &
>> NETIF_F_RXCSUM.  We remove those duplicates and now use the
>net_device_ops
>> ndo_set_features.  This was based on the original patch submitted by
>> Michal Miroslaw <mirq-linux@rere.qmqm.pl>.  I also removed the special
>> case not requiring a reset for X540 hardware.  It is needed just as it
>is
>> in 82599 hardware.
>[...]
>> --- a/drivers/net/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ixgbe/ixgbe_main.c
>> @@ -7188,6 +7188,88 @@ int ixgbe_setup_tc(struct net_device *dev, u8
>tc)
>[...]
>> +static int ixgbe_set_features(struct net_device *netdev, u32 data)
>> +{
>> +	struct ixgbe_adapter *adapter = netdev_priv(netdev);
>> +	bool need_reset = false;
>> +
>> +#ifdef CONFIG_DCB
>> +	if ((adapter->flags & IXGBE_FLAG_DCB_ENABLED) &&
>> +	    !(data &  NETIF_F_HW_VLAN_RX))
>> +		return -EINVAL;
>> +#endif
>> +	/* return error if RXHASH is being enabled when RSS is not
>supported */
>> +	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED) &&
>> +	     (data &  NETIF_F_RXHASH))
>> +		return -EINVAL;
>> +
>> +	/* If Rx checksum is disabled, then RSC/LRO should also be
>disabled */
>> +	if (!(data & NETIF_F_RXCSUM)) {
>> +		data &= ~NETIF_F_LRO;
>> +		adapter->flags &= ~IXGBE_FLAG_RX_CSUM_ENABLED;
>> +	} else {
>> +		adapter->flags |= IXGBE_FLAG_RX_CSUM_ENABLED;
>> +	}
>
>The checks above should be done in ndo_fix_features callback. The check
>for
>RSS_ENABLED for RXHASH is redundant, as the feature is removed in
>probe()
>in this case (see [MARK] below).

Thanks for the comments Michal, it clears up a lot in my mind. :)

So in ndo_fix_features we would just turn off feature flags rather than return an error?  For example:

	if (!(data & NETIF_F_RXCSUM)) 
		data &= ~NETIF_F_LRO;

As for the RSS_ENABLED/RXHASH check, it was to cover the cases where RSS_ENABLED might have changed since probe.  This could happen with a resume where we don't get enough MSI-X vectors.  There are also paths in FCoE and DCB that get into code that could clear IXGBE_FLAG_RSS_ENABLED.
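
A userspace sketch of the split Michal describes in his review, under
stated assumptions: the fix hook only narrows the requested set, and the
set hook runs only when the result differs from what is active. The bit
values and demo_* names are hypothetical and are not the kernel's
NETIF_F_* definitions; the generic checks the core applies in between
are left out.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define F_RXCSUM (1u << 0)
#define F_LRO    (1u << 1)

int demo_set_calls;	/* counts how often the set hook ran */

/* "fix" hook: silently drop features that cannot be honoured. */
uint32_t demo_fix_features(uint32_t wanted)
{
	if (!(wanted & F_RXCSUM))
		wanted &= ~F_LRO;	/* no LRO without Rx checksum */
	return wanted;
}

/* "set" hook: apply exactly what it is handed. */
int demo_set_features(uint32_t *active, uint32_t features)
{
	*active = features;
	demo_set_calls++;
	return 0;
}

/* Model of the core: fix first, then set only on a real change. */
int demo_update_features(uint32_t *active, uint32_t wanted)
{
	uint32_t features = demo_fix_features(wanted);

	if (features == *active)
		return 0;
	return demo_set_features(active, features);
}
```

In this model a request for LRO without Rx checksum is not an error; it
is reduced to a set without LRO before the set hook ever sees it.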

>
>> +
>> +	/* if state changes we need to update adapter->flags and reset */
>> +	if ((adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE) &&
>> +	    (!!(data & NETIF_F_LRO) !=
>> +	     !!(adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED))) {
>> +		if ((data &  NETIF_F_LRO) &&
>> +		    adapter->rx_itr_setting != 1 &&
>> +		    adapter->rx_itr_setting > IXGBE_MAX_RSC_INT_RATE) {
>> +			e_info(probe, "rx-usecs set too low, "
>> +			       "not enabling RSC\n");
>> +		} else {
>> +			adapter->flags2 ^= IXGBE_FLAG2_RSC_ENABLED;
>> +			switch (adapter->hw.mac.type) {
>> +			case ixgbe_mac_X540:
>> +			case ixgbe_mac_82599EB:
>> +				need_reset = true;
>> +				break;
>> +			default:
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Check if Flow Director n-tuple support was enabled or disabled.
>If
>> +	 * the state changed, we need to reset.
>> +	 */
>> +	if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)) {
>> +		/* turn off ATR, enable perfect filters and reset */
>> +		if (data & NETIF_F_NTUPLE) {
>> +			adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
>> +			adapter->flags |= IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
>> +			need_reset = true;
>> +		}
>> +	} else if (!(data & NETIF_F_NTUPLE)) {
>> +		/* turn off Flow Director, set ATR and reset */
>> +		adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
>> +		if ((adapter->flags &  IXGBE_FLAG_RSS_ENABLED) &&
>> +		    !(adapter->flags &  IXGBE_FLAG_DCB_ENABLED))
>> +			adapter->flags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
>> +		need_reset = true;
>> +	}
>
>Part of the checks above should be in ndo_fix_features, too.
>ndo_set_features
>should just enable what it has been passed. What ndo_fix_features
>callback
>returns is further limited by generic checks, and then (if the resulting
>set
>is different than current dev->features) ndo_set_features is called.

I'm a little confused here.  From your comments I get the idea that ndo_fix_features() just modifies and error-checks our feature set.  The result would then be just a change to the feature set (the data variable in my case above).  Is that a correct assumption?  If so, I'm confused, as neither of the two checks above changes the feature set.  They do change driver flags to indicate the new state and mark whether we need a reset.  I don't believe we would want to do the reset until ndo_set_features is called, and if we moved the setting of the driver flags into ndo_fix_features we would lose some state (e.g. whether IXGBE_FLAG2_RSC_ENABLED changed) that we need in ndo_set_features to decide if a reset is needed.

Am I just missing something here?
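
One possible way to keep that state, sketched in user space under the assumption that the previously active feature set is still readable when the set callback runs: XOR the old and new sets and act only on the bits that changed. Everything named sketch_* or SKETCH_* below is hypothetical, and the needs_reset field merely stands in for the ixgbe_do_reset() call in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, not the kernel's NETIF_F_* values. */
#define SKETCH_F_LRO    (1u << 1)
#define SKETCH_F_NTUPLE (1u << 2)

struct sketch_dev {
	uint32_t features;  /* feature set currently in effect */
	bool rsc_enabled;   /* driver-private mirror of the LRO state */
	bool needs_reset;   /* stands in for calling the reset routine */
};

/* Same shape as a set_features-style callback: apply what was passed in
 * and derive "did this bit change?" from old ^ new. */
static int sketch_set_features(struct sketch_dev *dev, uint32_t wanted)
{
	uint32_t changed = dev->features ^ wanted;

	if (changed & SKETCH_F_LRO) {
		dev->rsc_enabled = (wanted & SKETCH_F_LRO) != 0;
		dev->needs_reset = true;
	}
	if (changed & SKETCH_F_NTUPLE)
		dev->needs_reset = true;

	/* The sketch records the new set itself so it stays self-contained. */
	dev->features = wanted;
	return 0;
}
```

Calling it twice with the same set requests no reset the second time, which is the property the question above is worried about losing.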

Thanks
-Don

>
>> +
>> +	if (need_reset)
>> +		ixgbe_do_reset(netdev);
>> +
>> +	return 0;
>> +
>> +}
>> +
>>  static const struct net_device_ops ixgbe_netdev_ops = {
>>  	.ndo_open		= ixgbe_open,
>>  	.ndo_stop		= ixgbe_close,
>> @@ -7219,6 +7301,7 @@ static const struct net_device_ops
>ixgbe_netdev_ops = {
>>  	.ndo_fcoe_disable = ixgbe_fcoe_disable,
>>  	.ndo_fcoe_get_wwn = ixgbe_fcoe_get_wwn,
>>  #endif /* IXGBE_FCOE */
>> +	.ndo_set_features = ixgbe_set_features,
>>  };
>>
>>  static void __devinit ixgbe_probe_vf(struct ixgbe_adapter *adapter,
>> @@ -7486,20 +7569,24 @@ static int __devinit ixgbe_probe(struct
>pci_dev *pdev,
>>
>>  	netdev->features = NETIF_F_SG |
>>  			   NETIF_F_IP_CSUM |
>> +			   NETIF_F_IPV6_CSUM |
>>  			   NETIF_F_HW_VLAN_TX |
>>  			   NETIF_F_HW_VLAN_RX |
>> -			   NETIF_F_HW_VLAN_FILTER;
>> +			   NETIF_F_HW_VLAN_FILTER |
>> +			   NETIF_F_TSO |
>> +			   NETIF_F_TSO6 |
>> +			   NETIF_F_GRO |
>> +			   NETIF_F_RXHASH |
>> +			   NETIF_F_RXCSUM;
>>
>> -	netdev->features |= NETIF_F_IPV6_CSUM;
>> -	netdev->features |= NETIF_F_TSO;
>> -	netdev->features |= NETIF_F_TSO6;
>> -	netdev->features |= NETIF_F_GRO;
>> -	netdev->features |= NETIF_F_RXHASH;
>> +	netdev->hw_features = netdev->features;
>>
>>  	switch (adapter->hw.mac.type) {
>>  	case ixgbe_mac_82599EB:
>>  	case ixgbe_mac_X540:
>>  		netdev->features |= NETIF_F_SCTP_CSUM;
>> +		netdev->hw_features |= NETIF_F_SCTP_CSUM |
>> +				       NETIF_F_NTUPLE;
>>  		break;
>>  	default:
>>  		break;
>> @@ -7538,6 +7625,8 @@ static int __devinit ixgbe_probe(struct pci_dev
>*pdev,
>>  		netdev->vlan_features |= NETIF_F_HIGHDMA;
>>  	}
>>
>> +	if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)
>> +		netdev->hw_features |= NETIF_F_LRO;
>>  	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
>>  		netdev->features |= NETIF_F_LRO;
>>
>> @@ -7574,8 +7663,10 @@ static int __devinit ixgbe_probe(struct pci_dev
>*pdev,
>>  	if (err)
>>  		goto err_sw_init;
>>
>> -	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
>> +	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED)) {
>> +		netdev->hw_features &= ~NETIF_F_RXHASH;
>>  		netdev->features &= ~NETIF_F_RXHASH;
>> +	}
>>
>
>[MARK here]
>
>Best Regards,
>Michał Mirosław

^ permalink raw reply

* Re: Bridging behavior apparently changed around the Fedora 14 time
From: David Lamparter @ 2011-07-12  0:02 UTC (permalink / raw)
  To: Greg Scott; +Cc: Stephen Hemminger, netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <925A849792280C4E80C5461017A4B8A2A040F6@mail733.InfraSupportEtc.com>

On Mon, Jul 11, 2011 at 04:08:14PM -0500, Greg Scott wrote:
> Uhmmmm - no it didn't.  Remember, I put br0 into promiscuous mode myself
> by hand - take a look at this.  Note eth0 and eth1 are not in
> promiscuous mode.  I wonder how it would behave if I put the physical
> devices into promiscuous mode and left br0 alone?  This I can easily
> test during off hours.  
> 
> [root@ehac-fw2011 gregs]# ip link show
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> DOWN qlen 1000
>     link/ether 00:0e:7f:2d:d0:6e brd ff:ff:ff:ff:ff:ff
> 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
> state UP qlen 1000
>     link/ether 00:03:47:3a:59:79 brd ff:ff:ff:ff:ff:ff
> 4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
> state UP qlen 1000
>     link/ether 00:0d:88:31:d8:24 brd ff:ff:ff:ff:ff:ff
> 5: br0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc prio
> state UNKNOWN
>     link/ether 00:03:47:3a:59:79 brd ff:ff:ff:ff:ff:ff
> [root@ehac-fw2011 gregs]#

All bridge port devices should really be in promiscuous mode; either
this is just not displayed or the code is broken somewhere. On my 2.6.38
port both "ip" and "ifconfig" correctly show the devices as PROMISC
without any userspace tool setting this, no idea if this changed
recently.
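
Since the tools may not agree on what they display, it can help to test the flag bit directly. The sketch below assumes you already have the interface flags as a hex string (for example, the text Ben suggested reading from sysfs earlier in the thread); the helper name is made up, and 0x100 is the value of IFF_PROMISC in <linux/if.h>.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Same value as IFF_PROMISC in <linux/if.h>. */
#define SKETCH_IFF_PROMISC 0x100UL

/* Hypothetical helper: parse a hex flags string such as "0x1103\n"
 * and report whether the promiscuous bit is set. */
static bool sketch_flags_promisc(const char *flags_text)
{
	/* Base 16 accepts an optional "0x" prefix and stops at the newline. */
	unsigned long flags = strtoul(flags_text, NULL, 16);

	return (flags & SKETCH_IFF_PROMISC) != 0;
}
```

For instance, "0x1103" has bit 0x100 set, while "0x1003" does not.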

However, I must say that your bug report sounds more like a 
forwarding-back-to-source-device IP-level problem. If I understand
your setup correctly, you have:

(servers)
 |
 +--------[eth0 <-br0-> eth1]------- internet
 |
(clients)

and what isn't working is the firewall forwarding packets that it
received on eth0/br0 back out on eth0/br0 on the IP level including
NATting? Or did I misunderstand?

-David

P.S.: any reason why you are (a) not using proxyarp instead of the
bridge and (b) not using a VLAN to put those servers with public IPs
in? You have a bit of a Frankennet there :/

^ permalink raw reply

* Re: [PATCH RFC] vhost: address fixme in vhost TX zero-copy support
From: Shirley Ma @ 2011-07-12  0:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <20110711220449.GA8346@redhat.com>

On Tue, 2011-07-12 at 01:04 +0300, Michael S. Tsirkin wrote:
> So the following should do it, on top of Shirley's patch, I think.
> I'm not quite sure about using vq->upend_idx - vq->done_idx to check
> the number of outstanding DMAs. Shirley, what do you think?

Yes, you can use this to track # outstanding DMAs.
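
One detail worth keeping in mind: the patch advances upend_idx modulo UIO_MAXIOV, so the plain difference can go negative once upend_idx wraps. A wrap-safe count could look like the sketch below, assuming done_idx wraps the same way; SKETCH_MAXIOV and the function name are placeholders, not code from the patch.

```c
/* Stands in for UIO_MAXIOV; both indices are assumed to stay in
 * the range [0, SKETCH_MAXIOV). */
#define SKETCH_MAXIOV 1024

/* Number of buffers submitted but not yet signalled, tolerating wrap. */
static int sketch_outstanding(int upend_idx, int done_idx)
{
	return (upend_idx - done_idx + SKETCH_MAXIOV) % SKETCH_MAXIOV;
}
```

With upend_idx = 3 and done_idx = 1020 the plain subtraction gives -1017, whereas the wrapped form gives 7.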

> Untested.
> 
> I'm also thinking about making the use of this conditional
> on a module parameter, off by default to reduce
> stability risk while still enabling more people to
> test the feature.
> Thoughts?

Agreed.

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 7de0c6e..cf8deb3 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -156,8 +156,7 @@ static void handle_tx(struct vhost_net *net)
> 
>         for (;;) {
>                 /* Release DMAs done buffers first */
> -               if (atomic_read(&vq->refcnt) > VHOST_MAX_PEND)
> -                       vhost_zerocopy_signal_used(vq);
> +               vhost_zerocopy_signal_used(vq);
> 
>                 head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>                                          ARRAY_SIZE(vq->iov),
> @@ -175,7 +174,7 @@ static void handle_tx(struct vhost_net *net)
>                                 break;
>                         }
>                         /* If more outstanding DMAs, queue the work */
> -                       if (atomic_read(&vq->refcnt) > VHOST_MAX_PEND) {
> +                       if (vq->upend_idx - vq->done_idx > VHOST_MAX_PEND) {
>                                 tx_poll_start(net, sock);
>                                 set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
>                                 break;
> @@ -214,12 +213,12 @@ static void handle_tx(struct vhost_net *net)
> 
>                                 vq->heads[vq->upend_idx].len = len;
>                                 ubuf->callback = vhost_zerocopy_callback;
> -                               ubuf->arg = vq;
> +                               ubuf->arg = vq->ubufs;
>                                 ubuf->desc = vq->upend_idx;
>                                 msg.msg_control = ubuf;
>                                 msg.msg_controllen = sizeof(ubuf);
> +                               kref_get(&vq->ubufs->kref);
>                         }
> -                       atomic_inc(&vq->refcnt);
>                         vq->upend_idx = (vq->upend_idx + 1) % UIO_MAXIOV;
>                 }
>                 /* TODO: Check specific error and bomb out unless ENOBUFS? */
> @@ -646,6 +645,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>  {
>         struct socket *sock, *oldsock;
>         struct vhost_virtqueue *vq;
> +       struct vhost_ubuf_ref *ubufs, *oldubufs = NULL;
>         int r;
> 
>         mutex_lock(&n->dev.mutex);
> @@ -675,6 +675,13 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>         oldsock = rcu_dereference_protected(vq->private_data,
>                                             lockdep_is_held(&vq->mutex));
>         if (sock != oldsock) {
> +               ubufs = vhost_ubuf_alloc(vq, sock);
> +               if (IS_ERR(ubufs)) {
> +                       r = PTR_ERR(ubufs);
> +                       goto err_ubufs;
> +               }
> +               oldubufs = vq->ubufs;
> +               vq->ubufs = ubufs;
>                 vhost_net_disable_vq(n, vq);
>                 rcu_assign_pointer(vq->private_data, sock);
>                 vhost_net_enable_vq(n, vq);
> @@ -682,6 +689,9 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
> 
>         mutex_unlock(&vq->mutex);
> 
> +       if (oldubufs)
> +               vhost_ubuf_put_and_wait(oldubufs);
> +
>         if (oldsock) {
>                 vhost_net_flush_vq(n, index);
>                 fput(oldsock->file);
> @@ -690,6 +700,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
>         mutex_unlock(&n->dev.mutex);
>         return 0;
> 
> +err_ubufs:
> +       fput(sock);
>  err_vq:
>         mutex_unlock(&vq->mutex);
>  err:
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index db242b1..81b1dd7 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -181,7 +181,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>         vq->log_ctx = NULL;
>         vq->upend_idx = 0;
>         vq->done_idx = 0;
> -       atomic_set(&vq->refcnt, 0);
> +       vq->ubufs = NULL;
>  }
> 
>  static int vhost_worker(void *data)
> @@ -401,7 +401,7 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
>   * of used idx. Once lower device DMA done contiguously, we will signal KVM
>   * guest used idx.
>   */
> -void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
> +int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
>  {
>         int i, j = 0;
> 
> @@ -414,10 +414,9 @@ void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
>                 } else
>                         break;
>         }
> -       if (j) {
> +       if (j)
>                 vq->done_idx = i;
> -               atomic_sub(j, &vq->refcnt);
> -       }
> +       return j;
>  }
> 
>  /* Caller should have device mutex */
> @@ -430,9 +429,13 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>                         vhost_poll_stop(&dev->vqs[i].poll);
>                         vhost_poll_flush(&dev->vqs[i].poll);
>                 }
> -               /* Wait for all lower device DMAs done (busywait FIXME) */
> -               while (atomic_read(&dev->vqs[i].refcnt))
> -                       vhost_zerocopy_signal_used(&dev->vqs[i]);
> +               /* Wait for all lower device DMAs done. */
> +               if (dev->vqs[i].ubufs)
> +                       vhost_ubuf_put_and_wait(dev->vqs[i].ubufs);
> +
> +               /* Signal guest as appropriate. */
> +               vhost_zerocopy_signal_used(&dev->vqs[i]);
> +
>                 if (dev->vqs[i].error_ctx)
>                         eventfd_ctx_put(dev->vqs[i].error_ctx);
>                 if (dev->vqs[i].error)
> @@ -645,11 +648,6 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
> 
>         mutex_lock(&vq->mutex);
> 
> -       /* clean up lower device outstanding DMAs, before setting ring
> -          busywait FIXME */
> -       while (atomic_read(&vq->refcnt))
> -               vhost_zerocopy_signal_used(vq);
> -

We need to clean up outstanding DMAs here too when we set the vring.
Otherwise, when the KVM guest removes and reloads the virtio_net module,
the vring would be out of sync with vhost.

>         switch (ioctl) {
>         case VHOST_SET_VRING_NUM:
>                 /* Resizing ring with an active backend?
> @@ -1525,12 +1523,46 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>         }
>  }
> 
> +static void vhost_zerocopy_done_signal(struct kref *kref)
> +{
> +       struct vhost_ubuf_ref *ubufs = container_of(kref, struct vhost_ubuf_ref,
> +                                                   kref);
> +       wake_up(&ubufs->wait);
> +}
> +
> +struct vhost_ubuf_ref *vhost_ubuf_alloc(struct vhost_virtqueue *vq,
> +                                       void * private_data)
> +{
> +       struct vhost_ubuf_ref *ubufs;
> +       /* No backend? Nothing to count. */
> +       if (!private_data)
> +               return NULL;
> +       ubufs = kmalloc(sizeof *ubufs, GFP_KERNEL);
> +       if (!ubufs)
> +               return ERR_PTR(-ENOMEM);
> +       kref_init(&ubufs->kref);
> +       kref_get(&ubufs->kref);
> +       init_waitqueue_head(&ubufs->wait);
> +       ubufs->vq = vq;
> +       return ubufs;
> +}
> +
> +void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
> +{
> +       kref_put(&ubufs->kref, vhost_zerocopy_done_signal); 
> +       wait_event(ubufs->wait, !atomic_read(&ubufs->kref.refcount));
> +       kfree(ubufs);
> +}
> +
>  void vhost_zerocopy_callback(void *arg)
>  {
>         struct ubuf_info *ubuf = (struct ubuf_info *)arg;
> +       struct vhost_ubuf_ref *ubufs;
>         struct vhost_virtqueue *vq;
> 
> -       vq = (struct vhost_virtqueue *)ubuf->arg;
> +       ubufs = ubuf->arg;
> +       vq = ubufs->vq;
>         /* set len = 1 to mark this desc buffers done DMA */
>         vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
> +       kref_put(&ubufs->kref, vhost_zerocopy_done_signal); 
>  }
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 883688c..b42b126 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -55,6 +55,17 @@ struct vhost_log {
>         u64 len;
>  };
> 
> +struct vhost_virtqueue;
> +
> +struct vhost_ubuf_ref {
> +       struct kref kref;
> +       wait_queue_head_t wait;
> +       struct vhost_virtqueue *vq;
> +};
> +
> +struct vhost_ubuf_ref *vhost_ubuf_alloc(struct vhost_virtqueue *, void *);
> +void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *);
> +
>  /* The virtqueue structure describes a queue attached to a device. */
>  struct vhost_virtqueue {
>         struct vhost_dev *dev;
> @@ -127,6 +138,9 @@ struct vhost_virtqueue {
>         int done_idx;
>         /* an array of userspace buffers info */
>         struct ubuf_info *ubuf_info;
> +       /* Reference counting for outstanding ubufs.
> +        * Protected by vq mutex. Writers must also take device mutex. */
> +       struct vhost_ubuf_ref *ubufs;
>  };
> 
>  struct vhost_dev {
> @@ -174,7 +188,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
>                     unsigned int log_num, u64 len);
>  void vhost_zerocopy_callback(void *arg);
> -void vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
> +int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
> 
>  #define vq_err(vq, fmt, ...) do {                                  \
>                 pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
> -- 

^ permalink raw reply

* [net-next 1/4] igb: Fix lack of flush after register write and before delay
From: Jeff Kirsher @ 2011-07-12  2:15 UTC (permalink / raw)
  To: davem; +Cc: Carolyn Wyborny, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1310436914-4017-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

Register writes followed by a delay are required to have a flush
before the delay in order to commit the values to the register.  Without
the flush, the code following the delay may not function correctly.

Reported-by: Tong Ho <tong.ho@ericsson.com>
Reported-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by:  Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/igb/e1000_82575.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 0f563c8..493e331 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -1735,6 +1735,7 @@ static s32 igb_reset_hw_82580(struct e1000_hw *hw)
 		ctrl |= E1000_CTRL_RST;
 
 	wr32(E1000_CTRL, ctrl);
+	wrfl();
 
 	/* Add delay to insure DEV_RST has time to complete */
 	if (global_device_reset)
-- 
1.7.6
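
For readers unfamiliar with why the flush matters, here is a toy user-space model of a posted write, not the igb code itself. It assumes, as is common for memory-mapped device registers, that a write may sit in a buffer until a later read from the device forces it out; as far as I know, wrfl() in igb is such a read of the STATUS register. All sketch_* names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one device register behind a posting buffer. */
struct sketch_hw {
	uint32_t reg;     /* value the device has actually seen */
	uint32_t posted;  /* write still sitting in the buffer */
	bool pending;     /* true while a posted write is outstanding */
};

/* A write only lands in the posting buffer. */
static void sketch_write(struct sketch_hw *hw, uint32_t val)
{
	hw->posted = val;
	hw->pending = true;
}

/* A read drains the posting buffer before returning the register. */
static uint32_t sketch_read(struct sketch_hw *hw)
{
	if (hw->pending) {
		hw->reg = hw->posted;
		hw->pending = false;
	}
	return hw->reg;
}

/* Flush: a read whose value is discarded, issued before any delay. */
static void sketch_flush(struct sketch_hw *hw)
{
	(void)sketch_read(hw);
}
```

In this model a delay started right after sketch_write() would begin counting before the device has seen the reset bit; calling sketch_flush() first closes that gap.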


^ permalink raw reply related

* [net-next 0/4][pull request] Intel Wired LAN Driver Update
From: Jeff Kirsher @ 2011-07-12  2:15 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

The following series contains updates to igb and ixgbe.

igb- a fix of adding a flush after a register write, the addition
    of SerDes forced mode support and a trivial update of copyright
    header.

ixgbe- fix to initialize fdir_perfect_lock in all cases

Dropped the conversion to ndo_fix_features patch from Don Skidmore so that
Don can fix up the patch based on the suggestions from Michal Miroslaw.

The following are changes since commit d84e0bd7971eb8357c700151ee4e8e4101ee65fa:
  skbuff: update struct sk_buff members comments
and are available in the git repository at:
  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next-2.6 master

Alexander Duyck (1):
  ixgbe: Make certain to initialize the fdir_perfect_lock in all cases

Carolyn Wyborny (3):
  igb: Fix lack of flush after register write and before delay
  igb: Update copyright on all igb driver files.
  igb: Add support of SerDes Forced mode for certain hardware

 drivers/net/igb/Makefile        |    2 +-
 drivers/net/igb/e1000_82575.c   |   22 +++++++++++++++++++---
 drivers/net/igb/e1000_82575.h   |    4 +++-
 drivers/net/igb/e1000_defines.h |    7 ++++---
 drivers/net/igb/e1000_hw.h      |    2 +-
 drivers/net/igb/e1000_mac.c     |    2 +-
 drivers/net/igb/e1000_mac.h     |    2 +-
 drivers/net/igb/e1000_mbx.c     |    2 +-
 drivers/net/igb/e1000_mbx.h     |    2 +-
 drivers/net/igb/e1000_nvm.c     |    2 +-
 drivers/net/igb/e1000_nvm.h     |    2 +-
 drivers/net/igb/e1000_phy.c     |    2 +-
 drivers/net/igb/e1000_phy.h     |    2 +-
 drivers/net/igb/e1000_regs.h    |    2 +-
 drivers/net/igb/igb.h           |    2 +-
 drivers/net/igb/igb_ethtool.c   |    2 +-
 drivers/net/igb/igb_main.c      |    2 +-
 drivers/net/ixgbe/ixgbe_main.c  |    5 +++--
 18 files changed, 43 insertions(+), 23 deletions(-)

-- 
1.7.6


^ permalink raw reply

* [net-next 2/4] igb: Update copyright on all igb driver files.
From: Jeff Kirsher @ 2011-07-12  2:15 UTC (permalink / raw)
  To: davem; +Cc: Carolyn Wyborny, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1310436914-4017-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

This patch updates the copyright on the igb driver files to 2011.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/igb/Makefile        |    2 +-
 drivers/net/igb/e1000_82575.c   |    2 +-
 drivers/net/igb/e1000_82575.h   |    2 +-
 drivers/net/igb/e1000_defines.h |    2 +-
 drivers/net/igb/e1000_hw.h      |    2 +-
 drivers/net/igb/e1000_mac.c     |    2 +-
 drivers/net/igb/e1000_mac.h     |    2 +-
 drivers/net/igb/e1000_mbx.c     |    2 +-
 drivers/net/igb/e1000_mbx.h     |    2 +-
 drivers/net/igb/e1000_nvm.c     |    2 +-
 drivers/net/igb/e1000_nvm.h     |    2 +-
 drivers/net/igb/e1000_phy.c     |    2 +-
 drivers/net/igb/e1000_phy.h     |    2 +-
 drivers/net/igb/e1000_regs.h    |    2 +-
 drivers/net/igb/igb.h           |    2 +-
 drivers/net/igb/igb_ethtool.c   |    2 +-
 drivers/net/igb/igb_main.c      |    2 +-
 17 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/net/igb/Makefile b/drivers/net/igb/Makefile
index 8372cb9..c6e4621 100644
--- a/drivers/net/igb/Makefile
+++ b/drivers/net/igb/Makefile
@@ -1,7 +1,7 @@
 ################################################################################
 #
 # Intel 82575 PCI-Express Ethernet Linux driver
-# Copyright(c) 1999 - 2009 Intel Corporation.
+# Copyright(c) 1999 - 2011 Intel Corporation.
 #
 # This program is free software; you can redistribute it and/or modify it
 # under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 493e331..7b7e157 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h
index dd6df34..fd28d62 100644
--- a/drivers/net/igb/e1000_82575.h
+++ b/drivers/net/igb/e1000_82575.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h
index 6b80d40..446eb5c 100644
--- a/drivers/net/igb/e1000_defines.h
+++ b/drivers/net/igb/e1000_defines.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_hw.h b/drivers/net/igb/e1000_hw.h
index 27153e8..4519a13 100644
--- a/drivers/net/igb/e1000_hw.h
+++ b/drivers/net/igb/e1000_hw.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_mac.c b/drivers/net/igb/e1000_mac.c
index c822904..2b5ef76 100644
--- a/drivers/net/igb/e1000_mac.c
+++ b/drivers/net/igb/e1000_mac.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_mac.h b/drivers/net/igb/e1000_mac.h
index 601be99..4927f61 100644
--- a/drivers/net/igb/e1000_mac.h
+++ b/drivers/net/igb/e1000_mac.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_mbx.c b/drivers/net/igb/e1000_mbx.c
index 78d48c7..74f2f11 100644
--- a/drivers/net/igb/e1000_mbx.c
+++ b/drivers/net/igb/e1000_mbx.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_mbx.h b/drivers/net/igb/e1000_mbx.h
index bb112fb..eddb0f8 100644
--- a/drivers/net/igb/e1000_mbx.h
+++ b/drivers/net/igb/e1000_mbx.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_nvm.c b/drivers/net/igb/e1000_nvm.c
index 75bf36a..7dcd65c 100644
--- a/drivers/net/igb/e1000_nvm.c
+++ b/drivers/net/igb/e1000_nvm.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_nvm.h b/drivers/net/igb/e1000_nvm.h
index 7f43564..a2a7ca9 100644
--- a/drivers/net/igb/e1000_nvm.h
+++ b/drivers/net/igb/e1000_nvm.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007 Intel Corporation.
+  Copyright(c) 2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_phy.c b/drivers/net/igb/e1000_phy.c
index d639706..e662554 100644
--- a/drivers/net/igb/e1000_phy.c
+++ b/drivers/net/igb/e1000_phy.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_phy.h b/drivers/net/igb/e1000_phy.h
index 2cc1177..8510797 100644
--- a/drivers/net/igb/e1000_phy.h
+++ b/drivers/net/igb/e1000_phy.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 958ca3b..0990f6d 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index f4fa4b1..0389ff6 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/igb_ethtool.c b/drivers/net/igb/igb_ethtool.c
index 1862c97..ed63ff4 100644
--- a/drivers/net/igb/igb_ethtool.c
+++ b/drivers/net/igb/igb_ethtool.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index d6c4bd8..f4d82b2 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel(R) Gigabit Ethernet Linux driver
-  Copyright(c) 2007-2009 Intel Corporation.
+  Copyright(c) 2007-2011 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
-- 
1.7.6


^ permalink raw reply related

* [net-next 3/4] igb: Add support of SerDes Forced mode for certain hardware
From: Jeff Kirsher @ 2011-07-12  2:15 UTC (permalink / raw)
  To: davem; +Cc: Carolyn Wyborny, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1310436914-4017-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

This patch changes the serdes link code to support a forced mode for
some hardware, based on bit set in EEPROM.
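
The decision itself boils down to one bit test on the word read from NVM. A stand-alone sketch follows, with invented sketch_* names; the bit position matches the E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT define added by this patch, and the MAC-type check and the NVM read are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT in the patch. */
#define SKETCH_PCS_AUTONEG_DISABLE_BIT (1u << 14)

/* Given the compatibility word read from NVM and the autoneg setting
 * chosen so far, return the setting to use: the EEPROM bit can only
 * force autoneg off, never turn it on. */
static bool sketch_pcs_autoneg(uint16_t nvm_compat_word, bool chosen_so_far)
{
	if (nvm_compat_word & SKETCH_PCS_AUTONEG_DISABLE_BIT)
		return false;

	return chosen_so_far;
}
```

So a word of 0x4000 forces autoneg off, while 0x0000 leaves the earlier choice untouched.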

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/igb/e1000_82575.c   |   19 +++++++++++++++++--
 drivers/net/igb/e1000_82575.h   |    2 ++
 drivers/net/igb/e1000_defines.h |    5 +++--
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/igb/e1000_82575.c b/drivers/net/igb/e1000_82575.c
index 7b7e157..c0857bd 100644
--- a/drivers/net/igb/e1000_82575.c
+++ b/drivers/net/igb/e1000_82575.c
@@ -1156,10 +1156,13 @@ static s32 igb_setup_serdes_link_82575(struct e1000_hw *hw)
 {
 	u32 ctrl_ext, ctrl_reg, reg;
 	bool pcs_autoneg;
+	s32 ret_val = E1000_SUCCESS;
+	u16 data;
 
 	if ((hw->phy.media_type != e1000_media_type_internal_serdes) &&
 	    !igb_sgmii_active_82575(hw))
-		return 0;
+		return ret_val;
+
 
 	/*
 	 * On the 82575, SerDes loopback mode persists until it is
@@ -1203,6 +1206,18 @@ static s32 igb_setup_serdes_link_82575(struct e1000_hw *hw)
 		/* disable PCS autoneg and support parallel detect only */
 		pcs_autoneg = false;
 	default:
+		if (hw->mac.type == e1000_82575 ||
+		    hw->mac.type == e1000_82576) {
+			ret_val = hw->nvm.ops.read(hw, NVM_COMPAT, 1, &data);
+			if (ret_val) {
+				printk(KERN_DEBUG "NVM Read Error\n\n");
+				return ret_val;
+			}
+
+			if (data & E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT)
+				pcs_autoneg = false;
+		}
+
 		/*
 		 * non-SGMII modes only supports a speed of 1000/Full for the
 		 * link so it is best to just force the MAC and let the pcs
@@ -1250,7 +1265,7 @@ static s32 igb_setup_serdes_link_82575(struct e1000_hw *hw)
 	if (!igb_sgmii_active_82575(hw))
 		igb_force_mac_fc(hw);
 
-	return 0;
+	return ret_val;
 }
 
 /**
diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h
index fd28d62..786e110 100644
--- a/drivers/net/igb/e1000_82575.h
+++ b/drivers/net/igb/e1000_82575.h
@@ -243,6 +243,8 @@ struct e1000_adv_tx_context_desc {
 #define E1000_DTXCTL_MDP_EN     0x0020
 #define E1000_DTXCTL_SPOOF_INT  0x0040
 
+#define E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT	(1 << 14)
+
 #define ALL_QUEUES   0xFFFF
 
 /* RX packet buffer size defines */
diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h
index 446eb5c..2cd4082 100644
--- a/drivers/net/igb/e1000_defines.h
+++ b/drivers/net/igb/e1000_defines.h
@@ -437,6 +437,7 @@
 #define E1000_RAH_POOL_1 0x00040000
 
 /* Error Codes */
+#define E1000_SUCCESS      0
 #define E1000_ERR_NVM      1
 #define E1000_ERR_PHY      2
 #define E1000_ERR_CONFIG   3
@@ -587,8 +588,8 @@
 #define E1000_NVM_POLL_READ     0    /* Flag for polling for read complete */
 
 /* NVM Word Offsets */
-#define NVM_ID_LED_SETTINGS        0x0004
-/* For SERDES output amplitude adjustment. */
+#define NVM_COMPAT                 0x0003
+#define NVM_ID_LED_SETTINGS        0x0004 /* SERDES output amplitude */
 #define NVM_INIT_CONTROL2_REG      0x000F
 #define NVM_INIT_CONTROL3_PORT_B   0x0014
 #define NVM_INIT_CONTROL3_PORT_A   0x0024
-- 
1.7.6


^ permalink raw reply related
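
The decision the 3/4 patch adds to igb_setup_serdes_link_82575() can be
sketched in user space as below. This is only an illustration, not the
driver code: fake_nvm_read() and serdes_pcs_autoneg() are hypothetical
stand-ins for hw->nvm.ops.read() and the patched default: branch, and
the check for 82575/82576 MAC types is left out. Only NVM_COMPAT and
E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch. */
#define NVM_COMPAT                            0x0003
#define E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT  (1 << 14)

/* Hypothetical stand-in for hw->nvm.ops.read(): fetch one 16-bit word
 * from an in-memory EEPROM image.  Returns 0 on success, -1 on error. */
static int fake_nvm_read(const uint16_t *image, size_t words,
                         uint16_t offset, uint16_t *data)
{
	assert(image != NULL && data != NULL);
	if (offset >= words)
		return -1;
	*data = image[offset];
	return 0;
}

/* Mirrors the added logic: read the compatibility word and, if bit 14
 * is set, force PCS autonegotiation off.  On a read error the caller's
 * value is left untouched and the error is propagated. */
static int serdes_pcs_autoneg(const uint16_t *image, size_t words,
                              bool *pcs_autoneg)
{
	uint16_t data;
	int ret = fake_nvm_read(image, words, NVM_COMPAT, &data);

	if (ret)
		return ret;
	if (data & E1000_EEPROM_PCS_AUTONEG_DISABLE_BIT)
		*pcs_autoneg = false;
	return 0;
}
```

A caller would start with its normal autoneg setting in *pcs_autoneg and
treat a nonzero return as a failed link setup, which is what the patch
does by returning ret_val.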

* [net-next 4/4] ixgbe: Make certain to initialize the fdir_perfect_lock in all cases
From: Jeff Kirsher @ 2011-07-12  2:15 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1310436914-4017-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This fix makes it so that the fdir_perfect_lock is initialized in all
cases.  This is necessary as the fdir_filter_exit routine will always
attempt to take the lock before inspecting the filter table.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ixgbe/ixgbe_main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index fa671ae..de30796 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -5155,8 +5155,6 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		adapter->flags2 |= IXGBE_FLAG2_RSC_ENABLED;
 		if (hw->device_id == IXGBE_DEV_ID_82599_T3_LOM)
 			adapter->flags2 |= IXGBE_FLAG2_TEMP_SENSOR_CAPABLE;
-		/* n-tuple support exists, always init our spinlock */
-		spin_lock_init(&adapter->fdir_perfect_lock);
 		/* Flow Director hash filters enabled */
 		adapter->flags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
 		adapter->atr_sample_rate = 20;
@@ -5177,6 +5175,9 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		break;
 	}
 
+	/* n-tuple support exists, always init our spinlock */
+	spin_lock_init(&adapter->fdir_perfect_lock);
+
 #ifdef CONFIG_IXGBE_DCB
 	/* Configure DCB traffic classes */
 	for (j = 0; j < MAX_TRAFFIC_CLASS; j++) {
-- 
1.7.6


^ permalink raw reply related
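
Why moving spin_lock_init() out of the per-MAC branch matters can be
shown with a small user-space model. Everything here is hypothetical
(toy_lock, toy_adapter, the MAC enum names, and both sw_init variants);
it assumes, as the diff context suggests, that the old call sat inside
one case of a switch on the MAC type, while the exit routine takes the
lock regardless of MAC type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MAC types; the real driver has its own enum. */
enum toy_mac { TOY_MAC_OLD, TOY_MAC_NTUPLE };

/* Toy lock that remembers whether it was ever initialized. */
struct toy_lock {
	bool initialized;
	bool held;
};

struct toy_adapter {
	enum toy_mac mac;
	struct toy_lock fdir_perfect_lock;
};

static void toy_lock_init(struct toy_lock *l)
{
	l->initialized = true;
	l->held = false;
}

/* Before the fix: the lock is only set up for one MAC type. */
static void sw_init_before(struct toy_adapter *a)
{
	switch (a->mac) {
	case TOY_MAC_NTUPLE:
		toy_lock_init(&a->fdir_perfect_lock);
		break;
	default:
		break;
	}
}

/* After the fix: the lock is set up after the switch, in all cases. */
static void sw_init_after(struct toy_adapter *a)
{
	switch (a->mac) {
	case TOY_MAC_NTUPLE:
		/* per-MAC feature flags would be set here */
		break;
	default:
		break;
	}
	toy_lock_init(&a->fdir_perfect_lock);
}

/* The exit path always takes the lock.  The model reports failure
 * where a real uninitialized spinlock would misbehave. */
static bool fdir_filter_exit(struct toy_adapter *a)
{
	struct toy_lock *l = &a->fdir_perfect_lock;

	if (!l->initialized)
		return false;
	l->held = true;
	/* the filter table would be inspected and freed here */
	l->held = false;
	return true;
}
```

With sw_init_before() the exit path only works on TOY_MAC_NTUPLE
adapters; with sw_init_after() it works on both.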

* Re: [PATCH net-next-2.6 1/2] net: introduce __netdev_alloc_skb_ip_align
From: David Miller @ 2011-07-12  2:32 UTC (permalink / raw)
  To: bhutchings; +Cc: eric.dumazet, netdev
In-Reply-To: <1310425028.2707.12.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 11 Jul 2011 23:57:08 +0100

> On Mon, 2011-07-11 at 23:52 +0200, Eric Dumazet wrote:
>> RX rings should use GFP_KERNEL allocations if possible, add
>> __netdev_alloc_skb_ip_align() helper to ease this.
> [...]
> 
> When is it possible, other than when starting an interface?  RX refill
> normally has to be done in NAPI context or at least synchronised with
> NAPI, so GFP_ATOMIC is required.  Not sure how much point there is in
> making a special case for the initial fill.

There is a point, because if this init case fails you lose the
interface entirely.

Whereas the other cases only cause the loss of a few packets.

^ permalink raw reply
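
The trade-off discussed above can be modelled in a few lines. This is a
deliberately simplified sketch, not kernel code: toy_pool, toy_alloc()
and the TOY_GFP_* values are hypothetical, and the model assumes that an
"atomic" request can only use memory that is free right now, while a
"kernel" request may also wait for reclaimable memory. It also assumes
that bringing the interface up fails unless the whole RX ring is filled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP_ATOMIC and GFP_KERNEL. */
enum toy_gfp { TOY_GFP_ATOMIC, TOY_GFP_KERNEL };

/* Toy memory pool: buffers free right now, plus buffers that only a
 * sleeping allocation could obtain (by waiting for reclaim). */
struct toy_pool {
	int free_now;
	int reclaimable;
};

static bool toy_alloc(struct toy_pool *p, enum toy_gfp gfp)
{
	if (p->free_now > 0) {
		p->free_now--;
		return true;
	}
	if (gfp == TOY_GFP_KERNEL && p->reclaimable > 0) {
		p->reclaimable--;
		return true;
	}
	return false;
}

/* Fill up to 'slots' RX slots; returns how many were filled.  A short
 * count on the refill path only means some packets get dropped. */
static int toy_fill_ring(struct toy_pool *p, int slots, enum toy_gfp gfp)
{
	int filled = 0;

	while (filled < slots && toy_alloc(p, gfp))
		filled++;
	return filled;
}

/* Init path: a short count here means the interface fails to come up. */
static int toy_open(struct toy_pool *p, int slots, enum toy_gfp gfp)
{
	return toy_fill_ring(p, slots, gfp) == slots ? 0 : -1;
}
```

Under memory pressure, toy_open() with TOY_GFP_ATOMIC fails outright,
while the same call with TOY_GFP_KERNEL succeeds; the refill path with
TOY_GFP_ATOMIC merely comes up a few buffers short.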

* Re: [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
From: David Miller @ 2011-07-12  2:33 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: eric.dumazet, netdev, greearb, bruce.w.allan
In-Reply-To: <CAL3LdT7WNrkDk2WrTnDnwYJ2raoNJdG5v4q=gk4o8hc2KR1Q_w@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 11 Jul 2011 16:51:23 -0700

> On Mon, Jul 11, 2011 at 14:53, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Note: This patch is untested, I don't have the hardware
>>
>> Thanks
>>
>> [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
>>
>> In process and sleep allowed context, favor GFP_KERNEL allocations over
>> GFP_ATOMIC ones.
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> CC: Ben Greear <greearb@candelatech.com>
>> CC: Bruce Allan <bruce.w.allan@intel.com>
>> ---
>>  drivers/net/e1000e/e1000.h  |    2 +-
>>  drivers/net/e1000e/netdev.c |   33 +++++++++++++++++----------------
>>  2 files changed, 18 insertions(+), 17 deletions(-)
>>
> 
> Thanks Eric!  I have added the patch to my queue.

You can't until I put patch #1 into my tree, which adds the
new interfaces used by this patch.

^ permalink raw reply

* RE: Bridging behavior apparently changed around the Fedora 14 time
From: Greg Scott @ 2011-07-12  2:38 UTC (permalink / raw)
  To: David Lamparter; +Cc: Stephen Hemminger, netdev, Lynn Hanson, Joe Whalen
In-Reply-To: <20110712000242.GA616804@jupiter.n2.diac24.net>

> If I understand your setup correctly, you have:
> 
> (servers)
>  |
>  +--------[eth0 <-br0-> eth1]------- internet
>  |
>  (clients)

Close.  Here's a better ASCII art picture.  There aren't really any
internal-->external clients.

Internet        firewall    Private LAN
----------------+******+---------+-------+
              eth0     eth1     NATed    H.323
               Bridge br0      servers  devices

The H.323 devices work better if they have real, public IP addresses.
I've done them with NAT, but H.323 just works better if the devices
"think" they're directly connected to the Internet.  The servers are
all on the private LAN side, physically behind the firewall.

> Why not proxy ARP?

I used to use proxy ARP until I got burned really badly by what proxy
ARP really does - the NIC answers, with its own MAC address, ARP
requests for anyone and everyone.  Think about that - proxy ARP
impersonates everyone and anyone on the LAN to which it's connected.

I had one of these in a colo center and for several hours, my box
Proxy-ARPed everyone and anyone on that same public network.  I don't
even like to think about how many public webservers I unintentionally
messed with that day.  Oh yes - and to make matters worse, that customer
had an IP load balancer behind my box nobody told me about and proxy ARP
messed that up too.  The stupid load balancer wouldn't clear its ARP
cache and had to be rebooted - and that took down a major website and
pretty much blew my only chance to do business with this customer.  All
in all, not one of my better days.  I decided right then and there, no
more proxy ARP. 

Bridging turns out to be a much cleaner and more polite way to do it.
Don't believe all the forum comments about the wonders of proxy-ARP.  

> Why not use a VLAN?

Because I really don't need one.  Plus it doesn't matter anyway - the
firewall can act as a router on a stick to go between my H.323 devices
and private IP servers.  With or without VLANs makes no difference in
this case.  

> You have a bit of a Frankennet there

I don't think so.  I have a single LAN with a couple of devices that
need public IP addresses.  This isn't that unusual.  I have lots of
other sites doing it this way.

> I must say that your bug report sounds more like a 
> forwarding-back-to-source-device IP-level problem.

I don't think it's an IP-level problem.  I think it's a layer 2 problem
- and now I think the problem is that bridging is supposed to turn on
PROMISC mode and it didn't.  I had to do it by hand myself.  I never
paid attention to whether PROMISC mode was turned on at any of the
other sites I've set up like this, and never had a problem until this
one.  As soon as I turned on PROMISC mode by hand, everything worked
as it should.  If it were an IP problem, a routing problem, or a
ruleset/filtering problem, why would PROMISC mode make any difference
one way or the other?

What I don't know yet is, is this a Fedora bug or a stock kernel bug?
Is anyone from Red Hat following this email list?

I think I will take a look at a few of my other bridged sites running
earlier versions and see if they turn on PROMISC mode on their bridged
NICs.  

- Greg

^ permalink raw reply
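
Checking whether a bridged NIC is really promiscuous, as suggested
earlier in the thread (flag 0x100 in the interface's sysfs flags file),
could be scripted roughly as follows. This is a sketch under
assumptions: the helper names are made up, TOY_IFF_PROMISC mirrors the
0x100 value quoted in the thread, and the path used is
/sys/class/net/<ifname>/flags, the usual sysfs layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* 0x100 is the promiscuous-mode flag value quoted in the thread. */
#define TOY_IFF_PROMISC 0x100UL

/* Parse the hex text found in the sysfs flags file (e.g. "0x1103\n")
 * and report whether the promiscuous bit is set. */
static bool flags_text_is_promisc(const char *text)
{
	unsigned long flags;

	assert(text != NULL);
	flags = strtoul(text, NULL, 16);	/* base 16 accepts "0x" */
	return (flags & TOY_IFF_PROMISC) != 0;
}

/* Returns 1 if the interface is promiscuous, 0 if it is not, and -1
 * if the flags file cannot be read (for example, no such interface). */
static int iface_is_promisc(const char *ifname)
{
	char path[128];
	char buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return flags_text_is_promisc(buf) ? 1 : 0;
}
```

Running iface_is_promisc("eth0") and iface_is_promisc("eth1") on an
older bridged site and on the Fedora 14 box would show whether the two
setups differ in the flag the bridge is expected to set.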

* Re: [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
From: Jeff Kirsher @ 2011-07-12  2:40 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet@gmail.com, netdev@vger.kernel.org,
	greearb@candelatech.com, Allan, Bruce W
In-Reply-To: <20110711.193359.1440272794251078589.davem@davemloft.net>

On Mon, 2011-07-11 at 19:33 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Mon, 11 Jul 2011 16:51:23 -0700
> 
> > On Mon, Jul 11, 2011 at 14:53, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> Note: This patch is untested, I don't have the hardware
> >>
> >> Thanks
> >>
> >> [PATCH net-next-2.6 2/2] e1000e: use GFP_KERNEL allocations at init time
> >>
> >> In process and sleep allowed context, favor GFP_KERNEL allocations over
> >> GFP_ATOMIC ones.
> >>
> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> >> CC: Ben Greear <greearb@candelatech.com>
> >> CC: Bruce Allan <bruce.w.allan@intel.com>
> >> ---
> >>  drivers/net/e1000e/e1000.h  |    2 +-
> >>  drivers/net/e1000e/netdev.c |   33 +++++++++++++++++----------------
> >>  2 files changed, 18 insertions(+), 17 deletions(-)
> >>
> > 
> > Thanks Eric!  I have added the patch to my queue.
> 
> You can't until I put patch #1 into my tree, which adds the
> new interfaces used by this patch.

I applied patch #1 to my queue as well (for testing purposes) since I
saw that patch #2 was dependent.  If it passes testing, I (or Bruce)
will just ACK patch #2, that way you can apply both patches at the same
time.

^ permalink raw reply
