Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net -v2] [BUGFIX] bonding: use local function pointer of bond->recv_probe in bond_handle_frame
From: Jay Vosburgh @ 2011-10-19 17:59 UTC (permalink / raw)
  To: David Miller
  Cc: mitsuo.hayasaka.hu, andy, netdev, linux-kernel, yrl.pp-manager.tt,
	eric.dumazet, xiyou.wangcong
In-Reply-To: <20111019.000311.1490092497677136273.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:

>From: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
>Date: Thu, 13 Oct 2011 11:04:29 +0900
>
>> The bond->recv_probe is called in bond_handle_frame() when
>> a packet is received, but bond_close() sets it to NULL. So,
>> a panic occurs when both functions work in parallel.
>> 
>> Why this happen:
>> After null pointer check of bond->recv_probe, an sk_buff is
>> duplicated and bond->recv_probe is called in bond_handle_frame.
>> So, a panic occurs when bond_close() is called between the
>> check and call of bond->recv_probe.
>> 
>> Patch:
>> This patch uses a local function pointer of bond->recv_probe
>> in bond_handle_frame(). So, it can avoid the null pointer
>> dereference.
>> 
>> 
>> Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
>> Cc: Jay Vosburgh <fubar@us.ibm.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>
>Bonding folks please review this, thanks.
>

	Looks reasonable.  Even if by some quirk of timing the
recv_probe function ends up being entered after bond_close has
completed, it doesn't look like there is a risk of those functions
misbehaving (because bond_close doesn't deallocate the data structures).

	-J

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
From: Jay Vosburgh @ 2011-10-19 18:01 UTC (permalink / raw)
  To: Mitsuo Hayasaka
  Cc: Andy Gospodarek, netdev, linux-kernel, yrl.pp-manager.tt,
	WANG Cong
In-Reply-To: <20111019081757.12455.24788.stgit@ltc219.sdl.hitachi.co.jp>

Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> wrote:

>The bond_close() calls cancel_delayed_work() to cancel delayed works.
>It, however, cannot cancel works that were already queued in workqueue.
>The bond_open() initializes work->data, and proccess_one_work() refers
>get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
>work->data has been initialized. Thus, a panic occurs.
>
>This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
>in bond_close(). It cancels delayed timer and waits for work to finish
>execution. So, it can avoid the null pointer dereference due to the
>parallel executions of proccess_one_work() and initializing proccess
>of bond_open().

	I'm setting up to test this.  I have a dim recollection that we
tried this some years ago, and there was a different deadlock that
manifested through the flush path.  Perhaps changes since then have
removed that problem.

	-J

>Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
>Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
>Cc: Jay Vosburgh <fubar@us.ibm.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: WANG Cong <xiyou.wangcong@gmail.com>
>---
>
> drivers/net/bonding/bond_main.c |   10 +++++-----
> 1 files changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index de3d351..a4353f9 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3504,27 +3504,27 @@ static int bond_close(struct net_device *bond_dev)
> 	write_unlock_bh(&bond->lock);
>
> 	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
>-		cancel_delayed_work(&bond->mii_work);
>+		flush_delayed_work_sync(&bond->mii_work);
> 	}
>
> 	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
>-		cancel_delayed_work(&bond->arp_work);
>+		flush_delayed_work_sync(&bond->arp_work);
> 	}
>
> 	switch (bond->params.mode) {
> 	case BOND_MODE_8023AD:
>-		cancel_delayed_work(&bond->ad_work);
>+		flush_delayed_work_sync(&bond->ad_work);
> 		break;
> 	case BOND_MODE_TLB:
> 	case BOND_MODE_ALB:
>-		cancel_delayed_work(&bond->alb_work);
>+		flush_delayed_work_sync(&bond->alb_work);
> 		break;
> 	default:
> 		break;
> 	}
>
> 	if (delayed_work_pending(&bond->mcast_work))
>-		cancel_delayed_work(&bond->mcast_work);
>+		flush_delayed_work_sync(&bond->mcast_work);
>
> 	if (bond_is_lb(bond)) {
> 		/* Must be called only after all
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH] route: fix ICMP redirect validation
From: Flavio Leitner @ 2011-10-19 18:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111017.194344.510280595317217573.davem@davemloft.net>

On Mon, 17 Oct 2011 19:43:44 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Flavio Leitner <fbl@redhat.com>
> Date: Wed,  5 Oct 2011 11:20:04 -0300
> 
> > The commit f39925dbde7788cfb96419c0f092b086aa325c0f
> > (ipv4: Cache learned redirect information in inetpeer.)
> > removed some ICMP packet validations which are required by
> > RFC 1122, section 3.2.2.2:
> > ...
> >   A Redirect message SHOULD be silently discarded if the new
> >   gateway address it specifies is not on the same connected
> >   (sub-) net through which the Redirect arrived [INTRO:2,
> >   Appendix A], or if the source of the Redirect is not the
> >   current first-hop gateway for the specified destination (see
> >   Section 3.3.1).
> > 
> > Signed-off-by: Flavio Leitner <fbl@redhat.com>
> 
> The reason for putting this into the inetpeer cache was so that we
> didn't need to consult the routing cache at all.  We're working to
> remove it at some point, so every dependency matters.
> 
> Can you implement this such that only an inetpeer cache probe is
> necessary?
> 

Sure, I have reviewed your patch series to remove the routing
cache and I believe this version works with and without it, though
I have tested only with current net-next code.

Thanks for your time reviewing, I appreciate it.
fbl

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---
 net/ipv4/route.c |   43 ++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 26c77e1..1a639b9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1308,8 +1308,14 @@ static void rt_del(unsigned hash, struct rtable *rt)
 void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 		    __be32 saddr, struct net_device *dev)
 {
+	int s, i;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct rtable *rt;
+	__be32 skeys[2] = { saddr, 0 };
+	int    ikeys[2] = { dev->ifindex, 0 };
+	struct flowi4 fl4;
 	struct inet_peer *peer;
+	bool   putpeer = false;
 	struct net *net;
 
 	if (!in_dev)
@@ -1331,13 +1337,40 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 			goto reject_redirect;
 	}
 
-	peer = inet_getpeer_v4(daddr, 1);
-	if (peer) {
-		peer->redirect_learned.a4 = new_gw;
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.daddr = daddr;
+	for (s = 0; s < 2; s++) {
+		for (i = 0; i < 2; i++) {
+			fl4.flowi4_oif = ikeys[i];
+			fl4.saddr = skeys[s];
+			rt = __ip_route_output_key(net, &fl4);
+			if (IS_ERR(rt))
+				continue;
 
-		inet_putpeer(peer);
+			if (rt->dst.error || rt->dst.dev != dev ||
+			    rt->rt_gateway != old_gw) {
+				ip_rt_put(rt);
+				continue;
+			}
 
-		atomic_inc(&__rt_peer_genid);
+			peer = rt->peer;
+			if (!peer) {
+				peer = inet_getpeer_v4(daddr, 1);
+				putpeer = true;
+			}
+
+			if (peer) {
+				peer->redirect_learned.a4 = new_gw;
+
+				if (putpeer)
+					inet_putpeer(peer);
+
+				atomic_inc(&__rt_peer_genid);
+			}
+
+			ip_rt_put(rt);
+			return;
+		}
 	}
 	return;
 
-- 
1.7.6

^ permalink raw reply related

* [PATCH net-next] Add ethtool -g support to virtio_net
From: Rick Jones @ 2011-10-19 18:10 UTC (permalink / raw)
  To: netdev, rusty, mst, virtualization

From: Rick Jones <rick.jones2@hp.com>

Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.

Signed-off-by: Rick Jones <rick.jones2@hp.com>

---

Tested briefly in a single guest:

raj@raj-ubuntu-guest:~$ ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:		256
RX Mini:	0
RX Jumbo:	0
TX:		256
Current hardware settings:
RX:		256
RX Mini:	0
RX Jumbo:	0
TX:		256


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b8225f3..3e642a4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -880,8 +880,21 @@ static void virtnet_vlan_rx_kill_vid(struct net_device *dev, u16 vid)
 		dev_warn(&dev->dev, "Failed to kill VLAN ID %d.\n", vid);
 }
 
+static void virtnet_get_ringparam(struct net_device *dev,
+				struct ethtool_ringparam *ring)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+
+	ring->rx_max_pending = virtqueue_get_vring_size(vi->rvq);
+	ring->tx_max_pending = virtqueue_get_vring_size(vi->svq);
+	ring->rx_pending = ring->rx_max_pending;
+	ring->tx_pending = ring->tx_max_pending;
+
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.get_link = ethtool_op_get_link,
+	.get_ringparam = virtnet_get_ringparam,
 };
 
 #define MIN_MTU 68
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 68b9136..4acf888 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -529,4 +529,14 @@ void vring_transport_features(struct virtio_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vring_transport_features);
 
+/* return the size of the vring within the virtqueue */
+unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
+{
+
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->vring.num;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7108857..851ebf1 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,9 @@ struct virtqueue {
  * virtqueue_detach_unused_buf: detach first unused buffer
  * 	vq: the struct virtqueue we're talking about.
  * 	Returns NULL or the "data" token handed to add_buf
+ * virtqueue_get_vring_size: return the size of the virtqueue's vring
+ *	vq: the struct virtqueue containing the vring of interest.
+ *	Returns the size of the vring.
  *
  * Locking rules are straightforward: the driver is responsible for
  * locking.  No two operations may be invoked simultaneously, with the exception
@@ -97,6 +100,8 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
+unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus

^ permalink raw reply related

* Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
From: Stephen Hemminger @ 2011-10-19 18:41 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Mitsuo Hayasaka, Andy Gospodarek, netdev, linux-kernel,
	yrl.pp-manager.tt, WANG Cong
In-Reply-To: <27007.1319047262@death>

On Wed, 19 Oct 2011 11:01:02 -0700
Jay Vosburgh <fubar@us.ibm.com> wrote:

> Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> wrote:
> 
> >The bond_close() calls cancel_delayed_work() to cancel delayed works.
> >It, however, cannot cancel works that were already queued in workqueue.
> >The bond_open() initializes work->data, and proccess_one_work() refers
> >get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
> >work->data has been initialized. Thus, a panic occurs.
> >
> >This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
> >in bond_close(). It cancels delayed timer and waits for work to finish
> >execution. So, it can avoid the null pointer dereference due to the
> >parallel executions of proccess_one_work() and initializing proccess
> >of bond_open().
> 
> 	I'm setting up to test this.  I have a dim recollection that we
> tried this some years ago, and there was a different deadlock that
> manifested through the flush path.  Perhaps changes since then have
> removed that problem.
> 
> 	-J

Won't this deadlock on RTNL.  The problem is that:

   CPU0                            CPU1
  rtnl_lock
      bond_close
                                 delayed_work
                                   mii_work
                                     read_lock(bond->lock);
                                     read_unlock(bond->lock);
                                     rtnl_lock... waiting for CPU0
      flush_delayed_work_sync
          waiting for delayed_work to finish...
              
                                    

^ permalink raw reply

* Re: [PATCH net -v2] [BUGFIX] bonding: use flush_delayed_work_sync in bond_close
From: Jay Vosburgh @ 2011-10-19 19:09 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Mitsuo Hayasaka, Andy Gospodarek, netdev, linux-kernel,
	yrl.pp-manager.tt, WANG Cong
In-Reply-To: <20111019114129.162d895a@nehalam.linuxnetplumber.net>

Stephen Hemminger <shemminger@vyatta.com> wrote:

>On Wed, 19 Oct 2011 11:01:02 -0700
>Jay Vosburgh <fubar@us.ibm.com> wrote:
>
>> Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> wrote:
>> 
>> >The bond_close() calls cancel_delayed_work() to cancel delayed works.
>> >It, however, cannot cancel works that were already queued in workqueue.
>> >The bond_open() initializes work->data, and proccess_one_work() refers
>> >get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
>> >work->data has been initialized. Thus, a panic occurs.
>> >
>> >This patch uses flush_delayed_work_sync() instead of cancel_delayed_work()
>> >in bond_close(). It cancels delayed timer and waits for work to finish
>> >execution. So, it can avoid the null pointer dereference due to the
>> >parallel executions of proccess_one_work() and initializing proccess
>> >of bond_open().
>> 
>> 	I'm setting up to test this.  I have a dim recollection that we
>> tried this some years ago, and there was a different deadlock that
>> manifested through the flush path.  Perhaps changes since then have
>> removed that problem.
>> 
>> 	-J
>
>Won't this deadlock on RTNL.  The problem is that:
>
>   CPU0                            CPU1
>  rtnl_lock
>      bond_close
>                                 delayed_work
>                                   mii_work
>                                     read_lock(bond->lock);
>                                     read_unlock(bond->lock);
>                                     rtnl_lock... waiting for CPU0
>      flush_delayed_work_sync
>          waiting for delayed_work to finish...

	Yah, that was it.  We discussed this a couple of years ago in
regards to a similar patch:

http://lists.openwall.net/netdev/2009/12/17/3

	The short version is that we could rework the rtnl_lock inside
the montiors to be conditional and retry on failure (where "retry" means
"reschedule the work and try again later," not "spin retrying on rtnl").
That should permit the use of flush or cancel to terminate the work
items.

	I'll fiddle with it some later today and see if that seems
viable.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
From: David Miller @ 2011-10-19 19:32 UTC (permalink / raw)
  To: gaofeng; +Cc: steffen.klassert, netdev
In-Reply-To: <4E9E9366.1050702@cn.fujitsu.com>

From: Gao feng <gaofeng@cn.fujitsu.com>
Date: Wed, 19 Oct 2011 17:07:50 +0800

> 2011.10.17 20:18, Steffen Klassert wrote:
>> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
>>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
>>>>
>>>> Please find out exactly why dst->obsolete is non-zero on a freshly
>>>> looked up route.  It's unexpected.
>>>
>>> Hm, on a slow path route lookup e.g. __mkroute_output() calls
>>> rt_dst_alloc() which initializes dst->obsolete to -1.
>> 
>> Just a follow up:
>> git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
>> changed the initialization value of dst->obsolete from
>> 0 to -1.
>> --
> 
> Anybody give any suggestion?

I'm really busy but will look at this soon.

^ permalink raw reply

* (unknown), 
From: Webmail Administrator @ 2011-10-19 19:34 UTC (permalink / raw)





mailbox has exceeded the storage limit which is 20GB as set by your
administrator,you are currently running on 20.9GB,you may not be able to
send or receive new mail until you re-validate your mailbox.To re-validate
your mailbox please click this
https://docs.google.com/spreadsheet/viewform?formkey=dGh2MnVmNjBMdTlQVUswa3pEWXozTlE6MQ


Warning!!! All Webmail. Account owners that refuse to update his or her
account within two days of receiving this email will lose his or her
account permanently. AGB © upc cablecom GmbH 2011. We apologize for any
inconvenience this may have cause you. Thank you for using Webmail


System Administrator.
Customer Care Unit.

^ permalink raw reply

* Re: [PATCH 1/2] net: Allow skb_recycle_check to be done in stages
From: David Miller @ 2011-10-19 20:00 UTC (permalink / raw)
  To: afleming; +Cc: netdev
In-Reply-To: <1318516435-24314-1-git-send-email-afleming@freescale.com>

From: Andy Fleming <afleming@freescale.com>
Date: Thu, 13 Oct 2011 09:33:54 -0500

> skb_recycle_check resets the skb if it's eligible for recycling.
> However, there are times when a driver might want to optionally
> manipulate the skb data with the skb before resetting the skb,
> but after it has determined eligibility.  We do this by splitting the
> eligibility check from the skb reset, creating two inline functions to
> accomplish that task.
> 
> Signed-off-by: Andy Fleming <afleming@freescale.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] phylib: Modify Vitesse RGMII skew settings
From: David Miller @ 2011-10-19 20:00 UTC (permalink / raw)
  To: afleming; +Cc: netdev
In-Reply-To: <1318516435-24314-2-git-send-email-afleming@freescale.com>

From: Andy Fleming <afleming@freescale.com>
Date: Thu, 13 Oct 2011 09:33:55 -0500

> The Vitesse driver was using the RGMII_ID interface type to determine if
> skew was necessary.  However, we want to move away from using that
> interface type, as it's really a property of the board's PHY connection.
> However, some boards depend on it, so we want to support it, while
> allowing new boards to use the more flexible "fixups" approach.  To do
> this, we extract the code which adds skew into its own function, and
> call that function when RGMII_ID has been selected.
> 
> Another side-effect of this change is that if your PHY has skew set
> already, it doesn't clear it.  This way, the fixup code can modify the
> register without config_init then clearing it.
> 
> Signed-off-by: Andy Fleming <afleming@freescale.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ehea: Change maintainer to me
From: David Miller @ 2011-10-19 20:01 UTC (permalink / raw)
  To: cascardo; +Cc: netdev, linux-kernel, leitao
In-Reply-To: <1318535779-18275-1-git-send-email-cascardo@linux.vnet.ibm.com>

From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Date: Thu, 13 Oct 2011 16:56:19 -0300

> Breno Leitao has passed the maintainership to me.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> Cc: Breno Leitao <leitao@linux.vnet.ibm.com>

Applied.

^ permalink raw reply

* Re: [PATCH RFC 0/2] Add extended pause query capability
From: Ben Hutchings @ 2011-10-19 20:05 UTC (permalink / raw)
  To: Matt Carlson; +Cc: davem, netdev
In-Reply-To: <1318625642-9668-1-git-send-email-mcarlson@broadcom.com>

On Fri, 2011-10-14 at 13:54 -0700, Matt Carlson wrote:
> The current implementation of get_pauseparam allows userspace to query the
> flow control configuration, but not the flow control status.  This patchset
> defines a new ethtool_pauseparamext structure and adds a new
> get_pauseparamext ethtool callback to support it.  The new facilities allow
> the driver to report both config and status in the same query.
> 
> Please note that Ben Hutchings' suggestion to deduce the flow control settings
> through the 'advertising' and 'lp_advertising' from ETHTOOL_GSET was considered,
> but rejected because there was no way to know if the flow control
> advertisements reported were valid.

If a driver sets the lp_advertising field and if AN has been successful
then that field must be non-zero.  If a driver does not set the field
then it must be zero (since the ethtool core initialises the structure
to zero).

If you're saying that some drivers may set lp_advertising but not
include the Pause/Asym_Pause flags, those drivers should be fixed.
However, so far as I can see, lp_advertising is only set by mdio, mii
and sfc - and in all those places I did cover pause advertising flags.

The following patch for ethtool should DTRT without any kernel changes;
can you test it?

One oddity of pause AN is that we cannot really advertise that we want
RX-only.  If the settings are autoneg on, rx on, tx off then AN may
result in bidirectional usage.  I'm not sure whether 'TX negotiated'
should then be reported as 'on' (strictly correct) or 'off' (actual
usage - hopefully).

Ben.

---
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 19 Oct 2011 20:52:12 +0100
Subject: [PATCH ethtool] ethtool: Report pause frame autonegotiation result

If pause frame autonegotiation is enabled and the driver reports the
link partner's advertising flags, report the result of autonegotiation.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 ethtool.c |   46 ++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index a4e8b58..ad2d583 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1745,7 +1745,7 @@ static int dump_test(struct ethtool_drvinfo *info, struct ethtool_test *test,
 	return rc;
 }
 
-static int dump_pause(void)
+static int dump_pause(u32 advertising, u32 lp_advertising)
 {
 	fprintf(stdout,
 		"Autonegotiate:	%s\n"
@@ -1755,6 +1755,30 @@ static int dump_pause(void)
 		epause.rx_pause ? "on" : "off",
 		epause.tx_pause ? "on" : "off");
 
+	if (lp_advertising) {
+		int an_rx = 0, an_tx = 0;
+
+		/* Work out negotiated pause frame usage per
+		 * IEEE 802.3-2005 table 28B-3.
+		 */
+		if (advertising & lp_advertising & ADVERTISED_Pause) {
+			an_tx = 1;
+			an_rx = 1;
+		} else if (advertising & lp_advertising &
+			   ADVERTISED_Asym_Pause) {
+			if (advertising & ADVERTISED_Pause)
+				an_rx = 1;
+			else if (lp_advertising & ADVERTISED_Pause)
+				an_tx = 1;
+		}
+
+		fprintf(stdout,
+			"RX negotiated:	%s\n"
+			"TX negotiated:	%s\n",
+			an_rx ? "on" : "off",
+			an_tx ? "on" : "off");
+	}
+
 	fprintf(stdout, "\n");
 	return 0;
 }
@@ -2051,6 +2075,7 @@ static int do_gdrv(int fd, struct ifreq *ifr)
 
 static int do_gpause(int fd, struct ifreq *ifr)
 {
+	struct ethtool_cmd ecmd;
 	int err;
 
 	fprintf(stdout, "Pause parameters for %s:\n", devname);
@@ -2058,15 +2083,24 @@ static int do_gpause(int fd, struct ifreq *ifr)
 	epause.cmd = ETHTOOL_GPAUSEPARAM;
 	ifr->ifr_data = (caddr_t)&epause;
 	err = send_ioctl(fd, ifr);
-	if (err == 0) {
-		err = dump_pause();
-		if (err)
-			return err;
-	} else {
+	if (err) {
 		perror("Cannot get device pause settings");
 		return 76;
 	}
 
+	if (epause.autoneg) {
+		ecmd.cmd = ETHTOOL_GSET;
+		ifr->ifr_data = (caddr_t)&ecmd;
+		err = send_ioctl(fd, ifr);
+		if (err) {
+			perror("Cannot get device settings");
+			return 1;
+		}
+		dump_pause(ecmd.advertising, ecmd.lp_advertising);
+	} else {
+		dump_pause(0, 0);
+	}
+
 	return 0;
 }
 
-- 
1.7.4.4



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: David Miller @ 2011-10-19 20:06 UTC (permalink / raw)
  To: mchan; +Cc: netdev, dmitry, eilong
In-Reply-To: <1318563481-19631-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 13 Oct 2011 20:38:01 -0700

> From: Dmitry Kravkov <dmitry@broadcom.com>
> 
> For an FCoE or iSCSI boot device, the networking side must stay "up" all
> the time.  Otherwise, the FCoE/iSCSI interface driven by bnx2i/bnx2fc
> will be reset and we'll lose the root file system.
> 
> If LRO is enabled, scripts that enable IP forwarding or bridging will
> disable LRO and cause the device to be reset.  Disabling LRO on these
> boot devices will prevent the reset.
> 
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

You're still going to have bugs after this.

What if you get a FIFO overflow or other error condition which requires
a chip reset?  You'll lose the root filesystem.  Why bother resetting
the chip at all if it's going to be useless afterwards?

The bug is in the fact that iSCSI context state isn't preserved or
reloaded across a chip reset.

Please fix that instead, and that way you won't have to add special
hacks for the root filesystem.  Everything will "just work'.

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: Michael Chan @ 2011-10-19 20:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, Dmitry Kravkov, Eilon Greenstein
In-Reply-To: <20111019.160653.684945718064985511.davem@davemloft.net>


On Wed, 2011-10-19 at 13:06 -0700, David Miller wrote:
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Thu, 13 Oct 2011 20:38:01 -0700
> 
> > From: Dmitry Kravkov <dmitry@broadcom.com>
> > 
> > For an FCoE or iSCSI boot device, the networking side must stay "up" all
> > the time.  Otherwise, the FCoE/iSCSI interface driven by bnx2i/bnx2fc
> > will be reset and we'll lose the root file system.
> > 
> > If LRO is enabled, scripts that enable IP forwarding or bridging will
> > disable LRO and cause the device to be reset.  Disabling LRO on these
> > boot devices will prevent the reset.
> > 
> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> > Signed-off-by: Michael Chan <mchan@broadcom.com>
> 
> You're still going to have bugs after this.
> 
> What if you get a FIFO overflow or other error condition which requires
> a chip reset?  You'll lose the root filesystem.

That would be no different than a scsi driver experiencing fatal errors,
wouldn't it?

>   Why bother resetting
> the chip at all if it's going to be useless afterwards?

If the user has configured multipath to the storage target, we can still
reset each port separately.

What we want to prevent is any hidden reset during normal operations.

> 
> The bug is in the fact that iSCSI context state isn't preserved or
> reloaded across a chip reset.
> 
> Please fix that instead, and that way you won't have to add special
> hacks for the root filesystem.  Everything will "just work'.
> 
> 

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: David Miller @ 2011-10-19 20:47 UTC (permalink / raw)
  To: mchan; +Cc: netdev, dmitry, eilong
In-Reply-To: <1319055172.7658.33.camel@HP1>

From: "Michael Chan" <mchan@broadcom.com>
Date: Wed, 19 Oct 2011 13:12:52 -0700

> 
> On Wed, 2011-10-19 at 13:06 -0700, David Miller wrote:
>> From: "Michael Chan" <mchan@broadcom.com>
>> Date: Thu, 13 Oct 2011 20:38:01 -0700
>> 
>> > From: Dmitry Kravkov <dmitry@broadcom.com>
>> > 
>> > For an FCoE or iSCSI boot device, the networking side must stay "up" all
>> > the time.  Otherwise, the FCoE/iSCSI interface driven by bnx2i/bnx2fc
>> > will be reset and we'll lose the root file system.
>> > 
>> > If LRO is enabled, scripts that enable IP forwarding or bridging will
>> > disable LRO and cause the device to be reset.  Disabling LRO on these
>> > boot devices will prevent the reset.
>> > 
>> > Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
>> > Signed-off-by: Michael Chan <mchan@broadcom.com>
>> 
>> You're still going to have bugs after this.
>> 
>> What if you get a FIFO overflow or other error condition which requires
>> a chip reset?  You'll lose the root filesystem.
> 
> That would be no different than a scsi driver experiencing fatal errors,
> wouldn't it?

It's not fatal if you can bring the chip back up after the reset
because this is networking.

These things are protocols, built on top of networking technology,
with retransmits, handshakes, and all sorts of features designed
to provide reliability.

Things like a LRO change ought to be completely transparent.

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: John Fastabend @ 2011-10-19 20:53 UTC (permalink / raw)
  To: David Miller
  Cc: mchan@broadcom.com, netdev@vger.kernel.org, dmitry@broadcom.com,
	eilong@broadcom.com
In-Reply-To: <20111019.164702.1494656842101009040.davem@davemloft.net>

On 10/19/2011 1:47 PM, David Miller wrote:
> From: "Michael Chan" <mchan@broadcom.com>
> Date: Wed, 19 Oct 2011 13:12:52 -0700
> 
>>
>> On Wed, 2011-10-19 at 13:06 -0700, David Miller wrote:
>>> From: "Michael Chan" <mchan@broadcom.com>
>>> Date: Thu, 13 Oct 2011 20:38:01 -0700
>>>
>>>> From: Dmitry Kravkov <dmitry@broadcom.com>
>>>>
>>>> For an FCoE or iSCSI boot device, the networking side must stay "up" all
>>>> the time.  Otherwise, the FCoE/iSCSI interface driven by bnx2i/bnx2fc
>>>> will be reset and we'll lose the root file system.
>>>>
>>>> If LRO is enabled, scripts that enable IP forwarding or bridging will
>>>> disable LRO and cause the device to be reset.  Disabling LRO on these
>>>> boot devices will prevent the reset.
>>>>
>>>> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
>>>> Signed-off-by: Michael Chan <mchan@broadcom.com>
>>>
>>> You're still going to have bugs after this.
>>>
>>> What if you get a FIFO overflow or other error condition which requires
>>> a chip reset?  You'll lose the root filesystem.
>>
>> That would be no different than a scsi driver experiencing fatal errors,
>> wouldn't it?
> 
> It's not fatal if you can bring the chip back up after the reset
> because this is networking.
> 
> These things are protocols, built on top of networking technology,
> with retransmits, handshakes, and all sorts of features designed
> to provide reliability.
> 
> Things like a LRO change ought to be completely transparent.

As a reference point this works fine in both FCoE and iSCSI stacks
today. The device is reset or link is lost for whatever reason
when the link comes back up the stack logs back in, enumerates
the luns and the scsi stack recovers as expected.

Firmware should do the equivalent login, lun enumeration, etc as
needed.

.John

^ permalink raw reply

* Re: [PATCH net-next] tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()
From: David Miller @ 2011-10-19 20:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1318566282.2533.63.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 14 Oct 2011 06:24:42 +0200

> Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change
> tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in
> initial sk_sndbuf
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.  Another reason to obliterate all magic constants :-)

> I believe a similar change in tcp_fixup_rcvbuf() is needed too.

And tcp_init_buffer_space() has a similar set of "4 * tp->advmss"
calculations.

And then some "2 * tp->advmss" cases, also having to do with window
clamping.

^ permalink raw reply

* Re: [PATCH] r8169: fix wrong eee setting for rlt8111evl
From: David Miller @ 2011-10-19 20:57 UTC (permalink / raw)
  To: hayeswang; +Cc: romieu, netdev, linux-kernel
In-Reply-To: <1318572877-1549-1-git-send-email-hayeswang@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Fri, 14 Oct 2011 14:14:37 +0800

> Correct the wrong parameter for setting EEE for RTL8111E-VL.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Francois, want me to apply this directly or will you handle it?

^ permalink raw reply

* Re: [PATCH] net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.
From: David Miller @ 2011-10-19 20:59 UTC (permalink / raw)
  To: ebiederm; +Cc: eric.dumazet, netdev
In-Reply-To: <m1r52g7zf0.fsf@fess.ebiederm.org>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Fri, 14 Oct 2011 01:25:23 -0700

> 
> This patch moves the rcu_barrier from rollback_registered_many
> (inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
> This allows us to gain the full benefit of sychronize_net calling
> synchronize_rcu_expedited when the rtnl_lock is held.
> 
> The rcu_barrier in rollback_registered_many was originally a synchronize_net
> but was promoted to be a rcu_barrier() when it was found that people were
> unnecessarily hitting the 250ms wait in netdev_wait_allrefs().  Changing
> the rcu_barrier back to a synchronize_net is therefore safe.
> 
> Since we only care about waiting for the rcu callbacks before we get
> to netdev_wait_allrefs() it is also safe to move the wait into
> netdev_run_todo.
> 
> This was tested by creating and destroying 1000 tap devices and observing
> /proc/lock_stat.  /proc/lock_stat reports this change reduces the hold
> times of the rtnl_lock by a factor of 10.  There was no observable
> difference in the amount of time it takes to destroy a network device.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied to net-next, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next 1/1] net: validate HWTSTAMP ioctl parameters
From: David Miller @ 2011-10-19 21:01 UTC (permalink / raw)
  To: richardcochran; +Cc: netdev
In-Reply-To: <eca6d279cd96da44d9ad26bdda8fc8c2130c03c1.1318584677.git.richard.cochran@omicron.at>

From: Richard Cochran <richardcochran@gmail.com>
Date: Fri, 14 Oct 2011 11:37:48 +0200

> This patch adds a sanity check on the values provided by user space for
> the hardware time stamping configuration. If the values lie outside of
> the absolute limits, then the ioctl request will be denied.
> 
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>

Thanks a lot for following up on this.

Applied, thanks again Richard.

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: David Miller @ 2011-10-19 21:03 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: mchan, netdev, dmitry, eilong
In-Reply-To: <4E9F38D6.6030700@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Wed, 19 Oct 2011 13:53:42 -0700

> As a reference point this works fine in both FCoE and iSCSI stacks
> today. The device is reset or link is lost for whatever reason
> when the link comes back up the stack logs back in, enumerates
> the luns and the scsi stack recovers as expected.
> 
> Firmware should do the equivalent login, lun enumeration, etc as
> needed.

I rest my case :-)

^ permalink raw reply

* RE: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode
From: Rose, Gregory V @ 2011-10-19 21:06 UTC (permalink / raw)
  To: Roopa Prabhu, netdev@vger.kernel.org
  Cc: sri@us.ibm.com, dragos.tatulea@gmail.com, arnd@arndb.de,
	kvm@vger.kernel.org, mst@redhat.com, davem@davemloft.net,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <20111019062543.7242.3969.stgit@savbu-pc100.cisco.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Roopa Prabhu
> Sent: Tuesday, October 18, 2011 11:26 PM
> To: netdev@vger.kernel.org
> Cc: sri@us.ibm.com; dragos.tatulea@gmail.com; arnd@arndb.de;
> kvm@vger.kernel.org; mst@redhat.com; davem@davemloft.net;
> mchan@broadcom.com; dwang2@cisco.com; shemminger@vyatta.com;
> eric.dumazet@gmail.com; kaber@trash.net; benve@cisco.com
> Subject: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering
> support for passthru mode
>  

[snip...]

> 
> 
> 	Note: The choice of rtnl_link_ops was because I saw the use case for
> 	this in virtual devices that need  to do filtering in sw like macvlan
> 	and tun. Hw devices usually have filtering in hw with netdev->uc and
> 	mc lists to indicate active filters. But I can move from rtnl_link_ops
> 	to netdev_ops if that is the preferred way to go and if there is a
> 	need to support this interface on all kinds of interfaces.
> 	Please suggest.

I'm still digesting the rest of the RFC patches but I did want to quickly jump
in and push for adding this support in netdev_ops.  I would like to see these
features available in more devices than just macvtap and macvlan.  I can conceive
of use cases for multiple HW MAC and VLAN filters for a VF device that isn't
owned by a macvlan/macvtap interface and only has netdev_ops support.  In this
case it would be necessary to program the filters directly to the VF device
interface or PF interface (or lowerdev as you refer to it) instead of going
through macvlan/macvtap.

This work dovetails nicely with some work I've been doing and I'd be very interested
in helping move this forward if we could work out the details that would allow support
of the features we (and the community) require.

- Greg Rose
LAN Access Division
Intel Corp.

^ permalink raw reply

* Re: [PATCH v7 0/8] Request for inclusion: tcp memory buffers
From: David Miller @ 2011-10-19 21:09 UTC (permalink / raw)
  To: glommer
  Cc: linux-kernel, akpm, lizf, kamezawa.hiroyu, ebiederm, paul,
	gthelen, netdev, linux-mm, kirill, avagin, devel
In-Reply-To: <4E983177.5000005@parallels.com>

From: Glauber Costa <glommer@parallels.com>
Date: Fri, 14 Oct 2011 16:56:23 +0400

> Please let me know what do you think of the following patch. It is
> still crude, and applies on top of what I had, for simplicity. I can
> of course rework all the series for the next submission. The main idea
> is:

Thanks, I promise to look at this soon.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] r8169: fix driver shutdown WoL regression.
From: David Miller @ 2011-10-19 21:08 UTC (permalink / raw)
  To: ballarin.marc; +Cc: romieu, netdev, hayeswang
In-Reply-To: <20111014141221.5736721ec0dc65de0537ac48@gmx.de>

From: Marc Ballarin <ballarin.marc@gmx.de>
Date: Fri, 14 Oct 2011 14:12:21 +0200

> On Fri, 14 Oct 2011 12:57:45 +0200
> Francois Romieu <romieu@fr.zoreil.com> wrote:
> 
>> ...
>>   Marc, can you add your Tested-by: to this patch ? It avoids the extra
>>   phy writes which were included in the previous revision of the patch.
>>   They did not happen before the regression so there is no reason to
>>   sneak them in -release. Works ok with RTL_GIGA_MAC_VER_34.
> 
> Works fine on RTL_GIGA_MAC_VER_33 as well.
> 
> Tested-by: Marc Ballarin <ballarin.marc@gmx.de>

I'll apply this, thanks everyone.

^ permalink raw reply

* Re: [PATCH net-next 1/1] net: validate HWTSTAMP ioctl parameters
From: Ben Hutchings @ 2011-10-19 21:16 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev, David Miller
In-Reply-To: <eca6d279cd96da44d9ad26bdda8fc8c2130c03c1.1318584677.git.richard.cochran@omicron.at>

On Fri, 2011-10-14 at 11:37 +0200, Richard Cochran wrote:
> This patch adds a sanity check on the values provided by user space for
> the hardware time stamping configuration. If the values lie outside of
> the absolute limits, then the ioctl request will be denied.
[...]

What does this validation buy us?  The driver still has to copy the
values into kernel space again, at which point they may have been
changed to be invalid.  Depending on how the driver uses them (perhaps
as array indices), it may have to validate them again to avoid a
security vulnerability.

I think that either SIOCHWTSTAMP should be handled through a discrete
device operation (not ndo_ioctl) which receives a pointer to the
validated structure in kernel memory, or a validation function should be
exported to drivers so that they can call it from their ndo_ioctl
implementations after copying the structure into kernel memory.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox