Netdev List

* [PATCH iproute2] ip: Support IFLA_TXQLEN in ip link command
From: Eric Dumazet @ 2009-10-22 14:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Linux Netdev List, Benjamin LaHaise

We currently use an expensive ioctl() to get the device txqueuelen, while
rtnetlink already gives it to us for free. This patch speeds up ip link
operations when many devices are registered.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 267ecb3..f06a3f7 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -252,9 +252,12 @@ int print_linkinfo(const struct sockaddr_nl *who,
 	if (tb[IFLA_OPERSTATE])
 		print_operstate(fp, *(__u8 *)RTA_DATA(tb[IFLA_OPERSTATE]));
 		
-	if (filter.showqueue)
-		print_queuelen(fp, (char*)RTA_DATA(tb[IFLA_IFNAME]));
-
+	if (filter.showqueue) {
+		if (tb[IFLA_TXQLEN])
+			fprintf(fp, "qlen %d ", *(int *)RTA_DATA(tb[IFLA_TXQLEN]));
+		else
+			print_queuelen(fp, (char *)RTA_DATA(tb[IFLA_IFNAME]));
+	}
 	if (!filter.family || filter.family == AF_PACKET) {
 		SPRINT_BUF(b1);
 		fprintf(fp, "%s", _SL_);
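
For illustration, a minimal userspace sketch of reading IFLA_TXQLEN straight out of an RTM_NEWLINK attribute chain, instead of issuing one ioctl() per device. It assumes the uapi headers <linux/rtnetlink.h> and <linux/if_link.h>; find_txqlen() and put_u32_attr() are hypothetical names, and iproute2 itself fills a tb[] array with parse_rtattr() rather than scanning like this.

```c
#include <string.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Scan an attribute chain; return IFLA_TXQLEN, or -1 if the kernel did
 * not include it, in which case a caller would fall back to the ioctl
 * path (print_queuelen()) as the patch does. */
static long find_txqlen(struct rtattr *rta, int len)
{
	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_TXQLEN &&
		    RTA_PAYLOAD(rta) >= (int)sizeof(__u32)) {
			__u32 qlen;

			memcpy(&qlen, RTA_DATA(rta), sizeof(qlen));
			return (long)qlen;
		}
	}
	return -1;
}

/* Hypothetical helper: append one u32 attribute at buf + *off. */
static void put_u32_attr(char *buf, int *off, unsigned short type, __u32 val)
{
	struct rtattr *rta = (struct rtattr *)(buf + *off);

	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(sizeof(val));
	memcpy(RTA_DATA(rta), &val, sizeof(val));
	*off += RTA_ALIGN(rta->rta_len);
}
```

The kernel emits IFLA_TXQLEN as a u32, so reading it as __u32 (and printing with %u) matches the wire format exactly; the patch's *(int *) cast works in practice because attribute payloads start 4-byte aligned.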

^ permalink raw reply related

* [PATCH net-next-2.6 4/4] dvb: dvb_net: use mc helpers to access multicast list
From: Jiri Pirko @ 2009-10-22 13:57 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak, e1000-devel,
	mchehab, linux-media
In-Reply-To: <20091022135120.GC2868@psychotron.lab.eng.brq.redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/media/dvb/dvb-core/dvb_net.c |   22 +++++++---------------
 1 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/media/dvb/dvb-core/dvb_net.c b/drivers/media/dvb/dvb-core/dvb_net.c
index 8c9ae0a..eb50fb0 100644
--- a/drivers/media/dvb/dvb-core/dvb_net.c
+++ b/drivers/media/dvb/dvb-core/dvb_net.c
@@ -1110,17 +1110,16 @@ static int dvb_net_feed_stop(struct net_device *dev)
 }
 
 
-static int dvb_set_mc_filter (struct net_device *dev, struct dev_mc_list *mc)
+static void dvb_set_mc_filter(void *data, unsigned char *addr)
 {
-	struct dvb_net_priv *priv = netdev_priv(dev);
+	struct dvb_net_priv *priv = data;
 
 	if (priv->multi_num == DVB_NET_MULTICAST_MAX)
-		return -ENOMEM;
+		return;
 
-	memcpy(priv->multi_macs[priv->multi_num], mc->dmi_addr, 6);
+	memcpy(priv->multi_macs[priv->multi_num], addr, ETH_ALEN);
 
 	priv->multi_num++;
-	return 0;
 }
 
 
@@ -1140,21 +1139,14 @@ static void wq_set_multicast_list (struct work_struct *work)
 	} else if ((dev->flags & IFF_ALLMULTI)) {
 		dprintk("%s: allmulti mode\n", dev->name);
 		priv->rx_mode = RX_MODE_ALL_MULTI;
-	} else if (dev->mc_count) {
-		int mci;
-		struct dev_mc_list *mc;
-
+	} else if (netdev_mc_count(dev)) {
 		dprintk("%s: set_mc_list, %d entries\n",
-			dev->name, dev->mc_count);
+			dev->name, netdev_mc_count(dev));
 
 		priv->rx_mode = RX_MODE_MULTI;
 		priv->multi_num = 0;
 
-		for (mci = 0, mc=dev->mc_list;
-		     mci < dev->mc_count;
-		     mc = mc->next, mci++) {
-			dvb_set_mc_filter(dev, mc);
-		}
+		netdev_mc_walk(dev, dvb_set_mc_filter, priv);
 	}
 
 	netif_addr_unlock_bh(dev);
-- 
1.6.2.5
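
With the helpers, the callback contract becomes void (*)(void *data, unsigned char *addr): the walker hands each address to the callback together with an opaque cookie, here the driver's private struct. A minimal standalone sketch of the bounded copy in dvb_set_mc_filter(); SKETCH_MULTICAST_MAX and struct sketch_priv are hypothetical stand-ins for DVB_NET_MULTICAST_MAX and struct dvb_net_priv, and a plain loop stands in for netdev_mc_walk().

```c
#include <string.h>

#define ETH_ALEN 6
#define SKETCH_MULTICAST_MAX 4	/* hypothetical stand-in for DVB_NET_MULTICAST_MAX */

struct sketch_priv {
	int multi_num;
	unsigned char multi_macs[SKETCH_MULTICAST_MAX][ETH_ALEN];
};

/* Mirrors the patched dvb_set_mc_filter(): copy until the table is full. */
static void sketch_set_mc_filter(void *data, unsigned char *addr)
{
	struct sketch_priv *priv = data;

	if (priv->multi_num == SKETCH_MULTICAST_MAX)
		return;		/* table full: this address is skipped */

	memcpy(priv->multi_macs[priv->multi_num], addr, ETH_ALEN);
	priv->multi_num++;
}
```

Because the callback now returns void, the -ENOMEM the old version returned on overflow is gone; the old caller ignored it anyway, so behaviour is unchanged: once the table is full, the remaining addresses are silently skipped.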


^ permalink raw reply related

* Re: [PATCH net-next-2.6 0/4] net: change the way mc_list is accessed
From: Jiri Pirko @ 2009-10-22 13:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak, e1000-devel,
	mchehab, linux-media
In-Reply-To: <20091022135446.GG2868@psychotron.lab.eng.brq.redhat.com>

wrong subject... reposting...

^ permalink raw reply

* Re: [RFC] net,socket: introduce build_sockaddr_check helper to catch overflow at build time
From: Cyrill Gorcunov @ 2009-10-22 13:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20091022.044914.36401063.davem@davemloft.net>

[David Miller - Thu, Oct 22, 2009 at 04:49:14AM -0700]
| From: Cyrill Gorcunov <gorcunov@gmail.com>
| Date: Wed, 21 Oct 2009 21:07:32 +0400
| 
| > net,socket: introduce build_sockaddr_check helper to catch overflow at build time
| > 
| > proto_ops->getname implies copying protocol specific data
| > into storage unit (particulary to __kernel_sockaddr_storage).
| > So when one implements new protocol he either may keep this
| > in mind (or may not).
| > 
| > Let's introduce build_sockaddr_check helper which checks that the
| > storage unit is not overflowed. Note that the check is done at build
| > time and introduces no slowdown at execution time.
| > 
| > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
| 
| Nice idea, and I wonder if we can automate it even further.
| Perhaps some tag that gets put on the socket address type
| definition or similar?
| 

Thanks for the review, David! I'm not sure I understand you correctly.
Initially I was trying to keep the changes as minimal as possible.
I was also turning over the following possibilities in my mind:

1) Since at least one .getname handler uses memcpy, we could
   introduce a helper which checks the size (at build time) and
   then does the memcpy (perhaps not optimal).

2) All handlers set *len to some size explicitly, so we could
   introduce a set_sockaddr_size() helper like

#define set_sockaddr_size(ptr, size)		\
	do {					\
		build_sockaddr_check(size);	\
		*ptr = size;			\
	} while (0)
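
As a sketch of idea (2), here is a userspace analogue. It assumes struct sockaddr_storage as a stand-in for __kernel_sockaddr_storage, and every sketch_ name is hypothetical; the compile-time check uses the same negative-array-size trick as the kernel's BUILD_BUG_ON().

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical analogue of build_sockaddr_check(): compilation fails
 * (negative array size) if 'size' exceeds sockaddr_storage. Like
 * BUILD_BUG_ON(), this only works for compile-time constants. */
#define sketch_sockaddr_check(size) \
	((void)sizeof(char[1 - 2 * !!((size) > sizeof(struct sockaddr_storage))]))

/* Check the size at build time, then store it in *ptr. */
#define sketch_set_sockaddr_size(ptr, size)		\
	do {						\
		sketch_sockaddr_check(size);		\
		*(ptr) = (size);			\
	} while (0)

/* A getname-style handler filling an AF_INET address. */
static int sketch_getname(struct sockaddr *uaddr, int *uaddr_len)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;

	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(80);
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sketch_set_sockaddr_size(uaddr_len, sizeof(*sin));
	return 0;
}
```

Replacing sizeof(*sin) with, say, sizeof(struct sockaddr_storage) + 1 makes the build fail, while the valid case compiles down to a plain store, so there is no run-time cost.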

Or did you mean something completely different?

	-- Cyrill

^ permalink raw reply

* Re: [PATCH net-next-2.6 0/4] net: change the way mc_list is accessed
From: Jiri Pirko @ 2009-10-22 13:54 UTC (permalink / raw)
  To: netdev
  Cc: eric.dumazet, e1000-devel, bruce.w.allan, jesse.brandeburg,
	mchehab, john.ronciak, jeffrey.t.kirsher, davem, linux-media
In-Reply-To: <20091022135120.GC2868@psychotron.lab.eng.brq.redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/media/dvb/dvb-core/dvb_net.c |   22 +++++++---------------
 1 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/media/dvb/dvb-core/dvb_net.c b/drivers/media/dvb/dvb-core/dvb_net.c
index 8c9ae0a..eb50fb0 100644
--- a/drivers/media/dvb/dvb-core/dvb_net.c
+++ b/drivers/media/dvb/dvb-core/dvb_net.c
@@ -1110,17 +1110,16 @@ static int dvb_net_feed_stop(struct net_device *dev)
 }
 
 
-static int dvb_set_mc_filter (struct net_device *dev, struct dev_mc_list *mc)
+static void dvb_set_mc_filter(void *data, unsigned char *addr)
 {
-	struct dvb_net_priv *priv = netdev_priv(dev);
+	struct dvb_net_priv *priv = data;
 
 	if (priv->multi_num == DVB_NET_MULTICAST_MAX)
-		return -ENOMEM;
+		return;
 
-	memcpy(priv->multi_macs[priv->multi_num], mc->dmi_addr, 6);
+	memcpy(priv->multi_macs[priv->multi_num], addr, ETH_ALEN);
 
 	priv->multi_num++;
-	return 0;
 }
 
 
@@ -1140,21 +1139,14 @@ static void wq_set_multicast_list (struct work_struct *work)
 	} else if ((dev->flags & IFF_ALLMULTI)) {
 		dprintk("%s: allmulti mode\n", dev->name);
 		priv->rx_mode = RX_MODE_ALL_MULTI;
-	} else if (dev->mc_count) {
-		int mci;
-		struct dev_mc_list *mc;
-
+	} else if (netdev_mc_count(dev)) {
 		dprintk("%s: set_mc_list, %d entries\n",
-			dev->name, dev->mc_count);
+			dev->name, netdev_mc_count(dev));
 
 		priv->rx_mode = RX_MODE_MULTI;
 		priv->multi_num = 0;
 
-		for (mci = 0, mc=dev->mc_list;
-		     mci < dev->mc_count;
-		     mc = mc->next, mci++) {
-			dvb_set_mc_filter(dev, mc);
-		}
+		netdev_mc_walk(dev, dvb_set_mc_filter, priv);
 	}
 
 	netif_addr_unlock_bh(dev);
-- 
1.6.2.5


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference

^ permalink raw reply related

* [PATCH net-next-2.6 3/4] e1000e: use mc helpers to access multicast list
From: Jiri Pirko @ 2009-10-22 13:54 UTC (permalink / raw)
  To: netdev
  Cc: eric.dumazet, e1000-devel, bruce.w.allan, jesse.brandeburg,
	mchehab, john.ronciak, jeffrey.t.kirsher, davem, linux-media
In-Reply-To: <20091022135120.GC2868@psychotron.lab.eng.brq.redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/e1000e/netdev.c |   34 +++++++++++++++++++---------------
 1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 3769248..97cd106 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2529,6 +2529,17 @@ static void e1000_update_mc_addr_list(struct e1000_hw *hw, u8 *mc_addr_list,
 }
 
 /**
+ * e1000_mc_walker - helper function
+ **/
+static void e1000_mc_walker(void *data, unsigned char *addr)
+{
+	u8 **mta_list_i = data;
+
+	memcpy(*mta_list_i, addr, ETH_ALEN);
+	*mta_list_i += ETH_ALEN;
+}
+
+/**
  * e1000_set_multi - Multicast and Promiscuous mode set
  * @netdev: network interface device structure
  *
@@ -2542,10 +2553,9 @@ static void e1000_set_multi(struct net_device *netdev)
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_mac_info *mac = &hw->mac;
-	struct dev_mc_list *mc_ptr;
-	u8  *mta_list;
+	u8  *mta_list, *mta_list_i;
 	u32 rctl;
-	int i;
+	int mc_count;
 
 	/* Check for Promiscuous and All Multicast modes */
 
@@ -2567,23 +2577,17 @@ static void e1000_set_multi(struct net_device *netdev)
 
 	ew32(RCTL, rctl);
 
-	if (netdev->mc_count) {
-		mta_list = kmalloc(netdev->mc_count * 6, GFP_ATOMIC);
+	mc_count = netdev_mc_count(netdev);
+	if (mc_count) {
+		mta_list = kmalloc(mc_count * ETH_ALEN, GFP_ATOMIC);
 		if (!mta_list)
 			return;
 
 		/* prepare a packed array of only addresses. */
-		mc_ptr = netdev->mc_list;
-
-		for (i = 0; i < netdev->mc_count; i++) {
-			if (!mc_ptr)
-				break;
-			memcpy(mta_list + (i*ETH_ALEN), mc_ptr->dmi_addr,
-			       ETH_ALEN);
-			mc_ptr = mc_ptr->next;
-		}
+		mta_list_i = mta_list;
+		netdev_mc_walk(netdev, e1000_mc_walker, &mta_list_i);
 
-		e1000_update_mc_addr_list(hw, mta_list, i, 1,
+		e1000_update_mc_addr_list(hw, mta_list, mc_count, 1,
 					  mac->rar_entry_count);
 		kfree(mta_list);
 	} else {
-- 
1.6.2.5
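
The e1000_mc_walker() callback only receives a void *cookie, so to advance a shared write position across calls the driver passes a pointer to its cursor (u8 **). A minimal standalone sketch of that packing pattern; sketch_pack_walker() and sketch_pack() are hypothetical, and the loop stands in for netdev_mc_walk().

```c
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Copy one address to the cursor, then advance the cursor. */
static void sketch_pack_walker(void *data, unsigned char *addr)
{
	unsigned char **cursor = data;

	memcpy(*cursor, addr, ETH_ALEN);
	*cursor += ETH_ALEN;
}

/* Build a packed array of 'count' addresses, as e1000_set_multi() does. */
static unsigned char *sketch_pack(unsigned char (*addrs)[ETH_ALEN], int count)
{
	unsigned char *list, *cursor;
	int i;

	list = malloc((size_t)count * ETH_ALEN);
	if (!list)
		return NULL;

	cursor = list;
	for (i = 0; i < count; i++)	/* stand-in for netdev_mc_walk() */
		sketch_pack_walker(&cursor, addrs[i]);
	return list;
}
```

One difference worth checking in review: the old loop passed the number of addresses actually copied (i) to e1000_update_mc_addr_list(), while the new code passes mc_count. The two only differ if the list holds fewer entries than mc_count.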

^ permalink raw reply related

* [PATCH net-next-2.6 2/4] 8139too: use mc helpers to access multicast list
From: Jiri Pirko @ 2009-10-22 13:53 UTC (permalink / raw)
  To: netdev
  Cc: eric.dumazet, e1000-devel, bruce.w.allan, jesse.brandeburg,
	mchehab, john.ronciak, jeffrey.t.kirsher, davem, linux-media
In-Reply-To: <20091022135120.GC2868@psychotron.lab.eng.brq.redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/8139too.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index 7e333f7..f0c3670 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -2501,6 +2501,15 @@ static struct net_device_stats *rtl8139_get_stats (struct net_device *dev)
 	return &dev->stats;
 }
 
+static void mc_walker(void *data, unsigned char *addr)
+{
+	u32 *mc_filter = data;
+	int bit_nr;
+
+	bit_nr = ether_crc(ETH_ALEN, addr) >> 26;
+	mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
+}
+
 /* Set or clear the multicast filter for this adaptor.
    This routine is not state sensitive and need not be SMP locked. */
 
@@ -2509,7 +2518,7 @@ static void __set_rx_mode (struct net_device *dev)
 	struct rtl8139_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
 	u32 mc_filter[2];	/* Multicast hash filter */
-	int i, rx_mode;
+	int rx_mode;
 	u32 tmp;
 
 	pr_debug("%s:   rtl8139_set_rx_mode(%4.4x) done -- Rx config %8.8lx.\n",
@@ -2521,22 +2530,17 @@ static void __set_rx_mode (struct net_device *dev)
 		    AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
 		    AcceptAllPhys;
 		mc_filter[1] = mc_filter[0] = 0xffffffff;
-	} else if ((dev->mc_count > multicast_filter_limit)
+	} else if ((netdev_mc_count(dev) > multicast_filter_limit)
 		   || (dev->flags & IFF_ALLMULTI)) {
 		/* Too many to filter perfectly -- accept all multicasts. */
 		rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
 		mc_filter[1] = mc_filter[0] = 0xffffffff;
 	} else {
-		struct dev_mc_list *mclist;
 		rx_mode = AcceptBroadcast | AcceptMyPhys;
-		mc_filter[1] = mc_filter[0] = 0;
-		for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
-		     i++, mclist = mclist->next) {
-			int bit_nr = ether_crc(ETH_ALEN, mclist->dmi_addr) >> 26;
-
-			mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
+		if (!netdev_mc_empty(dev))
 			rx_mode |= AcceptMulticast;
-		}
+		mc_filter[1] = mc_filter[0] = 0;
+		netdev_mc_walk(dev, mc_walker, mc_filter);
 	}
 
 	/* We can safely update without stopping the chip. */
-- 
1.6.2.5
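
The hashing in mc_walker() can be tried outside the kernel. A minimal sketch, assuming ether_crc() is the bit-serial big-endian CRC-32 written out below (the kernel provides ether_crc() via <linux/crc32.h>); sketch_ether_crc() and sketch_hash_walker() are hypothetical names.

```c
#include <stdint.h>

#define ETH_ALEN 6

/* Bit-serial big-endian Ethernet CRC-32: polynomial 0x04c11db7,
 * initial value ~0, data bits taken LSB first, no final inversion. */
static uint32_t sketch_ether_crc(int length, const unsigned char *data)
{
	uint32_t crc = 0xffffffffu;
	int i, bit;

	for (i = 0; i < length; i++) {
		unsigned char octet = data[i];

		for (bit = 0; bit < 8; bit++, octet >>= 1) {
			uint32_t msb = crc >> 31;

			crc <<= 1;
			if (msb ^ (octet & 1u))
				crc ^= 0x04c11db7u;
		}
	}
	return crc;
}

/* Same logic as mc_walker(): the top 6 CRC bits select one of the
 * 64 bits in the two-word multicast hash filter. */
static void sketch_hash_walker(void *data, unsigned char *addr)
{
	uint32_t *mc_filter = data;
	int bit_nr = (int)(sketch_ether_crc(ETH_ALEN, addr) >> 26);

	mc_filter[bit_nr >> 5] |= 1u << (bit_nr & 31);
}
```

The sketch shifts 1u rather than 1 so that setting bit 31 stays well defined in standard C; each address sets exactly one filter bit, and adding the same address twice is idempotent.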

^ permalink raw reply related

* [PATCH net-next-2.6 1/4] net: introduce mc list helpers
From: Jiri Pirko @ 2009-10-22 13:52 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, jeffrey.t.kirsher, jesse.brandeburg,
	bruce.w.allan, peter.p.waskiewicz.jr, john.ronciak, e1000-devel,
	mchehab, linux-media
In-Reply-To: <20091022135120.GC2868@psychotron.lab.eng.brq.redhat.com>

These helpers should be used by network drivers to access netdev
multicast lists.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 include/linux/netdevice.h |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8380009..7edc4a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -921,6 +921,28 @@ struct net_device
 
 #define	NETDEV_ALIGN		32
 
+static inline int netdev_mc_count(struct net_device *dev)
+{
+	return dev->mc_count;
+}
+
+static inline bool netdev_mc_empty(struct net_device *dev)
+{
+	return netdev_mc_count(dev) == 0;
+}
+
+static inline void netdev_mc_walk(struct net_device *dev,
+				  void (*func)(void *, unsigned char *),
+				  void *data)
+{
+	struct dev_addr_list *mclist;
+	int i;
+
+	for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
+	     i++, mclist = mclist->next)
+		func(data, mclist->dmi_addr);
+}
+
 static inline
 struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
 					 unsigned int index)
-- 
1.6.2.5
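
To show what the helpers encapsulate, a small userspace model of the three inline functions. struct sketch_addr_list and struct sketch_net_device are simplified, hypothetical stand-ins for struct dev_addr_list and struct net_device, and sketch_count_cb() is an illustrative callback.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_ALEN 6

struct sketch_addr_list {
	struct sketch_addr_list *next;
	unsigned char dmi_addr[ETH_ALEN];
};

struct sketch_net_device {
	struct sketch_addr_list *mc_list;
	int mc_count;
};

static inline int sketch_mc_count(struct sketch_net_device *dev)
{
	return dev->mc_count;
}

static inline bool sketch_mc_empty(struct sketch_net_device *dev)
{
	return sketch_mc_count(dev) == 0;
}

/* Visit each address; stop at the end of the list or after mc_count
 * entries, whichever comes first (same bounds as netdev_mc_walk()). */
static inline void sketch_mc_walk(struct sketch_net_device *dev,
				  void (*func)(void *, unsigned char *),
				  void *data)
{
	struct sketch_addr_list *mclist;
	int i;

	for (i = 0, mclist = dev->mc_list; mclist && i < dev->mc_count;
	     i++, mclist = mclist->next)
		func(data, mclist->dmi_addr);
}

/* Example callback: count visited addresses. */
static void sketch_count_cb(void *data, unsigned char *addr)
{
	(void)addr;
	(*(int *)data)++;
}
```

Since drivers only go through count/empty/walk, the list representation behind them (here a hand-rolled next pointer) can later change to list_head without touching the callers.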


^ permalink raw reply related

* [PATCH net-next-2.6 0/4] net: change the way mc_list is accessed
From: Jiri Pirko @ 2009-10-22 13:51 UTC (permalink / raw)
  To: netdev
  Cc: eric.dumazet, e1000-devel, bruce.w.allan, jesse.brandeburg,
	mchehab, john.ronciak, jeffrey.t.kirsher, davem, linux-media

In struct net_device, multicast addresses are stored in a self-made linked
list. Converting it to a list_head list would require changing all
(literally all) network device drivers at once.

To solve this, and also to make device drivers' code prettier, I'm
introducing several multicast list helpers which can (and in the future
should) be used to access the mc list. Once all drivers use these helpers,
we can easily convert to list_head.

This patchset also includes three examples of how to use the helpers.

Kindly asking for review.

Thanks,

Jirka

^ permalink raw reply

* Re: xfrm transport mode policy and forward packets
From: Timo Teräs @ 2009-10-22 13:31 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, Alexey Kuznetsov
In-Reply-To: <20091022132126.GB28893@gondor.apana.org.au>

Herbert Xu wrote:
> On Thu, Oct 22, 2009 at 03:07:28PM +0300, Timo Teräs wrote:
>> I'm using on my dmvpn environment security policies like:
>>
>> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre
>> 	dir in priority 2147483648 ptype main
>> 	tmpl src 0.0.0.0 dst 0.0.0.0
>> 		proto esp reqid 0 mode transport
>>
>> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre
>> 	dir out priority 2147483648 ptype main
>> 	tmpl src 0.0.0.0 dst 0.0.0.0
>> 		proto esp reqid 0 mode transport
>>
>> To make sure the locally generated/received GRE traffic is IPsec protected.
>> Now when some other non-local gre traffic is being forwarded by this router,
>> that seems to match these SPs too. Basically no one behind this router box
>> can use GRE (or PPTP).
> 
> This is expected since forwarded GRE packets match the selector
> given.

Yes. I forgot to mention explicitly that I thought just removing the
'fwd' policy would fix this. It's slightly confusing that the input path
is split into two separate policy DBs, while the output path is not.

>> My ideas so far have been:
>> a) rename 'fwd' to 'infwd' and split 'out' to 'out' and 'outfwd' ?
>>   (sounds kinda intrusive)
>> b) iptables target that would be able to disable xfrm
>>
>> Any other ideas?
>> What would be the proper fix for this problem?
> 
> We could add the fwmark as a key.

Ah, sounds even better.

> Alexey and others may have better ideas on this.

Thanks!
 Timo

^ permalink raw reply

* [net-next-2.6 PATCH] be2net:Changes to update ethtool get_settings function to return appropriate values.
From: Sarveshwar Bandi @ 2009-10-22 13:30 UTC (permalink / raw)
  To: netdev; +Cc: davem

Update ethtool get_settings function to:
- get current link speed settings from controller
- get port transceiver type from controller
- fill appropriate values for supported, phy_address

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
---
 drivers/net/benet/be_cmds.c    |   37 +++++++++++++++++++++++++++++++--
 drivers/net/benet/be_cmds.h    |   45 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/benet/be_ethtool.c |   36 +++++++++++++++++++++++++++++++-
 drivers/net/benet/be_main.c    |    5 ++++
 4 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 25b6602..a034265 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -823,7 +823,7 @@ int be_cmd_get_stats(struct be_adapter *
 
 /* Uses synchronous mcc */
 int be_cmd_link_status_query(struct be_adapter *adapter,
-			bool *link_up)
+			bool *link_up, u8 *mac_speed, u16 *link_speed)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_link_status *req;
@@ -844,8 +844,11 @@ int be_cmd_link_status_query(struct be_a
 	status = be_mcc_notify_wait(adapter);
 	if (!status) {
 		struct be_cmd_resp_link_status *resp = embedded_payload(wrb);
-		if (resp->mac_speed != PHY_LINK_SPEED_ZERO)
+		if (resp->mac_speed != PHY_LINK_SPEED_ZERO) {
 			*link_up = true;
+			*link_speed = le16_to_cpu(resp->link_speed);
+			*mac_speed = resp->mac_speed;
+		}
 	}
 
 	spin_unlock_bh(&adapter->mcc_lock);
@@ -1177,6 +1180,36 @@ int be_cmd_get_beacon_state(struct be_ad
 	return status;
 }
 
+/* Uses sync mcc */
+int be_cmd_read_port_type(struct be_adapter *adapter, u32 port,
+				u8 *connector)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_port_type *req;
+	int status;
+
+	spin_lock_bh(&adapter->mcc_lock);
+
+	wrb = wrb_from_mccq(adapter);
+	req = embedded_payload(wrb);
+
+	be_wrb_hdr_prepare(wrb, sizeof(struct be_cmd_resp_port_type), true, 0);
+
+	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+		OPCODE_COMMON_READ_TRANSRECV_DATA, sizeof(*req));
+
+	req->port = cpu_to_le32(port);
+	req->page_num = cpu_to_le32(TR_PAGE_A0);
+	status = be_mcc_notify_wait(adapter);
+	if (!status) {
+		struct be_cmd_resp_port_type *resp = embedded_payload(wrb);
+		*connector = resp->data.connector;
+	}
+
+	spin_unlock_bh(&adapter->mcc_lock);
+	return status;
+}
+
 int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
 			u32 flash_type, u32 flash_opcode, u32 buf_size)
 {
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index a1e78cc..65e14dd 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -140,6 +140,7 @@ #define OPCODE_COMMON_NTWK_PMAC_DEL			60
 #define OPCODE_COMMON_FUNCTION_RESET			61
 #define OPCODE_COMMON_ENABLE_DISABLE_BEACON		69
 #define OPCODE_COMMON_GET_BEACON_STATE			70
+#define OPCODE_COMMON_READ_TRANSRECV_DATA		73
 
 #define OPCODE_ETH_ACPI_CONFIG				2
 #define OPCODE_ETH_PROMISCUOUS				3
@@ -635,9 +636,47 @@ struct be_cmd_resp_link_status {
 	u8 mac_fault;
 	u8 mgmt_mac_duplex;
 	u8 mgmt_mac_speed;
-	u16 rsvd0;
+	u16 link_speed;
+	u32 rsvd0;
 } __packed;
 
+/******************** Port Identification ***************************/
+/*    Identifies the type of port attached to NIC     */
+struct be_cmd_req_port_type {
+	struct be_cmd_req_hdr hdr;
+	u32 page_num;
+	u32 port;
+};
+
+enum {
+	TR_PAGE_A0 = 0xa0,
+	TR_PAGE_A2 = 0xa2
+};
+
+struct be_cmd_resp_port_type {
+	struct be_cmd_resp_hdr hdr;
+	u32 page_num;
+	u32 port;
+	struct data {
+		u8 identifier;
+		u8 identifier_ext;
+		u8 connector;
+		u8 transceiver[8];
+		u8 rsvd0[3];
+		u8 length_km;
+		u8 length_hm;
+		u8 length_om1;
+		u8 length_om2;
+		u8 length_cu;
+		u8 length_cu_m;
+		u8 vendor_name[16];
+		u8 rsvd;
+		u8 vendor_oui[3];
+		u8 vendor_pn[16];
+		u8 vendor_rev[4];
+	} data;
+};
+
 /******************** Get FW Version *******************/
 struct be_cmd_req_get_fw_version {
 	struct be_cmd_req_hdr hdr;
@@ -775,7 +814,7 @@ extern int be_cmd_rxq_create(struct be_a
 extern int be_cmd_q_destroy(struct be_adapter *adapter, struct be_queue_info *q,
 			int type);
 extern int be_cmd_link_status_query(struct be_adapter *adapter,
-			bool *link_up);
+			bool *link_up, u8 *mac_speed, u16 *link_speed);
 extern int be_cmd_reset(struct be_adapter *adapter);
 extern int be_cmd_get_stats(struct be_adapter *adapter,
 			struct be_dma_mem *nonemb_cmd);
@@ -801,6 +840,8 @@ extern int be_cmd_set_beacon_state(struc
 			u8 port_num, u8 beacon, u8 status, u8 state);
 extern int be_cmd_get_beacon_state(struct be_adapter *adapter,
 			u8 port_num, u32 *state);
+extern int be_cmd_read_port_type(struct be_adapter *adapter, u32 port,
+					u8 *connector);
 extern int be_cmd_write_flashrom(struct be_adapter *adapter,
 			struct be_dma_mem *cmd, u32 flash_oper,
 			u32 flash_opcode, u32 buf_size);
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 280471e..edebce9 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -293,9 +293,43 @@ static int be_get_sset_count(struct net_
 
 static int be_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 {
-	ecmd->speed = SPEED_10000;
+	struct be_adapter *adapter = netdev_priv(netdev);
+	u8 mac_speed = 0, connector = 0;
+	u16 link_speed = 0;
+	bool link_up = false;
+
+	be_cmd_link_status_query(adapter, &link_up, &mac_speed, &link_speed);
+
+	/* link_speed is in units of 10 Mbps */
+	if (link_speed) {
+		ecmd->speed = link_speed*10;
+	} else {
+		switch (mac_speed) {
+		case PHY_LINK_SPEED_1GBPS:
+			ecmd->speed = SPEED_1000;
+			break;
+		case PHY_LINK_SPEED_10GBPS:
+			ecmd->speed = SPEED_10000;
+			break;
+		}
+	}
 	ecmd->duplex = DUPLEX_FULL;
 	ecmd->autoneg = AUTONEG_DISABLE;
+	ecmd->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_TP);
+
+	be_cmd_read_port_type(adapter, adapter->port_num, &connector);
+	switch (connector) {
+	case 7:
+		ecmd->port = PORT_FIBRE;
+		break;
+	default:
+		ecmd->port = PORT_TP;
+		break;
+	}
+
+	ecmd->phy_address = adapter->port_num;
+	ecmd->transceiver = XCVR_INTERNAL;
+
 	return 0;
 }
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index e0f9d64..a48e822 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1586,6 +1586,8 @@ static int be_open(struct net_device *ne
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
 	bool link_up;
 	int status;
+	u8 mac_speed;
+	u16 link_speed;
 
 	/* First time posting */
 	be_post_rx_frags(adapter);
@@ -1604,7 +1606,8 @@ static int be_open(struct net_device *ne
 	/* Rx compl queue may be in unarmed state; rearm it */
 	be_cq_notify(adapter, adapter->rx_obj.cq.id, true, 0);
 
-	status = be_cmd_link_status_query(adapter, &link_up);
+	status = be_cmd_link_status_query(adapter, &link_up, &mac_speed,
+			&link_speed);
 	if (status)
 		return status;
 	be_link_status_update(adapter, link_up);
-- 
1.4.0
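
The speed and port logic in be_get_settings() reduces to two small mappings. A minimal sketch, assuming the transceiver data is the raw SFF-8472 page A0h layout (which the be_cmd_resp_port_type fields suggest: byte 2 is the connector type, and 0x07 is an LC optical connector). The SKETCH_PHY_LINK_SPEED_* values are placeholders, since the real PHY_LINK_SPEED_* codes are not shown here; the function names are hypothetical.

```c
#include <linux/ethtool.h>

/* Placeholder codes: the real PHY_LINK_SPEED_* values live in be_cmds.h. */
enum {
	SKETCH_PHY_LINK_SPEED_ZERO,
	SKETCH_PHY_LINK_SPEED_1GBPS,
	SKETCH_PHY_LINK_SPEED_10GBPS,
};

/* SFF-8472 page A0h, byte 2 (connector type): 0x07 is LC. */
#define SKETCH_SFF_CONNECTOR_LC 0x07

/* Returns Mb/s, or -1 when neither field identifies the speed. */
static int sketch_be_speed(unsigned short link_speed, unsigned char mac_speed)
{
	if (link_speed)
		return link_speed * 10;	/* firmware reports units of 10 Mb/s */

	switch (mac_speed) {
	case SKETCH_PHY_LINK_SPEED_1GBPS:
		return SPEED_1000;
	case SKETCH_PHY_LINK_SPEED_10GBPS:
		return SPEED_10000;
	default:
		return -1;
	}
}

static int sketch_be_port(unsigned char connector)
{
	return connector == SKETCH_SFF_CONNECTOR_LC ? PORT_FIBRE : PORT_TP;
}
```

Unlike the patch, which leaves ecmd->speed untouched when neither field matches, the sketch makes the unknown case explicit with -1; that is a design choice of the sketch, not of the driver.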


^ permalink raw reply related

* Re: xfrm transport mode policy and forward packets
From: Herbert Xu @ 2009-10-22 13:21 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev, Alexey Kuznetsov
In-Reply-To: <4AE04B00.8090207@iki.fi>

On Thu, Oct 22, 2009 at 03:07:28PM +0300, Timo Teräs wrote:
>
> I'm using on my dmvpn environment security policies like:
>
> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre
> 	dir in priority 2147483648 ptype main
> 	tmpl src 0.0.0.0 dst 0.0.0.0
> 		proto esp reqid 0 mode transport
>
> src 0.0.0.0/0 dst 0.0.0.0/0 proto gre
> 	dir out priority 2147483648 ptype main
> 	tmpl src 0.0.0.0 dst 0.0.0.0
> 		proto esp reqid 0 mode transport
>
> To make sure the locally generated/received GRE traffic is IPsec protected.
> Now when some other non-local gre traffic is being forwarded by this router,
> that seems to match these SPs too. Basically no one behind this router box
> can use GRE (or PPTP).

This is expected since forwarded GRE packets match the selector
given.

> My ideas so far have been:
> a) rename 'fwd' to 'infwd' and split 'out' to 'out' and 'outfwd' ?
>   (sounds kinda intrusive)
> b) iptables target that would be able to disable xfrm
>
> Any other ideas?
> What would be the proper fix for this problem?

We could add the fwmark as a key.

Alexey and others may have better ideas on this.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志仁 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* iproute2: 2 questions
From: almaop @ 2009-10-22 12:42 UTC (permalink / raw)
  To: netdev

1. There is a known bug that prevents using the ipt action with recent iptables:
tc filter add ...
action ipt -j mark --set-mark 2
does not work.
It fails with the latest release, iproute2-2.6.29, and with the latest git.
Is there an official workaround?
2. Are there plans to release a new iproute2 that fixes this bug?

Krzysiek

^ permalink raw reply

* bridging + load balancing bonding
From: Jasper Spaans @ 2009-10-22 12:23 UTC (permalink / raw)
  To: netdev

Hi,

We're using the following setup for bonding and bridging, to be able to put
large amounts of data through multiple IDS analyzers:

                             +---[br0]----+     +--- eth1 ---(IDS machine 1)
(Span port from switch) -- eth0          bond0--+
                                                +--- eth2 ---(IDS machine 2)

eth0 receives network traffic, which should be passed to machines which are
connected to eth1 and eth2. These machines run an IDS package, and there are
two of those for performance reasons.

bond0 is configured to load balance the packets using "balance-xor", in this
case combined with xmit_hash_policy layer2.

However, we're seeing problems: packets from one flow do not end up at the
same IDS machine. This is because the selection is not based on the source
_and_ destination MAC addresses of the original packet, but on the MAC
address of the bonding device and the destination MAC address of the
packet.

This is also visible in the code, for example in bond_xmit_hash_policy_l2()
in bond_main.c:
	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;

Changing this to
	return (data->h_dest[5] ^ data->h_source[5]) % count;
fixes our problems, but is this harmful for packets originating locally (or
being routed?)

If not, can this be applied? Or does anyone have other ideas?
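
To make the difference concrete, a standalone sketch of the two formulas (hypothetical function names, Ethernet header parsing omitted). With the proposed form the hash is symmetric in source and destination, so both directions of an A<->B conversation select the same slave; with the current form, A->B hashes on B's MAC and B->A on A's MAC.

```c
#define ETH_ALEN 6

/* Current layer2 policy: last byte of dest MAC XOR the bond's own MAC. */
static int sketch_l2_hash_current(const unsigned char *h_dest,
				  const unsigned char *bond_addr, int count)
{
	return (h_dest[5] ^ bond_addr[5]) % count;
}

/* Proposed: last bytes of the packet's own dest and source MACs. */
static int sketch_l2_hash_proposed(const unsigned char *h_dest,
				   const unsigned char *h_source, int count)
{
	return (h_dest[5] ^ h_source[5]) % count;
}
```

For frames the host originates or routes itself, h_source is typically the bond's own MAC, so both formulas give the same result; the change would mainly affect bridged traffic like the setup above.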

Thanks,
Jasper Spaans
-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Jarek Poplawski @ 2009-10-22 12:54 UTC (permalink / raw)
  To: David Miller
  Cc: johannes, tilman, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus, mb, oliver
In-Reply-To: <20091022.042939.95166154.davem@davemloft.net>

On Thu, Oct 22, 2009 at 04:29:39AM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Wed, 21 Oct 2009 23:39:47 +0200
> 
> > I'm not sure I can understand your question. This patch is mainly to
> > avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
> > context handling; IMHO there're better tools for this (lockdep,
> > WARN_ON's).
> 
> Semantically I think your patch is correct, but I wonder about cost.
> 
> Something that is a simply per-cpu inline "or" operation is now a
> function call and potentially mispredicted branch inside of
> raise_softirq_irqoff().
> 
> And netif_rx() is indeed a fast path for tunnels and other users so
> this does matter.
> 
> I like having people call things in the correct context the function
> was built for, and thus we can avoid completely useless operations and
> tests as we can now in netif_rx().

I like it too, but in this particular case I'm not sure netif_rx()
functionality requires this kind of separation; to me it looks quite
similar to e.g. tasklet_schedule(), which is the same for process and
softirq contexts.

> 
> Making things general purpose costs something, and it costs too much
> here for this critical routine, sorry.
> 
> I was just having a talk with Nick Piggin about these kinds of issues
> today; too few people care about these ever-encroaching tiny pieces
> of bloat that slow the kernel down gradually over time, and I simply
> won't stand for it when I notice it :-)

I'm not sure we're saving in the right place. As a matter of fact,
whenever I look into kernel/ code I don't see this kind of
optimization; there are quite a lot of WARN_ONs and ifs. These NOHZ
warnings merely show someone else's debugging triggers firing far from
the place where it all started, which is quite accidental, while this
particular "bug" should have been reported immediately, long ago, if
we really cared.

Since I understand it's a question of taste, and it's not anything
critical, I'm quite OK with staying with the old way (except old
bugs, I hope ;-).

Jarek P.

* xfrm transport mode policy and forward packets
From: Timo Teräs @ 2009-10-22 12:07 UTC
  To: netdev, Herbert Xu

Hi,

In my DMVPN environment I'm using security policies like:

src 0.0.0.0/0 dst 0.0.0.0/0 proto gre 
	dir in priority 2147483648 ptype main 
	tmpl src 0.0.0.0 dst 0.0.0.0
		proto esp reqid 0 mode transport

src 0.0.0.0/0 dst 0.0.0.0/0 proto gre 
	dir out priority 2147483648 ptype main 
	tmpl src 0.0.0.0 dst 0.0.0.0
		proto esp reqid 0 mode transport

These make sure that locally generated/received GRE traffic is IPsec protected.
Now, when some other non-local GRE traffic is forwarded by this router,
it seems to match these SPs too. Basically, no one behind this router box
can use GRE (or PPTP).

I originally had the 'fwd' policy too, but removing it did not help by itself.
I needed to add destination-specific 'out' policies with higher priority.

Apparently, the forward path does two xfrm lookups: first against the 'fwd'
policies to check that the received packet does not violate policy, and then
a second 'out' lookup to see whether it needs to be transformed.
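A simplified userspace model of the two lookups described above, assuming (as in Linux xfrm) that a lower priority value wins. Policies match only on direction, protocol and destination prefix here; the struct and function names are hypothetical and this is not the kernel's xfrm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum xdir { XDIR_IN, XDIR_OUT, XDIR_FWD };
enum verdict { V_DROP, V_PLAIN, V_XFRM };

struct xpolicy {
	enum xdir dir;
	uint8_t proto;		/* 47 = GRE */
	uint32_t dst, mask;	/* destination prefix, host byte order */
	uint32_t priority;	/* lower value wins */
	int needs_esp;		/* 1 if a transport-mode ESP template is attached */
};

/* Best (lowest priority value) policy matching dir/proto/daddr, or NULL. */
const struct xpolicy *xpolicy_lookup(const struct xpolicy *tbl, size_t n,
				     enum xdir dir, uint8_t proto,
				     uint32_t daddr)
{
	const struct xpolicy *best = NULL;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct xpolicy *p = &tbl[i];

		if (p->dir != dir || p->proto != proto ||
		    (daddr & p->mask) != p->dst)
			continue;
		if (!best || p->priority < best->priority)
			best = p;
	}
	return best;
}

/* Forward path for a packet that arrived in the clear: the 'fwd' check
 * first, then an 'out' lookup deciding whether to transform it. */
enum verdict forward_plain_packet(const struct xpolicy *tbl, size_t n,
				  uint8_t proto, uint32_t daddr)
{
	const struct xpolicy *p;

	p = xpolicy_lookup(tbl, n, XDIR_FWD, proto, daddr);
	if (p && p->needs_esp)
		return V_DROP;		/* policy wanted ESP on the way in */

	p = xpolicy_lookup(tbl, n, XDIR_OUT, proto, daddr);
	return (p && p->needs_esp) ? V_XFRM : V_PLAIN;
}
```

With only the catch-all GRE 'in'/'out' policies from above, a forwarded GRE packet comes back as V_XFRM; adding a destination-specific 'out' policy with a lower priority value and no template turns it into V_PLAIN, which matches the workaround described.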

My initial thought was that transport mode policies ought to be ignored for
forwarded packets, but if the forwarded packet is NATted we might actually
want to xfrm it in transport mode.

There is an 'ifindex' field in xfrm_selector, but that seems to be the output
interface, so it would not solve my problem: both local and forwarded GRE
packets are output on the same interface.

I'm now slightly curious why 'in' was split into 'in' and 'fwd' while
'out' was not split similarly; that would give us more control over policies
depending on whether the traffic is local or forwarded.

My ideas so far have been:
a) rename 'fwd' to 'infwd' and split 'out' into 'out' and 'outfwd'?
   (sounds kinda intrusive)
b) an iptables target that could disable xfrm

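A sketch of how idea (a) would change the output decision, assuming a hypothetical 'outfwd' direction; the kernel has no such direction, and priorities are left out for brevity (first match wins).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split directions for idea (a). */
enum xdir_split { D_IN, D_INFWD, D_OUT, D_OUTFWD };

struct spolicy {
	enum xdir_split dir;
	uint8_t proto;
	int needs_esp;
};

/* Return 1 if an outgoing packet must be ESP-transformed.  Locally
 * generated packets consult D_OUT, forwarded ones consult D_OUTFWD, so
 * a catch-all 'out' GRE policy no longer captures forwarded GRE. */
int output_needs_esp(const struct spolicy *tbl, size_t n,
		     uint8_t proto, int forwarded)
{
	enum xdir_split want = forwarded ? D_OUTFWD : D_OUT;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (tbl[i].dir == want && tbl[i].proto == proto)
			return tbl[i].needs_esp;
	return 0;	/* no matching policy: send in the clear */
}
```

A NATted forwarded flow that should still be protected would then need its own 'outfwd' policy, which keeps that choice explicit rather than implied by the local 'out' policy.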
Any other ideas?
What would be the proper fix for this problem?

Thanks,
  Timo

* Re: [RFC] net,socket: introduce build_sockaddr_check helper to catch overflow at build time
From: David Miller @ 2009-10-22 11:49 UTC
  To: gorcunov; +Cc: netdev
In-Reply-To: <20091021170732.GE5976@lenovo>

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Wed, 21 Oct 2009 21:07:32 +0400

> net,socket: introduce build_sockaddr_check helper to catch overflow at build time
> 
> proto_ops->getname implies copying protocol-specific data
> into a storage unit (particularly __kernel_sockaddr_storage).
> Whoever implements a new protocol may or may not keep this
> in mind.
> 
> Let's introduce a build_sockaddr_check helper which checks that
> the storage unit is not overflowed. Note that the check happens at
> build time and introduces no slowdown at execution time.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>

Nice idea, and I wonder if we can automate it even further.
Perhaps some tag that gets put on the socket address type
definition or similar?
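A userspace analogue of the idea, assuming C11 _Static_assert in place of the kernel's BUILD_BUG_ON() and struct sockaddr_storage in place of struct __kernel_sockaddr_storage. The macro body, struct sockaddr_demo and demo_getname() are hypothetical; this does not attempt the further automation suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Refuse to compile if a protocol's address type cannot fit in the
 * storage that a getname-style function copies into. */
#define build_sockaddr_check(type) \
	_Static_assert(sizeof(type) <= sizeof(struct sockaddr_storage), \
		       #type " overflows struct sockaddr_storage")

/* A made-up protocol address used only for illustration. */
struct sockaddr_demo {
	sa_family_t sd_family;
	unsigned char sd_addr[16];
};

build_sockaddr_check(struct sockaddr_demo);

/* getname-style copy: the size was already checked at build time,
 * so no length test is needed at run time. */
size_t demo_getname(const struct sockaddr_demo *src,
		    struct sockaddr_storage *dst)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, sizeof(*src));
	return sizeof(*src);
}
```

Growing sd_addr past what sockaddr_storage can hold turns the overflow into a compile error instead of a silent out-of-bounds copy.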

* Re: [PATCH net-next-2.6] rtnetlink: rtnl_setlink() and rtnl_getlink() changes
From: David Miller @ 2009-10-22 11:34 UTC
  To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <4ADF7633.9050208@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Oct 2009 22:59:31 +0200

> Since rtnl_getlink() and rtnl_setlink() run with RTNL held, we can use
> the __dev_get_by_index() and __dev_get_by_name() variants and avoid
> dev_hold()/dev_put().
> 
> Adds to rtnl_getlink() the capability to find a device by its name,
> not only by its index.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good, applied, thanks.
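A toy sketch of the lookup order the description implies: use the ifindex when one is given, otherwise fall back to the name. The device table, the toy_ names and the -EINVAL/-ENODEV choice for the two failure cases are assumptions (I believe rtnl_setlink() follows the same pattern), not copied from the applied patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the per-namespace device lists. */
struct toy_dev {
	int ifindex;
	const char *name;
};

static const struct toy_dev *toy_by_index(const struct toy_dev *t,
					  size_t n, int idx)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].ifindex == idx)
			return &t[i];
	return NULL;
}

static const struct toy_dev *toy_by_name(const struct toy_dev *t,
					 size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(t[i].name, name) == 0)
			return &t[i];
	return NULL;
}

/* Index wins if given, then name; neither given is -EINVAL and a
 * lookup miss is -ENODEV. */
int toy_getlink(const struct toy_dev *t, size_t n, int ifindex,
		const char *ifname, const struct toy_dev **out)
{
	const struct toy_dev *dev;

	assert(out != NULL);
	if (ifindex > 0)
		dev = toy_by_index(t, n, ifindex);
	else if (ifname)
		dev = toy_by_name(t, n, ifname);
	else
		return -EINVAL;
	if (!dev)
		return -ENODEV;
	*out = dev;
	return 0;
}
```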

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: David Miller @ 2009-10-22 11:29 UTC
  To: jarkao2
  Cc: johannes, tilman, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus, mb, oliver
In-Reply-To: <20091021213947.GA12202@ami.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 21 Oct 2009 23:39:47 +0200

> I'm not sure I can understand your question. This patch is mainly to
> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
> context handling; IMHO there're better tools for this (lockdep,
> WARN_ON's).

Semantically I think your patch is correct, but I wonder about cost.

Something that is a simple per-cpu inline "or" operation is now a
function call and potentially mispredicted branch inside of
raise_softirq_irqoff().

And netif_rx() is indeed a fast path for tunnels and other users so
this does matter.
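A userspace model of the two helpers being compared, following my understanding of kernel/softirq.c around 2.6.32: the double-underscore variant only sets a pending bit, while raise_softirq_irqoff() also checks in_interrupt() and may wake ksoftirqd. The model_ names, globals and softirq numbering are hypothetical stand-ins for per-cpu state.

```c
#include <assert.h>

static unsigned long model_pending;	/* per-cpu pending bitmask */
static int model_in_interrupt;		/* stands in for in_interrupt() */
static int model_softirqd_wakeups;	/* counts wakeup_softirqd() calls */

enum { MODEL_HI, MODEL_TIMER, MODEL_NET_TX, MODEL_NET_RX };

/* Cheap variant: just set the bit; irq_exit() or local_bh_enable()
 * is expected to notice it later. */
static inline void model_raise_fast(unsigned int nr)
{
	model_pending |= 1UL << nr;
}

/* Checked variant: set the bit and, outside interrupt context, wake
 * ksoftirqd so the work is not left pending indefinitely.  This extra
 * branch (and the out-of-line call) is the cost being discussed. */
void model_raise_checked(unsigned int nr)
{
	assert(nr < 8 * sizeof(model_pending));
	model_raise_fast(nr);
	if (!model_in_interrupt)
		model_softirqd_wakeups++;
}
```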

I like having people call things in the correct context the function
was built for, and thus we can avoid completely useless operations and
tests as we can now in netif_rx().

Making things general purpose costs something, and it costs too much
here for this critical routine, sorry.

I was just having a talk with Nick Piggin about these kinds of issues
today; too few people care about these ever-encroaching tiny pieces
of bloat that slow the kernel down gradually over time, and I simply
won't stand for it when I notice it :-)

* Re: [PATCH  kernel 2.6.32-rc5] pcnet_cs: add cis of PreMax PE-200 ethernet pcmcia card
From: Ken Kawasaki @ 2009-10-22 11:10 UTC
  To: Dan Williams; +Cc: netdev
In-Reply-To: <1256152686.8469.34.camel@localhost.localdomain>

Hi,

>Dan Williams <dcbw@redhat.com> wrote:

> > add CIS of PreMax Ethernet PCMCIA card,
> > and some Sierra Wireless serial cards (AC555, AC7xx, AC8xx).
 
> Random question: are CIS files copyrightable?  

The CIS contains the IRQ, I/O port range, voltage information, etc.,
much like the PCI config space, so I think it is not copyrightable.
In any case, Sierra Wireless provided this CIS under the GPL.

> What exactly do they
> contain, just updates to the CIS data on the card itself that the
> manufacturer forgot to burn before shipping the card?

The CIS update is needed because the original CIS does not conform to the
PCMCIA spec, not because the manufacturer forgot to burn the CIS.
 
> Also, I've got a Sierra AC860 here that reports as "prod_id(2):
> "AC860"", and has the same manf_id (0x0192) and card_id (0x710) as the
> AC850.

Actually, not all Sierra Wireless cards need the CIS update.

Could you remove the PCMCIA_DEVICE_CIS_PROD_ID12 and PCMCIA_DEVICE_CIS_MANF_CARD
definitions for the Sierra Wireless card
and check whether the AC860 works?


Here is the output of dumpcis for SW_8xx_SER.cis.

Socket 0
offset 0x02, tuple 0x01, link 0x01
  ff 
dev_info
  no_info

offset 0x05, tuple 0x17, link 0x03
  41 00 ff 
attr_dev_info
  EEPROM 250ns, 512b

offset 0x0a, tuple 0x20, link 0x04
  92 01 10 07 
manfid 0x0192, 0x0710

offset 0x10, tuple 0x21, link 0x02
  02 00 
funcid serial_port

offset 0x14, tuple 0x15, link 0x2f
  07 00 53 69 65 72 72 61 20 57 69 72 65 6c 65 73 
  73 00 41 43 38 35 30 00 33 47 20 4e 65 74 77 6f 
  72 6b 20 41 64 61 70 74 65 72 00 52 31 00 ff 
vers_1 7.0, "Sierra Wireless", "AC850", "3G Network Adapter", "R1"

offset 0x45, tuple 0x1a, link 0x05
  01 03 00 07 73 
config base 0x0700 mask 0x0073 last_index 0x03

offset 0x4c, tuple 0x1b, link 0x10
  e0 01 19 78 4d 55 5d 25 a3 60 f8 48 07 30 bc 86 
cftable_entry 0x20 [default]
  Vcc Istatic 45mA Iavg 50mA Ipeak 55mA Idown 20mA
  io 0x48f8-0x48ff [lines=3] [8bit] [range]
  irq mask 0x86bc [level]

offset 0x5e, tuple 0x1b, link 0x08
  a1 01 08 a3 60 f8 47 07 
cftable_entry 0x21
  io 0x47f8-0x47ff [lines=3] [8bit] [range]

offset 0x68, tuple 0x1b, link 0x08
  a2 01 08 a3 60 e8 48 07 
cftable_entry 0x22
  io 0x48e8-0x48ef [lines=3] [8bit] [range]

offset 0x72, tuple 0x1b, link 0x08
  a3 01 08 a3 60 e8 47 07 
cftable_entry 0x23
  io 0x47e8-0x47ef [lines=3] [8bit] [range]

offset 0x7c, tuple 0x1b, link 0x04
  a4 01 08 23 
cftable_entry 0x24
  io 0x0000-0x0007 [lines=3] [8bit]

offset 0x82, tuple 0x14, link 0x00
no_long_link
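To make the dump above easier to read, here is a minimal sketch of walking such a tuple chain and decoding the manfid and vers_1 tuples. It assumes the CIS bytes are already in a flat buffer laid out as code, link, then link bytes of data (which matches the offsets in the dump), and it ignores long links; the function names are hypothetical, not the pcmcia core's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CISTPL_NULL	0x00
#define CISTPL_VERS_1	0x15
#define CISTPL_MANFID	0x20
#define CISTPL_END	0xff

/* Return a pointer to the data of the first tuple with this code and
 * store its length, or return NULL if absent or truncated. */
const uint8_t *cis_find(const uint8_t *cis, size_t len, uint8_t code,
			uint8_t *data_len)
{
	size_t i = 0;

	assert(data_len != NULL);
	while (i < len && cis[i] != CISTPL_END) {
		if (cis[i] == CISTPL_NULL) {	/* no link byte */
			i++;
			continue;
		}
		if (i + 1 >= len || i + 2 + cis[i + 1] > len)
			return NULL;		/* truncated tuple */
		if (cis[i] == code) {
			*data_len = cis[i + 1];
			return &cis[i + 2];
		}
		i += 2 + cis[i + 1];		/* skip code, link and data */
	}
	return NULL;
}

/* CISTPL_MANFID: two little-endian 16-bit words. */
int cis_manfid(const uint8_t *d, uint8_t len, uint16_t *manf, uint16_t *card)
{
	if (len < 4)
		return -1;
	*manf = d[0] | (d[1] << 8);
	*card = d[2] | (d[3] << 8);
	return 0;
}

/* CISTPL_VERS_1: major, minor, then NUL-terminated strings ended by
 * 0xff.  Copy string number idx (0-based) into out. */
int cis_vers1_string(const uint8_t *d, uint8_t len, int idx,
		     char *out, size_t outlen)
{
	size_t i = 2;	/* skip the major/minor version bytes */

	assert(outlen > 0);
	while (i < len && d[i] != 0xff) {
		size_t start = i;

		while (i < len && d[i] != 0)
			i++;
		if (i >= len)
			return -1;		/* unterminated string */
		if (idx-- == 0) {
			size_t n = i - start;

			if (n >= outlen)
				n = outlen - 1;
			memcpy(out, &d[start], n);
			out[n] = '\0';
			return 0;
		}
		i++;	/* step past the NUL */
	}
	return -1;
}
```

Fed the bytes from the dump, this yields manfid 0x0192, 0x0710 and the strings "Sierra Wireless", "AC850", "3G Network Adapter" and "R1".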


Best Regards
Ken.

* Re: [net-next-2.6 PATCH 2/3] ixgbe: Set MSI-X vectors to NOBALANCING and set affinity
From: David Miller @ 2009-10-22 10:56 UTC
  To: peter.p.waskiewicz.jr; +Cc: jeffrey.t.kirsher, gospo, netdev
In-Reply-To: <1256199756.2634.65.camel@ppwaskie-mobl2>

From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Date: Thu, 22 Oct 2009 01:22:36 -0700

> The first thing any performance guide says is to disable irqbalance

Such guides are wrong, and that's the end of this discussion.

These kinds of guides also say to do all kinds of crazy things with
the socket sysctl settings.  That's wrong too and we absolutely do not
do things to accommodate or support those guides' suggestions.

And we won't do that here.

I'm especially not going to succumb in this case because Arjan has
been more than responsive to making sure irqbalanced in userspace does
the right thing for networking devices, even multiqueue ones.

So we can make it do the right thing when flow director is present.
In fact, the thing you want for flow director makes sense in the
general case too.

* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
From: William Allen Simpson @ 2009-10-22 10:53 UTC
  To: netdev
In-Reply-To: <4ADFE635.4020109@gmail.com>

Eric Dumazet wrote:
> Absolutely, global setting is a must when an admin wants a quick path.
> 
> The most flexible approach would be to have two bits per route, plus
> 2 bits on the global configuration.
> 
> global conf:
> 00 : timestamps OFF, unless a route setting is not 00
> 01 : timestamps ON, unless a route setting is not 00
> 10 : Force timestamps OFF, ignore route settings (emergency sysadmin request)
> 11 : Force timestamps ON, ignore route settings 
> 
> Route settings (used *only* if global setting is 0Y)
> 00 : global conf is used
> 01 : Force timestamps being OFF for this route
> 10 : Force timestamps being ON for this route
> 11 : complement global conf
> 
Nice!  Seems to have all the bases covered.  For consistency, I'd swap the
latter values (although I doubt complement will have much use):

00 : global conf is used
01 : complement global conf
10 : Timestamps OFF for this route
11 : Timestamps ON for this route

And the documentation should make it clear that global 10 and 11 override
per route 10 and 11.
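A sketch of the decision logic for the amended encoding, assuming exactly the bit meanings listed above; tcp_ts_wanted() is a hypothetical name, not an existing kernel function. With the swapped values, the low bit carries on/off in every forced state (global 10/11 and route 10/11), which keeps the code short.

```c
#include <assert.h>
#include <stdbool.h>

/* global: 00 off by default, 01 on by default,
 *         10 force off, 11 force on (routes ignored).
 * route (consulted only when global is 0x):
 *         00 use global, 01 complement global,
 *         10 off for this route, 11 on for this route. */
bool tcp_ts_wanted(unsigned int global, unsigned int route)
{
	bool def = global & 1;	/* low bit: default or forced value */

	assert(global < 4 && route < 4);
	if (global & 2)		/* 10 / 11 override every route */
		return def;

	switch (route) {
	case 0:
		return def;
	case 1:
		return !def;
	default:		/* 10 / 11: low bit is the per-route value */
		return route & 1;
	}
}
```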

* Re: [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
From: Ilpo Järvinen @ 2009-10-22  9:41 UTC
  To: Gilad Ben-Yossef; +Cc: Netdev, ori
In-Reply-To: <4ADF15A2.1050804@codefidence.com>

On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:

> Hi Ilpo,
> 
> 
> Thanks for the feedback :-)
> 
> 
> Ilpo Järvinen wrote:
> 
> > On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:
> >
> >   
> > > We need tcp_parse_options to be aware of dst_entry to take into account
> > > per dst_entry TCP options settings
> > >
> > > Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> > > Signed-off-by: Ori Finkelman <ori@comsleep.com>
> > > Signed-off-by: Yony Amit <yony@comsleep.com>
> > >
> > > ---
> > >  include/net/tcp.h        |    3 ++-
> > >  net/ipv4/syncookies.c    |   27 ++++++++++++++-------------
> > >  net/ipv4/tcp_input.c     |    9 ++++++---
> > >  net/ipv4/tcp_ipv4.c      |   19 ++++++++++---------
> > >  net/ipv4/tcp_minisocks.c |    7 +++++--
> > >  net/ipv6/syncookies.c    |   28 +++++++++++++++-------------
> > >  net/ipv6/tcp_ipv6.c      |    3 ++-
> > >  7 files changed, 54 insertions(+), 42 deletions(-)
> > >
> > >
> > >     
> <snip>
> > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > > index 7cda24b..1cb0ec4 100644
> > > --- a/net/ipv4/tcp_ipv4.c
> > > +++ b/net/ipv4/tcp_ipv4.c
> > > @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > >  	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
> > >  #endif
> > >  
> > > +	ireq = inet_rsk(req);
> > > +	ireq->loc_addr = daddr;
> > > +	ireq->rmt_addr = saddr;
> > > +	ireq->no_srccheck = inet_sk(sk)->transparent;
> > > +	ireq->opt = tcp_v4_save_options(sk, skb);
> > > +
> > > +	dst = inet_csk_route_req(sk, req);
> > >   tcp_clear_options(&tmp_opt);
> > >   tmp_opt.mss_clamp = 536;
> > >   tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
> > > 
> > > -	tcp_parse_options(skb, &tmp_opt, 0);
> > > +	tcp_parse_options(skb, &tmp_opt, 0, dst);
> > >  
> > >   if (want_cookie && !tmp_opt.saw_tstamp)
> > >    tcp_clear_options(&tmp_opt);
> > > @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > >  
> > >   tcp_openreq_init(req, &tmp_opt, skb);
> > > 
> > > -	ireq = inet_rsk(req);
> > > -	ireq->loc_addr = daddr;
> > > -	ireq->rmt_addr = saddr;
> > > -	ireq->no_srccheck = inet_sk(sk)->transparent;
> > > -	ireq->opt = tcp_v4_save_options(sk, skb);
> > > -
> > > 	if (security_inet_conn_request(sk, skb, req))
> > > -		goto drop_and_free;
> > > +		goto drop_and_release;
> > >  
> > >   if (!want_cookie)
> > >    TCP_ECN_create_request(req, tcp_hdr(skb));
> > > @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > >    */
> > >    if (tmp_opt.saw_tstamp &&
> > > 		    tcp_death_row.sysctl_tw_recycle &&
> > > -		    (dst = inet_csk_route_req(sk, req)) != NULL &&
> > > +		    dst != NULL &&
> > >     
> >
> > Why do you need this NULL check here while you trap it with BUG_ON
> > elsewhere? Does your patch perhaps create a remote DoS opportunity?
> >
> >
> >   
> Indeed, I believe you are right. Good catch.
> 
> What about this (I know the patch gets eaten by Thunderbird, sorry about that.
> This is just for explaining what I want to do):
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 1cb0ec4..1d611e3 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1263,6 +1263,9 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>        ireq->opt = tcp_v4_save_options(sk, skb);
>
>        dst = inet_csk_route_req(sk, req);
> +       if(!dst)
> +               goto drop_and_free;
> +
>        tcp_clear_options(&tmp_opt);
>        tmp_opt.mss_clamp = 536;
>        tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
> @@ -1302,7 +1305,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>                 */
>                if (tmp_opt.saw_tstamp &&
>                    tcp_death_row.sysctl_tw_recycle &&
> -                   dst != NULL &&
>                    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
>                    peer->v4daddr == saddr) {
>                        if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
>
> My rationale is that if the connection is formed we will need to send a
> SYN/ACK (the call to __tcp_v4_send_synack a couple of lines below), and
> since we can't do that if we don't have a route, this makes sense.
> 
> If this sounds sane, I'll re-spin the patch with this as a fix.

I'd just guard the relevant places with dst && ...? But I didn't go
far enough through the code to find out how many such guards would be needed.
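A toy sketch of the "drop early" option Gilad proposes, under hypothetical names: once an unroutable SYN is dropped right after the route lookup, later steps (per-route option parsing, the tw_recycle peer check, the SYN/ACK) can use the route unconditionally, so a NULL route can never reach a BUG_ON(). Ilpo's alternative would instead keep a dst && guard at each of those uses.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dst {
	int mtu;
};

enum syn_result { SYN_DROPPED, SYN_ACKED };

/* Handle a SYN given the result of the route lookup (NULL on failure). */
enum syn_result toy_conn_request(const struct toy_dst *dst, int *mss_out)
{
	assert(mss_out != NULL);
	if (!dst)
		return SYN_DROPPED;	/* no route: a SYN/ACK cannot be sent anyway */

	/* From here on dst is known to be valid. */
	*mss_out = dst->mtu - 40;	/* IPv4 + TCP headers without options */
	return SYN_ACKED;
}
```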

-- 
 i.

* [PATCH] isdn: fix possible circular locking dependency
From: Xiaotian Feng @ 2009-10-22  9:07 UTC
  To: isdn, isdn4linux; +Cc: tilman, netdev, linux-kernel, Xiaotian Feng

There's a circular locking dependency:

---> isdn_net_get_locked_lp
    --->lock &nd->queue_lock
    --->lock &nd->queue->xmit_lock
    .....................
    ---->unlock &nd->queue_lock

---> isdn_net_writebuf_skb (called with &nd->queue->xmit_lock locked)
    ---->isdn_net_inc_frame_cnt
         ---->isdn_net_device_busy
              ----> lock &nd->queue_lock

This will trigger lockdep warnings:

 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.32-rc4-testing #7
 -------------------------------------------------------
 ipppd/28379 is trying to acquire lock:
 (&netdev->queue_lock){......}, at: [<e62ad0fd>] isdn_net_device_busy+0x2c/0x74 [isdn]

 but task is already holding lock:
 (&netdev->local->xmit_lock){+.....}, at: [<e62aefc2>] isdn_net_write_super+0x3f/0x6e [isdn]

 which lock already depends on the new lock.
 .......

We don't need to hold nd->queue->xmit_lock just to protect a single
isdn_net_lp_busy() call. Dropping it fixes the lockdep warning above.

Reported-and-tested-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Xiaotian Feng <xtfeng@gmail.com>
---
 drivers/isdn/i4l/isdn_net.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_net.h b/drivers/isdn/i4l/isdn_net.h
index 74032d0..7511f08 100644
--- a/drivers/isdn/i4l/isdn_net.h
+++ b/drivers/isdn/i4l/isdn_net.h
@@ -83,19 +83,19 @@ static __inline__ isdn_net_local * isdn_net_get_locked_lp(isdn_net_dev *nd)
 
 	spin_lock_irqsave(&nd->queue_lock, flags);
 	lp = nd->queue;         /* get lp on top of queue */
-	spin_lock(&nd->queue->xmit_lock);
 	while (isdn_net_lp_busy(nd->queue)) {
-		spin_unlock(&nd->queue->xmit_lock);
 		nd->queue = nd->queue->next;
 		if (nd->queue == lp) { /* not found -- should never happen */
 			lp = NULL;
 			goto errout;
 		}
-		spin_lock(&nd->queue->xmit_lock);
 	}
 	lp = nd->queue;
 	nd->queue = nd->queue->next;
+	spin_unlock_irqrestore(&nd->queue_lock, flags);
+	spin_lock(&lp->xmit_lock);
 	local_bh_disable();
+	return lp;
 errout:
 	spin_unlock_irqrestore(&nd->queue_lock, flags);
 	return lp;
-- 
1.6.2.5
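A userspace model of the patched isdn_net_get_locked_lp(), assuming C11 atomic_flag spinlocks as stand-ins for spinlock_t and leaving out the IRQ and BH handling. The toy_ names are hypothetical. It shows the key property: queue_lock is released before xmit_lock is taken, so this path never holds both and the queue_lock -> xmit_lock ordering disappears.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lp {
	atomic_flag xmit_lock;
	bool busy;			/* what isdn_net_lp_busy() would report */
	struct toy_lp *next;		/* circular list */
};

struct toy_nd {
	atomic_flag queue_lock;
	struct toy_lp *queue;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;			/* spin */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Return the first non-busy lp with its xmit_lock held, or NULL. */
struct toy_lp *toy_get_locked_lp(struct toy_nd *nd)
{
	struct toy_lp *lp;

	assert(nd != NULL && nd->queue != NULL);
	toy_lock(&nd->queue_lock);
	lp = nd->queue;
	while (nd->queue->busy) {	/* read without xmit_lock, as in the patch */
		nd->queue = nd->queue->next;
		if (nd->queue == lp) {	/* every lp is busy */
			toy_unlock(&nd->queue_lock);
			return NULL;
		}
	}
	lp = nd->queue;
	nd->queue = nd->queue->next;	/* round-robin for the next caller */
	toy_unlock(&nd->queue_lock);

	toy_lock(&lp->xmit_lock);	/* taken only after queue_lock is dropped */
	return lp;
}
```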

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Johannes Berg @ 2009-10-22  8:27 UTC
  To: Jarek Poplawski
  Cc: Tilman Schmidt, David Miller, hidave.darkstar, linux-kernel, tglx,
	linux-wireless, linux-ppp, netdev, paulus, Michael Buesch,
	Oliver Hartkopp
In-Reply-To: <20091021213947.GA12202@ami.dom.local>

On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote:

> > > -	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
> > > +	raise_softirq_irqoff(NET_RX_SOFTIRQ);
> > 
> > This still doesn't make any sense.
> > 
> > There may or may not be a lot of code that assumes that everything else
> > is run with other tasklets disabled, and that it cannot be interrupted
> > by a tasklet and thus create a race.
> > 
> > Can you prove that is not the case, across the entire networking layer?
> 
> I'm not sure I can understand your question. This patch is mainly to
> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
> context handling; IMHO there're better tools for this (lockdep,
> WARN_ON's).

And how exactly does that matter to the patch at hand?!

I'm saying that, as indicated by the API (and absent proof otherwise,
that's how it is), it seems to me the networking layer needs to have
packets handed to it with softirqs disabled. Therefore, this patch is
not needed. While it may not be _wrong_, it'll definitely introduce a
performance regression.

This really should be obvious. You're fixing the warning at the source
of the warning, rather than the source of the problem.
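For context, a userspace model of the process-context entry point, following my understanding of netif_rx_ni() in net/core/dev.c around 2.6.32: queue the packet via netif_rx(), then run any pending softirqs immediately, because in process context there is no irq_exit() to do it. The toy_ names and counters are hypothetical; preempt_disable()/preempt_enable() appear only as comments.

```c
#include <assert.h>

static unsigned long toy_pending;	/* local_softirq_pending() */
static int toy_queued, toy_processed;

/* netif_rx(): enqueue and mark NET_RX pending, nothing more. */
static int toy_netif_rx(void)
{
	toy_queued++;
	toy_pending |= 1UL << 3;	/* models raising NET_RX_SOFTIRQ */
	return 0;
}

/* do_softirq(): drain the backlog if anything is pending. */
static void toy_do_softirq(void)
{
	if (toy_pending) {
		assert(toy_queued > 0);	/* pending implies queued in this model */
		toy_processed += toy_queued;
		toy_queued = 0;
		toy_pending = 0;
	}
}

int toy_netif_rx_ni(void)
{
	int err;

	/* preempt_disable() */
	err = toy_netif_rx();
	if (toy_pending)
		toy_do_softirq();
	/* preempt_enable() */
	return err;
}
```

Calling plain toy_netif_rx() from "process context" leaves the work pending until something else runs softirqs, which is the situation the NOHZ warnings point at.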

johannes
