Netdev List


* Re: [PATCH net-next] sparc: bpf_jit_comp: add XOR instruction for BPF JIT
From: David Miller @ 2012-09-27 22:05 UTC (permalink / raw)
  To: dxchgb; +Cc: netdev
In-Reply-To: <20120924215753.GA31312@thinkbox>

From: Daniel Borkmann <dxchgb@gmail.com>
Date: Mon, 24 Sep 2012 23:57:54 +0200

> This patch is a follow-up for patch "filter: add XOR instruction for use
> with X/K" that implements BPF SPARC JIT parts for the BPF XOR operation.
> 
> Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>

Looks good, applied, thanks Daniel.


* Re: [PATCH net-next 3/3] ipv4: gre: add GRO capability
From: Jesse Gross @ 2012-09-27 22:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1348769990.5093.1584.camel@edumazet-glaptop>

On Thu, Sep 27, 2012 at 11:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-09-27 at 20:08 +0200, Eric Dumazet wrote:
>
>>
>> This does not sound feasible with all kinds of tunnels, for example IPIP
>> tunnels, or UDP encapsulation, at least with the current stack (not OVS).
>>
>> Also note that pushing earlier means forcing the checksumming earlier,
>> and it consumes a lot of CPU cycles. Hopefully NICs will help us in the
>> future.
>>
>> Using a napi_struct eventually permits having separate CPUs, and things
>> like RPS/RSS to split the load.
>
> Also please note that my implementation doesn't bypass the first IP stack
> traversal (and firewalling, if any), so it changes nothing in terms
> of existing setups.
>
> So packets that should be forwarded will stay as they are (no tunnel
> decapsulation/recapsulation).
>
> Doing this in the generic GRO layer sounds a bit difficult.

We wouldn't actually do the decapsulation at the point of GRO.  This
is actually pretty similar to what we do with TCP - we merge TCP
payloads even though we haven't done any real IP processing yet.
However, we do check firewall rules later if we actually hit the IP
stack.  GRE would work the same way in this case.

What I'm describing is pretty much exactly what NICs will be doing, so
if that doesn't work we'll have a problem...


* Re: [PATCH net-next 3/3] ipv4: gre: add GRO capability
From: Jesse Gross @ 2012-09-27 22:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1348769294.5093.1566.camel@edumazet-glaptop>

On Thu, Sep 27, 2012 at 11:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-09-27 at 10:52 -0700, Jesse Gross wrote:
>
>> When I was thinking about doing this, my original plan was to handle
>> GRO/GSO by extending the current handlers to be able to look inside
>> GRE and then loop around to process the inner packet (similar to what
>> is done today with skb_flow_dissect() for RPS).  Is there a reason to
>> do it in the device?
>>
>> Pushing it earlier/later in the stack obviously increases the benefit
>> and it will also be more compatible with the forthcoming OVS tunneling
>> hooks, which will be flow based and therefore won't have a device.
>>
>> Also, the next generation of NICs will support this type of thing in
>> hardware so putting the software versions very close to the NIC will
>> give us a more similar abstraction.
>
> This does not sound feasible with all kinds of tunnels, for example IPIP
> tunnels, or UDP encapsulation, at least with the current stack (not OVS).

Hmm, I think we might be talking about different things since I can't
think of why it wouldn't be feasible (and none of it should be
specific to OVS).  What I was planning would result in the creation of
large but still encapsulated packets.  The merging would be purely
based on the headers in each layer being the same (as GRO is today) so
the logic of the IP stack, UDP stack, etc. isn't processed until
later.

> Also note that pushing earlier means forcing the checksumming earlier,
> and it consumes a lot of CPU cycles. Hopefully NICs will help us in the
> future.

It is a good point that if the packet isn't actually destined to us
then probably none of this is worth it (although I suspect that the
relative number of tunnel packets that are passed through vs.
terminated is fairly low).  Many NICs are capable of supplying
CHECKSUM_COMPLETE packets here, even if it is not exposed by the
drivers.

> Using a napi_struct eventually permits having separate CPUs, and things
> like RPS/RSS to split the load.

We should be able to split the load today using RPS since we can look
into the GRE flow once the packet comes off the NIC (assuming that it
is using NAPI).
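
To make the "merge purely on matching headers" idea above concrete, here is a rough user-space sketch. It is not the kernel GRO code: the struct layout and the names encap_pkt, can_merge and try_merge are hypothetical, and real GRO also checks things this sketch ignores (sequence numbers, flags, size limits). The point is only that the merged result stays encapsulated and no per-layer protocol logic runs at merge time.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header layers; real GRO operates on sk_buffs. */
struct layer_hdr {
	uint8_t bytes[8];	/* flow-identifying header fields of one layer */
};

struct encap_pkt {
	struct layer_hdr outer_ip;	/* outer IP addresses etc. */
	struct layer_hdr gre;		/* GRE flags/key */
	struct layer_hdr inner_ip;	/* inner IP addresses etc. */
	struct layer_hdr inner_tcp;	/* inner TCP ports */
	uint32_t payload_len;
};

static bool same_layer(const struct layer_hdr *a, const struct layer_hdr *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}

/* Two packets are merge candidates only if every layer's header matches. */
static bool can_merge(const struct encap_pkt *a, const struct encap_pkt *b)
{
	return same_layer(&a->outer_ip, &b->outer_ip) &&
	       same_layer(&a->gre, &b->gre) &&
	       same_layer(&a->inner_ip, &b->inner_ip) &&
	       same_layer(&a->inner_tcp, &b->inner_tcp);
}

/* Merge b into a; the result stays encapsulated, only the payload grows. */
static bool try_merge(struct encap_pkt *a, const struct encap_pkt *b)
{
	if (!can_merge(a, b))
		return false;
	a->payload_len += b->payload_len;
	return true;
}
```

Firewall rules and the rest of the IP stack would then see the one large, still-encapsulated packet later, as with TCP GRO today.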


* [PATCH] IB/ipoib: Add more rtnl_link_ops callbacks
From: Or Gerlitz @ 2012-09-27 22:02 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, kaber, Or Gerlitz

Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set/get the driver mode and other policies.
Maintain the proprietary sysfs entries only for legacy children.

For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc.
devices. Pointed out by Patrick McHardy <kaber@trash.net>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h         |    3 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c      |   34 ++++++++++----
 drivers/infiniband/ulp/ipoib/ipoib_main.c    |   16 +++++--
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |   60 +++++++++++++++++++++++++-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c    |   24 ++++++----
 include/linux/if_link.h                      |    7 +++
 6 files changed, 118 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 381f51b..8956f6f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -520,6 +520,9 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 int  __init ipoib_netlink_init(void);
 void __exit ipoib_netlink_fini(void);
 
+void ipoib_set_umcast(struct net_device *ndev, int umcast_val);
+int  ipoib_set_mode(struct net_device *dev, const char *buf);
+
 void ipoib_setup(struct net_device *dev);
 
 void ipoib_pkey_poll(struct work_struct *work);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 24683fd..ef6d5a3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1448,15 +1448,10 @@ static ssize_t show_mode(struct device *d, struct device_attribute *attr,
 		return sprintf(buf, "datagram\n");
 }
 
-static ssize_t set_mode(struct device *d, struct device_attribute *attr,
-			const char *buf, size_t count)
+int ipoib_set_mode(struct net_device *dev, const char *buf)
 {
-	struct net_device *dev = to_net_dev(d);
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-	if (!rtnl_trylock())
-		return restart_syscall();
-
 	/* flush paths if we switch modes so that connections are restarted */
 	if (IPOIB_CM_SUPPORTED(dev->dev_addr) && !strcmp(buf, "connected\n")) {
 		set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
@@ -1467,7 +1462,8 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 		priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM;
 
 		ipoib_flush_paths(dev);
-		return count;
+		rtnl_lock();
+		return 0;
 	}
 
 	if (!strcmp(buf, "datagram\n")) {
@@ -1476,14 +1472,32 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 		dev_set_mtu(dev, min(priv->mcast_mtu, dev->mtu));
 		rtnl_unlock();
 		ipoib_flush_paths(dev);
-
-		return count;
+		rtnl_lock();
+		return 0;
 	}
-	rtnl_unlock();
 
 	return -EINVAL;
 }
 
+static ssize_t set_mode(struct device *d, struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct net_device *dev = to_net_dev(d);
+	int ret;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	ret = ipoib_set_mode(dev, buf);
+	
+	rtnl_unlock();
+
+	if (!ret)
+		return count;
+
+	return ret;
+}
+
 static DEVICE_ATTR(mode, S_IWUSR | S_IRUGO, show_mode, set_mode);
 
 int ipoib_cm_add_mode_attr(struct net_device *dev)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b3e9709..6c5c771 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1386,12 +1386,9 @@ static ssize_t show_umcast(struct device *dev,
 	return sprintf(buf, "%d\n", test_bit(IPOIB_FLAG_UMCAST, &priv->flags));
 }
 
-static ssize_t set_umcast(struct device *dev,
-			  struct device_attribute *attr,
-			  const char *buf, size_t count)
+void ipoib_set_umcast(struct net_device *ndev, int umcast_val)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
-	unsigned long umcast_val = simple_strtoul(buf, NULL, 0);
+	struct ipoib_dev_priv *priv = netdev_priv(ndev);
 
 	if (umcast_val > 0) {
 		set_bit(IPOIB_FLAG_UMCAST, &priv->flags);
@@ -1399,6 +1396,15 @@ static ssize_t set_umcast(struct device *dev,
 				"by userspace\n");
 	} else
 		clear_bit(IPOIB_FLAG_UMCAST, &priv->flags);
+}
+
+static ssize_t set_umcast(struct device *dev,
+			  struct device_attribute *attr,
+			  const char *buf, size_t count)
+{
+	unsigned long umcast_val = simple_strtoul(buf, NULL, 0);
+	
+	ipoib_set_umcast(to_net_dev(dev), umcast_val);
 
 	return count;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index a7dc5ea..42f6756 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -37,8 +37,60 @@
 
 static const struct nla_policy ipoib_policy[IFLA_IPOIB_MAX + 1] = {
 	[IFLA_IPOIB_PKEY]	= { .type = NLA_U16 },
+	[IFLA_IPOIB_MODE]	= { .type = NLA_U16 },
+	[IFLA_IPOIB_UMCAST]	= { .type = NLA_U16 },
 };
 
+static int ipoib_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	u16 val;
+
+	if (nla_put_u16(skb, IFLA_IPOIB_PKEY, priv->pkey))
+		goto nla_put_failure;
+
+	val = test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+	if (nla_put_u16(skb, IFLA_IPOIB_MODE, val))
+		goto nla_put_failure;
+
+	val = test_bit(IPOIB_FLAG_UMCAST, &priv->flags);
+	if (nla_put_u16(skb, IFLA_IPOIB_UMCAST, val))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int ipoib_changelink(struct net_device *dev,
+			    struct nlattr *tb[], struct nlattr *data[])
+{
+	u16 mode, umcast;
+	int ret = 0;
+
+	if (data[IFLA_IPOIB_MODE]) {
+		mode  = nla_get_u16(data[IFLA_IPOIB_MODE]);
+		if (mode == IPOIB_MODE_DATAGRAM)
+			ret = ipoib_set_mode(dev, "datagram\n");
+		else if (mode == IPOIB_MODE_CONNECTED)
+			ret = ipoib_set_mode(dev, "connected\n");
+		else 
+			ret = -EINVAL;
+
+		if (ret < 0)
+			goto out_err;
+	}
+
+	if (data[IFLA_IPOIB_UMCAST]) { 
+		umcast = nla_get_u16(data[IFLA_IPOIB_UMCAST]);
+		ipoib_set_umcast(dev, umcast);
+	}
+	
+out_err:
+	return ret;
+}
+
 static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 			       struct nlattr *tb[], struct nlattr *data[])
 {
@@ -69,6 +121,8 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 
 	err = __ipoib_vlan_add(ppriv, netdev_priv(dev), child_pkey, IPOIB_RTNL_CHILD);
 
+	if (!err && data)
+		err = ipoib_changelink(dev, tb, data);
 	return err;
 }
 
@@ -87,7 +141,9 @@ static void ipoib_unregister_child_dev(struct net_device *dev, struct list_head
 
 static size_t ipoib_get_size(const struct net_device *dev)
 {
-	return nla_total_size(2);	/* IFLA_IPOIB_PKEY */
+	return nla_total_size(2) +	/* IFLA_IPOIB_PKEY   */
+		nla_total_size(2) +	/* IFLA_IPOIB_MODE   */
+		nla_total_size(2);	/* IFLA_IPOIB_UMCAST */
 }
 
 static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
@@ -97,8 +153,10 @@ static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
 	.priv_size	= sizeof(struct ipoib_dev_priv),
 	.setup		= ipoib_setup,
 	.newlink	= ipoib_new_child_link,
+	.changelink	= ipoib_changelink,
 	.dellink	= ipoib_unregister_child_dev,
 	.get_size	= ipoib_get_size,
+	.fill_info	= ipoib_fill_info,
 };
 
 int __init ipoib_netlink_init(void)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 238bbf9..8292554 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -88,17 +88,21 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 
 	ipoib_create_debug_files(priv->dev);
 
-	if (ipoib_cm_add_mode_attr(priv->dev))
-		goto sysfs_failed;
-	if (ipoib_add_pkey_attr(priv->dev))
-		goto sysfs_failed;
-	if (ipoib_add_umcast_attr(priv->dev))
-		goto sysfs_failed;
-
-	if (device_create_file(&priv->dev->dev, &dev_attr_parent))
-		goto sysfs_failed;
+	/* RTNL childs don't need proprietary sysfs entries */
+	if (type == IPOIB_LEGACY_CHILD) {
+		if (ipoib_cm_add_mode_attr(priv->dev))
+			goto sysfs_failed;
+		if (ipoib_add_pkey_attr(priv->dev))
+			goto sysfs_failed;
+		if (ipoib_add_umcast_attr(priv->dev))
+			goto sysfs_failed;
+
+		if (device_create_file(&priv->dev->dev, &dev_attr_parent))
+			goto sysfs_failed;
+	}
 
-	priv->child_type = type;
+	priv->child_type  = type;
+	priv->dev->iflink = ppriv->dev->ifindex;
 	list_add_tail(&priv->list, &ppriv->child_intfs);
 
 	return 0;
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 24c0dd0..4491177 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -404,9 +404,16 @@ struct ifla_port_vsi {
 enum {
 	IFLA_IPOIB_UNSPEC,
 	IFLA_IPOIB_PKEY,
+	IFLA_IPOIB_MODE,
+	IFLA_IPOIB_UMCAST,
 	__IFLA_IPOIB_MAX
 };
 
+enum {
+	IPOIB_MODE_DATAGRAM  = 0, /* using unreliable datagram QPs */
+	IPOIB_MODE_CONNECTED = 1, /* using connected QPs */
+};
+
 #define IFLA_IPOIB_MAX (__IFLA_IPOIB_MAX - 1)
 
 #endif /* _LINUX_IF_LINK_H */
-- 
1.7.1
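
For readers skimming the patch, the mode handling in ipoib_changelink() boils down to a small mapping from the u16 attribute value to the string that ipoib_set_mode() parses. The stand-alone sketch below restates that mapping outside the kernel; the helper name ipoib_mode_to_str is hypothetical and not part of the patch, while the enum values are copied from the if_link.h hunk.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch's include/linux/if_link.h hunk. */
enum {
	IPOIB_MODE_DATAGRAM  = 0,	/* using unreliable datagram QPs */
	IPOIB_MODE_CONNECTED = 1,	/* using connected QPs */
};

/*
 * Hypothetical helper mirroring the checks in ipoib_changelink():
 * translate the attribute value into the string ipoib_set_mode() expects.
 * Returns 0 on success, -EINVAL for an unknown mode.
 */
static int ipoib_mode_to_str(uint16_t mode, const char **out)
{
	switch (mode) {
	case IPOIB_MODE_DATAGRAM:
		*out = "datagram\n";
		return 0;
	case IPOIB_MODE_CONNECTED:
		*out = "connected\n";
		return 0;
	default:
		*out = NULL;
		return -EINVAL;
	}
}
```

Any value other than 0 or 1 is rejected, which matches the -EINVAL branch in the patch.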


* Re: [PATCH] smsc75xx: fix resume after device reset
From: David Miller @ 2012-09-27 22:00 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1348497779-10042-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 24 Sep 2012 15:42:59 +0100

> On some systems this device fails to properly resume after suspend,
> this patch fixes it by running the usbnet_resume handler.
> 
> I suspect this also fixes this bug:
> 
> http://code.google.com/p/chromium-os/issues/detail?id=31871

Applied, thanks.


* Re: [PATCH v4] lxt PHY: Support for the buggy LXT973 rev A2
From: David Miller @ 2012-09-27 21:58 UTC (permalink / raw)
  To: richardcochran; +Cc: christophe.leroy, netdev, linux-kernel
In-Reply-To: <20120925074736.GB2169@netboy.at.omicron.at>

From: Richard Cochran <richardcochran@gmail.com>
Date: Tue, 25 Sep 2012 09:47:36 +0200

> On Tue, Sep 25, 2012 at 08:23:42AM +0200, leroy christophe wrote:
>> 
>> A2 chip has phy_id 0x00137a10
>> A3 chip has phy_id 0x00137a11
> 
> Okay then, thanks.
> 
> Acked-by: Richard Cochran <richardcochran@gmail.com>

Applied to net-next, thanks.
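
Since the two IDs quoted above differ only in the lowest nibble, a driver entry meant for the A2 revision alone has to compare the full ID. A minimal sketch of masked ID matching follows; the macro and function names are hypothetical, and the mask values in the usage note are assumptions for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define LXT973_A2_PHY_ID 0x00137a10u	/* from the thread */
#define LXT973_A3_PHY_ID 0x00137a11u	/* from the thread */

/* A PHY matches a driver entry when the IDs agree under the entry's mask. */
static bool phy_id_matches(uint32_t phy_id, uint32_t drv_id, uint32_t mask)
{
	return (phy_id & mask) == (drv_id & mask);
}
```

With a full mask of 0xffffffff only the A2 chip matches an A2 entry; with a mask of 0xfffffff0 the revision nibble is ignored and the A3 chip matches as well.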


* Re: [PATCH 0/7] Add Chelsio T4 firmware configuration file support
From: David Miller @ 2012-09-27 21:56 UTC (permalink / raw)
  To: vipul; +Cc: netdev, divy, dm, swise, leedom, felix
In-Reply-To: <1348663182-20190-1-git-send-email-vipul@chelsio.com>

From: Vipul Pandya <vipul@chelsio.com>
Date: Wed, 26 Sep 2012 18:09:35 +0530

> This patch series adds support for a firmware configuration file for
> Chelsio T4 adapters.
> 
> The Firmware Configuration file was primarily developed in order to centralize
> all of the configuration, resource allocation, etc. for Unified Wire operation
> where multiple Physical / Virtual Function Drivers would be using a T4 adapter
> simultaneously.
> 
> The patch series also has fixes for bugs which can occur while upgrading
> the T4 firmware.
> 
> The patch series is built against David Miller's net-next tree.

Series applied.


* Re: Possible networking regression in 3.6.0
From: Eric Dumazet @ 2012-09-27 21:17 UTC (permalink / raw)
  To: Chris Clayton, David Miller; +Cc: netdev, gpiez
In-Reply-To: <1348779826.5093.1750.camel@edumazet-glaptop>

On Thu, 2012-09-27 at 23:03 +0200, Eric Dumazet wrote:
> On Thu, 2012-09-27 at 19:05 +0100, Chris Clayton wrote:
> > On 09/27/12 13:14, Eric Dumazet wrote:
> > > On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
> > >> Just for information - I've pulled Linus' tree this morning and the
> > >> problem is still present. Also, Gunther Piaz has reported, via the
> > >> bugzilla entry, that he too has hit this regression.
> > >
> > > I tried to reproduce the bug, and my kvm guests have no problem.
> > >
> > > I guess you need to precisely describe how you setup your network, so
> > > that I can reproduce the problem and eventually fix it.
> > >
> > 
> > You've seen the bits from my firewall setup script that relate to this 
> > issue. I start the WinXP client with another script:
> > 
> > #!/bin/sh
> > if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
> >      echo "winxp is already running ..." > /dev/stderr
> >      exit 1
> > fi
> > 
> > # make sure the kvm modules are loaded
> > if test -z "$(grep '\<kvm\>' /proc/misc)"; then
> >      sudo modprobe kvm-intel
> >      while test -z "$(grep '\<kvm\>' /proc/misc)"; do
> >          true
> >      done
> > fi
> > 
> > # make sure tun module is loaded
> > if test ! -e /dev/net/tun; then
> >      sudo modprobe tun
> > fi
> > 
> > # figure out the cpu to use
> > QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
> > # assumes major version is 1
> > MINORVER=$(echo $QVER | cut -d'.' -f 2)
> > if [ $MINORVER -ge 1 ]; then
> >      CPU="host"
> > else
> >      CPU="qemu64"
> > fi
> > 
> > # set up the network interface
> > TAPDEV=$(sudo tunctl -b -u $(whoami))
> > sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 broadcast 192.168.200.255
> > 
> > # start Windows XP
> > qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio \
> >      -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
> >      -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) \
> >      -net tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
> >      -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*
> > 
> > # stop the network interface
> > sudo ifconfig $TAPDEV down
> > sudo tunctl -d $TAPDEV &>/dev/null
> > 
> > # tidy up
> > rm -f $HOME/kvm/var/run/kvm-winxp.pid
> > 
> > 
> > The call to getmacaddr just returns the next in a sequence of mac 
> > addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first 
> > found the problem whilst running qemu-kvm version 1.1.1 although I've 
> > since updated to 1.2.0.
> > 
> > By the way, I doubt it will make a difference, but, although my laptop 
> > has a 64bit CPU, I am running a 32 bit kernel and, obviously, user space.
> > 
> > Let me know if you need anything else.
> 
> It works for me.
> 
> Hmm, maybe your guest is using DHCP and DHCP fails ?

Yes, that seems to be the problem. On the host I tried:

# ip ro get 8.8.8.8 from 192.168.200.1 iif tap1
8.8.8.8 from 192.168.200.1 via 172.30.42.1 dev eth0 
    cache  iif *

So if the guest tries to send a frame to 8.8.8.8 we are going to forward
the packet to eth0

But if the guest tries to send to 255.255.255.255, we try to deliver the
packet to the host itself, instead of broadcasting to eth0

# ip ro get 255.255.255.255 from 192.168.200.1 iif tap1
broadcast 255.255.255.255 from 192.168.200.1 dev lo 
    cache <local,brd>  iif *


David, maybe you'll have an idea ?

Thanks


* Netfilter lacks ability to filter packets via Application-origin
From: Chad Gray @ 2012-09-27 21:04 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <COL002-W8067088C0C0B4682A10A0F39B0@phx.gbl>

Users need the ability for the Linux firewall to filter packets based on the
application they originate from. This ability is present in the Mac and
Windows firewalls, but not in Linux.

For example, users would like to be able to open port 80 for Firefox, but
keep port 80 closed for other applications.

This ability enhances the privacy and security of the user, and also helps
to inform the user about incoming and outgoing internet traffic and which
applications are causing it.

https://bugzilla.kernel.org/show_bug.cgi?id=47531


* Re: Possible networking regression in 3.6.0
From: Eric Dumazet @ 2012-09-27 21:03 UTC (permalink / raw)
  To: Chris Clayton; +Cc: netdev, gpiez
In-Reply-To: <50649567.2010704@googlemail.com>

On Thu, 2012-09-27 at 19:05 +0100, Chris Clayton wrote:
> On 09/27/12 13:14, Eric Dumazet wrote:
> > On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
> >> Just for information - I've pulled Linus' tree this morning and the
> >> problem is still present. Also, Gunther Piaz has reported, via the
> >> bugzilla entry, that he too has hit this regression.
> >
> > I tried to reproduce the bug, and my kvm guests have no problem.
> >
> > I guess you need to precisely describe how you setup your network, so
> > that I can reproduce the problem and eventually fix it.
> >
> 
> You've seen the bits from my firewall setup script that relate to this 
> issue. I start the WinXP client with another script:
> 
> #!/bin/sh
> if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
>      echo "winxp is already running ..." > /dev/stderr
>      exit 1
> fi
> 
> # make sure the kvm modules are loaded
> if test -z "$(grep '\<kvm\>' /proc/misc)"; then
>      sudo modprobe kvm-intel
>      while test -z "$(grep '\<kvm\>' /proc/misc)"; do
>          true
>      done
> fi
> 
> # make sure tun module is loaded
> if test ! -e /dev/net/tun; then
>      sudo modprobe tun
> fi
> 
> # figure out the cpu to use
> QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
> # assumes major version is 1
> MINORVER=$(echo $QVER | cut -d'.' -f 2)
> if [ $MINORVER -ge 1 ]; then
>      CPU="host"
> else
>      CPU="qemu64"
> fi
> 
> # set up the network interface
> TAPDEV=$(sudo tunctl -b -u $(whoami))
> sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 broadcast 192.168.200.255
> 
> # start Windows XP
> qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio \
>      -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
>      -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) \
>      -net tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
>      -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*
> 
> # stop the network interface
> sudo ifconfig $TAPDEV down
> sudo tunctl -d $TAPDEV &>/dev/null
> 
> # tidy up
> rm -f $HOME/kvm/var/run/kvm-winxp.pid
> 
> 
> The call to getmacaddr just returns the next in a sequence of mac 
> addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first 
> found the problem whilst running qemu-kvm version 1.1.1 although I've 
> since updated to 1.2.0.
> 
> By the way, I doubt it will make a difference, but, although my laptop 
> has a 64bit CPU, I am running a 32 bit kernel and, obviously, user space.
> 
> Let me know if you need anything else.

It works for me.

Hmm, maybe your guest is using DHCP and DHCP fails ?

Could you check ?


* Re: mlx4: dropping multicast packets at promisc leave
From: Or Gerlitz @ 2012-09-27 20:45 UTC (permalink / raw)
  To: mleitner; +Cc: netdev, Yevgeny Petrilin, Amir Vadai
In-Reply-To: <5064B905.6090906@redhat.com>

On Thu, Sep 27, 2012 at 10:37 PM, Marcelo Ricardo Leitner
<mleitner@redhat.com> wrote:
> Well, neither me nor they could reproduce the drops at promisc exit anymore.
> It's hard to chase a ghost, you know. I'll still track that cpu usage, but
> it seems unrelated to the driver at first glance.
>
> Thank you for your support Or, appreciated.

sure


* Re: mlx4: dropping multicast packets at promisc leave
From: Marcelo Ricardo Leitner @ 2012-09-27 20:37 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, Yevgeny Petrilin, Amir Vadai
In-Reply-To: <505CB2BD.4080402@redhat.com>

On 09/21/2012 03:32 PM, Marcelo Ricardo Leitner wrote:
> On 09/20/2012 12:04 PM, Marcelo Ricardo Leitner wrote:
>> On 09/20/2012 10:21 AM, Or Gerlitz wrote:
>>> On 20/09/2012 03:43, Marcelo Ricardo Leitner wrote:
>>>> I have a report that our mlx4 driver (RHEL 6.3) is dropping multicast
>>>> packets when the NIC leaves promisc mode. This seems to be caused by
>>>> the new steering mode introduced around commit
>>>> 1679200f91da6a054b06954c9bd3eeed29b6731f. It seems the new
>>>> steering mode needs more commands/time to leave promisc mode,
>>>> which may be leading to packet drops.
>>>
>>> Marcelo,
>>>
>>> The commit you point to below, 6d19993 "net/mlx4_en: Re-design multicast
>>> attachments flow", makes sure to avoid
>>> doing extra firmware commands and not leave a window in time where
>>> "correct" addresses are not attached. It's hard to say what's the case on
>>> that RHEL 6.3 system; it would be very helpful though if you manage to
>>> reproduce the problem on an upstream kernel -- BTW you didn't say on
>>
>> Okay, I understand that the commit prevents a window. I may be missing
>> something, but isn't there another one in there? Between:
>> mlx4_SET_MCAST_FLTR MLX4_MCAST_DISABLE and
>> mlx4_SET_MCAST_FLTR MLX4_MCAST_ENABLE
>> because mlx4_multicast_promisc_remove() was called just before those.
>> Otherwise I don't see how the NIC would be receiving multicast packets in
>> there.
>>
> ....
>> And then I tried 3 additional patches applied at once:
>> - 60d31c1475f2 "net/mlx4_core: Looking for promiscuous entries on the
>> correct port"
>> - f1f75f0 - mlx4: attach multicast with correct flag
>> - Yes, this one wasn't in 2.6.32-279.el6.
>> - 6d19993 - net/mlx4_en: Re-design multicast attachments flow
>>
>> And they still reported drops.

Well, neither me nor they could reproduce the drops at promisc exit 
anymore. It's hard to chase a ghost, you know. I'll still track that cpu 
usage, but it seems unrelated to the driver at first glance.

Thank you for your support Or, appreciated.

Regards,
Marcelo


* [PATCH v2] net: ti cpsw ethernet: set IFCTL_A bit in MACCONTROL
From: Daniel Mack @ 2012-09-27 19:19 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Mack, Mugunthan V N, Vaibhav Hiremath, David S. Miller

For RMII/RGMII mode operation at 100Mbps, the CPSW needs to set the
IFCTL_A bit in the MACCONTROL register. For all other PHY modes, this
bit is unused, so setting it regardless of the PHY mode shouldn't cause
any trouble.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Vaibhav Hiremath <hvaibhav@ti.com>
Cc: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/ti/cpsw.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index aa78168..fb1a692 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -386,6 +386,11 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave,
 			mac_control |= BIT(7);	/* GIGABITEN	*/
 		if (phy->duplex)
 			mac_control |= BIT(0);	/* FULLDUPLEXEN	*/
+
+		/* set speed_in input in case RMII mode is used in 100Mbps */
+		if (phy->speed == 100)
+			mac_control |= BIT(15);
+
 		*link = true;
 	} else {
 		mac_control = 0;
-- 
1.7.11.4
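
As a quick way to see what the hunk above does to the register value, here is a stand-alone sketch of the bit computation. It is not the driver code: struct link_state and cpsw_mac_control are hypothetical stand-ins, the starting value of 0 is a simplification, and the speed == 1000 condition for GIGABITEN is an assumption, since that line is outside the context shown in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical stand-in for the phy_device fields the patch reads. */
struct link_state {
	bool link;
	int speed;	/* in Mbps */
	bool duplex;	/* true = full duplex */
};

/* Compute the MACCONTROL bits the way the patched hunk does. */
static uint32_t cpsw_mac_control(const struct link_state *phy)
{
	uint32_t mac_control = 0;

	if (!phy->link)
		return 0;

	if (phy->speed == 1000)		/* assumed condition, not in the diff */
		mac_control |= BIT(7);	/* GIGABITEN */
	if (phy->duplex)
		mac_control |= BIT(0);	/* FULLDUPLEXEN */
	if (phy->speed == 100)
		mac_control |= BIT(15);	/* IFCTL_A, added by this patch */

	return mac_control;
}
```

For a 100Mbps full-duplex link this yields 0x8001; at 10Mbps or 1000Mbps bit 15 stays clear.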


* RE: [PATCH] net: ti cpsw ethernet: set IFCTL_A bit in MACCONTROL
From: N, Mugunthan V @ 2012-09-27 18:40 UTC (permalink / raw)
  To: Daniel Mack, netdev@vger.kernel.org; +Cc: Hiremath, Vaibhav, David S. Miller
In-Reply-To: <1348746636-24156-1-git-send-email-zonque@gmail.com>

> -----Original Message-----
> From: Daniel Mack [mailto:zonque@gmail.com]
> Sent: Thursday, September 27, 2012 5:21 PM
> To: netdev@vger.kernel.org
> Cc: Daniel Mack; N, Mugunthan V; Hiremath, Vaibhav; David S. Miller
> Subject: [PATCH] net: ti cpsw ethernet: set IFCTL_A bit in MACCONTROL
> 
> For RMII/RGMII mode operation in 100Mbps, the CPSW needs to set the
> IFCTL_A bits in the MACCONTROL register. For all other PHY modes, this
> bit is unused, so setting it unconditionally shouldn't cause any
> trouble.
> 
> Signed-off-by: Daniel Mack <zonque@gmail.com>
> Cc: Mugunthan V N <mugunthanvnm@ti.com>
> Cc: Vaibhav Hiremath <hvaibhav@ti.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
>  drivers/net/ethernet/ti/cpsw.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c
> b/drivers/net/ethernet/ti/cpsw.c
> index aa78168..b764f75 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -386,6 +386,11 @@ static void _cpsw_adjust_link(struct cpsw_slave
> *slave,
>  			mac_control |= BIT(7);	/* GIGABITEN	*/
>  		if (phy->duplex)
>  			mac_control |= BIT(0);	/* FULLDUPLEXEN	*/
> +
> +		/* set speed_in input in case RMII mode is used in >10Mbps
> */
> +		if (phy->speed > 10)

Please change the speed check to == 100, as it is required only for a 100Mbps link.

Regards
Mugunthan V N

> +			mac_control |= BIT(15);
> +
>  		*link = true;
>  	} else {
>  		mac_control = 0;
> --
> 1.7.11.4


* Re: [PATCH] net: phy: smsc: Implement PHY config_init for LAN87xx
From: Otavio Salvador @ 2012-09-27 18:21 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev, Christian Hohnstaedt, David S. Miller, Fabio Estevam,
	Giuseppe Cavallaro
In-Reply-To: <1348604262-21522-1-git-send-email-marex@denx.de>

On Tue, Sep 25, 2012 at 5:17 PM, Marek Vasut <marex@denx.de> wrote:
> The LAN8710/LAN8720 chips have a broken "FlexPWR" smart power-saving
> capability. Enabling it leads to the PHY not being able to detect link when
> cold-started without a cable connected. Thus, make sure this is disabled.
>
> Signed-off-by: Marek Vasut <marex@denx.de>

Acked-by: Otavio Salvador <otavio@ossystems.com.br>

-- 
Otavio Salvador                             O.S. Systems
E-mail: otavio@ossystems.com.br  http://www.ossystems.com.br
Mobile: +55 53 9981-7854              http://projetos.ossystems.com.br


* Re: [PATCH net-next 3/3] ipv4: gre: add GRO capability
From: Eric Dumazet @ 2012-09-27 18:19 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <1348769294.5093.1566.camel@edumazet-glaptop>

On Thu, 2012-09-27 at 20:08 +0200, Eric Dumazet wrote:

> 
> This does not sound feasible with all kinds of tunnels, for example IPIP
> tunnels, or UDP encapsulation, at least with the current stack (not OVS).
>
> Also note that pushing earlier means forcing the checksumming earlier,
> and it consumes a lot of CPU cycles. Hopefully NICs will help us in the
> future.
>
> Using a napi_struct eventually permits having separate CPUs, and things
> like RPS/RSS to split the load.

Also please note that my implementation doesn't bypass the first IP stack
traversal (and firewalling, if any), so it changes nothing in terms
of existing setups.

So packets that should be forwarded will stay as they are (no tunnel
decapsulation/recapsulation).

Doing this in the generic GRO layer sounds a bit difficult.


* [PATCH RFC net-next 1/1] ptp: add an ioctl to compare PHC time with system time
From: Richard Cochran @ 2012-09-27 18:12 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Jacob Keller, John Stultz, Miroslav Lichvar
In-Reply-To: <cover.1348768886.git.richardcochran@gmail.com>

This patch adds an ioctl for PTP Hardware Clock (PHC) devices that allows
user space to measure the time offset between the PHC and the system
clock. Rather than hard coding any kind of estimation algorithm into the
kernel, this patch takes the more flexible approach of just delivering
an array of raw clock readings. In that way, the user space clock servo
may be adapted to new and different hardware clocks.
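
As an illustration of what user space might do with the raw readings, here is one possible estimator, sketched under assumptions. The readings arrive interleaved as sys, phc, sys, phc, ..., sys (2 * n + 1 entries for n samples), as produced by the loop in the ioctl. The struct clock_reading and the function names are hypothetical stand-ins, and picking the sample with the shortest bracketing interval is just one choice, not something this patch prescribes.

```c
#include <stdint.h>

/* Hypothetical stand-in for one raw reading returned by the ioctl. */
struct clock_reading {
	int64_t sec;
	uint32_t nsec;
};

static int64_t to_ns(const struct clock_reading *r)
{
	return r->sec * 1000000000LL + r->nsec;
}

/*
 * readings[] holds 2 * n + 1 entries: sys, phc, sys, phc, ..., sys.
 * Pick the PHC reading bracketed by the shortest system-clock interval
 * and return (PHC - midpoint of the bracketing system readings) in ns.
 * n must be at least 1.
 */
static int64_t phc_offset_ns(const struct clock_reading *readings,
			     unsigned int n)
{
	int64_t best_offset = 0, best_width = INT64_MAX;
	unsigned int i;

	for (i = 0; i < n; i++) {
		int64_t t1 = to_ns(&readings[2 * i]);
		int64_t tp = to_ns(&readings[2 * i + 1]);
		int64_t t2 = to_ns(&readings[2 * i + 2]);
		int64_t width = t2 - t1;

		if (width < best_width) {
			best_width = width;
			best_offset = tp - (t1 + width / 2);
		}
	}
	return best_offset;
}
```

A different servo could just as well average all samples or filter outliers; the kernel side stays the same either way.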

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 drivers/ptp/ptp_chardev.c |   32 ++++++++++++++++++++++++++++++++
 include/linux/ptp_clock.h |   14 ++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index e7f301da2..4f8ae80 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -33,9 +33,13 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 {
 	struct ptp_clock_caps caps;
 	struct ptp_clock_request req;
+	struct ptp_sys_offset sysoff;
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
 	struct ptp_clock_info *ops = ptp->info;
+	struct ptp_clock_time *pct;
+	struct timespec ts;
 	int enable, err = 0;
+	unsigned int i;
 
 	switch (cmd) {
 
@@ -88,6 +92,34 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 		err = ops->enable(ops, &req, enable);
 		break;
 
+	case PTP_SYS_OFFSET:
+		if (copy_from_user(&sysoff, (void __user *)arg,
+				   sizeof(sysoff))) {
+			err = -EFAULT;
+			break;
+		}
+		if (sysoff.n_samples > PTP_MAX_SAMPLES) {
+			err = -EINVAL;
+			break;
+		}
+		pct = &sysoff.ts[0];
+		for (i = 0; i < sysoff.n_samples; i++) {
+			getnstimeofday(&ts);
+			pct->sec = ts.tv_sec;
+			pct->nsec = ts.tv_nsec;
+			pct++;
+			ptp->info->gettime(ptp->info, &ts);
+			pct->sec = ts.tv_sec;
+			pct->nsec = ts.tv_nsec;
+			pct++;
+		}
+		getnstimeofday(&ts);
+		pct->sec = ts.tv_sec;
+		pct->nsec = ts.tv_nsec;
+		if (copy_to_user((void __user *)arg, &sysoff, sizeof(sysoff)))
+			err = -EFAULT;
+		break;
+
 	default:
 		err = -ENOTTY;
 		break;
diff --git a/include/linux/ptp_clock.h b/include/linux/ptp_clock.h
index 94e981f..b65c834 100644
--- a/include/linux/ptp_clock.h
+++ b/include/linux/ptp_clock.h
@@ -67,12 +67,26 @@ struct ptp_perout_request {
 	unsigned int rsv[4];          /* Reserved for future use. */
 };
 
+#define PTP_MAX_SAMPLES 25 /* Maximum allowed offset measurement samples. */
+
+struct ptp_sys_offset {
+	unsigned int n_samples; /* Desired number of measurements. */
+	unsigned int rsv[3];    /* Reserved for future use. */
+	/*
+	 * Array of interleaved system/phc time stamps. The kernel
+	 * will provide 2*n_samples + 1 time stamps, with the last
+	 * one as a system time stamp.
+	 */
+	struct ptp_clock_time ts[2 * PTP_MAX_SAMPLES + 1];
+};
+
 #define PTP_CLK_MAGIC '='
 
 #define PTP_CLOCK_GETCAPS  _IOR(PTP_CLK_MAGIC, 1, struct ptp_clock_caps)
 #define PTP_EXTTS_REQUEST  _IOW(PTP_CLK_MAGIC, 2, struct ptp_extts_request)
 #define PTP_PEROUT_REQUEST _IOW(PTP_CLK_MAGIC, 3, struct ptp_perout_request)
 #define PTP_ENABLE_PPS     _IOW(PTP_CLK_MAGIC, 4, int)
+#define PTP_SYS_OFFSET     _IOW(PTP_CLK_MAGIC, 5, struct ptp_sys_offset)
 
 struct ptp_extts_event {
 	struct ptp_clock_time t; /* Time event occured. */
-- 
1.7.2.5

^ permalink raw reply related

* [PATCH RFC net-next 0/1] ptp: add pseudo pps ioctl
From: Richard Cochran @ 2012-09-27 18:12 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Jacob Keller, John Stultz, Miroslav Lichvar

This patch adds a kind of "poor man's PPS" for use with those PHC
drivers which do not support a true PPS. Using this ioctl, user space
can estimate the system-phc offset and tune one of the clocks as
desired.

This patch has been tested on both the Intel igb (PCIe card) and the
National Semiconductor phyter (PHY via MDIO bus), and the results seem
quite promising.

This patch avoids the "timecompare" code on purpose, since experiments
have shown that code to be quite brittle, having been tuned to only
one specific kind of hardware.

Thanks,
Richard

Richard Cochran (1):
  ptp: add an ioctl to compare PHC time with system time

 drivers/ptp/ptp_chardev.c |   32 ++++++++++++++++++++++++++++++++
 include/linux/ptp_clock.h |   14 ++++++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)

-- 
1.7.2.5

^ permalink raw reply

* Re: [PATCH net-next 3/3] ipv4: gre: add GRO capability
From: Eric Dumazet @ 2012-09-27 18:08 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev
In-Reply-To: <CAEP_g=-JAYHXM86AYNp7BhDV+eqfkKVgC+SJS1MVdo0K8fRLSQ@mail.gmail.com>

On Thu, 2012-09-27 at 10:52 -0700, Jesse Gross wrote:

> When I was thinking about doing this, my original plan was to handle
> GRO/GSO by extending the current handlers to be able to look inside
> GRE and then loop around to process the inner packet (similar to what
> is done today with skb_flow_dissect() for RPS).  Is there a reason to
> do it in the device?
> 
> Pushing it earlier/later in the stack obviously increases the benefit
> and it will also be more compatible with the forthcoming OVS tunneling
> hooks, which will be flow based and therefore won't have a device.
> 
> Also, the next generation of NICs will support this type of thing in
> hardware so putting the software versions very close to the NIC will
> give us a more similar abstraction.

This does not sound feasible with all kinds of tunnels, for example IPIP
tunnels or UDP encapsulation, at least with the current stack (not OVS).

Also note that pushing earlier means forcing the checksumming earlier,
which consumes a lot of CPU cycles. Hopefully NICs will help us in the
future.

Using a napi_struct eventually allows separate CPUs to be used, with
things like RPS/RSS to split the load.

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-09-27 18:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, gpiez
In-Reply-To: <1348748042.5093.1168.camel@edumazet-glaptop>

On 09/27/12 13:14, Eric Dumazet wrote:
> On Thu, 2012-09-27 at 12:50 +0100, Chris Clayton wrote:
>> Just for information - I've pulled Linus' tree this morning and the
>> problem is still present. Also, Gunther Piaz has reported, via the
>> bugzilla entry, that he too has hit this regression.
>
> I tried to reproduce the bug, and my kvm guests have no problem.
>
> I guess you need to precisely describe how you setup your network, so
> that I can reproduce the problem and eventually fix it.
>

You've seen the bits from my firewall setup script that relate to this 
issue. I start the WinXP client with another script:

#!/bin/sh
if [ -e $HOME/kvm/var/run/kvm-winxp.pid ]; then
     echo "winxp is already running ..." > /dev/stderr
     exit 1
fi

# make sure the kvm modules are loaded
if test -z "$(grep '\<kvm\>' /proc/misc)"; then
     sudo modprobe kvm-intel
     while test -z "$(grep '\<kvm\>' /proc/misc)"; do
         true
     done
fi

# make sure tun module is loaded
if test ! -e /dev/net/tun; then
     sudo modprobe tun
fi

# figure out the cpu to use
QVER=$(qemu-kvm --version | cut -d' ' -f 4 | sed 's/,/./')
# assumes major version is 1
MINORVER=$(echo $QVER | cut -d'.' -f 2)
if [ $MINORVER -ge 1 ]; then
     CPU="host"
else
     CPU="qemu64"
fi

# set up the network interface
TAPDEV=$(sudo tunctl -b -u $(whoami))
sudo ifconfig $TAPDEV 192.168.200.254 netmask 255.255.255.0 broadcast 192.168.200.255

# start Windows XP
qemu-kvm -drive file=$HOME/kvm/winxp.qcow2,index=0,cache=none,if=virtio \
     -cpu $CPU -smp cores=1,threads=2 -soundhw es1370 \
     -m 768 -net nic,model=virtio,macaddr=$(getmacaddr) \
     -net tap,ifname=$TAPDEV -startdate $(date +%Y-%m-%dT%H:%M:%S) \
     -name kxplaptop -pidfile $HOME/kvm/var/run/kvm-winxp.pid $*

# stop the network interface
sudo ifconfig $TAPDEV down
sudo tunctl -d $TAPDEV >/dev/null 2>&1

# tidy up
rm -f $HOME/kvm/var/run/kvm-winxp.pid


The call to getmacaddr just returns the next in a sequence of mac 
addresses. qemu-kvm is a symlink to /usr/bin/qemu-system-i386. I first 
found the problem whilst running qemu-kvm version 1.1.1 although I've 
since updated to 1.2.0.

By the way, I doubt it will make a difference, but, although my laptop 
has a 64bit CPU, I am running a 32 bit kernel and, obviously, user space.

Let me know if you need anything else.

Thanks

> Thanks
>
>

^ permalink raw reply

* Re: [PATCH net-next 3/3] ipv4: gre: add GRO capability
From: Jesse Gross @ 2012-09-27 17:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1348750130.5093.1227.camel@edumazet-glaptop>

On Thu, Sep 27, 2012 at 5:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Add GRO capability to IPv4 GRE tunnels, using the gro_cells
> infrastructure.
>
> Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and
> checking GRO is building large packets.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

When I was thinking about doing this, my original plan was to handle
GRO/GSO by extending the current handlers to be able to look inside
GRE and then loop around to process the inner packet (similar to what
is done today with skb_flow_dissect() for RPS).  Is there a reason to
do it in the device?

Pushing it earlier/later in the stack obviously increases the benefit
and it will also be more compatible with the forthcoming OVS tunneling
hooks, which will be flow based and therefore won't have a device.

Also, the next generation of NICs will support this type of thing in
hardware so putting the software versions very close to the NIC will
give us a more similar abstraction.

^ permalink raw reply

* Re: [3.5 regression / mcs7830 / bisected] bridge constantly toggeling between disabled and forwarding
From: Greg KH @ 2012-09-27 17:39 UTC (permalink / raw)
  To: Michael Leun; +Cc: linux, davem, netdev, linux-kernel
In-Reply-To: <20120724013634.11bf1360@xenia.leun.net>

On Tue, Jul 24, 2012 at 01:36:34AM +0200, Michael Leun wrote:
> On Mon, 23 Jul 2012 09:15:04 +0200
> Michael Leun <lkml20120218@newton.leun.net> wrote:
> 
> [see issue description below]
> 
> Bisecting yielded
> 
> b1ff4f96fd1c63890d78d8939c6e0f2b44ce3113 is the first bad commit
> commit b1ff4f96fd1c63890d78d8939c6e0f2b44ce3113
> Author: Ondrej Zary <linux@rainbow-software.org>
> Date:   Fri Jun 1 10:29:08 2012 +0000
> 
>     mcs7830: Implement link state detection
> 
>     Add .status callback that detects link state changes.
>     Tested with MCS7832CV-AA chip (9710:7830, identified as rev.C by the driver).
>     Fixes https://bugzilla.kernel.org/show_bug.cgi?id=28532
> 
>     Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> :040000 040000 5480780cb5e75c57122a621fc3bab0108c16be27 d97efd9cc0a465dff76bcd3a3c547f718f2a5345 M    drivers
> 
> 
> Reverting that from 3.5 makes the issue go away.

Did this ever get resolved in 3.6-rc7 or any older kernel?  I can't
revert the patch from 3.5.y unless it's also fixed in Linus's tree.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] netdev: pasemi: fix return value check in pasemi_mac_phy_init()
From: David Miller @ 2012-09-27 17:21 UTC (permalink / raw)
  To: weiyj.lk
  Cc: olof, grant.likely, rob.herring, yongjun_wei, netdev,
	devicetree-discuss
In-Reply-To: <CAPgLHd8hJjj1HkV7gc8QH2p9rDJQedkc+vCUpFJMgyXQua4LGg@mail.gmail.com>

From: Wei Yongjun <weiyj.lk@gmail.com>
Date: Thu, 27 Sep 2012 13:51:58 +0800

> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> In case of error, the function of_phy_connect() returns NULL
> pointer not ERR_PTR(). The IS_ERR() test in the return value
> check should be replaced with NULL test.
> 
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Applied, thanks.

^ permalink raw reply

* Re: [PATCHv4 net-next] vxlan: virtual extensible lan
From: Jesse Gross @ 2012-09-27 17:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Chris Wright, David Miller, netdev
In-Reply-To: <20120925213623.39ee67d1@nehalam.linuxnetplumber.net>

On Tue, Sep 25, 2012 at 9:36 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Tue, 25 Sep 2012 14:55:13 -0700
> Jesse Gross <jesse@nicira.com> wrote:
>
>> On Mon, Sep 24, 2012 at 2:50 PM, Stephen Hemminger
>> <shemminger@vyatta.com> wrote:
>> > +static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>> [...]
>> > +       /* Do PMTU */
>> > +       if (skb->protocol == htons(ETH_P_IP)) {
>> > +               df |= old_iph->frag_off & htons(IP_DF);
>> > +               if (df && mtu < pkt_len) {
>> > +                       icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
>> > +                                 htonl(mtu));
>> > +                       ip_rt_put(rt);
>> > +                       goto tx_error;
>> > +               }
>> > +       }
>> > +#if IS_ENABLED(CONFIG_IPV6)
>> > +       else if (skb->protocol == htons(ETH_P_IPV6)) {
>> > +               if (mtu >= IPV6_MIN_MTU && mtu < pkt_len) {
>> > +                       icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
>> > +                       ip_rt_put(rt);
>> > +                       goto tx_error;
>> > +               }
>> > +       }
>> > +#endif
>>
>> Won't this black hole packets if we need to generate ICMP messages?
>> Since we're doing switching and not routing here icmp_send() doesn't
>> necessarily have a route to the relevant endpoint.  It looks like
>> Ethernet over GRE has this issue as well.
>
> It is an interesting question about what is the correct way to handle packets
> where the inner header is IPv6 or IPv4 with Don't Fragment set. As you mention
> sending an ICMP response won't work because the tunnel endpoint is not part
> of that IP network.
>
> The simple option is to fragment it in the tunnel and since the fragmentation
> is not visible to the overlay network, that is okay. But for PMTU discovery
> it might be better to just drop the packet and not send a fragmented payload.
>
> Some backbone networks don't allow fragmentation at all (in a futile attempt
> to block DoS attacks and protect fragile Windows hosts). Fragmentation
> brings all sorts of evil problems like the potential of corrupted assembly
> because of sequence wrap; the checksum in the inner packet will defend against
> that but tunnels are not supposed to rely on inner protocol data protection.
>
> Or you can just do what Cisco and Microsoft do and just tell everyone
> to set larger MTU on the backbone.

What I think people usually do in these situations is:
 1. Insist people set the MTU to take into account the tunnel.
 2. Use MSS clamping for TCP traffic.
 3. Either drop or fragment the tunnel packet.  In theory some IP
stacks will probe for a lower MTU if packets are dropping, in practice
things seem to just break.  If the backbone is going to drop
fragmented packets then I guess it doesn't make a difference, modulo
the potential for corruption that you mentioned.  Always dropping
seems worse (although it is the behavior of many hardware devices that
can't do fragmentation at all).

So I think what you have currently is correct.

A couple of other options:
 * In many cases it might be desirable to do fragmentation on the
inner rather than outer packet, especially if there are middleboxes
looking inside the tunnel.  This assumes that the inner packet is IP
and doesn't have the DF bit set.  In theory, you could do it even if
the DF bit is set since we can't do path MTU discovery anyways.
 * A few years ago I wrote an implementation of path MTU discovery in
OVS to handle this situation.  It's pretty effective but it relies on
guessing/faking some addresses.  I think we're going to pull it out
soon in favor of MSS clamping though.

I wouldn't implement either of these here, at least at this time though.

^ permalink raw reply

* Re: [PATCH] team: fix return value check
From: David Miller @ 2012-09-27 17:18 UTC (permalink / raw)
  To: weiyj.lk; +Cc: jpirko, yongjun_wei, netdev
In-Reply-To: <CAPgLHd9GYYu21FhiKAr2MvmYtWbL=85-E0EzJmiJm6cDDp=WcA@mail.gmail.com>

From: Wei Yongjun <weiyj.lk@gmail.com>
Date: Tue, 25 Sep 2012 12:29:35 +0800

> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> In case of error, the function genlmsg_put() returns NULL pointer
> not ERR_PTR(). The IS_ERR() test in the return value check should
> be replaced with NULL test.
> 
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Applied.

^ permalink raw reply
