Netdev List
* Re: [RFC] ipv6: don't flush routes when setting loopback down
From: David Miller @ 2011-01-23 19:47 UTC (permalink / raw)
  To: shemminger
  Cc: stephen.hemminger, ebiederm, jbohac, brian.haley, netdev,
	maheshkelkar, lorenzo, yoshfuji, stable
In-Reply-To: <20110123192416.73cd7521@s6510>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Sun, 23 Jan 2011 19:24:16 +1100

> You are probably so upset because I stepped on code you worked hard
> on.

Frankly, I honestly don't care much at all about ipv6, it's still only
a tiny minuscule fraction of actual usage in this world, even with
ipv4 addresses basically completely depleted right now, and it's also
an over-engineered monster.

> But the IPv6 semantics should not have been different from IPv4
> and the disable_ipv6 flag was a poor API choice as well.

There are no equivalent semantics, because nobody wants to disable
all IPV4 activity and socket protocol operations.  People want it
only for ipv6.

This interface was chosen because this is what users asked for.

They wanted global and per-interface ways to disable IPV6 protocol
activity, so that's what we gave them.

Stephen you don't get it.  The ball is completely in your court
about this, you broke stuff to fix stuff and that's not allowed.

You have also been given months of time to undo the breakage.
Normally I would just immediately revert, so you were given
special allowances because I respect and trust you.

So stop this BS where you say that I'm treating you in an
unfair way.


* Re: [RFC] ipv6: don't flush routes when setting loopback down
From: Eric W. Biederman @ 2011-01-23 19:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: jbohac, David Miller, yoshfuji, netdev, stable, stephen.hemminger,
	maheshkelkar, brian.haley, Willy Tarreau, lorenzo
In-Reply-To: <20110123213444.70d0b0f1@s6510>

Stephen Hemminger <shemminger@vyatta.com> writes:

> I think this fixes the issue with disable_ipv6

Somehow I doubt deleting the ipv6 state, and removing the per device
disable_ipv6 flag is going to be backwards compatible.

echo 0 > /proc/sys/net/ipv6/conf/disable_ipv6

That won't work, whatever other good properties calling NETDEV_UNREGISTER
for ipv6 devices may have.

Eric

> --- a/net/ipv6/addrconf.c	2011-01-23 20:30:25.897243002 +1100
> +++ b/net/ipv6/addrconf.c	2011-01-23 20:30:41.161243002 +1100
> @@ -4197,7 +4197,7 @@ static void dev_disable_change(struct in
>  		return;
>  
>  	if (idev->cnf.disable_ipv6)
> -		addrconf_notify(NULL, NETDEV_DOWN, idev->dev);
> +		addrconf_notify(NULL, NETDEV_UNREGISTER, idev->dev);
>  	else
>  		addrconf_notify(NULL, NETDEV_UP, idev->dev);
>  }


* Re: 2.6.37 regression: adding main interface to a bridge breaks vlan interface RX
From: Maciej Rutecki @ 2011-01-23 17:45 UTC (permalink / raw)
  To: Simon Arlott; +Cc: netdev, Linux Kernel Mailing List, jesse
In-Reply-To: <4D32FC1C.3010905@simon.arlott.org.uk>

I created a Bugzilla entry at 
https://bugzilla.kernel.org/show_bug.cgi?id=27432
for your bug report, please add your address to the CC list in there, thanks!

On Sunday, 16 January 2011 at 15:09:32 Simon Arlott wrote:
> [    1.666706] forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 16, addr 00:e0:81:4d:2b:ec
> [    1.666767] forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3
> 
> I have eth0 and eth0.3840 which works until I add eth0 to a bridge.
> While eth0 is in a bridge (the bridge device is up), eth0.3840 is unable
> to receive packets. Using tcpdump on eth0 shows the packets being
> received with a VLAN tag but they don't appear on eth0.3840. They appear
> with the VLAN tag on the bridge interface.
> 
> If I remove eth0 from the bridge, eth0.3840 starts working again. It
> still works if eth0.3840 is part of a bridge but eth0 isn't (the device
> is in promiscuous mode). I've only tested with broadcast traffic.
> 
> This works with 2.6.36.
> 
> git bisect produces 3701e51382a026cba10c60b03efabe534fba4ca4 as the
> first bad commit.
> 
> The behaviour of drivers/net/forcedeth.c nv_rx_process_optimized looks
> ok - vlan_gro_receive and napi_gro_receive are called correctly. (The
> likely(!np->vlangrp) looks odd as it'll always be false if vlans are in
> use).

-- 
Maciej Rutecki
http://www.maciek.unixy.pl


* Re: [PATCH v4] net-next-2.6: Allow ethtool to set interface in loopback mode.
From: Michał Mirosław @ 2011-01-23 17:32 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: David Miller, Ben Hutchings, Tom Herbert, Laurent Chavey, netdev
In-Reply-To: <1295655836-23421-1-git-send-email-maheshb@google.com>

2011/1/22 Mahesh Bandewar <maheshb@google.com>:
> This patch enables ethtool to set the loopback mode on a given interface.
> By configuring the interface in loopback mode in conjunction with a policy
> route / rule, a userland application can stress the egress / ingress path
> exposing the flows of the change in progress and potentially help developer(s)
> understand the impact of those changes without even sending a packet out
> on the network.
[...]

If this is going to be a flag, then maybe you could look at my ethtool
unification series
http://marc.info/?l=linux-netdev&m=129573447816532&w=3
and integrate it there?

On the other hand, if this is going to be a driver-specific value, then
you should use ethtool_get_value() and ethtool_set_value() instead of
making yet another copy (like e.g. ETHTOOL_SMSGLVL).

Best Regards,
Michał Mirosław


* Re: [PATCH v4] net-next-2.6: Allow ethtool to set interface in loopback mode.
From: Mahesh Bandewar @ 2011-01-23 17:12 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, Tom Herbert, Laurent Chavey, netdev
In-Reply-To: <1295750153.4117.17.camel@localhost>

On Sat, Jan 22, 2011 at 6:35 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Fri, 2011-01-21 at 16:23 -0800, Mahesh Bandewar wrote:
>> This patch enables ethtool to set the loopback mode on a given interface.
>> By configuring the interface in loopback mode in conjunction with a policy
>> route / rule, a userland application can stress the egress / ingress path
>> exposing the flows of the change in progress and potentially help developer(s)
>> understand the impact of those changes without even sending a packet out
>> on the network.
>>
>> Following set of commands illustrates one such example -
>>       a) ip -4 addr add 192.168.1.1/24 dev eth1
>>       b) ip -4 rule add from all iif eth1 lookup 250
>>       c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
>>       d) arp -Ds 192.168.1.100 eth1
>>       e) arp -Ds 192.168.1.200 eth1
>>       f) sysctl -w net.ipv4.ip_nonlocal_bind=1
>>       g) sysctl -w net.ipv4.conf.all.accept_local=1
>>       # Assuming that the machine has 8 cores
>>       h) taskset 000f netserver -L 192.168.1.200
>>       i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
> [...]
>
> If this version has been revised, you can't claim I reviewed it!
> If it's a repost of an earlier version, you should say so.
No code changes have been made since your last review. I made only
formatting changes, which is why I thought it would be appropriate to
keep your name there.
>
> I thought we agreed that loopback could be treated as a flag, anyway.
>
> Ben.
>
> --
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
>


* Re: [net-2.6 PATCH 1/2] net: dcbnl: remove redundant DCB_CAP_DCBX_STATIC bit
From: Shmulik Ravid @ 2011-01-23 16:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <4D3A465F.8020809@intel.com>


On Fri, 2011-01-21 at 18:52 -0800, John Fastabend wrote:
> On 1/21/2011 6:35 PM, John Fastabend wrote:
> > Remove redundant DCB_CAP_DCBX_STATIC bit in DCB capabilities
> > 
> > Setting this bit indicates that no embedded DCBx engine is
> > present and the hardware can not be configured. This is the
> > same as having none of the DCB capability flags set or simply
> > not implementing the dcbnl ops at all.
> > 
> > This patch removes this bit. The bit has not made a stable
> > release yet so removing it should not be an issue with
> > existing apps.
> > 
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > CC: Shmulik Ravid <shmulikr@broadcom.com>
> > ---
> > 
> 
> Shmulik, could you ACK this because you added these bits? But
> I was adding support for this in lldpad and I see no reason that
> we need these?
> 
DCB_CAP_DCBX_STATIC means that the embedded engine will turn the user
configuration into the operational configuration without performing the
actual negotiation, so it is not equivalent to not having an embedded
DCBx engine. This is mostly a debug and integration option as it allows
you to do DCB related or dependent testing and development without
having a proper DCBx peer.

On second thought, I'm not sure this option is justified although we
found it useful during our development. If you think it's not useful
enough (or not at all) then by all means remove it.

Thanks,
Shmulik





* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-23 13:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rick Jones, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110123103902.GA28585@redhat.com>

On Sun, Jan 23, 2011 at 12:39:02PM +0200, Michael S. Tsirkin wrote:
> On Sun, Jan 23, 2011 at 05:38:49PM +1100, Simon Horman wrote:
> > On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
> > > On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
> > > > On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:

[snip]

> > > > > Hmm, what is this supposed to measure?  Basically each time you run an
> > > > > un-paced UDP_STREAM you get some random load on the network.
> > > > > You can't tell what it was exactly, only that it was between
> > > > > the send and receive throughput.
> > > > 
> > > > Rick mentioned in another email that I messed up my test parameters a bit,
> > > > so I will re-run the tests, incorporating his suggestions.
> > > > 
> > > > What I was attempting to measure was the effect of an unpaced UDP_STREAM
> > > > on the latency of more moderated traffic. Because I am interested in
> > > > what effect an abusive guest has on other guests and how that may be
> > > > mitigated.
> > > > 
> > > > Could you suggest some tests that you feel are more appropriate?
> > > 
> > > Yes. To rephrase my concern in these terms, besides the malicious guest
> > > you have another software in host (netperf) that interferes with
> > > the traffic, and it cooperates with the malicious guest.
> > > Right?
> > 
> > Yes, that is the scenario in this test.
> 
> Yes but I think that you want to put some controlled load on host.
> Let's assume that we improve the speed somehow and now you can push more
> bytes per second without loss.  Result might be a regression in your
> test because you let the guest push "as much as it can" and suddenly it
> can push more data through.  OTOH with packet loss the load on host is
> anywhere in between send and receive throughput: there's no easy way to
> measure it from netperf: the earlier some buffers overrun, the earlier
> the packets get dropped and the less the load on host.
> 
> This is why I say that to get a specific
> load on host you want to limit the sender
> to a specific BW and then either
> - make sure packet loss % is close to 0.
> - make sure packet loss % is close to 100%.

Thanks, and sorry for being a bit slow.  I now see what you have
been getting at with regards to limiting the tests.
I will see about getting some numbers based on your suggestions.



* [PATCH] net: change netdev->features to u32
From: Michał Mirosław @ 2011-01-23 12:44 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings, Eric Dumazet
In-Reply-To: <1295778913.17333.28.camel@edumazet-laptop>

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurrences found by `grep -r` on net/, drivers/net, include/

netdev->vlan_features field is moved just after ->features to avoid holes
on 64-bit arches (noticed by Eric Dumazet).
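
A rough userspace illustration of the kind of hole this avoids. This is
a sketch only and not part of the patch: the two structs below, and every
member other than features/vlan_features, are hypothetical stand-ins and
do not reflect the real struct net_device layout. On a typical 64-bit ABI
two adjacent u32 fields share one 8-byte slot, while a lone u32 placed
between pointer-sized members is followed by 4 bytes of padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical layout: each u32 sits alone between pointer-sized members. */
struct features_apart {
	void *list;
	u32   features;
	void *ops;
	u32   vlan_features;
	void *priv;
};

/* Hypothetical layout: the two u32 fields are placed next to each other. */
struct features_adjacent {
	void *list;
	u32   features;
	u32   vlan_features;
	void *ops;
	void *priv;
};

/* Bytes saved by keeping the two u32 fields adjacent (0 if no padding). */
static size_t layout_savings(void)
{
	return sizeof(struct features_apart) - sizeof(struct features_adjacent);
}
```

On common LP64 targets layout_savings() should come out as 8 bytes; on
32-bit targets, where pointers are as wide as u32, it is normally 0.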

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/bnx2.c              |    2 +-
 drivers/net/bonding/bond_main.c |    4 ++--
 drivers/net/myri10ge/myri10ge.c |    4 ++--
 drivers/net/sfc/ethtool.c       |    4 ++--
 drivers/net/sfc/net_driver.h    |    2 +-
 drivers/net/tun.c               |    2 +-
 include/linux/netdevice.h       |   23 +++++++++++------------
 include/linux/skbuff.h          |    2 +-
 include/net/protocol.h          |    4 ++--
 include/net/tcp.h               |    2 +-
 include/net/udp.h               |    2 +-
 net/8021q/vlan.c                |    2 +-
 net/bridge/br_if.c              |    2 +-
 net/bridge/br_private.h         |    2 +-
 net/core/dev.c                  |   15 +++++++--------
 net/core/ethtool.c              |    2 +-
 net/core/net-sysfs.c            |    2 +-
 net/core/skbuff.c               |    4 ++--
 net/ipv4/af_inet.c              |    2 +-
 net/ipv4/tcp.c                  |    2 +-
 net/ipv4/udp.c                  |    2 +-
 net/ipv6/af_inet6.c             |    2 +-
 net/ipv6/udp.c                  |    2 +-
 23 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index df99edf..cab96fa 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -8312,7 +8312,7 @@ static const struct net_device_ops bnx2_netdev_ops = {
 #endif
 };
 
-static void inline vlan_features_add(struct net_device *dev, unsigned long flags)
+static void inline vlan_features_add(struct net_device *dev, u32 flags)
 {
 	dev->vlan_features |= flags;
 }
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b1025b8..8d10aff 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1372,8 +1372,8 @@ static int bond_compute_features(struct bonding *bond)
 {
 	struct slave *slave;
 	struct net_device *bond_dev = bond->dev;
-	unsigned long features = bond_dev->features;
-	unsigned long vlan_features = 0;
+	u32 features = bond_dev->features;
+	u32 vlan_features = 0;
 	unsigned short max_hard_header_len = max((u16)ETH_HLEN,
 						bond_dev->hard_header_len);
 	int i;
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index ea5cfe2..a7f2eed 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -253,7 +253,7 @@ struct myri10ge_priv {
 	unsigned long serial_number;
 	int vendor_specific_offset;
 	int fw_multicast_support;
-	unsigned long features;
+	u32 features;
 	u32 max_tso6;
 	u32 read_dma;
 	u32 write_dma;
@@ -1776,7 +1776,7 @@ static int myri10ge_set_rx_csum(struct net_device *netdev, u32 csum_enabled)
 static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
 {
 	struct myri10ge_priv *mgp = netdev_priv(netdev);
-	unsigned long flags = mgp->features & (NETIF_F_TSO6 | NETIF_F_TSO);
+	u32 flags = mgp->features & (NETIF_F_TSO6 | NETIF_F_TSO);
 
 	if (tso_enabled)
 		netdev->features |= flags;
diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index 0e8bb19..713969a 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -502,7 +502,7 @@ static void efx_ethtool_get_stats(struct net_device *net_dev,
 static int efx_ethtool_set_tso(struct net_device *net_dev, u32 enable)
 {
 	struct efx_nic *efx __attribute__ ((unused)) = netdev_priv(net_dev);
-	unsigned long features;
+	u32 features;
 
 	features = NETIF_F_TSO;
 	if (efx->type->offload_features & NETIF_F_V6_CSUM)
@@ -519,7 +519,7 @@ static int efx_ethtool_set_tso(struct net_device *net_dev, u32 enable)
 static int efx_ethtool_set_tx_csum(struct net_device *net_dev, u32 enable)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
-	unsigned long features = efx->type->offload_features & NETIF_F_ALL_CSUM;
+	u32 features = efx->type->offload_features & NETIF_F_ALL_CSUM;
 
 	if (enable)
 		net_dev->features |= features;
diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 28df866..c652702 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -906,7 +906,7 @@ struct efx_nic_type {
 	unsigned int phys_addr_channels;
 	unsigned int tx_dc_base;
 	unsigned int rx_dc_base;
-	unsigned long offload_features;
+	u32 offload_features;
 	u32 reset_world_flags;
 };
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b100bd5..55786a0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1142,7 +1142,7 @@ static int tun_get_iff(struct net *net, struct tun_struct *tun,
  * privs required. */
 static int set_offload(struct net_device *dev, unsigned long arg)
 {
-	unsigned int old_features, features;
+	u32 old_features, features;
 
 	old_features = dev->features;
 	/* Unset features, set them as we chew on the arg. */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 371fa88..176062f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -894,7 +894,7 @@ struct net_device {
 	struct list_head	unreg_list;
 
 	/* Net device features */
-	unsigned long		features;
+	u32			features;
 #define NETIF_F_SG		1	/* Scatter/gather IO. */
 #define NETIF_F_IP_CSUM		2	/* Can checksum TCP/UDP over IPv4. */
 #define NETIF_F_NO_CSUM		4	/* Does not require checksum. F.e. loopack. */
@@ -948,6 +948,9 @@ struct net_device {
 				 NETIF_F_SG | NETIF_F_HIGHDMA |		\
 				 NETIF_F_FRAGLIST)
 
+	/* VLAN feature mask */
+	u32			vlan_features;
+
 	/* Interface index. Unique device identifier	*/
 	int			ifindex;
 	int			iflink;
@@ -1149,9 +1152,6 @@ struct net_device {
 	/* rtnetlink link ops */
 	const struct rtnl_link_ops *rtnl_link_ops;
 
-	/* VLAN feature mask */
-	unsigned long vlan_features;
-
 	/* for setting kernel sock attribute on TCP connection setup */
 #define GSO_MAX_SIZE		65536
 	unsigned int		gso_max_size;
@@ -1374,7 +1374,7 @@ struct packet_type {
 					 struct packet_type *,
 					 struct net_device *);
 	struct sk_buff		*(*gso_segment)(struct sk_buff *skb,
-						int features);
+						u32 features);
 	int			(*gso_send_check)(struct sk_buff *skb);
 	struct sk_buff		**(*gro_receive)(struct sk_buff **head,
 					       struct sk_buff *skb);
@@ -2343,7 +2343,7 @@ extern int		netdev_tstamp_prequeue;
 extern int		weight_p;
 extern int		netdev_set_master(struct net_device *dev, struct net_device *master);
 extern int skb_checksum_help(struct sk_buff *skb);
-extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features);
+extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, u32 features);
 #ifdef CONFIG_BUG
 extern void netdev_rx_csum_fault(struct net_device *dev);
 #else
@@ -2370,22 +2370,21 @@ extern char *netdev_drivername(const struct net_device *dev, char *buffer, int l
 
 extern void linkwatch_run_queue(void);
 
-unsigned long netdev_increment_features(unsigned long all, unsigned long one,
-					unsigned long mask);
-unsigned long netdev_fix_features(unsigned long features, const char *name);
+u32 netdev_increment_features(u32 all, u32 one, u32 mask);
+u32 netdev_fix_features(u32 features, const char *name);
 
 void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 					struct net_device *dev);
 
-int netif_skb_features(struct sk_buff *skb);
+u32 netif_skb_features(struct sk_buff *skb);
 
-static inline int net_gso_ok(int features, int gso_type)
+static inline int net_gso_ok(u32 features, int gso_type)
 {
 	int feature = gso_type << NETIF_F_GSO_SHIFT;
 	return (features & feature) == feature;
 }
 
-static inline int skb_gso_ok(struct sk_buff *skb, int features)
+static inline int skb_gso_ok(struct sk_buff *skb, u32 features)
 {
 	return net_gso_ok(features, skb_shinfo(skb)->gso_type) &&
 	       (!skb_has_frag_list(skb) || (features & NETIF_F_FRAGLIST));
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6e946da..31f02d0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1877,7 +1877,7 @@ extern void	       skb_split(struct sk_buff *skb,
 extern int	       skb_shift(struct sk_buff *tgt, struct sk_buff *skb,
 				 int shiftlen);
 
-extern struct sk_buff *skb_segment(struct sk_buff *skb, int features);
+extern struct sk_buff *skb_segment(struct sk_buff *skb, u32 features);
 
 static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
 				       int len, void *buffer)
diff --git a/include/net/protocol.h b/include/net/protocol.h
index dc07495..6f7eb80 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -38,7 +38,7 @@ struct net_protocol {
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			(*gso_send_check)(struct sk_buff *skb);
 	struct sk_buff	       *(*gso_segment)(struct sk_buff *skb,
-					       int features);
+					       u32 features);
 	struct sk_buff	      **(*gro_receive)(struct sk_buff **head,
 					       struct sk_buff *skb);
 	int			(*gro_complete)(struct sk_buff *skb);
@@ -57,7 +57,7 @@ struct inet6_protocol {
 
 	int	(*gso_send_check)(struct sk_buff *skb);
 	struct sk_buff *(*gso_segment)(struct sk_buff *skb,
-				       int features);
+				       u32 features);
 	struct sk_buff **(*gro_receive)(struct sk_buff **head,
 					struct sk_buff *skb);
 	int	(*gro_complete)(struct sk_buff *skb);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 38509f0..9179111 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1404,7 +1404,7 @@ extern struct request_sock_ops tcp6_request_sock_ops;
 extern void tcp_v4_destroy_sock(struct sock *sk);
 
 extern int tcp_v4_gso_send_check(struct sk_buff *skb);
-extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features);
+extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, u32 features);
 extern struct sk_buff **tcp_gro_receive(struct sk_buff **head,
 					struct sk_buff *skb);
 extern struct sk_buff **tcp4_gro_receive(struct sk_buff **head,
diff --git a/include/net/udp.h b/include/net/udp.h
index bb967dd..e82f3a8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -245,5 +245,5 @@ extern void udp4_proc_exit(void);
 extern void udp_init(void);
 
 extern int udp4_ufo_send_check(struct sk_buff *skb);
-extern struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, int features);
+extern struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, u32 features);
 #endif	/* _UDP_H */
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6e64f7c..7850412 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -327,7 +327,7 @@ static void vlan_sync_address(struct net_device *dev,
 static void vlan_transfer_features(struct net_device *dev,
 				   struct net_device *vlandev)
 {
-	unsigned long old_features = vlandev->features;
+	u32 old_features = vlandev->features;
 
 	vlandev->features &= ~dev->vlan_features;
 	vlandev->features |= dev->features & dev->vlan_features;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index d9d1e2b..52ce4a3 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -365,7 +365,7 @@ int br_min_mtu(const struct net_bridge *br)
 void br_features_recompute(struct net_bridge *br)
 {
 	struct net_bridge_port *p;
-	unsigned long features, mask;
+	u32 features, mask;
 
 	features = mask = br->feature_mask;
 	if (list_empty(&br->port_list))
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 84aac77..9f22898 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -182,7 +182,7 @@ struct net_bridge
 	struct br_cpu_netstats __percpu *stats;
 	spinlock_t			hash_lock;
 	struct hlist_head		hash[BR_HASH_SIZE];
-	unsigned long			feature_mask;
+	u32				feature_mask;
 #ifdef CONFIG_BRIDGE_NETFILTER
 	struct rtable 			fake_rtable;
 	bool				nf_call_iptables;
diff --git a/net/core/dev.c b/net/core/dev.c
index 906b589..01d7ce2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1856,7 +1856,7 @@ EXPORT_SYMBOL(skb_checksum_help);
  *	It may return NULL if the skb requires no segmentation.  This is
  *	only possible when GSO is used for verifying header integrity.
  */
-struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features)
+struct sk_buff *skb_gso_segment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
 	struct packet_type *ptype;
@@ -2044,7 +2044,7 @@ static bool can_checksum_protocol(unsigned long features, __be16 protocol)
 		 protocol == htons(ETH_P_FCOE)));
 }
 
-static int harmonize_features(struct sk_buff *skb, __be16 protocol, int features)
+static u32 harmonize_features(struct sk_buff *skb, __be16 protocol, u32 features)
 {
 	if (!can_checksum_protocol(protocol, features)) {
 		features &= ~NETIF_F_ALL_CSUM;
@@ -2056,10 +2056,10 @@ static int harmonize_features(struct sk_buff *skb, __be16 protocol, int features
 	return features;
 }
 
-int netif_skb_features(struct sk_buff *skb)
+u32 netif_skb_features(struct sk_buff *skb)
 {
 	__be16 protocol = skb->protocol;
-	int features = skb->dev->features;
+	u32 features = skb->dev->features;
 
 	if (protocol == htons(ETH_P_8021Q)) {
 		struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
@@ -2104,7 +2104,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 	int rc = NETDEV_TX_OK;
 
 	if (likely(!skb->next)) {
-		int features;
+		u32 features;
 
 		/*
 		 * If device doesnt need skb->dst, release it right now while
@@ -5127,7 +5127,7 @@ static void rollback_registered(struct net_device *dev)
 	rollback_registered_many(&single);
 }
 
-unsigned long netdev_fix_features(unsigned long features, const char *name)
+u32 netdev_fix_features(u32 features, const char *name)
 {
 	/* Fix illegal checksum combinations */
 	if ((features & NETIF_F_HW_CSUM) &&
@@ -6057,8 +6057,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
  *	@one to the master device with current feature set @all.  Will not
  *	enable anything that is off in @mask. Returns the new feature set.
  */
-unsigned long netdev_increment_features(unsigned long all, unsigned long one,
-					unsigned long mask)
+u32 netdev_increment_features(u32 all, u32 one, u32 mask)
 {
 	/* If device needs checksumming, downgrade to it. */
 	if (all & NETIF_F_NO_CSUM && !(one & NETIF_F_NO_CSUM))
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 1774178..bd1af99 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1458,7 +1458,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	void __user *useraddr = ifr->ifr_data;
 	u32 ethcmd;
 	int rc;
-	unsigned long old_features;
+	u32 old_features;
 
 	if (!dev || !netif_device_present(dev))
 		return -ENODEV;
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index e23c01b..81367cc 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -99,7 +99,7 @@ NETDEVICE_SHOW(addr_assign_type, fmt_dec);
 NETDEVICE_SHOW(addr_len, fmt_dec);
 NETDEVICE_SHOW(iflink, fmt_dec);
 NETDEVICE_SHOW(ifindex, fmt_dec);
-NETDEVICE_SHOW(features, fmt_long_hex);
+NETDEVICE_SHOW(features, fmt_hex);
 NETDEVICE_SHOW(type, fmt_dec);
 NETDEVICE_SHOW(link_mode, fmt_dec);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d31bb36..436c4c4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2497,7 +2497,7 @@ EXPORT_SYMBOL_GPL(skb_pull_rcsum);
  *	a pointer to the first in a list of new skbs for the segments.
  *	In case of error it returns ERR_PTR(err).
  */
-struct sk_buff *skb_segment(struct sk_buff *skb, int features)
+struct sk_buff *skb_segment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = NULL;
 	struct sk_buff *tail = NULL;
@@ -2507,7 +2507,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features)
 	unsigned int offset = doffset;
 	unsigned int headroom;
 	unsigned int len;
-	int sg = features & NETIF_F_SG;
+	int sg = !!(features & NETIF_F_SG);
 	int nfrags = skb_shinfo(skb)->nr_frags;
 	int err = -ENOMEM;
 	int i = 0;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f2b6110..e5e2d9d 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1215,7 +1215,7 @@ out:
 	return err;
 }
 
-static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features)
+static struct sk_buff *inet_gso_segment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct iphdr *iph;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6c11eec..f9867d2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2653,7 +2653,7 @@ int compat_tcp_getsockopt(struct sock *sk, int level, int optname,
 EXPORT_SYMBOL(compat_tcp_getsockopt);
 #endif
 
-struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
+struct sk_buff *tcp_tso_segment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct tcphdr *th;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8157b17..d37baaa 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2199,7 +2199,7 @@ int udp4_ufo_send_check(struct sk_buff *skb)
 	return 0;
 }
 
-struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, int features)
+struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 978e80e..3194aa9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -772,7 +772,7 @@ out:
 	return err;
 }
 
-static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, int features)
+static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct ipv6hdr *ipv6h;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9a009c6..a419a78 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1299,7 +1299,7 @@ static int udp6_ufo_send_check(struct sk_buff *skb)
 	return 0;
 }
 
-static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, int features)
+static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, u32 features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
-- 
1.7.2.3



* [PATCH] net: reduce and unify printk level in netdev_fix_features()
From: Michał Mirosław @ 2011-01-23 12:44 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings, Joe Perches
In-Reply-To: <20110122234051.GA30734@rere.qmqm.pl>

Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.
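
For readers without a kernel tree at hand, here is a minimal userspace
sketch of the two checksum fix-ups that netdev_fix_features() applies in
the net/core/dev.c hunk of this patch. It is an illustration under stated
assumptions, not the kernel code: fix_checksum_features() is a
hypothetical name, fprintf() stands in for netdev_info(), and the F_*
values for the HW and IPv6 checksum bits are assumed for the example. It
also shows the side effect mentioned above: with no NULL name to
suppress the message, every caller now gets a log line when a feature is
dropped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t u32;

/* Stand-ins for the NETIF_F_* checksum bits. */
#define F_IP_CSUM   2u	/* matches NETIF_F_IP_CSUM in the quoted header */
#define F_NO_CSUM   4u	/* matches NETIF_F_NO_CSUM in the quoted header */
#define F_HW_CSUM   8u	/* assumed value, for illustration only */
#define F_IPV6_CSUM 16u	/* assumed value, for illustration only */

/* Hypothetical userspace analogue of the checksum part of the function. */
static u32 fix_checksum_features(const char *ifname, u32 features)
{
	/* Generic HW checksumming makes the per-protocol bits redundant. */
	if ((features & F_HW_CSUM) &&
	    (features & (F_IP_CSUM | F_IPV6_CSUM))) {
		fprintf(stderr, "%s: mixed HW and IP checksum settings.\n",
			ifname);
		features &= ~(F_IP_CSUM | F_IPV6_CSUM);
	}

	/* "No checksum needed" excludes every other checksum offload bit. */
	if ((features & F_NO_CSUM) &&
	    (features & (F_HW_CSUM | F_IP_CSUM | F_IPV6_CSUM))) {
		fprintf(stderr, "%s: mixed no checksumming and other settings.\n",
			ifname);
		features &= ~(F_IP_CSUM | F_IPV6_CSUM | F_HW_CSUM);
	}

	return features;
}
```

For example, passing F_HW_CSUM | F_IP_CSUM leaves only F_HW_CSUM set,
and a consistent mask such as F_IP_CSUM | F_IPV6_CSUM is returned
unchanged without any message.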

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/bonding/bond_main.c |    4 ++--
 include/linux/netdevice.h       |    2 +-
 net/bridge/br_if.c              |    2 +-
 net/core/dev.c                  |   33 ++++++++++++---------------------
 4 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8d10aff..f4373e8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1400,8 +1400,8 @@ static int bond_compute_features(struct bonding *bond)
 
 done:
 	features |= (bond_dev->features & BOND_VLAN_FEATURES);
-	bond_dev->features = netdev_fix_features(features, NULL);
-	bond_dev->vlan_features = netdev_fix_features(vlan_features, NULL);
+	bond_dev->features = netdev_fix_features(bond_dev, features);
+	bond_dev->vlan_features = netdev_fix_features(bond_dev, vlan_features);
 	bond_dev->hard_header_len = max_hard_header_len;
 
 	return 0;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 176062f..769249a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2371,7 +2371,7 @@ extern char *netdev_drivername(const struct net_device *dev, char *buffer, int l
 extern void linkwatch_run_queue(void);
 
 u32 netdev_increment_features(u32 all, u32 one, u32 mask);
-u32 netdev_fix_features(u32 features, const char *name);
+u32 netdev_fix_features(struct net_device *dev, u32 features);
 
 void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 					struct net_device *dev);
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 52ce4a3..2a6801d 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -379,7 +379,7 @@ void br_features_recompute(struct net_bridge *br)
 	}
 
 done:
-	br->dev->features = netdev_fix_features(features, NULL);
+	br->dev->features = netdev_fix_features(br->dev, features);
 }
 
 /* called with RTNL */
diff --git a/net/core/dev.c b/net/core/dev.c
index 01d7ce2..a1de71c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5127,58 +5127,49 @@ static void rollback_registered(struct net_device *dev)
 	rollback_registered_many(&single);
 }
 
-u32 netdev_fix_features(u32 features, const char *name)
+u32 netdev_fix_features(struct net_device *dev, u32 features)
 {
 	/* Fix illegal checksum combinations */
 	if ((features & NETIF_F_HW_CSUM) &&
 	    (features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
-		if (name)
-			printk(KERN_NOTICE "%s: mixed HW and IP checksum settings.\n",
-				name);
+		netdev_info(dev, "mixed HW and IP checksum settings.\n");
 		features &= ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM);
 	}
 
 	if ((features & NETIF_F_NO_CSUM) &&
 	    (features & (NETIF_F_HW_CSUM|NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
-		if (name)
-			printk(KERN_NOTICE "%s: mixed no checksumming and other settings.\n",
-				name);
+		netdev_info(dev, "mixed no checksumming and other settings.\n");
 		features &= ~(NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM|NETIF_F_HW_CSUM);
 	}
 
 	/* Fix illegal SG+CSUM combinations. */
 	if ((features & NETIF_F_SG) &&
 	    !(features & NETIF_F_ALL_CSUM)) {
-		if (name)
-			printk(KERN_NOTICE "%s: Dropping NETIF_F_SG since no "
-			       "checksum feature.\n", name);
+		netdev_info(dev,
+			"Dropping NETIF_F_SG since no checksum feature.\n");
 		features &= ~NETIF_F_SG;
 	}
 
 	/* TSO requires that SG is present as well. */
 	if ((features & NETIF_F_TSO) && !(features & NETIF_F_SG)) {
-		if (name)
-			printk(KERN_NOTICE "%s: Dropping NETIF_F_TSO since no "
-			       "SG feature.\n", name);
+		netdev_info(dev, "Dropping NETIF_F_TSO since no SG feature.\n");
 		features &= ~NETIF_F_TSO;
 	}
 
+	/* UFO needs SG and checksumming */
 	if (features & NETIF_F_UFO) {
 		/* maybe split UFO into V4 and V6? */
 		if (!((features & NETIF_F_GEN_CSUM) ||
 		    (features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
 			    == (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
-			if (name)
-				printk(KERN_ERR "%s: Dropping NETIF_F_UFO "
-				       "since no checksum offload features.\n",
-				       name);
+			netdev_info(dev,
+				"Dropping NETIF_F_UFO since no checksum offload features.\n");
 			features &= ~NETIF_F_UFO;
 		}
 
 		if (!(features & NETIF_F_SG)) {
-			if (name)
-				printk(KERN_ERR "%s: Dropping NETIF_F_UFO "
-				       "since no NETIF_F_SG feature.\n", name);
+			netdev_info(dev,
+				"Dropping NETIF_F_UFO since no NETIF_F_SG feature.\n");
 			features &= ~NETIF_F_UFO;
 		}
 	}
@@ -5321,7 +5312,7 @@ int register_netdevice(struct net_device *dev)
 	if (dev->iflink == -1)
 		dev->iflink = dev->ifindex;
 
-	dev->features = netdev_fix_features(dev->features, dev->name);
+	dev->features = netdev_fix_features(dev, dev->features);
 
 	/* Enable software GSO if SG is supported. */
 	if (dev->features & NETIF_F_SG)
-- 
1.7.2.3


^ permalink raw reply related

* Re: RFC: pid "ownership" of ip config information
From: Nicolas de Pesloüan @ 2011-01-23 12:32 UTC (permalink / raw)
  To: Patrick Schaaf; +Cc: netdev
In-Reply-To: <1295778271.5657.7.camel@lat1>

On 23/01/2011 11:24, Patrick Schaaf wrote:
> On Fri, 2011-01-21 at 11:17 +0100, Nicolas de Pesloüan wrote:
>> On 21/01/2011 10:28, Patrick Schaaf wrote:
>>> The alternative to such a feature, would be to have an additional
>>> monitoring process, which would watch the PID somehow, and need to
>>> be configured to know what to withdraw when it dies.
>
>> There are user-space clustering systems that should provide the same functionality. Did you
>> have a look at http://www.linux-ha.org/ ?
>
> Those would be the more complex instances of "an additional monitoring
> process", right?
>
> What happens when heartbeat is "kill -9"ed? Assume that I want to avoid
> STOMITH like approaches.
>
> My proposal could be _used_ by such complex clustering managers, too.
>
> Or did I overlook a kernel-based solution there for "withdraw IP config
> when processes die"?
>
> Can you provide a direct link on linux-ha?

Do you consider "withdraw IP config" the only feature that is needed when a process dies? Or should
we instead design a more generic framework to run a command or make a system call when a process
dies? /sbin/init probably already does something similar. Arguably, even init may hang...

If your point is to provide a safety net for a very sick but not really dead node, then no userland
system would help. As such, I agree with you that an automatic withdrawal of IP config might help.
However, how would you protect against a simple never-ending loop in the process, or against a very
slow process due to high load on the node? You probably also need to guard against a process no
longer reading the network receive queue.

This might end up as some sort of local heartbeat monitoring of userland processes in the
kernel, and I'm not sure anyone would support this.

And whatever you do locally on a node to ensure proper operation, you also need a way to check for
proper operation from outside the node. A STOMITH system is always required in order to kill a
totally mad node. Even the kernel may go mad.

	Nicolas.

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-23 10:39 UTC (permalink / raw)
  To: Simon Horman
  Cc: Rick Jones, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110123063849.GB2673@verge.net.au>

On Sun, Jan 23, 2011 at 05:38:49PM +1100, Simon Horman wrote:
> On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
> > On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
> > > On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
> > > > > [ Trimmed Eric from CC list as vger was complaining that it is too long ]
> > > > > 
> > > > > On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > > > > > >So it won't be all that simple to implement well, and before we try,
> > > > > > >I'd like to know whether there are applications that are helped
> > > > > > >by it. For example, we could try to measure latency at various
> > > > > > >pps and see whether the backpressure helps. netperf has -b, -w
> > > > > > >flags which might help these measurements.
> > > > > > 
> > > > > > Those options are enabled when one adds --enable-burst to the
> > > > > > pre-compilation ./configure  of netperf (one doesn't have to
> > > > > > recompile netserver).  However, if one is also looking at latency
> > > > > > statistics via the -j option in the top-of-trunk, or simply at the
> > > > > > histogram with --enable-histogram on the ./configure and a verbosity
> > > > > > level of 2 (global -v 2) then one wants the very top of trunk
> > > > > > netperf from:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > I have constructed a test where I run an un-paced  UDP_STREAM test in
> > > > > one guest and a paced omni rr test in another guest at the same time.
> > > > 
> > > > Hmm, what is this supposed to measure?  Basically each time you run an
> > > > un-paced UDP_STREAM you get some random load on the network.
> > > > You can't tell what it was exactly, only that it was between
> > > > the send and receive throughput.
> > > 
> > > Rick mentioned in another email that I messed up my test parameters a bit,
> > > so I will re-run the tests, incorporating his suggestions.
> > > 
> > > What I was attempting to measure was the effect of an unpaced UDP_STREAM
> > > on the latency of more moderated traffic. Because I am interested in
> > > what effect an abusive guest has on other guests and how that may be
> > > mitigated.
> > > 
> > > Could you suggest some tests that you feel are more appropriate?
> > 
> > Yes. To rephrase my concern in these terms, besides the malicious guest
> > you have other software in the host (netperf) that interferes with
> > the traffic, and it cooperates with the malicious guest.
> > Right?
> 
> Yes, that is the scenario in this test.

Yes, but I think that you want to put some controlled load on the host.
Let's assume that we improve the speed somehow and now you can push more
bytes per second without loss.  The result might be a regression in your
test, because you let the guest push "as much as it can" and suddenly it
can push more data through.  OTOH, with packet loss the load on the host is
anywhere between the send and receive throughput, and there's no easy way to
measure it from netperf: the earlier some buffers overrun, the earlier
the packets get dropped and the lower the load on the host.

This is why I say that to get a specific load on the host you want to limit
the sender to a specific BW and then either
- make sure packet loss % is close to 0, or
- make sure packet loss % is close to 100%.
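
The two cases above can be checked mechanically from the send and receive throughput of a paced run. What follows is only a rough sketch under stated assumptions: the function names and the tolerance parameter are hypothetical, and the two inputs are assumed to be the sender-side and receiver-side throughput figures of one run, in the same unit.

```c
#include <assert.h>

/*
 * Percentage of offered traffic that never reached the receiver.
 * Both arguments are throughputs in the same unit (assumption: the
 * send and receive figures of a single paced UDP_STREAM run).
 */
static double loss_percent(double sent, double received)
{
	assert(sent > 0.0 && received >= 0.0 && received <= sent);
	return (sent - received) / sent * 100.0;
}

/*
 * Hypothetical check: returns 1 when loss is within `tolerance`
 * percentage points of 0% or of 100%, i.e. one of the two cases in
 * which the load on the host is well defined; returns 0 otherwise.
 */
static int load_is_well_defined(double sent, double received,
				double tolerance)
{
	double loss = loss_percent(sent, received);

	return loss <= tolerance || loss >= 100.0 - tolerance;
}
```

A run that sends 200 units and receives 150 has 25% loss, which falls in the ambiguous middle range where the host load cannot be read off from either figure.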

> > IMO for a malicious guest you would send
> > UDP packets that then get dropped by the host.
> > 
> > For example block netperf in host so that
> > it does not consume packets from the socket.
> 
> I'm more interested in rate-limiting netperf than blocking it.

Well, I mean netperf on the host.

> But in any case, do you mean use iptables or tc based on
> classification made by net_cls?

Just to block netperf you can send it SIGSTOP :)

-- 
MST

^ permalink raw reply

* Re: [PATCH v2 02/16] net: change netdev->features to u32
From: Eric Dumazet @ 2011-01-23 10:35 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, Ben Hutchings
In-Reply-To: <d9fe088556152bd3b43390e0fb5ea20d83aef233.1295734271.git.mirq-linux@rere.qmqm.pl>

On Saturday, 22 January 2011 at 23:14 +0100, Michał Mirosław wrote:
> Quoting Ben Hutchings: we presumably won't be defining features that
> can only be enabled on 64-bit architectures.
> 
> Occurrences found by `grep -r` on net/, drivers/net, include/
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---

...

>  	/* Unset features, set them as we chew on the arg. */
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 371fa88..c73d63d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -894,7 +894,7 @@ struct net_device {
>  	struct list_head	unreg_list;
>  
>  	/* Net device features */
> -	unsigned long		features;
> +	u32			features;
>  #define NETIF_F_SG		1	/* Scatter/gather IO. */
>  #define NETIF_F_IP_CSUM		2	/* Can checksum TCP/UDP over IPv4. */
>  #define NETIF_F_NO_CSUM		4	/* Does not require checksum. F.e. loopack. */
> @@ -1150,7 +1150,7 @@ struct net_device {
>  	const struct rtnl_link_ops *rtnl_link_ops;
>  
>  	/* VLAN feature mask */
> -	unsigned long vlan_features;
> +	u32 vlan_features;
>  
>  	/* for setting kernel sock attribute on TCP connection setup */


Could you move "vlan_features" right after "features", so that there are
no holes on 64-bit arches?
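
A minimal userspace sketch of the kind of hole in question, assuming an LP64 ABI (4-byte u32, 8-byte naturally aligned pointers). The struct names and the choice of neighbouring member are hypothetical stand-ins, not the real struct net_device layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few struct net_device members. */
struct split_layout {
	uint32_t features;	/* 4 bytes, then a 4-byte hole on LP64 */
	void *rtnl_link_ops;	/* pointer, 8-byte aligned on LP64 */
	uint32_t vlan_features;	/* 4 bytes, then tail padding on LP64 */
};

struct adjacent_layout {
	uint32_t features;
	uint32_t vlan_features;	/* sits where the hole would have been */
	void *rtnl_link_ops;
};

/* Padding bytes = struct size minus the sum of the member sizes. */
static size_t padding_split(void)
{
	return sizeof(struct split_layout)
		- 2 * sizeof(uint32_t) - sizeof(void *);
}

static size_t padding_adjacent(void)
{
	return sizeof(struct adjacent_layout)
		- 2 * sizeof(uint32_t) - sizeof(void *);
}

/* Bytes saved per structure by keeping the two u32 fields together. */
static size_t bytes_saved(void)
{
	assert(padding_adjacent() <= padding_split());
	return padding_split() - padding_adjacent();
}
```

With 8-byte pointers the split layout would be expected to carry 8 bytes of padding and the adjacent layout none; on 32-bit ABIs both layouts are normally the same size.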



^ permalink raw reply

* Re: [stable] [RFC] ipv6: don't flush routes when setting loopback down
From: Stephen Hemminger @ 2011-01-23 10:34 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: jbohac, yoshfuji, netdev, stable, stephen.hemminger, ebiederm,
	brian.haley, lorenzo, David Miller, maheshkelkar
In-Reply-To: <20110123091531.GQ12837@1wt.eu>

I think this fixes the issue with disable_ipv6

--- a/net/ipv6/addrconf.c	2011-01-23 20:30:25.897243002 +1100
+++ b/net/ipv6/addrconf.c	2011-01-23 20:30:41.161243002 +1100
@@ -4197,7 +4197,7 @@ static void dev_disable_change(struct in
 		return;
 
 	if (idev->cnf.disable_ipv6)
-		addrconf_notify(NULL, NETDEV_DOWN, idev->dev);
+		addrconf_notify(NULL, NETDEV_UNREGISTER, idev->dev);
 	else
 		addrconf_notify(NULL, NETDEV_UP, idev->dev);
 }


^ permalink raw reply

* Re: RFC: pid "ownership" of ip config information
From: Patrick Schaaf @ 2011-01-23 10:24 UTC (permalink / raw)
  To: Nicolas de Pesloüan; +Cc: netdev
In-Reply-To: <4D395D3C.9010308@gmail.com>

On Fri, 2011-01-21 at 11:17 +0100, Nicolas de Pesloüan wrote:
> On 21/01/2011 10:28, Patrick Schaaf wrote:
> > The alternative to such a feature, would be to have an additional
> > monitoring process, which would watch the PID somehow, and need to
> > be configured to know what to withdraw when it dies.

> There are user-space clustering systems that should provide the same functionality. Did you
> have a look at http://www.linux-ha.org/ ?

Those would be the more complex instances of "an additional monitoring
process", right?

What happens when heartbeat is "kill -9"ed? Assume that I want to avoid
STOMITH like approaches.

My proposal could be _used_ by such complex clustering managers, too.

Or did I overlook a kernel-based solution there for "withdraw IP config
when processes die"? Can you provide a direct link on linux-ha?

best regards
  Patrick


^ permalink raw reply

* [PATCH iproute2] sfq: add divisor support
From: Eric Dumazet @ 2011-01-23 10:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110120.165618.180427971.davem@davemloft.net>

In 2.6.39, we can build SFQ queues with a given hash table size,
different from the default one (1024 slots).

# tc qdisc add dev eth0 sfq help
Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]
               [ divisor NUMBER ]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 tc/q_sfq.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tc/q_sfq.c b/tc/q_sfq.c
index 71a3c9a..98ec530 100644
--- a/tc/q_sfq.c
+++ b/tc/q_sfq.c
@@ -26,6 +26,7 @@
 static void explain(void)
 {
 	fprintf(stderr, "Usage: ... sfq [ limit NUMBER ] [ perturb SECS ] [ quantum BYTES ]\n");
+	fprintf(stderr, "               [ divisor NUMBER ]\n");
 }
 
 static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n)
@@ -61,6 +62,13 @@ static int sfq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
 				return -1;
 			}
 			ok++;
+		} else if (strcmp(*argv, "divisor") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.divisor, *argv, 0)) {
+				fprintf(stderr, "Illegal \"divisor\"\n");
+				return -1;
+			}
+			ok++;
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -93,6 +101,7 @@ static int sfq_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	if (show_details) {
 		fprintf(f, "flows %u/%u ", qopt->flows, qopt->divisor);
 	}
+	fprintf(f, "divisor %u ", qopt->divisor);
 	if (qopt->perturb_period)
 		fprintf(f, "perturb %dsec ", qopt->perturb_period);
 	return 0;
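
For readers unfamiliar with the option parsers in tc, the new "divisor" branch boils down to a strict string-to-u32 conversion that fails on anything that is not a clean number. The helper below is a hypothetical standalone sketch of that idea using strtoul(); it is not iproute2's get_u32() and only mirrors the "0 on success, non-zero on error" convention visible in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of iproute2): parse `arg` as an
 * unsigned 32-bit value in the given base (0 = auto-detect).
 * Returns 0 and stores the result on success, -1 on any error.
 */
static int parse_u32_sketch(uint32_t *val, const char *arg, int base)
{
	unsigned long res;
	char *end;

	assert(val != NULL);

	/* Reject missing input and anything containing a minus sign,
	 * since strtoul() would otherwise silently negate the value. */
	if (arg == NULL || *arg == '\0' || strchr(arg, '-') != NULL)
		return -1;

	errno = 0;
	res = strtoul(arg, &end, base);

	/* Reject overflow, no digits at all, and trailing garbage. */
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (res > UINT32_MAX)
		return -1;

	*val = (uint32_t)res;
	return 0;
}
```

With base 0, both "1024" and "0x400" would be accepted as the same divisor, while "abc", an empty string, or "4294967296" would take the error path that prints the "Illegal" message in the patch.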



^ permalink raw reply related

* Re: ipv6: why disable ipv6 on last address removal?
From: Stephen Hemminger @ 2011-01-23  9:47 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: David Miller, yoshfuji, netdev
In-Reply-To: <20100216152859.GC29736@midget.suse.cz>

What about this? It will remove the address on ipv6 disable.

--- a/net/ipv6/addrconf.c	2011-01-23 20:30:25.897243002 +1100
+++ b/net/ipv6/addrconf.c	2011-01-23 20:30:41.161243002 +1100
@@ -4197,7 +4197,7 @@ static void dev_disable_change(struct in
 		return;
 
 	if (idev->cnf.disable_ipv6)
-		addrconf_notify(NULL, NETDEV_DOWN, idev->dev);
+		addrconf_notify(NULL, NETDEV_UNREGISTER, idev->dev);
 	else
 		addrconf_notify(NULL, NETDEV_UP, idev->dev);
 }


^ permalink raw reply

* Re: [stable] [RFC] ipv6: don't flush routes when setting loopback down
From: Stephen Hemminger @ 2011-01-23  9:21 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: jbohac, yoshfuji, netdev, stable, stephen.hemminger, ebiederm,
	brian.haley, lorenzo, David Miller, maheshkelkar
In-Reply-To: <20110123091531.GQ12837@1wt.eu>

On Sun, 23 Jan 2011 10:15:32 +0100
Willy Tarreau <w@1wt.eu> wrote:

> [ first, is there a reason we have stable@ CCed on this thread ? ]
> 
> On Sun, Jan 23, 2011 at 07:26:24PM +1100, Stephen Hemminger wrote:
> > > You are probably so upset because I stepped on code you worked hard
> > > on. But the IPv6 semantics should not have been different from IPv4
> > > and the disable_ipv6 flag was a poor API choice as well. Legacy
> > > APIs suck, I don't expect perfection but it should be possible
> > > to make a working version that:
> > > 
> > > Allows disabling IPv6 completely on an interface
> > > AND Has the same address and route semantics for both
> > > IPv4 and IPv6.
> > 
> > Also for application sanity, Linux should behave the same as BSD
> 
> Stephen,
> 
> while I agree with all the points you made, David is right in that we
> can't use a fix for a bug as a justification for breaking something
> that worked for other people. It simply means that everything that was
> merged since the first regression was introduced should be reverted
> and reworked until a more satisfying solution is found.
> 
> Otherwise users lose trust and you have to deal with many more cases
> when users report issues.
> 
> If the bug is caused by a deep design issue, then maybe a development
> branch should be dedicated to it so that the persons affected by it

I made my attempt at fixing the issue, others can attack that mud pit.



^ permalink raw reply

* Re: [stable] [RFC] ipv6: don't flush routes when setting loopback down
From: Willy Tarreau @ 2011-01-23  9:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: jbohac, yoshfuji, netdev, stable, stephen.hemminger, ebiederm,
	brian.haley, lorenzo, David Miller, maheshkelkar
In-Reply-To: <20110123192624.5cfe33d0@s6510>

[ first, is there a reason we have stable@ CCed on this thread ? ]

On Sun, Jan 23, 2011 at 07:26:24PM +1100, Stephen Hemminger wrote:
> > You are probably so upset because I stepped on code you worked hard
> > on. But the IPv6 semantics should not have been different from IPv4
> > and the disable_ipv6 flag was a poor API choice as well. Legacy
> > APIs suck, I don't expect perfection but it should be possible
> > to make a working version that:
> > 
> > Allows disabling IPv6 completely on an interface
> > AND Has the same address and route semantics for both
> > IPv4 and IPv6.
> 
> Also for application sanity, Linux should behave the same as BSD

Stephen,

while I agree with all the points you made, David is right in that we
can't use a fix for a bug as a justification for breaking something
that worked for other people. It simply means that everything that was
merged since the first regression was introduced should be reverted
and reworked until a more satisfying solution is found.

Otherwise users lose trust and you have to deal with many more cases
when users report issues.

If the bug is caused by a deep design issue, then maybe a development
branch should be dedicated to it so that the persons affected by it
can track its evolution and report some feedback.

Regards,
Willy


^ permalink raw reply

* Re: [RFC] ipv6: don't flush routes when setting loopback down
From: Stephen Hemminger @ 2011-01-23  8:26 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, stephen.hemminger, ebiederm, jbohac, brian.haley,
	netdev, maheshkelkar, lorenzo, yoshfuji, stable
In-Reply-To: <20110123192416.73cd7521@s6510>

On Sun, 23 Jan 2011 19:24:16 +1100
Stephen Hemminger <shemminger@vyatta.com> wrote:

> On Sat, 22 Jan 2011 21:42:54 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Stephen Hemminger <stephen.hemminger@vyatta.com>
> > Date: Sat, 22 Jan 2011 20:41:12 -0800 (PST)
> > 
> > > Having IPv6 remove all addresses when the link goes down is fundamentally
> > > broken; that is the original problem being fixed. For users on servers or
> > > using Quagga this matters. How do you plan to fix that?
> > 
> > How about in a way that doesn't break stuff?
> > 
> > And it's been beyond proven that people give more of a crap
> > about disable_ipv6 than the thing you keep claiming is a big deal.
> > 
> > NOBODY other than you even noticed the issue or made a report about
> > it.
> > 
> > Yet we have people actively complaining about disable_ipv6 being
> > broken.
> > 
> > So you lose on two counts.  You can't fix things by breaking other
> > stuff, and your obscure stuff matters less than things people
> > actually notice being broken.
> 
> You are probably so upset because I stepped on code you worked hard
> on. But the IPv6 semantics should not have been different from IPv4
> and the disable_ipv6 flag was a poor API choice as well. Legacy
> APIs suck, I don't expect perfection but it should be possible
> to make a working version that:
> 
> Allows disabling IPv6 completely on an interface
> AND Has the same address and route semantics for both
> IPv4 and IPv6.

Also for application sanity, Linux should behave the same as BSD

^ permalink raw reply

* Re: [RFC] ipv6: don't flush routes when setting loopback down
From: Stephen Hemminger @ 2011-01-23  8:24 UTC (permalink / raw)
  To: David Miller
  Cc: stephen.hemminger, ebiederm, jbohac, brian.haley, netdev,
	maheshkelkar, lorenzo, yoshfuji, stable
In-Reply-To: <20110122.214254.226765382.davem@davemloft.net>

On Sat, 22 Jan 2011 21:42:54 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <stephen.hemminger@vyatta.com>
> Date: Sat, 22 Jan 2011 20:41:12 -0800 (PST)
> 
> > Having IPv6 remove all addresses when the link goes down is fundamentally
> > broken; that is the original problem being fixed. For users on servers or
> > using Quagga this matters. How do you plan to fix that?
> 
> How about in a way that doesn't break stuff?
> 
> And it's been beyond proven that people give more of a crap
> about disable_ipv6 than the thing you keep claiming is a big deal.
> 
> NOBODY other than you even noticed the issue or made a report about
> it.
> 
> Yet we have people actively complaining about disable_ipv6 being
> broken.
> 
> So you lose on two counts.  You can't fix things by breaking other
> stuff, and your obscure stuff matters less than things people
> actually notice being broken.

You are probably so upset because I stepped on code you worked hard
on. But the IPv6 semantics should not have been different from IPv4
and the disable_ipv6 flag was a poor API choice as well. Legacy
APIs suck, I don't expect perfection but it should be possible
to make a working version that:

Allows disabling IPv6 completely on an interface
AND Has the same address and route semantics for both
IPv4 and IPv6.



^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-23  6:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rick Jones, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110122215742.GC5617@redhat.com>

On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
> On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
> > On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
> > > > [ Trimmed Eric from CC list as vger was complaining that it is too long ]
> > > > 
> > > > On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
> > > > > >So it won't be all that simple to implement well, and before we try,
> > > > > >I'd like to know whether there are applications that are helped
> > > > > >by it. For example, we could try to measure latency at various
> > > > > >pps and see whether the backpressure helps. netperf has -b, -w
> > > > > >flags which might help these measurements.
> > > > > 
> > > > > Those options are enabled when one adds --enable-burst to the
> > > > > pre-compilation ./configure  of netperf (one doesn't have to
> > > > > recompile netserver).  However, if one is also looking at latency
> > > > > statistics via the -j option in the top-of-trunk, or simply at the
> > > > > histogram with --enable-histogram on the ./configure and a verbosity
> > > > > level of 2 (global -v 2) then one wants the very top of trunk
> > > > > netperf from:
> > > > 
> > > > Hi,
> > > > 
> > > > I have constructed a test where I run an un-paced  UDP_STREAM test in
> > > > one guest and a paced omni rr test in another guest at the same time.
> > > 
> > > Hmm, what is this supposed to measure?  Basically each time you run an
> > > un-paced UDP_STREAM you get some random load on the network.
> > > You can't tell what it was exactly, only that it was between
> > > the send and receive throughput.
> > 
> > Rick mentioned in another email that I messed up my test parameters a bit,
> > so I will re-run the tests, incorporating his suggestions.
> > 
> > What I was attempting to measure was the effect of an unpaced UDP_STREAM
> > on the latency of more moderated traffic. Because I am interested in
> > what effect an abusive guest has on other guests and how that may be
> > mitigated.
> > 
> > Could you suggest some tests that you feel are more appropriate?
> 
> Yes. To rephrase my concern in these terms, besides the malicious guest
> you have other software in the host (netperf) that interferes with
> the traffic, and it cooperates with the malicious guest.
> Right?

Yes, that is the scenario in this test.

> IMO for a malicious guest you would send
> UDP packets that then get dropped by the host.
> 
> For example block netperf in host so that
> it does not consume packets from the socket.

I'm more interested in rate-limiting netperf than blocking it.
But in any case, do you mean use iptables or tc based on
classification made by net_cls?


^ permalink raw reply

* Re: [RFC] ipv6: don't flush routes when setting loopback down
From: David Miller @ 2011-01-23  5:42 UTC (permalink / raw)
  To: stephen.hemminger
  Cc: jbohac, ebiederm, yoshfuji, netdev, maheshkelkar, brian.haley,
	stable, lorenzo
In-Reply-To: <901400353.32377.1295757672292.JavaMail.root@tahiti.vyatta.com>

From: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date: Sat, 22 Jan 2011 20:41:12 -0800 (PST)

> Having IPv6 remove all addresses when the link goes down is fundamentally
> broken; that is the original problem being fixed. For users on servers or
> using Quagga this matters. How do you plan to fix that?

How about in a way that doesn't break stuff?

And it's been beyond proven that people give more of a crap
about disable_ipv6 than the thing you keep claiming is a big deal.

NOBODY other than you even noticed the issue or made a report about
it.

Yet we have people actively complaining about disable_ipv6 being
broken.

So you lose on two counts.  You can't fix things by breaking other
stuff, and your obscure stuff matters less than things people
actually notice being broken.

^ permalink raw reply

* Re: [RFC] ipv6: don't flush routes when setting loopback down
From: Stephen Hemminger @ 2011-01-23  4:41 UTC (permalink / raw)
  To: David Miller
  Cc: ebiederm, jbohac, brian haley, netdev, maheshkelkar, lorenzo,
	yoshfuji, stable
In-Reply-To: <20110122.145438.193725532.davem@davemloft.net>

Having IPv6 remove all addresses when the link goes down is fundamentally
broken; that is the original problem being fixed. For users on servers or
using Quagga this matters. How do you plan to fix that?

----- Original Message -----
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Sun, 23 Jan 2011 09:39:40 +1100
> 
> > The design problem behind all this is that sysctl disable_ipv6 as
> > currently implemented is passive (just changes a variable). It needs
> > to be implemented as a more active step that does the same thing as
> > removing the interface from ipv6. I will look into it after LCA.
> 
> All of this stuff worked before your change Stephen.
> 
> It doesn't matter how it was implemented before, IT WORKED.
> 
> You broke it, and it's still broken.
> 
> You keep talking about fixing things other than the changes
> you made, but honestly I think we're at the point where you've
> been given enough chances and we need to simply revert your
> change.
> 
> Things like that can't run on and on for months, I don't care
> what the reason is.

^ permalink raw reply

* Re: [PATCH v4] net-next-2.6: Allow ethtool to set interface in loopback mode.
From: Ben Hutchings @ 2011-01-23  2:35 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: David Miller, Tom Herbert, Laurent Chavey, netdev
In-Reply-To: <1295655836-23421-1-git-send-email-maheshb@google.com>

On Fri, 2011-01-21 at 16:23 -0800, Mahesh Bandewar wrote:
> This patch enables ethtool to set the loopback mode on a given interface.
> By configuring the interface in loopback mode in conjunction with a policy
> route / rule, a userland application can stress the egress / ingress path
> exposing the flaws of the change in progress and potentially help developer(s)
> understand the impact of those changes without even sending a packet out
> on the network.
> 
> Following set of commands illustrates one such example -
> 	a) ip -4 addr add 192.168.1.1/24 dev eth1
> 	b) ip -4 rule add from all iif eth1 lookup 250
> 	c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
> 	d) arp -Ds 192.168.1.100 eth1
> 	e) arp -Ds 192.168.1.200 eth1
> 	f) sysctl -w net.ipv4.ip_nonlocal_bind=1
> 	g) sysctl -w net.ipv4.conf.all.accept_local=1
> 	# Assuming that the machine has 8 cores
> 	h) taskset 000f netserver -L 192.168.1.200
> 	i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
> 
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
[...]

If this version has been revised, you can't claim I reviewed it!
If it's a repost of an earlier version, you should say so.

I thought we agreed that loopback could be treated as a flag, anyway.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH v2 03/16] net: reduce and unify printk level in netdev_fix_features()
From: Michał Mirosław @ 2011-01-22 23:48 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings, Joe Perches
In-Reply-To: <29e476c368b6580f7ff5215ca759b69c75d0e021.1295734271.git.mirq-linux@rere.qmqm.pl>

On Sat, Jan 22, 2011 at 11:14:12PM +0100, Michał Mirosław wrote:
> Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
> be used by ethtool and might spam dmesg unnecessarily.
> 
> This converts the function to use netdev_info() instead of plain printk().
> 
> As a side effect, bonding and bridge devices will now log dropped features
> on every slave device change.

Hmm. I wonder whether it would be better to demote it further to KERN_DEBUG?
With the new interface, the user could see when and which features are
requested but not active, and figure out the rest from the documentation.
Currently, disabling of features because of other (temporary) conditions is
not logged anyway.

Best Regards,
Michał Mirosław
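
To make the "requested but not active" point concrete, here is a simplified userspace sketch of the dependency rules that netdev_fix_features() applies in the patch: hardware checksumming overrides IP checksumming, SG needs some checksum feature, and TSO needs SG. The F_* bit values and the function name are illustrative assumptions and do not match the kernel's NETIF_F_* constants; the logging calls are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not the kernel's NETIF_F_* constants. */
#define F_SG		(1u << 0)
#define F_IP_CSUM	(1u << 1)
#define F_HW_CSUM	(1u << 2)
#define F_TSO		(1u << 3)
#define F_ALL_CSUM	(F_IP_CSUM | F_HW_CSUM)

/* Drop every requested feature whose prerequisites are not met. */
static uint32_t fix_features_sketch(uint32_t features)
{
	const uint32_t requested = features;

	/* Mixed HW and IP checksum settings: keep only HW checksum. */
	if ((features & F_HW_CSUM) && (features & F_IP_CSUM))
		features &= ~F_IP_CSUM;

	/* Scatter/gather is dropped when no checksum feature is set. */
	if ((features & F_SG) && !(features & F_ALL_CSUM))
		features &= ~F_SG;

	/* TSO requires SG; this check must run after the SG check. */
	if ((features & F_TSO) && !(features & F_SG))
		features &= ~F_TSO;

	/* Fixing up may only clear bits, never set new ones. */
	assert((features & ~requested) == 0);
	return features;
}
```

The difference between the requested mask and the returned mask is the set of features that would be reported as dropped, at whatever log level is finally chosen.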

^ permalink raw reply

