Netdev List
 help / color / mirror / Atom feed
* [ipv4:PATCH] Allow userspace to specify primary or secondary ip on interface
From: Vincent Li @ 2013-09-24 21:09 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem, Vincent Li

Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to
be primary or secondary ip on an interface.

the current behavior is when an IP is added to an interface, the primary
or secondary attributes is depending on the order of ip added to the interface
the first IP will be primary and second, third...or alias IP will be secondary
if the IP subnet matches.

this patch add the flexiblity to allow user to specify an argument 'primary' 
or 'secondary' (use 'ip addr add ip/mask primary|secondary dev ethX ' from 
iproute2 for example) to specify an IP address to be  primary or secondary.

the reason for this patch is that we have a multi blade cluster platform 
sharing 'floating management ip' and also that each blade has its own 
management ip on the management interface, so whichever blade in the
cluster becomes primary blade, the 'floating mangaement ip' follows it,
and we want any of our traffic originated from the primary blade source from
the 'floating management ip' for consistency. but in this case, since the local
blade management ip is always the primary ip on the mangaement interface and 
'floating management ip' is always secondary, kernel always choose the primary
ip as source ip address. thus we would like to add the flexibility in kernel to
allow us to specify which ip to be primary or secondary.

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
---
 net/ipv4/devinet.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a1b5bcb..5a7764e 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -440,8 +440,9 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 		return 0;
 	}
 
-	ifa->ifa_flags &= ~IFA_F_SECONDARY;
 	last_primary = &in_dev->ifa_list;
+	if(*last_primary == NULL)
+		ifa->ifa_flags &= ~IFA_F_SECONDARY;
 
 	for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
 	     ifap = &ifa1->ifa_next) {
@@ -458,7 +459,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 				inet_free_ifa(ifa);
 				return -EINVAL;
 			}
-			ifa->ifa_flags |= IFA_F_SECONDARY;
+			if (!(ifa->ifa_flags & IFA_F_SECONDARY))
+				ifa1->ifa_flags |= IFA_F_SECONDARY;
+			else
+				ifa->ifa_flags |= IFA_F_SECONDARY;
 		}
 	}
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: per-PID network stats files in /proc
From: Stephen Hemminger @ 2013-09-24 21:04 UTC (permalink / raw)
  To: Matthew Hall; +Cc: netdev
In-Reply-To: <20130924204442.GA5074@mhcomputing.net>

On Tue, 24 Sep 2013 13:44:42 -0700
Matthew Hall <mh@mhcomputing.net> wrote:

> On Tue, Sep 24, 2013 at 01:41:57PM -0700, Stephen Hemminger wrote:
> > No. because most of these would be associated with global state.
> > Even sockets can be shared between PID's.
> 
> OK. So if this is true, then I feel compelled to ask, why does 
> /proc/PID/net/snmp exist in the first place, if it would never really work?
> 
> Thanks,
> Matthew.

/proc/PID/net is symlink to the processes network namespace.

You could do what you want by putting each process in own network
namespace, but that might be more work than you want to bother with.

^ permalink raw reply

* Re: per-PID network stats files in /proc
From: Eric Dumazet @ 2013-09-24 21:02 UTC (permalink / raw)
  To: Matthew Hall; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20130924204442.GA5074@mhcomputing.net>

On Tue, 2013-09-24 at 13:44 -0700, Matthew Hall wrote:
> On Tue, Sep 24, 2013 at 01:41:57PM -0700, Stephen Hemminger wrote:
> > No. because most of these would be associated with global state.
> > Even sockets can be shared between PID's.
> 
> OK. So if this is true, then I feel compelled to ask, why does 
> /proc/PID/net/snmp exist in the first place, if it would never really work?

That because of network containers

You can perfectly use a network container to get your own copy of
counters

ip netns help

^ permalink raw reply

* Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: Vincent Li @ 2013-09-24 20:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, linux-kernel
In-Reply-To: <20130924.152833.434453431118381771.davem@redhat.com>

Ok, I will resend the patch with your suggestions.

Vincent

On Tue, Sep 24, 2013 at 12:28 PM, David Miller <davem@redhat.com> wrote:
> From: Vincent Li <vincent.mc.li@gmail.com>
> Date: Tue, 24 Sep 2013 11:11:21 -0700
>
>> the current behavior is when an IP is added to an interface, the primary
>> or secondary attributes is depending on the order of ip added to the interface
>> the first IP will be primary and second, third,... or alias IP will be secondary
>> if the IP subnet matches
>>
>> this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary'
>> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify
>> an IP address to be  primary or secondary.
>>
>> the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip'
>> and also that each blade has its own management ip on the management interface, so whichever blade in the
>> cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic
>> originated from the primary blade source from the 'floating management ip' for consistency. but in this
>> case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating
>> management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would
>> like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary.
>>
>> Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
>
> When submitting a patch, please:
>
> 1) Specify an appropriate prefix for your subject line, indicating the
>    subsystem.  "ipv4: " might be appropriate here.
>
> 2) Format your commit message so that lines do not exceed 80 columns.
>    People will read using ASCII text based tools in 80 column
>    terminals.

^ permalink raw reply

* Re: per-PID network stats files in /proc
From: Matthew Hall @ 2013-09-24 20:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20130924134157.4fc22806@nehalam.linuxnetplumber.net>

On Tue, Sep 24, 2013 at 01:41:57PM -0700, Stephen Hemminger wrote:
> No. because most of these would be associated with global state.
> Even sockets can be shared between PID's.

OK. So if this is true, then I feel compelled to ask, why does 
/proc/PID/net/snmp exist in the first place, if it would never really work?

Thanks,
Matthew.

^ permalink raw reply

* [PATCH 2/2 v2] tun: call skb_get_sw_rxhash
From: Tom Herbert @ 2013-09-24 20:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

tun calls skb_get_rxhash on both transmit and receive side with the
expectation that the calculation in symmetric. To ensure this
property call skb_get_sw_rxhash instead (ignores hash provided by NIC
on receive).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/tun.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 807815f..0619fdf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -358,7 +358,7 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 	rcu_read_lock();
 	numqueues = ACCESS_ONCE(tun->numqueues);
 
-	txq = skb_get_rxhash(skb);
+	txq = skb_get_sw_rxhash(skb);
 	if (txq) {
 		e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
 		if (e)
@@ -1138,7 +1138,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	skb_reset_network_header(skb);
 	skb_probe_transport_header(skb, 0);
 
-	rxhash = skb_get_rxhash(skb);
+	rxhash = skb_get_sw_rxhash(skb);
 	netif_rx_ni(skb);
 
 	tun->dev->stats.rx_packets++;
-- 
1.8.4

^ permalink raw reply related

* [PATCH 1/2 v2] net: Add function to get SW rxhash
From: Tom Herbert @ 2013-09-24 20:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

Some uses of skb_get_rxhash expect that the function will return
a consistent value whether it is called on RX or TX paths. On RX
path, we will use the rxhash if provided by the NIC, so this
would not normally be the same result computed in TX path would be
a software calculation.

This patch adds skb_get_sw_rxhash to explicitly request a hash
calculated by the stack, disregarding the hash provided by NIC.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h    | 14 ++++++++++++--
 net/core/flow_dissector.c |  1 +
 net/core/skbuff.c         |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2ddb48d..fdde013 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -381,6 +381,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@ooo_okay: allow the mapping of a socket to a queue to be changed
  *	@l4_rxhash: indicate rxhash is a canonical 4-tuple hash over transport
  *		ports.
+ *	@sw_rxhash: indicate rxhash was computed by the stack
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
@@ -488,6 +489,7 @@ struct sk_buff {
 	__u8			pfmemalloc:1;
 	__u8			ooo_okay:1;
 	__u8			l4_rxhash:1;
+	__u8			sw_rxhash:1;
 	__u8			wifi_acked_valid:1;
 	__u8			wifi_acked:1;
 	__u8			no_fcs:1;
@@ -498,7 +500,7 @@ struct sk_buff {
 	 * headers if needed
 	 */
 	__u8			encapsulation:1;
-	/* 7/9 bit hole (depending on ndisc_nodetype presence) */
+	/* 6/8 bit hole (depending on ndisc_nodetype presence) */
 	kmemcheck_bitfield_end(flags2);
 
 #if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL
@@ -720,7 +722,15 @@ extern unsigned int   skb_find_text(struct sk_buff *skb, unsigned int from,
 extern void __skb_get_rxhash(struct sk_buff *skb);
 static inline __u32 skb_get_rxhash(struct sk_buff *skb)
 {
-	if (!skb->l4_rxhash)
+	if (!skb->l4_rxhash && !skb->sw_rxhash)
+		__skb_get_rxhash(skb);
+
+	return skb->rxhash;
+}
+
+static inline __u32 skb_get_sw_rxhash(struct sk_buff *skb)
+{
+	if (!skb->sw_rxhash)
 		__skb_get_rxhash(skb);
 
 	return skb->rxhash;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1929af8..8979121 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -200,6 +200,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		hash = 1;
 
 	skb->rxhash = hash;
+	skb->sw_rxhash = 1;
 }
 EXPORT_SYMBOL(__skb_get_rxhash);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d81cff1..5021318 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -706,6 +706,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->rxhash		= old->rxhash;
 	new->ooo_okay		= old->ooo_okay;
 	new->l4_rxhash		= old->l4_rxhash;
+	new->sw_rxhash		= old->sw_rxhash;
 	new->no_fcs		= old->no_fcs;
 	new->encapsulation	= old->encapsulation;
 #ifdef CONFIG_XFRM
-- 
1.8.4

^ permalink raw reply related

* Re: per-PID network stats files in /proc
From: Stephen Hemminger @ 2013-09-24 20:41 UTC (permalink / raw)
  To: Matthew Hall; +Cc: netdev
In-Reply-To: <20130924201536.GA3555@mhcomputing.net>

On Tue, 24 Sep 2013 13:15:37 -0700
Matthew Hall <mh@mhcomputing.net> wrote:

> Hello,
> 
> I have an application that I'd like to make self-tuning, based upon the values 
> of some of the network stats counters. Thus I went reading through a copy of 
> linux-3.11.1 to look for some more information, and began exploring procfs as 
> well.
> 
> I discovered some system-wide counters in /proc/net/snmp which are pretty 
> interesting so I was trying to use the per-PID counters in 
> /proc/net/PID/net/snmp as well.
> 
> Unfortunately I found that all of these files seem to be identical on my own 
> system running Linux 3.2.0:
> 
> $ md5sum /proc/net/snmp /proc/1/net/snmp /proc/2/net/snmp
> 8b92b426f860d0667a780cd8baabb7bf  /proc/net/snmp
> 8b92b426f860d0667a780cd8baabb7bf  /proc/1/net/snmp
> 8b92b426f860d0667a780cd8baabb7bf  /proc/2/net/snmp
> 
> I found the snmp_seq_show function in net/ipv4/proc.c which prints these 
> statistics to see how it really worked, but then I was having a hard time 
> finding out what's really calling this code since it hooks up to the /proc 
> framework.
> 
> Is there some option one can change which enables gathering the network 
> statistics per-PID, or per-socket? This would be a tremendous help for my 
> application.
> 
> Thanks,
> Matthew.

No. because most of these would be associated with global state.
Even sockets can be shared between PID's.

^ permalink raw reply

* per-PID network stats files in /proc
From: Matthew Hall @ 2013-09-24 20:15 UTC (permalink / raw)
  To: netdev

Hello,

I have an application that I'd like to make self-tuning, based upon the values 
of some of the network stats counters. Thus I went reading through a copy of 
linux-3.11.1 to look for some more information, and began exploring procfs as 
well.

I discovered some system-wide counters in /proc/net/snmp which are pretty 
interesting so I was trying to use the per-PID counters in 
/proc/net/PID/net/snmp as well.

Unfortunately I found that all of these files seem to be identical on my own 
system running Linux 3.2.0:

$ md5sum /proc/net/snmp /proc/1/net/snmp /proc/2/net/snmp
8b92b426f860d0667a780cd8baabb7bf  /proc/net/snmp
8b92b426f860d0667a780cd8baabb7bf  /proc/1/net/snmp
8b92b426f860d0667a780cd8baabb7bf  /proc/2/net/snmp

I found the snmp_seq_show function in net/ipv4/proc.c which prints these 
statistics to see how it really worked, but then I was having a hard time 
finding out what's really calling this code since it hooks up to the /proc 
framework.

Is there some option one can change which enables gathering the network 
statistics per-PID, or per-socket? This would be a tremendous help for my 
application.

Thanks,
Matthew.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24 19:42 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Eric Dumazet, David Miller, Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <CA+mtBx8PJj1HgaoiOoJjQL6VmU4Rn7ctBpscr1YxpfrWW9VSbA@mail.gmail.com>

On Tue, Sep 24, 2013 at 11:02:11AM -0700, Tom Herbert wrote:
> On Tue, Sep 24, 2013 at 10:34 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2013-09-24 at 10:03 -0700, Tom Herbert wrote:
> >> On Tue, Sep 24, 2013 at 9:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Tue, 2013-09-24 at 09:35 -0700, Tom Herbert wrote:
> >> >
> >> >> We should really be using rxhash for that anyway, eliminate this
> >> >> ehashfn.  This would entail adding rxhash argument in the various
> >> >> udp_lookup functions.
> >> >
> >> > Nope : Some NICs provide UDP rxhash only using L3  (source IP,
> >> > destination IP), not L4 (adding source & destination ports)
> >> >
> >> Then the NIC won't set l4_rxhash and we'll rehash over 4-tuple when
> >> skb_get_rxhash is called.
> >
> > Yes, but then in this case you add cpu cycles for no reason.
> >
> > If you have multiqueue NIC, you do not use RPS/RFS, so skb->rxhash might
> > be 0
> >
> > hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
> >
> > is faster than the whole flow dissection game.
> >
> But if we already have valid l4_rxhash we're just wasting time
> recomputing it (very like the same value).  So just do:
> 
> if (skb->l4_hash)
>     hash = skb->rxhash
> else
>     hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
> 
> Even if we go the inet_ehashfn route in the packet, it's makes sense
> to store this in skb->rxhash so that subsequent functions don't need
> to compute the hash
> 
> By the way, I believe there is an insidious in using the same hash
> value or even related hash values in multiple places for steering.
> Since we typically do something like hash % numqueues, we introduce
> bias if multiple steering levels look at the same bits.  With
> SO_REUSEPORT this might actually be beneficial since we could get port
> selection to match RSS locality, but this is not by design!

Wouldn't this also need infrastructure to sync the keys over multiple network
cards to have best benefits?

^ permalink raw reply

* Re: [PATCH v4 net-next 06/27] bonding: modify bond_get_slave_by_dev() to use neighbours
From: Veaceslav Falico @ 2013-09-24 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jiri, fubar, andy
In-Reply-To: <20130924.151943.1884204173373827600.davem@redhat.com>

On Tue, Sep 24, 2013 at 03:19:43PM -0400, David Miller wrote:
>From: Veaceslav Falico <vfalico@redhat.com>
>Date: Tue, 24 Sep 2013 13:46:46 +0200
>
>> -	return NULL;
>> +	return (struct slave *)netdev_lower_dev_get_private(bond->dev,
>> +							    slave_dev);
>>  }
>
>void pointers never need to be cast, please remove this "struct slave *"
>cast.

Yep, leftover from the sandbox versions, will change and submit the new
version. Sorry about that :-/.

^ permalink raw reply

* Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: David Miller @ 2013-09-24 19:28 UTC (permalink / raw)
  To: vincent.mc.li; +Cc: netdev, linux-kernel
In-Reply-To: <1380046281-6012-1-git-send-email-vincent.mc.li@gmail.com>

From: Vincent Li <vincent.mc.li@gmail.com>
Date: Tue, 24 Sep 2013 11:11:21 -0700

> the current behavior is when an IP is added to an interface, the primary
> or secondary attributes is depending on the order of ip added to the interface
> the first IP will be primary and second, third,... or alias IP will be secondary
> if the IP subnet matches
> 
> this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary'
> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify
> an IP address to be  primary or secondary.
> 
> the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip'
> and also that each blade has its own management ip on the management interface, so whichever blade in the
> cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic
> originated from the primary blade source from the 'floating management ip' for consistency. but in this
> case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating
> management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would
> like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary.
> 
> Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>

When submitting a patch, please:

1) Specify an appropriate prefix for your subject line, indicating the
   subsystem.  "ipv4: " might be appropriate here.

2) Format your commit message so that lines do not exceed 80 columns.
   People will read using ASCII text based tools in 80 column
   terminals.

^ permalink raw reply

* Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: Stephen Hemminger @ 2013-09-24 19:21 UTC (permalink / raw)
  To: Vincent Li; +Cc: netdev, linux-kernel, davem
In-Reply-To: <1380046281-6012-1-git-send-email-vincent.mc.li@gmail.com>

On Tue, 24 Sep 2013 11:11:21 -0700
Vincent Li <vincent.mc.li@gmail.com> wrote:

> -	ifa->ifa_flags &= ~IFA_F_SECONDARY;
>  	last_primary = &in_dev->ifa_list;
>  
> +	if((*last_primary) == NULL)


Minor nit, paren's here are unnecessary

^ permalink raw reply

* Re: [PATCH v4 net-next 06/27] bonding: modify bond_get_slave_by_dev() to use neighbours
From: David Miller @ 2013-09-24 19:19 UTC (permalink / raw)
  To: vfalico; +Cc: netdev, jiri, fubar, andy
In-Reply-To: <1380023227-9576-7-git-send-email-vfalico@redhat.com>

From: Veaceslav Falico <vfalico@redhat.com>
Date: Tue, 24 Sep 2013 13:46:46 +0200

> -	return NULL;
> +	return (struct slave *)netdev_lower_dev_get_private(bond->dev,
> +							    slave_dev);
>  }

void pointers never need to be cast, please remove this "struct slave *"
cast.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Eric Dumazet @ 2013-09-24 19:14 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Ben Hutchings, David Miller, David Laight, Linux Netdev List,
	Brandeburg, Jesse
In-Reply-To: <CA+mtBx83=X6k42oe4=-hD_coRoX72bJ05Kb5dDfRCDCdTzOH+A@mail.gmail.com>

On Tue, 2013-09-24 at 11:24 -0700, Tom Herbert wrote:
> >
> There are some drivers are using fixed keys.  This is not good.  I'll
> take a look at those.  It should be sufficient to randomly set the key
> once at initialization.
> 

If the software fallback (we need this for dumb NIC, or encapsulated
traffic) is more expensive than jhash, I do not think we will ever use
this Toeplitz thing.

We can not use jhash or Toeplitz.

^ permalink raw reply

* Re: SMSC 9303 support
From: Gary Thomas @ 2013-09-24 19:13 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Ben Hutchings, netdev
In-Reply-To: <CAGVrzcbF+24ouTGxMwsu3-QCi=JeHEcS7BqBceGBX7Jf+e7BdQ@mail.gmail.com>

On 2013-09-24 12:29, Florian Fainelli wrote:
> Hello,
>
> 2013/9/24 Gary Thomas <gary@mlbassoc.com>:
>> On 2013-09-24 10:51, Ben Hutchings wrote:
>>>
>>> On Tue, 2013-09-24 at 06:21 -0600, Gary Thomas wrote:
>>>>
>>>> I need to support the SMSC9303 in an embedded system.  I'm not
>>>> finding any [explicit] support for this device in the latest
>>>> mainline kernel.  Did I miss something?
>>>>
>>>> To be clear, the SMSC9303 is a 3-port managed ethernet switch
>>>> capable of supporting 802.1D/802.1Q directly. This switch is
>>>> driven by a single MAC via MII/RMII and exposes the other two
>>>> ports via physical PHYs.  What I need it to do is behave like
>>>> two external, separate devices.  I was thinking that what I need
>>>> to do is treat these as VLAN devices since the switch can manage
>>>> the routing.
>>>>
>>>> Does this seem like a reasonable approach?
>>>
>>>
>>> Linux has 'DSA' (Distributed Switch Architecture) which supports tagging
>>> of packets to indicate which switch port they are sent or received
>>> through.  This was originally added to support some Marvell switch chips
>>> and I don't know whether it would be suitable or extensible for this
>>> one.
>>
>>
>> I've used the DSA stuff for years (worked directly with the Marvell folks
>> when it was being developed).  It might work for this device, I'll think
>> some more about using it although I was hoping for a lighter weight
>> solution.
>
> I do not think DSA is suitable for pure 802.1q switches such as this
> one. OpenWrt has an out of tree patch which adds some switch-specific
> operations that can be controlled over netlink (currently trying to
> get them in a shape where they can be submitted for mainline
> inclusion) [1], which I think is much more suitable than DSA or any
> other proprietary switch tagging mechanism.
>
> [1]: https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/net/phy/swconfig.c
>

This looks interesting.  Do you have any more information on how to
integrate this and/or use it?

-- 
------------------------------------------------------------
Gary Thomas                 |  Consulting for the
MLB Associates              |    Embedded world
------------------------------------------------------------

^ permalink raw reply

* Re: [net 5/6] i40e: better return values
From: Jesse Brandeburg @ 2013-09-24 19:09 UTC (permalink / raw)
  To: David Miller; +Cc: joe, jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <20130924.101231.1193830264265403478.davem@davemloft.net>

On Tue, 24 Sep 2013 10:12:31 -0400
David Miller <davem@davemloft.net> wrote:
> > Ick.  post_increment problem.
> > 
> > 	return ++num_tc;
> > 
> > There's nothing wrong with the original code
> > unless this is a bugfix which should be documented
> > better than "better return values".
> 
> Agreed, this style of coding is asking for a bug.
> 
> If you want to return "num_tc PLUS ONE" just say that:
> 
> 	return num_tc + 1;

Oops! Obviously the original post-inc was not the intent. That was my
bad. good catch Joe! will fix and resubmit.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-24 19:04 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: David Miller, David Laight, Linux Netdev List
In-Reply-To: <20130924114853.00003935@unknown>

On Tue, Sep 24, 2013 at 11:48 AM, Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
> On Tue, 24 Sep 2013 14:03:12 -0400 David Miller <davem@redhat.com> wrote:
> ...
>> >> For security reasons we absolutely cannot use it for that purpose,
>> >> please stop claiming this.
>> >>
>> >> Any hash function which an attacker can reproduce is attackable.
>> >
> ...
>> > that should be addressed.  It is possible to DoS attack through the
>> > steering mechanism.
>>
>> All of them are using a fixed, defined, key.
>
> We selected the fixed key on purpose.  The existing mechanisms built
> into the stack for preventing the impact of DOS attacks like NAPI
> polling will prevent any actual damage even if someone sends lots of
> packets on a single flow.  If someone overflows a receive queue that
> CPU runs until it can't keep up and then hardware drops further
> packets.  In this case even with a randomized seed key any single flow
> can still be targeted at a queue, which is no different than a single
> queue adapter.
>
> I'm not convinced there is an actual impact in practice.
>
But what is the purpose of using a fixed key, what are the benefits?
A clever attacker could attack a specific queue with different
4-tuples.  While this might be similar to a single flow attack, it
will be much harder to detect and be used to attack multiple servers
simultaneously in the say way (since they all share the same key).  I
don't think we could deploy a NIC with a static RSS key.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24 18:49 UTC (permalink / raw)
  To: bhutchings; +Cc: therbert, David.Laight, netdev, jesse.brandeburg
In-Reply-To: <1380046245.2736.52.camel@bwh-desktop.uk.level5networks.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 24 Sep 2013 19:10:45 +0100

> On Tue, 2013-09-24 at 14:03 -0400, David Miller wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Tue, 24 Sep 2013 08:54:24 -0700
>> 
>> > On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
>> >> From: Tom Herbert <therbert@google.com>
>> >> Date: Tue, 24 Sep 2013 08:22:55 -0700
>> >>
>> >>> We use this value for steering, and could use it for other uses like
>> >>> connection lookup.
>> >>
>> >> For security reasons we absolutely cannot use it for that purpose,
>> >> please stop claiming this.
>> >>
>> >> Any hash function which an attacker can reproduce is attackable.
>> > 
>> > The Toeplitz function uses a secret key whose length is based on the
>> > input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
>> > attacker can reproduce this if the key is random.  If the problem is
>> > that devices are not being configured with a sufficiently random key
>> > (some actually are using a fixed key :-( ), that's a separate issue
>> > that should be addressed.  It is possible to DoS attack through the
>> > steering mechanism.
>> 
>> All of them are using a fixed, defined, key.
> 
> This is certainly false, as I know sfc randomises the key.  And the
> Microsoft RSS spec appears to require that the key is programmable.

Then I stand corrected.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Jesse Brandeburg @ 2013-09-24 18:48 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, David.Laight, netdev
In-Reply-To: <20130924.140312.1944338200709799169.davem@redhat.com>

On Tue, 24 Sep 2013 14:03:12 -0400 David Miller <davem@redhat.com> wrote:
...
> >> For security reasons we absolutely cannot use it for that purpose,
> >> please stop claiming this.
> >>
> >> Any hash function which an attacker can reproduce is attackable.
> > 
...
> > that should be addressed.  It is possible to DoS attack through the
> > steering mechanism.
> 
> All of them are using a fixed, defined, key.

We selected the fixed key on purpose.  The existing mechanisms built
into the stack for preventing the impact of DOS attacks like NAPI
polling will prevent any actual damage even if someone sends lots of
packets on a single flow.  If someone overflows a receive queue that
CPU runs until it can't keep up and then hardware drops further
packets.  In this case even with a randomized seed key any single flow
can still be targeted at a queue, which is no different than a single
queue adapter.

I'm not convinced there is an actual impact in practice.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24 18:48 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, hannes, netdev, jesse.brandeburg
In-Reply-To: <CA+mtBx8PJj1HgaoiOoJjQL6VmU4Rn7ctBpscr1YxpfrWW9VSbA@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Tue, 24 Sep 2013 11:02:11 -0700

> But if we already have valid l4_rxhash we're just wasting time
> recomputing it (very like the same value).  So just do:

Honestly, you're arguing over 12 or so cycles.  That's about how much
jhash costs.

^ permalink raw reply

* Re: SMSC 9303 support
From: Florian Fainelli @ 2013-09-24 18:29 UTC (permalink / raw)
  To: Gary Thomas; +Cc: Ben Hutchings, netdev
In-Reply-To: <5241C4F2.2040106@mlbassoc.com>

Hello,

2013/9/24 Gary Thomas <gary@mlbassoc.com>:
> On 2013-09-24 10:51, Ben Hutchings wrote:
>>
>> On Tue, 2013-09-24 at 06:21 -0600, Gary Thomas wrote:
>>>
>>> I need to support the SMSC9303 in an embedded system.  I'm not
>>> finding any [explicit] support for this device in the latest
>>> mainline kernel.  Did I miss something?
>>>
>>> To be clear, the SMSC9303 is a 3-port managed ethernet switch
>>> capable of supporting 802.1D/802.1Q directly. This switch is
>>> driven by a single MAC via MII/RMII and exposes the other two
>>> ports via physical PHYs.  What I need it to do is behave like
>>> two external, separate devices.  I was thinking that what I need
>>> to do is treat these as VLAN devices since the switch can manage
>>> the routing.
>>>
>>> Does this seem like a reasonable approach?
>>
>>
>> Linux has 'DSA' (Distributed Switch Architecture) which supports tagging
>> of packets to indicate which switch port they are sent or received
>> through.  This was originally added to support some Marvell switch chips
>> and I don't know whether it would be suitable or extensible for this
>> one.
>
>
> I've used the DSA stuff for years (worked directly with the Marvell folks
> when it was being developed).  It might work for this device, I'll think
> some more about using it although I was hoping for a lighter weight
> solution.

I do not think DSA is suitable for pure 802.1q switches such as this
one. OpenWrt has an out of tree patch which adds some switch-specific
operations that can be controlled over netlink (currently trying to
get them in a shape where they can be submitted for mainline
inclusion) [1], which I think is much more suitable than DSA or any
other proprietary switch tagging mechanism.

[1]: https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/net/phy/swconfig.c
-- 
Florian

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-24 18:24 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, David Laight, Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <1380046245.2736.52.camel@bwh-desktop.uk.level5networks.com>

On Tue, Sep 24, 2013 at 11:10 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Tue, 2013-09-24 at 14:03 -0400, David Miller wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Tue, 24 Sep 2013 08:54:24 -0700
>>
>> > On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
>> >> From: Tom Herbert <therbert@google.com>
>> >> Date: Tue, 24 Sep 2013 08:22:55 -0700
>> >>
>> >>> We use this value for steering, and could use it for other uses like
>> >>> connection lookup.
>> >>
>> >> For security reasons we absolutely cannot use it for that purpose,
>> >> please stop claiming this.
>> >>
>> >> Any hash function which an attacker can reproduce is attackable.
>> >
>> > The Toeplitz function uses a secret key whose length is based on the
>> > input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
>> > attacker can reproduce this if the key is random.  If the problem is
>> > that devices are not being configured with a sufficiently random key
>> > (some actually are using a fixed key :-( ), that's a separate issue
>> > that should be addressed.  It is possible to DoS attack through the
>> > steering mechanism.
>>
>> All of them are using a fixed, defined, key.
>
> This is certainly false, as I know sfc randomises the key.  And the
> Microsoft RSS spec appears to require that the key is programmable.
>
There are some drivers are using fixed keys.  This is not good.  I'll
take a look at those.  It should be sufficient to randomly set the key
once at initialization.


> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>

^ permalink raw reply

* [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: Vincent Li @ 2013-09-24 18:11 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem, Vincent Li

the current behavior is when an IP is added to an interface, the primary
or secondary attributes is depending on the order of ip added to the interface
the first IP will be primary and second, third,... or alias IP will be secondary
if the IP subnet matches

this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary'
(use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify
an IP address to be  primary or secondary.

the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip'
and also that each blade has its own management ip on the management interface, so whichever blade in the
cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic
originated from the primary blade source from the 'floating management ip' for consistency. but in this
case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating
management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would
like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary.

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
---
 net/ipv4/devinet.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a1b5bcb..bfc702a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 		return 0;
 	}
 
-	ifa->ifa_flags &= ~IFA_F_SECONDARY;
 	last_primary = &in_dev->ifa_list;
 
+	if((*last_primary) == NULL)
+		ifa->ifa_flags &= ~IFA_F_SECONDARY;
+
 	for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
 	     ifap = &ifa1->ifa_next) {
 		if (!(ifa1->ifa_flags & IFA_F_SECONDARY) &&
@@ -458,7 +460,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 				inet_free_ifa(ifa);
 				return -EINVAL;
 			}
-			ifa->ifa_flags |= IFA_F_SECONDARY;
+                        if (!(ifa->ifa_flags & IFA_F_SECONDARY))
+                                ifa1->ifa_flags |= IFA_F_SECONDARY;
+                        else
+                                ifa->ifa_flags |= IFA_F_SECONDARY;
 		}
 	}
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Ben Hutchings @ 2013-09-24 18:10 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, David.Laight, netdev, jesse.brandeburg
In-Reply-To: <20130924.140312.1944338200709799169.davem@redhat.com>

On Tue, 2013-09-24 at 14:03 -0400, David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Tue, 24 Sep 2013 08:54:24 -0700
> 
> > On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
> >> From: Tom Herbert <therbert@google.com>
> >> Date: Tue, 24 Sep 2013 08:22:55 -0700
> >>
> >>> We use this value for steering, and could use it for other uses like
> >>> connection lookup.
> >>
> >> For security reasons we absolutely cannot use it for that purpose,
> >> please stop claiming this.
> >>
> >> Any hash function which an attacker can reproduce is attackable.
> > 
> > The Toeplitz function uses a secret key whose length is based on the
> > input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
> > attacker can reproduce this if the key is random.  If the problem is
> > that devices are not being configured with a sufficiently random key
> > (some actually are using a fixed key :-( ), that's a separate issue
> > that should be addressed.  It is possible to DoS attack through the
> > steering mechanism.
> 
> All of them are using a fixed, defined, key.

This is certainly false, as I know sfc randomises the key.  And the
Microsoft RSS spec appears to require that the key is programmable.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox