Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v2 4/7] rhashtable: future table needs to be traversed when remove an object
From: Thomas Graf @ 2015-01-07  9:39 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <1420609318-3261-5-git-send-email-ying.xue@windriver.com>

On 01/07/15 at 01:41pm, Ying Xue wrote:
> When remove an object from hash table, we currently only traverse old
> bucket table to check whether the object exists. If the object is not
> found in it, we will try again. But in the second search loop, we still
> search the object from the old table instead of future table. As a
> result, the object may be not removed from hash table especially when
> resizing is currently in progress and the object is just saved in the
> future table.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

Nice catch!
Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* [PATCH next] net: e1000e: support txtd update delay via xmit_more
From: Florian Westphal @ 2015-01-07  9:49 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Don't update tx tail descriptor if queue hasn't been stopped
and we know at least one more skb will be sent right away.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 332a298..214cb78 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5444,16 +5444,6 @@ static void e1000_tx_queue(struct e1000_ring *tx_ring, int tx_flags, int count)
 	wmb();
 
 	tx_ring->next_to_use = i;
-
-	if (adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
-		e1000e_update_tdt_wa(tx_ring, i);
-	else
-		writel(i, tx_ring->tail);
-
-	/* we need this if more than one processor can write to our tail
-	 * at a time, it synchronizes IO on IA64/Altix systems
-	 */
-	mmiowb();
 }
 
 #define MINIMUM_DHCP_PACKET_SIZE 282
@@ -5653,6 +5643,19 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 				    (MAX_SKB_FRAGS *
 				     DIV_ROUND_UP(PAGE_SIZE,
 						  adapter->tx_fifo_limit) + 2));
+
+		if (!skb->xmit_more ||
+		    netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
+			if (adapter->flags2 & FLAG2_PCIM2PCI_ARBITER_WA)
+				e1000e_update_tdt_wa(tx_ring, tx_ring->next_to_use);
+			else
+				writel(tx_ring->next_to_use, tx_ring->tail);
+
+			/* we need this if more than one processor can write to our tail
+			 * at a time, it synchronizes IO on IA64/Altix systems
+			 */
+			mmiowb();
+		}
 	} else {
 		dev_kfree_skb_any(skb);
 		tx_ring->buffer_info[first].time_stamp = 0;
-- 
2.0.5

^ permalink raw reply related

* [PATCH -next] r8169: add support for xmit_more
From: Florian Westphal @ 2015-01-07  9:49 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Delay update of hw tail descriptor if we know that another skb is going
to be sent.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 drivers/net/ethernet/realtek/r8169.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 14a1c5c..3a28059 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7049,6 +7049,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	u32 status, len;
 	u32 opts[2];
 	int frags;
+	bool stop_queue;
 
 	if (unlikely(!TX_FRAGS_READY_FOR(tp, skb_shinfo(skb)->nr_frags))) {
 		netif_err(tp, drv, dev, "BUG! Tx Ring full when queue awake!\n");
@@ -7105,11 +7106,16 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	RTL_W8(TxPoll, NPQ);
+	stop_queue = !TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS);
 
-	mmiowb();
+	if (!skb->xmit_more || stop_queue ||
+	    netif_xmit_stopped(netdev_get_tx_queue(dev, 0))) {
+		RTL_W8(TxPoll, NPQ);
+
+		mmiowb();
+	}
 
-	if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
+	if (stop_queue) {
 		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
 		 * not miss a ring update when it notices a stopped queue.
 		 */
-- 
2.0.5

^ permalink raw reply related

* Re: [PATCH net-next v2 5/7] rhashtable: avoid unnecessary wakeup for worker queue
From: Thomas Graf @ 2015-01-07  9:50 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <1420609318-3261-6-git-send-email-ying.xue@windriver.com>

On 01/07/15 at 01:41pm, Ying Xue wrote:
> Move condition statements of verifying whether hash table size exceeds
> its maximum threshold or reaches its minimum threshold from resizing
> functions to resizing decision functions, avoiding unnecessary wakeup
> for worker queue thread.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

Good optimization, thanks!

Acked-by: Thomas Graf <tgraf@suug.ch>

Can you do a follow-up patch and add a note in rhashtable.h to
indicate the implementation of the grow and shrink decision function
must enforce min/max shift?

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] rhashtable: initialize atomic nelems variable
From: Thomas Graf @ 2015-01-07  9:53 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <1420609318-3261-7-git-send-email-ying.xue@windriver.com>

On 01/07/15 at 01:41pm, Ying Xue wrote:
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

Is this really needed at all? We initialize the full rhashtable
struct to 0 in rhashtable_init().

^ permalink raw reply

* Re: [PATCH net-next v2 0/7] Involve rhashtable_lookup_insert routine
From: Thomas Graf @ 2015-01-07  9:54 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <1420609318-3261-1-git-send-email-ying.xue@windriver.com>

On 01/07/15 at 01:41pm, Ying Xue wrote:
> The series aims to involve rhashtable_lookup_insert() to guarantee
> that the process of lookup and insertion of an object from/into hash
> table is finished atomically, allowing rhashtable's users not to
> introduce an extra lock during search and insertion. For example,
> tipc socket is the first user benefiting from this enhancement. 
> 
> v2 changes:

Thanks, this looks excellent. I think patch 6/7 can be dropped,
otherwise this looks ready to go. I acked all other changes.

^ permalink raw reply

* RE: [PATCH 2/6] vxlan: Group Policy extension
From: David Laight @ 2015-01-07 10:03 UTC (permalink / raw)
  To: 'Alexei Starovoitov', Thomas Graf
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev@vger.kernel.org, dev@openvswitch.org
In-Reply-To: <CAADnVQ+EO34qYbK+sji-Da3CMkn+-V-9Uvdar_WgxRQ4_+JFXA@mail.gmail.com>

From: Alexei Starovoitov
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > +struct vxlan_gbp {
> > +#ifdef __LITTLE_ENDIAN_BITFIELD
> > +       __u8    reserved_flags1:3,
> ...
> > +       __be16 policy_id;
> > +} __packed;
> 
> are you sure that compiler will be smart enough
> to do single short load when you pack
> u8 + struct vxlan_gbp inside struct vxlanhdr ?
> I suspect compiler will use two byte loads
> with shifts and ors every time to access policy_id.
> Even it works ok, I think this struct layout is ugly.
> imo would be much easier to read if you replace
> the whole vxlanhdr with vxlanhdr_gbp
> or split vxlanhdr into two 32-bit structs.
> then __packed hacks won't be needed.

Also, if you are writing the values then you need to write
all the members of the bitfield in order to get a single write.

Basically you are much better off using explicit masks.

	David


^ permalink raw reply

* Re: [RFC PATCH] unlock rtnl mutex in ic_open_devs while waiting
From: Maarten Lankhorst @ 2015-01-07 10:06 UTC (permalink / raw)
  To: David Miller
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel,
	Jay Cliburn, Chris Snook, Sabrina Dubroca, Johannes Berg
In-Reply-To: <20150106.172152.2034122593693302134.davem@davemloft.net>

Op 06-01-15 om 23:21 schreef David Miller:
> From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Date: Mon, 05 Jan 2015 14:52:06 +0100
>
>> This fixes a deadlock with alx_link_check, which takes the rtnl_mutex in
>> a work item to check the link.
>>
>> I have no idea whether alx should be fixed or ipconfig.c,
>> but this saves 120 seconds off my boot time. ;-)
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> I genuinely think that alx_link_check() needs to use a smaller hammer
> to do it's locking, there is no reason to use the RTNL mutex.
>
> A driver private mutex will probably work just as well and not have
> this problem.

I guess alx_check_link uses the rtnl_lock for serializing against any possible alx_reset call.
The alternative is stopping check_link work before running anything that changes the device state.
Does the below patch look sane instead?
---
diff --git a/drivers/net/ethernet/atheros/alx/alx.h b/drivers/net/ethernet/atheros/alx/alx.h
index 8fc93c5f6abc..354f155b3144 100644
--- a/drivers/net/ethernet/atheros/alx/alx.h
+++ b/drivers/net/ethernet/atheros/alx/alx.h
@@ -88,6 +88,8 @@ struct alx_priv {
 		unsigned int size;
 	} descmem;
 
+	bool stop_link_check;
+
 	/* protect int_mask updates */
 	spinlock_t irq_lock;
 	u32 int_mask;
diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index e398eda07298..ae93b8052cbf 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -315,7 +316,8 @@ static irqreturn_t alx_intr_handle(struct alx_priv *alx, u32 intr)
 		 */
 		alx->int_mask &= ~ALX_ISR_PHY;
 		write_int_mask = true;
-		alx_schedule_link_check(alx);
+		if (!alx->stop_link_check)
+			alx_schedule_link_check(alx);
 	}
 
 	if (intr & (ALX_ISR_TX_Q0 | ALX_ISR_RX_Q0)) {
@@ -742,6 +744,17 @@ static netdev_features_t alx_fix_features(struct net_device *netdev,
 	return features;
 }
 
+static void alx_disable_link_check(struct alx_priv *alx)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&alx->irq_lock, flags);
+	alx->stop_link_check = true;
+	spin_unlock_irqrestore(&alx->irq_lock, flags);
+
+	cancel_work_sync(&alx->link_check_wk);
+}
+
 static void alx_netif_stop(struct alx_priv *alx)
 {
 	alx->dev->trans_start = jiffies;
@@ -756,6 +769,7 @@ static void alx_halt(struct alx_priv *alx)
 {
 	struct alx_hw *hw = &alx->hw;
 
+	alx_disable_link_check(alx);
 	alx_netif_stop(alx);
 	hw->link_speed = SPEED_UNKNOWN;
 	hw->duplex = DUPLEX_UNKNOWN;
@@ -788,6 +802,7 @@ static void alx_activate(struct alx_priv *alx)
 	/* clear old interrupts */
 	alx_write_mem32(&alx->hw, ALX_ISR, ~(u32)ALX_ISR_DIS);
 
+	alx->stop_link_check = false;
 	alx_irq_enable(alx);
 
 	alx_schedule_link_check(alx);
@@ -850,6 +865,7 @@ static int __alx_open(struct alx_priv *alx, bool resume)
 	/* clear old interrupts */
 	alx_write_mem32(&alx->hw, ALX_ISR, ~(u32)ALX_ISR_DIS);
 
+	alx->stop_link_check = false;
 	alx_irq_enable(alx);
 
 	if (!resume)
@@ -966,9 +982,7 @@ static void alx_link_check(struct work_struct *work)
 
 	alx = container_of(work, struct alx_priv, link_check_wk);
 
-	rtnl_lock();
 	alx_check_link(alx);
-	rtnl_unlock();
 }
 

^ permalink raw reply related

* Re: [net-next PATCH v1 01/11] net: flow_table: create interface for hw match/action tables
From: Or Gerlitz @ 2015-01-07 10:07 UTC (permalink / raw)
  To: John Fastabend
  Cc: Thomas Graf, Scott Feldman, Jiří Pírko,
	Jamal Hadi Salim, simon.horman, Linux Netdev List, David Miller,
	Andy Gospodarek
In-Reply-To: <54AADEFF.3090306@gmail.com>

On Mon, Jan 5, 2015 at 8:59 PM, John Fastabend <john.fastabend@gmail.com> wrote:
>>> +struct sk_buff *net_flow_build_actions_msg(struct net_flow_action **a,
>>> +                                          struct net_device *dev,
>>> +                                          u32 portid, int seq, u8 cmd)
>>> +{
>>> +       struct genlmsghdr *hdr;
>>> +       struct sk_buff *skb;
>>> +       int err = -ENOBUFS;
>>> +
>>> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>
>>
>> genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
>
>
> fixed along with the other cases.

small nit here, net_flow_build_actions_msg can be made static, it's
called only from within this file

few more nits... checkpatch --strict produces bunch of "CHECK: Please
use a blank line after function/struct/union/enum declarations"
comments, I guess worth fixing too.

^ permalink raw reply

* RE: [PATCH 1/1] update ip-sysctl.txt documentation
From: David Laight @ 2015-01-07 10:11 UTC (permalink / raw)
  To: 'Ani Sinha', corbet@lwn.net, davem@davemloft.net,
	edumazet@google.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, P@draigbrady.com,
	netdev@vger.kernel.org, fruggeri@arista.com
In-Reply-To: <1420587382-24612-1-git-send-email-ani@arista.com>

From: Ani Sinha
> Update documentation to reflect the fact that
> /proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
> 
> Signed-off-by: Ani Sinha <ani@arista.com>
> ---
>  Documentation/networking/ip-sysctl.txt |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 9bffdfc..2a28261 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
>  	Default: 0
> 
>  route/max_size - INTEGER
> -	Maximum number of routes allowed in the kernel.  Increase
> -	this when using large numbers of interfaces and/or routes.
> +	Post linux kernel 3.6, this is deprecated for ipv4 as route cache is no
> +	longer used. For ipv6, this is used to limit the maximum number of ipv6
> +	routes allowed in the kernel.  Increase this when using large numbers of
> +	interfaces and/or routes.

Now imagine you are reading ip-sysctl.txt, the new text is all wrong.
Something like:
	Limit on the size of the ip route caches.
	Ignored for ipv4 after kernel 3.6 because the ipv4 route cache was removed.
	Increase this when using large numbers of interfaces and/or routes.
would read better.

	David

^ permalink raw reply

* Re: [PATCH net-next] net: sched: use pinned timers
From: Cosmin GIRADU @ 2015-01-07 10:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20140926.002710.1595423408687413602.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]

On 26/09/14 07:27, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 20 Sep 2014 18:01:30 -0700
>
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While using a MQ + NETEM setup, I had confirmation that the default
>> timer migration ( /proc/sys/kernel/timer_migration ) is killing us.
>>
>> Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
>> queues) :
>  ...
>> Current Qdiscs that benefit from this change are :
>>
>> 	netem, cbq, fq, hfsc, tbf, htb.
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Looks great, applied, thanks Eric.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Hi Eric,

    I saw that this patch didn't make it's way into the stable branches.
So I have two questions:
        - Would it be safe to apply to linux-3.12.x stable?
        - If yes, would there be any [noticeable] efects on a [pretty
complex] HTB setup? (I know, test and I'll see,
           but if theory sais I won't, then there would be no point to
the test, would there?)

Thank you!

-- 

Cosmin GIRADU
OSS Engineer
RCS & RDS - Unified Services
Phone:  +40-31-400-6323
Mobile: +40-77-020-0858
http://www.rcs-rds.ro

..........................................................................
Privileged/Confidential Information may be contained in this message. If
you are not the addressee indicated in this message (or responsible for
delivery of the message to such person), you may not copy or deliver
this message to anyone. In such a case, you should destroy this message
and kindly notify the sender by reply e-mail.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply

* [PATCH] wireless extensions should default to Y
From: Pavel Machek @ 2015-01-07 10:19 UTC (permalink / raw)
  To: jikos, kernel list, Linus Torvalds
  Cc: johannes, davem, linux-wireless, netdev

Do we need following patch on top of
24a0aa212ee2dbe44360288684478d76a8e20a0a revert?

I updated kernel today, and (probably because extensions were not
selectable before), the default choice was "N", which is wrong:
oldconfig should result in compatible choices being made, for example
to help bisect.

Signed-off-by: Pavel Machek <pavel@ucw.cz>

diff --git a/net/wireless/Kconfig b/net/wireless/Kconfig
index 29c8675..8951dba 100644
--- a/net/wireless/Kconfig
+++ b/net/wireless/Kconfig
@@ -177,6 +177,7 @@ config CFG80211_INTERNAL_REGDB
 config CFG80211_WEXT
 	bool "cfg80211 wireless extensions compatibility"
 	depends on CFG80211
+	default y
 	select WEXT_CORE
 	help
 	  Enable this option if you need old userspace for wireless


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply related

* Re: [PATCH iproute2 -next] ip: route: add congestion control metric
From: Daniel Borkmann @ 2015-01-07 10:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: fw, netdev
In-Reply-To: <20150106170926.5bb2d816@urahara>

On 01/07/2015 02:09 AM, Stephen Hemminger wrote:
> On Wed,  7 Jan 2015 00:52:37 +0100
> Daniel Borkmann <dborkman@redhat.com> wrote:
>
>> +		} else if (matches(*argv, "congctl") == 0) {
>> +			char cc[16];
>> +			NEXT_ARG();
>> +			memset(cc, 0, sizeof(cc));
>> +			if (strcmp(*argv, "lock") == 0) {
>> +				mxlock |= (1<<RTAX_CC_ALGO);
>
> Unneeded paren

Yep, I kept it consistent across all mxlock assignments of this file,
but I can remove it, sure.

>> +				NEXT_ARG();
>> +			}
>> +			strncpy(cc, *argv, sizeof(cc) - 1);
>> +			if (strlen(cc) == 0)
>> +				invarg("\"conctl\" value must be an algorithm name\n", *argv
>
> Silently truncating the string is not odd. Can't we just let kernel impose
> length restrictions.

Sure, will respin, thanks.

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] rhashtable: initialize atomic nelems variable
From: Ying Xue @ 2015-01-07 10:26 UTC (permalink / raw)
  To: Thomas Graf
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <20150107095332.GK21820@casper.infradead.org>

On 01/07/2015 05:53 PM, Thomas Graf wrote:
> On 01/07/15 at 01:41pm, Ying Xue wrote:
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
> 
> Is this really needed at all? We initialize the full rhashtable
> struct to 0 in rhashtable_init().
> 
> 

I am not sure whether we really need to reinitialize atomic variable
again although we have reset it with memset() or something else. But I
see many places in kernel where we do this, for example:

Although we use kmem_cache_zalloc() to allocate "net" structure instance
in net_alloc(), there are still several places where to reinitialize its
atomic variables again:

setup_net()
  atomic_set(&net->use_count, 0);

rt_genid_init()
  atomic_set(&net->ipv4.rt_genid, 0);
  atomic_set(&net->fnhe_genid, 0);

Can you please definitely confirm that the reinitialisation is redundant
for us?

Regards,
Ying

^ permalink raw reply

* Re: [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
From: Thomas Graf @ 2015-01-07 10:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev@openvswitch.org
In-Reply-To: <CA+mtBx_mp1mf_HPp5u9wDfgeOc8pt26Mf1VHFZTAqDhTdZe7cw@mail.gmail.com>

On 01/06/15 at 07:46pm, Tom Herbert wrote:
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The VXLAN receive code is currently conservative in what it accepts and
> > will reject any frame that uses any of the reserved VXLAN protocol fields.
> > The VXLAN draft specifies that "reserved fields MUST be set to zero on
> > transmit and ignored on receive.".
> >
> IMO it is an unfortunate decision in VXLAN to ignore set reserved
> fields on receive. Had they not done this, then adding a protocol
> field to the header would have been feasible and we wouldn't need yet
> another encapsulation protocol (i.e. VXLAN-GPE). Rejecting frames with
> reserved bits set is the better behavior, but I think the comment
> about this needs to be clear about why this diverges from RFC7348.

Agreed. Do you want to give it a shot? I know you have been involved
on all aspects of this topic for a long time in NVO3 ;-)

> > Retain the current conservative parsing behaviour by default but allows
> > these fields to be used by VXLAN extensions which are explicitly enabled
> > on the VXLAN socket respectively VXLAN net_device.
> >
> Conceptually, VXLAN has both mandatory flags and optional flags for
> extensions. You may want to look at the VXLAN RCO patches that added
> an extension and infrastructure for them.

I've seen your proposed changes. I think they could be easily rebased
on top of this and use the enablement infrastructure. In fact, I see
this as the only feasible option to deal with VXLAN extensions: Leave
it to the user to decide which extensions to run. That way we can
support any combination of extensions, even conflicting ones. All we
have to do is catch incompatible extension configurations on a socket
and reject them at configuration time. Expecting that NVO3/IETF will
find consensus on a list of compatible set of extensions with perfect
forward and backwards compatibility is unrealistic in my eyes. We have
Geneve and GUE to solve it properly in the future.

^ permalink raw reply

* Re: [PATCH net-next v2 5/7] rhashtable: avoid unnecessary wakeup for worker queue
From: Ying Xue @ 2015-01-07 10:28 UTC (permalink / raw)
  To: Thomas Graf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <20150107095013.GJ21820@casper.infradead.org>

On 01/07/2015 05:50 PM, Thomas Graf wrote:
> On 01/07/15 at 01:41pm, Ying Xue wrote:
>> Move condition statements of verifying whether hash table size exceeds
>> its maximum threshold or reaches its minimum threshold from resizing
>> functions to resizing decision functions, avoiding unnecessary wakeup
>> for worker queue thread.
>>
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
> 
> Good optimization, thanks!
> 
> Acked-by: Thomas Graf <tgraf@suug.ch>
> 
> Can you do a follow-up patch and add a note in rhashtable.h to
> indicate the implementation of the grow and shrink decision function
> must enforce min/max shift?
> 
> 

Thanks for the reminder, and I will do this later.

Regards,
Ying


------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net

^ permalink raw reply

* Re: IPsec workshop at netdev01?
From: Steffen Klassert @ 2015-01-07 10:31 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Jamal Hadi Salim, Herbert Xu, David Miller
In-Reply-To: <20150106170026.GD11324@breakpoint.cc>

On Tue, Jan 06, 2015 at 06:00:26PM +0100, Florian Westphal wrote:
> Steffen Klassert <steffen.klassert@secunet.com> wrote:
> > - We still lack a 32/64 bit compatibiltiy layer for IPsec, this issue
> >   comes up from time to time. Some solutions were proposed in the past
> >   but all had problems. The current behaviour is broken if someone tries
> >   to configure IPsec with 32 bit tools on a 64 bit machine. Can we get
> >   this right somehow or is it better to just return an error in this case?
> 
> FWIW I think
> http://patchwork.ozlabs.org/patch/49465/
> 
> came closest to achieving full CONFIG_COMPAT support; since netlink is
> no longer async now I'm not sure we'd still need additonal 32-compat syscalls
> to make compat work for all cases.
> 
> So "its ugly as hell" is probably the only problem that is hard to avoid ;-)

Yeah, and it will be no fun to maintain it...

So the question is still, do we really need/want it or should we
tell that this is not supported. We just can't leave it as it is.
We allow to configure with 32 bit tools, but the result is crap.

^ permalink raw reply

* Re: [PATCH iproute2 -next] ip: route: add congestion control metric
From: Vadim Kochan @ 2015-01-07 10:22 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Stephen Hemminger, fw, netdev
In-Reply-To: <54AD08A9.9000205@redhat.com>

On Wed, Jan 07, 2015 at 11:21:29AM +0100, Daniel Borkmann wrote:
> On 01/07/2015 02:09 AM, Stephen Hemminger wrote:
> >On Wed,  7 Jan 2015 00:52:37 +0100
> >Daniel Borkmann <dborkman@redhat.com> wrote:
> >
> >>+		} else if (matches(*argv, "congctl") == 0) {
> >>+			char cc[16];
> >>+			NEXT_ARG();
> >>+			memset(cc, 0, sizeof(cc));
> >>+			if (strcmp(*argv, "lock") == 0) {
> >>+				mxlock |= (1<<RTAX_CC_ALGO);
> >
> >Unneeded paren
> 
 And what about spaces arount "<<" ?

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] rhashtable: initialize atomic nelems variable
From: Thomas Graf @ 2015-01-07 10:36 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, jon.maloy, Paul.Gortmaker, erik.hugne, netdev,
	tipc-discussion
In-Reply-To: <54AD09CD.3010100@windriver.com>

On 01/07/15 at 06:26pm, Ying Xue wrote:
> I am not sure whether we really need to reinitialize atomic variable
> again although we have reset it with memset() or something else. But I
> see many places in kernel where we do this, for example:
> 
> Although we use kmem_cache_zalloc() to allocate "net" structure instance
> in net_alloc(), there are still several places where to reinitialize its
> atomic variables again:
> 
> setup_net()
>   atomic_set(&net->use_count, 0);
> 
> rt_genid_init()
>   atomic_set(&net->ipv4.rt_genid, 0);
>   atomic_set(&net->fnhe_genid, 0);
> 
> Can you please definitely confirm that the reinitialisation is redundant
> for us?

I see examples for both, explicit initialization and dependence on
a previous memset. I'm not sure what is the preferred way.

I'll provide my ACK since this obviously doesn't break anything and
leave it up to Dave.

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* Re: [PATCH] wireless extensions should default to Y
From: Johannes Berg @ 2015-01-07 10:41 UTC (permalink / raw)
  To: Pavel Machek
  Cc: jikos-AlSwsSmVLrQ, kernel list, Linus Torvalds,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150107101903.GA6563@amd>

On Wed, 2015-01-07 at 11:19 +0100, Pavel Machek wrote:
> Do we need following patch on top of
> 24a0aa212ee2dbe44360288684478d76a8e20a0a revert?
> 
> I updated kernel today, and (probably because extensions were not
> selectable before), the default choice was "N", which is wrong:
> oldconfig should result in compatible choices being made, for example
> to help bisect.

I don't believe we need this. It has been defaulting to N for a long
time, it's just that it got thrown out of your config due to building
inbetween the patch and its revert. Had you built only before and after
that wouldn't be an issue.

If we default to Y now we send the wrong signal. If you really need to
bisect something wext related you have far bigger issues I'd think, the
code hasn't changed *forever*.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Hannes Frederic Sowa @ 2015-01-07 10:43 UTC (permalink / raw)
  To: Rahul Sharma; +Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel
In-Reply-To: <CAFB3abwKpR1VqZXUrsexqveAZrA2LTSmTe8a=-H7z2HjAptjwA@mail.gmail.com>

Hi,

On Mi, 2015-01-07 at 11:11 +0530, Rahul Sharma wrote:
> On Wed, Jan 7, 2015 at 4:17 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Wed, Jan 07, 2015 at 03:03:20AM +0530, Rahul Sharma wrote:
> >> ipv6_find_hdr() currently assumes that the next-header field in the
> >> fragment header of the non-first fragment is the "protocol number of
> >> the last header" (here last header excludes any extension header
> >> protocol numbers ) which is incorrect as per RFC2460. The next-header
> >> value is the first header of the fragmentable part of the original
> >> packet (which can be extension header as well).
> >> This can create reassembly problems. For example: Fragmented
> >> authenticated OSPFv3 packets (where AH header is inserted before the
> >> protocol header). For the second fragment, the next header value in
> >> the fragment header will be NEXTHDR_AUTH which is correct but
> >> ipv6_find_hdr will return ENOENT since AH is an extension header
> >> resulting in second fragment getting dropped. This check for the
> >> presence of non-extension header needs to be removed.
> >>
> >> Signed-off-by: Rahul Sharma <rsharma@arista.com>
> >> ---
> >> --- linux-3.18.1/net/ipv6/exthdrs_core.c.orig   2015-01-06
> >> 10:25:36.411419863 -0800
> >> +++ linux-3.18.1/net/ipv6/exthdrs_core.c        2015-01-06
> >> 10:51:45.819364986 -0800
> >> @@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
> >>   * If the first fragment doesn't contain the final protocol header or
> >>   * NEXTHDR_NONE it is considered invalid.
> >>   *
> >> - * Note that non-1st fragment is special case that "the protocol number
> >> - * of last header" is "next header" field in Fragment header. In this case,
> >> - * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
> >> - * isn't NULL.
> >> + * Note that non-1st fragment is special case that "the protocol number of the
> >> + * first header of the fragmentable part of the original packet" is
> >> + * "next header" field in the Fragment header. In this case, *offset is
> >> + * meaningless and fragment offset is stored in *fragoff if fragoff isn't
> >> + * NULL.
> >>   *
> >>   * if flags is not NULL and it's a fragment, then the frag flag
> >>   * IP6_FH_F_FRAG will be set. If it's an AH header, the
> >> @@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *
> >>
> >>                         _frag_off = ntohs(*fp) & ~0x7;
> >>                         if (_frag_off) {
> >> -                               if (target < 0 &&
> >> -                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||
> >
> > This check assumes that the following headers cannot show up in the
> > fragmented part of the IPv6 packet:
> >
> >  12 bool ipv6_ext_hdr(u8 nexthdr)
> >  13 {
> >  14         /*
> >  15          * find out if nexthdr is an extension header or a protocol
> >  16          */
> >  17         return   (nexthdr == NEXTHDR_HOP)       ||
> >  18                  (nexthdr == NEXTHDR_ROUTING)   ||
> >  19                  (nexthdr == NEXTHDR_FRAGMENT)  ||
> >  20                  (nexthdr == NEXTHDR_AUTH)      ||
> >  21                  (nexthdr == NEXTHDR_NONE)      ||
> >  22                  (nexthdr == NEXTHDR_DEST);
> >
> >> -                                    hp->nexthdr == NEXTHDR_NONE)) {
> >> +                               if (target < 0) {
> >>                                         if (fragoff)
> >>                                                 *fragoff = _frag_off;
> >>                                         return hp->nexthdr;
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> I think this is incorrect. Authentication header shows up in the
> fragmentable part of the original IPv6 packet. So, for the non-first
> fragments the next-header field value can be NEXTHDR_AUTH.

Pablo's mail got me thinking again.

In general, IPv6 extension headers can appear in any order and stacks
must be process them. Fragmentation adds a limitation, that some
extension headers do not make sense and don't have any effect if they
appear after a fragmentation header (HbH and ROUTING).

Looking at the rest of the function we don't check for HBHHDR or RTHDR
following a fragmentation header either if we process the first fragment
(core stack only processes HBH if directly following the ipv6 header
anyway).

So, in my opinion, it is safe to completely remove this check and it
would align if the rest of the extension processing logic. The callers
all seem fine with that.

Pablo, what do you think?

Anyway, the patch does not apply cleanly, the patch header is mangled.
Could you check and send again?

Thanks,
Hannes

^ permalink raw reply

* Re: ipv6: oops in datagram.c line 260
From: Hannes Frederic Sowa @ 2015-01-07 10:45 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Chris Ruehl, netdev, davem
In-Reply-To: <20150107072254.GA13046@secunet.com>

On Mi, 2015-01-07 at 08:22 +0100, Steffen Klassert wrote:
> On Tue, Jan 06, 2015 at 05:01:13PM +0100, Hannes Frederic Sowa wrote:
> > 
> > xfrm6_output_finish unconditionally resets skb->protocol so we try to
> > dispatch to the IPv6 handler, even though tcp just sends an IPv4 packet.
> 
> Maybe we better dispatch based on sk->sk_family, this should give
> always the right address family of the socket.

The original problem was dealing with IPv4/v6 mapped traffic. Processing
local errors from unconnected UDP sockets which are emitting both IPv4
and IPv6 frames won't play nicely with sk->sk_family I fear.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Thomas Graf @ 2015-01-07 11:01 UTC (permalink / raw)
  To: David Laight
  Cc: 'Alexei Starovoitov', David S. Miller, Jesse Gross,
	Stephen Hemminger, Pravin Shelar, netdev@vger.kernel.org,
	dev@openvswitch.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1CAC2740@AcuExch.aculab.com>

On 01/07/15 at 10:03am, David Laight wrote:
> From: Alexei Starovoitov
> > On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > > +struct vxlan_gbp {
> > > +#ifdef __LITTLE_ENDIAN_BITFIELD
> > > +       __u8    reserved_flags1:3,
> > ...
> > > +       __be16 policy_id;
> > > +} __packed;
> > 
> > are you sure that compiler will be smart enough
> > to do single short load when you pack
> > u8 + struct vxlan_gbp inside struct vxlanhdr ?
> > I suspect compiler will use two byte loads
> > with shifts and ors every time to access policy_id.
> > Even it works ok, I think this struct layout is ugly.
> > imo would be much easier to read if you replace
> > the whole vxlanhdr with vxlanhdr_gbp
> > or split vxlanhdr into two 32-bit structs.
> > then __packed hacks won't be needed.

If I read objdump output correctly, gcc seems fine with it:

        /* For backwards compatibility, only allow reserved fields to be
         * used by VXLAN extensions if explicitly requested.
         */
        if (vs->exts) {
                if (!vxh->vni_present)
    2640:       41 0f b6 55 08          movzbl 0x8(%r13),%edx
    2645:       f6 c2 08                test   $0x8,%dl
    2648:       74 c2                   je     260c <vxlan_udp_encap_recv+0x9c>
    [...]
                        md.gbp = ntohs(vxh->gbp.policy_id);
    2652:       41 0f b7 55 0a          movzwl 0xa(%r13),%edx

Let me know what I have to do/provide to validate this properly.

> Also, if you are writing the values then you need to write
> all the members of the bitfield in order to get a single write.
> 
> Basically you are much better off using explicit masks.

I went back and forth on this and chose to use individual bit fields
and map them to a static bit definition which is exported via Netlink.
That way the user space Netlink interface remains stable should the
wire protocol ever change. Yes, this implies some branching which could
be avoided right now as long as user and wire protocol are identical. I
did not observe any performance differences in benchmarks though.

^ permalink raw reply

* Re: [PATCH] wireless extensions should default to Y
From: Pavel Machek @ 2015-01-07 11:03 UTC (permalink / raw)
  To: Johannes Berg
  Cc: jikos, kernel list, Linus Torvalds, davem, linux-wireless, netdev
In-Reply-To: <1420627315.3407.2.camel@sipsolutions.net>

On Wed 2015-01-07 11:41:55, Johannes Berg wrote:
> On Wed, 2015-01-07 at 11:19 +0100, Pavel Machek wrote:
> > Do we need following patch on top of
> > 24a0aa212ee2dbe44360288684478d76a8e20a0a revert?
> > 
> > I updated kernel today, and (probably because extensions were not
> > selectable before), the default choice was "N", which is wrong:
> > oldconfig should result in compatible choices being made, for example
> > to help bisect.
> 
> I don't believe we need this. It has been defaulting to N for a long
> time, it's just that it got thrown out of your config due to building
> inbetween the patch and its revert. Had you built only before and after
> that wouldn't be an issue.

Well, I clearly hit the issue. If someone had _not_ build in between,
the "default y" does not change anything for him, as he will not be
asked thequestion. If someone starts config from scratch, Y is the
safe answer.

> If we default to Y now we send the wrong signal. If you really need to
> bisect something wext related you have far bigger issues I'd think, the
> code hasn't changed *forever*.

I am woried about bisecting something unrelated, and then wext coming
and breaking the bisect. But you are right that it will break the
bisect, anyway...

                                                                        Pavel
									
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH] wireless extensions should default to Y
From: Johannes Berg @ 2015-01-07 11:06 UTC (permalink / raw)
  To: Pavel Machek
  Cc: jikos-AlSwsSmVLrQ, kernel list, Linus Torvalds,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20150107110323.GB7368@amd>

On Wed, 2015-01-07 at 12:03 +0100, Pavel Machek wrote:
> On Wed 2015-01-07 11:41:55, Johannes Berg wrote:
> > On Wed, 2015-01-07 at 11:19 +0100, Pavel Machek wrote:
> > > Do we need following patch on top of
> > > 24a0aa212ee2dbe44360288684478d76a8e20a0a revert?
> > > 
> > > I updated kernel today, and (probably because extensions were not
> > > selectable before), the default choice was "N", which is wrong:
> > > oldconfig should result in compatible choices being made, for example
> > > to help bisect.
> > 
> > I don't believe we need this. It has been defaulting to N for a long
> > time, it's just that it got thrown out of your config due to building
> > inbetween the patch and its revert. Had you built only before and after
> > that wouldn't be an issue.
> 
> Well, I clearly hit the issue. If someone had _not_ build in between,
> the "default y" does not change anything for him, as he will not be
> asked thequestion. If someone starts config from scratch, Y is the
> safe answer.

We've long stated that if really starting from scratch it's the wrong
thing to do and had a default N, so I don't really know why we'd revert
that only to make a short period of bisect slightly happier?

I really don't want to merge this patch and encourage more people to
build with wext enabled.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox