Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH next-next 0/2] can: cc770: add support for the Bosch CC770 and Intel AN82527
From: David Miller @ 2011-11-30 21:07 UTC (permalink / raw)
  To: socketcan; +Cc: wg, netdev, linux-can, socketcan-users, boir1, stanislavelensky
In-Reply-To: <4ED5CF14.4020502@hartkopp.net>

From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Wed, 30 Nov 2011 07:37:08 +0100

> On 30.11.2011 00:39, David Miller wrote:
> 
>> From: Wolfgang Grandegger <wg@grandegger.com>
>> Date: Thu, 24 Nov 2011 13:07:26 +0100
>> 
>>> Already since a while we have support for the Bosch CC770 and Intel
>>> AN82527 CAN controllers in our out-of-tree Socket-CAN repository.
>>> It would be nice if somebody could test the driver. Unfortunately,
>>> I currently do not have hardware at hand. I have not yet ported the
>>> OF platform driver as it needs further attaention due to the merge
>>> of the platform and OF platform interface.
>>>
>>> Wolfgang Grandegger (2):
>>>   can: cc770: add driver core for the Bosch CC770 and Intel AN82527
>>>   can: cc770: legacy CC770 ISA bus driver
>> 
>> All applied, thank you.
> 
> 
> Hello Dave,
> 
> this patchset was superseded for some time.
> 
> We're currently in the v4 patchset (dated 2011-11-29).
> 
> See http://patchwork.ozlabs.org/patch/128244/
> 
> Do you want to revert the commit or should Wolfgang post a diff patch to get
> to v4 ??

A not very good job was done here communicating to me what is happening.
Endless revisions, and not clear indication to me what should or should
not be applied as a result.

I want you guys to appoint someone to be the defacto CAN driver and
subsystem maintainer who collects and merges all the driver and
protocol patches into his tree, and acts as the one and only interface
for me when changes are ready to be included.

I sorted this out by reverting the older changes and applying V5.

But I had to fix things up, when applying patch #3 there are empty
trailing lines in some of the new files generated, and that causes
git to complain.

^ permalink raw reply

* Re: [PATCH next-next 0/2] can: cc770: add support for the Bosch CC770 and Intel AN82527
From: David Miller @ 2011-11-30 21:14 UTC (permalink / raw)
  To: socketcan; +Cc: wg, netdev, linux-can, socketcan-users, boir1, stanislavelensky
In-Reply-To: <20111130.160727.1989923062226789802.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 30 Nov 2011 16:07:27 -0500 (EST)

> I sorted this out by reverting the older changes and applying V5.
> 
> But I had to fix things up, when applying patch #3 there are empty
> trailing lines in some of the new files generated, and that causes
> git to complain.

Actually, I had to revert, it doesn't even compile.  I'm really pissed
off at how this patch set is being handled, this should never happen.

It's not damn secret how I smoke test everyone's changes, I "allmodconfig"
and type "make".

drivers/net/can/cc770/cc770_platform.c: In function ‘cc770_platform_read_reg’:
drivers/net/can/cc770/cc770_platform.c:68:2: error: implicit declaration of function ‘in_8’ [-Werror=implicit-function-declaration]
drivers/net/can/cc770/cc770_platform.c: In function ‘cc770_platform_write_reg’:
drivers/net/can/cc770/cc770_platform.c:74:2: error: implicit declaration of function ‘out_8’ [-Werror=implicit-function-declaration]
drivers/net/can/cc770/cc770_platform.c: At top level:
drivers/net/can/cc770/cc770_platform.c:78:17: warning: ‘struct platform_device’ declared inside parameter list [enabled by default]
drivers/net/can/cc770/cc770_platform.c:78:17: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
drivers/net/can/cc770/cc770_platform.c: In function ‘cc770_get_of_node_data’:
drivers/net/can/cc770/cc770_platform.c:80:31: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:85:2: error: implicit declaration of function ‘of_get_property’ [-Werror=implicit-function-declaration]
drivers/net/can/cc770/cc770_platform.c:85:7: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/can/cc770/cc770_platform.c:119:7: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/can/cc770/cc770_platform.c:128:9: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/can/cc770/cc770_platform.c:143:4: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c: At top level:
drivers/net/can/cc770/cc770_platform.c:151:18: warning: ‘struct platform_device’ declared inside parameter list [enabled by default]
drivers/net/can/cc770/cc770_platform.c: In function ‘cc770_get_platform_data’:
drivers/net/can/cc770/cc770_platform.c:154:42: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c: At top level:
drivers/net/can/cc770/cc770_platform.c:166:50: warning: ‘struct platform_device’ declared inside parameter list [enabled by default]
drivers/net/can/cc770/cc770_platform.c: In function ‘cc770_platform_probe’:
drivers/net/can/cc770/cc770_platform.c:175:2: error: implicit declaration of function ‘platform_get_resource’ [-Werror=implicit-function-declaration]
drivers/net/can/cc770/cc770_platform.c:175:6: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/can/cc770/cc770_platform.c:176:2: error: implicit declaration of function ‘platform_get_irq’ [-Werror=implicit-function-declaration]
drivers/net/can/cc770/cc770_platform.c:181:7: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:203:10: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:204:3: warning: passing argument 1 of ‘cc770_get_of_node_data’ from incompatible pointer type [enabled by default]
drivers/net/can/cc770/cc770_platform.c:77:22: note: expected ‘struct platform_device *’ but argument is of type ‘struct platform_device *’
drivers/net/can/cc770/cc770_platform.c:205:15: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:206:3: warning: passing argument 1 of ‘cc770_get_platform_data’ from incompatible pointer type [enabled by default]
drivers/net/can/cc770/cc770_platform.c:150:22: note: expected ‘struct platform_device *’ but argument is of type ‘struct platform_device *’
drivers/net/can/cc770/cc770_platform.c:212:2: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:218:23: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:219:2: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:223:16: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c: At top level:
drivers/net/can/cc770/cc770_platform.c:240:51: warning: ‘struct platform_device’ declared inside parameter list [enabled by default]
drivers/net/can/cc770/cc770_platform.c: In function ‘cc770_platform_remove’:
drivers/net/can/cc770/cc770_platform.c:242:48: error: dereferencing pointer to incomplete type
drivers/net/can/cc770/cc770_platform.c:250:6: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/can/cc770/cc770_platform.c: At top level:
drivers/net/can/cc770/cc770_platform.c:256:42: error: array type has incomplete element type
drivers/net/can/cc770/cc770_platform.c:257:2: error: field name not in record or union initializer
drivers/net/can/cc770/cc770_platform.c:257:2: error: (near initialization for ‘cc770_platform_table’)
drivers/net/can/cc770/cc770_platform.c:258:2: error: field name not in record or union initializer
drivers/net/can/cc770/cc770_platform.c:258:2: error: (near initialization for ‘cc770_platform_table’)
drivers/net/can/cc770/cc770_platform.c:262:15: error: variable ‘cc770_platform_driver’ has initializer but incomplete type
drivers/net/can/cc770/cc770_platform.c:263:2: error: unknown field ‘driver’ specified in initializer
drivers/net/can/cc770/cc770_platform.c:263:2: error: extra brace group at end of initializer
drivers/net/can/cc770/cc770_platform.c:263:2: error: (near initialization for ‘cc770_platform_driver’)
drivers/net/can/cc770/cc770_platform.c:267:2: warning: excess elements in struct initializer [enabled by default]
drivers/net/can/cc770/cc770_platform.c:267:2: warning: (near initialization for ‘cc770_platform_driver’) [enabled by default]
drivers/net/can/cc770/cc770_platform.c:268:2: error: unknown field ‘probe’ specified in initializer
drivers/net/can/cc770/cc770_platform.c:268:2: warning: excess elements in struct initializer [enabled by default]
drivers/net/can/cc770/cc770_platform.c:268:2: warning: (near initialization for ‘cc770_platform_driver’) [enabled by default]
drivers/net/can/cc770/cc770_platform.c:269:2: error: unknown field ‘remove’ specified in initializer
drivers/net/can/cc770/cc770_platform.c:269:2: warning: excess elements in struct initializer [enabled by default]
drivers/net/can/cc770/cc770_platform.c:269:2: warning: (near initialization for ‘cc770_platform_driver’) [enabled by default]
drivers/net/can/cc770/cc770_platform.c:272:1: warning: data definition has no type or storage class [enabled by default]
drivers/net/can/cc770/cc770_platform.c:272:1: warning: type defaults to ‘int’ in declaration of ‘module_platform_driver’ [-Wimplicit-int]
drivers/net/can/cc770/cc770_platform.c:272:1: warning: parameter names (without types) in function declaration [enabled by default]
drivers/net/can/cc770/cc770_platform.c:256:42: warning: ‘cc770_platform_table’ defined but not used [-Wunused-variable]
drivers/net/can/cc770/cc770_platform.c:262:31: warning: ‘cc770_platform_driver’ defined but not used [-Wunused-variable]
cc1: some warnings being treated as errors

^ permalink raw reply

* Re: pull request: wireless 2011-11-30
From: David Miller @ 2011-11-30 21:21 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20111130193907.GB2477@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 30 Nov 2011 14:39:07 -0500

> There is an nl80211 fix for validating user input to prevent a buffer
> underrun, a mac80211 race fix to prevent a WARN_ON, a mac80211 fix to
> avoid trying to stop a single aggregation session twice, a mac80211 fix
> for a race condition that could cause a crash when an addBA response
> came late, the partial reversion of an ath9k change that broke wireless
> on Ferrari One laptops, and a fix for an rtlwifi deadlock.
> 
> Please let me know if there are problems!

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] tcp: inherit listener congestion control for passive cnx
From: Rick Jones @ 2011-11-30 21:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Yuchung Cheng, Stephen Hemminger
In-Reply-To: <1322650961.2403.13.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

>
> Rick Jones reported that TCP_CONGESTION sockopt performed on a listener
> was ignored for its children sockets : right after accept() the
> congestion control for new socket is the system default one.
>
> This seems an oversight of the initial design (quoted from Stephen)
>
> Based on prior investigation and patch from Rick.
>
> Reported-by: Rick Jones<rick.jones2@hp.com>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> CC: Stephen Hemminger<shemminger@vyatta.com>
> CC: Yuchung Cheng<ycheng@google.com>

Tested-by: Rick Jones <rick.jones2@hp.com>

Works as expected/desired with my test program, and doesn't appear to 
affect performance measurably (loopback netperf TCP_CC test on a 1 VCPU 
guest).

rick jones

^ permalink raw reply

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
From: Ben Hutchings @ 2011-11-30 21:34 UTC (permalink / raw)
  To: Chris Wright
  Cc: Greg Rose, Roopa Prabhu, netdev@vger.kernel.org,
	davem@davemloft.net, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <20111130210454.GC29071@x200.localdomain>

On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
> * Ben Hutchings (bhutchings@solarflare.com) wrote:
> > On Wed, 2011-11-30 at 09:34 -0800, Greg Rose wrote:
> > > On 11/29/2011 9:19 AM, Ben Hutchings wrote:
> > > > On Tue, 2011-11-29 at 16:35 +0000, Ben Hutchings wrote:
> > > >>
> > > >> Maybe I missed something!
> > [...]
> > > >> If not, please explain what the new model *is*.
> > > 
> > > The new model is to incorporate a VEB into the NIC.  The current model 
> > > doesn't address any of the requirements of a VEB in the NIC and this 
> > > proposed set of patches allow us to set MAC filters for the *ports* on 
> > > the internal NIC VEB.  Consider the PF and each of the VFs as just a 
> > > port on the VEB.  We need the ability to set L2 filters (MAC, MC and 
> > > VLAN) for each of the ports on that VEB.  There is no currently 
> > > supported method for doing this.  So yes, this is a new model although 
> > > it's a fairly simple one.
> > 
> > Explain precisely how the VEB changes the existing model.  Explain how
> > the existing MAC filter and VF filter APIs interact with port filters on
> > the VEB.  Refer to any relevant standards.
> 
> I agree that it's confusing.  Couldn't you simplify your ascii art
> (hopefully removing hw assumptions about receive processing, and
> completely ignoring vlans for the moment) to something like:
>
>              |RX
>              v
> +------------+-------------+
> |     +------+--------+    |
> |     | RX MAC filter |    |
> |     |and port select|    |
> |     +---------------+    |
> |            /|\           |
> |           / | \   match 2|
> |          /  v  \         |
> |         /match  \        |
> |        /  1 |    \       |
> |       /     |     \      |
> |match /      |      \     |
> |  0  /       |       \    |
> |    v        |        v   |
> |    |        |        |   |
> +----+--------+--------+---+
>      |        |        |
>     PF       VF 1     VF 2
> 
> And there's an unclear number of ways to update "RX MAC filter and port
> select" table.
> 
> 1) PF ndo_set_mac_addr
> I expect that to be implicit to match 0.
> 
> 2) PF ndo_set_rx_mode
> Less clear, but I'd still expect these to implicitly match 0
> 
> 3) PF ndo_set_vf_mac
> I expect these to be an explicit match to VF N (given the interface
> specifices which VF's MAC is being programmed).

I'm not sure whether this is supposed to implicitly add to the MAC
filter or whether that has to be changed too.  That's the main
difference between my models (a) and (b).

There's also PF ndo_set_vf_vlan.

> 4) VF ndo_set_mac_addr
> This one may or may not be allowed (setting MAC+port if the VF is owned
> by a guest is likely not allowed), but would expect an implicit VF N.
> 
> 5) VF ndo_set_rx_mode
> Same as 4) above.

So this is where we are today.

> 6) PF or VF? ndo_set_rx_filter_addr
> The new proposal, which has an explicit VF, although when it's VF_SELF
> I'm not clear if this is just the same as 5) above?
> 
> Have I missed anything?

Any physical port can be bridged to a mixture of guests with and without
their own VFs.  Packets sent from a guest with a VF to the address of a
guest without a VF need to be forwarded to the PF rather than the
physical port, but none of the drivers currently get to know about those
addresses.

Packets sent from a guest with a VF to the address of another guest with
a VF need to be forwarded similarly, but the driver should be able to
infer that from (3).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] sch_red: fix red_calc_qavg_from_idle_time
From: Eric Dumazet @ 2011-11-30 21:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger, David Täht

Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to
6) it seems [G]RED are broken in red_calc_qavg_from_idle_time()

This function computes a delay in us units, but this delay is now 16
times bigger than real delay, so the final qavg result is wrong...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
Maybe we should use native kernel time services in red ?
(ktime_get(), ktime_us_delta())

 include/net/pkt_sched.h |    1 +
 include/net/red.h       |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fffdc60..c719d9b 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -43,6 +43,7 @@ typedef long	psched_tdiff_t;
 /* Avoid doing 64 bit divide */
 #define PSCHED_SHIFT			6
 #define PSCHED_TICKS2NS(x)		((s64)(x) << PSCHED_SHIFT)
+#define PSCHED_TICKS2US(x)		(PSCHED_TICKS2NS(x) >> 10) /* approximate 1000 by 1024 */
 #define PSCHED_NS2TICKS(x)		((x) >> PSCHED_SHIFT)
 
 #define PSCHED_TICKS_PER_SEC		PSCHED_NS2TICKS(NSEC_PER_SEC)
diff --git a/include/net/red.h b/include/net/red.h
index 3319f16..a413638 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -175,7 +175,7 @@ static inline unsigned long red_calc_qavg_from_idle_time(struct red_parms *p)
 	int  shift;
 
 	now = psched_get_time();
-	us_idle = psched_tdiff_bounded(now, p->qidlestart, p->Scell_max);
+	us_idle = PSCHED_TICKS2US(psched_tdiff_bounded(now, p->qidlestart, p->Scell_max));
 
 	/*
 	 * The problem: ideally, average length queue recalcultion should

^ permalink raw reply related

* Re: [PATCH] sch_red: fix red_calc_qavg_from_idle_time
From: Eric Dumazet @ 2011-11-30 21:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger, David Täht
In-Reply-To: <1322688891.2602.15.camel@edumazet-laptop>

Le mercredi 30 novembre 2011 à 22:34 +0100, Eric Dumazet a écrit :
> Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to
> 6) it seems [G]RED are broken in red_calc_qavg_from_idle_time()
> 
> This function computes a delay in us units, but this delay is now 16
> times bigger than real delay, so the final qavg result is wrong...
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> Maybe we should use native kernel time services in red ?
> (ktime_get(), ktime_us_delta())
> 
>  include/net/pkt_sched.h |    1 +
>  include/net/red.h       |    2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index fffdc60..c719d9b 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -43,6 +43,7 @@ typedef long	psched_tdiff_t;
>  /* Avoid doing 64 bit divide */
>  #define PSCHED_SHIFT			6
>  #define PSCHED_TICKS2NS(x)		((s64)(x) << PSCHED_SHIFT)
> +#define PSCHED_TICKS2US(x)		(PSCHED_TICKS2NS(x) >> 10) /* approximate 1000 by 1024 */
>  #define PSCHED_NS2TICKS(x)		((x) >> PSCHED_SHIFT)
>  
>  #define PSCHED_TICKS_PER_SEC		PSCHED_NS2TICKS(NSEC_PER_SEC)
> diff --git a/include/net/red.h b/include/net/red.h
> index 3319f16..a413638 100644
> --- a/include/net/red.h
> +++ b/include/net/red.h
> @@ -175,7 +175,7 @@ static inline unsigned long red_calc_qavg_from_idle_time(struct red_parms *p)
>  	int  shift;
>  
>  	now = psched_get_time();
> -	us_idle = psched_tdiff_bounded(now, p->qidlestart, p->Scell_max);
> +	us_idle = PSCHED_TICKS2US(psched_tdiff_bounded(now, p->qidlestart, p->Scell_max));

Hmm, this is not correct the bound should be applied on the us result,
not on tick I presume (one tick == 64ns)

I'll test another patch.

^ permalink raw reply

* Re: [PATCH] datapath: Fix build breakage on kernel 2.6.40
From: Jesse Gross @ 2011-11-30 21:55 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	aliguori-r/Jw6+rmf7HQT0dZR+AlfA, stefanha-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1322042318-4809-1-git-send-email-zwu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Wed, Nov 23, 2011 at 1:58 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> diff --git a/datapath/linux/compat/include/linux/skbuff.h b/datapath/linux/compat/include/linux/skbuff.h
> index 311bfdb..22ba2e6 100644
> --- a/datapath/linux/compat/include/linux/skbuff.h
> +++ b/datapath/linux/compat/include/linux/skbuff.h
> @@ -239,7 +239,7 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
>  }
>  #endif
>
> -#if LINUX_VERSION_CODE < KERNEL_VERSION(3,0,0)
> +#ifndef HAVE_SKB_RESET_MAC_LEN

2.6.40 is the early name for 3.0.  Does it work if you just replace
the check with KERNEL_VERSION(2,6,40)?
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply

* Re: [PATCH net-next] tcp: inherit listener congestion control for passive cnx
From: David Miller @ 2011-11-30 21:56 UTC (permalink / raw)
  To: rick.jones2; +Cc: eric.dumazet, netdev, ycheng, shemminger
In-Reply-To: <4ED6A019.2040708@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 30 Nov 2011 13:28:57 -0800

>>
>> Rick Jones reported that TCP_CONGESTION sockopt performed on a
>> listener
>> was ignored for its children sockets : right after accept() the
>> congestion control for new socket is the system default one.
>>
>> This seems an oversight of the initial design (quoted from Stephen)
>>
>> Based on prior investigation and patch from Rick.
>>
>> Reported-by: Rick Jones<rick.jones2@hp.com>
>> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
>> CC: Stephen Hemminger<shemminger@vyatta.com>
>> CC: Yuchung Cheng<ycheng@google.com>
> 
> Tested-by: Rick Jones <rick.jones2@hp.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH 1/2] net/hyperv: Fix long lines in netvsc.c
From: Greg KH @ 2011-11-30 21:57 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: David Miller, KY Srinivasan, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, devel@linuxdriverproject.org
In-Reply-To: <A1F3067C9B68744AA19F6802BAB8FFDC0CCD01B1@TK5EX14MBXC227.redmond.corp.microsoft.com>

On Wed, Nov 30, 2011 at 07:13:49PM +0000, Haiyang Zhang wrote:
> > -----Original Message-----
> > From: David Miller [mailto:davem@davemloft.net]
> > Sent: Wednesday, November 30, 2011 12:36 PM
> > To: Haiyang Zhang
> > Cc: KY Srinivasan; gregkh@suse.de; linux-kernel@vger.kernel.org;
> > netdev@vger.kernel.org; devel@linuxdriverproject.org
> > Subject: Re: [PATCH 1/2] net/hyperv: Fix long lines in netvsc.c
> > 
> > 
> > Well, since Greg's tree is where the staging move happened, he'll have
> > to take any patches to this driver that happen now, rather than me.
> 
> Sounds good. And, these patches were sent to devel@linuxdriverproject.org
> too.
> 
> Greg, could you consider these patches for your tree?

Yes, I'll queue them up, thanks.

greg k-h

^ permalink raw reply

* Re: [PATCH net-next 3/3] caif: Remove unused enum and parameter in cfserl
From: David Miller @ 2011-11-30 22:04 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev
In-Reply-To: <1322680968-6470-3-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Date: Wed, 30 Nov 2011 20:22:48 +0100

>  			link_support = cfserl_create(dev->ifindex,
> -					CFPHYTYPE_FRAG, caifdev->use_stx);
> +							caifdev->use_stx);
>  			if (!link_support) {
>  				pr_warn("Out of memory\n");
>  				break;
 ...
> -struct cflayer *cfserl_create(int type, int instance, bool use_stx)
> +struct cflayer *cfserl_create(int instance, bool use_stx)
>  {

Looks like this code was always buggy, passing the index in the type parameter
and the type in the instance parameter.

^ permalink raw reply

* Re: [PATCH] bonding: only use primary address for ARP
From: Andy Gospodarek @ 2011-11-30 22:07 UTC (permalink / raw)
  To: Henrik Saavedra Persson; +Cc: netdev, fubar
In-Reply-To: <20111124093715.GA25365@as68121.uab.ericsson.se>

On Thu, Nov 24, 2011 at 10:37:15AM +0100, Henrik Saavedra Persson wrote:
> Only use the primary address of the bond device
> for master_ip. This will prevent changing the ARP source
> address in Active-Backup mode whenever a secondry address
> is added to the bond device.
> 
> Signed-off-by: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
> ---
>  drivers/net/bonding/bond_main.c |   33 ++++++---------------------------
>  1 files changed, 6 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 6191e63..ec293b3 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2551,30 +2551,6 @@ out:
>  	}
>  }
>  
> -static __be32 bond_glean_dev_ip(struct net_device *dev)
> -{
> -	struct in_device *idev;
> -	struct in_ifaddr *ifa;
> -	__be32 addr = 0;
> -
> -	if (!dev)
> -		return 0;
> -
> -	rcu_read_lock();
> -	idev = __in_dev_get_rcu(dev);
> -	if (!idev)
> -		goto out;
> -
> -	ifa = idev->ifa_list;
> -	if (!ifa)
> -		goto out;
> -
> -	addr = ifa->ifa_local;
> -out:
> -	rcu_read_unlock();
> -	return addr;
> -}
> -
>  static int bond_has_this_ip(struct bonding *bond, __be32 ip)
>  {
>  	struct vlan_entry *vlan;
> @@ -3316,6 +3292,10 @@ static int bond_inetaddr_event(struct notifier_block *this, unsigned long event,
>  	struct bonding *bond;
>  	struct vlan_entry *vlan;
>  
> +	/* we only care about primary address */
> +	if(ifa->ifa_flags & IFA_F_SECONDARY)
> +		return NOTIFY_DONE;
> +

This change is good.  We need to stop overwriting bond->master_ip when
secondary addresses are added.

>  	list_for_each_entry(bond, &bn->dev_list, bond_list) {
>  		if (bond->dev == event_dev) {
>  			switch (event) {
> @@ -3323,7 +3303,7 @@ static int bond_inetaddr_event(struct notifier_block *this, unsigned long event,
>  				bond->master_ip = ifa->ifa_local;
>  				return NOTIFY_OK;
>  			case NETDEV_DOWN:
> -				bond->master_ip = bond_glean_dev_ip(bond->dev);
> +				bond->master_ip = 0;
>  				return NOTIFY_OK;
>  			default:
>  				return NOTIFY_DONE;
> @@ -3339,8 +3319,7 @@ static int bond_inetaddr_event(struct notifier_block *this, unsigned long event,
>  					vlan->vlan_ip = ifa->ifa_local;
>  					return NOTIFY_OK;
>  				case NETDEV_DOWN:
> -					vlan->vlan_ip =
> -						bond_glean_dev_ip(vlan_dev);
> +					vlan->vlan_ip = 0;
>  					return NOTIFY_OK;
>  				default:
>  					return NOTIFY_DONE;

Inspection indicates this is safe too.  The only code that appears to
check against master_up and vlan_ip appears to be in ARP monitoring code
and it does not make much sense to use ARP monitoring when your host does
not have an IP address.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

^ permalink raw reply

* Re: [PATCH v3] net: fec: Select the FEC driver by default for i.MX SoCs
From: David Miller @ 2011-11-30 22:08 UTC (permalink / raw)
  To: fabio.estevam
  Cc: netdev, marek.vasut, linux-arm-kernel, kernel, u.kleine-koenig,
	shawn.guo
In-Reply-To: <1322664138-3245-1-git-send-email-fabio.estevam@freescale.com>

From: Fabio Estevam <fabio.estevam@freescale.com>
Date: Wed, 30 Nov 2011 12:42:18 -0200

> Since commit 230dec6 (net/fec: add imx6q enet support) the FEC driver is no 
> longer built by default for i.MX SoCs.
> 
> Let the FEC driver be built by default again.
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
> Suggested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de
> Acked-by: Shawn Guo <shawn.guo@linaro.org>

Applied.

^ permalink raw reply

* [PATCH V2] sch_red: fix red_calc_qavg_from_idle_time
From: Eric Dumazet @ 2011-11-30 22:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger, David Täht
In-Reply-To: <1322689213.2602.18.camel@edumazet-laptop>

Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to
6) it seems RED/GRED are broken.

red_calc_qavg_from_idle_time() computes a delay in us units, but this
delay is now 16 times bigger than real delay, so the final qavg result
smaller than expected.

Use standard kernel time services since there is no need to obfuscate
them.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/red.h |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/net/red.h b/include/net/red.h
index 3319f16..b72a3b8 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -116,7 +116,7 @@ struct red_parms {
 	u32		qR;		/* Cached random number */
 
 	unsigned long	qavg;		/* Average queue length: A scaled */
-	psched_time_t	qidlestart;	/* Start of current idle period */
+	ktime_t		qidlestart;	/* Start of current idle period */
 };
 
 static inline u32 red_rmask(u8 Plog)
@@ -148,17 +148,17 @@ static inline void red_set_parms(struct red_parms *p,
 
 static inline int red_is_idling(struct red_parms *p)
 {
-	return p->qidlestart != PSCHED_PASTPERFECT;
+	return p->qidlestart.tv64 != 0;
 }
 
 static inline void red_start_of_idle_period(struct red_parms *p)
 {
-	p->qidlestart = psched_get_time();
+	p->qidlestart = ktime_get();
 }
 
 static inline void red_end_of_idle_period(struct red_parms *p)
 {
-	p->qidlestart = PSCHED_PASTPERFECT;
+	p->qidlestart.tv64 = 0;
 }
 
 static inline void red_restart(struct red_parms *p)
@@ -170,13 +170,10 @@ static inline void red_restart(struct red_parms *p)
 
 static inline unsigned long red_calc_qavg_from_idle_time(struct red_parms *p)
 {
-	psched_time_t now;
-	long us_idle;
+	s64 delta = ktime_us_delta(ktime_get(), p->qidlestart);
+	long us_idle = min_t(s64, delta, p->Scell_max);
 	int  shift;
 
-	now = psched_get_time();
-	us_idle = psched_tdiff_bounded(now, p->qidlestart, p->Scell_max);
-
 	/*
 	 * The problem: ideally, average length queue recalcultion should
 	 * be done over constant clock intervals. This is too expensive, so

^ permalink raw reply related

* Re: [PATCH] sch_teql: fix lockdep splat
From: David Miller @ 2011-11-30 22:11 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1322662138.2403.25.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Nov 2011 15:08:58 +0100

> We need rcu_read_lock() protection before using dst_get_neighbour(), and
> we must cache its value (pass it to __teql_resolve())
> 
> teql_master_xmit() is called under rcu_read_lock_bh() protection, its
> not enough.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] bnx2: Support for byte queue limits
From: David Miller @ 2011-11-30 22:11 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, therbert
In-Reply-To: <1322603585.2596.34.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 29 Nov 2011 22:53:05 +0100

> Changes to bnx2 to use byte queue limits.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Tom Herbert <therbert@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH 10/10] sfc: Support for byte queue limits
From: David Miller @ 2011-11-30 22:12 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev, bhutchings
In-Reply-To: <1322657043.2403.18.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Nov 2011 13:44:03 +0100

> [PATCH net-next] sfc: fix race in efx_enqueue_skb_tso()
> 
> As soon as skb is pushed to hardware, it can be completed and freed, so
> we should not dereference skb anymore.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply

* [PATCH v3 net-next 1/2] netem: rate extension
From: Hagen Paul Pfeifer @ 2011-11-30 22:20 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, Stephen Hemminger, Hagen Paul Pfeifer

Currently netem is not in the ability to emulate channel bandwidth. Only static
delay (and optional random jitter) can be configured.

To emulate the channel rate the token bucket filter (sch_tbf) can be used.  But
TBF has some major emulation flaws. The buffer (token bucket depth/rate) cannot
be 0. Also the idea behind TBF is that the credit (token in buckets) fills if
no packet is transmitted. So that there is always a "positive" credit for new
packets. In real life this behavior contradicts the law of nature where
nothing can travel faster as speed of light. E.g.: on an emulated 1000 byte/s
link a small IPv4/TCP SYN packet with ~50 byte require ~0.05 seconds - not 0
seconds.

Netem is an excellent place to implement a rate limiting feature: static
delay is already implemented, tfifo already has time information and the
user can skip TBF configuration completely.

This patch implement rate feature which can be configured via tc. e.g:

	tc qdisc add dev eth0 root netem rate 10kbit

To emulate a link of 5000byte/s and add an additional static delay of 10ms:

	tc qdisc add dev eth0 root netem delay 10ms rate 5KBps

Note: similar to TBF the rate extension is bounded to the kernel timing
system. Depending on the architecture timer granularity, higher rates (e.g.
10mbit/s and higher) tend to transmission bursts. Also note: further queues
living in network adaptors; see ethtool(8).

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/pkt_sched.h |    5 +++++
 net/sched/sch_netem.c     |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index c533670..26c37ca 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -465,6 +465,7 @@ enum {
 	TCA_NETEM_REORDER,
 	TCA_NETEM_CORRUPT,
 	TCA_NETEM_LOSS,
+	TCA_NETEM_RATE,
 	__TCA_NETEM_MAX,
 };
 
@@ -495,6 +496,10 @@ struct tc_netem_corrupt {
 	__u32	correlation;
 };
 
+struct tc_netem_rate {
+	__u32	rate;	/* byte/s */
+};
+
 enum {
 	NETEM_LOSS_UNSPEC,
 	NETEM_LOSS_GI,		/* General Intuitive - 4 state model */
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index eb3b9a8..9b7af9f 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -79,6 +79,7 @@ struct netem_sched_data {
 	u32 duplicate;
 	u32 reorder;
 	u32 corrupt;
+	u32 rate;
 
 	struct crndstate {
 		u32 last;
@@ -298,6 +299,11 @@ static psched_tdiff_t tabledist(psched_tdiff_t mu, psched_tdiff_t sigma,
 	return  x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
 }
 
+static psched_time_t packet_len_2_sched_time(unsigned int len, u32 rate)
+{
+	return PSCHED_NS2TICKS((u64)len * NSEC_PER_SEC / rate);
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -371,6 +377,24 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 				  &q->delay_cor, q->delay_dist);
 
 		now = psched_get_time();
+
+		if (q->rate) {
+			struct sk_buff_head *list = &q->qdisc->q;
+
+			delay += packet_len_2_sched_time(skb->len, q->rate);
+
+			if (!skb_queue_empty(list)) {
+				/*
+				 * Last packet in queue is reference point (now).
+				 * First packet in queue is already in flight,
+				 * calculate this time bonus and substract
+				 * from delay.
+				 */
+				delay -= now - netem_skb_cb(skb_peek(list))->time_to_send;
+				now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
+			}
+		}
+
 		cb->time_to_send = now + delay;
 		++q->counter;
 		ret = qdisc_enqueue(skb, q->qdisc);
@@ -535,6 +559,14 @@ static void get_corrupt(struct Qdisc *sch, const struct nlattr *attr)
 	init_crandom(&q->corrupt_cor, r->correlation);
 }
 
+static void get_rate(struct Qdisc *sch, const struct nlattr *attr)
+{
+	struct netem_sched_data *q = qdisc_priv(sch);
+	const struct tc_netem_rate *r = nla_data(attr);
+
+	q->rate = r->rate;
+}
+
 static int get_loss_clg(struct Qdisc *sch, const struct nlattr *attr)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
@@ -594,6 +626,7 @@ static const struct nla_policy netem_policy[TCA_NETEM_MAX + 1] = {
 	[TCA_NETEM_CORR]	= { .len = sizeof(struct tc_netem_corr) },
 	[TCA_NETEM_REORDER]	= { .len = sizeof(struct tc_netem_reorder) },
 	[TCA_NETEM_CORRUPT]	= { .len = sizeof(struct tc_netem_corrupt) },
+	[TCA_NETEM_RATE]	= { .len = sizeof(struct tc_netem_rate) },
 	[TCA_NETEM_LOSS]	= { .type = NLA_NESTED },
 };
 
@@ -666,6 +699,9 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_NETEM_CORRUPT])
 		get_corrupt(sch, tb[TCA_NETEM_CORRUPT]);
 
+	if (tb[TCA_NETEM_RATE])
+		get_rate(sch, tb[TCA_NETEM_RATE]);
+
 	q->loss_model = CLG_RANDOM;
 	if (tb[TCA_NETEM_LOSS])
 		ret = get_loss_clg(sch, tb[TCA_NETEM_LOSS]);
@@ -846,6 +882,7 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
 	struct tc_netem_corr cor;
 	struct tc_netem_reorder reorder;
 	struct tc_netem_corrupt corrupt;
+	struct tc_netem_rate rate;
 
 	qopt.latency = q->latency;
 	qopt.jitter = q->jitter;
@@ -868,6 +905,9 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
 	corrupt.correlation = q->corrupt_cor.rho;
 	NLA_PUT(skb, TCA_NETEM_CORRUPT, sizeof(corrupt), &corrupt);
 
+	rate.rate = q->rate;
+	NLA_PUT(skb, TCA_NETEM_RATE, sizeof(rate), &rate);
+
 	if (dump_loss_model(q, skb) != 0)
 		goto nla_put_failure;
 
-- 
1.7.7

^ permalink raw reply related

* [PATCH v3 net-next 2/2] netem: add cell concept to simulate special MAC behavior
From: Hagen Paul Pfeifer @ 2011-11-30 22:20 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, Stephen Hemminger, Hagen Paul Pfeifer
In-Reply-To: <1322691627-20551-1-git-send-email-hagen@jauu.net>

This extension can be used to simulate special link layer
characteristics. Simulate because packet data is not modified, only the
calculation base is changed to delay a packet based on the original
packet size and artificial cell information.

packet_overhead can be used to simulate a link layer header compression
scheme (e.g. set packet_overhead to -20) or with a positive
packet_overhead value an additional MAC header can be simulated. It is
also possible to "replace" the 14 byte Ethernet header with something
else.

cell_size and cell_overhead can be used to simulate link layer schemes,
based on cells, like some TDMA schemes. Another application area are MAC
schemes using a link layer fragmentation with a (small) header each.
Cell size is the maximum amount of data bytes within one cell. Cell
overhead is an additional variable to change the per-cell-overhead (e.g.
5 byte header per fragment).

Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per
cell overhead 5 byte):

	tc qdisc add dev eth0 root netem rate 5kbit 20 100 5

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
---

The actual version of packet_len_2_sched_time() address Eric's div/mod
instruction concerns. I benchmarked the version in the patch with the
following version:


	if (q->cell_size) {
		u32 mod_carry = len % q->cell_size;
		u32 cells     = len / q->cell_size;
		if (mod_carry)
			mod_carry = (len > q->cell_size || !cells) ?
				q->cell_size - mod_carry : len - mod_carry;

		if (q->cell_overhead) {
			if (mod_carry)
				++cells;
			len += cells * q->cell_overhead;
		}
		len += mod_carry;
	}
	return len;


The patch version is a little bit faster for "all" packet sizes. For common
cases (e.g. max. 1000 byte packets, cellsize 100 byte, the patch version
exhibit significant improvements). IMHO the actual version is also more
understandable. Replace div and mod by do_div() was not that successful.


 include/linux/pkt_sched.h |    3 +++
 net/sched/sch_netem.c     |   32 +++++++++++++++++++++++++++++---
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 26c37ca..63845cf 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -498,6 +498,9 @@ struct tc_netem_corrupt {
 
 struct tc_netem_rate {
 	__u32	rate;	/* byte/s */
+	__s32   packet_overhead;
+	__u32   cell_size;
+	__s32   cell_overhead;
 };
 
 enum {
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9b7af9f..bcd2b3f 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -80,6 +80,9 @@ struct netem_sched_data {
 	u32 reorder;
 	u32 corrupt;
 	u32 rate;
+	s32 packet_overhead;
+	u32 cell_size;
+	s32 cell_overhead;
 
 	struct crndstate {
 		u32 last;
@@ -299,9 +302,26 @@ static psched_tdiff_t tabledist(psched_tdiff_t mu, psched_tdiff_t sigma,
 	return  x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
 }
 
-static psched_time_t packet_len_2_sched_time(unsigned int len, u32 rate)
+static psched_time_t packet_len_2_sched_time(unsigned int len,
+					     struct netem_sched_data *q)
 {
-	return PSCHED_NS2TICKS((u64)len * NSEC_PER_SEC / rate);
+	u32 cells = 0;
+	u32 datalen;
+
+	len += q->packet_overhead;
+
+	if (q->cell_size) {
+		for (datalen = len; datalen >  q->cell_size; datalen -= q->cell_size)
+			cells++;
+
+		if (q->cell_overhead)
+			len += cells * q->cell_overhead;
+
+		if (datalen)
+			len += (q->cell_size - datalen);
+	}
+
+	return PSCHED_NS2TICKS((u64)len * NSEC_PER_SEC / q->rate);
 }
 
 /*
@@ -381,7 +401,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		if (q->rate) {
 			struct sk_buff_head *list = &q->qdisc->q;
 
-			delay += packet_len_2_sched_time(skb->len, q->rate);
+			delay += packet_len_2_sched_time(skb->len, q);
 
 			if (!skb_queue_empty(list)) {
 				/*
@@ -565,6 +585,9 @@ static void get_rate(struct Qdisc *sch, const struct nlattr *attr)
 	const struct tc_netem_rate *r = nla_data(attr);
 
 	q->rate = r->rate;
+	q->packet_overhead = r->packet_overhead;
+	q->cell_size       = r->cell_size;
+	q->cell_overhead   = r->cell_overhead;
 }
 
 static int get_loss_clg(struct Qdisc *sch, const struct nlattr *attr)
@@ -906,6 +929,9 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
 	NLA_PUT(skb, TCA_NETEM_CORRUPT, sizeof(corrupt), &corrupt);
 
 	rate.rate = q->rate;
+	rate.packet_overhead = q->packet_overhead;
+	rate.cell_size       = q->cell_size;
+	rate.cell_overhead   = q->cell_overhead;
 	NLA_PUT(skb, TCA_NETEM_RATE, sizeof(rate), &rate);
 
 	if (dump_loss_model(q, skb) != 0)
-- 
1.7.7

^ permalink raw reply related

* Re: 3.2.0-rc3+ (Linus git - 883381d9f1c5a6329bbb796e23ae52c939940310) - INFO: suspicious RCU usage
From: David Miller @ 2011-11-30 22:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: miles.lane, linux-kernel, kuznet, dipankar, paulmck, netdev
In-Reply-To: <1322633155.23721.5.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 30 Nov 2011 07:05:55 +0100

> [PATCH] ipv4: fix lockdep splat in rt_cache_seq_show
> 
> After commit f2c31e32b378 (fix NULL dereferences in check_peer_redir()),
> dst_get_neighbour() should be guarded by rcu_read_lock() /
> rcu_read_unlock() section.
> 
> Reported-by: Miles Lane <miles.lane@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH V2] sch_red: fix red_calc_qavg_from_idle_time
From: Dave Taht @ 2011-11-30 22:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Stephen Hemminger
In-Reply-To: <1322691053.2602.24.camel@edumazet-laptop>

On Wed, Nov 30, 2011 at 11:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to
> 6) it seems RED/GRED are broken.

> red_calc_qavg_from_idle_time() computes a delay in us units, but this
> delay is now 16 times bigger than real delay, so the final qavg result
> smaller than expected.

I've been scratching my head about RED's behavior for a while.

Pre-BQL I was unable to even see red behave, so we're still ahead
on this game.

I will apply this patch to the ongoing work. Thank you.

http://www.bufferbloat.net/issues/312

One of my other problems is when I try to size red (or choke) appropriately
(or so I think) for GigE bandwidths and queue depths, it would fail
to calculate the ewma value.


-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net

^ permalink raw reply

* Re: [PATCH 2/3] route: set iif and oif information in flowi struct
From: Julian Anastasov @ 2011-11-30 22:37 UTC (permalink / raw)
  To: Ulrich Weber; +Cc: netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <4ED66630.8030308@sophos.com>


	Hello,

On Wed, 30 Nov 2011, Ulrich Weber wrote:

> > 	OTOH, rt_iif has some complex semantic: original oif
> > or the selected oif. May be you prefer flowi4_oif to hold
> > the selected oif, right?
> I wasn't aware the ip_route_output_slow() might change the original oif.
> You know why this might happen? Shouldn't fib_lookup only return
> a route matching the given oif?

	There are exceptions but only in this form:

	dev_out = net->loopback_dev;

	i.e. when traffic is diverted to loopback device.
In all others cases the provided oif is considered and never
changed.

> Anyway, if thats the case your code above is more correct. The packet
> should always match the xfrm policy where it was originally routed.
> 
> > 	I see one dangerous place that must be checked:
> > icmp_route_lookup. Before now __ip_route_output_key was
> > called after xfrm_decode_session_reverse with 0 in
> > flowi4_oif, i.e. no oif binding was used. But now when
> > decode_session sets flowi4_oif we will restrict the route
> > via this interface?
> Thanks for the hint! Yes the current patch will force the ICMP packet
> over the received interface.
> 
> Will add "fl4_dec.flowi4_oif = 0;" in case the saddr is local, so the
> behavior will be the same. fl4_dec.flowi4_oif will then be set in
> _ip_route_output_key() again.

	Yes, I think this should work. I now see another
unrelated problem with this code. _decode_session4 uses
fl4->flowi4_tos = iph->tos; But we then feed fl4_dec
to __ip_route_output_key without applying IPTOS_RT_MASK.
__ip_route_output_key does not work with plain TOS from IP
header.

	Currently, the situation around flowi4_tos is as follows:

- flowi4_tos:
	- (IPTOS_RT_MASK | RTO_ONLINK) bits when provided to
	__ip_route_output_key

	- on return above value is preserved if route is
	in cache, it is not changed

	- only IPTOS_RT_MASK bits are preserved by
	ip_route_output_slow if the route is not cached
	because flowi4_tos holds the bits that are
	matched by ip rules tos XX

	As result, we can not rely on the RTO_ONLINK bit
	being valid on return. IPTOS_RT_MASK bits are preserved.

- rt_key_tos:
	- (IPTOS_RT_MASK | RTO_ONLINK) bits

	- we have a bug here: ip_route_output_slow filters
	the provided flowi4_tos by removing RTO_ONLINK bit,
	then __mkroute_output tries to get original value
	with the RTO_ONLINK bit but ends up with the
	modified value. We need to provide orig_tos
	to __mkroute_output.

	In XFRM. Currently only xfrm_bundle_create calls
xfrm_get_tos to get a routable TOS value and such value is
provided to xfrm_dst_lookup. It means __xfrm4_dst_lookup gets
such filtered TOS value, lets name it rtos (routing tos) and
provides it to output routing correctly.

	xfrm4_fill_dst probably correctly copies
flowi4_tos into rt_key_tos if fl is result of output route
and not returned by decode_session. Of course, RTO_ONLINK
can be lost. But may be problem can happen at least for icmp.c
for the xfrm_decode_session_reverse -> xfrm_lookup code
where a full IP TOS can be copied by xfrm4_fill_dst:

	rt->rt_key_tos = fl4->flowi4_tos;

	What significance has this assignment? Should we have
valid RTO_ONLINK bit in flowi4_tos ?

	So, the fl4->flowi4_tos = iph->tos code does not
look nice, I don't know if this field is used for some
matching and we need to hold all IP TOS bits. If not, may be
we do not need to play games and have to use this instead:

	fl4->flowi4_tos = iph->tos & IPTOS_RT_MASK;

	because flowi4_tos is a rtos, not an IP TOS field.
Then we can reach xfrm_lookup or __ip_route_output_key safely
after _decode_session4.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [RFC iproute2] sch_red: TC_RED_HARDDROP
From: Stephen Hemminger @ 2011-11-30 22:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Thomas Graf, netdev
In-Reply-To: <1322684749.2602.3.camel@edumazet-laptop>

On Wed, 30 Nov 2011 21:25:49 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Hi Thomas
> 
> In 2005, TC_RED_HARDDROP kernel support was added to RED and GRED
> (commit bdc450a0bb1 [PKT_SCHED]: (G)RED: Introduce hard dropping), but
> current iproute2 doesnt have user land support.
> 
> Is there some patch waiting somewhere, or should we :
> 
> - Remove kernel support, since nobody uses it.
> 
> - Add iproute2 support.

(Almost) nobody uses RED because they can't figure it out.
According to Wikipedia, VJ says that: 
 "there are not one, but two bugs in classic RED."


But if flag is present, then Thomas should have sent a patch, might have
been lost, that was pre-patchwork days.

^ permalink raw reply

* Re: [PATCH V2] sch_red: fix red_calc_qavg_from_idle_time
From: Eric Dumazet @ 2011-11-30 22:44 UTC (permalink / raw)
  To: Dave Taht; +Cc: David Miller, netdev, Stephen Hemminger
In-Reply-To: <CAA93jw7TZj3Gd-KiVwr3Qwta4RpDKKV9QxGc+fyGc_+-9r1tLw@mail.gmail.com>

Le mercredi 30 novembre 2011 à 23:35 +0100, Dave Taht a écrit :

> One of my other problems is when I try to size red (or choke) appropriately
> (or so I think) for GigE bandwidths and queue depths, it would fail
> to calculate the ewma value.

Do you have one example of such failure ?

^ permalink raw reply

* Re: [PATCH 10/10] sfc: Support for byte queue limits
From: Ben Hutchings @ 2011-11-30 22:51 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, therbert, netdev
In-Reply-To: <20111130.171239.236955684775039413.davem@davemloft.net>

On Wed, 2011-11-30 at 17:12 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 30 Nov 2011 13:44:03 +0100
> 
> > [PATCH net-next] sfc: fix race in efx_enqueue_skb_tso()
> > 
> > As soon as skb is pushed to hardware, it can be completed and freed, so
> > we should not dereference skb anymore.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks Eric.

Thanks to all.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox