* Re: [PATCH 00/19] Enable various Renesas drivers on all ARM platforms
From: Laurent Pinchart @ 2013-10-29 13:22 UTC (permalink / raw)
To: dedekind1
Cc: linux-fbdev, linux-sh, Linus Walleij, Guennadi Liakhovetski,
Thierry Reding, linux-mtd, linux-i2c, Laurent Pinchart,
David S. Miller, Vinod Koul, Wolfram Sang, Magnus Damm,
Eduardo Valentin, Tomi Valkeinen, linux-serial, linux-input,
Zhang Rui, Chris Ball, Jean-Christophe Plagniol-Villard,
linux-media, linux-pwm, Samuel Ortiz, linux-pm, Ian Molton, Mark
In-Reply-To: <1383051980.29619.33.camel-Bxnoe/o8FG+Ef9UqXRslZEEOCMrvLtNR@public.gmane.org>
Hi Artem,
On Tuesday 29 October 2013 15:06:20 Artem Bityutskiy wrote:
> On Tue, 2013-10-29 at 10:12 +0100, Guennadi Liakhovetski wrote:
> > On Tue, 29 Oct 2013, Laurent Pinchart wrote:
> > > Hello,
> > >
> > > This patch series, based on v3.12-rc7, prepares various Renesas drivers
> > > for migration to multiplatform kernels by enabling their compilation or
> > > otherwise fixing them on all ARM platforms. The patches are pretty
> > > straightforward and are described in their commit message.
> > >
> > > I'd like to get all these patches merged in v3.14. As they will need to
> > > go through their respective subsystems' trees, I would appreciate if all
> > > maintainers involved could notify me when they merge patches from this
> > > series in their tree to help me tracking the merge status. I don't plan
> > > to send pull requests individually for these patches, and I will repost
> > > patches individually if changes are requested during review.
> > >
> > > If you believe the issue should be solved in a different way (for
> > > instance by removing the architecture dependency completely) please
> > > reply to the cover letter to let other maintainers chime in.
> >
> > Exactly this was my doubt. If we let these drivers build on all ARM
> > platforms... Maybe we should just let them build everywhere? Unless there
> > are real ARM dependencies. Maybe you could try to remove the restriction
> > and try to build them all on x86?
>
> If they have never been used on anything but ARM, why would you remove
> ARM dependencies? Just for the sake of compile-checking?
>
> Also, if ARM dependency is ever removed, all these should become 'n' by
> default in the Kconfig, in order to make sure they do not slip into
> defconfigs of different architectures.
The idea is that, if ARM is neither a compile-time nor runtime dependency, it
should not be specified in Kconfig. However, if the IP core has never been
used on anything but SuperH and ARM, I don't think clobbering the config
process with drivers that can't be used on the target architecture would be a
really good idea, especially now that we have a COMPILE_TEST Kconfig option.
My preference goes to SUPERH || ARM || COMPILE_TEST over no dependency at
all.
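To make the effect of that dependency concrete, here is a minimal sketch
in plain C that models how "depends on SUPERH || ARM || COMPILE_TEST"
gates whether a driver can be selected. The struct and function names are
illustrative assumptions; they are not Kconfig or kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model of the three Kconfig symbols discussed above. */
struct kconfig_state {
	bool superh;        /* stands in for CONFIG_SUPERH */
	bool arm;           /* stands in for CONFIG_ARM */
	bool compile_test;  /* stands in for CONFIG_COMPILE_TEST */
};

/* Models "depends on SUPERH || ARM || COMPILE_TEST". */
static bool renesas_driver_selectable(const struct kconfig_state *cfg)
{
	return cfg->superh || cfg->arm || cfg->compile_test;
}
```

Under this model an x86 configuration is offered the driver only when
COMPILE_TEST is set, so ordinary x86 configs are not cluttered while
build coverage remains possible.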
--
Regards,
Laurent Pinchart
^ permalink raw reply
* Re: [PATCH 03/16] wl1251: add sysfs interface for bluetooth coexistence mode configuration
From: Kalle Valo @ 2013-10-29 13:35 UTC (permalink / raw)
To: Luca Coelho
Cc: Ben Hutchings, Pali Rohár, John W. Linville, Johannes Berg,
David S. Miller, linux-wireless, netdev, linux-kernel,
freemangordon, aaro.koskinen, pavel, sre, joni.lapilainen,
David Gnedt
In-Reply-To: <1383030565.21526.92.camel@porter.coelho.fi>
Luca Coelho <luca@coelho.fi> writes:
> On Mon, 2013-10-28 at 23:39 +0000, Ben Hutchings wrote:
>> On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
>> > From: David Gnedt <david.gnedt@davizone.at>
>> >
>> > Port the bt_coex_mode sysfs interface from wl1251 driver version included
>> > in the Maemo Fremantle kernel to allow bt-coexistence mode configuration.
>> > This enables userspace applications to set one of the modes
>> > WL1251_BT_COEX_OFF, WL1251_BT_COEX_ENABLE and WL1251_BT_COEX_MONOAUDIO.
>> > The default mode is WL1251_BT_COEX_OFF.
>> > It should be noted that this driver always enabled bt-coexistence before
>> > and enabled bt-coexistence directly affects the receiving performance,
>> > rendering it unusable in some low-signal situations. Especially monitor
>> > mode is affected very badly with bt-coexistence enabled.
>> [...]
>>
>> This should be implemented consistently with other drivers:
>>
>> drivers/net/wireless/ath/ath9k/htc_drv_init.c:module_param_named(btcoex_enable, ath9k_htc_btcoex_enable, int, 0444);
>> drivers/net/wireless/ath/ath9k/init.c:module_param_named(btcoex_enable, ath9k_btcoex_enable, int, 0444);
>> drivers/net/wireless/b43/main.c:module_param_named(btcoex, modparam_btcoex, int, 0444);
>> drivers/net/wireless/ipw2x00/ipw2200.c:module_param(bt_coexist, int, 0444);
>> drivers/net/wireless/iwlegacy/common.c:module_param(bt_coex_active, bool, S_IRUGO);
>> drivers/net/wireless/iwlwifi/iwl-drv.c:module_param_named(bt_coex_active, iwlwifi_mod_params.bt_coex_active,
>> drivers/net/wireless/ti/wlcore/sysfs.c:static DEVICE_ATTR(bt_coex_state, S_IRUGO | S_IWUSR,
>>
>> Oh, hmm, I see a problem here.
>
> With so many drivers doing the same thing, isn't it about time to add
> this to nl80211?
Yes, this really needs to be in nl80211. I even suggested this years ago
but was turned down at the time. Can't remember the reason anymore.
--
Kalle Valo
^ permalink raw reply
* Re: [PATCH] bridge: pass correct vlan id to multicast code
From: Amos Kong @ 2013-10-29 13:39 UTC (permalink / raw)
To: Toshiaki Makita; +Cc: Vlad Yasevich, netdev, shemminger
In-Reply-To: <1383044915.3518.41.camel@ubuntu-vm-makita>
On Tue, Oct 29, 2013 at 08:08:35PM +0900, Toshiaki Makita wrote:
> On Tue, 2013-10-29 at 10:36 +0800, Amos Kong wrote:
> > On Mon, Oct 28, 2013 at 03:45:07PM -0400, Vlad Yasevich wrote:
> > > Currently multicast code attempts to extrace the vlan id from
> > > the skb even when vlan filtering is disabled. This can lead
> > > to mdb entries being created with the wrong vlan id.
> > > Pass the already extracted vlan id to the multicast
> > > filtering code to make the correct id is used in
> > > creation as well as lookup.
>
> Thanks!
>
> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>
> >
> > Hi Vlad,
> >
> > Can we just update br_vlan_get_tag() to set vid to 0 if dev->vlan is
> > disabled? I guess it would effect br_handle_local_finish().
>
> br_handle_local_finish() looks also buggy.
> But adding vlan enabled checking would not fix it completely because
> vlan_bitmap and PVID are not taken into account in that function.
>
> Since we cannot pass vid as an argument from br_dev_xmit() to
> br_handle_[local/frame]_finish() because of NF_HOOK,
> br_handle_local_finish() seems to have to check vlan_enabled,
> vlan_bitmap, and pvid by itself.
>
> IMHO it can be addressed by another patch.
>
> > > Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> > > ---
> > > net/bridge/br_device.c | 2 +-
> > > net/bridge/br_input.c | 2 +-
> > > net/bridge/br_multicast.c | 44 +++++++++++++++++++-------------------------
> > > net/bridge/br_private.h | 6 ++++--
> > > 4 files changed, 25 insertions(+), 29 deletions(-)
> >
> > ...
> >
> > > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> > > index 8b0b610..686284f 100644
> > > --- a/net/bridge/br_multicast.c
> > > +++ b/net/bridge/br_multicast.c
> > > @@ -947,7 +947,8 @@ void br_multicast_disable_port(struct net_bridge_port *port)
> > >
> > > static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
> > > struct net_bridge_port *port,
> > > - struct sk_buff *skb)
> > > + struct sk_buff *skb,
> > > + u16 vid)
> > > {
> > > struct igmpv3_report *ih;
> > > struct igmpv3_grec *grec;
> > > @@ -957,12 +958,10 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
> > > int type;
> > > int err = 0;
> > > __be32 group;
> > > - u16 vid = 0;
> > >
> > > if (!pskb_may_pull(skb, sizeof(*ih)))
> > > return -EINVAL;
> > >
> > > - br_vlan_get_tag(skb, &vid);
> >
> > After applied the patch, we always use vid in br_dev_xmit()->br_allowed_ingress(),
> > is it possible that the vlan of bridge is re-enabled when other
> > changed functions are called?
> >
> > We can just add a enabled checking before this kind of br_vlan_get_tag()?
> >
> > if (!br->vlan_enabled)
> > br_vlan_get_tag(skb2, &vid);
>
> Maybe this leads to a wrong way to update mdb in some cases like
> Vlan_filtering is disabled (by default).
> Add some vids we want to allow.
> Receive a frame whose vid wouldn't be allowed with vlan_filtering enabled.
> The frame passes br_allowed_ingress().
> Enable vlan_filtering.
> The frame reaches br_ip4_multicast_igmp3_report().
> Mdb is updated with disabled vid.
>
>
> Thanks,
>
> Toshiaki Makita
Thanks for all your explanations; I'm OK with the patch.
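The scenario Toshiaki describes can be modelled in a few lines of
user-space C. This is only a sketch: the types and helpers below are
hypothetical stand-ins, not the bridge code, and they assume that a
frame is treated as vid 0 while filtering is disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the bridge and a tagged frame. */
struct bridge { bool vlan_enabled; };
struct frame  { uint16_t tag_vid; };

/* The vid decided once, when the frame enters the bridge. */
static uint16_t ingress_vid(const struct bridge *br, const struct frame *f)
{
	return br->vlan_enabled ? f->tag_vid : 0;
}

/* Racy variant: the multicast code consults vlan_enabled again later. */
static uint16_t mdb_vid_rechecked(const struct bridge *br,
				  const struct frame *f)
{
	return br->vlan_enabled ? f->tag_vid : 0;
}

/* Variant matching the patch's idea: reuse the vid from ingress. */
static uint16_t mdb_vid_passed(uint16_t vid_from_ingress)
{
	return vid_from_ingress;
}
```

If vlan_enabled flips from false to true between the two calls, the
rechecked variant yields the tag's vid even though ingress never
validated it, while the passed-down vid stays consistent with the
ingress decision.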
--
Amos.
^ permalink raw reply
* Re: [PATCH net-next] net: introduce gro_frag_list_enable sysctl
From: Christoph Paasch @ 2013-10-29 13:48 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Herbert Xu, netdev, Jerry Chu, Michael Dalton
In-Reply-To: <1383051962.5464.25.camel@edumazet-glaptop.roam.corp.google.com>
On 29/10/13 - 06:06:02, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
> by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
>
> (Jerry is working on adding native GRO support for tunnels)
>
> skb_segment() only deals with a frag_list chain containing MSS sized
> fragments.
>
> This patch adds support any kind of frag, and adds a new sysctl,
> as clearly the GRO layer should avoid building frag_list skbs
> on a router, as the segmentation is adding cpu overhead.
>
> Note that we could try to reuse page fragments instead of doing
> copy to linear skbs, but this requires a fair amount of work,
> and possible truesize nightmares, as we do not track individual
> (per page fragment) truesizes.
>
> /proc/sys/net/core/gro_frag_list_enable possible values are :
>
> 0 : GRO layer is not allowed to use frag_list to extend skb capacity
> 1 : GRO layer is allowed to use frag_list, but skb_segment()
> automatically sets the sysctl to 0.
> 2 : GRO is allowed to use frag_list, and skb_segment() wont
> clear the sysctl.
>
> Default value is 1 : automatic discovery
>
> Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> Reported-by: Jerry Chu <hkchu@google.com>
> Cc: Michael Dalton <mwdalton@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> Documentation/sysctl/net.txt | 19 +++++++++++++++++++
> include/linux/netdevice.h | 1 +
> net/core/skbuff.c | 29 ++++++++++++++++++++---------
> net/core/sysctl_net_core.c | 10 ++++++++++
> 4 files changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index 9a0319a82470..8778568ae64e 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -87,6 +87,25 @@ sysctl.net.busy_read globally.
> Will increase power usage.
> Default: 0 (off)
>
> +gro_frag_list_enable
> +--------------------
> +
> +GRO layer can build full size GRO packets (~64K of payload) if it is allowed
> +to extend skb using the frag_list pointer. However, this strategy is a win
> +on hosts, where TCP flows are terminated. For a router, using frag_list
> +skbs is not a win because we have to segment skbs before transmit,
> +as most NIC drivers do not support frag_list.
> +As soon as one frag_list skb has to be segmented, this sysctl is automatically
> +changed from 1 to 0.
> +If the value is set to 2, kernel wont change it.
> +
> +Choices : 0 (off),
> + 1 (on, with automatic change to 0)
> + 2 (on, permanent)
> +
> +Default: 1 (on, with automatic downgrade on a router)
> +
> +
> rmem_default
> ------------
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 27f62f746621..b82ff52f301e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2807,6 +2807,7 @@ extern int netdev_max_backlog;
> extern int netdev_tstamp_prequeue;
> extern int weight_p;
> extern int bpf_jit_enable;
> +extern int sysctl_gro_frag_list_enable;
We are missing the definition of sysctl_gro_frag_list_enable :)
net/built-in.o: In function `skb_gro_receive':
(.text+0x8f04): undefined reference to `sysctl_gro_frag_list_enable'
net/built-in.o: In function `skb_segment':
(.text+0xa54e): undefined reference to `sysctl_gro_frag_list_enable'
net/built-in.o: In function `skb_segment':
(.text+0xa557): undefined reference to `sysctl_gro_frag_list_enable'
net/built-in.o:(.data+0x1198): undefined reference to `sysctl_gro_frag_list_enable'
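For reference, the link errors above are what one gets when a variable
is only declared with extern and never defined. A minimal sketch of the
missing piece follows; the initial value 1 is taken from the documented
default, while the helper function and the choice of which source file
should hold the definition are assumptions on my side.

```c
/* What the header hunk provides: a declaration, no storage. */
extern int sysctl_gro_frag_list_enable;

/*
 * What exactly one .c file must also provide: the definition.
 * The value 1 mirrors the "Default: 1" in the documentation hunk.
 */
int sysctl_gro_frag_list_enable = 1;

/* Hypothetical reader of the sysctl, for illustration only. */
static int gro_may_use_frag_list(void)
{
	return sysctl_gro_frag_list_enable != 0;
}
```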
Cheers,
Christoph
^ permalink raw reply
* Re: [PATCH 00/19] Enable various Renesas drivers on all ARM platforms
From: Artem Bityutskiy @ 2013-10-29 13:54 UTC (permalink / raw)
To: Laurent Pinchart
Cc: linux-fbdev, linux-sh, Linus Walleij, Guennadi Liakhovetski,
Thierry Reding, linux-mtd, linux-i2c, Laurent Pinchart,
David S. Miller, Vinod Koul, Joerg Roedel, Wolfram Sang,
Magnus Damm, Eduardo Valentin, Tomi Valkeinen, linux-serial,
linux-input, Zhang Rui, Chris Ball,
Jean-Christophe Plagniol-Villard, linux-media, linux-pwm,
Samuel Ortiz, linux-pm, Ian
In-Reply-To: <1844190.ApyucSZX8W@avalon>
On Tue, 2013-10-29 at 14:22 +0100, Laurent Pinchart wrote:
> > Also, if ARM dependency is ever removed, all these should become 'n' by
> > default in the Kconfig, in order to make sure they do not slip into
> > defconfigs of different architectures.
>
> The idea is that, if ARM is neither a compile-time nor runtime dependency, it
> should not be specified in Kconfig. However, if the IP core has never been
> used on anything but SuperH and ARM, I don't think clobbering the config
> process with drivers that can't be used on the target architecture would be a
> really good idea, especially now that we have a COMPILE_TEST Kconfig option.
> My preference goes to SUPERH || ARM || COMPILE_TEST over no dependency at
> all.
Ah, OK, I missed the entire COMPILE_TEST story.
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: David Ahern @ 2013-10-29 14:12 UTC (permalink / raw)
To: Ingo Molnar, Neil Horman
Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029125233.GA17449@gmail.com>
On 10/29/13 6:52 AM, Ingo Molnar wrote:
>> According to the perf man page, I'm supposed to be able to use --
>> to separate perf command line parameters from the command I want
>> to run. And it definately executed test.sh, I added an echo to
>> stdout in there as a test run and observed them get captured in
>> counters.txt
>
> Well, '--' can be used to delineate the command portion for cases
> where it's ambiguous.
>
> Here's it's unambiguous though. This:
>
> perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/test.sh
>
> stops parsing a valid option after the -ddd option, so in theory it
> should execute 'perf bench sched messaging -- /root/test.sh' where
> '-- /root/test.sh' is simply a parameter to 'perf bench' and is thus
> ignored.
Normally with perf commands a workload can be specified to state how
long to collect perf data. That is not the case for perf-bench.
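The parsing rule Ingo describes (option parsing stops at the first
non-option word, and everything after it belongs to the workload) can be
sketched as below. This is not perf's actual option parser; the function
names and the list of value-taking options are assumptions made for the
example.

```c
#include <string.h>

/* Options that consume the next argv entry (illustrative subset). */
static int takes_value(const char *opt)
{
	return strcmp(opt, "--repeat") == 0 || strcmp(opt, "-C") == 0;
}

/*
 * Return the index of the first argv entry that belongs to the
 * workload, or argc if there is none. Parsing stops at the first
 * non-option word, or just after an explicit "--" separator.
 */
static int workload_start(int argc, const char *const argv[])
{
	int i = 0;

	while (i < argc) {
		if (strcmp(argv[i], "--") == 0)
			return i + 1;
		if (argv[i][0] != '-')
			return i;
		i += takes_value(argv[i]) ? 2 : 1;
	}
	return argc;
}
```

With the arguments from the command quoted above, the workload starts at
"perf", so the later "--" and "/root/test.sh" are handed to 'perf bench'
rather than being interpreted by 'perf stat'.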
David
^ permalink raw reply
* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-29 14:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029131149.GB20408@gmail.com>
On Tue, Oct 29, 2013 at 02:11:49PM +0100, Ingo Molnar wrote:
>
> * Neil Horman <nhorman@tuxdriver.com> wrote:
>
> > I'm sure it worked properly on my system here, I specificially
> > checked it, but I'll gladly run it again. You have to give me an
> > hour as I have a meeting to run to, but I'll have results shortly.
>
> So what I tried to react to was this observation of yours:
>
> > > > Heres my data for running the same test with taskset
> > > > restricting execution to only cpu0. I'm not quite sure whats
> > > > going on here, but doing so resulted in a 10x slowdown of the
> > > > runtime of each iteration which I can't explain. [...]
>
> A 10x slowdown would be consistent with not running your testcase
> but 'perf bench sched messaging' by accident, or so.
>
> But I was really just guessing wildly here.
>
> Thanks,
>
> Ingo
>
So, I apologize, you were right. I was running the test.sh script but perf was
measuring itself. Using this command line:
for i in `seq 0 1 3`
do
	echo $i > /sys/module/csum_test/parameters/module_test_mode
	taskset -c 0 perf stat --repeat 20 -C 0 -ddd /root/test.sh
done >> counters.txt 2>&1
with test.sh unchanged I get these results:
Base:
Performance counter stats for '/root/test.sh' (20 runs):
56.069737 task-clock # 1.005 CPUs utilized ( +- 0.13% ) [100.00%]
5 context-switches # 0.091 K/sec ( +- 5.11% ) [100.00%]
0 cpu-migrations # 0.000 K/sec [100.00%]
366 page-faults # 0.007 M/sec ( +- 0.08% )
144,264,737 cycles # 2.573 GHz ( +- 0.23% ) [17.49%]
9,239,760 stalled-cycles-frontend # 6.40% frontend cycles idle ( +- 3.77% ) [19.19%]
110,635,829 stalled-cycles-backend # 76.69% backend cycles idle ( +- 0.14% ) [19.68%]
54,291,496 instructions # 0.38 insns per cycle
# 2.04 stalled cycles per insn ( +- 0.14% ) [18.30%]
5,844,933 branches # 104.244 M/sec ( +- 2.81% ) [16.58%]
301,523 branch-misses # 5.16% of all branches ( +- 0.12% ) [16.09%]
23,645,797 L1-dcache-loads # 421.721 M/sec ( +- 0.05% ) [16.06%]
494,467 L1-dcache-load-misses # 2.09% of all L1-dcache hits ( +- 0.06% ) [16.06%]
2,907,250 LLC-loads # 51.851 M/sec ( +- 0.08% ) [16.06%]
486,329 LLC-load-misses # 16.73% of all LL-cache hits ( +- 0.11% ) [16.06%]
11,113,848 L1-icache-loads # 198.215 M/sec ( +- 0.07% ) [16.06%]
5,378 L1-icache-load-misses # 0.05% of all L1-icache hits ( +- 1.34% ) [16.06%]
23,742,876 dTLB-loads # 423.453 M/sec ( +- 0.06% ) [16.06%]
0 dTLB-load-misses # 0.00% of all dTLB cache hits [16.06%]
11,108,538 iTLB-loads # 198.120 M/sec ( +- 0.06% ) [16.06%]
0 iTLB-load-misses # 0.00% of all iTLB cache hits [16.07%]
0 L1-dcache-prefetches # 0.000 K/sec [16.07%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [16.07%]
0.055817066 seconds time elapsed ( +- 0.10% )
Prefetch(5*64):
Performance counter stats for '/root/test.sh' (20 runs):
47.423853 task-clock # 1.005 CPUs utilized ( +- 0.62% ) [100.00%]
6 context-switches # 0.116 K/sec ( +- 4.27% ) [100.00%]
0 cpu-migrations # 0.000 K/sec [100.00%]
368 page-faults # 0.008 M/sec ( +- 0.07% )
120,423,860 cycles # 2.539 GHz ( +- 0.85% ) [14.23%]
8,555,632 stalled-cycles-frontend # 7.10% frontend cycles idle ( +- 0.56% ) [16.23%]
87,438,794 stalled-cycles-backend # 72.61% backend cycles idle ( +- 1.13% ) [18.33%]
55,039,308 instructions # 0.46 insns per cycle
# 1.59 stalled cycles per insn ( +- 0.05% ) [18.98%]
5,619,298 branches # 118.491 M/sec ( +- 2.32% ) [18.98%]
303,686 branch-misses # 5.40% of all branches ( +- 0.08% ) [18.98%]
26,577,868 L1-dcache-loads # 560.432 M/sec ( +- 0.05% ) [18.98%]
1,323,630 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 0.14% ) [18.98%]
3,426,016 LLC-loads # 72.242 M/sec ( +- 0.05% ) [18.98%]
1,304,201 LLC-load-misses # 38.07% of all LL-cache hits ( +- 0.13% ) [18.98%]
13,190,316 L1-icache-loads # 278.137 M/sec ( +- 0.21% ) [18.98%]
33,881 L1-icache-load-misses # 0.26% of all L1-icache hits ( +- 4.63% ) [17.93%]
25,366,685 dTLB-loads # 534.893 M/sec ( +- 0.24% ) [15.93%]
734 dTLB-load-misses # 0.00% of all dTLB cache hits ( +- 8.40% ) [13.94%]
13,314,660 iTLB-loads # 280.759 M/sec ( +- 0.05% ) [12.97%]
0 iTLB-load-misses # 0.00% of all iTLB cache hits [12.98%]
0 L1-dcache-prefetches # 0.000 K/sec [12.98%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [12.87%]
0.047194407 seconds time elapsed ( +- 0.62% )
Parallel ALU:
Performance counter stats for '/root/test.sh' (20 runs):
57.395070 task-clock # 1.004 CPUs utilized ( +- 1.71% ) [100.00%]
5 context-switches # 0.092 K/sec ( +- 3.90% ) [100.00%]
0 cpu-migrations # 0.000 K/sec [100.00%]
367 page-faults # 0.006 M/sec ( +- 0.10% )
143,232,396 cycles # 2.496 GHz ( +- 1.68% ) [16.73%]
7,299,843 stalled-cycles-frontend # 5.10% frontend cycles idle ( +- 2.69% ) [18.47%]
109,485,845 stalled-cycles-backend # 76.44% backend cycles idle ( +- 2.01% ) [19.99%]
56,867,669 instructions # 0.40 insns per cycle
# 1.93 stalled cycles per insn ( +- 0.22% ) [19.49%]
6,646,323 branches # 115.800 M/sec ( +- 2.15% ) [17.75%]
304,671 branch-misses # 4.58% of all branches ( +- 0.37% ) [16.23%]
23,612,428 L1-dcache-loads # 411.402 M/sec ( +- 0.05% ) [15.95%]
518,988 L1-dcache-load-misses # 2.20% of all L1-dcache hits ( +- 0.11% ) [15.95%]
2,934,119 LLC-loads # 51.121 M/sec ( +- 0.06% ) [15.95%]
509,027 LLC-load-misses # 17.35% of all LL-cache hits ( +- 0.15% ) [15.95%]
11,103,819 L1-icache-loads # 193.463 M/sec ( +- 0.08% ) [15.95%]
5,381 L1-icache-load-misses # 0.05% of all L1-icache hits ( +- 2.45% ) [15.95%]
23,727,164 dTLB-loads # 413.401 M/sec ( +- 0.06% ) [15.95%]
0 dTLB-load-misses # 0.00% of all dTLB cache hits [15.95%]
11,104,205 iTLB-loads # 193.470 M/sec ( +- 0.06% ) [15.95%]
0 iTLB-load-misses # 0.00% of all iTLB cache hits [15.95%]
0 L1-dcache-prefetches # 0.000 K/sec [15.95%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [15.96%]
0.057151644 seconds time elapsed ( +- 1.69% )
Both:
Performance counter stats for '/root/test.sh' (20 runs):
48.377833 task-clock # 1.005 CPUs utilized ( +- 0.67% ) [100.00%]
5 context-switches # 0.113 K/sec ( +- 3.88% ) [100.00%]
0 cpu-migrations # 0.001 K/sec ( +-100.00% ) [100.00%]
367 page-faults # 0.008 M/sec ( +- 0.08% )
122,529,490 cycles # 2.533 GHz ( +- 1.05% ) [14.24%]
8,796,729 stalled-cycles-frontend # 7.18% frontend cycles idle ( +- 0.56% ) [16.20%]
88,936,550 stalled-cycles-backend # 72.58% backend cycles idle ( +- 1.48% ) [18.16%]
58,405,660 instructions # 0.48 insns per cycle
# 1.52 stalled cycles per insn ( +- 0.07% ) [18.61%]
5,742,738 branches # 118.706 M/sec ( +- 1.54% ) [18.61%]
303,555 branch-misses # 5.29% of all branches ( +- 0.09% ) [18.61%]
26,321,789 L1-dcache-loads # 544.088 M/sec ( +- 0.07% ) [18.61%]
1,236,101 L1-dcache-load-misses # 4.70% of all L1-dcache hits ( +- 0.08% ) [18.61%]
3,409,768 LLC-loads # 70.482 M/sec ( +- 0.05% ) [18.61%]
1,212,511 LLC-load-misses # 35.56% of all LL-cache hits ( +- 0.08% ) [18.61%]
10,579,372 L1-icache-loads # 218.682 M/sec ( +- 0.05% ) [18.61%]
19,426 L1-icache-load-misses # 0.18% of all L1-icache hits ( +- 14.70% ) [18.61%]
25,329,963 dTLB-loads # 523.586 M/sec ( +- 0.27% ) [17.29%]
802 dTLB-load-misses # 0.00% of all dTLB cache hits ( +- 5.43% ) [15.33%]
10,635,524 iTLB-loads # 219.843 M/sec ( +- 0.09% ) [13.38%]
0 iTLB-load-misses # 0.00% of all iTLB cache hits [12.72%]
0 L1-dcache-prefetches # 0.000 K/sec [12.72%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [12.72%]
0.048140073 seconds time elapsed ( +- 0.67% )
Which overall looks a lot more like I expect, save for the parallel ALU cases.
It seems here that the parallel ALU changes actually hurt performance, which
really seems counter-intuitive. I don't yet have any explanation for that. I
do note that we seem to have more stalls in the both case so perhaps the
parallel chains call for a more aggressive prefetch. Do you have any thoughts?
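To put numbers on the comparison, here is a small sketch that computes
the relative change in task-clock against the base run. The values are
copied from the four tables above; the names are made up for this
example.

```c
/* Mean task-clock per run, copied from the perf stat output above. */
enum { BASE, PREFETCH, PARALLEL_ALU, BOTH, NR_MODES };

static const double task_clock[NR_MODES] = {
	[BASE]         = 56.069737,
	[PREFETCH]     = 47.423853,
	[PARALLEL_ALU] = 57.395070,
	[BOTH]         = 48.377833,
};

/* Relative change of a measurement against a baseline, in percent. */
static double percent_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

Against the base run this gives roughly -15.4% for prefetch, +2.4% for
parallel ALU and -13.7% for both, which is the pattern described above:
prefetch helps, the parallel ALU change alone does not.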
Regards
Neil
^ permalink raw reply
* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-29 14:27 UTC (permalink / raw)
To: Neil Horman
Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029141706.GC25078@neilslaptop.think-freely.org>
* Neil Horman <nhorman@tuxdriver.com> wrote:
> So, I apologize, you were right. I was running the test.sh script
> but perf was measuring itself. [...]
Ok, cool - one mystery less!
> Which overall looks a lot more like I expect, save for the parallel
> ALU cases. It seems here that the parallel ALU changes actually
> hurt performance, which really seems counter-intuitive. I don't
> yet have any explanation for that. I do note that we seem to have
> more stalls in the both case so perhaps the parallel chains call
> for a more aggressive prefetch. Do you have any thoughts?
Note that with -ddd you 'overload' the PMU with more counters than
can be run at once, which introduces extra noise. Since you are
running the tests for 0.150 secs or so, the results are not very
representative:
734 dTLB-load-misses # 0.00% of all dTLB cache hits ( +- 8.40% ) [13.94%]
13,314,660 iTLB-loads # 280.759 M/sec ( +- 0.05% ) [12.97%]
with such low runtimes those results are very hard to trust.
So -ddd is typically used to pick up the most interesting PMU events
you want to see measured, and then use them like this:
-e dTLB-load-misses -e iTLB-loads
etc. For such short runtimes make sure the last column displays
close to 100%, so that the PMU results become trustworthy.
A Nehalem+ PMU will allow 2-4 events to be measured in parallel,
plus generics like 'cycles', 'instructions' can be added 'for free'
because they get counted in a separate (fixed purpose) PMU register.
The last column tells you what percentage of the runtime that
particular event was actually active. 100% (or empty last column)
means it was active all the time.
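A rough sketch of why a low percentage in that column adds noise: when
an event is multiplexed, the count is extrapolated from the fraction of
time it was actually counting. The helpers below are hypothetical and
assume the usual enabled/running time scaling; they are not taken from
the perf sources.

```c
#include <stdint.h>

/* Share of the measurement window in which the event was counting. */
static double running_percent(uint64_t time_enabled, uint64_t time_running)
{
	if (time_enabled == 0)
		return 0.0;
	return 100.0 * (double)time_running / (double)time_enabled;
}

/* Extrapolate a raw count to the whole window. */
static uint64_t scaled_count(uint64_t raw, uint64_t time_enabled,
			     uint64_t time_running)
{
	if (time_running == 0)
		return 0;
	return (uint64_t)((double)raw * (double)time_enabled /
			  (double)time_running);
}
```

An event that ran for only about 13% of a very short test has its raw
count multiplied by roughly 7.5, so a handful of events more or less in
the sampled slice changes the reported value considerably.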
Thanks,
Ingo
^ permalink raw reply
* Re: [PATCH NEXT] rtlwifi: Fix endian error in extracting packet type
From: Bjørn Mork @ 2013-10-29 14:27 UTC (permalink / raw)
To: Ben Hutchings
Cc: Larry Finger, linville, linux-wireless, Mark Cave-Ayland, netdev,
Stable
In-Reply-To: <1383005246.3779.61.camel@bwh-desktop.uk.level5networks.com>
Ben Hutchings <bhutchings@solarflare.com> writes:
>> @@ -1077,8 +1077,8 @@ u8 rtl_is_special_data(struct ieee80211_hw *hw, struct sk_buff *skb, u8 is_tx)
>>
>> ip = (struct iphdr *)((u8 *) skb->data + mac_hdr_len +
>> SNAP_SIZE + PROTOC_TYPE_SIZE);
>> - ether_type = *(u16 *) ((u8 *) skb->data + mac_hdr_len + SNAP_SIZE);
>> - /* ether_type = ntohs(ether_type); */
>> + ether_type = be16_to_cpu(*(__be16 *)((u8 *)skb->data + mac_hdr_len +
>> + SNAP_SIZE));
>>
>> if (ETH_P_IP == ether_type) {
>> if (IPPROTO_UDP == ip->protocol) {
>
> This crazy function also says that *all* IPv6 frames are special, which
> apparently means that on TX they should get sent at the lowest possible
> bit rate. So I think this is going to cause a regression for IPv6
> throughput unless you remove that case.
>
> The DHCP case is also not validating IP and UDP header lengths against
> the packet length, though this may be harmless in practice.
It's not validating the upper 8 bits of the port numbers either, so it
will hit random UDP traffic in addition to DHCP.
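To illustrate the difference, here is a sketch comparing a check that
looks only at the low byte of each port with one that reads the full
16-bit big-endian port numbers. It is a stand-alone model, not the
driver's code; the function names are invented, and "udp" is assumed to
point at the start of a UDP header.

```c
#include <stdbool.h>
#include <stdint.h>

#define DHCP_SERVER_PORT 67
#define DHCP_CLIENT_PORT 68

/* Read a 16-bit big-endian (network order) field from a buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Flawed check: only the low byte of each port is compared. */
static bool looks_like_dhcp_low_byte(const uint8_t *udp)
{
	uint8_t src = udp[1], dst = udp[3];

	return (src == DHCP_CLIENT_PORT && dst == DHCP_SERVER_PORT) ||
	       (src == DHCP_SERVER_PORT && dst == DHCP_CLIENT_PORT);
}

/* Full check: both bytes of the source and destination ports. */
static bool is_dhcp(const uint8_t *udp)
{
	uint16_t src = read_be16(udp), dst = read_be16(udp + 2);

	return (src == DHCP_CLIENT_PORT && dst == DHCP_SERVER_PORT) ||
	       (src == DHCP_SERVER_PORT && dst == DHCP_CLIENT_PORT);
}
```

A datagram from port 0x1244 to port 0x3443 passes the low-byte check but
is rejected by the full comparison, which is the kind of random UDP
traffic meant above.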
But it was good to see this function now. I was wondering how to support
some buggy 3G modem firmware without ugly hacks. Seems there will always
be worse hacks in drivers/net, no matter what I do :-)
Bjørn
^ permalink raw reply
* Re: IPV6 nf defrag does not work
From: Jiri Pirko @ 2013-10-29 14:30 UTC (permalink / raw)
To: netdev; +Cc: pablo, netfilter-devel, yoshfuji, kadlec, kaber
In-Reply-To: <20131029105208.GA18526@minipsycho.orion>
Tue, Oct 29, 2013 at 11:52:08AM CET, jiri@resnulli.us wrote:
>Hi All.
>
>On the current net-next if you on HOSTA do:
>ip6tables -I INPUT -p icmpv6 -j DROP
>ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>
>and on HOSTB you do:
>ping6 HOSTA -s2000 (MTU is 1500)
>
>Only the first ICMP echo request will be passed through, the rest is not
>passed on HOSTA. This issue does not occur with smaller packets than MTU (where
>fragmentation does not happen).
>
Hmm. The reason why first packet goes through is because of:
commit 58a317f1061c894d2344c0b6a18ab4a64b69b815
Author: Patrick McHardy <kaber@trash.net>
Date: Sun Aug 26 19:14:12 2012 +0200
netfilter: ipv6: add IPv6 NAT support
The first packet will hit "if ((help && help->helper) || !nf_ct_is_confirmed(ct))"
(ct is unconfirmed for it).
For this, nf_conntrack_ipv6 has to be loaded. Continuing investigation.
>I'm trying to find out where the problem is.
>
>Any quick ideas?
>
>Thanks
>
>Jiri
^ permalink raw reply
* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Dan Williams @ 2013-10-29 14:31 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: David Miller, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji,
kaber, thaller, stephen
In-Reply-To: <20131028234842.GB26185@order.stressinduktion.org>
On Tue, 2013-10-29 at 00:48 +0100, Hannes Frederic Sowa wrote:
> On Mon, Oct 28, 2013 at 06:16:19PM -0500, Dan Williams wrote:
> > On Mon, 2013-10-28 at 17:17 -0400, David Miller wrote:
> > > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > Date: Sun, 27 Oct 2013 17:48:35 +0100
> > >
> > > > A temporary address is also bound to a non-privacy public address so
> > > > it's lifetime is determined by its lifetime (e.g. if you switch the
> > > > network and don't receive on-link information for that prefix any
> > > > more). NetworkManager would have to take care about that, too. It is
> > > > just a question of what NetworkManager wants to handle itself or lets
> > > > the kernel handle for it.
> > >
> > > How much really needs to be in userspace to implement RFC4941?
> > >
> > > I don't like the idea that even for a fully up and properly
> > > functioning link, if NetworkManager wedges then critical things like
> > > temporary address (re-)generation, will cease.
> >
> > Honestly, I'd be completely happy to leave temporary address handling up
> > to the kernel and *not* do it in userspace; the kernel already has all
> > the code. There are two problems with that though, (a) it's tied to
> > in-kernel RA handling, and (b) it's controlled by a CONFIG option. Both
> > these are solvable.
>
> Ah, (a) does complicate things, I agree. But the tieing is essential
> currently. So it seems a netlink interface would be needed to tie a new
> address to an already installed one, if the kernel should still deal
> with the regeneration?
I think it's simpler than that. A new flag, set when adding the
non-private address, would say "create and manage privacy addresses for
this non-private address". The kernel then adds the privacy addresses
generated from the non-private address/prefixlen, and ties their lifetime
to the non-private address. If the non-private address is removed, the
privacy addresses could get removed too.
I don't think we need an API to tie addresses to already installed ones,
because the kernel already has the privacy address generation code, so
why should userspace generate the privacy address at all? Just leave
that to the kernel.
> > First off, what's the reasoning behind having IPv6 privacy as a config
> > option? It's off-by-default and must be explicitly turned on, so is
> > there any harm in removing the config? Or is it just for
> > smallest-kernel-ever folks?
>
> I don't know about the policy. Does it really matter as distributions
> normally switch it on? But I would not like to see the option removed
> entirly, maybe the default could be changed.
>
> > Would a new IFA_F_MANAGE_TEMP (or better name) work here, indicating
> > that for some new static address, that the kernel should create and
> > manage the temporary privacy addresses associated with its prefix?
>
> But this would only be needed if they were managed in user-space, no?
"if they" == what? privacy address or static address? What
NetworkManager is trying to do is handle RAs in userspace with libndp
for various flexibility and behavioral reasons, but we'd really like to
leave all the temporary address stuff up to the kernel.
So NM would handle RA/RS and when it gets a prefix, it would create the
IPv6 non-private address and add it to the interface. When adding, it
would also set the "IFA_F_MANAGE_TEMP" flag (or whatever) and the kernel
would then handle all the privacy address generation, lifetimes, and
timers. Basically, break some of the privacy code away from the
in-kernel RA handling so that privacy addresses could be triggered from
userland too.
Would that be workable?
Dan
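
The proposal above can be sketched as a toy userspace model. This is
only an illustration of the intended lifetime coupling, not kernel code
and not a real netlink API: the flag value, the Interface class and the
"temp-for-" placeholder are all assumptions made up for the example, and
the real temporary address would come from the kernel's existing RFC 4941
code.

```python
from dataclasses import dataclass, field

# Hypothetical flag value; only the name comes from this thread.
IFA_F_MANAGE_TEMP = 0x1


@dataclass
class Address:
    addr: str
    prefixlen: int
    flags: int = 0
    temporaries: list = field(default_factory=list)


class Interface:
    """Toy stand-in for the kernel's per-interface address list."""

    def __init__(self):
        self.addresses = {}

    def add_address(self, addr, prefixlen, flags=0):
        entry = Address(addr, prefixlen, flags)
        if flags & IFA_F_MANAGE_TEMP:
            # Placeholder for the kernel's privacy address generation.
            entry.temporaries.append("temp-for-" + addr)
        self.addresses[addr] = entry
        return entry

    def remove_address(self, addr):
        # Removing the non-private address drops its temporaries too.
        return self.addresses.pop(addr).temporaries

    def all_addresses(self):
        result = []
        for entry in self.addresses.values():
            result.append(entry.addr)
            result.extend(entry.temporaries)
        return result


iface = Interface()
iface.add_address("2001:db8::1", 64, IFA_F_MANAGE_TEMP)
print(iface.all_addresses())
iface.remove_address("2001:db8::1")
print(iface.all_addresses())
```

The point of the design is that userspace only ever adds or removes the
non-private address; everything derived from it lives and dies with it.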
^ permalink raw reply
* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Hannes Frederic Sowa @ 2013-10-29 14:38 UTC (permalink / raw)
To: Dan Williams
Cc: David Miller, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji,
kaber, thaller, stephen
In-Reply-To: <1383057078.2236.12.camel@dcbw.foobar.com>
Hi!
On Tue, Oct 29, 2013 at 09:31:18AM -0500, Dan Williams wrote:
> On Tue, 2013-10-29 at 00:48 +0100, Hannes Frederic Sowa wrote:
> > On Mon, Oct 28, 2013 at 06:16:19PM -0500, Dan Williams wrote:
> > > On Mon, 2013-10-28 at 17:17 -0400, David Miller wrote:
> > > > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Date: Sun, 27 Oct 2013 17:48:35 +0100
> > > >
> > > > > A temporary address is also bound to a non-privacy public address so
> > > > > its lifetime is determined by that address's lifetime (e.g. if you switch the
> > > > > network and don't receive on-link information for that prefix any
> > > > > more). NetworkManager would have to take care about that, too. It is
> > > > > just a question of what NetworkManager wants to handle itself or lets
> > > > > the kernel handle for it.
> > > >
> > > > How much really needs to be in userspace to implement RFC4941?
> > > >
> > > > I don't like the idea that even for a fully up and properly
> > > > functioning link, if NetworkManager wedges then critical things like
> > > > temporary address (re-)generation, will cease.
> > >
> > > Honestly, I'd be completely happy to leave temporary address handling up
> > > to the kernel and *not* do it in userspace; the kernel already has all
> > > the code. There are two problems with that though, (a) it's tied to
> > > in-kernel RA handling, and (b) it's controlled by a CONFIG option. Both
> > > these are solvable.
> >
> > Ah, (a) does complicate things, I agree. But the tying is essential
> > currently. So it seems a netlink interface would be needed to tie a new
> > address to an already installed one, if the kernel should still deal
> > with the regeneration?
>
> I think it's simpler than that. New flag set when adding the
> non-private address that says "create and manage privacy addresses for
> this non-private address". The kernel then adds the privacy addresses
> generated off the non-private address/prefixlen, and ties their lifetime
> to the non-private address. If the non-private address is removed, the
> privacy addresses could get removed too.
>
> I don't think we need API to tie addresses to already installed ones,
> because the kernel already has the privacy address generation code, so
> why should userspace generate the privacy address at all? Just leave
> that to the kernel.
Ok.
> > > First off, what's the reasoning behind having IPv6 privacy as a config
> > > option? It's off-by-default and must be explicitly turned on, so is
> > > there any harm in removing the config? Or is it just for
> > > smallest-kernel-ever folks?
> >
> > I don't know about the policy. Does it really matter as distributions
> > normally switch it on? But I would not like to see the option removed
> > entirely, maybe the default could be changed.
> >
> > > Would a new IFA_F_MANAGE_TEMP (or better name) work here, indicating
> > > that for some new static address, that the kernel should create and
> > > manage the temporary privacy addresses associated with its prefix?
> >
> > But this would only be needed if they were managed in user-space, no?
>
> "if they" == what? privacy address or static address? What
With "they" I meant privacy addresses.
> NetworkManager is trying to do is handle RAs in userspace with libndp
> for various flexibility and behavioral reasons, but we'd really like to
> leave all the temporary address stuff up to the kernel.
Can you provide me with details on why the kernel RA implementation is not
good enough? I tried to find bug reports; I found some, but they were missing
details, were not even correct, or were outdated.
> So NM would handle RA/RS and when it gets a prefix, it would create the
> IPv6 non-private address and add it to the interface. When adding, it
> would also set the "IFA_F_MANAGE_TEMP" flag (or whatever) and the kernel
> would then handle all the privacy address generation, lifetimes, and
> timers. Basically, break some of the privacy code away from the
> in-kernel RA handling so that privacy addresses could be triggered from
> userland too.
>
> Would that be workable?
That sounds like a solid plan to me. I would actually have liked to see NM
use the kernel implementation, but I guess there is no way back any more.
:(
Greetings,
Hannes
^ permalink raw reply
* Kernel crash - Large UDP packet over IPv6 over UFO-enabled device with TBF qdisc (No corking needed)
From: Saran Neti @ 2013-10-29 14:30 UTC (permalink / raw)
To: netdev@vger.kernel.org; +Cc: dl TSL Vulnerability Research Team
Hi,
Sending a UDP packet of size larger than MTU over IPv6 over a device that has UFO enabled, and that uses the TBF qdisc causes the kernel to crash. Unlike CVE-2013-4387, this does not require a corked socket and can be remotely triggered by a tftp request.
Configuration:
1. Configure a Linux system with UDP UFO enabled (e.g. virtio_net).
# ethtool -k eth0 | grep udp-frag
udp-fragmentation-offload: on
2. Assign an IPv6 address to it.
# ip addr show dev eth0 | grep inet6
inet6 fd00:abcd:abcd:123::2/64 scope global
3. Change qdisc to tbf
# tc qdisc replace dev eth0 root tbf rate 200kbit latency 20ms burst 5kb
Reproduction:
a) Over Network
1. Run tftp daemon (e.g. using tftp-hpa).
# in.tftpd -6 -l -s /srv/tftp
2. From a different machine, issue a tftp command to cause the kernel to crash:
# atftp --option "blksize 5000" -g -r file1 fd00:abcd:abcd:123::2 69
Or b) Locally
Run the following python script on the vulnerable system to crash it:
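#!/usr/bin/python
# Send a single 5000-byte UDP datagram (larger than the MTU) over IPv6.
import socket
sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
# A bytes payload works with sendto() on both Python 2.6+ and Python 3.
sock.sendto(b"A" * 5000, ('fd00:abcd:abcd:123::3', 1234, 0, 0))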
Versions tested:
Mainline - 3.12-rc7 (HEAD: 959f58544b7f20c92d5eb43d1232c96c15c01bfb)
Stable - 3.11.6
This bug triggers on the default config as shipped with the Arch Linux kernel.
I modified it to turn on kgdb (config file attached).
Platform:
# cat /proc/cpuinfo | grep -E "model|flags"
model : 2
model name : QEMU Virtual CPU version 1.6.1
flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl pni cx16 hypervisor lahf_lm
# cat /proc/modules | grep virtio_net
virtio_net 18821 0 - Live 0xffffffffa0108000
virtio_ring 7846 5 virtio_console,virtio_balloon,virtio_blk,virtio_net,virtio_pci, Live 0xffffffffa001e000
virtio 3954 5 virtio_console,virtio_balloon,virtio_blk,virtio_net,virtio_pci, Live 0xffffffffa001a000
Crash analysis: Using 3.12-rc7 compiled with KGDB.
# (gdb) bt
#0 0xffffffff8129a6bd in memcpy () at arch/x86/lib/memcpy_64.S:69
#1 0xffffffff813e2d2a in skb_copy_from_linear_data_offset (skb=0xffff880036abd500, len=62, to=<optimized out>,
offset=-65536) at include/linux/skbuff.h:2425
#2 skb_segment (skb=skb@entry=0xffff880036abd500, features=features@entry=3221244521) at net/core/skbuff.c:2849
#3 0xffffffff814c3e25 in udp6_ufo_fragment (skb=0xffff880036abd500, features=3221244521) at net/ipv6/udp_offload.c:119
#4 0xffffffff814c3867 in ipv6_gso_segment (skb=0xffff880036abd500, features=3221244521) at net/ipv6/ip6_offload.c:120
#5 0xffffffff813f062d in skb_mac_gso_segment (skb=skb@entry=0xffff880036abd500, features=features@entry=3221244521)
at net/core/dev.c:2333
#6 0xffffffff813f0769 in __skb_gso_segment (skb=skb@entry=0xffff880036abd500, features=3221244521,
tx_path=tx_path@entry=true) at net/core/dev.c:2384
#7 0xffffffff81423eab in skb_gso_segment (features=<optimized out>, skb=0xffff880036abd500)
at include/linux/netdevice.h:2844
#8 tbf_segment (sch=0xffff88003c339800, skb=0xffff880036abd500) at net/sched/sch_tbf.c:130
#9 tbf_enqueue (skb=0xffff880036abd500, sch=0xffff88003c339800) at net/sched/sch_tbf.c:167
#10 0xffffffff813f1121 in __dev_xmit_skb (txq=0xffff88003b261e00, dev=0xffff88003b115000, q=0xffff88003c339800,
skb=0xffff880036abd500) at net/core/dev.c:2728
#11 dev_queue_xmit (skb=skb@entry=0xffff880036abd500) at net/core/dev.c:2828
#12 0xffffffff81491d7e in neigh_hh_output (skb=<optimized out>, hh=<optimized out>) at include/net/neighbour.h:355
#13 dst_neigh_output (dst=<optimized out>, skb=0xffff880036abd500, n=0xffff88003b3d7e00) at include/net/dst.h:411
#14 ip6_finish_output2 (skb=skb@entry=0xffff880036abd500) at net/ipv6/ip6_output.c:113
#15 0xffffffff81495198 in ip6_finish_output (skb=skb@entry=0xffff880036abd500) at net/ipv6/ip6_output.c:131
#16 0xffffffff81495203 in NF_HOOK_COND (cond=<optimized out>, okfn=0xffffffff81495100 <ip6_finish_output>,
out=<optimized out>, in=0x0 <irq_stack_union>, skb=0xffff880036abd500, hook=4, pf=10 '\n')
at include/linux/netfilter.h:184
#17 ip6_output (skb=0xffff880036abd500) at net/ipv6/ip6_output.c:145
#18 0xffffffff814c30c5 in dst_output (skb=0xffff880036abd500) at include/net/dst.h:450
#19 ip6_local_out (skb=skb@entry=0xffff880036abd500) at net/ipv6/output_core.c:121
#20 0xffffffff814939d4 in ip6_push_pending_frames (sk=sk@entry=0xffff88003b294440) at net/ipv6/ip6_output.c:1530
#21 0xffffffff814ac7a8 in udp_v6_push_pending_frames (sk=sk@entry=0xffff88003b294440) at net/ipv6/udp.c:1003
#22 0xffffffff814adb16 in udpv6_sendmsg (iocb=<optimized out>, sk=0xffff88003b294440, msg=<optimized out>, len=5004)
at net/ipv6/udp.c:1257
#23 0xffffffff814713c0 in inet_sendmsg (iocb=0xffff880036bb9d68, sock=<optimized out>, msg=0xffff880036bb9e90, size=5004)
at net/ipv4/af_inet.c:770
#24 0xffffffff813d59a0 in __sock_sendmsg_nosec (size=5004, msg=0xffff880036bb9e90, sock=0xffff880037775900,
iocb=0xffff880036bb9d68) at net/socket.c:631
#25 __sock_sendmsg (size=5004, msg=0xffff880036bb9e90, sock=0xffff880037775900, iocb=0xffff880036bb9d68) at net/socket.c:639
#26 sock_sendmsg (sock=sock@entry=0xffff880037775900, msg=msg@entry=0xffff880036bb9e90, size=size@entry=5004)
at net/socket.c:650
#27 0xffffffff813d7fd1 in SYSC_sendto (addr_len=0, addr=0x0 <irq_stack_union>, flags=<optimized out>, len=5004,
buff=0x6387a4, fd=<optimized out>) at net/socket.c:1796
#28 SyS_sendto (fd=<optimized out>, buff=6522788, len=<optimized out>, flags=0, addr=0, addr_len=0) at net/socket.c:1761
#29 <signal handler called>
# (gdb) display/i $pc
2: x/i $pc
=> 0xffffffff8129a6bd <memcpy+13>: rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
(gdb) info registers
rax 0xffff88003b251dfa -131940403044870
rbx 0xffffffffffff003e -65474
rcx 0x7 7
rdx 0x6 6
rsi 0x6 6
rdi 0xffff88003b251dfa -131940403044870
rbp 0xffff88003be498e0 0xffff88003be498e0
rsp 0xffff88003be49838 0xffff88003be49838
r8 0xc0 192
r9 0x300 768
r10 0xffff88003e001600 -131940355140096
r11 0xffff88003e001600 -131940355140096
r12 0x8 8
r13 0xffff88003ae5b700 -131940407200000
r14 0x0 0
r15 0xffff88003b7f6600 -131940397128192
rip 0xffffffff8129a6bd 0xffffffff8129a6bd <memcpy+13>
eflags 0x10206 [ PF IF RF ]
cs 0x10 16
ss 0x18 24
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
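
As a quick sanity check on the register dump above, the raw values can be
reinterpreted as signed 64-bit integers. The sketch below only redoes the
arithmetic using offset=-65536 and len=62 from frame #1 of the backtrace.
That rbx really holds offset + len at this point is an assumption; the
dump is consistent with it but does not prove it. to_signed64() is a
helper written for this example.

```python
def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


offset = -65536            # offset= in frame #1 of the backtrace
length = 62                # len= in frame #1 of the backtrace
rbx = 0xffffffffffff003e   # rbx from "info registers"

print(to_signed64(rbx))                     # signed view of rbx
print(offset + length)                      # offset + len
print(to_signed64(rbx) == offset + length)  # do the two agree?
```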
I have not tested this over other possible configurations (e.g. different qdiscs). Code paths other than the one shown in the backtrace might also be affected.
If there is any other information I can help you with, please let me know.
--
Saran Neti,
Security Researcher, TELUS Security Labs
^ permalink raw reply
* Re: Bug in skb_segment: fskb->len != len
From: Herbert Xu @ 2013-10-29 14:41 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Christoph Paasch, netdev
In-Reply-To: <1383009308.5464.2.camel@edumazet-glaptop.roam.corp.google.com>
On Mon, Oct 28, 2013 at 06:15:08PM -0700, Eric Dumazet wrote:
> On Mon, 2013-10-28 at 06:21 -0700, Eric Dumazet wrote:
>
> > But we also need to fix the skb_segment() bug anyway.
>
> Hi Christoph
>
> I cooked a minimal patch, could you please try it ?
>
> I'll refactor skb_segment() to be smarter for the next release
> (linux-3.14).
I think this patch is just papering over a deeper issue.
We should either be building skbs in pages, or using frag_list.
In the latter case each frag_list must be exactly mss bytes,
except for the last one.
So if we're crashing here it means that we got mixed up on the
receive side, either because the driver was sending us bogus skbs
or we're simply buggy.
So we need to figure out why the receive-side (i.e., GRO) is building
these bogus packets, and not papering over them on the transmit-side.
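
The frag_list rule stated above can be written down as a small check.
This is a userspace sketch with plain integers standing in for the
lengths of the chained skbs; frag_list_is_mss_sized() is a hypothetical
helper, and treating the last entry as "between 1 and mss bytes" is an
assumption about what "except for the last one" means.

```python
def frag_list_is_mss_sized(frag_lens, mss):
    """True if every entry but the last is exactly mss bytes long
    and the last entry is between 1 and mss bytes."""
    if not frag_lens:
        return True
    *body, last = frag_lens
    return all(n == mss for n in body) and 0 < last <= mss


# Regular chain: full-sized segments followed by a short tail.
print(frag_list_is_mss_sized([1448, 1448, 500], 1448))
# Chain with an entry holding two segments' worth of payload.
print(frag_list_is_mss_sized([1448, 2896, 500], 1448))
```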
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] bridge: pass correct vlan id to multicast code
From: Vlad Yasevich @ 2013-10-29 15:00 UTC (permalink / raw)
To: Amos Kong; +Cc: netdev, shemminger, makita.toshiaki
In-Reply-To: <20131029023646.GA2795@amosk.info>
On 10/28/2013 10:36 PM, Amos Kong wrote:
> On Mon, Oct 28, 2013 at 03:45:07PM -0400, Vlad Yasevich wrote:
>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>> index 8b0b610..686284f 100644
>> --- a/net/bridge/br_multicast.c
>> +++ b/net/bridge/br_multicast.c
>> @@ -947,7 +947,8 @@ void br_multicast_disable_port(struct net_bridge_port *port)
>>
>> static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
>> struct net_bridge_port *port,
>> - struct sk_buff *skb)
>> + struct sk_buff *skb,
>> + u16 vid)
>> {
>> struct igmpv3_report *ih;
>> struct igmpv3_grec *grec;
>> @@ -957,12 +958,10 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
>> int type;
>> int err = 0;
>> __be32 group;
>> - u16 vid = 0;
>>
>> if (!pskb_may_pull(skb, sizeof(*ih)))
>> return -EINVAL;
>>
>> - br_vlan_get_tag(skb, &vid);
>
Sorry, missed this question last time.
> After applying the patch, we always use vid in br_dev_xmit()->br_allowed_ingress(),
> is it possible that the vlan of bridge is re-enabled when other
> changed functions are called?
>
If the frame was allowed to enter, then the configuration that was current
at that time should apply to the frame. If the config changes during frame
processing, we don't really want to use the new one. Otherwise, you'd get
inconsistent results.
> Can we just add an enabled check before this kind of br_vlan_get_tag()?
>
> if (!br->vlan_enabled)
> br_vlan_get_tag(skb2, &vid);
>
This is sort of what the next patches I am working on do. But we still
want to get the vlan id once and then use it throughout. There is
no need to retrieve it again.
-vlad
>
>> ih = igmpv3_report_hdr(skb);
>> num = ntohs(ih->ngrec);
>> len = sizeof(*ih);
>
> ...
>
^ permalink raw reply
* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-29 15:08 UTC (permalink / raw)
To: Herbert Xu; +Cc: Christoph Paasch, netdev
In-Reply-To: <20131029144100.GA28046@gondor.apana.org.au>
On Tue, 2013-10-29 at 22:41 +0800, Herbert Xu wrote:
> On Mon, Oct 28, 2013 at 06:15:08PM -0700, Eric Dumazet wrote:
> > On Mon, 2013-10-28 at 06:21 -0700, Eric Dumazet wrote:
> >
> > > But we also need to fix the skb_segment() bug anyway.
> >
> > Hi Christoph
> >
> > I cooked a minimal patch, could you please try it ?
> >
> > I'll refactor skb_segment() to be smarter for the next release
> > (linux-3.14).
>
> I think this patch is just papering over a deeper issue.
>
> We should either be building skbs in pages, or using frag_list.
> In the latter case each frag_list must be exactly mss bytes,
> except for the last one.
>
> So if we're crashing here it means that we got mixed up on the
> receive side, either because the driver was sending us bogus skbs
> or we're simply buggy.
>
> So we need to figure out why the receive-side (i.e., GRO) is building
> these bogus packets, and not papering over them on the transmit-side.
It looks like you missed a lot of recent changes.
The GRO layer was updated to be able to stack two or three sk_buffs,
each fully populated with page frags.
That's pretty much mandatory to support line rate on 40Gb links.
We now have to make skb_segment() aware of this; I missed this part.
^ permalink raw reply
* [PATCH v2 net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-29 15:12 UTC (permalink / raw)
To: Christoph Paasch
Cc: David Miller, Herbert Xu, netdev, Jerry Chu, Michael Dalton
In-Reply-To: <1383051962.5464.25.camel@edumazet-glaptop.roam.corp.google.com>
From: Eric Dumazet <edumazet@google.com>
Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
(Jerry is working on adding native GRO support for tunnels)
skb_segment() only deals with a frag_list chain containing MSS sized
fragments.
This patch adds support for any kind of frag, and adds a new sysctl,
as the GRO layer should clearly avoid building frag_list skbs
on a router, where the segmentation adds cpu overhead.
Note that we could try to reuse page fragments instead of doing
copy to linear skbs, but this requires a fair amount of work,
and possible truesize nightmares, as we do not track individual
(per page fragment) truesizes.
/proc/sys/net/core/gro_frag_list_enable possible values are :
0 : GRO layer is not allowed to use frag_list to extend skb capacity
1 : GRO layer is allowed to use frag_list, but skb_segment()
automatically sets the sysctl to 0.
2 : GRO is allowed to use frag_list, and skb_segment() won't
clear the sysctl.
Default value is 1 : automatic discovery
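
The three values listed above behave like a small state machine. The
sketch below models only that behaviour in userspace Python. The class
and method names are made up for illustration, and the "irregular
frag_list" event stands for the fskb->len != len case that skb_segment()
handles in this patch.

```python
class GroFragListSysctl:
    """Toy model of /proc/sys/net/core/gro_frag_list_enable."""

    def __init__(self, value=1):
        if value not in (0, 1, 2):
            raise ValueError("value must be 0, 1 or 2")
        self.value = value

    def gro_may_use_frag_list(self):
        # 0 forbids frag_list in GRO; 1 and 2 allow it.
        return self.value != 0

    def on_irregular_frag_list_segmented(self):
        # Only the automatic mode (1) downgrades itself to 0.
        if self.value == 1:
            self.value = 0


auto = GroFragListSysctl()       # default: 1
auto.on_irregular_frag_list_segmented()
print(auto.value, auto.gro_may_use_frag_list())

pinned = GroFragListSysctl(2)    # administrator chose "permanent"
pinned.on_irregular_frag_list_segmented()
print(pinned.value, pinned.gro_may_use_frag_list())
```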
Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reported-by: Jerry Chu <hkchu@google.com>
Cc: Michael Dalton <mwdalton@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: added missing sysctl definition in skbuff.c
Documentation/sysctl/net.txt | 19 +++++++++++++++++++
include/linux/netdevice.h | 1 +
net/core/skbuff.c | 31 ++++++++++++++++++++++---------
net/core/sysctl_net_core.c | 10 ++++++++++
4 files changed, 52 insertions(+), 9 deletions(-)
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 9a0319a82470..8778568ae64e 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -87,6 +87,25 @@ sysctl.net.busy_read globally.
Will increase power usage.
Default: 0 (off)
+gro_frag_list_enable
+--------------------
+
+GRO layer can build full size GRO packets (~64K of payload) if it is allowed
+to extend skb using the frag_list pointer. However, this strategy is a win
+on hosts, where TCP flows are terminated. For a router, using frag_list
+skbs is not a win because we have to segment skbs before transmit,
+as most NIC drivers do not support frag_list.
+As soon as one frag_list skb has to be segmented, this sysctl is automatically
+changed from 1 to 0.
+If the value is set to 2, the kernel won't change it.
+
+Choices : 0 (off),
+ 1 (on, with automatic change to 0)
+ 2 (on, permanent)
+
+Default: 1 (on, with automatic downgrade on a router)
+
+
rmem_default
------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 27f62f746621..b82ff52f301e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2807,6 +2807,7 @@ extern int netdev_max_backlog;
extern int netdev_tstamp_prequeue;
extern int weight_p;
extern int bpf_jit_enable;
+extern int sysctl_gro_frag_list_enable;
bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
bool netdev_has_any_upper_dev(struct net_device *dev);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0ab32faa520f..e089cd2782e5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -74,6 +74,8 @@
struct kmem_cache *skbuff_head_cache __read_mostly;
static struct kmem_cache *skbuff_fclone_cache __read_mostly;
+int sysctl_gro_frag_list_enable __read_mostly = 1;
+
static void sock_pipe_buf_release(struct pipe_inode_info *pipe,
struct pipe_buffer *buf)
{
@@ -2761,7 +2763,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
unsigned int len;
__be16 proto;
bool csum;
- int sg = !!(features & NETIF_F_SG);
+ bool sg = !!(features & NETIF_F_SG);
int nfrags = skb_shinfo(skb)->nr_frags;
int err = -ENOMEM;
int i = 0;
@@ -2793,7 +2795,13 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
hsize = len;
if (!hsize && i >= nfrags) {
- BUG_ON(fskb->len != len);
+ if (fskb->len != len) {
+ if (sysctl_gro_frag_list_enable == 1)
+ sysctl_gro_frag_list_enable = 0;
+ hsize = len;
+ sg = false;
+ goto do_linear;
+ }
pos += len;
nskb = skb_clone(fskb, GFP_ATOMIC);
@@ -2812,6 +2820,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
skb_release_head_state(nskb);
__skb_push(nskb, doffset);
} else {
+do_linear:
nskb = __alloc_skb(hsize + doffset + headroom,
GFP_ATOMIC, skb_alloc_rx_flag(skb),
NUMA_NO_NODE);
@@ -2838,9 +2847,6 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
nskb->data - tnl_hlen,
doffset + tnl_hlen);
- if (fskb != skb_shinfo(skb)->frag_list)
- goto perform_csum_check;
-
if (!sg) {
nskb->ip_summed = CHECKSUM_NONE;
nskb->csum = skb_copy_and_csum_bits(skb, offset,
@@ -2849,6 +2855,9 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
continue;
}
+ if (fskb != skb_shinfo(skb)->frag_list)
+ goto perform_csum_check;
+
frag = skb_shinfo(nskb)->frags;
skb_copy_from_linear_data_offset(skb, offset,
@@ -2944,9 +2953,11 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
int i = skbinfo->nr_frags;
int nr_frags = pinfo->nr_frags + i;
- if (nr_frags > MAX_SKB_FRAGS)
+ if (unlikely(nr_frags > MAX_SKB_FRAGS)) {
+ if (!sysctl_gro_frag_list_enable)
+ return -E2BIG;
goto merge;
-
+ }
offset -= headlen;
pinfo->nr_frags = nr_frags;
skbinfo->nr_frags = 0;
@@ -2977,9 +2988,11 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
unsigned int first_size = headlen - offset;
unsigned int first_offset;
- if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
+ if (unlikely(nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)) {
+ if (!sysctl_gro_frag_list_enable)
+ return -E2BIG;
goto merge;
-
+ }
first_offset = skb->data -
(unsigned char *)page_address(page) +
offset;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cca444190907..2d6aaf6d5838 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -24,6 +24,7 @@
static int zero = 0;
static int one = 1;
+static int two = 2;
static int ushort_max = USHRT_MAX;
#ifdef CONFIG_RPS
@@ -360,6 +361,15 @@ static struct ctl_table net_core_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "gro_frag_list_enable",
+ .data = &sysctl_gro_frag_list_enable,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &two,
+ },
{ }
};
^ permalink raw reply related
* Re: [PATCH] bgmac: don't update slot on skb alloc/dma mapping error
From: Nathan Hintz @ 2013-10-29 15:20 UTC (permalink / raw)
To: Rafał Miłecki; +Cc: Network Development
In-Reply-To: <CACna6rzqwz_wp=9ayX6AUZvVSNwdAg-LniXtNwy9eRFDEoNspg@mail.gmail.com>
On Tue, 29 Oct 2013 09:28:56 +0100
Rafał Miłecki <zajec5@gmail.com> wrote:
> 2013/10/29 Nathan Hintz <nlhintz@hotmail.com>:
> > On Tue, 29 Oct 2013 07:52:58 +0100
> > Rafał Miłecki <zajec5@gmail.com> wrote:
> >
> >> 2013/10/29 Nathan Hintz <nlhintz@hotmail.com>:
> >> > Don't update the slot in "bgmac_dma_rx_skb_for_slot" unless both the
> >> > skb alloc and dma mapping are successful; and free the newly allocated
> >> > skb if a dma mapping error occurs. This will prevent an skb leak upon
> >> > returning when an error occurs.
> >>
> >> In case of bgmac_dma_rx_skb_for_slot failure we're giving up anyway
> >> (and freeing everything), but with your patch code is simpler to
> >> understand, so I'm OK with that.
> >>
> >> Acked-by: Rafał Miłecki <zajec5@gmail.com>
> >>
> >
> > I might be misunderstanding; but in the case of failure, it appeared to me
> > that the currently received packet was dropped and the old skb would continue
> > to be assigned to the slot and would be used to receive future packets (this
> > would continue until bgmac_dma_rx_skb_for_slot was successful).
>
> I was commenting on current usage (.), not my WIP patch
> for bgmac_dma_rx_read :)
>
> Your patch will be helpful for my bgmac_dma_rx_read rework.
>
You're right, I was commenting on your WIP. The commit message should probably
be changed to remove the statement "This will prevent an skb leak upon returning
when an error occurs", as this doesn't occur with the usage in bgmac_dma_alloc.
Unfortunately, I won't be able to send a revised patch until tonight.
Nathan
--
Nathan
^ permalink raw reply
* [PATCH net-next] xen-netback: allocate xenvif arrays using vzalloc.
From: Joby Poriyath @ 2013-10-29 15:27 UTC (permalink / raw)
To: netdev
Cc: wei.liu2, ian.campbell, xen-devel, andrew.bennieston,
david.vrabel, malcolm.crossley
This will reduce memory pressure when allocating struct xenvif.
The size of xenvif struct has increased from 168 to 36632 bytes (on x86-32).
See commit b3f980bd827e6e81a050c518d60ed7811a83061d. This resulted in
occasional netdev allocation failure in dom0 with 752MiB RAM, due to
fragmented memory.
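
To put the sizes above in perspective, here is a rough sketch of the
page-order arithmetic. It assumes 4 KiB pages and that a large
kmalloc-style request is rounded up to a power-of-two number of
physically contiguous pages. page_order() is an illustrative helper, not
a kernel function, and the size of struct net_device itself is ignored.

```python
def page_order(size, page_size=4096):
    """Smallest order such that (1 << order) pages hold size bytes."""
    pages = -(-size // page_size)   # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


for size in (168, 36632):
    order = page_order(size)
    print(size, "bytes -> order", order, "->", (1 << order) * 4096,
          "contiguous bytes")
```

Under those assumptions the old structure fits in a single page, while
the new one needs a multi-page contiguous block, which is the kind of
request that tends to fail once memory is fragmented.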
Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
---
drivers/net/xen-netback/common.h | 10 +++---
drivers/net/xen-netback/interface.c | 61 +++++++++++++++++++++++++++++++++++
drivers/net/xen-netback/netback.c | 6 ++--
3 files changed, 69 insertions(+), 8 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 55b8dec..82515a3 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -114,17 +114,17 @@ struct xenvif {
char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
struct xen_netif_tx_back_ring tx;
struct sk_buff_head tx_queue;
- struct page *mmap_pages[MAX_PENDING_REQS];
+ struct page **mmap_pages; /* [MAX_PENDING_REQS]; */
pending_ring_idx_t pending_prod;
pending_ring_idx_t pending_cons;
u16 pending_ring[MAX_PENDING_REQS];
- struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
+ struct pending_tx_info *pending_tx_info; /* [MAX_PENDING_REQS]; */
/* Coalescing tx requests before copying makes number of grant
* copy ops greater or equal to number of slots required. In
* worst case a tx request consumes 2 gnttab_copy.
*/
- struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
+ struct gnttab_copy *tx_copy_ops; /* [2*MAX_PENDING_REQS]; */
/* Use kthread for guest RX */
@@ -147,8 +147,8 @@ struct xenvif {
* head/fragment page uses 2 copy operations because it
* straddles two buffers in the frontend.
*/
- struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE];
- struct xenvif_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE];
+ struct gnttab_copy *grant_copy_op; /* [2*XEN_NETIF_RX_RING_SIZE]; */
+ struct xenvif_rx_meta *meta; /* [2*XEN_NETIF_RX_RING_SIZE]; */
u8 fe_dev_addr[6];
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e4aa267..d4a9807 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -288,6 +288,60 @@ static const struct net_device_ops xenvif_netdev_ops = {
.ndo_validate_addr = eth_validate_addr,
};
+static void deallocate_xenvif_arrays(struct xenvif *vif)
+{
+ vfree(vif->mmap_pages);
+ vif->mmap_pages = NULL;
+
+ vfree(vif->pending_tx_info);
+ vif->pending_tx_info = NULL;
+
+ vfree(vif->tx_copy_ops);
+ vif->tx_copy_ops = NULL;
+
+ vfree(vif->grant_copy_op);
+ vif->grant_copy_op = NULL;
+
+ vfree(vif->meta);
+ vif->meta = NULL;
+}
+
+static int allocate_xenvif_arrays(struct xenvif *vif)
+{
+ vif->mmap_pages = vif->pending_tx_info = NULL;
+ vif->tx_copy_ops = vif->grant_copy_op = vif->meta = NULL;
+
+ vif->mmap_pages = vzalloc(MAX_PENDING_REQS * sizeof(struct page *));
+ if (! vif->mmap_pages)
+ goto fail;
+
+ vif->pending_tx_info = vzalloc(MAX_PENDING_REQS *
+ sizeof(struct pending_tx_info));
+ if (! vif->pending_tx_info)
+ goto fail;
+
+ vif->tx_copy_ops = vzalloc(2 * MAX_PENDING_REQS *
+ sizeof(struct gnttab_copy));
+ if (! vif->tx_copy_ops)
+ goto fail;
+
+ vif->grant_copy_op = vzalloc(2 * XEN_NETIF_RX_RING_SIZE *
+ sizeof(struct gnttab_copy));
+ if (! vif->grant_copy_op)
+ goto fail;
+
+ vif->meta = vzalloc(2 * XEN_NETIF_RX_RING_SIZE *
+ sizeof(struct xenvif_rx_meta));
+ if (! vif->meta)
+ goto fail;
+
+ return 0;
+
+fail:
+ deallocate_xenvif_arrays(vif);
+ return 1;
+}
+
struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
unsigned int handle)
{
@@ -313,6 +367,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
vif->ip_csum = 1;
vif->dev = dev;
+ if (allocate_xenvif_arrays(vif)) {
+ netdev_warn(dev, "Could not create device: out of memory\n");
+ free_netdev(dev);
+ return ERR_PTR(-ENOMEM);
+ }
+
vif->credit_bytes = vif->remaining_credit = ~0UL;
vif->credit_usec = 0UL;
init_timer(&vif->credit_timeout);
@@ -484,6 +544,7 @@ void xenvif_free(struct xenvif *vif)
unregister_netdev(vif->dev);
+ deallocate_xenvif_arrays(vif);
free_netdev(vif->dev);
module_put(THIS_MODULE);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 828fdab..34c0c05 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -602,12 +602,12 @@ void xenvif_rx_action(struct xenvif *vif)
break;
}
- BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
+ BUG_ON(npo.meta_prod > 2*XEN_NETIF_RX_RING_SIZE);
if (!npo.copy_prod)
return;
- BUG_ON(npo.copy_prod > ARRAY_SIZE(vif->grant_copy_op));
+ BUG_ON(npo.copy_prod > 2*XEN_NETIF_RX_RING_SIZE);
gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod);
while ((skb = __skb_dequeue(&rxq)) != NULL) {
@@ -1571,7 +1571,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)
vif->tx.req_cons = idx;
- if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+ if ((gop-vif->tx_copy_ops) >= 2*MAX_PENDING_REQS)
break;
}
--
1.7.10.4
^ permalink raw reply related
* Realtek RTL8102E registers
From: Ivan Frederiks @ 2013-10-29 15:05 UTC (permalink / raw)
To: Linux r8169 crew
Hello!
I have run into trouble with RTL8102E operation (you can find a detailed
description below). I suspect that the trouble is related to chip
misconfiguration. Do you perhaps have access to an RTL8102E register
description? If so, would you be so kind as to share it with me?
Thank you in advance,
Ivan Frederiks
Embedded developer
Speech Technology Center
Phone: +7-812-331-0665, ext. 6123, 6942
Fax: +7-812-327-9297
P.S.
Issue description:
I'm currently working with a custom motherboard equipped with two RTL8102E
ICs and an Intel x86 SOM. The SOM runs 32-bit Linux (Arch or Ubuntu).
By default I use the r8169 driver. I know that the Ethernet link is up (I
checked it with an oscilloscope, and I can see that the RTL8102E link LEDs
are on). In most cases everything works fine, but sometimes the driver
reports that the link is down. After a power cycle the driver reports that
the link is up. When I replace the driver with r8101, it always reports
that the link is up, but I observe other issues.
^ permalink raw reply
* Re: [PATCH net-next] xen-netback: allocate xenvif arrays using vzalloc.
From: Eric Dumazet @ 2013-10-29 15:43 UTC (permalink / raw)
To: Joby Poriyath
Cc: netdev, wei.liu2, ian.campbell, xen-devel, andrew.bennieston,
david.vrabel, malcolm.crossley
In-Reply-To: <20131029152628.GA3065@citrix.com>
On Tue, 2013-10-29 at 15:27 +0000, Joby Poriyath wrote:
> This will reduce memory pressure when allocating struct xenvif.
>
> The size of xenvif struct has increased from 168 to 36632 bytes (on x86-32).
> See commit b3f980bd827e6e81a050c518d60ed7811a83061d. This resulted in
> occasional netdev allocation failure in dom0 with 752MiB RAM, due to
> fragmented memory.
This looks overkill.
Replacing a single allocation of ~36 KB with 5 vmalloc() calls looks like
you did not really try other things...
This should be done generically in alloc_netdev_mqs()
Take a look at commit 60877a32bce00041
("net: allow large number of tx queues")
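As a rough illustration of the generic approach suggested above: the usual pattern is to try a physically contiguous allocation first and fall back to a virtually contiguous one only when that fails. The following is a minimal userspace sketch of that idea, not kernel code. The names contig_zalloc(), virt_zalloc(), struct big_buf and CONTIG_LIMIT are hypothetical stand-ins (for a kzalloc()-like and a vzalloc()-like allocator and for memory fragmentation), and calloc() is used for both so the example can be compiled anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit: models "no contiguous block this large is available". */
#define CONTIG_LIMIT (32 * 1024)

enum alloc_kind { ALLOC_NONE, ALLOC_CONTIG, ALLOC_VIRT };

struct big_buf {
	void *mem;
	enum alloc_kind kind;	/* remembers which allocator succeeded */
};

/* Stand-in for a kzalloc()-like allocator that can fail for large sizes. */
static void *contig_zalloc(size_t size)
{
	if (size > CONTIG_LIMIT)
		return NULL;
	return calloc(1, size);
}

/* Stand-in for a vzalloc()-like allocator. */
static void *virt_zalloc(size_t size)
{
	return calloc(1, size);
}

/* Try the cheap contiguous allocation first, then fall back. */
static int big_buf_alloc(struct big_buf *b, size_t size)
{
	assert(b != NULL);

	b->mem = contig_zalloc(size);
	b->kind = ALLOC_CONTIG;
	if (!b->mem) {
		b->mem = virt_zalloc(size);
		b->kind = ALLOC_VIRT;
	}
	if (!b->mem) {
		b->kind = ALLOC_NONE;
		return -1;
	}
	return 0;
}

/* In the kernel the free path would have to pick kfree() or vfree(). */
static void big_buf_free(struct big_buf *b)
{
	assert(b != NULL);

	free(b->mem);
	b->mem = NULL;
	b->kind = ALLOC_NONE;
}
```

With this shape the caller does one allocation and one free, and the slower fallback path is taken only when the contiguous attempt actually fails.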
^ permalink raw reply
* Re: [PATCH net-next] xen-netback: allocate xenvif arrays using vzalloc.
From: Wei Liu @ 2013-10-29 15:50 UTC (permalink / raw)
To: Joby Poriyath
Cc: netdev, wei.liu2, ian.campbell, xen-devel, andrew.bennieston,
david.vrabel, malcolm.crossley
In-Reply-To: <20131029152628.GA3065@citrix.com>
On Tue, Oct 29, 2013 at 03:27:13PM +0000, Joby Poriyath wrote:
[...]
> +
> +static int allocate_xenvif_arrays(struct xenvif *vif)
> +{
> + vif->mmap_pages = vif->pending_tx_info = NULL;
> + vif->tx_copy_ops = vif->grant_copy_op = vif->meta = NULL;
> +
> + vif->mmap_pages = vzalloc(MAX_PENDING_REQS * sizeof(struct page *));
> + if (! vif->mmap_pages)
No space after "!".
> + goto fail;
> +
> + vif->pending_tx_info = vzalloc(MAX_PENDING_REQS *
> + sizeof(struct pending_tx_info));
> + if (! vif->pending_tx_info)
> + goto fail;
> +
> + vif->tx_copy_ops = vzalloc(2 * MAX_PENDING_REQS *
> + sizeof(struct gnttab_copy));
> + if (! vif->tx_copy_ops)
> + goto fail;
> +
> + vif->grant_copy_op = vzalloc(2 * XEN_NETIF_RX_RING_SIZE *
> + sizeof(struct gnttab_copy));
> + if (! vif->grant_copy_op)
> + goto fail;
> +
> + vif->meta = vzalloc(2 * XEN_NETIF_RX_RING_SIZE *
> + sizeof(struct xenvif_rx_meta));
Indentation.
> + if (! vif->meta)
> + goto fail;
> +
> + return 0;
> +
> +fail:
> + deallocate_xenvif_arrays(vif);
> + return 1;
return -ENOMEM;
> +}
> +
> struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
> unsigned int handle)
> {
> @@ -313,6 +367,12 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
> vif->ip_csum = 1;
> vif->dev = dev;
>
> + if (allocate_xenvif_arrays(vif)) {
> + netdev_warn(dev, "Could not create device: out of memory\n");
> + free_netdev(dev);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> vif->credit_bytes = vif->remaining_credit = ~0UL;
> vif->credit_usec = 0UL;
> init_timer(&vif->credit_timeout);
> @@ -484,6 +544,7 @@ void xenvif_free(struct xenvif *vif)
>
> unregister_netdev(vif->dev);
>
> + deallocate_xenvif_arrays(vif);
> free_netdev(vif->dev);
>
> module_put(THIS_MODULE);
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 828fdab..34c0c05 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -602,12 +602,12 @@ void xenvif_rx_action(struct xenvif *vif)
> break;
> }
>
> - BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
> + BUG_ON(npo.meta_prod > 2*XEN_NETIF_RX_RING_SIZE);
>
> if (!npo.copy_prod)
> return;
>
> - BUG_ON(npo.copy_prod > ARRAY_SIZE(vif->grant_copy_op));
> + BUG_ON(npo.copy_prod > 2*XEN_NETIF_RX_RING_SIZE);
It's better to add
#define XEN_NETBK_RX_ARRAY_SIZE (2*XEN_NETIF_RX_RING_SIZE)
and use it consistently in the code (allocation / comparison). Otherwise
it's easy to change one site and forget about the others.
> gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod);
>
> while ((skb = __skb_dequeue(&rxq)) != NULL) {
> @@ -1571,7 +1571,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)
>
> vif->tx.req_cons = idx;
>
> - if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
> + if ((gop-vif->tx_copy_ops) >= 2*MAX_PENDING_REQS)
Same here.
#define XEN_NETBK_TX_ARRAY_SIZE
Wei.
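To make the review comments above concrete, here is a minimal userspace sketch of the suggested shape: one macro per array size, used both at the allocation site and in the bounds check, and a negative errno on failure. It is not the driver code. The values given to MAX_PENDING_REQS and XEN_NETIF_RX_RING_SIZE are placeholders for illustration, struct copy_op and struct vif_arrays are hypothetical stand-ins for struct gnttab_copy and struct xenvif, and calloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder values; the real constants come from the driver headers. */
#define MAX_PENDING_REQS	256
#define XEN_NETIF_RX_RING_SIZE	256

/* Single definition of each array size, shared by all users. */
#define XEN_NETBK_TX_ARRAY_SIZE	(2 * MAX_PENDING_REQS)
#define XEN_NETBK_RX_ARRAY_SIZE	(2 * XEN_NETIF_RX_RING_SIZE)

/* Stand-in for struct gnttab_copy. */
struct copy_op {
	unsigned long src, dst;
	unsigned int len;
};

/* Stand-in for the array members of struct xenvif. */
struct vif_arrays {
	struct copy_op *tx_copy_ops;
	struct copy_op *grant_copy_op;
};

static void vif_arrays_free(struct vif_arrays *v)
{
	free(v->tx_copy_ops);	/* free(NULL) is a no-op */
	free(v->grant_copy_op);
	v->tx_copy_ops = NULL;
	v->grant_copy_op = NULL;
}

static int vif_arrays_alloc(struct vif_arrays *v)
{
	v->tx_copy_ops = NULL;
	v->grant_copy_op = NULL;

	v->tx_copy_ops = calloc(XEN_NETBK_TX_ARRAY_SIZE,
				sizeof(*v->tx_copy_ops));
	if (!v->tx_copy_ops)
		goto fail;

	v->grant_copy_op = calloc(XEN_NETBK_RX_ARRAY_SIZE,
				  sizeof(*v->grant_copy_op));
	if (!v->grant_copy_op)
		goto fail;

	return 0;

fail:
	vif_arrays_free(v);
	return -ENOMEM;
}

/* Bounds check uses the same macro as the allocation above. */
static int tx_ops_full(const struct vif_arrays *v, const struct copy_op *gop)
{
	return (size_t)(gop - v->tx_copy_ops) >= XEN_NETBK_TX_ARRAY_SIZE;
}
```

Once the arrays become pointers, ARRAY_SIZE() can no longer describe them, so a named constant is the only thing that keeps the allocation and the comparison in step.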
^ permalink raw reply
* Re: [PATCH 00/19] Enable various Renesas drivers on all ARM platforms
From: Mark Brown @ 2013-10-29 16:04 UTC (permalink / raw)
To: Simon Horman
Cc: linux-fbdev, Wolfram Sang, Linus Walleij, Guennadi Liakhovetski,
Thierry Reding, linux-mtd, linux-i2c, Laurent Pinchart,
Vinod Koul, Joerg Roedel, linux-sh, Magnus Damm, Eduardo Valentin,
Tomi Valkeinen, linux-serial, linux-input, Zhang Rui, Chris Ball,
Jean-Christophe Plagniol-Villard, linux-media, linux-pwm,
Samuel Ortiz, linux-pm, Ian Molton, linux-arm-ker
In-Reply-To: <20131029060427.GF11580@verge.net.au>
On Tue, Oct 29, 2013 at 03:04:27PM +0900, Simon Horman wrote:
> I think this is a step in a good direction.
> However, I think it would be even better if the architecture dependency was
> removed completely.
Yes, please.
^ permalink raw reply
* Re: [PATCH v2] can: c_can: Speed up rx_poll function
From: Joe Perches @ 2013-10-29 16:24 UTC (permalink / raw)
To: Markus Pargmann
Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
linux-kernel, kernel
In-Reply-To: <20131029085853.GC20839@pengutronix.de>
On Tue, 2013-10-29 at 09:58 +0100, Markus Pargmann wrote:
> On Tue, Oct 29, 2013 at 01:34:48AM -0700, Joe Perches wrote:
> > On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> > > This patch speeds up the rx_poll function by reducing the number of
> > > register reads.
> > []
> > > 125kbit:
> > > Function Hit Time Avg s^2
> > > -------- --- ---- --- ---
> > > c_can_do_rx_poll 63960 10168178 us 158.977 us 1493056 us
> > > With patch:
> > > c_can_do_rx_poll 63939 4268457 us 66.758 us 818790.9 us
> > >
> > > 1Mbit:
> > > Function Hit Time Avg s^2
> > > -------- --- ---- --- ---
> > > c_can_do_rx_poll 69489 30049498 us 432.435 us 9271851 us
> > > With patch:
> > > c_can_do_rx_poll 103034 24220362 us 235.071 us 6016656 us
[]
> Yes I just measured the timings again:
[]
> ./perf_can_test.sh 125000 30
[]
> c_can_do_rx_poll 63941 3764057 us 58.867 us 776162.2 us
Good, it's slightly faster still.
> ./perf_can_test.sh 1000000 30
[]
> c_can_do_rx_poll 207109 24322185 us 117.436 us 171469047 us
[]
> It is interesting that the number of hits for c_can_do_rx_poll is twice as much
> as it was with find_next_bit.
How is this possible? Any idea?
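For reference when reading the tables quoted above: the Avg column is simply Time divided by Hit. A small sketch of that arithmetic follows; struct poll_stat and avg_us() are hypothetical names used only for illustration, and the numbers in the usage note are the ones quoted in this thread.

```c
#include <assert.h>

/* One row of the profiler output quoted above. */
struct poll_stat {
	unsigned long hits;	/* "Hit" column */
	double total_us;	/* "Time" column, in microseconds */
};

/* Average time per call, i.e. the "Avg" column. */
static double avg_us(struct poll_stat s)
{
	assert(s.hits > 0);
	return s.total_us / (double)s.hits;
}
```

Applied to the two 1 Mbit runs with the patch (103034 hits in 24220362 us, then 207109 hits in 24322185 us), the total time is nearly the same while the hit count roughly doubles, so the average per call drops by about half without the function consuming less time overall.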
^ permalink raw reply