Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Hannes Frederic Sowa @ 2013-10-29 14:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: David Miller, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji,
	kaber, thaller, stephen
In-Reply-To: <1383057078.2236.12.camel@dcbw.foobar.com>

Hi!

On Tue, Oct 29, 2013 at 09:31:18AM -0500, Dan Williams wrote:
> On Tue, 2013-10-29 at 00:48 +0100, Hannes Frederic Sowa wrote:
> > On Mon, Oct 28, 2013 at 06:16:19PM -0500, Dan Williams wrote:
> > > On Mon, 2013-10-28 at 17:17 -0400, David Miller wrote:
> > > > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Date: Sun, 27 Oct 2013 17:48:35 +0100
> > > > 
> > > > > A temporary address is also bound to a non-privacy public address so
> > > > > it's lifetime is determined by its lifetime (e.g. if you switch the
> > > > > network and don't receive on-link information for that prefix any
> > > > > more). NetworkManager would have to take care about that, too. It is
> > > > > just a question of what NetworkManager wants to handle itself or lets
> > > > > the kernel handle for it.
> > > > 
> > > > How much really needs to be in userspace to implement RFC4941?
> > > > 
> > > > I don't like the idea that even for a fully up and properly
> > > > functioning link, if NetworkManager wedges then critical things like
> > > > temporary address (re-)generation, will cease.
> > > 
> > > Honestly, I'd be completely happy to leave temporary address handling up
> > > to the kernel and *not* do it in userspace; the kernel already has all
> > > the code.  There are two problems with that though, (a) it's tied to
> > > in-kernel RA handling, and (b) it's controlled by a CONFIG option.  Both
> > > these are solvable.
> > 
> > Ah, (a) does complicate things, I agree. But the tieing is essential
> > currently. So it seems a netlink interface would be needed to tie a new
> > address to an already installed one, if the kernel should still deal
> > with the regeneration?
> 
> I think it's simpler than that.  New flag set when adding the
> non-private address that says "create and manage privacy addresses for
> this non-private address".  The kernel then adds the privacy addresses
> generated off the non-private address/prefixlen, and ties their lifetime
> to the non-private address.  If the non-private address is removed, the
> privacy addresses could get removed too.
> 
> I don't think we need API to tie addresses to already installed ones,
> because the kernel already has the privacy address generation code, so
> why should userspace generate the privacy address at all?  Just leave
> that to the kernel.

Ok.

> > > First off, what's the reasoning behind having IPv6 privacy as a config
> > > option?  It's off-by-default and must be explicitly turned on, so is
> > > there any harm in removing the config?  Or is it just for
> > > smallest-kernel-ever folks?
> > 
> > I don't know about the policy. Does it really matter as distributions
> > normally switch it on? But I would not like to see the option removed
> > entirly, maybe the default could be changed.
> > 
> > > Would a new IFA_F_MANAGE_TEMP (or better name) work here, indicating
> > > that for some new static address, that the kernel should create and
> > > manage the temporary privacy addresses associated with its prefix?
> > 
> > But this would only be needed if they were managed in user-space, no?
> 
> "if they" == what?  privacy address or static address?  What

With "they" I meant privacy addresses.

> NetworkManager is trying to do is handle RAs in userspace with libndp
> for various flexibility and behavioral reasons, but we'd really like to
> leave all the temporary address stuff up to the kernel.

Can you provide me with details why the Kernel RA implementation is not good
enough? I tried to find some bugs, I found some but they were missing details
or were not even correct or outdated.

> So NM would handle RA/RS and when it gets a prefix, it would create the
> IPv6 non-private address and add it to the interface.  When adding, it
> would also set the "IFA_F_MANAGE_TEMP" flag (or whatever) and the kernel
> would then handle all the privacy address generation, lifetimes, and
> timers.  Basically, break some of the privacy code away from the
> in-kernel RA handling so that privacy addresses could be triggered from
> userland too.
> 
> Would that be workable?

That sounds like a solid plan for me. I would actually liked to see that NM
would use the kernel implementation but I guess there is no way back any more.
:(

Greetings,

  Hannes

^ permalink raw reply

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Dan Williams @ 2013-10-29 14:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Miller, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji,
	kaber, thaller, stephen
In-Reply-To: <20131028234842.GB26185@order.stressinduktion.org>

On Tue, 2013-10-29 at 00:48 +0100, Hannes Frederic Sowa wrote:
> On Mon, Oct 28, 2013 at 06:16:19PM -0500, Dan Williams wrote:
> > On Mon, 2013-10-28 at 17:17 -0400, David Miller wrote:
> > > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > Date: Sun, 27 Oct 2013 17:48:35 +0100
> > > 
> > > > A temporary address is also bound to a non-privacy public address so
> > > > it's lifetime is determined by its lifetime (e.g. if you switch the
> > > > network and don't receive on-link information for that prefix any
> > > > more). NetworkManager would have to take care about that, too. It is
> > > > just a question of what NetworkManager wants to handle itself or lets
> > > > the kernel handle for it.
> > > 
> > > How much really needs to be in userspace to implement RFC4941?
> > > 
> > > I don't like the idea that even for a fully up and properly
> > > functioning link, if NetworkManager wedges then critical things like
> > > temporary address (re-)generation, will cease.
> > 
> > Honestly, I'd be completely happy to leave temporary address handling up
> > to the kernel and *not* do it in userspace; the kernel already has all
> > the code.  There are two problems with that though, (a) it's tied to
> > in-kernel RA handling, and (b) it's controlled by a CONFIG option.  Both
> > these are solvable.
> 
> Ah, (a) does complicate things, I agree. But the tieing is essential
> currently. So it seems a netlink interface would be needed to tie a new
> address to an already installed one, if the kernel should still deal
> with the regeneration?

I think it's simpler than that.  New flag set when adding the
non-private address that says "create and manage privacy addresses for
this non-private address".  The kernel then adds the privacy addresses
generated off the non-private address/prefixlen, and ties their lifetime
to the non-private address.  If the non-private address is removed, the
privacy addresses could get removed too.

I don't think we need API to tie addresses to already installed ones,
because the kernel already has the privacy address generation code, so
why should userspace generate the privacy address at all?  Just leave
that to the kernel.

> > First off, what's the reasoning behind having IPv6 privacy as a config
> > option?  It's off-by-default and must be explicitly turned on, so is
> > there any harm in removing the config?  Or is it just for
> > smallest-kernel-ever folks?
> 
> I don't know about the policy. Does it really matter as distributions
> normally switch it on? But I would not like to see the option removed
> entirly, maybe the default could be changed.
> 
> > Would a new IFA_F_MANAGE_TEMP (or better name) work here, indicating
> > that for some new static address, that the kernel should create and
> > manage the temporary privacy addresses associated with its prefix?
> 
> But this would only be needed if they were managed in user-space, no?

"if they" == what?  privacy address or static address?  What
NetworkManager is trying to do is handle RAs in userspace with libndp
for various flexibility and behavioral reasons, but we'd really like to
leave all the temporary address stuff up to the kernel.

So NM would handle RA/RS and when it gets a prefix, it would create the
IPv6 non-private address and add it to the interface.  When adding, it
would also set the "IFA_F_MANAGE_TEMP" flag (or whatever) and the kernel
would then handle all the privacy address generation, lifetimes, and
timers.  Basically, break some of the privacy code away from the
in-kernel RA handling so that privacy addresses could be triggered from
userland too.

Would that be workable?

Dan 

^ permalink raw reply

* Re: IPV6 nf defrag does not work
From: Jiri Pirko @ 2013-10-29 14:30 UTC (permalink / raw)
  To: netdev; +Cc: pablo, netfilter-devel, yoshfuji, kadlec, kaber
In-Reply-To: <20131029105208.GA18526@minipsycho.orion>

Tue, Oct 29, 2013 at 11:52:08AM CET, jiri@resnulli.us wrote:
>Hi All.
>
>On the current net-next if you on HOSTA do:
>ip6tables -I INPUT -p icmpv6 -j DROP
>ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>
>and on HOSTB you do:
>ping6 HOSTA -s2000    (MTU is 1500)
>
>Only the first ICMP echo request will be passed through, the rest is not
>passed on HOSTA. This issue does not occur with smaller packets than MTU (where
>fragmentation does not happen).
>

Hmm. The reason why first packet goes through is because of:
commit 58a317f1061c894d2344c0b6a18ab4a64b69b815
Author: Patrick McHardy <kaber@trash.net>
Date:   Sun Aug 26 19:14:12 2012 +0200

    netfilter: ipv6: add IPv6 NAT support

First packet will hit "if ((help && help->helper) || !nf_ct_is_confirmed(ct))"
(ct is uncorfirmed for it).

For this, nf_conntrack_ipv6 has to be loaded. Continuing investigation.

>I'm trying to find out where the problem is.
>
>Any quick ideas?
>
>Thanks
>
>Jiri

^ permalink raw reply

* Re: [PATCH NEXT] rtlwifi: Fix endian error in extracting packet type
From: Bjørn Mork @ 2013-10-29 14:27 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Larry Finger, linville, linux-wireless, Mark Cave-Ayland, netdev,
	Stable
In-Reply-To: <1383005246.3779.61.camel@bwh-desktop.uk.level5networks.com>

Ben Hutchings <bhutchings@solarflare.com> writes:

>> @@ -1077,8 +1077,8 @@ u8 rtl_is_special_data(struct ieee80211_hw *hw, struct sk_buff *skb, u8 is_tx)
>>  
>>  	ip = (struct iphdr *)((u8 *) skb->data + mac_hdr_len +
>>  			      SNAP_SIZE + PROTOC_TYPE_SIZE);
>> -	ether_type = *(u16 *) ((u8 *) skb->data + mac_hdr_len + SNAP_SIZE);
>> -	/*	ether_type = ntohs(ether_type); */
>> +	ether_type = be16_to_cpu(*(__be16 *)((u8 *)skb->data + mac_hdr_len +
>> +					     SNAP_SIZE));
>>  
>>  	if (ETH_P_IP == ether_type) {
>>  		if (IPPROTO_UDP == ip->protocol) {
>
> This crazy function also says that *all* IPv6 frames are special, which
> apparently means that on TX they should get sent at the lowest possible
> bit rate.  So I think this is going to cause a regression for IPv6
> throughput unless you remove that case.
>
> The DHCP case is also not validating IP and UDP header lengths against
> the packet length, though this may be harmless in practice.

It's not validating the upper 8 bits of the port numbers either, so it
will hit random UDP traffic in addition to DHCP. 

But it was good to see this function now. I was wondering how to support
some buggy 3G modem firmware without ugly hacks. Seems there will always
be worse hacks in drivers/net, no matter what I do :-)


Bjørn

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-29 14:27 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029141706.GC25078@neilslaptop.think-freely.org>

* Neil Horman <nhorman@tuxdriver.com> wrote:

> So, I apologize, you were right.  I was running the test.sh script 
> but perf was measuring itself. [...]

Ok, cool - one mystery less!

> Which overall looks alot more like I expect, save for the parallel 
> ALU cases. It seems here that the parallel ALU changes actually 
> hurt performance, which really seems counter-intuitive.  I don't 
> yet have any explination for that.  I do note that we seem to have 
> more stalls in the both case so perhaps the parallel chains call 
> for a more agressive prefetch.  Do you have any thoughts?

Note that with -ddd you 'overload' the PMU with more counters than 
can be run at once, which introduces extra noise. Since you are 
running the tests for 0.150 secs or so, the results are not very 
representative:

               734 dTLB-load-misses          #    0.00% of all dTLB cache hits   ( +-  8.40% ) [13.94%]
        13,314,660 iTLB-loads                #  280.759 M/sec                    ( +-  0.05% ) [12.97%]

with such low runtimes those results are very hard to trust.

So -ddd is typically used to pick up the most interesting PMU events 
you want to see measured, and then use them like this:

   -e dTLB-load-misses -e iTLB-loads

etc. For such short runtimes make sure the last column displays 
close to 100%, so that the PMU results become trustable.

A nehalem+ PMU will allow 2-4 events to be measured in parallel, 
plus generics like 'cycles', 'instructions' can be added 'for free' 
because they get counted in a separate (fixed purpose) PMU register.

The last colum tells you what percentage of the runtime that 
particular event was actually active. 100% (or empty last column) 
means it was active all the time.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-29 14:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029131149.GB20408@gmail.com>

On Tue, Oct 29, 2013 at 02:11:49PM +0100, Ingo Molnar wrote:
> 
> * Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > I'm sure it worked properly on my system here, I specificially 
> > checked it, but I'll gladly run it again.  You have to give me an 
> > hour as I have a meeting to run to, but I'll have results shortly.
> 
> So what I tried to react to was this observation of yours:
> 
> > > > Heres my data for running the same test with taskset 
> > > > restricting execution to only cpu0.  I'm not quite sure whats 
> > > > going on here, but doing so resulted in a 10x slowdown of the 
> > > > runtime of each iteration which I can't explain. [...]
> 
> A 10x slowdown would be consistent with not running your testcase 
> but 'perf bench sched messaging' by accident, or so.
> 
> But I was really just guessing wildly here.
> 
> Thanks,
> 
> 	Ingo
> 


So, I apologize, you were right.  I was running the test.sh script but perf was
measuring itself.  Using this command line:

for i in `seq 0 1 3`
do
echo $i > /sys/modules/csum_test/parameters/module_test_mode; taskset -c 0 perf stat --repeat -C 0 -ddd /root/test.sh
done >> counters.txt 2>&1

with test.sh unchanged I get these results:


Base:
 Performance counter stats for '/root/test.sh' (20 runs):

         56.069737 task-clock                #    1.005 CPUs utilized            ( +-  0.13% ) [100.00%]
                 5 context-switches          #    0.091 K/sec                    ( +-  5.11% ) [100.00%]
                 0 cpu-migrations            #    0.000 K/sec                   [100.00%]
               366 page-faults               #    0.007 M/sec                    ( +-  0.08% )
       144,264,737 cycles                    #    2.573 GHz                      ( +-  0.23% ) [17.49%]
         9,239,760 stalled-cycles-frontend   #    6.40% frontend cycles idle     ( +-  3.77% ) [19.19%]
       110,635,829 stalled-cycles-backend    #   76.69% backend  cycles idle     ( +-  0.14% ) [19.68%]
        54,291,496 instructions              #    0.38  insns per cycle        
                                             #    2.04  stalled cycles per insn  ( +-  0.14% ) [18.30%]
         5,844,933 branches                  #  104.244 M/sec                    ( +-  2.81% ) [16.58%]
           301,523 branch-misses             #    5.16% of all branches          ( +-  0.12% ) [16.09%]
        23,645,797 L1-dcache-loads           #  421.721 M/sec                    ( +-  0.05% ) [16.06%]
           494,467 L1-dcache-load-misses     #    2.09% of all L1-dcache hits    ( +-  0.06% ) [16.06%]
         2,907,250 LLC-loads                 #   51.851 M/sec                    ( +-  0.08% ) [16.06%]
           486,329 LLC-load-misses           #   16.73% of all LL-cache hits     ( +-  0.11% ) [16.06%]
        11,113,848 L1-icache-loads           #  198.215 M/sec                    ( +-  0.07% ) [16.06%]
             5,378 L1-icache-load-misses     #    0.05% of all L1-icache hits    ( +-  1.34% ) [16.06%]
        23,742,876 dTLB-loads                #  423.453 M/sec                    ( +-  0.06% ) [16.06%]
                 0 dTLB-load-misses          #    0.00% of all dTLB cache hits  [16.06%]
        11,108,538 iTLB-loads                #  198.120 M/sec                    ( +-  0.06% ) [16.06%]
                 0 iTLB-load-misses          #    0.00% of all iTLB cache hits  [16.07%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [16.07%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [16.07%]

       0.055817066 seconds time elapsed                                          ( +-  0.10% )

Prefetch(5*64):
 Performance counter stats for '/root/test.sh' (20 runs):

         47.423853 task-clock                #    1.005 CPUs utilized            ( +-  0.62% ) [100.00%]
                 6 context-switches          #    0.116 K/sec                    ( +-  4.27% ) [100.00%]
                 0 cpu-migrations            #    0.000 K/sec                   [100.00%]
               368 page-faults               #    0.008 M/sec                    ( +-  0.07% )
       120,423,860 cycles                    #    2.539 GHz                      ( +-  0.85% ) [14.23%]
         8,555,632 stalled-cycles-frontend   #    7.10% frontend cycles idle     ( +-  0.56% ) [16.23%]
        87,438,794 stalled-cycles-backend    #   72.61% backend  cycles idle     ( +-  1.13% ) [18.33%]
        55,039,308 instructions              #    0.46  insns per cycle        
                                             #    1.59  stalled cycles per insn  ( +-  0.05% ) [18.98%]
         5,619,298 branches                  #  118.491 M/sec                    ( +-  2.32% ) [18.98%]
           303,686 branch-misses             #    5.40% of all branches          ( +-  0.08% ) [18.98%]
        26,577,868 L1-dcache-loads           #  560.432 M/sec                    ( +-  0.05% ) [18.98%]
         1,323,630 L1-dcache-load-misses     #    4.98% of all L1-dcache hits    ( +-  0.14% ) [18.98%]
         3,426,016 LLC-loads                 #   72.242 M/sec                    ( +-  0.05% ) [18.98%]
         1,304,201 LLC-load-misses           #   38.07% of all LL-cache hits     ( +-  0.13% ) [18.98%]
        13,190,316 L1-icache-loads           #  278.137 M/sec                    ( +-  0.21% ) [18.98%]
            33,881 L1-icache-load-misses     #    0.26% of all L1-icache hits    ( +-  4.63% ) [17.93%]
        25,366,685 dTLB-loads                #  534.893 M/sec                    ( +-  0.24% ) [15.93%]
               734 dTLB-load-misses          #    0.00% of all dTLB cache hits   ( +-  8.40% ) [13.94%]
        13,314,660 iTLB-loads                #  280.759 M/sec                    ( +-  0.05% ) [12.97%]
                 0 iTLB-load-misses          #    0.00% of all iTLB cache hits  [12.98%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [12.98%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [12.87%]

       0.047194407 seconds time elapsed                                          ( +-  0.62% )

Parallel ALU:
 Performance counter stats for '/root/test.sh' (20 runs):

         57.395070 task-clock                #    1.004 CPUs utilized            ( +-  1.71% ) [100.00%]
                 5 context-switches          #    0.092 K/sec                    ( +-  3.90% ) [100.00%]
                 0 cpu-migrations            #    0.000 K/sec                   [100.00%]
               367 page-faults               #    0.006 M/sec                    ( +-  0.10% )
       143,232,396 cycles                    #    2.496 GHz                      ( +-  1.68% ) [16.73%]
         7,299,843 stalled-cycles-frontend   #    5.10% frontend cycles idle     ( +-  2.69% ) [18.47%]
       109,485,845 stalled-cycles-backend    #   76.44% backend  cycles idle     ( +-  2.01% ) [19.99%]
        56,867,669 instructions              #    0.40  insns per cycle        
                                             #    1.93  stalled cycles per insn  ( +-  0.22% ) [19.49%]
         6,646,323 branches                  #  115.800 M/sec                    ( +-  2.15% ) [17.75%]
           304,671 branch-misses             #    4.58% of all branches          ( +-  0.37% ) [16.23%]
        23,612,428 L1-dcache-loads           #  411.402 M/sec                    ( +-  0.05% ) [15.95%]
           518,988 L1-dcache-load-misses     #    2.20% of all L1-dcache hits    ( +-  0.11% ) [15.95%]
         2,934,119 LLC-loads                 #   51.121 M/sec                    ( +-  0.06% ) [15.95%]
           509,027 LLC-load-misses           #   17.35% of all LL-cache hits     ( +-  0.15% ) [15.95%]
        11,103,819 L1-icache-loads           #  193.463 M/sec                    ( +-  0.08% ) [15.95%]
             5,381 L1-icache-load-misses     #    0.05% of all L1-icache hits    ( +-  2.45% ) [15.95%]
        23,727,164 dTLB-loads                #  413.401 M/sec                    ( +-  0.06% ) [15.95%]
                 0 dTLB-load-misses          #    0.00% of all dTLB cache hits  [15.95%]
        11,104,205 iTLB-loads                #  193.470 M/sec                    ( +-  0.06% ) [15.95%]
                 0 iTLB-load-misses          #    0.00% of all iTLB cache hits  [15.95%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [15.95%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [15.96%]

       0.057151644 seconds time elapsed                                          ( +-  1.69% )

Both:
 Performance counter stats for '/root/test.sh' (20 runs):

         48.377833 task-clock                #    1.005 CPUs utilized            ( +-  0.67% ) [100.00%]
                 5 context-switches          #    0.113 K/sec                    ( +-  3.88% ) [100.00%]
                 0 cpu-migrations            #    0.001 K/sec                    ( +-100.00% ) [100.00%]
               367 page-faults               #    0.008 M/sec                    ( +-  0.08% )
       122,529,490 cycles                    #    2.533 GHz                      ( +-  1.05% ) [14.24%]
         8,796,729 stalled-cycles-frontend   #    7.18% frontend cycles idle     ( +-  0.56% ) [16.20%]
        88,936,550 stalled-cycles-backend    #   72.58% backend  cycles idle     ( +-  1.48% ) [18.16%]
        58,405,660 instructions              #    0.48  insns per cycle        
                                             #    1.52  stalled cycles per insn  ( +-  0.07% ) [18.61%]
         5,742,738 branches                  #  118.706 M/sec                    ( +-  1.54% ) [18.61%]
           303,555 branch-misses             #    5.29% of all branches          ( +-  0.09% ) [18.61%]
        26,321,789 L1-dcache-loads           #  544.088 M/sec                    ( +-  0.07% ) [18.61%]
         1,236,101 L1-dcache-load-misses     #    4.70% of all L1-dcache hits    ( +-  0.08% ) [18.61%]
         3,409,768 LLC-loads                 #   70.482 M/sec                    ( +-  0.05% ) [18.61%]
         1,212,511 LLC-load-misses           #   35.56% of all LL-cache hits     ( +-  0.08% ) [18.61%]
        10,579,372 L1-icache-loads           #  218.682 M/sec                    ( +-  0.05% ) [18.61%]
            19,426 L1-icache-load-misses     #    0.18% of all L1-icache hits    ( +- 14.70% ) [18.61%]
        25,329,963 dTLB-loads                #  523.586 M/sec                    ( +-  0.27% ) [17.29%]
               802 dTLB-load-misses          #    0.00% of all dTLB cache hits   ( +-  5.43% ) [15.33%]
        10,635,524 iTLB-loads                #  219.843 M/sec                    ( +-  0.09% ) [13.38%]
                 0 iTLB-load-misses          #    0.00% of all iTLB cache hits  [12.72%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [12.72%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [12.72%]

       0.048140073 seconds time elapsed                                          ( +-  0.67% )


Which overall looks alot more like I expect, save for the parallel ALU cases.
It seems here that the parallel ALU changes actually hurt performance, which
really seems counter-intuitive.  I don't yet have any explination for that.  I
do note that we seem to have more stalls in the both case so perhaps the
parallel chains call for a more agressive prefetch.  Do you have any thoughts?

Regards
Neil

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: David Ahern @ 2013-10-29 14:12 UTC (permalink / raw)
  To: Ingo Molnar, Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029125233.GA17449@gmail.com>

On 10/29/13 6:52 AM, Ingo Molnar wrote:
>> According to the perf man page, I'm supposed to be able to use --
>> to separate perf command line parameters from the command I want
>> to run.  And it definately executed test.sh, I added an echo to
>> stdout in there as a test run and observed them get captured in
>> counters.txt
>
> Well, '--' can be used to delineate the command portion for cases
> where it's ambiguous.
>
> Here's it's unambiguous though. This:
>
>    perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/test.sh
>
> stops parsing a valid option after the -ddd option, so in theory it
> should execute 'perf bench sched messaging -- /root/test.sh' where
> '-- /root/test.sh' is simply a parameter to 'perf bench' and is thus
> ignored.

Normally with perf commands a workload can be specified to state how 
long to collect perf data. That is not the case for perf-bench.

David

^ permalink raw reply

* Re: [PATCH 00/19] Enable various Renesas drivers on all ARM platforms
From: Artem Bityutskiy @ 2013-10-29 13:54 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-fbdev, linux-sh, Linus Walleij, Guennadi Liakhovetski,
	Thierry Reding, linux-mtd, linux-i2c, Laurent Pinchart,
	David S. Miller, Vinod Koul, Joerg Roedel, Wolfram Sang,
	Magnus Damm, Eduardo Valentin, Tomi Valkeinen, linux-serial,
	linux-input, Zhang Rui, Chris Ball,
	Jean-Christophe Plagniol-Villard, linux-media, linux-pwm,
	Samuel Ortiz, linux-pm, Ian
In-Reply-To: <1844190.ApyucSZX8W@avalon>

On Tue, 2013-10-29 at 14:22 +0100, Laurent Pinchart wrote:
> > Also, if ARM dependency is ever removed, all these should become 'n' by
> > default in the Kconfig, in order to make sure they do not slip into
> > defconfigs of different architectures.
> 
> The idea is that, if ARM is neither a compile-time nor runtime dependency, it 
> should not be specified in Kconfig. However, if the IP core has never been 
> used on anything but SuperH and ARM, I don't think clobbering the config 
> process with drivers that can't be used on the target architecture would be a 
> really good idea, especially now that we have a COMPILE_TEST Kconfig option. 
> My preference does goes to SUPERH || ARM || COMPILE_TEST over no dependency at 
> all.

Ah, OK, I missed the entire COMPILE_TEST story.

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply

* Re: [PATCH net-next] net: introduce gro_frag_list_enable sysctl
From: Christoph Paasch @ 2013-10-29 13:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Herbert Xu, netdev, Jerry Chu, Michael Dalton
In-Reply-To: <1383051962.5464.25.camel@edumazet-glaptop.roam.corp.google.com>

On 29/10/13 - 06:06:02, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
> by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")
> 
> (Jerry is working on adding native GRO support for tunnels)
> 
> skb_segment() only deals with a frag_list chain containing MSS sized
> fragments.
> 
> This patch adds support any kind of frag, and adds a new sysctl,
> as clearly the GRO layer should avoid building frag_list skbs
> on a router, as the segmentation is adding cpu overhead.
> 
> Note that we could try to reuse page fragments instead of doing
> copy to linear skbs, but this requires a fair amount of work,
> and possible truesize nightmares, as we do not track individual
> (per page fragment) truesizes.
> 
> /proc/sys/net/core/gro_frag_list_enable possible values are :
> 
> 0 : GRO layer is not allowed to use frag_list to extend skb capacity
> 1 : GRO layer is allowed to use frag_list, but skb_segment()
>     automatically sets the sysctl to 0.
> 2 : GRO is allowed to use frag_list, and skb_segment() wont
>     clear the sysctl.
> 
> Default value is 1 : automatic discovery
> 
> Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> Reported-by: Jerry Chu <hkchu@google.com>
> Cc: Michael Dalton <mwdalton@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  Documentation/sysctl/net.txt |   19 +++++++++++++++++++
>  include/linux/netdevice.h    |    1 +
>  net/core/skbuff.c            |   29 ++++++++++++++++++++---------
>  net/core/sysctl_net_core.c   |   10 ++++++++++
>  4 files changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index 9a0319a82470..8778568ae64e 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -87,6 +87,25 @@ sysctl.net.busy_read globally.
>  Will increase power usage.
>  Default: 0 (off)
>  
> +gro_frag_list_enable
> +--------------------
> +
> +GRO layer can build full size GRO packets (~64K of payload) if it is allowed
> +to extend skb using the frag_list pointer. However, this strategy is a win
> +on hosts, where TCP flows are terminated. For a router, using frag_list
> +skbs is not a win because we have to segment skbs before transmit,
> +as most NIC drivers do not support frag_list.
> +As soon as one frag_list skb has to be segmented, this sysctl is automatically
> +changed from 1 to 0.
> +If the value is set to 2, kernel wont change it.
> +
> +Choices : 0 (off),
> +          1 (on, with automatic change to 0)
> +          2 (on, permanent)
> +
> +Default: 1 (on, with automatic downgrade on a router)
> +
> +
>  rmem_default
>  ------------
>  
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 27f62f746621..b82ff52f301e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2807,6 +2807,7 @@ extern int		netdev_max_backlog;
>  extern int		netdev_tstamp_prequeue;
>  extern int		weight_p;
>  extern int		bpf_jit_enable;
> +extern int		sysctl_gro_frag_list_enable;

We are missing the definition of sysctl_gro_frag_list_enable :)

net/built-in.o: In function `skb_gro_receive':
(.text+0x8f04): undefined reference to `sysctl_gro_frag_list_enable'
net/built-in.o: In function `skb_segment':
(.text+0xa54e): undefined reference to `sysctl_gro_frag_list_enable'
net/built-in.o: In function `skb_segment':
(.text+0xa557): undefined reference to `sysctl_gro_frag_list_enable'
net/built-in.o:(.data+0x1198): undefined reference to `sysctl_gro_frag_list_enable'


Cheers,
Christoph

^ permalink raw reply

* Re: [PATCH] bridge: pass correct vlan id to multicast code
From: Amos Kong @ 2013-10-29 13:39 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Vlad Yasevich, netdev, shemminger
In-Reply-To: <1383044915.3518.41.camel@ubuntu-vm-makita>

On Tue, Oct 29, 2013 at 08:08:35PM +0900, Toshiaki Makita wrote:
> On Tue, 2013-10-29 at 10:36 +0800, Amos Kong wrote:
> > On Mon, Oct 28, 2013 at 03:45:07PM -0400, Vlad Yasevich wrote:
> > > Currently multicast code attempts to extrace the vlan id from
> > > the skb even when vlan filtering is disabled.  This can lead
> > > to mdb entries being created with the wrong vlan id.
> > > Pass the already extracted vlan id to the multicast
> > > filtering code to make the correct id is used in
> > > creation as well as lookup.
> 
> Thanks!
> 
> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> 
> >  
> > Hi Vlad,
> > 
> > Can we just update br_vlan_get_tag() to set vid to 0 if dev->vlan is
> > disabled? I guess it would effect br_handle_local_finish().
> 
> br_handle_local_finish() looks also buggy.
> But adding vlan enabled checking would not fix it completely because
> vlan_bitmap and PVID are not taken into account in that function.
> 
> Since we cannot pass vid as an argument from br_dev_xmit() to
> br_handle_[local/frame]_finish() because of NF_HOOK,
> br_handle_local_finish() seems to have to check vlan_enabled,
> vlan_bitmap, and pvid by itself.
> 
> IMHO it can be addressed by another patch.
>
> > > Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> > > ---
> > >  net/bridge/br_device.c    |  2 +-
> > >  net/bridge/br_input.c     |  2 +-
> > >  net/bridge/br_multicast.c | 44 +++++++++++++++++++-------------------------
> > >  net/bridge/br_private.h   |  6 ++++--
> > >  4 files changed, 25 insertions(+), 29 deletions(-)
> > 
> > ...
> > 
> > > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> > > index 8b0b610..686284f 100644
> > > --- a/net/bridge/br_multicast.c
> > > +++ b/net/bridge/br_multicast.c
> > > @@ -947,7 +947,8 @@ void br_multicast_disable_port(struct net_bridge_port *port)
> > >  
> > >  static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
> > >  					 struct net_bridge_port *port,
> > > -					 struct sk_buff *skb)
> > > +					 struct sk_buff *skb,
> > > +					 u16 vid)
> > >  {
> > >  	struct igmpv3_report *ih;
> > >  	struct igmpv3_grec *grec;
> > > @@ -957,12 +958,10 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
> > >  	int type;
> > >  	int err = 0;
> > >  	__be32 group;
> > > -	u16 vid = 0;
> > >  
> > >  	if (!pskb_may_pull(skb, sizeof(*ih)))
> > >  		return -EINVAL;
> > >  
> > > -	br_vlan_get_tag(skb, &vid);
> > 
> > After applied the patch, we always use vid in br_dev_xmit()->br_allowed_ingress(),
> > is it possible that the vlan of bridge is re-enabled when other
> > changed functions are called?
> > 
> > We can just add a enabled checking before this kind of br_vlan_get_tag()?
> > 
> > if (!br->vlan_enabled)
> >     br_vlan_get_tag(skb2, &vid);
> 
> Maybe this leads to a wrong way to update mdb in some cases like
>   Vlan_filtering is disabled (by default).
>   Add some vids we want to allow.
>   Receive a frame whose vid wouldn't be allowed with vlan_filtering enabled.
>   The frame passes br_allowed_ingress().
>   Enable vlan_filtering.
>   The frame reaches br_ip4_multicast_igmp3_report().
>   Mdb is updated with disabled vid.
> 
> 
> Thanks,
> 
> Toshiaki Makita
 
Thanks all your explanation, I'm ok with the patch.


-- 
			Amos.

^ permalink raw reply

* Re: [PATCH 03/16] wl1251: add sysfs interface for bluetooth coexistence mode configuration
From: Kalle Valo @ 2013-10-29 13:35 UTC (permalink / raw)
  To: Luca Coelho
  Cc: Ben Hutchings, Pali Rohár, John W. Linville, Johannes Berg,
	David S. Miller, linux-wireless, netdev, linux-kernel,
	freemangordon, aaro.koskinen, pavel, sre, joni.lapilainen,
	David Gnedt
In-Reply-To: <1383030565.21526.92.camel@porter.coelho.fi>

Luca Coelho <luca@coelho.fi> writes:

> On Mon, 2013-10-28 at 23:39 +0000, Ben Hutchings wrote:
>> On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
>> > From: David Gnedt <david.gnedt@davizone.at>
>> > 
>> > Port the bt_coex_mode sysfs interface from wl1251 driver version included
>> > in the Maemo Fremantle kernel to allow bt-coexistence mode configuration.
>> > This enables userspace applications to set one of the modes
>> > WL1251_BT_COEX_OFF, WL1251_BT_COEX_ENABLE and WL1251_BT_COEX_MONOAUDIO.
>> > The default mode is WL1251_BT_COEX_OFF.
>> > It should be noted that this driver always enabled bt-coexistence before
>> > and enabled bt-coexistence directly affects the receiving performance,
>> > rendering it unusable in some low-signal situations. Especially monitor
>> > mode is affected very badly with bt-coexistence enabled.
>> [...]
>> 
>> This should be implemented consistently with other drivers:
>> 
>> drivers/net/wireless/ath/ath9k/htc_drv_init.c:module_param_named(btcoex_enable, ath9k_htc_btcoex_enable, int, 0444);
>> drivers/net/wireless/ath/ath9k/init.c:module_param_named(btcoex_enable, ath9k_btcoex_enable, int, 0444);
>> drivers/net/wireless/b43/main.c:module_param_named(btcoex, modparam_btcoex, int, 0444);
>> drivers/net/wireless/ipw2x00/ipw2200.c:module_param(bt_coexist, int, 0444);
>> drivers/net/wireless/iwlegacy/common.c:module_param(bt_coex_active, bool, S_IRUGO);
>> drivers/net/wireless/iwlwifi/iwl-drv.c:module_param_named(bt_coex_active, iwlwifi_mod_params.bt_coex_active,
>> drivers/net/wireless/ti/wlcore/sysfs.c:static DEVICE_ATTR(bt_coex_state, S_IRUGO | S_IWUSR,
>> 
>> Oh, hmm, I see a problem here.
>
> With so many drivers doing the same thing, isn't it about time to add
> this to nl80211?

Yes, this really needs to be in nl80211. I even suggested this years ago
but was turned down at the time. Can't remember the reason anymore.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH 00/19] Enable various Renesas drivers on all ARM platforms
From: Laurent Pinchart @ 2013-10-29 13:22 UTC (permalink / raw)
  To: dedekind1-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-fbdev-u79uwXL29TY76Z2rM5mHXA,
	linux-sh-u79uwXL29TY76Z2rM5mHXA, Linus Walleij,
	Guennadi Liakhovetski, Thierry Reding,
	linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA, Laurent Pinchart,
	David S. Miller, Vinod Koul, Wolfram Sang, Magnus Damm,
	Eduardo Valentin, Tomi Valkeinen,
	linux-serial-u79uwXL29TY76Z2rM5mHXA,
	linux-input-u79uwXL29TY76Z2rM5mHXA, Zhang Rui, Chris Ball,
	Jean-Christophe Plagniol-Villard,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	linux-pwm-u79uwXL29TY76Z2rM5mHXA, Samuel Ortiz,
	linux-pm-u79uwXL29TY76Z2rM5mHXA, Ian Molton, Mark
In-Reply-To: <1383051980.29619.33.camel-Bxnoe/o8FG+Ef9UqXRslZEEOCMrvLtNR@public.gmane.org>

Hi Artem,

On Tuesday 29 October 2013 15:06:20 Artem Bityutskiy wrote:
> On Tue, 2013-10-29 at 10:12 +0100, Guennadi Liakhovetski wrote:
> > On Tue, 29 Oct 2013, Laurent Pinchart wrote:
> > > Hello,
> > > 
> > > This patch series, based on v3.12-rc7, prepares various Renesas drivers
> > > for migration to multiplatform kernels by enabling their compilation or
> > > otherwise fixing them on all ARM platforms. The patches are pretty
> > > straightforward and are described in their commit message.
> > > 
> > > I'd like to get all these patches merged in v3.14. As they will need to
> > > go through their respective subsystems' trees, I would appreciate if all
> > > maintainers involved could notify me when they merge patches from this
> > > series in their tree to help me tracking the merge status. I don't plan
> > > to send pull requests individually for these patches, and I will repost
> > > patches individually if changes are requested during review.
> > > 
> > > If you believe the issue should be solved in a different way (for
> > > instance by removing the architecture dependency completely) please
> > > reply to the cover letter to let other maintainers chime in.
> > 
> > Exactly this was my doubt. If we let these drivers build on all ARM
> > platforms... Maybe we should just let them build everywhere? Unless there
> > are real ARM dependencies. Maybe you could try to remove the restriction
> > and try to build them all on x86?
> 
> If they have never been used on anything but ARM, why would you remove
> ARM dependencies? Just for the sake of compile-checking?
> 
> Also, if ARM dependency is ever removed, all these should become 'n' by
> default in the Kconfig, in order to make sure they do not slip into
> defconfigs of different architectures.

The idea is that, if ARM is neither a compile-time nor runtime dependency, it 
should not be specified in Kconfig. However, if the IP core has never been 
used on anything but SuperH and ARM, I don't think clobbering the config 
process with drivers that can't be used on the target architecture would be a 
really good idea, especially now that we have a COMPILE_TEST Kconfig option. 
My preference does goes to SUPERH || ARM || COMPILE_TEST over no dependency at 
all.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-29 13:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029131149.GB20408@gmail.com>

On Tue, Oct 29, 2013 at 02:11:49PM +0100, Ingo Molnar wrote:
> 
> * Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > I'm sure it worked properly on my system here, I specificially 
> > checked it, but I'll gladly run it again.  You have to give me an 
> > hour as I have a meeting to run to, but I'll have results shortly.
> 
> So what I tried to react to was this observation of yours:
> 
> > > > Heres my data for running the same test with taskset 
> > > > restricting execution to only cpu0.  I'm not quite sure whats 
> > > > going on here, but doing so resulted in a 10x slowdown of the 
> > > > runtime of each iteration which I can't explain. [...]
> 
> A 10x slowdown would be consistent with not running your testcase 
> but 'perf bench sched messaging' by accident, or so.
> 
> But I was really just guessing wildly here.
> 
> Thanks,
> 
> 	Ingo
> 
Ok, well, I'll run it again in just a bit here.
Neil

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-29 13:11 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029130712.GA25078@neilslaptop.think-freely.org>


* Neil Horman <nhorman@tuxdriver.com> wrote:

> I'm sure it worked properly on my system here, I specificially 
> checked it, but I'll gladly run it again.  You have to give me an 
> hour as I have a meeting to run to, but I'll have results shortly.

So what I tried to react to was this observation of yours:

> > > Heres my data for running the same test with taskset 
> > > restricting execution to only cpu0.  I'm not quite sure whats 
> > > going on here, but doing so resulted in a 10x slowdown of the 
> > > runtime of each iteration which I can't explain. [...]

A 10x slowdown would be consistent with not running your testcase 
but 'perf bench sched messaging' by accident, or so.

But I was really just guessing wildly here.

Thanks,

	Ingo

^ permalink raw reply

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Hannes Frederic Sowa @ 2013-10-29 13:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Laight, David Miller, jiri, vyasevich, netdev, kuznet,
	jmorris, yoshfuji, kaber, thaller, stephen, Dave Taht
In-Reply-To: <1383052163.5464.28.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 29, 2013 at 06:09:23AM -0700, Eric Dumazet wrote:
> On Tue, 2013-10-29 at 13:40 +0100, Hannes Frederic Sowa wrote:
> 
> > Yes, this is a very unfortunate situation. From my experience it is not that
> > easy to get a patch merged into isc-dhcp.
> > 
> > It seems not that invasive to switch from af_packet to an udp socket with
> > SO_BROADCAST set.
> 
> Has anyone cooked and submitted such a patch ?

Not me, my patch was intended for the server side (dhcrelay correctly
choosing source addresses).

> I can visit ISC with a hammer if needed ;)

Actually that is a very big motivation for me to try to come up with a
patch at the weekend. ;-)

^ permalink raw reply

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Eric Dumazet @ 2013-10-29 13:09 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David Laight, David Miller, jiri, vyasevich, netdev, kuznet,
	jmorris, yoshfuji, kaber, thaller, stephen, Dave Taht
In-Reply-To: <20131029124010.GA15762@order.stressinduktion.org>

On Tue, 2013-10-29 at 13:40 +0100, Hannes Frederic Sowa wrote:

> Yes, this is a very unfortunate situation. From my experience it is not that
> easy to get a patch merged into isc-dhcp.
> 
> It seems not that invasive to switch from af_packet to an udp socket with
> SO_BROADCAST set.

Has anyone cooked and submitted such a patch ?

I can visit ISC with a hammer if needed ;)

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-29 13:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029125233.GA17449@gmail.com>

On Tue, Oct 29, 2013 at 01:52:33PM +0100, Ingo Molnar wrote:
> 
> * Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > On Tue, Oct 29, 2013 at 12:30:31PM +0100, Ingo Molnar wrote:
> > > 
> > > * Neil Horman <nhorman@tuxdriver.com> wrote:
> > > 
> > > > Sure it was this:
> > > > for i in `seq 0 1 3`
> > > > do
> > > > echo $i > /sys/module/csum_test/parameters/module_test_mode
> > > > taskset -c 0 perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/test.sh
> > > > done >> counters.txt 2>&1
> > > > 
> > > > where test.sh is:
> > > > #!/bin/sh
> > > > echo 1 > /sys/module/csum_test/parameters/test_fire
> > > 
> > > What does '-- /root/test.sh' do?
> > > 
> > > Unless I'm missing something, the line above will run:
> > > 
> > >   perf bench sched messaging -- /root/test.sh
> > > 
> > > which should be equivalent to:
> > > 
> > >   perf bench sched messaging
> > > 
> > > i.e. /root/test.sh won't be run.
> > 
> > According to the perf man page, I'm supposed to be able to use -- 
> > to separate perf command line parameters from the command I want 
> > to run.  And it definately executed test.sh, I added an echo to 
> > stdout in there as a test run and observed them get captured in 
> > counters.txt
> 
> Well, '--' can be used to delineate the command portion for cases 
> where it's ambiguous.
> 
> Here's it's unambiguous though. This:
> 
>   perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/test.sh
> 
> stops parsing a valid option after the -ddd option, so in theory it 
> should execute 'perf bench sched messaging -- /root/test.sh' where 
> '-- /root/test.sh' is simply a parameter to 'perf bench' and is thus 
> ignored.
> 
> The message output you provided seems to suggest that to be the 
> case:
> 
>  Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
> 
> See how the command executed by perf stat was 'perf bench ...'.
> 
> Did you want to run:
> 
>   perf stat --repeat 20 -C 0 -ddd /root/test.sh
> 
I'm sure it worked properly on my system here, I specificially checked it, but
I'll gladly run it again.  You have to give me an hour as I have a meeting to
run to, but I'll have results shortly.
Neil

> ?
> 
> Thanks,
> 
> 	Ingo
> 

^ permalink raw reply

* Re: [PATCH 00/19] Enable various Renesas drivers on all ARM platforms
From: Artem Bityutskiy @ 2013-10-29 13:06 UTC (permalink / raw)
  To: Guennadi Liakhovetski
  Cc: linux-fbdev, linux-sh, Linus Walleij, Guennadi Liakhovetski,
	Thierry Reding, linux-mtd, linux-i2c, Laurent Pinchart,
	Vinod Koul, Joerg Roedel, Wolfram Sang, Magnus Damm,
	Eduardo Valentin, Tomi Valkeinen, linux-serial, linux-input,
	Zhang Rui, Chris Ball, Jean-Christophe Plagniol-Villard,
	linux-media, linux-pwm, Samuel Ortiz, linux-pm, Ian Molton,
	Mark Brown <b
In-Reply-To: <Pine.LNX.4.64.1310291009121.8404@axis700.grange>

On Tue, 2013-10-29 at 10:12 +0100, Guennadi Liakhovetski wrote:
> Hi Laurent
> 
> On Tue, 29 Oct 2013, Laurent Pinchart wrote:
> 
> > Hello,
> > 
> > This patch series, based on v3.12-rc7, prepares various Renesas drivers
> > for migration to multiplatform kernels by enabling their compilation or
> > otherwise fixing them on all ARM platforms. The patches are pretty
> > straightforward and are described in their commit message.
> > 
> > I'd like to get all these patches merged in v3.14. As they will need to go
> > through their respective subsystems' trees, I would appreciate if all
> > maintainers involved could notify me when they merge patches from this series
> > in their tree to help me tracking the merge status. I don't plan to send pull
> > requests individually for these patches, and I will repost patches
> > individually if changes are requested during review.
> > 
> > If you believe the issue should be solved in a different way (for instance by
> > removing the architecture dependency completely) please reply to the cover
> > letter to let other maintainers chime in.
> 
> Exactly this was my doubt. If we let these drivers build on all ARM 
> platforms... Maybe we should just let them build everywhere? Unless there 
> are real ARM dependencies. Maybe you could try to remove the restriction 
> and try to build them all on x86?

If they have never been used on anything but ARM, why would you remove
ARM dependencies? Just for the sake of compile-checking?

Also, if ARM dependency is ever removed, all these should become 'n' by
default in the Kconfig, in order to make sure they do not slip into
defconfigs of different architectures.

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply

* [PATCH net-next] net: introduce gro_frag_list_enable sysctl
From: Eric Dumazet @ 2013-10-29 13:06 UTC (permalink / raw)
  To: Christoph Paasch, David Miller
  Cc: Herbert Xu, netdev, Jerry Chu, Michael Dalton
In-Reply-To: <20131029090849.GC5944@cpaasch-mac>

From: Eric Dumazet <edumazet@google.com>

Christoph Paasch and Jerry Chu reported crashes in skb_segment() caused
by commit 8a29111c7ca6 ("net: gro: allow to build full sized skb")

(Jerry is working on adding native GRO support for tunnels)

skb_segment() only deals with a frag_list chain containing MSS sized
fragments.

This patch adds support any kind of frag, and adds a new sysctl,
as clearly the GRO layer should avoid building frag_list skbs
on a router, as the segmentation is adding cpu overhead.

Note that we could try to reuse page fragments instead of doing
copy to linear skbs, but this requires a fair amount of work,
and possible truesize nightmares, as we do not track individual
(per page fragment) truesizes.

/proc/sys/net/core/gro_frag_list_enable possible values are :

0 : GRO layer is not allowed to use frag_list to extend skb capacity
1 : GRO layer is allowed to use frag_list, but skb_segment()
    automatically sets the sysctl to 0.
2 : GRO is allowed to use frag_list, and skb_segment() wont
    clear the sysctl.

Default value is 1 : automatic discovery

Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reported-by: Jerry Chu <hkchu@google.com>
Cc: Michael Dalton <mwdalton@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 Documentation/sysctl/net.txt |   19 +++++++++++++++++++
 include/linux/netdevice.h    |    1 +
 net/core/skbuff.c            |   29 ++++++++++++++++++++---------
 net/core/sysctl_net_core.c   |   10 ++++++++++
 4 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 9a0319a82470..8778568ae64e 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -87,6 +87,25 @@ sysctl.net.busy_read globally.
 Will increase power usage.
 Default: 0 (off)
 
+gro_frag_list_enable
+--------------------
+
+GRO layer can build full size GRO packets (~64K of payload) if it is allowed
+to extend skb using the frag_list pointer. However, this strategy is a win
+on hosts, where TCP flows are terminated. For a router, using frag_list
+skbs is not a win because we have to segment skbs before transmit,
+as most NIC drivers do not support frag_list.
+As soon as one frag_list skb has to be segmented, this sysctl is automatically
+changed from 1 to 0.
+If the value is set to 2, kernel wont change it.
+
+Choices : 0 (off),
+          1 (on, with automatic change to 0)
+          2 (on, permanent)
+
+Default: 1 (on, with automatic downgrade on a router)
+
+
 rmem_default
 ------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 27f62f746621..b82ff52f301e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2807,6 +2807,7 @@ extern int		netdev_max_backlog;
 extern int		netdev_tstamp_prequeue;
 extern int		weight_p;
 extern int		bpf_jit_enable;
+extern int		sysctl_gro_frag_list_enable;
 
 bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
 bool netdev_has_any_upper_dev(struct net_device *dev);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0ab32faa520f..d5b74ed1e9cb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2761,7 +2761,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 	unsigned int len;
 	__be16 proto;
 	bool csum;
-	int sg = !!(features & NETIF_F_SG);
+	bool sg = !!(features & NETIF_F_SG);
 	int nfrags = skb_shinfo(skb)->nr_frags;
 	int err = -ENOMEM;
 	int i = 0;
@@ -2793,7 +2793,13 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			hsize = len;
 
 		if (!hsize && i >= nfrags) {
-			BUG_ON(fskb->len != len);
+			if (fskb->len != len) {
+				if (sysctl_gro_frag_list_enable == 1)
+					sysctl_gro_frag_list_enable = 0;
+				hsize = len;
+				sg = false;
+				goto do_linear;
+			}
 
 			pos += len;
 			nskb = skb_clone(fskb, GFP_ATOMIC);
@@ -2812,6 +2818,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			skb_release_head_state(nskb);
 			__skb_push(nskb, doffset);
 		} else {
+do_linear:
 			nskb = __alloc_skb(hsize + doffset + headroom,
 					   GFP_ATOMIC, skb_alloc_rx_flag(skb),
 					   NUMA_NO_NODE);
@@ -2838,9 +2845,6 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 						 nskb->data - tnl_hlen,
 						 doffset + tnl_hlen);
 
-		if (fskb != skb_shinfo(skb)->frag_list)
-			goto perform_csum_check;
-
 		if (!sg) {
 			nskb->ip_summed = CHECKSUM_NONE;
 			nskb->csum = skb_copy_and_csum_bits(skb, offset,
@@ -2849,6 +2853,9 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			continue;
 		}
 
+		if (fskb != skb_shinfo(skb)->frag_list)
+			goto perform_csum_check;
+
 		frag = skb_shinfo(nskb)->frags;
 
 		skb_copy_from_linear_data_offset(skb, offset,
@@ -2944,9 +2951,11 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		int i = skbinfo->nr_frags;
 		int nr_frags = pinfo->nr_frags + i;
 
-		if (nr_frags > MAX_SKB_FRAGS)
+		if (unlikely(nr_frags > MAX_SKB_FRAGS)) {
+			if (!sysctl_gro_frag_list_enable)
+				return -E2BIG;
 			goto merge;
-
+		}
 		offset -= headlen;
 		pinfo->nr_frags = nr_frags;
 		skbinfo->nr_frags = 0;
@@ -2977,9 +2986,11 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		unsigned int first_size = headlen - offset;
 		unsigned int first_offset;
 
-		if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
+		if (unlikely(nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)) {
+			if (!sysctl_gro_frag_list_enable)
+				return -E2BIG;
 			goto merge;
-
+		}
 		first_offset = skb->data -
 			       (unsigned char *)page_address(page) +
 			       offset;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cca444190907..2d6aaf6d5838 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -24,6 +24,7 @@
 
 static int zero = 0;
 static int one = 1;
+static int two = 2;
 static int ushort_max = USHRT_MAX;
 
 #ifdef CONFIG_RPS
@@ -360,6 +361,15 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "gro_frag_list_enable",
+		.data		= &sysctl_gro_frag_list_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
 	{ }
 };
 

^ permalink raw reply related

* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-29 12:57 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Herbert Xu, netdev
In-Reply-To: <20131029090849.GC5944@cpaasch-mac>

On Tue, 2013-10-29 at 10:08 +0100, Christoph Paasch wrote:
> Ok, my router does not crash anymore with my workload.
> 
> Thanks for fixing it!
> 
> Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>

Ok fine ;)

Do you mind to test the official patch, as I added a new sysctl so that
GRO layer do not use frag_list on a router ?

You could check that /proc/sys/net/core/gro_frag_list_enable is
automatically cleared.

This way, we have optimal behavior for host or router.

Thanks !

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-29 12:52 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131029114907.GE24477@neilslaptop.think-freely.org>


* Neil Horman <nhorman@tuxdriver.com> wrote:

> On Tue, Oct 29, 2013 at 12:30:31PM +0100, Ingo Molnar wrote:
> > 
> > * Neil Horman <nhorman@tuxdriver.com> wrote:
> > 
> > > Sure it was this:
> > > for i in `seq 0 1 3`
> > > do
> > > echo $i > /sys/module/csum_test/parameters/module_test_mode
> > > taskset -c 0 perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/test.sh
> > > done >> counters.txt 2>&1
> > > 
> > > where test.sh is:
> > > #!/bin/sh
> > > echo 1 > /sys/module/csum_test/parameters/test_fire
> > 
> > What does '-- /root/test.sh' do?
> > 
> > Unless I'm missing something, the line above will run:
> > 
> >   perf bench sched messaging -- /root/test.sh
> > 
> > which should be equivalent to:
> > 
> >   perf bench sched messaging
> > 
> > i.e. /root/test.sh won't be run.
> 
> According to the perf man page, I'm supposed to be able to use -- 
> to separate perf command line parameters from the command I want 
> to run.  And it definately executed test.sh, I added an echo to 
> stdout in there as a test run and observed them get captured in 
> counters.txt

Well, '--' can be used to delineate the command portion for cases 
where it's ambiguous.

Here's it's unambiguous though. This:

  perf stat --repeat 20 -C 0 -ddd perf bench sched messaging -- /root/test.sh

stops parsing a valid option after the -ddd option, so in theory it 
should execute 'perf bench sched messaging -- /root/test.sh' where 
'-- /root/test.sh' is simply a parameter to 'perf bench' and is thus 
ignored.

The message output you provided seems to suggest that to be the 
case:

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

See how the command executed by perf stat was 'perf bench ...'.

Did you want to run:

  perf stat --repeat 20 -C 0 -ddd /root/test.sh

?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
From: Hannes Frederic Sowa @ 2013-10-29 12:50 UTC (permalink / raw)
  To: David Miller, netdev, fweimer
In-Reply-To: <20131029120424.GD26185@order.stressinduktion.org>

On Tue, Oct 29, 2013 at 01:04:25PM +0100, Hannes Frederic Sowa wrote:
> The reason for this is, that __ip_rt_update_pmtu is called with zero mtu
> and will get clamped to min_pmtu. It is possible to tell people to also
> increase min_pmtu but I don't want to take the risk of breaking other
> applications on a DNS server given how the current semantics are.

It is actually difficult to select a reasonable mtu here given more than two
interfaces with different mtus.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH] bridge: pass correct vlan id to multicast code
From: Vlad Yasevich @ 2013-10-29 12:45 UTC (permalink / raw)
  To: Amos Kong, Vlad Yasevich; +Cc: netdev, shemminger, makita.toshiaki
In-Reply-To: <20131029023646.GA2795@amosk.info>

On 10/28/2013 10:36 PM, Amos Kong wrote:
> On Mon, Oct 28, 2013 at 03:45:07PM -0400, Vlad Yasevich wrote:
>> Currently multicast code attempts to extrace the vlan id from
>> the skb even when vlan filtering is disabled.  This can lead
>> to mdb entries being created with the wrong vlan id.
>> Pass the already extracted vlan id to the multicast
>> filtering code to make the correct id is used in
>> creation as well as lookup.
>
> Hi Vlad,
>
> Can we just update br_vlan_get_tag() to set vid to 0 if dev->vlan is
> disabled? I guess it would effect br_handle_local_finish().

I have another patch that refactors the br_vlan_get_tag() and that would 
address this issue.  I am testing it now.

I am hoping to simplify the patch a bit though since I am not
crazy about the function signature that br_vlan_get_tag() gets.

-vlad

>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>> ---
>>   net/bridge/br_device.c    |  2 +-
>>   net/bridge/br_input.c     |  2 +-
>>   net/bridge/br_multicast.c | 44 +++++++++++++++++++-------------------------
>>   net/bridge/br_private.h   |  6 ++++--
>>   4 files changed, 25 insertions(+), 29 deletions(-)
>
> ...
>
>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>> index 8b0b610..686284f 100644
>> --- a/net/bridge/br_multicast.c
>> +++ b/net/bridge/br_multicast.c
>> @@ -947,7 +947,8 @@ void br_multicast_disable_port(struct net_bridge_port *port)
>>
>>   static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
>>   					 struct net_bridge_port *port,
>> -					 struct sk_buff *skb)
>> +					 struct sk_buff *skb,
>> +					 u16 vid)
>>   {
>>   	struct igmpv3_report *ih;
>>   	struct igmpv3_grec *grec;
>> @@ -957,12 +958,10 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
>>   	int type;
>>   	int err = 0;
>>   	__be32 group;
>> -	u16 vid = 0;
>>
>>   	if (!pskb_may_pull(skb, sizeof(*ih)))
>>   		return -EINVAL;
>>
>> -	br_vlan_get_tag(skb, &vid);
>
> After applied the patch, we always use vid in br_dev_xmit()->br_allowed_ingress(),
> is it possible that the vlan of bridge is re-enabled when other
> changed functions are called?
>
> We can just add a enabled checking before this kind of br_vlan_get_tag()?
>
> if (!br->vlan_enabled)
>      br_vlan_get_tag(skb2, &vid);
>
>
>>   	ih = igmpv3_report_hdr(skb);
>>   	num = ntohs(ih->ngrec);
>>   	len = sizeof(*ih);
>
> ...
>

^ permalink raw reply

* RE: [tipc-discussion] [PATCH net-next] tipc: remove two indentation levels in tipc_recv_msg routine
From: David Laight @ 2013-10-29 12:42 UTC (permalink / raw)
  To: Andreas Bofjäll, Erik Hugne
  Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <526FACA9.9020004@ericsson.com>

>   		continue;
> +
> +unlock:
> +		tipc_node_unlock(n_ptr);
>   cont:
>   		kfree_skb(buf);
>   	}

Might be better to call the labels 'unlock_discard' and 'discard'.

	David

^ permalink raw reply

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Hannes Frederic Sowa @ 2013-10-29 12:40 UTC (permalink / raw)
  To: David Laight
  Cc: David Miller, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji,
	kaber, thaller, stephen
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B73BB@saturn3.aculab.com>

On Tue, Oct 29, 2013 at 09:37:06AM -0000, David Laight wrote:
> > Note that you don't even need to put the DHCP protocol core into the
> > kernel to fix the promiscuous problem.  You just have to use the
> > current kernel interfaces correctly.
> > 
> > It used to be the case a very long time ago that you couldn't even
> > receive broadcast UDP datagrams on a socket until an address was
> > configured on it.
> > 
> > So everyone turns on promiscuous mode and uses RAW sockets or
> > AF_PACKET.
> > 
> > Stupid?  yes.
> 
> Not only that, but the dhcp client could use a normal UDP socket
> to keep the lease renewed - I suspect it has only ever needed
> to use the BPF interface (I didn't think it set promiscuous)
> when acquiring the initial lease.

Yes, this is a very unfortunate situation. From my experience it is not that
easy to get a patch merged into isc-dhcp.

It seems not that invasive to switch from af_packet to an udp socket with
SO_BROADCAST set.

Greetings,

  Hannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox