Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v6 3/5] rxrpc: check return value of skb_to_sgvec always
From: Sabrina Dubroca @ 2017-04-28 11:41 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: netdev, linux-kernel, David.Laight, kernel-hardening, davem
In-Reply-To: <20170425184734.26563-3-Jason@zx2c4.com>

2017-04-25, 20:47:32 +0200, Jason A. Donenfeld wrote:
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  net/rxrpc/rxkad.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
> index 4374e7b9c7bf..dcf46c9c3ece 100644
> --- a/net/rxrpc/rxkad.c
> +++ b/net/rxrpc/rxkad.c
[...]
> @@ -429,7 +432,8 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
>  	}
>  

Adding a few more lines of context:

	sg = _sg;
	if (unlikely(nsg > 4)) {
		sg = kmalloc(sizeof(*sg) * nsg, GFP_NOIO);
		if (!sg)
			goto nomem;
	}

>  	sg_init_table(sg, nsg);
> -	skb_to_sgvec(skb, sg, offset, len);
> +	if (unlikely(skb_to_sgvec(skb, sg, offset, len) < 0))
> +		goto nomem;

You're leaking sg when nsg > 4, you'll need to add this:

	if (sg != _sg)
		kfree(sg);

BTW, when you resubmit, please Cc: the maintainers of the files you're
changing for each patch, so that they can review this stuff. And send
patch 1 to all of them, otherwise they might be surprised that we even
need <0 checking after calls to skb_to_sgvec.

You might also want to add a cover letter.

-- 
Sabrina

^ permalink raw reply

* Re: rhashtable - Cap total number of entries to 2^31
From: Christian Borntraeger @ 2017-04-28 11:43 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Florian Fainelli, David Miller, fw, netdev, Thomas Graf,
	Stephen Rothwell, Linux-Next Mailing List,
	Linux Kernel Mailing List
In-Reply-To: <20170428113145.GA7843@gondor.apana.org.au>

On 04/28/2017 01:31 PM, Herbert Xu wrote:
> On Fri, Apr 28, 2017 at 12:23:15PM +0200, Christian Borntraeger wrote:
>>
>> I can reproduce this boot failure on s390 bisected to 
>> commit 6d684e54690caef45cf14051ddeb7c71beeb681b
>>    rhashtable: Cap total number of entries to 2^31
>> in linux-next from Apr 28
> 
> It should go away with
> 
> https://patchwork.ozlabs.org/patch/756233/
> 
> Thanks,
> 
Yes it does.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

would be nice to have it in the next linux-next version.

^ permalink raw reply

* Re: prog ID and next steps. Was: [RFC net-next 0/2] Introduce bpf_prog ID and iteration
From: Hannes Frederic Sowa @ 2017-04-28 11:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Martin KaFai Lau, netdev
  Cc: Daniel Borkmann, kernel-team, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Thomas Graf
In-Reply-To: <40cf6893-4702-4773-1aaa-7dfdc51c6212@fb.com>

Hello Alexei,

On 28.04.2017 03:11, Alexei Starovoitov wrote:
> On 4/27/17 6:36 AM, Hannes Frederic Sowa wrote:
>> On 27.04.2017 08:24, Martin KaFai Lau wrote:
>>> This patchset introduces the bpf_prog ID and a new bpf cmd to
>>> iterate all bpf_prog in the system.
>>>
>>> It is still incomplete.  The idea can be extended to bpf_map.
>>>
>>> Martin KaFai Lau (2):
>>>   bpf: Introduce bpf_prog ID
>>>   bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID
>>
>> Thanks Martin, I like the approach.
>>
>> I think the progid is also much more suitable to be used in kallsyms
>> because it handles collisions correctly and let's correctly walk the
>> chain (for example imaging loading two identical programs but install
>> them at different hooks, kallsysms doesn't allow to find out which
>> program is installed where).
> 
> i disagree re: kallsyms. The goal of prog_tag is to let program writers
> understand which program is running in a stable way.

But exactly it doesn't let program writers do that, it just confuses them:

---

jit on:

perf record -e bpf_redirect -agR

The unwinder walks the stack, extracts address of upper function and
sends it to user space (perf) or handles it inside the kernel/kallsyms
(ftrace).

User takes tag of bpf program and wants to inspect related maps to the
program. Unfortunately the tag is not unique and thus we need to expand
the tag back to all possible programs with the same tag and expand that
to the union of all possible maps that those programs reference again.

That is what we present to the application developer. I would seriously
be very confused.

If application developer doesn't trust perf and uses instruction pointer
value from the stack directly he can't find out which program there is,
because fdinfo e.g. doesn't show the actual address of where the program
is allocated. I would use /dev/kmem now.

---

jit off:

perf probe -a '__bpf_prog_run ctx insn'
perf probe -a 'bpf_redirect flags ifindex'
perf record -e bpf_redirect -agR

Situation doesn't change. We do get the insn pointer thus have a unique
id for the program. That's it, no further introspection. I can read
/dev/kmem now.

---

Personally I wouldn't rely on such infrastructure.

My proposal would be to maybe hash a map id into the program, so instead
of replacing the user space file descriptor with zero, take a map id
(like discussed below) or an inode number of the map into the register
and hash with that, so that those program have unique identifiers.

Otherwise construct kallsym entries with prog id instead of tag.

I think that the hash should try to reassemble some kind of identity
function and mapping two programs to the same tag, that do something
completely differently is not good (based on we don't include the map).

Also I do think in future the difference between non-jit and jit
operation in regards to tracing should also be lifted. We could add a
manual tracing point into the interpreter for reporting the same event
as if the program was jitted.

Debugging should not be that different based on the sysctl flags.

> id is assigned dynamically and not suitable for that purpose.
> 
>> It would help a lot if you could pass the prog_id back during program
>> creation, otherwise it will be kind of difficult to get a hold on which
>> program is where. ;)
> 
> yes, but not a creation time. bpf_prog_load command will keep returning
> an FD and all operations on programs will be allowed with FD only.

Yes, yes, yes, fd should be primary return value, no questions asked.

If it is not possible to return back additional data via the bpf
syscall, add an operation to get id from fd. This is also fine.

> Think of this 'ID' as program handle or program pointer.
I do. My first idea was to use the inode of the bpffs as prog id and
just allocate it regardless, but it seems to be okay how you do it.
Maybe it even is better because you control the (re)cycling of numbers.

> In other words it's obfuscated kernel 'struct bpf_prog *' given to
> user space, so that user space can later convert this ID into FD.
> The other patch (not shown) will take ID from user space and will
> convert it to FD if prog->aux->user is the same or root.

Sure, what about tag -> id? Tag is being reported from tracing and thus
should be one of the starting points to explore which programs are running.

> We tried really hard to keep everything FD based. Unfortunately
> netlink is not suitable to pass FDs, so to query TC and XDP
> we either have to invent a way to install FD from netlink in recvmsg()
> or pass something that can be converted to FD later.
> That's what program ID is solving.

Hmm, I am not sure why netlink is not suitable to pass up file
descriptors. We certainly pass down file descriptors with netlink. But I
am fine with either bpf syscall or netlink. But also the introduction of
bpf prog id depends a bit on this reasoning.

> This set of patches look trivial with simple use of idr,
> but it took us long time to get there.
> We tried to use 64-bit ID to avoid wrap around issue, but association
> between ID and bpf_prog needs to be kept somewhere. The obvious
> answer is rhashtable, but it cannot be iterated easily.
> Like we'd need to dump the whole thing through bpf syscall which
> is not practical.
> Then we tried to use 32-bit idr's id + 32-bit timestamp/random.
> It works better, but then we hit the issue that bpf_prog_get_next_id
> cannot be iterated in a stable way when programs are being deleted
> while user space iterates over the whole list.
> So at the end we scraped all the fancy things and went with
> simple 32-bit ID allocated in _cyclic_ way via idr.
> The reason for cyclic is to avoid prog delete/create races,
> so ID seen by user space stays stable for 2B ids.
> We were concerned that somebody might try to load/delete
> a program 2B times to cause the counter to wrap around, but
> it turned out not to be an issue. In that sense prog ID is similar
> to PID.

Or to ifindex. I don't have concerns and think this should be okay.

> So more complete picture of what we're trying to do:
> - new bpf_get_fd_from_id syscall cmd will be used to convert
>   prog ID into prog FD
> - tc/xdp/sockets/tracing attachment points will return prog ID
> - existing bpf_map_lookup() cmd from prog_array will be returning
>   prog ID
> - bpf_prog_next_id syscall cmd (this patch) is used to iterate
>   over all prog IDs
> - new bpf_prog_get_info syscall cmd (based on prog FD) will be used
>   to get all or partial info about the program that kernel knows about

Sounds all good to me.

> Example usage:
> - if user space want to see instructions of all loaded programs
>   it can use a loop like:
> while (!bpf_prog_get_next_id(next_id, &next_id)) {
>    int fd = bpf_prog_get_fd_from_id(next_id);
>    struct bpf_prog_info info;
>    bpf_prog_get_info(fd, &info, flags);
>    // look into info.insns[]
>    close(fd);
> }
> 
> - if user space want to see prog_tag of xdp program attached to eth0
>   // netlink sendmsg() into ifindex of eth0 that returns prog ID
>    int fd = bpf_prog_get_fd_from_id(id_from_netlink);
>    struct bpf_prog_info info;
>    bpf_prog_get_info(fd, &info, flags);
>    // look into info.prog_tag
>    close(fd);

No problems with the above examples. We would also like to be able to
close the loop from the tracing side as well as outlined above.

> the 'flags' argument of bpf_prog_get_info() will be used
> to tell kernel which info about the program needs to be dumped.
> Otherwise if kernel always dumps everything about the program,
> it will make the syscall too slow and too cumbersome.
> Possible combinations:
> - prog_type, prog_tag, license, prog ID
> - array of prog instructions
> - array of map IDs
> Here we'll introduce similar IDs for maps and
> bpf_map_get_info() syscall cmd that will return map_type, map_id, sizes.
> If user wants to iterate over all elements of the map, they can
> use map_fd = bpf_map_get_fd_from_id(map_id); command
> and later use existing bpf_map_get_next_key+bpf_map_lookup_elem.
> 
> We believe this way the user space will be able to see _everything_
> about bpf programs and maps and can pick and choose whether
> it wants to see only programs or only maps or partial info
> about progs (without instructions) and so on.
> 
> Once we have CTF (debug info) available for maps and progs,
> we will extend bpf_prog_get_info() and bpf_map_get_info()
> commands to optionally return that as well.
> 

I think this is all non controversial!

Thanks,
Hannes

^ permalink raw reply

* Re: [ISSUE: sky2 - rx error] Link stops working under heavy traffic load connected to a mv88e6176
From: Rafa Corvillo @ 2017-04-28 11:54 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20170427130450.GL17172@lunn.ch>

On 27/04/17 15:04, Andrew Lunn wrote:
> On Thu, Apr 27, 2017 at 02:05:51PM +0200, Rafa Corvillo wrote:
>> On 25/04/17 17:27, Stephen Hemminger wrote:
>>> On Fri, 21 Apr 2017 14:39:00 +0200
>>> Rafa Corvillo <rafael.corvillo@aoifes.com> wrote:
>>>
>>>> We are working in an ARMv7 embedded system running kernel 4.9 (LEDE build).
>>>> It is an imx6 board with 2 ethernet interfaces. One of them is connected to
>>>> a Marvell switch.
>>>>
>>>> The schema of the system is the following:
>>>>
>
> Hi Rafa
>
> Your ASCII art got messed up somewhere. Is this the correct
> reconstruction?

Yes, this is the schema.

>
>     +-------------------+ eth0
>     |                   +--+
>     |                   |  |
>     | Embedded system   +--+
>     |                   |
>     |      ARMv7        |
>     |                   | Marvell 88E8057(sky2)     +-------------+
>     |                   +--+                     +--+             +--+ eth1
>     |                   |  +---------------------+  |             |  +------+
>     |                   +--+      CPU port       +--+ mv88e6176   +--+
>     +------+--+---------+                           |             |
> emulated  |  |                                     |             |
> GPIO      +--+                                  +--+             +--+ eth2
> MDIO       +-----------------------------------+   |             |  +------+
>                                  MDIO            +--+             +--+
>                                                     +-------------+
>
> I assume you are using DSA? Since this is LEDE, it could be swconfig,
> but the bridge configuration you mentioned would not make sense for
> swconfig.

Yes, we use DSA driver. We don't use swconfig to configure the Marvell 
switch. Our board has two ethernet interfaces (eth0 and marvell) using 
sky2 driver. The marvell interface is connected to an external Marvell 
switch (mv88e6176) with four ethernet ports (but we only use two of 
them, eth1 and eth2). The Marvell switch is configured with the MDIO 
protocol, that we emulate through GPIOS (mdio-gpio kernel module), and 
the DSA driver is used to works with the Marvell switch.

We have the ethernet interfaces in the same bridge:

config interface 'lan'
         option type 'bridge'
         option ifname 'eth0 eth1 eth2'
         option proto 'static'
         option ipaddr '192.168.1.100'
         option netmask '255.255.255.0'
         option ip6assign '60'

root@LEDE:/# brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.00d01274f069       no              eth0
                                                         eth1
                                                         eth2
root@LEDE:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
group default qlen 1
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel 
master br-lan state DOWN group default qlen 1000
     link/ether 00:d0:12:74:f0:69 brd ff:ff:ff:ff:ff:ff
3: ifb0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
qlen 32
     link/ether be:80:bc:5e:63:c3 brd ff:ff:ff:ff:ff:ff
4: ifb1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
qlen 32
     link/ether 0a:1d:8d:06:e3:5d brd ff:ff:ff:ff:ff:ff
5: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1
     link/gre 0.0.0.0 brd 0.0.0.0
6: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN 
group default qlen 1000
     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
group default qlen 1000
     link/ether e2:0b:10:b8:b7:b0 brd ff:ff:ff:ff:ff:ff
8: teql0: <NOARP> mtu 1500 qdisc noop state DOWN group default qlen 100
     link/void
9: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN group default qlen 10
     link/can
10: marvell: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel 
state UP group default qlen 1000
     link/ether aa:64:73:91:09:a9 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::a864:73ff:fe91:9a9/64 scope link
        valid_lft forever preferred_lft forever
11: eth1@marvell: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc 
noqueue master br-lan switchid 00000000 state LOWERLAYERDOWN group 
default qlen 1000
     link/ether aa:64:73:91:09:a9 brd ff:ff:ff:ff:ff:ff
12: eth2@marvell: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue master br-lan switchid 00000000 state UP group default qlen 1000
     link/ether aa:64:73:91:09:a9 brd ff:ff:ff:ff:ff:ff
13: eth3@marvell: <BROADCAST,MULTICAST> mtu 1500 qdisc noop switchid 
00000000 state DOWN group default qlen 1000
     link/ether aa:64:73:91:09:a9 brd ff:ff:ff:ff:ff:ff
14: eth4@marvell: <BROADCAST,MULTICAST> mtu 1500 qdisc noop switchid 
00000000 state DOWN group default qlen 1000
     link/ether aa:64:73:91:09:a9 brd ff:ff:ff:ff:ff:ff
15: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP group default qlen 1000
     link/ether 00:d0:12:74:f0:69 brd ff:ff:ff:ff:ff:ff
     inet 192.168.1.100/24 brd 192.168.1.255 scope global br-lan
        valid_lft forever preferred_lft forever
     inet6 fd7b:a43b:e93e::1/60 scope global noprefixroute
        valid_lft forever preferred_lft forever
     inet6 fe80::2d0:12ff:fe74:f069/64 scope link
        valid_lft forever preferred_lft forever


We have this configuration working on a kernel 4.1 and including patches 
to upgrade dsa/mv88e6xxx to kernel version 4.3 (5acf4d0, Wed, 27 May 
2015 15:32:15 -0700) "[PATCH] blk: rq_data_dir() should not return a 
boolean."

>
>>>> If I connect the eth1/eth2, the link is up and I can do ping through it.
>>>> But, once
>>>> I start sending a heavy traffic load the link fails and the kernel sends the
>>>> following messages:
>>>>
>>>> [   48.557140] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   48.564964] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   48.572110] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   48.579263] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   48.586417] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   48.593573] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   48.600718] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   54.877567] net_ratelimit: 6 callbacks suppressed
>>>> [   54.882293] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>> [   61.413552] sky2 0000:04:00.0 marvell: rx error, status 0x5f20010
>>>> length 1518
>>>
>>> The status error bits are in sky2.h
>>> 0x5f20010 is
>>>       05f2 frame length => 1522
>>>       0010 Too long err
>>>
>>> That means the packet was longer than the configured MTU.
>>> You are probably getting packets with VLAN tag but have not configured
>>> a VLAN.
>
> Since you are using DSA, you will have DSA tags enabled on frames
> to/from the switch. This adds an extra 8 byte header in the frame.  My
> guess is, it is this header, not the VLAN tag which is causing you MTU
> issues.

But it is strange because, as I have said above, we have the same 
configuration working properly on a kernel 4.1 (with OpenWrt), and we 
have the MTU set to 1500.

>
> I think this is the first time i've seen sky2 used in a DSA
> setup. mv643xx or mvneta is generally what is used, when using Marvell
> chipsets. These drivers are more lenient about MTU, and are happy to
> pass frames with additional headers.
>

We use the mv88e6xxx (as our switch is mv88e6176) and it depends on DSA 
driver in the kernel (isn't it?).

>> Thanks for the information. I have increased the MTU value to 1550
>> (workaround) and it works if sends traffic (with iperf) from my
>> computer to the unit. But, if I send traffic outside the unit, I get
>> a new error message and link goes down:
>
> Changing the MTU like this is not a good fix. It will allow you to
> receive frames which are bigger, but it also means the local network
> stack will generate bigger frames to be transmitted. You probably need
> to modify the sky2 driver to allow it to receive frames bigger than
> the interface MTU, by about 8 bytes.

Should the DSA driver remove the DSA tags before pass the frames to sky2 
interface?

>
>> [ 4901.032989] sky2 0000:04:00.0 marvell: tx timeout
>> [ 4904.722670] sky2 0000:04:00.0 marvell: Link is up at 1000 Mbps,
>> full duplex, flow control both
>
> Between the sky2 and the switch, do you have two back-to-back PHYs or
> are you connecting the RGMII interfaces together?

I think that we have two back-to-back PHYs, but I am going to double 
check this with the hardware team.

Thanks,
Rafa

>
>      Andrew
>

^ permalink raw reply

* Re: Network cooling device and how to control NIC speed on thermal condition
From: Andrew Lunn @ 2017-04-28 11:56 UTC (permalink / raw)
  To: Waldemar Rymarkiewicz; +Cc: Alan Cox, Florian Fainelli, netdev, linux-kernel
In-Reply-To: <CAHKzcEO-75r5Gmew9x1eprh9ACf_NQKCHQg9svOLM1OmVX0JgQ@mail.gmail.com>

> I collect SoC temp every a few secs. Meantime, I use ethtool -s ethX
> speed <speed> to manipulate link speed and to see how it impacts SoC
> temp. My 4 PHYs and switch are integrated into SoC and I always
> change link speed for all PHYs , no traffic on the link for this test.
> Starting with 1Gb/s and then scaling down to 100 Mb/s and then to
> 10Mb/s, I see significant  ~10 *C drop in temp while link is set to
> 10Mb/s.

Is that a realistic test? No traffic over the network? If you are
hitting your thermal limit, to me that means one of two things:

1) The device is under very heavy load, consuming a lot of power to do
   what it needs to to.

2) Your device is idle, no packets are flowing, but your thermal
   design is wrong, so that it cannot dissipate enough heat.

It seems to me, you are more interested in 1). But your quick test is
more about 2).

I would be more interested in do quick tests of switching 8Gbps,
4Gbps, 2Gbps, 1Gbps, 512Mbps, 256Bps, ... What effect does this have
on temperature?

> So, throttling link speed can really help to dissipate heat
> significantly when the platform is under threat.
> 
> Renegotiating link speed costs something I agree, it also impacts user
> experience, but such a thermal condition will not occur often I
> believe.

It is a heavy handed approach, and you have to be careful. There are
some devices which don't work properly, e.g. if you try to negotiate
1000 half duplex, you might find the link just breaks.

Doing this via packet filtering, dropping packets, gives you a much
finer grained control and is a lot less disruptive. But it assumes
handling packets is what it causing you heat problems, not the links
themselves.

	Andrew

^ permalink raw reply

* Re: [REGRESSION next-20170426] Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices") causes oops in mvneta
From: Sricharan R @ 2017-04-28 11:56 UTC (permalink / raw)
  To: Ralph Sennhauser
  Cc: Rafael J. Wysocki, Joerg Roedel, Bjorn Helgaas, linux-acpi,
	linux-kernel, linux-pci, Thomas Petazzoni, netdev
In-Reply-To: <20170428081919.21bb569d@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2594 bytes --]

Hi Ralph,

<snip..>

>>>>>
>>>>> Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe
>>>>> time for platform/amba/pci bus devices") causes a kernel panic as
>>>>> in the log below on an armada-385. Reverting the commit fixes the
>>>>> issue.
>>>>>
>>>>> Regards
>>>>> Ralph    
>>>>
>>>> Somehow not getting a obvious clue on whats going wrong with the
>>>> logs below. From the log and looking in to dts, the drivers seems
>>>> to the one for "marvell,armada-370-neta".  
>>>
>>> Correct.
>>>   
>>>> Issue looks the data from the dma
>>>> has gone bad and subsequently referring the wrong data has resulted
>>>> in the crash. Looks like the dma_masks is the one going wrong.
>>>> Can i get some logs from mvneta_probe, about dev->dma_mask,
>>>> dev->coherent_dma_mask and dev->dma_ops with and without the patch
>>>> to see whats the difference ?  
>>>
>>> Not sure I understood what exactly you are after. Might be faster to
>>> just send me a patch with all debug print statements you like to
>>> see. 
>>
>> Attached the patch with debug prints.
>>
>> Regards,
>>  Sricharan
>>
> 
> Hi Sricharan
> 
> With commit 09515ef5ddad
> 
> [    1.288962] mvneta f1070000.ethernet: dev->dma_mask 0xffffffff
> [    1.294827] mvneta f1070000.ethernet: dev->coherent_dma_mask 0xffffffff
> [    1.301472] mvneta f1070000.ethernet: dev->dma_ops 0x40b00c0601460
> 
> [    1.322047] mvneta f1034000.ethernet: dev->dma_mask 0xffffffff
> [    1.327904] mvneta f1034000.ethernet: dev->coherent_dma_mask 0xffffffff
> [    1.334549] mvneta f1034000.ethernet: dev->dma_ops 0x40b00c0601460
> 
> 
> With the patch reverted, the build that works
> 
> [    1.289001] mvneta f1070000.ethernet: dev->dma_mask 0xffffffff
> [    1.294866] mvneta f1070000.ethernet: dev->coherent_dma_mask 0xffffffff
> [    1.301511] mvneta f1070000.ethernet: dev->dma_ops 0x40b00c06014a8
> 
> [    1.317005] mvneta f1034000.ethernet: dev->dma_mask 0xffffffff
> [    1.322867] mvneta f1034000.ethernet: dev->coherent_dma_mask 0xffffffff
> [    1.329508] mvneta f1034000.ethernet: dev->dma_ops 0x40b00c06014a8
> 

My bad, i think it is this patch missing [1], attached it as well.
Infact, this was in the series initially and got acked to get merged
separately well before the series. I should have sent this to Russell.
I will do this now. If this fixes up the issue,
i will take this patch separately, while this series gets tested
on -next.

[1] https://patchwork.kernel.org/patch/9362113/

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

[-- Attachment #2: 0001-arm-dma-mapping-Don-t-override-dma_ops-in-arch_setup.patch --]
[-- Type: text/plain, Size: 2001 bytes --]

From be36ea5f2c7d1c28dc8f829b5d2c817826481086 Mon Sep 17 00:00:00 2001
From: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Date: Fri, 15 May 2015 02:00:02 +0300
Subject: [PATCH] arm: dma-mapping: Don't override dma_ops in
 arch_setup_dma_ops()

The arch_setup_dma_ops() function is in charge of setting dma_ops with a
call to set_dma_ops(). set_dma_ops() is also called from

- highbank and mvebu bus notifiers
- dmabounce (to be replaced with swiotlb)
- arm_iommu_attach_device

(arm_iommu_attach_device is itself called from IOMMU and bus master
device drivers)

To allow the arch_setup_dma_ops() call to be moved from device add time
to device probe time we must ensure that dma_ops already setup by any of
the above callers will not be overriden.

Aftering replacing dmabounce with swiotlb, converting IOMMU drivers to
of_xlate and taking care of highbank and mvebu, the workaround should be
removed.

[Rebased on top of 4.11-rc8]
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
---
 arch/arm/mm/dma-mapping.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0268584..c742dfd 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2408,6 +2408,15 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	const struct dma_map_ops *dma_ops;
 
 	dev->archdata.dma_coherent = coherent;
+
+	/*
+	 * Don't override the dma_ops if they have already been set. Ideally
+	 * this should be the only location where dma_ops are set, remove this
+	 * check when all other callers of set_dma_ops will have disappeared.
+	 */
+	if (dev->dma_ops)
+		return;
+
 	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
 		dma_ops = arm_get_iommu_dma_map_ops(coherent);
 	else
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply related

* [PATCH/RFC net-next 0/4] net/sched: cls_flower: avoid false matching of truncated packets
From: Simon Horman @ 2017-04-28 12:00 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim, Cong Wang
  Cc: Dinan Gunawardena, netdev, oss-drivers

Hi,

this series is intended to avoid false-positives which match
truncated packets against flower classifiers which match on:
* zero L4 ports or;
* zero ICMP code or type

This requires updating the flow dissector to return an error in such cases
and updating flower to not match on the result of a failed dissection.

In the case of UDP this results in a behavioural change to users of
flow_keys_dissector_keys[] and flow_keys_dissector_symmetric_keys[] -
dissection will fail on truncated packets where the IP protocol of the
packets indicates ports should be present (according to skb_flow_get_ports()).

The last patch of the series builds on the above to allow users to specify
a policy for how to handle packets whose dissection fails.

I will separately provide RFC patches to iproute2 to allow exercising the
last patch.

Simon Horman (4):
  flow dissector: return error on port dissection under-run
  flow dissector: return error on icmp dissection under-run
  net/sched: cls_flower: do not match if dissection fails
  net/sched: cls_flower: allow control of tree traversal on packet parse
    errors

 include/linux/skbuff.h       |  11 +++--
 include/uapi/linux/pkt_cls.h |   2 +
 net/core/flow_dissector.c    | 105 ++++++++++++++++++++++++-------------------
 net/sched/cls_flower.c       |  47 ++++++++++++++-----
 4 files changed, 107 insertions(+), 58 deletions(-)

-- 
2.12.2.816.g2cccc81164

^ permalink raw reply

* [PATCH/RFC net-next 1/4] flow dissector: return error on port dissection under-run
From: Simon Horman @ 2017-04-28 12:00 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim, Cong Wang
  Cc: Dinan Gunawardena, netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120035.15984-1-simon.horman@netronome.com>

Return an error from __skb_flow_dissect() if insufficient packet data is
present when dissecting layer 4 ports.

Without this change the absence of ports in truncated - e.g. UDP - packets
is treated the same way as the presence of ports with a value of zero.  As
a result the flower classifier is unable to differentiate between these two
cases which may lead to unexpected matching of truncated packets.

The approach taken here is to only return an error if the offset of ports
for the previously dissected IP protocol is known - a non error return from
proto_ports_offset() - and the port data is not present in the packet - an
error return value from __skb_header_pointer().

The behaviour for callers of __skb_flow_get_ports() is changed but the only
callers are skb_flow_get_ports() and the flow dissector.  The former has
been updated so that its behaviour is unchanged.  Behavioural change of the
latter is the intended purpose of this patch but will only take effect with
a separate patch to have it refuse to match if dissection fails.

This change will lead to behavioural changes of the users of the dissector
with FLOW_DISSECTOR_KEY_PORTS - flower, and users of
flow_keys_dissector_keys[] and flow_keys_dissector_symmetric_keys[].  The
behavioural change for *_keys[] changes seem reasonable as the change will
should only be for truncated packets.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/linux/skbuff.h    | 11 ++++++++---
 net/core/flow_dissector.c | 40 ++++++++++++++++++++++++++--------------
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 81ef53f06534..0ad9b3955829 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1108,13 +1108,18 @@ u32 __skb_get_hash_symmetric(const struct sk_buff *skb);
 u32 skb_get_poff(const struct sk_buff *skb);
 u32 __skb_get_poff(const struct sk_buff *skb, void *data,
 		   const struct flow_keys *keys, int hlen);
-__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
-			    void *data, int hlen_proto);
+bool __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
+			  void *data, int hlen_proto, __be32 *ports);
 
 static inline __be32 skb_flow_get_ports(const struct sk_buff *skb,
 					int thoff, u8 ip_proto)
 {
-	return __skb_flow_get_ports(skb, thoff, ip_proto, NULL, 0);
+	__be32 ports;
+
+	if (__skb_flow_get_ports(skb, thoff, ip_proto, NULL, 0, &ports))
+		return ports;
+	else
+		return 0;
 }
 
 void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 28d94bce4df8..b3bf4886f71f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -86,30 +86,41 @@ static __be16 skb_flow_get_be16(const struct sk_buff *skb, int poff,
  * @ip_proto: protocol for which to get port offset
  * @data: raw buffer pointer to the packet, if NULL use skb->data
  * @hlen: packet header length, if @data is NULL use skb_headlen(skb)
+ * @ports: pointer to return ports in
  *
  * The function will try to retrieve the ports at offset thoff + poff where poff
- * is the protocol port offset returned from proto_ports_offset
+ * is the protocol port offset returned from proto_ports_offset.
+ *
+ * Returns false on error, true otherwise.
  */
-__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
-			    void *data, int hlen)
+bool __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
+			  void *data, int hlen, __be32 *ports)
 {
 	int poff = proto_ports_offset(ip_proto);
+	__be32 *p, _p;
+
+	/* proto_ports_offset returning an error indicates that ip_proto is
+	 * not known to have ports. This is not considered an error here.
+	 * Rather it is considered that the flow key of the caller may use
+	 * the default value of port fields: 0.
+	 */
+	if (poff < 0) {
+		*ports = 0;
+		return true;
+	}
 
 	if (!data) {
 		data = skb->data;
 		hlen = skb_headlen(skb);
 	}
 
-	if (poff >= 0) {
-		__be32 *ports, _ports;
+	p = __skb_header_pointer(skb, thoff + poff, sizeof(_p),
+				 data, hlen, &_p);
+	if (!p)
+		return false;
+	*ports = *p;
 
-		ports = __skb_header_pointer(skb, thoff + poff,
-					     sizeof(_ports), data, hlen, &_ports);
-		if (ports)
-			return *ports;
-	}
-
-	return 0;
+	return true;
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
@@ -692,8 +703,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		key_ports = skb_flow_dissector_target(flow_dissector,
 						      FLOW_DISSECTOR_KEY_PORTS,
 						      target_container);
-		key_ports->ports = __skb_flow_get_ports(skb, nhoff, ip_proto,
-							data, hlen);
+		if (!__skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen,
+					  &key_ports->ports))
+			goto out_bad;
 	}
 
 	if (dissector_uses_key(flow_dissector,
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [PATCH/RFC net-next 2/4] flow dissector: return error on icmp dissection under-run
From: Simon Horman @ 2017-04-28 12:00 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim, Cong Wang
  Cc: Dinan Gunawardena, netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120035.15984-1-simon.horman@netronome.com>

Return an error from __skb_flow_dissect() if insufficient packet data is
present when dissecting icmp type and code.

Without this change the absence of the ICMP type and code in truncated
ICMPv4 or IPVPv6 packets is treated the same as the presence of a code and
type of value of zero.  As a result the flower classifier is unable to
differentiate between these two cases which may lead to unexpected matching
of truncated packets.

The approach taken here is to return an error if the IP protocol indicates
ICMP, and the type and code data is not present in the packet - an error
return value from __skb_header_pointer().

This should only effect the flower classifier as it is the only user of
W_DISSECTOR_KEY_ICMP.  The behavioural update for flower only takes effect
with a separate patch to have it refuse to match if dissection fails.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 net/core/flow_dissector.c | 65 +++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index b3bf4886f71f..496afd7b3051 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -58,28 +58,6 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
 /**
- * skb_flow_get_be16 - extract be16 entity
- * @skb: sk_buff to extract from
- * @poff: offset to extract at
- * @data: raw buffer pointer to the packet
- * @hlen: packet header length
- *
- * The function will try to retrieve a be32 entity at
- * offset poff
- */
-static __be16 skb_flow_get_be16(const struct sk_buff *skb, int poff,
-				void *data, int hlen)
-{
-	__be16 *u, _u;
-
-	u = __skb_header_pointer(skb, poff, sizeof(_u), data, hlen, &_u);
-	if (u)
-		return *u;
-
-	return 0;
-}
-
-/**
  * __skb_flow_get_ports - extract the upper layer ports and return them
  * @skb: sk_buff to extract the ports from
  * @thoff: transport header offset
@@ -353,6 +331,29 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
 	return FLOW_DISSECT_RET_OUT_PROTO_AGAIN;
 }
 
+static enum flow_dissect_ret
+__skb_flow_dissect_icmp(const struct sk_buff *skb,
+			struct flow_dissector *flow_dissector,
+			void *target_container, void *data, int nhoff, int hlen)
+{
+	struct flow_dissector_key_icmp *key_icmp;
+	__be16 *u, _u;
+
+	if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ICMP))
+		return FLOW_DISSECT_RET_OUT_GOOD;
+
+	u = __skb_header_pointer(skb, nhoff, sizeof(_u), data, hlen, &_u);
+	if (!u)
+		return FLOW_DISSECT_RET_OUT_BAD;
+
+	key_icmp = skb_flow_dissector_target(flow_dissector,
+					     FLOW_DISSECTOR_KEY_ICMP,
+					     target_container);
+	key_icmp->icmp = *u;
+
+	return FLOW_DISSECT_RET_OUT_GOOD;
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -379,7 +380,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	struct flow_dissector_key_basic *key_basic;
 	struct flow_dissector_key_addrs *key_addrs;
 	struct flow_dissector_key_ports *key_ports;
-	struct flow_dissector_key_icmp *key_icmp;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
 	bool skip_vlan = false;
@@ -694,6 +694,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	case IPPROTO_MPLS:
 		proto = htons(ETH_P_MPLS_UC);
 		goto mpls;
+	case IPPROTO_ICMP:
+	case NEXTHDR_ICMP:
+		switch (__skb_flow_dissect_icmp(skb, flow_dissector,
+						target_container, data,
+						nhoff, hlen)) {
+		case FLOW_DISSECT_RET_OUT_GOOD:
+			goto out_good;
+		case FLOW_DISSECT_RET_OUT_BAD:
+		default:
+			goto out_bad;
+		}
 	default:
 		break;
 	}
@@ -708,14 +719,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 			goto out_bad;
 	}
 
-	if (dissector_uses_key(flow_dissector,
-			       FLOW_DISSECTOR_KEY_ICMP)) {
-		key_icmp = skb_flow_dissector_target(flow_dissector,
-						     FLOW_DISSECTOR_KEY_ICMP,
-						     target_container);
-		key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data, hlen);
-	}
-
 out_good:
 	ret = true;
 
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [PATCH/RFC net-next 4/4] net/sched: cls_flower: allow control of tree traversal on packet parse errors
From: Simon Horman @ 2017-04-28 12:00 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim, Cong Wang
  Cc: Dinan Gunawardena, netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120035.15984-1-simon.horman@netronome.com>

Allow control how the tree of qdisc, classes and filters is further
traversed if an error is encountered when parsing the packet in order to
match the cls_flower filters at a particular prio.

By default continue to the next filter, the behaviour without this patch.

A use-case for this is to allow configuration of dropping of packets with
truncated headers.

For example, the following drops IPv4 packets that cannot be parsed by the
flow dissector up to the end of the UDP ports - e.g. because they are
truncated, and instantiates a continue action based on the port for packets
that can be parsed.

 # tc qdisc del dev eth0 ingress; tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: flower \
       indev eth0 ip_proto udp dst_port 80 header_parse_err_action drop \
       action continue

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/uapi/linux/pkt_cls.h |  2 ++
 net/sched/cls_flower.c       | 46 ++++++++++++++++++++++++++++++++++----------
 2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index f1129e383b2a..b722a85bcfa1 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -437,6 +437,8 @@ enum {
 	TCA_FLOWER_KEY_MPLS_TC,		/* u8 - 3 bits */
 	TCA_FLOWER_KEY_MPLS_LABEL,	/* be32 - 20 bits */
 
+	TCA_FLOWER_HEADER_PARSE_ERR_ACT,
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cc6b3e7cf03b..4d2d91b0d532 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -67,13 +67,14 @@ struct cls_fl_head {
 	struct fl_flow_mask mask;
 	struct flow_dissector dissector;
 	u32 hgen;
-	bool mask_assigned;
+	bool assigned;
 	struct list_head filters;
 	struct rhashtable_params ht_params;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	int err_action;
 };
 
 struct cls_fl_filter {
@@ -188,7 +189,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	 */
 	skb_key.basic.n_proto = skb->protocol;
 	if (!skb_flow_dissect(skb, &head->dissector, &skb_key, 0))
-		return -1;
+		return head->err_action;
 
 	fl_set_masked_key(&skb_mkey, &skb_key, &head->mask);
 
@@ -317,7 +318,7 @@ static void fl_destroy_sleepable(struct work_struct *work)
 {
 	struct cls_fl_head *head = container_of(work, struct cls_fl_head,
 						work);
-	if (head->mask_assigned)
+	if (head->assigned)
 		rhashtable_destroy(&head->ht);
 	kfree(head);
 	module_put(THIS_MODULE);
@@ -425,6 +426,7 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
 	[TCA_FLOWER_KEY_MPLS_BOS]	= { .type = NLA_U8 },
 	[TCA_FLOWER_KEY_MPLS_TC]	= { .type = NLA_U8 },
 	[TCA_FLOWER_KEY_MPLS_LABEL]	= { .type = NLA_U32 },
+	[TCA_FLOWER_HEADER_PARSE_ERR_ACT] = { .type = NLA_U32 },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -779,13 +781,15 @@ static void fl_init_dissector(struct cls_fl_head *head,
 	skb_flow_dissector_init(&head->dissector, keys, cnt);
 }
 
-static int fl_check_assign_mask(struct cls_fl_head *head,
-				struct fl_flow_mask *mask)
+static int fl_check_assign_mask_and_err_action(struct cls_fl_head *head,
+					       struct fl_flow_mask *mask,
+					       int err_action)
 {
 	int err;
 
-	if (head->mask_assigned) {
-		if (!fl_mask_eq(&head->mask, mask))
+	if (head->assigned) {
+		if (!fl_mask_eq(&head->mask, mask) ||
+		    head->err_action != err_action)
 			return -EINVAL;
 		else
 			return 0;
@@ -798,7 +802,8 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 	if (err)
 		return err;
 	memcpy(&head->mask, mask, sizeof(head->mask));
-	head->mask_assigned = true;
+	head->assigned = true;
+	head->err_action = err_action;
 
 	fl_init_dissector(head, mask);
 
@@ -871,7 +876,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	struct cls_fl_filter *fnew;
 	struct nlattr **tb;
 	struct fl_flow_mask mask = {};
-	int err;
+	int err, err_action;
 
 	if (!tca[TCA_OPTIONS])
 		return -EINVAL;
@@ -918,11 +923,28 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		}
 	}
 
+	if (tb[TCA_FLOWER_HEADER_PARSE_ERR_ACT]) {
+		err_action = nla_get_u32(tb[TCA_FLOWER_HEADER_PARSE_ERR_ACT]);
+
+		switch (err_action) {
+		case TC_ACT_UNSPEC:
+		case TC_ACT_OK:
+		case TC_ACT_SHOT:
+			break;
+		default:
+			err = -EINVAL;
+			goto errout;
+		}
+
+	} else {
+		err_action = TC_ACT_UNSPEC;
+	}
+
 	err = fl_set_parms(net, tp, fnew, &mask, base, tb, tca[TCA_RATE], ovr);
 	if (err)
 		goto errout;
 
-	err = fl_check_assign_mask(head, &mask);
+	err = fl_check_assign_mask_and_err_action(head, &mask, err_action);
 	if (err)
 		goto errout;
 
@@ -1309,6 +1331,10 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
 		goto nla_put_failure;
 
+	if (head->err_action != TC_ACT_UNSPEC &&
+	    nla_put_u32(skb, TCA_FLOWER_HEADER_PARSE_ERR_ACT, head->err_action))
+		goto nla_put_failure;
+
 	if (tcf_exts_dump(skb, &f->exts))
 		goto nla_put_failure;
 
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [PATCH/RFC net-next 3/4] net/sched: cls_flower: do not match if dissection fails
From: Simon Horman @ 2017-04-28 12:00 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim, Cong Wang
  Cc: Dinan Gunawardena, netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120035.15984-1-simon.horman@netronome.com>

If the flow skb_flow_dissect() returns an error it indicates that
dissection was incomplete for some reason. Matching using the result of an
incomplete dissection may cause unexpected results. For example:

* A match on zero layer 4 ports will also match packets truncated at
  the end of the IP header; that is packets where ports are missing are
  treated the same way as packets with zero ports.
* Likewise, a match on zero ICMP code or type will also match packets
  truncated at the end of the IP header; that is packets where the ICMP
  type and code are missing will be treated the same way as packets with
  zero ICMP code and type.

Separate patches to the flow dissector are required in order for it to
return errors in the above cases.

Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 net/sched/cls_flower.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 3ecf07666df3..cc6b3e7cf03b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -187,7 +187,8 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	 * so do it rather here.
 	 */
 	skb_key.basic.n_proto = skb->protocol;
-	skb_flow_dissect(skb, &head->dissector, &skb_key, 0);
+	if (!skb_flow_dissect(skb, &head->dissector, &skb_key, 0))
+		return -1;

 	fl_set_masked_key(&skb_mkey, &skb_key, &head->mask);

-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [PATCH/RFC iproute2/net-next 0/3] tc: flower: allow control of tree traversal on packet parse errors
From: Simon Horman @ 2017-04-28 12:02 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, Dinan Gunawardena,
	netdev, oss-drivers, Simon Horman

Hi,

this series is intended to allow control how the tree of qdisc, classes and
filters is further traversed if an error is encountered when parsing the
packet in order to match the cls_flower filters at a particular prio.

Please see the changelog of the last patch of this series for a more
detailed description.


Simon Horman (3):
  tc: flower: update headers for TCA_FLOWER_KEY_MPLS*
  tc: flower: update headers for TCA_FLOWER_HEADER_PARSE_ERR_ACT
  tc: flower: allow control of tree traversal on packet parse errors

 include/linux/pkt_cls.h |  7 +++++++
 man/man8/tc-flower.8    | 29 +++++++++++++++++++++++++++--
 tc/f_flower.c           | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 2 deletions(-)

-- 
2.12.2.816.g2cccc81164

^ permalink raw reply

* [PATCH/RFC iproute2/net-next 1/3] tc: flower: update headers for TCA_FLOWER_KEY_MPLS*
From: Simon Horman @ 2017-04-28 12:02 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, Dinan Gunawardena,
	netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120301.16500-1-simon.horman@netronome.com>

These changes are present in net-next.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/linux/pkt_cls.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 7a69f2a4ca0c..f1129e383b2a 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -432,6 +432,11 @@ enum {
 	TCA_FLOWER_KEY_ARP_THA,		/* ETH_ALEN */
 	TCA_FLOWER_KEY_ARP_THA_MASK,	/* ETH_ALEN */
 
+	TCA_FLOWER_KEY_MPLS_TTL,	/* u8 - 8 bits */
+	TCA_FLOWER_KEY_MPLS_BOS,	/* u8 - 1 bit */
+	TCA_FLOWER_KEY_MPLS_TC,		/* u8 - 3 bits */
+	TCA_FLOWER_KEY_MPLS_LABEL,	/* be32 - 20 bits */
+
 	__TCA_FLOWER_MAX,
 };
 
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [PATCH/RFC iproute2/net-next 2/3] tc: flower: update headers for TCA_FLOWER_HEADER_PARSE_ERR_ACT
From: Simon Horman @ 2017-04-28 12:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, Dinan Gunawardena,
	netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120301.16500-1-simon.horman@netronome.com>

This change is proposed for net-next.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 include/linux/pkt_cls.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index f1129e383b2a..1e2dd535703d 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -437,6 +437,8 @@ enum {
 	TCA_FLOWER_KEY_MPLS_TC,		/* u8 - 3 bits */
 	TCA_FLOWER_KEY_MPLS_LABEL,	/* be32 - 20 bits */
 
+	TCA_FLOWER_HEADER_PARSE_ERR_ACT, /* u32 */
+
 	__TCA_FLOWER_MAX,
 };
 
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [PATCH/RFC iproute2/net-next 3/3] tc: flower: allow control of tree traversal on packet parse errors
From: Simon Horman @ 2017-04-28 12:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, Dinan Gunawardena,
	netdev, oss-drivers, Simon Horman
In-Reply-To: <20170428120301.16500-1-simon.horman@netronome.com>

Allow control how the tree of qdisc, classes and filters is further
traversed if an error is encountered when parsing the packet in order to
match the cls_flower filters at a particular prio.

By default continue to the next filter, the behaviour without this patch.

A use-case for this is to allow configuration of dropping of packets with
truncated headers.

For example, the following drops IPv4 packets that cannot be parsed by the
flow dissector up to the end of the UDP ports - e.g. because they are
truncated, and instantiates a continue action based on the port for packets
that can be parsed.

 # tc qdisc del dev eth0 ingress; tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: flower \
       indev eth0 ip_proto udp dst_port 80 header_parse_err_action drop \
       action continue

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 man/man8/tc-flower.8 | 29 +++++++++++++++++++++++++++--
 tc/f_flower.c        | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index ba290657c224..73d525d10ccd 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -7,6 +7,8 @@ flower \- flow based traffic control filter
 .ti -8
 .BR tc " " filter " ... " flower " [ "
 .IR MATCH_LIST " ] [ "
+.B header_parse_err_action
+.IR CONTROL " ] [ "
 .B action
 .IR ACTION_SPEC " ] [ "
 .B classid
@@ -64,6 +66,28 @@ action from the generic action framework may be called.
 .BI action " ACTION_SPEC"
 Apply an action from the generic actions framework on matching packets.
 .TP
+.BI header_parse_err_action " CONTROL"
+Control how the tree of qdisc, classes and filters is further
+traversed if an error is encountered when parsing the packet in order to
+match against the \fIMATCH_LIST\fR.
+.RS
+.TP
+.B drop
+.TQ
+.B shot
+Drop the packet.
+.TP
+.B continue
+Continue classification with the next filter in line.
+.TP
+.B pass
+Finish classification process and return to calling qdisc for further packet
+processing. This is the default.
+.P
+All filters with the same prio must have the same header_parse_err_action
+value -  drop and shot are considered to be the same value.
+.RE
+.TP
 .BI classid " CLASSID"
 Specify a class to pass matching packets on to.
 .I CLASSID
@@ -219,8 +243,9 @@ and finally ICMP matches (\fBcode\fR and \fBtype\fR) depend on
 being set to
 .BR icmp " or " icmpv6.
 .P
-There can be only used one mask per one prio. If user needs to specify different
-mask, he has to use different prio.
+There can be only used one mask and header_parse_err_action per one prio.
+If user needs to specify different mask or header_parse_err_action,
+he has to use different prio.
 .SH SEE ALSO
 .BR tc (8),
 .BR tc-flow (8)
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 5aac4a0837f4..7bbc15b79fdd 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -43,6 +43,7 @@ static void explain(void)
 	fprintf(stderr,
 		"Usage: ... flower [ MATCH-LIST ]\n"
 		"                  [ skip_sw | skip_hw ]\n"
+		"                  [ header_parse_err_action CONTROL ]\n"
 		"                  [ action ACTION-SPEC ] [ classid CLASSID ]\n"
 		"\n"
 		"Where: MATCH-LIST := [ MATCH-LIST ] MATCH\n"
@@ -72,6 +73,7 @@ static void explain(void)
 		"       FILTERID := X:Y:Z\n"
 		"       MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS }\n"
 		"       ACTION-SPEC := ... look at individual actions\n"
+		"       CONTROL := ... drop | shot | continue | pass\n"
 		"\n"
 		"NOTE: CLASSID, IP-PROTO are parsed as hexadecimal input.\n"
 		"NOTE: There can be only used one mask per one prio. If user needs\n"
@@ -507,12 +509,14 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 	struct tcmsg *t = NLMSG_DATA(n);
 	struct rtattr *tail;
 	__be16 eth_type = TC_H_MIN(t->tcm_info);
+	int err_action = TC_ACT_UNSPEC;
 	__be16 vlan_ethtype = 0;
 	__u8 ip_proto = 0xff;
 	__u32 flags = 0;
 	__u32 mtf = 0;
 	__u32 mtf_mask = 0;
 
+
 	if (handle) {
 		ret = get_u32(&t->tcm_handle, handle, 0);
 		if (ret) {
@@ -788,6 +792,23 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 				return -1;
 			}
 			continue;
+		} else if (matches(*argv, "header_parse_err_action") == 0) {
+			NEXT_ARG();
+
+			if (!argc || action_a2n(*argv, &err_action, false)) {
+				fprintf(stderr, "Illegal \"header_parse_err_action\"\n");
+				return -1;
+			}
+
+			switch (err_action) {
+			case TC_ACT_UNSPEC:
+			case TC_ACT_OK:
+			case TC_ACT_SHOT:
+				break;
+			default:
+				fprintf(stderr, "Illegal \"header_parse_err_action\"\n");
+				return -1;
+			}
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -820,6 +841,12 @@ parse_done:
 			return ret;
 	}
 
+	ret = addattr32(n, MAX_MSG, TCA_FLOWER_HEADER_PARSE_ERR_ACT,
+			err_action);
+	if (ret)
+		return ret;
+
+
 	tail->rta_len = (((void *)n)+n->nlmsg_len) - (void *)tail;
 
 	return 0;
@@ -1173,6 +1200,12 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 			fprintf(f, "\n  skip_sw");
 	}
 
+	if (tb[TCA_FLOWER_HEADER_PARSE_ERR_ACT]) {
+		int act = rta_getattr_u32(tb[TCA_FLOWER_HEADER_PARSE_ERR_ACT]);
+
+		fprintf(f, "\n  header_parse_err_action %s", action_n2a(act));
+	}
+
 	if (tb[TCA_FLOWER_ACT])
 		tc_print_action(f, tb[TCA_FLOWER_ACT]);
 
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* From: Mr.David Owain
From: Mr. David Owain @ 2017-04-28 12:06 UTC (permalink / raw)


Dear Sir/Madam.

Assalamu`Alaikum.

I am Mr.David Owain. I have  (15.8 Million UsDollars) to transfer into
your account,

I will send you more details about this deal and the procedures to
follow when I receive a positive response from you,

Have a great day,
Mr.David Owain.

^ permalink raw reply

* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Simon Horman @ 2017-04-28 12:21 UTC (permalink / raw)
  To: David Miller; +Cc: jiri, jhs, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <20170426.104658.547156121898948265.davem@davemloft.net>

On Wed, Apr 26, 2017 at 10:46:58AM -0400, David Miller wrote:
> From: Jiri Pirko <jiri@resnulli.us>
> Date: Wed, 26 Apr 2017 15:05:06 +0200
> 
> > Wed, Apr 26, 2017 at 02:46:22PM CEST, jhs@mojatatu.com wrote:
> >>On 17-04-26 07:02 AM, Simon Horman wrote:
> >>> On Tue, Apr 25, 2017 at 06:04:45PM +0200, Jiri Pirko wrote:
> >>[..]
> >>
> >>> > So fix iproute2. It is always first kernel, then iproute2.
> >>> 
> >>> Perhaps I am missing the point or somehow misguided but I would expect that
> >>> if the UAPI uses BIT() it also provides BIT().
> >>
> >>There is a user of BIT() already in iproute2 (devlink). We can move
> >>the code to be more generally available for other iproute2 users.
> >>Then this UAPI change makes use of it.
> > 
> > Should be part of UAPI as well
> > I see that include/uapi/rdma/vmw_pvrdma-abi.h is using BIT macro.
> > I don't see BIT macro defined in UAPI (I thought it is). So either
> > define it there (not sure where) or just use "<<"
> 
> "BIT" is a pretty crazy small simple name to pollute into the global
> namespace, IMHO.

It sounds to me that it would be best to just use "<<" rather than
spending cycles posturing on how to add it to the UAPI. Existing users
of BIT in the UAPI could also be updated to use "<<" to avoid having
a misleading precedence in-tree.

^ permalink raw reply

* Re: [ISSUE: sky2 - rx error] Link stops working under heavy traffic load connected to a mv88e6176
From: Andrew Lunn @ 2017-04-28 12:22 UTC (permalink / raw)
  To: Rafa Corvillo; +Cc: Stephen Hemminger, netdev
In-Reply-To: <59032D8B.1010801@aoifes.com>

> >Since you are using DSA, you will have DSA tags enabled on frames
> >to/from the switch. This adds an extra 8 byte header in the frame.  My
> >guess is, it is this header, not the VLAN tag which is causing you MTU
> >issues.
> 
> But it is strange because, as I have said above, we have the same
> configuration working properly on a kernel 4.1 (with OpenWrt), and
> we have the MTU set to 1500.

If you look at sky2.c:

static unsigned sky2_get_rx_threshold(struct sky2_port *sky2)
{
        unsigned size;

        /* Space needed for frame data + headers rounded up */
        size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);

        /* Stopping point for hardware truncation */
        return (size - 8) / sizeof(u32);
}

This is not going to be big enough for a frame with a DSA header.

> >I think this is the first time i've seen sky2 used in a DSA
> >setup. mv643xx or mvneta is generally what is used, when using Marvell
> >chipsets. These drivers are more lenient about MTU, and are happy to
> >pass frames with additional headers.
> >
> 
> We use the mv88e6xxx (as our switch is mv88e6176) and it depends on
> DSA driver in the kernel (isn't it?).

That is correct. But i was talking about the Ethernet interface. All
the designs i've seen use an mv643xxx Ethernet interface, or an mvneta
interface. This is the first time i've seen a sky2 used, which is why
i'm not too surprised you have issues.

> >Changing the MTU like this is not a good fix. It will allow you to
> >receive frames which are bigger, but it also means the local network
> >stack will generate bigger frames to be transmitted. You probably need
> >to modify the sky2 driver to allow it to receive frames bigger than
> >the interface MTU, by about 8 bytes.
> 
> Should the DSA driver remove the DSA tags before pass the frames to
> sky2 interface?

The DSA driver is adding the DSA tags to the frame and passing these
tagged frames to the sky2 interface. Frames going to/from the switch
will always have such tags.

> >>[ 4901.032989] sky2 0000:04:00.0 marvell: tx timeout
> >>[ 4904.722670] sky2 0000:04:00.0 marvell: Link is up at 1000 Mbps,
> >>full duplex, flow control both
> >
> >Between the sky2 and the switch, do you have two back-to-back PHYs or
> >are you connecting the RGMII interfaces together?
> 
> I think that we have two back-to-back PHYs, but I am going to double
> check this with the hardware team.

This could be your problem them. The mv88e6xxx switch driver assumes
there is a straight rgmii-rgmii connection, no PHYs. So it hard
configures the 'CPU' port to its fastest speed, with the link forced
up. If you actually have a PHY there, this might not work so well. I
don't know if the switch PHY is going to do autoneg correctly. Try
using ethtool to look at the sky2 PHY and see what state it is in.

      Andrew

^ permalink raw reply

* Re: [patch net-next 10/10] net: sched: extend gact to allow jumping to another filter chain
From: Jamal Hadi Salim @ 2017-04-28 12:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, xiyou.wangcong, dsa, edumazet, stephen, daniel,
	alexander.h.duyck, mlxsw, simon.horman
In-Reply-To: <20170428065209.GA1886@nanopsycho.orion>

On 17-04-28 02:52 AM, Jiri Pirko wrote:
> Fri, Apr 28, 2017 at 03:41:08AM CEST, jhs@mojatatu.com wrote:
>>
>> Jiri,
>>
>> Good stuff!
>> Thanks for the effort.
>>
>> I didnt review the details - will do. I wanted to raise one issue.
>> This should work for all actions, not just gact (refer to the
>> recent commit i made on the action jumping).
>>
>> Example policy for policer:
>>
>> #if packets destined for mac address 52:54:00:3d:c7:6d
>> #exceed 90kbps with burst of 90K then jump to chain 11
>> #for further classification, otherwise set their skb mark to 11
>> # and proceed.
>>
>> tc filter add dev eth0 parent ffff: protocol ip pref 33 \
>> flower dst_mac 52:54:00:3d:c7:6d \
>> action police rate 1kbit burst 90k conform-exceed pipe/goto chain 11 \
>> action skbedit mark 11
>>
>> But i should also be able to do this for any other action, etc.
>

[..]

> You can have multiple actions in list and gact goto as the last one. Why
> to do this ugliness?

To be able to do what I described above ;-> Policer is always a good
example because it can branch depending on whether a rate is exceeded
or not.

Another example:
If you had two tables, one for flows that dont exceed their
rate and another for flows exceed their rate.
"match X
	action police
	    if exceed
                 goto chain 12
             else did not exceed
                tag packet, goto chain 11
"

But it is not just the policer, other actions as well would benefit.

I am not sure you can achieve that with just gact.

cheers,
jamal

^ permalink raw reply

* Re: [REGRESSION next-20170426] Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices") causes oops in mvneta
From: Ralph Sennhauser @ 2017-04-28 12:25 UTC (permalink / raw)
  To: Sricharan R
  Cc: Rafael J. Wysocki, Joerg Roedel, Bjorn Helgaas, linux-acpi,
	linux-kernel, linux-pci, Thomas Petazzoni, netdev
In-Reply-To: <bcdcc1cc-181c-8396-dd3c-dd40d0c7efc1@codeaurora.org>

On Fri, 28 Apr 2017 17:26:41 +0530
Sricharan R <sricharan@codeaurora.org> wrote:

> Hi Ralph,
> 
> <snip..>
> 
> >>>>>
> >>>>> Commit 09515ef5ddad ("of/acpi: Configure dma operations at probe
> >>>>> time for platform/amba/pci bus devices") causes a kernel panic
> >>>>> as in the log below on an armada-385. Reverting the commit
> >>>>> fixes the issue.
> >>>>>
> >>>>> Regards
> >>>>> Ralph      
> >>>>
> >>>> Somehow not getting a obvious clue on whats going wrong with the
> >>>> logs below. From the log and looking in to dts, the drivers seems
> >>>> to the one for "marvell,armada-370-neta".    
> >>>
> >>> Correct.
> >>>     
> >>>> Issue looks the data from the dma
> >>>> has gone bad and subsequently referring the wrong data has
> >>>> resulted in the crash. Looks like the dma_masks is the one going
> >>>> wrong. Can i get some logs from mvneta_probe, about
> >>>> dev->dma_mask, dev->coherent_dma_mask and dev->dma_ops with and
> >>>> without the patch to see whats the difference ?    
> >>>
> >>> Not sure I understood what exactly you are after. Might be faster
> >>> to just send me a patch with all debug print statements you like
> >>> to see.   
> >>
> >> Attached the patch with debug prints.
> >>
> >> Regards,
> >>  Sricharan
> >>  
> > 
> > Hi Sricharan
> > 
> > With commit 09515ef5ddad
> > 
> > [    1.288962] mvneta f1070000.ethernet: dev->dma_mask 0xffffffff
> > [    1.294827] mvneta f1070000.ethernet: dev->coherent_dma_mask
> > 0xffffffff [    1.301472] mvneta f1070000.ethernet: dev->dma_ops
> > 0x40b00c0601460
> > 
> > [    1.322047] mvneta f1034000.ethernet: dev->dma_mask 0xffffffff
> > [    1.327904] mvneta f1034000.ethernet: dev->coherent_dma_mask
> > 0xffffffff [    1.334549] mvneta f1034000.ethernet: dev->dma_ops
> > 0x40b00c0601460
> > 
> > 
> > With the patch reverted, the build that works
> > 
> > [    1.289001] mvneta f1070000.ethernet: dev->dma_mask 0xffffffff
> > [    1.294866] mvneta f1070000.ethernet: dev->coherent_dma_mask
> > 0xffffffff [    1.301511] mvneta f1070000.ethernet: dev->dma_ops
> > 0x40b00c06014a8
> > 
> > [    1.317005] mvneta f1034000.ethernet: dev->dma_mask 0xffffffff
> > [    1.322867] mvneta f1034000.ethernet: dev->coherent_dma_mask
> > 0xffffffff [    1.329508] mvneta f1034000.ethernet: dev->dma_ops
> > 0x40b00c06014a8 
> 
> My bad, i think it is this patch missing [1], attached it as well.
> Infact, this was in the series initially and got acked to get merged
> separately well before the series. I should have sent this to Russell.
> I will do this now. If this fixes up the issue,
> i will take this patch separately, while this series gets tested
> on -next.
> 
> [1] https://patchwork.kernel.org/patch/9362113/
> 

With the attached patch,
0001-arm-dma-mapping-Don-t-override-dma_ops-in-arch_setup.patch, on top
of next all is well again.

Thanks
Ralph

^ permalink raw reply

* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-28 12:30 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: davem, xiyou.wangcong, eric.dumazet, netdev, Simon Horman,
	Benjamin LaHaise
In-Reply-To: <20170428070207.GC1886@nanopsycho.orion>

On 17-04-28 03:02 AM, Jiri Pirko wrote:
> Fri, Apr 28, 2017 at 03:22:53AM CEST, jhs@mojatatu.com wrote:

[..]
>> Maybe I am misunderstanding:
>> Recall, this is what it looks like with this patchset:
>> <nlh><subsytem-header>[TCA_ROOT_XXXX]
>>
>> TCA_ROOT_XXX is very subsystem specific. classifiers, qdiscs and many
>> subsystems defined their own semantics for that TLV level. This specific
>> "dump max" is very very specific to actions. They were crippled by the
>> fact you could only send 32 at a time - this allows more to be sent.
>>
>> I thought initially you meant:
>> <nlh>[NLA_XXX]<subsytem-header>[TCA_ROOT_XXXX]
>>
>> I think at the NLA_XXX you could fit netlink wide TLVs - but if i said
>> "do a large dump" it is of no use to any other subsystem.
>
> Okay, I'm sorry, I had couple of beers yesterday so that might be
> the cause why your msg makes me totally confused :O
>
> All I suggest is to replace NLA_U32 flags you want that does not
> have any semantics with NLA_FLAGS flags, which eventually will carry
> the exact same u32, but with predefined semantics, helpers, everything.
>

I didnt understand fully Jiri. Are you suggesting a new type called
NLA_FLAGS which is re-usable elsewhere?

cheers,
jamal

^ permalink raw reply

* [PATCH] iov_iter: don't revert if csum error
From: Ding Tianhong @ 2017-04-28 12:48 UTC (permalink / raw)
  To: David Miller, pabeni, edumazet, hannes, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, LinuxArm

The patch 3278682 (make skb_copy_datagram_msg() et.al. preserve
->msg_iter on error) will revert the iov buffer if copy to iter
failed, but it looks no need to revert for csum error, so fix it.

Fixes: 3278682 ("make skb_copy_datagram_msg() et.al. preserve->msg_iter on error")
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 net/core/datagram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index f4947e7..475a8e9 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -760,7 +760,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,

 	if (msg_data_left(msg) < chunk) {
 		if (__skb_checksum_complete(skb))
-			goto csum_error;
+			goto fault;
 		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
 			goto fault;
 	} else {
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH/RFC net-next 0/4] net/sched: cls_flower: avoid false matching of truncated packets
From: Jamal Hadi Salim @ 2017-04-28 12:52 UTC (permalink / raw)
  To: Simon Horman, Jiri Pirko, Cong Wang
  Cc: Dinan Gunawardena, netdev, oss-drivers
In-Reply-To: <20170428120035.15984-1-simon.horman@netronome.com>

On 17-04-28 08:00 AM, Simon Horman wrote:
> Hi,
>
> this series is intended to avoid false-positives which match
> truncated packets against flower classifiers which match on:
> * zero L4 ports or;
> * zero ICMP code or type
>
> This requires updating the flow dissector to return an error in such cases
> and updating flower to not match on the result of a failed dissection.
>
> In the case of UDP this results in a behavioural change to users of
> flow_keys_dissector_keys[] and flow_keys_dissector_symmetric_keys[] -
> dissection will fail on truncated packets where the IP protocol of the
> packets indicates ports should be present (according to skb_flow_get_ports()).

I think i understand the use case/need.
But would it be fair to say that the truncated vs non-truncated are two
different filter rules? Example what would offloading of 
header_parse_err_action mean?

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next 17/18] net: dsa: mv88e6xxx: support the VTU Page bit
From: Andrew Lunn @ 2017-04-28 12:53 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-18-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:35AM -0400, Vivien Didelot wrote:
> Newer chips such as the 88E6390 have a VTU Page bit in the VTU VID
> register to specify a 13th bit for the VID. This can be used to support
> 8K VLANs.

Hi Vivien

At the moment, this code appears to be generic to all chips. Do we
need checks to ensure we don't look at this bit on older devices where
it is not valid? Particularly on read. On write, we already verify the
vid is less than info->max_vids, so i don't think that is a problem.

> When dumping the whole VTU, all VID bits must be set to one, including
> this VTU Page bit. Add support for VID greater than 4095.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>  drivers/net/dsa/mv88e6xxx/global1_vtu.c | 7 +++++++
>  drivers/net/dsa/mv88e6xxx/mv88e6xxx.h   | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/global1_vtu.c b/drivers/net/dsa/mv88e6xxx/global1_vtu.c
> index cb0f3359d60b..2ad080291208 100644
> --- a/drivers/net/dsa/mv88e6xxx/global1_vtu.c
> +++ b/drivers/net/dsa/mv88e6xxx/global1_vtu.c
> @@ -95,6 +95,10 @@ static int mv88e6xxx_g1_vtu_vid_read(struct mv88e6xxx_chip *chip,
>  		return err;
>  
>  	entry->vid = val & 0xfff;
> +
> +	if (val & GLOBAL_VTU_VID_PAGE)
> +		entry->vid |= 0x1000;
> +

I'm undecided myself, so i will just bring it up for discussion.

Maybe it would be more readable to say:

		entry->vid += 4096;

???

	Andrew

^ permalink raw reply

* Re: [PATCH net-next 18/18] net: dsa: mv88e6xxx: add VTU support for 88E6390
From: Andrew Lunn @ 2017-04-28 12:55 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-19-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:36AM -0400, Vivien Didelot wrote:
> The 6390 family of chips use only 2 of the 3 VTU Data registers to pack
> the MemberTag and PortState VLAN data. This means that they must be
> written or read before or after each VTU/STU operations.
> 
> Implement this variant to add support for VTU with such chips. These
> chips have a 13th bit for the VID thus set their max_vid to 8191.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox