Netdev List
* [PATCH] iproute2: Nr. of packets and octets for macsec tx stats were swapped.
From: Daniel.Hopf @ 2016-11-22 13:24 UTC
  To: netdev

Signed-off-by: Daniel Hopf <daniel.hopf@continental-corporation.com>
---
 ip/ipmacsec.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index c9252bb..aa89a00 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -634,10 +634,10 @@ static void print_one_stat(const char **names, struct rtattr **attr, int idx,
 }

 static const char *txsc_stats_names[NUM_MACSEC_TXSC_STATS_ATTR] = {
-       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutOctetsProtected",
-       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutOctetsEncrypted",
-       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = "OutPktsProtected",
-       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = "OutPktsEncrypted",
+       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutPktsProtected",
+       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutPktsEncrypted",
+       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = "OutOctetsProtected",
+       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = "OutOctetsEncrypted",
 };

 static void print_txsc_stats(const char *prefix, struct rtattr *attr)
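
The table fixed by this patch maps attribute indices to display labels through designated initializers. A minimal standalone sketch of that pattern follows; the `txsc_stat` enum and the `txsc_stat_name()` and `txsc_label_matches_unit()` helpers are hypothetical stand-ins for illustration and are not part of iproute2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the MACSEC_TXSC_STATS_ATTR_* values. */
enum txsc_stat {
	TXSC_STAT_UNSPEC,
	TXSC_STAT_OUT_PKTS_PROTECTED,
	TXSC_STAT_OUT_PKTS_ENCRYPTED,
	TXSC_STAT_OUT_OCTETS_PROTECTED,
	TXSC_STAT_OUT_OCTETS_ENCRYPTED,
	NUM_TXSC_STATS,
};

/* Designated initializers tie each label to its enum index. */
static const char *const txsc_stat_names[NUM_TXSC_STATS] = {
	[TXSC_STAT_OUT_PKTS_PROTECTED]   = "OutPktsProtected",
	[TXSC_STAT_OUT_PKTS_ENCRYPTED]   = "OutPktsEncrypted",
	[TXSC_STAT_OUT_OCTETS_PROTECTED] = "OutOctetsProtected",
	[TXSC_STAT_OUT_OCTETS_ENCRYPTED] = "OutOctetsEncrypted",
};

/* Compile-time reminder to extend the table when a stat is added. */
static_assert(NUM_TXSC_STATS == 5, "update txsc_stat_names when adding a stat");

/* Return the label for idx, or NULL when idx is out of range or unnamed. */
static const char *txsc_stat_name(int idx)
{
	if (idx < 0 || idx >= NUM_TXSC_STATS)
		return NULL;
	return txsc_stat_names[idx];
}

/* Return 1 when the label for idx mentions the given unit ("Pkts" or "Octets"). */
static int txsc_label_matches_unit(int idx, const char *unit)
{
	const char *name = txsc_stat_name(idx);

	return name != NULL && strstr(name, unit) != NULL;
}
```

A check such as `txsc_label_matches_unit(TXSC_STAT_OUT_PKTS_PROTECTED, "Pkts")` is one way a swapped pair of labels could be caught in a unit test.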

* Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc
From: Mike Manning @ 2016-11-22 13:17 UTC
  To: Hannes Frederic Sowa, netdev
In-Reply-To: <718b6520-686e-cab3-1c8d-9e1de4cb0dbd@stressinduktion.org>

On 11/22/2016 12:18 PM, Hannes Frederic Sowa wrote:
> On 22.11.2016 11:34, Mike Manning wrote:
>> Bursts of failures may occur when adding IPv6 routes via Netlink to the
>> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
>> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
>> extended the area map in time for the atomic allocation using percpu.c:
>> pcpu_alloc() to succeed. This results in route additions failing with
>> an -ENOMEM error.
>>
>> While the sender of the Netlink msg to add this route could check for
>> an ACK and retransmit in the case of an -ENOMEM error, the latter
>> should not occur in the first place if there is plenty of memory. The
>> solution is to use non-atomic alloc for rt6_info instead. While the
>> client may now be blocked for longer depending on the state of the
>> chunk being added to, this work has to be incurred at some point.
>>
>> The alternative solution would be to provide configurable parameters
>> e.g. via sysctl in percpu.c for default map size, low/high empty pages
>> and map margins. For this solution, the map margin sizes need to be
>> stored per chunk, as large margins cannot be used if the dynamic early
>> slots map size is in use. This is not a preferred solution though, as
>> it requires tuning of these parameters to provide sufficient margins to
>> avoid -ENOMEM errors depending on system requirements.
>>
>> Signed-off-by: Mike Manning <mmanning@brocade.com>
>> ---
>>  net/ipv6/route.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 1b57e11..0e9bb76 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>>  	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>>  
>>  	if (rt) {
>> -		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
>> +		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>>  		if (rt->rt6i_pcpu) {
>>  			int cpu;
> 
> Nak, this doesn't work, as ip6_dst_alloc must be callable from
> non-blocking code paths unfortunately.
> 
> 

Thanks for the prompt reply.

Do you consider the alternative of providing configurable parameters for per-cpu
alloc as viable, or is there a better way of dealing with this?

While I have tested at scale that such parameter changes avoid the -ENOMEM errors, it
would be good to get confirmation that this approach is acceptable before coding the
sysctl handling for them.

* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Mark Lord @ 2016-11-22 13:12 UTC
  To: Hayes Wang, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <b9809516-d036-bfc3-b7a3-6563033ec957@pobox.com>

On 16-11-18 07:03 AM, Mark Lord wrote:
> On 16-11-18 02:57 AM, Hayes Wang wrote:
> ..
>> Besides, the maximum data length which the RTL8152 would send to
>> the host is 16KB. That is, if the agg_buf_sz is 16KB, the host
>> wouldn't split it. However, you still see problems for it.
>
> How does the RTL8152 know that the limit is 16KB,
> rather than some other number?  Is this a hardwired number
> in the hardware, or is it a parameter that the software
> sends to the chip during initialization?
..
> The first issue is that a packet sometimes begins in one URB,
> and completes in the next URB, without an rx_desc at the start
> of the second URB.  This I have already reported earlier.

Long-running tests over the weekend, with the invalidate_dcache_range() call
before the inner loop of r8152_rx_bottom(), turned up a few instances
where packets were truncated inside a 16384-byte URB buffer without filling the URB.

[10.293228] r8152_rx_bottom: 4278 corrupted urb: head=9d210000 urb_offset=2856/3376 pkt_len(1518) exceeds remainder(496)
[10.304523] r8152_dump_rx_desc: 044805ee 40080000 006005dc 06020000 00000000 00000000 rx_len=1518
..
[   16.660431] r8152_rx_bottom: 7802 corrupted urb: head=9d1f8000 urb_offset=1544/2064 pkt_len(1518) exceeds remainder(496)
[   16.671719] r8152_dump_rx_desc: 044805ee 40480000 004005dc 46020006 00000000 00000000 rx_len=1518

The r8152.c driver attempted to build skbs for the entire packet size,
even though the 1518-byte packets had only 496 bytes of data in the URB.
It is not clear what the chip did with the rest of the packets in question,
but the next URBs in each case began with a new/real rx_desc and new packet.

There were also unconnected events during the test runs where the
test code noticed totally invalid rx_desc structs in the middle of URBs.
The stock driver would again have attempted to treat those as "valid" (ugh).

..
[   10.273906] r8152_check_rx_desc: rx_desc looks bad.
[   10.279012] r8152_rx_bottom: 4338 corrupted urb. head=9d210000 urb_offset=2856/3376 len_used=2880
[   10.288196] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 202f3a30 rx_len=12857

..
[    7.184565] r8152_check_rx_desc: rx_desc looks bad.
[    7.189657] r8152_rx_bottom: 1678 corrupted urb. head=9d210000 urb_offset=2856/3376 len_used=2880
[    7.198852] r8152_dump_rx_desc: a1388402 803c9001 84380810 a67c5c4c a77c782b c64c782b rx_len=1026
..
[   10.351251] r8152_check_rx_desc: rx_desc looks bad.
[   10.356356] r8152_rx_bottom: 4397 corrupted urb. head=9d20c000 urb_offset=4400/7984 len_used=4424
[   10.365543] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 202f3a30 rx_len=12857
..
[   10.518119] r8152_check_rx_desc: rx_desc looks bad.
[   10.523204] r8152_rx_bottom: 4458 corrupted urb. head=9d210000 urb_offset=4400/7984 len_used=4424
[   10.532416] r8152_dump_rx_desc: 54544120 6e3d5352 636f6c6f 65762c6b 343d7372 6464612c rx_len=16672
..

> But the driver, as written, sometimes accesses bytes outside
> of the 16KB URB buffer, because it trusts the non-existent
> rx_desc in these cases, and also because it accesses bytes
> from the rx_desc without first checking whether there is
> sufficient remaining space in the URB to hold an rx_desc.
>
> These incorrect accesses sometimes touch memory outside
> of the URB buffer.  Since the driver allocates all of its
> rx URB buffers at once, they are highly likely to be
> physically (and therefore virtually) adjacent in memory.
>
> So mistakenly accessing beyond the end of one buffer will
> often result in a read from memory of the next URB buffer.
> Which causes a portion of it to be loaded into the D-cache.
>
> When that URB is subsequently filled by DMA, there then exists
> a data-consistency issue:  the D-cache contains stale information
> from before the latest DMA cycle.
>
> So this explains the strange memory behaviour observed earlier on.
> When I add a call to invalidate_dcache_range() to the driver
> just before it begins examining a new rx URB, the problems go away.
> So this confirms the observations.
>
> Using non-cacheable RAM also makes the problem go away.
> But neither is a fix for the real buffer overrun accesses in the driver.
>
> Fix the "packet spans URBs" bug, and fix the driver to ALWAYS
> test lengths/ranges before accessing the actual buffer,
> and everything should begin working reliably.
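
The length/range checks asked for above can be sketched in standalone C. This is only an illustration, not the r8152 driver code: the `hyp_` names are hypothetical, the 24-byte descriptor size is taken from the six 32-bit words shown in the r8152_dump_rx_desc output, the 0x7fff length mask is inferred from those dumps (for example 0x044805ee & 0x7fff = 1518), and the 8-byte alignment between packets is an assumption. In the first log entry the remainder is 3376 - 2856 - 24 = 496 bytes, which is the condition the loop rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_RX_DESC_SIZE 24u     /* six 32-bit words, as in the dumps */
#define HYP_RX_LEN_MASK  0x7fffu /* inferred from the dumps (assumption) */
#define HYP_RX_ALIGN     8u      /* assumed alignment between packets */

/* Decode a little-endian 32-bit word without unaligned or wide loads. */
static uint32_t hyp_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Count the packets that fit entirely inside the received data.
 * Nothing is read unless a whole descriptor fits, and a packet is
 * accepted only if its claimed length fits in what remains.
 */
static size_t hyp_count_valid_packets(const uint8_t *buf, size_t actual_len)
{
	size_t off = 0, count = 0;

	assert(buf != NULL);

	while (off + HYP_RX_DESC_SIZE <= actual_len) {
		size_t pkt_len = hyp_le32(buf + off) & HYP_RX_LEN_MASK;
		size_t remainder = actual_len - off - HYP_RX_DESC_SIZE;

		/* Reject empty or truncated packets instead of trusting rx_desc. */
		if (pkt_len == 0 || pkt_len > remainder)
			break;

		count++;
		off += HYP_RX_DESC_SIZE + pkt_len;
		off = (off + HYP_RX_ALIGN - 1) & ~(size_t)(HYP_RX_ALIGN - 1);
	}
	return count;
}
```

Because the loop condition is evaluated before any descriptor byte is read, a short or misaligned tail never causes an access beyond `actual_len`.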

* Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration
From: Jiri Pirko @ 2016-11-22 12:49 UTC
  To: Florian Fainelli
  Cc: netdev, davem, bridge, stephen, vivien.didelot, andrew, jiri,
	idosch
In-Reply-To: <20161121190925.14530-1-f.fainelli@gmail.com>

Mon, Nov 21, 2016 at 08:09:22PM CET, f.fainelli@gmail.com wrote:
>Hi all,
>
>This patch series allows using the bridge master interface to configure
>an Ethernet switch port's CPU/management port with different VLAN attributes than
>those of the bridge downstream ports/members.
>
>Jiri, Ido, Andrew, Vivien, please review the impact on mlxsw and mv88e6xxx, I
>tested this with b53 and a mockup DSA driver.

Patchset looks fine to me.

>
>Open questions:
>
>- if we have more than one bridge on top of a physical switch, the driver
>  should keep track of that and verify that we are not going to change
>  the CPU port VLAN attributes in a way that results in incompatible settings
>  to be applied

Ack. In mlxsw this is tracked.


>
>- if the default behavior is to have all VLANs associated with the CPU port
>  be ingressing/egressing tagged to the CPU, is this really useful?
>
>Florian Fainelli (3):
>  net: bridge: Allow bridge master device to configure switch CPU port
>  net: dsa: Propagate VLAN add/del to CPU port(s)
>  net: dsa: b53: Remove CPU port specific VLAN programming
>
> drivers/net/dsa/b53/b53_common.c | 22 ++++++--------------
> net/bridge/br_vlan.c             | 28 ++++++++++++++++++++++---
> net/dsa/slave.c                  | 45 +++++++++++++++++++++++++++++-----------
> 3 files changed, 64 insertions(+), 31 deletions(-)
>
>-- 
>2.9.3
>

* Re: [PATCH] ipv6:ipv6_pinfo dereferenced after NULL check
From: Hannes Frederic Sowa @ 2016-11-22 12:26 UTC
  To: Manjeet Pawar, davem, kuznet, jmorris, yoshfuji, kaber, netdev,
	linux-kernel
  Cc: pankaj.m, ajeet.y, Rohit Thapliyal
In-Reply-To: <1479796024-39418-1-git-send-email-manjeet.p@samsung.com>

On 22.11.2016 07:27, Manjeet Pawar wrote:
> From: Rohit Thapliyal <r.thapliyal@samsung.com>
> 
> np checked for NULL and then dereferenced. It should be modified
> for NULL case.
> 
> Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com>
> Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
> ---
>  net/ipv6/ip6_output.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 1dfc402..c2afa14 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -205,14 +205,15 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
>  	/*
>  	 *	Fill in the IPv6 header
>  	 */
> -	if (np)
> +	if (np) {
>  		hlimit = np->hop_limit;
> +		ip6_flow_hdr(
> +					hdr, tclass, ip6_make_flowlabel(
> +					net, skb, fl6->flowlabel,
> +					np->autoflowlabel, fl6));
> +	}
>  	if (hlimit < 0)
>  		hlimit = ip6_dst_hoplimit(dst);
>  
> -	ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
> -				np->autoflowlabel, fl6));
> -
>  	hdr->payload_len = htons(seg_len);
>  	hdr->nexthdr = proto;
>  	hdr->hop_limit = hlimit;
> 


We should always initialize hdr and not skip the ip6_flow_hdr call.

Did you see a bug, or did you find this by code review? I wonder if np can
actually be NULL at this point. Maybe we can just eliminate the NULL check.

Thanks,
Hannes
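
The objection above is that the header must be filled in whether or not np is set. A minimal sketch of that shape follows, assuming a caller-supplied default is acceptable when np is NULL; the `hyp_` structures and fields are simplified, hypothetical stand-ins and not the kernel's ipv6_pinfo or ipv6hdr. Whether np can be NULL here at all is the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct hyp_pinfo {
	int hop_limit;
	int autoflowlabel;
};

struct hyp_hdr {
	int hop_limit;
	int autoflowlabel_used;
	int flow_initialized;
};

/* Fill the header unconditionally; np only overrides the defaults. */
static void hyp_fill_header(struct hyp_hdr *hdr, const struct hyp_pinfo *np,
			    int default_hlimit, int default_autoflowlabel)
{
	int hlimit = -1;
	int autoflowlabel = default_autoflowlabel;

	assert(hdr != NULL);

	if (np) {
		hlimit = np->hop_limit;
		autoflowlabel = np->autoflowlabel;
	}
	if (hlimit < 0)
		hlimit = default_hlimit;

	/* These writes run on every path, so hdr is never left uninitialized. */
	hdr->autoflowlabel_used = autoflowlabel;
	hdr->flow_initialized = 1;
	hdr->hop_limit = hlimit;
}
```

The NULL check guards only the reads from np, while the header writes stay outside the conditional.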

* Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc
From: Hannes Frederic Sowa @ 2016-11-22 12:18 UTC
  To: Mike Manning, netdev
In-Reply-To: <1479810840-19122-1-git-send-email-mmanning@brocade.com>

On 22.11.2016 11:34, Mike Manning wrote:
> Bursts of failures may occur when adding IPv6 routes via Netlink to the
> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
> extended the area map in time for the atomic allocation using percpu.c:
> pcpu_alloc() to succeed. This results in route additions failing with
> an -ENOMEM error.
> 
> While the sender of the Netlink msg to add this route could check for
> an ACK and retransmit in the case of an -ENOMEM error, the latter
> should not occur in the first place if there is plenty of memory. The
> solution is to use non-atomic alloc for rt6_info instead. While the
> client may now be blocked for longer depending on the state of the
> chunk being added to, this work has to be incurred at some point.
> 
> The alternative solution would be to provide configurable parameters
> e.g. via sysctl in percpu.c for default map size, low/high empty pages
> and map margins. For this solution, the map margin sizes need to be
> stored per chunk, as large margins cannot be used if the dynamic early
> slots map size is in use. This is not a preferred solution though, as
> it requires tuning of these parameters to provide sufficient margins to
> avoid -ENOMEM errors depending on system requirements.
> 
> Signed-off-by: Mike Manning <mmanning@brocade.com>
> ---
>  net/ipv6/route.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 1b57e11..0e9bb76 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>  	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>  
>  	if (rt) {
> -		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
> +		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>  		if (rt->rt6i_pcpu) {
>  			int cpu;

Nak, this doesn't work, as ip6_dst_alloc must be callable from
non-blocking code paths unfortunately.

* Re: mlx5 "syndrome" errors in kernel log
From: Saeed Mahameed @ 2016-11-22 12:04 UTC
  To: Jesper Dangaard Brouer
  Cc: Saeed Mahameed, Tariq Toukan, netdev@vger.kernel.org
In-Reply-To: <20161122105944.0c64e77b@redhat.com>

On Tue, Nov 22, 2016 at 11:59 AM, Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
> Hi Saeed,
>
> I'm seeing the dmesg errors below after pulling net-next at commit
> e796f49d826aad. I was not seeing these errors before, when my tree was
> based on top of commit 319b0534b95.
>
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
>
>
> Listing my firmware version:
>
>  $ ethtool -i mlx5p2
>  driver: mlx5_core
>  version: 3.0-1 (January 2015)
>  firmware-version: 12.12.1240

Hi Jesper,

It seems this FW version doesn't support a new FW command introduced
by "net/mlx5e: Expose PCIe statistics to ethtool".

I suggest upgrading the FW, but if you don't know how to do that or are in a
hurry, please go ahead and revert
"net/mlx5e: Expose PCIe statistics to ethtool".

As a permanent solution, I will need to introduce a new capability bit
and a fix for the above patch.

Thanks for the report,
We will handle this.
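
The capability-bit approach mentioned above can be sketched as follows. This is a generic illustration only: every `hyp_`/`HYP_` name is hypothetical and none of it is the mlx5 API. The idea is that the driver checks a firmware-advertised bit and skips the command entirely on older firmware, so no "bad parameter" error is logged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit advertised by newer firmware. */
#define HYP_CAP_PCIE_STATS (1u << 0)

struct hyp_dev {
	uint32_t fw_caps;         /* capability bits reported by firmware */
	unsigned int cmds_issued; /* counts commands sent to firmware */
};

/* Stand-in for the firmware command that older firmware rejects. */
static int hyp_fw_read_pcie_stats(struct hyp_dev *dev, uint64_t *out)
{
	dev->cmds_issued++;
	*out = 0;
	return 0;
}

/* Issue the command only when firmware advertises support for it. */
static int hyp_get_pcie_stats(struct hyp_dev *dev, uint64_t *out)
{
	assert(dev != NULL && out != NULL);

	if (!(dev->fw_caps & HYP_CAP_PCIE_STATS))
		return -EOPNOTSUPP;

	return hyp_fw_read_pcie_stats(dev, out);
}
```

With the gate in place, a device whose `fw_caps` lacks the bit returns -EOPNOTSUPP without ever sending the command.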

* net/udp: bug in skb_pull_rcsum
From: Andrey Konovalov @ 2016-11-22 11:58 UTC
  To: samanthakumar, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, Eric Dumazet, syzkaller

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

A reproducer is attached.

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:3029!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3854 Comm: a.out Not tainted 4.9.0-rc6+ #431
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880068472c00 task.stack: ffff880063ec8000
RIP: 0010:[<ffffffff82b8fd85>]  [<ffffffff82b8fd85>] skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
RSP: 0018:ffff880063ecf660  EFLAGS: 00010297
RAX: ffff880068472c00 RBX: ffff880065a2da00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000d RDI: ffffed000c7d9ec0
RBP: ffff880063ecf690 R08: 1ffff1000d08e67e R09: 1ffff1000cb45b50
R10: dffffc0000000000 R11: 0000000000000000 R12: ffff880065a2da80
R13: 0000000000000008 R14: ffff880065a2dad8 R15: 0000000000000001
FS:  00007fbb006497c0(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020032fe0 CR3: 00000000636d9000 CR4: 00000000000006e0
Stack:
 ffff88006bfbb948 ffff880065a2da00 ffff880064160000 1ffff1000cb45b52
 0000000000000000 1ffff1000d4d3933 ffff880063ecf6f8 ffffffff83354ced
 00000000fffffe00 ffff880065a2da90 ffff880063ecf6c0 ffffffff00000001
Call Trace:
 [<     inline     >] udp_csum_pull_header ./include/net/udp.h:166
 [<ffffffff83354ced>] udpv6_queue_rcv_skb+0x37d/0x17b0 net/ipv6/udp.c:625
 [<     inline     >] sk_backlog_rcv ./include/net/sock.h:874
 [<ffffffff82b7eec6>] __release_sock+0x126/0x3a0 net/core/sock.c:2046
 [<ffffffff82b7f199>] release_sock+0x59/0x1c0 net/core/sock.c:2504
 [<ffffffff8334fc50>] udpv6_sendmsg+0x1310/0x24a0 net/ipv6/udp.c:1273
 [<ffffffff83174fa7>] inet_sendmsg+0x317/0x4e0 net/ipv4/af_inet.c:734
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff82b7176c>] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [<ffffffff82b719d1>] sock_write_iter+0x221/0x3b0 net/socket.c:829
 [<ffffffff8151e69b>] do_iter_readv_writev+0x2bb/0x3f0 fs/read_write.c:695
 [<ffffffff81520501>] do_readv_writev+0x431/0x730 fs/read_write.c:872
 [<ffffffff81520d2f>] vfs_writev+0x8f/0xc0 fs/read_write.c:911
 [<ffffffff81520e41>] do_writev+0xe1/0x240 fs/read_write.c:944
 [<     inline     >] SYSC_writev fs/read_write.c:1017
 [<ffffffff81523ca7>] SyS_writev+0x27/0x30 fs/read_write.c:1014
 [<ffffffff83fc4381>] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209
Code: 89 f8 49 c1 e8 03 47 0f b6 14 08 45 84 d2 74 0a 41 80 fa 03 0f 8e cf 00 00 00 80 a3 91 00 00 00 f9 e9 43 ff ff ff e8 3b 79 79 fe <0f> 0b e8 34 79 79 fe 0f 0b e8 2d 79 79 fe 48 8b 7d d0 31 d2 44
RIP  [<ffffffff82b8fd85>] skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
 RSP <ffff880063ecf660>
---[ end trace a5d5d2cef6a25ecb ]---
==================================================================

[-- Attachment #2: skb-pull-bug-poc.c --]
[-- Type: text/x-csrc, Size: 7858 bytes --]

// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_bind
#define __NR_bind 49
#endif
#ifndef __NR_sendmsg
#define __NR_sendmsg 46
#endif
#ifndef __NR_writev
#define __NR_writev 20
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_syz_fuse_mount
#define __NR_syz_fuse_mount 1000004
#endif
#ifndef __NR_syz_fuseblk_mount
#define __NR_syz_fuseblk_mount 1000005
#endif
#ifndef __NR_syz_open_dev
#define __NR_syz_open_dev 1000002
#endif
#ifndef __NR_syz_open_pts
#define __NR_syz_open_pts 1000003
#endif
#ifndef __NR_syz_test
#define __NR_syz_test 1000001
#endif

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <net/if_arp.h>

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  exit(sig);
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                \
  {                                                                    \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
    if (_setjmp(segv_env) == 0) {                                      \
      __VA_ARGS__;                                                     \
    }                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
  }

static uintptr_t syz_open_dev(uintptr_t a0, uintptr_t a1, uintptr_t a2)
{
  if (a0 == 0xc || a0 == 0xb) {

    char buf[128];
    sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block",
            (uint8_t)a1, (uint8_t)a2);
    return open(buf, O_RDWR, 0);
  } else {

    char buf[1024];
    char* hash;
    strncpy(buf, (char*)a0, sizeof(buf));
    buf[sizeof(buf) - 1] = 0;
    while ((hash = strchr(buf, '#'))) {
      *hash = '0' + (char)(a1 % 10);
      a1 /= 10;
    }
    return open(buf, a2, 0);
  }
}

static uintptr_t syz_open_pts(uintptr_t a0, uintptr_t a1)
{

  int ptyno = 0;
  if (ioctl(a0, TIOCGPTN, &ptyno))
    return -1;
  char buf[128];
  sprintf(buf, "/dev/pts/%d", ptyno);
  return open(buf, a1, 0);
}

static uintptr_t syz_fuse_mount(uintptr_t a0, uintptr_t a1,
                                uintptr_t a2, uintptr_t a3,
                                uintptr_t a4, uintptr_t a5)
{

  uint64_t target = a0;
  uint64_t mode = a1;
  uint64_t uid = a2;
  uint64_t gid = a3;
  uint64_t maxread = a4;
  uint64_t flags = a5;

  int fd = open("/dev/fuse", O_RDWR);
  if (fd == -1)
    return fd;
  char buf[1024];
  sprintf(buf, "fd=%d,user_id=%ld,group_id=%ld,rootmode=0%o", fd,
          (long)uid, (long)gid, (unsigned)mode & ~3u);
  if (maxread != 0)
    sprintf(buf + strlen(buf), ",max_read=%ld", (long)maxread);
  if (mode & 1)
    strcat(buf, ",default_permissions");
  if (mode & 2)
    strcat(buf, ",allow_other");
  syscall(SYS_mount, "", target, "fuse", flags, buf);

  return fd;
}

static uintptr_t syz_fuseblk_mount(uintptr_t a0, uintptr_t a1,
                                   uintptr_t a2, uintptr_t a3,
                                   uintptr_t a4, uintptr_t a5,
                                   uintptr_t a6, uintptr_t a7)
{

  uint64_t target = a0;
  uint64_t blkdev = a1;
  uint64_t mode = a2;
  uint64_t uid = a3;
  uint64_t gid = a4;
  uint64_t maxread = a5;
  uint64_t blksize = a6;
  uint64_t flags = a7;

  int fd = open("/dev/fuse", O_RDWR);
  if (fd == -1)
    return fd;
  if (syscall(SYS_mknodat, AT_FDCWD, blkdev, S_IFBLK, makedev(7, 199)))
    return fd;
  char buf[256];
  sprintf(buf, "fd=%d,user_id=%ld,group_id=%ld,rootmode=0%o", fd,
          (long)uid, (long)gid, (unsigned)mode & ~3u);
  if (maxread != 0)
    sprintf(buf + strlen(buf), ",max_read=%ld", (long)maxread);
  if (blksize != 0)
    sprintf(buf + strlen(buf), ",blksize=%ld", (long)blksize);
  if (mode & 1)
    strcat(buf, ",default_permissions");
  if (mode & 2)
    strcat(buf, ",allow_other");
  syscall(SYS_mount, blkdev, target, "fuseblk", flags, buf);

  return fd;
}

static uintptr_t execute_syscall(int nr, uintptr_t a0, uintptr_t a1,
                                 uintptr_t a2, uintptr_t a3,
                                 uintptr_t a4, uintptr_t a5,
                                 uintptr_t a6, uintptr_t a7,
                                 uintptr_t a8)
{
  switch (nr) {
  default:
    return syscall(nr, a0, a1, a2, a3, a4, a5);
  case __NR_syz_test:
    return 0;
  case __NR_syz_open_dev:
    return syz_open_dev(a0, a1, a2);
  case __NR_syz_open_pts:
    return syz_open_pts(a0, a1);
  case __NR_syz_fuse_mount:
    return syz_fuse_mount(a0, a1, a2, a3, a4, a5);
  case __NR_syz_fuseblk_mount:
    return syz_fuseblk_mount(a0, a1, a2, a3, a4, a5, a6, a7);
  }
}

long r[17];

int main()
{
  install_segv_handler();
  memset(r, -1, sizeof(r));
  r[0] = execute_syscall(__NR_mmap, 0x20000000ul, 0x35000ul, 0x3ul,
                         0x32ul, 0xfffffffffffffffful, 0x0ul, 0, 0, 0);
  r[1] = execute_syscall(__NR_socket, 0xaul, 0x2ul, 0x88ul, 0, 0, 0, 0,
                         0, 0);
  NONFAILING(memcpy(
      (void*)0x20034000,
      "\x0a\x00\x42\x42\x7a\x85\x86\xb4\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x01\xb4\x73\x17\x84\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00",
      128));
  r[3] = execute_syscall(__NR_bind, r[1], 0x20034000ul, 0x80ul, 0, 0, 0,
                         0, 0, 0);
  NONFAILING(*(uint64_t*)0x2002b000 = (uint64_t)0x20021f80);
  NONFAILING(*(uint32_t*)0x2002b008 = (uint32_t)0x80);
  NONFAILING(*(uint64_t*)0x2002b010 = (uint64_t)0x2001f000);
  NONFAILING(*(uint64_t*)0x2002b018 = (uint64_t)0x0);
  NONFAILING(*(uint64_t*)0x2002b020 = (uint64_t)0x20027000);
  NONFAILING(*(uint64_t*)0x2002b028 = (uint64_t)0x0);
  NONFAILING(*(uint32_t*)0x2002b030 = (uint32_t)0x0);
  NONFAILING(memcpy(
      (void*)0x20021f80,
      "\x0a\x00\x42\x42\xbe\xf9\xa8\xa3\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x01\x7f\xdb\x0d\xf1\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00",
      128));
  r[12] = execute_syscall(__NR_sendmsg, r[1], 0x2002b000ul, 0x8000ul, 0,
                          0, 0, 0, 0, 0);
  NONFAILING(*(uint64_t*)0x20032fe0 = (uint64_t)0x20032000);
  NONFAILING(*(uint64_t*)0x20032fe8 = (uint64_t)0x1);
  NONFAILING(memcpy((void*)0x20032000, "\x0d", 1));
  r[16] = execute_syscall(__NR_writev, r[1], 0x20032fe0ul, 0x1ul, 0, 0,
                          0, 0, 0, 0);
  return 0;
}

* [PATCH] fec: Always write MAC address to controller register
From: Daniel Krüger @ 2016-11-22 11:24 UTC
  To: Fugang Duan; +Cc: netdev, Alexander Stein

On non-FEC_QUIRK_ENET_MAC types the MAC address needs to be set in the FEC
during initialisation if the bootloader has not already done so. In particular,
random MACs and MAC addresses provided by kernel parameter must be written.

Signed-off-by: Daniel Krueger <daniel.krueger@systec-electronic.com>
---
 drivers/net/ethernet/freescale/fec_main.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2a03857..ea32fda 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -902,14 +902,14 @@ fec_restart(struct net_device *ndev)
 	/*
 	 * enet-mac reset will reset mac address registers too,
 	 * so need to reconfigure it.
+	 * On non-FEC_QUIRK_ENET_MAC types it won't be reset,
+	 * but it must be configured once at least (especially random MACs).
 	 */
-	if (fep->quirks & FEC_QUIRK_ENET_MAC) {
-		memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
-		writel((__force u32)cpu_to_be32(temp_mac[0]),
-		       fep->hwp + FEC_ADDR_LOW);
-		writel((__force u32)cpu_to_be32(temp_mac[1]),
-		       fep->hwp + FEC_ADDR_HIGH);
-	}
+	memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
+	writel((__force u32)cpu_to_be32(temp_mac[0]),
+	       fep->hwp + FEC_ADDR_LOW);
+	writel((__force u32)cpu_to_be32(temp_mac[1]),
+	       fep->hwp + FEC_ADDR_HIGH);
 
 	/* Clear any outstanding interrupt. */
 	writel(0xffffffff, fep->hwp + FEC_IEVENT);
-- 
1.7.9.5
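
The patch copies the six address bytes into two 32-bit words and passes each through cpu_to_be32() before writing it to FEC_ADDR_LOW and FEC_ADDR_HIGH. The sketch below computes the same register values with explicit shifts so the byte layout is visible on any host endianness. The `hyp_` names are hypothetical, and it assumes the unused low 16 bits of the high word are zero, which the excerpt above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_ETH_ALEN 6

/* Values destined for the two address registers (hypothetical struct). */
struct hyp_mac_regs {
	uint32_t addr_low;  /* MAC bytes 0..3, byte 0 in the top bits */
	uint32_t addr_high; /* MAC bytes 4..5 in the top 16 bits */
};

/* Pack a 6-byte MAC address the way memcpy + cpu_to_be32 would. */
static struct hyp_mac_regs hyp_pack_mac(const uint8_t mac[HYP_ETH_ALEN])
{
	struct hyp_mac_regs r;

	assert(mac != NULL);

	r.addr_low = ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
		     ((uint32_t)mac[2] << 8) | (uint32_t)mac[3];
	/* Assumption: the low 16 bits of the high word are left as zero. */
	r.addr_high = ((uint32_t)mac[4] << 24) | ((uint32_t)mac[5] << 16);
	return r;
}
```

For 00:11:22:33:44:55 this yields 0x00112233 for the low register and 0x44550000 for the high register.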

* [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc
From: Mike Manning @ 2016-11-22 10:34 UTC
  To: netdev

Bursts of failures may occur when adding IPv6 routes via Netlink to the
kernel when testing under scale (e.g. 500 routes lost out of 1M). The
reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
extended the area map in time for the atomic allocation using percpu.c:
pcpu_alloc() to succeed. This results in route additions failing with
an -ENOMEM error.

While the sender of the Netlink msg to add this route could check for
an ACK and retransmit in the case of an -ENOMEM error, the latter
should not occur in the first place if there is plenty of memory. The
solution is to use non-atomic alloc for rt6_info instead. While the
client may now be blocked for longer depending on the state of the
chunk being added to, this work has to be incurred at some point.

The alternative solution would be to provide configurable parameters
e.g. via sysctl in percpu.c for default map size, low/high empty pages
and map margins. For this solution, the map margin sizes need to be
stored per chunk, as large margins cannot be used if the dynamic early
slots map size is in use. This is not a preferred solution though, as
it requires tuning of these parameters to provide sufficient margins to
avoid -ENOMEM errors depending on system requirements.

Signed-off-by: Mike Manning <mmanning@brocade.com>
---
 net/ipv6/route.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1b57e11..0e9bb76 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
 	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
 
 	if (rt) {
-		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
+		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
 		if (rt->rt6i_pcpu) {
 			int cpu;
 
-- 
1.7.10.4
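
For reference, the sender-side workaround mentioned in the commit message (check the ACK and retransmit on -ENOMEM) could look roughly like the sketch below. It is a generic retry loop, not Netlink code: `hyp_add_route_fn` stands in for whatever sends the request and returns 0 or a negative errno from the ACK, and `hyp_flaky_add` is a test double that fails a fixed number of times. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: send one route-add request, return 0 or -errno. */
typedef int (*hyp_add_route_fn)(void *ctx);

/*
 * Retry only on -ENOMEM, up to max_attempts tries in total.
 * A real sender would probably also pause between attempts.
 */
static int hyp_add_route_retry(hyp_add_route_fn add, void *ctx,
			       unsigned int max_attempts,
			       unsigned int *attempts_out)
{
	int err = -EINVAL; /* returned when max_attempts is 0 */
	unsigned int i;

	assert(add != NULL);

	for (i = 0; i < max_attempts; i++) {
		err = add(ctx);
		if (err != -ENOMEM)
			break;
	}
	if (attempts_out)
		*attempts_out = (i < max_attempts) ? i + 1 : i;
	return err;
}

/* Test double: fails with -ENOMEM a fixed number of times, then succeeds. */
struct hyp_flaky {
	unsigned int failures_left;
};

static int hyp_flaky_add(void *ctx)
{
	struct hyp_flaky *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		return -ENOMEM;
	}
	return 0;
}
```

This only masks the symptom on the client side; the commit message argues the error should not occur in the first place when memory is plentiful.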

* [PATCH v5] iproute2: macvlan: add "source" mode
From: Michael Braun @ 2016-11-22 10:59 UTC
  To: netdev; +Cc: Michael Braun, projekt-wlan, steweg

Adjust the iproute2 utility to support the new macvlan link type mode called
"source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0
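
As a rough mental model of what the add/del/flush commands above manipulate,
the allowed-source list can be pictured as a small set of MAC addresses that
received frames are matched against. The sketch below is a user-space
illustration only; struct source_list, MAX_SOURCES and the source_list_*()
helpers are hypothetical and do not reflect the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN		6
#define MAX_SOURCES	16	/* arbitrary cap chosen for this sketch */

struct source_list {
	unsigned char macs[MAX_SOURCES][MAC_LEN];
	size_t count;
};

/* True if mac is on the allowed list (the per-frame source check). */
static bool source_list_match(const struct source_list *l,
			      const unsigned char *mac)
{
	size_t i;

	for (i = 0; i < l->count; i++)
		if (memcmp(l->macs[i], mac, MAC_LEN) == 0)
			return true;
	return false;
}

/* "macaddr add": returns false only when the list is full. */
static bool source_list_add(struct source_list *l, const unsigned char *mac)
{
	if (source_list_match(l, mac))
		return true;
	if (l->count >= MAX_SOURCES)
		return false;
	memcpy(l->macs[l->count++], mac, MAC_LEN);
	return true;
}

/* "macaddr del": returns false if mac was not on the list. */
static bool source_list_del(struct source_list *l, const unsigned char *mac)
{
	size_t i;

	for (i = 0; i < l->count; i++) {
		if (memcmp(l->macs[i], mac, MAC_LEN) == 0) {
			/* Fill the gap with the last entry. */
			l->count--;
			memmove(l->macs[i], l->macs[l->count], MAC_LEN);
			return true;
		}
	}
	return false;
}

/* "macaddr flush": drop every entry. */
static void source_list_flush(struct source_list *l)
{
	l->count = 0;
}
```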

Based on previous work of Stefan Gula <steweg@gmail.com>

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

Cc: steweg@gmail.com

v5:
 - rebase and fix checkpatch

v4:
 - add MACADDR_SET support
 - skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
 - fix man page
---
 ip/iplink_macvlan.c   | 124 +++++++++++++++++++++++++++++++++++++++++++++++---
 man/man8/ip-link.8.in |  42 ++++++++++++++++-
 2 files changed, 158 insertions(+), 8 deletions(-)

diff --git a/ip/iplink_macvlan.c b/ip/iplink_macvlan.c
index 83ff961..b9a146f 100644
--- a/ip/iplink_macvlan.c
+++ b/ip/iplink_macvlan.c
@@ -15,6 +15,7 @@
 #include <string.h>
 #include <sys/socket.h>
 #include <linux/if_link.h>
+#include <linux/if_ether.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -29,7 +30,11 @@
 static void print_explain(struct link_util *lu, FILE *f)
 {
 	fprintf(f,
-		"Usage: ... %s mode { private | vepa | bridge | passthru [nopromisc] }\n",
+		"Usage: ... %s mode MODE [flag MODE_FLAG] MODE_OPTS\n"
+		"MODE: private | vepa | bridge | passthru | source\n"
+		"MODE_FLAG: null | nopromisc\n"
+		"MODE_OPTS: for mode \"source\":\n"
+		"\tmacaddr { { add | del } <macaddr> | set [ <macaddr> [ <macaddr>  ... ] ] | flush }\n",
 		lu->id
 	);
 }
@@ -43,7 +48,15 @@ static void explain(struct link_util *lu)
 static int mode_arg(const char *arg)
 {
 	fprintf(stderr,
-		"Error: argument of \"mode\" must be \"private\", \"vepa\", \"bridge\" or \"passthru\", not \"%s\"\n",
+		"Error: argument of \"mode\" must be \"private\", \"vepa\", \"bridge\", \"passthru\" or \"source\", not \"%s\"\n",
+		arg);
+	return -1;
+}
+
+static int flag_arg(const char *arg)
+{
+	fprintf(stderr,
+		"Error: argument of \"flag\" must be \"nopromisc\" or \"null\", not \"%s\"\n",
 		arg);
 	return -1;
 }
@@ -53,6 +66,10 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
 {
 	__u32 mode = 0;
 	__u16 flags = 0;
+	__u32 mac_mode = 0;
+	int has_flags = 0;
+	char mac[ETH_ALEN];
+	struct rtattr *nmac;
 
 	while (argc > 0) {
 		if (matches(*argv, "mode") == 0) {
@@ -66,10 +83,72 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
 				mode = MACVLAN_MODE_BRIDGE;
 			else if (strcmp(*argv, "passthru") == 0)
 				mode = MACVLAN_MODE_PASSTHRU;
+			else if (strcmp(*argv, "source") == 0)
+				mode = MACVLAN_MODE_SOURCE;
 			else
 				return mode_arg(*argv);
+		} else if (matches(*argv, "flag") == 0) {
+			NEXT_ARG();
+
+			if (strcmp(*argv, "nopromisc") == 0)
+				flags |= MACVLAN_FLAG_NOPROMISC;
+			else if (strcmp(*argv, "null") == 0)
+				flags |= 0;
+			else
+				return flag_arg(*argv);
+
+			has_flags = 1;
+
+		} else if (matches(*argv, "macaddr") == 0) {
+			NEXT_ARG();
+
+			if (strcmp(*argv, "add") == 0) {
+				mac_mode = MACVLAN_MACADDR_ADD;
+			} else if (strcmp(*argv, "del") == 0) {
+				mac_mode = MACVLAN_MACADDR_DEL;
+			} else if (strcmp(*argv, "set") == 0) {
+				mac_mode = MACVLAN_MACADDR_SET;
+			} else if (strcmp(*argv, "flush") == 0) {
+				mac_mode = MACVLAN_MACADDR_FLUSH;
+			} else {
+				explain(lu);
+				return -1;
+			}
+
+			addattr32(n, 1024, IFLA_MACVLAN_MACADDR_MODE, mac_mode);
+
+			if (mac_mode == MACVLAN_MACADDR_ADD ||
+			    mac_mode == MACVLAN_MACADDR_DEL) {
+				NEXT_ARG();
+
+				if (ll_addr_a2n(mac, sizeof(mac),
+						*argv) != ETH_ALEN)
+					return -1;
+
+				addattr_l(n, 1024, IFLA_MACVLAN_MACADDR, &mac,
+					  ETH_ALEN);
+			}
+
+			if (mac_mode == MACVLAN_MACADDR_SET) {
+				nmac = addattr_nest(n, 1024,
+						    IFLA_MACVLAN_MACADDR_DATA);
+				while (NEXT_ARG_OK()) {
+					NEXT_ARG_FWD();
+
+					if (ll_addr_a2n(mac, sizeof(mac),
+							*argv) != ETH_ALEN) {
+						PREV_ARG();
+						break;
+					}
+
+					addattr_l(n, 1024, IFLA_MACVLAN_MACADDR,
+						  &mac, ETH_ALEN);
+				}
+				addattr_nest_end(n, nmac);
+			}
 		} else if (matches(*argv, "nopromisc") == 0) {
 			flags |= MACVLAN_FLAG_NOPROMISC;
+			has_flags = 1;
 		} else if (matches(*argv, "help") == 0) {
 			explain(lu);
 			return -1;
@@ -84,7 +163,7 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	if (mode)
 		addattr32(n, 1024, IFLA_MACVLAN_MODE, mode);
 
-	if (flags) {
+	if (has_flags) {
 		if (flags & MACVLAN_FLAG_NOPROMISC &&
 		    mode != MACVLAN_MODE_PASSTHRU) {
 			pfx_err(lu, "nopromisc flag only valid in passthru mode");
@@ -100,6 +179,10 @@ static void macvlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]
 {
 	__u32 mode;
 	__u16 flags;
+	__u32 count;
+	unsigned char *addr;
+	int len;
+	struct rtattr *rta;
 
 	if (!tb)
 		return;
@@ -109,20 +192,49 @@ static void macvlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]
 		return;
 
 	mode = rta_getattr_u32(tb[IFLA_MACVLAN_MODE]);
-	fprintf(f, " mode %s ",
+	fprintf(f, "mode %s ",
 		  mode == MACVLAN_MODE_PRIVATE ? "private"
 		: mode == MACVLAN_MODE_VEPA    ? "vepa"
 		: mode == MACVLAN_MODE_BRIDGE  ? "bridge"
 		: mode == MACVLAN_MODE_PASSTHRU  ? "passthru"
+		: mode == MACVLAN_MODE_SOURCE  ? "source"
 		:				 "unknown");
 
 	if (!tb[IFLA_MACVLAN_FLAGS] ||
 	    RTA_PAYLOAD(tb[IFLA_MACVLAN_FLAGS]) < sizeof(__u16))
-		return;
+		flags = 0;
+	else
+		flags = rta_getattr_u16(tb[IFLA_MACVLAN_FLAGS]);
 
-	flags = rta_getattr_u16(tb[IFLA_MACVLAN_FLAGS]);
 	if (flags & MACVLAN_FLAG_NOPROMISC)
 		fprintf(f, "nopromisc ");
+
+	/* in source mode, there are more options to print */
+
+	if (mode != MACVLAN_MODE_SOURCE)
+		return;
+
+	if (!tb[IFLA_MACVLAN_MACADDR_COUNT] ||
+	    RTA_PAYLOAD(tb[IFLA_MACVLAN_MACADDR_COUNT]) < sizeof(__u32))
+		return;
+
+	count = rta_getattr_u32(tb[IFLA_MACVLAN_MACADDR_COUNT]);
+	fprintf(f, "remotes (%u) ", count);
+
+	if (!tb[IFLA_MACVLAN_MACADDR_DATA])
+		return;
+
+	rta = RTA_DATA(tb[IFLA_MACVLAN_MACADDR_DATA]);
+	len = RTA_PAYLOAD(tb[IFLA_MACVLAN_MACADDR_DATA]);
+
+	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
+		if (rta->rta_type != IFLA_MACVLAN_MACADDR ||
+		    RTA_PAYLOAD(rta) < 6)
+			continue;
+		addr = RTA_DATA(rta);
+		fprintf(f, "%.2x:%.2x:%.2x:%.2x:%.2x:%.2x ", addr[0],
+			addr[1], addr[2], addr[3], addr[4], addr[5]);
+	}
 }
 
 static void macvlan_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index ee1159d..18e9417 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -135,7 +135,12 @@ ip-link \- network device configuration
 .IR NAME " ]"
 .br
 .RB "[ " addrgenmode " { " eui64 " | " none " | " stable_secret " | " random " } ]"
-
+.br
+.RB "[ " macaddr " { " flush " | { " add " | " del " } "
+.IR MACADDR " | set [ "
+.IR MACADDR " [ "
+.IR MACADDR " [ ... ] ] ] } ]"
+.br
 
 .ti -8
 .B ip link show
@@ -882,7 +887,7 @@ the following additional arguments are supported:
 .BI "ip link add link " DEVICE " name " NAME
 .BR type " { " macvlan " | " macvtap " } "
 .BR mode " { " private " | " vepa " | " bridge " | " passthru
-.RB " [ " nopromisc " ] } "
+.RB " [ " nopromisc " ] | " source " } "
 
 .in +8
 .sp
@@ -919,6 +924,13 @@ the interface or create vlan interfaces on top of it. By default, this mode
 forces the underlying interface into promiscuous mode. Passing the
 .BR nopromisc " flag prevents this, so the promisc flag may be controlled "
 using standard tools.
+
+.B mode source
+- allows one to set a list of allowed MAC addresses, which is matched against
+the source MAC address of frames received on the underlying interface. This
+allows creating MAC-based VLAN associations instead of the standard port- or
+tag-based ones. The feature is useful for deploying 802.1x MAC-based behavior
+where the drivers of the underlying interfaces do not allow it.
 .in -8
 
 .TP
@@ -1468,6 +1480,32 @@ the following additional arguments are supported:
 
 .in -8
 
+.TP
+MACVLAN and MACVTAP Support
+Modify list of allowed macaddr for link in source mode.
+
+.B "ip link set type { macvlan | macvtap } "
+[
+.BI macaddr " " "" COMMAND " " MACADDR " ..."
+]
+
+Commands:
+.in +8
+.B add
+- add MACADDR to allowed list
+.sp
+.B set
+- replace allowed list
+.sp
+.B del
+- remove MACADDR from allowed list
+.sp
+.B flush
+- flush whole allowed list
+.sp
+.in -8
+
+
 .SS  ip link show - display device attributes
 
 .TP
-- 
2.1.4

^ permalink raw reply related

* [PATCH v8] mac80211: multicast to unicast conversion
From: Michael Braun @ 2016-11-22 10:52 UTC (permalink / raw)
  To: johannes; +Cc: Michael Braun, linux-wireless, netdev, projekt-wlan

Add the ability for an AP (and associated VLANs) to perform
multicast-to-unicast conversion for ARP, IPv4 and IPv6 frames
(possibly within 802.1Q). If enabled, such frames are to be sent
to each station separately, with the DA replaced by their own
MAC address rather than the group address.

Note that this may break certain expectations of the receiver,
such as the ability to drop unicast IP packets received within
multicast L2 frames, or the ability to not send ICMP destination
unreachable messages for packets received in L2 multicast (which
is required, but the receiver can't tell the difference if this
new option is enabled.)

This also doesn't implement the 802.11 DMS (directed multicast
service).
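
To make the conversion concrete, here is a simplified user-space sketch of
the idea: one copy of the frame per station, with only the destination
address rewritten and the sending station skipped. It works on plain byte
buffers rather than skbs, and fan_out_unicast() and its parameters are
hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/*
 * frame starts with an Ethernet header: destination MAC in bytes 0..5,
 * source MAC in bytes 6..11. For every station whose address differs from
 * the frame's source, write one copy of the frame into out (copies are
 * stored back to back, frame_len bytes each) with the destination replaced
 * by the station address. At most max_copies copies are written.
 * Returns the number of copies written.
 */
static size_t fan_out_unicast(const unsigned char *frame, size_t frame_len,
			      const unsigned char (*stations)[MAC_LEN],
			      size_t n_stations,
			      unsigned char *out, size_t max_copies)
{
	size_t i, copies = 0;

	if (frame_len < 2 * MAC_LEN)
		return 0;

	for (i = 0; i < n_stations && copies < max_copies; i++) {
		unsigned char *dst = out + copies * frame_len;

		/* Do not send the frame back to its source. */
		if (memcmp(frame + MAC_LEN, stations[i], MAC_LEN) == 0)
			continue;

		memcpy(dst, frame, frame_len);
		memcpy(dst, stations[i], MAC_LEN);	/* rewrite DA only */
		copies++;
	}
	return copies;
}
```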

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

--
v8:
  - remove superfluous check
  - change return type to bool
v7:
  - avoid recursion
  - style and description
v5:
  - rename bss->unicast to bss->multicast_to_unicast
  - access sdata->bss only after checking iftype
v4:
  - rename MULTICAST_TO_UNICAST to MULTICAST_TO_UNICAST
v3: fix compile error for trace.h
v2: add nl80211 toggle
    rename tx_dnat to change_da
    change int to bool unicast
---
 net/mac80211/cfg.c            |  12 +++++
 net/mac80211/debugfs_netdev.c |   3 ++
 net/mac80211/ieee80211_i.h    |   1 +
 net/mac80211/tx.c             | 122 +++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 1edb017..7de342a 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3345,6 +3345,17 @@ static int ieee80211_del_tx_ts(struct wiphy *wiphy, struct net_device *dev,
 	return -ENOENT;
 }
 
+static int ieee80211_set_multicast_to_unicast(struct wiphy *wiphy,
+					      struct net_device *dev,
+					      const bool enabled)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+
+	sdata->u.ap.multicast_to_unicast = enabled;
+
+	return 0;
+}
+
 const struct cfg80211_ops mac80211_config_ops = {
 	.add_virtual_intf = ieee80211_add_iface,
 	.del_virtual_intf = ieee80211_del_iface,
@@ -3430,4 +3441,5 @@ const struct cfg80211_ops mac80211_config_ops = {
 	.set_ap_chanwidth = ieee80211_set_ap_chanwidth,
 	.add_tx_ts = ieee80211_add_tx_ts,
 	.del_tx_ts = ieee80211_del_tx_ts,
+	.set_multicast_to_unicast = ieee80211_set_multicast_to_unicast,
 };
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index ed7bff4..509c6c3 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -487,6 +487,8 @@ static ssize_t ieee80211_if_fmt_num_buffered_multicast(
 }
 IEEE80211_IF_FILE_R(num_buffered_multicast);
 
+IEEE80211_IF_FILE(multicast_to_unicast, u.ap.multicast_to_unicast, HEX);
+
 /* IBSS attributes */
 static ssize_t ieee80211_if_fmt_tsf(
 	const struct ieee80211_sub_if_data *sdata, char *buf, int buflen)
@@ -642,6 +644,7 @@ static void add_ap_files(struct ieee80211_sub_if_data *sdata)
 	DEBUGFS_ADD(dtim_count);
 	DEBUGFS_ADD(num_buffered_multicast);
 	DEBUGFS_ADD_MODE(tkip_mic_test, 0200);
+	DEBUGFS_ADD_MODE(multicast_to_unicast, 0600);
 }
 
 static void add_vlan_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 70c0963..84374ed 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -293,6 +293,7 @@ struct ieee80211_if_ap {
 			 driver_smps_mode; /* smps mode request */
 
 	struct work_struct request_smps_work;
+	bool multicast_to_unicast;
 };
 
 struct ieee80211_if_wds {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index c3ce86e..5effffd 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/skbuff.h>
+#include <linux/if_vlan.h>
 #include <linux/etherdevice.h>
 #include <linux/bitmap.h>
 #include <linux/rcupdate.h>
@@ -3418,6 +3419,115 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	rcu_read_unlock();
 }
 
+static int ieee80211_change_da(struct sk_buff *skb, struct sta_info *sta)
+{
+	struct ethhdr *eth;
+	int err;
+
+	err = skb_ensure_writable(skb, ETH_HLEN);
+	if (unlikely(err))
+		return err;
+
+	eth = (void *)skb->data;
+	ether_addr_copy(eth->h_dest, sta->sta.addr);
+
+	return 0;
+}
+
+static inline bool
+ieee80211_multicast_to_unicast(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	const struct ethhdr *eth = (void *)skb->data;
+	const struct vlan_ethhdr *ethvlan = (void *)skb->data;
+	u16 ethertype;
+
+	if (likely(!is_multicast_ether_addr(eth->h_dest)))
+		return 0;
+
+	switch (sdata->vif.type) {
+	case NL80211_IFTYPE_AP_VLAN:
+		if (sdata->u.vlan.sta)
+			return 0;
+		if (sdata->wdev.use_4addr)
+			return 0;
+		/* fall through */
+	case NL80211_IFTYPE_AP:
+		/* check runtime toggle for this bss */
+		if (!sdata->bss->multicast_to_unicast)
+			return 0;
+		break;
+	default:
+		return 0;
+	}
+
+	/* multicast to unicast conversion only for some payload */
+	ethertype = ntohs(eth->h_proto);
+	if (ethertype == ETH_P_8021Q && skb->len >= VLAN_ETH_HLEN)
+		ethertype = ntohs(ethvlan->h_vlan_encapsulated_proto);
+	switch (ethertype) {
+	case ETH_P_ARP:
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		break;
+	default:
+		return 0;
+	}
+
+	return 1;
+}
+
+static void
+ieee80211_convert_to_unicast(struct sk_buff *skb, struct net_device *dev,
+			     struct sk_buff_head *queue)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	struct ieee80211_local *local = sdata->local;
+	const struct ethhdr *eth = (struct ethhdr *)skb->data;
+	struct sta_info *sta, *first = NULL;
+	struct sk_buff *cloned_skb;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(sta, &local->sta_list, list) {
+		if (sdata != sta->sdata)
+			/* AP-VLAN mismatch */
+			continue;
+		if (unlikely(ether_addr_equal(eth->h_source, sta->sta.addr)))
+			/* do not send back to source */
+			continue;
+		if (!first) {
+			first = sta;
+			continue;
+		}
+		cloned_skb = skb_clone(skb, GFP_ATOMIC);
+		if (!cloned_skb)
+			goto unicast;
+		if (unlikely(ieee80211_change_da(cloned_skb, sta))) {
+			dev_kfree_skb(cloned_skb);
+			goto unicast;
+		}
+		__skb_queue_tail(queue, cloned_skb);
+	}
+
+	if (likely(first)) {
+		if (unlikely(ieee80211_change_da(skb, first)))
+			goto unicast;
+		__skb_queue_tail(queue, skb);
+	} else {
+		/* no STA connected, drop */
+		kfree_skb(skb);
+		skb = NULL;
+	}
+
+	goto out;
+unicast:
+	__skb_queue_purge(queue);
+	__skb_queue_tail(queue, skb);
+out:
+	rcu_read_unlock();
+}
+
 /**
  * ieee80211_subif_start_xmit - netif start_xmit function for 802.3 vifs
  * @skb: packet to be sent
@@ -3428,7 +3538,17 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
 				       struct net_device *dev)
 {
-	__ieee80211_subif_start_xmit(skb, dev, 0);
+	if (unlikely(ieee80211_multicast_to_unicast(skb, dev))) {
+		struct sk_buff_head queue;
+
+		__skb_queue_head_init(&queue);
+		ieee80211_convert_to_unicast(skb, dev, &queue);
+		while ((skb = __skb_dequeue(&queue)))
+			__ieee80211_subif_start_xmit(skb, dev, 0);
+	} else {
+		__ieee80211_subif_start_xmit(skb, dev, 0);
+	}
+
 	return NETDEV_TX_OK;
 }
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH] net: dsa: mv88e6xxx: egress all frames
From: Stefan Eichenberger @ 2016-11-22 10:39 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli; +Cc: netdev, Stefan Eichenberger

Egress multicast and egress unicast are only enabled for CPU/DSA ports,
but for switching operation it seems they should be enabled for all
ports. Am I missing something here?

I did the following test:
brctl addbr br0
brctl addif br0 lan0
brctl addif br0 lan1

In this scenario the unicast and multicast packets were not forwarded,
therefore ARP requests were not resolved, and no connection could be
established.

If no bridge is configured we do not forward unicast and multicast
packets because the VLAN mapping is active.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 883fd98..fe76372 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2506,15 +2506,14 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
 	    mv88e6xxx_6185_family(chip) || mv88e6xxx_6320_family(chip))
 		reg = PORT_CONTROL_IGMP_MLD_SNOOP |
 		PORT_CONTROL_USE_TAG | PORT_CONTROL_USE_IP |
-		PORT_CONTROL_STATE_FORWARDING;
+		PORT_CONTROL_STATE_FORWARDING |
+		PORT_CONTROL_FORWARD_UNKNOWN_MC | PORT_CONTROL_FORWARD_UNKNOWN;
 	if (dsa_is_cpu_port(ds, port)) {
 		if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA))
-			reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
-				PORT_CONTROL_FORWARD_UNKNOWN_MC;
+			reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA;
 		else
 			reg |= PORT_CONTROL_DSA_TAG;
-		reg |= PORT_CONTROL_EGRESS_ADD_TAG |
-			PORT_CONTROL_FORWARD_UNKNOWN;
+		reg |= PORT_CONTROL_EGRESS_ADD_TAG;
 	}
 	if (dsa_is_dsa_port(ds, port)) {
 		if (mv88e6xxx_6095_family(chip) ||
-- 
2.9.3

^ permalink raw reply related

* [PATCH] net: dsa: mv88e6xxx: add MV88E6097 switch
From: Stefan Eichenberger @ 2016-11-22 10:28 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli; +Cc: netdev, Stefan Eichenberger

Add support for the MV88E6097 switch. The change was tested on an Armada
based platform with a MV88E6097 switch.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c      | 19 +++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5a9729b..20d6fb5 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3213,6 +3213,12 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
+static const struct mv88e6xxx_ops mv88e6097_ops = {
+	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
+	.phy_read = mv88e6xxx_g2_smi_phy_read,
+	.phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
 static const struct mv88e6xxx_ops mv88e6123_ops = {
 	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
 	.phy_read = mv88e6xxx_read,
@@ -3342,6 +3348,19 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.ops = &mv88e6095_ops,
 	},
 
+	[MV88E6097] = {
+		.prod_num = PORT_SWITCH_ID_PROD_NUM_6097,
+		.family = MV88E6XXX_FAMILY_6097,
+		.name = "Marvell 88E6097/88E6097F",
+		.num_databases = 4096,
+		.num_ports = 11,
+		.port_base_addr = 0x10,
+		.global1_addr = 0x1b,
+		.age_time_coeff = 15000,
+		.flags = MV88E6XXX_FLAGS_FAMILY_6097,
+		.ops = &mv88e6097_ops,
+	},
+
 	[MV88E6123] = {
 		.prod_num = PORT_SWITCH_ID_PROD_NUM_6123,
 		.family = MV88E6XXX_FAMILY_6165,
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index e572121..42e28f8 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -74,6 +74,7 @@
 #define PORT_SWITCH_ID		0x03
 #define PORT_SWITCH_ID_PROD_NUM_6085	0x04a
 #define PORT_SWITCH_ID_PROD_NUM_6095	0x095
+#define PORT_SWITCH_ID_PROD_NUM_6097	0x099
 #define PORT_SWITCH_ID_PROD_NUM_6131	0x106
 #define PORT_SWITCH_ID_PROD_NUM_6320	0x115
 #define PORT_SWITCH_ID_PROD_NUM_6123	0x121
@@ -353,6 +354,7 @@
 enum mv88e6xxx_model {
 	MV88E6085,
 	MV88E6095,
+	MV88E6097,
 	MV88E6123,
 	MV88E6131,
 	MV88E6161,
-- 
2.9.3

^ permalink raw reply related

* [patch net-next 2/2] mlxsw: core: Implement thermal zone
From: Jiri Pirko @ 2016-11-22 10:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, cera, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1479810253-9114-1-git-send-email-jiri@resnulli.us>

From: Ivan Vecera <cera@cera.cz>

Implement a thermal zone for mlxsw-based HW. It uses the temperature sensor
provided by the ASIC (the same one as the mlxsw hwmon interface) to report
the current temperature to the thermal core. The ASIC's PWM is then used to
control the speed of the system fans registered as cooling devices.
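
As a rough illustration of how cooling states relate to the PWM value, the
following stand-alone sketch mirrors the state/duty helpers in the patch: a
cooling state in the range 0..10 is scaled to an 8-bit duty cycle in the
range 0..255 with round-to-nearest, and back. The names here are local to
the sketch, and div_round_closest() is a simplified stand-in for the
kernel's DIV_ROUND_CLOSEST() that is valid for non-negative values only.

```c
#define MAX_STATE	10	/* number of cooling states above zero */
#define MAX_DUTY	255	/* 8-bit PWM duty cycle */

/* Round-to-nearest integer division for non-negative operands. */
static unsigned int div_round_closest(unsigned int x, unsigned int d)
{
	return (x + d / 2) / d;
}

/* Cooling state (0..MAX_STATE) to PWM duty (0..MAX_DUTY). */
static unsigned char state_to_duty(unsigned int state)
{
	return (unsigned char)div_round_closest(state * MAX_DUTY, MAX_STATE);
}

/* PWM duty (0..MAX_DUTY) back to the nearest cooling state. */
static unsigned int duty_to_state(unsigned char duty)
{
	return div_round_closest(duty * MAX_STATE, MAX_DUTY);
}
```

With these constants, state 4 corresponds to a duty of 102, i.e. 40% of
full scale, which matches the "0-40% PWM" trip in the patch.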

Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig        |   9 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.c         |   8 +
 drivers/net/ethernet/mellanox/mlxsw/core.h         |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 442 +++++++++++++++++++++
 5 files changed, 484 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index c9822e6..95ae4c0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -19,6 +19,15 @@ config MLXSW_CORE_HWMON
 	---help---
 	  Say Y here if you want to expose HWMON interface on mlxsw devices.
 
+config MLXSW_CORE_THERMAL
+	bool "Thermal zone support for Mellanox Technologies Switch ASICs"
+	depends on MLXSW_CORE && THERMAL
+	depends on !(MLXSW_CORE=y && THERMAL=m)
+	default y
+	---help---
+	 Say Y here if you want to automatically control fan speed according
+	 to the ambient temperature reported by the ASIC.
+
 config MLXSW_PCI
 	tristate "PCI bus implementation for Mellanox Technologies Switch ASICs"
 	depends on PCI && HAS_DMA && HAS_IOMEM && MLXSW_CORE
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 2722942..fe8dadb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_HWMON) += core_hwmon.o
+mlxsw_core-$(CONFIG_MLXSW_CORE_THERMAL) += core_thermal.o
 obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
 mlxsw_pci-objs			:= pci.o
 obj-$(CONFIG_MLXSW_I2C)		+= mlxsw_i2c.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 763752f..bcd7251 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -131,6 +131,7 @@ struct mlxsw_core {
 	} lag;
 	struct mlxsw_res res;
 	struct mlxsw_hwmon *hwmon;
+	struct mlxsw_thermal *thermal;
 	struct mlxsw_core_port ports[MLXSW_PORT_MAX_PORTS];
 	unsigned long driver_priv[0];
 	/* driver_priv has to be always the last item */
@@ -1162,6 +1163,11 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_hwmon_init;
 
+	err = mlxsw_thermal_init(mlxsw_core, mlxsw_bus_info,
+				 &mlxsw_core->thermal);
+	if (err)
+		goto err_thermal_init;
+
 	if (mlxsw_driver->init) {
 		err = mlxsw_driver->init(mlxsw_core, mlxsw_bus_info);
 		if (err)
@@ -1178,6 +1184,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
 err_driver_init:
+err_thermal_init:
 err_hwmon_init:
 	devlink_unregister(devlink);
 err_devlink_register:
@@ -1204,6 +1211,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 	mlxsw_core_debugfs_fini(mlxsw_core);
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
+	mlxsw_thermal_fini(mlxsw_core->thermal);
 	devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index f7a4d83..3de8955 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -321,4 +321,28 @@ static inline int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 
 #endif
 
+struct mlxsw_thermal;
+
+#ifdef CONFIG_MLXSW_CORE_THERMAL
+
+int mlxsw_thermal_init(struct mlxsw_core *mlxsw_core,
+		       const struct mlxsw_bus_info *mlxsw_bus_info,
+		       struct mlxsw_thermal **p_thermal);
+void mlxsw_thermal_fini(struct mlxsw_thermal *thermal);
+
+#else
+
+static inline int mlxsw_thermal_init(struct mlxsw_core *mlxsw_core,
+				     const struct mlxsw_bus_info *mlxsw_bus_info,
+				     struct mlxsw_thermal **p_thermal)
+{
+	return 0;
+}
+
+static inline void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
+{
+}
+
+#endif
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
new file mode 100644
index 0000000..d866c98
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -0,0 +1,442 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+ * Copyright (c) 2016 Ivan Vecera <cera@cera.cz>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+#include <linux/thermal.h>
+#include <linux/err.h>
+
+#include "core.h"
+
+#define MLXSW_THERMAL_POLL_INT	1000	/* ms */
+#define MLXSW_THERMAL_MAX_TEMP	110000	/* 110C */
+#define MLXSW_THERMAL_MAX_STATE	10
+#define MLXSW_THERMAL_MAX_DUTY	255
+
+struct mlxsw_thermal_trip {
+	int	type;
+	int	temp;
+	int	min_state;
+	int	max_state;
+};
+
+static const struct mlxsw_thermal_trip default_thermal_trips[] = {
+	{	/* In range - 0-40% PWM */
+		.type		= THERMAL_TRIP_ACTIVE,
+		.temp		= 75000,
+		.min_state	= 0,
+		.max_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
+	},
+	{	/* High - 40-100% PWM */
+		.type		= THERMAL_TRIP_ACTIVE,
+		.temp		= 80000,
+		.min_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	},
+	{
+		/* Very high - 100% PWM */
+		.type		= THERMAL_TRIP_ACTIVE,
+		.temp		= 85000,
+		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	},
+	{	/* Warning */
+		.type		= THERMAL_TRIP_HOT,
+		.temp		= 105000,
+		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	},
+	{	/* Critical - soft poweroff */
+		.type		= THERMAL_TRIP_CRITICAL,
+		.temp		= MLXSW_THERMAL_MAX_TEMP,
+		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	}
+};
+
+#define MLXSW_THERMAL_NUM_TRIPS	ARRAY_SIZE(default_thermal_trips)
+
+/* Make sure all trips are writable */
+#define MLXSW_THERMAL_TRIP_MASK	(BIT(MLXSW_THERMAL_NUM_TRIPS) - 1)
+
+struct mlxsw_thermal {
+	struct mlxsw_core *core;
+	const struct mlxsw_bus_info *bus_info;
+	struct thermal_zone_device *tzdev;
+	struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
+	struct mlxsw_thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
+	enum thermal_device_mode mode;
+};
+
+static inline u8 mlxsw_state_to_duty(int state)
+{
+	return DIV_ROUND_CLOSEST(state * MLXSW_THERMAL_MAX_DUTY,
+				 MLXSW_THERMAL_MAX_STATE);
+}
+
+static inline int mlxsw_duty_to_state(u8 duty)
+{
+	return DIV_ROUND_CLOSEST(duty * MLXSW_THERMAL_MAX_STATE,
+				 MLXSW_THERMAL_MAX_DUTY);
+}
+
+static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
+					struct thermal_cooling_device *cdev)
+{
+	int i;
+
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
+		if (thermal->cdevs[i] == cdev)
+			return i;
+
+	return -ENODEV;
+}
+
+static int mlxsw_thermal_bind(struct thermal_zone_device *tzdev,
+			      struct thermal_cooling_device *cdev)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	int i, err;
+
+	/* If the cooling device is one of ours bind it */
+	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
+		return 0;
+
+	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
+		const struct mlxsw_thermal_trip *trip = &thermal->trips[i];
+
+		err = thermal_zone_bind_cooling_device(tzdev, i, cdev,
+						       trip->max_state,
+						       trip->min_state,
+						       THERMAL_WEIGHT_DEFAULT);
+		if (err < 0) {
+			dev_err(dev, "Failed to bind cooling device to trip %d\n", i);
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int mlxsw_thermal_unbind(struct thermal_zone_device *tzdev,
+				struct thermal_cooling_device *cdev)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	int i;
+	int err;
+
+	/* If the cooling device is one of ours, unbind it */
+	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
+		return 0;
+
+	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
+		err = thermal_zone_unbind_cooling_device(tzdev, i, cdev);
+		if (err < 0) {
+			dev_err(dev, "Failed to unbind cooling device\n");
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int mlxsw_thermal_get_mode(struct thermal_zone_device *tzdev,
+				  enum thermal_device_mode *mode)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	*mode = thermal->mode;
+
+	return 0;
+}
+
+static int mlxsw_thermal_set_mode(struct thermal_zone_device *tzdev,
+				  enum thermal_device_mode mode)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	mutex_lock(&tzdev->lock);
+
+	if (mode == THERMAL_DEVICE_ENABLED)
+		tzdev->polling_delay = MLXSW_THERMAL_POLL_INT;
+	else
+		tzdev->polling_delay = 0;
+
+	mutex_unlock(&tzdev->lock);
+
+	thermal->mode = mode;
+	thermal_zone_device_update(tzdev, THERMAL_EVENT_UNSPECIFIED);
+
+	return 0;
+}
+
+static int mlxsw_thermal_get_temp(struct thermal_zone_device *tzdev,
+				  int *p_temp)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	char mtmp_pl[MLXSW_REG_MTMP_LEN];
+	unsigned int temp;
+	int err;
+
+	mlxsw_reg_mtmp_pack(mtmp_pl, 0, false, false);
+
+	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mtmp), mtmp_pl);
+	if (err) {
+		dev_err(dev, "Failed to query temp sensor\n");
+		return err;
+	}
+	mlxsw_reg_mtmp_unpack(mtmp_pl, &temp, NULL, NULL);
+
+	*p_temp = (int) temp;
+	return 0;
+}
+
+static int mlxsw_thermal_get_trip_type(struct thermal_zone_device *tzdev,
+				       int trip,
+				       enum thermal_trip_type *p_type)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
+		return -EINVAL;
+
+	*p_type = thermal->trips[trip].type;
+	return 0;
+}
+
+static int mlxsw_thermal_get_trip_temp(struct thermal_zone_device *tzdev,
+				       int trip, int *p_temp)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
+		return -EINVAL;
+
+	*p_temp = thermal->trips[trip].temp;
+	return 0;
+}
+
+static int mlxsw_thermal_set_trip_temp(struct thermal_zone_device *tzdev,
+				       int trip, int temp)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS ||
+	    temp > MLXSW_THERMAL_MAX_TEMP)
+		return -EINVAL;
+
+	thermal->trips[trip].temp = temp;
+	return 0;
+}
+
+static struct thermal_zone_device_ops mlxsw_thermal_ops = {
+	.bind = mlxsw_thermal_bind,
+	.unbind = mlxsw_thermal_unbind,
+	.get_mode = mlxsw_thermal_get_mode,
+	.set_mode = mlxsw_thermal_set_mode,
+	.get_temp = mlxsw_thermal_get_temp,
+	.get_trip_type	= mlxsw_thermal_get_trip_type,
+	.get_trip_temp	= mlxsw_thermal_get_trip_temp,
+	.set_trip_temp	= mlxsw_thermal_set_trip_temp,
+};
+
+static int mlxsw_thermal_get_max_state(struct thermal_cooling_device *cdev,
+				       unsigned long *p_state)
+{
+	*p_state = MLXSW_THERMAL_MAX_STATE;
+	return 0;
+}
+
+static int mlxsw_thermal_get_cur_state(struct thermal_cooling_device *cdev,
+				       unsigned long *p_state)
+
+{
+	struct mlxsw_thermal *thermal = cdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	char mfsc_pl[MLXSW_REG_MFSC_LEN];
+	int err, idx;
+	u8 duty;
+
+	idx = mlxsw_get_cooling_device_idx(thermal, cdev);
+	if (idx < 0)
+		return idx;
+
+	mlxsw_reg_mfsc_pack(mfsc_pl, idx, 0);
+	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
+	if (err) {
+		dev_err(dev, "Failed to query PWM duty\n");
+		return err;
+	}
+
+	duty = mlxsw_reg_mfsc_pwm_duty_cycle_get(mfsc_pl);
+	*p_state = mlxsw_duty_to_state(duty);
+	return 0;
+}
+
+static int mlxsw_thermal_set_cur_state(struct thermal_cooling_device *cdev,
+				       unsigned long state)
+
+{
+	struct mlxsw_thermal *thermal = cdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	char mfsc_pl[MLXSW_REG_MFSC_LEN];
+	int err, idx;
+
+	idx = mlxsw_get_cooling_device_idx(thermal, cdev);
+	if (idx < 0)
+		return idx;
+
+	mlxsw_reg_mfsc_pack(mfsc_pl, idx, mlxsw_state_to_duty(state));
+	err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
+	if (err) {
+		dev_err(dev, "Failed to write PWM duty\n");
+		return err;
+	}
+	return 0;
+}
+
+static const struct thermal_cooling_device_ops mlxsw_cooling_ops = {
+	.get_max_state	= mlxsw_thermal_get_max_state,
+	.get_cur_state	= mlxsw_thermal_get_cur_state,
+	.set_cur_state	= mlxsw_thermal_set_cur_state,
+};
+
+int mlxsw_thermal_init(struct mlxsw_core *core,
+		       const struct mlxsw_bus_info *bus_info,
+		       struct mlxsw_thermal **p_thermal)
+{
+	char mfcr_pl[MLXSW_REG_MFCR_LEN] = { 0 };
+	enum mlxsw_reg_mfcr_pwm_frequency freq;
+	struct device *dev = bus_info->dev;
+	struct mlxsw_thermal *thermal;
+	u16 tacho_active;
+	u8 pwm_active;
+	int err, i;
+
+	thermal = devm_kzalloc(dev, sizeof(*thermal),
+			       GFP_KERNEL);
+	if (!thermal)
+		return -ENOMEM;
+
+	thermal->core = core;
+	thermal->bus_info = bus_info;
+	memcpy(thermal->trips, default_thermal_trips, sizeof(thermal->trips));
+
+	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfcr), mfcr_pl);
+	if (err) {
+		dev_err(dev, "Failed to probe PWMs\n");
+		goto err_free_thermal;
+	}
+	mlxsw_reg_mfcr_unpack(mfcr_pl, &freq, &tacho_active, &pwm_active);
+
+	for (i = 0; i < MLXSW_MFCR_TACHOS_MAX; i++) {
+		if (tacho_active & BIT(i)) {
+			char mfsl_pl[MLXSW_REG_MFSL_LEN];
+
+			mlxsw_reg_mfsl_pack(mfsl_pl, i, 0, 0);
+
+			/* We need to query the register to preserve maximum */
+			err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfsl),
+					      mfsl_pl);
+			if (err)
+				goto err_free_thermal;
+
+			/* set the minimal RPMs to 0 */
+			mlxsw_reg_mfsl_tach_min_set(mfsl_pl, 0);
+			err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsl),
+					      mfsl_pl);
+			if (err)
+				goto err_free_thermal;
+		}
+	}
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
+		if (pwm_active & BIT(i)) {
+			struct thermal_cooling_device *cdev;
+
+			cdev = thermal_cooling_device_register("Fan", thermal,
+							&mlxsw_cooling_ops);
+			if (IS_ERR(cdev)) {
+				err = PTR_ERR(cdev);
+				dev_err(dev, "Failed to register cooling device\n");
+				goto err_unreg_cdevs;
+			}
+			thermal->cdevs[i] = cdev;
+		}
+	}
+
+	thermal->tzdev = thermal_zone_device_register("mlxsw",
+						      MLXSW_THERMAL_NUM_TRIPS,
+						      MLXSW_THERMAL_TRIP_MASK,
+						      thermal,
+						      &mlxsw_thermal_ops,
+						      NULL, 0,
+						      MLXSW_THERMAL_POLL_INT);
+	if (IS_ERR(thermal->tzdev)) {
+		err = PTR_ERR(thermal->tzdev);
+		dev_err(dev, "Failed to register thermal zone\n");
+		goto err_unreg_cdevs;
+	}
+
+	thermal->mode = THERMAL_DEVICE_ENABLED;
+	*p_thermal = thermal;
+	return 0;
+err_unreg_cdevs:
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
+		if (thermal->cdevs[i])
+			thermal_cooling_device_unregister(thermal->cdevs[i]);
+err_free_thermal:
+	devm_kfree(dev, thermal);
+	return err;
+}
+
+void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
+{
+	int i;
+
+	if (thermal->tzdev) {
+		thermal_zone_device_unregister(thermal->tzdev);
+		thermal->tzdev = NULL;
+	}
+
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
+		if (thermal->cdevs[i]) {
+			thermal_cooling_device_unregister(thermal->cdevs[i]);
+			thermal->cdevs[i] = NULL;
+		}
+	}
+
+	devm_kfree(thermal->bus_info->dev, thermal);
+}
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 1/2] mlxsw: reg: Add Management Fan Speed Limit register
From: Jiri Pirko @ 2016-11-22 10:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, cera, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1479810253-9114-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed thresholds are defined for both
under-speed and over-speed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 49 +++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index edad7cb..2618e9c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4518,6 +4518,54 @@ static inline void mlxsw_reg_mfsm_pack(char *payload, u8 tacho)
 	mlxsw_reg_mfsm_tacho_set(payload, tacho);
 }
 
+/* MFSL - Management Fan Speed Limit Register
+ * ------------------------------------------
+ * The Fan Speed Limit register is used to configure the fan speed
+ * event / interrupt notification mechanism. Fan speed thresholds are
+ * defined for both under-speed and over-speed.
+ */
+#define MLXSW_REG_MFSL_ID 0x9004
+#define MLXSW_REG_MFSL_LEN 0x0C
+
+MLXSW_REG_DEFINE(mfsl, MLXSW_REG_MFSL_ID, MLXSW_REG_MFSL_LEN);
+
+/* reg_mfsl_tacho
+ * Fan tachometer index.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mfsl, tacho, 0x00, 24, 4);
+
+/* reg_mfsl_tach_min
+ * Tachometer minimum value (minimum RPM).
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mfsl, tach_min, 0x04, 0, 16);
+
+/* reg_mfsl_tach_max
+ * Tachometer maximum value (maximum RPM).
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mfsl, tach_max, 0x08, 0, 16);
+
+static inline void mlxsw_reg_mfsl_pack(char *payload, u8 tacho,
+				       u16 tach_min, u16 tach_max)
+{
+	MLXSW_REG_ZERO(mfsl, payload);
+	mlxsw_reg_mfsl_tacho_set(payload, tacho);
+	mlxsw_reg_mfsl_tach_min_set(payload, tach_min);
+	mlxsw_reg_mfsl_tach_max_set(payload, tach_max);
+}
+
+static inline void mlxsw_reg_mfsl_unpack(char *payload, u8 tacho,
+					 u16 *p_tach_min, u16 *p_tach_max)
+{
+	if (p_tach_min)
+		*p_tach_min = mlxsw_reg_mfsl_tach_min_get(payload);
+
+	if (p_tach_max)
+		*p_tach_max = mlxsw_reg_mfsl_tach_max_get(payload);
+}
+
 /* MTCAP - Management Temperature Capabilities
  * -------------------------------------------
  * This register exposes the capabilities of the device and
@@ -5228,6 +5276,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mfcr),
 	MLXSW_REG(mfsc),
 	MLXSW_REG(mfsm),
+	MLXSW_REG(mfsl),
 	MLXSW_REG(mtcap),
 	MLXSW_REG(mtmp),
 	MLXSW_REG(mpat),
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 0/2] mlxsw: core: Implement thermal zone
From: Jiri Pirko @ 2016-11-22 10:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, cera, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz

From: Jiri Pirko <jiri@mellanox.com>

Implement thermal zone for mlxsw based HW.
The first patch is just a register dependency for the second patch.

Ivan Vecera (1):
  mlxsw: core: Implement thermal zone

Jiri Pirko (1):
  mlxsw: reg: Add Management Fan Speed Limit register

 drivers/net/ethernet/mellanox/mlxsw/Kconfig        |   9 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.c         |   8 +
 drivers/net/ethernet/mellanox/mlxsw/core.h         |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 442 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/reg.h          |  49 +++
 6 files changed, 533 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

-- 
2.7.4

^ permalink raw reply

* net/icmp: null-ptr-deref in icmp6_send
From: Andrey Konovalov @ 2016-11-22 10:23 UTC (permalink / raw)
  To: David Ahern, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Alexander Potapenko, Kostya Serebryany,
	Eric Dumazet, syzkaller

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

It seems that skb_dst(skb) may end up being NULL.

As far as I can see the bug was introduced in commit 5d41ce29e ("net:
icmp6_send should use dst dev to determine L3 domain").
ICMPv4 probably has a similar issue due to 9d1a6c4ea ("net:
icmp_route_lookup should use rt dev to determine L3 domain").
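
To illustrate the failure mode, here is a minimal user-space sketch of a guarded lookup. The structures are simplified stand-ins for the kernel ones, l3_lookup_dev() is a hypothetical helper, and falling back to skb->dev is only an assumption about what a fix might do, not a tested patch:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct net_device { int ifindex; };
struct dst_entry { struct net_device *dev; };
struct sk_buff { struct dst_entry *dst; struct net_device *dev; };

/*
 * Pick the device used to determine the L3 domain. Prefer the dst
 * device, but fall back to the receiving device when no dst is
 * attached, instead of dereferencing a NULL dst.
 */
static const struct net_device *l3_lookup_dev(const struct sk_buff *skb)
{
	if (skb->dst && skb->dst->dev)
		return skb->dst->dev;
	return skb->dev;
}
```

With a dst attached the helper returns the dst device; with skb->dst == NULL (the case hit here via the tun receive path) it returns skb->dev rather than faulting.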

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
 ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
 [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 [<ffffffff82bdc01b>] __netif_receive_skb_core+0x187b/0x2a10 net/core/dev.c:4208
 [<ffffffff82bdd1da>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4246
 [<ffffffff82bdd4d3>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4274
 [<ffffffff82bdd6f8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4298
 [<ffffffff82420e7e>] tun_get_user+0xbde/0x2890 drivers/net/tun.c:1308
 [<ffffffff82422d4a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
 [<     inline     >] new_sync_write fs/read_write.c:499
 [<ffffffff8151c234>] __vfs_write+0x334/0x570 fs/read_write.c:512
 [<ffffffff8151fd4b>] vfs_write+0x17b/0x500 fs/read_write.c:560
 [<     inline     >] SYSC_write fs/read_write.c:607
 [<ffffffff81523674>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [<ffffffff83fc4301>] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 67 58 41 f6 c4 01 0f 85 d4 07 00 00 49 83 e4 fe e8 ea 5e fc fd
49 8d 7c 24 18 49 ba 00 00 00 00 00 fc ff df 49 89 f9 49 c1 e9 03 <43>
80 3c 11 00 0f 85 c5 17 00 00 4d 8b 64 24 18 65 ff 05 cd 3c
RIP  [<ffffffff833617ec>] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
 RSP <ffff88006734f2c0>
---[ end trace 12dd736536064d71 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply

* Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Jerome Brunet @ 2016-11-22 10:13 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Alexandre TORGUE, Neil Armstrong, Martin Blumenstingl,
	Kevin Hilman, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andre Roth,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Carlo Caione,
	Giuseppe Cavallaro,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <e792c889-8725-3952-ca28-a08537d9f87a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Mon, 2016-11-21 at 21:35 -0800, Florian Fainelli wrote:
> Le 21/11/2016 à 08:47, Andrew Lunn a écrit :
> > 
> > > 
> > > What I did not realize when doing this patch for the realtek
> > > driver is
> > > that there is already 6 valid modes defined in the kernel
> > > 
> > > #define MDIO_EEE_100TX		MDIO_AN_EEE_ADV_100TX	/* 100TX EEE cap */
> > > #define MDIO_EEE_1000T		MDIO_AN_EEE_ADV_1000T	/* 1000T EEE cap */
> > > #define MDIO_EEE_10GT		0x0008	/* 10GT EEE cap */
> > > #define MDIO_EEE_1000KX		0x0010	/* 1000KX EEE cap */
> > > #define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
> > > #define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */
> > > I took care of only 2 in the case of realtek.c since it only
> > > support
> > > MDIO_EEE_100TX and MDIO_EEE_1000T.
> > > 
> > > Defining a property for each is certainly doable but it does not
> > > look
> > > very nice either. If it extends in the future, it will get even
> > > more
> > > messier, especially if you want to disable everything.
> > 
> > Yes, agreed.
> 
> One risk with the definition a group of advertisement capabilities
> (under the form of a bitmask for instance) to enable/disable is that
> we
> end up with Device Tree contain some kind of configuration policy as
> opposed to just flagging particular hardware features as broken.

The proposed code only allows disabling EEE advertisement (not
enabling it), so we should not see it used as a configuration policy in
DT. To make this more explicit, I could rename the property
"eee-advert-disable" to "eee-broken"?

> 
> Fortunately, there does not seem to be a ton of PHYs out there which
> require EEE

It is quite difficult to get the real picture here because some PHYs
have EEE disabled by default and you have to explicitly enable it.
I have no idea of the ratio between the two PHY policies.

> to be disabled to function properly so having individual
> properties vs. bitmasks/groups is kind of speculative here.

In the particular instance of the OdroidC2, disabling EEE for GbE only
is enough. However, if you have a PHY that is broken with EEE, I think
it is likely that you will want to disable all (supported) EEE modes.
That is the reason why I prefer a bitmask. I agree both are functionally
similar; this is kind of a cosmetic debate.
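
To make the bitmask variant concrete, here is a minimal user-space sketch. eee_apply_broken_mask() and its "broken" argument are hypothetical names for illustration, not what the RFC patch implements; the bit values mirror the MDIO_EEE_* definitions in include/uapi/linux/mdio.h:

```c
#include <stdint.h>

/* EEE advertisement bits; values mirror include/uapi/linux/mdio.h. */
#define MDIO_EEE_100TX	0x0002u	/* 100TX EEE cap */
#define MDIO_EEE_1000T	0x0004u	/* 1000T EEE cap */
#define MDIO_EEE_10GT	0x0008u	/* 10GT EEE cap */
#define MDIO_EEE_1000KX	0x0010u	/* 1000KX EEE cap */
#define MDIO_EEE_10GKX4	0x0020u	/* 10G KX4 EEE cap */
#define MDIO_EEE_10GKR	0x0040u	/* 10G KR EEE cap */

/*
 * Hypothetical helper: clear every mode flagged as broken from the
 * advertisement word. It can only remove modes, never add them.
 */
static inline uint16_t eee_apply_broken_mask(uint16_t adv, uint16_t broken)
{
	return adv & (uint16_t)~broken;
}
```

For the OdroidC2 case the mask would be MDIO_EEE_1000T alone; a board where EEE is broken altogether would OR all supported modes together. Since the helper only clears bits, it cannot be used to force a mode on, which matches the "disable only" intent.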

> 
> Another approach to solving this problem could be to register a PHY
> fixup which disables EEE at the PHY level, and which is only called
> for
> specific boards affected by this problem
> (of_machine_is_compatible()).
> This code can leave in arch/*/* when that is possible, 

That is something I was looking at, but we don't have these files anymore
on ARM64 (judging by your comment, you already know this).

> or it can just be
> somewhere where it is relevant, e.g; in the PHY driver for instance
> (similarly to how PCI fixups are done).

Do you prefer having board-specific code inside a generic driver over
having the setting live in DT? Peppe told me they also had a few
platforms with similar issues. The point is that this could be useful to
other people, so it could spread and grow a bit.

I would prefer having this in the DT, but I can definitely do it in the
PHY driver with of_machine_is_compatible() and register_fixup if this is
what you prefer/want.

Cheers
Jerome



^ permalink raw reply

* [PATCH v2] net/phy: add trace events for mdio accesses
From: Uwe Kleine-König @ 2016-11-22 10:01 UTC (permalink / raw)
  To: Florian Fainelli, Steven Rostedt, Ingo Molnar; +Cc: netdev
In-Reply-To: <20161114110335.27862-1-uwe@kleine-koenig.org>

Make it possible to generate trace events for mdio read and write accesses.

Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
---
Changes since (implicit) v1:

 - make use of TRACE_EVENT_CONDITION

As an alternative to this patch, the condition could be

+	TP_CONDITION(err == 0),

but then we'd need in the read callbacks:

+	trace_mdio_access(bus, 1, addr, regnum, retval, retval < 0 ? retval : 0);

or at least

+	trace_mdio_access(bus, 1, addr, regnum, retval, retval < 0);

which both look uglier IMHO.
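
For reference, the behaviour of the TP_CONDITION(err >= 0) variant used in the patch can be modelled in user space like this. It is only a sketch; mdio_trace_fires() and mdio_trace_val() are hypothetical names that stand in for the tracepoint condition and the u16 "val" argument:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models TP_CONDITION(err >= 0): the event is emitted only if true. */
static bool mdio_trace_fires(int err)
{
	return err >= 0;
}

/* Models the u16 "val" argument: an int is truncated to 16 bits. */
static uint16_t mdio_trace_val(int val)
{
	return (uint16_t)val;
}
```

A successful read passes its non-negative register value as both val and err, so the event is emitted with the value intact. A failed read passes a negative errno and is filtered out. A successful write passes err == 0 and is traced as well.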

Best regards
Uwe

 drivers/net/phy/mdio_bus.c  | 11 +++++++++++
 include/trace/events/mdio.h | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)
 create mode 100644 include/trace/events/mdio.h

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 09deef4bed09..653d076eafe5 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -38,6 +38,9 @@
 
 #include <asm/irq.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/mdio.h>
+
 int mdiobus_register_device(struct mdio_device *mdiodev)
 {
 	if (mdiodev->bus->mdio_map[mdiodev->addr])
@@ -461,6 +464,8 @@ int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum)
 	retval = bus->read(bus, addr, regnum);
 	mutex_unlock(&bus->mdio_lock);
 
+	trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
 	return retval;
 }
 EXPORT_SYMBOL(mdiobus_read_nested);
@@ -485,6 +490,8 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
 	retval = bus->read(bus, addr, regnum);
 	mutex_unlock(&bus->mdio_lock);
 
+	trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
 	return retval;
 }
 EXPORT_SYMBOL(mdiobus_read);
@@ -513,6 +520,8 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 	err = bus->write(bus, addr, regnum, val);
 	mutex_unlock(&bus->mdio_lock);
 
+	trace_mdio_access(bus, 0, addr, regnum, val, err);
+
 	return err;
 }
 EXPORT_SYMBOL(mdiobus_write_nested);
@@ -538,6 +547,8 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 	err = bus->write(bus, addr, regnum, val);
 	mutex_unlock(&bus->mdio_lock);
 
+	trace_mdio_access(bus, 0, addr, regnum, val, err);
+
 	return err;
 }
 EXPORT_SYMBOL(mdiobus_write);
diff --git a/include/trace/events/mdio.h b/include/trace/events/mdio.h
new file mode 100644
index 000000000000..468e2d095d19
--- /dev/null
+++ b/include/trace/events/mdio.h
@@ -0,0 +1,42 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mdio
+
+#if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MDIO_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT_CONDITION(mdio_access,
+
+	TP_PROTO(struct mii_bus *bus, int read,
+		 unsigned addr, unsigned regnum, u16 val, int err),
+
+	TP_ARGS(bus, read, addr, regnum, val, err),
+
+	TP_CONDITION(err >= 0),
+
+	TP_STRUCT__entry(
+		__array(char, busid, MII_BUS_ID_SIZE)
+		__field(int, read)
+		__field(unsigned, addr)
+		__field(unsigned, regnum)
+		__field(u16, val)
+	),
+
+	TP_fast_assign(
+		strncpy(__entry->busid, bus->id, MII_BUS_ID_SIZE);
+		__entry->read = read;
+		__entry->addr = addr;
+		__entry->regnum = regnum;
+		__entry->val = val;
+	),
+
+	TP_printk("%s %-5s phy:0x%02x reg:0x%02x val:0x%04hx",
+		  __entry->busid, __entry->read ? "read" : "write",
+		  __entry->addr, __entry->regnum, __entry->val)
+);
+
+#endif /* if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.10.2

^ permalink raw reply related

* mlx5 "syndrome" errors in kernel log
From: Jesper Dangaard Brouer @ 2016-11-22  9:59 UTC (permalink / raw)
  To: Saeed Mahameed, Tariq Toukan; +Cc: netdev@vger.kernel.org


Hi Saeed,

I'm seeing the dmesg errors below after pulling net-next at commit
e796f49d826aad. I was not seeing these errors before, when my tree was
based on top of commit 319b0534b95.

mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)


Listing my firmware version:

 $ ethtool -i mlx5p2
 driver: mlx5_core
 version: 3.0-1 (January 2015)
 firmware-version: 12.12.1240
 bus-info: 0000:02:00.1
 supports-statistics: yes
 supports-test: no
 supports-eeprom-access: no
 supports-register-dump: no
 supports-priv-flags: yes

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


git diff --stat 319b0534b95..e796f49d826aad drivers/net/ethernet/mellanox/mlx5/
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c        | 145 ++++++++++++++++++++++++++++++++++++++++++----------------------------------------------
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  40 ++++++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c    |  61 +++++++++++++++++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h   |  49 +++++++++++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c         |  12 ++++++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c       |  37 +++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c       |  57 +++++++++++++++++++++++++++++++++++
 8 files changed, 312 insertions(+), 90 deletions(-)


$ git shortlog  319b0534b95..e796f49d826aad drivers/net/ethernet/mellanox/mlx5/
Daniel Borkmann (3):
      bpf, mlx5: fix mlx5e_create_rq taking reference on prog
      bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
      bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup

Eric Dumazet (1):
      net/mlx5e: remove napi_hash_del() calls

Gal Pressman (1):
      net/mlx5e: Expose PCIe statistics to ethtool

Huy Nguyen (3):
      net/mlx5: Add handling for port module event
      net/mlx5e: Add port module event counters to ethtool stats
      net/mlx5: Set driver version into firmware

Mohamad Haj Yahia (1):
      net/mlx5: Make the command interface cache more flexible



$ git log --pretty=oneline   319b0534b95..e796f49d826aad drivers/net/ethernet/mellanox/mlx5/
a055c19be98bc065a4478663ba7f6833693b8958 bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
c54c06290428554bc0e26d58f21a7865cbe995af bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
97bc402db7821259f6a722cb38e060aa9b35b6e8 bpf, mlx5: fix mlx5e_create_rq taking reference on prog
9c7262399ba12825f3ca4b00a76d8d5e77c720f5 net/mlx5e: Expose PCIe statistics to ethtool
012e50e109fd27ff989492ad74c50ca7ab21e6a1 net/mlx5: Set driver version into firmware
bedb7c909c1911270fcb084230245df4a00bd881 net/mlx5e: Add port module event counters to ethtool stats
d4eb4cd78b0774c7061db56844ed2ea7790cc77c net/mlx5: Add handling for port module event
0ac3ea70897fb9f84b620aeda074ecccf481629d net/mlx5: Make the command interface cache more flexible
d30d9ccbfac7cf9a12a088d57aaf0891732e2bca net/mlx5e: remove napi_hash_del() calls

^ permalink raw reply

* Re: [PATCH net 1/1] net sched filters: pass netlink message flags in event notification
From: Daniel Borkmann @ 2016-11-22  9:28 UTC (permalink / raw)
  To: Cong Wang, Roman Mashak
  Cc: David Miller, Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <CAM_iQpXPsDAfgGsKB7V_=+LnwnuqzgvY7amcPo7hs3VKGrviiA@mail.gmail.com>

On 11/22/2016 06:23 AM, Cong Wang wrote:
> On Thu, Nov 17, 2016 at 1:02 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Wed, Nov 16, 2016 at 2:16 PM, Roman Mashak <mrv@mojatatu.com> wrote:
>>> Userland client should be able to read an event, and reflect it back to
>>> the kernel, therefore it needs to extract complete set of netlink flags.
>>>
>>> For example, this will allow "tc monitor" to distinguish Add and Replace
>>> operations.
>>>
>>> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
>>> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
>>> ---
>>>   net/sched/cls_api.c | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 2b2a797..8e93d4a 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
>>>
>>>          for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL;
>>>               it_chain = &tp->next)
>>> -               tfilter_notify(net, oskb, n, tp, 0, event, false);
>>> +               tfilter_notify(net, oskb, n, tp, n->nlmsg_flags, event, false);
>>
>>
>> I must miss something, why does it make sense to pass n->nlmsg_flags
>> as 'fh' to tfilter_notify()??
>
> Ping... Any response?
>
> It still doesn't look correct to me. I will send a fix unless someone could
> explain this.

Sigh, I missed that this was applied already to -net (it certainly doesn't look
like -net material, but rather -net-next stuff) ... This definitely looks buggy
to me, the 0 as it was before was correct here (as it means we delete the whole
chain in this case).

If you could send a patch, that would be great. Thanks Cong!

^ permalink raw reply

* Re: net/l2tp: use-after-free write in l2tp_ip6_close
From: Andrey Konovalov @ 2016-11-22  9:23 UTC (permalink / raw)
  To: Guillaume Nault
  Cc: David S. Miller, Eric Dumazet, Willem de Bruijn,
	Hannes Frederic Sowa, Soheil Hassas Yeganeh, Shmulik Ladkani,
	Wei Wang, Haishuang Yan, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, syzkaller
In-Reply-To: <20161110174429.dmqgov4k6tgg2fsc@alphalink.fr>

Hi Guillaume,

Sorry, I was on vacation last week and couldn't reply.

As far as I can see, a fix was already sent upstream.

Thanks!

On Thu, Nov 10, 2016 at 6:44 PM, Guillaume Nault <g.nault@alphalink.fr> wrote:
> On Mon, Nov 07, 2016 at 11:35:26PM +0100, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==================================================================
>> BUG: KASAN: use-after-free in l2tp_ip6_close+0x239/0x2a0 at addr
>> ffff8800677276d8
>> Write of size 8 by task a.out/8668
>> CPU: 0 PID: 8668 Comm: a.out Not tainted 4.9.0-rc4+ #354
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  ffff8800694d7b00 ffffffff81b46a64 ffff88006adb5780 ffff8800677276c0
>>  ffff880067727c68 ffff8800677276c0 ffff8800694d7b28 ffffffff8150a86c
>>  ffff8800694d7bb8 ffff88006adb5780 ffff8800e77276d8 ffff8800694d7ba8
>> Call Trace:
>>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>>  [<ffffffff81b46a64>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>>  [<ffffffff8150a86c>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
>>  [<     inline     >] print_address_description mm/kasan/report.c:194
>>  [<ffffffff8150ab07>] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283
>>  [<     inline     >] kasan_report mm/kasan/report.c:303
>>  [<ffffffff8150b01e>] __asan_report_store8_noabort+0x3e/0x40
>> mm/kasan/report.c:329
>>  [<     inline     >] __write_once_size ./include/linux/compiler.h:272
>>  [<     inline     >] __hlist_del ./include/linux/list.h:622
>>  [<     inline     >] hlist_del_init ./include/linux/list.h:637
>>  [<ffffffff83825f49>] l2tp_ip6_close+0x239/0x2a0 net/l2tp/l2tp_ip6.c:239
>>  [<ffffffff8316b31f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>>  [<ffffffff832cd4d0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>>  [<ffffffff82b6d89e>] sock_release+0x8e/0x1d0 net/socket.c:570
>>  [<ffffffff82b6d9f6>] sock_close+0x16/0x20 net/socket.c:1017
>>  [<ffffffff81524bdd>] __fput+0x29d/0x720 fs/file_table.c:208
>>  [<ffffffff815250e5>] ____fput+0x15/0x20 fs/file_table.c:244
>>  [<ffffffff81172928>] task_work_run+0xf8/0x170 kernel/task_work.c:116
>>  [<     inline     >] exit_task_work ./include/linux/task_work.h:21
>>  [<ffffffff8111bda3>] do_exit+0x883/0x2ac0 kernel/exit.c:828
>>  [<ffffffff8112234e>] do_group_exit+0x10e/0x340 kernel/exit.c:931
>>  [<     inline     >] SYSC_exit_group kernel/exit.c:942
>>  [<ffffffff8112259d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>>  [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Object at ffff8800677276c0, in cache L2TP/IPv6 size: 1448
>> Allocated:
>> PID = 8692
>> [<ffffffff8107e236>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>> [<ffffffff81509bd6>] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
>> [<     inline     >] set_track mm/kasan/kasan.c:507
>> [<ffffffff81509e4b>] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598
>> [<ffffffff8150a3b2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
>> [<     inline     >] slab_post_alloc_hook mm/slab.h:417
>> [<     inline     >] slab_alloc_node mm/slub.c:2708
>> [<     inline     >] slab_alloc mm/slub.c:2716
>> [<ffffffff81505064>] kmem_cache_alloc+0xb4/0x270 mm/slub.c:2721
>> [<ffffffff82b77ca9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1327
>> [<ffffffff82b80898>] sk_alloc+0x38/0xaf0 net/core/sock.c:1389
>> [<ffffffff832cef05>] inet6_create+0x2e5/0xf60 net/ipv6/af_inet6.c:182
>> [<ffffffff82b7301f>] __sock_create+0x37f/0x640 net/socket.c:1153
>> [<     inline     >] sock_create net/socket.c:1193
>> [<     inline     >] SYSC_socket net/socket.c:1223
>> [<ffffffff82b73510>] SyS_socket+0xf0/0x1b0 net/socket.c:1203
>> [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209
>> Freed:
>> PID = 8668
>> [<ffffffff8107e236>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>> [<ffffffff81509bd6>] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
>> [<     inline     >] set_track mm/kasan/kasan.c:507
>> [<ffffffff8150a433>] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
>> [<     inline     >] slab_free_hook mm/slub.c:1352
>> [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
>> [<     inline     >] slab_free mm/slub.c:2951
>> [<ffffffff81506263>] kmem_cache_free+0xb3/0x2c0 mm/slub.c:2973
>> [<     inline     >] sk_prot_free net/core/sock.c:1370
>> [<ffffffff82b7c669>] __sk_destruct+0x319/0x480 net/core/sock.c:1445
>> [<ffffffff82b82b94>] sk_destruct+0x44/0x80 net/core/sock.c:1453
>> [<ffffffff82b82c24>] __sk_free+0x54/0x230 net/core/sock.c:1461
>> [<ffffffff82b82e23>] sk_free+0x23/0x30 net/core/sock.c:1472
>> [<     inline     >] sock_put ./include/net/sock.h:1591
>> [<ffffffff82b84b04>] sk_common_release+0x294/0x3e0 net/core/sock.c:2745
>> [<ffffffff83825f19>] l2tp_ip6_close+0x209/0x2a0 net/l2tp/l2tp_ip6.c:243
>> [<ffffffff8316b31f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>> [<ffffffff832cd4d0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>> [<ffffffff82b6d89e>] sock_release+0x8e/0x1d0 net/socket.c:570
>> [<ffffffff82b6d9f6>] sock_close+0x16/0x20 net/socket.c:1017
>> [<ffffffff81524bdd>] __fput+0x29d/0x720 fs/file_table.c:208
>> [<ffffffff815250e5>] ____fput+0x15/0x20 fs/file_table.c:244
>> [<ffffffff81172928>] task_work_run+0xf8/0x170 kernel/task_work.c:116
>> [<     inline     >] exit_task_work ./include/linux/task_work.h:21
>> [<ffffffff8111bda3>] do_exit+0x883/0x2ac0 kernel/exit.c:828
>> [<ffffffff8112234e>] do_group_exit+0x10e/0x340 kernel/exit.c:931
>> [<     inline     >] SYSC_exit_group kernel/exit.c:942
>> [<ffffffff8112259d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>> [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209
>> Memory state around the buggy address:
>>  ffff880067727580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>  ffff880067727600: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
>> >ffff880067727680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>>                                                     ^
>>  ffff880067727700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>  ffff880067727780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ==================================================================
>>
>> To reproduce run the attached program in a tight parallel loop using
>> stress (https://godoc.org/golang.org/x/tools/cmd/stress):
>> $ gcc tmp.c -lpthread
>> $ ./stress ./a.out
>>
>> On commit bc33b0ca11e3df467777a4fa7639ba488c9d4911 (Nov 5).
>>
>
> Thanks for the report. It looks like l2tp_ip6_bind() is racy.
> Can you try the following patch?
>
> diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
> index ad3468c..9978d01 100644
> --- a/net/l2tp/l2tp_ip6.c
> +++ b/net/l2tp/l2tp_ip6.c
> @@ -269,8 +269,6 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>         int addr_type;
>         int err;
>
> -       if (!sock_flag(sk, SOCK_ZAPPED))
> -               return -EINVAL;
>         if (addr->l2tp_family != AF_INET6)
>                 return -EINVAL;
>         if (addr_len < sizeof(*addr))
> @@ -296,6 +294,9 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>         lock_sock(sk);
>
>         err = -EINVAL;
> +       if (!sock_flag(sk, SOCK_ZAPPED))
> +               goto out_unlock;
> +
>         if (sk->sk_state != TCP_CLOSE)
>                 goto out_unlock;
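
The diff above moves the SOCK_ZAPPED test so that it runs after
lock_sock(sk). Below is a minimal userspace sketch of why that ordering
matters, assuming a toy socket where a pthread mutex stands in for the
socket lock. struct toy_sock, toy_bind_racy() and toy_bind_locked() are
hypothetical names used for illustration only; this is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct toy_sock {
    pthread_mutex_t lock; /* plays the role of lock_sock()/release_sock() */
    bool zapped;          /* plays the role of SOCK_ZAPPED: true until bound */
    int bind_count;       /* how many times the "insert into bind table" step ran */
};

static void toy_sock_init(struct toy_sock *sk)
{
    int rc = pthread_mutex_init(&sk->lock, NULL);

    assert(rc == 0);
    (void)rc;
    sk->zapped = true;
    sk->bind_count = 0;
}

/*
 * Racy shape (before the patch): the flag is tested before the lock is
 * taken, so two concurrent callers can both pass the test and both
 * perform the bind step.
 */
static int toy_bind_racy(struct toy_sock *sk)
{
    if (!sk->zapped) /* unlocked check: may be stale by the time we lock */
        return -EINVAL;

    pthread_mutex_lock(&sk->lock);
    sk->bind_count++;
    sk->zapped = false;
    pthread_mutex_unlock(&sk->lock);
    return 0;
}

/*
 * Fixed shape (after the patch): the flag is tested and cleared under
 * the same lock, so only the first caller can succeed.
 */
static int toy_bind_locked(struct toy_sock *sk)
{
    int err = -EINVAL;

    pthread_mutex_lock(&sk->lock);
    if (!sk->zapped)
        goto out_unlock;

    sk->bind_count++;
    sk->zapped = false;
    err = 0;
out_unlock:
    pthread_mutex_unlock(&sk->lock);
    return err;
}
```

With toy_bind_racy(), two threads can both observe zapped == true before
either takes the lock, so the bind step can run twice. With
toy_bind_locked(), the second caller always gets -EINVAL.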

^ permalink raw reply

* net/can: use-after-free in bcm_rx_thr_flush
From: Andrey Konovalov @ 2016-11-22  9:22 UTC (permalink / raw)
  To: Oliver Hartkopp, Marc Kleine-Budde, David S. Miller, linux-can,
	netdev, LKML
  Cc: Dmitry Vyukov, Alexander Potapenko, Kostya Serebryany,
	Eric Dumazet, syzkaller

[-- Attachment #1: Type: text/plain, Size: 5259 bytes --]

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

A reproducer is attached.
You may need to run it a few times.

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

==================================================================
BUG: KASAN: use-after-free in bcm_rx_thr_flush+0x284/0x2b0
Read of size 1 at addr ffff88006c1faae5 by task a.out/3874

page:ffffea0001b07e80 count:1 mapcount:0 mapping:          (null) index:0x0
flags: 0x100000000000080(slab)
page dumped because: kasan: bad access detected

CPU: 1 PID: 3874 Comm: a.out Not tainted 4.9.0-rc6+ #427
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88006ab07900 ffffffff81b472e4 ffff88006ab07990 ffff88006c1faae5
 00000000000000fa 00000000000000fb ffff88006ab07980 ffffffff8150ad42
 ffff88006323ce58 0000000000000246 ffff880068ca8000 0000000000000282
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff81b472e4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [<     inline     >] describe_address mm/kasan/report.c:259
 [<ffffffff8150ad42>] kasan_report_error+0x122/0x560 mm/kasan/report.c:365
 [<     inline     >] kasan_report mm/kasan/report.c:387
 [<ffffffff8150b1be>] __asan_report_load1_noabort+0x3e/0x40 mm/kasan/report.c:405
 [<     inline     >] bcm_rx_do_flush net/can/bcm.c:589
 [<ffffffff83577e04>] bcm_rx_thr_flush+0x284/0x2b0 net/can/bcm.c:612
 [<     inline     >] bcm_rx_setup net/can/bcm.c:1199
 [<ffffffff83578b36>] bcm_sendmsg+0xbb6/0x30e0 net/can/bcm.c:1351
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff82b7176c>] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [<ffffffff82b73651>] ___sys_sendmsg+0x771/0x8b0 net/socket.c:1954
 [<ffffffff82b7563e>] __sys_sendmsg+0xce/0x170 net/socket.c:1988
 [<     inline     >] SYSC_sendmsg net/socket.c:1999
 [<ffffffff82b7570d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [<ffffffff83fc4301>] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209

The buggy address belongs to the object at ffff88006c1faae0
 which belongs to the cache kmalloc-32 of size 32
The buggy address ffff88006c1faae5 is located 5 bytes inside
 of 32-byte region [ffff88006c1faae0, ffff88006c1fab00)

Freed by task 2013:
 [<ffffffff8107e236>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [<ffffffff81509e56>] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 [<     inline     >] set_track mm/kasan/kasan.c:507
 [<ffffffff8150a6b3>] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
 [<     inline     >] slab_free_hook mm/slub.c:1352
 [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
 [<     inline     >] slab_free mm/slub.c:2951
 [<ffffffff81506b98>] kfree+0xe8/0x2b0 mm/slub.c:3871
 [<ffffffff819dd8c1>] selinux_cred_free+0x51/0x80 security/selinux/hooks.c:3725
 [<ffffffff819ce358>] security_cred_free+0x48/0x80 security/security.c:907
 [<ffffffff8117e27d>] put_cred_rcu+0xed/0x390 kernel/cred.c:116
 [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
 [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2776
 [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [<ffffffff8125dfe0>] rcu_process_callbacks+0xa40/0x1190 kernel/rcu/tree.c:3024
 [<ffffffff83fc70af>] __do_softirq+0x23f/0x8e5 kernel/softirq.c:284

Allocated by task 1826:
 [<ffffffff8107e236>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [<ffffffff81509e56>] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 [<     inline     >] set_track mm/kasan/kasan.c:507
 [<ffffffff8150a0cb>] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598
 [<ffffffff8150a632>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
 [<     inline     >] slab_post_alloc_hook mm/slab.h:417
 [<     inline     >] slab_alloc_node mm/slub.c:2708
 [<     inline     >] slab_alloc mm/slub.c:2716
 [<ffffffff815090ef>] __kmalloc_track_caller+0xcf/0x2a0 mm/slub.c:4240
 [<ffffffff8146bf84>] kmemdup+0x24/0x50 mm/util.c:113
 [<ffffffff819dcbe9>] selinux_cred_prepare+0x49/0xb0 security/selinux/hooks.c:3739
 [<ffffffff819ce40d>] security_prepare_creds+0x7d/0xb0 security/security.c:912
 [<ffffffff8117fab3>] prepare_creds+0x243/0x340 kernel/cred.c:277
 [<ffffffff81181bab>] copy_creds+0x7b/0x5c0 kernel/cred.c:343
 [<ffffffff81109c6e>] copy_process.part.45+0x86e/0x5b50 kernel/fork.c:1529
 [<     inline     >] copy_process kernel/fork.c:1479
 [<ffffffff8110f2fa>] _do_fork+0x1ba/0xcc0 kernel/fork.c:1933
 [<     inline     >] SYSC_clone kernel/fork.c:2043
 [<ffffffff8110fed7>] SyS_clone+0x37/0x50 kernel/fork.c:2037
 [<ffffffff81006465>] do_syscall_64+0x195/0x490 arch/x86/entry/common.c:280
 [<ffffffff83fc43c9>] return_from_SYSCALL_64+0x0/0x7a arch/x86/entry/entry_64.S:251

Memory state around the buggy address:
 ffff88006c1fa980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
 ffff88006c1faa00: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
>ffff88006c1faa80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
                                                       ^
 ffff88006c1fab00: fc fc fb fb fb fb fc fc 00 00 00 00 fc fc 00 00
 ffff88006c1fab80: 00 00 fc fc fb fb fb fb fc fc fb fb fb fb fc fc
==================================================================
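
For readers matching the caret in the "Memory state" dump to the faulting
address, here is a small sketch of the arithmetic. It assumes generic
KASAN, where one shadow byte covers 8 bytes of memory, and that each dump
row prints 16 shadow bytes starting at the labelled address. The helper
names offset_in_object() and shadow_column() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN tracks 8 bytes of memory per shadow byte. */
#define SHADOW_GRANULE 8u
/* Assumption: each "Memory state" row prints 16 shadow bytes. */
#define SHADOW_BYTES_PER_ROW 16u

/* Offset of the faulting address inside its slab object. */
static uint64_t offset_in_object(uint64_t addr, uint64_t obj_start)
{
    assert(addr >= obj_start);
    return addr - obj_start;
}

/*
 * 0-based column of the shadow byte that covers addr, within the dump
 * row whose label is row_start. One row spans 16 * 8 = 128 bytes.
 */
static unsigned shadow_column(uint64_t addr, uint64_t row_start)
{
    uint64_t span = (uint64_t)SHADOW_GRANULE * SHADOW_BYTES_PER_ROW;

    assert(addr >= row_start && addr - row_start < span);
    return (unsigned)((addr - row_start) / SHADOW_GRANULE);
}
```

For the report above, offset_in_object(0xffff88006c1faae5,
0xffff88006c1faae0) gives 5, matching "located 5 bytes inside", and
shadow_column(0xffff88006c1faae5, 0xffff88006c1faa80) gives 12, the
column under the caret, whose shadow byte is fb (freed).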

Thanks!

[-- Attachment #2: bcm-rx-uaf-poc.c --]
[-- Type: text/x-csrc, Size: 7860 bytes --]

// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_syz_fuse_mount
#define __NR_syz_fuse_mount 1000004
#endif
#ifndef __NR_syz_fuseblk_mount
#define __NR_syz_fuseblk_mount 1000005
#endif
#ifndef __NR_syz_open_pts
#define __NR_syz_open_pts 1000003
#endif
#ifndef __NR_syz_test
#define __NR_syz_test 1000001
#endif
#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_connect
#define __NR_connect 42
#endif
#ifndef __NR_sendmsg
#define __NR_sendmsg 46
#endif
#ifndef __NR_syz_open_dev
#define __NR_syz_open_dev 1000002
#endif

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/sysmacros.h> /* makedev() on newer glibc */
#include <net/if_arp.h>

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  exit(sig);
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                \
  {                                                                    \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
    if (_setjmp(segv_env) == 0) {                                      \
      __VA_ARGS__;                                                     \
    }                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
  }

static uintptr_t syz_open_dev(uintptr_t a0, uintptr_t a1, uintptr_t a2)
{
  if (a0 == 0xc || a0 == 0xb) {

    char buf[128];
    sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block",
            (uint8_t)a1, (uint8_t)a2);
    return open(buf, O_RDWR, 0);
  } else {

    char buf[1024];
    char* hash;
    strncpy(buf, (char*)a0, sizeof(buf));
    buf[sizeof(buf) - 1] = 0;
    while ((hash = strchr(buf, '#'))) {
      *hash = '0' + (char)(a1 % 10);
      a1 /= 10;
    }
    return open(buf, a2, 0);
  }
}

static uintptr_t syz_open_pts(uintptr_t a0, uintptr_t a1)
{

  int ptyno = 0;
  if (ioctl(a0, TIOCGPTN, &ptyno))
    return -1;
  char buf[128];
  sprintf(buf, "/dev/pts/%d", ptyno);
  return open(buf, a1, 0);
}

static uintptr_t syz_fuse_mount(uintptr_t a0, uintptr_t a1,
                                uintptr_t a2, uintptr_t a3,
                                uintptr_t a4, uintptr_t a5)
{

  uint64_t target = a0;
  uint64_t mode = a1;
  uint64_t uid = a2;
  uint64_t gid = a3;
  uint64_t maxread = a4;
  uint64_t flags = a5;

  int fd = open("/dev/fuse", O_RDWR);
  if (fd == -1)
    return fd;
  char buf[1024];
  sprintf(buf, "fd=%d,user_id=%ld,group_id=%ld,rootmode=0%o", fd,
          (long)uid, (long)gid, (unsigned)mode & ~3u);
  if (maxread != 0)
    sprintf(buf + strlen(buf), ",max_read=%ld", (long)maxread);
  if (mode & 1)
    strcat(buf, ",default_permissions");
  if (mode & 2)
    strcat(buf, ",allow_other");
  syscall(SYS_mount, "", target, "fuse", flags, buf);

  return fd;
}

static uintptr_t syz_fuseblk_mount(uintptr_t a0, uintptr_t a1,
                                   uintptr_t a2, uintptr_t a3,
                                   uintptr_t a4, uintptr_t a5,
                                   uintptr_t a6, uintptr_t a7)
{

  uint64_t target = a0;
  uint64_t blkdev = a1;
  uint64_t mode = a2;
  uint64_t uid = a3;
  uint64_t gid = a4;
  uint64_t maxread = a5;
  uint64_t blksize = a6;
  uint64_t flags = a7;

  int fd = open("/dev/fuse", O_RDWR);
  if (fd == -1)
    return fd;
  if (syscall(SYS_mknodat, AT_FDCWD, blkdev, S_IFBLK, makedev(7, 199)))
    return fd;
  char buf[256];
  sprintf(buf, "fd=%d,user_id=%ld,group_id=%ld,rootmode=0%o", fd,
          (long)uid, (long)gid, (unsigned)mode & ~3u);
  if (maxread != 0)
    sprintf(buf + strlen(buf), ",max_read=%ld", (long)maxread);
  if (blksize != 0)
    sprintf(buf + strlen(buf), ",blksize=%ld", (long)blksize);
  if (mode & 1)
    strcat(buf, ",default_permissions");
  if (mode & 2)
    strcat(buf, ",allow_other");
  syscall(SYS_mount, blkdev, target, "fuseblk", flags, buf);

  return fd;
}

static uintptr_t execute_syscall(int nr, uintptr_t a0, uintptr_t a1,
                                 uintptr_t a2, uintptr_t a3,
                                 uintptr_t a4, uintptr_t a5,
                                 uintptr_t a6, uintptr_t a7,
                                 uintptr_t a8)
{
  switch (nr) {
  default:
    return syscall(nr, a0, a1, a2, a3, a4, a5);
  case __NR_syz_test:
    return 0;
  case __NR_syz_open_dev:
    return syz_open_dev(a0, a1, a2);
  case __NR_syz_open_pts:
    return syz_open_pts(a0, a1);
  case __NR_syz_fuse_mount:
    return syz_fuse_mount(a0, a1, a2, a3, a4, a5);
  case __NR_syz_fuseblk_mount:
    return syz_fuseblk_mount(a0, a1, a2, a3, a4, a5, a6, a7);
  }
}

long r[25];

int main()
{
  install_segv_handler();
  memset(r, -1, sizeof(r));
  r[0] = execute_syscall(__NR_mmap, 0x20000000ul, 0xf60000ul, 0x3ul,
                         0x32ul, 0xfffffffffffffffful, 0x0ul, 0, 0, 0);
  r[1] = execute_syscall(__NR_socket, 0x1dul, 0x80002ul, 0x2ul, 0, 0, 0,
                         0, 0, 0);
  NONFAILING(*(uint16_t*)0x20f57000 = (uint16_t)0x27);
  NONFAILING(*(uint32_t*)0x20f57004 = (uint32_t)0x0);
  NONFAILING(*(uint32_t*)0x20f57008 = (uint32_t)0x0);
  NONFAILING(*(uint32_t*)0x20f5700c = (uint32_t)0x0);
  NONFAILING(*(uint8_t*)0x20f57010 = (uint8_t)0x0);
  NONFAILING(*(uint8_t*)0x20f57011 = (uint8_t)0x0);
  NONFAILING(memcpy(
      (void*)0x20f57012,
      "\x34\x1b\x3a\x01\xb2\x57\x84\x9c\xa1\xd7\xd1\xff\x9f\x99\x9d\x81"
      "\x27\xb1\x85\xf8\x8d\x1d\x77\x5d\x59\xc8\x8a\x3a\xa6\xa8\xdd\xac"
      "\xdf\x2b\xdc\x32\x4e\xa6\x57\x8a\x21\xb8\x51\x14\x61\x01\x86\xc3"
      "\x81\x7c\x34\xb0\x5e\xaf\xfd\x2c\x3f\x54\xf5\x7f\xa8\x1b\xa0",
      63));
  NONFAILING(*(uint64_t*)0x20f57058 = (uint64_t)0x0);
  r[10] = execute_syscall(__NR_connect, r[1], 0x20f57000ul, 0x60ul, 0,
                          0, 0, 0, 0, 0);
  NONFAILING(*(uint64_t*)0x20b05000 = (uint64_t)0x20f55000);
  NONFAILING(*(uint32_t*)0x20b05008 = (uint32_t)0x0);
  NONFAILING(*(uint64_t*)0x20b05010 = (uint64_t)0x20008fe0);
  NONFAILING(*(uint64_t*)0x20b05018 = (uint64_t)0x2);
  NONFAILING(*(uint64_t*)0x20b05020 = (uint64_t)0x20f54000);
  NONFAILING(*(uint64_t*)0x20b05028 = (uint64_t)0x0);
  NONFAILING(*(uint32_t*)0x20b05030 = (uint32_t)0x0);
  NONFAILING(*(uint64_t*)0x20008fe0 = (uint64_t)0x20d5fff1);
  NONFAILING(*(uint64_t*)0x20008fe8 = (uint64_t)0xf);
  NONFAILING(*(uint64_t*)0x20008ff0 = (uint64_t)0x20f55000);
  NONFAILING(*(uint64_t*)0x20008ff8 = (uint64_t)0x69);
  NONFAILING(memcpy(
      (void*)0x20d5fff1,
      "\x05\x00\x00\x00\x8d\x13\x00\x00\x00\x00\x17\x14\xb7\x7e\xa6",
      15));
  NONFAILING(memcpy(
      (void*)0x20f55000,
      "\x12\x6f\x39\xb6\x5b\x4e\xed\x90\x77\xe0\x54\xbf\xb6\xb2\x41\xd7"
      "\x36\x5d\x58\xfa\xa8\x32\x7a\x6d\x25\x89\x01\x00\xdd\x00\xc5\x89"
      "\x07\xec\xc2\x76\x8d\x02\x00\x00\x00\x10\xb4\x27\xab\x6c\x2a\x41"
      "\xe2\x54\x47\xcc\x08\xca\x75\x2a\x03\x89\xd3\x04\x71\x3f\x75\x90"
      "\xf4\xda\xc6\xd9\xa7\x50\xff\xe8\x3e\xff\xcd\x31\x1b\xa2\x0a\xee"
      "\x8a\x72\x6b\xda\x74\x75\x92\xbf\xad\xf0\x71\xb9\xb7\x70\x04\xbb"
      "\x58\x40\x7d\x50\x14\x6b\xd7\xc2\x60",
      105));
  r[24] = execute_syscall(__NR_sendmsg, r[1], 0x20b05000ul, 0x0ul, 0, 0,
                          0, 0, 0, 0);
  return 0;
}

^ permalink raw reply

* Re: Synopsys Ethernet QoS Driver
From: Ozgur Karatas @ 2016-11-22  8:38 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Joao Pinto, Rayagond Kokatanur, Rabin Vincent
  Cc: andreas.irestal@axis.com, alexandre.torgue@st.com,
	saeedm@mellanox.com, netdev, linux-kernel@vger.kernel.org,
	CARLOS.PALMINHA@synopsys.com, idosch@mellanox.com, mued dib,
	jiri@mellanox.com, Jeff Kirsher, David Miller,
	linux-arm-kernel@lists.infradead.org, lars.persson@axis.com
In-Reply-To: <937252db-9538-2cf6-c8fa-82b558531c51@st.com>

Hello all,

I think ethtool and mdio don't work because the tools don't support QoS, right?

Maybe a new API is needed. I'm looking at the dwceqos code, but using the "tc" tools looks like a good idea.

I hope this helps.

Regards,

Ozgur

21.11.2016, 16:38, "Giuseppe CAVALLARO" <peppe.cavallaro@st.com>:
> Hello Joao
>
> On 11/21/2016 2:48 PM, Joao Pinto wrote:
>>  Synopsys QoS IP is a separate hardware component, so it should be reusable by
>>  all implementations using it and so have its own "core driver" and platform +
>>  pci glue drivers. This is necessary for example in hardware validation, where
>>  you prototype an IP and instantiate its drivers and test it.
>>
>>  Was there a strong reason to integrate QoS features directly in stmmac and not
>>  in synopsys/dwc_eth_qos.*?
>
> We decided to extend stmmac to support QoS for several reasons; for
> example, the common APIs that the driver already exposed are also
> suitable for other SYNP chips. In addition, PTP, EEE, S/RGMII and MMC
> could be shared among different chips with minimal effort. This meant
> a lot of the code was already available.
>
> The net-core, Ethtool and mdio parts were certainly reused. The same
> goes for the glue logic files.
> The latter made it easy to bring up new platforms, also because
> stmmac uses the HW cap register to auto-configure many parts of the
> MAC core, DMA and modules. This helped many users, AFAIK.
>
> For validation purposes, in my experience, stmmac helped a lot
> because people used the same code to validate different HW, and it
> was easy to switch from one platform to another to check whether the
> support was OK or whether a regression had been introduced.
> This is important for complex features like PTP or EEE.
>
> Hoping this can help.
>
> Do not hesitate to contact me for further details
>
> peppe


^ permalink raw reply

