Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] net/phy: add trace events for mdio accesses
From: Steven Rostedt @ 2016-11-22 14:55 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Florian Fainelli, Ingo Molnar, netdev
In-Reply-To: <20161122100127.5940-1-uwe@kleine-koenig.org>

On Tue, 22 Nov 2016 11:01:27 +0100
Uwe Kleine-König <uwe@kleine-koenig.org> wrote:

> diff --git a/include/trace/events/mdio.h b/include/trace/events/mdio.h
> new file mode 100644
> index 000000000000..468e2d095d19
> --- /dev/null
> +++ b/include/trace/events/mdio.h
> @@ -0,0 +1,42 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM mdio
> +
> +#if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_MDIO_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT_CONDITION(mdio_access,
> +
> +	TP_PROTO(struct mii_bus *bus, int read,
> +		 unsigned addr, unsigned regnum, u16 val, int err),
> +
> +	TP_ARGS(bus, read, addr, regnum, val, err),
> +
> +	TP_CONDITION(err >= 0),
> +
> +	TP_STRUCT__entry(
> +		__array(char, busid, MII_BUS_ID_SIZE)
> +		__field(int, read)

read is just a 0 or 1. What about making it a char? That way we can
pack this better. If I'm not mistaken, MII_BUS_ID_SIZE is (20 - 3) or
17. If read is just one byte, then it can fit in one of those three
bytes, and you save 4 extra bytes (assuming addr will be 4 byte
aligned).

-- Steve


> +		__field(unsigned, addr)
> +		__field(unsigned, regnum)
> +		__field(u16, val)
> +	),
> +
> +	TP_fast_assign(
> +		strncpy(__entry->busid, bus->id, MII_BUS_ID_SIZE);
> +		__entry->read = read;
> +		__entry->addr = addr;
> +		__entry->regnum = regnum;
> +		__entry->val = val;
> +	),
> +
> +	TP_printk("%s %-5s phy:0x%02x reg:0x%02x val:0x%04hx",
> +		  __entry->busid, __entry->read ? "read" : "write",
> +		  __entry->addr, __entry->regnum, __entry->val)
> +);
> +
> +#endif /* if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ) */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

^ permalink raw reply

* Re: [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
From: Valo, Kalle @ 2016-11-22 14:55 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: k.eugene.e@gmail.com, Andy Gross, wcn36xx@lists.infradead.org,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org
In-Reply-To: <20161118183541.GI28340@tuxbot>

Bjorn Andersson <bjorn.andersson@linaro.org> writes:

> On Wed 16 Nov 10:49 PST 2016, Kalle Valo wrote:
>
>> Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>> > The correct include file for getting errno constants and ERR_PTR() is
>> > linux/err.h, rather than linux/errno.h, so fix the include.
>> > 
>> > Fixes: e8b123e60084 ("soc: qcom: smem_state: Add stubs for disabled smem_state")
>> > Acked-by: Andy Gross <andy.gross@linaro.org>
>> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>> 
>> For some reason this fails to compile now. Can you take a look, please?
>> 
>> ERROR: "qcom_wcnss_open_channel" [drivers/net/wireless/ath/wcn36xx/wcn36xx.ko] undefined!
>> make[1]: *** [__modpost] Error 1
>> make: *** [modules] Error 2
>> 
>> 5 patches set to Changes Requested.
>> 
>> 9429045 [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
>> 9429047 [v5,2/5] wcn36xx: Transition driver to SMD client
>
> This patch was updated with the necessary depends in Kconfig to catch
> this exact issue and when I pull in your .config (which has QCOM_SMD=n,
> QCOM_WCNSS_CTRL=n and WCN36XX=y) I can build this just fine.
>
> I've tested the various combinations and it seems to work fine. Do you
> have any other patches in your tree?

This was with the pending branch of my ath.git tree. There are other
wireless patches (ath10k etc) but I would guess they don't affect here.

> Any stale objects?

Not sure what you mean with this question, but I didn't run 'make clean'
if that's what you are asking.

> Would you mind retesting this, before I invest more time in trying to
> reproduce the issue you're seeing?

Sure, I'll take a look but that might take few days.

-- 
Kalle Valo

^ permalink raw reply

* [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it
From: Roi Dayan @ 2016-11-22 14:25 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Cong Wang, Or Gerlitz, Roi Dayan, Cong Wang

tp->root is being allocated in init() time and kfreed in destroy()
however it is being dereferenced in classify() path.

We could be in classify() path after destroy() was called and thus 
tp->root is null. Verifying if tp->root is null in classify() path 
is enough because it's being freed with kfree_rcu() and classify() 
path is under rcu_read_lock().

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Cc: Cong Wang <cwang@twopensource.com>
---

Hi Cong, all

As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: destroy 
proto tp when all filters are gone"). This patch provides a fix only for cls_flower where 
I succeeded in reproducing the issue. Cong, if you can/want to come up with a fix that
will be applicable for all the others classifiners, I am fine with that.

Thanks,
Roi


 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..88a26c4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	struct fl_flow_key skb_mkey;
 	struct ip_tunnel_info *info;
 
-	if (!atomic_read(&head->ht.nelems))
+	if (!head || !atomic_read(&head->ht.nelems))
 		return -1;
 
 	fl_clear_masked_range(&skb_key, &head->mask);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next] marvell: mark mvneta and mvpp2 32-bit only
From: Arnd Bergmann @ 2016-11-22 14:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Gregory CLEMENT, Florian Fainelli, Marcin Wojtas,
	netdev, linux-kernel

Both of these drivers won't work on 64-bit architectures unless they
are redesigned, since they store a virtual address pointer in a 32-bit
field of the descriptors:

drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct':
drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init':
drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow]

This limits the COMPILE_TEST option for the two drivers again to
only build them on 32-bit. This seems nicer than shutting up the
warnings, in case we ever actually want to use them on 64-bit,
as the warnings indicate which parts of the driver are currently
broken there.

Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/marvell/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index d74d4e6f0b34..66fd9dbb2ca7 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -58,6 +58,7 @@ config MVNETA
 	tristate "Marvell Armada 370/38x/XP network interface support"
 	depends on PLAT_ORION || COMPILE_TEST
 	depends on HAS_DMA
+	depends on !64BIT
 	select MVMDIO
 	select FIXED_PHY
 	---help---
@@ -81,6 +82,7 @@ config MVPP2
 	tristate "Marvell Armada 375 network interface support"
 	depends on MACH_ARMADA_375 || COMPILE_TEST
 	depends on HAS_DMA
+	depends on !64BIT
 	select MVMDIO
 	---help---
 	  This driver supports the network interface units in the
-- 
2.9.0

^ permalink raw reply related

* [PATCH net] net/mlx4_en: Free netdev resources under state lock
From: Tariq Toukan @ 2016-11-22 14:20 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Sagi Grimberg,
	Tariq Toukan, Brenden Blanco

Make sure mlx4_en_free_resources is called under the netdev state lock.
This is needed since RCU dereference of XDP prog should be protected.

Fixes: 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sagi Grimberg <sagi@grimberg.me>
CC: Brenden Blanco <bblanco@plumgrid.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 3a47e83d3e07..a60f635da78b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -129,6 +129,9 @@ static enum mlx4_net_trans_rule_id mlx4_ip_proto_to_trans_rule_id(u8 ip_proto)
 	}
 };
 
+/* Must not acquire state_lock, as its corresponding work_sync
+ * is done under it.
+ */
 static void mlx4_en_filter_work(struct work_struct *work)
 {
 	struct mlx4_en_filter *filter = container_of(work,
@@ -2189,13 +2192,13 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 	mutex_lock(&mdev->state_lock);
 	mdev->pndev[priv->port] = NULL;
 	mdev->upper[priv->port] = NULL;
-	mutex_unlock(&mdev->state_lock);
 
 #ifdef CONFIG_RFS_ACCEL
 	mlx4_en_cleanup_filters(priv);
 #endif
 
 	mlx4_en_free_resources(priv);
+	mutex_unlock(&mdev->state_lock);
 
 	kfree(priv->tx_ring);
 	kfree(priv->tx_cq);
-- 
1.8.3.1

^ permalink raw reply related

* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-22 14:16 UTC (permalink / raw)
  To: Lars Persson, Giuseppe CAVALLARO
  Cc: Joao Pinto, Rayagond Kokatanur, Rabin Vincent, mued dib,
	David Miller, Jeff Kirsher, jiri@mellanox.com,
	saeedm@mellanox.com, idosch@mellanox.com, netdev,
	linux-kernel@vger.kernel.org, CARLOS.PALMINHA@synopsys.com,
	Andreas Irestål, alexandre.torgue@st.com,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <7c7798b5-8cd4-ba99-f526-22d3e06e05db@synopsys.com>


Hi Lars and Peppe,

On 21-11-2016 16:11, Joao Pinto wrote:
> On 21-11-2016 15:43, Lars Persson wrote:
>>
>>
>>> 21 nov. 2016 kl. 16:06 skrev Joao Pinto <Joao.Pinto@synopsys.com>:
>>>
>>>> On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
>>>>> On 11/21/2016 2:28 PM, Lars Persson wrote:
>>>>>
>>>>>
>>>>>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>>>>>
>>>>>> Hello Joao
>>>>>>
>>>>>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>>>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>>>>>> For now we are interesting in improving the synopsys QoS driver under
>>>>>>>>>> /nect/ethernet/synopsys. For now the driver structure consists of a
>>>>>>>>>> single file
>>>>>>>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and
> 

snip (...)

>>>>>>
>>>>>> Peppe
>>>>>>
>>>>>
>>>>> Hello Joao and others,
>>>>>
>>>
>>> Hi Lars,
>>>
>>>>> As the maintainer of dwc_eth_qos.c I prefer also that we put efforts on the
>>>>> most mature driver, the stmmac.
>>>>>
>>>>> I hope that the code can migrate into an ethernet/synopsys folder to keep the
>>>>> convention of naming the folder after the vendor. This makes it easy for
>>>>> others to find the driver.
>>>>>
>>>>> The dwc_eth_qos.c will eventually be removed and its DT binding interface can
>>>>> then be implemented in the stmmac driver.
>>>
>>> So your ideia is to pick the ethernet/stmmac and rename it to ethernet/synopsys
>>> and try to improve the structure and add the missing QoS features to it?
>>
>> Indeed this is what I prefer.
> 
> Ok, it makes sense.
> Just for curiosity the target setup is the following:
> https://www.youtube.com/watch?v=8V-LB5y2Cos
> but instead of using internal drivers, we desire to use mainline drivers only.
> 
> Thanks!

Regarding this subject, I am thinking of making the following adaption:

a) delete ethernet/synopsys
b) rename ethernet/stmicro/stmmac to ethernet/synopsys

and send you a patch for you to evaluate. Both agree with the approach?
To have a new work base would be important, because I will add to the "new"
structure some missing QoS features like Multichannel support, CBS and later TSN.

Thanks.

> 
>>
>>>
>>>>
>>>> Thanks Lars, I will be happy to support all you on this transition
>>>> and I agree on renaming all.
>>>>
>>>> peppe
>>>>
>>>>
>>>>> - Lars
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>>>>>
>>>>>>>>> The former only supports 4.x of the hardware.
>>>>>>>>>
>>>>>>>>> The later supports 4.x and 3.x and already has a platform glue driver
>>>>>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>>>>>> with several features not present in the former (for example: TX/RX
>>>>>>>>> interrupt coalescing, EEE, PTP).
>>>>>>>>>
>>>>>>>>> Have you evaluated both drivers?  Why have you decided to work on the
>>>>>>>>> former rather than the latter?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
> 

^ permalink raw reply

* Re: [PATCH] iproute2: Nr. of packets and octets for macsec tx stats were swapped.
From: Sabrina Dubroca @ 2016-11-22 13:56 UTC (permalink / raw)
  To: Daniel.Hopf; +Cc: netdev
In-Reply-To: <OFDC737720.EA272892-ONC1258073.00458CA2-C1258073.0049AC0D@continental-corporation.com>

Hi Daniel,

Thanks for fixing this. I noticed it some time ago but it seems
I forgot to send the patch :(

Acked-by: Sabrina Dubroca <sd@queasysnail.net>


Your subject line should be:

Subject: [PATCH iproute2] macsec: Nr.of packets and octets for macsec tx stats were swapped.

with "iproute2" between the brackets.

2016-11-22, 14:24:40 +0100, Daniel.Hopf@continental-corporation.com wrote:
> Signed-off-by: Daniel Hopf <daniel.hopf@continental-corporation.com>
> ---
>  ip/ipmacsec.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
> index c9252bb..aa89a00 100644
> --- a/ip/ipmacsec.c
> +++ b/ip/ipmacsec.c
> @@ -634,10 +634,10 @@ static void print_one_stat(const char **names, 
> struct rtattr **attr, int idx,
>  }
> 
>  static const char *txsc_stats_names[NUM_MACSEC_TXSC_STATS_ATTR] = {
> -       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = 
> "OutOctetsProtected",
> -       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = 
> "OutOctetsEncrypted",
> -       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
> "OutPktsProtected",
> -       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
> "OutPktsEncrypted",
> +       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutPktsProtected",
> +       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutPktsEncrypted",
> +       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
> "OutOctetsProtected",
> +       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
> "OutOctetsEncrypted",
>  };

Your patch was corrupted, probably by your email client, you have
extra newlines everywhere.
Can you send a v2 of this patch? Thanks!

-- 
Sabrina

^ permalink raw reply

* [PATCH] iproute2: Nr. of packets and octets for macsec tx stats were swapped.
From: Daniel.Hopf @ 2016-11-22 13:24 UTC (permalink / raw)
  To: netdev

Signed-off-by: Daniel Hopf <daniel.hopf@continental-corporation.com>
---
 ip/ipmacsec.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index c9252bb..aa89a00 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -634,10 +634,10 @@ static void print_one_stat(const char **names, 
struct rtattr **attr, int idx,
 }

 static const char *txsc_stats_names[NUM_MACSEC_TXSC_STATS_ATTR] = {
-       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = 
"OutOctetsProtected",
-       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = 
"OutOctetsEncrypted",
-       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
"OutPktsProtected",
-       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
"OutPktsEncrypted",
+       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutPktsProtected",
+       [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutPktsEncrypted",
+       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
"OutOctetsProtected",
+       [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
"OutOctetsEncrypted",
 };

 static void print_txsc_stats(const char *prefix, struct rtattr *attr)

^ permalink raw reply related

* Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc
From: Mike Manning @ 2016-11-22 13:17 UTC (permalink / raw)
  To: Hannes Frederic Sowa, netdev
In-Reply-To: <718b6520-686e-cab3-1c8d-9e1de4cb0dbd@stressinduktion.org>

On 11/22/2016 12:18 PM, Hannes Frederic Sowa wrote:
> On 22.11.2016 11:34, Mike Manning wrote:
>> Bursts of failures may occur when adding IPv6 routes via Netlink to the
>> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
>> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
>> extended the area map in time for the atomic allocation using percpu.c:
>> pcpu_alloc() to succeed. This results in route additions failing with
>> an -ENOMEM error.
>>
>> While the sender of the Netlink msg to add this route could check for
>> an ACK and retransmit in the case of an -ENOMEM error, the latter
>> should not occur in the first place if there is plenty of memory. The
>> solution is to use non-atomic alloc for rt6_info instead. While the
>> client may now be blocked for longer depending on the state of the
>> chunk being added to, this work has to be incurred at some point.
>>
>> The alternative solution would be to provide configurable parameters
>> e.g. via sysctl in percpu.c for default map size, low/high empty pages
>> and map margins. For this solution, the map margin sizes need to be
>> stored per chunk, as large margins cannot be used if the dynamic early
>> slots map size is in use. This is not a preferred solution though, as
>> it requires tuning of these parameters to provide sufficient margins to
>> avoid -ENOMEM errors depending on system requirements.
>>
>> Signed-off-by: Mike Manning <mmanning@brocade.com>
>> ---
>>  net/ipv6/route.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 1b57e11..0e9bb76 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>>  	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>>  
>>  	if (rt) {
>> -		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
>> +		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>>  		if (rt->rt6i_pcpu) {
>>  			int cpu;
> 
> Nak, this doesn't work, as ip6_dst_alloc must be callable from
> non-blocking code paths unfortunately.
> 
> 

Thanks for the prompt reply.

Do you consider the alternative of providing configurable parameters for per-cpu
alloc as viable, or is there a better way of dealing with this?

While I have tested such param changes under scale as avoiding the -ENOMEM errors, it
would be good to get confirmation that this approach is acceptable prior to coding the
sysctl handling for these.

^ permalink raw reply

* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Mark Lord @ 2016-11-22 13:12 UTC (permalink / raw)
  To: Hayes Wang, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <b9809516-d036-bfc3-b7a3-6563033ec957@pobox.com>

On 16-11-18 07:03 AM, Mark Lord wrote:
> On 16-11-18 02:57 AM, Hayes Wang wrote:
> ..
>> Besides, the maximum data length which the RTL8152 would send to
>> the host is 16KB. That is, if the agg_buf_sz is 16KB, the host
>> wouldn't split it. However, you still see problems for it.
>
> How does the RTL8152 know that the limit is 16KB,
> rather than some other number?  Is this a hardwired number
> in the hardware, or is it a parameter that the software
> sends to the chip during initialization?
..
> The first issue is that a packet sometimes begins in one URB,
> and completes in the next URB, without an rx_desc at the start
> of the second URB.  This I have already reported earlier.

Long run tests over the weekend, with the invalidate_dcache_range() call
before the inner loop of r8152_rx_bottom(), turned up a few instances
where packets were truncated inside a 16384 byte URB buffer, without filling the URB.

[10.293228] r8152_rx_bottom: 4278 corrupted urb: head=9d210000 urb_offset=2856/3376 pkt_len(1518) exceeds remainder(496)
[10.304523] r8152_dump_rx_desc: 044805ee 40080000 006005dc 06020000 00000000 00000000 rx_len=1518
..
[   16.660431] r8152_rx_bottom: 7802 corrupted urb: head=9d1f8000 urb_offset=1544/2064 pkt_len(1518) exceeds remainder(496)
[   16.671719] r8152_dump_rx_desc: 044805ee 40480000 004005dc 46020006 00000000 00000000 rx_len=1518

The r8152.c driver attempted to build skb's for the entire packet size,
even though the 1518-byte packets had only 496-bytes of data in the URB.
It is not clear what the chip did with the rest of the packets in question,
but the next URBs in each case began with a new/real rx_desc and new packet.

There were also unconnected events during the test runs where the
test code noticed totally invalid rx_desc structs in the middles of URBs.
The stock driver would again have attempted to treat those as "valid" (ugh).

..
[   10.273906] r8152_check_rx_desc: rx_desc looks bad.
[   10.279012] r8152_rx_bottom: 4338 corrupted urb. head=9d210000 urb_offset=2856/3376 len_used=2880
[   10.288196] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 202f3a30 rx_len=12857

..
[    7.184565] r8152_check_rx_desc: rx_desc looks bad.
[    7.189657] r8152_rx_bottom: 1678 corrupted urb. head=9d210000 urb_offset=2856/3376 len_used=2880
[    7.198852] r8152_dump_rx_desc: a1388402 803c9001 84380810 a67c5c4c a77c782b c64c782b rx_len=1026
..
[   10.351251] r8152_check_rx_desc: rx_desc looks bad.
[   10.356356] r8152_rx_bottom: 4397 corrupted urb. head=9d20c000 urb_offset=4400/7984 len_used=4424
[   10.365543] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 202f3a30 rx_len=12857
..
[   10.518119] r8152_check_rx_desc: rx_desc looks bad.
[   10.523204] r8152_rx_bottom: 4458 corrupted urb. head=9d210000 urb_offset=4400/7984 len_used=4424
[   10.532416] r8152_dump_rx_desc: 54544120 6e3d5352 636f6c6f 65762c6b 343d7372 6464612c rx_len=16672
..

> But the driver, as written, sometimes accesses bytes outside
> of the 16KB URB buffer, because it trusts the non-existent
> rx_desc in these cases, and also because it accesses bytes
> from the rx_desc without first checking whether there is
> sufficient remaining space in the URB to hold an rx_desc.
>
> These incorrect accesses sometimes touch memory outside
> of the URB buffer.  Since the driver allocates all of its
> rx URB buffers at once, they are highly likely to be
> physically (and therefore virtually) adjacent in memory.
>
> So mistakenly accessing beyond the end of one buffer will
> often result in a read from memory of the next URB buffer.
> Which causes a portion of it to be loaded in the the D-cache.
>
> When that URB is subsequently filled by DMA, there then exists
> a data-consistency issue:  the D-cache contains stale information
> from before the latest DMA cycle.
>
> So this explains the strange memory behaviour observed earlier on.
> When I add a call to invalidate_dcache_range() to the driver
> just before it begins examining a new rx URB, the problems go away.
> So this confirms the observations.
>
> Using non-cacheable RAM also makes the problem go away.
> But neither is a fix for the real buffer overrun accesses in the driver.
>
> Fix the "packet spans URBs" bug, and fix the driver to ALWAYS
> test lengths/ranges before accessing the actual buffer,
> and everything should begin working reliably.

^ permalink raw reply

* Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration
From: Jiri Pirko @ 2016-11-22 12:49 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, bridge, stephen, vivien.didelot, andrew, jiri,
	idosch
In-Reply-To: <20161121190925.14530-1-f.fainelli@gmail.com>

Mon, Nov 21, 2016 at 08:09:22PM CET, f.fainelli@gmail.com wrote:
>Hi all,
>
>This patch series allows using the bridge master interface to configure
>an Ethernet switch port's CPU/management port with different VLAN attributes than
>those of the bridge downstream ports/members.
>
>Jiri, Ido, Andrew, Vivien, please review the impact on mlxsw and mv88e6xxx, I
>tested this with b53 and a mockup DSA driver.

Patchset looks fine to me.

>
>Open questions:
>
>- if we have more than one bridge on top of a physical switch, the driver
>  should keep track of that and verify that we are not going to change
>  the CPU port VLAN attributes in a way that results in incompatible settings
>  to be applied

Ack. In mlxsw this is tracked


>
>- if the default behavior is to have all VLANs associated with the CPU port
>  be ingressing/egressing tagged to the CPU, is this really useful?
>
>Florian Fainelli (3):
>  net: bridge: Allow bridge master device to configure switch CPU port
>  net: dsa: Propagate VLAN add/del to CPU port(s)
>  net: dsa: b53: Remove CPU port specific VLAN programming
>
> drivers/net/dsa/b53/b53_common.c | 22 ++++++--------------
> net/bridge/br_vlan.c             | 28 ++++++++++++++++++++++---
> net/dsa/slave.c                  | 45 +++++++++++++++++++++++++++++-----------
> 3 files changed, 64 insertions(+), 31 deletions(-)
>
>-- 
>2.9.3
>

^ permalink raw reply

* Re: [PATCH] ipv6:ipv6_pinfo dereferenced after NULL check
From: Hannes Frederic Sowa @ 2016-11-22 12:26 UTC (permalink / raw)
  To: Manjeet Pawar, davem, kuznet, jmorris, yoshfuji, kaber, netdev,
	linux-kernel
  Cc: pankaj.m, ajeet.y, Rohit Thapliyal
In-Reply-To: <1479796024-39418-1-git-send-email-manjeet.p@samsung.com>

On 22.11.2016 07:27, Manjeet Pawar wrote:
> From: Rohit Thapliyal <r.thapliyal@samsung.com>
> 
> np checked for NULL and then dereferenced. It should be modified
> for NULL case.
> 
> Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com>
> Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
> ---
>  net/ipv6/ip6_output.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 1dfc402..c2afa14 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -205,14 +205,15 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
>  	/*
>  	 *	Fill in the IPv6 header
>  	 */
> -	if (np)
> +	if (np) {
>  		hlimit = np->hop_limit;
> +		ip6_flow_hdr(
> +					hdr, tclass, ip6_make_flowlabel(
> +					net, skb, fl6->flowlabel,
> +					np->autoflowlabel, fl6));
> +	}
>  	if (hlimit < 0)
>  		hlimit = ip6_dst_hoplimit(dst);
>  
> -	ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
> -				np->autoflowlabel, fl6));
> -
>  	hdr->payload_len = htons(seg_len);
>  	hdr->nexthdr = proto;
>  	hdr->hop_limit = hlimit;
> 


We always should initialize hdr and not skip the ip6_flow_hdr call.

Do you saw a bug or did you find this by code review? I wonder if np can
actually be NULL at this point. Maybe we can just eliminate the NULL check.

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc
From: Hannes Frederic Sowa @ 2016-11-22 12:18 UTC (permalink / raw)
  To: Mike Manning, netdev
In-Reply-To: <1479810840-19122-1-git-send-email-mmanning@brocade.com>

On 22.11.2016 11:34, Mike Manning wrote:
> Bursts of failures may occur when adding IPv6 routes via Netlink to the
> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
> extended the area map in time for the atomic allocation using percpu.c:
> pcpu_alloc() to succeed. This results in route additions failing with
> an -ENOMEM error.
> 
> While the sender of the Netlink msg to add this route could check for
> an ACK and retransmit in the case of an -ENOMEM error, the latter
> should not occur in the first place if there is plenty of memory. The
> solution is to use non-atomic alloc for rt6_info instead. While the
> client may now be blocked for longer depending on the state of the
> chunk being added to, this work has to be incurred at some point.
> 
> The alternative solution would be to provide configurable parameters
> e.g. via sysctl in percpu.c for default map size, low/high empty pages
> and map margins. For this solution, the map margin sizes need to be
> stored per chunk, as large margins cannot be used if the dynamic early
> slots map size is in use. This is not a preferred solution though, as
> it requires tuning of these parameters to provide sufficient margins to
> avoid -ENOMEM errors depending on system requirements.
> 
> Signed-off-by: Mike Manning <mmanning@brocade.com>
> ---
>  net/ipv6/route.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 1b57e11..0e9bb76 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>  	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>  
>  	if (rt) {
> -		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
> +		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>  		if (rt->rt6i_pcpu) {
>  			int cpu;

Nak, this doesn't work, as ip6_dst_alloc must be callable from
non-blocking code paths unfortunately.

^ permalink raw reply

* Re: mlx5 "syndrome" errors in kernel log
From: Saeed Mahameed @ 2016-11-22 12:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Saeed Mahameed, Tariq Toukan, netdev@vger.kernel.org
In-Reply-To: <20161122105944.0c64e77b@redhat.com>

On Tue, Nov 22, 2016 at 11:59 AM, Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
> Hi Saeed,
>
> I'm seeing below dmesg errors, after pulling net-next at commit
> e796f49d826aad, before I was not seeing these errors, where my tree was
> based on top of commit 319b0534b95.
>
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core 0000:02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
>
>
> Listing my firmware version:
>
>  $ ethtool -i mlx5p2
>  driver: mlx5_core
>  version: 3.0-1 (January 2015)
>  firmware-version: 12.12.1240

Hi Jesper,

Seems like this FW version doesn't support a new FW command introduced
by "net/mlx5e: Expose PCIe statistics to ethtool"

I suggest to upgrade FW, but if you don't know how to do it or in a
hurry, please go ahead and revert "
   net/mlx5e: Expose PCIe statistics to ethtool"

I will need to introduce a new capability bit as a permanent solution
and a fix for the above patch.

Thanks for the report,
We will handle this.

^ permalink raw reply

* net/udp: bug in skb_pull_rcsum
From: Andrey Konovalov @ 2016-11-22 11:58 UTC (permalink / raw)
  To: samanthakumar, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, Eric Dumazet, syzkaller

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

A reproducer is attached.

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:3029!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3854 Comm: a.out Not tainted 4.9.0-rc6+ #431
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880068472c00 task.stack: ffff880063ec8000
RIP: 0010:[<ffffffff82b8fd85>]  [<ffffffff82b8fd85>]
skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
RSP: 0018:ffff880063ecf660  EFLAGS: 00010297
RAX: ffff880068472c00 RBX: ffff880065a2da00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000d RDI: ffffed000c7d9ec0
RBP: ffff880063ecf690 R08: 1ffff1000d08e67e R09: 1ffff1000cb45b50
R10: dffffc0000000000 R11: 0000000000000000 R12: ffff880065a2da80
R13: 0000000000000008 R14: ffff880065a2dad8 R15: 0000000000000001
FS:  00007fbb006497c0(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020032fe0 CR3: 00000000636d9000 CR4: 00000000000006e0
Stack:
 ffff88006bfbb948 ffff880065a2da00 ffff880064160000 1ffff1000cb45b52
 0000000000000000 1ffff1000d4d3933 ffff880063ecf6f8 ffffffff83354ced
 00000000fffffe00 ffff880065a2da90 ffff880063ecf6c0 ffffffff00000001
Call Trace:
 [<     inline     >] udp_csum_pull_header ./include/net/udp.h:166
 [<ffffffff83354ced>] udpv6_queue_rcv_skb+0x37d/0x17b0 net/ipv6/udp.c:625
 [<     inline     >] sk_backlog_rcv ./include/net/sock.h:874
 [<ffffffff82b7eec6>] __release_sock+0x126/0x3a0 net/core/sock.c:2046
 [<ffffffff82b7f199>] release_sock+0x59/0x1c0 net/core/sock.c:2504
 [<ffffffff8334fc50>] udpv6_sendmsg+0x1310/0x24a0 net/ipv6/udp.c:1273
 [<ffffffff83174fa7>] inet_sendmsg+0x317/0x4e0 net/ipv4/af_inet.c:734
 [<     inline     >] sock_sendmsg_nosec net/socket.c:621
 [<ffffffff82b7176c>] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [<ffffffff82b719d1>] sock_write_iter+0x221/0x3b0 net/socket.c:829
 [<ffffffff8151e69b>] do_iter_readv_writev+0x2bb/0x3f0 fs/read_write.c:695
 [<ffffffff81520501>] do_readv_writev+0x431/0x730 fs/read_write.c:872
 [<ffffffff81520d2f>] vfs_writev+0x8f/0xc0 fs/read_write.c:911
 [<ffffffff81520e41>] do_writev+0xe1/0x240 fs/read_write.c:944
 [<     inline     >] SYSC_writev fs/read_write.c:1017
 [<ffffffff81523ca7>] SyS_writev+0x27/0x30 fs/read_write.c:1014
 [<ffffffff83fc4381>] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 89 f8 49 c1 e8 03 47 0f b6 14 08 45 84 d2 74 0a 41 80 fa 03 0f
8e cf 00 00 00 80 a3 91 00 00 00 f9 e9 43 ff ff ff e8 3b 79 79 fe <0f>
0b e8 34 79 79 fe 0f 0b e8 2d 79 79 fe 48 8b 7d d0 31 d2 44
RIP  [<ffffffff82b8fd85>] skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
 RSP <ffff880063ecf660>
---[ end trace a5d5d2cef6a25ecb ]---
==================================================================

[-- Attachment #2: skb-pull-bug-poc.c --]
[-- Type: text/x-csrc, Size: 7858 bytes --]

// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_bind
#define __NR_bind 49
#endif
#ifndef __NR_sendmsg
#define __NR_sendmsg 46
#endif
#ifndef __NR_writev
#define __NR_writev 20
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_syz_fuse_mount
#define __NR_syz_fuse_mount 1000004
#endif
#ifndef __NR_syz_fuseblk_mount
#define __NR_syz_fuseblk_mount 1000005
#endif
#ifndef __NR_syz_open_dev
#define __NR_syz_open_dev 1000002
#endif
#ifndef __NR_syz_open_pts
#define __NR_syz_open_pts 1000003
#endif
#ifndef __NR_syz_test
#define __NR_syz_test 1000001
#endif

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <net/if_arp.h>

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  exit(sig);
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                \
  {                                                                    \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
    if (_setjmp(segv_env) == 0) {                                      \
      __VA_ARGS__;                                                     \
    }                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
  }

static uintptr_t syz_open_dev(uintptr_t a0, uintptr_t a1, uintptr_t a2)
{
  if (a0 == 0xc || a0 == 0xb) {

    char buf[128];
    sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block",
            (uint8_t)a1, (uint8_t)a2);
    return open(buf, O_RDWR, 0);
  } else {

    char buf[1024];
    char* hash;
    strncpy(buf, (char*)a0, sizeof(buf));
    buf[sizeof(buf) - 1] = 0;
    while ((hash = strchr(buf, '#'))) {
      *hash = '0' + (char)(a1 % 10);
      a1 /= 10;
    }
    return open(buf, a2, 0);
  }
}

static uintptr_t syz_open_pts(uintptr_t a0, uintptr_t a1)
{

  int ptyno = 0;
  if (ioctl(a0, TIOCGPTN, &ptyno))
    return -1;
  char buf[128];
  sprintf(buf, "/dev/pts/%d", ptyno);
  return open(buf, a1, 0);
}

static uintptr_t syz_fuse_mount(uintptr_t a0, uintptr_t a1,
                                uintptr_t a2, uintptr_t a3,
                                uintptr_t a4, uintptr_t a5)
{

  uint64_t target = a0;
  uint64_t mode = a1;
  uint64_t uid = a2;
  uint64_t gid = a3;
  uint64_t maxread = a4;
  uint64_t flags = a5;

  int fd = open("/dev/fuse", O_RDWR);
  if (fd == -1)
    return fd;
  char buf[1024];
  sprintf(buf, "fd=%d,user_id=%ld,group_id=%ld,rootmode=0%o", fd,
          (long)uid, (long)gid, (unsigned)mode & ~3u);
  if (maxread != 0)
    sprintf(buf + strlen(buf), ",max_read=%ld", (long)maxread);
  if (mode & 1)
    strcat(buf, ",default_permissions");
  if (mode & 2)
    strcat(buf, ",allow_other");
  syscall(SYS_mount, "", target, "fuse", flags, buf);

  return fd;
}

static uintptr_t syz_fuseblk_mount(uintptr_t a0, uintptr_t a1,
                                   uintptr_t a2, uintptr_t a3,
                                   uintptr_t a4, uintptr_t a5,
                                   uintptr_t a6, uintptr_t a7)
{

  uint64_t target = a0;
  uint64_t blkdev = a1;
  uint64_t mode = a2;
  uint64_t uid = a3;
  uint64_t gid = a4;
  uint64_t maxread = a5;
  uint64_t blksize = a6;
  uint64_t flags = a7;

  int fd = open("/dev/fuse", O_RDWR);
  if (fd == -1)
    return fd;
  if (syscall(SYS_mknodat, AT_FDCWD, blkdev, S_IFBLK, makedev(7, 199)))
    return fd;
  char buf[256];
  sprintf(buf, "fd=%d,user_id=%ld,group_id=%ld,rootmode=0%o", fd,
          (long)uid, (long)gid, (unsigned)mode & ~3u);
  if (maxread != 0)
    sprintf(buf + strlen(buf), ",max_read=%ld", (long)maxread);
  if (blksize != 0)
    sprintf(buf + strlen(buf), ",blksize=%ld", (long)blksize);
  if (mode & 1)
    strcat(buf, ",default_permissions");
  if (mode & 2)
    strcat(buf, ",allow_other");
  syscall(SYS_mount, blkdev, target, "fuseblk", flags, buf);

  return fd;
}

static uintptr_t execute_syscall(int nr, uintptr_t a0, uintptr_t a1,
                                 uintptr_t a2, uintptr_t a3,
                                 uintptr_t a4, uintptr_t a5,
                                 uintptr_t a6, uintptr_t a7,
                                 uintptr_t a8)
{
  switch (nr) {
  default:
    return syscall(nr, a0, a1, a2, a3, a4, a5);
  case __NR_syz_test:
    return 0;
  case __NR_syz_open_dev:
    return syz_open_dev(a0, a1, a2);
  case __NR_syz_open_pts:
    return syz_open_pts(a0, a1);
  case __NR_syz_fuse_mount:
    return syz_fuse_mount(a0, a1, a2, a3, a4, a5);
  case __NR_syz_fuseblk_mount:
    return syz_fuseblk_mount(a0, a1, a2, a3, a4, a5, a6, a7);
  }
}

long r[17];

int main()
{
  install_segv_handler();
  memset(r, -1, sizeof(r));
  r[0] = execute_syscall(__NR_mmap, 0x20000000ul, 0x35000ul, 0x3ul,
                         0x32ul, 0xfffffffffffffffful, 0x0ul, 0, 0, 0);
  r[1] = execute_syscall(__NR_socket, 0xaul, 0x2ul, 0x88ul, 0, 0, 0, 0,
                         0, 0);
  NONFAILING(memcpy(
      (void*)0x20034000,
      "\x0a\x00\x42\x42\x7a\x85\x86\xb4\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x01\xb4\x73\x17\x84\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00",
      128));
  r[3] = execute_syscall(__NR_bind, r[1], 0x20034000ul, 0x80ul, 0, 0, 0,
                         0, 0, 0);
  NONFAILING(*(uint64_t*)0x2002b000 = (uint64_t)0x20021f80);
  NONFAILING(*(uint32_t*)0x2002b008 = (uint32_t)0x80);
  NONFAILING(*(uint64_t*)0x2002b010 = (uint64_t)0x2001f000);
  NONFAILING(*(uint64_t*)0x2002b018 = (uint64_t)0x0);
  NONFAILING(*(uint64_t*)0x2002b020 = (uint64_t)0x20027000);
  NONFAILING(*(uint64_t*)0x2002b028 = (uint64_t)0x0);
  NONFAILING(*(uint32_t*)0x2002b030 = (uint32_t)0x0);
  NONFAILING(memcpy(
      (void*)0x20021f80,
      "\x0a\x00\x42\x42\xbe\xf9\xa8\xa3\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x01\x7f\xdb\x0d\xf1\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
      "\x00",
      128));
  r[12] = execute_syscall(__NR_sendmsg, r[1], 0x2002b000ul, 0x8000ul, 0,
                          0, 0, 0, 0, 0);
  NONFAILING(*(uint64_t*)0x20032fe0 = (uint64_t)0x20032000);
  NONFAILING(*(uint64_t*)0x20032fe8 = (uint64_t)0x1);
  NONFAILING(memcpy((void*)0x20032000, "\x0d", 1));
  r[16] = execute_syscall(__NR_writev, r[1], 0x20032fe0ul, 0x1ul, 0, 0,
                          0, 0, 0, 0);
  return 0;
}

^ permalink raw reply

* [PATCH] fec: Always write MAC address to controller register
From: Daniel Krüger @ 2016-11-22 11:24 UTC (permalink / raw)
  To: Fugang Duan; +Cc: netdev, Alexander Stein

On non-FEC_QUIRK_ENET_MAC types the MAC address needs to be set in FEC
during initialisation, if not done by bootloader already. Especially random
MACs or MAC addresses provided by kernel parameter must be set.

Signed-off-by: Daniel Krueger <daniel.krueger@systec-electronic.com>
---
 drivers/net/ethernet/freescale/fec_main.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2a03857..ea32fda 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -902,14 +902,14 @@ fec_restart(struct net_device *ndev)
 	/*
 	 * enet-mac reset will reset mac address registers too,
 	 * so need to reconfigure it.
+	 * On non-FEC_QUIRK_ENET_MAC types it won't be reset,
+	 * but it must be configured once at least (especially random MACs).
 	 */
-	if (fep->quirks & FEC_QUIRK_ENET_MAC) {
-		memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
-		writel((__force u32)cpu_to_be32(temp_mac[0]),
-		       fep->hwp + FEC_ADDR_LOW);
-		writel((__force u32)cpu_to_be32(temp_mac[1]),
-		       fep->hwp + FEC_ADDR_HIGH);
-	}
+	memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
+	writel((__force u32)cpu_to_be32(temp_mac[0]),
+	       fep->hwp + FEC_ADDR_LOW);
+	writel((__force u32)cpu_to_be32(temp_mac[1]),
+	       fep->hwp + FEC_ADDR_HIGH);
 
 	/* Clear any outstanding interrupt. */
 	writel(0xffffffff, fep->hwp + FEC_IEVENT);
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc
From: Mike Manning @ 2016-11-22 10:34 UTC (permalink / raw)
  To: netdev

Bursts of failures may occur when adding IPv6 routes via Netlink to the
kernel when testing under scale (e.g. 500 routes lost out of 1M). The
reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
extended the area map in time for the atomic allocation using percpu.c:
pcpu_alloc() to succeed. This results in route additions failing with
an -ENOMEM error.

While the sender of the Netlink msg to add this route could check for
an ACK and retransmit in the case of an -ENOMEM error, the latter
should not occur in the first place if there is plenty of memory. The
solution is to use non-atomic alloc for rt6_info instead. While the
client may now be blocked for longer depending on the state of the
chunk being added to, this work has to be incurred at some point.

The alternative solution would be to provide configurable parameters
e.g. via sysctl in percpu.c for default map size, low/high empty pages
and map margins. For this solution, the map margin sizes need to be
stored per chunk, as large margins cannot be used if the dynamic early
slots map size is in use. This is not a preferred solution though, as
it requires tuning of these parameters to provide sufficient margins to
avoid -ENOMEM errors depending on system requirements.

Signed-off-by: Mike Manning <mmanning@brocade.com>
---
 net/ipv6/route.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1b57e11..0e9bb76 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
 	struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
 
 	if (rt) {
-		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
+		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
 		if (rt->rt6i_pcpu) {
 			int cpu;
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH v5] iproute2: macvlan: add "source" mode
From: Michael Braun @ 2016-11-22 10:59 UTC (permalink / raw)
  To: netdev; +Cc: Michael Braun, projekt-wlan, steweg

Adjusting iproute2 utility to support new macvlan link type mode called
"source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0

Based on previous work of Stefan Gula <steweg@gmail.com>

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

Cc: steweg@gmail.com

v5:
 - rebase and fix checkpatch

v4:
 - add MACADDR_SET support
 - skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
 - fix man page
---
 ip/iplink_macvlan.c   | 124 +++++++++++++++++++++++++++++++++++++++++++++++---
 man/man8/ip-link.8.in |  42 ++++++++++++++++-
 2 files changed, 158 insertions(+), 8 deletions(-)

diff --git a/ip/iplink_macvlan.c b/ip/iplink_macvlan.c
index 83ff961..b9a146f 100644
--- a/ip/iplink_macvlan.c
+++ b/ip/iplink_macvlan.c
@@ -15,6 +15,7 @@
 #include <string.h>
 #include <sys/socket.h>
 #include <linux/if_link.h>
+#include <linux/if_ether.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -29,7 +30,11 @@
 static void print_explain(struct link_util *lu, FILE *f)
 {
 	fprintf(f,
-		"Usage: ... %s mode { private | vepa | bridge | passthru [nopromisc] }\n",
+		"Usage: ... %s mode MODE [flag MODE_FLAG] MODE_OPTS\n"
+		"MODE: private | vepa | bridge | passthru | source\n"
+		"MODE_FLAG: null | nopromisc\n"
+		"MODE_OPTS: for mode \"source\":\n"
+		"\tmacaddr { { add | del } <macaddr> | set [ <macaddr> [ <macaddr>  ... ] ] | flush }\n",
 		lu->id
 	);
 }
@@ -43,7 +48,15 @@ static void explain(struct link_util *lu)
 static int mode_arg(const char *arg)
 {
 	fprintf(stderr,
-		"Error: argument of \"mode\" must be \"private\", \"vepa\", \"bridge\" or \"passthru\", not \"%s\"\n",
+		"Error: argument of \"mode\" must be \"private\", \"vepa\", \"bridge\", \"passthru\" or \"source\", not \"%s\"\n",
+		arg);
+	return -1;
+}
+
+static int flag_arg(const char *arg)
+{
+	fprintf(stderr,
+		"Error: argument of \"flag\" must be \"nopromisc\" or \"null\", not \"%s\"\n",
 		arg);
 	return -1;
 }
@@ -53,6 +66,10 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
 {
 	__u32 mode = 0;
 	__u16 flags = 0;
+	__u32 mac_mode = 0;
+	int has_flags = 0;
+	char mac[ETH_ALEN];
+	struct rtattr *nmac;
 
 	while (argc > 0) {
 		if (matches(*argv, "mode") == 0) {
@@ -66,10 +83,72 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
 				mode = MACVLAN_MODE_BRIDGE;
 			else if (strcmp(*argv, "passthru") == 0)
 				mode = MACVLAN_MODE_PASSTHRU;
+			else if (strcmp(*argv, "source") == 0)
+				mode = MACVLAN_MODE_SOURCE;
 			else
 				return mode_arg(*argv);
+		} else if (matches(*argv, "flag") == 0) {
+			NEXT_ARG();
+
+			if (strcmp(*argv, "nopromisc") == 0)
+				flags |= MACVLAN_FLAG_NOPROMISC;
+			else if (strcmp(*argv, "null") == 0)
+				flags |= 0;
+			else
+				return flag_arg(*argv);
+
+			has_flags = 1;
+
+		} else if (matches(*argv, "macaddr") == 0) {
+			NEXT_ARG();
+
+			if (strcmp(*argv, "add") == 0) {
+				mac_mode = MACVLAN_MACADDR_ADD;
+			} else if (strcmp(*argv, "del") == 0) {
+				mac_mode = MACVLAN_MACADDR_DEL;
+			} else if (strcmp(*argv, "set") == 0) {
+				mac_mode = MACVLAN_MACADDR_SET;
+			} else if (strcmp(*argv, "flush") == 0) {
+				mac_mode = MACVLAN_MACADDR_FLUSH;
+			} else {
+				explain(lu);
+				return -1;
+			}
+
+			addattr32(n, 1024, IFLA_MACVLAN_MACADDR_MODE, mac_mode);
+
+			if (mac_mode == MACVLAN_MACADDR_ADD ||
+			    mac_mode == MACVLAN_MACADDR_DEL) {
+				NEXT_ARG();
+
+				if (ll_addr_a2n(mac, sizeof(mac),
+						*argv) != ETH_ALEN)
+					return -1;
+
+				addattr_l(n, 1024, IFLA_MACVLAN_MACADDR, &mac,
+					  ETH_ALEN);
+			}
+
+			if (mac_mode == MACVLAN_MACADDR_SET) {
+				nmac = addattr_nest(n, 1024,
+						    IFLA_MACVLAN_MACADDR_DATA);
+				while (NEXT_ARG_OK()) {
+					NEXT_ARG_FWD();
+
+					if (ll_addr_a2n(mac, sizeof(mac),
+							*argv) != ETH_ALEN) {
+						PREV_ARG();
+						break;
+					}
+
+					addattr_l(n, 1024, IFLA_MACVLAN_MACADDR,
+						  &mac, ETH_ALEN);
+				}
+				addattr_nest_end(n, nmac);
+			}
 		} else if (matches(*argv, "nopromisc") == 0) {
 			flags |= MACVLAN_FLAG_NOPROMISC;
+			has_flags = 1;
 		} else if (matches(*argv, "help") == 0) {
 			explain(lu);
 			return -1;
@@ -84,7 +163,7 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	if (mode)
 		addattr32(n, 1024, IFLA_MACVLAN_MODE, mode);
 
-	if (flags) {
+	if (has_flags) {
 		if (flags & MACVLAN_FLAG_NOPROMISC &&
 		    mode != MACVLAN_MODE_PASSTHRU) {
 			pfx_err(lu, "nopromisc flag only valid in passthru mode");
@@ -100,6 +179,10 @@ static void macvlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]
 {
 	__u32 mode;
 	__u16 flags;
+	__u32 count;
+	unsigned char *addr;
+	int len;
+	struct rtattr *rta;
 
 	if (!tb)
 		return;
@@ -109,20 +192,49 @@ static void macvlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]
 		return;
 
 	mode = rta_getattr_u32(tb[IFLA_MACVLAN_MODE]);
-	fprintf(f, " mode %s ",
+	fprintf(f, "mode %s ",
 		  mode == MACVLAN_MODE_PRIVATE ? "private"
 		: mode == MACVLAN_MODE_VEPA    ? "vepa"
 		: mode == MACVLAN_MODE_BRIDGE  ? "bridge"
 		: mode == MACVLAN_MODE_PASSTHRU  ? "passthru"
+		: mode == MACVLAN_MODE_SOURCE  ? "source"
 		:				 "unknown");
 
 	if (!tb[IFLA_MACVLAN_FLAGS] ||
 	    RTA_PAYLOAD(tb[IFLA_MACVLAN_FLAGS]) < sizeof(__u16))
-		return;
+		flags = 0;
+	else
+		flags = rta_getattr_u16(tb[IFLA_MACVLAN_FLAGS]);
 
-	flags = rta_getattr_u16(tb[IFLA_MACVLAN_FLAGS]);
 	if (flags & MACVLAN_FLAG_NOPROMISC)
 		fprintf(f, "nopromisc ");
+
+	/* in source mode, there are more options to print */
+
+	if (mode != MACVLAN_MODE_SOURCE)
+		return;
+
+	if (!tb[IFLA_MACVLAN_MACADDR_COUNT] ||
+	    RTA_PAYLOAD(tb[IFLA_MACVLAN_MACADDR_COUNT]) < sizeof(__u32))
+		return;
+
+	count = rta_getattr_u32(tb[IFLA_MACVLAN_MACADDR_COUNT]);
+	fprintf(f, "remotes (%d) ", count);
+
+	if (!tb[IFLA_MACVLAN_MACADDR_DATA])
+		return;
+
+	rta = RTA_DATA(tb[IFLA_MACVLAN_MACADDR_DATA]);
+	len = RTA_PAYLOAD(tb[IFLA_MACVLAN_MACADDR_DATA]);
+
+	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
+		if (rta->rta_type != IFLA_MACVLAN_MACADDR ||
+		    RTA_PAYLOAD(rta) < 6)
+			continue;
+		addr = RTA_DATA(rta);
+		fprintf(f, "%.2x:%.2x:%.2x:%.2x:%.2x:%.2x ", addr[0],
+			addr[1], addr[2], addr[3], addr[4], addr[5]);
+	}
 }
 
 static void macvlan_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index ee1159d..18e9417 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -135,7 +135,12 @@ ip-link \- network device configuration
 .IR NAME " ]"
 .br
 .RB "[ " addrgenmode " { " eui64 " | " none " | " stable_secret " | " random " } ]"
-
+.br
+.RB "[ " macaddr " { " flush " | { " add " | " del " } "
+.IR MACADDR " | set [ "
+.IR MACADDR " [ "
+.IR MACADDR " [ ... ] ] ] } ]"
+.br
 
 .ti -8
 .B ip link show
@@ -882,7 +887,7 @@ the following additional arguments are supported:
 .BI "ip link add link " DEVICE " name " NAME
 .BR type " { " macvlan " | " macvtap " } "
 .BR mode " { " private " | " vepa " | " bridge " | " passthru
-.RB " [ " nopromisc " ] } "
+.RB " [ " nopromisc " ] | " source " } "
 
 .in +8
 .sp
@@ -919,6 +924,13 @@ the interface or create vlan interfaces on top of it. By default, this mode
 forces the underlying interface into promiscuous mode. Passing the
 .BR nopromisc " flag prevents this, so the promisc flag may be controlled "
 using standard tools.
+
+.B mode source
+- allows one to set a list of allowed mac address, which is used to match
+against source mac address from received frames on underlying interface. This
+allows creating mac based VLAN associations, instead of standard port or tag
+based. The feature is useful to deploy 802.1x mac based behavior,
+where drivers of underlying interfaces doesn't allows that.
 .in -8
 
 .TP
@@ -1468,6 +1480,32 @@ the following additional arguments are supported:
 
 .in -8
 
+.TP
+MACVLAN and MACVTAP Support
+Modify list of allowed macaddr for link in source mode.
+
+.B "ip link set type { macvlan | macvap } "
+[
+.BI macaddr " " "" COMMAND " " MACADDR " ..."
+]
+
+Commands:
+.in +8
+.B add
+- add MACADDR to allowed list
+.sp
+.B set
+- replace allowed list
+.sp
+.B del
+- remove MACADDR from allowed list
+.sp
+.B flush
+- flush whole allowed list
+.sp
+.in -8
+
+
 .SS  ip link show - display device attributes
 
 .TP
-- 
2.1.4

^ permalink raw reply related

* [PATCH v8] mac80211: multicast to unicast conversion
From: Michael Braun @ 2016-11-22 10:52 UTC (permalink / raw)
  To: johannes; +Cc: Michael Braun, linux-wireless, netdev, projekt-wlan

Add the ability for an AP (and associated VLANs) to perform
multicast-to-unicast conversion for ARP, IPv4 and IPv6 frames
(possibly within 802.1Q). If enabled, such frames are to be sent
to each station separately, with the DA replaced by their own
MAC address rather than the group address.

Note that this may break certain expectations of the receiver,
such as the ability to drop unicast IP packets received within
multicast L2 frames, or the ability to not send ICMP destination
unreachable messages for packets received in L2 multicast (which
is required, but the receiver can't tell the difference if this
new option is enabled.)

This also doesn't implement the 802.11 DMS (directed multicast
service).

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>

--
v8:
  - remove superflous check
  - change return type to bool
v7:
  - avoid recursion
  - style and description
v5:
  - rename bss->unicast to bss->multicast_to_unicast
  - access sdata->bss only after checking iftype
v4:
  - rename MULTICAST_TO_UNICAST to MULTICAST_TO_UNICAST
v3: fix compile error for trace.h
v2: add nl80211 toggle
    rename tx_dnat to change_da
    change int to bool unicast
---
 net/mac80211/cfg.c            |  12 +++++
 net/mac80211/debugfs_netdev.c |   3 ++
 net/mac80211/ieee80211_i.h    |   1 +
 net/mac80211/tx.c             | 122 +++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 1edb017..7de342a 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3345,6 +3345,17 @@ static int ieee80211_del_tx_ts(struct wiphy *wiphy, struct net_device *dev,
 	return -ENOENT;
 }
 
+static int ieee80211_set_multicast_to_unicast(struct wiphy *wiphy,
+					      struct net_device *dev,
+					      const bool enabled)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+
+	sdata->u.ap.multicast_to_unicast = enabled;
+
+	return 0;
+}
+
 const struct cfg80211_ops mac80211_config_ops = {
 	.add_virtual_intf = ieee80211_add_iface,
 	.del_virtual_intf = ieee80211_del_iface,
@@ -3430,4 +3441,5 @@ const struct cfg80211_ops mac80211_config_ops = {
 	.set_ap_chanwidth = ieee80211_set_ap_chanwidth,
 	.add_tx_ts = ieee80211_add_tx_ts,
 	.del_tx_ts = ieee80211_del_tx_ts,
+	.set_multicast_to_unicast = ieee80211_set_multicast_to_unicast,
 };
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index ed7bff4..509c6c3 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -487,6 +487,8 @@ static ssize_t ieee80211_if_fmt_num_buffered_multicast(
 }
 IEEE80211_IF_FILE_R(num_buffered_multicast);
 
+IEEE80211_IF_FILE(multicast_to_unicast, u.ap.multicast_to_unicast, HEX);
+
 /* IBSS attributes */
 static ssize_t ieee80211_if_fmt_tsf(
 	const struct ieee80211_sub_if_data *sdata, char *buf, int buflen)
@@ -642,6 +644,7 @@ static void add_ap_files(struct ieee80211_sub_if_data *sdata)
 	DEBUGFS_ADD(dtim_count);
 	DEBUGFS_ADD(num_buffered_multicast);
 	DEBUGFS_ADD_MODE(tkip_mic_test, 0200);
+	DEBUGFS_ADD_MODE(multicast_to_unicast, 0600);
 }
 
 static void add_vlan_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 70c0963..84374ed 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -293,6 +293,7 @@ struct ieee80211_if_ap {
 			 driver_smps_mode; /* smps mode request */
 
 	struct work_struct request_smps_work;
+	bool multicast_to_unicast;
 };
 
 struct ieee80211_if_wds {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index c3ce86e..5effffd 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/skbuff.h>
+#include <linux/if_vlan.h>
 #include <linux/etherdevice.h>
 #include <linux/bitmap.h>
 #include <linux/rcupdate.h>
@@ -3418,6 +3419,115 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	rcu_read_unlock();
 }
 
+static int ieee80211_change_da(struct sk_buff *skb, struct sta_info *sta)
+{
+	struct ethhdr *eth;
+	int err;
+
+	err = skb_ensure_writable(skb, ETH_HLEN);
+	if (unlikely(err))
+		return err;
+
+	eth = (void *)skb->data;
+	ether_addr_copy(eth->h_dest, sta->sta.addr);
+
+	return 0;
+}
+
+static inline bool
+ieee80211_multicast_to_unicast(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	const struct ethhdr *eth = (void *)skb->data;
+	const struct vlan_ethhdr *ethvlan = (void *)skb->data;
+	u16 ethertype;
+
+	if (likely(!is_multicast_ether_addr(eth->h_dest)))
+		return 0;
+
+	switch (sdata->vif.type) {
+	case NL80211_IFTYPE_AP_VLAN:
+		if (sdata->u.vlan.sta)
+			return 0;
+		if (sdata->wdev.use_4addr)
+			return 0;
+		/* fall through */
+	case NL80211_IFTYPE_AP:
+		/* check runtime toggle for this bss */
+		if (!sdata->bss->multicast_to_unicast)
+			return 0;
+		break;
+	default:
+		return 0;
+	}
+
+	/* multicast to unicast conversion only for some payload */
+	ethertype = ntohs(eth->h_proto);
+	if (ethertype == ETH_P_8021Q && skb->len >= VLAN_ETH_HLEN)
+		ethertype = ntohs(ethvlan->h_vlan_encapsulated_proto);
+	switch (ethertype) {
+	case ETH_P_ARP:
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		break;
+	default:
+		return 0;
+	}
+
+	return 1;
+}
+
+static void
+ieee80211_convert_to_unicast(struct sk_buff *skb, struct net_device *dev,
+			     struct sk_buff_head *queue)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	struct ieee80211_local *local = sdata->local;
+	const struct ethhdr *eth = (struct ethhdr *)skb->data;
+	struct sta_info *sta, *first = NULL;
+	struct sk_buff *cloned_skb;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(sta, &local->sta_list, list) {
+		if (sdata != sta->sdata)
+			/* AP-VLAN mismatch */
+			continue;
+		if (unlikely(ether_addr_equal(eth->h_source, sta->sta.addr)))
+			/* do not send back to source */
+			continue;
+		if (!first) {
+			first = sta;
+			continue;
+		}
+		cloned_skb = skb_clone(skb, GFP_ATOMIC);
+		if (!cloned_skb)
+			goto unicast;
+		if (unlikely(ieee80211_change_da(cloned_skb, sta))) {
+			dev_kfree_skb(cloned_skb);
+			goto unicast;
+		}
+		__skb_queue_tail(queue, cloned_skb);
+	}
+
+	if (likely(first)) {
+		if (unlikely(ieee80211_change_da(skb, first)))
+			goto unicast;
+		__skb_queue_tail(queue, skb);
+	} else {
+		/* no STA connected, drop */
+		kfree_skb(skb);
+		skb = NULL;
+	}
+
+	goto out;
+unicast:
+	__skb_queue_purge(queue);
+	__skb_queue_tail(queue, skb);
+out:
+	rcu_read_unlock();
+}
+
 /**
  * ieee80211_subif_start_xmit - netif start_xmit function for 802.3 vifs
  * @skb: packet to be sent
@@ -3428,7 +3538,17 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
 				       struct net_device *dev)
 {
-	__ieee80211_subif_start_xmit(skb, dev, 0);
+	if (unlikely(ieee80211_multicast_to_unicast(skb, dev))) {
+		struct sk_buff_head queue;
+
+		__skb_queue_head_init(&queue);
+		ieee80211_convert_to_unicast(skb, dev, &queue);
+		while ((skb = __skb_dequeue(&queue)))
+			__ieee80211_subif_start_xmit(skb, dev, 0);
+	} else {
+		__ieee80211_subif_start_xmit(skb, dev, 0);
+	}
+
 	return NETDEV_TX_OK;
 }
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH] net: dsa: mv88e6xxx: egress all frames
From: Stefan Eichenberger @ 2016-11-22 10:39 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli; +Cc: netdev, Stefan Eichenberger

Egress multicast and egress unicast is only enabled for CPU/DSA ports
but for switching operation it seems it should be enabled for all ports.
Do I miss something here?

I did the following test:
brctl addbr br0
brctl addif br0 lan0
brctl addif br0 lan1

In this scenario the unicast and multicast packets were not forwarded,
therefore ARP requests were not resolved, and no connection could be
established.

If no bridge is configured we do not forward unicast and multicast
packets because the VLAN mapping is active.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 883fd98..fe76372 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2506,15 +2506,14 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
 	    mv88e6xxx_6185_family(chip) || mv88e6xxx_6320_family(chip))
 		reg = PORT_CONTROL_IGMP_MLD_SNOOP |
 		PORT_CONTROL_USE_TAG | PORT_CONTROL_USE_IP |
-		PORT_CONTROL_STATE_FORWARDING;
+		PORT_CONTROL_STATE_FORWARDING |
+		PORT_CONTROL_FORWARD_UNKNOWN_MC | PORT_CONTROL_FORWARD_UNKNOWN;
 	if (dsa_is_cpu_port(ds, port)) {
 		if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA))
-			reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
-				PORT_CONTROL_FORWARD_UNKNOWN_MC;
+			reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA;
 		else
 			reg |= PORT_CONTROL_DSA_TAG;
-		reg |= PORT_CONTROL_EGRESS_ADD_TAG |
-			PORT_CONTROL_FORWARD_UNKNOWN;
+		reg |= PORT_CONTROL_EGRESS_ADD_TAG;
 	}
 	if (dsa_is_dsa_port(ds, port)) {
 		if (mv88e6xxx_6095_family(chip) ||
-- 
2.9.3

^ permalink raw reply related

* [PATCH] net: dsa: mv88e6xxx: add MV88E6097 switch
From: Stefan Eichenberger @ 2016-11-22 10:28 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli; +Cc: netdev, Stefan Eichenberger

Add support for the MV88E6097 switch. The change was tested on an Armada
based platform with a MV88E6097 switch.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c      | 19 +++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5a9729b..20d6fb5 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3213,6 +3213,12 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
+static const struct mv88e6xxx_ops mv88e6097_ops = {
+	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
+	.phy_read = mv88e6xxx_g2_smi_phy_read,
+	.phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
 static const struct mv88e6xxx_ops mv88e6123_ops = {
 	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
 	.phy_read = mv88e6xxx_read,
@@ -3342,6 +3348,19 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.ops = &mv88e6095_ops,
 	},
 
+	[MV88E6097] = {
+		.prod_num = PORT_SWITCH_ID_PROD_NUM_6097,
+		.family = MV88E6XXX_FAMILY_6097,
+		.name = "Marvell 88E6097/88E6097F",
+		.num_databases = 4096,
+		.num_ports = 11,
+		.port_base_addr = 0x10,
+		.global1_addr = 0x1b,
+		.age_time_coeff = 15000,
+		.flags = MV88E6XXX_FLAGS_FAMILY_6097,
+		.ops = &mv88e6097_ops,
+	},
+
 	[MV88E6123] = {
 		.prod_num = PORT_SWITCH_ID_PROD_NUM_6123,
 		.family = MV88E6XXX_FAMILY_6165,
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index e572121..42e28f8 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -74,6 +74,7 @@
 #define PORT_SWITCH_ID		0x03
 #define PORT_SWITCH_ID_PROD_NUM_6085	0x04a
 #define PORT_SWITCH_ID_PROD_NUM_6095	0x095
+#define PORT_SWITCH_ID_PROD_NUM_6097	0x099
 #define PORT_SWITCH_ID_PROD_NUM_6131	0x106
 #define PORT_SWITCH_ID_PROD_NUM_6320	0x115
 #define PORT_SWITCH_ID_PROD_NUM_6123	0x121
@@ -353,6 +354,7 @@
 enum mv88e6xxx_model {
 	MV88E6085,
 	MV88E6095,
+	MV88E6097,
 	MV88E6123,
 	MV88E6131,
 	MV88E6161,
-- 
2.9.3

^ permalink raw reply related

* [patch net-next 2/2] mlxsw: core: Implement thermal zone
From: Jiri Pirko @ 2016-11-22 10:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, cera, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1479810253-9114-1-git-send-email-jiri@resnulli.us>

From: Ivan Vecera <cera@cera.cz>

Implement thermal zone for mlxsw based HW. It uses temperature sensor
provided by ASIC (the same as mlxsw hwmon interface) to report current
temp to thermal core. The ASIC's PWM is then used to control speed
of system fans registered as cooling devices.

Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig        |   9 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.c         |   8 +
 drivers/net/ethernet/mellanox/mlxsw/core.h         |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 442 +++++++++++++++++++++
 5 files changed, 484 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index c9822e6..95ae4c0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -19,6 +19,15 @@ config MLXSW_CORE_HWMON
 	---help---
 	  Say Y here if you want to expose HWMON interface on mlxsw devices.
 
+config MLXSW_CORE_THERMAL
+	bool "Thermal zone support for Mellanox Technologies Switch ASICs"
+	depends on MLXSW_CORE && THERMAL
+	depends on !(MLXSW_CORE=y && THERMAL=m)
+	default y
+	---help---
+	 Say Y here if you want to automatically control fans speed according
+	 ambient temperature reported by ASIC.
+
 config MLXSW_PCI
 	tristate "PCI bus implementation for Mellanox Technologies Switch ASICs"
 	depends on PCI && HAS_DMA && HAS_IOMEM && MLXSW_CORE
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 2722942..fe8dadb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_MLXSW_CORE)	+= mlxsw_core.o
 mlxsw_core-objs			:= core.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_HWMON) += core_hwmon.o
+mlxsw_core-$(CONFIG_MLXSW_CORE_THERMAL) += core_thermal.o
 obj-$(CONFIG_MLXSW_PCI)		+= mlxsw_pci.o
 mlxsw_pci-objs			:= pci.o
 obj-$(CONFIG_MLXSW_I2C)		+= mlxsw_i2c.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 763752f..bcd7251 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -131,6 +131,7 @@ struct mlxsw_core {
 	} lag;
 	struct mlxsw_res res;
 	struct mlxsw_hwmon *hwmon;
+	struct mlxsw_thermal *thermal;
 	struct mlxsw_core_port ports[MLXSW_PORT_MAX_PORTS];
 	unsigned long driver_priv[0];
 	/* driver_priv has to be always the last item */
@@ -1162,6 +1163,11 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_hwmon_init;
 
+	err = mlxsw_thermal_init(mlxsw_core, mlxsw_bus_info,
+				 &mlxsw_core->thermal);
+	if (err)
+		goto err_thermal_init;
+
 	if (mlxsw_driver->init) {
 		err = mlxsw_driver->init(mlxsw_core, mlxsw_bus_info);
 		if (err)
@@ -1178,6 +1184,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
 err_driver_init:
+err_thermal_init:
 err_hwmon_init:
 	devlink_unregister(devlink);
 err_devlink_register:
@@ -1204,6 +1211,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 	mlxsw_core_debugfs_fini(mlxsw_core);
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
+	mlxsw_thermal_fini(mlxsw_core->thermal);
 	devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index f7a4d83..3de8955 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -321,4 +321,28 @@ static inline int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 
 #endif
 
+struct mlxsw_thermal;
+
+#ifdef CONFIG_MLXSW_CORE_THERMAL
+
+int mlxsw_thermal_init(struct mlxsw_core *mlxsw_core,
+		       const struct mlxsw_bus_info *mlxsw_bus_info,
+		       struct mlxsw_thermal **p_thermal);
+void mlxsw_thermal_fini(struct mlxsw_thermal *thermal);
+
+#else
+
+static inline int mlxsw_thermal_init(struct mlxsw_core *mlxsw_core,
+				     const struct mlxsw_bus_info *mlxsw_bus_info,
+				     struct mlxsw_thermal **p_thermal)
+{
+	return 0;
+}
+
+static inline void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
+{
+}
+
+#endif
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
new file mode 100644
index 0000000..d866c98
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -0,0 +1,442 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+ * Copyright (c) 2016 Ivan Vecera <cera@cera.cz>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/sysfs.h>
+#include <linux/thermal.h>
+#include <linux/err.h>
+
+#include "core.h"
+
+#define MLXSW_THERMAL_POLL_INT	1000	/* ms */
+#define MLXSW_THERMAL_MAX_TEMP	110000	/* 110C */
+#define MLXSW_THERMAL_MAX_STATE	10
+#define MLXSW_THERMAL_MAX_DUTY	255
+
+struct mlxsw_thermal_trip {
+	int	type;
+	int	temp;
+	int	min_state;
+	int	max_state;
+};
+
+static const struct mlxsw_thermal_trip default_thermal_trips[] = {
+	{	/* In range - 0-40% PWM */
+		.type		= THERMAL_TRIP_ACTIVE,
+		.temp		= 75000,
+		.min_state	= 0,
+		.max_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
+	},
+	{	/* High - 40-100% PWM */
+		.type		= THERMAL_TRIP_ACTIVE,
+		.temp		= 80000,
+		.min_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	},
+	{
+		/* Very high - 100% PWM */
+		.type		= THERMAL_TRIP_ACTIVE,
+		.temp		= 85000,
+		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	},
+	{	/* Warning */
+		.type		= THERMAL_TRIP_HOT,
+		.temp		= 105000,
+		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	},
+	{	/* Critical - soft poweroff */
+		.type		= THERMAL_TRIP_CRITICAL,
+		.temp		= MLXSW_THERMAL_MAX_TEMP,
+		.min_state	= MLXSW_THERMAL_MAX_STATE,
+		.max_state	= MLXSW_THERMAL_MAX_STATE,
+	}
+};
+
+#define MLXSW_THERMAL_NUM_TRIPS	ARRAY_SIZE(default_thermal_trips)
+
+/* Make sure all trips are writable */
+#define MLXSW_THERMAL_TRIP_MASK	(BIT(MLXSW_THERMAL_NUM_TRIPS) - 1)
+
+struct mlxsw_thermal {
+	struct mlxsw_core *core;
+	const struct mlxsw_bus_info *bus_info;
+	struct thermal_zone_device *tzdev;
+	struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
+	struct mlxsw_thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
+	enum thermal_device_mode mode;
+};
+
+static inline u8 mlxsw_state_to_duty(int state)
+{
+	return DIV_ROUND_CLOSEST(state * MLXSW_THERMAL_MAX_DUTY,
+				 MLXSW_THERMAL_MAX_STATE);
+}
+
+static inline int mlxsw_duty_to_state(u8 duty)
+{
+	return DIV_ROUND_CLOSEST(duty * MLXSW_THERMAL_MAX_STATE,
+				 MLXSW_THERMAL_MAX_DUTY);
+}
+
+static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
+					struct thermal_cooling_device *cdev)
+{
+	int i;
+
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
+		if (thermal->cdevs[i] == cdev)
+			return i;
+
+	return -ENODEV;
+}
+
+static int mlxsw_thermal_bind(struct thermal_zone_device *tzdev,
+			      struct thermal_cooling_device *cdev)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	int i, err;
+
+	/* If the cooling device is one of ours bind it */
+	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
+		return 0;
+
+	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
+		const struct mlxsw_thermal_trip *trip = &thermal->trips[i];
+
+		err = thermal_zone_bind_cooling_device(tzdev, i, cdev,
+						       trip->max_state,
+						       trip->min_state,
+						       THERMAL_WEIGHT_DEFAULT);
+		if (err < 0) {
+			dev_err(dev, "Failed to bind cooling device to trip %d\n", i);
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int mlxsw_thermal_unbind(struct thermal_zone_device *tzdev,
+				struct thermal_cooling_device *cdev)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	int i;
+	int err;
+
+	/* If the cooling device is our one unbind it */
+	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
+		return 0;
+
+	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
+		err = thermal_zone_unbind_cooling_device(tzdev, i, cdev);
+		if (err < 0) {
+			dev_err(dev, "Failed to unbind cooling device\n");
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int mlxsw_thermal_get_mode(struct thermal_zone_device *tzdev,
+				  enum thermal_device_mode *mode)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	*mode = thermal->mode;
+
+	return 0;
+}
+
+static int mlxsw_thermal_set_mode(struct thermal_zone_device *tzdev,
+				  enum thermal_device_mode mode)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	mutex_lock(&tzdev->lock);
+
+	if (mode == THERMAL_DEVICE_ENABLED)
+		tzdev->polling_delay = MLXSW_THERMAL_POLL_INT;
+	else
+		tzdev->polling_delay = 0;
+
+	mutex_unlock(&tzdev->lock);
+
+	thermal->mode = mode;
+	thermal_zone_device_update(tzdev, THERMAL_EVENT_UNSPECIFIED);
+
+	return 0;
+}
+
+static int mlxsw_thermal_get_temp(struct thermal_zone_device *tzdev,
+				  int *p_temp)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	char mtmp_pl[MLXSW_REG_MTMP_LEN];
+	unsigned int temp;
+	int err;
+
+	mlxsw_reg_mtmp_pack(mtmp_pl, 0, false, false);
+
+	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mtmp), mtmp_pl);
+	if (err) {
+		dev_err(dev, "Failed to query temp sensor\n");
+		return err;
+	}
+	mlxsw_reg_mtmp_unpack(mtmp_pl, &temp, NULL, NULL);
+
+	*p_temp = (int) temp;
+	return 0;
+}
+
+static int mlxsw_thermal_get_trip_type(struct thermal_zone_device *tzdev,
+				       int trip,
+				       enum thermal_trip_type *p_type)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
+		return -EINVAL;
+
+	*p_type = thermal->trips[trip].type;
+	return 0;
+}
+
+static int mlxsw_thermal_get_trip_temp(struct thermal_zone_device *tzdev,
+				       int trip, int *p_temp)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
+		return -EINVAL;
+
+	*p_temp = thermal->trips[trip].temp;
+	return 0;
+}
+
+static int mlxsw_thermal_set_trip_temp(struct thermal_zone_device *tzdev,
+				       int trip, int temp)
+{
+	struct mlxsw_thermal *thermal = tzdev->devdata;
+
+	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS ||
+	    temp > MLXSW_THERMAL_MAX_TEMP)
+		return -EINVAL;
+
+	thermal->trips[trip].temp = temp;
+	return 0;
+}
+
+static struct thermal_zone_device_ops mlxsw_thermal_ops = {
+	.bind = mlxsw_thermal_bind,
+	.unbind = mlxsw_thermal_unbind,
+	.get_mode = mlxsw_thermal_get_mode,
+	.set_mode = mlxsw_thermal_set_mode,
+	.get_temp = mlxsw_thermal_get_temp,
+	.get_trip_type	= mlxsw_thermal_get_trip_type,
+	.get_trip_temp	= mlxsw_thermal_get_trip_temp,
+	.set_trip_temp	= mlxsw_thermal_set_trip_temp,
+};
+
+static int mlxsw_thermal_get_max_state(struct thermal_cooling_device *cdev,
+				       unsigned long *p_state)
+{
+	*p_state = MLXSW_THERMAL_MAX_STATE;
+	return 0;
+}
+
+static int mlxsw_thermal_get_cur_state(struct thermal_cooling_device *cdev,
+				       unsigned long *p_state)
+
+{
+	struct mlxsw_thermal *thermal = cdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	char mfsc_pl[MLXSW_REG_MFSC_LEN];
+	int err, idx;
+	u8 duty;
+
+	idx = mlxsw_get_cooling_device_idx(thermal, cdev);
+	if (idx < 0)
+		return idx;
+
+	mlxsw_reg_mfsc_pack(mfsc_pl, idx, 0);
+	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
+	if (err) {
+		dev_err(dev, "Failed to query PWM duty\n");
+		return err;
+	}
+
+	duty = mlxsw_reg_mfsc_pwm_duty_cycle_get(mfsc_pl);
+	*p_state = mlxsw_duty_to_state(duty);
+	return 0;
+}
+
+static int mlxsw_thermal_set_cur_state(struct thermal_cooling_device *cdev,
+				       unsigned long state)
+
+{
+	struct mlxsw_thermal *thermal = cdev->devdata;
+	struct device *dev = thermal->bus_info->dev;
+	char mfsc_pl[MLXSW_REG_MFSC_LEN];
+	int err, idx;
+
+	idx = mlxsw_get_cooling_device_idx(thermal, cdev);
+	if (idx < 0)
+		return idx;
+
+	mlxsw_reg_mfsc_pack(mfsc_pl, idx, mlxsw_state_to_duty(state));
+	err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
+	if (err) {
+		dev_err(dev, "Failed to write PWM duty\n");
+		return err;
+	}
+	return 0;
+}
+
+static const struct thermal_cooling_device_ops mlxsw_cooling_ops = {
+	.get_max_state	= mlxsw_thermal_get_max_state,
+	.get_cur_state	= mlxsw_thermal_get_cur_state,
+	.set_cur_state	= mlxsw_thermal_set_cur_state,
+};
+
+int mlxsw_thermal_init(struct mlxsw_core *core,
+		       const struct mlxsw_bus_info *bus_info,
+		       struct mlxsw_thermal **p_thermal)
+{
+	char mfcr_pl[MLXSW_REG_MFCR_LEN] = { 0 };
+	enum mlxsw_reg_mfcr_pwm_frequency freq;
+	struct device *dev = bus_info->dev;
+	struct mlxsw_thermal *thermal;
+	u16 tacho_active;
+	u8 pwm_active;
+	int err, i;
+
+	thermal = devm_kzalloc(dev, sizeof(*thermal),
+			       GFP_KERNEL);
+	if (!thermal)
+		return -ENOMEM;
+
+	thermal->core = core;
+	thermal->bus_info = bus_info;
+	memcpy(thermal->trips, default_thermal_trips, sizeof(thermal->trips));
+
+	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfcr), mfcr_pl);
+	if (err) {
+		dev_err(dev, "Failed to probe PWMs\n");
+		goto err_free_thermal;
+	}
+	mlxsw_reg_mfcr_unpack(mfcr_pl, &freq, &tacho_active, &pwm_active);
+
+	for (i = 0; i < MLXSW_MFCR_TACHOS_MAX; i++) {
+		if (tacho_active & BIT(i)) {
+			char mfsl_pl[MLXSW_REG_MFSL_LEN];
+
+			mlxsw_reg_mfsl_pack(mfsl_pl, i, 0, 0);
+
+			/* We need to query the register to preserve maximum */
+			err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfsl),
+					      mfsl_pl);
+			if (err)
+				goto err_free_thermal;
+
+			/* set the minimal RPMs to 0 */
+			mlxsw_reg_mfsl_tach_min_set(mfsl_pl, 0);
+			err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsl),
+					      mfsl_pl);
+			if (err)
+				goto err_free_thermal;
+		}
+	}
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
+		if (pwm_active & BIT(i)) {
+			struct thermal_cooling_device *cdev;
+
+			cdev = thermal_cooling_device_register("Fan", thermal,
+							&mlxsw_cooling_ops);
+			if (IS_ERR(cdev)) {
+				err = PTR_ERR(cdev);
+				dev_err(dev, "Failed to register cooling device\n");
+				goto err_unreg_cdevs;
+			}
+			thermal->cdevs[i] = cdev;
+		}
+	}
+
+	thermal->tzdev = thermal_zone_device_register("mlxsw",
+						      MLXSW_THERMAL_NUM_TRIPS,
+						      MLXSW_THERMAL_TRIP_MASK,
+						      thermal,
+						      &mlxsw_thermal_ops,
+						      NULL, 0,
+						      MLXSW_THERMAL_POLL_INT);
+	if (IS_ERR(thermal->tzdev)) {
+		err = PTR_ERR(thermal->tzdev);
+		dev_err(dev, "Failed to register thermal zone\n");
+		goto err_unreg_cdevs;
+	}
+
+	thermal->mode = THERMAL_DEVICE_ENABLED;
+	*p_thermal = thermal;
+	return 0;
+err_unreg_cdevs:
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
+		if (thermal->cdevs[i])
+			thermal_cooling_device_unregister(thermal->cdevs[i]);
+err_free_thermal:
+	devm_kfree(dev, thermal);
+	return err;
+}
+
+void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
+{
+	int i;
+
+	if (thermal->tzdev) {
+		thermal_zone_device_unregister(thermal->tzdev);
+		thermal->tzdev = NULL;
+	}
+
+	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
+		if (thermal->cdevs[i]) {
+			thermal_cooling_device_unregister(thermal->cdevs[i]);
+			thermal->cdevs[i] = NULL;
+		}
+	}
+
+	devm_kfree(thermal->bus_info->dev, thermal);
+}
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 1/2] mlxsw: reg: Add Management Fan Speed Limit register
From: Jiri Pirko @ 2016-11-22 10:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, cera, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1479810253-9114-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed threshold are defined for both
under-speed and over-speed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 49 +++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index edad7cb..2618e9c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4518,6 +4518,54 @@ static inline void mlxsw_reg_mfsm_pack(char *payload, u8 tacho)
 	mlxsw_reg_mfsm_tacho_set(payload, tacho);
 }
 
+/* MFSL - Management Fan Speed Limit Register
+ * ------------------------------------------
+ * The Fan Speed Limit register is used to configure the fan speed
+ * event / interrupt notification mechanism. Fan speed threshold are
+ * defined for both under-speed and over-speed.
+ */
+#define MLXSW_REG_MFSL_ID 0x9004
+#define MLXSW_REG_MFSL_LEN 0x0C
+
+MLXSW_REG_DEFINE(mfsl, MLXSW_REG_MFSL_ID, MLXSW_REG_MFSL_LEN);
+
+/* reg_mfsl_tacho
+ * Fan tachometer index.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mfsl, tacho, 0x00, 24, 4);
+
+/* reg_mfsl_tach_min
+ * Tachometer minimum value (minimum RPM).
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mfsl, tach_min, 0x04, 0, 16);
+
+/* reg_mfsl_tach_max
+ * Tachometer maximum value (maximum RPM).
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mfsl, tach_max, 0x08, 0, 16);
+
+static inline void mlxsw_reg_mfsl_pack(char *payload, u8 tacho,
+				       u16 tach_min, u16 tach_max)
+{
+	MLXSW_REG_ZERO(mfsl, payload);
+	mlxsw_reg_mfsl_tacho_set(payload, tacho);
+	mlxsw_reg_mfsl_tach_min_set(payload, tach_min);
+	mlxsw_reg_mfsl_tach_max_set(payload, tach_max);
+}
+
+static inline void mlxsw_reg_mfsl_unpack(char *payload, u8 tacho,
+					 u16 *p_tach_min, u16 *p_tach_max)
+{
+	if (p_tach_min)
+		*p_tach_min = mlxsw_reg_mfsl_tach_min_get(payload);
+
+	if (p_tach_max)
+		*p_tach_max = mlxsw_reg_mfsl_tach_max_get(payload);
+}
+
 /* MTCAP - Management Temperature Capabilities
  * -------------------------------------------
  * This register exposes the capabilities of the device and
@@ -5228,6 +5276,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mfcr),
 	MLXSW_REG(mfsc),
 	MLXSW_REG(mfsm),
+	MLXSW_REG(mfsl),
 	MLXSW_REG(mtcap),
 	MLXSW_REG(mtmp),
 	MLXSW_REG(mpat),
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 0/2] mlxsw: core: Implement thermal zone
From: Jiri Pirko @ 2016-11-22 10:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, cera, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz

From: Jiri Pirko <jiri@mellanox.com>

Implement thermal zone for mlxsw based HW.
The first patch is just a register dependency for the second patch.

Ivan Vecera (1):
  mlxsw: core: Implement thermal zone

Jiri Pirko (1):
  mlxsw: reg: Add Management Fan Speed Limit register

 drivers/net/ethernet/mellanox/mlxsw/Kconfig        |   9 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.c         |   8 +
 drivers/net/ethernet/mellanox/mlxsw/core.h         |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 442 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/reg.h          |  49 +++
 6 files changed, 533 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

-- 
2.7.4

^ permalink raw reply

* net/icmp: null-ptr-deref in icmp6_send
From: Andrey Konovalov @ 2016-11-22 10:23 UTC (permalink / raw)
  To: David Ahern, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
  Cc: Dmitry Vyukov, Alexander Potapenko, Kostya Serebryany,
	Eric Dumazet, syzkaller

Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

It seems that skb_dst(skb) may end up being NULL.

As far as I can see the bug was introduced in commit 5d41ce29e ("net:
icmp6_send should use dst dev to determine L3 domain").
ICMP v4 probaly has similar issue due to 9d1a6c4ea ("net:
icmp_route_lookup should use rt dev to determine L3 domain").

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
 ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
 [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 [<ffffffff82bdc01b>] __netif_receive_skb_core+0x187b/0x2a10 net/core/dev.c:4208
 [<ffffffff82bdd1da>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4246
 [<ffffffff82bdd4d3>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4274
 [<ffffffff82bdd6f8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4298
 [<ffffffff82420e7e>] tun_get_user+0xbde/0x2890 drivers/net/tun.c:1308
 [<ffffffff82422d4a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
 [<     inline     >] new_sync_write fs/read_write.c:499
 [<ffffffff8151c234>] __vfs_write+0x334/0x570 fs/read_write.c:512
 [<ffffffff8151fd4b>] vfs_write+0x17b/0x500 fs/read_write.c:560
 [<     inline     >] SYSC_write fs/read_write.c:607
 [<ffffffff81523674>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [<ffffffff83fc4301>] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 67 58 41 f6 c4 01 0f 85 d4 07 00 00 49 83 e4 fe e8 ea 5e fc fd
49 8d 7c 24 18 49 ba 00 00 00 00 00 fc ff df 49 89 f9 49 c1 e9 03 <43>
80 3c 11 00 0f 85 c5 17 00 00 4d 8b 64 24 18 65 ff 05 cd 3c
RIP  [<ffffffff833617ec>] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
 RSP <ffff88006734f2c0>
---[ end trace 12dd736536064d71 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox