Netdev List

* Re: [PATCH] net: fec: select queue depending on VLAN priority
From: Andrew Lunn @ 2017-05-09 12:15 UTC (permalink / raw)
  To: Stefan Agner; +Cc: fugang.duan, festevam, netdev, linux-kernel
In-Reply-To: <20170509053708.2573-1-stefan@agner.ch>

On Mon, May 08, 2017 at 10:37:08PM -0700, Stefan Agner wrote:
> Since the addition of the multi queue code with commit 59d0f7465644
> ("net: fec: init multi queue date structure") the queue selection
> has been handled by the default transmit queue selection
> implementation which tries to evenly distribute the traffic across
> all available queues. This selection presumes that the queues are
> using an equal priority, however, the queues 1 and 2 are actually
> of higher priority (the classification of the queues is enabled in
> fec_enet_enable_ring).
> 
> This can lead to net scheduler warnings and continuous TX ring
> dumps when exercising the system with iperf.
> 
> Use only queue 0 for all common traffic (no VLAN and P802.1p
> priority 0 and 1) and route level 2-7 through queue 1 and 2.

Hi Stefan

Did you try:

vconfig set_egress_map eth0.42 0 7
ip addr add 10.42.42.42/24 dev eth0.42
iperf -c 10.42.42.1

i.e. send a continuous stream on one of the higher priority queues.

From what was said earlier in this thread, isn't queue 0 going to be
starved? As well as this patch, don't we also need some default
bandwidth allocations to the queues to ensure queue 0 does get some
bandwidth?

	Andrew

^ permalink raw reply
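
The queue selection described in the quoted patch can be sketched roughly as
below. This is only an illustration, not the driver's code: select_queue(),
the prio_to_queue table and the PCP_* macros are hypothetical names, and the
way levels 2-7 are split between queues 1 and 2 is an assumption (the quoted
description only says they go "through queue 1 and 2"). The one fixed fact
used is that the 802.1p priority sits in the top three bits of the VLAN TCI.

```c
#include <assert.h>
#include <stdint.h>

/* The priority code point (PCP) is the top 3 bits of the 16-bit VLAN TCI. */
#define PCP_SHIFT 13
#define PCP_MASK  0x7u

/*
 * Hypothetical mapping table: priorities 0 and 1 share queue 0 with untagged
 * traffic, as the patch description says. How levels 2-7 are divided between
 * queues 1 and 2 is an assumption made for this sketch.
 */
static const uint16_t prio_to_queue[8] = { 0, 0, 1, 1, 1, 2, 2, 2 };

/* Pick a TX queue from an optional VLAN tag. */
static uint16_t select_queue(int has_vlan_tag, uint16_t tci)
{
	unsigned int pcp;

	if (!has_vlan_tag)
		return 0;	/* all common traffic uses queue 0 */

	pcp = (tci >> PCP_SHIFT) & PCP_MASK;
	assert(pcp < 8);
	return prio_to_queue[pcp];
}
```

With a fixed table like this, a continuous stream tagged with a high priority
always lands on queue 1 or 2, which is the situation the starvation question
above is about.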

* [PATCH net] tcp: do not inherit mc_list from parent
From: Eric Dumazet @ 2017-05-09 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Pray3r, Andrey Konovalov

From: Eric Dumazet <edumazet@google.com>

syzkaller found a way to trigger double frees from ip_mc_drop_socket()

It turns out that we leave a copy of the parent mc_list at accept() time,
which is very bad.

Very similar to commit 8b485ce69876 ("tcp: do not inherit
fastopen_req from parent")

Initial report came from Pray3r and was completed by Andrey's report.
Thanks a lot to them!

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
---
Notes:
 - day-0 bug.
 - Not sure if it makes sense for a TCP socket to be able to join an MC
   group?

 net/ipv4/tcp_minisocks.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 717be4de53248352c758b50557987d898340dd4f..03035e2857fc8b6e4cd8af6e46e81048d4de9105 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -448,6 +448,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 		minmax_reset(&newtp->rtt_min, tcp_time_stamp, ~0U);
 		newicsk->icsk_rto = TCP_TIMEOUT_INIT;
 		newicsk->icsk_ack.lrcvtime = tcp_time_stamp;
+		newicsk->icsk_inet.mc_list = NULL;
 
 		newtp->packets_out = 0;
 		newtp->retrans_out = 0;

^ permalink raw reply related
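
Why an inherited list pointer leads to a double free can be shown with a
small userspace sketch. Everything here (struct fake_sock, clone_sock(),
drop_mc_list(), demo()) is a hypothetical stand-in, not kernel code: the
child is created as a byte copy of the parent, so unless the pointer is
cleared both would later release the same list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the socket and its multicast list. */
struct mc_entry { int group; };
struct fake_sock { struct mc_entry *mc_list; };

static int nr_frees;	/* counts how often a list is released */

/* Release the list owned by a socket, as a close path would. */
static void drop_mc_list(struct fake_sock *sk)
{
	if (!sk->mc_list)
		return;
	free(sk->mc_list);
	sk->mc_list = NULL;
	nr_frees++;
}

/* Clone the parent byte-for-byte, then clear the pointer the child must not own. */
static void clone_sock(struct fake_sock *child, const struct fake_sock *parent)
{
	memcpy(child, parent, sizeof(*child));
	child->mc_list = NULL;	/* without this line both sockets would free the same list */
}

/* Create a parent with one list entry, clone it, drop both; return the free count. */
static int demo(void)
{
	struct fake_sock parent, child;

	parent.mc_list = malloc(sizeof(*parent.mc_list));
	assert(parent.mc_list);
	parent.mc_list->group = 1;

	clone_sock(&child, &parent);
	nr_frees = 0;
	drop_mc_list(&child);
	drop_mc_list(&parent);
	return nr_frees;
}
```

With the pointer cleared in the clone, the list is released exactly once, by
the parent.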

* Re: [PATCH] net: wireless: ath: ath10k: remove unnecessary code
From: Kalle Valo @ 2017-05-09 12:20 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Gustavo A. R. Silva, netdev@vger.kernel.org,
	linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
	ath10k@lists.infradead.org
In-Reply-To: <76408651-07c6-fe31-863f-e1cb73b49663@broadcom.com>

Arend Van Spriel <arend.vanspriel@broadcom.com> writes:

> On 9-5-2017 7:33, Kalle Valo wrote:
>> "Gustavo A. R. Silva" <garsilva@embeddedor.com> writes:
>> 
>>> The name of an array used by itself will always return the array's address.
>>> So these tests will always evaluate as false and therefore the _return_
>>> will never be executed.
>>>
>>> Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
>> 
>> I don't understand the commit log, especially what does "The name of an
>> array used by itself" mean?
>
> The array fields in struct wmi_start_scan_arg that are checked here are
> fixed size arrays so they can never be NULL.
>
> Maybe that helps rephrasing this commit message.

Much much better, thanks!

-- 
Kalle Valo

^ permalink raw reply
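
Arend's point can be illustrated with a short sketch. The struct below is a
hypothetical stand-in, not the real struct wmi_start_scan_arg: a fixed-size
array embedded in a struct decays to the address of its first element, which
lies inside the struct itself, so a NULL test on it can never succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct with an embedded fixed-size array, not the driver's real one. */
struct scan_arg {
	int n_channels;
	unsigned short channels[4];
};

/* Returns 1 if the array member compares equal to NULL; for an embedded array it never does. */
static int channels_is_null(const struct scan_arg *arg)
{
	const unsigned short *p;

	assert(arg != NULL);
	p = arg->channels;	/* decays to &arg->channels[0] */
	return p == NULL;
}

/* The array lives inside the struct: its address is the struct's address plus an offset. */
static int channels_inside_struct(const struct scan_arg *arg)
{
	assert(arg != NULL);
	return (const char *)arg->channels ==
	       (const char *)arg + offsetof(struct scan_arg, channels);
}
```

A check such as "if (!arg->channels) return;" is therefore dead code for any
valid arg, which is what the patch removes.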

* Re: [PATCH] net: wireless: ath: ath9k: remove unnecessary code
From: Kalle Valo @ 2017-05-09 12:21 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: ath9k-devel, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20170509070158.Horde.dYleVB-aK1cNNyQpdVsVMNp@gator4166.hostgator.com>

"Gustavo A. R. Silva" <garsilva@embeddedor.com> writes:

> Hi Kalle,
>
> Quoting Kalle Valo <kvalo@qca.qualcomm.com>:
>
>> "Gustavo A. R. Silva" <garsilva@embeddedor.com> writes:
>>
>>> The name of an array used by itself will always return the array's address.
>>> So this test will always evaluate as true.
>>>
>>> Addresses-Coverity-ID: 1364903
>>> Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
>>> ---
>>>  drivers/net/wireless/ath/ath9k/eeprom.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath9k/eeprom.c
>>> b/drivers/net/wireless/ath/ath9k/eeprom.c
>>> index fb80ec8..5c3bc28 100644
>>> --- a/drivers/net/wireless/ath/ath9k/eeprom.c
>>> +++ b/drivers/net/wireless/ath/ath9k/eeprom.c
>>> @@ -143,7 +143,7 @@ bool ath9k_hw_nvram_read(struct ath_hw *ah, u32
>>> off, u16 *data)
>>>
>>>  	if (ah->eeprom_blob)
>>>  		ret = ath9k_hw_nvram_read_firmware(ah->eeprom_blob, off, data);
>>> -	else if (pdata && !pdata->use_eeprom && pdata->eeprom_data)
>>> +	else if (pdata && !pdata->use_eeprom)
>>>  		ret = ath9k_hw_nvram_read_pdata(pdata, off, data);
>>>  	else
>>>  		ret = common->bus_ops->eeprom_read(common, off, data);
>>
>> The patch may very well be valid (didn't check yet) but the commit log
>> is gibberish for me.
>>
>
> Let me correct that and I'll send the patch again.

Thanks.

Also no need to have that long "net: wireless: ath:" prefix, "ath9k: "
or "ath10k: " is enough.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH] wil6210: Replace five seq_puts() calls by seq_putc()
From: Eric Dumazet @ 2017-05-09 12:25 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: wil6210, linux-wireless, netdev, Kalle Valo, Maya Erez, LKML,
	kernel-janitors
In-Reply-To: <64747f85-e373-a0ff-b6dc-70cdfe35f71a@users.sourceforge.net>

On Tue, 2017-05-09 at 09:50 +0200, SF Markus Elfring wrote:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Mon, 8 May 2017 22:22:04 +0200
> 
> Five single characters (line breaks) should be put into a sequence.
> Thus use the corresponding function "seq_putc".
> 
> This issue was detected by using the Coccinelle software.

There is no _issue_ at all here, only a matter of taste.

printf("\n")  or putchar('\n')  in some slow path is really not that
interesting.

^ permalink raw reply

* Re: [PATCH net] rtnetlink: Fix the IFLA_PHYS_PORT_NAME TLV to include terminating NULL
From: Tobias Klauser @ 2017-05-09 12:31 UTC (permalink / raw)
  To: Yotam Gigi
  Cc: davem, zhangshengju, roopa, sd, bblanco, minipli, nogahf, moshe,
	rshearma, daniel, netdev, David Ahern
In-Reply-To: <1494331922-16451-1-git-send-email-yotamg@mellanox.com>

On 2017-05-09 at 14:12:02 +0200, Yotam Gigi <yotamg@mellanox.com> wrote:
> The IFLA_PHYS_PORT_NAME rtnetlink TLV length does not include the
> terminating NULL character, which is different from other string typed
> TLVs. Due to the fact that libnl checks for the terminating NULL in every
> string typed attribute, it crashes on every RTM_GETLINK response on
> drivers that implement ndo_get_phys_port_name.
> 
> Make the fill_phys_port_name function include the terminating NULL in the
> TLV size by using the nla_put_string helper function.
> 
> Fixes: db24a9044ee1 ("net: add support for phys_port_name")
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Cc: David Ahern <dsa@cumulusnetworks.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
> Please consider this for stable too. Thanks!

This is already fixed in commit 77ef033b687c ("rtnetlink: NUL-terminate
IFLA_PHYS_PORT_NAME string").

^ permalink raw reply
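
The length difference under discussion can be sketched in plain C. The helper
names below are hypothetical and no netlink library is used; the only fact
relied on is that nla_put_string() stores strlen() + 1 bytes, i.e. the string
together with its terminating NUL. The receiver-side check is an assumption
about how a parser might validate a string attribute, modelled on the
behaviour the quoted message attributes to libnl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Payload length when only the characters are copied (the behaviour being fixed). */
static size_t payload_len_without_nul(const char *name)
{
	assert(name != NULL);
	return strlen(name);
}

/* Payload length when the terminator is included, i.e. strlen() + 1. */
static size_t payload_len_with_nul(const char *name)
{
	assert(name != NULL);
	return strlen(name) + 1;
}

/*
 * Hypothetical receiver-side check: a string attribute is accepted only if
 * its payload is non-empty and its last byte is the NUL terminator.
 */
static int string_attr_is_valid(const char *payload, size_t len)
{
	return len > 0 && payload[len - 1] == '\0';
}
```

For a port name such as "p1", the first length is 2 and fails the check,
while the second is 3 and passes.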

* Re: [PATCH] net: wireless: ath: ath10k: remove unnecessary code
From: Gustavo A. R. Silva @ 2017-05-09 12:34 UTC (permalink / raw)
  To: Arend Van Spriel; +Cc: netdev, Kalle Valo, linux-wireless, linux-kernel, ath10k
In-Reply-To: <76408651-07c6-fe31-863f-e1cb73b49663@broadcom.com>

Hi Arend,

Quoting Arend Van Spriel <arend.vanspriel@broadcom.com>:

> On 9-5-2017 7:33, Kalle Valo wrote:
>> "Gustavo A. R. Silva" <garsilva@embeddedor.com> writes:
>>
>>> The name of an array used by itself will always return the array's address.
>>> So these tests will always evaluate as false and therefore the _return_
>>> will never be executed.
>>>
>>> Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
>>
>> I don't understand the commit log, especially what does "The name of an
>> array used by itself" mean?
>
> The array fields in struct wmi_start_scan_arg that are checked here are
> fixed size arrays so they can never be NULL.
>
> Maybe that helps rephrasing this commit message.
>

Definitely. Thank you!
--
Gustavo A. R. Silva

^ permalink raw reply

* Re: [PATCH] net: wireless: ath: ath9k: remove unnecessary code
From: Gustavo A. R. Silva @ 2017-05-09 12:36 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath9k-devel, linux-wireless, netdev, linux-kernel
In-Reply-To: <87shkez0da.fsf@kamboji.qca.qualcomm.com>


Quoting Kalle Valo <kvalo@qca.qualcomm.com>:

> "Gustavo A. R. Silva" <garsilva@embeddedor.com> writes:
>
>> Hi Kalle,
>>
>> Quoting Kalle Valo <kvalo@qca.qualcomm.com>:
>>
>>> "Gustavo A. R. Silva" <garsilva@embeddedor.com> writes:
>>>
>>>> The name of an array used by itself will always return the  
>>>> array's address.
>>>> So this test will always evaluate as true.
>>>>
>>>> Addresses-Coverity-ID: 1364903
>>>> Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
>>>> ---
>>>>  drivers/net/wireless/ath/ath9k/eeprom.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath9k/eeprom.c
>>>> b/drivers/net/wireless/ath/ath9k/eeprom.c
>>>> index fb80ec8..5c3bc28 100644
>>>> --- a/drivers/net/wireless/ath/ath9k/eeprom.c
>>>> +++ b/drivers/net/wireless/ath/ath9k/eeprom.c
>>>> @@ -143,7 +143,7 @@ bool ath9k_hw_nvram_read(struct ath_hw *ah, u32
>>>> off, u16 *data)
>>>>
>>>>  	if (ah->eeprom_blob)
>>>>  		ret = ath9k_hw_nvram_read_firmware(ah->eeprom_blob, off, data);
>>>> -	else if (pdata && !pdata->use_eeprom && pdata->eeprom_data)
>>>> +	else if (pdata && !pdata->use_eeprom)
>>>>  		ret = ath9k_hw_nvram_read_pdata(pdata, off, data);
>>>>  	else
>>>>  		ret = common->bus_ops->eeprom_read(common, off, data);
>>>
>>> The patch may very well be valid (didn't check yet) but the commit log
>>> is gibberish for me.
>>>
>>
>> Let me correct that and I'll send the patch again.
>
> Thanks.
>
> Also no need to have that long "net: wireless: ath:" prefix, "ath9k: "
> or "ath10k: " is enough.
>

I get it.

Thanks!
--
Gustavo A. R. Silva

^ permalink raw reply

* [PATCH 0/2] net: Set maximum receive packet size on veth interfaces
From: Fredrik Markstrom @ 2017-05-09 12:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge, Fredrik Markstrom

Currently veth drops all packets larger than the MTU set on the receiving
end of the pair. This is inconsistent with most hardware ethernet drivers.

This patch set adds a new driver attribute to set the maximum size of a
received packet, making it possible to create configurations similar to
those possible with (most) hardware ethernet interfaces.

The set consists of two patches. The first one adds a parameter to the
dev_forward_skb functions to specify the maximum packet size; the
second one implements a new attribute (VETH_MRU) in the veth driver.

Fredrik Markstrom (1):
  veth: Added attribute to set maximum receive size on veth interfaces

Fredrik Markström (1):
  net: Added mtu parameter to dev_forward_skb calls

 drivers/net/ipvlan/ipvlan_core.c |  7 ++++---
 drivers/net/macvlan.c            |  4 ++--
 drivers/net/veth.c               | 45 +++++++++++++++++++++++++++++++++++++++-
 include/linux/netdevice.h        | 10 ++++-----
 include/uapi/linux/veth.h        |  1 +
 net/bridge/br_forward.c          |  4 ++--
 net/core/dev.c                   | 17 +++++++++------
 net/core/filter.c                |  4 ++--
 net/l2tp/l2tp_eth.c              |  2 +-
 9 files changed, 72 insertions(+), 22 deletions(-)

-- 
2.11.0

^ permalink raw reply
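
The size check that this series makes overridable can be sketched in
userspace as follows. struct fake_dev and len_is_forwardable() are
hypothetical, reduced stand-ins: only the length comparison is modelled, and
the other conditions checked by the real is_skb_forwardable() are left out.
As in the series, a value of 0 for the extra parameter is taken to mean "use
the device MTU".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VLAN_HLEN 4	/* bytes added by one 802.1Q tag */

/* Hypothetical, reduced stand-in for the net_device fields used here. */
struct fake_dev {
	unsigned int mtu;
	unsigned int hard_header_len;
};

/* Length check only: mtu == 0 means "fall back to dev->mtu". */
static bool len_is_forwardable(const struct fake_dev *dev,
			       unsigned int skb_len, unsigned int mtu)
{
	assert(dev != NULL);

	if (mtu == 0)
		mtu = dev->mtu;

	return skb_len <= mtu + dev->hard_header_len + VLAN_HLEN;
}
```

With an MTU of 1500 and a 14-byte header the default limit is 1518 bytes; a
receiver that passes a larger value, for example 9000, accepts frames the
default check would drop.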

* [PATCH 1/2] net: Added mtu parameter to dev_forward_skb calls
From: Fredrik Markstrom @ 2017-05-09 12:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge,
	Fredrik Markström
In-Reply-To: <20170509124439.45674-1-fredrik.markstrom@gmail.com>

From: Fredrik Markström <fredrik.markstrom@gmail.com>

is_skb_forwardable() currently checks if the packet size is <= mtu of
the receiving interface. This is not consistent with most of the hardware
ethernet drivers, which happily receive packets larger than the MTU.

This patch adds a parameter to dev_forward_skb and is_skb_forwardable so
that the caller can override this packet size limit.

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
---
 drivers/net/ipvlan/ipvlan_core.c |  7 ++++---
 drivers/net/macvlan.c            |  4 ++--
 drivers/net/veth.c               |  2 +-
 include/linux/netdevice.h        | 10 +++++-----
 net/bridge/br_forward.c          |  4 ++--
 net/core/dev.c                   | 17 +++++++++++------
 net/core/filter.c                |  4 ++--
 net/l2tp/l2tp_eth.c              |  2 +-
 8 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 1f3295e274d0..dbbe48ade204 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -234,7 +234,8 @@ void ipvlan_process_multicast(struct work_struct *work)
 				nskb->pkt_type = pkt_type;
 				nskb->dev = ipvlan->dev;
 				if (tx_pkt)
-					ret = dev_forward_skb(ipvlan->dev, nskb);
+					ret = dev_forward_skb(ipvlan->dev,
+							      nskb, 0);
 				else
 					ret = netif_rx(nskb);
 			}
@@ -301,7 +302,7 @@ static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff **pskb,
 
 	if (local) {
 		skb->pkt_type = PACKET_HOST;
-		if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
+		if (dev_forward_skb(ipvlan->dev, skb, 0) == NET_RX_SUCCESS)
 			success = true;
 	} else {
 		ret = RX_HANDLER_ANOTHER;
@@ -547,7 +548,7 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 		 * the skb for the main-dev. At the RX side we just return
 		 * RX_PASS for it to be processed further on the stack.
 		 */
-		return dev_forward_skb(ipvlan->phy_dev, skb);
+		return dev_forward_skb(ipvlan->phy_dev, skb, 0);
 
 	} else if (is_multicast_ether_addr(eth->h_dest)) {
 		ipvlan_skb_crossing_ns(skb, NULL);
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9261722960a7..4db2876c1e44 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -202,7 +202,7 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
 	struct net_device *dev = vlan->dev;
 
 	if (local)
-		return __dev_forward_skb(dev, skb);
+		return __dev_forward_skb(dev, skb, 0);
 
 	skb->dev = dev;
 	if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
@@ -495,7 +495,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 		dest = macvlan_hash_lookup(port, eth->h_dest);
 		if (dest && dest->mode == MACVLAN_MODE_BRIDGE) {
 			/* send to lowerdev first for its network taps */
-			dev_forward_skb(vlan->lowerdev, skb);
+			dev_forward_skb(vlan->lowerdev, skb, 0);
 
 			return NET_XMIT_SUCCESS;
 		}
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c39d6d690e5..561da3a63b8a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -116,7 +116,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
+	if (likely(dev_forward_skb(rcv, skb, 0) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97456b2539e4..f207b083ffec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3282,16 +3282,16 @@ int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
-int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
-int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
+int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu);
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu);
 bool is_skb_forwardable(const struct net_device *dev,
-			const struct sk_buff *skb);
+			const struct sk_buff *skb, int mtu);
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
-					       struct sk_buff *skb)
+					       struct sk_buff *skb, int mtu)
 {
 	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
+	    unlikely(!is_skb_forwardable(dev, skb, mtu))) {
 		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 902af6ba481c..a1a38bb0d890 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -35,7 +35,7 @@ static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	if (!is_skb_forwardable(skb->dev, skb))
+	if (!is_skb_forwardable(skb->dev, skb, 0))
 		goto drop;
 
 	skb_push(skb, ETH_HLEN);
@@ -96,7 +96,7 @@ static void __br_forward(const struct net_bridge_port *to,
 		net = dev_net(indev);
 	} else {
 		if (unlikely(netpoll_tx_running(to->br->dev))) {
-			if (!is_skb_forwardable(skb->dev, skb)) {
+			if (!is_skb_forwardable(skb->dev, skb, 0)) {
 				kfree_skb(skb);
 			} else {
 				skb_push(skb, ETH_HLEN);
diff --git a/net/core/dev.c b/net/core/dev.c
index 533a6d6f6092..f7c53d7c8e26 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1767,14 +1767,18 @@ static inline void net_timestamp_set(struct sk_buff *skb)
 			__net_timestamp(SKB);		\
 	}						\
 
-bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
+bool is_skb_forwardable(const struct net_device *dev,
+			const struct sk_buff *skb, int mtu)
 {
 	unsigned int len;
 
 	if (!(dev->flags & IFF_UP))
 		return false;
 
-	len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
+	if (mtu == 0)
+		mtu = dev->mtu;
+
+	len = mtu + dev->hard_header_len + VLAN_HLEN;
 	if (skb->len <= len)
 		return true;
 
@@ -1788,9 +1792,9 @@ bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
-int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, mtu);
 
 	if (likely(!ret)) {
 		skb->protocol = eth_type_trans(skb, dev);
@@ -1806,6 +1810,7 @@ EXPORT_SYMBOL_GPL(__dev_forward_skb);
  *
  * @dev: destination network device
  * @skb: buffer to forward
+ * @mtu: Maximum size to forward. If 0 dev->mtu is used.
  *
  * return values:
  *	NET_RX_SUCCESS	(no congestion)
@@ -1819,9 +1824,9 @@ EXPORT_SYMBOL_GPL(__dev_forward_skb);
  * we have to clear all information in the skb that could
  * impact namespace isolation.
  */
-int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu)
 {
-	return __dev_forward_skb(dev, skb) ?: netif_rx_internal(skb);
+	return __dev_forward_skb(dev, skb, mtu) ?: netif_rx_internal(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index ebaeaf2e46e8..3f3eb26e7ea1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1632,13 +1632,13 @@ static const struct bpf_func_proto bpf_csum_update_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	return dev_forward_skb(dev, skb);
+	return dev_forward_skb(dev, skb, 0);
 }
 
 static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
 				      struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, 0);
 
 	if (likely(!ret)) {
 		skb->dev = dev;
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 6fd41d7afe1e..1258555b6578 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -164,7 +164,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
 	skb_dst_drop(skb);
 	nf_reset(skb);
 
-	if (dev_forward_skb(dev, skb) == NET_RX_SUCCESS) {
+	if (dev_forward_skb(dev, skb, 0) == NET_RX_SUCCESS) {
 		atomic_long_inc(&priv->rx_packets);
 		atomic_long_add(data_len, &priv->rx_bytes);
 	} else {
-- 
2.11.0

^ permalink raw reply related

* [PATCH 2/2] veth: Added attribute to set maximum receive size on veth interfaces
From: Fredrik Markstrom @ 2017-05-09 12:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge, Fredrik Markstrom
In-Reply-To: <20170509124439.45674-1-fredrik.markstrom@gmail.com>

Currently veth drops all packets larger than the MTU set on the receiving
end of the pair. This is inconsistent with most hardware ethernet drivers.
This patch adds a new driver attribute to set the maximum size of a received
packet, making it possible to create configurations similar to those
possible with (most) hardware ethernet interfaces.

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
---
 drivers/net/veth.c        | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/veth.h |  1 +
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 561da3a63b8a..5669286dd531 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -33,6 +33,7 @@ struct veth_priv {
 	struct net_device __rcu	*peer;
 	atomic64_t		dropped;
 	unsigned		requested_headroom;
+	int			mru;
 };
 
 /*
@@ -106,6 +107,7 @@ static const struct ethtool_ops veth_ethtool_ops = {
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
+	struct veth_priv *rcv_priv;
 	struct net_device *rcv;
 	int length = skb->len;
 
@@ -115,8 +117,10 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		kfree_skb(skb);
 		goto drop;
 	}
+	rcv_priv = netdev_priv(rcv);
 
-	if (likely(dev_forward_skb(rcv, skb, 0) == NET_RX_SUCCESS)) {
+	if (likely(dev_forward_skb(rcv, skb, rcv_priv->mru) ==
+		   NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
@@ -346,6 +350,11 @@ static int veth_validate(struct nlattr *tb[], struct nlattr *data[])
 		if (!is_valid_veth_mtu(nla_get_u32(tb[IFLA_MTU])))
 			return -EINVAL;
 	}
+
+	if (tb[VETH_MRU])
+		if (!is_valid_veth_mtu(nla_get_u32(tb[VETH_MRU])))
+			return -EINVAL;
+
 	return 0;
 }
 
@@ -450,10 +459,15 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	 */
 
 	priv = netdev_priv(dev);
+	if (tb[VETH_MRU])
+		priv->mru = nla_get_u32(tb[VETH_MRU]);
 	rcu_assign_pointer(priv->peer, peer);
 
 	priv = netdev_priv(peer);
+	if (tbp[VETH_MRU])
+		priv->mru = nla_get_u32(tbp[VETH_MRU]);
 	rcu_assign_pointer(priv->peer, dev);
+
 	return 0;
 
 err_register_dev:
@@ -489,8 +503,34 @@ static void veth_dellink(struct net_device *dev, struct list_head *head)
 	}
 }
 
+static int veth_changelink(struct net_device *dev,
+			   struct nlattr *tb[], struct nlattr *data[])
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	if (data && data[VETH_MRU])
+		priv->mru = nla_get_u32(data[VETH_MRU]);
+	return 0;
+}
+
+static size_t veth_get_size(const struct net_device *dev)
+{
+	return nla_total_size(4);/* VETH_MRU */
+}
+
+static int veth_fill_info(struct sk_buff *skb,
+			  const struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	if (nla_put_u32(skb, VETH_MRU, priv->mru))
+		return -EMSGSIZE;
+	return 0;
+}
+
 static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
+	[VETH_MRU]		= { .type = NLA_U32 },
 };
 
 static struct net *veth_get_link_net(const struct net_device *dev)
@@ -508,9 +548,12 @@ static struct rtnl_link_ops veth_link_ops = {
 	.validate	= veth_validate,
 	.newlink	= veth_newlink,
 	.dellink	= veth_dellink,
+	.changelink	= veth_changelink,
 	.policy		= veth_policy,
 	.maxtype	= VETH_INFO_MAX,
 	.get_link_net	= veth_get_link_net,
+	.get_size	= veth_get_size,
+	.fill_info	= veth_fill_info,
 };
 
 /*
diff --git a/include/uapi/linux/veth.h b/include/uapi/linux/veth.h
index 3354c1eb424e..8665b260f156 100644
--- a/include/uapi/linux/veth.h
+++ b/include/uapi/linux/veth.h
@@ -4,6 +4,7 @@
 enum {
 	VETH_INFO_UNSPEC,
 	VETH_INFO_PEER,
+	VETH_MRU,
 
 	__VETH_INFO_MAX
 #define VETH_INFO_MAX	(__VETH_INFO_MAX - 1)
-- 
2.11.0

^ permalink raw reply related

* Support for VETH_MRU in libnl
From: Fredrik Markstrom @ 2017-05-09 12:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge, Fredrik Markstrom
In-Reply-To: <20170509124439.45674-1-fredrik.markstrom@gmail.com>

---
 include/linux/if_link.h           |   1 +
 include/netlink-private/types.h   |   1 +
 include/netlink/route/link/veth.h |   4 ++
 lib/route/link.c                  |   4 ++
 lib/route/link/veth.c             | 141 +++++++++++++++++++++++++++++---------
 5 files changed, 118 insertions(+), 33 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 8b84939..b9859bd 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -316,6 +316,7 @@ struct ifla_vxlan_port_range {
 enum {
 	VETH_INFO_UNSPEC,
 	VETH_INFO_PEER,
+	VETH_MRU,
 
 	__VETH_INFO_MAX
 #define VETH_INFO_MAX   (__VETH_INFO_MAX - 1)
diff --git a/include/netlink-private/types.h b/include/netlink-private/types.h
index 3ff4fe1..c97090b 100644
--- a/include/netlink-private/types.h
+++ b/include/netlink-private/types.h
@@ -165,6 +165,7 @@ struct rtnl_link
 	uint32_t			l_flags;
 	uint32_t			l_change;
 	uint32_t 			l_mtu;
+	uint32_t 			l_mru;
 	uint32_t			l_link;
 	uint32_t			l_txqlen;
 	uint32_t			l_weight;
diff --git a/include/netlink/route/link/veth.h b/include/netlink/route/link/veth.h
index 35c2345..58eeb98 100644
--- a/include/netlink/route/link/veth.h
+++ b/include/netlink/route/link/veth.h
@@ -29,6 +29,10 @@ extern struct rtnl_link *rtnl_link_veth_get_peer(struct rtnl_link *);
 extern int rtnl_link_veth_add(struct nl_sock *sock, const char *name,
 			      const char *peer, pid_t pid);
 
+extern int rtnl_link_veth_set_mru(struct rtnl_link *, uint32_t);
+
+extern uint32_t rtnl_link_veth_get_mru(struct rtnl_link *);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/route/link.c b/lib/route/link.c
index 3d31ffc..3cdacbb 100644
--- a/lib/route/link.c
+++ b/lib/route/link.c
@@ -61,6 +61,7 @@
 #define LINK_ATTR_PHYS_PORT_ID	(1 << 28)
 #define LINK_ATTR_NS_FD		(1 << 29)
 #define LINK_ATTR_NS_PID	(1 << 30)
+#define LINK_ATTR_MRU		(1 << 31)
 
 static struct nl_cache_ops rtnl_link_ops;
 static struct nl_object_ops link_obj_ops;
@@ -1255,6 +1256,9 @@ int rtnl_link_fill_info(struct nl_msg *msg, struct rtnl_link *link)
 	if (link->ce_mask & LINK_ATTR_MTU)
 		NLA_PUT_U32(msg, IFLA_MTU, link->l_mtu);
 
+	if (link->ce_mask & LINK_ATTR_MRU)
+		NLA_PUT_U32(msg, IFLA_MTU, link->l_mru);
+
 	if (link->ce_mask & LINK_ATTR_TXQLEN)
 		NLA_PUT_U32(msg, IFLA_TXQLEN, link->l_txqlen);
 
diff --git a/lib/route/link/veth.c b/lib/route/link/veth.c
index e7e4a26..5dc15af 100644
--- a/lib/route/link/veth.c
+++ b/lib/route/link/veth.c
@@ -33,16 +33,62 @@
 
 #include <linux/if_link.h>
 
+#define VETH_HAS_MRU		(1<<0)
+
+struct veth_info
+{
+	struct rtnl_link *peer;
+	uint32_t		vei_mru;
+	uint32_t		vei_mask;
+};
+
 static struct nla_policy veth_policy[VETH_INFO_MAX+1] = {
 	[VETH_INFO_PEER]	= { .minlen = sizeof(struct ifinfomsg) },
+	[VETH_MRU]		= { .type = NLA_U32 },
 };
 
+static int veth_alloc(struct rtnl_link *link)
+{
+	struct rtnl_link *peer;
+	struct veth_info *vei = link->l_info;
+	int err;
+
+	/* return early if we are in recursion */
+	if (vei && vei->peer)
+		return 0;
+
+	if (!(peer = rtnl_link_alloc()))
+		return -NLE_NOMEM;
+
+	if ((vei = calloc(1, sizeof(*vei))) == NULL)
+	  return -NLE_NOMEM;
+
+	/* We don't need to hold a reference here, as link and
+	 * its peer should always be freed together.
+	 */
+	vei->peer = link;
+
+	peer->l_info = vei;
+	if ((err = rtnl_link_set_type(peer, "veth")) < 0) {
+		rtnl_link_put(peer);
+		return err;
+	}
+
+	if ((vei = calloc(1, sizeof(*vei))) == NULL)
+	  return -NLE_NOMEM;
+
+	vei->peer = peer;
+	link->l_info = vei;
+	return 0;
+}
+
 static int veth_parse(struct rtnl_link *link, struct nlattr *data,
 		      struct nlattr *xstats)
 {
 	struct nlattr *tb[VETH_INFO_MAX+1];
 	struct nlattr *peer_tb[IFLA_MAX + 1];
-	struct rtnl_link *peer = link->l_info;
+	struct veth_info *vei = link->l_info;
+	struct rtnl_link *peer = vei->peer;
 	int err;
 
 	NL_DBG(3, "Parsing veth link info");
@@ -50,6 +96,14 @@ static int veth_parse(struct rtnl_link *link, struct nlattr *data,
 	if ((err = nla_parse_nested(tb, VETH_INFO_MAX, data, veth_policy)) < 0)
 		goto errout;
 
+	if ((err = veth_alloc(link)) < 0)
+		goto errout;
+
+	if (tb[VETH_MRU]) {
+		vei->vei_mru = nla_get_u32(tb[VETH_MRU]);
+		vei->vei_mask |= VETH_HAS_MRU;
+	}
+
 	if (tb[VETH_INFO_PEER]) {
 		struct nlattr *nla_peer;
 		struct ifinfomsg *ifi;
@@ -86,7 +140,8 @@ static void veth_dump_line(struct rtnl_link *link, struct nl_dump_params *p)
 
 static void veth_dump_details(struct rtnl_link *link, struct nl_dump_params *p)
 {
-	struct rtnl_link *peer = link->l_info;
+	struct veth_info *vei = link->l_info;
+	struct rtnl_link *peer = vei->peer;
 	char *name;
 	name = rtnl_link_get_name(peer);
 	nl_dump(p, "      peer ");
@@ -98,7 +153,14 @@ static void veth_dump_details(struct rtnl_link *link, struct nl_dump_params *p)
 
 static int veth_clone(struct rtnl_link *dst, struct rtnl_link *src)
 {
-	struct rtnl_link *dst_peer = NULL, *src_peer = src->l_info;
+	struct veth_info *src_vei = src->l_info;
+	struct veth_info *dst_vei = dst->l_info;
+	struct rtnl_link *dst_peer = NULL, *src_peer = src_vei->peer;
+
+
+	printf("veth_clone not implemented\n");
+
+	// FIXME:
 
 	/* we are calling nl_object_clone() recursively, this should
 	 * happen only once */
@@ -116,7 +178,8 @@ static int veth_clone(struct rtnl_link *dst, struct rtnl_link *src)
 
 static int veth_put_attrs(struct nl_msg *msg, struct rtnl_link *link)
 {
-	struct rtnl_link *peer = link->l_info;
+	struct veth_info *vei = link->l_info;
+	struct rtnl_link *peer = vei->peer;
 	struct ifinfomsg ifi;
 	struct nlattr *data, *info_peer;
 
@@ -135,44 +198,31 @@ static int veth_put_attrs(struct nl_msg *msg, struct rtnl_link *link)
 		return -NLE_MSGSIZE;
 	rtnl_link_fill_info(msg, peer);
 	nla_nest_end(msg, info_peer);
-	nla_nest_end(msg, data);
 
-	return 0;
-}
-
-static int veth_alloc(struct rtnl_link *link)
-{
-	struct rtnl_link *peer;
-	int err;
-
-	/* return early if we are in recursion */
-	if (link->l_info)
-		return 0;
+	if (vei->vei_mask & VETH_HAS_MRU)
+		NLA_PUT_U32(msg, VETH_MRU, vei->vei_mru);
 
-	if (!(peer = rtnl_link_alloc()))
-		return -NLE_NOMEM;
+	nla_nest_end(msg, data);
 
-	/* We don't need to hold a reference here, as link and
-	 * its peer should always be freed together.
-	 */
-	peer->l_info = link;
-	if ((err = rtnl_link_set_type(peer, "veth")) < 0) {
-		rtnl_link_put(peer);
-		return err;
-	}
+nla_put_failure:
 
-	link->l_info = peer;
 	return 0;
 }
 
 static void veth_free(struct rtnl_link *link)
 {
-	struct rtnl_link *peer = link->l_info;
-	if (peer) {
+	struct veth_info *vei = link->l_info;
+	if (vei) {
+		struct rtnl_link *peer = vei->peer;
+		if (peer) {
+			vei->peer = NULL;
+			/* avoid calling this recursively */
+			free(peer->l_info);
+			peer->l_info = NULL;
+			rtnl_link_put(peer);
+		}
+		free(vei);
 		link->l_info = NULL;
-		/* avoid calling this recursively */
-		peer->l_info = NULL;
-		rtnl_link_put(peer);
 	}
 	/* the caller should finally free link */
 }
@@ -195,7 +245,7 @@ static struct rtnl_link_info_ops veth_info_ops = {
 #define IS_VETH_LINK_ASSERT(link) \
 	if ((link)->l_info_ops != &veth_info_ops) { \
 		APPBUG("Link is not a veth link. set type \"veth\" first."); \
-		return NULL; \
+		return -NLE_OPNOTSUPP; \
 	}
 /** @endcond */
 
@@ -293,6 +343,31 @@ int rtnl_link_veth_add(struct nl_sock *sock, const char *name,
 	return err;
 }
 
+int rtnl_link_veth_set_mru(struct rtnl_link *link, uint32_t mru)
+{
+	struct veth_info *vei = link->l_info;
+
+	IS_VETH_LINK_ASSERT(link);
+
+	vei->vei_mru = mru;
+	vei->vei_mask |= VETH_HAS_MRU;
+
+	return 0;
+}
+
+uint32_t rtnl_link_veth_get_mru(struct rtnl_link *link)
+{
+	struct veth_info *vei = link->l_info;
+
+	IS_VETH_LINK_ASSERT(link);
+
+	if (vei->vei_mask & VETH_HAS_MRU)
+		return vei->vei_mru;
+	else
+		return 0;
+}
+
+
 /** @} */
 
 static void __init veth_init(void)
-- 
2.10.1

^ permalink raw reply related

* Re: [PATCH net] rtnetlink: Fix the IFLA_PHYS_PORT_NAME TLV to include terminating NULL
From: Yotam Gigi @ 2017-05-09 12:48 UTC (permalink / raw)
  To: Tobias Klauser
  Cc: davem, zhangshengju, roopa, sd, bblanco, minipli, nogahf, moshe,
	rshearma, daniel, netdev, David Ahern
In-Reply-To: <20170509123107.GG10395@distanz.ch>

On 05/09/2017 03:31 PM, Tobias Klauser wrote:
> On 2017-05-09 at 14:12:02 +0200, Yotam Gigi <yotamg@mellanox.com> wrote:
>> The IFLA_PHYS_PORT_NAME rtnetlink TLV length does not include the
>> terminating NULL character, which is different from other string typed
>> TLVs. Due to the fact that libnl checks for the terminating NULL in every
>> string typed attribute, it crashes on every RTM_GETLINK response on
>> drivers that implement ndo_get_phys_port_name.
>>
>> Make the fill_phys_port_name function include the terminating NULL in the
>> TLV size by using the nla_put_string helper function.
>>
>> Fixes: db24a9044ee1 ("net: add support for phys_port_name")
>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>> Cc: David Ahern <dsa@cumulusnetworks.com>
>> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>> Please consider this for stable too. Thanks!
> This is already fixed in commit 77ef033b687c ("rtnetlink: NUL-terminate
> IFLA_PHYS_PORT_NAME string").


You are right. I forgot to rebase my net tree :)
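
For reference, the length difference discussed above can be sketched in plain C. This is only an illustration under stated assumptions: the helper names are hypothetical and are neither kernel nor libnl APIs. It shows why a receiver that checks the last payload byte rejects a string attribute whose length omits the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Payload length if the sender omits the terminator (hypothetical helper). */
static size_t payload_len_without_nul(const char *s)
{
	return strlen(s);
}

/* Payload length if the sender includes the terminator (hypothetical helper). */
static size_t payload_len_with_nul(const char *s)
{
	return strlen(s) + 1;
}

/* Receiver-side check of the kind described above: the last payload
 * byte of a string-typed attribute must be the terminating '\0'. */
static int payload_is_terminated(const char *payload, size_t len)
{
	assert(payload != NULL);
	return len > 0 && payload[len - 1] == '\0';
}
```

With a name such as "p1", the check passes for a payload length of 3 and fails for a payload length of 2.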

^ permalink raw reply

* [PATCH v2] ath10k: remove unnecessary code
From: Gustavo A. R. Silva @ 2017-05-09 12:51 UTC (permalink / raw)
  To: Kalle Valo
  Cc: ath10k, linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva,
	Arend Van Spriel

The array fields in struct wmi_start_scan_arg that are checked here are
fixed size arrays so they can never be NULL.
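
As a rough illustration of the reasoning (not part of the patch), the sketch below uses a hypothetical struct whose field names and sizes are made up and do not match the ath10k definitions. An embedded array lives inside the struct itself, so its address is the struct address plus a fixed offset and cannot be NULL for a valid struct pointer. Only the bounds checks carry information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct with embedded fixed-size arrays. */
struct scan_arg_demo {
	uint32_t ie_len;
	uint8_t ie[256];
	uint32_t n_channels;
	uint16_t channels[64];
};

/* The array decays to a pointer into the struct, at a fixed offset. */
static int ie_points_into_struct(const struct scan_arg_demo *arg)
{
	assert(arg != NULL);
	return (const char *)arg->ie ==
	       (const char *)arg + offsetof(struct scan_arg_demo, ie);
}

/* The checks that remain meaningful are the length limits. */
static int scan_arg_demo_verify(const struct scan_arg_demo *arg)
{
	assert(arg != NULL);
	if (arg->ie_len > sizeof(arg->ie))
		return -1;
	if (arg->n_channels > sizeof(arg->channels) / sizeof(arg->channels[0]))
		return -1;
	return 0;
}
```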

Addresses-Coverity-ID: 1260031
Cc: Arend Van Spriel <arend.vanspriel@broadcom.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
---
Changes in v2:
 Rephrase commit log.

 drivers/net/wireless/ath/ath10k/wmi.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 2f1743e..135cf83 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -5933,15 +5933,6 @@ static struct sk_buff *ath10k_wmi_10_4_op_gen_init(struct ath10k *ar)
 
 int ath10k_wmi_start_scan_verify(const struct wmi_start_scan_arg *arg)
 {
-	if (arg->ie_len && !arg->ie)
-		return -EINVAL;
-	if (arg->n_channels && !arg->channels)
-		return -EINVAL;
-	if (arg->n_ssids && !arg->ssids)
-		return -EINVAL;
-	if (arg->n_bssids && !arg->bssids)
-		return -EINVAL;
-
 	if (arg->ie_len > WLAN_SCAN_PARAMS_MAX_IE_LEN)
 		return -EINVAL;
 	if (arg->n_channels > ARRAY_SIZE(arg->channels))
-- 
2.5.0

^ permalink raw reply related

* [PATCH v2] ath9k: remove unnecessary code
From: Gustavo A. R. Silva @ 2017-05-09 13:04 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva,
	Arend Van Spriel

The array field eeprom_data in struct ath9k_platform_data
is a fixed size array so it can never be NULL.

Addresses-Coverity-ID: 1364903
Cc: Arend Van Spriel <arend.vanspriel@broadcom.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
---
Changes in v2:
 Rephrase commit log.

 drivers/net/wireless/ath/ath9k/eeprom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/eeprom.c b/drivers/net/wireless/ath/ath9k/eeprom.c
index fb80ec8..5c3bc28 100644
--- a/drivers/net/wireless/ath/ath9k/eeprom.c
+++ b/drivers/net/wireless/ath/ath9k/eeprom.c
@@ -143,7 +143,7 @@ bool ath9k_hw_nvram_read(struct ath_hw *ah, u32 off, u16 *data)
 
 	if (ah->eeprom_blob)
 		ret = ath9k_hw_nvram_read_firmware(ah->eeprom_blob, off, data);
-	else if (pdata && !pdata->use_eeprom && pdata->eeprom_data)
+	else if (pdata && !pdata->use_eeprom)
 		ret = ath9k_hw_nvram_read_pdata(pdata, off, data);
 	else
 		ret = common->bus_ops->eeprom_read(common, off, data);
-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH net] ip6_tunnel: remove unreachable ICMP_REDIRECT code
From: Hangbin Liu @ 2017-05-09 13:09 UTC (permalink / raw)
  To: Cong Wang; +Cc: Hangbin Liu, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpWAdKve_ZxG3n+uoz_We65RV95CaKydPFYobo+pQWvjjg@mail.gmail.com>

On Mon, May 08, 2017 at 01:26:48PM -0700, Cong Wang wrote:
> On Mon, May 8, 2017 at 4:11 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> > After calling ip6_tnl_err(), rel_type will be either ICMPV6_DEST_UNREACH
> > or ICMPV6_PKT_TOOBIG. We will never reach ICMP_REDIRECT. So remove it.
> 
> Are you sure we really don't need to handle NDISC_REDIRECT here?

Hi Cong,

I have no intention of removing any handler that we need.

Just from the code path: after calling ip6_tnl_err() without error, rel_type
will be set to either ICMPV6_DEST_UNREACH or ICMPV6_PKT_TOOBIG, which means the
NDISC_REDIRECT check will never be reached. That's the reason I removed it.

So if we still want to handle it, I think we need a check in ip6_tnl_err().

Please correct me if I missed anything. As you know, I'm new here.
> 
> I can't find anything in RFC 2473 explicitly, but I feel we should handle
> it rather than ignore it, according to:
> 
>    To report a problem detected inside the tunnel to the source of an
>    original packet, the tunnel entry point node must relay the ICMP
>    message received from inside the tunnel to the source of that
>    original IPv6 packet.


As I understand it, this is about a problem detected inside the tunnel
that should be reported to the source of the original packet.

In section 8.1 Tunnel ICMP Messages

The tunnel ICMP messages that are reported to the source of the
original packet are:
hop limit exceeded
unreachable node
parameter problem
packet too big

Also, as I understand it, a redirect message could arise in a topology like this:

A: Original Packet Source Node
B: Tunnel Entry-Point Node
C: Tunnel Exit-Point Node
D: Original Packet Destination Node

A   --  B  -- Node 1 -- C -- D
           \- Node 2 -/

When B sends a message to C, Node 1 may send a redirect to B. That is
an ICMP error inside the tunnel, not one to relay from the tunnel entry
point to the original source.


Or it could look like this:

A: Original Packet Source Node
B, E: Tunnel Entry-Point Nodes
C, F: Tunnel Exit-Point Nodes
D: Original Packet Destination Node

A  --  B  --  C  --  D
   \-  E  --  F  -/

When A sends a packet to D, B may reply to A with a redirect message. But
I think this problem is not detected _inside_ the tunnel.

Thanks
Hangbin

^ permalink raw reply

* Re: [PATCH net] tcp: init tcp_options before using it.
From: Hangbin Liu @ 2017-05-09 13:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1494254963.7796.55.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, May 08, 2017 at 07:49:23AM -0700, Eric Dumazet wrote:
> On Mon, 2017-05-08 at 17:57 +0800, Hangbin Liu wrote:
> > I searched 4308fc58dced ("tcp: Document use of undefined variable") in
> > archive list, but did not find the thread. So I'm not sure why we only
> > add a description about un-initialized value.
> > 
> > Even though we don't use tmp_opt.sack_ok, I think it would be safer to
> > initialize the value before using it, just as the other callers do.
> 
> Patch is not needed at all.
> 
> Comment and code are pretty clear.
> 
> This part of the code uses a generic function (tcp_parse_options()) to
> decode TCP options, but we only care about the TS one.

OK, got it. Thanks for the explanation and sorry for the inconvenience.

Best Regards
Hangbin

^ permalink raw reply

* Re: [PATCH net] tcp: do not inherit mc_list from parent
From: Eric Dumazet @ 2017-05-09 13:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Pray3r, Andrey Konovalov
In-Reply-To: <1494332235.7796.70.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2017-05-09 at 05:17 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> syzkaller found a way to trigger double frees from ip_mc_drop_socket()
> 
> It turns out that we leave a copy of the parent mc_list at accept() time,
> which is very bad.
> 
> Very similar to commit 8b485ce69876 ("tcp: do not inherit
> fastopen_req from parent")
> 
> Initial report from Pray3r, completed by Andrey's report.
> Thanks a lot to them !
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Pray3r <pray3r.z@gmail.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Tested-by: Andrey Konovalov <andreyknvl@google.com>
> ---
> Notes:
>  - day-0 bug.
>  - Not sure if it makes sense for TCP socket to be able to join MC
> group ?


I will send a V2, putting the fix in inet_csk_clone_lock() so that DCCP
is also fixed ;)

^ permalink raw reply

* Re: [PATCH net] tcp: init tcp_options before using it.
From: Eric Dumazet @ 2017-05-09 13:25 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev
In-Reply-To: <20170509132240.GA4649@leo.usersys.redhat.com>

On Tue, 2017-05-09 at 21:22 +0800, Hangbin Liu wrote:
> On Mon, May 08, 2017 at 07:49:23AM -0700, Eric Dumazet wrote:
> > On Mon, 2017-05-08 at 17:57 +0800, Hangbin Liu wrote:
> > > I searched 4308fc58dced ("tcp: Document use of undefined variable") in
> > > archive list, but did not find the thread. So I'm not sure why we only
> > > add a description about un-initialized value.
> > > 
> > > Even though we don't use tmp_opt.sack_ok, I think it would be safer to
> > > initialize the value before using it, just as the other callers do.
> > 
> > Patch is not needed at all.
> > 
> > Comment and code are pretty clear.
> > 
> > This part of the code uses a generic function (tcp_parse_options()) to
> > decode TCP options, but we only care about the TS one.
> 
> OK, got it. Thanks for the explanation and sorry for the inconvenience.

No inconvenience taken ;)

^ permalink raw reply

* Re: [PATCH 1/4] net: macb: Add support for PTP timestamps in DMA descriptors
From: David Miller @ 2017-05-09 13:25 UTC (permalink / raw)
  To: richardcochran-Re5JQEeQqe8AvxtiuMwx3w
  Cc: rafalo-vna1KIf7WgpBDgjK7y7TUQ,
	nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	harini.katakam-gjFFaj9aHVfQT0dZR+AlfA,
	andrei.pistirica-UWL1GkI3JZL3oGB3hsPCZA
In-Reply-To: <20170509120434.GA9368-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

From: Richard Cochran <richardcochran-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Tue, 9 May 2017 14:04:34 +0200

> On Tue, May 09, 2017 at 10:24:45AM +0100, Rafal Ozieblo wrote:
>> This patch adds support for PTP timestamps in
>> DMA buffer descriptors. It checks capability at runtime
>> and uses appropriate buffer descriptor.
>> 
>> Signed-off-by: Rafal Ozieblo <rafalo-vna1KIf7WgpBDgjK7y7TUQ@public.gmane.org>
> 
> You posted this series once before, on April 13, 2017.  That makes
> this v2.  Please add v2 in the subject line, eg. [PATCH v2 1/4].
> 
> Also, add a cover letter [0/4] that summarizes the changes between v1
> and v2 of the series.

Please don't ask someone to repost a series targeting net-next
right now when that tree is currently closed for submissions.

Thank you.

^ permalink raw reply

* Re: [PATCH RFC v2] ptr_ring: add ptr_ring_unconsume
From: Michael S. Tsirkin @ 2017-05-09 13:26 UTC (permalink / raw)
  To: Jason Wang; +Cc: linux-kernel, netdev
In-Reply-To: <ce6e1816-e3e0-4e6b-b017-05cfc54a0170@redhat.com>

On Wed, Apr 26, 2017 at 05:09:42PM +0800, Jason Wang wrote:
> 
> 
> On 2017年04月25日 00:01, Michael S. Tsirkin wrote:
> > Applications that consume a batch of entries in one go
> > can benefit from the ability to return some of them back
> > into the ring.
> > 
> > Add an API for that - assuming there's space. If there's no space
> > we naturally can't do this and have to drop entries, but this implies
> > the ring is full so we'd likely drop some anyway.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Jason, if you add this and unconsume the outstanding packets
> > on backend disconnect, vhost close and reset, I think
> > we should apply your patch even if we don't yet know 100%
> > why it helps.
> > 
> > changes from v1:
> > - fix up coding style issues reported by Sergei Shtylyov
> > 
> > 
> >   include/linux/ptr_ring.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 56 insertions(+)
> > 
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 783e7f5..902afc2 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -457,6 +457,62 @@ static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
> >   	return 0;
> >   }
> > +/*
> > + * Return entries into ring. Destroy entries that don't fit.
> > + *
> > + * Note: this is expected to be a rare slow path operation.
> > + *
> > + * Note: producer lock is nested within consumer lock, so if you
> > + * resize you must make sure all uses nest correctly.
> > + * In particular if you consume ring in interrupt or BH context, you must
> > + * disable interrupts/BH when doing so.
> > + */
> > +static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
> > +				      void (*destroy)(void *))
> > +{
> > +	unsigned long flags;
> > +	int head;
> > +
> > +	spin_lock_irqsave(&r->consumer_lock, flags);
> > +	spin_lock(&r->producer_lock);
> > +
> > +	if (!r->size)
> > +		goto done;
> > +
> > +	/*
> > +	 * Clean out buffered entries (for simplicity). This way following code
> > +	 * can test entries for NULL and if not assume they are valid.
> > +	 */
> > +	head = r->consumer_head - 1;
> > +	while (likely(head >= r->consumer_tail))
> > +		r->queue[head--] = NULL;
> > +	r->consumer_tail = r->consumer_head;
> > +
> > +	/*
> > +	 * Go over entries in batch, start moving head back and copy entries.
> > +	 * Stop when we run into previously unconsumed entries.
> > +	 */
> > +	while (n--) {
> > +		head = r->consumer_head - 1;
> > +		if (head < 0)
> > +			head = r->size - 1;
> > +		if (r->queue[head]) {
> > +			/* This batch entry will have to be destroyed. */
> > +			++n;
> > +			goto done;
> > +		}
> > +		r->queue[head] = batch[n];
> > +		r->consumer_tail = r->consumer_head = head;
> 
> Looks like something is wrong here (bad page state reported); uncommenting
> the above while() solves the issue. But after staring at it for a while I
> didn't find anything interesting. Maybe you have some idea about this?
> 
> Thanks
> 
> 
> > +	}
> > +
> > +done:
> > +	/* Destroy all entries left in the batch. */
> > +	while (n--)
> > +		destroy(batch[n]);
> > +	spin_unlock(&r->producer_lock);
> > +	spin_unlock_irqrestore(&r->consumer_lock, flags);
> > +}
> > +
> >   static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
> >   					   int size, gfp_t gfp,
> >   					   void (*destroy)(void *))

What's our plan here? I can't delay pull request much longer.

^ permalink raw reply

* [PATCH v2 net] dccp/tcp: do not inherit mc_list from parent
From: Eric Dumazet @ 2017-05-09 13:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Pray3r, Andrey Konovalov
In-Reply-To: <1494332235.7796.70.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

syzkaller found a way to trigger double frees from ip_mc_drop_socket()

It turns out that we leave a copy of the parent mc_list at accept() time,
which is very bad.

Very similar to commit 8b485ce69876 ("tcp: do not inherit
fastopen_req from parent")
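
To illustrate the failure mode in isolation (a minimal userspace sketch, not kernel code; struct demo_sock and its helpers are hypothetical): cloning by copying the whole struct duplicates the owning pointer, so both copies would free the same allocation unless the clone drops it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a socket that owns a heap-allocated list. */
struct demo_sock {
	int *mc_list;	/* owned pointer, freed on close */
};

/* Clone by copying the whole struct, then drop the inherited pointer
 * so parent and child never free the same allocation. */
static void demo_clone(struct demo_sock *child, const struct demo_sock *parent)
{
	assert(child != parent);
	memcpy(child, parent, sizeof(*child));
	child->mc_list = NULL;	/* without this, both would own mc_list */
}

/* Release what the socket owns; free(NULL) is a no-op. */
static void demo_close(struct demo_sock *sk)
{
	free(sk->mc_list);
	sk->mc_list = NULL;
}
```

Closing the child and then the parent frees the list exactly once. Removing the NULL assignment in demo_clone() turns that sequence into a double free.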

Initial report from Pray3r, completed by Andrey's report.
Thanks a lot to them !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
---
v2: fix moved into inet_csk_clone_lock() to fix both DCCP and TCP

 net/ipv4/inet_connection_sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 5e313c1ac94fc88eca5fe3a0e9e46e551e955ff0..1054d330bf9df3189a21dbb08e27c0e6ad136775 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -794,6 +794,8 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
 		/* listeners have SOCK_RCU_FREE, not the children */
 		sock_reset_flag(newsk, SOCK_RCU_FREE);
 
+		inet_sk(newsk)->mc_list = NULL;
+
 		newsk->sk_mark = inet_rsk(req)->ir_mark;
 		atomic64_set(&newsk->sk_cookie,
 			     atomic64_read(&inet_rsk(req)->ir_cookie));

^ permalink raw reply related

* Re: [PATCH net] tcp: do not inherit mc_list from parent
From: David Miller @ 2017-05-09 13:30 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, pray3r.z, andreyknvl
In-Reply-To: <1494336215.7796.75.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 09 May 2017 06:23:35 -0700

> I will send a V2, putting the fix in inet_csk_clone_lock() so that DCCP
> is also fixed ;)

Thank you :)

^ permalink raw reply

* Re: [PATCH] net: fec: select queue depending on VLAN priority
From: David Miller @ 2017-05-09 13:39 UTC (permalink / raw)
  To: stefan; +Cc: fugang.duan, andrew, festevam, netdev, linux-kernel
In-Reply-To: <20170509053708.2573-1-stefan@agner.ch>

From: Stefan Agner <stefan@agner.ch>
Date: Mon,  8 May 2017 22:37:08 -0700

> Since the addition of the multi queue code with commit 59d0f7465644
> ("net: fec: init multi queue date structure") the queue selection
> has been handled by the default transmit queue selection
> implementation which tries to evenly distribute the traffic across
> all available queues. This selection presumes that the queues are
> using an equal priority, however, the queues 1 and 2 are actually
> of higher priority (the classification of the queues is enabled in
> fec_enet_enable_ring).
> 
> This can lead to net scheduler warnings and continuous TX ring
> dumps when exercising the system with iperf.
> 
> Use only queue 0 for all common traffic (no VLAN and P802.1p
> priority 0 and 1) and route level 2-7 through queue 1 and 2.
> 
> Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
> Fixes: 59d0f7465644 ("net: fec: init multi queue date structure")

If the queues are used for prioritization, and it does not have
multiple normal priority level queues, multiqueue is not what the
driver should have implemented.

^ permalink raw reply

* Re: [PATCH] DECnet: Use container_of() for embedded struct
From: David Miller @ 2017-05-09 13:40 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, linux-decnet-user, netdev
In-Reply-To: <20170508223144.GA53216@beast>

From: Kees Cook <keescook@chromium.org>
Date: Mon, 8 May 2017 15:31:44 -0700

> Instead of a direct cross-type cast, use container_of() to locate
> the embedded structure, even in the face of future struct layout
> randomization.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied.
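
For readers unfamiliar with the idiom, here is a minimal userspace sketch. It assumes a simplified demo_container_of() macro without the kernel's type checking, and the structs are hypothetical rather than the DECnet ones. The embedded struct is deliberately not the first member, which is the layout where a direct cast would give the wrong address.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace approximation of the kernel macro, without type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct inner {
	int id;
};

/* The embedded struct is not the first member here, so casting a
 * struct inner * directly to struct outer * would be wrong. */
struct outer {
	int flags;
	struct inner in;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
static struct outer *outer_from_inner(struct inner *in)
{
	assert(in != NULL);
	return demo_container_of(in, struct outer, in);
}
```

Because the offset comes from offsetof(), the lookup keeps working if members are reordered.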

^ permalink raw reply
