Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] brcmfmac: Ensure pointer correctly set if skb data location changes
From: Arend van Spriel @ 2017-04-24 11:10 UTC (permalink / raw)
  To: James Hughes, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, netdev
In-Reply-To: <20170424105219.18797-1-james.hughes@raspberrypi.org>

On 4/24/2017 12:52 PM, James Hughes wrote:
> The incoming skb header may be resized if header space is
> insufficient, which might change the data adddress in the skb.
> Ensure that a cached pointer to that data is correctly set by
> moving assignment to after any possible changes.

Thanks, James

Minor nit below...

You may add my acknowledgement:

Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
> ---
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c

[...]

> @@ -229,6 +229,8 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
>   		}
>   	}
>   
> +	eh = (struct ethhdr *)(skb->data);
> +

Please move after the length validation below.

Regards,
Arend

>   	/* validate length for ether packet */
>   	if (skb->len < sizeof(*eh)) {
>   		ret = -EINVAL;
> 

^ permalink raw reply

* Re: [PATCH iproute2 net 3/8] tc/pedit: Introduce 'add' operation
From: Or Gerlitz @ 2017-04-24 11:04 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Jamal Hadi Salim, Stephen Hemminger, Linux Netdev List,
	Or Gerlitz
In-Reply-To: <20170424075203.GA3265@office.localdomain>

On Mon, Apr 24, 2017 at 10:52 AM, Amir Vadai <amir@vadai.me> wrote:
> On Sun, Apr 23, 2017 at 01:44:51PM -0400, Jamal Hadi Salim wrote:
>> Thanks for the excellent work.

sure, it's Amir, you know..

>> On 17-04-23 08:53 AM, Amir Vadai wrote:

>> Mostly curious about hardware handling.

> As to hardware handling, Or did the implementation so he will answer
> better. AFAIK, current implementation doesn't allow partial field
> setting/adding, so such a rule can't be offloaded. I think it is only to
> make things simple and because there was no use case for that.
> To my knowledge, hardware should support such thing if needed.

yeah (currently no partial setting) and yeah (in the future yes, will
support partial setting of offset into
field and partial length to set from there) and no for partial addition.

The reason we decided to disallow partial addition in the FW API was
what do you do with the carry, e.g you
add a value to the 2nd nibble of IP address and there are carry bits,
where do you take them? this becomes
even more ugly if you are dealing with mac addresses. I guess once
there's use case for partial add offload,
we will revisit that.

Or.

^ permalink raw reply

* Re: [PATCH net] team: fix memory leaks
From: Jiri Pirko @ 2017-04-24 11:04 UTC (permalink / raw)
  To: Pan Bian; +Cc: netdev, linux-kernel
In-Reply-To: <1493029756-13171-1-git-send-email-bianpan2016@163.com>

Mon, Apr 24, 2017 at 12:29:16PM CEST, bianpan2016@163.com wrote:
>In functions team_nl_send_port_list_get() and
>team_nl_send_options_get(), pointer skb keeps the return value of
>nlmsg_new(). When the call to genlmsg_put() fails, the memory is not
>freed(). This will result in memory leak bugs.
>
>Fixes: 9b00cf2d1024 ("team: implement multipart netlink messages for options transfers")
>Signed-off-by: Pan Bian <bianpan2016@163.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* RE: [PATCH] macsec: avoid heap overflow in skb_to_sgvec
From: David Laight @ 2017-04-24 11:02 UTC (permalink / raw)
  To: 'Jason A. Donenfeld', netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, davem@davemloft.net
  Cc: stable@vger.kernel.org, security@kernel.org
In-Reply-To: <20170421211448.16995-1-Jason@zx2c4.com>

From: Jason A. Donenfeld
> Sent: 21 April 2017 22:15
> While this may appear as a humdrum one line change, it's actually quite
> important. An sk_buff stores data in three places:
> 
> 1. A linear chunk of allocated memory in skb->data. This is the easiest
>    one to work with, but it precludes using scatterdata since the memory
>    must be linear.
> 2. The array skb_shinfo(skb)->frags, which is of maximum length
>    MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
>    can point to different pages.
> 3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
>    which in turn can have data in either (1) or (2).
> 
> The first two are rather easy to deal with, since they're of a fixed
> maximum length, while the third one is not, since there can be
> potentially limitless chains of fragments. Fortunately dealing with
> frag_list is opt-in for drivers, so drivers don't actually have to deal
> with this mess. For whatever reason, macsec decided it wanted pain, and
> so it explicitly specified NETIF_F_FRAGLIST.
> 
> Because dealing with (1), (2), and (3) is insane, most users of sk_buff
> doing any sort of crypto or paging operation calls a convenient function
> called skb_to_sgvec (which happens to be recursive if (3) is in use!).
> This takes a sk_buff as input, and writes into its output pointer an
> array of scattergather list items. Sometimes people like to declare a
> fixed size scattergather list on the stack; othertimes people like to
> allocate a fixed size scattergather list on the heap. However, if you're
> doing it in a fixed-size fashion, you really shouldn't be using
> NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
> frag_list children arent't shared and then you check the number of
> fragments in total required.)
> 
> Macsec specifically does this:
> 
>         size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
>         tmp = kmalloc(size, GFP_ATOMIC);
>         *sg = (struct scatterlist *)(tmp + sg_offset);
> 	...
>         sg_init_table(sg, MAX_SKB_FRAGS + 1);
>         skb_to_sgvec(skb, sg, 0, skb->len);
> 
> Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
> using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
> overflow the heap, and disaster ensues.
...

Shouldn't skb_to_sgvec() be checking the number of fragments against
the size of the sg list?
The callers would then all need auditing to allow for failure.

	David

^ permalink raw reply

* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Nikolay Aleksandrov @ 2017-04-24 11:01 UTC (permalink / raw)
  To: Xin Long, network dev; +Cc: bridge@lists.linux-foundation.org, David S. Miller
In-Reply-To: <a8b2b2c876f416431dac031a0beb7ce079c7d0a9.1493018730.git.lucien.xin@gmail.com>

On 24/04/17 10:25, Xin Long wrote:
> During removing a bridge device, if the bridge is still up, a new mdb entry
> still can be added in br_multicast_add_group() after all mdb entries are
> removed in br_multicast_dev_del(). Like the path:
> 
>   mld_ifc_timer_expire ->
>     mld_sendpack -> ...
>       br_multicast_rcv ->
>         br_multicast_add_group
> 
> The new mp's timer will be set up. If the timer expires after the bridge
> is freed, it may cause use-after-free panic in br_multicast_group_expired.
> This can happen when ip link remove a bridge or destroy a netns with a
> bridge device inside.
> 
> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
> device after it's shutdown.
> 
> This patch is to call dev_close at the beginning of br_dev_delete so that
> netif_running check in br_multicast_add_group can avoid this issue. But
> to keep consistent with before, it will not remove the IFF_UP check in
> br_del_bridge for brctl.
> 
> Reported-by: Jianwen Ji <jiji@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/bridge/br_if.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

+CC bridge maintainers

I can see how this could happen, could you also provide the traceback ?

The patch looks good to me, actually I think it fixes another issue with
mcast stats where the percpu pointer can be accessed after it's freed if
an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
This is definitely stable material, if I'm not mistaken the issue is there since
the introduction of br_dev_delete:
commit e10177abf842
Author: Satish Ashok <sashok@cumulusnetworks.com>
Date:   Wed Jul 15 07:16:51 2015 -0700

    bridge: multicast: fix handling of temp and perm entries



Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: PROBLEM: IPVS incorrectly reverse-NATs traffic to LVS host
From: Nick Moriarty @ 2017-04-24 10:55 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Wensong Zhang, Simon Horman, netdev, lvs-devel
In-Reply-To: <alpine.LFD.2.20.1704221417080.1974@ja.home.ssi.bg>

Hi Julian,

Many thanks for your prompt fix!  I've now tested this patch against
the 4.11.0-rc8 kernel on Ubuntu, and I can confirm that my check
script is no longer seeing incorrect addresses in its responses.

Could you please keep me posted as this is merged?

Thanks again

On 22 April 2017 at 18:06, Julian Anastasov <ja@ssi.bg> wrote:
>
>         Hello,
>
> On Wed, 12 Apr 2017, Nick Moriarty wrote:
>
>> Hi,
>>
>> I've experienced a problem in how traffic returning to an LVS host is
>> handled in certain circumstances.  Please find a bug report below - if
>> there's any further information you'd like, please let me know.
>>
>> [1.] One line summary of the problem:
>> IPVS incorrectly reverse-NATs traffic to LVS host
>>
>> [2.] Full description of the problem/report:
>> When using IPVS in direct-routing mode, normal traffic from the LVS
>> host to a back-end server is sometimes incorrectly NATed on the way
>> back into the LVS host.  Using tcpdump shows that the return packets
>> have the correct source IP, but by the time it makes it back to the
>> application, it's been changed.
>>
>> To reproduce this, a configuration such as the following will work:
>> - Set up an LVS system with a VIP serving UDP to a backend DNS server
>> using the direct-routing method in IPVS
>> - Make an outgoing UDP request to the VIP from the LVS system itself
>> (this causes a connection to be added to the IPVS connection table)
>> - The request should succeed as normal
>> - Note the UDP source port used
>> - Within 5 minutes (before the UDP connection entry expires), make an
>> outgoing UDP request directly to the backend DNS server
>> - The request will fail as the reply is incorrectly modified on its
>> way back and appears to return from the VIP
>>
>> Monitoring the above sequence with tcpdump verifies that the returned
>> packet (as it enters the host) is from the DNS IP, even though the
>> application sees the VIP.
>>
>> If an outgoing request direct to the DNS server is made from a port
>> not in the connection table, everything is fine.
>
>         Thanks for the detailed report! I think, I fixed the
> problem. Let me know if you are able to test the appended fix.
>
>> I expect that somewhere, something (e.g. functionality for IPVS MASQ
>> responses) is applying IPVS connection
>> information to incoming traffic, matching a DROUTE rule, and treating
>> it as NAT traffic.
>
>         Yep, that is what happens.
>
> ================================================================
>
> [PATCH net] ipvs: SNAT packet replies only for NATed connections
>
> We do not check if packet from real server is for NAT
> connection before performing SNAT. This causes problems
> for setups that use DR/TUN and allow local clients to
> access the real server directly, for example:
>
> - local client in director creates IPVS-DR/TUN connection
> CIP->VIP and the request packets are routed to RIP.
> Talks are finished but IPVS connection is not expired yet.
>
> - second local client creates non-IPVS connection CIP->RIP
> with same reply tuple RIP->CIP and when replies are received
> on LOCAL_IN we wrongly assign them for the first client
> connection because RIP->CIP matches the reply direction.
>
> The problem is more visible to local UDP clients but in rare
> cases it can happen also for TCP or remote clients when the
> real server sends the reply traffic via the director.
>
> So, better to be more precise for the reply traffic.
> As replies are not expected for DR/TUN connections, better
> to not touch them.
>
> Reported-by: Nick Moriarty <nick.moriarty@york.ac.uk>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> ---
>  net/netfilter/ipvs/ip_vs_core.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index db40050..ee44ed5 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -849,10 +849,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
>  {
>         unsigned int verdict = NF_DROP;
>
> -       if (IP_VS_FWD_METHOD(cp) != 0) {
> -               pr_err("shouldn't reach here, because the box is on the "
> -                      "half connection in the tun/dr module.\n");
> -       }
> +       if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
> +               goto ignore_cp;
>
>         /* Ensure the checksum is correct */
>         if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) {
> @@ -886,6 +884,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
>                 ip_vs_notrack(skb);
>         else
>                 ip_vs_update_conntrack(skb, cp, 0);
> +
> +ignore_cp:
>         verdict = NF_ACCEPT;
>
>  out:
> @@ -1385,8 +1385,11 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
>          */
>         cp = pp->conn_out_get(ipvs, af, skb, &iph);
>
> -       if (likely(cp))
> +       if (likely(cp)) {
> +               if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
> +                       goto ignore_cp;
>                 return handle_response(af, skb, pd, cp, &iph, hooknum);
> +       }
>
>         /* Check for real-server-started requests */
>         if (atomic_read(&ipvs->conn_out_counter)) {
> @@ -1444,9 +1447,15 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
>                         }
>                 }
>         }
> +
> +out:
>         IP_VS_DBG_PKT(12, af, pp, skb, iph.off,
>                       "ip_vs_out: packet continues traversal as normal");
>         return NF_ACCEPT;
> +
> +ignore_cp:
> +       __ip_vs_conn_put(cp);
> +       goto out;
>  }
>
>  /*
> --
> 2.9.3
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>



-- 
Nick Moriarty
Linux Systems Administrator and Developer

IT Services
Computer Science Building
University of York
York
YO10 5GH
+44 (0)1904 32 3484

e-mail disclaimer: http://www.york.ac.uk/docs/disclaimer/email.htm

^ permalink raw reply

* [PATCH] brcmfmac: Ensure pointer correctly set if skb data location changes
From: James Hughes @ 2017-04-24 10:52 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, netdev
  Cc: James Hughes

The incoming skb header may be resized if header space is
insufficient, which might change the data adddress in the skb.
Ensure that a cached pointer to that data is correctly set by
moving assignment to after any possible changes.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 5eaac13e2317..934fe00e28a0 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -198,7 +198,7 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 	int ret;
 	struct brcmf_if *ifp = netdev_priv(ndev);
 	struct brcmf_pub *drvr = ifp->drvr;
-	struct ethhdr *eh = (struct ethhdr *)(skb->data);
+	struct ethhdr *eh;
 
 	brcmf_dbg(DATA, "Enter, bsscfgidx=%d\n", ifp->bsscfgidx);
 
@@ -229,6 +229,8 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 		}
 	}
 
+	eh = (struct ethhdr *)(skb->data);
+
 	/* validate length for ether packet */
 	if (skb->len < sizeof(*eh)) {
 		ret = -EINVAL;
-- 
2.11.0

^ permalink raw reply related

* Re: bluetooth 6lowpan interfaces are not virtual anymore
From: Luiz Augusto von Dentz @ 2017-04-24 10:35 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Network Development, Jukka Rissanen, linux-wpan@vger.kernel.org,
	Linux Bluetooth, Michael Richardson
In-Reply-To: <f3c04860-651e-c9f3-e7ab-db609d105436@pengutronix.de>

Hi Alex,

On Wed, Apr 19, 2017 at 8:43 PM, Alexander Aring <aar@pengutronix.de> wrote:
> Hi,
>
> at first I want to clarify what my definition of virtual and non virtual
> interface is:
>
> - virtual interfaces: has no queue(s)
> - non virtual interfaces: has a queue(s)
>
> I did some "big" ASCII graphic what I think it will do now.
> At first the virtual interface:
>
> --- snip ---
>
> Transmit side only! Case of virtual interface.
>
> IPv6 Stack
>
>                 +-----------------------------+
>                 |    IPv6 Packet (ND, etc)    |
>                 +--------------+--------------+
>                                |
>       -------------------------+---------------------------------
> 6LoWPAN Interface              |
>                                |
>                 +--------------v--------------+
>                 | Adaptation (IPv6 to 6Lo)    |
>                 +--------------+--------------+
>     +--------------------------+
>     |
>     | -----------------------------------------------------------
> Link|Layer (802.15.4/hci_dev/etc)
>     |
>     |                  SKB Queue (With kind of discipline)
>     |           +----------------+---------------+-------------+
>     +----------->      SKB1      |     SKBN      |     ...     +----+
>                 +----------------+---------------+-------------+    |
>                                                                     |
>                                                                     |
>       ------------------------------------------------------------  |
> Real Transeiver Hardware                                            |
>                                        +----------------------------+
>                                        |
>               +------------------------v-----------------------+
>               |        Framebuffers (YOUR hardware resource)   |
>               +------------------------------------------------+
>
> --- snap ---
>
> The non virtual interface:
>
> --- snip ---
>
> Transmit side only! Case of non virtual interface.
>
> IPv6 Stack
>
>                 +-----------------------------+
>                 |    IPv6 Packet (ND, etc)    |
>                 +--------------+--------------+
>                                |
>       -------------------------+---------------------------------
> 6LoWPAN Interface              |
>                                |
>     +--------------------------+
>     |                                                              /> Will be queued if you stop the queue
>     |                  SKB Queue (With kind of discipline)      /--   Because Link Layer is full/software limitation
>     |           +----------------+---------------+-------------+
>     +----------->      SKB1      |     SKBN      |     ...     +----+
>                 +----------------+---------------+-------------+    |
>                                                                     |
>                                +------------------------------------+
>                                |
>                 +--------------v--------------+
>                 | Adaptation (IPv6 to 6Lo)    |
>                 +--------------+--------------+
>     +--------------------------+
>     |
>     | -----------------------------------------------------------
> Link|Layer (802.15.4/hci_dev/etc)
>     |                                                              -> software limitation (tx credits?)
>     |                  SKB Queue (With kind of discipline)     ---/   Right position to make different handling (for virtual case).
>     |           +----------------+---------------+------------/+
>     +----------->      SKB1      |     SKBN      |     ...     +----+
>                 +----------------+---------------+-------------+    |
>                                                                     |
>                                                                     |
>       ------------------------------------------------------------  |
> Real Transeiver Hardware                                            |
>                                        +----------------------------+
>                                        |
>               +------------------------v-----------------------+
>               |        Framebuffers (YOUR hardware resource)   |
>               +------------------------------------------------+
>
>
> --- snap ---
>
> Adding a queue to a interface which does a protocol translation only is
> in my opinion: wrong.
>
> At least if you want to try to make a more efficient qdisc handling,
> means dropping skb's in queue when userspace sends too much. You will
> change it back, control two queues seems to be difficult.

It is wrong only if you look at 6LoWPAN interface as being owned by
6lowpan modules, but it doesn't, in fact it won't anything by itself
so it basically acts as a library to that perform 6LoWPAN operation on
the packet level, thus why it is possible to switch from virtual to
real interface. Also in terms of 15.4, the 6LoWPAN interfaces you
depicted above is also no controlled by 6LoWPAN itself, in fact it
seeems to represent links in 15.4, and I also assume these links may
not use 6LoWPAN.

> On 04/18/2017 12:43 PM, Luiz Augusto von Dentz wrote:
>> Hi Alex,
>>
>> On Mon, Apr 17, 2017 at 8:56 PM, Alexander Aring <aar@pengutronix.de> wrote:
>>> Hi,
>>>
>>> bluetooth-next contains patches which introduces a queue for bluetooth
>>> 6LoWPAN interfaces. [0]
>>>
>>> At first, the current behaviour is now that 802.15.4 6LoWPAN interfaces
>>> are virtual and bluetooth 6LoWPAN interfaces are not virtual anymore.
>>> To have a different handling in both subsystems is _definitely_ wrong.
>>
>> Well 15.4 and Bluetooth have very different handling, iirc in 15.4 you
>> do have a real interface which has, in fact, queueing support:
>>
>> net/mac802154/iface.c:ieee802154_if_setup:
>> dev->tx_queue_len = 300;
>>
>
> Is l2cap_chan_send a call with is synchronized which wait for it so
> after that the "l2cap bluetooth *frame*. etc." is send?
>
> I looked at the code [0]. It will queue it in some skb queue, so far I
> see. Hard to see it in this code, but I see some queueing and workqueue
> scheduling [1] in the function which will be called by l2cap_chan_send.

There is a tx_queue per channel but that doesn't mean there is a per
channel and introducing one when we can use the net qdisc support
seems useless to me.

> This queue should have the same meaning like our queue in IEEE
> 802.15.4 interface which you mentioned above.
> At this point, we have no difference. There exists already a queue for
> your real transeiver hardware.

The queue here is per channel, btw the queue size is not decided by us
it is the remote side that provides the tx credits so to some extend
the tx_queue is actually the remote queue in case of CoC. When testing
this it was quite clear this does not work in practice, with IPv6 at
least, because the remote side may not have enough credits for a
single packet (MPS * tx_credits < MTU) which means without proper
queueing support we would be dropping every packet.

>>> What does the 6LoWPAN interface?
>>>
>>> It will do a protocol change (an adaptation, because 6LoWPAN should
>>> provide the same functionality as IPv6) from IPv6 to 6LoWPAN (tx) and
>>> vice versa for (rx). In my opinion this should be handled as a virtual
>>> interface and not as an interface with a queue.
>>
>> Then we need a real netif for Bluetooth as well, one that does
>> queueing since introducing for l2cap_chan makes no sense when most of
>> time the userspace has a socket which does its own queueing. Btw, I
>> have the opinion that doing a second interface just to make Bluetooth
>> behave like 15.4 is very wasteful since we don't have any other
>> network support except for bnep, which in fact does the same thing
>> regarding queueing.
>>
>
> If bnep just generates some ethernet frames and you put L2CAP around,
> then it should be virtual too. If it queue skbs for sending out for the
> hci_dev. Then it ends in a similar queue handling like "non virtual
> 6LoWPAN interface".

BNEP does not use the same L2CAP channel mode, in fact there is no
queueing whatsoever since it use basic mode, and considering it has
been working for this long I don't think there is a real problem in it
using the net qdisc infra, actually I don't quite get this virtual to
non-virtual complaint, IMO this is only valid if you do have a
tunnel-like design where there are multiple interfaces involved so
there is no point in having qdisc enabled in all of them but we only
have one interface in Bluetooth.

> What you mean with second interface? At least you need a capable
> net_device interface to offer an IP connection from userspace.
>
>>> What makes a queue on 6LoWPAN interfaces?
>>>
>>> It will queue IPv6 packets which waits for it adaptation (the protocol
>>> change) with some additional qdisc handling.
>>> If finally the xmit_do callback will occur it will replace IPv6 header
>>> with 6LoWPAN, etc. After that it should be queued into some queue on
>>> link layer side which should be do the transmit finally.
>>
>> In case of Bluetooth it does the Link Layer using L2CAP, which is not
>> exactly at Link Layer, in fact it is one of the major problems of
>> having this in Kernel space since it is similar of doing IP over TCP
>> tunnel.
>>
>
> aha, but this has nothing to do to decide that you have a 6LoWPAN queue,
> or? I think this is one reason why you want a "6lowtun" interface. To
> handle L2CAP in userspace.

Indeed it is one of main reasons.

>>> Why I think bluetooth introduced a queue handling on 6LoWPAN interfaces?
>>>
>>> Because I think they don't like their own *qdisc* handling on their link
>>> layer interface. I write *qdisc* here because, they have no net_devices
>>> and use some kind of own qdisc handling.
>>
>> No we don't, and except we do change the whole architecture of
>> Bluetooth to work as a net_device I don't think we can have a similar
>> design as 15.4. Anyway Bluetooth is not really designed to carry IP,
>> the L2CAP and other profiles above are mostly a Bluetooth thing so
>> having a net_device would not buy us much more than queueing for bnep
>> and ipsp.
>>
>
> At least, having a queue and don't put anything anymore into the queue
> when "tx credits" is at zero (because queue is full). _Is_ a kind of
> qdisc handling.

The tx_credits operate at L2CAP segment level (MPS), so it is at a
completely different level, there are no guarantees the credits will
attend to a single IP packet.

In case you are curious when testing with qdisc disabled the interface
will basically turn down as soon as there are no credits left which
makes even pings be dropped.

>>> My question: is this correct?
>>>
>>> How to fix that (In my opinion):
>>>
>>> So commit [0] says something "out of credits" that's what I think it's
>>> the *qdisc* handling. If you cannot queue anymore -> you need to drop
>>> it. If you don't like how the current behaviour is, you need to change
>>> your *qdisc* handling on your link layer interface. Introducing queue at
>>> 6LoWPAN interfaces will introduce "buffer bloating".
>>
>> That is not what the code comments says, eg. netif_stop_queue:
>>
>>  * Stop upper layers calling the device hard_start_xmit routine.
>>  * Used for flow control when transmit resources are unavailable.
>>
>
> This comments are correct so long you operate on "real hardware
> resources" which 6LoWPAN interfaces doesn't do. In case so far I see, it
> will call l2cap_chan_send which putting some L2CAP protocol on it (to
> provide data) and somewhere queue it for transmit to fill "real hardware
> resources" e.g. framebuffers.

The interface in question is created by Bluetooth 6lowpan module, the
TX and RX is responsibility of the implementation of the interface
which I assume it to avoid leaking this sort of details to 6LoWPAN
itself or having yet another level of indirection where the 6LoWPAN
would need to define another API/callbacks to interface with the
modules trying to use it.

> Your "resource" which is unavailable is the link layer queue, but this
> is just a software limitation by bluetooth, which you can change by
> software to make a different handling, than just adding more software queues.
>
> What _virtual_ 6LoWPAN interface does is:
>
> Input: IPv6 -> Output: 6LoWPAN, without any queueing in front of that.
>
>> The fact 15.4, and bnep, uses these APIs makes me believe this is the
>> proper way of using qdisc handling, the only difference here is at
>> what level we would be using them.
>>
>
> On 15.4 we use it for link layer interface queue, but we don't put
> another queue in 6LoWPAN. See my aboves ASCII arts which shows the
> different when 6LoWPAN is virtual and not virtual.
>
> ---
>
> We have already qdisc problems in our case. In 15.4 it "just works" for
> now, but we need more handling there to send not garbage if one frame
> gets dropped there, see [2].
>
> In my opinion, if bluetooth people will hit similar issues, you will
> change it back that 6LoWPAN has no queue anymore. Otherwise it will very
> hard to decide which skbs you drop or not - because you need to deal
> with two queues.
>
> To make 6LoWPAN non virtual anymore and just adding a queue will occur
> that you don't drop much anymore... but the queue at link layer
> (hci_dev) will work as before and this occurs buffer bloating only.
>
>>> ---
>>>
>>> I don't care what bluetooth does with the 6LoWPAN interface. If bluetooth
>>> people wants such behaviour, then I am okay with that.
>>>
>>> I will bookmark this mail for the time when somebody reverts it to a
>>> virtual interface again. I think somebody will change it again, or maybe
>>> somebody will argument why we need a queue here.
>>
>> From what I could grasp from the code except if 6LoWPAN start creating
>> its own interface, which it doesn't currently, it should never assume
>> any specific behavior for an interface. 6LoWPAN right now only have
>> helper functions which the interface code call on TX/RX, and by
>> looking at code under 15.4 the only thing we would be doing is to set
>> skb->dev = wdev; (wdev being the real device) as Bluetooth don't use
>> 6lowpan fragmentation.
>>
>
> yep, there is a lot of work do make a better framework, but has nothing
> do to why bluetooth introduced a second queue on 6lowpan interfaces.
>
>> Btw, another big different in terms of design that it seems 15.4 does
>> create interfaces to each and every link, in Bluetooth the interface
>> is per adapter so it is in fact multi-link, I guess in practice 15.4
>> shall only have one 6lowpan link since it is connected to mesh, but in
>> Bluetooth we may have more than one Link and I don't think it would be
>> a good idea to have each of these links as a different interface,
>> especially not with the same mac address as it seems to be what 15.4
>> code is doing:
>>
>> /* Set the lowpan hardware address to the wpan hardware address. */
>> memcpy(ldev->dev_addr, wdev->dev_addr, IEEE802154_ADDR_LEN);
>>
>
> Of course we need the same mac address there (at first), because we
> have an address filter on hardware.
>
> (Be aware that you _cannot_ change the mac address (dev->dev_addr) while IP
>  capable net device is up. They assume the mac address is read only if net
>  device is up.)
>
> There is currently an idea to provide multiple links, but then hardware
> need to go into promiscuous mode (disable address filtering) but then we
> will lose some hardware offloading stuff e.g. ACK handling.
> If we have that, we can provide multiple links with different mac
> addresses, at least this will be handled as multiple wpan interfaces for
> one PHY.

We do need multi-link from the start, and it has been decided it
wouldn't be with multiple interfaces but with a single one.

>> So at least with IPSP/RFC-7668 the topology is completely different
>> from what we can see in 15.4, it may happen that this changes with
>> upcoming changes to Bluetooth, adding yet another topology that we
>> will need to support. Personally I wouldn't mind if 15.4 and Bluetooth
>> had more in common, at least at 6lowpan level, so we wouldn't have the
>> need to create separated modules just to deal with their differences,
>> but in reality, these technologies are competing for the same market
>> so I don't think it will happen, unfortunately.
>>
>
> You already have separated modules, this is .e.g IPHC stuff and 6lowpan
> specific implementation for bluetooth. In net/bluetooth/6lowpan.c you
> can put your changes for make specific bluetooth handling.
>
> This discussion moved currently to a more complex question about how to
> make the whole 6lowpan architecture...
>
> I think I tried to explain so much as possible what a queue on top of
> 6lowpan interface will do, at least if you want to make a better queue
> handling (when dropping skb's inside the queue(s)) then you will fail
> and revert that change (I think). But then I can say "I told you so" :-)
>
> - Alex
>
> [0] http://lxr.free-electrons.com/source/net/bluetooth/l2cap_core.c#L2455
> [1] http://lxr.free-electrons.com/source/net/bluetooth/hci_core.c#L3490
> [2] http://www.spinics.net/lists/linux-wpan/msg03679.html



-- 
Luiz Augusto von Dentz

^ permalink raw reply

* [PATCH net] team: fix memory leaks
From: Pan Bian @ 2017-04-24 10:29 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: linux-kernel, Pan Bian

In functions team_nl_send_port_list_get() and
team_nl_send_options_get(), pointer skb keeps the return value of
nlmsg_new(). When the call to genlmsg_put() fails, the memory is not
freed(). This will result in memory leak bugs.

Fixes: 9b00cf2d1024 ("team: implement multipart netlink messages for options transfers")
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
 drivers/net/team/team.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index f8c81f1..85c0124 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2361,8 +2361,10 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
 
 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
 			  TEAM_CMD_OPTIONS_GET);
-	if (!hdr)
+	if (!hdr) {
+		nlmsg_free(skb);
 		return -EMSGSIZE;
+	}
 
 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
 		goto nla_put_failure;
@@ -2634,8 +2636,10 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
 
 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
 			  TEAM_CMD_PORT_LIST_GET);
-	if (!hdr)
+	if (!hdr) {
+		nlmsg_free(skb);
 		return -EMSGSIZE;
+	}
 
 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
 		goto nla_put_failure;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net v2 1/3] net: hns: support deferred probe when can not obtain irq
From: Matthias Brugger @ 2017-04-24 10:28 UTC (permalink / raw)
  To: netdev; +Cc: netdev, charles.chenxin, linuxarm
In-Reply-To: <1492760684-117205-2-git-send-email-yankejian@huawei.com>

On 21/04/17 09:44, Yankejian wrote:
> From: lipeng <lipeng321@huawei.com>
>
> In the hip06 and hip07 SoCs, the interrupt lines from the
> DSAF controllers are connected to mbigen hw module.
> The mbigen module is probed with module_init, and, as such,
> is not guaranteed to probe before the HNS driver. So we need
> to support deferred probe.
>
> We check for probe deferral in the hw layer probe, so we not
> probe into the main layer and memories, etc., to later learn
> that we need to defer the probe.
>

Why? This looks like a hack.
 From what I see, we can handle EPROBE_DEFER easily inside hns_ppe_init 
checking the return value of hns_rcb_get_cfg. Like you do in 2/3 of this 
series.

Regards,
Matthias

> Signed-off-by: lipeng <lipeng321@huawei.com>
> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
> ---
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> index 403ea9d..2da5b42 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> @@ -2971,6 +2971,18 @@ static int hns_dsaf_probe(struct platform_device *pdev)
>  	struct dsaf_device *dsaf_dev;
>  	int ret;
>
> +	/*
> +	 * Check if we should defer the probe before we probe the
> +	 * dsaf, as it's hard to defer later on.
> +	 */
> +	ret = platform_get_irq(pdev, 0);
> +	if (ret < 0) {
> +		if (ret != -EPROBE_DEFER)
> +			dev_err(&pdev->dev, "Cannot obtain irq\n");
> +
> +		return ret;
> +	}
> +
>  	dsaf_dev = hns_dsaf_alloc_dev(&pdev->dev, sizeof(struct dsaf_drv_priv));
>  	if (IS_ERR(dsaf_dev)) {
>  		ret = PTR_ERR(dsaf_dev);
>

^ permalink raw reply

* Re: [PATCH] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Simon Horman @ 2017-04-24  9:54 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Paolo Abeni, netfilter-devel, lvs-devel, Wensong Zhang, netdev
In-Reply-To: <alpine.LFD.2.20.1704241017030.2492@ja.home.ssi.bg>

On Mon, Apr 24, 2017 at 10:21:30AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Mon, 24 Apr 2017, Paolo Abeni wrote:
> 
> > Hi,
> > 
> > The problem with the patched code is that it tries to resolve ipv6
> > addresses that are not created/validated by the kernel.
> 
> 	OK. Simon, please apply to ipvs tree.
> 
> Acked-by: Julian Anastasov <ja@ssi.bg>

Thanks, done.

^ permalink raw reply

* Re: [PATCH] net: ipv6: check route protocol when deleting routes
From: Lorenzo Colitti @ 2017-04-24  9:48 UTC (permalink / raw)
  To: Mantas Mikulėnas
  Cc: netdev@vger.kernel.org, lkml, David Miller, Joel Scherpelz,
	Hannes Frederic Sowa, YOSHIFUJI Hideaki, Greg KH
In-Reply-To: <20161216083059.251368-1-grawity@gmail.com>

On Fri, Dec 16, 2016 at 5:30 PM, Mantas Mikulėnas <grawity@gmail.com> wrote:
> The protocol field is checked when deleting IPv4 routes, but ignored for
> IPv6, which causes problems with routing daemons accidentally deleting
> externally set routes (observed by multiple bird6 users).
>
> This can be verified using `ip -6 route del <prefix> proto something`.

I think this change might have broken userspace deleting routes that
were created by RAs. This is because the rtm_protocol returned to
userspace is actually synthesized only at route dump time by
rt6_fill_node:

        else if (rt->rt6i_flags & RTF_ADDRCONF) {
                if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ROUTEINFO))
                        rtm->rtm_protocol = RTPROT_RA;
                else
                        rtm->rtm_protocol = RTPROT_KERNEL;
        }

but rt6_add_dflt_router and rt6_add_route_info add the route with a
protocol of 0, and 0 is silently upgraded to RTPROT_BOOT by
ip6_route_info_create.

        if (cfg->fc_protocol == RTPROT_UNSPEC)
                cfg->fc_protocol = RTPROT_BOOT;
        rt->rt6i_protocol = cfg->fc_protocol;

So an app that was previously trying to delete routes looking at
rtm_proto, and issuing a delete with whatever rtm_proto was returned
by netlink, will result in this check failing because its passed-in
protocol (RTPROT_RA or RTPROT_KERNEL) will not match the actual FIB
value, which is RTPROT_BOOT.

I can't easily test on a vanilla kernel, but on a system running a
slightly modified 4.4.63, I see the code fail like this:

# ip -6 route show
2001:db8:64::/64 dev nettest100  proto kernel  metric 256  expires 291sec
fe80::/64 dev nettest100  proto kernel  metric 256
default via fe80::6400 dev nettest100  proto ra  metric 1024  expires 291sec
# ip -6 route flush
Failed to send flush request: No such process
# ip -6 route show
default via fe80::6400 dev nettest100  proto ra  metric 1024  expires 286sec

If so, it seems unfortunate to have brought this into -stable.

For non-stable kernels, it seems that the proper fix would be:

1. Ensure that when an RA creates a route, it properly sets
rtm_protocol at time of route creation.
2. When we dump routes to userspace, we don't overwrite the rtm_protocol.

I can try to write that up, but I'm not sure why the code doesn't do
this already. Perhaps there's a good reason for it. Yoshifuji, Hannes,
any thoughts?

^ permalink raw reply

* Re: [net 1/1] team: fix memory leaks
From: Jiri Pirko @ 2017-04-24  9:43 UTC (permalink / raw)
  To: Pan Bian; +Cc: netdev, linux-kernel
In-Reply-To: <1493026612-12383-1-git-send-email-bianpan2016@163.com>

Plus since you have only one patch, please do not do "1/1" in the
email subject.  Thanks.


Mon, Apr 24, 2017 at 11:36:52AM CEST, bianpan2016@163.com wrote:
>In functions team_nl_send_port_list_get() and
>team_nl_send_options_get(), pointer skb keeps the return value of
>nlmsg_new(). When the call to genlmsg_put() fails, the memory is not
>freed(). This will result in memory leak bugs.
>
>Fixes: 9b00cf2d1024 ("team: implement multipart netlink messages for
>options transfers")
>Signed-off-by: Pan Bian <bianpan2016@163.com>
>---
> drivers/net/team/team.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index f8c81f1..85c0124 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -2361,8 +2361,10 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
> 
> 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
> 			  TEAM_CMD_OPTIONS_GET);
>-	if (!hdr)
>+	if (!hdr) {
>+		nlmsg_free(skb);
> 		return -EMSGSIZE;
>+	}
> 
> 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
> 		goto nla_put_failure;
>@@ -2634,8 +2636,10 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
> 
> 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
> 			  TEAM_CMD_PORT_LIST_GET);
>-	if (!hdr)
>+	if (!hdr) {
>+		nlmsg_free(skb);
> 		return -EMSGSIZE;
>+	}
> 
> 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
> 		goto nla_put_failure;
>-- 
>1.9.1
>
>

^ permalink raw reply

* Re: [PATCH v2 net-next] net: ipv6: send unsolicited NA if enabled for all interfaces
From: Simon Horman @ 2017-04-24  9:42 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, hannes
In-Reply-To: <1492877413-21906-1-git-send-email-dsa@cumulusnetworks.com>

On Sat, Apr 22, 2017 at 09:10:13AM -0700, David Ahern wrote:
> When arp_notify is set to 1 for either a specific interface or for 'all'
> interfaces, gratuitous arp requests are sent. Since ndisc_notify is the
> ipv6 equivalent to arp_notify, it should follow the same semantics.
> Commit 4a6e3c5def13 ("net: ipv6: send unsolicited NA on admin up") sends
> the NA on admin up. The final piece is checking devconf_all->ndisc_notify
> in addition to the per device setting. Add it.
> 
> Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change")
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Reviewed-by: Simon Horman <simon.horman@netronome.com>

> ---
> v2
> - update commit message with subject of commit 4a6e3c5def13 per comment
>   from Sergei
> 
>  net/ipv6/ndisc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index b23822e64228..d310dc41209a 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1753,7 +1753,8 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
>  		idev = in6_dev_get(dev);
>  		if (!idev)
>  			break;
> -		if (idev->cnf.ndisc_notify)
> +		if (idev->cnf.ndisc_notify ||
> +		    net->ipv6.devconf_all->ndisc_notify)
>  			ndisc_send_unsol_na(dev);
>  		in6_dev_put(idev);
>  		break;
> -- 
> 2.1.4
> 

^ permalink raw reply

* Re: [net 1/1] team: fix memory leaks
From: Jiri Pirko @ 2017-04-24  9:42 UTC (permalink / raw)
  To: Pan Bian; +Cc: netdev, linux-kernel
In-Reply-To: <1493026612-12383-1-git-send-email-bianpan2016@163.com>

Interesting. In last reply, I put the "[]" prefix example exactly as it
should be, yet you managed to have it wrong...


Mon, Apr 24, 2017 at 11:36:52AM CEST, bianpan2016@163.com wrote:
>In functions team_nl_send_port_list_get() and
>team_nl_send_options_get(), pointer skb keeps the return value of
>nlmsg_new(). When the call to genlmsg_put() fails, the memory is not
>freed(). This will result in memory leak bugs.
>
>Fixes: 9b00cf2d1024 ("team: implement multipart netlink messages for
>options transfers")

No linewraps here, please.



>Signed-off-by: Pan Bian <bianpan2016@163.com>
>---
> drivers/net/team/team.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index f8c81f1..85c0124 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -2361,8 +2361,10 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
> 
> 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
> 			  TEAM_CMD_OPTIONS_GET);
>-	if (!hdr)
>+	if (!hdr) {
>+		nlmsg_free(skb);
> 		return -EMSGSIZE;
>+	}
> 
> 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
> 		goto nla_put_failure;
>@@ -2634,8 +2636,10 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
> 
> 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
> 			  TEAM_CMD_PORT_LIST_GET);
>-	if (!hdr)
>+	if (!hdr) {
>+		nlmsg_free(skb);
> 		return -EMSGSIZE;
>+	}
> 
> 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
> 		goto nla_put_failure;
>-- 
>1.9.1
>
>

^ permalink raw reply

* [net 1/1] team: fix memory leaks
From: Pan Bian @ 2017-04-24  9:36 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: linux-kernel, Pan Bian

In functions team_nl_send_port_list_get() and
team_nl_send_options_get(), pointer skb keeps the return value of
nlmsg_new(). When the call to genlmsg_put() fails, the memory is not
freed(). This will result in memory leak bugs.

Fixes: 9b00cf2d1024 ("team: implement multipart netlink messages for
options transfers")
Signed-off-by: Pan Bian <bianpan2016@163.com>
---
 drivers/net/team/team.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index f8c81f1..85c0124 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2361,8 +2361,10 @@ static int team_nl_send_options_get(struct team *team, u32 portid, u32 seq,
 
 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
 			  TEAM_CMD_OPTIONS_GET);
-	if (!hdr)
+	if (!hdr) {
+		nlmsg_free(skb);
 		return -EMSGSIZE;
+	}
 
 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
 		goto nla_put_failure;
@@ -2634,8 +2636,10 @@ static int team_nl_send_port_list_get(struct team *team, u32 portid, u32 seq,
 
 	hdr = genlmsg_put(skb, portid, seq, &team_nl_family, flags | NLM_F_MULTI,
 			  TEAM_CMD_PORT_LIST_GET);
-	if (!hdr)
+	if (!hdr) {
+		nlmsg_free(skb);
 		return -EMSGSIZE;
+	}
 
 	if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
 		goto nla_put_failure;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Simon Horman @ 2017-04-24  9:27 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David Miller, eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <c6333422-ae6d-149c-09ec-c5dfb43d68b3@mojatatu.com>

On Fri, Apr 21, 2017 at 02:11:00PM -0400, Jamal Hadi Salim wrote:
> On 17-04-21 12:12 PM, David Miller wrote:
> 
> >Yes for existing attributes we are stuck in the mud because of how
> >we've handled things in the past.  I'm not saying we should change
> >behavior for existing attributes.
> >
> >I'm talking about any newly added attribute from here on out, and
> >that we need to require checks for them.
> >
> 
> Please bear with me. I want to make sure to get this right.
> 
> Lets say I updated the kernel today to reject transactions with
> bits it didnt understand. Lets call this "old kernel". A tc that
> understands/sets these bits and nothing else. Call it "old tc".
> 3 months later:
> I add one more bit setting to introduce a new feature in a new
> kernel version. Lets call this new "kernel". I update to
> understand new bits. Call it "new tc".
> 
> The possibilities:
> a) old tc + old kernel combo. No problem
> b) new tc + new kernel combo. No problem.
> c) old tc + new kernel combo. No problem.
> d) new tc + old kernel. Rejection.
> 
> For #d if i have a smart tc it would retry with a new combination
> which restores its behavior to old tc level. Of course this means
> apps would have to be rewritten going forward to understand these
> mechanics.
> Alternative is to request for capabilities first then doing a
> lowest common denominator request.
> But even that is a lot more code and crossing user/kernel twice.

>From my PoV, for #d user-space has to either get smart or fail.

Creating new tc involved work in order to support the new feature.
Part of that work would be a decision weather or not to provide
compatibility for old kernel or to bail out gracefully.

> There is a simpler approach that would work going forward.
> How about letting the user choose their fate? Set something maybe
> in the netlink header to tell the kernel "if you dont understand
> something I am asking for - please ignore it and do what you can".
> This would maintain current behavior but would force the user to
> explicitly state so.
> 
> cheers,
> jamal
> 

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Simon Horman @ 2017-04-24  9:14 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jamal Hadi Salim, davem, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <20170420142453.GF1886@nanopsycho.orion>

On Thu, Apr 20, 2017 at 04:24:53PM +0200, Jiri Pirko wrote:
> Thu, Apr 20, 2017 at 04:18:50PM CEST, jhs@mojatatu.com wrote:
> >On 17-04-20 09:59 AM, Jiri Pirko wrote:
> >> Thu, Apr 20, 2017 at 03:06:21PM CEST, jhs@mojatatu.com wrote:
> >> > From: Jamal Hadi Salim <jhs@mojatatu.com>

...

> >> > +	if (tcaa[TCAA_ACT_FLAGS])
> >> > +		act_flags = nla_get_u32(tcaa[TCAA_ACT_FLAGS]);
> >> 
> >> I still believe this is wrong. Should be a separate attr per flag.
> >> For user experience breakage reasons:
> >> 2 kernels should not behave differently on the exact same value passed
> >> from userspace:
> >> User passes 0x2. Now the kernel will ignore the set bit, the next kernel
> >> will recognize it as a valid flag and do something.
> >> Please let the discussion reach a consensus before pushing this again.
> >> 
> >> 
> >
> >Jiri - I dont agree. There is no such breakage. Refer to my previous
> >email. Lets just move on.
> 
> Anyone else has opinion on this?

At the risk of jumping into a hornets nest, yes, I have an opinion:

* A agree with Jiri that a separate attribute per flag seems to be
  the cleanest option, however;
* I think it would be reasonable from a UABI PoV to permit currently unused
  bits of TCAA_ACT_FLAGS to be re-uses so long as the kernel checks that
  they are zero until they are designated to have some use. I believe this
  implies that the default value for any future uses of these bits would be
  zero.

Jamal, I am confused about why are you so concerned about the space
consumed by this attribute, it's per-message, right? Is it the bigger
picture you are worried about - a similar per-entry flag at some point in
the future?

^ permalink raw reply

* Re: [PATCH v4 net-next] mdio_bus: Issue GPIO RESET to PHYs.
From: Roger Quadros @ 2017-04-24  9:04 UTC (permalink / raw)
  To: Andrew Lunn, Lars-Peter Clausen
  Cc: davem, Florian Fainelli, tony, nsekhar, jsarha, netdev,
	linux-omap, linux-kernel
In-Reply-To: <20170423233553.GA23498@lunn.ch>

On 24/04/17 02:35, Andrew Lunn wrote:
> On Fri, Apr 21, 2017 at 03:31:09PM +0200, Lars-Peter Clausen wrote:
>> On 04/21/2017 03:15 PM, Roger Quadros wrote:
>>> diff --git a/Documentation/devicetree/bindings/net/mdio.txt b/Documentation/devicetree/bindings/net/mdio.txt
>>> new file mode 100644
>>> index 0000000..4ffbbac
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/net/mdio.txt
>>> @@ -0,0 +1,33 @@
>>> +Common MDIO bus properties.
>>> +
>>> +These are generic properties that can apply to any MDIO bus.
>>> +
>>> +Optional properties:
>>> +- reset-gpios: List of one or more GPIOs that control the RESET lines
>>> +  of the PHYs on that MDIO bus.
>>> +- reset-delay-us: RESET pulse width in microseconds as per PHY datasheet.
>>> +
>>> +A list of child nodes, one per device on the bus is expected. These
>>> +should follow the generic phy.txt, or a device specific binding document.
>>> +
>>> +Example :
>>> +This example shows these optional properties, plus other properties
>>> +required for the TI Davinci MDIO driver.
>>> +
>>> +	davinci_mdio: ethernet@0x5c030000 {
>>> +		compatible = "ti,davinci_mdio";
>>> +		reg = <0x5c030000 0x1000>;
>>> +		#address-cells = <1>;
>>> +		#size-cells = <0>;
>>> +
>>> +		reset-gpios = <&gpio2 5 GPIO_ACTIVE_LOW>;
>>> +		reset-delay-us = <2>;   /* PHY datasheet states 1us min */
>>
>> If this is the reset line of the PHY shouldn't it be a property of the PHY
>> node rather than of the MDIO controller node (which might have a reset on
>> its own)?
>>> +
>>> +		ethphy0: ethernet-phy@1 {
>>> +			reg = <1>;
>>> +		};
>>> +
>>> +		ethphy1: ethernet-phy@3 {
>>> +			reg = <3>;
>>> +		};
> 
> Hi Lars-Peter
> 
> We discussed this when the first proposal was made. There are two
> cases, to consider.
> 
> 1) Here, one GPIO line resets all PHYs on the same MDIO bus. In this
> example, two PHYs.
> 
> 2) There is one GPIO line per PHY. That is a separate case, and as you
> say, the reset line should probably be considered a PHY property, not
> an MDIO property. However, it can be messy, since in order to probe
> the MDIO bus, you probably need to take the PHY out of reset.
> 
> Anyway, this patch addresses the first case, so should be accepted. If
> anybody wants to address the second case, they are free to do so.

Thanks for the explanation Andrew.

For the second case, even if the RESET GPIO property is specified
in the PHY node, the RESET *will* have to be done by the MDIO bus driver
else the PHY might not be probed at all.

Whether we need additional code to just to make the DT look prettier is
questionable and if required can come as a separate patch.

cheers,
-roger

^ permalink raw reply

* Re: [RFC PATCH 3/7] net: add option to get information about timestamped packets
From: Miroslav Lichvar @ 2017-04-24  9:00 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <CAF=yD-+tJpY7XJrqxga=0_kn-FjS0M=yuu59m_ZRRrrcpJCbZw@mail.gmail.com>

On Thu, Apr 13, 2017 at 12:16:09PM -0400, Willem de Bruijn wrote:
> On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
> >> Why is this L2 length needed?
> >
> > It's needed for incoming packets to allow converting of preamble
> > timestamps to trailer timestamps.
> 
> Receiving the mac length of a packet sounds like a feature independent
> from timestamping.

I agree, but so far nobody suggested another use for this information.
Do you have any suggestions?

The idea was that if it is useful only with HW timestamping, it would
be better to save it only with the timestamp, so there is no
performance impact in the more common case when HW timestamping is
disabled. Am I overly cautious here?

> Either an ioctl similar to SIOCGIFMTU or, if it may
> vary due to existince of vlan headers, a new independent cmsg at the
> SOL_SOCKET layer.

It's not just the VLAN headers. The length of the IP header may vary
with IP options, so the offset of the UDP data in the packet cannot be
assumed to be constant.

Now I'm wondering if it's actually necessary to save the original
value of skb->mac_len + skb->len. Would "skb->data - skb->head -
skb->mac_header + skb->len" always work as the L2 length for received
packets at the time when the cmsg is prepared?

As for the original ifindex, it seems to me it does need to be saved
to a new field since __netif_receive_skb_core() intentionally
overwrites skb->skb_iif. What would be the best place for it, sk_buff
or skb_shared_info?

And would it really be acceptable to save it for all packets in
__netif_receive_skb_core(), even when HW timestamping is disabled?
Seeing how the code and the data structures were optimized over time,
I have a feeling it would not be accepted.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH] brcm80211: brcmfmac: Ensure that incoming skb's are writable
From: James Hughes @ 2017-04-24  8:45 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: netdev, Franky Lin, Hante Meuleman, Kalle Valo, linux-wireless
In-Reply-To: <e2e5d950-d186-1a27-4b30-4ae2e9838103@broadcom.com>

On 23 April 2017 at 20:34, Arend Van Spriel
<arend.vanspriel@broadcom.com> wrote:
> On 21-4-2017 11:22, James Hughes wrote:
>> On 20 April 2017 at 20:48, Arend van Spriel
>> <arend.vanspriel@broadcom.com> wrote:
>>> + linux-wireless
>>>
>>> On 4/20/2017 1:16 PM, James Hughes wrote:
>>>>
>>>> The driver was adding header information to incoming skb
>>>> without ensuring the head was uncloned and hence writable.
>>>>
>>>> skb_cow_head has been used to ensure they are writable, however,
>>>> this required some changes to error handling to ensure that
>>>> if skb_cow_head failed it was not ignored.
>>>>
>>>> This really needs to be reviewed by someone who is more familiar
>>>> with this code base to ensure any deallocation of skb's is
>>>> still correct.
>>>>
>>>> Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
>>>> ---
>>>>   .../wireless/broadcom/brcm80211/brcmfmac/bcdc.c    | 15 ++++++++--
>>>>   .../wireless/broadcom/brcm80211/brcmfmac/core.c    | 23 +++++-----------
>>>>   .../broadcom/brcm80211/brcmfmac/fwsignal.c         | 32
>>>> +++++++++++++++++-----
>>>>   .../wireless/broadcom/brcm80211/brcmfmac/sdio.c    |  7 ++++-
>>>>   4 files changed, 51 insertions(+), 26 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>> index 5eaac13..08272e8 100644
>>>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>>>> @@ -198,7 +198,7 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct
>>>> sk_buff *skb,
>>>>         int ret;
>>>>         struct brcmf_if *ifp = netdev_priv(ndev);
>>>>         struct brcmf_pub *drvr = ifp->drvr;
>>>> -       struct ethhdr *eh = (struct ethhdr *)(skb->data);
>>>> +       struct ethhdr *eh;
>>>>         brcmf_dbg(DATA, "Enter, bsscfgidx=%d\n", ifp->bsscfgidx);
>>>>   @@ -212,23 +212,14 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct
>>>> sk_buff *skb,
>>>>         }
>>>>         /* Make sure there's enough room for any header */
>>>> -       if (skb_headroom(skb) < drvr->hdrlen) {
>>>> -               struct sk_buff *skb2;
>>>> -
>>>> -               brcmf_dbg(INFO, "%s: insufficient headroom\n",
>>>> -                         brcmf_ifname(ifp));
>>>> -               drvr->bus_if->tx_realloc++;
>>>> -               skb2 = skb_realloc_headroom(skb, drvr->hdrlen);
>>>> -               dev_kfree_skb(skb);
>>>> -               skb = skb2;
>>>> -               if (skb == NULL) {
>>>> -                       brcmf_err("%s: skb_realloc_headroom failed\n",
>>>> -                                 brcmf_ifname(ifp));
>>>> -                       ret = -ENOMEM;
>>>> -                       goto done;
>>>> -               }
>>>
>>>
>>> What you are throwing away here is code that assures there is sufficient
>>> headroom for protocol and bus layer in the tx path, because that is
>>> determined by drvr->hdrlen. This is where the skb is handed to the driver so
>>> if you could leave the functionality above *and* assure it is writeable that
>>> would be the best solution as there is no need for all the other changes
>>> down the tx path.
>>
>> The skb_cow_head function takes the required headroom as a parameter
>> and will ensure that there is enough space, so I don't think this code
>> segment is
>> required or have I misunderstood what you mean here?
>
> I looked more into what skb_cow_head() and skb_realloc_headroom()
> actually do and it seems you are right.
>
>> Is it safe to rely on the _cow_ being done here and not further down
>> in the stack?
>> Or at least checked further down in the stack. Previous comments from
>> another patch
>> requested that the _cow_ be done close to the actual addition of the
>> header. I presume
>> it is unlikely/impossible that the functions that add header
>> information  down the stack
>> will be called without the above being done first?
>
> It is safe. During probe sequence each layer in the driver stack
> increments drvr->hdrlen with headroom it needs. So drvr->hdrlen will
> indicate the total headroom needed when .start_xmit() is called. And
> yes, the .start_xmit() is the single point of entry in the transmit
> path. So the other locations where skb_push() is done are safe.
>

OK, I will redo the patch (making it v3 + changelogs), taking this in
to account.


>>>
>>>> +       ret = skb_cow_head(skb, drvr->hdrlen);
>>>> +       if (ret) {
>>>
>>>
>>> So move the realloc code above here instead of simply freeing the skb.
>>>
>>>> +               dev_kfree_skb_any(skb);
>>>> +               goto done;
>>>>         }
>>>>   +     eh = (struct ethhdr *)(skb->data);
>>>
>>>
>>> Now this is actually a separate fix so I would like a separate patch for it.
>>>
>>
>> No problem, but see final paragraph below.
>
> Let me see...

I'll split this fix out, but should I issue it in the same patch set
or as a separate patch?
And for a different patch, should I base it on the _cow_ version patch
or the original version?

>
>>> I have a RPi3 sitting on my desk so how can I replicate the issue. It was
>>> something about broadcast/multicast traffic when using AP mode and a bridge,
>>> right?
>>>
>>> Regards,
>>> Arend
>>
>> See this issue for details on replication.
>> https://github.com/raspberrypi/firmware/issues/673
>>
>> The bridge I use is setup using a similar procedure to that described
>> here.  https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md
>>
>> I'm happy to pass over this work to you guys if you think it is
>> appropriate, you are the
>> experts on the codebase.
>
> Sure. However, I must say you did an excellent job digging into the
> issue and root causing it. We always were under the impression that we
> could treat the skb as our own when handed to us in .start_xmit() callback.
>
> Regards,
> Arend

Thanks! I suspect that a few drivers are making the same mistake, Eric
found 6 over and above the smsc95xx and this one that I originally
found.

Regards
James

^ permalink raw reply

* Re: [PATCH v2 2/2] iproute2: add support for invisible qdisc dumping
From: Jiri Pirko @ 2017-04-24  8:43 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: David S. Miller, Stephen Hemminger, Eric Dumazet,
	Jamal Hadi Salim, Phil Sutter, Cong Wang, Daniel Borkmann, netdev,
	linux-kernel
In-Reply-To: <alpine.LSU.2.20.1703081303490.31814@cbobk.fhfr.pm>

Wed, Mar 08, 2017 at 01:04:42PM CET, jikos@kernel.org wrote:
>From: Jiri Kosina <jkosina@suse.cz>
>
>Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking 
>kernel to perform 'full qdisc dump', as for historical reasons some of the 
>default qdiscs are being hidden by the kernel.
>
>The command syntax is being extended by voluntary 'invisible' argument to
>'tc qdisc show'.
>
>Signed-off-by: Jiri Kosina <jkosina@suse.cz>
>---

[...]


>@@ -325,7 +328,25 @@ static int tc_qdisc_list(int argc, char **argv)
> 		filter_ifindex = t.tcm_ifindex;
> 	}
> 
>-	if (rtnl_dump_request(&rth, RTM_GETQDISC, &t, sizeof(t)) < 0) {
>+	if (dump_invisible) {
>+		struct {
>+			struct nlmsghdr n;
>+			struct tcmsg t;
>+			char buf[256];
>+		} req = {
>+			.n.nlmsg_type = RTM_GETQDISC,
>+			.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
>+		};
>+
>+		req.t.tcm_family = AF_UNSPEC;
>+
>+		addattr(&req.n, 256, TCA_DUMP_INVISIBLE);
>+		if (rtnl_dump_request_n(&rth, &req.n) < 0) {

Jirko, you ignore the values that were previously set in "t". I think
that you can change this to do rtnl_dump_request_n always and simplify
the code a bit.


>+			perror("Cannot send dump request");
>+			return 1;
>+		}
>+
>+	} else if (rtnl_dump_request(&rth, RTM_GETQDISC, &t, sizeof(t)) < 0) {
> 		perror("Cannot send dump request");
> 		return 1;
> 	}
>-- 
>Jiri Kosina
>SUSE Labs
>

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
From: Paul Menzel @ 2017-04-24  8:23 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: Jeff Kirsher, netdev, intel-wired-lan, Stefan Priebe
In-Reply-To: <20170421212012.25950-1-bpoirier@suse.com>

Dear Benjamin,


Thank you for your fix.

On 04/21/17 23:20, Benjamin Poirier wrote:
> Some statistics passed to ethtool are garbage because e1000e_get_stats64()
> doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
> memory to userspace and confuses users.

Could you please give specific examples to reproduce the issue? That way 
your fix can also be tested.

[…]


Kind regards,

Paul

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: LAN9303: add i2c dependency
From: Juergen Borleis @ 2017-04-24  8:22 UTC (permalink / raw)
  To: Tobias Regnery; +Cc: andrew, vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <20170424080737.24897-1-tobias.regnery@gmail.com>

Hi Tobias,

On Monday 24 April 2017 10:07:37 Tobias Regnery wrote:
> The Kconfig symbol for the I2C mode of the LAN9303 driver selects
> REGMAP_I2C. This symbol depends on I2C but the driver doesen't has a
> dependency on I2C.
>
> With CONFIG_I2C=n kconfig fails to select REGMAP_I2C and we get the
> following warning:
>
> warning: (NET_DSA_SMSC_LAN9303_I2C) selects REGMAP_I2C which has unmet
> direct dependencies (I2C)
>
> This results in a build failure because the regmap-i2c functions aren't
> available:
>
> drivers/base/regmap/regmap-i2c.c: In function 'regmap_smbus_byte_reg_read':
> drivers/base/regmap/regmap-i2c.c:29:8: error: implicit declaration of function 'i2c_smbus_read_byte_data' [-Werror=implicit-function-declaration]
>
> drivers/base/regmap/regmap-i2c.c: In function 'regmap_smbus_byte_reg_write':
> drivers/base/regmap/regmap-i2c.c:47:9: error: implicit declaration of function 'i2c_smbus_write_byte_data' [-Werror=implicit-function-declaration]
> ... 
>
> Fix this by adding a kconfig dependency on the I2C subsystem.
>
> Fixes: be4e119f9914 ("net: dsa: LAN9303: add I2C managed mode support")
> Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
> ---
>  drivers/net/dsa/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
> index 131a5b1cbfc8..862ee22303c2 100644
> --- a/drivers/net/dsa/Kconfig
> +++ b/drivers/net/dsa/Kconfig
> @@ -59,7 +59,7 @@ config NET_DSA_SMSC_LAN9303
>
>  config NET_DSA_SMSC_LAN9303_I2C
>  	tristate "SMSC/Microchip LAN9303 3-ports 10/100 ethernet switch in I2C managed mode"
> -	depends on NET_DSA 
> +	depends on NET_DSA && I2C
>  	select NET_DSA_SMSC_LAN9303
>  	select REGMAP_I2C
>  	---help---

Thanks, but already found and fixed by Arnd Bergmann.

jb

-- 
Pengutronix e.K.                             | Juergen Borleis             |
Industrial Linux Solutions                   | http://www.pengutronix.de/  |

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
From: Neftin, Sasha @ 2017-04-24  8:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: intel-wired-lan, Ruinskiy, Dima
In-Reply-To: <630A6B92B7EDEB45A87E20D3D286660171182EED@hasmsx109.ger.corp.intel.com>

On 4/23/2017 15:53, Neftin, Sasha wrote:
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On Behalf Of Benjamin Poirier
> Sent: Saturday, April 22, 2017 00:20
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Stefan Priebe <s.priebe@profihost.ag>
> Subject: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
>
> Some statistics passed to ethtool are garbage because e1000e_get_stats64() doesn't write them, for example: tx_heartbeat_errors. This leaks kernel memory to userspace and confuses users.
>
> Do like ixgbe and use dev_get_stats() which first zeroes out rtnl_link_stats64.
>
> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>   drivers/net/ethernet/intel/e1000e/ethtool.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
> index 7aff68a4a4df..f117b90cdc2f 100644
> --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> @@ -2063,7 +2063,7 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
>   
>   	pm_runtime_get_sync(netdev->dev.parent);
>   
> -	e1000e_get_stats64(netdev, &net_stats);
> +	dev_get_stats(netdev, &net_stats);
>   
>   	pm_runtime_put_sync(netdev->dev.parent);
>   
> --
> 2.12.2
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Hello,

We would like to not accept this patch. Suggested generic method 
'*dev_get_stats' (net/core/dev.c) calls 'ops->ndo_get_stats64' method 
which eventually calls e1000e_get_stats64 (netdev.c) - so there is same 
functionality. Also, see that 'e1000e_get_stats64' method in netdev.c 
(line 5928) calls 'memset' with 0's before update statistics.  Local 
sanity check in our lab shows 'tx_heartbeat_errors' counter reported as 0.

Thanks,

Sasha

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox