Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH iproute2 v2 2/3] link_iptnl: Print tunnel mode
From: Stephen Hemminger @ 2018-01-02 21:13 UTC (permalink / raw)
  To: Serhey Popovych; +Cc: netdev
In-Reply-To: <1514926965-22061-1-git-send-email-serhe.popovych@gmail.com>

On Tue,  2 Jan 2018 23:02:45 +0200
Serhey Popovych <serhe.popovych@gmail.com> wrote:

> Tunnel mode does not appear in parameters print for iptnl
> supported tunnels like ipip and sit, while printed for
> ip6tnl.
> 
> Print tunnel mode as "proto" field name for JSON and
> without any name when printing to cli to follow ip6tnl
> behaviour.
> 
> For non JSON output we have:
> 
>    $ ip -d link show dev sit1
> 
> Before:
> -------
> 17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
>     link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
>     sit remote any local X.X.X.X ...
>         ~~~
> 
> After:
> ------
> 17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
>     link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
>     sit any remote any local X.X.X.X ...
>         ^^^
> 
> Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
> ---
> v2: Addressed comments: "proto ipip" vs "proto ip4ip4" for
>     IPPROTO_IPIP tunnel type. Add example to message.
> 
>  ip/link_iptnl.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
> index d4d935b..b6ef95d 100644
> --- a/ip/link_iptnl.c
> +++ b/ip/link_iptnl.c
> @@ -372,6 +372,23 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
>  	if (tb[IFLA_IPTUN_COLLECT_METADATA])
>  		print_bool(PRINT_ANY, "external", "external ", true);
>  
> +	if (tb[IFLA_IPTUN_PROTO]) {
> +		switch (rta_getattr_u8(tb[IFLA_IPTUN_PROTO])) {
> +		case IPPROTO_IPIP:
> +			print_string(PRINT_ANY, "proto", "%s ", "ipip");
> +			break;
> +		case IPPROTO_IPV6:
> +			print_string(PRINT_ANY, "proto", "%s ", "ip6ip");
> +			break;
> +		case IPPROTO_MPLS:
> +			print_string(PRINT_ANY, "proto", "%s ", "mplsip");
> +			break;
> +		case 0:
> +			print_string(PRINT_ANY, "proto", "%s ", "any");
> +			break;
> +		}
> +	}
> +
>  	if (tb[IFLA_IPTUN_REMOTE]) {
>  		unsigned int addr = rta_getattr_u32(tb[IFLA_IPTUN_REMOTE]);

Thanks for fixing the ip4 value.

When you fix one thing, you need to resend whole patch series.

^ permalink raw reply

* Re: [PATCH] net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
From: Stefano Brivio @ 2018-01-02 21:12 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Mohamed Ghannam, Michal Kubecek, Miroslav Benes, netdev,
	linux-kernel
In-Reply-To: <20180102163020.32473-1-nstange@suse.de>

Hi,

On Tue,  2 Jan 2018 17:30:20 +0100
Nicolai Stange <nstange@suse.de> wrote:

> [...]
>
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 5b9bd5c33d9d..e84290c28c0c 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -513,16 +513,18 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  	int err;
>  	struct ip_options_data opt_copy;
>  	struct raw_frag_vec rfv;
> -	int hdrincl;
> +	int hdrincl, __hdrincl;
>  
>  	err = -EMSGSIZE;
>  	if (len > 0xFFFF)
>  		goto out;
>  
>  	/* hdrincl should be READ_ONCE(inet->hdrincl)
> -	 * but READ_ONCE() doesn't work with bit fields
> +	 * but READ_ONCE() doesn't work with bit fields.
> +	 * Emulate it by doing the READ_ONCE() from an intermediate int.
>  	 */
> -	hdrincl = inet->hdrincl;
> +	__hdrincl = inet->hdrincl;
> +	hdrincl = READ_ONCE(__hdrincl);

I guess you don't actually need to use a third variable. What about
doing READ_ONCE() on hdrincl itself after the first assignment?

Perhaps something like the patch below -- applies to net.git, yields
same binary output as your version with gcc 6, looks IMHO more
straightforward:

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 125c1eab3eaa..8c2f783a95fc 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -519,10 +519,12 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (len > 0xFFFF)
 		goto out;
 
-	/* hdrincl should be READ_ONCE(inet->hdrincl)
-	 * but READ_ONCE() doesn't work with bit fields
+	/* hdrincl should be READ_ONCE(inet->hdrincl) but READ_ONCE() doesn't
+	 * work with bit fields. Emulate it by adding a further sequence point.
 	 */
 	hdrincl = inet->hdrincl;
+	hdrincl = READ_ONCE(hdrincl);
+
 	/*
 	 *	Check the flags.
 	 */


-- 
Stefano

^ permalink raw reply related

* [PATCH iproute2 v2 2/3] link_iptnl: Print tunnel mode
From: Serhey Popovych @ 2018-01-02 21:02 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20180102115413.37666ab5@xeon-e3>

Tunnel mode does not appear in parameters print for iptnl
supported tunnels like ipip and sit, while printed for
ip6tnl.

Print tunnel mode as "proto" field name for JSON and
without any name when printing to cli to follow ip6tnl
behaviour.

For non JSON output we have:

   $ ip -d link show dev sit1

Before:
-------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit remote any local X.X.X.X ...
        ~~~

After:
------
17: sit1@NONE: <NOARP> mtu 1480 qdisc noop state DOWN ...
    link/sit X.X.X.X brd 0.0.0.0 promiscuity 0
    sit any remote any local X.X.X.X ...
        ^^^

Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
---
v2: Addressed comments: "proto ipip" vs "proto ip4ip4" for
    IPPROTO_IPIP tunnel type. Add example to message.

 ip/link_iptnl.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index d4d935b..b6ef95d 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -372,6 +372,23 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
 	if (tb[IFLA_IPTUN_COLLECT_METADATA])
 		print_bool(PRINT_ANY, "external", "external ", true);
 
+	if (tb[IFLA_IPTUN_PROTO]) {
+		switch (rta_getattr_u8(tb[IFLA_IPTUN_PROTO])) {
+		case IPPROTO_IPIP:
+			print_string(PRINT_ANY, "proto", "%s ", "ipip");
+			break;
+		case IPPROTO_IPV6:
+			print_string(PRINT_ANY, "proto", "%s ", "ip6ip");
+			break;
+		case IPPROTO_MPLS:
+			print_string(PRINT_ANY, "proto", "%s ", "mplsip");
+			break;
+		case 0:
+			print_string(PRINT_ANY, "proto", "%s ", "any");
+			break;
+		}
+	}
+
 	if (tb[IFLA_IPTUN_REMOTE]) {
 		unsigned int addr = rta_getattr_u32(tb[IFLA_IPTUN_REMOTE]);
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter
From: James Chapman @ 2018-01-02 20:59 UTC (permalink / raw)
  To: Guillaume Nault, Lorenzo Bianconi, David S. Miller; +Cc: netdev, Hangbin Liu
In-Reply-To: <f196609a-eeb4-5fd9-4c56-9b4c17547fe2@katalix.com>

On 02/01/18 20:08, James Chapman wrote:
> On 02/01/18 18:05, Guillaume Nault wrote:
>>>> Lorenzo, is this being added to fix interoperability with another 
>>>> L2TPv3
>>>> implementation? If so, can you share more details?
>>>>
>>> Hi James,
>>>
>>> I introduced peer_offset parameter to fix a specific setup where
>>> tunnel endpoints
>>> running L2TPv3 would use different values for tx offset (since in
>>> iproute2 there is no
>>> restriction on it), not to fix a given an interoperability issue.
>>>
>> Yes, but was it just to test iproute2's peer_offset option? Or is there
>> a plan to use it for real?
>>
>>> I introduced this feature since:
>>>   - offset has been added for long time to L2TPv3 implementation
>>>     (commit f7faffa3ff8ef6ae712ef16312b8a2aa7a1c95fe and
>>>     commit 309795f4bec2d69cd507a631f82065c2198a0825) and I wanted to
>>> preserve UABI
>>>   - have the same degree of freedom for offset parameter we have in
>>> L2TPv2 and fix the issue
>>>     described above
>>>
>> AFAIU, the current L2TPv2 implementation never sets the offset field
>> and nobody ever realised.
>>
>>> Now what we can do I guess is:
>>> - as suggested by Guillaume drop completely the offset support 
>>> without removing
>>>    netlink attribute in order to not break UABI
>>> - fix offset support initializing properly padding bits
>>>
>> I'd go for the first one. I just wonder if that looks acceptable to
>> David an James.
>
> I think the first one too. Also update iproute2 to remove or hide the 
> offset and peer_offset parameters.
>
>
I just realised the peer_offset attribute changes are already applied in 
net-next. (I missed these when they were submitted just before 
Christmas.) Should these commits be reverted? We probably don't want 
v4.15 to get an additional l2tp peer_offset attribute if we are going to 
remove it and the rest of the code supporting configurable offset 
attributes in the next release.

81487bf Merge branch 'l2tp-next'
f15bc54 l2tp: add peer_offset parameter
820da53 l2tp: fix missing print session offset info

^ permalink raw reply

* [PATCH] ethernet: cpsw-phy-sel: Delete an error message for a failed memory allocation in cpsw_phy_sel_probe()
From: SF Markus Elfring @ 2018-01-02 20:48 UTC (permalink / raw)
  To: netdev, linux-omap, Grygorii Strashko, Mugunthan V N
  Cc: LKML, kernel-janitors, Daniel Mack

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Tue, 2 Jan 2018 21:41:25 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 drivers/net/ethernet/ti/cpsw-phy-sel.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw-phy-sel.c b/drivers/net/ethernet/ti/cpsw-phy-sel.c
index 18013645e76c..d29bc349bfc4 100644
--- a/drivers/net/ethernet/ti/cpsw-phy-sel.c
+++ b/drivers/net/ethernet/ti/cpsw-phy-sel.c
@@ -213,10 +213,8 @@ static int cpsw_phy_sel_probe(struct platform_device *pdev)
 		return -EINVAL;
 
 	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
-	if (!priv) {
-		dev_err(&pdev->dev, "unable to alloc memory for cpsw phy sel\n");
+	if (!priv)
 		return -ENOMEM;
-	}
 
 	priv->dev = &pdev->dev;
 	priv->cpsw_phy_sel = of_id->data;
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH net 3/3] eet: ena: invoke netif_carrier_off() only after netdev registered
From: Belgazal, Netanel @ 2018-01-02 20:38 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Woodhouse, David, Machulsky, Zorik,
	Matushevsky, Alexander, Bshara, Saeed, Wilson, Matt,
	Liguori, Anthony, Bshara, Nafea, Schmeilin, Evgeny, Tzalik, Guy
In-Reply-To: <20180102.140804.982695913961893372.davem@davemloft.net>

Right.
I’ll remove this patch.

On 1/2/18, 9:08 PM, "David Miller" <davem@davemloft.net> wrote:

    From: <netanel@amazon.com>
    Date: Thu, 28 Dec 2017 21:30:20 +0000
    
    > From: Netanel Belgazal <netanel@amazon.com>
    > 
    > netif_carrier_off() should be called only after register netdev.
    > Move the function's call after the registration.
    > 
    > Signed-off-by: Netanel Belgazal <netanel@amazon.com>
    > ---
    >  drivers/net/ethernet/amazon/ena/ena_netdev.c | 4 ++--
    >  1 file changed, 2 insertions(+), 2 deletions(-)
    > 
    > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
    > index fbe21a817bd8..ee50c56765a4 100644
    > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
    > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
    > @@ -3276,14 +3276,14 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
    >  
    >  	memcpy(adapter->netdev->perm_addr, adapter->mac_addr, netdev->addr_len);
    >  
    > -	netif_carrier_off(netdev);
    > -
    >  	rc = register_netdev(netdev);
    >  	if (rc) {
    >  		dev_err(&pdev->dev, "Cannot register net device\n");
    >  		goto err_rss;
    >  	}
    >  
    > +	netif_carrier_off(netdev);
    > +
    
    You cannot invoke this after register_netdev(), asynchronous activity can cause this
    call to lose information and lose a link up event.
    
    


^ permalink raw reply

* Re: [net-next] netfilter: add segment routing header 'srh' match
From: Ahmed Abdelsalam @ 2018-01-02 20:37 UTC (permalink / raw)
  To: David Lebrun
  Cc: pablo, kadlec, fw, kuznet, linux-kernel, netfilter-devel,
	coreteam, netdev, amsalam20
In-Reply-To: <1514545672-13827-1-git-send-email-amsalam20@gmail.com>

On Fri, 29 Dec 2017 12:07:52 +0100
Ahmed Abdelsalam <amsalam20@gmail.com> wrote:

> It allows matching packets based on Segment Routing Header
> (SRH) information.
> The implementation considers revision 7 of the SRH draft.
> https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07
> 
> Currently supported match options include:
> (1) Next Header
> (2) Hdr Ext Len
> (3) Segments Left
> (4) Last Entry
> (5) Tag value of SRH
> 
> Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
> ---
>  include/uapi/linux/netfilter_ipv6/ip6t_srh.h |  63 ++++++++++
>  net/ipv6/netfilter/Kconfig                   |   9 ++
>  net/ipv6/netfilter/Makefile                  |   1 +
>  net/ipv6/netfilter/ip6t_srh.c                | 165 +++++++++++++++++++++++++++
>  4 files changed, 238 insertions(+)
>  create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h
>  create mode 100644 net/ipv6/netfilter/ip6t_srh.c
> 
> diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> new file mode 100644
> index 0000000..1b5dbd8
> --- /dev/null
> +++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h
> @@ -0,0 +1,63 @@
> +/**
> + * Definitions for Segment Routing Header 'srh' match
> + *
> + * Author:
> + *       Ahmed Abdelsalam       <amsalam20@gmail.com>
> + */
> +
> +#ifndef _IP6T_SRH_H
> +#define _IP6T_SRH_H
> +
> +#include <linux/types.h>
> +#include <linux/netfilter.h>
> +
> +/* Values for "mt_flags" field in struct ip6t_srh */
> +#define IP6T_SRH_NEXTHDR        0x0001
> +#define IP6T_SRH_LEN_EQ         0x0002
> +#define IP6T_SRH_LEN_GT         0x0004
> +#define IP6T_SRH_LEN_LT         0x0008
> +#define IP6T_SRH_SEGS_EQ        0x0010
> +#define IP6T_SRH_SEGS_GT        0x0020
> +#define IP6T_SRH_SEGS_LT        0x0040
> +#define IP6T_SRH_LAST_EQ        0x0080
> +#define IP6T_SRH_LAST_GT        0x0100
> +#define IP6T_SRH_LAST_LT        0x0200
> +#define IP6T_SRH_TAG            0x0400
> +#define IP6T_SRH_MASK           0x07FF
> +
> +/* Values for "mt_invflags" field in struct ip6t_srh */
> +#define IP6T_SRH_INV_NEXTHDR    0x0001
> +#define IP6T_SRH_INV_LEN_EQ     0x0002
> +#define IP6T_SRH_INV_LEN_GT     0x0004
> +#define IP6T_SRH_INV_LEN_LT     0x0008
> +#define IP6T_SRH_INV_SEGS_EQ    0x0010
> +#define IP6T_SRH_INV_SEGS_GT    0x0020
> +#define IP6T_SRH_INV_SEGS_LT    0x0040
> +#define IP6T_SRH_INV_LAST_EQ    0x0080
> +#define IP6T_SRH_INV_LAST_GT    0x0100
> +#define IP6T_SRH_INV_LAST_LT    0x0200
> +#define IP6T_SRH_INV_TAG        0x0400
> +#define IP6T_SRH_INV_MASK       0x07FF
> +
> +/**
> + *      struct ip6t_srh - SRH match options
> + *      @ next_hdr: Next header field of SRH
> + *      @ hdr_len: Extension header length field of SRH
> + *      @ segs_left: Segments left field of SRH
> + *      @ last_entry: Last entry field of SRH
> + *      @ tag: Tag field of SRH
> + *      @ mt_flags: match options
> + *      @ mt_invflags: Invert the sense of match options
> + */
> +
> +struct ip6t_srh {
> +	__u8                    next_hdr;
> +	__u8                    hdr_len;
> +	__u8                    segs_left;
> +	__u8                    last_entry;
> +	__u16                   tag;
> +	__u16                   mt_flags;
> +	__u16                   mt_invflags;
> +};
> +
> +#endif /*_IP6T_SRH_H*/
> diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
> index 6acb2ee..e1818eb 100644
> --- a/net/ipv6/netfilter/Kconfig
> +++ b/net/ipv6/netfilter/Kconfig
> @@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT
>  
>  	  To compile it as a module, choose M here.  If unsure, say N.
>  
> +config IP6_NF_MATCH_SRH
> +        tristate '"srh" Segment Routing header match support'
> +        depends on NETFILTER_ADVANCED
> +        help
> +          srh matching allows you to match packets based on the segment
> +	  routing header of the packet.
> +
> +          To compile it as a module, choose M here.  If unsure, say N.
> +
>  # The targets
>  config IP6_NF_TARGET_HL
>  	tristate '"HL" hoplimit target support'
> diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
> index c6ee0cd..e0d51a9 100644
> --- a/net/ipv6/netfilter/Makefile
> +++ b/net/ipv6/netfilter/Makefile
> @@ -54,6 +54,7 @@ obj-$(CONFIG_IP6_NF_MATCH_MH) += ip6t_mh.o
>  obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o
>  obj-$(CONFIG_IP6_NF_MATCH_RPFILTER) += ip6t_rpfilter.o
>  obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o
> +obj-$(CONFIG_IP6_NF_MATCH_SRH) += ip6t_srh.o
>  
>  # targets
>  obj-$(CONFIG_IP6_NF_TARGET_MASQUERADE) += ip6t_MASQUERADE.o
> diff --git a/net/ipv6/netfilter/ip6t_srh.c b/net/ipv6/netfilter/ip6t_srh.c
> new file mode 100644
> index 0000000..75e41dc9
> --- /dev/null
> +++ b/net/ipv6/netfilter/ip6t_srh.c
> @@ -0,0 +1,165 @@
> +/*
> + * Kernel module to match Segment Routing Header (SRH) parameters.
> + *
> + * Author:
> + * Ahmed Abdelsalam <amsalam20@gmail.com>
> + *
> + *
> + *  This program is free software; you can redistribute it and/or
> + *	modify it under the terms of the GNU General Public License
> + *	as published by the Free Software Foundation; either version 2
> + *	of the License, or (at your option) any later version.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/ipv6.h>
> +#include <linux/types.h>
> +#include <net/ipv6.h>
> +#include <net/seg6.h>
> +
> +#include <linux/netfilter/x_tables.h>
> +#include <linux/netfilter_ipv6/ip6t_srh.h>
> +#include <linux/netfilter_ipv6/ip6_tables.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Xtables: IPv6 Segment Routing Header match");
> +MODULE_AUTHOR("Ahmed Abdelsalam  <amsalam20@gmail.com>");
> +
> +/* Test a struct->mt_invflags and a boolean for inequality */
> +#define NF_SRH_INVF(ptr, flag, boolean)	\
> +	((boolean) ^ !!((ptr)->mt_invflags & (flag)))
> +
> +static bool srh_mt6(const struct sk_buff *skb, struct xt_action_param *par)
> +{
> +	const struct ip6t_srh *srhinfo = par->matchinfo;
> +	struct ipv6_sr_hdr *srh;
> +	struct ipv6_sr_hdr _srh;
> +	int hdrlen, srhoff = 0;
> +
> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> +		return false;
> +
> +	srh = skb_header_pointer(skb, srhoff, sizeof(_srh), &_srh);
> +
> +	if (!srh)
> +		return false;
> +
> +	hdrlen = ipv6_optlen(srh);
> +	if (skb->len - srhoff < hdrlen)
> +		return false;
> +
> +	if (srh->type != IPV6_SRCRT_TYPE_4)
> +		return false;
> +
> +	if (srh->segments_left > srh->first_segment)
> +		return false;
> +
> +	/* Next Header matching */
> +	if (srhinfo->mt_flags & IP6T_SRH_NEXTHDR)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_NEXTHDR,
> +				!(srh->nexthdr == srhinfo->next_hdr)))
> +			return false;
> +
> +	/* Header Extension Length matching */
> +	if (srhinfo->mt_flags & IP6T_SRH_LEN_EQ)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LEN_EQ,
> +				!(srh->hdrlen == srhinfo->hdr_len)))
> +			return false;
> +
> +	if (srhinfo->mt_flags & IP6T_SRH_LEN_GT)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LEN_GT,
> +				!(srh->hdrlen > srhinfo->hdr_len)))
> +			return false;
> +
> +	if (srhinfo->mt_flags & IP6T_SRH_LEN_LT)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LEN_LT,
> +				!(srh->hdrlen < srhinfo->hdr_len)))
> +			return false;
> +
> +	/* Segments Left matching */
> +	if (srhinfo->mt_flags & IP6T_SRH_SEGS_EQ)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_SEGS_EQ,
> +				!(srh->segments_left == srhinfo->segs_left)))
> +			return false;
> +
> +	if (srhinfo->mt_flags & IP6T_SRH_SEGS_GT)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_SEGS_GT,
> +				!(srh->segments_left > srhinfo->segs_left)))
> +			return false;
> +
> +	if (srhinfo->mt_flags & IP6T_SRH_SEGS_LT)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_SEGS_LT,
> +				!(srh->segments_left < srhinfo->segs_left)))
> +			return false;
> +
> +	/**
> +	 * Last Entry matching
> +	 * Last_Entry field was introduced in revision 6 of the SRH draft.
> +	 * It was called First_Segment in the previous revision
> +	 */
> +	if (srhinfo->mt_flags & IP6T_SRH_LAST_EQ)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LAST_EQ,
> +				!(srh->first_segment == srhinfo->last_entry)))
> +			return false;
> +
> +	if (srhinfo->mt_flags & IP6T_SRH_LAST_GT)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LAST_GT,
> +				!(srh->first_segment > srhinfo->last_entry)))
> +			return false;
> +
> +	if (srhinfo->mt_flags & IP6T_SRH_LAST_LT)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_LAST_LT,
> +				!(srh->first_segment < srhinfo->last_entry)))
> +			return false;
> +
> +	/**
> +	 * Tag matchig
> +	 * Tag field was introduced in revision 6 of the SRH draft.
> +	 */
> +	if (srhinfo->mt_flags & IP6T_SRH_TAG)
> +		if (NF_SRH_INVF(srhinfo, IP6T_SRH_INV_TAG,
> +				!(srh->tag == srhinfo->tag)))
> +			return false;
> +	return true;
> +}
> +
> +static int srh_mt6_check(const struct xt_mtchk_param *par)
> +{
> +	const struct ip6t_srh *srhinfo = par->matchinfo;
> +
> +	if (srhinfo->mt_flags & ~IP6T_SRH_MASK) {
> +		pr_debug("unknown srh match flags  %X\n", srhinfo->mt_flags);
> +		return -EINVAL;
> +	}
> +
> +	if (srhinfo->mt_invflags & ~IP6T_SRH_INV_MASK) {
> +		pr_debug("unknown srh invflags %X\n", srhinfo->mt_invflags);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static struct xt_match srh_mt6_reg __read_mostly = {
> +	.name		= "srh",
> +	.family		= NFPROTO_IPV6,
> +	.match		= srh_mt6,
> +	.matchsize	= sizeof(struct ip6t_srh),
> +	.checkentry	= srh_mt6_check,
> +	.me		= THIS_MODULE,
> +};
> +
> +static int __init srh_mt6_init(void)
> +{
> +	return xt_register_match(&srh_mt6_reg);
> +}
> +
> +static void __exit srh_mt6_exit(void)
> +{
> +	xt_unregister_match(&srh_mt6_reg);
> +}
> +
> +module_init(srh_mt6_init);
> +module_exit(srh_mt6_exit);
> -- 
> 2.1.4
> 
David, 
Any comment on this patch ?
-- 
Ahmed 

^ permalink raw reply

* Re: Bluetooth: Prevent stack info leak from the EFS element.
From: Guenter Roeck @ 2018-01-02 20:26 UTC (permalink / raw)
  To: gregkh@linuxfoundation.org
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, Ben Seri, netdev,
	linux-kernel
In-Reply-To: <20171208141447.GA5404@kroah.com>

On Fri, Dec 08, 2017 at 03:14:47PM +0100, gregkh@linuxfoundation.org wrote:
> From: Ben Seri <ben@armis.com>
> 
> In the function l2cap_parse_conf_rsp and in the function
> l2cap_parse_conf_req the following variable is declared without
> initialization:
> 
> struct l2cap_conf_efs efs;
> 
> In addition, when parsing input configuration parameters in both of
> these functions, the switch case for handling EFS elements may skip the
> memcpy call that will write to the efs variable:
> 
> ...
> case L2CAP_CONF_EFS:
> if (olen == sizeof(efs))
> memcpy(&efs, (void *)val, olen);
> ...
> 
> The olen in the above if is attacker controlled, and regardless of that
> if, in both of these functions the efs variable would eventually be
> added to the outgoing configuration request that is being built:
> 
> l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs), (unsigned long) &efs);
> 
> So by sending a configuration request, or response, that contains an
> L2CAP_CONF_EFS element, but with an element length that is not
> sizeof(efs) - the memcpy to the uninitialized efs variable can be
> avoided, and the uninitialized variable would be returned to the
> attacker (16 bytes).
> 
> This issue has been assigned CVE-2017-1000410
> 
> Cc: Marcel Holtmann <marcel@holtmann.org>
> Cc: Gustavo Padovan <gustavo@padovan.org>
> Cc: Johan Hedberg <johan.hedberg@gmail.com>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Ben Seri <ben@armis.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  net/bluetooth/l2cap_core.c |   20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> Marcel, for some reason this patch never got applied, despite lots of
> advance notice.  Please, someone queue it up as it resolves the above
> very-well-reported issue.
> 

This patch is still not upstream or in -next. Given that we (ChromeOS)
are heavy Bluetooth users, I'll go ahead and apply it without waiting
any longer. For my understanding and for tracking purposes, it would
be useful to know why it is not being applied. Does anyone know ?

Thanks,
Guenter

> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index 43ba91c440bc..fc6615d59165 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -3363,9 +3363,10 @@ static int l2cap_parse_conf_req(struct l2cap_chan *chan, void *data, size_t data
>  			break;
>  
>  		case L2CAP_CONF_EFS:
> -			remote_efs = 1;
> -			if (olen == sizeof(efs))
> +			if (olen == sizeof(efs)) {
> +				remote_efs = 1;
>  				memcpy(&efs, (void *) val, olen);
> +			}
>  			break;
>  
>  		case L2CAP_CONF_EWS:
> @@ -3584,16 +3585,17 @@ static int l2cap_parse_conf_rsp(struct l2cap_chan *chan, void *rsp, int len,
>  			break;
>  
>  		case L2CAP_CONF_EFS:
> -			if (olen == sizeof(efs))
> +			if (olen == sizeof(efs)) {
>  				memcpy(&efs, (void *)val, olen);
>  
> -			if (chan->local_stype != L2CAP_SERV_NOTRAFIC &&
> -			    efs.stype != L2CAP_SERV_NOTRAFIC &&
> -			    efs.stype != chan->local_stype)
> -				return -ECONNREFUSED;
> +				if (chan->local_stype != L2CAP_SERV_NOTRAFIC &&
> +				    efs.stype != L2CAP_SERV_NOTRAFIC &&
> +				    efs.stype != chan->local_stype)
> +					return -ECONNREFUSED;
>  
> -			l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs),
> -					   (unsigned long) &efs, endptr - ptr);
> +				l2cap_add_conf_opt(&ptr, L2CAP_CONF_EFS, sizeof(efs),
> +						   (unsigned long) &efs, endptr - ptr);
> +			}
>  			break;
>  
>  		case L2CAP_CONF_FCS:

^ permalink raw reply

* Re: [net-next] ipv6: sr: export some functions of seg6local
From: Ahmed Abdelsalam @ 2018-01-02 20:25 UTC (permalink / raw)
  To: David Lebrun; +Cc: davem, kuznet, yoshfuji, david.lebrun, netdev
In-Reply-To: <744656e9-b277-2915-e575-aea8b2c083be@gmail.com>

Ok! 
Thanks David
On Tue, 2 Jan 2018 20:11:48 +0000
David Lebrun <dav.lebrun@gmail.com> wrote:

> On 12/29/2017 09:09 PM, Ahmed Abdelsalam wrote:
> > Some functions of seg6local are very useful to process SRv6
> > encapsulated packets.
> > 
> > This patch exports some functions of seg6local that are useful and
> > can be re-used at different parts of the kernel.
> > 
> > The set of exported functions are:
> > (1) get_srh()
> > (2) advance_nextseg()
> > (3) lookup_nexthop
> 
> If you want to export those functions, it's better to prefix them with
> e.g. seg6_:
> 
> seg6_get_srh()
> seg6_advance_nextseg()
> seg6_lookup_nexthop()
> 
> David


-- 
Ahmed Abdelsalam <amsalam20@gmail.com>

^ permalink raw reply

* [net 1/2] e1000: fix disabling already-disabled warning
From: Jeff Kirsher @ 2018-01-02 20:20 UTC (permalink / raw)
  To: davem; +Cc: Tushar Dave, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180102202045.72780-1-jeffrey.t.kirsher@intel.com>

From: Tushar Dave <tushar.n.dave@oracle.com>

This patch adds check so that driver does not disable already
disabled device.

[   44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[   44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6
[   45.013419] e1000 0000:00:03.0: disabling already-disabled device
[   45.013447] ------------[ cut here ]------------
[   45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105:
						pci_disable_device at drivers/pci/pci.c:1640
[   45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1
[   45.017197] task: ffff88011bee9e40 task.stack: ffffc90000860000
[   45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
						pci_disable_device at drivers/pci/pci.c:1640
[   45.018603] RSP: 0000:ffffc90000863e30 EFLAGS: 00010286
[   45.019282] RAX: 0000000000000035 RBX: ffff88013a230008 RCX: 0000000000000000
[   45.020182] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000203
[   45.021084] RBP: ffff88013a3f31e8 R08: 0000000000000001 R09: 0000000000000000
[   45.021986] R10: ffffffff827ec29c R11: 0000000000000002 R12: 0000000000000001
[   45.022946] R13: ffff88013a230008 R14: ffff880117802b20 R15: ffffc90000863e8f
[   45.023842] FS:  0000000000000000(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[   45.024863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.025583] CR2: ffffc900006d4000 CR3: 000000000220f000 CR4: 00000000000006a0
[   45.026478] Call Trace:
[   45.026811]  __e1000_shutdown+0x1d4/0x1e2:
						__e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[   45.027344]  ? rcu_perf_cleanup+0x2a1/0x2a1:
						rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[   45.027883]  e1000_shutdown+0x14/0x3a:
						e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[   45.028351]  device_shutdown+0x110/0x1aa:
						device_shutdown at drivers/base/core.c:2807
[   45.028858]  kernel_power_off+0x31/0x64:
						kernel_power_off at kernel/reboot.c:260
[   45.029343]  rcu_perf_shutdown+0x9b/0xa7:
						rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[   45.029852]  ? __wake_up_common_lock+0xa2/0xa2:
						autoremove_wake_function at kernel/sched/wait.c:376
[   45.030414]  kthread+0x126/0x12e:
						kthread at kernel/kthread.c:233
[   45.030834]  ? __kthread_bind_mask+0x8e/0x8e:
						kthread at kernel/kthread.c:190
[   45.031399]  ? ret_from_fork+0x1f/0x30:
						ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.031883]  ? kernel_init+0xa/0xf5:
						kernel_init at init/main.c:997
[   45.032325]  ret_from_fork+0x1f/0x30:
						ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82
[   45.035222] ---[ end trace c257137b1b1976ef ]---
[   45.037838] ACPI: Preparing to enter system sleep state S5

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000.h      |  3 ++-
 drivers/net/ethernet/intel/e1000/e1000_main.c | 27 ++++++++++++++++++++++-----
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index d7bdea79e9fa..8fd2458060a0 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -331,7 +331,8 @@ struct e1000_adapter {
 enum e1000_state_t {
 	__E1000_TESTING,
 	__E1000_RESETTING,
-	__E1000_DOWN
+	__E1000_DOWN,
+	__E1000_DISABLED
 };
 
 #undef pr_fmt
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 1982f7917a8d..3dd4aeb2706d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -945,7 +945,7 @@ static int e1000_init_hw_struct(struct e1000_adapter *adapter,
 static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	struct net_device *netdev;
-	struct e1000_adapter *adapter;
+	struct e1000_adapter *adapter = NULL;
 	struct e1000_hw *hw;
 
 	static int cards_found;
@@ -955,6 +955,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	u16 tmp = 0;
 	u16 eeprom_apme_mask = E1000_EEPROM_APME;
 	int bars, need_ioport;
+	bool disable_dev = false;
 
 	/* do not allocate ioport bars when not needed */
 	need_ioport = e1000_is_need_ioport(pdev);
@@ -1259,11 +1260,13 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	iounmap(hw->ce4100_gbe_mdio_base_virt);
 	iounmap(hw->hw_addr);
 err_ioremap:
+	disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags);
 	free_netdev(netdev);
 err_alloc_etherdev:
 	pci_release_selected_regions(pdev, bars);
 err_pci_reg:
-	pci_disable_device(pdev);
+	if (!adapter || disable_dev)
+		pci_disable_device(pdev);
 	return err;
 }
 
@@ -1281,6 +1284,7 @@ static void e1000_remove(struct pci_dev *pdev)
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
+	bool disable_dev;
 
 	e1000_down_and_stop(adapter);
 	e1000_release_manageability(adapter);
@@ -1299,9 +1303,11 @@ static void e1000_remove(struct pci_dev *pdev)
 		iounmap(hw->flash_address);
 	pci_release_selected_regions(pdev, adapter->bars);
 
+	disable_dev = !test_and_set_bit(__E1000_DISABLED, &adapter->flags);
 	free_netdev(netdev);
 
-	pci_disable_device(pdev);
+	if (disable_dev)
+		pci_disable_device(pdev);
 }
 
 /**
@@ -5156,7 +5162,8 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
 	if (netif_running(netdev))
 		e1000_free_irq(adapter);
 
-	pci_disable_device(pdev);
+	if (!test_and_set_bit(__E1000_DISABLED, &adapter->flags))
+		pci_disable_device(pdev);
 
 	return 0;
 }
@@ -5200,6 +5207,10 @@ static int e1000_resume(struct pci_dev *pdev)
 		pr_err("Cannot enable PCI device from suspend\n");
 		return err;
 	}
+
+	/* flush memory to make sure state is correct */
+	smp_mb__before_atomic();
+	clear_bit(__E1000_DISABLED, &adapter->flags);
 	pci_set_master(pdev);
 
 	pci_enable_wake(pdev, PCI_D3hot, 0);
@@ -5274,7 +5285,9 @@ static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
 
 	if (netif_running(netdev))
 		e1000_down(adapter);
-	pci_disable_device(pdev);
+
+	if (!test_and_set_bit(__E1000_DISABLED, &adapter->flags))
+		pci_disable_device(pdev);
 
 	/* Request a slot slot reset. */
 	return PCI_ERS_RESULT_NEED_RESET;
@@ -5302,6 +5315,10 @@ static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
 		pr_err("Cannot re-enable PCI device after reset.\n");
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
+
+	/* flush memory to make sure state is correct */
+	smp_mb__before_atomic();
+	clear_bit(__E1000_DISABLED, &adapter->flags);
 	pci_set_master(pdev);
 
 	pci_enable_wake(pdev, PCI_D3hot, 0);
-- 
2.15.1

^ permalink raw reply related

* [net 2/2] e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
From: Jeff Kirsher @ 2018-01-02 20:20 UTC (permalink / raw)
  To: davem; +Cc: Benjamin Poirier, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180102202045.72780-1-jeffrey.t.kirsher@intel.com>

From: Benjamin Poirier <bpoirier@suse.com>

e1000e_check_for_copper_link() and e1000_check_for_copper_link_ich8lan()
are the two functions that may be assigned to mac.ops.check_for_link when
phy.media_type == e1000_media_type_copper. Commit 19110cfbb34d ("e1000e:
Separate signaling for link check/link up") changed the meaning of the
return value of check_for_link for copper media but only adjusted the first
function. This patch adjusts the second function likewise.

Reported-by: Christian Hesse <list@eworm.de>
Reported-by: Gabriel C <nix.or.die@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198047
Fixes: 19110cfbb34d ("e1000e: Separate signaling for link check/link up")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Christian Hesse <list@eworm.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d6d4ed7acf03..31277d3bb7dc 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1367,6 +1367,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
  *  Checks to see of the link status of the hardware has changed.  If a
  *  change in link status has been detected, then we read the PHY registers
  *  to get the current speed/duplex if link exists.
+ *
+ *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
+ *  up).
  **/
 static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 {
@@ -1382,7 +1385,7 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 	 * Change or Rx Sequence Error interrupt.
 	 */
 	if (!mac->get_link_status)
-		return 0;
+		return 1;
 
 	/* First we want to see if the MII Status Register reports
 	 * link.  If so, then we want to get the current speed/duplex
@@ -1613,10 +1616,12 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 	 * different link partner.
 	 */
 	ret_val = e1000e_config_fc_after_link_up(hw);
-	if (ret_val)
+	if (ret_val) {
 		e_dbg("Error configuring flow control\n");
+		return ret_val;
+	}
 
-	return ret_val;
+	return 1;
 }
 
 static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
-- 
2.15.1

^ permalink raw reply related

* [net 0/2][pull request] Intel Wired LAN Driver Updates 2018-01-02
From: Jeff Kirsher @ 2018-01-02 20:20 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains fixes for e1000 and e1000e.

Tushar Dave adds a check to the driver so that it won't attempt to disable a
device that is already disabled for e1000.

Benjamin Poirier provides a fix to e1000e, where a previous commit that
Benjamin submitted changed the meaning of the return value for
"check_for_link" for copper media and not all the instances were properly
updated.  Benjamin fixes the remaining instances that needed the change.

The following are changes since commit 2758b3e3e630ba304fc4aca434d591e70e528298:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 1GbE

Benjamin Poirier (1):
  e1000e: Fix e1000_check_for_copper_link_ich8lan return value.

Tushar Dave (1):
  e1000: fix disabling already-disabled warning

 drivers/net/ethernet/intel/e1000/e1000.h      |  3 ++-
 drivers/net/ethernet/intel/e1000/e1000_main.c | 27 ++++++++++++++++++++++-----
 drivers/net/ethernet/intel/e1000e/ich8lan.c   | 11 ++++++++---
 3 files changed, 32 insertions(+), 9 deletions(-)

-- 
2.15.1

^ permalink raw reply

* Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter
From: James Chapman @ 2018-01-02 20:18 UTC (permalink / raw)
  To: Lorenzo Bianconi, Guillaume Nault; +Cc: David S. Miller, netdev, Hangbin Liu
In-Reply-To: <CAJ0CqmWhhzvi8HCKjf-xUjkszuNKgnuoD_MXmDNtUtWyEExdNQ@mail.gmail.com>

On 02/01/18 19:28, Lorenzo Bianconi wrote:
>>>> Lorenzo, is this being added to fix interoperability with another L2TPv3
>>>> implementation? If so, can you share more details?
>>>>
>>> Hi James,
>>>
>>> I introduced peer_offset parameter to fix a specific setup where
>>> tunnel endpoints
>>> running L2TPv3 would use different values for tx offset (since in
>>> iproute2 there is no
>>> restriction on it), not to fix a given an interoperability issue.
>>>
>> Yes, but was it just to test iproute2's peer_offset option? Or is there
>> a plan to use it for real?
>>
>>> I introduced this feature since:
>>>   - offset has been added for long time to L2TPv3 implementation
>>>     (commit f7faffa3ff8ef6ae712ef16312b8a2aa7a1c95fe and
>>>     commit 309795f4bec2d69cd507a631f82065c2198a0825) and I wanted to
>>> preserve UABI
>>>   - have the same degree of freedom for offset parameter we have in
>>> L2TPv2 and fix the issue
>>>     described above
>>>
>> AFAIU, the current L2TPv2 implementation never sets the offset field
>> and nobody ever realised.
>>
> Perhaps I am little bit polarized on UABI issue, but I was rethinking
> about it and maybe removing offset parameter would lead to an
> interoperability issue for device running L2TPv3 since offset
> parameter is there and it is not a nope.
> Please consider this setup:
> - 2 endpoint running L2TPv3, the first running net-next and the second
> running 4.14
> - both endpoint are configured using iproute2 in this way:
>
>    - ip l2tp add tunnel local <ip0> remote <ip1> tunnel_id <id0>
> peer_tunnel_id <id1> udp_sport <p0> udp_dport <p1>
>    - ip l2tp add tunnel local <ip1> remote <ip0> tunnel_id <id1>
> peer_tunnel_id <id0> udp_sport <p1> udp_dport <p0>
>    - ip l2tp add session name l2tp0 tunnel_id <id0> session_id <s0>
> peer_session_id <s1> offset 8
>    - ip l2tp add session name l2tp0 tunnel_id <id1> session_id <s1>
> peer_session_id <s0> offset 8
>
> Can we assume offset is never used for L2TPv3?
If offset is ever set non-zero, packets transmitted on the wire are not 
compliant with the L2TPv3 RFC. So if someone is using a non-zero offset 
in their config, it might be a good thing that it stops working with a 
kernel update.

> Regards,
> Lorenzo
>
>>> Now what we can do I guess is:
>>> - as suggested by Guillaume drop completely the offset support without removing
>>>    netlink attribute in order to not break UABI
>>> - fix offset support initializing properly padding bits
>>>
>> I'd go for the first one. I just wonder if that looks acceptable to
>> David an James.

^ permalink raw reply

* Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter
From: James Chapman @ 2018-01-02 20:08 UTC (permalink / raw)
  To: Guillaume Nault, Lorenzo Bianconi; +Cc: David S. Miller, netdev, Hangbin Liu
In-Reply-To: <20180102180557.GB1402@alphalink.fr>

On 02/01/18 18:05, Guillaume Nault wrote:
>>> Lorenzo, is this being added to fix interoperability with another L2TPv3
>>> implementation? If so, can you share more details?
>>>
>> Hi James,
>>
>> I introduced peer_offset parameter to fix a specific setup where
>> tunnel endpoints
>> running L2TPv3 would use different values for tx offset (since in
>> iproute2 there is no
>> restriction on it), not to fix a given an interoperability issue.
>>
> Yes, but was it just to test iproute2's peer_offset option? Or is there
> a plan to use it for real?
>
>> I introduced this feature since:
>>   - offset has been added for long time to L2TPv3 implementation
>>     (commit f7faffa3ff8ef6ae712ef16312b8a2aa7a1c95fe and
>>     commit 309795f4bec2d69cd507a631f82065c2198a0825) and I wanted to
>> preserve UABI
>>   - have the same degree of freedom for offset parameter we have in
>> L2TPv2 and fix the issue
>>     described above
>>
> AFAIU, the current L2TPv2 implementation never sets the offset field
> and nobody ever realised.
>
>> Now what we can do I guess is:
>> - as suggested by Guillaume drop completely the offset support without removing
>>    netlink attribute in order to not break UABI
>> - fix offset support initializing properly padding bits
>>
> I'd go for the first one. I just wonder if that looks acceptable to
> David an James.

I think the first one too. Also update iproute2 to remove or hide the 
offset and peer_offset parameters.

^ permalink raw reply

* Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter
From: James Chapman @ 2018-01-02 20:08 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: Lorenzo Bianconi, davem, netdev, liuhangbin
In-Reply-To: <20180102175048.GA1402@alphalink.fr>

On 02/01/18 17:50, Guillaume Nault wrote:
> On Fri, Dec 29, 2017 at 06:53:56PM +0000, James Chapman wrote:
>> On 28/12/17 19:45, Guillaume Nault wrote:
>>> Here we have an option that:
>>>     * creates invalid packets (AFAIK),
>>>     * is buggy and leaks memory on the network,
>>>     * doesn't seem to have any use case (even the manpage
>>>       says "This is hardly ever used").
>>>
>>> So I'm sorry, but I don't see the point in expanding this option to
>>> allow even stranger setups. If there's a use case, then fine.
>>> Otherwise, let's just acknowledge that the "peer_offset" option of
>>> iproute2 is a noop (and maybe remove it from the manpage).
>>>
>>> And the kernel "offset" option needs to be fixed. Actually, I wouldn't
>>> mind if it was converted to be a noop, or even rejected. L2TP already
>>> has its share of unused features that complicate the code and hamper
>>> evolution and bug fixing. As I said earlier, if it's buggy, unused and
>>> can't even produce valid packets, then why bothering with it?
>>>
>>> But that's just my point of view. James, do you have an opinion on
>>> this?
>> I agree, Guillaume.
>>
>> The L2TPv3 protocol RFC dropped the configurable offset of L2TPv2 - instead,
>> the Layer-2-Specific-Sublayer is supposed to handle any transport-specific
>> data alignment requirements.
>>
> Yes, and AFAIK, no RFC has ever defined an L2TPv3 sublayer using offsets.
>
>> I think a configurable offset has found its way
>> into iproute2 l2tp commands by mistake, perhaps because the netlink API
>> defines an attribute for it, but which was only intended for use with
>> L2TPv2.
>>
> Makes sense, however L2TP_ATTR_OFFSET seems to be a noop for L2TPv2 in
> the current implementation.
>
>> For L2TPv2, we only configure the offset for transmitted packets. In
>> received packets, the offset (if present) is obtained from the L2TPv2 header
>> in each received packet. There is no need to add a peer-offset netlink
>> attribute to set the offset expected in received packets.
>>
> Agreed for Rx side. I also agree on the theory for Tx, but in the current
> implementation, l2tp_build_l2tpv2_header() doesn't take the session's
> "offset" field into account. So, unless I've missed something,
> L2TP_ATTR_OFFSET is already a noop for L2TPv2.

You're right. My bad.

> Not sure if it's worth handling this feature of L2TPv2. The Linux
> implementation has been there for so long, and nobody ever complained
> that there was no way to define an offset on Tx.

I agree.

^ permalink raw reply

* Re: [net-next] ipv6: sr: export some functions of seg6local
From: David Lebrun @ 2018-01-02 20:11 UTC (permalink / raw)
  To: Ahmed Abdelsalam, davem, kuznet, yoshfuji; +Cc: david.lebrun, netdev
In-Reply-To: <1514581784-1527-1-git-send-email-amsalam20@gmail.com>

On 12/29/2017 09:09 PM, Ahmed Abdelsalam wrote:
> Some functions of seg6local are very useful to process SRv6
> encapsulated packets.
> 
> This patch exports some functions of seg6local that are useful and
> can be re-used at different parts of the kernel.
> 
> The set of exported functions are:
> (1) get_srh()
> (2) advance_nextseg()
> (3) lookup_nexthop

If you want to export those functions, it's better to prefix them with
e.g. seg6_:

seg6_get_srh()
seg6_advance_nextseg()
seg6_lookup_nexthop()

David

^ permalink raw reply

* Re: [patch iproute2 v4 3/3] man: Add -bs option to tc manpage
From: Marcelo Ricardo Leitner @ 2018-01-02 20:07 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, gerlitz.or, stephen, dsahern
In-Reply-To: <20180102142804.27145-4-chrism@mellanox.com>

On Tue, Jan 02, 2018 at 11:28:04PM +0900, Chris Mi wrote:
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> ---
>  man/man8/tc.8 | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/man/man8/tc.8 b/man/man8/tc.8
> index ff071b33..de137e16 100644
> --- a/man/man8/tc.8
> +++ b/man/man8/tc.8
> @@ -601,6 +601,11 @@ must exist already.
>  read commands from provided file or standard input and invoke them.
>  First failure will cause termination of tc.
>  
> +.TP
> +.BR "\-bs", " \-bs size", " \-batchsize", " \-batchsize size"
> +How many commands are accumulated before sending to kernel.
> +By default, it is 1. It only takes effect in batch mode.
> +

You should also describe the limitations it has. Like, it only works
for action and filter and that it shouldn't be mixed with other
commands.
And maybe even do such check in the code: refuse to do other commands
if batch_size > 1.

>  .TP
>  .BR "\-force"
>  don't terminate tc on errors in batch mode.
> -- 
> 2.14.3
> 

^ permalink raw reply

* Re: [patch iproute2 v4 2/3] tc: Add -bs option to batch mode
From: Marcelo Ricardo Leitner @ 2018-01-02 20:04 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, gerlitz.or, stephen, dsahern
In-Reply-To: <20180102142804.27145-3-chrism@mellanox.com>

On Tue, Jan 02, 2018 at 11:28:03PM +0900, Chris Mi wrote:
> @@ -240,23 +244,49 @@ static int batch(const char *name)
>  	}
>  
>  	cmdlineno = 0;
> -	while (getcmdline(&line, &len, stdin) != -1) {
> +	if (getcmdline(&line, &len, stdin) == -1)
> +		goto Exit;
> +	do {
>  		char *largv[100];
>  		int largc;
>  
> +		if (getcmdline(&line2, &len, stdin) == -1)
> +			lastline = true;
> +
>  		largc = makeargs(line, largv, 100);
>  		if (largc == 0)
>  			continue;	/* blank line */

If it reads a new line, it won't process anything else after it
because line won't get updated.

  Marcelo

>  
> -		if (do_cmd(largc, largv)) {
> -			fprintf(stderr, "Command failed %s:%d\n", name, cmdlineno);
> +		line = line2;
> +		line2 = NULL;
> +		len = 0;
> +
> +		/*
> +		 * In batch mode, if we haven't accumulated enough commands
> +		 * and this is not the last command, don't send the message
> +		 * immediately.
> +		 */
> +		if (batch_size > 1 && msg_iov_index + 1 != batch_size
> +		    && !lastline)
> +			send = false;
> +		else
> +			send = true;
> +
> +		ret = do_cmd(largc, largv, batch_size, msg_iov_index++, send);
> +		if (ret < 0) {
> +			fprintf(stderr, "Command failed %s:%d\n", name,
> +				cmdlineno);
>  			ret = 1;
>  			if (!force)
>  				break;
>  		}
> -	}
> -	if (line)
> -		free(line);
> +		msg_iov_index %= batch_size;
> +	} while (!lastline);
> +
> +	free_filter_reqs();
> +	free_action_reqs();
> +Exit:
> +	free(line);
>  
>  	rtnl_close(&rth);
>  	return ret;

^ permalink raw reply

* Re: [PATCH net-next 0/5] marvell10g updates
From: David Miller @ 2018-01-02 20:01 UTC (permalink / raw)
  To: linux; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <20171229124412.GA10595@n2100.armlinux.org.uk>

From: Russell King - ARM Linux <linux@armlinux.org.uk>
Date: Fri, 29 Dec 2017 12:44:13 +0000

> This series:
> - adds MDI/MDIX reporting
> - adds support for 10/100Mbps half-duplex link modes
> - adds a comment describing the setup on VF610 ZII boards (where
>   the phy interface mode doesn't change.)
> - cleans up the phy interace mode switching

Series applied, thank you.

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] i40e: Delete an error message for a failed memory allocation in i40e_init_interrupt_scheme()
From: Jesse Brandeburg @ 2018-01-02 19:56 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: netdev, intel-wired-lan, Jeff Kirsher, kernel-janitors, LKML,
	jesse.brandeburg
In-Reply-To: <c9c151e5-bd26-5b72-d26b-c153249811ab@users.sourceforge.net>

On Mon, 1 Jan 2018 20:43:35 +0100
SF Markus Elfring <elfring@users.sourceforge.net> wrote:

> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Mon, 1 Jan 2018 20:38:14 +0100
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>

Thanks for the patch.

Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply

* Re: [PATCH iproute2 2/3] link_iptnl: Print tunnel mode
From: Stephen Hemminger @ 2018-01-02 19:54 UTC (permalink / raw)
  To: Serhey Popovych; +Cc: netdev
In-Reply-To: <1514906959-9719-3-git-send-email-serhe.popovych@gmail.com>

On Tue,  2 Jan 2018 17:29:18 +0200
Serhey Popovych <serhe.popovych@gmail.com> wrote:

> Tunnel mode does not appear in parameters print for iptnl
> supported tunnels like ipip and sit, while printed for
> ip6tnl.
> 
> Print tunnel mode with "proto" field for JSON and without
> any name when printing to cli to follow ip6tnl behaviour.
> 
> Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
> ---
>  ip/link_iptnl.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
> index d4d935b..afd1696 100644
> --- a/ip/link_iptnl.c
> +++ b/ip/link_iptnl.c
> @@ -372,6 +372,23 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
>  	if (tb[IFLA_IPTUN_COLLECT_METADATA])
>  		print_bool(PRINT_ANY, "external", "external ", true);
>  
> +	if (tb[IFLA_IPTUN_PROTO]) {
> +		switch (rta_getattr_u8(tb[IFLA_IPTUN_PROTO])) {
> +		case IPPROTO_IPIP:
> +			print_string(PRINT_ANY, "proto", "%s ", "ip4ip4");
> +			break;
> +		case IPPROTO_IPV6:
> +			print_string(PRINT_ANY, "proto", "%s ", "ip6ip");
> +			break;
> +		case IPPROTO_MPLS:
> +			print_string(PRINT_ANY, "proto", "%s ", "mplsip");
> +			break;
> +		case 0:
> +			print_string(PRINT_ANY, "proto", "%s ", "any");
> +			break;
> +		}
> +	}
> +
>  	if (tb[IFLA_IPTUN_REMOTE]) {
>  		unsigned int addr = rta_getattr_u32(tb[IFLA_IPTUN_REMOTE]);
>  

In iproute2 utilities the output format must match the input format.
I don't see "proto ip4ip4" as a valid option.

^ permalink raw reply

* Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint
From: David Miller @ 2018-01-02 19:52 UTC (permalink / raw)
  To: brendan.d.gregg
  Cc: laoar.shao, songliubraving, marcelo.leitner, rostedt, bgregg,
	netdev, linux-kernel
In-Reply-To: <CAE40pddorDmoaGn2og-7u7vxpDY8uiWN8qEbQ05YYnK8f0Q_XQ@mail.gmail.com>

From: Brendan Gregg <brendan.d.gregg@gmail.com>
Date: Tue, 2 Jan 2018 11:46:26 -0800

> If I'm to use sock:inet_sock_set_state for TCP tracing, I'd like
> sk->sk_protocol exposed as a tracepoint argument so I can match on
> IPPROTO_TCP.

Agreed.

^ permalink raw reply

* Re: [PATCH] NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625
From: David Miller @ 2018-01-02 19:49 UTC (permalink / raw)
  To: sz.lin; +Cc: bjorn, netdev, linux-usb, linux-kernel
In-Reply-To: <20171229090217.6933-1-sz.lin@moxa.com>

From: SZ Lin (林上智) <sz.lin@moxa.com>
Date: Fri, 29 Dec 2017 17:02:17 +0800

> This patch adds support for PID 0x9625 of YUGA CLM920-NC5.
> 
> YUGA CLM920-NC5 needs to enable QMI_WWAN_QUIRK_DTR before QMI operation.
> 
> qmicli -d /dev/cdc-wdm0 -p --dms-get-revision
> [/dev/cdc-wdm0] Device revision retrieved:
>         Revision: 'CLM920_NC5-V1  1  [Oct 23 2016 19:00:00]'
> 
> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>

Applied, thank you.

^ permalink raw reply

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
From: Jiri Pirko @ 2018-01-02 19:49 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <20171225102346.GB1885@nanopsycho>

Mon, Dec 25, 2017 at 11:23:46AM CET, jiri@resnulli.us wrote:
>Sun, Dec 24, 2017 at 05:25:41PM CET, dsahern@gmail.com wrote:
>>On 12/24/17 1:19 AM, Jiri Pirko wrote:
>>> Sun, Dec 24, 2017 at 02:54:47AM CET, dsahern@gmail.com wrote:
>>>> On 12/23/17 9:54 AM, Jiri Pirko wrote:
>>>>> So back to the example. First, we create 2 qdiscs. Both will share
>>>>> block number 22. "22" is just an identification. If we don't pass any
>>>>> block number, a new one will be generated by kernel:
>>>>>
>>>>> $ tc qdisc add dev ens7 ingress block 22
>>>>>                                 ^^^^^^^^
>>>>> $ tc qdisc add dev ens8 ingress block 22
>>>>>                                 ^^^^^^^^
>>>>>
>>>>> Now if we list the qdiscs, we will see the block index in the output:
>>>>>
>>>>> $ tc qdisc
>>>>> qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22
>>>>> qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22
>>>>>
>>>>> To make is more visual, the situation looks like this:
>>>>>
>>>>>    ens7 ingress qdisc                 ens7 ingress qdisc
>>>>>           |                                  |
>>>>>           |                                  |
>>>>>           +---------->  block 22  <----------+
>>>>>
>>>>> Unlimited number of qdiscs may share the same block.
>>>>>
>>>>> Now we can add filter to any of qdiscs sharing the same block:
>>>>>
>>>>> $ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
>>>>
>>>>
>>>> Allowing config of a shared block through any qdisc that references it
>>>> is akin to me allowing nexthop objects to be manipulated by any route
>>>> that references it -- sure, it could be done but causes a lot surprises
>>>> to the user.
>>>>
>>>> You are adding a new tc object -- a shared block. Why the resistance to
>>>> creating a proper API for managing it?
>>> 
>>> Again, no resistance, I said many times it would be done as a follow-up.
>>> But as an api already exists, it has to continue to work. Or do you
>>> suggest it should stop working? That, I don't agree with.
>>> 
>>
>>That is exactly what I am saying - principle of least surprise. The new
>>object brings its own API and can only be modified using the new API.
>>The scheme above can and will surprise users. You are thinking like a tc
>>developer, someone intimately familiar with the code, and not like an
>>ordinary user of this new feature.
>
>Breaking exising tools is newer good. Note that not only about filter
>add/del iface but also dump and notifications. I agree to extend the api
>for the "block handle", sure, but the existing api should continue to
>work.

DaveA, please consider following example:

$ tc qdisc add dev ens7 ingress
$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 1

Now I have one device with one qdisc attached.

I will add some filters, for example:
$ tc filter add dev ens7 ingress protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

No sharing is happening. The user is doing what he is used to do.

Now user decides to share this filters with another device. As you can
see above, the block created for ens7 qdisc instance has id "1".
User can simply do:

tc qdisc add dev ens8 ingress block 1

And the block gets shared among ens7 ingress qdisc instance and ens8
ingress qdisc instance.

What is wrong with this? The approach you suggest would disallow this
forcing user to explicitly create some block entity and then to attach
it to qdisc instances. I don't really see good reason for it. Could you
please clear this up for me?

Thanks!

^ permalink raw reply

* Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint
From: Brendan Gregg @ 2018-01-02 19:46 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Song Liu, David S. Miller, Marcelo Ricardo Leitner,
	Steven Rostedt, Brendan Gregg, netdev, LKML
In-Reply-To: <CALOAHbC4V1EsyecfDXGi-b7P08huhY3+soHdpuqTaEn4AcfVww@mail.gmail.com>

On Sat, Dec 30, 2017 at 7:06 PM, Yafang Shao <laoar.shao@gmail.com> wrote:
> On Sun, Dec 31, 2017 at 6:33 AM, Brendan Gregg
> <brendan.d.gregg@gmail.com> wrote:
>> On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao <laoar.shao@gmail.com> wrote:
>>> As sk_state is a common field for struct sock, so the state
>>> transition tracepoint should not be a TCP specific feature.
>>> Currently it traces all AF_INET state transition, so I rename this
>>> tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
>>> into trace/events/sock.h.
>>
>> The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
>> fire for TCP sessions. It's not broken, and we could add a
>> sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
>> inet_sk_set_state is feeling like we might be baking too much
>> implementation detail into the tracepoint API.
>>
>> If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?
>>
>
> Hi Brendan,
>
> The reason we have to make this change could be got from this mail
> thread, https://patchwork.kernel.org/patch/10099243/ .
>
> The original tcp:tcp_set_state probe doesn't traced all TCP state transitions.
> There're some state transitions in inet_connection_sock.c and
> inet_hashtables.c are missed.
> So we have to place this probe into these two files to fix the issue.
> But as inet_connection_sock.c and inet_hashtables.c are common files
> for all IPv4 protocols, not only for TCP, so it is not proper to place
> a tcp_ function in these two files.
> That's why we decide to rename tcp:tcp_set_state probe to
> sock:inet_sock_set_state.

It kinda feels like we are fixing one exposing-implementation problem
(the missing state changes, which I'm happy to see fixed), by exposing
another (there's no tcp:tcp_set_state because we don't want to put tcp
functions in inet*.c files). Anyway...

If I'm to use sock:inet_sock_set_state for TCP tracing, I'd like
sk->sk_protocol exposed as a tracepoint argument so I can match on
IPPROTO_TCP. Otherwise I'll have to keep digging it out of (void
*)skaddr. (And if we're adding arguments, maybe consider sk_family as
well, to make it easier to see which address arguments to use).

Brendan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox