Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: Eric Dumazet @ 2018-10-30 19:03 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, netdev, Eric Dumazet, Eran Ben Elisha,
	Saeed Mahameed, Dimitris Michailidis, Paweł Staszewski
In-Reply-To: <CAM_iQpUPWncS22uYURFZ5OsvPZp7aJozPNOcDCak9nRFOS0jig@mail.gmail.com>

On Tue, Oct 30, 2018 at 11:59 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:

> I wonder how compiler recognizes it as "never fail" when marked with
> __must_check.

__must_check means that you can not ignore the return value of a function.

Here we do use the return value.

Also prior code was not checking skb->length so really my patch does
add explicit
check if in the future skb->len gets wrong after a bug is added in this driver.

(A NULL deref will trap the bug, instead of potentially reading
garbage from skb->data)

^ permalink raw reply

* RE: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem
From: Tristram.Ha @ 2018-10-30 19:36 UTC (permalink / raw)
  To: Claudiu.Beznea; +Cc: UNGLinuxDriver, netdev, davem, Nicolas.Ferre
In-Reply-To: <eceb7dd3-c7bc-fa78-26dc-1da2e4a8cf4e@microchip.com>

> Could you check on your side that applying this on top of your patch, your
> scenario is still working?
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c
> b/drivers/net/ethernet/cadence/macb_main.c
> index 1d86b4d5645a..e1347d6d1b50 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
> struct net_device *ndev)
>                 *skb = nskb;
>         }
> 
> -       if (padlen) {
> -               if (padlen >= ETH_FCS_LEN)
> -                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
> -               else
> -                       skb_trim(*skb, ETH_FCS_LEN - padlen);
> -       }
> +       if (padlen > ETH_FCS_LEN)
> +               skb_put_zero(*skb, padlen - ETH_FCS_LEN);

I think it is okay but I need to check all paths are covered.
 
> It was reported in [1] that UDP checksum is offloaded to hardware no matter
> the application previously computed it.
> 
> The code should be executed only for packets that has checksum computed
> by
> applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to
> not
> recompute checksum for packets with checksum already computed. To do
> so,
> while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit
> should
> be set on buffer descriptor. But to do so, packets must have a minimum size
> of 64 and FCS to be computed.
> 
> The NETIF_F_HW_CSUM check was placed there because the issue
> described in
> [1] is reproducible because hardware checksum is enabled and overrides the
> checksum provided by applications.
> 
> [1] https://www.spinics.net/lists/netdev/msg505065.html

I understand the issue now.  It is weird that the transmit descriptor does not
have direct control over turning on checksum generation or not, but it wastes
3 bits returning the error codes of such generation.  What can the driver do
with such information?

In my opinion then hardware transmit checksumming cannot be supported
In Linux.

> > NETIF_F_SG is not enabled in the MAC I used, so enabling
> NETIF_IF_HW_CSUM
> > is rather pointless.  With the padding code the transmit throughput cannot
> get
> > higher than 100Mbps in a gigabit connection.
> >
> > I would recommend to add this option to disable manual padding in one of
> those
> > macb_config structures.
> 
> In this way the user would have to know from the beginning what kind of
> packets are used.
> 

The kernel already does a good job of calculating checksum.  Using hardware to
do that does not improve performance much.

Alternative is to use ethtool to disable hardware tx checksum so that software can
intentionally send wrong checksums.


^ permalink raw reply

* [PATCH iproute2] ip rule: Honor filter arguments on flush
From: David Ahern @ 2018-10-30 20:58 UTC (permalink / raw)
  To: netdev; +Cc: stephen, David Ahern

From: David Ahern <dsahern@gmail.com>

'ip ru flush' currently removes all rules with priority > 0 regardless
of any other command line arguments passed in. Update flush_rule to
call filter_nlmsg to determine if the rule should be flushed or not.
This enables rule flushing such as 'ip ru flush table 1001' and
'ip ru flush pref 99'.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 ip/iprule.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/ip/iprule.c b/ip/iprule.c
index b465a80785b1..a85a43904e6e 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -461,6 +461,7 @@ static int flush_rule(struct nlmsghdr *n, void *arg)
 	struct fib_rule_hdr *frh = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
 	struct rtattr *tb[FRA_MAX+1];
+	int host_len = -1;
 
 	len -= NLMSG_LENGTH(sizeof(*frh));
 	if (len < 0)
@@ -468,6 +469,10 @@ static int flush_rule(struct nlmsghdr *n, void *arg)
 
 	parse_rtattr(tb, FRA_MAX, RTM_RTA(frh), len);
 
+	host_len = af_bit_len(frh->family);
+	if (!filter_nlmsg(n, tb, host_len))
+		return 0;
+
 	if (tb[FRA_PROTOCOL]) {
 		__u8 protocol = rta_getattr_u8(tb[FRA_PROTOCOL]);
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH iproute2] ip rule: Require at least one argument for add
From: David Ahern @ 2018-10-30 20:59 UTC (permalink / raw)
  To: netdev; +Cc: stephen, David Ahern

From: David Ahern <dsahern@gmail.com>

'ip rule add' with no additional arguments just adds another rule
for the main table - which exists by default. Require at least
1 argument similar to delete.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 ip/iprule.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/ip/iprule.c b/ip/iprule.c
index d89d808d8909..b465a80785b1 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -691,6 +691,11 @@ static int iprule_modify(int cmd, int argc, char **argv)
 	};
 
 	if (cmd == RTM_NEWRULE) {
+		if (argc == 0) {
+			fprintf(stderr,
+				"\"ip rule add\" requires arguments.\n");
+			return -1;
+		}
 		req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
 		req.frh.action = FR_ACT_TO_TBL;
 	}
-- 
2.11.0

^ permalink raw reply related

* Re: WARNING in rds_message_alloc_sgs
From: Leon Romanovsky @ 2018-10-31  6:42 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: syzbot, linux-rdma, netdev, rds-devel, syzkaller-bugs, davem,
	linux-kernel
In-Reply-To: <a7a834e0-c300-b5b9-67b5-78fa3eba8ebb@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

On Tue, Oct 30, 2018 at 12:38:02PM -0700, Santosh Shilimkar wrote:
> On 10/30/2018 12:28 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    6201f31a39f8 Add linux-next specific files for 20181030
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1397d06d400000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=2a22859d870756c1
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=26de17458aeda9d305d8
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10bb52eb400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=118bdfc5400000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
> >
> > WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316
> > rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
> > Kernel panic - not syncing: panic_on_warn set ...
> Looks like this kernel build has panic on warn enabled which
> triggers panic for " WARN_ON(!nr_pages)" case. Will look into
> it. Thanks !!

Please don't forget to remove user triggered WARN_ON.
https://lwn.net/Articles/769365/
"Greg Kroah-Hartman raised the problem of core kernel API code that will
use WARN_ON_ONCE() to complain about bad usage; that will not generate
the desired result if WARN_ON_ONCE() is configured to crash the machine.
He was told that the code should just call pr_warn() instead, and that
the called function should return an error in such situations. It was
generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
triggered from user space need to be fixed."

Thanks

>
> Regards,
> Santosh

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH iproute2-next] ip rule: Add ipproto and port range to filter list
From: David Ahern @ 2018-10-30 22:03 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

From: David Ahern <dsahern@gmail.com>

Allow ip rule dumps and flushes to filter based on ipproto, sport
and dport. Example:

$ ip ru ls ipproto udp
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001
$ ip ru ls dport 53
99:	from all to 8.8.8.8 ipproto udp dport 53 lookup 1001

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 ip/iprule.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/ip/iprule.c b/ip/iprule.c
index a85a43904e6e..0713694d4a49 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -78,6 +78,9 @@ static struct
 	inet_prefix dst;
 	int protocol;
 	int protocolmask;
+	struct fib_rule_port_range sport;
+	struct fib_rule_port_range dport;
+	__u8 ipproto;
 } filter;
 
 static inline int frh_get_table(struct fib_rule_hdr *frh, struct rtattr **tb)
@@ -174,6 +177,39 @@ static bool filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 			return false;
 	}
 
+	if (filter.ipproto) {
+		__u8 ipproto = 0;
+
+		if (tb[FRA_IP_PROTO])
+			ipproto = rta_getattr_u8(tb[FRA_IP_PROTO]);
+		if (filter.ipproto != ipproto)
+			return false;
+	}
+
+	if (filter.sport.start) {
+		const struct fib_rule_port_range *r;
+
+		if (!tb[FRA_SPORT_RANGE])
+			return false;
+
+		r = RTA_DATA(tb[FRA_SPORT_RANGE]);
+		if (r->start != filter.sport.start ||
+		    r->end != filter.sport.end)
+			return false;
+	}
+
+	if (filter.dport.start) {
+		const struct fib_rule_port_range *r;
+
+		if (!tb[FRA_DPORT_RANGE])
+			return false;
+
+		r = RTA_DATA(tb[FRA_DPORT_RANGE]);
+		if (r->start != filter.dport.start ||
+		    r->end != filter.dport.end)
+			return false;
+	}
+
 	table = frh_get_table(frh, tb);
 	if (filter.tb > 0 && filter.tb ^ table)
 		return false;
@@ -607,6 +643,36 @@ static int iprule_list_flush_or_save(int argc, char **argv, int action)
 				filter.protocolmask = 0;
 			}
 			filter.protocol = prot;
+		} else if (strcmp(*argv, "ipproto") == 0) {
+			int ipproto;
+
+			NEXT_ARG();
+			ipproto = inet_proto_a2n(*argv);
+			if (ipproto < 0)
+				invarg("Invalid \"ipproto\" value\n", *argv);
+			filter.ipproto = ipproto;
+		} else if (strcmp(*argv, "sport") == 0) {
+			struct fib_rule_port_range r;
+			int ret = 0;
+
+			NEXT_ARG();
+			ret = sscanf(*argv, "%hu-%hu", &r.start, &r.end);
+			if (ret == 1)
+				r.end = r.start;
+			else if (ret != 2)
+				invarg("invalid port range\n", *argv);
+			filter.sport = r;
+		} else if (strcmp(*argv, "dport") == 0) {
+			struct fib_rule_port_range r;
+			int ret = 0;
+
+			NEXT_ARG();
+			ret = sscanf(*argv, "%hu-%hu", &r.start, &r.end);
+			if (ret == 1)
+				r.end = r.start;
+			else if (ret != 2)
+				invarg("invalid dport range\n", *argv);
+			filter.dport = r;
 		} else{
 			if (matches(*argv, "dst") == 0 ||
 			    matches(*argv, "to") == 0) {
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net v5] net/ipv6: Add anycast addresses to a global hashtable
From: David Ahern @ 2018-10-30 22:06 UTC (permalink / raw)
  To: David Miller, 0xeffeff; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <20181030.113147.173837020180526004.davem@davemloft.net>

On 10/30/18 12:31 PM, David Miller wrote:
> From: Jeff Barnhill <0xeffeff@gmail.com>
> Date: Tue, 30 Oct 2018 07:10:58 -0400
> 
>> I originally started implementing it the way you suggested; however,
>> it seemed to complicate management of that structure because it isn't
>> currently using rcu.  Also, assuming that can be worked out, where
>> would I get the net from?  Would I need to store a copy in ifcaddr6,
>> or is there some way to access it during ipv6_chk_acast_addr()?  It
>> seems that if I don't add a copy of net, but instead access it through
>> aca_rt(?), then freeing the ifcaddr6 memory becomes problematic
>> (detaching it from idev, while read_rcu may still be accessing it).
>> On Mon, Oct 29, 2018 at 11:32 PM David Miller <davem@davemloft.net> wrote:
> 
> I don't think converting the structure over to RCU, especially because
> all of the read paths (everything leading to ipv6_chk_acast_dev()) are
> taking RCU locks already.
> 
> And I cannot understand how having _two_ structures to manage a piece
> of information can be less complicated than just one.
> 
> You can add a backpointer to the 'idev' in ifacaddr6 to get at the
> network namespace.  You don't even need to do additional reference
> counting because the idev->ac_list is always purged before an idev
> is destroyed.
> 

or make the table per namespace.

^ permalink raw reply

* Re: [PATCH bpf] tools/bpf: add unlimited rlimit for flow_dissector_load
From: Daniel Borkmann @ 2018-10-30 22:30 UTC (permalink / raw)
  To: Yonghong Song, ast, netdev; +Cc: kernel-team
In-Reply-To: <20181029215648.3043831-1-yhs@fb.com>

On 10/29/2018 10:56 PM, Yonghong Song wrote:
> On our test machine, bpf selftest test_flow_dissector.sh failed
> with the following error:
>   # ./test_flow_dissector.sh
>   bpffs not mounted. Mounting...
>   libbpf: failed to create map (name: 'jmp_table'): Operation not permitted
>   libbpf: failed to load object 'bpf_flow.o'
>   ./flow_dissector_load: bpf_prog_load bpf_flow.o
>   selftests: test_flow_dissector [FAILED]
> 
> Let us increase the rlimit to remove the above map
> creation failure.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>

Applied to bpf, thanks!

^ permalink raw reply

* Re: [PATCH] bpf: tcp_bpf_recvmsg should return EAGAIN when nonblocking and no data
From: Daniel Borkmann @ 2018-10-30 22:30 UTC (permalink / raw)
  To: John Fastabend, ast; +Cc: netdev
In-Reply-To: <1540841488-22023-1-git-send-email-john.fastabend@gmail.com>

On 10/29/2018 08:31 PM, John Fastabend wrote:
> We return 0 in the case of a nonblocking socket that has no data
> available. However, this is incorrect and may confuse applications.
> After this patch we do the correct thing and return the error
> EAGAIN.
> 
> Quoting return codes from recvmsg manpage,
> 
> EAGAIN or EWOULDBLOCK
>  The socket is marked nonblocking and the receive operation would
>  block, or a receive timeout had been set and the timeout expired
>  before data was received.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Applied to bpf, thanks!

^ permalink raw reply

* [RFC PATCH] net: vlan ipsec-xfrm offload
From: Shannon Nelson @ 2018-10-30 22:44 UTC (permalink / raw)
  To: netdev; +Cc: shannon.lee.nelson

This is an RFC post - not working code.  I welcome your comments.

If the real device offers IPsec offload, why shouldn't the VLAN device
also offer the offload?  This essentially becomes a passthru interface
to the real device by swapping out the xfrm_state's netdevice reference
before calling to the real_dev.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
 net/8021q/vlan_dev.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index ff720f1..a46e056 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -32,6 +32,7 @@
 #include <linux/phy.h>
 #include <net/arp.h>
 #include <net/switchdev.h>
+#include <net/xfrm.h>
 
 #include "vlan.h"
 #include "vlanproc.h"
@@ -546,6 +547,7 @@ static struct device_type vlan_type = {
 };
 
 static const struct net_device_ops vlan_netdev_ops;
+static const struct xfrmdev_ops vlan_xfrmdev_ops;
 
 static int vlan_dev_init(struct net_device *dev)
 {
@@ -553,6 +555,7 @@ static int vlan_dev_init(struct net_device *dev)
 
 	netif_carrier_off(dev);
 
+	/* copy netdev features into list of user selectable features */
 	/* IFF_BROADCAST|IFF_MULTICAST; ??? */
 	dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI |
 					  IFF_MASTER | IFF_SLAVE);
@@ -564,6 +567,12 @@ static int vlan_dev_init(struct net_device *dev)
 			   NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
 			   NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
 			   NETIF_F_ALL_FCOE;
+#ifdef CONFIG_XFRM_OFFLOAD
+	dev->hw_features |= real_dev->hw_features & (NETIF_F_HW_ESP |
+						     NETIF_F_HW_ESP_TX_CSUM |
+						     NETIF_F_GSO_ESP);
+	dev->xfrmdev_ops = &vlan_xfrmdev_ops;
+#endif
 
 	dev->features |= dev->hw_features | NETIF_F_LLTX;
 	dev->gso_max_size = real_dev->gso_max_size;
@@ -767,6 +776,95 @@ static int vlan_dev_get_iflink(const struct net_device *dev)
 	return real_dev->ifindex;
 }
 
+static int vlan_xdo_state_add(struct xfrm_state *xs)
+{
+	struct net_device *dev = xs->xso.dev;
+	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+	int ret;
+
+	/* swap out the vlan dev and put a hold on the real device */
+	xs->xso.dev = real_dev;
+	dev_hold(real_dev);
+
+	ret = real_dev->xfrmdev_ops->xdo_dev_state_add(xs);
+
+	/* if it wasn't successful, release the real_dev */
+	if (ret)
+		dev_put(real_dev);
+
+	/* put dev back before we leave */
+	xs->xso.dev = dev;
+
+	return ret;
+}
+
+static void vlan_xdo_state_delete(struct xfrm_state *xs)
+{
+	struct net_device *dev = xs->xso.dev;
+	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+
+	xs->xso.dev = real_dev;
+	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
+	xs->xso.dev = dev;
+}
+
+static void vlan_xdo_state_free(struct xfrm_state *xs)
+{
+	struct net_device *dev = xs->xso.dev;
+	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+
+	/* check for optional interface */
+	if (real_dev->xfrmdev_ops->xdo_dev_state_free) {
+		xs->xso.dev = real_dev;
+		real_dev->xfrmdev_ops->xdo_dev_state_free(xs);
+		xs->xso.dev = dev;
+	}
+
+	/* let go of the real device, we're done with it */
+	dev_put(real_dev);
+}
+
+static bool vlan_xdo_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
+{
+	struct net_device *dev = xs->xso.dev;
+	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+	bool ret = true;
+
+	/* check for optional interface */
+	if (likely(real_dev->xfrmdev_ops->xdo_dev_offload_ok)) {
+		xs->xso.dev = real_dev;
+		ret = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+		xs->xso.dev = dev;
+	}
+
+	if (!ret)
+		netdev_err(dev, "%s: real_dev %s NAKd the offload\n",
+			   __func__, real_dev->name);
+
+	return ret;
+}
+
+static void vlan_xdo_advance_esn(struct xfrm_state *xs)
+{
+	struct net_device *dev = xs->xso.dev;
+	struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+
+	/* check for optional interface */
+	if (real_dev->xfrmdev_ops->xdo_dev_state_advance_esn) {
+		xs->xso.dev = real_dev;
+		real_dev->xfrmdev_ops->xdo_dev_state_advance_esn(xs);
+		xs->xso.dev = dev;
+	}
+}
+
+static const struct xfrmdev_ops vlan_xfrmdev_ops = {
+	.xdo_dev_state_add = vlan_xdo_state_add,
+	.xdo_dev_state_delete = vlan_xdo_state_delete,
+	.xdo_dev_state_free = vlan_xdo_state_free,
+	.xdo_dev_offload_ok = vlan_xdo_offload_ok,
+	.xdo_dev_state_advance_esn = vlan_xdo_advance_esn,
+};
+
 static const struct ethtool_ops vlan_ethtool_ops = {
 	.get_link_ksettings	= vlan_ethtool_get_link_ksettings,
 	.get_drvinfo	        = vlan_ethtool_get_drvinfo,
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net v5] net/ipv6: Add anycast addresses to a global hashtable
From: David Miller @ 2018-10-30 23:19 UTC (permalink / raw)
  To: dsahern; +Cc: 0xeffeff, netdev, kuznet, yoshfuji
In-Reply-To: <75a6a4bb-a8f3-4996-a157-825a5834ae52@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Tue, 30 Oct 2018 16:06:46 -0600

> or make the table per namespace.

This will increase namespace create/destroy cost, so I'd rather not
for something like this.

^ permalink raw reply

* [PATCH net] net: dsa: microchip: initialize mutex before use
From: Tristram.Ha @ 2018-10-30 23:45 UTC (permalink / raw)
  To: David S. Miller
  Cc: Tristram Ha, Andrew Lunn, Florian Fainelli, Pavel Machek,
	UNGLinuxDriver, netdev

From: Tristram Ha <Tristram.Ha@microchip.com>

Initialize mutex before use.  Avoid kernel complaint when
CONFIG_DEBUG_LOCK_ALLOC is enabled.

Fixes: b987e98e50ab90e5 ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
---
 drivers/net/dsa/microchip/ksz_common.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index 54e0ca6..f6f0662 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -1117,11 +1117,6 @@ static int ksz_switch_init(struct ksz_device *dev)
 {
 	int i;
 
-	mutex_init(&dev->reg_mutex);
-	mutex_init(&dev->stats_mutex);
-	mutex_init(&dev->alu_mutex);
-	mutex_init(&dev->vlan_mutex);
-
 	dev->ds->ops = &ksz_switch_ops;
 
 	for (i = 0; i < ARRAY_SIZE(ksz_switch_chips); i++) {
@@ -1206,6 +1201,12 @@ int ksz_switch_register(struct ksz_device *dev)
 	if (dev->pdata)
 		dev->chip_id = dev->pdata->chip_id;
 
+	/* mutex is used in next function call. */
+	mutex_init(&dev->reg_mutex);
+	mutex_init(&dev->stats_mutex);
+	mutex_init(&dev->alu_mutex);
+	mutex_init(&dev->vlan_mutex);
+
 	if (ksz_switch_detect(dev))
 		return -EINVAL;
 
-- 
1.9.1

^ permalink raw reply related

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Fabio Rossi @ 2018-10-31  0:25 UTC (permalink / raw)
  To: andre, Eric Dumazet; +Cc: Stephen Hemminger, Dimitris Michailidis, netdev
In-Reply-To: <4693819f-4a76-532f-9b24-d4328183c807@gmail.com>

> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
> > On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
> >>
> >> On 15.10.2018 17:41, Eric Dumazet wrote:
> >>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> >>>> Something is changed between 4.17.12 and 4.18, after bisecting the problem I
> >>>> got the following first bad commit:
> >>>>
> >>>> commit 88078d98d1bb085d72af8437707279e203524fa5
> >>>> Author: Eric Dumazet <edumazet@google.com>
> >>>> Date:   Wed Apr 18 11:43:15 2018 -0700
> >>>>
> >>>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >>>>
> >>>>      After working on IP defragmentation lately, I found that some large
> >>>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >>>>      zero paddings on the last (small) fragment.
> >>>>
> >>>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >>>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >>>>      fragments had CHECKSUM_COMPLETE set.
> >>>>
> >>>>      We can instead compute the checksum of the part we are trimming,
> >>>>      usually smaller than the part we keep.
> >>>>
> >>>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
> >>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
> >>>>
> >>>
> >>> Thanks for bisecting !
> >>>
> >>> This commit is known to expose some NIC/driver bugs.
> >>>
> >>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> >>> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> >>>
> >>> I assume SKY2_HW_NEW_LE is not set on your NIC ?
> >>>
> >>
> >> I've seen similar on several systems with mlx4 cards when using 4.18.x -
> >> that is hw csum failure followed by some backtrace.
> >>
> >> Only seems to happen on systems dealing with quite a bit of UDP.
> >>
> > 
> > Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
> > but CHECKSUM_UNNECESSARY
> > 
> > I would be nice to track this a bit further, maybe by providing the
> > full packet content.
> > 
> >> Example from 4.18.10:
> >>> [635607.740574] p0xe0: hw csum failure
> >>> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> >>> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> >>> [635607.740599] Call Trace:
> >>> [635607.740602]  <IRQ>
> >>> [635607.740611]  dump_stack+0x5c/0x7b
> >>> [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
> >>> [635607.740621]  udp6_gro_receive+0x211/0x290
> >>> [635607.740624]  ipv6_gro_receive+0x1a8/0x390
> >>> [635607.740627]  dev_gro_receive+0x33e/0x550
> >>> [635607.740628]  napi_gro_frags+0xa2/0x210
> >>> [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> >>> [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> >>> [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> >>> [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> >>> [635607.740658]  net_rx_action+0xe0/0x2e0
> >>> [635607.740662]  __do_softirq+0xd8/0x2e5
> >>> [635607.740666]  irq_exit+0xb4/0xc0
> >>> [635607.740667]  do_IRQ+0x85/0xd0
> >>> [635607.740670]  common_interrupt+0xf/0xf
> >>> [635607.740671]  </IRQ>
> >>> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> >>> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> >>> [635607.740701] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> >>> [635607.740703] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> >>> [635607.740703] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> >>> [635607.740704] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> >>> [635607.740705] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> >>> [635607.740706] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> >>> [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
> >>> [635607.740712]  do_idle+0x1d0/0x240
> >>> [635607.740715]  cpu_startup_entry+0x5f/0x70
> >>> [635607.740719]  start_secondary+0x185/0x1a0
> >>> [635607.740722]  secondary_startup_64+0xa5/0xb0
> >>> [635607.740731] p0xe0: hw csum failure
> >>> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> >>> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> >>> [635607.740746] Call Trace:
> >>> [635607.740747]  <IRQ>
> >>> [635607.740750]  dump_stack+0x5c/0x7b
> >>> [635607.740755]  __skb_checksum_complete+0xb8/0xd0
> >>> [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
> >>> [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> >>> [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> >>> [635607.740774]  ip6_input_finish+0xc0/0x460
> >>> [635607.740776]  ip6_input+0x2b/0x90
> >>> [635607.740778]  ? ip6_rcv_finish+0x110/0x110
> >>> [635607.740780]  ipv6_rcv+0x2cd/0x4b0
> >>> [635607.740783]  ? udp6_lib_lookup_skb+0x59/0x80
> >>> [635607.740785]  __netif_receive_skb_core+0x455/0xb30
> >>> [635607.740788]  ? ipv6_gro_receive+0x1a8/0x390
> >>> [635607.740790]  ? netif_receive_skb_internal+0x24/0xb0
> >>> [635607.740792]  netif_receive_skb_internal+0x24/0xb0
> >>> [635607.740793]  napi_gro_frags+0x165/0x210
> >>> [635607.740796]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> >>> [635607.740802]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> >>> [635607.740807]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> >>> [635607.740810]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> >>> [635607.740811]  net_rx_action+0xe0/0x2e0
> >>> [635607.740813]  __do_softirq+0xd8/0x2e5
> >>> [635607.740816]  irq_exit+0xb4/0xc0
> >>> [635607.740817]  do_IRQ+0x85/0xd0
> >>> [635607.740820]  common_interrupt+0xf/0xf
> >>> [635607.740821]  </IRQ>
> >>> [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> >>> [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> >>> [635607.740848] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> >>> [635607.740849] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> >>> [635607.740850] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> >>> [635607.740851] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> >>> [635607.740852] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> >>> [635607.740853] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> >>> [635607.740855]  ? cpuidle_enter_state+0x91/0x2a0
> >>> [635607.740857]  do_idle+0x1d0/0x240
> >>> [635607.740859]  cpu_startup_entry+0x5f/0x70
> >>> [635607.740861]  start_secondary+0x185/0x1a0
> >>> [635607.740863]  secondary_startup_64+0xa5/0xb0
> 
> As a matter of fact Dimitris found the issue in the patch and is working on a fix involving csum_block_sub()
> 
> Problems comes from trimming an odd number of bytes.

I have applied the commit db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 on top of kernel 4.19.0 and the problem seems gone. The same commit applies to kernel 4.18.x but doesn't work. Thanks!

Fabio

^ permalink raw reply

* (unknown), 
From: Ubaithullah Masood @ 2018-10-31  0:38 UTC (permalink / raw)


This is Mr Ubaithullah Masood from Banco Santander Bank S A Hong Kong.
I got your contact during my private search on net..Would you be
interested in a business transaction to act as the beneficiary to
claim 9.8M USD funds of my deceased client who died intestate (
Without a Will)and my bank wants to confiscate the funds if the funds
are not claimed soon. Do get back for more details as this deal is
safe and all documentation will be done legally and we will share 50%
each.
Thanks.

^ permalink raw reply

* Re: [PATCH net] net: dsa: microchip: initialize mutex before use
From: Andrew Lunn @ 2018-10-31  1:21 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: David S. Miller, Florian Fainelli, Pavel Machek, UNGLinuxDriver,
	netdev
In-Reply-To: <1540943149-26832-1-git-send-email-Tristram.Ha@microchip.com>

On Tue, Oct 30, 2018 at 04:45:49PM -0700, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Initialize mutex before use.  Avoid kernel complaint when
> CONFIG_DEBUG_LOCK_ALLOC is enabled.
> 
> Fixes: b987e98e50ab90e5 ("dsa: add DSA switch driver for Microchip KSZ9477")
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
> ---
>  drivers/net/dsa/microchip/ksz_common.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
> index 54e0ca6..f6f0662 100644
> --- a/drivers/net/dsa/microchip/ksz_common.c
> +++ b/drivers/net/dsa/microchip/ksz_common.c
> @@ -1117,11 +1117,6 @@ static int ksz_switch_init(struct ksz_device *dev)
>  {
>  	int i;
>  
> -	mutex_init(&dev->reg_mutex);
> -	mutex_init(&dev->stats_mutex);
> -	mutex_init(&dev->alu_mutex);
> -	mutex_init(&dev->vlan_mutex);
> -
>  	dev->ds->ops = &ksz_switch_ops;
>  
>  	for (i = 0; i < ARRAY_SIZE(ksz_switch_chips); i++) {
> @@ -1206,6 +1201,12 @@ int ksz_switch_register(struct ksz_device *dev)
>  	if (dev->pdata)
>  		dev->chip_id = dev->pdata->chip_id;
>  
> +	/* mutex is used in next function call. */
> +	mutex_init(&dev->reg_mutex);
> +	mutex_init(&dev->stats_mutex);
> +	mutex_init(&dev->alu_mutex);
> +	mutex_init(&dev->vlan_mutex);

Hi Tristram

The comment is no longer relevant now that all mutexes are
initialised.

	Andrew

^ permalink raw reply

* UBSAN: Undefined behaviour in drivers/net/ppp/ppp_generic.c
From: Kyungtae Kim @ 2018-10-31 10:46 UTC (permalink / raw)
  To: paulus, davem
  Cc: Byoungyoung Lee, DaeRyong Jeong, syzkaller, netdev, linux-ppp,
	linux-kernel

We report a crash in v4.19-rc2 (and the latest kernel as well):

kernel config: https://kt0755.github.io/etc/config_v2-4.19
repro: https://kt0755.github.io/etc/repro.1e3e9.c

unit_set() lacks the bounds checking for an integer variable n,
which causes integer overflow when it is equal to INT_MAX.

=======================================================
UBSAN: Undefined behaviour in drivers/net/ppp/ppp_generic.c:3246:9
signed integer overflow:
2147483647 + 1 cannot be represented in type 'int'
CPU: 0 PID: 7676 Comm: syz-executor4 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xd2/0x148 lib/dump_stack.c:113
 ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
 handle_overflow+0x1cf/0x21a lib/ubsan.c:190
 __ubsan_handle_add_overflow+0x2a/0x31 lib/ubsan.c:198
 unit_set drivers/net/ppp/ppp_generic.c:3246 [inline]
 ppp_unit_register drivers/net/ppp/ppp_generic.c:979 [inline]
 ppp_dev_configure+0xbd8/0xd50 drivers/net/ppp/ppp_generic.c:1049
 ppp_create_interface drivers/net/ppp/ppp_generic.c:3013 [inline]
 ppp_unattached_ioctl drivers/net/ppp/ppp_generic.c:848 [inline]
 ppp_ioctl+0x1652/0x27f0 drivers/net/ppp/ppp_generic.c:601
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
 ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
 __do_sys_ioctl fs/ioctl.c:709 [inline]
 __se_sys_ioctl fs/ioctl.c:707 [inline]
 __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
 do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f67b6f92c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f67b6f936cc RCX: 00000000004497b9
RDX: 0000000020000080 RSI: 00000000c004743e RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000005598 R14: 00000000006ed638 R15: 00007f67b6f93700
======================================================

A simple patch below is provided.
Note that in this control flow, negative n is already
filtered out (drivers/net/ppp/ppp_generic.c:965).

--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3243,6 +3243,9 @@ static int unit_set(struct idr *p, void *ptr, int n)
 {
        int unit;

+  if (n == INT_MAX)
+               return -EINVAL;
+
        unit = idr_alloc(p, ptr, n, n + 1, GFP_KERNEL);
        if (unit == -ENOSPC)
                unit = -EINVAL;

Thanks,
Kyungtae Kim

^ permalink raw reply

* Re: [RFC PATCH 1/4] ptp: add PTP_SYS_OFFSET_EXTENDED ioctl
From: Richard Cochran @ 2018-10-31  2:23 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, intel-wired-lan, Jacob Keller
In-Reply-To: <20181026162742.631-2-mlichvar@redhat.com>

On Fri, Oct 26, 2018 at 06:27:39PM +0200, Miroslav Lichvar wrote:
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 2012551d93e0..1a04c437fd4f 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -124,11 +124,13 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
>  	struct ptp_clock_caps caps;
>  	struct ptp_clock_request req;
>  	struct ptp_sys_offset *sysoff = NULL;
> +	struct ptp_sys_offset_extended *sysoff_extended = NULL;

How about a more succinct name, like 'extoff' ?

>  	struct ptp_sys_offset_precise precise_offset;
>  	struct ptp_pin_desc pd;
>  	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
>  	struct ptp_clock_info *ops = ptp->info;
>  	struct ptp_clock_time *pct;
> +	struct ptp_system_timestamp sts;
>  	struct timespec64 ts;
>  	struct system_device_crosststamp xtstamp;
>  	int enable, err = 0;

This collection of automatic variables is getting ugly.  May I ask you
to prefix a patch that puts them into reverse Christmas tree before
your changes?  (Patch below)

> @@ -211,6 +213,43 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
>  			err = -EFAULT;
>  		break;
>  
> +	case PTP_SYS_OFFSET_EXTENDED:
> +		if (!ptp->info->gettimex64) {
> +			err = -EOPNOTSUPP;
> +			break;
> +		}
> +		sysoff_extended = memdup_user((void __user *)arg,
> +					      sizeof(*sysoff_extended));

As pointed out before, this needs to be freed, and

> +		if (IS_ERR(sysoff_extended)) {
> +			err = PTR_ERR(sysoff_extended);
> +			sysoff = NULL;

here you meant, sysoff_extended = NULL;

> +			break;
> +		}
> +		if (sysoff_extended->n_samples > PTP_MAX_SAMPLES) {
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		pct = &sysoff_extended->ts[0];
> +		for (i = 0; i < sysoff_extended->n_samples; i++) {
> +			err = ptp->info->gettimex64(ptp->info, &sts);
> +			if (err)
> +				break;
> +			pct->sec = sts.sys_ts1.tv_sec;
> +			pct->nsec = sts.sys_ts1.tv_nsec;
> +			pct++;
> +			pct->sec = sts.phc_ts.tv_sec;
> +			pct->nsec = sts.phc_ts.tv_nsec;
> +			pct++;
> +			pct->sec = sts.sys_ts2.tv_sec;
> +			pct->nsec = sts.sys_ts2.tv_nsec;
> +			pct++;
> +		}
> +		if (copy_to_user((void __user *)arg, sysoff_extended,
> +				 sizeof(*sysoff_extended)))
> +			err = -EFAULT;
> +		break;
> +
>  	case PTP_SYS_OFFSET:
>  		sysoff = memdup_user((void __user *)arg, sizeof(*sysoff));
>  		if (IS_ERR(sysoff)) {
> diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
> index 51349d124ee5..79321d929925 100644
> --- a/include/linux/ptp_clock_kernel.h
> +++ b/include/linux/ptp_clock_kernel.h
> @@ -39,6 +39,13 @@ struct ptp_clock_request {
>  };
>  
>  struct system_device_crosststamp;
> +

KernelDoc please for this:

> +struct ptp_system_timestamp {
> +	struct timespec64 sys_ts1;
> +	struct timespec64 phc_ts;
> +	struct timespec64 sys_ts2;
> +};
> +

Thanks,
Richard

^ permalink raw reply

* Re: [RFC PATCH 0/4] More accurate PHC<->system clock synchronization
From: Richard Cochran @ 2018-10-31  2:25 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, intel-wired-lan, Jacob Keller
In-Reply-To: <20181026162742.631-1-mlichvar@redhat.com>

On Fri, Oct 26, 2018 at 06:27:38PM +0200, Miroslav Lichvar wrote:
> The other patches add support for the new ioctl to the e1000e, igb,
> and ixgbe driver. Tests with few different NICs in different machines
> (and PCIe slots) show that:
> - with an I219 (e1000e) the measured delay improved from 2500 to 1300 ns
>   and the error in the measured offset, when compared to cross
>   timestamping, was reduced by a factor of 5
> - with an I210 (igb) the delay improved from 5100 to 1700 ns
> - with an I350 (igb) the delay improved from 2300 to 750 ns
> - with an X550 (ixgbe) the delay improved from 1950 to 650 ns

Nice work!  This is a welcome improvement.

Thanks,
Richard

^ permalink raw reply

* Re: [RFC PATCH 3/4] igb: add support for extended PHC gettime
From: Richard Cochran @ 2018-10-31  2:29 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, intel-wired-lan, Jacob Keller
In-Reply-To: <20181026162742.631-4-mlichvar@redhat.com>

On Fri, Oct 26, 2018 at 06:27:41PM +0200, Miroslav Lichvar wrote:
> +static int igb_ptp_gettimex(struct ptp_clock_info *ptp,
> +			    struct ptp_system_timestamp *sts)
> +{
> +	struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
> +					       ptp_caps);
> +	struct e1000_hw *hw = &igb->hw;
> +	unsigned long flags;
> +	u32 lo, hi;
> +	u64 ns;
> +
> +	spin_lock_irqsave(&igb->tmreg_lock, flags);
> +
> +	/* 82576 doesn't have SYSTIMR */
> +	if (igb->hw.mac.type == e1000_82576) {

Instead of if/then/else, can't you follow the pattern of providing
different function flavors ...

> +		ptp_read_system_prets(sts);
> +		lo = rd32(E1000_SYSTIML);
> +		ptp_read_system_postts(sts);
> +		hi = rd32(E1000_SYSTIMH);
> +	} else {
> +		ptp_read_system_prets(sts);
> +		rd32(E1000_SYSTIMR);
> +		ptp_read_system_postts(sts);
> +		lo = rd32(E1000_SYSTIML);
> +		hi = rd32(E1000_SYSTIMH);
> +	}
> +
> +	/* SYSTIM on I210/I211 counts time in seconds and nanoseconds */
> +	if (igb->hw.mac.type == e1000_i210 || igb->hw.mac.type == e1000_i211) {
> +		sts->phc_ts.tv_sec = hi;
> +		sts->phc_ts.tv_nsec = lo;
> +	} else {
> +		ns = timecounter_cyc2time(&igb->tc, ((u64)hi << 32) | lo);
> +		sts->phc_ts = ns_to_timespec64(ns);
> +	}
> +
> +	spin_unlock_irqrestore(&igb->tmreg_lock, flags);
> +
> +	return 0;
> +}
> +
>  static int igb_ptp_settime_82576(struct ptp_clock_info *ptp,
>  				 const struct timespec64 *ts)
>  {
> @@ -1125,6 +1165,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
>  		adapter->ptp_caps.adjfreq = igb_ptp_adjfreq_82576;
>  		adapter->ptp_caps.adjtime = igb_ptp_adjtime_82576;
>  		adapter->ptp_caps.gettime64 = igb_ptp_gettime_82576;
> +		adapter->ptp_caps.gettimex64 = igb_ptp_gettimex;

... here?

Thanks,
Richard

>  		adapter->ptp_caps.settime64 = igb_ptp_settime_82576;
>  		adapter->ptp_caps.enable = igb_ptp_feature_enable;
>  		adapter->cc.read = igb_ptp_read_82576;
> @@ -1144,6 +1185,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
>  		adapter->ptp_caps.adjfine = igb_ptp_adjfine_82580;
>  		adapter->ptp_caps.adjtime = igb_ptp_adjtime_82576;
>  		adapter->ptp_caps.gettime64 = igb_ptp_gettime_82576;
> +		adapter->ptp_caps.gettimex64 = igb_ptp_gettimex;
>  		adapter->ptp_caps.settime64 = igb_ptp_settime_82576;
>  		adapter->ptp_caps.enable = igb_ptp_feature_enable;
>  		adapter->cc.read = igb_ptp_read_82580;
> @@ -1172,6 +1214,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
>  		adapter->ptp_caps.adjfine = igb_ptp_adjfine_82580;
>  		adapter->ptp_caps.adjtime = igb_ptp_adjtime_i210;
>  		adapter->ptp_caps.gettime64 = igb_ptp_gettime_i210;
> +		adapter->ptp_caps.gettimex64 = igb_ptp_gettimex;
>  		adapter->ptp_caps.settime64 = igb_ptp_settime_i210;
>  		adapter->ptp_caps.enable = igb_ptp_feature_enable_i210;
>  		adapter->ptp_caps.verify = igb_ptp_verify_pin;

^ permalink raw reply

* Re: [PATCH v2 net] igb: shorten maximum PHC timecounter update interval
From: Richard Cochran @ 2018-10-31  2:30 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, intel-wired-lan, Jacob Keller, Thomas Gleixner
In-Reply-To: <20181026171300.12720-1-mlichvar@redhat.com>

On Fri, Oct 26, 2018 at 07:13:00PM +0200, Miroslav Lichvar wrote:
> The timecounter needs to be updated at least once per ~550 seconds in
> order to avoid a 40-bit SYSTIM timestamp to be misinterpreted as an old
> timestamp.
> 
> Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"),
> scheduling of delayed work seems to be less accurate and a requested
> delay of 540 seconds may actually be longer than 550 seconds. Also, the
> PHC may be adjusted to run up to 6% faster than real time and the system
> clock up to 10% slower. Shorten the delay to 360 seconds to be sure the
> timecounter is updated in time.
> 
> This fixes an issue with HW timestamps on 82580/I350/I354 being off by
> ~1100 seconds for few seconds every ~9 minutes.

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* [PATCH] net: move ‘__zerocopy_sg_from_iter’ prototype to header file <linux/skbuff.h>
From: Mathieu Malaterre @ 2018-10-31 11:34 UTC (permalink / raw)
  To: davem; +Cc: Mathieu Malaterre, linux-kernel, netdev

This makes it clear the function is part of the API. Also this will
remove a warning triggered at W=1:

  net/core/datagram.c:581:5: warning: no previous prototype for ‘__zerocopy_sg_from_iter’ [-Wmissing-prototypes]

Signed-off-by: Mathieu Malaterre <malat@debian.org>
---
 include/linux/skbuff.h | 2 ++
 net/core/skbuff.c      | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0ba687454267..cca7c0a3c176 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3311,6 +3311,8 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen,
 				   struct msghdr *msg);
 int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 				 struct iov_iter *from, int len);
+int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
+			    struct iov_iter *from, size_t length);
 int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *frm);
 void skb_free_datagram(struct sock *sk, struct sk_buff *skb);
 void __skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb, int len);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 946de0e24c87..7eb7e0e104d4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1102,9 +1102,6 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg)
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort);
 
-extern int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
-				   struct iov_iter *from, size_t length);
-
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 			     struct msghdr *msg, int len,
 			     struct ubuf_info *uarg)
-- 
2.11.0

^ permalink raw reply related

* Re: [RFC PATCH 1/4] ptp: add PTP_SYS_OFFSET_EXTENDED ioctl
From: Richard Cochran @ 2018-10-31  3:01 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, intel-wired-lan, Jacob Keller
In-Reply-To: <20181031022351.24mmvf7u7bcx42rv@localhost>

On Tue, Oct 30, 2018 at 07:23:51PM -0700, Richard Cochran wrote:
> This collection of automatic variables is getting ugly.  May I ask you
> to prefix a patch that puts them into reverse Christmas tree before
> your changes?  (Patch below)

Forgot the diff. Here it is...

---
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 2012551d93e0..b54b8158ff8a 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -121,18 +121,18 @@ int ptp_open(struct posix_clock *pc, fmode_t fmode)
 
 long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 {
-	struct ptp_clock_caps caps;
-	struct ptp_clock_request req;
-	struct ptp_sys_offset *sysoff = NULL;
-	struct ptp_sys_offset_precise precise_offset;
-	struct ptp_pin_desc pd;
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+	struct ptp_sys_offset_precise precise_offset;
+	struct system_device_crosststamp xtstamp;
 	struct ptp_clock_info *ops = ptp->info;
+	struct ptp_sys_offset *sysoff = NULL;
+	struct ptp_clock_request req;
+	struct ptp_clock_caps caps;
 	struct ptp_clock_time *pct;
+	unsigned int i, pin_index;
+	struct ptp_pin_desc pd;
 	struct timespec64 ts;
-	struct system_device_crosststamp xtstamp;
 	int enable, err = 0;
-	unsigned int i, pin_index;
 
 	switch (cmd) {
 

^ permalink raw reply related

* [PATCH] net: document skb parameter in function 'skb_gso_size_check'
From: Mathieu Malaterre @ 2018-10-31 12:16 UTC (permalink / raw)
  To: davem; +Cc: Mathieu Malaterre, netdev, linux-kernel

Remove kernel-doc warning:

  net/core/skbuff.c:4953: warning: Function parameter or member 'skb' not described in 'skb_gso_size_check'

Signed-off-by: Mathieu Malaterre <malat@debian.org>
---
 net/core/skbuff.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7eb7e0e104d4..4e774f34ccc3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4941,6 +4941,8 @@ static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
  *
  * This is a helper to do that correctly considering GSO_BY_FRAGS.
  *
+ * @skb: GSO skb
+ *
  * @seg_len: The segmented length (from skb_gso_*_seglen). In the
  *           GSO_BY_FRAGS case this will be [header sizes + GSO_BY_FRAGS].
  *
-- 
2.11.0

^ permalink raw reply related

* [PATCH 2/4] net: macb: Support clock management for tsu_clk
From: Harini Katakam @ 2018-10-31  3:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, Harini Katakam
In-Reply-To: <1540957223-30984-1-git-send-email-harini.katakam@xilinx.com>

From: Harini Katakam <harinik@xilinx.com>

TSU clock needs to be enabled/disabled as per support in devicetree
and it should also be controlled during suspend/resume (WOL has no
dependency on this clock).

Signed-off-by: Harini Katakam <harinik@xilinx.com>
---
 drivers/net/ethernet/cadence/macb.h      |  3 ++-
 drivers/net/ethernet/cadence/macb_main.c | 30 +++++++++++++++++++++++++-----
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 3d45f4c..6f2b4b0 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1082,7 +1082,7 @@ struct macb_config {
 	unsigned int		dma_burst_length;
 	int	(*clk_init)(struct platform_device *pdev, struct clk **pclk,
 			    struct clk **hclk, struct clk **tx_clk,
-			    struct clk **rx_clk);
+			    struct clk **rx_clk, struct clk **tsu_clk);
 	int	(*init)(struct platform_device *pdev);
 	int	jumbo_max_len;
 };
@@ -1162,6 +1162,7 @@ struct macb {
 	struct clk		*hclk;
 	struct clk		*tx_clk;
 	struct clk		*rx_clk;
+	struct clk		*tsu_clk;
 	struct net_device	*dev;
 	union {
 		struct macb_stats	macb;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index b4e26c1..7ae8d731 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -3304,7 +3304,7 @@ static void macb_probe_queues(void __iomem *mem,
 
 static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 			 struct clk **hclk, struct clk **tx_clk,
-			 struct clk **rx_clk)
+			 struct clk **rx_clk, struct clk **tsu_clk)
 {
 	struct macb_platform_data *pdata;
 	int err;
@@ -3338,6 +3338,10 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 	if (IS_ERR(*rx_clk))
 		*rx_clk = NULL;
 
+	*tsu_clk = devm_clk_get(&pdev->dev, "tsu_clk");
+	if (IS_ERR(*tsu_clk))
+		*tsu_clk = NULL;
+
 	err = clk_prepare_enable(*pclk);
 	if (err) {
 		dev_err(&pdev->dev, "failed to enable pclk (%u)\n", err);
@@ -3362,8 +3366,17 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 		goto err_disable_txclk;
 	}
 
+	err = clk_prepare_enable(*tsu_clk);
+	if (err) {
+		dev_err(&pdev->dev, "failed to enable tsu_clk (%u)\n", err);
+		goto err_disable_rxclk;
+	}
+
 	return 0;
 
+err_disable_rxclk:
+	clk_disable_unprepare(*rx_clk);
+
 err_disable_txclk:
 	clk_disable_unprepare(*tx_clk);
 
@@ -3814,13 +3827,14 @@ static const struct net_device_ops at91ether_netdev_ops = {
 
 static int at91ether_clk_init(struct platform_device *pdev, struct clk **pclk,
 			      struct clk **hclk, struct clk **tx_clk,
-			      struct clk **rx_clk)
+			      struct clk **rx_clk, struct clk **tsu_clk)
 {
 	int err;
 
 	*hclk = NULL;
 	*tx_clk = NULL;
 	*rx_clk = NULL;
+	*tsu_clk = NULL;
 
 	*pclk = devm_clk_get(&pdev->dev, "ether_clk");
 	if (IS_ERR(*pclk))
@@ -3968,11 +3982,12 @@ static int macb_probe(struct platform_device *pdev)
 {
 	const struct macb_config *macb_config = &default_gem_config;
 	int (*clk_init)(struct platform_device *, struct clk **,
-			struct clk **, struct clk **,  struct clk **)
-					      = macb_config->clk_init;
+			struct clk **, struct clk **,  struct clk **,
+			struct clk **) = macb_config->clk_init;
 	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
+	struct clk *tsu_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
 	bool native_io;
@@ -4000,7 +4015,7 @@ static int macb_probe(struct platform_device *pdev)
 		}
 	}
 
-	err = clk_init(pdev, &pclk, &hclk, &tx_clk, &rx_clk);
+	err = clk_init(pdev, &pclk, &hclk, &tx_clk, &rx_clk, &tsu_clk);
 	if (err)
 		return err;
 
@@ -4037,6 +4052,7 @@ static int macb_probe(struct platform_device *pdev)
 	bp->hclk = hclk;
 	bp->tx_clk = tx_clk;
 	bp->rx_clk = rx_clk;
+	bp->tsu_clk = tsu_clk;
 	if (macb_config)
 		bp->jumbo_max_len = macb_config->jumbo_max_len;
 
@@ -4152,6 +4168,7 @@ static int macb_probe(struct platform_device *pdev)
 	clk_disable_unprepare(hclk);
 	clk_disable_unprepare(pclk);
 	clk_disable_unprepare(rx_clk);
+	clk_disable_unprepare(tsu_clk);
 
 	return err;
 }
@@ -4179,6 +4196,7 @@ static int macb_remove(struct platform_device *pdev)
 		clk_disable_unprepare(bp->hclk);
 		clk_disable_unprepare(bp->pclk);
 		clk_disable_unprepare(bp->rx_clk);
+		clk_disable_unprepare(bp->tsu_clk);
 		of_node_put(bp->phy_node);
 		free_netdev(dev);
 	}
@@ -4205,6 +4223,7 @@ static int __maybe_unused macb_suspend(struct device *dev)
 		clk_disable_unprepare(bp->pclk);
 		clk_disable_unprepare(bp->rx_clk);
 	}
+	clk_disable_unprepare(bp->tsu_clk);
 
 	return 0;
 }
@@ -4225,6 +4244,7 @@ static int __maybe_unused macb_resume(struct device *dev)
 		clk_prepare_enable(bp->tx_clk);
 		clk_prepare_enable(bp->rx_clk);
 	}
+	clk_prepare_enable(bp->tsu_clk);
 
 	netif_device_attach(netdev);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 4/4] net: macb: Add support for suspend/resume with full power down
From: Harini Katakam @ 2018-10-31  3:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, Kedareswara rao Appana
In-Reply-To: <1540957223-30984-1-git-send-email-harini.katakam@xilinx.com>

When macb device is suspended and system is powered down, the clocks
are removed and hence macb should be closed gracefully and restored
upon resume. This patch does the same by switching off the net device,
suspending phy and performing necessary cleanup of interrupts and BDs.
Upon resume, all these are reinitialized again.

Reset of macb device is done only when GEM is not a wake device.
Even when gem is a wake device, tx queues can be stopped and ptp device
can be closed (tsu clock will be disabled in pm_runtime_suspend) as
wake event detection has no dependency on this.

Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
---
Notes:
I was unable to do a full macb_close/open in this patch as suggested
because it was freeing and allocating the full RX/TX buffers and
this time consuming, also leading to a crash when done continuously
in stress tests. 

 drivers/net/ethernet/cadence/macb_main.c | 38 ++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 09cb4bb..0d1acb4 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4247,16 +4247,33 @@ static int __maybe_unused macb_suspend(struct device *dev)
 	struct platform_device *pdev = to_platform_device(dev);
 	struct net_device *netdev = platform_get_drvdata(pdev);
 	struct macb *bp = netdev_priv(netdev);
+	struct macb_queue *queue = bp->queues;
+	unsigned long flags;
+	unsigned int q;
+
+	if (!netif_running(netdev))
+		return 0;
 
-	netif_carrier_off(netdev);
-	netif_device_detach(netdev);
 
 	if (bp->wol & MACB_WOL_ENABLED) {
 		macb_writel(bp, IER, MACB_BIT(WOL));
 		macb_writel(bp, WOL, MACB_BIT(MAG));
 		enable_irq_wake(bp->queues[0].irq);
+		netif_device_detach(netdev);
+	} else {
+		netif_device_detach(netdev);
+		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
+			napi_disable(&queue->napi);
+		phy_stop(bp->phy_dev);
+		phy_suspend(bp->phy_dev);
+		spin_lock_irqsave(&bp->lock, flags);
+		macb_reset_hw(bp);
+		spin_unlock_irqrestore(&bp->lock, flags);
 	}
 
+	netif_carrier_off(netdev);
+	if (bp->ptp_info)
+		bp->ptp_info->ptp_remove(netdev);
 	pm_runtime_force_suspend(dev);
 
 	return 0;
@@ -4267,6 +4284,11 @@ static int __maybe_unused macb_resume(struct device *dev)
 	struct platform_device *pdev = to_platform_device(dev);
 	struct net_device *netdev = platform_get_drvdata(pdev);
 	struct macb *bp = netdev_priv(netdev);
+	struct macb_queue *queue = bp->queues;
+	unsigned int q;
+
+	if (!netif_running(netdev))
+		return 0;
 
 	pm_runtime_force_resume(dev);
 
@@ -4274,9 +4296,21 @@ static int __maybe_unused macb_resume(struct device *dev)
 		macb_writel(bp, IDR, MACB_BIT(WOL));
 		macb_writel(bp, WOL, 0);
 		disable_irq_wake(bp->queues[0].irq);
+	} else {
+		macb_writel(bp, NCR, MACB_BIT(MPE));
+		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
+			napi_enable(&queue->napi);
+		phy_resume(bp->phy_dev);
+		phy_init_hw(bp->phy_dev);
+		phy_start(bp->phy_dev);
 	}
 
+	bp->macbgem_ops.mog_init_rings(bp);
+	macb_init_hw(bp);
+	macb_set_rx_mode(netdev);
 	netif_device_attach(netdev);
+	if (bp->ptp_info)
+		bp->ptp_info->ptp_init(netdev);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox