Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next v2 1/3] net: phy: dp83640: use new PTP_MSGTYPE_SYNC define
From: Christian Eggers @ 2020-11-24  7:44 UTC (permalink / raw)
  To: Richard Cochran, Andrew Lunn, Heiner Kallweit, Jakub Kicinski
  Cc: Vladimir Oltean, Russell King, David S . Miller, netdev,
	linux-kernel, Christian Eggers, Kurt Kanzenbach
In-Reply-To: <20201124074418.2609-1-ceggers@arri.de>

Replace use of magic number with recently introduced define.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
---
 drivers/net/phy/dp83640.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index f2caccaf4408..9757ca0d9633 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -964,15 +964,12 @@ static void decode_status_frame(struct dp83640_private *dp83640,
 static int is_sync(struct sk_buff *skb, int type)
 {
 	struct ptp_header *hdr;
-	u8 msgtype;
 
 	hdr = ptp_parse_header(skb, type);
 	if (!hdr)
 		return 0;
 
-	msgtype = ptp_get_msgtype(hdr, type);
-
-	return (msgtype & 0xf) == 0;
+	return ptp_get_msgtype(hdr, type) == PTP_MSGTYPE_SYNC;
 }
 
 static void dp83640_free_clocks(void)
-- 
Christian Eggers
Embedded software developer

Arnold & Richter Cine Technik GmbH & Co. Betriebs KG
Sitz: Muenchen - Registergericht: Amtsgericht Muenchen - Handelsregisternummer: HRA 57918
Persoenlich haftender Gesellschafter: Arnold & Richter Cine Technik GmbH
Sitz: Muenchen - Registergericht: Amtsgericht Muenchen - Handelsregisternummer: HRB 54477
Geschaeftsfuehrer: Dr. Michael Neuhaeuser; Stephan Schenk; Walter Trauninger; Markus Zeiler


^ permalink raw reply related

* [PATCH net-next v2 0/3] net: ptp: use common defines for PTP message types in further drivers
From: Christian Eggers @ 2020-11-24  7:44 UTC (permalink / raw)
  To: Richard Cochran, Andrew Lunn, Heiner Kallweit, Jakub Kicinski
  Cc: Vladimir Oltean, Russell King, David S . Miller, netdev,
	linux-kernel, Christian Eggers

Changes in v2:
----------------
- resend, as v1 was sent before the prerequisites were merged
- removed mismatch between From: and Signed-off-by:
- [2/3] Reviewed-by: Ido Schimmel <idosch@nvidia.com>
- [3/3] Reviewed-by: Antoine Tenart <atenart@kernel.org>
- [3/3] removed dead email addresses from Cc:


This series replaces further driver internal enumeration / uses of magic
numbers with the newly introduced PTP_MSGTYPE_* defines.

On Friday, 20 November 2020, 23:39:10 CET, Vladimir Oltean wrote:
> On Fri, Nov 20, 2020 at 09:41:03AM +0100, Christian Eggers wrote:
> > This series introduces commen defines for PTP event messages. Driver
> > internal defines are removed and some uses of magic numbers are replaced
> > by the new defines.
> > [...]
> 
> I understand that you don't want to spend a lifetime on this, but I see
> that there are more drivers which you did not touch.
> 
> is_sync() in drivers/net/phy/dp83640.c can be made to
> 	return ptp_get_msgtype(hdr, type) == PTP_MSGTYPE_SYNC;
> 
> this can be removed from drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.h:
> enum {
> 	MLXSW_SP_PTP_MESSAGE_TYPE_SYNC,
> 	MLXSW_SP_PTP_MESSAGE_TYPE_DELAY_REQ,
> 	MLXSW_SP_PTP_MESSAGE_TYPE_PDELAY_REQ,
> 	MLXSW_SP_PTP_MESSAGE_TYPE_PDELAY_RESP,
> };
I think that I have found an addtional one in the Microsemi VSC85xx PHY driver.





^ permalink raw reply

* Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook
From: Lukas Wunner @ 2020-11-24  7:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Pablo Neira Ayuso, Daniel Borkmann,
	Laura García Liébana, John Fastabend, Jozsef Kadlecsik,
	Florian Westphal, Netfilter Development Mailing list, coreteam,
	Network Development, Alexei Starovoitov, Eric Dumazet,
	Thomas Graf, David Miller
In-Reply-To: <20201124033422.gvwhvsjmwt3b3irx@ast-mbp>

On Mon, Nov 23, 2020 at 07:34:22PM -0800, Alexei Starovoitov wrote:
> It's a missing hook for out-of-tree module. That's why it stinks so much.

As I've said before, the motivation for these patches has pivoted away
from the original use case (which was indeed an out-of-tree module by
a company for which I no longer work):

https://lore.kernel.org/netdev/20200905052403.GA10306@wunner.de/

When first submitting this series I also posted a patch to use the nft
egress hook from userspace for filtering and mangling.  It seems Zevenet
is actively using that:

https://lore.kernel.org/netdev/CAF90-Wi4W1U4FSYqyBTqe7sANbdO6=zgr-u+YY+X-gvNmOgc6A@mail.gmail.com/


> So please consider augmenting your nft k8s solution with a tiny bit of bpf.
> bpf can add a new helper to call into nf_hook_slow().

The out-of-tree module had nothing to do with k8s, it was for industrial
fieldbus communication.  But again, I no longer work for that company.
We're talking about a hook that's used by userspace, not by an out-of-tree
module.


> If it was not driven by
> out-of-tree kernel module I wouldn't have any problem with it.

Good!  Thank you.  Let me update and repost the patches then.

Lukas

^ permalink raw reply

* [PATCH][V2] libbpf: add support for canceling cached_cons advance
From: Li RongQing @ 2020-11-24  7:21 UTC (permalink / raw)
  To: netdev, bpf, magnus.karlsson

Add a new function for returning descriptors the user received
after an xsk_ring_cons__peek call. After the application has
gotten a number of descriptors from a ring, it might not be able
to or want to process them all for various reasons. Therefore,
it would be useful to have an interface for returning or
cancelling a number of them so that they are returned to the ring.

This patch adds a new function called xsk_ring_cons__cancel that
performs this operation on nb descriptors counted from the end of
the batch of descriptors that was received through the peek call.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
[ Magnus Karlsson: rewrote changelog ]
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
---
diff with v1: fix the building, and rewrote changelog

 tools/lib/bpf/xsk.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 1069c46364ff..1719a327e5f9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -153,6 +153,12 @@ static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
 	return entries;
 }
 
+static inline void xsk_ring_cons__cancel(struct xsk_ring_cons *cons,
+					 size_t nb)
+{
+	cons->cached_cons -= nb;
+}
+
 static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
 {
 	/* Make sure data has been read before indicating we are done
-- 
2.17.3


^ permalink raw reply related

* Re: [PATCH net] ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
From: wanghai (M) @ 2020-11-24  7:19 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20201123172207.213d7134@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>


在 2020/11/24 9:22, Jakub Kicinski 写道:
> On Sun, 22 Nov 2020 10:34:56 +0800 Wang Hai wrote:
>> kmemleak report a memory leak as follows:
>>
>> unreferenced object 0xffff8880059c6a00 (size 64):
>>    comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s)
>>    hex dump (first 32 bytes):
>>      20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00   ...............
>>      1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00  ................
>>    backtrace:
>>      [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0
>>      [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170
>>      [<000000006a9ca9d4>] ops_init+0xa8/0x3c0
>>      [<000000002da57bf2>] setup_net+0x2de/0x7e0
>>      [<000000004e52d573>] copy_net_ns+0x27d/0x530
>>      [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30
>>      [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0
>>      [<0000000030653721>] ksys_unshare+0x3a4/0x780
>>      [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40
>>      [<0000000031a10c08>] do_syscall_64+0x33/0x40
>>      [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> We should free all rules when we catch an error in ip6addrlbl_net_init().
>> otherwise a memory leak will occur.
>>
>> Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Wang Hai <wanghai38@huawei.com>
> We can simplify this function.
>
>> diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
>> index 642fc6ac13d2..637e323a0224 100644
>> --- a/net/ipv6/addrlabel.c
>> +++ b/net/ipv6/addrlabel.c
>> @@ -306,6 +306,8 @@ static int ip6addrlbl_del(struct net *net,
>>   /* add default label */
>>   static int __net_init ip6addrlbl_net_init(struct net *net)
>>   {
>> +	struct ip6addrlbl_entry *p = NULL;
>> +	struct hlist_node *n;
>>   	int err = 0;
> err does not need init
>
>>   	int i;
>>   
>> @@ -320,9 +322,17 @@ static int __net_init ip6addrlbl_net_init(struct net *net)
> instead of the temporary ret variable we can assign directly to err
>
>>   					 ip6addrlbl_init_table[i].prefixlen,
>>   					 0,
>>   					 ip6addrlbl_init_table[i].label, 0);
>> -		/* XXX: should we free all rules when we catch an error? */
>> -		if (ret && (!err || err != -ENOMEM))
>> +		if (ret && (!err || err != -ENOMEM)) {
> this will become if (err)
>
>>   			err = ret;
>> +			goto err_ip6addrlbl_add;
>> +		}
>> +	}
>> +	return err;
> return 0;
>
>> +err_ip6addrlbl_add:
>> +	hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {
>> +		hlist_del_rcu(&p->list);
>> +		kfree_rcu(p, rcu);
>>   	}
>>   	return err;
>>   }

Thanks for advice. I just sent a v2

“[PATCH net v2] ipv6: addrlabel: fix possible memory leak in 
ip6addrlbl_net_init”

> .
>

^ permalink raw reply

* Re: [PATCH v2] brmcfmac: fix compile when DEBUG is defined
From: Kalle Valo @ 2020-11-24  7:19 UTC (permalink / raw)
  To: hby; +Cc: davem, kuba, linux-wireless, netdev, linux-kernel
In-Reply-To: <c3b297cf-268e-6f28-f585-5452dd8696f8@163.com>

hby <hby2003@163.com> writes:

> I am sorry for the HTML email, and I change the email client. The
> patch update.
>
> From b87d429158b4efc3f6835828f495a261e17d5af4 Mon Sep 17 00:00:00 2001
> From: hby <hby2003@163.com>
> Date: Tue, 24 Nov 2020 09:16:24 +0800
> Subject: [PATCH] brmcfmac: fix compile when DEBUG is defined
>
> The steps:
> 1. add "#define DEBUG" in
> drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c line 61.
> 2. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- O=../Out_Linux
> bcm2835_defconfig
> 3. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- O=../Out_Linux/
> zImage modules dtbs -j8
>
> Then, it will fail, the compile log described below:

It doesn't work like this, the patch handling is very much automated and
you can't just reply with a new patch. I strongly recommend to use git
send-email and read the wiki page below.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH net-next 1/1] net: dsa: hellcreek: Add TAPRIO offloading support
From: Kurt Kanzenbach @ 2020-11-24  7:19 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
	Jakub Kicinski, Vinicius Costa Gomes, netdev
In-Reply-To: <20201122230826.5l7yzdwcjzntpijm@skbuf>

[-- Attachment #1: Type: text/plain, Size: 19509 bytes --]

On Mon Nov 23 2020, Vladimir Oltean wrote:
> Hi Kurt,
>
> On Sat, Nov 21, 2020 at 12:57:03PM +0100, Kurt Kanzenbach wrote:
>> The switch has support for the 802.1Qbv Time Aware Shaper (TAS). Traffic
>> schedules may be configured individually on each front port. Each port has eight
>> egress queues. The traffic is mapped to a traffic class respectively via the PCP
>> field of a VLAN tagged frame.
>> 
>> The TAPRIO Qdisc already implements that. Therefore, this interface can simply
>> be reused. Add .port_setup_tc() accordingly.
>> 
>> The activation of a schedule on a port is split into two parts:
>> 
>>  * Programming the necessary gate control list (GCL)
>>  * Setup delayed work for starting the schedule
>> 
>> The hardware supports starting a schedule up to eight seconds in the future. The
>> TAPRIO interface provides an absolute base time. Therefore, periodic delayed
>> work is leveraged to check whether a schedule may be started or not.
>> 
>> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>> ---
>>  drivers/net/dsa/hirschmann/hellcreek.c | 314 +++++++++++++++++++++++++
>>  drivers/net/dsa/hirschmann/hellcreek.h |  22 ++
>>  2 files changed, 336 insertions(+)
>> 
>> diff --git a/drivers/net/dsa/hirschmann/hellcreek.c b/drivers/net/dsa/hirschmann/hellcreek.c
>> index 6420b76ea37c..35514af1922a 100644
>> --- a/drivers/net/dsa/hirschmann/hellcreek.c
>> +++ b/drivers/net/dsa/hirschmann/hellcreek.c
>> @@ -23,6 +23,7 @@
>>  #include <linux/mutex.h>
>>  #include <linux/delay.h>
>>  #include <net/dsa.h>
>> +#include <net/pkt_sched.h>
>>  
>>  #include "hellcreek.h"
>>  #include "hellcreek_ptp.h"
>> @@ -153,6 +154,13 @@ static void hellcreek_select_vlan(struct hellcreek *hellcreek, int vid,
>>  	hellcreek_write(hellcreek, val, HR_VIDCFG);
>>  }
>>  
>> +static void hellcreek_select_tgd(struct hellcreek *hellcreek, int port)
>> +{
>> +	u16 val = port << TR_TGDSEL_TDGSEL_SHIFT;
>> +
>> +	hellcreek_write(hellcreek, val, TR_TGDSEL);
>> +}
>> +
>>  static int hellcreek_wait_until_ready(struct hellcreek *hellcreek)
>>  {
>>  	u16 val;
>> @@ -1135,6 +1143,308 @@ hellcreek_port_prechangeupper(struct dsa_switch *ds, int port,
>>  	return ret;
>>  }
>>  
>> +static void hellcreek_setup_gcl(struct hellcreek *hellcreek, int port,
>> +				const struct hellcreek_schedule *schedule)
>> +{
>> +	size_t i;
>> +
>> +	for (i = 1; i <= schedule->num_entries; ++i) {
>> +		const struct hellcreek_gcl_entry *cur, *initial, *next;
>> +		u16 data;
>> +		u8 gates;
>> +
>> +		cur = &schedule->entries[i - 1];
>> +		initial = &schedule->entries[0];
>> +		next = &schedule->entries[i];
>> +
>
> I would find it more intuitive to have the invariant assignment out of
> the loop.
>
> 	const struct hellcreek_gcl_entry *cur, *initial, *next;
>
> 	cur = initial = &schedule->entries[0];
> 	next = cur + 1;
>
> 	for (i = 0; i < schedule->num_entries; i++) {
> 		u16 data;
> 		u8 gates;
>
> 		[...]
>
> 		cur++;
> 		next++;
> 	}

Sure. Could also do

 for (i = 0; i < schedule->num_entries; i++, cur++, next++)

>
>> +		if (i == schedule->num_entries)
>> +			gates = initial->gate_states ^
>> +				cur->gate_states;
>> +		else
>> +			gates = next->gate_states ^
>> +				cur->gate_states;
>> +
>> +		data = gates;
>> +		if (cur->overrun_ignore)
>> +			data |= TR_GCLDAT_GCLOVRI;
>> +
>> +		if (i == schedule->num_entries)
>> +			data |= TR_GCLDAT_GCLWRLAST;
>> +
>> +		/* Gates states */
>> +		hellcreek_write(hellcreek, data, TR_GCLDAT);
>> +
>> +		/* Time intervall */
>
> interval

Ah. That's German spelling.

>
>> +		hellcreek_write(hellcreek,
>> +				cur->interval & 0x0000ffff,
>> +				TR_GCLTIL);
>> +		hellcreek_write(hellcreek,
>> +				(cur->interval & 0xffff0000) >> 16,
>> +				TR_GCLTIH);
>> +
>> +		/* Commit entry */
>> +		data = ((i - 1) << TR_GCLCMD_GCLWRADR_SHIFT) |
>> +			(initial->gate_states <<
>> +			 TR_GCLCMD_INIT_GATE_STATES_SHIFT);
>> +		hellcreek_write(hellcreek, data, TR_GCLCMD);
>
> So the command register holds the initial gate states. When the command
> register is written, it samples the data register, populating GCL entry
> number GCLWRADR with that information. Every GCL entry contains the
> duration until it is executed, and the gates that need to change when
> the schedule transitions towards the next GCL entry. Right?
>
> Why are the initial gate states written each time into the command
> register? Is it required by the hardware, or just easier in the
> implementation?

I think it's required. That sequence is from the programming guide.

>
>> +	}
>> +}
>> +
>> +static void hellcreek_set_cycle_time(struct hellcreek *hellcreek,
>> +				     const struct hellcreek_schedule *schedule)
>> +{
>> +	u32 cycle_time = schedule->cycle_time;
>> +
>> +	hellcreek_write(hellcreek, cycle_time & 0x0000ffff, TR_CTWRL);
>> +	hellcreek_write(hellcreek, (cycle_time & 0xffff0000) >> 16, TR_CTWRH);
>
> The cycle_time in struct tc_taprio_qopt_offload is 64-bit, so you should
> NACK a value that is too large and return an appropriate extack
> message.

Ok, makes sense.

>
>> +}
>> +
>> +static void hellcreek_switch_schedule(struct hellcreek *hellcreek,
>> +				      ktime_t start_time)
>> +{
>> +	struct timespec64 ts = ktime_to_timespec64(start_time);
>> +
>> +	/* Start can be up to eight seconds in the future */
>> +	ts.tv_sec %= 8;
>
> What happens if base_time is specified as zero, or a small number that
> only encodes a phase?
>
> I would expect that the base time is advanced by an integer number of
> cycles until it becomes in the immediate future, so that it can be
> applied.

Yes, that's missing/not implemented. If the admin base time is in the
past, a valid starting point in the future has to be calculated by the
driver.

>
> I don't think that's what happens right now.
> If the start_time is 0.000000000, the cycle_time is 0.333333333, and now
> is 8.125000000.
>
> I believe that what would happen is:
> - hellcreek_schedule_startable() says "yes, it's startable right away",
>   because base_time_ns - current_ns) is negative, and therefore also
>   smaller than 8 * NSEC_PER_SEC.
> - hellcreek_switch_schedule() gets called with the 0.000000000 time, and
>   this start_time gets programmed right away into hardware.
>
> What does the hardware do if it's given a schedule that begins at 0.000000000
> when the current time is at 8.125000000?
>
> I would expect that somebody (hardware or driver) advances the time
> 0.000000000 into the first time that is larger than 8.125000000, while
> still remaining congruent to the original time in terms of phase offset.
>
> I.e. advance the base time by 25 cycle times, to get
> 0.000000000 + 25 x 0.333333333 = 8.333333325 ns.
>
> Then I would expect the schedule to start at 8.333333325 nanoseconds,
> which is the first value immediately larger than "now" (8.125000000).

Correct.

>
> When you give the hardware the base_time of 0.000000000, is this what
> happens? Is it equivalent to giving it 0.333333325?
>
>> +
>> +	/* Start schedule at this point of time */
>> +	hellcreek_write(hellcreek, ts.tv_nsec & 0x0000ffff, TR_ESTWRL);
>> +	hellcreek_write(hellcreek, (ts.tv_nsec & 0xffff0000) >> 16, TR_ESTWRH);
>> +
>> +	/* Arm timer, set seconds and switch schedule */
>> +	hellcreek_write(hellcreek, TR_ESTCMD_ESTARM | TR_ESTCMD_ESTSWCFG |
>> +			((ts.tv_sec & TR_ESTCMD_ESTSEC_MASK) <<
>> +			 TR_ESTCMD_ESTSEC_SHIFT), TR_ESTCMD);
>> +}
>> +
>> +static struct hellcreek_schedule *
>> +hellcreek_taprio_to_schedule(struct tc_taprio_qopt_offload *taprio)
>> +{
>> +	struct hellcreek_schedule *schedule;
>> +	size_t i;
>> +
>> +	/* Allocate some memory first */
>> +	schedule = kzalloc(sizeof(*schedule), GFP_KERNEL);
>
> struct hellcreek_schedule has no added value on top of struct
> tc_taprio_qopt_offload (except for _maybe_ that overrun_ignore, which
> currently you don't use and could therefore remove - arguably the
> overrun_ignore flag should be user-visible if it ever was to be
> configured, and therefore my argument still stands).

overrun_ignore shouldn't be used. So, yes, that schedule can be removed.

>
> The reason why I'm making the point, though, is that you don't need to
> allocate extra memory if you use the plain struct tc_taprio_qopt_offload.
> You can use taprio_offload_get() to increase the reference count on the
> structure that was passed to you, use it for as long as you need, and
> just call taprio_offload_free() when you're done with it.

Ah, didn't know that. That'll help.

>
>> +	if (!schedule)
>> +		return ERR_PTR(-ENOMEM);
>> +	schedule->entries = kcalloc(taprio->num_entries,
>> +				    sizeof(*schedule->entries),
>> +				    GFP_KERNEL);
>> +	if (!schedule->entries) {
>> +		kfree(schedule);
>> +		return ERR_PTR(-ENOMEM);
>> +	}
>> +
>> +	/* Construct hellcreek schedule */
>> +	schedule->num_entries = taprio->num_entries;
>> +	schedule->base_time   = taprio->base_time;
>> +
>> +	for (i = 0; i < taprio->num_entries; ++i) {
>> +		const struct tc_taprio_sched_entry *t = &taprio->entries[i];
>> +		struct hellcreek_gcl_entry *k = &schedule->entries[i];
>> +
>> +		k->interval	  = t->interval;
>> +		k->gate_states	  = t->gate_mask;
>> +		k->overrun_ignore = 0;
>> +
>> +		/* Update complete cycle time */
>> +		schedule->cycle_time += t->interval;
>
> You shouldn't need to do this, the cycle_time from struct
> tc_taprio_qopt_offload should come properly populated. If it doesn't
> it's a bug.

True.

>
>> +	}
>> +
>> +	return schedule;
>> +}
>> +
>> +static void hellcreek_free_schedule(struct hellcreek *hellcreek, int port)
>> +{
>> +	struct hellcreek_port *hellcreek_port = &hellcreek->ports[port];
>> +
>> +	kfree(hellcreek_port->current_schedule->entries);
>> +	kfree(hellcreek_port->current_schedule);
>> +	hellcreek_port->current_schedule = NULL;
>> +}
>> +
>> +static bool hellcreek_schedule_startable(struct hellcreek *hellcreek, int port)
>> +{
>> +	struct hellcreek_port *hellcreek_port = &hellcreek->ports[port];
>> +	s64 base_time_ns, current_ns;
>> +
>> +	/* The switch allows a schedule to be started only eight seconds within
>> +	 * the future. Therefore, check the current PTP time if the schedule is
>> +	 * startable or not.
>> +	 */
>
> The future of whom? I expect that TR_ESTWR is relative to the most
> recent integer multiple of 8 seconds, and not relative to "now".

I think TR_ESTWR is relative to "now" and software has to make sure that
is programmed correctly.

> Otherwise you would never be able to phase-align taprio schedules with
> other devices in the LAN.
>
> Doesn't this mean that you need to be extra careful at modulo-8-seconds
> wraparounds of the current PTP time?

Yes, indeed.

>
>> +
>> +	/* Use the "cached" time. That should be alright, as it's updated quite
>> +	 * frequently in the PTP code.
>> +	 */
>> +	mutex_lock(&hellcreek->ptp_lock);
>> +	current_ns = hellcreek->seconds * NSEC_PER_SEC + hellcreek->last_ts;
>> +	mutex_unlock(&hellcreek->ptp_lock);
>> +
>> +	/* Calculate difference to admin base time */
>> +	base_time_ns = ktime_to_ns(hellcreek_port->current_schedule->base_time);
>> +
>> +	if (base_time_ns - current_ns < (s64)8 * NSEC_PER_SEC)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static void hellcreek_start_schedule(struct hellcreek *hellcreek, int port)
>> +{
>> +	struct hellcreek_port *hellcreek_port = &hellcreek->ports[port];
>> +
>> +	/* First select port */
>> +	hellcreek_select_tgd(hellcreek, port);
>> +
>> +	/* Set admin base time and switch schedule */
>> +	hellcreek_switch_schedule(hellcreek,
>> +				  hellcreek_port->current_schedule->base_time);
>> +
>> +	hellcreek_free_schedule(hellcreek, port);
>> +
>> +	dev_dbg(hellcreek->dev, "ARMed EST timer for port %d\n",
>
> Is ARM used as an acronym for something here? Why not Armed?

Yeah, it should be "Armed".

>
>> +		hellcreek_port->port);
>> +}
>> +
>> +static void hellcreek_check_schedule(struct work_struct *work)
>> +{
>> +	struct delayed_work *dw = to_delayed_work(work);
>> +	struct hellcreek_port *hellcreek_port;
>> +	struct hellcreek *hellcreek;
>> +	bool startable;
>> +
>> +	hellcreek_port = dw_to_hellcreek_port(dw);
>> +	hellcreek = hellcreek_port->hellcreek;
>> +
>> +	mutex_lock(&hellcreek->reg_lock);
>> +
>> +	/* Check starting time */
>> +	startable = hellcreek_schedule_startable(hellcreek,
>> +						 hellcreek_port->port);
>> +	if (startable) {
>> +		hellcreek_start_schedule(hellcreek, hellcreek_port->port);
>> +		mutex_unlock(&hellcreek->reg_lock);
>> +		return;
>> +	}
>> +
>> +	mutex_unlock(&hellcreek->reg_lock);
>> +
>> +	/* Reschedule */
>> +	schedule_delayed_work(&hellcreek_port->schedule_work,
>> +			      HELLCREEK_SCHEDULE_PERIOD);
>> +}
>> +
>> +static int hellcreek_port_set_schedule(struct dsa_switch *ds, int port,
>> +				       struct tc_taprio_qopt_offload *taprio)
>> +{
>> +	struct hellcreek *hellcreek = ds->priv;
>> +	struct hellcreek_port *hellcreek_port;
>> +	struct hellcreek_schedule *schedule;
>> +	bool startable;
>> +	u16 ctrl;
>> +
>> +	hellcreek_port = &hellcreek->ports[port];
>> +
>> +	/* Convert taprio data to hellcreek schedule */
>> +	schedule = hellcreek_taprio_to_schedule(taprio);
>> +	if (IS_ERR(schedule))
>> +		return PTR_ERR(schedule);
>> +
>> +	dev_dbg(hellcreek->dev, "Configure traffic schedule on port %d\n",
>> +		port);
>> +
>> +	/* First cancel delayed work */
>> +	cancel_delayed_work_sync(&hellcreek_port->schedule_work);
>> +
>> +	mutex_lock(&hellcreek->reg_lock);
>> +
>> +	if (hellcreek_port->current_schedule)
>> +		hellcreek_free_schedule(hellcreek, port);
>> +	hellcreek_port->current_schedule = schedule;
>> +
>> +	/* Then select port */
>> +	hellcreek_select_tgd(hellcreek, port);
>> +
>> +	/* Enable gating and set the admin state to forward everything in the
>> +	 * mean time
>> +	 */
>> +	ctrl = (0xff << TR_TGDCTRL_ADMINGATESTATES_SHIFT) | TR_TGDCTRL_GATE_EN;
>> +	hellcreek_write(hellcreek, ctrl, TR_TGDCTRL);
>
> Hmm, "in the meantime"? When do these admin gate states become effective?
> If possible, I expect that the currently operational schedule remains
> operational exactly until the base_time of the newly installed one,
> minus a possible cycle_time_extension.

Yes, that's the expectation. And the hardware works like that. I'll have
a look if this code is correct.

>
>> +
>> +	/* Cancel pending schedule */
>> +	hellcreek_write(hellcreek, 0x00, TR_ESTCMD);
>> +
>> +	/* Setup a new schedule */
>> +	hellcreek_setup_gcl(hellcreek, port, schedule);
>> +
>> +	/* Configure cycle time */
>> +	hellcreek_set_cycle_time(hellcreek, schedule);
>
> As a general comment, if the hardware doesn't support the cycle time
> extension when switching schedules, then you should NACK a non-zero
> passed argument there.

Ok.

>
>> +
>> +	/* Check starting time */
>> +	startable = hellcreek_schedule_startable(hellcreek, port);
>> +	if (startable) {
>> +		hellcreek_start_schedule(hellcreek, port);
>> +		mutex_unlock(&hellcreek->reg_lock);
>> +		return 0;
>> +	}
>> +
>> +	mutex_unlock(&hellcreek->reg_lock);
>> +
>> +	/* Schedule periodic schedule check */
>> +	schedule_delayed_work(&hellcreek_port->schedule_work,
>> +			      HELLCREEK_SCHEDULE_PERIOD);
>> +
>> +	return 0;
>> +}
>> +
>> +static int hellcreek_port_del_schedule(struct dsa_switch *ds, int port)
>> +{
>> +	struct hellcreek *hellcreek = ds->priv;
>> +	struct hellcreek_port *hellcreek_port;
>> +
>> +	hellcreek_port = &hellcreek->ports[port];
>> +
>> +	dev_dbg(hellcreek->dev, "Remove traffic schedule on port %d\n", port);
>> +
>> +	/* First cancel delayed work */
>> +	cancel_delayed_work_sync(&hellcreek_port->schedule_work);
>> +
>> +	mutex_lock(&hellcreek->reg_lock);
>> +
>> +	if (hellcreek_port->current_schedule)
>> +		hellcreek_free_schedule(hellcreek, port);
>> +
>> +	/* Then select port */
>> +	hellcreek_select_tgd(hellcreek, port);
>> +
>> +	/* Disable gating and return to regular switching flow */
>> +	hellcreek_write(hellcreek, 0xff << TR_TGDCTRL_ADMINGATESTATES_SHIFT,
>> +			TR_TGDCTRL);
>> +
>> +	mutex_unlock(&hellcreek->reg_lock);
>> +
>> +	return 0;
>> +}
>> +
>> +static int hellcreek_port_setup_tc(struct dsa_switch *ds, int port,
>> +				   enum tc_setup_type type, void *type_data)
>> +{
>> +	struct tc_taprio_qopt_offload *taprio = type_data;
>> +	struct hellcreek *hellcreek = ds->priv;
>> +
>> +	if (type != TC_SETUP_QDISC_TAPRIO)
>> +		return -EOPNOTSUPP;
>> +
>> +	/* Does this hellcreek version support Qbv in hardware? */
>> +	if (!hellcreek->pdata->qbv_support)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (taprio->enable)
>> +		return hellcreek_port_set_schedule(ds, port, taprio);
>> +
>> +	return hellcreek_port_del_schedule(ds, port);
>> +}
>> +
>>  static const struct dsa_switch_ops hellcreek_ds_ops = {
>>  	.get_ethtool_stats   = hellcreek_get_ethtool_stats,
>>  	.get_sset_count	     = hellcreek_get_sset_count,
>> @@ -1153,6 +1463,7 @@ static const struct dsa_switch_ops hellcreek_ds_ops = {
>>  	.port_hwtstamp_get   = hellcreek_port_hwtstamp_get,
>>  	.port_prechangeupper = hellcreek_port_prechangeupper,
>>  	.port_rxtstamp	     = hellcreek_port_rxtstamp,
>> +	.port_setup_tc	     = hellcreek_port_setup_tc,
>>  	.port_stp_state_set  = hellcreek_port_stp_state_set,
>>  	.port_txtstamp	     = hellcreek_port_txtstamp,
>>  	.port_vlan_add	     = hellcreek_vlan_add,
>> @@ -1208,6 +1519,9 @@ static int hellcreek_probe(struct platform_device *pdev)
>>  
>>  		port->hellcreek	= hellcreek;
>>  		port->port	= i;
>> +
>> +		INIT_DELAYED_WORK(&port->schedule_work,
>> +				  hellcreek_check_schedule);
>>  	}
>>  
>>  	mutex_init(&hellcreek->reg_lock);
>> diff --git a/drivers/net/dsa/hirschmann/hellcreek.h b/drivers/net/dsa/hirschmann/hellcreek.h
>> index e81781ebc31c..7ffb1b33ff72 100644
>> --- a/drivers/net/dsa/hirschmann/hellcreek.h
>> +++ b/drivers/net/dsa/hirschmann/hellcreek.h
>> @@ -213,6 +213,20 @@ struct hellcreek_counter {
>>  	const char *name;
>>  };
>>  
>> +struct hellcreek_gcl_entry {
>> +	u32 interval;
>> +	u8 gate_states;
>> +	bool overrun_ignore;
>> +};
>> +
>> +struct hellcreek_schedule {
>> +	struct hellcreek_gcl_entry *entries;
>> +	size_t num_entries;
>> +	ktime_t base_time;
>> +	u32 cycle_time;
>> +	int port;
>
> You don't need/use the "port" member.

True.

>
>> +};
>> +
>>  struct hellcreek;
>>  
>>  /* State flags for hellcreek_port_hwtstamp::state */
>> @@ -246,6 +260,10 @@ struct hellcreek_port {
>>  
>>  	/* Per-port timestamping resources */
>>  	struct hellcreek_port_hwtstamp port_hwtstamp;
>> +
>> +	/* Per-port Qbv schedule information */
>> +	struct hellcreek_schedule *current_schedule;
>> +	struct delayed_work schedule_work;
>>  };
>>  
>>  struct hellcreek_fdb_entry {
>> @@ -283,4 +301,8 @@ struct hellcreek {
>>  	size_t fdb_entries;
>>  };
>>  
>> +#define HELLCREEK_SCHEDULE_PERIOD	(2 * HZ)
>
> Is there a risk if you schedule rarer than this? The hellcreek->seconds
> value is no longer accurate enough?

The hw engineer recommended to use the two seconds. In theory, it can be
increased.

Thanks for the review.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: pull-request: wireless-drivers-2020-11-23
From: Kalle Valo @ 2020-11-24  7:15 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, linux-wireless
In-Reply-To: <20201123153002.2200d6be@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 23 Nov 2020 16:10:37 +0000 (UTC) Kalle Valo wrote:
>> wireless-drivers fixes for v5.10
>> 
>> First set of fixes for v5.10. One fix for iwlwifi kernel panic, others
>> less notable.
>> 
>> rtw88
>> 
>> * fix a bogus test found by clang
>> 
>> iwlwifi
>> 
>> * fix long memory reads causing soft lockup warnings
>> 
>> * fix kernel panic during Channel Switch Announcement (CSA)
>> 
>> * other smaller fixes
>> 
>> MAINTAINERS
>> 
>> * email address updates
>
> Pulled, thanks!
>
> Please watch out for missing sign-offs.

I assume you refer to commit 97cc16943f23, sorry about that. Currently
I'm just manually checking sign-offs and missed this patch. My plan is
to implement proper checks to my patchwork script so I'll notice these
before I commit the patch (or pull request), just have not yet find the
time to do that.

commit 97cc16943f23078535fdbce4f6391b948b4ccc08
Author:     Avraham Stern <avraham.stern@intel.com>
AuthorDate: Sat Nov 7 10:50:09 2020 +0200
Commit:     Kalle Valo <kvalo@codeaurora.org>
CommitDate: Tue Nov 10 20:45:34 2020 +0200

    iwlwifi: mvm: write queue_sync_state only for sync
    
    We use mvm->queue_sync_state to wait for synchronous queue sync
    messages, but if an async one happens inbetween we shouldn't
    clear mvm->queue_sync_state after sending the async one, that
    can run concurrently (at least from the CPU POV) with another
    synchronous queue sync.
    
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Fixes: 3c514bf831ac ("iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queues")
    Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
    Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
    Link: https://lore.kernel.org/r/iwlwifi.20201107104557.51a3148f2c14.I0772171dbaec87433a11513e9586d98b5d920b5f@changeid

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* [PATCH net v2] ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
From: Wang Hai @ 2020-11-24  7:17 UTC (permalink / raw)
  To: kuba, davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel

kmemleak report a memory leak as follows:

unreferenced object 0xffff8880059c6a00 (size 64):
  comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s)
  hex dump (first 32 bytes):
    20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00   ...............
    1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00  ................
  backtrace:
    [<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0
    [<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170
    [<000000006a9ca9d4>] ops_init+0xa8/0x3c0
    [<000000002da57bf2>] setup_net+0x2de/0x7e0
    [<000000004e52d573>] copy_net_ns+0x27d/0x530
    [<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30
    [<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0
    [<0000000030653721>] ksys_unshare+0x3a4/0x780
    [<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40
    [<0000000031a10c08>] do_syscall_64+0x33/0x40
    [<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

We should free all rules when we catch an error in ip6addrlbl_net_init().
otherwise a memory leak will occur.

Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
v1->v2: simplify this function
 net/ipv6/addrlabel.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index 642fc6ac13d2..8a22486cf270 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -306,7 +306,9 @@ static int ip6addrlbl_del(struct net *net,
 /* add default label */
 static int __net_init ip6addrlbl_net_init(struct net *net)
 {
-	int err = 0;
+	struct ip6addrlbl_entry *p = NULL;
+	struct hlist_node *n;
+	int err;
 	int i;
 
 	ADDRLABEL(KERN_DEBUG "%s\n", __func__);
@@ -315,14 +317,20 @@ static int __net_init ip6addrlbl_net_init(struct net *net)
 	INIT_HLIST_HEAD(&net->ipv6.ip6addrlbl_table.head);
 
 	for (i = 0; i < ARRAY_SIZE(ip6addrlbl_init_table); i++) {
-		int ret = ip6addrlbl_add(net,
-					 ip6addrlbl_init_table[i].prefix,
-					 ip6addrlbl_init_table[i].prefixlen,
-					 0,
-					 ip6addrlbl_init_table[i].label, 0);
-		/* XXX: should we free all rules when we catch an error? */
-		if (ret && (!err || err != -ENOMEM))
-			err = ret;
+		err = ip6addrlbl_add(net,
+				     ip6addrlbl_init_table[i].prefix,
+				     ip6addrlbl_init_table[i].prefixlen,
+				     0,
+				     ip6addrlbl_init_table[i].label, 0);
+		if (err)
+			goto err_ip6addrlbl_add;
+	}
+	return 0;
+
+err_ip6addrlbl_add:
+	hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {
+		hlist_del_rcu(&p->list);
+		kfree_rcu(p, rcu);
 	}
 	return err;
 }
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net-next 00/13] Add mlx5 subfunction support
From: Jason Wang @ 2020-11-24  7:05 UTC (permalink / raw)
  To: Samudrala, Sridhar, Alexander Duyck, Jakub Kicinski
  Cc: David Ahern, Jason Gunthorpe, Parav Pandit, Saeed Mahameed,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	gregkh@linuxfoundation.org, Jiri Pirko, dledford@redhat.com,
	Leon Romanovsky, davem@davemloft.net
In-Reply-To: <1e7773b1-0954-257f-7355-c17b15ab6b8b@redhat.com>


On 2020/11/24 下午3:01, Jason Wang wrote:
>
> On 2020/11/21 上午3:04, Samudrala, Sridhar wrote:
>>
>>
>> On 11/20/2020 9:58 AM, Alexander Duyck wrote:
>>> On Thu, Nov 19, 2020 at 5:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Wed, 18 Nov 2020 21:35:29 -0700 David Ahern wrote:
>>>>> On 11/18/20 7:14 PM, Jakub Kicinski wrote:
>>>>>> On Tue, 17 Nov 2020 14:49:54 -0400 Jason Gunthorpe wrote:
>>>>>>> On Tue, Nov 17, 2020 at 09:11:20AM -0800, Jakub Kicinski wrote:
>>>>>>>
>>>>>>>>> Just to refresh all our memory, we discussed and settled on 
>>>>>>>>> the flow
>>>>>>>>> in [2]; RFC [1] followed this discussion.
>>>>>>>>>
>>>>>>>>> vdpa tool of [3] can add one or more vdpa device(s) on top of 
>>>>>>>>> already
>>>>>>>>> spawned PF, VF, SF device.
>>>>>>>>
>>>>>>>> Nack for the networking part of that. It'd basically be VMDq.
>>>>>>>
>>>>>>> What are you NAK'ing?
>>>>>>
>>>>>> Spawning multiple netdevs from one device by slicing up its queues.
>>>>>
>>>>> Why do you object to that? Slicing up h/w resources for virtual what
>>>>> ever has been common practice for a long time.
>>>>
>>>> My memory of the VMDq debate is hazy, let me rope in Alex into this.
>>>> I believe the argument was that we should offload software constructs,
>>>> not create HW-specific APIs which depend on HW availability and
>>>> implementation. So the path we took was offloading macvlan.
>>>
>>> I think it somewhat depends on the type of interface we are talking
>>> about. What we were wanting to avoid was drivers spawning their own
>>> unique VMDq netdevs and each having a different way of doing it. The
>>> approach Intel went with was to use a MACVLAN offload to approach it.
>>> Although I would imagine many would argue the approach is somewhat
>>> dated and limiting since you cannot do many offloads on a MACVLAN
>>> interface.
>>
>> Yes. We talked about this at netdev 0x14 and the limitations of 
>> macvlan based offloads.
>> https://netdevconf.info/0x14/session.html?talk-hardware-acceleration-of-container-networking-interfaces 
>>
>>
>> Subfunction seems to be a good model to expose VMDq VSI or SIOV ADI 
>> as a netdev for kernel containers. AF_XDP ZC in a container is one of 
>> the usecase this would address. Today we have to pass the entire 
>> PF/VF to a container to do AF_XDP.
>>
>> Looks like the current model is to create a subfunction of a specific 
>> type on auxiliary bus, do some configuration to assign resources and 
>> then activate the subfunction.
>>
>>>
>>> With the VDPA case I believe there is a set of predefined virtio
>>> devices that are being emulated and presented so it isn't as if they
>>> are creating a totally new interface for this.
>
>
> vDPA doesn't have any limitation of how the devices is created or 
> implemented. It could be predefined or created dynamically. vDPA 
> leaves all of those to the parent device with the help of a unified 
> management API[1]. E.g It could be a PCI device (PF or VF), 
> sub-function or  software emulated devices.


Miss the link, https://www.spinics.net/lists/netdev/msg699374.html.

Thanks


>
>
>>>
>>> What I would be interested in seeing is if there are any other vendors
>>> that have reviewed this and sign off on this approach.
>
>
> For "this approach" do you mean vDPA subfucntion? My understanding is 
> that it's totally vendor specific, vDPA subsystem don't want to be 
> limited by a specific type of device.
>
>
>>> What we don't
>>> want to see is Nivida/Mellanox do this one way, then Broadcom or Intel
>>> come along later and have yet another way of doing this. We need an
>>> interface and feature set that will work for everyone in terms of how
>>> this will look going forward.
>
> For feature set,  it would be hard to force (we can have a 
> recommendation set of features) vendors to implement a common set of 
> features consider they can be negotiated. So the management interface 
> is expected to implement features like cpu clusters in order to make 
> sure the migration compatibility, or qemu can assist for the missing 
> feature with performance lose.
>
> Thanks
>
>


^ permalink raw reply

* Re: [PATCH net-next 00/13] Add mlx5 subfunction support
From: Jason Wang @ 2020-11-24  7:01 UTC (permalink / raw)
  To: Samudrala, Sridhar, Alexander Duyck, Jakub Kicinski
  Cc: David Ahern, Jason Gunthorpe, Parav Pandit, Saeed Mahameed,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	gregkh@linuxfoundation.org, Jiri Pirko, dledford@redhat.com,
	Leon Romanovsky, davem@davemloft.net
In-Reply-To: <102d20e1-c78f-09cb-fabb-efdc59f61eb8@intel.com>


On 2020/11/21 上午3:04, Samudrala, Sridhar wrote:
>
>
> On 11/20/2020 9:58 AM, Alexander Duyck wrote:
>> On Thu, Nov 19, 2020 at 5:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>
>>> On Wed, 18 Nov 2020 21:35:29 -0700 David Ahern wrote:
>>>> On 11/18/20 7:14 PM, Jakub Kicinski wrote:
>>>>> On Tue, 17 Nov 2020 14:49:54 -0400 Jason Gunthorpe wrote:
>>>>>> On Tue, Nov 17, 2020 at 09:11:20AM -0800, Jakub Kicinski wrote:
>>>>>>
>>>>>>>> Just to refresh all our memory, we discussed and settled on the 
>>>>>>>> flow
>>>>>>>> in [2]; RFC [1] followed this discussion.
>>>>>>>>
>>>>>>>> vdpa tool of [3] can add one or more vdpa device(s) on top of 
>>>>>>>> already
>>>>>>>> spawned PF, VF, SF device.
>>>>>>>
>>>>>>> Nack for the networking part of that. It'd basically be VMDq.
>>>>>>
>>>>>> What are you NAK'ing?
>>>>>
>>>>> Spawning multiple netdevs from one device by slicing up its queues.
>>>>
>>>> Why do you object to that? Slicing up h/w resources for virtual what
>>>> ever has been common practice for a long time.
>>>
>>> My memory of the VMDq debate is hazy, let me rope in Alex into this.
>>> I believe the argument was that we should offload software constructs,
>>> not create HW-specific APIs which depend on HW availability and
>>> implementation. So the path we took was offloading macvlan.
>>
>> I think it somewhat depends on the type of interface we are talking
>> about. What we were wanting to avoid was drivers spawning their own
>> unique VMDq netdevs and each having a different way of doing it. The
>> approach Intel went with was to use a MACVLAN offload to approach it.
>> Although I would imagine many would argue the approach is somewhat
>> dated and limiting since you cannot do many offloads on a MACVLAN
>> interface.
>
> Yes. We talked about this at netdev 0x14 and the limitations of 
> macvlan based offloads.
> https://netdevconf.info/0x14/session.html?talk-hardware-acceleration-of-container-networking-interfaces 
>
>
> Subfunction seems to be a good model to expose VMDq VSI or SIOV ADI as 
> a netdev for kernel containers. AF_XDP ZC in a container is one of the 
> usecase this would address. Today we have to pass the entire PF/VF to 
> a container to do AF_XDP.
>
> Looks like the current model is to create a subfunction of a specific 
> type on auxiliary bus, do some configuration to assign resources and 
> then activate the subfunction.
>
>>
>> With the VDPA case I believe there is a set of predefined virtio
>> devices that are being emulated and presented so it isn't as if they
>> are creating a totally new interface for this.


vDPA doesn't have any limitation of how the devices is created or 
implemented. It could be predefined or created dynamically. vDPA leaves 
all of those to the parent device with the help of a unified management 
API[1]. E.g It could be a PCI device (PF or VF), sub-function or  
software emulated devices.


>>
>> What I would be interested in seeing is if there are any other vendors
>> that have reviewed this and sign off on this approach.


For "this approach" do you mean vDPA subfucntion? My understanding is 
that it's totally vendor specific, vDPA subsystem don't want to be 
limited by a specific type of device.


>> What we don't
>> want to see is Nivida/Mellanox do this one way, then Broadcom or Intel
>> come along later and have yet another way of doing this. We need an
>> interface and feature set that will work for everyone in terms of how
>> this will look going forward.

For feature set,  it would be hard to force (we can have a 
recommendation set of features) vendors to implement a common set of 
features consider they can be negotiated. So the management interface is 
expected to implement features like cpu clusters in order to make sure 
the migration compatibility, or qemu can assist for the missing feature 
with performance lose.

Thanks



^ permalink raw reply

* [PATCH] net: fs_enet: Fix incorrect IS_ERR_VALUE macro usages
From: Wei Li @ 2020-11-24  6:24 UTC (permalink / raw)
  To: Pantelis Antoniou, David S. Miller, Jakub Kicinski, Scott Wood,
	Jeff Garzik, Timur Tabi, Kumar Gala
  Cc: linuxppc-dev, netdev, linux-kernel, guohanjun

IS_ERR_VALUE macro should be used only with unsigned long type.
Especially it works incorrectly with unsigned shorter types on
64bit machines.

Fixes: 976de6a8c304 ("fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.")
Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs instead of pointers")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 drivers/net/ethernet/freescale/fs_enet/mac-fcc.c | 2 +-
 drivers/net/ethernet/freescale/fs_enet/mac-scc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
index b47490be872c..e2117ad46130 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-fcc.c
@@ -107,7 +107,7 @@ static int do_pd_setup(struct fs_enet_private *fep)
 
 	fep->fcc.mem = (void __iomem *)cpm2_immr;
 	fpi->dpram_offset = cpm_dpalloc(128, 32);
-	if (IS_ERR_VALUE(fpi->dpram_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)fpi->dpram_offset)) {
 		ret = fpi->dpram_offset;
 		goto out_fcccp;
 	}
diff --git a/drivers/net/ethernet/freescale/fs_enet/mac-scc.c b/drivers/net/ethernet/freescale/fs_enet/mac-scc.c
index 64300ac13e02..90f82df0b1bb 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mac-scc.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mac-scc.c
@@ -136,7 +136,7 @@ static int allocate_bd(struct net_device *dev)
 
 	fep->ring_mem_addr = cpm_dpalloc((fpi->tx_ring + fpi->rx_ring) *
 					 sizeof(cbd_t), 8);
-	if (IS_ERR_VALUE(fep->ring_mem_addr))
+	if (IS_ERR_VALUE((unsigned long)(int)fep->ring_mem_addr))
 		return -ENOMEM;
 
 	fep->ring_base = (void __iomem __force*)
-- 
2.17.1


^ permalink raw reply related

* [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages
From: Wei Li @ 2020-11-24  6:22 UTC (permalink / raw)
  To: Li Yang, David S. Miller, Jakub Kicinski, Paul Gortmaker,
	Kumar Gala, Timur Tabi
  Cc: netdev, linuxppc-dev, linux-kernel, guohanjun

IS_ERR_VALUE macro should be used only with unsigned long type.
Especially it works incorrectly with unsigned shorter types on
64bit machines.

Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs instead of pointers")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 drivers/net/ethernet/freescale/ucc_geth.c | 30 +++++++++++------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 714b501be7d0..8656d9be256a 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct ucc_geth_private *ugeth,
 		else {
 			init_enet_offset =
 			    qe_muram_alloc(thread_size, thread_alignment);
-			if (IS_ERR_VALUE(init_enet_offset)) {
+			if (IS_ERR_VALUE((unsigned long)(int)init_enet_offset)) {
 				if (netif_msg_ifup(ugeth))
 					pr_err("Can not allocate DPRAM memory\n");
 				qe_put_snum((u8) snum);
@@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private *ugeth)
 			ugeth->tx_bd_ring_offset[j] =
 			    qe_muram_alloc(length,
 					   UCC_GETH_TX_BD_RING_ALIGNMENT);
-			if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
+			if (!IS_ERR_VALUE((unsigned long)(int)ugeth->tx_bd_ring_offset[j]))
 				ugeth->p_tx_bd_ring[j] =
 				    (u8 __iomem *) qe_muram_addr(ugeth->
 							 tx_bd_ring_offset[j]);
@@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private *ugeth)
 			ugeth->rx_bd_ring_offset[j] =
 			    qe_muram_alloc(length,
 					   UCC_GETH_RX_BD_RING_ALIGNMENT);
-			if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
+			if (!IS_ERR_VALUE((unsigned long)(int)ugeth->rx_bd_ring_offset[j]))
 				ugeth->p_rx_bd_ring[j] =
 				    (u8 __iomem *) qe_muram_addr(ugeth->
 							 rx_bd_ring_offset[j]);
@@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 	ugeth->tx_glbl_pram_offset =
 	    qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
 			   UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->tx_glbl_pram_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_glbl_pram_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_tx_glbl_pram\n");
 		return -ENOMEM;
@@ -2530,7 +2530,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 			   sizeof(struct ucc_geth_thread_data_tx) +
 			   32 * (numThreadsTxNumerical == 1),
 			   UCC_GETH_THREAD_DATA_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->thread_dat_tx_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->thread_dat_tx_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_thread_data_tx\n");
 		return -ENOMEM;
@@ -2557,7 +2557,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 	    qe_muram_alloc(ug_info->numQueuesTx *
 			   sizeof(struct ucc_geth_send_queue_qd),
 			   UCC_GETH_SEND_QUEUE_QUEUE_DESCRIPTOR_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->send_q_mem_reg_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->send_q_mem_reg_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_send_q_mem_reg\n");
 		return -ENOMEM;
@@ -2597,7 +2597,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 		ugeth->scheduler_offset =
 		    qe_muram_alloc(sizeof(struct ucc_geth_scheduler),
 				   UCC_GETH_SCHEDULER_ALIGNMENT);
-		if (IS_ERR_VALUE(ugeth->scheduler_offset)) {
+		if (IS_ERR_VALUE((unsigned long)(int)ugeth->scheduler_offset)) {
 			if (netif_msg_ifup(ugeth))
 				pr_err("Can not allocate DPRAM memory for p_scheduler\n");
 			return -ENOMEM;
@@ -2644,7 +2644,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 		    qe_muram_alloc(sizeof
 				   (struct ucc_geth_tx_firmware_statistics_pram),
 				   UCC_GETH_TX_STATISTICS_ALIGNMENT);
-		if (IS_ERR_VALUE(ugeth->tx_fw_statistics_pram_offset)) {
+		if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_fw_statistics_pram_offset)) {
 			if (netif_msg_ifup(ugeth))
 				pr_err("Can not allocate DPRAM memory for p_tx_fw_statistics_pram\n");
 			return -ENOMEM;
@@ -2681,7 +2681,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 	ugeth->rx_glbl_pram_offset =
 	    qe_muram_alloc(sizeof(struct ucc_geth_rx_global_pram),
 			   UCC_GETH_RX_GLOBAL_PRAM_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->rx_glbl_pram_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_glbl_pram_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_rx_glbl_pram\n");
 		return -ENOMEM;
@@ -2700,7 +2700,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 	    qe_muram_alloc(numThreadsRxNumerical *
 			   sizeof(struct ucc_geth_thread_data_rx),
 			   UCC_GETH_THREAD_DATA_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->thread_dat_rx_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->thread_dat_rx_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_thread_data_rx\n");
 		return -ENOMEM;
@@ -2721,7 +2721,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 		    qe_muram_alloc(sizeof
 				   (struct ucc_geth_rx_firmware_statistics_pram),
 				   UCC_GETH_RX_STATISTICS_ALIGNMENT);
-		if (IS_ERR_VALUE(ugeth->rx_fw_statistics_pram_offset)) {
+		if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_fw_statistics_pram_offset)) {
 			if (netif_msg_ifup(ugeth))
 				pr_err("Can not allocate DPRAM memory for p_rx_fw_statistics_pram\n");
 			return -ENOMEM;
@@ -2741,7 +2741,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 	    qe_muram_alloc(ug_info->numQueuesRx *
 			   sizeof(struct ucc_geth_rx_interrupt_coalescing_entry)
 			   + 4, UCC_GETH_RX_INTERRUPT_COALESCING_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->rx_irq_coalescing_tbl_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_irq_coalescing_tbl_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_rx_irq_coalescing_tbl\n");
 		return -ENOMEM;
@@ -2807,7 +2807,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 			   (sizeof(struct ucc_geth_rx_bd_queues_entry) +
 			    sizeof(struct ucc_geth_rx_prefetched_bds)),
 			   UCC_GETH_RX_BD_QUEUES_ALIGNMENT);
-	if (IS_ERR_VALUE(ugeth->rx_bd_qs_tbl_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_bd_qs_tbl_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_rx_bd_qs_tbl\n");
 		return -ENOMEM;
@@ -2892,7 +2892,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 		ugeth->exf_glbl_param_offset =
 		    qe_muram_alloc(sizeof(struct ucc_geth_exf_global_pram),
 		UCC_GETH_RX_EXTENDED_FILTERING_GLOBAL_PARAMETERS_ALIGNMENT);
-		if (IS_ERR_VALUE(ugeth->exf_glbl_param_offset)) {
+		if (IS_ERR_VALUE((unsigned long)(int)ugeth->exf_glbl_param_offset)) {
 			if (netif_msg_ifup(ugeth))
 				pr_err("Can not allocate DPRAM memory for p_exf_glbl_param\n");
 			return -ENOMEM;
@@ -3026,7 +3026,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
 
 	/* Allocate InitEnet command parameter structure */
 	init_enet_pram_offset = qe_muram_alloc(sizeof(struct ucc_geth_init_pram), 4);
-	if (IS_ERR_VALUE(init_enet_pram_offset)) {
+	if (IS_ERR_VALUE((unsigned long)(int)init_enet_pram_offset)) {
 		if (netif_msg_ifup(ugeth))
 			pr_err("Can not allocate DPRAM memory for p_init_enet_pram\n");
 		return -ENOMEM;
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v3] tracepoint: Do not fail unregistering a probe due to memory allocation
From: Matt Mullins @ 2020-11-24  5:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Florian Weimer, Peter Zijlstra, Mathieu Desnoyers, linux-kernel,
	Matt Mullins, Ingo Molnar, Alexei Starovoitov, Daniel Borkmann,
	Dmitry Vyukov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf, Kees Cook,
	Josh Poimboeuf, linux-toolchains
In-Reply-To: <20201118093405.7a6d2290@gandalf.local.home>

On Wed, Nov 18, 2020 at 09:34:05AM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> The list of tracepoint callbacks is managed by an array that is protected
> by RCU. To update this array, a new array is allocated, the updates are
> copied over to the new array, and then the list of functions for the
> tracepoint is switched over to the new array. After a completion of an RCU
> grace period, the old array is freed.
> 
> This process happens for both adding a callback as well as removing one.
> But on removing a callback, if the new array fails to be allocated, the
> callback is not removed, and may be used after it is freed by the clients
> of the tracepoint.
> 
> There's really no reason to fail if the allocation for a new array fails
> when removing a function. Instead, the function can simply be replaced by a
> stub function that could be cleaned up on the next modification of the
> array. That is, instead of calling the function registered to the
> tracepoint, it would call a stub function in its place.
> 
> Link: https://lore.kernel.org/r/20201115055256.65625-1-mmullins@mmlx.us
> Link: https://lore.kernel.org/r/20201116175107.02db396d@gandalf.local.home
> Link: https://lore.kernel.org/r/20201117211836.54acaef2@oasis.local.home
> 
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Song Liu <songliubraving@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@chromium.org>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: bpf <bpf@vger.kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Florian Weimer <fw@deneb.enyo.de>
> Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
> Reported-by: syzbot+83aa762ef23b6f0d1991@syzkaller.appspotmail.com
> Reported-by: syzbot+d29e58bb557324e55e5e@syzkaller.appspotmail.com
> Reported-by: Matt Mullins <mmullins@mmlx.us>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

I'm a bit late answering your initial query, but yes indeed this fixes
the bug I was hunting.  I just watched it live through the reproducer
for about a half-hour, while unpatched I get an instant "BUG: unable to
handle page fault".

Tested-by: Matt Mullins <mmullins@mmlx.us>

> ---
> Changes since v2:
>    - Went back to using a stub function and not touching
>       the fast path.
>    - Removed adding __GFP_NOFAIL from the allocation of the removal.
> 
>  kernel/tracepoint.c | 80 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 64 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 3f659f855074..3e261482296c 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -53,6 +53,12 @@ struct tp_probes {
>  	struct tracepoint_func probes[];
>  };
>  
> +/* Called in removal of a func but failed to allocate a new tp_funcs */
> +static void tp_stub_func(void)
> +{
> +	return;
> +}
> +
>  static inline void *allocate_probes(int count)
>  {
>  	struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
> @@ -131,6 +137,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  {
>  	struct tracepoint_func *old, *new;
>  	int nr_probes = 0;
> +	int stub_funcs = 0;
>  	int pos = -1;
>  
>  	if (WARN_ON(!tp_func->func))
> @@ -147,14 +154,34 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
>  			if (old[nr_probes].func == tp_func->func &&
>  			    old[nr_probes].data == tp_func->data)
>  				return ERR_PTR(-EEXIST);
> +			if (old[nr_probes].func == tp_stub_func)
> +				stub_funcs++;
>  		}
>  	}
> -	/* + 2 : one for new probe, one for NULL func */
> -	new = allocate_probes(nr_probes + 2);
> +	/* + 2 : one for new probe, one for NULL func - stub functions */
> +	new = allocate_probes(nr_probes + 2 - stub_funcs);
>  	if (new == NULL)
>  		return ERR_PTR(-ENOMEM);
>  	if (old) {
> -		if (pos < 0) {
> +		if (stub_funcs) {
> +			/* Need to copy one at a time to remove stubs */
> +			int probes = 0;
> +
> +			pos = -1;
> +			for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> +				if (old[nr_probes].func == tp_stub_func)
> +					continue;
> +				if (pos < 0 && old[nr_probes].prio < prio)
> +					pos = probes++;
> +				new[probes++] = old[nr_probes];
> +			}
> +			nr_probes = probes;
> +			if (pos < 0)
> +				pos = probes;
> +			else
> +				nr_probes--; /* Account for insertion */
> +
> +		} else if (pos < 0) {
>  			pos = nr_probes;
>  			memcpy(new, old, nr_probes * sizeof(struct tracepoint_func));
>  		} else {
> @@ -188,8 +215,9 @@ static void *func_remove(struct tracepoint_func **funcs,
>  	/* (N -> M), (N > 1, M >= 0) probes */
>  	if (tp_func->func) {
>  		for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> -			if (old[nr_probes].func == tp_func->func &&
> -			     old[nr_probes].data == tp_func->data)
> +			if ((old[nr_probes].func == tp_func->func &&
> +			     old[nr_probes].data == tp_func->data) ||
> +			    old[nr_probes].func == tp_stub_func)
>  				nr_del++;
>  		}
>  	}
> @@ -208,14 +236,32 @@ static void *func_remove(struct tracepoint_func **funcs,
>  		/* N -> M, (N > 1, M > 0) */
>  		/* + 1 for NULL */
>  		new = allocate_probes(nr_probes - nr_del + 1);
> -		if (new == NULL)
> -			return ERR_PTR(-ENOMEM);
> -		for (i = 0; old[i].func; i++)
> -			if (old[i].func != tp_func->func
> -					|| old[i].data != tp_func->data)
> -				new[j++] = old[i];
> -		new[nr_probes - nr_del].func = NULL;
> -		*funcs = new;
> +		if (new) {
> +			for (i = 0; old[i].func; i++)
> +				if ((old[i].func != tp_func->func
> +				     || old[i].data != tp_func->data)
> +				    && old[i].func != tp_stub_func)
> +					new[j++] = old[i];
> +			new[nr_probes - nr_del].func = NULL;
> +			*funcs = new;
> +		} else {
> +			/*
> +			 * Failed to allocate, replace the old function
> +			 * with calls to tp_stub_func.
> +			 */
> +			for (i = 0; old[i].func; i++)
> +				if (old[i].func == tp_func->func &&
> +				    old[i].data == tp_func->data) {
> +					old[i].func = tp_stub_func;
> +					/* Set the prio to the next event. */
> +					if (old[i + 1].func)
> +						old[i].prio =
> +							old[i + 1].prio;
> +					else
> +						old[i].prio = -1;
> +				}
> +			*funcs = old;
> +		}
>  	}
>  	debug_print_probes(*funcs);
>  	return old;
> @@ -295,10 +341,12 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>  	tp_funcs = rcu_dereference_protected(tp->funcs,
>  			lockdep_is_held(&tracepoints_mutex));
>  	old = func_remove(&tp_funcs, func);
> -	if (IS_ERR(old)) {
> -		WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);
> +	if (WARN_ON_ONCE(IS_ERR(old)))
>  		return PTR_ERR(old);
> -	}
> +
> +	if (tp_funcs == old)
> +		/* Failed allocating new tp_funcs, replaced func with stub */
> +		return 0;
>  
>  	if (!tp_funcs) {
>  		/* Removed last function */
> -- 
> 2.25.4
> 

^ permalink raw reply

* Re: [PATCH bpf-next 1/6] bpf: fix bpf_put_raw_tracepoint()'s use of __module_address()
From: Alexei Starovoitov @ 2020-11-24  5:49 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, kernel-team
In-Reply-To: <20201119232244.2776720-2-andrii@kernel.org>

On Thu, Nov 19, 2020 at 03:22:39PM -0800, Andrii Nakryiko wrote:
> __module_address() needs to be called with preemption disabled or with
> module_mutex taken. preempt_disable() is enough for read-only uses, which is
> what this fix does.
> 
> Fixes: a38d1107f937 ("bpf: support raw tracepoints in modules")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/trace/bpf_trace.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index d255bc9b2bfa..bb98a377050a 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2060,7 +2060,11 @@ struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
>  
>  void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>  {
> -	struct module *mod = __module_address((unsigned long)btp);
> +	struct module *mod;
> +
> +	preempt_disable();
> +	mod = __module_address((unsigned long)btp);
> +	preempt_enable();
>  
>  	if (mod)
>  		module_put(mod);

I don't understand why 'mod' cannot become dangling pointer after preempt_enable().
Either it needs a comment explaining why it's ok or module_put() should
be in preempt disabled section.

^ permalink raw reply

* RE: [PATCH v15 7/9] ptp: arm/arm64: Enable ptp_kvm for arm/arm64
From: Jianyong Wu @ 2020-11-24  5:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: netdev@vger.kernel.org, yangbo.lu@nxp.com, john.stultz@linaro.org,
	tglx@linutronix.de, pbonzini@redhat.com,
	sean.j.christopherson@intel.com, richardcochran@gmail.com,
	Mark Rutland, will@kernel.org, Suzuki Poulose, Andre Przywara,
	Steven Price, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Steve Capper,
	Justin He, nd
In-Reply-To: <7bd3a66253ca4b7adbe2294eb598a23f@kernel.org>

Hi Marc,

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Monday, November 23, 2020 6:49 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: netdev@vger.kernel.org; yangbo.lu@nxp.com; john.stultz@linaro.org;
> tglx@linutronix.de; pbonzini@redhat.com; sean.j.christopherson@intel.com;
> richardcochran@gmail.com; Mark Rutland <Mark.Rutland@arm.com>;
> will@kernel.org; Suzuki Poulose <Suzuki.Poulose@arm.com>; Andre
> Przywara <Andre.Przywara@arm.com>; Steven Price
> <Steven.Price@arm.com>; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; Steve Capper <Steve.Capper@arm.com>; Justin He
> <Justin.He@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v15 7/9] ptp: arm/arm64: Enable ptp_kvm for
> arm/arm64
> 
> On 2020-11-11 06:22, Jianyong Wu wrote:
> > Currently, there is no mechanism to keep time sync between guest and
> > host in arm/arm64 virtualization environment. Time in guest will drift
> > compared with host after boot up as they may both use third party time
> > sources to correct their time respectively. The time deviation will be
> > in order of milliseconds. But in some scenarios,like in cloud
> > envirenment, we ask
> 
> environment
> 
OK

> > for higher time precision.
> >
> > kvm ptp clock, which chooses the host clock source as a reference
> > clock to sync time between guest and host, has been adopted by x86
> > which takes the time sync order from milliseconds to nanoseconds.
> >
> > This patch enables kvm ptp clock for arm/arm64 and improves clock sync
> > precison
> 
> precision
>
OK
 
> > significantly.
> >
> > Test result comparisons between with kvm ptp clock and without it in
> > arm/arm64
> > are as follows. This test derived from the result of command 'chronyc
> > sources'. we should take more care of the last sample column which
> > shows the offset between the local clock and the source at the last
> > measurement.
> >
> > no kvm ptp in guest:
> > MS Name/IP address   Stratum Poll Reach LastRx Last sample
> >
> ==========================================================
> ==============
> > ^* dns1.synet.edu.cn      2   6   377    13  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    21  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    29  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    37  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    45  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    53  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    61  +1040us[+1581us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377     4   -130us[ +796us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    12   -130us[ +796us] +/-
> > 21ms
> > ^* dns1.synet.edu.cn      2   6   377    20   -130us[ +796us] +/-
> > 21ms
> >
> > in host:
> > MS Name/IP address   Stratum Poll Reach LastRx Last sample
> >
> ==========================================================
> ==============
> > ^* 120.25.115.20          2   7   377    72   -470us[ -603us] +/-
> > 18ms
> > ^* 120.25.115.20          2   7   377    92   -470us[ -603us] +/-
> > 18ms
> > ^* 120.25.115.20          2   7   377   112   -470us[ -603us] +/-
> > 18ms
> > ^* 120.25.115.20          2   7   377     2   +872ns[-6808ns] +/-
> > 17ms
> > ^* 120.25.115.20          2   7   377    22   +872ns[-6808ns] +/-
> > 17ms
> > ^* 120.25.115.20          2   7   377    43   +872ns[-6808ns] +/-
> > 17ms
> > ^* 120.25.115.20          2   7   377    63   +872ns[-6808ns] +/-
> > 17ms
> > ^* 120.25.115.20          2   7   377    83   +872ns[-6808ns] +/-
> > 17ms
> > ^* 120.25.115.20          2   7   377   103   +872ns[-6808ns] +/-
> > 17ms
> > ^* 120.25.115.20          2   7   377   123   +872ns[-6808ns] +/-
> > 17ms
> >
> > The dns1.synet.edu.cn is the network reference clock for guest and
> > 120.25.115.20 is the network reference clock for host. we can't get
> > the clock error between guest and host directly, but a roughly
> > estimated value will be in order of hundreds of us to ms.
> >
> > with kvm ptp in guest:
> > chrony has been disabled in host to remove the disturb by network
> > clock.
> >
> > MS Name/IP address         Stratum Poll Reach LastRx Last sample
> >
> ==========================================================
> ==============
> > * PHC0                    0   3   377     8     -7ns[   +1ns] +/-
> > 3ns
> > * PHC0                    0   3   377     8     +1ns[  +16ns] +/-
> > 3ns
> > * PHC0                    0   3   377     6     -4ns[   -0ns] +/-
> > 6ns
> > * PHC0                    0   3   377     6     -8ns[  -12ns] +/-
> > 5ns
> > * PHC0                    0   3   377     5     +2ns[   +4ns] +/-
> > 4ns
> > * PHC0                    0   3   377    13     +2ns[   +4ns] +/-
> > 4ns
> > * PHC0                    0   3   377    12     -4ns[   -6ns] +/-
> > 4ns
> > * PHC0                    0   3   377    11     -8ns[  -11ns] +/-
> > 6ns
> > * PHC0                    0   3   377    10    -14ns[  -20ns] +/-
> > 4ns
> > * PHC0                    0   3   377     8     +4ns[   +5ns] +/-
> > 4ns
> >
> > The PHC0 is the ptp clock which choose the host clock as its source
> > clock. So we can see that the clock difference between host and guest
> > is in order of ns.
> >
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> > ---
> >  drivers/clocksource/arm_arch_timer.c | 28 ++++++++++++++++++
> >  drivers/ptp/Kconfig                  |  2 +-
> >  drivers/ptp/Makefile                 |  1 +
> >  drivers/ptp/ptp_kvm_arm.c            | 44
> ++++++++++++++++++++++++++++
> >  4 files changed, 74 insertions(+), 1 deletion(-)  create mode 100644
> > drivers/ptp/ptp_kvm_arm.c
> >
> > diff --git a/drivers/clocksource/arm_arch_timer.c
> > b/drivers/clocksource/arm_arch_timer.c
> > index d55acffb0b90..b33c5a663d30 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -25,6 +25,7 @@
> >  #include <linux/sched/clock.h>
> >  #include <linux/sched_clock.h>
> >  #include <linux/acpi.h>
> > +#include <linux/arm-smccc.h>
> >
> >  #include <asm/arch_timer.h>
> >  #include <asm/virt.h>
> > @@ -1650,3 +1651,30 @@ static int __init arch_timer_acpi_init(struct
> > acpi_table_header *table)  }  TIMER_ACPI_DECLARE(arch_timer,
> > ACPI_SIG_GTDT, arch_timer_acpi_init);  #endif
> > +
> > +int kvm_arch_ptp_get_crosststamp(u64 *cycle, struct timespec64 *ts,
> > +			      struct clocksource **cs)
> > +{
> > +	struct arm_smccc_res hvc_res;
> > +	ktime_t ktime;
> > +	u32 ptp_counter;
> > +
> > +	if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> > +		ptp_counter = ARM_PTP_VIRT_COUNTER;
> > +	else
> > +		ptp_counter = ARM_PTP_PHY_COUNTER;
> > +
> > +
> 	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FU
> NC_ID,
> > +			     ptp_counter, &hvc_res);
> > +
> > +	if ((int)(hvc_res.a0) < 0)
> > +		return -EOPNOTSUPP;
> > +
> > +	ktime = (u64)hvc_res.a0 << 32 | hvc_res.a1;
> > +	*ts = ktime_to_timespec64(ktime);
> > +	*cycle = (u64)hvc_res.a2 << 32 | hvc_res.a3;
> 
> Endianness.
> 
> > +	*cs = &clocksource_counter;
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_crosststamp);
> > diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index
> > 942f72d8151d..677c7f696b70 100644
> > --- a/drivers/ptp/Kconfig
> > +++ b/drivers/ptp/Kconfig
> > @@ -106,7 +106,7 @@ config PTP_1588_CLOCK_PCH  config
> > PTP_1588_CLOCK_KVM
> >  	tristate "KVM virtual PTP clock"
> >  	depends on PTP_1588_CLOCK
> > -	depends on KVM_GUEST && X86
> > +	depends on KVM_GUEST && X86 ||
> (HAVE_ARM_SMCCC_DISCOVERY &&
> > ARM_ARCH_TIMER)
> >  	default y
> >  	help
> >  	  This driver adds support for using kvm infrastructure as a PTP
> > diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile index
> > 699a4e4d19c2..9fa5ede44b2b 100644
> > --- a/drivers/ptp/Makefile
> > +++ b/drivers/ptp/Makefile
> > @@ -5,6 +5,7 @@
> >
> >  ptp-y					:= ptp_clock.o ptp_chardev.o
> ptp_sysfs.o
> >  ptp_kvm-$(CONFIG_X86)			:= ptp_kvm_x86.o
> ptp_kvm_common.o
> > +ptp_kvm-$(CONFIG_HAVE_ARM_SMCCC)	:= ptp_kvm_arm.o
> ptp_kvm_common.o
> >  obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp.o
> >  obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
> >  obj-$(CONFIG_PTP_1588_CLOCK_INES)	+= ptp_ines.o
> > diff --git a/drivers/ptp/ptp_kvm_arm.c b/drivers/ptp/ptp_kvm_arm.c new
> > file mode 100644 index 000000000000..2212827c0384
> > --- /dev/null
> > +++ b/drivers/ptp/ptp_kvm_arm.c
> > @@ -0,0 +1,44 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + *  Virtual PTP 1588 clock for use with KVM guests
> > + *  Copyright (C) 2019 ARM Ltd.
> > + *  All Rights Reserved
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/err.h>
> > +#include <asm/hypervisor.h>
> > +#include <linux/module.h>
> > +#include <linux/psci.h>
> > +#include <linux/arm-smccc.h>
> > +#include <linux/timecounter.h>
> > +#include <linux/sched/clock.h>
> > +#include <asm/arch_timer.h>
> > +#include <asm/hypervisor.h>
> > +
> > +int kvm_arch_ptp_init(void)
> > +{
> > +	int ret;
> > +
> > +	ret =
> kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> > +	if (ret <= 0)
> > +		return -EOPNOTSUPP;
> > +
> > +	return 0;
> > +}
> > +
> > +int kvm_arch_ptp_get_clock(struct timespec64 *ts) {
> > +	ktime_t ktime;
> > +	struct arm_smccc_res hvc_res;
> > +
> > +
> 	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FU
> NC_ID,
> > +			     ARM_PTP_NONE_COUNTER, &hvc_res);
> 
> I really don't see the need to use a non-architectural counter ID.
> Using the virtual counter ID should just be fine, and shouldn't lead to any
> issue.
> 

> Am I missing something?

In this function, no counter data is needed. If virtual counter ID is used here, user may be confused with why we get virtual counter
data and do nothing with it. So I explicitly add a new "NONE" counter ID to make it clear.

WDYT?

Thanks
Jianyong
> 
> > +	if ((int)(hvc_res.a0) < 0)
> > +		return -EOPNOTSUPP;
> > +
> > +	ktime = (u64)hvc_res.a0 << 32 | hvc_res.a1;
> 
> Endianness.
> 
> > +	*ts = ktime_to_timespec64(ktime);
> > +
> > +	return 0;
> > +}
> 
> Thanks,
> 
>          M.
> --
> Jazz is not dead. It just smells funny...

^ permalink raw reply

* Re: [PATCH net-next v4 2/5] net/lapb: support netdev events
From: Martin Schiller @ 2020-11-24  5:29 UTC (permalink / raw)
  To: Xie He
  Cc: Jakub Kicinski, Andrew Hendry, David S. Miller, Linux X25,
	Linux Kernel Network Developers, LKML
In-Reply-To: <CAJht_EOA4+DSjnKYZX3udrXX9jGHRmFw3OQesUb3AncD2oowwA@mail.gmail.com>

On 2020-11-23 23:09, Xie He wrote:
> On Mon, Nov 23, 2020 at 11:36 AM Jakub Kicinski <kuba@kernel.org> 
> wrote:
>> 
>> > >  From this point of view it will be the best to handle the NETDEV_UP in
>> > > the lapb event handler and establish the link analog to the
>> > > NETDEV_CHANGE event if the carrier is UP.
>> >
>> > Thanks! This way we can make sure LAPB would automatically connect in
>> > all situations.
>> >
>> > Since we'll have a netif_carrier_ok check in NETDEV_UP handing, it
>> > might make the code look prettier to also have a netif_carrier_ok
>> > check in NETDEV_GOING_DOWN handing (for symmetry). Just a suggestion.
>> > You can do whatever looks good to you :)
>> 
>> Xie other than this the patches look good to you?
>> 
>> Martin should I expect a respin to follow Xie's suggestion
>> or should I apply v4?
> 
> There should be a respin because we need to handle the NETDEV_UP
> event. The lapbether driver (and possibly some HDLC WAN drivers)
> doesn't generate carrier events so we need to do auto-connect in the
> NETDEV_UP event.

I'll send a v5 with the appropriate change.

^ permalink raw reply

* RE: [PATCH v15 8/9] doc: add ptp_kvm introduction for arm64 support
From: Jianyong Wu @ 2020-11-24  5:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: netdev@vger.kernel.org, yangbo.lu@nxp.com, john.stultz@linaro.org,
	tglx@linutronix.de, pbonzini@redhat.com,
	sean.j.christopherson@intel.com, richardcochran@gmail.com,
	Mark Rutland, will@kernel.org, Suzuki Poulose, Andre Przywara,
	Steven Price, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Steve Capper,
	Justin He, nd
In-Reply-To: <38fad448a3a465e4c35994ce61f4d8dd@kernel.org>

Hi Marc,

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Monday, November 23, 2020 6:58 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: netdev@vger.kernel.org; yangbo.lu@nxp.com; john.stultz@linaro.org;
> tglx@linutronix.de; pbonzini@redhat.com; sean.j.christopherson@intel.com;
> richardcochran@gmail.com; Mark Rutland <Mark.Rutland@arm.com>;
> will@kernel.org; Suzuki Poulose <Suzuki.Poulose@arm.com>; Andre
> Przywara <Andre.Przywara@arm.com>; Steven Price
> <Steven.Price@arm.com>; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; Steve Capper <Steve.Capper@arm.com>; Justin He
> <Justin.He@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v15 8/9] doc: add ptp_kvm introduction for arm64
> support
> 
> On 2020-11-11 06:22, Jianyong Wu wrote:
> > PTP_KVM implementation depends on hypercall using SMCCC. So we
> > introduce a new SMCCC service ID. This doc explains how does the ID
> > define and how does PTP_KVM works on arm/arm64.
> >
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> > ---
> >  Documentation/virt/kvm/api.rst         |  9 +++++++
> >  Documentation/virt/kvm/arm/index.rst   |  1 +
> >  Documentation/virt/kvm/arm/ptp_kvm.rst | 29
> +++++++++++++++++++++
> > Documentation/virt/kvm/timekeeping.rst | 35
> ++++++++++++++++++++++++++
> >  4 files changed, 74 insertions(+)
> >  create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst
> >
> > diff --git a/Documentation/virt/kvm/api.rst
> > b/Documentation/virt/kvm/api.rst index 36d5f1f3c6dd..9843dbcbf770
> > 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6391,3 +6391,12 @@ When enabled, KVM will disable paravirtual
> > features provided to the  guest according to the bits in the
> > KVM_CPUID_FEATURES CPUID leaf  (0x40000001). Otherwise, a guest may
> > use the paravirtual features  regardless of what has actually been
> > exposed through the CPUID leaf.
> > +
> > +8.27 KVM_CAP_PTP_KVM
> > +--------------------
> > +
> > +:Architectures: arm64
> > +
> > +This capability indicates that KVM virtual PTP service is supported
> > +in
> > host.
> > +It must company with the implementation of KVM virtual PTP service in
> > host
> > +so VMM can probe if there is the service in host by checking this
> > capability.
> > diff --git a/Documentation/virt/kvm/arm/index.rst
> > b/Documentation/virt/kvm/arm/index.rst
> > index 3e2b2aba90fc..78a9b670aafe 100644
> > --- a/Documentation/virt/kvm/arm/index.rst
> > +++ b/Documentation/virt/kvm/arm/index.rst
> > @@ -10,3 +10,4 @@ ARM
> >     hyp-abi
> >     psci
> >     pvtime
> > +   ptp_kvm
> > diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst
> > b/Documentation/virt/kvm/arm/ptp_kvm.rst
> > new file mode 100644
> > index 000000000000..bb1e6cfefe44
> > --- /dev/null
> > +++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
> > @@ -0,0 +1,29 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +PTP_KVM support for arm/arm64
> > +=============================
> > +
> > +PTP_KVM is used for time sync between guest and host in a high
> > precision.
> > +It needs to get the wall time and counter value from the host and
> > transfer these
> > +to guest via hypercall service. So one more hypercall service has
> > +been
> > added.
> > +
> > +This new SMCCC hypercall is defined as:
> > +
> > +* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x86000001
> > +
> > +As both 32 and 64-bits ptp_kvm client should be supported, we choose
> > SMC32/HVC32
> > +calling convention.
> > +
> > +ARM_SMCCC_HYP_KVM_PTP_FUNC_ID:
> > +
> > +    =============    ==========    ==========
> > +    Function ID:     (uint32)      0x86000001
> > +    Arguments:	     (uint32)      ARM_PTP_PHY_COUNTER(1) or
> > ARM_PTP_VIRT_COUNTER(0)
> > +                                   which indicate acquiring physical
> > counter or
> > +                                   virtual counter respectively.
> > +    return value:    (uint32)      NOT_SUPPORTED(-1) or val0 and val1
> > represent
> > +                                   wall clock time and val2 and val3
> > represent
> > +                                   counter cycle.
> 
> This needs a lot more description:
> 
> - Which word contains what part of the data (upper/lower part of the 64bit
> data)
> - The endianness of the data returned

OK.

Thanks
Jianyong 
> 
>          M.
> --
> Jazz is not dead. It just smells funny...

^ permalink raw reply

* RE: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.
From: Jianyong Wu @ 2020-11-24  5:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: netdev@vger.kernel.org, yangbo.lu@nxp.com, john.stultz@linaro.org,
	tglx@linutronix.de, pbonzini@redhat.com,
	sean.j.christopherson@intel.com, richardcochran@gmail.com,
	Mark Rutland, will@kernel.org, Suzuki Poulose, Andre Przywara,
	Steven Price, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Steve Capper,
	Justin He, nd
In-Reply-To: <d409aa1cb7cfcbf4351e6c5fc34d9c7e@kernel.org>

Hi Marc,

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Monday, November 23, 2020 6:44 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: netdev@vger.kernel.org; yangbo.lu@nxp.com; john.stultz@linaro.org;
> tglx@linutronix.de; pbonzini@redhat.com; sean.j.christopherson@intel.com;
> richardcochran@gmail.com; Mark Rutland <Mark.Rutland@arm.com>;
> will@kernel.org; Suzuki Poulose <Suzuki.Poulose@arm.com>; Andre
> Przywara <Andre.Przywara@arm.com>; Steven Price
> <Steven.Price@arm.com>; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; Steve Capper <Steve.Capper@arm.com>; Justin He
> <Justin.He@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.
> 
> On 2020-11-11 06:22, Jianyong Wu wrote:
> > ptp_kvm will get this service through SMCC call.
> > The service offers wall time and cycle count of host to guest.
> > The caller must specify whether they want the host cycle count or the
> > difference between host cycle count and cntvoff.
> >
> > Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> > ---
> >  arch/arm64/kvm/hypercalls.c | 61
> +++++++++++++++++++++++++++++++++++++
> >  include/linux/arm-smccc.h   | 17 +++++++++++
> >  2 files changed, 78 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index b9d8607083eb..f7d189563f3d 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -9,6 +9,51 @@
> >  #include <kvm/arm_hypercalls.h>
> >  #include <kvm/arm_psci.h>
> >
> > +static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val) {
> > +	struct system_time_snapshot systime_snapshot;
> > +	u64 cycles = ~0UL;
> > +	u32 feature;
> > +
> > +	/*
> > +	 * system time and counter value must captured in the same
> > +	 * time to keep consistency and precision.
> > +	 */
> > +	ktime_get_snapshot(&systime_snapshot);
> > +
> > +	// binding ptp_kvm clocksource to arm_arch_counter
> > +	if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> > +		return;
> > +
> > +	val[0] = upper_32_bits(systime_snapshot.real);
> > +	val[1] = lower_32_bits(systime_snapshot.real);
> 
> What is the endianness of these values? I can't see it defined anywhere, and
> this is likely not to work if guest and hypervisor don't align.
> 
> > +
> > +	/*
> > +	 * which of virtual counter or physical counter being
> > +	 * asked for is decided by the r1 value of SMCCC
> > +	 * call. If no invalid r1 value offered, default cycle
> > +	 * value(-1) will be returned.
> > +	 * Note: keep in mind that feature is u32 and smccc_get_arg1
> > +	 * will return u64, so need auto cast here.
> > +	 */
> > +	feature = smccc_get_arg1(vcpu);
> > +	switch (feature) {
> > +	case ARM_PTP_VIRT_COUNTER:
> > +		cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu,
> > CNTVOFF_EL2);
> > +		break;
> > +	case ARM_PTP_PHY_COUNTER:
> > +		cycles = systime_snapshot.cycles;
> > +		break;
> > +	case ARM_PTP_NONE_COUNTER:
> 
> What is this "NONE" counter?

Yeah, there is no counter named "NONE". this is not a counter, and only means no counter data needed for guest and just do nothing.
If no this arm here, it will go to the default one and return "NOT_SUPPORTED"

> 
> > +		break;
> > +	default:
> > +		val[0] = SMCCC_RET_NOT_SUPPORTED;
> > +		break;
> > +	}
> > +	val[2] = upper_32_bits(cycles);
> > +	val[3] = lower_32_bits(cycles);
> 
> Same problem as above.
> 
> > +}
> > +
> >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
> >  	u32 func_id = smccc_get_function(vcpu); @@ -79,6 +124,22 @@ int
> > kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >  		break;
> >  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> >  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> > +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> > +		break;
> > +	/*
> > +	 * This serves virtual kvm_ptp.
> > +	 * Four values will be passed back.
> > +	 * reg0 stores high 32-bits of host ktime;
> > +	 * reg1 stores low 32-bits of host ktime;
> > +	 * For ARM_PTP_VIRT_COUNTER:
> > +	 * reg2 stores high 32-bits of difference of host cycles and cntvoff;
> > +	 * reg3 stores low 32-bits of difference of host cycles and cntvoff.
> > +	 * For ARM_PTP_PHY_COUNTER:
> > +	 * reg2 stores the high 32-bits of host cycles;
> > +	 * reg3 stores the low 32-bits of host cycles.
> > +	 */
> > +	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> > +		kvm_ptp_get_time(vcpu, val);
> >  		break;
> >  	default:
> >  		return kvm_psci_call(vcpu);
> > diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> > index d75408141137..a03c5dd409d3 100644
> > --- a/include/linux/arm-smccc.h
> > +++ b/include/linux/arm-smccc.h
> > @@ -103,6 +103,7 @@
> >
> >  /* KVM "vendor specific" services */
> >  #define ARM_SMCCC_KVM_FUNC_FEATURES		0
> > +#define ARM_SMCCC_KVM_FUNC_KVM_PTP		1
> 
> I think having KVM once in the name is enough.
> 
> >  #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
> >  #define ARM_SMCCC_KVM_NUM_FUNCS			128
> >
> > @@ -114,6 +115,22 @@
> >
> >  #define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED	1
> >
> > +/*
> > + * ptp_kvm is a feature used for time sync between vm and host.
> > + * ptp_kvm module in guest kernel will get service from host using
> > + * this hypercall ID.
> > + */
> > +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID
> 		\
> > +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,
> 		\
> > +			   ARM_SMCCC_SMC_32,
> 	\
> > +			   ARM_SMCCC_OWNER_VENDOR_HYP,
> 		\
> > +			   ARM_SMCCC_KVM_FUNC_KVM_PTP)
> > +
> > +/* ptp_kvm counter type ID */
> > +#define ARM_PTP_VIRT_COUNTER			0
> > +#define ARM_PTP_PHY_COUNTER			1
> > +#define ARM_PTP_NONE_COUNTER			2
> 
> The architecture definitely doesn't have this last counter.

Yeah, this is just represent no counter data needed from guest.
Some annotation should be added here.

Thanks
Jianyong

> 
> > +
> >  /* Paravirtualised time calls (defined by ARM DEN0057A) */
> >  #define ARM_SMCCC_HV_PV_TIME_FEATURES
> 	\
> >  	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,
> 	\
> 
> Thanks,
> 
>          M.
> --
> Jazz is not dead. It just smells funny...

^ permalink raw reply

* RE: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.
From: Jianyong Wu @ 2020-11-24  5:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Justin He, kvm@vger.kernel.org, netdev@vger.kernel.org,
	richardcochran@gmail.com, linux-kernel@vger.kernel.org,
	sean.j.christopherson@intel.com, Steven Price, Andre Przywara,
	john.stultz@linaro.org, yangbo.lu@nxp.com, pbonzini@redhat.com,
	tglx@linutronix.de, nd, will@kernel.org,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <276428d3d291f703e2f0c2c323194e98@kernel.org>

Hi Marc,

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Monday, November 23, 2020 7:59 PM
> To: Jianyong Wu <Jianyong.Wu@arm.com>
> Cc: Justin He <Justin.He@arm.com>; kvm@vger.kernel.org;
> netdev@vger.kernel.org; richardcochran@gmail.com; linux-
> kernel@vger.kernel.org; sean.j.christopherson@intel.com; Steven Price
> <Steven.Price@arm.com>; Andre Przywara <Andre.Przywara@arm.com>;
> john.stultz@linaro.org; yangbo.lu@nxp.com; pbonzini@redhat.com;
> tglx@linutronix.de; nd <nd@arm.com>; will@kernel.org;
> kvmarm@lists.cs.columbia.edu; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH v15 6/9] arm64/kvm: Add hypercall service for kvm ptp.
> 
> On 2020-11-23 10:44, Marc Zyngier wrote:
> > On 2020-11-11 06:22, Jianyong Wu wrote:
> >> ptp_kvm will get this service through SMCC call.
> >> The service offers wall time and cycle count of host to guest.
> >> The caller must specify whether they want the host cycle count or the
> >> difference between host cycle count and cntvoff.
> >>
> >> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> >> ---
> >>  arch/arm64/kvm/hypercalls.c | 61
> >> +++++++++++++++++++++++++++++++++++++
> >>  include/linux/arm-smccc.h   | 17 +++++++++++
> >>  2 files changed, 78 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/hypercalls.c
> >> b/arch/arm64/kvm/hypercalls.c index b9d8607083eb..f7d189563f3d
> 100644
> >> --- a/arch/arm64/kvm/hypercalls.c
> >> +++ b/arch/arm64/kvm/hypercalls.c
> >> @@ -9,6 +9,51 @@
> >>  #include <kvm/arm_hypercalls.h>
> >>  #include <kvm/arm_psci.h>
> >>
> >> +static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val) {
> >> +	struct system_time_snapshot systime_snapshot;
> >> +	u64 cycles = ~0UL;
> >> +	u32 feature;
> >> +
> >> +	/*
> >> +	 * system time and counter value must captured in the same
> >> +	 * time to keep consistency and precision.
> >> +	 */
> >> +	ktime_get_snapshot(&systime_snapshot);
> >> +
> >> +	// binding ptp_kvm clocksource to arm_arch_counter
> >> +	if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> >> +		return;
> >> +
> >> +	val[0] = upper_32_bits(systime_snapshot.real);
> >> +	val[1] = lower_32_bits(systime_snapshot.real);
> >
> > What is the endianness of these values? I can't see it defined
> > anywhere, and this is likely not to work if guest and hypervisor don't
> > align.
> 
> Scratch that. This is all passed via registers, so the endianness of the data is
> irrelevant. Please discard any comment about endianness I made in this
> review.
> 
Yeah, these data process and transfer are no relationship with endianness. Thanks.

> The documentation aspect still requires to be beefed up.

So the endianness description will be "no endianness restriction".

Thanks 
Jianyong

> 
> Thanks,
> 
>          M.
> --
> Jazz is not dead. It just smells funny...

^ permalink raw reply

* [net-next PATCH v5 3/4] net: dsa: mv88e6xxx: Add serdes interrupt support for MV88E6097
From: Chris Packham @ 2020-11-24  4:34 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv, davem, kuba, linux,
	pavana.sharma
  Cc: netdev, linux-kernel, Chris Packham
In-Reply-To: <20201124043440.28400-1-chris.packham@alliedtelesis.co.nz>

The MV88E6097 presents the serdes interrupts for ports 8 and 9 via the
Switch Global 2 registers. There is no additional layer of
enablinh/disabling the serdes interrupts like other mv88e6xxx switches.
Even though most of the serdes behaviour is the same as the MV88E6185
that chip does not provide interrupts for serdes events so unlike
earlier commits the functions added here are specific to the MV88E6097.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
Changes in v5:
- New

 drivers/net/dsa/mv88e6xxx/chip.c   |  3 ++
 drivers/net/dsa/mv88e6xxx/serdes.c | 47 ++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/serdes.h |  4 +++
 3 files changed, 54 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 545eb9c6c3fc..e7f68ac0c7e3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3308,6 +3308,9 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
 	.serdes_power = mv88e6185_serdes_power,
 	.serdes_get_lane = mv88e6185_serdes_get_lane,
 	.serdes_pcs_get_state = mv88e6185_serdes_pcs_get_state,
+	.serdes_irq_mapping = mv88e6390_serdes_irq_mapping,
+	.serdes_irq_enable = mv88e6097_serdes_irq_enable,
+	.serdes_irq_status = mv88e6097_serdes_irq_status,
 	.pot_clear = mv88e6xxx_g2_pot_clear,
 	.reset = mv88e6352_g1_reset,
 	.rmu_disable = mv88e6085_g1_rmu_disable,
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index d4f40a739b17..e60e8f0d0225 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -490,6 +490,53 @@ int mv88e6185_serdes_pcs_get_state(struct mv88e6xxx_chip *chip, int port,
 	return 0;
 }
 
+int mv88e6097_serdes_irq_enable(struct mv88e6xxx_chip *chip, int port, u8 lane,
+				bool enable)
+{
+	u8 cmode = chip->ports[port].cmode;
+
+	/* The serdes interrupts are enabled in the G2_INT_MASK register. We
+	 * need to return 0 to avoid returning -EOPNOTSUPP in
+	 * mv88e6xxx_serdes_irq_enable/mv88e6xxx_serdes_irq_disable
+	 */
+	switch (cmode) {
+	case MV88E6185_PORT_STS_CMODE_SERDES:
+	case MV88E6185_PORT_STS_CMODE_1000BASE_X:
+		return 0;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static void mv88e6097_serdes_irq_link(struct mv88e6xxx_chip *chip, int port)
+{
+	u16 status;
+	int err;
+
+	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, &status);
+	if (err) {
+		dev_err(chip->dev, "can't read port status: %d\n", err);
+		return;
+	}
+
+	dsa_port_phylink_mac_change(chip->ds, port, !!(status & MV88E6XXX_PORT_STS_LINK));
+}
+
+irqreturn_t mv88e6097_serdes_irq_status(struct mv88e6xxx_chip *chip, int port,
+					u8 lane)
+{
+	u8 cmode = chip->ports[port].cmode;
+
+	switch (cmode) {
+	case MV88E6185_PORT_STS_CMODE_SERDES:
+	case MV88E6185_PORT_STS_CMODE_1000BASE_X:
+		mv88e6097_serdes_irq_link(chip, port);
+		return IRQ_HANDLED;
+	}
+
+	return IRQ_NONE;
+}
+
 u8 mv88e6390_serdes_get_lane(struct mv88e6xxx_chip *chip, int port)
 {
 	u8 cmode = chip->ports[port].cmode;
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.h b/drivers/net/dsa/mv88e6xxx/serdes.h
index c24ec4122c9e..93822ef9bab8 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.h
+++ b/drivers/net/dsa/mv88e6xxx/serdes.h
@@ -110,10 +110,14 @@ int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int port, u8 lane,
 			   bool on);
 int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, u8 lane,
 			   bool on);
+int mv88e6097_serdes_irq_enable(struct mv88e6xxx_chip *chip, int port, u8 lane,
+				bool enable);
 int mv88e6352_serdes_irq_enable(struct mv88e6xxx_chip *chip, int port, u8 lane,
 				bool enable);
 int mv88e6390_serdes_irq_enable(struct mv88e6xxx_chip *chip, int port, u8 lane,
 				bool enable);
+irqreturn_t mv88e6097_serdes_irq_status(struct mv88e6xxx_chip *chip, int port,
+					u8 lane);
 irqreturn_t mv88e6352_serdes_irq_status(struct mv88e6xxx_chip *chip, int port,
 					u8 lane);
 irqreturn_t mv88e6390_serdes_irq_status(struct mv88e6xxx_chip *chip, int port,
-- 
2.29.2


^ permalink raw reply related

* [net-next PATCH v5 1/4] net: dsa: mv88e6xxx: Don't force link when using in-band-status
From: Chris Packham @ 2020-11-24  4:34 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv, davem, kuba, linux,
	pavana.sharma
  Cc: netdev, linux-kernel, Chris Packham
In-Reply-To: <20201124043440.28400-1-chris.packham@alliedtelesis.co.nz>

When a port is configured with 'managed = "in-band-status"' switch chips
like the 88E6390 need to propagate the SERDES link state to the MAC
because the link state is not correctly detected. This causes problems
on the 88E6185/88E6097 where the link partner won't see link state
changes because we're forcing the link.

To address this introduce a new device specific op port_sync_link() and
push the logic from mv88e6xxx_mac_link_up() into that. Provide an
implementation for the 88E6185 like devices which doesn't force the
link.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
Changes in v5:
- Update mv88e6xxx_mac_link_down code path
Changes in v4:
- Introduce new device op
- Remove review from Andrew as things have changed a lot
Changes in v3:
- None
Changes in v2:
- Add review from Andrew

 drivers/net/dsa/mv88e6xxx/chip.c | 35 +++++++++++++++++++++++++++----
 drivers/net/dsa/mv88e6xxx/chip.h |  4 ++++
 drivers/net/dsa/mv88e6xxx/port.c | 36 ++++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/port.h |  3 +++
 4 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index e8258db8c21e..296932b2b80d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -727,8 +727,8 @@ static void mv88e6xxx_mac_link_down(struct dsa_switch *ds, int port,
 
 	mv88e6xxx_reg_lock(chip);
 	if ((!mv88e6xxx_port_ppu_updates(chip, port) ||
-	     mode == MLO_AN_FIXED) && ops->port_set_link)
-		err = ops->port_set_link(chip, port, LINK_FORCED_DOWN);
+	     mode == MLO_AN_FIXED) && ops->port_sync_link)
+		err = ops->port_sync_link(chip, port, mode, false);
 	mv88e6xxx_reg_unlock(chip);
 
 	if (err)
@@ -768,8 +768,8 @@ static void mv88e6xxx_mac_link_up(struct dsa_switch *ds, int port,
 				goto error;
 		}
 
-		if (ops->port_set_link)
-			err = ops->port_set_link(chip, port, LINK_FORCED_UP);
+		if (ops->port_sync_link)
+			err = ops->port_sync_link(chip, port, mode, true);
 	}
 error:
 	mv88e6xxx_reg_unlock(chip);
@@ -3210,6 +3210,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
 	.phy_read = mv88e6185_phy_ppu_read,
 	.phy_write = mv88e6185_phy_ppu_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
@@ -3249,6 +3250,7 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.phy_read = mv88e6185_phy_ppu_read,
 	.phy_write = mv88e6185_phy_ppu_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6185_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_set_frame_mode = mv88e6085_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6185_port_set_egress_floods,
@@ -3279,6 +3281,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6185_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
@@ -3317,6 +3320,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_set_frame_mode = mv88e6085_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
@@ -3351,6 +3355,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
 	.phy_read = mv88e6185_phy_ppu_read,
 	.phy_write = mv88e6185_phy_ppu_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
@@ -3392,6 +3397,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6341_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6341_port_max_speed_mode,
@@ -3443,6 +3449,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
@@ -3484,6 +3491,7 @@ static const struct mv88e6xxx_ops mv88e6165_ops = {
 	.phy_read = mv88e6165_phy_read,
 	.phy_write = mv88e6165_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
@@ -3518,6 +3526,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -3560,6 +3569,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6352_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -3611,6 +3621,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -3653,6 +3664,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6352_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -3706,6 +3718,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
 	.phy_read = mv88e6185_phy_ppu_read,
 	.phy_write = mv88e6185_phy_ppu_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6185_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_set_frame_mode = mv88e6085_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6185_port_set_egress_floods,
@@ -3743,6 +3756,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6390_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
@@ -3802,6 +3816,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6390x_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6390x_port_max_speed_mode,
@@ -3861,6 +3876,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6390_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
@@ -3920,6 +3936,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6352_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -3978,6 +3995,7 @@ static const struct mv88e6xxx_ops mv88e6250_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6250_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -4015,6 +4033,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6390_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
@@ -4076,6 +4095,7 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
@@ -4118,6 +4138,7 @@ static const struct mv88e6xxx_ops mv88e6321_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
@@ -4158,6 +4179,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6341_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6341_port_max_speed_mode,
@@ -4211,6 +4233,7 @@ static const struct mv88e6xxx_ops mv88e6350_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -4251,6 +4274,7 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -4295,6 +4319,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6352_port_set_speed_duplex,
 	.port_tag_remap = mv88e6095_port_tag_remap,
@@ -4355,6 +4380,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6390_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
@@ -4418,6 +4444,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
 	.phy_read = mv88e6xxx_g2_smi_phy_read,
 	.phy_write = mv88e6xxx_g2_smi_phy_write,
 	.port_set_link = mv88e6xxx_port_set_link,
+	.port_sync_link = mv88e6xxx_port_sync_link,
 	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
 	.port_set_speed_duplex = mv88e6390x_port_set_speed_duplex,
 	.port_max_speed_mode = mv88e6390x_port_max_speed_mode,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 7faa61b7f8f8..3543055bcb51 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -417,6 +417,10 @@ struct mv88e6xxx_ops {
 	 */
 	int (*port_set_link)(struct mv88e6xxx_chip *chip, int port, int link);
 
+	/* Synchronise the port link state with that of the SERDES
+	 */
+	int (*port_sync_link)(struct mv88e6xxx_chip *chip, int port, unsigned int mode, bool isup);
+
 #define PAUSE_ON		1
 #define PAUSE_OFF		0
 
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 8128dc607cf4..77a5fd1798cd 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -162,6 +162,42 @@ int mv88e6xxx_port_set_link(struct mv88e6xxx_chip *chip, int port, int link)
 	return 0;
 }
 
+int mv88e6xxx_port_sync_link(struct mv88e6xxx_chip *chip, int port, unsigned int mode, bool isup)
+{
+	const struct mv88e6xxx_ops *ops = chip->info->ops;
+	int err = 0;
+	int link;
+
+	if (isup)
+		link = LINK_FORCED_UP;
+	else
+		link = LINK_FORCED_DOWN;
+
+	if (ops->port_set_link)
+		err = ops->port_set_link(chip, port, link);
+
+	return err;
+}
+
+int mv88e6185_port_sync_link(struct mv88e6xxx_chip *chip, int port, unsigned int mode, bool isup)
+{
+	const struct mv88e6xxx_ops *ops = chip->info->ops;
+	int err = 0;
+	int link;
+
+	if (mode == MLO_AN_INBAND)
+		link = LINK_UNFORCED;
+	else if (isup)
+		link = LINK_FORCED_UP;
+	else
+		link = LINK_FORCED_DOWN;
+
+	if (ops->port_set_link)
+		err = ops->port_set_link(chip, port, link);
+
+	return err;
+}
+
 static int mv88e6xxx_port_set_speed_duplex(struct mv88e6xxx_chip *chip,
 					   int port, int speed, bool alt_bit,
 					   bool force_bit, int duplex)
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index 44d76ac973f6..500e1d4896ff 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -298,6 +298,9 @@ int mv88e6390_port_set_rgmii_delay(struct mv88e6xxx_chip *chip, int port,
 
 int mv88e6xxx_port_set_link(struct mv88e6xxx_chip *chip, int port, int link);
 
+int mv88e6xxx_port_sync_link(struct mv88e6xxx_chip *chip, int port, unsigned int mode, bool isup);
+int mv88e6185_port_sync_link(struct mv88e6xxx_chip *chip, int port, unsigned int mode, bool isup);
+
 int mv88e6065_port_set_speed_duplex(struct mv88e6xxx_chip *chip, int port,
 				    int speed, int duplex);
 int mv88e6185_port_set_speed_duplex(struct mv88e6xxx_chip *chip, int port,
-- 
2.29.2


^ permalink raw reply related

* [net-next PATCH v5 2/4] net: dsa: mv88e6xxx: Support serdes ports on MV88E6097/6095/6185
From: Chris Packham @ 2020-11-24  4:34 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv, davem, kuba, linux,
	pavana.sharma
  Cc: netdev, linux-kernel, Chris Packham
In-Reply-To: <20201124043440.28400-1-chris.packham@alliedtelesis.co.nz>

Implement serdes_power, serdes_get_lane and serdes_pcs_get_state ops for
the MV88E6097/6095/6185 so that ports 8 & 9 can be supported as serdes
ports and directly connected to other network interfaces or to SFPs
without a PHY.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
Changes in v5:
- None
Changes in v4:
- None
Changes in v3:
- Add comment to mv88e6185_serdes_get_lane
- Add review from Andrew
Changes in v2:
- expand support to cover 6095 and 6185
- move serdes related code to serdes.c

 drivers/net/dsa/mv88e6xxx/chip.c   |  9 +++++
 drivers/net/dsa/mv88e6xxx/serdes.c | 62 ++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/serdes.h |  5 +++
 3 files changed, 76 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 296932b2b80d..545eb9c6c3fc 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3263,6 +3263,9 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.stats_get_strings = mv88e6095_stats_get_strings,
 	.stats_get_stats = mv88e6095_stats_get_stats,
 	.mgmt_rsvd2cpu = mv88e6185_g2_mgmt_rsvd2cpu,
+	.serdes_power = mv88e6185_serdes_power,
+	.serdes_get_lane = mv88e6185_serdes_get_lane,
+	.serdes_pcs_get_state = mv88e6185_serdes_pcs_get_state,
 	.ppu_enable = mv88e6185_g1_ppu_enable,
 	.ppu_disable = mv88e6185_g1_ppu_disable,
 	.reset = mv88e6185_g1_reset,
@@ -3302,6 +3305,9 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
 	.set_egress_port = mv88e6095_g1_set_egress_port,
 	.watchdog_ops = &mv88e6097_watchdog_ops,
 	.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
+	.serdes_power = mv88e6185_serdes_power,
+	.serdes_get_lane = mv88e6185_serdes_get_lane,
+	.serdes_pcs_get_state = mv88e6185_serdes_pcs_get_state,
 	.pot_clear = mv88e6xxx_g2_pot_clear,
 	.reset = mv88e6352_g1_reset,
 	.rmu_disable = mv88e6085_g1_rmu_disable,
@@ -3736,6 +3742,9 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
 	.set_egress_port = mv88e6095_g1_set_egress_port,
 	.watchdog_ops = &mv88e6097_watchdog_ops,
 	.mgmt_rsvd2cpu = mv88e6185_g2_mgmt_rsvd2cpu,
+	.serdes_power = mv88e6185_serdes_power,
+	.serdes_get_lane = mv88e6185_serdes_get_lane,
+	.serdes_pcs_get_state = mv88e6185_serdes_pcs_get_state,
 	.set_cascade_port = mv88e6185_g1_set_cascade_port,
 	.ppu_enable = mv88e6185_g1_ppu_enable,
 	.ppu_disable = mv88e6185_g1_ppu_disable,
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index 9c07b4f3d345..d4f40a739b17 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -428,6 +428,68 @@ u8 mv88e6341_serdes_get_lane(struct mv88e6xxx_chip *chip, int port)
 	return lane;
 }
 
+int mv88e6185_serdes_power(struct mv88e6xxx_chip *chip, int port, u8 lane,
+			   bool up)
+{
+	/* The serdes power can't be controlled on this switch chip but we need
+	 * to supply this function to avoid returning -EOPNOTSUPP in
+	 * mv88e6xxx_serdes_power_up/mv88e6xxx_serdes_power_down
+	 */
+	return 0;
+}
+
+u8 mv88e6185_serdes_get_lane(struct mv88e6xxx_chip *chip, int port)
+{
+	/* There are no configurable serdes lanes on this switch chip but we
+	 * need to return non-zero so that callers of
+	 * mv88e6xxx_serdes_get_lane() know this is a serdes port.
+	 */
+	switch (chip->ports[port].cmode) {
+	case MV88E6185_PORT_STS_CMODE_SERDES:
+	case MV88E6185_PORT_STS_CMODE_1000BASE_X:
+		return 0xff;
+	default:
+		return 0;
+	}
+}
+
+int mv88e6185_serdes_pcs_get_state(struct mv88e6xxx_chip *chip, int port,
+				   u8 lane, struct phylink_link_state *state)
+{
+	int err;
+	u16 status;
+
+	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, &status);
+	if (err)
+		return err;
+
+	state->link = !!(status & MV88E6XXX_PORT_STS_LINK);
+
+	if (state->link) {
+		state->duplex = status & MV88E6XXX_PORT_STS_DUPLEX ? DUPLEX_FULL : DUPLEX_HALF;
+
+		switch (status &  MV88E6XXX_PORT_STS_SPEED_MASK) {
+		case MV88E6XXX_PORT_STS_SPEED_1000:
+			state->speed = SPEED_1000;
+			break;
+		case MV88E6XXX_PORT_STS_SPEED_100:
+			state->speed = SPEED_100;
+			break;
+		case MV88E6XXX_PORT_STS_SPEED_10:
+			state->speed = SPEED_10;
+			break;
+		default:
+			dev_err(chip->dev, "invalid PHY speed\n");
+			return -EINVAL;
+		}
+	} else {
+		state->duplex = DUPLEX_UNKNOWN;
+		state->speed = SPEED_UNKNOWN;
+	}
+
+	return 0;
+}
+
 u8 mv88e6390_serdes_get_lane(struct mv88e6xxx_chip *chip, int port)
 {
 	u8 cmode = chip->ports[port].cmode;
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.h b/drivers/net/dsa/mv88e6xxx/serdes.h
index 14315f26228a..c24ec4122c9e 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.h
+++ b/drivers/net/dsa/mv88e6xxx/serdes.h
@@ -73,6 +73,7 @@
 #define MV88E6390_PG_CONTROL		0xf010
 #define MV88E6390_PG_CONTROL_ENABLE_PC		BIT(0)
 
+u8 mv88e6185_serdes_get_lane(struct mv88e6xxx_chip *chip, int port);
 u8 mv88e6341_serdes_get_lane(struct mv88e6xxx_chip *chip, int port);
 u8 mv88e6352_serdes_get_lane(struct mv88e6xxx_chip *chip, int port);
 u8 mv88e6390_serdes_get_lane(struct mv88e6xxx_chip *chip, int port);
@@ -85,6 +86,8 @@ int mv88e6390_serdes_pcs_config(struct mv88e6xxx_chip *chip, int port,
 				u8 lane, unsigned int mode,
 				phy_interface_t interface,
 				const unsigned long *advertise);
+int mv88e6185_serdes_pcs_get_state(struct mv88e6xxx_chip *chip, int port,
+				   u8 lane, struct phylink_link_state *state);
 int mv88e6352_serdes_pcs_get_state(struct mv88e6xxx_chip *chip, int port,
 				   u8 lane, struct phylink_link_state *state);
 int mv88e6390_serdes_pcs_get_state(struct mv88e6xxx_chip *chip, int port,
@@ -101,6 +104,8 @@ unsigned int mv88e6352_serdes_irq_mapping(struct mv88e6xxx_chip *chip,
 					  int port);
 unsigned int mv88e6390_serdes_irq_mapping(struct mv88e6xxx_chip *chip,
 					  int port);
+int mv88e6185_serdes_power(struct mv88e6xxx_chip *chip, int port, u8 lane,
+			   bool up);
 int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int port, u8 lane,
 			   bool on);
 int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int port, u8 lane,
-- 
2.29.2


^ permalink raw reply related

* [net-next PATCH v5 4/4] net: dsa: mv88e6xxx: Handle error in serdes_get_regs
From: Chris Packham @ 2020-11-24  4:34 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv, davem, kuba, linux,
	pavana.sharma
  Cc: netdev, linux-kernel, Chris Packham
In-Reply-To: <20201124043440.28400-1-chris.packham@alliedtelesis.co.nz>

If the underlying read operation failed we would end up writing stale
data to the supplied buffer. This would end up with the last
successfully read value repeating. Fix this by only writing the data
when we know the read was good. This will mean that failed values will
return 0xffff.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
Changes in v5:
- Add review from Andrew.
Changes in v4:
- new

 drivers/net/dsa/mv88e6xxx/serdes.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c b/drivers/net/dsa/mv88e6xxx/serdes.c
index e60e8f0d0225..3195936dc5be 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -400,14 +400,16 @@ void mv88e6352_serdes_get_regs(struct mv88e6xxx_chip *chip, int port, void *_p)
 {
 	u16 *p = _p;
 	u16 reg;
+	int err;
 	int i;
 
 	if (!mv88e6352_port_has_serdes(chip, port))
 		return;
 
 	for (i = 0 ; i < 32; i++) {
-		mv88e6352_serdes_read(chip, i, &reg);
-		p[i] = reg;
+		err = mv88e6352_serdes_read(chip, i, &reg);
+		if (!err)
+			p[i] = reg;
 	}
 }
 
@@ -1096,6 +1098,7 @@ void mv88e6390_serdes_get_regs(struct mv88e6xxx_chip *chip, int port, void *_p)
 	u16 *p = _p;
 	int lane;
 	u16 reg;
+	int err;
 	int i;
 
 	lane = mv88e6xxx_serdes_get_lane(chip, port);
@@ -1103,8 +1106,9 @@ void mv88e6390_serdes_get_regs(struct mv88e6xxx_chip *chip, int port, void *_p)
 		return;
 
 	for (i = 0 ; i < ARRAY_SIZE(mv88e6390_serdes_regs); i++) {
-		mv88e6390_serdes_read(chip, lane, MDIO_MMD_PHYXS,
-				      mv88e6390_serdes_regs[i], &reg);
-		p[i] = reg;
+		err = mv88e6390_serdes_read(chip, lane, MDIO_MMD_PHYXS,
+					    mv88e6390_serdes_regs[i], &reg);
+		if (!err)
+			p[i] = reg;
 	}
 }
-- 
2.29.2


^ permalink raw reply related

* [net-next PATCH v5 0/4] net: dsa: mv88e6xxx: serdes link without phy
From: Chris Packham @ 2020-11-24  4:34 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv, davem, kuba, linux,
	pavana.sharma
  Cc: netdev, linux-kernel, Chris Packham

This small series gets my hardware into a working state. The key points are to
make sure we don't force the link and that we ask the MAC for the link status.
I also have updated my dts to say `phy-mode = "1000base-x";` and `managed =
"in-band-status";`

I've dropped the patch for the 88E6123 as it's a distraction and I lack
hardware to do any proper testing with it. Earlier versions are on the mailing
list if anyone wants to pick it up in the future.

I notice there's a series for mv88e6393x circulating on the netdev mailing
list. As patch #1 is adding a new device specific op either this series will
need updating to cover the mv88e6393x or the mv88e6393x series will need
updating for the new op depenting on which lands first.

Chris Packham (4):
  net: dsa: mv88e6xxx: Don't force link when using in-band-status
  net: dsa: mv88e6xxx: Support serdes ports on MV88E6097/6095/6185
  net: dsa: mv88e6xxx: Add serdes interrupt support for MV88E6097
  net: dsa: mv88e6xxx: Handle error in serdes_get_regs

 drivers/net/dsa/mv88e6xxx/chip.c   |  47 ++++++++++-
 drivers/net/dsa/mv88e6xxx/chip.h   |   4 +
 drivers/net/dsa/mv88e6xxx/port.c   |  36 +++++++++
 drivers/net/dsa/mv88e6xxx/port.h   |   3 +
 drivers/net/dsa/mv88e6xxx/serdes.c | 123 +++++++++++++++++++++++++++--
 drivers/net/dsa/mv88e6xxx/serdes.h |   9 +++
 6 files changed, 213 insertions(+), 9 deletions(-)

-- 
2.29.2


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox