Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] inet_diag: fix reporting cgroup classid and fallback to priority
From: Konstantin Khlebnikov @ 2019-02-09 10:35 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller; +Cc: Sasha Levin, linux-sctp

Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.

Extension INET_DIAG_CLASS_ID has not way to request from the beginning.

This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.

Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.

Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
---
 include/uapi/linux/inet_diag.h |   16 +++++++++++-----
 net/ipv4/inet_diag.c           |   10 +++++++++-
 net/sctp/diag.c                |    1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 14565d703291..e8baca85bac6 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -137,15 +137,21 @@ enum {
 	INET_DIAG_TCLASS,
 	INET_DIAG_SKMEMINFO,
 	INET_DIAG_SHUTDOWN,
-	INET_DIAG_DCTCPINFO,
-	INET_DIAG_PROTOCOL,  /* response attribute only */
+
+	/*
+	 * Next extenstions cannot be requested in struct inet_diag_req_v2:
+	 * its field idiag_ext has only 8 bits.
+	 */
+
+	INET_DIAG_DCTCPINFO,	/* request as INET_DIAG_VEGASINFO */
+	INET_DIAG_PROTOCOL,	/* response attribute only */
 	INET_DIAG_SKV6ONLY,
 	INET_DIAG_LOCALS,
 	INET_DIAG_PEERS,
 	INET_DIAG_PAD,
-	INET_DIAG_MARK,
-	INET_DIAG_BBRINFO,
-	INET_DIAG_CLASS_ID,
+	INET_DIAG_MARK,		/* only with CAP_NET_ADMIN */
+	INET_DIAG_BBRINFO,	/* request as INET_DIAG_VEGASINFO */
+	INET_DIAG_CLASS_ID,	/* request as INET_DIAG_TCLASS */
 	INET_DIAG_MD5SIG,
 	__INET_DIAG_MAX,
 };
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 1a4e9ff02762..5731670c560b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -108,6 +108,7 @@ static size_t inet_sk_attr_size(struct sock *sk,
 		+ nla_total_size(1) /* INET_DIAG_TOS */
 		+ nla_total_size(1) /* INET_DIAG_TCLASS */
 		+ nla_total_size(4) /* INET_DIAG_MARK */
+		+ nla_total_size(4) /* INET_DIAG_CLASS_ID */
 		+ nla_total_size(sizeof(struct inet_diag_meminfo))
 		+ nla_total_size(sizeof(struct inet_diag_msg))
 		+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
@@ -287,12 +288,19 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 			goto errout;
 	}
 
-	if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+	if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) ||
+	    ext & (1 << (INET_DIAG_TCLASS - 1))) {
 		u32 classid = 0;
 
 #ifdef CONFIG_SOCK_CGROUP_DATA
 		classid = sock_cgroup_classid(&sk->sk_cgrp_data);
 #endif
+		/* Fallback to socket priority if class id isn't set.
+		 * Classful qdiscs use it as direct reference to class.
+		 * For cgroup2 classid is always zero.
+		 */
+		if (!classid)
+			classid = sk->sk_priority;
 
 		if (nla_put_u32(skb, INET_DIAG_CLASS_ID, classid))
 			goto errout;
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index 078f01a8d582..435847d98b51 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -256,6 +256,7 @@ static size_t inet_assoc_attr_size(struct sctp_association *asoc)
 		+ nla_total_size(1) /* INET_DIAG_TOS */
 		+ nla_total_size(1) /* INET_DIAG_TCLASS */
 		+ nla_total_size(4) /* INET_DIAG_MARK */
+		+ nla_total_size(4) /* INET_DIAG_CLASS_ID */
 		+ nla_total_size(addrlen * asoc->peer.transport_count)
 		+ nla_total_size(addrlen * addrcnt)
 		+ nla_total_size(sizeof(struct inet_diag_meminfo))


^ permalink raw reply related

* Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
From: Heiner Kallweit @ 2019-02-09 11:50 UTC (permalink / raw)
  To: Sander Eikelenboom, Eric Dumazet, Realtek linux nic maintainers,
	Eric Dumazet
  Cc: Linus Torvalds, linux-kernel, netdev
In-Reply-To: <88b80a6b-42e0-ce4e-8aad-4e23a17c7e65@eikelenboom.it>

On 09.02.2019 11:07, Sander Eikelenboom wrote:
> On 09/02/2019 10:59, Heiner Kallweit wrote:
>> On 09.02.2019 10:34, Sander Eikelenboom wrote:
>>> On 09/02/2019 10:02, Heiner Kallweit wrote:
>>>> On 09.02.2019 00:09, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>>>>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>>>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>>>>>> L.S.,
>>>>>>>>>>>
>>>>>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>>>>>
>>>>>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>>>>>
>>>>>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>>>>>
>>>>>>>>> Hmm i did some diging and i think:
>>>>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>>>>>
>>>>>>>> You're right. Thought this was added in 4.20 already.
>>>>>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>>>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>>>>>> have onboard Realtek network I have quite a few testers out there.
>>>>>>>> Does the issue occur under specific circumstances like very high load?
>>>>>>>
>>>>>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>>>>>> on the host.
>>>>>>>
>>>>>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>>>>>> as author of the underlying changes.
>>>>>>>
>>>>>>> It could also be the barriers weren't that unneeded as assumed.
>>>>>>
>>>>>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>>>>>> test also with only 
>>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>>> removed.
>>>>>>
>>>>>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>>>>>
>>>>>> Sure, thanks.
>>>>>>
>>>>>>> BTW am i correct these patches are merely optimizations ?
>>>>>>
>>>>>> Yes
>>>>>>
>>>>>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>>>>>> to revert them for 5.0 and try again for 5.1 ?
>>>>>>>
>>>>>> Before removing both it would be good to test with only the barrier-removal removed.
>>>>>>
>>>>>
>>>>> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> looks buggy to me, since the skb might have been freed already on another cpu when you call
>>>>>
>>>>> You could try :
>>>>>
>>>>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>>>>> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
>>>>> --- a/drivers/net/ethernet/realtek/r8169.c
>>>>> +++ b/drivers/net/ethernet/realtek/r8169.c
>>>>> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>>         dma_addr_t mapping;
>>>>>         u32 opts[2], len;
>>>>>         bool stop_queue;
>>>>> +       bool door_bell;
>>>>>         int frags;
>>>>>  
>>>>>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
>>>>> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>>         /* Force memory writes to complete before releasing descriptor */
>>>>>         dma_wmb();
>>>>>  
>>>>> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
>>>>> +
>>>>>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>>>>>  
>>>>>         /* Force all memory writes to complete before notifying device */
>>>>> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>>>>>         if (unlikely(stop_queue))
>>>>>                 netif_stop_queue(dev);
>>>>>  
>>>>> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
>>>>> +       if (door_bell) {
>>>>>                 RTL_W8(tp, TxPoll, NPQ);
>>>>>                 mmiowb();
>>>>>         }
>>>>>
>>>> Thanks a lot for checking and for the proposed fix.
>>>> Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?
>>>
>>> I have done that already during the night .. the results:
>>> - I can confirm 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 is the first commit which causes hitting the BUG_ON in lib/dynamic_queue_limits.c.
>>>   (in other word, with only reverting bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 it still blows up).
>>>
>>> - The Eric's patch only applies cleanly with bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 reverted, so that's what I tested.
>>>   The patch seems to prevent hitting the BUG_ON in lib/dynamic_queue_limits.c, it has run this night and I gave done a few kernel compiles
>>>   this morning. How ever during these kernel compiles i'm getting a transmit queue timeout which i haven't seen with 4.20.x, although i regularly
>>>   compile kernels in the same way as I do now. The only thing I can't say if that is due to this change, or if it's again something else.
>>>   Which makes me somewhat inclined to go testing the complete revert some more and see if I can trigger the queue timeout on that or not.
>>>
>>>   If I can, it is a separate issue.
>>>   If I can't it seems even with a patch it still seems as a regression in comparison with 4.20.x, for which
>>>   a revert would be the right thing to do (since as you indicated these are merely optimizations), 
>>>   which would give us more time for 5.1 to try to solve things on top of the 5.0-release-to-be.
>>>   (especially since I seem to still have other issues which need to be sorted out and time is limited)
>>>
>>>   The timeout in question:
>>>         [28336.869479] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out
>>>         [28336.881498] WARNING: CPU: 0 PID: 6925 at net/sched/sch_generic.c:461 dev_watchdog+0x20b/0x210
>>>         [28336.893358] Modules linked in:
>>>         [28336.904106] CPU: 0 PID: 6925 Comm: cc1 Tainted: G      D           5.0.0-rc5-20190208-thp-net-florian-rtl8169-eric-doflr+ #1
>>>         [28336.917385] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>>>         [28336.928988] RIP: e030:dev_watchdog+0x20b/0x210
>>>         [28336.940623] Code: 00 49 63 4e e0 eb 90 4c 89 e7 c6 05 ad d8 f1 00 01 e8 a9 32 fd ff 89 d9 48 89 c2 4c 89 e6 48 c7 c7 50 59 89 82 e8 e5 92 4d ff <0f> 0b eb c0 90 48 c7 47 08 00 00 00 00 48 c7 07 00 00 00 00 0f b7
>>>         [28336.965265] RSP: e02b:ffff88807d403ea0 EFLAGS: 00010286
>>>         [28336.977465] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff82a69db8
>>>         [28336.991265] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000200
>>>         [28337.008865] RBP: ffff88807936e41c R08: 0000000000000000 R09: 0000000000000819
>>>         [28337.022250] R10: 0000000000000202 R11: ffffffff8247ca80 R12: ffff88807936e000
>>>         [28337.035204] R13: 0000000000000000 R14: ffff88807936e440 R15: 0000000000000001
>>>         [28337.049832] FS:  00007f53e9bf3840(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>>>         [28337.062524] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>         [28337.075086] CR2: 00007f53e60c4000 CR3: 000000001a0be000 CR4: 0000000000000660
>>>         [28337.090052] Call Trace:
>>>         [28337.103615]  <IRQ>
>>>         [28337.116587]  ? qdisc_destroy+0x120/0x120
>>>         [28337.128905]  call_timer_fn+0x19/0x90
>>>         [28337.141892]  expire_timers+0x8b/0xa0
>>>         [28337.153354]  run_timer_softirq+0x7e/0x160
>>>         [28337.165931]  ? handle_irq_event_percpu+0x4c/0x70
>>>         [28337.176548]  ? handle_percpu_irq+0x32/0x50
>>>         [28337.186734]  __do_softirq+0xed/0x229
>>>         [28337.196404]  ? hypervisor_callback+0xa/0x20
>>>         [28337.207822]  irq_exit+0xb7/0xc0
>>>         [28337.218978]  xen_evtchn_do_upcall+0x27/0x40
>>>         [28337.230763]  xen_do_hypervisor_callback+0x29/0x40
>>>         [28337.241261]  </IRQ>
>>>         [28337.253283] RIP: e033:0xff7e62
>>>         [28337.264899] Code: 35 43 0f c7 00 4c 89 ef e8 8b 6d 67 ff 0f 1f 00 44 89 e0 44 89 e2 c1 e8 06 83 e2 3f 48 8b 0c c5 40 8d c6 01 48 0f a3 d1 72 0e <48> 8b 04 c5 50 8d c6 01 48 0f a3 d0 73 0b 44 89 e6 4c 89 ef e8 b5
>>>         [28337.288677] RSP: e02b:00007fff0fc6a340 EFLAGS: 00000202
>>>         [28337.299234] RAX: 0000000000000000 RBX: 00007f53e60c3580 RCX: 0000000000000000
>>>         [28337.309577] RDX: 0000000000000034 RSI: 0000000001e71a98 RDI: 00007fff0fc6a538
>>>         [28337.320724] RBP: 00007fff0fc6a4b0 R08: 0000000000000000 R09: 0000000000000000
>>>         [28337.331829] R10: 0000000000000001 R11: 00000000020cb3d0 R12: 0000000000000034
>>>         [28337.343900] R13: 00007fff0fc6a538 R14: 0000000000000000 R15: 0000000000000001
>>>         [28337.353977] ---[ end trace 6ff49f09286816b7 ]---
>>>
>> Thanks for your efforts. As usual this tx timeout trace says basically nothing except
>> "timeout" and root cause could be anything. Earlier you reported a memory allocation error,
>> did that occur again?
>> If we decide to revert, I'd leave removal of the memory barriers in (as it doesn't seem to
>> contribute to the issue) and just submit a patch to effectively revert
>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356.
> 
> I can't say if that is correct, because i haven't tested that.
> 
> Another thing I could test is:
>  - putting all the r8169 patches (and prerequisites) that went into 5.0 
>    up to bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3, onto 4.20.7 and see what that does.
>    If that would be feasible (not too many needed prerequisites out of r8169) and if 
>    you could spare me some time and prep such a branch somewhere so i can pull and compile that,
>    that would be great.
> 

Unfortunately there's quite a number of changes. Regarding __netdev_tx_sent_queue()
and watchdog timeout I found the following comment in drivers/net/ethernet/sfc/tx.c,
efx_enqueue_skb():

	if (__netdev_tx_sent_queue(tx_queue->core_txq, skb_len, xmit_more)) {
		struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);

		/* There could be packets left on the partner queue if those
		 * SKBs had skb->xmit_more set. If we do not push those they
		 * could be left for a long time and cause a netdev watchdog.
		 */
		if (txq2->xmit_more_available)
			efx_nic_push_buffers(txq2);

But I'm not sure whether the situation in r8169 is comparable. The following patch
implements what I mentioned earlier: It leaves all other 5.0 changes in place and
effectively reverts 2e6eedb4813e34d8d84ac0eb3afb668966f3f356. Would be great if
you could give it a try.


diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..3cca2ffb2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6192,7 +6192,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	struct device *d = tp_to_dev(tp);
 	dma_addr_t mapping;
 	u32 opts[2], len;
-	bool stop_queue;
 	int frags;
 
 	if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -6234,6 +6233,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	txd->opts2 = cpu_to_le32(opts[1]);
 
+	netdev_sent_queue(dev, skb->len);
+
 	skb_tx_timestamp(skb);
 
 	/* Force memory writes to complete before releasing descriptor */
@@ -6246,14 +6247,14 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	tp->cur_tx += frags + 1;
 
-	stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS);
-	if (unlikely(stop_queue))
-		netif_stop_queue(dev);
-
-	if (__netdev_sent_queue(dev, skb->len, skb->xmit_more))
-		RTL_W8(tp, TxPoll, NPQ);
+	RTL_W8(tp, TxPoll, NPQ);
 
-	if (unlikely(stop_queue)) {
+	if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
+		 * not miss a ring update when it notices a stopped queue.
+		 */
+		smp_wmb();
+		netif_stop_queue(dev);
 		/* Sync with rtl_tx:
 		 * - publish queue status and cur_tx ring index (write barrier)
 		 * - refresh dirty_tx ring index (read barrier).
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Heiner Kallweit @ 2019-02-09 13:22 UTC (permalink / raw)
  To: Maxime Chevallier, davem, Andrew Lunn
  Cc: netdev, linux-kernel, Florian Fainelli, Russell King,
	linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190207094939.27369-1-maxime.chevallier@bootlin.com>

On 07.02.2019 10:49, Maxime Chevallier wrote:
> Hello everyone,
> 
> This is the second iteration of the series introducing support for
> 2.5GBASET and 5GBASET to the network PHY infrastructure.
> 
> These 2 modes are described in the 802.3bz specifications, and allow to
> use 2.5G and 5G speeds on cat5e/cat6 cables.
> 
> The required infrastructure code is added for 2.5GBASET and 5GBASET, but
> only 2.5GBASET support is added to the Marvell10G driver. The reason is
> because the 5GBASER interface mode used to communicate with the MAC
> isn't implemented yet, and therefore this can't be tested.
> 
> This series has seen some rework since last time, see the full details
> in the patches changelog. The main highlights are :
> 
>  - Patch 1 moves the Pause ans Asym_Pause parameters update after
>    calling config_init. This allows us to change the phydev->supported
>    modes in config_init, while still forcing some quirks taken from
>    phydrv->features, to disable unsupported Pause modes. This was
>    proposed by Andrew.
> 
>  - Patch 2 generalizes the way we mask-out modes we don't want to use
>    when forcing the link speed through DT or ethtool, by walking through
>    the PHY settings table, as proposed by Russell.
> 
>  - Patch 4 implements automatic setting of TM, FIBRE and Backplane bits
>    from the list of supported linkmodes.
> 
>  - In Patch 5, we only read abilities from the PMA. Pause parameters
>    aren't built from the genphy_c45_pma_read_abilities, as it was done
>    in the previous iteration of the patch. We also amke use of
>    linkmode_mod_bit to make sure we mask-out unsupported modes.
> 
>  - In Patch 8, we manually check for the PMA device id to see if we have
>    to use a quirk when reading the 2.5G/5G Extended Abilities
> 
> Maxime Chevallier (10):
>   net: phy: Update PHY linkmodes after config_init
>   net: phy: Mask-out non-compatible modes when setting the max-speed
>   net: phy: Move of_set_phy_eee_broken to phy-core.c
>   net: phy: Automatically fill the generic TP, FIBRE and Backplane modes
>   net: phy: Extract genphy_c45_pma_read_abilities from marvell10g
>   net: phy: Add generic support for 2.5GBaseT and 5GBaseT
>   net: phy: marvell10g: Add support for 2.5GBASET
>   net: phy: marvell10g: Force reading of 2.5/5G
>   net: mvpp2: Add 2.5GBaseT support
>   net: phy: marvell10g: add support for the 88x2110 PHY
> 
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |   1 +
>  drivers/net/phy/marvell10g.c                  | 142 +++++------
>  drivers/net/phy/phy-c45.c                     | 111 ++++++++
>  drivers/net/phy/phy-core.c                    |  72 ++++++
>  drivers/net/phy/phy_device.c                  | 237 +++++++++---------
>  include/linux/linkmode.h                      |   6 +
>  include/linux/marvell_phy.h                   |   2 +
>  include/linux/phy.h                           |   3 +
>  include/uapi/linux/mdio.h                     |  16 ++
>  9 files changed, 405 insertions(+), 185 deletions(-)
> 
Hi Maxime,

Andrew and me are working on Aquantia PHY support and he handed over
to me a patch series which includes parts of the first version of your
series. Having said that I'm especially interested in your patches
5 and 6. Because your series is somewhat bigger and there are a few
review comments, preparing the next round may take time.

I'd propose that you extract generic patches being submission-ready
and split the patch series into two. I think the following patches
would be candidates for the first series: 2, 3, 5, 6
(provided they have no dependency on the other patches)
Based on that both of us can go on with our work.

Andrew, what do you think?

Heiner

^ permalink raw reply

* [PATCH net-next] net: phy: probe the PHY before determining the supported features
From: Heiner Kallweit @ 2019-02-09 13:46 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

From: Andrew Lunn <andrew@lunn.ch>
We will soon support asking the PHY at runtime to determine what
features it supports, rather than forcing it to be compile time.
But we should probe the PHY first. So probe the phy driver earlier.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 31f9e7c49..92b7a71df 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2220,6 +2220,18 @@ static int phy_probe(struct device *dev)
 
 	mutex_lock(&phydev->lock);
 
+	if (phydev->drv->probe) {
+		/* Deassert the reset signal */
+		phy_device_reset(phydev, 0);
+
+		err = phydev->drv->probe(phydev);
+		if (err) {
+			/* Assert the reset signal */
+			phy_device_reset(phydev, 1);
+			goto out;
+		}
+	}
+
 	/* Start out supporting everything. Eventually,
 	 * a controller will attach, and may modify one
 	 * or both of these values
@@ -2267,17 +2279,7 @@ static int phy_probe(struct device *dev)
 	/* Set the state to READY by default */
 	phydev->state = PHY_READY;
 
-	if (phydev->drv->probe) {
-		/* Deassert the reset signal */
-		phy_device_reset(phydev, 0);
-
-		err = phydev->drv->probe(phydev);
-		if (err) {
-			/* Assert the reset signal */
-			phy_device_reset(phydev, 1);
-		}
-	}
-
+out:
 	mutex_unlock(&phydev->lock);
 
 	return err;
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Heiner Kallweit @ 2019-02-09 14:24 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

From: Andrew Lunn <andrew@lunn.ch>
Add support for runtime determination of what the PHY supports, by
adding a new function to the phy driver. The get_features call should
set the phydev->supported member with the features the PHY supports.
It is only called if phydrv->features is NULL.

This requires minor changes to pause. The PHY driver should not set
pause abilities, except for when it has odd cause capabilities, e.g.
pause cannot be disabled. With this change, phydev->supported already
contains the drivers abilities, including pause. So rather than
considering phydrv->features, look at the phydev->supported, and
enable pause if neither of the pause bits are already set.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: fixed small checkpatch complaint in one comment]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 31 +++++++++++++++----------------
 include/linux/phy.h          |  6 ++++++
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 92b7a71df..8573d17ec 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2236,7 +2236,14 @@ static int phy_probe(struct device *dev)
 	 * a controller will attach, and may modify one
 	 * or both of these values
 	 */
-	linkmode_copy(phydev->supported, phydrv->features);
+	if (phydrv->features) {
+		linkmode_copy(phydev->supported, phydrv->features);
+	} else {
+		err = phydrv->get_features(phydev);
+		if (err)
+			goto out;
+	}
+
 	of_set_phy_supported(phydev);
 	linkmode_copy(phydev->advertising, phydev->supported);
 
@@ -2256,20 +2263,8 @@ static int phy_probe(struct device *dev)
 	 * (e.g. hardware erratum) where the driver wants to set only one
 	 * of these bits.
 	 */
-	if (test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydrv->features) ||
-	    test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydrv->features)) {
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT,
-				   phydev->supported);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-				   phydev->supported);
-		if (test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydrv->features))
-			linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
-					 phydev->supported);
-		if (test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-			     phydrv->features))
-			linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
-					 phydev->supported);
-	} else {
+	if (!test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydev->supported) &&
+	    !test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydev->supported)) {
 		linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
 				 phydev->supported);
 		linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
@@ -2315,7 +2310,11 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
 {
 	int retval;
 
-	if (WARN_ON(!new_driver->features)) {
+	/* Either the features are hard coded, or dynamically
+	 * determine. It cannot be both or neither
+	 */
+	if (WARN_ON((!new_driver->features && !new_driver->get_features) ||
+		    (new_driver->features && new_driver->get_features))) {
 		pr_err("%s: Driver features are missing\n", new_driver->name);
 		return -EINVAL;
 	}
diff --git a/include/linux/phy.h b/include/linux/phy.h
index f41bf651f..d2ffae992 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -502,6 +502,12 @@ struct phy_driver {
 	 */
 	int (*probe)(struct phy_device *phydev);
 
+	/*
+	 * Probe the hardware to determine what abilities it has.
+	 * Should only set phydev->supported.
+	 */
+	int (*get_features)(struct phy_device *phydev);
+
 	/* PHY Power Management */
 	int (*suspend)(struct phy_device *phydev);
 	int (*resume)(struct phy_device *phydev);
-- 
2.20.1


^ permalink raw reply related

* Re: [ISSUE][4.20.6] mlx5 and checksum failures
From: Ian Kumlien @ 2019-02-09 15:54 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Cong Wang, David Miller, Saeed Mahameed,
	Linux Kernel Network Developers
In-Reply-To: <CAA85sZv-6y2RmSkHPVV_j1fW_uk_mJ2FznssTm52K1YC6L+3aA@mail.gmail.com>

On Fri, Feb 8, 2019 at 5:29 PM Ian Kumlien <ian.kumlien@gmail.com> wrote
> On Thu, Feb 7, 2019 at 11:01 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Thu, Feb 7, 2019 at 7:43 PM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > > On Thu, Feb 7, 2019 at 2:17 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Thu, Feb 7, 2019 at 2:01 AM Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > > > > On Wed, Feb 6, 2019 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > It changes directly after the first hw checksum failure, I don't know why =/
> > > > >
> > > > > weird, Maybe a real check-summing issue/corruption on the PCI ?!
> > > >
> > > > Actually, it seems to have been introduced in 4.20.6 - 4.20.5 works just fine
>
> > > Great, the difference is only 120 patches.
> > > that is bisect-able, it will only take 5 iterations to find the
> > > offending commit.
> >
> > I just wish it wasn't a server that takes, what feels like 5 minutes to boot...
> >
> > All of these seas of sensors 2d and 3d... =P
> >
> > But, yep, that's the plan
>
> Huh, spent most of the day with two bisects and none of them yielded
> any results....
>
> Looks like I'll have to start investigating the elrepo kernel-ml build =(

Just realized that it's not an entirely fair comparison - since
retpolines wasn't enabled, damned old compilers...

^ permalink raw reply

* [PATCH] net: mvpp2: clear flow control modes in 10G mode
From: Russell King @ 2019-02-09 16:06 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev

When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
10G mode, it only ever set the flow control enable bits.  There is no
mechanism to clear these bits, which means that userspace is unable to
use standard APIs to disable flow control (the only way is to poke the
register directly.)

Fix the missing bit clearance to allow flow control to be disabled.
This means that, by default, as there is no negotiation in 10G modes
with mvpp2, flow control is now disabled rather than being rx-only.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index b63fac6ee2e6..199e6f17ee1b 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4532,8 +4532,13 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,

 	if (state->pause & MLO_PAUSE_TX)
 		ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+
 	if (state->pause & MLO_PAUSE_RX)
 		ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;

 	ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
 	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next] net: marvell: mvpp2: clear flow control modes in 10G mode
From: Russell King @ 2019-02-09 16:06 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev

When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
10G mode, it only ever set the flow control enable bits.  There is no
mechanism to clear these bits, which means that userspace is unable to
use standard APIs to disable flow control (the only way is to poke the
register directly.)

Fix the missing bit clearance to allow flow control to be disabled.
This means that, by default, as there is no negotiation in 10G modes
with mvpp2, flow control is now disabled rather than being rx-only.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index b63fac6ee2e6..199e6f17ee1b 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4532,8 +4532,13 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,

 	if (state->pause & MLO_PAUSE_TX)
 		ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
+
 	if (state->pause & MLO_PAUSE_RX)
 		ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
+	else
+		ctrl0 &= ~MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;

 	ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
 	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] net: mvpp2: clear flow control modes in 10G mode
From: Russell King - ARM Linux admin @ 2019-02-09 16:07 UTC (permalink / raw)
  To: Antoine Tenart, Maxime Chevallier
  Cc: Baruch Siach, Sven Auhagen, David S. Miller, netdev
In-Reply-To: <E1gsV8v-00044k-Bq@rmk-PC.armlinux.org.uk>

Please ignore this one - subject line is not correct.  Thanks.

On Sat, Feb 09, 2019 at 04:06:13PM +0000, Russell King wrote:
> When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
> 10G mode, it only ever set the flow control enable bits.  There is no
> mechanism to clear these bits, which means that userspace is unable to
> use standard APIs to disable flow control (the only way is to poke the
> register directly.)
> 
> Fix the missing bit clearance to allow flow control to be disabled.
> This means that, by default, as there is no negotiation in 10G modes
> with mvpp2, flow control is now disabled rather than being rx-only.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index b63fac6ee2e6..199e6f17ee1b 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -4532,8 +4532,13 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,
>  
>  	if (state->pause & MLO_PAUSE_TX)
>  		ctrl0 |= MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
> +	else
> +		ctrl0 &= ~MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN;
> +
>  	if (state->pause & MLO_PAUSE_RX)
>  		ctrl0 |= MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
> +	else
> +		ctrl0 &= ~MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN;
>  
>  	ctrl4 &= ~MVPP22_XLG_CTRL4_MACMODSELECT_GMAC;
>  	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC |
> -- 
> 2.7.4
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* [PATCH] net: hso: do not unregister if not registered
From: Yavuz, Tuba @ 2019-02-09 16:24 UTC (permalink / raw)
  To: netdev@vger.kernel.org

On an error path inside the hso_create_net_device function of the hso
driver, hso_free_net_device gets called. This causes potentially a
negative reference count in the net device if register_netdev has not
been called yet as hso_free_net_device calls unregister_netdev
regardless. I think the driver should distinguish these cases and call
unregister_netdev only if register_netdev has been called.

Signed-off-by: Tuba Yavuz <tuba@ece.ufl.edu>
---

--- linux-stable/drivers/net/usb/hso.c.orig	2019-01-27 14:45:58.232683119 -0500
+++ linux-stable/drivers/net/usb/hso.c	2019-02-05 17:54:17.056496019 -0500
@@ -2377,7 +2377,9 @@ static void hso_free_net_device(struct h

 	remove_net_device(hso_net->parent);

-	if (hso_net->net)
+	if (hso_net->net &&
+	    hso_net->net->reg_state == NETREG_REGISTERED)
 		unregister_netdev(hso_net->net);

 	/* start freeing */

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Andrew Lunn @ 2019-02-09 16:25 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Maxime Chevallier, davem, netdev, linux-kernel, Florian Fainelli,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <81c340ea-54b0-1abf-94af-b8dc4ee83e3a@gmail.com>

> I'd propose that you extract generic patches being submission-ready
> and split the patch series into two. I think the following patches
> would be candidates for the first series: 2, 3, 5, 6
> (provided they have no dependency on the other patches)
> Based on that both of us can go on with our work.
> 
> Andrew, what do you think?

Yes, that would help.

Heiner, can you also submit the .get_features patches soon. That is
another important piece of the puzzle for both drivers.

	Thanks
		Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: phy: Add support for 2.5GBASET PHYs
From: Heiner Kallweit @ 2019-02-09 16:28 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Maxime Chevallier, davem, netdev, linux-kernel, Florian Fainelli,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, mw
In-Reply-To: <20190209162545.GA30856@lunn.ch>

On 09.02.2019 17:25, Andrew Lunn wrote:
>> I'd propose that you extract generic patches being submission-ready
>> and split the patch series into two. I think the following patches
>> would be candidates for the first series: 2, 3, 5, 6
>> (provided they have no dependency on the other patches)
>> Based on that both of us can go on with our work.
>>
>> Andrew, what do you think?
> 
> Yes, that would help.
> 
> Heiner, can you also submit the .get_features patches soon. That is
> another important piece of the puzzle for both drivers.
> 
Just submitted it few hours ago.
https://patchwork.ozlabs.org/patch/1039237/

> 	Thanks
> 		Andrew
> 
Heiner

^ permalink raw reply

* Re: [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: Andrew Lunn @ 2019-02-09 16:35 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <19c1e5c4-2f86-a3ff-e971-a0098c605b3f@gmail.com>

On Sat, Feb 09, 2019 at 03:24:47PM +0100, Heiner Kallweit wrote:
> From: Andrew Lunn <andrew@lunn.ch>
> Add support for runtime determination of what the PHY supports, by
> adding a new function to the phy driver. The get_features call should
> set the phydev->supported member with the features the PHY supports.
> It is only called if phydrv->features is NULL.
> 
> This requires minor changes to pause. The PHY driver should not set
> pause abilities, except for when it has odd cause capabilities, e.g.
> pause cannot be disabled. With this change, phydev->supported already
> contains the drivers abilities, including pause. So rather than
> considering phydrv->features, look at the phydev->supported, and
> enable pause if neither of the pause bits are already set.

Hi Heiner

Ah, cool, these are the patches i was asking for, when you asked
about splitting Maxime's patches. There is one more in my tree which
converts the marvell10g to using this. I think that should be posted
as well. It makes it clear how this should be used, and it replaces
one of the patches in Maxime's set.

> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> [hkallweit1@gmail.com: fixed small checkpatch complaint in one comment]

Thanks.

	Andrew

^ permalink raw reply

* Re: [PATCH v1 net-next 1/4] net: dsa: microchip: prepare PHY for proper advertisement
From: Andrew Lunn @ 2019-02-09 16:54 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Sergio Paracuellos, Florian Fainelli, Pavel Machek,
	UNGLinuxDriver, netdev
In-Reply-To: <1549598829-25970-2-git-send-email-Tristram.Ha@microchip.com>

> +static void ksz9477_phy_setup(struct ksz_device *dev, int port,
> +			      struct phy_device *phy)
> +{
> +	/* ETHTOOL_LINK_MODE_Pause_BIT and ETHTOOL_LINK_MODE_Asym_Pause_BIT
> +	 * can be removed to disable flow control when rate limiting is used.
> +	 */
> +	if (port < dev->phy_port_cnt) {
> +		/* The MAC actually cannot run in 1000 half-duplex mode. */
> +		phy_remove_link_mode(phy,
> +				     ETHTOOL_LINK_MODE_1000baseT_Half_BIT);
> +	}

Hi Tristram

The comment about pause does not seem to match the code.

> +}
> +
>  static void ksz9477_port_setup(struct ksz_device *dev, int port, bool cpu_port)
>  {
>  	u8 data8;
> @@ -1151,6 +1164,7 @@ static int ksz9477_setup(struct dsa_switch *ds)
>  	.setup			= ksz9477_setup,
>  	.phy_read		= ksz9477_phy_read16,
>  	.phy_write		= ksz9477_phy_write16,
> +	.adjust_link		= ksz_adjust_link,
>  	.port_enable		= ksz_enable_port,
>  	.port_disable		= ksz_disable_port,
>  	.get_strings		= ksz9477_get_strings,
> @@ -1298,6 +1312,7 @@ static void ksz9477_switch_exit(struct ksz_device *dev)
>  	.get_port_addr = ksz9477_get_port_addr,
>  	.cfg_port_member = ksz9477_cfg_port_member,
>  	.flush_dyn_mac_table = ksz9477_flush_dyn_mac_table,
> +	.phy_setup = ksz9477_phy_setup,
>  	.port_setup = ksz9477_port_setup,
>  	.shutdown = ksz9477_reset_switch,
>  	.detect = ksz9477_switch_detect,
> diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
> index 8a5111f..a57bda7 100644
> --- a/drivers/net/dsa/microchip/ksz_common.c
> +++ b/drivers/net/dsa/microchip/ksz_common.c
> @@ -2,7 +2,7 @@
>  /*
>   * Microchip switch driver main logic
>   *
> - * Copyright (C) 2017-2018 Microchip Technology Inc.
> + * Copyright (C) 2017-2019 Microchip Technology Inc.
>   */
>  
>  #include <linux/delay.h>
> @@ -61,6 +61,22 @@ int ksz_phy_write16(struct dsa_switch *ds, int addr, int reg, u16 val)
>  }
>  EXPORT_SYMBOL_GPL(ksz_phy_write16);
>  
> +void ksz_adjust_link(struct dsa_switch *ds, int port,
> +		     struct phy_device *phydev)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	struct ksz_port *p = &dev->ports[port];
> +
> +	if (phydev->link) {
> +		dev->live_ports |= (1 << port) & dev->on_ports;
> +	} else if (p->phydev.link) {
> +		p->link_just_down = 1;
> +		dev->live_ports &= ~(1 << port);
> +	}

It is not very easy to understand what on_ports, live_ports and
link_just_down are all about. Could you simplify this, and make the
code symmetric? ksz_enable_port does not touch any of these, but
ksz_disable_port does?

grep live_ports *
ksz9477.c:		dev->live_ports = dev->host_mask;
ksz9477.c:				  dev->live_ports |= (1 << port);
ksz_common.c:				  dev->live_ports &= ~(1 << port);
ksz_priv.h:				  u16 live_ports;

So live_ports is not actually used for anything.

> +	p->phydev = *phydev;

This last statement seems like it belongs in phy_setup, which gets
called as part of port_enable.


^ permalink raw reply

* Re: Resource management for ndo_xdp_xmit (Was: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames)
From: Toke Høiland-Jørgensen @ 2019-02-09 16:56 UTC (permalink / raw)
  To: Saeed Mahameed, brouer@redhat.com
  Cc: hawk@kernel.org, virtualization@lists.linux-foundation.org,
	borkmann@iogearbox.net, Tariq Toukan, john.fastabend@gmail.com,
	mst@redhat.com, jakub.kicinski@netronome.com, dsahern@gmail.com,
	netdev@vger.kernel.org, jasowang@redhat.com, davem@davemloft.net,
	makita.toshiaki@lab.ntt.co.jp
In-Reply-To: <b4eb9dd1c7aaa3215ade8d3d17f397588a519ca3.camel@mellanox.com>

Saeed Mahameed <saeedm@mellanox.com> writes:

> On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote:
>> On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote:
>> > 
>> > So 
>> > 1) on dev_map_update_elem() we will call
>> > dev->dev->ndo_bpf() to notify the device on the intention to
>> > start/stop
>> > redirect, and wait for it to create/destroy the HW resources
>> > before/after actually updating the map
>> > 
>> 
>> silly me, dev_map_update_elem must be atomic, we can't hook driver
>> resource allocation to it, it must come as a separate request
>> (syscall)
>> from user space to request to create XDP redirect resources.
>> 
>
> Well, it is possible to render dev_map_update_elem non-atomic and fail
> BPF programs who try to update it in the verifier
> check_map_func_compatibility.
>
> if you know of any case where devmap needs to be updated from the BPF
> program please let me know.

Well, maybe? My plan for dealing with the non-map redirect variant is
essentially to have a hidden map that does just-in-time insertion of
ifindexes if they are not already there; and rework XDP_REDIRECT to use
that.

However, this would essentially be an insert from eBPF; but I guess
maybe it could be deferred until the RX-side driver exits its NAPI loop
(as we're buffering frames in the map anyway)?

-Toke

^ permalink raw reply

* Re: Resource management for ndo_xdp_xmit (Was: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames)
From: Toke Høiland-Jørgensen @ 2019-02-09 16:56 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: brouer@redhat.com, hawk@kernel.org,
	virtualization@lists.linux-foundation.org, borkmann@iogearbox.net,
	Tariq Toukan, john.fastabend@gmail.com, mst@redhat.com,
	dsahern@gmail.com, netdev@vger.kernel.org, jasowang@redhat.com,
	davem@davemloft.net, makita.toshiaki@lab.ntt.co.jp
In-Reply-To: <20190208180516.287f6ddc@cakuba.netronome.com>

Jakub Kicinski <jakub.kicinski@netronome.com> writes:

> On Sat, 9 Feb 2019 00:18:31 +0000, Saeed Mahameed wrote:
>> On Fri, 2019-02-08 at 15:17 -0800, Saeed Mahameed wrote:
>> > On Thu, 2019-02-07 at 19:08 +0000, Saeed Mahameed wrote:  
>> > > 
>> > > So 
>> > > 1) on dev_map_update_elem() we will call
>> > > dev->dev->ndo_bpf() to notify the device on the intention to
>> > > start/stop
>> > > redirect, and wait for it to create/destroy the HW resources
>> > > before/after actually updating the map
>> > >   
>> > 
>> > silly me, dev_map_update_elem must be atomic, we can't hook driver
>> > resource allocation to it, it must come as a separate request
>> > (syscall)
>> > from user space to request to create XDP redirect resources.
>> >   
>> 
>> Well, it is possible to render dev_map_update_elem non-atomic and fail
>> BPF programs who try to update it in the verifier
>> check_map_func_compatibility.
>> 
>> if you know of any case where devmap needs to be updated from the BPF
>> program please let me know.
>
> Did we find a solution to non-map redirect?

See my other reply to Saeed :)

-Toke

^ permalink raw reply

* Re: [PATCH v1 net-next 3/4] net: dsa: microchip: use readx_poll_time for polling
From: Andrew Lunn @ 2019-02-09 17:01 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Sergio Paracuellos, Florian Fainelli, Pavel Machek,
	UNGLinuxDriver, netdev
In-Reply-To: <1549598829-25970-4-git-send-email-Tristram.Ha@microchip.com>

On Thu, Feb 07, 2019 at 08:07:08PM -0800, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Replace register polling functions using timeout with readx_poll_time call.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
> ---
>  drivers/net/dsa/microchip/ksz9477.c | 91 +++++++++++--------------------------
>  1 file changed, 27 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
> index 4502e13..8391b9e 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -114,28 +114,11 @@ static void ksz9477_port_cfg32(struct ksz_device *dev, int port, int offset,
>  	data; \
>  })
>  
> -static int ksz9477_wait_vlan_ctrl_ready(struct ksz_device *dev, u32 waiton,
> -					int timeout)
> -{
> -	u8 data;
> -
> -	do {
> -		ksz_read8(dev, REG_SW_VLAN_CTRL, &data);
> -		if (!(data & waiton))
> -			break;
> -		usleep_range(1, 10);
> -	} while (timeout-- > 0);
> -
> -	if (timeout <= 0)
> -		return -ETIMEDOUT;
> -
> -	return 0;

Hi Tristram

I think it would be better to keep ksz9477_wait_vlan_ctrl_ready(),
ksz9477_wait_alu_ready() etc, and change there implementation to use
readx_poll_timeout(). The function names act as better documentation
for what we are waiting for than the parameters being passed to
readx_poll_timeout().

	Andrew

^ permalink raw reply

* Re: [PATCH v1 net-next 4/4] net: dsa: microchip: remove unnecessary include headers
From: Andrew Lunn @ 2019-02-09 17:01 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Sergio Paracuellos, Florian Fainelli, Pavel Machek,
	UNGLinuxDriver, netdev
In-Reply-To: <1549598829-25970-5-git-send-email-Tristram.Ha@microchip.com>

On Thu, Feb 07, 2019 at 08:07:09PM -0800, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Remove unnecessary header include lines.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v1 net-next 2/4] net: dsa: microchip: add MIB counter reading support
From: Andrew Lunn @ 2019-02-09 17:22 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Sergio Paracuellos, Florian Fainelli, Pavel Machek,
	UNGLinuxDriver, netdev
In-Reply-To: <1549598829-25970-3-git-send-email-Tristram.Ha@microchip.com>

On Thu, Feb 07, 2019 at 08:07:07PM -0800, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Add MIB counter reading support.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
> ---
>  drivers/net/dsa/microchip/ksz9477.c    | 139 +++++++++++++++++++++++----------
>  drivers/net/dsa/microchip/ksz_common.c |  96 +++++++++++++++++++++++
>  drivers/net/dsa/microchip/ksz_common.h |   2 +
>  drivers/net/dsa/microchip/ksz_priv.h   |   7 +-
>  4 files changed, 198 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
> index 0fdb22d..4502e13 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -10,6 +10,7 @@
>  #include <linux/gpio.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/iopoll.h>
>  #include <linux/platform_data/microchip-ksz.h>
>  #include <linux/phy.h>
>  #include <linux/etherdevice.h>
> @@ -18,8 +19,8 @@
>  #include <net/switchdev.h>
>  
>  #include "ksz_priv.h"
> -#include "ksz_common.h"
>  #include "ksz9477_reg.h"
> +#include "ksz_common.h"
>  
>  static const struct {
>  	int index;
> @@ -92,6 +93,27 @@ static void ksz9477_port_cfg32(struct ksz_device *dev, int port, int offset,
>  	ksz_write32(dev, addr, data);
>  }
>  
> +#define read8_op(addr)	\
> +({ \
> +	u8 data; \
> +	ksz_read8(dev, addr, &data); \
> +	data; \
> +})
> +
> +#define read32_op(addr)	\
> +({ \
> +	u32 data; \
> +	ksz_read32(dev, addr, &data); \
> +	data; \
> +})

These two are not used. Please remove them.

> +
> +#define pread32_op(addr)	\
> +({ \
> +	u32 data; \
> +	ksz_pread32(dev, port, addr, &data); \
> +	data; \
> +})


It works, but it is not nice, and it makes assumptions about how
readx_poll_timeout is implemented.

> +	ret = readx_poll_timeout(pread32_op, REG_PORT_MIB_CTRL_STAT__4, data,
> +				 !(data & MIB_COUNTER_READ), 10, 1000);

The macro is only used one, and addr is fixed,
REG_PORT_MIB_CTRL_STAT__4. So you can at least replace addr with port,
and rename the macro pread32_stat(port).


> +	/* failed to read MIB. get out of loop */
> +	if (ret < 0) {
> +		dev_dbg(dev->dev, "Failed to get MIB\n");
> +		return;
> +	}
> +
> +	/* count resets upon read */
> +	ksz_pread32(dev, port, REG_PORT_MIB_DATA, &data);
> +	*cnt += data;
> +}
> +
> +static void ksz9477_r_mib_pkt(struct ksz_device *dev, int port, u16 addr,
> +			      u64 *dropped, u64 *cnt)
> +{
> +	addr = ksz9477_mib_names[addr].index;
> +	ksz9477_r_mib_cnt(dev, port, addr, cnt);
> +}
> +
> +static void ksz9477_freeze_mib(struct ksz_device *dev, int port, bool freeze)
> +{
> +	struct ksz_port *p = &dev->ports[port];
> +	u32 val = freeze ? MIB_COUNTER_FLUSH_FREEZE : 0;

Reverse Christmas tree.

> +
> +	/* enable/disable the port for flush/freeze function */
> +	mutex_lock(&p->mib.cnt_mutex);
> +	ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, val);
> +
> +	/* used by MIB counter reading code to know freeze is enabled */
> +	p->freeze = freeze;
> +	mutex_unlock(&p->mib.cnt_mutex);
> +}


> +static void ksz_mib_read_work(struct work_struct *work)
> +{
> +	struct ksz_device *dev =
> +		container_of(work, struct ksz_device, mib_read);
> +	struct ksz_port *p;
> +	struct ksz_port_mib *mib;
> +	int i;
> +
> +	for (i = 0; i < dev->mib_port_cnt; i++) {
> +		p = &dev->ports[i];
> +		if (!p->on)
> +			continue;
> +		mib = &p->mib;
> +		mutex_lock(&mib->cnt_mutex);
> +
> +		/* read only dropped counters when link is not up */
> +		if (p->link_just_down)
> +			p->link_just_down = 0;
> +		else if (!p->phydev.link)
> +			mib->cnt_ptr = dev->reg_mib_cnt;

This link_just_down stuff is not clear at all. Why can the drop
counters not be read when the link is up?

> +		port_r_cnt(dev, i);
> +		mutex_unlock(&mib->cnt_mutex);
> +	}
> +}

^ permalink raw reply

* Re: [PATCH net-next] net/tls: Disable async decrytion for tls1.3
From: David Miller @ 2019-02-09 17:28 UTC (permalink / raw)
  To: vakul.garg; +Cc: netdev, borisp, aviadye, davejwatson, doronrk
In-Reply-To: <20190209075110.19881-1-vakul.garg@nxp.com>

From: Vakul Garg <vakul.garg@nxp.com>
Date: Sat, 9 Feb 2019 07:53:28 +0000

> Function tls_sw_recvmsg() dequeues multiple records from stream parser
> and decrypts them. In case the decryption is done by async accelerator,
> the records may get submitted for decryption while the previous ones may
> not have been decryted yet. For tls1.3, the record type is known only
> after decryption. Therefore, for tls1.3, tls_sw_recvmsg() may submit
> records for decryption even if it gets 'handshake' records after 'data'
> records. These intermediate 'handshake' records may do a key updation.
> By the time new keys are given to ktls by userspace, it is possible that
> ktls has already submitted some records i(which are encrypted with new
> keys) for decryption using old keys. This would lead to decrypt failure.
> Therefore, async decryption of records should be disabled for tls1.3.
> 
> Fixes: 130b392c6cd6b ("net: tls: Add tls 1.3 support")
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: remove unneeded masking of PHY register read results
From: David Miller @ 2019-02-09 17:29 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <a644082b-3268-5070-ddaa-ff24ce9840c2@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 9 Feb 2019 09:46:53 +0100

> PHY registers are only 16 bits wide, therefore, if the read was
> successful, there's no need to mask out the higher 16 bits.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

Applied, thanks Heiner.

^ permalink raw reply

* Re: [PATCH net-next 01/16] Documentation: networking: switchdev: Update port parent ID section
From: Jiri Pirko @ 2019-02-09 17:22 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190209003248.31088-2-f.fainelli@gmail.com>

Sat, Feb 09, 2019 at 01:32:33AM CET, f.fainelli@gmail.com wrote:
>Update the section about switchdev drivers having to implement a
>switchdev_port_attr_get() function to return
>SWITCHDEV_ATTR_ID_PORT_PARENT_ID since that is no longer valid after
>commit bccb30254a4a ("net: Get rid of
>SWITCHDEV_ATTR_ID_PORT_PARENT_ID").
>
>Fixes: bccb30254a4a ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID")
>Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>---
> Documentation/networking/switchdev.txt | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>index f3244d87512a..2842f63ad47b 100644
>--- a/Documentation/networking/switchdev.txt
>+++ b/Documentation/networking/switchdev.txt
>@@ -92,11 +92,11 @@ device.
> Switch ID
> ^^^^^^^^^
> 
>-The switchdev driver must implement the switchdev op switchdev_port_attr_get
>-for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
>-physical ID for each port of a switch.  The ID must be unique between switches
>-on the same system.  The ID does not need to be unique between switches on
>-different systems.
>+The switchdev driver must implement the net_device operation
>+ndo_get_port_parent_id for each port netdev,  returning the same physical ID

Double space. Otherwise, this looks fine.

Acked-by: Jiri Pirko <jiri@mellanox.com>


>+for each port of a switch. The ID must be unique between switches on the same
>+system. The ID does not need to be unique between switches on different
>+systems.
> 
> The switch ID is used to locate ports on a switch and to know if aggregated
> ports belong to the same switch.
>-- 
>2.17.1
>

^ permalink raw reply

* Re: [PATCH net-next] net: phy: probe the PHY before determining the supported features
From: David Miller @ 2019-02-09 17:32 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <7163437e-3c36-0a6b-d328-1f54141c7a13@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 9 Feb 2019 14:46:30 +0100

> From: Andrew Lunn <andrew@lunn.ch>
> We will soon support asking the PHY at runtime to determine what
> features it supports, rather than forcing it to be compile time.
> But we should probe the PHY first. So probe the phy driver earlier.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: Add support for asking the PHY its abilities
From: David Miller @ 2019-02-09 17:33 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <19c1e5c4-2f86-a3ff-e971-a0098c605b3f@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 9 Feb 2019 15:24:47 +0100

> From: Andrew Lunn <andrew@lunn.ch>
> Add support for runtime determination of what the PHY supports, by
> adding a new function to the phy driver. The get_features call should
> set the phydev->supported member with the features the PHY supports.
> It is only called if phydrv->features is NULL.
> 
> This requires minor changes to pause. The PHY driver should not set
> pause abilities, except for when it has odd cause capabilities, e.g.
> pause cannot be disabled. With this change, phydev->supported already
> contains the drivers abilities, including pause. So rather than
> considering phydrv->features, look at the phydev->supported, and
> enable pause if neither of the pause bits are already set.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> [hkallweit1@gmail.com: fixed small checkpatch complaint in one comment]
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: marvell: mvpp2: clear flow control modes in 10G mode
From: David Miller @ 2019-02-09 17:34 UTC (permalink / raw)
  To: rmk+kernel
  Cc: antoine.tenart, maxime.chevallier, baruch, sven.auhagen, netdev
In-Reply-To: <E1gsV9X-00047r-QY@rmk-PC.armlinux.org.uk>

From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Sat, 09 Feb 2019 16:06:51 +0000

> When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
> 10G mode, it only ever set the flow control enable bits.  There is no
> mechanism to clear these bits, which means that userspace is unable to
> use standard APIs to disable flow control (the only way is to poke the
> register directly.)
> 
> Fix the missing bit clearance to allow flow control to be disabled.
> This means that, by default, as there is no negotiation in 10G modes
> with mvpp2, flow control is now disabled rather than being rx-only.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox