Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Alexandre Belloni @ 2016-04-15 20:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David S . Miller, Nicolas Ferre, netdev, linux-kernel,
	Andrew Lunn
In-Reply-To: <57114AA4.5080803@gmail.com>

On 15/04/2016 at 13:10:12 -0700, Florian Fainelli wrote :
> On 15/04/16 12:56, Alexandre Belloni wrote:
> > Commit d5c3d84657db ("net: phy: Avoid polling PHY with
> > PHY_IGNORE_INTERRUPTS") removed the last polling done on the phy. Since
> > then, the last actual poll done on the phy happens PHY_STATE_TIME seconds
> > (that is actually one second) after registering the phy. If the interface
> > is not UP by that time, any previous IRQ indicating the link is up is
> > ignored. Moreover, nothing will start the autonegociation so the phy will
> > simply change from READY to UP and never actually go to RUNNING.
> 
> What do you mean by that? phy_start() will start auto-negotiation.
> 

In my case, it doesn't because it switches the state from PHY_READY to
PHY_UP but phy_state_machine() is never called afterwards.

> > The one second delay explains why the issue is not seen when booting from
> > NFS or when the interface is configured at boot time.
> > 
> > To solve that, ensure the state machine is called as soon as the state
> > changes from READY to UP.
> 
> The fix may be good, but I would like to see which driver are you
> observing this with? Also, having a capture of the PHY state machine
> with debug prints enabled could help us figure out the sequence of
> events leading to what you observed.
> 

I'm using a macb with a Micrel KSZ8081.

Trace without my patch:
libphy: MACB_mii_bus: probed
macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
# ifconfig eth0 up
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready


With my patch:
libphy: MACB_mii_bus: probed
macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
# ifconfig eth0 up
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change UP -> AN
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change AN -> NOLINK
macb f8020000.ethernet eth0: link up (100/Full)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change CHANGELINK -> RUNNING
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready



> Assuming you might be using the macb driver, I see a race condition in
> how macb_probe() registers for its MDIO bus and probes for the PHY,
> after calling register_netdev(), which is something that is not good,
> because as soon as register_netdev() is called, an in-kernel notifier
> can start opening the device for use before you have returned...
> 

Well, I'm not sure  I'm running into that because phy_start() is only called
once I open the interface from userspace.


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: Stephen Hemminger @ 2016-04-15 20:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet,
	Jiri Benc, Marcelo Ricardo Leitner
In-Reply-To: <1460159706.2880965.573380353.39845928@webmail.messagingengine.com>

On Sat, 09 Apr 2016 01:55:06 +0200
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> 
> 
> On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
> > On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > Due to the fact that the udp socket is destructed asynchronously in a
> > > work queue, we have some nondeterministic behavior during shutdown of
> > > vxlan tunnels and creating new ones. Fix this by keeping the destruction
> > > process synchronous in regards to the user space process so IFF_UP can
> > > be reliably set.
> > >
> > > udp_tunnel_sock_release destroys vs->sock->sk if reference counter
> > > indicates so. We expect to have the same lifetime of vxlan_sock and
> > > vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
> > > only destruct the whole socket after we can be sure it cannot be found
> > > by searching vxlan_net->sock_list.
> > >
> > 
> > I am wondering what is the reason why we used work queue from
> > the beginning?
> 
> I actually don't know. It was like that from the beginning. I cc'ed
> Stephen, maybe he remembers?
> 
> Bye,
> Hannes

The problem was that VXLAN needs to update multicast settings and that
can't be done under RTNL.

^ permalink raw reply

* Re: [PATCH net-next 7/7] net: dsa: mv88e6xxx: drop switch id
From: Vivien Didelot @ 2016-04-15 21:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20160415193818.GE18523@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

<snip>

>> -#define PORT_SWITCH_ID_6350	0x3710
>> -#define PORT_SWITCH_ID_6351	0x3750
>> -#define PORT_SWITCH_ID_6352	0x3520
>
> NACK
>
> These numbers are not obvious. PORT_SWITCH_ID_6320 i can
> understand. 0x1150 i have no idea what it is.

0x1150 is not even correct. That's the product number (bits 4:15) masked
with an assumed revision 0 (bits 0:3).

That leads to confusion and error, as seen in the patch 2/7.

These values are now only used in a device description table, where they
seem pretty understandable to me.

This header file is full of inconsistencies. We have masks, offsets,
shifts, shifted and unshifted values, just for the sake of hidding said
magic numbers, while an explicit comment in a function could do the job.

But OK if we really want them defined, I'll introduce 12-bit
PORT_SWITCH_ID_PROD_NUM_* before dropping the 16-bit PORT_SWITCH_ID_*.

Thanks,
Vivien

^ permalink raw reply

* Re: [PATCH net-next 4/7] net: dsa: mv88e6xxx: add family to info
From: Vivien Didelot @ 2016-04-15 21:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <notmuch-sha1-5b8efc36765bacf253ff62234056042d5c4c5c37>

Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:

>>> +	{ MV88E6XXX_INFO(6165, 0x165, "Marvell 88E6165") },
>>
>> I think 
>>
>>> +	{ MV88E6XXX_INFO(MV88E6XXX_FAMILY_6165, 0x165, "Marvell 88E6165") },
>>
>> is clearer. It is hard to know what these values mean unless you go
>> look at the macro.
>
> Same goes for the MV88E6XXX_INFO macro... I wanted to avoid long lines
> while keeping the info table clear enough.
>
>     MV88E6XXX_INFO(0x121, "Marvell 88E6123",
>                    MV88E6XXX_FAMILY_6165,) },
>    /*             Family   Prod   Name             */
>    { MV88E6XXX_INFO(6165, 0x121, "Marvell 88E6123") },
>    { MV88E6XXX_INFO(6165, 0x161, "Marvell 88E6161") },
>    { MV88E6XXX_INFO(6165, 0x165, "Marvell 88E6165") },
>    { MV88E6XXX_INFO(6165, 0x165, "Marvell 88E6165") },
>
> But I don't really mind in fact, we'll do as you guys wish.

Oops, sent too fast. Thinking about that, I'll just keep plain struct
mv88e6xxx_info in the tables and we will maybe introduce such macro when
merging everything together.

Thanks,
Vivien

^ permalink raw reply

* Re: [PATCH net-next] net/hsr: Added support for HSR v1
From: David Miller @ 2016-04-15 21:08 UTC (permalink / raw)
  To: mail
  Cc: arvid.brodin, hannes, sd, henrik, nikolay, tgraf, linville, gospo,
	dsa, eranbe, ast, netdev, peter.heise
In-Reply-To: <20160413115222.GA42572@aircraft-controller>

From: Peter Heise <mail@pheise.de>
Date: Wed, 13 Apr 2016 13:52:22 +0200

> This patch adds support for the newer version 1 of the HSR
> networking standard. Version 0 is still default and the new
> version has to be selected via iproute2.
> 
> Main changes are in the supervision frame handling and its
> ethertype field.
> 
> Signed-off-by: Peter Heise <peter.heise@airbus.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/5] qed/qede: Add tunneling support
From: David Miller @ 2016-04-15 21:08 UTC (permalink / raw)
  To: manish.chopra; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460612313-20323-1-git-send-email-manish.chopra@qlogic.com>

From: Manish Chopra <manish.chopra@qlogic.com>
Date: Thu, 14 Apr 2016 01:38:28 -0400

> This patch series adds support for VXLAN, GRE and GENEVE tunnels
> to be used over this driver. With this support, adapter can perform
> TSO offload, inner/outer checksums offloads on TX and RX for
> encapsulated packets.
> 
> V1->V2 [ Comments from Jesse Gross incorporated ]
> * Drop general infrastructure change patch.
>   "net: Make vxlan/geneve default udp ports public"
> * Remove by default Linux default UDP ports configurations in driver.
>   Instead, use general registration APIs for UDP port configurations
> * Removing .ndo_features_check - we will add it later with proper change.
> 
> Please consider applying this series to net-next.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH] sctp: simplify sk_receive_queue locking
From: David Miller @ 2016-04-15 21:22 UTC (permalink / raw)
  To: marcelo.leitner; +Cc: netdev, linux-sctp, vyasevich, nhorman
In-Reply-To: <6c4b2f1fab1e792537cc1661b130724d1ea26279.1460583258.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 13 Apr 2016 19:12:29 -0300

> SCTP already serializes access to rcvbuf through its sock lock:
> sctp_recvmsg takes it right in the start and release at the end, while
> rx path will also take the lock before doing any socket processing. On
> sctp_rcv() it will check if there is an user using the socket and, if
> there is, it will queue incoming packets to the backlog. The backlog
> processing will do the same. Even timers will do such check and
> re-schedule if an user is using the socket.
> 
> Simplifying this will allow us to remove sctp_skb_list_tail and get ride
> of some expensive lockings.  The lists that it is used on are also
> mangled with functions like __skb_queue_tail and __skb_unlink in the
> same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
> sctp_close() will also purge those while using only the sock lock.
> 
> Therefore the lockings performed by sctp_skb_list_tail() are not
> necessary. This patch removes this function and replaces its calls with
> just skb_queue_splice_tail_init() instead.
> 
> The biggest gain is at sctp_ulpq_tail_event(), because the events always
> contain a list, even if it's queueing a single skb and this was
> triggering expensive calls to spin_lock_irqsave/_irqrestore for every
> data chunk received.
> 
> As SCTP will deliver each data chunk on a corresponding recvmsg, the
> more effective the change will be.
> Before this patch, with chunks with 30 bytes:
> netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
> 400000 -s 400000 400000
> on a 10Gbit link with 1500 MTU:
 ...
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH for-next V1 0/2] mlx5_core: mlx5_ifc updates
From: David Miller @ 2016-04-15 21:22 UTC (permalink / raw)
  To: saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: saeedm-VPRAkNaXOzVWk0Htik3J/w, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, talal-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <CALzJLG__o56edSf__To-BM6jakuwP7zAAEEQH8LQNTKp-2LRDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Saeed Mahameed <saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Date: Fri, 15 Apr 2016 20:10:07 +0300

> On Wed, Apr 13, 2016 at 7:11 PM, Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> Hi Dave and Doug
>>
>> Changes form V0:
>>         - 2nd patch commit message fixes.
>>
>> This series include mlx5_core updates for both net-next and rdma
>> trees for 4.7 kernel cycle. This is the only shared code planned
>> for 4.7 between rdma and net trees. Hopefully, this will prevent
>> future conflicts when merging between ib-next and net-next once
>> 4.7 cycle is over and merge window is opened.
>>
>> Both Mellanox rdma and net submissions will proceed once this series
>> is applied into both trees.
>>
>> Future shared code will be sent to both maintainers as pull requests
>> from Mellanox's kernel.org tree.
>>
>> We have included all the maintainers of respective drivers.
>> Kindly review the change and let us know in case of any review comments.
>>
>> Saeed Mahameed (1):
>>   net/mlx5: Update mlx5_ifc hardware features
>>
>> Tariq Toukan (1):
>>   net/mlx5: Fix mlx5 ifc cmd_hca_cap bad offsets
>>
>>  include/linux/mlx5/mlx5_ifc.h |  253 +++++++++++++++++++++++++++++------------
>>  1 files changed, 179 insertions(+), 74 deletions(-)
>>
>> --
> 
> Hi Dave,
> 
> This series is still in "Changes Requested" state in patchwork, but
> there is nothing to change here.
> I will be glad if you give it a shot, it is blocking all of our mlx5
> activities for both net and rdma trees.

Series applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCHv3 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: David Miller @ 2016-04-15 21:28 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel,
	eric.dumazet
In-Reply-To: <cd6bf9c2696f125a74491e1f82b77a30bb1005dd.1460618169.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 14 Apr 2016 15:35:30 +0800

> sctp_diag will dump some important details of sctp's assoc or ep, we use
> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
> it to sctp_diag.ko.
> 
> v2->v3:
> - we will not use list_for_each_safe in sctp_get_sctp_info, cause
>   all the callers of it will use lock_sock.
> 
> - fix the holes in struct sctp_info with __reserved* field.
>   because sctp_diag is a new feature, and sctp_info is just for now,
>   it may be changed in the future.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Feedback was given here not to mix the changelog and the commit message.

And I want to explicitly state that I totally and _COMPLETELY_ disagree
with this.

It is absolutely essential information and belongs in the commit message.

Adding more information never hurts, so don't do this crap of putting
things that might be useful to know after the "---", ever.

Someone in the future might ask "why didn't he implement it like XXX"
and the changelog can tell him that originally that is what was done
and feedback was given to do it differently.

So Xin thanks for correctly putting the changelog inside of the commit
message, so that future developers can benefit from this knowledge.

^ permalink raw reply

* Re: [PATCHv3 net-next 0/6] sctp: support sctp_diag in kernel
From: David Miller @ 2016-04-15 21:30 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 14 Apr 2016 15:35:29 +0800

> This patchset will add sctp_diag module to implement diag interface on
> sctp in kernel.
 ...

Looks good to me, series applied, thanks.

Please follow up on the suggestion to use jiffies_to_ms(), thanks.

^ permalink raw reply

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: Marcelo Ricardo Leitner @ 2016-04-15 21:47 UTC (permalink / raw)
  To: Stephen Hemminger, Hannes Frederic Sowa
  Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet,
	Jiri Benc
In-Reply-To: <20160415135832.773707e3@xeon-e3>

Em 15-04-2016 17:58, Stephen Hemminger escreveu:
> On Sat, 09 Apr 2016 01:55:06 +0200
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>
>>
>>
>> On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
>>> On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>>> Due to the fact that the udp socket is destructed asynchronously in a
>>>> work queue, we have some nondeterministic behavior during shutdown of
>>>> vxlan tunnels and creating new ones. Fix this by keeping the destruction
>>>> process synchronous in regards to the user space process so IFF_UP can
>>>> be reliably set.
>>>>
>>>> udp_tunnel_sock_release destroys vs->sock->sk if reference counter
>>>> indicates so. We expect to have the same lifetime of vxlan_sock and
>>>> vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
>>>> only destruct the whole socket after we can be sure it cannot be found
>>>> by searching vxlan_net->sock_list.
>>>>
>>>
>>> I am wondering what is the reason why we used work queue from
>>> the beginning?
>>
>> I actually don't know. It was like that from the beginning. I cc'ed
>> Stephen, maybe he remembers?
>>
>> Bye,
>> Hannes
>
> The problem was that VXLAN needs to update multicast settings and that
> can't be done under RTNL.

If the socket destroy was delayed just due to this, it should be all 
good then, because I took care of this multicast issue on commits

56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
54ff9ef36bdf ("ipv4, ipv6: kill ip_mc_{join, leave}_group and 
ipv6_sock_mc_{join, drop}")

   Marcelo

^ permalink raw reply

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: Hannes Frederic Sowa @ 2016-04-15 21:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, jbenc, marcelo.leitner
In-Reply-To: <20160415.163644.1883719564658558438.davem@davemloft.net>

On 15.04.2016 22:36, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri,  8 Apr 2016 22:55:01 +0200
>
>> @@ -1053,7 +1052,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>>   	vxlan_notify_del_rx_port(vs);
>>   	spin_unlock(&vn->sock_lock);
>>
>> -	queue_work(vxlan_wq, &vs->del_work);
>> +	synchronize_net();
>> +	udp_tunnel_sock_release(vs->sock);
>> +	kfree(vs);
>>   }
>>
>>   static void vxlan_sock_release(struct vxlan_dev *vxlan)
>
> I just want to make sure you saw this change in net-next:
>
> ====================
> commit ca065d0cf80fa547724440a8bf37f1e674d917c0
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Apr 1 08:52:13 2016 -0700
>
>      udp: no longer use SLAB_DESTROY_BY_RCU
> ====================
>
> Does that effect your change?

I have seen this patch and it does not affect this patch:

The socket is matched from the net->vxlan_net->hlist in fast path. I 
don't want to destruct (kern_sock_shutdown) the socket while packets 
could be in flight through the vxlan stack. We clean up socket memory 
and destruct after rcu again, but given that we only do this during 
ifdown of a vxlan interface I don't see that we need to optimize this again.

All other tunneling protocols don't look up sockets in fast path, so 
they don't need to protect against this.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net-next 7/7] net: dsa: mv88e6xxx: drop switch id
From: Andrew Lunn @ 2016-04-15 21:51 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <87k2jya8ot.fsf@ketchup.mtl.sfl>

On Fri, Apr 15, 2016 at 05:00:50PM -0400, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn <andrew@lunn.ch> writes:
> 
> <snip>
> 
> >> -#define PORT_SWITCH_ID_6350	0x3710
> >> -#define PORT_SWITCH_ID_6351	0x3750
> >> -#define PORT_SWITCH_ID_6352	0x3520
> >
> > NACK
> >
> > These numbers are not obvious. PORT_SWITCH_ID_6320 i can
> > understand. 0x1150 i have no idea what it is.
> 
> 0x1150 is not even correct. That's the product number (bits 4:15) masked
> with an assumed revision 0 (bits 0:3).
> 
> That leads to confusion and error, as seen in the patch 2/7.
> 
> These values are now only used in a device description table, where they
> seem pretty understandable to me.

      { MV88E6XXX_INFO(6320, 0x115, "Marvell 88E6320") },
      { MV88E6XXX_INFO(6320, 0x310, "Marvell 88E6321") },

What does 0x115 have to do with 6320?
What does 0x310 have to do with 6321?

Most do have a pattern, but not all. For a few devices, Marvell has
used /dev/random to pick the ID. Using the macro PORT_SWITCH_ID_6320
documents where these numbers come from, and how to figure out the
correct number of a new device, etc.

> But OK if we really want them defined, I'll introduce 12-bit
> PORT_SWITCH_ID_PROD_NUM_* before dropping the 16-bit
> PORT_SWITCH_ID_*.

I'm O.K. with that.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Andrew Lunn @ 2016-04-15 22:05 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415205613.GE25196@piout.net>

> Trace without my patch:
> libphy: MACB_mii_bus: probed
> macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> [...]
> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY

Are there some state changes before this? How is it getting to state
READY? It would expect it to start in DOWN, from when the phy device
was created in phy_device_create().

       Andrew

^ permalink raw reply

* Poorer networking performance in later kernels?
From: Butler, Peter @ 2016-04-15 21:02 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <SN1PR0301MB19987565B71A4688593A2CDBD6680@SN1PR0301MB1998.namprd03.prod.outlook.com>

(Please keep me CC'd to all comments/responses)

I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in networking performance.  Nothing was changed on the test systems, other than the kernel itself (and kernel modules).  The identical .config used to build the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, and any configuration differences (e.g. new parameters, etc.) were taken as default values.

The testing was performed on the same actual hardware for both kernel versions (i.e. take the existing 3.4.2 physical setup, simply boot into the (new) kernel and run the same test).  The netperf utility was used for benchmarking and the testing was always performed on idle systems.

TCP testing yielded the following results, where the 4.4.0 kernel only got about 1/2 of the throughput:

      Recv     Send       Send                          Utilization       Service Demand
      Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
      Size     Size       Size    Time       Throughput local    remote   local   remote
      bytes    bytes      bytes   secs.      10^6bits/s % S      % S      us/KB   us/KB

3.4.2 13631488 13631488   8952    30.01      9370.29    10.14    6.50     0.709   0.454
4.4.0 13631488 13631488   8952    30.02      5314.03    9.14     14.31    1.127   1.765

SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:

      Recv     Send       Send                          Utilization       Service Demand
      Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
      Size     Size       Size    Time       Throughput local    remote   local   remote
      bytes    bytes      bytes   secs.      10^6bits/s  % S     % S      us/KB   us/KB

3.4.2 13631488 13631488   8952    30.00      2306.22    13.87    13.19    3.941   3.747
4.4.0 13631488 13631488   8952    30.01       882.74    16.86    19.14    12.516  14.210

The same tests were performed a multitude of time, and are always consistent (within a few percent).  I've also tried playing with various run-time kernel parameters (/proc/sys/kernel/net/...) on the 4.4.0 kernel to alleviate the issue but have had no success at all.

I'm at a loss as to what could possibly account for such a discrepancy...

^ permalink raw reply

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Alexandre Belloni @ 2016-04-15 22:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415220508.GC26665@lunn.ch>

On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> > Trace without my patch:
> > libphy: MACB_mii_bus: probed
> > macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > [...]
> > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> 
> Are there some state changes before this? How is it getting to state
> READY? It would expect it to start in DOWN, from when the phy device
> was created in phy_device_create().
> 

No other changes. I forgot to mention that this is when booting with a
cable plugged in. Unplugging and replugging the cable makes the link
detection work fine even without the patch.

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Florian Fainelli @ 2016-04-15 22:23 UTC (permalink / raw)
  To: Alexandre Belloni, Andrew Lunn
  Cc: David S . Miller, Nicolas Ferre, netdev, linux-kernel
In-Reply-To: <20160415221711.GG25196@piout.net>

On 15/04/16 15:17, Alexandre Belloni wrote:
> On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
>>> Trace without my patch:
>>> libphy: MACB_mii_bus: probed
>>> macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
>>> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
>>> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
>>> [...]
>>> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
>>
>> Are there some state changes before this? How is it getting to state
>> READY? It would expect it to start in DOWN, from when the phy device
>> was created in phy_device_create().
>>
> 
> No other changes. I forgot to mention that this is when booting with a
> cable plugged in. Unplugging and replugging the cable makes the link
> detection work fine even without the patch.

OK, so the last hunk of the change in d5c3d84657db ("net: phy: Avoid
polling PHY with PHY_IGNORE_INTERRUPTS"):

-       queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
-                          PHY_STATE_TIME * HZ);
+       /* Only re-schedule a PHY state machine change if we are polling the
+        * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
+        * between states from phy_mac_interrupt()
+        */
+       if (phydev->irq == PHY_POLL)
+               queue_delayed_work(system_power_efficient_wq,
&phydev->state_queue,
+                                  PHY_STATE_TIME * HZ);


is presumably what broke for you, right?

Could you also give this patch a spin and see if it works better with
it? The macb driver does something racy with how the MDIO and PHY are
probe wrt. registering the netdev, that needs fixing too.

diff --git a/drivers/net/ethernet/cadence/macb.c
b/drivers/net/ethernet/cadence/macb.c
index eec3200ade4a..98b99149ce0b 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -3005,28 +3005,36 @@ static int macb_probe(struct platform_device *pdev)
        if (err)
                goto err_out_free_netdev;

+       err = macb_mii_init(bp);
+       if (err)
+               goto err_out_free_netdev;
+
+       phydev = bp->phy_dev;
+       phy_attached_info(phydev);
+
+       netif_carrier_off(dev);
+
        err = register_netdev(dev);
        if (err) {
                dev_err(&pdev->dev, "Cannot register net device,
aborting.\n");
                goto err_out_unregister_netdev;
        }

-       err = macb_mii_init(bp);
-       if (err)
-               goto err_out_unregister_netdev;
-
-       netif_carrier_off(dev);
-
        netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
                    macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
                    dev->base_addr, dev->irq, dev->dev_addr);

-       phydev = bp->phy_dev;
-       phy_attached_info(phydev);
-
        return 0;

 err_out_unregister_netdev:
+       phy_disconnect(bp->phy_dev);
+       mdiobus_unregister(bp->mii_bus);
+       mdiobus_free(bp->mii_bus);
+
+       /* Shutdown the PHY if there is a GPIO reset */
+       if (bp->reset_gpio)
+               gpiod_set_value(bp->reset_gpio, 0);
+
        unregister_netdev(dev);

 err_out_free_netdev:



-- 
Florian

^ permalink raw reply related

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Andrew Lunn @ 2016-04-15 22:30 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415221711.GG25196@piout.net>

On Sat, Apr 16, 2016 at 12:17:11AM +0200, Alexandre Belloni wrote:
> On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> > > Trace without my patch:
> > > libphy: MACB_mii_bus: probed
> > > macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > > [...]
> > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > 
> > Are there some state changes before this? How is it getting to state
> > READY? It would expect it to start in DOWN, from when the phy device
> > was created in phy_device_create().
> > 
> 
> No other changes. I forgot to mention that this is when booting with a
> cable plugged in. Unplugging and replugging the cable makes the link
> detection work fine even without the patch.

Are you tftpbooting? I.e. has the boot loader already done an auto
negotiation?

I've looked at the code and i still don't see how it gets to READY.
What i do see is that when you connect the phy to the MAC, the
interrupt handler is installed. So maybe there are some PHY interrupts
before the interface is opened? Could you put a print in
phy_interrupt().

	Andrew

^ permalink raw reply

* Re: Poorer networking performance in later kernels?
From: Eric Dumazet @ 2016-04-15 22:33 UTC (permalink / raw)
  To: Butler, Peter; +Cc: netdev@vger.kernel.org
In-Reply-To: <SN1PR0301MB19983946D7A38001E4B54D20D6680@SN1PR0301MB1998.namprd03.prod.outlook.com>

On Fri, 2016-04-15 at 21:02 +0000, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
> 
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in networking performance.  Nothing was changed on the test systems, other than the kernel itself (and kernel modules).  The identical .config used to build the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, and any configuration differences (e.g. new parameters, etc.) were taken as default values.
> 
> The testing was performed on the same actual hardware for both kernel versions (i.e. take the existing 3.4.2 physical setup, simply boot into the (new) kernel and run the same test).  The netperf utility was used for benchmarking and the testing was always performed on idle systems.
> 
> TCP testing yielded the following results, where the 4.4.0 kernel only got about 1/2 of the throughput:
> 
>       Recv     Send       Send                          Utilization       Service Demand
>       Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
>       Size     Size       Size    Time       Throughput local    remote   local   remote
>       bytes    bytes      bytes   secs.      10^6bits/s % S      % S      us/KB   us/KB
> 
> 3.4.2 13631488 13631488   8952    30.01      9370.29    10.14    6.50     0.709   0.454
> 4.4.0 13631488 13631488   8952    30.02      5314.03    9.14     14.31    1.127   1.765
> 
> SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:
> 
>       Recv     Send       Send                          Utilization       Service Demand
>       Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
>       Size     Size       Size    Time       Throughput local    remote   local   remote
>       bytes    bytes      bytes   secs.      10^6bits/s  % S     % S      us/KB   us/KB
> 
> 3.4.2 13631488 13631488   8952    30.00      2306.22    13.87    13.19    3.941   3.747
> 4.4.0 13631488 13631488   8952    30.01       882.74    16.86    19.14    12.516  14.210
> 
> The same tests were performed a multitude of time, and are always consistent (within a few percent).  I've also tried playing with various run-time kernel parameters (/proc/sys/kernel/net/...) on the 4.4.0 kernel to alleviate the issue but have had no success at all.
> 
> I'm at a loss as to what could possibly account for such a discrepancy...

Maybe new kernel is faster and you have drops somewhere ?

nstat >/dev/null
netperf -H ...
nstat

Would help

^ permalink raw reply

* Re: Poorer networking performance in later kernels?
From: Rick Jones @ 2016-04-15 22:37 UTC (permalink / raw)
  To: Butler, Peter, netdev@vger.kernel.org
In-Reply-To: <SN1PR0301MB19983946D7A38001E4B54D20D6680@SN1PR0301MB1998.namprd03.prod.outlook.com>

On 04/15/2016 02:02 PM, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
>
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop
> in networking performance.  Nothing was changed on the test systems,
> other than the kernel itself (and kernel modules).  The identical
> .config used to build the 3.4.2 kernel was brought over into the
> 4.4.0 kernel source tree, and any configuration differences (e.g. new
> parameters, etc.) were taken as default values.
>
> The testing was performed on the same actual hardware for both kernel
> versions (i.e. take the existing 3.4.2 physical setup, simply boot
> into the (new) kernel and run the same test).  The netperf utility
> was used for benchmarking and the testing was always performed on
> idle systems.
>
> TCP testing yielded the following results, where the 4.4.0 kernel
> only got about 1/2 of the throughput:
>

>        Recv     Send       Send                          Utilization       Service Demand
>        Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
>        Size     Size       Size    Time       Throughput local    remote   local   remote
>        bytes    bytes      bytes   secs.      10^6bits/s % S      % S      us/KB   us/KB
>
> 3.4.2 13631488 13631488   8952    30.01      9370.29    10.14    6.50     0.709   0.454
> 4.4.0 13631488 13631488   8952    30.02      5314.03    9.14     14.31    1.127   1.765
>
> SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:
>
>        Recv     Send       Send                          Utilization       Service Demand
>        Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
>        Size     Size       Size    Time       Throughput local    remote   local   remote
>        bytes    bytes      bytes   secs.      10^6bits/s  % S     % S      us/KB   us/KB
>
> 3.4.2 13631488 13631488   8952    30.00      2306.22    13.87    13.19    3.941   3.747
> 4.4.0 13631488 13631488   8952    30.01       882.74    16.86    19.14    12.516  14.210
>
> The same tests were performed a multitude of time, and are always
> consistent (within a few percent).  I've also tried playing with
> various run-time kernel parameters (/proc/sys/kernel/net/...) on the
> 4.4.0 kernel to alleviate the issue but have had no success at all.
>
> I'm at a loss as to what could possibly account for such a discrepancy...
>

I suspect I am not alone in being curious about the CPU(s) present in 
the systems and the model/whatnot of the NIC being used.  I'm also 
curious as to why you have what at first glance seem like absurdly large 
socket buffer sizes.

That said, it looks like you have some Really Big (tm) increases in 
service demand.  Many more CPU cycles being consumed per KB of data 
transferred.

Your message size makes me wonder if you were using a 9000 byte MTU.

Perhaps in the move from 3.4.2 to 4.4.0 you lost some or all of the 
stateless offloads for your NIC(s)?  Running ethtool -k <interface> on 
both ends under both kernels might be good.

Also, if you did have a 9000 byte MTU under 3.4.2 are you certain you 
still had it under 4.4.0?

It would (at least to me) also be interesting to run a TCP_RR test 
comparing the two kernels.  TCP_RR (at least with the default 
request/response size of one byte) doesn't really care about stateless 
offloads or MTUs and could show how much difference there is in basic 
path length (or I suppose in interrupt coalescing behaviour if the NIC 
in question has a mildly dodgy heuristic for such things).

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Alexandre Belloni @ 2016-04-15 22:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415223026.GD26665@lunn.ch>

On 16/04/2016 at 00:30:26 +0200, Andrew Lunn wrote :
> On Sat, Apr 16, 2016 at 12:17:11AM +0200, Alexandre Belloni wrote:
> > On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> > > > Trace without my patch:
> > > > libphy: MACB_mii_bus: probed
> > > > macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> > > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> > > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > > > [...]
> > > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > > 
> > > Are there some state changes before this? How is it getting to state
> > > READY? It would expect it to start in DOWN, from when the phy device
> > > was created in phy_device_create().
> > > 
> > 
> > No other changes. I forgot to mention that this is when booting with a
> > cable plugged in. Unplugging and replugging the cable makes the link
> > detection work fine even without the patch.
> 
> Are you tftpbooting? I.e. has the boot loader already done an auto
> negotiation?
> 

Yes.

> I've looked at the code and i still don't see how it gets to READY.
> What i do see is that when you connect the phy to the MAC, the
> interrupt handler is installed. So maybe there are some PHY interrupts
> before the interface is opened? Could you put a print in
> phy_interrupt().
> 

That is indeed the case, and I'm not sure why because
99f81afc139c6edd14d77a91ee91685a414a1c66 is trying to disable AN at
boot.

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: qdisc spin lock
From: Michael Ma @ 2016-04-15 22:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <1460125157.6473.434.camel@edumazet-glaptop3.roam.corp.google.com>

2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
>> I didn't really know that multiple qdiscs can be isolated using MQ so
>> that each txq can be associated with a particular qdisc. Also we don't
>> really have multiple interfaces...
>>
>> With this MQ solution we'll still need to assign transmit queues to
>> different classes by doing some math on the bandwidth limit if I
>> understand correctly, which seems to be less convenient compared with
>> a solution purely within HTB.
>>
>> I assume that with this solution I can still share qdisc among
>> multiple transmit queues - please let me know if this is not the case.
>
> Note that this MQ + HTB thing works well, unless you use a bonding
> device. (Or you need the MQ+HTB on the slaves, with no way of sharing
> tokens between the slaves)
>
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bb1d912323d5dd50e1079e389f4e964be14f0ae3
>
> bonding can not really be used as a true MQ device yet.
>
> I might send a patch to disable this 'bonding feature' if no slave sets
> a queue_id.
>
>
So there is no way of using this MQ solution when bonding interface is
used, right? Then modifying HTB might be the only solution?

^ permalink raw reply

* Re: [PATCH 2/3] phy: add generic function to support ksetting support
From: Philippe Reynes @ 2016-04-15 22:53 UTC (permalink / raw)
  To: Fugang Duan
  Cc: davem@davemloft.net, decot@googlers.com, f.fainelli@gmail.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <VI1PR0401MB18559A6D1BFF6A55CE840574FF680@VI1PR0401MB1855.eurprd04.prod.outlook.com>

On 15/04/16 04:50, Fugang Duan wrote:
> From: Philippe Reynes<tremyfr@gmail.com>  Sent: Friday, April 15, 2016 6:35 AM
>> To: davem@davemloft.net; decot@googlers.com; f.fainelli@gmail.com; Fugang
>> Duan<fugang.duan@nxp.com>
>> Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Philippe Reynes
>> <tremyfr@gmail.com>
>> Subject: [PATCH 2/3] phy: add generic function to support ksetting support
>>
>> The old ethtool api (get_setting and set_setting) has generic phy functions
>> phy_ethtool_sset and phy_ethtool_gset.
>> To supprt the new ethtool api (get_link_ksettings and set_link_ksettings), we
>> add generic phy function phy_ethtool_ksettings_get and
>> phy_ethtool_ksettings_set.
>>
>> Signed-off-by: Philippe Reynes<tremyfr@gmail.com>
>> ---
>>   drivers/net/phy/phy.c |   81
>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/phy.h   |    4 ++
>>   2 files changed, 85 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index
>> 5590b9c..6f221c8 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -362,6 +362,60 @@ int phy_ethtool_sset(struct phy_device *phydev,
>> struct ethtool_cmd *cmd)  }  EXPORT_SYMBOL(phy_ethtool_sset);
>>
>> +int phy_ethtool_ksettings_set(struct phy_device *phydev,
>> +			      const struct ethtool_link_ksettings *cmd) {
>> +	u8 autoneg = cmd->base.autoneg;
>> +	u8 duplex = cmd->base.duplex;
>> +	u32 speed = cmd->base.speed;
>> +	u32 advertising;
>> +
>> +	if (cmd->base.phy_address != phydev->mdio.addr)
>> +		return -EINVAL;
>> +
>> +	ethtool_convert_link_mode_to_legacy_u32(&advertising,
>> +						cmd->link_modes.advertising);
>> +
>> +	/* We make sure that we don't pass unsupported values in to the PHY */
>> +	advertising&= phydev->supported;
>> +
>> +	/* Verify the settings we care about. */
>> +	if (autoneg != AUTONEG_ENABLE&&  autoneg != AUTONEG_DISABLE)
>> +		return -EINVAL;
>> +
>> +	if (autoneg == AUTONEG_ENABLE&&  advertising == 0)
>> +		return -EINVAL;
>> +
>> +	if (autoneg == AUTONEG_DISABLE&&
>> +	    ((speed != SPEED_1000&&
>> +	      speed != SPEED_100&&
>> +	      speed != SPEED_10) ||
>> +	     (duplex != DUPLEX_HALF&&
>> +	      duplex != DUPLEX_FULL)))
>> +		return -EINVAL;
>> +
>> +	phydev->autoneg = autoneg;
>> +
>> +	phydev->speed = speed;
>> +
>> +	phydev->advertising = advertising;
>> +
>> +	if (autoneg == AUTONEG_ENABLE)
>> +		phydev->advertising |= ADVERTISED_Autoneg;
>> +	else
>> +		phydev->advertising&= ~ADVERTISED_Autoneg;
>> +
>> +	phydev->duplex = duplex;
>> +
>> +	phydev->mdix = cmd->base.eth_tp_mdix_ctrl;
>> +
>> +	/* Restart the PHY */
>> +	phy_start_aneg(phydev);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(phy_ethtool_ksettings_set);
>> +
>>   int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd)  {
>>   	cmd->supported = phydev->supported;
>> @@ -385,6 +439,33 @@ int phy_ethtool_gset(struct phy_device *phydev,
>> struct ethtool_cmd *cmd)  }  EXPORT_SYMBOL(phy_ethtool_gset);
>>
>> +int phy_ethtool_ksettings_get(struct phy_device *phydev,
>> +			      struct ethtool_link_ksettings *cmd) {
>> +	ethtool_convert_legacy_u32_to_link_mode(cmd-
>>> link_modes.supported,
>> +						phydev->supported);
>> +
>> +	ethtool_convert_legacy_u32_to_link_mode(cmd-
>>> link_modes.advertising,
>> +						phydev->advertising);
>> +
>> +	ethtool_convert_legacy_u32_to_link_mode(cmd-
>>> link_modes.lp_advertising,
>> +						phydev->lp_advertising);
>> +
>> +	cmd->base.speed = phydev->speed;
>> +	cmd->base.duplex = phydev->duplex;
>> +	if (phydev->interface == PHY_INTERFACE_MODE_MOCA)
>> +		cmd->base.port = PORT_BNC;
>> +	else
>> +		cmd->base.port = PORT_MII;
>> +
>> +	cmd->base.phy_address = phydev->mdio.addr;
>> +	cmd->base.autoneg = phydev->autoneg;
>> +	cmd->base.eth_tp_mdix_ctrl = phydev->mdix;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(phy_ethtool_ksettings_get);
>> +
>>   /**
>>    * phy_mii_ioctl - generic PHY MII ioctl interface
>>    * @phydev: the phy_device struct
>> diff --git a/include/linux/phy.h b/include/linux/phy.h index 2abd791..be3f83b
>> 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -805,6 +805,10 @@ void phy_start_machine(struct phy_device *phydev);
>> void phy_stop_machine(struct phy_device *phydev);  int
>> phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);  int
>> phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd);
>> +int phy_ethtool_ksettings_get(struct phy_device *phydev,
>> +			      struct ethtool_link_ksettings *cmd); int
>> +phy_ethtool_ksettings_set(struct phy_device *phydev,
>> +			      const struct ethtool_link_ksettings *cmd);
>>   int phy_mii_ioctl(struct phy_device *phydev, struct ifreq *ifr, int cmd);  int
>> phy_start_interrupts(struct phy_device *phydev);  void phy_print_status(struct
>> phy_device *phydev);
>> --
>> 1.7.4.4
>
> It seems fine. There have many drivers need to update .set_settings/. get_settings, not only fsl fec driver.

Yes, a lot a drivers needs to be updated. Right now, I've only updated the fec.
If everybody agree, I can replace them.

> And whether there still need to keep the old interface in phy.c driver ?


I think we could remove the old interface when all the drivers would have moved to the new interface.

> Regards,
> Andy

Regards,
Philippe

^ permalink raw reply

* Re: qdisc spin lock
From: Eric Dumazet @ 2016-04-15 22:54 UTC (permalink / raw)
  To: Michael Ma; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <CAAmHdhw8-qEHwe+GNU_p9UGZyMU38PgrjaM9zEK9nOfSENxDUg@mail.gmail.com>

On Fri, 2016-04-15 at 15:46 -0700, Michael Ma wrote:
> 2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> > On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
> >> I didn't really know that multiple qdiscs can be isolated using MQ so
> >> that each txq can be associated with a particular qdisc. Also we don't
> >> really have multiple interfaces...
> >>
> >> With this MQ solution we'll still need to assign transmit queues to
> >> different classes by doing some math on the bandwidth limit if I
> >> understand correctly, which seems to be less convenient compared with
> >> a solution purely within HTB.
> >>
> >> I assume that with this solution I can still share qdisc among
> >> multiple transmit queues - please let me know if this is not the case.
> >
> > Note that this MQ + HTB thing works well, unless you use a bonding
> > device. (Or you need the MQ+HTB on the slaves, with no way of sharing
> > tokens between the slaves)
> >
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bb1d912323d5dd50e1079e389f4e964be14f0ae3
> >
> > bonding can not really be used as a true MQ device yet.
> >
> > I might send a patch to disable this 'bonding feature' if no slave sets
> > a queue_id.
> >
> >
> So there is no way of using this MQ solution when bonding interface is
> used, right? Then modifying HTB might be the only solution?

I probably can submit a bonding patch very soon if there is interest.

Modifying HTB is way more complicated :(

^ permalink raw reply

* Re: qdisc spin lock
From: Michael Ma @ 2016-04-15 23:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, Linux Kernel Network Developers
In-Reply-To: <1460760854.10638.81.camel@edumazet-glaptop3.roam.corp.google.com>

2016-04-15 15:54 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
> On Fri, 2016-04-15 at 15:46 -0700, Michael Ma wrote:
>> 2016-04-08 7:19 GMT-07:00 Eric Dumazet <eric.dumazet@gmail.com>:
>> > On Thu, 2016-03-31 at 16:48 -0700, Michael Ma wrote:
>> >> I didn't really know that multiple qdiscs can be isolated using MQ so
>> >> that each txq can be associated with a particular qdisc. Also we don't
>> >> really have multiple interfaces...
>> >>
>> >> With this MQ solution we'll still need to assign transmit queues to
>> >> different classes by doing some math on the bandwidth limit if I
>> >> understand correctly, which seems to be less convenient compared with
>> >> a solution purely within HTB.
>> >>
>> >> I assume that with this solution I can still share qdisc among
>> >> multiple transmit queues - please let me know if this is not the case.
>> >
>> > Note that this MQ + HTB thing works well, unless you use a bonding
>> > device. (Or you need the MQ+HTB on the slaves, with no way of sharing
>> > tokens between the slaves)
>> >
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bb1d912323d5dd50e1079e389f4e964be14f0ae3
>> >
>> > bonding can not really be used as a true MQ device yet.
>> >
>> > I might send a patch to disable this 'bonding feature' if no slave sets
>> > a queue_id.
>> >
>> >
>> So there is no way of using this MQ solution when bonding interface is
>> used, right? Then modifying HTB might be the only solution?
>
> I probably can submit a bonding patch very soon if there is interest.

Would definitely appreciate that. If you can share the patch it will
be helpful as well. Let me know if I can help with this...
>
> Modifying HTB is way more complicated :(
>
>
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox