Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Russell King - ARM Linux @ 2017-08-24 17:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, davem, kishon, jason, sebastian.hesselbarth,
	gregory.clement, thomas.petazzoni, nadavh, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170824165743.GB18700@lunn.ch>

On Thu, Aug 24, 2017 at 06:57:43PM +0200, Andrew Lunn wrote:
> > I see what could be the issue but I do not understand one aspect though:
> > how could we switch from one PHY to another, as there's only one output
> > between the SoC (and so a given GoP#) and the board. So if a given PHY
> > can handle multiple modes I see, but in the other case a muxing
> > somewhere would be needed? Or did I miss something?
> 
> I think we need a hardware diagram...
> 
> How are the RJ45, copper PHY, SFP module connected to the SoC?
> 
> Somewhere there must be a mux, to select between copper and
> fibre. Where is that mux?

In the 88x3310 PHY:

                     .------- RJ45
MVPP2 ----- 88x3310 PHY
                     `------- SFP+

Here's the commentry I've provided at the very top of the 88x3310 driver
which describes all these modes:

 * There appears to be several different data paths through the PHY which
 * are automatically managed by the PHY.  The following has been determined
 * via observation and experimentation:
 *
 *       SGMII PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for <= 1G)
 *  10GBASE-KR PHYXS -- BASE-T PCS -- 10G PMA -- AN -- Copper (for 10G)
 *  10GBASE-KR PHYXS -- BASE-R PCS -- Fiber
 *
 * If both the fiber and copper ports are connected, the first to gain
 * link takes priority and the other port is completely locked out.

It's not a copper-only PHY, it's just like most other PHYs out there
that support multiple connections, like the 88e151x series that support
both RJ45 and fibre and can auto-switch between them.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* RE: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Stefan Chulski @ 2017-08-24 17:08 UTC (permalink / raw)
  To: Antoine Tenart, Andrew Lunn
  Cc: davem@davemloft.net, kishon@ti.com, jason@lakedaemon.net,
	sebastian.hesselbarth@gmail.com,
	gregory.clement@free-electrons.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux@armlinux.org.uk, linux-kernel@vger.kernel.org,
	mw@semihalf.com, miquel.raynal@free-electrons.com,
	netdev@vger.kernel.org
In-Reply-To: <20170824161945.GG28443@kwain>

> > Imagine phylib is using the copper Ethernet PHY, but the MAC is using
> > the SFP port. Somebody pulls out the copper cable, phylib says the
> > link is down, turns the carrier off and calls the callback. Not good,
> > since your SFP cable is still plugged in...  Ethtool is
> > returning/setting stuff in the Copper Ethernet PHY, when in fact you
> > intend to be setting SFP settings.
> 
> I see what could be the issue but I do not understand one aspect though:
> how could we switch from one PHY to another, as there's only one output
> between the SoC (and so a given GoP#) and the board. So if a given PHY can
> handle multiple modes I see, but in the other case a muxing somewhere would
> be needed? Or did I miss something?

I think PHY name and PHY mode struct that describe here both MAC to PHY and PHY to PHY connection create confusion...
Serdes IP lane doesn't care if connector is SFP, RJ45 or direct attached cable. mvpp22_comphy_init only configures MAC to PHY
connection. SFI for 10G(KR in mainline), SGMII for 1G and HS_SGMII for 2.5G.

Regards,
Stefan.

^ permalink raw reply

* Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Antoine Tenart @ 2017-08-24 17:13 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Andrew Lunn, Antoine Tenart, davem, kishon, jason,
	sebastian.hesselbarth, gregory.clement, thomas.petazzoni, nadavh,
	linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170824170401.GA20805@n2100.armlinux.org.uk>

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]

On Thu, Aug 24, 2017 at 06:04:01PM +0100, Russell King - ARM Linux wrote:
> On Thu, Aug 24, 2017 at 06:57:43PM +0200, Andrew Lunn wrote:
> > > I see what could be the issue but I do not understand one aspect though:
> > > how could we switch from one PHY to another, as there's only one output
> > > between the SoC (and so a given GoP#) and the board. So if a given PHY
> > > can handle multiple modes I see, but in the other case a muxing
> > > somewhere would be needed? Or did I miss something?
> > 
> > I think we need a hardware diagram...
> > 
> > How are the RJ45, copper PHY, SFP module connected to the SoC?
> > 
> > Somewhere there must be a mux, to select between copper and
> > fibre. Where is that mux?
> 
> In the 88x3310 PHY:
> 
>                      .------- RJ45
> MVPP2 ----- 88x3310 PHY
>                      `------- SFP+

And the "MVPP2" part can be expended to:

        .-- GoP #0 --.
MVPP2 ----- GoP #1 ---- Comphy lane #X -- 88x3310
        `-- GoP #2 --'

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Antoine Tenart @ 2017-08-24 17:14 UTC (permalink / raw)
  To: Stefan Chulski
  Cc: Antoine Tenart, Andrew Lunn, davem@davemloft.net, kishon@ti.com,
	jason@lakedaemon.net, sebastian.hesselbarth@gmail.com,
	gregory.clement@free-electrons.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux@armlinux.org.uk, linux-kernel@vger.kernel.org,
	mw@semihalf.com, miquel.raynal@free-electrons.com,
	netdev@vger.kernel.org
In-Reply-To: <de67e234a7ff43b49346be5cea2466eb@IL-EXCH01.marvell.com>

[-- Attachment #1: Type: text/plain, Size: 1452 bytes --]

On Thu, Aug 24, 2017 at 05:08:29PM +0000, Stefan Chulski wrote:
> > > Imagine phylib is using the copper Ethernet PHY, but the MAC is using
> > > the SFP port. Somebody pulls out the copper cable, phylib says the
> > > link is down, turns the carrier off and calls the callback. Not good,
> > > since your SFP cable is still plugged in...  Ethtool is
> > > returning/setting stuff in the Copper Ethernet PHY, when in fact you
> > > intend to be setting SFP settings.
> > 
> > I see what could be the issue but I do not understand one aspect though:
> > how could we switch from one PHY to another, as there's only one output
> > between the SoC (and so a given GoP#) and the board. So if a given PHY can
> > handle multiple modes I see, but in the other case a muxing somewhere would
> > be needed? Or did I miss something?
> 
> I think PHY name and PHY mode struct that describe here both MAC to
> PHY and PHY to PHY connection create confusion...  Serdes IP lane
> doesn't care if connector is SFP, RJ45 or direct attached cable.
> mvpp22_comphy_init only configures MAC to PHY
> connection. SFI for 10G(KR in mainline), SGMII for 1G and HS_SGMII for
> 2.5G.

So maybe one confusion was to name them PHY_MODE_10GKR and
PHY_MODE_SGMII. It could be PHY_MODE_10G and PHY_MODE_1G instead.

Does that sound right?

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* RE: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Stefan Chulski @ 2017-08-24 17:16 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: Andrew Lunn, davem@davemloft.net, kishon@ti.com,
	jason@lakedaemon.net, sebastian.hesselbarth@gmail.com,
	gregory.clement@free-electrons.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux@armlinux.org.uk, linux-kernel@vger.kernel.org,
	mw@semihalf.com, miquel.raynal@free-electrons.com,
	netdev@vger.kernel.org
In-Reply-To: <20170824171418.GI28443@kwain>

> So maybe one confusion was to name them PHY_MODE_10GKR and
> PHY_MODE_SGMII. It could be PHY_MODE_10G and PHY_MODE_1G instead.

1G can be RGMII...

Regards,
Stefan.

^ permalink raw reply

* RE: [EXT] Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Stefan Chulski @ 2017-08-24 17:25 UTC (permalink / raw)
  To: Antoine Tenart, Russell King - ARM Linux
  Cc: Andrew Lunn, davem@davemloft.net, kishon@ti.com,
	jason@lakedaemon.net, sebastian.hesselbarth@gmail.com,
	gregory.clement@free-electrons.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux-kernel@vger.kernel.org, mw@semihalf.com,
	miquel.raynal@free-electrons.com, netdev@vger.kernel.org
In-Reply-To: <20170824171301.GH28443@kwain>

> >                      .------- RJ45
> > MVPP2 ----- 88x3310 PHY
> >                      `------- SFP+
> 
> And the "MVPP2" part can be expended to:
> 
>         .-- GoP #0 --.
> MVPP2 ----- GoP #1 ---- Comphy lane #X -- 88x3310
>         `-- GoP #2 --'
> 
> Thanks!
> Antoine

One more point, Alaska 3310 PHY SoC and Marvell Embedded Processor SoC(A8K) are different SoC's.
Comphy driver configures Serdes IP in Marvell Embedded Processor SoC and media(SFP or RJ45) autodetect is done in Alaska 3310 PHY SoC(mostly by firmware).

Regards,
Stefan

^ permalink raw reply

* Re: [PATCH net-next 09/13] net: mvpp2: dynamic reconfiguration of the PHY mode
From: Andrew Lunn @ 2017-08-24 17:45 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Antoine Tenart, davem, kishon, jason, sebastian.hesselbarth,
	gregory.clement, thomas.petazzoni, nadavh, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170824165947.GZ20805@n2100.armlinux.org.uk>

Hi Russell

> I think you're all getting confused.

Yes, i was at least.

> The 88x3310 driver in the kernel knows about these combinations and
> sets the phy interface parameter correctly depending on whether the
> PHY has configured itself for copper at whatever speed or SFP+.

So when the PHY decides to swap from copper to fibre etc, is the
phylib state machine kept up to date. Does it see a down, followed by
an up?

   Andrew

^ permalink raw reply

* Re: [PATCH] ss: fix help/man TCP-STATE description for listening
From: Stephen Hemminger @ 2017-08-24 18:02 UTC (permalink / raw)
  To: Andreas Henriksson; +Cc: netdev
In-Reply-To: <1503492471-18403-1-git-send-email-andreas@fatal.se>

On Wed, 23 Aug 2017 14:47:51 +0200
Andreas Henriksson <andreas@fatal.se> wrote:

> There's some misleading information in --help and ss(8) manpage about
> TCP-STATE named 'listen'.
> ss doesn't know such a state, but it knows 'listening' state.
> 
> $ ss -tua state listen
> ss: wrong state name: listen
> 
> $ ss -tua state listening
> [...]
> 
> Addresses: https://bugs.debian.org/872990
> Reported-by: Pavel Lyulchenko <p.lyulchenko@gmail.com>
> Signed-off-by: Andreas Henriksson <andreas@fatal.se>

Applied. Thanks

^ permalink raw reply

* Re: [PATCH v3 0/7] Add RSS to DPAA 1.x Ethernet driver
From: David Miller @ 2017-08-24 18:47 UTC (permalink / raw)
  To: madalin.bucur; +Cc: netdev, linuxppc-dev, linux-kernel
In-Reply-To: <20170824.094220.1750290899777462105.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 24 Aug 2017 09:42:20 -0700 (PDT)

> From: Madalin Bucur <madalin.bucur@nxp.com>
> Date: Thu, 24 Aug 2017 10:28:21 +0300
> 
>> This patch set introduces Receive Side Scaling for the DPAA Ethernet
>> driver. Documentation is updated with details related to the new
>> feature and limitations that apply.
>> Added also a small fix.
>> 
>> v2: removed a C++ style comment
>> v3: move struct fman to header file to avoid exporting a function
> 
> Series applied, thanks.

Actually I'm reverting, this doesn't even compile.

[davem@localhost net-next]$ make -s -j8
In file included from drivers/net/ethernet/freescale/fman/fman.c:35:0:
drivers/net/ethernet/freescale/fman/fman.h:286:9: error: type defaults to ‘int’ in declaration of ‘irqreturn_t’ [-Werror=implicit-int]
 typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman,
         ^~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.h:286:9: error: ‘irqreturn_t’ declared as function returning a function
drivers/net/ethernet/freescale/fman/fman.h:287:12: warning: parameter names (without types) in function declaration
       enum fman_exceptions exception);
            ^~~~~~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.h:300:22: error: ‘fman_bus_error_cb’ declared as function returning a function
 typedef irqreturn_t (fman_bus_error_cb)(struct fman *fman, u8 port_id,
                      ^~~~~~~~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.h:316:18: error: field ‘muram_res’ has incomplete type
  struct resource muram_res;              /* MURAM resource */
                  ^~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.h:330:2: error: unknown type name ‘fman_exceptions_cb’
  fman_exceptions_cb *exception_cb;
  ^~~~~~~~~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.h:333:2: error: unknown type name ‘spinlock_t’
  spinlock_t spinlock;
  ^~~~~~~~~~
In file included from ./include/linux/irq.h:19:0,
                 from ./include/linux/of_irq.h:6,
                 from drivers/net/ethernet/freescale/fman/fman.c:46:
./include/linux/irqreturn.h:16:24: error: conflicting types for ‘irqreturn_t’
 typedef enum irqreturn irqreturn_t;
                        ^~~~~~~~~~~
In file included from drivers/net/ethernet/freescale/fman/fman.c:35:0:
drivers/net/ethernet/freescale/fman/fman.h:286:9: note: previous declaration of ‘irqreturn_t’ was here
 typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman,
         ^~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘bmi_err_event’:
drivers/net/ethernet/freescale/fman/fman.c:1237:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_BMI_STORAGE_PROFILE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1239:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_BMI_LIST_RAM_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1241:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_BMI_STATISTICS_RAM_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1243:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_BMI_DISPATCH_RAM_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘qmi_err_event’:
drivers/net/ethernet/freescale/fman/fman.c:1266:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_QMI_DOUBLE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1268:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman,
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘dma_err_event’:
drivers/net/ethernet/freescale/fman/fman.c:1317:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_DMA_SINGLE_PORT_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1319:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_DMA_READ_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1321:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_DMA_SYSTEM_WRITE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1323:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_DMA_FM_WRITE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘fpm_err_event’:
drivers/net/ethernet/freescale/fman/fman.c:1340:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_FPM_DOUBLE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1342:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_FPM_STALL_ON_TASKS);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1345:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_FPM_SINGLE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘muram_err_intr’:
drivers/net/ethernet/freescale/fman/fman.c:1363:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_MURAM_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘qmi_event’:
drivers/net/ethernet/freescale/fman/fman.c:1385:9: error: called object is not a function or function pointer
   ret = fman->exception_cb(fman, FMAN_EX_QMI_SINGLE_ECC);
         ^~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘fman_config’:
drivers/net/ethernet/freescale/fman/fman.c:1735:21: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
  fman->exception_cb = fman_exceptions;
                     ^
drivers/net/ethernet/freescale/fman/fman.c:1736:21: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
  fman->bus_error_cb = fman_bus_error;
                     ^
In file included from ./include/linux/mmzone.h:7:0,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:14,
                 from drivers/net/ethernet/freescale/fman/fman.c:40:
drivers/net/ethernet/freescale/fman/fman.c:1745:17: error: passing argument 1 of ‘spinlock_check’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  spin_lock_init(&fman->spinlock);
                 ^
./include/linux/spinlock.h:293:17: note: in definition of macro ‘spin_lock_init’
  spinlock_check(_lock);    \
                 ^~~~~
./include/linux/spinlock.h:286:40: note: expected ‘spinlock_t * {aka struct spinlock *}’ but argument is of type ‘int *’
 static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
                                        ^~~~~~~~~~~~~~
In file included from ./include/linux/mmzone.h:7:0,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:14,
                 from drivers/net/ethernet/freescale/fman/fman.c:40:
./include/linux/spinlock.h:294:29: error: request for member ‘rlock’ in something not a structure or union
  raw_spin_lock_init(&(_lock)->rlock);  \
                             ^
./include/linux/spinlock.h:99:24: note: in definition of macro ‘raw_spin_lock_init’
  __raw_spin_lock_init((lock), #lock, &__key);  \
                        ^~~~
drivers/net/ethernet/freescale/fman/fman.c:1745:2: note: in expansion of macro ‘spin_lock_init’
  spin_lock_init(&fman->spinlock);
  ^~~~~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.c: In function ‘fman_set_port_params’:
drivers/net/ethernet/freescale/fman/fman.c:2123:20: error: passing argument 1 of ‘spinlock_check’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  spin_lock_irqsave(&fman->spinlock, flags);
                    ^
./include/linux/spinlock.h:205:34: note: in definition of macro ‘raw_spin_lock_irqsave’
   flags = _raw_spin_lock_irqsave(lock); \
                                  ^~~~
drivers/net/ethernet/freescale/fman/fman.c:2123:2: note: in expansion of macro ‘spin_lock_irqsave’
  spin_lock_irqsave(&fman->spinlock, flags);
  ^~~~~~~~~~~~~~~~~
In file included from ./include/linux/mmzone.h:7:0,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:14,
                 from drivers/net/ethernet/freescale/fman/fman.c:40:
./include/linux/spinlock.h:286:40: note: expected ‘spinlock_t * {aka struct spinlock *}’ but argument is of type ‘int *’
 static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
                                        ^~~~~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.c:2201:25: error: passing argument 1 of ‘spin_unlock_irqrestore’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  spin_unlock_irqrestore(&fman->spinlock, flags);
                         ^
In file included from ./include/linux/mmzone.h:7:0,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:14,
                 from drivers/net/ethernet/freescale/fman/fman.c:40:
./include/linux/spinlock.h:352:29: note: expected ‘spinlock_t * {aka struct spinlock *}’ but argument is of type ‘int *’
 static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
                             ^~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/freescale/fman/fman.c:2206:25: error: passing argument 1 of ‘spin_unlock_irqrestore’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  spin_unlock_irqrestore(&fman->spinlock, flags);
                         ^
In file included from ./include/linux/mmzone.h:7:0,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:14,
                 from drivers/net/ethernet/freescale/fman/fman.c:40:
./include/linux/spinlock.h:352:29: note: expected ‘spinlock_t * {aka struct spinlock *}’ but argument is of type ‘int *’
 static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
                             ^~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[5]: *** [scripts/Makefile.build:302: drivers/net/ethernet/freescale/fman/fman.o] Error 1
make[4]: *** [scripts/Makefile.build:561: drivers/net/ethernet/freescale/fman] Error 2
make[3]: *** [scripts/Makefile.build:561: drivers/net/ethernet/freescale] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:561: drivers/net/ethernet] Error 2
make[1]: *** [scripts/Makefile.build:561: drivers/net] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1019: drivers] Error 2

^ permalink raw reply

* Re: [PATCH 0/5] Netfilter fixes for net
From: David Miller @ 2017-08-24 18:49 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1503585811-13447-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 24 Aug 2017 16:43:26 +0200

> The following patchset contains Netfilter fixes for your net tree,
> they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks.

^ permalink raw reply

* [PATCH net-next] hv_netvsc: Fix rndis_filter_close error during netvsc_remove
From: Haiyang Zhang @ 2017-08-24 18:50 UTC (permalink / raw)
  To: davem, netdev; +Cc: haiyangz, kys, olaf, vkuznets, linux-kernel

From: Haiyang Zhang <haiyangz@microsoft.com>

We now remove rndis filter before unregister_netdev(), which calls
device close. It involves closing rndis filter already removed.

This patch fixes this error.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c0c4c9195a3f..fac44c5c8d0d 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -119,12 +119,16 @@ static int netvsc_close(struct net_device *net)
 	struct net_device *vf_netdev
 		= rtnl_dereference(net_device_ctx->vf_netdev);
 	struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
-	int ret;
+	int ret = 0;
 	u32 aread, i, msec = 10, retry = 0, retry_max = 20;
 	struct vmbus_channel *chn;
 
 	netif_tx_disable(net);
 
+	/* No need to close rndis filter if it is removed already */
+	if (!nvdev)
+		goto out;
+
 	ret = rndis_filter_close(nvdev);
 	if (ret != 0) {
 		netdev_err(net, "unable to close device (ret %d).\n", ret);
@@ -163,6 +167,7 @@ static int netvsc_close(struct net_device *net)
 		ret = -ETIMEDOUT;
 	}
 
+out:
 	if (vf_netdev)
 		dev_close(vf_netdev);
 
-- 
2.14.0

^ permalink raw reply related

* Re: [PATCH net] virtio_net: be drop monitor friend
From: David Miller @ 2017-08-24 18:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, mst, jasowang
In-Reply-To: <1503590569.2499.81.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 24 Aug 2017 09:02:49 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> This change is needed to not fool drop monitor.
> (perf record ... -e skb:kfree_skb )
> 
> Packets were properly sent and are consumed after TX completion.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

I'm pretty sure you meant "friendly" in the Subject line so I fixed
it up to say that :-)

Applied, thanks.

^ permalink raw reply

* Re: Question about ip_defrag
From: Jesper Dangaard Brouer @ 2017-08-24 18:59 UTC (permalink / raw)
  To: liujian (CE)
  Cc: davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, elena.reshetova@intel.com,
	edumazet@google.com, netdev@vger.kernel.org, brouer
In-Reply-To: <4F88C5DDA1E80143B232E89585ACE27D018F0AE1@DGGEMA502-MBX.china.huawei.com>


On Thu, 24 Aug 2017 16:04:41 +0000 "liujian (CE)" <liujian56@huawei.com> wrote:

> >What kernel version have you seen this issue with?  
>
> 3.10，with some backport.
>
>  >As far as I remember, this issue have been fixed before...  
>
> which one patch? I didnot find out the patch:(

AFAIK it was some bugs in the percpu_counter code.  If you need to
backport look at the git commits:

 git log lib/percpu_counter.c include/linux/percpu_counter.h

Are you maintaining your own 3.10 kernel?

I know that for RHEL7 (also kernel 3.10) we backported the
percpu_counter fixes...

--Jesper


> 发件人： Jesper Dangaard Brouer
> 收件人： liujian (CE)<liujian56@huawei.com<mailto:liujian56@huawei.com>>
> 抄送： davem@davemloft.net<mailto:davem@davemloft.net>;kuznet@ms2.inr.ac.ru<mailto:kuznet@ms2.inr.ac.ru>;yoshfuji@linux-ipv6.org<mailto:yoshfuji@linux-ipv6.org>;elena.reshetova@intel.com<mailto:elena.reshetova@intel.com>;edumazet@google.com<mailto:edumazet@google.com>;netdev@vger.kernel.org<mailto:netdev@vger.kernel.org>;brouer@redhat.com<mailto:brouer@redhat.com>
> 主题： Re: Question about ip_defrag
> 时间： 2017-08-24 21:53:17
> 
> 
> On Thu, 24 Aug 2017 13:15:33 +0000 "liujian (CE)" <liujian56@huawei.com> wrote:
> > Hello,
> >
> > With below patch we met one issue.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.13-rc6&id=6d7b857d541e
> >
> > the issue:
> > Ip_defrag fail caused by frag_mem_limit reached 4M(frags.high_thresh).
> > At this moment,sum_frag_mem_limit is about 10K.
> > and my test machine's cpu num is 64.
> >
> > Can i only change frag_mem_limit to sum_ frag_mem_limit?
> >
> >
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 96e95e8..f09c00b 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct inet_frags *f)
> >  static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
> >  {
> >         return q->net->low_thresh == 0 ||
> > -              frag_mem_limit(q->net) >= q->net->low_thresh;
> > +              sum_frag_mem_limit(q->net) >= q->net->low_thresh;
> >  }
> >
> >  static unsigned int
> > @@ -355,7 +355,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
> >  {
> >         struct inet_frag_queue *q;
> >
> > -       if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
> > +       if (!nf->high_thresh || sum_frag_mem_limit(nf) > nf->high_thresh) {
> >                 inet_frag_schedule_worker(f);
> >                 return NULL;
> >         }
> > @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
> >         struct inet_frag_queue *q;
> >         int depth = 0;
> >
> > -       if (frag_mem_limit(nf) > nf->low_thresh)
> > +       if (sum_frag_mem_limit(nf) > nf->low_thresh)
> >                 inet_frag_schedule_worker(f);
> >
> >         hash &= (INETFRAGS_HASHSZ - 1);
> > --
> >
> > Thank you for your time.  
> 
> What kernel version have you seen this issue with?
> 
> As far as I remember, this issue have been fixed before...
> 
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [V3 PATCH net-next 0/5] xdp: more work on xdp tracepoints
From: David Miller @ 2017-08-24 19:00 UTC (permalink / raw)
  To: brouer; +Cc: netdev, borkmann
In-Reply-To: <150357074701.26663.4047992776649697788.stgit@firesoul>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Thu, 24 Aug 2017 12:32:58 +0200

> More work on streamlining and performance optimizing the tracepoints
> for XDP.
> 
> I've created a simple xdp_monitor application that uses this
> tracepoint, and prints statistics. Available at github:
> 
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_kern.c
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_monitor_user.c
> 
> The improvement over tracepoint with strcpy: 9810372 - 8428762 = +1381610 pps faster
>  - (1/9810372 - 1/8428762)*10^9 = -16.7 nanosec
>  - 100-(8428762/9810372*100) = strcpy-trace is 14.08% slower
>  - 981037/8428762*100 = removing strcpy made it 11.64% faster
> 
> V3: Fix merge conflict with commit e4a8e817d3cb ("bpf: misc xdp redirect cleanups")
> V2: Change trace_xdp_redirect() to align with args of trace_xdp_exception()

Series applied, thanks Jesper.

^ permalink raw reply

* Re: [PATCH net-next 3/3 v7] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
From: David Miller @ 2017-08-24 19:15 UTC (permalink / raw)
  To: subashab; +Cc: netdev, fengguang.wu, dcbw, jiri, stephen, David.Laight, marcel
In-Reply-To: <1503355019-12236-4-git-send-email-subashab@codeaurora.org>

From: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Date: Mon, 21 Aug 2017 16:36:59 -0600

> diff --git a/drivers/net/ethernet/qualcomm/rmnet/Makefile b/drivers/net/ethernet/qualcomm/rmnet/Makefile
> new file mode 100644
> index 0000000..1c43e2f
> --- /dev/null
> +++ b/drivers/net/ethernet/qualcomm/rmnet/Makefile
> @@ -0,0 +1,12 @@
> +#
> +# Makefile for the RMNET module
> +#
> +
> +rmnet-y		 := rmnet_config.o
> +rmnet-y		 += rmnet_vnd.o
> +rmnet-y		 += rmnet_handlers.o
> +rmnet-y		 += rmnet_map_data.o
> +rmnet-y		 += rmnet_map_command.o
> +obj-$(CONFIG_RMNET) += rmnet.o
> +
> +CFLAGS_rmnet.o := -I$(src)

You do not need this CFLAGS rule, the local include files are included
using "" double quotes so it uses the local directory always.

> +static void rmnet_free_later(struct work_struct *work)
> +{
> +	struct rmnet_free_work *fwork;
> +
> +	fwork = container_of(work, struct rmnet_free_work, work);
> +
> +	rtnl_lock();
> +	rmnet_delink(fwork->rmnet_dev, NULL);
> +	rtnl_unlock();
> +
> +	kfree(fwork);
> +}

This is racy and doesn't work properly.

When you schedule this work, the RTNL mutex is dropped.  Meanwhile
another request can come in the associate this device.

Your work function will still run and erroneously unlink the object.

Furthermore, during this time that the RTNL mutex is dropped, people
will see the unassociated device in the lists.

You have to atomically remove the object from all possible locations
which provide external visibility of that object, before the RTNL
mutex is dropped.

So you can defer the freeing, but you cannot defer the unlink
operation.

You probably need to move to RCU as well in order for all of this to
work properly since scans of the lists occur in the data path which
is completely asynchronous and not protected by the RTNL mutex.

^ permalink raw reply

* Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
From: David Ahern @ 2017-08-24 19:26 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: davem, arkadis, idosch, mlxsw
In-Reply-To: <20170824064010.1646-12-jiri@resnulli.us>

On 8/23/17 11:40 PM, Jiri Pirko wrote:
> +static int
> +mlxsw_sp_dpipe_table_host_entries_get(struct mlxsw_sp *mlxsw_sp,
> +				      struct devlink_dpipe_entry *entry,
> +				      bool counters_enabled,
> +				      struct devlink_dpipe_dump_ctx *dump_ctx,
> +				      int type)
> +{
> +	int rif_neigh_count = 0;
> +	int rif_neigh_skip = 0;
> +	int neigh_count = 0;
> +	int rif_count;
> +	int i, j;
> +	int err;
> +
> +	rtnl_lock();

Why does a h/w driver dumping its tables need the rtnl lock?

^ permalink raw reply

* Re: [patch net-next 12/12] mlxsw: spectrum_dpipe: Add support for controlling neighbor counters
From: David Ahern @ 2017-08-24 19:28 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: davem, arkadis, idosch, mlxsw
In-Reply-To: <20170824064010.1646-13-jiri@resnulli.us>

On 8/23/17 11:40 PM, Jiri Pirko wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
> index 94d954b..3c8599f 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
> @@ -599,6 +599,38 @@ mlxsw_sp_dpipe_table_host4_entries_dump(void *priv, bool counters_enabled,
>  						      dump_ctx, AF_INET);
>  }
>  
> +static void
> +mlxsw_sp_dpipe_table_host_counters_update(struct mlxsw_sp *mlxsw_sp,
> +					  bool enable, int type)
> +{
> +	int i;
> +
> +	rtnl_lock();

same question here. Why is rtnl lock needed?

> +	for (i = 0; i < MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_RIFS); i++) {
> +		struct mlxsw_sp_rif *rif = mlxsw_sp_rif_by_index(mlxsw_sp, i);
> +		struct mlxsw_sp_neigh_entry *neigh_entry;
> +
> +		if (!rif)
> +			continue;
> +		mlxsw_sp_rif_neigh_for_each(neigh_entry, rif) {
> +			if (mlxsw_sp_neigh_entry_type(neigh_entry) != type)
> +				continue;
> +			mlxsw_sp_neigh_entry_counter_update(mlxsw_sp,
> +							    neigh_entry,
> +							    enable);
> +		}
> +	}
> +	rtnl_unlock();

^ permalink raw reply

* Re: [Patch net-next] net_sched: remove tc class reference counting
From: David Miller @ 2017-08-24 19:32 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, jhs
In-Reply-To: <20170823043810.17602-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 22 Aug 2017 21:38:10 -0700

> For TC classes, their ->get() and ->put() are always paired, and the
> reference counting is completely useless, because:
> 
> 1) For class modification and dumping paths, we already hold RTNL lock,
>    so all of these ->get(),->change(),->put() are atomic.
> 
> 2) For filter bindiing/unbinding, we use other reference counter than
>    this one, and they should have RTNL lock too.
> 
> 3) For ->qlen_notify(), it is special because it is called on ->enqueue()
>    path, but we already hold qdisc tree lock there, and we hold this
>    tree lock when graft or delete the class too, so it should not be gone
>    or changed until we release the tree lock.
> 
> Therefore, this patch removes ->get() and ->put(), but:
> 
> 1) Adds a new ->find() to find the pointer to a class by classid, no
>    refcnt.
> 
> 2) Move the original class destroy upon the last refcnt into ->delete(),
>    right after releasing tree lock. This is fine because the class is
>    already removed from hash when holding the lock.
> 
> For those who also use ->put() as ->unbind(), just rename them to reflect
> this change.
> 
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

This one looks good to me.

Jamal, please give it a quick look over and review.

Thanks!

^ permalink raw reply

* Re: [PATCH] net/mlx4_core: make mlx4_profile const
From: David Miller @ 2017-08-24 19:33 UTC (permalink / raw)
  To: bhumirks; +Cc: julia.lawall, tariqt, netdev, linux-rdma, linux-kernel
In-Reply-To: <1503492459-3110-1-git-send-email-bhumirks@gmail.com>

From: Bhumika Goyal <bhumirks@gmail.com>
Date: Wed, 23 Aug 2017 18:17:39 +0530

> Make these const as they are only used in a copy operation.
> 
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH] net/mlx5e: make mlx5e_profile const
From: David Miller @ 2017-08-24 19:33 UTC (permalink / raw)
  To: bhumirks
  Cc: julia.lawall, saeedm, matanb, leonro, netdev, linux-rdma,
	linux-kernel
In-Reply-To: <1503492721-3187-1-git-send-email-bhumirks@gmail.com>

From: Bhumika Goyal <bhumirks@gmail.com>
Date: Wed, 23 Aug 2017 18:22:01 +0530

> Make this const as it is only passed as an argument to the function
> mlx5e_create_netdev and the corresponding argument is of type const.
> 
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net] virtio_net: be drop monitor friend
From: Eric Dumazet @ 2017-08-24 19:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, mst, jasowang
In-Reply-To: <20170824.115053.1345828961956619603.davem@davemloft.net>

On Thu, 2017-08-24 at 11:50 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 24 Aug 2017 09:02:49 -0700
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > This change is needed to not fool drop monitor.
> > (perf record ... -e skb:kfree_skb )
> > 
> > Packets were properly sent and are consumed after TX completion.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> I'm pretty sure you meant "friendly" in the Subject line so I fixed
> it up to say that :-)
> 
> Applied, thanks.

Ah, great, thanks for fixing this ;)

^ permalink raw reply

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Corentin Labbe @ 2017-08-24 19:44 UTC (permalink / raw)
  To: Florian Fainelli, maxime.ripard, andrew, wens
  Cc: Rob Herring, Mark Rutland, Russell King, Giuseppe Cavallaro,
	Alexandre Torgue, devicetree, linux-arm-kernel, linux-kernel,
	netdev
In-Reply-To: <20170824082124.GA14302@Red>

On Thu, Aug 24, 2017 at 10:21:24AM +0200, Corentin Labbe wrote:
> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
> > On 08/23/2017 12:49 AM, Maxime Ripard wrote:
> > > Hi Florian,
> > > 
> > > On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
> > >>>>> So I think what you are saying is either impossible or engineering-wise
> > >>>>> a very stupid design, like using an external MAC with a discrete PHY
> > >>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
> > >>>>> with the internal PHY.
> > >>>>>
> > >>>>> Now can we please decide on something? We're a week and a half from
> > >>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
> > >>>>> nodes (internal-mdio & external-mdio).
> > >>>>
> > >>>> I really don't see a need for a mdio-mux in the first place, just have
> > >>>> one MDIO controller (current state) sub-node which describes the
> > >>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
> > >>>> node (along with 'phy-is-integrated'). If a different configuration is
> > >>>> used, then just put the external PHY as a child node there.
> > >>>>
> > >>>> If fixed-link is required, the mdio node becomes unused anyway.
> > >>>>
> > >>>> Works for everyone?
> > >>>
> > >>> If we put an external PHY with reg=1 as a child of internal MDIO,
> > >>> il will be merged with internal PHY node and get
> > >>> phy-is-integrated.
> > >>
> > >> Then have the .dtsi file contain just the mdio node, but no internal or
> > >> external PHY and push all the internal and external PHY node definition
> > >> (in its entirety) to the per-board DTS file, does not that work?
> > > 
> > > If possible, I'd really like to have the internal PHY in the
> > > DTSI. It's always there in hardware anyway, and duplicating the PHY,
> > > with its clock, reset line, and whatever info we might need in the
> > > future in each and every board DTS that uses it will be very error
> > > prone and we will have the usual bunch of issues that come up with
> > > duplication.
> > 
> > OK, then what if you put the internal PHY in the DTSI, mark it with a
> > status = "disabled" property, and have the per-board DTS put a status =
> > "okay" property along with a "phy-is-integrated" boolean property? Would
> > that work?
> 
> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.
> So that adding a 'status = "disabled"' does not bring anything.
> 
> > 
> > What I really don't think is necessary is:
> > 
> > - duplicating the "mdio" controller node for external vs. internal PHY,
> > because this is not accurate, there is just one MDIO controller, but
> > there may be different kinds of MDIO/PHY devices attached
> 
> For me, if we want to represent the reality, we need two MDIO:
> - since two PHY at the same address could co-exists
> - since they are isolated so not on the same MDIO bus
> 
> > - having the STMMAC driver MDIO probing code having to deal with a
> > "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
> > and requiring more driver-level changes that are error prone
> 
> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
> have to be changed to something like "register_parent_mdio"
> 
> 
> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
> Really having two MDIO seems cleaner.
> 

Hello

I have speaked with Neil Amstrong, and he said that they get the same problem on amlogic.
They use a mdio-mux-mmioreg, (see eth-phy-mux in arch/arm64/boot/dts/amlogic/meson-gxl.dtsi)
So tomorow, i will send a patch series which do the same with the exception that we need a mdio-mux-syscon (which is easy/simple to do).
Since their setup use stmmac, it means that we will need 0 changes on stmmac.

Regards

^ permalink raw reply

* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Florian Fainelli @ 2017-08-24 19:59 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Maxime Ripard, Chen-Yu Tsai, Andrew Lunn, Rob Herring,
	Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20170824082124.GA14302@Red>

On 08/24/2017 01:21 AM, Corentin Labbe wrote:
> On Wed, Aug 23, 2017 at 09:31:53AM -0700, Florian Fainelli wrote:
>> On 08/23/2017 12:49 AM, Maxime Ripard wrote:
>>> Hi Florian,
>>>
>>> On Tue, Aug 22, 2017 at 11:35:01AM -0700, Florian Fainelli wrote:
>>>>>>> So I think what you are saying is either impossible or engineering-wise
>>>>>>> a very stupid design, like using an external MAC with a discrete PHY
>>>>>>> connected to the internal MAC's MDIO bus, while using the internal MAC
>>>>>>> with the internal PHY.
>>>>>>>
>>>>>>> Now can we please decide on something? We're a week and a half from
>>>>>>> the 4.13 release. If mdio-mux is wrong, then we could have two mdio
>>>>>>> nodes (internal-mdio & external-mdio).
>>>>>>
>>>>>> I really don't see a need for a mdio-mux in the first place, just have
>>>>>> one MDIO controller (current state) sub-node which describes the
>>>>>> built-in STMMAC MDIO controller and declare the internal PHY as a child
>>>>>> node (along with 'phy-is-integrated'). If a different configuration is
>>>>>> used, then just put the external PHY as a child node there.
>>>>>>
>>>>>> If fixed-link is required, the mdio node becomes unused anyway.
>>>>>>
>>>>>> Works for everyone?
>>>>>
>>>>> If we put an external PHY with reg=1 as a child of internal MDIO,
>>>>> il will be merged with internal PHY node and get
>>>>> phy-is-integrated.
>>>>
>>>> Then have the .dtsi file contain just the mdio node, but no internal or
>>>> external PHY and push all the internal and external PHY node definition
>>>> (in its entirety) to the per-board DTS file, does not that work?
>>>
>>> If possible, I'd really like to have the internal PHY in the
>>> DTSI. It's always there in hardware anyway, and duplicating the PHY,
>>> with its clock, reset line, and whatever info we might need in the
>>> future in each and every board DTS that uses it will be very error
>>> prone and we will have the usual bunch of issues that come up with
>>> duplication.
>>
>> OK, then what if you put the internal PHY in the DTSI, mark it with a
>> status = "disabled" property, and have the per-board DTS put a status =
>> "okay" property along with a "phy-is-integrated" boolean property? Would
>> that work?
> 
> No, I tested and for example with sun8i-h3-orangepi-plus.dts, the external PHY (ethernet-phy@1) is still merged.

Is not there is a mistake in the unit address not matching the "reg"
property, or am I not looking at the right tree?

&mdio {
        ext_rgmii_phy: ethernet-phy@1 {
                compatible = "ethernet-phy-ieee802.3-c22";
                reg = <0>;
        };
};

If the PHY is really at MDIO address 0, then it should be
ethernet-phy@0, and not ethernet-phy@1, and then no problem with the
merging?


> So that adding a 'status = "disabled"' does not bring anything.
> 
>>
>> What I really don't think is necessary is:
>>
>> - duplicating the "mdio" controller node for external vs. internal PHY,
>> because this is not accurate, there is just one MDIO controller, but
>> there may be different kinds of MDIO/PHY devices attached
> 
> For me, if we want to represent the reality, we need two MDIO:
> - since two PHY at the same address could co-exists
> - since they are isolated so not on the same MDIO bus

Is that really true? It might be, but from experience with e.g:
bcmgenet, the integrated PHY and the external PHYs are on the same MDIO
bus, which is convenient, except when you have an address conflict.

> 
>> - having the STMMAC driver MDIO probing code having to deal with a
>> "mdio" sub-node or an "internal-mdio" sub-node because this is confusing
>> and requiring more driver-level changes that are error prone
> 
> My patch for stmmac is really small, only the name of my variable ("need_mdio_mux_ids")
> have to be changed to something like "register_parent_mdio"
> 
> 
> So I agree with Maxime, we need to avoid merging PHY nodes, and we can avoid it only by having two separate MDIO nodes.
> Furthermore, with only one MDIO, we will face with lots of small patch for adding phy-is-integrated, with two we do not need to change any board DT, all is simply clean.
> Really having two MDIO seems cleaner.

The only valid thing that you have provided so far is this merging
problem. Anything else ranging from "we will face with lots of small
patch for adding phy-is-integrated" to "Really having two MDIO seems
cleaner." are hard to receive as technical arguments for correctness.

What happens if someone connects an external PHY at the same MDIO
address than the internal PHY, which one do you get responses from? If
you shutdown the internal PHY and it stops responding, then this
probably becomes deterministic, but it still supports the fact there is
just one MDIO bus controller per MAC.

Anyway, do whatever works best for you I guess.
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-24 20:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Koichiro Den, Jason Wang, virtualization, Network Development
In-Reply-To: <20170824160748-mutt-send-email-mst@kernel.org>

>> Traffic shaping can introduce msec timescale latencies.
>>
>> The delay may actually be a useful signal. If the guest does not
>> orphan skbs early, TSQ will throttle the socket causing host
>> queue build up.
>>
>> But, if completions are queued in-order, unrelated flows may be
>> throttled as well. Allowing out of order completions would resolve
>> this HoL blocking.
>
> We can allow out of order, no guests that follow virtio spec
> will break. But this won't help in all cases
> - a single slow flow can occupy the whole ring, you will not
>   be able to make any new buffers available for the fast flow
> - what host considers a single flow can be multiple flows for guest
>
> There are many other examples.

These examples are due to exhaustion of the fixed ubuf_info pool,
right? We could use dynamic allocation or a resizable pool if these
issues are serious enough.

>> > Neither
>> > do I see why would using tx interrupts within guest be a work around -
>> > AFAIK windows driver uses tx interrupts.
>>
>> It does not address completion latency itself. What I meant was
>> that in an interrupt-driver model, additional starvation issues,
>> such as the potential deadlock raised at the start of this thread,
>> or the timer delay observed before packets were orphaned in
>> virtio-net in commit b0c39dbdc204, are mitigated.
>>
>> Specifically, it breaks the potential deadlock where sockets are
>> blocked waiting for completions (to free up budget in sndbuf, tsq, ..),
>> yet completion handling is blocked waiting for a new packet to
>> trigger free_old_xmit_skbs from start_xmit.
>
> This talk of potential deadlock confuses me - I think you mean we would
> deadlock if we did not orphan skbs in !use_napi - is that right?  If you
> mean that you can drop skb orphan and this won't lead to a deadlock if
> free skbs upon a tx interrupt, I agree, for sure.

Yes, that is what I meant.

>> >> That is the only thing keeping us from removing the HoL blocking in vhost-net zerocopy.
>> >
>> > We don't enable network watchdog on virtio but we could and maybe
>> > should.
>>
>> Can you elaborate?
>
> The issue is that holding onto buffers for very long times makes guests
> think they are stuck. This is funamentally because from guest point of
> view this is a NIC, so it is supposed to transmit things out in
> a timely manner. If host backs the virtual NIC by something that is not
> a NIC, with traffic shaping etc introducing unbounded latencies,
> guest will be confused.

That assumes that guests are fragile in this regard. A linux guest
does not make such assumptions. There are NICs with hardware
rate limiting, so I'm not sure how much of a leap host os rate
limiting is.

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Michael S. Tsirkin @ 2017-08-24 20:50 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Koichiro Den, Jason Wang, virtualization, Network Development
In-Reply-To: <CAF=yD-+9Ah8pC9i2w3Ad3WnhQit7Yo479pMrToty7priL6BFLw@mail.gmail.com>

On Thu, Aug 24, 2017 at 04:20:39PM -0400, Willem de Bruijn wrote:
> >> Traffic shaping can introduce msec timescale latencies.
> >>
> >> The delay may actually be a useful signal. If the guest does not
> >> orphan skbs early, TSQ will throttle the socket causing host
> >> queue build up.
> >>
> >> But, if completions are queued in-order, unrelated flows may be
> >> throttled as well. Allowing out of order completions would resolve
> >> this HoL blocking.
> >
> > We can allow out of order, no guests that follow virtio spec
> > will break. But this won't help in all cases
> > - a single slow flow can occupy the whole ring, you will not
> >   be able to make any new buffers available for the fast flow
> > - what host considers a single flow can be multiple flows for guest
> >
> > There are many other examples.
> 
> These examples are due to exhaustion of the fixed ubuf_info pool,
> right?

No - the ring size itself.

> We could use dynamic allocation or a resizable pool if these
> issues are serious enough.

We need some kind of limit on how many requests a guest can queue in the
host, or it's an obvious DoS attack vector. We used the ring size for
that.

> >> > Neither
> >> > do I see why would using tx interrupts within guest be a work around -
> >> > AFAIK windows driver uses tx interrupts.
> >>
> >> It does not address completion latency itself. What I meant was
> >> that in an interrupt-driver model, additional starvation issues,
> >> such as the potential deadlock raised at the start of this thread,
> >> or the timer delay observed before packets were orphaned in
> >> virtio-net in commit b0c39dbdc204, are mitigated.
> >>
> >> Specifically, it breaks the potential deadlock where sockets are
> >> blocked waiting for completions (to free up budget in sndbuf, tsq, ..),
> >> yet completion handling is blocked waiting for a new packet to
> >> trigger free_old_xmit_skbs from start_xmit.
> >
> > This talk of potential deadlock confuses me - I think you mean we would
> > deadlock if we did not orphan skbs in !use_napi - is that right?  If you
> > mean that you can drop skb orphan and this won't lead to a deadlock if
> > free skbs upon a tx interrupt, I agree, for sure.
> 
> Yes, that is what I meant.
> 
> >> >> That is the only thing keeping us from removing the HoL blocking in vhost-net zerocopy.
> >> >
> >> > We don't enable network watchdog on virtio but we could and maybe
> >> > should.
> >>
> >> Can you elaborate?
> >
> > The issue is that holding onto buffers for very long times makes guests
> > think they are stuck. This is funamentally because from guest point of
> > view this is a NIC, so it is supposed to transmit things out in
> > a timely manner. If host backs the virtual NIC by something that is not
> > a NIC, with traffic shaping etc introducing unbounded latencies,
> > guest will be confused.
> 
> That assumes that guests are fragile in this regard. A linux guest
> does not make such assumptions.

Yes it does. Examples above:
	> > - a single slow flow can occupy the whole ring, you will not
	> >   be able to make any new buffers available for the fast flow
	> > - what host considers a single flow can be multiple flows for guest

it's easier to see if you enable the watchdog timer for virtio. Linux
supports that.

> There are NICs with hardware
> rate limiting, so I'm not sure how much of a leap host os rate
> limiting is.

I don't know what happens if these NICs hold onto packets for seconds.

-- 
MST

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox