Netdev List
 help / color / mirror / Atom feed
* Re: [iproute2 net-next 3/8] bpf: Add BPF_ macros
From: Hannes Frederic Sowa @ 2016-12-12 11:28 UTC (permalink / raw)
  To: David Ahern, netdev, stephen
In-Reply-To: <1481503995-24825-4-git-send-email-dsa@cumulusnetworks.com>

On 12.12.2016 01:53, David Ahern wrote:
> Based on version in kernel repo, samples/bpf/libbpf.h
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  include/bpf_util.h | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 179 insertions(+)
> 
> diff --git a/include/bpf_util.h b/include/bpf_util.h
> index 726e34777755..5361dab1933d 100644

Maybe this was already discussed: why are those not part of uapi? They
get used in the bpf manpage.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH 1/1] Fixed to BUG_ON to WARN_ON def
From: Stefan Schmidt @ 2016-12-12 11:19 UTC (permalink / raw)
  To: Ozgur Karatas, yishaih; +Cc: netdev, linux-kernel
In-Reply-To: <811031481540339@web1j.yandex.ru>

Hello.

On 12/12/16 11:58, Ozgur Karatas wrote:
> Hello all,
> I think should be use to "WARN_ON" and checkpatch script give to error, I fixed and I think  should don't use "BUG_ON".
> Regards,
>
> Signed-off-by: Ozgur Karatas <okaratas@member.fsf.org>

I pointed you already before to the Documentation how to prepare a 
commit subject and commit message. You just replied with that you are 
new to contributing patches. That is all fine and many people are new 
each release. Please take the time to read the provided and pointed out 
docs.

If you keep ignoring such suggestions and docs I would think people will 
keep ignoring your patches.

regards
Stefan Schmidt

^ permalink raw reply

* Re: [PATCH v2 1/4] net: hix5hd2_gmac: add generic compatible string
From: Dongpo Li @ 2016-12-12 11:16 UTC (permalink / raw)
  To: Rob Herring
  Cc: mark.rutland-5wv7dgnIgG8, mturquette-rdvid1DuHRBWk0Htik3J/w,
	sboyd-sgV2jX0FEOL9JmXXK+q4OQ, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	yisen.zhuang-hv44wF8Li93QT0dZR+AlfA,
	salil.mehta-hv44wF8Li93QT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	arnd-r2nGTMty4D4, andrew-g2DYL2Zd6BY,
	xuejiancheng-C8/M+/jPZTeaMJb+Lgu22Q,
	benjamin.chenhao-C8/M+/jPZTeaMJb+Lgu22Q,
	caizhiyong-C8/M+/jPZTeaMJb+Lgu22Q, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161209223521.5dnj7l44e663sntl@rob-hp-laptop>

Hi Rob,

On 2016/12/10 6:35, Rob Herring wrote:
> On Mon, Dec 05, 2016 at 09:27:58PM +0800, Dongpo Li wrote:
>> The "hix5hd2" is SoC name, add the generic ethernet driver name.
>> The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
>> the SG/TXCSUM/TSO/UFO features.
>>
>> Signed-off-by: Dongpo Li <lidongpo-C8/M+/jPZTeaMJb+Lgu22Q@public.gmane.org>
>> ---
>>  .../devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt    |  9 +++++++--
>>  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c             | 15 +++++++++++----
>>  2 files changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
>> index 75d398b..75920f0 100644
>> --- a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
>> +++ b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
>> @@ -1,7 +1,12 @@
>>  Hisilicon hix5hd2 gmac controller
>>  
>>  Required properties:
>> -- compatible: should be "hisilicon,hix5hd2-gmac".
>> +- compatible: should contain one of the following SoC strings:
>> +	* "hisilicon,hix5hd2-gemac"
>> +	* "hisilicon,hi3798cv200-gemac"
>> +	and one of the following version string:
>> +	* "hisilicon,hisi-gemac-v1"
>> +	* "hisilicon,hisi-gemac-v2"
> 
> What combinations are valid? I assume both chips don't have both v1 and 
> v2. 2 SoCs and 2 versions so far, I don't think there is much point to 
> have the v1 and v2 compatible strings.
> 
The v1 and v2 are generic MAC compatible strings, many HiSilicon SoCs may
use the same MAC version. For example,
hix5hd2, hi3716cv200 SoCs use the v1 MAC version,
hi3798cv200, hi3516a SoCs use the v2 MAC version,
and there may be more SoCs added in future.
So I think the generic compatible strings are okay here.
Should I add the hi3716cv200, hi3516a SoCs compatible here?
Do you have any good advice?

>>  - reg: specifies base physical address(s) and size of the device registers.
>>    The first region is the MAC register base and size.
>>    The second region is external interface control register.
>> @@ -20,7 +25,7 @@ Required properties:
>>  
>>  Example:
>>  	gmac0: ethernet@f9840000 {
>> -		compatible = "hisilicon,hix5hd2-gmac";
>> +		compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";
> 
> You can't just change compatible strings.
> 
Okay, maybe I should name all the compatible string with the suffix "-gmac" instead of
"-gemac". This can keep the compatible strings with the same suffix. Is this okay?
Can I just add the generic compatible string without changing the SoCs compatible string?
Like following:
  	gmac0: ethernet@f9840000 {
 -		compatible = "hisilicon,hix5hd2-gmac";
 +		compatible = "hisilicon,hix5hd2-gmac", "hisilicon,hisi-gmac-v1";

>>  		reg = <0xf9840000 0x1000>,<0xf984300c 0x4>;
>>  		interrupts = <0 71 4>;
>>  		#address-cells = <1>;
> 
> .
> 


    Regards,
    Dongpo

.

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 1/1] Fixed to BUG_ON to WARN_ON def
From: Ozgur Karatas @ 2016-12-12 10:58 UTC (permalink / raw)
  To: yishaih; +Cc: netdev, linux-kernel

Hello all, 
I think should be use to "WARN_ON" and checkpatch script give to error, I fixed and I think  should don't use "BUG_ON".
Regards,

Signed-off-by: Ozgur Karatas <okaratas@member.fsf.org>
---
drivers/net/ethernet/mellanox/mlx4/icm.c |  4 ++--

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 2a9dd46..3fde535 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -119,7 +119,7 @@ static int mlx4_alloc_icm_coherent(struct device *dev, struct scatterlist *mem,
 		return -ENOMEM;
 
 	sg_set_buf(mem, buf, PAGE_SIZE << order);
-	BUG_ON(mem->offset);
+	WARN_ON(mem->offset);
 	sg_dma_len(mem) = PAGE_SIZE << order;
 	return 0;
 }
@@ -133,7 +133,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 	int ret;
 
 	/* We use sg_set_buf for coherent allocs, which assumes low memory */
-	BUG_ON(coherent && (gfp_mask & __GFP_HIGHMEM));
+	WARN_ON(coherent && (gfp_mask & __GFP_HIGHMEM));
 
 	icm = kmalloc_node(sizeof(*icm),
 			   gfp_mask & ~(__GFP_HIGHMEM | __GFP_NOWARN),
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v3 4/4] vsock: cancel packets when failing to connect
From: Jorgen S. Hansen @ 2016-12-12 10:41 UTC (permalink / raw)
  To: Peng Tao
  Cc: netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	David Miller, Stefan Hajnoczi, kvm@vger.kernel.org
In-Reply-To: <1481217156-7160-5-git-send-email-bergwolf@gmail.com>


> On Dec 8, 2016, at 6:12 PM, Peng Tao <bergwolf@gmail.com> wrote:
> 
> Otherwise we'll leave the packets queued until releasing vsock device.
> E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest
> will get the connect requests from failed host sockets.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
> ---
> net/vmw_vsock/af_vsock.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 8a398b3..c73b03a 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1101,10 +1101,19 @@ static const struct proto_ops vsock_dgram_ops = {
> 	.sendpage = sock_no_sendpage,
> };
> 
> +static int vsock_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	if (!transport->cancel_pkt)
> +		return -EOPNOTSUPP;
> +
> +	return transport->cancel_pkt(vsk);
> +}
> +
> static void vsock_connect_timeout(struct work_struct *work)
> {
> 	struct sock *sk;
> 	struct vsock_sock *vsk;
> +	int cancel = 0;
> 
> 	vsk = container_of(work, struct vsock_sock, dwork.work);
> 	sk = sk_vsock(vsk);
> @@ -1115,8 +1124,11 @@ static void vsock_connect_timeout(struct work_struct *work)
> 		sk->sk_state = SS_UNCONNECTED;
> 		sk->sk_err = ETIMEDOUT;
> 		sk->sk_error_report(sk);
> +		cancel = 1;
> 	}
> 	release_sock(sk);
> +	if (cancel)
> +		vsock_transport_cancel_pkt(vsk);
> 
> 	sock_put(sk);
> }
> @@ -1223,11 +1235,13 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> 			err = sock_intr_errno(timeout);
> 			sk->sk_state = SS_UNCONNECTED;
> 			sock->state = SS_UNCONNECTED;
> +			vsock_transport_cancel_pkt(vsk);
> 			goto out_wait;
> 		} else if (timeout == 0) {
> 			err = -ETIMEDOUT;
> 			sk->sk_state = SS_UNCONNECTED;
> 			sock->state = SS_UNCONNECTED;
> +			vsock_transport_cancel_pkt(vsk);
> 			goto out_wait;
> 		}
> 
> -- 
> 2.7.4
> 

This looks fine to me:

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply

* Re: [PATCH v3 2/4] vhost-vsock: add pkt cancel capability
From: Jorgen S. Hansen @ 2016-12-12 10:37 UTC (permalink / raw)
  To: Peng Tao
  Cc: netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	David Miller, Stefan Hajnoczi, kvm@vger.kernel.org
In-Reply-To: <1481217156-7160-3-git-send-email-bergwolf@gmail.com>


> On Dec 8, 2016, at 6:12 PM, Peng Tao <bergwolf@gmail.com> wrote:
> 
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -100,6 +100,9 @@ struct vsock_transport {
> 	void (*destruct)(struct vsock_sock *);
> 	void (*release)(struct vsock_sock *);
> 
> +	/* Cancel packets belonging the same vsock */

How about “/* Cancel all pending packets sent on vsock. */“ ? 

> +	int (*cancel_pkt)(struct vsock_sock *vsk);
> +
> 	/* Connections. */
> 	int (*connect)(struct vsock_sock *);
> 
> -- 
> 2.7.4
> 

Thanks,
Jørgen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
From: Harini Katakam @ 2016-12-12 10:34 UTC (permalink / raw)
  To: Andrei Pistirica
  Cc: Richard Cochran, Harini Katakam, Rafal Ozieblo, netdev,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, David Miller, Nicolas Ferre,
	Punnaiah Choudary Kalluri, michals@xilinx.com, Anirudha Sarangi,
	Boris Brezillon, alexandre.belloni, tbultel
In-Reply-To: <07C910AB6AC6C345A093D5A08F5AF568CB74D84D@CHN-SV-EXMX03.mchp-main.com>

Hi Andrei,

On Mon, Dec 12, 2016 at 3:52 PM,  <Andrei.Pistirica@microchip.com> wrote:
>
>
>> -----Original Message-----
>> From: Rafal Ozieblo [mailto:rafalo@cadence.com]
>> Sent: Friday, December 09, 2016 11:20 AM
>> To: Andrei Pistirica - M16132; richardcochran@gmail.com
>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
>> kernel@lists.infradead.org; davem@davemloft.net;
>> nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
>> harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
>> anirudh@xilinx.com; boris.brezillon@free-electrons.com;
>> alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com
>> Subject: RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence
>> GEM.
>>
>> -----Original Message-----
>> > From: Andrei.Pistirica@microchip.com
>> > [mailto:Andrei.Pistirica@microchip.com]
>> > Sent: 8 grudnia 2016 15:42
>> > To: richardcochran@gmail.com
>> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> > linux-arm-kernel@lists.infradead.org; davem@davemloft.net;
>> > nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
>> > harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
>> > anirudh@xilinx.com; boris.brezillon@free-electrons.com;
>> > alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com; Rafal
>> > Ozieblo
>> > Subject: RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
>> Cadence GEM.
>> >
>> >
>> >
>> > > -----Original Message-----
>> > > From: Richard Cochran [mailto:richardcochran@gmail.com]
>> > > Sent: Wednesday, December 07, 2016 11:04 PM
>> > > To: Andrei Pistirica - M16132
>> > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
>> > > kernel@lists.infradead.org; davem@davemloft.net;
>> > > nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
>> > > harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
>> > > anirudh@xilinx.com; boris.brezillon@free-electrons.com;
>> > > alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com;
>> > > rafalo@cadence.com
>> > > Subject: Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
>> > > Cadence GEM.
>> > >
>> > > On Wed, Dec 07, 2016 at 08:39:09PM +0100, Richard Cochran wrote:
>> > > > > +static s32 gem_ptp_max_adj(unsigned int f_nom) {
>> > > > > +       u64 adj;
>> > > > > +
>> > > > > +       /* The 48 bits of seconds for the GEM overflows every:
>> > > > > +        * 2^48/(365.25 * 24 * 60 *60) =~ 8 925 512 years (~= 9 mil years),
>> > > > > +        * thus the maximum adjust frequency must not overflow
>> > > > > + CNS
>> > > register:
>> > > > > +        *
>> > > > > +        * addend  = 10^9/nominal_freq
>> > > > > +        * adj_max = +/- addend*ppb_max/10^9
>> > > > > +        * max_ppb = (2^8-1)*nominal_freq-10^9
>> > > > > +        */
>> > > > > +       adj = f_nom;
>> > > > > +       adj *= 0xffff;
>> > > > > +       adj -= 1000000000ULL;
>> > > >
>> > > > What is this computation, and how does it relate to the comment?
>> >
>> > I considered the following simple equation: increment value at nominal
>> frequency (which is 10^9/nominal frequency nsecs) + the maximum drift
>> value (nsecs) <= maximum increment value at nominal frequency (which is
>> 8bit:0xffff).
>> > If maximum drift is written as function of nominal frequency and
>> maximum ppb, then the equation above yields that the maximum ppb is:
>> (2^8 - 1) *nominal_frequency - 10^9. The equation is also simplified by the
>> fact that the drift is written as ppm + 16bit_fractions and the increment
>> value is written as nsec + 16bit_fractions.
>> >
>> > Rafal said that this value is hardcoded: 0x64E6, while Harini said:
>> 250000000.
>>
>> To clarify a little bit. In my reference code this value (0x64E6) was taken
>> from our legacy code. It was used for testing only. I know it should be
>> change to something more accurate. This is the reason why I asked how did
>> you count it (250000000). According to our calculations this value depends
>> on actual set period (incr_ns and incr_sub_ns) and min and max value we
>> can set. The calculation were a little bit intricate, so we decided to leave it
>> as it was.
>>
>> >
>> > I need to dig into this...
>> >
>> > >
>> > > I am not sure what you meant, but it sounds like you are on the wrong
>> track.
>> > > Let me explain...
>> >
>> > Thanks.
>> >
>> > >
>> > > The max_adj has nothing at all to do with the width of the time register.
>> > > Rather, it should reflect the maximum possible change in the tuning
>> word.
>> > >
>> > > For example, with a nominal 8 ns period, the tuning word is 0x80000.
>> > > Looking at running the clock more slowly, the slowest possible word
>> > > is 0x00001, meaning a difference of 0x7FFFF.  This implies an
>> > > adjustment of
>> > > 0x7FFFF/0x80000 or 999998092 ppb.  Running more quickly, we can
>> > > already have 0x100000, twice as fast, or just under 2 billion ppb.
>> > >
>> > > You should consider the extreme cases to determine the most limited
>> > > (smallest) max_adj value:
>> > >
>> > > Case 1 - high frequency
>> > > ~~~~~~~~~~~~~~~~~~~~~~~
>> > >
>> > > With a nominal 1 ns period, we have the nominal tuning word 0x10000.
>> > > The smallest is 0x1 for a difference of 0xFFFF.  This corresponds to
>> > > an adjustment of 0xFFFF/0x10000 = .9999847412109375 or 999984741 ppb.
>> > >
>> > > Case 2 - low frequency
>> > > ~~~~~~~~~~~~~~~~~~~~~~
>> > >
>> > > With a nominal 255 ns period, the nominal word is 0xFF0000, the
>> > > largest 0xFFFFFF, and the difference is 0xFFFF.  This corresponds to
>> > > and adjustment of 0xFFFF/0xFF0000 = .0039215087890625 or 3921508 ppb.
>> > >
>> > > Since 3921508 ppb is a huge adjustment, you can simply use that as a
>> > > safe maximum, ignoring the actual input clock.
>> > >
>> > > Thanks,
>> > > Richard
>> > >
>> > >
>> >
>> > Regards,
>> > Andrei
>> >
>>
>> Best regards,
>> Rafal Ozieblo   |   Firmware System Engineer,
>> phone nbr.: +48 32 5085469
>> www.cadence.com
>
> Hi Guys,
>
> Based on Richard's input, this is what I want to do for our platforms:
>
> struct macb_ptp_info {
>         void (*ptp_init)(struct net_device *ndev);
>         void (*ptp_remove)(struct net_device *ndev);
> +       s32 (*get_ptp_max_adj)(void);
>         unsigned int (*get_tsu_rate)(struct macb *bp);
>         int (*get_ts_info)(struct net_device *dev,
>                            struct ethtool_ts_info *info);
>        int (*get_hwtst)(struct net_device *netdev,
>                          struct ifreq *ifr);
>        int (*set_hwtst)(struct net_device *netdev,
>                          struct ifreq *ifr, int cmd);
> };
>
> +static s32 gem_get_ptp_max_adj(void)
> +{
> +       return 3921508;
> +}
>
>  static struct macb_ptp_info gem_ptp_info = {
>        .ptp_init        = gem_ptp_init,
>        .ptp_remove      = gem_ptp_remove,
> +       .get_ptp_max_adj = gem_get_ptp_max_adj,
>        .get_tsu_rate    = gem_get_tsu_rate,
>        .get_ts_info     = gem_get_ts_info,
>        .get_hwtst       = gem_get_hwtst,
>        .set_hwtst       = gem_set_hwtst,
>  };
>
> [...]
> void gem_ptp_init(struct net_device *ndev)
>  {
> [...]
>         /* nominal frequency and maximum adjustment in ppb */
>         bp->tsu_rate = bp->ptp_info->get_tsu_rate(bp);
> +       bp->ptp_caps.max_adj = bp->ptp_info->get_ptp_max_adj();
> [...]
> }
>
> Richard, are you agree with this?
>
> Harini, you can fill the callback with the value for your platform. Tell me if you are ok with function's signature.
>

Thanks, this works for me.

Regards,
Harini

^ permalink raw reply

* [PATCH v2] net: macb: Added PCI wrapper for Platform Driver.
From: Bartosz Folta @ 2016-12-12 10:29 UTC (permalink / raw)
  To: Nicolas Ferre, David S. Miller, Niklas Cassel, Alexandre Torgue,
	Satanand Burla, Raghu Vatsavayi, Simon Horman,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
  Cc: Bartosz Folta, Rafal Ozieblo
In-Reply-To: <1481537461-9587-1-git-send-email-bfolta@cadence.com>

There are hardware PCI implementations of Cadence GEM network
controller. This patch will allow to use such hardware with reuse of
existing Platform Driver.

Signed-off-by: Bartosz Folta <bfolta@cadence.com>
---
Changed in v2:
Respin to net-next. Changed patch formatting.
---
 drivers/net/ethernet/cadence/Kconfig    |   9 ++
 drivers/net/ethernet/cadence/Makefile   |   1 +
 drivers/net/ethernet/cadence/macb.c     |  31 +++++--
 drivers/net/ethernet/cadence/macb_pci.c | 152 ++++++++++++++++++++++++++++++++
 include/linux/platform_data/macb.h      |   6 ++
 5 files changed, 194 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..00d833e 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -31,4 +31,13 @@ config MACB
 	  To compile this driver as a module, choose M here: the module
 	  will be called macb.
 
+config MACB_PCI
+	tristate "Cadence PCI MACB/GEM support"
+	depends on MACB
+	---help---
+	  This is PCI wrapper for MACB driver.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called macb_pci.
+
 endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4ba7559 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_MACB) += macb.o
+obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..c0fb80a 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -404,6 +404,8 @@ static int macb_mii_probe(struct net_device *dev)
 			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
 			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
 		}
+	} else {
+		phydev->irq = PHY_POLL;
 	}
 
 	/* attach the mac to the phy */
@@ -482,6 +484,9 @@ static int macb_mii_init(struct macb *bp)
 				goto err_out_unregister_bus;
 		}
 	} else {
+		for (i = 0; i < PHY_MAX_ADDR; i++)
+			bp->mii_bus->irq[i] = PHY_POLL;
+
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;
 
@@ -2523,16 +2528,24 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 			 struct clk **hclk, struct clk **tx_clk,
 			 struct clk **rx_clk)
 {
+	struct macb_platform_data *pdata;
 	int err;
 
-	*pclk = devm_clk_get(&pdev->dev, "pclk");
+	pdata = dev_get_platdata(&pdev->dev);
+	if (pdata) {
+		*pclk = pdata->pclk;
+		*hclk = pdata->hclk;
+	} else {
+		*pclk = devm_clk_get(&pdev->dev, "pclk");
+		*hclk = devm_clk_get(&pdev->dev, "hclk");
+	}
+
 	if (IS_ERR(*pclk)) {
 		err = PTR_ERR(*pclk);
 		dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err);
 		return err;
 	}
 
-	*hclk = devm_clk_get(&pdev->dev, "hclk");
 	if (IS_ERR(*hclk)) {
 		err = PTR_ERR(*hclk);
 		dev_err(&pdev->dev, "failed to get hclk (%u)\n", err);
@@ -3107,15 +3120,23 @@ static int at91ether_init(struct platform_device *pdev)
 MODULE_DEVICE_TABLE(of, macb_dt_ids);
 #endif /* CONFIG_OF */
 
+static const struct macb_config default_gem_config = {
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
+	.dma_burst_length = 16,
+	.clk_init = macb_clk_init,
+	.init = macb_init,
+	.jumbo_max_len = 10240,
+};
+
 static int macb_probe(struct platform_device *pdev)
 {
+	const struct macb_config *macb_config = &default_gem_config;
 	int (*clk_init)(struct platform_device *, struct clk **,
 			struct clk **, struct clk **,  struct clk **)
-					      = macb_clk_init;
-	int (*init)(struct platform_device *) = macb_init;
+					      = macb_config->clk_init;
+	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
 	struct device_node *phy_node;
-	const struct macb_config *macb_config = NULL;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
diff --git a/drivers/net/ethernet/cadence/macb_pci.c b/drivers/net/ethernet/cadence/macb_pci.c
new file mode 100644
index 0000000..b440960
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_pci.c
@@ -0,0 +1,152 @@
+/**
+ * macb_pci.c - Cadence GEM PCI wrapper.
+ *
+ * Copyright (C) 2016 Cadence Design Systems - http://www.cadence.com
+ *
+ * Authors: Rafal Ozieblo <rafalo@cadence.com>
+ *	    Bartosz Folta <bfolta@cadence.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2  of
+ * the License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/etherdevice.h>
+#include <linux/pci.h>
+#include <linux/platform_data/macb.h>
+#include <linux/platform_device.h>
+#include "macb.h"
+
+#define PCI_DRIVER_NAME "macb_pci"
+#define PLAT_DRIVER_NAME "macb"
+
+#define CDNS_VENDOR_ID 0x17cd
+#define CDNS_DEVICE_ID 0xe007
+
+#define GEM_PCLK_RATE 50000000
+#define GEM_HCLK_RATE 50000000
+
+static int macb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	int err;
+	struct platform_device *plat_dev;
+	struct platform_device_info plat_info;
+	struct macb_platform_data plat_data;
+	struct resource res[2];
+
+	/* sanity check */
+	if (!id)
+		return -EINVAL;
+
+	/* enable pci device */
+	err = pci_enable_device(pdev);
+	if (err < 0) {
+		dev_err(&pdev->dev, "Enabling PCI device has failed: 0x%04X",
+			err);
+		return -EACCES;
+	}
+
+	pci_set_master(pdev);
+
+	/* set up resources */
+	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
+	res[0].start = pdev->resource[0].start;
+	res[0].end = pdev->resource[0].end;
+	res[0].name = PCI_DRIVER_NAME;
+	res[0].flags = IORESOURCE_MEM;
+	res[1].start = pdev->irq;
+	res[1].name = PCI_DRIVER_NAME;
+	res[1].flags = IORESOURCE_IRQ;
+
+	dev_info(&pdev->dev, "EMAC physical base addr = 0x%p\n",
+		 (void *)(uintptr_t)pci_resource_start(pdev, 0));
+
+	/* set up macb platform data */
+	memset(&plat_data, 0, sizeof(plat_data));
+
+	/* initialize clocks */
+	plat_data.pclk = clk_register_fixed_rate(&pdev->dev, "pclk", NULL, 0,
+						 GEM_PCLK_RATE);
+	if (IS_ERR(plat_data.pclk)) {
+		err = PTR_ERR(plat_data.pclk);
+		goto err_pclk_register;
+	}
+
+	plat_data.hclk = clk_register_fixed_rate(&pdev->dev, "hclk", NULL, 0,
+						 GEM_HCLK_RATE);
+	if (IS_ERR(plat_data.hclk)) {
+		err = PTR_ERR(plat_data.hclk);
+		goto err_hclk_register;
+	}
+
+	/* set up platform device info */
+	memset(&plat_info, 0, sizeof(plat_info));
+	plat_info.parent = &pdev->dev;
+	plat_info.fwnode = pdev->dev.fwnode;
+	plat_info.name = PLAT_DRIVER_NAME;
+	plat_info.id = pdev->devfn;
+	plat_info.res = res;
+	plat_info.num_res = ARRAY_SIZE(res);
+	plat_info.data = &plat_data;
+	plat_info.size_data = sizeof(plat_data);
+	plat_info.dma_mask = DMA_BIT_MASK(32);
+
+	/* register platform device */
+	plat_dev = platform_device_register_full(&plat_info);
+	if (IS_ERR(plat_dev)) {
+		err = PTR_ERR(plat_dev);
+		goto err_plat_dev_register;
+	}
+
+	pci_set_drvdata(pdev, plat_dev);
+
+	return 0;
+
+err_plat_dev_register:
+	clk_unregister(plat_data.hclk);
+
+err_hclk_register:
+	clk_unregister(plat_data.pclk);
+
+err_pclk_register:
+	pci_disable_device(pdev);
+	return err;
+}
+
+void macb_remove(struct pci_dev *pdev)
+{
+	struct platform_device *plat_dev = pci_get_drvdata(pdev);
+	struct macb_platform_data *plat_data = dev_get_platdata(&plat_dev->dev);
+
+	platform_device_unregister(plat_dev);
+	pci_disable_device(pdev);
+	clk_unregister(plat_data->pclk);
+	clk_unregister(plat_data->hclk);
+}
+
+static struct pci_device_id dev_id_table[] = {
+	{ PCI_DEVICE(CDNS_VENDOR_ID, CDNS_DEVICE_ID), },
+	{ 0, }
+};
+
+static struct pci_driver macb_pci_driver = {
+	.name     = PCI_DRIVER_NAME,
+	.id_table = dev_id_table,
+	.probe    = macb_probe,
+	.remove	  = macb_remove,
+};
+
+module_pci_driver(macb_pci_driver);
+MODULE_DEVICE_TABLE(pci, dev_id_table);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Cadence NIC PCI wrapper");
diff --git a/include/linux/platform_data/macb.h b/include/linux/platform_data/macb.h
index 21b15f6..7815d50 100644
--- a/include/linux/platform_data/macb.h
+++ b/include/linux/platform_data/macb.h
@@ -8,6 +8,8 @@
 #ifndef __MACB_PDATA_H__
 #define __MACB_PDATA_H__
 
+#include <linux/clk.h>
+
 /**
  * struct macb_platform_data - platform data for MACB Ethernet
  * @phy_mask:		phy mask passed when register the MDIO bus
@@ -15,12 +17,16 @@
  * @phy_irq_pin:	PHY IRQ
  * @is_rmii:		using RMII interface?
  * @rev_eth_addr:	reverse Ethernet address byte order
+ * @pclk:		platform clock
+ * @hclk:		AHB clock
  */
 struct macb_platform_data {
 	u32		phy_mask;
 	int		phy_irq_pin;
 	u8		is_rmii;
 	u8		rev_eth_addr;
+	struct clk	*pclk;
+	struct clk	*hclk;
 };
 
 #endif /* __MACB_PDATA_H__ */
-- 
1.9.1

^ permalink raw reply related

* Re: 4.9.0-rc8: tg3 dead after resume
From: Siva Reddy Kallam @ 2016-12-12 10:23 UTC (permalink / raw)
  To: Billy Shuman; +Cc: Michael Chan, Netdev

On Fri, Dec 9, 2016 at 7:59 PM, Billy Shuman <wshuman3@gmail.com> wrote:
> On Thu, Dec 8, 2016 at 4:03 AM, Siva Reddy Kallam
> <siva.kallam@broadcom.com> wrote:
>> On Thu, Dec 8, 2016 at 12:14 AM, Billy Shuman <wshuman3@gmail.com> wrote:
>>> On Wed, Dec 7, 2016 at 12:37 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>>> On Wed, Dec 7, 2016 at 7:20 AM, Billy Shuman <wshuman3@gmail.com> wrote:
>>>>> After resume on 4.9.0-rc8 tg3 is dead.
>>>>>
>>>>> In logs I see:
>>>>> kernel: tg3 0000:44:00.0: phy probe failed, err -19
>>>>> kernel: tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
>>>>
>>>> -19 is -ENODEV which means tg3 cannot read the PHY ID.
>>>>
>>>> If it's a true suspend/resume operation, the driver does not have to
>>>> go through probe during resume.  Please explain how you do
>>>> suspend/resume.
>>>>
>>>
>>> Sorry my previous message was accidentally sent to early.
>>>
>>> I used systemd (systemctl suspend) to suspend.
>>>
>> We need more information to proceed further.
>> Without suspend, Are you able to use the tg3 port?
>
> Yes the port works fine without suspend.
OK
>
>> Which Broadcom card are you having in laptop?
>
> The nic is a NetXtreme BCM57762 Gigabit Ethernet PCIe in a thunderbolt3 dock.
>
OK
>> Please provide complete tg3 specific logs in dmesg.
>>
>
> [   32.084010] tg3.c:v3.137 (May 11, 2014)
> [   32.124695] tg3 0000:44:00.0 eth0: Tigon3 [partno(BCM957762) rev
> 57766001] (PCI Express) MAC address 98:e7:f4:8b:13:19
> [   32.124698] tg3 0000:44:00.0 eth0: attached PHY is 57765
> (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
> [   32.124699] tg3 0000:44:00.0 eth0: RXcsums[1] LinkChgREG[0]
> MIirq[0] ASF[0] TSOcap[1]
> [   32.124700] tg3 0000:44:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
> [   32.219764] tg3 0000:44:00.0 enp68s0: renamed from eth0
> [   36.219245] tg3 0000:44:00.0 enp68s0: Link is up at 1000 Mbps, full duplex
> [   36.219250] tg3 0000:44:00.0 enp68s0: Flow control is on for TX and on for RX
> [   36.219251] tg3 0000:44:00.0 enp68s0: EEE is disabled
>
> after resume
> [   92.292838] tg3 0000:44:00.0 enp68s0: No firmware running
> [   93.521744] tg3 0000:44:00.0: tg3_abort_hw timed out,
> TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
> [  106.704655] tg3 0000:44:00.0 enp68s0: Link is down
> [  108.370356] tg3 0000:44:00.0: tg3_abort_hw timed out,
> TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>
> after rmmod, modprobe
> [  570.933636] tg3 0000:44:00.0: tg3_abort_hw timed out,
> TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
> [  604.847215] tg3.c:v3.137 (May 11, 2014)
> [  605.010075] tg3 0000:44:00.0: phy probe failed, err -19
> [  605.010077] tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
>
>
>
>
We will try to reproduce and update you on this.
>>>> Did this work before?  There has been very few changes to tg3 recently.
>>>>
>>>
>>> This is a new laptop for me, but the same behavior is seen on 4.4.36 and 4.8.12.
>>>
>>>>>
>>>>> rmmod and modprobe does not fix the problem only a reboot resolves the issue.
>>>>>
>>>>> Billy

^ permalink raw reply

* RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.
From: Andrei.Pistirica @ 2016-12-12 10:22 UTC (permalink / raw)
  To: richardcochran, harini.katakam, rafalo
  Cc: netdev, linux-kernel, linux-arm-kernel, davem, nicolas.ferre,
	harinikatakamlinux, punnaia, michals, anirudh, boris.brezillon,
	alexandre.belloni, tbultel
In-Reply-To: <BN3PR07MB2516992DEE883FAD6DC3ED08C9870@BN3PR07MB2516.namprd07.prod.outlook.com>



> -----Original Message-----
> From: Rafal Ozieblo [mailto:rafalo@cadence.com]
> Sent: Friday, December 09, 2016 11:20 AM
> To: Andrei Pistirica - M16132; richardcochran@gmail.com
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org; davem@davemloft.net;
> nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
> harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
> anirudh@xilinx.com; boris.brezillon@free-electrons.com;
> alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com
> Subject: RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence
> GEM.
> 
> -----Original Message-----
> > From: Andrei.Pistirica@microchip.com
> > [mailto:Andrei.Pistirica@microchip.com]
> > Sent: 8 grudnia 2016 15:42
> > To: richardcochran@gmail.com
> > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> > linux-arm-kernel@lists.infradead.org; davem@davemloft.net;
> > nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
> > harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
> > anirudh@xilinx.com; boris.brezillon@free-electrons.com;
> > alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com; Rafal
> > Ozieblo
> > Subject: RE: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
> Cadence GEM.
> >
> >
> >
> > > -----Original Message-----
> > > From: Richard Cochran [mailto:richardcochran@gmail.com]
> > > Sent: Wednesday, December 07, 2016 11:04 PM
> > > To: Andrei Pistirica - M16132
> > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> > > kernel@lists.infradead.org; davem@davemloft.net;
> > > nicolas.ferre@atmel.com; harinikatakamlinux@gmail.com;
> > > harini.katakam@xilinx.com; punnaia@xilinx.com; michals@xilinx.com;
> > > anirudh@xilinx.com; boris.brezillon@free-electrons.com;
> > > alexandre.belloni@free-electrons.com; tbultel@pixelsurmer.com;
> > > rafalo@cadence.com
> > > Subject: Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
> > > Cadence GEM.
> > >
> > > On Wed, Dec 07, 2016 at 08:39:09PM +0100, Richard Cochran wrote:
> > > > > +static s32 gem_ptp_max_adj(unsigned int f_nom) {
> > > > > +       u64 adj;
> > > > > +
> > > > > +       /* The 48 bits of seconds for the GEM overflows every:
> > > > > +        * 2^48/(365.25 * 24 * 60 *60) =~ 8 925 512 years (~= 9 mil years),
> > > > > +        * thus the maximum adjust frequency must not overflow
> > > > > + CNS
> > > register:
> > > > > +        *
> > > > > +        * addend  = 10^9/nominal_freq
> > > > > +        * adj_max = +/- addend*ppb_max/10^9
> > > > > +        * max_ppb = (2^8-1)*nominal_freq-10^9
> > > > > +        */
> > > > > +       adj = f_nom;
> > > > > +       adj *= 0xffff;
> > > > > +       adj -= 1000000000ULL;
> > > >
> > > > What is this computation, and how does it relate to the comment?
> >
> > I considered the following simple equation: increment value at nominal
> frequency (which is 10^9/nominal frequency nsecs) + the maximum drift
> value (nsecs) <= maximum increment value at nominal frequency (which is
> 8bit:0xffff).
> > If maximum drift is written as function of nominal frequency and
> maximum ppb, then the equation above yields that the maximum ppb is:
> (2^8 - 1) *nominal_frequency - 10^9. The equation is also simplified by the
> fact that the drift is written as ppm + 16bit_fractions and the increment
> value is written as nsec + 16bit_fractions.
> >
> > Rafal said that this value is hardcoded: 0x64E6, while Harini said:
> 250000000.
> 
> To clarify a little bit. In my reference code this value (0x64E6) was taken
> from our legacy code. It was used for testing only. I know it should be
> change to something more accurate. This is the reason why I asked how did
> you count it (250000000). According to our calculations this value depends
> on actual set period (incr_ns and incr_sub_ns) and min and max value we
> can set. The calculation were a little bit intricate, so we decided to leave it
> as it was.
> 
> >
> > I need to dig into this...
> >
> > >
> > > I am not sure what you meant, but it sounds like you are on the wrong
> track.
> > > Let me explain...
> >
> > Thanks.
> >
> > >
> > > The max_adj has nothing at all to do with the width of the time register.
> > > Rather, it should reflect the maximum possible change in the tuning
> word.
> > >
> > > For example, with a nominal 8 ns period, the tuning word is 0x80000.
> > > Looking at running the clock more slowly, the slowest possible word
> > > is 0x00001, meaning a difference of 0x7FFFF.  This implies an
> > > adjustment of
> > > 0x7FFFF/0x80000 or 999998092 ppb.  Running more quickly, we can
> > > already have 0x100000, twice as fast, or just under 2 billion ppb.
> > >
> > > You should consider the extreme cases to determine the most limited
> > > (smallest) max_adj value:
> > >
> > > Case 1 - high frequency
> > > ~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > With a nominal 1 ns period, we have the nominal tuning word 0x10000.
> > > The smallest is 0x1 for a difference of 0xFFFF.  This corresponds to
> > > an adjustment of 0xFFFF/0x10000 = .9999847412109375 or 999984741 ppb.
> > >
> > > Case 2 - low frequency
> > > ~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > With a nominal 255 ns period, the nominal word is 0xFF0000, the
> > > largest 0xFFFFFF, and the difference is 0xFFFF.  This corresponds to
> > > and adjustment of 0xFFFF/0xFF0000 = .0039215087890625 or 3921508 ppb.
> > >
> > > Since 3921508 ppb is a huge adjustment, you can simply use that as a
> > > safe maximum, ignoring the actual input clock.
> > >
> > > Thanks,
> > > Richard
> > >
> > >
> >
> > Regards,
> > Andrei
> >
> 
> Best regards,
> Rafal Ozieblo   |   Firmware System Engineer,
> phone nbr.: +48 32 5085469
> www.cadence.com

Hi Guys,

Based on Richard's input, this is what I want to do for our platforms:

struct macb_ptp_info {
        void (*ptp_init)(struct net_device *ndev);
        void (*ptp_remove)(struct net_device *ndev);
+       s32 (*get_ptp_max_adj)(void);
        unsigned int (*get_tsu_rate)(struct macb *bp);
        int (*get_ts_info)(struct net_device *dev,
                           struct ethtool_ts_info *info);
       int (*get_hwtst)(struct net_device *netdev,
                         struct ifreq *ifr);
       int (*set_hwtst)(struct net_device *netdev,
                         struct ifreq *ifr, int cmd); 
};

+static s32 gem_get_ptp_max_adj(void)
+{
+       return 3921508;
+}

 static struct macb_ptp_info gem_ptp_info = {
       .ptp_init        = gem_ptp_init,
       .ptp_remove      = gem_ptp_remove,
+       .get_ptp_max_adj = gem_get_ptp_max_adj,
       .get_tsu_rate    = gem_get_tsu_rate,
       .get_ts_info     = gem_get_ts_info,
       .get_hwtst       = gem_get_hwtst,
       .set_hwtst       = gem_set_hwtst,
 };

[...]
void gem_ptp_init(struct net_device *ndev)
 {
[...]
        /* nominal frequency and maximum adjustment in ppb */
        bp->tsu_rate = bp->ptp_info->get_tsu_rate(bp);
+       bp->ptp_caps.max_adj = bp->ptp_info->get_ptp_max_adj();
[...]
}

Richard, are you agree with this?

Harini, you can fill the callback with the value for your platform. Tell me if you are ok with function's signature.

Regards,
Andrei

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-12 10:19 UTC (permalink / raw)
  To: Florian Fainelli, Andy Shevchenko
  Cc: David Miller, Joao Pinto, Giuseppe CAVALLARO, lars.persson,
	rabin.vincent, netdev, CARLOS.PALMINHA, Jie.Deng1
In-Reply-To: <556353b7-c847-7549-626d-3c324063647e@gmail.com>

Hi,

Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>>
>>> did
>>> actually pioneer the upstreaming effort, but it is good to see people
>>> from Synopsys willing to fix that in the future.
>>
>> Wait, you would like to tell that we have more than 2 drivers for the
>> same (okay, same vendor) IP?!
>> It's better to unify them earlier, than have n+ copies.
> 
> Unfortunately that is the case, see this email:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html
> 
> dwc_eth_qos and stmmac have some overlap. There seems to be work
> underway to unify these two to begin with.
> 
>>
>> P.S. Though, I don't see how sxgbe got in the list. First glance on
>> the code doesn't show similarities.
> 
> Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
> just my cursory look at the code, it may very well be something entirely
> different. The descriptor formats just look suspiciously similar.
> 

Thank you for your inputs! Renaming seems to be a hotspot. I agree that maybe
instead of renaming (breaking retro-compatibility as David and Florian
mentioned), the best is to move stmmac to synopsys/ after merging *qos* and
removing it. As Florian mentioned, git is capable of detecting folder restructured.

@Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
driver would it be possible for you to make an initial analysis of what has to
be merged into Stmmac? This way the development would speed-up.

Thanks to all.

Joao

^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Dmitry Vyukov @ 2016-12-12 10:07 UTC (permalink / raw)
  To: syzkaller
  Cc: Richard Guy Briggs, linux-audit, Paul Moore, David Miller,
	Johannes Berg, Florian Westphal, Eric Dumazet, Herbert Xu, netdev,
	LKML
In-Reply-To: <CAM_iQpVcHGywXn90EpiSz-LsUDgKVqs-7BY-L7UBCu2VxkC31Q@mail.gmail.com>

On Sat, Dec 10, 2016 at 8:40 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On 2016-12-08 22:57, Cong Wang wrote:
>>>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
>>>> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
>>>> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
>>>> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
>>>> > Eliminating the lock since the sock is dead anways eliminates the error.
>>>> >
>>>> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
>>>> > get the test case to compile.
>>>>
>>>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
>>>> are updated as a whole and race between audit_receive_msg() and
>>>> NETLINK_URELEASE.
>>>
>>> This is what I expected and why I originally added the mutex lock in the
>>> callback...  The dumps I got were bare with no wrapper identifying the
>>> process context or specific error, so I'm at a bit of a loss how to
>>> solve this (without thinking more about it) other than instinctively
>>> removing the mutex.
>>
>> Netlink notifier can safely be converted to blocking one, I will send
>> a patch.
>>
>> But I seriously doubt you really need NETLINK_URELEASE here,
>> it adds nothing but overhead, b/c the netlink notifier is called on
>> every netlink socket in the system, but for net exit path, that is
>> relatively a slow path.
>>
>> Also, kauditd_send_skb() needs audit_cmd_mutex too.
>
> Please let me know what you think about the attached patch?

Applied the patch locally and have not seen the bug since then (~24
hours of testing).

^ permalink raw reply

* [PATCH v2] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-12 10:03 UTC (permalink / raw)
  To: netdev, linux-kernel, linux-audit
  Cc: Richard Guy Briggs, dvyukov, xiyou.wangcong, edumazet, eparis,
	pmoore, sgrubb
In-Reply-To: <20161212100215.GA1305@madcap2.tricolour.ca>

Resetting audit_sock appears to be racy.

audit_sock was being copied and dereferenced without using a refcount on
the source sock.

Bump the refcount on the underlying sock when we store a refrence in
audit_sock and release it when we reset audit_sock.  audit_sock
modification needs the audit_cmd_mutex.

See: https://lkml.org/lkml/2016/11/26/232

Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
<xiyou.wangcong@gmail.com> on ideas how to fix it.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
There has been a lot of change in the audit code that is about to go
upstream to address audit queue issues.  This patch is based on the
source tree: git://git.infradead.org/users/pcmoore/audit#next
---
 kernel/audit.c |   34 ++++++++++++++++++++++++++++------
 1 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index f20eee0..439f7f3 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -452,7 +452,9 @@ static void auditd_reset(void)
 	struct sk_buff *skb;
 
 	/* break the connection */
+	sock_put(audit_sock);
 	audit_pid = 0;
+	audit_nlk_portid = 0;
 	audit_sock = NULL;
 
 	/* flush all of the retry queue to the hold queue */
@@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
 	if (rc >= 0) {
 		consume_skb(skb);
 		rc = 0;
+	} else {
+		if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
+			mutex_lock(&audit_cmd_mutex);
+			auditd_reset();
+			mutex_unlock(&audit_cmd_mutex);
+		}
 	}
 
 	return rc;
@@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
 
 				auditd = 0;
 				if (AUDITD_BAD(rc, reschedule)) {
+					mutex_lock(&audit_cmd_mutex);
 					auditd_reset();
+					mutex_unlock(&audit_cmd_mutex);
 					reschedule = 0;
 				}
 			} else
@@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
 				auditd = 0;
 				if (AUDITD_BAD(rc, reschedule)) {
 					kauditd_hold_skb(skb);
+					mutex_lock(&audit_cmd_mutex);
 					auditd_reset();
+					mutex_unlock(&audit_cmd_mutex);
 					reschedule = 0;
 				} else
 					/* temporary problem (we hope), queue
@@ -623,7 +635,9 @@ quick_loop:
 				if (rc) {
 					auditd = 0;
 					if (AUDITD_BAD(rc, reschedule)) {
+						mutex_lock(&audit_cmd_mutex);
 						auditd_reset();
+						mutex_unlock(&audit_cmd_mutex);
 						reschedule = 0;
 					}
 
@@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				return -EACCES;
 			}
 			if (audit_pid && new_pid &&
-			    audit_replace(requesting_pid) != -ECONNREFUSED) {
+			    (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
 				audit_log_config_change("audit_pid", new_pid, audit_pid, 0);
 				return -EEXIST;
 			}
 			if (audit_enabled != AUDIT_OFF)
 				audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
-			audit_pid = new_pid;
-			audit_nlk_portid = NETLINK_CB(skb).portid;
-			audit_sock = skb->sk;
-			if (!new_pid)
+			if (new_pid) {
+				if (audit_sock)
+					sock_put(audit_sock);
+				audit_pid = new_pid;
+				audit_nlk_portid = NETLINK_CB(skb).portid;
+				sock_hold(skb->sk);
+				audit_sock = skb->sk;
+			} else {
 				auditd_reset();
+			}
 			wake_up_interruptible(&kauditd_wait);
 		}
 		if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
@@ -1283,8 +1302,11 @@ static void __net_exit audit_net_exit(struct net *net)
 {
 	struct audit_net *aunet = net_generic(net, audit_net_id);
 	struct sock *sock = aunet->nlsk;
-	if (sock == audit_sock)
+	if (sock == audit_sock) {
+		mutex_lock(&audit_cmd_mutex);
 		auditd_reset();
+		mutex_unlock(&audit_cmd_mutex);
+	}
 
 	RCU_INIT_POINTER(aunet->nlsk, NULL);
 	synchronize_net();
-- 
1.7.1

^ permalink raw reply related

* Re: netlink: GPF in sock_sndtimeo
From: Richard Guy Briggs @ 2016-12-12 10:02 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-audit, Paul Moore, Dmitry Vyukov, David Miller,
	Johannes Berg, Florian Westphal, Eric Dumazet, Herbert Xu, netdev,
	LKML, syzkaller
In-Reply-To: <CAM_iQpV2GuKhR_1tD5jjACeD+pajJLws08CLmeYAo+rsjxvB0A@mail.gmail.com>

On 2016-12-09 20:13, Cong Wang wrote:
> On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2016-12-08 22:57, Cong Wang wrote:
> >> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> >> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> >> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> >> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> >> > Eliminating the lock since the sock is dead anways eliminates the error.
> >> >
> >> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> >> > get the test case to compile.
> >>
> >> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
> >> are updated as a whole and race between audit_receive_msg() and
> >> NETLINK_URELEASE.
> >
> > This is what I expected and why I originally added the mutex lock in the
> > callback...  The dumps I got were bare with no wrapper identifying the
> > process context or specific error, so I'm at a bit of a loss how to
> > solve this (without thinking more about it) other than instinctively
> > removing the mutex.
> 
> Netlink notifier can safely be converted to blocking one, I will send
> a patch.

I had a quick look at how that might happen.  The netlink notifier chain
is atomic.  Would the registered callback funciton need to spawn a
one-time thread to avoid blocking?

> But I seriously doubt you really need NETLINK_URELEASE here,
> it adds nothing but overhead, b/c the netlink notifier is called on
> every netlink socket in the system, but for net exit path, that is
> relatively a slow path.

I was a bit concerned about its overhead, but was hoping to update
audit_sock more quickly in the case of a sock shutting down for any
reason.

> Also, kauditd_send_skb() needs audit_cmd_mutex too.

Agreed.

> I will send a formal patch.

I had a look at your patch.  It looks attractively simple.  The audit
next tree has patches queued that add an audit_reset function that will
require more work.  I still see some potential gaps.

- If the process messes up (or the sock lookup messes up) it is reset
  in the kauditd thread under the audit_cmd_mutex.

- If the process exits normally or is replaced due to an audit_replace
  error, it is reset from audit_receive_skb under the audit_cmd_mutex.

- If the process dies before the kauditd thread notices, either reap it
  via notifier callback or it needs a check on net exit to reset.  This
  last one appears necessary to decrement the sock refcount so the sock
  can be released in netlink_kernel_release().

If we want to be proactive and use the netlink notifier, we assume the
overhead of adding to the netlink notifier chain and eliminate all the
other reset calls under the kauditd thread.  If we are ok being
reactionary, then we'll at least need the net exit check on audit_sock.

Have I understood this correctly?

I'll follow with a patch based on audit#next

There will be an upstream merge conflict between audit#next and net#next
due to the removal of:
	RCU_INIT_POINTER(aunet->nlsk, NULL);                                                        
	synchronize_net();
from the end of audit_net_exit().  This patch should probably go through
the audit maintainer due to the other anticipated merge conflicts.

> Thanks.

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Jesper Dangaard Brouer @ 2016-12-12  9:40 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: netdev@vger.kernel.org, linux-mm, John Fastabend,
	Willem de Bruijn, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Mel Gorman, Tom Herbert, Brenden Blanco,
	Tariq Toukan, Saeed Mahameed, Jesse Brandeburg, Kalman Meth,
	brouer
In-Reply-To: <20161212083812.GA19987@rapoport-lnx>


On Mon, 12 Dec 2016 10:38:13 +0200 Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> Hello Jesper,
> 
> On Mon, Dec 05, 2016 at 03:31:32PM +0100, Jesper Dangaard Brouer wrote:
> > Hi all,
> > 
> > This is my design for how to safely handle RX zero-copy in the network
> > stack, by using page_pool[1] and modifying NIC drivers.  Safely means
> > not leaking kernel info in pages mapped to userspace and resilience
> > so a malicious userspace app cannot crash the kernel.
> > 
> > Design target
> > =============
> > 
> > Allow the NIC to function as a normal Linux NIC and be shared in a
> > safe manor, between the kernel network stack and an accelerated
> > userspace application using RX zero-copy delivery.
> > 
> > Target is to provide the basis for building RX zero-copy solutions in
> > a memory safe manor.  An efficient communication channel for userspace
> > delivery is out of scope for this document, but OOM considerations are
> > discussed below (`Userspace delivery and OOM`_).  
> 
> Sorry, if this reply is a bit off-topic.

It is very much on topic IMHO :-)

> I'm working on implementation of RX zero-copy for virtio and I've dedicated
> some thought about making guest memory available for physical NIC DMAs.
> I believe this is quite related to your page_pool proposal, at least from
> the NIC driver perspective, so I'd like to share some thoughts here.

Seems quite related. I'm very interested in cooperating with you! I'm
not very familiar with virtio, and how packets/pages gets channeled
into virtio.

> The idea is to dedicate one (or more) of the NIC's queues to a VM, e.g.
> using macvtap, and then propagate guest RX memory allocations to the NIC
> using something like new .ndo_set_rx_buffers method.

I believe the page_pool API/design aligns with this idea/use-case.

> What is your view about interface between the page_pool and the NIC
> drivers?

In my Prove-of-Concept implementation, the NIC driver (mlx5) register
a page_pool per RX queue.  This is done for two reasons (1) performance
and (2) for supporting use-cases where only one single RX-ring queue is
(re)configured to support RX-zero-copy.  There are some associated
extra cost of enabling this mode, thus it makes sense to only enable it
when needed.

I've not decided how this gets enabled, maybe some new driver NDO.  It
could also happen when a XDP program gets loaded, which request this
feature.

The macvtap solution is nice and we should support it, but it requires
VM to have their MAC-addr registered on the physical switch.  This
design is about adding flexibility. Registering an XDP eBPF filter
provides the maximum flexibility for matching the destination VM.


> Have you considered using "push" model for setting the NIC's RX memory?

I don't understand what you mean by a "push" model?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v3 0/4] vsock: cancel connect packets when failing to connect
From: Peng Tao @ 2016-12-12  9:17 UTC (permalink / raw)
  To: Stefan Hajnoczi, Jorgen Hansen
  Cc: netdev@vger.kernel.org, virtualization, David Miller,
	Stefan Hajnoczi, kvm
In-Reply-To: <20161209101842.GD18260@stefanha-x1.localdomain>

On Fri, Dec 9, 2016 at 6:18 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Dec 09, 2016 at 01:12:32AM +0800, Peng Tao wrote:
>> Currently, if a connect call fails on a signal or timeout (e.g., guest is still
>> in the process of starting up), we'll just return to caller and leave the connect
>> packet queued and they are sent even though the connection is considered a failure,
>> which can confuse applications with unwanted false connect attempt.
>>
>> The patchset enables vsock (both host and guest) to cancel queued packets when
>> a connect attempt is considered to fail.
>>
>> v3 changelog:
>>   - define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
>>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
>> v2 changelog:
>>   - fix queued_replies counting and resume tx/rx when necessary
>>
>>
>> Peng Tao (4):
>>   vsock: track pkt owner vsock
>>   vhost-vsock: add pkt cancel capability
>>   vsock: add pkt cancel capability
>>   vsock: cancel packets when failing to connect
>>
>>  drivers/vhost/vsock.c                   | 41 ++++++++++++++++++++++++++++++++
>>  include/linux/virtio_vsock.h            |  2 ++
>>  include/net/af_vsock.h                  |  3 +++
>>  net/vmw_vsock/af_vsock.c                | 14 +++++++++++
>>  net/vmw_vsock/virtio_transport.c        | 42 +++++++++++++++++++++++++++++++++
>>  net/vmw_vsock/virtio_transport_common.c |  7 ++++++
>>  6 files changed, 109 insertions(+)
>
> I'm happy although I pointed out two unnecessary (void*) casts.
>
> Please wait for Jorgen to go happy on the af_vsock.c changes before
> applying.
Thanks for reviewing!

Jorgen, would you please see if the changes to af_vsock.c is OK to you?

Cheers,
Tao

^ permalink raw reply

* Re: [iproute2 net-next 3/8] bpf: Add BPF_ macros
From: Daniel Borkmann @ 2016-12-12  9:15 UTC (permalink / raw)
  To: David Ahern, netdev, stephen
In-Reply-To: <1481503995-24825-4-git-send-email-dsa@cumulusnetworks.com>

On 12/12/2016 01:53 AM, David Ahern wrote:
> Based on version in kernel repo, samples/bpf/libbpf.h
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [iproute2 net-next 2/8] bpf: export bpf_prog_load
From: Daniel Borkmann @ 2016-12-12  9:14 UTC (permalink / raw)
  To: David Ahern, netdev, stephen
In-Reply-To: <1481503995-24825-3-git-send-email-dsa@cumulusnetworks.com>

On 12/12/2016 01:53 AM, David Ahern wrote:
> Code move only; no functional change intended.
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [iproute2 net-next 1/8] lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
From: Daniel Borkmann @ 2016-12-12  9:14 UTC (permalink / raw)
  To: David Ahern, netdev, stephen
In-Reply-To: <1481503995-24825-2-git-send-email-dsa@cumulusnetworks.com>

On 12/12/2016 01:53 AM, David Ahern wrote:
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device
From: maowenan @ 2016-12-12  8:49 UTC (permalink / raw)
  To: David Laight, netdev@vger.kernel.org, f.fainelli@gmail.com,
	dingtianhong@huawei.com, weiyongjun (A)
In-Reply-To: <0cef0798-7b0d-734d-17a3-bebffe6206c7@huawei.com>



On 2016/12/5 16:47, maowenan wrote:
> 
> 
> On 2016/12/2 17:45, David Laight wrote:
>> From: Mao Wenan
>>> Sent: 30 November 2016 10:23
>>> The nic in my board use the phy dev from marvell, and the system will
>>> load the marvell phy driver automatically, but when I remove the phy
>>> drivers, the system immediately panic:
>>> Call trace:
>>> [ 2582.834493] [<ffff800000715384>] phy_state_machine+0x3c/0x438 [
>>> 2582.851754] [<ffff8000000db3b8>] process_one_work+0x150/0x428 [
>>> 2582.868188] [<ffff8000000db7d4>] worker_thread+0x144/0x4b0 [
>>> 2582.883882] [<ffff8000000e1d0c>] kthread+0xfc/0x110
>>>
>>> there should be proper reference counting in place to avoid that.
>>> I found that phy_attach_direct() forgets to add phy device driver
>>> reference count, and phy_detach() forgets to subtract reference count.
>>> This patch is to fix this bug, after that panic is disappeared when remove
>>> marvell.ko
>>>
>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>> ---
>>>  drivers/net/phy/phy_device.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>> index 1a4bf8a..a7ec7c2 100644
>>> --- a/drivers/net/phy/phy_device.c
>>> +++ b/drivers/net/phy/phy_device.c
>>> @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>  		return -EIO;
>>>  	}
>>>
>>> +	if (!try_module_get(d->driver->owner)) {
>>> +		dev_err(&dev->dev, "failed to get the device driver module\n");
>>> +		return -EIO;
>>> +	}
>>
>> If this is the phy code, what stops the phy driver being unloaded
>> before the try_module_get() obtains a reference.
>> If it isn't the phy driver then there ought to be a reference count obtained
>> when the phy driver is located (by whatever decides which phy driver to use).
>> Even if that code later releases its reference (it probably shouldn't on success)
>> then you can't fail to get an extra reference here.
> 
> [Mao Wenan]Yes, this is phy code, in function phy_attach_direct(), drivers/net/phy/phy_device.c.
> when one NIC driver to do probe behavior, it will attach one matched phy driver. phy_attach_direct()
> is to obtain phy driver reference and bind phy driver, if try_module_get() execute on success, the reference
> count is added; if failed, the driver can't be attached to this NIC, and it can't added the phy driver
> reference count. So before try_module_get obtains a reference, phy driver can't can't be bound to this NIC.
> when the phy driver is attached to NIC, the reference count is added, if someone remove phy driver directly,
> it will be failed because reference count is not equal to 0.
> 
> An example of call trace when there is NIC driver to attch one phy driver:
> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct
> 
> Consider the steps of phy driver(marvell.ko) added and removed, and NIC driver(hns_enet_drv.ko) added and removed:
> 1)insmod marvell       ref=0
> 2)insmod hns_enet_drv  ref=1
> 3)rmmod marvell        (should not on success, ref=1)
> 4)rmmod hns_enet_drv   ref=0
> 5)rmmod marvell        (should on success, because ref=0)
> 
> if we don't add the reference count in phy_attach_direct(the second step ref=0), so the third step rmmod marvell will
> be panic, because there is one user remain use marvell driver and phy_stat_machine use the NULL drv pointer.
> 
>>
>>> +
>>>  	get_device(d);
>>>
>>>  	/* Assume that if there is no driver, that it doesn't
>>> @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>
>>>  error:
>>>  	put_device(d);
>>> +	module_put(d->driver->owner);
>>
>> Are those two in the wrong order ?
>>
>>>  	module_put(bus->owner);
>>>  	return err;
>>>  }
>>> @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>>>  	bus = phydev->mdio.bus;
>>>
>>>  	put_device(&phydev->mdio.dev);
>>> +	module_put(phydev->mdio.dev.driver->owner);
>>>  	module_put(bus->owner);
>>
>> Where is this code called from?
>> You can't call it from the phy driver because the driver can be unloaded
>> as soon as the last reference is removed.
>> At that point the code memory is freed.
> 
> [Mao Wenan] it is called by NIC when it is removed, which aims to disconnect one bound phy driver. If this phy driver
> is not used for this NIC, reference count should be subtracted, and phy driver can be removed if there is no user.
> hns_nic_dev_remove->phy_disconnect->phy_detach
> 
> 
> 
>>
>>>  }
>>>  EXPORT_SYMBOL(phy_detach);
>>> --
>>> 2.7.0
>>>
>>
>>
>> .
>>

@Florian Fainelli, what's your comments about this patch?

^ permalink raw reply

* Re: Designing a safe RX-zero-copy Memory Model for Networking
From: Mike Rapoport @ 2016-12-12  8:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev@vger.kernel.org, linux-mm, John Fastabend,
	Willem de Bruijn, Björn Töpel, Karlsson, Magnus,
	Alexander Duyck, Mel Gorman, Tom Herbert, Brenden Blanco,
	Tariq Toukan, Saeed Mahameed, Jesse Brandeburg, Kalman Meth
In-Reply-To: <20161205153132.283fcb0e@redhat.com>

Hello Jesper,

On Mon, Dec 05, 2016 at 03:31:32PM +0100, Jesper Dangaard Brouer wrote:
> Hi all,
> 
> This is my design for how to safely handle RX zero-copy in the network
> stack, by using page_pool[1] and modifying NIC drivers.  Safely means
> not leaking kernel info in pages mapped to userspace and resilience
> so a malicious userspace app cannot crash the kernel.
> 
> Design target
> =============
> 
> Allow the NIC to function as a normal Linux NIC and be shared in a
> safe manor, between the kernel network stack and an accelerated
> userspace application using RX zero-copy delivery.
> 
> Target is to provide the basis for building RX zero-copy solutions in
> a memory safe manor.  An efficient communication channel for userspace
> delivery is out of scope for this document, but OOM considerations are
> discussed below (`Userspace delivery and OOM`_).

Sorry, if this reply is a bit off-topic.

I'm working on implementation of RX zero-copy for virtio and I've dedicated
some thought about making guest memory available for physical NIC DMAs.
I believe this is quite related to your page_pool proposal, at least from
the NIC driver perspective, so I'd like to share some thoughts here.
The idea is to dedicate one (or more) of the NIC's queues to a VM, e.g.
using macvtap, and then propagate guest RX memory allocations to the NIC
using something like new .ndo_set_rx_buffers method.

What is your view about interface between the page_pool and the NIC
drivers?
Have you considered using "push" model for setting the NIC's RX memory?

> 
> --
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 
> Above document is taken at GitHub commit 47fa7c844f48fab8b
>  https://github.com/netoptimizer/prototype-kernel/commit/47fa7c844f48fab8b
> 

--
Sincerely yours,
Mike.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* RE: [PATCH] net: add one ethtool option to set relax ordering mode
From: maowenan @ 2016-12-12  8:30 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev@vger.kernel.org, jeffrey.t.kirsher@intel.com,
	weiyongjun (A)
In-Reply-To: <20161208141153.GI26852@lunn.ch>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Andrew Lunn
> Sent: Thursday, December 08, 2016 10:12 PM
> To: maowenan
> Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> Subject: Re: [PATCH] net: add one ethtool option to set relax ordering mode
> 
> On Thu, Dec 08, 2016 at 02:51:37PM +0800, Mao Wenan wrote:
> > This patch provides one way to set/unset IXGBE NIC TX and RX relax
> > ordering mode, which can be set by ethtool.
> > Relax ordering is one mode of 82599 NIC, to enable this mode can
> > enhance the performance for some cpu architecure.
> > example:
> > ethtool -s enp1s0f0 relaxorder off
> > ethtool -s enp1s0f0 relaxorder on
> 
> Since this is a simple on/off, could it not be done with a feature?
> ethtool --feature?
> 
> 	Andrew

Hello Andrew, 
	Thank you for your comments.
	I get your idea about using ethtool -K|--feature is good for this feature, right? 
My original concert is about this is a relax ordering mode exist in 82599, it is the hardware 
related feature. And ethtool -s option is related hardware of phy and other (e.g: speed, duplex...),
it is very easy to implement in do_sset().
But ethtool -K is mainly used for protocol offload,
        ethtool -K|--features|--offload DEVNAME Set protocol offload and other features
                FEATURE on|off ... 
@Jeff Kirsher, what's your comments?

^ permalink raw reply

* Re: [PATCH net-next 2/2] net: ethernet: Initial driver for Synopsys DWC XLGMAC
From: kbuild test robot @ 2016-12-12  8:26 UTC (permalink / raw)
  To: Jie Deng
  Cc: kbuild-all, davem, f.fainelli, netdev, linux-kernel,
	CARLOS.PALMINHA, lars.persson, thomas.lendacky, Jie Deng
In-Reply-To: <3fe82457c51f8437797eae27d03cdb0dcbef039b.1481075763.git.jiedeng@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 2632 bytes --]

Hi Jie,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jie-Deng/net-phy-add-extension-of-phy-mode-for-XLGMII/20161207-121843
config: i386-randconfig-h0-12121424 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c:17:0:
>> drivers/net/ethernet/synopsys/dwc/dwc-eth.h:662:26: error: 'IEEE_8021QAZ_MAX_TCS' undeclared here (not in a function)
     unsigned int prio2q_map[IEEE_8021QAZ_MAX_TCS];
                             ^~~~~~~~~~~~~~~~~~~~
--
   In file included from drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c:58:0:
>> drivers/net/ethernet/synopsys/dwc/dwc-eth.h:662:26: error: 'IEEE_8021QAZ_MAX_TCS' undeclared here (not in a function)
     unsigned int prio2q_map[IEEE_8021QAZ_MAX_TCS];
                             ^~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c: In function 'dwc_eth_enable_tx_flow_control':
>> drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c:1214:13: error: dereferencing pointer to incomplete type 'struct ieee_ets'
        tc = ets->prio_tc[prio];
                ^~
>> drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c:1217:12: error: dereferencing pointer to incomplete type 'struct ieee_pfc'
        if (pfc->pfc_en & (1 << tc)) {
               ^~
   drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c: In function 'dwc_eth_config_dcb_tc':
>> drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c:1407:8: error: 'IEEE_8021QAZ_TSA_STRICT' undeclared (first use in this function)
      case IEEE_8021QAZ_TSA_STRICT:
           ^~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c:1407:8: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c:1413:8: error: 'IEEE_8021QAZ_TSA_ETS' undeclared (first use in this function)
      case IEEE_8021QAZ_TSA_ETS:
           ^~~~~~~~~~~~~~~~~~~~

vim +/IEEE_8021QAZ_MAX_TCS +662 drivers/net/ethernet/synopsys/dwc/dwc-eth.h

   656		u64 tx_tstamp;
   657	
   658		/* DCB support */
   659		struct ieee_ets *ets;
   660		struct ieee_pfc *pfc;
   661		unsigned int q2tc_map[DWC_ETH_MAX_QUEUES];
 > 662		unsigned int prio2q_map[IEEE_8021QAZ_MAX_TCS];
   663		u8 num_tcs;
   664	
   665		/* Device control parameters */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26168 bytes --]

^ permalink raw reply

* Re: [Patch net-next] ipvs: remove an annoying printk in netns init
From: Simon Horman @ 2016-12-12  7:54 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <1481346600-25335-1-git-send-email-xiyou.wangcong@gmail.com>

On vr, dec 09, 2016 at 09:09:59 -0800, Cong Wang wrote:
> At most it is used for debugging purpose, but I don't think
> it is even useful for debugging, just remove it.
> 
> Cc: Simon Horman <horms@verge.net.au>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks, applied.

^ permalink raw reply

* [PATCH] vhost: cache used event for better performance
From: Jason Wang @ 2016-12-12  6:46 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: Jason Wang

When event index was enabled, we need to fetch used event from
userspace memory each time. This userspace fetch (with memory
barrier) could be saved sometime when 1) caching used event and 2)
if used event is ahead of new and old to new updating does not cross
it, we're sure there's no need to notify guest.

This will be useful for heavy tx load e.g guest pktgen test with Linux
driver shows ~3.5% improvement.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 28 ++++++++++++++++++++++------
 drivers/vhost/vhost.h |  3 +++
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2663543..d3fa550 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -290,6 +290,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->avail = NULL;
 	vq->used = NULL;
 	vq->last_avail_idx = 0;
+	vq->last_used_event = 0;
 	vq->avail_idx = 0;
 	vq->last_used_idx = 0;
 	vq->signalled_used = 0;
@@ -1324,7 +1325,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 			r = -EINVAL;
 			break;
 		}
-		vq->last_avail_idx = s.num;
+		vq->last_avail_idx = vq->last_used_event = s.num;
 		/* Forget the cached index value. */
 		vq->avail_idx = vq->last_avail_idx;
 		break;
@@ -2159,10 +2160,6 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__u16 old, new;
 	__virtio16 event;
 	bool v;
-	/* Flush out used index updates. This is paired
-	 * with the barrier that the Guest executes when enabling
-	 * interrupts. */
-	smp_mb();
 
 	if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
 	    unlikely(vq->avail_idx == vq->last_avail_idx))
@@ -2170,6 +2167,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
 		__virtio16 flags;
+		/* Flush out used index updates. This is paired
+		 * with the barrier that the Guest executes when enabling
+		 * interrupts. */
+		smp_mb();
 		if (vhost_get_user(vq, flags, &vq->avail->flags)) {
 			vq_err(vq, "Failed to get flags");
 			return true;
@@ -2184,11 +2185,26 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	if (unlikely(!v))
 		return true;
 
+	/* We're sure if the following conditions are met, there's no
+	 * need to notify guest:
+	 * 1) cached used event is ahead of new
+	 * 2) old to new updating does not cross cached used event. */
+	if (vring_need_event(vq->last_used_event, new + vq->num, new) &&
+	    !vring_need_event(vq->last_used_event, new, old))
+		return false;
+
+	/* Flush out used index updates. This is paired
+	 * with the barrier that the Guest executes when enabling
+	 * interrupts. */
+	smp_mb();
+
 	if (vhost_get_user(vq, event, vhost_used_event(vq))) {
 		vq_err(vq, "Failed to get used event idx");
 		return true;
 	}
-	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
+	vq->last_used_event = vhost16_to_cpu(vq, event);
+
+	return vring_need_event(vq->last_used_event, new, old);
 }
 
 /* This actually signals the guest, using eventfd. */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 78f3c5f..a9cbbb1 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -107,6 +107,9 @@ struct vhost_virtqueue {
 	/* Last index we used. */
 	u16 last_used_idx;
 
+	/* Last used evet we've seen */
+	u16 last_used_event;
+
 	/* Used flags */
 	u16 used_flags;
 
-- 
2.7.4


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox