Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: XDP redirect measurements, gotchas and tracepoints
From: John Fastabend @ 2017-08-28 16:14 UTC (permalink / raw)
  To: Andy Gospodarek, Michael Chan
  Cc: Jesper Dangaard Brouer, Alexander Duyck, Duyck, Alexander H,
	pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, borkmann@iogearbox.net
In-Reply-To: <20170828160237.GA70910@C02RW35GFVH8.dhcp.broadcom.net>

On 08/28/2017 09:02 AM, Andy Gospodarek wrote:
> On Fri, Aug 25, 2017 at 08:28:55AM -0700, Michael Chan wrote:
>> On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>> On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
>>>> On Thu, 24 Aug 2017 20:36:28 -0700
>>>> Michael Chan <michael.chan@broadcom.com> wrote:
>>>>
>>>>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
>>>>> <brouer@redhat.com> wrote:
>>>>>> On Tue, 22 Aug 2017 23:59:05 -0700
>>>>>> Michael Chan <michael.chan@broadcom.com> wrote:
>>>>>>
>>>>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
>>>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>>>>>>>>
>>>>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
>>>>>>>>> the input device, right?
>>>>>>
>>>>>> Yes, I would really like to see an API like this.
>>>>>>
>>>>>>>>
>>>>>>>> You could, it is just added complexity. "just free the buffer" in
>>>>>>>> ixgbe usually just amounts to one atomic operation to decrement the
>>>>>>>> total page count since page recycling is already implemented in the
>>>>>>>> driver. You still would have to unmap the buffer regardless of if you
>>>>>>>> were recycling it or not so all you would save is 1.000015259 atomic
>>>>>>>> operations per packet. The fraction is because once every 64K uses we
>>>>>>>> have to bulk update the count on the page.
>>>>>>>>
>>>>>>>
>>>>>>> If the buffer is returned to the input device, the input device can
>>>>>>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
>>>>>>> the input device when the buffer is returned.
>>>>>>
>>>>>> Yes, exactly, return to the input device. I really think we should
>>>>>> work on a solution where we can keep the DMA mapping around.  We have
>>>>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
>>>>>> page return call, to achieve this. (I imagine other arch's have a high
>>>>>> DMA overhead than Intel)
>>>>>>
>>>>>> I'm not sure how the API should look.  The ixgbe recycle mechanism and
>>>>>> splitting the page (into two packets) actually complicates things, and
>>>>>> tie us into a page-refcnt based model.  We could get around this by
>>>>>> each driver implementing a page-return-callback, that allow us to
>>>>>> return the page to the input device?  Then, drivers implementing the
>>>>>> 1-packet-per-page can simply check/read the page-refcnt, and if it is
>>>>>> "1" DMA-sync and reuse it in the RX queue.
>>>>>>
>>>>>
>>>>> Yeah, based on Alex' description, it's not clear to me whether ixgbe
>>>>> redirecting to a non-intel NIC or vice versa will actually work.  It
>>>>> sounds like the output device has to make some assumptions about how
>>>>> the page was allocated by the input device.
>>>>
>>>> Yes, exactly. We are tied into a page refcnt based scheme.
>>>>
>>>> Besides the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
>>>> is also tied to the RX queue size, plus how fast the pages are returned.
>>>> This makes it very hard to tune.  As I demonstrated, default ixgbe
>>>> settings does not work well with XDP_REDIRECT.  I needed to increase
>>>> TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
>>>> 10Mpps) so I also needed it increase RX-ring size.  But perf is best if
>>>> RX-ring size is smaller, thus two contradicting tuning needed.
>>>>
>>>
>>> The changes to decouple the ixgbe page recycle scheme (1pg per descriptor
>>> split into two halves being the default) from the number of descriptors
>>> doesn't look too bad IMO. It seems like it could be done by having some
>>> extra pages allocated upfront and pulling those in when we need another
>>> page.
>>>
>>> This would be a nice iterative step we could take on the existing API.
>>>
>>>>
>>>>> With buffer return API,
>>>>> each driver can cleanly recycle or free its own buffers properly.
>>>>
>>>> Yes, exactly. And RX-driver can implement a special memory model for
>>>> this queue.  E.g. RX-driver can know this is a dedicated XDP RX-queue
>>>> which is never used for SKBs, thus opening for new RX memory models.
>>>>
>>>> Another advantage of a return API.  There is also an opportunity for
>>>> avoiding the DMA map on TX. As we need to know the from-device.  Thus,
>>>> we can add a DMA API, where we can query if the two devices uses the
>>>> same DMA engine, and can reuse the same DMA address the RX-side already
>>>> knows.
>>>>
>>>>
>>>>> Let me discuss this further with Andy to see if we can come up with a
>>>>> good scheme.
>>>>
>>>> Sound good, looking forward to hear what you come-up with :-)
>>>>
>>>
>>> I guess by this thread we will see a broadcom nic with redirect support
>>> soon ;)
>>
>> Yes, Andy actually has finished the coding for XDP_REDIRECT, but the
>> buffer recycling scheme has some problems.  We can make it work for
>> Broadcom to Broadcom only, but we want a better solution.
> 
> (Sorry for the radio silence I was AFK last week...)
> 
> I finished it a little while ago, but Michael and I both have concerns
> that in a heterogenous hardware setup one can quickly run into issues
> and haven't had time to work-up a few solutions before bringing this up
> formally.  It also isn't a major problem until the second
> optimized/native XDP driver appears on the scene.
> 
> I can run a test where XDP redirects from an ixgbe <-> bnxt_en based
> device I get OOM kills after only a few seconds, due to the lack of
> feedback between the different drivers that the pointer to xdp->data can
> be freed/reused/etc and the different buffer allocation schemes used.
> 

hmm so how do you get OOM here, I expect the number of in-flight xdp
bufs should be limited by the number of xdps that can be posted to the
outgoing interface. If we are hitting OOM that _should_ mean the size of
the tx queue is too large. Ixgbe should be free'ing the buffer if an error
is returned from xdp xmit routines (will check this today). And bnxt should
return an error if we hit some high water mark on xmit.

> Initially I did not think this was an issue and that xdp_do_flush_map()
> would handle this, but I think there is a still a need to be able to
> signal back to the receving device that the buffer allocated has been
> xmitted by the transmitter and can be freed.  Since there is really no
> guarantee that completion of an XDP_REDIRECT action means that it is
> safe to free area pointed to by xdp->data area that contains the packet
> to be xmitted.  Since the packet done interrupt handler in a driver
> cannot signal back the the receiving driver that the buffer is now safe
> to reuse/free there is a chance for trouble.  

There should be some high water mark on how many outstanding packets
can be in-flight. At the moment I assumed this was something related to
queue lengths a more explicit high water mark could added to the xmit path
and tracked in xdp infrastructure.

> 
> I was hoping to spend some time this week cooking up a patch that just
> did not allow use of XDP_REDIRECT when the ifindex of the outgoing
> device did not match that of the device to which the XDP prog was
> attached, but that probably is not worth the trouble when we would just
> fix it for real.  (It would also require some really terrible hacks to
> enforce this in the kernel when all that is being done is setting up a
> map that contains the redirect table, so it is probably not useful.)
> 

I would prefer to solve the problem vs limiting the implementation

> The basic prototype would be something like this:
> 
> (rx packet interrupt on eth0, leads to napi_poll)
> napi_poll (eth0)
>   call xdp_prog (eth0)
>     xdp_do_redirect (eth0)
>       ndo_xdp_xmit (eth1)
>       mark buffer with information netdev/ring/etc
>       place buffer on tx ring for eth1
> 
> (tx done interrupt on eth1, leads to napi_poll)
> napi_poll (eth1)
>   process tx interrupt (eth1)
>     look up information about netdev/ring/etc
>     ndo_xdp_data_free (eth0, ring, etc)
> 
> Thoughts?
> 

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Alexander Duyck @ 2017-08-28 16:11 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Michael Chan, John Fastabend, Jesper Dangaard Brouer,
	Duyck, Alexander H, pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, borkmann@iogearbox.net
In-Reply-To: <20170828160237.GA70910@C02RW35GFVH8.dhcp.broadcom.net>

On Mon, Aug 28, 2017 at 9:02 AM, Andy Gospodarek <andy@greyhouse.net> wrote:
> On Fri, Aug 25, 2017 at 08:28:55AM -0700, Michael Chan wrote:
>> On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>> > On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
>> >> On Thu, 24 Aug 2017 20:36:28 -0700
>> >> Michael Chan <michael.chan@broadcom.com> wrote:
>> >>
>> >>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
>> >>> <brouer@redhat.com> wrote:
>> >>>> On Tue, 22 Aug 2017 23:59:05 -0700
>> >>>> Michael Chan <michael.chan@broadcom.com> wrote:
>> >>>>
>> >>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
>> >>>>> <alexander.duyck@gmail.com> wrote:
>> >>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> >>>>>>>
>> >>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
>> >>>>>>> the input device, right?
>> >>>>
>> >>>> Yes, I would really like to see an API like this.
>> >>>>
>> >>>>>>
>> >>>>>> You could, it is just added complexity. "just free the buffer" in
>> >>>>>> ixgbe usually just amounts to one atomic operation to decrement the
>> >>>>>> total page count since page recycling is already implemented in the
>> >>>>>> driver. You still would have to unmap the buffer regardless of if you
>> >>>>>> were recycling it or not so all you would save is 1.000015259 atomic
>> >>>>>> operations per packet. The fraction is because once every 64K uses we
>> >>>>>> have to bulk update the count on the page.
>> >>>>>>
>> >>>>>
>> >>>>> If the buffer is returned to the input device, the input device can
>> >>>>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
>> >>>>> the input device when the buffer is returned.
>> >>>>
>> >>>> Yes, exactly, return to the input device. I really think we should
>> >>>> work on a solution where we can keep the DMA mapping around.  We have
>> >>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
>> >>>> page return call, to achieve this. (I imagine other arch's have a high
>> >>>> DMA overhead than Intel)
>> >>>>
>> >>>> I'm not sure how the API should look.  The ixgbe recycle mechanism and
>> >>>> splitting the page (into two packets) actually complicates things, and
>> >>>> tie us into a page-refcnt based model.  We could get around this by
>> >>>> each driver implementing a page-return-callback, that allow us to
>> >>>> return the page to the input device?  Then, drivers implementing the
>> >>>> 1-packet-per-page can simply check/read the page-refcnt, and if it is
>> >>>> "1" DMA-sync and reuse it in the RX queue.
>> >>>>
>> >>>
>> >>> Yeah, based on Alex' description, it's not clear to me whether ixgbe
>> >>> redirecting to a non-intel NIC or vice versa will actually work.  It
>> >>> sounds like the output device has to make some assumptions about how
>> >>> the page was allocated by the input device.
>> >>
>> >> Yes, exactly. We are tied into a page refcnt based scheme.
>> >>
>> >> Besides the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
>> >> is also tied to the RX queue size, plus how fast the pages are returned.
>> >> This makes it very hard to tune.  As I demonstrated, default ixgbe
>> >> settings does not work well with XDP_REDIRECT.  I needed to increase
>> >> TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
>> >> 10Mpps) so I also needed it increase RX-ring size.  But perf is best if
>> >> RX-ring size is smaller, thus two contradicting tuning needed.
>> >>
>> >
>> > The changes to decouple the ixgbe page recycle scheme (1pg per descriptor
>> > split into two halves being the default) from the number of descriptors
>> > doesn't look too bad IMO. It seems like it could be done by having some
>> > extra pages allocated upfront and pulling those in when we need another
>> > page.
>> >
>> > This would be a nice iterative step we could take on the existing API.
>> >
>> >>
>> >>> With buffer return API,
>> >>> each driver can cleanly recycle or free its own buffers properly.
>> >>
>> >> Yes, exactly. And RX-driver can implement a special memory model for
>> >> this queue.  E.g. RX-driver can know this is a dedicated XDP RX-queue
>> >> which is never used for SKBs, thus opening for new RX memory models.
>> >>
>> >> Another advantage of a return API.  There is also an opportunity for
>> >> avoiding the DMA map on TX. As we need to know the from-device.  Thus,
>> >> we can add a DMA API, where we can query if the two devices uses the
>> >> same DMA engine, and can reuse the same DMA address the RX-side already
>> >> knows.
>> >>
>> >>
>> >>> Let me discuss this further with Andy to see if we can come up with a
>> >>> good scheme.
>> >>
>> >> Sound good, looking forward to hear what you come-up with :-)
>> >>
>> >
>> > I guess by this thread we will see a broadcom nic with redirect support
>> > soon ;)
>>
>> Yes, Andy actually has finished the coding for XDP_REDIRECT, but the
>> buffer recycling scheme has some problems.  We can make it work for
>> Broadcom to Broadcom only, but we want a better solution.
>
> (Sorry for the radio silence I was AFK last week...)
>
> I finished it a little while ago, but Michael and I both have concerns
> that in a heterogenous hardware setup one can quickly run into issues
> and haven't had time to work-up a few solutions before bringing this up
> formally.  It also isn't a major problem until the second
> optimized/native XDP driver appears on the scene.
>
> I can run a test where XDP redirects from an ixgbe <-> bnxt_en based
> device I get OOM kills after only a few seconds, due to the lack of
> feedback between the different drivers that the pointer to xdp->data can
> be freed/reused/etc and the different buffer allocation schemes used.
>
> Initially I did not think this was an issue and that xdp_do_flush_map()
> would handle this, but I think there is a still a need to be able to
> signal back to the receving device that the buffer allocated has been
> xmitted by the transmitter and can be freed.  Since there is really no
> guarantee that completion of an XDP_REDIRECT action means that it is
> safe to free area pointed to by xdp->data area that contains the packet
> to be xmitted.  Since the packet done interrupt handler in a driver
> cannot signal back the the receiving driver that the buffer is now safe
> to reuse/free there is a chance for trouble.
>
> I was hoping to spend some time this week cooking up a patch that just
> did not allow use of XDP_REDIRECT when the ifindex of the outgoing
> device did not match that of the device to which the XDP prog was
> attached, but that probably is not worth the trouble when we would just
> fix it for real.  (It would also require some really terrible hacks to
> enforce this in the kernel when all that is being done is setting up a
> map that contains the redirect table, so it is probably not useful.)
>
> The basic prototype would be something like this:
>
> (rx packet interrupt on eth0, leads to napi_poll)
> napi_poll (eth0)
>   call xdp_prog (eth0)
>     xdp_do_redirect (eth0)
>       ndo_xdp_xmit (eth1)
>       mark buffer with information netdev/ring/etc
>       place buffer on tx ring for eth1
>
> (tx done interrupt on eth1, leads to napi_poll)
> napi_poll (eth1)
>   process tx interrupt (eth1)
>     look up information about netdev/ring/etc
>     ndo_xdp_data_free (eth0, ring, etc)
>
> Thoughts?
>

My advice would be to not over complicate this. My big concern with
all this buffer recycling is what happens the first time somebody
introduces something like mirroring? Are you going to copy the data to
a new page which would be quite expensive or just have to introduce
reference counts? You are going to have to deal with stuff like
reference counts eventually so you might as well bite that bullet now.
My advice would be to not bother with optimizing for performance right
now and instead focus on just getting functionality. The approach we
took in ixgbe for the transmit path should work for almost any other
driver since all you are looking at is having to free the page
reference which takes care of reference counting already.

- Alex

^ permalink raw reply

* Re: XDP redirect measurements, gotchas and tracepoints
From: Andy Gospodarek @ 2017-08-28 16:02 UTC (permalink / raw)
  To: Michael Chan
  Cc: John Fastabend, Jesper Dangaard Brouer, Alexander Duyck,
	Duyck, Alexander H, pstaszewski@itcare.pl, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, borkmann@iogearbox.net
In-Reply-To: <CACKFLin-sZEkkCxE2iZRHjgG=K36OJ65B-xrEYVCCSQqhYsH-g@mail.gmail.com>

On Fri, Aug 25, 2017 at 08:28:55AM -0700, Michael Chan wrote:
> On Fri, Aug 25, 2017 at 8:10 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
> > On 08/25/2017 05:45 AM, Jesper Dangaard Brouer wrote:
> >> On Thu, 24 Aug 2017 20:36:28 -0700
> >> Michael Chan <michael.chan@broadcom.com> wrote:
> >>
> >>> On Wed, Aug 23, 2017 at 1:29 AM, Jesper Dangaard Brouer
> >>> <brouer@redhat.com> wrote:
> >>>> On Tue, 22 Aug 2017 23:59:05 -0700
> >>>> Michael Chan <michael.chan@broadcom.com> wrote:
> >>>>
> >>>>> On Tue, Aug 22, 2017 at 6:06 PM, Alexander Duyck
> >>>>> <alexander.duyck@gmail.com> wrote:
> >>>>>> On Tue, Aug 22, 2017 at 1:04 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> >>>>>>>
> >>>>>>> Right, but it's conceivable to add an API to "return" the buffer to
> >>>>>>> the input device, right?
> >>>>
> >>>> Yes, I would really like to see an API like this.
> >>>>
> >>>>>>
> >>>>>> You could, it is just added complexity. "just free the buffer" in
> >>>>>> ixgbe usually just amounts to one atomic operation to decrement the
> >>>>>> total page count since page recycling is already implemented in the
> >>>>>> driver. You still would have to unmap the buffer regardless of if you
> >>>>>> were recycling it or not so all you would save is 1.000015259 atomic
> >>>>>> operations per packet. The fraction is because once every 64K uses we
> >>>>>> have to bulk update the count on the page.
> >>>>>>
> >>>>>
> >>>>> If the buffer is returned to the input device, the input device can
> >>>>> keep the DMA mapping.  All it needs to do is to dma_sync it back to
> >>>>> the input device when the buffer is returned.
> >>>>
> >>>> Yes, exactly, return to the input device. I really think we should
> >>>> work on a solution where we can keep the DMA mapping around.  We have
> >>>> an opportunity here to make ndo_xdp_xmit TX queues use a specialized
> >>>> page return call, to achieve this. (I imagine other arch's have a high
> >>>> DMA overhead than Intel)
> >>>>
> >>>> I'm not sure how the API should look.  The ixgbe recycle mechanism and
> >>>> splitting the page (into two packets) actually complicates things, and
> >>>> tie us into a page-refcnt based model.  We could get around this by
> >>>> each driver implementing a page-return-callback, that allow us to
> >>>> return the page to the input device?  Then, drivers implementing the
> >>>> 1-packet-per-page can simply check/read the page-refcnt, and if it is
> >>>> "1" DMA-sync and reuse it in the RX queue.
> >>>>
> >>>
> >>> Yeah, based on Alex' description, it's not clear to me whether ixgbe
> >>> redirecting to a non-intel NIC or vice versa will actually work.  It
> >>> sounds like the output device has to make some assumptions about how
> >>> the page was allocated by the input device.
> >>
> >> Yes, exactly. We are tied into a page refcnt based scheme.
> >>
> >> Besides the ixgbe page recycle scheme (which keeps the DMA RX-mapping)
> >> is also tied to the RX queue size, plus how fast the pages are returned.
> >> This makes it very hard to tune.  As I demonstrated, default ixgbe
> >> settings does not work well with XDP_REDIRECT.  I needed to increase
> >> TX-ring size, but it broke page recycling (dropping perf from 13Mpps to
> >> 10Mpps) so I also needed it increase RX-ring size.  But perf is best if
> >> RX-ring size is smaller, thus two contradicting tuning needed.
> >>
> >
> > The changes to decouple the ixgbe page recycle scheme (1pg per descriptor
> > split into two halves being the default) from the number of descriptors
> > doesn't look too bad IMO. It seems like it could be done by having some
> > extra pages allocated upfront and pulling those in when we need another
> > page.
> >
> > This would be a nice iterative step we could take on the existing API.
> >
> >>
> >>> With buffer return API,
> >>> each driver can cleanly recycle or free its own buffers properly.
> >>
> >> Yes, exactly. And RX-driver can implement a special memory model for
> >> this queue.  E.g. RX-driver can know this is a dedicated XDP RX-queue
> >> which is never used for SKBs, thus opening for new RX memory models.
> >>
> >> Another advantage of a return API.  There is also an opportunity for
> >> avoiding the DMA map on TX. As we need to know the from-device.  Thus,
> >> we can add a DMA API, where we can query if the two devices uses the
> >> same DMA engine, and can reuse the same DMA address the RX-side already
> >> knows.
> >>
> >>
> >>> Let me discuss this further with Andy to see if we can come up with a
> >>> good scheme.
> >>
> >> Sound good, looking forward to hear what you come-up with :-)
> >>
> >
> > I guess by this thread we will see a broadcom nic with redirect support
> > soon ;)
> 
> Yes, Andy actually has finished the coding for XDP_REDIRECT, but the
> buffer recycling scheme has some problems.  We can make it work for
> Broadcom to Broadcom only, but we want a better solution.

(Sorry for the radio silence I was AFK last week...)

I finished it a little while ago, but Michael and I both have concerns
that in a heterogenous hardware setup one can quickly run into issues
and haven't had time to work-up a few solutions before bringing this up
formally.  It also isn't a major problem until the second
optimized/native XDP driver appears on the scene.

I can run a test where XDP redirects from an ixgbe <-> bnxt_en based
device I get OOM kills after only a few seconds, due to the lack of
feedback between the different drivers that the pointer to xdp->data can
be freed/reused/etc and the different buffer allocation schemes used.

Initially I did not think this was an issue and that xdp_do_flush_map()
would handle this, but I think there is a still a need to be able to
signal back to the receving device that the buffer allocated has been
xmitted by the transmitter and can be freed.  Since there is really no
guarantee that completion of an XDP_REDIRECT action means that it is
safe to free area pointed to by xdp->data area that contains the packet
to be xmitted.  Since the packet done interrupt handler in a driver
cannot signal back the the receiving driver that the buffer is now safe
to reuse/free there is a chance for trouble.  

I was hoping to spend some time this week cooking up a patch that just
did not allow use of XDP_REDIRECT when the ifindex of the outgoing
device did not match that of the device to which the XDP prog was
attached, but that probably is not worth the trouble when we would just
fix it for real.  (It would also require some really terrible hacks to
enforce this in the kernel when all that is being done is setting up a
map that contains the redirect table, so it is probably not useful.)

The basic prototype would be something like this:

(rx packet interrupt on eth0, leads to napi_poll)
napi_poll (eth0)
  call xdp_prog (eth0)
    xdp_do_redirect (eth0)
      ndo_xdp_xmit (eth1)
      mark buffer with information netdev/ring/etc
      place buffer on tx ring for eth1

(tx done interrupt on eth1, leads to napi_poll)
napi_poll (eth1)
  process tx interrupt (eth1)
    look up information about netdev/ring/etc
    ndo_xdp_data_free (eth0, ring, etc)

Thoughts?

^ permalink raw reply

* Re: [PATCH net-next] bpf: fix oops on allocation failure
From: John Fastabend @ 2017-08-28 15:58 UTC (permalink / raw)
  To: Daniel Borkmann, Dan Carpenter, Alexei Starovoitov
  Cc: netdev, kernel-janitors
In-Reply-To: <59A08CCD.2050603@iogearbox.net>

On 08/25/2017 01:47 PM, Daniel Borkmann wrote:
> On 08/25/2017 10:27 PM, Dan Carpenter wrote:
>> "err" is set to zero if bpf_map_area_alloc() fails so it means we return
>> ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is not
>> expecting NULL returns and will oops.
>>
>> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Thanks.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* [PATCH net v1 1/1] tipc: permit bond slave as bearer
From: Parthasarathy Bhuvaragan @ 2017-08-28 15:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, tipc-discussion, jon.maloy, maloy, ying.xue

For a bond slave device as a tipc bearer, the dev represents the bond
interface and orig_dev represents the slave in tipc_l2_rcv_msg().
Since we decode the tipc_ptr from bonding device (dev), we fail to
find the bearer and thus tipc links are not established.

In this commit, we register the tipc protocol callback per device and
look for tipc bearer from both the devices.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
---
 net/tipc/bearer.c | 26 +++++++++++---------------
 net/tipc/bearer.h |  2 ++
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 767e0537dde5..89cd061c4468 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -65,6 +65,8 @@ static struct tipc_bearer *bearer_get(struct net *net, int bearer_id)
 }
 
 static void bearer_disable(struct net *net, struct tipc_bearer *b);
+static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
+			   struct packet_type *pt, struct net_device *orig_dev);
 
 /**
  * tipc_media_find - locates specified media object by name
@@ -428,6 +430,10 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
 
 	/* Associate TIPC bearer with L2 bearer */
 	rcu_assign_pointer(b->media_ptr, dev);
+	b->pt.dev = dev;
+	b->pt.type = htons(ETH_P_TIPC);
+	b->pt.func = tipc_l2_rcv_msg;
+	dev_add_pack(&b->pt);
 	memset(&b->bcast_addr, 0, sizeof(b->bcast_addr));
 	memcpy(b->bcast_addr.value, dev->broadcast, b->media->hwaddr_len);
 	b->bcast_addr.media_id = b->media->type_id;
@@ -447,6 +453,7 @@ void tipc_disable_l2_media(struct tipc_bearer *b)
 	struct net_device *dev;
 
 	dev = (struct net_device *)rtnl_dereference(b->media_ptr);
+	dev_remove_pack(&b->pt);
 	RCU_INIT_POINTER(dev->tipc_ptr, NULL);
 	synchronize_net();
 	dev_put(dev);
@@ -594,11 +601,12 @@ static int tipc_l2_rcv_msg(struct sk_buff *skb, struct net_device *dev,
 	struct tipc_bearer *b;
 
 	rcu_read_lock();
-	b = rcu_dereference_rtnl(dev->tipc_ptr);
+	b = rcu_dereference_rtnl(dev->tipc_ptr) ?:
+		rcu_dereference_rtnl(orig_dev->tipc_ptr);
 	if (likely(b && test_bit(0, &b->up) &&
 		   (skb->pkt_type <= PACKET_MULTICAST))) {
 		skb->next = NULL;
-		tipc_rcv(dev_net(dev), skb, b);
+		tipc_rcv(dev_net(b->pt.dev), skb, b);
 		rcu_read_unlock();
 		return NET_RX_SUCCESS;
 	}
@@ -659,11 +667,6 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt,
 	return NOTIFY_OK;
 }
 
-static struct packet_type tipc_packet_type __read_mostly = {
-	.type = htons(ETH_P_TIPC),
-	.func = tipc_l2_rcv_msg,
-};
-
 static struct notifier_block notifier = {
 	.notifier_call  = tipc_l2_device_event,
 	.priority	= 0,
@@ -671,19 +674,12 @@ static struct notifier_block notifier = {
 
 int tipc_bearer_setup(void)
 {
-	int err;
-
-	err = register_netdevice_notifier(&notifier);
-	if (err)
-		return err;
-	dev_add_pack(&tipc_packet_type);
-	return 0;
+	return register_netdevice_notifier(&notifier);
 }
 
 void tipc_bearer_cleanup(void)
 {
 	unregister_netdevice_notifier(&notifier);
-	dev_remove_pack(&tipc_packet_type);
 }
 
 void tipc_bearer_stop(struct net *net)
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 635c9086e19a..e07a55a80c18 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -131,6 +131,7 @@ struct tipc_media {
  * @name: bearer name (format = media:interface)
  * @media: ptr to media structure associated with bearer
  * @bcast_addr: media address used in broadcasting
+ * @pt: packet type for bearer
  * @rcu: rcu struct for tipc_bearer
  * @priority: default link priority for bearer
  * @window: default window size for bearer
@@ -151,6 +152,7 @@ struct tipc_bearer {
 	char name[TIPC_MAX_BEARER_NAME];
 	struct tipc_media *media;
 	struct tipc_media_addr bcast_addr;
+	struct packet_type pt;
 	struct rcu_head rcu;
 	u32 priority;
 	u32 window;
-- 
2.1.4

^ permalink raw reply related

* Re: mlxsw and rtnl lock
From: Roopa Prabhu @ 2017-08-28 15:55 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: David Ahern, Jiri Pirko, netdev@vger.kernel.org, mlxsw
In-Reply-To: <20170826170418.GA22324@shredder>

On Sat, Aug 26, 2017 at 10:04 AM, Ido Schimmel <idosch@idosch.org> wrote:
> On Fri, Aug 25, 2017 at 01:26:07PM -0700, David Ahern wrote:
>> Jiri / Ido:
>>
>>
[snip]
>
> Regarding the silent abort, that's intentional. You can look at the same
> code in v4.9 - when the chain was still blocking - and you'll see that
> we didn't propagate the error even then. This was discussed in the past
> and the conclusion was that user doesn't expect to operation to fail. If
> hardware resources are exceeded, we let the kernel take care of the
> forwarding instead.

History here is that the initial fib offload hooks were added during
rocker days.
subsequently we have had many discussions (on offload policies) to see
if we can remove that
check and report the error back to the user (routing daemon in this
case) since the cpu kernel
cannot handle 100G or more traffic on such platforms. Basically the
cpu kernel cannot take care
of forwarding for such platforms. I believe this came up during the
last switchdev BOF as well.
The current silent abort is in there because kernel needs to maintain
its semantics for hardware offload.
Which is a valid position but we will need to find a way to make it
work for switch platforms.

IIUC, the only place where switchdev offload returns error to the user is
'bridge and bond membership offload'

I know fib and bridge fdb offload don't propagate errors to the user
(maybe they used to before moving to notifiers ?. need to check
history on this).

Are tc hardware offload errors propagated to the user ?

It is a bit inconsistent today, some errors are propagated to the user
and some are not.

^ permalink raw reply

* Re: [PATCH net-next 1/4] net: Add SRIOV VGT+ support
From: Sabrina Dubroca @ 2017-08-28 15:52 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Eugenia Emantayev, Mohamad Haj Yahia,
	Hannes Frederic Sowa
In-Reply-To: <20170827110618.20599-2-saeedm@mellanox.com>

2017-08-27, 14:06:15 +0300, Saeed Mahameed wrote:
[...]
> +#define VF_VLAN_BITMAP	DIV_ROUND_UP(VF_VLAN_N_VID, sizeof(__u64) * BITS_PER_BYTE)
> +struct ifla_vf_vlan_trunk {
> +	__u32 vf;
> +	__u64 allowed_vlans_8021q_bm[VF_VLAN_BITMAP];
> +	__u64 allowed_vlans_8021ad_bm[VF_VLAN_BITMAP];
> +};

This is huge (1032B). And you put one of these in the netlink message
for each VF.  This means that with 51 VF (at least in my environment,
where each VF takes 1296B), you're going to overflow the u16 size of a
single attribute (IFLA_VFINFO_LIST), and you cannot dump the device
anymore. I'm afraid this is going to break existing setups.

-- 
Sabrina

^ permalink raw reply

* Fwd: DA850-evm MAC Address is random
From: Adam Ford @ 2017-08-28 15:32 UTC (permalink / raw)
  To: grygorii.strashko, linux-omap, netdev

The davinvi_emac MAC address seems to attempt a call to
ti_cm_get_macid in cpsw-common.c but it returns the message
'davinci_emac davinci_emac.1: incompatible machine/device type for
reading mac address ' and then generates a random MAC address.

The function appears to lookup varions boards using
'of_machine_is_compaible' and supports dm8148, am33xx, am3517, dm816,
am4372 and dra7.  I don't see the ti,davinci-dm6467-emac which is
what's shown in the da850 device tree.

Is there a patch somewhere for supporting the da850-evm?

If not, is there a way to pass the MAC address from U-Boot to the
driver so it doesn't generate a random MAC?

adam

^ permalink raw reply

* Re: [PATCH RFC WIP 0/5] IGMP snooping for local traffic
From: Stephen Hemminger @ 2017-08-28 15:11 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Vivien Didelot, Florian Fainelli, nikolay, jiri, roopa,
	bridge
In-Reply-To: <1503780970-10312-1-git-send-email-andrew@lunn.ch>

On Sat, 26 Aug 2017 22:56:05 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> This is a WIP patchset i would like comments on from bridge, switchdev
> and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP
> reports on bridge ports and keep track of which groups have been
> joined on an interface. It will then forward multicast based on this
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses
> switchdev to request the hardware add an mdb to a port, so the
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is
> performed. With a pure software bridge, it is not required. All
> mulitcast frames are passed to the brX interface, and the network
> stack filters them, as it does for any interface. However, when
> hardware offload is involved, things change. We should program the
> hardware to only send multcast packets to the host when the host has
> in interest in them.
> 
> Thus we need to perform IGMP snooping on the brX interface, just like
> any other interface of the bridge. However, currently the brX
> interface is missing all the needed data structures to do this. There
> is no net_bridge_port structure for the brX interface. This strucuture
> is created when an interface is added to the bridge. But the brX
> interface is not a member of the bridge. So this patchset makes the
> brX interface a first class member of the bridge. When the brX
> interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed as
> usual.
> 
> There are some complexities here. Some assumptions are broken, like
> the master interface of a port interface is the bridge interface. The
> brX interface cannot be its own master. The use of
> netdev_master_upper_dev_get() within the bridge code has been changed
> to reflecit this. The bridge receive handler needs to not process
> frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code
> presented here is a hack and needs to change. But that is secondary
> and can be solved once it is agreed how the bridge needs to change to
> support this use case.
> 
> Comment welcome and wanted.
> 
> 	Andrew
> 
> Andrew Lunn (5):
>   net: rtnetlink: Handle bridge port without upper device
>   net: bridge: Skip receive handler on brX interface
>   net: bridge: Make the brX interface a member of the bridge
>   net: dsa: HACK: Handle MDB add/remove for none-switch ports
>   net: dsa: Don't include CPU port when adding MDB to a port
> 
>  include/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c    | 12 ++++++++++--
>  net/bridge/br_if.c        | 37 ++++++++++++++++++++++++-------------
>  net/bridge/br_input.c     |  4 ++++
>  net/bridge/br_mdb.c       |  2 --
>  net/bridge/br_multicast.c |  7 ++++---
>  net/bridge/br_private.h   |  1 +
>  net/core/rtnetlink.c      | 23 +++++++++++++++++++++--
>  net/dsa/port.c            | 19 +++++++++++++++++--
>  net/dsa/switch.c          |  2 +-
>  10 files changed, 83 insertions(+), 25 deletions(-)
> 

Sorry you can't change the semantics of the bridge like this.
There are likely to be scripts and management utilities that won't work after this.

Figure out another way. Such as adding IGMP updates in the local packet send/receive path.

^ permalink raw reply

* Re: IPv6 loopback issue report
From: David Ahern @ 2017-08-28 15:07 UTC (permalink / raw)
  To: Tariq Toukan, Linux Kernel Network Developers, David Miller,
	Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: ranro, guye, Eran Ben Elisha
In-Reply-To: <464e900f-c256-4d22-6a83-a3aea0fca8bc@mellanox.com>

On 8/28/17 8:48 AM, Tariq Toukan wrote:
> #  /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
> PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data
> bytes
> 
> --- fe80::7efe:90ff:fecb:7502%ens8 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2043ms


I'll take a look. Most likely related to my recent IPv6 change.

^ permalink raw reply

* Re: [PATCH net-next] bridge: fdb add and delete tracepoints
From: Roopa Prabhu @ 2017-08-28 15:03 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Nikolay Aleksandrov, netdev@vger.kernel.org, bridge,
	davem@davemloft.net, Andrew Lunn
In-Reply-To: <592631a3-6a80-407a-f087-37f7f04417d2@gmail.com>

On Sun, Aug 27, 2017 at 7:11 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 08/27/2017 02:33 PM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> Tracepoints to trace bridge forwarding database updates.
>
> Thanks for adding this!
>
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> ---
>>  include/trace/events/bridge.h | 98 +++++++++++++++++++++++++++++++++++++++++++
>>  net/bridge/br_fdb.c           |  7 ++++
>>  net/core/net-traces.c         |  6 +++
>>  3 files changed, 111 insertions(+)
>>  create mode 100644 include/trace/events/bridge.h
>>
>> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
>> new file mode 100644
>> index 0000000..e2d52cf
>> --- /dev/null
>> +++ b/include/trace/events/bridge.h
>> @@ -0,0 +1,98 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM bridge
>> +
>> +#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_BRIDGE_H
>> +
>> +#include <linux/netdevice.h>
>> +#include <linux/tracepoint.h>
>> +
>> +#include "../../../net/bridge/br_private.h"
>> +
>> +TRACE_EVENT(br_fdb_add,
>> +
>> +     TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
>> +              const unsigned char *addr, u16 vid, u16 nlh_flags),
>> +
>> +     TP_ARGS(ndm, dev, addr, vid, nlh_flags),
>> +
>> +     TP_STRUCT__entry(
>> +             __field(u8, ndm_flags)
>> +             __string(dev, dev->name)
>> +             __array(unsigned char, addr, 6)
>
> Can you use ETH_ALEN instead of 6 here?
>
>> +             __field(u16, vid)
>> +             __field(u16, nlh_flags)
>> +     ),
>> +
>> +     TP_fast_assign(
>> +             __assign_str(dev, dev->name);
>> +             memcpy(__entry->addr, addr, 6);
>
> Likewise
>
>> +             __entry->vid = vid;
>> +             __entry->nlh_flags = nlh_flags;
>> +             __entry->ndm_flags = ndm->ndm_flags;
>> +     ),
>> +
>> +     TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags %x ndm_flags = %x",
>
> I wonder if we could make %pM work for TP_printk() as this would
> simplify the argument list a bitt. Can you use %04x for vid, nlh_flags
> and %02x for ndm_flags?


one thing here.., i think %u is better for vid. bridge driver debug
prints also use %u for vid.

^ permalink raw reply

* Re: [ethtool] ethtool: Remove UDP Fragmentation Offload use from ethtool
From: Eric Dumazet @ 2017-08-28 15:00 UTC (permalink / raw)
  To: Tariq Toukan; +Cc: John W. Linville, netdev, Eran Ben Elisha, Shaker Daibes
In-Reply-To: <1503923928-9419-1-git-send-email-tariqt@mellanox.com>

On Mon, 2017-08-28 at 15:38 +0300, Tariq Toukan wrote:
> From: Shaker Daibes <shakerd@mellanox.com>
> 
> UFO was removed in kernel, here we remove it in ethtool app.
> 
> Fixes the following issue:
> Features for ens8:
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> 
> Tested with "make check"
> 
> Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> ---

Hi guys

I would rather remove the warning, but leave the ability to switch UFO
on machines running old kernel but a recent ethtool.

ethtool does not need to be downgraded every time we boot an old
kernel ;)

Thanks !

^ permalink raw reply

* [PATCH net-next v3 13/13] arm64: defconfig: enable Marvell CP110 comphy
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Miquel Raynal, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, netdev, Antoine Tenart
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

From: Miquel Raynal <miquel.raynal@free-electrons.com>

The comphy is an hardware block giving access to common PHYs that can be
used by various other engines (Network, SATA, ...). This is used on
Marvell 7k/8k platforms for now. Enable the corresponding driver.

Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 6b26a05ae5b4..d453ba50df43 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -510,6 +510,7 @@ CONFIG_PWM_TEGRA=m
 CONFIG_PHY_RCAR_GEN3_USB2=y
 CONFIG_PHY_HI6220_USB=y
 CONFIG_PHY_SUN4I_USB=y
+CONFIG_PHY_MVEBU_CP110_COMPHY=y
 CONFIG_PHY_ROCKCHIP_INNO_USB2=y
 CONFIG_PHY_ROCKCHIP_EMMC=y
 CONFIG_PHY_ROCKCHIP_PCIE=m
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 12/13] arm64: dts: marvell: 7040-db: add comphy references to Ethernet ports
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

This patch adds comphy phandles to the Ethernet ports in the 7040-db
device tree. The comphy is used to configure the serdes PHYs used by
these ports.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 arch/arm64/boot/dts/marvell/armada-7040-db.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-7040-db.dts b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
index 92c761c380d3..03d1c42d7c47 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
@@ -180,6 +180,7 @@
 	status = "okay";
 	phy = <&phy0>;
 	phy-mode = "sgmii";
+	phys = <&cpm_comphy0 1>;
 };
 
 &cpm_eth2 {
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 11/13] arm64: dts: marvell: mcbin: add comphy references to Ethernet ports
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

This patch adds comphy phandles to the Ethernet ports in the mcbin
device tree. The comphy is used to configure the serdes PHYs used by
these ports.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index 9f0a00802452..970081ca197b 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -148,6 +148,7 @@
 &cpm_eth0 {
 	status = "okay";
 	phy = <&phy0>;
+	phys = <&cpm_comphy4 0>;
 	phy-mode = "10gbase-kr";
 };
 
@@ -181,6 +182,7 @@
 &cps_eth0 {
 	status = "okay";
 	phy = <&phy8>;
+	phys = <&cps_comphy4 0>;
 	phy-mode = "10gbase-kr";
 };
 
@@ -189,6 +191,7 @@
 	status = "okay";
 	phy = <&ge_phy>;
 	phy-mode = "sgmii";
+	phys = <&cps_comphy0 1>;
 };
 
 &cps_sata0 {
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 10/13] arm64: dts: marvell: add comphy nodes on cp110 master and slave
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

Now that the comphy driver is available, this patch adds the
corresponding nodes in the cp110 master and slave device trees.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 .../boot/dts/marvell/armada-cp110-master.dtsi      | 38 ++++++++++++++++++++++
 .../arm64/boot/dts/marvell/armada-cp110-slave.dtsi | 38 ++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
index 9b2581473183..f2a50552bad4 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
@@ -91,6 +91,44 @@
 				};
 			};
 
+			cpm_comphy: phy@120000 {
+				compatible = "marvell,comphy-cp110";
+				reg = <0x120000 0x6000>;
+				marvell,system-controller = <&cpm_syscon0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+
+				cpm_comphy0: phy@0 {
+					reg = <0>;
+					#phy-cells = <1>;
+				};
+
+				cpm_comphy1: phy@1 {
+					reg = <1>;
+					#phy-cells = <1>;
+				};
+
+				cpm_comphy2: phy@2 {
+					reg = <2>;
+					#phy-cells = <1>;
+				};
+
+				cpm_comphy3: phy@3 {
+					reg = <3>;
+					#phy-cells = <1>;
+				};
+
+				cpm_comphy4: phy@4 {
+					reg = <4>;
+					#phy-cells = <1>;
+				};
+
+				cpm_comphy5: phy@5 {
+					reg = <5>;
+					#phy-cells = <1>;
+				};
+			};
+
 			cpm_mdio: mdio@12a200 {
 				#address-cells = <1>;
 				#size-cells = <0>;
diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
index d3902f218c46..bd7f7d0e6de9 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
@@ -98,6 +98,44 @@
 				};
 			};
 
+			cps_comphy: phy@120000 {
+				compatible = "marvell,comphy-cp110";
+				reg = <0x120000 0x6000>;
+				marvell,system-controller = <&cps_syscon0>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+
+				cps_comphy0: phy@0 {
+					reg = <0>;
+					#phy-cells = <1>;
+				};
+
+				cps_comphy1: phy@1 {
+					reg = <1>;
+					#phy-cells = <1>;
+				};
+
+				cps_comphy2: phy@2 {
+					reg = <2>;
+					#phy-cells = <1>;
+				};
+
+				cps_comphy3: phy@3 {
+					reg = <3>;
+					#phy-cells = <1>;
+				};
+
+				cps_comphy4: phy@4 {
+					reg = <4>;
+					#phy-cells = <1>;
+				};
+
+				cps_comphy5: phy@5 {
+					reg = <5>;
+					#phy-cells = <1>;
+				};
+			};
+
 			cps_mdio: mdio@12a200 {
 				#address-cells = <1>;
 				#size-cells = <0>;
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 09/13] arm64: dts: marvell: extend the cp110 syscon register area length
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

This patch extends on both cp110 the system register area length to
include some of the comphy registers as well.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi | 2 +-
 arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
index 18299e164cb7..9b2581473183 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
@@ -118,7 +118,7 @@
 
 			cpm_syscon0: system-controller@440000 {
 				compatible = "syscon", "simple-mfd";
-				reg = <0x440000 0x1000>;
+				reg = <0x440000 0x2000>;
 
 				cpm_clk: clock {
 					compatible = "marvell,cp110-clock";
diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
index 5ae8fa575859..d3902f218c46 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
@@ -125,7 +125,7 @@
 
 			cps_syscon0: system-controller@440000 {
 				compatible = "syscon", "simple-mfd";
-				reg = <0x440000 0x1000>;
+				reg = <0x440000 0x2000>;
 
 				cps_clk: clock {
 					compatible = "marvell,cp110-clock";
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 08/13] net: mvpp2: dynamic reconfiguration of the comphy/GoP/MAC
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

This patch adds logic to reconfigure the comphy/GoP/MAC when the link
state is updated at runtime. This is very useful on boards where many
link speed are supported: depending on what is negotiated the PPv2
driver will automatically reconfigures the link between the PHY and the
MAC.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 2f05a0b0773c..9e64b1ba3d43 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5771,9 +5771,28 @@ static void mvpp2_link_event(struct net_device *dev)
 {
 	struct mvpp2_port *port = netdev_priv(dev);
 	struct phy_device *phydev = dev->phydev;
+	bool link_reconfigured = false;
 	u32 val;
 
 	if (phydev->link) {
+		if (port->phy_interface != phydev->interface && port->comphy) {
+	                /* disable current port for reconfiguration */
+	                mvpp2_interrupts_disable(port);
+	                netif_carrier_off(port->dev);
+	                mvpp2_port_disable(port);
+			phy_power_off(port->comphy);
+
+	                /* comphy reconfiguration */
+	                port->phy_interface = phydev->interface;
+	                mvpp22_comphy_init(port);
+
+	                /* gop/mac reconfiguration */
+	                mvpp22_gop_init(port);
+	                mvpp2_port_mii_set(port);
+
+	                link_reconfigured = true;
+		}
+
 		if ((port->speed != phydev->speed) ||
 		    (port->duplex != phydev->duplex)) {
 			mvpp2_gmac_set_autoneg(port, phydev);
@@ -5783,7 +5802,7 @@ static void mvpp2_link_event(struct net_device *dev)
 		}
 	}
 
-	if (phydev->link != port->link) {
+	if (phydev->link != port->link || link_reconfigured) {
 		port->link = phydev->link;
 
 		if (phydev->link) {
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 07/13] net: mvpp2: do not set GMAC autoneg when using XLG MAC
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

When using the XLG MAC, it does not make sense to force the GMAC autoneg
parameters. This patch adds checks to only set the GMAC autoneg
parameters when needed (i.e. when not using the XLG MAC).

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 64 +++++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 09cad32734f3..2f05a0b0773c 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5735,6 +5735,37 @@ static irqreturn_t mvpp2_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void mvpp2_gmac_set_autoneg(struct mvpp2_port *port,
+				   struct phy_device *phydev)
+{
+	u32 val;
+
+	if (port->phy_interface != PHY_INTERFACE_MODE_RGMII &&
+	    port->phy_interface != PHY_INTERFACE_MODE_RGMII_ID &&
+	    port->phy_interface != PHY_INTERFACE_MODE_RGMII_RXID &&
+	    port->phy_interface != PHY_INTERFACE_MODE_RGMII_TXID &&
+	    port->phy_interface != PHY_INTERFACE_MODE_SGMII)
+		return;
+
+	val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+	val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED |
+		 MVPP2_GMAC_CONFIG_GMII_SPEED |
+		 MVPP2_GMAC_CONFIG_FULL_DUPLEX |
+		 MVPP2_GMAC_AN_SPEED_EN |
+		 MVPP2_GMAC_AN_DUPLEX_EN);
+
+	if (phydev->duplex)
+		val |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
+
+	if (phydev->speed == SPEED_1000)
+		val |= MVPP2_GMAC_CONFIG_GMII_SPEED;
+	else if (phydev->speed == SPEED_100)
+		val |= MVPP2_GMAC_CONFIG_MII_SPEED;
+
+	writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+
+}
+
 /* Adjust link */
 static void mvpp2_link_event(struct net_device *dev)
 {
@@ -5745,24 +5776,7 @@ static void mvpp2_link_event(struct net_device *dev)
 	if (phydev->link) {
 		if ((port->speed != phydev->speed) ||
 		    (port->duplex != phydev->duplex)) {
-			u32 val;
-
-			val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-			val &= ~(MVPP2_GMAC_CONFIG_MII_SPEED |
-				 MVPP2_GMAC_CONFIG_GMII_SPEED |
-				 MVPP2_GMAC_CONFIG_FULL_DUPLEX |
-				 MVPP2_GMAC_AN_SPEED_EN |
-				 MVPP2_GMAC_AN_DUPLEX_EN);
-
-			if (phydev->duplex)
-				val |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
-
-			if (phydev->speed == SPEED_1000)
-				val |= MVPP2_GMAC_CONFIG_GMII_SPEED;
-			else if (phydev->speed == SPEED_100)
-				val |= MVPP2_GMAC_CONFIG_MII_SPEED;
-
-			writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+			mvpp2_gmac_set_autoneg(port, phydev);
 
 			port->duplex = phydev->duplex;
 			port->speed  = phydev->speed;
@@ -5773,10 +5787,16 @@ static void mvpp2_link_event(struct net_device *dev)
 		port->link = phydev->link;
 
 		if (phydev->link) {
-			val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-			val |= (MVPP2_GMAC_FORCE_LINK_PASS |
-				MVPP2_GMAC_FORCE_LINK_DOWN);
-			writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+			if (port->phy_interface == PHY_INTERFACE_MODE_RGMII ||
+			    port->phy_interface == PHY_INTERFACE_MODE_RGMII_ID ||
+			    port->phy_interface == PHY_INTERFACE_MODE_RGMII_RXID ||
+			    port->phy_interface == PHY_INTERFACE_MODE_RGMII_TXID ||
+			    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+				val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+				val |= (MVPP2_GMAC_FORCE_LINK_PASS |
+					MVPP2_GMAC_FORCE_LINK_DOWN);
+				writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+			}
 
 			mvpp2_interrupts_enable(port);
 			mvpp2_port_enable(port);
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 06/13] net: mvpp2: improve the link management function
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

When the link status changes, the phylib calls the link_event function
in the mvpp2 driver. Before this patch only the egress/ingress transmit
was enabled/disabled. This patch adds more functionality to the link
status management code by enabling/disabling the port per-cpu
interrupts, and the port itself. The queues are now stopped as well, and
the netif carrier helpers are called.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 9c0c81e68d55..09cad32734f3 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5777,14 +5777,25 @@ static void mvpp2_link_event(struct net_device *dev)
 			val |= (MVPP2_GMAC_FORCE_LINK_PASS |
 				MVPP2_GMAC_FORCE_LINK_DOWN);
 			writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
+
+			mvpp2_interrupts_enable(port);
+			mvpp2_port_enable(port);
+
 			mvpp2_egress_enable(port);
 			mvpp2_ingress_enable(port);
+			netif_carrier_on(dev);
+			netif_tx_wake_all_queues(dev);
 		} else {
 			port->duplex = -1;
 			port->speed = 0;
 
+			netif_tx_stop_all_queues(dev);
+			netif_carrier_off(dev);
 			mvpp2_ingress_disable(port);
 			mvpp2_egress_disable(port);
+
+			mvpp2_port_disable(port);
+			mvpp2_interrupts_disable(port);
 		}
 
 		phy_print_status(phydev);
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 05/13] net: mvpp2: simplify the link_event function
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

The link_event function is somewhat complicated. This cosmetic patch
simplifies it.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index fab231858a41..9c0c81e68d55 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5740,7 +5740,6 @@ static void mvpp2_link_event(struct net_device *dev)
 {
 	struct mvpp2_port *port = netdev_priv(dev);
 	struct phy_device *phydev = dev->phydev;
-	int status_change = 0;
 	u32 val;
 
 	if (phydev->link) {
@@ -5771,16 +5770,8 @@ static void mvpp2_link_event(struct net_device *dev)
 	}
 
 	if (phydev->link != port->link) {
-		if (!phydev->link) {
-			port->duplex = -1;
-			port->speed = 0;
-		}
-
 		port->link = phydev->link;
-		status_change = 1;
-	}
 
-	if (status_change) {
 		if (phydev->link) {
 			val = readl(port->base + MVPP2_GMAC_AUTONEG_CONFIG);
 			val |= (MVPP2_GMAC_FORCE_LINK_PASS |
@@ -5789,9 +5780,13 @@ static void mvpp2_link_event(struct net_device *dev)
 			mvpp2_egress_enable(port);
 			mvpp2_ingress_enable(port);
 		} else {
+			port->duplex = -1;
+			port->speed = 0;
+
 			mvpp2_ingress_disable(port);
 			mvpp2_egress_disable(port);
 		}
+
 		phy_print_status(phydev);
 	}
 }
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 04/13] net: mvpp2: initialize the comphy
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

On some platforms, the comphy is between the MAC GoP and the PHYs. The
mvpp2 driver currently relies on the firmware/bootloader to configure
the comphy. As a comphy driver was added to the generic PHY framework,
this patch uses it in the mvpp2 driver to configure the comphy at boot
time to avoid relying on the bootloader.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 44 +++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index e312dfc3555b..fab231858a41 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -28,6 +28,7 @@
 #include <linux/of_address.h>
 #include <linux/of_device.h>
 #include <linux/phy.h>
+#include <linux/phy/phy.h>
 #include <linux/clk.h>
 #include <linux/hrtimer.h>
 #include <linux/ktime.h>
@@ -861,6 +862,7 @@ struct mvpp2_port {
 
 	phy_interface_t phy_interface;
 	struct device_node *phy_node;
+	struct phy *comphy;
 	unsigned int link;
 	unsigned int duplex;
 	unsigned int speed;
@@ -4420,6 +4422,32 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
 	return -EINVAL;
 }
 
+static int mvpp22_comphy_init(struct mvpp2_port *port)
+{
+	enum phy_mode mode;
+	int ret;
+
+	if (!port->comphy)
+		return 0;
+
+	switch (port->phy_interface) {
+	case PHY_INTERFACE_MODE_SGMII:
+		mode = PHY_MODE_SGMII;
+		break;
+	case PHY_INTERFACE_MODE_10GKR:
+		mode = PHY_MODE_10GKR;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = phy_set_mode(port->comphy, mode);
+	if (ret)
+		return ret;
+
+	return phy_power_on(port->comphy);
+}
+
 static void mvpp2_port_mii_gmac_configure_mode(struct mvpp2_port *port)
 {
 	u32 val;
@@ -6404,8 +6432,10 @@ static void mvpp2_start_dev(struct mvpp2_port *port)
 	/* Enable interrupts on all CPUs */
 	mvpp2_interrupts_enable(port);
 
-	if (port->priv->hw_version == MVPP22)
+	if (port->priv->hw_version == MVPP22) {
+		mvpp22_comphy_init(port);
 		mvpp22_gop_init(port);
+	}
 
 	mvpp2_port_mii_set(port);
 	mvpp2_port_enable(port);
@@ -6436,6 +6466,7 @@ static void mvpp2_stop_dev(struct mvpp2_port *port)
 	mvpp2_egress_disable(port);
 	mvpp2_port_disable(port);
 	phy_stop(ndev->phydev);
+	phy_power_off(port->comphy);
 }
 
 static int mvpp2_check_ringparam_valid(struct net_device *dev,
@@ -7270,6 +7301,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 			    struct mvpp2 *priv)
 {
 	struct device_node *phy_node;
+	struct phy *comphy;
 	struct mvpp2_port *port;
 	struct mvpp2_port_pcpu *port_pcpu;
 	struct net_device *dev;
@@ -7311,6 +7343,15 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		goto err_free_netdev;
 	}
 
+	comphy = devm_of_phy_get(&pdev->dev, port_node, NULL);
+	if (IS_ERR(comphy)) {
+		if (PTR_ERR(comphy) == -EPROBE_DEFER) {
+			err = -EPROBE_DEFER;
+			goto err_free_netdev;
+		}
+		comphy = NULL;
+	}
+
 	if (of_property_read_u32(port_node, "port-id", &id)) {
 		err = -EINVAL;
 		dev_err(&pdev->dev, "missing port-id value\n");
@@ -7344,6 +7385,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 
 	port->phy_node = phy_node;
 	port->phy_interface = phy_mode;
+	port->comphy = comphy;
 
 	if (priv->hw_version == MVPP21) {
 		res = platform_get_resource(pdev, IORESOURCE_MEM, 2 + id);
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 03/13] Documentation/bindings: phy: document the Marvell comphy driver
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

The Marvell Armada 7K/8K SoCs contains an hardware block called COMPHY
that provides a number of shared PHYs used by various interfaces in the
SoC: network, SATA, PCIe, etc. This Device Tree binding allows to
describe this COMPHY hardware block.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 .../devicetree/bindings/phy/phy-mvebu-comphy.txt   | 43 ++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/phy-mvebu-comphy.txt

diff --git a/Documentation/devicetree/bindings/phy/phy-mvebu-comphy.txt b/Documentation/devicetree/bindings/phy/phy-mvebu-comphy.txt
new file mode 100644
index 000000000000..bfcf80341657
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-mvebu-comphy.txt
@@ -0,0 +1,43 @@
+mvebu comphy driver
+-------------------
+
+A comphy controller can be found on Marvell Armada 7k/8k on the CP110. It
+provides a number of shared PHYs used by various interfaces (network, sata,
+usb, PCIe...).
+
+Required properties:
+
+- compatible: should be "marvell,comphy-cp110"
+- reg: should contain the comphy register location and length.
+- marvell,system-controller: should contain a phandle to the
+                             system controller node.
+- #address-cells: should be 1.
+- #size-cells: should be 0.
+
+A sub-node is required for each comphy lane provided by the comphy.
+
+Required properties (child nodes):
+
+- reg: comphy lane number.
+- #phy-cells : from the generic phy bindings, must be 1. Defines the
+               input port to use for a given comphy lane.
+
+Example:
+
+	cpm_comphy: phy@120000 {
+		compatible = "marvell,comphy-cp110";
+		reg = <0x120000 0x6000>;
+		marvell,system-controller = <&cpm_syscon0>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpm_comphy0: phy@0 {
+			reg = <0>;
+			#phy-cells = <1>;
+		};
+
+		cpm_comphy1: phy@1 {
+			reg = <1>;
+			#phy-cells = <1>;
+		};
+	};
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 02/13] phy: add the mvebu cp110 comphy driver
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

On the CP110 unit, which can be found on various Marvell platforms such
as the 7k and 8k (currently), a comphy (common PHYs) hardware block can
be found. This block provides a number of PHYs which can be used in
various modes by other controllers (network, SATA ...). These common
PHYs must be configured for the controllers using them to work correctly
either at boot time, or when the system runs to switch the mode used.
This patch adds a driver for this comphy hardware block, providing
callbacks for the its PHYs so that consumers can configure the modes
used.

As of this commit, two modes are supported by the comphy driver: sgmii
and 10gkr.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/phy/marvell/Kconfig                  |  10 +
 drivers/phy/marvell/Makefile                 |   1 +
 drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 656 +++++++++++++++++++++++++++
 3 files changed, 667 insertions(+)
 create mode 100644 drivers/phy/marvell/phy-mvebu-cp110-comphy.c

diff --git a/drivers/phy/marvell/Kconfig b/drivers/phy/marvell/Kconfig
index 048d8893bc2e..26755f3d1a9a 100644
--- a/drivers/phy/marvell/Kconfig
+++ b/drivers/phy/marvell/Kconfig
@@ -21,6 +21,16 @@ config PHY_BERLIN_USB
 	help
 	  Enable this to support the USB PHY on Marvell Berlin SoCs.
 
+config PHY_MVEBU_CP110_COMPHY
+	tristate "Marvell CP110 comphy driver"
+	depends on ARCH_MVEBU && OF
+	select GENERIC_PHY
+	help
+	  This driver allows to control the comphy, an hardware block providing
+	  shared serdes PHYs on Marvell Armada 7k/8k (in the CP110). Its serdes
+	  lanes can be used by various controllers (Ethernet, sata, usb,
+	  PCIe...).
+
 config PHY_MVEBU_SATA
 	def_bool y
 	depends on ARCH_DOVE || MACH_DOVE || MACH_KIRKWOOD
diff --git a/drivers/phy/marvell/Makefile b/drivers/phy/marvell/Makefile
index 3fc188f59118..0cf6a7cbaf9f 100644
--- a/drivers/phy/marvell/Makefile
+++ b/drivers/phy/marvell/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_ARMADA375_USBCLUSTER_PHY)	+= phy-armada375-usb2.o
 obj-$(CONFIG_PHY_BERLIN_SATA)		+= phy-berlin-sata.o
 obj-$(CONFIG_PHY_BERLIN_USB)		+= phy-berlin-usb.o
+obj-$(CONFIG_PHY_MVEBU_CP110_COMPHY)	+= phy-mvebu-cp110-comphy.o
 obj-$(CONFIG_PHY_MVEBU_SATA)		+= phy-mvebu-sata.o
 obj-$(CONFIG_PHY_PXA_28NM_HSIC)		+= phy-pxa-28nm-hsic.o
 obj-$(CONFIG_PHY_PXA_28NM_USB2)		+= phy-pxa-28nm-usb2.o
diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
new file mode 100644
index 000000000000..41556e790856
--- /dev/null
+++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
@@ -0,0 +1,656 @@
+/*
+ * Copyright (C) 2017 Marvell
+ *
+ * Antoine Tenart <antoine.tenart@free-electrons.com>
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/mfd/syscon.h>
+#include <linux/module.h>
+#include <linux/phy/phy.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+
+/* Relative to priv->base */
+#define MVEBU_COMPHY_SERDES_CFG0(n)		(0x0 + (n) * 0x1000)
+#define     MVEBU_COMPHY_SERDES_CFG0_PU_PLL	BIT(1)
+#define     MVEBU_COMPHY_SERDES_CFG0_GEN_RX(n)	((n) << 3)
+#define     MVEBU_COMPHY_SERDES_CFG0_GEN_TX(n)	((n) << 7)
+#define     MVEBU_COMPHY_SERDES_CFG0_PU_RX	BIT(11)
+#define     MVEBU_COMPHY_SERDES_CFG0_PU_TX	BIT(12)
+#define     MVEBU_COMPHY_SERDES_CFG0_HALF_BUS	BIT(14)
+#define MVEBU_COMPHY_SERDES_CFG1(n)		(0x4 + (n) * 0x1000)
+#define     MVEBU_COMPHY_SERDES_CFG1_RESET	BIT(3)
+#define     MVEBU_COMPHY_SERDES_CFG1_RX_INIT	BIT(4)
+#define     MVEBU_COMPHY_SERDES_CFG1_CORE_RESET	BIT(5)
+#define     MVEBU_COMPHY_SERDES_CFG1_RF_RESET	BIT(6)
+#define MVEBU_COMPHY_SERDES_CFG2(n)		(0x8 + (n) * 0x1000)
+#define     MVEBU_COMPHY_SERDES_CFG2_DFE_EN	BIT(4)
+#define MVEBU_COMPHY_SERDES_STATUS0(n)		(0x18 + (n) * 0x1000)
+#define     MVEBU_COMPHY_SERDES_STATUS0_TX_PLL_RDY	BIT(2)
+#define     MVEBU_COMPHY_SERDES_STATUS0_RX_PLL_RDY	BIT(3)
+#define     MVEBU_COMPHY_SERDES_STATUS0_RX_INIT		BIT(4)
+#define MVEBU_COMPHY_PWRPLL_CTRL(n)		(0x804 + (n) * 0x1000)
+#define     MVEBU_COMPHY_PWRPLL_CTRL_RFREQ(n)	((n) << 0)
+#define     MVEBU_COMPHY_PWRPLL_PHY_MODE(n)	((n) << 5)
+#define MVEBU_COMPHY_IMP_CAL(n)			(0x80c + (n) * 0x1000)
+#define     MVEBU_COMPHY_IMP_CAL_TX_EXT(n)	((n) << 10)
+#define     MVEBU_COMPHY_IMP_CAL_TX_EXT_EN	BIT(15)
+#define MVEBU_COMPHY_DFE_RES(n)			(0x81c + (n) * 0x1000)
+#define     MVEBU_COMPHY_DFE_RES_FORCE_GEN_TBL	BIT(15)
+#define MVEBU_COMPHY_COEF(n)			(0x828 + (n) * 0x1000)
+#define     MVEBU_COMPHY_COEF_DFE_EN		BIT(14)
+#define     MVEBU_COMPHY_COEF_DFE_CTRL		BIT(15)
+#define MVEBU_COMPHY_GEN1_S0(n)			(0x834 + (n) * 0x1000)
+#define     MVEBU_COMPHY_GEN1_S0_TX_AMP(n)	((n) << 1)
+#define     MVEBU_COMPHY_GEN1_S0_TX_EMPH(n)	((n) << 7)
+#define MVEBU_COMPHY_GEN1_S1(n)			(0x838 + (n) * 0x1000)
+#define     MVEBU_COMPHY_GEN1_S1_RX_MUL_PI(n)	((n) << 0)
+#define     MVEBU_COMPHY_GEN1_S1_RX_MUL_PF(n)	((n) << 3)
+#define     MVEBU_COMPHY_GEN1_S1_RX_MUL_FI(n)	((n) << 6)
+#define     MVEBU_COMPHY_GEN1_S1_RX_MUL_FF(n)	((n) << 8)
+#define     MVEBU_COMPHY_GEN1_S1_RX_DFE_EN	BIT(10)
+#define     MVEBU_COMPHY_GEN1_S1_RX_DIV(n)	((n) << 11)
+#define MVEBU_COMPHY_GEN1_S2(n)			(0x8f4 + (n) * 0x1000)
+#define     MVEBU_COMPHY_GEN1_S2_TX_EMPH(n)	((n) << 0)
+#define     MVEBU_COMPHY_GEN1_S2_TX_EMPH_EN	BIT(4)
+#define MVEBU_COMPHY_LOOPBACK(n)		(0x88c + (n) * 0x1000)
+#define     MVEBU_COMPHY_LOOPBACK_DBUS_WIDTH(n)	((n) << 1)
+#define MVEBU_COMPHY_VDD_CAL0(n)		(0x908 + (n) * 0x1000)
+#define     MVEBU_COMPHY_VDD_CAL0_CONT_MODE	BIT(15)
+#define MVEBU_COMPHY_EXT_SELV(n)		(0x914 + (n) * 0x1000)
+#define     MVEBU_COMPHY_EXT_SELV_RX_SAMPL(n)	((n) << 5)
+#define MVEBU_COMPHY_MISC_CTRL0(n)		(0x93c + (n) * 0x1000)
+#define     MVEBU_COMPHY_MISC_CTRL0_ICP_FORCE	BIT(5)
+#define     MVEBU_COMPHY_MISC_CTRL0_REFCLK_SEL	BIT(10)
+#define MVEBU_COMPHY_RX_CTRL1(n)		(0x940 + (n) * 0x1000)
+#define     MVEBU_COMPHY_RX_CTRL1_RXCLK2X_SEL	BIT(11)
+#define     MVEBU_COMPHY_RX_CTRL1_CLK8T_EN	BIT(12)
+#define MVEBU_COMPHY_SPEED_DIV(n)		(0x954 + (n) * 0x1000)
+#define     MVEBU_COMPHY_SPEED_DIV_TX_FORCE	BIT(7)
+#define MVEBU_SP_CALIB(n)			(0x96c + (n) * 0x1000)
+#define     MVEBU_SP_CALIB_SAMPLER(n)		((n) << 8)
+#define     MVEBU_SP_CALIB_SAMPLER_EN		BIT(12)
+#define MVEBU_COMPHY_TX_SLEW_RATE(n)		(0x974 + (n) * 0x1000)
+#define     MVEBU_COMPHY_TX_SLEW_RATE_EMPH(n)	((n) << 5)
+#define     MVEBU_COMPHY_TX_SLEW_RATE_SLC(n)	((n) << 10)
+#define MVEBU_COMPHY_DLT_CTRL(n)		(0x984 + (n) * 0x1000)
+#define     MVEBU_COMPHY_DLT_CTRL_DTL_FLOOP_EN	BIT(2)
+#define MVEBU_COMPHY_FRAME_DETECT0(n)		(0xa14 + (n) * 0x1000)
+#define     MVEBU_COMPHY_FRAME_DETECT0_PATN(n)	((n) << 7)
+#define MVEBU_COMPHY_FRAME_DETECT3(n)		(0xa20 + (n) * 0x1000)
+#define     MVEBU_COMPHY_FRAME_DETECT3_LOST_TIMEOUT_EN	BIT(12)
+#define MVEBU_COMPHY_DME(n)			(0xa28 + (n) * 0x1000)
+#define     MVEBU_COMPHY_DME_ETH_MODE		BIT(7)
+#define MVEBU_COMPHY_TRAINING0(n)		(0xa68 + (n) * 0x1000)
+#define     MVEBU_COMPHY_TRAINING0_P2P_HOLD	BIT(15)
+#define MVEBU_COMPHY_TRAINING5(n)		(0xaa4 + (n) * 0x1000)
+#define	    MVEBU_COMPHY_TRAINING5_RX_TIMER(n)	((n) << 0)
+#define MVEBU_COMPHY_TX_TRAIN_PRESET(n)		(0xb1c + (n) * 0x1000)
+#define     MVEBU_COMPHY_TX_TRAIN_PRESET_16B_AUTO_EN	BIT(8)
+#define     MVEBU_COMPHY_TX_TRAIN_PRESET_PRBS11		BIT(9)
+#define MVEBU_COMPHY_GEN1_S3(n)			(0xc40 + (n) * 0x1000)
+#define     MVEBU_COMPHY_GEN1_S3_FBCK_SEL	BIT(9)
+#define MVEBU_COMPHY_GEN1_S4(n)			(0xc44 + (n) * 0x1000)
+#define	    MVEBU_COMPHY_GEN1_S4_DFE_RES(n)	((n) << 8)
+#define MVEBU_COMPHY_TX_PRESET(n)		(0xc68 + (n) * 0x1000)
+#define     MVEBU_COMPHY_TX_PRESET_INDEX(n)	((n) << 0)
+#define MVEBU_COMPHY_GEN1_S5(n)			(0xd38 + (n) * 0x1000)
+#define     MVEBU_COMPHY_GEN1_S5_ICP(n)		((n) << 0)
+
+/* Relative to priv->regmap */
+#define MVEBU_COMPHY_CONF1(n)			(0x1000 + (n) * 0x28)
+#define     MVEBU_COMPHY_CONF1_PWRUP		BIT(1)
+#define     MVEBU_COMPHY_CONF1_USB_PCIE		BIT(2)	/* 0: Ethernet/SATA */
+#define MVEBU_COMPHY_CONF6(n)			(0x1014 + (n) * 0x28)
+#define     MVEBU_COMPHY_CONF6_40B		BIT(18)
+#define MVEBU_COMPHY_SELECTOR			0x1140
+#define     MVEBU_COMPHY_SELECTOR_PHY(n)	((n) * 0x4)
+
+#define MVEBU_COMPHY_LANES	6
+#define MVEBU_COMPHY_PORTS	3
+
+struct mvebu_comhy_conf {
+	enum phy_mode mode;
+	unsigned lane;
+	unsigned port;
+	u32 mux;
+};
+
+#define MVEBU_COMPHY_CONF(_lane, _port, _mode, _mux)	\
+	{						\
+		.lane = _lane,				\
+		.port = _port,				\
+		.mode = _mode,				\
+		.mux = _mux,				\
+	}
+
+static const struct mvebu_comhy_conf mvebu_comphy_cp110_modes[] = {
+	/* lane 0 */
+	MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1),
+	/* lane 1 */
+	MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1),
+	/* lane 2 */
+	MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1),
+	MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1),
+	/* lane 3 */
+	MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2),
+	/* lane 4 */
+	MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2),
+	MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2),
+	MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1),
+	/* lane 5 */
+	MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1),
+};
+
+struct mvebu_comphy_priv {
+	void __iomem *base;
+	struct regmap *regmap;
+	struct device *dev;
+	struct phy *phys[MVEBU_COMPHY_LANES];
+	int modes[MVEBU_COMPHY_LANES];
+};
+
+struct mvebu_comphy_lane {
+	struct mvebu_comphy_priv *priv;
+	struct device_node *of_node;
+	unsigned id;
+	enum phy_mode mode;
+	int port;
+};
+
+static int mvebu_comphy_get_mux(int lane, int port, enum phy_mode mode)
+{
+	int i, n = ARRAY_SIZE(mvebu_comphy_cp110_modes);
+
+	/* Unused PHY mux value is 0x0 */
+	if (mode == PHY_MODE_INVALID)
+		return 0;
+
+	for (i = 0; i < n; i++) {
+		if (mvebu_comphy_cp110_modes[i].lane == lane &&
+		    mvebu_comphy_cp110_modes[i].port == port &&
+		    mvebu_comphy_cp110_modes[i].mode == mode)
+			break;
+	}
+
+	if (i == n)
+		return -EINVAL;
+
+	return mvebu_comphy_cp110_modes[i].mux;
+}
+
+static void mvebu_comphy_ethernet_init_reset(struct mvebu_comphy_lane *lane,
+					     enum phy_mode mode)
+{
+	struct mvebu_comphy_priv *priv = lane->priv;
+	u32 val;
+
+	regmap_read(priv->regmap, MVEBU_COMPHY_CONF1(lane->id), &val);
+	val &= ~MVEBU_COMPHY_CONF1_USB_PCIE;
+	val |= MVEBU_COMPHY_CONF1_PWRUP;
+	regmap_write(priv->regmap, MVEBU_COMPHY_CONF1(lane->id), val);
+
+	/* Select baud rates and PLLs */
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG0(lane->id));
+	val &= ~(MVEBU_COMPHY_SERDES_CFG0_PU_PLL |
+		 MVEBU_COMPHY_SERDES_CFG0_PU_RX |
+		 MVEBU_COMPHY_SERDES_CFG0_PU_TX |
+		 MVEBU_COMPHY_SERDES_CFG0_HALF_BUS |
+		 MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0xf) |
+		 MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0xf));
+	if (mode == PHY_MODE_10GKR)
+		val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0xe) |
+		       MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0xe);
+	else if (mode == PHY_MODE_SGMII)
+		val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x6) |
+		       MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x6) |
+		       MVEBU_COMPHY_SERDES_CFG0_HALF_BUS;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG0(lane->id));
+
+	/* reset */
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+	val &= ~(MVEBU_COMPHY_SERDES_CFG1_RESET |
+		 MVEBU_COMPHY_SERDES_CFG1_CORE_RESET |
+		 MVEBU_COMPHY_SERDES_CFG1_RF_RESET);
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+
+	/* de-assert reset */
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+	val |= MVEBU_COMPHY_SERDES_CFG1_RESET |
+	       MVEBU_COMPHY_SERDES_CFG1_CORE_RESET;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+
+	/* wait until clocks are ready */
+	mdelay(1);
+
+	/* exlicitly disable 40B, the bits isn't clear on reset */
+	regmap_read(priv->regmap, MVEBU_COMPHY_CONF6(lane->id), &val);
+	val &= ~MVEBU_COMPHY_CONF6_40B;
+	regmap_write(priv->regmap, MVEBU_COMPHY_CONF6(lane->id), val);
+
+	/* refclk selection */
+	val = readl(priv->base + MVEBU_COMPHY_MISC_CTRL0(lane->id));
+	val &= ~MVEBU_COMPHY_MISC_CTRL0_REFCLK_SEL;
+	if (mode == PHY_MODE_10GKR)
+		val |= MVEBU_COMPHY_MISC_CTRL0_ICP_FORCE;
+	writel(val, priv->base + MVEBU_COMPHY_MISC_CTRL0(lane->id));
+
+	/* power and pll selection */
+	val = readl(priv->base + MVEBU_COMPHY_PWRPLL_CTRL(lane->id));
+	val &= ~(MVEBU_COMPHY_PWRPLL_CTRL_RFREQ(0x1f) |
+		 MVEBU_COMPHY_PWRPLL_PHY_MODE(0x7));
+	val |= MVEBU_COMPHY_PWRPLL_CTRL_RFREQ(0x1) |
+	       MVEBU_COMPHY_PWRPLL_PHY_MODE(0x4);
+	writel(val, priv->base + MVEBU_COMPHY_PWRPLL_CTRL(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_LOOPBACK(lane->id));
+	val &= ~MVEBU_COMPHY_LOOPBACK_DBUS_WIDTH(0x7);
+	val |= MVEBU_COMPHY_LOOPBACK_DBUS_WIDTH(0x1);
+	writel(val, priv->base + MVEBU_COMPHY_LOOPBACK(lane->id));
+}
+
+static int mvebu_comphy_init_plls(struct mvebu_comphy_lane *lane,
+				  enum phy_mode mode)
+{
+	struct mvebu_comphy_priv *priv = lane->priv;
+	u32 val;
+
+	/* SERDES external config */
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG0(lane->id));
+	val |= MVEBU_COMPHY_SERDES_CFG0_PU_PLL |
+	       MVEBU_COMPHY_SERDES_CFG0_PU_RX |
+	       MVEBU_COMPHY_SERDES_CFG0_PU_TX;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG0(lane->id));
+
+	/* check rx/tx pll */
+	readl_poll_timeout(priv->base + MVEBU_COMPHY_SERDES_STATUS0(lane->id),
+			   val,
+			   val & (MVEBU_COMPHY_SERDES_STATUS0_RX_PLL_RDY |
+				  MVEBU_COMPHY_SERDES_STATUS0_TX_PLL_RDY),
+			   1000, 150000);
+	if (!(val & (MVEBU_COMPHY_SERDES_STATUS0_RX_PLL_RDY |
+		     MVEBU_COMPHY_SERDES_STATUS0_TX_PLL_RDY)))
+		return -ETIMEDOUT;
+
+	/* rx init */
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+	val |= MVEBU_COMPHY_SERDES_CFG1_RX_INIT;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+
+	/* check rx */
+	readl_poll_timeout(priv->base + MVEBU_COMPHY_SERDES_STATUS0(lane->id),
+			   val, val & MVEBU_COMPHY_SERDES_STATUS0_RX_INIT,
+			   1000, 10000);
+	if (!(val & MVEBU_COMPHY_SERDES_STATUS0_RX_INIT))
+		return -ETIMEDOUT;
+
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+	val &= ~MVEBU_COMPHY_SERDES_CFG1_RX_INIT;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+
+	return 0;
+}
+
+static int mvebu_comphy_set_mode_sgmii(struct phy *phy, enum phy_mode mode)
+{
+	struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
+	struct mvebu_comphy_priv *priv = lane->priv;
+	u32 val;
+
+	mvebu_comphy_ethernet_init_reset(lane, mode);
+
+	val = readl(priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id));
+	val &= ~MVEBU_COMPHY_RX_CTRL1_CLK8T_EN;
+	val |= MVEBU_COMPHY_RX_CTRL1_RXCLK2X_SEL;
+	writel(val, priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_DLT_CTRL(lane->id));
+	val &= ~MVEBU_COMPHY_DLT_CTRL_DTL_FLOOP_EN;
+	writel(val, priv->base + MVEBU_COMPHY_DLT_CTRL(lane->id));
+
+	regmap_read(priv->regmap, MVEBU_COMPHY_CONF1(lane->id), &val);
+	val &= ~MVEBU_COMPHY_CONF1_USB_PCIE;
+	val |= MVEBU_COMPHY_CONF1_PWRUP;
+	regmap_write(priv->regmap, MVEBU_COMPHY_CONF1(lane->id), val);
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S0(lane->id));
+	val &= ~MVEBU_COMPHY_GEN1_S0_TX_EMPH(0xf);
+	val |= MVEBU_COMPHY_GEN1_S0_TX_EMPH(0x1);
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S0(lane->id));
+
+	return mvebu_comphy_init_plls(lane, mode);
+}
+
+static int mvebu_comphy_set_mode_10gkr(struct phy *phy, enum phy_mode mode)
+{
+	struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
+	struct mvebu_comphy_priv *priv = lane->priv;
+	u32 val;
+
+	mvebu_comphy_ethernet_init_reset(lane, mode);
+
+	val = readl(priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id));
+	val |= MVEBU_COMPHY_RX_CTRL1_RXCLK2X_SEL |
+	       MVEBU_COMPHY_RX_CTRL1_CLK8T_EN;
+	writel(val, priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_DLT_CTRL(lane->id));
+	val |= MVEBU_COMPHY_DLT_CTRL_DTL_FLOOP_EN;
+	writel(val, priv->base + MVEBU_COMPHY_DLT_CTRL(lane->id));
+
+	/* Speed divider */
+	val = readl(priv->base + MVEBU_COMPHY_SPEED_DIV(lane->id));
+	val |= MVEBU_COMPHY_SPEED_DIV_TX_FORCE;
+	writel(val, priv->base + MVEBU_COMPHY_SPEED_DIV(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG2(lane->id));
+	val |= MVEBU_COMPHY_SERDES_CFG2_DFE_EN;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG2(lane->id));
+
+	/* DFE resolution */
+	val = readl(priv->base + MVEBU_COMPHY_DFE_RES(lane->id));
+	val |= MVEBU_COMPHY_DFE_RES_FORCE_GEN_TBL;
+	writel(val, priv->base + MVEBU_COMPHY_DFE_RES(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S0(lane->id));
+	val &= ~(MVEBU_COMPHY_GEN1_S0_TX_AMP(0x1f) |
+		 MVEBU_COMPHY_GEN1_S0_TX_EMPH(0xf));
+	val |= MVEBU_COMPHY_GEN1_S0_TX_AMP(0x1c) |
+	       MVEBU_COMPHY_GEN1_S0_TX_EMPH(0xe);
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S0(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S2(lane->id));
+	val &= ~MVEBU_COMPHY_GEN1_S2_TX_EMPH(0xf);
+	val |= MVEBU_COMPHY_GEN1_S2_TX_EMPH_EN;
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S2(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_TX_SLEW_RATE(lane->id));
+	val |= MVEBU_COMPHY_TX_SLEW_RATE_EMPH(0x3) |
+	       MVEBU_COMPHY_TX_SLEW_RATE_SLC(0x3f);
+	writel(val, priv->base + MVEBU_COMPHY_TX_SLEW_RATE(lane->id));
+
+	/* Impedance calibration */
+	val = readl(priv->base + MVEBU_COMPHY_IMP_CAL(lane->id));
+	val &= ~MVEBU_COMPHY_IMP_CAL_TX_EXT(0x1f);
+	val |= MVEBU_COMPHY_IMP_CAL_TX_EXT(0xe) |
+	       MVEBU_COMPHY_IMP_CAL_TX_EXT_EN;
+	writel(val, priv->base + MVEBU_COMPHY_IMP_CAL(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S5(lane->id));
+	val &= ~MVEBU_COMPHY_GEN1_S5_ICP(0xf);
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S5(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S1(lane->id));
+	val &= ~(MVEBU_COMPHY_GEN1_S1_RX_MUL_PI(0x7) |
+		 MVEBU_COMPHY_GEN1_S1_RX_MUL_PF(0x7) |
+		 MVEBU_COMPHY_GEN1_S1_RX_MUL_FI(0x3) |
+		 MVEBU_COMPHY_GEN1_S1_RX_MUL_FF(0x3));
+	val |= MVEBU_COMPHY_GEN1_S1_RX_DFE_EN |
+	       MVEBU_COMPHY_GEN1_S1_RX_MUL_PI(0x2) |
+	       MVEBU_COMPHY_GEN1_S1_RX_MUL_PF(0x2) |
+	       MVEBU_COMPHY_GEN1_S1_RX_MUL_FF(0x1) |
+	       MVEBU_COMPHY_GEN1_S1_RX_DIV(0x3);
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S1(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_COEF(lane->id));
+	val &= ~(MVEBU_COMPHY_COEF_DFE_EN | MVEBU_COMPHY_COEF_DFE_CTRL);
+	writel(val, priv->base + MVEBU_COMPHY_COEF(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S4(lane->id));
+	val &= ~MVEBU_COMPHY_GEN1_S4_DFE_RES(0x3);
+	val |= MVEBU_COMPHY_GEN1_S4_DFE_RES(0x1);
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S4(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_GEN1_S3(lane->id));
+	val |= MVEBU_COMPHY_GEN1_S3_FBCK_SEL;
+	writel(val, priv->base + MVEBU_COMPHY_GEN1_S3(lane->id));
+
+	/* rx training timer */
+	val = readl(priv->base + MVEBU_COMPHY_TRAINING5(lane->id));
+	val &= ~MVEBU_COMPHY_TRAINING5_RX_TIMER(0x3ff);
+	val |= MVEBU_COMPHY_TRAINING5_RX_TIMER(0x13);
+	writel(val, priv->base + MVEBU_COMPHY_TRAINING5(lane->id));
+
+	/* tx train peak to peak hold */
+	val = readl(priv->base + MVEBU_COMPHY_TRAINING0(lane->id));
+	val |= MVEBU_COMPHY_TRAINING0_P2P_HOLD;
+	writel(val, priv->base + MVEBU_COMPHY_TRAINING0(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_TX_PRESET(lane->id));
+	val &= ~MVEBU_COMPHY_TX_PRESET_INDEX(0xf);
+	val |= MVEBU_COMPHY_TX_PRESET_INDEX(0x2);	/* preset coeff */
+	writel(val, priv->base + MVEBU_COMPHY_TX_PRESET(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_FRAME_DETECT3(lane->id));
+	val &= ~MVEBU_COMPHY_FRAME_DETECT3_LOST_TIMEOUT_EN;
+	writel(val, priv->base + MVEBU_COMPHY_FRAME_DETECT3(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_TX_TRAIN_PRESET(lane->id));
+	val |= MVEBU_COMPHY_TX_TRAIN_PRESET_16B_AUTO_EN |
+	       MVEBU_COMPHY_TX_TRAIN_PRESET_PRBS11;
+	writel(val, priv->base + MVEBU_COMPHY_TX_TRAIN_PRESET(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_FRAME_DETECT0(lane->id));
+	val &= ~MVEBU_COMPHY_FRAME_DETECT0_PATN(0x1ff);
+	val |= MVEBU_COMPHY_FRAME_DETECT0_PATN(0x88);
+	writel(val, priv->base + MVEBU_COMPHY_FRAME_DETECT0(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_DME(lane->id));
+	val |= MVEBU_COMPHY_DME_ETH_MODE;
+	writel(val, priv->base + MVEBU_COMPHY_DME(lane->id));
+
+	val = readl(priv->base + MVEBU_COMPHY_VDD_CAL0(lane->id));
+	val |= MVEBU_COMPHY_VDD_CAL0_CONT_MODE;
+	writel(val, priv->base + MVEBU_COMPHY_VDD_CAL0(lane->id));
+
+	val = readl(priv->base + MVEBU_SP_CALIB(lane->id));
+	val &= ~MVEBU_SP_CALIB_SAMPLER(0x3);
+	val |= MVEBU_SP_CALIB_SAMPLER(0x3) |
+	       MVEBU_SP_CALIB_SAMPLER_EN;
+	writel(val, priv->base + MVEBU_SP_CALIB(lane->id));
+	val &= ~MVEBU_SP_CALIB_SAMPLER_EN;
+	writel(val, priv->base + MVEBU_SP_CALIB(lane->id));
+
+	/* External rx regulator */
+	val = readl(priv->base + MVEBU_COMPHY_EXT_SELV(lane->id));
+	val &= ~MVEBU_COMPHY_EXT_SELV_RX_SAMPL(0x1f);
+	val |= MVEBU_COMPHY_EXT_SELV_RX_SAMPL(0x1a);
+	writel(val, priv->base + MVEBU_COMPHY_EXT_SELV(lane->id));
+
+	return mvebu_comphy_init_plls(lane, mode);
+}
+
+static int mvebu_comphy_power_on(struct phy *phy)
+{
+	struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
+	struct mvebu_comphy_priv *priv = lane->priv;
+	int ret;
+	u32 mux, val;
+
+	mux = mvebu_comphy_get_mux(lane->id, lane->port, lane->mode);
+	if (mux < 0)
+		return -ENOTSUPP;
+
+	regmap_read(priv->regmap, MVEBU_COMPHY_SELECTOR, &val);
+	val &= ~(0xf << MVEBU_COMPHY_SELECTOR_PHY(lane->id));
+	val |= mux << MVEBU_COMPHY_SELECTOR_PHY(lane->id);
+	regmap_write(priv->regmap, MVEBU_COMPHY_SELECTOR, val);
+
+	switch (lane->mode) {
+	case PHY_MODE_SGMII:
+		ret = mvebu_comphy_set_mode_sgmii(phy, lane->mode);
+		break;
+	case PHY_MODE_10GKR:
+		ret = mvebu_comphy_set_mode_10gkr(phy, lane->mode);
+		break;
+	default:
+		return -ENOTSUPP;
+	}
+
+	/* digital reset */
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+	val |= MVEBU_COMPHY_SERDES_CFG1_RF_RESET;
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+
+	return ret;
+}
+
+static int mvebu_comphy_set_mode(struct phy *phy, enum phy_mode mode)
+{
+	struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
+
+	if (mvebu_comphy_get_mux(lane->id, lane->port, mode) < 0)
+		return -EINVAL;
+
+	lane->mode = mode;
+	return 0;
+}
+
+static int mvebu_comphy_power_off(struct phy *phy)
+{
+	struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
+	struct mvebu_comphy_priv *priv = lane->priv;
+	u32 val;
+
+	val = readl(priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+	val &= ~(MVEBU_COMPHY_SERDES_CFG1_RESET |
+		 MVEBU_COMPHY_SERDES_CFG1_CORE_RESET |
+		 MVEBU_COMPHY_SERDES_CFG1_RF_RESET);
+	writel(val, priv->base + MVEBU_COMPHY_SERDES_CFG1(lane->id));
+
+	regmap_read(priv->regmap, MVEBU_COMPHY_SELECTOR, &val);
+	val &= ~(0xf << MVEBU_COMPHY_SELECTOR_PHY(lane->id));
+	regmap_write(priv->regmap, MVEBU_COMPHY_SELECTOR, val);
+
+	return 0;
+}
+
+static const struct phy_ops mvebu_comphy_ops = {
+	.power_on	= mvebu_comphy_power_on,
+	.power_off	= mvebu_comphy_power_off,
+	.set_mode	= mvebu_comphy_set_mode,
+};
+
+static struct phy *mvebu_comphy_xlate(struct device *dev,
+				      struct of_phandle_args *args)
+{
+	struct mvebu_comphy_priv *priv = dev_get_drvdata(dev);
+	struct mvebu_comphy_lane *lane;
+	int i;
+
+	if (WARN_ON(args->args[0] >= MVEBU_COMPHY_PORTS))
+		return ERR_PTR(-EINVAL);
+
+	for (i = 0; i < MVEBU_COMPHY_LANES; i++) {
+		if (!priv->phys[i])
+			continue;
+
+		lane = phy_get_drvdata(priv->phys[i]);
+		if (priv->phys[i] && args->np == lane->of_node)
+			break;
+	}
+
+	if (i == MVEBU_COMPHY_LANES)
+		return ERR_PTR(-ENODEV);
+
+	if (lane->port >= 0)
+		return ERR_PTR(-EBUSY);
+
+	lane->port = args->args[0];
+	return priv->phys[i];
+}
+
+static int mvebu_comphy_probe(struct platform_device *pdev)
+{
+	struct mvebu_comphy_priv *priv;
+	struct phy_provider *provider;
+	struct device_node *child;
+	struct resource *res;
+
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->dev = &pdev->dev;
+	priv->regmap =
+		syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
+						"marvell,system-controller");
+	if (IS_ERR(priv->regmap))
+		return PTR_ERR(priv->regmap);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(&pdev->dev, res);
+	if (!priv->base)
+		return -ENOMEM;
+
+	for_each_available_child_of_node(pdev->dev.of_node, child) {
+		struct mvebu_comphy_lane *lane;
+		struct phy *phy;
+		int ret;
+		u32 val;
+
+		ret = of_property_read_u32(child, "reg", &val);
+		if (ret < 0) {
+			dev_err(&pdev->dev, "missing 'reg' property (%d)\n",
+				ret);
+			continue;
+		}
+
+		if (val >= MVEBU_COMPHY_LANES) {
+			dev_err(&pdev->dev, "invalid 'reg' property\n");
+			continue;
+		}
+
+		lane = devm_kzalloc(&pdev->dev, sizeof(*lane), GFP_KERNEL);
+		if (!lane)
+			return -ENOMEM;
+
+		phy = devm_phy_create(&pdev->dev, NULL, &mvebu_comphy_ops);
+		if (IS_ERR(phy))
+			return PTR_ERR(phy);
+
+		lane->priv = priv;
+		lane->of_node = child;
+		lane->mode = PHY_MODE_INVALID;
+		lane->id = val;
+		lane->port = -1;
+		phy_set_drvdata(phy, lane);
+
+		priv->phys[val] = phy;
+
+		/*
+		 * Once all modes are supported in this driver we should call
+		 * mvebu_comphy_power_off(phy) here to avoid relying on the
+		 * bootloader/firmware configuration.
+		 */
+	}
+
+	dev_set_drvdata(&pdev->dev, priv);
+	provider = devm_of_phy_provider_register(&pdev->dev,
+						 mvebu_comphy_xlate);
+	return PTR_ERR_OR_ZERO(provider);
+}
+
+static const struct of_device_id mvebu_comphy_of_match_table[] = {
+	{ .compatible = "marvell,comphy-cp110" },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, mvebu_comphy_of_match_table);
+
+static struct platform_driver mvebu_comphy_driver = {
+	.probe	= mvebu_comphy_probe,
+	.driver	= {
+		.name = "mvebu-comphy",
+		.of_match_table = mvebu_comphy_of_match_table,
+	},
+};
+module_platform_driver(mvebu_comphy_driver);
+
+MODULE_AUTHOR("Antoine Tenart <antoine.tenart@free-electrons.com>");
+MODULE_DESCRIPTION("Common PHY driver for mvebu SoCs");
+MODULE_LICENSE("GPL v2");
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v3 01/13] phy: add sgmii and 10gkr modes to the phy_mode enum
From: Antoine Tenart @ 2017-08-28 14:57 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement
  Cc: Antoine Tenart, thomas.petazzoni, nadavh, linux, linux-kernel, mw,
	stefanc, miquel.raynal, netdev
In-Reply-To: <20170828145725.2539-1-antoine.tenart@free-electrons.com>

This patch adds more generic PHY modes to the phy_mode enum, to
allow configuring generic PHYs to the SGMII and/or the 10GKR mode
by using the set_mode callback.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 include/linux/phy/phy.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
index 78bb0d7f6b11..e694d4008c4a 100644
--- a/include/linux/phy/phy.h
+++ b/include/linux/phy/phy.h
@@ -27,6 +27,8 @@ enum phy_mode {
 	PHY_MODE_USB_HOST,
 	PHY_MODE_USB_DEVICE,
 	PHY_MODE_USB_OTG,
+	PHY_MODE_SGMII,
+	PHY_MODE_10GKR,
 };
 
 /**
-- 
2.13.5

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox