Netdev List
* [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool
From: Alexander Duyck @ 2014-11-27  0:05 UTC
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast

This patch series implements a means of allocating page fragments without
the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
this I am able to decrease packet processing time by 11ns per packet in my
test environment.

---

Alexander Duyck (3):
      net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
      net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb
      fm10k/igb/ixgbe: Use napi_alloc_skb


 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    4 -
 drivers/net/ethernet/intel/igb/igb_main.c     |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    4 -
 include/linux/skbuff.h                        |   11 ++
 net/core/dev.c                                |    2 
 net/core/skbuff.c                             |  160 ++++++++++++++++++-------
 6 files changed, 133 insertions(+), 51 deletions(-)

--


* [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
From: Alexander Duyck @ 2014-11-27  0:05 UTC
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast
In-Reply-To: <20141126235900.1617.10008.stgit@ahduyck-vm-fedora20>

This patch splits the netdev_alloc_frag function up so that it can be used
on one of two page frag pools instead of being fixed on the
netdev_alloc_cache.  By doing this we can add a NAPI specific function
__napi_alloc_frag that accesses a pool that is only used from softirq
context.  The advantage of this is that we do not need to call
local_irq_save/restore, which can be a significant saving.

I also took the opportunity to refactor the core bits that were placed in
__alloc_page_frag.  First I updated the allocation to do either a 32K
allocation or an order 0 page.  Then I also rewrote the logic to work from
the end of the page to the start.  By doing this the size value doesn't
have to be used unless we have run out of space for page fragments.
Finally I cleaned up the atomic bits so that we just do an
atomic_sub_return and if that returns 0 then we set the page->_count via an
atomic_set.  This way we can remove the extra conditional for the
atomic_read since it would have led to an atomic_inc in the case of success
anyway.
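
The end-to-start allocation described above can be illustrated outside the
kernel.  What follows is only a user-space sketch, not the patch itself:
struct frag_pool, frag_pool_init() and frag_alloc() are hypothetical names,
the refills counter exists only for the illustration, and a "refill" here
simply recycles the same buffer, whereas the real code has to consult the
page count first.  The oversize check is also an addition of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct netdev_alloc_cache. */
struct frag_pool {
	unsigned char *buf;	/* backing buffer, plays the role of the page */
	unsigned int size;	/* total bytes in buf */
	unsigned int offset;	/* bytes still free, counted from the start */
	unsigned int refills;	/* illustration only: how often we ran out */
};

static void frag_pool_init(struct frag_pool *p, unsigned char *buf,
			   unsigned int size)
{
	assert(buf != NULL && size > 0);
	p->buf = buf;
	p->size = size;
	p->offset = size;	/* start at the end and work downwards */
	p->refills = 0;
}

static void *frag_alloc(struct frag_pool *p, unsigned int fragsz)
{
	/* Model-only guard: a fragment larger than the pool can never fit. */
	if (fragsz > p->size)
		return NULL;

	/* The fast path compares against offset alone; size is only read
	 * here, once the pool has run out of room.
	 */
	if (p->offset < fragsz) {
		p->offset = p->size;
		p->refills++;
	}

	p->offset -= fragsz;
	return p->buf + p->offset;
}
```

With a 4096 byte buffer, two 1024 byte requests return buf + 3072 and then
buf + 2048, so fragments are handed out from the top of the buffer down.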

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    2 +
 net/core/skbuff.c      |   86 +++++++++++++++++++++++++++---------------------
 2 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6333835..e596efa 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2184,6 +2184,8 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 	return __netdev_alloc_skb_ip_align(dev, length, GFP_ATOMIC);
 }
 
+void *napi_alloc_frag(unsigned int fragsz);
+
 /**
  * __dev_alloc_pages - allocate page for network Rx
  * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 92116df..6dd2b44 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -336,59 +336,60 @@ struct netdev_alloc_cache {
 	unsigned int		pagecnt_bias;
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
+static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);
 
-static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
+static void *__alloc_page_frag(struct netdev_alloc_cache __percpu *cache,
+			       unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc;
-	void *data = NULL;
-	int order;
-	unsigned long flags;
+	struct netdev_alloc_cache *nc = this_cpu_ptr(cache);
 
-	local_irq_save(flags);
-	nc = this_cpu_ptr(&netdev_alloc_cache);
 	if (unlikely(!nc->frag.page)) {
 refill:
-		for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
-			gfp_t gfp = gfp_mask;
-
-			if (order)
-				gfp |= __GFP_COMP | __GFP_NOWARN;
-			nc->frag.page = alloc_pages(gfp, order);
-			if (likely(nc->frag.page))
-				break;
-			if (--order < 0)
-				goto end;
+		nc->frag.size = NETDEV_FRAG_PAGE_MAX_SIZE;
+		nc->frag.page = alloc_pages_node(NUMA_NO_NODE,
+						 gfp_mask |
+						 __GFP_COMP |
+						 __GFP_NOWARN,
+						 NETDEV_FRAG_PAGE_MAX_ORDER);
+		if (unlikely(!nc->frag.page)) {
+			nc->frag.size = PAGE_SIZE;
+			nc->frag.page = alloc_pages_node(NUMA_NO_NODE,
+							 gfp_mask, 0);
+			if (unlikely(!nc->frag.page))
+				return NULL;
 		}
-		nc->frag.size = PAGE_SIZE << order;
+
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
-		atomic_add(NETDEV_PAGECNT_MAX_BIAS - 1,
-			   &nc->frag.page->_count);
+		atomic_add(NETDEV_PAGECNT_MAX_BIAS - 1, &nc->frag.page->_count);
 		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
-		nc->frag.offset = 0;
+		nc->frag.offset = nc->frag.size;
 	}
 
-	if (nc->frag.offset + fragsz > nc->frag.size) {
-		if (atomic_read(&nc->frag.page->_count) != nc->pagecnt_bias) {
-			if (!atomic_sub_and_test(nc->pagecnt_bias,
-						 &nc->frag.page->_count))
-				goto refill;
-			/* OK, page count is 0, we can safely set it */
-			atomic_set(&nc->frag.page->_count,
-				   NETDEV_PAGECNT_MAX_BIAS);
-		} else {
-			atomic_add(NETDEV_PAGECNT_MAX_BIAS - nc->pagecnt_bias,
-				   &nc->frag.page->_count);
-		}
+	if (nc->frag.offset < fragsz) {
+		if (atomic_sub_return(nc->pagecnt_bias, &nc->frag.page->_count))
+			goto refill;
+
+		/* OK, page count is 0, we can safely set it */
+		atomic_set(&nc->frag.page->_count, NETDEV_PAGECNT_MAX_BIAS);
 		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
-		nc->frag.offset = 0;
+		nc->frag.offset = nc->frag.size;
 	}
 
-	data = page_address(nc->frag.page) + nc->frag.offset;
-	nc->frag.offset += fragsz;
+	nc->frag.offset -= fragsz;
 	nc->pagecnt_bias--;
-end:
+
+	return page_address(nc->frag.page) + nc->frag.offset;
+}
+
+static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
+{
+	unsigned long flags;
+	void *data;
+
+	local_irq_save(flags);
+	data = __alloc_page_frag(&netdev_alloc_cache, fragsz, gfp_mask);
 	local_irq_restore(flags);
 	return data;
 }
@@ -406,6 +407,17 @@ void *netdev_alloc_frag(unsigned int fragsz)
 }
 EXPORT_SYMBOL(netdev_alloc_frag);
 
+static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
+{
+	return __alloc_page_frag(&napi_alloc_cache, fragsz, gfp_mask);
+}
+
+void *napi_alloc_frag(unsigned int fragsz)
+{
+	return __napi_alloc_frag(fragsz, GFP_ATOMIC | __GFP_COLD);
+}
+EXPORT_SYMBOL(napi_alloc_frag);
+
 /**
  *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
  *	@dev: network device to receive on


* [RFC PATCH 2/3] net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb
From: Alexander Duyck @ 2014-11-27  0:06 UTC
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast
In-Reply-To: <20141126235900.1617.10008.stgit@ahduyck-vm-fedora20>

This change pulls the core functionality out of __netdev_alloc_skb and
places it in a new function named __alloc_rx_skb.  The reason for doing
this is to make these bits accessible to a new function __napi_alloc_skb.
In addition __alloc_rx_skb now has a new flags value that is used to
determine which page frag pool to allocate from.  If the SKB_ALLOC_NAPI
flag is set then the NAPI pool is used.  The advantage of this is that we
do not have to use local_irq_save/restore when accessing the NAPI pool from
NAPI context.

In my test setup I saw at least 11ns of savings using the napi_alloc_skb
function versus the netdev_alloc_skb function, most of this being due to
the fact that we didn't have to call local_irq_save/restore.
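
To make the difference between the two entry points concrete, here is a
small user-space model of what each wrapper passes down to __alloc_rx_skb.
It is a sketch only: struct rx_request and the helper names are hypothetical,
and MODEL_NET_SKB_PAD / MODEL_NET_IP_ALIGN are assumed example values (the
real NET_SKB_PAD and NET_IP_ALIGN depend on the architecture).  Only
SKB_ALLOC_NAPI and its value 0x04 come from the patch.

```c
#include <assert.h>
#include <limits.h>

#define SKB_ALLOC_NAPI		0x04	/* flag value from this patch */
#define MODEL_NET_SKB_PAD	64u	/* assumed example value */
#define MODEL_NET_IP_ALIGN	2u	/* assumed example value */

enum frag_pool_id { POOL_NETDEV, POOL_NAPI };

/* Hypothetical summary of one receive allocation request. */
struct rx_request {
	unsigned int alloc_len;		/* length handed to __alloc_rx_skb */
	unsigned int headroom;		/* bytes set aside by skb_reserve() */
	enum frag_pool_id pool;		/* page frag pool the flags select */
};

static enum frag_pool_id select_pool(int flags)
{
	return (flags & SKB_ALLOC_NAPI) ? POOL_NAPI : POOL_NETDEV;
}

static struct rx_request make_request(unsigned int length,
				      unsigned int headroom, int flags)
{
	struct rx_request r;

	assert(length <= UINT_MAX - headroom);	/* model-only overflow guard */
	r.headroom = headroom;
	r.alloc_len = length + headroom;
	r.pool = select_pool(flags);
	return r;
}

/* Mirrors __netdev_alloc_skb: NET_SKB_PAD headroom, netdev pool. */
static struct rx_request netdev_request(unsigned int length)
{
	return make_request(length, MODEL_NET_SKB_PAD, 0);
}

/* Mirrors __napi_alloc_skb: NET_SKB_PAD + NET_IP_ALIGN headroom, NAPI pool. */
static struct rx_request napi_request(unsigned int length)
{
	return make_request(length, MODEL_NET_SKB_PAD + MODEL_NET_IP_ALIGN,
			    SKB_ALLOC_NAPI);
}
```

Note that the pool choice only matters when the request takes the page frag
path; larger requests, or ones with __GFP_WAIT or GFP_DMA set, still fall
back to __alloc_skb.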

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    9 ++++++
 net/core/dev.c         |    2 +
 net/core/skbuff.c      |   74 +++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e596efa..67de1d0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -151,6 +151,7 @@ struct net_device;
 struct scatterlist;
 struct pipe_inode_info;
 struct iov_iter;
+struct napi_struct;
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 struct nf_conntrack {
@@ -674,6 +675,7 @@ struct sk_buff {
 
 #define SKB_ALLOC_FCLONE	0x01
 #define SKB_ALLOC_RX		0x02
+#define SKB_ALLOC_NAPI		0x04
 
 /* Returns true if the skb was allocated from PFMEMALLOC reserves */
 static inline bool skb_pfmemalloc(const struct sk_buff *skb)
@@ -2185,6 +2187,13 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 }
 
 void *napi_alloc_frag(unsigned int fragsz);
+struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
+				 unsigned int length, gfp_t gfp_mask);
+static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
+					     unsigned int length)
+{
+	return __napi_alloc_skb(napi, length, GFP_ATOMIC);
+}
 
 /**
  * __dev_alloc_pages - allocate page for network Rx
diff --git a/net/core/dev.c b/net/core/dev.c
index ac48362..ff636ad 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4172,7 +4172,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi)
 	struct sk_buff *skb = napi->skb;
 
 	if (!skb) {
-		skb = netdev_alloc_skb_ip_align(napi->dev, GRO_MAX_HEAD);
+		skb = napi_alloc_skb(napi, GRO_MAX_HEAD);
 		napi->skb = skb;
 	}
 	return skb;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6dd2b44..397efd8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -419,10 +419,13 @@ void *napi_alloc_frag(unsigned int fragsz)
 EXPORT_SYMBOL(napi_alloc_frag);
 
 /**
- *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
- *	@dev: network device to receive on
+ *	__alloc_rx_skb - allocate an skbuff for rx
  *	@length: length to allocate
  *	@gfp_mask: get_free_pages mask, passed to alloc_skb
+ *	@flags:	If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
+ *		allocations in case we have to fallback to __alloc_skb()
+ *		If SKB_ALLOC_NAPI is set, page fragment will be allocated
+ *		from napi_cache instead of netdev_cache.
  *
  *	Allocate a new &sk_buff and assign it a usage count of one. The
  *	buffer has unspecified headroom built in. Users should allocate
@@ -431,11 +434,11 @@ EXPORT_SYMBOL(napi_alloc_frag);
  *
  *	%NULL is returned if there is no free memory.
  */
-struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
-				   unsigned int length, gfp_t gfp_mask)
+static struct sk_buff *__alloc_rx_skb(unsigned int length, gfp_t gfp_mask,
+				      int flags)
 {
 	struct sk_buff *skb = NULL;
-	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
+	unsigned int fragsz = SKB_DATA_ALIGN(length) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
@@ -444,7 +447,9 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		if (sk_memalloc_socks())
 			gfp_mask |= __GFP_MEMALLOC;
 
-		data = __netdev_alloc_frag(fragsz, gfp_mask);
+		data = (flags & SKB_ALLOC_NAPI) ?
+			__napi_alloc_frag(fragsz, gfp_mask) :
+			__netdev_alloc_frag(fragsz, gfp_mask);
 
 		if (likely(data)) {
 			skb = build_skb(data, fragsz);
@@ -452,17 +457,72 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 				put_page(virt_to_head_page(data));
 		}
 	} else {
-		skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
+		skb = __alloc_skb(length, gfp_mask,
 				  SKB_ALLOC_RX, NUMA_NO_NODE);
 	}
+	return skb;
+}
+
+/**
+ *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
+ *	@dev: network device to receive on
+ *	@length: length to allocate
+ *	@gfp_mask: get_free_pages mask, passed to alloc_skb
+ *
+ *	Allocate a new &sk_buff and assign it a usage count of one. The
+ *	buffer has NET_SKB_PAD headroom built in. Users should allocate
+ *	the headroom they think they need without accounting for the
+ *	built in space. The built in space is used for optimisations.
+ *
+ *	%NULL is returned if there is no free memory.
+ */
+struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
+				   unsigned int length, gfp_t gfp_mask)
+{
+	struct sk_buff *skb;
+
+	length += NET_SKB_PAD;
+	skb = __alloc_rx_skb(length, gfp_mask, 0);
+
 	if (likely(skb)) {
 		skb_reserve(skb, NET_SKB_PAD);
 		skb->dev = dev;
 	}
+
 	return skb;
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
 
+/**
+ *	__napi_alloc_skb - allocate skbuff for rx in a specific NAPI instance
+ *	@napi: napi instance this buffer was allocated for
+ *	@length: length to allocate
+ *	@gfp_mask: get_free_pages mask, passed to alloc_skb and alloc_pages
+ *
+ *	Allocate a new sk_buff for use in NAPI receive.  This buffer will
+ *	attempt to allocate the head from a special reserved region used
+ *	only for NAPI Rx allocation.  By doing this we can save several
+ *	CPU cycles by avoiding having to disable and re-enable IRQs.
+ *
+ *	%NULL is returned if there is no free memory.
+ */
+struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
+				 unsigned int length, gfp_t gfp_mask)
+{
+	struct sk_buff *skb;
+
+	length += NET_SKB_PAD + NET_IP_ALIGN;
+	skb = __alloc_rx_skb(length, gfp_mask, SKB_ALLOC_NAPI);
+
+	if (likely(skb)) {
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+		skb->dev = napi->dev;
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(__napi_alloc_skb);
+
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		     int size, unsigned int truesize)
 {


* [RFC PATCH 3/3] fm10k/igb/ixgbe: Use napi_alloc_skb
From: Alexander Duyck @ 2014-11-27  0:06 UTC
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast
In-Reply-To: <20141126235900.1617.10008.stgit@ahduyck-vm-fedora20>

This change replaces calls to netdev_alloc_skb_ip_align with
napi_alloc_skb.  The current advantage of napi_alloc_skb is that its page
frag allocation does not need to disable and re-enable IRQs.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c     |    3 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 3acdec3..7eff7c6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -310,8 +310,8 @@ static struct sk_buff *fm10k_fetch_rx_buffer(struct fm10k_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
-						FM10K_RX_HDR_LEN);
+		skb = napi_alloc_skb(&rx_ring->q_vector->napi,
+				     FM10K_RX_HDR_LEN);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_failed++;
 			return NULL;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 536ef9d..922d78c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6638,8 +6638,7 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
-						IGB_RX_HDR_LEN);
+		skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGB_RX_HDR_LEN);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_failed++;
 			return NULL;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a195d24..03df91f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1858,8 +1858,8 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
-						IXGBE_RX_HDR_SIZE);
+		skb = napi_alloc_skb(&rx_ring->q_vector->napi,
+				     IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
 			return NULL;


* Re: [PATCH rfc 1/4] net-timestamp: pull headers for SOCK_STREAM
From: Andy Lutomirski @ 2014-11-27  0:36 UTC
  To: Willem de Bruijn; +Cc: David Miller, Network Development, Richard Cochran
In-Reply-To: <CA+FuTScmwYyrU-LOK5dvbdcp4n=wig7GTdV1gmViS-2gJ4Q9ZA@mail.gmail.com>

On Wed, Nov 26, 2014 at 1:03 PM, Willem de Bruijn <willemb@google.com> wrote:
> On Tue, Nov 25, 2014 at 4:39 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Nov 25, 2014 at 11:54 AM, David Miller <davem@davemloft.net> wrote:
>>> From: Willem de Bruijn <willemb@google.com>
>>> Date: Tue, 25 Nov 2014 14:52:00 -0500
>>>
>>>> On Tue, Nov 25, 2014 at 1:42 PM, David Miller <davem@davemloft.net> wrote:
>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>> Date: Tue, 25 Nov 2014 12:58:03 -0500
>>>>>
>>>>> What's the harm in exposing the headers?  Either it's harmful, and
>>>>> therefore doing so for UDP is bad too, or it's harmless and
>>>>
>>>> Headers may expose information not available otherwise. I don't
>>>> immediately see critical problems, but that does not mean that they
>>>> might not lurk there.
>>>>
>>>> We so far avoid exposing the sequence number, though keeping it hidden
>>>> is more about third parties. More in general, unprivileged processes
>>>> may start requesting timestamps only to learn tcp state that they
>>>> should either get from tcpinfo or cannot currently get at all, likely
>>>> for good reason. A far-fetched example is identifying admin iptables
>>>> tos mangling rules by reading the tos bits at the driver layer. At least
>>>> on my machine, iptables -L is privileged.
>>>>
>>>>> we should probably leave it alone to not risk breaking anyone.
>>>>
>>>> That's fair. I sent it for rfc first for that reason. I won't resubmit
>>>> unless more serious concerns are raised.
>>>
>>> I just worry about the potential breakage.
>>>
>>> Your concerns are valid... I honestly don't know what we should do here.
>>> Both choices have merit.
>>
>> Here's a scenario in which giving the headers might be dangerous:
>>
>> Suppose I create a network namespace that's designed to contain
>> something, e.g. a Tor or Tor-like client, that shouldn't know any of
>> its public addressing information.  I might assign something like a
>> tunnel interface to the namespace, but, if the contained code can get
>> lower-level headers, it might learn something that would identify the
>> *other* end of the tunnel, which wouldn't be so good.  Admittedly,
>> this would be just one of several things that would require care to
>> get this right.
>
> network namespaces are an interesting case, indeed.
>
>>
>> Also, what happens if the output is transformed by ipsec?  Does the
>> timestamp message show the ciphertext?
>>
>> TBH, I'd rather send no payload at all and have an scm message that
>> the sender provides that specifies a cookie identifying the particular
>> sent data.  But that ship mostly sailed awhile ago.
>>
>> For bytestreams, though, isn't this all new in 3.18?  Or am I off by a release.
>
> It was added in 3.17. That is still very recent.
>
> One third option, though hardly pretty, is to put display of headers
> under administrator control. An application cannot easily infer whether
> headers are stripped, and legacy applications do not even know to try.
> So, this is a bit too crude:
>
> +    if (sk->sk_protocol == IPPROTO_TCP && sysctl_net_blind_errqueue)
> +        skb_pull(skb, skb_transport_offset(skb) + tcp_hdrlen(skb));
> +    else if (sk->sk_protocol == IPPROTO_UDP && sysctl_net_blind_errqueue >= 2)
> +        skb_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr));
>
> An alternative is to add a timestamping option to skip headers (or
> even full payload, basically
> http://patchwork.ozlabs.org/patch/366967/) and give the administrator
> a sysctl to drop all requests that do not pass this flag. The intent
> is that future proof applications will start requesting the flag, and
> relying on the ts counter. Hardened installations can set the sysctl
> from the start, accepting possible breakage.

Is there any reason to believe that unconditionally dropping the
headers would break anything?  I find it a bit hard to believe that
anyone has actually implemented logic to figure out *what* L2 header
type should be decoded and decode it.

I can imagine that someone has hardcoded an assumption that the
underlying interface is Ethernet, but there's still the whole pile of
vlan, random datacenter encapsulation protocols and such to worry
about.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH net-next v4] ipvlan: Initial check-in of the IPVLAN driver.
From: Toshiaki Makita @ 2014-11-27  1:59 UTC
  To: Mahesh Bandewar
  Cc: netdev, Eric Dumazet, Maciej Zenczykowski, Laurent Chavey,
	Tim Hockin, David Miller, Brandon Philips, Pavel Emelianov
In-Reply-To: <CAF2d9jgZkgk4kOyS7K959vEq_55DuEnkVdxzKPdmPiTxFfPvsg@mail.gmail.com>

On 2014/11/27 2:05, Mahesh Bandewar wrote:
> On Tue, Nov 25, 2014 at 10:41 PM, Toshiaki Makita
> <makita.toshiaki@lab.ntt.co.jp> wrote:
>> Hi Mahesh,
>>
>> I found that deleting the last ipvlan device triggers WARN_ON() in
>> rtmsg_ifinfo().
>> ipvlan_nl_fillinfo() seems to return -EINVAL in that case.
>>
>>> +static int ipvlan_nl_fillinfo(struct sk_buff *skb,
>>> +                           const struct net_device *dev)
>>> +{
>>> +     struct ipvl_dev *ipvlan = netdev_priv(dev);
>>> +     struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
>>> +     int ret = -EINVAL;
>>> +
>>> +     if (!port)
>>> +             goto err;
>>> +
>>> +     ret = -EMSGSIZE;
>>> +     if (nla_put_u16(skb, IFLA_IPVLAN_MODE, port->mode))
>>> +             goto err;
>>> +
>>> +     return 0;
>>> +
>>> +err:
>>> +     return ret;
>>> +}
>>
>> rollback_registered_many() calls rtmsg_ifinfo() after calling ndo_uninit().
>> ndo_uninit() (ipvlan_uninit() -> ipvlan_port_destroy() ->
>> netdev_rx_handler_unregister()) sets rx_handler_data to NULL.
>> So, we cannot dereference "port" in ipvlan_nl_fillinfo().
>>
> Calling fillinfo() after calling uninit() seems pointless on any
> device. 

bonding needs rtmsg_ifinfo() to be called after ndo_uninit(); see commit
56bfa7ee7c88 ("unregister_netdevice : move RTM_DELLINK to until after
ndo_uninit").

> But how are you hitting this case? Can you share the command
> sequence with me?

# ip link add link eth0 name ipvl0 type ipvlan
# ip link del ipvl0

Thanks,
Toshiaki Makita

> 
> Thanks,
> --mahesh..
> 
>> Maybe "mode" should belong to struct ipvl_dev?
>>
>> Thanks,
>> Toshiaki Makita


* [PATCH v2 0/6] support GMAC driver for RK3288
From: Roger Chen @ 2014-11-27  2:49 UTC
  To: heiko
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen

Roger Chen (6):
  patch1: add driver for Rockchip RK3288 SoCs integrated GMAC
  patch2: define clock ID used for GMAC
  patch3: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
  patch4: dts: rockchip: add gmac info for rk3288
  patch5: dts: rockchip: enable gmac on RK3288 evb board
  patch6: add document for Rockchip RK3288 GMAC

Tested on rk3288 evb board:
Execute the following commands to enable ethernet,
set the local IP and ping a remote host.

busybox ifconfig eth0 up
busybox ifconfig eth0 192.168.1.111
ping 192.168.1.1

-- 
1.7.9.5


* [PATCH v2 0/6] support GMAC driver for RK3288
From: Roger Chen @ 2014-11-27  2:51 UTC
  To: peppe.cavallaro
  Cc: heiko, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

Roger Chen (6):
  patch1: add driver for Rockchip RK3288 SoCs integrated GMAC
  patch2: define clock ID used for GMAC
  patch3: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
  patch4: dts: rockchip: add gmac info for rk3288
  patch5: dts: rockchip: enable gmac on RK3288 evb board
  patch6: add document for Rockchip RK3288 GMAC

Tested on rk3288 evb board:
Execute the following commands to enable ethernet,
set the local IP and ping a remote host.

busybox ifconfig eth0 up
busybox ifconfig eth0 192.168.1.111
ping 192.168.1.1

-- 
1.7.9.5


* [PATCH v2 1/6] GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC
From: Roger Chen @ 2014-11-27  2:52 UTC
  To: peppe.cavallaro
  Cc: heiko, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

This driver is based on the stmmac driver.

Modifications based on Giuseppe CAVALLARO's suggestions:
1. use BIT()
	> +/*RK3288_GRF_SOC_CON3*/
	> +#define GMAC_TXCLK_DLY_ENABLE   ((0x4000 << 16) | (0x4000))
	> +#define GMAC_TXCLK_DLY_DISABLE  ((0x4000 << 16) | (0x0000))
	...

	why not use BIT and BIT_MASK where possible?

	===>after modification:

	#define GRF_BIT(nr)     (BIT(nr) | BIT(nr+16))
	#define GRF_CLR_BIT(nr) (BIT(nr+16))
	#define GMAC_TXCLK_DLY_ENABLE   GRF_BIT(14)
	#define GMAC_TXCLK_DLY_DISABLE  GRF_CLR_BIT(14)
	...
2. use a single register write
	> +    regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
	> +             GMAC_PHY_INTF_SEL_RGMII);
	> +    regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
	> +             GMAC_RMII_MODE_CLR);
	maybe you could perform just one write unless there is some HW
	constraint.

	===>after modification:

	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
			 GMAC_PHY_INTF_SEL_RGMII | GMAC_RMII_MODE_CLR);

3. use macros
	> +    regmap_write(bsp_priv->grf, RK3288_GRF_GPIO3D_E, 0xFFFFFFFF);
	> +    regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4B_E,
	> +             0x3<<2<<16 | 0x3<<2);

	please use macros; these shift sequences are really hard to decode

	===>after modification:

	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4A_E, GPIO4A_12MA);
	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4B_E, GPIO4B_2_12MA);

4. remove grf fail check in rk_gmac_setup()
	> +    if (IS_ERR(bsp_priv->grf))
	> +        dev_err(&pdev->dev, "Missing rockchip,grf property\n");

	I wonder if you can fail on here and save all the check in
	set_rgmii_speed etc.
	Maybe this can be considered a mandatory property for the glue-logic.

5. remove .tx_coe=1
	> +const struct stmmac_of_data rk_gmac_data = {
	> +    .has_gmac = 1,
	> +    .tx_coe = 1,

	FYI, on new gmac there is the HW capability register that dynamically
	tells you whether coe is supported.

	IMO you should add the OF "compatible" string and in case of mac
	newer than the 3.50a you can remove coe.
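
As a rough illustration of items 1 and 2, the sketch below models how the
GRF_BIT()/GRF_CLR_BIT() values behave when written to a GRF register.  It
assumes the upper 16 bits of the written value act as a per-bit write-enable
mask for the lower 16 bits, which is what the ENABLE/DISABLE pairs above
suggest.  grf_write() is a hypothetical user-space model and not regmap code;
BIT() is redefined locally, and the macro arguments are parenthesized here.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr)		(UINT32_C(1) << (nr))

/* Set bit nr: data bit plus its write-enable bit in the upper half. */
#define GRF_BIT(nr)	(BIT(nr) | BIT((nr) + 16))
/* Clear bit nr: write-enable bit only, data bit left at 0. */
#define GRF_CLR_BIT(nr)	(BIT((nr) + 16))

/*
 * Hypothetical model of a write-masked 16-bit register: only the bits whose
 * write-enable bit is set in val[31:16] take their new value from val[15:0];
 * all other bits keep their old contents.
 */
static uint32_t grf_write(uint32_t reg, uint32_t val)
{
	uint32_t mask = val >> 16;
	uint32_t data = val & 0xFFFF;

	return (reg & ~mask & 0xFFFF) | (data & mask);
}
```

Under that assumption GRF_BIT(14) is 0x40004000 and GRF_CLR_BIT(14) is
0x40000000, matching the open-coded values from v1.  It also shows why the
two writes in item 2 can be merged: the fields use different bits, so their
masks can simply be ORed together into one value.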

Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
---
 drivers/net/ethernet/stmicro/stmmac/Makefile       |    2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c     |  636 ++++++++++++++++++++
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |    3 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.h  |    1 +
 4 files changed, 641 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index ac4d562..73c2715 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -6,7 +6,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o	\
 
 obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o
 stmmac-platform-objs:= stmmac_platform.o dwmac-meson.o dwmac-sunxi.o	\
-		       dwmac-sti.o dwmac-socfpga.o
+		       dwmac-sti.o dwmac-socfpga.o dwmac-rk.o
 
 obj-$(CONFIG_STMMAC_PCI) += stmmac-pci.o
 stmmac-pci-objs:= stmmac_pci.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
new file mode 100644
index 0000000..870563f
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -0,0 +1,636 @@
+/**
+ * dwmac-rk.c - Rockchip RK3288 DWMAC specific glue layer
+ *
+ * Copyright (C) 2014 Chen-Zhi (Roger Chen)
+ *
+ * Chen-Zhi (Roger Chen)  <roger.chen@rock-chips.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/stmmac.h>
+#include <linux/bitops.h>
+#include <linux/clk.h>
+#include <linux/phy.h>
+#include <linux/of_net.h>
+#include <linux/gpio.h>
+#include <linux/of_gpio.h>
+#include <linux/of_device.h>
+#include <linux/regulator/consumer.h>
+#include <linux/delay.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+struct rk_priv_data {
+	struct platform_device *pdev;
+	int phy_iface;
+	bool power_ctrl_by_pmu;
+	char pmu_regulator[32];
+	int pmu_enable_level;
+
+	int power_io;
+	int power_io_level;
+	int reset_io;
+	int reset_io_level;
+	int phyirq_io;
+	int phyirq_io_level;
+
+	bool clk_enabled;
+	bool clock_input;
+
+	struct clk *clk_mac;
+	struct clk *clk_mac_pll;
+	struct clk *gmac_clkin;
+	struct clk *mac_clk_rx;
+	struct clk *mac_clk_tx;
+	struct clk *clk_mac_ref;
+	struct clk *clk_mac_refout;
+	struct clk *aclk_mac;
+	struct clk *pclk_mac;
+
+	int tx_delay;
+	int rx_delay;
+
+	struct regmap *grf;
+};
+
+#define RK3288_GRF_SOC_CON1 0x0248
+#define RK3288_GRF_SOC_CON3 0x0250
+#define RK3288_GRF_GPIO3D_E 0x01ec
+#define RK3288_GRF_GPIO4A_E 0x01f0
+#define RK3288_GRF_GPIO4B_E 0x01f4
+
+#define GPIO3D_2MA	0xFFFF0000
+#define GPIO3D_4MA	0xFFFF5555
+#define GPIO3D_8MA	0xFFFFAAAA
+#define GPIO3D_12MA	0xFFFFFFFF
+
+#define GPIO4A_2MA	0xFFFF0000
+#define GPIO4A_4MA	0xFFFF5555
+#define GPIO4A_8MA	0xFFFFAAAA
+#define GPIO4A_12MA	0xFFFFFFFF
+
+#define GRF_BIT(nr)	(BIT(nr) | BIT(nr+16))
+#define GRF_CLR_BIT(nr)	(BIT(nr+16))
+
+#define GPIO4B_2_2MA	(GRF_CLR_BIT(2) | GRF_CLR_BIT(3))
+#define GPIO4B_2_4MA	(GRF_BIT(2) | GRF_CLR_BIT(3))
+#define GPIO4B_2_8MA	(GRF_CLR_BIT(2) | GRF_BIT(3))
+#define GPIO4B_2_12MA	(GRF_BIT(2) | GRF_BIT(3))
+
+/*RK3288_GRF_SOC_CON1*/
+#define GMAC_PHY_INTF_SEL_RGMII	(GRF_BIT(6) | GRF_CLR_BIT(7) | GRF_CLR_BIT(8))
+#define GMAC_PHY_INTF_SEL_RMII  (GRF_CLR_BIT(6) | GRF_CLR_BIT(7) | GRF_BIT(8))
+#define GMAC_FLOW_CTRL          GRF_BIT(9)
+#define GMAC_FLOW_CTRL_CLR      GRF_CLR_BIT(9)
+#define GMAC_SPEED_10M          GRF_CLR_BIT(10)
+#define GMAC_SPEED_100M         GRF_BIT(10)
+#define GMAC_RMII_CLK_25M       GRF_BIT(11)
+#define GMAC_RMII_CLK_2_5M      GRF_CLR_BIT(11)
+#define GMAC_CLK_125M           (GRF_CLR_BIT(12) | GRF_CLR_BIT(13))
+#define GMAC_CLK_25M            (GRF_BIT(12) | GRF_BIT(13))
+#define GMAC_CLK_2_5M           (GRF_CLR_BIT(12) | GRF_BIT(13))
+#define GMAC_RMII_MODE          GRF_BIT(14)
+#define GMAC_RMII_MODE_CLR      GRF_CLR_BIT(14)
+
+/*RK3288_GRF_SOC_CON3*/
+#define GMAC_TXCLK_DLY_ENABLE   GRF_BIT(14)
+#define GMAC_TXCLK_DLY_DISABLE  GRF_CLR_BIT(14)
+#define GMAC_RXCLK_DLY_ENABLE   GRF_BIT(15)
+#define GMAC_RXCLK_DLY_DISABLE  GRF_CLR_BIT(15)
+#define GMAC_CLK_RX_DL_CFG(val) ((0x7F<<7<<16) | (val<<7))
+#define GMAC_CLK_TX_DL_CFG(val) ((0x7F<<16) | (val))
+
+static void set_to_rgmii(struct rk_priv_data *bsp_priv,
+			 int tx_delay, int rx_delay)
+{
+	if (IS_ERR(bsp_priv->grf)) {
+		pr_err("%s: Missing rockchip,grf property\n", __func__);
+		return;
+	}
+
+	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+		     GMAC_PHY_INTF_SEL_RGMII | GMAC_RMII_MODE_CLR);
+	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON3,
+		     GMAC_RXCLK_DLY_ENABLE | GMAC_TXCLK_DLY_ENABLE |
+		     GMAC_CLK_RX_DL_CFG(rx_delay) |
+		     GMAC_CLK_TX_DL_CFG(tx_delay));
+	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO3D_E, GPIO3D_12MA);
+	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4A_E, GPIO4A_12MA);
+	regmap_write(bsp_priv->grf, RK3288_GRF_GPIO4B_E, GPIO4B_2_12MA);
+
+	pr_debug("%s: tx delay=0x%x; rx delay=0x%x;\n",
+		 __func__, tx_delay, rx_delay);
+}
+
+static void set_to_rmii(struct rk_priv_data *bsp_priv)
+{
+	if (IS_ERR(bsp_priv->grf)) {
+		pr_err("%s: Missing rockchip,grf property\n", __func__);
+		return;
+	}
+
+	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+		     GMAC_PHY_INTF_SEL_RMII);
+	regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+		     GMAC_RMII_MODE);
+}
+
+static void set_rgmii_speed(struct rk_priv_data *bsp_priv, int speed)
+{
+	if (IS_ERR(bsp_priv->grf)) {
+		pr_err("%s: Missing rockchip,grf property\n", __func__);
+		return;
+	}
+
+	if (speed == 10)
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1, GMAC_CLK_2_5M);
+	else if (speed == 100)
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1, GMAC_CLK_25M);
+	else if (speed == 1000)
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1, GMAC_CLK_125M);
+	else
+		pr_err("unknown speed value for RGMII! speed=%d\n", speed);
+}
+
+static void set_rmii_speed(struct rk_priv_data *bsp_priv, int speed)
+{
+	if (IS_ERR(bsp_priv->grf)) {
+		pr_err("%s: Missing rockchip,grf property\n", __func__);
+		return;
+	}
+
+	if (speed == 10) {
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+			     GMAC_RMII_CLK_2_5M);
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+			     GMAC_SPEED_10M);
+	} else if (speed == 100) {
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+			     GMAC_RMII_CLK_25M);
+		regmap_write(bsp_priv->grf, RK3288_GRF_SOC_CON1,
+			     GMAC_SPEED_100M);
+	} else {
+		pr_err("unknown speed value for RMII! speed=%d\n", speed);
+	}
+}
+
+#define MAC_CLK_RX	"mac_clk_rx"
+#define MAC_CLK_TX	"mac_clk_tx"
+#define CLK_MAC_REF	"clk_mac_ref"
+#define CLK_MAC_REF_OUT	"clk_mac_refout"
+#define CLK_MAC_PLL	"clk_mac_pll"
+#define ACLK_MAC	"aclk_mac"
+#define PCLK_MAC	"pclk_mac"
+#define MAC_CLKIN	"ext_gmac"
+#define CLK_MAC		"stmmaceth"
+
+static int gmac_clk_init(struct rk_priv_data *bsp_priv)
+{
+	struct device *dev = &bsp_priv->pdev->dev;
+
+	bsp_priv->clk_enabled = false;
+
+	bsp_priv->mac_clk_rx = clk_get(dev, MAC_CLK_RX);
+	if (IS_ERR(bsp_priv->mac_clk_rx))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, MAC_CLK_RX);
+
+	bsp_priv->mac_clk_tx = clk_get(dev, MAC_CLK_TX);
+	if (IS_ERR(bsp_priv->mac_clk_tx))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, MAC_CLK_TX);
+
+	bsp_priv->clk_mac_ref = clk_get(dev, CLK_MAC_REF);
+	if (IS_ERR(bsp_priv->clk_mac_ref))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, CLK_MAC_REF);
+
+	bsp_priv->clk_mac_refout = clk_get(dev, CLK_MAC_REF_OUT);
+	if (IS_ERR(bsp_priv->clk_mac_refout))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, CLK_MAC_REF_OUT);
+
+	bsp_priv->aclk_mac = clk_get(dev, ACLK_MAC);
+	if (IS_ERR(bsp_priv->aclk_mac))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, ACLK_MAC);
+
+	bsp_priv->pclk_mac = clk_get(dev, PCLK_MAC);
+	if (IS_ERR(bsp_priv->pclk_mac))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, PCLK_MAC);
+
+	bsp_priv->clk_mac_pll = clk_get(dev, CLK_MAC_PLL);
+	if (IS_ERR(bsp_priv->clk_mac_pll))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, CLK_MAC_PLL);
+
+	bsp_priv->gmac_clkin = clk_get(dev, MAC_CLKIN);
+	if (IS_ERR(bsp_priv->gmac_clkin))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, MAC_CLKIN);
+
+	bsp_priv->clk_mac = clk_get(dev, CLK_MAC);
+	if (IS_ERR(bsp_priv->clk_mac))
+		pr_warn("%s: warning: cannot get clock %s\n",
+			__func__, CLK_MAC);
+
+	if (bsp_priv->clock_input) {
+		pr_info("%s: clock input from PHY\n", __func__);
+	} else {
+		if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII)
+			clk_set_rate(bsp_priv->clk_mac_pll, 50000000);
+
+		clk_set_parent(bsp_priv->clk_mac, bsp_priv->clk_mac_pll);
+	}
+
+	return 0;
+}
+
+static int gmac_clk_enable(struct rk_priv_data *bsp_priv, bool enable)
+{
+	int phy_iface = bsp_priv->phy_iface;
+
+	if (enable) {
+		if (!bsp_priv->clk_enabled) {
+			if (phy_iface == PHY_INTERFACE_MODE_RMII) {
+				if (!IS_ERR(bsp_priv->mac_clk_rx))
+					clk_prepare_enable(
+						bsp_priv->mac_clk_rx);
+
+				if (!IS_ERR(bsp_priv->clk_mac_ref))
+					clk_prepare_enable(
+						bsp_priv->clk_mac_ref);
+
+				if (!IS_ERR(bsp_priv->clk_mac_refout))
+					clk_prepare_enable(
+						bsp_priv->clk_mac_refout);
+			}
+
+			if (!IS_ERR(bsp_priv->aclk_mac))
+				clk_prepare_enable(bsp_priv->aclk_mac);
+
+			if (!IS_ERR(bsp_priv->pclk_mac))
+				clk_prepare_enable(bsp_priv->pclk_mac);
+
+			if (!IS_ERR(bsp_priv->mac_clk_tx))
+				clk_prepare_enable(bsp_priv->mac_clk_tx);
+
+			/**
+			 * if (!IS_ERR(bsp_priv->clk_mac))
+			 *	clk_prepare_enable(bsp_priv->clk_mac);
+			 */
+			mdelay(5);
+			bsp_priv->clk_enabled = true;
+		}
+	} else {
+		if (bsp_priv->clk_enabled) {
+			if (phy_iface == PHY_INTERFACE_MODE_RMII) {
+				if (!IS_ERR(bsp_priv->mac_clk_rx))
+					clk_disable_unprepare(
+						bsp_priv->mac_clk_rx);
+
+				if (!IS_ERR(bsp_priv->clk_mac_ref))
+					clk_disable_unprepare(
+						bsp_priv->clk_mac_ref);
+
+				if (!IS_ERR(bsp_priv->clk_mac_refout))
+					clk_disable_unprepare(
+						bsp_priv->clk_mac_refout);
+			}
+
+			if (!IS_ERR(bsp_priv->aclk_mac))
+				clk_disable_unprepare(bsp_priv->aclk_mac);
+
+			if (!IS_ERR(bsp_priv->pclk_mac))
+				clk_disable_unprepare(bsp_priv->pclk_mac);
+
+			if (!IS_ERR(bsp_priv->mac_clk_tx))
+				clk_disable_unprepare(bsp_priv->mac_clk_tx);
+			/**
+			 * if (!IS_ERR(bsp_priv->clk_mac))
+			 *	clk_disable_unprepare(bsp_priv->clk_mac);
+			 */
+			bsp_priv->clk_enabled = false;
+		}
+	}
+
+	return 0;
+}
+
+static int power_on_by_pmu(struct rk_priv_data *bsp_priv, bool enable)
+{
+	struct regulator *ldo;
+	char *ldostr = bsp_priv->pmu_regulator;
+	int ret;
+
+	if (!ldostr) {
+		pr_err("%s: no ldo found\n", __func__);
+		return -1;
+	}
+
+	ldo = regulator_get(NULL, ldostr);
+	if (IS_ERR(ldo)) {
+		pr_err("\n%s get ldo %s failed\n", __func__, ldostr);
+	} else {
+		if (enable) {
+			if (!regulator_is_enabled(ldo)) {
+				regulator_set_voltage(ldo, 3300000, 3300000);
+				ret = regulator_enable(ldo);
+				if (ret != 0)
+					pr_err("%s: fail to enable %s\n",
+					       __func__, ldostr);
+				else
+					pr_info("turn on ldo done.\n");
+			} else {
+				pr_warn("%s is enabled before enable", ldostr);
+			}
+		} else {
+			if (regulator_is_enabled(ldo)) {
+				ret = regulator_disable(ldo);
+				if (ret != 0)
+					pr_err("%s: fail to disable %s\n",
+					       __func__, ldostr);
+				else
+					pr_info("turn off ldo done.\n");
+			} else {
+				pr_warn("%s is disabled before disable",
+					ldostr);
+			}
+		}
+		regulator_put(ldo);
+	}
+
+	return 0;
+}
+
+static int power_on_by_gpio(struct rk_priv_data *bsp_priv, bool enable)
+{
+	if (enable) {
+		/*power on*/
+		if (gpio_is_valid(bsp_priv->power_io))
+			gpio_direction_output(bsp_priv->power_io,
+					      bsp_priv->power_io_level);
+	} else {
+		/*power off*/
+		if (gpio_is_valid(bsp_priv->power_io))
+			gpio_direction_output(bsp_priv->power_io,
+					      !bsp_priv->power_io_level);
+	}
+
+	return 0;
+}
+
+static int phy_power_on(struct rk_priv_data *bsp_priv, bool enable)
+{
+	int ret = -1;
+
+	pr_info("Ethernet PHY power %s\n", enable == 1 ? "on" : "off");
+
+	if (bsp_priv->power_ctrl_by_pmu)
+		ret = power_on_by_pmu(bsp_priv, enable);
+	else
+		ret =  power_on_by_gpio(bsp_priv, enable);
+
+	if (enable) {
+		/*reset*/
+		if (gpio_is_valid(bsp_priv->reset_io)) {
+			gpio_direction_output(bsp_priv->reset_io,
+					      bsp_priv->reset_io_level);
+			mdelay(5);
+			gpio_direction_output(bsp_priv->reset_io,
+					      !bsp_priv->reset_io_level);
+		}
+		mdelay(30);
+
+	} else {
+		/*pull down reset*/
+		if (gpio_is_valid(bsp_priv->reset_io)) {
+			gpio_direction_output(bsp_priv->reset_io,
+					      bsp_priv->reset_io_level);
+		}
+	}
+
+	return ret;
+}
+
+#define GPIO_PHY_POWER	"gmac_phy_power"
+#define GPIO_PHY_RESET	"gmac_phy_reset"
+#define GPIO_PHY_IRQ	"gmac_phy_irq"
+
+static void *rk_gmac_setup(struct platform_device *pdev)
+{
+	struct rk_priv_data *bsp_priv;
+	struct device *dev = &pdev->dev;
+	enum of_gpio_flags flags;
+	int ret;
+	const char *strings = NULL;
+	int value;
+	int irq;
+
+	bsp_priv = devm_kzalloc(dev, sizeof(*bsp_priv), GFP_KERNEL);
+	if (!bsp_priv)
+		return ERR_PTR(-ENOMEM);
+
+	bsp_priv->phy_iface = of_get_phy_mode(dev->of_node);
+
+	ret = of_property_read_string(dev->of_node, "pmu_regulator", &strings);
+	if (ret) {
+		pr_err("%s: Can not read property: pmu_regulator.\n", __func__);
+		bsp_priv->power_ctrl_by_pmu = false;
+	} else {
+		pr_info("%s: ethernet phy power controlled by pmu(%s).\n",
+			__func__, strings);
+		bsp_priv->power_ctrl_by_pmu = true;
+		strcpy(bsp_priv->pmu_regulator, strings);
+	}
+
+	ret = of_property_read_u32(dev->of_node, "pmu_enable_level", &value);
+	if (ret) {
+		pr_err("%s: Can not read property: pmu_enable_level.\n",
+		       __func__);
+		bsp_priv->power_ctrl_by_pmu = false;
+	} else {
+		pr_info("%s: PHY power controlled by pmu(level = %s).\n",
+			__func__, (value == 1) ? "HIGH" : "LOW");
+		bsp_priv->power_ctrl_by_pmu = true;
+		bsp_priv->pmu_enable_level = value;
+	}
+
+	ret = of_property_read_string(dev->of_node, "clock_in_out", &strings);
+	if (ret) {
+		pr_err("%s: Can not read property: clock_in_out.\n", __func__);
+		bsp_priv->clock_input = true;
+	} else {
+		pr_info("%s: clock input or output? (%s).\n",
+			__func__, strings);
+		if (!strcmp(strings, "input"))
+			bsp_priv->clock_input = true;
+		else
+			bsp_priv->clock_input = false;
+	}
+
+	ret = of_property_read_u32(dev->of_node, "tx_delay", &value);
+	if (ret) {
+		bsp_priv->tx_delay = 0x30;
+		pr_err("%s: Can not read property: tx_delay.", __func__);
+		pr_err("%s: set tx_delay to 0x%x\n",
+		       __func__, bsp_priv->tx_delay);
+	} else {
+		pr_info("%s: TX delay(0x%x).\n", __func__, value);
+		bsp_priv->tx_delay = value;
+	}
+
+	ret = of_property_read_u32(dev->of_node, "rx_delay", &value);
+	if (ret) {
+		bsp_priv->rx_delay = 0x10;
+		pr_err("%s: Can not read property: rx_delay.", __func__);
+		pr_err("%s: set rx_delay to 0x%x\n",
+		       __func__, bsp_priv->rx_delay);
+	} else {
+		pr_info("%s: RX delay(0x%x).\n", __func__, value);
+		bsp_priv->rx_delay = value;
+	}
+
+	bsp_priv->grf = syscon_regmap_lookup_by_phandle(dev->of_node,
+							"rockchip,grf");
+	bsp_priv->phyirq_io =
+		of_get_named_gpio_flags(dev->of_node,
+					"phyirq-gpio", 0, &flags);
+	bsp_priv->phyirq_io_level = (flags & OF_GPIO_ACTIVE_LOW) ? 0 : 1;
+
+	bsp_priv->reset_io =
+		of_get_named_gpio_flags(dev->of_node,
+					"reset-gpio", 0, &flags);
+	bsp_priv->reset_io_level = (flags & OF_GPIO_ACTIVE_LOW) ? 0 : 1;
+
+	bsp_priv->power_io =
+		of_get_named_gpio_flags(dev->of_node, "power-gpio", 0, &flags);
+	bsp_priv->power_io_level = (flags & OF_GPIO_ACTIVE_LOW) ? 0 : 1;
+
+	/*power*/
+	if (!gpio_is_valid(bsp_priv->power_io)) {
+		pr_err("%s: Failed to get GPIO %s.\n",
+		       __func__, "power-gpio");
+	} else {
+		ret = gpio_request(bsp_priv->power_io, GPIO_PHY_POWER);
+		if (ret)
+			pr_err("%s: ERROR: Failed to request GPIO %s.\n",
+			       __func__, GPIO_PHY_POWER);
+	}
+
+	if (!gpio_is_valid(bsp_priv->reset_io)) {
+		pr_err("%s: ERROR: Get reset-gpio failed.\n", __func__);
+	} else {
+		ret = gpio_request(bsp_priv->reset_io, GPIO_PHY_RESET);
+		if (ret)
+			pr_err("%s: ERROR: Failed to request GPIO %s.\n",
+			       __func__, GPIO_PHY_RESET);
+	}
+
+	if (bsp_priv->phyirq_io > 0) {
+		struct plat_stmmacenet_data *plat_dat;
+
+		pr_info("PHY irq in use\n");
+		ret = gpio_request(bsp_priv->phyirq_io, GPIO_PHY_IRQ);
+		if (ret < 0) {
+			pr_warn("%s: Failed to request GPIO %s\n",
+				__func__, GPIO_PHY_IRQ);
+			goto goon;
+		}
+
+		ret = gpio_direction_input(bsp_priv->phyirq_io);
+		if (ret < 0) {
+			pr_err("%s, Failed to set input for GPIO %s\n",
+			       __func__, GPIO_PHY_IRQ);
+			gpio_free(bsp_priv->phyirq_io);
+			goto goon;
+		}
+
+		irq = gpio_to_irq(bsp_priv->phyirq_io);
+		if (irq < 0) {
+			ret = irq;
+			pr_err("Failed to set irq for %s\n",
+			       GPIO_PHY_IRQ);
+			gpio_free(bsp_priv->phyirq_io);
+			goto goon;
+		}
+
+		plat_dat = dev_get_platdata(&pdev->dev);
+		if (plat_dat)
+			plat_dat->mdio_bus_data->probed_phy_irq = irq;
+		else
+			pr_err("%s: plat_data is NULL\n", __func__);
+	}
+
+goon:
+	/*rmii or rgmii*/
+	if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII) {
+		pr_info("%s: init for RGMII\n", __func__);
+		set_to_rgmii(bsp_priv, bsp_priv->tx_delay, bsp_priv->rx_delay);
+	} else if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII) {
+		pr_info("%s: init for RMII\n", __func__);
+		set_to_rmii(bsp_priv);
+	} else {
+		pr_err("%s: ERROR: NO interface defined!\n", __func__);
+	}
+
+	bsp_priv->pdev = pdev;
+
+	gmac_clk_init(bsp_priv);
+
+	return bsp_priv;
+}
+
+static int rk_gmac_init(struct platform_device *pdev, void *priv)
+{
+	struct rk_priv_data *bsp_priv = priv;
+	int ret;
+
+	ret = phy_power_on(bsp_priv, true);
+	if (ret)
+		return ret;
+
+	ret = gmac_clk_enable(bsp_priv, true);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static void rk_gmac_exit(struct platform_device *pdev, void *priv)
+{
+	struct rk_priv_data *gmac = priv;
+
+	phy_power_on(gmac, false);
+	gmac_clk_enable(gmac, false);
+}
+
+static void rk_fix_speed(void *priv, unsigned int speed)
+{
+	struct rk_priv_data *bsp_priv = priv;
+
+	if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII)
+		set_rgmii_speed(bsp_priv, speed);
+	else if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RMII)
+		set_rmii_speed(bsp_priv, speed);
+	else
+		pr_err("unsupported interface %d\n", bsp_priv->phy_iface);
+}
+
+const struct stmmac_of_data rk_gmac_data = {
+	.has_gmac = 1,
+	.fix_mac_speed = rk_fix_speed,
+	.setup = rk_gmac_setup,
+	.init = rk_gmac_init,
+	.exit = rk_gmac_exit,
+};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 15814b7..b4dee96 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -33,6 +33,7 @@
 
 static const struct of_device_id stmmac_dt_ids[] = {
 	/* SoC specific glue layers should come before generic bindings */
+	{ .compatible = "rockchip,rk3288-gmac", .data = &rk_gmac_data},
 	{ .compatible = "amlogic,meson6-dwmac", .data = &meson6_dwmac_data},
 	{ .compatible = "allwinner,sun7i-a20-gmac", .data = &sun7i_gmac_data},
 	{ .compatible = "st,stih415-dwmac", .data = &stih4xx_dwmac_data},
@@ -291,6 +292,8 @@ static int stmmac_pltfr_probe(struct platform_device *pdev)
 			return  -ENOMEM;
 		}
 
+		pdev->dev.platform_data = plat_dat;
+
 		ret = stmmac_probe_config_dt(pdev, plat_dat, &mac);
 		if (ret) {
 			pr_err("%s: main dt probe failed", __func__);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
index 25dd1f7..32a0516 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.h
@@ -24,5 +24,6 @@ extern const struct stmmac_of_data sun7i_gmac_data;
 extern const struct stmmac_of_data stih4xx_dwmac_data;
 extern const struct stmmac_of_data stid127_dwmac_data;
 extern const struct stmmac_of_data socfpga_gmac_data;
+extern const struct stmmac_of_data rk_gmac_data;
 
 #endif /* __STMMAC_PLATFORM_H__ */
-- 
1.7.9.5

^ permalink raw reply related
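
The GRF_BIT()/GRF_CLR_BIT() macros in the patch above pair each low bit with
the bit 16 positions higher. A minimal user-space sketch of that scheme
follows, assuming (as the macros imply) that the upper 16 bits of a GRF write
act as a per-bit write-enable mask for the lower 16 bits. grf_write() and the
two delay helpers are hypothetical names used only for illustration; they are
not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(nr)          (1u << (nr))
#define GRF_BIT(nr)      (BIT(nr) | BIT((nr) + 16))  /* enable write, set bit */
#define GRF_CLR_BIT(nr)  (BIT((nr) + 16))            /* enable write, clear bit */

/*
 * Hypothetical model of one GRF register write (assumption: bits 31:16 of
 * the written value select which of bits 15:0 are updated). Returns the new
 * 16-bit register content.
 */
static uint32_t grf_write(uint32_t old, uint32_t val)
{
	uint32_t mask = val >> 16;

	return ((old & ~mask) | (val & mask)) & 0xFFFFu;
}

/* Pack a 7-bit TX delay into bits 6:0, with the matching write mask. */
static uint32_t gmac_tx_delay_cfg(uint32_t delay)
{
	assert(delay <= 0x7F);
	return (0x7Fu << 16) | delay;
}

/* Pack a 7-bit RX delay into bits 13:7, with the matching write mask. */
static uint32_t gmac_rx_delay_cfg(uint32_t delay)
{
	assert(delay <= 0x7F);
	return (0x7Fu << 7 << 16) | (delay << 7);
}
```

Under this model a single write can touch selected bits without a
read-modify-write cycle, which is why the driver can issue several
regmap_write() calls to the same register back to back.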

* [PATCH v2 2/6] GMAC: define clock ID used for GMAC
From: Roger Chen @ 2014-11-27  2:52 UTC (permalink / raw)
  To: heiko
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
---
 include/dt-bindings/clock/rk3288-cru.h |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/dt-bindings/clock/rk3288-cru.h b/include/dt-bindings/clock/rk3288-cru.h
index 100a08c..f9496f5 100644
--- a/include/dt-bindings/clock/rk3288-cru.h
+++ b/include/dt-bindings/clock/rk3288-cru.h
@@ -72,6 +72,10 @@
 #define SCLK_HEVC_CABAC		111
 #define SCLK_HEVC_CORE		112
 
+#define SCLK_MAC_PLL		150
+#define SCLK_MAC		151
+#define SCLK_MACREF_OUT		152
+
 #define DCLK_VOP0		190
 #define DCLK_VOP1		191
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v2 3/6] GMAC: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
From: Roger Chen @ 2014-11-27  2:52 UTC (permalink / raw)
  To: heiko
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

modify CRU config for GMAC driver

Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
---
 drivers/clk/rockchip/clk-rk3288.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/rockchip/clk-rk3288.c b/drivers/clk/rockchip/clk-rk3288.c
index 2327829..60237dc 100644
--- a/drivers/clk/rockchip/clk-rk3288.c
+++ b/drivers/clk/rockchip/clk-rk3288.c
@@ -187,7 +187,7 @@ PNAME(mux_uart2_p)	= { "uart2_src", "uart2_frac", "xin24m" };
 PNAME(mux_uart3_p)	= { "uart3_src", "uart3_frac", "xin24m" };
 PNAME(mux_uart4_p)	= { "uart4_src", "uart4_frac", "xin24m" };
 PNAME(mux_cif_out_p)	= { "cif_src", "xin24m" };
-PNAME(mux_macref_p)	= { "mac_src", "ext_gmac" };
+PNAME(mux_mac_p)	= { "mac_pll_src", "ext_gmac" };
 PNAME(mux_hsadcout_p)	= { "hsadc_src", "ext_hsadc" };
 PNAME(mux_edp_24m_p)	= { "ext_edp_24m", "xin24m" };
 PNAME(mux_tspout_p)	= { "cpll", "gpll", "npll", "xin27m" };
@@ -560,18 +560,18 @@ static struct rockchip_clk_branch rk3288_clk_branches[] __initdata = {
 	MUX(SCLK_UART4, "sclk_uart4", mux_uart4_p, 0,
 			RK3288_CLKSEL_CON(3), 8, 2, MFLAGS),
 
-	COMPOSITE(0, "mac_src", mux_pll_src_npll_cpll_gpll_p, 0,
+	COMPOSITE(SCLK_MAC_PLL, "mac_pll_src", mux_pll_src_npll_cpll_gpll_p, 0,
 			RK3288_CLKSEL_CON(21), 0, 2, MFLAGS, 8, 5, DFLAGS,
 			RK3288_CLKGATE_CON(2), 5, GFLAGS),
-	MUX(0, "macref", mux_macref_p, 0,
+	MUX(SCLK_MAC, "mac_clk", mux_mac_p, 0,
 			RK3288_CLKSEL_CON(21), 4, 1, MFLAGS),
-	GATE(0, "sclk_macref_out", "macref", 0,
+	GATE(SCLK_MACREF_OUT, "sclk_macref_out", "mac_clk", 0,
 			RK3288_CLKGATE_CON(5), 3, GFLAGS),
-	GATE(SCLK_MACREF, "sclk_macref", "macref", 0,
+	GATE(SCLK_MACREF, "sclk_macref", "mac_clk", 0,
 			RK3288_CLKGATE_CON(5), 2, GFLAGS),
-	GATE(SCLK_MAC_RX, "sclk_mac_rx", "macref", 0,
+	GATE(SCLK_MAC_RX, "sclk_mac_rx", "mac_clk", 0,
 			RK3288_CLKGATE_CON(5), 0, GFLAGS),
-	GATE(SCLK_MAC_TX, "sclk_mac_tx", "macref", 0,
+	GATE(SCLK_MAC_TX, "sclk_mac_tx", "mac_clk", 0,
 			RK3288_CLKGATE_CON(5), 1, GFLAGS),
 
 	COMPOSITE(0, "hsadc_src", mux_pll_src_cpll_gpll_p, 0,
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v2 4/6] ARM: dts: rockchip: add gmac info for rk3288
From: Roger Chen @ 2014-11-27  2:53 UTC (permalink / raw)
  To: heiko
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

add gmac info in rk3288.dtsi for GMAC driver

Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
---
 arch/arm/boot/dts/rk3288.dtsi |   49 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
index 0f50d5d..22f7019 100644
--- a/arch/arm/boot/dts/rk3288.dtsi
+++ b/arch/arm/boot/dts/rk3288.dtsi
@@ -358,6 +358,22 @@
 		status = "disabled";
 	};
 
+	gmac: ethernet@ff290000 {
+		compatible = "rockchip,rk3288-gmac";
+		reg = <0xff290000 0x10000>;
+		interrupts = <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
+		interrupt-names = "macirq";
+		rockchip,grf = <&grf>;
+		clocks = <&cru SCLK_MAC>, <&cru SCLK_MAC_PLL>,
+			<&cru SCLK_MAC_RX>, <&cru SCLK_MAC_TX>,
+			<&cru SCLK_MACREF>, <&cru SCLK_MACREF_OUT>,
+			<&cru ACLK_GMAC>, <&cru PCLK_GMAC>;
+		clock-names = "stmmaceth", "clk_mac_pll",
+			"mac_clk_rx", "mac_clk_tx",
+			"clk_mac_ref", "clk_mac_refout",
+			"aclk_mac", "pclk_mac";
+	};
+
 	usb_host0_ehci: usb@ff500000 {
 		compatible = "generic-ehci";
 		reg = <0xff500000 0x100>;
@@ -1040,5 +1056,38 @@
 				rockchip,pins = <7 23 3 &pcfg_pull_none>;
 			};
 		};
+
+		gmac {
+			rgmii_pins: rgmii-pins {
+				rockchip,pins = <3 30 3 &pcfg_pull_none>,
+						<3 31 3 &pcfg_pull_none>,
+						<3 26 3 &pcfg_pull_none>,
+						<3 27 3 &pcfg_pull_none>,
+						<3 28 3 &pcfg_pull_none>,
+						<3 29 3 &pcfg_pull_none>,
+						<3 24 3 &pcfg_pull_none>,
+						<3 25 3 &pcfg_pull_none>,
+						<4 0 3 &pcfg_pull_none>,
+						<4 5 3 &pcfg_pull_none>,
+						<4 6 3 &pcfg_pull_none>,
+						<4 9 3 &pcfg_pull_none>,
+						<4 4 3 &pcfg_pull_none>,
+						<4 1 3 &pcfg_pull_none>,
+						<4 3 3 &pcfg_pull_none>;
+			};
+
+			rmii_pins: rmii-pins {
+				rockchip,pins = <3 30 3 &pcfg_pull_none>,
+						<3 31 3 &pcfg_pull_none>,
+						<3 28 3 &pcfg_pull_none>,
+						<3 29 3 &pcfg_pull_none>,
+						<4 0 3 &pcfg_pull_none>,
+						<4 5 3 &pcfg_pull_none>,
+						<4 4 3 &pcfg_pull_none>,
+						<4 1 3 &pcfg_pull_none>,
+						<4 2 3 &pcfg_pull_none>,
+						<4 3 3 &pcfg_pull_none>;
+			};
+		};
 	};
 };
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v2 5/6] ARM: dts: rockchip: enable gmac on RK3288 evb board
From: Roger Chen @ 2014-11-27  2:53 UTC (permalink / raw)
  To: heiko
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

enable gmac in rk3288-evb-rk808.dts

Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
---
 arch/arm/boot/dts/rk3288-evb-rk808.dts |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/arm/boot/dts/rk3288-evb-rk808.dts b/arch/arm/boot/dts/rk3288-evb-rk808.dts
index d8c775e6..13ad2d2 100644
--- a/arch/arm/boot/dts/rk3288-evb-rk808.dts
+++ b/arch/arm/boot/dts/rk3288-evb-rk808.dts
@@ -15,6 +15,13 @@
 
 / {
 	compatible = "rockchip,rk3288-evb-rk808", "rockchip,rk3288";
+
+	ext_gmac: external-gmac-clock {
+		compatible = "fixed-clock";
+		clock-frequency = <125000000>;
+		clock-output-names = "ext_gmac";
+		#clock-cells = <0>;
+	};
 };
 
 &cpu0 {
@@ -152,3 +159,20 @@
 		};
 	};
 };
+
+&gmac {
+	//pmu_regulator = "act_ldo5";
+	//pmu_enable_level = <1>; //1->HIGH, 0->LOW
+	power-gpio = <&gpio0 6 GPIO_ACTIVE_HIGH>;
+	reset-gpio = <&gpio4 7 GPIO_ACTIVE_LOW>;
+	//phyirq-gpio = <&gpio4 2 GPIO_ACTIVE_LOW>;
+	phy-mode = "rgmii";
+	clock_in_out = "input";
+	assigned-clocks = <&cru SCLK_MAC>;
+	assigned-clock-parents = <&ext_gmac>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&rgmii_pins>;
+	tx_delay = <0x30>;
+	rx_delay = <0x10>;
+	status = "ok";
+};
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH v2 6/6] GMAC: add document for Rockchip RK3288 GMAC
From: Roger Chen @ 2014-11-27  2:53 UTC (permalink / raw)
  To: heiko
  Cc: peppe.cavallaro, netdev, linux-kernel, linux-rockchip, kever.yang,
	eddie.cai, roger.chen
In-Reply-To: <1417056591-3570-1-git-send-email-roger.chen@rock-chips.com>

This document describes how to add properties for the GMAC in the device tree.

Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
---
 .../devicetree/bindings/net/rockchip-dwmac.txt     |   71 ++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/rockchip-dwmac.txt

diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
new file mode 100644
index 0000000..237442b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
@@ -0,0 +1,71 @@
+Rockchip SoC RK3288 10/100/1000 Ethernet driver(GMAC)
+
+The device node has the following properties.
+
+Required properties:
+ - compatible: Can be "rockchip,rk3288-gmac".
+ - reg: addresses and length of the register sets for the device.
+ - interrupts: Should contain the GMAC interrupts.
+ - interrupt-names: Should contain the interrupt names "macirq".
+ - rockchip,grf: phandle to the syscon grf used to control speed and mode.
+ - clocks: <&cru SCLK_MAC>: clock selector for main clock, from PLL or PHY.
+	   <&cru SCLK_MAC_PLL>: PLL clock for SCLK_MAC
+	   <&cru SCLK_MAC_RX>: clock gate for RX
+	   <&cru SCLK_MAC_TX>: clock gate for TX
+	   <&cru SCLK_MACREF>: clock gate for RMII reference clock
+	   <&cru SCLK_MACREF_OUT>: clock gate for RMII reference clock output
+	   <&cru ACLK_GMAC>: AXI clock gate for GMAC
+	   <&cru PCLK_GMAC>: APB clock gate for GMAC
+ - clock-names: One name for each entry in the clocks property.
+ - phy-mode: See ethernet.txt file in the same directory.
+ - pinctrl-names: Names corresponding to the numbered pinctrl states.
+ - pinctrl-0: pin-control mode. Can be <&rgmii_pins> or <&rmii_pins>.
+ - clock_in_out: For RGMII, it must be "input", which means the main clock
+   (125MHz) is not sourced from the SoC's PLL but is input from the PHY. For
+   RMII, "input" means the PHY provides the reference clock (50MHz) and
+   "output" means the GMAC provides the reference clock.
+ - assigned-clocks: main clock, should be <&cru SCLK_MAC>.
+ - assigned-clock-parents: parent of the main clock.
+   Can be <&ext_gmac> or <&cru SCLK_MAC_PLL>.
+ - reset-gpio: GPIO for reset
+
+Optional properties:
+ - tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default.
+ - rx_delay: Delay value for RXD timing. Range value is 0~0x7F, 0x10 as default.
+ - pmu_regulator: PMIC's integrated LDO power for PHY. Can be "act_ldo5".
+ - pmu_enable_level: Enable level of LDO. Can be <1> or <0>. 1->HIGH, 0->LOW.
+ - power-gpio: GPIO used to control PHY power. Normally,
+   power-gpio and pmu_regulator can not be used at the same time.
+ - phyirq-gpio: GPIO used as PHY irq.
+
+Example:
+
+gmac: ethernet@ff290000 {
+	compatible = "rockchip,rk3288-gmac";
+	reg = <0xff290000 0x10000>;
+	interrupts = <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
+	interrupt-names = "macirq";
+	rockchip,grf = <&grf>;
+	clocks = <&cru SCLK_MAC>, <&cru SCLK_MAC_PLL>,
+		<&cru SCLK_MAC_RX>, <&cru SCLK_MAC_TX>,
+		<&cru SCLK_MACREF>, <&cru SCLK_MACREF_OUT>,
+		<&cru ACLK_GMAC>, <&cru PCLK_GMAC>;
+	clock-names = "stmmaceth", "clk_mac_pll",
+		"mac_clk_rx", "mac_clk_tx",
+		"clk_mac_ref", "clk_mac_refout",
+		"aclk_mac", "pclk_mac";
+	phy-mode = "rgmii";
+	pinctrl-names = "default";
+	pinctrl-0 = <&rgmii_pins /*&rmii_pins*/>;
+
+	clock_in_out = "input";
+	assigned-clocks = <&cru SCLK_MAC>;
+	assigned-clock-parents = <&ext_gmac>;
+	tx_delay = <0x30>;
+	rx_delay = <0x10>;
+
+	power-gpio = <&gpio0 6 GPIO_ACTIVE_HIGH>;
+	reset-gpio = <&gpio4 7 GPIO_ACTIVE_LOW>;
+
+	status = "ok";
+};
-- 
1.7.9.5

^ permalink raw reply related

* Re: [patch net-next v3 04/17] net: introduce generic switch devices support
From: Simon Horman @ 2014-11-27  3:13 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Thomas Graf, Jiri Pirko, netdev, davem, nhorman, andy, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl
In-Reply-To: <54754A10.6020601@mojatatu.com>

On Tue, Nov 25, 2014 at 10:33:36PM -0500, Jamal Hadi Salim wrote:
> On 11/25/14 16:54, Thomas Graf wrote:
> >On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
> 
> >It would definitely help if you could expose some more details on the
> >"some network processor" you have. We're all very eager ;-)
> >
> 
> Well, this thing doesnt run ovs ;-> (/me runs). If you come
> to netdev i may let you play with it ;-> Its a humongous device
> (think multi 100G ports).
> 
> On a serious note: Even if you took what Simon/Netronome has
> (yes, I know they use ovs;->)

FWIW, we are also interested in non-OVS use cases.

> - there is really no need for a switch
> abstraction *at all* if all you want to is hang a packet
> processing graph that ingresses at a port and egress at another port.
> As you know, Linux supports it just fine with tc.

I may be missing the point but I see two problems that are solved by
the switch abstraction.

- Cases where no ports are configured.

  Perhaps no such use cases exist for the API in question.
  But it does seem plausible to me that non-physical ports could
  be added at run-time and that thus a "switch" could initially
  exist with no configured port. Something like how bridges
  initially have no ports (IIRC).

- Discovering the association between ports and "switches".

My recollection from the double round table discussion on the last day of
the Düsseldorf sessions was that these were reasons why simply accessing
any port belonging to the "switch" was not entirely satisfactory.

> >I'm with Jiri but I agree it's not a perfect fit. I doubt there is but
> >if you can come up with something that fits better I'm open to it.
> >
> >I considered "dataplane" or "dp" for a bit but it's quite generic as
> >well.
> >
> 
> The purpose is to offload. I think any name would be better than
> mapping it to a specific abstraction called "switch". Especially
> if it is hanging off a port and there is no switch in the pipeline.
> 
> cheers,
> jamal

^ permalink raw reply

* [PATCH] stmmac: platform: Move plat_dat checking earlier
From: Huacai Chen @ 2014-11-27  3:14 UTC (permalink / raw)
  To: Giuseppe Cavallaro; +Cc: Vince Bridgers, David S. Miller, netdev, Huacai Chen

The original code only checks/allocates plat_dat in the CONFIG_OF case. This
patch checks/allocates it earlier and unconditionally to avoid kernel build
warnings:

drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:275
stmmac_pltfr_probe() warn: variable dereferenced before check 'plat_dat'

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 5b0da39..d254950 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -265,6 +265,15 @@ static int stmmac_pltfr_probe(struct platform_device *pdev)
 
 	plat_dat = dev_get_platdata(&pdev->dev);
 
+	if (!plat_dat)
+		plat_dat = devm_kzalloc(&pdev->dev,
+				sizeof(struct plat_stmmacenet_data),
+				GFP_KERNEL);
+	if (!plat_dat) {
+		pr_err("%s: ERROR: no memory", __func__);
+		return  -ENOMEM;
+	}
+
 	/* Set default value for multicast hash bins */
 	plat_dat->multicast_filter_bins = HASH_TABLE_SIZE;
 
@@ -272,15 +281,6 @@ static int stmmac_pltfr_probe(struct platform_device *pdev)
 	plat_dat->unicast_filter_entries = 1;
 
 	if (pdev->dev.of_node) {
-		if (!plat_dat)
-			plat_dat = devm_kzalloc(&pdev->dev,
-					sizeof(struct plat_stmmacenet_data),
-					GFP_KERNEL);
-		if (!plat_dat) {
-			pr_err("%s: ERROR: no memory", __func__);
-			return  -ENOMEM;
-		}
-
 		ret = stmmac_probe_config_dt(pdev, plat_dat, &mac);
 		if (ret) {
 			pr_err("%s: main dt probe failed", __func__);
-- 
1.7.7.3

^ permalink raw reply related
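
The ordering that the patch above establishes (obtain the pointer, allocate
if it is missing, bail out on failure, and only then dereference) can be
sketched in user space as below. This is an illustration only: struct
plat_data, probe_defaults() and the value 64 are hypothetical stand-ins, and
calloc() takes the place of devm_kzalloc().

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct plat_stmmacenet_data. */
struct plat_data {
	int multicast_filter_bins;
	int unicast_filter_entries;
};

/*
 * Allocate *pdata if the caller did not supply one, then fill in defaults.
 * Returns 0 on success, -1 if the allocation fails. The pointer is never
 * dereferenced before it has been checked.
 */
static int probe_defaults(struct plat_data **pdata)
{
	if (!*pdata)
		*pdata = calloc(1, sizeof(**pdata));
	if (!*pdata)
		return -1;

	(*pdata)->multicast_filter_bins = 64;	/* placeholder default */
	(*pdata)->unicast_filter_entries = 1;
	return 0;
}
```

A caller that already has platform data keeps its own structure, while a
caller that passes a NULL pointer receives a freshly zeroed one; either way
the default assignments run on a valid pointer.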

* Re: [PATCH] xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
From: Seth Forshee @ 2014-11-27  3:53 UTC (permalink / raw)
  To: David Miller
  Cc: konrad.wilk, boris.ostrovsky, david.vrabel, zoltan.kiss,
	eric.dumazet, stefan.bader, xen-devel, netdev, linux-kernel
In-Reply-To: <20141126.122812.223757363894961994.davem@davemloft.net>

On Wed, Nov 26, 2014 at 12:28:12PM -0500, David Miller wrote:
> From: Seth Forshee <seth.forshee@canonical.com>
> Date: Tue, 25 Nov 2014 20:28:24 -0600
> 
> > These BUGs can be erroneously triggered by frags which refer to
> > tail pages within a compound page. The data in these pages may
> > overrun the hardware page while still being contained within the
> > compound page, but since compound_order() evaluates to 0 for tail
> > pages the assertion fails. The code already iterates through
> > subsequent pages correctly in this scenario, so the BUGs are
> > unnecessary and can be removed.
> > 
> > Fixes: f36c374782e4 ("xen/netfront: handle compound page fragments on transmit")
> > Cc: <stable@vger.kernel.org> # 3.7+
> > Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
> 
> Can I get some Xen developer reviews?

Fwiw this issue was discussed previously and this was the recommended
fix.

 http://article.gmane.org/gmane.linux.kernel/1825381

Since then I got some feedback from a tester that he didn't see any
problems with the BUGs removed (actually replaced with a WARN so I know
that he actually saw the condition which triggered the BUG).

Thanks,
Seth

^ permalink raw reply

* Re: Multiple DSA switch on shared MII
From: Rajib Karmakar @ 2014-11-27  4:47 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev
In-Reply-To: <547636B7.3060706@gmail.com>

On Thu, Nov 27, 2014 at 1:53 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 25/11/14 23:33, Rajib Karmakar wrote:
>> Hello Florian,
>>
>> Thanks for your reply.
>>
>> On Wed, Nov 26, 2014 at 10:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> 2014-11-25 20:52 GMT-08:00 Rajib Karmakar <rajibkit@gmail.com>:
>>>> Hello,
>>>>
>>>> I am developing a DSA driver for Marvell 6172 and need to create 2 dsa
>>>> switch chips (one for WAN and one for LAN).
>>>
>>> This is not typically how it is designed to work, you would register
>>> one dsa switch chip with the ports assignment in this data structure.
>>> Right now, DSA does not support a dual Ethernet MAC configuration,
>>> although you could probably do per-port VLAN membership and create two
>>> default VLANs to allow that.
>>>
>>>>
>>>> My device has 2 MACs and one shared mii_bus. I have added two
>>>> dsa_platform_data structures and registered them but cannot probe as
>>>> dsa_probe tries mdio_register() on the same bus and fails.
>>>>
>>>> Is it possible to create two DSA switches on the same (shared) mii?
>>>
>>> Not in its current form, and I am not exactly sure how we would support that.
>>
>> Yes, not with the current DSA implementation, but I can manage to
>> solve this by a small (ugly) patch - renamed the slave mii bus as
>> "<master_bus->id>:<pd->sw_addr>:<platform_device->id>" instead of
>> "<master_bus->id>:<pd->sw_addr>". Though I am not yet sure enough if
>> this could have any negative impact or not.
>
> This will register two virtual slave mii buses backed by the same real
> mdio bus driver, although that is not necessarily a problem, you want to
> make sure that they are not going to poll the same PHYs in the switch
> driver.
>
> This is probably the easiest way to achieve what you want, can you just
> introduce a check on the "id" being >= 0 (using -1 with platform_devices
> is valid when there is just one device in the system).
>
> Thanks!
>

Yes, I handled that later. Thanks for your reply.

>>
>> Comments please.
>>
>>>
>>>>
>>>> Regards,
>>>> Rajib
>>>
>>>
>>>
>>> --
>>> Florian
>>
>> ----
>> diff -rup org/net/dsa/dsa.c mod/net/dsa/dsa.c
>> --- org/net/dsa/dsa.c
>> +++ mod/net/dsa/dsa.c
>> @@ -68,7 +68,7 @@ dsa_switch_probe(struct mii_bus *bus, in
>>
>>  /* basic switch operations **************************************************/
>>  static struct dsa_switch *
>> -dsa_switch_setup(struct dsa_switch_tree *dst, int index,
>> +dsa_switch_setup(struct dsa_switch_tree *dst, int index, int id,
>>   struct device *parent, struct mii_bus *bus)
>>  {
>>   struct dsa_chip_data *pd = dst->pd->chip + index;
>> @@ -156,7 +156,7 @@ dsa_switch_setup(struct dsa_switch_tree
>>   ret = -ENOMEM;
>>   goto out;
>>   }
>> - dsa_slave_mii_bus_init(ds);
>> + dsa_slave_mii_bus_init(ds, id);
>>
>>   ret = mdiobus_register(ds->slave_mii_bus);
>>   if (ret < 0)
>> @@ -349,7 +349,7 @@ static int dsa_probe(struct platform_dev
>>   continue;
>>   }
>>
>> - ds = dsa_switch_setup(dst, i, &pdev->dev, bus);
>> + ds = dsa_switch_setup(dst, i, pdev->id, &pdev->dev, bus);
>>   if (IS_ERR(ds)) {
>>   printk(KERN_ERR "%s[%d]: couldn't create dsa switch "
>>   "instance (error %ld)\n", dev->name, i,
>> diff -rup org/net/dsa/dsa_priv.h mod/net/dsa/dsa_priv.h
>> --- org/net/dsa/dsa_priv.h
>> +++ mod/net/dsa/dsa_priv.h
>> @@ -163,7 +163,7 @@ void register_switch_driver(struct dsa_s
>>  void unregister_switch_driver(struct dsa_switch_driver *type);
>>
>>  /* slave.c */
>> -void dsa_slave_mii_bus_init(struct dsa_switch *ds);
>> +void dsa_slave_mii_bus_init(struct dsa_switch *ds, int id);
>>  struct net_device *dsa_slave_create(struct dsa_switch *ds,
>>      struct device *parent,
>>      int port, char *name);
>> diff -rup org/net/dsa/slave.c mod/net/dsa/slave.c
>> --- org/net/dsa/slave.c
>> +++ mod/net/dsa/slave.c
>> @@ -35,14 +35,14 @@ static int dsa_slave_phy_write(struct mi
>>   return 0;
>>  }
>>
>> -void dsa_slave_mii_bus_init(struct dsa_switch *ds)
>> +void dsa_slave_mii_bus_init(struct dsa_switch *ds, int id)
>>  {
>>   ds->slave_mii_bus->priv = (void *)ds;
>>   ds->slave_mii_bus->name = "dsa slave smi";
>>   ds->slave_mii_bus->read = dsa_slave_phy_read;
>>   ds->slave_mii_bus->write = dsa_slave_phy_write;
>> - snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "%s:%.2x",
>> - ds->master_mii_bus->id, ds->pd->sw_addr);
>> + snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "%s:%.2x:%.1x",
>> + ds->master_mii_bus->id, ds->pd->sw_addr, id);
>>   ds->slave_mii_bus->parent = &ds->master_mii_bus->dev;
>>  }
>>
>
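
For illustration, the naming scheme from the patch combined with the "id >= 0" check suggested above could look roughly like the sketch below. It runs in user space; `slave_bus_id()` and `BUS_ID_SIZE` are hypothetical names chosen for the example, and only the two format strings are taken from the quoted diff.

```c
#include <assert.h>
#include <stdio.h>

#define BUS_ID_SIZE 20	/* hypothetical buffer size for this sketch */

/*
 * Build the slave bus id. A negative pdev_id models a platform device
 * registered with id -1 (the only instance in the system), in which case
 * the original "<master>:<sw_addr>" form is kept. Otherwise the instance
 * id is appended so that two instances on one master bus get distinct ids.
 * Returns the snprintf() result.
 */
static int slave_bus_id(char *buf, size_t len, const char *master_id,
			int sw_addr, int pdev_id)
{
	assert(buf != NULL && master_id != NULL);

	if (pdev_id >= 0)
		return snprintf(buf, len, "%s:%.2x:%.1x", master_id,
				(unsigned int)sw_addr, (unsigned int)pdev_id);

	return snprintf(buf, len, "%s:%.2x", master_id,
			(unsigned int)sw_addr);
}
```

With a master bus id of "mdio0" and sw_addr 0x10, this yields "mdio0:10" for a device id of -1 and "mdio0:10:1" for a device id of 1.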

^ permalink raw reply

* Re: [PATCH net-next V2] tun/macvtap: use consume_skb() instead of kfree_skb() when needed
From: Jason Wang @ 2014-11-27  5:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: davem, netdev, linux-kernel, Eric Dumazet
In-Reply-To: <20141126134742.GA5358@redhat.com>



On 11/26/2014 09:47 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 26, 2014 at 03:43:30PM +0800, Jason Wang wrote:
>> >To be more friendly with drop monitor, we should only call kfree_skb() when
>> >the packets were dropped and use consume_skb() in other cases.
>> >
>> >Cc: Eric Dumazet<eric.dumazet@gmail.com>
>> >Signed-off-by: Jason Wang<jasowang@redhat.com>
>> >---
>> >Changes from V1:
>> >- check the return value of tun/macvtap_put_user()
>> >---
>> >  drivers/net/macvtap.c | 5 ++++-
>> >  drivers/net/tun.c     | 5 ++++-
>> >  2 files changed, 8 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> >index 42a80d3..c171ab6 100644
>> >--- a/drivers/net/macvtap.c
>> >+++ b/drivers/net/macvtap.c
>> >@@ -862,7 +862,10 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q,
>> >  		}
>> >  		iov_iter_init(&iter, READ, iv, segs, len);
>> >  		ret = macvtap_put_user(q, skb, &iter);
>> >-		kfree_skb(skb);
>> >+		if (ret < 0)
> Maybe unlikely() here?
>

Better, will post V3.

Thanks

^ permalink raw reply

* [PATCH net] bpf: x86: fix epilogue generation for eBPF programs
From: Alexei Starovoitov @ 2014-11-27  5:02 UTC (permalink / raw)
  To: David S. Miller
  Cc: Zi Shen Lim, Eric Dumazet, Daniel Borkmann, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, netdev, linux-kernel

Classic BPF has a restriction that the last insn is always BPF_RET.
eBPF has neither a BPF_RET instruction nor this restriction.
It has a BPF_EXIT insn, which can appear anywhere in the program
one or more times and does not have to be the last insn.
Fix the eBPF JIT to emit the epilogue when the first BPF_EXIT is seen;
all other BPF_EXIT instructions are emitted as a jump to it.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
Note, this bug applies only to native eBPF programs, which were first
introduced in 3.18, so there is no need to send it to stable and
therefore no 'Fixes' tag.

The arm64 JIT has the same problem, but the fix is not as trivial,
so it will be done as a separate patch.

Since 3.18 can only load eBPF programs and cannot execute them,
this patch could even go to net-next only, but I think it is worth
applying to 3.18 (net), so that the JITed output for native eBPF
programs is correct when the bpf syscall loads them with
net.core.bpf_jit_enable=2

 arch/x86/net/bpf_jit_comp.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3f62734..7e90244 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -178,7 +178,7 @@ static void jit_fill_hole(void *area, unsigned int size)
 }
 
 struct jit_context {
-	unsigned int cleanup_addr; /* epilogue code offset */
+	int cleanup_addr; /* epilogue code offset */
 	bool seen_ld_abs;
 };
 
@@ -192,6 +192,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	struct bpf_insn *insn = bpf_prog->insnsi;
 	int insn_cnt = bpf_prog->len;
 	bool seen_ld_abs = ctx->seen_ld_abs | (oldproglen == 0);
+	bool seen_exit = false;
 	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
 	int i;
 	int proglen = 0;
@@ -854,10 +855,11 @@ common_load:
 			goto common_load;
 
 		case BPF_JMP | BPF_EXIT:
-			if (i != insn_cnt - 1) {
+			if (seen_exit) {
 				jmp_offset = ctx->cleanup_addr - addrs[i];
 				goto emit_jmp;
 			}
+			seen_exit = true;
 			/* update cleanup_addr */
 			ctx->cleanup_addr = proglen;
 			/* mov rbx, qword ptr [rbp-X] */
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH net-next] ipvlan: ipvlan depends on INET and IPV6
From: Mahesh Bandewar @ 2014-11-27  5:13 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Jim Davis, Mahesh Bandewar

This driver uses ip_local_out() and ip6_route_output() which are
defined only if CONFIG_INET and CONFIG_IPV6 are enabled respectively.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/Kconfig         | 2 ++
 drivers/net/ipvlan/ipvlan.h | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b6d64f546574..d6607ee9c855 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -148,6 +148,8 @@ config MACVTAP
 
 config IPVLAN
     tristate "IP-VLAN support"
+    depends on INET
+    depends on IPV6
     ---help---
       This allows one to create virtual devices off of a main interface
       and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index ab3e7614ed71..c44d29eca6c0 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -23,8 +23,9 @@
 #include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <linux/inetdevice.h>
+#include <net/ip.h>
+#include <net/ip6_route.h>
 #include <net/rtnetlink.h>
-#include <net/gre.h>
 #include <net/route.h>
 #include <net/addrconf.h>
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* Re: [PATCH net-next] macvlan: delay the header check for dodgy packets into lower device
From: Jason Wang @ 2014-11-27  5:14 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, netdev, linux-kernel, mst, vyasevic
In-Reply-To: <20141126.153736.1815789848350571029.davem@davemloft.net>



On 11/27/2014 04:37 AM, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed, 26 Nov 2014 17:21:14 +0800
>
>> We do header check twice for a dodgy packet. One is done before
>> macvlan_start_xmit(), another is done before lower device's
>> ndo_start_xmit(). The first one seems redundant so this patch tries to
>> delay header check until a packet reaches its lower device (or macvtap)
>> through always enabling NETIF_F_GSO_ROBUST for macvlan device.
>>
>> Cc: Patrick McHardy <kaber@trash.net>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>
> Hmmm, it's the idea that if we have a dodgy packet, we want to
> notice that as early as possible in the packet processing path?
>

It is not late even with this patch. The check is done immediately after
macvlan passes a packet to the lower device, which should be sufficient.

For good packets, this patch saves one header check. For bad packets,
it just lets the drop happen during the validation before the lower
device's ndo_start_xmit().

^ permalink raw reply

* [PATCH] e1000: remove unused variables
From: Sudip Mukherjee @ 2014-11-27  5:22 UTC (permalink / raw)
  To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny,
	Don Skidmore, Greg Rose, Matthew Vick, John Ronciak,
	Mitch Williams, Linux NICS
  Cc: e1000-devel, netdev, Sudip Mukherjee, linux-kernel

These variables were only ever assigned values and were never read.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
---
 drivers/net/ethernet/intel/e1000/e1000_hw.c   | 142 ++++++++++++--------------
 drivers/net/ethernet/intel/e1000/e1000_main.c |   3 -
 2 files changed, 66 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.c b/drivers/net/ethernet/intel/e1000/e1000_hw.c
index 45c8c864..7812f59 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_hw.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_hw.c
@@ -154,7 +154,6 @@ static s32 e1000_set_phy_type(struct e1000_hw *hw)
  */
 static void e1000_phy_init_script(struct e1000_hw *hw)
 {
-	u32 ret_val;
 	u16 phy_saved_data;
 
 	if (hw->phy_init_script) {
@@ -163,7 +162,7 @@ static void e1000_phy_init_script(struct e1000_hw *hw)
 		/* Save off the current value of register 0x2F5B to be restored
 		 * at the end of this routine.
 		 */
-		ret_val = e1000_read_phy_reg(hw, 0x2F5B, &phy_saved_data);
+		e1000_read_phy_reg(hw, 0x2F5B, &phy_saved_data);
 
 		/* Disabled the PHY transmitter */
 		e1000_write_phy_reg(hw, 0x2F5B, 0x0003);
@@ -402,7 +401,6 @@ s32 e1000_reset_hw(struct e1000_hw *hw)
 {
 	u32 ctrl;
 	u32 ctrl_ext;
-	u32 icr;
 	u32 manc;
 	u32 led_ctrl;
 	s32 ret_val;
@@ -527,7 +525,7 @@ s32 e1000_reset_hw(struct e1000_hw *hw)
 	ew32(IMC, 0xffffffff);
 
 	/* Clear any pending interrupt events. */
-	icr = er32(ICR);
+	er32(ICR);
 
 	/* If MWI was previously enabled, reenable it. */
 	if (hw->mac_type == e1000_82542_rev2_0) {
@@ -2396,16 +2394,13 @@ static s32 e1000_check_for_serdes_link_generic(struct e1000_hw *hw)
  */
 s32 e1000_check_for_link(struct e1000_hw *hw)
 {
-	u32 rxcw = 0;
-	u32 ctrl;
 	u32 status;
 	u32 rctl;
 	u32 icr;
-	u32 signal = 0;
 	s32 ret_val;
 	u16 phy_data;
 
-	ctrl = er32(CTRL);
+	er32(CTRL);
 	status = er32(STATUS);
 
 	/* On adapters with a MAC newer than 82544, SW Definable pin 1 will be
@@ -2414,12 +2409,9 @@ s32 e1000_check_for_link(struct e1000_hw *hw)
 	 */
 	if ((hw->media_type == e1000_media_type_fiber) ||
 	    (hw->media_type == e1000_media_type_internal_serdes)) {
-		rxcw = er32(RXCW);
+		er32(RXCW);
 
 		if (hw->media_type == e1000_media_type_fiber) {
-			signal =
-			    (hw->mac_type >
-			     e1000_82544) ? E1000_CTRL_SWDPIN1 : 0;
 			if (status & E1000_STATUS_LU)
 				hw->get_link_status = false;
 		}
@@ -4698,78 +4690,76 @@ s32 e1000_led_off(struct e1000_hw *hw)
  */
 static void e1000_clear_hw_cntrs(struct e1000_hw *hw)
 {
-	volatile u32 temp;
-
-	temp = er32(CRCERRS);
-	temp = er32(SYMERRS);
-	temp = er32(MPC);
-	temp = er32(SCC);
-	temp = er32(ECOL);
-	temp = er32(MCC);
-	temp = er32(LATECOL);
-	temp = er32(COLC);
-	temp = er32(DC);
-	temp = er32(SEC);
-	temp = er32(RLEC);
-	temp = er32(XONRXC);
-	temp = er32(XONTXC);
-	temp = er32(XOFFRXC);
-	temp = er32(XOFFTXC);
-	temp = er32(FCRUC);
-
-	temp = er32(PRC64);
-	temp = er32(PRC127);
-	temp = er32(PRC255);
-	temp = er32(PRC511);
-	temp = er32(PRC1023);
-	temp = er32(PRC1522);
-
-	temp = er32(GPRC);
-	temp = er32(BPRC);
-	temp = er32(MPRC);
-	temp = er32(GPTC);
-	temp = er32(GORCL);
-	temp = er32(GORCH);
-	temp = er32(GOTCL);
-	temp = er32(GOTCH);
-	temp = er32(RNBC);
-	temp = er32(RUC);
-	temp = er32(RFC);
-	temp = er32(ROC);
-	temp = er32(RJC);
-	temp = er32(TORL);
-	temp = er32(TORH);
-	temp = er32(TOTL);
-	temp = er32(TOTH);
-	temp = er32(TPR);
-	temp = er32(TPT);
-
-	temp = er32(PTC64);
-	temp = er32(PTC127);
-	temp = er32(PTC255);
-	temp = er32(PTC511);
-	temp = er32(PTC1023);
-	temp = er32(PTC1522);
-
-	temp = er32(MPTC);
-	temp = er32(BPTC);
+	er32(CRCERRS);
+	er32(SYMERRS);
+	er32(MPC);
+	er32(SCC);
+	er32(ECOL);
+	er32(MCC);
+	er32(LATECOL);
+	er32(COLC);
+	er32(DC);
+	er32(SEC);
+	er32(RLEC);
+	er32(XONRXC);
+	er32(XONTXC);
+	er32(XOFFRXC);
+	er32(XOFFTXC);
+	er32(FCRUC);
+
+	er32(PRC64);
+	er32(PRC127);
+	er32(PRC255);
+	er32(PRC511);
+	er32(PRC1023);
+	er32(PRC1522);
+
+	er32(GPRC);
+	er32(BPRC);
+	er32(MPRC);
+	er32(GPTC);
+	er32(GORCL);
+	er32(GORCH);
+	er32(GOTCL);
+	er32(GOTCH);
+	er32(RNBC);
+	er32(RUC);
+	er32(RFC);
+	er32(ROC);
+	er32(RJC);
+	er32(TORL);
+	er32(TORH);
+	er32(TOTL);
+	er32(TOTH);
+	er32(TPR);
+	er32(TPT);
+
+	er32(PTC64);
+	er32(PTC127);
+	er32(PTC255);
+	er32(PTC511);
+	er32(PTC1023);
+	er32(PTC1522);
+
+	er32(MPTC);
+	er32(BPTC);
 
 	if (hw->mac_type < e1000_82543)
 		return;
 
-	temp = er32(ALGNERRC);
-	temp = er32(RXERRC);
-	temp = er32(TNCRS);
-	temp = er32(CEXTERR);
-	temp = er32(TSCTC);
-	temp = er32(TSCTFC);
+	er32(ALGNERRC);
+	er32(RXERRC);
+	er32(TNCRS);
+	er32(CEXTERR);
+	er32(TSCTC);
+	er32(TSCTFC);
 
 	if (hw->mac_type <= e1000_82544)
 		return;
 
-	temp = er32(MGTPRC);
-	temp = er32(MGTPDC);
-	temp = er32(MGTPTC);
+	er32(MGTPRC);
+	er32(MGTPDC);
+	er32(MGTPTC);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 24f3986..a70ea46 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2443,7 +2443,6 @@ static void e1000_watchdog(struct work_struct *work)
 	if (link) {
 		if (!netif_carrier_ok(netdev)) {
 			u32 ctrl;
-			bool txb2b = true;
 			/* update snapshot of PHY registers on LSC */
 			e1000_get_speed_and_duplex(hw,
 						   &adapter->link_speed,
@@ -2465,11 +2464,9 @@ static void e1000_watchdog(struct work_struct *work)
 			adapter->tx_timeout_factor = 1;
 			switch (adapter->link_speed) {
 			case SPEED_10:
-				txb2b = false;
 				adapter->tx_timeout_factor = 16;
 				break;
 			case SPEED_100:
-				txb2b = false;
 				/* maybe add some timeout factor ? */
 				break;
 			}
-- 
1.8.1.2



^ permalink raw reply related

* Re: [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
From: Alexei Starovoitov @ 2014-11-27  5:29 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Network Development, David S. Miller, Jesper Dangaard Brouer,
	Jeff Kirsher, Eric Dumazet
In-Reply-To: <20141127000557.1617.88261.stgit@ahduyck-vm-fedora20>

On Wed, Nov 26, 2014 at 4:05 PM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> This patch splits the netdev_alloc_frag function up so that it can be used
> on one of two page frag pools instead of being fixed on the
> netdev_alloc_cache.  By doing this we can add a NAPI specific function
> __napi_alloc_frag that accesses a pool that is only used from softirq
> context.  The advantage to this is that we do not need to call
> local_irq_save/restore which can be a significant savings.
>
> I also took the opportunity to refactor the core bits that were placed in
> __alloc_page_frag.  First I updated the allocation to do either a 32K
> allocation or an order 0 page.  Then I also rewrote the logic to work from
> the end of the page to the start.  By doing this the size value doesn't
> have to be used unless we have run out of space for page fragments.
> Finally I cleaned up the atomic bits so that we just do an
> atomic_sub_return and if that returns 0 then we set the page->_count via an
> atomic_set.  This way we can remove the extra conditional for the
> atomic_read since it would have led to an atomic_inc in the case of success
> anyway.

Nice simplification. Complicated, but looks good to me.
I think only the replacement of the loop with 32k+page needs a
better explanation in the commit log. I'm guessing you're
killing the intermediate sizes to simplify the code?

Thank you for doing it. It's a great improvement.
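
For readers following along, the "work from the end of the page to the start" idea in the quoted description can be sketched in user space roughly as below. This is a simplified illustration, not the patch's code: `struct frag_pool`, `frag_pool_init()` and `frag_alloc()` are hypothetical names, a plain buffer stands in for the page, and the reference counting is left out. Where the kernel would fetch a fresh page on exhaustion, the sketch returns NULL and leaves it to the caller to re-initialise the pool with a new buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool handing out fragments from the end of a buffer. */
struct frag_pool {
	unsigned char *page;	/* backing buffer standing in for a page */
	size_t offset;		/* start of the most recently issued fragment */
};

/* Start with the whole buffer free: the offset sits at the very end. */
static void frag_pool_init(struct frag_pool *p, unsigned char *buf,
			   size_t size)
{
	assert(p != NULL && buf != NULL);

	p->page = buf;
	p->offset = size;
}

/*
 * Carve 'fragsz' bytes off the free area. The fast path only compares
 * and subtracts against 'offset'; the buffer size is needed again only
 * when the pool is re-initialised after running out of space.
 */
static void *frag_alloc(struct frag_pool *p, size_t fragsz)
{
	assert(p != NULL);

	if (p->offset < fragsz)
		return NULL;	/* exhausted: caller supplies a new buffer */

	p->offset -= fragsz;
	return p->page + p->offset;
}
```

With a 64-byte buffer, a 16-byte request returns the last 16 bytes, a following 32-byte request returns the 32 bytes just below that, and a further 32-byte request fails because only 16 bytes remain.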

^ permalink raw reply

* Re: [PATCH v2 02/19] kbuild: kselftest_install - add a new make target to install selftests
From: Masami Hiramatsu @ 2014-11-27  5:32 UTC (permalink / raw)
  To: Shuah Khan
  Cc: gregkh, akpm, mmarek, davem, keescook, tranmanphong, dh.herrmann,
	hughd, bobby.prani, ebiederm, serge.hallyn, linux-kbuild,
	linux-kernel, linux-api, netdev
In-Reply-To: <a2344d4df903d673afe1631118f40917f773cc9a.1415735831.git.shuahkh@osg.samsung.com>

(2014/11/12 5:27), Shuah Khan wrote:
> Add a new make target to install kernel selftests.
> This new target will build and install selftests. kselftest
> target now depends on kselftest_install and runs the generated
> kselftest script to reduce duplicate work and for common look
> and feel when running tests.
> 
> Approach:
> 
> make kselftest_target:

kselftest_install?

> -- exports kselftest INSTALL_KSFT_PATH
>    default $(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
> -- exports path for kselftest.sh
> -- runs selftests make install target:

This direction is OK to me.

BTW, I've found another way to build the selftests from the Makefile.
Actually, you can do

make -C tools/ selftest

There are also selftest_install and selftest_clean targets (though
this currently has a bug and doesn't work anyway).

I think we'd better do a subdir make instead of adding these targets.
This means that "make kselftest*" should be an alias of "make -C tools/ selftest*"

Also, I'd like to request passing some options such as O=$(objtree)
so that we can build test kernel modules in selftests.

Thank you,


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply

