* [PATCH v2.54] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2014-02-12 3:52 UTC
To: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
Jesse Gross, Ben Pfaff
Cc: Ravi K
Hi Jesse, Hi All,
As per the suggestion made by Ben in relation to this patch
I have updated it so that:
* The datapath rejects push MPLS actions in the presence of VLAN tags.
I have done this by blacklisting the following:
- ETH_P_8021Q (0x8100)
- ETH_P_8021AD (0x88A8)
- ETH_P_QINQ1 (0x9100)
- ETH_P_QINQ2 (0x9200)
- ETH_P_QINQ3 (0x9300)
But perhaps a safer option would be to whitelist only ethertypes
we are completely comfortable with. Starting with:
- ETH_P_IP (0x0800)
- ETH_P_ARP (0x0806)
- ETH_P_IPV6 (0x86DD)
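For illustration only, the whitelist alternative could look roughly like the following userspace C sketch. The helper name and the SKETCH_ constants are hypothetical and are not part of this patch; ethertypes are taken in host byte order here, whereas the datapath compares big-endian values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ethertypes from the whitelist above, in host byte order. The SKETCH_
 * prefix marks these as illustration-only names, not patch identifiers. */
enum {
	SKETCH_ETH_P_IP   = 0x0800,
	SKETCH_ETH_P_ARP  = 0x0806,
	SKETCH_ETH_P_IPV6 = 0x86DD,
};

/* Hypothetical helper: permit push MPLS only for whitelisted ethertypes.
 * Anything else, including all VLAN ethertypes, is rejected. */
static bool sketch_push_mpls_allowed(uint16_t eth_type)
{
	switch (eth_type) {
	case SKETCH_ETH_P_IP:
	case SKETCH_ETH_P_ARP:
	case SKETCH_ETH_P_IPV6:
		return true;
	default:
		return false;
	}
}
```

With a whitelist, an unknown tagging ethertype is rejected by default, whereas the blacklist has to name each one explicitly.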
To aid review, this patch is available in git at:
https://github.com/horms/openvswitch devel/mpls-v2.54
Simon Horman (1):
datapath: Add basic MPLS support to kernel
OPENFLOW-1.1+ | 12 -
datapath/Modules.mk | 1 +
datapath/actions.c | 119 +++++++++-
datapath/datapath.c | 4 +-
datapath/flow.c | 29 +++
datapath/flow.h | 17 +-
datapath/flow_netlink.c | 296 ++++++++++++++++++++++--
datapath/flow_netlink.h | 2 +-
datapath/linux/compat/gso.c | 70 +++++-
datapath/linux/compat/gso.h | 41 ++++
datapath/linux/compat/include/linux/netdevice.h | 6 +-
datapath/linux/compat/netdevice.c | 10 +-
datapath/mpls.h | 15 ++
include/linux/openvswitch.h | 7 +-
14 files changed, 567 insertions(+), 62 deletions(-)
create mode 100644 datapath/mpls.h
--
1.8.5.2
* [PATCH v2.54] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2014-02-12 3:52 UTC
To: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
Jesse Gross, Ben Pfaff
Cc: Ravi K
In-Reply-To: <1392177160-18742-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Leo Alterman <lalterman-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Cc: Isaku Yamahata <yamahata-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
Cc: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>
---
v2.54
* Do not allow push MPLS in the presence of VLANs
* Remove support for push MPLS in the presence of VLANs from actions.c
v2.53
* Push MPLS labels after VLAN tags
- This is consistent with OF1.2 and plans for OF1.3.4, and OF1.5+.
It is inconsistent with OF1.4, which appears to be an aberration
v2.52
* Do not guard __skb_network_protocol with KERNEL_VERSION(3.11.0)
It was not guarded before this patch and should not be guarded
afterwards as it is currently needed regardless of the kernel version
v2.50 - v2.51
* No change
v2.49
* Remove MPLS items from OPENFLOW-1.1+. They should now be complete.
v2.47
* Rebase for HAVE_RHEL_OVS_HOOK and OVS_KEY_ATTR_TCP_FLAGS
v2.43 - v2.46
* No change
v2.42
* Rebase for:
+ 0585f7a ("datapath: Simplify mega-flow APIs.")
+ a097c0b ("datapath: Restructure datapath.c and flow.c")
* As suggested by Jesse Gross
+ Take into account that push_mpls() will have freed the skb on error
+ Remove dubious !eth_p_mpls(skb->protocol) condition from push_mpls
The !eth_p_mpls(skb->protocol) condition on setting inner_protocol
has no effect. Its motivation was to ensure that inner_protocol was
only set the first time that mpls_push occurred. However this is already
ensured by the !ovs_skb_get_inner_protocol(skb) condition.
+ Return -EINVAL instead of -ENOMEM from pop_mpls() if the skb is too short
+ Do not add @inner_protocol to kernel doc for struct ovs_skb_cb.
The patch no longer adds an inner_protocol member to struct ovs_skb_cb
+ Do not add and set otherwise unused inner_protocol variable in
rpl_dev_queue_xmit()
* As suggested by Pravin Shelar
+ Implement compatibility code in existing rpl_skb_gso_segment
rather than introducing rpl___skb_gso_segment
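The inner_protocol reasoning in the v2.42 notes can be illustrated with a small userspace sketch. struct sketch_skb and sketch_push_mpls() are hypothetical stand-ins for the skb and for the relevant part of push_mpls(); this only models the behaviour the notes describe and is not the datapath code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two skb fields of interest. */
struct sketch_skb {
	uint16_t protocol;       /* current outermost ethertype */
	uint16_t inner_protocol; /* zero until the first MPLS push */
};

/* Record the original ethertype only on the first push. Once
 * inner_protocol is non-zero it is left alone, so no additional
 * "is the current protocol MPLS?" test is needed. */
static void sketch_push_mpls(struct sketch_skb *skb, uint16_t mpls_ethertype)
{
	if (!skb->inner_protocol)
		skb->inner_protocol = skb->protocol;
	skb->protocol = mpls_ethertype;
}
```

Pushing twice onto an IPv4 packet leaves inner_protocol at 0x0800 after both pushes, which is why the extra !eth_p_mpls() condition has no effect.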
v2.41
* No change
v2.40
* Rebase for:
+ New dev_queue_xmit compat code
+ Updated put_vlan()
* As suggested by Jesse Gross
+ Remove bogus mac_len update from push_mpls()
+ Slightly simplify push_mpls() by using eth_hdr()
+ Remove dubious condition !eth_p_mpls(inner_protocol) on
an skb being considered to be MPLS in netdev_send()
+ Only use compatibility code for MPLS GSO segmentation on kernels
older than 3.11
+ Revamp setting of inner_protocol
1. Do not unconditionally set inner_protocol to the value of
skb->protocol in ovs_execute_actions().
2. Initialise inner_protocol to zero only if compatibility code is in
use. In the case where compatibility code is not in use it will either
be zero since the allocation of the skb or some other value set
by some other user.
3. Conditionally set the inner_protocol in push_mpls() to the value of
skb->protocol when entering push_mpls(). The condition is that
inner_protocol is zero and the value of skb->protocol is not an MPLS
ethernet type.
- This new scheme:
+ Pushes logic to set inner_protocol closer to the case where it is
needed.
+ Avoids over-writing values set by other users.
* As suggested by Pravin Shelar
+ Only set and restore skb->protocol in rpl___skb_gso_segment() in the
case of MPLS
+ Add inner_protocol field to struct ovs_gso_cb instead of ovs_skb_cb.
This moves compatibility code closer to where it is used
and creates fewer differences with mainline.
* Update comment on mac_len updates in datapath/actions.c
* Remove HAVE_INNER_PROTOCOL and instead just check
against kernel version 3.11 directly.
HAVE_INNER_PROTOCOL is a hang-over from work done prior
to the merge of inner_protocol into the kernel.
* Remove dubious condition !eth_p_mpls(inner_protocol) on
using inner_protocol as the type in rpl_skb_network_protocol()
* Do not update type of features in rpl_dev_queue_xmit.
Though arguably correct this is not an inherent part of
the changes made by this patch.
* Use skb_cow_head() in push_mpls()
+ Call skb_cow_head(skb, MPLS_HLEN) instead of
make_writable(skb, skb->mac_len) to ensure that there is enough head
room to push an MPLS LSE regardless of whether the skb is cloned or not.
+ This is consistent with the behaviour of rpl__vlan_put_tag().
+ This is a fix for crashes reported when performing mpls_push
with headroom less than 4. This problem was introduced in v2.36.
* Skip popping in mpls_pop if the skb is too short to contain an MPLS LSE
v2.39
* Rebase for removal of vlan, checksum and skb->mark compat code
v2.38
* Rebase for SCTP support
* Refactor validate_tp_port() to iterate over eth_types rather
than open-coding the loop. With the addition of SCTP this logic
is now used three times.
v2.37
* Rebase
v2.36
* Do not add set_ethertype() to datapath/actions.c.
As this patch has evolved this function has devolved into
two sets of functionality wrapped into a single function with
only one line of common code. Refactor things to simply
open-code setting the ether type in the two locations where
set_ethertype() was previously used. The aim here is to improve
readability.
* Update setting skb->protocol after mpls push and pop.
- In the case of push_mpls it should be set unconditionally
as since v2.35 the behaviour of this function is to always push
an MPLS LSE before any VLAN tags.
- In the case of mpls_pop eth_p_mpls(skb->protocol) is a better
test than skb->protocol != htons(ETH_P_8021Q) as it will give the
correct behaviour in the presence of other VLAN ethernet types,
for example 0x88a8 which is used by 802.1ad. Moreover, it seems
correct to update the ethernet type if it was previously set
according to the top-most MPLS LSE.
* Deaccelerate VLANs when pushing MPLS tags
- Since v2.35 MPLS push will insert an MPLS LSE before any VLAN tags.
This means that if an accelerated tag is present it should be
deaccelerated to ensure it ends up in the correct position.
* Update skb->mac_len in push_mpls() so that it will be correct
when used by a subsequent call to pop_mpls().
As things stand I do not believe this is strictly necessary as
ovs-vswitchd will not send a pop MPLS action after a push MPLS action.
However, I have added this in order to code more defensively as I believe
that if such a sequence did occur it would be rather unobvious why
it didn't work.
* Do not add skb_cow_head() call in push_mpls().
It is unnecessary as there is a make_writable() call.
This change was also made in v2.30 but somehow the
code regressed between then and v2.35.
v2.35
* Rebase
* Move MPLS constants to mpls.h
* Push MPLS tags after ethernet, before VLAN tags
- This is consistent with the OpenFlow 1.3 specification
- Compatibility with OpenFlow 1.2 and earlier versions
may be provided by ovs-vswitchd.
* Correct GSO behaviour in the presence of MPLS but absence of VLANs
v2.34
* Rebase for megaflow changes
v2.33
* Ensure that inner_protocol is always set to the current
skb->protocol value in ovs_execute_actions(). This ensures
it is set to the correct value in the absence of a push_mpls action.
Also remove setting of inner_protocol in push_mpls() as
it duplicates the code now in ovs_execute_actions().
* Call __skb_gso_segment() instead of skb_gso_segment() from
rpl___skb_gso_segment() in the case that HAVE___SKB_GSO_SEGMENT is set.
This was a typo.
v2.32
* As suggested by Jesse Gross
- Use int instead of size_t in validate_and_copy_actions__().
- Fix crazy edit mess in pop_mpls() action comment
- Move eth_p_mpls() into mpls.h
- Refactor skb_gso_segment MPLS handling into rpl_skb_gso_segment
Address Jesse's comments regarding this code:
"Can we push this completely into the skb_gso_segment() compatibility
code? It's both nicer and may make the interactions with the vlan code
less confusing."
- Move GSO compatibility code into linux/compat/gso.*
- Set skb->protocol on mpls_push and mpls_pop in the presence
of an offloaded VLAN.
v2.31
* As suggested by Jesse Gross
- There is no need to make mac_header_end inline as it is not in a header file
- Remove dubious if (*skb_ethertype == ethertype) optimisation from
set_ethertype
- Only set skb->protocol in push_mpls() or pop_mpls() for non-VLAN packets
- Use MAX_ETH_TYPES instead of SAMPLE_ACTION_DEPTH for array size
of types in struct eth_types. This corrects a typo/thinko.
- Correct eth type tracking logic such that start isn't advanced
when entering a sample action, ensuring that all possible types
are checked when verifying nested actions.
* Define HAVE_INNER_PROTOCOL based on kernel version.
inner_protocol has been merged into net-next and should appear in
v3.11 so there is no longer a need for an acinclude.m4 test to check for it.
* Add MPLS GSO compatibility code.
This is for use on kernels that do not have MPLS GSO support.
Thanks to Joe Stringer for his work on this.
v2.30
* As suggested by Jesse Gross
- Use skb_cow_head in push_mpls to ensure there is sufficient headroom for
skb_push
- Call make_writable with skb->mac_len instead of skb->mac_len + MPLS_HLEN
in push_mpls as only the first skb->mac_len bytes of existing packet data
are modified.
- Rename skb_mac_header_end as mac_header_end, this seems
to be a more appropriate name for a local function.
- Remove OVS_CSUM_COMPLETE code from set_ethertype().
Inside OVS the ethernet header is not covered by OVS_CSUM_COMPLETE.
- Use __skb_pull() instead of skb_pull() in pop_mpls()
- Decrement and increment skb->mac_len when popping and pushing VLAN tags.
Previously mac_len was reset, but this would result in forgetting
the MPLS label stack.
- Remove spurious comment from before do_execute_actions().
- Move OVS_KEY_ATTR_MPLS attribute to its final, upstreamable, location.
- Correct ethertype check for OVS_ACTION_ATTR_POP_MPLS case in
validate_and_copy_actions() to check for MPLS ethertypes rather than
ETH_P_IP.
- Rewrite tracking of eth types used to verify actions in the presence
of sample actions. There is a large comment above struct eth_types
describing the new implementation.
v2.29
* Break include/ and lib/ portions of the patch out into a
separate patch "datapath: Add basic MPLS support to kernel"
* Update for new MPLS GSO scheme
- skb->protocol is set to the new ethertype of the packet
on MPLS push and pop
- When pushing the first MPLS LSE onto a previously non-MPLS
packet set skb->inner_protocol to the original ethertype.
- skb->inner_protocol may be used by the network stack
for GSO of the inner-packet.
* Drop const from ethertype parameter of set_ethertype.
This appears to be a legacy of this parameter being a pointer.
* Pass the ethertype parameter of pop_mpls as a value rather
than a pointer.
v2.28
* Kernel Datapath changes as suggested by Jarno Rajahalme
+ Correct the logic introduced in v2.27 to set the network_header
to after the MPLS label stack in the case of an MPLS packet.
- Increment stack_len offset so that label stacks of depth greater
than two do not cause an infinite loop.
- Correct offset passed to check_header to include skb->mac_len
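The label stack walk described in the v2.27 and v2.28 notes can be sketched in userspace C as below. The function name, the SKETCH_ constants and the flat byte buffer are assumptions made for illustration; the datapath operates on an skb and uses check_header() rather than an explicit length.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MPLS_HLEN     4            /* bytes per label stack entry */
#define SKETCH_MPLS_BOS_MASK 0x00000100u  /* bottom-of-stack bit, host order */

/* Hypothetical helper: return the offset of the L3 header from the start
 * of the frame, or -1 if the frame ends before the bottom of the stack.
 * mac_len is the length of the L2 header, including any VLAN tags. */
static long sketch_mpls_l3_offset(const uint8_t *frame, size_t frame_len,
				  size_t mac_len)
{
	size_t stack_len = 0;

	for (;;) {
		const uint8_t *p;
		uint32_t lse;

		/* The offset checked must include mac_len. */
		if (frame_len < mac_len + stack_len + SKETCH_MPLS_HLEN)
			return -1;

		p = frame + mac_len + stack_len;
		lse = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		      ((uint32_t)p[2] << 8) | (uint32_t)p[3];

		/* Always advance, so deeper stacks cannot loop forever. */
		stack_len += SKETCH_MPLS_HLEN;

		if (lse & SKETCH_MPLS_BOS_MASK)
			return (long)(mac_len + stack_len);
	}
}
```

For a 14-byte Ethernet header followed by two LSEs, the second with the BoS bit set, this returns 22: mac_len stays at the end of the L2 header while the L3 header begins after the label stack.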
v2.27
* Kernel Datapath changes as suggested by Jarno Rajahalme and Jesse Gross:
+ Previously the mac_len and network_header of an skb corresponded
to the end of the L2 header. To support GSO they were adjusted, just
before transmission, in do_output, with the results as follows:
Input: non-MPLS skb: Output: network header and mac_len correspond
to the beginning of the L3 headers
Input: MPLS: Output: network header and mac_len correspond to the
end of the L2 headers.
This is somewhat confusing.
+ The new scheme is as follows:
- The mac_len always corresponds to the end of the L2 header.
- The network header always corresponds to the beginning of the
L3 header.
+ Note that in the case of MPLS output the end of the L2 headers and the
beginning of the L3 headers will differ.
* Remove unused declaration of skb_cb_mpls_stack()
v2.26
* Rebase on master
* Kernel Datapath changes as suggested by Jarno Rajahalme
- Use skb_network_header() instead of skb_mac_header() to locate
the ethertype to set in set_ethertype() as the latter will
be wrong in the presence of VLAN tags. This resolves
a regression introduced in v2.24.
- Enhance comment in do_output()
- do_execute_actions(): Do not alter mpls_stack_depth if
an MPLS push or pop action fails. This is achieved by altering
mpls_stack_depth at the end of push_mpls() and pop_mpls().
v2.25
* Rebase on master
* Pass big-endian value as the last argument of eth_types_set() in
validate_and_copy_actions__()
* Use revised GSO support as provided by the patch series
"[PATCH 0/2] Small Modifications to GSO to allow segmentation of MPLS"
- Set skb->mac_len to the length of the l2 header + MPLS stack length
- Update skb->network_header accordingly
- Set skb->encapsulated_features
v2.24
* Use skb_mac_header() in set_ethertype()
* Set skb->encapsulation in set_ethertype() to support MPLS GSO.
Also add a note about the other requirements for MPLS GSO.
MPLS GSO support will be posted as a patch to net-next (Linux mainline)
"MPLS: Add limited GSO support"
* Do not add ETH_TYPE_MIN, it is no longer used
v2.23
* As suggested by Jesse Gross:
- Verify the current ethernet type when validating sample actions
both for the taken and not-taken path of the sample action.
- Document that the OVS_KEY_ATTR_MPLS attribute accepts a list of
struct ovs_key_mpls but that an implementation may restrict
the length it accepts.
- Restrict the array length of the OVS_KEY_ATTR_MPLS to one.
+ Don't add ovs_flow_verify_key_len as it was added to
handle attributes whose values are arrays but there are
no attributes with values that are arrays (of length greater than one).
v2.22
* As suggested by Jesse Gross:
- Fix sparse warning in validate_and_copy_actions()
I have no idea why sparse doesn't show this up on my system.
- Remove call to skb_cow_head() from push_mpls() as it
is already covered by a call to make_writable()
- Check (key_type > OVS_KEY_ATTR_MAX) in ovs_flow_verify_key_len()
- Disallow set actions on l2.5+ data and MPLS push and pop actions
after an MPLS pop action as there is no verification that the packet
is actually of the new ethernet type. This may later be supported
using recirculation or by other means.
- Do not add spurious debugging message to ovs_flow_cmd_new_or_set()
v2.21
* As suggested by Jesse Gross:
- Verify that l3 and l4 actions always occur prior to
a push_mpls action and use the network header pointer of an skb
to track the top of the MPLS stack. This avoids adding an l2_size
element to the skb callback.
v2.20
* As suggested by Jesse Gross:
- Do not add ovs_dp_ioctl_hook
+ This appears to be garbage from a rebase
- Do not add skb_cb_set_l2_size. Instead set OVS_CB(skb)->l2_size
in ovs_flow_extract().
- Do not free skb on error in push_mpls(), it is freed in the caller
- Call skb_reset_mac_len() in pop_mpls() and push_mpls()
- Update checksums in pop_mpls(), push_mpls() and set_mpls().
- Rename skb_cb_mpls_bos() as skb_cb_mpls_stack().
It returns the top not the bottom of the stack.
- Track the current eth_type in validate_and_copy_actions
which is initially the eth_type of the flow and may be modified
by push_mpls and pop_mpls actions. Use this to correctly validate
mpls_set actions. This is to allow mpls_set actions to be applied
to a non-MPLS frame after an mpls_push action (although ovs-vswitchd
doesn't currently do that).
Also:
+ Remove the check of the eth_type in set_mpls() as the new validation
scheme should ensure it cannot be incorrect.
+ Use the current eth_type to validate mpls_pop actions and remove
the eth_type check from pop_mpls().
- Move OVS_KEY_ATTR_MPLS to non-upstream group in ovs_key_lens
- Remove unnecessary memset of mpls_key in ovs_flow_to_nlattrs()
- Make a union of the mpls and ip elements of struct sw_flow_key.
Currently the code stops parsing after an MPLS header so it is
not possible for the ip and mpls elements to be used simultaneously
and some space can be saved by using a union.
- Allow an array of MPLS key attributes
+ Currently all but the first element is ignored
+ User-space needs to be updated to accept more than one element,
currently it will treat their presence as an error
- Do not update network header in ovs_flow_extract() after parsing
the MPLS stack as it is never used because no l3+ processing
occurs on MPLS frames.
- Allow multiple MPLS entries in a match by allowing the OVS_KEY_ATTR_MPLS
to be an array of struct ovs_key_mpls with at least one entry.
Currently only one entry is used which is byte-for-byte compatible with
the previous scheme of having OVS_KEY_ATTR_MPLS as a struct
ovs_key_mpls.
* Make skb writable in pop_mpls(), push_mpls() and set_mpls().
v2.18 - v2.19
* No change
v2.17
* As suggested by Ben Pfaff
- Use consistent terminology for MPLS.
+ Consistently refer to the MPLS component of a packet as the
MPLS label stack and entries in the stack as MPLS label stack entries
(LSE). An MPLS label is a component of an MPLS label stack entry.
The other components are the traffic class (TC), time to live (TTL)
and bottom of stack (BoS) bit.
- Rename compose_.*mpls_ functions as execute_.*mpls_
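To make the LSE terminology in the v2.17 notes concrete, the following userspace sketch splits a host-byte-order LSE into its label, TC, BoS and TTL fields using the RFC 3032 layout (20-bit label, 3-bit TC, 1-bit BoS, 8-bit TTL). The struct and function names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical decoded view of one MPLS label stack entry (LSE). */
struct sketch_mpls_lse {
	uint32_t label; /* 20-bit label */
	uint8_t tc;     /* 3-bit traffic class */
	uint8_t bos;    /* 1 if this is the bottom of the stack */
	uint8_t ttl;    /* 8-bit time to live */
};

/* Split an LSE given in host byte order into its four components.
 * A kernel caller would convert from __be32 with ntohl() first. */
static struct sketch_mpls_lse sketch_mpls_decode(uint32_t lse)
{
	struct sketch_mpls_lse out;

	out.label = (lse >> 12) & 0xFFFFFu;
	out.tc = (uint8_t)((lse >> 9) & 0x7u);
	out.bos = (uint8_t)((lse >> 8) & 0x1u);
	out.ttl = (uint8_t)(lse & 0xFFu);
	return out;
}
```

For example, the LSE 0x000C8140 decodes to label 200, TC 0, BoS 1 and TTL 64.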
v2.16
* No change
v2.15
* As suggested by Ben Pfaff
- Use OVS_ACTION_SET to set OVS_KEY_ATTR_MPLS instead of
OVS_ACTION_ATTR_SET_MPLS
v2.14
* Remove include/linux/openvswitch.h portion which added
new key and action attributes. This is
now present in "User-Space MPLS actions and matches"
which is now a dependency of this patch
v2.13
* As suggested by Jarno Rajahalme
- Rename mpls_bos element of ovs_skb_cb as l2_size as it is set and used
regardless of if an MPLS stack is present or not. Update the name of
helper functions and documentation accordingly.
- Ensure that skb_cb_mpls_bos() never returns NULL
* Correct endianness in eth_p_mpls()
v2.12
* Update skb and network header on MPLS extraction in ovs_flow_extract()
* Use NULL in skb_cb_mpls_bos()
* Add eth_p_mpls helper
v2.10 - v2.11
* No change
v2.9
* datapath: Always update the mpls bos if vlan_pop is successful
Regardless of the details of how a successful
vlan_pop is achieved, the mpls bos needs to be updated.
Without this fix it has been observed that the following
results in malformed packets
v2.8
* No change
v2.7
* Rebase
v2.6
* As suggested by Yamahata-san
- Do not guard against label == 0 for
OVS_ACTION_ATTR_SET_MPLS in validate_actions().
A label of 0 is valid
- Remove comment stipulating that if
the top_label element of struct sw_flow_key is 0 then
there is no MPLS label. An MPLS label of 0 is valid
and the correct check is whether the ethertype is
ntohs(ETH_TYPE_MPLS) or ntohs(ETH_TYPE_MPLS_MCAST)
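The point of the v2.6 note is that the presence of MPLS has to be decided from the ethertype, never from the label value. A minimal userspace sketch follows; the names are hypothetical, and the ethertype is taken in host byte order for simplicity, whereas the datapath helper eth_p_mpls() compares big-endian values.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_P_MPLS_UC 0x8847 /* MPLS unicast ethertype */
#define SKETCH_ETH_P_MPLS_MC 0x8848 /* MPLS multicast ethertype */

/* Hypothetical host-byte-order analogue of eth_p_mpls(). */
static bool sketch_eth_p_mpls(uint16_t eth_type)
{
	return eth_type == SKETCH_ETH_P_MPLS_UC ||
	       eth_type == SKETCH_ETH_P_MPLS_MC;
}

/* Decide whether a flow carries MPLS. The label is deliberately ignored:
 * label 0 is a valid label, so it cannot be used as a "no MPLS" marker. */
static bool sketch_flow_has_mpls(uint16_t eth_type, uint32_t top_label)
{
	(void)top_label;
	return sketch_eth_p_mpls(eth_type);
}
```

A frame with ethertype 0x8847 and top label 0 is therefore still treated as MPLS, while an IPv4 frame is not, whatever value the label field happens to hold.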
v2.4 - v2.5
* No change
v2.3
* s/mpls_stack/mpls_bos/
This is in keeping with the naming used in the OpenFlow 1.3 specification
v2.2
* Call skb_reset_mac_header() in skb_cb_set_mpls_stack()
eth_hdr(skb) is non-NULL when called in skb_cb_set_mpls_stack().
* Add a call to skb_cb_set_mpls_stack() in ovs_packet_cmd_execute().
I apologise that I have mislaid my notes on this but
it avoids a kernel panic. I can investigate again if necessary.
* Use struct ovs_action_push_mpls instead of
__be16 to decode OVS_ACTION_ATTR_PUSH_MPLS in validate_actions(). This is
consistent with the data format for the attribute.
* Indentation fix in skb_cb_mpls_stack(). [cosmetic]
v2.1
* Manual rebase
Conflicts:
datapath/linux/compat/include/linux/netdevice.h
datapath/linux/compat/netdevice.c
---
OPENFLOW-1.1+ | 12 -
datapath/Modules.mk | 1 +
datapath/actions.c | 119 +++++++++-
datapath/datapath.c | 4 +-
datapath/flow.c | 29 +++
datapath/flow.h | 17 +-
datapath/flow_netlink.c | 296 ++++++++++++++++++++++--
datapath/flow_netlink.h | 2 +-
datapath/linux/compat/gso.c | 70 +++++-
datapath/linux/compat/gso.h | 41 ++++
datapath/linux/compat/include/linux/netdevice.h | 6 +-
datapath/linux/compat/netdevice.c | 10 +-
datapath/mpls.h | 15 ++
include/linux/openvswitch.h | 7 +-
14 files changed, 567 insertions(+), 62 deletions(-)
create mode 100644 datapath/mpls.h
diff --git a/OPENFLOW-1.1+ b/OPENFLOW-1.1+
index eaf2ee9..75d9a09 100644
--- a/OPENFLOW-1.1+
+++ b/OPENFLOW-1.1+
@@ -59,10 +59,6 @@ probably incomplete.
behavior does not change.
[required for OF1.1 and OF1.2]
- * MPLS. Simon Horman maintains a patch series that adds this
- feature. This is partially merged.
- [optional for OF1.1+]
-
* Match and set double-tagged VLANs (QinQ). This requires kernel
work for reasonable performance.
[optional for OF1.1+]
@@ -121,18 +117,10 @@ didn't compare the specs carefully yet.)
some kind of "hardware" support, if we judged it useful enough.)
[optional for OF1.3+]
- * MPLS BoS matching.
- Part of MPLS patchset by Simon Horman.
- [optional for OF1.3+]
-
* Provider Backbone Bridge tagging. I don't plan to implement
this (but we'd accept an implementation).
[optional for OF1.3+]
- * Rework tag order.
- Part of MPLS patchset by Simon Horman.
- [required for v1.3+]
-
* On-demand flow counters. I think this might be a real
optimization in some cases for the software switch.
[optional for OF1.3+]
diff --git a/datapath/Modules.mk b/datapath/Modules.mk
index b652411..6aa80e5 100644
--- a/datapath/Modules.mk
+++ b/datapath/Modules.mk
@@ -26,6 +26,7 @@ openvswitch_headers = \
flow.h \
flow_netlink.h \
flow_table.h \
+ mpls.h \
vlan.h \
vport.h \
vport-internal_dev.h \
diff --git a/datapath/actions.c b/datapath/actions.c
index 30ea1d2..4820ff5 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -35,6 +35,8 @@
#include <net/sctp/checksum.h>
#include "datapath.h"
+#include "gso.h"
+#include "mpls.h"
#include "vlan.h"
#include "vport.h"
@@ -49,6 +51,101 @@ static int make_writable(struct sk_buff *skb, int write_len)
return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
}
+/* The end of the mac header.
+ *
+ * For non-MPLS skbs this will correspond to the network header.
+ * For MPLS skbs it will be before the network_header as the MPLS
+ * label stack lies between the end of the mac header and the network
+ * header. That is, for MPLS skbs the end of the mac header
+ * is the top of the MPLS label stack.
+ */
+static unsigned char *mac_header_end(const struct sk_buff *skb)
+{
+ return skb_mac_header(skb) + skb->mac_len;
+}
+
+static int push_mpls(struct sk_buff *skb,
+ const struct ovs_action_push_mpls *mpls)
+{
+ __be32 *new_mpls_lse;
+ struct ethhdr *hdr;
+
+ if (skb_cow_head(skb, MPLS_HLEN) < 0) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+
+ skb_push(skb, MPLS_HLEN);
+ memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+ skb->mac_len);
+ skb_reset_mac_header(skb);
+
+ new_mpls_lse = (__be32 *)mac_header_end(skb);
+ *new_mpls_lse = mpls->mpls_lse;
+
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
+ MPLS_HLEN, 0));
+
+ hdr = eth_hdr(skb);
+ hdr->h_proto = mpls->mpls_ethertype;
+ skb->protocol = mpls->mpls_ethertype;
+ return 0;
+}
+
+static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+{
+ struct ethhdr *hdr;
+ int err;
+
+ if (unlikely(skb->len < skb->mac_len + MPLS_HLEN))
+ return -EINVAL;
+
+ err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+ if (unlikely(err))
+ return err;
+
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_sub(skb->csum,
+ csum_partial(mac_header_end(skb),
+ MPLS_HLEN, 0));
+
+ memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+ skb->mac_len);
+
+ __skb_pull(skb, MPLS_HLEN);
+ skb_reset_mac_header(skb);
+
+ /* mac_header_end() is used to locate the ethertype
+ * field correctly in the presence of VLAN tags.
+ */
+ hdr = (struct ethhdr *)(mac_header_end(skb) - ETH_HLEN);
+ hdr->h_proto = ethertype;
+ if (eth_p_mpls(skb->protocol))
+ skb->protocol = ethertype;
+ return 0;
+}
+
+static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+{
+ __be32 *stack = (__be32 *)mac_header_end(skb);
+ int err;
+
+ err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+ if (unlikely(err))
+ return err;
+
+ if (skb->ip_summed == CHECKSUM_COMPLETE) {
+ __be32 diff[] = { ~(*stack), *mpls_lse };
+ skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+ ~skb->csum);
+ }
+
+ *stack = *mpls_lse;
+
+ return 0;
+}
+
/* remove VLAN header from packet and update csum accordingly. */
static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
{
@@ -71,7 +168,8 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
vlan_set_encap_proto(skb, vhdr);
skb->mac_header += VLAN_HLEN;
- skb_reset_mac_len(skb);
+ /* Update mac_len for subsequent MPLS actions */
+ skb->mac_len -= VLAN_HLEN;
return 0;
}
@@ -116,6 +214,9 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
return -ENOMEM;
+ /* Update mac_len for subsequent MPLS actions */
+ skb->mac_len += VLAN_HLEN;
+
if (skb->ip_summed == CHECKSUM_COMPLETE)
skb->csum = csum_add(skb->csum, csum_partial(skb->data
+ (2 * ETH_ALEN), VLAN_HLEN, 0));
@@ -501,6 +602,10 @@ static int execute_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_SCTP:
err = set_sctp(skb, nla_data(nested_attr));
break;
+
+ case OVS_KEY_ATTR_MPLS:
+ err = set_mpls(skb, nla_data(nested_attr));
+ break;
}
return err;
@@ -536,6 +641,16 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
output_userspace(dp, skb, a);
break;
+ case OVS_ACTION_ATTR_PUSH_MPLS:
+ err = push_mpls(skb, nla_data(a));
+ if (unlikely(err)) /* skb already freed. */
+ return err;
+ break;
+
+ case OVS_ACTION_ATTR_POP_MPLS:
+ err = pop_mpls(skb, nla_get_be16(a));
+ break;
+
case OVS_ACTION_ATTR_PUSH_VLAN:
err = push_vlan(skb, nla_data(a));
if (unlikely(err)) /* skb already freed. */
@@ -609,6 +724,8 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
goto out_loop;
}
+ ovs_skb_init_inner_protocol(skb);
+
OVS_CB(skb)->tun_key = NULL;
error = do_execute_actions(dp, skb, acts->actions,
acts->actions_len, false);
diff --git a/datapath/datapath.c b/datapath/datapath.c
index d528ba0..44ad3f1 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -543,7 +543,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
goto err_flow_free;
err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
- &flow->key, 0, &acts);
+ &flow->key, &acts);
rcu_assign_pointer(flow->sf_acts, acts);
if (err)
goto err_flow_free;
@@ -806,7 +806,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
ovs_flow_mask_key(&masked_key, &key, &mask);
error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
- &masked_key, 0, &acts);
+ &masked_key, &acts);
if (error) {
OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
goto err_kfree;
diff --git a/datapath/flow.c b/datapath/flow.c
index abe6789..e20828b 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -45,6 +45,7 @@
#include <net/ipv6.h>
#include <net/ndisc.h>
+#include "mpls.h"
#include "vlan.h"
u64 ovs_flow_used_time(unsigned long flow_jiffies)
@@ -481,6 +482,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
return -ENOMEM;
skb_reset_network_header(skb);
+ skb_reset_mac_len(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
/* Network layer. */
@@ -564,6 +566,33 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
memcpy(key->ipv4.arp.sha, arp->ar_sha, ETH_ALEN);
memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
}
+ } else if (eth_p_mpls(key->eth.type)) {
+ size_t stack_len = MPLS_HLEN;
+
+ /* In the presence of an MPLS label stack the end of the L2
+ * header and the beginning of the L3 header differ.
+ *
+ * Advance network_header to the beginning of the L3
+ * header. mac_len corresponds to the end of the L2 header.
+ */
+ while (1) {
+ __be32 lse;
+
+ error = check_header(skb, skb->mac_len + stack_len);
+ if (unlikely(error))
+ return 0;
+
+ memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+
+ if (stack_len == MPLS_HLEN)
+ memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+
+ skb_set_network_header(skb, skb->mac_len + stack_len);
+ if (lse & htonl(MPLS_BOS_MASK))
+ break;
+
+ stack_len += MPLS_HLEN;
+ }
} else if (key->eth.type == htons(ETH_P_IPV6)) {
int nh_len; /* IPv6 Header + Extensions */
diff --git a/datapath/flow.h b/datapath/flow.h
index eafcfd8..86ea5f5 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -80,12 +80,17 @@ struct sw_flow_key {
__be16 tci; /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
__be16 type; /* Ethernet frame type. */
} eth;
- struct {
- u8 proto; /* IP protocol or lower 8 bits of ARP opcode. */
- u8 tos; /* IP ToS. */
- u8 ttl; /* IP TTL/hop limit. */
- u8 frag; /* One of OVS_FRAG_TYPE_*. */
- } ip;
+ union {
+ struct {
+ __be32 top_lse; /* top label stack entry */
+ } mpls;
+ struct {
+ u8 proto; /* IP protocol or lower 8 bits of ARP opcode. */
+ u8 tos; /* IP ToS. */
+ u8 ttl; /* IP TTL/hop limit. */
+ u8 frag; /* One of OVS_FRAG_TYPE_*. */
+ } ip;
+ };
union {
struct {
struct {
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 39fe4bf..86e0950 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -20,6 +20,7 @@
#include "flow.h"
#include "datapath.h"
+#include "mpls.h"
#include <linux/uaccess.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
@@ -122,7 +123,8 @@ static bool match_validate(const struct sw_flow_match *match,
| (1ULL << OVS_KEY_ATTR_ICMP)
| (1ULL << OVS_KEY_ATTR_ICMPV6)
| (1ULL << OVS_KEY_ATTR_ARP)
- | (1ULL << OVS_KEY_ATTR_ND));
+ | (1ULL << OVS_KEY_ATTR_ND)
+ | (1ULL << OVS_KEY_ATTR_MPLS));
/* Always allowed mask fields. */
mask_allowed |= ((1ULL << OVS_KEY_ATTR_TUNNEL)
@@ -137,6 +139,13 @@ static bool match_validate(const struct sw_flow_match *match,
mask_allowed |= 1ULL << OVS_KEY_ATTR_ARP;
}
+
+ if (eth_p_mpls(match->key->eth.type)) {
+ key_expected |= 1ULL << OVS_KEY_ATTR_MPLS;
+ if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
+ mask_allowed |= 1ULL << OVS_KEY_ATTR_MPLS;
+ }
+
if (match->key->eth.type == htons(ETH_P_IP)) {
key_expected |= 1ULL << OVS_KEY_ATTR_IPV4;
if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
@@ -252,6 +261,7 @@ static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
[OVS_KEY_ATTR_TUNNEL] = -1,
+ [OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
};
static bool is_all_zero(const u8 *fp, size_t size)
@@ -662,6 +672,16 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, bool *exact_5tuple
attrs &= ~(1ULL << OVS_KEY_ATTR_ARP);
}
+ if (attrs & (1ULL << OVS_KEY_ATTR_MPLS)) {
+ const struct ovs_key_mpls *mpls_key;
+
+ mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
+ SW_FLOW_KEY_PUT(match, mpls.top_lse,
+ mpls_key->mpls_lse, is_mask);
+
+ attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS);
+ }
+
if (attrs & (1ULL << OVS_KEY_ATTR_TCP)) {
const struct ovs_key_tcp *tcp_key;
@@ -1061,6 +1081,14 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
arp_key->arp_op = htons(output->ip.proto);
memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN);
memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN);
+ } else if (eth_p_mpls(swkey->eth.type)) {
+ struct ovs_key_mpls *mpls_key;
+
+ nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+ if (!nla)
+ goto nla_put_failure;
+ mpls_key = nla_data(nla);
+ mpls_key->mpls_lse = output->mpls.top_lse;
}
if ((swkey->eth.type == htons(ETH_P_IP) ||
@@ -1269,15 +1297,133 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
a->nla_len = sfa->actions_len - st_offset;
}
+#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
+
+/* struct eth_types - possible eth types
+ * @types: provides storage for the possible eth types.
+ * @start: is the index of the first entry of types which is possible.
+ * @end: is the index of the last entry of types which is possible.
+ * @cursor: is the index of the entry which should be updated if an action
+ * changes the eth type.
+ *
+ * Due to the sample action there may be multiple possible eth types.
+ * In order to correctly validate actions all possible types are tracked
+ * and verified. This is done using struct eth_types.
+ *
+ * Initially start, end and cursor should be 0, and the first element of
+ * types should be set to the eth type of the flow.
+ *
+ * When an action changes the eth type then the values of start and end are
+ * updated to the value of cursor. The new type is stored at types[cursor].
+ *
+ * When entering a sample action the start and cursor values are saved. The
+ * value of cursor is set to the value of end plus one.
+ *
+ * When leaving a sample action the start and cursor values are restored to
+ * their saved values.
+ *
+ * An example follows.
+ *
+ * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
+ *
+ * 0. Initial state:
+ * types = { original_eth_type }
+ * start = end = cursor = 0;
+ *
+ * 1. pop_mpls(A)
+ * a. Check types from start (0) to end (0) inclusive
+ * i.e. Check against original_eth_type
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = A
+ * New state:
+ * types = { A }
+ * start = end = cursor = 0;
+ *
+ * 2. Enter first sample()
+ * a. Save start and cursor
+ * b. Set cursor = end + 1
+ * New state:
+ * types = { A }
+ * start = end = 0;
+ * cursor = 1;
+ *
+ * 3. pop_mpls(B)
+ * a. Check types from start (0) to end (0)
+ * i.e: Check against A
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = B
+ * New state:
+ * types = { A, B }
+ * start = end = cursor = 1;
+ *
+ * 4. Leave first sample()
+ * a. Restore start and cursor to the values when entering 2.
+ * New state:
+ * types = { A, B }
+ * start = cursor = 0;
+ * end = 1;
+ *
+ * 5. Enter second sample()
+ * a. Save start and cursor
+ * b. Set cursor = end + 1
+ * New state:
+ * types = { A, B }
+ * start = 0;
+ * end = 1;
+ * cursor = 2;
+ *
+ * 6. pop_mpls(C)
+ * a. Check types from start (0) to end (1) inclusive
+ * i.e: Check against A and B
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = C
+ * New state:
+ * types = { A, B, C }
+ * start = end = cursor = 2;
+ *
+ * 7. Leave second sample()
+ * a. Restore start and cursor to the values when entering 5.
+ * New state:
+ * types = { A, B, C }
+ * start = cursor = 0;
+ * end = 2;
+ *
+ * 8. pop_mpls(D)
+ * a. Check types from start (0) to end (2) inclusive
+ * i.e: Check against A, B and C
+ * b. Set start = end = cursor
+ * c. Set types[cursor] = D
+ * New state:
+ * types = { D } // Trailing entries of types are no longer used as end = 0
+ * start = end = cursor = 0;
+ */
+struct eth_types {
+ int start, end, cursor;
+ __be16 types[MAX_ETH_TYPES];
+};
+
+static void eth_types_set(struct eth_types *types, __be16 type)
+{
+ types->start = types->end = types->cursor;
+ types->types[types->cursor] = type;
+}
+
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ int depth,
+ struct sw_flow_actions **sfa,
+ struct eth_types *eth_types);
static int validate_and_copy_sample(const struct nlattr *attr,
const struct sw_flow_key *key, int depth,
- struct sw_flow_actions **sfa)
+ struct sw_flow_actions **sfa,
+ struct eth_types *eth_types)
{
const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
const struct nlattr *probability, *actions;
const struct nlattr *a;
int rem, start, err, st_acts;
+ int saved_eth_types_start, saved_eth_types_cursor;
memset(attrs, 0, sizeof(attrs));
nla_for_each_nested(a, attr, rem) {
@@ -1309,22 +1455,38 @@ static int validate_and_copy_sample(const struct nlattr *attr,
if (st_acts < 0)
return st_acts;
- err = ovs_nla_copy_actions(actions, key, depth + 1, sfa);
+ /* Save and update eth_types cursor and start. Please see the
+ * comment for struct eth_types for a discussion of this.
+ */
+ saved_eth_types_start = eth_types->start;
+ saved_eth_types_cursor = eth_types->cursor;
+ eth_types->cursor = eth_types->end + 1;
+ if (eth_types->cursor == MAX_ETH_TYPES)
+ return -EINVAL;
+
+ err = ovs_nla_copy_actions__(actions, key, depth + 1, sfa, eth_types);
if (err)
return err;
+ /* Restore eth_types cursor and start. Please see the
+ * comment for struct eth_types for a discussion of this.
+ */
+ eth_types->cursor = saved_eth_types_cursor;
+ eth_types->start = saved_eth_types_start;
+
add_nested_action_end(*sfa, st_acts);
add_nested_action_end(*sfa, start);
return 0;
}
-static int validate_tp_port(const struct sw_flow_key *flow_key)
+static int validate_tp_port__(const struct sw_flow_key *flow_key,
+ __be16 eth_type)
{
- if (flow_key->eth.type == htons(ETH_P_IP)) {
+ if (eth_type == htons(ETH_P_IP)) {
if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst)
return 0;
- } else if (flow_key->eth.type == htons(ETH_P_IPV6)) {
+ } else if (eth_type == htons(ETH_P_IPV6)) {
if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst)
return 0;
}
@@ -1332,6 +1494,21 @@ static int validate_tp_port(const struct sw_flow_key *flow_key)
return -EINVAL;
}
+static int validate_tp_port(const struct sw_flow_key *flow_key,
+ const struct eth_types *eth_types)
+{
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++) {
+ int ret = validate_tp_port__(flow_key, eth_types->types[i]);
+
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
void ovs_match_init(struct sw_flow_match *match,
struct sw_flow_key *key,
struct sw_flow_mask *mask)
@@ -1374,7 +1551,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
static int validate_set(const struct nlattr *a,
const struct sw_flow_key *flow_key,
struct sw_flow_actions **sfa,
- bool *set_tun)
+ bool *set_tun, struct eth_types *eth_types)
{
const struct nlattr *ovs_key = nla_data(a);
int key_type = nla_type(ovs_key);
@@ -1405,9 +1582,12 @@ static int validate_set(const struct nlattr *a,
return err;
break;
- case OVS_KEY_ATTR_IPV4:
- if (flow_key->eth.type != htons(ETH_P_IP))
- return -EINVAL;
+ case OVS_KEY_ATTR_IPV4: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (eth_types->types[i] != htons(ETH_P_IP))
+ return -EINVAL;
if (!flow_key->ip.proto)
return -EINVAL;
@@ -1420,10 +1600,14 @@ static int validate_set(const struct nlattr *a,
return -EINVAL;
break;
+ }
- case OVS_KEY_ATTR_IPV6:
- if (flow_key->eth.type != htons(ETH_P_IPV6))
- return -EINVAL;
+ case OVS_KEY_ATTR_IPV6: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (eth_types->types[i] != htons(ETH_P_IPV6))
+ return -EINVAL;
if (!flow_key->ip.proto)
return -EINVAL;
@@ -1439,24 +1623,35 @@ static int validate_set(const struct nlattr *a,
return -EINVAL;
break;
+ }
+
case OVS_KEY_ATTR_TCP:
if (flow_key->ip.proto != IPPROTO_TCP)
return -EINVAL;
- return validate_tp_port(flow_key);
+ return validate_tp_port(flow_key, eth_types);
case OVS_KEY_ATTR_UDP:
if (flow_key->ip.proto != IPPROTO_UDP)
return -EINVAL;
- return validate_tp_port(flow_key);
+ return validate_tp_port(flow_key, eth_types);
+
+ case OVS_KEY_ATTR_MPLS: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (!eth_p_mpls(eth_types->types[i]))
+ return -EINVAL;
+ break;
+ }
case OVS_KEY_ATTR_SCTP:
if (flow_key->ip.proto != IPPROTO_SCTP)
return -EINVAL;
- return validate_tp_port(flow_key);
+ return validate_tp_port(flow_key, eth_types);
default:
return -EINVAL;
@@ -1500,10 +1695,11 @@ static int copy_action(const struct nlattr *from,
return 0;
}
-int ovs_nla_copy_actions(const struct nlattr *attr,
- const struct sw_flow_key *key,
- int depth,
- struct sw_flow_actions **sfa)
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ int depth,
+ struct sw_flow_actions **sfa,
+ struct eth_types *eth_types)
{
const struct nlattr *a;
int rem, err;
@@ -1516,6 +1712,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
+ [OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls),
+ [OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16),
[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
[OVS_ACTION_ATTR_POP_VLAN] = 0,
[OVS_ACTION_ATTR_SET] = (u32)-1,
@@ -1558,14 +1756,54 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
return -EINVAL;
break;
+ case OVS_ACTION_ATTR_PUSH_MPLS: {
+ int i;
+ const struct ovs_action_push_mpls *mpls = nla_data(a);
+
+ if (!eth_p_mpls(mpls->mpls_ethertype))
+ return -EINVAL;
+ /* Prohibit push MPLS in the presence of VLANs */
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (eth_types->types[i] == htons(ETH_P_8021Q) ||
+ eth_types->types[i] == htons(ETH_P_8021AD) ||
+ eth_types->types[i] == htons(ETH_P_QINQ1) ||
+ eth_types->types[i] == htons(ETH_P_QINQ2) ||
+ eth_types->types[i] == htons(ETH_P_QINQ3))
+ return -EINVAL;
+ eth_types_set(eth_types, mpls->mpls_ethertype);
+ break;
+ }
+
+ case OVS_ACTION_ATTR_POP_MPLS: {
+ int i;
+
+ for (i = eth_types->start; i <= eth_types->end; i++)
+ if (!eth_p_mpls(eth_types->types[i]))
+ return -EINVAL;
+
+ /* Disallow subsequent L2.5+ set and mpls_pop actions
+ * as there is no check here to ensure that the new
+ * eth_type is valid and thus set actions could
+ * write off the end of the packet or otherwise
+ * corrupt it.
+ *
+ * Support for these actions is planned using packet
+ * recirculation.
+ */
+ eth_types_set(eth_types, htons(0));
+ break;
+ }
+
case OVS_ACTION_ATTR_SET:
- err = validate_set(a, key, sfa, &skip_copy);
+ err = validate_set(a, key, sfa, &skip_copy,
+ eth_types);
if (err)
return err;
break;
case OVS_ACTION_ATTR_SAMPLE:
- err = validate_and_copy_sample(a, key, depth, sfa);
+ err = validate_and_copy_sample(a, key, depth, sfa,
+ eth_types);
if (err)
return err;
skip_copy = true;
@@ -1587,6 +1825,20 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
return 0;
}
+int ovs_nla_copy_actions(const struct nlattr *attr,
+ const struct sw_flow_key *key,
+ struct sw_flow_actions **sfa)
+{
+ struct eth_types eth_type = {
+ .start = 0,
+ .end = 0,
+ .cursor = 0,
+ .types = { key->eth.type, },
+ };
+
+ return ovs_nla_copy_actions__(attr, key, 0, sfa, &eth_type);
+}
+
static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
{
const struct nlattr *a;
diff --git a/datapath/flow_netlink.h b/datapath/flow_netlink.h
index b31fbe2..41d2673 100644
--- a/datapath/flow_netlink.h
+++ b/datapath/flow_netlink.h
@@ -50,7 +50,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
const struct nlattr *);
int ovs_nla_copy_actions(const struct nlattr *attr,
- const struct sw_flow_key *key, int depth,
+ const struct sw_flow_key *key,
struct sw_flow_actions **sfa);
int ovs_nla_put_actions(const struct nlattr *attr,
int len, struct sk_buff *skb);
diff --git a/datapath/linux/compat/gso.c b/datapath/linux/compat/gso.c
index 32f906c..7461f57 100644
--- a/datapath/linux/compat/gso.c
+++ b/datapath/linux/compat/gso.c
@@ -19,6 +19,7 @@
#include <linux/module.h>
#include <linux/if.h>
#include <linux/if_tunnel.h>
+#include <linux/if_vlan.h>
#include <linux/icmp.h>
#include <linux/in.h>
#include <linux/ip.h>
@@ -35,6 +36,8 @@
#include <net/xfrm.h>
#include "gso.h"
+#include "mpls.h"
+#include "vlan.h"
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37) && \
!defined(HAVE_VLAN_BUG_WORKAROUND)
@@ -47,10 +50,12 @@ MODULE_PARM_DESC(vlan_tso, "Enable TSO for VLAN packets");
#define vlan_tso true
#endif
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
static bool dev_supports_vlan_tx(struct net_device *dev)
{
-#if defined(HAVE_VLAN_BUG_WORKAROUND)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+ return true;
+#elif defined(HAVE_VLAN_BUG_WORKAROUND)
return dev->features & NETIF_F_HW_VLAN_TX;
#else
/* Assume that the driver is buggy. */
@@ -58,24 +63,64 @@ static bool dev_supports_vlan_tx(struct net_device *dev)
#endif
}
+/* Strictly this is not needed and will be optimised out
+ * as this code is guarded by if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0).
+ * It is here to make things explicit should the compatibility
+ * code be extended in some way prior to extending its life-span
+ * beyond v3.11.
+ */
+static bool supports_mpls_gso(void)
+{
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
+ return true;
+#else
+ return false;
+#endif
+}
+
int rpl_dev_queue_xmit(struct sk_buff *skb)
{
#undef dev_queue_xmit
int err = -ENOMEM;
+ bool vlan, mpls;
+
+ vlan = mpls = false;
+
+ if (eth_p_mpls(skb->protocol) && !supports_mpls_gso())
+ mpls = true;
+
+ if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev))
+ vlan = true;
- if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
+ if (vlan || mpls) {
int features;
features = netif_skb_features(skb);
- if (!vlan_tso)
- features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
- NETIF_F_UFO | NETIF_F_FSO);
+ if (vlan) {
+ if (!vlan_tso)
+ features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
+ NETIF_F_UFO | NETIF_F_FSO);
- skb = __vlan_put_tag(skb, skb->vlan_proto, vlan_tx_tag_get(skb));
- if (unlikely(!skb))
- return err;
- vlan_set_tci(skb, 0);
+ skb = __vlan_put_tag(skb, skb->vlan_proto,
+ vlan_tx_tag_get(skb));
+ if (unlikely(!skb))
+ return err;
+ vlan_set_tci(skb, 0);
+ }
+
+ /* As of v3.11 the kernel provides an mpls_features field in
+ * struct net_device which allows devices to advertise which
+ * features they support for MPLS. This value defaults to
+ * NETIF_F_SG as of v3.11.
+ *
+ * This compatibility code is intended for kernels older
+ * than v3.11 that do not support MPLS GSO and thus do not
+ * provide mpls_features. Thus this code uses NETIF_F_SG
+ * directly in place of mpls_features.
+ */
+ if (mpls)
+ features &= NETIF_F_SG;
if (netif_needs_gso(skb, features)) {
struct sk_buff *nskb;
@@ -114,13 +159,16 @@ drop:
kfree_skb(skb);
return err;
}
-#endif /* kernel version < 2.6.37 */
+#endif /* kernel version < 3.11.0 */
static __be16 __skb_network_protocol(struct sk_buff *skb)
{
__be16 type = skb->protocol;
int vlan_depth = ETH_HLEN;
+ if (eth_p_mpls(skb->protocol))
+ type = ovs_skb_get_inner_protocol(skb);
+
while (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) {
struct vlan_hdr *vh;
diff --git a/datapath/linux/compat/gso.h b/datapath/linux/compat/gso.h
index 44fd213..d7a9cea 100644
--- a/datapath/linux/compat/gso.h
+++ b/datapath/linux/compat/gso.h
@@ -1,6 +1,7 @@
#ifndef __LINUX_GSO_WRAPPER_H
#define __LINUX_GSO_WRAPPER_H
+#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <net/protocol.h>
@@ -11,6 +12,9 @@ struct ovs_gso_cb {
sk_buff_data_t inner_network_header;
sk_buff_data_t inner_mac_header;
void (*fix_segment)(struct sk_buff *);
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+ __be16 inner_protocol;
+#endif
};
#define OVS_GSO_CB(skb) ((struct ovs_gso_cb *)(skb)->cb)
@@ -69,4 +73,41 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
#define ip_local_out rpl_ip_local_out
int ip_local_out(struct sk_buff *skb);
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+static inline void ovs_skb_init_inner_protocol(struct sk_buff *skb) {
+ OVS_GSO_CB(skb)->inner_protocol = htons(0);
+}
+
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+ __be16 ethertype) {
+ OVS_GSO_CB(skb)->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+ return OVS_GSO_CB(skb)->inner_protocol;
+}
+
+#else
+
+static inline void ovs_skb_init_inner_protocol(struct sk_buff *skb) {
+ /* Nothing to do. The inner_protocol is either zero or
+ * has been set to a value by another user.
+ * Either way it may be considered initialised.
+ */
+}
+
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+ __be16 ethertype)
+{
+ skb->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+ return skb->inner_protocol;
+}
+#endif
+
#endif
diff --git a/datapath/linux/compat/include/linux/netdevice.h b/datapath/linux/compat/include/linux/netdevice.h
index e04f308..df69ca6 100644
--- a/datapath/linux/compat/include/linux/netdevice.h
+++ b/datapath/linux/compat/include/linux/netdevice.h
@@ -64,11 +64,13 @@ static inline struct net_device *dev_get_by_index_rcu(struct net *net, int ifind
typedef u32 netdev_features_t;
#endif
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
#define skb_gso_segment rpl_skb_gso_segment
struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
netdev_features_t features);
+#endif
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
#define netif_skb_features rpl_netif_skb_features
netdev_features_t rpl_netif_skb_features(struct sk_buff *skb);
@@ -113,7 +115,7 @@ static inline struct net_device *netdev_master_upper_dev_get(struct net_device *
}
#endif
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
#define dev_queue_xmit rpl_dev_queue_xmit
int dev_queue_xmit(struct sk_buff *skb);
#endif
diff --git a/datapath/linux/compat/netdevice.c b/datapath/linux/compat/netdevice.c
index 1dc5abf..d22fced 100644
--- a/datapath/linux/compat/netdevice.c
+++ b/datapath/linux/compat/netdevice.c
@@ -1,6 +1,9 @@
#include <linux/netdevice.h>
#include <linux/if_vlan.h>
+#include "mpls.h"
+#include "gso.h"
+
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
#ifndef HAVE_CAN_CHECKSUM_PROTOCOL
static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)
@@ -69,7 +72,9 @@ netdev_features_t rpl_netif_skb_features(struct sk_buff *skb)
return harmonize_features(skb, protocol, features);
}
}
+#endif /* kernel version < 2.6.38 */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
netdev_features_t features)
{
@@ -78,6 +83,9 @@ struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
__be16 skb_proto;
struct sk_buff *skb_gso;
+ if (eth_p_mpls(skb->protocol))
+ type = ovs_skb_get_inner_protocol(skb);
+
while (type == htons(ETH_P_8021Q)) {
struct vlan_hdr *vh;
@@ -98,4 +106,4 @@ struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
skb->protocol = skb_proto;
return skb_gso;
}
-#endif /* kernel version < 2.6.38 */
+#endif /* kernel version < 3.11.0 */
diff --git a/datapath/mpls.h b/datapath/mpls.h
new file mode 100644
index 0000000..7eab104
--- /dev/null
+++ b/datapath/mpls.h
@@ -0,0 +1,15 @@
+#ifndef MPLS_H
+#define MPLS_H 1
+
+#include <linux/if_ether.h>
+
+#define MPLS_BOS_MASK 0x00000100
+#define MPLS_HLEN 4
+
+static inline bool eth_p_mpls(__be16 eth_type)
+{
+ return eth_type == htons(ETH_P_MPLS_UC) ||
+ eth_type == htons(ETH_P_MPLS_MC);
+}
+
+#endif
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index d1ff5ec..7205f7b 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -307,14 +307,13 @@ enum ovs_key_attr {
OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */
OVS_KEY_ATTR_SCTP, /* struct ovs_key_sctp */
OVS_KEY_ATTR_TCP_FLAGS, /* be16 TCP flags. */
+ OVS_KEY_ATTR_MPLS, /* array of struct ovs_key_mpls.
+ * The implementation may restrict
+ * the accepted length of the array. */
#ifdef __KERNEL__
OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */
#endif
-
- OVS_KEY_ATTR_MPLS = 62, /* array of struct ovs_key_mpls.
- * The implementation may restrict
- * the accepted length of the array. */
__OVS_KEY_ATTR_MAX
};
--
1.8.5.2
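The eth_types bookkeeping documented in the comment above struct eth_types can be exercised outside the kernel. What follows is a minimal user-space sketch, not part of the patch: uint16_t stands in for __be16, byte order is ignored, and eth_types_all(), sample_enter(), sample_leave() and struct eth_types_saved are hypothetical helpers that wrap the inclusive start..end check and the save/restore done in validate_and_copy_sample().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ETH_TYPES 16 /* same arbitrary limit as the patch */

/* User-space stand-in for the patch's struct eth_types. */
struct eth_types {
	int start, end, cursor;
	uint16_t types[MAX_ETH_TYPES];
};

/* Mirrors eth_types_set(): an action changed the eth type. */
static void eth_types_set(struct eth_types *t, uint16_t type)
{
	assert(t->cursor >= 0 && t->cursor < MAX_ETH_TYPES);
	t->start = t->end = t->cursor;
	t->types[t->cursor] = type;
}

/* Hypothetical helper: true if every possible type in the
 * inclusive range [start, end] equals want. */
static bool eth_types_all(const struct eth_types *t, uint16_t want)
{
	for (int i = t->start; i <= t->end; i++)
		if (t->types[i] != want)
			return false;
	return true;
}

/* Hypothetical holder for the values saved around a sample action. */
struct eth_types_saved {
	int start, cursor;
};

/* Entering sample(): save start and cursor, move cursor past end.
 * Returns false if the types array would overflow. */
static bool sample_enter(struct eth_types *t, struct eth_types_saved *s)
{
	if (t->end + 1 >= MAX_ETH_TYPES)
		return false;
	s->start = t->start;
	s->cursor = t->cursor;
	t->cursor = t->end + 1;
	return true;
}

/* Leaving sample(): restore start and cursor; end is left alone so
 * that types set inside the sample remain possible afterwards. */
static void sample_leave(struct eth_types *t, const struct eth_types_saved *s)
{
	t->start = s->start;
	t->cursor = s->cursor;
}
```

Replaying the pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D) example from the comment with these helpers should reproduce the start, end and cursor values listed in steps 1 to 8.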
* [PATCH net] bonding: Fix deadlock in bonding driver when using netpoll
From: Ding Tianhong @ 2014-02-12 4:06 UTC (permalink / raw)
To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
Netdev
The bonding driver takes write locks and spin locks that are shared
with the tx path during enslave processing and notification processing.
If netconsole is in use, the bonding driver can call printk, which puts
us in the netpoll tx path; if netconsole is attached to the bonding
driver, the result is a deadlock.
Add protection for these places: by checking the netpoll_block_tx
state, we can defer sending the netconsole frames until a later time,
using the retransmit feature of netpoll_send_skb that is triggered by
the return code NETDEV_TX_BUSY.
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
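For illustration only, the deferral described in the commit message can be modelled in user space. This is a simplified sketch, not the bonding code: the counter and the block/unblock helpers borrow their names from the patch, while enum tx_result and model_netpoll_xmit() are hypothetical stand-ins for the netpoll tx path and its NETDEV_TX_OK / NETDEV_TX_BUSY return codes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NETDEV_TX_OK and NETDEV_TX_BUSY. */
enum tx_result { TX_OK, TX_BUSY };

/* Non-zero while some code path holds locks shared with the tx path.
 * Static storage, so it starts at zero (not blocked). */
static atomic_int netpoll_block_tx;

static void block_netpoll_tx(void)
{
	atomic_fetch_add(&netpoll_block_tx, 1);
}

static void unblock_netpoll_tx(void)
{
	int prev = atomic_fetch_sub(&netpoll_block_tx, 1);

	assert(prev > 0); /* every unblock must pair with a block */
}

static bool netpoll_tx_blocked(void)
{
	return atomic_load(&netpoll_block_tx) > 0;
}

/* Hypothetical model of the netpoll transmit entry point: while
 * blocked it reports busy so the caller retries later, instead of
 * trying to take a lock that is already held. */
static enum tx_result model_netpoll_xmit(void)
{
	if (netpoll_tx_blocked())
		return TX_BUSY;
	return TX_OK;
}
```

The counter (rather than a flag) is what lets nested block/unblock pairs work: transmission only resumes once the outermost section has called unblock_netpoll_tx().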
drivers/net/bonding/bond_main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71ba18e..8676649 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1543,9 +1543,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
bond_set_carrier(bond);
if (USES_PRIMARY(bond->params.mode)) {
+ block_netpoll_tx();
write_lock_bh(&bond->curr_slave_lock);
bond_select_active_slave(bond);
write_unlock_bh(&bond->curr_slave_lock);
+ unblock_netpoll_tx();
}
pr_info("%s: enslaving %s as a%s interface with a%s link.\n",
@@ -1571,10 +1573,12 @@ err_detach:
if (bond->primary_slave == new_slave)
bond->primary_slave = NULL;
if (bond->curr_active_slave == new_slave) {
+ block_netpoll_tx();
write_lock_bh(&bond->curr_slave_lock);
bond_change_active_slave(bond, NULL);
bond_select_active_slave(bond);
write_unlock_bh(&bond->curr_slave_lock);
+ unblock_netpoll_tx();
}
slave_disable_netpoll(new_slave);
@@ -2864,9 +2868,12 @@ static int bond_slave_netdev_event(unsigned long event,
pr_info("%s: Primary slave changed to %s, reselecting active slave.\n",
bond->dev->name, bond->primary_slave ? slave_dev->name :
"none");
+
+ block_netpoll_tx();
write_lock_bh(&bond->curr_slave_lock);
bond_select_active_slave(bond);
write_unlock_bh(&bond->curr_slave_lock);
+ unblock_netpoll_tx();
break;
case NETDEV_FEAT_CHANGE:
bond_compute_features(bond);
--
1.8.0
* [PATCH net-next 02/10] net: phy: add MoCA PHY type
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
Some Ethernet MACs are connected to a MoCA PHY which will handle the
low-level job of sending Ethernet frames on the coaxial cable. These
Ethernet MACs need to know about it to be properly configured.
Add a new PHY mode "moca" and update the Device Tree parsing logic to
look for it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
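For illustration only, the kind of string-to-mode lookup that the "moca" entry feeds can be sketched in user space. This is a reduced model under stated assumptions: enum model_phy_mode, model_phy_mode_names and model_phy_mode_from_string() are hypothetical names, only a handful of modes are listed, and the match is an exact strcmp() as a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened stand-in for phy_interface_t. */
enum model_phy_mode {
	MODEL_PHY_MODE_MII,
	MODEL_PHY_MODE_XGMII,
	MODEL_PHY_MODE_INTERNAL,
	MODEL_PHY_MODE_MOCA,
	MODEL_PHY_MODE_COUNT
};

/* Indexed by mode, in the style of the phy_modes[] table. */
static const char *const model_phy_mode_names[MODEL_PHY_MODE_COUNT] = {
	[MODEL_PHY_MODE_MII]      = "mii",
	[MODEL_PHY_MODE_XGMII]    = "xgmii",
	[MODEL_PHY_MODE_INTERNAL] = "internal",
	[MODEL_PHY_MODE_MOCA]     = "moca",
};

/* Returns the mode whose name equals s, or -1 if none matches. */
static int model_phy_mode_from_string(const char *s)
{
	assert(s != NULL);

	for (int i = 0; i < MODEL_PHY_MODE_COUNT; i++) {
		const char *name = model_phy_mode_names[i];

		/* Skip holes left by designated initialisers. */
		if (name && strcmp(s, name) == 0)
			return i;
	}
	return -1;
}
```

Because the table is indexed by the enum, a new mode needs both an enum value and a table entry, which is why the patch touches of_net.c and phy.h together.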
drivers/of/of_net.c | 1 +
include/linux/phy.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
index 729beba..e04f57b 100644
--- a/drivers/of/of_net.c
+++ b/drivers/of/of_net.c
@@ -32,6 +32,7 @@ static const char *phy_modes[] = {
[PHY_INTERFACE_MODE_SMII] = "smii",
[PHY_INTERFACE_MODE_XGMII] = "xgmii",
[PHY_INTERFACE_MODE_INTERNAL] = "internal",
+ [PHY_INTERFACE_MODE_MOCA] = "moca",
};
/**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 463434b..0680261 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -75,6 +75,7 @@ typedef enum {
PHY_INTERFACE_MODE_SMII,
PHY_INTERFACE_MODE_XGMII,
PHY_INTERFACE_MODE_INTERNAL,
+ PHY_INTERFACE_MODE_MOCA,
} phy_interface_t;
--
1.8.3.2
* [PATCH net-next 03/10] net: phy: update port type for MoCA PHYs
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
MoCA PHYs use coaxial (BNC-like) connectors, so update the
transceiver port type when replying to ethtool.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/phy/phy.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 19c9eca..a755fa2 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -283,7 +283,10 @@ int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd)
ethtool_cmd_speed_set(cmd, phydev->speed);
cmd->duplex = phydev->duplex;
- cmd->port = PORT_MII;
+ if (phydev->interface == PHY_INTERFACE_MODE_MOCA)
+ cmd->port = PORT_BNC;
+ else
+ cmd->port = PORT_MII;
cmd->phy_address = phydev->addr;
cmd->transceiver = phy_is_internal(phydev) ?
XCVR_INTERNAL : XCVR_EXTERNAL;
--
1.8.3.2
* [PATCH net-next 05/10] net: bcmgenet: add driver definitions and private structure
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
This patch adds the bcmgenet.h header file which contains all the
hardware definitions for the GENETv1 to v4 hardware blocks as well as
the driver private structure and MIB counters.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
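For illustration only, the UMAC_MDIO_CMD field definitions in this header can be combined into a read command word as sketched below. The MDIO_* values are copied from the header (with an unsigned suffix added); mdio_read_cmd(), mdio_cmd_phy() and mdio_cmd_reg() are hypothetical helpers, and how a driver actually issues and polls the command is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions as in bcmgenet.h. */
#define MDIO_RD        (2u << 26)
#define MDIO_PMD_SHIFT 21
#define MDIO_PMD_MASK  0x1F
#define MDIO_REG_SHIFT 16
#define MDIO_REG_MASK  0x1F

/* Hypothetical helper: build a read command for PHY address phy
 * and register number reg. Both must fit in their 5-bit fields. */
static uint32_t mdio_read_cmd(unsigned int phy, unsigned int reg)
{
	assert(phy <= MDIO_PMD_MASK && reg <= MDIO_REG_MASK);

	return MDIO_RD |
	       ((uint32_t)phy << MDIO_PMD_SHIFT) |
	       ((uint32_t)reg << MDIO_REG_SHIFT);
}

/* Hypothetical helpers: recover the fields from a command word. */
static unsigned int mdio_cmd_phy(uint32_t cmd)
{
	return (cmd >> MDIO_PMD_SHIFT) & MDIO_PMD_MASK;
}

static unsigned int mdio_cmd_reg(uint32_t cmd)
{
	return (cmd >> MDIO_REG_SHIFT) & MDIO_REG_MASK;
}
```

Keeping a shift and a mask for each field, as the header does, lets the same pair of constants serve both for packing and for unpacking.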
drivers/net/ethernet/broadcom/genet/bcmgenet.h | 631 +++++++++++++++++++++++++
1 file changed, 631 insertions(+)
create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.h
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
new file mode 100644
index 0000000..28ba32f
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -0,0 +1,631 @@
+/*
+ * Copyright (c) 2014 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ *
+*/
+#ifndef __BCMGENET_H__
+#define __BCMGENET_H__
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/spinlock.h>
+#include <linux/clk.h>
+#include <linux/mii.h>
+#include <linux/if_vlan.h>
+#include <linux/phy.h>
+
+/* total number of Buffer Descriptors, same for Rx/Tx */
+#define TOTAL_DESC 256
+/* which ring is descriptor based */
+#define DESC_INDEX 16
+/* Body(1500) + EH_SIZE(14) + VLANTAG(4) + BRCMTAG(6) + FCS(4) = 1528.
+ * 1536 is multiple of 256 bytes
+ */
+#define ENET_BRCM_TAG_LEN 6
+#define ENET_PAD 8
+#define ENET_MAX_MTU_SIZE (ETH_DATA_LEN + ETH_HLEN + VLAN_HLEN + \
+ ENET_BRCM_TAG_LEN + ETH_FCS_LEN + ENET_PAD)
+#define DMA_MAX_BURST_LENGTH 0x10
+
+/* misc. configuration */
+#define CLEAR_ALL_HFB 0xFF
+#define DMA_FC_THRESH_HI (TOTAL_DESC >> 4)
+#define DMA_FC_THRESH_LO 5
+
+/* PHY types */
+#define BRCM_PHY_TYPE_INT 0
+#define BRCM_PHY_TYPE_MOCA 1
+
+/* 64B status Block */
+struct status_64 {
+ u32 length_status; /* length and peripheral status */
+ u32 ext_status; /* Extended status*/
+ u32 rx_csum; /* partial rx checksum */
+ u32 unused1[9]; /* unused */
+ u32 tx_csum_info; /* Tx checksum info. */
+ u32 unused2[3]; /* unused */
+};
+
+
+/* Rx status bits */
+#define STATUS_RX_EXT_MASK 0x1FFFFF
+#define STATUS_RX_CSUM_MASK 0xFFFF
+#define STATUS_RX_CSUM_OK 0x10000
+#define STATUS_RX_CSUM_FR 0x20000
+#define STATUS_RX_PROTO_TCP 0
+#define STATUS_RX_PROTO_UDP 1
+#define STATUS_RX_PROTO_ICMP 2
+#define STATUS_RX_PROTO_OTHER 3
+#define STATUS_RX_PROTO_MASK 3
+#define STATUS_RX_PROTO_SHIFT 18
+#define STATUS_FILTER_INDEX_MASK 0xFFFF
+/* Tx status bits */
+#define STATUS_TX_CSUM_START_MASK 0X7FFF
+#define STATUS_TX_CSUM_START_SHIFT 16
+#define STATUS_TX_CSUM_PROTO_UDP 0x8000
+#define STATUS_TX_CSUM_OFFSET_MASK 0x7FFF
+#define STATUS_TX_CSUM_LV 0x80000000
+
+/* DMA Descriptor */
+#define DMA_DESC_LENGTH_STATUS 0x00 /* in bytes of data in buffer */
+#define DMA_DESC_ADDRESS_LO 0x04 /* lower bits of PA */
+#define DMA_DESC_ADDRESS_HI 0x08 /* upper 32 bits of PA, GENETv4+ */
+
+/* Rx/Tx common counter group.*/
+struct bcmgenet_pkt_counters {
+ u32 cnt_64; /* RO Received/Transmitted 64 bytes packet */
+ u32 cnt_127; /* RO Rx/Tx 65-127 bytes packet */
+ u32 cnt_255; /* RO Rx/Tx 128-255 bytes packet */
+ u32 cnt_511; /* RO Rx/Tx 256-511 bytes packet */
+ u32 cnt_1023; /* RO Rx/Tx 512-1023 bytes packet */
+ u32 cnt_1518; /* RO Rx/Tx 1024-1518 bytes packet */
+ u32 cnt_mgv; /* RO Rx/Tx 1519-1522 good VLAN packet */
+ u32 cnt_2047; /* RO Rx/Tx 1522-2047 bytes packet*/
+ u32 cnt_4095; /* RO Rx/Tx 2048-4095 bytes packet*/
+ u32 cnt_9216; /* RO Rx/Tx 4096-9216 bytes packet*/
+};
+
+/* RSV, Receive Status Vector */
+struct bcmgenet_rx_counters {
+ struct bcmgenet_pkt_counters pkt_cnt;
+ u32 pkt; /* RO (0x428) Received pkt count*/
+ u32 bytes; /* RO Received byte count */
+ u32 mca; /* RO # of Received multicast pkt */
+ u32 bca; /* RO # of Receive broadcast pkt */
+ u32 fcs; /* RO # of Received FCS error */
+ u32 cf; /* RO # of Received control frame pkt*/
+ u32 pf; /* RO # of Received pause frame pkt */
+ u32 uo; /* RO # of unknown op code pkt */
+ u32 aln; /* RO # of alignment error count */
+ u32 flr; /* RO # of frame length out of range count */
+ u32 cde; /* RO # of code error pkt */
+ u32 fcr; /* RO # of carrier sense error pkt */
+ u32 ovr; /* RO # of oversize pkt*/
+ u32 jbr; /* RO # of jabber count */
+ u32 mtue; /* RO # of MTU error pkt*/
+ u32 pok; /* RO # of Received good pkt */
+ u32 uc; /* RO # of unicast pkt */
+ u32 ppp; /* RO # of PPP pkt */
+ u32 rcrc; /* RO (0x470),# of CRC match pkt */
+};
+
+/* TSV, Transmit Status Vector */
+struct bcmgenet_tx_counters {
+ struct bcmgenet_pkt_counters pkt_cnt;
+ u32 pkts; /* RO (0x4a8) Transmitted pkt */
+ u32 mca; /* RO # of xmited multicast pkt */
+ u32 bca; /* RO # of xmited broadcast pkt */
+ u32 pf; /* RO # of xmited pause frame count */
+ u32 cf; /* RO # of xmited control frame count */
+ u32 fcs; /* RO # of xmited FCS error count */
+ u32 ovr; /* RO # of xmited oversize pkt */
+ u32 drf; /* RO # of xmited deferral pkt */
+ u32 edf; /* RO # of xmited Excessive deferral pkt*/
+ u32 scl; /* RO # of xmited single collision pkt */
+ u32 mcl; /* RO # of xmited multiple collision pkt*/
+ u32 lcl; /* RO # of xmited late collision pkt */
+ u32 ecl; /* RO # of xmited excessive collision pkt*/
+ u32 frg; /* RO # of xmited fragments pkt*/
+ u32 ncl; /* RO # of xmited total collision count */
+ u32 jbr; /* RO # of xmited jabber count*/
+ u32 bytes; /* RO # of xmited byte count */
+ u32 pok; /* RO # of xmited good pkt */
+ u32 uc; /* RO (0x4f0) # of xmited unicast pkt */
+};
+
+struct bcmgenet_mib_counters {
+ struct bcmgenet_rx_counters rx;
+ struct bcmgenet_tx_counters tx;
+ u32 rx_runt_cnt;
+ u32 rx_runt_fcs;
+ u32 rx_runt_fcs_align;
+ u32 rx_runt_bytes;
+ u32 rbuf_ovflow_cnt;
+ u32 rbuf_err_cnt;
+ u32 mdf_err_cnt;
+};
+
+#define UMAC_HD_BKP_CTRL 0x004
+#define HD_FC_EN (1 << 0)
+#define HD_FC_BKOFF_OK (1 << 1)
+#define IPG_CONFIG_RX_SHIFT 2
+#define IPG_CONFIG_RX_MASK 0x1F
+
+#define UMAC_CMD 0x008
+#define CMD_TX_EN (1 << 0)
+#define CMD_RX_EN (1 << 1)
+#define UMAC_SPEED_10 0
+#define UMAC_SPEED_100 1
+#define UMAC_SPEED_1000 2
+#define UMAC_SPEED_2500 3
+#define CMD_SPEED_SHIFT 2
+#define CMD_SPEED_MASK 3
+#define CMD_PROMISC (1 << 4)
+#define CMD_PAD_EN (1 << 5)
+#define CMD_CRC_FWD (1 << 6)
+#define CMD_PAUSE_FWD (1 << 7)
+#define CMD_RX_PAUSE_IGNORE (1 << 8)
+#define CMD_TX_ADDR_INS (1 << 9)
+#define CMD_HD_EN (1 << 10)
+#define CMD_SW_RESET (1 << 13)
+#define CMD_LCL_LOOP_EN (1 << 15)
+#define CMD_AUTO_CONFIG (1 << 22)
+#define CMD_CNTL_FRM_EN (1 << 23)
+#define CMD_NO_LEN_CHK (1 << 24)
+#define CMD_RMT_LOOP_EN (1 << 25)
+#define CMD_PRBL_EN (1 << 27)
+#define CMD_TX_PAUSE_IGNORE (1 << 28)
+#define CMD_TX_RX_EN (1 << 29)
+#define CMD_RUNT_FILTER_DIS (1 << 30)
+
+#define UMAC_MAC0 0x00C
+#define UMAC_MAC1 0x010
+#define UMAC_MAX_FRAME_LEN 0x014
+
+#define UMAC_TX_FLUSH 0x334
+
+#define UMAC_MIB_START 0x400
+
+#define UMAC_MDIO_CMD 0x614
+#define MDIO_START_BUSY (1 << 29)
+#define MDIO_READ_FAIL (1 << 28)
+#define MDIO_RD (2 << 26)
+#define MDIO_WR (1 << 26)
+#define MDIO_PMD_SHIFT 21
+#define MDIO_PMD_MASK 0x1F
+#define MDIO_REG_SHIFT 16
+#define MDIO_REG_MASK 0x1F
+
+#define UMAC_RBUF_OVFL_CNT 0x61C
+
+#define UMAC_MPD_CTRL 0x620
+#define MPD_EN (1 << 0)
+#define MPD_PW_EN (1 << 27)
+#define MPD_MSEQ_LEN_SHIFT 16
+#define MPD_MSEQ_LEN_MASK 0xFF
+
+#define UMAC_MPD_PW_MS 0x624
+#define UMAC_MPD_PW_LS 0x628
+#define UMAC_RBUF_ERR_CNT 0x634
+#define UMAC_MDF_ERR_CNT 0x638
+#define UMAC_MDF_CTRL 0x650
+#define UMAC_MDF_ADDR 0x654
+#define UMAC_MIB_CTRL 0x580
+#define MIB_RESET_RX (1 << 0)
+#define MIB_RESET_RUNT (1 << 1)
+#define MIB_RESET_TX (1 << 2)
+
+#define RBUF_CTRL 0x00
+#define RBUF_64B_EN (1 << 0)
+#define RBUF_ALIGN_2B (1 << 1)
+#define RBUF_BAD_DIS (1 << 2)
+
+#define RBUF_STATUS 0x0C
+#define RBUF_STATUS_WOL (1 << 0)
+#define RBUF_STATUS_MPD_INTR_ACTIVE (1 << 1)
+#define RBUF_STATUS_ACPI_INTR_ACTIVE (1 << 2)
+
+#define RBUF_CHK_CTRL 0x14
+#define RBUF_RXCHK_EN (1 << 0)
+#define RBUF_SKIP_FCS (1 << 4)
+
+#define RBUF_TBUF_SIZE_CTRL 0xb4
+
+#define RBUF_HFB_CTRL_V1 0x38
+#define RBUF_HFB_FILTER_EN_SHIFT 16
+#define RBUF_HFB_FILTER_EN_MASK 0xffff0000
+#define RBUF_HFB_EN (1 << 0)
+#define RBUF_HFB_256B (1 << 1)
+#define RBUF_ACPI_EN (1 << 2)
+
+#define RBUF_HFB_LEN_V1 0x3C
+#define RBUF_FLTR_LEN_MASK 0xFF
+#define RBUF_FLTR_LEN_SHIFT 8
+
+#define TBUF_CTRL 0x00
+#define TBUF_BP_MC 0x0C
+
+#define TBUF_CTRL_V1 0x80
+#define TBUF_BP_MC_V1 0xA0
+
+#define HFB_CTRL 0x00
+#define HFB_FLT_ENABLE_V3PLUS 0x04
+#define HFB_FLT_LEN_V2 0x04
+#define HFB_FLT_LEN_V3PLUS 0x1C
+
+/* uniMac intrl2 registers */
+#define INTRL2_CPU_STAT 0x00
+#define INTRL2_CPU_SET 0x04
+#define INTRL2_CPU_CLEAR 0x08
+#define INTRL2_CPU_MASK_STATUS 0x0C
+#define INTRL2_CPU_MASK_SET 0x10
+#define INTRL2_CPU_MASK_CLEAR 0x14
+
+/* INTRL2 instance 0 definitions */
+#define UMAC_IRQ_SCB (1 << 0)
+#define UMAC_IRQ_EPHY (1 << 1)
+#define UMAC_IRQ_PHY_DET_R (1 << 2)
+#define UMAC_IRQ_PHY_DET_F (1 << 3)
+#define UMAC_IRQ_LINK_UP (1 << 4)
+#define UMAC_IRQ_LINK_DOWN (1 << 5)
+#define UMAC_IRQ_UMAC (1 << 6)
+#define UMAC_IRQ_UMAC_TSV (1 << 7)
+#define UMAC_IRQ_TBUF_UNDERRUN (1 << 8)
+#define UMAC_IRQ_RBUF_OVERFLOW (1 << 9)
+#define UMAC_IRQ_HFB_SM (1 << 10)
+#define UMAC_IRQ_HFB_MM (1 << 11)
+#define UMAC_IRQ_MPD_R (1 << 12)
+#define UMAC_IRQ_RXDMA_MBDONE (1 << 13)
+#define UMAC_IRQ_RXDMA_PDONE (1 << 14)
+#define UMAC_IRQ_RXDMA_BDONE (1 << 15)
+#define UMAC_IRQ_TXDMA_MBDONE (1 << 16)
+#define UMAC_IRQ_TXDMA_PDONE (1 << 17)
+#define UMAC_IRQ_TXDMA_BDONE (1 << 18)
+/* Only valid for GENETv3+ */
+#define UMAC_IRQ_MDIO_DONE (1 << 23)
+#define UMAC_IRQ_MDIO_ERROR (1 << 24)
+
+/* Register block offset */
+#define GENET_SYS_OFF 0x0000
+#define GENET_GR_BRIDGE_OFF 0x0040
+#define GENET_EXT_OFF 0x0080
+#define GENET_INTRL2_0_OFF 0x0200
+#define GENET_INTRL2_1_OFF 0x0240
+#define GENET_RBUF_OFF 0x0300
+#define GENET_UMAC_OFF 0x0800
+
+/* SYS block offsets and register definitions */
+#define SYS_REV_CTRL 0x00
+#define SYS_PORT_CTRL 0x04
+#define PORT_MODE_INT_EPHY 0
+#define PORT_MODE_INT_GPHY 1
+#define PORT_MODE_EXT_EPHY 2
+#define PORT_MODE_EXT_GPHY 3
+#define PORT_MODE_EXT_RVMII_25 (4 | BIT(4))
+#define PORT_MODE_EXT_RVMII_50 4
+#define LED_ACT_SOURCE_MAC (1 << 9)
+
+#define SYS_RBUF_FLUSH_CTRL 0x08
+#define SYS_TBUF_FLUSH_CTRL 0x0C
+#define RBUF_FLUSH_CTRL_V1 0x04
+
+/* Ext block register offsets and definitions */
+#define EXT_EXT_PWR_MGMT 0x00
+#define EXT_PWR_DOWN_BIAS (1 << 0)
+#define EXT_PWR_DOWN_DLL (1 << 1)
+#define EXT_PWR_DOWN_PHY (1 << 2)
+#define EXT_PWR_DN_EN_LD (1 << 3)
+#define EXT_ENERGY_DET (1 << 4)
+#define EXT_IDDQ_FROM_PHY (1 << 5)
+#define EXT_PHY_RESET (1 << 8)
+#define EXT_ENERGY_DET_MASK (1 << 12)
+
+#define EXT_RGMII_OOB_CTRL 0x0C
+#define RGMII_MODE_EN (1 << 0)
+#define RGMII_LINK (1 << 4)
+#define OOB_DISABLE (1 << 5)
+#define ID_MODE_DIS (1 << 16)
+
+#define EXT_GPHY_CTRL 0x1C
+#define EXT_CFG_IDDQ_BIAS (1 << 0)
+#define EXT_CFG_PWR_DOWN (1 << 1)
+#define EXT_GPHY_RESET (1 << 5)
+
+/* DMA rings size */
+#define DMA_RING_SIZE (0x40)
+#define DMA_RINGS_SIZE (DMA_RING_SIZE * (DESC_INDEX + 1))
+
+/* DMA registers common definitions */
+#define DMA_RW_POINTER_MASK 0x1FF
+#define DMA_P_INDEX_DISCARD_CNT_MASK 0xFFFF
+#define DMA_P_INDEX_DISCARD_CNT_SHIFT 16
+#define DMA_BUFFER_DONE_CNT_MASK 0xFFFF
+#define DMA_BUFFER_DONE_CNT_SHIFT 16
+#define DMA_P_INDEX_MASK 0xFFFF
+#define DMA_C_INDEX_MASK 0xFFFF
+
+/* DMA ring size register */
+#define DMA_RING_SIZE_MASK 0xFFFF
+#define DMA_RING_SIZE_SHIFT 16
+#define DMA_RING_BUFFER_SIZE_MASK 0xFFFF
+
+/* DMA interrupt threshold register */
+#define DMA_INTR_THRESHOLD_MASK 0x00FF
+
+/* DMA XON/XOFF register */
+#define DMA_XON_THREHOLD_MASK 0xFFFF
+#define DMA_XOFF_THRESHOLD_MASK 0xFFFF
+#define DMA_XOFF_THRESHOLD_SHIFT 16
+
+/* DMA flow period register */
+#define DMA_FLOW_PERIOD_MASK 0xFFFF
+#define DMA_MAX_PKT_SIZE_MASK 0xFFFF
+#define DMA_MAX_PKT_SIZE_SHIFT 16
+
+
+/* DMA control register */
+#define DMA_EN (1 << 0)
+#define DMA_RING_BUF_EN_SHIFT 0x01
+#define DMA_RING_BUF_EN_MASK 0xFFFF
+#define DMA_TSB_SWAP_EN (1 << 20)
+
+/* DMA status register */
+#define DMA_DISABLED (1 << 0)
+#define DMA_DESC_RAM_INIT_BUSY (1 << 1)
+
+/* DMA SCB burst size register */
+#define DMA_SCB_BURST_SIZE_MASK 0x1F
+
+/* DMA activity vector register */
+#define DMA_ACTIVITY_VECTOR_MASK 0x1FFFF
+
+/* DMA backpressure mask register */
+#define DMA_BACKPRESSURE_MASK 0x1FFFF
+#define DMA_PFC_ENABLE (1 << 31)
+
+/* DMA backpressure status register */
+#define DMA_BACKPRESSURE_STATUS_MASK 0x1FFFF
+
+/* DMA override register */
+#define DMA_LITTLE_ENDIAN_MODE (1 << 0)
+#define DMA_REGISTER_MODE (1 << 1)
+
+/* DMA timeout register */
+#define DMA_TIMEOUT_MASK 0xFFFF
+
+/* TDMA rate limiting control register */
+#define DMA_RATE_LIMIT_EN_MASK 0xFFFF
+
+/* TDMA arbitration control register */
+#define DMA_ARBITER_MODE_MASK 0x03
+#define DMA_RING_BUF_PRIORITY_MASK 0x1F
+#define DMA_RING_BUF_PRIORITY_SHIFT 5
+#define DMA_RATE_ADJ_MASK 0xFF
+
+/* Tx/Rx Dma Descriptor common bits*/
+#define DMA_BUFLENGTH_MASK 0x0fff
+#define DMA_BUFLENGTH_SHIFT 16
+#define DMA_OWN 0x8000
+#define DMA_EOP 0x4000
+#define DMA_SOP 0x2000
+#define DMA_WRAP 0x1000
+/* Tx specific Dma descriptor bits */
+#define DMA_TX_UNDERRUN 0x0200
+#define DMA_TX_APPEND_CRC 0x0040
+#define DMA_TX_OW_CRC 0x0020
+#define DMA_TX_DO_CSUM 0x0010
+#define DMA_TX_QTAG_SHIFT 7
+
+/* Rx Specific Dma descriptor bits */
+#define DMA_RX_CHK_V3PLUS 0x8000
+#define DMA_RX_CHK_V12 0x1000
+#define DMA_RX_BRDCAST 0x0040
+#define DMA_RX_MULT 0x0020
+#define DMA_RX_LG 0x0010
+#define DMA_RX_NO 0x0008
+#define DMA_RX_RXER 0x0004
+#define DMA_RX_CRC_ERROR 0x0002
+#define DMA_RX_OV 0x0001
+#define DMA_RX_FI_MASK 0x001F
+#define DMA_RX_FI_SHIFT 0x0007
+#define DMA_DESC_ALLOC_MASK 0x00FF
+
+#define DMA_ARBITER_RR 0x00
+#define DMA_ARBITER_WRR 0x01
+#define DMA_ARBITER_SP 0x02
+
+struct enet_cb {
+ struct sk_buff *skb;
+ void __iomem *bd_addr;
+ DEFINE_DMA_UNMAP_ADDR(dma_addr);
+ DEFINE_DMA_UNMAP_LEN(dma_len);
+};
+
+/* power management mode */
+enum bcmgenet_power_mode {
+ GENET_POWER_CABLE_SENSE = 0,
+ GENET_POWER_WOL_MAGIC,
+ GENET_POWER_WOL_ACPI,
+ GENET_POWER_PASSIVE,
+};
+
+struct bcmgenet_priv;
+
+/* We support both runtime GENET detection and compile-time
+ * to optimize code-paths for a given hardware
+ */
+enum bcmgenet_version {
+ GENET_V1 = 1,
+ GENET_V2,
+ GENET_V3,
+ GENET_V4
+};
+
+#define GENET_IS_V1(p) (__genet_get_version(p) == GENET_V1)
+#define GENET_IS_V2(p) (__genet_get_version(p) == GENET_V2)
+#define GENET_IS_V3(p) (__genet_get_version(p) == GENET_V3)
+#define GENET_IS_V4(p) (__genet_get_version(p) == GENET_V4)
+
+enum bcmgenet_version __genet_get_version(struct bcmgenet_priv *priv);
+
+
+/* Hardware flags */
+#define GENET_HAS_40BITS (1 << 0)
+#define GENET_HAS_EXT (1 << 1)
+#define GENET_HAS_MDIO_INTR (1 << 2)
+
+/* BCMGENET hardware parameters, keep this structure nicely aligned
+ * since it is going to be used in hot paths
+ */
+struct bcmgenet_hw_params {
+ u8 tx_queues;
+ u8 rx_queues;
+ u8 bds_cnt;
+ u8 bp_in_en_shift;
+ u32 bp_in_mask;
+ u8 hfb_filter_cnt;
+ u8 qtag_mask;
+ u16 tbuf_offset;
+ u32 hfb_offset;
+ u32 hfb_reg_offset;
+ u32 rdma_offset;
+ u32 tdma_offset;
+ u32 words_per_bd;
+ u32 flags;
+};
+
+struct bcmgenet_tx_ring {
+ spinlock_t lock; /* ring lock */
+ unsigned int index; /* ring index */
+ unsigned int queue; /* queue index */
+ struct enet_cb *cbs; /* tx ring buffer control block*/
+ unsigned int size; /* size of each tx ring */
+ unsigned int c_index; /* last consumer index of each ring*/
+ unsigned int free_bds; /* # of free bds for each ring */
+ unsigned int write_ptr; /* Tx ring write pointer SW copy */
+ unsigned int prod_index; /* Tx ring producer index SW copy */
+ unsigned int cb_ptr; /* Tx ring initial CB ptr */
+ unsigned int end_ptr; /* Tx ring end CB ptr */
+ void (*int_enable)(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *);
+ void (*int_disable)(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *);
+};
+
+/* device context */
+struct bcmgenet_priv {
+ void __iomem *base;
+ enum bcmgenet_version version;
+ struct net_device *dev;
+ spinlock_t lock;
+ spinlock_t bh_lock;
+ u32 int0_mask;
+ u32 int1_mask;
+
+ /* NAPI for descriptor based rx */
+ struct napi_struct napi ____cacheline_aligned;
+
+ /* transmit variables */
+ void __iomem *tx_bds;
+ struct enet_cb *tx_cbs;
+ unsigned int num_tx_bds;
+
+ struct bcmgenet_tx_ring tx_rings[DESC_INDEX + 1];
+
+ /* receive variables */
+ void __iomem *rx_bds;
+ void __iomem *rx_bd_assign_ptr;
+ int rx_bd_assign_index;
+ struct enet_cb *rx_cbs;
+ unsigned int num_rx_bds;
+ unsigned int rx_buf_len;
+ unsigned int rx_read_ptr;
+ unsigned int rx_c_index;
+
+ /* other misc variables */
+ struct bcmgenet_hw_params *hw_params;
+ wait_queue_head_t wq;
+ struct phy_device *phydev;
+ struct device_node *phy_dn;
+ struct mii_bus *mii_bus;
+ int old_duplex;
+ int old_link;
+ int old_pause;
+ phy_interface_t phy_interface;
+ u32 phy_supported;
+ int irq0;
+ int irq1;
+ int phy_addr;
+ int phy_type;
+ int phy_speed;
+ int ext_phy;
+ unsigned int irq0_stat;
+ unsigned int irq1_stat;
+ unsigned int desc_64b_en;
+ unsigned int desc_rxchk_en;
+ unsigned int dma_rx_chk_bit;
+ unsigned int crc_fwd_en;
+ u32 msg_enable;
+
+ struct work_struct bcmgenet_irq_work;
+ struct clk *clk;
+ struct platform_device *pdev;
+
+ /* WOL */
+ unsigned long wol_enabled;
+ struct clk *clk_wol;
+ u32 wolopts;
+
+ struct mutex mib_mutex;
+ struct bcmgenet_mib_counters mib;
+};
+
+#define GENET_IO_MACRO(name, offset) \
+static inline u32 bcmgenet_##name##_readl(struct bcmgenet_priv *priv, \
+ u32 off) \
+{ \
+ return __raw_readl(priv->base + offset + off); \
+} \
+static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv, \
+ u32 val, u32 off) \
+{ \
+ __raw_writel(val, priv->base + offset + off); \
+}
+
+GENET_IO_MACRO(ext, GENET_EXT_OFF);
+GENET_IO_MACRO(umac, GENET_UMAC_OFF);
+GENET_IO_MACRO(sys, GENET_SYS_OFF);
+
+/* interrupt l2 registers accessors */
+GENET_IO_MACRO(intrl2_0, GENET_INTRL2_0_OFF);
+GENET_IO_MACRO(intrl2_1, GENET_INTRL2_1_OFF);
+
+/* HFB register accessors */
+GENET_IO_MACRO(hfb, priv->hw_params->hfb_offset);
+
+/* GENET v2+ HFB control and filter len helpers */
+GENET_IO_MACRO(hfb_reg, priv->hw_params->hfb_reg_offset);
+
+/* RBUF register accessors */
+GENET_IO_MACRO(rbuf, GENET_RBUF_OFF);
+
+
+int bcmgenet_mii_init(struct net_device *dev);
+int bcmgenet_mii_config(struct net_device *dev);
+void bcmgenet_mii_exit(struct net_device *dev);
+void bcmgenet_mii_reset(struct net_device *dev);
+
+#endif /* __BCMGENET_H__ */
--
1.8.3.2
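For reference, GENET_IO_MACRO above generates one readl/writel pair per
register block and folds the block offset into every access. A minimal
user-space sketch of the same pattern follows. It assumes a plain array in
place of the ioremapped base pointer; struct fake_priv, FAKE_IO_MACRO and the
fake_* accessors are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bcmgenet_priv: a plain array replaces
 * the ioremapped "base" pointer so the sketch runs in user space.
 */
struct fake_priv {
	uint32_t regs[0x1000 / 4];
};

#define FAKE_NUM_REGS (sizeof(((struct fake_priv *)0)->regs) / sizeof(uint32_t))

/* Same shape as GENET_IO_MACRO: paste the block name into the accessor
 * names and fold the block offset into every access.
 */
#define FAKE_IO_MACRO(name, offset) \
static inline uint32_t fake_##name##_readl(struct fake_priv *priv, \
					   uint32_t off) \
{ \
	assert(((offset) + off) / 4 < FAKE_NUM_REGS); \
	return priv->regs[((offset) + off) / 4]; \
} \
static inline void fake_##name##_writel(struct fake_priv *priv, \
					uint32_t val, uint32_t off) \
{ \
	assert(((offset) + off) / 4 < FAKE_NUM_REGS); \
	priv->regs[((offset) + off) / 4] = val; \
}

/* Block offsets copied from GENET_EXT_OFF and GENET_UMAC_OFF */
FAKE_IO_MACRO(ext, 0x0080)
FAKE_IO_MACRO(umac, 0x0800)
```

With these definitions, fake_umac_writel(&p, val, 0x00C) touches byte offset
0x80C, which is where UMAC_MAC0 sits relative to the GENET base. The bounds
assert is an addition for the sketch only; the real accessors do not check.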
* [PATCH net-next 07/10] net: bcmgenet: add MDIO routines
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
This patch adds support for configuring the port multiplexer hardware
which resides in front of the GENET Ethernet MAC controller. This allows
us to support:
- internal PHYs (using drivers/net/phy/bcm7xxx.c)
- MoCA PHYs which are an entirely separate hardware block not covered
here
- external PHYs and switches
Note that MoCA and switches are currently supported using the emulated
"fixed PHY" driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
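For reference, bcmgenet_mii_read() and bcmgenet_mii_write() below both start
by composing a single 32-bit word for UMAC_MDIO_CMD. A minimal user-space
sketch of that encoding follows, assuming the bit layout given by the MDIO_*
definitions in bcmgenet.h. The helper names are hypothetical and not part of
this patch. Unlike the patch, which ignores the wait_event_timeout() result,
the decode helper also treats a still-busy command as an error.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the UMAC_MDIO_CMD definitions in bcmgenet.h */
#define MDIO_START_BUSY	(1u << 29)
#define MDIO_READ_FAIL	(1u << 28)
#define MDIO_RD		(2u << 26)
#define MDIO_WR		(1u << 26)
#define MDIO_PMD_SHIFT	21
#define MDIO_PMD_MASK	0x1Fu
#define MDIO_REG_SHIFT	16
#define MDIO_REG_MASK	0x1Fu

/* Hypothetical helper: command word for reading register "reg" of "phy_id" */
static uint32_t mdio_read_cmd(unsigned int phy_id, unsigned int reg)
{
	assert(phy_id <= MDIO_PMD_MASK && reg <= MDIO_REG_MASK);
	return MDIO_RD | (phy_id << MDIO_PMD_SHIFT) | (reg << MDIO_REG_SHIFT);
}

/* Hypothetical helper: command word for writing "val" to a PHY register */
static uint32_t mdio_write_cmd(unsigned int phy_id, unsigned int reg,
			       uint16_t val)
{
	assert(phy_id <= MDIO_PMD_MASK && reg <= MDIO_REG_MASK);
	return MDIO_WR | (phy_id << MDIO_PMD_SHIFT) |
	       (reg << MDIO_REG_SHIFT) | val;
}

/* Hypothetical helper: decode the register after a read.
 * Returns the 16-bit data, or -1 if the transaction is still busy
 * or the hardware flagged a read failure.
 */
static int mdio_read_result(uint32_t cmd)
{
	if (cmd & (MDIO_START_BUSY | MDIO_READ_FAIL))
		return -1;
	return (int)(cmd & 0xffff);
}
```

In the driver the composed word is written first and MDIO_START_BUSY is set
by a second read-modify-write, so the helpers above deliberately leave that
bit clear.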
drivers/net/ethernet/broadcom/genet/bcmmii.c | 483 +++++++++++++++++++++++++++
1 file changed, 483 insertions(+)
create mode 100644 drivers/net/ethernet/broadcom/genet/bcmmii.c
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
new file mode 100644
index 0000000..15b3392
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -0,0 +1,483 @@
+/*
+ * Broadcom GENET MDIO routines
+ *
+ * Copyright (c) 2014 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+
+#include <linux/types.h>
+#include <linux/delay.h>
+#include <linux/wait.h>
+#include <linux/mii.h>
+#include <linux/ethtool.h>
+#include <linux/bitops.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/phy.h>
+#include <linux/phy_fixed.h>
+#include <linux/brcmphy.h>
+#include <linux/of.h>
+#include <linux/of_net.h>
+#include <linux/of_mdio.h>
+
+#include "bcmgenet.h"
+
+/* read a value from the MII */
+static int bcmgenet_mii_read(struct mii_bus *bus, int phy_id, int location)
+{
+ int ret;
+ struct net_device *dev = bus->priv;
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ u32 reg;
+
+ bcmgenet_umac_writel(priv, (MDIO_RD | (phy_id << MDIO_PMD_SHIFT) |
+ (location << MDIO_REG_SHIFT)), UMAC_MDIO_CMD);
+ /* Start MDIO transaction*/
+ reg = bcmgenet_umac_readl(priv, UMAC_MDIO_CMD);
+ reg |= MDIO_START_BUSY;
+ bcmgenet_umac_writel(priv, reg, UMAC_MDIO_CMD);
+ wait_event_timeout(priv->wq,
+ !(bcmgenet_umac_readl(priv, UMAC_MDIO_CMD)
+ & MDIO_START_BUSY),
+ HZ / 100);
+ ret = bcmgenet_umac_readl(priv, UMAC_MDIO_CMD);
+
+ if (ret & MDIO_READ_FAIL)
+ return -EIO;
+
+ return ret & 0xffff;
+}
+
+/* write a value to the MII */
+static int bcmgenet_mii_write(struct mii_bus *bus, int phy_id,
+ int location, u16 val)
+{
+ struct net_device *dev = bus->priv;
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ u32 reg;
+
+ bcmgenet_umac_writel(priv, (MDIO_WR | (phy_id << MDIO_PMD_SHIFT) |
+ (location << MDIO_REG_SHIFT) | (0xffff & val)),
+ UMAC_MDIO_CMD);
+ reg = bcmgenet_umac_readl(priv, UMAC_MDIO_CMD);
+ reg |= MDIO_START_BUSY;
+ bcmgenet_umac_writel(priv, reg, UMAC_MDIO_CMD);
+ wait_event_timeout(priv->wq,
+ !(bcmgenet_umac_readl(priv, UMAC_MDIO_CMD) &
+ MDIO_START_BUSY),
+ HZ / 100);
+
+ return 0;
+}
+
+/* set up netdev link state when the PHY link status changes and
+ * update the UMAC and RGMII blocks when the link is up
+ */
+static void bcmgenet_mii_setup(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct phy_device *phydev = priv->phydev;
+ u32 reg, cmd_bits = 0;
+ unsigned int status_changed = 0;
+
+ if (priv->old_link != phydev->link) {
+ status_changed = 1;
+ priv->old_link = phydev->link;
+ }
+
+ if (phydev->link) {
+ /* program UMAC and RGMII block based on established link
+ * speed, pause, and duplex.
+ * the speed set in umac->cmd tells the RGMII block which clock
+ * 25MHz(100Mbps)/125MHz(1Gbps) to use for transmit.
+ * receive clock is provided by PHY.
+ */
+ reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
+ reg &= ~OOB_DISABLE;
+ reg |= RGMII_LINK;
+ bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+
+ /* speed */
+ if (phydev->speed == SPEED_1000)
+ cmd_bits = UMAC_SPEED_1000;
+ else if (phydev->speed == SPEED_100)
+ cmd_bits = UMAC_SPEED_100;
+ else
+ cmd_bits = UMAC_SPEED_10;
+ cmd_bits <<= CMD_SPEED_SHIFT;
+
+ if (priv->old_duplex != phydev->duplex) {
+ status_changed = 1;
+ priv->old_duplex = phydev->duplex;
+ }
+
+ /* duplex */
+ if (phydev->duplex != DUPLEX_FULL)
+ cmd_bits |= CMD_HD_EN;
+
+ if (priv->old_pause != phydev->pause) {
+ status_changed = 1;
+ priv->old_pause = phydev->pause;
+ }
+
+ /* pause capability */
+ if (!phydev->pause)
+ cmd_bits |= CMD_RX_PAUSE_IGNORE | CMD_TX_PAUSE_IGNORE;
+
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ reg &= ~((CMD_SPEED_MASK << CMD_SPEED_SHIFT) |
+ CMD_HD_EN |
+ CMD_RX_PAUSE_IGNORE | CMD_TX_PAUSE_IGNORE);
+ reg |= cmd_bits;
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+ }
+
+ if (status_changed)
+ phy_print_status(phydev);
+}
+
+void bcmgenet_mii_reset(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ if (priv->phydev) {
+ phy_init_hw(priv->phydev);
+ phy_start_aneg(priv->phydev);
+ }
+}
+
+static void bcmgenet_ephy_power_up(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ u32 reg = 0;
+
+ /* EXT_GPHY_CTRL is only valid for GENETv4 and onward */
+ if (!GENET_IS_V4(priv))
+ return;
+
+ reg = bcmgenet_ext_readl(priv, EXT_GPHY_CTRL);
+ reg &= ~(EXT_CFG_IDDQ_BIAS | EXT_CFG_PWR_DOWN);
+ reg |= EXT_GPHY_RESET;
+ bcmgenet_ext_writel(priv, reg, EXT_GPHY_CTRL);
+ mdelay(2);
+
+ reg &= ~EXT_GPHY_RESET;
+ bcmgenet_ext_writel(priv, reg, EXT_GPHY_CTRL);
+ udelay(20);
+}
+
+static int bcmgenet_mii_probe(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct phy_device *phydev;
+ unsigned int phy_flags;
+
+ if (priv->phydev) {
+ pr_info("PHY already attached\n");
+ return 0;
+ }
+
+ phy_flags = PHY_BRCM_100MBPS_WAR;
+
+ /* workarounds are only needed for 100Mbps PHYs */
+ if (priv->phy_speed == SPEED_1000)
+ phy_flags = 0;
+
+ /* workarounds are only needed for some 40nm chips, exclude
+ * GENET v1
+ */
+ if (GENET_IS_V1(priv))
+ phy_flags = 0;
+
+ if (priv->phy_dn)
+ phydev = of_phy_connect(dev, priv->phy_dn,
+ bcmgenet_mii_setup, phy_flags,
+ priv->phy_interface);
+ else
+ phydev = of_phy_connect_fixed_link(dev,
+ bcmgenet_mii_setup,
+ priv->phy_interface);
+
+ if (!phydev) {
+ pr_err("could not attach to PHY\n");
+ return -ENODEV;
+ }
+
+ phydev->supported &= priv->phy_supported;
+ /* Adjust advertised speeds based on configured speed */
+ if (priv->phy_speed == SPEED_1000)
+ phydev->advertising = PHY_GBIT_FEATURES;
+ else
+ phydev->advertising = PHY_BASIC_FEATURES;
+
+ pr_info("attached PHY at address %d [%s]\n",
+ phydev->addr, phydev->drv->name);
+
+ priv->old_link = -1;
+ priv->old_duplex = -1;
+ priv->old_pause = -1;
+ priv->phydev = phydev;
+
+ return 0;
+}
+
+static int bcmgenet_mii_alloc(struct bcmgenet_priv *priv)
+{
+ struct mii_bus *bus;
+ int ret = 0;
+
+ if (priv->mii_bus)
+ return 0;
+
+ priv->mii_bus = mdiobus_alloc();
+ if (!priv->mii_bus) {
+ pr_err("failed to allocate\n");
+ return -ENOMEM;
+ }
+
+ bus = priv->mii_bus;
+ bus->priv = priv->dev;
+ bus->name = "bcmgenet MII bus";
+ bus->parent = &priv->pdev->dev;
+ bus->read = bcmgenet_mii_read;
+ bus->write = bcmgenet_mii_write;
+ snprintf(bus->id, MII_BUS_ID_SIZE, "%s-%d",
+ priv->pdev->name, priv->pdev->id);
+
+ bus->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
+ if (!bus->irq) {
+ ret = -ENOMEM;
+ goto out_mdio_free;
+ }
+
+ /* The internal PHY has its link interrupts routed to the
+ * Ethernet MAC ISRs
+ */
+ if (priv->phy_type == PHY_INTERFACE_MODE_INTERNAL)
+ bus->irq[priv->phy_addr] = PHY_IGNORE_INTERRUPT;
+ else
+ bus->irq[priv->phy_addr] = PHY_POLL;
+
+ return 0;
+
+out_mdio_free:
+ mdiobus_free(priv->mii_bus);
+ return ret;
+}
+
+static void bcmgenet_mii_free(struct bcmgenet_priv *priv)
+{
+ mdiobus_unregister(priv->mii_bus);
+ kfree(priv->mii_bus->irq);
+ mdiobus_free(priv->mii_bus);
+}
+
+static void bcmgenet_internal_phy_setup(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ u32 reg;
+
+ /* Power up EPHY */
+ bcmgenet_ephy_power_up(dev);
+ /* enable APD */
+ reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+ reg |= EXT_PWR_DN_EN_LD;
+ bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+ bcmgenet_mii_reset(dev);
+}
+
+static void bcmgenet_moca_phy_setup(struct bcmgenet_priv *priv)
+{
+ u32 reg;
+
+ /* Speed settings are set in bcmgenet_mii_setup() */
+ reg = bcmgenet_sys_readl(priv, SYS_PORT_CTRL);
+ reg |= LED_ACT_SOURCE_MAC;
+ bcmgenet_sys_writel(priv, reg, SYS_PORT_CTRL);
+}
+
+int bcmgenet_mii_config(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct device *kdev = &priv->pdev->dev;
+ const char *phy_name = NULL;
+ u32 id_mode_dis = 0;
+ u32 port_ctrl;
+ u32 reg;
+
+ priv->ext_phy = (priv->phy_type != PHY_INTERFACE_MODE_INTERNAL) &&
+ (priv->phy_type != PHY_INTERFACE_MODE_MOCA);
+
+ switch (priv->phy_interface) {
+ case PHY_INTERFACE_MODE_INTERNAL:
+ case PHY_INTERFACE_MODE_MOCA:
+ /* Irrespective of the actually configured PHY speed (100 or
+ * 1000) GENETv4 only has an internal GPHY so we will just end
+ * up masking the Gigabit features from what we support, not
+ * switching to the EPHY
+ */
+ if (GENET_IS_V4(priv)) {
+ priv->phy_supported = PHY_GBIT_FEATURES;
+ port_ctrl = PORT_MODE_INT_GPHY;
+ } else {
+ priv->phy_supported = PHY_BASIC_FEATURES;
+ port_ctrl = PORT_MODE_INT_EPHY;
+ }
+
+ bcmgenet_sys_writel(priv, port_ctrl, SYS_PORT_CTRL);
+
+ if (priv->phy_type == PHY_INTERFACE_MODE_INTERNAL) {
+ phy_name = "internal PHY";
+ bcmgenet_internal_phy_setup(dev);
+ } else if (priv->phy_type == PHY_INTERFACE_MODE_MOCA) {
+ phy_name = "MoCA";
+ bcmgenet_moca_phy_setup(priv);
+ }
+ break;
+
+ case PHY_INTERFACE_MODE_MII:
+ phy_name = "external MII";
+ priv->phy_supported = PHY_BASIC_FEATURES;
+ bcmgenet_sys_writel(priv,
+ PORT_MODE_EXT_EPHY, SYS_PORT_CTRL);
+ break;
+
+ case PHY_INTERFACE_MODE_REVMII:
+ phy_name = "external RvMII";
+ if (priv->phy_speed == SPEED_100) {
+ priv->phy_supported = PHY_BASIC_FEATURES;
+ port_ctrl = PORT_MODE_EXT_RVMII_25;
+ } else {
+ priv->phy_supported = PHY_GBIT_FEATURES;
+ port_ctrl = PORT_MODE_EXT_RVMII_50;
+ }
+ bcmgenet_sys_writel(priv, port_ctrl, SYS_PORT_CTRL);
+ break;
+
+ case PHY_INTERFACE_MODE_RGMII:
+ /* RGMII_NO_ID: TXC transitions at the same time as TXD
+ * (requires PCB or receiver-side delay)
+ * RGMII: Add 2ns delay on TXC (90 degree shift)
+ *
+ * ID is implicitly disabled for 100Mbps (RG)MII operation.
+ */
+ id_mode_dis = BIT(16);
+ /* fall through */
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ if (id_mode_dis)
+ phy_name = "external RGMII (no delay)";
+ else
+ phy_name = "external RGMII (TX delay)";
+ reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
+ reg |= RGMII_MODE_EN | id_mode_dis;
+ bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+ bcmgenet_sys_writel(priv,
+ PORT_MODE_EXT_GPHY, SYS_PORT_CTRL);
+ priv->phy_supported = PHY_GBIT_FEATURES;
+ /* the MII speed and the RGMII txclk are programmed into
+ * umac->cmd by mii_setup() once the link is established.
+ */
+ break;
+ default:
+ dev_err(kdev, "unknown phy mode: %d\n", priv->phy_interface);
+ return -EINVAL;
+ }
+
+ dev_info(kdev, "configuring instance for %s\n", phy_name);
+
+ return 0;
+}
+
+static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
+{
+ struct device_node *dn = priv->pdev->dev.of_node;
+ struct device *kdev = &priv->pdev->dev;
+ struct device_node *mdio_dn;
+ const __be32 *fixed_link;
+ u32 propval;
+ int phy_mode;
+ int ret, sz;
+
+ mdio_dn = of_get_next_child(dn, NULL);
+ if (!mdio_dn) {
+ dev_err(kdev, "unable to find MDIO bus node\n");
+ return -ENODEV;
+ }
+
+ ret = of_mdiobus_register(priv->mii_bus, mdio_dn);
+ if (ret) {
+ dev_err(kdev, "failed to register MDIO bus\n");
+ return ret;
+ }
+
+ /* Check if we have an internal or external PHY */
+ priv->phy_dn = of_parse_phandle(dn, "phy-handle", 0);
+ if (priv->phy_dn) {
+ if (!of_property_read_u32(priv->phy_dn, "max-speed", &propval))
+ priv->phy_speed = propval;
+ } else {
+ /* Read the link speed from the fixed-link property */
+ fixed_link = of_get_property(dn, "fixed-link", &sz);
+ if (!fixed_link || sz < 3 * sizeof(*fixed_link)) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ priv->phy_speed = be32_to_cpu(fixed_link[2]);
+ }
+
+ /* Get the link mode */
+ phy_mode = of_get_phy_mode(dn);
+ priv->phy_interface = phy_mode;
+ priv->phy_type = phy_mode;
+
+ return 0;
+out:
+ mdiobus_unregister(priv->mii_bus);
+ return ret;
+}
+
+int bcmgenet_mii_init(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int ret;
+
+ ret = bcmgenet_mii_alloc(priv);
+ if (ret)
+ return ret;
+
+ ret = bcmgenet_mii_of_init(priv);
+ if (ret)
+ goto out;
+
+ ret = bcmgenet_mii_config(dev);
+ if (ret)
+ goto out;
+
+ ret = bcmgenet_mii_probe(dev);
+ if (ret)
+ goto out;
+
+ return 0;
+out:
+ bcmgenet_mii_free(priv);
+ return ret;
+}
+
+void bcmgenet_mii_exit(struct net_device *dev)
+{
+ bcmgenet_mii_free(netdev_priv(dev));
+}
--
1.8.3.2
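For reference, the UMAC_CMD update performed by bcmgenet_mii_setup() above
can be modelled as a pure function of the old register value and the link
parameters. A minimal user-space sketch follows, assuming the bit definitions
from bcmgenet.h. umac_cmd_for_link() is a hypothetical name, and the assert
on the speed is an addition of the sketch: the driver itself falls back to
UMAC_SPEED_10 for any speed other than 100 or 1000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the UMAC_CMD definitions in bcmgenet.h */
#define UMAC_SPEED_10		0u
#define UMAC_SPEED_100		1u
#define UMAC_SPEED_1000		2u
#define CMD_SPEED_SHIFT		2
#define CMD_SPEED_MASK		3u
#define CMD_RX_PAUSE_IGNORE	(1u << 8)
#define CMD_HD_EN		(1u << 10)
#define CMD_TX_PAUSE_IGNORE	(1u << 28)

/* Hypothetical helper: return the new UMAC_CMD value for a link state */
static uint32_t umac_cmd_for_link(uint32_t reg, int speed_mbps,
				  bool full_duplex, bool pause)
{
	uint32_t cmd_bits;

	assert(speed_mbps == 10 || speed_mbps == 100 || speed_mbps == 1000);

	if (speed_mbps == 1000)
		cmd_bits = UMAC_SPEED_1000;
	else if (speed_mbps == 100)
		cmd_bits = UMAC_SPEED_100;
	else
		cmd_bits = UMAC_SPEED_10;
	cmd_bits <<= CMD_SPEED_SHIFT;

	/* half duplex and "no pause" are the bits that get set */
	if (!full_duplex)
		cmd_bits |= CMD_HD_EN;
	if (!pause)
		cmd_bits |= CMD_RX_PAUSE_IGNORE | CMD_TX_PAUSE_IGNORE;

	/* clear every field we own before merging, keep the rest intact */
	reg &= ~((CMD_SPEED_MASK << CMD_SPEED_SHIFT) | CMD_HD_EN |
		 CMD_RX_PAUSE_IGNORE | CMD_TX_PAUSE_IGNORE);
	return reg | cmd_bits;
}
```

Clearing the owned fields before OR-ing in the new bits is what lets a
renegotiation from half to full duplex drop a stale CMD_HD_EN, while unrelated
bits such as CMD_TX_EN and CMD_RX_EN pass through untouched.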
* [PATCH net-next 08/10] net: bcmgenet: hook into the build system
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
This patch adds a new configuration symbol, CONFIG_BCMGENET, which allows
us to build the Broadcom GENET driver, and hooks the driver files into the
build system.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/broadcom/Kconfig | 10 ++++++++++
drivers/net/ethernet/broadcom/Makefile | 1 +
drivers/net/ethernet/broadcom/genet/Makefile | 2 ++
3 files changed, 13 insertions(+)
create mode 100644 drivers/net/ethernet/broadcom/genet/Makefile
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 3f97d9f..a489712 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -60,6 +60,16 @@ config BCM63XX_ENET
This driver supports the ethernet MACs in the Broadcom 63xx
MIPS chipset family (BCM63XX).
+config BCMGENET
+ tristate "Broadcom GENET internal MAC support"
+ select MII
+ select PHYLIB
+ select FIXED_PHY
+ select BCM7XXX_PHY
+ help
+ This driver supports the built-in Ethernet MACs found in the
+ Broadcom BCM7xxx Set Top Box family chipset.
+
config BNX2
tristate "Broadcom NetXtremeII support"
depends on PCI
diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
index 68efa1a..fd639a0 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -4,6 +4,7 @@
obj-$(CONFIG_B44) += b44.o
obj-$(CONFIG_BCM63XX_ENET) += bcm63xx_enet.o
+obj-$(CONFIG_BCMGENET) += genet/
obj-$(CONFIG_BNX2) += bnx2.o
obj-$(CONFIG_CNIC) += cnic.o
obj-$(CONFIG_BNX2X) += bnx2x/
diff --git a/drivers/net/ethernet/broadcom/genet/Makefile b/drivers/net/ethernet/broadcom/genet/Makefile
new file mode 100644
index 0000000..31f55a9
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_BCMGENET) += genet.o
+genet-objs := bcmgenet.o bcmmii.o
--
1.8.3.2
* [PATCH net-next 09/10] Documentation: add Device tree bindings for Broadcom GENET
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
This patch adds the Device Tree bindings for the Broadcom GENET Gigabit
Ethernet controller. A bunch of examples are provided to illustrate the
versatile aspect of the hardware.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
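For reference, the fixed-link property documented below is a list of five
32-bit big-endian cells. A minimal user-space sketch of decoding it follows,
assuming the <a b c d e> layout from the binding text. struct fixed_link,
be32_cell() and parse_fixed_link() are hypothetical names, not kernel APIs,
and the sketch rejects any property shorter than five cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of fixed-link = <a b c d e> */
struct fixed_link {
	uint32_t phy_id;	/* a: emulated PHY id */
	uint32_t duplex;	/* b: 0 half, 1 full */
	uint32_t speed;		/* c: 10, 100 or 1000 */
	uint32_t pause;		/* d: 0 no pause, 1 pause */
	uint32_t asym_pause;	/* e: 0 no asym_pause, 1 asym_pause */
};

/* Read one big-endian 32-bit device tree cell */
static uint32_t be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the property holds fewer than five cells */
static int parse_fixed_link(const uint8_t *prop, size_t len,
			    struct fixed_link *out)
{
	assert(prop && out);

	if (len < 5 * 4)
		return -1;

	out->phy_id = be32_cell(prop);
	out->duplex = be32_cell(prop + 4);
	out->speed = be32_cell(prop + 8);
	out->pause = be32_cell(prop + 12);
	out->asym_pause = be32_cell(prop + 16);
	return 0;
}
```

Fed the raw bytes of the MoCA example, fixed-link = <1 0 1000 0 0>, the
parser yields phy_id 1, half duplex and a speed of 1000.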
.../devicetree/bindings/net/broadcom-bcmgenet.txt | 111 +++++++++++++++++++++
1 file changed, 111 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
diff --git a/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
new file mode 100644
index 0000000..93c58e9
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
@@ -0,0 +1,111 @@
+* Broadcom BCM7xxx Ethernet Controller (GENET)
+
+Required properties:
+- compatible: should be "brcm,genet-v1", "brcm,genet-v2", "brcm,genet-v3",
+ "brcm,genet-v4".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device
+- mdio bus node: this node should always be present regardless of the PHY
+ configuration of the GENET instance
+- phy-mode: The interface between the SoC and the PHY (a string that
+ of_get_phy_mode() can understand).
+
+MDIO bus node required properties:
+
+- compatible: should be "brcm,genet-mdio-v<N>"
+- reg: address and length relative to the parent node base register address
+- #address-cells: address cell for MDIO bus addressing, should be 1
+- #size-cells: size of the cells for MDIO bus addressing, should be 0
+
+Optional properties:
+- phy-handle: A phandle to a phy node defining the PHY address (as the reg
+ property, a single integer), used to describe configurations where a PHY
+ (internal or external) is used.
+
+- fixed-link: When the GENET interface is connected to a MoCA hardware block
+ or when operating in a RGMII to RGMII type of connection, or when the
+ MDIO bus is voluntarily disabled, this property should be used to describe
+ the "fixed link", the property is described as follows:
+
+ fixed-link: <a b c d e> where a is emulated phy id - choose any,
+ but unique among all specified fixed-links, b is duplex - 0 half,
+ 1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
+ pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
+
+Internal Gigabit PHY example:
+
+ethernet@f0b60000 {
+ phy-mode = "internal";
+ phy-handle = <&phy1>;
+ mac-address = [ 00 10 18 36 23 1a ];
+ compatible = "brcm,genet-v4";
+ #address-cells = <0x1>;
+ #size-cells = <0x1>;
+ device_type = "ethernet";
+ reg = <0xf0b60000 0xfc4c>;
+ interrupts = <0x0 0x14 0x0 0x0 0x15 0x0>;
+
+ mdio@b60e14 {
+ compatible = "brcm,genet-mdio-v4";
+ #address-cells = <0x1>;
+ #size-cells = <0x0>;
+ reg = <0xb60e14 0x8>;
+
+ phy1: ethernet-phy@1 {
+ device_type = "ethernet-phy";
+ max-speed = <1000>;
+ reg = <0x1>;
+ compatible = "brcm,28nm-gphy", "ethernet-phy-ieee802.3-c22";
+ };
+ };
+};
+
+MoCA interface / MAC to MAC example:
+
+ethernet@f0b80000 {
+ phy-mode = "moca";
+ fixed-link = <1 0 1000 0 0>;
+ mac-address = [ 00 10 18 36 24 1a ];
+ compatible = "brcm,genet-v4";
+ #address-cells = <0x1>;
+ #size-cells = <0x1>;
+ device_type = "ethernet";
+ reg = <0xf0b80000 0xfc4c>;
+ interrupts = <0x0 0x16 0x0 0x0 0x17 0x0>;
+
+ mdio@b80e14 {
+ compatible = "brcm,genet-mdio-v4";
+ #address-cells = <0x1>;
+ #size-cells = <0x0>;
+ reg = <0xb80e14 0x8>;
+ };
+};
+
+
+External MDIO-connected Gigabit PHY/switch:
+
+ethernet@f0ba0000 {
+ phy-mode = "rgmii";
+ phy-handle = <&phy0>;
+ mac-address = [ 00 10 18 36 26 1a ];
+ compatible = "brcm,genet-v4";
+ #address-cells = <0x1>;
+ #size-cells = <0x1>;
+ device_type = "ethernet";
+ reg = <0xf0ba0000 0xfc4c>;
+ interrupts = <0x0 0x18 0x0 0x0 0x19 0x0>;
+
+ mdio@ba0e14 {
+ compatible = "brcm,genet-mdio-v4";
+ #address-cells = <0x1>;
+ #size-cells = <0x0>;
+ reg = <0xba0e14 0x8>;
+
+ phy0: ethernet-phy@0 {
+ device_type = "ethernet-phy";
+ max-speed = <1000>;
+ reg = <0x0>;
+ compatible = "brcm,bcm53125", "ethernet-phy-ieee802.3-c22";
+ };
+ };
+};
--
1.8.3.2
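
Editorial illustration (not part of the patch): the five-cell fixed-link
property described in the binding above can be decoded as in the minimal
sketch below. The names fixed_link_cfg and fixed_link_decode are hypothetical;
a real driver would read the cells through the kernel's device tree helpers
rather than from a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the five cells <a b c d e>. */
struct fixed_link_cfg {
	uint32_t phy_id;	/* a: emulated PHY id */
	bool full_duplex;	/* b: 0 half, 1 full */
	uint32_t speed;		/* c: 10, 100 or 1000 */
	bool pause;		/* d: 0 no pause, 1 pause */
	bool asym_pause;	/* e: 0 no asym_pause, 1 asym_pause */
};

/* Returns 0 on success, -1 if the cell count or any value is invalid. */
static int fixed_link_decode(const uint32_t *cells, size_t n,
			     struct fixed_link_cfg *out)
{
	if (n != 5)
		return -1;
	/* b, d and e are booleans encoded as 0 or 1 */
	if (cells[1] > 1 || cells[3] > 1 || cells[4] > 1)
		return -1;
	if (cells[2] != 10 && cells[2] != 100 && cells[2] != 1000)
		return -1;

	out->phy_id = cells[0];
	out->full_duplex = cells[1] == 1;
	out->speed = cells[2];
	out->pause = cells[3] == 1;
	out->asym_pause = cells[4] == 1;
	return 0;
}
```

Passing the cells from the MoCA example, <1 0 1000 0 0>, yields PHY id 1,
half duplex, 1000 Mbit/s and no pause settings.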
* [PATCH net-next 04/10] net: phy: add Broadcom BCM7xxx internal PHY driver
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
This patch adds support for the internal PHYs of the Broadcom BCM7xxx
Set Top Box SoCs. The driver supports the following generations of SoCs:
- BCM7366, BCM7439, BCM7445 (28nm process)
- all 40nm and 65nm (older MIPS-based SoCs)
The PHYs on these SoCs require several workarounds to operate
correctly, both at configuration time and at suspend/resume time;
the driver applies them.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/phy/Kconfig | 6 +
drivers/net/phy/Makefile | 1 +
drivers/net/phy/bcm7xxx.c | 322 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/brcmphy.h | 9 ++
4 files changed, 338 insertions(+)
create mode 100644 drivers/net/phy/bcm7xxx.c
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 9b5d46c..6a17f92 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -71,6 +71,12 @@ config BCM63XX_PHY
---help---
Currently supports the 6348 and 6358 PHYs.
+config BCM7XXX_PHY
+ tristate "Drivers for Broadcom 7xxx SOCs internal PHYs"
+ ---help---
+ Currently supports the BCM7366, BCM7439, BCM7445, and
+ 40nm and 65nm generation of BCM7xxx Set Top Box SoCs.
+
config BCM87XX_PHY
tristate "Driver for Broadcom BCM8706 and BCM8727 PHYs"
help
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 9013dfa..07d2402 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SMSC_PHY) += smsc.o
obj-$(CONFIG_VITESSE_PHY) += vitesse.o
obj-$(CONFIG_BROADCOM_PHY) += broadcom.o
obj-$(CONFIG_BCM63XX_PHY) += bcm63xx.o
+obj-$(CONFIG_BCM7XXX_PHY) += bcm7xxx.o
obj-$(CONFIG_BCM87XX_PHY) += bcm87xx.o
obj-$(CONFIG_ICPLUS_PHY) += icplus.o
obj-$(CONFIG_REALTEK_PHY) += realtek.o
diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
new file mode 100644
index 0000000..f9ac282
--- /dev/null
+++ b/drivers/net/phy/bcm7xxx.c
@@ -0,0 +1,322 @@
+/*
+ * Broadcom BCM7xxx internal transceivers support.
+ *
+ * Copyright (C) 2014, Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/delay.h>
+#include <linux/brcmphy.h>
+
+static int bcm7445_config_init(struct phy_device *phydev)
+{
+ int ret;
+ const struct bcm7445_regs {
+ int reg;
+ u16 value;
+ } bcm7445_regs_cfg[] = {
+ /* increases ADC latency by 24ns */
+ { 0x17, 0x0038 },
+ { 0x15, 0xAB95 },
+ /* increases internal 1V LDO voltage by 5% */
+ { 0x17, 0x2038 },
+ { 0x15, 0xBB22 },
+ /* reduce RX low pass filter corner frequency */
+ { 0x17, 0x6038 },
+ { 0x15, 0xFFC5 },
+ /* reduce RX high pass filter corner frequency */
+ { 0x17, 0x003a },
+ { 0x15, 0x2002 },
+ };
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(bcm7445_regs_cfg); i++) {
+ ret = phy_write(phydev,
+ bcm7445_regs_cfg[i].reg,
+ bcm7445_regs_cfg[i].value);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void phy_write_exp(struct phy_device *phydev,
+ u16 reg, u16 value)
+{
+ phy_write(phydev, 0x17, 0xf00 | reg);
+ phy_write(phydev, 0x15, value);
+}
+
+static void phy_write_misc(struct phy_device *phydev,
+ u16 reg, u16 chl, u16 value)
+{
+ int tmp;
+
+ phy_write(phydev, 0x18, 0x7);
+
+ tmp = phy_read(phydev, 0x18);
+ tmp |= 0x800;
+ phy_write(phydev, 0x18, tmp);
+
+ tmp = (chl * 0x2000) | reg;
+ phy_write(phydev, 0x17, tmp);
+
+ phy_write(phydev, 0x15, value);
+}
+
+static int bcm7xxx_28nm_afe_config_init(struct phy_device *phydev)
+{
+ /* write AFE_RXCONFIG_0 */
+ phy_write_misc(phydev, 0x38, 0x0000, 0xeb17);
+
+ /* write AFE_RXCONFIG_1 */
+ phy_write_misc(phydev, 0x38, 0x0001, 0x9a3f);
+
+ /* write AFE_RX_LP_COUNTER */
+ phy_write_misc(phydev, 0x38, 0x0003, 0x7fc7);
+
+ /* write AFE_HPF_TRIM_OTHERS */
+ phy_write_misc(phydev, 0x3A, 0x0000, 0x000b);
+
+ /* write AFE_TX_CONFIG */
+ phy_write_misc(phydev, 0x39, 0x0000, 0x0800);
+
+ /* Increase VCO range to prevent unlocking problem of PLL at low
+ * temp
+ */
+ phy_write_misc(phydev, 0x0032, 0x0001, 0x0048);
+
+ /* Change Ki to 011 */
+ phy_write_misc(phydev, 0x0032, 0x0002, 0x021b);
+
+ /* Disable loading of TVCO buffer to bandgap, set bandgap trim
+ * to 111
+ */
+ phy_write_misc(phydev, 0x0033, 0x0000, 0x0e20);
+
+ /* Adjust bias current trim by -3 */
+ phy_write_misc(phydev, 0x000a, 0x0000, 0x690b);
+
+ /* Switch to CORE_BASE1E */
+ phy_write(phydev, 0x1e, 0xd);
+
+ /* Reset R_CAL/RC_CAL Engine */
+ phy_write_exp(phydev, 0x00b0, 0x0010);
+
+ /* Disable Reset R_CAL/RC_CAL Engine */
+ phy_write_exp(phydev, 0x00b0, 0x0000);
+
+ return 0;
+}
+
+static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
+{
+ int ret;
+
+ ret = bcm7445_config_init(phydev);
+ if (ret)
+ return ret;
+
+ return bcm7xxx_28nm_afe_config_init(phydev);
+}
+
+static int phy_set_clr_bits(struct phy_device *dev, int location,
+ int set_mask, int clr_mask)
+{
+ int v, ret;
+
+ v = phy_read(dev, location);
+ if (v < 0)
+ return v;
+
+ v &= ~clr_mask;
+ v |= set_mask;
+
+ ret = phy_write(dev, location, v);
+ if (ret < 0)
+ return ret;
+
+ return v;
+}
+
+static int bcm7xxx_config_init(struct phy_device *phydev)
+{
+ /* Enable 64 clock MDIO */
+ phy_write(phydev, 0x1d, 0x1000);
+ phy_read(phydev, 0x1d);
+
+ /* Workaround only required for 100Mbits/sec */
+ if (!(phydev->dev_flags & PHY_BRCM_100MBPS_WAR))
+ return 0;
+
+ /* set shadow mode 2 */
+ phy_set_clr_bits(phydev, 0x1f, 0x0004, 0x0004);
+
+ /* set iddq_clkbias */
+ phy_write(phydev, 0x14, 0x0F00);
+ udelay(10);
+
+ /* reset iddq_clkbias */
+ phy_write(phydev, 0x14, 0x0C00);
+
+ phy_write(phydev, 0x13, 0x7555);
+
+ /* reset shadow mode 2: clear the bit instead of setting it again */
+ phy_set_clr_bits(phydev, 0x1f, 0, 0x0004);
+
+ return 0;
+}
+
+/* Workaround for putting the PHY in IDDQ mode, required
+ * for all BCM7XXX PHYs
+ */
+static int bcm7xxx_suspend(struct phy_device *phydev)
+{
+ int ret;
+ const struct bcm7xxx_regs {
+ int reg;
+ u16 value;
+ } bcm7xxx_suspend_cfg[] = {
+ { 0x1f, 0x008b },
+ { 0x10, 0x01c0 },
+ { 0x14, 0x7000 },
+ { 0x1f, 0x000f },
+ { 0x10, 0x20d0 },
+ { 0x1f, 0x000b },
+ };
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(bcm7xxx_suspend_cfg); i++) {
+ ret = phy_write(phydev,
+ bcm7xxx_suspend_cfg[i].reg,
+ bcm7xxx_suspend_cfg[i].value);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int bcm7xxx_dummy_config_init(struct phy_device *phydev)
+{
+ return 0;
+}
+
+static struct phy_driver bcm7xxx_driver[] = {
+{
+ .phy_id = PHY_ID_BCM7366,
+ .phy_id_mask = 0xfffffff0,
+ .name = "Broadcom BCM7366",
+ .features = PHY_GBIT_FEATURES |
+ SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm7xxx_28nm_afe_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .suspend = bcm7xxx_suspend,
+ .resume = bcm7xxx_28nm_afe_config_init,
+ .driver = { .owner = THIS_MODULE },
+}, {
+ .phy_id = PHY_ID_BCM7439,
+ .phy_id_mask = 0xfffffff0,
+ .name = "Broadcom BCM7439",
+ .features = PHY_GBIT_FEATURES |
+ SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm7xxx_28nm_afe_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .suspend = bcm7xxx_suspend,
+ .resume = bcm7xxx_28nm_afe_config_init,
+ .driver = { .owner = THIS_MODULE },
+}, {
+ .phy_id = PHY_ID_BCM7445,
+ .phy_id_mask = 0xfffffff0,
+ .name = "Broadcom BCM7445",
+ .features = PHY_GBIT_FEATURES |
+ SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm7xxx_28nm_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .suspend = bcm7xxx_suspend,
+ .resume = bcm7xxx_28nm_config_init,
+ .driver = { .owner = THIS_MODULE },
+}, {
+ .name = "Broadcom BCM7XXX 28nm",
+ .phy_id = PHY_ID_BCM7XXX_28,
+ .phy_id_mask = PHY_BCM_OUI_MASK,
+ .features = PHY_GBIT_FEATURES |
+ SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm7xxx_28nm_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .suspend = bcm7xxx_suspend,
+ .resume = bcm7xxx_28nm_config_init,
+ .driver = { .owner = THIS_MODULE },
+}, {
+ .phy_id = PHY_BCM_OUI_4,
+ .phy_id_mask = 0xffff0000,
+ .name = "Broadcom BCM7XXX 40nm",
+ .features = PHY_GBIT_FEATURES |
+ SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm7xxx_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .suspend = bcm7xxx_suspend,
+ .resume = bcm7xxx_config_init,
+ .driver = { .owner = THIS_MODULE },
+}, {
+ .phy_id = PHY_BCM_OUI_5,
+ .phy_id_mask = 0xffffff00,
+ .name = "Broadcom BCM7XXX 65nm",
+ .features = PHY_BASIC_FEATURES |
+ SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+ .flags = PHY_IS_INTERNAL,
+ .config_init = bcm7xxx_dummy_config_init,
+ .config_aneg = genphy_config_aneg,
+ .read_status = genphy_read_status,
+ .suspend = bcm7xxx_suspend,
+ .resume = bcm7xxx_config_init,
+ .driver = { .owner = THIS_MODULE },
+} };
+
+static struct mdio_device_id __maybe_unused bcm7xxx_tbl[] = {
+ { PHY_ID_BCM7366, 0xfffffff0, },
+ { PHY_ID_BCM7439, 0xfffffff0, },
+ { PHY_ID_BCM7445, 0xfffffff0, },
+ { PHY_ID_BCM7XXX_28, 0xfffffc00 },
+ { PHY_BCM_OUI_4, 0xffff0000 },
+ { PHY_BCM_OUI_5, 0xffffff00 },
+ { }
+};
+
+static int __init bcm7xxx_phy_init(void)
+{
+ return phy_drivers_register(bcm7xxx_driver,
+ ARRAY_SIZE(bcm7xxx_driver));
+}
+
+static void __exit bcm7xxx_phy_exit(void)
+{
+ phy_drivers_unregister(bcm7xxx_driver,
+ ARRAY_SIZE(bcm7xxx_driver));
+}
+
+module_init(bcm7xxx_phy_init);
+module_exit(bcm7xxx_phy_exit);
+
+MODULE_DEVICE_TABLE(mdio, bcm7xxx_tbl);
+
+MODULE_DESCRIPTION("Broadcom BCM7xxx internal PHY driver");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Broadcom Corporation");
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 677b4f0..e9fc98d 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -13,10 +13,17 @@
#define PHY_ID_BCM5461 0x002060c0
#define PHY_ID_BCM57780 0x03625d90
+#define PHY_ID_BCM7366 0x600d8490
+#define PHY_ID_BCM7439 0x600d8480
+#define PHY_ID_BCM7445 0x600d8510
+#define PHY_ID_BCM7XXX_28 0x600d8400
+
#define PHY_BCM_OUI_MASK 0xfffffc00
#define PHY_BCM_OUI_1 0x00206000
#define PHY_BCM_OUI_2 0x0143bc00
#define PHY_BCM_OUI_3 0x03625c00
+#define PHY_BCM_OUI_4 0x600d0000
+#define PHY_BCM_OUI_5 0x03625e00
#define PHY_BCM_FLAGS_MODE_COPPER 0x00000001
@@ -31,6 +38,8 @@
#define PHY_BRCM_EXT_IBND_TX_ENABLE 0x00002000
#define PHY_BRCM_CLEAR_RGMII_MODE 0x00004000
#define PHY_BRCM_DIS_TXCRXC_NOENRGY 0x00008000
+/* Broadcom BCM7xxx specific workarounds */
+#define PHY_BRCM_100MBPS_WAR 0x00010000
#define PHY_BCM_FLAGS_VALID 0x80000000
#endif /* _LINUX_BRCMPHY_H */
--
1.8.3.2
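
Editorial illustration (not part of the patch): the driver table above relies
on masked PHY ID comparison, and the generic entries overlap the specific
ones, so ordering matters. The sketch below copies the IDs and masks from the
patch and assumes entries are tried in table order with the first match
winning; id_entry and match_phy are hypothetical names, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: PHY ID, comparison mask, driver name. */
struct id_entry {
	uint32_t id;
	uint32_t mask;
	const char *name;
};

/* IDs and masks copied from the patch, most specific entries first. */
static const struct id_entry bcm7xxx_ids[] = {
	{ 0x600d8490, 0xfffffff0, "Broadcom BCM7366" },
	{ 0x600d8480, 0xfffffff0, "Broadcom BCM7439" },
	{ 0x600d8510, 0xfffffff0, "Broadcom BCM7445" },
	{ 0x600d8400, 0xfffffc00, "Broadcom BCM7XXX 28nm" },
	{ 0x600d0000, 0xffff0000, "Broadcom BCM7XXX 40nm" },
	{ 0x03625e00, 0xffffff00, "Broadcom BCM7XXX 65nm" },
};

/* Returns the first entry whose masked ID equals the masked phy_id. */
static const struct id_entry *match_phy(uint32_t phy_id)
{
	size_t i;

	for (i = 0; i < sizeof(bcm7xxx_ids) / sizeof(bcm7xxx_ids[0]); i++) {
		const struct id_entry *e = &bcm7xxx_ids[i];

		if ((phy_id & e->mask) == (e->id & e->mask))
			return e;
	}
	return NULL;
}
```

With this ordering a BCM7445 ID such as 0x600d8512 hits its dedicated entry,
even though the generic 28nm entry (mask 0xfffffc00) and the 40nm entry (mask
0xffff0000) would also accept it. Reordering the table would change which
config_init callback runs.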
* [PATCH net-next 10/10] MAINTAINERS: add entry for the Broadcom GENET driver
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
Add myself as a maintainer of the Broadcom GENET driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
MAINTAINERS | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 091b50e..5a7b3ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1845,6 +1845,12 @@ L: netdev@vger.kernel.org
S: Supported
F: drivers/net/ethernet/broadcom/b44.*
+BROADCOM GENET ETHERNET DRIVER
+M: Florian Fainelli <f.fainelli@gmail.com>
+L: netdev@vger.kernel.org
+S: Supported
+F: drivers/net/ethernet/broadcom/genet/
+
BROADCOM BNX2 GIGABIT ETHERNET DRIVER
M: Michael Chan <mchan@broadcom.com>
L: netdev@vger.kernel.org
--
1.8.3.2
* [PATCH net-next 06/10] net: bcmgenet: add main driver file
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
This patch adds the BCMGENET main driver file which supports the
following:
- GENET hardware from V1 to V4
- support for reading the UniMAC MIB counters statistics
- support for the 5 transmit queues
- support for RX/TX checksum offload and SG
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2685 ++++++++++++++++++++++++
1 file changed, 2685 insertions(+)
create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.c
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
new file mode 100644
index 0000000..de21261
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -0,0 +1,2685 @@
+/*
+ * Broadcom GENET (Gigabit Ethernet) controller driver
+ *
+ * Copyright (c) 2014 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#define pr_fmt(fmt) "bcmgenet: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/fcntl.h>
+#include <linux/interrupt.h>
+#include <linux/string.h>
+#include <linux/if_ether.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/delay.h>
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <linux/pm.h>
+#include <linux/clk.h>
+#include <linux/version.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_net.h>
+#include <linux/of_platform.h>
+#include <net/arp.h>
+
+#include <linux/mii.h>
+#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/phy.h>
+
+#include <asm/unaligned.h>
+
+#include "bcmgenet.h"
+
+/* Maximum number of hardware queues, downsized if needed */
+#define GENET_MAX_MQ_CNT 4
+
+/* Default highest priority queue for multi queue support */
+#define GENET_Q0_PRIORITY 0
+
+#define GENET_DEFAULT_BD_CNT \
+ (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->bds_cnt)
+
+#define RX_BUF_LENGTH 2048
+#define SKB_ALIGNMENT 32
+
+/* Tx/Rx DMA register offset, skip 256 descriptors */
+#define WORDS_PER_BD(p) (p->hw_params->words_per_bd)
+#define DMA_DESC_SIZE (WORDS_PER_BD(priv) * sizeof(u32))
+
+#define GENET_TDMA_REG_OFF (priv->hw_params->tdma_offset + \
+ TOTAL_DESC * DMA_DESC_SIZE)
+
+#define GENET_RDMA_REG_OFF (priv->hw_params->rdma_offset + \
+ TOTAL_DESC * DMA_DESC_SIZE)
+
+static inline void dmadesc_set_length_status(struct bcmgenet_priv *priv,
+ void __iomem *d, u32 value)
+{
+ __raw_writel(value, d + DMA_DESC_LENGTH_STATUS);
+}
+
+static inline u32 dmadesc_get_length_status(struct bcmgenet_priv *priv,
+ void __iomem *d)
+{
+ return __raw_readl(d + DMA_DESC_LENGTH_STATUS);
+}
+
+static inline void dmadesc_set_addr(struct bcmgenet_priv *priv,
+ void __iomem *d,
+ dma_addr_t addr)
+{
+ __raw_writel(lower_32_bits(addr), d + DMA_DESC_ADDRESS_LO);
+
+ /* Register writes to the GISB bus can take a couple hundred
+ * nanoseconds and are done for each packet, so skip these expensive
+ * writes unless the platform is explicitly configured for 64-bit/LPAE.
+ */
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+ if (priv->hw_params->flags & GENET_HAS_40BITS)
+ __raw_writel(upper_32_bits(addr), d + DMA_DESC_ADDRESS_HI);
+#endif
+}
+
+/* Combined address + length/status setter */
+static inline void dmadesc_set(struct bcmgenet_priv *priv,
+ void __iomem *d, dma_addr_t addr, u32 val)
+{
+ dmadesc_set_length_status(priv, d, val);
+ dmadesc_set_addr(priv, d, addr);
+}
+
+static inline dma_addr_t dmadesc_get_addr(struct bcmgenet_priv *priv,
+ void __iomem *d)
+{
+ dma_addr_t addr;
+
+ addr = __raw_readl(d + DMA_DESC_ADDRESS_LO);
+
+ /* Register reads from the GISB bus can take a couple hundred
+ * nanoseconds and are done for each packet, so skip these expensive
+ * reads unless the platform is explicitly configured for 64-bit/LPAE.
+ */
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+ if (priv->hw_params->flags & GENET_HAS_40BITS)
+ addr |= (u64)__raw_readl(d + DMA_DESC_ADDRESS_HI) << 32;
+#endif
+ return addr;
+}
+
+#define GENET_VER_FMT "%1d.%1d EPHY: 0x%04x"
+
+#define GENET_MSG_DEFAULT (NETIF_MSG_DRV | NETIF_MSG_PROBE | \
+ NETIF_MSG_LINK)
+
+static int debug = -1;
+module_param(debug, int, 0);
+MODULE_PARM_DESC(debug, "GENET debug level");
+
+enum bcmgenet_version __genet_get_version(struct bcmgenet_priv *priv)
+{
+ return priv->version;
+}
+
+
+static inline u32 bcmgenet_rbuf_ctrl_get(struct bcmgenet_priv *priv)
+{
+ if (GENET_IS_V1(priv))
+ return bcmgenet_rbuf_readl(priv, RBUF_FLUSH_CTRL_V1);
+ else
+ return bcmgenet_sys_readl(priv, SYS_RBUF_FLUSH_CTRL);
+}
+
+static inline void bcmgenet_rbuf_ctrl_set(struct bcmgenet_priv *priv, u32 val)
+{
+ if (GENET_IS_V1(priv))
+ bcmgenet_rbuf_writel(priv, val, RBUF_FLUSH_CTRL_V1);
+ else
+ bcmgenet_sys_writel(priv, val, SYS_RBUF_FLUSH_CTRL);
+}
+
+/* These accessors deal with the register map change
+ * between GENET1.1 and GENET2. Only those currently used
+ * by the driver are defined.
+ */
+static inline u32 bcmgenet_tbuf_ctrl_get(struct bcmgenet_priv *priv)
+{
+ if (GENET_IS_V1(priv))
+ return bcmgenet_rbuf_readl(priv, TBUF_CTRL_V1);
+ else
+ return __raw_readl(priv->base +
+ priv->hw_params->tbuf_offset + TBUF_CTRL);
+}
+
+static inline void bcmgenet_tbuf_ctrl_set(struct bcmgenet_priv *priv, u32 val)
+{
+ if (GENET_IS_V1(priv))
+ bcmgenet_rbuf_writel(priv, val, TBUF_CTRL_V1);
+ else
+ __raw_writel(val, priv->base +
+ priv->hw_params->tbuf_offset + TBUF_CTRL);
+}
+
+static inline u32 bcmgenet_bp_mc_get(struct bcmgenet_priv *priv)
+{
+ if (GENET_IS_V1(priv))
+ return bcmgenet_rbuf_readl(priv, TBUF_BP_MC_V1);
+ else
+ return __raw_readl(priv->base +
+ priv->hw_params->tbuf_offset + TBUF_BP_MC);
+}
+
+static inline void bcmgenet_bp_mc_set(struct bcmgenet_priv *priv, u32 val)
+{
+ if (GENET_IS_V1(priv))
+ bcmgenet_rbuf_writel(priv, val, TBUF_BP_MC_V1);
+ else
+ __raw_writel(val, priv->base +
+ priv->hw_params->tbuf_offset + TBUF_BP_MC);
+}
+
+/* RX/TX DMA register accessors */
+enum dma_reg {
+ DMA_RING_CFG = 0,
+ DMA_CTRL,
+ DMA_STATUS,
+ DMA_SCB_BURST_SIZE,
+ DMA_ARB_CTRL,
+ DMA_PRIORITY,
+ DMA_RING_PRIORITY,
+};
+
+static const u8 bcmgenet_dma_regs_v3plus[] = {
+ [DMA_RING_CFG] = 0x00,
+ [DMA_CTRL] = 0x04,
+ [DMA_STATUS] = 0x08,
+ [DMA_SCB_BURST_SIZE] = 0x0C,
+ [DMA_ARB_CTRL] = 0x2C,
+ [DMA_PRIORITY] = 0x30,
+ [DMA_RING_PRIORITY] = 0x38,
+};
+
+static const u8 bcmgenet_dma_regs_v2[] = {
+ [DMA_RING_CFG] = 0x00,
+ [DMA_CTRL] = 0x04,
+ [DMA_STATUS] = 0x08,
+ [DMA_SCB_BURST_SIZE] = 0x0C,
+ [DMA_ARB_CTRL] = 0x30,
+ [DMA_PRIORITY] = 0x34,
+ [DMA_RING_PRIORITY] = 0x3C,
+};
+
+static const u8 bcmgenet_dma_regs_v1[] = {
+ [DMA_CTRL] = 0x00,
+ [DMA_STATUS] = 0x04,
+ [DMA_SCB_BURST_SIZE] = 0x0C,
+ [DMA_ARB_CTRL] = 0x30,
+ [DMA_PRIORITY] = 0x34,
+ [DMA_RING_PRIORITY] = 0x3C,
+};
+
+/* Set at runtime once bcmgenet version is known */
+static const u8 *bcmgenet_dma_regs;
+
+static inline struct bcmgenet_priv *dev_to_priv(struct device *dev)
+{
+ return netdev_priv(dev_get_drvdata(dev));
+}
+
+static inline u32 bcmgenet_tdma_readl(struct bcmgenet_priv *priv,
+ enum dma_reg r)
+{
+ return __raw_readl(priv->base + GENET_TDMA_REG_OFF +
+ DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+static inline void bcmgenet_tdma_writel(struct bcmgenet_priv *priv,
+ u32 val, enum dma_reg r)
+{
+ __raw_writel(val, priv->base + GENET_TDMA_REG_OFF +
+ DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+static inline u32 bcmgenet_rdma_readl(struct bcmgenet_priv *priv,
+ enum dma_reg r)
+{
+ return __raw_readl(priv->base + GENET_RDMA_REG_OFF +
+ DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+static inline void bcmgenet_rdma_writel(struct bcmgenet_priv *priv,
+ u32 val, enum dma_reg r)
+{
+ __raw_writel(val, priv->base + GENET_RDMA_REG_OFF +
+ DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+/* RDMA/TDMA ring registers and accessors.
+ * We merge the common fields and just prefix with T/R the registers
+ * whose meaning depends on the direction.
+ */
+enum dma_ring_reg {
+ TDMA_READ_PTR = 0,
+ RDMA_WRITE_PTR = TDMA_READ_PTR,
+ TDMA_READ_PTR_HI,
+ RDMA_WRITE_PTR_HI = TDMA_READ_PTR_HI,
+ TDMA_CONS_INDEX,
+ RDMA_PROD_INDEX = TDMA_CONS_INDEX,
+ TDMA_PROD_INDEX,
+ RDMA_CONS_INDEX = TDMA_PROD_INDEX,
+ DMA_RING_BUF_SIZE,
+ DMA_START_ADDR,
+ DMA_START_ADDR_HI,
+ DMA_END_ADDR,
+ DMA_END_ADDR_HI,
+ DMA_MBUF_DONE_THRESH,
+ TDMA_FLOW_PERIOD,
+ RDMA_XON_XOFF_THRESH = TDMA_FLOW_PERIOD,
+ TDMA_WRITE_PTR,
+ RDMA_READ_PTR = TDMA_WRITE_PTR,
+ TDMA_WRITE_PTR_HI,
+ RDMA_READ_PTR_HI = TDMA_WRITE_PTR_HI
+};
+
+/* GENET v4 supports 40-bit pointer addressing.
+ * The LO and HI word parts of each pointer are
+ * contiguous, which shifts the offsets of the
+ * other registers.
+ */
+static const u8 genet_dma_ring_regs_v4[] = {
+ [TDMA_READ_PTR] = 0x00,
+ [TDMA_READ_PTR_HI] = 0x04,
+ [TDMA_CONS_INDEX] = 0x08,
+ [TDMA_PROD_INDEX] = 0x0C,
+ [DMA_RING_BUF_SIZE] = 0x10,
+ [DMA_START_ADDR] = 0x14,
+ [DMA_START_ADDR_HI] = 0x18,
+ [DMA_END_ADDR] = 0x1C,
+ [DMA_END_ADDR_HI] = 0x20,
+ [DMA_MBUF_DONE_THRESH] = 0x24,
+ [TDMA_FLOW_PERIOD] = 0x28,
+ [TDMA_WRITE_PTR] = 0x2C,
+ [TDMA_WRITE_PTR_HI] = 0x30,
+};
+
+static const u8 genet_dma_ring_regs_v123[] = {
+ [TDMA_READ_PTR] = 0x00,
+ [TDMA_CONS_INDEX] = 0x04,
+ [TDMA_PROD_INDEX] = 0x08,
+ [DMA_RING_BUF_SIZE] = 0x0C,
+ [DMA_START_ADDR] = 0x10,
+ [DMA_END_ADDR] = 0x14,
+ [DMA_MBUF_DONE_THRESH] = 0x18,
+ [TDMA_FLOW_PERIOD] = 0x1C,
+ [TDMA_WRITE_PTR] = 0x20,
+};
+
+/* Set at runtime once GENET version is known */
+static const u8 *genet_dma_ring_regs;
+
+static inline u32 bcmgenet_tdma_ring_readl(struct bcmgenet_priv *priv,
+ unsigned int ring,
+ enum dma_ring_reg r)
+{
+ return __raw_readl(priv->base + GENET_TDMA_REG_OFF +
+ (DMA_RING_SIZE * ring) +
+ genet_dma_ring_regs[r]);
+}
+
+static inline void bcmgenet_tdma_ring_writel(struct bcmgenet_priv *priv,
+ unsigned int ring,
+ u32 val,
+ enum dma_ring_reg r)
+{
+ __raw_writel(val, priv->base + GENET_TDMA_REG_OFF +
+ (DMA_RING_SIZE * ring) +
+ genet_dma_ring_regs[r]);
+}
+
+static inline u32 bcmgenet_rdma_ring_readl(struct bcmgenet_priv *priv,
+ unsigned int ring,
+ enum dma_ring_reg r)
+{
+ return __raw_readl(priv->base + GENET_RDMA_REG_OFF +
+ (DMA_RING_SIZE * ring) +
+ genet_dma_ring_regs[r]);
+}
+
+static inline void bcmgenet_rdma_ring_writel(struct bcmgenet_priv *priv,
+ unsigned int ring,
+ u32 val,
+ enum dma_ring_reg r)
+{
+ __raw_writel(val, priv->base + GENET_RDMA_REG_OFF +
+ (DMA_RING_SIZE * ring) +
+ genet_dma_ring_regs[r]);
+}
+
+static int bcmgenet_get_settings(struct net_device *dev,
+ struct ethtool_cmd *cmd)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ if (!netif_running(dev))
+ return -EINVAL;
+
+ if (!priv->phydev)
+ return -ENODEV;
+
+ return phy_ethtool_gset(priv->phydev, cmd);
+}
+
+static int bcmgenet_set_settings(struct net_device *dev,
+ struct ethtool_cmd *cmd)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ if (!netif_running(dev))
+ return -EINVAL;
+
+ if (!priv->phydev)
+ return -ENODEV;
+
+ return phy_ethtool_sset(priv->phydev, cmd);
+}
+
+static int bcmgenet_set_rx_csum(struct net_device *dev,
+ netdev_features_t wanted)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ u32 rbuf_chk_ctrl;
+ int rx_csum_en;
+
+ rx_csum_en = !!(wanted & NETIF_F_RXCSUM);
+
+ spin_lock_bh(&priv->bh_lock);
+ rbuf_chk_ctrl = bcmgenet_rbuf_readl(priv, RBUF_CHK_CTRL);
+
+ /* enable rx checksumming */
+ if (!rx_csum_en)
+ rbuf_chk_ctrl &= ~RBUF_RXCHK_EN;
+ else
+ rbuf_chk_ctrl |= RBUF_RXCHK_EN;
+ priv->desc_rxchk_en = rx_csum_en;
+ bcmgenet_rbuf_writel(priv, rbuf_chk_ctrl, RBUF_CHK_CTRL);
+
+ spin_unlock_bh(&priv->bh_lock);
+
+ return 0;
+}
+static int bcmgenet_set_tx_csum(struct net_device *dev,
+ netdev_features_t wanted)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int desc_64b_en;
+ u32 tbuf_ctrl, rbuf_ctrl;
+
+ spin_lock_bh(&priv->bh_lock);
+ tbuf_ctrl = bcmgenet_tbuf_ctrl_get(priv);
+ rbuf_ctrl = bcmgenet_rbuf_readl(priv, RBUF_CTRL);
+
+ desc_64b_en = !!(wanted & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM));
+
+ /* enable 64bytes descriptor in both directions (RBUF and TBUF) */
+ if (!desc_64b_en) {
+ tbuf_ctrl &= ~RBUF_64B_EN;
+ rbuf_ctrl &= ~RBUF_64B_EN;
+ } else {
+ tbuf_ctrl |= RBUF_64B_EN;
+ rbuf_ctrl |= RBUF_64B_EN;
+ }
+ priv->desc_64b_en = desc_64b_en;
+
+ bcmgenet_tbuf_ctrl_set(priv, tbuf_ctrl);
+ bcmgenet_rbuf_writel(priv, rbuf_ctrl, RBUF_CTRL);
+ spin_unlock_bh(&priv->bh_lock);
+ return 0;
+}
+
+static int bcmgenet_set_features(struct net_device *dev,
+ netdev_features_t features)
+{
+ netdev_features_t changed = features ^ dev->features;
+ netdev_features_t wanted = dev->wanted_features;
+ int ret = 0;
+
+ if (changed & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
+ ret = bcmgenet_set_tx_csum(dev, wanted);
+ if (changed & (NETIF_F_RXCSUM))
+ ret = bcmgenet_set_rx_csum(dev, wanted);
+
+ return ret;
+}
+
+static u32 bcmgenet_get_msglevel(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ return priv->msg_enable;
+}
+
+static void bcmgenet_set_msglevel(struct net_device *dev, u32 level)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ priv->msg_enable = level;
+}
+
+/* standard ethtool support functions. */
+enum bcmgenet_stat_type {
+ BCMGENET_STAT_NETDEV = -1,
+ BCMGENET_STAT_MIB_RX,
+ BCMGENET_STAT_MIB_TX,
+ BCMGENET_STAT_RUNT,
+ BCMGENET_STAT_MISC,
+};
+
+struct bcmgenet_stats {
+ char stat_string[ETH_GSTRING_LEN];
+ int stat_sizeof;
+ int stat_offset;
+ enum bcmgenet_stat_type type;
+ /* reg offset from UMAC base for misc counters */
+ u16 reg_offset;
+};
+
+#define STAT_NETDEV(m) { \
+ .stat_string = __stringify(m), \
+ .stat_sizeof = sizeof(((struct net_device_stats *)0)->m), \
+ .stat_offset = offsetof(struct net_device_stats, m), \
+ .type = BCMGENET_STAT_NETDEV, \
+}
+
+#define STAT_GENET_MIB(str, m, _type) { \
+ .stat_string = str, \
+ .stat_sizeof = sizeof(((struct bcmgenet_priv *)0)->m), \
+ .stat_offset = offsetof(struct bcmgenet_priv, m), \
+ .type = _type, \
+}
+
+#define STAT_GENET_MIB_RX(str, m) STAT_GENET_MIB(str, m, BCMGENET_STAT_MIB_RX)
+#define STAT_GENET_MIB_TX(str, m) STAT_GENET_MIB(str, m, BCMGENET_STAT_MIB_TX)
+#define STAT_GENET_RUNT(str, m) STAT_GENET_MIB(str, m, BCMGENET_STAT_RUNT)
+
+#define STAT_GENET_MISC(str, m, offset) { \
+ .stat_string = str, \
+ .stat_sizeof = sizeof(((struct bcmgenet_priv *)0)->m), \
+ .stat_offset = offsetof(struct bcmgenet_priv, m), \
+ .type = BCMGENET_STAT_MISC, \
+ .reg_offset = offset, \
+}
+
+
+/* There is a 0xC gap between the end of RX and beginning of TX stats and then
+ * between the end of TX stats and the beginning of the RX RUNT
+ */
+#define BCMGENET_STAT_OFFSET 0xc
+
+/* Hardware counters must be kept in sync because the order/offset
+ * is important here (order in structure declaration = order in hardware)
+ */
+static const struct bcmgenet_stats bcmgenet_gstrings_stats[] = {
+ /* general stats */
+ STAT_NETDEV(rx_packets),
+ STAT_NETDEV(tx_packets),
+ STAT_NETDEV(rx_bytes),
+ STAT_NETDEV(tx_bytes),
+ STAT_NETDEV(rx_errors),
+ STAT_NETDEV(tx_errors),
+ STAT_NETDEV(rx_dropped),
+ STAT_NETDEV(tx_dropped),
+ STAT_NETDEV(multicast),
+ /* UniMAC RSV counters */
+ STAT_GENET_MIB_RX("rx_64_octets", mib.rx.pkt_cnt.cnt_64),
+ STAT_GENET_MIB_RX("rx_65_127_oct", mib.rx.pkt_cnt.cnt_127),
+ STAT_GENET_MIB_RX("rx_128_255_oct", mib.rx.pkt_cnt.cnt_255),
+ STAT_GENET_MIB_RX("rx_256_511_oct", mib.rx.pkt_cnt.cnt_511),
+ STAT_GENET_MIB_RX("rx_512_1023_oct", mib.rx.pkt_cnt.cnt_1023),
+ STAT_GENET_MIB_RX("rx_1024_1518_oct", mib.rx.pkt_cnt.cnt_1518),
+ STAT_GENET_MIB_RX("rx_vlan_1519_1522_oct", mib.rx.pkt_cnt.cnt_mgv),
+ STAT_GENET_MIB_RX("rx_1522_2047_oct", mib.rx.pkt_cnt.cnt_2047),
+ STAT_GENET_MIB_RX("rx_2048_4095_oct", mib.rx.pkt_cnt.cnt_4095),
+ STAT_GENET_MIB_RX("rx_4096_9216_oct", mib.rx.pkt_cnt.cnt_9216),
+ STAT_GENET_MIB_RX("rx_pkts", mib.rx.pkt),
+ STAT_GENET_MIB_RX("rx_bytes", mib.rx.bytes),
+ STAT_GENET_MIB_RX("rx_multicast", mib.rx.mca),
+ STAT_GENET_MIB_RX("rx_broadcast", mib.rx.bca),
+ STAT_GENET_MIB_RX("rx_fcs", mib.rx.fcs),
+ STAT_GENET_MIB_RX("rx_control", mib.rx.cf),
+ STAT_GENET_MIB_RX("rx_pause", mib.rx.pf),
+ STAT_GENET_MIB_RX("rx_unknown", mib.rx.uo),
+ STAT_GENET_MIB_RX("rx_align", mib.rx.aln),
+ STAT_GENET_MIB_RX("rx_outrange", mib.rx.flr),
+ STAT_GENET_MIB_RX("rx_code", mib.rx.cde),
+ STAT_GENET_MIB_RX("rx_carrier", mib.rx.fcr),
+ STAT_GENET_MIB_RX("rx_oversize", mib.rx.ovr),
+ STAT_GENET_MIB_RX("rx_jabber", mib.rx.jbr),
+ STAT_GENET_MIB_RX("rx_mtu_err", mib.rx.mtue),
+ STAT_GENET_MIB_RX("rx_good_pkts", mib.rx.pok),
+ STAT_GENET_MIB_RX("rx_unicast", mib.rx.uc),
+ STAT_GENET_MIB_RX("rx_ppp", mib.rx.ppp),
+ STAT_GENET_MIB_RX("rx_crc", mib.rx.rcrc),
+ /* UniMAC TSV counters */
+ STAT_GENET_MIB_TX("tx_64_octets", mib.tx.pkt_cnt.cnt_64),
+ STAT_GENET_MIB_TX("tx_65_127_oct", mib.tx.pkt_cnt.cnt_127),
+ STAT_GENET_MIB_TX("tx_128_255_oct", mib.tx.pkt_cnt.cnt_255),
+ STAT_GENET_MIB_TX("tx_256_511_oct", mib.tx.pkt_cnt.cnt_511),
+ STAT_GENET_MIB_TX("tx_512_1023_oct", mib.tx.pkt_cnt.cnt_1023),
+ STAT_GENET_MIB_TX("tx_1024_1518_oct", mib.tx.pkt_cnt.cnt_1518),
+ STAT_GENET_MIB_TX("tx_vlan_1519_1522_oct", mib.tx.pkt_cnt.cnt_mgv),
+ STAT_GENET_MIB_TX("tx_1522_2047_oct", mib.tx.pkt_cnt.cnt_2047),
+ STAT_GENET_MIB_TX("tx_2048_4095_oct", mib.tx.pkt_cnt.cnt_4095),
+ STAT_GENET_MIB_TX("tx_4096_9216_oct", mib.tx.pkt_cnt.cnt_9216),
+ STAT_GENET_MIB_TX("tx_pkts", mib.tx.pkts),
+ STAT_GENET_MIB_TX("tx_multicast", mib.tx.mca),
+ STAT_GENET_MIB_TX("tx_broadcast", mib.tx.bca),
+ STAT_GENET_MIB_TX("tx_pause", mib.tx.pf),
+ STAT_GENET_MIB_TX("tx_control", mib.tx.cf),
+ STAT_GENET_MIB_TX("tx_fcs_err", mib.tx.fcs),
+ STAT_GENET_MIB_TX("tx_oversize", mib.tx.ovr),
+ STAT_GENET_MIB_TX("tx_defer", mib.tx.drf),
+ STAT_GENET_MIB_TX("tx_excess_defer", mib.tx.edf),
+ STAT_GENET_MIB_TX("tx_single_col", mib.tx.scl),
+ STAT_GENET_MIB_TX("tx_multi_col", mib.tx.mcl),
+ STAT_GENET_MIB_TX("tx_late_col", mib.tx.lcl),
+ STAT_GENET_MIB_TX("tx_excess_col", mib.tx.ecl),
+ STAT_GENET_MIB_TX("tx_frags", mib.tx.frg),
+ STAT_GENET_MIB_TX("tx_total_col", mib.tx.ncl),
+ STAT_GENET_MIB_TX("tx_jabber", mib.tx.jbr),
+ STAT_GENET_MIB_TX("tx_bytes", mib.tx.bytes),
+ STAT_GENET_MIB_TX("tx_good_pkts", mib.tx.pok),
+ STAT_GENET_MIB_TX("tx_unicast", mib.tx.uc),
+ /* UniMAC RUNT counters */
+ STAT_GENET_RUNT("rx_runt_pkts", mib.rx_runt_cnt),
+ STAT_GENET_RUNT("rx_runt_valid_fcs", mib.rx_runt_fcs),
+ STAT_GENET_RUNT("rx_runt_inval_fcs_align", mib.rx_runt_fcs_align),
+ STAT_GENET_RUNT("rx_runt_bytes", mib.rx_runt_bytes),
+ /* Misc UniMAC counters */
+ STAT_GENET_MISC("rbuf_ovflow_cnt", mib.rbuf_ovflow_cnt,
+ UMAC_RBUF_OVFL_CNT),
+ STAT_GENET_MISC("rbuf_err_cnt", mib.rbuf_err_cnt, UMAC_RBUF_ERR_CNT),
+ STAT_GENET_MISC("mdf_err_cnt", mib.mdf_err_cnt, UMAC_MDF_ERR_CNT),
+};
+
+#define BCMGENET_STATS_LEN ARRAY_SIZE(bcmgenet_gstrings_stats)
+
+static void bcmgenet_get_drvinfo(struct net_device *dev,
+ struct ethtool_drvinfo *info)
+{
+ strlcpy(info->driver, "bcmgenet", sizeof(info->driver));
+ strlcpy(info->version, "v2.0", sizeof(info->version));
+ info->n_stats = BCMGENET_STATS_LEN;
+
+}
+
+static int bcmgenet_get_sset_count(struct net_device *dev, int string_set)
+{
+ switch (string_set) {
+ case ETH_SS_STATS:
+ return BCMGENET_STATS_LEN;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static void bcmgenet_get_strings(struct net_device *dev,
+ u32 stringset, u8 *data)
+{
+ int i;
+
+ switch (stringset) {
+ case ETH_SS_STATS:
+ for (i = 0; i < BCMGENET_STATS_LEN; i++) {
+ memcpy(data + i * ETH_GSTRING_LEN,
+ bcmgenet_gstrings_stats[i].stat_string,
+ ETH_GSTRING_LEN);
+ }
+ break;
+ }
+}
+
+static void bcmgenet_update_mib_counters(struct bcmgenet_priv *priv)
+{
+ int i, j = 0;
+
+ for (i = 0; i < BCMGENET_STATS_LEN; i++) {
+ const struct bcmgenet_stats *s;
+ u32 val = 0;
+ char *p;
+ u8 offset = 0;
+
+ s = &bcmgenet_gstrings_stats[i];
+ switch (s->type) {
+ case BCMGENET_STAT_NETDEV:
+ continue;
+ case BCMGENET_STAT_MIB_RX:
+ case BCMGENET_STAT_MIB_TX:
+ case BCMGENET_STAT_RUNT:
+ if (s->type != BCMGENET_STAT_MIB_RX)
+ offset = BCMGENET_STAT_OFFSET;
+ val = bcmgenet_umac_readl(priv, UMAC_MIB_START +
+ j + offset);
+ break;
+ case BCMGENET_STAT_MISC:
+ val = bcmgenet_umac_readl(priv, s->reg_offset);
+ /* clear if overflowed */
+ if (val == ~0)
+ bcmgenet_umac_writel(priv, 0, s->reg_offset);
+ break;
+ }
+
+ j += s->stat_sizeof;
+ p = (char *)priv + s->stat_offset;
+ *(u32 *)p = val;
+ }
+}
+
+static void bcmgenet_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats,
+ u64 *data)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int i;
+
+ mutex_lock(&priv->mib_mutex);
+ if (netif_running(dev))
+ bcmgenet_update_mib_counters(priv);
+
+ for (i = 0; i < BCMGENET_STATS_LEN; i++) {
+ const struct bcmgenet_stats *s;
+ char *p;
+
+ s = &bcmgenet_gstrings_stats[i];
+ if (s->type == BCMGENET_STAT_NETDEV)
+ p = (char *)&dev->stats;
+ else
+ p = (char *)priv;
+ p += s->stat_offset;
+ data[i] = *(u32 *)p;
+ }
+ mutex_unlock(&priv->mib_mutex);
+}
+
+/* standard ethtool support functions. */
+static struct ethtool_ops bcmgenet_ethtool_ops = {
+ .get_strings = bcmgenet_get_strings,
+ .get_sset_count = bcmgenet_get_sset_count,
+ .get_ethtool_stats = bcmgenet_get_ethtool_stats,
+ .get_settings = bcmgenet_get_settings,
+ .set_settings = bcmgenet_set_settings,
+ .get_drvinfo = bcmgenet_get_drvinfo,
+ .get_link = ethtool_op_get_link,
+ .get_msglevel = bcmgenet_get_msglevel,
+ .set_msglevel = bcmgenet_set_msglevel,
+};
+
+/* Power down the unimac, based on mode. */
+static void bcmgenet_power_down(struct bcmgenet_priv *priv,
+ enum bcmgenet_power_mode mode)
+{
+ u32 reg;
+
+ switch (mode) {
+ case GENET_POWER_CABLE_SENSE:
+ if (priv->phydev)
+ phy_detach(priv->phydev);
+ break;
+
+ case GENET_POWER_PASSIVE:
+ /* Power down LED */
+ bcmgenet_mii_reset(priv->dev);
+ if (priv->hw_params->flags & GENET_HAS_EXT) {
+ reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+ reg |= (EXT_PWR_DOWN_PHY |
+ EXT_PWR_DOWN_DLL | EXT_PWR_DOWN_BIAS);
+ bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+ }
+ break;
+ default:
+ break;
+ }
+
+}
+
+static void bcmgenet_power_up(struct bcmgenet_priv *priv,
+ enum bcmgenet_power_mode mode)
+{
+ u32 reg;
+
+ switch (mode) {
+ case GENET_POWER_CABLE_SENSE:
+ /* enable APD */
+ if (priv->hw_params->flags & GENET_HAS_EXT) {
+ reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+ reg |= EXT_PWR_DN_EN_LD;
+ bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+ bcmgenet_mii_reset(priv->dev);
+ }
+ break;
+
+ case GENET_POWER_PASSIVE:
+ if (priv->hw_params->flags & GENET_HAS_EXT) {
+ reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+ reg &= ~EXT_PWR_DOWN_DLL;
+ reg &= ~EXT_PWR_DOWN_PHY;
+ reg &= ~EXT_PWR_DOWN_BIAS;
+ /* enable APD */
+ reg |= EXT_PWR_DN_EN_LD;
+ bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+ bcmgenet_mii_reset(priv->dev);
+ }
+ break;
+ default:
+ break;
+ }
+}
+
+/* Handle ioctl commands that are not available through ethtool. */
+static int bcmgenet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int val = 0;
+
+ if (!netif_running(dev))
+ return -EINVAL;
+
+ switch (cmd) {
+ case SIOCGMIIPHY:
+ case SIOCGMIIREG:
+ case SIOCSMIIREG:
+ if (!priv->phydev)
+ val = -ENODEV;
+ else
+ val = phy_mii_ioctl(priv->phydev, rq, cmd);
+ break;
+
+ default:
+ val = -EINVAL;
+ break;
+ }
+
+ return val;
+}
+
+static struct enet_cb *bcmgenet_get_txcb(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring)
+{
+ struct enet_cb *tx_cb_ptr;
+
+ tx_cb_ptr = ring->cbs;
+ tx_cb_ptr += ring->write_ptr - ring->cb_ptr;
+ tx_cb_ptr->bd_addr = priv->tx_bds + ring->write_ptr * DMA_DESC_SIZE;
+ /* Advancing local write pointer */
+ if (ring->write_ptr == ring->end_ptr)
+ ring->write_ptr = ring->cb_ptr;
+ else
+ ring->write_ptr++;
+
+ return tx_cb_ptr;
+}
+
+/* Simple helper to free a control block's resources */
+static void bcmgenet_free_cb(struct enet_cb *cb)
+{
+ dev_kfree_skb_any(cb->skb);
+ cb->skb = NULL;
+ dma_unmap_addr_set(cb, dma_addr, 0);
+}
+
+static inline void bcmgenet_tx_ring16_int_disable(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring)
+{
+ bcmgenet_intrl2_0_writel(priv,
+ UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE,
+ INTRL2_CPU_MASK_SET);
+}
+
+static inline void bcmgenet_tx_ring16_int_enable(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring)
+{
+ bcmgenet_intrl2_0_writel(priv,
+ UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE,
+ INTRL2_CPU_MASK_CLEAR);
+}
+
+static inline void bcmgenet_tx_ring_int_enable(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring)
+{
+ bcmgenet_intrl2_1_writel(priv,
+ (1 << ring->index), INTRL2_CPU_MASK_CLEAR);
+ priv->int1_mask &= ~(1 << ring->index);
+}
+
+static inline void bcmgenet_tx_ring_int_disable(struct bcmgenet_priv *priv,
+ struct bcmgenet_tx_ring *ring)
+{
+ bcmgenet_intrl2_1_writel(priv,
+ (1 << ring->index), INTRL2_CPU_MASK_SET);
+ priv->int1_mask |= (1 << ring->index);
+}
+
+/* Unlocked version of the reclaim routine */
+static void __bcmgenet_tx_reclaim(struct net_device *dev,
+ struct bcmgenet_tx_ring *ring)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int last_tx_cn, last_c_index, num_tx_bds;
+ struct enet_cb *tx_cb_ptr;
+ unsigned int c_index;
+
+ /* Compute how many buffers were transmitted since the last xmit call */
+ c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX);
+
+ last_c_index = ring->c_index;
+ num_tx_bds = ring->size;
+
+ c_index &= (num_tx_bds - 1);
+
+ if (c_index >= last_c_index)
+ last_tx_cn = c_index - last_c_index;
+ else
+ last_tx_cn = num_tx_bds - last_c_index + c_index;
+
+ netif_dbg(priv, tx_done, dev,
+ "%s ring=%d index=%d last_tx_cn=%d last_index=%d\n",
+ __func__, ring->index,
+ c_index, last_tx_cn, last_c_index);
+
+ /* Reclaim transmitted buffers */
+ while (last_tx_cn-- > 0) {
+ tx_cb_ptr = ring->cbs + last_c_index;
+ if (tx_cb_ptr->skb) {
+ dev->stats.tx_bytes += tx_cb_ptr->skb->len;
+ dma_unmap_single(&dev->dev,
+ dma_unmap_addr(tx_cb_ptr, dma_addr),
+ tx_cb_ptr->skb->len,
+ DMA_TO_DEVICE);
+ bcmgenet_free_cb(tx_cb_ptr);
+ } else if (dma_unmap_addr(tx_cb_ptr, dma_addr)) {
+ dev->stats.tx_bytes +=
+ dma_unmap_len(tx_cb_ptr, dma_len);
+ dma_unmap_page(&dev->dev,
+ dma_unmap_addr(tx_cb_ptr, dma_addr),
+ dma_unmap_len(tx_cb_ptr, dma_len),
+ DMA_TO_DEVICE);
+ dma_unmap_addr_set(tx_cb_ptr, dma_addr, 0);
+ }
+ dev->stats.tx_packets++;
+ ring->free_bds += 1;
+
+ last_c_index++;
+ last_c_index &= (num_tx_bds - 1);
+ }
+
+ if (ring->free_bds > (MAX_SKB_FRAGS + 1))
+ ring->int_disable(priv, ring);
+
+ if (__netif_subqueue_stopped(dev, ring->queue))
+ netif_wake_subqueue(dev, ring->queue);
+
+ ring->c_index = c_index;
+}
+
+static void bcmgenet_tx_reclaim(struct net_device *dev,
+ struct bcmgenet_tx_ring *ring)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&ring->lock, flags);
+ __bcmgenet_tx_reclaim(dev, ring);
+ spin_unlock_irqrestore(&ring->lock, flags);
+}
+
+static void bcmgenet_tx_reclaim_all(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int i;
+
+ if (netif_is_multiqueue(dev)) {
+ for (i = 0; i < priv->hw_params->tx_queues; i++)
+ bcmgenet_tx_reclaim(dev, &priv->tx_rings[i]);
+ }
+
+ bcmgenet_tx_reclaim(dev, &priv->tx_rings[DESC_INDEX]);
+}
+
+/* Transmits a single SKB (either the head of a fragment list or a
+ * single SKB). The caller must hold ring->lock.
+ */
+static int bcmgenet_xmit_single(struct net_device *dev,
+ struct sk_buff *skb,
+ u16 dma_desc_flags,
+ struct bcmgenet_tx_ring *ring)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct device *kdev = &priv->pdev->dev;
+ struct enet_cb *tx_cb_ptr;
+ unsigned int skb_len;
+ dma_addr_t mapping;
+ u32 length_status;
+ int ret;
+
+ tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
+
+ if (unlikely(!tx_cb_ptr))
+ BUG();
+
+ tx_cb_ptr->skb = skb;
+
+ skb_len = skb_headlen(skb) < ETH_ZLEN ? ETH_ZLEN : skb_headlen(skb);
+
+ mapping = dma_map_single(kdev, skb->data, skb_len, DMA_TO_DEVICE);
+ ret = dma_mapping_error(kdev, mapping);
+ if (ret) {
+ netif_err(priv, tx_err, dev, "Tx DMA map failed\n");
+ dev_kfree_skb(skb);
+ return ret;
+ }
+
+ dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
+ dma_unmap_len_set(tx_cb_ptr, dma_len, skb->len);
+ length_status = (skb_len << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
+ (priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT) |
+ DMA_TX_APPEND_CRC;
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL)
+ length_status |= DMA_TX_DO_CSUM;
+
+ dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping, length_status);
+
+ /* Decrement total BD count and advance our write pointer */
+ ring->free_bds -= 1;
+ ring->prod_index += 1;
+ ring->prod_index &= DMA_P_INDEX_MASK;
+
+ return 0;
+}
+
+/* Transmit an SKB fragment */
+static int bcmgenet_xmit_frag(struct net_device *dev,
+ skb_frag_t *frag,
+ u16 dma_desc_flags,
+ struct bcmgenet_tx_ring *ring)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct device *kdev = &priv->pdev->dev;
+ struct enet_cb *tx_cb_ptr;
+ dma_addr_t mapping;
+ int ret;
+
+ tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
+
+ if (unlikely(!tx_cb_ptr))
+ BUG();
+ tx_cb_ptr->skb = NULL;
+
+ mapping = skb_frag_dma_map(kdev, frag, 0,
+ skb_frag_size(frag), DMA_TO_DEVICE);
+ ret = dma_mapping_error(kdev, mapping);
+ if (ret) {
+ netif_err(priv, tx_err, dev, "%s: Tx DMA map failed\n",
+ __func__);
+ /*TODO: Handle frag failure.*/
+ return ret;
+ }
+
+ dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
+ dma_unmap_len_set(tx_cb_ptr, dma_len, frag->size);
+
+ dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping,
+ (frag->size << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
+ (priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT));
+
+
+ ring->free_bds -= 1;
+ ring->prod_index += 1;
+ ring->prod_index &= DMA_P_INDEX_MASK;
+
+ return 0;
+}
+
+/* Reallocate the SKB to put enough headroom in front of it and insert
+ * the transmit checksum offsets in the descriptors
+ */
+static int bcmgenet_put_tx_csum(struct net_device *dev, struct sk_buff *skb)
+{
+ struct status_64 *status = NULL;
+ struct sk_buff *new_skb;
+ u16 offset;
+ u8 ip_proto;
+ u16 ip_ver;
+ u32 tx_csum_info;
+
+ if (unlikely(skb_headroom(skb) < sizeof(*status))) {
+ /* If 64 byte status block enabled, must make sure skb has
+ * enough headroom for us to insert 64B status block.
+ */
+ new_skb = skb_realloc_headroom(skb, sizeof(*status));
+ dev_kfree_skb(skb);
+ if (!new_skb) {
+ dev->stats.tx_errors++;
+ dev->stats.tx_dropped++;
+ return -ENOMEM;
+ }
+ skb = new_skb;
+ }
+
+ skb_push(skb, sizeof(*status));
+ status = (struct status_64 *)skb->data;
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ ip_ver = ntohs(skb->protocol);
+ switch (ip_ver) {
+ case ETH_P_IP:
+ ip_proto = ip_hdr(skb)->protocol;
+ break;
+ case ETH_P_IPV6:
+ ip_proto = ipv6_hdr(skb)->nexthdr;
+ break;
+ default:
+ return 0;
+ }
+
+ offset = skb_checksum_start_offset(skb) - sizeof(*status);
+ tx_csum_info = (offset << STATUS_TX_CSUM_START_SHIFT) |
+ (offset + skb->csum_offset);
+
+ /* Set the length valid bit for TCP and UDP and just set
+ * the special UDP flag for IPv4, else just set to 0.
+ */
+ if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP) {
+ tx_csum_info |= STATUS_TX_CSUM_LV;
+ if (ip_proto == IPPROTO_UDP && ip_ver == ETH_P_IP)
+ tx_csum_info |= STATUS_TX_CSUM_PROTO_UDP;
+ } else {
+ tx_csum_info = 0;
+ }
+
+ status->tx_csum_info = tx_csum_info;
+ }
+
+ return 0;
+}
+
+static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct bcmgenet_tx_ring *ring = NULL;
+ unsigned long flags = 0;
+ int nr_frags, index;
+ u16 dma_desc_flags;
+ int ret;
+ int i;
+
+ index = skb_get_queue_mapping(skb);
+ /* Mapping strategy:
+ * queue_mapping = 0, unclassified, packet transmitted through ring 16
+ * queue_mapping = 1, goes to ring 0 (highest priority queue).
+ * queue_mapping = 2, goes to ring 1.
+ * queue_mapping = 3, goes to ring 2.
+ * queue_mapping = 4, goes to ring 3.
+ */
+ if (index == 0)
+ index = DESC_INDEX;
+ else
+ index -= 1;
+
+ if (index != DESC_INDEX && index > 3) {
+ netdev_err(dev, "%s: queue_mapping %d is invalid\n",
+ __func__, skb_get_queue_mapping(skb));
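+ dev->stats.tx_errors++;
+ dev->stats.tx_dropped++;
+ /* ring is not set and its lock is not held yet, so do not
+ * jump to out; drop the packet and return directly
+ */
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;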
+ }
+ nr_frags = skb_shinfo(skb)->nr_frags;
+ ring = &priv->tx_rings[index];
+
+ spin_lock_irqsave(&ring->lock, flags);
+ if (ring->free_bds <= nr_frags + 1) {
+ netif_stop_subqueue(dev, ring->queue);
+ netdev_err(dev, "%s: tx ring %d full when queue %d awake\n",
+ __func__, index, ring->queue);
+ ret = NETDEV_TX_BUSY;
+ goto out;
+ }
+
+ /* reclaim transmitted skbs every 8 packets. */
+ /*if (ring->free_bds < ring->size - 8)*/
+ /*__bcmgenet_tx_reclaim(dev, ring);*/
+
+ /* set the SKB transmit checksum */
+ if (priv->desc_64b_en) {
+ ret = bcmgenet_put_tx_csum(dev, skb);
+ if (ret) {
+ ret = NETDEV_TX_OK;
+ goto out;
+ }
+ }
+
+ dma_desc_flags = DMA_SOP;
+ if (nr_frags == 0)
+ dma_desc_flags |= DMA_EOP;
+
+ /* Transmit single SKB or head of fragment list */
+ ret = bcmgenet_xmit_single(dev, skb, dma_desc_flags, ring);
+ if (ret) {
+ ret = NETDEV_TX_OK;
+ goto out;
+ }
+
+ /* xmit fragment */
+ for (i = 0; i < nr_frags; i++) {
+ /*TODO: Handle frag failure.*/
+ ret = bcmgenet_xmit_frag(dev,
+ &skb_shinfo(skb)->frags[i],
+ (i == nr_frags - 1) ? DMA_EOP : 0, ring);
+ if (ret) {
+ ret = NETDEV_TX_OK;
+ goto out;
+ }
+ }
+
+ /* we kept a software copy of how much we should advance the TDMA
+ * producer index, now write it down to the hardware
+ */
+ bcmgenet_tdma_ring_writel(priv, ring->index,
+ ring->prod_index, TDMA_PROD_INDEX);
+
+ if (ring->free_bds <= (MAX_SKB_FRAGS + 1)) {
+ netif_stop_subqueue(dev, ring->queue);
+ ring->int_enable(priv, ring);
+ }
+
+out:
+ spin_unlock_irqrestore(&ring->lock, flags);
+
+ return ret;
+}
+
+
+static int bcmgenet_rx_refill(struct bcmgenet_priv *priv,
+ struct enet_cb *cb)
+{
+ struct device *kdev = &priv->pdev->dev;
+ struct sk_buff *skb;
+ dma_addr_t mapping;
+ int ret;
+
+ skb = netdev_alloc_skb(priv->dev,
+ priv->rx_buf_len + SKB_ALIGNMENT);
+ if (!skb)
+ return -ENOMEM;
+
+ /* a caller did not release this control block */
+ WARN_ON(cb->skb != NULL);
+ cb->skb = skb;
+ mapping = dma_map_single(kdev, skb->data,
+ priv->rx_buf_len, DMA_FROM_DEVICE);
+ ret = dma_mapping_error(kdev, mapping);
+ if (ret) {
+ bcmgenet_free_cb(cb);
+ netif_err(priv, rx_err, priv->dev,
+ "%s DMA map failed\n", __func__);
+ return ret;
+ }
+
+ dma_unmap_addr_set(cb, dma_addr, mapping);
+ /* assign packet, prepare descriptor, and advance pointer */
+
+ dmadesc_set_addr(priv, priv->rx_bd_assign_ptr, mapping);
+
+ /* turn on the newly assigned BD for DMA to use */
+ priv->rx_bd_assign_index++;
+ priv->rx_bd_assign_index &= (priv->num_rx_bds - 1);
+
+ priv->rx_bd_assign_ptr = priv->rx_bds +
+ (priv->rx_bd_assign_index * DMA_DESC_SIZE);
+
+ return 0;
+}
+
+/* bcmgenet_desc_rx - descriptor based rx process.
+ * this could be called from bottom half, or from NAPI polling method.
+ */
+static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
+ unsigned int budget)
+{
+ struct net_device *dev = priv->dev;
+ struct enet_cb *cb;
+ struct sk_buff *skb;
+ u32 dma_length_status;
+ unsigned long dma_flag;
+ int len, err;
+ unsigned int rxpktprocessed = 0, rxpkttoprocess;
+ unsigned int p_index;
+ unsigned int chksum_ok = 0;
+
+ p_index = bcmgenet_rdma_ring_readl(priv,
+ DESC_INDEX, RDMA_PROD_INDEX);
+ p_index &= DMA_P_INDEX_MASK;
+
+ if (p_index < priv->rx_c_index)
+ rxpkttoprocess = (DMA_C_INDEX_MASK + 1) -
+ priv->rx_c_index + p_index;
+ else
+ rxpkttoprocess = p_index - priv->rx_c_index;
+
+ netif_dbg(priv, rx_status, dev,
+ "RDMA: rxpkttoprocess=%d\n", rxpkttoprocess);
+
+ while ((rxpktprocessed < rxpkttoprocess) &&
+ (rxpktprocessed < budget)) {
+
+ /* Unmap the packet contents such that we can use the
+ * RSV from the 64 bytes descriptor when enabled and save
+ * a 32-bits register read
+ */
+ cb = &priv->rx_cbs[priv->rx_read_ptr];
+ skb = cb->skb;
+ dma_unmap_single(&dev->dev, dma_unmap_addr(cb, dma_addr),
+ priv->rx_buf_len, DMA_FROM_DEVICE);
+
+ if (!priv->desc_64b_en) {
+ dma_length_status = dmadesc_get_length_status(priv,
+ priv->rx_bds +
+ (priv->rx_read_ptr *
+ DMA_DESC_SIZE));
+ } else {
+ struct status_64 *status;
+ status = (struct status_64 *)skb->data;
+ dma_length_status = status->length_status;
+ }
+
+ /* DMA flags and length are still valid no matter how
+ * we got the Receive Status Vector (64B RSB or register)
+ */
+ dma_flag = dma_length_status & 0xffff;
+ len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
+
+ netif_dbg(priv, rx_status, dev,
+ "%s: p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
+ __func__, p_index, priv->rx_c_index, priv->rx_read_ptr,
+ dma_length_status);
+
+ rxpktprocessed++;
+
+ priv->rx_read_ptr++;
+ priv->rx_read_ptr &= (priv->num_rx_bds - 1);
+
+ /* out of memory, just drop packets at the hardware level */
+ if (unlikely(!skb)) {
+ dev->stats.rx_dropped++;
+ dev->stats.rx_errors++;
+ goto refill;
+ }
+
+ if (unlikely(!(dma_flag & DMA_EOP) || !(dma_flag & DMA_SOP))) {
+ netif_err(priv, rx_status, dev,
+ "Dropping fragmented packet!\n");
+ dev->stats.rx_dropped++;
+ dev->stats.rx_errors++;
+ dev_kfree_skb_any(cb->skb);
+ cb->skb = NULL;
+ goto refill;
+ }
+ /* report errors */
+ if (unlikely(dma_flag & (DMA_RX_CRC_ERROR |
+ DMA_RX_OV |
+ DMA_RX_NO |
+ DMA_RX_LG |
+ DMA_RX_RXER))) {
+ netif_err(priv, rx_status, dev, "dma_flag=0x%x\n",
+ (unsigned int)dma_flag);
+ if (dma_flag & DMA_RX_CRC_ERROR)
+ dev->stats.rx_crc_errors++;
+ if (dma_flag & DMA_RX_OV)
+ dev->stats.rx_over_errors++;
+ if (dma_flag & DMA_RX_NO)
+ dev->stats.rx_frame_errors++;
+ if (dma_flag & DMA_RX_LG)
+ dev->stats.rx_length_errors++;
+ dev->stats.rx_dropped++;
+ dev->stats.rx_errors++;
+
+ /* discard the packet and advance consumer index.*/
+ dev_kfree_skb_any(cb->skb);
+ cb->skb = NULL;
+ goto refill;
+ } /* error packet */
+
+ chksum_ok = (dma_flag & priv->dma_rx_chk_bit) &&
+ priv->desc_rxchk_en;
+
+ skb_put(skb, len);
+ if (priv->desc_64b_en) {
+ skb_pull(skb, 64);
+ len -= 64;
+ }
+
+ if (likely(chksum_ok))
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ /* remove the 2 bytes added by hardware for IP alignment */
+ skb_pull(skb, 2);
+ len -= 2;
+
+ if (priv->crc_fwd_en) {
+ skb_trim(skb, len - ETH_FCS_LEN);
+ len -= ETH_FCS_LEN;
+ }
+
+ /* Finish setting up the received SKB and send it to the kernel */
+ skb->protocol = eth_type_trans(skb, priv->dev);
+ dev->stats.rx_packets++;
+ dev->stats.rx_bytes += len;
+ if (dma_flag & DMA_RX_MULT)
+ dev->stats.multicast++;
+
+ /* Notify kernel */
+ napi_gro_receive(&priv->napi, skb);
+ cb->skb = NULL;
+ netif_dbg(priv, rx_status, dev, "pushed up to kernel\n");
+
+ /* refill RX path on the current control block */
+refill:
+ err = bcmgenet_rx_refill(priv, cb);
+ if (err)
+ netif_err(priv, rx_err, dev, "Rx refill failed\n");
+ }
+
+ return rxpktprocessed;
+}
+
+/* Assign skb to RX DMA descriptor. */
+static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv)
+{
+ struct enet_cb *cb;
+ int ret = 0;
+ int i;
+ u32 reg;
+
+ netif_dbg(priv, hw, priv->dev, "%s:\n", __func__);
+
+ /* This function may be called from irq bottom-half. */
+ spin_lock_bh(&priv->bh_lock);
+
+ /* loop here for each buffer needing assign */
+ for (i = 0; i < priv->num_rx_bds; i++) {
+ cb = &priv->rx_cbs[priv->rx_bd_assign_index];
+ if (cb->skb)
+ continue;
+
+ /* set the DMA descriptor length once and for all
+ * it will only change if we support dynamically sizing
+ * priv->rx_buf_len, but we do not
+ */
+ dmadesc_set_length_status(priv, priv->rx_bd_assign_ptr,
+ priv->rx_buf_len << DMA_BUFLENGTH_SHIFT);
+
+ ret = bcmgenet_rx_refill(priv, cb);
+ if (ret)
+ break;
+
+ }
+
+ /* Enable rx DMA in case it was disabled due to running out of rx BD */
+ reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+ reg |= DMA_EN;
+ bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+ spin_unlock_bh(&priv->bh_lock);
+
+ return ret;
+}
+
+static void bcmgenet_free_rx_buffers(struct bcmgenet_priv *priv)
+{
+ struct enet_cb *cb;
+ int i;
+
+ for (i = 0; i < priv->num_rx_bds; i++) {
+ cb = &priv->rx_cbs[i];
+
+ if (dma_unmap_addr(cb, dma_addr)) {
+ dma_unmap_single(&priv->dev->dev,
+ dma_unmap_addr(cb, dma_addr),
+ priv->rx_buf_len, DMA_FROM_DEVICE);
+ dma_unmap_addr_set(cb, dma_addr, 0);
+ }
+
+ if (cb->skb)
+ bcmgenet_free_cb(cb);
+ }
+}
+
+static int reset_umac(struct bcmgenet_priv *priv)
+{
+ struct device *kdev = &priv->pdev->dev;
+ unsigned int timeout = 0;
+ u32 reg;
+
+ /* 7358a0/7552a0: bad default in RBUF_FLUSH_CTRL.umac_sw_rst */
+ bcmgenet_rbuf_ctrl_set(priv, 0);
+ udelay(10);
+
+ /* disable MAC while updating its registers */
+ bcmgenet_umac_writel(priv, 0, UMAC_CMD);
+
+ /* issue soft reset, wait for it to complete */
+ bcmgenet_umac_writel(priv, CMD_SW_RESET, UMAC_CMD);
+ while (timeout++ < 1000) {
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ if (!(reg & CMD_SW_RESET))
+ break;
+ udelay(1);
+ }
+
+ if (timeout > 1000) {
+ dev_err(kdev,
+ "timeout waiting for MAC to come out of reset\n");
+ return -ETIMEDOUT;
+ }
+
+ return 0;
+}
+
+/* init_umac: Initializes the UniMAC controller */
+static int init_umac(struct bcmgenet_priv *priv)
+{
+ struct device *kdev = &priv->pdev->dev;
+ int ret;
+ u32 reg, cpu_mask_clear;
+
+ dev_dbg(&priv->pdev->dev, "bcmgenet: init_umac\n");
+
+ ret = reset_umac(priv);
+ if (ret)
+ return ret;
+
+ bcmgenet_umac_writel(priv, 0, UMAC_CMD);
+ /* clear tx/rx counter */
+ bcmgenet_umac_writel(priv,
+ MIB_RESET_RX | MIB_RESET_TX | MIB_RESET_RUNT, UMAC_MIB_CTRL);
+ bcmgenet_umac_writel(priv, 0, UMAC_MIB_CTRL);
+
+ bcmgenet_umac_writel(priv, ENET_MAX_MTU_SIZE, UMAC_MAX_FRAME_LEN);
+
+ /* init rx registers, enable ip header optimization */
+ reg = bcmgenet_rbuf_readl(priv, RBUF_CTRL);
+ reg |= RBUF_ALIGN_2B;
+ bcmgenet_rbuf_writel(priv, reg, RBUF_CTRL);
+
+ if (!GENET_IS_V1(priv) && !GENET_IS_V2(priv))
+ bcmgenet_rbuf_writel(priv, 1, RBUF_TBUF_SIZE_CTRL);
+
+ /* Mask all interrupts.*/
+ bcmgenet_intrl2_0_writel(priv, 0xFFFFFFFF, INTRL2_CPU_MASK_SET);
+ bcmgenet_intrl2_0_writel(priv, 0xFFFFFFFF, INTRL2_CPU_CLEAR);
+ bcmgenet_intrl2_0_writel(priv, 0, INTRL2_CPU_MASK_CLEAR);
+
+ cpu_mask_clear = UMAC_IRQ_RXDMA_BDONE;
+
+ dev_dbg(kdev, "%s:Enabling RXDMA_BDONE interrupt\n", __func__);
+
+ /* Monitor cable plug/unplug events for internal PHY */
+ if (priv->phy_type == BRCM_PHY_TYPE_INT)
+ cpu_mask_clear |= (UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
+ else if (priv->ext_phy)
+ cpu_mask_clear |= (UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
+ else if (priv->phy_type == BRCM_PHY_TYPE_MOCA) {
+ reg = bcmgenet_bp_mc_get(priv);
+ reg |= BIT(priv->hw_params->bp_in_en_shift);
+
+ /* bp_mask: back pressure mask */
+ if (netif_is_multiqueue(priv->dev))
+ reg |= priv->hw_params->bp_in_mask;
+ else
+ reg &= ~priv->hw_params->bp_in_mask;
+ bcmgenet_bp_mc_set(priv, reg);
+ }
+
+ /* Enable MDIO interrupts on GENET v3+ */
+ if (priv->hw_params->flags & GENET_HAS_MDIO_INTR)
+ cpu_mask_clear |= UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR;
+
+ bcmgenet_intrl2_0_writel(priv, cpu_mask_clear,
+ INTRL2_CPU_MASK_CLEAR);
+
+ /* Enable rx/tx engine.*/
+ dev_dbg(kdev, "done init umac\n");
+
+ return 0;
+}
+
+/* Initialize all house-keeping variables for a TX ring, along
+ * with corresponding hardware registers
+ */
+static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
+ unsigned int index, unsigned int size,
+ unsigned int write_ptr, unsigned int end_ptr)
+{
+ struct bcmgenet_tx_ring *ring = &priv->tx_rings[index];
+ u32 words_per_bd = WORDS_PER_BD(priv);
+ u32 flow_period_val = 0;
+ unsigned int first_bd;
+
+ spin_lock_init(&ring->lock);
+ ring->index = index;
+ if (index == DESC_INDEX) {
+ ring->queue = 0;
+ ring->int_enable = bcmgenet_tx_ring16_int_enable;
+ ring->int_disable = bcmgenet_tx_ring16_int_disable;
+ } else {
+ ring->queue = index + 1;
+ ring->int_enable = bcmgenet_tx_ring_int_enable;
+ ring->int_disable = bcmgenet_tx_ring_int_disable;
+ }
+ ring->cbs = priv->tx_cbs + write_ptr;
+ ring->size = size;
+ ring->c_index = 0;
+ ring->free_bds = size;
+ ring->write_ptr = write_ptr;
+ ring->cb_ptr = write_ptr;
+ ring->end_ptr = end_ptr - 1;
+ ring->prod_index = 0;
+
+ /* Set flow period for ring != 16 */
+ if (index != DESC_INDEX)
+ flow_period_val = ENET_MAX_MTU_SIZE << 16;
+
+ bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_PROD_INDEX);
+ bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_CONS_INDEX);
+ bcmgenet_tdma_ring_writel(priv, index, 1, DMA_MBUF_DONE_THRESH);
+ /* Disable rate control for now */
+ bcmgenet_tdma_ring_writel(priv, index, flow_period_val,
+ TDMA_FLOW_PERIOD);
+ /* Unclassified traffic goes to ring 16 */
+ bcmgenet_tdma_ring_writel(priv, index,
+ ((size << DMA_RING_SIZE_SHIFT) | RX_BUF_LENGTH),
+ DMA_RING_BUF_SIZE);
+
+ first_bd = write_ptr;
+
+ /* Set start and end address, read and write pointers */
+ bcmgenet_tdma_ring_writel(priv, index, first_bd * words_per_bd,
+ DMA_START_ADDR);
+ bcmgenet_tdma_ring_writel(priv, index, first_bd * words_per_bd,
+ TDMA_READ_PTR);
+ bcmgenet_tdma_ring_writel(priv, index, first_bd,
+ TDMA_WRITE_PTR);
+ bcmgenet_tdma_ring_writel(priv, index, end_ptr * words_per_bd - 1,
+ DMA_END_ADDR);
+}
+
+/* Initialize a RDMA ring */
+static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
+ unsigned int index, unsigned int size)
+{
+ u32 words_per_bd = WORDS_PER_BD(priv);
+ int ret;
+
+ priv->num_rx_bds = TOTAL_DESC;
+ priv->rx_bds = priv->base + priv->hw_params->rdma_offset;
+ priv->rx_bd_assign_ptr = priv->rx_bds;
+ priv->rx_bd_assign_index = 0;
+ priv->rx_c_index = 0;
+ priv->rx_read_ptr = 0;
+ priv->rx_cbs = kzalloc(priv->num_rx_bds * sizeof(struct enet_cb),
+ GFP_KERNEL);
+ if (!priv->rx_cbs)
+ return -ENOMEM;
+
+ ret = bcmgenet_alloc_rx_buffers(priv);
+ if (ret) {
+ kfree(priv->rx_cbs);
+ return ret;
+ }
+
+ bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_WRITE_PTR);
+ bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_PROD_INDEX);
+ bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_CONS_INDEX);
+ bcmgenet_rdma_ring_writel(priv, index,
+ ((size << DMA_RING_SIZE_SHIFT) | RX_BUF_LENGTH),
+ DMA_RING_BUF_SIZE);
+ bcmgenet_rdma_ring_writel(priv, index, 0, DMA_START_ADDR);
+ bcmgenet_rdma_ring_writel(priv, index,
+ words_per_bd * size - 1, DMA_END_ADDR);
+ bcmgenet_rdma_ring_writel(priv, index,
+ (DMA_FC_THRESH_LO << DMA_XOFF_THRESHOLD_SHIFT) |
+ DMA_FC_THRESH_HI, RDMA_XON_XOFF_THRESH);
+ bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_READ_PTR);
+
+ return ret;
+}
+
+/* init multi xmit queues, only available for GENET2
+ * the queue is partitioned as follows:
+ *
+ * queue 0 - 3 is priority based, each one has 32 descriptors,
+ * with queue 0 being the highest priority queue.
+ *
+ * queue 16 is the default tx queue with GENET_DEFAULT_BD_CNT
+ * descriptors: 256 - (number of tx queues * bds per queue) = 128
+ * descriptors.
+ *
+ * The transmit control block pool is then partitioned as follows:
+ * - ring 0 uses tx_cbs[0..31]
+ * - ring 1 uses tx_cbs[32..63]
+ * - ring 2 uses tx_cbs[64..95]
+ * - ring 3 uses tx_cbs[96..127]
+ * - ring 16 (default queue) uses tx_cbs[128..255]
+ */
+static void bcmgenet_init_multiq(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ unsigned int i, dma_enable;
+ u32 reg, dma_ctrl, ring_cfg = 0, dma_priority = 0;
+
+ if (!netif_is_multiqueue(dev)) {
+ netdev_warn(dev, "called with non multi queue aware HW\n");
+ return;
+ }
+
+ dma_ctrl = bcmgenet_tdma_readl(priv, DMA_CTRL);
+ dma_enable = dma_ctrl & DMA_EN;
+ dma_ctrl &= ~DMA_EN;
+ bcmgenet_tdma_writel(priv, dma_ctrl, DMA_CTRL);
+
+ /* Enable strict priority arbiter mode */
+ bcmgenet_tdma_writel(priv, DMA_ARBITER_SP, DMA_ARB_CTRL);
+
+ for (i = 0; i < priv->hw_params->tx_queues; i++) {
+ /* priority rings take the first tx_queues * bds_cnt tx_cbs;
+ * the default tx queue (ring 16) uses the rest
+ */
+ bcmgenet_init_tx_ring(priv, i, priv->hw_params->bds_cnt,
+ i * priv->hw_params->bds_cnt,
+ (i + 1) * priv->hw_params->bds_cnt);
+
+ /* Configure ring as descriptor ring and setup priority */
+ ring_cfg |= (1 << i);
+ dma_priority |= ((GENET_Q0_PRIORITY + i) <<
+ (GENET_MAX_MQ_CNT + 1) * i);
+ dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
+ }
+
+ /* Enable rings */
+ reg = bcmgenet_tdma_readl(priv, DMA_RING_CFG);
+ reg |= ring_cfg;
+ bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
+
+ /* Use configured rings priority and set ring #16 priority */
+ reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
+ reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
+ reg |= dma_priority;
+ bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
+
+ /* Configure ring as descriptor ring and re-enable DMA if enabled */
+ reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+ reg |= dma_ctrl;
+ if (dma_enable)
+ reg |= DMA_EN;
+ bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+}
+
+static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
+{
+ int i;
+
+ /* disable DMA */
+ bcmgenet_rdma_writel(priv, 0, DMA_CTRL);
+ bcmgenet_tdma_writel(priv, 0, DMA_CTRL);
+
+ for (i = 0; i < priv->num_tx_bds; i++) {
+ if (priv->tx_cbs[i].skb != NULL) {
+ dev_kfree_skb(priv->tx_cbs[i].skb);
+ priv->tx_cbs[i].skb = NULL;
+ }
+ }
+ bcmgenet_free_rx_buffers(priv);
+ kfree(priv->rx_cbs);
+ kfree(priv->tx_cbs);
+}
+
+/* bcmgenet_init_dma: Initialize DMA control register */
+static int bcmgenet_init_dma(struct bcmgenet_priv *priv)
+{
+ int ret;
+
+ netif_dbg(priv, hw, priv->dev, "bcmgenet: init_edma\n");
+
+ /* by default, enable ring 16 (descriptor based) */
+ ret = bcmgenet_init_rx_ring(priv, DESC_INDEX, TOTAL_DESC);
+ if (ret) {
+ netdev_err(priv->dev, "failed to initialize RX ring\n");
+ return ret;
+ }
+
+ /* init rDma */
+ bcmgenet_rdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
+
+ /* Init tDma */
+ bcmgenet_tdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
+
+ /* Initialize common TX ring structures */
+ priv->tx_bds = priv->base + priv->hw_params->tdma_offset;
+ priv->num_tx_bds = TOTAL_DESC;
+ priv->tx_cbs = kzalloc(priv->num_tx_bds * sizeof(struct enet_cb),
+ GFP_KERNEL);
+ if (!priv->tx_cbs) {
+ bcmgenet_fini_dma(priv);
+ return -ENOMEM;
+ }
+
+ /* initialize multi xmit queue */
+ bcmgenet_init_multiq(priv->dev);
+
+ /* initialize special ring 16 */
+ bcmgenet_init_tx_ring(priv, DESC_INDEX, GENET_DEFAULT_BD_CNT,
+ priv->hw_params->tx_queues * priv->hw_params->bds_cnt,
+ TOTAL_DESC);
+
+ return 0;
+}
+
+/* NAPI polling method*/
+static int bcmgenet_poll(struct napi_struct *napi, int budget)
+{
+ struct bcmgenet_priv *priv = container_of(napi,
+ struct bcmgenet_priv, napi);
+ unsigned int work_done;
+
+ work_done = bcmgenet_desc_rx(priv, budget);
+
+ /* tx reclaim */
+ bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[DESC_INDEX]);
+ /* Advancing our consumer index*/
+ priv->rx_c_index += work_done;
+ priv->rx_c_index &= DMA_C_INDEX_MASK;
+ bcmgenet_rdma_ring_writel(priv, DESC_INDEX,
+ priv->rx_c_index, RDMA_CONS_INDEX);
+ if (work_done < budget) {
+ napi_complete(napi);
+ bcmgenet_intrl2_0_writel(priv,
+ UMAC_IRQ_RXDMA_BDONE, INTRL2_CPU_MASK_CLEAR);
+ }
+
+ return work_done;
+}
+
+/* Interrupt bottom half */
+static void bcmgenet_irq_task(struct work_struct *work)
+{
+ struct bcmgenet_priv *priv = container_of(
+ work, struct bcmgenet_priv, bcmgenet_irq_work);
+ struct net_device *dev;
+ u32 reg;
+
+ dev = priv->dev;
+
+ netif_dbg(priv, intr, dev, "%s\n", __func__);
+ /* Cable plugged/unplugged event */
+ if (priv->phy_type == BRCM_PHY_TYPE_INT) {
+ if (priv->irq0_stat & UMAC_IRQ_PHY_DET_R) {
+ priv->irq0_stat &= ~UMAC_IRQ_PHY_DET_R;
+ netif_crit(priv, link, dev,
+ "cable plugged in, powering up\n");
+ bcmgenet_power_up(priv, GENET_POWER_CABLE_SENSE);
+ } else if (priv->irq0_stat & UMAC_IRQ_PHY_DET_F) {
+ priv->irq0_stat &= ~UMAC_IRQ_PHY_DET_F;
+ netif_crit(priv, link, dev,
+ "cable unplugged, powering down\n");
+ bcmgenet_power_down(priv, GENET_POWER_CABLE_SENSE);
+ }
+ }
+ if (priv->irq0_stat & UMAC_IRQ_MPD_R) {
+ priv->irq0_stat &= ~UMAC_IRQ_MPD_R;
+ netif_crit(priv, wol, dev,
+ "magic packet detected, waking up\n");
+ /* disable mpd interrupt */
+ bcmgenet_intrl2_0_writel(priv,
+ UMAC_IRQ_MPD_R, INTRL2_CPU_MASK_SET);
+ /* disable CRC forward.*/
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ reg &= ~CMD_CRC_FWD;
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+ priv->crc_fwd_en = 0;
+ bcmgenet_power_up(priv, GENET_POWER_WOL_MAGIC);
+
+ } else if (priv->irq0_stat & (UMAC_IRQ_HFB_SM | UMAC_IRQ_HFB_MM)) {
+ priv->irq0_stat &= ~(UMAC_IRQ_HFB_SM | UMAC_IRQ_HFB_MM);
+ netif_crit(priv, wol, dev,
+ "ACPI pattern matched, waking up\n");
+ /* disable HFB match interrupts */
+ bcmgenet_intrl2_0_writel(priv,
+ UMAC_IRQ_HFB_SM | UMAC_IRQ_HFB_MM, INTRL2_CPU_MASK_SET);
+ bcmgenet_power_up(priv, GENET_POWER_WOL_ACPI);
+ }
+
+ /* Link UP/DOWN event */
+ if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
+ (priv->irq0_stat & (UMAC_IRQ_LINK_UP|UMAC_IRQ_LINK_DOWN))) {
+ if (priv->phydev)
+ phy_mac_interrupt(priv->phydev,
+ (priv->irq0_stat & UMAC_IRQ_LINK_UP));
+ priv->irq0_stat &= ~(UMAC_IRQ_LINK_UP|UMAC_IRQ_LINK_DOWN);
+ }
+}
+
+/* bcmgenet_isr1: interrupt handler for ring buffer. */
+static irqreturn_t bcmgenet_isr1(int irq, void *dev_id)
+{
+ struct bcmgenet_priv *priv = dev_id;
+ unsigned int index;
+
+ /* Save irq status for bottom-half processing. */
+ priv->irq1_stat =
+ bcmgenet_intrl2_1_readl(priv, INTRL2_CPU_STAT) &
+ ~priv->int1_mask;
+ /* clear interrupts */
+ bcmgenet_intrl2_1_writel(priv, priv->irq1_stat, INTRL2_CPU_CLEAR);
+
+ netif_dbg(priv, intr, priv->dev,
+ "%s: IRQ=0x%x\n", __func__, priv->irq1_stat);
+ /* Check the MBDONE interrupts.
+ * packet is done, reclaim descriptors
+ */
+ if (priv->irq1_stat & 0x0000ffff) {
+ index = 0;
+ for (index = 0; index < 16; index++) {
+ if (priv->irq1_stat & (1 << index))
+ bcmgenet_tx_reclaim(priv->dev,
+ &priv->tx_rings[index]);
+ }
+ }
+ return IRQ_HANDLED;
+}
+
+/* bcmgenet_isr0: Handle various interrupts. */
+static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
+{
+ struct bcmgenet_priv *priv = dev_id;
+
+ /* Save irq status for bottom-half processing. */
+ priv->irq0_stat =
+ bcmgenet_intrl2_0_readl(priv, INTRL2_CPU_STAT) &
+ ~bcmgenet_intrl2_0_readl(priv, INTRL2_CPU_MASK_STATUS);
+ /* clear interrupts */
+ bcmgenet_intrl2_0_writel(priv, priv->irq0_stat, INTRL2_CPU_CLEAR);
+
+ netif_dbg(priv, intr, priv->dev,
+ "IRQ=0x%x\n", priv->irq0_stat);
+
+ if (priv->irq0_stat & (UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE)) {
+ /* We use NAPI (software interrupt throttling) if
+ * Rx descriptor throttling is not used.
+ * Disable the interrupt; it is re-enabled in the poll method.
+ */
+ if (likely(napi_schedule_prep(&priv->napi))) {
+ bcmgenet_intrl2_0_writel(priv,
+ UMAC_IRQ_RXDMA_BDONE, INTRL2_CPU_MASK_SET);
+ __napi_schedule(&priv->napi);
+ }
+ }
+ if (priv->irq0_stat &
+ (UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE)) {
+ /* Tx reclaim */
+ bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[DESC_INDEX]);
+ }
+ if (priv->irq0_stat & (UMAC_IRQ_PHY_DET_R |
+ UMAC_IRQ_PHY_DET_F |
+ UMAC_IRQ_LINK_UP |
+ UMAC_IRQ_LINK_DOWN |
+ UMAC_IRQ_HFB_SM |
+ UMAC_IRQ_HFB_MM |
+ UMAC_IRQ_MPD_R)) {
+ /* all other interested interrupts handled in bottom half */
+ schedule_work(&priv->bcmgenet_irq_work);
+ }
+
+ if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
+ priv->irq0_stat & (UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR)) {
+ priv->irq0_stat &= ~(UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR);
+ wake_up(&priv->wq);
+ }
+
+ return IRQ_HANDLED;
+}
+
+static void bcmgenet_umac_reset(struct bcmgenet_priv *priv)
+{
+ u32 reg;
+
+ reg = bcmgenet_rbuf_ctrl_get(priv);
+ reg |= BIT(1);
+ bcmgenet_rbuf_ctrl_set(priv, reg);
+ udelay(10);
+
+ reg &= ~BIT(1);
+ bcmgenet_rbuf_ctrl_set(priv, reg);
+ udelay(10);
+}
+
+static void bcmgenet_set_hw_addr(struct bcmgenet_priv *priv,
+ unsigned char *addr)
+{
+ bcmgenet_umac_writel(priv, (addr[0] << 24) | (addr[1] << 16) |
+ (addr[2] << 8) | addr[3], UMAC_MAC0);
+ bcmgenet_umac_writel(priv, (addr[4] << 8) | addr[5], UMAC_MAC1);
+}
+
+static int bcmgenet_wol_resume(struct bcmgenet_priv *priv)
+{
+ int ret;
+
+ /* From WOL-enabled suspend, switch to regular clock */
+ clk_disable(priv->clk_wol);
+ /* init umac registers to synchronize s/w with h/w */
+ ret = init_umac(priv);
+ if (ret)
+ return ret;
+
+ if (priv->phydev)
+ phy_init_hw(priv->phydev);
+ /* Speed settings must be restored */
+ bcmgenet_mii_config(priv->dev);
+
+ return 0;
+}
+
+/* Returns a reusable dma control register value */
+static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv)
+{
+ u32 reg;
+ u32 dma_ctrl;
+
+ /* disable DMA */
+ dma_ctrl = 1 << (DESC_INDEX + DMA_RING_BUF_EN_SHIFT) | DMA_EN;
+ reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+ reg &= ~dma_ctrl;
+ bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+
+ reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+ reg &= ~dma_ctrl;
+ bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+ bcmgenet_umac_writel(priv, 1, UMAC_TX_FLUSH);
+ udelay(10);
+ bcmgenet_umac_writel(priv, 0, UMAC_TX_FLUSH);
+
+ return dma_ctrl;
+}
+
+static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
+{
+ u32 reg;
+
+ reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+ reg |= dma_ctrl;
+ bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+ reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+ reg |= dma_ctrl;
+ bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+}
+
+static int bcmgenet_open(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ unsigned long dma_ctrl;
+ u32 reg;
+ int ret;
+
+ netif_dbg(priv, ifup, dev, "bcmgenet_open\n");
+
+ /* Turn on the clock */
+ if (!IS_ERR(priv->clk))
+ clk_prepare_enable(priv->clk);
+
+ /* take MAC out of reset */
+ bcmgenet_umac_reset(priv);
+
+ ret = init_umac(priv);
+ if (ret)
+ goto err_clk_disable;
+
+ /* disable ethernet MAC while updating its registers */
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ reg &= ~(CMD_TX_EN | CMD_RX_EN);
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+ bcmgenet_set_hw_addr(priv, dev->dev_addr);
+
+ if (priv->wol_enabled) {
+ ret = bcmgenet_wol_resume(priv);
+ if (ret)
+ goto err_clk_disable;
+ }
+
+ if (priv->phy_type == BRCM_PHY_TYPE_INT) {
+ reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+ reg |= EXT_ENERGY_DET_MASK;
+ bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+ }
+
+ if (test_and_clear_bit(GENET_POWER_WOL_MAGIC, &priv->wol_enabled))
+ bcmgenet_power_up(priv, GENET_POWER_WOL_MAGIC);
+ if (test_and_clear_bit(GENET_POWER_WOL_ACPI, &priv->wol_enabled))
+ bcmgenet_power_up(priv, GENET_POWER_WOL_ACPI);
+
+ /* Disable RX/TX DMA and flush TX queues */
+ dma_ctrl = bcmgenet_dma_disable(priv);
+
+ /* Reinitialize TDMA and RDMA and SW housekeeping */
+ ret = bcmgenet_init_dma(priv);
+ if (ret) {
+ netdev_err(dev, "failed to initialize DMA\n");
+ goto err_fini_dma;
+ }
+
+ /* Always enable ring 16 - descriptor ring */
+ bcmgenet_enable_dma(priv, dma_ctrl);
+
+ ret = request_irq(priv->irq0, bcmgenet_isr0, IRQF_SHARED,
+ dev->name, priv);
+ if (ret < 0) {
+ netdev_err(dev, "can't request IRQ %d\n", priv->irq0);
+ goto err_fini_dma;
+ }
+
+ ret = request_irq(priv->irq1, bcmgenet_isr1, IRQF_SHARED,
+ dev->name, priv);
+ if (ret < 0) {
+ netdev_err(dev, "can't request IRQ %d\n", priv->irq1);
+ goto err_irq0;
+ }
+
+ /* Start the network engine */
+ napi_enable(&priv->napi);
+
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ reg |= (CMD_TX_EN | CMD_RX_EN);
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+ /* Make sure we reflect the value of CRC_CMD_FWD */
+ priv->crc_fwd_en = !!(reg & CMD_CRC_FWD);
+
+ device_set_wakeup_capable(&dev->dev, 1);
+
+ if (priv->phy_type == BRCM_PHY_TYPE_INT)
+ bcmgenet_power_up(priv, GENET_POWER_PASSIVE);
+
+ netif_tx_start_all_queues(dev);
+
+ if (priv->phydev)
+ phy_start(priv->phydev);
+
+ return 0;
+
+err_irq0:
+ free_irq(priv->irq0, priv);
+err_fini_dma:
+ bcmgenet_fini_dma(priv);
+err_clk_disable:
+ if (!IS_ERR(priv->clk))
+ clk_disable_unprepare(priv->clk);
+ return ret;
+}
+
+static int bcmgenet_dma_teardown(struct bcmgenet_priv *priv)
+{
+ int timeout = 0;
+ u32 reg;
+
+ /* Disable TDMA to stop adding more frames to TX DMA */
+ reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+ reg &= ~DMA_EN;
+ bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+
+ /* Check TDMA status register to confirm TDMA is disabled */
+ while (!(bcmgenet_tdma_readl(priv, DMA_STATUS) & DMA_DISABLED)) {
+ if (timeout++ == 5000) {
+ netdev_warn(priv->dev,
+ "Timed out while disabling TX DMA\n");
+ return -ETIMEDOUT;
+ }
+ udelay(1);
+ }
+
+ /* Wait 10ms for packet drain in both tx and rx dma */
+ usleep_range(10000, 20000);
+
+ /* Disable RDMA */
+ reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+ reg &= ~DMA_EN;
+ bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+ timeout = 0;
+ /* Check RDMA status register to confirm RDMA is disabled */
+ while (!(bcmgenet_rdma_readl(priv, DMA_STATUS) & DMA_DISABLED)) {
+ if (timeout++ == 5000) {
+ netdev_warn(priv->dev,
+ "Timed out while disabling RX DMA\n");
+ return -ETIMEDOUT;
+ }
+ udelay(1);
+ }
+
+ return 0;
+}
+
+static int bcmgenet_close(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ int ret;
+ u32 reg;
+
+ netif_dbg(priv, ifdown, dev, "bcmgenet_close\n");
+
+ if (priv->phydev)
+ phy_stop(priv->phydev);
+
+ /* Disable MAC receive */
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ reg &= ~CMD_RX_EN;
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+ netif_tx_stop_all_queues(dev);
+
+ ret = bcmgenet_dma_teardown(priv);
+ if (ret)
+ return ret;
+
+ /* Disable MAC transmit. TX DMA must be disabled before this */
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ reg &= ~CMD_TX_EN;
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+ napi_disable(&priv->napi);
+
+ /* tx reclaim */
+ bcmgenet_tx_reclaim_all(dev);
+ bcmgenet_fini_dma(priv);
+
+ free_irq(priv->irq0, priv);
+ free_irq(priv->irq1, priv);
+
+ /* Wait for pending work items to complete - we are stopping
+ * the clock now. Since interrupts are disabled, no new work
+ * will be scheduled.
+ */
+ cancel_work_sync(&priv->bcmgenet_irq_work);
+
+ if (device_may_wakeup(&dev->dev)) {
+ if (priv->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE))
+ bcmgenet_power_down(priv, GENET_POWER_WOL_MAGIC);
+ if (priv->wolopts & WAKE_ARP)
+ bcmgenet_power_down(priv, GENET_POWER_WOL_ACPI);
+ } else if (priv->phy_type == BRCM_PHY_TYPE_INT)
+ bcmgenet_power_down(priv, GENET_POWER_PASSIVE);
+
+ if (priv->wol_enabled)
+ clk_enable(priv->clk_wol);
+
+ if (!IS_ERR(priv->clk))
+ clk_disable_unprepare(priv->clk);
+
+ return 0;
+}
+
+static void bcmgenet_timeout(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+
+ BUG_ON(dev == NULL);
+
+ netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
+
+ dev->trans_start = jiffies;
+
+ dev->stats.tx_errors++;
+
+ netif_tx_wake_all_queues(dev);
+}
+
+#define MAX_MC_COUNT 16
+
+static inline void bcmgenet_set_mdf_addr(struct bcmgenet_priv *priv,
+ unsigned char *addr,
+ int *i,
+ int *mc)
+{
+ u32 reg;
+
+ bcmgenet_umac_writel(priv, addr[0] << 8 | addr[1],
+ UMAC_MDF_ADDR + (*i * 4));
+ bcmgenet_umac_writel(priv,
+ addr[2] << 24 | addr[3] << 16 |
+ addr[4] << 8 | addr[5],
+ UMAC_MDF_ADDR + ((*i + 1) * 4));
+ reg = bcmgenet_umac_readl(priv, UMAC_MDF_CTRL);
+ reg |= (1 << (MAX_MC_COUNT - *mc));
+ bcmgenet_umac_writel(priv, reg, UMAC_MDF_CTRL);
+ *i += 2;
+ (*mc)++;
+}
+
+static void bcmgenet_set_rx_mode(struct net_device *dev)
+{
+ struct bcmgenet_priv *priv = netdev_priv(dev);
+ struct netdev_hw_addr *ha;
+ int i, mc;
+ u32 reg;
+
+ netif_dbg(priv, hw, dev, "%s: %08X\n", __func__, dev->flags);
+
+ /* Promiscuous mode */
+ reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+ if (dev->flags & IFF_PROMISC) {
+ reg |= CMD_PROMISC;
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+ bcmgenet_umac_writel(priv, 0, UMAC_MDF_CTRL);
+ return;
+ } else {
+ reg &= ~CMD_PROMISC;
+ bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+ }
+
+ /* UniMac doesn't support ALLMULTI */
+ if (dev->flags & IFF_ALLMULTI)
+ return;
+
+ /* update MDF filter */
+ i = 0;
+ mc = 0;
+ /* Broadcast */
+ bcmgenet_set_mdf_addr(priv, dev->broadcast, &i, &mc);
+ /* my own address.*/
+ bcmgenet_set_mdf_addr(priv, dev->dev_addr, &i, &mc);
+ /* Unicast list*/
+ if (netdev_uc_count(dev) > (MAX_MC_COUNT - mc))
+ return;
+
+ if (!netdev_uc_empty(dev))
+ netdev_for_each_uc_addr(ha, dev)
+ bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
+ /* Multicast */
+ if (netdev_mc_empty(dev) || netdev_mc_count(dev) >= (MAX_MC_COUNT - mc))
+ return;
+
+ netdev_for_each_mc_addr(ha, dev)
+ bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
+}
+
+/* Set the hardware MAC address. */
+static int bcmgenet_set_mac_addr(struct net_device *dev, void *p)
+{
+ struct sockaddr *addr = p;
+
+ if (netif_running(dev))
+ return -EBUSY;
+
+ ether_addr_copy(dev->dev_addr, addr->sa_data);
+
+ return 0;
+}
+
+static u16 bcmgenet_select_queue(struct net_device *dev,
+ struct sk_buff *skb, void *accel_priv)
+{
+ return netif_is_multiqueue(dev) ? skb->queue_mapping : 0;
+}
+
+static const struct net_device_ops bcmgenet_netdev_ops = {
+ .ndo_open = bcmgenet_open,
+ .ndo_stop = bcmgenet_close,
+ .ndo_start_xmit = bcmgenet_xmit,
+ .ndo_select_queue = bcmgenet_select_queue,
+ .ndo_tx_timeout = bcmgenet_timeout,
+ .ndo_set_rx_mode = bcmgenet_set_rx_mode,
+ .ndo_set_mac_address = bcmgenet_set_mac_addr,
+ .ndo_do_ioctl = bcmgenet_ioctl,
+ .ndo_set_features = bcmgenet_set_features,
+};
+
+/* Array of GENET hardware parameters/characteristics */
+static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
+ [GENET_V1] = {
+ .tx_queues = 0,
+ .rx_queues = 0,
+ .bds_cnt = 0,
+ .bp_in_en_shift = 16,
+ .bp_in_mask = 0xffff,
+ .hfb_filter_cnt = 16,
+ .qtag_mask = 0x1F,
+ .hfb_offset = 0x1000,
+ .rdma_offset = 0x2000,
+ .tdma_offset = 0x3000,
+ .words_per_bd = 2,
+ },
+ [GENET_V2] = {
+ .tx_queues = 4,
+ .rx_queues = 4,
+ .bds_cnt = 32,
+ .bp_in_en_shift = 16,
+ .bp_in_mask = 0xffff,
+ .hfb_filter_cnt = 16,
+ .qtag_mask = 0x1F,
+ .tbuf_offset = 0x0600,
+ .hfb_offset = 0x1000,
+ .hfb_reg_offset = 0x2000,
+ .rdma_offset = 0x3000,
+ .tdma_offset = 0x4000,
+ .words_per_bd = 2,
+ .flags = GENET_HAS_EXT,
+ },
+ [GENET_V3] = {
+ .tx_queues = 4,
+ .rx_queues = 4,
+ .bds_cnt = 32,
+ .bp_in_en_shift = 17,
+ .bp_in_mask = 0x1ffff,
+ .hfb_filter_cnt = 48,
+ .qtag_mask = 0x3F,
+ .tbuf_offset = 0x0600,
+ .hfb_offset = 0x8000,
+ .hfb_reg_offset = 0xfc00,
+ .rdma_offset = 0x10000,
+ .tdma_offset = 0x11000,
+ .words_per_bd = 2,
+ .flags = GENET_HAS_EXT | GENET_HAS_MDIO_INTR,
+ },
+ [GENET_V4] = {
+ .tx_queues = 4,
+ .rx_queues = 4,
+ .bds_cnt = 32,
+ .bp_in_en_shift = 17,
+ .bp_in_mask = 0x1ffff,
+ .hfb_filter_cnt = 48,
+ .qtag_mask = 0x3F,
+ .tbuf_offset = 0x0600,
+ .hfb_offset = 0x8000,
+ .hfb_reg_offset = 0xfc00,
+ .rdma_offset = 0x2000,
+ .tdma_offset = 0x4000,
+ .words_per_bd = 3,
+ .flags = GENET_HAS_40BITS | GENET_HAS_EXT | GENET_HAS_MDIO_INTR,
+ },
+};
+
+/* Infer hardware parameters from the detected GENET version */
+static void bcmgenet_set_hw_params(struct bcmgenet_priv *priv)
+{
+ struct bcmgenet_hw_params *params;
+ u32 reg;
+ u8 major;
+
+ if (GENET_IS_V4(priv)) {
+ bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
+ genet_dma_ring_regs = genet_dma_ring_regs_v4;
+ priv->dma_rx_chk_bit = DMA_RX_CHK_V3PLUS;
+ priv->version = GENET_V4;
+ } else if (GENET_IS_V3(priv)) {
+ bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
+ genet_dma_ring_regs = genet_dma_ring_regs_v123;
+ priv->dma_rx_chk_bit = DMA_RX_CHK_V3PLUS;
+ priv->version = GENET_V3;
+ } else if (GENET_IS_V2(priv)) {
+ bcmgenet_dma_regs = bcmgenet_dma_regs_v2;
+ genet_dma_ring_regs = genet_dma_ring_regs_v123;
+ priv->dma_rx_chk_bit = DMA_RX_CHK_V12;
+ priv->version = GENET_V2;
+ } else if (GENET_IS_V1(priv)) {
+ bcmgenet_dma_regs = bcmgenet_dma_regs_v1;
+ genet_dma_ring_regs = genet_dma_ring_regs_v123;
+ priv->dma_rx_chk_bit = DMA_RX_CHK_V12;
+ priv->version = GENET_V1;
+ }
+
+ /* enum genet_version starts at 1 */
+ priv->hw_params = &bcmgenet_hw_params[priv->version];
+ params = priv->hw_params;
+
+ /* Read GENET HW version */
+ reg = bcmgenet_sys_readl(priv, SYS_REV_CTRL);
+ major = (reg >> 24 & 0x0f);
+ if (major == 5)
+ major = 4;
+ else if (major == 0)
+ major = 1;
+ if (major != priv->version) {
+ dev_err(&priv->pdev->dev,
+ "GENET version mismatch, got: %d, configured for: %d\n",
+ major, priv->version);
+ }
+
+ /* Print the GENET core version */
+ dev_info(&priv->pdev->dev, "GENET " GENET_VER_FMT,
+ major, (reg >> 16) & 0x0f, reg & 0xffff);
+
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+ if (!(params->flags & GENET_HAS_40BITS))
+ pr_warn("GENET does not support 40-bits PA\n");
+#endif
+
+ pr_debug("Configuration for version: %d\n"
+ "TXq: %1d, RXq: %1d, BDs: %1d\n"
+ "BP << en: %2d, BP msk: 0x%05x\n"
+ "HFB count: %2d, QTAQ msk: 0x%05x\n"
+ "TBUF: 0x%04x, HFB: 0x%04x, HFBreg: 0x%04x\n"
+ "RDMA: 0x%05x, TDMA: 0x%05x\n"
+ "Words/BD: %d\n",
+ priv->version,
+ params->tx_queues, params->rx_queues, params->bds_cnt,
+ params->bp_in_en_shift, params->bp_in_mask,
+ params->hfb_filter_cnt, params->qtag_mask,
+ params->tbuf_offset, params->hfb_offset,
+ params->hfb_reg_offset,
+ params->rdma_offset, params->tdma_offset,
+ params->words_per_bd);
+}
+
+static int bcmgenet_drv_probe(struct platform_device *pdev)
+{
+ struct device_node *dn = pdev->dev.of_node;
+ struct bcmgenet_priv *priv;
+ struct net_device *dev;
+ const void *macaddr;
+ struct resource *r;
+ int err = -EIO;
+
+ /* Up to GENET_MAX_MQ_CNT + 1 TX queues and a single RX queue */
+ dev = alloc_etherdev_mqs(sizeof(*priv), GENET_MAX_MQ_CNT + 1, 1);
+ if (!dev) {
+ dev_err(&pdev->dev, "can't allocate net device\n");
+ return -ENOMEM;
+ }
+
+ priv = netdev_priv(dev);
+ priv->irq0 = platform_get_irq(pdev, 0);
+ priv->irq1 = platform_get_irq(pdev, 1);
+ if (!priv->irq0 || !priv->irq1) {
+ dev_err(&pdev->dev, "can't find IRQs\n");
+ err = -EINVAL;
+ goto err;
+ }
+
+ macaddr = of_get_mac_address(dn);
+ if (!macaddr) {
+ dev_err(&pdev->dev, "can't find MAC address\n");
+ err = -EINVAL;
+ goto err;
+ }
+
+ r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ priv->base = devm_request_and_ioremap(&pdev->dev, r);
+ if (!priv->base) {
+ dev_err(&pdev->dev, "can't ioremap\n");
+ err = -EINVAL;
+ goto err;
+ }
+
+ dev->base_addr = (unsigned long)priv->base;
+ SET_NETDEV_DEV(dev, &pdev->dev);
+ dev_set_drvdata(&pdev->dev, dev);
+ ether_addr_copy(dev->dev_addr, macaddr);
+ dev->irq = priv->irq0;
+ dev->watchdog_timeo = 2 * HZ;
+ SET_ETHTOOL_OPS(dev, &bcmgenet_ethtool_ops);
+ dev->netdev_ops = &bcmgenet_netdev_ops;
+ netif_napi_add(dev, &priv->napi, bcmgenet_poll, 64);
+
+ priv->msg_enable = netif_msg_init(debug, GENET_MSG_DEFAULT);
+
+ /* Set hardware features */
+ dev->hw_features |= NETIF_F_SG | NETIF_F_IP_CSUM |
+ NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM;
+
+ /* Set the needed headroom to account for any possible
+ * features enabling/disabling at runtime
+ */
+ dev->needed_headroom += 64;
+
+ netdev_boot_setup_check(dev);
+
+ priv->dev = dev;
+ priv->pdev = pdev;
+
+ if (of_device_is_compatible(dn, "brcm,genet-v4"))
+ priv->version = GENET_V4;
+ else if (of_device_is_compatible(dn, "brcm,genet-v3"))
+ priv->version = GENET_V3;
+ else if (of_device_is_compatible(dn, "brcm,genet-v2"))
+ priv->version = GENET_V2;
+ else if (of_device_is_compatible(dn, "brcm,genet-v1"))
+ priv->version = GENET_V1;
+ else {
+ dev_err(&pdev->dev, "unknown GENET version\n");
+ err = -EINVAL;
+ goto err;
+ }
+
+ bcmgenet_set_hw_params(priv);
+
+ spin_lock_init(&priv->lock);
+ spin_lock_init(&priv->bh_lock);
+ mutex_init(&priv->mib_mutex);
+ /* Mii wait queue */
+ init_waitqueue_head(&priv->wq);
+ /* Always use RX_BUF_LENGTH (2KB) buffer for all chips */
+ priv->rx_buf_len = RX_BUF_LENGTH;
+ INIT_WORK(&priv->bcmgenet_irq_work, bcmgenet_irq_task);
+
+ priv->clk = devm_clk_get(&priv->pdev->dev, "enet");
+ if (IS_ERR(priv->clk))
+ dev_warn(&priv->pdev->dev, "failed to get enet clock\n");
+
+ priv->clk_wol = devm_clk_get(&priv->pdev->dev, "enet-wol");
+ if (IS_ERR(priv->clk_wol))
+ dev_warn(&priv->pdev->dev, "failed to get enet-wol clock\n");
+
+ if (!IS_ERR(priv->clk))
+ clk_prepare_enable(priv->clk);
+
+ err = reset_umac(priv);
+ if (err)
+ goto err_clk_disable;
+
+ err = bcmgenet_mii_init(dev);
+ if (err)
+ goto err_clk_disable;
+
+ /* Set the number of real queues to the hardware queue count + 1
+ * (GENET_V1 has 0 hardware queues, just the ring 16
+ * descriptor-based TX)
+ */
+ netif_set_real_num_tx_queues(priv->dev, priv->hw_params->tx_queues + 1);
+ netif_set_real_num_rx_queues(priv->dev, priv->hw_params->rx_queues + 1);
+
+ err = register_netdev(dev);
+ if (err)
+ goto err_clk_disable;
+
+ /* Turn off the clocks */
+ if (!IS_ERR(priv->clk))
+ clk_disable_unprepare(priv->clk);
+
+ return err;
+
+err_clk_disable:
+ if (!IS_ERR(priv->clk))
+ clk_disable_unprepare(priv->clk);
+err:
+ free_netdev(dev);
+ return err;
+}
+
+static int bcmgenet_drv_remove(struct platform_device *pdev)
+{
+ struct bcmgenet_priv *priv = dev_to_priv(&pdev->dev);
+
+ dev_set_drvdata(&pdev->dev, NULL);
+ unregister_netdev(priv->dev);
+ bcmgenet_mii_exit(priv->dev);
+ free_netdev(priv->dev);
+
+ return 0;
+}
+
+static const struct of_device_id bcmgenet_match[] = {
+ { .compatible = "brcm,genet-v1", },
+ { .compatible = "brcm,genet-v2", },
+ { .compatible = "brcm,genet-v3", },
+ { .compatible = "brcm,genet-v4", },
+ { },
+};
+
+static struct platform_driver bcmgenet_plat_drv = {
+ .probe = bcmgenet_drv_probe,
+ .remove = bcmgenet_drv_remove,
+ .driver = {
+ .name = "bcmgenet",
+ .owner = THIS_MODULE,
+ .of_match_table = bcmgenet_match,
+ },
+};
+
+static int bcmgenet_module_init(void)
+{
+ return platform_driver_register(&bcmgenet_plat_drv);
+}
+
+static void bcmgenet_module_cleanup(void)
+{
+ platform_driver_unregister(&bcmgenet_plat_drv);
+}
+
+module_init(bcmgenet_module_init);
+module_exit(bcmgenet_module_cleanup);
+MODULE_LICENSE("GPL");
--
1.8.3.2
^ permalink raw reply related
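For readers following bcmgenet_set_hw_addr() in the patch above, here is a
minimal user-space sketch of how the six address bytes are packed into the
two values written to UMAC_MAC0 and UMAC_MAC1. The struct mac_regs type and
the pack_mac() helper are hypothetical names introduced only for this
illustration; they are not part of the driver, and the sketch performs no
register access.

```c
#include <stdint.h>

/* Hypothetical holder for the two values the driver writes to
 * UMAC_MAC0 and UMAC_MAC1; not a type from the patch.
 */
struct mac_regs {
	uint32_t mac0;	/* addr[0..3], addr[0] in the top byte */
	uint32_t mac1;	/* addr[4..5] in the low 16 bits */
};

/* Same shift/or pattern as bcmgenet_set_hw_addr(), minus the MMIO writes. */
static struct mac_regs pack_mac(const unsigned char *addr)
{
	struct mac_regs r;

	r.mac0 = ((uint32_t)addr[0] << 24) | ((uint32_t)addr[1] << 16) |
		 ((uint32_t)addr[2] << 8) | addr[3];
	r.mac1 = ((uint32_t)addr[4] << 8) | addr[5];
	return r;
}
```

With the address 00:10:18:aa:bb:cc this gives mac0 = 0x001018aa and
mac1 = 0x0000bbcc, i.e. the address is laid out most significant byte first
across the two registers.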
* [PATCH net-next 01/10] net: phy: add "internal" PHY mode
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>
On some systems, the PHY can be internal, in the same package as the
Ethernet MAC, and still be responding to a specific address on the MDIO
bus, in that case, the Ethernet MAC might need to know about it to
properly configure a port multiplexer to switch to an internal or
external PHY. Add a new PHY interface mode for this and update the
Device Tree of_get_phy_mode() function to look for it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/of/of_net.c | 1 +
include/linux/phy.h | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
index a208a45..729beba 100644
--- a/drivers/of/of_net.c
+++ b/drivers/of/of_net.c
@@ -31,6 +31,7 @@ static const char *phy_modes[] = {
[PHY_INTERFACE_MODE_RTBI] = "rtbi",
[PHY_INTERFACE_MODE_SMII] = "smii",
[PHY_INTERFACE_MODE_XGMII] = "xgmii",
+ [PHY_INTERFACE_MODE_INTERNAL] = "internal",
};
/**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 565188c..463434b 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -74,6 +74,7 @@ typedef enum {
PHY_INTERFACE_MODE_RTBI,
PHY_INTERFACE_MODE_SMII,
PHY_INTERFACE_MODE_XGMII,
+ PHY_INTERFACE_MODE_INTERNAL,
} phy_interface_t;
@@ -553,7 +554,8 @@ static inline bool phy_interrupt_is_valid(struct phy_device *phydev)
*/
static inline bool phy_is_internal(struct phy_device *phydev)
{
- return phydev->is_internal;
+ return phydev->is_internal ||
+ phydev->interface == PHY_INTERFACE_MODE_INTERNAL;
}
/**
--
1.8.3.2
^ permalink raw reply related
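To illustrate the two pieces touched by the patch above, here is a small
user-space sketch: a string-to-mode table lookup of the kind a "phy-mode"
property goes through, and the widened "is internal" test. The enum, the
table, struct phy and both helpers are reduced, hypothetical stand-ins
written for this example; the real definitions live in include/linux/phy.h
and drivers/of/of_net.c and contain more entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical copy of the mode enum. */
enum phy_mode {
	MODE_NA,
	MODE_MII,
	MODE_RGMII,
	MODE_INTERNAL,
	MODE_MAX,
};

/* Designated initializers keep the strings aligned with the enum values. */
static const char *const mode_names[MODE_MAX] = {
	[MODE_NA] = "",
	[MODE_MII] = "mii",
	[MODE_RGMII] = "rgmii",
	[MODE_INTERNAL] = "internal",
};

/* Return the mode whose name matches, or -1 if the string is unknown. */
static int lookup_phy_mode(const char *name)
{
	size_t i;

	for (i = 0; i < MODE_MAX; i++)
		if (!strcmp(name, mode_names[i]))
			return (int)i;
	return -1;
}

/* Hypothetical, reduced stand-in for struct phy_device. */
struct phy {
	bool is_internal;
	enum phy_mode interface;
};

/* Mirrors the patched check: either the flag or the mode marks the PHY. */
static bool phy_is_internal_sketch(const struct phy *p)
{
	return p->is_internal || p->interface == MODE_INTERNAL;
}
```

A MAC driver could then decide how to program its port multiplexer based on
the result, without the PHY driver having to set the is_internal flag itself.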
* [PATCH net-next 00/10] Support for Broadcom GENET driver
From: Florian Fainelli @ 2014-02-12 4:07 UTC (permalink / raw)
To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
Hi all,
This patchset adds support for the Broadcom GENET Gigabit Ethernet MAC
controller. This controller is found on the Broadcom BCM7xxx Set Top Box
System-on-a-Chip.
Florian Fainelli (10):
net: phy: add "internal" PHY mode
net: phy: add MoCA PHY type
net: phy: update port type for MoCA PHYs
net: phy: add Broadcom BCM7xxx internal PHY driver
net: bcmgenet: add driver definitions and private structure
net: bcmgenet: add main driver file
net: bcmgenet: add MDIO routines
net: bcmgenet: hook into the build system
Documentation: add Device tree bindings for Broadcom GENET
MAINTAINERS: add entry for the Broadcom GENET driver
.../devicetree/bindings/net/broadcom-bcmgenet.txt | 111 +
MAINTAINERS | 6 +
drivers/net/ethernet/broadcom/Kconfig | 10 +
drivers/net/ethernet/broadcom/Makefile | 1 +
drivers/net/ethernet/broadcom/genet/Makefile | 2 +
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2685 ++++++++++++++++++++
drivers/net/ethernet/broadcom/genet/bcmgenet.h | 631 +++++
drivers/net/ethernet/broadcom/genet/bcmmii.c | 483 ++++
drivers/net/phy/Kconfig | 6 +
drivers/net/phy/Makefile | 1 +
drivers/net/phy/bcm7xxx.c | 322 +++
drivers/net/phy/phy.c | 5 +-
drivers/of/of_net.c | 2 +
include/linux/brcmphy.h | 9 +
include/linux/phy.h | 5 +-
15 files changed, 4277 insertions(+), 2 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
create mode 100644 drivers/net/ethernet/broadcom/genet/Makefile
create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.c
create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.h
create mode 100644 drivers/net/ethernet/broadcom/genet/bcmmii.c
create mode 100644 drivers/net/phy/bcm7xxx.c
--
1.8.3.2
^ permalink raw reply
* Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
From: Eric Dumazet @ 2014-02-12 4:17 UTC (permalink / raw)
To: Dan Williams
Cc: Sander Eikelenboom, Konrad Rzeszutek Wilk, Wei Liu,
Francois Romieu, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Dave Jones
In-Reply-To: <CAPcyv4juAS0ODRPKE-_wXYu5bixLnvsgs3jgo4vzWcatfnyVsw@mail.gmail.com>
On Tue, 2014-02-11 at 18:07 -0800, Dan Williams wrote:
> The overlap granularity is too large. Multiple dma_map_single
> mappings are allowed to a given page as long as they don't collide on
> the same cache line.
>
I am not sure why you try to count the number of mappings of a page.
Try launching 100 concurrent netperf -t TCP_SENDFILE
Same page might be mapped more than 100 times, more than 10000 times in
some cases.
^ permalink raw reply
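The quoted point about overlap granularity can be made concrete with a small
user-space sketch: two buffers inside the same page only collide if some
cache line is touched by both. The 64-byte line size and the
share_cache_line() helper are assumptions made for this illustration; this
is not the DMA-API debug code.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64u	/* assumed line size, varies by platform */

/* Return true if the byte ranges [off_a, off_a + len_a) and
 * [off_b, off_b + len_b) within one page touch a common cache line.
 * Both lengths must be non-zero.
 */
static bool share_cache_line(size_t off_a, size_t len_a,
			     size_t off_b, size_t len_b)
{
	size_t first_a = off_a / CACHE_LINE;
	size_t last_a = (off_a + len_a - 1) / CACHE_LINE;
	size_t first_b = off_b / CACHE_LINE;
	size_t last_b = (off_b + len_b - 1) / CACHE_LINE;

	/* Two inclusive line intervals intersect iff each starts
	 * no later than the other ends.
	 */
	return first_a <= last_b && first_b <= last_a;
}
```

Under this model a single page can legitimately carry many simultaneous
mappings, which fits the sendfile case described above where one page is
mapped a very large number of times.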
* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Pravin Shelar @ 2014-02-12 4:26 UTC (permalink / raw)
To: Pravin Shelar, David Miller, netdev, Templin, Fred L,
Nicolas Dichtel
In-Reply-To: <20140211021150.GB11150@order.stressinduktion.org>
On Mon, Feb 10, 2014 at 6:11 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Mon, Feb 10, 2014 at 01:00:14PM -0800, Pravin Shelar wrote:
>> On Fri, Feb 7, 2014 at 4:58 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > May I know because of which vport, vxlan or gre, you did this change?
>> >
>> It affects both gre and vxlan.
>
> Ok, thanks.
>
>> > I am feeling a bit uncomfortable handling remote and local packets that
>> > differently on lower tunnel output (local_df is mostly set on locally
>> > originating packets).
>>
>> For ip traffic it makes sense to turn on local_df only for local
>> traffic, since for the remote case we can send icmp (frag-needed) back
>> to the source. No such thing exists for OVS tunnels. ICMP packets are
>> not returned to the source for the tunnels. That is why, to be on the
>> safe side, local_df is turned on for tunnels in OVS.
>
> I have a proposal:
>
> I don't like it that much because of the many arguments. But I currently
> don't see another easy solution. Maybe we should make bool xnet an enum and
> test with bitops?
>
> I left the clearing of local_df in skb_scrub_packet as we need it for the
> dev_forward_skb case and it should be done that in any case.
>
> This diff is slightly compile tested. ;)
>
> I can test and make proper submit if you agree.
>
> What do you think?
>
I am not sure why the caller cannot just set skb->local_df before
calling iptunnel_xmit() rather than passing an extra arg to this
function?
There are not that many callers of this function.
Thanks,
Pravin.
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 026a313..630e72f 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1657,7 +1657,7 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
> return err;
>
> return iptunnel_xmit(rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
> - false);
> + false, false);
> }
> EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
>
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index 48ed75c..8863002 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -154,7 +154,8 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
> int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
> int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
> __be32 src, __be32 dst, __u8 proto,
> - __u8 tos, __u8 ttl, __be16 df, bool xnet);
> + __u8 tos, __u8 ttl, __be16 df, bool xnet,
> + bool clear_local_df);
>
> struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, bool gre_csum,
> int gso_type_mask);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 8f519db..5773681 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3903,12 +3903,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
> */
> void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> {
> - if (xnet)
> + if (xnet) {
> skb_orphan(skb);
> + skb->local_df = 0;
> + }
> skb->tstamp.tv64 = 0;
> skb->pkt_type = PACKET_HOST;
> skb->skb_iif = 0;
> - skb->local_df = 0;
> skb_dst_drop(skb);
> skb->mark = 0;
> secpath_reset(skb);
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index c0e3cb7..2922ec9 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -721,7 +721,8 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
> }
>
> err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, protocol,
> - tos, ttl, df, !net_eq(tunnel->net, dev_net(dev)));
> + tos, ttl, df, !net_eq(tunnel->net, dev_net(dev)),
> + true);
> iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
>
> return;
> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index 6156f4e..93beb04 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -48,13 +48,16 @@
>
> int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
> __be32 src, __be32 dst, __u8 proto,
> - __u8 tos, __u8 ttl, __be16 df, bool xnet)
> + __u8 tos, __u8 ttl, __be16 df, bool xnet,
> + bool clear_df)
> {
> int pkt_len = skb->len;
> struct iphdr *iph;
> int err;
>
> skb_scrub_packet(skb, xnet);
> + if (clear_df)
> + skb->local_df = 0;
>
> skb_clear_hash(skb);
> skb_dst_set(skb, &rt->dst);
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 3dfbcf1..cc0be0e 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -974,7 +974,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
> }
>
> err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, IPPROTO_IPV6, tos,
> - ttl, df, !net_eq(tunnel->net, dev_net(dev)));
> + ttl, df, !net_eq(tunnel->net, dev_net(dev)), true);
> iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
> return NETDEV_TX_OK;
>
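The alternative raised at the top of this reply (leave the iptunnel_xmit() signature alone and let each caller adjust skb->local_df itself) could look roughly like the userspace sketch below. It assumes the quoted skb_scrub_packet() change is kept, so that local_df is only cleared when crossing namespaces. Every mock_* name is hypothetical and only models the one flag under discussion; this is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the fields discussed here. */
struct mock_skb {
	unsigned int local_df:1;
	unsigned int orphaned:1;
};

/* Models the quoted skb_scrub_packet() change: clear local_df only if xnet. */
static void mock_scrub_packet(struct mock_skb *skb, bool xnet)
{
	if (xnet) {
		skb->orphaned = 1;
		skb->local_df = 0;
	}
}

/* Signature stays as it was: no extra clear_local_df argument. */
static int mock_iptunnel_xmit(struct mock_skb *skb, bool xnet)
{
	mock_scrub_packet(skb, xnet);
	return skb->local_df;	/* what the IP output path would see */
}

/* A caller that wants the flag cleared does so before calling. */
static int mock_ip_tunnel_xmit(struct mock_skb *skb, bool xnet)
{
	skb->local_df = 0;
	return mock_iptunnel_xmit(skb, xnet);
}

/* A caller that wants the flag preserved simply calls through. */
static int mock_vxlan_xmit_skb(struct mock_skb *skb)
{
	return mock_iptunnel_xmit(skb, false);
}
```

With this shape only the callers that care about the flag change, and the remaining callers of iptunnel_xmit() stay untouched.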
^ permalink raw reply
* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Jason Wang @ 2014-02-12 5:28 UTC (permalink / raw)
To: Qin Chuanyu, davem
Cc: Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
Eric Dumazet
In-Reply-To: <52FA32C5.9040601@huawei.com>
On 02/11/2014 10:25 PM, Qin Chuanyu wrote:
> we could xmit directly instead of going through softirq to improve
> throughput and latency.
> test model: VM-Host-Host, transmit only, with the vhost thread and nic
> interrupt bound to cpu1. netperf does the throughput test and qperf the
> latency test.
> Host OS: suse11sp3, Guest OS: suse11sp3
>
> latency result(us):
> packet_len   64   256   512   1460
> old(UDP)     44    47    48     66
> new(UDP)     38    41    42     66
>
> old(TCP)     52    55    70    117
> new(TCP)     45    48    61    114
>
> throughput result(Gbit/s):
> packet_len     64    512   1024   1460
> old(UDP)     0.42   2.02   3.75   4.68
> new(UDP)     0.45   2.14   3.77   5.06
>
> For TCP, due to the latency, the client couldn't send packets big enough
> to benefit from the nic's TSO, so the result shows that it sends
> more packets per second but gets lower throughput.
>
> Eric mentioned that it would have a problem with cgroup, but the patch
> for that has been sent by Herbert Xu.
> patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
>
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> ---
A question: without NAPI weight, could this starve other net devices?
> drivers/net/tun.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44c4db8..90b4e58 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
> *tun, struct tun_file *tfile,
> skb_probe_transport_header(skb, 0);
>
> rxhash = skb_get_hash(skb);
> - netif_rx_ni(skb);
> + rcu_read_lock_bh();
> + netif_receive_skb(skb);
> + rcu_read_unlock_bh();
>
> tun->dev->stats.rx_packets++;
> tun->dev->stats.rx_bytes += len;
^ permalink raw reply
* [PATCH net] virtio-net: alloc big buffers also when guest can receive UFO
From: Jason Wang @ 2014-02-12 5:43 UTC (permalink / raw)
To: rusty, mst, virtio-dev, virtualization, netdev, linux-kernel
Cc: Sridhar Samudrala
We should also alloc big buffers when the guest can receive UFO
packets. Otherwise the big packets will be truncated when mergeable rx
buffers are disabled.
Fixes 5c5167515d80f78f6bb538492c423adcae31ad65
(virtio-net: Allow UFO feature to be set and advertised.)
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d75f8ed..5632a99 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1711,7 +1711,8 @@ static int virtnet_probe(struct virtio_device *vdev)
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
- virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+ virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
+ virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
vi->big_packets = true;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
--
1.8.3.2
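
As a rough illustration of the condition this patch extends, the userspace sketch below tests the same four guest offload bits against a feature mask. The bit numbers are the ones defined for virtio-net; has_feature() and needs_big_packets() are hypothetical helpers written for this example, not driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined for virtio-net. */
#define VIRTIO_NET_F_GUEST_TSO4	7
#define VIRTIO_NET_F_GUEST_TSO6	8
#define VIRTIO_NET_F_GUEST_ECN	9
#define VIRTIO_NET_F_GUEST_UFO	10
#define VIRTIO_NET_F_MRG_RXBUF	15

/* Hypothetical helper: test one bit in a negotiated feature mask. */
static bool has_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}

/* If the guest can receive ANY GSO packets, large buffers are needed. */
static bool needs_big_packets(uint64_t features)
{
	return has_feature(features, VIRTIO_NET_F_GUEST_TSO4) ||
	       has_feature(features, VIRTIO_NET_F_GUEST_TSO6) ||
	       has_feature(features, VIRTIO_NET_F_GUEST_ECN) ||
	       has_feature(features, VIRTIO_NET_F_GUEST_UFO);
}
```

A device that negotiated only GUEST_UFO takes the big-buffer path in this model, which is the case the patch description says was missed before.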
^ permalink raw reply related
* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Eric Dumazet @ 2014-02-12 5:47 UTC (permalink / raw)
To: Jason Wang
Cc: Qin Chuanyu, davem, Michael S. Tsirkin, Anthony Liguori, KVM list,
netdev
In-Reply-To: <52FB066E.1020006@redhat.com>
On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
> A question: without NAPI weight, could this starve other net devices?
Not really, as net devices are serviced by softirq handler.
^ permalink raw reply
* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Jason Wang @ 2014-02-12 5:50 UTC (permalink / raw)
To: Eric Dumazet
Cc: Qin Chuanyu, davem, Michael S. Tsirkin, Anthony Liguori, KVM list,
netdev
In-Reply-To: <1392184074.1752.2.camel@edumazet-glaptop2.roam.corp.google.com>
On 02/12/2014 01:47 PM, Eric Dumazet wrote:
> On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
>
>> A question: without NAPI weight, could this starve other net devices?
> Not really, as net devices are serviced by softirq handler.
>
>
Yes, then the issue is tun could be starved by other net devices.
^ permalink raw reply
* [PATCH net] vhost_net: do not report a used len larger than receive buffer size
From: Jason Wang @ 2014-02-12 5:57 UTC (permalink / raw)
To: mst, kvm, virtio-dev, virtualization, netdev, linux-kernel
Currently, even if the packet was truncated by the lower socket, we still
report the packet size as the used len, which may confuse the guest
driver. Fix this by returning the size of the guest receive buffer instead.
Fixes 3a4d5c94e959359ece6d6b55045c3f046677f55c
(vhost_net: a kernel-level virtio server)
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9a68409..06268a0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -525,7 +525,8 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
++headcount;
seg += in;
}
- heads[headcount - 1].len += datalen;
+ if (likely(datalen < 0))
+ heads[headcount - 1].len += datalen;
*iovcount = seg;
if (unlikely(log))
*log_num = nlogs;
--
1.8.3.2
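
The length accounting can be illustrated with a simplified userspace model. It is a sketch only: mock_get_rx_bufs() and struct mock_head are hypothetical, each guest buffer is assumed to be a single descriptor, and the iovec and logging handling of the real get_rx_bufs() is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for one used-ring entry. */
struct mock_head {
	int len;
};

/*
 * Consume guest buffers until datalen bytes are covered or the buffers
 * run out. Returns the number of heads used.
 */
static int mock_get_rx_bufs(const int *buf_cap, int nbufs, int datalen,
			    struct mock_head *heads)
{
	int headcount = 0;

	while (datalen > 0 && headcount < nbufs) {
		heads[headcount].len = buf_cap[headcount];
		datalen -= buf_cap[headcount];
		++headcount;
	}
	/*
	 * datalen < 0: the last buffer is only partly used, so trim it.
	 * datalen > 0: the packet did not fit (truncated); leave the last
	 * len at the buffer size instead of inflating it past the buffer.
	 */
	if (headcount > 0 && datalen < 0)
		heads[headcount - 1].len += datalen;
	return headcount;
}
```

In this model a 1500-byte packet offered a single 1000-byte buffer reports a used len of 1000; with the unconditional `+=` it would have reported 1500.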
^ permalink raw reply related
* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Eric Dumazet @ 2014-02-12 6:26 UTC (permalink / raw)
To: Jason Wang
Cc: Qin Chuanyu, davem, Michael S. Tsirkin, Anthony Liguori, KVM list,
netdev
In-Reply-To: <52FB0BBD.7060303@redhat.com>
On Wed, 2014-02-12 at 13:50 +0800, Jason Wang wrote:
> On 02/12/2014 01:47 PM, Eric Dumazet wrote:
> > On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
> >
> >> A question: without NAPI weight, could this starve other net devices?
> > Not really, as net devices are serviced by softirq handler.
> >
> >
>
> Yes, then the issue is tun could be starved by other net devices.
How does this patch change anything about this 'problem'?
netif_rx_ni() can only be called if your process is not preempted by
other high prio tasks/softirqs.
If this process is scheduled on a cpu, then disabling bh to process
_one_ packet won't fundamentally change the dynamics of the system.
^ permalink raw reply
* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Qin Chuanyu @ 2014-02-12 6:46 UTC (permalink / raw)
To: Jason Wang, davem
Cc: Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
Eric Dumazet
In-Reply-To: <52FB066E.1020006@redhat.com>
On 2014/2/12 13:28, Jason Wang wrote:
> A question: without NAPI weight, could this starve other net devices?
tap xmits skbs in thread context; the poll func of the physical nic driver
can still be called in softirq context, unchanged.
I tested it by binding the vhost thread and the physical nic interrupt to the
same vcpu and using netperf to xmit udp; the test model is VM1-Host1-Host2.
if only VM1 xmits skbs, top shows:
Cpu1 :0.0%us, 95.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 5.0%si, 0.0%st
then, with host2 xmitting skbs to VM1, top shows:
Cpu1 :0.0%us, 41.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 59.0%si, 0.0%st
so I think there is no problem with this change.
>> drivers/net/tun.c | 4 +++-
>> 1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 44c4db8..90b4e58 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
>> *tun, struct tun_file *tfile,
>> skb_probe_transport_header(skb, 0);
>>
>> rxhash = skb_get_hash(skb);
>> - netif_rx_ni(skb);
>> + rcu_read_lock_bh();
>> + netif_receive_skb(skb);
>> + rcu_read_unlock_bh();
>>
>> tun->dev->stats.rx_packets++;
>> tun->dev->stats.rx_bytes += len;
>
>
> .
>
^ permalink raw reply
* [PATCH net-next 1/2] bonding: remove the redundant judgements for bond_set_mac_address()
From: Ding Tianhong @ 2014-02-12 6:58 UTC (permalink / raw)
To: fubar, vfalico, andy; +Cc: davem, netdev
In-Reply-To: <1392188330-17208-1-git-send-email-dingtianhong@huawei.com>
The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
so no need to check it in bond_set_mac_address().
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/net/bonding/bond_main.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71ba18e..58aa531 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3461,15 +3461,7 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
*/
bond_for_each_slave(bond, slave, iter) {
- const struct net_device_ops *slave_ops = slave->dev->netdev_ops;
pr_debug("slave %p %s\n", slave, slave->dev->name);
-
- if (slave_ops->ndo_set_mac_address == NULL) {
- res = -EOPNOTSUPP;
- pr_debug("EOPNOTSUPP %s\n", slave->dev->name);
- goto unwind;
- }
-
res = dev_set_mac_address(slave->dev, addr);
if (res) {
/* TODO: consider downing the slave
--
1.8.0
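
Why the per-slave check is redundant can be shown with a small userspace model. The mock_* types and functions are hypothetical; the only behaviour taken from the kernel is that dev_set_mac_address() returns -EOPNOTSUPP when the device has no ndo_set_mac_address handler.

```c
#include <errno.h>
#include <stddef.h>

struct mock_dev;

/* Hypothetical stand-in for struct net_device_ops. */
struct mock_ops {
	int (*ndo_set_mac_address)(struct mock_dev *dev, const void *addr);
};

/* Hypothetical stand-in for struct net_device. */
struct mock_dev {
	const struct mock_ops *ops;
	int mac_set;
};

/* Models dev_set_mac_address(): the NULL check already lives here. */
static int mock_dev_set_mac_address(struct mock_dev *dev, const void *addr)
{
	if (!dev->ops->ndo_set_mac_address)
		return -EOPNOTSUPP;
	return dev->ops->ndo_set_mac_address(dev, addr);
}

/* Example handler for a device that does support MAC changes. */
static int mock_set_mac(struct mock_dev *dev, const void *addr)
{
	(void)addr;
	dev->mac_set = 1;
	return 0;
}

/* Models the simplified slave loop: no separate ops check is needed. */
static int mock_bond_set_slaves(struct mock_dev *slaves, int n,
				const void *addr)
{
	int i, res;

	for (i = 0; i < n; i++) {
		res = mock_dev_set_mac_address(&slaves[i], addr);
		if (res)
			return res;	/* the real code unwinds here */
	}
	return 0;
}
```

A slave without the handler still fails with -EOPNOTSUPP, so callers see the same error as before; only the extra pr_debug() line goes away.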
^ permalink raw reply related
* [PATCH net-next 0/2] bonding: remove the redundant judgements for bonding
From: Ding Tianhong @ 2014-02-12 6:58 UTC (permalink / raw)
To: fubar, vfalico, andy; +Cc: davem, netdev
Remove the redundant judgements for bond_set_mac_address() and
bond_option_queue_id_set().
Ding Tianhong (2):
bonding: remove the redundant judgements for bond_set_mac_address()
bonding: remove the redundant judgements for
bond_option_queue_id_set()
drivers/net/bonding/bond_main.c | 8 --------
drivers/net/bonding/bond_options.c | 3 +--
2 files changed, 1 insertion(+), 10 deletions(-)
--
1.8.0
^ permalink raw reply