Netdev List
 help / color / mirror / Atom feed
* [PATCH v2.54] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2014-02-12  3:52 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Jesse Gross, Ben Pfaff
  Cc: Ravi K

Hi Jesse, Hi All,

As per the suggestion made by Ben in relation to this patch
I have updated it so that:

* The datapath rejects push MPLS actions in the presence of VLAN tags.

  I have done this by blacklisting the following:
  - ETH_P_8021Q (0x8100)
  - ETH_P_8021AD (0x88A8)
  - ETH_P_QINQ1 (0x0x9100)
  - ETH_P_QINQ2 (0x0x9200)
  - ETH_P_QINQ3 (0x0x9300)

  But perhaps a safer option would be to whitelist only ethertypes
  we are completely comfortable with. Starting with:
  - ETH_P_IP (0x0800)
  - ETH_P_ARP (0x0806)
  - ETH_P_IPV6 (0x86DD)


to aid review this patch is available in git at:
https://github.com/horms/openvswitch devel/mpls-v2.54

Simon Horman (1):
  datapath: Add basic MPLS support to kernel

 OPENFLOW-1.1+                                   |  12 -
 datapath/Modules.mk                             |   1 +
 datapath/actions.c                              | 119 +++++++++-
 datapath/datapath.c                             |   4 +-
 datapath/flow.c                                 |  29 +++
 datapath/flow.h                                 |  17 +-
 datapath/flow_netlink.c                         | 296 ++++++++++++++++++++++--
 datapath/flow_netlink.h                         |   2 +-
 datapath/linux/compat/gso.c                     |  70 +++++-
 datapath/linux/compat/gso.h                     |  41 ++++
 datapath/linux/compat/include/linux/netdevice.h |   6 +-
 datapath/linux/compat/netdevice.c               |  10 +-
 datapath/mpls.h                                 |  15 ++
 include/linux/openvswitch.h                     |   7 +-
 14 files changed, 567 insertions(+), 62 deletions(-)
 create mode 100644 datapath/mpls.h

-- 
1.8.5.2

^ permalink raw reply

* [PATCH v2.54] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2014-02-12  3:52 UTC (permalink / raw)
  To: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Jesse Gross, Ben Pfaff
  Cc: Ravi K
In-Reply-To: <1392177160-18742-1-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Leo Alterman <lalterman-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Cc: Isaku Yamahata <yamahata-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
Cc: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>
Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

---
v2.54
* Do not allow push MPLS in the presence of VLANs
* Remove support for push MPLS in the presence of VLANs from actions.c

v2.53
* Push MPLS labels after VLAN tags
  - This is consistent with OF1.2 and plans for OF1.3.4, and OF1.5+.
    It is inconsistent with OF1.4, which appears to be an aberration

v2.52
* Do not guard __skb_network_protocol with KERNEL_VERSION(3.11.0)
  It was not guarded before this patch and should not be guarded
  afterwards as it is currently needed regardless of the kernel version

v2.50 - v2.51
* No change

v2.49
* Remove MPLS items from OPENFLOW-1.1+. They should now be complete.

v2.47
* Rebase for HAVE_RHEL_OVS_HOOK and OVS_KEY_ATTR_TCP_FLAGS

v2.43 - v2.46
* No change

v2.42
* Rebase for:
  + 0585f7a ("datapath: Simplify mega-flow APIs.")
  + a097c0b ("datapath: Restructure datapath.c and flow.c")
* As suggested by Jesse Gross
  + Take into account that push_mpls() will have freed the skb on error
  + Remove dubious !eth_p_mpls(skb->protocol) condition from push_mpls
    The !eth_p_mpls(skb->protocol) condition on setting inner_protocol
    has no effect. Its motivation was to ensure that inner_protocol was
    only set the first time that mpls_push occured. However this is already
    ensured by the !ovs_skb_get_inner_protocol(skb) condition.
  + Return -EINVAL instead of -ENOMEM from pop_mpls() if the skb is too short
  + Do not add @inner_protocol to kernel doc for struct ovs_skb_cb.
    The patch no longer adds an inner_protocol member to struct ovs_skb_cb
  + Do not add and set otherwise unsued inner_protocol variable in
    rpl_dev_queue_xmit()
* As suggested by Pravin Shelar
  + Implement compatibility code in existing rpl_skb_gso_segment
    rather than introducing to use rpl___skb_gso_segment

v2.41
* No change

v2.40
* Rebase for:
  + New dev_queue_xmit compat code
  + Updated put_vlan()
* As suggested by Jesse Gross
  + Remove bogus mac_len update from push_mpls()
  + Slightly simplify push_mpls() by using eth_hdr()
  + Remove dubious condition !eth_p_mpls(inner_protocol) on
    an skb being considered to be MPLS in netdev_send()
  + Only use compatibility code for MPLS GSO segmentation on kernels
    older than 3.11
  + Revamp setting of inner_protocol
    1. Do not unconditionally set inner_protocol to the value of
       skb->protocol in ovs_execute_actions().
    2. Initialise inner_protocol it to zero only if compatibility code is in
       use. In the case where compatibility code is not in use it will either
       be zero due since the allocation of the skb or some other value set
       by some other user.
    3. Conditionally set the inner_protocol in push_mpls() to the value of
       skb->protocol when entering push_mpls(). The condition is that
       inner_protocol is zero and the value of skb->protocol is not an MPLS
       ethernet type.
    - This new scheme:
      + Pushes logic to set inner_protocol closer to the case where it is
	needed.
      + Avoids over-writing values set by other users.
* As suggested by Pravin Shelar
  + Only set and restore skb->protocol in rpl___skb_gso_segment() in the
    case of MPLS
  + Add inner_protocol field to struct ovs_gso_cb instead of ovs_skb_cb.
    This moves compatibility code closer to where it is used
    and creates fewer differences with mainline.
* Update comment on mac_len updates in datapath/actions.c
* Remove HAVE_INNER_PROCOTOL and instead just check
  against kernel version 3.11 directly.
  HAVE_INNER_PROCOTOL is a hang-over from work done prior
  to the merge of inner_protocol into the kernel.
* Remove dubious condition !eth_p_mpls(inner_protocol) on
  using inner_protocol as the type in rpl_skb_network_protocol()
* Do not update type of features in rpl_dev_queue_xmit.
  Though arguably correct this is not an inherent part of
  the changes made by this patch.
* Use skb_cow_head() in push_mpls()
  + Call skb_cow_head(skb, MPLS_HLEN) instead of
    make_writable(skb, skb->mac_len) to ensure that there is enough head
    room to push an MPLS LSE regardless of whether the skb is cloned or not.
  + This is consistent with the behaviour of rpl__vlan_put_tag().
  + This is a fix for crashes reported when performing mpls_push
    with headroom less than 4. This problem was introduced in v3.36.
* Skip popping in mpls_pop if the skb is too short to contain an MPLS LSE

v2.39
* Rebase for removal of vlan, checksum and skb->mark compat code

v2.38
* Rebase for SCTP support
* Refactor validate_tp_port() to iterate over eth_types rather
  than open-coding the loop. With the addition of SCTP this logic
  is now used three times.

v2.37
* Rebase

v2.36
* Do not add set_ethertype() to datapath/actions.c.
  As this patch has evolved this function had devolved into
  to sets of functionality wrapped into a single function with
  only one line of common code. Refactor things to simply
  open-code setting the ether type in the two locations where
  set_ethertype() was previously used. The aim here is to improve
  readability.

* Update setting skb->protocol after mpls push and pop.
  - In the case of push_mpls it should be set unconditionally
    as in v2.35 the behaviour of this function to always push
    an MPLS LSE before any VLAN tags.
  - In the case of mpls_pop eth_p_mpls(skb->protocol) is a better
    test than skb->protocol != htons(ETH_P_8021Q) as it will give the
    correct behaviour in the presence of other VLAN ethernet types,
    for example 0x88a8 which is used by 802.1ad. Moreover, it seems
    correct to update the ethernet type if it was previously set
    according to the top-most MPLS LSE.

* Deaccelerate VLANs when pushing MPLS tags the
  - Since v2.35 MPLS push will insert an MPLS LSE before any VLAN tags.
    This means that if an accelerated tag is present it should be
    deaccelerated to ensure it ends up in the correct position.

* Update skb->mac_len in push_mpls() so that it will be correct
  when used by a subsequent call to pop_mpls().

  As things stand I do not believe this is strictly necessary as
  ovs-vswitchd will not send a pop MPLS action after a push MPLS action.
  However, I have added this in order to code more defensively as I believe
  that if such a sequence did occur it would be rather unobvious why
  it didn't work.

* Do not add skb_cow_head() call in push_mpls().
  It is unnecessary as there is a make_writable() call.
  This change was also made in v2.30 but some how the
  code regressed between then and v2.35.

v2.35
* Rebase
* Move MPLS constants to mpls.h
* Push MPLS tags after ethernet, before VLAN tags
  - This is consistent with the OpenFlow 1.3 specification
  - Compatibility with OpenFlow 1.2 and earlier versions
    may be provided by ovs-vswitchd.
* Correct GSO behaviour in the presence of MPLS but absence of VLANs

v2.34
* Rebase for megaflow changes

v2.33
* Ensure that inner_protocol is always set to to the current
  skb->protocol value in ovs_execute_actions(). This ensures
  it is set to the correct value in the absence of a push_mpls action.
  Also remove setting of inner_protocol in push_mpls() as
  it duplicates the code now in ovs_execute_actions().
* Call __skb_gso_segment() instead of skb_gso_segment() from
  rpl___skb_gso_segment() in the case that HAVE___SKB_GSO_SEGMENT is set.
  This was a typo.

v2.32
* As suggested by Jesse Gross
  - Use int instead of size_t in validate_and_copy_actions__().
  - Fix crazy edit mess in pop_mpls() action comment
  - Move eth_p_mpls() into mpls.h
  - Refactor skb_gso_segment MPLS handling into rpl_skb_gso_segment
    Address Jesse's comments regarding this code:
    "Can we push this completely into the skb_gso_segment() compatibility
     code? It's both nicer and may make the interactions with the vlan code
     less confusing."
  - Move GSO compatibility code into linux/compat/gso.*
  - Set skb->protocol on mpls_push and mpls_pop in the presence
    of an offloaded VLAN.

v2.31
* As suggested by Jesse Gross
  - There is no need to make mac_header_end inline as it is not in a header file
  - Remove dubious if (*skb_ethertype == ethertype) optimisation from
    set_ethertype
  - Only set skb->protocol in push_mpls() or pop_mpls() for non-VLAN packets
  - Use MAX_ETH_TYPES instead of SAMPLE_ACTION_DEPTH for array size
    of types in struct eth_types. This corrects a typo/thinko.
  - Correct eth type tracking logic such that start isn't advanced
    when entering a sample action, ensuring that all possibly types
    are checked when verifying nested actions.
* Define HAVE_INNER_PROTOCOL based on kernel version.
  inner_protocol has been merged into net-next and should appear in
  v3.11 so there is no longer a need for a acinclude.m4 test to check for it.
* Add MPLS GSO compatibility code.
  This is for use on kernels that do not have MPLS GSO support.
  Thanks to Joe Stringer for his work on this.

v2.30
* As suggested by Jesse Gross
  - Use skb_cow_head in push_mpls to ensure there is sufficient headroom for
    skb_push
  - Call make_writable with skb->mac_len instead of skb->mac_len + MPLS_HLEN
    in push_mpls as only the first skb->mac_len bytes of existing packet data
    are modified.
  - Rename skb_mac_header_end as mac_header_end, this seems
    to be a more appropriate name for a local function.
  - Remove OVS_CSUM_COMPLETE code from set_ethertype().
    Inside OVS the ethernet header is not covered by OVS_CSUM_COMPLETE.
  - Use __skb_pull() instead of skb_pull() in pop_mpls()
  - Decrement and decrement skb->mac_len when poping and pushing VLAN tags.
    Previously mac_len was reset, but this would result in forgetting
    the MPLS label stack.
  - Remove spurious comment from before do_execute_actions().
  - Move OVS_KEY_ATTR_MPLS attribute to its final, upstreamable, location.
  - Correct ethertype check for OVS_ACTION_ATTR_POP_MPLS case in
    validate_and_copy_actions() to check for MPLS ethertypes rather than
    ETH_P_IP.
  - Rewrite tracking of eth types used to verify actions in the presence
    of sample actions. There is a large comment above struct eth_types
    describing the new implementation.

v2.29
* Break include/ and lib/ portions of the patch out into a
  separate patch "datapath: Add basic MPLS support to kernel"
* Update for new MPLS GSO scheme
  - skb->protocol is set to the new ethertype of the packet
    on MPLS push and pop
  - When pushing the first MPLS LSE onto a previously non-MPLS
    packet set skb->inner_protocol to the original ethertype.
  - skb->inner_protocol may be used by the network stack
    for GSO of the inner-packet.
* Drop const from ethertype parameter of set_ethertype.
  This appears to be a legacy of this parameter being a pointer.
* Pass the ethertype patrameter of pop_mpls as a value rather
  than a pointer.

v2.28
* Kernel Datapath changes as suggested by Jarno Rajahalme
  + Correct the logic introduced in v2.27 to set the network_header
    to after the MPLS label stack in the case of an MPLS packet.
    - Increment stack_len offset so that label stacks of depth greater
      than two do not cause an infinite loop.
    - Correct offset passed to check_header to include skb->mac len

v2.27
* Kernel Datapath changes as suggested by Jarno Rajahalme and Jesse Gross:
  + Previously the mac_len and network_header of an skb corresponded
    to the end of the L2 header.  To support GSO, just before transmission,
    do_output, with the results as follows:

    Input: non-MPLS skb: Output: network header and mac_len correspond
                         to the beginning of the L3 headers
    Input: MPLS:         Output: network header and mac_len correspond to the
                         end of the L2 headers.

    This is somewhat confusing.

  + The new scheme is as follows:
    - The mac_len always corresponds to the end of the L2 header.
    - The network header always corresponds to the beginning of the
      L3 header.

  + Note that in the case of MPLS output the end of the L2 headers and the
  beginning of the L3 headers will differ.

* Remove unused declaration of skb_cb_mpls_stack()

v2.26
* Rebase on master
* Kernel Datapath changes as suggested by Jarno Rajahalme
  - Use skb_network_header() instead of skb_mac_header() to locate
    the ethertype to set in set_ethertype() as the latter will
    be wrong in the presence of VLAN tags. This resolves
    a regression introduced in v2.24.
  - Enhance comment in do_output()
  - do_execute_actions(): Do not alter mpls_stack_depth if
    a MPLS push or pop action fail. This is achieved by altering
    mpls_stack_depth at the end of push_mpls() and pop_mpls().

v2.25
* Rebase on master
* Pass big-endian value as the last argument of eth_types_set() in
  validate_and_copy_actions__()
* Use revised GSO support as provided by the patch series
  "[PATCH 0/2] Small Modifications to GSO to allow segmentation of MPLS"
  - Set skb->mac_len to the length of the l2 header + MPLS stack length
  - Update skb->network_header accordingly
  - Set skb->encapsulated_features

v2.24
* Use skb_mac_header() in set_ethertype()
* Set skb->encapsulation in set_ethertype() to support MPLS GSO.
  Also add a note about the other requirements for MPLS GSO.
  MPLS GSO support will be posted as a patch net-next (Linux mainline)
  "MPLS: Add limited GSO support"
* Do not add ETH_TYPE_MIN, it is no longer used

v2.23
* As suggested by Jesse Gross:
  - Verify the current ethernet type when validating sample actions
    both for the taken and not-taken path if the sample action.
  - Document that the OVS_KEY_ATTR_MPLS attribute accepts a list of
    struct ovs_key_mpls but that an implementation may restrict
    the length it accepts.
  - Restrict the array length of the OVS_KEY_ATTR_MPLS to one.
    + Don't add ovs_flow_verify_key_len as it was added to
      handle attributes whose values are arrays but there are
      no attributes with values that are arrays (of length greater than one).

v2.22
* As suggested by Jesse Gross:
  - Fix sparse warning in validate_and_copy_actions()
    I have no idea why sparse doesn't show this up this on my system.
  - Remove call to skb_cow_head() from push_mpls() as it
    is already covered by a call to make_writable()
  - Check (key_type > OVS_KEY_ATTR_MAX) in ovs_flow_verify_key_len()
  - Disallow set actions on l2.5+ data and MPLS push and pop actions
    after an MPLS pop action as there is no verification that the packet
    is actually of the new ethernet type. This may later be supported
    using recirculation or by other means.
  - Do not add spurious debuging message to ovs_flow_cmd_new_or_set()

v2.21
* As suggested by Jesse Gross:
  - Verify that l3 and l4 actions always always occur prior to
    a push_mpls action and use the network header pointer of an skb
    to track the top of the MPLS stack. This avoids adding an l2_size
    element to the skb callback.

v2.20
* As suggested by Jesse Gross:
  - Do not add ovs_dp_ioctl_hook
    + This appears to be garbage from a rebase
  - Do not add skb_cb_set_l2_size. Instead set OVS_CB(skb)->l2_size
    in ovs_flow_extract().
  - Do not free skb on error in push_mpls(), it is freed in the caller
  - Call skb_reset_mac_len() in pop_mpls() and push_mpls()
  - Update checksums in pop_mpls(), push_mpls() and set_mpls().
  - Rename skb_cb_mpls_bos() as skb_cb_mpls_stack().
    It returns the top not the bottom of the stack.
  - Track the current eth_type in validate_and_copy_actions
    which is initially the eth_type of the flow and may be modified
    by push_mpls and pop_mpls actions. Use this to correctly validate
    mpls_set actions. This is to allow mpls_set actions to be applied
    to a non-MPLS frame after an mpls_push action (although ovs-vswitchd
    doesn't currently do that).
    Also:
    + Remove the check of the eth_type in set_mpls() as the new validation
      scheme should ensure it cannot be incorrect.
    + Use the current eth_type to validate mpls_pop actions and remove
      the eth_type check from pop_mpls().
  - Move OVS_KEY_ATTR_MPLS to non-upstream group in ovs_key_lens
  - Remove unnecessary memset of mpls_key in ovs_flow_to_nlattrs()
  - Make a union of the mpls and ip elements of struct sw_flow_key.
    Currently the code stops parsing after an MPLS header so it is
    not possible for the ip and mpls elements to be used simultaneously
    and some space can be saved by using a union.
  - Allow an array of MPLS key attributes
    + Currently all but the first element is ignored
    + User-space needs to be updated to accept more than one element,
      currently it will treat their presence as an error
  - Do not update network header in ovs_flow_extract() for after parsing
    the MPLS stack as it is never used because no l3+ processing
    occurs on MPLS frames.
  - Allow multiple MPLS entries in a match by allowing the OVS_KEY_ATTR_MPLS
    to be an array of struct ovs_key_mpls with at least one entry.
    Currently only one entry is used which is byte-for-byte compatible with
    the previous scheme of having OVS_KEY_ATTR_MPLS as a struct
    ovs_key_mpls.
* Make skb writable in pop_mpls(), push_mpls() and set_mpls().

v2.18 - v2.19
* No change

v2.17
* As suggested by Ben Pfaff
  - Use consistent terminology for MPLS.
    + Consistently refer to the MPLS component of a packet as the
      MPLS label stack and entries in the stack as MPLS label stack entries
      (LSE).  An MPLS label is a component of an MPLS label stack entry.
      The other components are the traffic class (TC), time to live (TTL)
      and bottom of stack (BoS) bit.
  - Rename compose_.*mpls_ functions as execute_.*mpls_

v2.16
* No change

v2.15
* As suggested by Ben Pfaff
  - Use OVS_ACTION_SET to set OVS_KEY_ATTR_MPLS instead of
    OVS_ACTION_ATTR_SET_MPLS

v2.14
* Remove include/linux/openvswitch.h portion which added add
  new key and action attributes. This
  now present in "User-Space MPLS actions and matches"
  which is now a dependency of this patch

v2.13
* As suggested by Jarno Rajahalme
  - Rename mpls_bos element of ovs_skb_cb as l2_size as it is set and used
    regardless of if an MPLS stack is present or not. Update the name of
    helper functions and documentation accordingly.
  - Ensure that skb_cb_mpls_bos() never returns NULL
* Correct endieness in eth_p_mpls()

v2.12
* Update skb and network header on MPLS extraction in ovs_flow_extract()
* Use NULL in skb_cb_mpls_bos()
* Add eth_p_mpls helper

v2.10 - v2.11
* No change

v2.9
* datapath: Always update the mpls bos if  vlan_pop is successful

  Regardless of the details of how a successful
  vlan_pop is achieved, the mpls bos needs to be updated.

  Without this fix it has been observed that the following
  results in malformed packets

v2.8
* No change

v2.7
* Rebase

v2.6
* As suggested by Yamahata-san
  - Do not guard against label == 0 for
    OVS_ACTION_ATTR_SET_MPLS in validate_actions().
    A label of 0 is valid
  - Remove comment stupulating that if
    the top_label element of struct sw_flow_key is 0 then
    there is no MPLS label. An MPLS label of 0 is valid
    and the correct check if ethertype is
    ntohs(ETH_TYPE_MPLS) or ntohs(ETH_TYPE_MPLS_MCAST)

v2.4 - v2.5
* No change

v2.3
* s/mpls_stack/mpls_bos/
  This is in keeping with the naming used in the OpenFlow 1.3 specification

v2.2
* Call skb_reset_mac_header() in skb_cb_set_mpls_stack()
  eth_hdr(skb) is non-NULL when called in skb_cb_set_mpls_stack().
* Add a call to skb_cb_set_mpls_stack() in ovs_packet_cmd_execute().
  I apologise that I have mislaid my notes on this but
  it avoids a kernel panic. I can investigate again if necessary.
* Use struct ovs_action_push_mpls instead of
  __be16 to decode OVS_ACTION_ATTR_PUSH_MPLS in validate_actions(). This is
  consistent with the data format for the attribute.
* Indentation fix in skb_cb_mpls_stack(). [cosmetic]

v2.1
* Manual rebase

Conflicts:
	datapath/linux/compat/include/linux/netdevice.h
	datapath/linux/compat/netdevice.c
---
 OPENFLOW-1.1+                                   |  12 -
 datapath/Modules.mk                             |   1 +
 datapath/actions.c                              | 119 +++++++++-
 datapath/datapath.c                             |   4 +-
 datapath/flow.c                                 |  29 +++
 datapath/flow.h                                 |  17 +-
 datapath/flow_netlink.c                         | 296 ++++++++++++++++++++++--
 datapath/flow_netlink.h                         |   2 +-
 datapath/linux/compat/gso.c                     |  70 +++++-
 datapath/linux/compat/gso.h                     |  41 ++++
 datapath/linux/compat/include/linux/netdevice.h |   6 +-
 datapath/linux/compat/netdevice.c               |  10 +-
 datapath/mpls.h                                 |  15 ++
 include/linux/openvswitch.h                     |   7 +-
 14 files changed, 567 insertions(+), 62 deletions(-)
 create mode 100644 datapath/mpls.h

diff --git a/OPENFLOW-1.1+ b/OPENFLOW-1.1+
index eaf2ee9..75d9a09 100644
--- a/OPENFLOW-1.1+
+++ b/OPENFLOW-1.1+
@@ -59,10 +59,6 @@ probably incomplete.
       behavior does not change.
       [required for OF1.1 and OF1.2]
 
-    * MPLS.  Simon Horman maintains a patch series that adds this
-      feature.  This is partially merged.
-      [optional for OF1.1+]
-
     * Match and set double-tagged VLANs (QinQ).  This requires kernel
       work for reasonable performance.
       [optional for OF1.1+]
@@ -121,18 +117,10 @@ didn't compare the specs carefully yet.)
       some kind of "hardware" support, if we judged it useful enough.)
       [optional for OF1.3+]
 
-    * MPLS BoS matching.
-      Part of MPLS patchset by Simon Horman.
-      [optional for OF1.3+]
-
     * Provider Backbone Bridge tagging.  I don't plan to implement
       this (but we'd accept an implementation).
       [optional for OF1.3+]
 
-    * Rework tag order.
-      Part of MPLS patchset by Simon Horman.
-      [required for v1.3+]
-
     * On-demand flow counters.  I think this might be a real
       optimization in some cases for the software switch.
       [optional for OF1.3+]
diff --git a/datapath/Modules.mk b/datapath/Modules.mk
index b652411..6aa80e5 100644
--- a/datapath/Modules.mk
+++ b/datapath/Modules.mk
@@ -26,6 +26,7 @@ openvswitch_headers = \
 	flow.h \
 	flow_netlink.h \
 	flow_table.h \
+	mpls.h \
 	vlan.h \
 	vport.h \
 	vport-internal_dev.h \
diff --git a/datapath/actions.c b/datapath/actions.c
index 30ea1d2..4820ff5 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -35,6 +35,8 @@
 #include <net/sctp/checksum.h>
 
 #include "datapath.h"
+#include "gso.h"
+#include "mpls.h"
 #include "vlan.h"
 #include "vport.h"
 
@@ -49,6 +51,101 @@ static int make_writable(struct sk_buff *skb, int write_len)
 	return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 }
 
+/* The end of the mac header.
+ *
+ * For non-MPLS skbs this will correspond to the network header.
+ * For MPLS skbs it will be before the network_header as the MPLS
+ * label stack lies between the end of the mac header and the network
+ * header. That is, for MPLS skbs the end of the mac header
+ * is the top of the MPLS label stack.
+ */
+static unsigned char *mac_header_end(const struct sk_buff *skb)
+{
+	return skb_mac_header(skb) + skb->mac_len;
+}
+
+static int push_mpls(struct sk_buff *skb,
+		     const struct ovs_action_push_mpls *mpls)
+{
+	__be32 *new_mpls_lse;
+	struct ethhdr *hdr;
+
+	if (skb_cow_head(skb, MPLS_HLEN) < 0) {
+		kfree_skb(skb);
+		return -ENOMEM;
+	}
+
+	skb_push(skb, MPLS_HLEN);
+	memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+	skb_reset_mac_header(skb);
+
+	new_mpls_lse = (__be32 *)mac_header_end(skb);
+	*new_mpls_lse = mpls->mpls_lse;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
+							     MPLS_HLEN, 0));
+
+	hdr = eth_hdr(skb);
+	hdr->h_proto = mpls->mpls_ethertype;
+	skb->protocol = mpls->mpls_ethertype;
+	return 0;
+}
+
+static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+{
+	struct ethhdr *hdr;
+	int err;
+
+	if (unlikely(skb->len < skb->mac_len + MPLS_HLEN))
+		return -EINVAL;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_sub(skb->csum,
+				     csum_partial(mac_header_end(skb),
+						  MPLS_HLEN, 0));
+
+	memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+
+	__skb_pull(skb, MPLS_HLEN);
+	skb_reset_mac_header(skb);
+
+	/* mac_header_end() is used to locate the ethertype
+	 * field correctly in the presence of VLAN tags.
+	 */
+	hdr = (struct ethhdr *)(mac_header_end(skb) - ETH_HLEN);
+	hdr->h_proto = ethertype;
+	if (eth_p_mpls(skb->protocol))
+		skb->protocol = ethertype;
+	return 0;
+}
+
+static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+{
+	__be32 *stack = (__be32 *)mac_header_end(skb);
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		__be32 diff[] = { ~(*stack), *mpls_lse };
+		skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+					  ~skb->csum);
+	}
+
+	*stack = *mpls_lse;
+
+	return 0;
+}
+
 /* remove VLAN header from packet and update csum accordingly. */
 static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 {
@@ -71,7 +168,8 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 
 	vlan_set_encap_proto(skb, vhdr);
 	skb->mac_header += VLAN_HLEN;
-	skb_reset_mac_len(skb);
+	/* Update mac_len for subsequent MPLS actions */
+	skb->mac_len -= VLAN_HLEN;
 
 	return 0;
 }
@@ -116,6 +214,9 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
 		if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
 			return -ENOMEM;
 
+		/* Update mac_len for subsequent MPLS actions */
+		skb->mac_len += VLAN_HLEN;
+
 		if (skb->ip_summed == CHECKSUM_COMPLETE)
 			skb->csum = csum_add(skb->csum, csum_partial(skb->data
 					+ (2 * ETH_ALEN), VLAN_HLEN, 0));
@@ -501,6 +602,10 @@ static int execute_set_action(struct sk_buff *skb,
 	case OVS_KEY_ATTR_SCTP:
 		err = set_sctp(skb, nla_data(nested_attr));
 		break;
+
+	case OVS_KEY_ATTR_MPLS:
+		err = set_mpls(skb, nla_data(nested_attr));
+		break;
 	}
 
 	return err;
@@ -536,6 +641,16 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			output_userspace(dp, skb, a);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS:
+			err = push_mpls(skb, nla_data(a));
+			if (unlikely(err)) /* skb already freed. */
+				return err;
+			break;
+
+		case OVS_ACTION_ATTR_POP_MPLS:
+			err = pop_mpls(skb, nla_get_be16(a));
+			break;
+
 		case OVS_ACTION_ATTR_PUSH_VLAN:
 			err = push_vlan(skb, nla_data(a));
 			if (unlikely(err)) /* skb already freed. */
@@ -609,6 +724,8 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
 		goto out_loop;
 	}
 
+	ovs_skb_init_inner_protocol(skb);
+
 	OVS_CB(skb)->tun_key = NULL;
 	error = do_execute_actions(dp, skb, acts->actions,
 					 acts->actions_len, false);
diff --git a/datapath/datapath.c b/datapath/datapath.c
index d528ba0..44ad3f1 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -543,7 +543,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_flow_free;
 
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
-				   &flow->key, 0, &acts);
+				   &flow->key, &acts);
 	rcu_assign_pointer(flow->sf_acts, acts);
 	if (err)
 		goto err_flow_free;
@@ -806,7 +806,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 
 		ovs_flow_mask_key(&masked_key, &key, &mask);
 		error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
-					     &masked_key, 0, &acts);
+					     &masked_key, &acts);
 		if (error) {
 			OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 			goto err_kfree;
diff --git a/datapath/flow.c b/datapath/flow.c
index abe6789..e20828b 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -45,6 +45,7 @@
 #include <net/ipv6.h>
 #include <net/ndisc.h>
 
+#include "mpls.h"
 #include "vlan.h"
 
 u64 ovs_flow_used_time(unsigned long flow_jiffies)
@@ -481,6 +482,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
 		return -ENOMEM;
 
 	skb_reset_network_header(skb);
+	skb_reset_mac_len(skb);
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 
 	/* Network layer. */
@@ -564,6 +566,33 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
 			memcpy(key->ipv4.arp.sha, arp->ar_sha, ETH_ALEN);
 			memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
 		}
+	} else if (eth_p_mpls(key->eth.type)) {
+		size_t stack_len = MPLS_HLEN;
+
+		/* In the presence of an MPLS label stack the end of the L2
+		 * header and the beginning of the L3 header differ.
+		 *
+		 * Advance network_header to the beginning of the L3
+		 * header. mac_len corresponds to the end of the L2 header.
+		 */
+		while (1) {
+			__be32 lse;
+
+			error = check_header(skb, skb->mac_len + stack_len);
+			if (unlikely(error))
+				return 0;
+
+			memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+
+			if (stack_len == MPLS_HLEN)
+				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+
+			skb_set_network_header(skb, skb->mac_len + stack_len);
+			if (lse & htonl(MPLS_BOS_MASK))
+				break;
+
+			stack_len += MPLS_HLEN;
+		}
 	} else if (key->eth.type == htons(ETH_P_IPV6)) {
 		int nh_len;             /* IPv6 Header + Extensions */
 
diff --git a/datapath/flow.h b/datapath/flow.h
index eafcfd8..86ea5f5 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -80,12 +80,17 @@ struct sw_flow_key {
 		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
 		__be16 type;		/* Ethernet frame type. */
 	} eth;
-	struct {
-		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
-		u8     tos;		/* IP ToS. */
-		u8     ttl;		/* IP TTL/hop limit. */
-		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
-	} ip;
+	union {
+		struct {
+			__be32 top_lse;		/* top label stack entry */
+		} mpls;
+		struct {
+			u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
+			u8     tos;		/* IP ToS. */
+			u8     ttl;		/* IP TTL/hop limit. */
+			u8     frag;		/* One of OVS_FRAG_TYPE_*. */
+		} ip;
+	};
 	union {
 		struct {
 			struct {
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 39fe4bf..86e0950 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -20,6 +20,7 @@
 
 #include "flow.h"
 #include "datapath.h"
+#include "mpls.h"
 #include <linux/uaccess.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
@@ -122,7 +123,8 @@ static bool match_validate(const struct sw_flow_match *match,
 			| (1ULL << OVS_KEY_ATTR_ICMP)
 			| (1ULL << OVS_KEY_ATTR_ICMPV6)
 			| (1ULL << OVS_KEY_ATTR_ARP)
-			| (1ULL << OVS_KEY_ATTR_ND));
+			| (1ULL << OVS_KEY_ATTR_ND)
+			| (1ULL << OVS_KEY_ATTR_MPLS));
 
 	/* Always allowed mask fields. */
 	mask_allowed |= ((1ULL << OVS_KEY_ATTR_TUNNEL)
@@ -137,6 +139,13 @@ static bool match_validate(const struct sw_flow_match *match,
 			mask_allowed |= 1ULL << OVS_KEY_ATTR_ARP;
 	}
 
+
+	if (eth_p_mpls(match->key->eth.type)) {
+		key_expected |= 1ULL << OVS_KEY_ATTR_MPLS;
+		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
+			mask_allowed |= 1ULL << OVS_KEY_ATTR_MPLS;
+	}
+
 	if (match->key->eth.type == htons(ETH_P_IP)) {
 		key_expected |= 1ULL << OVS_KEY_ATTR_IPV4;
 		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
@@ -252,6 +261,7 @@ static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
 	[OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
 	[OVS_KEY_ATTR_TUNNEL] = -1,
+	[OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
 };
 
 static bool is_all_zero(const u8 *fp, size_t size)
@@ -662,6 +672,16 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  bool *exact_5tuple
 		attrs &= ~(1ULL << OVS_KEY_ATTR_ARP);
 	}
 
+	if (attrs & (1ULL << OVS_KEY_ATTR_MPLS)) {
+		const struct ovs_key_mpls *mpls_key;
+
+		mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
+		SW_FLOW_KEY_PUT(match, mpls.top_lse,
+				mpls_key->mpls_lse, is_mask);
+
+		attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS);
+	 }
+
 	if (attrs & (1ULL << OVS_KEY_ATTR_TCP)) {
 		const struct ovs_key_tcp *tcp_key;
 
@@ -1061,6 +1081,14 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
 		arp_key->arp_op = htons(output->ip.proto);
 		memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN);
 		memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN);
+	} else if (eth_p_mpls(swkey->eth.type)) {
+		struct ovs_key_mpls *mpls_key;
+
+		nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+		if (!nla)
+			goto nla_put_failure;
+		mpls_key = nla_data(nla);
+		mpls_key->mpls_lse = output->mpls.top_lse;
 	}
 
 	if ((swkey->eth.type == htons(ETH_P_IP) ||
@@ -1269,15 +1297,133 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 
 	a->nla_len = sfa->actions_len - st_offset;
 }
+#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
+
+/* struct eth_types - possible eth types
+ * @types: provides storage for the possible eth types.
+ * @start: is the index of the first entry of types which is possible.
+ * @end: is the index of the last entry of types which is possible.
+ * @cursor: is the index of the entry which should be updated if an action
+ * changes the eth type.
+ *
+ * Due to the sample action there may be multiple possible eth types.
+ * In order to correctly validate actions all possible types are tracked
+ * and verified. This is done using struct eth_types.
+ *
+ * Initially start, end and cursor should be 0, and the first element of
+ * types should be set to the eth type of the flow.
+ *
+ * When an action changes the eth type then the values of start and end are
+ * updated to the value of cursor. The new type is stored at types[cursor].
+ *
+ * When entering a sample action the start and cursor values are saved. The
+ * value of cursor is set to the value of end plus one.
+ *
+ * When leaving a sample action the start and cursor values are restored to
+ * their saved values.
+ *
+ * An example follows.
+ *
+ * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
+ *
+ * 0. Initial state:
+ *	types = { original_eth_type }
+ * 	start = end = cursor = 0;
+ *
+ * 1. pop_mpls(A)
+ *    a. Check types from start (0) to end (0) inclusive
+ *       i.e. Check against original_eth_type
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = A
+ *    New state:
+ *	types = { A }
+ *	start = end = cursor = 0;
+ *
+ * 2. Enter first sample()
+ *    a. Save start and cursor
+ *    b. Set cursor = end + 1
+ *    New state:
+ *	types = { A }
+ *	start = end = 0;
+ *	cursor = 1;
+ *
+ * 3. pop_mpls(B)
+ *    a. Check types from start (0) to end (0)
+ *       i.e: Check against A
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = B
+ *    New state:
+ *	types = { A, B }
+ *	start = end = cursor = 1;
+ *
+ * 4. Leave first sample()
+ *    a. Restore start and cursor to the values when entering 2.
+ *    New state:
+ *	types = { A, B }
+ *	start = cursor = 0;
+ *	end = 1;
+ *
+ * 5. Enter second sample()
+ *    a. Save start and cursor
+ *    b. Set cursor = end + 1
+ *    New state:
+ *	types = { A, B }
+ *	start = 0;
+ *	end = 1;
+ *	cursor = 2;
+ *
+ * 6. pop_mpls(C)
+ *    a. Check types from start (0) to end (1) inclusive
+ *       i.e: Check against A and B
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = C
+ *    New state:
+ *	types = { A, B, C }
+ *	start = end = cursor = 2;
+ *
+ * 7. Leave second sample()
+ *    a. Restore start and cursor to the values when entering 5.
+ *    New state:
+ *	types = { A, B, C }
+ *	start = cursor = 0;
+ *	end = 2;
+ *
+ * 8. pop_mpls(D)
+ *    a. Check types from start (0) to end (2) inclusive
+ *       i.e: Check against A, B and C
+ *    b. Set start = end = cursor
+ *    c. Set types[cursor] = D
+ *    New state:
+ *	types = { D } // Trailing entries of type are no longer used end = 0
+ *	start = end = cursor = 0;
+ */
+struct eth_types {
+	int start, end, cursor;
+	__be16 types[MAX_ETH_TYPES];
+};
+
+static void eth_types_set(struct eth_types *types, __be16 type)
+{
+	types->start = types->end = types->cursor;
+	types->types[types->cursor] = type;
+}
+
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth,
+				  struct sw_flow_actions **sfa,
+				  struct eth_types *eth_types);
 
 static int validate_and_copy_sample(const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
-				    struct sw_flow_actions **sfa)
+				    struct sw_flow_actions **sfa,
+				    struct eth_types *eth_types)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
 	const struct nlattr *a;
 	int rem, start, err, st_acts;
+	int saved_eth_types_start, saved_eth_types_cursor;
 
 	memset(attrs, 0, sizeof(attrs));
 	nla_for_each_nested(a, attr, rem) {
@@ -1309,22 +1455,38 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	if (st_acts < 0)
 		return st_acts;
 
-	err = ovs_nla_copy_actions(actions, key, depth + 1, sfa);
+	/* Save and update eth_types cursor and start.  Please see the
+	 * comment for struct eth_types for a discussion of this.
+	 */
+	saved_eth_types_start = eth_types->start;
+	saved_eth_types_cursor = eth_types->cursor;
+	eth_types->cursor = eth_types->end + 1;
+	if (eth_types->cursor == MAX_ETH_TYPES)
+		return -EINVAL;
+
+	err = ovs_nla_copy_actions__(actions, key, depth + 1, sfa, eth_types);
 	if (err)
 		return err;
 
+	/* Restore eth_types cursor and start.  Please see the
+	 * comment for struct eth_types for a discussion of this.
+	 */
+	eth_types->cursor = saved_eth_types_cursor;
+	eth_types->start = saved_eth_types_start;
+
 	add_nested_action_end(*sfa, st_acts);
 	add_nested_action_end(*sfa, start);
 
 	return 0;
 }
 
-static int validate_tp_port(const struct sw_flow_key *flow_key)
+static int validate_tp_port__(const struct sw_flow_key *flow_key,
+			      __be16 eth_type)
 {
-	if (flow_key->eth.type == htons(ETH_P_IP)) {
+	if (eth_type == htons(ETH_P_IP)) {
 		if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst)
 			return 0;
-	} else if (flow_key->eth.type == htons(ETH_P_IPV6)) {
+	} else  if (eth_type == htons(ETH_P_IPV6)) {
 		if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst)
 			return 0;
 	}
@@ -1332,6 +1494,21 @@ static int validate_tp_port(const struct sw_flow_key *flow_key)
 	return -EINVAL;
 }
 
+static int validate_tp_port(const struct sw_flow_key *flow_key,
+                           const struct eth_types *eth_types)
+{
+	int i;
+
+	for (i = eth_types->start; i < eth_types->end; i++) {
+		int ret = validate_tp_port__(flow_key, eth_types->types[i]);
+
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 void ovs_match_init(struct sw_flow_match *match,
 		    struct sw_flow_key *key,
 		    struct sw_flow_mask *mask)
@@ -1374,7 +1551,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
 			struct sw_flow_actions **sfa,
-			bool *set_tun)
+			bool *set_tun, struct eth_types *eth_types)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -1405,9 +1582,12 @@ static int validate_set(const struct nlattr *a,
 			return err;
 		break;
 
-	case OVS_KEY_ATTR_IPV4:
-		if (flow_key->eth.type != htons(ETH_P_IP))
-			return -EINVAL;
+	case OVS_KEY_ATTR_IPV4: {
+		int i;
+
+		for (i = eth_types->start; i <= eth_types->end; i++)
+			if (eth_types->types[i] != htons(ETH_P_IP))
+				return -EINVAL;
 
 		if (!flow_key->ip.proto)
 			return -EINVAL;
@@ -1420,10 +1600,14 @@ static int validate_set(const struct nlattr *a,
 			return -EINVAL;
 
 		break;
+	}
 
-	case OVS_KEY_ATTR_IPV6:
-		if (flow_key->eth.type != htons(ETH_P_IPV6))
-			return -EINVAL;
+	case OVS_KEY_ATTR_IPV6: {
+		int i;
+
+		for (i = eth_types->start; i <= eth_types->end; i++)
+			if (eth_types->types[i] != htons(ETH_P_IPV6))
+				return -EINVAL;
 
 		if (!flow_key->ip.proto)
 			return -EINVAL;
@@ -1439,24 +1623,35 @@ static int validate_set(const struct nlattr *a,
 			return -EINVAL;
 
 		break;
+	}
+
 
 	case OVS_KEY_ATTR_TCP:
 		if (flow_key->ip.proto != IPPROTO_TCP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_types);
 
 	case OVS_KEY_ATTR_UDP:
 		if (flow_key->ip.proto != IPPROTO_UDP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_types);
+
+	case OVS_KEY_ATTR_MPLS: {
+		int i;
+
+		for (i = eth_types->start; i < eth_types->end; i++)
+			if (!eth_p_mpls(eth_types->types[i]))
+				return -EINVAL;
+		break;
+	}
 
 	case OVS_KEY_ATTR_SCTP:
 		if (flow_key->ip.proto != IPPROTO_SCTP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_types);
 
 	default:
 		return -EINVAL;
@@ -1500,10 +1695,11 @@ static int copy_action(const struct nlattr *from,
 	return 0;
 }
 
-int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key,
-			 int depth,
-			 struct sw_flow_actions **sfa)
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth,
+				  struct sw_flow_actions **sfa,
+				  struct eth_types *eth_types)
 {
 	const struct nlattr *a;
 	int rem, err;
@@ -1516,6 +1712,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 		static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
 			[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
 			[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
+			[OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls),
+			[OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16),
 			[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
 			[OVS_ACTION_ATTR_POP_VLAN] = 0,
 			[OVS_ACTION_ATTR_SET] = (u32)-1,
@@ -1558,14 +1756,54 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 				return -EINVAL;
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS: {
+			int i;
+			const struct ovs_action_push_mpls *mpls = nla_data(a);
+
+			if (!eth_p_mpls(mpls->mpls_ethertype))
+				return -EINVAL;
+			/* Prohibit push MPLS in the presence of VLANs */
+			for (i = eth_types->start; i < eth_types->end; i++)
+				if (eth_types->types[i] == htons(ETH_P_8021Q) ||
+				    eth_types->types[i] == htons(ETH_P_8021AD) ||
+				    eth_types->types[i] == htons(ETH_P_QINQ1) ||
+				    eth_types->types[i] == htons(ETH_P_QINQ2) ||
+				    eth_types->types[i] == htons(ETH_P_QINQ3))
+					return -EINVAL;
+			eth_types_set(eth_types, mpls->mpls_ethertype);
+			break;
+		}
+
+		case OVS_ACTION_ATTR_POP_MPLS: {
+			int i;
+
+			for (i = eth_types->start; i <= eth_types->end; i++)
+				if (!eth_p_mpls(eth_types->types[i]))
+					return -EINVAL;
+
+			/* Disallow subsequent L2.5+ set and mpls_pop actions
+			 * as there is no check here to ensure that the new
+			 * eth_type is valid and thus set actions could
+			 * write off the end of the packet or otherwise
+			 * corrupt it.
+			 *
+			 * Support for these actions is planned using packet
+			 * recirculation.
+			 */
+			eth_types_set(eth_types, htons(0));
+			break;
+		}
+
 		case OVS_ACTION_ATTR_SET:
-			err = validate_set(a, key, sfa, &skip_copy);
+			err = validate_set(a, key, sfa, &skip_copy,
+					   eth_types);
 			if (err)
 				return err;
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-			err = validate_and_copy_sample(a, key, depth, sfa);
+			err = validate_and_copy_sample(a, key, depth, sfa,
+						       eth_types);
 			if (err)
 				return err;
 			skip_copy = true;
@@ -1587,6 +1825,20 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 	return 0;
 }
 
+int ovs_nla_copy_actions(const struct nlattr *attr,
+			 const struct sw_flow_key *key,
+			 struct sw_flow_actions **sfa)
+{
+	struct eth_types eth_type = {
+		.start = 0,
+		.end = 0,
+		.cursor = 0,
+		.types = { key->eth.type, },
+	};
+
+	return ovs_nla_copy_actions__(attr, key, 0, sfa, &eth_type);
+}
+
 static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
 {
 	const struct nlattr *a;
diff --git a/datapath/flow_netlink.h b/datapath/flow_netlink.h
index b31fbe2..41d2673 100644
--- a/datapath/flow_netlink.h
+++ b/datapath/flow_netlink.h
@@ -50,7 +50,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 		      const struct nlattr *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key, int depth,
+			 const struct sw_flow_key *key,
 			 struct sw_flow_actions **sfa);
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
diff --git a/datapath/linux/compat/gso.c b/datapath/linux/compat/gso.c
index 32f906c..7461f57 100644
--- a/datapath/linux/compat/gso.c
+++ b/datapath/linux/compat/gso.c
@@ -19,6 +19,7 @@
 #include <linux/module.h>
 #include <linux/if.h>
 #include <linux/if_tunnel.h>
+#include <linux/if_vlan.h>
 #include <linux/icmp.h>
 #include <linux/in.h>
 #include <linux/ip.h>
@@ -35,6 +36,8 @@
 #include <net/xfrm.h>
 
 #include "gso.h"
+#include "mpls.h"
+#include "vlan.h"
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37) && \
 	!defined(HAVE_VLAN_BUG_WORKAROUND)
@@ -47,10 +50,12 @@ MODULE_PARM_DESC(vlan_tso, "Enable TSO for VLAN packets");
 #define vlan_tso true
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 static bool dev_supports_vlan_tx(struct net_device *dev)
 {
-#if defined(HAVE_VLAN_BUG_WORKAROUND)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+	return true;
+#elif defined(HAVE_VLAN_BUG_WORKAROUND)
 	return dev->features & NETIF_F_HW_VLAN_TX;
 #else
 	/* Assume that the driver is buggy. */
@@ -58,24 +63,64 @@ static bool dev_supports_vlan_tx(struct net_device *dev)
 #endif
 }
 
+/* Strictly this is not needed and will be optimised out
+ * as this code is guarded by if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0).
+ * It is here to make things explicit should the compatibility
+ * code be extended in some way prior extending its life-span
+ * beyond v3.11.
+ */
+static bool supports_mpls_gso(void)
+{
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
+	return true;
+#else
+	return false;
+#endif
+}
+
 int rpl_dev_queue_xmit(struct sk_buff *skb)
 {
 #undef dev_queue_xmit
 	int err = -ENOMEM;
+	bool vlan, mpls;
+
+	vlan = mpls = false;
+
+	if (eth_p_mpls(skb->protocol) && !supports_mpls_gso())
+		mpls = true;
+
+	if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev))
+		vlan = true;
 
-	if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
+	if (vlan || mpls) {
 		int features;
 
 		features = netif_skb_features(skb);
 
-		if (!vlan_tso)
-			features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
-				      NETIF_F_UFO | NETIF_F_FSO);
+		if (vlan) {
+			if (!vlan_tso)
+				features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
+					      NETIF_F_UFO | NETIF_F_FSO);
 
-		skb = __vlan_put_tag(skb, skb->vlan_proto, vlan_tx_tag_get(skb));
-		if (unlikely(!skb))
-			return err;
-		vlan_set_tci(skb, 0);
+			skb = __vlan_put_tag(skb, skb->vlan_proto,
+					     vlan_tx_tag_get(skb));
+			if (unlikely(!skb))
+				return err;
+			vlan_set_tci(skb, 0);
+		}
+
+		/* As of v3.11 the kernel provides an mpls_features field in
+		 * struct net_device which allows devices to advertise which
+		 * features its supports for MPLS. This value defaults to
+		 * NETIF_F_SG and as of v3.11.
+		 *
+		 * This compatibility code is intended for kernels older
+		 * than v3.11 that do not support MPLS GSO and thus do not
+		 * provide mpls_features. Thus this code uses NETIF_F_SG
+		 * directly in place of mpls_features.
+		 */
+		if (mpls)
+			features &= NETIF_F_SG;
 
 		if (netif_needs_gso(skb, features)) {
 			struct sk_buff *nskb;
@@ -114,13 +159,16 @@ drop:
 	kfree_skb(skb);
 	return err;
 }
-#endif /* kernel version < 2.6.37 */
+#endif /* kernel version < 3.11.0 */
 
 static __be16 __skb_network_protocol(struct sk_buff *skb)
 {
 	__be16 type = skb->protocol;
 	int vlan_depth = ETH_HLEN;
 
+	if (eth_p_mpls(skb->protocol))
+		type = ovs_skb_get_inner_protocol(skb);
+
 	while (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) {
 		struct vlan_hdr *vh;
 
diff --git a/datapath/linux/compat/gso.h b/datapath/linux/compat/gso.h
index 44fd213..d7a9cea 100644
--- a/datapath/linux/compat/gso.h
+++ b/datapath/linux/compat/gso.h
@@ -1,6 +1,7 @@
 #ifndef __LINUX_GSO_WRAPPER_H
 #define __LINUX_GSO_WRAPPER_H
 
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <net/protocol.h>
 
@@ -11,6 +12,9 @@ struct ovs_gso_cb {
 	sk_buff_data_t	inner_network_header;
 	sk_buff_data_t	inner_mac_header;
 	void (*fix_segment)(struct sk_buff *);
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+	__be16			inner_protocol;
+#endif
 };
 #define OVS_GSO_CB(skb) ((struct ovs_gso_cb *)(skb)->cb)
 
@@ -69,4 +73,41 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
 
 #define ip_local_out rpl_ip_local_out
 int ip_local_out(struct sk_buff *skb);
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
+static inline void ovs_skb_init_inner_protocol(struct sk_buff *skb) {
+	OVS_GSO_CB(skb)->inner_protocol = htons(0);
+}
+
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+					      __be16 ethertype) {
+	OVS_GSO_CB(skb)->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+	return OVS_GSO_CB(skb)->inner_protocol;
+}
+
+#else
+
+static inline void ovs_skb_init_inner_protocol(struct sk_buff *skb) {
+	/* Nothing to do. The inner_protocol is either zero or
+	 * has been set to a value by another user.
+	 * Either way it may be considered initialised.
+	 */
+}
+
+static inline void ovs_skb_set_inner_protocol(struct sk_buff *skb,
+					      __be16 ethertype)
+{
+	skb->inner_protocol = ethertype;
+}
+
+static inline __be16 ovs_skb_get_inner_protocol(struct sk_buff *skb)
+{
+	return skb->inner_protocol;
+}
+#endif
+
 #endif
diff --git a/datapath/linux/compat/include/linux/netdevice.h b/datapath/linux/compat/include/linux/netdevice.h
index e04f308..df69ca6 100644
--- a/datapath/linux/compat/include/linux/netdevice.h
+++ b/datapath/linux/compat/include/linux/netdevice.h
@@ -64,11 +64,13 @@ static inline struct net_device *dev_get_by_index_rcu(struct net *net, int ifind
 typedef u32 netdev_features_t;
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 #define skb_gso_segment rpl_skb_gso_segment
 struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
                                     netdev_features_t features);
+#endif
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
 #define netif_skb_features rpl_netif_skb_features
 netdev_features_t rpl_netif_skb_features(struct sk_buff *skb);
 
@@ -113,7 +115,7 @@ static inline struct net_device *netdev_master_upper_dev_get(struct net_device *
 }
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 #define dev_queue_xmit rpl_dev_queue_xmit
 int dev_queue_xmit(struct sk_buff *skb);
 #endif
diff --git a/datapath/linux/compat/netdevice.c b/datapath/linux/compat/netdevice.c
index 1dc5abf..d22fced 100644
--- a/datapath/linux/compat/netdevice.c
+++ b/datapath/linux/compat/netdevice.c
@@ -1,6 +1,9 @@
 #include <linux/netdevice.h>
 #include <linux/if_vlan.h>
 
+#include "mpls.h"
+#include "gso.h"
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,38)
 #ifndef HAVE_CAN_CHECKSUM_PROTOCOL
 static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)
@@ -69,7 +72,9 @@ netdev_features_t rpl_netif_skb_features(struct sk_buff *skb)
 		return harmonize_features(skb, protocol, features);
 	}
 }
+#endif	/* kernel version < 2.6.38 */
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,11,0)
 struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
 				    netdev_features_t features)
 {
@@ -78,6 +83,9 @@ struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
 	__be16 skb_proto;
 	struct sk_buff *skb_gso;
 
+	if (eth_p_mpls(skb->protocol))
+		type = ovs_skb_get_inner_protocol(skb);
+
 	while (type == htons(ETH_P_8021Q)) {
 		struct vlan_hdr *vh;
 
@@ -98,4 +106,4 @@ struct sk_buff *rpl_skb_gso_segment(struct sk_buff *skb,
 	skb->protocol = skb_proto;
 	return skb_gso;
 }
-#endif	/* kernel version < 2.6.38 */
+#endif	/* kernel version < 3.11.0 */
diff --git a/datapath/mpls.h b/datapath/mpls.h
new file mode 100644
index 0000000..7eab104
--- /dev/null
+++ b/datapath/mpls.h
@@ -0,0 +1,15 @@
+#ifndef MPLS_H
+#define MPLS_H 1
+
+#include <linux/if_ether.h>
+
+#define MPLS_BOS_MASK	0x00000100
+#define MPLS_HLEN 4
+
+static inline bool eth_p_mpls(__be16 eth_type)
+{
+	return eth_type == htons(ETH_P_MPLS_UC) ||
+		eth_type == htons(ETH_P_MPLS_MC);
+}
+
+#endif
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index d1ff5ec..7205f7b 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -307,14 +307,13 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_TUNNEL,	/* Nested set of ovs_tunnel attributes */
 	OVS_KEY_ATTR_SCTP,      /* struct ovs_key_sctp */
 	OVS_KEY_ATTR_TCP_FLAGS,	/* be16 TCP flags. */
+	OVS_KEY_ATTR_MPLS,      /* array of struct ovs_key_mpls.
+				 * The implementation may restrict
+				 * the accepted length of the array. */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
 #endif
-
-	OVS_KEY_ATTR_MPLS = 62, /* array of struct ovs_key_mpls.
-				 * The implementation may restrict
-				 * the accepted length of the array. */
 	__OVS_KEY_ATTR_MAX
 };
 
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH net] bonding: Fix deadlock in bonding driver when using netpoll
From: Ding Tianhong @ 2014-02-12  4:06 UTC (permalink / raw)
  To: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Netdev

The bonding driver take write locks and spin locks that are shared
by the tx path in enslave processing and notification processing,
If the netconsole is in use, the bonding can call printk which puts
us in the netpoll tx path, if the netconsole is attached to the bonding
driver, result in deadlock.

So add protection for these place, by checking the netpoll_block_tx
state, we can defer the sending of the netconsole frames until a later
time using the retransmit feature of netpoll_send_skb that is triggered
on the return code NETDEV_TX_BUSY.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71ba18e..8676649 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1543,9 +1543,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	bond_set_carrier(bond);
 
 	if (USES_PRIMARY(bond->params.mode)) {
+		block_netpoll_tx();
 		write_lock_bh(&bond->curr_slave_lock);
 		bond_select_active_slave(bond);
 		write_unlock_bh(&bond->curr_slave_lock);
+		unblock_netpoll_tx();
 	}
 
 	pr_info("%s: enslaving %s as a%s interface with a%s link.\n",
@@ -1571,10 +1573,12 @@ err_detach:
 	if (bond->primary_slave == new_slave)
 		bond->primary_slave = NULL;
 	if (bond->curr_active_slave == new_slave) {
+		block_netpoll_tx();
 		write_lock_bh(&bond->curr_slave_lock);
 		bond_change_active_slave(bond, NULL);
 		bond_select_active_slave(bond);
 		write_unlock_bh(&bond->curr_slave_lock);
+		unblock_netpoll_tx();
 	}
 	slave_disable_netpoll(new_slave);
 
@@ -2864,9 +2868,12 @@ static int bond_slave_netdev_event(unsigned long event,
 		pr_info("%s: Primary slave changed to %s, reselecting active slave.\n",
 			bond->dev->name, bond->primary_slave ? slave_dev->name :
 							       "none");
+
+		block_netpoll_tx();
 		write_lock_bh(&bond->curr_slave_lock);
 		bond_select_active_slave(bond);
 		write_unlock_bh(&bond->curr_slave_lock);
+		unblock_netpoll_tx();
 		break;
 	case NETDEV_FEAT_CHANGE:
 		bond_compute_features(bond);
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next 02/10] net: phy: add MoCA PHY type
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

Some Ethernet MACs are connected to a MoCA PHY which will handle the
low-level job of sending Ethernet frames on the coaxial cable, these
Ethernet MACs need to know about it to be properly configured.
Add a new PHY mode "moca" and update the Device Tree parsing logic to
look for it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/of/of_net.c | 1 +
 include/linux/phy.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
index 729beba..e04f57b 100644
--- a/drivers/of/of_net.c
+++ b/drivers/of/of_net.c
@@ -32,6 +32,7 @@ static const char *phy_modes[] = {
 	[PHY_INTERFACE_MODE_SMII]	= "smii",
 	[PHY_INTERFACE_MODE_XGMII]	= "xgmii",
 	[PHY_INTERFACE_MODE_INTERNAL]	= "internal",
+	[PHY_INTERFACE_MODE_MOCA]	= "moca",
 };
 
 /**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 463434b..0680261 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -75,6 +75,7 @@ typedef enum {
 	PHY_INTERFACE_MODE_SMII,
 	PHY_INTERFACE_MODE_XGMII,
 	PHY_INTERFACE_MODE_INTERNAL,
+	PHY_INTERFACE_MODE_MOCA,
 } phy_interface_t;
 
 
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 03/10] net: phy: update port type for MoCA PHYs
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

MoCA PHYs are using coaxial (BNC-like) connectors, update the
transceiver port type when replying to ethtool.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 19c9eca..a755fa2 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -283,7 +283,10 @@ int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd)
 
 	ethtool_cmd_speed_set(cmd, phydev->speed);
 	cmd->duplex = phydev->duplex;
-	cmd->port = PORT_MII;
+	if (phydev->interface == PHY_INTERFACE_MODE_MOCA)
+		cmd->port = PORT_BNC;
+	else
+		cmd->port = PORT_MII;
 	cmd->phy_address = phydev->addr;
 	cmd->transceiver = phy_is_internal(phydev) ?
 		XCVR_INTERNAL : XCVR_EXTERNAL;
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 05/10] net: bcmgenet: add driver definitions and private structure
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

This patchs adds the bcmgenet.h header file which contains all the
hardware definitions for the GENETv1 to v4 hardware blocks as well as
the driver private structure and MIB counters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.h | 631 +++++++++++++++++++++++++
 1 file changed, 631 insertions(+)
 create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.h

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
new file mode 100644
index 0000000..28ba32f
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
@@ -0,0 +1,631 @@
+/*
+ * Copyright (c) 2014 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ *
+*/
+#ifndef __BCMGENET_H__
+#define __BCMGENET_H__
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/spinlock.h>
+#include <linux/clk.h>
+#include <linux/mii.h>
+#include <linux/if_vlan.h>
+#include <linux/phy.h>
+
+/* total number of Buffer Descriptors, same for Rx/Tx */
+#define TOTAL_DESC				256
+/* which ring is descriptor based */
+#define DESC_INDEX				16
+/* Body(1500) + EH_SIZE(14) + VLANTAG(4) + BRCMTAG(6) + FCS(4) = 1528.
+ * 1536 is multiple of 256 bytes
+ */
+#define ENET_BRCM_TAG_LEN	6
+#define ENET_PAD		8
+#define ENET_MAX_MTU_SIZE	(ETH_DATA_LEN + ETH_HLEN + VLAN_HLEN + \
+				 ENET_BRCM_TAG_LEN + ETH_FCS_LEN + ENET_PAD)
+#define DMA_MAX_BURST_LENGTH    0x10
+
+/* misc. configuration */
+#define CLEAR_ALL_HFB			0xFF
+#define DMA_FC_THRESH_HI		(TOTAL_DESC >> 4)
+#define DMA_FC_THRESH_LO		5
+
+/* PHY types */
+#define BRCM_PHY_TYPE_INT		0
+#define BRCM_PHY_TYPE_MOCA		1
+
+/* 64B status Block */
+struct status_64 {
+	u32	length_status;		/* length and peripheral status */
+	u32	ext_status;		/* Extended status*/
+	u32	rx_csum;		/* partial rx checksum */
+	u32	unused1[9];		/* unused */
+	u32	tx_csum_info;		/* Tx checksum info. */
+	u32	unused2[3];		/* unused */
+};
+
+
+/* Rx status bits */
+#define STATUS_RX_EXT_MASK		0x1FFFFF
+#define STATUS_RX_CSUM_MASK		0xFFFF
+#define STATUS_RX_CSUM_OK		0x10000
+#define STATUS_RX_CSUM_FR		0x20000
+#define STATUS_RX_PROTO_TCP		0
+#define STATUS_RX_PROTO_UDP		1
+#define STATUS_RX_PROTO_ICMP		2
+#define STATUS_RX_PROTO_OTHER		3
+#define STATUS_RX_PROTO_MASK		3
+#define STATUS_RX_PROTO_SHIFT		18
+#define STATUS_FILTER_INDEX_MASK	0xFFFF
+/* Tx status bits */
+#define STATUS_TX_CSUM_START_MASK	0X7FFF
+#define STATUS_TX_CSUM_START_SHIFT	16
+#define STATUS_TX_CSUM_PROTO_UDP	0x8000
+#define STATUS_TX_CSUM_OFFSET_MASK	0x7FFF
+#define STATUS_TX_CSUM_LV		0x80000000
+
+/* DMA Descriptor */
+#define DMA_DESC_LENGTH_STATUS	0x00	/* in bytes of data in buffer */
+#define DMA_DESC_ADDRESS_LO	0x04	/* lower bits of PA */
+#define DMA_DESC_ADDRESS_HI	0x08	/* upper 32 bits of PA, GENETv4+ */
+
+/* Rx/Tx common counter group.*/
+struct bcmgenet_pkt_counters {
+	u32	cnt_64;		/* RO Received/Transmited 64 bytes packet */
+	u32	cnt_127;	/* RO Rx/Tx 127 bytes packet */
+	u32	cnt_255;	/* RO Rx/Tx 65-255 bytes packet */
+	u32	cnt_511;	/* RO Rx/Tx 256-511 bytes packet */
+	u32	cnt_1023;	/* RO Rx/Tx 512-1023 bytes packet */
+	u32	cnt_1518;	/* RO Rx/Tx 1024-1518 bytes packet */
+	u32	cnt_mgv;	/* RO Rx/Tx 1519-1522 good VLAN packet */
+	u32	cnt_2047;	/* RO Rx/Tx 1522-2047 bytes packet*/
+	u32	cnt_4095;	/* RO Rx/Tx 2048-4095 bytes packet*/
+	u32	cnt_9216;	/* RO Rx/Tx 4096-9216 bytes packet*/
+};
+
+/* RSV, Receive Status Vector */
+struct bcmgenet_rx_counters {
+	struct  bcmgenet_pkt_counters pkt_cnt;
+	u32	pkt;		/* RO (0x428) Received pkt count*/
+	u32	bytes;		/* RO Received byte count */
+	u32	mca;		/* RO # of Received multicast pkt */
+	u32	bca;		/* RO # of Receive broadcast pkt */
+	u32	fcs;		/* RO # of Received FCS error  */
+	u32	cf;		/* RO # of Received control frame pkt*/
+	u32	pf;		/* RO # of Received pause frame pkt */
+	u32	uo;		/* RO # of unknown op code pkt */
+	u32	aln;		/* RO # of alignment error count */
+	u32	flr;		/* RO # of frame length out of range count */
+	u32	cde;		/* RO # of code error pkt */
+	u32	fcr;		/* RO # of carrier sense error pkt */
+	u32	ovr;		/* RO # of oversize pkt*/
+	u32	jbr;		/* RO # of jabber count */
+	u32	mtue;		/* RO # of MTU error pkt*/
+	u32	pok;		/* RO # of Received good pkt */
+	u32	uc;		/* RO # of unicast pkt */
+	u32	ppp;		/* RO # of PPP pkt */
+	u32	rcrc;		/* RO (0x470),# of CRC match pkt */
+};
+
+/* TSV, Transmit Status Vector */
+struct bcmgenet_tx_counters {
+	struct bcmgenet_pkt_counters pkt_cnt;
+	u32	pkts;		/* RO (0x4a8) Transmited pkt */
+	u32	mca;		/* RO # of xmited multicast pkt */
+	u32	bca;		/* RO # of xmited broadcast pkt */
+	u32	pf;		/* RO # of xmited pause frame count */
+	u32	cf;		/* RO # of xmited control frame count */
+	u32	fcs;		/* RO # of xmited FCS error count */
+	u32	ovr;		/* RO # of xmited oversize pkt */
+	u32	drf;		/* RO # of xmited deferral pkt */
+	u32	edf;		/* RO # of xmited Excessive deferral pkt*/
+	u32	scl;		/* RO # of xmited single collision pkt */
+	u32	mcl;		/* RO # of xmited multiple collision pkt*/
+	u32	lcl;		/* RO # of xmited late collision pkt */
+	u32	ecl;		/* RO # of xmited excessive collision pkt*/
+	u32	frg;		/* RO # of xmited fragments pkt*/
+	u32	ncl;		/* RO # of xmited total collision count */
+	u32	jbr;		/* RO # of xmited jabber count*/
+	u32	bytes;		/* RO # of xmited byte count */
+	u32	pok;		/* RO # of xmited good pkt */
+	u32	uc;		/* RO (0x0x4f0)# of xmited unitcast pkt */
+};
+
+struct bcmgenet_mib_counters {
+	struct bcmgenet_rx_counters rx;
+	struct bcmgenet_tx_counters tx;
+	u32	rx_runt_cnt;
+	u32	rx_runt_fcs;
+	u32	rx_runt_fcs_align;
+	u32	rx_runt_bytes;
+	u32	rbuf_ovflow_cnt;
+	u32	rbuf_err_cnt;
+	u32	mdf_err_cnt;
+};
+
+#define UMAC_HD_BKP_CTRL		0x004
+#define	 HD_FC_EN			(1 << 0)
+#define  HD_FC_BKOFF_OK			(1 << 1)
+#define  IPG_CONFIG_RX_SHIFT		2
+#define  IPG_CONFIG_RX_MASK		0x1F
+
+#define UMAC_CMD			0x008
+#define  CMD_TX_EN			(1 << 0)
+#define  CMD_RX_EN			(1 << 1)
+#define  UMAC_SPEED_10			0
+#define  UMAC_SPEED_100			1
+#define  UMAC_SPEED_1000		2
+#define  UMAC_SPEED_2500		3
+#define  CMD_SPEED_SHIFT		2
+#define  CMD_SPEED_MASK			3
+#define  CMD_PROMISC			(1 << 4)
+#define  CMD_PAD_EN			(1 << 5)
+#define  CMD_CRC_FWD			(1 << 6)
+#define  CMD_PAUSE_FWD			(1 << 7)
+#define  CMD_RX_PAUSE_IGNORE		(1 << 8)
+#define  CMD_TX_ADDR_INS		(1 << 9)
+#define  CMD_HD_EN			(1 << 10)
+#define  CMD_SW_RESET			(1 << 13)
+#define  CMD_LCL_LOOP_EN		(1 << 15)
+#define  CMD_AUTO_CONFIG		(1 << 22)
+#define  CMD_CNTL_FRM_EN		(1 << 23)
+#define  CMD_NO_LEN_CHK			(1 << 24)
+#define  CMD_RMT_LOOP_EN		(1 << 25)
+#define  CMD_PRBL_EN			(1 << 27)
+#define  CMD_TX_PAUSE_IGNORE		(1 << 28)
+#define  CMD_TX_RX_EN			(1 << 29)
+#define  CMD_RUNT_FILTER_DIS		(1 << 30)
+
+#define UMAC_MAC0			0x00C
+#define UMAC_MAC1			0x010
+#define UMAC_MAX_FRAME_LEN		0x014
+
+#define UMAC_TX_FLUSH			0x334
+
+#define UMAC_MIB_START			0x400
+
+#define UMAC_MDIO_CMD			0x614
+#define  MDIO_START_BUSY		(1 << 29)
+#define  MDIO_READ_FAIL			(1 << 28)
+#define  MDIO_RD			(2 << 26)
+#define  MDIO_WR			(1 << 26)
+#define  MDIO_PMD_SHIFT			21
+#define  MDIO_PMD_MASK			0x1F
+#define  MDIO_REG_SHIFT			16
+#define  MDIO_REG_MASK			0x1F
+
+#define UMAC_RBUF_OVFL_CNT		0x61C
+
+#define UMAC_MPD_CTRL			0x620
+#define  MPD_EN				(1 << 0)
+#define  MPD_PW_EN			(1 << 27)
+#define  MPD_MSEQ_LEN_SHIFT		16
+#define  MPD_MSEQ_LEN_MASK		0xFF
+
+#define UMAC_MPD_PW_MS			0x624
+#define UMAC_MPD_PW_LS			0x628
+#define UMAC_RBUF_ERR_CNT		0x634
+#define UMAC_MDF_ERR_CNT		0x638
+#define UMAC_MDF_CTRL			0x650
+#define UMAC_MDF_ADDR			0x654
+#define UMAC_MIB_CTRL			0x580
+#define  MIB_RESET_RX			(1 << 0)
+#define  MIB_RESET_RUNT			(1 << 1)
+#define  MIB_RESET_TX			(1 << 2)
+
+#define RBUF_CTRL			0x00
+#define  RBUF_64B_EN			(1 << 0)
+#define  RBUF_ALIGN_2B			(1 << 1)
+#define  RBUF_BAD_DIS			(1 << 2)
+
+#define RBUF_STATUS			0x0C
+#define  RBUF_STATUS_WOL		(1 << 0)
+#define  RBUF_STATUS_MPD_INTR_ACTIVE	(1 << 1)
+#define  RBUF_STATUS_ACPI_INTR_ACTIVE	(1 << 2)
+
+#define RBUF_CHK_CTRL			0x14
+#define  RBUF_RXCHK_EN			(1 << 0)
+#define  RBUF_SKIP_FCS			(1 << 4)
+
+#define RBUF_TBUF_SIZE_CTRL		0xb4
+
+#define RBUF_HFB_CTRL_V1		0x38
+#define  RBUF_HFB_FILTER_EN_SHIFT	16
+#define  RBUF_HFB_FILTER_EN_MASK	0xffff0000
+#define  RBUF_HFB_EN			(1 << 0)
+#define  RBUF_HFB_256B			(1 << 1)
+#define  RBUF_ACPI_EN			(1 << 2)
+
+#define RBUF_HFB_LEN_V1			0x3C
+#define  RBUF_FLTR_LEN_MASK		0xFF
+#define  RBUF_FLTR_LEN_SHIFT		8
+
+#define TBUF_CTRL			0x00
+#define TBUF_BP_MC			0x0C
+
+#define TBUF_CTRL_V1			0x80
+#define TBUF_BP_MC_V1			0xA0
+
+#define HFB_CTRL			0x00
+#define HFB_FLT_ENABLE_V3PLUS		0x04
+#define HFB_FLT_LEN_V2			0x04
+#define HFB_FLT_LEN_V3PLUS		0x1C
+
+/* uniMac intrl2 registers */
+#define INTRL2_CPU_STAT			0x00
+#define INTRL2_CPU_SET			0x04
+#define INTRL2_CPU_CLEAR		0x08
+#define INTRL2_CPU_MASK_STATUS		0x0C
+#define INTRL2_CPU_MASK_SET		0x10
+#define INTRL2_CPU_MASK_CLEAR		0x14
+
+/* INTRL2 instance 0 definitions */
+#define UMAC_IRQ_SCB			(1 << 0)
+#define UMAC_IRQ_EPHY			(1 << 1)
+#define UMAC_IRQ_PHY_DET_R		(1 << 2)
+#define UMAC_IRQ_PHY_DET_F		(1 << 3)
+#define UMAC_IRQ_LINK_UP		(1 << 4)
+#define UMAC_IRQ_LINK_DOWN		(1 << 5)
+#define UMAC_IRQ_UMAC			(1 << 6)
+#define UMAC_IRQ_UMAC_TSV		(1 << 7)
+#define UMAC_IRQ_TBUF_UNDERRUN		(1 << 8)
+#define UMAC_IRQ_RBUF_OVERFLOW		(1 << 9)
+#define UMAC_IRQ_HFB_SM			(1 << 10)
+#define UMAC_IRQ_HFB_MM			(1 << 11)
+#define UMAC_IRQ_MPD_R			(1 << 12)
+#define UMAC_IRQ_RXDMA_MBDONE		(1 << 13)
+#define UMAC_IRQ_RXDMA_PDONE		(1 << 14)
+#define UMAC_IRQ_RXDMA_BDONE		(1 << 15)
+#define UMAC_IRQ_TXDMA_MBDONE		(1 << 16)
+#define UMAC_IRQ_TXDMA_PDONE		(1 << 17)
+#define UMAC_IRQ_TXDMA_BDONE		(1 << 18)
+/* Only valid for GENETv3+ */
+#define UMAC_IRQ_MDIO_DONE		(1 << 23)
+#define UMAC_IRQ_MDIO_ERROR		(1 << 24)
+
+/* Register block offset */
+#define GENET_SYS_OFF			0x0000
+#define GENET_GR_BRIDGE_OFF		0x0040
+#define GENET_EXT_OFF			0x0080
+#define GENET_INTRL2_0_OFF		0x0200
+#define GENET_INTRL2_1_OFF		0x0240
+#define GENET_RBUF_OFF			0x0300
+#define GENET_UMAC_OFF			0x0800
+
+/* SYS block offsets and register definitions */
+#define SYS_REV_CTRL			0x00
+#define SYS_PORT_CTRL			0x04
+#define  PORT_MODE_INT_EPHY		0
+#define  PORT_MODE_INT_GPHY		1
+#define  PORT_MODE_EXT_EPHY		2
+#define  PORT_MODE_EXT_GPHY		3
+#define  PORT_MODE_EXT_RVMII_25		(4 | BIT(4))
+#define  PORT_MODE_EXT_RVMII_50		4
+#define  LED_ACT_SOURCE_MAC		(1 << 9)
+
+#define SYS_RBUF_FLUSH_CTRL		0x08
+#define SYS_TBUF_FLUSH_CTRL		0x0C
+#define RBUF_FLUSH_CTRL_V1		0x04
+
+/* Ext block register offsets and definitions */
+#define EXT_EXT_PWR_MGMT		0x00
+#define  EXT_PWR_DOWN_BIAS		(1 << 0)
+#define  EXT_PWR_DOWN_DLL		(1 << 1)
+#define  EXT_PWR_DOWN_PHY		(1 << 2)
+#define  EXT_PWR_DN_EN_LD		(1 << 3)
+#define  EXT_ENERGY_DET			(1 << 4)
+#define  EXT_IDDQ_FROM_PHY		(1 << 5)
+#define  EXT_PHY_RESET			(1 << 8)
+#define  EXT_ENERGY_DET_MASK		(1 << 12)
+
+#define EXT_RGMII_OOB_CTRL		0x0C
+#define  RGMII_MODE_EN			(1 << 0)
+#define  RGMII_LINK			(1 << 4)
+#define  OOB_DISABLE			(1 << 5)
+#define  ID_MODE_DIS			(1 << 16)
+
+#define EXT_GPHY_CTRL			0x1C
+#define  EXT_CFG_IDDQ_BIAS		(1 << 0)
+#define  EXT_CFG_PWR_DOWN		(1 << 1)
+#define  EXT_GPHY_RESET			(1 << 5)
+
+/* DMA rings size */
+#define DMA_RING_SIZE			(0x40)
+#define DMA_RINGS_SIZE			(DMA_RING_SIZE * (DESC_INDEX + 1))
+
+/* DMA registers common definitions */
+#define DMA_RW_POINTER_MASK		0x1FF
+#define DMA_P_INDEX_DISCARD_CNT_MASK	0xFFFF
+#define DMA_P_INDEX_DISCARD_CNT_SHIFT	16
+#define DMA_BUFFER_DONE_CNT_MASK	0xFFFF
+#define DMA_BUFFER_DONE_CNT_SHIFT	16
+#define DMA_P_INDEX_MASK		0xFFFF
+#define DMA_C_INDEX_MASK		0xFFFF
+
+/* DMA ring size register */
+#define DMA_RING_SIZE_MASK		0xFFFF
+#define DMA_RING_SIZE_SHIFT		16
+#define DMA_RING_BUFFER_SIZE_MASK	0xFFFF
+
+/* DMA interrupt threshold register */
+#define DMA_INTR_THRESHOLD_MASK		0x00FF
+
+/* DMA XON/XOFF register */
+#define DMA_XON_THREHOLD_MASK		0xFFFF
+#define DMA_XOFF_THRESHOLD_MASK		0xFFFF
+#define DMA_XOFF_THRESHOLD_SHIFT	16
+
+/* DMA flow period register */
+#define DMA_FLOW_PERIOD_MASK		0xFFFF
+#define DMA_MAX_PKT_SIZE_MASK		0xFFFF
+#define DMA_MAX_PKT_SIZE_SHIFT		16
+
+
+/* DMA control register */
+#define DMA_EN				(1 << 0)
+#define DMA_RING_BUF_EN_SHIFT		0x01
+#define DMA_RING_BUF_EN_MASK		0xFFFF
+#define DMA_TSB_SWAP_EN			(1 << 20)
+
+/* DMA status register */
+#define DMA_DISABLED			(1 << 0)
+#define DMA_DESC_RAM_INIT_BUSY		(1 << 1)
+
+/* DMA SCB burst size register */
+#define DMA_SCB_BURST_SIZE_MASK		0x1F
+
+/* DMA activity vector register */
+#define DMA_ACTIVITY_VECTOR_MASK	0x1FFFF
+
+/* DMA backpressure mask register */
+#define DMA_BACKPRESSURE_MASK		0x1FFFF
+#define DMA_PFC_ENABLE			(1 << 31)
+
+/* DMA backpressure status register */
+#define DMA_BACKPRESSURE_STATUS_MASK	0x1FFFF
+
+/* DMA override register */
+#define DMA_LITTLE_ENDIAN_MODE		(1 << 0)
+#define DMA_REGISTER_MODE		(1 << 1)
+
+/* DMA timeout register */
+#define DMA_TIMEOUT_MASK		0xFFFF
+
+/* TDMA rate limiting control register */
+#define DMA_RATE_LIMIT_EN_MASK		0xFFFF
+
+/* TDMA arbitration control register */
+#define DMA_ARBITER_MODE_MASK		0x03
+#define DMA_RING_BUF_PRIORITY_MASK	0x1F
+#define DMA_RING_BUF_PRIORITY_SHIFT	5
+#define DMA_RATE_ADJ_MASK		0xFF
+
+/* Tx/Rx Dma Descriptor common bits*/
+#define DMA_BUFLENGTH_MASK		0x0fff
+#define DMA_BUFLENGTH_SHIFT		16
+#define DMA_OWN				0x8000
+#define DMA_EOP				0x4000
+#define DMA_SOP				0x2000
+#define DMA_WRAP			0x1000
+/* Tx specific Dma descriptor bits */
+#define DMA_TX_UNDERRUN			0x0200
+#define DMA_TX_APPEND_CRC		0x0040
+#define DMA_TX_OW_CRC			0x0020
+#define DMA_TX_DO_CSUM			0x0010
+#define DMA_TX_QTAG_SHIFT		7
+
+/* Rx Specific Dma descriptor bits */
+#define DMA_RX_CHK_V3PLUS		0x8000
+#define DMA_RX_CHK_V12			0x1000
+#define DMA_RX_BRDCAST			0x0040
+#define DMA_RX_MULT			0x0020
+#define DMA_RX_LG			0x0010
+#define DMA_RX_NO			0x0008
+#define DMA_RX_RXER			0x0004
+#define DMA_RX_CRC_ERROR		0x0002
+#define DMA_RX_OV			0x0001
+#define DMA_RX_FI_MASK			0x001F
+#define DMA_RX_FI_SHIFT			0x0007
+#define DMA_DESC_ALLOC_MASK		0x00FF
+
+#define DMA_ARBITER_RR			0x00
+#define DMA_ARBITER_WRR			0x01
+#define DMA_ARBITER_SP			0x02
+
+struct enet_cb {
+	struct sk_buff      *skb;
+	void __iomem *bd_addr;
+	DEFINE_DMA_UNMAP_ADDR(dma_addr);
+	DEFINE_DMA_UNMAP_LEN(dma_len);
+};
+
+/* power management mode */
+enum bcmgenet_power_mode {
+	GENET_POWER_CABLE_SENSE = 0,
+	GENET_POWER_WOL_MAGIC,
+	GENET_POWER_WOL_ACPI,
+	GENET_POWER_PASSIVE,
+};
+
+struct bcmgenet_priv;
+
+/* We support both runtime GENET detection and compile-time
+ * to optimize code-paths for a given hardware
+ */
+enum bcmgenet_version {
+	GENET_V1 = 1,
+	GENET_V2,
+	GENET_V3,
+	GENET_V4
+};
+
+#define GENET_IS_V1(p)	(__genet_get_version(p) == GENET_V1)
+#define GENET_IS_V2(p)	(__genet_get_version(p) == GENET_V2)
+#define GENET_IS_V3(p)	(__genet_get_version(p) == GENET_V3)
+#define GENET_IS_V4(p)	(__genet_get_version(p) == GENET_V4)
+
+enum bcmgenet_version __genet_get_version(struct bcmgenet_priv *priv);
+
+
+/* Hardware flags */
+#define GENET_HAS_40BITS	(1 << 0)
+#define GENET_HAS_EXT		(1 << 1)
+#define GENET_HAS_MDIO_INTR	(1 << 2)
+
+/* BCMGENET hardware parameters, keep this structure nicely aligned
+ * since it is going to be used in hot paths
+ */
+struct bcmgenet_hw_params {
+	u8		tx_queues;
+	u8		rx_queues;
+	u8		bds_cnt;
+	u8		bp_in_en_shift;
+	u32		bp_in_mask;
+	u8		hfb_filter_cnt;
+	u8		qtag_mask;
+	u16		tbuf_offset;
+	u32		hfb_offset;
+	u32		hfb_reg_offset;
+	u32		rdma_offset;
+	u32		tdma_offset;
+	u32		words_per_bd;
+	u32		flags;
+};
+
+struct bcmgenet_tx_ring {
+	spinlock_t	lock;		/* ring lock */
+	unsigned int	index;		/* ring index */
+	unsigned int	queue;		/* queue index */
+	struct enet_cb	*cbs;		/* tx ring buffer control block*/
+	unsigned int	size;		/* size of each tx ring */
+	unsigned int	c_index;	/* last consumer index of each ring*/
+	unsigned int	free_bds;	/* # of free bds for each ring */
+	unsigned int	write_ptr;	/* Tx ring write pointer SW copy */
+	unsigned int	prod_index;	/* Tx ring producer index SW copy */
+	unsigned int	cb_ptr;		/* Tx ring initial CB ptr */
+	unsigned int	end_ptr;	/* Tx ring end CB ptr */
+	void (*int_enable)(struct bcmgenet_priv *priv,
+				struct bcmgenet_tx_ring *);
+	void (*int_disable)(struct bcmgenet_priv *priv,
+				struct bcmgenet_tx_ring *);
+};
+
+/* device context */
+struct bcmgenet_priv {
+	void __iomem *base;
+	enum bcmgenet_version version;
+	struct net_device *dev;
+	spinlock_t lock;
+	spinlock_t bh_lock;
+	u32 int0_mask;
+	u32 int1_mask;
+
+	/* NAPI for descriptor based rx */
+	struct napi_struct napi ____cacheline_aligned;
+
+	/* transmit variables */
+	void __iomem *tx_bds;
+	struct enet_cb *tx_cbs;
+	unsigned int num_tx_bds;
+
+	struct bcmgenet_tx_ring tx_rings[DESC_INDEX + 1];
+
+	/* receive variables */
+	void __iomem *rx_bds;
+	void __iomem *rx_bd_assign_ptr;
+	int rx_bd_assign_index;
+	struct enet_cb *rx_cbs;
+	unsigned int num_rx_bds;
+	unsigned int rx_buf_len;
+	unsigned int rx_read_ptr;
+	unsigned int rx_c_index;
+
+	/* other misc variables */
+	struct bcmgenet_hw_params *hw_params;
+	wait_queue_head_t wq;
+	struct phy_device *phydev;
+	struct device_node *phy_dn;
+	struct mii_bus *mii_bus;
+	int old_duplex;
+	int old_link;
+	int old_pause;
+	phy_interface_t phy_interface;
+	u32 phy_supported;
+	int irq0;
+	int irq1;
+	int phy_addr;
+	int phy_type;
+	int phy_speed;
+	int ext_phy;
+	unsigned int irq0_stat;
+	unsigned int irq1_stat;
+	unsigned int desc_64b_en;
+	unsigned int desc_rxchk_en;
+	unsigned int dma_rx_chk_bit;
+	unsigned int crc_fwd_en;
+	u32 msg_enable;
+
+	struct work_struct bcmgenet_irq_work;
+	struct clk *clk;
+	struct platform_device *pdev;
+
+	/* WOL */
+	unsigned long wol_enabled;
+	struct clk *clk_wol;
+	u32 wolopts;
+
+	struct mutex mib_mutex;
+	struct bcmgenet_mib_counters mib;
+};
+
+#define GENET_IO_MACRO(name, offset)					\
+static inline u32 bcmgenet_##name##_readl(struct bcmgenet_priv *priv,	\
+					u32 off)			\
+{									\
+	return __raw_readl(priv->base + offset + off);			\
+}									\
+static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv,	\
+					u32 val, u32 off)		\
+{									\
+	__raw_writel(val, priv->base + offset + off);			\
+}
+
+GENET_IO_MACRO(ext, GENET_EXT_OFF);
+GENET_IO_MACRO(umac, GENET_UMAC_OFF);
+GENET_IO_MACRO(sys, GENET_SYS_OFF);
+
+/* interrupt l2 registers accessors */
+GENET_IO_MACRO(intrl2_0, GENET_INTRL2_0_OFF);
+GENET_IO_MACRO(intrl2_1, GENET_INTRL2_1_OFF);
+
+/* HFB register accessors  */
+GENET_IO_MACRO(hfb, priv->hw_params->hfb_offset);
+
+/* GENET v2+ HFB control and filter len helpers */
+GENET_IO_MACRO(hfb_reg, priv->hw_params->hfb_reg_offset);
+
+/* RBUF register accessors */
+GENET_IO_MACRO(rbuf, GENET_RBUF_OFF);
+
+
+int bcmgenet_mii_init(struct net_device *dev);
+int bcmgenet_mii_config(struct net_device *dev);
+void bcmgenet_mii_exit(struct net_device *dev);
+void bcmgenet_mii_reset(struct net_device *dev);
+
+#endif /* __BCMGENET_H__ */
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 07/10] net: bcmgenet: add MDIO routines
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

This patch adds support for configuring the port multiplexer hardware
which resides in front of the GENET Ethernet MAC controller. This allows
us to support:

- internal PHYs (using drivers/net/phy/bcm7xxx.c)
- MoCA PHYs which are an entirely separate hardware block not covered
  here
- external PHYs and switches

Note that MoCA and switches are currently supported using the emulated
"fixed PHY" driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmmii.c | 483 +++++++++++++++++++++++++++
 1 file changed, 483 insertions(+)
 create mode 100644 drivers/net/ethernet/broadcom/genet/bcmmii.c

diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
new file mode 100644
index 0000000..15b3392
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -0,0 +1,483 @@
+/*
+ * Broadcom GENET MDIO routines
+ *
+ * Copyright (c) 2014 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+
+#include <linux/types.h>
+#include <linux/delay.h>
+#include <linux/wait.h>
+#include <linux/mii.h>
+#include <linux/ethtool.h>
+#include <linux/bitops.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/phy.h>
+#include <linux/phy_fixed.h>
+#include <linux/brcmphy.h>
+#include <linux/of.h>
+#include <linux/of_net.h>
+#include <linux/of_mdio.h>
+
+#include "bcmgenet.h"
+
+/* read a value from the MII */
+static int bcmgenet_mii_read(struct mii_bus *bus, int phy_id, int location)
+{
+	int ret;
+	struct net_device *dev = bus->priv;
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	u32 reg;
+
+	bcmgenet_umac_writel(priv, (MDIO_RD | (phy_id << MDIO_PMD_SHIFT) |
+			(location << MDIO_REG_SHIFT)), UMAC_MDIO_CMD);
+	/* Start MDIO transaction*/
+	reg = bcmgenet_umac_readl(priv, UMAC_MDIO_CMD);
+	reg |= MDIO_START_BUSY;
+	bcmgenet_umac_writel(priv, reg, UMAC_MDIO_CMD);
+	wait_event_timeout(priv->wq,
+			!(bcmgenet_umac_readl(priv, UMAC_MDIO_CMD)
+				& MDIO_START_BUSY),
+			HZ / 100);
+	ret = bcmgenet_umac_readl(priv, UMAC_MDIO_CMD);
+
+	if (ret & MDIO_READ_FAIL)
+		return -EIO;
+
+	return ret & 0xffff;
+}
+
+/* write a value to the MII */
+static int bcmgenet_mii_write(struct mii_bus *bus, int phy_id,
+			int location, u16 val)
+{
+	struct net_device *dev = bus->priv;
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	u32 reg;
+
+	bcmgenet_umac_writel(priv, (MDIO_WR | (phy_id << MDIO_PMD_SHIFT) |
+			(location << MDIO_REG_SHIFT) | (0xffff & val)),
+			UMAC_MDIO_CMD);
+	reg = bcmgenet_umac_readl(priv, UMAC_MDIO_CMD);
+	reg |= MDIO_START_BUSY;
+	bcmgenet_umac_writel(priv, reg, UMAC_MDIO_CMD);
+	wait_event_timeout(priv->wq,
+			!(bcmgenet_umac_readl(priv, UMAC_MDIO_CMD) &
+				MDIO_START_BUSY),
+			HZ / 100);
+
+	return 0;
+}
+
+/* setup netdev link state when PHY link status change and
+ * update UMAC and RGMII block when link up
+ */
+static void bcmgenet_mii_setup(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct phy_device *phydev = priv->phydev;
+	u32 reg, cmd_bits = 0;
+	unsigned int status_changed = 0;
+
+	if (priv->old_link != phydev->link) {
+		status_changed = 1;
+		priv->old_link = phydev->link;
+	}
+
+	if (phydev->link) {
+		/* program UMAC and RGMII block based on established link
+		 * speed, pause, and duplex.
+		 * the speed set in umac->cmd tell RGMII block which clock
+		 * 25MHz(100Mbps)/125MHz(1Gbps) to use for transmit.
+		 * receive clock is provided by PHY.
+		 */
+		reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
+		reg &= ~OOB_DISABLE;
+		reg |= RGMII_LINK;
+		bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+
+		/* speed */
+		if (phydev->speed == SPEED_1000)
+			cmd_bits = UMAC_SPEED_1000;
+		else if (phydev->speed == SPEED_100)
+			cmd_bits = UMAC_SPEED_100;
+		else
+			cmd_bits = UMAC_SPEED_10;
+		cmd_bits <<= CMD_SPEED_SHIFT;
+
+		if (priv->old_duplex != phydev->duplex) {
+			status_changed = 1;
+			priv->old_duplex = phydev->duplex;
+		}
+
+		/* duplex */
+		if (phydev->duplex != DUPLEX_FULL)
+			cmd_bits |= CMD_HD_EN;
+
+		if (priv->old_pause != phydev->pause) {
+			status_changed = 1;
+			priv->old_pause = phydev->pause;
+		}
+
+		/* pause capability */
+		if (!phydev->pause)
+			cmd_bits |= CMD_RX_PAUSE_IGNORE | CMD_TX_PAUSE_IGNORE;
+
+		reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+		reg &= ~((CMD_SPEED_MASK << CMD_SPEED_SHIFT) |
+			       CMD_HD_EN |
+			       CMD_RX_PAUSE_IGNORE | CMD_TX_PAUSE_IGNORE);
+		reg |= cmd_bits;
+		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+	}
+
+	if (status_changed)
+		phy_print_status(phydev);
+}
+
+void bcmgenet_mii_reset(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	if (priv->phydev) {
+		phy_init_hw(priv->phydev);
+		phy_start_aneg(priv->phydev);
+	}
+}
+
+static void bcmgenet_ephy_power_up(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	u32 reg = 0;
+
+	/* EXT_GPHY_CTRL is only valid for GENETv4 and onward */
+	if (!GENET_IS_V4(priv))
+		return;
+
+	reg = bcmgenet_ext_readl(priv, EXT_GPHY_CTRL);
+	reg &= ~(EXT_CFG_IDDQ_BIAS | EXT_CFG_PWR_DOWN);
+	reg |= EXT_GPHY_RESET;
+	bcmgenet_ext_writel(priv, reg, EXT_GPHY_CTRL);
+	mdelay(2);
+
+	reg &= ~EXT_GPHY_RESET;
+	bcmgenet_ext_writel(priv, reg, EXT_GPHY_CTRL);
+	udelay(20);
+}
+
+static int bcmgenet_mii_probe(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct phy_device *phydev;
+	unsigned int phy_flags;
+
+	if (priv->phydev) {
+		pr_info("PHY already attached\n");
+		return 0;
+	}
+
+	phy_flags = PHY_BRCM_100MBPS_WAR;
+
+	/* workarounds are only needed for 100Mbps PHYs */
+	if (priv->phy_speed == SPEED_1000)
+		phy_flags = 0;
+
+	/* workarounds are only needed for some 40nm chips, exclude
+	 * GENET v1
+	 */
+	if (GENET_IS_V1(priv))
+		phy_flags = 0;
+
+	if (priv->phy_dn)
+		phydev = of_phy_connect(dev, priv->phy_dn,
+					bcmgenet_mii_setup, phy_flags,
+					priv->phy_interface);
+	else
+		phydev = of_phy_connect_fixed_link(dev,
+					bcmgenet_mii_setup,
+					priv->phy_interface);
+
+	if (!phydev) {
+		pr_err("could not attach to PHY\n");
+		return -ENODEV;
+	}
+
+	phydev->supported &= priv->phy_supported;
+	/* Adjust advertised speeds based on configured speed */
+	if (priv->phy_speed == SPEED_1000)
+		phydev->advertising = PHY_GBIT_FEATURES;
+	else
+		phydev->advertising = PHY_BASIC_FEATURES;
+
+	pr_info("attached PHY at address %d [%s]\n",
+			phydev->addr, phydev->drv->name);
+
+	priv->old_link = -1;
+	priv->old_duplex = -1;
+	priv->old_pause = -1;
+	priv->phydev = phydev;
+
+	return 0;
+}
+
+static int bcmgenet_mii_alloc(struct bcmgenet_priv *priv)
+{
+	struct mii_bus *bus;
+	int ret = 0;
+
+	if (priv->mii_bus)
+		return 0;
+
+	priv->mii_bus = mdiobus_alloc();
+	if (!priv->mii_bus) {
+		pr_err("failed to allocate\n");
+		return -ENOMEM;
+	}
+
+	bus = priv->mii_bus;
+	bus->priv = priv->dev;
+	bus->name = "bcmgenet MII bus";
+	bus->parent = &priv->pdev->dev;
+	bus->read = bcmgenet_mii_read;
+	bus->write = bcmgenet_mii_write;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "%s-%d",
+			priv->pdev->name, priv->pdev->id);
+
+	bus->irq = kzalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
+	if (!bus->irq) {
+		ret = -ENOMEM;
+		goto out_mdio_free;
+	}
+
+	/* The internal PHY has its link interrupts routed to the
+	 * Ethernet MAC ISRs
+	 */
+	if (priv->phy_type == PHY_INTERFACE_MODE_INTERNAL)
+		bus->irq[priv->phy_addr] = PHY_IGNORE_INTERRUPT;
+	else
+		bus->irq[priv->phy_addr] = PHY_POLL;
+
+	return 0;
+
+out_mdio_free:
+	mdiobus_free(priv->mii_bus);
+	return ret;
+}
+
+static void bcmgenet_mii_free(struct bcmgenet_priv *priv)
+{
+	mdiobus_unregister(priv->mii_bus);
+	kfree(priv->mii_bus->irq);
+	mdiobus_free(priv->mii_bus);
+}
+
+static void bcmgenet_internal_phy_setup(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	u32 reg;
+
+	/* Power up EPHY */
+	bcmgenet_ephy_power_up(dev);
+	/* enable APD */
+	reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+	reg |= EXT_PWR_DN_EN_LD;
+	bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+	bcmgenet_mii_reset(dev);
+}
+
+static void bcmgenet_moca_phy_setup(struct bcmgenet_priv *priv)
+{
+	u32 reg;
+
+	/* Speed settings are set in bcmgenet_mii_setup() */
+	reg = bcmgenet_sys_readl(priv, SYS_PORT_CTRL);
+	reg |= LED_ACT_SOURCE_MAC;
+	bcmgenet_sys_writel(priv, reg, SYS_PORT_CTRL);
+}
+
+int bcmgenet_mii_config(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct device *kdev = &priv->pdev->dev;
+	const char *phy_name = NULL;
+	u32 id_mode_dis = 0;
+	u32 port_ctrl;
+	u32 reg;
+
+	priv->ext_phy = (priv->phy_type != PHY_INTERFACE_MODE_INTERNAL) &&
+			(priv->phy_type != PHY_INTERFACE_MODE_MOCA);
+
+	switch (priv->phy_interface) {
+	case PHY_INTERFACE_MODE_INTERNAL:
+	case PHY_INTERFACE_MODE_MOCA:
+		/* Irrespective of the actually configured PHY speed (100 or
+		 * 1000) GENETv4 only has an internal GPHY so we will just end
+		 * up masking the Gigabit features from what we support, not
+		 * switching to the EPHY
+		 */
+		if (GENET_IS_V4(priv)) {
+			priv->phy_supported = PHY_GBIT_FEATURES;
+			port_ctrl = PORT_MODE_INT_GPHY;
+		} else {
+			priv->phy_supported = PHY_BASIC_FEATURES;
+			port_ctrl = PORT_MODE_INT_EPHY;
+		}
+
+		bcmgenet_sys_writel(priv, port_ctrl, SYS_PORT_CTRL);
+
+		if (priv->phy_type == PHY_INTERFACE_MODE_INTERNAL) {
+			phy_name = "internal PHY";
+			bcmgenet_internal_phy_setup(dev);
+		} else if (priv->phy_type == PHY_INTERFACE_MODE_MOCA) {
+			phy_name = "MoCA";
+			bcmgenet_moca_phy_setup(priv);
+		}
+		break;
+
+	case PHY_INTERFACE_MODE_MII:
+		phy_name = "external MII";
+		priv->phy_supported = PHY_BASIC_FEATURES;
+		bcmgenet_sys_writel(priv,
+				PORT_MODE_EXT_EPHY, SYS_PORT_CTRL);
+		break;
+
+	case PHY_INTERFACE_MODE_REVMII:
+		phy_name = "external RvMII";
+		if (priv->phy_speed == SPEED_100) {
+			priv->phy_supported = PHY_BASIC_FEATURES;
+			port_ctrl = PORT_MODE_EXT_RVMII_25;
+		} else {
+			priv->phy_supported = PHY_GBIT_FEATURES;
+			port_ctrl = PORT_MODE_EXT_RVMII_50;
+		}
+		bcmgenet_sys_writel(priv, port_ctrl, SYS_PORT_CTRL);
+		break;
+
+	case PHY_INTERFACE_MODE_RGMII:
+		/* RGMII_NO_ID: TXC transitions at the same time as TXD
+		 *		(requires PCB or receiver-side delay)
+		 * RGMII:	Add 2ns delay on TXC (90 degree shift)
+		 *
+		 * ID is implicitly disabled for 100Mbps (RG)MII operation.
+		 */
+		id_mode_dis = BIT(16);
+		/* fall through */
+	case PHY_INTERFACE_MODE_RGMII_TXID:
+		if (id_mode_dis)
+			phy_name = "external RGMII (no delay)";
+		else
+			phy_name = "external RGMII (TX delay)";
+		reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
+		reg |= RGMII_MODE_EN | id_mode_dis;
+		bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+		bcmgenet_sys_writel(priv,
+				PORT_MODE_EXT_GPHY, SYS_PORT_CTRL);
+		priv->phy_supported = PHY_GBIT_FEATURES;
+		/* setup mii based on configure speed and RGMII txclk is set in
+		 * umac->cmd, mii_setup() after link established.
+		 */
+		break;
+	default:
+		dev_err(kdev, "unknown phy mode: %d\n", priv->phy_interface);
+		return -EINVAL;
+	}
+
+	dev_info(kdev, "configuring instance for %s\n", phy_name);
+
+	return 0;
+}
+
+static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
+{
+	struct device_node *dn = priv->pdev->dev.of_node;
+	struct device *kdev = &priv->pdev->dev;
+	struct device_node *mdio_dn;
+	const __be32 *fixed_link;
+	u32 propval;
+	int phy_mode;
+	int ret, sz;
+
+	mdio_dn = of_get_next_child(dn, NULL);
+	if (!mdio_dn) {
+		dev_err(kdev, "unable to find MDIO bus node\n");
+		return -ENODEV;
+	}
+
+	ret = of_mdiobus_register(priv->mii_bus, mdio_dn);
+	if (ret) {
+		dev_err(kdev, "failed to register MDIO bus\n");
+		return ret;
+	}
+
+	/* Check if we have an internal or external PHY */
+	priv->phy_dn = of_parse_phandle(dn, "phy-handle", 0);
+	if (priv->phy_dn) {
+		if (!of_property_read_u32(priv->phy_dn, "max-speed", &propval))
+			priv->phy_speed = propval;
+	} else {
+		/* Read the link speed from the fixed-link property */
+		fixed_link = of_get_property(dn, "fixed-link", &sz);
+		if (!fixed_link || sz < sizeof(*fixed_link)) {
+			ret = -ENODEV;
+			goto out;
+		}
+
+		priv->phy_speed = be32_to_cpu(fixed_link[2]);
+	}
+
+	/* Get the link mode */
+	phy_mode = of_get_phy_mode(dn);
+	priv->phy_interface = phy_mode;
+	priv->phy_type = phy_mode;
+
+	return 0;
+out:
+	mdiobus_unregister(priv->mii_bus);
+	return ret;
+}
+
+int bcmgenet_mii_init(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int ret;
+
+	ret = bcmgenet_mii_alloc(priv);
+	if (ret)
+		return ret;
+
+	ret = bcmgenet_mii_of_init(priv);
+	if (ret)
+		goto out;
+
+	ret = bcmgenet_mii_config(dev);
+	if (ret)
+		goto out;
+
+	ret = bcmgenet_mii_probe(dev);
+	if (ret)
+		goto out;
+
+	return 0;
+out:
+	bcmgenet_mii_free(priv);
+	return ret;
+}
+
+void bcmgenet_mii_exit(struct net_device *dev)
+{
+	bcmgenet_mii_free(netdev_priv(dev));
+}
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 08/10] net: bcmgenet: hook into the build system
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

This patch adds a new configuration symbol: CONFIG_BCMGENET which allows
us to build the Broadcom GENET driver and hook the driver files into the
build system.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/Kconfig        | 10 ++++++++++
 drivers/net/ethernet/broadcom/Makefile       |  1 +
 drivers/net/ethernet/broadcom/genet/Makefile |  2 ++
 3 files changed, 13 insertions(+)
 create mode 100644 drivers/net/ethernet/broadcom/genet/Makefile

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 3f97d9f..a489712 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -60,6 +60,16 @@ config BCM63XX_ENET
 	  This driver supports the ethernet MACs in the Broadcom 63xx
 	  MIPS chipset family (BCM63XX).
 
+config BCMGENET
+	tristate "Broadcom GENET internal MAC support"
+	select MII
+	select PHYLIB
+	select FIXED_PHY
+	select BCM7XXX_PHY
+	help
+	  This driver supports the built-in Ethernet MACs found in the
+	  Broadcom BCM7xxx Set Top Box family chipset.
+
 config BNX2
 	tristate "Broadcom NetXtremeII support"
 	depends on PCI
diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
index 68efa1a..fd639a0 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -4,6 +4,7 @@
 
 obj-$(CONFIG_B44) += b44.o
 obj-$(CONFIG_BCM63XX_ENET) += bcm63xx_enet.o
+obj-$(CONFIG_BCMGENET) += genet/
 obj-$(CONFIG_BNX2) += bnx2.o
 obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
diff --git a/drivers/net/ethernet/broadcom/genet/Makefile b/drivers/net/ethernet/broadcom/genet/Makefile
new file mode 100644
index 0000000..31f55a9
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_BCMGENET) += genet.o
+genet-objs := bcmgenet.o bcmmii.o
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 09/10] Documentation: add Device tree bindings for Broadcom GENET
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

This patch adds the Device Tree bindings for the Broadcom GENET Gigabit
Ethernet controller. A bunch of examples are provided to illustrate the
versatile aspect of the hardare.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 .../devicetree/bindings/net/broadcom-bcmgenet.txt  | 111 +++++++++++++++++++++
 1 file changed, 111 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt

diff --git a/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
new file mode 100644
index 0000000..93c58e9
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
@@ -0,0 +1,111 @@
+* Broadcom BCM7xxx Ethernet Controller (GENET)
+
+Required properties:
+- compatible: should be "brcm,genet-v1", "brcm,genet-v2", "brcm,genet-v3",
+  "brcm,genet-v4".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device
+- mdio bus node: this node should always be present regarless of the PHY
+  configuration of the GENET instance
+- phy-mode: The interface between the SoC and the PHY (a string that
+  of_get_phy_mode() can understand).
+
+MDIO bus node required properties:
+
+- compatible: should be "brcm,genet-v<N>-mdio"
+- reg: address and length relative to the parent node base register address
+- address-cells: address cell for MDIO bus addressing, should be 1
+- size-cells: size of the cells for MDIO bus addressing, should be 0
+
+Optional properties:
+- phy-handle: A phandle to a phy node defining the PHY address (as the reg
+  property, a single integer), used to describe configurations where a PHY
+  (internal or external) is used.
+
+- fixed-link: When the GENET interface is connected to a MoCA hardware block
+  or when operating in a RGMII to RGMII type of connection, or when the
+  MDIO bus is voluntarily disabled, this property should be used to describe
+  the "fixed link", the property is described as follows:
+
+  fixed-link: <a b c d e> where a is emulated phy id - choose any,
+  but unique to the all specified fixed-links, b is duplex - 0 half,
+  1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
+  pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
+
+Internal Gigabit PHY example:
+
+ethernet@f0b60000 {
+	phy-mode = "internal";
+	phy-handle = <&phy1>;
+	mac-address = [ 00 10 18 36 23 1a ];
+	compatible = "brcm,genet-v4";
+	#address-cells = <0x1>;
+	#size-cells = <0x1>;
+	device_type = "ethernet";
+	reg = <0xf0b60000 0xfc4c>;
+	interrupts = <0x0 0x14 0x0 0x0 0x15 0x0>;
+
+	mdio@b60e14 {
+		compatible = "brcm,genet-mdio-v4";
+		#address-cells = <0x1>;
+		#size-cells = <0x0>;
+		reg = <0xb60e14 0x8>;
+
+		phy1: ethernet-phy@1 {
+			device_type = "ethernet-phy";
+			max-speed = <1000>;
+			reg = <0x1>;
+			compatible = "brcm,28nm-gphy", "ethernet-phy-ieee802.3-c22";
+		};
+	};
+};
+
+MoCA interface / MAC to MAC example:
+
+ethernet@f0b80000 {
+	phy-mode = "moca";
+	fixed-link = <1 0 1000 0 0>;
+	mac-address = [ 00 10 18 36 24 1a ];
+	compatible = "brcm,genet-v4";
+	#address-cells = <0x1>;
+	#size-cells = <0x1>;
+	device_type = "ethernet";
+	reg = <0xf0b80000 0xfc4c>;
+	interrupts = <0x0 0x16 0x0 0x0 0x17 0x0>;
+
+	mdio@b80e14 {
+		compatible = "brcm,genet-mdio-v4";
+		#address-cells = <0x1>;
+		#size-cells = <0x0>;
+		reg = <0xb80e14 0x8>;
+	};
+};
+
+
+External MDIO-connected Gigabit PHY/switch:
+
+ethernet@f0ba0000 {
+	phy-mode = "rgmii";
+	phy-handle = <&phy0>;
+	mac-address = [ 00 10 18 36 26 1a ];
+	compatible = "brcm,genet-v4";
+	#address-cells = <0x1>;
+	#size-cells = <0x1>;
+	device_type = "ethernet";
+	reg = <0xf0ba0000 0xfc4c>;
+	interrupts = <0x0 0x18 0x0 0x0 0x19 0x0>;
+
+	mdio@ba0e14 {
+		compatible = "brcm,genet-mdio-v4";
+		#address-cells = <0x1>;
+		#size-cells = <0x0>;
+		reg = <0xba0e14 0x8>;
+
+		phy0: ethernet-phy@0 {
+			device_type = "ethernet-phy";
+			max-speed = <1000>;
+			reg = <0x0>;
+			compatible = "brcm,bcm53125", "ethernet-phy-ieee802.3-c22";
+		};
+	};
+};
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 04/10] net: phy: add Broadcom BCM7xxx internal PHY driver
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

This patch adds support for the Broadcom BCM7xxx Set Top Box SoCs
internal PHYs. This driver supports the following generation of SoCs:

- BCM7366, BCM7439, BCM7445 (28nm process)
- all 40nm and 65nm (older MIPS-based SoCs)

The PHYs on these SoCs require a bunch of workarounds to operate
correctly, both during configuration time and at suspend/resume time,
the driver handles that for us.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/Kconfig   |   6 +
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/bcm7xxx.c | 322 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/brcmphy.h   |   9 ++
 4 files changed, 338 insertions(+)
 create mode 100644 drivers/net/phy/bcm7xxx.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 9b5d46c..6a17f92 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -71,6 +71,12 @@ config BCM63XX_PHY
 	---help---
 	  Currently supports the 6348 and 6358 PHYs.
 
+config BCM7XXX_PHY
+	tristate "Drivers for Broadcom 7xxx SOCs internal PHYs"
+	---help---
+	  Currently supports the BCM7366, BCM7439, BCM7445, and
+	  40nm and 65nm generation of BCM7xxx Set Top Box SoCs.
+
 config BCM87XX_PHY
 	tristate "Driver for Broadcom BCM8706 and BCM8727 PHYs"
 	help
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 9013dfa..07d2402 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_SMSC_PHY)		+= smsc.o
 obj-$(CONFIG_VITESSE_PHY)	+= vitesse.o
 obj-$(CONFIG_BROADCOM_PHY)	+= broadcom.o
 obj-$(CONFIG_BCM63XX_PHY)	+= bcm63xx.o
+obj-$(CONFIG_BCM7XXX_PHY)	+= bcm7xxx.o
 obj-$(CONFIG_BCM87XX_PHY)	+= bcm87xx.o
 obj-$(CONFIG_ICPLUS_PHY)	+= icplus.o
 obj-$(CONFIG_REALTEK_PHY)	+= realtek.o
diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
new file mode 100644
index 0000000..f9ac282
--- /dev/null
+++ b/drivers/net/phy/bcm7xxx.c
@@ -0,0 +1,322 @@
+/*
+ * Broadcom BCM7xxx internal transceivers support.
+ *
+ * Copyright (C) 2014, Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/delay.h>
+#include <linux/brcmphy.h>
+
+static int bcm7445_config_init(struct phy_device *phydev)
+{
+	int ret;
+	const struct bcm7445_regs {
+		int reg;
+		u16 value;
+	} bcm7445_regs_cfg[] = {
+		/* increases ADC latency by 24ns */
+		{ 0x17, 0x0038 },
+		{ 0x15, 0xAB95 },
+		/* increases internal 1V LDO voltage by 5% */
+		{ 0x17, 0x2038 },
+		{ 0x15, 0xBB22 },
+		/* reduce RX low pass filter corner frequency */
+		{ 0x17, 0x6038 },
+		{ 0x15, 0xFFC5 },
+		/* reduce RX high pass filter corner frequency */
+		{ 0x17, 0x003a },
+		{ 0x15, 0x2002 },
+	};
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(bcm7445_regs_cfg); i++) {
+		ret = phy_write(phydev,
+				bcm7445_regs_cfg[i].reg,
+				bcm7445_regs_cfg[i].value);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static void phy_write_exp(struct phy_device *phydev,
+					u16 reg, u16 value)
+{
+	phy_write(phydev, 0x17, 0xf00 | reg);
+	phy_write(phydev, 0x15, value);
+}
+
+static void phy_write_misc(struct phy_device *phydev,
+					u16 reg, u16 chl, u16 value)
+{
+	int tmp;
+
+	phy_write(phydev, 0x18, 0x7);
+
+	tmp = phy_read(phydev, 0x18);
+	tmp |= 0x800;
+	phy_write(phydev, 0x18, tmp);
+
+	tmp = (chl * 0x2000) | reg;
+	phy_write(phydev, 0x17, tmp);
+
+	phy_write(phydev, 0x15, value);
+}
+
+static int bcm7xxx_28nm_afe_config_init(struct phy_device *phydev)
+{
+	/* write AFE_RXCONFIG_0 */
+	phy_write_misc(phydev, 0x38, 0x0000, 0xeb17);
+
+	/* write AFE_RXCONFIG_1 */
+	phy_write_misc(phydev, 0x38, 0x0001, 0x9a3f);
+
+	/* write AFE_RX_LP_COUNTER */
+	phy_write_misc(phydev, 0x38, 0x0003, 0x7fc7);
+
+	/* write AFE_HPF_TRIM_OTHERS */
+	phy_write_misc(phydev, 0x3A, 0x0000, 0x000b);
+
+	/* write AFTE_TX_CONFIG */
+	phy_write_misc(phydev, 0x39, 0x0000, 0x0800);
+
+	/* Increase VCO range to prevent unlocking problem of PLL at low
+	 * temp
+	 */
+	phy_write_misc(phydev, 0x0032, 0x0001, 0x0048);
+
+	/* Change Ki to 011 */
+	phy_write_misc(phydev, 0x0032, 0x0002, 0x021b);
+
+	/* Disable loading of TVCO buffer to bandgap, set bandgap trim
+	 * to 111
+	 */
+	phy_write_misc(phydev, 0x0033, 0x0000, 0x0e20);
+
+	/* Adjust bias current trim by -3 */
+	phy_write_misc(phydev, 0x000a, 0x0000, 0x690b);
+
+	/* Switch to CORE_BASE1E */
+	phy_write(phydev, 0x1e, 0xd);
+
+	/* Reset R_CAL/RC_CAL Engine */
+	phy_write_exp(phydev, 0x00b0, 0x0010);
+
+	/* Disable Reset R_CAL/RC_CAL Engine */
+	phy_write_exp(phydev, 0x00b0, 0x0000);
+
+	return 0;
+}
+
+static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
+{
+	int ret;
+
+	ret = bcm7445_config_init(phydev);
+	if (ret)
+		return ret;
+
+	return bcm7xxx_28nm_afe_config_init(phydev);
+}
+
+static int phy_set_clr_bits(struct phy_device *dev, int location,
+					int set_mask, int clr_mask)
+{
+	int v, ret;
+
+	v = phy_read(dev, location);
+	if (v < 0)
+		return v;
+
+	v &= ~clr_mask;
+	v |= set_mask;
+
+	ret = phy_write(dev, location, v);
+	if (ret < 0)
+		return ret;
+
+	return v;
+}
+
+static int bcm7xxx_config_init(struct phy_device *phydev)
+{
+	/* Enable 64 clock MDIO */
+	phy_write(phydev, 0x1d, 0x1000);
+	phy_read(phydev, 0x1d);
+
+	/* Workaround only required for 100Mbits/sec */
+	if (!(phydev->dev_flags & PHY_BRCM_100MBPS_WAR))
+		return 0;
+
+	/* set shadow mode 2 */
+	phy_set_clr_bits(phydev, 0x1f, 0x0004, 0x0004);
+
+	/* set iddq_clkbias */
+	phy_write(phydev, 0x14, 0x0F00);
+	udelay(10);
+
+	/* reset iddq_clkbias */
+	phy_write(phydev, 0x14, 0x0C00);
+
+	phy_write(phydev, 0x13, 0x7555);
+
+	/* reset shadow mode 2 */
+	phy_set_clr_bits(phydev, 0x1f, 0x0004, 0);
+
+	return 0;
+}
+
+/* Workaround for putting the PHY in IDDQ mode, required
+ * for all BCM7XXX PHYs
+ */
+static int bcm7xxx_suspend(struct phy_device *phydev)
+{
+	int ret;
+	const struct bcm7xxx_regs {
+		int reg;
+		u16 value;
+	} bcm7xxx_suspend_cfg[] = {
+		{ 0x1f, 0x008b },
+		{ 0x10, 0x01c0 },
+		{ 0x14, 0x7000 },
+		{ 0x1f, 0x000f },
+		{ 0x10, 0x20d0 },
+		{ 0x1f, 0x000b },
+	};
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(bcm7xxx_suspend_cfg); i++) {
+		ret = phy_write(phydev,
+				bcm7xxx_suspend_cfg[i].reg,
+				bcm7xxx_suspend_cfg[i].value);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int bcm7xxx_dummy_config_init(struct phy_device *phydev)
+{
+	return 0;
+}
+
+static struct phy_driver bcm7xxx_driver[] = {
+{
+	.phy_id		= PHY_ID_BCM7366,
+	.phy_id_mask	= 0xfffffff0,
+	.name		= "Broadcom BCM7366",
+	.features	= PHY_GBIT_FEATURES |
+			  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+	.flags		= PHY_IS_INTERNAL,
+	.config_init	= bcm7xxx_28nm_afe_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= bcm7xxx_suspend,
+	.resume		= bcm7xxx_28nm_afe_config_init,
+	.driver		= { .owner = THIS_MODULE },
+}, {
+	.phy_id		= PHY_ID_BCM7439,
+	.phy_id_mask	= 0xfffffff0,
+	.name		= "Broadcom BCM7439",
+	.features	= PHY_GBIT_FEATURES |
+			  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+	.flags		= PHY_IS_INTERNAL,
+	.config_init	= bcm7xxx_28nm_afe_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= bcm7xxx_suspend,
+	.resume		= bcm7xxx_28nm_afe_config_init,
+	.driver		= { .owner = THIS_MODULE },
+}, {
+	.phy_id		= PHY_ID_BCM7445,
+	.phy_id_mask	= 0xfffffff0,
+	.name		= "Broadcom BCM7445",
+	.features	= PHY_GBIT_FEATURES |
+			  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+	.flags		= PHY_IS_INTERNAL,
+	.config_init	= bcm7xxx_28nm_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= bcm7xxx_suspend,
+	.resume		= bcm7xxx_28nm_config_init,
+	.driver		= { .owner = THIS_MODULE },
+}, {
+	.name		= "Broadcom BCM7XXX 28nm",
+	.phy_id		= PHY_ID_BCM7XXX_28,
+	.phy_id_mask	= PHY_BCM_OUI_MASK,
+	.features	= PHY_GBIT_FEATURES |
+			  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+	.flags		= PHY_IS_INTERNAL,
+	.config_init	= bcm7xxx_28nm_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= bcm7xxx_suspend,
+	.resume		= bcm7xxx_28nm_config_init,
+	.driver		= { .owner = THIS_MODULE },
+}, {
+	.phy_id		= PHY_BCM_OUI_4,
+	.phy_id_mask	= 0xffff0000,
+	.name		= "Broadcom BCM7XXX 40nm",
+	.features	= PHY_GBIT_FEATURES |
+			  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+	.flags		= PHY_IS_INTERNAL,
+	.config_init	= bcm7xxx_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= bcm7xxx_suspend,
+	.resume		= bcm7xxx_config_init,
+	.driver		= { .owner = THIS_MODULE },
+}, {
+	.phy_id		= PHY_BCM_OUI_5,
+	.phy_id_mask	= 0xffffff00,
+	.name		= "Broadcom BCM7XXX 65nm",
+	.features	= PHY_BASIC_FEATURES |
+			  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+	.flags		= PHY_IS_INTERNAL,
+	.config_init	= bcm7xxx_dummy_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.suspend	= bcm7xxx_suspend,
+	.resume		= bcm7xxx_config_init,
+	.driver		= { .owner = THIS_MODULE },
+} };
+
+static struct mdio_device_id __maybe_unused bcm7xxx_tbl[] = {
+	{ PHY_ID_BCM7366, 0xfffffff0, },
+	{ PHY_ID_BCM7439, 0xfffffff0, },
+	{ PHY_ID_BCM7445, 0xfffffff0, },
+	{ PHY_ID_BCM7XXX_28, 0xfffffc00 },
+	{ PHY_BCM_OUI_4, 0xffff0000 },
+	{ PHY_BCM_OUI_5, 0xffffff00 },
+	{ }
+};
+
+static int __init bcm7xxx_phy_init(void)
+{
+	return phy_drivers_register(bcm7xxx_driver,
+			ARRAY_SIZE(bcm7xxx_driver));
+}
+
+static void __exit bcm7xxx_phy_exit(void)
+{
+	phy_drivers_unregister(bcm7xxx_driver,
+			ARRAY_SIZE(bcm7xxx_driver));
+}
+
+module_init(bcm7xxx_phy_init);
+module_exit(bcm7xxx_phy_exit);
+
+MODULE_DEVICE_TABLE(mdio, bcm7xxx_tbl);
+
+MODULE_DESCRIPTION("Broadcom BCM7xxx internal PHY driver");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Broadcom Corporation");
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 677b4f0..e9fc98d 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -13,10 +13,17 @@
 #define PHY_ID_BCM5461			0x002060c0
 #define PHY_ID_BCM57780			0x03625d90
 
+#define PHY_ID_BCM7366			0x600d8490
+#define PHY_ID_BCM7439			0x600d8480
+#define PHY_ID_BCM7445			0x600d8510
+#define PHY_ID_BCM7XXX_28		0x600d8400
+
 #define PHY_BCM_OUI_MASK		0xfffffc00
 #define PHY_BCM_OUI_1			0x00206000
 #define PHY_BCM_OUI_2			0x0143bc00
 #define PHY_BCM_OUI_3			0x03625c00
+#define PHY_BCM_OUI_4			0x600d0000
+#define PHY_BCM_OUI_5			0x03625e00
 
 
 #define PHY_BCM_FLAGS_MODE_COPPER	0x00000001
@@ -31,6 +38,8 @@
 #define PHY_BRCM_EXT_IBND_TX_ENABLE	0x00002000
 #define PHY_BRCM_CLEAR_RGMII_MODE	0x00004000
 #define PHY_BRCM_DIS_TXCRXC_NOENRGY	0x00008000
+/* Broadcom BCM7xxx specific workarounds */
+#define PHY_BRCM_100MBPS_WAR		0x00010000
 #define PHY_BCM_FLAGS_VALID		0x80000000
 
 #endif /* _LINUX_BRCMPHY_H */
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 10/10] MAINTAINERS: add entry for the Broadcom GENET driver
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

Add myself as a maintainer of the Broadcom GENET driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 091b50e..5a7b3ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1845,6 +1845,12 @@ L:	netdev@vger.kernel.org
 S:	Supported
 F:	drivers/net/ethernet/broadcom/b44.*
 
+BROADCOM GENET ETHERNET DRIVER
+M:	Florian Fainelli <f.fainelli@gmail.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/ethernet/broadcom/genet/
+
 BROADCOM BNX2 GIGABIT ETHERNET DRIVER
 M:	Michael Chan <mchan@broadcom.com>
 L:	netdev@vger.kernel.org
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 06/10] net: bcmgenet: add main driver file
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

This patch adds the BCMGENET main driver file which supports the
following:

- GENET hardware from V1 to V4
- support for reading the UniMAC MIB counters statistics
- support for the 5 transmit queues
- support for RX/TX checksum offload and SG

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2685 ++++++++++++++++++++++++
 1 file changed, 2685 insertions(+)
 create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.c

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
new file mode 100644
index 0000000..de21261
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -0,0 +1,2685 @@
+/*
+ * Broadcom GENET (Gigabit Ethernet) controller driver
+ *
+ * Copyright (c) 2014 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#define pr_fmt(fmt)				"bcmgenet: " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/fcntl.h>
+#include <linux/interrupt.h>
+#include <linux/string.h>
+#include <linux/if_ether.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/delay.h>
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <linux/pm.h>
+#include <linux/clk.h>
+#include <linux/version.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_net.h>
+#include <linux/of_platform.h>
+#include <net/arp.h>
+
+#include <linux/mii.h>
+#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/phy.h>
+
+#include <asm/unaligned.h>
+
+#include "bcmgenet.h"
+
+/* Maximum number of hardware queues, downsized if needed */
+#define GENET_MAX_MQ_CNT	4
+
+/* Default highest priority queue for multi queue support */
+#define GENET_Q0_PRIORITY	0
+
+#define GENET_DEFAULT_BD_CNT	\
+	(TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->bds_cnt)
+
+#define RX_BUF_LENGTH		2048
+#define SKB_ALIGNMENT		32
+
+/* Tx/Rx DMA register offset, skip 256 descriptors */
+#define WORDS_PER_BD(p)		(p->hw_params->words_per_bd)
+#define DMA_DESC_SIZE		(WORDS_PER_BD(priv) * sizeof(u32))
+
+#define GENET_TDMA_REG_OFF	(priv->hw_params->tdma_offset + \
+				TOTAL_DESC * DMA_DESC_SIZE)
+
+#define GENET_RDMA_REG_OFF	(priv->hw_params->rdma_offset + \
+				TOTAL_DESC * DMA_DESC_SIZE)
+
+static inline void dmadesc_set_length_status(struct bcmgenet_priv *priv,
+						void __iomem *d, u32 value)
+{
+	__raw_writel(value, d + DMA_DESC_LENGTH_STATUS);
+}
+
+static inline u32 dmadesc_get_length_status(struct bcmgenet_priv *priv,
+						void __iomem *d)
+{
+	return __raw_readl(d + DMA_DESC_LENGTH_STATUS);
+}
+
+static inline void dmadesc_set_addr(struct bcmgenet_priv *priv,
+				    void __iomem *d,
+				    dma_addr_t addr)
+{
+	__raw_writel(lower_32_bits(addr), d + DMA_DESC_ADDRESS_LO);
+
+	/* Register writes to GISB bus can take couple hundred nanoseconds
+	 * and are done for each packet, save these expensive writes unless
+	 * the platform is explicitely configured for 64-bits/LPAE.
+	 */
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+	if (priv->hw_params->flags & GENET_HAS_40BITS)
+		__raw_writel(upper_32_bits(addr), d + DMA_DESC_ADDRESS_HI);
+#endif
+}
+
+/* Combined address + length/status setter */
+static inline void dmadesc_set(struct bcmgenet_priv *priv,
+				void __iomem *d, dma_addr_t addr, u32 val)
+{
+	dmadesc_set_length_status(priv, d, val);
+	dmadesc_set_addr(priv, d, addr);
+}
+
+static inline dma_addr_t dmadesc_get_addr(struct bcmgenet_priv *priv,
+					  void __iomem *d)
+{
+	dma_addr_t addr;
+
+	addr = __raw_readl(d + DMA_DESC_ADDRESS_LO);
+
+	/* Register writes to GISB bus can take couple hundred nanoseconds
+	 * and are done for each packet, save these expensive writes unless
+	 * the platform is explicitely configured for 64-bits/LPAE.
+	 */
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+	if (priv->hw_params->flags & GENET_HAS_40BITS)
+		addr |= (u64)__raw_readl(d + DMA_DESC_ADDRESS_HI) << 32;
+#endif
+	return addr;
+}
+
+#define GENET_VER_FMT	"%1d.%1d EPHY: 0x%04x"
+
+#define GENET_MSG_DEFAULT	(NETIF_MSG_DRV | NETIF_MSG_PROBE | \
+				NETIF_MSG_LINK)
+
+static int debug = -1;
+module_param(debug, int, 0);
+MODULE_PARM_DESC(debug, "GENET debug level");
+
+enum bcmgenet_version __genet_get_version(struct bcmgenet_priv *priv)
+{
+	return priv->version;
+}
+
+
+static inline u32 bcmgenet_rbuf_ctrl_get(struct bcmgenet_priv *priv)
+{
+	if (GENET_IS_V1(priv))
+		return bcmgenet_rbuf_readl(priv, RBUF_FLUSH_CTRL_V1);
+	else
+		return bcmgenet_sys_readl(priv, SYS_RBUF_FLUSH_CTRL);
+}
+
+static inline void bcmgenet_rbuf_ctrl_set(struct bcmgenet_priv *priv, u32 val)
+{
+	if (GENET_IS_V1(priv))
+		bcmgenet_rbuf_writel(priv, val, RBUF_FLUSH_CTRL_V1);
+	else
+		bcmgenet_sys_writel(priv, val, SYS_RBUF_FLUSH_CTRL);
+}
+
+/* These macros are defined to deal with register map change
+ * between GENET1.1 and GENET2. Only those currently being used
+ * by driver are defined.
+ */
+static inline u32 bcmgenet_tbuf_ctrl_get(struct bcmgenet_priv *priv)
+{
+	if (GENET_IS_V1(priv))
+		return bcmgenet_rbuf_readl(priv, TBUF_CTRL_V1);
+	else
+		return __raw_readl(priv->base +
+				priv->hw_params->tbuf_offset + TBUF_CTRL);
+}
+
+static inline void bcmgenet_tbuf_ctrl_set(struct bcmgenet_priv *priv, u32 val)
+{
+	if (GENET_IS_V1(priv))
+		bcmgenet_rbuf_writel(priv, val, TBUF_CTRL_V1);
+	else
+		__raw_writel(val, priv->base +
+				priv->hw_params->tbuf_offset + TBUF_CTRL);
+}
+
+static inline u32 bcmgenet_bp_mc_get(struct bcmgenet_priv *priv)
+{
+	if (GENET_IS_V1(priv))
+		return bcmgenet_rbuf_readl(priv, TBUF_BP_MC_V1);
+	else
+		return __raw_readl(priv->base +
+				priv->hw_params->tbuf_offset + TBUF_BP_MC);
+}
+
+static inline void bcmgenet_bp_mc_set(struct bcmgenet_priv *priv, u32 val)
+{
+	if (GENET_IS_V1(priv))
+		bcmgenet_rbuf_writel(priv, val, TBUF_BP_MC_V1);
+	else
+		__raw_writel(val, priv->base +
+				priv->hw_params->tbuf_offset + TBUF_BP_MC);
+}
+
+/* RX/TX DMA register accessors */
+enum dma_reg {
+	DMA_RING_CFG = 0,
+	DMA_CTRL,
+	DMA_STATUS,
+	DMA_SCB_BURST_SIZE,
+	DMA_ARB_CTRL,
+	DMA_PRIORITY,
+	DMA_RING_PRIORITY,
+};
+
+static const u8 bcmgenet_dma_regs_v3plus[] = {
+	[DMA_RING_CFG]		= 0x00,
+	[DMA_CTRL]		= 0x04,
+	[DMA_STATUS]		= 0x08,
+	[DMA_SCB_BURST_SIZE]	= 0x0C,
+	[DMA_ARB_CTRL]		= 0x2C,
+	[DMA_PRIORITY]		= 0x30,
+	[DMA_RING_PRIORITY]	= 0x38,
+};
+
+static const u8 bcmgenet_dma_regs_v2[] = {
+	[DMA_RING_CFG]		= 0x00,
+	[DMA_CTRL]		= 0x04,
+	[DMA_STATUS]		= 0x08,
+	[DMA_SCB_BURST_SIZE]	= 0x0C,
+	[DMA_ARB_CTRL]		= 0x30,
+	[DMA_PRIORITY]		= 0x34,
+	[DMA_RING_PRIORITY]	= 0x3C,
+};
+
+static const u8 bcmgenet_dma_regs_v1[] = {
+	[DMA_CTRL]		= 0x00,
+	[DMA_STATUS]		= 0x04,
+	[DMA_SCB_BURST_SIZE]	= 0x0C,
+	[DMA_ARB_CTRL]		= 0x30,
+	[DMA_PRIORITY]		= 0x34,
+	[DMA_RING_PRIORITY]	= 0x3C,
+};
+
+/* Set at runtime once bcmgenet version is known */
+static const u8 *bcmgenet_dma_regs;
+
+static inline struct bcmgenet_priv *dev_to_priv(struct device *dev)
+{
+	return netdev_priv(dev_get_drvdata(dev));
+}
+
+static inline u32 bcmgenet_tdma_readl(struct bcmgenet_priv *priv,
+					enum dma_reg r)
+{
+	return __raw_readl(priv->base + GENET_TDMA_REG_OFF +
+			DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+static inline void bcmgenet_tdma_writel(struct bcmgenet_priv *priv,
+					u32 val, enum dma_reg r)
+{
+	__raw_writel(val, priv->base + GENET_TDMA_REG_OFF +
+			DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+static inline u32 bcmgenet_rdma_readl(struct bcmgenet_priv *priv,
+					enum dma_reg r)
+{
+	return __raw_readl(priv->base + GENET_RDMA_REG_OFF +
+			DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+static inline void bcmgenet_rdma_writel(struct bcmgenet_priv *priv,
+					u32 val, enum dma_reg r)
+{
+	__raw_writel(val, priv->base + GENET_RDMA_REG_OFF +
+			DMA_RINGS_SIZE + bcmgenet_dma_regs[r]);
+}
+
+/* RDMA/TDMA ring registers and accessors
+ * we merge the common fields and just prefix with T/D the registers
+ * having different meaning depending on the direction
+ */
+enum dma_ring_reg {
+	TDMA_READ_PTR = 0,
+	RDMA_WRITE_PTR = TDMA_READ_PTR,
+	TDMA_READ_PTR_HI,
+	RDMA_WRITE_PTR_HI = TDMA_READ_PTR_HI,
+	TDMA_CONS_INDEX,
+	RDMA_PROD_INDEX = TDMA_CONS_INDEX,
+	TDMA_PROD_INDEX,
+	RDMA_CONS_INDEX = TDMA_PROD_INDEX,
+	DMA_RING_BUF_SIZE,
+	DMA_START_ADDR,
+	DMA_START_ADDR_HI,
+	DMA_END_ADDR,
+	DMA_END_ADDR_HI,
+	DMA_MBUF_DONE_THRESH,
+	TDMA_FLOW_PERIOD,
+	RDMA_XON_XOFF_THRESH = TDMA_FLOW_PERIOD,
+	TDMA_WRITE_PTR,
+	RDMA_READ_PTR = TDMA_WRITE_PTR,
+	TDMA_WRITE_PTR_HI,
+	RDMA_READ_PTR_HI = TDMA_WRITE_PTR_HI
+};
+
+/* GENET v4 supports 40-bits pointer addressing
+ * for obvious reasons the LO and HI word parts
+ * are contiguous, but this offsets the other
+ * registers.
+ */
+static const u8 genet_dma_ring_regs_v4[] = {
+	[TDMA_READ_PTR]			= 0x00,
+	[TDMA_READ_PTR_HI]		= 0x04,
+	[TDMA_CONS_INDEX]		= 0x08,
+	[TDMA_PROD_INDEX]		= 0x0C,
+	[DMA_RING_BUF_SIZE]		= 0x10,
+	[DMA_START_ADDR]		= 0x14,
+	[DMA_START_ADDR_HI]		= 0x18,
+	[DMA_END_ADDR]			= 0x1C,
+	[DMA_END_ADDR_HI]		= 0x20,
+	[DMA_MBUF_DONE_THRESH]		= 0x24,
+	[TDMA_FLOW_PERIOD]		= 0x28,
+	[TDMA_WRITE_PTR]		= 0x2C,
+	[TDMA_WRITE_PTR_HI]		= 0x30,
+};
+
+static const u8 genet_dma_ring_regs_v123[] = {
+	[TDMA_READ_PTR]			= 0x00,
+	[TDMA_CONS_INDEX]		= 0x04,
+	[TDMA_PROD_INDEX]		= 0x08,
+	[DMA_RING_BUF_SIZE]		= 0x0C,
+	[DMA_START_ADDR]		= 0x10,
+	[DMA_END_ADDR]			= 0x14,
+	[DMA_MBUF_DONE_THRESH]		= 0x18,
+	[TDMA_FLOW_PERIOD]		= 0x1C,
+	[TDMA_WRITE_PTR]		= 0x20,
+};
+
+/* Set at runtime once GENET version is known */
+static const u8 *genet_dma_ring_regs;
+
+static inline u32 bcmgenet_tdma_ring_readl(struct bcmgenet_priv *priv,
+						unsigned int ring,
+						enum dma_ring_reg r)
+{
+	return __raw_readl(priv->base + GENET_TDMA_REG_OFF +
+			(DMA_RING_SIZE * ring) +
+			genet_dma_ring_regs[r]);
+}
+
+static inline void bcmgenet_tdma_ring_writel(struct bcmgenet_priv *priv,
+						unsigned int ring,
+						u32 val,
+						enum dma_ring_reg r)
+{
+	__raw_writel(val, priv->base + GENET_TDMA_REG_OFF +
+			(DMA_RING_SIZE * ring) +
+			genet_dma_ring_regs[r]);
+}
+
+static inline u32 bcmgenet_rdma_ring_readl(struct bcmgenet_priv *priv,
+						unsigned int ring,
+						enum dma_ring_reg r)
+{
+	return __raw_readl(priv->base + GENET_RDMA_REG_OFF +
+			(DMA_RING_SIZE * ring) +
+			genet_dma_ring_regs[r]);
+}
+
+static inline void bcmgenet_rdma_ring_writel(struct bcmgenet_priv *priv,
+						unsigned int ring,
+						u32 val,
+						enum dma_ring_reg r)
+{
+	__raw_writel(val, priv->base + GENET_RDMA_REG_OFF +
+			(DMA_RING_SIZE * ring) +
+			genet_dma_ring_regs[r]);
+}
+
+static int bcmgenet_get_settings(struct net_device *dev,
+		struct ethtool_cmd *cmd)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	if (!netif_running(dev))
+		return -EINVAL;
+
+	if (!priv->phydev)
+		return -ENODEV;
+
+	return phy_ethtool_gset(priv->phydev, cmd);
+}
+
+static int bcmgenet_set_settings(struct net_device *dev,
+		struct ethtool_cmd *cmd)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	if (!netif_running(dev))
+		return -EINVAL;
+
+	if (!priv->phydev)
+		return -ENODEV;
+
+	return phy_ethtool_sset(priv->phydev, cmd);
+}
+
+static int bcmgenet_set_rx_csum(struct net_device *dev,
+				netdev_features_t wanted)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	u32 rbuf_chk_ctrl;
+	int rx_csum_en;
+
+	rx_csum_en = !!(wanted & NETIF_F_RXCSUM);
+
+	spin_lock_bh(&priv->bh_lock);
+	rbuf_chk_ctrl = bcmgenet_rbuf_readl(priv, RBUF_CHK_CTRL);
+
+	/* enable rx checksumming */
+	if (!rx_csum_en)
+		rbuf_chk_ctrl &= ~RBUF_RXCHK_EN;
+	else
+		rbuf_chk_ctrl |= RBUF_RXCHK_EN;
+	priv->desc_rxchk_en = rx_csum_en;
+	bcmgenet_rbuf_writel(priv, rbuf_chk_ctrl, RBUF_CHK_CTRL);
+
+	spin_unlock_bh(&priv->bh_lock);
+
+	return 0;
+}
+static int bcmgenet_set_tx_csum(struct net_device *dev,
+				netdev_features_t wanted)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int desc_64b_en;
+	u32 tbuf_ctrl, rbuf_ctrl;
+
+	spin_lock_bh(&priv->bh_lock);
+	tbuf_ctrl = bcmgenet_tbuf_ctrl_get(priv);
+	rbuf_ctrl = bcmgenet_rbuf_readl(priv, RBUF_CTRL);
+
+	desc_64b_en = !!(wanted & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM));
+
+	/* enable 64bytes descriptor in both directions (RBUF and TBUF) */
+	if (!desc_64b_en) {
+		tbuf_ctrl &= ~RBUF_64B_EN;
+		rbuf_ctrl &= ~RBUF_64B_EN;
+	} else {
+		tbuf_ctrl |= RBUF_64B_EN;
+		rbuf_ctrl |= RBUF_64B_EN;
+	}
+	priv->desc_64b_en = desc_64b_en;
+
+	bcmgenet_tbuf_ctrl_set(priv, tbuf_ctrl);
+	bcmgenet_rbuf_writel(priv, rbuf_ctrl, RBUF_CTRL);
+	spin_unlock_bh(&priv->bh_lock);
+	return 0;
+}
+
+static int bcmgenet_set_features(struct net_device *dev,
+		netdev_features_t features)
+{
+	netdev_features_t changed = features ^ dev->features;
+	netdev_features_t wanted = dev->wanted_features;
+	int ret = 0;
+
+	if (changed & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))
+		ret = bcmgenet_set_tx_csum(dev, wanted);
+	if (changed & (NETIF_F_RXCSUM))
+		ret = bcmgenet_set_rx_csum(dev, wanted);
+
+	return ret;
+}
+
+static u32 bcmgenet_get_msglevel(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	return priv->msg_enable;
+}
+
+static void bcmgenet_set_msglevel(struct net_device *dev, u32 level)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	priv->msg_enable = level;
+}
+
+/* standard ethtool support functions. */
+enum bcmgenet_stat_type {
+	BCMGENET_STAT_NETDEV = -1,
+	BCMGENET_STAT_MIB_RX,
+	BCMGENET_STAT_MIB_TX,
+	BCMGENET_STAT_RUNT,
+	BCMGENET_STAT_MISC,
+};
+
+struct bcmgenet_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int stat_sizeof;
+	int stat_offset;
+	enum bcmgenet_stat_type type;
+	/* reg offset from UMAC base for misc counters */
+	u16 reg_offset;
+};
+
+#define STAT_NETDEV(m) { \
+	.stat_string = __stringify(m), \
+	.stat_sizeof = sizeof(((struct net_device_stats *)0)->m), \
+	.stat_offset = offsetof(struct net_device_stats, m), \
+	.type = BCMGENET_STAT_NETDEV, \
+}
+
+#define STAT_GENET_MIB(str, m, _type) { \
+	.stat_string = str, \
+	.stat_sizeof = sizeof(((struct bcmgenet_priv *)0)->m), \
+	.stat_offset = offsetof(struct bcmgenet_priv, m), \
+	.type = _type, \
+}
+
+#define STAT_GENET_MIB_RX(str, m) STAT_GENET_MIB(str, m, BCMGENET_STAT_MIB_RX)
+#define STAT_GENET_MIB_TX(str, m) STAT_GENET_MIB(str, m, BCMGENET_STAT_MIB_TX)
+#define STAT_GENET_RUNT(str, m) STAT_GENET_MIB(str, m, BCMGENET_STAT_RUNT)
+
+#define STAT_GENET_MISC(str, m, offset) { \
+	.stat_string = str, \
+	.stat_sizeof = sizeof(((struct bcmgenet_priv *)0)->m), \
+	.stat_offset = offsetof(struct bcmgenet_priv, m), \
+	.type = BCMGENET_STAT_MISC, \
+	.reg_offset = offset, \
+}
+
+
+/* There is a 0xC gap between the end of RX and beginning of TX stats and then
+ * between the end of TX stats and the beginning of the RX RUNT
+ */
+#define BCMGENET_STAT_OFFSET	0xc
+
+/* Hardware counters must be kept in sync because the order/offset
+ * is important here (order in structure declaration = order in hardware)
+ */
+static const struct bcmgenet_stats bcmgenet_gstrings_stats[] = {
+	/* general stats */
+	STAT_NETDEV(rx_packets),
+	STAT_NETDEV(tx_packets),
+	STAT_NETDEV(rx_bytes),
+	STAT_NETDEV(tx_bytes),
+	STAT_NETDEV(rx_errors),
+	STAT_NETDEV(tx_errors),
+	STAT_NETDEV(rx_dropped),
+	STAT_NETDEV(tx_dropped),
+	STAT_NETDEV(multicast),
+	/* UniMAC RSV counters */
+	STAT_GENET_MIB_RX("rx_64_octets", mib.rx.pkt_cnt.cnt_64),
+	STAT_GENET_MIB_RX("rx_65_127_oct", mib.rx.pkt_cnt.cnt_127),
+	STAT_GENET_MIB_RX("rx_128_255_oct", mib.rx.pkt_cnt.cnt_255),
+	STAT_GENET_MIB_RX("rx_256_511_oct", mib.rx.pkt_cnt.cnt_511),
+	STAT_GENET_MIB_RX("rx_512_1023_oct", mib.rx.pkt_cnt.cnt_1023),
+	STAT_GENET_MIB_RX("rx_1024_1518_oct", mib.rx.pkt_cnt.cnt_1518),
+	STAT_GENET_MIB_RX("rx_vlan_1519_1522_oct", mib.rx.pkt_cnt.cnt_mgv),
+	STAT_GENET_MIB_RX("rx_1522_2047_oct", mib.rx.pkt_cnt.cnt_2047),
+	STAT_GENET_MIB_RX("rx_2048_4095_oct", mib.rx.pkt_cnt.cnt_4095),
+	STAT_GENET_MIB_RX("rx_4096_9216_oct", mib.rx.pkt_cnt.cnt_9216),
+	STAT_GENET_MIB_RX("rx_pkts", mib.rx.pkt),
+	STAT_GENET_MIB_RX("rx_bytes", mib.rx.bytes),
+	STAT_GENET_MIB_RX("rx_multicast", mib.rx.mca),
+	STAT_GENET_MIB_RX("rx_broadcast", mib.rx.bca),
+	STAT_GENET_MIB_RX("rx_fcs", mib.rx.fcs),
+	STAT_GENET_MIB_RX("rx_control", mib.rx.cf),
+	STAT_GENET_MIB_RX("rx_pause", mib.rx.pf),
+	STAT_GENET_MIB_RX("rx_unknown", mib.rx.uo),
+	STAT_GENET_MIB_RX("rx_align", mib.rx.aln),
+	STAT_GENET_MIB_RX("rx_outrange", mib.rx.flr),
+	STAT_GENET_MIB_RX("rx_code", mib.rx.cde),
+	STAT_GENET_MIB_RX("rx_carrier", mib.rx.fcr),
+	STAT_GENET_MIB_RX("rx_oversize", mib.rx.ovr),
+	STAT_GENET_MIB_RX("rx_jabber", mib.rx.jbr),
+	STAT_GENET_MIB_RX("rx_mtu_err", mib.rx.mtue),
+	STAT_GENET_MIB_RX("rx_good_pkts", mib.rx.pok),
+	STAT_GENET_MIB_RX("rx_unicast", mib.rx.uc),
+	STAT_GENET_MIB_RX("rx_ppp", mib.rx.ppp),
+	STAT_GENET_MIB_RX("rx_crc", mib.rx.rcrc),
+	/* UniMAC TSV counters */
+	STAT_GENET_MIB_TX("tx_64_octets", mib.tx.pkt_cnt.cnt_64),
+	STAT_GENET_MIB_TX("tx_65_127_oct", mib.tx.pkt_cnt.cnt_127),
+	STAT_GENET_MIB_TX("tx_128_255_oct", mib.tx.pkt_cnt.cnt_255),
+	STAT_GENET_MIB_TX("tx_256_511_oct", mib.tx.pkt_cnt.cnt_511),
+	STAT_GENET_MIB_TX("tx_512_1023_oct", mib.tx.pkt_cnt.cnt_1023),
+	STAT_GENET_MIB_TX("tx_1024_1518_oct", mib.tx.pkt_cnt.cnt_1518),
+	STAT_GENET_MIB_TX("tx_vlan_1519_1522_oct", mib.tx.pkt_cnt.cnt_mgv),
+	STAT_GENET_MIB_TX("tx_1522_2047_oct", mib.tx.pkt_cnt.cnt_2047),
+	STAT_GENET_MIB_TX("tx_2048_4095_oct", mib.tx.pkt_cnt.cnt_4095),
+	STAT_GENET_MIB_TX("tx_4096_9216_oct", mib.tx.pkt_cnt.cnt_9216),
+	STAT_GENET_MIB_TX("tx_pkts", mib.tx.pkts),
+	STAT_GENET_MIB_TX("tx_multicast", mib.tx.mca),
+	STAT_GENET_MIB_TX("tx_broadcast", mib.tx.bca),
+	STAT_GENET_MIB_TX("tx_pause", mib.tx.pf),
+	STAT_GENET_MIB_TX("tx_control", mib.tx.cf),
+	STAT_GENET_MIB_TX("tx_fcs_err", mib.tx.fcs),
+	STAT_GENET_MIB_TX("tx_oversize", mib.tx.ovr),
+	STAT_GENET_MIB_TX("tx_defer", mib.tx.drf),
+	STAT_GENET_MIB_TX("tx_excess_defer", mib.tx.edf),
+	STAT_GENET_MIB_TX("tx_single_col", mib.tx.scl),
+	STAT_GENET_MIB_TX("tx_multi_col", mib.tx.mcl),
+	STAT_GENET_MIB_TX("tx_late_col", mib.tx.lcl),
+	STAT_GENET_MIB_TX("tx_excess_col", mib.tx.ecl),
+	STAT_GENET_MIB_TX("tx_frags", mib.tx.frg),
+	STAT_GENET_MIB_TX("tx_total_col", mib.tx.ncl),
+	STAT_GENET_MIB_TX("tx_jabber", mib.tx.jbr),
+	STAT_GENET_MIB_TX("tx_bytes", mib.tx.bytes),
+	STAT_GENET_MIB_TX("tx_good_pkts", mib.tx.pok),
+	STAT_GENET_MIB_TX("tx_unicast", mib.tx.uc),
+	/* UniMAC RUNT counters */
+	STAT_GENET_RUNT("rx_runt_pkts", mib.rx_runt_cnt),
+	STAT_GENET_RUNT("rx_runt_valid_fcs", mib.rx_runt_fcs),
+	STAT_GENET_RUNT("rx_runt_inval_fcs_align", mib.rx_runt_fcs_align),
+	STAT_GENET_RUNT("rx_runt_bytes", mib.rx_runt_bytes),
+	/* Misc UniMAC counters */
+	STAT_GENET_MISC("rbuf_ovflow_cnt", mib.rbuf_ovflow_cnt,
+			UMAC_RBUF_OVFL_CNT),
+	STAT_GENET_MISC("rbuf_err_cnt", mib.rbuf_err_cnt, UMAC_RBUF_ERR_CNT),
+	STAT_GENET_MISC("mdf_err_cnt", mib.mdf_err_cnt, UMAC_MDF_ERR_CNT),
+};
+
+#define BCMGENET_STATS_LEN	ARRAY_SIZE(bcmgenet_gstrings_stats)
+
+static void bcmgenet_get_drvinfo(struct net_device *dev,
+		struct ethtool_drvinfo *info)
+{
+	strlcpy(info->driver, "bcmgenet", sizeof(info->driver));
+	strlcpy(info->version, "v2.0", sizeof(info->version));
+	info->n_stats = BCMGENET_STATS_LEN;
+
+}
+
+static int bcmgenet_get_sset_count(struct net_device *dev, int string_set)
+{
+	switch (string_set) {
+	case ETH_SS_STATS:
+		return BCMGENET_STATS_LEN;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void bcmgenet_get_strings(struct net_device *dev,
+				u32 stringset, u8 *data)
+{
+	int i;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for (i = 0; i < BCMGENET_STATS_LEN; i++) {
+			memcpy(data + i * ETH_GSTRING_LEN,
+				bcmgenet_gstrings_stats[i].stat_string,
+				ETH_GSTRING_LEN);
+		}
+		break;
+	}
+}
+
+static void bcmgenet_update_mib_counters(struct bcmgenet_priv *priv)
+{
+	int i, j = 0;
+
+	for (i = 0; i < BCMGENET_STATS_LEN; i++) {
+		const struct bcmgenet_stats *s;
+		u32 val = 0;
+		char *p;
+		u8 offset = 0;
+
+		s = &bcmgenet_gstrings_stats[i];
+		switch (s->type) {
+		case BCMGENET_STAT_NETDEV:
+			continue;
+		case BCMGENET_STAT_MIB_RX:
+		case BCMGENET_STAT_MIB_TX:
+		case BCMGENET_STAT_RUNT:
+			if (s->type != BCMGENET_STAT_MIB_RX)
+				offset = BCMGENET_STAT_OFFSET;
+			val = bcmgenet_umac_readl(priv, UMAC_MIB_START +
+								j + offset);
+			break;
+		case BCMGENET_STAT_MISC:
+			val = bcmgenet_umac_readl(priv, s->reg_offset);
+			/* clear if overflowed */
+			if (val == ~0)
+				bcmgenet_umac_writel(priv, 0, s->reg_offset);
+			break;
+		}
+
+		j += s->stat_sizeof;
+		p = (char *)priv + s->stat_offset;
+		*(u32 *)p = val;
+	}
+}
+
+static void bcmgenet_get_ethtool_stats(struct net_device *dev,
+					struct ethtool_stats *stats,
+					u64 *data)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int i;
+
+	mutex_lock(&priv->mib_mutex);
+	if (netif_running(dev))
+		bcmgenet_update_mib_counters(priv);
+
+	for (i = 0; i < BCMGENET_STATS_LEN; i++) {
+		const struct bcmgenet_stats *s;
+		char *p;
+
+		s = &bcmgenet_gstrings_stats[i];
+		if (s->type == BCMGENET_STAT_NETDEV)
+			p = (char *)&dev->stats;
+		else
+			p = (char *)priv;
+		p += s->stat_offset;
+		data[i] = *(u32 *)p;
+	}
+	mutex_unlock(&priv->mib_mutex);
+}
+
+/* standard ethtool support functions. */
+static struct ethtool_ops bcmgenet_ethtool_ops = {
+	.get_strings		= bcmgenet_get_strings,
+	.get_sset_count		= bcmgenet_get_sset_count,
+	.get_ethtool_stats	= bcmgenet_get_ethtool_stats,
+	.get_settings		= bcmgenet_get_settings,
+	.set_settings		= bcmgenet_set_settings,
+	.get_drvinfo		= bcmgenet_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+	.get_msglevel		= bcmgenet_get_msglevel,
+	.set_msglevel		= bcmgenet_set_msglevel,
+};
+
+/* Power down the unimac, based on mode. */
+static void bcmgenet_power_down(struct bcmgenet_priv *priv,
+				enum bcmgenet_power_mode mode)
+{
+	u32 reg;
+
+	switch (mode) {
+	case GENET_POWER_CABLE_SENSE:
+		if (priv->phydev)
+			phy_detach(priv->phydev);
+		break;
+
+	case GENET_POWER_PASSIVE:
+		/* Power down LED */
+		bcmgenet_mii_reset(priv->dev);
+		if (priv->hw_params->flags & GENET_HAS_EXT) {
+			reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+			reg |= (EXT_PWR_DOWN_PHY |
+				EXT_PWR_DOWN_DLL | EXT_PWR_DOWN_BIAS);
+			bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+		}
+		break;
+	default:
+		break;
+	}
+
+}
+
+static void bcmgenet_power_up(struct bcmgenet_priv *priv,
+				enum bcmgenet_power_mode mode)
+{
+	u32 reg;
+
+	switch (mode) {
+	case GENET_POWER_CABLE_SENSE:
+		/* enable APD */
+		if (priv->hw_params->flags & GENET_HAS_EXT) {
+			reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+			reg |= EXT_PWR_DN_EN_LD;
+			bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+			bcmgenet_mii_reset(priv->dev);
+		}
+		break;
+
+	case GENET_POWER_PASSIVE:
+		if (priv->hw_params->flags & GENET_HAS_EXT) {
+			reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+			reg &= ~EXT_PWR_DOWN_DLL;
+			reg &= ~EXT_PWR_DOWN_PHY;
+			reg &= ~EXT_PWR_DOWN_BIAS;
+			/* enable APD */
+			reg |= EXT_PWR_DN_EN_LD;
+			bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+			bcmgenet_mii_reset(priv->dev);
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+/* ioctl handle special commands that are not present in ethtool. */
+static int bcmgenet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int val = 0;
+
+	if (!netif_running(dev))
+		return -EINVAL;
+
+	switch (cmd) {
+	case SIOCGMIIPHY:
+	case SIOCGMIIREG:
+	case SIOCSMIIREG:
+		if (!priv->phydev)
+			val = -ENODEV;
+		else
+			val = phy_mii_ioctl(priv->phydev, rq, cmd);
+		break;
+
+	default:
+		val = -EINVAL;
+		break;
+	}
+
+	return val;
+}
+
+static struct enet_cb *bcmgenet_get_txcb(struct bcmgenet_priv *priv,
+					 struct bcmgenet_tx_ring *ring)
+{
+	struct enet_cb *tx_cb_ptr;
+
+	tx_cb_ptr = ring->cbs;
+	tx_cb_ptr += ring->write_ptr - ring->cb_ptr;
+	tx_cb_ptr->bd_addr = priv->tx_bds + ring->write_ptr * DMA_DESC_SIZE;
+	/* Advancing local write pointer */
+	if (ring->write_ptr == ring->end_ptr)
+		ring->write_ptr = ring->cb_ptr;
+	else
+		ring->write_ptr++;
+
+	return tx_cb_ptr;
+}
+
+/* Simple helper to free a control block's resources */
+static void bcmgenet_free_cb(struct enet_cb *cb)
+{
+	dev_kfree_skb_any(cb->skb);
+	cb->skb = NULL;
+	dma_unmap_addr_set(cb, dma_addr, 0);
+}
+
+static inline void bcmgenet_tx_ring16_int_disable(struct bcmgenet_priv *priv,
+						  struct bcmgenet_tx_ring *ring)
+{
+	bcmgenet_intrl2_0_writel(priv,
+			UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE,
+			INTRL2_CPU_MASK_SET);
+}
+
+static inline void bcmgenet_tx_ring16_int_enable(struct bcmgenet_priv *priv,
+						 struct bcmgenet_tx_ring *ring)
+{
+	bcmgenet_intrl2_0_writel(priv,
+			UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE,
+			INTRL2_CPU_MASK_CLEAR);
+}
+
+static inline void bcmgenet_tx_ring_int_enable(struct bcmgenet_priv *priv,
+						struct bcmgenet_tx_ring *ring)
+{
+	bcmgenet_intrl2_1_writel(priv,
+			(1 << ring->index), INTRL2_CPU_MASK_CLEAR);
+	priv->int1_mask &= ~(1 << ring->index);
+}
+
+static inline void bcmgenet_tx_ring_int_disable(struct bcmgenet_priv *priv,
+						struct bcmgenet_tx_ring *ring)
+{
+	bcmgenet_intrl2_1_writel(priv,
+			(1 << ring->index), INTRL2_CPU_MASK_SET);
+	priv->int1_mask |= (1 << ring->index);
+}
+
+/* Unlocked version of the reclaim routine */
+static void __bcmgenet_tx_reclaim(struct net_device *dev,
+				struct bcmgenet_tx_ring *ring)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int last_tx_cn, last_c_index, num_tx_bds;
+	struct enet_cb *tx_cb_ptr;
+	unsigned int c_index;
+
+	/* Compute how many buffers are transmited since last xmit call */
+	c_index = bcmgenet_tdma_ring_readl(priv, ring->index, TDMA_CONS_INDEX);
+
+	last_c_index = ring->c_index;
+	num_tx_bds = ring->size;
+
+	c_index &= (num_tx_bds - 1);
+
+	if (c_index >= last_c_index)
+		last_tx_cn = c_index - last_c_index;
+	else
+		last_tx_cn = num_tx_bds - last_c_index + c_index;
+
+	netif_dbg(priv, tx_done, dev,
+			"%s ring=%d index=%d last_tx_cn=%d last_index=%d\n",
+			__func__, ring->index,
+			c_index, last_tx_cn, last_c_index);
+
+	/* Reclaim transmitted buffers */
+	while (last_tx_cn-- > 0) {
+		tx_cb_ptr = ring->cbs + last_c_index;
+		if (tx_cb_ptr->skb) {
+			dev->stats.tx_bytes += tx_cb_ptr->skb->len;
+			dma_unmap_single(&dev->dev,
+					dma_unmap_addr(tx_cb_ptr, dma_addr),
+					tx_cb_ptr->skb->len,
+					DMA_TO_DEVICE);
+			bcmgenet_free_cb(tx_cb_ptr);
+		} else if (dma_unmap_addr(tx_cb_ptr, dma_addr)) {
+			dev->stats.tx_bytes +=
+				dma_unmap_len(tx_cb_ptr, dma_len);
+			dma_unmap_page(&dev->dev,
+					dma_unmap_addr(tx_cb_ptr, dma_addr),
+					dma_unmap_len(tx_cb_ptr, dma_len),
+					DMA_TO_DEVICE);
+			dma_unmap_addr_set(tx_cb_ptr, dma_addr, 0);
+		}
+		dev->stats.tx_packets++;
+		ring->free_bds += 1;
+
+		last_c_index++;
+		last_c_index &= (num_tx_bds - 1);
+	}
+
+	if (ring->free_bds > (MAX_SKB_FRAGS + 1))
+		ring->int_disable(priv, ring);
+
+	if (__netif_subqueue_stopped(dev, ring->queue))
+		netif_wake_subqueue(dev, ring->queue);
+
+	ring->c_index = c_index;
+}
+
+static void bcmgenet_tx_reclaim(struct net_device *dev,
+		struct bcmgenet_tx_ring *ring)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ring->lock, flags);
+	__bcmgenet_tx_reclaim(dev, ring);
+	spin_unlock_irqrestore(&ring->lock, flags);
+}
+
+static void bcmgenet_tx_reclaim_all(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int i;
+
+	if (netif_is_multiqueue(dev)) {
+		for (i = 0; i < priv->hw_params->tx_queues; i++)
+			bcmgenet_tx_reclaim(dev, &priv->tx_rings[i]);
+	}
+
+	bcmgenet_tx_reclaim(dev, &priv->tx_rings[DESC_INDEX]);
+}
+
+/* Transmits a single SKB (either head of a fragment or a single SKB)
+ * caller must hold priv->lock
+ */
+static int bcmgenet_xmit_single(struct net_device *dev,
+				struct sk_buff *skb,
+				u16 dma_desc_flags,
+				struct bcmgenet_tx_ring *ring)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct device *kdev = &priv->pdev->dev;
+	struct enet_cb *tx_cb_ptr;
+	unsigned int skb_len;
+	dma_addr_t mapping;
+	u32 length_status;
+	int ret;
+
+	tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
+
+	if (unlikely(!tx_cb_ptr))
+		BUG();
+
+	tx_cb_ptr->skb = skb;
+
+	skb_len = skb_headlen(skb) < ETH_ZLEN ? ETH_ZLEN : skb_headlen(skb);
+
+	mapping = dma_map_single(kdev, skb->data, skb_len, DMA_TO_DEVICE);
+	ret = dma_mapping_error(kdev, mapping);
+	if (ret) {
+		netif_err(priv, tx_err, dev, "Tx DMA map failed\n");
+		dev_kfree_skb(skb);
+		return ret;
+	}
+
+	dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
+	dma_unmap_len_set(tx_cb_ptr, dma_len, skb->len);
+	length_status = (skb_len << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
+			(priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT) |
+			DMA_TX_APPEND_CRC;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		length_status |= DMA_TX_DO_CSUM;
+
+	dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping, length_status);
+
+	/* Decrement total BD count and advance our write pointer */
+	ring->free_bds -= 1;
+	ring->prod_index += 1;
+	ring->prod_index &= DMA_P_INDEX_MASK;
+
+	return 0;
+}
+
+/* Transmit a SKB fragement */
+static int bcmgenet_xmit_frag(struct net_device *dev,
+				skb_frag_t *frag,
+				u16 dma_desc_flags,
+				struct bcmgenet_tx_ring *ring)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct device *kdev = &priv->pdev->dev;
+	struct enet_cb *tx_cb_ptr;
+	dma_addr_t mapping;
+	int ret;
+
+	tx_cb_ptr = bcmgenet_get_txcb(priv, ring);
+
+	if (unlikely(!tx_cb_ptr))
+		BUG();
+	tx_cb_ptr->skb = NULL;
+
+	mapping = skb_frag_dma_map(kdev, frag, 0,
+		skb_frag_size(frag), DMA_TO_DEVICE);
+	ret = dma_mapping_error(kdev, mapping);
+	if (ret) {
+		netif_err(priv, tx_err, dev, "%s: Tx DMA map failed\n",
+				__func__);
+		/*TODO: Handle frag failure.*/
+		return ret;
+	}
+
+	dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
+	dma_unmap_len_set(tx_cb_ptr, dma_len, frag->size);
+
+	dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping,
+			(frag->size << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
+			(priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT));
+
+
+	ring->free_bds -= 1;
+	ring->prod_index += 1;
+	ring->prod_index &= DMA_P_INDEX_MASK;
+
+	return 0;
+}
+
+/* Reallocate the SKB to put enough headroom in front of it and insert
+ * the transmit checksum offsets in the descriptors
+ */
+static int bcmgenet_put_tx_csum(struct net_device *dev, struct sk_buff *skb)
+{
+	struct status_64 *status = NULL;
+	struct sk_buff *new_skb;
+	u16 offset;
+	u8 ip_proto;
+	u16 ip_ver;
+	u32 tx_csum_info;
+
+	if (unlikely(skb_headroom(skb) < sizeof(*status))) {
+		/* If 64 byte status block enabled, must make sure skb has
+		 * enough headroom for us to insert 64B status block.
+		 */
+		new_skb = skb_realloc_headroom(skb, sizeof(*status));
+		dev_kfree_skb(skb);
+		if (!new_skb) {
+			dev->stats.tx_errors++;
+			dev->stats.tx_dropped++;
+			return -ENOMEM;
+		}
+		skb = new_skb;
+	}
+
+	skb_push(skb, sizeof(*status));
+	status = (struct status_64 *)skb->data;
+
+	if (skb->ip_summed  == CHECKSUM_PARTIAL) {
+		ip_ver = htons(skb->protocol);
+		switch (ip_ver) {
+		case ETH_P_IP:
+			ip_proto = ip_hdr(skb)->protocol;
+			break;
+		case ETH_P_IPV6:
+			ip_proto = ipv6_hdr(skb)->nexthdr;
+			break;
+		default:
+			return 0;
+		}
+
+		offset = skb_checksum_start_offset(skb) - sizeof(*status);
+		tx_csum_info = (offset << STATUS_TX_CSUM_START_SHIFT) |
+				(offset + skb->csum_offset);
+
+		/* Set the length valid bit for TCP and UDP and just set
+		 * the special UDP flag for IPv4, else just set to 0.
+		 */
+		if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP) {
+			tx_csum_info |= STATUS_TX_CSUM_LV;
+			if (ip_proto == IPPROTO_UDP && ip_ver == ETH_P_IP)
+				tx_csum_info |= STATUS_TX_CSUM_PROTO_UDP;
+		} else
+			tx_csum_info = 0;
+
+		status->tx_csum_info = tx_csum_info;
+	}
+
+	return 0;
+}
+
+static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct bcmgenet_tx_ring *ring = NULL;
+	unsigned long flags = 0;
+	int nr_frags, index;
+	u16 dma_desc_flags;
+	int ret;
+	int i;
+
+	index = skb_get_queue_mapping(skb);
+	/* Mapping strategy:
+	 * queue_mapping = 0, unclassfieid, packet xmited through ring16
+	 * queue_mapping = 1, goes to ring 0. (highest priority queue
+	 * queue_mapping = 2, goes to ring 1.
+	 * queue_mapping = 3, goes to ring 2.
+	 * queue_mapping = 4, goes to ring 3.
+	 */
+	if (index == 0)
+		index = DESC_INDEX;
+	else
+		index -= 1;
+
+	if (index != DESC_INDEX && index > 3) {
+		netdev_err(dev, "%s: queue_mapping %d is invalid\n",
+				__func__, skb_get_queue_mapping(skb));
+		dev->stats.tx_errors++;
+		dev->stats.tx_dropped++;
+		ret = NETDEV_TX_OK;
+		goto out;
+	}
+	nr_frags = skb_shinfo(skb)->nr_frags;
+	ring = &priv->tx_rings[index];
+
+	spin_lock_irqsave(&ring->lock, flags);
+	if (ring->free_bds <= nr_frags + 1) {
+		netif_stop_subqueue(dev, ring->queue);
+		netdev_err(dev, "%s: tx ring %d full when queue %d awake\n",
+				__func__, index, ring->queue);
+		ret = NETDEV_TX_BUSY;
+		goto out;
+	}
+
+	/* reclaim xmited skb every 8 packets. */
+	/*if (ring->free_bds < ring->size - 8)*/
+		/*__bcmgenet_tx_reclaim(dev, ring);*/
+
+	/* set the SKB transmit checksum */
+	if (priv->desc_64b_en) {
+		ret = bcmgenet_put_tx_csum(dev, skb);
+		if (ret) {
+			ret = NETDEV_TX_OK;
+			goto out;
+		}
+	}
+
+	dma_desc_flags = DMA_SOP;
+	if (nr_frags == 0)
+		dma_desc_flags |= DMA_EOP;
+
+	/* Transmit single SKB or head of fragment list */
+	ret = bcmgenet_xmit_single(dev, skb, dma_desc_flags, ring);
+	if (ret) {
+		ret = NETDEV_TX_OK;
+		goto out;
+	}
+
+	/* xmit fragment */
+	for (i = 0; i < nr_frags; i++) {
+		/*TODO: Handle frag failure.*/
+		ret = bcmgenet_xmit_frag(dev,
+				&skb_shinfo(skb)->frags[i],
+				(i == nr_frags - 1) ? DMA_EOP : 0, ring);
+		if (ret) {
+			ret = NETDEV_TX_OK;
+			goto out;
+		}
+	}
+
+	/* we kept a software copy of how much we should advance the TDMA
+	 * producer index, now write it down to the hardware
+	 */
+	bcmgenet_tdma_ring_writel(priv, ring->index,
+			ring->prod_index, TDMA_PROD_INDEX);
+
+	if (ring->free_bds <= (MAX_SKB_FRAGS + 1)) {
+		netif_stop_subqueue(dev, ring->queue);
+		ring->int_enable(priv, ring);
+	}
+
+out:
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	return ret;
+}
+
+
+static int bcmgenet_rx_refill(struct bcmgenet_priv *priv,
+				struct enet_cb *cb)
+{
+	struct device *kdev = &priv->pdev->dev;
+	struct sk_buff *skb;
+	dma_addr_t mapping;
+	int ret;
+
+	skb = netdev_alloc_skb(priv->dev,
+				priv->rx_buf_len + SKB_ALIGNMENT);
+	if (!skb)
+		return -ENOMEM;
+
+	/* a caller did not release this control block */
+	WARN_ON(cb->skb != NULL);
+	cb->skb = skb;
+	mapping = dma_map_single(kdev, skb->data,
+			priv->rx_buf_len, DMA_FROM_DEVICE);
+	ret = dma_mapping_error(kdev, mapping);
+	if (ret) {
+		bcmgenet_free_cb(cb);
+		netif_err(priv, rx_err, priv->dev,
+				"%s DMA map failed\n", __func__);
+		return ret;
+	}
+
+	dma_unmap_addr_set(cb, dma_addr, mapping);
+	/* assign packet, prepare descriptor, and advance pointer */
+
+	dmadesc_set_addr(priv, priv->rx_bd_assign_ptr, mapping);
+
+	/* turn on the newly assigned BD for DMA to use */
+	priv->rx_bd_assign_index++;
+	priv->rx_bd_assign_index &= (priv->num_rx_bds - 1);
+
+	priv->rx_bd_assign_ptr = priv->rx_bds +
+		(priv->rx_bd_assign_index * DMA_DESC_SIZE);
+
+	return 0;
+}
+
+/* bcmgenet_desc_rx - descriptor based rx process.
+ * this could be called from bottom half, or from NAPI polling method.
+ */
+static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
+				     unsigned int budget)
+{
+	struct net_device *dev = priv->dev;
+	struct enet_cb *cb;
+	struct sk_buff *skb;
+	u32 dma_length_status;
+	unsigned long dma_flag;
+	int len, err;
+	unsigned int rxpktprocessed = 0, rxpkttoprocess;
+	unsigned int p_index;
+	unsigned int chksum_ok = 0;
+
+	p_index = bcmgenet_rdma_ring_readl(priv,
+			DESC_INDEX, RDMA_PROD_INDEX);
+	p_index &= DMA_P_INDEX_MASK;
+
+	if (p_index < priv->rx_c_index)
+		rxpkttoprocess = (DMA_C_INDEX_MASK + 1) -
+			priv->rx_c_index + p_index;
+	else
+		rxpkttoprocess = p_index - priv->rx_c_index;
+
+	netif_dbg(priv, rx_status, dev,
+		"RDMA: rxpkttoprocess=%d\n", rxpkttoprocess);
+
+	while ((rxpktprocessed < rxpkttoprocess) &&
+			(rxpktprocessed < budget)) {
+
+		/* Unmap the packet contents such that we can use the
+		 * RSV from the 64 bytes descriptor when enabled and save
+		 * a 32-bits register read
+		 */
+		cb = &priv->rx_cbs[priv->rx_read_ptr];
+		skb = cb->skb;
+		dma_unmap_single(&dev->dev, dma_unmap_addr(cb, dma_addr),
+				priv->rx_buf_len, DMA_FROM_DEVICE);
+
+		if (!priv->desc_64b_en) {
+			dma_length_status = dmadesc_get_length_status(priv,
+							priv->rx_bds +
+							(priv->rx_read_ptr *
+							 DMA_DESC_SIZE));
+		} else {
+			struct status_64 *status;
+			status = (struct status_64 *)skb->data;
+			dma_length_status = status->length_status;
+		}
+
+		/* DMA flags and length are still valid no matter how
+		 * we got the Receive Status Vector (64B RSB or register)
+		 */
+		dma_flag = dma_length_status & 0xffff;
+		len = dma_length_status >> DMA_BUFLENGTH_SHIFT;
+
+		netif_dbg(priv, rx_status, dev,
+			"%s: p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
+			__func__, p_index, priv->rx_c_index, priv->rx_read_ptr,
+			dma_length_status);
+
+		rxpktprocessed++;
+
+		priv->rx_read_ptr++;
+		priv->rx_read_ptr &= (priv->num_rx_bds - 1);
+
+		/* out of memory, just drop packets at the hardware level */
+		if (unlikely(!skb)) {
+			dev->stats.rx_dropped++;
+			dev->stats.rx_errors++;
+			goto refill;
+		}
+
+		if (unlikely(!(dma_flag & DMA_EOP) || !(dma_flag & DMA_SOP))) {
+			netif_err(priv, rx_status, dev,
+					"Droping fragmented packet!\n");
+			dev->stats.rx_dropped++;
+			dev->stats.rx_errors++;
+			dev_kfree_skb_any(cb->skb);
+			cb->skb = NULL;
+			goto refill;
+		}
+		/* report errors */
+		if (unlikely(dma_flag & (DMA_RX_CRC_ERROR |
+						DMA_RX_OV |
+						DMA_RX_NO |
+						DMA_RX_LG |
+						DMA_RX_RXER))) {
+			netif_err(priv, rx_status, dev, "dma_flag=0x%x\n",
+						(unsigned int)dma_flag);
+			if (dma_flag & DMA_RX_CRC_ERROR)
+				dev->stats.rx_crc_errors++;
+			if (dma_flag & DMA_RX_OV)
+				dev->stats.rx_over_errors++;
+			if (dma_flag & DMA_RX_NO)
+				dev->stats.rx_frame_errors++;
+			if (dma_flag & DMA_RX_LG)
+				dev->stats.rx_length_errors++;
+			dev->stats.rx_dropped++;
+			dev->stats.rx_errors++;
+
+			/* discard the packet and advance consumer index.*/
+			dev_kfree_skb_any(cb->skb);
+			cb->skb = NULL;
+			goto refill;
+		} /* error packet */
+
+		chksum_ok = (dma_flag & priv->dma_rx_chk_bit) &&
+				priv->desc_rxchk_en;
+
+		skb_put(skb, len);
+		if (priv->desc_64b_en) {
+			skb_pull(skb, 64);
+			len -= 64;
+		}
+
+		if (likely(chksum_ok))
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		/* remove hardware 2bytes added for IP alignment */
+		skb_pull(skb, 2);
+		len -= 2;
+
+		if (priv->crc_fwd_en) {
+			skb_trim(skb, len - ETH_FCS_LEN);
+			len -= ETH_FCS_LEN;
+		}
+
+		/*Finish setting up the received SKB and send it to the kernel*/
+		skb->protocol = eth_type_trans(skb, priv->dev);
+		dev->stats.rx_packets++;
+		dev->stats.rx_bytes += len;
+		if (dma_flag & DMA_RX_MULT)
+			dev->stats.multicast++;
+
+		/* Notify kernel */
+		napi_gro_receive(&priv->napi, skb);
+		cb->skb = NULL;
+		netif_dbg(priv, rx_status, dev, "pushed up to kernel\n");
+
+		/* refill RX path on the current control block */
+refill:
+		err = bcmgenet_rx_refill(priv, cb);
+		if (err)
+			netif_err(priv, rx_err, dev, "Rx refill failed\n");
+	}
+
+	return rxpktprocessed;
+}
+
+/* Assign skb to RX DMA descriptor. */
+static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv)
+{
+	struct enet_cb *cb;
+	int ret = 0;
+	int i;
+	u32 reg;
+
+	netif_dbg(priv, hw, priv->dev, "%s:\n", __func__);
+
+	/* This function may be called from irq bottom-half. */
+	spin_lock_bh(&priv->bh_lock);
+
+	/* loop here for each buffer needing assign */
+	for (i = 0; i < priv->num_rx_bds; i++) {
+		cb = &priv->rx_cbs[priv->rx_bd_assign_index];
+		if (cb->skb)
+			continue;
+
+		/* set the DMA descriptor length once and for all
+		 * it will only change if we support dynamically sizing
+		 * priv->rx_buf_len, but we do not
+		 */
+		dmadesc_set_length_status(priv, priv->rx_bd_assign_ptr,
+				priv->rx_buf_len << DMA_BUFLENGTH_SHIFT);
+
+		ret = bcmgenet_rx_refill(priv, cb);
+		if (ret)
+			break;
+
+	}
+
+	/* Enable rx DMA incase it was disabled due to running out of rx BD */
+	reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+	reg |= DMA_EN;
+	bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+	spin_unlock_bh(&priv->bh_lock);
+
+	return ret;
+}
+
+static void bcmgenet_free_rx_buffers(struct bcmgenet_priv *priv)
+{
+	struct enet_cb *cb;
+	int i;
+
+	for (i = 0; i < priv->num_rx_bds; i++) {
+		cb = &priv->rx_cbs[i];
+
+		if (dma_unmap_addr(cb, dma_addr)) {
+			dma_unmap_single(&priv->dev->dev,
+					dma_unmap_addr(cb, dma_addr),
+					priv->rx_buf_len, DMA_FROM_DEVICE);
+			dma_unmap_addr_set(cb, dma_addr, 0);
+		}
+
+		if (cb->skb)
+			bcmgenet_free_cb(cb);
+	}
+}
+
+static int reset_umac(struct bcmgenet_priv *priv)
+{
+	struct device *kdev = &priv->pdev->dev;
+	unsigned int timeout = 0;
+	u32 reg;
+
+	/* 7358a0/7552a0: bad default in RBUF_FLUSH_CTRL.umac_sw_rst */
+	bcmgenet_rbuf_ctrl_set(priv, 0);
+	udelay(10);
+
+	/* disable MAC while updating its registers */
+	bcmgenet_umac_writel(priv, 0, UMAC_CMD);
+
+	/* issue soft reset, wait for it to complete */
+	bcmgenet_umac_writel(priv, CMD_SW_RESET, UMAC_CMD);
+	while (timeout++ < 1000) {
+		reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+		if (!(reg & CMD_SW_RESET))
+			break;
+		udelay(1);
+	}
+
+	if (timeout == 1000) {
+		dev_err(kdev,
+			"timeout waiting for MAC to come out of resetn\n");
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+/* init_umac: Initializes the uniMac controller */
+static int init_umac(struct bcmgenet_priv *priv)
+{
+	struct device *kdev = &priv->pdev->dev;
+	int ret;
+	u32 reg, cpu_mask_clear;
+
+	dev_dbg(&priv->pdev->dev, "bcmgenet: init_umac\n");
+
+	ret = reset_umac(priv);
+	if (ret)
+		return ret;
+
+	bcmgenet_umac_writel(priv, 0, UMAC_CMD);
+	/* clear tx/rx counter */
+	bcmgenet_umac_writel(priv,
+		MIB_RESET_RX | MIB_RESET_TX | MIB_RESET_RUNT, UMAC_MIB_CTRL);
+	bcmgenet_umac_writel(priv, 0, UMAC_MIB_CTRL);
+
+	bcmgenet_umac_writel(priv, ENET_MAX_MTU_SIZE, UMAC_MAX_FRAME_LEN);
+
+	/* init rx registers, enable ip header optimization */
+	reg = bcmgenet_rbuf_readl(priv, RBUF_CTRL);
+	reg |= RBUF_ALIGN_2B;
+	bcmgenet_rbuf_writel(priv, reg, RBUF_CTRL);
+
+	if (!GENET_IS_V1(priv) && !GENET_IS_V2(priv))
+		bcmgenet_rbuf_writel(priv, 1, RBUF_TBUF_SIZE_CTRL);
+
+	/* Mask all interrupts.*/
+	bcmgenet_intrl2_0_writel(priv, 0xFFFFFFFF, INTRL2_CPU_MASK_SET);
+	bcmgenet_intrl2_0_writel(priv, 0xFFFFFFFF, INTRL2_CPU_CLEAR);
+	bcmgenet_intrl2_0_writel(priv, 0, INTRL2_CPU_MASK_CLEAR);
+
+	cpu_mask_clear = UMAC_IRQ_RXDMA_BDONE;
+
+	dev_dbg(kdev, "%s:Enabling RXDMA_BDONE interrupt\n", __func__);
+
+	/* Monitor cable plug/unpluged event for internal PHY */
+	if (priv->phy_type == BRCM_PHY_TYPE_INT)
+		cpu_mask_clear |= (UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
+	else if (priv->ext_phy)
+		cpu_mask_clear |= (UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
+	else if (priv->phy_type == BRCM_PHY_TYPE_MOCA) {
+		reg = bcmgenet_bp_mc_get(priv);
+		reg |= BIT(priv->hw_params->bp_in_en_shift);
+
+		/* bp_mask: back pressure mask */
+		if (netif_is_multiqueue(priv->dev))
+			reg |= priv->hw_params->bp_in_mask;
+		else
+			reg &= ~priv->hw_params->bp_in_mask;
+		bcmgenet_bp_mc_set(priv, reg);
+	}
+
+	/* Enable MDIO interrupts on GENET v3+ */
+	if (priv->hw_params->flags & GENET_HAS_MDIO_INTR)
+		cpu_mask_clear |= UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR;
+
+	bcmgenet_intrl2_0_writel(priv, cpu_mask_clear,
+		INTRL2_CPU_MASK_CLEAR);
+
+	/* Enable rx/tx engine.*/
+	dev_dbg(kdev, "done init umac\n");
+
+	return 0;
+}
+
+/* Initialize all house-keeping variables for a TX ring, along
+ * with corresponding hardware registers
+ */
+static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
+				  unsigned int index, unsigned int size,
+				  unsigned int write_ptr, unsigned int end_ptr)
+{
+	struct bcmgenet_tx_ring *ring = &priv->tx_rings[index];
+	u32 words_per_bd = WORDS_PER_BD(priv);
+	u32 flow_period_val = 0;
+	unsigned int first_bd;
+
+	spin_lock_init(&ring->lock);
+	ring->index = index;
+	if (index == DESC_INDEX) {
+		ring->queue = 0;
+		ring->int_enable = bcmgenet_tx_ring16_int_enable;
+		ring->int_disable = bcmgenet_tx_ring16_int_disable;
+	} else {
+		ring->queue = index + 1;
+		ring->int_enable = bcmgenet_tx_ring_int_enable;
+		ring->int_disable = bcmgenet_tx_ring_int_disable;
+	}
+	ring->cbs = priv->tx_cbs + write_ptr;
+	ring->size = size;
+	ring->c_index = 0;
+	ring->free_bds = size;
+	ring->write_ptr = write_ptr;
+	ring->cb_ptr = write_ptr;
+	ring->end_ptr = end_ptr - 1;
+	ring->prod_index = 0;
+
+	/* Set flow period for ring != 16 */
+	if (index != DESC_INDEX)
+		flow_period_val = ENET_MAX_MTU_SIZE << 16;
+
+	bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_PROD_INDEX);
+	bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_CONS_INDEX);
+	bcmgenet_tdma_ring_writel(priv, index, 1, DMA_MBUF_DONE_THRESH);
+	/* Disable rate control for now */
+	bcmgenet_tdma_ring_writel(priv, index, flow_period_val,
+			TDMA_FLOW_PERIOD);
+	/* Unclassified traffic goes to ring 16 */
+	bcmgenet_tdma_ring_writel(priv, index,
+			((size << DMA_RING_SIZE_SHIFT) | RX_BUF_LENGTH),
+			DMA_RING_BUF_SIZE);
+
+	first_bd = write_ptr;
+
+	/* Set start and end address, read and write pointers */
+	bcmgenet_tdma_ring_writel(priv, index, first_bd * words_per_bd,
+			DMA_START_ADDR);
+	bcmgenet_tdma_ring_writel(priv, index, first_bd * words_per_bd,
+			TDMA_READ_PTR);
+	bcmgenet_tdma_ring_writel(priv, index, first_bd,
+			TDMA_WRITE_PTR);
+	bcmgenet_tdma_ring_writel(priv, index, end_ptr * words_per_bd - 1,
+			DMA_END_ADDR);
+}
+
+/* Initialize a RDMA ring */
+static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
+				  unsigned int index, unsigned int size)
+{
+	u32 words_per_bd = WORDS_PER_BD(priv);
+	int ret;
+
+	priv->num_rx_bds = TOTAL_DESC;
+	priv->rx_bds = priv->base + priv->hw_params->rdma_offset;
+	priv->rx_bd_assign_ptr = priv->rx_bds;
+	priv->rx_bd_assign_index = 0;
+	priv->rx_c_index = 0;
+	priv->rx_read_ptr = 0;
+	priv->rx_cbs = kzalloc(priv->num_rx_bds * sizeof(struct enet_cb),
+				GFP_KERNEL);
+	if (!priv->rx_cbs)
+		return -ENOMEM;
+
+	ret = bcmgenet_alloc_rx_buffers(priv);
+	if (ret) {
+		kfree(priv->rx_cbs);
+		return ret;
+	}
+
+	bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_WRITE_PTR);
+	bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_PROD_INDEX);
+	bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_CONS_INDEX);
+	bcmgenet_rdma_ring_writel(priv, index,
+		((size << DMA_RING_SIZE_SHIFT) | RX_BUF_LENGTH),
+		DMA_RING_BUF_SIZE);
+	bcmgenet_rdma_ring_writel(priv, index, 0, DMA_START_ADDR);
+	bcmgenet_rdma_ring_writel(priv, index,
+		words_per_bd * size - 1, DMA_END_ADDR);
+	bcmgenet_rdma_ring_writel(priv, index,
+			(DMA_FC_THRESH_LO << DMA_XOFF_THRESHOLD_SHIFT) |
+			DMA_FC_THRESH_HI, RDMA_XON_XOFF_THRESH);
+	bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_READ_PTR);
+
+	return ret;
+}
+
+/* init multi xmit queues, only available for GENET2
+ * the queue is partitioned as follows:
+ *
+ * queue 0 - 3 is priority based, each one has 32 descriptors,
+ * with queue 0 being the highest priority queue.
+ *
+ * queue 16 is the default tx queue with GENET_DEFAULT_BD_CNT
+ * descriptors: 256 - (number of tx queues * bds per queues) = 128
+ * descriptors.
+ *
+ * The transmit control block pool is then partitioned as following:
+ * - tx_cbs[0...127] are for queue 16
+ * - tx_ring_cbs[0] points to tx_cbs[128..159]
+ * - tx_ring_cbs[1] points to tx_cbs[160..191]
+ * - tx_ring_cbs[2] points to tx_cbs[192..223]
+ * - tx_ring_cbs[3] points to tx_cbs[224..255]
+ */
+static void bcmgenet_init_multiq(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	unsigned int i, dma_enable;
+	u32 reg, dma_ctrl, ring_cfg = 0, dma_priority = 0;
+
+	if (!netif_is_multiqueue(dev)) {
+		netdev_warn(dev, "called with non multi queue aware HW\n");
+		return;
+	}
+
+	dma_ctrl = bcmgenet_tdma_readl(priv, DMA_CTRL);
+	dma_enable = dma_ctrl & DMA_EN;
+	dma_ctrl &= ~DMA_EN;
+	bcmgenet_tdma_writel(priv, dma_ctrl, DMA_CTRL);
+
+	/* Enable strict priority arbiter mode */
+	bcmgenet_tdma_writel(priv, DMA_ARBITER_SP, DMA_ARB_CTRL);
+
+	for (i = 0; i < priv->hw_params->tx_queues; i++) {
+		/* first 64 tx_cbs are reserved for default tx queue
+		 * (ring 16)
+		 */
+		bcmgenet_init_tx_ring(priv, i, priv->hw_params->bds_cnt,
+					i * priv->hw_params->bds_cnt,
+					(i + 1) * priv->hw_params->bds_cnt);
+
+		/* Configure ring as decriptor ring and setup priority */
+		ring_cfg |= (1 << i);
+		dma_priority |= ((GENET_Q0_PRIORITY + i) <<
+				(GENET_MAX_MQ_CNT + 1) * i);
+		dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
+	}
+
+	/* Enable rings */
+	reg = bcmgenet_tdma_readl(priv, DMA_RING_CFG);
+	reg |= ring_cfg;
+	bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
+
+	/* Use configured rings priority and set ring #16 priority */
+	reg = bcmgenet_tdma_readl(priv, DMA_RING_PRIORITY);
+	reg |= ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) << 20);
+	reg |= dma_priority;
+	bcmgenet_tdma_writel(priv, reg, DMA_PRIORITY);
+
+	/* Configure ring as descriptor ring and re-enable DMA if enabled */
+	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+	reg |= dma_ctrl;
+	if (dma_enable)
+		reg |= DMA_EN;
+	bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+}
+
+static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
+{
+	int i;
+
+	/* disable DMA */
+	bcmgenet_rdma_writel(priv, 0, DMA_CTRL);
+	bcmgenet_tdma_writel(priv, 0, DMA_CTRL);
+
+	for (i = 0; i < priv->num_tx_bds; i++) {
+		if (priv->tx_cbs[i].skb != NULL) {
+			dev_kfree_skb(priv->tx_cbs[i].skb);
+			priv->tx_cbs[i].skb = NULL;
+		}
+	}
+	bcmgenet_free_rx_buffers(priv);
+	kfree(priv->rx_cbs);
+	kfree(priv->tx_cbs);
+}
+
+/* init_edma: Initialize DMA control register */
+static int bcmgenet_init_dma(struct bcmgenet_priv *priv)
+{
+	int ret;
+
+	netif_dbg(priv, hw, priv->dev, "bcmgenet: init_edma\n");
+
+	/* by default, enable ring 16 (descriptor based) */
+	ret = bcmgenet_init_rx_ring(priv, DESC_INDEX, TOTAL_DESC);
+	if (ret) {
+		netdev_err(priv->dev, "failed to initialize RX ring\n");
+		return ret;
+	}
+
+	/* init rDma */
+	bcmgenet_rdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
+
+	/* Init tDma */
+	bcmgenet_tdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
+
+	/* Initialize commont TX ring structures */
+	priv->tx_bds = priv->base + priv->hw_params->tdma_offset;
+	priv->num_tx_bds = TOTAL_DESC;
+	priv->tx_cbs = kzalloc(priv->num_tx_bds * sizeof(struct enet_cb),
+				GFP_KERNEL);
+	if (!priv->tx_cbs) {
+		bcmgenet_fini_dma(priv);
+		return -ENOMEM;
+	}
+
+	/* initialize multi xmit queue */
+	bcmgenet_init_multiq(priv->dev);
+
+	/* initialize special ring 16 */
+	bcmgenet_init_tx_ring(priv, DESC_INDEX, GENET_DEFAULT_BD_CNT,
+			priv->hw_params->tx_queues * priv->hw_params->bds_cnt,
+			TOTAL_DESC);
+
+	return 0;
+}
+
+/* NAPI polling method*/
+static int bcmgenet_poll(struct napi_struct *napi, int budget)
+{
+	struct bcmgenet_priv *priv = container_of(napi,
+			struct bcmgenet_priv, napi);
+	unsigned int work_done;
+
+	work_done = bcmgenet_desc_rx(priv, budget);
+
+	/* tx reclaim */
+	bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[DESC_INDEX]);
+	/* Advancing our consumer index*/
+	priv->rx_c_index += work_done;
+	priv->rx_c_index &= DMA_C_INDEX_MASK;
+	bcmgenet_rdma_ring_writel(priv, DESC_INDEX,
+				priv->rx_c_index, RDMA_CONS_INDEX);
+	if (work_done < budget) {
+		napi_complete(napi);
+		bcmgenet_intrl2_0_writel(priv,
+			UMAC_IRQ_RXDMA_BDONE, INTRL2_CPU_MASK_CLEAR);
+	}
+
+	return work_done;
+}
+
+/* Interrupt bottom half */
+static void bcmgenet_irq_task(struct work_struct *work)
+{
+	struct bcmgenet_priv *priv = container_of(
+			work, struct bcmgenet_priv, bcmgenet_irq_work);
+	struct net_device *dev;
+	u32 reg;
+
+	dev = priv->dev;
+
+	netif_dbg(priv, intr, dev, "%s\n", __func__);
+	/* Cable plugged/unplugged event */
+	if (priv->phy_type == BRCM_PHY_TYPE_INT) {
+		if (priv->irq0_stat & UMAC_IRQ_PHY_DET_R) {
+			priv->irq0_stat &= ~UMAC_IRQ_PHY_DET_R;
+			netif_crit(priv, link, dev,
+				"cable plugged in, powering up\n");
+			bcmgenet_power_up(priv, GENET_POWER_CABLE_SENSE);
+		} else if (priv->irq0_stat & UMAC_IRQ_PHY_DET_F) {
+			priv->irq0_stat &= ~UMAC_IRQ_PHY_DET_F;
+			netif_crit(priv, link, dev,
+				"cable unplugged, powering down\n");
+			bcmgenet_power_down(priv, GENET_POWER_CABLE_SENSE);
+		}
+	}
+	if (priv->irq0_stat & UMAC_IRQ_MPD_R) {
+		priv->irq0_stat &= ~UMAC_IRQ_MPD_R;
+		netif_crit(priv, wol, dev,
+			"magic packet detected, waking up\n");
+		/* disable mpd interrupt */
+		bcmgenet_intrl2_0_writel(priv,
+			UMAC_IRQ_MPD_R, INTRL2_CPU_MASK_SET);
+		/* disable CRC forward.*/
+		reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+		reg &= ~CMD_CRC_FWD;
+		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+		priv->crc_fwd_en = 0;
+		bcmgenet_power_up(priv, GENET_POWER_WOL_MAGIC);
+
+	} else if (priv->irq0_stat & (UMAC_IRQ_HFB_SM | UMAC_IRQ_HFB_MM)) {
+		priv->irq0_stat &= ~(UMAC_IRQ_HFB_SM | UMAC_IRQ_HFB_MM);
+		netif_crit(priv, wol, dev,
+			"ACPI pattern matched, waking up\n");
+		/* disable HFB match interrupts */
+		bcmgenet_intrl2_0_writel(priv,
+			UMAC_IRQ_HFB_SM | UMAC_IRQ_HFB_MM, INTRL2_CPU_MASK_SET);
+		bcmgenet_power_up(priv, GENET_POWER_WOL_ACPI);
+	}
+
+	/* Link UP/DOWN event */
+	if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
+		(priv->irq0_stat & (UMAC_IRQ_LINK_UP|UMAC_IRQ_LINK_DOWN))) {
+		if (priv->phydev)
+			phy_mac_interrupt(priv->phydev,
+				(priv->irq0_stat & UMAC_IRQ_LINK_UP));
+		priv->irq0_stat &= ~(UMAC_IRQ_LINK_UP|UMAC_IRQ_LINK_DOWN);
+	}
+}
+
+/* bcmgenet_isr1: interrupt handler for ring buffer. */
+static irqreturn_t bcmgenet_isr1(int irq, void *dev_id)
+{
+	struct bcmgenet_priv *priv = dev_id;
+	unsigned int index;
+
+	/* Save irq status for bottom-half processing. */
+	priv->irq1_stat =
+		bcmgenet_intrl2_1_readl(priv, INTRL2_CPU_STAT) &
+		~priv->int1_mask;
+	/* clear inerrupts*/
+	bcmgenet_intrl2_1_writel(priv, priv->irq1_stat, INTRL2_CPU_CLEAR);
+
+	netif_dbg(priv, intr, priv->dev,
+		"%s: IRQ=0x%x\n", __func__, priv->irq1_stat);
+	/* Check the MBDONE interrupts.
+	 * packet is done, reclaim descriptors
+	 */
+	if (priv->irq1_stat & 0x0000ffff) {
+		index = 0;
+		for (index = 0; index < 16; index++) {
+			if (priv->irq1_stat & (1 << index))
+				bcmgenet_tx_reclaim(priv->dev,
+						&priv->tx_rings[index]);
+		}
+	}
+	return IRQ_HANDLED;
+}
+
+/* bcmgenet_isr0: Handle various interrupts. */
+static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
+{
+	struct bcmgenet_priv *priv = dev_id;
+
+	/* Save irq status for bottom-half processing. */
+	priv->irq0_stat =
+		bcmgenet_intrl2_0_readl(priv, INTRL2_CPU_STAT) &
+		~bcmgenet_intrl2_0_readl(priv, INTRL2_CPU_MASK_STATUS);
+	/* clear inerrupts*/
+	bcmgenet_intrl2_0_writel(priv, priv->irq0_stat, INTRL2_CPU_CLEAR);
+
+	netif_dbg(priv, intr, priv->dev,
+		"IRQ=0x%x\n", priv->irq0_stat);
+
+	if (priv->irq0_stat & (UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE)) {
+		/* We use NAPI(software interrupt throttling, if
+		 * Rx Descriptor throttling is not used.
+		 * Disable interrupt, will be enabled in the poll method.
+		 */
+		if (likely(napi_schedule_prep(&priv->napi))) {
+			bcmgenet_intrl2_0_writel(priv,
+				UMAC_IRQ_RXDMA_BDONE, INTRL2_CPU_MASK_SET);
+			__napi_schedule(&priv->napi);
+		}
+	}
+	if (priv->irq0_stat &
+			(UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE)) {
+		/* Tx reclaim */
+		bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[DESC_INDEX]);
+	}
+	if (priv->irq0_stat & (UMAC_IRQ_PHY_DET_R |
+				UMAC_IRQ_PHY_DET_F |
+				UMAC_IRQ_LINK_UP |
+				UMAC_IRQ_LINK_DOWN |
+				UMAC_IRQ_HFB_SM |
+				UMAC_IRQ_HFB_MM |
+				UMAC_IRQ_MPD_R)) {
+		/* all other interested interrupts handled in bottom half */
+		schedule_work(&priv->bcmgenet_irq_work);
+	}
+
+	if ((priv->hw_params->flags & GENET_HAS_MDIO_INTR) &&
+		priv->irq0_stat & (UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR)) {
+		priv->irq0_stat &= ~(UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR);
+		wake_up(&priv->wq);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void bcmgenet_umac_reset(struct bcmgenet_priv *priv)
+{
+	u32 reg;
+
+	reg = bcmgenet_rbuf_ctrl_get(priv);
+	reg |= BIT(1);
+	bcmgenet_rbuf_ctrl_set(priv, reg);
+	udelay(10);
+
+	reg &= ~BIT(1);
+	bcmgenet_rbuf_ctrl_set(priv, reg);
+	udelay(10);
+}
+
+static void bcmgenet_set_hw_addr(struct bcmgenet_priv *priv,
+				  unsigned char *addr)
+{
+	bcmgenet_umac_writel(priv, (addr[0] << 24) | (addr[1] << 16) |
+			(addr[2] << 8) | addr[3], UMAC_MAC0);
+	bcmgenet_umac_writel(priv, (addr[4] << 8) | addr[5], UMAC_MAC1);
+}
+
+static int bcmgenet_wol_resume(struct bcmgenet_priv *priv)
+{
+	int ret;
+
+	/* From WOL-enabled suspend, switch to regular clock */
+	clk_disable(priv->clk_wol);
+	/* init umac registers to synchronize s/w with h/w */
+	ret = init_umac(priv);
+	if (ret)
+		return ret;
+
+	if (priv->phydev)
+		phy_init_hw(priv->phydev);
+	/* Speed settings must be restored */
+	bcmgenet_mii_config(priv->dev);
+
+	return 0;
+}
+
+/* Returns a reusable dma control register value */
+static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv)
+{
+	u32 reg;
+	u32 dma_ctrl;
+
+	/* disable DMA */
+	dma_ctrl = 1 << (DESC_INDEX + DMA_RING_BUF_EN_SHIFT) | DMA_EN;
+	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+	reg &= ~dma_ctrl;
+	bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+
+	reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+	reg &= ~dma_ctrl;
+	bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+	bcmgenet_umac_writel(priv, 1, UMAC_TX_FLUSH);
+	udelay(10);
+	bcmgenet_umac_writel(priv, 0, UMAC_TX_FLUSH);
+
+	return dma_ctrl;
+}
+
+static void bcmgenet_enable_dma(struct bcmgenet_priv *priv, u32 dma_ctrl)
+{
+	u32 reg;
+
+	reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+	reg |= dma_ctrl;
+	bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+	reg |= dma_ctrl;
+	bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+}
+
+static int bcmgenet_open(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	unsigned long dma_ctrl;
+	u32 reg;
+	int ret;
+
+	netif_dbg(priv, ifup, dev, "bcmgenet_open\n");
+
+	/* Turn on the clock */
+	if (!IS_ERR(priv->clk))
+		clk_prepare_enable(priv->clk);
+
+	/* take MAC out of reset */
+	bcmgenet_umac_reset(priv);
+
+	ret = init_umac(priv);
+	if (ret)
+		goto err_clk_disable;
+
+	/* disable ethernet MAC while updating its registers */
+	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+	reg &= ~(CMD_TX_EN | CMD_RX_EN);
+	bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+	bcmgenet_set_hw_addr(priv, dev->dev_addr);
+
+	if (priv->wol_enabled) {
+		ret = bcmgenet_wol_resume(priv);
+		if (ret)
+			return ret;
+	}
+
+	if (priv->phy_type == BRCM_PHY_TYPE_INT) {
+		reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
+		reg |= EXT_ENERGY_DET_MASK;
+		bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
+	}
+
+	if (test_and_clear_bit(GENET_POWER_WOL_MAGIC, &priv->wol_enabled))
+		bcmgenet_power_up(priv, GENET_POWER_WOL_MAGIC);
+	if (test_and_clear_bit(GENET_POWER_WOL_ACPI, &priv->wol_enabled))
+		bcmgenet_power_up(priv, GENET_POWER_WOL_ACPI);
+
+	/* Disable RX/TX DMA and flush TX queues */
+	dma_ctrl = bcmgenet_dma_disable(priv);
+
+	/* Reinitialize TDMA and RDMA and SW housekeeping */
+	ret = bcmgenet_init_dma(priv);
+	if (ret) {
+		netdev_err(dev, "failed to initialize DMA\n");
+		goto err_fini_dma;
+	}
+
+	/* Always enable ring 16 - descriptor ring */
+	bcmgenet_enable_dma(priv, dma_ctrl);
+
+	ret = request_irq(priv->irq0, bcmgenet_isr0, IRQF_SHARED,
+			dev->name, priv);
+	if (ret < 0) {
+		netdev_err(dev, "can't request IRQ %d\n", priv->irq0);
+		goto err_fini_dma;
+	}
+
+	ret = request_irq(priv->irq1, bcmgenet_isr1, IRQF_SHARED,
+				dev->name, priv);
+	if (ret < 0) {
+		netdev_err(dev, "can't request IRQ %d\n", priv->irq1);
+		goto err_irq0;
+	}
+
+	/* Start the network engine */
+	napi_enable(&priv->napi);
+
+	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+	reg |= (CMD_TX_EN | CMD_RX_EN);
+	bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+	/* Make sure we reflect the value of CRC_CMD_FWD */
+	priv->crc_fwd_en = !!(reg & CMD_CRC_FWD);
+
+	device_set_wakeup_capable(&dev->dev, 1);
+
+	if (priv->phy_type == BRCM_PHY_TYPE_INT)
+		bcmgenet_power_up(priv, GENET_POWER_PASSIVE);
+
+	netif_tx_start_all_queues(dev);
+
+	if (priv->phydev)
+		phy_start(priv->phydev);
+
+	return 0;
+
+err_irq0:
+	free_irq(priv->irq0, dev);
+err_fini_dma:
+	bcmgenet_fini_dma(priv);
+err_clk_disable:
+	if (!IS_ERR(priv->clk))
+		clk_disable_unprepare(priv->clk);
+	return ret;
+}
+
+static int bcmgenet_dma_teardown(struct bcmgenet_priv *priv)
+{
+	int timeout = 0;
+	u32 reg;
+
+	/* Disable TDMA to stop add more frames in TX DMA */
+	reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
+	reg &= ~DMA_EN;
+	bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
+
+	/* Check TDMA status register to confirm TDMA is disabled */
+	while (!(bcmgenet_tdma_readl(priv, DMA_STATUS) & DMA_DISABLED)) {
+		if (timeout++ == 5000) {
+			netdev_warn(priv->dev,
+				"Timed out while disabling TX DMA\n");
+			return -ETIMEDOUT;
+		}
+		udelay(1);
+	}
+
+	/* Wait 10ms for packet drain in both tx and rx dma */
+	usleep_range(10000, 20000);
+
+	/* Disable RDMA */
+	reg = bcmgenet_rdma_readl(priv, DMA_CTRL);
+	reg &= ~DMA_EN;
+	bcmgenet_rdma_writel(priv, reg, DMA_CTRL);
+
+	timeout = 0;
+	/* Check RDMA status register to confirm RDMA is disabled */
+	while (!(bcmgenet_rdma_readl(priv, DMA_STATUS) & DMA_DISABLED)) {
+		if (timeout++ == 5000) {
+			netdev_warn(priv->dev,
+				"Timed out while disabling RX DMA\n");
+			return -ETIMEDOUT;
+		}
+		udelay(1);
+	}
+
+	return 0;
+}
+
+static int bcmgenet_close(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	int ret;
+	u32 reg;
+
+	netif_dbg(priv, ifdown, dev, "bcmgenet_close\n");
+
+	if (priv->phydev)
+		phy_stop(priv->phydev);
+
+	/* Disable MAC receive */
+	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+	reg &= ~CMD_RX_EN;
+	bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+	netif_tx_stop_all_queues(dev);
+
+	ret = bcmgenet_dma_teardown(priv);
+	if (ret)
+		return ret;
+
+	/* Disable MAC transmit. TX DMA disabled have to done before this */
+	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+	reg &= ~CMD_TX_EN;
+	bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+
+	napi_disable(&priv->napi);
+
+	/* tx reclaim */
+	bcmgenet_tx_reclaim_all(dev);
+	bcmgenet_fini_dma(priv);
+
+	free_irq(priv->irq0, priv);
+	free_irq(priv->irq1, priv);
+
+	/* Wait for pending work items to complete - we are stopping
+	 * the clock now. Since interrupts are disabled, no new work
+	 * will be scheduled.
+	 */
+	cancel_work_sync(&priv->bcmgenet_irq_work);
+
+	if (device_may_wakeup(&dev->dev)) {
+		if (priv->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE))
+			bcmgenet_power_down(priv, GENET_POWER_WOL_MAGIC);
+		if (priv->wolopts & WAKE_ARP)
+			bcmgenet_power_down(priv, GENET_POWER_WOL_ACPI);
+	} else if (priv->phy_type == BRCM_PHY_TYPE_INT)
+		bcmgenet_power_down(priv, GENET_POWER_PASSIVE);
+
+	if (priv->wol_enabled)
+		clk_enable(priv->clk_wol);
+
+	if (!IS_ERR(priv->clk))
+		clk_disable_unprepare(priv->clk);
+
+	return 0;
+}
+
+static void bcmgenet_timeout(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+
+	BUG_ON(dev == NULL);
+
+	netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
+
+	dev->trans_start = jiffies;
+
+	dev->stats.tx_errors++;
+
+	netif_tx_wake_all_queues(dev);
+}
+
+#define MAX_MC_COUNT	16
+
+static inline void bcmgenet_set_mdf_addr(struct bcmgenet_priv *priv,
+					 unsigned char *addr,
+					 int *i,
+					 int *mc)
+{
+	u32 reg;
+
+	bcmgenet_umac_writel(priv, addr[0] << 8 | addr[1],
+			UMAC_MDF_ADDR + (*i * 4));
+	bcmgenet_umac_writel(priv,
+			addr[2] << 24 | addr[3] << 16 |
+			addr[4] << 8 | addr[5],
+			UMAC_MDF_ADDR + ((*i + 1) * 4));
+	reg = bcmgenet_umac_readl(priv, UMAC_MDF_CTRL);
+	reg |= (1 << (MAX_MC_COUNT - *mc));
+	bcmgenet_umac_writel(priv, reg, UMAC_MDF_CTRL);
+	*i += 2;
+	(*mc)++;
+}
+
+static void bcmgenet_set_rx_mode(struct net_device *dev)
+{
+	struct bcmgenet_priv *priv = netdev_priv(dev);
+	struct netdev_hw_addr *ha;
+	int i, mc;
+	u32 reg;
+
+	netif_dbg(priv, hw, dev, "%s: %08X\n", __func__, dev->flags);
+
+	/* Promiscous mode */
+	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
+	if (dev->flags & IFF_PROMISC) {
+		reg |= CMD_PROMISC;
+		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+		bcmgenet_umac_writel(priv, 0, UMAC_MDF_CTRL);
+		return;
+	} else {
+		reg &= ~CMD_PROMISC;
+		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
+	}
+
+	/* UniMac doesn't support ALLMULTI */
+	if (dev->flags & IFF_ALLMULTI)
+		return;
+
+	/* update MDF filter */
+	i = 0;
+	mc = 0;
+	/* Broadcast */
+	bcmgenet_set_mdf_addr(priv, dev->broadcast, &i, &mc);
+	/* my own address.*/
+	bcmgenet_set_mdf_addr(priv, dev->dev_addr, &i, &mc);
+	/* Unicast list*/
+	if (netdev_uc_count(dev) > (MAX_MC_COUNT - mc))
+		return;
+
+	if (!netdev_uc_empty(dev))
+		netdev_for_each_uc_addr(ha, dev)
+			bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
+	/* Multicast */
+	if (netdev_mc_empty(dev) || netdev_mc_count(dev) >= (MAX_MC_COUNT - mc))
+		return;
+
+	netdev_for_each_mc_addr(ha, dev)
+		bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
+}
+
+/* Set the hardware MAC address. */
+static int bcmgenet_set_mac_addr(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+
+	if (netif_running(dev))
+		return -EBUSY;
+
+	ether_addr_copy(dev->dev_addr, addr->sa_data);
+
+	return 0;
+}
+
+static u16 bcmgenet_select_queue(struct net_device *dev,
+		struct sk_buff *skb, void *accel_priv)
+{
+	return netif_is_multiqueue(dev) ? skb->queue_mapping : 0;
+}
+
+static const struct net_device_ops bcmgenet_netdev_ops = {
+	.ndo_open = bcmgenet_open,
+	.ndo_stop = bcmgenet_close,
+	.ndo_start_xmit = bcmgenet_xmit,
+	.ndo_select_queue = bcmgenet_select_queue,
+	.ndo_tx_timeout = bcmgenet_timeout,
+	.ndo_set_rx_mode = bcmgenet_set_rx_mode,
+	.ndo_set_mac_address = bcmgenet_set_mac_addr,
+	.ndo_do_ioctl = bcmgenet_ioctl,
+	.ndo_set_features = bcmgenet_set_features,
+};
+
+/* Array of GENET hardware parameters/characteristics */
+static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
+	[GENET_V1] = {
+		.tx_queues = 0,
+		.rx_queues = 0,
+		.bds_cnt = 0,
+		.bp_in_en_shift = 16,
+		.bp_in_mask = 0xffff,
+		.hfb_filter_cnt = 16,
+		.qtag_mask = 0x1F,
+		.hfb_offset = 0x1000,
+		.rdma_offset = 0x2000,
+		.tdma_offset = 0x3000,
+		.words_per_bd = 2,
+	},
+	[GENET_V2] = {
+		.tx_queues = 4,
+		.rx_queues = 4,
+		.bds_cnt = 32,
+		.bp_in_en_shift = 16,
+		.bp_in_mask = 0xffff,
+		.hfb_filter_cnt = 16,
+		.qtag_mask = 0x1F,
+		.tbuf_offset = 0x0600,
+		.hfb_offset = 0x1000,
+		.hfb_reg_offset = 0x2000,
+		.rdma_offset = 0x3000,
+		.tdma_offset = 0x4000,
+		.words_per_bd = 2,
+		.flags = GENET_HAS_EXT,
+	},
+	[GENET_V3] = {
+		.tx_queues = 4,
+		.rx_queues = 4,
+		.bds_cnt = 32,
+		.bp_in_en_shift = 17,
+		.bp_in_mask = 0x1ffff,
+		.hfb_filter_cnt = 48,
+		.qtag_mask = 0x3F,
+		.tbuf_offset = 0x0600,
+		.hfb_offset = 0x8000,
+		.hfb_reg_offset = 0xfc00,
+		.rdma_offset = 0x10000,
+		.tdma_offset = 0x11000,
+		.words_per_bd = 2,
+		.flags = GENET_HAS_EXT | GENET_HAS_MDIO_INTR,
+	},
+	[GENET_V4] = {
+		.tx_queues = 4,
+		.rx_queues = 4,
+		.bds_cnt = 32,
+		.bp_in_en_shift = 17,
+		.bp_in_mask = 0x1ffff,
+		.hfb_filter_cnt = 48,
+		.qtag_mask = 0x3F,
+		.tbuf_offset = 0x0600,
+		.hfb_offset = 0x8000,
+		.hfb_reg_offset = 0xfc00,
+		.rdma_offset = 0x2000,
+		.tdma_offset = 0x4000,
+		.words_per_bd = 3,
+		.flags = GENET_HAS_40BITS | GENET_HAS_EXT | GENET_HAS_MDIO_INTR,
+	},
+};
+
+/* Infer hardware parameters from the detected GENET version */
+static void bcmgenet_set_hw_params(struct bcmgenet_priv *priv)
+{
+	struct bcmgenet_hw_params *params;
+	u32 reg;
+	u8 major;
+
+	if (GENET_IS_V4(priv)) {
+		bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
+		genet_dma_ring_regs = genet_dma_ring_regs_v4;
+		priv->dma_rx_chk_bit = DMA_RX_CHK_V3PLUS;
+		priv->version = GENET_V4;
+	} else if (GENET_IS_V3(priv)) {
+		bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
+		genet_dma_ring_regs = genet_dma_ring_regs_v123;
+		priv->dma_rx_chk_bit = DMA_RX_CHK_V3PLUS;
+		priv->version = GENET_V3;
+	} else if (GENET_IS_V2(priv)) {
+		bcmgenet_dma_regs = bcmgenet_dma_regs_v2;
+		genet_dma_ring_regs = genet_dma_ring_regs_v123;
+		priv->dma_rx_chk_bit = DMA_RX_CHK_V12;
+		priv->version = GENET_V2;
+	} else if (GENET_IS_V1(priv)) {
+		bcmgenet_dma_regs = bcmgenet_dma_regs_v1;
+		genet_dma_ring_regs = genet_dma_ring_regs_v123;
+		priv->dma_rx_chk_bit = DMA_RX_CHK_V12;
+		priv->version = GENET_V1;
+	}
+
+	/* enum genet_version starts at 1 */
+	priv->hw_params = &bcmgenet_hw_params[priv->version];
+	params = priv->hw_params;
+
+	/* Read GENET HW version */
+	reg = bcmgenet_sys_readl(priv, SYS_REV_CTRL);
+	major = (reg >> 24 & 0x0f);
+	if (major == 5)
+		major = 4;
+	else if (major == 0)
+		major = 1;
+	if (major != priv->version) {
+		dev_err(&priv->pdev->dev,
+			"GENET version mismatch, got: %d, configured for: %d\n",
+			major, priv->version);
+	}
+
+	/* Print the GENET core version */
+	dev_info(&priv->pdev->dev, "GENET " GENET_VER_FMT,
+		major, (reg >> 16) & 0x0f, reg & 0xffff);
+
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+	if (!(params->flags & GENET_HAS_40BITS))
+		pr_warn("GENET does not support 40-bits PA\n");
+#endif
+
+	pr_debug("Configuration for version: %d\n"
+		"TXq: %1d, RXq: %1d, BDs: %1d\n"
+		"BP << en: %2d, BP msk: 0x%05x\n"
+		"HFB count: %2d, QTAQ msk: 0x%05x\n"
+		"TBUF: 0x%04x, HFB: 0x%04x, HFBreg: 0x%04x\n"
+		"RDMA: 0x%05x, TDMA: 0x%05x\n"
+		"Words/BD: %d\n",
+		priv->version,
+		params->tx_queues, params->rx_queues, params->bds_cnt,
+		params->bp_in_en_shift, params->bp_in_mask,
+		params->hfb_filter_cnt, params->qtag_mask,
+		params->tbuf_offset, params->hfb_offset,
+		params->hfb_reg_offset,
+		params->rdma_offset, params->tdma_offset,
+		params->words_per_bd);
+}
+
+static int bcmgenet_drv_probe(struct platform_device *pdev)
+{
+	struct device_node *dn = pdev->dev.of_node;
+	struct bcmgenet_priv *priv;
+	struct net_device *dev;
+	const void *macaddr;
+	struct resource *r;
+	int err = -EIO;
+
+	/* Up to GENET_MAX_MQ_CNT + 1 TX queues and a single RX queue */
+	dev = alloc_etherdev_mqs(sizeof(*priv), GENET_MAX_MQ_CNT + 1, 1);
+	if (!dev) {
+		dev_err(&pdev->dev, "can't allocate net device\n");
+		return -ENOMEM;
+	}
+
+	priv = netdev_priv(dev);
+	priv->irq0 = platform_get_irq(pdev, 0);
+	priv->irq1 = platform_get_irq(pdev, 1);
+	if (!priv->irq0 || !priv->irq1) {
+		dev_err(&pdev->dev, "can't find IRQs\n");
+		err = -EINVAL;
+		goto err;
+	}
+
+	macaddr = of_get_mac_address(dn);
+	if (!macaddr) {
+		dev_err(&pdev->dev, "can't find MAC address\n");
+		err = -EINVAL;
+		goto err;
+	}
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_request_and_ioremap(&pdev->dev, r);
+	if (!priv->base) {
+		dev_err(&pdev->dev, "can't ioremap\n");
+		err = -EINVAL;
+		goto err;
+	}
+
+	dev->base_addr = (unsigned long)priv->base;
+	SET_NETDEV_DEV(dev, &pdev->dev);
+	dev_set_drvdata(&pdev->dev, dev);
+	ether_addr_copy(dev->dev_addr, macaddr);
+	dev->irq = priv->irq0;
+	dev->watchdog_timeo = 2 * HZ;
+	SET_ETHTOOL_OPS(dev, &bcmgenet_ethtool_ops);
+	dev->netdev_ops = &bcmgenet_netdev_ops;
+	netif_napi_add(dev, &priv->napi, bcmgenet_poll, 64);
+
+	priv->msg_enable = netif_msg_init(debug, GENET_MSG_DEFAULT);
+
+	/* Set hardware features */
+	dev->hw_features |= NETIF_F_SG | NETIF_F_IP_CSUM |
+		NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM;
+
+	/* Set the needed headroom to account for any possible
+	 * features enabling/disabling at runtime
+	 */
+	dev->needed_headroom += 64;
+
+	netdev_boot_setup_check(dev);
+
+	priv->dev = dev;
+	priv->pdev = pdev;
+
+	if (of_device_is_compatible(dn, "brcm,genet-v4"))
+		priv->version = GENET_V4;
+	else if (of_device_is_compatible(dn, "brcm,genet-v3"))
+		priv->version = GENET_V3;
+	else if (of_device_is_compatible(dn, "brcm,genet-v2"))
+		priv->version = GENET_V2;
+	else if (of_device_is_compatible(dn, "brcm,genet-v1"))
+		priv->version = GENET_V1;
+	else {
+		dev_err(&pdev->dev, "unknown GENET version\n");
+		err = -EINVAL;
+		goto err;
+	}
+
+	bcmgenet_set_hw_params(priv);
+
+	spin_lock_init(&priv->lock);
+	spin_lock_init(&priv->bh_lock);
+	mutex_init(&priv->mib_mutex);
+	/* Mii wait queue */
+	init_waitqueue_head(&priv->wq);
+	/* Always use RX_BUF_LENGTH (2KB) buffer for all chips */
+	priv->rx_buf_len = RX_BUF_LENGTH;
+	INIT_WORK(&priv->bcmgenet_irq_work, bcmgenet_irq_task);
+
+	priv->clk = devm_clk_get(&priv->pdev->dev, "enet");
+	if (IS_ERR(priv->clk))
+		dev_warn(&priv->pdev->dev, "failed to get enet clock\n");
+
+	priv->clk_wol = devm_clk_get(&priv->pdev->dev, "enet-wol");
+	if (IS_ERR(priv->clk_wol))
+		dev_warn(&priv->pdev->dev, "failed to get enet-wol clock\n");
+
+	if (!IS_ERR(priv->clk))
+		clk_prepare_enable(priv->clk);
+
+	err = reset_umac(priv);
+	if (err)
+		goto err_clk_disable;
+
+	err = bcmgenet_mii_init(dev);
+	if (err)
+		goto err_clk_disable;
+
+	/* setup number of real queues  + 1 (GENET_V1 has 0 hardware queues
+	 * just the ring 16 descriptor based TX
+	 */
+	netif_set_real_num_tx_queues(priv->dev, priv->hw_params->tx_queues + 1);
+	netif_set_real_num_rx_queues(priv->dev, priv->hw_params->rx_queues + 1);
+
+	err = register_netdev(dev);
+	if (err)
+		goto err_clk_disable;
+
+	/* Turn off the clocks */
+	if (!IS_ERR(priv->clk))
+		clk_disable_unprepare(priv->clk);
+
+	return err;
+
+err_clk_disable:
+	if (!IS_ERR(priv->clk))
+		clk_disable_unprepare(priv->clk);
+err:
+	free_netdev(dev);
+	return err;
+}
+
+static int bcmgenet_drv_remove(struct platform_device *pdev)
+{
+	struct bcmgenet_priv *priv = dev_to_priv(&pdev->dev);
+
+	dev_set_drvdata(&pdev->dev, NULL);
+	unregister_netdev(priv->dev);
+	bcmgenet_mii_exit(priv->dev);
+	free_netdev(priv->dev);
+
+	return 0;
+}
+
+static const struct of_device_id bcmgenet_match[] = {
+	{ .compatible = "brcm,genet-v1", },
+	{ .compatible = "brcm,genet-v2", },
+	{ .compatible = "brcm,genet-v3", },
+	{ .compatible = "brcm,genet-v4", },
+	{ },
+};
+
+static struct platform_driver bcmgenet_plat_drv = {
+	.probe	= bcmgenet_drv_probe,
+	.remove	= bcmgenet_drv_remove,
+	.driver	= {
+		.name	= "bcmgenet",
+		.owner	= THIS_MODULE,
+		.of_match_table = bcmgenet_match,
+	},
+};
+
+static int bcmgenet_module_init(void)
+{
+	platform_driver_register(&bcmgenet_plat_drv);
+	return 0;
+}
+
+static void bcmgenet_module_cleanup(void)
+{
+	platform_driver_unregister(&bcmgenet_plat_drv);
+}
+
+module_init(bcmgenet_module_init);
+module_exit(bcmgenet_module_cleanup);
+MODULE_LICENSE("GPL");
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 01/10] net: phy: add "internal" PHY mode
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli
In-Reply-To: <1392178053-3143-1-git-send-email-f.fainelli@gmail.com>

On some systems, the PHY can be internal, in the same package as the
Ethernet MAC, and still be responding to a specific address on the MDIO
bus, in that case, the Ethernet MAC might need to know about it to
properly configure a port multiplexer to switch to an internal or
external PHY. Add a new PHY interface mode for this and update the
Device Tree of_get_phy_mode() function to look for it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/of/of_net.c | 1 +
 include/linux/phy.h | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
index a208a45..729beba 100644
--- a/drivers/of/of_net.c
+++ b/drivers/of/of_net.c
@@ -31,6 +31,7 @@ static const char *phy_modes[] = {
 	[PHY_INTERFACE_MODE_RTBI]	= "rtbi",
 	[PHY_INTERFACE_MODE_SMII]	= "smii",
 	[PHY_INTERFACE_MODE_XGMII]	= "xgmii",
+	[PHY_INTERFACE_MODE_INTERNAL]	= "internal",
 };
 
 /**
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 565188c..463434b 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -74,6 +74,7 @@ typedef enum {
 	PHY_INTERFACE_MODE_RTBI,
 	PHY_INTERFACE_MODE_SMII,
 	PHY_INTERFACE_MODE_XGMII,
+	PHY_INTERFACE_MODE_INTERNAL,
 } phy_interface_t;
 
 
@@ -553,7 +554,8 @@ static inline bool phy_interrupt_is_valid(struct phy_device *phydev)
  */
 static inline bool phy_is_internal(struct phy_device *phydev)
 {
-	return phydev->is_internal;
+	return phydev->is_internal ||
+		phydev->interface == PHY_INTERFACE_MODE_INTERNAL;
 }
 
 /**
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next 00/10] Support for Broadcom GENET driver
From: Florian Fainelli @ 2014-02-12  4:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, cernekee, devicetree, Florian Fainelli

Hi all,

This patchset adds support for the Broadcom GENET Gigabit Ethernet MAC
controller. This controller is found on the Broadcom BCM7xxx Set Top Box
System-on-a-Chip.

Florian Fainelli (10):
  net: phy: add "internal" PHY mode
  net: phy: add MoCA PHY type
  net: phy: update port type for MoCA PHYs
  net: phy: add Broadcom BCM7xxx internal PHY driver
  net: bcmgenet: add driver definitions and private structure
  net: bcmgenet: add main driver file
  net: bcmgenet: add MDIO routines
  net: bcmgenet: hook into the build system
  Documentation: add Device tree bindings for Broadcom GENET
  MAINTAINERS: add entry for the Broadcom GENET driver

 .../devicetree/bindings/net/broadcom-bcmgenet.txt  |  111 +
 MAINTAINERS                                        |    6 +
 drivers/net/ethernet/broadcom/Kconfig              |   10 +
 drivers/net/ethernet/broadcom/Makefile             |    1 +
 drivers/net/ethernet/broadcom/genet/Makefile       |    2 +
 drivers/net/ethernet/broadcom/genet/bcmgenet.c     | 2685 ++++++++++++++++++++
 drivers/net/ethernet/broadcom/genet/bcmgenet.h     |  631 +++++
 drivers/net/ethernet/broadcom/genet/bcmmii.c       |  483 ++++
 drivers/net/phy/Kconfig                            |    6 +
 drivers/net/phy/Makefile                           |    1 +
 drivers/net/phy/bcm7xxx.c                          |  322 +++
 drivers/net/phy/phy.c                              |    5 +-
 drivers/of/of_net.c                                |    2 +
 include/linux/brcmphy.h                            |    9 +
 include/linux/phy.h                                |    5 +-
 15 files changed, 4277 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
 create mode 100644 drivers/net/ethernet/broadcom/genet/Makefile
 create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.c
 create mode 100644 drivers/net/ethernet/broadcom/genet/bcmgenet.h
 create mode 100644 drivers/net/ethernet/broadcom/genet/bcmmii.c
 create mode 100644 drivers/net/phy/bcm7xxx.c

-- 
1.8.3.2

^ permalink raw reply

* Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
From: Eric Dumazet @ 2014-02-12  4:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Sander Eikelenboom, Konrad Rzeszutek Wilk, Wei Liu,
	Francois Romieu, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Dave Jones
In-Reply-To: <CAPcyv4juAS0ODRPKE-_wXYu5bixLnvsgs3jgo4vzWcatfnyVsw@mail.gmail.com>

On Tue, 2014-02-11 at 18:07 -0800, Dan Williams wrote:

> The overlap granularity is too large.  Multiple dma_map_single
> mappings are allowed to a given page as long as they don't collide on
> the same cache line.
> 

I am not sure why you try number of mappings of a page.

Try launching 100 concurrent netperf -t TCP_SENFILE

Same page might be mapped more than 100 times, more than 10000 times in
some cases.

^ permalink raw reply

* Re: [PATCH net] net: Clear local_df only if crossing namespace.
From: Pravin Shelar @ 2014-02-12  4:26 UTC (permalink / raw)
  To: Pravin Shelar, David Miller, netdev, Templin, Fred L,
	Nicolas Dichtel
In-Reply-To: <20140211021150.GB11150@order.stressinduktion.org>

On Mon, Feb 10, 2014 at 6:11 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Mon, Feb 10, 2014 at 01:00:14PM -0800, Pravin Shelar wrote:
>> On Fri, Feb 7, 2014 at 4:58 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > May I know because of wich vport, vxlan or gre, you did this change?
>> >
>> It affects both gre and vxlan.
>
> Ok, thanks.
>
>> > I am feeling a bit uncomfortable handling remote and local packets that
>> > differently on lower tunnel output (local_df is mostly set on locally
>> > originating packets).
>>
>> For ip traffic it make sense to turn on local_df only for local
>> traffic, since for remote case we can send icmp (frag-needed) back to
>> source. No such thing exist for OVS tunnels. ICMP packet are not
>> returned to source for the tunnels. That is why to be on safe side,
>> local_df is turned on for tunnels in OVS.
>
> I have a proposal:
>
> I don't like it that much because of the many arguments. But I currently
> don't see another easy solution. Maybe we should make bool xnet an enum and
> test with bitops?
>
> I left the clearing of local_df in skb_scrub_packet as we need it for the
> dev_forward_skb case and it should be done that in any case.
>
> This diff is slightly compile tested. ;)
>
> I can test and make proper submit if you agree.
>
> What do you think?
>

I am not sure why the caller can not just set skb->local_df before
calling iptunnel_xmit() rather than passing extra arg to this
function?
There are not that many caller of this function.

Thanks,
Pravin.

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 026a313..630e72f 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1657,7 +1657,7 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>                 return err;
>
>         return iptunnel_xmit(rt, skb, src, dst, IPPROTO_UDP, tos, ttl, df,
> -                            false);
> +                            false, false);
>  }
>  EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
>
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index 48ed75c..8863002 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -154,7 +154,8 @@ static inline u8 ip_tunnel_ecn_encap(u8 tos, const struct iphdr *iph,
>  int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto);
>  int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
>                   __be32 src, __be32 dst, __u8 proto,
> -                 __u8 tos, __u8 ttl, __be16 df, bool xnet);
> +                 __u8 tos, __u8 ttl, __be16 df, bool xnet,
> +                 bool clear_local_df);
>
>  struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, bool gre_csum,
>                                          int gso_type_mask);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 8f519db..5773681 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3903,12 +3903,13 @@ EXPORT_SYMBOL(skb_try_coalesce);
>   */
>  void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>  {
> -       if (xnet)
> +       if (xnet) {
>                 skb_orphan(skb);
> +               skb->local_df = 0;
> +       }
>         skb->tstamp.tv64 = 0;
>         skb->pkt_type = PACKET_HOST;
>         skb->skb_iif = 0;
> -       skb->local_df = 0;
>         skb_dst_drop(skb);
>         skb->mark = 0;
>         secpath_reset(skb);
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index c0e3cb7..2922ec9 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -721,7 +721,8 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
>         }
>
>         err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, protocol,
> -                           tos, ttl, df, !net_eq(tunnel->net, dev_net(dev)));
> +                           tos, ttl, df, !net_eq(tunnel->net, dev_net(dev)),
> +                           true);
>         iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
>
>         return;
> diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
> index 6156f4e..93beb04 100644
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -48,13 +48,16 @@
>
>  int iptunnel_xmit(struct rtable *rt, struct sk_buff *skb,
>                   __be32 src, __be32 dst, __u8 proto,
> -                 __u8 tos, __u8 ttl, __be16 df, bool xnet)
> +                 __u8 tos, __u8 ttl, __be16 df, bool xnet,
> +                 bool clear_df)
>  {
>         int pkt_len = skb->len;
>         struct iphdr *iph;
>         int err;
>
>         skb_scrub_packet(skb, xnet);
> +       if (clear_df)
> +               skb->local_df = 0;
>
>         skb_clear_hash(skb);
>         skb_dst_set(skb, &rt->dst);
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 3dfbcf1..cc0be0e 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -974,7 +974,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
>         }
>
>         err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, IPPROTO_IPV6, tos,
> -                           ttl, df, !net_eq(tunnel->net, dev_net(dev)));
> +                           ttl, df, !net_eq(tunnel->net, dev_net(dev)), true);
>         iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
>         return NETDEV_TX_OK;
>

^ permalink raw reply

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Jason Wang @ 2014-02-12  5:28 UTC (permalink / raw)
  To: Qin Chuanyu, davem
  Cc: Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
	Eric Dumazet
In-Reply-To: <52FA32C5.9040601@huawei.com>

On 02/11/2014 10:25 PM, Qin Chuanyu wrote:
> we could xmit directly instead of going through softirq to gain
> throughput and lantency improved.
> test model: VM-Host-Host just do transmit. with vhost thread and nic
> interrupt bind cpu1. netperf do throuhput test and qperf do lantency
> test.
> Host OS: suse11sp3, Guest OS: suse11sp3
>
> latency result(us):
> packet_len 64 256 512 1460
> old(UDP)   44  47  48   66
> new(UDP)   38  41  42   66
>
> old(TCP)   52  55  70  117
> new(TCP)   45  48  61  114
>
> throughput result(Gbit/s):
> packet_len   64   512   1024   1460
> old(UDP)   0.42  2.02   3.75   4.68
> new(UDP)   0.45  2.14   3.77   5.06
>
> TCP due to the latency, client couldn't send packet big enough
> to get benefit from TSO of nic, so the result show it will send
> more packet per sencond but get lower throughput.
>
> Eric mentioned that it would has problem with cgroup, but the patch
> had been sent by Herbert Xu.
> patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
>
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> ---

A question: without NAPI weight, could this starve other net devices?
>  drivers/net/tun.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44c4db8..90b4e58 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
> *tun, struct tun_file *tfile,
>      skb_probe_transport_header(skb, 0);
>
>      rxhash = skb_get_hash(skb);
> -    netif_rx_ni(skb);
> +    rcu_read_lock_bh();
> +    netif_receive_skb(skb);
> +    rcu_read_unlock_bh();
>
>      tun->dev->stats.rx_packets++;
>      tun->dev->stats.rx_bytes += len;

^ permalink raw reply

* [PATCH net] virtio-net: alloc big buffers also when guest can receive UFO
From: Jason Wang @ 2014-02-12  5:43 UTC (permalink / raw)
  To: rusty, mst, virtio-dev, virtualization, netdev, linux-kernel
  Cc: Sridhar Samudrala

We should alloc big buffers also when guest can receive UFO
pakcets. Otherwise the big packets will be truncated when mergeable rx
buffer is disabled.

Fixes 5c5167515d80f78f6bb538492c423adcae31ad65
(virtio-net: Allow UFO feature to be set and advertised.)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d75f8ed..5632a99 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1711,7 +1711,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
-	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
 		vi->big_packets = true;
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Eric Dumazet @ 2014-02-12  5:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Qin Chuanyu, davem, Michael S. Tsirkin, Anthony Liguori, KVM list,
	netdev
In-Reply-To: <52FB066E.1020006@redhat.com>

On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:

> A question: without NAPI weight, could this starve other net devices?

Not really, as net devices are serviced by softirq handler.



^ permalink raw reply

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Jason Wang @ 2014-02-12  5:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Qin Chuanyu, davem, Michael S. Tsirkin, Anthony Liguori, KVM list,
	netdev
In-Reply-To: <1392184074.1752.2.camel@edumazet-glaptop2.roam.corp.google.com>

On 02/12/2014 01:47 PM, Eric Dumazet wrote:
> On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
>
>> A question: without NAPI weight, could this starve other net devices?
> Not really, as net devices are serviced by softirq handler.
>
>

Yes, then the issue is tun could be starved by other net devices.

^ permalink raw reply

* [PATCH net] vhost_net: do not report a used len larger than receive buffer size
From: Jason Wang @ 2014-02-12  5:57 UTC (permalink / raw)
  To: mst, kvm, virtio-dev, virtualization, netdev, linux-kernel

Currently, even if the packet were truncated by lower socket, we still
report the packet size as the used len which may confuse guest
driver. Fixes this by returning the size of guest receive buffer instead.

Fixes 3a4d5c94e959359ece6d6b55045c3f046677f55c
(vhost_net: a kernel-level virtio server)

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9a68409..06268a0 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -525,7 +525,8 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 		++headcount;
 		seg += in;
 	}
-	heads[headcount - 1].len += datalen;
+	if (likely(datalen < 0))
+		heads[headcount - 1].len += datalen;
 	*iovcount = seg;
 	if (unlikely(log))
 		*log_num = nlogs;
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Eric Dumazet @ 2014-02-12  6:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Qin Chuanyu, davem, Michael S. Tsirkin, Anthony Liguori, KVM list,
	netdev
In-Reply-To: <52FB0BBD.7060303@redhat.com>

On Wed, 2014-02-12 at 13:50 +0800, Jason Wang wrote:
> On 02/12/2014 01:47 PM, Eric Dumazet wrote:
> > On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
> >
> >> A question: without NAPI weight, could this starve other net devices?
> > Not really, as net devices are serviced by softirq handler.
> >
> >
> 
> Yes, then the issue is tun could be starved by other net devices.

How this patch changes anything to this 'problem' ?

netif_rx_ni() can only be called if your process is not preempted by
other high prio tasks/softirqs.

If this process is scheduled on a cpu, then disabling bh to process
_one_ packet wont fundamentally change dynamic of the system.

^ permalink raw reply

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Qin Chuanyu @ 2014-02-12  6:46 UTC (permalink / raw)
  To: Jason Wang, davem
  Cc: Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
	Eric Dumazet
In-Reply-To: <52FB066E.1020006@redhat.com>

On 2014/2/12 13:28, Jason Wang wrote:

> A question: without NAPI weight, could this starve other net devices?
tap xmit skb use thread context,the poll func of physical nic driver
could be called in softirq context without change.

I had test it by binding vhost thread and physic nic interrupt on the 
same vcpu, use netperf xmit udp, test model is VM1-Host1-Host2.

if only VM1 xmit skb, the top show as below :
Cpu1 :0.0%us, 95.0%sy, 0.0%ni, 0.0%id,  0.0%wa, 0.0%hi, 5.0%si,  0.0%st

then use host2 xmit skb to VM1, the top show as below :
Cpu1 :0.0%us, 41.0%sy, 0.0%ni, 0.0%id,  0.0%wa, 0.0%hi, 59.0%si, 0.0%st

so I think there is no problem with this change.

>>   drivers/net/tun.c |    4 +++-
>>   1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 44c4db8..90b4e58 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
>> *tun, struct tun_file *tfile,
>>       skb_probe_transport_header(skb, 0);
>>
>>       rxhash = skb_get_hash(skb);
>> -    netif_rx_ni(skb);
>> +    rcu_read_lock_bh();
>> +    netif_receive_skb(skb);
>> +    rcu_read_unlock_bh();
>>
>>       tun->dev->stats.rx_packets++;
>>       tun->dev->stats.rx_bytes += len;
>
>
> .
>

^ permalink raw reply

* [PATCH net-next 1/2] bonding: remove the redundant judgements for bond_set_mac_address()
From: Ding Tianhong @ 2014-02-12  6:58 UTC (permalink / raw)
  To: fubar, vfalico, andy; +Cc: davem, netdev
In-Reply-To: <1392188330-17208-1-git-send-email-dingtianhong@huawei.com>

The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
so no need to check it in bond_set_mac_address().

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71ba18e..58aa531 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3461,15 +3461,7 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 	 */
 
 	bond_for_each_slave(bond, slave, iter) {
-		const struct net_device_ops *slave_ops = slave->dev->netdev_ops;
 		pr_debug("slave %p %s\n", slave, slave->dev->name);
-
-		if (slave_ops->ndo_set_mac_address == NULL) {
-			res = -EOPNOTSUPP;
-			pr_debug("EOPNOTSUPP %s\n", slave->dev->name);
-			goto unwind;
-		}
-
 		res = dev_set_mac_address(slave->dev, addr);
 		if (res) {
 			/* TODO: consider downing the slave
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next 0/2] bonding: remove the redundant judgements for bonding
From: Ding Tianhong @ 2014-02-12  6:58 UTC (permalink / raw)
  To: fubar, vfalico, andy; +Cc: davem, netdev

Remove the redundant judgements for bond_set_mac_address() and
bond_option_queue_id_set().

Ding Tianhong (2):
  bonding: remove the redundant judgements for bond_set_mac_address()
  bonding: remove the redundant judgements for
    bond_option_queue_id_set()

 drivers/net/bonding/bond_main.c    | 8 --------
 drivers/net/bonding/bond_options.c | 3 +--
 2 files changed, 1 insertion(+), 10 deletions(-)

-- 
1.8.0

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox