* RE: IPV6 ndisc:: Bad NIC causing IPV6 NDP to stop working
From: Eric Dumazet @ 2012-06-21 9:16 UTC (permalink / raw)
To: Menny_Hamburger; +Cc: netdev
In-Reply-To: <D8C50530D6022F40A817A35C40CC06A70B34DBF05D@DUBX7MCDUB01.EMEA.DELL.COM>
Please don't top post on this list.
On Thu, 2012-06-21 at 09:43 +0100, Menny_Hamburger@Dell.com wrote:
> For high availability reasons, the machines discussed run with a
> number of NICs per subnet, where our own proprietary service fixes up
> routing when a NIC goes wild.
> We schedule a fix in the field but our goal is to eliminate as many
> single points of failure as we can, so that our systems will still run
> properly when something goes wrong.
Even if a NIC does memory corruption or some nasty bug ?
That sounds great :)
> We encountered this issue on some proprietary NICs but also with bnx2,
> where we get "chip not in correct endian mode" errors (This is another
> problem that may require a separate discussion).
Until very recently, we used to orphan skb before giving them to device
transmit. So you probably use a very old kernel.
I guess we could just do a regular alloc_skb(), it makes no sense to
limit in-flight ND skbs, we have Qdisc/device limits anyway.
BTW, I have no idea why ndisc_build_skb() is EXPORTed
net/ipv6/ndisc.c | 24 ++++++------------------
1 file changed, 6 insertions(+), 18 deletions(-)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 69a6330..f149d85 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -429,7 +429,6 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
int hlen = LL_RESERVED_SPACE(dev);
int tlen = dev->needed_tailroom;
int len;
- int err;
u8 *opt;
if (!dev->addr_len)
@@ -439,15 +438,10 @@ struct sk_buff *ndisc_build_skb(struct net_device *dev,
if (llinfo)
len += ndisc_opt_addr_space(dev);
- skb = sock_alloc_send_skb(sk,
- (MAX_HEADER + sizeof(struct ipv6hdr) +
- len + hlen + tlen),
- 1, &err);
- if (!skb) {
- ND_PRINTK(0, err, "ND: %s failed to allocate an skb, err=%d\n",
- __func__, err);
+ skb = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + len + hlen + tlen,
+ GFP_ATOMIC);
+ if (!skb)
return NULL;
- }
skb_reserve(skb, hlen);
ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len);
@@ -1550,16 +1544,10 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
hlen = LL_RESERVED_SPACE(dev);
tlen = dev->needed_tailroom;
- buff = sock_alloc_send_skb(sk,
- (MAX_HEADER + sizeof(struct ipv6hdr) +
- len + hlen + tlen),
- 1, &err);
- if (buff == NULL) {
- ND_PRINTK(0, err,
- "Redirect: %s failed to allocate an skb, err=%d\n",
- __func__, err);
+ buff = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + len + hlen + tlen,
+ GFP_ATOMIC);
+ if (!buff)
goto release;
- }
skb_reserve(buff, hlen);
ip6_nd_hdr(sk, buff, dev, &saddr_buf, &ipv6_hdr(skb)->saddr,
^ permalink raw reply related
* Re: [PATCH] net: dcb: fix small regression in __dcbnl_pg_setcfg()
From: David Miller @ 2012-06-21 9:04 UTC (permalink / raw)
To: tgraf; +Cc: john.r.fastabend, tgraf, netdev, lucy.liu, alexander.h.duyck
In-Reply-To: <20120621081910.GH27921@canuck.infradead.org>
From: Thomas Graf <tgraf@infradead.org>
Date: Thu, 21 Jun 2012 04:19:10 -0400
> ACK
Can I get a real "Acked-by: ..." so that it automatically gets
picked up by patchwork? Thanks.
^ permalink raw reply
* RE: IPV6 ndisc:: Bad NIC causing IPV6 NDP to stop working
From: Menny_Hamburger @ 2012-06-21 8:43 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1340266972.4604.4404.camel@edumazet-glaptop>
For high availability reasons, the machines discussed run with a number of NICs per subnet, where our own proprietary service fixes up routing when a NIC goes wild.
We schedule a fix in the field but our goal is to eliminate as many single points of failure as we can, so that our systems will still run properly when something goes wrong.
We encountered this issue on some proprietary NICs but also with bnx2, where we get "chip not in correct endian mode" errors (This is another problem that may require a separate discussion).
-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: 21 June, 2012 11:23
To: Hamburger, Menny
Cc: netdev@vger.kernel.org
Subject: Re: IPV6 ndisc:: Bad NIC causing IPV6 NDP to stop working
On Thu, 2012-06-21 at 08:59 +0100, Menny_Hamburger@Dell.com wrote:
> Hi,
>
> Our machines runs EL5.8 x86_64.
> We have witnessed several cases where we suspect that a bad NIC on the machine caused IPV6 neighbour discovery to stop working on all the other NICs - when this happens ping6 fails on every NIC we try it.
> From looking into the code I see that there is only a single socket assigned for NDP; Does it sound logical to allocate a socket per interface instead of a single global socket.
> I have found the following thread in LKML: https://lkml.org/lkml/2006/11/29/335, and it seems that this allocation issue still exists in EL5 based kernels - could this cause the above problem?
>
What is a bad NIC, and why not fixing it ?
^ permalink raw reply
* Re: [RFC] tcp: How does SACK or FACK determine the time to start fast retransmition?
From: Vijay Subramanian @ 2012-06-21 8:42 UTC (permalink / raw)
To: 李易; +Cc: netdev, kernelnewbies
In-Reply-To: <4FE2C03C.6030102@gmail.com>
On 20 June 2012 23:33, 李易 <lovelylich@gmail.com> wrote:
> HI all,
> When tcp uses reno as its congestion control algothim, it uses
> tp->sacked_out as dup-ack. When the third dup-ack(under default
> condition) comes, tcp will initiate its fast retransmition.
> But how about sack ?
> According to kernel source code comments, when sack or fack tcp option
> is enabled, there is no dup-ack counter. See comments for function
> tcp_dupack_heuristics():
> http://lxr.linux.no/linux+v2.6.37/net/ipv4/tcp_input.c#L2300
> So , how does tcp know the current dup-ack is the last one which
> triggers the fast retransmition?
With SACK, number of dupacks does not have much meaning. What matters is
--how the SACK scoreboard looks like i.e. which packets are tagged
Lost/Sacked/Retransmitted
-- Whether FACK is in use (this assumes holes in between sacked
packets are lost and have left the network and so we can send out more
packets)
So, stack does not count the number of dupacks that have come in. Only
SACK blocks matter.
You can try to track the following path:
tcp_ack() deals with incoming acks and if it sees a dupack (does not
matter what number), or incoming packet contains SACK it calls
tcp_fastretrans_alert() which calls tcp_xmit_retransmit_queue().
tcp_xmit_retransmit_queue() decides which packets to retransmit. The
first packet to start retransmitting from is tracked in
tp->retransmit_skb_hint.
Note that the dupThresh is actually tracked by tp->reordering which
measures the reordering in the network and is not fixed at 3. So, if
more than
tp->reordering packets have been acked above a given packet, this
packet is a candidate for retransmisson. See tcp_mark_head_lost() to
see how the
reordering metric is used to mark packets as lost. This corresponds to
the check you mentioned in the RFC.
So, window permitting, packets are sent as follows;
(a)-- Packets marked lost as per description above
(b)-- new packets (if any)
(c)-- Holes between sacked packets which are not reliably lost.
choice between (b) and (c) is made in tcp_can_forward_retransmit().
Hope this helps.
Vijay
^ permalink raw reply
* Re: IPV6 ndisc:: Bad NIC causing IPV6 NDP to stop working
From: Eric Dumazet @ 2012-06-21 8:22 UTC (permalink / raw)
To: Menny_Hamburger; +Cc: netdev
In-Reply-To: <D8C50530D6022F40A817A35C40CC06A70B34DBEFCF@DUBX7MCDUB01.EMEA.DELL.COM>
On Thu, 2012-06-21 at 08:59 +0100, Menny_Hamburger@Dell.com wrote:
> Hi,
>
> Our machines runs EL5.8 x86_64.
> We have witnessed several cases where we suspect that a bad NIC on the machine caused IPV6 neighbour discovery to stop working on all the other NICs - when this happens ping6 fails on every NIC we try it.
> From looking into the code I see that there is only a single socket assigned for NDP; Does it sound logical to allocate a socket per interface instead of a single global socket.
> I have found the following thread in LKML: https://lkml.org/lkml/2006/11/29/335, and it seems that this allocation issue still exists in EL5 based kernels - could this cause the above problem?
>
What is a bad NIC, and why not fixing it ?
^ permalink raw reply
* Re: [PATCH] net: dcb: fix small regression in __dcbnl_pg_setcfg()
From: Thomas Graf @ 2012-06-21 8:19 UTC (permalink / raw)
To: John Fastabend; +Cc: tgraf, davem, netdev, lucy.liu, alexander.h.duyck
In-Reply-To: <20120621055621.14148.42206.stgit@jf-dev1-dcblab>
On Wed, Jun 20, 2012 at 10:56:21PM -0700, John Fastabend wrote:
> A small regression was introduced in the reply command of
> dcbnl_pg_setcfg(). User space apps may be expecting the
> DCB_ATTR_PG_CFG attribute to be returned with the patch
> below TX or RX variants are returned.
>
> commit 7be994138b188387691322921c08e19bddf6d3c5
> Author: Thomas Graf <tgraf@suug.ch>
> Date: Wed Jun 13 02:54:55 2012 +0000
>
> dcbnl: Shorten all command handling functions
>
> This patch reverts this behavior and returns DCB_ATTR_PG_CFG
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>
> net/dcb/dcbnl.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
> index 0a36007..013da86 100644
> --- a/net/dcb/dcbnl.c
> +++ b/net/dcb/dcbnl.c
> @@ -852,8 +852,7 @@ static int __dcbnl_pg_setcfg(struct net_device *netdev, struct nlmsghdr *nlh,
> }
> }
>
> - return nla_put_u8(skb,
> - (dir ? DCB_CMD_PGRX_SCFG : DCB_CMD_PGTX_SCFG), 0);
> + return nla_put_u8(skb, DCB_ATTR_PG_CFG, 0);
> }
>
ACK
Thanks John
^ permalink raw reply
* IPV6 ndisc:: Bad NIC causing IPV6 NDP to stop working
From: Menny_Hamburger @ 2012-06-21 7:59 UTC (permalink / raw)
To: netdev
Hi,
Our machines runs EL5.8 x86_64.
We have witnessed several cases where we suspect that a bad NIC on the machine caused IPV6 neighbour discovery to stop working on all the other NICs - when this happens ping6 fails on every NIC we try it.
>From looking into the code I see that there is only a single socket assigned for NDP; Does it sound logical to allocate a socket per interface instead of a single global socket.
I have found the following thread in LKML: https://lkml.org/lkml/2006/11/29/335, and it seems that this allocation issue still exists in EL5 based kernels - could this cause the above problem?
Thanks,
Menny
^ permalink raw reply
* Re: divide by 0 error in igbvf_set_coalesce - ab50a2a
From: Jeff Kirsher @ 2012-06-21 7:35 UTC (permalink / raw)
To: David Ahern; +Cc: Williams, Mitch A, netdev@vger.kernel.org
In-Reply-To: <4FE24CF1.50603@cisco.com>
[-- Attachment #1: Type: text/plain, Size: 373 bytes --]
On 06/20/2012 03:21 PM, David Ahern wrote:
>
> On 6/18/12 2:45 PM, Williams, Mitch A wrote:
>> Thanks for letting me know, David. I'll look into it and get a patch
>> out soon. Shouldn't be that big of a deal to fix.
>
> Could you CC me on the patch so I know when it's fixed? I have enough
> events to poll.
I will CC you when I push Mitch's patch upstream.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]
^ permalink raw reply
* Re: [RFC] tcp: How does SACK or FACK determine the time to start fast retransmition?
From: Li Yu @ 2012-06-21 7:20 UTC (permalink / raw)
To: 李易; +Cc: netdev, kernelnewbies
In-Reply-To: <4FE2C03C.6030102@gmail.com>
于 2012年06月21日 14:33, 李易 写道:
> HI all,
> When tcp uses reno as its congestion control algothim, it uses
> tp->sacked_out as dup-ack. When the third dup-ack(under default
> condition) comes, tcp will initiate its fast retransmition.
> But how about sack ?
> According to kernel source code comments, when sack or fack tcp option
> is enabled, there is no dup-ack counter. See comments for function
> tcp_dupack_heuristics():
> http://lxr.linux.no/linux+v2.6.37/net/ipv4/tcp_input.c#L2300
> So , how does tcp know the current dup-ack is the last one which
> triggers the fast retransmition?
>
> According to rfc3517 section 5:
> "Upon the receipt of the first (DupThresh - 1) duplicate ACKs, the
> scoreboard is to be updated as normal."
> "When a TCP sender receives the duplicate ACK corresponding to
> DupThresh ACKs,
> the scoreboard MUST be updated with the new SACK information (via
> Update ()). If no previous loss event has occurred
> on the connection or the cumulative acknowledgment point is beyond
> the last value of RecoveryPoint, a loss recovery phase SHOULD be
> initiated, per the fast retransmit algorithm outlined in [RFC2581]."
>
> But these sentences doesn't describe how tcp knows the current ack
> is the dup-threshold dup-ack.
>
> Accorrding to rfc3517 seciton 4 and isLost(Seqnum) function:
> "The routine returns true when either
> DupThresh discontiguous SACKed sequences have arrived above
> ’SeqNum’ or (DupThresh * SMSS) bytes with sequence numbers greater
> than ’SeqNum’ have been SACKed. Otherwise, the routine returns
> false."
> I think this is just what I am searching for, but I still don't know
> which line of code in Linux tcp protocol does this check.
> Can any one help me ? thks in advance.
>
>
Do you mean you did not locate where FR is triggered in TCP stack ?
I am not a TCP expert, however I think that it may be at
tcp_time_to_recover(), and the "DupThresh" is not a fixed value in
Linux TCP implementation.
Thanks
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* RE: [RFC net-next 11/14] Fix emulex/benet
From: Sathya.Perla @ 2012-06-21 6:42 UTC (permalink / raw)
To: yuvalmin, netdev, davem; +Cc: eilong
In-Reply-To: <1340118848-30978-12-git-send-email-yuvalmin@broadcom.com>
Yuval, for be2net, the best place to cap the number of queues to a global default
value would be_num_rss_want(). The change would look like:
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/
index fa2a01e..5265b42 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2186,12 +2186,15 @@ static void be_msix_disable(struct be_adapter *adapter)
static uint be_num_rss_want(struct be_adapter *adapter)
{
+ u32 num = 0;
+
if ((adapter->function_caps & BE_FUNCTION_CAPS_RSS) &&
!sriov_want(adapter) && be_physfn(adapter) &&
- !be_is_mc(adapter))
- return (adapter->be3_native) ? BE3_MAX_RSS_QS : BE2_MAX_RSS_QS;
- else
- return 0;
+ !be_is_mc(adapter)) {
+ num = (adapter->be3_native) ? BE3_MAX_RSS_QS : BE2_MAX_RSS_QS;
+ num = min_t(u32, num, DEFAULT_MAX_NUM_RSS_QUEUES);
+ }
+ return num;
}
static void be_msix_enable(struct be_adapter *adapter)
thanks,
-Sathya
________________________________________
From: Yuval Mintz [yuvalmin@broadcom.com]
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
---
drivers/net/ethernet/emulex/benet/be_main.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 5a34503..e42597d 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2153,13 +2153,15 @@ static uint be_num_rss_want(struct be_adapter *adapter)
static void be_msix_enable(struct be_adapter *adapter)
{
#define BE_MIN_MSIX_VECTORS 1
- int i, status, num_vec, num_roce_vec = 0;
+ int i, status, num_vec, num_roce_vec = 0, ncpu;
+
+ ncpu = min_t(int, num_online_cpus(), DEFAULT_MAX_NUM_RSS_QUEUES);
/* If RSS queues are not used, need a vec for default RX Q */
- num_vec = min(be_num_rss_want(adapter), num_online_cpus());
+ num_vec = min(be_num_rss_want(adapter), ncpu);
if (be_roce_supported(adapter)) {
num_roce_vec = min_t(u32, MAX_ROCE_MSIX_VECTORS,
- (num_online_cpus() + 1));
+ (u32)(ncpu + 1));
num_roce_vec = min(num_roce_vec, MAX_ROCE_EQS);
num_vec += num_roce_vec;
num_vec = min(num_vec, MAX_MSIX_VECTORS);
--
1.7.9.rc2
^ permalink raw reply related
* Re: [PATCH] [XFRM] Fix unexpected SA hard expiration after changing date
From: Fan Du @ 2012-06-21 6:41 UTC (permalink / raw)
To: David Miller; +Cc: herbert, netdev
In-Reply-To: <20120620.204323.1614285855856633624.davem@davemloft.net>
Hi, David
On 2012年06月21日 11:43, David Miller wrote:
> From: Fan Du<fan.du@windriver.com>
> Date: Thu, 21 Jun 2012 10:11:51 +0800
>
>> Could you please take a look at it if you have time?
>
> Everyone saw it, you need to be patient.
>
Apologize for this noise.
Yes, I'm young, and a bit of rush :)
> And you still have fdu@windriver.com in the CC: list, which I told you
> bounces.
>
> I asked you explicitly to remove this email address from the CC: list,
> because it bounces.
I didn't do it deliberately.
Damn, must be my afternoon buzzy head!
I have sent out V4, I hope I will not make any dummy mistake.
Anyway, please review it if you have time, and I will wait for your
response *PATIENTLY*.
thanks
--
Love each day!
--fan
^ permalink raw reply
* [RFC] tcp: How does SACK or FACK determine the time to start fast retransmition?
From: 李易 @ 2012-06-21 6:33 UTC (permalink / raw)
To: netdev; +Cc: kernelnewbies
In-Reply-To: <CAAA3+BpRQ24HED6a+yCF+A_q=vJ7nHexLJd1U+-50pb43MVa5w@mail.gmail.com>
HI all,
When tcp uses reno as its congestion control algothim, it uses
tp->sacked_out as dup-ack. When the third dup-ack(under default
condition) comes, tcp will initiate its fast retransmition.
But how about sack ?
According to kernel source code comments, when sack or fack tcp option
is enabled, there is no dup-ack counter. See comments for function
tcp_dupack_heuristics():
http://lxr.linux.no/linux+v2.6.37/net/ipv4/tcp_input.c#L2300
So , how does tcp know the current dup-ack is the last one which
triggers the fast retransmition?
According to rfc3517 section 5:
"Upon the receipt of the first (DupThresh - 1) duplicate ACKs, the
scoreboard is to be updated as normal."
"When a TCP sender receives the duplicate ACK corresponding to
DupThresh ACKs,
the scoreboard MUST be updated with the new SACK information (via
Update ()). If no previous loss event has occurred
on the connection or the cumulative acknowledgment point is beyond
the last value of RecoveryPoint, a loss recovery phase SHOULD be
initiated, per the fast retransmit algorithm outlined in [RFC2581]."
But these sentences doesn't describe how tcp knows the current ack
is the dup-threshold dup-ack.
Accorrding to rfc3517 seciton 4 and isLost(Seqnum) function:
"The routine returns true when either
DupThresh discontiguous SACKed sequences have arrived above
’SeqNum’ or (DupThresh * SMSS) bytes with sequence numbers greater
than ’SeqNum’ have been SACKed. Otherwise, the routine returns
false."
I think this is just what I am searching for, but I still don't know
which line of code in Linux tcp protocol does this check.
Can any one help me ? thks in advance.
^ permalink raw reply
* [XFRM][RFC v4] Fix unexpected SA hard expiration after setting new date
From: Fan Du @ 2012-06-21 6:26 UTC (permalink / raw)
To: davem, herbert; +Cc: netdev
Hi, Dave and Herbert
Could you give a try to review this?
thanks
Changelog:
v1->v2
1) use xflags instead of creating new flags(suggested by Steffen Klassert)
v2->v3
1) fix email problem, and remove cc to myself(requested by David Miller)
v3->v4
1) fix typo when clearing XFRM_SOFT_EXPIRE(thanks for David Miller)
2) fix email problem, and remove cc to myself AGAIN!!!
*Background*:
Once IPsec SAs are created between two peers, kernel setup a timer to monitor
two events: soft/hard expiration. However the timer handler use xtime to
caculate whether it's soft or hard expiration event.
normal code flow(hard expire time:100s, soft expire time:82s)
a) When new SAs created, xfrm_timer_handler is called one second
after its creation. At this point, calculate soft expire
interval(81s), setup the timer;
b) soft expire occur, rearm the timer with hard expire interval(18s)
then notify racoon2 about soft expire event. racoon2 will create
new SAs.
c) hard expire happen, notify racoon2 about it. racoon2 will delete
the old SAs.
*Scenario*:
Setting a new date before b),and after a) could result c) happens first,
As a result, old SAs is deleted before new ones are created. Normally
new SAs will be created by the next time networking traffic, but there
is a small time being when networking connection is down, this could
result in upper layer connections failed in tel comm area, thus it's
better to keep it strict in sequence.
*Workaround*:
set new time could happen:
1) before a), then SAs is updated with new time.
2) before b),and after a)
2a) When new SAs created, xfrm_timer_handler is called one second
after its creation. At this point, calculate soft expire
interval(81s), setup the timer;(set flag to mark next time should
be soft time expire)
<<---- new date comes
2b) soft expire occur, the calculation results in a hard time expire
event, but flag is set, so catch ya. Sync the addtime, and rearm
the timer with hard expire interval(18s), then notify racoon2
about soft expire event;
2c) hard expire happen, notify racoon2 about it;
so everything is in order.
3) after b), hard expire always happened anyway.
^ permalink raw reply
* [PATCH] [XFRM] Fix unexpected SA hard expiration after changing date
From: Fan Du @ 2012-06-21 6:26 UTC (permalink / raw)
To: davem, herbert; +Cc: netdev
In-Reply-To: <1340259961-30354-1-git-send-email-fdu@windriver.com>
After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.
Signed-off-by: Fan Du <fdu@windriver.com>
---
include/net/xfrm.h | 3 +++
net/xfrm/xfrm_state.c | 21 +++++++++++++++++----
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 2933d74..8d16777 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -197,6 +197,8 @@ struct xfrm_state
struct xfrm_lifetime_cur curlft;
struct timer_list timer;
+ /* used to fix curlft->add_time when changing date */
+ long saved_tmo;
/* Last used time */
unsigned long lastused;
@@ -218,6 +220,7 @@ struct xfrm_state
/* xflags - make enum if more show up */
#define XFRM_TIME_DEFER 1
+#define XFRM_SOFT_EXPIRE 2
enum {
XFRM_STATE_VOID,
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index fd77cf0..2b55244 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -442,8 +442,17 @@ static void xfrm_timer_handler(unsigned long data)
if (x->lft.hard_add_expires_seconds) {
long tmo = x->lft.hard_add_expires_seconds +
x->curlft.add_time - now;
- if (tmo <= 0)
- goto expired;
+ if (tmo <= 0) {
+ if (x->xflags & XFRM_SOFT_EXPIRE) {
+ /* enter hard expire without soft expire first?!
+ * setting a new date could trigger this.
+ * workarbound: fix x->curflt.add_time by below:
+ */
+ x->curlft.add_time = now - x->saved_tmo - 1;
+ tmo = x->lft.hard_add_expires_seconds - x->saved_tmo;
+ } else
+ goto expired;
+ }
if (tmo < next)
next = tmo;
}
@@ -460,10 +469,14 @@ static void xfrm_timer_handler(unsigned long data)
if (x->lft.soft_add_expires_seconds) {
long tmo = x->lft.soft_add_expires_seconds +
x->curlft.add_time - now;
- if (tmo <= 0)
+ if (tmo <= 0) {
warn = 1;
- else if (tmo < next)
+ x->xflags &= ~XFRM_SOFT_EXPIRE;
+ } else if (tmo < next) {
next = tmo;
+ x->xflags |= XFRM_SOFT_EXPIRE;
+ x->saved_tmo = tmo;
+ }
}
if (x->lft.soft_use_expires_seconds) {
long tmo = x->lft.soft_use_expires_seconds +
--
1.7.11
^ permalink raw reply related
* [PATCH] net: dcb: fix small regression in __dcbnl_pg_setcfg()
From: John Fastabend @ 2012-06-21 5:56 UTC (permalink / raw)
To: tgraf, davem; +Cc: netdev, lucy.liu, alexander.h.duyck
A small regression was introduced in the reply command of
dcbnl_pg_setcfg(). User space apps may be expecting the
DCB_ATTR_PG_CFG attribute to be returned with the patch
below TX or RX variants are returned.
commit 7be994138b188387691322921c08e19bddf6d3c5
Author: Thomas Graf <tgraf@suug.ch>
Date: Wed Jun 13 02:54:55 2012 +0000
dcbnl: Shorten all command handling functions
This patch reverts this behavior and returns DCB_ATTR_PG_CFG
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
net/dcb/dcbnl.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 0a36007..013da86 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -852,8 +852,7 @@ static int __dcbnl_pg_setcfg(struct net_device *netdev, struct nlmsghdr *nlh,
}
}
- return nla_put_u8(skb,
- (dir ? DCB_CMD_PGRX_SCFG : DCB_CMD_PGTX_SCFG), 0);
+ return nla_put_u8(skb, DCB_ATTR_PG_CFG, 0);
}
static int dcbnl_pgtx_setcfg(struct net_device *netdev, struct nlmsghdr *nlh,
^ permalink raw reply related
* Re: [PATCH 2/8] dcbnl: Shorten all command handling functions
From: John Fastabend @ 2012-06-21 6:04 UTC (permalink / raw)
To: Thomas Graf; +Cc: davem, netdev, lucy.liu, alexander.h.duyck
In-Reply-To: <3fea99383d5ee98b51bd8b0088bed38cdb59f375.1339591572.git.tgraf@suug.ch>
On 6/13/2012 5:54 AM, Thomas Graf wrote:
> Allocating and sending the skb in dcb_doit() allows for much
> shorter and cleaner command handling functions.
>
> The huge switch statement is replaced with an array based definition
> of the handling function and reply message type.
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> net/dcb/dcbnl.c | 722 +++++++++++++------------------------------------------
> 1 files changed, 172 insertions(+), 550 deletions(-)
>
[...]
> @@ -1054,32 +885,27 @@ static int __dcbnl_pg_setcfg(struct net_device *netdev, struct nlattr **tb,
> }
> }
>
> - ret = dcbnl_reply(0, RTM_SETDCB,
> - (dir ? DCB_CMD_PGRX_SCFG : DCB_CMD_PGTX_SCFG),
> - DCB_ATTR_PG_CFG, pid, seq, flags);
> + ret = nla_put_u8(skb, (dir ? DCB_CMD_PGRX_SCFG : DCB_CMD_PGTX_SCFG), 0);
Previously, this returned DCB_ATTR_PG_CFG now it is returning RX or TX
variants. Although I like this implementation better I think in order
to not break user space application we should keep the previous
behavior. At least one app is checking these return codes.
I'll post a patch shortly. Thomas if you ACK it that would be great.
Thanks,
John
^ permalink raw reply
* [net-next.git 4/4 (v8)] phy: add the EEE support and the way to access to the MMD registers.
From: Giuseppe CAVALLARO @ 2012-06-21 6:03 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, bhutchings, rayagond, davem, yuvalmin,
Giuseppe Cavallaro
In-Reply-To: <1340258599-3083-1-git-send-email-peppe.cavallaro@st.com>
This patch adds the support for the Energy-Efficient Ethernet (EEE)
to the Physical Abstraction Layer.
To support the EEE we have to access to the MMD registers 3.20 and
7.60/61. So two new functions have been added to read/write the MMD
registers (clause 45).
An Ethernet driver (I tested the stmmac) can invoke the phy_init_eee to properly
check if the EEE is supported by the PHYs and it can also set the clock
stop enable bit in the 3.0 register.
The phy_get_eee_err can be used for reporting the number of time where
the PHY failed to complete its normal wake sequence.
In the end, this patch also adds the EEE ethtool support implementing:
o phy_ethtool_set_eee
o phy_ethtool_get_eee
v1: initial patch
v2: fixed some errors especially on naming convention
v3: renamed again the mmd read/write functions thank to Ben's feedback
v4: moved file to phy.c and added the ethtool support.
v5: fixed phy_adv_to_eee, phy_eee_to_supported, phy_eee_to_adv return
values according to ethtool API (thanks to Ben's feedback).
Renamed some macros to avoid too long names.
v6: fixed kernel-doc comments to be properly parsed.
Fixed the phy_init_eee function: we need to check which link mode
was autonegotiated and then the corresponding bits in 7.60 and 7.61
registers.
v7: reviewed the way to get the negotiated settings.
v8: fixed a problem in the phy_init_eee return value erroneously added
when included the phy_read_status call.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/phy/phy.c | 281 +++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/mdio.h | 21 +++-
include/linux/mii.h | 9 ++
include/linux/phy.h | 5 +
4 files changed, 312 insertions(+), 4 deletions(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 2e1c237..e13a30d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -35,6 +35,7 @@
#include <linux/phy.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
+#include <linux/mdio.h>
#include <linux/atomic.h>
#include <asm/io.h>
@@ -967,3 +968,283 @@ void phy_state_machine(struct work_struct *work)
schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
}
+
+static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
+ int addr)
+{
+ /* Write the desired MMD Devad */
+ bus->write(bus, addr, MII_MMD_CTRL, devad);
+
+ /* Write the desired MMD register address */
+ bus->write(bus, addr, MII_MMD_DATA, prtad);
+
+ /* Select the Function : DATA with no post increment */
+ bus->write(bus, addr, MII_MMD_CTRL, (devad | MII_MMD_CTRL_NOINCR));
+}
+
+/**
+ * phy_read_mmd_indirect - reads data from the MMD registers
+ * @bus: the target MII bus
+ * @prtad: MMD Address
+ * @devad: MMD DEVAD
+ * @addr: PHY address on the MII bus
+ *
+ * Description: it reads data from the MMD registers (clause 22 to access to
+ * clause 45) of the specified phy address.
+ * To read these register we have:
+ * 1) Write reg 13 // DEVAD
+ * 2) Write reg 14 // MMD Address
+ * 3) Write reg 13 // MMD Data Command for MMD DEVAD
+ * 3) Read reg 14 // Read MMD data
+ */
+static int phy_read_mmd_indirect(struct mii_bus *bus, int prtad, int devad,
+ int addr)
+{
+ u32 ret;
+
+ mmd_phy_indirect(bus, prtad, devad, addr);
+
+ /* Read the content of the MMD's selected register */
+ ret = bus->read(bus, addr, MII_MMD_DATA);
+
+ return ret;
+}
+
+/**
+ * phy_write_mmd_indirect - writes data to the MMD registers
+ * @bus: the target MII bus
+ * @prtad: MMD Address
+ * @devad: MMD DEVAD
+ * @addr: PHY address on the MII bus
+ * @data: data to write in the MMD register
+ *
+ * Description: Write data from the MMD registers of the specified
+ * phy address.
+ * To write these register we have:
+ * 1) Write reg 13 // DEVAD
+ * 2) Write reg 14 // MMD Address
+ * 3) Write reg 13 // MMD Data Command for MMD DEVAD
+ * 3) Write reg 14 // Write MMD data
+ */
+static void phy_write_mmd_indirect(struct mii_bus *bus, int prtad, int devad,
+ int addr, u32 data)
+{
+ mmd_phy_indirect(bus, prtad, devad, addr);
+
+ /* Write the data into MMD's selected register */
+ bus->write(bus, addr, MII_MMD_DATA, data);
+}
+
+static u32 phy_eee_to_adv(u16 eee_adv)
+{
+ u32 adv = 0;
+
+ if (eee_adv & MDIO_EEE_100TX)
+ adv |= ADVERTISED_100baseT_Full;
+ if (eee_adv & MDIO_EEE_1000T)
+ adv |= ADVERTISED_1000baseT_Full;
+ if (eee_adv & MDIO_EEE_10GT)
+ adv |= ADVERTISED_10000baseT_Full;
+ if (eee_adv & MDIO_EEE_1000KX)
+ adv |= ADVERTISED_1000baseKX_Full;
+ if (eee_adv & MDIO_EEE_10GKX4)
+ adv |= ADVERTISED_10000baseKX4_Full;
+ if (eee_adv & MDIO_EEE_10GKR)
+ adv |= ADVERTISED_10000baseKR_Full;
+
+ return adv;
+}
+
+static u32 phy_eee_to_supported(u16 eee_caported)
+{
+ u32 supported = 0;
+
+ if (eee_caported & MDIO_EEE_100TX)
+ supported |= SUPPORTED_100baseT_Full;
+ if (eee_caported & MDIO_EEE_1000T)
+ supported |= SUPPORTED_1000baseT_Full;
+ if (eee_caported & MDIO_EEE_10GT)
+ supported |= SUPPORTED_10000baseT_Full;
+ if (eee_caported & MDIO_EEE_1000KX)
+ supported |= SUPPORTED_1000baseKX_Full;
+ if (eee_caported & MDIO_EEE_10GKX4)
+ supported |= SUPPORTED_10000baseKX4_Full;
+ if (eee_caported & MDIO_EEE_10GKR)
+ supported |= SUPPORTED_10000baseKR_Full;
+
+ return supported;
+}
+
+static u16 phy_adv_to_eee(u32 adv)
+{
+ u16 reg = 0;
+
+ if (adv & ADVERTISED_100baseT_Full)
+ reg |= MDIO_EEE_100TX;
+ if (adv & ADVERTISED_1000baseT_Full)
+ reg |= MDIO_EEE_1000T;
+ if (adv & ADVERTISED_10000baseT_Full)
+ reg |= MDIO_EEE_10GT;
+ if (adv & ADVERTISED_1000baseKX_Full)
+ reg |= MDIO_EEE_1000KX;
+ if (adv & ADVERTISED_10000baseKX4_Full)
+ reg |= MDIO_EEE_10GKX4;
+ if (adv & ADVERTISED_10000baseKR_Full)
+ reg |= MDIO_EEE_10GKR;
+
+ return reg;
+}
+
+/**
+ * phy_init_eee - init and check the EEE feature
+ * @phydev: target phy_device struct
+ * @clk_stop_enable: PHY may stop the clock during LPI
+ *
+ * Description: it checks if the Energy-Efficient Ethernet (EEE)
+ * is supported by looking at the MMD registers 3.20 and 7.60/61
+ * and it programs the MMD register 3.0 setting the "Clock stop enable"
+ * bit if required.
+ */
+int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable)
+{
+ int ret = -EPROTONOSUPPORT;
+
+ /* According to 802.3az,the EEE is supported only in full duplex-mode.
+ * Also EEE feature is active when core is operating with MII, GMII
+ * or RGMII.
+ */
+ if ((phydev->duplex == DUPLEX_FULL) &&
+ ((phydev->interface == PHY_INTERFACE_MODE_MII) ||
+ (phydev->interface == PHY_INTERFACE_MODE_GMII) ||
+ (phydev->interface == PHY_INTERFACE_MODE_RGMII))) {
+ u16 eee_lp, eee_cap, eee_adv;
+ u32 lp, cap, adv;
+ int idx, status;
+
+ /* Read phy status to properly get the right settings */
+ status = phy_read_status(phydev);
+ if (status)
+ return status;
+
+ /* First check if the EEE ability is supported */
+ eee_cap = phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_ABLE,
+ MDIO_MMD_PCS, phydev->addr);
+ if (eee_cap < 0)
+ return eee_cap;
+
+ cap = phy_eee_to_supported(eee_cap);
+ if (!cap)
+ goto eee_exit;
+
+ /* Check which link settings negotiated and verify it in
+ * the EEE advertising registers.
+ */
+ eee_lp = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_LPABLE,
+ MDIO_MMD_AN, phydev->addr);
+ if (eee_lp < 0)
+ return eee_lp;
+
+ eee_adv = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV,
+ MDIO_MMD_AN, phydev->addr);
+ if (eee_adv < 0)
+ return eee_adv;
+
+ adv = phy_eee_to_adv(eee_adv);
+ lp = phy_eee_to_adv(eee_lp);
+ idx = phy_find_setting(phydev->speed, phydev->duplex);
+ if ((lp & adv & settings[idx].setting))
+ goto eee_exit;
+
+ if (clk_stop_enable) {
+ /* Configure the PHY to stop receiving xMII
+ * clock while it is signaling LPI.
+ */
+ u32 val = phy_read_mmd_indirect(phydev->bus, MDIO_CTRL1,
+ MDIO_MMD_PCS,
+ phydev->addr);
+ if (val < 0)
+ return val;
+
+ val |= MDIO_PCS_CTRL1_CLKSTOP_EN;
+ phy_write_mmd_indirect(phydev->bus, MDIO_CTRL1,
+ MDIO_MMD_PCS, phydev->addr, val);
+ }
+
+ ret = 0; /* EEE supported */
+ }
+
+eee_exit:
+ return ret;
+}
+EXPORT_SYMBOL(phy_init_eee);
+
+/**
+ * phy_get_eee_err - report the EEE wake error count
+ * @phydev: target phy_device struct
+ *
+ * Description: it is to report the number of time where the PHY
+ * failed to complete its normal wake sequence.
+ */
+int phy_get_eee_err(struct phy_device *phydev)
+{
+ return phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_WK_ERR,
+ MDIO_MMD_PCS, phydev->addr);
+
+}
+EXPORT_SYMBOL(phy_get_eee_err);
+
+/**
+ * phy_ethtool_get_eee - get EEE supported and status
+ * @phydev: target phy_device struct
+ * @data: ethtool_eee data
+ *
+ * Description: it reportes the Supported/Advertisement/LP Advertisement
+ * capabilities.
+ */
+int phy_ethtool_get_eee(struct phy_device *phydev, struct ethtool_eee *data)
+{
+ int val;
+
+ /* Get Supported EEE */
+ val = phy_read_mmd_indirect(phydev->bus, MDIO_PCS_EEE_ABLE,
+ MDIO_MMD_PCS, phydev->addr);
+ if (val < 0)
+ return val;
+ data->supported = phy_eee_to_supported(val);
+
+ /* Get advertisement EEE */
+ val = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV,
+ MDIO_MMD_AN, phydev->addr);
+ if (val < 0)
+ return val;
+ data->advertised = phy_eee_to_adv(val);
+
+ /* Get LP advertisement EEE */
+ val = phy_read_mmd_indirect(phydev->bus, MDIO_AN_EEE_LPABLE,
+ MDIO_MMD_AN, phydev->addr);
+ if (val < 0)
+ return val;
+ data->lp_advertised = phy_eee_to_adv(val);
+
+ return 0;
+}
+EXPORT_SYMBOL(phy_ethtool_get_eee);
+
+/**
+ * phy_ethtool_set_eee - set EEE supported and status
+ * @phydev: target phy_device struct
+ * @data: ethtool_eee data
+ *
+ * Description: it is to program the Advertisement EEE register.
+ */
+int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data)
+{
+ int val;
+
+ val = phy_adv_to_eee(data->advertised);
+ phy_write_mmd_indirect(phydev->bus, MDIO_AN_EEE_ADV, MDIO_MMD_AN,
+ phydev->addr, val);
+
+ return 0;
+}
+EXPORT_SYMBOL(phy_ethtool_set_eee);
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index dfb9479..4ad8f0e 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -43,7 +43,11 @@
#define MDIO_PKGID2 15
#define MDIO_AN_ADVERTISE 16 /* AN advertising (base page) */
#define MDIO_AN_LPA 19 /* AN LP abilities (base page) */
+#define MDIO_PCS_EEE_ABLE 20 /* EEE Capability register */
+#define MDIO_PCS_EEE_WK_ERR 22 /* EEE wake error counter */
#define MDIO_PHYXS_LNSTAT 24 /* PHY XGXS lane state */
+#define MDIO_AN_EEE_ADV 60 /* EEE advertisement */
+#define MDIO_AN_EEE_LPABLE 61 /* EEE link partner ability */
/* Media-dependent registers. */
#define MDIO_PMA_10GBT_SWAPPOL 130 /* 10GBASE-T pair swap & polarity */
@@ -56,7 +60,6 @@
#define MDIO_PCS_10GBRT_STAT2 33 /* 10GBASE-R/-T PCS status 2 */
#define MDIO_AN_10GBT_CTRL 32 /* 10GBASE-T auto-negotiation control */
#define MDIO_AN_10GBT_STAT 33 /* 10GBASE-T auto-negotiation status */
-#define MDIO_AN_EEE_ADV 60 /* EEE advertisement */
/* LASI (Link Alarm Status Interrupt) registers, defined by XENPAK MSA. */
#define MDIO_PMA_LASI_RXCTRL 0x9000 /* RX_ALARM control */
@@ -82,6 +85,7 @@
#define MDIO_AN_CTRL1_RESTART BMCR_ANRESTART
#define MDIO_AN_CTRL1_ENABLE BMCR_ANENABLE
#define MDIO_AN_CTRL1_XNP 0x2000 /* Enable extended next page */
+#define MDIO_PCS_CTRL1_CLKSTOP_EN 0x400 /* Stop the clock during LPI */
/* 10 Gb/s */
#define MDIO_CTRL1_SPEED10G (MDIO_CTRL1_SPEEDSELEXT | 0x00)
@@ -237,9 +241,18 @@
#define MDIO_AN_10GBT_STAT_MS 0x4000 /* Master/slave config */
#define MDIO_AN_10GBT_STAT_MSFLT 0x8000 /* Master/slave config fault */
-/* AN EEE Advertisement register. */
-#define MDIO_AN_EEE_ADV_100TX 0x0002 /* Advertise 100TX EEE cap */
-#define MDIO_AN_EEE_ADV_1000T 0x0004 /* Advertise 1000T EEE cap */
+/* EEE Supported/Advertisement/LP Advertisement registers.
+ *
+ * EEE capability Register (3.20), Advertisement (7.60) and
+ * Link partner ability (7.61) registers have and can use the same identical
+ * bit masks.
+ */
+#define MDIO_EEE_100TX 0x0002 /* 100TX EEE cap */
+#define MDIO_EEE_1000T 0x0004 /* 1000T EEE cap */
+#define MDIO_EEE_10GT 0x0008 /* 10GT EEE cap */
+#define MDIO_EEE_1000KX 0x0010 /* 1000KX EEE cap */
+#define MDIO_EEE_10GKX4 0x0020 /* 10G KX4 EEE cap */
+#define MDIO_EEE_10GKR 0x0040 /* 10G KR EEE cap */
/* LASI RX_ALARM control/status registers. */
#define MDIO_PMA_LASI_RX_PHYXSLFLT 0x0001 /* PHY XS RX local fault */
diff --git a/include/linux/mii.h b/include/linux/mii.h
index 2783eca..8ef3a7a 100644
--- a/include/linux/mii.h
+++ b/include/linux/mii.h
@@ -21,6 +21,8 @@
#define MII_EXPANSION 0x06 /* Expansion register */
#define MII_CTRL1000 0x09 /* 1000BASE-T control */
#define MII_STAT1000 0x0a /* 1000BASE-T status */
+#define MII_MMD_CTRL 0x0d /* MMD Access Control Register */
+#define MII_MMD_DATA 0x0e /* MMD Access Data Register */
#define MII_ESTATUS 0x0f /* Extended Status */
#define MII_DCOUNTER 0x12 /* Disconnect counter */
#define MII_FCSCOUNTER 0x13 /* False carrier counter */
@@ -141,6 +143,13 @@
#define FLOW_CTRL_TX 0x01
#define FLOW_CTRL_RX 0x02
+/* MMD Access Control register fields */
+#define MII_MMD_CTRL_DEVAD_MASK 0x1f /* Mask MMD DEVAD*/
+#define MII_MMD_CTRL_ADDR 0x0000 /* Address */
+#define MII_MMD_CTRL_NOINCR 0x4000 /* no post increment */
+#define MII_MMD_CTRL_INCR_RDWT 0x8000 /* post increment on reads & writes */
+#define MII_MMD_CTRL_INCR_ON_WT 0xC000 /* post increment on writes only */
+
/* This structure is used in all SIOCxMIIxxx ioctl calls */
struct mii_ioctl_data {
__u16 phy_id;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index c291cae..97fc4cf 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -532,6 +532,11 @@ int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
int phy_scan_fixups(struct phy_device *phydev);
+int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable);
+int phy_get_eee_err(struct phy_device *phydev);
+int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data);
+int phy_ethtool_get_eee(struct phy_device *phydev, struct ethtool_eee *data);
+
int __init mdio_bus_init(void);
void mdio_bus_exit(void);
--
1.7.4.4
^ permalink raw reply related
* [net-next.git 3/4 (v5)] stmmac: add the Energy Efficient Ethernet support
From: Giuseppe CAVALLARO @ 2012-06-21 6:03 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, bhutchings, rayagond, davem, yuvalmin,
Giuseppe Cavallaro
In-Reply-To: <1340258599-3083-1-git-send-email-peppe.cavallaro@st.com>
This patch adds the Energy Efficient Ethernet support to the stmmac.
Please see the driver's documentation for further details about this support
in the driver.
Thanks also goes to Rayagond Kokatanur for his first implementation.
Note:
to clearly manage and expose the lpi interrupt status and eee ethtool
stats I've had to do some modifications to the driver's design and I
found really useful to move other parts of the code (e.g. mmc irq stat)
in the main directly. So this means that some core has been reworked
to introduce the EEE.
v1: initial patch
v2: fixed some sparse issues (typos)
v3: erroneously sent the v2 renamed as v3
v4:
o Fixed the return value of the stmmac_eee_init as suggested by D.Miller
o Totally reviewed the ethtool support for EEE
o Added a new internal parameter to tune the SW timer for TX LPI.
v5: do not change any eee setting in case of the stmmac_ethtool_op_set_eee fails
(it has to return -EOPNOTSUPP in that case).
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/ethernet/stmicro/stmmac/common.h | 31 ++++-
drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 20 +++
.../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 101 +++++++++++-
.../net/ethernet/stmicro/stmmac/dwmac100_core.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 8 +
.../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 57 +++++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 166 +++++++++++++++++++-
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +
9 files changed, 372 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index bcd54d6..e2d0832 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -95,6 +95,16 @@ struct stmmac_extra_stats {
unsigned long poll_n;
unsigned long sched_timer_n;
unsigned long normal_irq_n;
+ unsigned long mmc_tx_irq_n;
+ unsigned long mmc_rx_irq_n;
+ unsigned long mmc_rx_csum_offload_irq_n;
+ /* EEE */
+ unsigned long irq_receive_pmt_irq_n;
+ unsigned long irq_tx_path_in_lpi_mode_n;
+ unsigned long irq_tx_path_exit_lpi_mode_n;
+ unsigned long irq_rx_path_in_lpi_mode_n;
+ unsigned long irq_rx_path_exit_lpi_mode_n;
+ unsigned long phy_eee_wakeup_error_n;
};
/* CSR Frequency Access Defines*/
@@ -162,6 +172,17 @@ enum tx_dma_irq_status {
handle_tx_rx = 3,
};
+enum core_specific_irq_mask {
+ core_mmc_tx_irq = 1,
+ core_mmc_rx_irq = 2,
+ core_mmc_rx_csum_offload_irq = 4,
+ core_irq_receive_pmt_irq = 8,
+ core_irq_tx_path_in_lpi_mode = 16,
+ core_irq_tx_path_exit_lpi_mode = 32,
+ core_irq_rx_path_in_lpi_mode = 64,
+ core_irq_rx_path_exit_lpi_mode = 128,
+};
+
/* DMA HW capabilities */
struct dma_features {
unsigned int mbps_10_100;
@@ -208,6 +229,10 @@ struct dma_features {
#define MAC_ENABLE_TX 0x00000008 /* Transmitter Enable */
#define MAC_RNABLE_RX 0x00000004 /* Receiver Enable */
+/* Default LPI timers */
+#define STMMAC_DEFAULT_LIT_LS_TIMER 0x3E8
+#define STMMAC_DEFAULT_TWT_LS_TIMER 0x0
+
struct stmmac_desc_ops {
/* DMA RX descriptor ring initialization */
void (*init_rx_desc) (struct dma_desc *p, unsigned int ring_size,
@@ -278,7 +303,7 @@ struct stmmac_ops {
/* Dump MAC registers */
void (*dump_regs) (void __iomem *ioaddr);
/* Handle extra events on specific interrupts hw dependent */
- void (*host_irq_status) (void __iomem *ioaddr);
+ int (*host_irq_status) (void __iomem *ioaddr);
/* Multicast filter setting */
void (*set_filter) (struct net_device *dev, int id);
/* Flow control setting */
@@ -291,6 +316,10 @@ struct stmmac_ops {
unsigned int reg_n);
void (*get_umac_addr) (void __iomem *ioaddr, unsigned char *addr,
unsigned int reg_n);
+ void (*set_eee_mode) (void __iomem *ioaddr);
+ void (*reset_eee_mode) (void __iomem *ioaddr);
+ void (*set_eee_timer) (void __iomem *ioaddr, int ls, int tw);
+ void (*set_eee_pls) (void __iomem *ioaddr, int link);
};
struct mac_link {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index 23478bf..f90fcb5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -36,6 +36,7 @@
#define GMAC_INT_STATUS 0x00000038 /* interrupt status register */
enum dwmac1000_irq_status {
+ lpiis_irq = 0x400,
time_stamp_irq = 0x0200,
mmc_rx_csum_offload_irq = 0x0080,
mmc_tx_irq = 0x0040,
@@ -60,6 +61,25 @@ enum power_event {
power_down = 0x00000001,
};
+/* Energy Efficient Ethernet (EEE)
+ *
+ * LPI status, timer and control register offset
+ */
+#define LPI_CTRL_STATUS 0x0030
+#define LPI_TIMER_CTRL 0x0034
+
+/* LPI control and status defines */
+#define LPI_CTRL_STATUS_LPITXA 0x00080000 /* Enable LPI TX Automate */
+#define LPI_CTRL_STATUS_PLSEN 0x00040000 /* Enable PHY Link Status */
+#define LPI_CTRL_STATUS_PLS 0x00020000 /* PHY Link Status */
+#define LPI_CTRL_STATUS_LPIEN 0x00010000 /* LPI Enable */
+#define LPI_CTRL_STATUS_RLPIST 0x00000200 /* Receive LPI state */
+#define LPI_CTRL_STATUS_TLPIST 0x00000100 /* Transmit LPI state */
+#define LPI_CTRL_STATUS_RLPIEX 0x00000008 /* Receive LPI Exit */
+#define LPI_CTRL_STATUS_RLPIEN 0x00000004 /* Receive LPI Entry */
+#define LPI_CTRL_STATUS_TLPIEX 0x00000002 /* Transmit LPI Exit */
+#define LPI_CTRL_STATUS_TLPIEN 0x00000001 /* Transmit LPI Entry */
+
/* GMAC HW ADDR regs */
#define GMAC_ADDR_HIGH(reg) (((reg > 15) ? 0x00000800 : 0x00000040) + \
(reg * 8))
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b5e4d02..bfe0226 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -194,26 +194,107 @@ static void dwmac1000_pmt(void __iomem *ioaddr, unsigned long mode)
}
-static void dwmac1000_irq_status(void __iomem *ioaddr)
+static int dwmac1000_irq_status(void __iomem *ioaddr)
{
u32 intr_status = readl(ioaddr + GMAC_INT_STATUS);
+ int status = 0;
/* Not used events (e.g. MMC interrupts) are not handled. */
- if ((intr_status & mmc_tx_irq))
- CHIP_DBG(KERN_DEBUG "GMAC: MMC tx interrupt: 0x%08x\n",
+ if ((intr_status & mmc_tx_irq)) {
+ CHIP_DBG(KERN_INFO "GMAC: MMC tx interrupt: 0x%08x\n",
readl(ioaddr + GMAC_MMC_TX_INTR));
- if (unlikely(intr_status & mmc_rx_irq))
- CHIP_DBG(KERN_DEBUG "GMAC: MMC rx interrupt: 0x%08x\n",
+ status |= core_mmc_tx_irq;
+ }
+ if (unlikely(intr_status & mmc_rx_irq)) {
+ CHIP_DBG(KERN_INFO "GMAC: MMC rx interrupt: 0x%08x\n",
readl(ioaddr + GMAC_MMC_RX_INTR));
- if (unlikely(intr_status & mmc_rx_csum_offload_irq))
- CHIP_DBG(KERN_DEBUG "GMAC: MMC rx csum offload: 0x%08x\n",
+ status |= core_mmc_rx_irq;
+ }
+ if (unlikely(intr_status & mmc_rx_csum_offload_irq)) {
+ CHIP_DBG(KERN_INFO "GMAC: MMC rx csum offload: 0x%08x\n",
readl(ioaddr + GMAC_MMC_RX_CSUM_OFFLOAD));
+ status |= core_mmc_rx_csum_offload_irq;
+ }
if (unlikely(intr_status & pmt_irq)) {
- CHIP_DBG(KERN_DEBUG "GMAC: received Magic frame\n");
+ CHIP_DBG(KERN_INFO "GMAC: received Magic frame\n");
/* clear the PMT bits 5 and 6 by reading the PMT
* status register. */
readl(ioaddr + GMAC_PMT);
+ status |= core_irq_receive_pmt_irq;
}
+ /* MAC trx/rx EEE LPI entry/exit interrupts */
+ if (intr_status & lpiis_irq) {
+ /* Clean LPI interrupt by reading the Reg 12 */
+ u32 lpi_status = readl(ioaddr + LPI_CTRL_STATUS);
+
+ if (lpi_status & LPI_CTRL_STATUS_TLPIEN) {
+ CHIP_DBG(KERN_INFO "GMAC TX entered in LPI\n");
+ status |= core_irq_tx_path_in_lpi_mode;
+ }
+ if (lpi_status & LPI_CTRL_STATUS_TLPIEX) {
+ CHIP_DBG(KERN_INFO "GMAC TX exit from LPI\n");
+ status |= core_irq_tx_path_exit_lpi_mode;
+ }
+ if (lpi_status & LPI_CTRL_STATUS_RLPIEN) {
+ CHIP_DBG(KERN_INFO "GMAC RX entered in LPI\n");
+ status |= core_irq_rx_path_in_lpi_mode;
+ }
+ if (lpi_status & LPI_CTRL_STATUS_RLPIEX) {
+ CHIP_DBG(KERN_INFO "GMAC RX exit from LPI\n");
+ status |= core_irq_rx_path_exit_lpi_mode;
+ }
+ }
+
+ return status;
+}
+
+static void dwmac1000_set_eee_mode(void __iomem *ioaddr)
+{
+ u32 value;
+
+ /* Enable the link status receive on RGMII, SGMII ore SMII
+ * receive path and instruct the transmit to enter in LPI
+ * state. */
+ value = readl(ioaddr + LPI_CTRL_STATUS);
+ value |= LPI_CTRL_STATUS_LPIEN | LPI_CTRL_STATUS_LPITXA;
+ writel(value, ioaddr + LPI_CTRL_STATUS);
+}
+
+static void dwmac1000_reset_eee_mode(void __iomem *ioaddr)
+{
+ u32 value;
+
+ value = readl(ioaddr + LPI_CTRL_STATUS);
+ value &= ~(LPI_CTRL_STATUS_LPIEN | LPI_CTRL_STATUS_LPITXA);
+ writel(value, ioaddr + LPI_CTRL_STATUS);
+}
+
+static void dwmac1000_set_eee_pls(void __iomem *ioaddr, int link)
+{
+ u32 value;
+
+ value = readl(ioaddr + LPI_CTRL_STATUS);
+
+ if (link)
+ value |= LPI_CTRL_STATUS_PLS;
+ else
+ value &= ~LPI_CTRL_STATUS_PLS;
+
+ writel(value, ioaddr + LPI_CTRL_STATUS);
+}
+
+static void dwmac1000_set_eee_timer(void __iomem *ioaddr, int ls, int tw)
+{
+ int value = ((tw & 0xffff)) | ((ls & 0x7ff) << 16);
+
+ /* Program the timers in the LPI timer control register:
+ * LS: minimum time (ms) for which the link
+ * status from PHY should be ok before transmitting
+ * the LPI pattern.
+ * TW: minimum time (us) for which the core waits
+ * after it has stopped transmitting the LPI pattern.
+ */
+ writel(value, ioaddr + LPI_TIMER_CTRL);
}
static const struct stmmac_ops dwmac1000_ops = {
@@ -226,6 +307,10 @@ static const struct stmmac_ops dwmac1000_ops = {
.pmt = dwmac1000_pmt,
.set_umac_addr = dwmac1000_set_umac_addr,
.get_umac_addr = dwmac1000_get_umac_addr,
+ .set_eee_mode = dwmac1000_set_eee_mode,
+ .reset_eee_mode = dwmac1000_reset_eee_mode,
+ .set_eee_timer = dwmac1000_set_eee_timer,
+ .set_eee_pls = dwmac1000_set_eee_pls,
};
struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 19e0f4e..f83210e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -72,9 +72,9 @@ static int dwmac100_rx_ipc_enable(void __iomem *ioaddr)
return 0;
}
-static void dwmac100_irq_status(void __iomem *ioaddr)
+static int dwmac100_irq_status(void __iomem *ioaddr)
{
- return;
+ return 0;
}
static void dwmac100_set_umac_addr(void __iomem *ioaddr, unsigned char *addr,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h
index 6e0360f..e678ce3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h
@@ -70,6 +70,7 @@
#define DMA_INTR_DEFAULT_MASK (DMA_INTR_NORMAL | DMA_INTR_ABNORMAL)
/* DMA Status register defines */
+#define DMA_STATUS_GLPII 0x40000000 /* GMAC LPI interrupt */
#define DMA_STATUS_GPI 0x10000000 /* PMT interrupt */
#define DMA_STATUS_GMI 0x08000000 /* MMC interrupt */
#define DMA_STATUS_GLI 0x04000000 /* GMAC Line interface int */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index dc20c56..ab4c376 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -87,6 +87,12 @@ struct stmmac_priv {
#endif
int clk_csr;
int synopsys_id;
+ struct timer_list eee_ctrl_timer;
+ bool tx_path_in_lpi_mode;
+ int lpi_irq;
+ int eee_enabled;
+ int eee_active;
+ int tx_lpi_timer;
};
extern int phyaddr;
@@ -104,6 +110,8 @@ int stmmac_dvr_remove(struct net_device *ndev);
struct stmmac_priv *stmmac_dvr_probe(struct device *device,
struct plat_stmmacenet_data *plat_dat,
void __iomem *addr);
+void stmmac_disable_eee_mode(struct stmmac_priv *priv);
+bool stmmac_eee_init(struct stmmac_priv *priv);
#ifdef CONFIG_HAVE_CLK
static inline int stmmac_clk_enable(struct stmmac_priv *priv)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index ce43184..76fd61a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -93,6 +93,16 @@ static const struct stmmac_stats stmmac_gstrings_stats[] = {
STMMAC_STAT(poll_n),
STMMAC_STAT(sched_timer_n),
STMMAC_STAT(normal_irq_n),
+ STMMAC_STAT(normal_irq_n),
+ STMMAC_STAT(mmc_tx_irq_n),
+ STMMAC_STAT(mmc_rx_irq_n),
+ STMMAC_STAT(mmc_rx_csum_offload_irq_n),
+ STMMAC_STAT(irq_receive_pmt_irq_n),
+ STMMAC_STAT(irq_tx_path_in_lpi_mode_n),
+ STMMAC_STAT(irq_tx_path_exit_lpi_mode_n),
+ STMMAC_STAT(irq_rx_path_in_lpi_mode_n),
+ STMMAC_STAT(irq_rx_path_exit_lpi_mode_n),
+ STMMAC_STAT(phy_eee_wakeup_error_n),
};
#define STMMAC_STATS_LEN ARRAY_SIZE(stmmac_gstrings_stats)
@@ -366,6 +376,11 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
(*(u32 *)p);
}
}
+ if (priv->eee_enabled) {
+ int val = phy_get_eee_err(priv->phydev);
+ if (val)
+ priv->xstats.phy_eee_wakeup_error_n = val;
+ }
}
for (i = 0; i < STMMAC_STATS_LEN; i++) {
char *p = (char *)priv + stmmac_gstrings_stats[i].stat_offset;
@@ -464,6 +479,46 @@ static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
return 0;
}
+static int stmmac_ethtool_op_get_eee(struct net_device *dev,
+ struct ethtool_eee *edata)
+{
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+ if (!priv->dma_cap.eee)
+ return -EOPNOTSUPP;
+
+ edata->eee_enabled = priv->eee_enabled;
+ edata->eee_active = priv->eee_active;
+ edata->tx_lpi_timer = priv->tx_lpi_timer;
+
+ return phy_ethtool_get_eee(priv->phydev, edata);
+}
+
+static int stmmac_ethtool_op_set_eee(struct net_device *dev,
+ struct ethtool_eee *edata)
+{
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+ priv->eee_enabled = edata->eee_enabled;
+
+ if (!priv->eee_enabled)
+ stmmac_disable_eee_mode(priv);
+ else {
+ /* We are asking for enabling the EEE but it is safe
+ * to verify all by invoking the eee_init function.
+ * In case of failure it will return an error.
+ */
+ priv->eee_enabled = stmmac_eee_init(priv);
+ if (!priv->eee_enabled)
+ return -EOPNOTSUPP;
+
+ /* Do not change tx_lpi_timer in case of failure */
+ priv->tx_lpi_timer = edata->tx_lpi_timer;
+ }
+
+ return phy_ethtool_set_eee(priv->phydev, edata);
+}
+
static const struct ethtool_ops stmmac_ethtool_ops = {
.begin = stmmac_check_if_running,
.get_drvinfo = stmmac_ethtool_getdrvinfo,
@@ -480,6 +535,8 @@ static const struct ethtool_ops stmmac_ethtool_ops = {
.get_strings = stmmac_get_strings,
.get_wol = stmmac_get_wol,
.set_wol = stmmac_set_wol,
+ .get_eee = stmmac_ethtool_op_get_eee,
+ .set_eee = stmmac_ethtool_op_set_eee,
.get_sset_count = stmmac_get_sset_count,
.get_ts_info = ethtool_op_get_ts_info,
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index eba49cb..ea3bc09 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -133,6 +133,12 @@ static const u32 default_msg_level = (NETIF_MSG_DRV | NETIF_MSG_PROBE |
NETIF_MSG_LINK | NETIF_MSG_IFUP |
NETIF_MSG_IFDOWN | NETIF_MSG_TIMER);
+#define STMMAC_DEFAULT_LPI_TIMER 1000
+static int eee_timer = STMMAC_DEFAULT_LPI_TIMER;
+module_param(eee_timer, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(eee_timer, "LPI tx expiration time in msec");
+#define STMMAC_LPI_TIMER(x) (jiffies + msecs_to_jiffies(x))
+
static irqreturn_t stmmac_interrupt(int irq, void *dev_id);
#ifdef CONFIG_STMMAC_DEBUG_FS
@@ -161,6 +167,8 @@ static void stmmac_verify_args(void)
flow_ctrl = FLOW_OFF;
if (unlikely((pause < 0) || (pause > 0xffff)))
pause = PAUSE_TIME;
+ if (eee_timer < 0)
+ eee_timer = STMMAC_DEFAULT_LPI_TIMER;
}
static void stmmac_clk_csr_set(struct stmmac_priv *priv)
@@ -229,6 +237,85 @@ static inline void stmmac_hw_fix_mac_speed(struct stmmac_priv *priv)
phydev->speed);
}
+static void stmmac_enable_eee_mode(struct stmmac_priv *priv)
+{
+ /* Check and enter in LPI mode */
+ if ((priv->dirty_tx == priv->cur_tx) &&
+ (priv->tx_path_in_lpi_mode == false))
+ priv->hw->mac->set_eee_mode(priv->ioaddr);
+}
+
+void stmmac_disable_eee_mode(struct stmmac_priv *priv)
+{
+ /* Exit and disable EEE in case of we are are in LPI state. */
+ priv->hw->mac->reset_eee_mode(priv->ioaddr);
+ del_timer_sync(&priv->eee_ctrl_timer);
+ priv->tx_path_in_lpi_mode = false;
+}
+
+/**
+ * stmmac_eee_ctrl_timer
+ * @arg : data hook
+ * Description:
+ * If there is no data transfer and if we are not in LPI state,
+ * then MAC Transmitter can be moved to LPI state.
+ */
+static void stmmac_eee_ctrl_timer(unsigned long arg)
+{
+ struct stmmac_priv *priv = (struct stmmac_priv *)arg;
+
+ stmmac_enable_eee_mode(priv);
+ mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_TIMER(eee_timer));
+}
+
+/**
+ * stmmac_eee_init
+ * @priv: private device pointer
+ * Description:
+ * If the EEE support has been enabled while configuring the driver,
+ * if the GMAC actually supports the EEE (from the HW cap reg) and the
+ * phy can also manage EEE, so enable the LPI state and start the timer
+ * to verify if the tx path can enter in LPI state.
+ */
+bool stmmac_eee_init(struct stmmac_priv *priv)
+{
+ bool ret = false;
+
+ /* MAC core supports the EEE feature. */
+ if (priv->dma_cap.eee) {
+ /* Check if the PHY supports EEE */
+ if (phy_init_eee(priv->phydev, 1))
+ goto out;
+
+ priv->eee_active = 1;
+ init_timer(&priv->eee_ctrl_timer);
+ priv->eee_ctrl_timer.function = stmmac_eee_ctrl_timer;
+ priv->eee_ctrl_timer.data = (unsigned long)priv;
+ priv->eee_ctrl_timer.expires = STMMAC_LPI_TIMER(eee_timer);
+ add_timer(&priv->eee_ctrl_timer);
+
+ priv->hw->mac->set_eee_timer(priv->ioaddr,
+ STMMAC_DEFAULT_LIT_LS_TIMER,
+ priv->tx_lpi_timer);
+
+ pr_info("stmmac: Energy-Efficient Ethernet initialized\n");
+
+ ret = true;
+ }
+out:
+ return ret;
+}
+
+static void stmmac_eee_adjust(struct stmmac_priv *priv)
+{
+ /* When the EEE has been already initialised we have to
+ * modify the PLS bit in the LPI ctrl & status reg according
+ * to the PHY link status. For this reason.
+ */
+ if (priv->eee_enabled)
+ priv->hw->mac->set_eee_pls(priv->ioaddr, priv->phydev->link);
+}
+
/**
* stmmac_adjust_link
* @dev: net device structure
@@ -249,6 +336,7 @@ static void stmmac_adjust_link(struct net_device *dev)
phydev->addr, phydev->link);
spin_lock_irqsave(&priv->lock, flags);
+
if (phydev->link) {
u32 ctrl = readl(priv->ioaddr + MAC_CTRL_REG);
@@ -315,6 +403,8 @@ static void stmmac_adjust_link(struct net_device *dev)
if (new_state && netif_msg_link(priv))
phy_print_status(phydev);
+ stmmac_eee_adjust(priv);
+
spin_unlock_irqrestore(&priv->lock, flags);
DBG(probe, DEBUG, "stmmac_adjust_link: exiting\n");
@@ -332,7 +422,7 @@ static int stmmac_init_phy(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
struct phy_device *phydev;
- char phy_id[MII_BUS_ID_SIZE + 3];
+ char phy_id_fmt[MII_BUS_ID_SIZE + 3];
char bus_id[MII_BUS_ID_SIZE];
int interface = priv->plat->interface;
priv->oldlink = 0;
@@ -346,11 +436,12 @@ static int stmmac_init_phy(struct net_device *dev)
snprintf(bus_id, MII_BUS_ID_SIZE, "stmmac-%x",
priv->plat->bus_id);
- snprintf(phy_id, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
+ snprintf(phy_id_fmt, MII_BUS_ID_SIZE + 3, PHY_ID_FMT, bus_id,
priv->plat->phy_addr);
- pr_debug("stmmac_init_phy: trying to attach to %s\n", phy_id);
+ pr_debug("stmmac_init_phy: trying to attach to %s\n", phy_id_fmt);
- phydev = phy_connect(dev, phy_id, &stmmac_adjust_link, 0, interface);
+ phydev = phy_connect(dev, phy_id_fmt, &stmmac_adjust_link, 0,
+ interface);
if (IS_ERR(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
@@ -689,6 +780,11 @@ static void stmmac_tx(struct stmmac_priv *priv)
}
netif_tx_unlock(priv->dev);
}
+
+ if ((priv->eee_enabled) && (!priv->tx_path_in_lpi_mode)) {
+ stmmac_enable_eee_mode(priv);
+ mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_TIMER(eee_timer));
+ }
spin_unlock(&priv->tx_lock);
}
@@ -1027,6 +1123,17 @@ static int stmmac_open(struct net_device *dev)
}
}
+ /* Request the IRQ lines */
+ if (priv->lpi_irq != -ENXIO) {
+ ret = request_irq(priv->lpi_irq, stmmac_interrupt, IRQF_SHARED,
+ dev->name, dev);
+ if (unlikely(ret < 0)) {
+ pr_err("%s: ERROR: allocating the LPI IRQ %d (%d)\n",
+ __func__, priv->lpi_irq, ret);
+ goto open_error_lpiirq;
+ }
+ }
+
/* Enable the MAC Rx/Tx */
stmmac_set_mac(priv->ioaddr, true);
@@ -1062,12 +1169,19 @@ static int stmmac_open(struct net_device *dev)
if (priv->phydev)
phy_start(priv->phydev);
+ priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS_TIMER;
+ priv->eee_enabled = stmmac_eee_init(priv);
+
napi_enable(&priv->napi);
skb_queue_head_init(&priv->rx_recycle);
netif_start_queue(dev);
return 0;
+open_error_lpiirq:
+ if (priv->wol_irq != dev->irq)
+ free_irq(priv->wol_irq, dev);
+
open_error_wolirq:
free_irq(dev->irq, dev);
@@ -1093,6 +1207,9 @@ static int stmmac_release(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
+ if (priv->eee_enabled)
+ del_timer_sync(&priv->eee_ctrl_timer);
+
/* Stop and disconnect the PHY */
if (priv->phydev) {
phy_stop(priv->phydev);
@@ -1115,6 +1232,8 @@ static int stmmac_release(struct net_device *dev)
free_irq(dev->irq, dev);
if (priv->wol_irq != dev->irq)
free_irq(priv->wol_irq, dev);
+ if (priv->lpi_irq != -ENXIO)
+ free_irq(priv->lpi_irq, dev);
/* Stop TX/RX DMA and clear the descriptors */
priv->hw->dma->stop_tx(priv->ioaddr);
@@ -1164,6 +1283,9 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
spin_lock(&priv->tx_lock);
+ if (priv->tx_path_in_lpi_mode)
+ stmmac_disable_eee_mode(priv);
+
entry = priv->cur_tx % txsize;
#ifdef STMMAC_XMIT_DEBUG
@@ -1540,10 +1662,37 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
return IRQ_NONE;
}
- if (priv->plat->has_gmac)
- /* To handle GMAC own interrupts */
- priv->hw->mac->host_irq_status((void __iomem *) dev->base_addr);
+ /* To handle GMAC own interrupts */
+ if (priv->plat->has_gmac) {
+ int status = priv->hw->mac->host_irq_status((void __iomem *)
+ dev->base_addr);
+ if (unlikely(status)) {
+ if (status & core_mmc_tx_irq)
+ priv->xstats.mmc_tx_irq_n++;
+ if (status & core_mmc_rx_irq)
+ priv->xstats.mmc_rx_irq_n++;
+ if (status & core_mmc_rx_csum_offload_irq)
+ priv->xstats.mmc_rx_csum_offload_irq_n++;
+ if (status & core_irq_receive_pmt_irq)
+ priv->xstats.irq_receive_pmt_irq_n++;
+
+ /* For LPI we need to save the tx status */
+ if (status & core_irq_tx_path_in_lpi_mode) {
+ priv->xstats.irq_tx_path_in_lpi_mode_n++;
+ priv->tx_path_in_lpi_mode = true;
+ }
+ if (status & core_irq_tx_path_exit_lpi_mode) {
+ priv->xstats.irq_tx_path_exit_lpi_mode_n++;
+ priv->tx_path_in_lpi_mode = false;
+ }
+ if (status & core_irq_rx_path_in_lpi_mode)
+ priv->xstats.irq_rx_path_in_lpi_mode_n++;
+ if (status & core_irq_rx_path_exit_lpi_mode)
+ priv->xstats.irq_rx_path_exit_lpi_mode_n++;
+ }
+ }
+ /* To handle DMA interrupts */
stmmac_dma_interrupt(priv);
return IRQ_HANDLED;
@@ -2155,6 +2304,9 @@ static int __init stmmac_cmdline_opt(char *str)
} else if (!strncmp(opt, "pause:", 6)) {
if (kstrtoint(opt + 6, 0, &pause))
goto err;
+ } else if (!strncmp(opt, "eee_timer:", 6)) {
+ if (kstrtoint(opt + 10, 0, &eee_timer))
+ goto err;
#ifdef CONFIG_STMMAC_TIMER
} else if (!strncmp(opt, "tmrate:", 7)) {
if (kstrtoint(opt + 7, 0, &tmrate))
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 20eb502..7d36163 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -156,6 +156,8 @@ static int stmmac_pltfr_probe(struct platform_device *pdev)
if (priv->wol_irq == -ENXIO)
priv->wol_irq = priv->dev->irq;
+ priv->lpi_irq = platform_get_irq_byname(pdev, "eth_lpi");
+
platform_set_drvdata(pdev, priv->dev);
pr_debug("STMMAC platform driver registration completed");
--
1.7.4.4
^ permalink raw reply related
* [net-next.git 2/4] stmmac: update the driver Documentation and add EEE
From: Giuseppe CAVALLARO @ 2012-06-21 6:03 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, bhutchings, rayagond, davem, yuvalmin,
Giuseppe Cavallaro
In-Reply-To: <1340258599-3083-1-git-send-email-peppe.cavallaro@st.com>
This patch updates the stmmac's documentation adding
some missing files in the section used to describe the
internal driver's structure.
Also the patch adds a new section to describe the EEE support.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
Documentation/networking/stmmac.txt | 36 +++++++++++++++++++++++++++++-----
1 files changed, 30 insertions(+), 6 deletions(-)
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 5cb9a19..c676b9c 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -257,9 +257,11 @@ reset procedure etc).
o Makefile
o stmmac_main.c: main network device driver;
o stmmac_mdio.c: mdio functions;
+ o stmmac_pci: PCI driver;
+ o stmmac_platform.c: platform driver
o stmmac_ethtool.c: ethtool support;
o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
- Only tested on ST40 platforms based.
+ (only tested on ST40 platforms based);
o stmmac.h: private driver structure;
o common.h: common definitions and VFTs;
o descs.h: descriptor structure definitions;
@@ -269,9 +271,11 @@ reset procedure etc).
o dwmac100_core: MAC 100 core and dma code;
o dwmac100_dma.c: dma funtions for the MAC chip;
o dwmac1000.h: specific header file for the MAC;
- o dwmac_lib.c: generic DMA functions shared among chips
- o enh_desc.c: functions for handling enhanced descriptors
- o norm_desc.c: functions for handling normal descriptors
+ o dwmac_lib.c: generic DMA functions shared among chips;
+ o enh_desc.c: functions for handling enhanced descriptors;
+ o norm_desc.c: functions for handling normal descriptors;
+ o chain_mode.c/ring_mode.c:: functions to manage RING/CHAINED modes;
+ o mmc_core.c/mmc.h: Management MAC Counters;
5) Debug Information
@@ -304,7 +308,27 @@ All these are only useful during the developing stage
and should never enabled inside the code for general usage.
In fact, these can generate an huge amount of debug messages.
-6) TODO:
+6) Energy Efficient Ethernet
+
+Energy Efficient Ethernet(EEE) enables IEEE 802.3 MAC sublayer along
+with a family of Physical layer to operate in the Low power Idle(LPI)
+mode. The EEE mode supports the IEEE 802.3 MAC operation at 100Mbps,
+1000Mbps & 10Gbps.
+
+The LPI mode allows power saving by switching off parts of the
+communication device functionality when there is no data to be
+transmitted & received. The system on both the side of the link can
+disable some functionalities & save power during the period of low-link
+utilization. The MAC controls whether the system should enter or exit
+the LPI mode & communicate this to PHY.
+
+As soon as the interface is opened, the driver verifies if the EEE can
+be supported. This is done by looking at both the DMA HW capability
+register and the PHY devices MCD registers.
+To enter in Tx LPI mode the driver needs to have a software timer
+that enable and disable the LPI mode when there is nothing to be
+transmitted.
+
+7) TODO:
o XGMAC is not supported.
- o Add the EEE - Energy Efficient Ethernet
o Add the PTP - precision time protocol
--
1.7.4.4
^ permalink raw reply related
* [net-next.git 1/4 (v3)] stmmac: do not use strict_strtoul but kstrtoint
From: Giuseppe CAVALLARO @ 2012-06-21 6:03 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, bhutchings, rayagond, davem, yuvalmin,
Giuseppe Cavallaro
In-Reply-To: <1340258599-3083-1-git-send-email-peppe.cavallaro@st.com>
This patch replaces the obsolete strict_strtoul with kstrtoint.
v2: also removed casting on kstrtoul.
v3: use kstrtoint instead of kstrtoul due to all vars are integer.
thanks to E. Dumazet.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 27 +++++++-------------
1 files changed, 10 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 590e95b..eba49cb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2129,42 +2129,35 @@ static int __init stmmac_cmdline_opt(char *str)
return -EINVAL;
while ((opt = strsep(&str, ",")) != NULL) {
if (!strncmp(opt, "debug:", 6)) {
- if (strict_strtoul(opt + 6, 0, (unsigned long *)&debug))
+ if (kstrtoint(opt + 6, 0, &debug))
goto err;
} else if (!strncmp(opt, "phyaddr:", 8)) {
- if (strict_strtoul(opt + 8, 0,
- (unsigned long *)&phyaddr))
+ if (kstrtoint(opt + 8, 0, &phyaddr))
goto err;
} else if (!strncmp(opt, "dma_txsize:", 11)) {
- if (strict_strtoul(opt + 11, 0,
- (unsigned long *)&dma_txsize))
+ if (kstrtoint(opt + 11, 0, &dma_txsize))
goto err;
} else if (!strncmp(opt, "dma_rxsize:", 11)) {
- if (strict_strtoul(opt + 11, 0,
- (unsigned long *)&dma_rxsize))
+ if (kstrtoint(opt + 11, 0, &dma_rxsize))
goto err;
} else if (!strncmp(opt, "buf_sz:", 7)) {
- if (strict_strtoul(opt + 7, 0,
- (unsigned long *)&buf_sz))
+ if (kstrtoint(opt + 7, 0, &buf_sz))
goto err;
} else if (!strncmp(opt, "tc:", 3)) {
- if (strict_strtoul(opt + 3, 0, (unsigned long *)&tc))
+ if (kstrtoint(opt + 3, 0, &tc))
goto err;
} else if (!strncmp(opt, "watchdog:", 9)) {
- if (strict_strtoul(opt + 9, 0,
- (unsigned long *)&watchdog))
+ if (kstrtoint(opt + 9, 0, &watchdog))
goto err;
} else if (!strncmp(opt, "flow_ctrl:", 10)) {
- if (strict_strtoul(opt + 10, 0,
- (unsigned long *)&flow_ctrl))
+ if (kstrtoint(opt + 10, 0, &flow_ctrl))
goto err;
} else if (!strncmp(opt, "pause:", 6)) {
- if (strict_strtoul(opt + 6, 0, (unsigned long *)&pause))
+ if (kstrtoint(opt + 6, 0, &pause))
goto err;
#ifdef CONFIG_STMMAC_TIMER
} else if (!strncmp(opt, "tmrate:", 7)) {
- if (strict_strtoul(opt + 7, 0,
- (unsigned long *)&tmrate))
+ if (kstrtoint(opt + 7, 0, &tmrate))
goto err;
#endif
}
--
1.7.4.4
^ permalink raw reply related
* [net-next.git 0/4] EEE for PAL and stmmac (V6)
From: Giuseppe CAVALLARO @ 2012-06-21 6:03 UTC (permalink / raw)
To: netdev
Cc: eric.dumazet, bhutchings, rayagond, davem, yuvalmin,
Giuseppe Cavallaro
These patches add the EEE support in the stmmac device driver
restoring an old work I had done some months ago and not
completed in time.
I've tested all on ST STB with the IC+ 101G PHY device that has
this feature.
The initial EEE support for the stmmac has been written by Rayagond
but I have reworked all his code adding new parts and especially
performing tests on a real hardware. Thx Rayagond!
In these patches, we can see that the stmmac supports the EEE
only if the DMA HW capability register says that this
feature is actually available. In that case, the driver can enter
in the Tx LPI mode by using a timer as recommended by Synopsys.
Note that EEE is supported in new chip generations; in particular
I used the 3.61a.
At any rate, further information about how the driver treats the EEE
can be found in the stmmac.txt file (there is a patch for that).
Another patch is for Physical Abstraction Layer now able to
manage the MMD registers (clause 45); it also provides the ethtool
support to manage supported/advertisement/lp adv features.
v3: fixed the "stmmac: do not use strict_strtoul but kstrtoint"
to use the kstrtoint.
v4: fixed the function to enable the EEE and add a check that verifies
if the link auto-negotiated matches with the bits in the adv and lp
registers.
v5: reviewed the way to get the negotiated settings
v6: fixed a broken return value in the phy_eee_init function
Giuseppe Cavallaro (4):
stmmac: do not use strict_strtoul but kstrtoint
stmmac: update the driver Documentation and add EEE
stmmac: add the Energy Efficient Ethernet support
phy: add the EEE support and the way to access to the MMD registers.
Documentation/networking/stmmac.txt | 36 ++-
drivers/net/ethernet/stmicro/stmmac/common.h | 31 ++-
drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 20 ++
.../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 101 +++++++-
.../net/ethernet/stmicro/stmmac/dwmac100_core.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 8 +
.../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 57 ++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 193 ++++++++++++--
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +
drivers/net/phy/phy.c | 281 ++++++++++++++++++++
include/linux/mdio.h | 21 ++-
include/linux/mii.h | 9 +
include/linux/phy.h | 5 +
14 files changed, 724 insertions(+), 45 deletions(-)
--
1.7.4.4
^ permalink raw reply
* Re: How does SACK or FACK determine the time to start fast retransmition?
From: Eric Dumazet @ 2012-06-21 5:59 UTC (permalink / raw)
To: LovelyLich; +Cc: netdev, linux-kernel
In-Reply-To: <CAAA3+BpRQ24HED6a+yCF+A_q=vJ7nHexLJd1U+-50pb43MVa5w@mail.gmail.com>
On Wed, 2012-06-20 at 22:31 +0800, LovelyLich wrote:
> HI all,
> When tcp uses reno as its congestion control algothim, it uses
> tp->sacked_out as dup-ack. When the third dup-ack(under default
> condition) comes, tcp will initiate its fast retransmition.
> But how about sack ?
Some tcp experts out there only catch tcp related stuff if Subject
includes TCP.
You might resend your mail on netdev (not sure lkml will really help)
with following subject :
[RFC] tcp: How does SACK or FACK determine the time to start fast
retransmission?
To get more traction.
^ permalink raw reply
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: David Miller @ 2012-06-21 5:56 UTC (permalink / raw)
To: eric.dumazet; +Cc: alexander.h.duyck, netdev, jeffrey.t.kirsher
In-Reply-To: <1340180223.4604.828.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 10:17:03 +0200
> By the way, big cost in netdev_alloc_frag() is the irq
> masking/restore We probably could have a version for softirq
> users...
That's an extremely disappointing way for us to be losing cycles.
Everyone pays this price purely because:
1) __netdev_alloc_skb() uses netdev_alloc_frag()
and:
2) #1 is invoked, either directly or indirectly, by tons
of slow non-NAPI drivers.
This got me looking into the plathora of interfaces we let drivers use
to allocate RX buffers. It's a big mess.
We have dev_alloc_skb() which essentially calls __netdev_alloc_skb()
with a NULL device argument. This is terrible because it means that
if we want to do something interesting on a per-device level we can't
rely upon the device being non-NULL in __netdev_alloc_skb().
I looked at the remaining dev_alloc_skb() users and these are in places
which are allocating packets in a module which is one level removed from
the netdevice level. For example, ATM and infiniband IPATH.
What these callers want is something more like:
static inline struct sk_buff *alloc_skb_and_reserve_pad(unsinged int length,
gfp_t gfp_mask)
{
struct sk_buff *skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
0, NUMA_NO_NODE);
if (likely(skb))
skb_reserve(skb, NET_SKB_PAD);
return skb;
}
Then we won't have the NULL device case for __netdev_alloc_skb() any more.
^ permalink raw reply
* Re: [RFC net-next 00/14] default maximal number of RSS queues in mq drivers
From: David Miller @ 2012-06-21 5:49 UTC (permalink / raw)
To: yuvalmin
Cc: bhutchings, netdev, eilong, divy, ogerlitz, jdmason,
anirban.chakraborty, jitendra.kalsaria, ron.mercer,
jeffrey.t.kirsher, mason, gallatin, sathya.perla,
subbu.seetharaman, ajit.khaparde, mcarlson, mchan
In-Reply-To: <4FE2ADF8.8010508@broadcom.com>
From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Thu, 21 Jun 2012 08:15:36 +0300
> On 06/20/2012 11:48 PM, Ben Hutchings wrote:
>
>> Also, I would recommend encapsulating the calculation of default number
>> of RSS queues in a function, rather than repeating it in every driver.
>> That will make it easier to replace with something more sophisticated
>> and configurable later on.
>>
>> Ben.
>>
>
> Do You also have the notion of where's the correct place to place such a
> function?
I would put the extern declaration into linux/netdevice.h and the
implementation in net/core/dev.c
^ permalink raw reply
* Re
From: Dave Wegener @ 2012-06-21 5:31 UTC (permalink / raw)
Hello,
I would like to discuss a very good business proposal with you. Kindly contact: weiserosanka@yahoo.com.hk Mr.Weise
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox