* Re: dsa: handling more than 1 cpu port
From: Andrew Lunn @ 2016-12-14 10:31 UTC (permalink / raw)
To: John Crispin; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <1671ae82-c213-7611-584f-02a3b1d5dff9@phrozen.org>
On Wed, Dec 14, 2016 at 11:01:54AM +0100, John Crispin wrote:
> Hi Andrew,
>
> switches supported by qca8k have 2 gmacs that we can wire an external
> mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
> switch. Thw switch itself is aware of one of the MACs being the cpu port
> and expects this to be port/mac0. Using the other will break the
> hardware offloading features.
Just to be sure here. There is no way to use the second port connected
to the CPU as a CPU port?
The Marvell chips do allow this. So i developed a proof of concept
which had a mapping between cpu ports and slave ports. slave port X
should you cpu port y for its traffic. This never got past proof of
concept.
If this can be made to work for qca8k, i would prefer having this
general concept, than specific hacks for pass through.
Andrew
^ permalink raw reply
* Re: dsa: handling more than 1 cpu port
From: John Crispin @ 2016-12-14 10:35 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <20161214103153.GC27370@lunn.ch>
On 14/12/2016 11:31, Andrew Lunn wrote:
> On Wed, Dec 14, 2016 at 11:01:54AM +0100, John Crispin wrote:
>> Hi Andrew,
>>
>> switches supported by qca8k have 2 gmacs that we can wire an external
>> mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
>> switch. Thw switch itself is aware of one of the MACs being the cpu port
>> and expects this to be port/mac0. Using the other will break the
>> hardware offloading features.
>
> Just to be sure here. There is no way to use the second port connected
> to the CPU as a CPU port?
both macs are considered cpu ports and both allow for the tag to be
injected. for HW NAT/routing/... offloading to work, the lan ports neet
to trunk via port0 and not port6 however.
>
> The Marvell chips do allow this. So i developed a proof of concept
> which had a mapping between cpu ports and slave ports. slave port X
> should you cpu port y for its traffic. This never got past proof of
> concept.
>
> If this can be made to work for qca8k, i would prefer having this
> general concept, than specific hacks for pass through.
oh cool, can you send those patches my way please ? how do you configure
this from userland ? does the cpu port get its on swdev which i just add
to my lan bridge and then add the 2nd cpu port to the wan bridge ?
John
^ permalink raw reply
* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Andrew Lunn @ 2016-12-14 10:46 UTC (permalink / raw)
To: Volodymyr Bendiuga
Cc: Volodymyr Bendiuga, Florian Fainelli, Vivien Didelot, netdev,
Volodymyr Bendiuga
In-Reply-To: <CABHmqqB-6-L124hrAjNxd4kO3zYp9R=RM1X6PXyuQW6Hapiv4g@mail.gmail.com>
On Wed, Dec 14, 2016 at 11:03:14AM +0100, Volodymyr Bendiuga wrote:
> Hi,
>
> I understand what you wrote, but we do not refresh anything
> at intervals. FDB dump works only on user request, i.e. user
> run bridge fdb dump command. What we did is just a smarter
> way of feeding the dump request with entries obtained from
> switchcore. Interesting thing is that fdb dump in switchcore
> reads all entries, and from those entries only one is returned at
> a time, others are discarded, because request comes as per port.
Ah, O.K.
> What we do is we dump entries from switchcore once, when the
> first dump request comes, save all of them in the cache, and then
> all following fdb dump requests for each port will be fed from the cache,
> so no more hardware dump operations. We set the cache to be valid for
> 2 seconds, but this could be adjusted and tweaked. So in our case
> we decrease the number of MDIO reads quite a lot.
> What we are thinking now, is that this logics could be moved
> to net/dsa layer, and maybe unified with fdb load logics, if possible.
We should check what the other switches do. Can they perform a dump
with the hardware performing the port filter? Those switches will not
benefit from such a cache. But they should also not suffer.
Combining it with load is a bigger question. Currently the drivers are
stateless. That makes the error handling very simple. We don't have to
worry about running out of memory, since we don't allocate any memory.
If we run out of memory for this dump cache, it is not serious, since
a dump does not result in state change. But if we are out of memory on
a load, we do have state changes. We have to deal with the complexity
of allocating resources in the _prepare call, and then use the
resource in the do call. I would much prefer to avoid this at first,
by optimising the atu get. If we don't see a significant speed up,
then we can consider this complex solution of a cache for load.
Andrew
^ permalink raw reply
* (unknown),
From: Mr Friedrich Mayrhofer @ 2016-12-14 9:45 UTC (permalink / raw)
Good Day,
This is the second time i am sending you this mail.
I, Friedrich Mayrhofer Donate $ 1,000,000.00 to You, Email Me
personally for more details.
Regards.
Friedrich Mayrhofer
^ permalink raw reply
* Re: net/arp: ARP cache aging failed.
From: YueHaibing @ 2016-12-14 10:54 UTC (permalink / raw)
To: Julian Anastasov, Hannes Frederic Sowa
Cc: Eric Dumazet, David S. Miller, netdev
In-Reply-To: <alpine.LFD.2.11.1611252140120.1888@ja.home.ssi.bg>
On 2016/11/26 4:40, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 25 Nov 2016, Hannes Frederic Sowa wrote:
>
>> On 25.11.2016 09:18, Julian Anastasov wrote:
>>>
>>> Another option would be to add similar bit to
>>> struct sock (sk_dst_pending_confirm), may be ip_finish_output2
>>> can propagate it to dst_neigh_output via new arg and then to
>>> reset it.
>>
>> I don't understand how this can help? Maybe I understood it wrong?
>
> The idea is, the indication from highest level (TCP) to
> be propagated to lowest level (neigh).
>
>>> Or to change n->confirmed via some new inline sock
>>> function instead of changing dst_neigh_output. At this place
>>> skb->sk is optional, may be we need (skb->sk && dst ==
>>> skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes
>>> we should clear this flag.
>>
>> In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from?
>
> This is in case we do not want to propagate indication
> from TCP to tunnels, see below...
>
>> I don't see a possibility besides using mac_header or inner_mac_header
>> to look up the incoming MAC address and confirm that one in the neighbor
>> cache so far (we could try to optimize this case for rt_gateway though).
>
> My idea is as follows:
>
> - TCP instead of calling dst_confirm should call new func
> dst_confirm_sk(sk) where sk->sk_dst_pending_confirm is set,
> not dst->pending_confirm.
>
> - ip_finish_output2: use skb->sk (TCP) to check for
> sk_dst_pending_confirm and update n->confirmed in some
> new inline func
>
> Correct me if I'm wrong but here is how I see the
> situation:
>
> - TCP caches output dst in socket, for example, dst1,
> sets it as skb->dst
>
> - if xfrm tunnels are used then dst1 can point to some tunnel,
> i.e. it is not in all cases the same skb->dst, as seen by
> ip_finish_output2
>
> - netfilter can DNAT and assign different skb->dst
>
> - as result, before now, dst1->pending_confirm is set but
> it is not used properly because ip_finish_output2 uses
> the final skb->dst which can be different or with
> rt_gateway = 0
>
> - considering that tunnels do not use dst_confirm,
> n->confirmed is not changed every time. There are some
> interesting cases:
>
> 1. both dst1 and the final skb->dst point to same
> dst with rt_gateway = 0: n->confirmed is updated but
> as wee see it can be for wrong neigh.
>
> 2. dst1 is different from skb->dst, so n->confirmed is
> not changed. This can happen on DNAT or when using
> tunnel to secure gateway.
>
> - in ip_finish_output2 we have skb->sk (original TCP sk) and
> sk arg (equal to skb->sk or second option is sock of some UDP
> tunnel, etc). The idea is to use skb->sk->sk_dst_pending_confirm,
> i.e. highest level sk because the sk arg, if different, does not
> have such flag set (tunnels do not call dst_confirm).
>
> - ip_finish_output2 should call new func dst_neigh_confirm_sk
> (which has nothing related to dst, hm... better name is needed):
>
> if (!IS_ERR(neigh)) {
> - int res = dst_neigh_output(dst, neigh, skb);
> + int res;
> +
> + dst_neigh_confirm_sk(skb->sk, neigh);
> + res = dst_neigh_output(dst, neigh, skb);
>
> which should do something like this:
>
> if (sk && sk->sk_dst_pending_confirm) {
> unsigned long now = jiffies;
>
> sk->sk_dst_pending_confirm = 0;
> /* avoid dirtying neighbour */
> if (n->confirmed != now)
> n->confirmed = now;
> }
>
> Additional dst == skb->sk->sk_dst_cache check will
> not propagate the indication on DNAT/tunnel. For me,
> it is better without such check.
>
> So, the idea is to move TCP and other similar
> users to the new dst_confirm_sk() method. If other
> dst_confirm() users are left, they should be checked
> if dsts with rt_gateway = 0 can be wrongly used.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>
> .
>
Sorry for so late.
Based on your ideas, I make a patch and test it for a while.
It works for me.
>From dc033fe4bac8931590e15774a298f04ccea8ed4c Mon Sep 17 00:00:00 2001
From: YueHabing <yuehaibing@huawei.com>
Date: Wed, 14 Dec 2016 02:43:51 -0500
Subject: [PATCH] arp: do neigh confirm based on sk arg
Signed-off-by: YueHabing <yuehaibing@huawei.com>
---
include/net/sock.h | 18 ++++++++++++++++++
net/ipv4/ip_output.c | 5 ++++-
net/ipv4/ping.c | 2 +-
net/ipv4/raw.c | 2 +-
net/ipv4/tcp_input.c | 8 ++------
net/ipv4/tcp_metrics.c | 5 +++--
net/ipv4/udp.c | 2 +-
net/ipv6/raw.c | 2 +-
net/ipv6/route.c | 6 +++---
net/ipv6/udp.c | 2 +-
10 files changed, 35 insertions(+), 17 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 92b2697..ece8dfa 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -434,6 +434,7 @@ struct sock {
struct sk_buff *sk_send_head;
__s32 sk_peek_off;
int sk_write_pending;
+ unsigned short sk_dst_pending_confirm;
#ifdef CONFIG_SECURITY
void *sk_security;
#endif
@@ -2263,6 +2264,23 @@ static inline void sk_state_store(struct sock *sk, int newstate)
smp_store_release(&sk->sk_state, newstate);
}
+static inline void dst_confirm_sk(struct sock *sk)
+{
+ sk->sk_dst_pending_confirm = 1;
+}
+
+static inline void dst_neigh_confirm_sk(struct sock *sk, struct neighbour *n)
+{
+ if (sk && sk->sk_dst_pending_confirm) {
+ unsigned long now = jiffies;
+
+ sk->sk_dst_pending_confirm = 0;
+ /* avoid dirtying neighbour */
+ if (n->confirmed != now)
+ n->confirmed = now;
+ }
+}
+
void sock_enable_timestamp(struct sock *sk, int flag);
int sock_get_timestamp(struct sock *, struct timeval __user *);
int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 877bdb0..7c9b640 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -221,7 +221,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
if (unlikely(!neigh))
neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
if (!IS_ERR(neigh)) {
- int res = dst_neigh_output(dst, neigh, skb);
+ int res;
+
+ dst_neigh_confirm_sk(skb->sk, neigh);
+ res = dst_neigh_output(dst, neigh, skb);
rcu_read_unlock_bh();
return res;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 96b8e2b..bcff1c6 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -847,7 +847,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return err;
do_confirm:
- dst_confirm(&rt->dst);
+ dst_confirm_sk(sk);
if (!(msg->msg_flags & MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ecbe5a7..e01424d 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -664,7 +664,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return len;
do_confirm:
- dst_confirm(&rt->dst);
+ dst_confirm_sk(sk);
if (!(msg->msg_flags & MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c71d49c..92b2060 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3702,9 +3702,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_process_tlp_ack(sk, ack, flag);
if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
- struct dst_entry *dst = __sk_dst_get(sk);
- if (dst)
- dst_confirm(dst);
+ dst_confirm_sk(sk);
}
if (icsk->icsk_pending == ICSK_TIME_RETRANS)
@@ -6049,9 +6047,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
tcp_set_state(sk, TCP_FIN_WAIT2);
sk->sk_shutdown |= SEND_SHUTDOWN;
- dst = __sk_dst_get(sk);
- if (dst)
- dst_confirm(dst);
+ dst_confirm_sk(sk);
if (!sock_flag(sk, SOCK_DEAD)) {
/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index bf1f3b2..375ff08 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -379,7 +379,8 @@ void tcp_update_metrics(struct sock *sk)
return;
if (dst->flags & DST_HOST)
- dst_confirm(dst);
+ dst_confirm_sk(sk);
+
rcu_read_lock();
if (icsk->icsk_backoff || !tp->srtt_us) {
@@ -496,7 +497,7 @@ void tcp_init_metrics(struct sock *sk)
if (!dst)
goto reset;
- dst_confirm(dst);
+ dst_confirm_sk(sk);
rcu_read_lock();
tm = tcp_get_metrics(sk, dst, true);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5bab6c3..f7852d3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1111,7 +1111,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return err;
do_confirm:
- dst_confirm(&rt->dst);
+ dst_confirm_sk(sk);
if (!(msg->msg_flags&MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 054a1d8..bbb09a4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -927,7 +927,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
txopt_put(opt_to_free);
return err < 0 ? err : len;
do_confirm:
- dst_confirm(dst);
+ dst_confirm_sk(sk);
if (!(msg->msg_flags & MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1b57e11..8df4671 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1356,7 +1356,7 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
(rt->rt6i_flags & RTF_PCPU || rt->rt6i_node);
}
-static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
+static void __ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
const struct ipv6hdr *iph, u32 mtu)
{
struct rt6_info *rt6 = (struct rt6_info *)dst;
@@ -1367,7 +1367,7 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
if (dst_metric_locked(dst, RTAX_MTU))
return;
- dst_confirm(dst);
+ dst_confirm_sk(sk);
mtu = max_t(u32, mtu, IPV6_MIN_MTU);
if (mtu >= dst_mtu(dst))
return;
@@ -2248,7 +2248,7 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
* Look, redirects are sent only in response to data packets,
* so that this nexthop apparently is reachable. --ANK
*/
- dst_confirm(&rt->dst);
+ dst_confirm_sk(sk);
neigh = __neigh_lookup(&nd_tbl, &msg->target, skb->dev, 1);
if (!neigh)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e4a8000..6057101 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1309,7 +1309,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
return err;
do_confirm:
- dst_confirm(dst);
+ dst_confirm_sk(sk);
if (!(msg->msg_flags&MSG_PROBE) || len)
goto back_from_confirm;
err = 0;
--
1.8.3.1
^ permalink raw reply related
* [PATCH 2/3] selftests: do not require bash to run bpf tests
From: Rolf Eike Beer @ 2016-12-14 10:58 UTC (permalink / raw)
To: linux-kselftest
Cc: Daniel Borkmann, David S. Miller, netdev, Alexei Starovoitov,
linux-kernel, Shuah Khan, Shuah Khan
>From b9d6c1b7427d708ef2d4d57aac17b700b3694d71 Mon Sep 17 00:00:00 2001
From: Rolf Eike Beer <eike-kernel@sf-tec.de>
Date: Wed, 14 Dec 2016 09:58:12 +0100
Subject: [PATCH 2/3] selftests: do not require bash to run bpf tests
Nothing in this minimal script seems to require bash. We often run these tests
on embedded devices where the only shell available is the busybox ash.
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
---
tools/testing/selftests/bpf/test_kmod.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_kmod.sh b/tools/testing/selftests/bpf/test_kmod.sh
index 92e627a..6d58cca 100755
--- a/tools/testing/selftests/bpf/test_kmod.sh
+++ b/tools/testing/selftests/bpf/test_kmod.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
SRC_TREE=../../../../
--
2.8.3
--
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Bertha-von-Suttner-Str. 9, 37085 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055
emlix – smart embedded open source
^ permalink raw reply related
* Re: [PATCH 2/3] selftests: do not require bash to run bpf tests
From: Daniel Borkmann @ 2016-12-14 11:03 UTC (permalink / raw)
To: Rolf Eike Beer, linux-kselftest
Cc: David S. Miller, netdev, Alexei Starovoitov, linux-kernel,
Shuah Khan, Shuah Khan
In-Reply-To: <3057464.DrP1iNtcaF@devpool21>
On 12/14/2016 11:58 AM, Rolf Eike Beer wrote:
> From b9d6c1b7427d708ef2d4d57aac17b700b3694d71 Mon Sep 17 00:00:00 2001
> From: Rolf Eike Beer <eike-kernel@sf-tec.de>
> Date: Wed, 14 Dec 2016 09:58:12 +0100
> Subject: [PATCH 2/3] selftests: do not require bash to run bpf tests
>
> Nothing in this minimal script seems to require bash. We often run these tests
> on embedded devices where the only shell available is the busybox ash.
>
> Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply
* Re: dsa: handling more than 1 cpu port
From: John Crispin @ 2016-12-14 11:11 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <20161214110058.GE27370@lunn.ch>
On 14/12/2016 12:00, Andrew Lunn wrote:
> On Wed, Dec 14, 2016 at 11:35:30AM +0100, John Crispin wrote:
>>
>>
>> On 14/12/2016 11:31, Andrew Lunn wrote:
>>> On Wed, Dec 14, 2016 at 11:01:54AM +0100, John Crispin wrote:
>>>> Hi Andrew,
>>>>
>>>> switches supported by qca8k have 2 gmacs that we can wire an external
>>>> mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
>>>> switch. Thw switch itself is aware of one of the MACs being the cpu port
>>>> and expects this to be port/mac0. Using the other will break the
>>>> hardware offloading features.
>>>
>>> Just to be sure here. There is no way to use the second port connected
>>> to the CPU as a CPU port?
>>
>> both macs are considered cpu ports and both allow for the tag to be
>> injected. for HW NAT/routing/... offloading to work, the lan ports neet
>> to trunk via port0 and not port6 however.
>
> Maybe you can do a restricted version of the generic solution. LAN
> ports are mapped to cpu port0. WAN port to cpu port 6?
hardcoding it is exactly what i want to avoid, same as using magic name
matching. the dts should describe the HW and not the usage dictated by
what is printed on the casing of the switch. as you mention below this
is marketing chatter. imho ports should have any name and we should be
able to bridge them as we feel happy and attach the bridges to any cpu
port that we want.
>
>>> The Marvell chips do allow this. So i developed a proof of concept
>>> which had a mapping between cpu ports and slave ports. slave port X
>>> should you cpu port y for its traffic. This never got past proof of
>>> concept.
>>>
>>> If this can be made to work for qca8k, i would prefer having this
>>> general concept, than specific hacks for pass through.
>>
>> oh cool, can you send those patches my way please ? how do you configure
>> this from userland ? does the cpu port get its on swdev which i just add
>> to my lan bridge and then add the 2nd cpu port to the wan bridge ?
>
> https://github.com/lunn/linux/tree/v4.1-rc4-net-next-multiple-cpu
>
> You don't configure anything from userland. Which was actually a
> criticism. It is in device tree. But my solution is generic. Having
> one WAN port and four bridges LAN ports is a pure marketing
> requirement. The hardware will happily do two WAN ports and 3 LAN
> ports, for example. And the switch is happy to take traffic for the
> WAN port and a LAN port over the same CPU port, and keep the traffic
> separate. So we can have some form of load balancing. We are not
> limited to 1->1, 1->4, we can do 1->2, 1->3 to increase the overall
> performance. And to the user it is all transparent.
>
> This PoC is for the old DSA binding. The new binding makes it easier
> to express this. Which is one of the reasons for the new binding.
>
> Andrew
>
i'll have a look at the patches. thanks !
John
^ permalink raw reply
* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Hannes Frederic Sowa @ 2016-12-14 11:21 UTC (permalink / raw)
To: Jason A. Donenfeld, Netdev, kernel-hardening, LKML, linux-crypto
Cc: Jean-Philippe Aumasson, Daniel J . Bernstein, Linus Torvalds,
Eric Biggers
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>
Hello,
On 14.12.2016 04:59, Jason A. Donenfeld wrote:
> SipHash is a 64-bit keyed hash function that is actually a
> cryptographically secure PRF, like HMAC. Except SipHash is super fast,
> and is meant to be used as a hashtable keyed lookup function.
Can you show or cite benchmarks in comparison with jhash? Last time I
looked, especially for short inputs, siphash didn't beat jhash (also on
all the 32 bit devices etc.).
> SipHash isn't just some new trendy hash function. It's been around for a
> while, and there really isn't anything that comes remotely close to
> being useful in the way SipHash is. With that said, why do we need this?
>
> There are a variety of attacks known as "hashtable poisoning" in which an
> attacker forms some data such that the hash of that data will be the
> same, and then preceeds to fill up all entries of a hashbucket. This is
> a realistic and well-known denial-of-service vector.
This pretty much depends on the linearity of the hash function? I don't
think a crypto secure hash function is needed for a hash table. Albeit I
agree that siphash certainly looks good to be used here.
> Linux developers already seem to be aware that this is an issue, and
> various places that use hash tables in, say, a network context, use a
> non-cryptographically secure function (usually jhash) and then try to
> twiddle with the key on a time basis (or in many cases just do nothing
> and hope that nobody notices). While this is an admirable attempt at
> solving the problem, it doesn't actually fix it. SipHash fixes it.
I am pretty sure that SipHash still needs a random key per hash table
also. So far it was only the choice of hash function you are questioning.
> (It fixes it in such a sound way that you could even build a stream
> cipher out of SipHash that would resist the modern cryptanalysis.)
>
> There are a modicum of places in the kernel that are vulnerable to
> hashtable poisoning attacks, either via userspace vectors or network
> vectors, and there's not a reliable mechanism inside the kernel at the
> moment to fix it. The first step toward fixing these issues is actually
> getting a secure primitive into the kernel for developers to use. Then
> we can, bit by bit, port things over to it as deemed appropriate.
Hmm, I tried to follow up with all the HashDoS work and so far didn't
see any HashDoS attacks against the Jenkins/SpookyHash family.
If this is an issue we might need to also put those changes into stable.
> Dozens of languages are already using this internally for their hash
> tables. Some of the BSDs already use this in their kernels. SipHash is
> a widely known high-speed solution to a widely known problem, and it's
> time we catch-up.
Bye,
Hannes
^ permalink raw reply
* [PATCH] vsock: lookup and setup guest_cid inside vhost_vsock_lock
From: Gao feng @ 2016-12-14 11:24 UTC (permalink / raw)
To: kvm, netdev; +Cc: stefanha, mst, Gao feng
Multi vsocks may setup the same cid at the same time.
Signed-off-by: Gao feng <omarapazanadi@gmail.com>
---
drivers/vhost/vsock.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e3b30ea..a08332b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -50,11 +50,10 @@ static u32 vhost_transport_get_local_cid(void)
return VHOST_VSOCK_DEFAULT_HOST_CID;
}
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *__vhost_vsock_get(u32 guest_cid)
{
struct vhost_vsock *vsock;
- spin_lock_bh(&vhost_vsock_lock);
list_for_each_entry(vsock, &vhost_vsock_list, list) {
u32 other_cid = vsock->guest_cid;
@@ -63,15 +62,24 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
continue;
if (other_cid == guest_cid) {
- spin_unlock_bh(&vhost_vsock_lock);
return vsock;
}
}
- spin_unlock_bh(&vhost_vsock_lock);
return NULL;
}
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+ struct vhost_vsock *vsock;
+
+ spin_lock_bh(&vhost_vsock_lock);
+ vsock = __vhost_vsock_get(guest_cid);
+ spin_unlock_bh(&vhost_vsock_lock);
+
+ return vsock;
+}
+
static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct vhost_virtqueue *vq)
@@ -562,11 +570,12 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
return -EINVAL;
/* Refuse if CID is already in use */
- other = vhost_vsock_get(guest_cid);
- if (other && other != vsock)
- return -EADDRINUSE;
-
spin_lock_bh(&vhost_vsock_lock);
+ other = __vhost_vsock_get(guest_cid);
+ if (other && other != vsock) {
+ spin_unlock_bh(&vhost_vsock_lock);
+ return -EADDRINUSE;
+ }
vsock->guest_cid = guest_cid;
spin_unlock_bh(&vhost_vsock_lock);
--
2.5.5
^ permalink raw reply related
* Re: dsa: handling more than 1 cpu port
From: Andrew Lunn @ 2016-12-14 11:00 UTC (permalink / raw)
To: John Crispin; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <0f82a785-b31c-0428-eb07-a9464a0c5ddd@phrozen.org>
On Wed, Dec 14, 2016 at 11:35:30AM +0100, John Crispin wrote:
>
>
> On 14/12/2016 11:31, Andrew Lunn wrote:
> > On Wed, Dec 14, 2016 at 11:01:54AM +0100, John Crispin wrote:
> >> Hi Andrew,
> >>
> >> switches supported by qca8k have 2 gmacs that we can wire an external
> >> mii interface to. Usually this is used to wire 2 of the SoCs MACs to the
> >> switch. Thw switch itself is aware of one of the MACs being the cpu port
> >> and expects this to be port/mac0. Using the other will break the
> >> hardware offloading features.
> >
> > Just to be sure here. There is no way to use the second port connected
> > to the CPU as a CPU port?
>
> both macs are considered cpu ports and both allow for the tag to be
> injected. for HW NAT/routing/... offloading to work, the lan ports neet
> to trunk via port0 and not port6 however.
Maybe you can do a restricted version of the generic solution. LAN
ports are mapped to cpu port0. WAN port to cpu port 6?
> > The Marvell chips do allow this. So i developed a proof of concept
> > which had a mapping between cpu ports and slave ports. slave port X
> > should you cpu port y for its traffic. This never got past proof of
> > concept.
> >
> > If this can be made to work for qca8k, i would prefer having this
> > general concept, than specific hacks for pass through.
>
> oh cool, can you send those patches my way please ? how do you configure
> this from userland ? does the cpu port get its on swdev which i just add
> to my lan bridge and then add the 2nd cpu port to the wan bridge ?
https://github.com/lunn/linux/tree/v4.1-rc4-net-next-multiple-cpu
You don't configure anything from userland. Which was actually a
criticism. It is in device tree. But my solution is generic. Having
one WAN port and four bridges LAN ports is a pure marketing
requirement. The hardware will happily do two WAN ports and 3 LAN
ports, for example. And the switch is happy to take traffic for the
WAN port and a LAN port over the same CPU port, and keep the traffic
separate. So we can have some form of load balancing. We are not
limited to 1->1, 1->4, we can do 1->2, 1->3 to increase the overall
performance. And to the user it is all transparent.
This PoC is for the old DSA binding. The new binding makes it easier
to express this. Which is one of the reasons for the new binding.
Andrew
^ permalink raw reply
* Re: Re: Synopsys Ethernet QoS
From: Pavel Machek @ 2016-12-14 11:54 UTC (permalink / raw)
To: Niklas Cassel
Cc: Joao Pinto, Florian Fainelli, Andy Shevchenko, David Miller,
Giuseppe CAVALLARO, larper, rabinv, netdev, CARLOS.PALMINHA,
Jie.Deng1, Stephen Warren
In-Reply-To: <1d445ec1-deb8-6e36-39c4-6813c446095f@axis.com>
[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]
Hi!
> There are some performance problems with the stmmac driver though:
>
> When running iperf3 with 3 streams:
> iperf3 -c 192.168.0.90 -P 3 -t 30
> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>
> I get really bad fairness between the streams.
>
> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
Yes, I'm hitting the same problem
https://lkml.org/lkml/2016/12/11/90
> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
> dwceqos driver (without IRQ coalescing).
> 2000 transactions/sec vs 9000 transactions/sec.
> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
> gives the same performance. I guess it's a trade off, low CPU usage
> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
Well... 75% performance hit is a bug, plain and simple, not CPU tradeoff.
> The best thing would be to get a good working TX IRQ coalesce
> implementation with HR timers in stmmac.
Actually it seems that using HR timers is not the only problem, AFAICT
the logic is wrong way around. (But yes, it needs to be HR timer, too,
and probably packet count needs to be reduced, too.)
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: [PATCH] vsock: lookup and setup guest_cid inside vhost_vsock_lock
From: Stefan Hajnoczi @ 2016-12-14 12:39 UTC (permalink / raw)
To: Gao feng; +Cc: kvm, netdev, mst
In-Reply-To: <1481714676-18555-1-git-send-email-omarapazanadi@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 414 bytes --]
On Wed, Dec 14, 2016 at 07:24:36PM +0800, Gao feng wrote:
> Multi vsocks may setup the same cid at the same time.
>
> Signed-off-by: Gao feng <omarapazanadi@gmail.com>
> ---
> drivers/vhost/vsock.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
Good catch, a classic time-of-check-to-time-of-use race condition.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14 12:46 UTC (permalink / raw)
To: David Laight
Cc: Netdev, kernel-hardening, Jean-Philippe Aumasson, LKML,
Linux Crypto Mailing List, Daniel J . Bernstein, Linus Torvalds,
Eric Biggers
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>
Hi David,
On Wed, Dec 14, 2016 at 10:56 AM, David Laight <David.Laight@aculab.com> wrote:
> ...
>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
> ...
>> + u64 k0 = get_unaligned_le64(key);
>> + u64 k1 = get_unaligned_le64(key + sizeof(u64));
> ...
>> + m = get_unaligned_le64(data);
>
> All these unaligned accesses are going to get expensive on architectures
> like sparc64.
Yes, the unaligned accesses aren't pretty. Since in pretty much all
use cases thus far, the data can easily be made aligned, perhaps it
makes sense to create siphash24() and siphash24_unaligned(). Any
thoughts on doing something like that?
Jason
^ permalink raw reply
* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14 12:53 UTC (permalink / raw)
To: David Laight
Cc: Netdev, kernel-hardening, Andi Kleen, LKML,
Linux Crypto Mailing List
In-Reply-To: <20161214035927.30004-3-Jason@zx2c4.com>
Hi David,
On Wed, Dec 14, 2016 at 10:51 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Jason A. Donenfeld
>> Sent: 14 December 2016 00:17
>> This gives a clear speed and security improvement. Rather than manually
>> filling MD5 buffers, we simply create a layout by a simple anonymous
>> struct, for which gcc generates rather efficient code.
> ...
>> + const struct {
>> + struct in6_addr saddr;
>> + struct in6_addr daddr;
>> + __be16 sport;
>> + __be16 dport;
>> + } __packed combined = {
>> + .saddr = *(struct in6_addr *)saddr,
>> + .daddr = *(struct in6_addr *)daddr,
>> + .sport = sport,
>> + .dport = dport
>> + };
>
> You need to look at the effect of marking this (and the other)
> structures 'packed' on architectures like sparc64.
In all current uses of __packed in the code, I think the impact is
precisely zero, because all structures have members in descending
order of size, with each member being a perfect multiple of the one
below it. The __packed is therefore just there for safety, in case
somebody comes in and screws everything up by sticking a u8 in
between. In that case, it wouldn't be desirable to hash the structure
padding bits. In the worst case, I don't believe the impact would be
worse than a byte-by-byte memcpy, which is what the old code did. But
anyway, these structures are already naturally packed anyway, so the
present impact is nil.
Jason
^ permalink raw reply
* [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Andrei Pistirica @ 2016-12-14 12:56 UTC (permalink / raw)
To: netdev, linux-kernel, linux-arm-kernel, davem, nicolas.ferre,
harinikatakamlinux, harini.katakam
Cc: boris.brezillon, rafalo, alexandre.belloni, michals,
Andrei Pistirica, tbultel, anirudh, punnaia, richardcochran
Cadence GEM provides a 102 bit time counter with 48 bits for seconds,
30 bits for nsecs and 24 bits for sub-nsecs to control 1588 timestamping.
This patch does the following:
- Registers to ptp clock framework
- Timer initialization is done by writing time of day to the timer counter.
- ns increment register is programmed as NSEC_PER_SEC/tsu-clock-rate.
For a 16 bit subns precision, the subns increment equals
remainder of (NS_PER_SEC/TSU_CLK) * (2^16).
- Timestamps are obtained from the TX/RX PTP event/PEER registers.
The timestamp obtained thus is updated in skb for upper layers to access.
- The drivers register functions with ptp to perform time and frequency
adjustment.
- Time adjustment is done by writing to the 1558_ADJUST register.
The controller will read the delta in this register and update the timer
counter register. Alternatively, for large time offset adjustments,
the driver reads the secs and nsecs counter values, adds/subtracts the
delta and updates the timer counter.
- Frequency is adjusted by adjusting addend (8bit nanosecond increment) and
addendsub (16bit increment nanosecond fractions).
The 102bit counter is incremented at nominal frequency with addend and
addendsub values. Each period addend and addendsub values are adjusted
based on ppm drift.
Signed-off-by: Andrei Pistirica <andrei.pistirica@microchip.com>
Signed-off-by: Harini Katakam <harinik@xilinx.com>
---
Patch history:
Version 1:
This patch is based on original Harini's patch, implemented in a
separate file to ease the review/maintanance and integration with
other platforms (e.g. Zynq Ultrascale+ MPSoC).
Feature was tested on SAMA5D2 platform using ptp4l v1.6 from linuxptp
project and also with ptpd2 version 2.3.1. PTP was tested over
IPv4,IPv6 and 802.3 protocols.
In case that macb is compiled as a module, it has been renamed to
cadence-macb.ko to avoid naming confusion in Makefile.
Version 2 modifications:
- bitfields for TSU are named according to SAMA5D2 data sheet
- identify GEM-PTP support based on platform capability
- add spinlock for TSU access
- change macb_ptp_adjfreq and use fewer 64bit divisions
Version 3 modifications:
- new adjfine api with one 64 division for frequency adjustment
(based on Richard's input)
- add maximum adjustment frequency (ppb) based on nominal frequency
- per platform PTP configuration
- cosmetic changes
Note 1: Kbuild uses "select" instead of "imply", and the macb maintainer agreed
to make the change when it will be available in net-next.
Version 4 modifications:
- update adjfine for a better approximation
- add maximum adjustment frequency callback to PTP platform configuraion
Note 1: This driver does not support GEM-GXL!
Note 2: Patch on net-next, on December 14th.
drivers/net/ethernet/cadence/Kconfig | 10 +-
drivers/net/ethernet/cadence/Makefile | 8 +-
drivers/net/ethernet/cadence/macb.h | 118 ++++++++++
drivers/net/ethernet/cadence/macb_ptp.c | 366 ++++++++++++++++++++++++++++++++
4 files changed, 500 insertions(+), 2 deletions(-)
create mode 100644 drivers/net/ethernet/cadence/macb_ptp.c
diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..ebbc65f 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -29,6 +29,14 @@ config MACB
support for the MACB/GEM chip.
To compile this driver as a module, choose M here: the module
- will be called macb.
+ will be called cadence-macb.
+
+config MACB_USE_HWSTAMP
+ bool "Use IEEE 1588 hwstamp"
+ depends on MACB
+ default y
+ select PTP_1588_CLOCK
+ ---help---
+ Enable IEEE 1588 Precision Time Protocol (PTP) support for MACB.
endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4402d42 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -2,4 +2,10 @@
# Makefile for the Atmel network device drivers.
#
-obj-$(CONFIG_MACB) += macb.o
+cadence-macb-y := macb.o
+
+ifeq ($(CONFIG_MACB_USE_HWSTAMP),y)
+cadence-macb-y += macb_ptp.o
+endif
+
+obj-$(CONFIG_MACB) += cadence-macb.o
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index d67adad..e65e985 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -10,6 +10,9 @@
#ifndef _MACB_H
#define _MACB_H
+#include <linux/ptp_clock.h>
+#include <linux/ptp_clock_kernel.h>
+
#define MACB_GREGS_NBR 16
#define MACB_GREGS_VERSION 2
#define MACB_MAX_QUEUES 8
@@ -131,6 +134,20 @@
#define GEM_RXIPCCNT 0x01a8 /* IP header Checksum Error Counter */
#define GEM_RXTCPCCNT 0x01ac /* TCP Checksum Error Counter */
#define GEM_RXUDPCCNT 0x01b0 /* UDP Checksum Error Counter */
+#define GEM_TISUBN 0x01bc /* 1588 Timer Increment Sub-ns */
+#define GEM_TSH 0x01c0 /* 1588 Timer Seconds High */
+#define GEM_TSL 0x01d0 /* 1588 Timer Seconds Low */
+#define GEM_TN 0x01d4 /* 1588 Timer Nanoseconds */
+#define GEM_TA 0x01d8 /* 1588 Timer Adjust */
+#define GEM_TI 0x01dc /* 1588 Timer Increment */
+#define GEM_EFTSL 0x01e0 /* PTP Event Frame Tx Seconds Low */
+#define GEM_EFTN 0x01e4 /* PTP Event Frame Tx Nanoseconds */
+#define GEM_EFRSL 0x01e8 /* PTP Event Frame Rx Seconds Low */
+#define GEM_EFRN 0x01ec /* PTP Event Frame Rx Nanoseconds */
+#define GEM_PEFTSL 0x01f0 /* PTP Peer Event Frame Tx Secs Low */
+#define GEM_PEFTN 0x01f4 /* PTP Peer Event Frame Tx Ns */
+#define GEM_PEFRSL 0x01f8 /* PTP Peer Event Frame Rx Sec Low */
+#define GEM_PEFRN 0x01fc /* PTP Peer Event Frame Rx Ns */
#define GEM_DCFG1 0x0280 /* Design Config 1 */
#define GEM_DCFG2 0x0284 /* Design Config 2 */
#define GEM_DCFG3 0x0288 /* Design Config 3 */
@@ -174,6 +191,7 @@
#define MACB_NCR_TPF_SIZE 1
#define MACB_TZQ_OFFSET 12 /* Transmit zero quantum pause frame */
#define MACB_TZQ_SIZE 1
+#define MACB_SRTSM_OFFSET 15
/* Bitfields in NCFGR */
#define MACB_SPD_OFFSET 0 /* Speed */
@@ -319,6 +337,32 @@
#define MACB_PTZ_SIZE 1
#define MACB_WOL_OFFSET 14 /* Enable wake-on-lan interrupt */
#define MACB_WOL_SIZE 1
+#define MACB_DRQFR_OFFSET 18 /* PTP Delay Request Frame Received */
+#define MACB_DRQFR_SIZE 1
+#define MACB_SFR_OFFSET 19 /* PTP Sync Frame Received */
+#define MACB_SFR_SIZE 1
+#define MACB_DRQFT_OFFSET 20 /* PTP Delay Request Frame Transmitted */
+#define MACB_DRQFT_SIZE 1
+#define MACB_SFT_OFFSET 21 /* PTP Sync Frame Transmitted */
+#define MACB_SFT_SIZE 1
+#define MACB_PDRQFR_OFFSET 22 /* PDelay Request Frame Received */
+#define MACB_PDRQFR_SIZE 1
+#define MACB_PDRSFR_OFFSET 23 /* PDelay Response Frame Received */
+#define MACB_PDRSFR_SIZE 1
+#define MACB_PDRQFT_OFFSET 24 /* PDelay Request Frame Transmitted */
+#define MACB_PDRQFT_SIZE 1
+#define MACB_PDRSFT_OFFSET 25 /* PDelay Response Frame Transmitted */
+#define MACB_PDRSFT_SIZE 1
+#define MACB_SRI_OFFSET 26 /* TSU Seconds Register Increment */
+#define MACB_SRI_SIZE 1
+
+/* Timer increment fields */
+#define MACB_TI_CNS_OFFSET 0
+#define MACB_TI_CNS_SIZE 8
+#define MACB_TI_ACNS_OFFSET 8
+#define MACB_TI_ACNS_SIZE 8
+#define MACB_TI_NIT_OFFSET 16
+#define MACB_TI_NIT_SIZE 8
/* Bitfields in MAN */
#define MACB_DATA_OFFSET 0 /* data */
@@ -386,6 +430,17 @@
#define GEM_PBUF_LSO_OFFSET 27
#define GEM_PBUF_LSO_SIZE 1
+/* Bitfields in TISUBN */
+#define GEM_SUBNSINCR_OFFSET 0
+#define GEM_SUBNSINCR_SIZE 16
+
+/* Bitfields in TI */
+#define GEM_NSINCR_OFFSET 0
+#define GEM_NSINCR_SIZE 8
+
+/* Bitfields in ADJ */
+#define GEM_ADDSUB_OFFSET 31
+#define GEM_ADDSUB_SIZE 1
/* Constants for CLK */
#define MACB_CLK_DIV8 0
#define MACB_CLK_DIV16 1
@@ -417,6 +472,7 @@
#define MACB_CAPS_GIGABIT_MODE_AVAILABLE 0x20000000
#define MACB_CAPS_SG_DISABLED 0x40000000
#define MACB_CAPS_MACB_IS_GEM 0x80000000
+#define MACB_CAPS_GEM_HAS_PTP 0x00000020
/* LSO settings */
#define MACB_LSO_UFO_ENABLE 0x01
@@ -782,6 +838,20 @@ struct macb_or_gem_ops {
int (*mog_rx)(struct macb *bp, int budget);
};
+/* MACB-PTP interface: adapt to platform needs and GEM (e.g. GXL). */
+struct macb_ptp_info {
+ void (*ptp_init)(struct net_device *ndev);
+ void (*ptp_remove)(struct net_device *ndev);
+ s32 (*get_ptp_max_adj)(void);
+ unsigned int (*get_tsu_rate)(struct macb *bp);
+ int (*get_ts_info)(struct net_device *dev,
+ struct ethtool_ts_info *info);
+ int (*get_hwtst)(struct net_device *netdev,
+ struct ifreq *ifr);
+ int (*set_hwtst)(struct net_device *netdev,
+ struct ifreq *ifr, int cmd);
+};
+
struct macb_config {
u32 caps;
unsigned int dma_burst_length;
@@ -874,11 +944,59 @@ struct macb {
unsigned int jumbo_max_len;
u32 wol;
+
+ struct macb_ptp_info *ptp_info;
+#ifdef CONFIG_MACB_USE_HWSTAMP
+ bool hwts_tx_en;
+ bool hwts_rx_en;
+ spinlock_t tsu_clk_lock; /* gem tsu clock locking */
+ unsigned int tsu_rate;
+
+ struct ptp_clock *ptp_clock;
+ struct ptp_clock_info ptp_caps;
+ u32 ns_incr;
+ u32 subns_incr;
+#endif
};
+#ifdef CONFIG_MACB_USE_HWSTAMP
+void gem_ptp_init(struct net_device *ndev);
+void gem_ptp_remove(struct net_device *ndev);
+void gem_ptp_txstamp(struct macb *bp, struct sk_buff *skb);
+void gem_ptp_rxstamp(struct macb *bp, struct sk_buff *skb);
+
+static inline void gem_ptp_do_txstamp(struct macb *bp, struct sk_buff *skb)
+{
+ if (!bp->hwts_tx_en)
+ return;
+
+ return gem_ptp_txstamp(bp, skb);
+}
+
+static inline void gem_ptp_do_rxstamp(struct macb *bp, struct sk_buff *skb)
+{
+ if (!bp->hwts_rx_en)
+ return;
+
+ return gem_ptp_rxstamp(bp, skb);
+}
+
+#else
+static inline void gem_ptp_init(struct net_device *ndev) { }
+static inline void gem_ptp_remove(struct net_device *ndev) { }
+
+static inline void gem_ptp_do_txstamp(struct macb *bp, struct sk_buff *skb) { }
+static inline void gem_ptp_do_rxstamp(struct macb *bp, struct sk_buff *skb) { }
+#endif
+
static inline bool macb_is_gem(struct macb *bp)
{
return !!(bp->caps & MACB_CAPS_MACB_IS_GEM);
}
+static inline bool gem_has_ptp(struct macb *bp)
+{
+ return !!(bp->caps & MACB_CAPS_GEM_HAS_PTP);
+}
+
#endif /* _MACB_H */
diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
new file mode 100644
index 0000000..6121b2a
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_ptp.c
@@ -0,0 +1,366 @@
+/*
+ * 1588 PTP support for GEM device.
+ *
+ * Copyright (C) 2016 Microchip Technology
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include <linux/clk.h>
+#include <linux/device.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/time64.h>
+#include <linux/ptp_classify.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/net_tstamp.h>
+
+#include "macb.h"
+
+#define GEM_PTP_TIMER_NAME "gem-ptp-timer"
+
+static inline void gem_tsu_get_time(struct macb *bp,
+ struct timespec64 *ts)
+{
+ u64 sec, sech, secl;
+
+ spin_lock(&bp->tsu_clk_lock);
+
+ /* GEM's internal time */
+ sech = gem_readl(bp, TSH);
+ secl = gem_readl(bp, TSL);
+ ts->tv_nsec = gem_readl(bp, TN);
+ ts->tv_sec = (sech << 32) | secl;
+
+ /* minimize error */
+ sech = gem_readl(bp, TSH);
+ secl = gem_readl(bp, TSL);
+ sec = (sech << 32) | secl;
+ if (ts->tv_sec != sec) {
+ ts->tv_sec = sec;
+ ts->tv_nsec = gem_readl(bp, TN);
+ }
+
+ spin_unlock(&bp->tsu_clk_lock);
+}
+
+static inline void gem_tsu_set_time(struct macb *bp,
+ const struct timespec64 *ts)
+{
+ u32 ns, sech, secl;
+ s64 word_mask = 0xffffffff;
+
+ sech = (u32)ts->tv_sec;
+ secl = (u32)ts->tv_sec;
+ ns = ts->tv_nsec;
+ if (ts->tv_sec > word_mask)
+ sech = (ts->tv_sec >> 32);
+
+ spin_lock(&bp->tsu_clk_lock);
+
+ /* TSH doesn't latch the time and no atomicity! */
+ gem_writel(bp, TN, 0); /* clear to avoid overflow */
+ gem_writel(bp, TSH, sech);
+ gem_writel(bp, TSL, secl);
+ gem_writel(bp, TN, ns);
+
+ spin_unlock(&bp->tsu_clk_lock);
+}
+
+static int gem_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
+{
+ struct macb *bp = container_of(ptp, struct macb, ptp_caps);
+ u32 word, diff;
+ u64 adj, rate;
+ int neg_adj = 0;
+
+ if (scaled_ppm < 0) {
+ neg_adj = 1;
+ scaled_ppm = -scaled_ppm;
+ }
+ rate = scaled_ppm;
+
+ /* word: unused(8bit) | ns(8bit) | fractions(16bit) */
+ word = (bp->ns_incr << 16) + bp->subns_incr;
+
+ adj = word;
+ adj *= rate;
+ adj += 500000UL << 16;
+ adj >>= 16; /* remove fractions */
+ diff = div_u64(adj, 1000000UL);
+ word = neg_adj ? word - diff : word + diff;
+
+ spin_lock(&bp->tsu_clk_lock);
+
+ gem_writel(bp, TISUBN, GEM_BF(SUBNSINCR, (word & 0xffff)));
+ gem_writel(bp, TI, GEM_BF(NSINCR, (word >> 16)));
+
+ spin_unlock(&bp->tsu_clk_lock);
+ return 0;
+}
+
+static int gem_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+ struct macb *bp = container_of(ptp, struct macb, ptp_caps);
+ struct timespec64 now, then = ns_to_timespec64(delta);
+ u32 adj, sign = 0;
+
+ if (delta < 0) {
+ delta = -delta;
+ sign = 1;
+ }
+
+ if (delta > 0x3FFFFFFF) {
+ gem_tsu_get_time(bp, &now);
+
+ if (sign)
+ now = timespec64_sub(now, then);
+ else
+ now = timespec64_add(now, then);
+
+ gem_tsu_set_time(bp, (const struct timespec64 *)&now);
+ } else {
+ adj = delta;
+ if (sign)
+ adj |= GEM_BIT(ADDSUB);
+
+ gem_writel(bp, TA, adj);
+ }
+
+ return 0;
+}
+
+static int gem_ptp_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
+{
+ struct macb *bp = container_of(ptp, struct macb, ptp_caps);
+
+ gem_tsu_get_time(bp, ts);
+
+ return 0;
+}
+
+static int gem_ptp_settime(struct ptp_clock_info *ptp,
+ const struct timespec64 *ts)
+{
+ struct macb *bp = container_of(ptp, struct macb, ptp_caps);
+
+ gem_tsu_set_time(bp, ts);
+
+ return 0;
+}
+
+static int gem_ptp_enable(struct ptp_clock_info *ptp,
+ struct ptp_clock_request *rq, int on)
+{
+ return -EOPNOTSUPP;
+}
+
+static struct ptp_clock_info gem_ptp_caps_template = {
+ .owner = THIS_MODULE,
+ .name = GEM_PTP_TIMER_NAME,
+ .max_adj = 0,
+ .n_alarm = 0,
+ .n_ext_ts = 0,
+ .n_per_out = 0,
+ .n_pins = 0,
+ .pps = 0,
+ .adjfine = gem_ptp_adjfine,
+ .adjtime = gem_ptp_adjtime,
+ .gettime64 = gem_ptp_gettime,
+ .settime64 = gem_ptp_settime,
+ .enable = gem_ptp_enable,
+};
+
+static void gem_ptp_init_timer(struct macb *bp)
+{
+ struct timespec64 now;
+ u32 rem = 0;
+
+ getnstimeofday64(&now);
+ gem_tsu_set_time(bp, (const struct timespec64 *)&now);
+
+ bp->ns_incr = div_u64_rem(NSEC_PER_SEC, bp->tsu_rate, &rem);
+ if (rem) {
+ u64 adj = rem;
+
+ adj <<= 16; /* 16 bits nsec fragments */
+ bp->subns_incr = div_u64(adj, bp->tsu_rate);
+ } else {
+ bp->subns_incr = 0;
+ }
+
+ gem_writel(bp, TISUBN, GEM_BF(SUBNSINCR, bp->subns_incr));
+ gem_writel(bp, TI, GEM_BF(NSINCR, bp->ns_incr));
+ gem_writel(bp, TA, 0);
+}
+
+static void gem_ptp_clear_timer(struct macb *bp)
+{
+ bp->ns_incr = 0;
+ bp->subns_incr = 0;
+
+ gem_writel(bp, TISUBN, GEM_BF(SUBNSINCR, 0));
+ gem_writel(bp, TI, GEM_BF(NSINCR, 0));
+ gem_writel(bp, TA, 0);
+}
+
+/* While GEM can timestamp PTP packets, it does not mark the RX descriptor
+ * to identify them. UDP packets must be parsed to identify PTP packets.
+ *
+ * Note: Inspired from drivers/net/ethernet/ti/cpts.c
+ */
+static int gem_get_ptp_peer(struct sk_buff *skb, int ptp_class)
+{
+ unsigned int offset = 0;
+ u8 *msgtype, *data = skb->data;
+
+ /* PTP frames are rare! */
+ if (likely(ptp_class == PTP_CLASS_NONE))
+ return -1;
+
+ if (ptp_class & PTP_CLASS_VLAN)
+ offset += VLAN_HLEN;
+
+ switch (ptp_class & PTP_CLASS_PMASK) {
+ case PTP_CLASS_IPV4:
+ offset += ETH_HLEN + IPV4_HLEN(data + offset) + UDP_HLEN;
+ break;
+ case PTP_CLASS_IPV6:
+ offset += ETH_HLEN + IP6_HLEN + UDP_HLEN;
+ break;
+ case PTP_CLASS_L2:
+ offset += ETH_HLEN;
+ break;
+
+ /* something went wrong! */
+ default:
+ return -1;
+ }
+
+ if (skb->len + ETH_HLEN < offset + OFF_PTP_SEQUENCE_ID)
+ return -1;
+
+ if (unlikely(ptp_class & PTP_CLASS_V1))
+ msgtype = data + offset + OFF_PTP_CONTROL;
+ else
+ msgtype = data + offset;
+
+ return (*msgtype) & 0x2;
+}
+
+static void gem_ptp_tx_hwtstamp(struct macb *bp, struct sk_buff *skb,
+ int peer_ev)
+{
+ struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
+ struct timespec64 ts;
+ u64 ns;
+
+ /* PTP Peer Event Frame packets */
+ if (peer_ev) {
+ ts.tv_sec = gem_readl(bp, PEFTSL);
+ ts.tv_nsec = gem_readl(bp, PEFTN);
+
+ /* PTP Event Frame packets */
+ } else {
+ ts.tv_sec = gem_readl(bp, EFTSL);
+ ts.tv_nsec = gem_readl(bp, EFTN);
+ }
+ ns = timespec64_to_ns(&ts);
+
+ memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
+ shhwtstamps->hwtstamp = ns_to_ktime(ns);
+ skb_tstamp_tx(skb, skb_hwtstamps(skb));
+}
+
+static void gem_ptp_rx_hwtstamp(struct macb *bp, struct sk_buff *skb,
+ int peer_ev)
+{
+ struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
+ struct timespec64 ts;
+ u64 ns;
+
+ if (peer_ev) {
+ /* PTP Peer Event Frame packets */
+ ts.tv_sec = gem_readl(bp, PEFRSL);
+ ts.tv_nsec = gem_readl(bp, PEFRN);
+ } else {
+ /* PTP Event Frame packets */
+ ts.tv_sec = gem_readl(bp, EFRSL);
+ ts.tv_nsec = gem_readl(bp, EFRN);
+ }
+ ns = timespec64_to_ns(&ts);
+
+ memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
+ shhwtstamps->hwtstamp = ns_to_ktime(ns);
+}
+
+/* no static, GEM PTP interface functions */
+void gem_ptp_txstamp(struct macb *bp, struct sk_buff *skb)
+{
+ if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
+ int class = ptp_classify_raw(skb);
+ int peer;
+
+ peer = gem_get_ptp_peer(skb, class);
+ if (peer < 0)
+ return;
+
+ /* Timestamp this packet */
+ gem_ptp_tx_hwtstamp(bp, skb, peer);
+ }
+}
+
+void gem_ptp_rxstamp(struct macb *bp, struct sk_buff *skb)
+{
+ int class, peer;
+
+ __skb_push(skb, ETH_HLEN);
+ class = ptp_classify_raw(skb);
+ __skb_pull(skb, ETH_HLEN);
+
+ peer = gem_get_ptp_peer(skb, class);
+ if (peer < 0)
+ return;
+
+ gem_ptp_rx_hwtstamp(bp, skb, peer);
+}
+
+void gem_ptp_init(struct net_device *ndev)
+{
+ struct macb *bp = netdev_priv(ndev);
+
+ spin_lock_init(&bp->tsu_clk_lock);
+ bp->ptp_caps = gem_ptp_caps_template;
+
+ /* nominal frequency and maximum adjustment in ppb */
+ bp->tsu_rate = bp->ptp_info->get_tsu_rate(bp);
+ bp->ptp_caps.max_adj = bp->ptp_info->get_ptp_max_adj();
+
+ gem_ptp_init_timer(bp);
+
+ bp->ptp_clock = ptp_clock_register(&bp->ptp_caps, NULL);
+ if (IS_ERR(&bp->ptp_clock)) {
+ bp->ptp_clock = NULL;
+ pr_err("ptp clock register failed\n");
+ return;
+ }
+
+ dev_info(&bp->pdev->dev, "%s ptp clock registered.\n",
+ GEM_PTP_TIMER_NAME);
+}
+
+void gem_ptp_remove(struct net_device *ndev)
+{
+ struct macb *bp = netdev_priv(ndev);
+
+ if (bp->ptp_clock)
+ ptp_clock_unregister(bp->ptp_clock);
+
+ gem_ptp_clear_timer(bp);
+
+ dev_info(&bp->pdev->dev, "%s ptp clock unregistered.\n",
+ GEM_PTP_TIMER_NAME);
+}
--
2.7.4
^ permalink raw reply related
* [RFC PATCH net-next v4 2/2] macb: Enable 1588 support in SAMA5Dx platforms.
From: Andrei Pistirica @ 2016-12-14 12:56 UTC (permalink / raw)
To: netdev, linux-kernel, linux-arm-kernel, davem, nicolas.ferre,
harinikatakamlinux, harini.katakam
Cc: punnaia, michals, anirudh, boris.brezillon, alexandre.belloni,
tbultel, richardcochran, rafalo, Andrei Pistirica
In-Reply-To: <1481720175-12703-1-git-send-email-andrei.pistirica@microchip.com>
This patch does the following:
- Enable HW time stamp for the following platforms: SAMA5D2, SAMA5D3 and
SAMA5D4.
- HW time stamp capabilities are advertised via ethtool and macb ioctl is
updated accordingly.
- HW time stamp on the PTP Ethernet packets are received using the
SO_TIMESTAMPING API. Where timers are obtained from the PTP event/peer
registers.
Note: Patch on net-next, on December 7th.
Signed-off-by: Andrei Pistirica <andrei.pistirica@microchip.com>
---
Patch history:
Version 1:
Integration with SAMA5D2 only. This feature wasn't tested on any
other platform that might use cadence/gem.
Patch is not completely ported to the very latest version of net-next,
and it will be after review.
Version 2 modifications:
- add PTP caps for SAMA5D2/3/4 platforms
- and cosmetic changes
Version 3 modifications:
- add support for sama5D2/3/4 platforms using GEM-PTP interface.
Version 4 modifications:
- time stamp only PTP_V2 events
- maximum adjustment value is set based on Richard's input
Note: Patch on net-next, on December 14th.
drivers/net/ethernet/cadence/macb.c | 168 ++++++++++++++++++++++++++++++++++--
1 file changed, 163 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..8d5c976 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -714,6 +714,8 @@ static void macb_tx_interrupt(struct macb_queue *queue)
/* First, update TX stats if needed */
if (skb) {
+ gem_ptp_do_txstamp(bp, skb);
+
netdev_vdbg(bp->dev, "skb %u (data %p) TX complete\n",
macb_tx_ring_wrap(bp, tail),
skb->data);
@@ -878,6 +880,8 @@ static int gem_rx(struct macb *bp, int budget)
GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
skb->ip_summed = CHECKSUM_UNNECESSARY;
+ gem_ptp_do_rxstamp(bp, skb);
+
bp->stats.rx_packets++;
bp->stats.rx_bytes += skb->len;
@@ -2080,6 +2084,9 @@ static int macb_open(struct net_device *dev)
netif_tx_start_all_queues(dev);
+ if (bp->ptp_info)
+ bp->ptp_info->ptp_init(dev);
+
return 0;
}
@@ -2101,6 +2108,9 @@ static int macb_close(struct net_device *dev)
macb_free_consistent(bp);
+ if (bp->ptp_info)
+ bp->ptp_info->ptp_remove(dev);
+
return 0;
}
@@ -2374,6 +2384,133 @@ static int macb_set_ringparam(struct net_device *netdev,
return 0;
}
+#ifdef CONFIG_MACB_USE_HWSTAMP
+static unsigned int gem_get_tsu_rate(struct macb *bp)
+{
+ /* Note: TSU rate is hardwired to PCLK. */
+ return clk_get_rate(bp->pclk);
+}
+
+static s32 gem_get_ptp_max_adj(void)
+{
+ return 3921508;
+}
+
+static int gem_get_ts_info(struct net_device *dev,
+ struct ethtool_ts_info *info)
+{
+ struct macb *bp = netdev_priv(dev);
+
+ ethtool_op_get_ts_info(dev, info);
+ info->so_timestamping =
+ SOF_TIMESTAMPING_TX_SOFTWARE |
+ SOF_TIMESTAMPING_RX_SOFTWARE |
+ SOF_TIMESTAMPING_SOFTWARE |
+ SOF_TIMESTAMPING_TX_HARDWARE |
+ SOF_TIMESTAMPING_RX_HARDWARE |
+ SOF_TIMESTAMPING_RAW_HARDWARE;
+ info->phc_index = -1;
+
+ if (bp->ptp_clock)
+ info->phc_index = ptp_clock_index(bp->ptp_clock);
+
+ return 0;
+}
+
+static int gem_set_hwtst(struct net_device *netdev,
+ struct ifreq *ifr, int cmd)
+{
+ struct hwtstamp_config config;
+ struct macb *priv = netdev_priv(netdev);
+ u32 regval;
+
+ netdev_vdbg(netdev, "macb_hwtstamp_ioctl\n");
+
+ if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+ return -EFAULT;
+
+ /* reserved for future extensions */
+ if (config.flags)
+ return -EINVAL;
+
+ switch (config.tx_type) {
+ case HWTSTAMP_TX_OFF:
+ priv->hwts_tx_en = false;
+ break;
+ case HWTSTAMP_TX_ON:
+ priv->hwts_tx_en = true;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (config.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ if (priv->hwts_rx_en)
+ priv->hwts_rx_en = false;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+ case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+ config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
+ regval = macb_readl(priv, NCR);
+ macb_writel(priv, NCR, (regval | MACB_BIT(SRTSM)));
+
+ if (!priv->hwts_rx_en)
+ priv->hwts_rx_en = true;
+ break;
+ default:
+ config.rx_filter = HWTSTAMP_FILTER_NONE;
+ return -ERANGE;
+ }
+
+ return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
+ -EFAULT : 0;
+}
+
+static int gem_get_hwtst(struct net_device *netdev,
+ struct ifreq *ifr)
+{
+ struct hwtstamp_config config;
+ struct macb *priv = netdev_priv(netdev);
+
+ config.flags = 0;
+ config.tx_type = priv->hwts_tx_en ? HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
+ config.rx_filter = (priv->hwts_rx_en ?
+ HWTSTAMP_FILTER_ALL : HWTSTAMP_FILTER_NONE);
+
+ return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
+ -EFAULT : 0;
+}
+
+static struct macb_ptp_info gem_ptp_info = {
+ .ptp_init = gem_ptp_init,
+ .ptp_remove = gem_ptp_remove,
+ .get_ptp_max_adj = gem_get_ptp_max_adj,
+ .get_tsu_rate = gem_get_tsu_rate,
+ .get_ts_info = gem_get_ts_info,
+ .get_hwtst = gem_get_hwtst,
+ .set_hwtst = gem_set_hwtst,
+};
+#endif
+
+static int macb_get_ts_info(struct net_device *netdev,
+ struct ethtool_ts_info *info)
+{
+ struct macb *bp = netdev_priv(netdev);
+
+ if (bp->ptp_info)
+ return bp->ptp_info->get_ts_info(netdev, info);
+
+ return ethtool_op_get_ts_info(netdev, info);
+}
+
static const struct ethtool_ops macb_ethtool_ops = {
.get_regs_len = macb_get_regs_len,
.get_regs = macb_get_regs,
@@ -2391,7 +2528,7 @@ static const struct ethtool_ops gem_ethtool_ops = {
.get_regs_len = macb_get_regs_len,
.get_regs = macb_get_regs,
.get_link = ethtool_op_get_link,
- .get_ts_info = ethtool_op_get_ts_info,
+ .get_ts_info = macb_get_ts_info,
.get_ethtool_stats = gem_get_ethtool_stats,
.get_strings = gem_get_ethtool_strings,
.get_sset_count = gem_get_sset_count,
@@ -2404,6 +2541,7 @@ static const struct ethtool_ops gem_ethtool_ops = {
static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
{
struct phy_device *phydev = dev->phydev;
+ struct macb *bp = netdev_priv(dev);
if (!netif_running(dev))
return -EINVAL;
@@ -2411,7 +2549,20 @@ static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
if (!phydev)
return -ENODEV;
- return phy_mii_ioctl(phydev, rq, cmd);
+ switch (cmd) {
+ case SIOCSHWTSTAMP:
+ if (bp->ptp_info)
+ return bp->ptp_info->set_hwtst(dev, rq, cmd);
+
+ return -EOPNOTSUPP;
+ case SIOCGHWTSTAMP:
+ if (bp->ptp_info)
+ return bp->ptp_info->get_hwtst(dev, rq);
+
+ return -EOPNOTSUPP;
+ default:
+ return phy_mii_ioctl(phydev, rq, cmd);
+ }
}
static int macb_set_features(struct net_device *netdev,
@@ -2485,6 +2636,12 @@ static void macb_configure_caps(struct macb *bp,
dcfg = gem_readl(bp, DCFG2);
if ((dcfg & (GEM_BIT(RX_PKT_BUFF) | GEM_BIT(TX_PKT_BUFF))) == 0)
bp->caps |= MACB_CAPS_FIFO_MODE;
+
+ /* iff HWSTAMP is configure and gem has the capability */
+#ifdef CONFIG_MACB_USE_HWSTAMP
+ if (gem_has_ptp(bp))
+ bp->ptp_info = &gem_ptp_info;
+#endif
}
dev_dbg(&bp->pdev->dev, "Cadence caps 0x%08x\n", bp->caps);
@@ -3041,7 +3198,7 @@ static const struct macb_config pc302gem_config = {
};
static const struct macb_config sama5d2_config = {
- .caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
+ .caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII | MACB_CAPS_GEM_HAS_PTP,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
@@ -3049,14 +3206,15 @@ static const struct macb_config sama5d2_config = {
static const struct macb_config sama5d3_config = {
.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE
- | MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
+ | MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII
+ | MACB_CAPS_GEM_HAS_PTP,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
};
static const struct macb_config sama5d4_config = {
- .caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII,
+ .caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII | MACB_CAPS_GEM_HAS_PTP,
.dma_burst_length = 4,
.clk_init = macb_clk_init,
.init = macb_init,
--
2.7.4
^ permalink raw reply related
* Re: [PATCH] arp: do neigh confirm based on sk arg
From: kbuild test robot @ 2016-12-14 12:56 UTC (permalink / raw)
To: YueHaibing
Cc: kbuild-all, Julian Anastasov, Hannes Frederic Sowa, Eric Dumazet,
David S. Miller, netdev
In-Reply-To: <74d41c47-7091-3c52-096d-5b9af2e0e9cf@huawei.com>
[-- Attachment #1: Type: text/plain, Size: 2997 bytes --]
Hi YueHaibing,
[auto build test WARNING on v4.9-rc8]
[cannot apply to net/master net-next/master sparc-next/master next-20161214]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/YueHaibing/arp-do-neigh-confirm-based-on-sk-arg/20161214-191755
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=openrisc
All warnings (new ones prefixed by >>):
net/ipv4/tcp_input.c: In function 'tcp_rcv_state_process':
>> net/ipv4/tcp_input.c:6003:21: warning: unused variable 'dst'
vim +/dst +6003 net/ipv4/tcp_input.c
^1da177e4 Linus Torvalds 2005-04-16 5987 */
168a8f580 Jerry Chu 2012-08-31 5988 tcp_rearm_rto(sk);
168a8f580 Jerry Chu 2012-08-31 5989 } else
^1da177e4 Linus Torvalds 2005-04-16 5990 tcp_init_metrics(sk);
^1da177e4 Linus Torvalds 2005-04-16 5991
c0402760f Yuchung Cheng 2016-09-19 5992 if (!inet_csk(sk)->icsk_ca_ops->cong_control)
02cf4ebd8 Neal Cardwell 2013-10-21 5993 tcp_update_pacing_rate(sk);
02cf4ebd8 Neal Cardwell 2013-10-21 5994
61eb90035 Joe Perches 2013-05-24 5995 /* Prevent spurious tcp_cwnd_restart() on first data packet */
^1da177e4 Linus Torvalds 2005-04-16 5996 tp->lsndtime = tcp_time_stamp;
^1da177e4 Linus Torvalds 2005-04-16 5997
^1da177e4 Linus Torvalds 2005-04-16 5998 tcp_initialize_rcv_mss(sk);
^1da177e4 Linus Torvalds 2005-04-16 5999 tcp_fast_path_on(tp);
^1da177e4 Linus Torvalds 2005-04-16 6000 break;
^1da177e4 Linus Torvalds 2005-04-16 6001
c48b22daa Joe Perches 2013-05-24 6002 case TCP_FIN_WAIT1: {
c48b22daa Joe Perches 2013-05-24 @6003 struct dst_entry *dst;
c48b22daa Joe Perches 2013-05-24 6004 int tmo;
c48b22daa Joe Perches 2013-05-24 6005
168a8f580 Jerry Chu 2012-08-31 6006 /* If we enter the TCP_FIN_WAIT1 state and we are a
168a8f580 Jerry Chu 2012-08-31 6007 * Fast Open socket and this is the first acceptable
168a8f580 Jerry Chu 2012-08-31 6008 * ACK we have received, this would have acknowledged
168a8f580 Jerry Chu 2012-08-31 6009 * our SYNACK so stop the SYNACK timer.
168a8f580 Jerry Chu 2012-08-31 6010 */
00db41243 Ian Morris 2015-04-03 6011 if (req) {
:::::: The code at line 6003 was first introduced by commit
:::::: c48b22daa6062fff9eded311b4d6974c29b40487 tcp: Remove 2 indentation levels in tcp_rcv_state_process
:::::: TO: Joe Perches <joe@perches.com>
:::::: CC: David S. Miller <davem@davemloft.net>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7277 bytes --]
^ permalink raw reply
* Re: [PATCH 1/1] Fixed to BUG_ON to WARN_ON def
From: Tariq Toukan @ 2016-12-14 12:58 UTC (permalink / raw)
To: Leon Romanovsky, Ozgur Karatas, Tariq Toukan
Cc: yishaih@mellanox.com, netdev, linux-kernel
In-Reply-To: <20161212181838.GB8204@mtr-leonro.local>
Thanks Ozgur for your report.
On 12/12/2016 8:18 PM, Leon Romanovsky wrote:
> On Mon, Dec 12, 2016 at 03:04:28PM +0200, Ozgur Karatas wrote:
>> Dear Romanovsky;
> Please avoid top-posting in your replies.
> Thanks
>
>> I'm trying to learn english and I apologize for my mistake words and phrases. So, I think the code when call to "sg_set_buf" and next time set memory and buffer. For example, isn't to call "WARN_ON" function, get a error to implicit declaration, right?
>>
>> Because, you will use to "BUG_ON" get a error implicit declaration of functions.
> I'm not sure that I followed you. mem->offset is set by sg_set_buf from
> buf variable returned by dma_alloc_coherent(). HW needs to get very
> precise size of this buf, in multiple of pages and aligned to pages
> boundaries.
>
>> sg_set_buf(mem, buf, PAGE_SIZE << order);
>> WARN_ON(mem->offset);
> See the patch inline which removes this BUG_ON in proper and safe way.
>
> From 7babe807affa2b27d51d3610afb75b693929ea1a Mon Sep 17 00:00:00 2001
> From: Leon Romanovsky <leonro@mellanox.com>
> Date: Mon, 12 Dec 2016 20:02:45 +0200
> Subject: [PATCH] net/mlx4: Remove BUG_ON from ICM allocation routine
>
> This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent()
> by checking DMA address aligment in advance and performing proper
> folding in case of error.
>
> Fixes: 5b0bf5e25efe ("mlx4_core: Support ICM tables in coherent memory")
> Reported-by: Ozgur Karatas <okaratas@member.fsf.org>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/icm.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 2a9dd46..e1f9e7c 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -118,8 +118,13 @@ static int mlx4_alloc_icm_coherent(struct device *dev, struct scatterlist *mem,
> if (!buf)
> return -ENOMEM;
>
> + if (offset_in_page(buf)) {
> + dma_free_coherent(dev, PAGE_SIZE << order,
> + buf, sg_dma_address(mem));
> + return -ENOMEM;
> + }
> +
> sg_set_buf(mem, buf, PAGE_SIZE << order);
> - BUG_ON(mem->offset);
> sg_dma_len(mem) = PAGE_SIZE << order;
> return 0;
> }
> --
Thanks Leon for the patch. It is the right way to do so.
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
We will submit Leon's patch in a new email.
Regards,
Tariq
> 2.10.2
>
^ permalink raw reply
* Re: Synopsys Ethernet QoS
From: Pavel Machek @ 2016-12-14 12:57 UTC (permalink / raw)
To: Niklas Cassel
Cc: Giuseppe CAVALLARO, Joao Pinto, Florian Fainelli, Andy Shevchenko,
David Miller, larper, rabinv, netdev, CARLOS.PALMINHA, Jie.Deng1,
Stephen Warren
In-Reply-To: <99424968-ad8f-fec6-ebcf-ab7b19ee5486@axis.com>
[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]
Hi!
> So if there is a long time before handling interrupts,
> I guess that it makes sense that one stream could
> get an advantage in the net scheduler.
>
> If I find the time, and if no one beats me to it, I will try to replace
> the normal timers with HR timers + a smaller default timeout.
>
Can you try something like this? Highres timers will be needed, too,
but this fixes the logic problem.
You'll need to apply it twice as code is copy&pasted.
Best regards,
Pavel
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
*/
priv->tx_count_frames += nfrags + 1;
if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
- mod_timer(&priv->txtimer,
- STMMAC_COAL_TIMER(priv->tx_coal_timer));
+ if (priv->tx_count_frames == nfrags + 1)
+ mod_timer(&priv->txtimer,
+ STMMAC_COAL_TIMER(priv->tx_coal_timer));
} else {
priv->tx_count_frames = 0;
priv->hw->desc->set_tx_ic(desc);
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14 13:10 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Netdev, kernel-hardening, LKML, Linux Crypto Mailing List,
Jean-Philippe Aumasson, Daniel J . Bernstein, Linus Torvalds,
Eric Biggers
In-Reply-To: <516c5633-14c2-ee18-90e4-84d73870ba2c@stressinduktion.org>
Hi Hannes,
On Wed, Dec 14, 2016 at 12:21 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Can you show or cite benchmarks in comparison with jhash? Last time I
> looked, especially for short inputs, siphash didn't beat jhash (also on
> all the 32 bit devices etc.).
I assume that jhash is likely faster than siphash, but I wouldn't be
surprised if with optimization we can make siphash at least pretty
close on 64-bit platforms. (I'll do some tests though; maybe I'm wrong
and jhash is already slower.)
With that said, siphash is here to replace uses of jhash where
hashtable poisoning vulnerabilities make it necessary. Where there's
no significant security improvement, if there's no speed improvement
either, then of course nothing's going to change.
I should have mentioned md5_transform in this first message too, as
two other patches in this series actually replace md5_transform usage
with siphash. I think in this case, siphash is a clear performance
winner (and security winner) over md5_transform. So if the push back
against replacing jhash usages is just too high, at the very least it
remains useful already for the md5_transform usage.
> This pretty much depends on the linearity of the hash function? I don't
> think a crypto secure hash function is needed for a hash table. Albeit I
> agree that siphash certainly looks good to be used here.
In order to prevent the aforementioned poisoning attacks, a PRF with
perfect linearity is required, which is what's achieved when it's a
cryptographically secure one. Check out section 7 of
https://131002.net/siphash/siphash.pdf .
> I am pretty sure that SipHash still needs a random key per hash table
> also. So far it was only the choice of hash function you are questioning.
Siphash needs a random secret key, yes. The point is that the hash
function remains secure so long as the secret key is kept secret.
Other functions can't make the same guarantee, and so nervous periodic
key rotation is necessary, but in most cases nothing is done, and so
things just leak over time.
> Hmm, I tried to follow up with all the HashDoS work and so far didn't
> see any HashDoS attacks against the Jenkins/SpookyHash family.
>
> If this is an issue we might need to also put those changes into stable.
jhash just isn't secure; it's not a cryptographically secure PRF. If
there hasn't already been an academic paper put out there about it
this year, let's make this thread 1000 messages long to garner
attention, and next year perhaps we'll see one. No doubt that
motivated government organizations, defense contractors, criminals,
and other netizens have already done research in private. Replacing
insecure functions with secure functions is usually a good thing.
Jason
^ permalink raw reply
* Re: [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Hannes Frederic Sowa @ 2016-12-14 13:16 UTC (permalink / raw)
To: Jason A. Donenfeld, David Laight
Cc: Netdev, kernel-hardening, Andi Kleen, LKML,
Linux Crypto Mailing List
In-Reply-To: <CAHmME9pEM=cDC5S=j1BU2oCF8-WdnbRfiVojcet4rXcRLcpJRw@mail.gmail.com>
On 14.12.2016 13:53, Jason A. Donenfeld wrote:
> Hi David,
>
> On Wed, Dec 14, 2016 at 10:51 AM, David Laight <David.Laight@aculab.com> wrote:
>> From: Jason A. Donenfeld
>>> Sent: 14 December 2016 00:17
>>> This gives a clear speed and security improvement. Rather than manually
>>> filling MD5 buffers, we simply create a layout by a simple anonymous
>>> struct, for which gcc generates rather efficient code.
>> ...
>>> + const struct {
>>> + struct in6_addr saddr;
>>> + struct in6_addr daddr;
>>> + __be16 sport;
>>> + __be16 dport;
>>> + } __packed combined = {
>>> + .saddr = *(struct in6_addr *)saddr,
>>> + .daddr = *(struct in6_addr *)daddr,
>>> + .sport = sport,
>>> + .dport = dport
>>> + };
>>
>> You need to look at the effect of marking this (and the other)
>> structures 'packed' on architectures like sparc64.
>
> In all current uses of __packed in the code, I think the impact is
> precisely zero, because all structures have members in descending
> order of size, with each member being a perfect multiple of the one
> below it. The __packed is therefore just there for safety, in case
> somebody comes in and screws everything up by sticking a u8 in
> between. In that case, it wouldn't be desirable to hash the structure
> padding bits. In the worst case, I don't believe the impact would be
> worse than a byte-by-byte memcpy, which is what the old code did. But
> anyway, these structures are already naturally packed anyway, so the
> present impact is nil.
__packed not only removes all padding of the struct but also changes the
alignment assumptions for the whole struct itself. The rule, the struct
is aligned by its maximum alignment of a member is no longer true. That
said, the code accessing this struct will change (not on archs that can
deal efficiently with unaligned access, but on others).
A proper test for not introducing padding is to use something with
BUILD_BUG_ON. Also gcc also clears the padding of the struct, so padding
shouldn't be that bad in there (it is better than byte access on mips).
Btw. I think gcc7 gets support for store merging optimization.
Bye,
Hannes
^ permalink raw reply
* Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: Alexey Dobriyan @ 2016-12-14 13:19 UTC (permalink / raw)
To: David Laight
Cc: davem@davemloft.net, netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB023DBE9@AcuExch.aculab.com>
On Tue, Dec 13, 2016 at 5:42 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Alexey Dobriyan
>> Sent: 13 December 2016 14:23
> ...
>> Well, the point of the patch is to save .text, so might as well save
>> as much as possible. Any form other than "ptr[id]" is going
>> to be either bigger or bigger and slower and "ptr" should be the first field.
>
> You've not read and understood the next bit:
>
>> > However if you offset the 'id' values so that only
>> > values 2 up are valid the code becomes:
>> > return net->gen2->ptr[id - 2];
>> > which will be exactly the same code as:
>> > return net->gen1->ptr[id];
>> > but it is much more obvious that 'id' values must be >= 2.
>> >
>> > The '2' should be generated from the structure offset, but with my method
>> > is doesn't actually matter if it is wrong.
>
> If you have foo->bar[id - const] then the compiler has to add the
> offset of 'bar' and subtract for 'const'.
> If the numbers match no add or subtract is needed.
>
> It is much cleaner to do this by explicitly removing the offset on the
> accesses than using a union.
Surprisingly, the trick only works if array index is cast to "unsigned long"
before subtracting.
Code becomes
...
ptr = ng->ptr[(unsigned long)id - 3];
...
I'll post a patch when net-next reopens.
^ permalink raw reply
* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-14 13:14 UTC (permalink / raw)
To: Pavel Machek, Niklas Cassel
Cc: Giuseppe CAVALLARO, Joao Pinto, Florian Fainelli, Andy Shevchenko,
David Miller, larper, rabinv, netdev, CARLOS.PALMINHA, Jie.Deng1,
Stephen Warren
In-Reply-To: <20161214125735.GA19542@amd>
Hi,
Às 12:57 PM de 12/14/2016, Pavel Machek escreveu:
> Hi!
>
>> So if there is a long time before handling interrupts,
>> I guess that it makes sense that one stream could
>> get an advantage in the net scheduler.
>>
>> If I find the time, and if no one beats me to it, I will try to replace
>> the normal timers with HR timers + a smaller default timeout.
>>
>
> Can you try something like this? Highres timers will be needed, too,
> but this fixes the logic problem.
>
> You'll need to apply it twice as code is copy&pasted.
>
> Best regards,
> Pavel
>
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>
> */
> priv->tx_count_frames += nfrags + 1;
> if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
> - mod_timer(&priv->txtimer,
> - STMMAC_COAL_TIMER(priv->tx_coal_timer));
> + if (priv->tx_count_frames == nfrags + 1)
> + mod_timer(&priv->txtimer,
> + STMMAC_COAL_TIMER(priv->tx_coal_timer));
> } else {
> priv->tx_count_frames = 0;
> priv->hw->desc->set_tx_ic(desc);
>
>
I know that this is completely of topic, but I am facing a dificulty with
stmmac. I have interrupts, mac well configured rx packets being received
successfully, but TX is not working, resulting in Tx errors = Total TX packets.
I have made a lot of debug and my conclusions is that by some reason when using
stmmac after starting tx dma, the hw state machine enters a deadend state
resulting in those errors. Anyone faced this trouble?
Thanks.
^ permalink raw reply
* Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf
From: Arnaldo Carvalho de Melo @ 2016-12-14 13:25 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: Joe Stringer, linux-kernel, netdev, wangnan0, ast
In-Reply-To: <584ACE2E.2090108@iogearbox.net>
Em Fri, Dec 09, 2016 at 04:30:54PM +0100, Daniel Borkmann escreveu:
> Hi Arnaldo,
>
> On 12/09/2016 04:09 PM, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer escreveu:
> > > (Was "libbpf: Synchronize implementations")
> > >
> > > Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
> > > samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
> > > samples/bpf/libbpf.[ch].
> > >
> > > ---
> > > v3: Add ack for first patch.
> > > Split out second patch from v2 into separate changes for remaining diff.
> > > Add patches to switch samples/bpf over to using tools/lib/.
> > > v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
> > > Don't shift non-bpf code into libbpf.
> > > Drop the patch to synchronize ELF definitions with tc.
> > > v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
> > > First post.
> >
> > Thanks, applied after addressing the -I$(objtree) issue raised by Wang,
>
> [ Sorry for late reply. ]
>
> First of all, glad to see us getting rid of the duplicate lib eventually! :)
>
> Please note that this might result in hopefully just a minor merge issue
> with net-next. Looks like patch 4/7 touches test_maps.c and test_verifier.c,
> which moved to a new bpf selftest suite [1] this net-next cycle. Seems it's
> just log buffer and some renames there, which can be discarded for both
> files sitting in selftests.
Yeah, I've got to this point, and the merge has a little bit more than
that, including BPF_PROG_ATTACH/BPF_PROG_DETACH, etc, working on it...
- Arnaldo
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox