Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status
From: Rob Herring @ 2016-12-22 22:30 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland
In-Reply-To: <1482271793-7671-3-git-send-email-m-karicheri2@ti.com>

On Tue, Dec 20, 2016 at 05:09:45PM -0500, Murali Karicheri wrote:
> From: WingMan Kwok <w-kwok2@ti.com>
> 
> The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides
> functionality of a physical coding sublayer (PCS) on data being
> transferred between a demuxed XGMII and SerDes supporting a 16
> or 32 bit interface.  From the driver point of view, whether
> a ethernet link is up or not depends also on the status of the
> block-lock bit of the PCSR.  This patch adds the checking of that
> bit in order to determine the link status.

I would think this would be a common thing and the phy driver should 
provide the status, rather than trying to give the ethernet driver 
direct access to the phy registers. Is the PCSR the serdes phy or 
registers in addition to that?

Rob

^ permalink raw reply

* Re: [PATCH v4 2/3] Bluetooth: btusb: Add out-of-band wakeup support
From: Rob Herring @ 2016-12-22 22:39 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Mark Rutland, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	Amitkumar Karwar, Wei-Ning Huang, Xinming Hu, netdev, devicetree,
	linux-bluetooth, Brian Norris, linux-kernel, rajatxjain
In-Reply-To: <1482352432-38302-2-git-send-email-rajatja@google.com>

On Wed, Dec 21, 2016 at 12:33:51PM -0800, Rajat Jain wrote:
> Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
> can be connected to a gpio on the CPU side, and can be used to wakeup
> the host out-of-band. This can be useful in situations where the
> in-band wakeup is not possible or not preferable (e.g. the in-band
> wakeup may require the USB host controller to remain active, and
> hence consuming more system power during system sleep).
> 
> The oob gpio interrupt to be used for wakeup on the CPU side, is
> read from the device tree node, (using standard interrupt descriptors).
> A devcie tree binding document is also added for the driver. The
> compatible string is in compliance with
> Documentation/devicetree/bindings/usb/usb-device.txt
> 
> Signed-off-by: Rajat Jain <rajatja@google.com>
> Reviewed-by: Brian Norris <briannorris@chromium.org>
> ---
> v4: Move the set_bit(BTUSB_OOB_WAKE_DISABLED,..) call to the beginning of
>     btusb_config_oob_wake() - caught by Brian.
> v3: Add Brian's "Reviewed-by"
> v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
>     * Leave it on device tree to specify IRQ flags (level /edge triggered)
>     * Mark the device as non wakeable on exit.
> 
>  Documentation/devicetree/bindings/net/btusb.txt | 40 ++++++++++++

Acked-by: Rob Herring <robh@kernel.org>

>  drivers/bluetooth/btusb.c                       | 84 +++++++++++++++++++++++++
>  2 files changed, 124 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/btusb.txt

^ permalink raw reply

* Re: [PATCH 3/6 net-next] inet: kill smallest_size and smallest_port
From: Eric Dumazet @ 2016-12-22 22:41 UTC (permalink / raw)
  To: Josef Bacik; +Cc: davem, hannes, kraigatgoog, tom, netdev, kernel-team
In-Reply-To: <1482441998-28359-4-git-send-email-jbacik@fb.com>

On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote:
> In inet_csk_get_port we seem to be using smallest_port to figure out where the
> best place to look for a SO_REUSEPORT sk that matches with an existing set of
> SO_REUSEPORT's.  However if we get to the logic
> 
> if (smallest_size != -1) {
> 	port = smallest_port;
> 	goto have_port;
> }
> 
> we will do a useless search, because we would have already done the
> inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
> would have gone to found_tb and succeeded.  Since this logic makes us do yet
> another trip through inet_csk_bind_conflict for a port we know won't work just
> delete this code and save us the time.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Please remove tb->need_owners ;)

^ permalink raw reply

* Re: [PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
From: Eric Dumazet @ 2016-12-22 22:43 UTC (permalink / raw)
  To: Josef Bacik; +Cc: davem, hannes, kraigatgoog, tom, netdev, kernel-team
In-Reply-To: <1482441998-28359-7-git-send-email-jbacik@fb.com>

On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote:
> If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
> never set it again.  Which means that in the future if we end up adding a bunch
> of reuseport sk's to that tb we'll have to do the expensive scan every time.
> Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
> Once one sk has made it onto the list we know that there are no potential bind
> conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
> common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT so
> we know we have to do an extra check for subsequent reuseport sockets and skip
> the expensive bind conflict check.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  include/net/inet_hashtables.h   |  4 ++++
>  net/ipv4/inet_connection_sock.c | 53 +++++++++++++++++++++++++++++++++++++----
>  2 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 756ed16..4ccc18f 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -74,12 +74,16 @@ struct inet_ehash_bucket {
>   * users logged onto your box, isn't it nice to know that new data
>   * ports are created in O(1) time?  I thought so. ;-)	-DaveM
>   */
> +#define FASTREUSEPORT_ANY	1
> +#define FASTREUSEPORT_STRICT	2
> +
>  struct inet_bind_bucket {
>  	possible_net_t		ib_net;
>  	unsigned short		port;
>  	signed char		fastreuse;
>  	signed char		fastreuseport;
>  	kuid_t			fastuid;
> +	struct sock_common	fastsock;
>  	int			num_owners;
>  	struct hlist_node	node;
>  	struct hlist_head	owners;

Please place this fat field at the end of inet_bind_bucket

Many sockets do not use SO_REUSEPORT and should not use this field,
while tb->owners need to be touched.

^ permalink raw reply

* Re: [PATCH v3 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Rob Herring @ 2016-12-22 23:01 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, mark.rutland-5wv7dgnIgG8,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mgreer-luAo+O/VEmrlveNOaEYElw, justin-R+k406RtEhcAvxtiuMwx3w
In-Reply-To: <1482380314-16440-1-git-send-email-geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>

On Wed, Dec 21, 2016 at 11:18:32PM -0500, Geoff Lansberry wrote:
> The TRF7970A has configuration options to support hardware designs
> which use a 27.12MHz clock. This commit adds a device tree option
> 'clock-frequency' to support configuring the this chip for default
> 13.56MHz clock or the optional 27.12MHz clock.
> 
> Signed-off-by: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  2 +

Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

>  drivers/nfc/trf7970a.c                             | 50 +++++++++++++++++-----
>  2 files changed, 41 insertions(+), 11 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Rob Herring @ 2016-12-22 23:02 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo,
	mark.rutland, netdev, devicetree, linux-kernel, mgreer, justin
In-Reply-To: <1482380314-16440-2-git-send-email-geoff@kuvee.com>

On Wed, Dec 21, 2016 at 11:18:33PM -0500, Geoff Lansberry wrote:
> The TRF7970A has configuration options for supporting hardware designs
> with 1.8 Volt or 3.3 Volt IO.   This commit adds a device tree option,
> using a fixed regulator binding, for setting the io voltage to match
> the hardware configuration. If no option is supplied it defaults to
> 3.3 volt configuration.
> 
> Signed-off-by: Geoff Lansberry <geoff@kuvee.com>
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  2 ++

Acked-by: Rob Herring <robh@kernel.org>

>  drivers/nfc/trf7970a.c                             | 26 +++++++++++++++++++++-
>  2 files changed, 27 insertions(+), 1 deletion(-)

^ permalink raw reply

* Re: [PATCH] Fixed status entry in m_can documentation
From: Rob Herring @ 2016-12-22 23:14 UTC (permalink / raw)
  To: Vyacheslav V. Yurkov
  Cc: linux-can-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jiri Kosina,
	Wolfgang Grandegger, Marc Kleine-Budde, Mark Rutland
In-Reply-To: <20161222104521.20683-1-uvv.mail-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, Dec 22, 2016 at 11:45:21AM +0100, Vyacheslav V. Yurkov wrote:
> Use valid value for 'enabled' in status field
> 
> Signed-off-by: Vyacheslav V. Yurkov <uvv.mail-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/net/can/m_can.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt
> index 9e33177..5facaf5 100644
> --- a/Documentation/devicetree/bindings/net/can/m_can.txt
> +++ b/Documentation/devicetree/bindings/net/can/m_can.txt
> @@ -63,5 +63,5 @@ Board dts:
>  &m_can1 {
>  	pinctrl-names = "default";
>  	pinctrl-0 = <&pinctrl_m_can1>;
> -	status = "enabled";
> +	status = "okay";

Examples don't need to have status prop. Just remove.

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net] inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
From: Willem de Bruijn @ 2016-12-22 23:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, njagabar, samanthakumar, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/ip_sockglue.c | 2 +-
 net/ipv6/datagram.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 8b13881..9760734 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -148,7 +148,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
 	const struct iphdr *iph = ip_hdr(skb);
 	__be16 *ports = (__be16 *)skb_transport_header(skb);
 
-	if (skb_transport_offset(skb) + 4 > skb->len)
+	if (skb_transport_offset(skb) + 4 > (int)skb->len)
 		return;
 
 	/* All current transport protocols have the port numbers in the
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 0489e19..1407426 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -701,7 +701,7 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg,
 		struct sockaddr_in6 sin6;
 		__be16 *ports = (__be16 *) skb_transport_header(skb);
 
-		if (skb_transport_offset(skb) + 4 <= skb->len) {
+		if (skb_transport_offset(skb) + 4 <= (int)skb->len) {
 			/* All current transport protocols have the port numbers in the
 			 * first four bytes of the transport header and this function is
 			 * written with this assumption in mind.
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-22 23:20 UTC (permalink / raw)
  To: Shahar Klein, davem
  Cc: xiyou.wangcong, gerlitz.or, roid, jiri, john.fastabend, netdev
In-Reply-To: <7033ed3d-0665-1a38-7631-a1346d46e1b4@mellanox.com>

On 12/22/2016 02:16 PM, Shahar Klein wrote:
> On 12/21/2016 7:04 PM, Daniel Borkmann wrote:
>> Shahar reported a soft lockup in tc_classify(), where we run into an
>> endless loop when walking the classifier chain due to tp->next == tp
>> which is a state we should never run into. The issue only seems to
>> trigger under load in the tc control path.
>>
>> What happens is that in tc_ctl_tfilter(), thread A allocates a new
>> tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
>> with it. In that classifier callback we had to unlock/lock the rtnl
>> mutex and returned with -EAGAIN. One reason why we need to drop there
>> is, for example, that we need to request an action module to be loaded.
>>
>> This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
>> after we loaded and found the requested action, we need to redo the
>> whole request so we don't race against others. While we had to unlock
>> rtnl in that time, thread B's request was processed next on that CPU.
>> Thread B added a new tp instance successfully to the classifier chain.
>> When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
>> and destroying its tp instance which never got linked, we goto replay
>> and redo A's request.
>>
>> This time when walking the classifier chain in tc_ctl_tfilter() for
>> checking for existing tp instances we had a priority match and found
>> the tp instance that was created and linked by thread B. Now calling
>> again into tp->ops->change() with that tp was successful and returned
>> without error.
>>
>> tp_created was never cleared in the second round, thus kernel thinks
>> that we need to link it into the classifier chain (once again). tp and
>> *back point to the same object due to the match we had earlier on. Thus
>> for thread B's already public tp, we reset tp->next to tp itself and
>> link it into the chain, which eventually causes the mentioned endless
>> loop in tc_classify() once a packet hits the data path.
>>
>> Fix is to clear tp_created at the beginning of each request, also when
>> we replay it. On the paths that can cause -EAGAIN we already destroy
>> the original tp instance we had and on replay we really need to start
>> from scratch. It seems that this issue was first introduced in commit
>> 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
>> and avoid kernel panic when we use cls_cgroup").
>>
>> Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
>> Reported-by: Shahar Klein <shahark@mellanox.com>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
[...]
> I've tested this specific patch (without cong's patch and without the many prints) many times and in many test permutations today and it didn't lock
>
> Thanks again Daniel!

Just catching up with email (traveling whole day), thanks a bunch for
your effort, Shahar! In that case lets then add:

Tested-by: Shahar Klein <shahark@mellanox.com>

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-22 23:21 UTC (permalink / raw)
  To: John Fastabend, David Miller
  Cc: xiyou.wangcong, shahark, gerlitz.or, roid, jiri, netdev
In-Reply-To: <585C127E.7040509@gmail.com>

On 12/22/2016 06:50 PM, John Fastabend wrote:
> On 16-12-22 08:53 AM, David Miller wrote:
>> From: Daniel Borkmann <daniel@iogearbox.net>
>> Date: Wed, 21 Dec 2016 22:07:48 +0100
>>
>>> Ok, you mean for net. In that case I prefer the smaller sized fix to
>>> be honest. It also covers everything from the point where we fetch
>>> the chain via cops->tcf_chain() to the end of the function, which is
>>> where most of the complexity resides, and only the two mentioned
>>> commits do the relock, so as a fix I think it's fine as-is. As
>>> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
>>> would be better, imho.
>>
>> Please, can you two work towards an agreement as to what fix should
>> go in at this time?
>>
>> Thanks.
>
> Thanks for fixing this!
>
> I have a slight preference to push this patch into net as its been
> tested already by Shahar and is not particularly invasive. Then for
> net-next do a larger cleanup to get rid of some of the complexity per
> Cong's suggestion.

My preference as well, thanks.

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-23  0:07 UTC (permalink / raw)
  To: hannes, linux, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <30d9513a-129b-d246-1461-2326130e118f@stressinduktion.org>

Hannes Frederic Sowa wrote:
> A lockdep test should still be done. ;)

Adding might_lock() annotations will improve coverage a lot.

> Yes, that does look nice indeed. Accounting for bits instead of bytes
> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> case you can't satisfy a request with one batched entropy block and have
> to consume randomness from two.

The bit granularity is also for the callers' convenience, so they don't
have to mask again.  Whether get_random_bits rounds up to byte boundaries
internally or not is something else.

When the current batch runs low, I was actually thinking of throwing
away the remaining bits and computing a new batch of 512.  But it's
whatever works best at implementation time.

>>> It could only mix the output back in every two calls, in which case
>>> you can backtrack up to one call but you need to do 2^128 work to
>>> backtrack farther.  But yes, this is getting excessively complicated.
> 
>> No, if you're willing to accept limited backtrack, this is a perfectly
>> acceptable solution, and not too complicated.  You could do it phase-less
>> if you like; store the previous output, then after generating the new
>> one, mix in both.  Then overwrite the previous output.  (But doing two
>> rounds of a crypto primtive to avoid one conditional jump is stupid,
>> so forget that.)

> Can you quickly explain why we lose the backtracking capability?

Sure.  An RNG is (state[i], output[i]) = f(state[i-1]).  The goal of
backtracking is to compute output[i], or better yet state[i-1], given
state[i].

For example, consider an OFB or CTR mode generator.  The state is a key
and and IV, and you encrypt the IV with the key to produce output, then
either replace the IV with the output, or increment it.  Either way,
since you still have the key, you can invert the transformation and
recover the previous IV.

The standard way around this is to use the Davies-Meyer construction:

IV[i] = IV[i-1] + E(IV[i-1], key)

This is the standard way to make a non-invertible random function
out of an invertible random permutation.

>From the sum, there's no easy way to find the ciphertext *or* the
plaintext that was encrypted.  Assuming the encryption is secure,
the only way to reverse it is brute force: guess IV[i-1] and run the
operation forward to see if the resultant IV[i] matches.

There are a variety of ways to organize this computation, since the
guess gives toy both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including
running E forward, backward, or starting from both ends to see if you
meet in the middle.

The way you add the encryption output to the IV is not very important.
It can be addition, xor, or some more complex invertible transformation.
In the case of SipHash, the "encryption" output is smaller than the
input, so we have to get a bit more creative, but it's still basically
the same thing.

The problem is that the output which is combined with the IV is too small.
With only 64 bits, trying all possible values is practical.  (The world's
Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
times per second.)

By basically doing two iterations at once and mixing in 128 bits of
output, the guessing attack is rendered impractical.  The only downside
is that you need to remember and store one result between when it's
computed and last used.  This is part of the state, so an attack can
find output[i-1], but not anything farther back.

> ChaCha as a block cipher gives a "perfect" permutation from the output
> of either the CRNG or the CPRNG, which actually itself has backtracking
> protection.

I'm not quite understanding.  The /dev/random implementation uses some
of the ChaCha output as a new ChaCha key (that's another way to mix output
back into the state) to prevent backtracking.  But this slows it down, and
again if you want to be efficient, you're generating and storing large batches
of entropy and storing it in the RNG state.

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-23  0:26 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Shahar Klein, Or Gerlitz, Roi Dayan, Jiri Pirko,
	John Fastabend, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpW9Ly+fMEiBtHq4XwUgvt89-5uE2Bwdw1FXJGV3=eY1ZA@mail.gmail.com>

On 12/22/2016 08:05 PM, Cong Wang wrote:
> On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Ok, you mean for net. In that case I prefer the smaller sized fix to be
>> honest. It also covers everything from the point where we fetch the chain
>> via cops->tcf_chain() to the end of the function, which is where most of
>> the complexity resides, and only the two mentioned commits do the relock,
>
> I really wish the problem is only about relocking, but look at the code,
> the deeper reason why we have this bug is the complexity of the logic
> inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
> idempotent; 2) the request logic itself is hard, because of tc filter design
> and implementation.
>
> This is why I worry more than just relocking.

But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to
me your argument is more about fear of complexity on tc framework itself.
I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it
would be good to reduce it's complexity into smaller pieces. But it's not
really related to the fix itself, reducing complexity requires significantly
more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next
to try to simplify it, sure, but I don't get why we have to discuss so much
on this matter in this context, really.

>> so as a fix I think it's fine as-is. As mentioned, if there's need to
>> refactor tc_ctl_tfilter() net-next would be better, imho.
>
> Refactor is a too strong word, just moving the replay out is not a refactor.
> The end result will be still smaller than your commit d936377414fadbafb4,
> which is already backported to stable.

Commit d936377414fa is a whole different thing, and not related to the
topic at all. The two-line changes with the initialization fix is quite
straight forward and fixes a bug in the logic. It's also less complex in
terms of lines but also in terms of complexity than to refactor or move the
replay out. Moving it out contextually means that you also wouldn't rule
out that things like nlmsg_parse(), or in-fact *any* of the __tc_ctl_tfilter()
return paths could potentially return an error that suddenly would require
a replay of the request. So, while moving it out fixes the bug, too, it also
adds more potential points where you would go and replay the request where
you didn't do so before and therefore also adds more complexity to the code
that needs review/audit when fixing bugs since you don't have these hard/direct
return paths anymore. Therefore I don't think it's better to go that way
for the fix.

Thanks,
Daniel

^ permalink raw reply

* [PATCH] brcmfmac: fix spelling mistakes on "Ivalid"
From: Colin King @ 2016-12-23  0:43 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	Pieter-Paul Giesberts, Rafał Miłecki, linux-wireless,
	brcm80211-dev-list.pdl, netdev
  Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fixes to spelling mistake "Ivalid" to "Invalid" in
brcmf_err error messages.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 7ffc4ab..15eaf72 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -3971,7 +3971,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp,
 			pval |= AES_ENABLED;
 			break;
 		default:
-			brcmf_err("Ivalid unicast security info\n");
+			brcmf_err("Invalid unicast security info\n");
 		}
 		offset++;
 	}
@@ -4015,7 +4015,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp,
 			wpa_auth |= WPA2_AUTH_1X_SHA256;
 			break;
 		default:
-			brcmf_err("Ivalid key mgmt info\n");
+			brcmf_err("Invalid key mgmt info\n");
 		}
 		offset++;
 	}
-- 
2.10.2

^ permalink raw reply related

* RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: maowenan @ 2016-12-23  0:40 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Stephen Hemminger, netdev@vger.kernel.org,
	jeffrey.t.kirsher@intel.com, weiyongjun (A), Dingtianhong
In-Reply-To: <CAKgT0Uc2ztu_iDn79zdNtsAdb-Ftu65V6YHV-sqXb6vTCTiGCQ@mail.gmail.com>



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Thursday, December 22, 2016 11:54 PM
> To: maowenan
> Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com;
> weiyongjun (A); Dingtianhong
> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> 
> On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Thursday, December 22, 2016 9:28 AM
> >> To: maowenan
> >> Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> >> ordering mode
> >>
> >> On Thu, 8 Dec 2016 14:51:38 +0800
> >> Mao Wenan <maowenan@huawei.com> wrote:
> >>
> >> > This patch provides one way to set/unset IXGBE NIC TX and RX relax
> >> > ordering mode, which can be set by ethtool.
> >> > Relax ordering is one mode of 82599 NIC, to enable this mode can
> >> > enhance the performance for some cpu architecure.
> >>
> >> Then it should be done by CPU architecture specific quirks
> >> (preferably in PCI
> >> layer) so that all users get the option without having to do manual
> intervention.
> >>
> >> > example:
> >> > ethtool -s enp1s0f0 relaxorder off
> >> > ethtool -s enp1s0f0 relaxorder on
> >>
> >> Doing it via ethtool is a developer API (for testing) not something
> >> that makes sense in production.
> >
> >
> > This feature is not mandatory for all users, acturally relax ordering
> > default configuration of 82599 is 'disable', So this patch gives one way to
> enable relax ordering to be selected in some performance condition.
> 
> That isn't quite true.  The default for Sparc systems is to have it enabled.
> 
> Really this is something that is platform specific.  I agree with Stephen that it
> would work better if this was handled as a series of platform specific quirks
> handled at something like the PCI layer rather than be a switch the user can
> toggle on and off.
> 
> With that being said there are changes being made that should help to improve
> the situation.  Specifically I am looking at adding support for the
> DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where
> you might be able to specify the DMA behavior via the DMA mapping instead of
> having to make the final decision in the device itself.
> 
> - Alex

Yes, Sparc is a special case. From the NIC driver point of view, It is no need for 
some ARCHs to do particular operation and compiling branch, ethtool is a flexible 
method for user to make decision whether on|off this feature.
I think Jeff as maintainer of 82599 has some comments about this.
-Wenan


^ permalink raw reply

* Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: Jeff Kirsher @ 2016-12-23  1:06 UTC (permalink / raw)
  To: maowenan, Alexander Duyck
  Cc: Stephen Hemminger, netdev@vger.kernel.org, weiyongjun (A),
	Dingtianhong
In-Reply-To: <F95AC9340317A84688A5F0DF0246F3F20151FD1C@szxeml504-mbs.china.huawei.com>

[-- Attachment #1: Type: text/plain, Size: 3540 bytes --]

On Fri, 2016-12-23 at 00:40 +0000, maowenan wrote:
> > -----Original Message-----
> > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > Sent: Thursday, December 22, 2016 11:54 PM
> > To: maowenan
> > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.
> > com;
> > weiyongjun (A); Dingtianhong
> > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > ordering mode
> > 
> > On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> > wrote:
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > > To: maowenan
> > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > > ordering mode
> > > > 
> > > > On Thu, 8 Dec 2016 14:51:38 +0800
> > > > Mao Wenan <maowenan@huawei.com> wrote:
> > > > 
> > > > > This patch provides one way to set/unset IXGBE NIC TX and RX
> > > > > relax
> > > > > ordering mode, which can be set by ethtool.
> > > > > Relax ordering is one mode of 82599 NIC, to enable this mode can
> > > > > enhance the performance for some cpu architecure.
> > > > 
> > > > Then it should be done by CPU architecture specific quirks
> > > > (preferably in PCI
> > > > layer) so that all users get the option without having to do manual
> > 
> > intervention.
> > > > 
> > > > > example:
> > > > > ethtool -s enp1s0f0 relaxorder off
> > > > > ethtool -s enp1s0f0 relaxorder on
> > > > 
> > > > Doing it via ethtool is a developer API (for testing) not something
> > > > that makes sense in production.
> > > 
> > > 
> > > This feature is not mandatory for all users, acturally relax ordering
> > > default configuration of 82599 is 'disable', So this patch gives one
> > > way to
> > 
> > enable relax ordering to be selected in some performance condition.
> > 
> > That isn't quite true.  The default for Sparc systems is to have it
> > enabled.
> > 
> > Really this is something that is platform specific.  I agree with
> > Stephen that it
> > would work better if this was handled as a series of platform specific
> > quirks
> > handled at something like the PCI layer rather than be a switch the
> > user can
> > toggle on and off.
> > 
> > With that being said there are changes being made that should help to
> > improve
> > the situation.  Specifically I am looking at adding support for the
> > DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where
> > you might be able to specify the DMA behavior via the DMA mapping
> > instead of
> > having to make the final decision in the device itself.
> > 
> > - Alex
> 
> Yes, Sparc is a special case. From the NIC driver point of view, It is no
> need for 
> some ARCHs to do particular operation and compiling branch, ethtool is a
> flexible 
> method for user to make decision whether on|off this feature.
> I think Jeff as maintainer of 82599 has some comments about this.

My original comment/objection was that you attempted to do this change as a
module parameter to the ixgbe driver, where I directed you to use ethtool 
so that other drivers could benefit from the ability to enable/disable
relaxed ordering.  As far as how it gets implemented in ethtool or PCI
layer, makes little difference to me, I only had issues with the driver
specific module parameter implementation, which is not acceptable.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Phil Reid @ 2016-12-23  1:09 UTC (permalink / raw)
  To: Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <67213186-8583-9f9c-6f6a-6f56d2ba4ff0@synopsys.com>

G'day Joao,
On 23/12/2016 01:06, Joao Pinto wrote:
> Às 4:57 PM de 12/22/2016, Phil Reid escreveu:
>> On 22/12/2016 23:47, Joao Pinto wrote:
>>>
>>> Hello Phil,
>>>
>>> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>>>> G'day Joao,
>>>>
>>>> On 22/12/2016 20:38, Joao Pinto wrote:
>>>>> When testing stmmac with my QoS reference design I checked a problem in the
>>>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>>>
>>>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>>>> ---
>>>>> changes v1->v2 (David Miller)
>>>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>>>> to make sense
>>>>>
>>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> index b21d03f..94223c8 100644
>>>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>>>> *ioaddr, int mcbins,
>>>>>      mac->mii.reg_shift = 6;
>>>>>      mac->mii.reg_mask = 0x000007C0;
>>>>>      mac->mii.clk_csr_shift = 2;
>>>>> -    mac->mii.clk_csr_mask = 0xF;
>>>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>>>
>>>> Should this not be GENMASK(5,2)
>>>
>>> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>>>
>>> Bits: 4:2
>>> 000 60-100 MHz clk_csr_i/42
>>> 001 100-150 MHz clk_csr_i/62
>>> 010 20-35 MHz clk_csr_i/16
>>> 011 35-60 MHz clk_csr_i/26
>>> 100 150-250 MHz clk_csr_i/102
>>> 101 250-300 MHz clk_csr_i/124
>>> 110, 111 Reserved
>>>
>>> So only bits 2, 3 and 4 should be masked.
>>>
>>> Thanks.
>>>
>> But this is a change in behaviour from what was there isn't.
>> Previous mask was 4 bits. now it's 3.
>>
>> And for example the altera socfgpa implementation in the cyclone V has valid values
>> for values 0x8-0xf, using bit 5:2.
>
> According to the databook, bit 5 is reserved and RO. When reserved, it is
> possible to customize. Is that the case? If there is hardware using the 5th bit
> we can put it GENMASK(5, 2) with no problem.
>
I've also checked the Aria 10 documentation and bit 5 is also RW.
The following options are documented and supported
     1000: CSR clock/4
     1001: CSR clock/6
     1010: CSR clock/8
     1011: CSR clock/10
     1100: CSR clock/12
     1101: CSR clock/14
     1110: CSR clock/16
     1111: CSR clock/18
They do mention that these values will probably be outside the IEEE 802.3 specified range.

But there's at least a couple of cores out there where GENMASK(5,2) is valid.
Can't say if anyone is using that setting thou.


-- 
Regards
Phil Reid

^ permalink raw reply

* RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: maowenan @ 2016-12-23  6:14 UTC (permalink / raw)
  To: Jeff Kirsher, Alexander Duyck
  Cc: Stephen Hemminger, netdev@vger.kernel.org, weiyongjun (A),
	Dingtianhong
In-Reply-To: <1482455203.2481.31.camel@intel.com>



> -----Original Message-----
> From: Jeff Kirsher [mailto:jeffrey.t.kirsher@intel.com]
> Sent: Friday, December 23, 2016 9:07 AM
> To: maowenan; Alexander Duyck
> Cc: Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A);
> Dingtianhong
> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> 
> On Fri, 2016-12-23 at 00:40 +0000, maowenan wrote:
> > > -----Original Message-----
> > > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > > Sent: Thursday, December 22, 2016 11:54 PM
> > > To: maowenan
> > > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.
> > > com;
> > > weiyongjun (A); Dingtianhong
> > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > ordering mode
> > >
> > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> > > wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > > > To: maowenan
> > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set
> > > > > relax ordering mode
> > > > >
> > > > > On Thu, 8 Dec 2016 14:51:38 +0800 Mao Wenan
> > > > > <maowenan@huawei.com> wrote:
> > > > >
> > > > > > This patch provides one way to set/unset IXGBE NIC TX and RX
> > > > > > relax ordering mode, which can be set by ethtool.
> > > > > > Relax ordering is one mode of 82599 NIC, to enable this mode
> > > > > > can enhance the performance for some cpu architecure.
> > > > >
> > > > > Then it should be done by CPU architecture specific quirks
> > > > > (preferably in PCI
> > > > > layer) so that all users get the option without having to do
> > > > > manual
> > >
> > > intervention.
> > > > >
> > > > > > example:
> > > > > > ethtool -s enp1s0f0 relaxorder off ethtool -s enp1s0f0
> > > > > > relaxorder on
> > > > >
> > > > > Doing it via ethtool is a developer API (for testing) not
> > > > > something that makes sense in production.
> > > >
> > > >
> > > > This feature is not mandatory for all users, acturally relax
> > > > ordering default configuration of 82599 is 'disable', So this
> > > > patch gives one way to
> > >
> > > enable relax ordering to be selected in some performance condition.
> > >
> > > That isn't quite true.  The default for Sparc systems is to have it
> > > enabled.
> > >
> > > Really this is something that is platform specific.  I agree with
> > > Stephen that it would work better if this was handled as a series of
> > > platform specific quirks handled at something like the PCI layer
> > > rather than be a switch the user can toggle on and off.
> > >
> > > With that being said there are changes being made that should help
> > > to improve the situation.  Specifically I am looking at adding
> > > support for the DMA_ATTR_WEAK_ORDERING which may also allow us to
> > > identify cases where you might be able to specify the DMA behavior
> > > via the DMA mapping instead of having to make the final decision in
> > > the device itself.
> > >
> > > - Alex
> >
> > Yes, Sparc is a special case. From the NIC driver point of view, It is
> > no need for some ARCHs to do particular operation and compiling
> > branch, ethtool is a flexible method for user to make decision whether
> > on|off this feature.
> > I think Jeff as maintainer of 82599 has some comments about this.
> 
> My original comment/objection was that you attempted to do this change as a
> module parameter to the ixgbe driver, where I directed you to use ethtool so
> that other drivers could benefit from the ability to enable/disable relaxed
> ordering.  As far as how it gets implemented in ethtool or PCI layer, makes
> little difference to me, I only had issues with the driver specific module
> parameter implementation, which is not acceptable.


Thank you Jeff and Alex.
And then I have gone through mail thread about "i40e: enable PCIe relax ordering for SPARC",
It only works for SPARC, any other ARCH who wants to enable DMA_ATTR_WEAK_ORDERING 
feature, should define the new macro, recompile the driver module.

Because of the above reasons, we implement in ethtool to give the final user a convenient
way to on|off special feature, no need define new macro, easy to extend the new features,
and also good benefit for other driver as Jeff referred.


^ permalink raw reply

* [patch net 0/3] mlxsw: Router fixes
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis

From: Jiri Pirko <jiri@mellanox.com>

Ido says:

First two patches ensure we remove from the device's table neighbours
that are considered to be dead by the neighbour core.

The last patch removes nexthop groups from the device when they are no
longer valid.

Ido Schimmel (3):
  neigh: Send netevent after marking neigh as dead
  mlxsw: spectrum_router: Don't reflect dead neighs
  mlxsw: spectrum_router: Correctly remove nexthop groups

 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 13 +++++++++----
 net/core/neighbour.c                                  |  1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [patch net 1/3] neigh: Send netevent after marking neigh as dead
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis
In-Reply-To: <1482481970-2327-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.

This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.

Fix that by sending a netevent after marking the neighbour as dead.

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/neighbour.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 782dd86..7bb12e0 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -100,6 +100,7 @@ static void neigh_cleanup_and_release(struct neighbour *neigh)
 		neigh->parms->neigh_cleanup(neigh);
 
 	__neigh_notify(neigh, RTM_DELNEIGH, 0);
+	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 	neigh_release(neigh);
 }
 
-- 
2.7.4

^ permalink raw reply related

* [patch net 2/3] mlxsw: spectrum_router: Don't reflect dead neighs
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis
In-Reply-To: <1482481970-2327-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

When a neighbour is considered to be dead, we should remove it from the
device's table regardless of its NUD state.

Without this patch, after setting a port to be administratively down we
get the following errors when we periodically try to update the kernel
about neighbours activity:

[  461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
matching neighbour for IP=192.168.100.2

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 53126bf..a0f9742 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -942,7 +942,7 @@ static void mlxsw_sp_router_neigh_update_hw(struct work_struct *work)
 	char rauht_pl[MLXSW_REG_RAUHT_LEN];
 	struct net_device *dev;
 	bool entry_connected;
-	u8 nud_state;
+	u8 nud_state, dead;
 	bool updating;
 	bool removing;
 	bool adding;
@@ -953,10 +953,11 @@ static void mlxsw_sp_router_neigh_update_hw(struct work_struct *work)
 	dip = ntohl(*((__be32 *) n->primary_key));
 	memcpy(neigh_entry->ha, n->ha, sizeof(neigh_entry->ha));
 	nud_state = n->nud_state;
+	dead = n->dead;
 	dev = n->dev;
 	read_unlock_bh(&n->lock);
 
-	entry_connected = nud_state & NUD_VALID;
+	entry_connected = nud_state & NUD_VALID && !dead;
 	adding = (!neigh_entry->offloaded) && entry_connected;
 	updating = neigh_entry->offloaded && entry_connected;
 	removing = neigh_entry->offloaded && !entry_connected;
@@ -1351,7 +1352,7 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_neigh_entry *neigh_entry;
 	struct net_device *dev = fib_nh->nh_dev;
 	struct neighbour *n;
-	u8 nud_state;
+	u8 nud_state, dead;
 
 	/* Take a reference of neigh here ensuring that neigh would
 	 * not be detructed before the nexthop entry is finished.
@@ -1383,8 +1384,9 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp,
 	list_add_tail(&nh->neigh_list_node, &neigh_entry->nexthop_list);
 	read_lock_bh(&n->lock);
 	nud_state = n->nud_state;
+	dead = n->dead;
 	read_unlock_bh(&n->lock);
-	__mlxsw_sp_nexthop_neigh_update(nh, !(nud_state & NUD_VALID));
+	__mlxsw_sp_nexthop_neigh_update(nh, !(nud_state & NUD_VALID && !dead));
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* [patch net 3/3] mlxsw: spectrum_router: Correctly remove nexthop groups
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis
In-Reply-To: <1482481970-2327-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

At the end of the nexthop initialization process we determine whether
the nexthop should be offloaded or not based on the NUD state of the
neighbour representing it. After all the nexthops were initialized we
refresh the nexthop group and potentially offload it to the device, in
case some of the nexthops were resolved.

Make the destruction of a nexthop group symmetric with its creation by
marking all nexthops as invalid and then refresh the nexthop group to
make sure it was removed from the device's tables.

Fixes: b2157149b0b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index a0f9742..01d0efa 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -1396,6 +1396,7 @@ static void mlxsw_sp_nexthop_fini(struct mlxsw_sp *mlxsw_sp,
 {
 	struct mlxsw_sp_neigh_entry *neigh_entry = nh->neigh_entry;
 
+	__mlxsw_sp_nexthop_neigh_update(nh, true);
 	list_del(&nh->neigh_list_node);
 
 	/* If that is the last nexthop connected to that neigh, remove from
@@ -1454,6 +1455,8 @@ mlxsw_sp_nexthop_group_destroy(struct mlxsw_sp *mlxsw_sp,
 		nh = &nh_grp->nexthops[i];
 		mlxsw_sp_nexthop_fini(mlxsw_sp, nh);
 	}
+	mlxsw_sp_nexthop_group_refresh(mlxsw_sp, nh_grp);
+	WARN_ON_ONCE(nh_grp->adj_index_valid);
 	kfree(nh_grp);
 }
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next 1/3] net:dsa:mv88e6xxx: use hashtable to store multicast entries
From: Volodymyr Bendiuga @ 2016-12-23  9:30 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vivien Didelot, Florian Fainelli, Volodymyr Bendiuga, netdev,
	Volodymyr Bendiuga, John Crispin
In-Reply-To: <20161216110115.GC20359@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

Hi Andrew,
Here is the program I promised you.
There is a .c and Makefile attached to this mail. Simply type ”make” to build it.
There is a dependency on libnl3, which needs to be installed. If you don’t have it
Just install it: "apt-get install libnl-3-dev libnl-route-3-dev” if you use ubuntu based
Linux.

What the program does is it creates a libnl cache, inserts into it some hardcoded
Multicast entries, and then adds entries to the kernel (to the bridge and to hardware).
By default the program looks for interfaces that have ”eth” in their names. If your interfaces
Have different names, just correct that line in the code. Otherwise it should just run straight away.
When entries are added the timing is presented. By default the program uses vlan id 1. If you need
To override it, just pass "-v 2” or whatever vlan id you wish.
Please try running the program with hash table patch and without. You should see a significant difference.

Let me know if there is something you need help with.

Thanks,
Volodymyr

[-- Attachment #2: Makefile --]
[-- Type: application/octet-stream, Size: 326 bytes --]


EXEC      = fdbtest
OBJS      = fdbtest.o 
SRCS      = $(OBJS:.o=.c)
DEPS      = $(SRCS:.c=.d)
CFLAGS   += `pkg-config --cflags libnl-route-3.0`
DEPLIBS   = `pkg-config --libs libnl-route-3.0`
LDLIBS   += $(DEPLIBS)

all: Makefile $(EXEC)

$(EXEC): $(OBJS)

clean:
	-@$(RM) $(OBJS) $(EXEC)

distclean: clean
	-@$(RM) $(DEPS)

[-- Attachment #3: fdbtest.c --]
[-- Type: application/octet-stream, Size: 6045 bytes --]

/* 
 * FDB test - insert preprogrammed fdb entries
 * into HW database and measure the time it takes
 * to do the job.
 *
 * Author: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com>
 */

#include <net/if.h>
#include <netinet/ether.h>
#include <unistd.h>
#include <time.h>

#define _LINUX_IF_H
#include <linux/if_link.h>
#include <netlink/netlink.h>
#include <netlink/route/link.h>
#include <netlink/route/neighbour.h>
#include <netlink/route/link/bridge.h>

struct cb_data {
	int err;
	int vid;
	struct nl_sock *sk;
	void *arg;
};

static void __add(struct nl_object *obj, void *data)
{
	struct cb_data *cbd = data;
	struct nl_sock *sk;
	struct rtnl_neigh *neigh = (struct rtnl_neigh *)obj;

	sk = cbd->sk;
	if (cbd->err)
		return;

        cbd->err = rtnl_neigh_add(sk, neigh, NLM_F_CREATE);
}

static int __insert(struct nl_cache *cache, int ifindex, unsigned char *mac, int vlan)
{
	struct rtnl_neigh *neigh;
	struct nl_addr *addr;
	int err = 0;

        neigh = rtnl_neigh_alloc();
        if (!neigh) {
                printf("Could not allocate netlink neighbour\n");
                return -NLE_NOMEM;
        }

        addr = nl_addr_alloc(ETH_ALEN);
        if (!addr) {
		printf("Could not allocate netlink address\n");
		err = -NLE_NOMEM;
		goto err_put_neigh;
        }

	nl_addr_set_family(addr, AF_LLC);
	nl_addr_set_prefixlen(addr, 48);
        if (nl_addr_set_binary_addr(addr, mac, ETH_ALEN) < 0) {
		printf("Could not set netlink address\n");
		err = -NLE_FAILURE;
		goto err_put_addr;
	}

	rtnl_neigh_set_family(neigh, PF_BRIDGE);
	rtnl_neigh_set_lladdr(neigh, addr);
	rtnl_neigh_set_flags(neigh, NTF_SELF);
	rtnl_neigh_set_state(neigh, NUD_NOARP);
	rtnl_neigh_set_ifindex(neigh, ifindex);
	rtnl_neigh_set_vlan(neigh, vlan);

	err = nl_cache_add(cache, (struct nl_object*)neigh);
	if (err)
		printf("Could not add object to cache, err -> %d", err);

err_put_addr:
	nl_addr_put(addr);
err_put_neigh:
	rtnl_neigh_put(neigh);

	return err;
}

static int __insert_in_vlan(struct nl_cache *cache, char* ifname, unsigned char *mac, int vid)
{
       int ifindex;
       int err = 0;

       ifindex = if_nametoindex(ifname);
       err = __insert(cache, ifindex, mac, vid);

       return err;
}

static int __insert_default_mac(struct nl_cache *cache, char* ifname, int vid)
{
	int err = 0;
	size_t i;
        unsigned char mac[][ETH_ALEN] = {
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 }, /* 224.0.0.1 */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x02 }, /* 224.0.0.2 */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x04 }, /* 224.0.0.4   - DVMRP */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x05 }, /* 224.0.0.5   - OSPF */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x06 }, /* 224.0.0.6   - OSPF */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x09 }, /* 224.0.0.9   - RIPv2 */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x0a }, /* 224.0.0.10  - IGRP */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x0c }, /* 224.0.0.12  - DHCP Server/Relay */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x0d }, /* 224.0.0.13  - PIM */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x0e }, /* 224.0.0.14  - RSVP */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x12 }, /* 224.0.0.18  - VRRP*/
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x13 }, /* 224.0.0.19 */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x14 }, /* 224.0.0.20 */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x15 }, /* 224.0.0.21 */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x16 }, /* 224.0.0.22  - IGMP */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x18 }, /* 224.0.0.24  - OSPF */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x66 }, /* 224.0.0.102 - HSRP */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x6b }, /* 224.0.0.107 - PTP-pdelay */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0xfb }, /* 224.0.0.251 - mDNS */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0xfc }, /* 224.0.0.252 - Link-local multicast DNS */
                { 0x01, 0x00, 0x5e, 0x00, 0x00, 0xfe }  /* 224.0.0.254 */
        };

	for (i = 0; i < sizeof(mac) / sizeof(mac[0]); i++)
		__insert_in_vlan(cache, ifname, mac[i], vid);

	return err;
}

static void __add_fdb_defaults(struct nl_object *obj, void *data)
{
	struct cb_data *cbd = data;
	struct nl_cache *fdb_cache = cbd->arg;
	struct rtnl_link *link = nl_object_priv(obj);
	char *ifname;

	ifname = rtnl_link_get_name(link);
	if (!strncmp(ifname, "eth", 3)) {
		printf("Using interface -> %s\n", ifname);
		cbd->err = __insert_default_mac(fdb_cache, ifname, cbd->vid);
	}
	else
		printf("Skipping interface -> %s\n", ifname);
}

static int usage(int code)
{
	fprintf(stderr, "\nUsage: fdbtest [-h] [-v vid]\n\n"
		"  -h         Show summary of command line options and exit\n"
		"  -v vid     Set vlan id\n"
		"\n");

	return code;
}

int main(int argc , char *argv[])
{
	struct nl_cache *link_cache, *neigh_cache;
	struct nl_sock *nlsk;
	struct cb_data cbd = {.err = 0, .vid = 1 };
	clock_t start, end;
	double time_used;
	int err = 0;
	int c;

	while ((c = getopt(argc, argv, "hv:")) != EOF) {
		switch (c) {
		case 'h':
			return usage(0);

		case 'v':
			cbd.vid = atoi(optarg);
			break;

		default:
			return usage(1);
		}
	}

	printf("Init\n");
	nlsk = nl_socket_alloc();
	if (!nlsk)
		return -NLE_NOMEM;
	err = nl_connect(nlsk, NETLINK_ROUTE);
	if (err)
		goto free_nlsk;

	err = rtnl_link_alloc_cache(nlsk, AF_UNSPEC, &link_cache);
	if (err)
		goto free_nlsk;

	err = rtnl_neigh_alloc_cache(nlsk, &neigh_cache);
	if (err)
		goto free_lcache;

	printf("Using vlan %d\n\n", cbd.vid);
	cbd.arg = neigh_cache;
	nl_cache_foreach(link_cache, __add_fdb_defaults, &cbd);
	err = cbd.err;

	cbd.sk = nlsk;
	start = clock();
	nl_cache_foreach(neigh_cache, __add, &cbd);
	end = clock();
	err = cbd.err;
	time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
	printf("\nTime spent: %f sec\n", time_used);
	
	nl_cache_free(neigh_cache);
	printf("Exit %d\n", err);

 free_lcache:
	nl_cache_free(link_cache);
 free_nlsk:
	nl_socket_free(nlsk);

	return err;
}

[-- Attachment #4: Type: text/plain, Size: 621 bytes --]


> 16 dec. 2016 kl. 12:01 skrev Andrew Lunn <andrew@lunn.ch>:
> 
> On Fri, Dec 16, 2016 at 11:26:01AM +0100, Volodymyr Bendiuga wrote:
>> Hi Andrew,
>> 
>> I don't have any script right now, the code I have is a part of
>> the OS, but I could write simple C program which represents
>> what we are doing with mc entries and send it to you for profiling.
> 
> It would be nice to have a benchmark test we can use to test out
> ideas.
> 
> Please try to make the code flexible. The slave interface names on my
> boards are probably not the same as on your. Also, the number of ports
> may differ.
> 
>    Thanks
> 	Andrew


^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Daniel Borkmann @ 2016-12-23 10:04 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Andy Lutomirski, Alexei Starovoitov
  Cc: Jason A. Donenfeld, kernel-hardening@lists.openwall.com,
	Theodore Ts'o, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482425969.2673.5.camel@stressinduktion.org>

On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>>>> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>>>> <hannes@stressinduktion.org> wrote:
>>>>> IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>>>>> You don't want to give people new IPv6 addresses with the same stable
>>>>> secret (across reboots) after a kernel upgrade. Maybe they lose
>>>>> connectivity then and it is extra work?
>>>>
>>>> Ahh, too bad. So it goes.
>>>
>>> If no other users survive we can put it into the ipv6 module.
>>>
>>>>> The bpf hash stuff can be changed during this merge window, as it is
>>>>> not yet in a released kernel. Albeit I would probably have preferred
>>>>> something like sha256 here, which can be easily replicated by user
>>>>> space tools (minus the problem of patching out references to not
>>>>> hashable data, which must be zeroed).
>>>>
>>>> Oh, interesting, so time is of the essence then. Do you want to handle
>>>> changing the new eBPF code to something not-SHA1 before it's too late,
>>>> as part of a ne
>>
>> w patchset that can fast track itself to David? And
>>>> then I can preserve my large series for the next merge window.
>>>
>>> This algorithm should be a non-seeded algorithm, because the hashes
>>> should be stable and verifiable by user space tooling. Thus this would
>>> need a hashing algorithm that is hardened against pre-image
>>> attacks/collision resistance, which siphash is not. I would prefer some
>>> higher order SHA algorithm for that actually.
>>>
>>
>> You mean:
>>
>> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Author: Daniel Borkmann <daniel@iogearbox.net>
>> Date:   Sun Dec 4 23:19:41 2016 +0100
>>
>>      bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>
>> Yes, please!  This actually matters for security -- imagine a
>> malicious program brute-forcing a collision so that it gets loaded
>> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
>> (preferably the latter).  Speed basically doesn't matter here and
>> Blake2 is both less stable (didn't they slightly change it recently?)
>> and much less well studied.
>
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.
>
> There have been talks about signing bpf programs, thus this would
> probably need another digest algorithm additionally to that one,
> wasting probably instructions. Probably going somewhere in direction of
> PKCS#7 might be the thing to do (which leads to the problem to make
> PKCS#7 attackable by ordinary unpriv users, hmpf).
>
>> My inclination would have been to seed them with something that isn't
>> exposed to userspace for the precise reason that it would prevent user
>> code from making assumptions about what's in the hash.  But if there's
>> a use case for why user code needs to be able to calculate the hash on
>> its own, then that's fine.  But perhaps the actual fdinfo string
>> should be "sha256:abcd1234..." to give some flexibility down the road.
>>
>> Also:
>>
>> +       result = (__force __be32 *)fp->digest;
>> +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
>> +               result[i] = cpu_to_be32(fp->digest[i]);
>>
>> Everyone, please, please, please don't open-code crypto primitives.
>> Is this and the code above it even correct?  It might be but on a very
>> brief glance it looks wrong to me.  If you're doing this to avoid
>> depending on crypto, then fix crypto so you can pull in the algorithm
>> without pulling in the whole crypto core.
>
> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Still trying to catch up on this admittedly bit confusing thread. I
did run automated tests over couple of days comparing the data I got
from fdinfo with the one from af_alg and found no mismatch on the test
cases varying from min to max possible program sizes. In the process
of testing, as you might have seen on netdev, I found couple of other
bugs in bpf code along the way and fixed them up as well. So my question,
do you or Andy or anyone participating in claiming this have any
concrete data or test cases that suggests something different? If yes,
I'm very curious to hear about it and willing fix it up, of course.
When I'm back from pto I'll prep and cook up my test suite to be
included into the selftests/bpf/, should have done this initially,
sorry about that. I'll also post something to expose the alg, that
sounds fine to me.

> I wondered if bpf program loading should have used the module loading
> infrastructure from the beginning...
>
>> At the very least, there should be a separate function that calculates
>> the hash of a buffer and that function should explicitly run itself
>> against test vectors of various lengths to make sure that it's
>> calculating what it claims to be calculating.  And it doesn't look
>> like the userspace code has landed, so, if this thing isn't
>> calculating SHA1 correctly, it's plausible that no one has noticed.
>
> I hope this was known from the beginning, this is not sha1 unfortunately.
>
> But ebpf elf programs also need preprocessing to get rid of some
> embedded load-depending data, so maybe it was considered to be just
> enough?
>
> Bye,
> Hannes
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Joao Pinto @ 2016-12-23 10:09 UTC (permalink / raw)
  To: Phil Reid, Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <bc49326e-15ff-cbb3-6117-9fb4b96cc281@electromag.com.au>


Hello Phil,

Às 1:09 AM de 12/23/2016, Phil Reid escreveu:
> G'day Joao,
> On 23/12/2016 01:06, Joao Pinto wrote:
>> Às 4:57 PM de 12/22/2016, Phil Reid escreveu:
>>> On 22/12/2016 23:47, Joao Pinto wrote:
>>>>
>>>> Hello Phil,
>>>>
>>>> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>>>>> G'day Joao,
>>>>>
>>>>> On 22/12/2016 20:38, Joao Pinto wrote:
>>>>>> When testing stmmac with my QoS reference design I checked a problem in the
>>>>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>>>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>>>>
>>>>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>>>>> ---
>>>>>> changes v1->v2 (David Miller)
>>>>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>>>>> to make sense
>>>>>>
>>>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>>> index b21d03f..94223c8 100644
>>>>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>>>>> *ioaddr, int mcbins,
>>>>>>      mac->mii.reg_shift = 6;
>>>>>>      mac->mii.reg_mask = 0x000007C0;
>>>>>>      mac->mii.clk_csr_shift = 2;
>>>>>> -    mac->mii.clk_csr_mask = 0xF;
>>>>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>>>>
>>>>> Should this not be GENMASK(5,2)
>>>>
>>>> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>>>>
>>>> Bits: 4:2
>>>> 000 60-100 MHz clk_csr_i/42
>>>> 001 100-150 MHz clk_csr_i/62
>>>> 010 20-35 MHz clk_csr_i/16
>>>> 011 35-60 MHz clk_csr_i/26
>>>> 100 150-250 MHz clk_csr_i/102
>>>> 101 250-300 MHz clk_csr_i/124
>>>> 110, 111 Reserved
>>>>
>>>> So only bits 2, 3 and 4 should be masked.
>>>>
>>>> Thanks.
>>>>
>>> But this is a change in behaviour from what was there isn't.
>>> Previous mask was 4 bits. now it's 3.
>>>
>>> And for example the altera socfgpa implementation in the cyclone V has valid
>>> values
>>> for values 0x8-0xf, using bit 5:2.
>>
>> According to the databook, bit 5 is reserved and RO. When reserved, it is
>> possible to customize. Is that the case? If there is hardware using the 5th bit
>> we can put it GENMASK(5, 2) with no problem.
>>
> I've also checked the Aria 10 documentation and bit 5 is also RW.
> The following options are documented and supported
>     1000: CSR clock/4
>     1001: CSR clock/6
>     1010: CSR clock/8
>     1011: CSR clock/10
>     1100: CSR clock/12
>     1101: CSR clock/14
>     1110: CSR clock/16
>     1111: CSR clock/18
> They do mention that these values will probably be outside the IEEE 802.3
> specified range.
> 
> But there's at least a couple of cores out there where GENMASK(5,2) is valid.
> Can't say if anyone is using that setting thou.

Thanks for checking it! Ok, it seems like they are using the reserved bit 5. No
problem, I am going to change the patch and put the mask from 2 to 5. Thanks for
your help!

Joao

> 
> 

^ permalink raw reply

* [PATCH v3] stmmac: CSR clock configuration fix
From: Joao Pinto @ 2016-12-23 10:15 UTC (permalink / raw)
  To: peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, preid,
	netdev, Joao Pinto

When testing stmmac with my QoS reference design I checked a problem in the
CSR clock configuration that was impossibilitating the phy discovery, since
every read operation returned 0x0000ffff. This patch fixes the issue.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
---
changes v2->v3 (Phil Reid)
- Altera uses the reserved bit 5 also for CR clock, so the mask was changed
changes v1->v2 (David Miller)
- DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
to make sense

 drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index b21d03f..be3c91c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
 	mac->mii.reg_shift = 6;
 	mac->mii.reg_mask = 0x000007C0;
 	mac->mii.clk_csr_shift = 2;
-	mac->mii.clk_csr_mask = 0xF;
+	mac->mii.clk_csr_mask = GENMASK(5, 2);
 
 	/* Get and dump the chip ID */
 	*synopsys_id = stmmac_get_synopsys_id(hwid);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index a1d582f..9dd2987 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id)
 	mac->mii.reg_shift = 6;
 	mac->mii.reg_mask = 0x000007C0;
 	mac->mii.clk_csr_shift = 2;
-	mac->mii.clk_csr_mask = 0xF;
+	mac->mii.clk_csr_mask = GENMASK(5, 2);
 
 	/* Synopsys Id is not available on old chips */
 	*synopsys_id = 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 23322fd..fda01f7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
 	value |= (phyaddr << priv->hw->mii.addr_shift)
 		& priv->hw->mii.addr_mask;
 	value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
-	value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
-		<< priv->hw->mii.clk_csr_shift;
+	value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+		& priv->hw->mii.clk_csr_mask;
 	if (priv->plat->has_gmac4)
 		value |= MII_GMAC4_READ;
 
@@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
 		& priv->hw->mii.addr_mask;
 	value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
 
-	value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
-		<< priv->hw->mii.clk_csr_shift);
+	value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+		& priv->hw->mii.clk_csr_mask;
 	if (priv->plat->has_gmac4)
 		value |= MII_GMAC4_WRITE;
 
-- 
2.9.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox