Netdev List
 help / color / mirror / Atom feed
* [PATCH] tipc: Fix tipc_sk_reinit handling of -EAGAIN
From: Bob Peterson @ 2017-08-23 14:43 UTC (permalink / raw)
  To: herbert, Phil Sutter, davem, netdev; +Cc: LKML
In-Reply-To: <1907606232.1152250.1503498773672.JavaMail.zimbra@redhat.com>

Hi,

In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.

However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:

 * ...Note that we take the RCU lock in all
 * cases including when we return an error.  So you must always call
 * rhashtable_walk_stop to clean up.

This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 101e3597338f..d50edd6e0019 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2255,8 +2255,8 @@ void tipc_sk_reinit(struct net *net)
 
 	do {
 		tsk = ERR_PTR(rhashtable_walk_start(&iter));
-		if (tsk)
-			continue;
+		if (IS_ERR(tsk))
+			goto walk_stop;
 
 		while ((tsk = rhashtable_walk_next(&iter)) && !IS_ERR(tsk)) {
 			spin_lock_bh(&tsk->sk.sk_lock.slock);
@@ -2265,7 +2265,7 @@ void tipc_sk_reinit(struct net *net)
 			msg_set_orignode(msg, tn->own_addr);
 			spin_unlock_bh(&tsk->sk.sk_lock.slock);
 		}
-
+walk_stop:
 		rhashtable_walk_stop(&iter);
 	} while (tsk == ERR_PTR(-EAGAIN));
 }

^ permalink raw reply related

* Re: [PATCH v2 net-next 1/5] selftests/bpf: add a test for a bug in liveness-based pruning
From: Daniel Borkmann via iovisor-dev @ 2017-08-23 14:42 UTC (permalink / raw)
  To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Alexei Starovoitov
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <83dbbb0c-3cbc-e033-0ceb-f31db6eb57c2-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On 08/23/2017 04:09 PM, Edward Cree wrote:
> Writes in straight-line code should not prevent reads from propagating
>   along jumps.  With current verifier code, the jump from 3 to 5 does not
>   add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
>   the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.
>
> Verifier output:
> 0: (61) r2 = *(u32 *)(r1 +0)
> 1: (35) if r2 >= 0x0 goto pc+1
>   R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
> 2: (b7) r0 = 0
> 3: (35) if r2 >= 0x0 goto pc+1
>   R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
> 4: (b7) r0 = 0
> 5: (95) exit
>
> from 3 to 5: safe
>
> from 1 to 3: safe
> processed 8 insns, stack depth 0
>
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

Acked-by: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>

^ permalink raw reply

* Re: [PATCH v2 net-next 2/5] bpf/verifier: when pruning a branch, ignore its write marks
From: Alexei Starovoitov via iovisor-dev @ 2017-08-23 14:42 UTC (permalink / raw)
  To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <ec2ee579-8391-c023-348a-770faa6bc5c7-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On 8/23/17 7:10 AM, Edward Cree wrote:
> The fact that writes occurred in reaching the continuation state does
>  not screen off its reads from us, because we're not really its parent.
> So detect 'not really the parent' in do_propagate_liveness, and ignore
>  write marks in that case.
>
> Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH v2 net-next 4/5] bpf/verifier: remove varlen_map_value_access flag
From: Alexei Starovoitov via iovisor-dev @ 2017-08-23 14:41 UTC (permalink / raw)
  To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <34d97e19-2de8-d336-ba13-77d3b02c5f20-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On 8/23/17 7:10 AM, Edward Cree wrote:
> The optimisation it does is broken when the 'new' register value has a
>  variable offset and the 'old' was constant.  I broke it with my pointer
>  types unification (see Fixes tag below), before which the 'new' value
>  would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
>  other changes in that patch mean that its original behaviour (ignore
>  min/max values) cannot be restored.
> Tests on a sample set of cilium programs show no change in count of
>  processed instructions.
>
> Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH v2 net-next 5/5] bpf/verifier: document liveness analysis
From: Alexei Starovoitov via iovisor-dev @ 2017-08-23 14:41 UTC (permalink / raw)
  To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <3d4870a2-d7ee-2ac0-aaed-a9faeae89b9e-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On 8/23/17 7:11 AM, Edward Cree wrote:
> The liveness tracking algorithm is quite subtle; add comments to explain it.
>
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

^ permalink raw reply

* [PATCH v2] at76c50x-usb: check memory allocation failure
From: Himanshu Jha @ 2017-08-23 14:29 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, Himanshu Jha

Check memory allocation failure and return -ENOMEM rather than just
retuning void.

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
---
 drivers/net/wireless/atmel/at76c50x-usb.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/atmel/at76c50x-usb.c b/drivers/net/wireless/atmel/at76c50x-usb.c
index 09defbc..ceb453b 100644
--- a/drivers/net/wireless/atmel/at76c50x-usb.c
+++ b/drivers/net/wireless/atmel/at76c50x-usb.c
@@ -932,7 +932,7 @@ static int at76_set_autorate_fallback(struct at76_priv *priv, int onoff)
 	return ret;
 }
 
-static void at76_dump_mib_mac_addr(struct at76_priv *priv)
+static int at76_dump_mib_mac_addr(struct at76_priv *priv)
 {
 	int i;
 	int ret;
@@ -940,7 +940,7 @@ static void at76_dump_mib_mac_addr(struct at76_priv *priv)
 					 GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_MAC_ADDR, m,
 			   sizeof(struct mib_mac_addr));
@@ -961,7 +961,7 @@ static void at76_dump_mib_mac_addr(struct at76_priv *priv)
 	kfree(m);
 }
 
-static void at76_dump_mib_mac_wep(struct at76_priv *priv)
+static int at76_dump_mib_mac_wep(struct at76_priv *priv)
 {
 	int i;
 	int ret;
@@ -969,7 +969,7 @@ static void at76_dump_mib_mac_wep(struct at76_priv *priv)
 	struct mib_mac_wep *m = kmalloc(sizeof(struct mib_mac_wep), GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_MAC_WEP, m,
 			   sizeof(struct mib_mac_wep));
@@ -999,14 +999,14 @@ static void at76_dump_mib_mac_wep(struct at76_priv *priv)
 	kfree(m);
 }
 
-static void at76_dump_mib_mac_mgmt(struct at76_priv *priv)
+static int at76_dump_mib_mac_mgmt(struct at76_priv *priv)
 {
 	int ret;
 	struct mib_mac_mgmt *m = kmalloc(sizeof(struct mib_mac_mgmt),
 					 GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_MAC_MGMT, m,
 			   sizeof(struct mib_mac_mgmt));
@@ -1037,13 +1037,13 @@ static void at76_dump_mib_mac_mgmt(struct at76_priv *priv)
 	kfree(m);
 }
 
-static void at76_dump_mib_mac(struct at76_priv *priv)
+static int at76_dump_mib_mac(struct at76_priv *priv)
 {
 	int ret;
 	struct mib_mac *m = kmalloc(sizeof(struct mib_mac), GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_MAC, m, sizeof(struct mib_mac));
 	if (ret < 0) {
@@ -1074,13 +1074,13 @@ static void at76_dump_mib_mac(struct at76_priv *priv)
 	kfree(m);
 }
 
-static void at76_dump_mib_phy(struct at76_priv *priv)
+static int at76_dump_mib_phy(struct at76_priv *priv)
 {
 	int ret;
 	struct mib_phy *m = kmalloc(sizeof(struct mib_phy), GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_PHY, m, sizeof(struct mib_phy));
 	if (ret < 0) {
@@ -1107,13 +1107,13 @@ static void at76_dump_mib_phy(struct at76_priv *priv)
 	kfree(m);
 }
 
-static void at76_dump_mib_local(struct at76_priv *priv)
+static int at76_dump_mib_local(struct at76_priv *priv)
 {
 	int ret;
 	struct mib_local *m = kmalloc(sizeof(*m), GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_LOCAL, m, sizeof(*m));
 	if (ret < 0) {
@@ -1132,13 +1132,13 @@ static void at76_dump_mib_local(struct at76_priv *priv)
 	kfree(m);
 }
 
-static void at76_dump_mib_mdomain(struct at76_priv *priv)
+static int at76_dump_mib_mdomain(struct at76_priv *priv)
 {
 	int ret;
 	struct mib_mdomain *m = kmalloc(sizeof(struct mib_mdomain), GFP_KERNEL);
 
 	if (!m)
-		return;
+		return -ENOMEM;
 
 	ret = at76_get_mib(priv->udev, MIB_MDOMAIN, m,
 			   sizeof(struct mib_mdomain));
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-23 14:28 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Willem de Bruijn, virtualization, Network Development
In-Reply-To: <20170822204015-mutt-send-email-mst@kernel.org>

On Tue, 2017-08-22 at 20:55 +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 22, 2017 at 10:50:41AM +0800, Jason Wang wrote:
> > > Perhaps the descriptor pool should also be
> > > revised to allow out of order completions. Then there is no need to
> > > copy zerocopy packets whenever they may experience delay.
> > 
> > Yes, but as replied in the referenced thread, windows driver may treat out
> > of order completion as a bug.
> 
> That would be a windows driver bug then, but I don't think it makes this
> assumption. What the referenced thread
> (https://patchwork.kernel.org/patch/3787671/) is saying is that host
> must use any buffers made available on a tx vq within a reasonable
> timeframe otherwise windows guests panic.
> 
> Ideally we would detect that a packet is actually experiencing delay and
> trigger the copy at that point e.g. by calling skb_linearize. But it
> isn't easy to track these packets though and even harder to do a data
> copy without races.
> 
> Which reminds me that skb_linearize in net core seems to be
> fundamentally racy - I suspect that if skb is cloned, and someone is
> trying to use the shared frags while another thread calls skb_linearize,
> we get some use after free bugs which likely mostly go undetected
> because the corrupted packets mostly go on wire and get dropped
> by checksum code.
> 
Please let me make sure if I understand it correctly:
* always do copy with skb_orphan_frags_rx as Willem mentioned in the earlier
post, before the xmit_skb as opposed to my original patch, is safe but too
costly so cannot be adopted.
* as a generic solution, if we were to somehow overcome the safety issue, track
the delay and do copy if some threshold is reached could be an answer, but it's
hard for now.
* so things like the current vhost-net implementation of deciding whether or not
to do zerocopy beforehand referring the zerocopy tx error ratio is a point of
practical compromise.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-23 14:26 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Michael S. Tsirkin, virtualization, Network Development
In-Reply-To: <CAF=yD-JyLEUgUgeH5TjRUj2wz9CjRkAx1HUyXRxCDyT5iHx-sg@mail.gmail.com>

On Tue, 2017-08-22 at 13:19 -0400, Willem de Bruijn wrote:
> > > > >         /* Don't wait up for transmitted skbs to be freed. */
> > > > >         if (!use_napi) {
> > > > > +               if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
> > > > > +                       struct ubuf_info *uarg;
> > > > > +                       uarg = skb_shinfo(skb)->destructor_arg;
> > > > > +                       if (uarg->callback)
> > > > > +                           uarg->callback(uarg, true);
> > > > > +                       skb_shinfo(skb)->destructor_arg = NULL;
> > > > > +                       skb_shinfo(skb)->tx_flags &=
> > > > > ~SKBTX_DEV_ZEROCOPY;
> > > > > +               }
> > > > 
> > > > Instead of open coding, this can use skb_zcopy_clear.
> > > 
> > > It is not correct to only send the zerocopy completion here, as
> > > the skb will still be shared with the nic. The only safe approach
> > > to notifying early is to create a copy with skb_orphan_frags_rx.
> > > That will call skb_zcopy_clear(skb, false) internally. But the
> > > copy will be very expensive.
> > 
> > I noticed this email after my last post. I cannot not imagine how it could
> > be
> > unsafe in virtio case. Sorry could you explain a bit more?
> 
> A guest process sends a packet with MSG_ZEROCOPY to the
> virtio-net device. As soon as the process receives the completion
> notification, it is allowed to reuse the memory backing the packet.
> 
> A call to skb_zcopy_clear in virtio-net start_xmit will notify the
> process that it is allowed to reuse the memory. But the user pages
> are still linked into the skb frags and are about to be shared with
> the host os.
> 
> > > Is the deadlock you refer to the case where tx interrupts are
> > > disabled, so transmit completions are only handled in start_xmit
> > > and somehow the socket(s) are unable to send new data? The
> > > question is what is blocking them. If it is zerocopy, it may be a
> > > too low optmem_max or hitting the per-user locked pages limit.
> > > We may have to keep interrupts enabled when zerocopy skbs
> > > are in flight.
> > 
> > Even if we keep interrupts enabled, We miss the completion without
> > start_xmit.
> > And it's also likely that the next start_xmit depends on the completion
> > itself
> > as I described in my last post.
> > 
Thanks for the explanation, I misinterpreted the "nic" part, now it's clear.
Sorry about bothering.

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-23 14:24 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Jason Wang, Michael S. Tsirkin, virtualization,
	Network Development
In-Reply-To: <CAF=yD-+AOnYafiwAf+v+fhyg_0fi-6LdQSnPYcoAcngJqYs9dg@mail.gmail.com>

On Tue, 2017-08-22 at 13:16 -0400, Willem de Bruijn wrote:
> > > > An issue of the referenced patch is that sndbuf could be smaller than
> > > > low
> > > > watermark.
> > 
> > We cannot determine the low watermark properly because of not only sndbuf
> > size
> > issue but also the fact that the upper vhost-net cannot directly see how
> > much
> > descriptor is currently available at the virtio-net tx queue. It depends on
> > multiqueue settings or other senders which are also using the same tx queue.
> > Note that in the latter case if they constantly transmitting, the deadlock
> > could
> > not occur(*), however if it has just temporarily fulfill some portion of the
> > pool in the mean time, then the low watermark cannot be helpful.
> > (*: That is because it's reliable enough in the sense I mention below.)
> > 
> > Keep in this in mind, let me briefly describe the possible deadlock I
> > mentioned:
> > (1). vhost-net on L1 guest has nothing to do sendmsg until the upper layer
> > sets
> > new descriptors, which depends only on the vhost-net zcopy callback and
> > adding
> > newly used descriptors.
> > (2). vhost-net callback depends on the skb freeing on the xmit path only.
> > (3). the xmit path depends (possibly only) on the vhost-net sendmsg.
> > As you see, it's enough to bring about the situation above that L1 virtio-
> > net
> > reaches its limit earlier than the L0 host processing. The vhost-net pool
> > could
> > be almost full or empty, whatever.
> 
> Thanks for the context. This issue is very similar to the one that used to
> exist when running out of transmit descriptors, before the removal of
> the timer and introduction of skb_orphan in start_xmit.
> 
> To make sure that I understand correctly, let me paraphrase:
> 
> A. guest socket cannot send because it exhausted its sk budget (sndbuf, tsq,
> ..)
> 
> B. budget is not freed up until guest receives tx completion for this flow
> 
> C. tx completion is held back on the host side in vhost_zerocopy_signal_used
>    behind the completion for an unrelated skb
> 
> D. unrelated packet is delayed somewhere in the host stackf zerocopy
> completions.
>    e.g., netem
> 
> The issue that is specific to vhost-net zerocopy is that (C) enforces strict
> ordering of transmit completions causing head of line blocking behind
> vhost-net zerocopy callbacks.
> 
> This is a different problem from
> 
> C1. tx completion is delayed until guest sends another packet and
>        triggers free_old_xmit_skb
> 
> Both in host and guest, zerocopy packets should never be able to loop
> to a receive path where they can cause unbounded delay.
> 
> The obvious cases of latency are queueing, like netem. That leads
> to poor performance for unrelated flows, but I don't see how this
> could cause deadlock.

Thanks for the wrap-up. I see all the points now and also that C1 should not
cause deadlock.

^ permalink raw reply

* [PATCH v2 net-next 5/5] bpf/verifier: document liveness analysis
From: Edward Cree @ 2017-08-23 14:11 UTC (permalink / raw)
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev
In-Reply-To: <f9208b68-466e-c6cb-4d0c-c567e40bdc15@solarflare.com>

The liveness tracking algorithm is quite subtle; add comments to explain it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 include/linux/bpf_verifier.h | 13 +++++++++++++
 kernel/bpf/verifier.c        | 28 +++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d8f131a..b8d200f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -21,6 +21,19 @@
  */
 #define BPF_MAX_VAR_SIZ	INT_MAX
 
+/* Liveness marks, used for registers and spilled-regs (in stack slots).
+ * Read marks propagate upwards until they find a write mark; they record that
+ * "one of this state's descendants read this reg" (and therefore the reg is
+ * relevant for states_equal() checks).
+ * Write marks collect downwards and do not propagate; they record that "the
+ * straight-line code that reached this state (from its parent) wrote this reg"
+ * (and therefore that reads propagated from this state or its descendants
+ * should not propagate to its parent).
+ * A state with a write mark can receive read marks; it just won't propagate
+ * them to its parent, since the write mark is a property, not of the state,
+ * but of the link between it and its parent.  See mark_reg_read() and
+ * mark_stack_slot_read() in kernel/bpf/verifier.c.
+ */
 enum bpf_reg_liveness {
 	REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
 	REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 711bdbd..d690c7d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3417,6 +3417,12 @@ static bool states_equal(struct bpf_verifier_env *env,
 	return ret;
 }
 
+/* A write screens off any subsequent reads; but write marks come from the
+ * straight-line code between a state and its parent.  When we arrive at a
+ * jump target (in the first iteration of the propagate_liveness() loop),
+ * we didn't arrive by the straight-line code, so read marks in state must
+ * propagate to parent regardless of state's write marks.
+ */
 static bool do_propagate_liveness(const struct bpf_verifier_state *state,
 				  struct bpf_verifier_state *parent)
 {
@@ -3457,6 +3463,15 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
 	return touched;
 }
 
+/* "parent" is "a state from which we reach the current state", but initially
+ * it is not the state->parent (i.e. "the state whose straight-line code leads
+ * to the current state"), instead it is the state that happened to arrive at
+ * a (prunable) equivalent of the current state.  See comment above
+ * do_propagate_liveness() for consequences of this.
+ * This function is just a more efficient way of calling mark_reg_read() or
+ * mark_stack_slot_read() on each reg in "parent" that is read in "state",
+ * though it requires that parent != state->parent in the call arguments.
+ */
 static void propagate_liveness(const struct bpf_verifier_state *state,
 			       struct bpf_verifier_state *parent)
 {
@@ -3485,6 +3500,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 			/* reached equivalent register/stack state,
 			 * prune the search.
 			 * Registers read by the continuation are read by us.
+			 * If we have any write marks in env->cur_state, they
+			 * will prevent corresponding reads in the continuation
+			 * from reaching our parent (an explored_state).  Our
+			 * own state will get the read marks recorded, but
+			 * they'll be immediately forgotten as we're pruning
+			 * this state and will pop a new one.
 			 */
 			propagate_liveness(&sl->state, &env->cur_state);
 			return 1;
@@ -3508,7 +3529,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
 	env->explored_states[insn_idx] = new_sl;
 	/* connect new state to parentage chain */
 	env->cur_state.parent = &new_sl->state;
-	/* clear liveness marks in current state */
+	/* clear write marks in current state: the writes we did are not writes
+	 * our child did, so they don't screen off its reads from us.
+	 * (There are no read marks in current state, because reads always mark
+	 * their parent and current state never has children yet.  Only
+	 * explored_states can get read marks.)
+	 */
 	for (i = 0; i < BPF_REG_FP; i++)
 		env->cur_state.regs[i].live = REG_LIVE_NONE;
 	for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)

^ permalink raw reply related

* [PATCH v2 net-next 4/5] bpf/verifier: remove varlen_map_value_access flag
From: Edward Cree via iovisor-dev @ 2017-08-23 14:10 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Alexei Starovoitov, Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <f9208b68-466e-c6cb-4d0c-c567e40bdc15-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

The optimisation it does is broken when the 'new' register value has a
 variable offset and the 'old' was constant.  I broke it with my pointer
 types unification (see Fixes tag below), before which the 'new' value
 would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
 other changes in that patch mean that its original behaviour (ignore
 min/max values) cannot be restored.
Tests on a sample set of cilium programs show no change in count of
 processed instructions.

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
 include/linux/bpf_verifier.h |  1 -
 kernel/bpf/verifier.c        | 41 ++++++++++++-----------------------------
 2 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 91d07ef..d8f131a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -125,7 +125,6 @@ struct bpf_verifier_env {
 	u32 id_gen;			/* used to generate unique reg IDs */
 	bool allow_ptr_leaks;
 	bool seen_direct_write;
-	bool varlen_map_value_access;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fdbaa60..711bdbd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -832,11 +832,6 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 	 */
 	if (log_level)
 		print_verifier_state(state);
-	/* If the offset is variable, we will need to be stricter in state
-	 * pruning from now on.
-	 */
-	if (!tnum_is_const(reg->var_off))
-		env->varlen_map_value_access = true;
 	/* The minimum value is only important with signed
 	 * comparisons where we can't assume the floor of a
 	 * value is 0.  If we are using signed variables for our
@@ -3247,9 +3242,8 @@ static bool check_ids(u32 old_id, u32 cur_id, struct idpair *idmap)
 }
 
 /* Returns true if (rold safe implies rcur safe) */
-static bool regsafe(struct bpf_reg_state *rold,
-		    struct bpf_reg_state *rcur,
-		    bool varlen_map_access, struct idpair *idmap)
+static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
+		    struct idpair *idmap)
 {
 	if (!(rold->live & REG_LIVE_READ))
 		/* explored state didn't use this */
@@ -3281,22 +3275,14 @@ static bool regsafe(struct bpf_reg_state *rold,
 			       tnum_is_unknown(rold->var_off);
 		}
 	case PTR_TO_MAP_VALUE:
-		if (varlen_map_access) {
-			/* If the new min/max/var_off satisfy the old ones and
-			 * everything else matches, we are OK.
-			 * We don't care about the 'id' value, because nothing
-			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
-			 */
-			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
-			       range_within(rold, rcur) &&
-			       tnum_in(rold->var_off, rcur->var_off);
-		} else {
-			/* If the ranges/var_off were not the same, but
-			 * everything else was and we didn't do a variable
-			 * access into a map then we are a-ok.
-			 */
-			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
-		}
+		/* If the new min/max/var_off satisfy the old ones and
+		 * everything else matches, we are OK.
+		 * We don't care about the 'id' value, because nothing
+		 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
+		 */
+		return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
+		       range_within(rold, rcur) &&
+		       tnum_in(rold->var_off, rcur->var_off);
 	case PTR_TO_MAP_VALUE_OR_NULL:
 		/* a PTR_TO_MAP_VALUE could be safe to use as a
 		 * PTR_TO_MAP_VALUE_OR_NULL into the same map.
@@ -3380,7 +3366,6 @@ static bool states_equal(struct bpf_verifier_env *env,
 			 struct bpf_verifier_state *old,
 			 struct bpf_verifier_state *cur)
 {
-	bool varlen_map_access = env->varlen_map_value_access;
 	struct idpair *idmap;
 	bool ret = false;
 	int i;
@@ -3391,8 +3376,7 @@ static bool states_equal(struct bpf_verifier_env *env,
 		return false;
 
 	for (i = 0; i < MAX_BPF_REG; i++) {
-		if (!regsafe(&old->regs[i], &cur->regs[i], varlen_map_access,
-			     idmap))
+		if (!regsafe(&old->regs[i], &cur->regs[i], idmap))
 			goto out_free;
 	}
 
@@ -3412,7 +3396,7 @@ static bool states_equal(struct bpf_verifier_env *env,
 			continue;
 		if (!regsafe(&old->spilled_regs[i / BPF_REG_SIZE],
 			     &cur->spilled_regs[i / BPF_REG_SIZE],
-			     varlen_map_access, idmap))
+			     idmap))
 			/* when explored and current stack slot are both storing
 			 * spilled registers, check that stored pointers types
 			 * are the same as well.
@@ -3555,7 +3539,6 @@ static int do_check(struct bpf_verifier_env *env)
 	init_reg_state(regs);
 	state->parent = NULL;
 	insn_idx = 0;
-	env->varlen_map_value_access = false;
 	for (;;) {
 		struct bpf_insn *insn;
 		u8 class;

^ permalink raw reply related

* [PATCH v2 net-next 3/5] selftests/bpf: add a test for a pruning bug in the verifier
From: Edward Cree via iovisor-dev @ 2017-08-23 14:10 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Alexei Starovoitov, Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <f9208b68-466e-c6cb-4d0c-c567e40bdc15-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

From: Alexei Starovoitov <ast-b10kYP2dOMg@public.gmane.org>

The test makes a read through a map value pointer, then considers pruning
 a branch where the register holds an adjusted map value pointer.  It
 should not prune, but currently it does.

Signed-off-by: Alexei Starovoitov <ast-b10kYP2dOMg@public.gmane.org>
[ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org: added test-name and patch description]
Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
 tools/testing/selftests/bpf/test_verifier.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index c912734..353d170 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6503,6 +6503,34 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_LWT_IN,
 	},
+	{
+		"varlen_map_value_access pruning",
+		.insns = {
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+			BPF_MOV32_IMM(BPF_REG_2, MAX_ENTRIES),
+			BPF_JMP_REG(BPF_JSGT, BPF_REG_2, BPF_REG_1, 1),
+			BPF_MOV32_IMM(BPF_REG_1, 0),
+			BPF_ALU32_IMM(BPF_LSH, BPF_REG_1, 2),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+				   offsetof(struct test_val, foo)),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr_unpriv = "R0 leaks addr",
+		.errstr = "R0 unbounded memory access",
+		.result_unpriv = REJECT,
+		.result = REJECT,
+		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)

^ permalink raw reply related

* [PATCH v2 net-next 2/5] bpf/verifier: when pruning a branch, ignore its write marks
From: Edward Cree via iovisor-dev @ 2017-08-23 14:10 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Alexei Starovoitov, Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <f9208b68-466e-c6cb-4d0c-c567e40bdc15-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

The fact that writes occurred in reaching the continuation state does
 not screen off its reads from us, because we're not really its parent.
So detect 'not really the parent' in do_propagate_liveness, and ignore
 write marks in that case.

Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
 kernel/bpf/verifier.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e42c096..fdbaa60 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3436,6 +3436,7 @@ static bool states_equal(struct bpf_verifier_env *env,
 static bool do_propagate_liveness(const struct bpf_verifier_state *state,
 				  struct bpf_verifier_state *parent)
 {
+	bool writes = parent == state->parent; /* Observe write marks */
 	bool touched = false; /* any changes made? */
 	int i;
 
@@ -3447,7 +3448,9 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
 	for (i = 0; i < BPF_REG_FP; i++) {
 		if (parent->regs[i].live & REG_LIVE_READ)
 			continue;
-		if (state->regs[i].live == REG_LIVE_READ) {
+		if (writes && (state->regs[i].live & REG_LIVE_WRITTEN))
+			continue;
+		if (state->regs[i].live & REG_LIVE_READ) {
 			parent->regs[i].live |= REG_LIVE_READ;
 			touched = true;
 		}
@@ -3460,7 +3463,9 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
 			continue;
 		if (parent->spilled_regs[i].live & REG_LIVE_READ)
 			continue;
-		if (state->spilled_regs[i].live == REG_LIVE_READ) {
+		if (writes && (state->spilled_regs[i].live & REG_LIVE_WRITTEN))
+			continue;
+		if (state->spilled_regs[i].live & REG_LIVE_READ) {
 			parent->spilled_regs[i].live |= REG_LIVE_READ;
 			touched = true;
 		}

^ permalink raw reply related

* [PATCH v2 net-next 1/5] selftests/bpf: add a test for a bug in liveness-based pruning
From: Edward Cree @ 2017-08-23 14:09 UTC (permalink / raw)
  To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, iovisor-dev
In-Reply-To: <f9208b68-466e-c6cb-4d0c-c567e40bdc15@solarflare.com>

Writes in straight-line code should not prevent reads from propagating
 along jumps.  With current verifier code, the jump from 3 to 5 does not
 add a read mark on 3:R0 (because 5:R0 has a write mark), meaning that
 the jump from 1 to 3 gets pruned as safe even though R0 is NOT_INIT.

Verifier output:
0: (61) r2 = *(u32 *)(r1 +0)
1: (35) if r2 >= 0x0 goto pc+1
 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
2: (b7) r0 = 0
3: (35) if r2 >= 0x0 goto pc+1
 R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
4: (b7) r0 = 0
5: (95) exit

from 3 to 5: safe

from 1 to 3: safe
processed 8 insns, stack depth 0

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 tools/testing/selftests/bpf/test_verifier.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index c03542c..c912734 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6487,6 +6487,22 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_LWT_IN,
 	},
+	{
+		"liveness pruning and write screening",
+		.insns = {
+			/* Get an unknown value */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+			/* branch conditions teach us nothing about R2 */
+			BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_JMP_IMM(BPF_JGE, BPF_REG_2, 0, 1),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R0 !read_ok",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_LWT_IN,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)

^ permalink raw reply related

* [PATCH net 7/7] net:ethernet:aquantia: Show info message if bad firmware version detected.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous, Pavel Belous
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Pavel Belous <pavel.belous@aquantia.com>

We should inform user about wrong firmware version
by printing message in dmesg.

Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index eb3ae51..ad4cc53 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -141,6 +141,12 @@ static int hw_atl_utils_init_ucp(struct aq_hw_s *self,
 
 	err = hw_atl_utils_ver_match(aq_hw_caps->fw_ver_expected,
 				     aq_hw_read_reg(self, 0x18U));
+
+	if (err < 0)
+		pr_err("%s: Bad FW version detected: expected=%x, actual=%x\n",
+		       AQ_CFG_DRV_NAME,
+		       aq_hw_caps->fw_ver_expected,
+		       aq_hw_read_reg(self, 0x18U));
 	return err;
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 net-next 0/5] bpf: verifier fixes
From: Edward Cree via iovisor-dev @ 2017-08-23 14:07 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
	Alexei Starovoitov, Daniel Borkmann
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev

Fix a couple of bugs introduced in my recent verifier patches.
Patch #2 does slightly increase the insn count on bpf_lxc.o, but only by
 about a hundred insns (i.e. 0.2%).

v2: added test for write-marks bug (patch #1); reworded comment on
 propagate_liveness() for clarity.

Alexei Starovoitov (1):
  selftests/bpf: add a test for a pruning bug in the verifier

Edward Cree (4):
  selftests/bpf: add a test for a bug in liveness-based pruning
  bpf/verifier: when pruning a branch, ignore its write marks
  bpf/verifier: remove varlen_map_value_access flag
  bpf/verifier: document liveness analysis

 include/linux/bpf_verifier.h                | 14 +++++-
 kernel/bpf/verifier.c                       | 78 +++++++++++++++++------------
 tools/testing/selftests/bpf/test_verifier.c | 44 ++++++++++++++++
 3 files changed, 103 insertions(+), 33 deletions(-)

^ permalink raw reply

* [PATCH net 6/7] net:ethernet:aquantia: Fix for multicast filter handling.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Igor Russkikh <Igor.Russkikh.aquantia.com>

Since the HW supports up to 32 multicast filters we should
track count of multicast filters to avoid overflow.
If we attempt to add >32 multicast filter - just set NETIF_ALLMULTI flag
instead.

Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Igor Russkikh <Igor.Russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index dce17a5..6ac9e26 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -669,11 +669,26 @@ int aq_nic_set_multicast_list(struct aq_nic_s *self, struct net_device *ndev)
 	netdev_for_each_mc_addr(ha, ndev) {
 		ether_addr_copy(self->mc_list.ar[i++], ha->addr);
 		++self->mc_list.count;
+
+		if (i >= AQ_CFG_MULTICAST_ADDRESS_MAX)
+			break;
 	}
 
-	return self->aq_hw_ops.hw_multicast_list_set(self->aq_hw,
+	if (i >= AQ_CFG_MULTICAST_ADDRESS_MAX) {
+		/* Number of filters is too big: atlantic does not support this.
+		 * Force all multi filter to support this.
+		 * With this we disable all UC filters and setup "all pass"
+		 * multicast mask
+		 */
+		self->packet_filter |= IFF_ALLMULTI;
+		self->aq_hw->aq_nic_cfg->mc_list_count = 0;
+		return self->aq_hw_ops.hw_packet_filter_set(self->aq_hw,
+							self->packet_filter);
+	} else {
+		return self->aq_hw_ops.hw_multicast_list_set(self->aq_hw,
 						    self->mc_list.ar,
 						    self->mc_list.count);
+	}
 }
 
 int aq_nic_set_mtu(struct aq_nic_s *self, int new_mtu)
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 5/7] net:ethernet:aquantia: Fix for incorrect speed index.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous, Pavel Belous
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Pavel Belous <pavel.belous@aquantia.com>

The driver choose the optimal interrupt throttling settings depends
of current link speed.
Due this bug link_status field from aq_hw is never updated and as result
always used same interrupt throttling values.

Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h     |  3 +--
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c    | 31 ++++++++++------------
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c        |  4 +--
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h        |  3 +--
 4 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
index fce0fd3..bf9b3f0 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
@@ -105,8 +105,7 @@ struct aq_hw_ops {
 
 	int (*hw_set_mac_address)(struct aq_hw_s *self, u8 *mac_addr);
 
-	int (*hw_get_link_status)(struct aq_hw_s *self,
-				  struct aq_hw_link_status_s *link_status);
+	int (*hw_get_link_status)(struct aq_hw_s *self);
 
 	int (*hw_set_link_speed)(struct aq_hw_s *self, u32 speed);
 
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index d6d8e70..dce17a5 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -125,33 +125,30 @@ static void aq_nic_service_timer_cb(unsigned long param)
 	struct net_device *ndev = aq_nic_get_ndev(self);
 	int err = 0;
 	unsigned int i = 0U;
-	struct aq_hw_link_status_s link_status;
 	struct aq_ring_stats_rx_s stats_rx;
 	struct aq_ring_stats_tx_s stats_tx;
 
 	if (aq_utils_obj_test(&self->header.flags, AQ_NIC_FLAGS_IS_NOT_READY))
 		goto err_exit;
 
-	err = self->aq_hw_ops.hw_get_link_status(self->aq_hw, &link_status);
+	err = self->aq_hw_ops.hw_get_link_status(self->aq_hw);
 	if (err < 0)
 		goto err_exit;
 
-	self->aq_hw_ops.hw_interrupt_moderation_set(self->aq_hw,
-			    self->aq_nic_cfg.is_interrupt_moderation);
-
-	if (memcmp(&link_status, &self->link_status, sizeof(link_status))) {
-		if (link_status.mbps) {
-			aq_utils_obj_set(&self->header.flags,
-					 AQ_NIC_FLAG_STARTED);
-			aq_utils_obj_clear(&self->header.flags,
-					   AQ_NIC_LINK_DOWN);
-			netif_carrier_on(self->ndev);
-		} else {
-			netif_carrier_off(self->ndev);
-			aq_utils_obj_set(&self->header.flags, AQ_NIC_LINK_DOWN);
-		}
+	self->link_status = self->aq_hw->aq_link_status;
 
-		self->link_status = link_status;
+	self->aq_hw_ops.hw_interrupt_moderation_set(self->aq_hw,
+		    self->aq_nic_cfg.is_interrupt_moderation);
+
+	if (self->link_status.mbps) {
+		aq_utils_obj_set(&self->header.flags,
+				 AQ_NIC_FLAG_STARTED);
+		aq_utils_obj_clear(&self->header.flags,
+				   AQ_NIC_LINK_DOWN);
+		netif_carrier_on(self->ndev);
+	} else {
+		netif_carrier_off(self->ndev);
+		aq_utils_obj_set(&self->header.flags, AQ_NIC_LINK_DOWN);
 	}
 
 	memset(&stats_rx, 0U, sizeof(struct aq_ring_stats_rx_s));
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index fcfbe42..eb3ae51 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -314,11 +314,11 @@ void hw_atl_utils_mpi_set(struct aq_hw_s *self,
 err_exit:;
 }
 
-int hw_atl_utils_mpi_get_link_status(struct aq_hw_s *self,
-				     struct aq_hw_link_status_s *link_status)
+int hw_atl_utils_mpi_get_link_status(struct aq_hw_s *self)
 {
 	u32 cp0x036C = aq_hw_read_reg(self, HW_ATL_MPI_STATE_ADR);
 	u32 link_speed_mask = cp0x036C >> HW_ATL_MPI_SPEED_SHIFT;
+	struct aq_hw_link_status_s *link_status = &self->aq_link_status;
 
 	if (!link_speed_mask) {
 		link_status->mbps = 0U;
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
index fc69408a..4f3e148 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
@@ -193,8 +193,7 @@ void hw_atl_utils_mpi_set(struct aq_hw_s *self,
 int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed,
 			       enum hal_atl_utils_fw_state_e state);
 
-int hw_atl_utils_mpi_get_link_status(struct aq_hw_s *self,
-				     struct aq_hw_link_status_s *link_status);
+int hw_atl_utils_mpi_get_link_status(struct aq_hw_s *self);
 
 int hw_atl_utils_get_mac_permanent(struct aq_hw_s *self,
 				   struct aq_hw_caps_s *aq_hw_caps,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 4/7] net:ethernet:aquantia: Fix for MCP state change.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous, Pavel Belous
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Pavel Belous <pavel.belous@aquantia.com>

The firmware state is controlled by writing value in to 0x368 register.
This value contain MCP state and desired link mode.
Because this value was incorrectly formed the firmware does not
resetting properly (ethtool -S shows the HW counters which
never resetting, even after reboot).

Fixes: 3d2ff7eebe26 "net: ethernet: aquantia: Atlantic hardware abstraction layer")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
---
 .../net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c    |  7 ++++---
 .../net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h    | 13 +++++++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 8d6d8f5..fcfbe42 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -280,10 +280,11 @@ err_exit:;
 int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed,
 			       enum hal_atl_utils_fw_state_e state)
 {
-	u32 ucp_0x368 = 0;
+	union hal_atl_utils_hw_mpi_state_reg ucp_0x368 = { 0 };
 
-	ucp_0x368 = (speed << HW_ATL_MPI_SPEED_SHIFT) | state;
-	aq_hw_write_reg(self, HW_ATL_MPI_CONTROL_ADR, ucp_0x368);
+	ucp_0x368.u_speed = speed;
+	ucp_0x368.e_state = state;
+	aq_hw_write_reg(self, HW_ATL_MPI_CONTROL_ADR, ucp_0x368.val);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
index a66aee5..fc69408a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
@@ -160,6 +160,19 @@ enum hal_atl_utils_fw_state_e {
 	MPI_POWER = 4,
 };
 
+union hal_atl_utils_hw_mpi_state_reg {
+	u32 val;
+	struct {
+		u8 e_state;
+		u8 reserved1;
+		u8 u_speed;
+		u8 reserved2:1;
+		u8 disable_dirty_wake:1;
+		u8 reserved3:2;
+		u8 u_downshift:4;
+	};
+};
+
 #define HAL_ATLANTIC_RATE_10G        BIT(0)
 #define HAL_ATLANTIC_RATE_5G         BIT(1)
 #define HAL_ATLANTIC_RATE_5GSR       BIT(2)
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 3/7] net:ethernet:aquantia: Workaround for HW checksum bug.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous, Pavel Belous
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Pavel Belous <pavel.belous@aquantia.com>

The hardware has the HW Checksum Offload bug when small
TCP patckets (with length <= 60 bytes) has wrong "checksum valid" bit.

The solution is - ignore checksum valid bit for small packets
(with length <= 60 bytes) and mark this as CHECKSUM_NONE to allow
network stack recalculate checksum itself.

Fixes: ccf9a5ed14be ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c | 6 ++++++
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
index faeb493..c5a02df 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
@@ -629,6 +629,12 @@ static int hw_atl_a0_hw_ring_rx_receive(struct aq_hw_s *self,
 				buff->is_udp_cso = (is_err & 0x10U) ? 0 : 1;
 			else if (0x0U == (pkt_type & 0x1CU))
 				buff->is_tcp_cso = (is_err & 0x10U) ? 0 : 1;
+
+			/* Checksum offload workaround for small packets */
+			if (rxd_wb->pkt_len <= 60) {
+				buff->is_ip_cso = 0U;
+				buff->is_cso_err = 0U;
+			}
 		}
 
 		is_err &= ~0x18U;
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index 1bceb73..21784cc 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -645,6 +645,12 @@ static int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self,
 				buff->is_udp_cso = buff->is_cso_err ? 0U : 1U;
 			else if (0x0U == (pkt_type & 0x1CU))
 				buff->is_tcp_cso = buff->is_cso_err ? 0U : 1U;
+
+			/* Checksum offload workaround for small packets */
+			if (rxd_wb->pkt_len <= 60) {
+				buff->is_ip_cso = 0U;
+				buff->is_cso_err = 0U;
+			}
 		}
 
 		is_err &= ~0x18U;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 2/7] net:ethernet:aquantia: Fix for number of RSS queues.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous, Pavel Belous
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Pavel Belous <pavel.belous@aquantia.com>

The number of RSS queues should be not more than numbers of CPU.
Its does not make sense to increase perfomance, and also cause problems on
some motherboards.

Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 08b7275..d6d8e70 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -103,6 +103,8 @@ int aq_nic_cfg_start(struct aq_nic_s *self)
 	else
 		cfg->vecs = 1U;
 
+	cfg->num_rss_queues = min(cfg->vecs, AQ_CFG_NUM_RSS_QUEUES_DEF);
+
 	cfg->irq_type = aq_pci_func_get_irq_type(self->aq_pci_func);
 
 	if ((cfg->irq_type == AQ_HW_IRQ_LEGACY) ||
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 1/7] net:ethernet:aquantia: Extra spinlocks removed.
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous, Pavel Belous
In-Reply-To: <cover.1503496727.git.pavel.belous@aquantia.com>

From: Pavel Belous <pavel.belous@aquantia.com>

This patch removes datapath spinlocks which does not perform any
useful work.

Fixes: 6e70637f9f1e ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c   | 42 +++++++----------------
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c  |  1 -
 drivers/net/ethernet/aquantia/atlantic/aq_utils.h |  1 -
 drivers/net/ethernet/aquantia/atlantic/aq_vec.c   | 11 ++----
 4 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 9ee1c50..08b7275 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -597,14 +597,11 @@ static unsigned int aq_nic_map_skb(struct aq_nic_s *self,
 }
 
 int aq_nic_xmit(struct aq_nic_s *self, struct sk_buff *skb)
-__releases(&ring->lock)
-__acquires(&ring->lock)
 {
 	struct aq_ring_s *ring = NULL;
 	unsigned int frags = 0U;
 	unsigned int vec = skb->queue_mapping % self->aq_nic_cfg.vecs;
 	unsigned int tc = 0U;
-	unsigned int trys = AQ_CFG_LOCK_TRYS;
 	int err = NETDEV_TX_OK;
 	bool is_nic_in_bad_state;
 
@@ -628,36 +625,21 @@ __acquires(&ring->lock)
 		goto err_exit;
 	}
 
-	do {
-		if (spin_trylock(&ring->header.lock)) {
-			frags = aq_nic_map_skb(self, skb, ring);
-
-			if (likely(frags)) {
-				err = self->aq_hw_ops.hw_ring_tx_xmit(
-								self->aq_hw,
-								ring, frags);
-				if (err >= 0) {
-					if (aq_ring_avail_dx(ring) <
-					    AQ_CFG_SKB_FRAGS_MAX + 1)
-						aq_nic_ndev_queue_stop(
-								self,
-								ring->idx);
-
-					++ring->stats.tx.packets;
-					ring->stats.tx.bytes += skb->len;
-				}
-			} else {
-				err = NETDEV_TX_BUSY;
-			}
+	frags = aq_nic_map_skb(self, skb, ring);
 
-			spin_unlock(&ring->header.lock);
-			break;
-		}
-	} while (--trys);
+	if (likely(frags)) {
+		err = self->aq_hw_ops.hw_ring_tx_xmit(self->aq_hw,
+						      ring,
+						      frags);
+		if (err >= 0) {
+			if (aq_ring_avail_dx(ring) < AQ_CFG_SKB_FRAGS_MAX + 1)
+				aq_nic_ndev_queue_stop(self, ring->idx);
 
-	if (!trys) {
+			++ring->stats.tx.packets;
+			ring->stats.tx.bytes += skb->len;
+		}
+	} else {
 		err = NETDEV_TX_BUSY;
-		goto err_exit;
 	}
 
 err_exit:
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 9a08179..ec5579f 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -101,7 +101,6 @@ int aq_ring_init(struct aq_ring_s *self)
 	self->hw_head = 0;
 	self->sw_head = 0;
 	self->sw_tail = 0;
-	spin_lock_init(&self->header.lock);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_utils.h b/drivers/net/ethernet/aquantia/atlantic/aq_utils.h
index f6012b3..e12bcdf 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_utils.h
@@ -17,7 +17,6 @@
 #define AQ_DIMOF(_ARY_)  ARRAY_SIZE(_ARY_)
 
 struct aq_obj_s {
-	spinlock_t lock; /* spinlock for nic/rings processing */
 	atomic_t flags;
 };
 
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
index ad5b4d4d..fee446a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -34,8 +34,6 @@ struct aq_vec_s {
 #define AQ_VEC_RX_ID 1
 
 static int aq_vec_poll(struct napi_struct *napi, int budget)
-__releases(&self->lock)
-__acquires(&self->lock)
 {
 	struct aq_vec_s *self = container_of(napi, struct aq_vec_s, napi);
 	struct aq_ring_s *ring = NULL;
@@ -47,7 +45,7 @@ __acquires(&self->lock)
 
 	if (!self) {
 		err = -EINVAL;
-	} else if (spin_trylock(&self->header.lock)) {
+	} else {
 		for (i = 0U, ring = self->ring[0];
 			self->tx_rings > i; ++i, ring = self->ring[i]) {
 			if (self->aq_hw_ops->hw_ring_tx_head_update) {
@@ -105,11 +103,8 @@ __acquires(&self->lock)
 			self->aq_hw_ops->hw_irq_enable(self->aq_hw,
 					1U << self->aq_ring_param.vec_idx);
 		}
-
-err_exit:
-		spin_unlock(&self->header.lock);
 	}
-
+err_exit:
 	return work_done;
 }
 
@@ -185,8 +180,6 @@ int aq_vec_init(struct aq_vec_s *self, struct aq_hw_ops *aq_hw_ops,
 	self->aq_hw_ops = aq_hw_ops;
 	self->aq_hw = aq_hw;
 
-	spin_lock_init(&self->header.lock);
-
 	for (i = 0U, ring = self->ring[0];
 		self->tx_rings > i; ++i, ring = self->ring[i]) {
 		err = aq_ring_init(&ring[AQ_VEC_TX_ID]);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 0/7] net:ethernet:aquantia: Atlantic driver Update 2017-08-23
From: Pavel Belous @ 2017-08-23 14:05 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, David Arcari, Igor Russkikh, Nadezhda Krupnina,
	Simon Edelhaus, Pavel Belous

From: Pavel Belous <pavel.belous@aquantia.com>

This series contains updates for aQuantia Atlantic driver.

It has bugfixes and some improvements.

Igor Russkikh (1):
  net:ethernet:aquantia: Fix for multicast filter handling.

Pavel Belous (6):
  net:ethernet:aquantia: Extra spinlocks removed.
  net:ethernet:aquantia: Fix for number of RSS queues.
  net:ethernet:aquantia: Workaround for HW checksum bug.
  net:ethernet:aquantia: Fix for MCP state change.
  net:ethernet:aquantia: Fix for incorrect speed index.
  net:ethernet:aquantia: Show info message if bad firmware version
    detected.

 drivers/net/ethernet/aquantia/atlantic/aq_hw.h     |  3 +-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c    | 92 +++++++++++-----------
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c   |  1 -
 drivers/net/ethernet/aquantia/atlantic/aq_utils.h  |  1 -
 drivers/net/ethernet/aquantia/atlantic/aq_vec.c    | 11 +--
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c  |  6 ++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  |  6 ++
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c        | 17 ++--
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h        | 16 +++-
 9 files changed, 85 insertions(+), 68 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [RFC PATCH] dt-binding: net: sfp binding documentation
From: Russell King - ARM Linux @ 2017-08-23 14:04 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Rob Herring, Mark Rutland, Andrew Lunn, Florian Fainelli,
	David S . Miller, netdev, devicetree
In-Reply-To: <20170821150653.jmoogmxklkfbrzxg@sapphire.tkos.co.il>

On Mon, Aug 21, 2017 at 06:06:53PM +0300, Baruch Siach wrote:
> Hi Russell,
> 
> On Mon, Aug 21, 2017 at 01:53:17PM +0100, Russell King - ARM Linux wrote:
> > Note that I didn't expect the SFP code to just get merged with very
> > little in the way of real in-depth review of things like:
> > 
> > * the way the SFP code works, and its structure
> > * analysis of the bindings checking that they're fit for everyone's
> >   purposes.
> 
> I was also surprised to see the "sff,sfp" compatible string with no ack from 
> DT maintainers. Hence this RFC.

I've been pushed into submitting the code for merging, and I hadn't
got around to writing the DT docs (thanks for doing that).  As I've
already said, I'm disappointed that the code didn't get more of a
review before it was merged - it seems Linux review is not what it
was, people care more about reviewing for spelling errors and style
than code structure and functionality, stating that "if we don't like
it we can always rework it" or similar.

It also seems that people believe that they can't make use of other
people's work until it gets merged into mainline kernels (which is
what has been behind the pressure of getting this merged.)

What isn't realised is that having other people use the code before
it gets merged allows design issues to be identified and resolved
when there is great flexibility available - for example, changing the
DT binding.  Once it's merged, changing DT bindings becomes harder,
especially if they need to be changed in an incompatible way.

I'm fed up about this, and way past caring about these details today
through.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* [PATCH] [net-next] qlge: avoid memcpy buffer overflow
From: Arnd Bergmann @ 2017-08-23 13:59 UTC (permalink / raw)
  To: Harish Patil, Manish Chopra, Dept-GELinuxNICDev
  Cc: Arnd Bergmann, stable, David S. Miller, Kees Cook,
	Gustavo A. R. Silva, netdev, linux-kernel

gcc-8.0.0 (snapshot) points out that we copy a variable-length string
into a fixed length field using memcpy() with the destination length,
and that ends up copying whatever follows the string:

    inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
  memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);

Changing it to use strncpy() will instead zero-pad the destination,
which seems to be the right thing to do here.

The bug is probably harmless, but it seems like a good idea to address
it in stable kernels as well, if only for the purpose of building with
gcc-8 without warnings.

Cc: stable@vger.kernel.org
Fixes: a61f80261306 ("qlge: Add ethtool register dump function.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Doesn't seem urgent to me, so please queue it for net-next if it
looks ok.

Interestingly, the hardened memcpy() functions in linux/string.h never
caught this problem event though I think they should have, but gcc-8
found it by default.
---
 drivers/net/ethernet/qlogic/qlge/qlge_dbg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c b/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
index 458d55ba423f..fe2599b83d09 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_dbg.c
@@ -705,7 +705,7 @@ static void ql_build_coredump_seg_header(
 	seg_hdr->cookie = MPI_COREDUMP_COOKIE;
 	seg_hdr->segNum = seg_number;
 	seg_hdr->segSize = seg_size;
-	memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
+	strncpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
 }
 
 /*
-- 
2.9.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox