Netdev List
 help / color / mirror / Atom feed
* [PATCH net 2/2] nfp: flower: handle neighbour events on internal ports
From: Jakub Kicinski @ 2019-08-28  5:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, John Hurley, Simon Horman, Jakub Kicinski
In-Reply-To: <20190828055630.17331-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

Recent code changes to NFP allowed the offload of neighbour entries to FW
when the next hop device was an internal port. This allows for offload of
tunnel encap when the end-point IP address is applied to such a port.

Unfortunately, the neighbour event handler still rejects events that are
not associated with a repr dev and so the firmware neighbour table may get
out of sync for internal ports.

Fix this by allowing internal port neighbour events to be correctly
processed.

Fixes: 45756dfedab5 ("nfp: flower: allow tunnels to output to internal port")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index a7a80f4b722a..f0ee982eb1b5 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -328,13 +328,13 @@ nfp_tun_neigh_event_handler(struct notifier_block *nb, unsigned long event,
 
 	flow.daddr = *(__be32 *)n->primary_key;
 
-	/* Only concerned with route changes for representors. */
-	if (!nfp_netdev_is_nfp_repr(n->dev))
-		return NOTIFY_DONE;
-
 	app_priv = container_of(nb, struct nfp_flower_priv, tun.neigh_nb);
 	app = app_priv->app;
 
+	if (!nfp_netdev_is_nfp_repr(n->dev) &&
+	    !nfp_flower_internal_port_can_offload(app, n->dev))
+		return NOTIFY_DONE;
+
 	/* Only concerned with changes to routes already added to NFP. */
 	if (!nfp_tun_has_route(app, flow.daddr))
 		return NOTIFY_DONE;
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-28  6:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	Steven Rostedt, David S. Miller, Daniel Borkmann,
	Network Development, bpf, kernel-team, Linux API
In-Reply-To: <20190828044340.zeha3k3cmmxgfqj7@ast-mbp.dhcp.thefacebook.com>

On Tue, Aug 27, 2019 at 9:43 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 27, 2019 at 05:55:41PM -0700, Andy Lutomirski wrote:
> >
> > I was hoping for something in Documentation/admin-guide, not in a
> > changelog that's hard to find.
>
> eventually yes.
>
> > >
> > > > Changing the capability that some existing operation requires could
> > > > break existing programs.  The old capability may need to be accepted
> > > > as well.
> > >
> > > As far as I can see there is no ABI breakage. Please point out
> > > which line of the patch may break it.
> >
> > As a more or less arbitrary selection:
> >
> >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >  {
> >         if (!bpf_prog_kallsyms_candidate(fp) ||
> > -           !capable(CAP_SYS_ADMIN))
> > +           !capable(CAP_BPF))
> >                 return;
> >
> > Before your patch, a task with CAP_SYS_ADMIN could do this.  Now it
> > can't.  Per the usual Linux definition of "ABI break", this is an ABI
> > break if and only if someone actually did this in a context where they
> > have CAP_SYS_ADMIN but not all capabilities.  How confident are you
> > that no one does things like this?
> >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >  {
> >         if (!bpf_prog_kallsyms_candidate(fp) ||
> > -           !capable(CAP_SYS_ADMIN))
> > +           !capable(CAP_BPF))
> >                 return;
>
> Yes. I'm confident that apps don't drop everything and
> leave cap_sys_admin only before doing bpf() syscall, since it would
> break their own use of networking.
> Hence I'm not going to do the cap_syslog-like "deprecated" message mess
> because of this unfounded concern.
> If I turn out to be wrong we will add this "deprecated mess" later.
>
> >
> > From the previous discussion, you want to make progress toward solving
> > a lot of problems with CAP_BPF.  One of them was making BPF
> > firewalling more generally useful. By making CAP_BPF grant the ability
> > to read kernel memory, you will make administrators much more nervous
> > to grant CAP_BPF.
>
> Andy, were your email hacked?
> I explained several times that in this proposal
> CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
> CAP_BPF alone is _not enough_.

You have indeed said this many times.  You've stated it as a matter of
fact as though it cannot possibly discussed.  I'm asking you to
justify it.

> > Similarly, and correct me if I'm wrong, most of
> > these capabilities are primarily or only useful for tracing, so I
> > don't see why users without CAP_TRACING should get them.
> > bpf_trace_printk(), in particular, even has "trace" in its name :)
> >
> > Also, if a task has CAP_TRACING, it's expected to be able to trace the
> > system -- that's the whole point.  Why shouldn't it be able to use BPF
> > to trace the system better?
>
> CAP_TRACING shouldn't be able to do BPF because BPF is not tracing only.

What does "do BPF" even mean?  seccomp() does BPF.  SO_ATTACH_FILTER
does BPF.  Saying that using BPF should require a specific capability
seems kind of like saying that using the network should require a
specific capability.  Linux (and Unixy systems in general) distinguish
between binding low-number ports, binding high-number ports, using raw
sockets, and changing the system's IP address.  These have different
implications and require different capabilities.

It seems like you are specifically trying to add a new switch to turn
as much of BPF as possible on and off.  Why?

> >
> > test_run allows fully controlled inputs, in a context where a program
> > can trivially flush caches, mistrain branch predictors, etc first.  It
> > seems to me that, if a JITted bpf program contains an exploitable
> > speculation gadget (MDS, Spectre v1, RSB, or anything else),
>
> speaking of MDS... I already asked you to help investigate its
> applicability with existing bpf exposure. Are you going to do that?

I am blissfully uninvolved in MDS, and I don't know all that much more
about the overall mechanism than a random reader of tech news :)  ISTM
there are two meaningful ways that BPF could be involved: a BPF
program could leak info into the state exposed by MDS, or a BPF
program could try to read that state.  From what little I understand,
it's essentially inevitable that BPF leaks information into MDS state,
and this is probably even controllable by an attacker that understands
MDS in enough detail.    So the interesting questions are: can BPF be
used to read MDS state and can BPF be used to leak information in a
more useful way than the rest of the kernel to an attacker.

Keeping in mind that the kernel will flush MDS state on every exit to
usermode, I think the most likely attack is to try to read MDS state
with BPF.  This could happen, I suppose -- BPF programs can easily
contain the usual speculation gadgets of "do something and read an
address that depends on the outcome".  Fortunately, outside of
bpf_probe_read(), AFAIK BPF programs can't directly touch user memory,
and an attacker that is allowed to use bpf_probe_read() doesn't need
MDS to read things.

So it's not entirely obvious to me how an attack would be mounted.
test_run would make it a lot easier, I think.

>
> > it will
> > be *much* easier to exploit it using test_run than using normal
> > network traffic.  Similarly, normal network traffic will have network
> > headers that are valid enough to have caused the BPF program to be
> > invoked in the first place.  test_run can inject arbitrary garbage.
>
> Please take a look at Jann's var1 exploit. Was it hard to run bpf prog
> in controlled environment without test_run command ?
>

Can you send me a link?

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Andy Lutomirski @ 2019-08-28  6:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Peter Zijlstra, Masami Hiramatsu,
	Steven Rostedt, David S. Miller, Daniel Borkmann,
	Network Development, bpf, kernel-team, Linux API
In-Reply-To: <20190828044903.nv3hvinkkolnnxtv@ast-mbp.dhcp.thefacebook.com>

On Tue, Aug 27, 2019 at 9:49 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 27, 2019 at 07:00:40PM -0700, Andy Lutomirski wrote:
> >
> > Let me put this a bit differently. Part of the point is that
> > CAP_TRACING should allow a user or program to trace without being able
> > to corrupt the system. CAP_BPF as you’ve proposed it *can* likely
> > crash the system.
>
> Really? I'm still waiting for your example where bpf+kprobe crashes the system...
>

That's not what I meant.  bpf+kprobe causing a crash is a bug.  I'm
referring to a totally different issue.  On my laptop:

$ sudo bpftool map
48: hash  name foobar  flags 0x0
    key 8B  value 8B  max_entries 64  memlock 8192B
181: lpm_trie  flags 0x1
    key 8B  value 8B  max_entries 1  memlock 4096B
182: lpm_trie  flags 0x1
    key 20B  value 8B  max_entries 1  memlock 4096B
183: lpm_trie  flags 0x1
    key 8B  value 8B  max_entries 1  memlock 4096B
184: lpm_trie  flags 0x1
    key 20B  value 8B  max_entries 1  memlock 4096B
185: lpm_trie  flags 0x1
    key 8B  value 8B  max_entries 1  memlock 4096B
186: lpm_trie  flags 0x1
    key 20B  value 8B  max_entries 1  memlock 4096B
187: lpm_trie  flags 0x1
    key 8B  value 8B  max_entries 1  memlock 4096B
188: lpm_trie  flags 0x1
    key 20B  value 8B  max_entries 1  memlock 4096B

$ sudo bpftool map dump id 186
key:
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00 00 00 00
value:
02 00 00 00 00 00 00 00
Found 1 element

$ sudo bpftool map delete id 186 key hex 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00
[this worked]

I don't know what my laptop was doing with map id 186 in particular,
but, whatever it was, I definitely broke it.  If a BPF firewall is in
use on something important enough, this could easily remove
connectivity from part or all of the system.  Right now, this needs
CAP_SYS_ADMIN.  With your patch, CAP_BPF is sufficient to do this, but
you *also* need CAP_BPF to trace the system using BPF.  Tracing with
BPF is 'safe' in the absence of bugs.  Modifying other peoples' maps
is not.

One possible answer to this would be to limit CAP_BPF to the subset of
BPF that is totaly safe in the absence of bugs (e.g. loading most
program types if they don't have dangerous BPF_CALL instructions but
not *_BY_ID).  Another answer would be to say that CAP_BPF will not be
needed by future unprivileged bpf mechanisms, and that CAP_TRACING
plus unprivileged bpf will be enough to do most or all interesting BPF
tracing operations.

If the answer is the latter, then maybe it would make sense to try to
implement some of the unprivileged bpf stuff and then to see whether
CAP_BPF is still needed.

^ permalink raw reply

* [PATCH] sky2: Disable MSI on yet another ASUS boards (P6Xxxx)
From: Takashi Iwai @ 2019-08-28  6:31 UTC (permalink / raw)
  To: Mirko Lindner, Stephen Hemminger
  Cc: David S . Miller, netdev, linux-kernel, SteveM

A similar workaround for the suspend/resume problem is needed for yet
another ASUS machines, P6X models.  Like the previous fix, the BIOS
doesn't provide the standard DMI_SYS_* entry, so again DMI_BOARD_*
entries are used instead.

Reported-and-tested-by: SteveM <swm@swm1.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 drivers/net/ethernet/marvell/sky2.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index c2e00bb587cd..5f56ee83e3b1 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4931,6 +4931,13 @@ static const struct dmi_system_id msi_blacklist[] = {
 			DMI_MATCH(DMI_BOARD_NAME, "P6T"),
 		},
 	},
+	{
+		.ident = "ASUS P6X",
+		.matches = {
+			DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
+			DMI_MATCH(DMI_BOARD_NAME, "P6X"),
+		},
+	},
 	{}
 };
 
-- 
2.16.4


^ permalink raw reply related

* Re: [PATCH net] net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueue
From: Paolo Abeni @ 2019-08-28  6:31 UTC (permalink / raw)
  To: Davide Caratti, Cong Wang, Jamal Hadi Salim, Jiri Pirko,
	David S. Miller, netdev
  Cc: Stefano Brivio, Li Shuang
In-Reply-To: <d5a7a167ab57e035685445ee641840a0c5fd39ae.1566940693.git.dcaratti@redhat.com>

On Tue, 2019-08-27 at 23:18 +0200, Davide Caratti wrote:
> Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
> 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that
> per-cpu counters are there in the error path of skb_array_produce().
> Otherwise, the following splat can be seen:
> 
>  Unable to handle kernel paging request at virtual address 0000600dea430008
>  Mem abort info:
>    ESR = 0x96000005
>    Exception class = DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>  Data abort info:
>    ISV = 0, ISS = 0x00000005
>    CM = 0, WnR = 0
>  user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e
>  [0000600dea430008] pgd=0000000000000000, pud=0000000000000000
>  Internal error: Oops: 96000005 [#1] SMP
> [...]
>  pstate: 10000005 (nzcV daif -PAN -UAO)
>  pc : pfifo_fast_enqueue+0x524/0x6e8
>  lr : pfifo_fast_enqueue+0x46c/0x6e8
>  sp : ffff800d39376fe0
>  x29: ffff800d39376fe0 x28: 1ffff001a07d1e40
>  x27: ffff800d03e8f188 x26: ffff800d03e8f200
>  x25: 0000000000000062 x24: ffff800d393772f0
>  x23: 0000000000000000 x22: 0000000000000403
>  x21: ffff800cca569a00 x20: ffff800d03e8ee00
>  x19: ffff800cca569a10 x18: 00000000000000bf
>  x17: 0000000000000000 x16: 0000000000000000
>  x15: 0000000000000000 x14: ffff1001a726edd0
>  x13: 1fffe4000276a9a4 x12: 0000000000000000
>  x11: dfff200000000000 x10: ffff800d03e8f1a0
>  x9 : 0000000000000003 x8 : 0000000000000000
>  x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea
>  x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003
>  x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb
>  x1 : 0000600dea430000 x0 : 0000600dea430008
>  Process ping (pid: 6067, stack limit = 0x00000000dc0aa557)
>  Call trace:
>   pfifo_fast_enqueue+0x524/0x6e8
>   htb_enqueue+0x660/0x10e0 [sch_htb]
>   __dev_queue_xmit+0x123c/0x2de0
>   dev_queue_xmit+0x24/0x30
>   ip_finish_output2+0xc48/0x1720
>   ip_finish_output+0x548/0x9d8
>   ip_output+0x334/0x788
>   ip_local_out+0x90/0x138
>   ip_send_skb+0x44/0x1d0
>   ip_push_pending_frames+0x5c/0x78
>   raw_sendmsg+0xed8/0x28d0
>   inet_sendmsg+0xc4/0x5c0
>   sock_sendmsg+0xac/0x108
>   __sys_sendto+0x1ac/0x2a0
>   __arm64_sys_sendto+0xc4/0x138
>   el0_svc_handler+0x13c/0x298
>   el0_svc+0x8/0xc
>  Code: f9402e80 d538d081 91002000 8b010000 (885f7c03)
> 
> Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
> before dereferencing 'qdisc->cpu_qstats'.
> 
> Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
> CC: Paolo Abeni <pabeni@redhat.com>
> CC: Stefano Brivio <sbrivio@redhat.com>
> Reported-by: Li Shuang <shuali@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>  net/sched/sch_generic.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 099797e5409d..137db1cbde85 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -624,8 +624,12 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
>  
>  	err = skb_array_produce(q, skb);
>  
> -	if (unlikely(err))
> -		return qdisc_drop_cpu(skb, qdisc, to_free);
> +	if (unlikely(err)) {
> +		if (qdisc_is_percpu_stats(qdisc))
> +			return qdisc_drop_cpu(skb, qdisc, to_free);
> +		else
> +			return qdisc_drop(skb, qdisc, to_free);
> +	}
>  
>  	qdisc_update_stats_at_enqueue(qdisc, pkt_len);
>  	return NET_XMIT_SUCCESS;

LGTM, thanks Davide!

I just did a code audit of the others pfifo_fast callbacks, I think
this is the last spot in need of such fix.

Acked-by: Paolo Abeni <pabeni@redhat.com>


^ permalink raw reply

* Re: [PATCH net-next v2 2/3] dt-bindings: net: dsa: mt7530: Add support for port 5
From: René van Dorst @ 2019-08-28  6:35 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S . Miller, Matthias Brugger, netdev, linux-arm-kernel,
	linux-mediatek, John Crispin, linux-mips, Frank Wunderlich,
	devicetree
In-Reply-To: <20190827222251.GA30507@bogus>

Hi Rob,

Quoting Rob Herring <robh@kernel.org>:

> On Wed, Aug 21, 2019 at 04:45:46PM +0200, René van Dorst wrote:
>> MT7530 port 5 has many modes/configurations.
>> Update the documentation how to use port 5.
>>
>> Signed-off-by: René van Dorst <opensource@vdorst.com>
>> Cc: devicetree@vger.kernel.org
>> Cc: Rob Herring <robh@kernel.org>
>
>> v1->v2:
>> * Adding extra note about RGMII2 and gpio use.
>> rfc->v1:
>> * No change
>
> The changelog goes below the '---'
>

Thanks for the review,
I shall fix that.

>> ---
>>  .../devicetree/bindings/net/dsa/mt7530.txt    | 218 ++++++++++++++++++
>>  1 file changed, 218 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/mt7530.txt  
>> b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
>> index 47aa205ee0bd..43993aae3f9c 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/mt7530.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
>> @@ -35,6 +35,42 @@ Required properties for the child nodes within  
>> ports container:
>>  - phy-mode: String, must be either "trgmii" or "rgmii" for port labeled
>>  	 "cpu".
>>
>> +Port 5 of the switch is muxed between:
>> +1. GMAC5: GMAC5 can interface with another external MAC or PHY.
>> +2. PHY of port 0 or port 4: PHY interfaces with an external MAC  
>> like 2nd GMAC
>> +   of the SOC. Used in many setups where port 0/4 becomes the WAN port.
>> +   Note: On a MT7621 SOC with integrated switch: 2nd GMAC can only  
>> connected to
>> +	 GMAC5 when the gpios for RGMII2 (GPIO 22-33) are not used and not
>> +	 connected to external component!
>> +
>> +Port 5 modes/configurations:
>> +1. Port 5 is disabled and isolated: An external phy can interface  
>> to the 2nd
>> +   GMAC of the SOC.
>> +   In the case of a build-in MT7530 switch, port 5 shares the  
>> RGMII bus with 2nd
>> +   GMAC and an optional external phy. Mind the GPIO/pinctl  
>> settings of the SOC!
>> +2. Port 5 is muxed to PHY of port 0/4: Port 0/4 interfaces with 2nd GMAC.
>> +   It is a simple MAC to PHY interface, port 5 needs to be setup  
>> for xMII mode
>> +   and RGMII delay.
>> +3. Port 5 is muxed to GMAC5 and can interface to an external phy.
>> +   Port 5 becomes an extra switch port.
>> +   Only works on platform where external phy TX<->RX lines are swapped.
>> +   Like in the Ubiquiti ER-X-SFP.
>> +4. Port 5 is muxed to GMAC5 and interfaces with the 2nd GAMC as  
>> 2nd CPU port.
>> +   Currently a 2nd CPU port is not supported by DSA code.
>> +
>> +Depending on how the external PHY is wired:
>> +1. normal: The PHY can only connect to 2nd GMAC but not to the switch
>> +2. swapped: RGMII TX, RX are swapped; external phy interface with  
>> the switch as
>> +   a ethernet port. But can't interface to the 2nd GMAC.
>> +
>> +Based on the DT the port 5 mode is configured.
>> +
>> +Driver tries to lookup the phy-handle of the 2nd GMAC of the master device.
>> +When phy-handle matches PHY of port 0 or 4 then port 5 set-up as mode 2.
>> +phy-mode must be set, see also example 2 below!
>> + * mt7621: phy-mode = "rgmii-txid";
>> + * mt7623: phy-mode = "rgmii";
>> +
>>  See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list  
>> of additional
>>  required, optional properties and how the integrated switch subnodes must
>>  be specified.
>> @@ -94,3 +130,185 @@ Example:
>>  			};
>>  		};
>>  	};
>> +
>> +Example 2: MT7621: Port 4 is WAN port: 2nd GMAC -> Port 5 -> PHY port 4.
>> +
>> +&eth {
>> +	status = "okay";
>
> Don't show status in examples.

OK.

> This should show the complete node.
>

To be clear, I should take ethernet node from the mt7621.dtsi [0] or  
mt7623.dtsi
[1] and insert the example below?, right?

Greats,

René

[0]:  
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/staging/mt7621-dts/mt7621.dtsi#n397
[1]:  
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/arch/arm/boot/dts/mt7623.dtsi#n1023

>> +
>> +	gmac0: mac@0 {
>> +		compatible = "mediatek,eth-mac";
>> +		reg = <0>;
>> +		phy-mode = "rgmii";
>> +
>> +		fixed-link {
>> +			speed = <1000>;
>> +			full-duplex;
>> +			pause;
>> +		};
>> +	};
>> +
>> +	gmac1: mac@1 {
>> +		compatible = "mediatek,eth-mac";
>> +		reg = <1>;
>> +		phy-mode = "rgmii-txid";
>> +		phy-handle = <&phy4>;
>> +	};
>> +
>> +	mdio: mdio-bus {
>> +		#address-cells = <1>;
>> +		#size-cells = <0>;
>> +
>> +		/* Internal phy */
>> +		phy4: ethernet-phy@4 {
>> +			reg = <4>;
>> +		};
>> +
>> +		mt7530: switch@1f {
>> +			compatible = "mediatek,mt7621";
>> +			#address-cells = <1>;
>> +			#size-cells = <0>;
>> +			reg = <0x1f>;
>> +			pinctrl-names = "default";
>> +			mediatek,mcm;
>> +
>> +			resets = <&rstctrl 2>;
>> +			reset-names = "mcm";
>> +
>> +			ports {
>> +				#address-cells = <1>;
>> +				#size-cells = <0>;
>> +
>> +				port@0 {
>> +					reg = <0>;
>> +					label = "lan0";
>> +				};
>> +
>> +				port@1 {
>> +					reg = <1>;
>> +					label = "lan1";
>> +				};
>> +
>> +				port@2 {
>> +					reg = <2>;
>> +					label = "lan2";
>> +				};
>> +
>> +				port@3 {
>> +					reg = <3>;
>> +					label = "lan3";
>> +				};
>> +
>> +/* Commented out. Port 4 is handled by 2nd GMAC.
>> +				port@4 {
>> +					reg = <4>;
>> +					label = "lan4";
>> +				};
>> +*/
>> +
>> +				cpu_port0: port@6 {
>> +					reg = <6>;
>> +					label = "cpu";
>> +					ethernet = <&gmac0>;
>> +					phy-mode = "rgmii";
>> +
>> +					fixed-link {
>> +						speed = <1000>;
>> +						full-duplex;
>> +						pause;
>> +					};
>> +				};
>> +			};
>> +		};
>> +	};
>> +};
>> +
>> +Example 3: MT7621: Port 5 is connected to external PHY: Port 5 ->  
>> external PHY.
>> +
>> +&eth {
>> +	status = "okay";
>> +
>> +	gmac0: mac@0 {
>> +		compatible = "mediatek,eth-mac";
>> +		reg = <0>;
>> +		phy-mode = "rgmii";
>> +
>> +		fixed-link {
>> +			speed = <1000>;
>> +			full-duplex;
>> +			pause;
>> +		};
>> +	};
>> +
>> +	mdio: mdio-bus {
>> +		#address-cells = <1>;
>> +		#size-cells = <0>;
>> +
>> +		/* External phy */
>> +		ephy5: ethernet-phy@7 {
>> +			reg = <7>;
>> +		};
>> +
>> +		mt7530: switch@1f {
>> +			compatible = "mediatek,mt7621";
>> +			#address-cells = <1>;
>> +			#size-cells = <0>;
>> +			reg = <0x1f>;
>> +			pinctrl-names = "default";
>> +			mediatek,mcm;
>> +
>> +			resets = <&rstctrl 2>;
>> +			reset-names = "mcm";
>> +
>> +			ports {
>> +				#address-cells = <1>;
>> +				#size-cells = <0>;
>> +
>> +				port@0 {
>> +					reg = <0>;
>> +					label = "lan0";
>> +				};
>> +
>> +				port@1 {
>> +					reg = <1>;
>> +					label = "lan1";
>> +				};
>> +
>> +				port@2 {
>> +					reg = <2>;
>> +					label = "lan2";
>> +				};
>> +
>> +				port@3 {
>> +					reg = <3>;
>> +					label = "lan3";
>> +				};
>> +
>> +				port@4 {
>> +					reg = <4>;
>> +					label = "lan4";
>> +				};
>> +
>> +				port@5 {
>> +					reg = <5>;
>> +					label = "lan5";
>> +					phy-mode = "rgmii";
>> +					phy-handle = <&ephy5>;
>> +				};
>> +
>> +				cpu_port0: port@6 {
>> +					reg = <6>;
>> +					label = "cpu";
>> +					ethernet = <&gmac0>;
>> +					phy-mode = "rgmii";
>> +
>> +					fixed-link {
>> +						speed = <1000>;
>> +						full-duplex;
>> +						pause;
>> +					};
>> +				};
>> +			};
>> +		};
>> +	};
>> +};
>> --
>> 2.20.1
>>




^ permalink raw reply

* WARNING in smc_unhash_sk (3)
From: syzbot @ 2019-08-28  6:38 UTC (permalink / raw)
  To: davem, kgraul, linux-kernel, linux-s390, netdev, syzkaller-bugs,
	ubraun

Hello,

syzbot found the following crash on:

HEAD commit:    a55aa89a Linux 5.3-rc6
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=112dd212600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=58485246ad14eafe
dashboard link: https://syzkaller.appspot.com/bug?extid=8488cc4cf1c9e09b8b86
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15426ebc600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=116aca7a600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 9198 at ./include/net/sock.h:666 sk_del_node_init  
include/net/sock.h:666 [inline]
WARNING: CPU: 0 PID: 9198 at ./include/net/sock.h:666  
smc_unhash_sk+0x21b/0x240 net/smc/af_smc.c:96
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9198 Comm: syz-executor057 Not tainted 5.3.0-rc6 #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
  panic+0x25c/0x799 kernel/panic.c:219
  __warn+0x22f/0x230 kernel/panic.c:576
  report_bug+0x190/0x290 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  do_error_trap+0xd7/0x440 arch/x86/kernel/traps.c:272
  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:291
  invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
RIP: 0010:sk_del_node_init include/net/sock.h:666 [inline]
RIP: 0010:smc_unhash_sk+0x21b/0x240 net/smc/af_smc.c:96
Code: 48 89 df e8 07 b1 39 00 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3  
e8 03 d7 31 fa 48 c7 c7 f2 c3 3a 88 31 c0 e8 28 1d 1b fa <0f> 0b eb 85 44  
89 f1 80 e1 07 80 c1 03 38 c1 0f 8c 5b ff ff ff 4c
RSP: 0018:ffff888094177b68 EFLAGS: 00010246
RAX: 0000000000000024 RBX: 0000000000000001 RCX: b964ece25f6b7c00
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff888094177bb0 R08: ffffffff815cf7d4 R09: ffffed1015d46088
R10: ffffed1015d46088 R11: 0000000000000000 R12: ffff888098ccb240
R13: dffffc0000000000 R14: ffff888098ccb2c0 R15: ffff888098ccb268
  __smc_release+0x1f8/0x3a0 net/smc/af_smc.c:146
  smc_release+0x15b/0x2c0 net/smc/af_smc.c:185
  __sock_release net/socket.c:590 [inline]
  sock_close+0xe1/0x260 net/socket.c:1268
  __fput+0x2e4/0x740 fs/file_table.c:280
  ____fput+0x15/0x20 fs/file_table.c:313
  task_work_run+0x17e/0x1b0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x5e8/0x21a0 kernel/exit.c:879
  do_group_exit+0x15c/0x2b0 kernel/exit.c:983
  __do_sys_exit_group+0x17/0x20 kernel/exit.c:994
  __se_sys_exit_group+0x14/0x20 kernel/exit.c:992
  __x64_sys_exit_group+0x3b/0x40 kernel/exit.c:992
  do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43ff28
Code: 00 00 be 3c 00 00 00 eb 19 66 0f 1f 84 00 00 00 00 00 48 89 d7 89 f0  
0f 05 48 3d 00 f0 ff ff 77 21 f4 48 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff  
ff 76 e0 f7 d8 64 41 89 01 eb d8 0f 1f 84 00 00 00
RSP: 002b:00007ffefacce238 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043ff28
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf750 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* general protection fault in tls_sk_proto_close (2)
From: syzbot @ 2019-08-28  6:38 UTC (permalink / raw)
  To: aviadye, borisp, daniel, davejwatson, davem, jakub.kicinski,
	john.fastabend, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    a55aa89a Linux 5.3-rc6
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16c26ebc600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2a6a2b9826fdadf9
dashboard link: https://syzkaller.appspot.com/bug?extid=7a6ee4d0078eac6bf782
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1112a4de600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10290 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #120
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1  
ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c  
89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
FS:  00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  tls_sk_proto_close+0x35b/0x990 net/tls/tls_main.c:321
  tcp_bpf_close+0x17c/0x390 net/ipv4/tcp_bpf.c:582
  inet_release+0xed/0x200 net/ipv4/af_inet.c:427
  inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
  __sock_release+0xce/0x280 net/socket.c:590
  sock_close+0x1e/0x30 net/socket.c:1268
  __fput+0x2ff/0x890 fs/file_table.c:280
  ____fput+0x16/0x20 fs/file_table.c:313
  task_work_run+0x145/0x1c0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:188 [inline]
  exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
  prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
  do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413540
Code: 01 f0 ff ff 0f 83 30 1b 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f  
44 00 00 83 3d 4d 2d 66 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff
RSP: 002b:00007fff5d481778 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000413540
RDX: 0000001b2e520000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000075bf20
R13: 0000000000000003 R14: 0000000000761220 R15: ffffffffffffffff
Modules linked in:
---[ end trace bdfd4385a0f1f76d ]---
RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1  
ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c  
89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
FS:  00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* [net-next 00/15][pull request] Intel Wired LAN Driver Updates 2019-08-27
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann

This series contains a variety of cold and hot savoury changes to Intel
drivers.  Some of the fixes could be considered for stable even though
the author did not request it.

Hulk Robert cleans up (i.e. removes) a function that has no caller for
the iavf driver.

Radoslaw fixes an issue when there is no link in the VM after the
hypervisor is restored from a low-power state due to the driver not
properly restoring features in the device that had been disabled during
the suspension for ixgbevf.

Kai-Heng Feng modified e1000e to use mod_delayed_work() to help resolve
a hot plug speed detection issue by adding a deterministic 1 second
delay before running watchdog task after an interrupt.

Sasha moves functions around to avoid forward declarations, since the
forward declarations are not necessary for these static functions in
igc.  Also added a check for igc during driver probe to validate the NVM
checksum.  Cleaned up code defines that were not being used in the igc
driver.  Adds support for IP generic transmit checksum offload in the
igc driver.

Updated the iavf kernel documentation by a developer with no life.

Jake provides another fm10k update to a local variable for ease of code
readability.

Mitch fixes the iavf driver to allow the VF to override the MAC address
set by the host, if the VF is in "trusted" mode.

Mauro S. M. Rodrigues provides several changes for i40e driver, first
with resolving hw_dbg usage and referencing a i40e_hw attribute.  Also
implemented a debug macro using pr_debug, since the use of netdev_dbg
could cause a NULL pointer dereference during probe.  Finally cleaned up
code that is no longer used or needed.

Firo Yang provides a change in the ixgbe driver to ensure we sync the
first fragment unconditionally to help resolve an issue seen in the XEN
environment when the upper network stack could receive an incomplete
network packet.

Mariusz adds a missing device to the i40e PCI table in the driver.

The following are changes since commit 68aaf4459556b1f9370c259fd486aecad2257552:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Firo Yang (1):
  ixgbe: sync the first fragment unconditionally

Jacob Keller (1):
  fm10k: use a local variable for the frag pointer

Jeff Kirsher (1):
  Documentation: iavf: Update the Intel LAN driver doc for iavf

Kai-Heng Feng (1):
  e1000e: Make speed detection on hotplugging cable more reliable

Mariusz Stachura (1):
  i40e: Add support for X710 device

Mauro S. M. Rodrigues (3):
  i40e: fix hw_dbg usage in i40e_hmc_get_object_va
  i40e: Implement debug macro hw_dbg using pr_debug
  i40e: Remove EMPR traces from debugfs facility

Mitch Williams (1):
  iavf: allow permanent MAC address to change

Radoslaw Tyl (1):
  ixgbevf: Link lost in VM on ixgbevf when restoring from freeze or
    suspend

Sasha Neftin (4):
  igc: Remove useless forward declaration
  igc: Add NVM checksum validation
  igc: Remove unneeded PCI bus defines
  igc: Add tx_csum offload functionality

YueHaibing (1):
  iavf: remove unused debug function iavf_debug_d

 .../networking/device_drivers/intel/iavf.rst  | 115 ++++++++---
 drivers/net/ethernet/intel/e1000e/netdev.c    |  12 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |   8 +-
 drivers/net/ethernet/intel/i40e/i40e.h        |   1 -
 drivers/net/ethernet/intel/i40e/i40e_common.c |   1 +
 .../net/ethernet/intel/i40e/i40e_debugfs.c    |   4 -
 drivers/net/ethernet/intel/i40e/i40e_hmc.c    |   1 +
 .../net/ethernet/intel/i40e/i40e_lan_hmc.c    |  14 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   1 +
 drivers/net/ethernet/intel/i40e/i40e_osdep.h  |   7 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |   1 -
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  26 ---
 drivers/net/ethernet/intel/igc/igc.h          |   4 +
 drivers/net/ethernet/intel/igc/igc_base.h     |   8 +
 drivers/net/ethernet/intel/igc/igc_defines.h  |   9 +-
 drivers/net/ethernet/intel/igc/igc_mac.c      |  73 ++++---
 drivers/net/ethernet/intel/igc/igc_main.c     | 106 ++++++++++
 drivers/net/ethernet/intel/igc/igc_phy.c      | 192 +++++++++---------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  16 +-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |   1 +
 20 files changed, 372 insertions(+), 228 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [net-next 02/15] ixgbevf: Link lost in VM on ixgbevf when restoring from freeze or suspend
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Radoslaw Tyl, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Radoslaw Tyl <radoslawx.tyl@intel.com>

This patch fixed issue in VM which shows no link when hypervisor is
restored from low-power state. The driver is responsible for re-enabling
any features of the device that had been disabled during suspend calls,
such as IRQs and bus mastering.

Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 8c011d4ce7a9..75e849a64db7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2517,6 +2517,7 @@ void ixgbevf_reinit_locked(struct ixgbevf_adapter *adapter)
 		msleep(1);
 
 	ixgbevf_down(adapter);
+	pci_set_master(adapter->pdev);
 	ixgbevf_up(adapter);
 
 	clear_bit(__IXGBEVF_RESETTING, &adapter->state);
-- 
2.21.0


^ permalink raw reply related

* [net-next 03/15] e1000e: Make speed detection on hotplugging cable more reliable
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Kai-Heng Feng, netdev, nhorman, sassmann, Aaron Brown,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

After hot plugging an 1Gbps Ethernet cable with 1Gbps link partner, the
MII_BMSR may report 10Mbps, renders the network rather slow.

The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
Make watchdog use delayed work"), which essentially introduces some
delay before running the watchdog task.

But there's still a chance that the hot plugging event and the queued
watchdog task gets run at the same time, then the original issue can be
observed once again.

So let's use mod_delayed_work() to add a deterministic 1 second delay
before running watchdog task, after an interrupt.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 8a3f035c3a5f..d7d56e42a6aa 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1780,8 +1780,8 @@ static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data)
 		}
 		/* guard against interrupt when we're going down */
 		if (!test_bit(__E1000_DOWN, &adapter->state))
-			queue_delayed_work(adapter->e1000_workqueue,
-					   &adapter->watchdog_task, 1);
+			mod_delayed_work(adapter->e1000_workqueue,
+					 &adapter->watchdog_task, HZ);
 	}
 
 	/* Reset on uncorrectable ECC error */
@@ -1861,8 +1861,8 @@ static irqreturn_t e1000_intr(int __always_unused irq, void *data)
 		}
 		/* guard against interrupt when we're going down */
 		if (!test_bit(__E1000_DOWN, &adapter->state))
-			queue_delayed_work(adapter->e1000_workqueue,
-					   &adapter->watchdog_task, 1);
+			mod_delayed_work(adapter->e1000_workqueue,
+					 &adapter->watchdog_task, HZ);
 	}
 
 	/* Reset on uncorrectable ECC error */
@@ -1907,8 +1907,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 		hw->mac.get_link_status = true;
 		/* guard against interrupt when we're going down */
 		if (!test_bit(__E1000_DOWN, &adapter->state))
-			queue_delayed_work(adapter->e1000_workqueue,
-					   &adapter->watchdog_task, 1);
+			mod_delayed_work(adapter->e1000_workqueue,
+					 &adapter->watchdog_task, HZ);
 	}
 
 	if (!test_bit(__E1000_DOWN, &adapter->state))
-- 
2.21.0


^ permalink raw reply related

* [net-next 01/15] iavf: remove unused debug function iavf_debug_d
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem
  Cc: YueHaibing, netdev, nhorman, sassmann, Hulk Robot, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: YueHaibing <yuehaibing@huawei.com>

There is no caller of function iavf_debug_d() in tree since
commit 75051ce4c5d8 ("iavf: Fix up debug print macro"),
so it can be removed.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 22 ---------------------
 1 file changed, 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 9d2b50964a08..554aa619ff02 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -142,28 +142,6 @@ enum iavf_status iavf_free_virt_mem_d(struct iavf_hw *hw,
 	return 0;
 }
 
-/**
- * iavf_debug_d - OS dependent version of debug printing
- * @hw:  pointer to the HW structure
- * @mask: debug level mask
- * @fmt_str: printf-type format description
- **/
-void iavf_debug_d(void *hw, u32 mask, char *fmt_str, ...)
-{
-	char buf[512];
-	va_list argptr;
-
-	if (!(mask & ((struct iavf_hw *)hw)->debug_mask))
-		return;
-
-	va_start(argptr, fmt_str);
-	vsnprintf(buf, sizeof(buf), fmt_str, argptr);
-	va_end(argptr);
-
-	/* the debug string is already formatted with a newline */
-	pr_info("%s", buf);
-}
-
 /**
  * iavf_schedule_reset - Set the flags and schedule a reset event
  * @adapter: board private structure
-- 
2.21.0


^ permalink raw reply related

* [net-next 11/15] i40e: Implement debug macro hw_dbg using pr_debug
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem
  Cc: Mauro S. M. Rodrigues, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>

There are several uses of hw_dbg in the code, producing no output. This
patch implements it using pr_debug.

Initially the intention was to implement it using netdev_dbg, analogously
to what is done in ixgbe for instance. That approach was avoided due to
some early usages of hw_dbg, like i40e_pf_reset, before the VSI structure
initialization causing NULL pointer dereference during the driver probe if
the dbg messages were turned on as soon as the module is probed.

Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 1 +
 drivers/net/ethernet/intel/i40e/i40e_hmc.c    | 1 +
 drivers/net/ethernet/intel/i40e/i40e_osdep.h  | 7 ++++++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 46e649c09f72..d37c6e0e5f08 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
+#include "i40e.h"
 #include "i40e_type.h"
 #include "i40e_adminq.h"
 #include "i40e_prototype.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index 19ce93d7fd0a..163ee8c6311c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
+#include "i40e.h"
 #include "i40e_osdep.h"
 #include "i40e_register.h"
 #include "i40e_status.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
index a07574bff550..c0c9ce3eab23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
@@ -18,7 +18,12 @@
  * actual OS primitives
  */
 
-#define hw_dbg(hw, S, A...)	do {} while (0)
+#define hw_dbg(hw, S, A...)							\
+do {										\
+	int domain = pci_domain_nr(((struct i40e_pf *)(hw)->back)->pdev->bus);	\
+	pr_debug("i40e %04x:%02x:%02x.%x " S, domain, (hw)->bus.bus_id,		\
+		 (hw)->bus.device, (hw)->bus.func, ## A);			\
+} while (0)
 
 #define wr32(a, reg, value)	writel((value), ((a)->hw_addr + (reg)))
 #define rd32(a, reg)		readl((a)->hw_addr + (reg))
-- 
2.21.0


^ permalink raw reply related

* [net-next 14/15] igc: Add tx_csum offload functionality
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Sasha Neftin <sasha.neftin@intel.com>

Add IP generic TX checksum offload functionality.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h         |  4 +
 drivers/net/ethernet/intel/igc/igc_base.h    |  8 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  5 +
 drivers/net/ethernet/intel/igc/igc_main.c    | 97 ++++++++++++++++++++
 4 files changed, 114 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 0f5534ce27b0..7e16345d836e 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -135,6 +135,9 @@ extern char igc_driver_version[];
 /* How many Rx Buffers do we bundle into one write to the hardware ? */
 #define IGC_RX_BUFFER_WRITE	16 /* Must be power of 2 */
 
+/* VLAN info */
+#define IGC_TX_FLAGS_VLAN_MASK	0xffff0000
+
 /* igc_test_staterr - tests bits within Rx descriptor status and error fields */
 static inline __le32 igc_test_staterr(union igc_adv_rx_desc *rx_desc,
 				      const u32 stat_err_bits)
@@ -254,6 +257,7 @@ struct igc_ring {
 	u16 count;                      /* number of desc. in the ring */
 	u8 queue_index;                 /* logical index of the ring*/
 	u8 reg_idx;                     /* physical index of the ring */
+	bool launchtime_enable;		/* true if LaunchTime is enabled */
 
 	/* everything past this point are written often */
 	u16 next_to_clean;
diff --git a/drivers/net/ethernet/intel/igc/igc_base.h b/drivers/net/ethernet/intel/igc/igc_base.h
index 58d1109d7f3f..ea627ce52525 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.h
+++ b/drivers/net/ethernet/intel/igc/igc_base.h
@@ -22,6 +22,14 @@ union igc_adv_tx_desc {
 	} wb;
 };
 
+/* Context descriptors */
+struct igc_adv_tx_context_desc {
+	__le32 vlan_macip_lens;
+	__le32 launch_time;
+	__le32 type_tucmd_mlhl;
+	__le32 mss_l4len_idx;
+};
+
 /* Adv Transmit Descriptor Config Masks */
 #define IGC_ADVTXD_MAC_TSTAMP	0x00080000 /* IEEE1588 Timestamp packet */
 #define IGC_ADVTXD_DTYP_CTXT	0x00200000 /* Advanced Context Descriptor */
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 549134ecd105..f3f2325fe567 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -397,4 +397,9 @@
 #define IGC_VLAPQF_P_VALID(_n)	(0x1 << (3 + (_n) * 4))
 #define IGC_VLAPQF_QUEUE_MASK	0x03
 
+#define IGC_ADVTXD_MACLEN_SHIFT		9  /* Adv ctxt desc mac len shift */
+#define IGC_ADVTXD_TUCMD_IPV4		0x00000400  /* IP Packet Type:1=IPv4 */
+#define IGC_ADVTXD_TUCMD_L4T_TCP	0x00000800  /* L4 Packet Type of TCP */
+#define IGC_ADVTXD_TUCMD_L4T_SCTP	0x00001000 /* L4 packet TYPE of SCTP */
+
 #endif /* _IGC_DEFINES_H_ */
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 965d1c939f0f..63b62d74f961 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5,6 +5,11 @@
 #include <linux/types.h>
 #include <linux/if_vlan.h>
 #include <linux/aer.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include <linux/ip.h>
+
+#include <net/ipv6.h>
 
 #include "igc.h"
 #include "igc_hw.h"
@@ -790,8 +795,96 @@ static int igc_set_mac(struct net_device *netdev, void *p)
 	return 0;
 }
 
+static void igc_tx_ctxtdesc(struct igc_ring *tx_ring,
+			    struct igc_tx_buffer *first,
+			    u32 vlan_macip_lens, u32 type_tucmd,
+			    u32 mss_l4len_idx)
+{
+	struct igc_adv_tx_context_desc *context_desc;
+	u16 i = tx_ring->next_to_use;
+	struct timespec64 ts;
+
+	context_desc = IGC_TX_CTXTDESC(tx_ring, i);
+
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
+
+	/* set bits to identify this as an advanced context descriptor */
+	type_tucmd |= IGC_TXD_CMD_DEXT | IGC_ADVTXD_DTYP_CTXT;
+
+	/* For 82575, context index must be unique per ring. */
+	if (test_bit(IGC_RING_FLAG_TX_CTX_IDX, &tx_ring->flags))
+		mss_l4len_idx |= tx_ring->reg_idx << 4;
+
+	context_desc->vlan_macip_lens	= cpu_to_le32(vlan_macip_lens);
+	context_desc->type_tucmd_mlhl	= cpu_to_le32(type_tucmd);
+	context_desc->mss_l4len_idx	= cpu_to_le32(mss_l4len_idx);
+
+	/* We assume there is always a valid Tx time available. Invalid times
+	 * should have been handled by the upper layers.
+	 */
+	if (tx_ring->launchtime_enable) {
+		ts = ns_to_timespec64(first->skb->tstamp);
+		first->skb->tstamp = 0;
+		context_desc->launch_time = cpu_to_le32(ts.tv_nsec / 32);
+	} else {
+		context_desc->launch_time = 0;
+	}
+}
+
+static inline bool igc_ipv6_csum_is_sctp(struct sk_buff *skb)
+{
+	unsigned int offset = 0;
+
+	ipv6_find_hdr(skb, &offset, IPPROTO_SCTP, NULL, NULL);
+
+	return offset == skb_checksum_start_offset(skb);
+}
+
 static void igc_tx_csum(struct igc_ring *tx_ring, struct igc_tx_buffer *first)
 {
+	struct sk_buff *skb = first->skb;
+	u32 vlan_macip_lens = 0;
+	u32 type_tucmd = 0;
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+csum_failed:
+		if (!(first->tx_flags & IGC_TX_FLAGS_VLAN) &&
+		    !tx_ring->launchtime_enable)
+			return;
+		goto no_csum;
+	}
+
+	switch (skb->csum_offset) {
+	case offsetof(struct tcphdr, check):
+		type_tucmd = IGC_ADVTXD_TUCMD_L4T_TCP;
+		/* fall through */
+	case offsetof(struct udphdr, check):
+		break;
+	case offsetof(struct sctphdr, checksum):
+		/* validate that this is actually an SCTP request */
+		if ((first->protocol == htons(ETH_P_IP) &&
+		     (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||
+		    (first->protocol == htons(ETH_P_IPV6) &&
+		     igc_ipv6_csum_is_sctp(skb))) {
+			type_tucmd = IGC_ADVTXD_TUCMD_L4T_SCTP;
+			break;
+		}
+		/* fall through */
+	default:
+		skb_checksum_help(skb);
+		goto csum_failed;
+	}
+
+	/* update TX checksum flag */
+	first->tx_flags |= IGC_TX_FLAGS_CSUM;
+	vlan_macip_lens = skb_checksum_start_offset(skb) -
+			  skb_network_offset(skb);
+no_csum:
+	vlan_macip_lens |= skb_network_offset(skb) << IGC_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= first->tx_flags & IGC_TX_FLAGS_VLAN_MASK;
+
+	igc_tx_ctxtdesc(tx_ring, first, vlan_macip_lens, type_tucmd, 0);
 }
 
 static int __igc_maybe_stop_tx(struct igc_ring *tx_ring, const u16 size)
@@ -4116,6 +4209,9 @@ static int igc_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_sw_init;
 
+	/* Add supported features to the features list*/
+	netdev->features |= NETIF_F_HW_CSUM;
+
 	/* setup the private structure */
 	err = igc_sw_init(adapter);
 	if (err)
@@ -4123,6 +4219,7 @@ static int igc_probe(struct pci_dev *pdev,
 
 	/* copy netdev features into list of user selectable features */
 	netdev->hw_features |= NETIF_F_NTUPLE;
+	netdev->hw_features |= netdev->features;
 
 	/* MTU range: 68 - 9216 */
 	netdev->min_mtu = ETH_MIN_MTU;
-- 
2.21.0


^ permalink raw reply related

* [net-next 15/15] i40e: Add support for X710 device
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem
  Cc: Mariusz Stachura, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Mariusz Stachura <mariusz.stachura@intel.com>

Add I40E_DEV_ID_10G_BASE_T_BC to i40e_pci_tbl

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fdf43d87e983..a71369546c23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -73,6 +73,7 @@ static const struct pci_device_id i40e_pci_tbl[] = {
 	{PCI_VDEVICE(INTEL, I40E_DEV_ID_QSFP_C), 0},
 	{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T), 0},
 	{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T4), 0},
+	{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T_BC), 0},
 	{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_SFP), 0},
 	{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_B), 0},
 	{PCI_VDEVICE(INTEL, I40E_DEV_ID_KX_X722), 0},
-- 
2.21.0


^ permalink raw reply related

* [net-next 12/15] i40e: Remove EMPR traces from debugfs facility
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem
  Cc: Mauro S. M. Rodrigues, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>

Since commit
'5098850c9b9b ("i40e/i40evf: i40e_register.h updates")'
it is no longer possible to trigger an EMP Reset from debugfs, but it's
possible to request it either way, to end up with a bad reset request:

echo empr > /sys/kernel/debug/i40e/0002\:01\:00.1/command
i40e 0002:01:00.1: debugfs: forcing EMPR
i40e 0002:01:00.1: bad reset request 0x00010000

So let's remove this piece of code and show the available valid commands
as it is when any invalid command is issued.

Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         | 1 -
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 4 ----
 2 files changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 3e535d3263b3..f1a1bd324b50 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -131,7 +131,6 @@ enum i40e_state_t {
 	__I40E_PF_RESET_REQUESTED,
 	__I40E_CORE_RESET_REQUESTED,
 	__I40E_GLOBAL_RESET_REQUESTED,
-	__I40E_EMP_RESET_REQUESTED,
 	__I40E_EMP_RESET_INTR_RECEIVED,
 	__I40E_SUSPENDED,
 	__I40E_PTP_TX_IN_PROGRESS,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 41232898d8ae..99ea543dd245 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1125,10 +1125,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 		dev_info(&pf->pdev->dev, "debugfs: forcing GlobR\n");
 		i40e_do_reset_safe(pf, BIT(__I40E_GLOBAL_RESET_REQUESTED));
 
-	} else if (strncmp(cmd_buf, "empr", 4) == 0) {
-		dev_info(&pf->pdev->dev, "debugfs: forcing EMPR\n");
-		i40e_do_reset_safe(pf, BIT(__I40E_EMP_RESET_REQUESTED));
-
 	} else if (strncmp(cmd_buf, "read", 4) == 0) {
 		u32 address;
 		u32 value;
-- 
2.21.0


^ permalink raw reply related

* [net-next 13/15] ixgbe: sync the first fragment unconditionally
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem
  Cc: Firo Yang, netdev, nhorman, sassmann, Alexander Duyck,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Firo Yang <firo.yang@suse.com>

In Xen environment, if Xen-swiotlb is enabled, ixgbe driver
could possibly allocate a page, DMA memory buffer, for the first
fragment which is not suitable for Xen-swiotlb to do DMA operations.
Xen-swiotlb have to internally allocate another page for doing DMA
operations. This mechanism requires syncing the data from the internal
page to the page which ixgbe sends to upper network stack. However,
since commit f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path"), the unmap operation is performed with
DMA_ATTR_SKIP_CPU_SYNC. As a result, the sync is not performed.
Since the sync isn't performed, the upper network stack could receive
a incomplete network packet. By incomplete, it means the linear data
on the first fragment(between skb->head and skb->end) is invalid. So
we have to copy the data from the internal xen-swiotlb page to the page
which ixgbe sends to upper network stack through the sync operation.

More details from Alexander Duyck:
Specifically since we are mapping the frame with
DMA_ATTR_SKIP_CPU_SYNC we have to unmap with that as well. As a result
a sync is not performed on an unmap and must be done manually as we
skipped it for the first frag. As such we need to always sync before
possibly performing a page unmap operation.

Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path")
Signed-off-by: Firo Yang <firo.yang@suse.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 17b7ae9f46ec..f5fc5929a15d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1825,13 +1825,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
 static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 				struct sk_buff *skb)
 {
-	/* if the page was released unmap it, else just sync our portion */
-	if (unlikely(IXGBE_CB(skb)->page_released)) {
-		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
-				     ixgbe_rx_pg_size(rx_ring),
-				     DMA_FROM_DEVICE,
-				     IXGBE_RX_DMA_ATTR);
-	} else if (ring_uses_build_skb(rx_ring)) {
+	if (ring_uses_build_skb(rx_ring)) {
 		unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK;
 
 		dma_sync_single_range_for_cpu(rx_ring->dev,
@@ -1848,6 +1842,14 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 					      skb_frag_size(frag),
 					      DMA_FROM_DEVICE);
 	}
+
+	/* If the page was released, just unmap it. */
+	if (unlikely(IXGBE_CB(skb)->page_released)) {
+		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+				     ixgbe_rx_pg_size(rx_ring),
+				     DMA_FROM_DEVICE,
+				     IXGBE_RX_DMA_ATTR);
+	}
 }
 
 /**
-- 
2.21.0


^ permalink raw reply related

* [net-next 10/15] i40e: fix hw_dbg usage in i40e_hmc_get_object_va
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem
  Cc: Mauro S. M. Rodrigues, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>

The mentioned function references a i40e_hw attribute, as parameter for
hw_dbg, but it doesn't exist in the function scope.
Fixes it by changing parameters from i40e_hmc_info to i40e_hw which can
retrieve the necessary i40e_hmc_info.

Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
index 994011c38fb4..f059de33a0fd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
+#include "i40e.h"
 #include "i40e_osdep.h"
 #include "i40e_register.h"
 #include "i40e_type.h"
@@ -963,7 +964,7 @@ static i40e_status i40e_set_hmc_context(u8 *context_bytes,
 
 /**
  * i40e_hmc_get_object_va - retrieves an object's virtual address
- * @hmc_info: pointer to i40e_hmc_info struct
+ * @hw: the hardware struct, from which we obtain the i40e_hmc_info pointer
  * @object_base: pointer to u64 to get the va
  * @rsrc_type: the hmc resource type
  * @obj_idx: hmc object index
@@ -972,7 +973,7 @@ static i40e_status i40e_set_hmc_context(u8 *context_bytes,
  * base pointer.  This function is used for LAN Queue contexts.
  **/
 static
-i40e_status i40e_hmc_get_object_va(struct i40e_hmc_info *hmc_info,
+i40e_status i40e_hmc_get_object_va(struct i40e_hw *hw,
 					u8 **object_base,
 					enum i40e_hmc_lan_rsrc_type rsrc_type,
 					u32 obj_idx)
@@ -982,6 +983,7 @@ i40e_status i40e_hmc_get_object_va(struct i40e_hmc_info *hmc_info,
 	struct i40e_hmc_sd_entry *sd_entry;
 	struct i40e_hmc_pd_entry *pd_entry;
 	u32 pd_idx, pd_lmt, rel_pd_idx;
+	struct i40e_hmc_info *hmc_info = &hw->hmc;
 	u64 obj_offset_in_fpm;
 	u32 sd_idx, sd_lmt;
 
@@ -1047,7 +1049,7 @@ i40e_status i40e_clear_lan_tx_queue_context(struct i40e_hw *hw,
 	i40e_status err;
 	u8 *context_bytes;
 
-	err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+	err = i40e_hmc_get_object_va(hw, &context_bytes,
 				     I40E_HMC_LAN_TX, queue);
 	if (err < 0)
 		return err;
@@ -1068,7 +1070,7 @@ i40e_status i40e_set_lan_tx_queue_context(struct i40e_hw *hw,
 	i40e_status err;
 	u8 *context_bytes;
 
-	err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+	err = i40e_hmc_get_object_va(hw, &context_bytes,
 				     I40E_HMC_LAN_TX, queue);
 	if (err < 0)
 		return err;
@@ -1088,7 +1090,7 @@ i40e_status i40e_clear_lan_rx_queue_context(struct i40e_hw *hw,
 	i40e_status err;
 	u8 *context_bytes;
 
-	err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+	err = i40e_hmc_get_object_va(hw, &context_bytes,
 				     I40E_HMC_LAN_RX, queue);
 	if (err < 0)
 		return err;
@@ -1109,7 +1111,7 @@ i40e_status i40e_set_lan_rx_queue_context(struct i40e_hw *hw,
 	i40e_status err;
 	u8 *context_bytes;
 
-	err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+	err = i40e_hmc_get_object_va(hw, &context_bytes,
 				     I40E_HMC_LAN_RX, queue);
 	if (err < 0)
 		return err;
-- 
2.21.0


^ permalink raw reply related

* [net-next 09/15] igc: Remove unneeded PCI bus defines
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Sasha Neftin <sasha.neftin@intel.com>

PCIe device control 2 defines does not use internally.
This patch comes to clean up those.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_defines.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 11b99acf4abe..549134ecd105 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -10,10 +10,6 @@
 
 #define IGC_CTRL_EXT_DRV_LOAD	0x10000000 /* Drv loaded bit for FW */
 
-/* PCI Bus Info */
-#define PCIE_DEVICE_CONTROL2		0x28
-#define PCIE_DEVICE_CONTROL2_16ms	0x0005
-
 /* Physical Func Reset Done Indication */
 #define IGC_CTRL_EXT_LINK_MODE_MASK	0x00C00000
 
-- 
2.21.0


^ permalink raw reply related

* [net-next 06/15] fm10k: use a local variable for the frag pointer
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

In the function fm10k_xmit_frame_ring, we recently switched to using
the skb_frag_size accessor instead of directly using the size member of
the skb fragment.

This made the for loop slightly harder to read because it created a very
long line that is difficult to split up. Avoid this by using a local
variable in the for loop, so that we do not have to break the line on an
open parenthesis.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e0a2be534b20..2be9222510e7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1073,9 +1073,11 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
 	 *       + 2 desc gap to keep tail from touching head
 	 * otherwise try next time
 	 */
-	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
-		count += TXD_USE_COUNT(skb_frag_size(
-						&skb_shinfo(skb)->frags[f]));
+	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[f];
+
+		count += TXD_USE_COUNT(skb_frag_size(frag));
+	}
 
 	if (fm10k_maybe_stop_tx(tx_ring, count + 3)) {
 		tx_ring->tx_stats.tx_busy++;
-- 
2.21.0


^ permalink raw reply related

* [net-next 07/15] igc: Add NVM checksum validation
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Sasha Neftin <sasha.neftin@intel.com>

Add NVM checksum validation during probe functionality.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 251552855c40..965d1c939f0f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -4133,6 +4133,15 @@ static int igc_probe(struct pci_dev *pdev,
 	 */
 	hw->mac.ops.reset_hw(hw);
 
+	if (igc_get_flash_presence_i225(hw)) {
+		if (hw->nvm.ops.validate(hw) < 0) {
+			dev_err(&pdev->dev,
+				"The NVM Checksum Is Not Valid\n");
+			err = -EIO;
+			goto err_eeprom;
+		}
+	}
+
 	if (eth_platform_get_mac_address(&pdev->dev, hw->mac.addr)) {
 		/* copy the MAC address out of the NVM */
 		if (hw->mac.ops.read_mac_addr(hw))
-- 
2.21.0


^ permalink raw reply related

* [net-next 08/15] iavf: allow permanent MAC address to change
From: Jeff Kirsher @ 2019-08-28  6:44 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Allow the VF to override the "permanent" MAC address set by the host.
This allows bonding to work in the case where the administrator has set
the VF MAC.

Note that the VF must still be set to Trusted on the host if this change
is to be accepted by the PF driver.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h      | 1 -
 drivers/net/ethernet/intel/iavf/iavf_main.c | 4 ----
 2 files changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 9fc635d816d2..29de3ae96ef2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -253,7 +253,6 @@ struct iavf_adapter {
 #define IAVF_FLAG_RESET_PENDING		BIT(4)
 #define IAVF_FLAG_RESET_NEEDED		BIT(5)
 #define IAVF_FLAG_WB_ON_ITR_CAPABLE		BIT(6)
-#define IAVF_FLAG_ADDR_SET_BY_PF		BIT(8)
 #define IAVF_FLAG_SERVICE_CLIENT_REQUESTED	BIT(9)
 #define IAVF_FLAG_CLIENT_NEEDS_OPEN		BIT(10)
 #define IAVF_FLAG_CLIENT_NEEDS_CLOSE		BIT(11)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 554aa619ff02..07f5541a0f01 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -790,9 +790,6 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
 	if (ether_addr_equal(netdev->dev_addr, addr->sa_data))
 		return 0;
 
-	if (adapter->flags & IAVF_FLAG_ADDR_SET_BY_PF)
-		return -EPERM;
-
 	spin_lock_bh(&adapter->mac_vlan_list_lock);
 
 	f = iavf_find_filter(adapter, hw->mac.addr);
@@ -1811,7 +1808,6 @@ static int iavf_init_get_resources(struct iavf_adapter *adapter)
 		eth_hw_addr_random(netdev);
 		ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
 	} else {
-		adapter->flags |= IAVF_FLAG_ADDR_SET_BY_PF;
 		ether_addr_copy(netdev->dev_addr, adapter->hw.mac.addr);
 		ether_addr_copy(netdev->perm_addr, adapter->hw.mac.addr);
 	}
-- 
2.21.0


^ permalink raw reply related

* [net-next 05/15] Documentation: iavf: Update the Intel LAN driver doc for iavf
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, Aaron Brown
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

Update the LAN driver documentation to include the latest feature
implementation and driver capabilities.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
---
 .../networking/device_drivers/intel/iavf.rst  | 115 +++++++++++++-----
 1 file changed, 82 insertions(+), 33 deletions(-)

diff --git a/Documentation/networking/device_drivers/intel/iavf.rst b/Documentation/networking/device_drivers/intel/iavf.rst
index 2d0c3baa1752..cfc08842e32c 100644
--- a/Documentation/networking/device_drivers/intel/iavf.rst
+++ b/Documentation/networking/device_drivers/intel/iavf.rst
@@ -10,11 +10,15 @@ Copyright(c) 2013-2018 Intel Corporation.
 Contents
 ========
 
+- Overview
 - Identifying Your Adapter
 - Additional Configurations
 - Known Issues/Troubleshooting
 - Support
 
+Overview
+========
+
 This file describes the iavf Linux* Base Driver. This driver was formerly
 called i40evf.
 
@@ -27,6 +31,7 @@ The guest OS loading the iavf driver must support MSI-X interrupts.
 
 Identifying Your Adapter
 ========================
+
 The driver in this kernel is compatible with devices based on the following:
  * Intel(R) XL710 X710 Virtual Function
  * Intel(R) X722 Virtual Function
@@ -50,9 +55,10 @@ Link messages will not be displayed to the console if the distribution is
 restricting system messages. In order to see network driver link messages on
 your console, set dmesg to eight by entering the following::
 
-  dmesg -n 8
+    # dmesg -n 8
 
-NOTE: This setting is not saved across reboots.
+NOTE:
+  This setting is not saved across reboots.
 
 ethtool
 -------
@@ -72,11 +78,11 @@ then requests from that VF to set VLAN tag stripping will be ignored.
 To enable/disable VLAN tag stripping for a VF, issue the following command
 from inside the VM in which you are running the VF::
 
-  ethtool -K <if_name> rxvlan on/off
+    # ethtool -K <if_name> rxvlan on/off
 
 or alternatively::
 
-  ethtool --offload <if_name> rxvlan on/off
+    # ethtool --offload <if_name> rxvlan on/off
 
 Adaptive Virtual Function
 -------------------------
@@ -91,21 +97,21 @@ additional features depending on what features are available in the PF with
 which the AVF is associated. The following are base mode features:
 
 - 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
-  for Tx/Rx.
-- i40e descriptors and ring format.
-- Descriptor write-back completion.
-- 1 control queue, with i40e descriptors, CSRs and ring format.
-- 5 MSI-X interrupt vectors and corresponding i40e CSRs.
-- 1 Interrupt Throttle Rate (ITR) index.
-- 1 Virtual Station Interface (VSI) per VF.
+  for Tx/Rx
+- i40e descriptors and ring format
+- Descriptor write-back completion
+- 1 control queue, with i40e descriptors, CSRs and ring format
+- 5 MSI-X interrupt vectors and corresponding i40e CSRs
+- 1 Interrupt Throttle Rate (ITR) index
+- 1 Virtual Station Interface (VSI) per VF
 - 1 Traffic Class (TC), TC0
 - Receive Side Scaling (RSS) with 64 entry indirection table and key,
-  configured through the PF.
-- 1 unicast MAC address reserved per VF.
-- 16 MAC address filters for each VF.
-- Stateless offloads - non-tunneled checksums.
-- AVF device ID.
-- HW mailbox is used for VF to PF communications (including on Windows).
+  configured through the PF
+- 1 unicast MAC address reserved per VF
+- 16 MAC address filters for each VF
+- Stateless offloads - non-tunneled checksums
+- AVF device ID
+- HW mailbox is used for VF to PF communications (including on Windows)
 
 IEEE 802.1ad (QinQ) Support
 ---------------------------
@@ -117,8 +123,8 @@ VLAN ID, among other uses.
 
 The following are examples of how to configure 802.1ad (QinQ)::
 
-  ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
-  ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
+    # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
+    # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
 
 Where "24" and "371" are example VLAN IDs.
 
@@ -133,6 +139,19 @@ specific application. This can reduce latency for the specified application,
 and allow Tx traffic to be rate limited per application. Follow the steps below
 to set ADq.
 
+Requirements:
+
+- The sch_mqprio, act_mirred and cls_flower modules must be loaded
+- The latest version of iproute2
+- If another driver (for example, DPDK) has set cloud filters, you cannot
+  enable ADQ
+- Depending on the underlying PF device, ADQ cannot be enabled when the
+  following features are enabled:
+
+  + Data Center Bridging (DCB)
+  + Multiple Functions per Port (MFP)
+  + Sideband Filters
+
 1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
 The shaper bw_rlimit parameter is optional.
 
@@ -141,9 +160,9 @@ to 1Gbit for tc0 and 3Gbit for tc1.
 
 ::
 
-  # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
-  queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
-  max_rate 1Gbit 3Gbit
+    tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
+    queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
+    max_rate 1Gbit 3Gbit
 
 map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
 sets priorities 0-3 to use tc0 and 4-7 to use tc1)
@@ -162,6 +181,10 @@ Totals must be equal or less than port speed.
 For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
 monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
 
+NOTE:
+  Setting up channels via ethtool (ethtool -L) is not supported when the
+  TCs are configured using mqprio.
+
 2. Enable HW TC offload on interface::
 
     # ethtool -K <interface> hw-tc-offload on
@@ -171,16 +194,16 @@ monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
     # tc qdisc add dev <interface> ingress
 
 NOTES:
- - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
- - ADq is not compatible with cloud filters.
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
+ - ADq is not compatible with cloud filters
  - Setting up channels via ethtool (ethtool -L) is not supported when the TCs
-   are configured using mqprio.
+   are configured using mqprio
  - You must have iproute2 latest version
- - NVM version 6.01 or later is required.
+ - NVM version 6.01 or later is required
  - ADq cannot be enabled when any the following features are enabled: Data
-   Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters.
+   Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
  - If another driver (for example, DPDK) has set cloud filters, you cannot
-   enable ADq.
+   enable ADq
  - Tunnel filters are not supported in ADq. If encapsulated packets do arrive
    in non-tunnel mode, filtering will be done on the inner headers.  For example,
    for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
@@ -198,6 +221,16 @@ NOTES:
 Known Issues/Troubleshooting
 ============================
 
+Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
+---------------------------------------------------------------------------------
+If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
+series based device, the VF slaves may fail when they become the active slave.
+If the MAC address of the VF is set by the PF (Physical Function) of the
+device, when you add a slave, or change the active-backup slave, Linux bonding
+tries to sync the backup slave's MAC address to the same MAC address as the
+active slave. Linux bonding will fail at this point. This issue will not occur
+if the VF's MAC address is not set by the PF.
+
 Traffic Is Not Being Passed Between VM and Client
 -------------------------------------------------
 You may not be able to pass traffic between a client system and a
@@ -215,13 +248,28 @@ Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
 Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
 Once the VM shuts down, or otherwise releases the VF, the command will complete.
 
+Using four traffic classes fails
+--------------------------------
+Do not try to reserve more than three traffic classes in the iavf driver. Doing
+so will fail to set any traffic classes and will cause the driver to write
+errors to stdout. Use a maximum of three queues to avoid this issue.
+
+Multiple log error messages on iavf driver removal
+--------------------------------------------------
+If you have several VFs and you remove the iavf driver, several instances of
+the following log errors are written to the log::
+
+    Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
+    Unable to send the message to VF 2 aq_err 12
+    ARQ Overflow Error detected
+
 Virtual machine does not get link
 ---------------------------------
 If the virtual machine has more than one virtual port assigned to it, and those
 virtual ports are bound to different physical ports, you may not get link on
 all of the virtual ports. The following command may work around the issue::
 
-  ethtool -r <PF>
+    # ethtool -r <PF>
 
 Where <PF> is the PF interface in the host, for example: p5p1. You may need to
 run the command more than once to get link on all virtual ports.
@@ -251,12 +299,13 @@ traffic.
 If you have multiple interfaces in a server, either turn on ARP filtering by
 entering::
 
-  echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+    # echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
 
-NOTE: This setting is not saved across reboots. The configuration change can be
-made permanent by adding the following line to the file /etc/sysctl.conf::
+NOTE:
+  This setting is not saved across reboots. The configuration change can be
+  made permanent by adding the following line to the file /etc/sysctl.conf::
 
-  net.ipv4.conf.all.arp_filter = 1
+    net.ipv4.conf.all.arp_filter = 1
 
 Another alternative is to install the interfaces in separate broadcast domains
 (either in different switches or in a switch partitioned to VLANs).
-- 
2.21.0


^ permalink raw reply related

* [net-next 04/15] igc: Remove useless forward declaration
From: Jeff Kirsher @ 2019-08-28  6:43 UTC (permalink / raw)
  To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
	Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>

From: Sasha Neftin <sasha.neftin@intel.com>

Move igc_phy_setup_autoneg, igc_wait_autoneg and igc_set_fc_watermarks
up to avoid forward declaration.
It is not necessary to forward declare these static methods.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_mac.c |  73 +++++----
 drivers/net/ethernet/intel/igc/igc_phy.c | 192 +++++++++++------------
 2 files changed, 129 insertions(+), 136 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index ba4646737288..5eeb4c8caf4a 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -7,9 +7,6 @@
 #include "igc_mac.h"
 #include "igc_hw.h"
 
-/* forward declaration */
-static s32 igc_set_fc_watermarks(struct igc_hw *hw);
-
 /**
  * igc_disable_pcie_master - Disables PCI-express master access
  * @hw: pointer to the HW structure
@@ -74,6 +71,41 @@ void igc_init_rx_addrs(struct igc_hw *hw, u16 rar_count)
 		hw->mac.ops.rar_set(hw, mac_addr, i);
 }
 
+/**
+ * igc_set_fc_watermarks - Set flow control high/low watermarks
+ * @hw: pointer to the HW structure
+ *
+ * Sets the flow control high/low threshold (watermark) registers.  If
+ * flow control XON frame transmission is enabled, then set XON frame
+ * transmission as well.
+ */
+static s32 igc_set_fc_watermarks(struct igc_hw *hw)
+{
+	u32 fcrtl = 0, fcrth = 0;
+
+	/* Set the flow control receive threshold registers.  Normally,
+	 * these registers will be set to a default threshold that may be
+	 * adjusted later by the driver's runtime code.  However, if the
+	 * ability to transmit pause frames is not enabled, then these
+	 * registers will be set to 0.
+	 */
+	if (hw->fc.current_mode & igc_fc_tx_pause) {
+		/* We need to set up the Receive Threshold high and low water
+		 * marks as well as (optionally) enabling the transmission of
+		 * XON frames.
+		 */
+		fcrtl = hw->fc.low_water;
+		if (hw->fc.send_xon)
+			fcrtl |= IGC_FCRTL_XONE;
+
+		fcrth = hw->fc.high_water;
+	}
+	wr32(IGC_FCRTL, fcrtl);
+	wr32(IGC_FCRTH, fcrth);
+
+	return 0;
+}
+
 /**
  * igc_setup_link - Setup flow control and link settings
  * @hw: pointer to the HW structure
@@ -194,41 +226,6 @@ s32 igc_force_mac_fc(struct igc_hw *hw)
 	return ret_val;
 }
 
-/**
- * igc_set_fc_watermarks - Set flow control high/low watermarks
- * @hw: pointer to the HW structure
- *
- * Sets the flow control high/low threshold (watermark) registers.  If
- * flow control XON frame transmission is enabled, then set XON frame
- * transmission as well.
- */
-static s32 igc_set_fc_watermarks(struct igc_hw *hw)
-{
-	u32 fcrtl = 0, fcrth = 0;
-
-	/* Set the flow control receive threshold registers.  Normally,
-	 * these registers will be set to a default threshold that may be
-	 * adjusted later by the driver's runtime code.  However, if the
-	 * ability to transmit pause frames is not enabled, then these
-	 * registers will be set to 0.
-	 */
-	if (hw->fc.current_mode & igc_fc_tx_pause) {
-		/* We need to set up the Receive Threshold high and low water
-		 * marks as well as (optionally) enabling the transmission of
-		 * XON frames.
-		 */
-		fcrtl = hw->fc.low_water;
-		if (hw->fc.send_xon)
-			fcrtl |= IGC_FCRTL_XONE;
-
-		fcrth = hw->fc.high_water;
-	}
-	wr32(IGC_FCRTL, fcrtl);
-	wr32(IGC_FCRTH, fcrth);
-
-	return 0;
-}
-
 /**
  * igc_clear_hw_cntrs_base - Clear base hardware counters
  * @hw: pointer to the HW structure
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c
index 4c8f96a9a148..f4b05af0dd2f 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.c
+++ b/drivers/net/ethernet/intel/igc/igc_phy.c
@@ -3,10 +3,6 @@
 
 #include "igc_phy.h"
 
-/* forward declaration */
-static s32 igc_phy_setup_autoneg(struct igc_hw *hw);
-static s32 igc_wait_autoneg(struct igc_hw *hw);
-
 /**
  * igc_check_reset_block - Check if PHY reset is blocked
  * @hw: pointer to the HW structure
@@ -207,100 +203,6 @@ s32 igc_phy_hw_reset(struct igc_hw *hw)
 	return ret_val;
 }
 
-/**
- * igc_copper_link_autoneg - Setup/Enable autoneg for copper link
- * @hw: pointer to the HW structure
- *
- * Performs initial bounds checking on autoneg advertisement parameter, then
- * configure to advertise the full capability.  Setup the PHY to autoneg
- * and restart the negotiation process between the link partner.  If
- * autoneg_wait_to_complete, then wait for autoneg to complete before exiting.
- */
-static s32 igc_copper_link_autoneg(struct igc_hw *hw)
-{
-	struct igc_phy_info *phy = &hw->phy;
-	u16 phy_ctrl;
-	s32 ret_val;
-
-	/* Perform some bounds checking on the autoneg advertisement
-	 * parameter.
-	 */
-	phy->autoneg_advertised &= phy->autoneg_mask;
-
-	/* If autoneg_advertised is zero, we assume it was not defaulted
-	 * by the calling code so we set to advertise full capability.
-	 */
-	if (phy->autoneg_advertised == 0)
-		phy->autoneg_advertised = phy->autoneg_mask;
-
-	hw_dbg("Reconfiguring auto-neg advertisement params\n");
-	ret_val = igc_phy_setup_autoneg(hw);
-	if (ret_val) {
-		hw_dbg("Error Setting up Auto-Negotiation\n");
-		goto out;
-	}
-	hw_dbg("Restarting Auto-Neg\n");
-
-	/* Restart auto-negotiation by setting the Auto Neg Enable bit and
-	 * the Auto Neg Restart bit in the PHY control register.
-	 */
-	ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
-	if (ret_val)
-		goto out;
-
-	phy_ctrl |= (MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
-	ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
-	if (ret_val)
-		goto out;
-
-	/* Does the user want to wait for Auto-Neg to complete here, or
-	 * check at a later time (for example, callback routine).
-	 */
-	if (phy->autoneg_wait_to_complete) {
-		ret_val = igc_wait_autoneg(hw);
-		if (ret_val) {
-			hw_dbg("Error while waiting for autoneg to complete\n");
-			goto out;
-		}
-	}
-
-	hw->mac.get_link_status = true;
-
-out:
-	return ret_val;
-}
-
-/**
- * igc_wait_autoneg - Wait for auto-neg completion
- * @hw: pointer to the HW structure
- *
- * Waits for auto-negotiation to complete or for the auto-negotiation time
- * limit to expire, which ever happens first.
- */
-static s32 igc_wait_autoneg(struct igc_hw *hw)
-{
-	u16 i, phy_status;
-	s32 ret_val = 0;
-
-	/* Break after autoneg completes or PHY_AUTO_NEG_LIMIT expires. */
-	for (i = PHY_AUTO_NEG_LIMIT; i > 0; i--) {
-		ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
-		if (ret_val)
-			break;
-		ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
-		if (ret_val)
-			break;
-		if (phy_status & MII_SR_AUTONEG_COMPLETE)
-			break;
-		msleep(100);
-	}
-
-	/* PHY_AUTO_NEG_TIME expiration doesn't guarantee auto-negotiation
-	 * has completed.
-	 */
-	return ret_val;
-}
-
 /**
  * igc_phy_setup_autoneg - Configure PHY for auto-negotiation
  * @hw: pointer to the HW structure
@@ -485,6 +387,100 @@ static s32 igc_phy_setup_autoneg(struct igc_hw *hw)
 	return ret_val;
 }
 
+/**
+ * igc_wait_autoneg - Wait for auto-neg completion
+ * @hw: pointer to the HW structure
+ *
+ * Waits for auto-negotiation to complete or for the auto-negotiation time
+ * limit to expire, which ever happens first.
+ */
+static s32 igc_wait_autoneg(struct igc_hw *hw)
+{
+	u16 i, phy_status;
+	s32 ret_val = 0;
+
+	/* Break after autoneg completes or PHY_AUTO_NEG_LIMIT expires. */
+	for (i = PHY_AUTO_NEG_LIMIT; i > 0; i--) {
+		ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
+		if (ret_val)
+			break;
+		ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
+		if (ret_val)
+			break;
+		if (phy_status & MII_SR_AUTONEG_COMPLETE)
+			break;
+		msleep(100);
+	}
+
+	/* PHY_AUTO_NEG_TIME expiration doesn't guarantee auto-negotiation
+	 * has completed.
+	 */
+	return ret_val;
+}
+
+/**
+ * igc_copper_link_autoneg - Setup/Enable autoneg for copper link
+ * @hw: pointer to the HW structure
+ *
+ * Performs initial bounds checking on autoneg advertisement parameter, then
+ * configure to advertise the full capability.  Setup the PHY to autoneg
+ * and restart the negotiation process between the link partner.  If
+ * autoneg_wait_to_complete, then wait for autoneg to complete before exiting.
+ */
+static s32 igc_copper_link_autoneg(struct igc_hw *hw)
+{
+	struct igc_phy_info *phy = &hw->phy;
+	u16 phy_ctrl;
+	s32 ret_val;
+
+	/* Perform some bounds checking on the autoneg advertisement
+	 * parameter.
+	 */
+	phy->autoneg_advertised &= phy->autoneg_mask;
+
+	/* If autoneg_advertised is zero, we assume it was not defaulted
+	 * by the calling code so we set to advertise full capability.
+	 */
+	if (phy->autoneg_advertised == 0)
+		phy->autoneg_advertised = phy->autoneg_mask;
+
+	hw_dbg("Reconfiguring auto-neg advertisement params\n");
+	ret_val = igc_phy_setup_autoneg(hw);
+	if (ret_val) {
+		hw_dbg("Error Setting up Auto-Negotiation\n");
+		goto out;
+	}
+	hw_dbg("Restarting Auto-Neg\n");
+
+	/* Restart auto-negotiation by setting the Auto Neg Enable bit and
+	 * the Auto Neg Restart bit in the PHY control register.
+	 */
+	ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
+	if (ret_val)
+		goto out;
+
+	phy_ctrl |= (MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
+	ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
+	if (ret_val)
+		goto out;
+
+	/* Does the user want to wait for Auto-Neg to complete here, or
+	 * check at a later time (for example, callback routine).
+	 */
+	if (phy->autoneg_wait_to_complete) {
+		ret_val = igc_wait_autoneg(hw);
+		if (ret_val) {
+			hw_dbg("Error while waiting for autoneg to complete\n");
+			goto out;
+		}
+	}
+
+	hw->mac.get_link_status = true;
+
+out:
+	return ret_val;
+}
+
 /**
  * igc_setup_copper_link - Configure copper link settings
  * @hw: pointer to the HW structure
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH rdma-next v3 0/3] ODP support for mlx5 DC QPs
From: Leon Romanovsky @ 2019-08-28  7:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Michael Guralnik, Saeed Mahameed,
	linux-netdev
In-Reply-To: <20190827155140.GA15153@ziepe.ca>

On Tue, Aug 27, 2019 at 12:51:40PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 19, 2019 at 03:08:12PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Changelog
> >  v3:
> >  * Rewrote patches to expose through DEVX without need to change mlx5-abi.h at all.
> >  v2: https://lore.kernel.org/linux-rdma/20190806074807.9111-1-leon@kernel.org
> >  * Fixed reserved_* field wrong name (Saeed M.)
> >  * Split first patch to two patches, one for mlx5-next and one for  rdma-next. (Saeed M.)
> >  v1: https://lore.kernel.org/linux-rdma/20190804100048.32671-1-leon@kernel.org
> >  * Fixed alignment to u64 in mlx5-abi.h (Gal P.)
> >  v0: https://lore.kernel.org/linux-rdma/20190801122139.25224-1-leon@kernel.org
> >
> > >From Michael,
> >
> > The series adds support for on-demand paging for DC transport.
> >
> > As DC is mlx-only transport, the capabilities are exposed
> > to the user using DEVX objects and later on through mlx5dv_query_device.
> >
> > Thanks
> >
> > Michael Guralnik (3):
> >   net/mlx5: Set ODP capabilities for DC transport to max
> >   IB/mlx5: Remove check of FW capabilities in ODP page fault handling
> >   IB/mlx5: Add page fault handler for DC initiator WQE
>
> This seems fine, can you put the commit on the shared branch?

Thanks, applied to mlx5-next
00679b631edd net/mlx5: Set ODP capabilities for DC transport to max

>
> Thanks,
> Jason

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox