Netdev List
 help / color / mirror / Atom feed
* Good News
From: Mrs Julie Leach @ 2018-04-30 20:27 UTC (permalink / raw)


You are a recipient to Mrs Julie Leach Donation of $2 million USD. Contact 
(julieleach104@gmail.com) for claims.

^ permalink raw reply

* Re: [PATCH net-next v6 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Eric W. Biederman @ 2018-04-30 20:09 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: netdev, kexec, linux-fsdevel, linux-kernel, davem, viro, stephen,
	akpm, torvalds, ganeshgr, nirranjan, indranil
In-Reply-To: <cover.1525082669.git.rahul.lakkireddy@chelsio.com>

Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> v6:
> - Reworked device dump elf note name to contain vendor identifier.
> - Added vmcoredd_header that precedes actual dump in the Elf Note.
> - Device dump's name is moved inside vmcoredd_header.
> - Added "CHELSIO" string as vendor identifier in the Elf Note name
>   for cxgb4 device dumps.

Yep you did and that is not correct.  My apologies if I was unclear.

An elf note looks like this:

/* Note header in a PT_NOTE section */
typedef struct elf32_note {
  Elf32_Word	n_namesz;	/* Name size */
  Elf32_Word	n_descsz;	/* Content size */
  Elf32_Word	n_type;		/* Content type */
} Elf32_Nhdr;

n_descsz is is the length of the body.

n_namesz is the length of the ``vendor'' but not the vendor of the your
driver.  It is the ``vendor'' that defines n_type.  So "LINUX" in our
case.

The pair "LINUX", NT_VMCOREDD go together.  That pair is what a
subsequent program must look at to decide how to understand the note.

Please don't use CRASH_CORE_NOTE_HEAD BYTES that obscures things
unnecessarily, and really is not applicable to this use case.
ALIGN(sizeof(struct elf_note), 4) is almost as short and it makes
it clear what your are talking about.


Also please don't look too closely as the other note stuff for Elf core
dumps.  That stuff was not well done from an Elf standards and ABI
perspective.  Unfortunately by the time I had noticed it was years later
so not something that is worth the breakage in tools to change now.

Looking at struct vmcoredd_header.

That seems a reasonable structure.  However it is a uapi structure so
we need to carefully definite it that way.

Perhaps:

#define VMCOREDD_MAX_NAME_BYTES  32

struct vmcoredd_header {
	__u32 header_len;	/* Length of this header */
        __u32 reserved;
        __u64 data_len;		/* Length of this device dump */
        __u8 dump_name[VMCOREDD_MAX_NAME_BYTES];
        __u8 reserved2[16];
};


Looking at that I see another siginficant issue.  We can't let the
device dump be more than 32bits long.  The length of an elf
note is defined by an elf word which is universally a __u32.

So perhaps vmcoredd_header should look like:

#define VMCOREDD_MAX_NAME_BYTES  44

struct vmcoredd_header {
	__u32	n_namesz;	/* Name size */
        __u32	n_descsz;	/* Content size */
        __u32	n_type;		/* NT_VMDOREDD */
	__u8	name[8];	/* LINUX\0\0\0 */
        __u8	dump_name[VMCOREDD_MAX_NAME_BYTES];
};

The total length of the data dump would be descsz - sizeof(struct
vmcoredd_header).  The header winds up being a constant 64 bytes this
way, and includes the elf note header.

Eric

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] ipv6: sr: Add seg6local action End.BPF
From: Mathieu Xhonneux @ 2018-04-30 20:06 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, David Lebrun, Daniel Borkmann
In-Reply-To: <20180428000141.f63iw36nfr3bbz2r@ast-mbp>

2018-04-28 2:01 GMT+02:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
>
> On Fri, Apr 27, 2018 at 10:59:19AM -0400, David Miller wrote:
> > From: Mathieu Xhonneux <m.xhonneux@gmail.com>
> > Date: Tue, 24 Apr 2018 18:44:15 +0100
> >
> > > This patch adds the End.BPF action to the LWT seg6local infrastructure.
> > > This action works like any other seg6local End action, meaning that an IPv6
> > > header with SRH is needed, whose DA has to be equal to the SID of the
> > > action. It will also advance the SRH to the next segment, the BPF program
> > > does not have to take care of this.
> >
> > I'd like to see some BPF developers review this change.
> >
> > But on my side I wonder if, instead of validating the whole thing afterwards,
> > we should make the helpers accessible by the eBPF program validate the changes
> > as they are made.
>
> Looking at the code I don't think it's possible to keep it valid all the time
> while building, so seg6_validate_srh() after the program run seems necessary.


Indeed, e.g. to add a TLV in the SRH one needs to call
bpf_lwt_seg6_adjust_srh (to add some room for the TLV), then
bpf_lwt_seg6_store_bytes() (to fill the space with the TLV). Between
those two calls, the SRH is in an invalid state.

>
>
> I think the whole set should be targeting bpf-next tree.
> Please fix kbuild errors, rebase and document new helper in man-page style.
> Things like:
> +       test_btf_haskv.o test_btf_nokv.o test_lwt_seg6local.o
> +>>>>>>> selftests/bpf: test for seg6local End.BPF action
> should be fixed properly.


Oops, I didn't catch this one, thanks. I'll send a v3 towards bpf-next.

^ permalink raw reply

* [PATCH net-next] udp: disable gso with no_check_tx
From: Willem de Bruijn @ 2018-04-30 19:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Syzbot managed to send a udp gso packet without checksum offload into
the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This
triggered the skb_warn_bad_offload.

  RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658
   skb_gso_segment include/linux/netdevice.h:4038 [inline]
   validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120
   __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577
   dev_queue_xmit+0x17/0x20 net/core/dev.c:3618

UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the
udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to
catch and fail in this case.

After the integrity checks jump directly to the CHECKSUM_PARTIAL case
to avoid reading the no_check_tx flags again (a TOCTTOU race).

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/udp.c | 4 ++++
 net/ipv6/udp.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 794aeafeb782..dd3102a37ef9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -786,11 +786,14 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 			return -EINVAL;
 		if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS)
 			return -EINVAL;
+		if (sk->sk_no_check_tx)
+			return -EINVAL;
 		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite)
 			return -EIO;
 
 		skb_shinfo(skb)->gso_size = cork->gso_size;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+		goto csum_partial;
 	}
 
 	if (is_udplite)  				 /*     UDP-Lite      */
@@ -802,6 +805,7 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 		goto send;
 
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
+csum_partial:
 
 		udp4_hwcsum(skb, fl4->saddr, fl4->daddr);
 		goto send;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6acfdd3e442b..a34e28ac03a7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1051,11 +1051,14 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 			return -EINVAL;
 		if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS)
 			return -EINVAL;
+		if (udp_sk(sk)->no_check6_tx)
+			return -EINVAL;
 		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite)
 			return -EIO;
 
 		skb_shinfo(skb)->gso_size = cork->gso_size;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+		goto csum_partial;
 	}
 
 	if (is_udplite)
@@ -1064,6 +1067,7 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 		skb->ip_summed = CHECKSUM_NONE;
 		goto send;
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
+csum_partial:
 		udp6_hwcsum_outgoing(sk, skb, &fl6->saddr, &fl6->daddr, len);
 		goto send;
 	} else
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

* Re: Representing cpu-port of a switch
From: Florian Fainelli @ 2018-04-30 19:45 UTC (permalink / raw)
  To: sk.syed2; +Cc: netdev, jayaram, syeds, andrew, vivien.didelot
In-Reply-To: <CAE6JOJ909uWjzr1gOic59=cvMfJbGWB28-b4CnjMr-hXHVNP+w@mail.gmail.com>

On 04/30/2018 06:06 AM, sk.syed2 wrote:
>> Again, pretty standard. What you probably meant is that for host destined or initiated traffic you must use that internal port to egress/ingress frames towards the other external ports.
>>
> Yes.
> 
>>
>>> Without tagging, we cant really use DSA, and hide the cpu/dsa port. So
>>> if we expose this cpu port as a interface with fixed-phy
>>> infrastructure does it create any problems?
>>
>> Well the fact that you don't have a tagging protocol does not really mean you cannot use DSA. You could create your own tagging format which in your case could be as simple as keeping the prepended DMA packet descriptor and have the parsing of that descriptor be done in a DSA tagger driver.
> This would need HW change wont it?
> 
>  Not that I would necessarily recommend that though, see below.
>>
> 
>>  DSA documentation says one
>>> cannot open a socket on cpu/dsa port and send/receive traffic. Is it
>>> fairly common to use internal/cpu port as a network interface- i.e,
>>> creating a socket and send/receive traffic?
>>
>> In the context of DSA, all external ports are represented by a network device. This means that the CPU/management port is only used to ingress/egress frames that include the tag which either the switch hardware inserts on its way to the host or conversely that the host must insert to have the switch do the appropriate switching operation. If you do not use tags and you still have a way to target specific external ports the same representation should happen and you do not want to expose the internal port because it will only be used to send/receive traffic from the external ports and it will not be used to send or receive traffic to itself (so to speak).
> Just like with DSA you should have the ability to create network
> devices that are per-port and you can use the HW provided information
> to deliver packets to the appropriate destination port, conversely
> send from the appropriate source port.
>>
>>
> The switch HW doesn't have any provision to target specific external
> port. I think I didnt make this clear. We would like to integrate a
> switch along with an endpoint into a single HW. Meaning we would like
> to use internal cpu port as any normal network port to send and
> receive traffic. The external network ports will not be used to
> send/receive normal traffic(except for control/mgmt frames). So, the
> two external front panel ports tied to phys are lets say eth1, eth2.

The only problem with that approach is that eth1 and eth2 are only
control interfaces, they cannot transfer any data, eth0 does that, see
below.

> The cpu port tied to DMAs is represented using fixed-link/always_on
> interface say ep0(endpoint). Now applications can open a socket and
> send/receive traffic from ep0. The switch does forwarding based on
> programmed CAM. Do you see any problem with this approach?

No, this is entirely similar to what we do in DSA with DSA_PROTO_TAG_NONE.

In premise, if you created a VLAN identifier for each of your front
panel port, you could emulate in SW what you do not have in HW which is
to target specific front panel ports.

> 
> 
> 
>>> One problem is how to report back when network errors(like if both
>>> front panel ports are disconnected, the expectation is to bring this
>>> cpu port down?).
>>
>> The CPU port should be considered always UP and the external ports must have proper link notifications in place through PHYLIB/PHYLINK preferably. With link management in place the carrier state is properly managed and the network stack won't send traffic from a port that is not UP.
>>
>>> We also need to offload all the switch configuration to switch-dev. So
>>> the question is using switch-dev without DSA and representing a cpu
>>> port as a normal network interface would be ok?
>>
>> Using switchdev without DSA is absolutely okay, see rocker and mlxsw, but neither of those do represent their CPU/management port for the reasons outlined above that it is only used to convey traffic to/from other ports that have a proper network device representation.
>>
> May be this is something elementary, but what do you mean by proper
> network device representation that cpu port lacks?

CPU ports in these cases are not created as a separate net_device
instance, they exist in the HW and at the SW level because we need to
use them to direct traffic to/from specific front-panel port using a
specific DMA capability (e.g: a special set of bits in a DMA
descriptor). In your case, you don't have that ability in your HW, so
you must actually create a CPU net_device for applications to be able to
send traffic.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-04-30 19:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown
In-Reply-To: <20180430071208.GG23854@nanopsycho.orion>

On 4/30/2018 12:12 AM, Jiri Pirko wrote:
> Mon, Apr 30, 2018 at 05:00:33AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/28/2018 1:24 AM, Jiri Pirko wrote:
>>> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>>> netdev is present with the same MAC address. It allows live migration
>>>> of a VM with a direct attached VF without the need to setup a bond/team
>>>> between a VF and virtio net device in the guest.
>>>>
>>>> The hypervisor needs to enable only one datapath at any time so that
>>>> packets don't get looped back to the VM over the other datapath. When a VF
>>> Why? Both datapaths could be enabled at a time. Why the loop on
>>> hypervisor side would be a problem. This in not an issue for
>>> bonding/team as well.
>> Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
>> sent over the VF datapath don't get looped back to the VM via the virtio-net datapth.
> Why? Please see below.
>
>
>> This can happen if both datapaths are enabled at the same time.
>>
>> I would think this is an issue even with bonding/team as well when virtio-net and
>> VF are backed by the same PF.
>>
>>
> I believe that the scenario is the same as on an ordinary nic/swich
> network:
>
> ...................
>
>    host
>   
>         bond0
>        /     \
>      eth0   eth1
>       |       |
> ...................
>       |       |
>       p1      p2
>
>    switch
>
> ...................
>
> It is perfectly valid to p1 and p2 be up and "bridged" together. Bond
> has to cope with loop-backed frames. "Failover driver" should too,
> it's the same scenario.

OK. So looks like we should be able to handle this by returning RX_HANDLER_EXACT
for frames received on standby device when primary is present.

^ permalink raw reply

* Re: [RFC net-next 0/5] Support for PHY test modes
From: Florian Fainelli @ 2018-04-30 19:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Miller, netdev, rmk, linux-kernel, cphealy, nikita.yoush,
	vivien.didelot, Nisar.Sayed, UNGLinuxDriver
In-Reply-To: <20180430164012.GB16854@lunn.ch>

On 04/30/2018 09:40 AM, Andrew Lunn wrote:
>> Turning these tests on will typically result in the link partner
>> dropping the link with us, and the interface will be non-functional as
>> far as the data path is concerned (similar to an isolation mode). This
>> might warrant properly reporting that to user-space through e.g: a
>> private IFF_* value maybe?
> 
> Hi Florian
> 
> I've not looked at the code yet....
> 
> Is it also necessary to kick off auto-neg again after the test has
> finished, in order to reestablish the link?

It would yes. Right now there is a test mode exposed named "normal"
which really, means: bring back to the PHY to an operation state. This
state likely does not belong here and we would have to introduce flags
instead such as start and stop the test instead.

Please review the patches when we get a chance, because I suspect this
is not the only issue they have ;)
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v9 1/4] virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
From: Samudrala, Sridhar @ 2018-04-30 19:14 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown
In-Reply-To: <20180430070340.GF23854@nanopsycho.orion>


On 4/30/2018 12:03 AM, Jiri Pirko wrote:
> Mon, Apr 30, 2018 at 04:47:03AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/28/2018 12:50 AM, Jiri Pirko wrote:
>>> Fri, Apr 27, 2018 at 07:06:57PM CEST,sridhar.samudrala@intel.com  wrote:
>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>> act as a standby for another device with the same MAC address.
>>>>
>>>> VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
>>>>
>>>> Signed-off-by: Sridhar Samudrala<sridhar.samudrala@intel.com>
>>>> ---
>>>> drivers/net/virtio_net.c        | 2 +-
>>>> include/uapi/linux/virtio_net.h | 3 +++
>>>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 3b5991734118..51a085b1a242 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -2999,7 +2999,7 @@ static struct virtio_device_id id_table[] = {
>>>> 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>>>> 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>>>> 	VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>>>> -	VIRTIO_NET_F_SPEED_DUPLEX
>>>> +	VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
>>> This is not part of current qemu master (head 6f0c4706b35dead265509115ddbd2a8d1af516c1)
>>> Were I can find the qemu code?
>>>
>>> Also, I think it makes sense to push HW (qemu HW in this case) first
>>> and only then the driver.
>> I had sent qemu patch with a couple of earlier versions of this patchset.
>> Will include it when i send out v10.
> The point was, don't you want to push it to qemu first? Did you at least
> send RFC to qemu?

Yes. Here is the link to the RFC patch.
https://patchwork.ozlabs.org/patch/859521/

^ permalink raw reply

* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: John Stoffel @ 2018-04-30 18:27 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: John Stoffel, Andrew, eric.dumazet, mst, edumazet, netdev,
	jasowang, Randy Dunlap, linux-kernel, Matthew Wilcox, Hocko,
	James Bottomley, Michal, dm-devel, David Miller, David Rientjes,
	Morton, virtualization, linux-mm, Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804261726540.13401@file01.intranet.prod.int.rdu2.redhat.com>

>>>>> "Mikulas" == Mikulas Patocka <mpatocka@redhat.com> writes:

Mikulas> On Thu, 26 Apr 2018, John Stoffel wrote:

>> >>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> 
James> I may be an atypical developer but I'd rather have a root canal
James> than browse through menuconfig options.  The way to get people
James> to learn about new debugging options is to blog about it (or
James> write an lwn.net article) which google will find the next time
James> I ask it how I debug XXX.  Google (probably as a service to
James> humanity) rarely turns up Kconfig options in response to a
James> query.
>> 
>> I agree with James here.  Looking at the SLAB vs SLUB Kconfig entries
>> tells me *nothing* about why I should pick one or the other, as an
>> example.
>> 
>> John

Mikulas> I see your point - and I think the misunderstanding is this.

Thanks.

Mikulas> This patch is not really helping people to debug existing crashes. It is 
Mikulas> not like "you get a crash" - "you google for some keywords" - "you get a 
Mikulas> page that suggests to turn this option on" - "you turn it on and solve the 
Mikulas> crash".

Mikulas> What this patch really does is that - it makes the kernel deliberately 
Mikulas> crash in a situation when the code violates the specification, but it 
Mikulas> would not crash otherwise or it would crash very rarely. It helps to 
Mikulas> detect specification violations.

Mikulas> If the kernel developer (or tester) doesn't use this option, his buggy 
Mikulas> code won't crash - and if it won't crash, he won't fix the bug or report 
Mikulas> it. How is the user or developer supposed to learn about this option, if 
Mikulas> he gets no crash at all?

So why do we make this a KConfig option at all?  Just turn it on and
let it rip.  Now I also think that Linus has the right idea to not
just sprinkle BUG_ONs into the code, just dump and oops and keep going
if you can.  If it's a filesystem or a device, turn it read only so
that people notice right away.

^ permalink raw reply

* Re: Request for stable 4.14.x inclusion: net: don't call update_pmtu unconditionally
From: Greg KH @ 2018-04-30 18:22 UTC (permalink / raw)
  To: Eddie Chapman; +Cc: Thomas Deutschmann, stable, davem, nicolas.dichtel, netdev
In-Reply-To: <7485fe73-0c33-e5b4-6c9b-c8c7a359be4a@ehuk.net>

On Fri, Apr 27, 2018 at 07:43:52PM +0100, Eddie Chapman wrote:
> On 27/04/18 19:07, Thomas Deutschmann wrote:
> > Hi Greg,
> > 
> > first, we need to cherry-pick another patch first:
> > >  From 52a589d51f1008f62569bf89e95b26221ee76690 Mon Sep 17 00:00:00 2001
> > > From: Xin Long <lucien.xin@gmail.com>
> > > Date: Mon, 25 Dec 2017 14:43:58 +0800
> > > Subject: [PATCH] geneve: update skb dst pmtu on tx path
> > > 
> > > Commit a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") has fixed
> > > a performance issue caused by the change of lower dev's mtu for vxlan.
> > > 
> > > The same thing needs to be done for geneve as well.
> > > 
> > > Note that geneve cannot adjust it's mtu according to lower dev's mtu
> > > when creating it. The performance is very low later when netperfing
> > > over it without fixing the mtu manually. This patch could also avoid
> > > this issue.
> > > 
> > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Oops, I completely missed that the coreos patch doesn't have the geneve hunk
> that is in the original 4.15 patch. I don't load the geneve module on my box
> hence why no problems surfaced on my machine.

The geneve hunk doesn't apply at all to the 4.14.y tree, so I think
someone has a messed up tree somewhere...

I'll go look into this now.

greg k-h

^ permalink raw reply

* INFO: rcu detected stall in kfree_skbmem
From: syzbot @ 2018-04-30 18:09 UTC (permalink / raw)
  To: avagin, davem, ktkhai, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    5d1365940a68 Merge  
git://git.kernel.org/pub/scm/linux/kerne...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?id=5667997129637888
kernel config:   
https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
dashboard link: https://syzkaller.appspot.com/bug?extid=fc78715ba3b3257caf6a
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+fc78715ba3b3257caf6a@syzkaller.appspotmail.com

INFO: rcu_sched self-detected stall on CPU
	1-...!: (1 GPs behind) idle=a3e/1/4611686018427387908 softirq=71980/71983  
fqs=33
	 (t=125000 jiffies g=39438 c=39437 q=958)
rcu_sched kthread starved for 124829 jiffies! g39438 c39437 f0x0  
RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
RCU grace-period kthread stack dump:
rcu_sched       R  running task    23768     9      2 0x80000000
Call Trace:
  context_switch kernel/sched/core.c:2848 [inline]
  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
  schedule+0xef/0x430 kernel/sched/core.c:3549
  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
  kthread+0x345/0x410 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
NMI backtrace for cpu 1
CPU: 1 PID: 20560 Comm: syz-executor4 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
  __rcu_pending kernel/rcu/tree.c:3356 [inline]
  rcu_pending kernel/rcu/tree.c:3401 [inline]
  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
  __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:kmem_cache_free+0xb3/0x2d0 mm/slab.c:3757
RSP: 0018:ffff8801db105228 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: ffff8800b055c940 RCX: 1ffff1003b2345a5
RDX: 0000000000000000 RSI: ffff8801d91a2d80 RDI: 0000000000000282
RBP: ffff8801db105248 R08: ffff8801d91a2cb8 R09: 0000000000000002
R10: ffff8801d91a2480 R11: 0000000000000000 R12: ffff8801d9848e40
R13: 0000000000000282 R14: ffffffff85b7f27c R15: 0000000000000000
  kfree_skbmem+0x13c/0x210 net/core/skbuff.c:582
  __kfree_skb net/core/skbuff.c:642 [inline]
  kfree_skb+0x19d/0x560 net/core/skbuff.c:659
  enqueue_to_backlog+0x2fc/0xc90 net/core/dev.c:3968
  netif_rx_internal+0x14d/0xae0 net/core/dev.c:4181
  netif_rx+0xba/0x400 net/core/dev.c:4206
  loopback_xmit+0x283/0x741 drivers/net/loopback.c:91
  __netdev_start_xmit include/linux/netdevice.h:4087 [inline]
  netdev_start_xmit include/linux/netdevice.h:4096 [inline]
  xmit_one net/core/dev.c:3053 [inline]
  dev_hard_start_xmit+0x264/0xc10 net/core/dev.c:3069
  __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
  neigh_hh_output include/net/neighbour.h:472 [inline]
  neigh_output include/net/neighbour.h:480 [inline]
  ip6_finish_output2+0x134e/0x2810 net/ipv6/ip6_output.c:120
  ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
  ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
  dst_output include/net/dst.h:444 [inline]
  NF_HOOK include/linux/netfilter.h:288 [inline]
  ip6_xmit+0xf51/0x23f0 net/ipv6/ip6_output.c:277
  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:lock_release+0x4d4/0xa10 kernel/locking/lockdep.c:3942
RSP: 0018:ffff8801971ce7b0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffff10032e39cfb RCX: 1ffff1003b234595
RDX: 1ffffffff11630ed RSI: 0000000000000002 RDI: 0000000000000282
RBP: ffff8801971ce8e0 R08: 1ffff10032e39cff R09: ffffed003b6246c2
R10: 0000000000000003 R11: 0000000000000001 R12: ffff8801d91a2480
R13: ffffffff88b8df60 R14: ffff8801d91a2480 R15: ffff8801971ce7f8
  rcu_lock_release include/linux/rcupdate.h:251 [inline]
  rcu_read_unlock include/linux/rcupdate.h:688 [inline]
  __unlock_page_memcg+0x72/0x100 mm/memcontrol.c:1654
  unlock_page_memcg+0x2c/0x40 mm/memcontrol.c:1663
  page_remove_file_rmap mm/rmap.c:1248 [inline]
  page_remove_rmap+0x6f2/0x1250 mm/rmap.c:1299
  zap_pte_range mm/memory.c:1337 [inline]
  zap_pmd_range mm/memory.c:1441 [inline]
  zap_pud_range mm/memory.c:1470 [inline]
  zap_p4d_range mm/memory.c:1491 [inline]
  unmap_page_range+0xeb4/0x2200 mm/memory.c:1512
  unmap_single_vma+0x1a0/0x310 mm/memory.c:1557
  unmap_vmas+0x120/0x1f0 mm/memory.c:1587
  exit_mmap+0x265/0x570 mm/mmap.c:3038
  __mmput kernel/fork.c:962 [inline]
  mmput+0x251/0x610 kernel/fork.c:983
  exit_mm kernel/exit.c:544 [inline]
  do_exit+0xe98/0x2730 kernel/exit.c:852
  do_group_exit+0x16f/0x430 kernel/exit.c:968
  get_signal+0x886/0x1960 kernel/signal.c:2469
  do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
  exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
  do_syscall_64+0x792/0x9d0 arch/x86/entry/common.c:292
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455319
RSP: 002b:00007fa346e81ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000072bf80 RCX: 0000000000455319
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf80
RBP: 000000000072bf80 R08: 0000000000000000 R09: 000000000072bf58
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000a3e81f R14: 00007fa346e829c0 R15: 0000000000000001


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* INFO: rcu detected stall in kmem_cache_alloc_node_trace
From: syzbot @ 2018-04-30 18:05 UTC (permalink / raw)
  To: davem, linux-kernel, linux-sctp, netdev, nhorman, syzkaller-bugs,
	vyasevich

Hello,

syzbot found the following crash on:

HEAD commit:    17dec0a94915 Merge branch 'userns-linus' of  
git://git.kerne...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?id=6093051722203136
kernel config:   
https://syzkaller.appspot.com/x/.config?id=-2735707888269579554
dashboard link: https://syzkaller.appspot.com/bug?extid=deec965c578bb9b81613
compiler:       gcc (GCC) 8.0.1 20180301 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+deec965c578bb9b81613@syzkaller.appspotmail.com

sctp: [Deprecated]: syz-executor3 (pid 10218) Use of int in max_burst  
socket option.
Use struct sctp_assoc_value instead
sctp: [Deprecated]: syz-executor3 (pid 10218) Use of int in max_burst  
socket option.
Use struct sctp_assoc_value instead
random: crng init done
INFO: rcu_sched self-detected stall on CPU
	0-....: (120712 ticks this GP) idle=ac6/1/4611686018427387908  
softirq=31693/31693 fqs=31173
	 (t=125001 jiffies g=17039 c=17038 q=303419)
NMI backtrace for cpu 0
CPU: 0 PID: 10218 Comm: syz-executor3 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x1b9/0x29f lib/dump_stack.c:53
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
  __rcu_pending kernel/rcu/tree.c:3356 [inline]
  rcu_pending kernel/rcu/tree.c:3401 [inline]
  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
  tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:162
  tick_sched_timer+0x42/0x130 kernel/time/tick-sched.c:1170
  __run_hrtimer kernel/time/hrtimer.c:1349 [inline]
  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1411
  hrtimer_interrupt+0x2f3/0x750 kernel/time/hrtimer.c:1469
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:lock_is_held_type+0x18b/0x210 kernel/locking/lockdep.c:3960
RSP: 0018:ffff8801db006400 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
RAX: dffffc0000000000 RBX: 0000000000000282 RCX: 0000000000000000
RDX: 1ffffffff1162e55 RSI: ffffffff88b90c60 RDI: 0000000000000282
RBP: ffff8801db006420 R08: ffffed003b6046c3 R09: ffffed003b6046c2
R10: ffffed003b6046c2 R11: ffff8801db023613 R12: ffff8801b2f623c0
R13: 0000000000000000 R14: ffff88009932bb00 R15: 00000000ffffffff
  lock_is_held include/linux/lockdep.h:344 [inline]
  rcu_read_lock_sched_held+0x108/0x120 kernel/rcu/update.c:117
  trace_kmalloc_node include/trace/events/kmem.h:100 [inline]
  kmem_cache_alloc_node_trace+0x34e/0x770 mm/slab.c:3652
  __do_kmalloc_node mm/slab.c:3669 [inline]
  __kmalloc_node_track_caller+0x33/0x70 mm/slab.c:3684
  __kmalloc_reserve.isra.38+0x3a/0xe0 net/core/skbuff.c:137
  __alloc_skb+0x14d/0x780 net/core/skbuff.c:205
  alloc_skb include/linux/skbuff.h:987 [inline]
  sctp_packet_transmit+0x45e/0x3ba0 net/sctp/output.c:585
  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:console_unlock+0xcdf/0x1100 kernel/printk/printk.c:2403
RSP: 0018:ffff8801946eec00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff12
RAX: 0000000000040000 RBX: 0000000000000200 RCX: ffffc90002ee8000
RDX: 0000000000004461 RSI: ffffffff815f3446 RDI: 0000000000000212
RBP: ffff8801946eed68 R08: ffff8801b2f62c38 R09: 0000000000000006
R10: ffff8801b2f623c0 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff84b84430 R14: 0000000000000001 R15: dffffc0000000000
  vprintk_emit+0x6ad/0xdd0 kernel/printk/printk.c:1907
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:379
  printk+0x9e/0xba kernel/printk/printk.c:1980
  sctp_getsockopt_maxburst net/sctp/socket.c:6265 [inline]
  sctp_getsockopt.cold.34+0x11d/0x14c net/sctp/socket.c:7240
  sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2998
  __sys_getsockopt+0x1a5/0x370 net/socket.c:1940
  SYSC_getsockopt net/socket.c:1951 [inline]
  SyS_getsockopt+0x34/0x50 net/socket.c:1948
  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455279
RSP: 002b:00007f5c0c0f2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00007f5c0c0f36d4 RCX: 0000000000455279
RDX: 0000000000000014 RSI: 0000000000000084 RDI: 0000000000000014
RBP: 000000000072bea0 R08: 0000000020000240 R09: 0000000000000000
R10: 0000000020000140 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000012b R14: 00000000006f4ca8 R15: 0000000000000000
INFO: rcu_sched detected stalls on CPUs/tasks:
	0-....: (120712 ticks this GP) idle=ac6/1/4611686018427387908  
softirq=31693/31693 fqs=31173
	(detected by 1, t=125002 jiffies, g=17039, c=17038, q=303419)
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10218 Comm: syz-executor3 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__lock_acquire+0xa78/0x5130 kernel/locking/lockdep.c:3434
RSP: 0018:ffff8801db005b40 EFLAGS: 00000046
RAX: dffffc0000000000 RBX: 00000000858813a6 RCX: 0000000000000001
RDX: 1ffff100365ec585 RSI: ffff8801b2f62c38 RDI: ffff8801b2f62cf9
RBP: ffff8801db005ed0 R08: 0000000000000008 R09: 0000000000000004
R10: ffff8801b2f62cd8 R11: ffff8801b2f623c0 R12: 00000000be6c7baf
R13: 0000000000000000 R14: 2ca977e0cccfd81f R15: 0000000000000000
FS:  00007f5c0c0f3700(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe57df8fa8 CR3: 00000001d7124000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <IRQ>
  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
  rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
  rcu_read_lock include/linux/rcupdate.h:632 [inline]
  is_bpf_text_address+0x3b/0x170 kernel/bpf/core.c:478
  kernel_text_address+0x79/0xf0 kernel/extable.c:152
  __kernel_text_address+0xd/0x40 kernel/extable.c:107
  unwind_get_return_address+0x61/0xa0 arch/x86/kernel/unwind_frame.c:18
  __save_stack_trace+0x7e/0xd0 arch/x86/kernel/stacktrace.c:45
  save_stack_trace+0x1a/0x20 arch/x86/kernel/stacktrace.c:60
  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
  set_track mm/kasan/kasan.c:459 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527
  __cache_free mm/slab.c:3486 [inline]
  kmem_cache_free+0x86/0x2d0 mm/slab.c:3744
  kfree_skbmem+0x13c/0x210 net/core/skbuff.c:582
  __kfree_skb net/core/skbuff.c:642 [inline]
  consume_skb+0x193/0x550 net/core/skbuff.c:701
  sctp_chunk_destroy net/sctp/sm_make_chunk.c:1477 [inline]
  sctp_chunk_put+0x2c0/0x440 net/sctp/sm_make_chunk.c:1504
  sctp_chunk_free+0x53/0x60 net/sctp/sm_make_chunk.c:1491
  sctp_packet_pack net/sctp/output.c:489 [inline]
  sctp_packet_transmit+0x142e/0x3ba0 net/sctp/output.c:610
  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:console_unlock+0xcdf/0x1100 kernel/printk/printk.c:2403
RSP: 0018:ffff8801946eec00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff12
RAX: 0000000000040000 RBX: 0000000000000200 RCX: ffffc90002ee8000
RDX: 0000000000004461 RSI: ffffffff815f3446 RDI: 0000000000000212
RBP: ffff8801946eed68 R08: ffff8801b2f62c38 R09: 0000000000000006
R10: ffff8801b2f623c0 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff84b84430 R14: 0000000000000001 R15: dffffc0000000000
  vprintk_emit+0x6ad/0xdd0 kernel/printk/printk.c:1907
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:379
  printk+0x9e/0xba kernel/printk/printk.c:1980
  sctp_getsockopt_maxburst net/sctp/socket.c:6265 [inline]
  sctp_getsockopt.cold.34+0x11d/0x14c net/sctp/socket.c:7240
  sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2998
  __sys_getsockopt+0x1a5/0x370 net/socket.c:1940
  SYSC_getsockopt net/socket.c:1951 [inline]
  SyS_getsockopt+0x34/0x50 net/socket.c:1948
  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455279
RSP: 002b:00007f5c0c0f2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00007f5c0c0f36d4 RCX: 0000000000455279
RDX: 0000000000000014 RSI: 0000000000000084 RDI: 0000000000000014
RBP: 000000000072bea0 R08: 0000000020000240 R09: 0000000000000000
R10: 0000000020000140 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000012b R14: 00000000006f4ca8 R15: 0000000000000000
Code: 0f 85 dc 32 00 00 8b 0d b7 9e 8b 07 85 c9 0f 84 62 f7 ff ff 48 b8 00  
00 00 00 00 fc ff df 48 8b 54 24 78 48 c1 ea 03 80 3c 02 00 <0f> 85 0b 31  
00 00 48 8b 94 24 88 00 00 00 4d 89 b3 68 08 00 00
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 3.007  
msecs
INFO: task kworker/u4:4:688 blocked for more than 120 seconds.
       Not tainted 4.16.0+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:4    D19560   688      2 0x80000000
Workqueue: events_unbound fsnotify_mark_destroy_workfn
Call Trace:
  context_switch kernel/sched/core.c:2848 [inline]
  __schedule+0x807/0x1e40 kernel/sched/core.c:3490
  schedule+0xef/0x430 kernel/sched/core.c:3549
  schedule_timeout+0x1b5/0x240 kernel/time/timer.c:1777
  do_wait_for_common kernel/sched/completion.c:83 [inline]
  __wait_for_common kernel/sched/completion.c:104 [inline]
  wait_for_common kernel/sched/completion.c:115 [inline]
  wait_for_completion+0x3e7/0x870 kernel/sched/completion.c:136
  __synchronize_srcu+0x189/0x240 kernel/rcu/srcutree.c:924
  synchronize_srcu+0x408/0x54f kernel/rcu/srcutree.c:1002
  fsnotify_mark_destroy_workfn+0x1aa/0x530 fs/notify/mark.c:759
  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
  kthread+0x345/0x410 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411

Showing all locks held in the system:
2 locks held by kworker/u4:4/688:
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
set_work_data kernel/workqueue.c:617 [inline]
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
  #0: 000000007b6e92d0 ((wq_completion)"events_unbound"){+.+.}, at:  
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
  #1: 000000004c7e11cf ((reaper_work).work){+.+.}, at:  
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
2 locks held by khungtaskd/878:
  #0: 000000001cc267e2 (rcu_read_lock){....}, at:  
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
  #0: 000000001cc267e2 (rcu_read_lock){....}, at: watchdog+0x1ff/0xf60  
kernel/hung_task.c:249
  #1: 000000002f71223f (tasklist_lock){.+.+}, at:  
debug_show_all_locks+0xde/0x34a kernel/locking/lockdep.c:4470
2 locks held by kworker/1:2/1980:
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: atomic64_set  
include/asm-generic/atomic-instrumented.h:40 [inline]
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: atomic_long_set  
include/asm-generic/atomic-long.h:57 [inline]
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at: set_work_data  
kernel/workqueue.c:617 [inline]
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
  #0: 00000000dc6681fd ((wq_completion)"events"){+.+.}, at:  
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
  #1: 000000008e69c2e7 (xfrm_state_gc_work){+.+.}, at:  
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
1 lock held by rsyslogd/4365:
  #0: 00000000ef321630 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x1a9/0x1e0  
fs/file.c:766
2 locks held by getty/4456:
  #0: 0000000044e43f49 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 0000000093b079e0 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4457:
  #0: 000000005b25b55f (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 0000000099955ee5 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4458:
  #0: 0000000050f5738d (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000cccc0402 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4459:
  #0: 00000000ddecfb5c (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 000000003c071f3a (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4460:
  #0: 000000003bec706e (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000dd28453c (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4461:
  #0: 00000000313fc55a (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000e9766156 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4462:
  #0: 000000003212bb13 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 0000000065fa2f73 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 878 Comm: khungtaskd Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x1b9/0x29f lib/dump_stack.c:53
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_task kernel/hung_task.c:132 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
  watchdog+0xc10/0xf60 kernel/hung_task.c:249
  kthread+0x345/0x410 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10218 Comm: syz-executor3 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:debug_lockdep_rcu_enabled.part.1+0xb/0x60 kernel/rcu/update.c:298
RSP: 0018:ffff8801db0062e8 EFLAGS: 00000202
RAX: dffffc0000000000 RBX: 1ffff1003b600c61 RCX: ffffffff86583d15
RDX: 0000000000000004 RSI: ffffffff86583d65 RDI: ffffffff88e6cc40
RBP: ffff8801db0062f8 R08: ffff8801b2f623c0 R09: ffffed003b6046c2
R10: ffffed003b6046c2 R11: ffff8801db023613 R12: ffff8801c3d740c0
R13: 0000000000000001 R14: 0000000000010000 R15: ffff88019cc0a900
FS:  00007f5c0c0f3700(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe57df8fa8 CR3: 00000001d7124000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <IRQ>
  rcu_read_unlock include/linux/rcupdate.h:684 [inline]
  ip6_mtu+0x36a/0x590 net/ipv6/route.c:2420
  dst_mtu include/net/dst.h:210 [inline]
  ip6_xmit+0xb42/0x23f0 net/ipv6/ip6_output.c:262
  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:642
  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783  
[inline]
RIP: 0010:console_unlock+0xcdf/0x1100 kernel/printk/printk.c:2403
RSP: 0018:ffff8801946eec00 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff12
RAX: 0000000000040000 RBX: 0000000000000200 RCX: ffffc90002ee8000
RDX: 0000000000004461 RSI: ffffffff815f3446 RDI: 0000000000000212
RBP: ffff8801946eed68 R08: ffff8801b2f62c38 R09: 0000000000000006
R10: ffff8801b2f623c0 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff84b84430 R14: 0000000000000001 R15: dffffc0000000000
  vprintk_emit+0x6ad/0xdd0 kernel/printk/printk.c:1907
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1947
  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:379
  printk+0x9e/0xba kernel/printk/printk.c:1980
  sctp_getsockopt_maxburst net/sctp/socket.c:6265 [inline]
  sctp_getsockopt.cold.34+0x11d/0x14c net/sctp/socket.c:7240
  sock_common_getsockopt+0x9a/0xe0 net/core/sock.c:2998
  __sys_getsockopt+0x1a5/0x370 net/socket.c:1940
  SYSC_getsockopt net/socket.c:1951 [inline]
  SyS_getsockopt+0x34/0x50 net/socket.c:1948
  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455279
RSP: 002b:00007f5c0c0f2c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00007f5c0c0f36d4 RCX: 0000000000455279
RDX: 0000000000000014 RSI: 0000000000000084 RDI: 0000000000000014
RBP: 000000000072bea0 R08: 0000000020000240 R09: 0000000000000000
R10: 0000000020000140 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000012b R14: 00000000006f4ca8 R15: 0000000000000000
Code: e9 c3 fd ff ff 48 8b 7d c8 e8 52 6c 50 00 e9 9c fd ff ff 0f 1f 00 66  
2e 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 55 <48> 89 e5 53  
65 48 8b 1c 25 c0 ed 01 00 48 8d bb 74 08 00 00 48


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: Performance regressions in TCP_STREAM tests in Linux 4.15 (and later)
From: Eric Dumazet @ 2018-04-30 17:47 UTC (permalink / raw)
  To: Ben Greear, Steven Rostedt, Michael Wenig
  Cc: netdev@vger.kernel.org, Shilpi Agarwal, Boon Ang, Darren Hart,
	Steven Rostedt, Abdul Anshad Azeez
In-Reply-To: <7e1f00ad-d859-0aab-c953-d6da2efe11f0@gmail.com>



On 04/30/2018 09:36 AM, Eric Dumazet wrote:
> 
> 
> On 04/30/2018 09:14 AM, Ben Greear wrote:
>> On 04/27/2018 08:11 PM, Steven Rostedt wrote:
>>>
>>> We'd like this email archived in netdev list, but since netdev is
>>> notorious for blocking outlook email as spam, it didn't go through. So
>>> I'm replying here to help get it into the archives.
>>>
>>> Thanks!
>>>
>>> -- Steve
>>>
>>>
>>> On Fri, 27 Apr 2018 23:05:46 +0000
>>> Michael Wenig <mwenig@vmware.com> wrote:
>>>
>>>> As part of VMware's performance testing with the Linux 4.15 kernel,
>>>> we identified CPU cost and throughput regressions when comparing to
>>>> the Linux 4.14 kernel. The impacted test cases are mostly TCP_STREAM
>>>> send tests when using small message sizes. The regressions are
>>>> significant (up 3x) and were tracked down to be a side effect of Eric
>>>> Dumazat's RB tree changes that went into the Linux 4.15 kernel.
>>>> Further investigation showed our use of the TCP_NODELAY flag in
>>>> conjunction with Eric's change caused the regressions to show and
>>>> simply disabling TCP_NODELAY brought performance back to normal.
>>>> Eric's change also resulted into significant improvements in our
>>>> TCP_RR test cases.
>>>>
>>>>
>>>>
>>>> Based on these results, our theory is that Eric's change made the
>>>> system overall faster (reduced latency) but as a side effect less
>>>> aggregation is happening (with TCP_NODELAY) and that results in lower
>>>> throughput. Previously even though TCP_NODELAY was set, system was
>>>> slower and we still got some benefit of aggregation. Aggregation
>>>> helps in better efficiency and higher throughput although it can
>>>> increase the latency. If you are seeing a regression in your
>>>> application throughput after this change, using TCP_NODELAY might
>>>> help bring performance back however that might increase latency.
>>
>> I guess you mean _disabling_ TCP_NODELAY instead of _using_ TCP_NODELAY?
>>
> 
> Yeah, I guess auto-corking does not work as intended.

I would try the following patch :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 44be7f43455e4aefde8db61e2d941a69abcc642a..c9d00ef54deca15d5760bcbe154001a96fa1e2a7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -697,7 +697,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
 {
        return skb->len < size_goal &&
               sock_net(sk)->ipv4.sysctl_tcp_autocorking &&
-              skb != tcp_write_queue_head(sk) &&
+              !tcp_rtx_queue_empty(sk) &&
               refcount_read(&sk->sk_wmem_alloc) > skb->truesize;
 }
 

^ permalink raw reply related

* Re: [PATCH bpf-next] bpf: relax constraints on formatting for eBPF helper documentation
From: Quentin Monnet @ 2018-04-30 17:45 UTC (permalink / raw)
  To: Edward Cree, daniel, ast; +Cc: dsahern, yhs, netdev, oss-drivers
In-Reply-To: <ecc5415e-0a60-ba6b-84a7-d39bb8b4a0d5@solarflare.com>

2018-04-30 17:33 UTC+0100 ~ Edward Cree <ecree@solarflare.com>
> On 30/04/18 16:59, Quentin Monnet wrote:
>> The Python script used to parse and extract eBPF helpers documentation
>> from include/uapi/linux/bpf.h expects a very specific formatting for the
>> descriptions (single dots represent a space, '>' stands for a tab):
>>
>>     /*
>>      ...
>>      *.int bpf_helper(list of arguments)
>>      *.>    Description
>>      *.>    >       Start of description
>>      *.>    >       Another line of description
>>      *.>    >       And yet another line of description
>>      *.>    Return
>>      *.>    >       0 on success, or a negative error in case of failure
>>      ...
>>      */
>>
>> This is too strict, and painful for developers who wants to add
>> documentation for new helpers. Worse, it is extremelly difficult to
>> check that the formatting is correct during reviews. Change the
>> format expected by the script and make it more flexible. The script now
>> works whether or not the initial space (right after the star) is
>> present, and accepts both tabs and white spaces (or a combination of
>> both) for indenting description sections and contents.
>>
>> Concretely, something like the following would now be supported:
>>
>>     /*
>>      ...
>>      *int bpf_helper(list of arguments)
>>      *......Description
>>      *.>    >       Start of description...
>>      *>     >       Another line of description
>>      *..............And yet another line of description
>>      *>     Return
>>      *.>    ........0 on success, or a negative error in case of failure
>>      ...
>>      */
>>
>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>> ---
>>  scripts/bpf_helpers_doc.py | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
>> index 30ba0fee36e4..717547e6f0a6 100755
>> --- a/scripts/bpf_helpers_doc.py
>> +++ b/scripts/bpf_helpers_doc.py
>> @@ -87,7 +87,7 @@ class HeaderParser(object):
>>          #   - Same as above, with "const" and/or "struct" in front of type
>>          #   - "..." (undefined number of arguments, for bpf_trace_printk())
>>          # There is at least one term ("void"), and at most five arguments.
>> -        p = re.compile('^ \* ((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
>> +        p = re.compile('^ \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
> The proper coding style for such things is to go straight to tabs after
>  the star and not have the space.  So if we're going to make the script
>  flexible here (and leave coding style enforcement to other tools such
>  as checkpatch), maybe the regexen should just begin '^ \*\s+' and avoid
>  relying on counting indentation to delimit sections (e.g. scan for the
>  section headers like '^ \*\s+Description$' instead).

Thanks Edward! I agree it would be cleaner. However, with the current
format of the doc, I see two shortcomings.

- First we need a way to detect the end of a section. There is no
"Return" section for helper returning void, so we cannot rely on it to
end the "Description" section. And there is no delimiter to indicate the
end of the description of a given helper. We cannot assume that a string
matching a function definition, alone on its line, indicate the start of
a new helper (this is not the case). So as I see it, this would at least
require some delimiter between the descriptions of different functions
in bpf.h. I could add them if you think this is better.

- Also, we loose the possibility to further indent the text from the
description. Think about code snippets in descriptions: were we to
extract the lines with a regex such as / *\s+(.*)/, I see no way to get
the additional indent that should appear in the man page, if we do not
know what indent level was used for the helper description. I do not see
any simple workaround.

This being said, I am ready to bring whatever changes are needed to make
writing new helper doc easier, so I am open to suggestions if you have
workarounds for these or if the consensus is that the formatting should
be completely revised.

> Btw, leading '^' is unnecessary as re.match() is already implicitly
>  anchored at start-of-string.  (The trailing '$' are still needed.)

Oh, thanks! I'll fix that.

Quentin

^ permalink raw reply

* Re: [PATCH net-next 1/1] inet_diag: fetch cong algo info when socket is destroyed
From: Jamal Hadi Salim @ 2018-04-30 17:33 UTC (permalink / raw)
  To: David Miller; +Cc: kraig, netdev, eric.dumazet, kernel, hadi
In-Reply-To: <20180429.203124.963666195389191670.davem@davemloft.net>

On 29/04/18 08:31 PM, David Miller wrote:

> Well, two things:
> 
> 1) The congestion control info is opt-in, meaning that the user gets
>     it in the dump if they ask for it.
> 
>     This information is opt-in, because otherwise the dumps get really
>     large.
> 
>     Therefore, emitting this stuff by default on destroys in a
>     non-starter.
> 

There are two options that I investigated:
Add a setsockopt() for a new group that indicate "give me the congestion
info in addition" or add a similar knob at bind() time. Either of those
approaches would require bigger surgeries. If you think either of those
is reasonable i will work in that direction.

Note: Vegas adds 4 32-bit words; BBR 5 32-bit words; the congestion
name another 16B worst case.
In the larger scope of things that is very small extra data and saves
all the complexity of the other approaches.

> 2) The TCP_TIME_WAIT test is not there for looks.  You need to add it
>     also to the destroy case, and guess what?  All the sockets you will
>     see will not pass that test.
> 

The TCP_TIME_WAIT test makes sense for a live socket. This sock is
past that stage.

> I'm not applying this, sorry.  I really think things are go as-is, and
> if you really truly want the congestion control information you can
> ask for it while the socket is still alive, and is in the proper state
> to sample the congestion control state before you kill it off.

I am avoiding the polling for scaling reasons. It worked fine for
small number of sockets.

cheers,
jamal

^ permalink raw reply

* [PATCH v2] ethtool: fix a potential missing-check bug
From: Wenwen Wang @ 2018-04-30 17:31 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Kangjie Lu, David S. Miller, Florian Fainelli, Andrew Lunn,
	Edward Cree, Russell King, Alan Brady, Stephen Hemminger,
	Eugenia Emantayev, Inbar Karmy, Vidya Sagar Ravipati, Yury Norov,
	Al Viro, open list:NETWORKING [GENERAL], open list

In ethtool_get_rxnfc(), the object "info" is firstly copied from
user-space. If the FLOW_RSS flag is set in the member field flow_type of
"info" (and cmd is ETHTOOL_GRXFH), info needs to be copied again from
user-space because FLOW_RSS is newer and has new definition, as mentioned
in the comment. However, given that the user data resides in user-space, a
malicious user can race to change the data after the first copy. By doing
so, the user can inject inconsistent data. For example, in the second
copy, the FLOW_RSS flag could be cleared in the field flow_type of "info".
In the following execution, "info" will be used in the function
ops->get_rxnfc(). Such inconsistent data can potentially lead to unexpected
information leakage since ops->get_rxnfc() will prepare various types of
data according to flow_type, and the prepared data will be eventually
copied to user-space. This inconsistent data may also cause undefined
behaviors based on how ops->get_rxnfc() is implemented.

This patch simply re-verifies the flow_type field of "info" after the
second copy. If the value is not as expected, an error code will be
returned.

Signed-off-by: Wenwen Wang <wang6495@umn.edu>
---
 net/core/ethtool.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03416e6..ba02f0d 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1032,6 +1032,11 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
 		info_size = sizeof(info);
 		if (copy_from_user(&info, useraddr, info_size))
 			return -EFAULT;
+		/* Since malicious users may modify the original data,
+		 * we need to check whether FLOW_RSS is still requested.
+		 */
+		if (!(info.flow_type & FLOW_RSS))
+			return -EINVAL;
 	}
 
 	if (info.cmd == ETHTOOL_GRXCLSRLALL) {
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH RFC 6/9] veth: Add ndo_xdp_xmit
From: Jesper Dangaard Brouer @ 2018-04-30 17:27 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Toshiaki Makita, netdev, brouer, Tariq Toukan
In-Reply-To: <da8cb809-ab79-1373-f69b-9eed323b99f6@lab.ntt.co.jp>

On Thu, 26 Apr 2018 19:52:40 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> On 2018/04/26 5:24, Jesper Dangaard Brouer wrote:
> > On Tue, 24 Apr 2018 23:39:20 +0900
> > Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:
> >   
> >> +static int veth_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
> >> +{
> >> +	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
> >> +	int headroom = frame->data - (void *)frame;
> >> +	struct net_device *rcv;
> >> +	int err = 0;
> >> +
> >> +	rcv = rcu_dereference(priv->peer);
> >> +	if (unlikely(!rcv))
> >> +		return -ENXIO;
> >> +
> >> +	rcv_priv = netdev_priv(rcv);
> >> +	/* xdp_ring is initialized on receive side? */
> >> +	if (rcu_access_pointer(rcv_priv->xdp_prog)) {
> >> +		err = xdp_ok_fwd_dev(rcv, frame->len);
> >> +		if (unlikely(err))
> >> +			return err;
> >> +
> >> +		err = veth_xdp_enqueue(rcv_priv, veth_xdp_to_ptr(frame));
> >> +	} else {
> >> +		struct sk_buff *skb;
> >> +
> >> +		skb = veth_build_skb(frame, headroom, frame->len, 0);
> >> +		if (unlikely(!skb))
> >> +			return -ENOMEM;
> >> +
> >> +		/* Get page ref in case skb is dropped in netif_rx.
> >> +		 * The caller is responsible for freeing the page on error.
> >> +		 */
> >> +		get_page(virt_to_page(frame->data));  
> > 
> > I'm not sure you can make this assumption, that xdp_frames coming from
> > another device driver uses a refcnt based memory model. But maybe I'm
> > confused, as this looks like an SKB receive path, but in the
> > ndo_xdp_xmit().  
> 
> I find this path similar to cpumap, which creates skb from redirected
> xdp frame. Once it is converted to skb, skb head is freed by
> page_frag_free, so anyway I needed to get the refcount here regardless
> of memory model.

Yes I know, I wrote cpumap ;-)

First of all, I don't want to see such xdp_frame to SKB conversion code
in every driver.  Because that increase the chances of errors.  And
when looking at the details, then it seems that you have made the
mistake of making it possible to leak xdp_frame info to the SKB (which
cpumap takes into account).

Second, I think the refcnt scheme here is wrong. The xdp_frame should
be "owned" by XDP and have the proper refcnt to deliver it directly to
the network stack.

Third, if we choose that we want a fallback, in-case XDP is not enabled
on egress dev (but it have an ndo_xdp_xmit), then it should be placed
in the generic/core code.  E.g. __bpf_tx_xdp_map() could look at the
return code from dev->netdev_ops->ndo_xdp() and create an SKB.  (Hint,
this would make it easy to implement TX bulking towards the dev).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


+static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
+				      int buflen)
+{
+	struct sk_buff *skb;
+
+	if (!buflen) {
+		buflen = SKB_DATA_ALIGN(headroom + len) +
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	}
+	skb = build_skb(head, buflen);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+
+	return skb;
+}

^ permalink raw reply

* [PATCH bpf-next] bpf/verifier: enable ctx + const + 0.
From: William Tu @ 2018-04-30 17:15 UTC (permalink / raw)
  To: netdev; +Cc: Yonghong Song, Yifeng Sun

Existing verifier does not allow 'ctx + const + const'.  However, due to
compiler optimization, there is a case where BPF compilerit generates
'ctx + const + 0', as shown below:

  599: (1d) if r2 == r4 goto pc+2
   R0=inv(id=0) R1=ctx(id=0,off=40,imm=0)
   R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
   R3=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R4=inv0
   R6=ctx(id=0,off=0,imm=0) R7=inv2
  600: (bf) r1 = r6			// r1 is ctx
  601: (07) r1 += 36			// r1 has offset 36
  602: (61) r4 = *(u32 *)(r1 +0)	// r1 + 0
  dereference of modified ctx ptr R1 off=36+0, ctx+const is allowed,
  ctx+const+const is not

The reason for BPF backend generating this code is due optimization
likes this, explained from Yonghong:
    if (...)
        *(ctx + 60)
    else
        *(ctx + 56)

The compiler translates it to
    if (...)
       ptr = ctx + 60
    else
       ptr = ctx + 56
    *(ptr + 0)

So load ptr memory become an example of 'ctx + const + 0'.  This patch
enables support for this case.

Fixes: f8ddadc4db6c7 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
 kernel/bpf/verifier.c                       |  2 +-
 tools/testing/selftests/bpf/test_verifier.c | 13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 712d8655e916..c9a791b9cf2a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1638,7 +1638,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		/* ctx accesses must be at a fixed offset, so that we can
 		 * determine what type of data were returned.
 		 */
-		if (reg->off) {
+		if (reg->off && off != reg->off) {
 			verbose(env,
 				"dereference of modified ctx ptr R%d off=%d+%d, ctx+const is allowed, ctx+const+const is not\n",
 				regno, reg->off, off - reg->off);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 1acafe26498b..95ad5d5723ae 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -8452,6 +8452,19 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
+		"arithmetic ops make PTR_TO_CTX + const + 0 valid",
+		.insns = {
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				      offsetof(struct __sk_buff, data) -
+				      offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
 		"pkt_end - pkt_start is allowed",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH V2 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Soheil Hassas Yeganeh @ 2018-04-30 16:59 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet,
	Willem de Bruijn
In-Reply-To: <20180430.121024.1354407786745903932.davem@davemloft.net>

On Mon, Apr 30, 2018 at 12:10 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 30 Apr 2018 09:01:47 -0700
>
>> TCP sockets are read by a single thread really (or synchronized
>> threads), or garbage is ensured, regardless of how the kernel
>> ensures locking while reporting "queue length"
>
> Whatever applications "typically do", we should never return
> garbage, and that is what this code allowing to happen.
>
> Everything else in recvmsg() operates on state under the proper socket
> lock, to ensure consistency.
>
> The only reason we are releasing the socket lock first it to make sure
> the backlog is processed and we have the most update information
> available.
>
> It seems like one is striving for correctness and better accuracy, no?
> :-)
>
> Look, this can be fixed really simply.  And if you are worried about
> unbounded loops if two apps maliciously do recvmsg() in parallel,
> then don't even loop, just fallback to full socket locking and make
> the "non-typical" application pay the price:
>
>         tmp1 = A;
>         tmp2 = B;
>         barrier();
>         tmp3 = A;
>         if (unlikely(tmp1 != tmp3)) {
>                 lock_sock(sk);
>                 tmp1 = A;
>                 tmp2 = B;
>                 release_sock(sk);
>         }
>
> I'm seriously not applying the patch as-is, sorry.  This issue
> must be addressed somehow.

Thank you David for the suggestion. Sure, I'll send a V3 with what you
suggested above.

Thanks,
Soheil

^ permalink raw reply

* [net-next 6/9] i40e: Fix multiple issues with UDP tunnel offload filter configuration
From: Jeff Kirsher @ 2018-04-30 17:00 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This fixes at least 2 issues I have found with the UDP tunnel filter
configuration.

The first issue is the fact that the tunnels didn't have any sort of mutual
exclusion in place to prevent an update from racing with a user request to
add/remove a port. As such you could request to add and remove a port
before the port update code had a chance to respond which would result in a
very confusing result. To address it I have added 2 changes. First I added
the RTNL mutex wrapper around our updating of the pending, port, and
filter_index bits. Second I added logic so that we cannot use a port that
has a pending deletion since we need to free the space in hardware before
we can allow software to reuse it.

The second issue addressed is the fact that we were not recording the
actual filter index provided to us by the admin queue. As a result we were
deleting filters that were not associated with the actual filter we wanted
to delete. To fix that I added a filter_index member to the UDP port
tracking structure.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |  2 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 66 +++++++++++++++++++++++------
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f573108faec3..70d369e9139c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -310,10 +310,12 @@ struct i40e_tc_configuration {
 	struct i40e_tc_info tc_info[I40E_MAX_TRAFFIC_CLASS];
 };
 
+#define I40E_UDP_PORT_INDEX_UNUSED	255
 struct i40e_udp_port_config {
 	/* AdminQ command interface expects port number in Host byte order */
 	u16 port;
 	u8 type;
+	u8 filter_index;
 };
 
 /* macros related to FLX_PIT */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ad01bfc5ec80..0babde10fa15 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9672,9 +9672,9 @@ static void i40e_handle_mdd_event(struct i40e_pf *pf)
 	i40e_flush(hw);
 }
 
-static const char *i40e_tunnel_name(struct i40e_udp_port_config *port)
+static const char *i40e_tunnel_name(u8 type)
 {
-	switch (port->type) {
+	switch (type) {
 	case UDP_TUNNEL_TYPE_VXLAN:
 		return "vxlan";
 	case UDP_TUNNEL_TYPE_GENEVE:
@@ -9708,37 +9708,68 @@ static void i40e_sync_udp_filters(struct i40e_pf *pf)
 static void i40e_sync_udp_filters_subtask(struct i40e_pf *pf)
 {
 	struct i40e_hw *hw = &pf->hw;
-	i40e_status ret;
+	u8 filter_index, type;
 	u16 port;
 	int i;
 
 	if (!test_and_clear_bit(__I40E_UDP_FILTER_SYNC_PENDING, pf->state))
 		return;
 
+	/* acquire RTNL to maintain state of flags and port requests */
+	rtnl_lock();
+
 	for (i = 0; i < I40E_MAX_PF_UDP_OFFLOAD_PORTS; i++) {
 		if (pf->pending_udp_bitmap & BIT_ULL(i)) {
+			struct i40e_udp_port_config *udp_port;
+			i40e_status ret = 0;
+
+			udp_port = &pf->udp_ports[i];
 			pf->pending_udp_bitmap &= ~BIT_ULL(i);
-			port = pf->udp_ports[i].port;
+
+			port = READ_ONCE(udp_port->port);
+			type = READ_ONCE(udp_port->type);
+			filter_index = READ_ONCE(udp_port->filter_index);
+
+			/* release RTNL while we wait on AQ command */
+			rtnl_unlock();
+
 			if (port)
 				ret = i40e_aq_add_udp_tunnel(hw, port,
-							pf->udp_ports[i].type,
-							NULL, NULL);
-			else
-				ret = i40e_aq_del_udp_tunnel(hw, i, NULL);
+							     type,
+							     &filter_index,
+							     NULL);
+			else if (filter_index != I40E_UDP_PORT_INDEX_UNUSED)
+				ret = i40e_aq_del_udp_tunnel(hw, filter_index,
+							     NULL);
+
+			/* reacquire RTNL so we can update filter_index */
+			rtnl_lock();
 
 			if (ret) {
 				dev_info(&pf->pdev->dev,
 					 "%s %s port %d, index %d failed, err %s aq_err %s\n",
-					 i40e_tunnel_name(&pf->udp_ports[i]),
+					 i40e_tunnel_name(type),
 					 port ? "add" : "delete",
-					 port, i,
+					 port,
+					 filter_index,
 					 i40e_stat_str(&pf->hw, ret),
 					 i40e_aq_str(&pf->hw,
 						     pf->hw.aq.asq_last_status));
-				pf->udp_ports[i].port = 0;
+				if (port) {
+					/* failed to add, just reset port,
+					 * drop pending bit for any deletion
+					 */
+					udp_port->port = 0;
+					pf->pending_udp_bitmap &= ~BIT_ULL(i);
+				}
+			} else if (port) {
+				/* record filter index on success */
+				udp_port->filter_index = filter_index;
 			}
 		}
 	}
+
+	rtnl_unlock();
 }
 
 /**
@@ -11355,6 +11386,11 @@ static u8 i40e_get_udp_port_idx(struct i40e_pf *pf, u16 port)
 	u8 i;
 
 	for (i = 0; i < I40E_MAX_PF_UDP_OFFLOAD_PORTS; i++) {
+		/* Do not report ports with pending deletions as
+		 * being available.
+		 */
+		if (!port && (pf->pending_udp_bitmap & BIT_ULL(i)))
+			continue;
 		if (pf->udp_ports[i].port == port)
 			return i;
 	}
@@ -11409,6 +11445,7 @@ static void i40e_udp_tunnel_add(struct net_device *netdev,
 
 	/* New port: add it and mark its index in the bitmap */
 	pf->udp_ports[next_idx].port = port;
+	pf->udp_ports[next_idx].filter_index = I40E_UDP_PORT_INDEX_UNUSED;
 	pf->pending_udp_bitmap |= BIT_ULL(next_idx);
 	set_bit(__I40E_UDP_FILTER_SYNC_PENDING, pf->state);
 }
@@ -11450,7 +11487,12 @@ static void i40e_udp_tunnel_del(struct net_device *netdev,
 	 * and make it pending
 	 */
 	pf->udp_ports[idx].port = 0;
-	pf->pending_udp_bitmap |= BIT_ULL(idx);
+
+	/* Toggle pending bit instead of setting it. This way if we are
+	 * deleting a port that has yet to be added we just clear the pending
+	 * bit and don't have to worry about it.
+	 */
+	pf->pending_udp_bitmap ^= BIT_ULL(idx);
 	set_bit(__I40E_UDP_FILTER_SYNC_PENDING, pf->state);
 
 	return;
-- 
2.14.3

^ permalink raw reply related

* [net-next 7/9] i40e: avoid overflow in i40e_ptp_adjfreq()
From: Jeff Kirsher @ 2018-04-30 17:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When operating at 1GbE, the base incval for the PTP clock is so large
that multiplying it by numbers close to the max_adj can overflow the
u64.

Rather than attempting to limit the max_adj to a value small enough to
avoid overflow, instead calculate the incvalue adjustment based on the
40GbE incvalue, and then multiply that by the scaling factor for the
link speed.

This sacrifices a small amount of precision in the adjustment but we
avoid erratic behavior of the clock due to the overflow caused if ppb is
very near the maximum adjustment.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h     |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 41 ++++++++++++++++++++----------
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 70d369e9139c..1d59ab6ca90f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -586,7 +586,7 @@ struct i40e_pf {
 	unsigned long ptp_tx_start;
 	struct hwtstamp_config tstamp_config;
 	struct mutex tmreg_lock; /* Used to protect the SYSTIME registers. */
-	u64 ptp_base_adj;
+	u32 ptp_adj_mult;
 	u32 tx_hwtstamp_timeouts;
 	u32 tx_hwtstamp_skipped;
 	u32 rx_hwtstamp_cleared;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 43d7c44d6d9f..aa3daec2049d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -16,9 +16,9 @@
  * At 1Gb link, the period is multiplied by 20. (32ns)
  * 1588 functionality is not supported at 100Mbps.
  */
-#define I40E_PTP_40GB_INCVAL 0x0199999999ULL
-#define I40E_PTP_10GB_INCVAL 0x0333333333ULL
-#define I40E_PTP_1GB_INCVAL  0x2000000000ULL
+#define I40E_PTP_40GB_INCVAL		0x0199999999ULL
+#define I40E_PTP_10GB_INCVAL_MULT	2
+#define I40E_PTP_1GB_INCVAL_MULT	20
 
 #define I40E_PRTTSYN_CTL1_TSYNTYPE_V1  BIT(I40E_PRTTSYN_CTL1_TSYNTYPE_SHIFT)
 #define I40E_PRTTSYN_CTL1_TSYNTYPE_V2  (2 << \
@@ -106,17 +106,24 @@ static int i40e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
 		ppb = -ppb;
 	}
 
-	smp_mb(); /* Force any pending update before accessing. */
-	adj = READ_ONCE(pf->ptp_base_adj);
-
-	freq = adj;
+	freq = I40E_PTP_40GB_INCVAL;
 	freq *= ppb;
 	diff = div_u64(freq, 1000000000ULL);
 
 	if (neg_adj)
-		adj -= diff;
+		adj = I40E_PTP_40GB_INCVAL - diff;
 	else
-		adj += diff;
+		adj = I40E_PTP_40GB_INCVAL + diff;
+
+	/* At some link speeds, the base incval is so large that directly
+	 * multiplying by ppb would result in arithmetic overflow even when
+	 * using a u64. Avoid this by instead calculating the new incval
+	 * always in terms of the 40GbE clock rate and then multiplying by the
+	 * link speed factor afterwards. This does result in slightly lower
+	 * precision at lower link speeds, but it is fairly minor.
+	 */
+	smp_mb(); /* Force any pending update before accessing. */
+	adj *= READ_ONCE(pf->ptp_adj_mult);
 
 	wr32(hw, I40E_PRTTSYN_INC_L, adj & 0xFFFFFFFF);
 	wr32(hw, I40E_PRTTSYN_INC_H, adj >> 32);
@@ -438,6 +445,7 @@ void i40e_ptp_set_increment(struct i40e_pf *pf)
 	struct i40e_link_status *hw_link_info;
 	struct i40e_hw *hw = &pf->hw;
 	u64 incval;
+	u32 mult;
 
 	hw_link_info = &hw->phy.link_info;
 
@@ -445,10 +453,10 @@ void i40e_ptp_set_increment(struct i40e_pf *pf)
 
 	switch (hw_link_info->link_speed) {
 	case I40E_LINK_SPEED_10GB:
-		incval = I40E_PTP_10GB_INCVAL;
+		mult = I40E_PTP_10GB_INCVAL_MULT;
 		break;
 	case I40E_LINK_SPEED_1GB:
-		incval = I40E_PTP_1GB_INCVAL;
+		mult = I40E_PTP_1GB_INCVAL_MULT;
 		break;
 	case I40E_LINK_SPEED_100MB:
 	{
@@ -459,15 +467,20 @@ void i40e_ptp_set_increment(struct i40e_pf *pf)
 				 "1588 functionality is not supported at 100 Mbps. Stopping the PHC.\n");
 			warn_once++;
 		}
-		incval = 0;
+		mult = 0;
 		break;
 	}
 	case I40E_LINK_SPEED_40GB:
 	default:
-		incval = I40E_PTP_40GB_INCVAL;
+		mult = 1;
 		break;
 	}
 
+	/* The increment value is calculated by taking the base 40GbE incvalue
+	 * and multiplying it by a factor based on the link speed.
+	 */
+	incval = I40E_PTP_40GB_INCVAL * mult;
+
 	/* Write the new increment value into the increment register. The
 	 * hardware will not update the clock until both registers have been
 	 * written.
@@ -476,7 +489,7 @@ void i40e_ptp_set_increment(struct i40e_pf *pf)
 	wr32(hw, I40E_PRTTSYN_INC_H, incval >> 32);
 
 	/* Update the base adjustement value. */
-	WRITE_ONCE(pf->ptp_base_adj, incval);
+	WRITE_ONCE(pf->ptp_adj_mult, mult);
 	smp_mb(); /* Force the above update. */
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 9/9] i40e: use %pI4b instead of byte swapping before dev_err
From: Jeff Kirsher @ 2018-04-30 17:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Fix warnings regarding restricted __be32 type usage by strictly
specifying the type of the ipv4 address being printed in the dev_err
statement.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b500bbf6c43f..c8659fbd7111 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7213,8 +7213,7 @@ static int i40e_parse_cls_flower(struct i40e_vsi *vsi,
 			if (mask->dst == cpu_to_be32(0xffffffff)) {
 				field_flags |= I40E_CLOUD_FIELD_IIP;
 			} else {
-				mask->dst = be32_to_cpu(mask->dst);
-				dev_err(&pf->pdev->dev, "Bad ip dst mask %pI4\n",
+				dev_err(&pf->pdev->dev, "Bad ip dst mask %pI4b\n",
 					&mask->dst);
 				return I40E_ERR_CONFIG;
 			}
@@ -7224,8 +7223,7 @@ static int i40e_parse_cls_flower(struct i40e_vsi *vsi,
 			if (mask->src == cpu_to_be32(0xffffffff)) {
 				field_flags |= I40E_CLOUD_FIELD_IIP;
 			} else {
-				mask->src = be32_to_cpu(mask->src);
-				dev_err(&pf->pdev->dev, "Bad ip src mask %pI4\n",
+				dev_err(&pf->pdev->dev, "Bad ip src mask %pI4b\n",
 					&mask->src);
 				return I40E_ERR_CONFIG;
 			}
-- 
2.14.3

^ permalink raw reply related

* [net-next 8/9] i40e/i40evf: take into account queue map from vf when handling queues
From: Jeff Kirsher @ 2018-04-30 17:00 UTC (permalink / raw)
  To: davem
  Cc: Harshitha Ramamurthy, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>

The expectation of the ops VIRTCHNL_OP_ENABLE_QUEUES and
VIRTCHNL_OP_DISABLE_QUEUES is that the queue map sent by
the VF is taken into account when enabling/disabling
queues in the VF VSI. This patch makes sure that happens.

By breaking out the individual queue set up functions so
that they can be called directly from the i40e_virtchnl_pf.c
file, only the queues as specified by the queue bit map that
accompanies the enable/disable queues ops will be handled.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |  3 +
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 40 +++++++++----
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 69 +++++++++++++++++++++-
 3 files changed, 99 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 1d59ab6ca90f..7a80652e2500 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -987,6 +987,9 @@ void i40e_service_event_schedule(struct i40e_pf *pf);
 void i40e_notify_client_of_vf_msg(struct i40e_vsi *vsi, u32 vf_id,
 				  u8 *msg, u16 len);
 
+int i40e_control_wait_tx_q(int seid, struct i40e_pf *pf, int pf_q, bool is_xdp,
+			   bool enable);
+int i40e_control_wait_rx_q(struct i40e_pf *pf, int pf_q, bool enable);
 int i40e_vsi_start_rings(struct i40e_vsi *vsi);
 void i40e_vsi_stop_rings(struct i40e_vsi *vsi);
 void i40e_vsi_stop_rings_no_wait(struct  i40e_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0babde10fa15..b500bbf6c43f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4235,8 +4235,8 @@ static void i40e_control_tx_q(struct i40e_pf *pf, int pf_q, bool enable)
  * @is_xdp: true if the queue is used for XDP
  * @enable: start or stop the queue
  **/
-static int i40e_control_wait_tx_q(int seid, struct i40e_pf *pf, int pf_q,
-				  bool is_xdp, bool enable)
+int i40e_control_wait_tx_q(int seid, struct i40e_pf *pf, int pf_q,
+			   bool is_xdp, bool enable)
 {
 	int ret;
 
@@ -4281,7 +4281,6 @@ static int i40e_vsi_control_tx(struct i40e_vsi *vsi, bool enable)
 		if (ret)
 			break;
 	}
-
 	return ret;
 }
 
@@ -4320,9 +4319,9 @@ static int i40e_pf_rxq_wait(struct i40e_pf *pf, int pf_q, bool enable)
  * @pf_q: the PF queue to configure
  * @enable: start or stop the queue
  *
- * This function enables or disables a single queue. Note that any delay
- * required after the operation is expected to be handled by the caller of
- * this function.
+ * This function enables or disables a single queue. Note that
+ * any delay required after the operation is expected to be
+ * handled by the caller of this function.
  **/
 static void i40e_control_rx_q(struct i40e_pf *pf, int pf_q, bool enable)
 {
@@ -4351,6 +4350,30 @@ static void i40e_control_rx_q(struct i40e_pf *pf, int pf_q, bool enable)
 	wr32(hw, I40E_QRX_ENA(pf_q), rx_reg);
 }
 
+/**
+ * i40e_control_wait_rx_q
+ * @pf: the PF structure
+ * @pf_q: queue being configured
+ * @enable: start or stop the rings
+ *
+ * This function enables or disables a single queue along with waiting
+ * for the change to finish. The caller of this function should handle
+ * the delays needed in the case of disabling queues.
+ **/
+int i40e_control_wait_rx_q(struct i40e_pf *pf, int pf_q, bool enable)
+{
+	int ret = 0;
+
+	i40e_control_rx_q(pf, pf_q, enable);
+
+	/* wait for the change to finish */
+	ret = i40e_pf_rxq_wait(pf, pf_q, enable);
+	if (ret)
+		return ret;
+
+	return ret;
+}
+
 /**
  * i40e_vsi_control_rx - Start or stop a VSI's rings
  * @vsi: the VSI being configured
@@ -4363,10 +4386,7 @@ static int i40e_vsi_control_rx(struct i40e_vsi *vsi, bool enable)
 
 	pf_q = vsi->base_queue;
 	for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
-		i40e_control_rx_q(pf, pf_q, enable);
-
-		/* wait for the change to finish */
-		ret = i40e_pf_rxq_wait(pf, pf_q, enable);
+		ret = i40e_control_wait_rx_q(pf, pf_q, enable);
 		if (ret) {
 			dev_info(&pf->pdev->dev,
 				 "VSI seid %d Rx ring %d %sable timeout\n",
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 2ceea63cc6cf..c6d24eaede18 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2153,6 +2153,51 @@ static int i40e_vc_config_irq_map_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 				       aq_ret);
 }
 
+/**
+ * i40e_ctrl_vf_tx_rings
+ * @vsi: the SRIOV VSI being configured
+ * @q_map: bit map of the queues to be enabled
+ * @enable: start or stop the queue
+ **/
+static int i40e_ctrl_vf_tx_rings(struct i40e_vsi *vsi, unsigned long q_map,
+				 bool enable)
+{
+	struct i40e_pf *pf = vsi->back;
+	int ret = 0;
+	u16 q_id;
+
+	for_each_set_bit(q_id, &q_map, I40E_MAX_VF_QUEUES) {
+		ret = i40e_control_wait_tx_q(vsi->seid, pf,
+					     vsi->base_queue + q_id,
+					     false /*is xdp*/, enable);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+/**
+ * i40e_ctrl_vf_rx_rings
+ * @vsi: the SRIOV VSI being configured
+ * @q_map: bit map of the queues to be enabled
+ * @enable: start or stop the queue
+ **/
+static int i40e_ctrl_vf_rx_rings(struct i40e_vsi *vsi, unsigned long q_map,
+				 bool enable)
+{
+	struct i40e_pf *pf = vsi->back;
+	int ret = 0;
+	u16 q_id;
+
+	for_each_set_bit(q_id, &q_map, I40E_MAX_VF_QUEUES) {
+		ret = i40e_control_wait_rx_q(pf, vsi->base_queue + q_id,
+					     enable);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 /**
  * i40e_vc_enable_queues_msg
  * @vf: pointer to the VF info
@@ -2185,8 +2230,17 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 		goto error_param;
 	}
 
-	if (i40e_vsi_start_rings(pf->vsi[vf->lan_vsi_idx]))
+	/* Use the queue bit map sent by the VF */
+	if (i40e_ctrl_vf_rx_rings(pf->vsi[vf->lan_vsi_idx], vqs->rx_queues,
+				  true)) {
+		aq_ret = I40E_ERR_TIMEOUT;
+		goto error_param;
+	}
+	if (i40e_ctrl_vf_tx_rings(pf->vsi[vf->lan_vsi_idx], vqs->tx_queues,
+				  true)) {
 		aq_ret = I40E_ERR_TIMEOUT;
+		goto error_param;
+	}
 
 	/* need to start the rings for additional ADq VSI's as well */
 	if (vf->adq_enabled) {
@@ -2234,8 +2288,17 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 		goto error_param;
 	}
 
-	i40e_vsi_stop_rings(pf->vsi[vf->lan_vsi_idx]);
-
+	/* Use the queue bit map sent by the VF */
+	if (i40e_ctrl_vf_tx_rings(pf->vsi[vf->lan_vsi_idx], vqs->tx_queues,
+				  false)) {
+		aq_ret = I40E_ERR_TIMEOUT;
+		goto error_param;
+	}
+	if (i40e_ctrl_vf_rx_rings(pf->vsi[vf->lan_vsi_idx], vqs->rx_queues,
+				  false)) {
+		aq_ret = I40E_ERR_TIMEOUT;
+		goto error_param;
+	}
 error_param:
 	/* send the response to the VF */
 	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES,
-- 
2.14.3

^ permalink raw reply related

* [net-next 3/9] i40e: fix reading LLDP configuration
From: Jeff Kirsher @ 2018-04-30 17:00 UTC (permalink / raw)
  To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Mariusz Stachura <mariusz.stachura@intel.com>

Previous method for reading LLDP config was based on hard-coded
offsets. It happened to work, because of structured architecture of
the NVM memory. In the new approach, known as FLAT, we need to
calculate the absolute address, instead of using relative values.
Needed defines for memory location were added.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_dcb.c    | 91 ++++++++++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_type.h   |  8 ++-
 drivers/net/ethernet/intel/i40evf/i40e_type.h | 10 ++-
 3 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
index 69e7d4967b1c..56bff8faf371 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
@@ -920,6 +920,70 @@ i40e_status i40e_init_dcb(struct i40e_hw *hw)
 	return ret;
 }
 
+/**
+ * _i40e_read_lldp_cfg - generic read of LLDP Configuration data from NVM
+ * @hw: pointer to the HW structure
+ * @lldp_cfg: pointer to hold lldp configuration variables
+ * @module: address of the module pointer
+ * @word_offset: offset of LLDP configuration
+ *
+ * Reads the LLDP configuration data from NVM using passed addresses
+ **/
+static i40e_status _i40e_read_lldp_cfg(struct i40e_hw *hw,
+				       struct i40e_lldp_variables *lldp_cfg,
+				       u8 module, u32 word_offset)
+{
+	u32 address, offset = (2 * word_offset);
+	i40e_status ret;
+	__le16 raw_mem;
+	u16 mem;
+
+	ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
+	if (ret)
+		return ret;
+
+	ret = i40e_aq_read_nvm(hw, 0x0, module * 2, sizeof(raw_mem), &raw_mem,
+			       true, NULL);
+	i40e_release_nvm(hw);
+	if (ret)
+		return ret;
+
+	mem = le16_to_cpu(raw_mem);
+	/* Check if this pointer needs to be read in word size or 4K sector
+	 * units.
+	 */
+	if (mem & I40E_PTR_TYPE)
+		address = (0x7FFF & mem) * 4096;
+	else
+		address = (0x7FFF & mem) * 2;
+
+	ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
+	if (ret)
+		goto err_lldp_cfg;
+
+	ret = i40e_aq_read_nvm(hw, module, offset, sizeof(raw_mem), &raw_mem,
+			       true, NULL);
+	i40e_release_nvm(hw);
+	if (ret)
+		return ret;
+
+	mem = le16_to_cpu(raw_mem);
+	offset = mem + word_offset;
+	offset *= 2;
+
+	ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
+	if (ret)
+		goto err_lldp_cfg;
+
+	ret = i40e_aq_read_nvm(hw, 0, address + offset,
+			       sizeof(struct i40e_lldp_variables), lldp_cfg,
+			       true, NULL);
+	i40e_release_nvm(hw);
+
+err_lldp_cfg:
+	return ret;
+}
+
 /**
  * i40e_read_lldp_cfg - read LLDP Configuration data from NVM
  * @hw: pointer to the HW structure
@@ -931,21 +995,34 @@ i40e_status i40e_read_lldp_cfg(struct i40e_hw *hw,
 			       struct i40e_lldp_variables *lldp_cfg)
 {
 	i40e_status ret = 0;
-	u32 offset = (2 * I40E_NVM_LLDP_CFG_PTR);
+	u32 mem;
 
 	if (!lldp_cfg)
 		return I40E_ERR_PARAM;
 
 	ret = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
 	if (ret)
-		goto err_lldp_cfg;
+		return ret;
 
-	ret = i40e_aq_read_nvm(hw, I40E_SR_EMP_MODULE_PTR, offset,
-			       sizeof(struct i40e_lldp_variables),
-			       (u8 *)lldp_cfg,
-			       true, NULL);
+	ret = i40e_aq_read_nvm(hw, I40E_SR_NVM_CONTROL_WORD, 0, sizeof(mem),
+			       &mem, true, NULL);
 	i40e_release_nvm(hw);
+	if (ret)
+		return ret;
+
+	/* Read a bit that holds information whether we are running flat or
+	 * structured NVM image. Flat image has LLDP configuration in shadow
+	 * ram, so there is a need to pass different addresses for both cases.
+	 */
+	if (mem & I40E_SR_NVM_MAP_STRUCTURE_TYPE) {
+		/* Flat NVM case */
+		ret = _i40e_read_lldp_cfg(hw, lldp_cfg, I40E_SR_EMP_MODULE_PTR,
+					  I40E_SR_LLDP_CFG_PTR);
+	} else {
+		/* Good old structured NVM image */
+		ret = _i40e_read_lldp_cfg(hw, lldp_cfg, I40E_EMP_MODULE_PTR,
+					  I40E_NVM_LLDP_CFG_PTR);
+	}
 
-err_lldp_cfg:
 	return ret;
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 40968a4216a7..7df969c59855 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -1294,7 +1294,8 @@ struct i40e_hw_port_stats {
 
 /* Checksum and Shadow RAM pointers */
 #define I40E_SR_NVM_CONTROL_WORD		0x00
-#define I40E_SR_EMP_MODULE_PTR			0x0F
+#define I40E_EMP_MODULE_PTR			0x0F
+#define I40E_SR_EMP_MODULE_PTR			0x48
 #define I40E_SR_PBA_FLAGS			0x15
 #define I40E_SR_PBA_BLOCK_PTR			0x16
 #define I40E_SR_BOOT_CONFIG_PTR			0x17
@@ -1313,6 +1314,8 @@ struct i40e_hw_port_stats {
 #define I40E_SR_PCIE_ALT_MODULE_MAX_SIZE	1024
 #define I40E_SR_CONTROL_WORD_1_SHIFT		0x06
 #define I40E_SR_CONTROL_WORD_1_MASK	(0x03 << I40E_SR_CONTROL_WORD_1_SHIFT)
+#define I40E_SR_CONTROL_WORD_1_NVM_BANK_VALID	BIT(5)
+#define I40E_SR_NVM_MAP_STRUCTURE_TYPE		BIT(12)
 #define I40E_PTR_TYPE				BIT(15)
 #define I40E_SR_OCP_CFG_WORD0			0x2B
 #define I40E_SR_OCP_ENABLED			BIT(15)
@@ -1430,7 +1433,8 @@ enum i40e_reset_type {
 };
 
 /* IEEE 802.1AB LLDP Agent Variables from NVM */
-#define I40E_NVM_LLDP_CFG_PTR		0xD
+#define I40E_NVM_LLDP_CFG_PTR	0x06
+#define I40E_SR_LLDP_CFG_PTR	0x31
 struct i40e_lldp_variables {
 	u16 length;
 	u16 adminstatus;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index c77cb9f672d8..094387db3c11 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -1233,7 +1233,8 @@ struct i40e_hw_port_stats {
 
 /* Checksum and Shadow RAM pointers */
 #define I40E_SR_NVM_CONTROL_WORD		0x00
-#define I40E_SR_EMP_MODULE_PTR			0x0F
+#define I40E_EMP_MODULE_PTR			0x0F
+#define I40E_SR_EMP_MODULE_PTR			0x48
 #define I40E_NVM_OEM_VER_OFF			0x83
 #define I40E_SR_NVM_DEV_STARTER_VERSION		0x18
 #define I40E_SR_NVM_WAKE_ON_LAN			0x19
@@ -1249,6 +1250,9 @@ struct i40e_hw_port_stats {
 #define I40E_SR_PCIE_ALT_MODULE_MAX_SIZE	1024
 #define I40E_SR_CONTROL_WORD_1_SHIFT		0x06
 #define I40E_SR_CONTROL_WORD_1_MASK	(0x03 << I40E_SR_CONTROL_WORD_1_SHIFT)
+#define I40E_SR_CONTROL_WORD_1_NVM_BANK_VALID	BIT(5)
+#define I40E_SR_NVM_MAP_STRUCTURE_TYPE		BIT(12)
+#define I40E_PTR_TYPE				BIT(15)
 
 /* Shadow RAM related */
 #define I40E_SR_SECTOR_SIZE_IN_WORDS	0x800
@@ -1362,6 +1366,10 @@ enum i40e_reset_type {
 	I40E_RESET_EMPR		= 3,
 };
 
+/* IEEE 802.1AB LLDP Agent Variables from NVM */
+#define I40E_NVM_LLDP_CFG_PTR	0x06
+#define I40E_SR_LLDP_CFG_PTR	0x31
+
 /* RSS Hash Table Size */
 #define I40E_PFQF_CTL_0_HASHLUTSIZE_512	0x00010000
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 2/9] i40e/i40evf: cleanup incorrect function doxygen comments
From: Jeff Kirsher @ 2018-04-30 17:00 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180430170059.13186-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Recent versions of the Linux kernel now warn about incorrect parameter
definitions for function comments. Fix up several function comments to
correctly reflect the current function arguments. This cleans up the
warnings and helps ensure our documentation is accurate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_client.c      |  6 ++--
 drivers/net/ethernet/intel/i40e/i40e_common.c      | 37 +++++++++++++---------
 drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c      | 11 ++++---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |  8 ++---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 24 ++++++++------
 drivers/net/ethernet/intel/i40e/i40e_hmc.c         |  1 -
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 22 +++++++++----
 drivers/net/ethernet/intel/i40e/i40e_nvm.c         |  1 +
 drivers/net/ethernet/intel/i40e/i40e_ptp.c         |  4 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  6 ++--
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 16 +++++-----
 drivers/net/ethernet/intel/i40evf/i40e_common.c    |  1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      |  4 ++-
 drivers/net/ethernet/intel/i40evf/i40evf_client.c  |  4 +--
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  7 ++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  5 ++-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    | 11 +------
 17 files changed, 95 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
index 2041757f948c..5f3b8b9ff511 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.c
@@ -40,7 +40,7 @@ static struct i40e_ops i40e_lan_ops = {
 /**
  * i40e_client_get_params - Get the params that can change at runtime
  * @vsi: the VSI with the message
- * @param: clinet param struct
+ * @params: client param struct
  *
  **/
 static
@@ -566,7 +566,7 @@ static int i40e_client_virtchnl_send(struct i40e_info *ldev,
  * i40e_client_setup_qvlist
  * @ldev: pointer to L2 context.
  * @client: Client pointer.
- * @qv_info: queue and vector list
+ * @qvlist_info: queue and vector list
  *
  * Return 0 on success or < 0 on error
  **/
@@ -641,7 +641,7 @@ static int i40e_client_setup_qvlist(struct i40e_info *ldev,
  * i40e_client_request_reset
  * @ldev: pointer to L2 context.
  * @client: Client pointer.
- * @level: reset level
+ * @reset_level: reset level
  **/
 static void i40e_client_request_reset(struct i40e_info *ldev,
 				      struct i40e_client *client,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 6f8fd70d606a..eb2d1530d331 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1671,6 +1671,8 @@ enum i40e_status_code i40e_aq_set_phy_config(struct i40e_hw *hw,
 /**
  * i40e_set_fc
  * @hw: pointer to the hw struct
+ * @aq_failures: buffer to return AdminQ failure information
+ * @atomic_restart: whether to enable atomic link restart
  *
  * Set the requested flow control mode using set_phy_config.
  **/
@@ -2807,8 +2809,8 @@ i40e_status i40e_aq_remove_macvlan(struct i40e_hw *hw, u16 seid,
  * @mr_list: list of mirrored VSI SEIDs or VLAN IDs
  * @cmd_details: pointer to command details structure or NULL
  * @rule_id: Rule ID returned from FW
- * @rule_used: Number of rules used in internal switch
- * @rule_free: Number of rules free in internal switch
+ * @rules_used: Number of rules used in internal switch
+ * @rules_free: Number of rules free in internal switch
  *
  * Add/Delete a mirror rule to a specific switch. Mirror rules are supported for
  * VEBs/VEPA elements only
@@ -2868,8 +2870,8 @@ static i40e_status i40e_mirrorrule_op(struct i40e_hw *hw,
  * @mr_list: list of mirrored VSI SEIDs or VLAN IDs
  * @cmd_details: pointer to command details structure or NULL
  * @rule_id: Rule ID returned from FW
- * @rule_used: Number of rules used in internal switch
- * @rule_free: Number of rules free in internal switch
+ * @rules_used: Number of rules used in internal switch
+ * @rules_free: Number of rules free in internal switch
  *
  * Add mirror rule. Mirror rules are supported for VEBs or VEPA elements only
  **/
@@ -2899,8 +2901,8 @@ i40e_status i40e_aq_add_mirrorrule(struct i40e_hw *hw, u16 sw_seid,
  *		add_mirrorrule.
  * @mr_list: list of mirrored VLAN IDs to be removed
  * @cmd_details: pointer to command details structure or NULL
- * @rule_used: Number of rules used in internal switch
- * @rule_free: Number of rules free in internal switch
+ * @rules_used: Number of rules used in internal switch
+ * @rules_free: Number of rules free in internal switch
  *
  * Delete a mirror rule. Mirror rules are supported for VEBs/VEPA elements only
  **/
@@ -3648,6 +3650,8 @@ i40e_status i40e_aq_stop_lldp(struct i40e_hw *hw, bool shutdown_agent,
 /**
  * i40e_aq_start_lldp
  * @hw: pointer to the hw struct
+ * @buff: buffer for result
+ * @buff_size: buffer size
  * @cmd_details: pointer to command details structure or NULL
  *
  * Start the embedded LLDP Agent on all ports.
@@ -3728,7 +3732,6 @@ i40e_status i40e_aq_get_cee_dcb_config(struct i40e_hw *hw,
  * i40e_aq_add_udp_tunnel
  * @hw: pointer to the hw struct
  * @udp_port: the UDP port to add in Host byte order
- * @header_len: length of the tunneling header length in DWords
  * @protocol_index: protocol index type
  * @filter_index: pointer to filter index
  * @cmd_details: pointer to command details structure or NULL
@@ -3947,6 +3950,7 @@ i40e_status i40e_aq_config_vsi_tc_bw(struct i40e_hw *hw,
  * @hw: pointer to the hw struct
  * @seid: seid of the switching component connected to Physical Port
  * @ets_data: Buffer holding ETS parameters
+ * @opcode: Tx scheduler AQ command opcode
  * @cmd_details: pointer to command details structure or NULL
  **/
 i40e_status i40e_aq_config_switch_comp_ets(struct i40e_hw *hw,
@@ -4290,10 +4294,10 @@ i40e_status i40e_aq_add_rem_control_packet_filter(struct i40e_hw *hw,
  * @hw: pointer to the hw struct
  * @seid: VSI seid to add ethertype filter from
  **/
-#define I40E_FLOW_CONTROL_ETHTYPE 0x8808
 void i40e_add_filter_to_drop_tx_flow_control_frames(struct i40e_hw *hw,
 						    u16 seid)
 {
+#define I40E_FLOW_CONTROL_ETHTYPE 0x8808
 	u16 flag = I40E_AQC_ADD_CONTROL_PACKET_FLAGS_IGNORE_MAC |
 		   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_DROP |
 		   I40E_AQC_ADD_CONTROL_PACKET_FLAGS_TX;
@@ -4424,6 +4428,7 @@ void i40e_set_pci_config_data(struct i40e_hw *hw, u16 link_status)
  * @ret_buff_size: actual buffer size returned
  * @ret_next_table: next block to read
  * @ret_next_index: next index to read
+ * @cmd_details: pointer to command details structure or NULL
  *
  * Dump internal FW/HW data for debug purposes.
  *
@@ -4550,7 +4555,7 @@ i40e_status i40e_aq_configure_partition_bw(struct i40e_hw *hw,
  * i40e_read_phy_register_clause22
  * @hw: pointer to the HW structure
  * @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
  * @value: PHY register value
  *
  * Reads specified PHY register value
@@ -4595,7 +4600,7 @@ i40e_status i40e_read_phy_register_clause22(struct i40e_hw *hw,
  * i40e_write_phy_register_clause22
  * @hw: pointer to the HW structure
  * @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
  * @value: PHY register value
  *
  * Writes specified PHY register value
@@ -4636,7 +4641,7 @@ i40e_status i40e_write_phy_register_clause22(struct i40e_hw *hw,
  * @hw: pointer to the HW structure
  * @page: registers page number
  * @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
  * @value: PHY register value
  *
  * Reads specified PHY register value
@@ -4710,7 +4715,7 @@ i40e_status i40e_read_phy_register_clause45(struct i40e_hw *hw,
  * @hw: pointer to the HW structure
  * @page: registers page number
  * @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
  * @value: PHY register value
  *
  * Writes value to specified PHY register
@@ -4777,7 +4782,7 @@ i40e_status i40e_write_phy_register_clause45(struct i40e_hw *hw,
  * @hw: pointer to the HW structure
  * @page: registers page number
  * @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
  * @value: PHY register value
  *
  * Writes value to specified PHY register
@@ -4813,7 +4818,7 @@ i40e_status i40e_write_phy_register(struct i40e_hw *hw,
  * @hw: pointer to the HW structure
  * @page: registers page number
  * @reg: register address in the page
- * @phy_adr: PHY address on MDIO interface
+ * @phy_addr: PHY address on MDIO interface
  * @value: PHY register value
  *
  * Reads specified PHY register value
@@ -4848,7 +4853,6 @@ i40e_status i40e_read_phy_register(struct i40e_hw *hw,
  * i40e_get_phy_address
  * @hw: pointer to the HW structure
  * @dev_num: PHY port num that address we want
- * @phy_addr: Returned PHY address
  *
  * Gets PHY address for current port
  **/
@@ -5058,7 +5062,9 @@ i40e_status i40e_led_get_phy(struct i40e_hw *hw, u16 *led_addr,
  * i40e_led_set_phy
  * @hw: pointer to the HW structure
  * @on: true or false
+ * @led_addr: address of led register to use
  * @mode: original val plus bit for set or ignore
+ *
  * Set led's on or off when controlled by the PHY
  *
  **/
@@ -5347,6 +5353,7 @@ i40e_status_code i40e_aq_write_ddp(struct i40e_hw *hw, void *buff,
  * @hw: pointer to the hw struct
  * @buff: command buffer (size in bytes = buff_size)
  * @buff_size: buffer size in bytes
+ * @flags: AdminQ command flags
  * @cmd_details: pointer to command details structure or NULL
  **/
 enum
diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c b/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c
index a09d723d5ca0..9deae9a35423 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb_nl.c
@@ -23,7 +23,7 @@ static void i40e_get_pfc_delay(struct i40e_hw *hw, u16 *delay)
 
 /**
  * i40e_dcbnl_ieee_getets - retrieve local IEEE ETS configuration
- * @netdev: the corresponding netdev
+ * @dev: the corresponding netdev
  * @ets: structure to hold the ETS information
  *
  * Returns local IEEE ETS configuration
@@ -62,8 +62,8 @@ static int i40e_dcbnl_ieee_getets(struct net_device *dev,
 
 /**
  * i40e_dcbnl_ieee_getpfc - retrieve local IEEE PFC configuration
- * @netdev: the corresponding netdev
- * @ets: structure to hold the PFC information
+ * @dev: the corresponding netdev
+ * @pfc: structure to hold the PFC information
  *
  * Returns local IEEE PFC configuration
  **/
@@ -95,7 +95,7 @@ static int i40e_dcbnl_ieee_getpfc(struct net_device *dev,
 
 /**
  * i40e_dcbnl_getdcbx - retrieve current DCBx capability
- * @netdev: the corresponding netdev
+ * @dev: the corresponding netdev
  *
  * Returns DCBx capability features
  **/
@@ -108,7 +108,8 @@ static u8 i40e_dcbnl_getdcbx(struct net_device *dev)
 
 /**
  * i40e_dcbnl_get_perm_hw_addr - MAC address used by DCBx
- * @netdev: the corresponding netdev
+ * @dev: the corresponding netdev
+ * @perm_addr: buffer to store the MAC address
  *
  * Returns the SAN MAC address used for LLDP exchange
  **/
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 94774a2a511f..56b911a5dd8b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -12,8 +12,8 @@ static struct dentry *i40e_dbg_root;
 
 /**
  * i40e_dbg_find_vsi - searches for the vsi with the given seid
- * @pf - the PF structure to search for the vsi
- * @seid - seid of the vsi it is searching for
+ * @pf: the PF structure to search for the vsi
+ * @seid: seid of the vsi it is searching for
  **/
 static struct i40e_vsi *i40e_dbg_find_vsi(struct i40e_pf *pf, int seid)
 {
@@ -31,8 +31,8 @@ static struct i40e_vsi *i40e_dbg_find_vsi(struct i40e_pf *pf, int seid)
 
 /**
  * i40e_dbg_find_veb - searches for the veb with the given seid
- * @pf - the PF structure to search for the veb
- * @seid - seid of the veb it is searching for
+ * @pf: the PF structure to search for the veb
+ * @seid: seid of the veb it is searching for
  **/
 static struct i40e_veb *i40e_dbg_find_veb(struct i40e_pf *pf, int seid)
 {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 94dbe2c419ca..c1bbfb913e49 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1055,6 +1055,9 @@ static int i40e_nway_reset(struct net_device *netdev)
 
 /**
  * i40e_get_pauseparam -  Get Flow Control status
+ * @netdev: netdevice structure
+ * @pause: buffer to return pause parameters
+ *
  * Return tx/rx-pause status
  **/
 static void i40e_get_pauseparam(struct net_device *netdev,
@@ -2526,7 +2529,7 @@ static int i40e_get_rss_hash_opts(struct i40e_pf *pf, struct ethtool_rxnfc *cmd)
 /**
  * i40e_check_mask - Check whether a mask field is set
  * @mask: the full mask value
- * @field; mask of the field to check
+ * @field: mask of the field to check
  *
  * If the given mask is fully set, return positive value. If the mask for the
  * field is fully unset, return zero. Otherwise return a negative error code.
@@ -2597,6 +2600,7 @@ static int i40e_parse_rx_flow_user_data(struct ethtool_rx_flow_spec *fsp,
 /**
  * i40e_fill_rx_flow_user_data - Fill in user-defined data field
  * @fsp: pointer to rx_flow specification
+ * @data: pointer to return userdef data
  *
  * Reads the userdef data structure and properly fills in the user defined
  * fields of the rx_flow_spec.
@@ -2775,6 +2779,7 @@ static int i40e_get_ethtool_fdir_entry(struct i40e_pf *pf,
  * i40e_get_rxnfc - command to get RX flow classification rules
  * @netdev: network interface device structure
  * @cmd: ethtool rxnfc command
+ * @rule_locs: pointer to store rule data
  *
  * Returns Success if the command is supported.
  **/
@@ -2816,7 +2821,7 @@ static int i40e_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 /**
  * i40e_get_rss_hash_bits - Read RSS Hash bits from register
  * @nfc: pointer to user request
- * @i_setc bits currently set
+ * @i_setc: bits currently set
  *
  * Returns value of bits to be set per user request
  **/
@@ -2861,7 +2866,7 @@ static u64 i40e_get_rss_hash_bits(struct ethtool_rxnfc *nfc, u64 i_setc)
 /**
  * i40e_set_rss_hash_opt - Enable/Disable flow types for RSS hash
  * @pf: pointer to the physical function struct
- * @cmd: ethtool rxnfc command
+ * @nfc: ethtool rxnfc command
  *
  * Returns Success if the flow input set is supported.
  **/
@@ -3260,7 +3265,7 @@ static int i40e_add_flex_offset(struct list_head *flex_pit_list,
  * __i40e_reprogram_flex_pit - Re-program specific FLX_PIT table
  * @pf: Pointer to the PF structure
  * @flex_pit_list: list of flexible src offsets in use
- * #flex_pit_start: index to first entry for this section of the table
+ * @flex_pit_start: index to first entry for this section of the table
  *
  * In order to handle flexible data, the hardware uses a table of values
  * called the FLX_PIT table. This table is used to indicate which sections of
@@ -3374,7 +3379,7 @@ static void i40e_reprogram_flex_pit(struct i40e_pf *pf)
 
 /**
  * i40e_flow_str - Converts a flow_type into a human readable string
- * @flow_type: the flow type from a flow specification
+ * @fsp: the flow specification
  *
  * Currently only flow types we support are included here, and the string
  * value attempts to match what ethtool would use to configure this flow type.
@@ -4079,7 +4084,7 @@ static unsigned int i40e_max_channels(struct i40e_vsi *vsi)
 
 /**
  * i40e_get_channels - Get the current channels enabled and max supported etc.
- * @netdev: network interface device structure
+ * @dev: network interface device structure
  * @ch: ethtool channels structure
  *
  * We don't support separate tx and rx queues as channels. The other count
@@ -4088,7 +4093,7 @@ static unsigned int i40e_max_channels(struct i40e_vsi *vsi)
  * q_vectors since we support a lot more queue pairs than q_vectors.
  **/
 static void i40e_get_channels(struct net_device *dev,
-			       struct ethtool_channels *ch)
+			      struct ethtool_channels *ch)
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	struct i40e_vsi *vsi = np->vsi;
@@ -4107,14 +4112,14 @@ static void i40e_get_channels(struct net_device *dev,
 
 /**
  * i40e_set_channels - Set the new channels count.
- * @netdev: network interface device structure
+ * @dev: network interface device structure
  * @ch: ethtool channels structure
  *
  * The new channels count may not be the same as requested by the user
  * since it gets rounded down to a power of 2 value.
  **/
 static int i40e_set_channels(struct net_device *dev,
-			      struct ethtool_channels *ch)
+			     struct ethtool_channels *ch)
 {
 	const u8 drop = I40E_FILTER_PROGRAM_DESC_DEST_DROP_PACKET;
 	struct i40e_netdev_priv *np = netdev_priv(dev);
@@ -4249,6 +4254,7 @@ static int i40e_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
  * @netdev: network interface device structure
  * @indir: indirection table
  * @key: hash key
+ * @hfunc: hash function to use
  *
  * Returns -EINVAL if the table specifies an invalid queue id, otherwise
  * returns 0 after programming the table.
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index e40f3b59e42b..19ce93d7fd0a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -174,7 +174,6 @@ i40e_status i40e_add_pd_table_entry(struct i40e_hw *hw,
  * @hw: pointer to our HW structure
  * @hmc_info: pointer to the HMC configuration information structure
  * @idx: the page index
- * @is_pf: distinguishes a VF from a PF
  *
  * This function:
  *	1. Marks the entry in pd tabe (for paged address mode) or in sd table
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 41bad112a907..ad01bfc5ec80 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -254,8 +254,8 @@ static int i40e_put_lump(struct i40e_lump_tracking *pile, u16 index, u16 id)
 
 /**
  * i40e_find_vsi_from_id - searches for the vsi with the given id
- * @pf - the pf structure to search for the vsi
- * @id - id of the vsi it is searching for
+ * @pf: the pf structure to search for the vsi
+ * @id: id of the vsi it is searching for
  **/
 struct i40e_vsi *i40e_find_vsi_from_id(struct i40e_pf *pf, u16 id)
 {
@@ -411,6 +411,7 @@ static void i40e_get_netdev_stats_struct_tx(struct i40e_ring *ring,
 /**
  * i40e_get_netdev_stats_struct - Get statistics for netdev interface
  * @netdev: network interface device structure
+ * @stats: data structure to store statistics
  *
  * Returns the address of the device statistics structure.
  * The statistics are actually updated from the service task.
@@ -2003,7 +2004,7 @@ struct i40e_new_mac_filter *i40e_next_filter(struct i40e_new_mac_filter *next)
  * from firmware
  * @count: Number of filters added
  * @add_list: return data from fw
- * @head: pointer to first filter in current batch
+ * @add_head: pointer to first filter in current batch
  *
  * MAC filter entries from list were slated to be added to device. Returns
  * number of successful filters. Note that 0 does NOT mean success!
@@ -2110,6 +2111,7 @@ void i40e_aqc_add_filters(struct i40e_vsi *vsi, const char *vsi_name,
 /**
  * i40e_aqc_broadcast_filter - Set promiscuous broadcast flags
  * @vsi: pointer to the VSI
+ * @vsi_name: the VSI name
  * @f: filter data
  *
  * This function sets or clears the promiscuous broadcast flags for VLAN
@@ -2816,6 +2818,7 @@ void i40e_vsi_kill_vlan(struct i40e_vsi *vsi, u16 vid)
 /**
  * i40e_vlan_rx_add_vid - Add a vlan id filter to HW offload
  * @netdev: network interface to be adjusted
+ * @proto: unused protocol value
  * @vid: vlan id to be added
  *
  * net_device_ops implementation for adding vlan ids
@@ -2840,6 +2843,7 @@ static int i40e_vlan_rx_add_vid(struct net_device *netdev,
 /**
  * i40e_vlan_rx_kill_vid - Remove a vlan id filter from HW offload
  * @netdev: network interface to be adjusted
+ * @proto: unused protocol value
  * @vid: vlan id to be removed
  *
  * net_device_ops implementation for removing vlan ids
@@ -3461,7 +3465,7 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
 
 /**
  * i40e_enable_misc_int_causes - enable the non-queue interrupts
- * @hw: ptr to the hardware info
+ * @pf: pointer to private device data structure
  **/
 static void i40e_enable_misc_int_causes(struct i40e_pf *pf)
 {
@@ -5072,7 +5076,7 @@ static int i40e_vsi_get_bw_info(struct i40e_vsi *vsi)
  * i40e_vsi_configure_bw_alloc - Configure VSI BW allocation per TC
  * @vsi: the VSI being configured
  * @enabled_tc: TC bitmap
- * @bw_credits: BW shared credits per TC
+ * @bw_share: BW shared credits per TC
  *
  * Returns 0 on success, negative value on failure
  **/
@@ -6329,6 +6333,7 @@ static int i40e_init_pf_dcb(struct i40e_pf *pf)
 /**
  * i40e_print_link_message - print link up or down
  * @vsi: the VSI for which link needs a message
+ * @isup: true of link is up, false otherwise
  */
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 {
@@ -9980,7 +9985,7 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 
 /**
  * i40e_vsi_free_arrays - Free queue and vector pointer arrays for the VSI
- * @type: VSI pointer
+ * @vsi: VSI pointer
  * @free_qvectors: a bool to specify if q_vectors need to be freed.
  *
  * On error: returns error code (negative)
@@ -10776,7 +10781,7 @@ int i40e_config_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
  * @vsi: Pointer to VSI structure
  * @seed: Buffer to store the keys
  * @lut: Buffer to store the lookup table entries
- * lut_size: Size of buffer to store the lookup table entries
+ * @lut_size: Size of buffer to store the lookup table entries
  *
  * Returns 0 on success, negative on failure
  */
@@ -11476,6 +11481,7 @@ static int i40e_get_phys_port_id(struct net_device *netdev,
  * @tb: pointer to array of nladdr (unused)
  * @dev: the net device pointer
  * @addr: the MAC address entry being added
+ * @vid: VLAN ID
  * @flags: instructions from stack about fdb operation
  */
 static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
@@ -11521,6 +11527,7 @@ static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
  * i40e_ndo_bridge_setlink - Set the hardware bridge mode
  * @dev: the netdev being configured
  * @nlh: RTNL message
+ * @flags: bridge flags
  *
  * Inserts a new hardware bridge if not already created and
  * enables the bridging mode requested (VEB or VEPA). If the
@@ -14094,6 +14101,7 @@ static void i40e_remove(struct pci_dev *pdev)
 /**
  * i40e_pci_error_detected - warning that something funky happened in PCI land
  * @pdev: PCI device information struct
+ * @error: the type of PCI error
  *
  * Called to warn that something happened and the error handling steps
  * are in progress.  Allows the driver to quiesce things, be ready for
diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 51554df9cb9a..0299e5bbb902 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -1149,6 +1149,7 @@ void i40e_nvmupd_clear_wait_state(struct i40e_hw *hw)
  * i40e_nvmupd_check_wait_event - handle NVM update operation events
  * @hw: pointer to the hardware structure
  * @opcode: the event that just happened
+ * @desc: AdminQ descriptor
  **/
 void i40e_nvmupd_check_wait_event(struct i40e_hw *hw, u16 opcode,
 				  struct i40e_aq_desc *desc)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 47d10ad39c18..43d7c44d6d9f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -483,7 +483,7 @@ void i40e_ptp_set_increment(struct i40e_pf *pf)
 /**
  * i40e_ptp_get_ts_config - ioctl interface to read the HW timestamping
  * @pf: Board private structure
- * @ifreq: ioctl data
+ * @ifr: ioctl data
  *
  * Obtain the current hardware timestamping settigs as requested. To do this,
  * keep a shadow copy of the timestamp settings rather than attempting to
@@ -627,7 +627,7 @@ static int i40e_ptp_set_timestamp_mode(struct i40e_pf *pf,
 /**
  * i40e_ptp_set_ts_config - ioctl interface to control the HW timestamping
  * @pf: Board private structure
- * @ifreq: ioctl data
+ * @ifr: ioctl data
  *
  * Respond to the user filter requests and make the appropriate hardware
  * changes here. The XL710 cannot support splitting of the Tx/Rx timestamping
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 6959897c789e..5efa68de935b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -471,7 +471,7 @@ static int i40e_add_del_fdir_ipv4(struct i40e_vsi *vsi,
 /**
  * i40e_add_del_fdir - Build raw packets to add/del fdir filter
  * @vsi: pointer to the targeted VSI
- * @cmd: command to get or set RX flow classification rules
+ * @input: filter to add or delete
  * @add: true adds a filter, false removes it
  *
  **/
@@ -689,7 +689,7 @@ void i40e_free_tx_resources(struct i40e_ring *tx_ring)
 
 /**
  * i40e_get_tx_pending - how many tx descriptors not processed
- * @tx_ring: the ring of descriptors
+ * @ring: the ring of descriptors
  * @in_sw: use SW variables
  *
  * Since there is no access to the ring head register
@@ -1771,6 +1771,8 @@ static inline int i40e_ptype_to_htype(u8 ptype)
  * i40e_rx_hash - set the hash value in the skb
  * @ring: descriptor ring
  * @rx_desc: specific descriptor
+ * @skb: skb currently being received and modified
+ * @rx_ptype: Rx packet type
  **/
 static inline void i40e_rx_hash(struct i40e_ring *ring,
 				union i40e_rx_desc *rx_desc,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 3e7ab9fbeb0c..2ceea63cc6cf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -8,8 +8,8 @@
 /**
  * i40e_vc_vf_broadcast
  * @pf: pointer to the PF structure
- * @opcode: operation code
- * @retval: return value
+ * @v_opcode: operation code
+ * @v_retval: return value
  * @msg: pointer to the msg buffer
  * @msglen: msg length
  *
@@ -1639,6 +1639,7 @@ static int i40e_vc_send_resp_to_vf(struct i40e_vf *vf,
 /**
  * i40e_vc_get_version_msg
  * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
  *
  * called from the VF to request the API version used by the PF
  **/
@@ -1682,7 +1683,6 @@ static void i40e_del_qch(struct i40e_vf *vf)
  * i40e_vc_get_vf_resources_msg
  * @vf: pointer to the VF info
  * @msg: pointer to the msg buffer
- * @msglen: msg length
  *
  * called from the VF to request its resources
  **/
@@ -1806,8 +1806,6 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
 /**
  * i40e_vc_reset_vf_msg
  * @vf: pointer to the VF info
- * @msg: pointer to the msg buffer
- * @msglen: msg length
  *
  * called from the VF to reset itself,
  * unlike other virtchnl messages, PF driver
@@ -3532,15 +3530,16 @@ static int i40e_vc_del_qch_msg(struct i40e_vf *vf, u8 *msg)
  * i40e_vc_process_vf_msg
  * @pf: pointer to the PF structure
  * @vf_id: source VF id
+ * @v_opcode: operation code
+ * @v_retval: unused return value code
  * @msg: pointer to the msg buffer
  * @msglen: msg length
- * @msghndl: msg handle
  *
  * called from the common aeq/arq handler to
  * process request from VF
  **/
 int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
-			   u32 v_retval, u8 *msg, u16 msglen)
+			   u32 __always_unused v_retval, u8 *msg, u16 msglen)
 {
 	struct i40e_hw *hw = &pf->hw;
 	int local_vf_id = vf_id - (s16)hw->func_caps.vf_base_id;
@@ -3991,7 +3990,8 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
  * i40e_ndo_set_vf_bw
  * @netdev: network interface device structure
  * @vf_id: VF identifier
- * @tx_rate: Tx rate
+ * @min_tx_rate: Minimum Tx rate
+ * @max_tx_rate: Maximum Tx rate
  *
  * configure VF Tx rate
  **/
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 0841b2d1ca6d..9cef54971312 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -1231,6 +1231,7 @@ i40e_status_code i40evf_aq_write_ddp(struct i40e_hw *hw, void *buff,
  * @hw: pointer to the hw struct
  * @buff: command buffer (size in bytes = buff_size)
  * @buff_size: buffer size in bytes
+ * @flags: AdminQ command flags
  * @cmd_details: pointer to command details structure or NULL
  **/
 enum
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 87101f565be1..a9730711e257 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -105,7 +105,7 @@ void i40evf_free_tx_resources(struct i40e_ring *tx_ring)
 
 /**
  * i40evf_get_tx_pending - how many Tx descriptors not processed
- * @tx_ring: the ring of descriptors
+ * @ring: the ring of descriptors
  * @in_sw: is tx_pending being checked in SW or HW
  *
  * Since there is no access to the ring head register
@@ -1046,6 +1046,8 @@ static inline int i40e_ptype_to_htype(u8 ptype)
  * i40e_rx_hash - set the hash value in the skb
  * @ring: descriptor ring
  * @rx_desc: specific descriptor
+ * @skb: skb currently being received and modified
+ * @rx_ptype: Rx packet type
  **/
 static inline void i40e_rx_hash(struct i40e_ring *ring,
 				union i40e_rx_desc *rx_desc,
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_client.c b/drivers/net/ethernet/intel/i40evf/i40evf_client.c
index a30d093a52d6..3cc9d60d0d72 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_client.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_client.c
@@ -178,7 +178,6 @@ void i40evf_notify_client_close(struct i40e_vsi *vsi, bool reset)
 /**
  * i40evf_client_add_instance - add a client instance to the instance list
  * @adapter: pointer to the board struct
- * @client: pointer to a client struct in the client list.
  *
  * Returns cinst ptr on success, NULL on failure
  **/
@@ -236,7 +235,6 @@ i40evf_client_add_instance(struct i40evf_adapter *adapter)
 /**
  * i40evf_client_del_instance - removes a client instance from the list
  * @adapter: pointer to the board struct
- * @client: pointer to the client struct
  *
  **/
 static
@@ -440,7 +438,7 @@ static u32 i40evf_client_virtchnl_send(struct i40e_info *ldev,
  * i40evf_client_setup_qvlist - send a message to the PF to setup iwarp qv map
  * @ldev: pointer to L2 context.
  * @client: Client pointer.
- * @qv_info: queue and vector list
+ * @qvlist_info: queue and vector list
  *
  * Return 0 on success or < 0 on error
  **/
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 404595efea78..69efe0aec76a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -202,7 +202,7 @@ static void i40evf_get_strings(struct net_device *netdev, u32 sset, u8 *data)
 
 /**
  * i40evf_get_priv_flags - report device private flags
- * @dev: network interface device structure
+ * @netdev: network interface device structure
  *
  * The get string set count and the string set should be matched for each
  * flag returned.  Add new strings for each flag to the i40e_gstrings_priv_flags
@@ -229,7 +229,7 @@ static u32 i40evf_get_priv_flags(struct net_device *netdev)
 
 /**
  * i40evf_set_priv_flags - set private flags
- * @dev: network interface device structure
+ * @netdev: network interface device structure
  * @flags: bit flags to be set
  **/
 static int i40evf_set_priv_flags(struct net_device *netdev, u32 flags)
@@ -603,6 +603,7 @@ static int i40evf_set_per_queue_coalesce(struct net_device *netdev,
  * i40evf_get_rxnfc - command to get RX flow classification rules
  * @netdev: network interface device structure
  * @cmd: ethtool rxnfc command
+ * @rule_locs: pointer to store rule locations
  *
  * Returns Success if the command is supported.
  **/
@@ -722,6 +723,7 @@ static u32 i40evf_get_rxfh_indir_size(struct net_device *netdev)
  * @netdev: network interface device structure
  * @indir: indirection table
  * @key: hash key
+ * @hfunc: hash function in use
  *
  * Reads the indirection table directly from the hardware. Always returns 0.
  **/
@@ -750,6 +752,7 @@ static int i40evf_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
  * @netdev: network interface device structure
  * @indir: indirection table
  * @key: hash key
+ * @hfunc: hash function to use
  *
  * Returns -EINVAL if the table specifies an inavlid queue id, otherwise
  * returns 0 after programming the table.
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 8f775a82e2fa..28a8cc4a14cb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -449,6 +449,7 @@ static void i40evf_irq_affinity_release(struct kref *ref) {}
 /**
  * i40evf_request_traffic_irqs - Initialize MSI-X interrupts
  * @adapter: board private structure
+ * @basename: device basename
  *
  * Allocates MSI-X vectors for tx and rx handling, and requests
  * interrupts from the kernel.
@@ -721,6 +722,7 @@ static void i40evf_del_vlan(struct i40evf_adapter *adapter, u16 vlan)
 /**
  * i40evf_vlan_rx_add_vid - Add a VLAN filter to a device
  * @netdev: network device struct
+ * @proto: unused protocol data
  * @vid: VLAN tag
  **/
 static int i40evf_vlan_rx_add_vid(struct net_device *netdev,
@@ -738,6 +740,7 @@ static int i40evf_vlan_rx_add_vid(struct net_device *netdev,
 /**
  * i40evf_vlan_rx_kill_vid - Remove a VLAN filter from a device
  * @netdev: network device struct
+ * @proto: unused protocol data
  * @vid: VLAN tag
  **/
 static int i40evf_vlan_rx_kill_vid(struct net_device *netdev,
@@ -3136,7 +3139,7 @@ static int i40evf_set_features(struct net_device *netdev,
 /**
  * i40evf_features_check - Validate encapsulated packet conforms to limits
  * @skb: skb buff
- * @netdev: This physical port's netdev
+ * @dev: This physical port's netdev
  * @features: Offload features that the stack believes apply
  **/
 static netdev_features_t i40evf_features_check(struct sk_buff *skb,
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index 8bcefb7f7ac0..565677de5ba3 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -155,8 +155,7 @@ int i40evf_send_vf_config_msg(struct i40evf_adapter *adapter)
 
 /**
  * i40evf_get_vf_config
- * @hw: pointer to the hardware structure
- * @len: length of buffer
+ * @adapter: private adapter structure
  *
  * Get VF configuration from PF and populate hw structure. Must be called after
  * admin queue is initialized. Busy waits until response is received from PF,
@@ -399,8 +398,6 @@ int i40evf_request_queues(struct i40evf_adapter *adapter, int num)
 /**
  * i40evf_add_ether_addrs
  * @adapter: adapter structure
- * @addrs: the MAC address filters to add (contiguous)
- * @count: number of filters
  *
  * Request that the PF add one or more addresses to our filters.
  **/
@@ -473,8 +470,6 @@ void i40evf_add_ether_addrs(struct i40evf_adapter *adapter)
 /**
  * i40evf_del_ether_addrs
  * @adapter: adapter structure
- * @addrs: the MAC address filters to remove (contiguous)
- * @count: number of filtes
  *
  * Request that the PF remove one or more addresses from our filters.
  **/
@@ -547,8 +542,6 @@ void i40evf_del_ether_addrs(struct i40evf_adapter *adapter)
 /**
  * i40evf_add_vlans
  * @adapter: adapter structure
- * @vlans: the VLANs to add
- * @count: number of VLANs
  *
  * Request that the PF add one or more VLAN filters to our VSI.
  **/
@@ -619,8 +612,6 @@ void i40evf_add_vlans(struct i40evf_adapter *adapter)
 /**
  * i40evf_del_vlans
  * @adapter: adapter structure
- * @vlans: the VLANs to remove
- * @count: number of VLANs
  *
  * Request that the PF remove one or more VLAN filters from our VSI.
  **/
-- 
2.14.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox