* Re: [PATCH v2] net: dsa: qca8k: enable port flow control
From: xiaofeis @ 2019-07-27 1:15 UTC (permalink / raw)
To: Andrew Lunn; +Cc: vkoul, netdev
In-Reply-To: <20190726132919.GB18223@lunn.ch>
On 2019-07-26 21:29, Andrew Lunn wrote:
>> I didn't compile it on this tree, same code is just compiled and
>> tested on
>> kernel v4.14.
>
> For kernel development work, v4.14 is dead. It died 12th November
> 2017. It gets backports of bug fixes, but kernel developers otherwise
> don't touch it.
>
>> We are working on one google project, all the change is
>> required to upstream by Google.
>> But if I do the change based on the new type for kernel 5.3, then the
>> commit
>> can't be used directly for Google's project.
>
> So you will need to backport the change. In this case, you will have a
> very different patch in v4.14 than in mainline, due to changes like
> this. That is part of the pain in using such an old kernel.
>
> You should use the function
>
> void phy_support_asym_pause(struct phy_device *phydev);
>
> to indicate the MAC supports asym pause.
>
> Andrew
Hi Andrew
Thanks a lot, you are correct. phy_support_asym_pause is the API to do
this.
I very much appreciate all your patient explanations and good suggestions.
Thanks
Xiaofeis
^ permalink raw reply
* Re: [PATCH bpf-next v10 06/10] bpf,landlock: Add a new map type: inode
From: Alexei Starovoitov @ 2019-07-27 1:40 UTC (permalink / raw)
To: Mickaël Salaün
Cc: linux-kernel, Alexander Viro, Alexei Starovoitov, Andrew Morton,
Andy Lutomirski, Arnaldo Carvalho de Melo, Casey Schaufler,
Daniel Borkmann, David Drysdale, David S . Miller,
Eric W . Biederman, James Morris, Jann Horn, John Johansen,
Jonathan Corbet, Kees Cook, Michael Kerrisk,
Mickaël Salaün, Paul Moore, Sargun Dhillon,
Serge E . Hallyn, Shuah Khan, Stephen Smalley, Tejun Heo,
Tetsuo Handa, Thomas Graf, Tycho Andersen, Will Drewry,
kernel-hardening, linux-api, linux-fsdevel, linux-security-module,
netdev
In-Reply-To: <20190721213116.23476-7-mic@digikod.net>
On Sun, Jul 21, 2019 at 11:31:12PM +0200, Mickaël Salaün wrote:
> FIXME: 64-bits in the doc
>
> This new map store arbitrary values referenced by inode keys. The map
> can be updated from user space with file descriptor pointing to inodes
> tied to a file system. From an eBPF (Landlock) program point of view,
> such a map is read-only and can only be used to retrieved a value tied
> to a given inode. This is useful to recognize an inode tagged by user
> space, without access right to this inode (i.e. no need to have a write
> access to this inode).
>
> Add dedicated BPF functions to handle this type of map:
> * bpf_inode_htab_map_update_elem()
> * bpf_inode_htab_map_lookup_elem()
> * bpf_inode_htab_map_delete_elem()
>
> This new map require a dedicated helper inode_map_lookup_elem() because
> of the key which is a pointer to an opaque data (only provided by the
> kernel). This act like a (physical or cryptographic) key, which is why
> it is also not allowed to get the next key.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
there are too many things to comment on.
Let's do this patch.
imo inode_map concept is interesting, but see below...
> +
> + /*
> + * Limit number of entries in an inode map to the maximum number of
> + * open files for the current process. The maximum number of file
> + * references (including all inode maps) for a process is then
> + * (RLIMIT_NOFILE - 1) * RLIMIT_NOFILE. If the process' RLIMIT_NOFILE
> + * is 0, then any entry update is forbidden.
> + *
> + * An eBPF program can inherit all the inode map FD. The worse case is
> + * to fill a bunch of arraymaps, create an eBPF program, close the
> + * inode map FDs, and start again. The maximum number of inode map
> + * entries can then be close to RLIMIT_NOFILE^3.
> + */
> + if (attr->max_entries > rlimit(RLIMIT_NOFILE))
> + return -EMFILE;
rlimit is checked, but no fd are consumed.
Once created such inode map_fd can be passed to a different process.
map_fd can be pinned into bpffs.
etc.
what is the value of the check?
> +
> + /* decorelate UAPI from kernel API */
> + attr->key_size = sizeof(struct inode *);
> +
> + return htab_map_alloc_check(attr);
> +}
> +
> +static void inode_htab_put_key(void *key)
> +{
> + struct inode **inode = key;
> +
> + if ((*inode)->i_state & I_FREEING)
> + return;
checking the state without taking a lock? isn't it racy?
> + iput(*inode);
> +}
> +
> +/* called from syscall or (never) from eBPF program */
> +static int map_get_next_no_key(struct bpf_map *map, void *key, void *next_key)
> +{
> + /* do not leak a file descriptor */
what is this comment supposed to mean?
> + return -ENOTSUPP;
> +}
> +
> +/* must call iput(inode) after this call */
> +static struct inode *inode_from_fd(int ufd, bool check_access)
> +{
> + struct inode *ret;
> + struct fd f;
> + int deny;
> +
> + f = fdget(ufd);
> + if (unlikely(!f.file))
> + return ERR_PTR(-EBADF);
> + /* TODO?: add this check when called from an eBPF program too (already
> + * checked by the LSM parent hooks anyway) */
> + if (unlikely(IS_PRIVATE(file_inode(f.file)))) {
> + ret = ERR_PTR(-EINVAL);
> + goto put_fd;
> + }
> + /* check if the FD is tied to a mount point */
> + /* TODO?: add this check when called from an eBPF program too */
> + if (unlikely(f.file->f_path.mnt->mnt_flags & MNT_INTERNAL)) {
> + ret = ERR_PTR(-EINVAL);
> + goto put_fd;
> + }
a bunch of TODOs do not inspire confidence.
> + if (check_access) {
> + /*
> + * must be allowed to access attributes from this file to then
> + * be able to compare an inode to its map entry
> + */
> + deny = security_inode_getattr(&f.file->f_path);
> + if (deny) {
> + ret = ERR_PTR(deny);
> + goto put_fd;
> + }
> + }
> + ret = file_inode(f.file);
> + ihold(ret);
> +
> +put_fd:
> + fdput(f);
> + return ret;
> +}
> +
> +/*
> + * The key is a FD when called from a syscall, but an inode address when called
> + * from an eBPF program.
> + */
> +
> +/* called from syscall */
> +int bpf_inode_fd_htab_map_lookup_elem(struct bpf_map *map, int *key, void *value)
> +{
> + void *ptr;
> + struct inode *inode;
> + int ret;
> +
> + /* check inode access */
> + inode = inode_from_fd(*key, true);
> + if (IS_ERR(inode))
> + return PTR_ERR(inode);
> +
> + rcu_read_lock();
> + ptr = htab_map_lookup_elem(map, &inode);
> + iput(inode);
> + if (IS_ERR(ptr)) {
> + ret = PTR_ERR(ptr);
> + } else if (!ptr) {
> + ret = -ENOENT;
> + } else {
> + ret = 0;
> + copy_map_value(map, value, ptr);
> + }
> + rcu_read_unlock();
> + return ret;
> +}
> +
> +/* called from kernel */
wrong comment?
kernel side cannot call it, right?
> +int bpf_inode_ptr_locked_htab_map_delete_elem(struct bpf_map *map,
> + struct inode **key, bool remove_in_inode)
> +{
> + if (remove_in_inode)
> + landlock_inode_remove_map(*key, map);
> + return htab_map_delete_elem(map, key);
> +}
> +
> +/* called from syscall */
> +int bpf_inode_fd_htab_map_delete_elem(struct bpf_map *map, int *key)
> +{
> + struct inode *inode;
> + int ret;
> +
> + /* do not check inode access (similar to directory check) */
> + inode = inode_from_fd(*key, false);
> + if (IS_ERR(inode))
> + return PTR_ERR(inode);
> + ret = bpf_inode_ptr_locked_htab_map_delete_elem(map, &inode, true);
> + iput(inode);
> + return ret;
> +}
> +
> +/* called from syscall */
> +int bpf_inode_fd_htab_map_update_elem(struct bpf_map *map, int *key, void *value,
> + u64 map_flags)
> +{
> + struct inode *inode;
> + int ret;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + /* check inode access */
> + inode = inode_from_fd(*key, true);
> + if (IS_ERR(inode))
> + return PTR_ERR(inode);
> + ret = htab_map_update_elem(map, &inode, value, map_flags);
> + if (!ret)
> + ret = landlock_inode_add_map(inode, map);
> + iput(inode);
> + return ret;
> +}
> +
> +static void inode_htab_map_free(struct bpf_map *map)
> +{
> + struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> + struct hlist_nulls_node *n;
> + struct hlist_nulls_head *head;
> + struct htab_elem *l;
> + int i;
> +
> + for (i = 0; i < htab->n_buckets; i++) {
> + head = select_bucket(htab, i);
> + hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
> + landlock_inode_remove_map(*((struct inode **)l->key), map);
> + }
> + }
> + htab_map_free(map);
> +}
user space can delete the map.
that will trigger inode_htab_map_free() which will call
landlock_inode_remove_map().
which will simply iterate the list and delete from the list.
While in parallel inode can be destroyed and hook_inode_free_security()
will be called.
I think nothing protects from this race.
> +
> +/*
> + * We need a dedicated helper to deal with inode maps because the key is a
> + * pointer to an opaque data, only provided by the kernel. This really act
> + * like a (physical or cryptographic) key, which is why it is also not allowed
> + * to get the next key with map_get_next_key().
inode pointer is like cryptographic key? :)
> + */
> +BPF_CALL_2(bpf_inode_map_lookup_elem, struct bpf_map *, map, void *, key)
> +{
> + WARN_ON_ONCE(!rcu_read_lock_held());
> + return (unsigned long)htab_map_lookup_elem(map, &key);
> +}
> +
> +const struct bpf_func_proto bpf_inode_map_lookup_elem_proto = {
> + .func = bpf_inode_map_lookup_elem,
> + .gpl_only = false,
> + .pkt_access = true,
pkt_access ? :)
> + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
> + .arg1_type = ARG_CONST_MAP_PTR,
> + .arg2_type = ARG_PTR_TO_INODE,
> +};
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b2a8cb14f28e..e46441c42b68 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -801,6 +801,8 @@ static int map_lookup_elem(union bpf_attr *attr)
> } else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> map->map_type == BPF_MAP_TYPE_STACK) {
> err = map->ops->map_peek_elem(map, value);
> + } else if (map->map_type == BPF_MAP_TYPE_INODE) {
> + err = bpf_inode_fd_htab_map_lookup_elem(map, key, value);
> } else {
> rcu_read_lock();
> if (map->ops->map_lookup_elem_sys_only)
> @@ -951,6 +953,10 @@ static int map_update_elem(union bpf_attr *attr)
> } else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> map->map_type == BPF_MAP_TYPE_STACK) {
> err = map->ops->map_push_elem(map, value, attr->flags);
> + } else if (map->map_type == BPF_MAP_TYPE_INODE) {
> + rcu_read_lock();
> + err = bpf_inode_fd_htab_map_update_elem(map, key, value, attr->flags);
> + rcu_read_unlock();
> } else {
> rcu_read_lock();
> err = map->ops->map_update_elem(map, key, value, attr->flags);
> @@ -1006,7 +1012,10 @@ static int map_delete_elem(union bpf_attr *attr)
> preempt_disable();
> __this_cpu_inc(bpf_prog_active);
> rcu_read_lock();
> - err = map->ops->map_delete_elem(map, key);
> + if (map->map_type == BPF_MAP_TYPE_INODE)
> + err = bpf_inode_fd_htab_map_delete_elem(map, key);
> + else
> + err = map->ops->map_delete_elem(map, key);
> rcu_read_unlock();
> __this_cpu_dec(bpf_prog_active);
> preempt_enable();
> @@ -1018,6 +1027,22 @@ static int map_delete_elem(union bpf_attr *attr)
> return err;
> }
>
> +int bpf_inode_ptr_unlocked_htab_map_delete_elem(struct bpf_map *map,
> + struct inode **key, bool remove_in_inode)
> +{
> + int err;
> +
> + preempt_disable();
> + __this_cpu_inc(bpf_prog_active);
> + rcu_read_lock();
> + err = bpf_inode_ptr_locked_htab_map_delete_elem(map, key, remove_in_inode);
> + rcu_read_unlock();
> + __this_cpu_dec(bpf_prog_active);
> + preempt_enable();
> + maybe_wait_bpf_programs(map);
if that function was actually doing synchronize_rcu() the consequences
would have been unpleasant. Fortunately it's a nop in this case.
Please read the code carefully before copy-paste.
Also what do you think is the reason for bpf_prog_active above?
What is the reason for rcu_read_lock above?
I think the patch set needs to shrink at least in half to be reviewable.
The way you tie seccomp and lsm is probably a bigger obstacle
than any of the bugs above.
Can you drop seccomp ? and do it as normal lsm ?
^ permalink raw reply
* Re: [PATCH] b43legacy: Remove pointless cond_resched() wrapper
From: Larry Finger @ 2019-07-27 1:52 UTC (permalink / raw)
To: Thomas Gleixner, netdev; +Cc: b43-dev, Kalle Valo
In-Reply-To: <alpine.DEB.2.21.1907262157500.1791@nanos.tec.linutronix.de>
On 7/26/19 3:00 PM, Thomas Gleixner wrote:
> cond_resched() can be used unconditionally. If CONFIG_PREEMPT is set, it
> becomes a NOP scheduler wise.
>
> Also the B43_BUG_ON() in that wrapper is a homebrewn variant of
> __might_sleep() which is part of cond_resched() already.
>
> Remove the wrapper and invoke cond_resched() directly.
>
> Found while looking for CONFIG_PREEMPT dependent code treewide.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: netdev@vger.kernel.org
> Cc: b43-dev@lists.infradead.org
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: Larry Finger <Larry.Finger@lwfinger.net>
Reviewed- and Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Thanks.
Larry
^ permalink raw reply
* Re: [PATCH] isdn/gigaset: check endpoint null in gigaset_probe
From: Phong Tran @ 2019-07-27 1:56 UTC (permalink / raw)
To: Paul Bolle, isdn, gregkh
Cc: tranmanphong, gigaset307x-common, netdev, linux-kernel,
linux-kernel-mentees, syzbot+35b1c403a14f5c89eba7
In-Reply-To: <1876196a0e7fc665f0f50d5e9c0e2641f713e089.camel@tiscali.nl>
On 7/26/19 9:22 PM, Paul Bolle wrote:
> Phong Tran schreef op vr 26-07-2019 om 20:35 [+0700]:
>> This fixed the potential reference NULL pointer while using variable
>> endpoint.
>>
>> Reported-by: syzbot+35b1c403a14f5c89eba7@syzkaller.appspotmail.com
>> Tested by syzbot:
>> https://groups.google.com/d/msg/syzkaller-bugs/wnHG8eRNWEA/Qn2HhjNdBgAJ
>>
>> Signed-off-by: Phong Tran <tranmanphong@gmail.com>
>> ---
>> drivers/isdn/gigaset/usb-gigaset.c | 9 +++++++++
>
> This is now drivers/staging/isdn/gigaset/usb-gigaset.c.
this patch was created based on branch
kasan/usb-fuzzer-usb-testing-2019.07.11 [1]
I did not notice that the driver was moved to staging.
>
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/isdn/gigaset/usb-gigaset.c b/drivers/isdn/gigaset/usb-gigaset.c
>> index 1b9b43659bdf..2e011f3db59e 100644
>> --- a/drivers/isdn/gigaset/usb-gigaset.c
>> +++ b/drivers/isdn/gigaset/usb-gigaset.c
>> @@ -703,6 +703,10 @@ static int gigaset_probe(struct usb_interface *interface,
>> usb_set_intfdata(interface, cs);
>>
>> endpoint = &hostif->endpoint[0].desc;
>> + if (!endpoint) {
>> + dev_err(cs->dev, "Couldn't get control endpoint\n");
>> + return -ENODEV;
>> + }
>
> When can this happen? Is this one of those bugs that one can only trigger with
> a specially crafted (evil) usb device?
>
Yes, in my understanding, this only happens with random testing by syzbot.
>> buffer_size = le16_to_cpu(endpoint->wMaxPacketSize);
>> ucs->bulk_out_size = buffer_size;
>> @@ -722,6 +726,11 @@ static int gigaset_probe(struct usb_interface *interface,
>> }
>>
>> endpoint = &hostif->endpoint[1].desc;
>> + if (!endpoint) {
>> + dev_err(cs->dev, "Endpoint not available\n");
>> + retval = -ENODEV;
>> + goto error;
>> + }
>>
>> ucs->busy = 0;
>>
>
> Please note that I'm very close to getting cut off from the ISDN network, so
> the chances of being able to testi this on a live system are getting small.
>
This bug may be invalid now. Do you agree?
There are instructions for reporting an invalid bug to syzbot [2].
> Thanks,
>
>
> Paul Bolle
>
[1]
https://github.com/google/kasan/commits/usb-fuzzer-usb-testing-2019.07.11
[2]
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
Thanks,
Phong
^ permalink raw reply
* [PATCH] tcp: add new tcp_mtu_probe_floor sysctl
From: Josh Hunt @ 2019-07-27 2:23 UTC (permalink / raw)
To: netdev; +Cc: davem, edumazet, Josh Hunt
The current implementation of TCP MTU probing can considerably
underestimate the MTU on lossy connections allowing the MSS to get down to
48. We have found that in almost all of these cases on our networks these
paths can handle much larger MTUs meaning the connections are being
artificially limited. Even though TCP MTU probing can raise the MSS back up
we have seen this not to be the case causing connections to be "stuck" with
an MSS of 48 when heavy loss is present.
Prior to pushing out this change we could not keep TCP MTU probing enabled
b/c of the above reasons. Now with a reasonable floor set we've had it
enabled for the past 6 months.
The new sysctl will still default to TCP_MIN_SND_MSS (48), but gives
administrators the ability to control the floor of MSS probing.
Signed-off-by: Josh Hunt <johunt@akamai.com>
---
Documentation/networking/ip-sysctl.txt | 6 ++++++
include/net/netns/ipv4.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 9 +++++++++
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/tcp_timer.c | 2 +-
5 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index df33674799b5..49e95f438ed7 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -256,6 +256,12 @@ tcp_base_mss - INTEGER
Path MTU discovery (MTU probing). If MTU probing is enabled,
this is the initial MSS used by the connection.
+tcp_mtu_probe_floor - INTEGER
+ If MTU probing is enabled this caps the minimum MSS used for search_low
+ for the connection.
+
+ Default : 48
+
tcp_min_snd_mss - INTEGER
TCP SYN and SYNACK messages usually advertise an ADVMSS option,
as described in RFC 1122 and RFC 6691.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index bc24a8ec1ce5..c0c0791b1912 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -116,6 +116,7 @@ struct netns_ipv4 {
int sysctl_tcp_l3mdev_accept;
#endif
int sysctl_tcp_mtu_probing;
+ int sysctl_tcp_mtu_probe_floor;
int sysctl_tcp_base_mss;
int sysctl_tcp_min_snd_mss;
int sysctl_tcp_probe_threshold;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0b980e841927..59ded25acd04 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -820,6 +820,15 @@ static struct ctl_table ipv4_net_table[] = {
.extra2 = &tcp_min_snd_mss_max,
},
{
+ .procname = "tcp_mtu_probe_floor",
+ .data = &init_net.ipv4.sysctl_tcp_mtu_probe_floor,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &tcp_min_snd_mss_min,
+ .extra2 = &tcp_min_snd_mss_max,
+ },
+ {
.procname = "tcp_probe_threshold",
.data = &init_net.ipv4.sysctl_tcp_probe_threshold,
.maxlen = sizeof(int),
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d57641cb3477..e0a372676329 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2637,6 +2637,7 @@ static int __net_init tcp_sk_init(struct net *net)
net->ipv4.sysctl_tcp_min_snd_mss = TCP_MIN_SND_MSS;
net->ipv4.sysctl_tcp_probe_threshold = TCP_PROBE_THRESHOLD;
net->ipv4.sysctl_tcp_probe_interval = TCP_PROBE_INTERVAL;
+ net->ipv4.sysctl_tcp_mtu_probe_floor = TCP_MIN_SND_MSS;
net->ipv4.sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
net->ipv4.sysctl_tcp_keepalive_probes = TCP_KEEPALIVE_PROBES;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c801cd37cc2a..dbd9d2d0ee63 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -154,7 +154,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
} else {
mss = tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low) >> 1;
mss = min(net->ipv4.sysctl_tcp_base_mss, mss);
- mss = max(mss, 68 - tcp_sk(sk)->tcp_header_len);
+ mss = max(mss, net->ipv4.sysctl_tcp_mtu_probe_floor);
mss = max(mss, net->ipv4.sysctl_tcp_min_snd_mss);
icsk->icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
}
--
2.7.4
^ permalink raw reply related
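The clamping order in the tcp_timer.c hunk above can be sketched in userspace C as follows. This is only an illustration: demo_next_probe_mss() and its parameters are hypothetical names, not kernel functions, and the MTU/MSS conversion done by tcp_mtu_to_mss() and tcp_mss_to_mtu() is left out.

```c
#include <assert.h>

/* Hypothetical userspace model of the clamp in tcp_mtu_probing().
 * All names here are illustrative; none of them exist in the kernel. */
static int demo_min(int a, int b) { return a < b ? a : b; }
static int demo_max(int a, int b) { return a > b ? a : b; }

/*
 * cur_mss:     MSS derived from the current search_low
 * base_mss:    models sysctl_tcp_base_mss
 * probe_floor: models the proposed sysctl_tcp_mtu_probe_floor
 * min_snd_mss: models sysctl_tcp_min_snd_mss
 */
static int demo_next_probe_mss(int cur_mss, int base_mss,
			       int probe_floor, int min_snd_mss)
{
	int mss;

	assert(cur_mss >= 0);
	mss = cur_mss >> 1;               /* halve the current MSS */
	mss = demo_min(base_mss, mss);    /* never above the base MSS */
	mss = demo_max(mss, probe_floor); /* new: administrator-set floor */
	mss = demo_max(mss, min_snd_mss); /* existing global minimum */
	return mss;
}
```

Under this model, repeatedly applying the function to a starting MSS of 1400 with both limits at 48 ends at 48, while a floor of 512 (an example value, not a recommendation) stops the descent at 512.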
* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Alexei Starovoitov @ 2019-07-27 2:24 UTC (permalink / raw)
To: sedat.dilek
Cc: Yonghong Song, Alexei Starovoitov, Daniel Borkmann, Martin Lau,
Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
Clang-Built-Linux ML, Kees Cook, Nick Desaulniers,
Nathan Chancellor
In-Reply-To: <CA+icZUUe0QE9QGMom1iQwuG8nM7Oi4Mq0GKqrLvebyxfUmj6RQ@mail.gmail.com>
On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
> >
> >
> >
> > On 7/26/19 2:02 PM, Sedat Dilek wrote:
> > > On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > >>
> > >> Hi Yonghong Song,
> > >>
> > >> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> > >>>
> > >>> Glad to know clang 9 has asm goto support and now It can compile
> > >>> kernel again.
> > >>>
> > >>
> > >> Yupp.
> > >>
> > >>>>
> > >>>> I am seeing a problem in the area bpf/seccomp causing
> > >>>> systemd/journald/udevd services to fail.
> > >>>>
> > >>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> > >>>> to connect stdout to the journal socket, ignoring: Connection refused
> > >>>>
> > >>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> > >>>> BFD linker ld.bfd on Debian/buster AMD64.
> > >>>> In both cases I use clang-9 (prerelease).
> > >>>
> > >>> Looks like it is a lld bug.
> > >>>
> > >>> I see the stack trace has __bpf_prog_run32() which is used by
> > >>> kernel bpf interpreter. Could you try to enable bpf jit
> > >>> sysctl net.core.bpf_jit_enable = 1
> > >>> If this passed, it will prove it is interpreter related.
> > >>>
> > >>
> > >> After...
> > >>
> > >> sysctl -w net.core.bpf_jit_enable=1
> > >>
> > >> I can start all failed systemd services.
> > >>
> > >> systemd-journald.service
> > >> systemd-udevd.service
> > >> haveged.service
> > >>
> > >> This is in maintenance mode.
> > >>
> > >> What is next: Do set a permanent sysctl setting for net.core.bpf_jit_enable?
> > >>
> > >
> > > This is what I did:
> >
> > I probably won't have cycles to debug this potential lld issue.
> > Maybe you already did, I suggest you put enough reproducible
> > details in the bug you filed against lld so they can take a look.
> >
>
> I understand and will put the journalctl-log into the CBL issue
> tracker and update informations.
>
> Thanks for your help understanding the BPF correlations.
>
> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
jit_enable=1 is enough.
Or use CONFIG_BPF_JIT_ALWAYS_ON to work around it.
It sounds like clang miscompiles the interpreter.
modprobe test_bpf
should be able to point out which part of interpreter is broken.
^ permalink raw reply
* Re: [PATCH bpf-next v5 0/6] xdp: Add devmap_hash map type
From: Alexei Starovoitov @ 2019-07-27 2:26 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Daniel Borkmann, Alexei Starovoitov, Network Development,
David Miller, Jesper Dangaard Brouer, Jakub Kicinski,
Björn Töpel, Yonghong Song
In-Reply-To: <156415721066.13581.737309854787645225.stgit@alrua-x1>
On Fri, Jul 26, 2019 at 9:06 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> This series adds a new map type, devmap_hash, that works like the existing
> devmap type, but using a hash-based indexing scheme. This is useful for the use
> case where a devmap is indexed by ifindex (for instance for use with the routing
> table lookup helper). For this use case, the regular devmap needs to be sized
> after the maximum ifindex number, not the number of devices in it. A hash-based
> indexing scheme makes it possible to size the map after the number of devices it
> should contain instead.
>
> This was previously part of my patch series that also turned the regular
> bpf_redirect() helper into a map-based one; for this series I just pulled out
> the patches that introduced the new map type.
>
> Changelog:
>
> v5:
>
> - Dynamically set the number of hash buckets by rounding up max_entries to the
> nearest power of two (mirroring the regular hashmap), as suggested by Jesper.
fyi I'm waiting for Jesper to review this new version.
^ permalink raw reply
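The v5 changelog entry about sizing the bucket array can be illustrated with a small userspace sketch. Both functions below are hypothetical stand-ins written for this note (the kernel has its own roundup_pow_of_two() helper), and the masking step is assumed here as the usual reason for power-of-two sizing rather than taken from the series itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= n, for 1 <= n <= 2^31. */
static uint32_t demo_roundup_pow_of_two(uint32_t n)
{
	uint32_t p = 1;

	assert(n >= 1 && n <= (UINT32_C(1) << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/* With a power-of-two bucket count, hash-to-bucket is a mask, not a modulo. */
static uint32_t demo_bucket_index(uint32_t hash, uint32_t n_buckets)
{
	return hash & (n_buckets - 1);
}
```

For example, a map created with max_entries of 65 would get 128 buckets under this scheme, regardless of how large the ifindex values used as keys are.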
* Re: [PATCH V2 net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: liuyonglong @ 2019-07-27 2:28 UTC (permalink / raw)
To: Joe Perches, Saeed Mahameed, tanhuazhong@huawei.com,
davem@davemloft.net
Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
salil.mehta@huawei.com, linuxarm@huawei.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <05602c954c689ffcd796e9468c52bca6fa4efe3f.camel@perches.com>
On 2019/7/27 6:18, Joe Perches wrote:
> On Fri, 2019-07-26 at 22:00 +0000, Saeed Mahameed wrote:
>> On Fri, 2019-07-26 at 11:24 +0800, Huazhong Tan wrote:
>>> From: Yonglong Liu <liuyonglong@huawei.com>
>>>
>>> Some times just see the eth interface have been down/up via
>>> dmesg, but can not know why the eth down. So adds some debug
>>> messages to identify the cause for this.
> []
>>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
>> []
>>> @@ -459,6 +459,10 @@ static int hns3_nic_net_open(struct net_device
>>> *netdev)
>>> h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
>>>
>>> hns3_config_xps(priv);
>>> +
>>> + if (netif_msg_drv(h))
>>> + netdev_info(netdev, "net open\n");
>>> +
>>
>> to make sure this is only intended for debug, and to avoid repetition.
>> #define hns3_dbg(__dev, format, args...) \
>> ({ \
>> if (netif_msg_drv(h)) \
>> netdev_info(h->netdev, format, ##args); \
>> })
>
> netif_dbg(h, drv, h->netdev, "net open\n")
>
Hi, Saeed && Joe:
For our cases, maybe netif_info() can be used for HNS3 drivers?
netif_dbg additionally requires enabling dynamic debug options.
^ permalink raw reply
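The trade-off raised in this message can be modelled in a few lines of userspace C. This is a rough sketch under stated assumptions: an info-style message is gated only by the driver's message-level bitmask, while a debug-style message additionally needs debug output switched on (for example through dynamic debug). The struct, macro and function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not kernel APIs. */
#define DEMO_MSG_DRV 0x0001u /* stands in for the "drv" message class bit */

struct demo_priv {
	unsigned int msg_enable; /* bitmask of enabled message classes */
	bool dyndbg_on;          /* models a dynamic-debug site being enabled */
};

/* Info-style message: printed whenever the class bit is set. */
static bool demo_info_would_print(const struct demo_priv *p, unsigned int cls)
{
	assert(p != NULL);
	return (p->msg_enable & cls) != 0;
}

/* Debug-style message: the class bit AND debug output must both be on. */
static bool demo_dbg_would_print(const struct demo_priv *p, unsigned int cls)
{
	return demo_info_would_print(p, cls) && p->dyndbg_on;
}
```

In this model the info variant appears as soon as the "drv" bit is set, which is the behaviour asked for in the message above, at the cost of the message being logged at a higher level.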
* Re: memory leak in kobject_set_name_vargs (2)
From: Linus Torvalds @ 2019-07-27 2:29 UTC (permalink / raw)
To: syzbot
Cc: Catalin Marinas, David Miller, Dmitry Vyukov, Herbert Xu, kuznet,
Kalle Valo, Linux List Kernel Mailing, Linux-MM, luciano.coelho,
Netdev, steffen.klassert, syzkaller-bugs, yoshfuji
In-Reply-To: <00000000000083ffc4058e9dddf0@google.com>
On Fri, Jul 26, 2019 at 4:26 PM syzbot
<syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com> wrote:
>
> syzbot has bisected this bug to:
>
> commit 0e034f5c4bc408c943f9c4a06244415d75d7108c
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Wed May 18 18:51:25 2016 +0000
>
> iwlwifi: fix mis-merge that breaks the driver
While this bisection looks more likely than the other syzbot entry
that bisected to a version change, I don't think it is correct either.
The bisection ended up doing a lot of "git bisect skip" because of the
undefined reference to `nf_nat_icmp_reply_translation'
issue. Also, the memory leak doesn't seem to be entirely reliable:
when the bisect does 10 runs to verify that some test kernel is bad,
there are a couple of cases where only one or two of the ten runs
failed.
Which makes me wonder if one or two of the "everything OK" runs were
actually buggy, but just happened to have all ten pass...
Linus
^ permalink raw reply
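The concern about unreliable reproduction can be put in rough numbers. Assuming, purely for illustration, that each test run of a bad kernel fails independently with probability p, the chance that all ten runs pass is (1 - p)^10. The function below computes that; its name and the independence assumption belong to this sketch and do not describe how syzbot actually works.

```c
#include <assert.h>

/* Probability that `runs` independent test runs all pass when each run
 * fails with probability p_fail (illustrative model only). */
static double demo_all_pass_probability(double p_fail, int runs)
{
	double p = 1.0;
	int i;

	assert(p_fail >= 0.0 && p_fail <= 1.0 && runs >= 0);
	for (i = 0; i < runs; i++)
		p *= 1.0 - p_fail;
	return p;
}
```

Under this model, a per-run failure rate of 0.1 gives roughly a 35% chance that all ten runs pass, and a rate of 0.2 gives roughly 11%, so a buggy kernel being marked good during bisection would not be surprising.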
* Re: memory leak in kobject_set_name_vargs (2)
From: Qian Cai @ 2019-07-27 2:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: syzbot, Catalin Marinas, David Miller, Dmitry Vyukov, Herbert Xu,
kuznet, Kalle Valo, Linux List Kernel Mailing, Linux-MM,
luciano.coelho, Netdev, steffen.klassert, syzkaller-bugs,
yoshfuji, Wang Hai, Andy Shevchenko, David S. Miller
In-Reply-To: <CAHk-=why-PdP_HNbskRADMp1bnj+FwUDYpUZSYoNLNHMRPtoVA@mail.gmail.com>
> On Jul 26, 2019, at 10:29 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Fri, Jul 26, 2019 at 4:26 PM syzbot
> <syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com> wrote:
>>
>> syzbot has bisected this bug to:
>>
>> commit 0e034f5c4bc408c943f9c4a06244415d75d7108c
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date: Wed May 18 18:51:25 2016 +0000
>>
>> iwlwifi: fix mis-merge that breaks the driver
>
> While this bisection looks more likely than the other syzbot entry
> that bisected to a version change, I don't think it is correct either.
>
> The bisection ended up doing a lot of "git bisect skip" because of the
>
> undefined reference to `nf_nat_icmp_reply_translation'
>
> issue. Also, the memory leak doesn't seem to be entirely reliable:
> when the bisect does 10 runs to verify that some test kernel is bad,
> there are a couple of cases where only one or two of the ten runs
> failed.
>
> Which makes me wonder if one or two of the "everything OK" runs were
> actually buggy, but just happened to have all ten pass…
The real bisection should point to
8ed633b9baf9e ("Revert "net-sysfs: Fix memory leak in netdev_register_kobject"")
I did encounter those memory leaks and came up with a similar fix in
6b70fc94afd1 ("net-sysfs: Fix memory leak in netdev_register_kobject")
but those error handling paths are tricky and it seems nobody did much testing there, so it will
keep hitting other bugs in upper functions.
^ permalink raw reply
* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Andrew Lunn @ 2019-07-27 3:02 UTC (permalink / raw)
To: Allan W. Nielsen
Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge, netdev,
linux-kernel
In-Reply-To: <20190726195010.7x75rr74v7ph3m6m@lx-anielsen.microsemi.net>
> As you properly guessed, this model is quite different from what we are used to.
Yes, it takes a while to get the idea that the hardware is just an
accelerator for what the Linux stack can already do. And if the switch
cannot do some feature, pass the frame to Linux so it can handle it.
You need to keep in mind that there could be other ports in the bridge
than switch ports, and those ports might be interested in the
multicast traffic. Hence the CPU needs to see the traffic. But IGMP
snooping can be used to optimise this. But you still need to be
careful, eg. IPv6 Neighbour discovery has often been broken on
mv88e6xxx because we have been too aggressive with filtering
multicast.
Andrew
^ permalink raw reply
* Re: [PATCH V2 net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: Joe Perches @ 2019-07-27 3:14 UTC (permalink / raw)
To: liuyonglong, Saeed Mahameed, tanhuazhong@huawei.com,
davem@davemloft.net
Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
salil.mehta@huawei.com, linuxarm@huawei.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <f517dc69-6356-98fe-fb7a-0427728814bb@huawei.com>
On Sat, 2019-07-27 at 10:28 +0800, liuyonglong wrote:
> On 2019/7/27 6:18, Joe Perches wrote:
> > On Fri, 2019-07-26 at 22:00 +0000, Saeed Mahameed wrote:
> > > On Fri, 2019-07-26 at 11:24 +0800, Huazhong Tan wrote:
> > > > From: Yonglong Liu <liuyonglong@huawei.com>
> > > >
> > > > Some times just see the eth interface have been down/up via
> > > > dmesg, but can not know why the eth down. So adds some debug
> > > > messages to identify the cause for this.
> > []
> > > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> > > []
> > > > @@ -459,6 +459,10 @@ static int hns3_nic_net_open(struct net_device
> > > > *netdev)
> > > > h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
> > > >
> > > > hns3_config_xps(priv);
> > > > +
> > > > + if (netif_msg_drv(h))
> > > > + netdev_info(netdev, "net open\n");
> > > > +
> > >
> > > to make sure this is only intended for debug, and to avoid repetition.
> > > #define hns3_dbg(__dev, format, args...) \
> > > ({ \
> > > if (netif_msg_drv(h)) \
> > > netdev_info(h->netdev, format, ##args); \
> > > })
> >
> > netif_dbg(h, drv, h->netdev, "net open\n")
> >
>
> Hi, Saeed && Joe:
> For our cases, maybe netif_info() can be use for HNS3 drivers?
> netif_dbg need to open dynamic debug options additional.
Your code, your choice.
I do think littering dmesg with "net open" style messages
and such may be unnecessary. KERN_DEBUG seems a more
appropriate log level.
^ permalink raw reply
* Re: [PATCH bpf-next 01/10] libbpf: add .BTF.ext offset relocation section loading
From: Andrii Nakryiko @ 2019-07-27 5:11 UTC (permalink / raw)
To: Song Liu
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
Daniel Borkmann, Yonghong Song, Kernel Team
In-Reply-To: <B01B98E5-CDFB-4E3A-BD58-DBA3113C3C3F@fb.com>
On Wed, Jul 24, 2019 at 10:20 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Jul 24, 2019, at 5:37 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 5:00 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >>
> >>
> >>> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> >>>
> >>> Add support for BPF CO-RE offset relocations. Add section/record
> >>> iteration macros for .BTF.ext. These macros are useful for iterating over
> >>> each .BTF.ext record, either for dumping out contents or later for BPF
> >>> CO-RE relocation handling.
> >>>
> >>> To enable other parts of libbpf to work with .BTF.ext contents, moved
> >>> a bunch of type definitions into libbpf_internal.h.
> >>>
> >>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >>> ---
> >>> tools/lib/bpf/btf.c | 64 +++++++++--------------
> >>> tools/lib/bpf/btf.h | 4 ++
> >>> tools/lib/bpf/libbpf_internal.h | 91 +++++++++++++++++++++++++++++++++
> >>> 3 files changed, 118 insertions(+), 41 deletions(-)
> >>>
> >
> > [...]
> >
> >>> +
> >>> static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
> >>> {
> >>> const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
> >>> @@ -1004,6 +979,13 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
> >>> if (err)
> >>> goto done;
> >>>
> >>> + /* check if there is offset_reloc_off/offset_reloc_len fields */
> >>> + if (btf_ext->hdr->hdr_len < sizeof(struct btf_ext_header))
> >>
> >> This check will break when we add more optional sections to btf_ext_header.
> >> Maybe use offsetof() instead?
> >
> > I didn't do it, because there are no fields after offset_reloc_len.
> > But now I thought that maybe it would be ok to add a zero-sized marker
> > field, kind of like marking off various versions of btf_ext header?
> >
> > Alternatively, I can add offsetofend() macro somewhere in libbpf_internal.h.
> >
> > Do you have any preference?
>
> We only need a stable number to compare against. offsetofend() works.
> Or we can simply have something like
>
> if (btf_ext->hdr->hdr_len <= offsetof(struct btf_ext_header, offset_reloc_off))
> goto done;
> or
> if (btf_ext->hdr->hdr_len < offsetof(struct btf_ext_header, offset_reloc_len))
> goto done;
>
> Does this make sense?
I think offsetofend() is the cleanest solution, I'll do just that.
>
> Thanks,
> Song
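
A minimal sketch of the offsetofend() approach, assuming a simplified
header layout. The struct, the macro name and the helper below are
hypothetical and only illustrate the hdr_len comparison; they are not
the libbpf definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: fixed fields first, optional fields at the end. */
struct sketch_ext_header {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t func_info_off;
	uint32_t func_info_len;
	uint32_t line_info_off;
	uint32_t line_info_len;
	/* optional part, present only in newer headers */
	uint32_t offset_reloc_off;
	uint32_t offset_reloc_len;
};

/* Offset of the first byte after MEMBER within TYPE. */
#define sketch_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Nonzero when a header of hdr_len bytes contains both optional fields. */
static int sketch_has_offset_reloc(uint32_t hdr_len)
{
	return hdr_len >= sketch_offsetofend(struct sketch_ext_header,
					     offset_reloc_len);
}
```

The comparison value stays fixed at the end of offset_reloc_len even if
more optional fields are appended to the struct later, which is what
sizeof(struct ...) does not guarantee.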
^ permalink raw reply
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-27 5:20 UTC (permalink / raw)
To: Himadri Pandya
Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4160 bytes --]
Hi Himadri,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
>> net/vmw_vsock/hyperv_transport.c:58:28: error: 'HV_HYP_PAGE_SIZE' undeclared here (not in a function); did you mean 'HV_MESSAGE_SIZE'?
#define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
^
>> net/vmw_vsock/hyperv_transport.c:65:10: note: in expansion of macro 'HVS_SEND_BUF_SIZE'
u8 data[HVS_SEND_BUF_SIZE];
^~~~~~~~~~~~~~~~~
In file included from include/linux/list.h:9:0,
from include/linux/module.h:9,
from net/vmw_vsock/hyperv_transport.c:11:
net/vmw_vsock/hyperv_transport.c: In function 'hvs_open_connection':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
#define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:390:12: note: in expansion of macro 'max_t'
sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:391:12: note: in expansion of macro 'min_t'
sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
#define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
^~~~~~~~~~~~~
net/vmw_vsock/hyperv_transport.c:393:12: note: in expansion of macro 'max_t'
rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
net/vmw_vsock/hyperv_transport.c:394:12: note: in expansion of macro 'min_t'
rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
^~~~~
net/vmw_vsock/hyperv_transport.c: In function 'hvs_stream_enqueue':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
__builtin_choose_expr(__safe_cmp(x, y), \
^
include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
^~~~~~~~~~~~~
net/vmw_vsock/hyperv_transport.c:681:14: note: in expansion of macro 'min_t'
to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
^~~~~
vim +58 net/vmw_vsock/hyperv_transport.c
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69531 bytes --]
^ permalink raw reply
* [PATCH V3 net-next 01/10] net: hns3: add reset checking before set channels
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Jian Shen, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Jian Shen <shenjian15@huawei.com>
hns3_set_channels() should check the reset status first,
since the device is reinitialized during a reset. If the
reset has not completed, hns3_set_channels() may access
invalid memory.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 69f7ef8..08af782 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4378,6 +4378,9 @@ int hns3_set_channels(struct net_device *netdev,
u16 org_tqp_num;
int ret;
+ if (hns3_nic_resetting(netdev))
+ return -EBUSY;
+
if (ch->rx_count || ch->tx_count)
return -EINVAL;
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 03/10] net: hns3: remove upgrade reset level when reset fail
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
Currently, hclge_reset_err_handle() asserts a global reset
when the failure count is smaller than MAX_RESET_FAIL_CNT, which
affects other running functions.
So this patch removes this upgrade of the reset level and
re-schedules the reset task instead.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
---
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 28 +++++++---------------
1 file changed, 8 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3fde5471..3c64d70 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3305,7 +3305,7 @@ static int hclge_reset_prepare_wait(struct hclge_dev *hdev)
return ret;
}
-static bool hclge_reset_err_handle(struct hclge_dev *hdev, bool is_timeout)
+static bool hclge_reset_err_handle(struct hclge_dev *hdev)
{
#define MAX_RESET_FAIL_CNT 5
@@ -3322,20 +3322,11 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev, bool is_timeout)
return false;
} else if (hdev->reset_fail_cnt < MAX_RESET_FAIL_CNT) {
hdev->reset_fail_cnt++;
- if (is_timeout) {
- set_bit(hdev->reset_type, &hdev->reset_pending);
- dev_info(&hdev->pdev->dev,
- "re-schedule to wait for hw reset done\n");
- return true;
- }
-
- dev_info(&hdev->pdev->dev, "Upgrade reset level\n");
- hclge_clear_reset_cause(hdev);
- set_bit(HNAE3_GLOBAL_RESET, &hdev->default_reset_request);
- mod_timer(&hdev->reset_timer,
- jiffies + HCLGE_RESET_INTERVAL);
-
- return false;
+ set_bit(hdev->reset_type, &hdev->reset_pending);
+ dev_info(&hdev->pdev->dev,
+ "re-schedule reset task(%d)\n",
+ hdev->reset_fail_cnt);
+ return true;
}
hclge_clear_reset_cause(hdev);
@@ -3382,7 +3373,6 @@ static int hclge_reset_stack(struct hclge_dev *hdev)
static void hclge_reset(struct hclge_dev *hdev)
{
struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev);
- bool is_timeout = false;
int ret;
/* Initialize ae_dev reset status as well, in case enet layer wants to
@@ -3410,10 +3400,8 @@ static void hclge_reset(struct hclge_dev *hdev)
if (ret)
goto err_reset;
- if (hclge_reset_wait(hdev)) {
- is_timeout = true;
+ if (hclge_reset_wait(hdev))
goto err_reset;
- }
hdev->rst_stats.hw_reset_done_cnt++;
@@ -3465,7 +3453,7 @@ static void hclge_reset(struct hclge_dev *hdev)
err_reset_lock:
rtnl_unlock();
err_reset:
- if (hclge_reset_err_handle(hdev, is_timeout))
+ if (hclge_reset_err_handle(hdev))
hclge_reset_task_schedule(hdev);
}
--
2.7.4
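
The retry policy after this change can be sketched in user space as
follows. This is only an illustration of the bounded re-schedule count;
the sketch_* names are hypothetical, and the pending-reset checks,
bit operations and logging of the real function are left out.

```c
#include <stdbool.h>

#define SKETCH_MAX_RESET_FAIL_CNT 5

struct sketch_dev {
	unsigned int reset_fail_cnt;	/* consecutive failed reset attempts */
};

/*
 * Returns true when the caller should re-schedule the reset task,
 * false once the failure budget is used up.
 */
static bool sketch_reset_err_handle(struct sketch_dev *dev)
{
	if (dev->reset_fail_cnt < SKETCH_MAX_RESET_FAIL_CNT) {
		dev->reset_fail_cnt++;
		return true;
	}

	return false;
}
```

Starting from a zero count, the first five failures ask for a
re-schedule and the sixth gives up, without ever raising the reset
level.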
^ permalink raw reply related
* [PATCH V3 net-next 05/10] net: hns3: modify firmware version display format
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Yufeng Mo, Peng Li, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Yufeng Mo <moyufeng@huawei.com>
This patch changes the firmware version display format in
hclge(vf)_cmd_init() and hns3_get_drvinfo() from a single hex
word to four dot-separated decimal bytes. It also adds the
HNAE3_FW_VERSION_BYTE* mask and shift macros used to extract
each byte.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 9 +++++++++
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++++++++++++--
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 10 +++++++++-
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 10 +++++++++-
4 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 48c7b70..a4624db 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -179,6 +179,15 @@ struct hnae3_vector_info {
#define HNAE3_RING_GL_RX 0
#define HNAE3_RING_GL_TX 1
+#define HNAE3_FW_VERSION_BYTE3_SHIFT 24
+#define HNAE3_FW_VERSION_BYTE3_MASK GENMASK(31, 24)
+#define HNAE3_FW_VERSION_BYTE2_SHIFT 16
+#define HNAE3_FW_VERSION_BYTE2_MASK GENMASK(23, 16)
+#define HNAE3_FW_VERSION_BYTE1_SHIFT 8
+#define HNAE3_FW_VERSION_BYTE1_MASK GENMASK(15, 8)
+#define HNAE3_FW_VERSION_BYTE0_SHIFT 0
+#define HNAE3_FW_VERSION_BYTE0_MASK GENMASK(7, 0)
+
struct hnae3_ring_chain_node {
struct hnae3_ring_chain_node *next;
u32 tqp_index;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 5bff98a..e71c92b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -527,6 +527,7 @@ static void hns3_get_drvinfo(struct net_device *netdev,
{
struct hns3_nic_priv *priv = netdev_priv(netdev);
struct hnae3_handle *h = priv->ae_handle;
+ u32 fw_version;
if (!h->ae_algo->ops->get_fw_version) {
netdev_err(netdev, "could not get fw version!\n");
@@ -545,8 +546,18 @@ static void hns3_get_drvinfo(struct net_device *netdev,
sizeof(drvinfo->bus_info));
drvinfo->bus_info[ETHTOOL_BUSINFO_LEN - 1] = '\0';
- snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "0x%08x",
- priv->ae_handle->ae_algo->ops->get_fw_version(h));
+ fw_version = priv->ae_handle->ae_algo->ops->get_fw_version(h);
+
+ snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
+ "%lu.%lu.%lu.%lu",
+ hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE3_MASK,
+ HNAE3_FW_VERSION_BYTE3_SHIFT),
+ hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE2_MASK,
+ HNAE3_FW_VERSION_BYTE2_SHIFT),
+ hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE1_MASK,
+ HNAE3_FW_VERSION_BYTE1_SHIFT),
+ hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE0_MASK,
+ HNAE3_FW_VERSION_BYTE0_SHIFT));
}
static u32 hns3_get_link(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 22f6acd..d9858f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -419,7 +419,15 @@ int hclge_cmd_init(struct hclge_dev *hdev)
}
hdev->fw_version = version;
- dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
+ dev_info(&hdev->pdev->dev, "The firmware version is %lu.%lu.%lu.%lu\n",
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE3_MASK,
+ HNAE3_FW_VERSION_BYTE3_SHIFT),
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE2_MASK,
+ HNAE3_FW_VERSION_BYTE2_SHIFT),
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE1_MASK,
+ HNAE3_FW_VERSION_BYTE1_SHIFT),
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE0_MASK,
+ HNAE3_FW_VERSION_BYTE0_SHIFT));
return 0;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
index 652b796..8f21eb3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
@@ -405,7 +405,15 @@ int hclgevf_cmd_init(struct hclgevf_dev *hdev)
}
hdev->fw_version = version;
- dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
+ dev_info(&hdev->pdev->dev, "The firmware version is %lu.%lu.%lu.%lu\n",
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE3_MASK,
+ HNAE3_FW_VERSION_BYTE3_SHIFT),
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE2_MASK,
+ HNAE3_FW_VERSION_BYTE2_SHIFT),
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE1_MASK,
+ HNAE3_FW_VERSION_BYTE1_SHIFT),
+ hnae3_get_field(version, HNAE3_FW_VERSION_BYTE0_MASK,
+ HNAE3_FW_VERSION_BYTE0_SHIFT));
return 0;
--
2.7.4
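
A small user-space sketch of the new display format, assuming the
version is a 32-bit word with one field per byte as the mask/shift
macros above suggest. The helper name is hypothetical and plain shifts
stand in for hnae3_get_field().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 32-bit firmware version as "b3.b2.b1.b0", most significant
 * byte first. Returns the snprintf() result.
 */
static int sketch_format_fw_version(uint32_t version, char *buf, size_t len)
{
	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned int)((version >> 24) & 0xff),
			(unsigned int)((version >> 16) & 0xff),
			(unsigned int)((version >> 8) & 0xff),
			(unsigned int)(version & 0xff));
}
```

For example, a version word of 0x01080a03 is printed as 1.8.10.3
instead of 01080a03.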
^ permalink raw reply related
* [PATCH V3 net-next 02/10] net: hns3: add a check for get_reset_level
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Guangbin Huang, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Guangbin Huang <huangguangbin@huawei.com>
In some cases, ops->get_reset_level may not be implemented, so we
should check whether it is NULL before calling it.
Signed-off-by: Guangbin Huang <huangguangbin@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 08af782..4d58c53 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1963,7 +1963,7 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
ops = ae_dev->ops;
/* request the reset */
- if (ops->reset_event) {
+ if (ops->reset_event && ops->get_reset_level) {
if (ae_dev->hw_err_reset_req) {
reset_type = ops->get_reset_level(ae_dev,
&ae_dev->hw_err_reset_req);
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 06/10] net: hns3: add debug messages to identify eth down cause
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Yonglong Liu, Peng Li, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Yonglong Liu <liuyonglong@huawei.com>
Sometimes the eth interface is seen going down/up in dmesg,
but the reason it went down is unknown. So add some debug
messages to identify the cause.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 18 ++++++++++++++++++
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 19 +++++++++++++++++++
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 11 +++++++++++
3 files changed, 48 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 4d58c53..973c57b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -459,6 +459,9 @@ static int hns3_nic_net_open(struct net_device *netdev)
h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
hns3_config_xps(priv);
+
+ netif_info(h, drv, netdev, "net open\n");
+
return 0;
}
@@ -519,6 +522,8 @@ static int hns3_nic_net_stop(struct net_device *netdev)
if (test_and_set_bit(HNS3_NIC_STATE_DOWN, &priv->state))
return 0;
+ netif_info(h, drv, netdev, "net stop\n");
+
if (h->ae_algo->ops->set_timer_task)
h->ae_algo->ops->set_timer_task(priv->ae_handle, false);
@@ -1550,6 +1555,8 @@ static int hns3_setup_tc(struct net_device *netdev, void *type_data)
h = hns3_get_handle(netdev);
kinfo = &h->kinfo;
+ netif_info(h, drv, netdev, "setup tc: num_tc=%u\n", tc);
+
return (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
}
@@ -1593,6 +1600,10 @@ static int hns3_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
struct hnae3_handle *h = hns3_get_handle(netdev);
int ret = -EIO;
+ netif_info(h, drv, netdev,
+ "set vf vlan: vf=%d, vlan=%u, qos=%u, vlan_proto=%u\n",
+ vf, vlan, qos, vlan_proto);
+
if (h->ae_algo->ops->set_vf_vlan_filter)
ret = h->ae_algo->ops->set_vf_vlan_filter(h, vf, vlan,
qos, vlan_proto);
@@ -1611,6 +1622,9 @@ static int hns3_nic_change_mtu(struct net_device *netdev, int new_mtu)
if (!h->ae_algo->ops->set_mtu)
return -EOPNOTSUPP;
+ netif_info(h, drv, netdev,
+ "change mtu from %u to %d\n", netdev->mtu, new_mtu);
+
ret = h->ae_algo->ops->set_mtu(h, new_mtu);
if (ret)
netdev_err(netdev, "failed to change MTU in hardware %d\n",
@@ -4395,6 +4409,10 @@ int hns3_set_channels(struct net_device *netdev,
if (kinfo->rss_size == new_tqp_num)
return 0;
+ netif_info(h, drv, netdev,
+ "set channels: tqp_num=%u, rxfh=%d\n",
+ new_tqp_num, rxfh_configured);
+
ret = hns3_reset_notify(h, HNAE3_DOWN_CLIENT);
if (ret)
return ret;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index e71c92b..8553200 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -311,6 +311,8 @@ static void hns3_self_test(struct net_device *ndev,
if (eth_test->flags != ETH_TEST_FL_OFFLINE)
return;
+ netif_info(h, drv, ndev, "self test start\n");
+
st_param[HNAE3_LOOP_APP][0] = HNAE3_LOOP_APP;
st_param[HNAE3_LOOP_APP][1] =
h->flags & HNAE3_SUPPORT_APP_LOOPBACK;
@@ -374,6 +376,8 @@ static void hns3_self_test(struct net_device *ndev,
if (if_running)
ndev->netdev_ops->ndo_open(ndev);
+
+ netif_info(h, drv, ndev, "self test end\n");
}
static int hns3_get_sset_count(struct net_device *netdev, int stringset)
@@ -604,6 +608,10 @@ static int hns3_set_pauseparam(struct net_device *netdev,
{
struct hnae3_handle *h = hns3_get_handle(netdev);
+ netif_info(h, drv, netdev,
+ "set pauseparam: autoneg=%u, rx:%u, tx:%u\n",
+ param->autoneg, param->rx_pause, param->tx_pause);
+
if (h->ae_algo->ops->set_pauseparam)
return h->ae_algo->ops->set_pauseparam(h, param->autoneg,
param->rx_pause,
@@ -743,6 +751,11 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
if (cmd->base.speed == SPEED_1000 && cmd->base.duplex == DUPLEX_HALF)
return -EINVAL;
+ netif_info(handle, drv, netdev,
+ "set link(%s): autoneg=%u, speed=%u, duplex=%u\n",
+ netdev->phydev ? "phy" : "mac",
+ cmd->base.autoneg, cmd->base.speed, cmd->base.duplex);
+
/* Only support ksettings_set for netdev with phy attached for now */
if (netdev->phydev)
return phy_ethtool_ksettings_set(netdev->phydev, cmd);
@@ -984,6 +997,9 @@ static int hns3_nway_reset(struct net_device *netdev)
return -EINVAL;
}
+ netif_info(handle, drv, netdev,
+ "nway reset (using %s)\n", phy ? "phy" : "mac");
+
if (phy)
return genphy_restart_aneg(phy);
@@ -1308,6 +1324,9 @@ static int hns3_set_fecparam(struct net_device *netdev,
if (!ops->set_fec)
return -EOPNOTSUPP;
fec_mode = eth_to_loc_fec(fec->fec);
+
+ netif_info(handle, drv, netdev, "set fecparam: mode=%u\n", fec_mode);
+
return ops->set_fec(handle, fec_mode);
}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index bac4ce1..59774e1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -201,6 +201,7 @@ static int hclge_client_setup_tc(struct hclge_dev *hdev)
static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets)
{
struct hclge_vport *vport = hclge_get_vport(h);
+ struct net_device *netdev = h->kinfo.netdev;
struct hclge_dev *hdev = vport->back;
bool map_changed = false;
u8 num_tc = 0;
@@ -215,6 +216,8 @@ static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets)
return ret;
if (map_changed) {
+ netif_info(h, drv, netdev, "set ets\n");
+
ret = hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);
if (ret)
return ret;
@@ -300,6 +303,7 @@ static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
{
struct hclge_vport *vport = hclge_get_vport(h);
+ struct net_device *netdev = h->kinfo.netdev;
struct hclge_dev *hdev = vport->back;
u8 i, j, pfc_map, *prio_tc;
@@ -325,6 +329,10 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
hdev->tm_info.hw_pfc_map = pfc_map;
hdev->tm_info.pfc_en = pfc->pfc_en;
+ netif_info(h, drv, netdev,
+ "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+ pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
+
hclge_tm_pfc_info_update(hdev);
return hclge_pause_setup_hw(hdev, false);
@@ -345,8 +353,11 @@ static u8 hclge_getdcbx(struct hnae3_handle *h)
static u8 hclge_setdcbx(struct hnae3_handle *h, u8 mode)
{
struct hclge_vport *vport = hclge_get_vport(h);
+ struct net_device *netdev = h->kinfo.netdev;
struct hclge_dev *hdev = vport->back;
+ netif_info(h, drv, netdev, "set dcbx: mode=%u\n", mode);
+
/* No support for LLD_MANAGED modes or CEE */
if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
(mode & DCB_CAP_DCBX_VER_CEE) ||
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 04/10] net: hns3: change GFP flag during lock period
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Yufeng Mo, lipeng 00277521, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Yufeng Mo <moyufeng@huawei.com>
GFP_KERNEL must not be used to allocate memory while a spin_lock
is held, because the allocation may sleep and cause scheduling
with the lock held. This patch changes the GFP flag to GFP_ATOMIC
in this case.
Fixes: dd74f815dd41 ("net: hns3: Add support for rule add/delete for flow director")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: lipeng 00277521 <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3c64d70..14199c4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -5796,7 +5796,7 @@ static int hclge_add_fd_entry_by_arfs(struct hnae3_handle *handle, u16 queue_id,
return -ENOSPC;
}
- rule = kzalloc(sizeof(*rule), GFP_KERNEL);
+ rule = kzalloc(sizeof(*rule), GFP_ATOMIC);
if (!rule) {
spin_unlock_bh(&hdev->fd_rule_lock);
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 10/10] net: hns3: use dev_info() instead of pr_info()
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
dev_info() is more appropriate than pr_info() for printing a message
when driver initialization is done, so switch to dev_info().
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 +++-
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 30a7074..4138780 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -8862,7 +8862,9 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
hclge_state_init(hdev);
hdev->last_reset_time = jiffies;
- pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);
+ dev_info(&hdev->pdev->dev, "%s driver initialization finished.\n",
+ HCLGE_DRIVER_NAME);
+
return 0;
err_mdiobus_unreg:
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index a13a0e1..ae0e6a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -2695,7 +2695,8 @@ static int hclgevf_init_hdev(struct hclgevf_dev *hdev)
}
hdev->last_reset_time = jiffies;
- pr_info("finished initializing %s driver\n", HCLGEVF_DRIVER_NAME);
+ dev_info(&hdev->pdev->dev, "finished initializing %s driver\n",
+ HCLGEVF_DRIVER_NAME);
return 0;
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 08/10] net: hns3: add interrupt affinity support for misc interrupt
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Yunsheng Lin, Peng Li, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Yunsheng Lin <linyunsheng@huawei.com>
The misc interrupt is used to schedule the reset and mailbox
subtasks, and the service_task delayed_work is used to do periodic
management work each second.
This patch sets the affinity of the above three subtasks using the
misc interrupt's affinity.
This patch also sets up an affinity notifier for the misc interrupt to
allow the user to change the affinity of the above three subtasks.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 53 ++++++++++++++++++++--
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 4 ++
2 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 13c9697..30a7074 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1270,6 +1270,12 @@ static int hclge_configure(struct hclge_dev *hdev)
hclge_init_kdump_kernel_config(hdev);
+ /* Set the init affinity based on pci func number */
+ i = cpumask_weight(cpumask_of_node(dev_to_node(&hdev->pdev->dev)));
+ i = i ? PCI_FUNC(hdev->pdev->devfn) % i : 0;
+ cpumask_set_cpu(cpumask_local_spread(i, dev_to_node(&hdev->pdev->dev)),
+ &hdev->affinity_mask);
+
return ret;
}
@@ -2499,14 +2505,16 @@ static void hclge_mbx_task_schedule(struct hclge_dev *hdev)
{
if (!test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state) &&
!test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, &hdev->state))
- schedule_work(&hdev->mbx_service_task);
+ queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
+ &hdev->mbx_service_task);
}
static void hclge_reset_task_schedule(struct hclge_dev *hdev)
{
if (!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
!test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
- schedule_work(&hdev->rst_service_task);
+ queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
+ &hdev->rst_service_task);
}
static void hclge_task_schedule(struct hclge_dev *hdev)
@@ -2516,8 +2524,9 @@ static void hclge_task_schedule(struct hclge_dev *hdev)
!test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state)) {
hdev->hw_stats.stats_timer++;
hdev->fd_arfs_expire_timer++;
- mod_delayed_work(system_wq, &hdev->service_task,
- round_jiffies_relative(HZ));
+ mod_delayed_work_on(cpumask_first(&hdev->affinity_mask),
+ system_wq, &hdev->service_task,
+ round_jiffies_relative(HZ));
}
}
@@ -2903,6 +2912,36 @@ static void hclge_get_misc_vector(struct hclge_dev *hdev)
hdev->num_msi_used += 1;
}
+static void hclge_irq_affinity_notify(struct irq_affinity_notify *notify,
+ const cpumask_t *mask)
+{
+ struct hclge_dev *hdev = container_of(notify, struct hclge_dev,
+ affinity_notify);
+
+ cpumask_copy(&hdev->affinity_mask, mask);
+}
+
+static void hclge_irq_affinity_release(struct kref *ref)
+{
+}
+
+static void hclge_misc_affinity_setup(struct hclge_dev *hdev)
+{
+ irq_set_affinity_hint(hdev->misc_vector.vector_irq,
+ &hdev->affinity_mask);
+
+ hdev->affinity_notify.notify = hclge_irq_affinity_notify;
+ hdev->affinity_notify.release = hclge_irq_affinity_release;
+ irq_set_affinity_notifier(hdev->misc_vector.vector_irq,
+ &hdev->affinity_notify);
+}
+
+static void hclge_misc_affinity_teardown(struct hclge_dev *hdev)
+{
+ irq_set_affinity_notifier(hdev->misc_vector.vector_irq, NULL);
+ irq_set_affinity_hint(hdev->misc_vector.vector_irq, NULL);
+}
+
static int hclge_misc_irq_init(struct hclge_dev *hdev)
{
int ret;
@@ -8794,6 +8833,11 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
INIT_WORK(&hdev->rst_service_task, hclge_reset_service_task);
INIT_WORK(&hdev->mbx_service_task, hclge_mailbox_service_task);
+ /* Setup affinity after service timer setup because add_timer_on
+ * is called in affinity notify.
+ */
+ hclge_misc_affinity_setup(hdev);
+
hclge_clear_all_event_cause(hdev);
hclge_clear_resetting_state(hdev);
@@ -8955,6 +8999,7 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev *ae_dev)
struct hclge_dev *hdev = ae_dev->priv;
struct hclge_mac *mac = &hdev->hw.mac;
+ hclge_misc_affinity_teardown(hdev);
hclge_state_uninit(hdev);
if (mac->phydev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index dde8f22..688e425 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -863,6 +863,10 @@ struct hclge_dev {
DECLARE_KFIFO(mac_tnl_log, struct hclge_mac_tnl_stats,
HCLGE_MAC_TNL_LOG_SIZE);
+
+ /* affinity mask and notify for misc interrupt */
+ cpumask_t affinity_mask;
+ struct irq_affinity_notify affinity_notify;
};
/* VPort level vlan tag configuration for TX direction */
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 07/10] net: hns3: make hclge_service use delayed workqueue
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Yunsheng Lin, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Yunsheng Lin <linyunsheng@huawei.com>
Use delayed work instead of a timer to trigger
hclge_service_task.
This simplifies the code by removing one intermediate function
and prepares for misc irq affinity support.
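For readers who want to see the scheduling guard in isolation, here is a minimal userspace sketch of the pattern used in the diff below. It is only a model under stated assumptions: the model_* names and the struct are hypothetical, C11 atomics stand in for the kernel bit operations, and a counter stands in for actually queueing delayed work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags that mirror the driver's HCLGE_STATE_* bits. */
enum {
	STATE_DOWN          = 1u << 0,
	STATE_REMOVING      = 1u << 1,
	STATE_SERVICE_SCHED = 1u << 2,
};

/* Hypothetical stand-in for the relevant parts of struct hclge_dev. */
struct model_dev {
	atomic_uint state;
	unsigned int queued;	/* times the work was (re)queued */
	unsigned int runs;	/* times the service task body ran */
};

/* Models hclge_task_schedule(): queue only when the device is neither
 * down nor being removed, and the work is not already scheduled.
 */
static bool model_task_schedule(struct model_dev *dev)
{
	unsigned int old;

	if (atomic_load(&dev->state) & (STATE_DOWN | STATE_REMOVING))
		return false;

	/* Equivalent of test_and_set_bit(): set the flag, inspect old value. */
	old = atomic_fetch_or(&dev->state, STATE_SERVICE_SCHED);
	if (old & STATE_SERVICE_SCHED)
		return false;

	dev->queued++;		/* stands in for mod_delayed_work() */
	return true;
}

/* Models hclge_service_task(): clear the flag, do the periodic work,
 * then re-arm itself through the same guard.
 */
static void model_service_task(struct model_dev *dev)
{
	atomic_fetch_and(&dev->state, ~(unsigned int)STATE_SERVICE_SCHED);
	dev->runs++;
	model_task_schedule(dev);
}

/* Models the disable path: set DOWN first so that no re-arm can happen,
 * then drop the scheduled flag (the cancel itself is not modelled).
 */
static void model_stop(struct model_dev *dev)
{
	atomic_fetch_or(&dev->state, STATE_DOWN);
	atomic_fetch_and(&dev->state, ~(unsigned int)STATE_SERVICE_SCHED);
}
```

In this model a second model_task_schedule() call before the task runs is a no-op, and after model_stop() the task can no longer re-arm itself, which is the reason the DOWN flag is set before the delayed work is cancelled.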
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 52 +++++++++-------------
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 3 +-
2 files changed, 21 insertions(+), 34 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 14199c4..13c9697 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2513,8 +2513,12 @@ static void hclge_task_schedule(struct hclge_dev *hdev)
{
if (!test_bit(HCLGE_STATE_DOWN, &hdev->state) &&
!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
- !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state))
- (void)schedule_work(&hdev->service_task);
+ !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state)) {
+ hdev->hw_stats.stats_timer++;
+ hdev->fd_arfs_expire_timer++;
+ mod_delayed_work(system_wq, &hdev->service_task,
+ round_jiffies_relative(HZ));
+ }
}
static int hclge_get_mac_link_status(struct hclge_dev *hdev)
@@ -2729,25 +2733,6 @@ static int hclge_get_status(struct hnae3_handle *handle)
return hdev->hw.mac.link;
}
-static void hclge_service_timer(struct timer_list *t)
-{
- struct hclge_dev *hdev = from_timer(hdev, t, service_timer);
-
- mod_timer(&hdev->service_timer, jiffies + HZ);
- hdev->hw_stats.stats_timer++;
- hdev->fd_arfs_expire_timer++;
- hclge_task_schedule(hdev);
-}
-
-static void hclge_service_complete(struct hclge_dev *hdev)
-{
- WARN_ON(!test_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state));
-
- /* Flush memory before next watchdog */
- smp_mb__before_atomic();
- clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state);
-}
-
static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
{
u32 rst_src_reg, cmdq_src_reg, msix_src_reg;
@@ -3594,7 +3579,9 @@ static void hclge_update_vport_alive(struct hclge_dev *hdev)
static void hclge_service_task(struct work_struct *work)
{
struct hclge_dev *hdev =
- container_of(work, struct hclge_dev, service_task);
+ container_of(work, struct hclge_dev, service_task.work);
+
+ clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state);
if (hdev->hw_stats.stats_timer >= HCLGE_STATS_TIMER_INTERVAL) {
hclge_update_stats_for_all(hdev);
@@ -3609,7 +3596,8 @@ static void hclge_service_task(struct work_struct *work)
hclge_rfs_filter_expire(hdev);
hdev->fd_arfs_expire_timer = 0;
}
- hclge_service_complete(hdev);
+
+ hclge_task_schedule(hdev);
}
struct hclge_vport *hclge_get_vport(struct hnae3_handle *handle)
@@ -6148,10 +6136,13 @@ static void hclge_set_timer_task(struct hnae3_handle *handle, bool enable)
struct hclge_dev *hdev = vport->back;
if (enable) {
- mod_timer(&hdev->service_timer, jiffies + HZ);
+ hclge_task_schedule(hdev);
} else {
- del_timer_sync(&hdev->service_timer);
- cancel_work_sync(&hdev->service_task);
+ /* Set the DOWN flag here to disable the service to be
+ * scheduled again
+ */
+ set_bit(HCLGE_STATE_DOWN, &hdev->state);
+ cancel_delayed_work_sync(&hdev->service_task);
clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state);
}
}
@@ -8590,12 +8581,10 @@ static void hclge_state_uninit(struct hclge_dev *hdev)
set_bit(HCLGE_STATE_DOWN, &hdev->state);
set_bit(HCLGE_STATE_REMOVING, &hdev->state);
- if (hdev->service_timer.function)
- del_timer_sync(&hdev->service_timer);
if (hdev->reset_timer.function)
del_timer_sync(&hdev->reset_timer);
- if (hdev->service_task.func)
- cancel_work_sync(&hdev->service_task);
+ if (hdev->service_task.work.func)
+ cancel_delayed_work_sync(&hdev->service_task);
if (hdev->rst_service_task.func)
cancel_work_sync(&hdev->rst_service_task);
if (hdev->mbx_service_task.func)
@@ -8800,9 +8789,8 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
hclge_dcb_ops_set(hdev);
- timer_setup(&hdev->service_timer, hclge_service_timer, 0);
timer_setup(&hdev->reset_timer, hclge_reset_timer, 0);
- INIT_WORK(&hdev->service_task, hclge_service_task);
+ INIT_DELAYED_WORK(&hdev->service_task, hclge_service_task);
INIT_WORK(&hdev->rst_service_task, hclge_reset_service_task);
INIT_WORK(&hdev->mbx_service_task, hclge_mailbox_service_task);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 6a12285..dde8f22 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -806,9 +806,8 @@ struct hclge_dev {
u16 adminq_work_limit; /* Num of admin receive queue desc to process */
unsigned long service_timer_period;
unsigned long service_timer_previous;
- struct timer_list service_timer;
struct timer_list reset_timer;
- struct work_struct service_task;
+ struct delayed_work service_task;
struct work_struct rst_service_task;
struct work_struct mbx_service_task;
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 09/10] net: hns3: Add support for using order 1 pages with a 4K buffer
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Yunsheng Lin, Huazhong Tan
In-Reply-To: <1564206372-42467-1-git-send-email-tanhuazhong@huawei.com>
From: Yunsheng Lin <linyunsheng@huawei.com>
Hardware supports RX buffer sizes of 0.5K, 1K, 2K and 4K. The
RX buffer cannot be reused when page size and RX buffer size
are both 4096, because hnae3_page_order returns 0 in that case.
So this patch changes hns3_page_order to return 1 when the
RX buffer is greater than half of the page size and the page
size is less than 8192. dev_alloc_pages is already used to
allocate the compound page for the RX buffer.
This patch also renames hnae3_* to hns3_* for the page order
and RX buffer size calculation, because they are used in the
hns3 module.
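The arithmetic behind this can be checked with a small userspace sketch. It is an illustration only: the model_* helpers are hypothetical and take the page size as a parameter instead of using PAGE_SIZE, so that both the 4K and the 64K page cases can be evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the new hns3_page_order() rule: use an order 1
 * allocation when the page is smaller than 8192 bytes and the RX buffer
 * takes more than half of it.
 */
static unsigned int model_page_order(size_t page_size, size_t buf_size)
{
	if (page_size < 8192 && buf_size > page_size / 2)
		return 1;
	return 0;
}

/* Size of the allocation backing one descriptor, as in hns3_page_size(). */
static size_t model_alloc_size(size_t page_size, size_t buf_size)
{
	return page_size << model_page_order(page_size, buf_size);
}

/* Number of RX buffers that fit into one allocation. Reuse needs at
 * least two, because the offset is advanced by one buffer after use.
 */
static size_t model_bufs_per_alloc(size_t page_size, size_t buf_size)
{
	assert(buf_size > 0);
	return model_alloc_size(page_size, buf_size) / buf_size;
}
```

With a 4096 byte page and a 4096 byte buffer the model yields order 1 and two buffers per allocation, so the check in hns3_nic_reuse_page() (page_offset + truesize <= page size) can succeed once. With order 0 it could never succeed for that combination.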
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 +++++-----
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 15 ++++++++++++---
2 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 973c57b..59a6076 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2081,7 +2081,7 @@ static void hns3_set_default_feature(struct net_device *netdev)
static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
struct hns3_desc_cb *cb)
{
- unsigned int order = hnae3_page_order(ring);
+ unsigned int order = hns3_page_order(ring);
struct page *p;
p = dev_alloc_pages(order);
@@ -2092,7 +2092,7 @@ static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
cb->page_offset = 0;
cb->reuse_flag = 0;
cb->buf = page_address(p);
- cb->length = hnae3_page_size(ring);
+ cb->length = hns3_page_size(ring);
cb->type = DESC_TYPE_PAGE;
return 0;
@@ -2395,7 +2395,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
{
struct hns3_desc *desc = &ring->desc[ring->next_to_clean];
int size = le16_to_cpu(desc->rx.size);
- u32 truesize = hnae3_buf_size(ring);
+ u32 truesize = hns3_buf_size(ring);
skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len,
size - pull_len, truesize);
@@ -2410,7 +2410,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
/* Move offset up to the next cache line */
desc_cb->page_offset += truesize;
- if (desc_cb->page_offset + truesize <= hnae3_page_size(ring)) {
+ if (desc_cb->page_offset + truesize <= hns3_page_size(ring)) {
desc_cb->reuse_flag = 1;
/* Bump ref count on page before it is given */
get_page(desc_cb->priv);
@@ -2692,7 +2692,7 @@ static int hns3_add_frag(struct hns3_enet_ring *ring, struct hns3_desc *desc,
}
if (ring->tail_skb) {
- head_skb->truesize += hnae3_buf_size(ring);
+ head_skb->truesize += hns3_buf_size(ring);
head_skb->data_len += le16_to_cpu(desc->rx.size);
head_skb->len += le16_to_cpu(desc->rx.size);
skb = ring->tail_skb;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 848b866..1a17856 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -608,9 +608,18 @@ static inline bool hns3_nic_resetting(struct net_device *netdev)
#define tx_ring_data(priv, idx) ((priv)->ring_data[idx])
-#define hnae3_buf_size(_ring) ((_ring)->buf_size)
-#define hnae3_page_order(_ring) (get_order(hnae3_buf_size(_ring)))
-#define hnae3_page_size(_ring) (PAGE_SIZE << (u32)hnae3_page_order(_ring))
+#define hns3_buf_size(_ring) ((_ring)->buf_size)
+
+static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
+{
+#if (PAGE_SIZE < 8192)
+ if (ring->buf_size > (PAGE_SIZE / 2))
+ return 1;
+#endif
+ return 0;
+}
+
+#define hns3_page_size(_ring) (PAGE_SIZE << hns3_page_order(_ring))
/* iterator for handling rings in ring group */
#define hns3_for_each_ring(pos, head) \
--
2.7.4
^ permalink raw reply related
* [PATCH V3 net-next 00/10] net: hns3: some code optimizations & bugfixes & features
From: Huazhong Tan @ 2019-07-27 5:46 UTC (permalink / raw)
To: davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm, saeedm,
Huazhong Tan
This patch-set includes code optimizations, bugfixes and features for
the HNS3 ethernet controller driver.
[patch 1/10] checks reset status before setting channels.
[patch 2/10] adds a NULL pointer check.
[patch 3/10] removes reset level upgrading when the current reset fails.
[patch 4/10] fixes a GFP flag error when holding spin_lock.
[patch 5/10] modifies the firmware version format.
[patch 6/10] adds some debug prints which are off by default.
[patch 7/10 - 8/10] add two code optimizations for the interrupt handler
and work task.
[patch 9/10] adds support for using order 1 pages with a 4K buffer.
[patch 10/10] prints messages with dev_info() instead of
pr_info().
Change log:
V2->V3: addresses comments from Saeed Mahameed and Joe Perches.
V1->V2: addresses comments from Saeed Mahameed,
removes previous [patch 4/11] and [patch 11/11],
which need further discussion, and adds a new
patch [11/11] suggested by Saeed Mahameed.
Guangbin Huang (1):
net: hns3: add a check for get_reset_level
Huazhong Tan (2):
net: hns3: remove upgrade reset level when reset fail
net: hns3: use dev_info() instead of pr_info()
Jian Shen (1):
net: hns3: add reset checking before set channels
Yonglong Liu (1):
net: hns3: add debug messages to identify eth down cause
Yufeng Mo (2):
net: hns3: change GFP flag during lock period
net: hns3: modify firmware version display format
Yunsheng Lin (3):
net: hns3: make hclge_service use delayed workqueue
net: hns3: add interrupt affinity support for misc interrupt
net: hns3: Add support for using order 1 pages with a 4K buffer
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 9 ++
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 33 ++++-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 15 ++-
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 34 +++++-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 10 +-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 11 ++
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 135 ++++++++++++---------
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 7 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 10 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +-
10 files changed, 195 insertions(+), 72 deletions(-)
--
2.7.4
^ permalink raw reply