* Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Kirill Tkhai @ 2018-09-06 8:39 UTC (permalink / raw)
To: Eric Dumazet, Vlad Buslov, netdev
Cc: jhs, xiyou.wangcong, jiri, davem, stephen, paulmck,
nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
lucien.xin, jakub.kicinski, christian.brauner, jbenc
In-Reply-To: <a5b99bb3-9076-ba0f-5b8b-efe8529e93fa@gmail.com>
On 06.09.2018 11:30, Eric Dumazet wrote:
>
>
> On 09/06/2018 12:58 AM, Vlad Buslov wrote:
>
> ...
>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 18e22a5a6550..239c73f29471 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -90,6 +90,7 @@ struct Qdisc {
>> struct gnet_stats_queue __percpu *cpu_qstats;
>> int padded;
>> refcount_t refcnt;
>> + struct rcu_head rcu;
>>
>> /*
>> * For performance sake on SMP, we put highly modified fields at the end
>
> Probably better to move this at the end of struct Qdisc,
> not risking unexpected performance regressions in fast path.
Do you mean regressions on UP? On SMP it looks like this field
fits in the unused gap created by:
struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
Kirill
^ permalink raw reply
* Re: [PATCH v2 01/17] asm: simd context helper API
From: Thomas Gleixner @ 2018-09-06 13:42 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Andrew Lutomirski, LKML, Netdev, David Miller, Greg Kroah-Hartman,
Samuel Neves, linux-arch, Rik van Riel
In-Reply-To: <CAHmME9rL2omQxqMVCcX-jPMpcQDk8ftEn3YXWcFhS=AkVK8ZXw@mail.gmail.com>
On Sat, 1 Sep 2018, Jason A. Donenfeld wrote:
> On Sat, Sep 1, 2018 at 2:32 PM Andy Lutomirski <luto@kernel.org> wrote:
> > I tend to think the right approach is to merge Jason's code and then
> > make it better later. Even with a totally perfect lazy FPU restore
> > implementation on x86, we'll probably still need some way of dealing
> > with SIMD contexts. I think we're highly unlikely to ever a allow
> > SIMD usage in all NMI contexts, for example, and there will always be
> > cases where we specifically don't want to use all available SIMD
> > capabilities even if we can. For example, generating random numbers
> > does crypto, but we probably don't want to do *SIMD* crypto, since
> > that will force a save and restore and will probably fire up the
> > AVX512 unit, and that's not worth it unless we're already using it for
> > some other reason.
> >
> > Also, as Rik has discovered, lazy FPU restore is conceptually
> > straightforward but isn't entirely trivial :)
>
> Sounds good. I'll move ahead on this basis.
Fine with me.
^ permalink raw reply
* [PATCH net-next v3 1/2] netlink: ipv4 igmp join notifications
From: Patrick Ruddy @ 2018-09-06 9:10 UTC (permalink / raw)
To: netdev; +Cc: roopa, jiri, stephen
In-Reply-To: <20180830093545.29465-2-pruddy@vyatta.att-mail.com>
Some userspace applications need to know about IGMP joins from the
kernel for 2 reasons:
1. To allow the programming of multicast MAC filters in hardware
2. To form a multicast FORUS list for non link-local multicast
groups to be sent to the kernel and from there to the interested
party.
(1) can be fulfilled but simply sending the hardware multicast MAC
address to be programmed but (2) requires the L3 address to be sent
since this cannot be constructed from the MAC address whereas the
reverse translation is a standard library function.
This commit provides addition and deletion of multicast addresses
using the RTM_NEWMDB and RTM_DELMDB messages with AF_INET. It also
provides the RTM_GETMDB extension to allow multicast join state to
be read from the kernel.
Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
---
v3 rework to use RTM_***MDB messages as per review comments.
net/ipv4/igmp.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 139 insertions(+)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 4da39446da2d..aed819e2ea93 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -86,6 +86,7 @@
#include <linux/inetdevice.h>
#include <linux/igmp.h>
#include <linux/if_arp.h>
+#include <net/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/times.h>
#include <linux/pkt_sched.h>
@@ -1385,6 +1386,91 @@ static void ip_mc_hash_remove(struct in_device *in_dev,
}
+static int fill_addr(struct sk_buff *skb, struct net_device *dev, __be32 addr,
+ int type, unsigned int flags)
+{
+ struct nlmsghdr *nlh;
+ struct ifaddrmsg *ifm;
+
+ nlh = nlmsg_put(skb, 0, 0, type, sizeof(*ifm), flags);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ ifm = nlmsg_data(nlh);
+ ifm->ifa_family = AF_INET;
+ ifm->ifa_prefixlen = 32;
+ ifm->ifa_flags = IFA_F_PERMANENT;
+ ifm->ifa_scope = RT_SCOPE_LINK;
+ ifm->ifa_index = dev->ifindex;
+
+ if (nla_put_in_addr(skb, IFA_ADDRESS, addr))
+ goto nla_put_failure;
+ nlmsg_end(skb, nlh);
+ return 0;
+
+nla_put_failure:
+ nlmsg_cancel(skb, nlh);
+ return -EMSGSIZE;
+}
+
+static inline size_t addr_nlmsg_size(void)
+{
+ return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
+ + nla_total_size(sizeof(__be32));
+}
+
+static void ip_mc_addr_notify(struct net_device *dev, __be32 addr, int type)
+{
+ struct net *net = dev_net(dev);
+ struct sk_buff *skb;
+ int err = -ENOBUFS;
+
+ skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
+ if (!skb)
+ goto errout;
+
+ err = fill_addr(skb, dev, addr, type, 0);
+ if (err < 0) {
+ WARN_ON(err == -EMSGSIZE);
+ kfree_skb(skb);
+ goto errout;
+ }
+ rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
+ return;
+errout:
+ if (err < 0)
+ rtnl_set_sk_err(net, RTNLGRP_MDB, err);
+}
+
+int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb,
+ struct net_device *dev)
+{
+ int s_idx;
+ int idx = 0;
+ struct ip_mc_list *im;
+ struct in_device *in_dev;
+
+ ASSERT_RTNL();
+
+ s_idx = cb->args[2];
+ in_dev = __in_dev_get_rtnl(dev);
+
+ for_each_pmc_rtnl(in_dev, im) {
+ if (idx < s_idx)
+ continue;
+ if (fill_addr(skb, dev, im->multiaddr, RTM_NEWMDB,
+ NLM_F_MULTI) < 0)
+ goto done;
+ nl_dump_check_consistent(cb, nlmsg_hdr(skb));
+ idx++;
+ }
+
+ done:
+ cb->args[2] = idx;
+
+ return skb->len;
+}
+
/*
* A socket has joined a multicast group on device dev.
*/
@@ -1430,6 +1516,8 @@ static void __ip_mc_inc_group(struct in_device *in_dev, __be32 addr,
igmpv3_del_delrec(in_dev, im);
#endif
igmp_group_added(im);
+
+ ip_mc_addr_notify(in_dev->dev, addr, RTM_NEWMDB);
if (!in_dev->dead)
ip_rt_multicast_event(in_dev);
out:
@@ -1661,6 +1749,8 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
in_dev->mc_count--;
igmp_group_dropped(i);
ip_mc_clear_src(i);
+ ip_mc_addr_notify(in_dev->dev, addr,
+ RTM_DELMDB);
if (!in_dev->dead)
ip_rt_multicast_event(in_dev);
@@ -3051,6 +3141,53 @@ static struct notifier_block igmp_notifier = {
.notifier_call = igmp_netdev_event,
};
+static int igmp_mc_dump_ifaddrs(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ struct net *net = sock_net(skb->sk);
+ int h, s_h;
+ int idx, s_idx;
+ struct net_device *dev;
+ struct in_device *in_dev;
+ struct hlist_head *head;
+
+ s_h = cb->args[0];
+ idx = cb->args[1];
+ s_idx = idx;
+
+ for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
+ idx = 0;
+ head = &net->dev_index_head[h];
+ rcu_read_lock();
+ cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
+ net->dev_base_seq;
+ hlist_for_each_entry_rcu(dev, head, index_hlist) {
+ if (idx < s_idx)
+ goto cont;
+ if (h > s_h || idx > s_idx)
+ cb->args[2] = 0;
+ in_dev = __in_dev_get_rcu(dev);
+ if (!in_dev)
+ goto cont;
+
+ /* loop over multicast addresses */
+ if (ip_mc_dump_ifaddr(skb, cb, dev) < 0) {
+ rcu_read_unlock();
+ goto done;
+ }
+cont:
+ idx++;
+ }
+ rcu_read_unlock();
+ }
+
+done:
+ cb->args[0] = h;
+ cb->args[1] = idx;
+
+ return skb->len;
+}
+
int __init igmp_mc_init(void)
{
#if defined(CONFIG_PROC_FS)
@@ -3064,6 +3201,8 @@ int __init igmp_mc_init(void)
goto reg_notif_fail;
return 0;
+ rtnl_register(PF_INET, RTM_GETMDB, NULL, igmp_mc_dump_ifaddrs, 0);
+
reg_notif_fail:
unregister_pernet_subsys(&igmp_net_ops);
return err;
--
2.17.1
^ permalink raw reply related
* [PATCH net-next v3 2/2] netlink: ipv6 MLD join notifications
From: Patrick Ruddy @ 2018-09-06 9:10 UTC (permalink / raw)
To: netdev; +Cc: roopa, jiri, stephen
In-Reply-To: <20180906091056.21109-1-pruddy@vyatta.att-mail.com>
Some userspace applications need to know about MLD joins from the
kernel for 2 reasons:
1. To allow the programming of multicast MAC filters in hardware
2. To form a multicast FORUS list for non link-local multicast
groups to be sent to the kernel and from there to the interested
party.
(1) can be fulfilled but simply sending the hardware multicast MAC
address to be programmed but (2) requires the L3 address to be sent
since this cannot be constructed from the MAC address whereas the
reverse translation is a standard library function.
This commit provides addition and deletion of multicast addresses
using the RTM_NEWMDB and RTM_DELMDB messages with AF_INET6. It also
provides the RTM_GETMDB extension to allow multicast join state to
be read from the kernel.
Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
---
v3 rework to use RTM_***MDB messages as per review comments.
net/ipv6/addrconf.c | 34 ++++++++++++++++-------
net/ipv6/mcast.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 90 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d51a8c0b3372..d23955c21650 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4860,6 +4860,8 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
struct nlmsghdr *nlh;
u8 scope = RT_SCOPE_UNIVERSE;
int ifindex = ifmca->idev->dev->ifindex;
+ int addr_type = (event == RTM_GETMULTICAST) ? IFA_MULTICAST :
+ IFA_ADDRESS;
if (ipv6_addr_scope(&ifmca->mca_addr) & IFA_SITE)
scope = RT_SCOPE_SITE;
@@ -4869,7 +4871,7 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
return -EMSGSIZE;
put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
- if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 ||
+ if (nla_put_in6_addr(skb, addr_type, &ifmca->mca_addr) < 0 ||
put_cacheinfo(skb, ifmca->mca_cstamp, ifmca->mca_tstamp,
INFINITY_LIFE_TIME, INFINITY_LIFE_TIME) < 0) {
nlmsg_cancel(skb, nlh);
@@ -4916,7 +4918,7 @@ enum addr_type_t {
/* called with rcu_read_lock() */
static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
struct netlink_callback *cb, enum addr_type_t type,
- int s_ip_idx, int *p_ip_idx)
+ int s_ip_idx, int *p_ip_idx, int msg_type)
{
struct ifmcaddr6 *ifmca;
struct ifacaddr6 *ifaca;
@@ -4935,7 +4937,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
err = inet6_fill_ifaddr(skb, ifa,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
- RTM_NEWADDR,
+ msg_type,
NLM_F_MULTI);
if (err < 0)
break;
@@ -4952,7 +4954,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
err = inet6_fill_ifmcaddr(skb, ifmca,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
- RTM_GETMULTICAST,
+ msg_type,
NLM_F_MULTI);
if (err < 0)
break;
@@ -4967,7 +4969,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
err = inet6_fill_ifacaddr(skb, ifaca,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
- RTM_GETANYCAST,
+ msg_type,
NLM_F_MULTI);
if (err < 0)
break;
@@ -4982,7 +4984,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
}
static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
- enum addr_type_t type)
+ enum addr_type_t type, int msg_type)
{
struct net *net = sock_net(skb->sk);
int h, s_h;
@@ -5012,7 +5014,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
goto cont;
if (in6_dump_addrs(idev, skb, cb, type,
- s_ip_idx, &ip_idx) < 0)
+ s_ip_idx, &ip_idx, msg_type) < 0)
goto done;
cont:
idx++;
@@ -5031,14 +5033,22 @@ static int inet6_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
{
enum addr_type_t type = UNICAST_ADDR;
- return inet6_dump_addr(skb, cb, type);
+ return inet6_dump_addr(skb, cb, type, RTM_NEWADDR);
}
static int inet6_dump_ifmcaddr(struct sk_buff *skb, struct netlink_callback *cb)
{
enum addr_type_t type = MULTICAST_ADDR;
- return inet6_dump_addr(skb, cb, type);
+ return inet6_dump_addr(skb, cb, type, RTM_GETMULTICAST);
+}
+
+static int inet6_mdb_dump_ifmcaddr(struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ enum addr_type_t type = MULTICAST_ADDR;
+
+ return inet6_dump_addr(skb, cb, type, RTM_NEWMDB);
}
@@ -5046,7 +5056,7 @@ static int inet6_dump_ifacaddr(struct sk_buff *skb, struct netlink_callback *cb)
{
enum addr_type_t type = ANYCAST_ADDR;
- return inet6_dump_addr(skb, cb, type);
+ return inet6_dump_addr(skb, cb, type, RTM_GETANYCAST);
}
static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
@@ -6774,6 +6784,10 @@ int __init addrconf_init(void)
NULL, inet6_dump_ifmcaddr, 0);
if (err < 0)
goto errout;
+ err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETMDB,
+ NULL, inet6_mdb_dump_ifmcaddr, 0);
+ if (err < 0)
+ goto errout;
err = rtnl_register_module(THIS_MODULE, PF_INET6, RTM_GETANYCAST,
NULL, inet6_dump_ifacaddr, 0);
if (err < 0)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 4ae54aaca373..5108dbf73516 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -880,6 +880,67 @@ static struct ifmcaddr6 *mca_alloc(struct inet6_dev *idev,
return mc;
}
+static int fill_addr(struct sk_buff *skb, struct net_device *dev,
+ const struct in6_addr *addr, int type, unsigned int flags)
+{
+ struct nlmsghdr *nlh;
+ struct ifaddrmsg *ifm;
+ u8 scope = RT_SCOPE_UNIVERSE;
+
+ if (ipv6_addr_scope(addr) & IFA_SITE)
+ scope = RT_SCOPE_SITE;
+
+ nlh = nlmsg_put(skb, 0, 0, type, sizeof(*ifm), flags);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ ifm = nlmsg_data(nlh);
+ ifm->ifa_family = AF_INET6;
+ ifm->ifa_prefixlen = 128;
+ ifm->ifa_flags = IFA_F_PERMANENT;
+ ifm->ifa_scope = scope;
+ ifm->ifa_index = dev->ifindex;
+
+ if (nla_put_in6_addr(skb, IFA_ADDRESS, addr))
+ goto nla_put_failure;
+ nlmsg_end(skb, nlh);
+ return 0;
+
+nla_put_failure:
+ nlmsg_cancel(skb, nlh);
+ return -EMSGSIZE;
+}
+
+static inline size_t addr_nlmsg_size(void)
+{
+ return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
+ + nla_total_size(sizeof(struct in6_addr));
+}
+
+static void ipv6_mc_addr_notify(struct net_device *dev,
+ const struct in6_addr *addr, int type)
+{
+ struct net *net = dev_net(dev);
+ struct sk_buff *skb;
+ int err = -ENOBUFS;
+
+ skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
+ if (!skb)
+ goto errout;
+
+ err = fill_addr(skb, dev, addr, type, 0);
+ if (err < 0) {
+ WARN_ON(err == -EMSGSIZE);
+ kfree_skb(skb);
+ goto errout;
+ }
+ rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
+ return;
+errout:
+ if (err < 0)
+ rtnl_set_sk_err(net, RTNLGRP_MDB, err);
+}
+
/*
* device multicast group inc (add if not found)
*/
@@ -932,6 +993,9 @@ static int __ipv6_dev_mc_inc(struct net_device *dev,
mld_del_delrec(idev, mc);
igmp6_group_added(mc);
+
+ ipv6_mc_addr_notify(dev, addr, RTM_NEWMDB);
+
ma_put(mc);
return 0;
}
@@ -960,6 +1024,8 @@ int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr)
igmp6_group_dropped(ma);
ip6_mc_clear_src(ma);
+ ipv6_mc_addr_notify(idev->dev, addr,
+ RTM_DELMDB);
ma_put(ma);
return 0;
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Eric Dumazet @ 2018-09-06 9:21 UTC (permalink / raw)
To: Kirill Tkhai, Eric Dumazet, Vlad Buslov, netdev
Cc: jhs, xiyou.wangcong, jiri, davem, stephen, paulmck,
nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
lucien.xin, jakub.kicinski, christian.brauner, jbenc
In-Reply-To: <4a230727-1072-3bc9-1590-2e59c2162460@virtuozzo.com>
On 09/06/2018 01:39 AM, Kirill Tkhai wrote:
> On 06.09.2018 11:30, Eric Dumazet wrote:
>>
>>
>> On 09/06/2018 12:58 AM, Vlad Buslov wrote:
>>
>> ...
>>
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 18e22a5a6550..239c73f29471 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -90,6 +90,7 @@ struct Qdisc {
>>> struct gnet_stats_queue __percpu *cpu_qstats;
>>> int padded;
>>> refcount_t refcnt;
>>> + struct rcu_head rcu;
>>>
>>> /*
>>> * For performance sake on SMP, we put highly modified fields at the end
>>
>> Probably better to move this at the end of struct Qdisc,
>> not risking unexpected performance regressions in fast path.
>
> Do you mean regressions on UP? On SMP it looks like this field
> fits in the unused gap created by:
>
> struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
>
> Kirill
Oh right, we might have holes there, and in the last cache line, so it does not
matter for now.
^ permalink raw reply
* Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Vlad Buslov @ 2018-09-06 9:23 UTC (permalink / raw)
To: Kirill Tkhai
Cc: Eric Dumazet, netdev, jhs, xiyou.wangcong, jiri, davem, stephen,
paulmck, nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
lucien.xin, jakub.kicinski, christian.brauner, jbenc
In-Reply-To: <4a230727-1072-3bc9-1590-2e59c2162460@virtuozzo.com>
On Thu 06 Sep 2018 at 08:39, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> On 06.09.2018 11:30, Eric Dumazet wrote:
>>
>>
>> On 09/06/2018 12:58 AM, Vlad Buslov wrote:
>>
>> ...
>>
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 18e22a5a6550..239c73f29471 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -90,6 +90,7 @@ struct Qdisc {
>>> struct gnet_stats_queue __percpu *cpu_qstats;
>>> int padded;
>>> refcount_t refcnt;
>>> + struct rcu_head rcu;
>>>
>>> /*
>>> * For performance sake on SMP, we put highly modified fields at the end
>>
>> Probably better to move this at the end of struct Qdisc,
>> not risking unexpected performance regressions in fast path.
>
> Do you mean regressions on UP? On SMP it looks like this field
> fits in the unused gap created by:
>
> struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
>
> Kirill
Hi Eric, Kirill
I intentionally put rcu_head here in order for it not to be in same
cache line with "highly modified fields" (according to comment).
^ permalink raw reply
* Re: [PATCH net-next 03/13] net: sched: extend Qdisc with rcu
From: Eric Dumazet @ 2018-09-06 9:29 UTC (permalink / raw)
To: Vlad Buslov, Kirill Tkhai
Cc: netdev, jhs, xiyou.wangcong, jiri, davem, stephen, paulmck,
nicolas.dichtel, leon, gregkh, mark.rutland, fw, dsahern,
lucien.xin, jakub.kicinski, christian.brauner, jbenc
In-Reply-To: <vbf1sa7x9m9.fsf@reg-r-vrt-018-180.mtr.labs.mlnx>
On 09/06/2018 02:23 AM, Vlad Buslov wrote:
>
> On Thu 06 Sep 2018 at 08:39, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
>> On 06.09.2018 11:30, Eric Dumazet wrote:
>>>
>>>
>>> On 09/06/2018 12:58 AM, Vlad Buslov wrote:
>>>
>>> ...
>>>
>>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>>> index 18e22a5a6550..239c73f29471 100644
>>>> --- a/include/net/sch_generic.h
>>>> +++ b/include/net/sch_generic.h
>>>> @@ -90,6 +90,7 @@ struct Qdisc {
>>>> struct gnet_stats_queue __percpu *cpu_qstats;
>>>> int padded;
>>>> refcount_t refcnt;
>>>> + struct rcu_head rcu;
>>>>
>>>> /*
>>>> * For performance sake on SMP, we put highly modified fields at the end
>>>
>>> Probably better to move this at the end of struct Qdisc,
>>> not risking unexpected performance regressions in fast path.
>>
>> Do you mean regressions on UP? On SMP it looks like this field
>> fits in the unused gap created by:
>>
>> struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
>>
>> Kirill
>
> Hi Eric, Kirill
>
> I intentionally put rcu_head here in order for it not to be in same
> cache line with "highly modified fields" (according to comment).
My personal preference would have been to use the last cache line
for a new 'control path field', and leave holes in the first one
for future needs in fast path.
But this is a minor detail really, either choice is fine.
^ permalink raw reply
* Re: [PATCH RFC] net/mlx5_en: switch to Toeplitz RSS hash by default
From: Konstantin Khlebnikov @ 2018-09-06 10:04 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Tariq Toukan, Linux Netdev List, Saeed Mahameed, Gal Pressman,
Or Gerlitz, David S. Miller
In-Reply-To: <CALzJLG8wEHMEg-ixYPyoOCOQMcDVq03eVPo-7TSk0dR7mHM=Bg@mail.gmail.com>
On 06.09.2018 08:24, Saeed Mahameed wrote:
> On Sun, Sep 2, 2018 at 2:55 AM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>> On 02.09.2018 12:29, Tariq Toukan wrote:
>>>
>>>
>>>
>>> On 31/08/2018 2:29 PM, Konstantin Khlebnikov wrote:
>>>>
>>>> XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only 8 bits.
>>>> It seems not enough for RFS. All other drivers use toeplitz.
>>>>
>>>> Driver mlx4_en uses Toeplitz by default and warns if hash XOR is used
>>>> together with NETIF_F_RXHASH (enabled by default too): "Enabling both
>>>> XOR Hash function and RX Hashing can limit RPS functionality".
>>>>
>>>> XOR is default in mlx5_en since commit 2be6967cdbc9
>>>> ("net/mlx5e: Support ETH_RSS_HASH_XOR").
>>>>
>>>> Hash function could be set via ethtool. But it would be nice to have
>>>> single standard for drivers or proper description why this one is
>>>> special.
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>> ---
>>>
>>>
>>> Hi Konstantin,
>>>
>>> Thanks for the patch.
>>>
>>> I understand the motivation.
>>>
>>> This change affects the default out-of-the-box behavior and requires a
>>> full performance cycle. We'll run performance regression tomorrow, results
>>> should be ready by EOW.
>>> > I'll update.
>>
>>
>> Ok, thank you.
>>
>> The only mention I've found in your documentation
>> http://www.mellanox.com/related-docs/prod_software/Mellanox_EN_for_Linux_User_Manual_v4_4.pdf
>>
>> is
>> ---
>> 1.1.10 RSS Support
>> 1.1.10.1 RSS Hash Function
>> The device has the ability to use XOR as the RSS distribution function,
>> instead of the default
>> Toplitz function.
>> The XOR function can be better distributed among driver's receive queues in
>> small number of
>> streams, where it distributes each TCP/UDP stream to a different queue.
>> ---
>>
>> So Toeplitz is supposed to be default hash function for all versions of
>> drivers and hardware.
>>
>> Also XOR8 seems vulnerable for ddos - hash is predictable, no random\secret
>> vector, only 8 bits.
>> So, it's easy to route all flows into one point. As we got it by accident.
>>
>> Moreover, in kernel 4.4.y hash switch via ethtool is broken and does not
>> work =)
>>
>
> is it broken in mlx5 only or for the whole kernel ?
In mlx5 only - interface works but makes no effect.
For now we have disabled RXHASH as workaround.
>
> If it is mlx5 then this might be the reason:
> commit 2d75b2bc8a8c0ce5567a6ecef52e194d117efe3f
> net/mlx5e: Add ethtool RSS configuration options
>
> was submitted to kernel 4.3
>
> and an important fix for hash function change was submitted to 4.5:
>
> commit bdfc028de1b3cd59490d5413a5c87b0fa50040c2
> Author: Tariq Toukan <tariqt@mellanox.com>
> Date: Mon Feb 29 21:17:12 2016 +0200
>
> net/mlx5e: Fix ethtool RX hash func configuration change
>
> We should modify TIRs explicitly to apply the new RSS configuration.
> The light ndo close/open calls do not "refresh" them.
>
> Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options')
>
I think so, but this patch also seems flawed and fixed in
commit a100ff3eef193d2d79daf98dcd97a54776ffeb78
Author: Gal Pressman <galp@mellanox.com>
Date: Thu Jan 12 16:25:46 2017 +0200
net/mlx5e: Fix update of hash function/key via ethtool
Modifying TIR hash should change selected fields bitmask in addition to
the function and key.
And maybe more, there a lot of progress since v4.4
>
>>>
>>> Regards,
>>> Tariq
^ permalink raw reply
* Re: [PATCH 1/7] fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-06 10:21 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko, stable
In-Reply-To: <20180905190414.5477-1-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> cls_u32.c misuses refcounts for struct tc_u_hnode - it counts references via
> ->hlist and via ->tp_root together. u32_destroy() drops the former and, in
> case when there had been links, leaves the sucker on the list. As the result,
> there's nothing to protect it from getting freed once links are dropped.
> That also makes the "is it busy" check incapable of catching the root hnode -
> it *is* busy (there's a reference from tp), but we don't see it as something
> separate. "Is it our root?" check partially covers that, but the problem
> exists for others' roots as well.
>
> AFAICS, the minimal fix preserving the existing behaviour (where it doesn't
> include oopsen, that is) would be this:
> * count tp->root and tp_c->hlist as separate references. I.e.
> have u32_init() set refcount to 2, not 1.
> * in u32_destroy() we always drop the former; in u32_destroy_hnode() -
> the latter.
>
> That way we have *all* references contributing to refcount. List
> removal happens in u32_destroy_hnode() (called only when ->refcnt is 1)
> an in u32_destroy() in case of tc_u_common going away, along with everything
> reachable from it. IOW, that way we know that u32_destroy_key() won't
> free something still on the list (or pointed to by someone's ->root).
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For networking patches, subject should be reflective of tree and
subsystem. Example for this one:
"[PATCH net 1/7]:net: sched: cls_u32: fix hnode refcounting"
Also useful to have a cover letter summarizing the patchset
in 0/7. Otherwise
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 2/7] mark root hnode explicitly
From: Jamal Hadi Salim @ 2018-09-06 10:28 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180905190414.5477-2-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> ... and disallow deleting or linking to such
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Same comment as other one in regards to subject
Since the flag space is coming from htnode which is
exposed via uapi it makes sense to keep this one here
because it is for private use; but a comment in
include/uapi/linux/pkt_cls.h that this flag or
maybe a set of bits is reserved for internal use.
Otherwise:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 3/7] make sure that divisor is a power of 2
From: Jamal Hadi Salim @ 2018-09-06 10:28 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180905190414.5477-3-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 2/7] mark root hnode explicitly
From: Jamal Hadi Salim @ 2018-09-06 10:34 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <8d3e140a-eb1b-3aae-d9e7-36d103a54e5e@mojatatu.com>
On 2018-09-06 6:28 a.m., Jamal Hadi Salim wrote:
> On 2018-09-05 3:04 p.m., Al Viro wrote:
>> From: Al Viro <viro@zeniv.linux.org.uk>
>>
>> ... and disallow deleting or linking to such
>>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> Same comment as other one in regards to subject
>
> Since the flag space is coming from htnode which is
> exposed via uapi it makes sense to keep this one here
> because it is for private use; but a comment in
> include/uapi/linux/pkt_cls.h that this flag or
> maybe a set of bits is reserved for internal use.
> Otherwise:
>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Sorry, additional comment:
It makes sense to reject user space attempt to
set TCA_CLS_FLAGS_U32_ROOT
So my suggestion is to update tc_flags_valid() to
check for this.
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 4/7] get rid of unused argument of u32_destroy_key()
From: Jamal Hadi Salim @ 2018-09-06 10:34 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180905190414.5477-4-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 5/7] get rid of tc_u_knode ->tp
From: Jamal Hadi Salim @ 2018-09-06 10:35 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180905190414.5477-5-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> not used anymore
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 6/7] get rid of tc_u_common ->rcu
From: Jamal Hadi Salim @ 2018-09-06 10:36 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180905190414.5477-6-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> unused
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 7/7] clean tc_u_common hashtable
From: Jamal Hadi Salim @ 2018-09-06 10:36 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <20180905190414.5477-7-viro@ZenIV.linux.org.uk>
On 2018-09-05 3:04 p.m., Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> * calculate key *once*, not for each hash chain element
> * let tc_u_hash() return the pointer to chain head rather than index -
> callers are cleaner that way.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH] ath9k: add reset for airtime station debugfs
From: Louie Lu @ 2018-09-06 10:41 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: kvalo, ath9k-devel, davem, linux-wireless, netdev
In-Reply-To: <877ejzynz9.fsf@toke.dk>
Previous mail rejects by mailing list, re-send again...
Toke Høiland-Jørgensen <toke@toke.dk> 於 2018年9月6日 週四 下午5:27寫道:
>
> Louie Lu <git@louie.lu> writes:
>
> > Let user can reset station airtime status by debugfs, it will
> > reset all airtime deficit to ATH_AIRTIME_QUANTUM and reset rx/tx
> > airtime accumulate to 0.
>
> No objections to the patch, but I'm curious which issues you were
> debugging that led you to needing it? :)
>
I'm testing to get the packet queue time + airtime in ath_tx_process_buffer,
it would be useful if I can reset the station airtime accumulated
value, so I can observe in each test round (e.g. 5 ping) airtime
accumulated
Also to reset the deficit to make sure it runs like fresh one.
Louie.
>
> -Toke
^ permalink raw reply
* Re: [PATCH iproute2] tc/mqprio: Print extra info on invalid args.
From: Stephen Hemminger @ 2018-09-06 10:39 UTC (permalink / raw)
To: Caleb Raitto; +Cc: netdev, jhs, xiyou.wangcong, jiri, Caleb Raitto
In-Reply-To: <20180905202419.238812-1-caleb.raitto@gmail.com>
On Wed, 5 Sep 2018 13:24:19 -0700
Caleb Raitto <caleb.raitto@gmail.com> wrote:
> From: Caleb Raitto <caraitto@google.com>
>
> Print the name of the argument that wasn't understood, and also print
> the usage string.
>
> Signed-off-by: Caleb Raitto <caraitto@google.com>
The standard code pattern in iproute2 is to use invarg().
Why not use that?
^ permalink raw reply
* Re: [PATCH 2/7] mark root hnode explicitly
From: Jamal Hadi Salim @ 2018-09-06 10:42 UTC (permalink / raw)
To: Al Viro, netdev; +Cc: Cong Wang, Jiri Pirko
In-Reply-To: <5689c19a-c4fb-24c1-4594-86f2639faa98@mojatatu.com>
[-- Attachment #1: Type: text/plain, Size: 47 bytes --]
And a bunch of indentations...
cheers,
jamal
[-- Attachment #2: indent.p --]
[-- Type: text/x-pascal, Size: 975 bytes --]
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 6d45ec4c218c..cb3bee12af78 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -485,7 +485,8 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
struct tc_cls_u32_offload cls_u32 = {};
tc_cls_common_offload_init(&cls_u32.common, tp,
- h->flags & ~TCA_CLS_FLAGS_U32_ROOT, extack);
+ h->flags & ~TCA_CLS_FLAGS_U32_ROOT,
+ extack);
cls_u32.command = TC_CLSU32_DELETE_HNODE;
cls_u32.hnode.divisor = h->divisor;
cls_u32.hnode.handle = h->handle;
@@ -1215,7 +1216,7 @@ static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
int err;
tc_cls_common_offload_init(&cls_u32.common, tp,
- ht->flags & ~TCA_CLS_FLAGS_U32_ROOT, extack);
+ ht->flags & ~TCA_CLS_FLAGS_U32_ROOT, extack);
cls_u32.command = add ? TC_CLSU32_NEW_HNODE : TC_CLSU32_DELETE_HNODE;
cls_u32.hnode.divisor = ht->divisor;
cls_u32.hnode.handle = ht->handle;
^ permalink raw reply related
* Re: [PATCH 2/7] mark root hnode explicitly
From: Al Viro @ 2018-09-06 10:59 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, Cong Wang, Jiri Pirko
In-Reply-To: <5689c19a-c4fb-24c1-4594-86f2639faa98@mojatatu.com>
On Thu, Sep 06, 2018 at 06:34:00AM -0400, Jamal Hadi Salim wrote:
> On 2018-09-06 6:28 a.m., Jamal Hadi Salim wrote:
> > On 2018-09-05 3:04 p.m., Al Viro wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > ... and disallow deleting or linking to such
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> >
> > Same comment as other one in regards to subject
> >
> > Since the flag space is coming from htnode which is
> > exposed via uapi it makes sense to keep this one here
> > because it is for private use; but a comment in
> > include/uapi/linux/pkt_cls.h that this flag or
> > maybe a set of bits is reserved for internal use.
> > Otherwise:
> >
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>
>
> Sorry, additional comment:
> It makes sense to reject user space attempt to
> set TCA_CLS_FLAGS_U32_ROOT
Point, and that one is IMO enough to give up on using ->flags for
that. How about simply
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3f985f29ef30..d14048e38b5c 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -84,6 +84,7 @@ struct tc_u_hnode {
int refcnt;
unsigned int divisor;
struct idr handle_idr;
+ bool is_root;
struct rcu_head rcu;
u32 flags;
/* The 'ht' field MUST be the last field in structure to allow for
@@ -377,6 +378,7 @@ static int u32_init(struct tcf_proto *tp)
root_ht->refcnt++;
root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
root_ht->prio = tp->prio;
+ root_ht->is_root = true;
idr_init(&root_ht->handle_idr);
if (tp_c == NULL) {
@@ -693,7 +695,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
goto out;
}
- if (root_ht == ht) {
+ if (ht->is_root) {
NL_SET_ERR_MSG_MOD(extack, "Not allowed to delete root node");
return -EINVAL;
}
@@ -795,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
NL_SET_ERR_MSG_MOD(extack, "Link hash table not found");
return -EINVAL;
}
+ if (ht_down->is_root) {
+ NL_SET_ERR_MSG_MOD(extack, "Not linking to root node");
+ return -EINVAL;
+ }
ht_down->refcnt++;
}
^ permalink raw reply related
* Re: [PATCH 2/7] mark root hnode explicitly
From: Jamal Hadi Salim @ 2018-09-06 11:04 UTC (permalink / raw)
To: Al Viro; +Cc: netdev, Cong Wang, Jiri Pirko
In-Reply-To: <20180906105933.GU19965@ZenIV.linux.org.uk>
On 2018-09-06 6:59 a.m., Al Viro wrote:
> On Thu, Sep 06, 2018 at 06:34:00AM -0400, Jamal Hadi Salim wrote:
>> On 2018-09-06 6:28 a.m., Jamal Hadi Salim wrote:
[..]
> Point, and that one is IMO enough to give up on using ->flags for
> that. How about simply
>
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index 3f985f29ef30..d14048e38b5c 100644
> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -84,6 +84,7 @@ struct tc_u_hnode {
> int refcnt;
> unsigned int divisor;
> struct idr handle_idr;
> + bool is_root;
> struct rcu_head rcu;
> u32 flags;
> /* The 'ht' field MUST be the last field in structure to allow for
> @@ -377,6 +378,7 @@ static int u32_init(struct tcf_proto *tp)
> root_ht->refcnt++;
> root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
> root_ht->prio = tp->prio;
> + root_ht->is_root = true;
> idr_init(&root_ht->handle_idr);
>
> if (tp_c == NULL) {
> @@ -693,7 +695,7 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
> goto out;
> }
>
> - if (root_ht == ht) {
> + if (ht->is_root) {
> NL_SET_ERR_MSG_MOD(extack, "Not allowed to delete root node");
> return -EINVAL;
> }
> @@ -795,6 +797,10 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
> NL_SET_ERR_MSG_MOD(extack, "Link hash table not found");
> return -EINVAL;
> }
> + if (ht_down->is_root) {
> + NL_SET_ERR_MSG_MOD(extack, "Not linking to root node");
> + return -EINVAL;
> + }
> ht_down->refcnt++;
Looks good to me ;->
cheers,
jamal
^ permalink raw reply
* [PATCH net-next] liquidio CN23XX: Remove set but not used variable 'ring_flag'
From: YueHaibing @ 2018-09-06 11:22 UTC (permalink / raw)
To: Derek Chickles, Satanand Burla, Felix Manlunas, Raghu Vatsavayi,
David S. Miller
Cc: YueHaibing, netdev, kernel-janitors
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c: In function 'cn23xx_setup_octeon_vf_device':
drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c:619:20: warning:
variable 'ring_flag' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index 962bb62..fda4940 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -616,7 +616,7 @@ static void cn23xx_disable_vf_interrupt(struct octeon_device *oct, u8 intr_flag)
int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
{
struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
- u32 rings_per_vf, ring_flag;
+ u32 rings_per_vf;
u64 reg_val;
if (octeon_map_pci_barx(oct, 0, 0))
@@ -634,8 +634,6 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
rings_per_vf = reg_val & CN23XX_PKT_INPUT_CTL_RPVF_MASK;
- ring_flag = 0;
-
cn23xx->conf = oct_get_config_info(oct, LIO_23XX);
if (!cn23xx->conf) {
dev_err(&oct->pci_dev->dev, "%s No Config found for CN23XX\n",
^ permalink raw reply related
* Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete
From: Vlad Buslov @ 2018-09-06 11:14 UTC (permalink / raw)
To: Cong Wang
Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
David Miller
In-Reply-To: <CAM_iQpUY_-itn2oRQ-CEtOxZ-u1sKxCfbYWgPS1v6NTkT-NbSA@mail.gmail.com>
On Wed 05 Sep 2018 at 20:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Sep 5, 2018 at 12:05 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Tue 04 Sep 2018 at 22:41, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >>
>> >> On Mon 03 Sep 2018 at 18:50, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >> >>
>> >> >> Action API was changed to work with actions and action_idr in concurrency
>> >> >> safe manner, however tcf_del_walker() still uses actions without taking
>> >> >> reference to them first and deletes them directly, disregarding possible
>> >> >> concurrent delete.
>> >> >>
>> >> >> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> >> >> caller to hold reference to action and accepts action id as argument,
>> >> >> instead of direct action pointer.
>> >> >
>> >> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
>> >> > tcf_dump_walker() already does.
>> >>
>> >> Because tcf_del_walker() calls __tcf_idr_release(), which take
>> >> idrinfo->lock itself (deadlock). It also calls sleeping functions like
>> >
>> > Deadlock can be easily resolved by moving the lock out.
>> >
>> >
>> >> tcf_action_goto_chain_fini(), so just implementing function that
>> >> releases action without taking idrinfo->lock is not enough.
>> >
>> > Sleeping can be resolved either by making it atomic or
>> > deferring it to a work queue.
>> >
>> > None of your arguments here is a blocker to locking
>> > idrinfo->lock. You really should focus on if it is really
>> > necessary to lock idrinfo->lock in tcf_del_walker(), rather
>> > than these details.
>> >
>> > For me, if you need idrinfo->lock for dump walker, you must
>> > need it for delete walker too, because deletion is a writer
>> > which should require stronger protection than the dumper,
>> > which merely a reader.
>>
>> I don't get how it is necessary. Dump walker uses pointers to actions
>> directly, and in order to be concurrency-safe it must either hold the
>
> It uses the pointer in a read-only way, what you said doesn't change
> the fact that it is a reader. And, like other readers, it may not need
> to lock at all, which is a different topic.
>
>
>> lock or obtain reference to action. Note that del walker doesn't use the
>> action pointer, it only passed action id to tcf_idr_delete_index()
>> function, which does all the necessary locking and can deal with
>> potential concurrency issues (concurrent delete, etc.). This approach
>> also benefits from code reuse from other code paths that delete actions,
>> instead of implementing its own.
>
> Look at the difference below.
>
> With your change:
>
> idr_for_each_entry_ul{
> spin_lock(&idrinfo->lock);
> idr_remove();
> spin_unlock(&idrinfo->lock);
> }
>
> With what I suggest:
>
> spin_lock(&idrinfo->lock);
> idr_for_each_entry_ul{
> idr_remove();
> }
> spin_unlock(&idrinfo->lock);
>
> Isn't a concurrent tcf_idr_check_alloc() able to livelock here with
> your change?
>
> idr_for_each_entry_ul{
> spin_lock(&idrinfo->lock);
> idr_remove();
> spin_unlock(&idrinfo->lock);
> // tcf_idr_check_alloc() jumps in,
> // allocates next ID which can be found
> // by idr_get_next_ul()
> } // the whole loop goes _literately_ infinite...
idr_for_each_entry_ul traverses idr entries with ascending order of
identifiers, so infinite livelock like this is not possible because it
never goes back to newly added entries with id<current_id.
>
>
> Also, idr_for_each_entry_ul() is supposed to be protected either
> by RCU or idrinfo->lock, no? With your change or without any change,
> it doesn't even have any lock after removing RTNL?
After reading this comment I checked actual idr implementation and I
think you are right. Even though idr_for_each_entry_ul() macro (and
function idr_get_next_ul() that it uses to iterate over idr entries)
doesn't specify any locking requirements in comment description (that is
why this patch doesn't use any), its implementation seems to require
external synchronization.
You suggest I should just hold idrinfo->lock for whole del_walker loop
duration, or play nicely with potential concurrent users and
take/release it per action?
Thanks,
Vlad
^ permalink raw reply
* Re: [PATCH v2 01/17] asm: simd context helper API
From: Jason A. Donenfeld @ 2018-09-06 15:52 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andrew Lutomirski, LKML, Netdev, David Miller, Greg Kroah-Hartman,
Samuel Neves, linux-arch, Rik van Riel
In-Reply-To: <alpine.DEB.2.21.1809061542330.1570@nanos.tec.linutronix.de>
Hi Thomas,
On Thu, Sep 6, 2018 at 9:29 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Sat, 1 Sep 2018, Jason A. Donenfeld wrote:
> > On Sat, Sep 1, 2018 at 2:32 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > I tend to think the right approach is to merge Jason's code and then
> > > make it better later. Even with a totally perfect lazy FPU restore
> > > implementation on x86, we'll probably still need some way of dealing
> > > with SIMD contexts. I think we're highly unlikely to ever a allow
> > > SIMD usage in all NMI contexts, for example, and there will always be
> > > cases where we specifically don't want to use all available SIMD
> > > capabilities even if we can. For example, generating random numbers
> > > does crypto, but we probably don't want to do *SIMD* crypto, since
> > > that will force a save and restore and will probably fire up the
> > > AVX512 unit, and that's not worth it unless we're already using it for
> > > some other reason.
> > >
> > > Also, as Rik has discovered, lazy FPU restore is conceptually
> > > straightforward but isn't entirely trivial :)
> >
> > Sounds good. I'll move ahead on this basis.
>
> Fine with me.
Do you want to pull this single patch [01/17] into your tree now, and
then when I submit v3 of WireGuard and such, I can just drop this
patch from it, and then the rest will enter like usual networking
stuff through Dave's tree?
Jason
^ permalink raw reply
* Re: [PATCH net-next v1] net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
From: Sabrina Dubroca @ 2018-09-06 11:22 UTC (permalink / raw)
To: Vakul Garg; +Cc: netdev, borisp, aviadye, davejwatson, davem, doronrk
In-Reply-To: <20180905162743.10145-1-vakul.garg@nxp.com>
2018-09-05, 21:57:43 +0530, Vakul Garg wrote:
> tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
> function sk_alloc_sg(). In case the number of SG entries hit
> MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
> current SG index to '0'. This leads to calling of function
> tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
> kernel crash. To fix this, set the number of SG elements to the number
> of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
> returns -ENOSPC.
>
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
What commit introduced this bug? Can you add Fixes: tag? And if it's a
bug present in the net tree, the fix should go into the net tree as
well.
Another thing I've noticed a few times with your patch submissions:
the clock on the system you're using for git-send-email seems to be
set something like 5 hours and 18 minutes in the future. Could you fix
it? It makes your email appear out of order.
Date: Wed, 5 Sep 2018 21:57:43 +0530
Received: from lti.ap.freescale.net (14.143.30.134) by
AM0PR04MB4243.eurprd04.prod.outlook.com (2603:10a6:208:66::29) with Microsoft
SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
15.20.1101.17; Wed, 5 Sep 2018 11:09:20 +0000
--
Sabrina
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox