* Re: [PATCH net-next] dev: Correctly get length of alias string in dev_set_alias(),Re: [PATCH net-next] dev: Correctly get length of alias string in dev_set_alias()
From: David Miller @ 2017-12-20 19:55 UTC (permalink / raw)
To: serhe.popovych; +Cc: netdev
In-Reply-To: <de0b307c-63e7-b786-e9d1-8b4a94f61d9a@gmail.com>
From: Serhey Popovich <serhe.popovych@gmail.com>
Date: Wed, 20 Dec 2017 21:44:46 +0200
> Sorry but I do not mean larger. I mean shorter. When nla_len() >
> strlen() we allocate extra space up to IFALIASZ - 1.
But the user typically does not do that.
> This is definitely fix nothing: we never get above the bounds, but
> in case if NULL terminator is in the middle of string with nla_len()
> we might allocate unused extra space.
See above, not worth optimizing for.
^ permalink raw reply
* Re: [net 1/1] tipc: remove joining group member from congested list
From: David Miller @ 2017-12-20 19:57 UTC (permalink / raw)
To: jon.maloy
Cc: netdev, mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen,
hoang.h.le, canh.d.luu, ying.xue, tipc-discussion
In-Reply-To: <1513764195-7731-1-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 20 Dec 2017 11:03:15 +0100
> When we receive a JOIN message from a peer member, the message may
> contain an advertised window value ADV_IDLE that permits removing the
> member in question from the tipc_group::congested list. However, since
> the removal has been made conditional on that the advertised window is
> *not* ADV_IDLE, we miss this case. This has the effect that a sender
> sometimes may enter a state of permanent, false, broadcast congestion.
>
> We fix this by unconditinally removing the member from the congested
> list before calling tipc_member_update(), which might potentially sort
> it into the list again.
>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
I bet this bug wasn't fun to track down.
Applied, thanks Jon.
^ permalink raw reply
* Re: RCU callback crashes
From: Jiri Pirko @ 2017-12-20 19:59 UTC (permalink / raw)
To: Cong Wang; +Cc: Jakub Kicinski, netdev@vger.kernel.org
In-Reply-To: <CAM_iQpWUjfv2-Sirmdb5WfV4pZ4uF0m7=HR5YGWaKxb4KHp8gQ@mail.gmail.com>
Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> Ah, no object debug but KASAN on produces this:
>>
>
>
>I bet it is an ingress qdisc which is being freed?
>
>
>
>> [ 39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>> [ 39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>> [ 39.283524]
>> [ 39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>> [ 39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [ 39.303969] Call Trace:
>> [ 39.306769] <IRQ>
>> [ 39.309088] dump_stack+0xa6/0x118
>> [ 39.312957] ? _atomic_dec_and_lock+0xe8/0xe8
>> [ 39.317895] ? cpu_needs_another_gp+0x246/0x2b0
>> [ 39.323030] print_address_description+0x6a/0x270
>> [ 39.328380] ? cpu_needs_another_gp+0x246/0x2b0
>> [ 39.333510] kasan_report+0x23f/0x350
>> [ 39.337672] cpu_needs_another_gp+0x246/0x2b0
>> ...
>> [ 39.383026] rcu_process_callbacks+0x1a0/0x620
>> ...
>
>
>This is confusing.
>
>I guess it is q->miniqp which is freed in qdisc_graft() without properly
>waiting for rcu readers?
miniqp is inside qdisc private data:
struct ingress_sched_data {
struct tcf_block *block;
struct tcf_block_ext_info block_info;
struct mini_Qdisc_pair miniqp;
};
That is freed along with the qdisc itself in:
qdisc_destroy->qdisc_free
Before miniq, tp was checked in the rcu reader path. In case it was not
null, q was processed. In slow patch, tp is freed after rcu grace period:
tcf_proto_destroy->kfree_rcu
I assumed that since q is processed in rcu reader, it is also freed after
a grace period, but now looking at the code I don't see it happening
like that.
So I think that change to miniq made the existing race window
a bit wider and easier to hit.
I believe that calling kfree_rcu by call_rcu should resolve this.
^ permalink raw reply
* Re: [patch net-next 02/10] devlink: Add support for resource abstraction
From: Jiri Pirko @ 2017-12-20 20:01 UTC (permalink / raw)
To: David Miller
Cc: netdev, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220.144358.1892414104907740492.davem@davemloft.net>
Wed, Dec 20, 2017 at 08:43:58PM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Wed, 20 Dec 2017 12:58:13 +0100
>
>> From: Arkadi Sharshevsky <arkadis@mellanox.com>
>>
>> Add support for hardware resource abstraction over devlink. Each resource
>> is identified via id, furthermore it contains information regarding its
>> size and its related sub resources. Each resource can also provide its
>> current occupancy.
>>
>> In some cases the sizes of some resources can be changed, yet for those
>> changes to take place a hot driver reload may be needed. The reload
>> capability will be introduced in the next patch.
>>
>> Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>
>In what units are these sizes? If it depends upon the resource, it would
>be great to have a way to introspect the units given a resource.
In this case the unit is "item". But you have a point. We'll figure this
out.
>
>> + struct devlink_resource_ops *resource_ops;
>
>Const?
>
>> +static inline int
>> +devlink_resource_register(struct devlink *devlink,
>> + const char *resource_name,
>> + bool top_hierarchy,
>> + u64 resource_size,
>> + u64 resource_id,
>> + u64 parent_resource_id,
>> + struct devlink_resource_ops *resource_ops)
>
>Const for resource_ops?
>
>> +int devlink_resource_register(struct devlink *devlink,
>> + const char *resource_name,
>> + bool top_hierarchy,
>> + u64 resource_size,
>> + u64 resource_id,
>> + u64 parent_resource_id,
>> + struct devlink_resource_ops *resource_ops)
>
>Likewise.
Ok, thanks!
^ permalink raw reply
* Re: [patch net-next 00/10] Add support for resource abstraction
From: Jiri Pirko @ 2017-12-20 20:03 UTC (permalink / raw)
To: David Ahern
Cc: netdev, davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
yuvalm, ogerlitz, roopa
In-Reply-To: <e3533808-a00c-265a-554d-edab794895c3@cumulusnetworks.com>
Wed, Dec 20, 2017 at 08:23:54PM CET, dsa@cumulusnetworks.com wrote:
>On 12/20/17 4:58 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Many of the ASIC's internal resources are limited and are shared between
>> several hardware procedures. For example, unified hash-based memory can
>> be used for many lookup purposes, like FDB and LPM. In many cases the user
>> can provide a partitioning scheme for such a resource in order to perform
>> fine tuning for his application. In such cases performing driver reload is
>> needed for the changes to take place, thus this patchset also adds support
>> for hot reload.
>>
>> Such an abstraction can be coupled with devlink's dpipe interface, which
>> models the ASIC's pipeline as a graph of match/action tables. By modeling
>> the hardware resource object, and by coupling it to several dpipe tables,
>> further visibility can be achieved in order to debug ASIC-wide issues.
>>
>> The proposed interface will provide the user the ability to understand the
>> limitations of the hardware, and receive notification regarding its occupancy.
>> Furthermore, monitoring the resource occupancy can be done in real-time and
>> can be useful in many cases.
>> ---
>> Userspace part prototype can be found at https://github.com/arkadis/iproute2/
>> at resource_dev branch.
>>
>
>The breakout problem I mentioned in the last round of patches has not
>been fixed:
Oops, I guess that we missed that. Thanks!
^ permalink raw reply
* Re: [PATCH net-next] net: packet: allow bind to device which is !IFF_UP
From: David Miller @ 2017-12-20 20:06 UTC (permalink / raw)
To: cugyly; +Cc: netdev, Linyu.Yuan
In-Reply-To: <1513776049-9141-1-git-send-email-cugyly@163.com>
From: yuan linyu <cugyly@163.com>
Date: Wed, 20 Dec 2017 21:20:49 +0800
> From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
>
> this try to allow tcpdump to capture packet once device IFF_UP
>
> Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
I don't think this is such a great idea.
Even if it "works":
1) The behavior goes back decades, this means real applications can
be broken by this change.
2) The feedback is useful, if I ran tcpdump on the wrong interface
I'd like to get this error if the interface I mistakenly chose
was done in order to help me figure this out.
^ permalink raw reply
* [PATCH net v3] openvswitch: Fix pop_vlan action for double tagged frames
From: Eric Garver @ 2017-12-20 20:09 UTC (permalink / raw)
To: netdev; +Cc: ovs-dev, Jiri Benc
skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
shift the true ethertype into position for us.
Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
---
net/openvswitch/flow.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index dbe2379329c5..f039064ce922 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -579,6 +579,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
return -EINVAL;
skb_reset_network_header(skb);
+ key->eth.type = skb->protocol;
} else {
eth = eth_hdr(skb);
ether_addr_copy(key->eth.src, eth->h_source);
@@ -592,15 +593,23 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
if (unlikely(parse_vlan(skb, key)))
return -ENOMEM;
- skb->protocol = parse_ethertype(skb);
- if (unlikely(skb->protocol == htons(0)))
+ key->eth.type = parse_ethertype(skb);
+ if (unlikely(key->eth.type == htons(0)))
return -ENOMEM;
+ /* Multiple tagged packets need to retain TPID to satisfy
+ * skb_vlan_pop(), which will later shift the ethertype into
+ * skb->protocol.
+ */
+ if (key->eth.cvlan.tci & htons(VLAN_TAG_PRESENT))
+ skb->protocol = key->eth.cvlan.tpid;
+ else
+ skb->protocol = key->eth.type;
+
skb_reset_network_header(skb);
__skb_push(skb, skb->data - skb_mac_header(skb));
}
skb_reset_mac_len(skb);
- key->eth.type = skb->protocol;
/* Network layer. */
if (key->eth.type == htons(ETH_P_IP)) {
--
2.12.0
^ permalink raw reply related
* [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
From: John Fastabend @ 2017-12-20 20:09 UTC (permalink / raw)
To: xiyou.wangcong, jiri, davem; +Cc: kubakici, netdev, eric.dumazet
RCU grace period is needed for lockless qdiscs added in the commit
c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
It is needed now that qdiscs may be lockless otherwise we risk
free'ing a qdisc that is still in use from datapath. Additionally,
push list cleanup into RCU callback. Otherwise we risk the datapath
adding skbs during removal.
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
include/net/sch_generic.h | 1 +
net/sched/sch_generic.c | 50 ++++++++++++++++++++++++---------------------
2 files changed, 28 insertions(+), 23 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index bc6b25f..a65306b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -97,6 +97,7 @@ struct Qdisc {
unsigned long state;
struct Qdisc *next_sched;
struct sk_buff_head skb_bad_txq;
+ struct rcu_head rcu_head;
int padded;
refcount_t refcnt;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 876fab2..ab497ef 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -873,31 +873,15 @@ void qdisc_reset(struct Qdisc *qdisc)
}
EXPORT_SYMBOL(qdisc_reset);
-static void qdisc_free(struct Qdisc *qdisc)
+static void qdisc_rcu_free(struct rcu_head *head)
{
- if (qdisc_is_percpu_stats(qdisc)) {
- free_percpu(qdisc->cpu_bstats);
- free_percpu(qdisc->cpu_qstats);
- }
-
- kfree((char *) qdisc - qdisc->padded);
-}
-
-void qdisc_destroy(struct Qdisc *qdisc)
-{
- const struct Qdisc_ops *ops = qdisc->ops;
+ struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);
+ const struct Qdisc_ops *ops = qdisc->ops;
struct sk_buff *skb, *tmp;
- if (qdisc->flags & TCQ_F_BUILTIN ||
- !refcount_dec_and_test(&qdisc->refcnt))
- return;
-
-#ifdef CONFIG_NET_SCHED
- qdisc_hash_del(qdisc);
-
- qdisc_put_stab(rtnl_dereference(qdisc->stab));
-#endif
- gen_kill_estimator(&qdisc->rate_est);
+ /* At this point no outstanding references to this Qdisc should
+ * exist in the datapath so its safe to clean up skb lists, etc.
+ */
if (ops->reset)
ops->reset(qdisc);
if (ops->destroy)
@@ -916,7 +900,27 @@ void qdisc_destroy(struct Qdisc *qdisc)
kfree_skb_list(skb);
}
- qdisc_free(qdisc);
+ if (qdisc_is_percpu_stats(qdisc)) {
+ free_percpu(qdisc->cpu_bstats);
+ free_percpu(qdisc->cpu_qstats);
+ }
+
+ kfree((char *) qdisc - qdisc->padded);
+}
+
+void qdisc_destroy(struct Qdisc *qdisc)
+{
+ if (qdisc->flags & TCQ_F_BUILTIN ||
+ !refcount_dec_and_test(&qdisc->refcnt))
+ return;
+
+#ifdef CONFIG_NET_SCHED
+ qdisc_hash_del(qdisc);
+
+ qdisc_put_stab(rtnl_dereference(qdisc->stab));
+#endif
+ gen_kill_estimator(&qdisc->rate_est);
+ call_rcu(&qdisc->rcu_head, qdisc_rcu_free);
}
EXPORT_SYMBOL(qdisc_destroy);
^ permalink raw reply related
* Re: [PATCH v2] net: ibm: emac: support RGMII-[RX|TX]ID phymode
From: David Miller @ 2017-12-20 20:10 UTC (permalink / raw)
To: chunkeey; +Cc: netdev, andrew, christophe.jaillet
In-Reply-To: <20171220160201.17143-1-chunkeey@gmail.com>
From: Christian Lamparter <chunkeey@gmail.com>
Date: Wed, 20 Dec 2017 17:02:01 +0100
> diff --git a/drivers/net/ethernet/ibm/emac/emac.h b/drivers/net/ethernet/ibm/emac/emac.h
> index 5afcc27ceebb..8c6d2af7281b 100644
> --- a/drivers/net/ethernet/ibm/emac/emac.h
> +++ b/drivers/net/ethernet/ibm/emac/emac.h
> @@ -112,6 +112,9 @@ struct emac_regs {
> #define PHY_MODE_RMII PHY_INTERFACE_MODE_RMII
> #define PHY_MODE_SMII PHY_INTERFACE_MODE_SMII
> #define PHY_MODE_RGMII PHY_INTERFACE_MODE_RGMII
> +#define PHY_MODE_RGMII_ID PHY_INTERFACE_MODE_RGMII_ID
> +#define PHY_MODE_RGMII_RXID PHY_INTERFACE_MODE_RGMII_RXID
> +#define PHY_MODE_RGMII_TXID PHY_INTERFACE_MODE_RGMII_TXID
> #define PHY_MODE_TBI PHY_INTERFACE_MODE_TBI
> #define PHY_MODE_GMII PHY_INTERFACE_MODE_GMII
> #define PHY_MODE_RTBI PHY_INTERFACE_MODE_RTBI
Why does this driver do all of this CPP macro aliasing?
It's really cruddy because anyone who now reads this code:
> static inline int rgmii_valid_mode(int phy_mode)
> {
> - return phy_mode == PHY_MODE_GMII ||
> + return phy_interface_mode_is_rgmii(phy_mode) ||
> + phy_mode == PHY_MODE_GMII ||
> phy_mode == PHY_MODE_MII ||
> - phy_mode == PHY_MODE_RGMII ||
> phy_mode == PHY_MODE_TBI ||
> phy_mode == PHY_MODE_RTBI;
> }
will read this and say "Oh the function tests against these weird
PHY_MODE_* aliases, but the helper function phy_interface_mode_is_rgmii()
tests against PHY_INTERFACE_MODE_*, what is going on?"
I hate to do this to you, but please get rid of these confusing and
completely unnecessary PHY_MODE_* CPP aliases first, and just use the
proper PHY_INTERFACE_MODE_* values consistently.
Thank you.
^ permalink raw reply
* [PATCH net V3] net: reevalulate autoflowlabel setting after sysctl setting
From: Shaohua Li @ 2017-12-20 20:10 UTC (permalink / raw)
To: netdev, davem
Cc: Kernel Team, Shaohua Li, Martin KaFai Lau, Eric Dumazet,
Tom Herbert
From: Shaohua Li <shli@fb.com>
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
supposed to not include flowlabel. This is true for normal packet, but
not for reset packet.
The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
changed, so the sock will keep the old behavior in terms of auto
flowlabel. Reset packet is suffering from this problem, because reset
packet is sent from a special control socket, which is created at boot
time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
socket will always have its ipv6_pinfo.autoflowlabel set, even after
user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
have flowlabel. Normal sock created before sysctl setting suffers from
the same issue. We can't even turn off autoflowlabel unless we kill all
socks in the hosts.
To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
autoflowlabel setting from user, otherwise we always call
ip6_default_np_autolabel() which has the new settings of sysctl.
Note, this changes behavior a little bit. Before commit 42240901f7c4
(ipv6: Implement different admin modes for automatic flow labels), the
autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
existing connection will change autoflowlabel behavior. After that
commit, autoflowlabel behavior is sticky in the whole life of the sock.
With this patch, the behavior isn't sticky again.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
include/linux/ipv6.h | 3 ++-
net/ipv6/af_inet6.c | 1 -
net/ipv6/ip6_output.c | 12 ++++++++++--
net/ipv6/ipv6_sockglue.c | 1 +
4 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index cb18c62..8415bf1 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -273,7 +273,8 @@ struct ipv6_pinfo {
* 100: prefer care-of address
*/
dontfrag:1,
- autoflowlabel:1;
+ autoflowlabel:1,
+ autoflowlabel_set:1;
__u8 min_hopcount;
__u8 tclass;
__be32 rcv_flowinfo;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index c26f712..c9441ca 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -210,7 +210,6 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
np->mcast_hops = IPV6_DEFAULT_MCASTHOPS;
np->mc_loop = 1;
np->pmtudisc = IPV6_PMTUDISC_WANT;
- np->autoflowlabel = ip6_default_np_autolabel(net);
np->repflow = net->ipv6.sysctl.flowlabel_reflect;
sk->sk_ipv6only = net->ipv6.sysctl.bindv6only;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 5110a41..f7dd51c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -166,6 +166,14 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
!(IP6CB(skb)->flags & IP6SKB_REROUTED));
}
+static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+{
+ if (!np->autoflowlabel_set)
+ return ip6_default_np_autolabel(net);
+ else
+ return np->autoflowlabel;
+}
+
/*
* xmit an sk_buff (used by TCP, SCTP and DCCP)
* Note : socket lock is not held for SYNACK packets, but might be modified
@@ -230,7 +238,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
hlimit = ip6_dst_hoplimit(dst);
ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
- np->autoflowlabel, fl6));
+ ip6_autoflowlabel(net, np), fl6));
hdr->payload_len = htons(seg_len);
hdr->nexthdr = proto;
@@ -1626,7 +1634,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
ip6_flow_hdr(hdr, v6_cork->tclass,
ip6_make_flowlabel(net, skb, fl6->flowlabel,
- np->autoflowlabel, fl6));
+ ip6_autoflowlabel(net, np), fl6));
hdr->hop_limit = v6_cork->hop_limit;
hdr->nexthdr = proto;
hdr->saddr = fl6->saddr;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index b9404fe..2d4680e 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -886,6 +886,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
break;
case IPV6_AUTOFLOWLABEL:
np->autoflowlabel = valbool;
+ np->autoflowlabel_set = 1;
retv = 0;
break;
case IPV6_RECVFRAGSIZE:
--
2.9.5
^ permalink raw reply related
* Re: [PATCH net,stable] s390/qeth: fix error handling in checksum cmd callback
From: David Miller @ 2017-12-20 20:12 UTC (permalink / raw)
To: jwi; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20171220170718.57715-1-jwi@linux.vnet.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Date: Wed, 20 Dec 2017 18:07:18 +0100
> Make sure to check both return code fields before processing the
> response. Otherwise we risk operating on invalid data.
>
> Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload")
> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Applied and queued up for -stable, thanks Julian.
^ permalink raw reply
* Re: [PATCH net v3] ipv4: Fix use-after-free when flushing FIB tables
From: David Miller @ 2017-12-20 20:13 UTC (permalink / raw)
To: idosch; +Cc: netdev, alexander.h.duyck, dsahern, fengguang.wu, mlxsw
In-Reply-To: <20171220173419.11259-1-idosch@mellanox.com>
From: Ido Schimmel <idosch@mellanox.com>
Date: Wed, 20 Dec 2017 19:34:19 +0200
> Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
> local table uses the same trie allocated for the main table when custom
> rules are not in use.
>
> When a net namespace is dismantled, the main table is flushed and freed
> (via an RCU callback) before the local table. In case the callback is
> invoked before the local table is iterated, a use-after-free can occur.
>
> Fix this by iterating over the FIB tables in reverse order, so that the
> main table is always freed after the local table.
>
> v3: Reworded comment according to Alex's suggestion.
> v2: Add a comment to make the fix more explicit per Dave's and Alex's
> feedback.
>
> Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: RCU callback crashes
From: John Fastabend @ 2017-12-20 20:14 UTC (permalink / raw)
To: Jiri Pirko, Cong Wang; +Cc: Jakub Kicinski, netdev@vger.kernel.org
In-Reply-To: <20171220195922.GB1760@nanopsycho>
On 12/20/2017 11:59 AM, Jiri Pirko wrote:
> Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>> On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>> Ah, no object debug but KASAN on produces this:
>>>
>>
>>
>> I bet it is an ingress qdisc which is being freed?
>>
>>
>>
>>> [ 39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>> [ 39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>> [ 39.283524]
>>> [ 39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>> [ 39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>> [ 39.303969] Call Trace:
>>> [ 39.306769] <IRQ>
>>> [ 39.309088] dump_stack+0xa6/0x118
>>> [ 39.312957] ? _atomic_dec_and_lock+0xe8/0xe8
>>> [ 39.317895] ? cpu_needs_another_gp+0x246/0x2b0
>>> [ 39.323030] print_address_description+0x6a/0x270
>>> [ 39.328380] ? cpu_needs_another_gp+0x246/0x2b0
>>> [ 39.333510] kasan_report+0x23f/0x350
>>> [ 39.337672] cpu_needs_another_gp+0x246/0x2b0
>>> ...
>>> [ 39.383026] rcu_process_callbacks+0x1a0/0x620
>>> ...
>>
>>
>> This is confusing.
>>
>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>> waiting for rcu readers?
>
> miniqp is inside qdisc private data:
> struct ingress_sched_data {
> struct tcf_block *block;
> struct tcf_block_ext_info block_info;
> struct mini_Qdisc_pair miniqp;
> };
>
> That is freed along with the qdisc itself in:
> qdisc_destroy->qdisc_free
>
> Before miniq, tp was checked in the rcu reader path. In case it was not
> null, q was processed. In slow patch, tp is freed after rcu grace period:
> tcf_proto_destroy->kfree_rcu
>
> I assumed that since q is processed in rcu reader, it is also freed after
> a grace period, but now looking at the code I don't see it happening
> like that.
>
> So I think that change to miniq made the existing race window
> a bit wider and easier to hit.
>
> I believe that calling kfree_rcu by call_rcu should resolve this.
>
Hi,
Just sent a patch to complete qdisc_destroy from rcu callback. This
is needed to resolve a race with the lockless qdisc patches.
But I guess it should fix the miniq issue as well?
Thanks,
John
^ permalink raw reply
* Re: RCU callback crashes
From: Jiri Pirko @ 2017-12-20 20:15 UTC (permalink / raw)
To: Cong Wang; +Cc: Jakub Kicinski, netdev@vger.kernel.org, john.fastabend
In-Reply-To: <20171220195922.GB1760@nanopsycho>
Wed, Dec 20, 2017 at 08:59:22PM CET, jiri@resnulli.us wrote:
>Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>>On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>> Ah, no object debug but KASAN on produces this:
>>>
>>
>>
>>I bet it is an ingress qdisc which is being freed?
>>
>>
>>
>>> [ 39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>> [ 39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>> [ 39.283524]
>>> [ 39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>> [ 39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>> [ 39.303969] Call Trace:
>>> [ 39.306769] <IRQ>
>>> [ 39.309088] dump_stack+0xa6/0x118
>>> [ 39.312957] ? _atomic_dec_and_lock+0xe8/0xe8
>>> [ 39.317895] ? cpu_needs_another_gp+0x246/0x2b0
>>> [ 39.323030] print_address_description+0x6a/0x270
>>> [ 39.328380] ? cpu_needs_another_gp+0x246/0x2b0
>>> [ 39.333510] kasan_report+0x23f/0x350
>>> [ 39.337672] cpu_needs_another_gp+0x246/0x2b0
>>> ...
>>> [ 39.383026] rcu_process_callbacks+0x1a0/0x620
>>> ...
>>
>>
>>This is confusing.
>>
>>I guess it is q->miniqp which is freed in qdisc_graft() without properly
>>waiting for rcu readers?
>
>miniqp is inside qdisc private data:
>struct ingress_sched_data {
> struct tcf_block *block;
> struct tcf_block_ext_info block_info;
> struct mini_Qdisc_pair miniqp;
>};
>
>That is freed along with the qdisc itself in:
>qdisc_destroy->qdisc_free
>
>Before miniq, tp was checked in the rcu reader path. In case it was not
>null, q was processed. In slow patch, tp is freed after rcu grace period:
>tcf_proto_destroy->kfree_rcu
>
>I assumed that since q is processed in rcu reader, it is also freed after
>a grace period, but now looking at the code I don't see it happening
>like that.
Aha! It was removed by:
commit c5ad119fb6c09b0297446be05bd66602fa564758
Author: John Fastabend <john.fastabend@gmail.com>
Date: Thu Dec 7 09:58:19 2017 -0800
net: sched: pfifo_fast use skb_array
>
>So I think that change to miniq made the existing race window
>a bit wider and easier to hit.
>
>I believe that calling kfree_rcu by call_rcu should resolve this.
^ permalink raw reply
* Re: [PATCH net v3] openvswitch: Fix pop_vlan action for double tagged frames
From: Jiri Benc @ 2017-12-20 20:17 UTC (permalink / raw)
To: Eric Garver; +Cc: netdev, ovs-dev
In-Reply-To: <20171220200922.29415-1-e@erig.me>
On Wed, 20 Dec 2017 15:09:22 -0500, Eric Garver wrote:
> skb_vlan_pop() expects skb->protocol to be a valid TPID for double
> tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
> shift the true ethertype into position for us.
>
> Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
> Signed-off-by: Eric Garver <e@erig.me>
Thanks!
Reviewed-by: Jiri Benc <jbenc@redhat.com>
^ permalink raw reply
* Re: [PATCH net v3] openvswitch: Fix pop_vlan action for double tagged frames
From: Jiri Benc @ 2017-12-20 20:17 UTC (permalink / raw)
To: Eric Garver; +Cc: netdev, ovs-dev
In-Reply-To: <20171220200922.29415-1-e@erig.me>
On Wed, 20 Dec 2017 15:09:22 -0500, Eric Garver wrote:
> skb_vlan_pop() expects skb->protocol to be a valid TPID for double
> tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
> shift the true ethertype into position for us.
>
> Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
> Signed-off-by: Eric Garver <e@erig.me>
Thanks!
Reviewed-by: Jiri Benc <jbenc@redhat.com>
^ permalink raw reply
* Re: Linux ECN Handling
From: Neal Cardwell @ 2017-12-20 20:17 UTC (permalink / raw)
To: Steve Ibanez
Cc: Eric Dumazet, Yuchung Cheng, Daniel Borkmann, Netdev,
Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo
In-Reply-To: <CACJspmKn=WhhK+-0XkcuYEu5D1K0QVpAbN=+8Cg9x3k0-D0cUA@mail.gmail.com>
On Wed, Dec 20, 2017 at 2:20 PM, Steve Ibanez <sibanez@stanford.edu> wrote:
>
> Hi Neal,
>
> I added in some more printk statements and it does indeed look like
> all of these calls you listed are being invoked successfully. I guess
> this isn't too surprising given what the inet_csk_schedule_ack() and
> inet_csk_ack_scheduled() functions are doing:
>
> static inline void inet_csk_schedule_ack(struct sock *sk)
> {
> inet_csk(sk)->icsk_ack.pending |= ICSK_ACK_SCHED;
> }
>
> static inline int inet_csk_ack_scheduled(const struct sock *sk)
> {
> return inet_csk(sk)->icsk_ack.pending & ICSK_ACK_SCHED;
> }
>
> So through the code path that you listed, the inet_csk_schedule_ack()
> function sets the ICSK_ACK_SCHED bit and then the tcp_ack_snd_check()
> function just checks that the ICSK_ACK_SCHED bit is indeed set.
>inet_csk_schedule_ack
> Do you know how I can verify that setting the ICSK_ACK_SCHED bit
> actually results in an ACK being sent?
Hmm. I don't think in this case we can verify that setting the
ICSK_ACK_SCHED bit actually results in an ACK being sent. Because
AFAICT in this case it seems like an ACK is not sent. :-) This is
based on both the tcpdumps on Dec 5 and your detective work yesterday
("The tcp_rcv_established() function calls tcp_ack_snd_check() at the
end of step5 and then the return statement indicated below is invoked,
which prevents the __tcp_ack_snd_check() function from running.")
So AFAICT the puzzle is: how is the icsk_ack.pending ICSK_ACK_SCHED
bit being cleared between the inet_csk_schedule_ack() call and the
tcp_ack_snd_check() call, without (apparently) an actual ACK being
sent on the wire?
AFAICT the ICSK_ACK_SCHED bit is not supposed to be cleared unless we
get to this sequence:
tcp_transmit_skb()
if (likely(tcb->tcp_flags & TCPHDR_ACK))
tcp_event_ack_sent(sk, tcp_skb_pcount(skb));
-> inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
icsk->icsk_ack.blocked = icsk->icsk_ack.pending = 0;
I don't have a theory that fits all of those data points, unless this
is a bi-directional transfer (is it?) and between the
inet_csk_schedule_ack() call and the tcp_ack_snd_check() call the TCP
connection sends a data packet (in tcp_data_snd_check()) and then it
is dropped for some reason before the packet make it to the tcpdump
sniffing point. Perhaps because of a qdisc limit or something?
I guess a possible next step would be, while processing an incoming
skb with the cwr bit set, the code could set a new debugging field in
the tcp_sock (tp->processing_cwr), and then you could check this field
in tcp_transmit_skb() and printk if (1) there is an attempted
queue_xmit() cal and (2) if the queue_xmit() fails (the err > 0 case).
That's a long shot, but the only idea I have at this point.
thanks,
neal
^ permalink raw reply
* Re: RCU callback crashes
From: Jakub Kicinski @ 2017-12-20 20:17 UTC (permalink / raw)
To: John Fastabend; +Cc: Jiri Pirko, netdev@vger.kernel.org, Cong Wang
In-Reply-To: <a6840ce4-d4a8-19c5-f8a5-3dfc00aa7e4b@gmail.com>
On Wed, 20 Dec 2017 10:04:17 -0800, John Fastabend wrote:
> On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
> > On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:
> >>>> I get this:
> >>>
> >>> Could you try to run it with kasan on?
> >>
> >> I didn't manage to reproduce it with KASAN on so far :( Even enabling
> >> object debugging to get the second splat in my email (which is more
> >> useful) actually makes the crash go away, I only see the warning...
> >
> > Ah, no object debug but KASAN on produces this:
> >
>
> @Jakub, This is with mq and pfifo_fast I guess?
Sorry for falling silent, I was convinced I saw this before your code
went in, it just takes a lot longer to trigger... I've been running
net-next from Dec 1st now for an hour now and it didn't crash :/
Trying KASAN now..
^ permalink raw reply
* Re: RCU callback crashes
From: Jiri Pirko @ 2017-12-20 20:18 UTC (permalink / raw)
To: John Fastabend; +Cc: Cong Wang, Jakub Kicinski, netdev@vger.kernel.org
In-Reply-To: <d82f1fdf-5d7b-db1b-6efb-85ff957f8a38@gmail.com>
Wed, Dec 20, 2017 at 09:14:49PM CET, john.fastabend@gmail.com wrote:
>On 12/20/2017 11:59 AM, Jiri Pirko wrote:
>> Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>>> On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> Ah, no object debug but KASAN on produces this:
>>>>
>>>
>>>
>>> I bet it is an ingress qdisc which is being freed?
>>>
>>>
>>>
>>>> [ 39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>>> [ 39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>>> [ 39.283524]
>>>> [ 39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>>> [ 39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>>> [ 39.303969] Call Trace:
>>>> [ 39.306769] <IRQ>
>>>> [ 39.309088] dump_stack+0xa6/0x118
>>>> [ 39.312957] ? _atomic_dec_and_lock+0xe8/0xe8
>>>> [ 39.317895] ? cpu_needs_another_gp+0x246/0x2b0
>>>> [ 39.323030] print_address_description+0x6a/0x270
>>>> [ 39.328380] ? cpu_needs_another_gp+0x246/0x2b0
>>>> [ 39.333510] kasan_report+0x23f/0x350
>>>> [ 39.337672] cpu_needs_another_gp+0x246/0x2b0
>>>> ...
>>>> [ 39.383026] rcu_process_callbacks+0x1a0/0x620
>>>> ...
>>>
>>>
>>> This is confusing.
>>>
>>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>>> waiting for rcu readers?
>>
>> miniqp is inside qdisc private data:
>> struct ingress_sched_data {
>> struct tcf_block *block;
>> struct tcf_block_ext_info block_info;
>> struct mini_Qdisc_pair miniqp;
>> };
>>
>> That is freed along with the qdisc itself in:
>> qdisc_destroy->qdisc_free
>>
>> Before miniq, tp was checked in the rcu reader path. In case it was not
>> null, q was processed. In slow patch, tp is freed after rcu grace period:
>> tcf_proto_destroy->kfree_rcu
>>
>> I assumed that since q is processed in rcu reader, it is also freed after
>> a grace period, but now looking at the code I don't see it happening
>> like that.
>>
>> So I think that change to miniq made the existing race window
>> a bit wider and easier to hit.
>>
>> I believe that calling kfree_rcu by call_rcu should resolve this.
>>
>
>Hi,
>
>Just sent a patch to complete qdisc_destroy from rcu callback. This
>is needed to resolve a race with the lockless qdisc patches.
>
>But I guess it should fix the miniq issue as well?
Yes, it should.
^ permalink raw reply
* Re: RCU callback crashes
From: John Fastabend @ 2017-12-20 20:18 UTC (permalink / raw)
To: Jiri Pirko, Cong Wang; +Cc: Jakub Kicinski, netdev@vger.kernel.org
In-Reply-To: <20171220201555.GF1760@nanopsycho>
On 12/20/2017 12:15 PM, Jiri Pirko wrote:
> Wed, Dec 20, 2017 at 08:59:22PM CET, jiri@resnulli.us wrote:
>> Wed, Dec 20, 2017 at 07:17:50PM CET, xiyou.wangcong@gmail.com wrote:
>>> On Tue, Dec 19, 2017 at 10:34 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> Ah, no object debug but KASAN on produces this:
>>>>
>>>
>>>
>>> I bet it is an ingress qdisc which is being freed?
>>>
>>>
>>>
>>>> [ 39.268209] BUG: KASAN: use-after-free in cpu_needs_another_gp+0x246/0x2b0
>>>> [ 39.275965] Read of size 8 at addr ffff8803aa64f138 by task swapper/13/0
>>>> [ 39.283524]
>>>> [ 39.285256] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.15.0-rc3-perf-00955-g1d0b01347dd5-dirty #8
>>>> [ 39.295535] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>>>> [ 39.303969] Call Trace:
>>>> [ 39.306769] <IRQ>
>>>> [ 39.309088] dump_stack+0xa6/0x118
>>>> [ 39.312957] ? _atomic_dec_and_lock+0xe8/0xe8
>>>> [ 39.317895] ? cpu_needs_another_gp+0x246/0x2b0
>>>> [ 39.323030] print_address_description+0x6a/0x270
>>>> [ 39.328380] ? cpu_needs_another_gp+0x246/0x2b0
>>>> [ 39.333510] kasan_report+0x23f/0x350
>>>> [ 39.337672] cpu_needs_another_gp+0x246/0x2b0
>>>> ...
>>>> [ 39.383026] rcu_process_callbacks+0x1a0/0x620
>>>> ...
>>>
>>>
>>> This is confusing.
>>>
>>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>>> waiting for rcu readers?
>>
>> miniqp is inside qdisc private data:
>> struct ingress_sched_data {
>> struct tcf_block *block;
>> struct tcf_block_ext_info block_info;
>> struct mini_Qdisc_pair miniqp;
>> };
>>
>> That is freed along with the qdisc itself in:
>> qdisc_destroy->qdisc_free
>>
>> Before miniq, tp was checked in the rcu reader path. In case it was not
>> null, q was processed. In slow patch, tp is freed after rcu grace period:
>> tcf_proto_destroy->kfree_rcu
>>
>> I assumed that since q is processed in rcu reader, it is also freed after
>> a grace period, but now looking at the code I don't see it happening
>> like that.
>
> Aha! It was removed by:
> commit c5ad119fb6c09b0297446be05bd66602fa564758
> Author: John Fastabend <john.fastabend@gmail.com>
> Date: Thu Dec 7 09:58:19 2017 -0800
>
> net: sched: pfifo_fast use skb_array
>
Even farther back right,
commit 752fbcc33405d6f8249465e4b2c4e420091bb825
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue Sep 19 13:15:42 2017 -0700
net_sched: no need to free qdisc in RCU callback
>
>>
>> So I think that change to miniq made the existing race window
>> a bit wider and easier to hit.
>>
>> I believe that calling kfree_rcu by call_rcu should resolve this.
^ permalink raw reply
* [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree
From: Roman Gushchin @ 2017-12-20 20:19 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, kernel-team, scientist, Roman Gushchin,
Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann
Bpftool determines it's own version based on the kernel
version, which is picked from the linux/version.h header.
It's strange to use the version of the installed kernel
headers, and makes much more sense to use the version
of the actual source tree, where bpftool sources are.
This patch adds $(srctree)/usr/include to the list
of include files, which causes bpftool to use the version
from the source tree.
Example:
before:
$ bpftool version
bpftool v4.14.6
after:
$ bpftool version
bpftool v4.15.0
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
tools/bpf/bpftool/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 9c089cfa5f3f..6864d416c49e 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -37,7 +37,9 @@ CC = gcc
CFLAGS += -O2
CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
-CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi
+CFLAGS += -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
+CFLAGS += -I$(srctree)/kernel/bpf/ -I$(srctree)/usr/include
LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
INSTALL ?= install
--
2.14.3
^ permalink raw reply related
* Re: RCU callback crashes
From: John Fastabend @ 2017-12-20 20:23 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Jiri Pirko, netdev@vger.kernel.org, Cong Wang
In-Reply-To: <20171220121705.18401098@cakuba.netronome.com>
On 12/20/2017 12:17 PM, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 10:04:17 -0800, John Fastabend wrote:
>> On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
>>> On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:
>>>>>> I get this:
>>>>>
>>>>> Could you try to run it with kasan on?
>>>>
>>>> I didn't manage to reproduce it with KASAN on so far :( Even enabling
>>>> object debugging to get the second splat in my email (which is more
>>>> useful) actually makes the crash go away, I only see the warning...
>>>
>>> Ah, no object debug but KASAN on produces this:
>>>
>>
>> @Jakub, This is with mq and pfifo_fast I guess?
>
> Sorry for falling silent, I was convinced I saw this before your code
> went in, it just takes a lot longer to trigger... I've been running
> net-next from Dec 1st now for an hour now and it didn't crash :/
>
> Trying KASAN now..
>
Its possible my patches just made it worse because the kfree on the skb
lists was exposed as well.
I'm trying to see how removing that rcu grace period was safe in the
first place. The datapath is using rcu_read critical section to protect
the qdisc but the control path (a) doesn't use rcu grace period and (b)
doesn't use the qidisc lock. Going to go get a coffee and I'll think
about it a bit more. Any ideas Cong?
Perhaps we need a patch for net (mine was against net-next) and stable
as well probably.
Thanks,
John
^ permalink raw reply
* Re: [PATCH net-next 00/15] s390/net: updates 2017-12-20
From: David Miller @ 2017-12-20 20:24 UTC (permalink / raw)
To: jwi; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20171220191109.90487-1-jwi@linux.vnet.ibm.com>
From: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Date: Wed, 20 Dec 2017 20:10:54 +0100
> Please apply the following patch series for 4.16.
> Nothing too exciting, mostly just beating the qeth L3 code into shape.
Series applied, thanks.
> Thanks & happy holidays,
> Julian
Same to you!
^ permalink raw reply
* Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree
From: Yonghong Song @ 2017-12-20 20:26 UTC (permalink / raw)
To: Roman Gushchin, netdev
Cc: linux-kernel, kernel-team, scientist, Jakub Kicinski,
Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20171220201943.24440-1-guro@fb.com>
On 12/20/17 12:19 PM, Roman Gushchin wrote:
> Bpftool determines it's own version based on the kernel
> version, which is picked from the linux/version.h header.
>
> It's strange to use the version of the installed kernel
> headers, and makes much more sense to use the version
> of the actual source tree, where bpftool sources are.
>
> This patch adds $(srctree)/usr/include to the list
> of include files, which causes bpftool to use the version
> from the source tree.
>
> Example:
> before:
>
> $ bpftool version
> bpftool v4.14.6
>
> after:
> $ bpftool version
> bpftool v4.15.0
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
> tools/bpf/bpftool/Makefile | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 9c089cfa5f3f..6864d416c49e 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -37,7 +37,9 @@ CC = gcc
>
> CFLAGS += -O2
> CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> -CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
> +CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi
> +CFLAGS += -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
> +CFLAGS += -I$(srctree)/kernel/bpf/ -I$(srctree)/usr/include
-I$(srctree)/usr/include may not work if build directory is not the same
as the source directory. You probably should use
-I$(objtree)/usr/include?
> LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
>
> INSTALL ?= install
>
^ permalink raw reply
* Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree
From: Jakub Kicinski @ 2017-12-20 20:29 UTC (permalink / raw)
To: Roman Gushchin
Cc: netdev, linux-kernel, kernel-team, scientist, Alexei Starovoitov,
Daniel Borkmann
In-Reply-To: <20171220201943.24440-1-guro@fb.com>
On Wed, 20 Dec 2017 20:19:43 +0000, Roman Gushchin wrote:
> Bpftool determines it's own version based on the kernel
> version, which is picked from the linux/version.h header.
>
> It's strange to use the version of the installed kernel
> headers, and makes much more sense to use the version
> of the actual source tree, where bpftool sources are.
>
> This patch adds $(srctree)/usr/include to the list
> of include files, which causes bpftool to use the version
> from the source tree.
>
> Example:
> before:
>
> $ bpftool version
> bpftool v4.14.6
>
> after:
> $ bpftool version
> bpftool v4.15.0
Thanks for the patch, this would indeed use some improvement.
How about we just run make to get the version like liblockdep does?
LIBLOCKDEP_VERSION=$(shell make --no-print-directory -sC ../../.. kernelversion)
probably s@../../..@$(srctree)@
$(srctree)/usr/include is not going to be there for out-of-source builds.
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
> tools/bpf/bpftool/Makefile | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 9c089cfa5f3f..6864d416c49e 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -37,7 +37,9 @@ CC = gcc
>
> CFLAGS += -O2
> CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> -CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
> +CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi
> +CFLAGS += -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
> +CFLAGS += -I$(srctree)/kernel/bpf/ -I$(srctree)/usr/include
> LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
>
> INSTALL ?= install
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox