* [PATCH v2] net: sched: fix memory leak in tcindex_set_parms
@ 2022-11-13 17:05 Hawkins Jiawei
2022-11-15 11:36 ` Paolo Abeni
2022-11-15 17:02 ` Jakub Kicinski
0 siblings, 2 replies; 9+ messages in thread
From: Hawkins Jiawei @ 2022-11-13 17:05 UTC (permalink / raw)
To: yin31149, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: 18801353760, syzbot+232ebdbd36706c965ebf, syzkaller-bugs,
Cong Wang, netdev, linux-kernel
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
[<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
[<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
[<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
[<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
[<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
[<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
[<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
[<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
[<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
[<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
[<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
[<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
[<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
[<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
[<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
[<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
[<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
[<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
[<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
[<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
[<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================
Kernel uses tcindex_change() to change an existing
traffic-control-indices filter properties. During the
process of changing, kernel clears the old
traffic-control-indices filter result, and updates it
by RCU assigning new traffic-control-indices data.
Yet the problem is that, kernel clears the old
traffic-control-indices filter result, without destroying
its tcf_exts structure, which triggers the above
memory leak.
This patch solves it by using tcf_exts_destroy() to
destroy the tcf_exts structure in old
traffic-control-indices filter result, after the
RCU grace period.
[Thanks to the suggestion from Jakub Kicinski and Cong Wang]
Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
---
v2:
- remove all 'will' in commit message according to Jakub Kicinski
- add Fixes tag according to Jakub Kicinski
- remove all ifdefs according to Jakub Kicinski and Cong Wang
- add synchronize_rcu() before destorying old_e according to
Cong Wang
v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/
net/sched/cls_tcindex.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 1c9eeb98d826..d2fac9559d3e 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
struct tcf_result cr = {};
int err, balloc = 0;
struct tcf_exts e;
+ struct tcf_exts old_e = {};
err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
@@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
if (old_r && old_r != r) {
+ old_e = old_r->exts;
err = tcindex_filter_result_init(old_r, cp, net);
if (err < 0) {
kfree(f);
@@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
tcf_exts_destroy(&new_filter_result.exts);
}
+ /* Note: old_e should be destroyed after the RCU grace period,
+ * to avoid possible use-after-free by concurrent readers.
+ */
+ synchronize_rcu();
+ tcf_exts_destroy(&old_e);
+
if (oldp)
tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
return 0;
--
2.25.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-13 17:05 [PATCH v2] net: sched: fix memory leak in tcindex_set_parms Hawkins Jiawei @ 2022-11-15 11:36 ` Paolo Abeni 2022-11-15 11:39 ` Dmitry Vyukov 2022-11-15 17:02 ` Jakub Kicinski 1 sibling, 1 reply; 9+ messages in thread From: Paolo Abeni @ 2022-11-15 11:36 UTC (permalink / raw) To: Hawkins Jiawei, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski Cc: 18801353760, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, Cong Wang, netdev, linux-kernel On Mon, 2022-11-14 at 01:05 +0800, Hawkins Jiawei wrote: > Syzkaller reports a memory leak as follows: > ==================================== > BUG: memory leak > unreferenced object 0xffff88810c287f00 (size 256): > comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s) > hex dump (first 32 bytes): > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046 > [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline] > [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline] > [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline] > [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline] > [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342 > [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553 > [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147 > [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082 > [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540 > [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] > [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 > [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 > [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline] > [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734 > [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482 > [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536 > [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622 > [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline] > [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline] > [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648 > [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] > [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 > [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd > ==================================== > > Kernel uses tcindex_change() to change an existing > traffic-control-indices filter properties. During the > process of changing, kernel clears the old > traffic-control-indices filter result, and updates it > by RCU assigning new traffic-control-indices data. > > Yet the problem is that, kernel clears the old > traffic-control-indices filter result, without destroying > its tcf_exts structure, which triggers the above > memory leak. > > This patch solves it by using tcf_exts_destroy() to > destroy the tcf_exts structure in old > traffic-control-indices filter result, after the > RCU grace period. > > [Thanks to the suggestion from Jakub Kicinski and Cong Wang] > > Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") > Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/ > Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com > Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com > Cc: Cong Wang <cong.wang@bytedance.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> > --- > v2: > - remove all 'will' in commit message according to Jakub Kicinski > - add Fixes tag according to Jakub Kicinski > - remove all ifdefs according to Jakub Kicinski and Cong Wang > - add synchronize_rcu() before destorying old_e according to > Cong Wang > > v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/ > net/sched/cls_tcindex.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c > index 1c9eeb98d826..d2fac9559d3e 100644 > --- a/net/sched/cls_tcindex.c > +++ b/net/sched/cls_tcindex.c > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > struct tcf_result cr = {}; > int err, balloc = 0; > struct tcf_exts e; > + struct tcf_exts old_e = {}; > > err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); > if (err < 0) > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > } > > if (old_r && old_r != r) { > + old_e = old_r->exts; > err = tcindex_filter_result_init(old_r, cp, net); > if (err < 0) { > kfree(f); > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > tcf_exts_destroy(&new_filter_result.exts); > } > > + /* Note: old_e should be destroyed after the RCU grace period, > + * to avoid possible use-after-free by concurrent readers. > + */ > + synchronize_rcu(); this could make tc reconfiguration potentially very slow. I'm wondering if we can delegate the tcf_exts_destroy() to some workqueue? Thanks! Paolo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-15 11:36 ` Paolo Abeni @ 2022-11-15 11:39 ` Dmitry Vyukov 0 siblings, 0 replies; 9+ messages in thread From: Dmitry Vyukov @ 2022-11-15 11:39 UTC (permalink / raw) To: Paolo Abeni Cc: Hawkins Jiawei, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski, 18801353760, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, Cong Wang, netdev, linux-kernel On Tue, 15 Nov 2022 at 12:36, Paolo Abeni <pabeni@redhat.com> wrote: > > On Mon, 2022-11-14 at 01:05 +0800, Hawkins Jiawei wrote: > > Syzkaller reports a memory leak as follows: > > ==================================== > > BUG: memory leak > > unreferenced object 0xffff88810c287f00 (size 256): > > comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s) > > hex dump (first 32 bytes): > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > > backtrace: > > [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046 > > [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline] > > [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline] > > [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline] > > [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline] > > [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342 > > [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553 > > [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147 > > [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082 > > [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540 > > [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] > > [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 > > [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 > > [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline] > > [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734 > > [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482 > > [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536 > > [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622 > > [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline] > > [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline] > > [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648 > > [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 > > [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd > > ==================================== > > > > Kernel uses tcindex_change() to change an existing > > traffic-control-indices filter properties. During the > > process of changing, kernel clears the old > > traffic-control-indices filter result, and updates it > > by RCU assigning new traffic-control-indices data. > > > > Yet the problem is that, kernel clears the old > > traffic-control-indices filter result, without destroying > > its tcf_exts structure, which triggers the above > > memory leak. > > > > This patch solves it by using tcf_exts_destroy() to > > destroy the tcf_exts structure in old > > traffic-control-indices filter result, after the > > RCU grace period. > > > > [Thanks to the suggestion from Jakub Kicinski and Cong Wang] > > > > Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") > > Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/ > > Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com > > Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com > > Cc: Cong Wang <cong.wang@bytedance.com> > > Cc: Jakub Kicinski <kuba@kernel.org> > > Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> > > --- > > v2: > > - remove all 'will' in commit message according to Jakub Kicinski > > - add Fixes tag according to Jakub Kicinski > > - remove all ifdefs according to Jakub Kicinski and Cong Wang > > - add synchronize_rcu() before destorying old_e according to > > Cong Wang > > > > v1: https://lore.kernel.org/all/20221031060835.11722-1-yin31149@gmail.com/ > > net/sched/cls_tcindex.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c > > index 1c9eeb98d826..d2fac9559d3e 100644 > > --- a/net/sched/cls_tcindex.c > > +++ b/net/sched/cls_tcindex.c > > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > > struct tcf_result cr = {}; > > int err, balloc = 0; > > struct tcf_exts e; > > + struct tcf_exts old_e = {}; > > > > err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); > > if (err < 0) > > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > > } > > > > if (old_r && old_r != r) { > > + old_e = old_r->exts; > > err = tcindex_filter_result_init(old_r, cp, net); > > if (err < 0) { > > kfree(f); > > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > > tcf_exts_destroy(&new_filter_result.exts); > > } > > > > + /* Note: old_e should be destroyed after the RCU grace period, > > + * to avoid possible use-after-free by concurrent readers. > > + */ > > + synchronize_rcu(); > > this could make tc reconfiguration potentially very slow. I'm wondering > if we can delegate the tcf_exts_destroy() to some workqueue? call_rcu? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-13 17:05 [PATCH v2] net: sched: fix memory leak in tcindex_set_parms Hawkins Jiawei 2022-11-15 11:36 ` Paolo Abeni @ 2022-11-15 17:02 ` Jakub Kicinski 2022-11-15 18:57 ` Paolo Abeni 2022-11-16 7:35 ` Hawkins Jiawei 1 sibling, 2 replies; 9+ messages in thread From: Jakub Kicinski @ 2022-11-15 17:02 UTC (permalink / raw) To: Hawkins Jiawei Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, 18801353760, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, Cong Wang, netdev, linux-kernel On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote: > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c > index 1c9eeb98d826..d2fac9559d3e 100644 > --- a/net/sched/cls_tcindex.c > +++ b/net/sched/cls_tcindex.c > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > struct tcf_result cr = {}; > int err, balloc = 0; > struct tcf_exts e; > + struct tcf_exts old_e = {}; This is not a valid way of initializing a structure. tcf_exts_init() is supposed to be called. If we add a list member to that structure this code will break, again. > err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); > if (err < 0) > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > } > > if (old_r && old_r != r) { > + old_e = old_r->exts; > err = tcindex_filter_result_init(old_r, cp, net); > if (err < 0) { > kfree(f); > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > tcf_exts_destroy(&new_filter_result.exts); > } > > + /* Note: old_e should be destroyed after the RCU grace period, > + * to avoid possible use-after-free by concurrent readers. > + */ > + synchronize_rcu(); > + tcf_exts_destroy(&old_e); I don't think this dance is required, @cp is a copy of the original data, and the original (@p) is destroyed in a safe manner below. > if (oldp) > tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work); > return 0; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-15 17:02 ` Jakub Kicinski @ 2022-11-15 18:57 ` Paolo Abeni 2022-11-16 2:44 ` Jakub Kicinski 2022-11-16 7:35 ` Hawkins Jiawei 1 sibling, 1 reply; 9+ messages in thread From: Paolo Abeni @ 2022-11-15 18:57 UTC (permalink / raw) To: Jakub Kicinski, Hawkins Jiawei Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, 18801353760, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, Cong Wang, netdev, linux-kernel On Tue, 2022-11-15 at 09:02 -0800, Jakub Kicinski wrote: > On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote: > > > @@ -479,6 +480,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > > } > > > > if (old_r && old_r != r) { > > + old_e = old_r->exts; > > err = tcindex_filter_result_init(old_r, cp, net); > > if (err < 0) { > > kfree(f); > > @@ -510,6 +512,12 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > > tcf_exts_destroy(&new_filter_result.exts); > > } > > > > + /* Note: old_e should be destroyed after the RCU grace period, > > + * to avoid possible use-after-free by concurrent readers. > > + */ > > + synchronize_rcu(); > > + tcf_exts_destroy(&old_e); > > I don't think this dance is required, @cp is a copy of the original > data, and the original (@p) is destroyed in a safe manner below. This code confuses me more than a bit, and I don't follow ?!? it looks like that at this point: * the data path could access 'old_r->exts' contents via 'p' just before the previous 'tcindex_filter_result_init(old_r, cp, net);' but still potentially within the same RCU grace period * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old exts from 'p' so that will not be freed by later tcindex_partial_destroy_work() Overall it looks to me that we need some somewhat wait for the RCU grace period, Somewhat side question: it looks like that the 'perfect hashing' usage is the root cause of the issue addressed here, and very likely is afflicted by other problems, e.g. the data curruption in 'err = tcindex_filter_result_init(old_r, cp, net);'. AFAICS 'perfect hashing' usage is a sort of optimization that the user- space may trigger with some combination of the tcindex arguments. I'm wondering if we could drop all perfect hashing related code? Paolo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-15 18:57 ` Paolo Abeni @ 2022-11-16 2:44 ` Jakub Kicinski 2022-11-16 12:10 ` Hawkins Jiawei 0 siblings, 1 reply; 9+ messages in thread From: Jakub Kicinski @ 2022-11-16 2:44 UTC (permalink / raw) To: Paolo Abeni Cc: Hawkins Jiawei, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet, 18801353760, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, Cong Wang, netdev, linux-kernel On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote: > This code confuses me more than a bit, and I don't follow ?!? It's very confusing :S For starters I don't know when r != old_r. I mean now it triggers randomly after the RCU-ification, but in the original code when it was just a memset(). When would old_r ever not be null and yet point to a different entry? > it looks like that at this point: > > * the data path could access 'old_r->exts' contents via 'p' just before > the previous 'tcindex_filter_result_init(old_r, cp, net);' but still > potentially within the same RCU grace period > > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old > exts from 'p' so that will not be freed by later > tcindex_partial_destroy_work() > > Overall it looks to me that we need some somewhat wait for the RCU > grace period, Isn't it better to make @cp a deeper copy of @p ? I thought it already is but we don't seem to be cloning p->h. Also the cloning of p->perfect looks quite lossy. > Somewhat side question: it looks like that the 'perfect hashing' usage > is the root cause of the issue addressed here, and very likely is > afflicted by other problems, e.g. the data curruption in 'err = > tcindex_filter_result_init(old_r, cp, net);'. > > AFAICS 'perfect hashing' usage is a sort of optimization that the user- > space may trigger with some combination of the tcindex arguments. I'm > wondering if we could drop all perfect hashing related code? The thought of "how much of this can we delete" did cross my mind :) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-16 2:44 ` Jakub Kicinski @ 2022-11-16 12:10 ` Hawkins Jiawei 2022-11-16 13:21 ` Hawkins Jiawei 0 siblings, 1 reply; 9+ messages in thread From: Hawkins Jiawei @ 2022-11-16 12:10 UTC (permalink / raw) To: kuba Cc: 18801353760, cong.wang, davem, edumazet, jhs, jiri, linux-kernel, netdev, pabeni, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, xiyou.wangcong, yin31149 On Wed, 16 Nov 2022 at 10:44, Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote: > > This code confuses me more than a bit, and I don't follow ?!? > > It's very confusing :S > > For starters I don't know when r != old_r. I mean now it triggers > randomly after the RCU-ification, but in the original code when > it was just a memset(). When would old_r ever not be null and yet > point to a different entry? I am also confused about the code when I tried to fix this bug. As for when `old_r != r`, according to the simplified code below, this should be probably true if `p->perfect` is true or `!p->perfect && !pc->h` is true(please correct me if I am wrong) struct tcindex_filter_result new_filter_result, *old_r = r; struct tcindex_data *cp = NULL, *oldp; struct tcf_result cr = {}; /* tcindex_data attributes must look atomic to classifier/lookup so * allocate new tcindex data and RCU assign it onto root. Keeping * perfect hash and hash pointers from old data. */ cp = kzalloc(sizeof(*cp), GFP_KERNEL); if (p->perfect) { if (tcindex_alloc_perfect_hash(net, cp) < 0) goto errout; ... } cp->h = p->h; if (!cp->perfect && !cp->h) { if (valid_perfect_hash(cp)) { if (tcindex_alloc_perfect_hash(net, cp) < 0) goto errout_alloc; } else { struct tcindex_filter __rcu **hash; hash = kcalloc(cp->hash, sizeof(struct tcindex_filter *), GFP_KERNEL); if (!hash) goto errout_alloc; cp->h = hash; } } ... if (cp->perfect) r = cp->perfect + handle; else r = tcindex_lookup(cp, handle) ? : &new_filter_result; if (old_r && old_r != r) { err = tcindex_filter_result_init(old_r, cp, net); if (err < 0) { kfree(f); goto errout_alloc; } } * If `p->perfect` is true, tcindex_alloc_perfect_hash() newly alloctes cp->perfect. * If `!p->perfect && !p->h` is true, cp->perfect or cp->h is newly allocated. In either case, r probably points to the newly allocated memory, which should not equals to the old_r. > > > it looks like that at this point: > > > > * the data path could access 'old_r->exts' contents via 'p' just before > > the previous 'tcindex_filter_result_init(old_r, cp, net);' but still > > potentially within the same RCU grace period > > > > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old > > exts from 'p' so that will not be freed by later > > tcindex_partial_destroy_work() > > > > Overall it looks to me that we need some somewhat wait for the RCU > > grace period, > > Isn't it better to make @cp a deeper copy of @p ? > I thought it already is but we don't seem to be cloning p->h. > Also the cloning of p->perfect looks quite lossy. Yes, I also think @cp should be a deeper copy of @p. But it seems that in tcindex_alloc_perfect_hash(), each @cp ->exts will be initialized by tcf_exts_init() as below, and tcindex_set_parms() forgets to free the old ->exts content, triggering this memory leak.(Please correct me if I am wrong) static int tcindex_alloc_perfect_hash(struct net *net, struct tcindex_data *cp) { int i, err = 0; cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result), GFP_KERNEL | __GFP_NOWARN); for (i = 0; i < cp->hash; i++) { err = tcf_exts_init(&cp->perfect[i].exts, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); if (err < 0) goto errout; cp->perfect[i].p = cp; } } static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net, int action, int police) { #ifdef CONFIG_NET_CLS_ACT exts->type = 0; exts->nr_actions = 0; /* Note: we do not own yet a reference on net. * This reference might be taken later from tcf_exts_get_net(). */ exts->net = net; exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *), GFP_KERNEL); if (!exts->actions) return -ENOMEM; #endif exts->action = action; exts->police = police; return 0; } > > > Somewhat side question: it looks like that the 'perfect hashing' usage > > is the root cause of the issue addressed here, and very likely is > > afflicted by other problems, e.g. the data curruption in 'err = > > tcindex_filter_result_init(old_r, cp, net);'. > > > > AFAICS 'perfect hashing' usage is a sort of optimization that the user- > > space may trigger with some combination of the tcindex arguments. I'm > > wondering if we could drop all perfect hashing related code? > > The thought of "how much of this can we delete" did cross my mind :) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-16 12:10 ` Hawkins Jiawei @ 2022-11-16 13:21 ` Hawkins Jiawei 0 siblings, 0 replies; 9+ messages in thread From: Hawkins Jiawei @ 2022-11-16 13:21 UTC (permalink / raw) To: yin31149 Cc: 18801353760, cong.wang, davem, edumazet, jhs, jiri, kuba, linux-kernel, netdev, pabeni, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, xiyou.wangcong On Wed, 16 Nov 2022 at 20:10, Hawkins Jiawei <yin31149@gmail.com> wrote: > > On Wed, 16 Nov 2022 at 10:44, Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Tue, 15 Nov 2022 19:57:10 +0100 Paolo Abeni wrote: > > > This code confuses me more than a bit, and I don't follow ?!? > > > > It's very confusing :S > > > > For starters I don't know when r != old_r. I mean now it triggers > > randomly after the RCU-ification, but in the original code when > > it was just a memset(). When would old_r ever not be null and yet > > point to a different entry? > > I am also confused about the code when I tried to fix this bug. > > As for when `old_r != r`, according to the simplified > code below, this should be probably true if `p->perfect` is true > or `!p->perfect && !pc->h` is true(please correct me if I am wrong) > > struct tcindex_filter_result new_filter_result, *old_r = r; > struct tcindex_data *cp = NULL, *oldp; > struct tcf_result cr = {}; > > /* tcindex_data attributes must look atomic to classifier/lookup so > * allocate new tcindex data and RCU assign it onto root. Keeping > * perfect hash and hash pointers from old data. > */ > cp = kzalloc(sizeof(*cp), GFP_KERNEL); > > if (p->perfect) { > if (tcindex_alloc_perfect_hash(net, cp) < 0) > goto errout; > ... > } > cp->h = p->h; > > if (!cp->perfect && !cp->h) { > if (valid_perfect_hash(cp)) { > if (tcindex_alloc_perfect_hash(net, cp) < 0) > goto errout_alloc; > > } else { > struct tcindex_filter __rcu **hash; > > hash = kcalloc(cp->hash, > sizeof(struct tcindex_filter *), > GFP_KERNEL); > > if (!hash) > goto errout_alloc; > > cp->h = hash; > } > } > ... > > if (cp->perfect) > r = cp->perfect + handle; > else > r = tcindex_lookup(cp, handle) ? : &new_filter_result; > > if (old_r && old_r != r) { > err = tcindex_filter_result_init(old_r, cp, net); > if (err < 0) { > kfree(f); > goto errout_alloc; > } > } > > * If `p->perfect` is true, tcindex_alloc_perfect_hash() newly > alloctes cp->perfect. > > * If `!p->perfect && !p->h` is true, cp->perfect or cp->h is > newly allocated. > > In either case, r probably points to the newly allocated memory, > which should not equals to the old_r. Sorry for the error. In the second case, `r` is possibly pointing to the `&new_filter_result`, which is a stack variable address, and should still not equal to the `old_r`. > > > > > > it looks like that at this point: > > > > > > * the data path could access 'old_r->exts' contents via 'p' just before > > > the previous 'tcindex_filter_result_init(old_r, cp, net);' but still > > > potentially within the same RCU grace period > > > > > > * 'tcindex_filter_result_init(old_r, cp, net);' has 'unlinked' the old > > > exts from 'p' so that will not be freed by later > > > tcindex_partial_destroy_work() > > > > > > Overall it looks to me that we need some somewhat wait for the RCU > > > grace period, > > > > Isn't it better to make @cp a deeper copy of @p ? > > I thought it already is but we don't seem to be cloning p->h. > > Also the cloning of p->perfect looks quite lossy. > > Yes, I also think @cp should be a deeper copy of @p. > > But it seems that in tcindex_alloc_perfect_hash(), > each @cp ->exts will be initialized by tcf_exts_init() > as below, and tcindex_set_parms() forgets to free the > old ->exts content, triggering this memory leak.(Please > correct me if I am wrong) > > static int tcindex_alloc_perfect_hash(struct net *net, > struct tcindex_data *cp) > { > int i, err = 0; > > cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result), > GFP_KERNEL | __GFP_NOWARN); > > for (i = 0; i < cp->hash; i++) { > err = tcf_exts_init(&cp->perfect[i].exts, net, > TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE); > if (err < 0) > goto errout; > cp->perfect[i].p = cp; > } > } > > static inline int tcf_exts_init(struct tcf_exts *exts, struct net *net, > int action, int police) > { > #ifdef CONFIG_NET_CLS_ACT > exts->type = 0; > exts->nr_actions = 0; > /* Note: we do not own yet a reference on net. > * This reference might be taken later from tcf_exts_get_net(). > */ > exts->net = net; > exts->actions = kcalloc(TCA_ACT_MAX_PRIO, sizeof(struct tc_action *), > GFP_KERNEL); > if (!exts->actions) > return -ENOMEM; > #endif > exts->action = action; > exts->police = police; > return 0; > } > > > > > > Somewhat side question: it looks like that the 'perfect hashing' usage > > > is the root cause of the issue addressed here, and very likely is > > > afflicted by other problems, e.g. the data curruption in 'err = > > > tcindex_filter_result_init(old_r, cp, net);'. > > > > > > AFAICS 'perfect hashing' usage is a sort of optimization that the user- > > > space may trigger with some combination of the tcindex arguments. I'm > > > wondering if we could drop all perfect hashing related code? > > > > The thought of "how much of this can we delete" did cross my mind :) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] net: sched: fix memory leak in tcindex_set_parms 2022-11-15 17:02 ` Jakub Kicinski 2022-11-15 18:57 ` Paolo Abeni @ 2022-11-16 7:35 ` Hawkins Jiawei 1 sibling, 0 replies; 9+ messages in thread From: Hawkins Jiawei @ 2022-11-16 7:35 UTC (permalink / raw) To: kuba Cc: 18801353760, cong.wang, davem, edumazet, jhs, jiri, linux-kernel, netdev, pabeni, syzbot+232ebdbd36706c965ebf, syzkaller-bugs, xiyou.wangcong, yin31149 Hi, On Wed, 16 Nov 2022 at 01:02, Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 14 Nov 2022 01:05:08 +0800 Hawkins Jiawei wrote: > > diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c > > index 1c9eeb98d826..d2fac9559d3e 100644 > > --- a/net/sched/cls_tcindex.c > > +++ b/net/sched/cls_tcindex.c > > @@ -338,6 +338,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, > > struct tcf_result cr = {}; > > int err, balloc = 0; > > struct tcf_exts e; > > + struct tcf_exts old_e = {}; > > This is not a valid way of initializing a structure. > tcf_exts_init() is supposed to be called. > If we add a list member to that structure this code will break, again. Yes, you are right. But the `old_e` variable here is used only for freeing old_r->exts resource, `old_e` will be overwritten by old_r->exts content as follows: struct tcf_exts old_e = {}; ... if (old_r && old_r != r) { old_e = old_r->exts; ... } ... synchronize_rcu(); tcf_exts_destroy(&old_e); So this patch uses `struct tcf_exts old_e = {}` here just for a cleared space. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-11-16 13:21 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-11-13 17:05 [PATCH v2] net: sched: fix memory leak in tcindex_set_parms Hawkins Jiawei 2022-11-15 11:36 ` Paolo Abeni 2022-11-15 11:39 ` Dmitry Vyukov 2022-11-15 17:02 ` Jakub Kicinski 2022-11-15 18:57 ` Paolo Abeni 2022-11-16 2:44 ` Jakub Kicinski 2022-11-16 12:10 ` Hawkins Jiawei 2022-11-16 13:21 ` Hawkins Jiawei 2022-11-16 7:35 ` Hawkins Jiawei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).