From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Sasha Levin To: "stable@vger.kernel.org" , "linux-kernel@vger.kernel.org" CC: Jiri Olsa , Jiri Olsa , Alexander Shishkin , Andi Kleen , David Ahern , Kan Liang , Lukasz Odzioba , Peter Zijlstra , Wang Nan , "kernel-team@lge.com" , Arnaldo Carvalho de Melo , Sasha Levin Subject: [PATCH AUTOSEL 4.18 43/88] perf tools: Fix struct comm_str removal crash Date: Fri, 7 Sep 2018 00:36:24 +0000 Message-ID: <20180907003547.57567-43-alexander.levin@microsoft.com> References: <20180907003547.57567-1-alexander.levin@microsoft.com> In-Reply-To: <20180907003547.57567-1-alexander.levin@microsoft.com> Content-Language: en-US Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: From: Jiri Olsa [ Upstream commit 46b3722cc7765582354488da633aafffcb138458 ] We occasionaly hit following assert failure in 'perf top', when processing = the /proc info in multiple threads. perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: [Switching to Thread 0x7ffff11ba700 (LWP 13749)] 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 #1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 #2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 #4 0x0000000000535373 in refcount_inc (r=3D0x7fffdc009be0) at ...include/linux/refcount.h:109 #5 0x00000000005354f1 in comm_str__get (cs=3D0x7fffdc009bc0) at util/comm.c:24 #6 0x00000000005356bd in __comm_str__findnew (str=3D0x7fffd000b260 ":2", root=3D0xbed5c0 ) at util/comm.c:72 #7 0x000000000053579e in comm_str__findnew (str=3D0x7fffd000b260 ":2", root=3D0xbed5c0 ) at util/comm.c:95 #8 0x000000000053582e in comm__new (str=3D0x7fffd000b260 ":2", timestamp=3D0, exec=3Dfalse) at util/comm.c:111 #9 0x00000000005363bc in thread__new (pid=3D2, tid=3D2) at util/thread.c= :57 #10 0x0000000000523da0 in ____machine__findnew_thread (machine=3D0xbfde38= , threads=3D0xbfdf28, pid=3D2, tid=3D2, create=3Dtrue) at util/machine.= c:457 #11 0x0000000000523eb4 in __machine__findnew_thread (machine=3D0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... The problem is that we keep global comm_str_root list, which is accessed by multiple threads during the 'perf top' startup and following 2 paths can race: thread 1: ... thread__new comm__new comm_str__findnew down_write(&comm_str_lock); __comm_str__findnew comm_str__get thread 2: ... comm__override or comm__free comm_str__put refcount_dec_and_test down_write(&comm_str_lock); rb_erase(&cs->rb_node, &comm_str_root); Because thread 2 first decrements the refcnt and only after then it removes= the struct comm_str from the list, the thread 1 can find this object on the lis= t with refcnt equls to 0 and hit the assert. This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects that already dropped the refcnt to 0. For the rest of the objects we take t= he refcnt before comparing its name and release it afterwards with comm_str__p= ut, which can also release the object completely. Signed-off-by: Jiri Olsa Acked-by: Namhyung Kim Cc: Alexander Shishkin Cc: Andi Kleen Cc: David Ahern Cc: Kan Liang Cc: Lukasz Odzioba Cc: Peter Zijlstra Cc: Wang Nan Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin --- tools/perf/util/comm.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c index 7798a2cc8a86..31279a7bd919 100644 --- a/tools/perf/util/comm.c +++ b/tools/perf/util/comm.c @@ -20,9 +20,10 @@ static struct rw_semaphore comm_str_lock =3D {.lock =3D = PTHREAD_RWLOCK_INITIALIZER,} =20 static struct comm_str *comm_str__get(struct comm_str *cs) { - if (cs) - refcount_inc(&cs->refcnt); - return cs; + if (cs && refcount_inc_not_zero(&cs->refcnt)) + return cs; + + return NULL; } =20 static void comm_str__put(struct comm_str *cs) @@ -67,9 +68,14 @@ struct comm_str *__comm_str__findnew(const char *str, st= ruct rb_root *root) parent =3D *p; iter =3D rb_entry(parent, struct comm_str, rb_node); =20 + /* + * If we race with comm_str__put, iter->refcnt is 0 + * and it will be removed within comm_str__put call + * shortly, ignore it in this search. + */ cmp =3D strcmp(str, iter->str); - if (!cmp) - return comm_str__get(iter); + if (!cmp && comm_str__get(iter)) + return iter; =20 if (cmp < 0) p =3D &(*p)->rb_left; --=20 2.17.1