* [PATCH] binder: do not crash on bad transaction in binder_thread_release() @ 2025-03-31 15:24 Frode Isaksen 2025-04-01 3:03 ` Carlos Llamas 0 siblings, 1 reply; 3+ messages in thread From: Frode Isaksen @ 2025-03-31 15:24 UTC (permalink / raw) To: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Carlos Llamas, Suren Baghdasaryan Cc: Frode Isaksen, linux-kernel From: Frode Isaksen <frode@meta.com> Instead of calling BUG(), set the binder_thread to NULL, as is done in other parts of the code. Log if it is a bad transaction (other than in or out). The BUG in binder_thread_release() was preceded by these warning logs: binder: 1198:1217 got reply transaction with bad transaction stack, transaction 49693 has target 1198:0 binder: 1198:1217 transaction failed 29201/-71, size 4-0 line 3065 ... binder: release 954:1333 transaction 49693 out, still active ... binder: release 1198:1217 transaction 49693 out, still active kernel BUG at drivers/android/binder.c:5070! Signed-off-by: Frode Isaksen <frode@meta.com> --- This bug was discovered, tested and fixed (no more crashes seen) on Meta Quest 3 device. drivers/android/binder.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 76052006bd87..c21d7806e42b 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -5302,7 +5302,8 @@ static int binder_thread_release(struct binder_proc *proc, "release %d:%d transaction %d %s, still active\n", proc->pid, thread->pid, t->debug_id, - (t->to_thread == thread) ? "in" : "out"); + (t->to_thread == thread) ? "in" : + (t->from == thread) ? "out" : "bad"); if (t->to_thread == thread) { thread->proc->outstanding_txns--; @@ -5317,7 +5318,7 @@ static int binder_thread_release(struct binder_proc *proc, t->from = NULL; t = t->from_parent; } else - BUG(); + t = NULL; spin_unlock(&last_t->lock); if (t) spin_lock(&t->lock); -- 2.49.0 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] binder: do not crash on bad transaction in binder_thread_release() 2025-03-31 15:24 [PATCH] binder: do not crash on bad transaction in binder_thread_release() Frode Isaksen @ 2025-04-01 3:03 ` Carlos Llamas 2025-04-01 7:53 ` Frode Isaksen 0 siblings, 1 reply; 3+ messages in thread From: Carlos Llamas @ 2025-04-01 3:03 UTC (permalink / raw) To: Frode Isaksen Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Suren Baghdasaryan, Frode Isaksen, linux-kernel On Mon, Mar 31, 2025 at 05:24:14PM +0200, Frode Isaksen wrote: > From: Frode Isaksen <frode@meta.com> > > Instead of calling BUG(), set the binder_thread to NULL, Yeap, not crashing the kernel is a good idea. > as is done in other parts of the code. > Log if it is a bad transaction (other than in or out). > The BUG in binder_thread_release() was preceded by > these warning logs: > binder: 1198:1217 got reply transaction with bad transaction stack, > transaction 49693 has target 1198:0 So tid 1217 is sending a reply to an incoming sync transaction. However, its transaction_stack shows that t->to_thread is NULL. I have no idea how can this be possible. When the transaction was picked up by 1217, its info was recorded in its transaction_stack as such: if (cmd != BR_REPLY && !(t->flags & TF_ONE_WAY)) { binder_inner_proc_lock(thread->proc); t->to_parent = thread->transaction_stack; t->to_thread = thread; <----- HERE thread->transaction_stack = t; binder_inner_proc_unlock(thread->proc); } I don't understand how 't->to_thread' later becomes NULL, maybe memory corruption? > binder: 1198:1217 transaction failed 29201/-71, size 4-0 line 3065 > ... > binder: release 954:1333 transaction 49693 out, still active > ... > binder: release 1198:1217 transaction 49693 out, still active > kernel BUG at drivers/android/binder.c:5070! > > Signed-off-by: Frode Isaksen <frode@meta.com> > --- > This bug was discovered, tested and fixed (no more crashes seen) on Meta Quest 3 device. Do you have a way to reproduce this? It sounds like there is something else going wrong before and we probably want to fix that. > > drivers/android/binder.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > index 76052006bd87..c21d7806e42b 100644 > --- a/drivers/android/binder.c > +++ b/drivers/android/binder.c > @@ -5302,7 +5302,8 @@ static int binder_thread_release(struct binder_proc *proc, > "release %d:%d transaction %d %s, still active\n", > proc->pid, thread->pid, > t->debug_id, > - (t->to_thread == thread) ? "in" : "out"); > + (t->to_thread == thread) ? "in" : > + (t->from == thread) ? "out" : "bad"); > > if (t->to_thread == thread) { > thread->proc->outstanding_txns--; > @@ -5317,7 +5318,7 @@ static int binder_thread_release(struct binder_proc *proc, > t->from = NULL; > t = t->from_parent; > } else > - BUG(); > + t = NULL; Dropping BUG() is nice but this could use en error message. > spin_unlock(&last_t->lock); > if (t) > spin_lock(&t->lock); > -- > 2.49.0 > Regards, -- Carlos Llamas ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] binder: do not crash on bad transaction in binder_thread_release() 2025-04-01 3:03 ` Carlos Llamas @ 2025-04-01 7:53 ` Frode Isaksen 0 siblings, 0 replies; 3+ messages in thread From: Frode Isaksen @ 2025-04-01 7:53 UTC (permalink / raw) To: Carlos Llamas Cc: Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Suren Baghdasaryan, Frode Isaksen, linux-kernel On 4/1/25 5:03 AM, Carlos Llamas wrote: > On Mon, Mar 31, 2025 at 05:24:14PM +0200, Frode Isaksen wrote: >> From: Frode Isaksen <frode@meta.com> >> >> Instead of calling BUG(), set the binder_thread to NULL, > Yeap, not crashing the kernel is a good idea. > >> as is done in other parts of the code. >> Log if it is a bad transaction (other than in or out). >> The BUG in binder_thread_release() was preceded by >> these warning logs: >> binder: 1198:1217 got reply transaction with bad transaction stack, >> transaction 49693 has target 1198:0 > So tid 1217 is sending a reply to an incoming sync transaction. However, > its transaction_stack shows that t->to_thread is NULL. I have no idea > how can this be possible. When the transaction was picked up by 1217, > its info was recorded in its transaction_stack as such: > > if (cmd != BR_REPLY && !(t->flags & TF_ONE_WAY)) { > binder_inner_proc_lock(thread->proc); > t->to_parent = thread->transaction_stack; > t->to_thread = thread; <----- HERE > thread->transaction_stack = t; > binder_inner_proc_unlock(thread->proc); > } > > I don't understand how 't->to_thread' later becomes NULL, maybe memory > corruption? > >> binder: 1198:1217 transaction failed 29201/-71, size 4-0 line 3065 >> ... >> binder: release 954:1333 transaction 49693 out, still active >> ... >> binder: release 1198:1217 transaction 49693 out, still active >> kernel BUG at drivers/android/binder.c:5070! >> >> Signed-off-by: Frode Isaksen <frode@meta.com> >> --- >> This bug was discovered, tested and fixed (no more crashes seen) on Meta Quest 3 device. > Do you have a way to reproduce this? It sounds like there is something > else going wrong before and we probably want to fix that. I was never able to reproduce this locally and it happens very seldom. As you can see in the log, there was a "bad transaction" 17 secs before the crash. <6>[ 13.664167] binder: 1198:1217 got reply transaction with bad transaction stack, transaction 49693 has target 1198:0 <6>[ 13.664174] binder: 1198:1217 transaction failed 29201/-71, size 4-0 line 3065 ... <6>[ 40.793299] binder: release 954:1333 transaction 49693 out, still active ... <6>[ 41.624171] binder: release 1198:1217 transaction 49693 out, still active <6>[ 41.624197] ------------[ cut here ]------------ <2>[ 41.624198] kernel BUG at drivers/android/binder.c:5070! > >> drivers/android/binder.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/android/binder.c b/drivers/android/binder.c >> index 76052006bd87..c21d7806e42b 100644 >> --- a/drivers/android/binder.c >> +++ b/drivers/android/binder.c >> @@ -5302,7 +5302,8 @@ static int binder_thread_release(struct binder_proc *proc, >> "release %d:%d transaction %d %s, still active\n", >> proc->pid, thread->pid, >> t->debug_id, >> - (t->to_thread == thread) ? "in" : "out"); >> + (t->to_thread == thread) ? "in" : >> + (t->from == thread) ? "out" : "bad"); >> >> if (t->to_thread == thread) { >> thread->proc->outstanding_txns--; >> @@ -5317,7 +5318,7 @@ static int binder_thread_release(struct binder_proc *proc, >> t->from = NULL; >> t = t->from_parent; >> } else >> - BUG(); >> + t = NULL; > Dropping BUG() is nice but this could use en error message. The binder_debug() will log it as "bad" transaction instead of "out", so there will be a trace of this happening. Thanks, Frode > >> spin_unlock(&last_t->lock); >> if (t) >> spin_lock(&t->lock); >> -- >> 2.49.0 >> > Regards, > -- > Carlos Llamas ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-04-01 7:53 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-31 15:24 [PATCH] binder: do not crash on bad transaction in binder_thread_release() Frode Isaksen 2025-04-01 3:03 ` Carlos Llamas 2025-04-01 7:53 ` Frode Isaksen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox