Date: Tue, 02 Nov 2021 09:33:36 +0100
From: Takashi Iwai
To: Zqiang
Cc: tiwai@suse.com, alsa-devel@alsa-project.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ALSA: seq: Fix RCU stall in snd_seq_write()
In-Reply-To: <20211102033222.3849-1-qiang.zhang1211@gmail.com>
References: <20211102033222.3849-1-qiang.zhang1211@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 02 Nov 2021 04:32:22 +0100,
Zqiang wrote:
>
> If we have a lot of cell object, this cycle may take a long time, and
> trigger RCU stall. insert a conditional reschedule point to fix it.
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (1 GPs behind) idle=9f5/1/0x4000000000000000
> softirq=16474/16475 fqs=4916
> (t=10500 jiffies g=19249 q=192515)
> NMI backtrace for cpu 1
> ......
> asm_sysvec_apic_timer_interrupt
> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70
> spin_unlock_irqrestore
> snd_seq_prioq_cell_out+0x1dc/0x360
> snd_seq_check_queue+0x1a6/0x3f0
> snd_seq_enqueue_event+0x1ed/0x3e0
> snd_seq_client_enqueue_event.constprop.0+0x19a/0x3c0
> snd_seq_write+0x2db/0x510
> vfs_write+0x1c4/0x900
> ksys_write+0x171/0x1d0
> do_syscall_64+0x35/0xb0
>
> Reported-by: syzbot+bb950e68b400ab4f65f8@syzkaller.appspotmail.com
> Signed-off-by: Zqiang
> ---
>  sound/core/seq/seq_queue.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/sound/core/seq/seq_queue.c b/sound/core/seq/seq_queue.c
> index d6c02dea976c..f5b1e4562a64 100644
> --- a/sound/core/seq/seq_queue.c
> +++ b/sound/core/seq/seq_queue.c
> @@ -263,6 +263,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>  		if (!cell)
>  			break;
>  		snd_seq_dispatch_event(cell, atomic, hop);
> +		cond_resched();
>  	}
>
>  	/* Process time queue... */
> @@ -272,6 +273,7 @@ void snd_seq_check_queue(struct snd_seq_queue *q, int atomic, int hop)
>  		if (!cell)
>  			break;
>  		snd_seq_dispatch_event(cell, atomic, hop);
> +		cond_resched();

It's good to have cond_resched() in those places, but it must be done
more carefully, as the code path may be called from atomic context,
too. That is, it must check the atomic argument, and cond_resched()
must be applied only when atomic==false.

But I still wonder how this gets an RCU stall all of a sudden. Looking
through
  https://syzkaller.appspot.com/bug?extid=bb950e68b400ab4f65f8
it has been triggered in many cases since the end of September...


thanks,

Takashi
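
For illustration only, the guard described above could look roughly like
the following userspace sketch. It is not the kernel patch: struct cell,
check_queue_sketch(), cond_resched_stub() and dispatch_event_stub() are
hypothetical stand-ins, and the counters exist only so the behaviour can
be observed. The one point it shows is that the reschedule point is
offered only when the caller passes atomic==false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued sequencer cell. */
struct cell {
	int id;
};

static int resched_calls;	/* how often the reschedule stub ran */
static int dispatched;		/* how many cells were dispatched */

/*
 * Stub for the kernel's cond_resched(). The real call may sleep, so it
 * is only legal in process (non-atomic) context.
 */
static void cond_resched_stub(void)
{
	resched_calls++;
}

/* Stub for snd_seq_dispatch_event(); it only counts the dispatch. */
static void dispatch_event_stub(struct cell *c, bool atomic)
{
	(void)c;
	(void)atomic;
	dispatched++;
}

/*
 * Drain n cells. After each dispatch, offer a reschedule point, but
 * only when the caller is not in atomic context.
 */
static void check_queue_sketch(struct cell *cells, size_t n, bool atomic)
{
	for (size_t i = 0; i < n; i++) {
		dispatch_event_stub(&cells[i], atomic);
		if (!atomic)
			cond_resched_stub();
	}
}
```

Calling check_queue_sketch() with atomic==true dispatches every cell and
never touches the reschedule stub; with atomic==false the stub runs once
per dispatched cell.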