From: Sven Schnelle
To: Mark Rutland
Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Yinan Liu,
    Ard Biesheuvel, Kees Cook, Sachin Sant, linuxppc-dev@lists.ozlabs.org,
    Russell King, linux-arm-kernel@lists.infradead.org, hca@linux.ibm.com,
    linux-s390@vger.kernel.org, "Paul E. McKenney"
Subject: Re: ftrace hangs waiting for rcu
References: <20220127114249.03b1b52b@gandalf.local.home>
Date: Fri, 28 Jan 2022 17:08:48 +0100
In-Reply-To: (Mark Rutland's message of "Fri, 28 Jan 2022 15:42:48 +0000")

Hi Mark,

Mark Rutland writes:

> On arm64 I bisected this down to:
>
> 7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
>
> Which was going wrong because ilog2() rounds down, and so the shift was wrong
> for any nr_cpus that was not a power-of-two. Paul had already fixed that in
> rcu-next, and just sent a pull request to Linus:
>
> https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/
>
> With that applied, I no longer see these hangs.
>
> Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
> the issue for you?

We noticed the PR from Paul and are currently testing the fix. So far
it's looking good.
The configuration where we have seen the hang is a bit unusual:

- 16 physical CPUs on the KVM host
- 248 logical CPUs inside KVM
- debug kernel both on the host and the KVM guest

So things are likely a bit slow in the KVM guest. Interestingly, the
number of CPUs is even. But maybe RCU sees an odd number of CPUs and
gets confused before all CPUs are brought up. I have to read the code
and test to see whether that is possible.

Thanks for investigating!
Sven
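
The quoted explanation ("ilog2() rounds down, and so the shift was wrong for any nr_cpus that was not a power-of-two") can be checked with a few lines of arithmetic. The following is a minimal user-space sketch in Python, not kernel code. `ilog2` and `order_base_2` are stand-ins modelled on the kernel helpers of the same names, and `queue_index` and `queues_used` are hypothetical names for this illustration. That a queue is chosen as `cpu >> shift` is an assumption based on the title of the bisected commit; the thread itself does not show the code.

```python
def ilog2(n: int) -> int:
    """Floor of log2(n); stand-in for the kernel's ilog2() for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n.bit_length() - 1


def order_base_2(n: int) -> int:
    """Ceiling of log2(n): the smallest k with (1 << k) >= n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (n - 1).bit_length()


def queue_index(cpu: int, shift: int) -> int:
    """ASSUMPTION: a CPU's queue is selected by shifting its number right."""
    return cpu >> shift


def queues_used(nr_cpus: int, shift: int) -> list:
    """Sorted list of distinct queue indexes hit by CPUs 0..nr_cpus-1."""
    return sorted({queue_index(cpu, shift) for cpu in range(nr_cpus)})


if __name__ == "__main__":
    # 16 and 256 are powers of two; 248 is the guest CPU count from the mail.
    for nr_cpus in (16, 248, 256):
        down = ilog2(nr_cpus)
        up = order_base_2(nr_cpus)
        print(f"nr_cpus={nr_cpus}: "
              f"round-down shift={down} -> queues {queues_used(nr_cpus, down)}, "
              f"round-up shift={up} -> queues {queues_used(nr_cpus, up)}")
```

Under these assumptions, 248 CPUs gives a rounded-down shift of 7, so CPUs 128 to 247 map to queue 1 instead of queue 0, while a rounded-up shift of 8 maps every CPU to queue 0. Note that 248 is even but not a power of two, so the rounding problem described in the quoted text would apply to this guest as configured, whether or not RCU ever sees an odd CPU count during bring-up.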