From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753251AbcCXUY0 (ORCPT ); Thu, 24 Mar 2016 16:24:26 -0400
Received: from mail-by2on0078.outbound.protection.outlook.com ([207.46.100.78]:40096
	"EHLO na01-by2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1751863AbcCXUYR (ORCPT );
	Thu, 24 Mar 2016 16:24:17 -0400
Authentication-Results: spf=fail (sender IP is 63.163.107.225)
	smtp.mailfrom=sandisk.com; linux.vnet.ibm.com; dkim=none (message not signed)
	header.d=none;linux.vnet.ibm.com; dmarc=none action=none header.from=sandisk.com;
Subject: Re: RCU stall
To: "paulmck@linux.vnet.ibm.com"
References: <56F1A8F2.9000905@sandisk.com>
	<20160322204510.GS4287@linux.vnet.ibm.com>
	<56F1DAF6.3030804@sandisk.com>
	<20160323015932.GX4287@linux.vnet.ibm.com>
	<20160323022902.GA28227@linux.vnet.ibm.com>
CC: "linux-kernel@vger.kernel.org"
From: Bart Van Assche
Message-ID: <56F44CE2.1060703@sandisk.com>
Date: Thu, 24 Mar 2016 13:24:02 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0
MIME-Version: 1.0
In-Reply-To: <20160323022902.GA28227@linux.vnet.ibm.com>
Content-Type: multipart/mixed; boundary="------------050900080509050601030503"
X-OriginatorOrg: sandisk.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--------------050900080509050601030503
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit

On 03/22/2016 07:29 PM, Paul E. McKenney wrote:
> Note that a soft lockup triggered at 10509.568010, well before the RCU
> CPU stall warning. And you have a second soft lockup at 10537.567212,
> with the same function scsi_request_fn() at the top of the stack in both
> stack traces. That function has a nice big "for (;;)" loop that does
> not appear to have any iteration-limiting mechanism.

Hello Paul,

Your feedback is really appreciated. I have started testing the four
attached patches. In the tests I have run so far, these patches were
sufficient to avoid any soft lockup or RCU stall complaints. I will
submit them to the appropriate maintainers once I have finished
testing.

Bart.

--------------050900080509050601030503
Content-Type: text/x-patch;
	name="0001-IB-cm-Fix-a-recently-introduced-locking-bug.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="0001-IB-cm-Fix-a-recently-introduced-locking-bug.patch"

>>From 0f72b28329342346980ae99c69d19b7adb0123bc Mon Sep 17 00:00:00 2001
From: Bart Van Assche
Date: Thu, 24 Mar 2016 11:01:14 -0700
Subject: [PATCH 1/4] IB/cm: Fix a recently introduced locking bug

ib_cm_notify() can be called from interrupt context, so cm_establish()
must not re-enable interrupts unconditionally. Use spin_lock_irqsave()
and spin_unlock_irqrestore() instead of spin_lock_irq() and
spin_unlock_irq(). This prevents lockdep from reporting the following
warning:

WARNING: CPU: 0 PID: 23317 at kernel/locking/lockdep.c:2624 trace_hardirqs_on_caller+0x112/0x1b0
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
Call Trace:
 [] dump_stack+0x67/0x92
 [] __warn+0xc1/0xe0
 [] warn_slowpath_fmt+0x4a/0x50
 [] trace_hardirqs_on_caller+0x112/0x1b0
 [] trace_hardirqs_on+0xd/0x10
 [] _raw_spin_unlock_irq+0x27/0x40
 [] ib_cm_notify+0x25c/0x290 [ib_cm]
 [] srpt_qp_event+0xa1/0xf0 [ib_srpt]
 [] mlx4_ib_qp_event+0x67/0xd0 [mlx4_ib]
 [] mlx4_qp_event+0x5a/0xc0 [mlx4_core]
 [] mlx4_eq_int+0x3d8/0xcf0 [mlx4_core]
 [] mlx4_msi_x_interrupt+0xc/0x20 [mlx4_core]
 [] handle_irq_event_percpu+0x64/0x100
 [] handle_irq_event+0x34/0x60
 [] handle_edge_irq+0x6a/0x150
 [] handle_irq+0x15/0x20
 [] do_IRQ+0x5c/0x110
 [] common_interrupt+0x89/0x89
 [] blk_run_queue_async+0x37/0x40
 [] rq_completed+0x43/0x70 [dm_mod]
 [] dm_softirq_done+0x176/0x280 [dm_mod]
 [] blk_done_softirq+0x52/0x90
 [] __do_softirq+0x10f/0x230
 [] irq_exit+0xa8/0xb0
 [] smp_trace_call_function_single_interrupt+0x2e/0x30
 [] smp_call_function_single_interrupt+0x9/0x10
 [] call_function_single_interrupt+0x89/0x90

Fixes: be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche
Cc: Erez Shitrit
Cc: Doug Ledford
Cc: stable # v4.2+
---
 drivers/infiniband/core/cm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1d92e09..c995255 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3452,14 +3452,14 @@ static int cm_establish(struct ib_cm_id *cm_id)
 	work->cm_event.event = IB_CM_USER_ESTABLISHED;
 
 	/* Check if the device started its remove_one */
-	spin_lock_irq(&cm.lock);
+	spin_lock_irqsave(&cm.lock, flags);
 	if (!cm_dev->going_down) {
 		queue_delayed_work(cm.wq, &work->work, 0);
 	} else {
 		kfree(work);
 		ret = -ENODEV;
 	}
-	spin_unlock_irq(&cm.lock);
+	spin_unlock_irqrestore(&cm.lock, flags);
 
 out:
 	return ret;
-- 
2.7.3

--------------050900080509050601030503
Content-Type: text/x-patch;
	name="0002-kernel-kthread.c-Avoid-CPU-lockups.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="0002-kernel-kthread.c-Avoid-CPU-lockups.patch"

>>From 5fd6aedadc04d102cd261507ff61071071455fb6 Mon Sep 17 00:00:00 2001
From: Bart Van Assche
Date: Thu, 24 Mar 2016 12:04:01 -0700
Subject: [PATCH 2/4] kernel/kthread.c: Avoid CPU lockups

Call cond_resched_rcu_qs() after each work item so that complaints
similar to the one below are no longer reported against a debug kernel:

NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kdmwork-254:2:23313]
irq event stamp: 16320042
hardirqs last enabled at (16320041): [] _raw_spin_unlock_irq+0x27/0x40
hardirqs last disabled at (16320042): [] 0xffff8803ffbe3cd8
softirqs last enabled at (16319960): [] __do_softirq+0x1cb/0x230
softirqs last disabled at (16319715): [] irq_exit+0xa8/0xb0
CPU: 1 PID: 23313 Comm: kdmwork-254:2
RIP: 0010:[] [] _raw_spin_unlock_irq+0x2f/0x40
Call Trace:
 [] scsi_request_fn+0x11f/0x630
 [] __blk_run_queue+0x2e/0x40
 [] __elv_add_request+0x75/0x1f0
 [] blk_insert_cloned_request+0x101/0x190
 [] map_request+0x16a/0x1b0 [dm_mod]
 [] map_tio_request+0x1d/0x40 [dm_mod]
 [] kthread_worker_fn+0x82/0x1a0
 [] kthread+0xea/0x100
 [] ret_from_fork+0x22/0x40

Signed-off-by: Bart Van Assche
---
 kernel/kthread.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9ff173d..516ca6b 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -593,6 +593,7 @@ repeat:
 	if (work) {
 		__set_current_state(TASK_RUNNING);
 		work->func(work);
+		cond_resched_rcu_qs();
 	} else if (!freezing(current))
 		schedule();
 
-- 
2.7.3

--------------050900080509050601030503
Content-Type: text/x-patch;
	name="0003-block-Limit-work-processed-in-softirq-context.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="0003-block-Limit-work-processed-in-softirq-context.patch"

>>From 44985e4b2f3124bf87e84a4c7572efa00ac28d3b Mon Sep 17 00:00:00 2001
From: Bart Van Assche
Date: Wed, 23 Mar 2016 17:14:57 -0700
Subject: [PATCH 3/4] block: Limit work processed in softirq context

Process at most 64 completed requests per blk_done_softirq() invocation
and raise BLOCK_SOFTIRQ again once that limit has been reached. This
prevents complaints like the one below from being reported against a
debug kernel:

NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [disk11_0:2708]
irq event stamp: 17120809
hardirqs last enabled at (17120808): [] _raw_spin_unlock_irqrestore+0x31/0x50
hardirqs last disabled at (17120809): [] 0xffff88046f223bd0
softirqs last enabled at (17120794): [] scst_check_blocked_dev+0x77/0x1c0 [scst]
softirqs last disabled at (17120795): [] do_softirq_own_stack+0x1c/0x30
RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0x33/0x50
Call Trace:
 [] free_debug_processing+0x270/0x3a0
 [] __slab_free+0x17a/0x2c0
 [] kmem_cache_free+0x1b4/0x1d0
 [] mempool_free_slab+0x12/0x20
 [] mempool_free+0x26/0x80
 [] bio_free+0x49/0x60
 [] bio_put+0x1e/0x30
 [] end_clone_bio+0x21/0x70 [dm_mod]
 [] bio_endio+0x52/0x60
 [] blk_update_request+0x7c/0x2a0
 [] scsi_end_request+0x2e/0x1d0
 [] scsi_io_completion+0xb4/0x610
 [] scsi_finish_command+0xca/0x120
 [] scsi_softirq_done+0x120/0x140
 [] blk_done_softirq+0x76/0x90
 [] __do_softirq+0x10f/0x230
 [] do_softirq_own_stack+0x1c/0x30
---
 block/blk-softirq.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 53b1737..d739949 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -20,20 +20,26 @@ static DEFINE_PER_CPU(struct list_head, blk_cpu_done);
  */
 static void blk_done_softirq(struct softirq_action *h)
 {
-	struct list_head *cpu_list, local_list;
+	struct list_head *cpu_list = this_cpu_ptr(&blk_cpu_done);
+	struct request *rq;
+	int i;
 
 	local_irq_disable();
-	cpu_list = this_cpu_ptr(&blk_cpu_done);
-	list_replace_init(cpu_list, &local_list);
-	local_irq_enable();
-
-	while (!list_empty(&local_list)) {
-		struct request *rq;
-
-		rq = list_entry(local_list.next, struct request, ipi_list);
+	for (i = 64; i > 0; i--) {
+		if (list_empty(cpu_list))
+			goto done;
+		rq = list_first_entry(cpu_list, struct request, ipi_list);
 		list_del_init(&rq->ipi_list);
+		local_irq_enable();
+
 		rq->q->softirq_done_fn(rq);
+
+		local_irq_disable();
 	}
+	raise_softirq_irqoff(BLOCK_SOFTIRQ);
+
+done:
+	local_irq_enable();
 }
 
 #ifdef CONFIG_SMP
-- 
2.7.3

--------------050900080509050601030503
Content-Type: text/x-patch;
	name="0004-Avoid-that-I-O-completion-processing-triggers-lockup.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename*0="0004-Avoid-that-I-O-completion-processing-triggers-lockup.pa";
	filename*1="tch"

>>From a73fdf710b98922fa02d464af96b499ea2740832 Mon Sep 17 00:00:00 2001
From: Bart Van Assche
Date: Wed, 23 Mar 2016 14:38:13 -0700
Subject: [PATCH 4/4] Avoid that I/O completion processing triggers lockup
 complaints

Limit scsi_request_fn() to 64 requests per invocation and, when that
limit is reached, reschedule the queue via blk_delay_queue(). This
prevents I/O completion processing from triggering the following
complaints when kernel debug options that slow down the kernel
significantly are enabled:

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kdmwork-254:2:358]
irq event stamp: 15233868
hardirqs last enabled at (15233867): [] _raw_spin_unlock_irq+0x27/0x40
hardirqs last disabled at (15233868): [] apic_timer_interrupt+0x84/0x90
softirqs last enabled at (15233850): [] __do_softirq+0x1cb/0x230
softirqs last disabled at (15233743): [] irq_exit+0xa8/0xb0
CPU: 3 PID: 358 Comm: kdmwork-254:2
RIP: 0010:[] [] _raw_spin_unlock_irq+0x2f/0x40
Call Trace:
 [] scsi_request_fn+0x118/0x600
 [] __blk_run_queue+0x2e/0x40
 [] __elv_add_request+0x75/0x1f0
 [] blk_insert_cloned_request+0x101/0x190
 [] map_request+0x18e/0x210 [dm_mod]
 [] map_tio_request+0x1d/0x40 [dm_mod]
 [] kthread_worker_fn+0x7d/0x1a0
 [] kthread+0xea/0x100
 [] ret_from_fork+0x3f/0x70
INFO: rcu_sched self-detected stall on CPU
 3-...: (6497 ticks this GP) idle=fb9/140000000000002/0 softirq=2044956/2045037 fqs=5414
 (t=6500 jiffies g=219289 c=219288 q=7233211)
Task dump for CPU 3:
kdmwork-254:2 R running task 0 358 2 0x00000008
Call Trace:
 [] sched_show_task+0xbf/0x150
 [] dump_cpu_task+0x32/0x40
 [] rcu_dump_cpu_stacks+0x89/0xe0
 [] rcu_check_callbacks+0x439/0x730
 [] update_process_times+0x34/0x60
 [] tick_sched_handle.isra.18+0x20/0x50
 [] tick_sched_timer+0x38/0x70
 [] __hrtimer_run_queues+0xa5/0x1c0
 [] hrtimer_interrupt+0xa6/0x1b0
 [] smp_trace_apic_timer_interrupt+0x63/0x90
 [] smp_apic_timer_interrupt+0x9/0x10
 [] apic_timer_interrupt+0x89/0x90
 [] __slab_free+0xc6/0x270
 [] kmem_cache_free+0x159/0x160
 [] kiocb_free+0x32/0x40
 [] aio_complete+0x1e5/0x3c0
 [] dio_complete+0x75/0x1d0
 [] dio_bio_end_aio+0x7a/0x130
 [] bio_endio+0x3a/0x60
 [] blk_update_request+0x7c/0x2a0
 [] end_clone_bio+0x41/0x70 [dm_mod]
 [] bio_endio+0x3a/0x60
 [] blk_update_request+0x7c/0x2a0
 [] scsi_end_request+0x2e/0x1d0
 [] scsi_io_completion+0xb4/0x610
 [] scsi_finish_command+0xca/0x120
 [] scsi_softirq_done+0x120/0x140
 [] blk_done_softirq+0x72/0x90
 [] __do_softirq+0x10f/0x230
 [] irq_exit+0xa8/0xb0
 [] do_IRQ+0x65/0x110
 [] common_interrupt+0x89/0x89
 [] __multipath_map.isra.16+0x145/0x260 [dm_multipath]
 [] multipath_map+0x12/0x20 [dm_multipath]
 [] map_request+0x43/0x210 [dm_mod]
 [] map_tio_request+0x1d/0x40 [dm_mod]
 [] kthread_worker_fn+0x7d/0x1a0
 [] kthread+0xea/0x100
 [] ret_from_fork+0x3f/0x70

Signed-off-by: Bart Van Assche
Cc: Paul E. McKenney
---
 drivers/scsi/scsi_lib.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8106515..8f264a0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1770,13 +1770,14 @@ static void scsi_request_fn(struct request_queue *q)
 	struct Scsi_Host *shost;
 	struct scsi_cmnd *cmd;
 	struct request *req;
+	int i;
 
 	/*
-	 * To start with, we keep looping until the queue is empty, or until
-	 * the host is no longer able to accept any more requests.
+	 * Loop until the queue is empty, until the host is no longer able to
+	 * accept any more requests or until 64 requests have been processed.
 	 */
 	shost = sdev->host;
-	for (;;) {
+	for (i = 64; i > 0; i--) {
 		int rtn;
 		/*
 		 * get next queueable request. We do this early to make sure
@@ -1861,6 +1862,9 @@ static void scsi_request_fn(struct request_queue *q)
 		spin_lock_irq(q->queue_lock);
 	}
 
+	if (unlikely(i == 0))
+		blk_delay_queue(q, 0);
+
 	return;
 
 host_not_ready:
-- 
2.7.3

--------------050900080509050601030503--
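
Patches 3/4 and 4/4 share one idea: replace an unbounded loop with a
loop that handles at most 64 items and then asks to be run again. The
user-space sketch below only illustrates that pattern and is not kernel
code. The names work_queue, process_one() and drain_bounded() are
hypothetical, and the "rescheduled" flag merely stands in for what
raise_softirq_irqoff(BLOCK_SOFTIRQ) and blk_delay_queue(q, 0) do in the
patches.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_LIMIT 64	/* same budget as in patches 3/4 and 4/4 */

/* Hypothetical stand-in for a per-CPU completion list or request queue. */
struct work_queue {
	size_t pending;		/* items still waiting */
	size_t processed;	/* items handled so far */
	int rescheduled;	/* set when the budget ran out */
};

/* Hypothetical per-item handler; the real work would happen here. */
static void process_one(struct work_queue *q)
{
	q->pending--;
	q->processed++;
}

/*
 * Handle at most BATCH_LIMIT items and return how many were handled.
 * As in the patches, the loop counter reaching zero (budget exhausted)
 * is what triggers the request for another pass, not a check of the
 * remaining queue length.
 */
static size_t drain_bounded(struct work_queue *q)
{
	size_t handled = 0;
	int i;

	assert(q != NULL);

	for (i = BATCH_LIMIT; i > 0; i--) {
		if (q->pending == 0)
			break;
		process_one(q);
		handled++;
	}

	/* i == 0 means the budget ran out; more work may be waiting. */
	q->rescheduled = (i == 0);
	return handled;
}
```

With 100 pending items, a first call to drain_bounded() handles 64 of
them and sets the rescheduled flag; a second call handles the remaining
36 and leaves the flag clear. Whether 64 is the right budget is a tuning
question that this sketch does not try to answer.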