From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bart Van Assche Subject: Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device() Date: Thu, 22 Oct 2015 10:33:06 -0700 Message-ID: <56291DD2.90104@sandisk.com> References: <1445533954-19857-1-git-send-email-vkuznets@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from mail-bl2on0061.outbound.protection.outlook.com ([65.55.169.61]:39904 "EHLO na01-bl2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756803AbbJVRdO (ORCPT ); Thu, 22 Oct 2015 13:33:14 -0400 In-Reply-To: <1445533954-19857-1-git-send-email-vkuznets@redhat.com> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: Vitaly Kuznetsov , "James E.J. Bottomley" Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, "K. Y. Srinivasan" On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote: > On some host errors storvsc module tries to remove sdev by scheduling a job > which does the following: > > sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun); > if (sdev) { > scsi_remove_device(sdev); > scsi_device_put(sdev); > } > > While this code seems correct the following crash is observed: > > general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC > RIP: 0010:[] [] bdi_destroy+0x39/0x220 > ... > [] ? _raw_spin_unlock_irq+0x2c/0x40 > [] blk_cleanup_queue+0x17b/0x270 > [] __scsi_remove_device+0x54/0xd0 [scsi_mod] > [] scsi_remove_device+0x2b/0x40 [scsi_mod] > [] storvsc_remove_lun+0x3d/0x60 [hv_storvsc] > [] process_one_work+0x1b1/0x530 > ... > > The problem comes with the fact that many such jobs (for the same device) > are being scheduled simultaneously. While scsi_remove_device() uses > shost->scan_mutex and scsi_device_lookup() will fail for a device in > SDEV_DEL state there is no protection against someone who did > scsi_device_lookup() before we actually entered __scsi_remove_device(). So > the whole scenario looks like that: two callers do simultaneous (or > preemption happens) calls to scsi_device_lookup() ant these calls succeed > for all of them, after that both callers try doing scsi_remove_device(). > shost->scan_mutex only serializes their calls to __scsi_remove_device() > and we end up doing the cleanup path twice. > > Signed-off-by: Vitaly Kuznetsov > --- > drivers/scsi/scsi_sysfs.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index b333389..e0d2707 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev) > { > struct device *dev = &sdev->sdev_gendev; > > + /* > + * This cleanup path is not reentrant and while it is impossible > + * to get a new reference with scsi_device_get() someone can still > + * hold a previously acquired one. > + */ > + if (sdev->sdev_state == SDEV_DEL) > + return; > + > if (sdev->is_visible) { > if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) > return; Hello Vitaly, Sorry but I don't see how the above patch could be a proper fix. If two calls to __scsi_remove_device() occur concurrently the crash explained above can still occur. The storsvc driver should be modified such that concurrent __scsi_remove_device() calls do not occur. How about preventing concurrent calls via a mutex ? Another possible approach is to use the workqueue mechanism. An example can be found in the SRP initiator driver (ib_srp). Bart. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757911AbbJVRdZ (ORCPT ); Thu, 22 Oct 2015 13:33:25 -0400 Received: from mail-bl2on0061.outbound.protection.outlook.com ([65.55.169.61]:39904 "EHLO na01-bl2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1756803AbbJVRdO (ORCPT ); Thu, 22 Oct 2015 13:33:14 -0400 Authentication-Results: spf=pass (sender IP is 63.163.107.172) smtp.mailfrom=sandisk.com; vger.kernel.org; dkim=none (message not signed) header.d=none;vger.kernel.org; dmarc=bestguesspass action=none header.from=sandisk.com; X-AuditID: ac160a68-f790b6d00000123b-09-56291dd31aa0 Subject: Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device() To: Vitaly Kuznetsov , "James E.J. Bottomley" References: <1445533954-19857-1-git-send-email-vkuznets@redhat.com> CC: , , "K. Y. Srinivasan" From: Bart Van Assche Message-ID: <56291DD2.90104@sandisk.com> Date: Thu, 22 Oct 2015 10:33:06 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <1445533954-19857-1-git-send-email-vkuznets@redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrFLMWRmVeSWpSXmKPExsWyRoxnke5lWc0wg93vZSz+r7/NYvFk9VZ2 i8u75rBZdF/fwWYxqfUyswOrR+uOv+weh3/8YPZ4v+8qm8fnTXIBLFFcNimpOZllqUX6dglc GY928xT8EavouXaZvYHxplAXIyeHhICJxLOte1khbDGJC/fWs3UxcnEICZxglFh6vocVwtnB KPF6+WNWmI43O5YwQiQ2MUosf7KSHSQhLBArcXLxWiCbg0NEIEri/pwckLCQgLPEzm9nmUFs ZoEsiXvzJ4PNYRMwkvj2fiYLSDmvgIbE1w9aIGEWAVWJPU23wEpEBSIkJk5oALN5BQQlTs58 AlbOKeAi8XGCHYjJLGAv8WBrGcRweYntb+cwgxwmIbCXVeLlxOUsEBeoS5xcMp9pAqPILCST ZiG0z0LSvoCReRWjWG5mTnFuemqBoaFecWJeSmZxtl5yfu4mRnCMcGXsYNw6yfwQowAHoxIP r8Z/9TAh1sSy4srcQ4wSHMxKIrz9TzXChHhTEiurUovy44tKc1KLDzFKc7AoifNat6iFCQmk J5akZqemFqQWwWSZODilGhgzjYSU73v/fmVgr/Gv2LYxZsP370IMgX1TjBMcq4WbPN8+XrNH deaJtumJy0JiFJa++8+13vOd8n3udeZH7viF/1p8Uv3nJ3Xl1qcyvj3v1hpm/3zD9jVp69O1 pZtZljM7p77meFbX/tpv5/ejTUpqrxsLk2pUNm2yeM2aY/hIOCGTKVEj94oSS3FGoqEWc1Fx IgDrg8g0jQIAAA== X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrEJMWRmVeSWpSXmKPExsXCtZEjRfeSrGaYQd9mK4v/62+zWDxZvZXd 4vKuOWwW3dd3sFlMar3M7MDq0brjL7vH4R8/mD3e77vK5vF5k1wASxSXTUpqTmZZapG+XQJX xqPdPAV/xCp6rl1mb2C8KdTFyMkhIWAi8WbHEkYIW0ziwr31bF2MXBxCAhsYJea+/cYKkhAW iJU4uXgtO4gtIhAlsaThHhuILSTgLLHz21lmEJtZIE1i9qG5YDVsAkYS397PZOli5ODgFdCQ +PpBCyTMIqAqsafpFthIUYEIiYkTGsBsXgFBiZMzn4CVcwq4SHycYAcx0VbiztzdUNPlJba/ ncM8gZF/FpKOWUjKZiEpW8DIvIpRLDczpzg3PbPA0FCvODEvJbM4Wy85P3cTIzhUOSN3MD6d aH6IkYmDU6qBsay4WchIPsL1aWvCtQfzJJKWS1dbb6+4avsu7UjYz54rInFTNlqf/Wpj7/Fv 2vcXe2YsO3Im5FSgc8TT8vIOr4+BdlWWj2f2TrgryvHv0MnXx+7nLQ42vDAl6GHV3FfyL9h6 LK+r7LatWBGdEtfjI/Eq8fwcpQP/HqSc3n5c0LbcyflHQWSqnRJLcUaioRZzUXEiADIqWacF AgAA X-EOPAttributedMessage: 0 X-Microsoft-Exchange-Diagnostics: 1;BY2FFO11OLC009;1:jEqyP/ln2rzM+SKuQ5DumJR3x9RJb6qMDt1h4yJyxq1SwDhUwbTV+BWUE3/l1cUdVnFTRIIOS1PGVxHUwMEsix2BjMZIV92fM6Q1n15K0n54mKLzU78cEWA8/Bl7KUDCIFNEvegA91xmMzXY8xNwcmgkuNIAPlAo8aU6nO44oJW+ac0od5dJWQlhDgR+dPlcd4j1AT4/4goSoZIpsTy1tSrRj5vleKIiT7GNcA1rpG1u3f+WK7w4odSXW54dlKlobmoaj5BNV6TAYAkKFQLNEg2xkP6brHvMQFJbE+bfz9EMfuzm+3GhYpmjLEPaiUpaNk0FOGikS6Yzat+NlMk5WubEawYhTFz5R1g63ZoB+8rWRpuo0GyPUeLrpxkjb9U+ X-Forefront-Antispam-Report: CIP:63.163.107.172;CTRY:US;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(10009020)(6009001)(2980300002)(438002)(24454002)(189002)(199003)(479174004)(377454003)(86362001)(5008740100001)(189998001)(5001960100002)(65816999)(76176999)(54356999)(575784001)(50986999)(106466001)(64706001)(65806001)(36756003)(33656002)(65956001)(47776003)(77096005)(2950100001)(87936001)(19580405001)(92566002)(5001920100001)(19580395003)(50466002)(69596002)(5007970100001)(46102003)(83506001)(64126003)(97736004)(4001350100001)(11100500001)(23746002)(5001770100001)(81156007);DIR:OUT;SFP:1101;SCL:1;SRVR:BY1PR02MB1257;H:milsmgep11.sandisk.com;FPR:;SPF:Pass;PTR:ErrorRetry;MX:1;A:1;LANG:en; X-Microsoft-Exchange-Diagnostics: 1;BY1PR02MB1257;2:mQ9KathTs1q+ZE8tYKswIW8TqO3Dl3e980k1VkDjrqlgln7RitEN96Kpuo23LkLoClZxPtlA658TJRvtYWoRf/9eFl/ID6xuXSJVn7ON6KKpzq2Z+25fwHutn5fDYr0+2tDJPljJt6i8e4tEdmCISXXY6La3M96mKDL3oSbpvzA=;3:ujN9PdIB0oW8RNwewUgz7czJoHMk8FrTPOHuYB/QGdTZVFzLdSpVjVc5nRqp0g81rucTkoE1nhly71BkYwPzAYAMucvl3DYJTLAdTH+UiPF+bz6ZOZ77J5ofEEVZmU3vM1OCYJpMps8u/vgU0uS2FwQuD/CFxLt5wC5LDuHQIO5sqA9pME+OykyMnYJNBYmx7tsRIg8TqnlFAxn7NWCcNfkV2Qie1w9Dua67aQHyO9k8VUqboLn4sgzgUm87SoaZevn2Unrtq3VhCVLy313JZw==;25:ai8tdCSMQwevgRSuet9MTokYICHYomsZiloC7Artn8jQGcxO7pvNAPADMTLmhWeusS+VqZQp73J64x05v8b6hxEY6ayxzqx4p7ekSVUW85U/lToQ7iGvFRJJOSf2fxNgWqzuAHiYvqnpRYpdZbINRPr/7gXbLX+y9TtLpS/PjOy7VkNm1S5OU5YCQQy2PD3OTj6XTKotAsVIwWRNslYTpp6zWbIao+7m53cH0ZQ0T7yNMqIi44SiybgOcfWVTZFBZznbPzblzvaCV8N194bjqA== X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(8251501001);SRVR:BY1PR02MB1257; X-Microsoft-Exchange-Diagnostics: 1;BY1PR02MB1257;20:Af2GLre6noau5it86sUGs8pUkCgnMVreYjL/AApAuLhug1yXJVLpomyoPqT6ed5k9R3YZp9rP6fjhIJe8chfAiMkCACpASPgILi0R/9wcbr0t4BIol7ADUUc2lAq5zi86iTBHdYk/ap/s617cbzbiLBeMDVuljtq0TlFbEjcffCMLu+cxCOEd7yzqc5NUpNHtcOSztF8aMtPHgmRZL+MzDassVlbK791/0k+UH9O0MqRZ7cqLRB9VMDqzS7z36oOz5BYlwmTBxRJfSVyREu9E20uBW/RvA7Q80LDkKel1KHDrv/t0e3YtKnzykEeJftRW1Emlur6t/YBMRNLiI50V+Q9qce9rAnPKWR0G3lOCBgOhHlwQnOiau+RGbjkuZDVkmlcYnD9iQLQ79IAQZ6g5LsJasVcJJlA+z38aogbZMlB0UPtKqF/GphmEsFVyWh+jYD79FZfYTGMDhDosd4z7E68TbfnBG0rg4RgIR9x/xn+E+xm9mX5Vj5tMGkArt7p;4:NLJIzotD96ShTeiHrfFKbh3n80amI71/j1a+hkUrw7m2qQduClo1SSCHq91cgNl7Sc6Uxo3scErbutHvqexg5TiDsAxr5dKWa1IrpWOoEv9clA78RxLl5oZBMO2C0W2RrVleGqQA68PC2oiLE8gtVQvhDdnXWHsOQILAaXNRsgImh9AYdlMdrKLhkXn9aTA0zpe4illyMi9TFYblpUJFPGe/wShdBTn162PceTjR/PKQ2cxemn86qM9OqPef4RFzujsMHMBo3Rxq17VhnRFSJ58uaKeRWzDWKle44N03GlgCxsDjuQbgSttbuKlP/XND2L+I/rkLgW33TvvR4lS9x3uNomOuWvKxYBb45hmlpl1c1411MhBjyUuXmXbXa2il X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(520078)(5005006)(8121501046)(3002001)(102215026);SRVR:BY1PR02MB1257;BCL:0;PCL:0;RULEID:;SRVR:BY1PR02MB1257; X-Forefront-PRVS: 0737B96801 X-Microsoft-Exchange-Diagnostics: =?Windows-1252?Q?1;BY1PR02MB1257;23:6ulF4JQJuet0vHUwvq2EvknFSJeVp4fupoebb?= =?Windows-1252?Q?q2hGa+mezuEOfpkYctijCL0U6E17Y0MzACSMtIkHl8Wulh7cgqptTilO?= =?Windows-1252?Q?gfWaAo0X14V0Ua1kzeS3rC+LbpuX1rRtjvo6k0WykAn/IELtZ2hrsPBf?= =?Windows-1252?Q?KITDZ3SUF/6cy0ylmo70C3vZqy+V6AlhrlPse/LZLYrh3EQV4SwnZ08r?= =?Windows-1252?Q?J2hkanYtLPNI2xCodNWIISoLVBZmTeuDHPEhMzJsJOKiZK6Afqt19EW+?= =?Windows-1252?Q?7+Z+4JGYlEM2UwCYfA6gvelpxJcDh3CyHEXWMW2roCuW6mpY1t9/FgEd?= =?Windows-1252?Q?mI0IrP8iBzLIHlBKFDiJWWyXx/9k+rE3Fg++IpcdyvsTbFrMlWynuWXx?= =?Windows-1252?Q?Zeedd/2vE0ZamlpyqFtp4STO3QMqCwxmgnItAgqUQP90ljvw+sOrS2ak?= =?Windows-1252?Q?fltJp5T3pzsdWKNBDniK5HX7z/LEWecxwoP1MXbYU6V7313k7vlu2W3A?= =?Windows-1252?Q?/5IJaOCA/ImxH5zD7CHAVlQmoBzdYZhrFgZCLXE5LFlLmQz+UN+Fp6Zk?= =?Windows-1252?Q?tx+Ivjx5Greo7yCt7T5Mg8KnLIAFeDeQi5az00/jCHdvWIyn2PtEJKO0?= =?Windows-1252?Q?OGo5COruCBAde4qVA3lP/SLWlXuDGVmNGgoLykx/5/aTDYtBH5vytdk0?= =?Windows-1252?Q?mfFnJRmnrQO5znJoaexuR22Gx+V8V0JsRtKUoD0d/+vakJjOukpbrh7B?= =?Windows-1252?Q?Ae//eVOMPa+OwMPl1pVTCZmLdj+hRjSgtxVnfw2FNSVx//1+pn71NFEH?= =?Windows-1252?Q?cB4POlJkinLSHjOz123xluGVj4i4lcwmhbnOnjVWn63HsyTeVyNbtxZT?= =?Windows-1252?Q?Of1Rjd4MtNAcP+pitKqN0TwmSnvaKnob57tHpGdOTh4BO5oBF1WNaWnO?= =?Windows-1252?Q?ybTzu4exFlf1xZYEMr4QFvI3AHTjBajkHqZU//5f0GUY3fdLYiAsxu3c?= =?Windows-1252?Q?iPtBxhW3qE4X16yhNEpTbKbRBoSTmx48J0k26TK78s1xxZNDktHETaDq?= =?Windows-1252?Q?SPTD9LyvMXM8s8xz/AEpD/PV04v/FNzH+krZwXKg9mg7oEkQvfqVWBrO?= =?Windows-1252?Q?qIMeDK3lo637ahrDauAA5II6bGVGRlWxjjv3J65krXyUFFBwrKYmGALM?= =?Windows-1252?Q?FsTUrDRYA0vTGSvUHJitUoJh0QPVzO2sLuZfSWEJgRskxzOKY3hNWSYq?= =?Windows-1252?Q?fGN6Q2z9AXM/dSagSYMxTIUfbS7aJ2Vy1xnbo4=3D?= X-Microsoft-Exchange-Diagnostics: 1;BY1PR02MB1257;5:8YmvcjlCq/fZ8D2PaQvcOFSE5XD/ACvdZ2yntqUaXz9IwHSDcz8/jK4max2GFi3zekaMbLn2CoBep8VvgGNOu04ICkwEuHNr62Rqz3+obFFF1oAuCplBdPFDzauspdP6bHXwxpPKsRD31b7ub9y2gw==;24:tQ7oSTlLZ6+F1RrBF9egvTE6fFBdCvtzH6boLktvixxnkdpsvvd9Y92JlnDjquraNfxn5QXXX+HkUywaRwfpjHtS7QMq5q+Pbe45v5Qegn4=;20:M6B7G2xZkUZFB6a9tGpxFxoEP462DsJFcOy4FM81PYZQKCJIWRJY5Z4ooVOCjX1pbYeRGDAkdJjcl8WnAhkDkgH6dISzy/3t58NoHKLKIrfsIbzrSTHkXQ1R6jFMfS6xEoxYA/9YO20Mt/qChD2Ot3Jx2BDBN2N3PJBtJ8wCP5kbevRfHnTXG//OVhULgIKEwtka1+HBX/qk9ZHaDUoO1IF3haWFSxUM6HJZs6EwesAAIm4B92zQWq/gQrdtu2Hq SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: sandisk.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Oct 2015 17:33:07.5403 (UTC) X-MS-Exchange-CrossTenant-Id: fcd9ea9c-ae8c-460c-ab3c-3db42d7ac64d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=fcd9ea9c-ae8c-460c-ab3c-3db42d7ac64d;Ip=[63.163.107.172];Helo=[milsmgep11.sandisk.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY1PR02MB1257 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote: > On some host errors storvsc module tries to remove sdev by scheduling a job > which does the following: > > sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun); > if (sdev) { > scsi_remove_device(sdev); > scsi_device_put(sdev); > } > > While this code seems correct the following crash is observed: > > general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC > RIP: 0010:[] [] bdi_destroy+0x39/0x220 > ... > [] ? _raw_spin_unlock_irq+0x2c/0x40 > [] blk_cleanup_queue+0x17b/0x270 > [] __scsi_remove_device+0x54/0xd0 [scsi_mod] > [] scsi_remove_device+0x2b/0x40 [scsi_mod] > [] storvsc_remove_lun+0x3d/0x60 [hv_storvsc] > [] process_one_work+0x1b1/0x530 > ... > > The problem comes with the fact that many such jobs (for the same device) > are being scheduled simultaneously. While scsi_remove_device() uses > shost->scan_mutex and scsi_device_lookup() will fail for a device in > SDEV_DEL state there is no protection against someone who did > scsi_device_lookup() before we actually entered __scsi_remove_device(). So > the whole scenario looks like that: two callers do simultaneous (or > preemption happens) calls to scsi_device_lookup() ant these calls succeed > for all of them, after that both callers try doing scsi_remove_device(). > shost->scan_mutex only serializes their calls to __scsi_remove_device() > and we end up doing the cleanup path twice. > > Signed-off-by: Vitaly Kuznetsov > --- > drivers/scsi/scsi_sysfs.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index b333389..e0d2707 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev) > { > struct device *dev = &sdev->sdev_gendev; > > + /* > + * This cleanup path is not reentrant and while it is impossible > + * to get a new reference with scsi_device_get() someone can still > + * hold a previously acquired one. > + */ > + if (sdev->sdev_state == SDEV_DEL) > + return; > + > if (sdev->is_visible) { > if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) > return; Hello Vitaly, Sorry but I don't see how the above patch could be a proper fix. If two calls to __scsi_remove_device() occur concurrently the crash explained above can still occur. The storsvc driver should be modified such that concurrent __scsi_remove_device() calls do not occur. How about preventing concurrent calls via a mutex ? Another possible approach is to use the workqueue mechanism. An example can be found in the SRP initiator driver (ib_srp). Bart.