From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Kani, Toshimitsu"
To: "dan.j.williams@intel.com", "linux-nvdimm@lists.01.org"
CC: "linux-kernel@vger.kernel.org", "dave.jiang@intel.com", "vishal.l.verma@intel.com"
Subject: Re: [PATCH] libnvdimm: rework region badblocks clearing
Date: Mon, 1 May 2017 15:34:32 +0000
Message-ID: <1493652871.30303.15.camel@hpe.com>
References: <149355594185.9917.1577772489949690281.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <149355594185.9917.1577772489949690281.stgit@dwillia2-desk3.amr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 2017-04-30 at 05:39 -0700, Dan Williams wrote:
> Toshi noticed that the new support for a region-level badblocks
> missed the case where errors are cleared due to BTT I/O.
>
> An initial attempt to fix this ran into a "sleeping while atomic"
> warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
> satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
> However, that lock is not needed since we are not acting any data
> that is subject to change due to a change of state of the bus /
> region. The badblocks instance has its own internal lock to handle
> mutations of the error list.
>
> So, to make it clear that we are just acting on region devices and
> don't need the lock rename __nvdimm_bus_badblocks_clear() to
> nvdimm_clear_badblocks_regions(). Eliminate the lock and consolidate
> all routines in drivers/nvdimm/bus.c. Also, make some cleanups to
> remove unnecessary casts, make the calling convention of
> nvdimm_clear_badblocks_regions() clearer by replacing struct resource
> with the minimal struct clear_badblocks_context, and use the
> DEVICE_ATTR macro.

Hi Dan,

I was testing the change with CONFIG_DEBUG_ATOMIC_SLEEP set this time,
and hit the following BUG with BTT. This is a separate issue (not
introduced by this patch), but it shows that we have an issue with the
DSM call path as well.

[ 1279.712933] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: 1: func: 1 input length: 16
[ 1279.721111] nvdimm in  00000000: 60000000 00000002 00001000 00000000  ...`............
[ 1279.729799] BUG: sleeping function called from invalid context at mm/slab.h:432
[ 1279.738005] in_atomic(): 1, irqs_disabled(): 0, pid: 13353, name: dd
[ 1279.745187] INFO: lockdep is turned off.
 :
[ 1279.767908] Call Trace:
[ 1279.771116]  dump_stack+0x86/0xc3
[ 1279.775201]  ___might_sleep+0x17d/0x250
[ 1279.779808]  __might_sleep+0x4a/0x80
[ 1279.784214]  __kmalloc+0x1c0/0x2e0
[ 1279.788388]  acpi_os_allocate_zeroed+0x2d/0x2f
[ 1279.793604]  acpi_evaluate_object+0x59/0x3b1
[ 1279.798640]  acpi_evaluate_dsm+0xbd/0x10c
[ 1279.803458]  acpi_nfit_ctl+0x1ef/0x7c0 [nfit]
[ 1279.808584]  ? nsio_rw_bytes+0x152/0x280
[ 1279.813258]  nvdimm_clear_poison+0x77/0x140
[ 1279.818193]  nsio_rw_bytes+0x18f/0x280
[ 1279.822684]  btt_write_pg+0x1d4/0x3d0 [nd_btt]
[ 1279.827869]  btt_make_request+0x119/0x2d0 [nd_btt]
[ 1279.833398]  ? generic_make_request+0xef/0x3b0
[ 1279.838575]  generic_make_request+0x122/0x3b0
[ 1279.843661]  ? iov_iter_get_pages+0xbd/0x380
[ 1279.848666]  submit_bio+0x73/0x150
[ 1279.852801]  ? bio_iov_iter_get_pages+0xd7/0x120
[ 1279.858166]  ? __blkdev_direct_IO_simple+0x17b/0x340
[ 1279.863877]  __blkdev_direct_IO_simple+0x177/0x340
[ 1279.869453]  ? bdput+0x20/0x20
[ 1279.873231]  blkdev_direct_IO+0x3b1/0x3c0
[ 1279.877963]  ? current_time+0x18/0x70
[ 1279.882344]  generic_file_direct_write+0xba/0x180
[ 1279.887765]  __generic_file_write_iter+0xc0/0x1c0
[ 1279.893185]  ? __clear_user+0x23/0x70
[ 1279.897550]  blkdev_write_iter+0x8b/0x100
[ 1279.902258]  ? __might_sleep+0x4a/0x80
[ 1279.906699]  __vfs_write+0xe8/0x160
[ 1279.910876]  vfs_write+0xcb/0x1f0
[ 1279.914867]  SyS_write+0x58/0xc0
[ 1279.918773]  do_syscall_64+0x6c/0x1f0
[ 1279.923120]  entry_SYSCALL64_slow_path+0x25/0x25

Thanks,
-Toshi
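
For anyone reading the trace without the kernel source at hand, the
failure mode can be sketched in plain userspace C. This is only a model
under stated assumptions, not kernel code: every model_* name below is
hypothetical, atomic_depth stands in for a non-zero preempt count, and
model_alloc_may_sleep() stands in for the sleeping allocation reached
from the DSM call. The trace itself does not show what made the BTT
write path atomic, so the sketch leaves that to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace model only. All model_* names are hypothetical stand-ins. */
static int atomic_depth;      /* models "in_atomic(): 1" when > 0 */
static int sleep_violations;  /* number of sleeping-while-atomic hits */

/* Models entering a section where sleeping is not allowed. */
void model_enter_atomic(void)
{
	atomic_depth++;
}

/* Models leaving that section. */
void model_leave_atomic(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Models the might-sleep debug check: complain if the caller is atomic. */
void model_might_sleep(const char *who)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		fprintf(stderr,
			"BUG: sleeping function %s called from invalid context\n",
			who);
	}
}

/* Models an allocation that is allowed to sleep. */
void *model_alloc_may_sleep(size_t size)
{
	model_might_sleep("model_alloc_may_sleep");
	return calloc(1, size);
}

/* Models a clear-poison call that allocates a command buffer on its way. */
int model_clear_poison(void)
{
	void *buf = model_alloc_may_sleep(16);

	if (!buf)
		return -1;
	free(buf);
	return 0;
}

/* Returns how many violations the model has recorded so far. */
int model_violations(void)
{
	return sleep_violations;
}
```

Calling model_clear_poison() on its own records no violation. Calling
it between model_enter_atomic() and model_leave_atomic() records one,
which is the shape of the report above: the allocation is fine by
itself, and the problem is the context it is reached from.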