Subject: Re: [PATCH v2 3/3] drivers/base/memory: fix locking for poison accounting lookup
To: Muchun Song
CC: Muchun Song, Vishal Verma, Ying Huang, Dan Williams, Naoya Horiguchi, David Hildenbrand, Oscar Salvador, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Andrew Morton
References: <20260428085219.1316047-1-songmuchun@bytedance.com> <20260428085219.1316047-4-songmuchun@bytedance.com> <68DFF29C-B3CC-4950-8A8E-7D42350939CA@linux.dev>
From: Miaohe Lin
Message-ID: <3697dafa-7ff4-30d9-006b-860299421b63@huawei.com>
Date: Tue, 28 Apr 2026 20:34:26 +0800
In-Reply-To: <68DFF29C-B3CC-4950-8A8E-7D42350939CA@linux.dev>
X-Mailing-List: stable@vger.kernel.org

On 2026/4/28 19:40, Muchun Song wrote:
>
>
>> On Apr 28, 2026, at 19:37, Miaohe Lin wrote:
>>
>> On 2026/4/28 16:52, Muchun Song wrote:
>>> memblk_nr_poison_inc() and memblk_nr_poison_sub() call
>>> find_memory_block_by_id(), which requires device_hotplug_lock to
>>> serialize the xarray lookup against memory block removal.
>>>
>>> Take device_hotplug_lock around the lookup and nr_hwpoison update so
>>> the memory block cannot disappear between xa_load() and get_device().
>>>
>>> Fixes: 5033091de814 ("mm/hwpoison: introduce per-memory_block hwpoison counter")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Muchun Song
>>
>> Thanks for the update.
>>
>>> ---
>>>  drivers/base/memory.c | 10 ++++++++--
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 6981b55d582a..f76aee29e9a5 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -1228,23 +1228,29 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>>>  void memblk_nr_poison_inc(unsigned long pfn)
>>>  {
>>>  	const unsigned long block_id = pfn_to_block_id(pfn);
>>> -	struct memory_block *mem = find_memory_block_by_id(block_id);
>>> +	struct memory_block *mem;
>>>
>>> +	lock_device_hotplug();
>>
>> memblk_nr_poison_inc() and memblk_nr_poison_sub() are both called from memory_failure() context.
>> I'm afraid that if memory_failure() is triggered while lock_device_hotplug is held, it will lead to
>> a deadlock. Or am I missing something?
>
> I am curious: is there any place where memory_failure() is called while holding lock_device_hotplug?

Sorry for the dumb scenario, I was a bit too presumptuous. But there might be another possible deadlock:

remove_memory
  lock_device_hotplug <-- first called here
  try_remove_memory
    remove_memory_block_devices
      num_poisoned_pages_sub
        memblk_nr_poison_sub
          lock_device_hotplug <-- deadlock here

Hope I'm not mistaken again. :)

Thanks.
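
To make the call chain above concrete, here is a minimal user-space sketch, not kernel code. It assumes device_hotplug_lock can be modelled by a POSIX error-checking mutex, so that a second acquisition by the owning thread returns EDEADLK instead of hanging. The names hotplug_lock, hotplug_lock_init(), model_remove_memory() and model_memblk_nr_poison_sub() are hypothetical stand-ins, and the intermediate callers (try_remove_memory, remove_memory_block_devices, num_poisoned_pages_sub) are collapsed into a single nested call.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Stand-in for device_hotplug_lock (assumption: modelled as an
 * error-checking pthread mutex so a relock by the owner is reported
 * as EDEADLK rather than blocking).
 */
static pthread_mutex_t hotplug_lock;

static int hotplug_lock_init(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&hotplug_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Models memblk_nr_poison_sub() with the patch applied: it locks itself. */
static int model_memblk_nr_poison_sub(void)
{
	int ret = pthread_mutex_lock(&hotplug_lock);

	if (ret)
		return ret;	/* EDEADLK if the caller already owns the lock */
	/* ... block lookup and counter update would happen here ... */
	pthread_mutex_unlock(&hotplug_lock);
	return 0;
}

/*
 * Models remove_memory(): the lock is held across the whole removal
 * path, which ends up in model_memblk_nr_poison_sub().
 */
static int model_remove_memory(void)
{
	int ret = pthread_mutex_lock(&hotplug_lock);

	if (ret)
		return ret;
	ret = model_memblk_nr_poison_sub();	/* nested acquisition */
	pthread_mutex_unlock(&hotplug_lock);
	return ret;
}
```

After hotplug_lock_init(), calling model_memblk_nr_poison_sub() on its own succeeds, while model_remove_memory() returns EDEADLK from the nested acquisition. The kernel mutex is not recursive and has no such error return, so the same nesting there would block in the second lock_device_hotplug() instead.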