From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5447E210.8020902@codeaurora.org>
Date: Wed, 22 Oct 2014 09:57:52 -0700
From: Laura Abbott
Subject: Deadlock with CMA and CPU hotplug
To: mgorman@suse.de, m.szyprowski@samsung.com, mina86@mina86.com
Cc: linux-mm@kvack.org, Peter Zijlstra, Ingo Molnar,
 linux-kernel@vger.kernel.org, pratikp@codeaurora.org
Sender: owner-linux-mm@kvack.org

Hi,

We've run into an AB/BA deadlock situation involving a driver lock and
the CPU hotplug lock on a 3.10 based kernel. The situation is this:

        CPU 0                           CPU 1
        -----                           ----
Start CPU hotplug
mutex_lock(&cpu_hotplug.lock)
Run CPU hotplug notifier
                                data for driver comes in
                                mutex_lock(&driver_lock)
                                driver calls dma_alloc_coherent
                                alloc_contig_range
                                lru_add_drain_all
                                get_online_cpus()
                                mutex_lock(&cpu_hotplug.lock)

Driver hotplug notifier runs
mutex_lock(&driver_lock)

The driver itself is out of tree right now[1] and we're looking at
ways to rework the driver. The best option for rework right now
though might result in some performance penalties. The size that's
being allocated can't easily be converted to an atomic allocation either.
It seems like this might be a limitation of where CMA/dma_alloc_coherent
could potentially be used and make drivers unnecessarily aware of CPU
hotplug locking.

Does this seem like an actual problem that needs to be fixed or
is trying to use CMA in a CPU hotplug notifier path just asking
for trouble?

Thanks,
Laura

[1] For reference, the driver is a version of
https://lkml.org/lkml/2014/10/7/495 although that particular posted
version allocates memory at probe instead of runtime and probably
doesn't have the deadlock.

--
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-id: <544F66A2.1080302@samsung.com>
Date: Tue, 28 Oct 2014 10:49:22 +0100
From: Marek Szyprowski
Subject: Re: Deadlock with CMA and CPU hotplug
References: <5447E210.8020902@codeaurora.org>
In-reply-to: <5447E210.8020902@codeaurora.org>
To: Laura Abbott, mgorman@suse.de, mina86@mina86.com
Cc: linux-mm@kvack.org, Peter Zijlstra, Ingo Molnar,
 linux-kernel@vger.kernel.org, pratikp@codeaurora.org
Sender: owner-linux-mm@kvack.org

Hello,

On 2014-10-22 18:57, Laura Abbott wrote:
> We've run into an AB/BA deadlock situation involving a driver lock and
> the CPU hotplug lock on a 3.10 based kernel. The situation is this:
>
>         CPU 0                           CPU 1
>         -----                           ----
> Start CPU hotplug
> mutex_lock(&cpu_hotplug.lock)
> Run CPU hotplug notifier
>                                 data for driver comes in
>                                 mutex_lock(&driver_lock)
>                                 driver calls dma_alloc_coherent
>                                 alloc_contig_range
>                                 lru_add_drain_all
>                                 get_online_cpus()
>                                 mutex_lock(&cpu_hotplug.lock)
>
> Driver hotplug notifier runs
> mutex_lock(&driver_lock)
>
> The driver itself is out of tree right now[1] and we're looking at
> ways to rework the driver. The best option for rework right now
> though might result in some performance penalties.
> The size that's being allocated can't easily be converted to an atomic
> allocation either.
> It seems like this might be a limitation of where CMA/dma_alloc_coherent
> could potentially be used and make drivers unnecessarily aware of CPU
> hotplug locking.
>
> Does this seem like an actual problem that needs to be fixed or
> is trying to use CMA in a CPU hotplug notifier path just asking
> for trouble?

IMHO doing any allocation without GFP_ATOMIC from a notifier is asking
for problems. I always considered notifiers as callbacks that might be
called directly from e.g. interrupts. I don't know much about your code,
but maybe it would be possible to move the problematic code from a
notifier to a separate worker or thread?

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
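
The suggestion above (move the problematic code out of the notifier into a separate worker) can be sketched in userspace C with pthreads. This is only an illustration under stated assumptions, not the driver's code: hotplug_lock and driver_lock are hypothetical stand-ins for cpu_hotplug.lock and the driver lock, and queue_lock, driver_worker and run_demo are names invented for this sketch. The notifier only records an event while the hotplug lock is held; the worker takes the driver lock later, so the hotplug path never waits on the driver lock and the AB/BA cycle cannot form.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for cpu_hotplug.lock and the driver lock. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand-off state between notifier and worker (invented for this sketch). */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;
static int pending_events;
static bool stopping;

static int handled_events; /* protected by driver_lock */
static int allocations;    /* protected by driver_lock */

/* Runs with hotplug_lock held: only queues an event, never takes driver_lock. */
static void driver_hotplug_notifier(void)
{
    pthread_mutex_lock(&queue_lock);
    pending_events++;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

/* Worker: handles queued events under driver_lock, outside hotplug_lock. */
static void *driver_worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (pending_events == 0 && !stopping)
            pthread_cond_wait(&queue_cond, &queue_lock);
        if (pending_events == 0) { /* stopping and fully drained */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        pending_events--;
        pthread_mutex_unlock(&queue_lock);

        pthread_mutex_lock(&driver_lock);
        handled_events++;
        pthread_mutex_unlock(&driver_lock);
    }
}

/* Hotplug side: takes hotplug_lock, then runs the notifier. */
static void *hotplug_thread(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&hotplug_lock);
        driver_hotplug_notifier();
        pthread_mutex_unlock(&hotplug_lock);
    }
    return NULL;
}

/* Data path: driver_lock first, then hotplug_lock, mirroring
 * dma_alloc_coherent -> ... -> get_online_cpus() in the report. */
static void *data_thread(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&driver_lock);
        pthread_mutex_lock(&hotplug_lock);
        allocations++;
        pthread_mutex_unlock(&hotplug_lock);
        pthread_mutex_unlock(&driver_lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the number of events handled. */
int run_demo(int rounds)
{
    pthread_t worker, hotplug, data;

    pending_events = handled_events = allocations = 0;
    stopping = false;

    pthread_create(&worker, NULL, driver_worker, NULL);
    pthread_create(&hotplug, NULL, hotplug_thread, &rounds);
    pthread_create(&data, NULL, data_thread, &rounds);
    pthread_join(hotplug, NULL);
    pthread_join(data, NULL);

    /* Ask the worker to exit once the queue is drained. */
    pthread_mutex_lock(&queue_lock);
    stopping = true;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
    pthread_join(worker, NULL);

    return handled_events;
}
```

Calling run_demo(1000) from a test program exercises the hotplug path and the data path at the same time and returns how many events the worker handled. In a kernel driver the analogous step would probably be schedule_work() from the notifier, and it only applies if the notifier does not have to finish its work before the hotplug operation proceeds.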