From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757008AbcGZQgM (ORCPT ); Tue, 26 Jul 2016 12:36:12 -0400
Received: from mail-qk0-f196.google.com ([209.85.220.196]:34535 "EHLO
	mail-qk0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754485AbcGZQgK (ORCPT );
	Tue, 26 Jul 2016 12:36:10 -0400
Subject: Re: [RFC PATCH] mm/hugetlb: Avoid soft lockup in set_max_huge_pages()
To: Dave Hansen, linux-mm@kvack.org
References: <1469547868-9814-1-git-send-email-hejianet@gmail.com>
	<579788BA.1040706@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Naoya Horiguchi,
	Mike Kravetz, "Kirill A. Shutemov", Michal Hocko, Paul Gortmaker
From: hejianet
Message-ID: <5797916B.2020008@gmail.com>
Date: Wed, 27 Jul 2016 00:35:55 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0)
	Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To: <579788BA.1040706@linux.intel.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/26/16 11:58 PM, Dave Hansen wrote:
> On 07/26/2016 08:44 AM, Jia He wrote:
>> This patch is to fix such soft lockup. I thought it is safe to call
>> cond_resched() because alloc_fresh_gigantic_page and alloc_fresh_huge_page
>> are out of spin_lock/unlock section.
> Yikes.  So the call site for both the things you patch is this:
>
>>         while (count > persistent_huge_pages(h)) {
> ...
>>                 spin_unlock(&hugetlb_lock);
>>                 if (hstate_is_gigantic(h))
>>                         ret = alloc_fresh_gigantic_page(h, nodes_allowed);
>>                 else
>>                         ret = alloc_fresh_huge_page(h, nodes_allowed);
>>                 spin_lock(&hugetlb_lock);
> and you choose to patch both of the alloc_*() functions.  Why not just
> fix it at the common call site?  Seems like that
> spin_lock(&hugetlb_lock) could be a cond_resched_lock() which would fix
> both cases.
>
> Also, putting that cond_resched() inside the for_each_node*() loop is an
> odd choice.  It seems to indicate that the loops can take a long time,
> which really isn't the case.  The _loop_ isn't long, right?

Yes, thanks for the suggestions.
Will send out V2 later.

B.R.
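
To make the suggestion above concrete, here is a rough userspace model of
the idea: one yield point at the common call site, per iteration of the
outer loop, instead of one inside each alloc_*() helper. This is only a
sketch and not kernel code. The names set_max_pages_model(),
alloc_fresh_page_model(), pool_lock, persistent_pages and yield_count are
all hypothetical stand-ins, a pthread mutex stands in for hugetlb_lock, and
sched_yield() stands in for cond_resched(). In the kernel,
cond_resched_lock() is called with the spinlock held and may drop it to
reschedule, so the real placement would differ from this model.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long persistent_pages;	/* protected by pool_lock */
static unsigned long yield_count;	/* protected by pool_lock */

/*
 * Stand-in for alloc_fresh_huge_page()/alloc_fresh_gigantic_page().
 * Assumption: it always succeeds, which the real allocators do not.
 */
static bool alloc_fresh_page_model(void)
{
	return true;
}

/*
 * Grow the pool until it holds `count` pages. The slow allocation runs
 * with the lock dropped, and the loop offers to give up the CPU exactly
 * once per iteration, before the lock is taken again.
 */
static unsigned long set_max_pages_model(unsigned long count)
{
	unsigned long result;
	bool ok;

	pthread_mutex_lock(&pool_lock);
	while (count > persistent_pages) {
		pthread_mutex_unlock(&pool_lock);

		ok = alloc_fresh_page_model();	/* slow work, unlocked */
		sched_yield();			/* single yield point */

		pthread_mutex_lock(&pool_lock);
		yield_count++;
		if (!ok)
			break;
		persistent_pages++;
	}
	result = persistent_pages;
	pthread_mutex_unlock(&pool_lock);

	return result;
}
```

The point of the model is that the number of yield opportunities follows
the number of pages requested, which is what grows without bound, and not
the number of nodes walked inside each allocation helper.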