From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754478AbbCBNqx (ORCPT ); Mon, 2 Mar 2015 08:46:53 -0500 Received: from cantor2.suse.de ([195.135.220.15]:59564 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753773AbbCBNqr (ORCPT ); Mon, 2 Mar 2015 08:46:47 -0500 Message-ID: <54F469C1.9090601@suse.cz> Date: Mon, 02 Mar 2015 14:46:41 +0100 From: Vlastimil Babka User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: David Rientjes , Andrew Morton CC: Christoph Lameter , Pekka Enberg , Joonsoo Kim , Johannes Weiner , Mel Gorman , Pravin Shelar , Jarno Rajahalme , Li Zefan , Greg Thelen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, netdev@vger.kernel.org, cgroups@vger.kernel.org, dev@openvswitch.org Subject: Re: [patch v2 1/3] mm: remove GFP_THISNODE References: In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/27/2015 11:16 PM, David Rientjes wrote: > NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE. > > GFP_THISNODE is a secret combination of gfp bits that have different > behavior than expected. It is a combination of __GFP_THISNODE, > __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page allocator > slowpath to fail without trying reclaim even though it may be used in > combination with __GFP_WAIT. > > An example of the problem this creates: commit e97ca8e5b864 ("mm: fix > GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE > that really just wanted __GFP_THISNODE. The problem doesn't end there, > however, because even it was a no-op for alloc_misplaced_dst_page(), > which also sets __GFP_NORETRY and __GFP_NOWARN, and > migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT > is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is > a no-op in these cases since the page allocator special-cases > __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN. > > It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE > to restrict an allocation to a local node, but remove GFP_THISNODE and > its obscurity. Instead, we require that a caller clear __GFP_WAIT if it > wants to avoid reclaim. > > This allows the aforementioned functions to actually reclaim as they > should. It also enables any future callers that want to do > __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The > rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT. > > Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so > it is unchanged. > > Signed-off-by: David Rientjes Acked-by: Vlastimil Babka So you've convinced me that this is a non-functional change for slab and a prerequisity for patch 2/3 which is the main point of this series for 4.0. That said, the new 'goto nopage' condition is still triggered by a combination of flag states, and the less we have those, the better for us IMHO. Looking at commit 952f3b51be which introduced this particular check using GFP_THISNODE, it seemed like it was a workaround to avoid triggering reclaim, without needing a new gfp flag. Nowadays, we have such flag called __GFP_NO_KSWAPD and as I explained in my reply to v1 (where I missed the new condition), passing the flag would already prevent wake_all_kswapds() and treating the allocation as atomic in gfp_to_alloc_flags(). So the whole difference would be another get_page_from_freelist() attempt (possibly less constrained than the fast path one) and then bail out on !wait. So it would be IMHO better for longer-term maintainability to have the relevant __GFP_THISNODE callers pass also __GFP_NO_KSWAPD to denote these opportunistic allocation attempts, instead of having this subtle semantic difference attached to __GFP_THISNODE && !__GFP_WAIT. It would be probably too risky for 4.0. On the other hand, I don't think even this series is really urgent. It's true that patch 2/3 fixes two commits, including a 4.0 one, but those commits are already not regressions without the fix. But maybe current -rcX is low enough to proceed with this series and catch any regressions in time, allowing the larger cleanups later.