Date: Wed, 27 Apr 2011 09:18:11 +1000
From: NeilBrown
To: Mel Gorman
Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Peter Zijlstra
Subject: Re: [PATCH 12/13] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage
Message-ID: <20110427091811.153ca78b@notabene.brown>
In-Reply-To: <20110426142624.GH4658@suse.de>
References: <1303803414-5937-1-git-send-email-mgorman@suse.de> <1303803414-5937-13-git-send-email-mgorman@suse.de> <20110426223059.10f3edda@notabene.brown> <20110426142624.GH4658@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 26 Apr 2011 15:26:24 +0100 Mel Gorman wrote:

> On Tue, Apr 26, 2011 at 10:30:59PM +1000, NeilBrown wrote:
> > On Tue, 26 Apr 2011 08:36:53 +0100 Mel Gorman wrote:
> > 
> > > 
> > > +/*
> > > + * Throttle direct reclaimers if backing storage is backed by the network
> > > + * and the PFMEMALLOC reserve for the preferred node is getting dangerously
> > > + * depleted. kswapd will continue to make progress and wake the processes
> > > + * when the low watermark is reached
> > > + */
> > > +static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> > > +					nodemask_t *nodemask)
> > > +{
> > > +	struct zone *zone;
> > > +	int high_zoneidx = gfp_zone(gfp_mask);
> > > +	DEFINE_WAIT(wait);
> > > +
> > > +	/* Check if the pfmemalloc reserves are ok */
> > > +	first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> > > +	prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > > +							TASK_INTERRUPTIBLE);
> > > +	if (pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx))
> > > +		goto out;
> > > +
> > > +	/* Throttle */
> > > +	do {
> > > +		schedule();
> > > +		finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > > +		prepare_to_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait,
> > > +							TASK_INTERRUPTIBLE);
> > > +	} while (!pfmemalloc_watermark_ok(zone->zone_pgdat, high_zoneidx) &&
> > > +			!fatal_signal_pending(current));
> > > +
> > > +out:
> > > +	finish_wait(&zone->zone_pgdat->pfmemalloc_wait, &wait);
> > > +}
> > 
> > You are doing an interruptible wait, but only checking for fatal signals.
> > So if a non-fatal signal arrives, you will busy-wait.
> > 
> > So I suspect you want TASK_KILLABLE, so just use:
> > 
> > 	wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
> > 			    pfmemalloc_watermark_ok(zone->zone_pgdat,
> > 						    high_zoneidx));
> > 
> 
> Well, if a normal signal arrives, we do not necessarily want the
> process to enter reclaim. For fatal signals, I allow it to continue
> because it's not likely to be putting the system under more pressure
> if it's exiting.

Yep, I understand that and it doesn't seem unreasonable.
However, I don't think the code implements that correctly.
If a non-fatal signal arrives, schedule() will return immediately (the task
is in TASK_INTERRUPTIBLE with a signal pending), and the 'while' condition
will still be true because the signal is not fatal. So the loop goes around
and calls schedule() again, which again returns immediately: a busy loop.

> > 
> > (You also have an extraneous call to finish_wait)
> > 
> 
> Which one? I'm not seeing a flow where finish_wait gets called twice
> without a prepare_to_wait in between.
> 

You don't need to call finish_wait immediately before prepare_to_wait:
prepare_to_wait only re-adds the entry if it is not already on the queue,
and then sets the task state, so the finish_wait inside the loop is
redundant. It really is best to just use the appropriate 'wait_event*' macro.

NeilBrown
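The busy-loop argument above can be checked with a toy userspace model. This is a sketch, not kernel code: `model_state`, `model_schedule()` and `model_throttle_spins()` are hypothetical names that only mimic the control flow of the quoted do/while loop, assuming the watermark never recovers. The model counts how often schedule() would return without sleeping; an iteration cap stands in for what would otherwise be an unbounded spin.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the throttle loop's control flow. All names here are
 * hypothetical stand-ins, not the kernel scheduler API.
 */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE, MODEL_KILLABLE };

struct model_task {
	enum model_state state;
	bool nonfatal_pending;	/* e.g. a SIGCHLD that has not been handled */
	bool fatal_pending;	/* e.g. SIGKILL */
};

#define MODEL_MAX_SPINS 1000	/* cap so the demonstration terminates */

/*
 * Models schedule(): returns true if the task would really sleep, false
 * if it returns at once because a signal able to wake a task in its
 * current state is pending.
 */
static bool model_schedule(const struct model_task *t)
{
	if (t->fatal_pending)
		return false;	/* fatal signals wake both sleep states */
	if (t->state == MODEL_INTERRUPTIBLE && t->nonfatal_pending)
		return false;	/* any pending signal wakes an interruptible sleep */
	return true;		/* killable sleep ignores non-fatal signals */
}

/*
 * Follows the shape of the patch's loop with a watermark that never
 * recovers, counting how often schedule() returned without sleeping.
 * A result of MODEL_MAX_SPINS means the loop was busy-waiting.
 */
static int model_throttle_spins(struct model_task *t, enum model_state wait_state)
{
	int spins = 0;

	assert(wait_state != MODEL_RUNNING);
	t->state = wait_state;			/* prepare_to_wait() */
	while (!t->fatal_pending && spins < MODEL_MAX_SPINS) {
		if (model_schedule(t))
			break;			/* really slept: would wait for kswapd */
		spins++;
		t->state = wait_state;		/* prepare_to_wait() again */
	}
	t->state = MODEL_RUNNING;		/* finish_wait() */
	return spins;
}
```

With a non-fatal signal pending, calling `model_throttle_spins()` with `MODEL_INTERRUPTIBLE` hits the cap, while `MODEL_KILLABLE` (the state wait_event_killable() sleeps in) goes to sleep on the first pass, which matches the fatal-signals-only behaviour the patch is after.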