Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

From: kernel test robot <lkp@intel.com>
To: kbuild-all@lists.01.org
Subject: Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress
Date: Tue, 16 Mar 2021 03:54:31 +0800
Message-ID: <202103160339.z59TH5v7-lkp@intel.com>
In-Reply-To: <20210315165837.789593-1-atomlin@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 18652 bytes --]

Hi Aaron,

Thank you for the patch! However, there is still something to improve:

[auto build test ERROR on hnaz-linux-mm/master]

url:    https://github.com/0day-ci/linux/commits/Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
base:   https://github.com/hnaz/linux-mm master
config: arc-randconfig-r024-20210315 (attached as .config)
compiler: arceb-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/77338aaff2606a7715c832545e79370e849e3b4e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Aaron-Tomlin/mm-page_alloc-try-oom-if-reclaim-is-unable-to-make-forward-progress/20210316-010203
        git checkout 77338aaff2606a7715c832545e79370e849e3b4e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors/warnings (new ones prefixed by >>):

   mm/page_alloc.c: In function 'should_reclaim_retry':
>> mm/page_alloc.c:4444:3: error: 'result' undeclared (first use in this function)
    4444 |   result false;
         |   ^~~~~~
   mm/page_alloc.c:4444:3: note: each undeclared identifier is reported only once for each function it appears in
>> mm/page_alloc.c:4444:9: error: expected ';' before 'false'
    4444 |   result false;
         |         ^~~~~~
         |         ;
>> mm/page_alloc.c:4447:50: error: expected ';' before 'for'
    4447 |   return unreserve_highatomic_pageblock(ac, true)
         |                                                  ^
         |                                                  ;
   mm/page_alloc.c:4426:18: warning: unused variable 'z' [-Wunused-variable]
    4426 |  struct zoneref *z;
         |                  ^
   mm/page_alloc.c:4425:15: warning: unused variable 'zone' [-Wunused-variable]
    4425 |  struct zone *zone;
         |               ^~~~
   mm/page_alloc.c: In function '__alloc_pages_slowpath':
>> mm/page_alloc.c:4720:11: error: expected ';' before ':' token
    4720 |   goto oom:
         |           ^
         |           ;
>> mm/page_alloc.c:4556:6: warning: variable 'compaction_retries' set but not used [-Wunused-but-set-variable]
    4556 |  int compaction_retries;
         |      ^~~~~~~~~~~~~~~~~~
   mm/page_alloc.c: At top level:
   mm/page_alloc.c:6136:23: warning: no previous prototype for 'memmap_init' [-Wmissing-prototypes]
    6136 | void __meminit __weak memmap_init(unsigned long size, int nid,
         |                       ^~~~~~~~~~~
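
Read together, the new errors look like three small slips: `result` where
`return` was presumably meant (line 4444), a missing `;` after the
unreserve_highatomic_pageblock() call (line 4447), and `goto oom:` where
`goto oom;` was presumably meant (line 4720). The user-space sketch below
only models those statements so that they compile; it is not the patch
itself. The value chosen for MAX_RECLAIM_RETRIES, the two `*_stub` helpers
and slowpath_model() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for this sketch; the kernel supplies its own definition. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for unreserve_highatomic_pageblock(ac, true). */
bool drain_highatomic_stub(void)
{
	return true;
}

/* Hypothetical stand-in for the per-zone watermark loop. */
bool zones_can_satisfy_stub(void)
{
	return true;
}

/* Models the counter update and early exits of should_reclaim_retry(). */
bool reclaim_retry_model(bool did_some_progress, bool costly_order,
			 int *no_progress_loops)
{
	assert(no_progress_loops);

	if (did_some_progress && !costly_order)
		*no_progress_loops = 0;
	else
		(*no_progress_loops)++;

	if (*no_progress_loops > MAX_RECLAIM_RETRIES)
		return false;			/* 4444: 'result' -> 'return' */
	else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
		return drain_highatomic_stub();	/* 4447: ';' added */

	return zones_can_satisfy_stub();
}

/* Models the jump at line 4720; returns 1 if the oom label was reached. */
int slowpath_model(bool try_oom)
{
	if (try_oom)
		goto oom;			/* 4720: 'goto oom:' -> 'goto oom;' */
	return 0;
oom:
	return 1;
}
```

With these stubs, a caller that never makes progress gets `true` from
reclaim_retry_model() for the first MAX_RECLAIM_RETRIES calls and `false`
on the next one. Once the syntax errors are gone, the unused-variable
warnings for `z` and `zone` should also disappear, since they appear to be
fallout from the broken parse rather than separate problems.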


vim +/result +4444 mm/page_alloc.c

  4409	
  4410	/*
  4411	 * Checks whether it makes sense to retry the reclaim to make a forward progress
  4412	 * for the given allocation request.
  4413	 *
  4414	 * We give up when we either have tried MAX_RECLAIM_RETRIES in a row
  4415	 * without success, or when we couldn't even meet the watermark if we
  4416	 * reclaimed all remaining pages on the LRU lists.
  4417	 *
  4418	 * Returns true if a retry is viable or false to enter the oom path.
  4419	 */
  4420	static inline bool
  4421	should_reclaim_retry(gfp_t gfp_mask, unsigned order,
  4422			     struct alloc_context *ac, int alloc_flags,
  4423			     bool did_some_progress, int *no_progress_loops)
  4424	{
  4425		struct zone *zone;
  4426		struct zoneref *z;
  4427		bool ret = false;
  4428	
  4429		/*
  4430		 * Costly allocations might have made a progress but this doesn't mean
  4431		 * their order will become available due to high fragmentation so
  4432		 * always increment the no progress counter for them
  4433		 */
  4434		if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
  4435			*no_progress_loops = 0;
  4436		else
  4437			(*no_progress_loops)++;
  4438	
  4439		/*
  4440		 * Make sure we converge to OOM if we cannot make any progress
  4441		 * several times in the row.
  4442		 */
  4443		if (*no_progress_loops > MAX_RECLAIM_RETRIES)
> 4444			result false;
  4445		/* Last chance before OOM, try draining highatomic_reserve once */
  4446		else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
> 4447			return unreserve_highatomic_pageblock(ac, true)
  4448	
  4449		/*
  4450		 * Keep reclaiming pages while there is a chance this will lead
  4451		 * somewhere.  If none of the target zones can satisfy our allocation
  4452		 * request even if all reclaimable pages are considered then we are
  4453		 * screwed and have to go OOM.
  4454		 */
  4455		for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
  4456					ac->highest_zoneidx, ac->nodemask) {
  4457			unsigned long available;
  4458			unsigned long reclaimable;
  4459			unsigned long min_wmark = min_wmark_pages(zone);
  4460			bool wmark;
  4461	
  4462			available = reclaimable = zone_reclaimable_pages(zone);
  4463			available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
  4464	
  4465			/*
  4466			 * Would the allocation succeed if we reclaimed all
  4467			 * reclaimable pages?
  4468			 */
  4469			wmark = __zone_watermark_ok(zone, order, min_wmark,
  4470					ac->highest_zoneidx, alloc_flags, available);
  4471			trace_reclaim_retry_zone(z, order, reclaimable,
  4472					available, min_wmark, *no_progress_loops, wmark);
  4473			if (wmark) {
  4474				/*
  4475				 * If we didn't make any progress and have a lot of
  4476				 * dirty + writeback pages then we should wait for
  4477				 * an IO to complete to slow down the reclaim and
  4478				 * prevent from pre mature OOM
  4479				 */
  4480				if (!did_some_progress) {
  4481					unsigned long write_pending;
  4482	
  4483					write_pending = zone_page_state_snapshot(zone,
  4484								NR_ZONE_WRITE_PENDING);
  4485	
  4486					if (2 * write_pending > reclaimable) {
  4487						congestion_wait(BLK_RW_ASYNC, HZ/10);
  4488						return true;
  4489					}
  4490				}
  4491	
  4492				ret = true;
  4493				goto out;
  4494			}
  4495		}
  4496	
  4497	out:
  4498		/*
  4499		 * Memory allocation/reclaim might be called from a WQ context and the
  4500		 * current implementation of the WQ concurrency control doesn't
  4501		 * recognize that a particular WQ is congested if the worker thread is
  4502		 * looping without ever sleeping. Therefore we have to do a short sleep
  4503		 * here rather than calling cond_resched().
  4504		 */
  4505		if (current->flags & PF_WQ_WORKER)
  4506			schedule_timeout_uninterruptible(1);
  4507		else
  4508			cond_resched();
  4509		return ret;
  4510	}
  4511	
  4512	static inline bool
  4513	check_retry_cpuset(int cpuset_mems_cookie, struct alloc_context *ac)
  4514	{
  4515		/*
  4516		 * It's possible that cpuset's mems_allowed and the nodemask from
  4517		 * mempolicy don't intersect. This should be normally dealt with by
  4518		 * policy_nodemask(), but it's possible to race with cpuset update in
  4519		 * such a way the check therein was true, and then it became false
  4520		 * before we got our cpuset_mems_cookie here.
  4521		 * This assumes that for all allocations, ac->nodemask can come only
  4522		 * from MPOL_BIND mempolicy (whose documented semantics is to be ignored
  4523		 * when it does not intersect with the cpuset restrictions) or the
  4524		 * caller can deal with a violated nodemask.
  4525		 */
  4526		if (cpusets_enabled() && ac->nodemask &&
  4527				!cpuset_nodemask_valid_mems_allowed(ac->nodemask)) {
  4528			ac->nodemask = NULL;
  4529			return true;
  4530		}
  4531	
  4532		/*
  4533		 * When updating a task's mems_allowed or mempolicy nodemask, it is
  4534		 * possible to race with parallel threads in such a way that our
  4535		 * allocation can fail while the mask is being updated. If we are about
  4536		 * to fail, check if the cpuset changed during allocation and if so,
  4537		 * retry.
  4538		 */
  4539		if (read_mems_allowed_retry(cpuset_mems_cookie))
  4540			return true;
  4541	
  4542		return false;
  4543	}
  4544	
  4545	static inline struct page *
  4546	__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
  4547							struct alloc_context *ac)
  4548	{
  4549		bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
  4550		const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
  4551		struct page *page = NULL;
  4552		unsigned int alloc_flags;
  4553		unsigned long did_some_progress;
  4554		enum compact_priority compact_priority;
  4555		enum compact_result compact_result;
> 4556		int compaction_retries;
  4557		int no_progress_loops;
  4558		unsigned int cpuset_mems_cookie;
  4559		int reserve_flags;
  4560	
  4561		/*
  4562		 * We also sanity check to catch abuse of atomic reserves being used by
  4563		 * callers that are not in atomic context.
  4564		 */
  4565		if (WARN_ON_ONCE((gfp_mask & (__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)) ==
  4566					(__GFP_ATOMIC|__GFP_DIRECT_RECLAIM)))
  4567			gfp_mask &= ~__GFP_ATOMIC;
  4568	
  4569	retry_cpuset:
  4570		compaction_retries = 0;
  4571		no_progress_loops = 0;
  4572		compact_priority = DEF_COMPACT_PRIORITY;
  4573		cpuset_mems_cookie = read_mems_allowed_begin();
  4574	
  4575		/*
  4576		 * The fast path uses conservative alloc_flags to succeed only until
  4577		 * kswapd needs to be woken up, and to avoid the cost of setting up
  4578		 * alloc_flags precisely. So we do that now.
  4579		 */
  4580		alloc_flags = gfp_to_alloc_flags(gfp_mask);
  4581	
  4582		/*
  4583		 * We need to recalculate the starting point for the zonelist iterator
  4584		 * because we might have used different nodemask in the fast path, or
  4585		 * there was a cpuset modification and we are retrying - otherwise we
  4586		 * could end up iterating over non-eligible zones endlessly.
  4587		 */
  4588		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
  4589						ac->highest_zoneidx, ac->nodemask);
  4590		if (!ac->preferred_zoneref->zone)
  4591			goto nopage;
  4592	
  4593		if (alloc_flags & ALLOC_KSWAPD)
  4594			wake_all_kswapds(order, gfp_mask, ac);
  4595	
  4596		/*
  4597		 * The adjusted alloc_flags might result in immediate success, so try
  4598		 * that first
  4599		 */
  4600		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
  4601		if (page)
  4602			goto got_pg;
  4603	
  4604		/*
  4605		 * For costly allocations, try direct compaction first, as it's likely
  4606		 * that we have enough base pages and don't need to reclaim. For non-
  4607		 * movable high-order allocations, do that as well, as compaction will
  4608		 * try prevent permanent fragmentation by migrating from blocks of the
  4609		 * same migratetype.
  4610		 * Don't try this for allocations that are allowed to ignore
  4611		 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
  4612		 */
  4613		if (can_direct_reclaim &&
  4614				(costly_order ||
  4615				   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
  4616				&& !gfp_pfmemalloc_allowed(gfp_mask)) {
  4617			page = __alloc_pages_direct_compact(gfp_mask, order,
  4618							alloc_flags, ac,
  4619							INIT_COMPACT_PRIORITY,
  4620							&compact_result);
  4621			if (page)
  4622				goto got_pg;
  4623	
  4624			/*
  4625			 * Checks for costly allocations with __GFP_NORETRY, which
  4626			 * includes some THP page fault allocations
  4627			 */
  4628			if (costly_order && (gfp_mask & __GFP_NORETRY)) {
  4629				/*
  4630				 * If allocating entire pageblock(s) and compaction
  4631				 * failed because all zones are below low watermarks
  4632				 * or is prohibited because it recently failed at this
  4633				 * order, fail immediately unless the allocator has
  4634				 * requested compaction and reclaim retry.
  4635				 *
  4636				 * Reclaim is
  4637				 *  - potentially very expensive because zones are far
  4638				 *    below their low watermarks or this is part of very
  4639				 *    bursty high order allocations,
  4640				 *  - not guaranteed to help because isolate_freepages()
  4641				 *    may not iterate over freed pages as part of its
  4642				 *    linear scan, and
  4643				 *  - unlikely to make entire pageblocks free on its
  4644				 *    own.
  4645				 */
  4646				if (compact_result == COMPACT_SKIPPED ||
  4647				    compact_result == COMPACT_DEFERRED)
  4648					goto nopage;
  4649	
  4650				/*
  4651				 * Looks like reclaim/compaction is worth trying, but
  4652				 * sync compaction could be very expensive, so keep
  4653				 * using async compaction.
  4654				 */
  4655				compact_priority = INIT_COMPACT_PRIORITY;
  4656			}
  4657		}
  4658	
  4659	retry:
  4660		/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
  4661		if (alloc_flags & ALLOC_KSWAPD)
  4662			wake_all_kswapds(order, gfp_mask, ac);
  4663	
  4664		reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
  4665		if (reserve_flags)
  4666			alloc_flags = current_alloc_flags(gfp_mask, reserve_flags);
  4667	
  4668		/*
  4669		 * Reset the nodemask and zonelist iterators if memory policies can be
  4670		 * ignored. These allocations are high priority and system rather than
  4671		 * user oriented.
  4672		 */
  4673		if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
  4674			ac->nodemask = NULL;
  4675			ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
  4676						ac->highest_zoneidx, ac->nodemask);
  4677		}
  4678	
  4679		/* Attempt with potentially adjusted zonelist and alloc_flags */
  4680		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
  4681		if (page)
  4682			goto got_pg;
  4683	
  4684		/* Caller is not willing to reclaim, we can't balance anything */
  4685		if (!can_direct_reclaim)
  4686			goto nopage;
  4687	
  4688		/* Avoid recursion of direct reclaim */
  4689		if (current->flags & PF_MEMALLOC)
  4690			goto nopage;
  4691	
  4692		/* Try direct reclaim and then allocating */
  4693		page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
  4694								&did_some_progress);
  4695		if (page)
  4696			goto got_pg;
  4697	
  4698		/* Try direct compaction and then allocating */
  4699		page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
  4700						compact_priority, &compact_result);
  4701		if (page)
  4702			goto got_pg;
  4703	
  4704		/* Do not loop if specifically requested */
  4705		if (gfp_mask & __GFP_NORETRY)
  4706			goto nopage;
  4707	
  4708		/*
  4709		 * Do not retry costly high order allocations unless they are
  4710		 * __GFP_RETRY_MAYFAIL
  4711		 */
  4712		if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
  4713			goto nopage;
  4714	
  4715		if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
  4716					 did_some_progress > 0, &no_progress_loops))
  4717			goto retry;
  4718	
  4719		if (should_try_oom(no_progress_loops, compact_result))
  4720			goto oom:
  4721		/*
  4722		 * It doesn't make any sense to retry for the compaction if the order-0
  4723		 * reclaim is not able to make any progress because the current
  4724		 * implementation of the compaction depends on the sufficient amount
  4725		 * of free memory (see __compaction_suitable)
  4726		 */
  4727		if (did_some_progress > 0 &&
  4728				should_compact_retry(ac, order, alloc_flags,
  4729					compact_result, &compact_priority,
  4730					&compaction_retries))
  4731			goto retry;
  4732	
  4733	
  4734		/* Deal with possible cpuset update races before we start OOM killing */
  4735		if (check_retry_cpuset(cpuset_mems_cookie, ac))
  4736			goto retry_cpuset;
  4737	
  4738	oom:
  4739		/* Reclaim has failed us, start killing things */
  4740		page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
  4741		if (page)
  4742			goto got_pg;
  4743	
  4744		/* Avoid allocations with no watermarks from looping endlessly */
  4745		if (tsk_is_oom_victim(current) &&
  4746		    (alloc_flags & ALLOC_OOM ||
  4747		     (gfp_mask & __GFP_NOMEMALLOC)))
  4748			goto nopage;
  4749	
  4750		/* Retry as long as the OOM killer is making progress */
  4751		if (did_some_progress) {
  4752			no_progress_loops = 0;
  4753			goto retry;
  4754		}
  4755	
  4756	nopage:
  4757		/* Deal with possible cpuset update races before we fail */
  4758		if (check_retry_cpuset(cpuset_mems_cookie, ac))
  4759			goto retry_cpuset;
  4760	
  4761		/*
  4762		 * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
  4763		 * we always retry
  4764		 */
  4765		if (gfp_mask & __GFP_NOFAIL) {
  4766			/*
  4767			 * All existing users of the __GFP_NOFAIL are blockable, so warn
  4768			 * of any new users that actually require GFP_NOWAIT
  4769			 */
  4770			if (WARN_ON_ONCE(!can_direct_reclaim))
  4771				goto fail;
  4772	
  4773			/*
  4774			 * PF_MEMALLOC request from this context is rather bizarre
  4775			 * because we cannot reclaim anything and only can loop waiting
  4776			 * for somebody to do a work for us
  4777			 */
  4778			WARN_ON_ONCE(current->flags & PF_MEMALLOC);
  4779	
  4780			/*
  4781			 * non failing costly orders are a hard requirement which we
  4782			 * are not prepared for much so let's warn about these users
  4783			 * so that we can identify them and convert them to something
  4784			 * else.
  4785			 */
  4786			WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
  4787	
  4788			/*
  4789			 * Help non-failing allocations by giving them access to memory
  4790			 * reserves but do not use ALLOC_NO_WATERMARKS because this
  4791			 * could deplete whole memory reserves which would just make
  4792			 * the situation worse
  4793			 */
  4794			page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
  4795			if (page)
  4796				goto got_pg;
  4797	
  4798			cond_resched();
  4799			goto retry;
  4800		}
  4801	fail:
  4802		warn_alloc(gfp_mask, ac->nodemask,
  4803				"page allocation failure: order:%u", order);
  4804	got_pg:
  4805		return page;
  4806	}
  4807	
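
For readers following the zone loop at lines 4455-4495, the per-zone
decision can be approximated in user space as below. This is a simplified
sketch under stated assumptions: `struct zone_model`, `enum zone_verdict`
and zone_retry_verdict() are hypothetical names, and the watermark test is
a flat comparison that ignores the order, lowmem reserves and alloc_flags
that __zone_watermark_ok() takes into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of inspecting one zone in this simplified model. */
enum zone_verdict {
	ZONE_NO_RETRY,		/* even full reclaim would miss the watermark */
	ZONE_RETRY,		/* reclaim could plausibly satisfy the request */
	ZONE_RETRY_AFTER_IO	/* retry, but wait for writeback first */
};

/* Hypothetical, flattened view of the counters the loop reads per zone. */
struct zone_model {
	unsigned long free_pages;	/* NR_FREE_PAGES snapshot */
	unsigned long reclaimable;	/* zone_reclaimable_pages() */
	unsigned long write_pending;	/* NR_ZONE_WRITE_PENDING snapshot */
	unsigned long min_wmark;	/* min_wmark_pages() */
};

enum zone_verdict zone_retry_verdict(const struct zone_model *z,
				     bool did_some_progress)
{
	unsigned long available;

	assert(z);

	/* Pretend every reclaimable page has already been freed. */
	available = z->reclaimable + z->free_pages;

	/* Simplified stand-in for __zone_watermark_ok(). */
	if (available <= z->min_wmark)
		return ZONE_NO_RETRY;

	/* No progress and mostly dirty/writeback pages: throttle on IO. */
	if (!did_some_progress && 2 * z->write_pending > z->reclaimable)
		return ZONE_RETRY_AFTER_IO;

	return ZONE_RETRY;
}
```

In the listing above, the case modelled here as ZONE_RETRY_AFTER_IO
corresponds to the congestion_wait() call followed by `return true`, and
ZONE_RETRY to `ret = true; goto out;`.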

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 22374 bytes --]

WARNING: multiple messages have this Message-ID (diff)

From: kernel test robot <lkp@intel.com>
To: Aaron Tomlin <atomlin@redhat.com>, linux-mm@kvack.org
Cc: kbuild-all@lists.01.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress
Date: Tue, 16 Mar 2021 03:54:31 +0800
Message-ID: <202103160339.z59TH5v7-lkp@intel.com>
In-Reply-To: <20210315165837.789593-1-atomlin@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 18187 bytes --]

  4711		 */
  4712		if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
  4713			goto nopage;
  4714	
  4715		if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
  4716					 did_some_progress > 0, &no_progress_loops))
  4717			goto retry;
  4718	
  4719		if (should_try_oom(no_progress_loops, compact_result))
  4720			goto oom;
  4721		/*
  4722		 * It doesn't make any sense to retry for the compaction if the order-0
  4723		 * reclaim is not able to make any progress because the current
  4724		 * implementation of the compaction depends on the sufficient amount
  4725		 * of free memory (see __compaction_suitable)
  4726		 */
  4727		if (did_some_progress > 0 &&
  4728				should_compact_retry(ac, order, alloc_flags,
  4729					compact_result, &compact_priority,
  4730					&compaction_retries))
  4731			goto retry;
  4732	
  4733	
  4734		/* Deal with possible cpuset update races before we start OOM killing */
  4735		if (check_retry_cpuset(cpuset_mems_cookie, ac))
  4736			goto retry_cpuset;
  4737	
  4738	oom:
  4739		/* Reclaim has failed us, start killing things */
  4740		page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
  4741		if (page)
  4742			goto got_pg;
  4743	
  4744		/* Avoid allocations with no watermarks from looping endlessly */
  4745		if (tsk_is_oom_victim(current) &&
  4746		    (alloc_flags & ALLOC_OOM ||
  4747		     (gfp_mask & __GFP_NOMEMALLOC)))
  4748			goto nopage;
  4749	
  4750		/* Retry as long as the OOM killer is making progress */
  4751		if (did_some_progress) {
  4752			no_progress_loops = 0;
  4753			goto retry;
  4754		}
  4755	
  4756	nopage:
  4757		/* Deal with possible cpuset update races before we fail */
  4758		if (check_retry_cpuset(cpuset_mems_cookie, ac))
  4759			goto retry_cpuset;
  4760	
  4761		/*
  4762		 * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
  4763		 * we always retry
  4764		 */
  4765		if (gfp_mask & __GFP_NOFAIL) {
  4766			/*
  4767			 * All existing users of the __GFP_NOFAIL are blockable, so warn
  4768			 * of any new users that actually require GFP_NOWAIT
  4769			 */
  4770			if (WARN_ON_ONCE(!can_direct_reclaim))
  4771				goto fail;
  4772	
  4773			/*
  4774			 * PF_MEMALLOC request from this context is rather bizarre
  4775			 * because we cannot reclaim anything and only can loop waiting
  4776			 * for somebody to do a work for us
  4777			 */
  4778			WARN_ON_ONCE(current->flags & PF_MEMALLOC);
  4779	
  4780			/*
  4781			 * non failing costly orders are a hard requirement which we
  4782			 * are not prepared for much so let's warn about these users
  4783			 * so that we can identify them and convert them to something
  4784			 * else.
  4785			 */
  4786			WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
  4787	
  4788			/*
  4789			 * Help non-failing allocations by giving them access to memory
  4790			 * reserves but do not use ALLOC_NO_WATERMARKS because this
  4791			 * could deplete whole memory reserves which would just make
  4792			 * the situation worse
  4793			 */
  4794			page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
  4795			if (page)
  4796				goto got_pg;
  4797	
  4798			cond_resched();
  4799			goto retry;
  4800		}
  4801	fail:
  4802		warn_alloc(gfp_mask, ac->nodemask,
  4803				"page allocation failure: order:%u", order);
  4804	got_pg:
  4805		return page;
  4806	}
  4807	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22374 bytes --]
