Re: [PATCH] Fix bug in try_to_free_pages and balance_pgdat when they fail to reclaim pages

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <nickpiggin@yahoo.com.au>
To: "Martin J. Bligh" <mbligh@google.com>
Cc: Andrew Morton <akpm@osdl.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [PATCH] Fix bug in try_to_free_pages and balance_pgdat when they fail to reclaim pages
Date: Wed, 18 Oct 2006 03:03:47 +1000	[thread overview]
Message-ID: <45350CF3.6030306@yahoo.com.au> (raw)
In-Reply-To: <4534E397.8090505@google.com>

Martin J. Bligh wrote:

>> But temp_priority should be set to 0 at that point.
> 
> 
> It that were true, it'd be great. But how?
> This is everything that touches it:
> 
> 0 mmzone.h     <global>             208 int temp_priority;
> 1 page_alloc.c free_area_init_core 2019 zone->temp_priority =
>                                         zone->prev_priority = DEF_PRIORITY;
> 2 vmscan.c     shrink_zones         937 zone->temp_priority = priority;
> 3 vmscan.c     try_to_free_pages    987 zone->temp_priority = DEF_PRIORITY;
> 4 vmscan.c     try_to_free_pages   1031 zone->prev_priority =
>                                         zone->temp_priority;
> 5 vmscan.c     balance_pgdat       1081 zone->temp_priority = DEF_PRIORITY;
> 6 vmscan.c     balance_pgdat       1143 zone->temp_priority = priority;
> 7 vmscan.c     balance_pgdat       1189 zone->prev_priority =
>                                         zone->temp_priority;
> 8 vmstat.c     zoneinfo_show        593 zone->temp_priority,
> 
> Only thing that looks interesting here is shrink_zones.

For try_to_free_pages, shrink_zones will continue to be called until
priority reaches 0. So temp_priority and prev_priority are now 0. When
it breaks out of the loop, prev_priority gets assigned temp_priority.
Both of which are zero *unless you've hit the temp_priority race*. As
I said, getting rid of temp_priority and somehow tracking it locally
will close this race. I agree this race is a bug and would be happy to
see it fixed. This might be what your patch inadvertently fixes.

>> But your loops are not exactly per reclaimer either. Granted there
>> is a large race window in the current code, but this patch isn't the
>> way to fix that particular problem.
> 
> 
> Why not? Perhaps it's not a panacea, but it's a definite improvement.

OK it is an improvement for the cases when we hit priority = 0. It would
be nice to fix the race for medium priorities as well though. Hmm, OK,
if we can't do that easily then I would be OK with this approach for the
time being.

Please don't duplicate that whole loop again in try_to_free_pages, though.

> 
>>> Moreover, whilst try_to_free_pages calls shrink_zones, balance_pgdat
>>> does not. Nothing else I can see sets temp_priority.
>>
>>
>> balance_pgdat.
> 
> 
> That's only called from kswapd. If we're in balance_pgdat, we ARE 
> kswapd. We can't fix ourself. So effectively we're doing:
> 
> while (priority--) {
>     if (we reclaimed OK)
>         goto out;
> }
> out:
> prev_priority = DEF_PRIORITY;
> 
> We've just walked the whole bloody list with priority set to 0.
> 
> We failed to reclaim a few pages.
> 
> We know the world is in deep pain.
> 
> Why the hell would we elevate prev_priority?

No. If we've walked the whole bloody list and failed to reclaim any
pages, we do not set prev_priority to DEF_PRIORITY. Read the code, it
does the same thing with the priorities as shrink_zones.

>> Unnecesary and indicates something else is broken if you are seeing
>> problems here.
> 
> 
> You think we should set prev_priority up, when we've just walked the
> whole list at prio 0 and can't reclaim anything? Unless so, I fail
> to see how the patch is unnecessary.
> 
> And yes, I'm sure other things are broken, but again, this fixes a
> clear bug.

AFAIKS there is no bug that have identified here or in your changelog.
There is a race, there are many of tolerable races in reclaim. I can
accept this races is intolerable for you, so I am OK with fixing it.

>  > So do you still see the problem on upstream kernel
> 
>> without your patches applied?
> 
> 
> I can't slap an upstream bleeding edge kernel across a few thousand
> production machines, and wait to see if the world blows up, sorry.
> If I can make a reproduce test case, I'll send it out, but thus far
> we've been unsuccessful.

No problem, I didn't ask you to do that. But if you want this patch
in the upstream kerenl, then I will keep asking whether it fixes a
problem in the upstream kernel.

> 
> But I can see it happening in earlier versions, and I can read the
> code in 2.6.18, and see obvious bugs.

I can't see any besides the temp_priority race.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

WARNING: multiple messages have this Message-ID (diff)

From: Nick Piggin <nickpiggin@yahoo.com.au>
To: "Martin J. Bligh" <mbligh@google.com>
Cc: Andrew Morton <akpm@osdl.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [PATCH] Fix bug in try_to_free_pages and balance_pgdat when they fail to reclaim pages
Date: Wed, 18 Oct 2006 03:03:47 +1000	[thread overview]
Message-ID: <45350CF3.6030306@yahoo.com.au> (raw)
In-Reply-To: <4534E397.8090505@google.com>

Martin J. Bligh wrote:

>> But temp_priority should be set to 0 at that point.
> 
> 
> It that were true, it'd be great. But how?
> This is everything that touches it:
> 
> 0 mmzone.h     <global>             208 int temp_priority;
> 1 page_alloc.c free_area_init_core 2019 zone->temp_priority =
>                                         zone->prev_priority = DEF_PRIORITY;
> 2 vmscan.c     shrink_zones         937 zone->temp_priority = priority;
> 3 vmscan.c     try_to_free_pages    987 zone->temp_priority = DEF_PRIORITY;
> 4 vmscan.c     try_to_free_pages   1031 zone->prev_priority =
>                                         zone->temp_priority;
> 5 vmscan.c     balance_pgdat       1081 zone->temp_priority = DEF_PRIORITY;
> 6 vmscan.c     balance_pgdat       1143 zone->temp_priority = priority;
> 7 vmscan.c     balance_pgdat       1189 zone->prev_priority =
>                                         zone->temp_priority;
> 8 vmstat.c     zoneinfo_show        593 zone->temp_priority,
> 
> Only thing that looks interesting here is shrink_zones.

For try_to_free_pages, shrink_zones will continue to be called until
priority reaches 0. So temp_priority and prev_priority are now 0. When
it breaks out of the loop, prev_priority gets assigned temp_priority.
Both of which are zero *unless you've hit the temp_priority race*. As
I said, getting rid of temp_priority and somehow tracking it locally
will close this race. I agree this race is a bug and would be happy to
see it fixed. This might be what your patch inadvertently fixes.



>> But your loops are not exactly per reclaimer either. Granted there
>> is a large race window in the current code, but this patch isn't the
>> way to fix that particular problem.
> 
> 
> Why not? Perhaps it's not a panacea, but it's a definite improvement.

OK it is an improvement for the cases when we hit priority = 0. It would
be nice to fix the race for medium priorities as well though. Hmm, OK,
if we can't do that easily then I would be OK with this approach for the
time being.

Please don't duplicate that whole loop again in try_to_free_pages, though.

> 
>>> Moreover, whilst try_to_free_pages calls shrink_zones, balance_pgdat
>>> does not. Nothing else I can see sets temp_priority.
>>
>>
>> balance_pgdat.
> 
> 
> That's only called from kswapd. If we're in balance_pgdat, we ARE 
> kswapd. We can't fix ourself. So effectively we're doing:
> 
> while (priority--) {
>     if (we reclaimed OK)
>         goto out;
> }
> out:
> prev_priority = DEF_PRIORITY;
> 
> We've just walked the whole bloody list with priority set to 0.
> 
> We failed to reclaim a few pages.
> 
> We know the world is in deep pain.
> 
> Why the hell would we elevate prev_priority?

No. If we've walked the whole bloody list and failed to reclaim any
pages, we do not set prev_priority to DEF_PRIORITY. Read the code, it
does the same thing with the priorities as shrink_zones.

>> Unnecesary and indicates something else is broken if you are seeing
>> problems here.
> 
> 
> You think we should set prev_priority up, when we've just walked the
> whole list at prio 0 and can't reclaim anything? Unless so, I fail
> to see how the patch is unnecessary.
> 
> And yes, I'm sure other things are broken, but again, this fixes a
> clear bug.

AFAIKS there is no bug that have identified here or in your changelog.
There is a race, there are many of tolerable races in reclaim. I can
accept this races is intolerable for you, so I am OK with fixing it.


>  > So do you still see the problem on upstream kernel
> 
>> without your patches applied?
> 
> 
> I can't slap an upstream bleeding edge kernel across a few thousand
> production machines, and wait to see if the world blows up, sorry.
> If I can make a reproduce test case, I'll send it out, but thus far
> we've been unsuccessful.

No problem, I didn't ask you to do that. But if you want this patch
in the upstream kerenl, then I will keep asking whether it fixes a
problem in the upstream kernel.

> 
> But I can see it happening in earlier versions, and I can read the
> code in 2.6.18, and see obvious bugs.

I can't see any besides the temp_priority race.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2006-10-17 17:03 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-17  0:36 [PATCH] Fix bug in try_to_free_pages and balance_pgdat when they fail to reclaim pages Martin Bligh
2006-10-17  6:18 ` Nick Piggin
2006-10-17  6:18   ` Nick Piggin
2006-10-17  6:36   ` Martin J. Bligh
2006-10-17  6:36     ` Martin J. Bligh
2006-10-17  6:48     ` Nick Piggin
2006-10-17  6:48       ` Nick Piggin
2006-10-17 14:07       ` Martin J. Bligh
2006-10-17 14:07         ` Martin J. Bligh
2006-10-17 17:03         ` Nick Piggin [this message]
2006-10-17 17:03           ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45350CF3.6030306@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mbligh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.