Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"

From: Thorsten Leemhuis <fedora@leemhuis.info>
To: Josh Boyer <jwboyer@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Zdenek Kabelac <zkabelac@redhat.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Jiri Slaby <jslaby@suse.cz>,
	Valdis.Kletnieks@vt.edu, Jiri Slaby <jirislaby@gmail.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Robert Jennings <rcj@linux.vnet.ibm.com>,
	bruno@wolff.to
Subject: Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD"
Date: Tue, 20 Nov 2012 18:43:04 +0100
Message-ID: <50ABC128.80706@leemhuis.info>
In-Reply-To: <CA+5PVA7__=JcjLAhs5cpVK-WaZbF5bQhp5WojBJsdEt9SnG3cw@mail.gmail.com>

On 20.11.2012 16:38, Josh Boyer wrote:
> On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman <mgorman@suse.de> wrote:
>> On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote:
>>> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@suse.de> wrote:
>>>> With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
>>>> based on failures" reverted, Zdenek Kabelac reported the following
>>>>
>>>>          Hmm,  so it's just took longer to hit the problem and observe
>>>>          kswapd0 spinning on my CPU again - it's not as endless like before -
>>>>          but still it easily eats minutes - it helps to  turn off  Firefox
>>>>          or TB  (memory hungry apps) so kswapd0 stops soon - and restart
>>>>          those apps again.  (And I still have like >1GB of cached memory)
>>>>
>>>>          kswapd0         R  running task        0    30      2 0x00000000
>>>>           ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246
>>>>           ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8
>>>>           ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000
>>>>          Call Trace:
>>>>           [<ffffffff81555bf2>] preempt_schedule+0x42/0x60
>>>>           [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60
>>>>           [<ffffffff81192971>] put_super+0x31/0x40
>>>>           [<ffffffff81192a42>] drop_super+0x22/0x30
>>>>           [<ffffffff81193b89>] prune_super+0x149/0x1b0
>>>>           [<ffffffff81141e2a>] shrink_slab+0xba/0x510
>>>>
>>>> The sysrq+m indicates the system has no swap so it'll never reclaim
>>>> anonymous pages as part of reclaim/compaction. That is one part of the
>>>> problem but not the root cause as file-backed pages could also be reclaimed.
>>>>
>>>> The likely underlying problem is that kswapd is woken up or kept awake
>>>> for each THP allocation request in the page allocator slow path.
>>>>
>>>> If compaction fails for the requesting process then compaction will be
>>>> deferred for a time and direct reclaim is avoided. However, if there
>>>> is a storm of THP requests that are simply rejected, it will still
>>>> be the case that kswapd is awake for a prolonged period of time
>>>> as pgdat->kswapd_max_order is updated each time. This is noticed by
>>>> the main kswapd() loop and it will not call kswapd_try_to_sleep().
>>>> Instead it will loop, shrinking a small number of pages and calling
>>>> shrink_slab() on each iteration.
>>>>
>>>> The temptation is to supply a patch that checks if kswapd was woken for
>>>> THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
>>>> backed up by proper testing. As 3.7 is very close to release and this is
>>>> not a bug we should release with, a safer path is to revert "mm: remove
>>>> __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the
>>>> balance_pgdat() logic in general.
>>>>
>>>> Signed-off-by: Mel Gorman <mgorman@suse.de>
>>>
>>> Does anyone know if this is queued to go into 3.7 somewhere?  I looked
>>> a bit and can't find it in a tree.  We have a few reports of Fedora
>>> rawhide users hitting this.
>>
>> No, because I was waiting to hear if a) it worked and preferably if the
>> alternative "less safe" option worked. This close to release it might be
>> better to just go with the safe option.
>
> We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988
> and people say this revert patch doesn't seem to make the issue go away
> fully.  Thorsten has created another kernel with the other patch applied
> for testing.
>
> At least I think that is the latest status from the bug.  Hopefully the
> commenters will chime in.

The short story from my current point of view is:

  * my main machine at home, where I initially saw the issue that started
this thread, seems to be running fine with rc6 and the "safe" patch Mel
posted in https://lkml.org/lkml/2012/11/12/113. Before that I ran an rc5
kernel with the revert that went into rc6 and the "safe" patch -- that
worked fine for a few days, too.

  * I have a second machine where I started to use 3.7-rc kernels only
yesterday (the machine triggered a bug in the radeon driver that seems
to be fixed in rc6), and it showed symptoms like the ones Zdenek Kabelac
mentions in this thread. I wasn't able to look closer at it, but simply
tried rc6 with the safe patch, which didn't help. I'm now running rc6
with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 and
can't yet tell if it helps. If the problem shows up again I'll try to
capture more debugging data via sysrq -- there wasn't any time for that
when I was running rc6 with the safe patch, sorry.

Thorsten
