From: Stuart Foster <smf.linux@ntlworld.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [Bug 42578] Kernel crash "Out of memory error by X" when using NTFS file system on external USB Hard drive
Date: Sat, 11 Feb 2012 21:28:23 +0000 [thread overview]
Message-ID: <4F36DD77.1080306@ntlworld.com> (raw)
In-Reply-To: <20120210163748.GR5796@csn.ul.ie>
On 02/10/12 16:37, Mel Gorman wrote:
> On Thu, Jan 19, 2012 at 12:24:48PM -0800, Andrew Morton wrote:
>>
>> (switched to email. Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Wed, 18 Jan 2012 09:22:12 GMT
>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=42578
>>
>
> Sorry again for taking so long to look at this.
>
>> Stuart has an 8GB x86_32 machine.
>
> The bugzilla talks about a 16G machine. Is 8G a typo?
>
>> It has large amounts of NTFS
>> pagecache in highmem. NTFS is using 512-byte buffer_heads. All of the
>> machine's lowmem is being consumed by struct buffer_heads which are
>> attached to the highmem pagecache and the machine is dead in the water,
>> getting a storm of ooms.
>>
>
> Ok, I was at least able to confirm with an 8G machine that there are a lot
> of buffer_heads allocated as you'd expect but it did not crash. I suspect
> it's because the ratio of highmem/normal was insufficient to trigger the
> bug. Stuart, if this is a 16G machine, can you test booting with mem=8G
> to confirm the ratio of highmem/normal is the important factor please?
>
>> A regression, I think. A box-killing one on a pretty simple workload
>> on a not uncommon machine.
>>
>
> Because of the trigger, it's the type of bug that could have existed for
> a long time without being noticed. When I went to reproduce this, I found
> that my distro by default was using fuse to access the NTFS partition
> which could have also contributed to hiding this.
>
>> We used to handle this by scanning highmem even when there was plenty
>> of free highmem and the request is for lowmem pages. We have made a
>> few changes in this area and I guess that's what broke it.
>>
>
> I don't have much time to look at this unfortunately so I didn't dig too
> deep but this assessment looks accurate. In direct reclaim for example,
> we used to always scan all zones unconditionally. Now we filter what zones
> we reclaim from based on the gfp mask of the caller.
>
>> I think a suitable fix here would be to extend the
>> buffer_heads_over_limit special-case. If buffer_heads_over_limit is
>> true, both direct-reclaimers and kswapd should scan the highmem zone
>> regardless of incoming gfp_mask and regardless of the highmem free
>> pages count.
>>
>
> I've included a quick hatchet job below to test the basic theory. It has
> not been tested properly I'm afraid but the basic idea is there.
>
>> In this mode, we only scan the file lru. We should perform writeback
>> as well, because the buffer_heads might be dirty.
>>
>
> With this patch against 3.3-rc3, it won't immediately initiate writeback by
> kswapd. Direct reclaim cannot initiate writeback at all so there is still
> a risk that enough dirty pages could exist to pin low memory and go OOM but
> the machine would need at least 30G of memory and be running in 32-bit mode.
>
>> [aside: If all of a page's buffer_heads are dirty we can in fact
>> reclaim them and mark the entire page dirty. If some of the
>> buffer_heads are dirty and the others are uptodate we can even reclaim
>> them in this case, and mark the entire page dirty, causing extra I/O
>> later. But try_to_release_page() doesn't do these things.]
>>
>
> Good tip.
>
>> I think it was always wrong that we only strip buffer_heads when
>> moving pages to the inactive list. What happens if those 600MB of
>> buffer_heads are all attached to inactive pages?
>>
>
> I wondered the same thing myself. With some use-once logic, there is
> no guarantee that they even get promoted to the active list in the
> first place. It's "always" been like this but we've changed how pages get
> promoted quite a bit and this use case could have been easily missed.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c52b235..3622765 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2235,6 +2235,14 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
> unsigned long nr_soft_scanned;
> bool aborted_reclaim = false;
>
> + /*
> + * If the number of buffer_heads in the machine exceeds the maximum
> + * allowed level, force direct reclaim to scan the highmem zone as
> + * highmem pages could be pinning lowmem pages storing buffer_heads
> + */
> + if (buffer_heads_over_limit)
> + sc->gfp_mask |= __GFP_HIGHMEM;
> +
> for_each_zone_zonelist_nodemask(zone, z, zonelist,
> gfp_zone(sc->gfp_mask), sc->nodemask) {
> if (!populated_zone(zone))
> @@ -2724,6 +2732,17 @@ loop_again:
> */
> age_active_anon(zone, &sc, priority);
>
> + /*
> + * If the number of buffer_heads in the machine
> + * exceeds the maximum allowed level and this node
> + * has a highmem zone, force kswapd to reclaim from
> + * it to relieve lowmem pressure.
> + */
> + if (buffer_heads_over_limit && is_highmem_idx(i)) {
> + end_zone = i;
> + break;
> + }
> +
> if (!zone_watermark_ok_safe(zone, order,
> high_wmark_pages(zone), 0, 0)) {
> end_zone = i;
> @@ -2786,7 +2805,8 @@ loop_again:
> (zone->present_pages +
> KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
> KSWAPD_ZONE_BALANCE_GAP_RATIO);
> - if (!zone_watermark_ok_safe(zone, order,
> + if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
> + !zone_watermark_ok_safe(zone, order,
> high_wmark_pages(zone) + balance_gap,
> end_zone, 0)) {
> shrink_zone(priority, zone, &sc);
>
Hi,
Thanks for the update; my test results using kernel 3.3-rc3 are as follows:

1. With all 16GB enabled the system fails as previously reported.
2. With memory limited to 8GB the system does not fail.
3. With the patch applied and the full 16GB enabled the system does not fail.
Thanks
Stuart