From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
bugme-daemon@bugzilla.kernel.org,
Johannes Weiner <hannes@cmpxchg.org>,
ajlill@ajlc.waterloo.on.ca
Subject: Re: [Bugme-new] [Bug 15214] New: Oops at __rmqueue+0x51/0x2b3
Date: Fri, 5 Feb 2010 11:20:00 +0000
Message-ID: <20100205112000.GD20412@csn.ul.ie>
In-Reply-To: <20100203143921.f2c96e8c.akpm@linux-foundation.org>
On Wed, Feb 03, 2010 at 02:39:21PM -0800, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 3 Feb 2010 02:30:22 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=15214
> >
> > Summary: Oops at __rmqueue+0x51/0x2b3
> > Product: Memory Management
> > Version: 2.5
> > Kernel Version: 2.6.32.7
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Page Allocator
> > AssignedTo: akpm@linux-foundation.org
> > ReportedBy: ajlill@ajlc.waterloo.on.ca
> > Regression: Yes
> >
> >
> > Created an attachment (id=24887)
> > --> (http://bugzilla.kernel.org/attachment.cgi?id=24887)
> > .config file
> >
> > I get an Oops when doing a lot of filesystem reads. The process, cfagent, is
> > running through the filesystem checksumming files when it dies. It doesn't
> > happen every time cfagent runs, but there's a pretty good chance it will.
> > This problem happens on 2.6.31.* as well, 2.6.30.10 appears to be stable. It
> > happens on two different computers, so it's unlikely to be hardware. Also, in
> > 2.6.32.*, I get an Oops at
> >
> > BUG_ON(page_zone(start_page) != page_zone(end_page));
> >
> > in move_freepages when I do sysctl -w vm.min_free_kbytes=16384
> >
> > but I can only reliably reproduce it when I do the sysctl from the boot
> > scripts, and I'm having trouble getting netconsole started beforehand to
> > capture the full output.
> >
This point on sysctl is truly bizarre. It implies that the struct pages
have been corrupted in some fashion. Just before this check is made, we
do

	/* Do not cross zone boundaries */
	if (start_pfn < zone->zone_start_pfn)
		start_page = page;
	if (end_pfn >= zone->zone_start_pfn + zone->spanned_pages)
		return 0;

	return move_freepages(zone, start_page, end_page, migratetype);

So, for that bug to be triggered, two pages between
zone->zone_start_pfn and
zone->zone_start_pfn + zone->spanned_pages
have to have different results for page_zone(). That would be outright
wrong.
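To spell that reasoning out, here is a minimal userspace sketch of the
boundary check. It is not kernel code: struct zone_model, pfn_in_zone(),
block_range() and the PAGEBLOCK_NR_PAGES value are hypothetical stand-ins
invented for illustration, and the rounding of the pfn down to a pageblock
boundary is my reading of the surrounding function rather than something
quoted in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pageblock size, for illustration only; the real value
 * depends on the kernel configuration. Must be a power of two. */
#define PAGEBLOCK_NR_PAGES 1024UL

/* Hypothetical, simplified stand-in for the two struct zone fields
 * that the quoted check uses. */
struct zone_model {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* True if pfn lies inside the span of the zone. */
static bool pfn_in_zone(const struct zone_model *z, unsigned long pfn)
{
	return pfn >= z->zone_start_pfn &&
	       pfn < z->zone_start_pfn + z->spanned_pages;
}

/*
 * Mirror of the quoted check, working on pfns instead of struct pages.
 * Returns false where the original returns 0 (block crosses the end of
 * the zone). Otherwise stores the inclusive [start, end] pfn range that
 * would be handed on to move_freepages().
 */
static bool block_range(const struct zone_model *z, unsigned long pfn,
			unsigned long *start, unsigned long *end)
{
	/* Assumed step: round down to the start of the pageblock. */
	unsigned long start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long end_pfn = start_pfn + PAGEBLOCK_NR_PAGES - 1;

	/* Precondition: the page we were given belongs to this zone. */
	assert(pfn_in_zone(z, pfn));

	/* Do not cross zone boundaries */
	if (start_pfn < z->zone_start_pfn)
		start_pfn = pfn;		/* start_page = page */
	if (end_pfn >= z->zone_start_pfn + z->spanned_pages)
		return false;			/* return 0 */

	*start = start_pfn;
	*end = end_pfn;
	return true;
}
```

Whenever block_range() returns true, both ends of the range satisfy
pfn_in_zone(). In this model that is why the BUG_ON should be unreachable
unless page_zone() is reading corrupted struct pages.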
Ordinarily at this point, I would assume that your memory is bad with
small errors occurring. The early-in-boot problem might indicate that
there is a specific region of memory that is bust rather than something
like a power problem.
You said that the checksumming problem happens on two separate machines,
but can you confirm that the sysctl BUG_ON also happens on both please?
> > gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> >
This is a bit of a reach, but how confident are you that this version of
gcc is building kernels correctly?
There are a few disconnected reports of kernel problems with this
particular version of gcc although none that I can connect with this
problem or on x86 for that matter. One example is
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=536354
which reported problems building kernels on the s390 with that compiler.
Moving to 4.2 helped them and it *should* have been fixed according to
this bug
http://bugzilla.kernel.org/show_bug.cgi?id=13012
It might be a red herring, but just to be sure, would you mind trying
gcc 4.2 or 4.3 please?
> > Full text of Oops:
> >
> > BUG: unable to handle kernel paging request at 6eae67fc
> > IP: [<c0192a38>] __rmqueue+0x51/0x2b3
> > *pdpt = 00000000351be001 *pde = 0000000000000000
> > Oops: 0002 [#1] SMP
> > last sysfs file: /sys/class/firmware/0000:00:0b.0/loading
> > Modules linked in: netconsole af_packet autofs4 nfsd nfs lockd fscache nfs_acl
> > auth_rpcgss sunrpc ipv6 nls_iso8859_1 nls_cp437 vfat fat xfs exportfs fuse
> > configfs dm_snapshot dm_mirror dm_region_hash dm_log dm_mod eeprom w83781d
> > hwmon_vid hwmon r128 drm tuner_simple tuner_types tuner msp3400 saa7115 button
> > processor ivtv i2c_algo_bit cx2341x v4l2_common videodev psmouse parport_pc
> > v4l1_compat rtc_cmos parport tveeprom i2c_piix4 rtc_core intel_agp serio_raw
> > rtc_lib agpgart i2c_core shpchp pci_hotplug pcspkr evdev ext3 jbd mbcache raid1
> > sg sr_mod sd_mod cdrom crc_t10dif ata_generic pata_acpi pata_pdc202xx_old
> > ata_piix floppy e1000 uhci_hcd libata thermal fan unix [last unloaded:
> > scsi_wait_scan]
> >
> > Pid: 6629, comm: cfagent Not tainted (2.6.32.7 #1) System Name
> > EIP: 0060:[<c0192a38>] EFLAGS: 00210002 CPU: 0
> > EIP is at __rmqueue+0x51/0x2b3
What line does addr2line say c0192a38 corresponds to?
> > EAX: c146a018 EBX: 0000000a ECX: 6eae67f8 EDX: c050b654
> > ESI: c050b644 EDI: 00200246 EBP: f51c9d1c ESP: f51c9cec
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > Process cfagent (pid: 6629, ti=f51c8000 task=f51b40b0 task.ti=f51c8000)
> > Stack:
> > 00000002 00000000 c050b260 00000001 f6ba8280 00200002 c0193c92 c019404e
> > <0> c146a000 c1479ff8 c050b260 00200246 f51c9d78 c0193cd5 f51c9d7c 00000002
> > <0> 00000000 00000000 000201da c050c16c 00000000 c050b280 00000001 0000001f
> > Call Trace:
> > [<c0193c92>] ? get_page_from_freelist+0xdf/0x3a8
> > [<c019404e>] ? __alloc_pages_nodemask+0xdd/0x481
> > [<c0193cd5>] ? get_page_from_freelist+0x122/0x3a8
> > [<c019404e>] ? __alloc_pages_nodemask+0xdd/0x481
> > [<c01caa57>] ? _d_rehash+0x3c/0x40
> > [<c01961e3>] ? __do_page_cache_readahead+0x80/0x15b
> > [<c01cb95f>] ? __d_lookup+0xa1/0xd5
> > [<c01962d5>] ? ra_submit+0x17/0x1c
> > [<c01964e4>] ? ondemand_readahead+0x150/0x15c
> > [<c0196569>] ? page_cache_sync_readahead+0x16/0x1b
> > [<c0190def>] ? generic_file_aio_read+0x212/0x507
> > [<c01bd512>] ? do_sync_read+0xab/0xe9
> > [<c01a86f5>] ? mmap_region+0x25b/0x334
> > [<c014823f>] ? autoremove_wake_function+0x0/0x33
> > [<c020edd8>] ? security_file_permission+0xf/0x11
> > [<c01bd467>] ? do_sync_read+0x0/0xe9
> > [<c01bdc1d>] ? vfs_read+0x8a/0x13f
> > [<c01be026>] ? sys_read+0x3b/0x60
> > [<c010296f>] ? sysenter_do_call+0x12/0x27
> > Code: 2c c1 e1 03 8d 94 30 20 02 00 00 e9 8a 00 00 00 8d 72 0c 8d 04 0e 39 00
> > 74 7c 8b 55 d0 8b 04 d6 8d 48 e8 89 4d f0 8b 08 8b 50 04 <89> 51 04 89 0a c7 40
> > 04 00 02 20 00 c7 00 00 01 10 00 0f ba 70
> > EIP: [<c0192a38>] __rmqueue+0x51/0x2b3 SS:ESP 0068:f51c9cec
> > CR2: 000000006eae67fc
> > ---[ end trace db0096b2091950d0 ]---
> >
>
> Strange regression. I'd be suspecting that we've mucked up the initial
> mem_map, perhaps because of a wart in the e820 or acpi tables.
>
> Or perhaps it's something else.
>
Let's see what the early boot looked like.
Tony, would you mind booting with "mminit_loglevel=4 loglevel=9" and sending
the full dmesg please?
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab