From: "Randy.Dunlap" <rddunlap@osdl.org>
To: Brad Fitzpatrick <brad@danga.com>
Cc: Jeff Garzik <jgarzik@pobox.com>,
	linux-kernel@vger.kernel.org, ak@suse.de
Subject: Re: [OOPS] 2.6.9-rc4, dual Opteron, NUMA, 8GB
Date: Wed, 13 Oct 2004 13:34:01 -0700
Message-ID: <416D9139.1060200@osdl.org>
In-Reply-To: <Pine.LNX.4.58.0410131328400.31327@danga.com>

Brad Fitzpatrick wrote:
> On Wed, 13 Oct 2004, Randy.Dunlap wrote:
> 
> 
>>Brad Fitzpatrick wrote:
>>
>>>On Wed, 13 Oct 2004, Jeff Garzik wrote:
>>>
>>>
>>>
>>>>Brad Fitzpatrick wrote:
>>>>
>>>>
>>>>>I'm reporting an oops.  Details follow.
>>>>>
>>>>>I have two of these machines.  I will happily be anybody's guinea pig
>>>>>to debug this.  (more details, access to machine, try patches, kernels...)
>>>>>Machines aren't in production.
>>>>>
>>>>>- Brad
>>>>>
>>>>>
>>>>>Kernel:  2.6.9-rc4 vanilla (.config below)
>>>>>
>>>>>Hardware:  IBM eServer 325, Dual Opteron 8GB ram (more info below)
>>>>>
>>>>>Pre-crash and crash:
>>>>>
>>>>>a1:~# mke2fs /dev/mapper/raid10-data
>>>>>mke2fs 1.35 (28-Feb-2004)
>>>>>Filesystem label=
>>>>>OS type: Linux
>>>>>Block size=4096 (log=2)
>>>>>Fragment size=4096 (log=2)
>>>>>25608192 inodes, 51200000 blocks
>>>>>2560000 blocks (5.00%) reserved for the super user
>>>>>First data block=0
>>>>>1563 block groups
>>>>>32768 blocks per group, 32768 fragments per group
>>>>>16384 inodes per group
>>>>>Superblock backups stored on blocks:
>>>>>       32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>>>>       4096000, 7962624, 11239424, 20480000, 23887872
>>>>>
>>>>>Writing inode tables: 1091/1563
>>>>>Message from syslogd@localhost at Wed Oct 13 11:46:01 2004 ...
>>>>>localhost kernel: Oops: 0000 [1] SMP
>>>>>
>>>>>Message from syslogd@localhost at Wed Oct 13 11:46:01 2004 ...
>>>>>localhost kernel: CR2: 0000000000001770
>>>>
>>>>
>>>>What's your block device configuration?  What block devices are sitting
>>>>on top of what other block devices?
>>>
>>>
>>>/dev/mapper/raid10-data is a LV taking 200GB of a 280GB VG ("raid10") with
>>>a single PV in it:  /dev/sdb1 -- ips driver, IBM ServeRAID 6M card,
>>>representing a RAID 10 atop 8 SCSI disks.
>>>
>>>I just made a new kernel without NUMA and made a filesystem on /dev/sdb1
>>>directly instead of using LVM and it worked fine, if not a little slowly.
>>>
>>>Now that I know it /can/ work, I'll try and narrow down whose fault it is:
>>>NUMA or LVM.
>>
>>Very similar to
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109328505204081&w=2
>>and its follow-up:
>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109330259511819&w=2
>>
>>but no solutions there.
> 
> 
> Well, good to know I'm not alone?  :-)
> 
> I was just about to mail and report that disabling NUMA does help:
> 
>    NUMA + mke2fs on LVM:      OOPS  (mailed earlier)
> no NUMA + mke2fs on LVM:      okay
>    NUMA + mke2fs on sdb1:     OOPS  (below)
> no NUMA + mke2fs on sdb1:     okay
> 
> no NUMA + mount e2fs on LVM:  okay
> no NUMA + mount e2fs on sdb1: okay
>    NUMA + mount e2fs on LVM:  okay
>    NUMA + mount e2fs on sdb1: untested, assume okay
> 
> 
> OOPs when doing mke2fs on /dev/sdb1, with NUMA enabled:
> 
> Oct 13 13:24:37 localhost kernel: Unable to handle kernel paging request at 0000000000001770 RIP:
> Oct 13 13:24:37 localhost kernel: <ffffffff8015efe4>{kmem_getpages+132}
> Oct 13 13:24:37 localhost kernel: PML4 1f8fe6067 PGD 1f8fef067 PMD 0
> Oct 13 13:24:37 localhost kernel: Oops: 0000 [1] SMP
> Oct 13 13:24:37 localhost kernel: CPU 0
> Oct 13 13:24:37 localhost kernel: Modules linked in: af_packet tsdev mousedev joydev usbhid ohci_hcd hw_random amd74xx evdev tg3 dm_mod ide_generic ide_cd ide_core cdrom rtc ext3 jbd mbcache sd_mod ips mptscsih mptbase scsi_mod unix
> Oct 13 13:24:37 localhost kernel: Pid: 3145, comm: mke2fs Not tainted 2.6.9-rc4
> Oct 13 13:24:37 localhost kernel: RIP: 0010:[kmem_getpages+132/432] <ffffffff8015efe4>{kmem_getpages+132}
> Oct 13 13:24:37 localhost kernel: RSP: 0018:00000101f81b7aa8  EFLAGS: 00010213
> Oct 13 13:24:37 localhost kernel: RAX: ffffffff7fffffff RBX: 00000101fffc9680 RCX: 0000000000000000
> Oct 13 13:24:37 localhost kernel: RDX: 0000010000011700 RSI: 00000100000119c0 RDI: 0000010000012500
> Oct 13 13:24:37 localhost kernel: RBP: 00000101fffc9680 R08: 000001016bc01000 R09: 00000101fffc96e8
> Oct 13 13:24:37 localhost kernel: R10: 0000000000000000 R11: 00000000fffffffa R12: 00000101fffc9680
> Oct 13 13:24:37 localhost kernel: R13: 0000000000000000 R14: 00000101fffc9728 R15: 0000000000000001
> Oct 13 13:24:37 localhost kernel: FS:  0000002a95ddb4a0(0000) GS:ffffffff803df300(0000) knlGS:0000000000000000
> Oct 13 13:24:37 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Oct 13 13:24:37 localhost kernel: CR2: 0000000000001770 CR3: 0000000000101000 CR4: 00000000000006e0
> Oct 13 13:24:37 localhost kernel: Process mke2fs (pid: 3145, threadinfo 00000101f81b6000, task 00000101fe9b4030)
> Oct 13 13:24:37 localhost kernel: Stack: 000001016c3fa000 0000000000000000 0000000000000050 ffffffff8015ff6e
> Oct 13 13:24:37 localhost kernel:        0000005000000010 000000000000003c 00000100fbf6b000 00000101fffc9680
> Oct 13 13:24:37 localhost kernel:        00000101fffc96c8 00000101fffc9728
> Oct 13 13:24:37 localhost kernel: Call Trace:<ffffffff8015ff6e>{cache_grow+190} <ffffffff801601c6>{cache_alloc_refill+422}
> Oct 13 13:24:37 localhost kernel:        <ffffffff801604b6>{kmem_cache_alloc+54} <ffffffff8017d761>{alloc_buffer_head+17}
> Oct 13 13:24:37 localhost kernel:        <ffffffff8017aeba>{create_buffers+42} <ffffffff8017b884>{create_empty_buffers+20}
> Oct 13 13:24:37 localhost kernel:        <ffffffff8017bcdf>{__block_prepare_write+175} <ffffffff80180190>{blkdev_get_block+0}
> Oct 13 13:24:37 localhost kernel:        <ffffffff8017c78a>{block_prepare_write+26} <ffffffff80158dc4>{generic_file_buffered_write+404}
> Oct 13 13:24:37 localhost kernel:        <ffffffff80193fae>{inode_update_time+158} <ffffffff801594dd>{generic_file_aio_write_nolock+765}
> Oct 13 13:24:37 localhost kernel:        <ffffffff801595b5>{generic_file_write_nolock+165} <ffffffff80134ef3>{__wake_up+67}
> Oct 13 13:24:37 localhost kernel:        <ffffffff802a46fe>{thread_return+41} <ffffffff80136890>{autoremove_wake_function+0}
> Oct 13 13:24:37 localhost kernel:        <ffffffff8018128a>{blkdev_file_write+26} <ffffffff801789e4>{vfs_write+228}
> Oct 13 13:24:37 localhost kernel:        <ffffffff80178b13>{sys_write+83} <ffffffff8011195a>{system_call+126}
> Oct 13 13:24:37 localhost kernel:
> Oct 13 13:24:37 localhost kernel:
> Oct 13 13:24:37 localhost kernel: Code: 48 8b 91 70 17 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00
> Oct 13 13:24:37 localhost kernel: RIP <ffffffff8015efe4>{kmem_getpages+132} RSP <00000101f81b7aa8>
> Oct 13 13:24:37 localhost kernel: CR2: 0000000000001770
> 
> 
> 
> Randy, if you're interested and you're actually at OSDL Beaverton, I'm
> just across the street from you.  I could carry this 1U server and 3U
> drive cabinet over to you!  :)

I am at the Round in Beaverton.  It might be easier for me to go to
the server. :)

> Who's responsible for the K8_NUMA stuff?  I'd love to work with them to
> narrow this down.

Andi Kleen (SUSE).  Copied.
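
For anyone following along, the faulting instruction can be decoded by hand
from the Code: line in the oops quoted above.  A minimal Python sketch,
assuming the Code: bytes begin at the faulting RIP (nothing in this message
confirms that).  The helper name decode_mov_load is made up for illustration,
and it handles only this one instruction form:

```python
# First seven bytes of the "Code:" line from the oops.
# Assumption: they start at the faulting RIP (kmem_getpages+132).
CODE = bytes.fromhex("488b9170170000")

# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B extension).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode 'REX.W 8B /r' with a 32-bit displacement: mov disp32(base), dest."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod = modrm >> 6          # addressing mode
    reg = (modrm >> 3) & 7    # destination register
    rm = modrm & 7            # base register
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the plain base+disp32 form is handled here")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    return REGS64[reg], REGS64[rm], disp


dest, base, disp = decode_mov_load(CODE)

# Register values copied from the oops register dump.
registers = {"rcx": 0x0000000000000000}
fault_addr = registers[base] + disp

print(f"mov {hex(disp)}(%{base}),%{dest}")
print("effective address:", hex(fault_addr))
```

With RCX = 0 in the register dump, the computed address equals CR2 (0x1770),
which is consistent with a field being read at offset 0x1770 through a NULL
pointer.  Which structure that is cannot be told from the oops alone.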

-- 
~Randy
