Re: ASPLOV miss ratio porting to planet labs kernel

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sizhao Yang <zaoyang@gmail.com>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: ASPLOV miss ratio porting to planet labs kernel
Date: Fri, 15 Jul 2005 13:03:17 -0500	[thread overview]
Message-ID: <f0655b9a05071511036f1733b0@mail.gmail.com> (raw)
In-Reply-To: <20050708042843.GB5793@dmt.cnet>

Hi Marcelo,

Thank you for the quick response.  I actually didn't anticipate that
quick of a response.

> > I was wondering if someone could help me with this.  I'm porting an
> > ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to
> > Planet Labs kernel.  It's a novel idea for memory management.  In
> > porting I at run time I'm consistently hitting kernel bugs at four
> > different places bad_page, bad_range, in rmap.c
> > BUG(page_mapcount(page)< 0), and failing at apm_do_idle.  All of these
> > functions except apm_do_idle seem to be new functions from 2.4.20 to
> > 2.6.11.6.  I'm pretty sure I'm forgetting to account for certain
> > things when modifying the pages, but I'm not sure where.
> 
> Having the information which bad_page etc. dump out would definately help.
> 
> I can't figure out what is going on with the data you provide, probably
> someone else can.

Here are some data dumps that I have:   for bad_page, bad_range, and rmap.c 
error 1: When this function has an error it traces back to bad_range
and bad_page.
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:647!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0137af8>]    Not tainted VLI
EFLAGS: 00000202   (2.6.11-kgdb)
EIP is at buffered_rmqueue+0x178/0x1d0
eax: 00000001   ebx: c033c164   ecx: fff99e0b   edx: 00001000
esi: c033c164   edi: 00000246   ebp: c033c180   esp: c7db5df0
ds: 007b   es: 007b   ss: 0068
Process nbench (pid: 301, threadinfo=c7db4000 task=c1273020)
Stack: c033c190 c7e3b000 c033c178 000080d2 00000000 c033c164 00000000 00000000
       000080d2 c013800d 00000001 00000000 00000000 c11657e0 00000001 00000000
       c1273020 00000010 c033c3cc 00000000 c03e0aa0 003ba025 c7dbffd0 c12f385c
Call Trace:
 [<c013800d>] __alloc_pages+0x3fd/0x460
 [<c0142e19>] do_anonymous_page+0x59/0x120
 [<c0142f3e>] do_no_page+0x5e/0x280
 [<c014338c>] handle_mm_fault+0x12c/0x1b0
 [<c0113a76>] do_page_fault+0x176/0x5db
 [<c014479d>] vma_merge+0xbd/0x180
 [<c0107cd1>] old_mmap+0xc1/0x100
 [<c0113900>] do_page_fault+0x0/0x5db
 [<c0102f8b>] error_code+0x2b/0x30
Code: 00 00 00 8b 44 24 08 83 c4 14 5b 5e 5f 5d c3 8b 54 24 10 8b 44
24 08 e8 c7 f6 ff ff eb e5 0f 0b 5e 02 ed f4 2f c0 e9 7c ff ff ff <0f>
0b 87 02 ed f4 2f c0 e9 09 ff ff ff 9c 5f fa 8b 54 24 10 89
 <1>Unable to handle kernel paging request at virtual address 00100104
 printing eip:
c0137927
*pde = 00000000
Oops: 0002 [#2]
Modules linked in:
CPU:    0
EIP:    0060:[<c0137927>]    Not tainted VLI
EFLAGS: 00000097   (2.6.11-kgdb)


error 2: main error: For rmap.c:

------------[ cut here ]------------
kernel BUG at mm/rmap.c:487!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0147b5d>]    Not tainted VLI
EFLAGS: 00000292   (2.6.11-kgdb)
EIP is at page_remove_rmap+0x3d/0x70
eax: ffffffe0   ebx: c10fc6a0   ecx: 00000000   edx: c7d9de74
esi: c7d78124   edi: 00010000   ebp: c10fc6a0   esp: c7d9de70
ds: 007b   es: 007b   ss: 0068
Process ls (pid: 302, threadinfo=c7d9c000 task=c1273020)
Stack: c02ffb8f 00001000 c0141442 c0300460 c7ddcfa4 00000000 c12f116c 07e35025
       00000000 08048000 c03bda38 08448000 c7dde084 08058000 c03bda38 c01415ab
       00010000 00000000 c7ddeb7c b83e9000 08048000 c7dde084 08058000 c03bda38
Call Trace:
 [<c0141442>] zap_pte_range+0x162/0x280
 [<c01415ab>] zap_pmd_range+0x4b/0x70
 [<c014160d>] zap_pud_range+0x3d/0x60
 [<c0141694>] unmap_page_range+0x64/0x80
 [<c01417b9>] unmap_vmas+0x109/0x220
 [<c0145e87>] exit_mmap+0x67/0x130
 [<c0116fde>] mmput+0x1e/0x70
 [<c011ab99>] do_exit+0x99/0x360
 [<c011d2d3>] tasklet_action+0x43/0x70
 [<c011d099>] __do_softirq+0x79/0x90
 [<c011aecf>] do_group_exit+0x2f/0x70
 [<c0102de3>] syscall_call+0x7/0xb
Code: 08 ff 0f 98 c0 84 c0 74 3a 8b 43 08 40 78 26 8b 43 08 40 78 16
8b 5c 24 04 ba ff  ff ff ff b8 10 00 00 00 83 c4 08 e9 43 08 ff ff
<0f> 0b e7 01 7c fb 2f c0 eb e0 c7 04  24 8f fb 2f c0 e8 6d 14 fd
 <6>note: ls[302] exited with preempt_count 1

 ------------[ cut here ]------------
kernel BUG at mm/rmap.c:487!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0147b3d>]    Not tainted VLI
EFLAGS: 00000296   (2.6.11-kgdb)
EIP is at page_remove_rmap+0x3d/0x70
eax: ffffffa0   ebx: c10fb680   ecx: 00000000   edx: c7dc9e88
esi: c7d37fd0   edi: 00021000   ebp: c10fb680   esp: c7dc9e84
ds: 007b   es: 007b   ss: 0068
Process nbench (pid: 301, threadinfo=c7dc8000 task=c1273020)
Stack: c02ffb6f 00016000 c0141436 00000246 00000100 00000000 c7dc9ef8 07db4067
       00000000 b7fde000 c03bda38 b83de000 c7dcfb80 b7fff000 c03bda38 c014159b
       00021000 00000000 c01205f0 00000100 b7fde000 c7dcfb80 b7fff000 c03bda38
Call Trace:
 [<c0141436>] zap_pte_range+0x156/0x270
 [<c014159b>] zap_pmd_range+0x4b/0x70
 [<c01205f0>] cascade+0x30/0x50
 [<c01415fd>] zap_pud_range+0x3d/0x60
 [<c0106bb9>] timer_interrupt+0x49/0xe0
 [<c0141684>] unmap_page_range+0x64/0x80
 [<c01417a9>] unmap_vmas+0x109/0x220
 [<c01457da>] unmap_region+0x6a/0xd0
 [<c0145a91>] do_munmap+0xf1/0x130
 [<c0145b10>] sys_munmap+0x40/0x70
 [<c0102de3>] syscall_call+0x7/0xb
Code: 08 ff 0f 98 c0 84 c0 74 3a 8b 43 08 40 78 26 8b 43 08 40 78 16
8b 5c 24 04 ba ff ff ff ff b8 10 00 00 00 83 c4 08 e9 63 08 ff ff <0f>
0b e7 01 5c fb 2f c0 eb e0 c7 04 24 6f fb 2f c0 e8 8d 14 fd
 <6>note: nbench[301] exited with preempt_count 1
scheduling while atomic: nbench/0x00000001/301
 [<c02e9038>] __switch_to_end+0x20c/0x214
 [<c01046eb>] do_IRQ+0x3b/0x70
 [<c0102f52>] common_interrupt+0x1a/0x20
 [<c02e9acd>] rwsem_down_read_failed+0x8d/0x170
 [<c011bff0>] .text.lock.exit+0x27/0x87
 [<c011ab99>] do_exit+0x99/0x360
 [<c0103678>] die+0x138/0x140
 [<c01039d0>] do_invalid_op+0x0/0xc0
 [<c0103a6f>] do_invalid_op+0x9f/0xc0
 [<c0120ad6>] update_process_times+0x166/0x210
 [<c0147b3d>] page_remove_rmap+0x3d/0x70
 [<c0132a79>] handle_IRQ_event+0x29/0x60
 [<c011d099>] __do_softirq+0x79/0x90
 [<c01046eb>] do_IRQ+0x3b/0x70
 [<c0102f52>] common_interrupt+0x1a/0x20
 [<c0102f8b>] error_code+0x2b/0x30
 [<c0147b3d>] page_remove_rmap+0x3d/0x70
 [<c0141436>] zap_pte_range+0x156/0x270
 [<c014159b>] zap_pmd_range+0x4b/0x70
 [<c01205f0>] cascade+0x30/0x50
 [<c01415fd>] zap_pud_range+0x3d/0x60
 [<c0106bb9>] timer_interrupt+0x49/0xe0
 [<c0141684>] unmap_page_range+0x64/0x80
 [<c01417a9>] unmap_vmas+0x109/0x220
 [<c01457da>] unmap_region+0x6a/0xd0
 [<c0145a91>] do_munmap+0xf1/0x130
 [<c0145b10>] sys_munmap+0x40/0x70
 [<c0102de3>] syscall_call+0x7/0xb


> Why dont you post the code (in case its GPL)...

The patch is here.  

https://wiki.planet-lab.org/twiki/pub/Planetlab/UIUCProject/mrc2.6.11.6patch.diff.txt.txt

It's for the 2.6.11.6 kernel.  To replicate the error fully before
compiling: please comment the mrc_reset_pgtables(p); in mm/mrc.c on
line 433.  If one comments out mrc_scan as well then there's no run
time crashes at all.  So all of the errors comes from mrc_scan and
mrc_reset_pgtables, it's just I can't figure out how it's corrupting
the pages, and how I should account for that in the code.

Also, in make xconfig a menu should come up. Go to "Kernel hacking"
then uncheck "Sleep-inside-spinlock checking"   I usually use the
virtual emulator qemu when debugging.  Finally when running mrc.  You
want to start a program and get the process id then do the following:

echo "<process id> 1 1 100 100" > /proc/sys/vm/mrc_info   This will
turn mrc on for the given process.

All MRC code is surrounded by //MRC or something like it for easy
recognition.  I usually use qemu if you use qemu you may want to
download this:

http://fabrice.bellard.free.fr/qemu/linux-test-0.5.1.tar.gz
http://fabrice.bellard.free.fr/qemu/qemu-0.7.0.tar.gz

and when running qemu.sh in linux-test change the script to
"qemu -nographic -hda linux.img -kernel
/boot/vmlinuz-2.6.11-kgdb -append "console=ttyS0 root=/dev/hda
sb=0x220,5,1,5 ide2=noprobe ide3=noprobe ide4=noprobe
ide5=noprobe"

I'd appreciate any direction you can provide.  Thanks for your help in
advance.  I look forward to hearing from you.

Zao

     prev parent reply	other threads:[~2005-07-15 18:09 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-07-07 17:28 ASPLOV miss ratio porting to planet labs kernel Sizhao Yang
2005-07-08  4:28 ` Marcelo Tosatti
2005-07-15 18:03   ` Sizhao Yang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f0655b9a05071511036f1733b0@mail.gmail.com \
    --to=zaoyang@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo.tosatti@cyclades.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.