ASPLOV miss ratio porting to planet labs kernel

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* ASPLOV miss ratio porting to planet labs kernel
@ 2005-07-07 17:28 Sizhao Yang
  2005-07-08  4:28 ` Marcelo Tosatti
  0 siblings, 1 reply; 3+ messages in thread
From: Sizhao Yang @ 2005-07-07 17:28 UTC (permalink / raw)
  To: linux-kernel

Hi all,

I was wondering if someone could help me with this.  I'm porting an
ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to
Planet Labs kernel.  It's a novel idea for memory management.  In
porting I at run time I'm consistently hitting kernel bugs at four
different places bad_page, bad_range, in rmap.c
BUG(page_mapcount(page)< 0), and failing at apm_do_idle.  All of these
functions except apm_do_idle seem to be new functions from 2.4.20 to
2.6.11.6.  I'm pretty sure I'm forgetting to account for certain
things when modifying the pages, but I'm not sure where.

What I'm doing in the port is resetting protection bits so that when
it page faults.  It will calculate a miss ratio based on the number of
accessed bits and other information.  After I gather the information I
will reset the accessed bits.  Then based on previous miss ratios and
current miss ratio it will give out memory to different processes
based on that.  That's the general idea.  For more specifics:

http://carmen.cs.uiuc.edu/paper/ASPLOS04-Zhou.pdf

I've narrowed it down to primarily when I call the following functions:
ptep_test_and_clear_young,
static inline pte_t pte_mknominor(pte_t pte) { (pte).pte_low &=
~_PAGE_PROTNONE; return pte; }
static inline pte_t pte_mkminor(pte_t pte) { (pte).pte_low |=
_PAGE_PROTNONE; return pte; }
static inline pte_t pte_mkpresent(pte_t pte) { (pte).pte_low |=
_PAGE_PRESENT; return pte; }
static inline pte_t pte_mkabsent(pte_t pte) { (pte).pte_low &=
~_PAGE_PRESENT; return pte; }

When I don't have those functions in my code the kernel doesn't crash,
but when I do they crash.  So, my question is am I to page accounting
aspects? I looked at rmap functions for incrementing the _mapcount but
they seem to be only for when a pte is copied.  Should I be
incrementing the pagecount at any point?  rmap.c
BUG(page_mapcount(page)< 0) is invoked when the accessed bits are
cleared in zap_pte, but I don't know how the page is being corrupted. 
So, my main issue is if anyone can help point me to the direction
where a page could be corrupted I would greatly appreciate.

I'd appreciate any general direction anyone can point me at.  Thanks
in advance.  I look forward to hearing from anyone.

Zao

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ASPLOV miss ratio porting to planet labs kernel
  2005-07-07 17:28 ASPLOV miss ratio porting to planet labs kernel Sizhao Yang
@ 2005-07-08  4:28 ` Marcelo Tosatti
  2005-07-15 18:03   ` Sizhao Yang
  0 siblings, 1 reply; 3+ messages in thread
From: Marcelo Tosatti @ 2005-07-08  4:28 UTC (permalink / raw)
  To: Sizhao Yang; +Cc: linux-kernel

On Thu, Jul 07, 2005 at 12:28:09PM -0500, Sizhao Yang wrote:
> Hi all,
> 
> I was wondering if someone could help me with this.  I'm porting an
> ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to
> Planet Labs kernel.  It's a novel idea for memory management.  In
> porting I at run time I'm consistently hitting kernel bugs at four
> different places bad_page, bad_range, in rmap.c
> BUG(page_mapcount(page)< 0), and failing at apm_do_idle.  All of these
> functions except apm_do_idle seem to be new functions from 2.4.20 to
> 2.6.11.6.  I'm pretty sure I'm forgetting to account for certain
> things when modifying the pages, but I'm not sure where.

Having the information which bad_page etc. dump out would definately help.

I can't figure out what is going on with the data you provide, probably
someone else can.

> What I'm doing in the port is resetting protection bits so that when
> it page faults.  It will calculate a miss ratio based on the number of
> accessed bits and other information.  After I gather the information I
> will reset the accessed bits.  Then based on previous miss ratios and
> current miss ratio it will give out memory to different processes
> based on that.  That's the general idea.  For more specifics:
> 
> http://carmen.cs.uiuc.edu/paper/ASPLOS04-Zhou.pdf
> 
> I've narrowed it down to primarily when I call the following functions:
> ptep_test_and_clear_young,
> static inline pte_t pte_mknominor(pte_t pte) { (pte).pte_low &=
> ~_PAGE_PROTNONE; return pte; }
> static inline pte_t pte_mkminor(pte_t pte) { (pte).pte_low |=
> _PAGE_PROTNONE; return pte; }
> static inline pte_t pte_mkpresent(pte_t pte) { (pte).pte_low |=
> _PAGE_PRESENT; return pte; }
> static inline pte_t pte_mkabsent(pte_t pte) { (pte).pte_low &=
> ~_PAGE_PRESENT; return pte; }
> 
> When I don't have those functions in my code the kernel doesn't crash,
> but when I do they crash.  So, my question is am I to page accounting
> aspects? I looked at rmap functions for incrementing the _mapcount but
> they seem to be only for when a pte is copied.  Should I be
> incrementing the pagecount at any point?

Nope - that should be internal to rmap.c (you shouldnt touch mapcount directly). 
But you dont seem to be doing that anyway. 

> rmap.c
> BUG(page_mapcount(page)< 0) is invoked when the accessed bits are
> cleared in zap_pte, but I don't know how the page is being corrupted. 

Why dont you post the code (in case its GPL)...

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ASPLOV miss ratio porting to planet labs kernel
  2005-07-08  4:28 ` Marcelo Tosatti
@ 2005-07-15 18:03   ` Sizhao Yang
  0 siblings, 0 replies; 3+ messages in thread
From: Sizhao Yang @ 2005-07-15 18:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

Hi Marcelo,

Thank you for the quick response.  I actually didn't anticipate that
quick of a response.

> > I was wondering if someone could help me with this.  I'm porting an
> > ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to
> > Planet Labs kernel.  It's a novel idea for memory management.  In
> > porting I at run time I'm consistently hitting kernel bugs at four
> > different places bad_page, bad_range, in rmap.c
> > BUG(page_mapcount(page)< 0), and failing at apm_do_idle.  All of these
> > functions except apm_do_idle seem to be new functions from 2.4.20 to
> > 2.6.11.6.  I'm pretty sure I'm forgetting to account for certain
> > things when modifying the pages, but I'm not sure where.
> 
> Having the information which bad_page etc. dump out would definately help.
> 
> I can't figure out what is going on with the data you provide, probably
> someone else can.

Here are some data dumps that I have:   for bad_page, bad_range, and rmap.c 
error 1: When this function has an error it traces back to bad_range
and bad_page.
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:647!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0137af8>]    Not tainted VLI
EFLAGS: 00000202   (2.6.11-kgdb)
EIP is at buffered_rmqueue+0x178/0x1d0
eax: 00000001   ebx: c033c164   ecx: fff99e0b   edx: 00001000
esi: c033c164   edi: 00000246   ebp: c033c180   esp: c7db5df0
ds: 007b   es: 007b   ss: 0068
Process nbench (pid: 301, threadinfo=c7db4000 task=c1273020)
Stack: c033c190 c7e3b000 c033c178 000080d2 00000000 c033c164 00000000 00000000
       000080d2 c013800d 00000001 00000000 00000000 c11657e0 00000001 00000000
       c1273020 00000010 c033c3cc 00000000 c03e0aa0 003ba025 c7dbffd0 c12f385c
Call Trace:
 [<c013800d>] __alloc_pages+0x3fd/0x460
 [<c0142e19>] do_anonymous_page+0x59/0x120
 [<c0142f3e>] do_no_page+0x5e/0x280
 [<c014338c>] handle_mm_fault+0x12c/0x1b0
 [<c0113a76>] do_page_fault+0x176/0x5db
 [<c014479d>] vma_merge+0xbd/0x180
 [<c0107cd1>] old_mmap+0xc1/0x100
 [<c0113900>] do_page_fault+0x0/0x5db
 [<c0102f8b>] error_code+0x2b/0x30
Code: 00 00 00 8b 44 24 08 83 c4 14 5b 5e 5f 5d c3 8b 54 24 10 8b 44
24 08 e8 c7 f6 ff ff eb e5 0f 0b 5e 02 ed f4 2f c0 e9 7c ff ff ff <0f>
0b 87 02 ed f4 2f c0 e9 09 ff ff ff 9c 5f fa 8b 54 24 10 89
 <1>Unable to handle kernel paging request at virtual address 00100104
 printing eip:
c0137927
*pde = 00000000
Oops: 0002 [#2]
Modules linked in:
CPU:    0
EIP:    0060:[<c0137927>]    Not tainted VLI
EFLAGS: 00000097   (2.6.11-kgdb)


error 2: main error: For rmap.c:

------------[ cut here ]------------
kernel BUG at mm/rmap.c:487!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0147b5d>]    Not tainted VLI
EFLAGS: 00000292   (2.6.11-kgdb)
EIP is at page_remove_rmap+0x3d/0x70
eax: ffffffe0   ebx: c10fc6a0   ecx: 00000000   edx: c7d9de74
esi: c7d78124   edi: 00010000   ebp: c10fc6a0   esp: c7d9de70
ds: 007b   es: 007b   ss: 0068
Process ls (pid: 302, threadinfo=c7d9c000 task=c1273020)
Stack: c02ffb8f 00001000 c0141442 c0300460 c7ddcfa4 00000000 c12f116c 07e35025
       00000000 08048000 c03bda38 08448000 c7dde084 08058000 c03bda38 c01415ab
       00010000 00000000 c7ddeb7c b83e9000 08048000 c7dde084 08058000 c03bda38
Call Trace:
 [<c0141442>] zap_pte_range+0x162/0x280
 [<c01415ab>] zap_pmd_range+0x4b/0x70
 [<c014160d>] zap_pud_range+0x3d/0x60
 [<c0141694>] unmap_page_range+0x64/0x80
 [<c01417b9>] unmap_vmas+0x109/0x220
 [<c0145e87>] exit_mmap+0x67/0x130
 [<c0116fde>] mmput+0x1e/0x70
 [<c011ab99>] do_exit+0x99/0x360
 [<c011d2d3>] tasklet_action+0x43/0x70
 [<c011d099>] __do_softirq+0x79/0x90
 [<c011aecf>] do_group_exit+0x2f/0x70
 [<c0102de3>] syscall_call+0x7/0xb
Code: 08 ff 0f 98 c0 84 c0 74 3a 8b 43 08 40 78 26 8b 43 08 40 78 16
8b 5c 24 04 ba ff  ff ff ff b8 10 00 00 00 83 c4 08 e9 43 08 ff ff
<0f> 0b e7 01 7c fb 2f c0 eb e0 c7 04  24 8f fb 2f c0 e8 6d 14 fd
 <6>note: ls[302] exited with preempt_count 1

 ------------[ cut here ]------------
kernel BUG at mm/rmap.c:487!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0147b3d>]    Not tainted VLI
EFLAGS: 00000296   (2.6.11-kgdb)
EIP is at page_remove_rmap+0x3d/0x70
eax: ffffffa0   ebx: c10fb680   ecx: 00000000   edx: c7dc9e88
esi: c7d37fd0   edi: 00021000   ebp: c10fb680   esp: c7dc9e84
ds: 007b   es: 007b   ss: 0068
Process nbench (pid: 301, threadinfo=c7dc8000 task=c1273020)
Stack: c02ffb6f 00016000 c0141436 00000246 00000100 00000000 c7dc9ef8 07db4067
       00000000 b7fde000 c03bda38 b83de000 c7dcfb80 b7fff000 c03bda38 c014159b
       00021000 00000000 c01205f0 00000100 b7fde000 c7dcfb80 b7fff000 c03bda38
Call Trace:
 [<c0141436>] zap_pte_range+0x156/0x270
 [<c014159b>] zap_pmd_range+0x4b/0x70
 [<c01205f0>] cascade+0x30/0x50
 [<c01415fd>] zap_pud_range+0x3d/0x60
 [<c0106bb9>] timer_interrupt+0x49/0xe0
 [<c0141684>] unmap_page_range+0x64/0x80
 [<c01417a9>] unmap_vmas+0x109/0x220
 [<c01457da>] unmap_region+0x6a/0xd0
 [<c0145a91>] do_munmap+0xf1/0x130
 [<c0145b10>] sys_munmap+0x40/0x70
 [<c0102de3>] syscall_call+0x7/0xb
Code: 08 ff 0f 98 c0 84 c0 74 3a 8b 43 08 40 78 26 8b 43 08 40 78 16
8b 5c 24 04 ba ff ff ff ff b8 10 00 00 00 83 c4 08 e9 63 08 ff ff <0f>
0b e7 01 5c fb 2f c0 eb e0 c7 04 24 6f fb 2f c0 e8 8d 14 fd
 <6>note: nbench[301] exited with preempt_count 1
scheduling while atomic: nbench/0x00000001/301
 [<c02e9038>] __switch_to_end+0x20c/0x214
 [<c01046eb>] do_IRQ+0x3b/0x70
 [<c0102f52>] common_interrupt+0x1a/0x20
 [<c02e9acd>] rwsem_down_read_failed+0x8d/0x170
 [<c011bff0>] .text.lock.exit+0x27/0x87
 [<c011ab99>] do_exit+0x99/0x360
 [<c0103678>] die+0x138/0x140
 [<c01039d0>] do_invalid_op+0x0/0xc0
 [<c0103a6f>] do_invalid_op+0x9f/0xc0
 [<c0120ad6>] update_process_times+0x166/0x210
 [<c0147b3d>] page_remove_rmap+0x3d/0x70
 [<c0132a79>] handle_IRQ_event+0x29/0x60
 [<c011d099>] __do_softirq+0x79/0x90
 [<c01046eb>] do_IRQ+0x3b/0x70
 [<c0102f52>] common_interrupt+0x1a/0x20
 [<c0102f8b>] error_code+0x2b/0x30
 [<c0147b3d>] page_remove_rmap+0x3d/0x70
 [<c0141436>] zap_pte_range+0x156/0x270
 [<c014159b>] zap_pmd_range+0x4b/0x70
 [<c01205f0>] cascade+0x30/0x50
 [<c01415fd>] zap_pud_range+0x3d/0x60
 [<c0106bb9>] timer_interrupt+0x49/0xe0
 [<c0141684>] unmap_page_range+0x64/0x80
 [<c01417a9>] unmap_vmas+0x109/0x220
 [<c01457da>] unmap_region+0x6a/0xd0
 [<c0145a91>] do_munmap+0xf1/0x130
 [<c0145b10>] sys_munmap+0x40/0x70
 [<c0102de3>] syscall_call+0x7/0xb


> Why dont you post the code (in case its GPL)...

The patch is here.  

https://wiki.planet-lab.org/twiki/pub/Planetlab/UIUCProject/mrc2.6.11.6patch.diff.txt.txt

It's for the 2.6.11.6 kernel.  To replicate the error fully before
compiling: please comment the mrc_reset_pgtables(p); in mm/mrc.c on
line 433.  If one comments out mrc_scan as well then there's no run
time crashes at all.  So all of the errors comes from mrc_scan and
mrc_reset_pgtables, it's just I can't figure out how it's corrupting
the pages, and how I should account for that in the code.

Also, in make xconfig a menu should come up. Go to "Kernel hacking"
then uncheck "Sleep-inside-spinlock checking"   I usually use the
virtual emulator qemu when debugging.  Finally when running mrc.  You
want to start a program and get the process id then do the following:

echo "<process id> 1 1 100 100" > /proc/sys/vm/mrc_info   This will
turn mrc on for the given process.

All MRC code is surrounded by //MRC or something like it for easy
recognition.  I usually use qemu if you use qemu you may want to
download this:

http://fabrice.bellard.free.fr/qemu/linux-test-0.5.1.tar.gz
http://fabrice.bellard.free.fr/qemu/qemu-0.7.0.tar.gz

and when running qemu.sh in linux-test change the script to
"qemu -nographic -hda linux.img -kernel
/boot/vmlinuz-2.6.11-kgdb -append "console=ttyS0 root=/dev/hda
sb=0x220,5,1,5 ide2=noprobe ide3=noprobe ide4=noprobe
ide5=noprobe"

I'd appreciate any direction you can provide.  Thanks for your help in
advance.  I look forward to hearing from you.

Zao

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2005-07-15 18:09 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-07 17:28 ASPLOV miss ratio porting to planet labs kernel Sizhao Yang
2005-07-08  4:28 ` Marcelo Tosatti
2005-07-15 18:03   ` Sizhao Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox