From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
  id S261451AbVGHPB1 (ORCPT ); Fri, 8 Jul 2005 11:01:27 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
  id S262682AbVGHPB1 (ORCPT ); Fri, 8 Jul 2005 11:01:27 -0400
Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:4794
  "EHLO parcelfarce.linux.theplanet.co.uk") by vger.kernel.org
  with ESMTP id S261451AbVGHPB0 (ORCPT ); Fri, 8 Jul 2005 11:01:26 -0400
Date: Fri, 8 Jul 2005 01:28:44 -0300
From: Marcelo Tosatti
To: Sizhao Yang
Cc: linux-kernel@vger.kernel.org
Subject: Re: ASPLOV miss ratio porting to planet labs kernel
Message-ID: <20050708042843.GB5793@dmt.cnet>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.1i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 07, 2005 at 12:28:09PM -0500, Sizhao Yang wrote:
> Hi all,
>
> I was wondering if someone could help me with this. I'm porting an
> ASPLOV paper miss ratio curve from 2.4.20 2.6.11.6 and eventually to
> Planet Labs kernel. It's a novel idea for memory management. In
> porting I at run time I'm consistently hitting kernel bugs at four
> different places bad_page, bad_range, in rmap.c
> BUG(page_mapcount(page)< 0), and failing at apm_do_idle. All of these
> functions except apm_do_idle seem to be new functions from 2.4.20 to
> 2.6.11.6. I'm pretty sure I'm forgetting to account for certain
> things when modifying the pages, but I'm not sure where.

Having the information which bad_page etc. dump out would definitely
help.

I can't figure out what is going on from the data you provided, though
someone else probably can.

> What I'm doing in the port is resetting protection bits so that when
> it page faults. It will calculate a miss ratio based on the number of
> accessed bits and other information.
> After I gather the information I
> will reset the accessed bits. Then based on previous miss ratios and
> current miss ratio it will give out memory to different processes
> based on that. That's the general idea. For more specifics:
>
> http://carmen.cs.uiuc.edu/paper/ASPLOS04-Zhou.pdf
>
> I've narrowed it down to primarily when I call the following functions:
> ptep_test_and_clear_young,
> static inline pte_t pte_mknominor(pte_t pte) { (pte).pte_low &=
> ~_PAGE_PROTNONE; return pte; }
> static inline pte_t pte_mkminor(pte_t pte) { (pte).pte_low |=
> _PAGE_PROTNONE; return pte; }
> static inline pte_t pte_mkpresent(pte_t pte) { (pte).pte_low |=
> _PAGE_PRESENT; return pte; }
> static inline pte_t pte_mkabsent(pte_t pte) { (pte).pte_low &=
> ~_PAGE_PRESENT; return pte; }
>
> When I don't have those functions in my code the kernel doesn't crash,
> but when I do they crash. So, my question is am I to page accounting
> aspects? I looked at rmap functions for incrementing the _mapcount but
> they seem to be only for when a pte is copied. Should I be
> incrementing the pagecount at any point?

Nope - that should be internal to rmap.c (you shouldn't touch mapcount
directly). But you don't seem to be doing that anyway.

> rmap.c
> BUG(page_mapcount(page)< 0) is invoked when the accessed bits are
> cleared in zap_pte, but I don't know how the page is being corrupted.

Why don't you post the code (in case it's GPL)...
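
For reference while reading the quoted helpers, here is a minimal userspace sketch of what they do to the low PTE word. It is a model only, not kernel code: the mock_* names are hypothetical, the bit values are the i386 ones as I recall them from the 2.6 headers (an assumption, check include/asm-i386/pgtable.h in your tree), and mock_test_and_clear_young is a non-atomic stand-in for ptep_test_and_clear_young.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: i386 bit positions as recalled from the 2.6 headers.
 * Verify against include/asm-i386/pgtable.h before relying on them. */
#define MOCK_PAGE_PRESENT  0x001u
#define MOCK_PAGE_ACCESSED 0x020u
#define MOCK_PAGE_PROTNONE 0x080u

/* Hypothetical stand-in for the kernel's pte_t (low word only). */
typedef struct { uint32_t pte_low; } mock_pte_t;

/* Same bit operations as the four helpers quoted in the mail. */
static inline mock_pte_t mock_mknominor(mock_pte_t pte)
{ pte.pte_low &= ~MOCK_PAGE_PROTNONE; return pte; }
static inline mock_pte_t mock_mkminor(mock_pte_t pte)
{ pte.pte_low |= MOCK_PAGE_PROTNONE; return pte; }
static inline mock_pte_t mock_mkpresent(mock_pte_t pte)
{ pte.pte_low |= MOCK_PAGE_PRESENT; return pte; }
static inline mock_pte_t mock_mkabsent(mock_pte_t pte)
{ pte.pte_low &= ~MOCK_PAGE_PRESENT; return pte; }

/* ASSUMPTION: models the i386 pte_present() test, which (as I recall)
 * checks PRESENT and PROTNONE together. */
static inline int mock_pte_present(mock_pte_t pte)
{ return (pte.pte_low & (MOCK_PAGE_PRESENT | MOCK_PAGE_PROTNONE)) != 0; }

/* Non-atomic model: report the accessed bit, then clear it. */
static inline int mock_test_and_clear_young(mock_pte_t *ptep)
{
    int young = (ptep->pte_low & MOCK_PAGE_ACCESSED) != 0;
    ptep->pte_low &= ~MOCK_PAGE_ACCESSED;
    return young;
}

/* Walk one entry through the cycle described in the mail. */
int mock_demo(void)
{
    mock_pte_t pte = { MOCK_PAGE_PRESENT | MOCK_PAGE_ACCESSED };

    /* Arm the entry so the next access faults. */
    pte = mock_mkminor(mock_mkabsent(pte));
    assert(pte.pte_low == 0x0a0u);
    assert(mock_pte_present(pte)); /* still "present" to this test */

    /* Fault path: restore the mapping. */
    pte = mock_mknominor(mock_mkpresent(pte));
    assert(pte.pte_low == 0x021u);

    /* Sample the accessed bit, then reset it. */
    assert(mock_test_and_clear_young(&pte) == 1);
    assert(mock_test_and_clear_young(&pte) == 0);
    assert(pte.pte_low == MOCK_PAGE_PRESENT);
    return 0;
}
```

Under those assumptions, an entry with PRESENT cleared and PROTNONE set still passes the pte_present-style test, so it may be worth checking how each mm path that walks these entries treats them. The dumps from bad_page and the actual code would say more than this model can.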