From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: Fwd: Re: struct page field arrangement
Date: Fri, 16 Mar 2007 12:54:08 +0000
Message-ID: <45FAA180.76E4.0078.0@novell.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Disposition: inline
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

>>> Keir Fraser 16.03.07 13:25 >>>
>On 16/3/07 12:11, "Keir Fraser" wrote:
>
>>> page_referenced_one() in mm/rmap.c). If this happens when
>>> xen_pgd_unpin() has already passed the respective pte page, but
>>> mm_walk() hasn't reached the page, yet, the update will fail (if done
>>> directly, ptwr will no pick this up, and if done through a hypercall, the
>>> call would fail, likely producing a BUG()).
>>
>> What kind of stress test did you run? I was expecting that unpin would be
>> okay because we only call mm_unpin() from _arch_exit_mmap() if the mm_count
>> is 1 (which I believe means the mm is not active in any task).

newburn on machines with not too much (<= 2G) memory.

>And actually the pinning happens on activate_mm() in most cases, which I
>would expect to be 'early enough' since noone can run on the mm before that?
>
>If you've managed to provoke bugs then that's very interesting (and scary)!
>
>I suppose if I understand the rmap case correctly, we're still susceptible
>to the paging kernel thread trying to page things out at any time? Is that
>what you think you've been seeing go wrong?

Yes, somewhere in that area.
From the data I have (a page fault on the page table write in
ptep_clear_flush_young(), with the page table dump showing the page to
be writable and present), I can only conclude that the race is with the
unpin path; otherwise I should see the page being write-protected. The
VM scan tries to reclaim memory at the same time, and since this scan
walks zones, not mm-s, it obtains its references to the mm-s through
struct page -> vma -> mm (i.e. the mm-s' use counts don't matter here).

Jan