From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19])
	by kanga.kvack.org (Postfix) with SMTP id AD1616B00A3
	for <linux-mm@kvack.org>; Thu, 28 Jan 2010 09:57:37 -0500 (EST)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 00 of 31] Transparent Hugepage support #8
Message-Id: <patchbomb.1264689194@v2.random>
Date: Thu, 28 Jan 2010 15:33:14 +0100
From: Andrea Arcangeli <aarcange@redhat.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: Marcelo Tosatti <mtosatti@redhat.com>, Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>, Izik Eidus <ieidus@redhat.com>, Hugh Dickins <hugh.dickins@tiscali.co.uk>, Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>, Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>, Benjamin Herrenschmidt <benh@kernel.crashing.org>, Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>, Christoph Lameter <cl@linux-foundation.org>, Chris Wright <chrisw@sous-sol.org>, Andrew Morton <akpm@linux-foundation.org>, bpicco@redhat.com, Christoph Hellwig <hch@infradead.org>, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>, Balbir Singh <balbir@linux.vnet.ibm.com>, Arnd Bergmann <arnd@arndb.de>
List-ID: <linux-mm.kvack.org>

Hello,

This is the latest version, covering all review feedback plus fixes for
vm_normal_page in khugepaged. That check still results in a warning on mst's
machine: a pte with the pte_special bit set maps mmio in the 256M area of his
graphics card, yet neither VM_PFNMAP nor VM_MIXEDMAP is set on the vma, and
even worse vm_file is null and vm_ops is null too. I still can't figure out
how a special pte can be mapped in an area with vm_ops and vm_file both null
(on a side note, I doubt we ever have a case of vm_ops not null and vm_file
null, or vm_ops null and vm_file not null). It happens during the speculative
readonly pass, where I intentionally tried to avoid taking the pt lock (it was
taken later in collapse_huge_page, of course). I wondered if that was the
reason, so I added the pt lock to the speculative pass too, but I can't see
how it can happen even without the lock (the pte can't go away under it
because khugepaged holds the mmap_sem in read mode). I would imagine it could
be a problem only if pte updates weren't atomic, as they are on 64bit (despite
not being enforced with asm() constructs but relying on gcc), so taking the pt
lock will help on that side. But even if gcc were doing partial writes to
ptes, the result shouldn't always point to the mmio of the graphics card.
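
For reference, the condition that produces the warning can be sketched roughly
as below. This is a simplified userspace model, not the kernel code:
sk_classify and all the SK_* names are made up for illustration, the flag
values are arbitrary, and it ignores the zero page and the architectures
without pte_special.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vm_flags bits; the values are arbitrary. */
#define SK_VM_PFNMAP   0x1u
#define SK_VM_MIXEDMAP 0x2u

enum sk_pte_class {
	SK_PTE_NORMAL,      /* not special: treated as a regular page */
	SK_PTE_SPECIAL_OK,  /* special pte in a vma that is allowed to have one */
	SK_PTE_SPECIAL_BAD, /* special pte in a plain vma: the warning case */
};

/* Classify a pte from its special bit and the flags of the vma mapping it. */
enum sk_pte_class sk_classify(bool pte_is_special, unsigned int vm_flags)
{
	if (!pte_is_special)
		return SK_PTE_NORMAL;
	if (vm_flags & (SK_VM_PFNMAP | SK_VM_MIXEDMAP))
		return SK_PTE_SPECIAL_OK;
	/* Special bit set, but neither flag is present on the vma. */
	return SK_PTE_SPECIAL_BAD;
}
```

The case described above is the last one: the special bit is set while both
flags are clear, so sk_classify(true, 0) gives SK_PTE_SPECIAL_BAD.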

I can't reproduce the khugepaged warning here, but it's only a warning; it
can't affect stability if my hypothesis is true that the bug was already
there and that I only exposed it with khugepaged.

But if it's not my bug, then I wonder why munmap doesn't trip on it too.  The
suspicious code at the moment is i915_gem, things like unmap_mapping_range
etc. One of my theories is that unmap_mapping_range in i915_gem is not
clearing all ptes, and those pte_special entries then leak into newly
allocated mappings as new ptes are allocated. But then I can't imagine this
not having adverse effects (at the very least it should screw up the graphics
card at boot, around the time these bugchecks trigger). Also the bugchecks
seem to go away after 80 sec of uptime, so maybe the corruption is cleared as
the new mappings are torn down too (this time correctly through regular
munmap and not ->close).

My primary focus is to understand that pte_special thing, because other than
the above there is no other known issue so far. It is rock solid on all
hardware where I deployed it, and I leave swapping storms running 24/7 in
addition to khugepaged at 0 scan/defrag_sleep to stress the smp safety of
split_huge_page and collapse_huge_page. I rebooted the laptop and test server
only to upgrade to the #8 version, so that the testing includes the new
post-review code.

I suggest trying it (especially if you use i915_gem, as I need to know if
anybody else can reproduce the khugepaged warning with pte_special set) and
letting me know; the more testing the better.

Thanks,
Andrea
