From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Mundt
Date: Mon, 10 Nov 2008 08:06:23 +0000
Subject: Re: repeated oops under load on SH4 system
Message-Id: <20081110080623.GB13734@linux-sh.org>
List-Id:
References:
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-sh@vger.kernel.org

On Tue, Nov 04, 2008 at 09:31:44PM +0900, CHIKAMA Masaki wrote:
> Hello all.
>
> I've got repeated oops message under a load on kernel 2.6.26.7.
> It happens once or twice per a week with the below message.
>
> >Unable to handle kernel paging request at virtual address dfff0700
> >Unable to handle kernel paging request at virtual address dfff1000
> >Unable to handle kernel paging request at virtual address dfff0a00
>
> I have been gotten this message from around kernel 2.6.23. I didn't
> test before it.
> My hardware is mach-landisk with attached .config.
> The root file system is on nfs server.
> Please let me know if you need more information to investigating the problem.
> Could somebody give me a hint to resolve the issue ?
>
> Thanks in advance.
>
This suggests you are getting a TLB miss on various fixmap entries.
Based on your call chain, these are related to the cache colouring in
the page copying. update_mmu_cache() specifically faults the
translation in, so you should not be making it all the way up to the
TLB miss handler in the first place. This points to something evicting
the entry from the TLB during your copy. While that is not something I
have seen in practice, it is interesting to know that it remains a
possibility under other workloads.

A simple but expensive fix for this would be blowing out the TLB and
speculatively bumping up the UTLB replace boundary prior to
pre-faulting the fixmap translation. I'll look at this some more over
the next couple of days and send you a patch for testing.
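
As a rough illustration of the eviction scenario described above, here
is a toy model in Python. It is only a sketch under stated assumptions,
not SH-4 hardware behaviour and not kernel code: ToyTLB,
copy_with_interference(), the 4-slot size and the page numbers are all
hypothetical, and the replacement pointer here advances only on a
refill, which is a simplification chosen for the sketch. It shows a
pre-faulted translation being pushed out by unrelated refills, and the
same translation surviving when it sits outside the replaceable range.

```python
class ToyTLB:
    """Toy round-robin TLB (hypothetical model, not SH-4 hardware)."""

    def __init__(self, size=4):
        self.slots = [None] * size  # each slot holds one virtual page number
        self.boundary = size        # victims are picked from slots [0, boundary)
        self.counter = 0            # index of the next victim slot
        self.misses = 0

    def flush(self):
        """Drop every translation and make all slots replaceable again."""
        self.slots = [None] * len(self.slots)
        self.boundary = len(self.slots)
        self.counter = 0

    def _refill(self, vpn):
        self.slots[self.counter] = vpn
        self.counter = (self.counter + 1) % self.boundary

    def prefault(self, vpn):
        """Install a translation ahead of use; no miss is counted."""
        if vpn not in self.slots:
            self._refill(vpn)

    def wire(self, vpn):
        """Install vpn in the top replaceable slot, then exclude that slot."""
        if self.boundary <= 1:
            raise ValueError("no replaceable slot would be left")
        self.boundary -= 1
        self.slots[self.boundary] = vpn
        self.counter %= self.boundary

    def access(self, vpn):
        """Return True on a hit; on a miss, count it and refill."""
        if vpn in self.slots:
            return True
        self.misses += 1
        self._refill(vpn)
        return False


def copy_with_interference(tlb, fixmap_vpn, noise_vpns, protect):
    """Return True if the fixmap translation survives the unrelated accesses."""
    if protect:
        tlb.flush()           # "blow out" the toy TLB first
        tlb.wire(fixmap_vpn)  # keep the entry outside the replaceable range
    else:
        tlb.prefault(fixmap_vpn)
    for vpn in noise_vpns:    # unrelated accesses landing during the copy
        tlb.access(vpn)
    return tlb.access(fixmap_vpn)
```

With four slots and four unrelated pages touched during the copy, the
unprotected case misses on the fixmap page afterwards, while the
protected case still hits. The cost shows up in the protected path as
the flush plus one fewer replaceable slot.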