From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753391Ab2EZX4e (ORCPT );
	Sat, 26 May 2012 19:56:34 -0400
Received: from mx1.redhat.com ([209.132.183.28]:30781 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751663Ab2EZX4b (ORCPT );
	Sat, 26 May 2012 19:56:31 -0400
Date: Sun, 27 May 2012 01:54:47 +0200
From: Andrea Arcangeli
To: Hugh Dickins
Cc: Sasha Levin, Andrew Morton, viro, oleg@redhat.com, "a.p.zijlstra",
	mingo, Dave Jones, "linux-kernel@vger.kernel.org", linux-mm
Subject: Re: mm: kernel BUG at mm/memory.c:1230
Message-ID: <20120526235447.GA4016@redhat.com>
References: <1337884054.3292.22.camel@lappy>
	<20120524120727.6eab2f97.akpm@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello everyone,

On Sat, May 26, 2012 at 01:26:48PM -0700, Hugh Dickins wrote:
> I've been round this loop before with that particular VM_BUG_ON.
>
> At first I thought like Andrew, that it's glaringly wrong on the exit
> path; but then changed my mind.
>
> When munmapping, we certainly can arrive here with an unaligned addr
> and next; but in that case rwsem_is_locked.
>
> Whereas in exiting, rwsem is not locked, but we're going linearly upwards,
> and whenever we walk into a pmd_trans_huge area, both addr and next should
> be hpage aligned: the vma bounds are unsuited to THP if they're unaligned.
>
> Other cases equally should not arise: madvise MADV_DONTNEED should
> have rwsem_is_locked; and truncation or hole-punching shouldn't be
> possible on a pure-anonymous (!vma->vm_ops) area considered for THP.
>
> But I cannot remember what brought me here before: a crash in testing
> on one of my machines, which further investigation root-caused elsewhere?
> or a report from someone else?
> or noticed when auditing another problem?
> I'm frustrated not to recall.

I agree it's not a false positive.

The reason I introduced that VM_BUG_ON was to verify if any
vma_adjust_trans_huge() was missing anywhere (so that it doesn't crash
later in split_huge_page with an obscure mapcount != page_mapcount
BUG_ON, there it would be much less obvious to see why it crashed than
here).

We should printk addr, end and the vma->vm_start/vm_end to debug this
further.

> > I'm not sure if that's indeed the issue or not, but note that this is
> > the first time I've managed to trigger that with the fuzzer, and it's
> > not that easy to reproduce. Which is a bit odd for code that was there
> > for 4 months...
>
> I'm keeping off the linux-next for the moment; I'll worry about this
> more if it shows up when we try 3.5-rc1. Your fuzzing tells that my
> logic above is wrong, but maybe it's just a passing defect in next.

If it's a missing vma_adjust_trans_huge() it shouldn't go unnoticed
even with DEBUG_VM=n, so I agree that if it only happens on linux-next
it's worth trying to reproduce it with 3.5-rc/3.4 too just in case.

It's actually the first time I hear of this bugcheck triggering.

Thanks!
Andrea
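
The invariant argued in the quoted text, and the printk suggested in the reply, can be modelled in userspace. The following is only a sketch under stated assumptions, not the kernel code at mm/memory.c:1230: the function name zap_range_ok, the constant HPAGE_SIZE_MODEL (2 MiB, as on x86-64) and the boolean standing in for rwsem_is_locked() are all hypothetical. It encodes the rule that a range not covering a whole huge page is expected only while the rwsem is held, and prints addr, next and the vma bounds when that rule is violated.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: 2 MiB huge pages as on x86-64; the kernel uses HPAGE_PMD_SIZE. */
#define HPAGE_SIZE_MODEL (2UL * 1024 * 1024)

/*
 * Hypothetical userspace model of the check discussed in the thread.
 * Returns true when the range is consistent with the reasoning above,
 * false when the VM_BUG_ON would be expected to fire. In the failing
 * case it prints the values the reply proposes to printk.
 */
static bool zap_range_ok(unsigned long addr, unsigned long next,
                         unsigned long vm_start, unsigned long vm_end,
                         bool mmap_sem_locked)
{
    /* A range covering exactly one huge page needs no split. */
    if (next - addr == HPAGE_SIZE_MODEL)
        return true;

    /* A partial range is expected only with the rwsem held (e.g. munmap). */
    if (mmap_sem_locked)
        return true;

    /* Unexpected: partial huge-page range without the lock (e.g. on exit). */
    fprintf(stderr, "addr=%#lx next=%#lx vm_start=%#lx vm_end=%#lx\n",
            addr, next, vm_start, vm_end);
    return false;
}
```

For example, zap_range_ok(0x201000, 0x400000, 0x201000, 0x400000, false) reports a violation, because the range starts one page past a huge-page boundary and the lock is not held. The same call with true as the last argument passes, which matches the munmap case described in the quoted text.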