From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69])
	by kanga.kvack.org (Postfix) with ESMTP id 9AE906B0038
	for ; Mon, 3 Apr 2017 07:37:36 -0400 (EDT)
Received: by mail-pg0-f69.google.com with SMTP id u3so137245003pgn.12
	for ; Mon, 03 Apr 2017 04:37:36 -0700 (PDT)
Received: from foss.arm.com (foss.arm.com. [217.140.101.70])
	by mx.google.com with ESMTP id g2si14084648pln.12.2017.04.03.04.37.35
	for ; Mon, 03 Apr 2017 04:37:35 -0700 (PDT)
Date: Mon, 3 Apr 2017 12:37:51 +0100
From: Will Deacon
Subject: Re: Bad page state splats on arm64, v4.11-rc{3,4}
Message-ID: <20170403113751.GD5706@arm.com>
References: <20170331175845.GE6488@leverpostej> <20170403105629.GB18905@leverpostej>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170403105629.GB18905@leverpostej>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mark Rutland
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, catalin.marinas@arm.com, punit.agrawal@arm.com

On Mon, Apr 03, 2017 at 11:56:29AM +0100, Mark Rutland wrote:
> On Fri, Mar 31, 2017 at 06:58:45PM +0100, Mark Rutland wrote:
> > Hi,
> >
> > I'm seeing intermittent bad page state splats on arm64 with 4.11-rc3 and
> > v4.11-rc4. I have not tested earlier kernels, or other architectures.
> >
> > So far, it looks like the flags are always bad in the same
> > way:
> >
> >     bad because of flags: 0x80(waiters)
> >
> > ... though I don't know if that's definitely the case for splat 4, the
> > BUG at mm/page_alloc.c:800.
> >
> > I see this in QEMU VMs launched by Syzkaller, triggering once every few
> > hours. So far, I have not been able to reproduce the issue in any other
> > way (including using syz-repro).
>
> It looks like this may be an issue with the arm64 HUGETLB code.
>
> I wasn't able to trigger the issue over the weekend on a kernel with
> HUGETLBFS disabled.
> There are known issues with our handling of
> contiguous entries, and this might be an artefact of that.

After chatting with Punit, it looks like this might be because the GUP
code doesn't handle huge ptes (which we create using the contiguous hint),
so follow_page_pte ends up with one of those and goes wrong. In particular,
the migration code will certainly do the wrong thing.

I'll probably revert the contiguous support (again) if testing indicates
that it makes this issue disappear.

Will

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
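
Illustrative note on the "contiguous hint" mentioned in the reply: the message
does not say exactly how follow_page_pte goes wrong, so the following is only
a toy user-space sketch of the general idea, not kernel code. It assumes a 4K
granule, 16 entries per contiguous run, and a hint bit at position 52; every
identifier in it (the toy_* names and TOY_* macros) is hypothetical. It shows
why a page-table walker that inspects a single entry sees an ordinary 4K
mapping, while a walker that understands the hint resolves the same entry to
the start of the run that backs the huge page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user-space model. Every name here is hypothetical, not a kernel API. */
#define TOY_PAGE_SHIFT    12
#define TOY_CONT_PTES     16                   /* assumed entries per contiguous run */
#define TOY_PTE_VALID     (UINT64_C(1) << 0)
#define TOY_PTE_CONT      (UINT64_C(1) << 52)  /* assumed position of the hint bit */
#define TOY_PTE_ADDR_MASK UINT64_C(0x0000FFFFFFFFF000)

typedef uint64_t toy_pte_t;

/* Map one huge page as an aligned run of TOY_CONT_PTES entries, each
 * pointing at its own 4K slice and carrying the contiguous hint. */
static void toy_set_huge_pte(toy_pte_t *table, size_t first, uint64_t head_pfn)
{
    assert(first % TOY_CONT_PTES == 0);
    for (size_t i = 0; i < TOY_CONT_PTES; i++)
        table[first + i] = ((head_pfn + i) << TOY_PAGE_SHIFT)
                           | TOY_PTE_VALID | TOY_PTE_CONT;
}

/* Extract the page frame number stored in an entry. */
static uint64_t toy_pte_pfn(toy_pte_t pte)
{
    return (pte & TOY_PTE_ADDR_MASK) >> TOY_PAGE_SHIFT;
}

/* Walker that treats every entry as an independent 4K mapping. */
static uint64_t toy_lookup_naive(const toy_pte_t *table, size_t idx)
{
    return toy_pte_pfn(table[idx]);
}

/* Walker that resolves an entry carrying the hint to the first entry of
 * its run, so the whole run is handled as one unit. */
static uint64_t toy_lookup_cont_aware(const toy_pte_t *table, size_t idx)
{
    if (table[idx] & TOY_PTE_CONT)
        idx -= idx % TOY_CONT_PTES;
    return toy_pte_pfn(table[idx]);
}
```

With a run installed at entries 16..31 and a head frame of 0x100, looking up
entry 21 yields frame 0x105 from the naive walker and 0x100 from the
hint-aware one. Code that tracks per-page state would therefore disagree about
which page it is operating on, depending on which walker it went through.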