Date: Sat, 11 Jan 2025 10:50:13 +0200
From: Mike Rapoport
To: Adam Williamson
Cc: linux-kernel@vger.kernel.org, jforbes@redhat.com, mcgrof@kernel.org
Subject: Re: Kernel crash while doing chroot'ed grub2-mkconfig on qemu-emulated Nehalem CPU since late November 6.13 snapshot
References: <9a3c747052fb82274ab3c4a84eaf64c1273117ce.camel@redhat.com> <565d943ae51707002807d198b913bcd2f25a3ef5.camel@redhat.com>
In-Reply-To: <565d943ae51707002807d198b913bcd2f25a3ef5.camel@redhat.com>

On Fri, Jan 10, 2025 at 09:28:01AM -0800, Adam Williamson wrote:
> On Fri, 2025-01-10 at 11:57 +0200, Mike Rapoport wrote:
> > Hi Adam,
> >
> > On Thu, Jan 02, 2025 at 12:16:03PM -0800, Adam Williamson wrote:
> > >
> > > Update on this: over the holidays, I bisected it to
> > > 5185e7f9f3bd754ab60680814afd714e2673ef88 . A kernel with that commit
> > > reverted does not hit the bug.
> > >
> > > I also did some testing with various CPU model configurations. I think
> > > this actually isn't to do with Nehalem per se, but "virtual machines
> > > where the CPU configuration does not exactly match the host", or
> > > something like that.
> > >
> > > I tried a bunch of qemu CPU model settings - nehalem, sandybridge,
> > > haswell, Skylake-Client and Cascadelake-Server - and got failures with
> > > all of them, but when I set the model to "host", all tests passed.
> > >
> > > The tests get farmed out to a cluster of systems which have different
> > > CPUs - one is Broadwell, one is Skylake, one is Cascade Lake - so I
> > > think when I set the model to anything specific, it will match the host
> > > CPU on some or none of those systems, but never *all* of them, so the
> > > bug will always show up.
> > >
> > > I have emailed the author and reviewer of
> > > 5185e7f9f3bd754ab60680814afd714e2673ef88 (also CCed on this mail) but
> > > have not heard back from them yet. I've sunk over a week into this bug
> > > at this point so it'd be great if someone could look at it. It's not
> > > the biggest regression in the world, but it is a bit awkward for our
> > > automated testing (I'll have to fiddle around to try and set CPU model
> > > 'host' for the most badly-affected tests but ensure we still have
> > > enough tests with 'nehalem' to confirm our baseline isn't moved).
> > >
> > > Thanks, and happy new year!
> >
> > Can you please test this patch:
> >
> > diff --git a/mm/execmem.c b/mm/execmem.c
> > index be6b234c032e..0090a6f422aa 100644
> > --- a/mm/execmem.c
> > +++ b/mm/execmem.c
> > @@ -266,6 +266,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
> >  	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
> >  	struct execmem_area *area;
> >  	unsigned long start, end;
> > +	unsigned int page_shift;
> >  	struct vm_struct *vm;
> >  	size_t alloc_size;
> >  	int err = -ENOMEM;
> > @@ -296,8 +297,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
> >  	if (err)
> >  		goto err_free_mem;
> >
> > +	page_shift = get_vm_area_page_order(vm) + PAGE_SHIFT;
> >  	err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
> > -				       PMD_SHIFT);
> > +				       page_shift);
> >  	if (err)
> >  		goto err_free_mem;
> >
>
> Hi Mike! Thanks.
> I can indeed, and I will, but also an update: on
> further testing, sadly, using 'host' CPU for qemu doesn't really avoid
> the bug either :/ The initial test must have just gotten lucky. I
> implemented that as a 'workaround' in our openQA system and dropped the
> five automatic retries per test I was using as a bludgeon, but then
> failures started showing up again :/ So I've had to put the five
> retries back in place for now.
>
> Sorry if this sent you down any wrong paths, I will test the patch
> unless you tell me it's useless with this new information :)

I don't think that CPU flavour is important here.
I'll greatly appreciate your testing.

> --
> Adam Williamson (he/him/his)

--
Sincerely yours,
Mike.
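
For readers following the quoted patch: it stops passing a fixed PMD_SHIFT to vmap_pages_range_noflush() and instead derives the shift from the page order recorded for the area. My reading, which is an assumption and not something stated in this thread, is that an area requested with VM_ALLOW_HUGE_VMAP may still end up backed by base pages, in which case a hard-coded PMD_SHIFT would not match what is in vm->pages. Below is a minimal userspace sketch of just the arithmetic. The constants assume x86-64 with 4 KiB base pages, and struct vm_area_sketch, sketch_page_order(), sketch_map_shift() and sketch_check() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>

/* Assumed values for x86-64 with 4 KiB base pages; other setups differ. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21

/* Hypothetical stand-in for the page-order bookkeeping of a vmalloc area. */
struct vm_area_sketch {
	unsigned int page_order;
};

/* Stand-in for get_vm_area_page_order(): report the recorded order. */
static unsigned int sketch_page_order(const struct vm_area_sketch *vm)
{
	return vm->page_order;
}

/* The expression the patch introduces: page order plus PAGE_SHIFT. */
static unsigned int sketch_map_shift(const struct vm_area_sketch *vm)
{
	return sketch_page_order(vm) + PAGE_SHIFT;
}

/* Compare the two cases: base-page backing and PMD-sized backing. */
static void sketch_check(void)
{
	struct vm_area_sketch small = { .page_order = 0 };
	struct vm_area_sketch huge = { .page_order = PMD_SHIFT - PAGE_SHIFT };

	/* Base pages: the shift is PAGE_SHIFT, not PMD_SHIFT. */
	assert(sketch_map_shift(&small) == PAGE_SHIFT);
	/* PMD-sized pages: the shift equals the old hard-coded value. */
	assert(sketch_map_shift(&huge) == PMD_SHIFT);
}
```

Under these assumptions the computed shift equals the old hard-coded PMD_SHIFT only when the area really is backed by PMD-sized pages (order 9); for order 0 it is PAGE_SHIFT.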