linux-kernel.vger.kernel.org archive mirror
From: Mike Rapoport <rppt@kernel.org>
To: Adam Williamson <awilliam@redhat.com>
Cc: linux-kernel@vger.kernel.org, jforbes@redhat.com, mcgrof@kernel.org
Subject: Re: Kernel crash while doing chroot'ed grub2-mkconfig on qemu-emulated Nehalem CPU since late November 6.13 snapshot
Date: Sat, 11 Jan 2025 10:50:13 +0200
Message-ID: <Z4IwxfydqWMkhoLq@kernel.org>
In-Reply-To: <565d943ae51707002807d198b913bcd2f25a3ef5.camel@redhat.com>

On Fri, Jan 10, 2025 at 09:28:01AM -0800, Adam Williamson wrote:
> On Fri, 2025-01-10 at 11:57 +0200, Mike Rapoport wrote:
> > Hi Adam,
> > 
> > On Thu, Jan 02, 2025 at 12:16:03PM -0800, Adam Williamson wrote:
> > > 
> > > Update on this: over the holidays, I bisected it to
> > > 5185e7f9f3bd754ab60680814afd714e2673ef88 . A kernel with that commit
> > > reverted does not hit the bug.
> > > 
> > > I also did some testing with various CPU model configurations. I think
> > > this actually isn't to do with Nehalem per se, but "virtual machines
> > > where the CPU configuration does not exactly match the host", or
> > > something like that.
> > > 
> > > I tried a bunch of qemu CPU model settings - nehalem, sandybridge,
> > > haswell, Skylake-Client and Cascadelake-Server - and got failures with
> > > all of them, but when I set the model to "host", all tests passed.
> > > 
> > > The tests get farmed out to a cluster of systems which have different
> > > CPUs - one is Broadwell, one is Skylake, one is Cascade Lake - so I
> > > think when I set the model to anything specific, it will match the host
> > > CPU on some or none of those systems, but never *all* of them, so the
> > > bug will always show up.
> > > 
> > > I have emailed the author and reviewer of
> > > 5185e7f9f3bd754ab60680814afd714e2673ef88 (also CCed on this mail) but
> > > have not heard back from them yet. I've sunk over a week into this bug
> > > at this point so it'd be great if someone could look at it. It's not
> > > the biggest regression in the world, but it is a bit awkward for our
> > > automated testing (I'll have to fiddle around to try and set CPU model
> > > 'host' for the most badly-affected tests but ensure we still have
> > > enough tests with 'nehalem' to confirm our baseline isn't moved).
> > > 
> > > Thanks, and happy new year!
> > 
> > Can you please test this patch:
> > 
> > diff --git a/mm/execmem.c b/mm/execmem.c
> > index be6b234c032e..0090a6f422aa 100644
> > --- a/mm/execmem.c
> > +++ b/mm/execmem.c
> > @@ -266,6 +266,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
> >  	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
> >  	struct execmem_area *area;
> >  	unsigned long start, end;
> > +	unsigned int page_shift;
> >  	struct vm_struct *vm;
> >  	size_t alloc_size;
> >  	int err = -ENOMEM;
> > @@ -296,8 +297,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
> >  	if (err)
> >  		goto err_free_mem;
> >  
> > +	page_shift = get_vm_area_page_order(vm) + PAGE_SHIFT;
> >  	err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
> > -				       PMD_SHIFT);
> > +				       page_shift);
> >  	if (err)
> >  		goto err_free_mem;
> >  
> 
> Hi Mike! Thanks. I can indeed, and I will, but also an update: on
> further testing, sadly, using 'host' CPU for qemu doesn't really avoid
> the bug either :/ The initial test must have just gotten lucky. I
> implemented that as a 'workaround' in our openQA system and dropped the
> five automatic retries per test I was using as a bludgeon, but then
> failures started showing up again :/ So I've had to put the five
> retries back in place for now.
> 
> Sorry if this sent you down any wrong paths, I will test the patch
> unless you tell me it's useless with this new information :)

I don't think the CPU flavour is important here. I'd greatly appreciate
your testing.

> -- 
> Adam Williamson (he/him/his)

-- 
Sincerely yours,
Mike.
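
The patch quoted above replaces a hard-coded PMD_SHIFT with a shift derived
from the area itself (get_vm_area_page_order(vm) + PAGE_SHIFT). The following
is a minimal userspace sketch of that arithmetic, not kernel code. The
SKETCH_* constants use typical x86-64 values, and the helper names
sketch_area_page_shift() and sketch_map_steps() are hypothetical, introduced
only for illustration. It assumes, as one reading of the patch that the thread
does not state outright, that an area may end up backed by base pages (order 0),
in which case mapping it at PMD granularity would not match its backing pages.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative x86-64 values; the kernel defines these per architecture. */
#define SKETCH_PAGE_SHIFT 12u   /* 4 KiB base pages */
#define SKETCH_PMD_SHIFT  21u   /* 2 MiB PMD-level mappings */

/*
 * Hypothetical stand-in for "get_vm_area_page_order(vm) + PAGE_SHIFT":
 * the shift that matches the page size the area is actually backed with.
 */
static unsigned int sketch_area_page_shift(unsigned int page_order)
{
	return page_order + SKETCH_PAGE_SHIFT;
}

/* Number of mapping steps needed to cover `size` bytes at a given shift. */
static size_t sketch_map_steps(size_t size, unsigned int shift)
{
	size_t step = (size_t)1 << shift;

	assert(size % step == 0);	/* the sketch only handles aligned sizes */
	return size / step;
}
```

With these values, an order-0 area gives a shift of 12, so a 2 MiB range is
covered in 512 base-page steps, while an order-9 area gives a shift of 21
(equal to SKETCH_PMD_SHIFT) and is covered in a single step. Only in the
second case does the derived shift coincide with the previously hard-coded
value.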


Thread overview: 10+ messages
2024-12-11 16:51 Kernel crash while doing chroot'ed grub2-mkconfig on qemu-emulated Nehalem CPU since late November 6.13 snapshot Adam Williamson
2025-01-02 20:16 ` Adam Williamson
2025-01-04  1:51   ` Luis Chamberlain
2025-01-04  2:57     ` Adam Williamson
2025-01-06  7:09     ` Adam Williamson
2025-01-10  9:57   ` Mike Rapoport
2025-01-10 17:28     ` Adam Williamson
2025-01-11  8:50       ` Mike Rapoport [this message]
     [not found]         ` <358da653bcd8b7875f59e673e5572bddd3677aea.camel@redhat.com>
     [not found]           ` <9cb0bdfc94643fb1544671837b357bc49800ce3f.camel@redhat.com>
2025-01-14  7:55             ` Adam Williamson
2025-01-14  8:08               ` Adam Williamson
