Re: Kernel crash while doing chroot'ed grub2-mkconfig on qemu-emulated Nehalem CPU since late November 6.13 snapshot

From: Mike Rapoport <rppt@kernel.org>
To: Adam Williamson <awilliam@redhat.com>
Cc: linux-kernel@vger.kernel.org, jforbes@redhat.com, mcgrof@kernel.org
Subject: Re: Kernel crash while doing chroot'ed grub2-mkconfig on qemu-emulated Nehalem CPU since late November 6.13 snapshot
Date: Sat, 11 Jan 2025 10:50:13 +0200
Message-ID: <Z4IwxfydqWMkhoLq@kernel.org>
In-Reply-To: <565d943ae51707002807d198b913bcd2f25a3ef5.camel@redhat.com>

On Fri, Jan 10, 2025 at 09:28:01AM -0800, Adam Williamson wrote:
> On Fri, 2025-01-10 at 11:57 +0200, Mike Rapoport wrote:
> > Hi Adam,
> > 
> > On Thu, Jan 02, 2025 at 12:16:03PM -0800, Adam Williamson wrote:
> > > 
> > > Update on this: over the holidays, I bisected it to
> > > 5185e7f9f3bd754ab60680814afd714e2673ef88 . A kernel with that commit
> > > reverted does not hit the bug.
> > > 
> > > I also did some testing with various CPU model configurations. I think
> > > this actually isn't to do with Nehalem per se, but "virtual machines
> > > where the CPU configuration does not exactly match the host", or
> > > something like that.
> > > 
> > > I tried a bunch of qemu CPU model settings - nehalem, sandybridge,
> > > haswell, Skylake-Client and Cascadelake-Server - and got failures with
> > > all of them, but when I set the model to "host", all tests passed.
> > > 
> > > The tests get farmed out to a cluster of systems which have different
> > > CPUs - one is Broadwell, one is Skylake, one is Cascade Lake - so I
> > > think when I set the model to anything specific, it will match the host
> > > CPU on some or none of those systems, but never *all* of them, so the
> > > bug will always show up.
> > > 
> > > I have emailed the author and reviewer of
> > > 5185e7f9f3bd754ab60680814afd714e2673ef88 (also CCed on this mail) but
> > > have not heard back from them yet. I've sunk over a week into this bug
> > > at this point, so it'd be great if someone could look at it. It's not
> > > the biggest regression in the world, but it is a bit awkward for our
> > > automated testing (I'll have to fiddle around to try and set CPU model
> > > 'host' for the most badly-affected tests but ensure we still have
> > > enough tests with 'nehalem' to confirm our baseline isn't moved).
> > > 
> > > Thanks, and happy new year!
> > 
> > Can you please test this patch:
> > 
> > diff --git a/mm/execmem.c b/mm/execmem.c
> > index be6b234c032e..0090a6f422aa 100644
> > --- a/mm/execmem.c
> > +++ b/mm/execmem.c
> > @@ -266,6 +266,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
> >  	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
> >  	struct execmem_area *area;
> >  	unsigned long start, end;
> > +	unsigned int page_shift;
> >  	struct vm_struct *vm;
> >  	size_t alloc_size;
> >  	int err = -ENOMEM;
> > @@ -296,8 +297,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
> >  	if (err)
> >  		goto err_free_mem;
> >  
> > +	page_shift = get_vm_area_page_order(vm) + PAGE_SHIFT;
> >  	err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
> > -				       PMD_SHIFT);
> > +				       page_shift);
> >  	if (err)
> >  		goto err_free_mem;
> >  
> 
> Hi Mike! Thanks. I can indeed, and I will, but also an update: on
> further testing, sadly, using 'host' CPU for qemu doesn't really avoid
> the bug either :/ The initial test must have just gotten lucky. I
> implemented that as a 'workaround' in our openQA system and dropped the
> five automatic retries per test I was using as a bludgeon, but then
> failures started showing up again :/ So I've had to put the five
> retries back in place for now.
> 
> Sorry if this sent you down any wrong paths, I will test the patch
> unless you tell me it's useless with this new information :)

I don't think that CPU flavour is important here. I'll greatly appreciate
your testing.

> -- 
> Adam Williamson (he/him/his)

-- 
Sincerely yours,
Mike.

Thread overview: 10+ messages
2024-12-11 16:51 Kernel crash while doing chroot'ed grub2-mkconfig on qemu-emulated Nehalem CPU since late November 6.13 snapshot Adam Williamson
2025-01-02 20:16 ` Adam Williamson
2025-01-04  1:51   ` Luis Chamberlain
2025-01-04  2:57     ` Adam Williamson
2025-01-06  7:09     ` Adam Williamson
2025-01-10  9:57   ` Mike Rapoport
2025-01-10 17:28     ` Adam Williamson
2025-01-11  8:50       ` Mike Rapoport [this message]
     [not found]         ` <358da653bcd8b7875f59e673e5572bddd3677aea.camel@redhat.com>
     [not found]           ` <9cb0bdfc94643fb1544671837b357bc49800ce3f.camel@redhat.com>
2025-01-14  7:55             ` Adam Williamson
2025-01-14  8:08               ` Adam Williamson
