From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Li, Liang Z" <liang.z.li@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
Andrea Arcangeli <aarcange@redhat.com>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
Amit Shah <amit.shah@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"quintela@redhat.com" <quintela@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: post-copy is broken?
Date: Mon, 18 Apr 2016 14:23:39 +0100
Message-ID: <20160418132338.GG2222@work-vm>
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E041813A6@shsmsx102.ccr.corp.intel.com>
* Li, Liang Z (liang.z.li@intel.com) wrote:
> > > > > > > > Interesting; it's failing reliably for me - but only with a
> > > > > > > > reasonably freshly booted machine (so that the pages get THPd).
> > > > > > >
> > > > > > > The same here. Freshly booted machine with 64GiB ram. I've checked
> > > > > > > /proc/vmstat: huge pages were allocated
> > > > > >
> > > > > > Thanks for testing.
> > > > > >
> > > > > > Damn; this is confusing now. I've got a RHEL7 box with
> > > > > > 4.6.0-rc3 on where it works, and a fedora24 VM where it fails
> > > > > > (the f24 VM is where I did the bisect so it works fine with the
> > > > > > older kernel on the f24 userspace in that VM).
> > > > > >
> > > > > > So lets see:
> > > > > > works: Kirill's (64GB machine)
> > > > > >        Dave's RHEL7 host (24GB RAM, dual xeon, RHEL7 userspace
> > > > > >        and kernel config)
> > > > > > fails: Dave's f24 VM (4GB RAM, 4 vcpus VM on my laptop, f24
> > > > > >        userspace and kernel config)
> > > > > >
> > > > > > So it's any of userspace, kernel config, machine hardware or hmm.
> > > > > >
> > > > > > My f24 box has transparent_hugepage_madvise, where my rhel7 has
> > > > > > transparent_hugepage_always (but still works if I flip it to
> > > > > > madvise at run time). I'll try and get the configs closer together.
> > > > > >
> > > > > > Liang Li: Can you run my test on your setup which fails the
> > > > > > migrate and tell me what your userspace is?
> > > > > >
> > > > > > (If you've not built my test yet, you might find you need to add a :
> > > > > > tests/postcopy-test$(EXESUF): tests/postcopy-test.o
> > > > > >
> > > > > > to the tests/Makefile)
> > > > > >
> > > > >
> > > > > Hi Dave,
> > > > >
> > > > > How do I build and run your test? I haven't done that before.
> > > >
> > > > Apply the code in:
> > > > http://lists.gnu.org/archive/html/qemu-devel/2016-04/msg02138.html
> > > >
> > > > fix the:
> > > > + if ( ((b + 1) % 255) == last_byte && !hit_edge) {
> > > > to:
> > > > + if ( ((b + 1) % 256) == last_byte && !hit_edge) {
> > > >
> > > > add to tests/Makefile:
> > > > tests/postcopy-test$(EXESUF): tests/postcopy-test.o
> > > >
> > > > and do a:
> > > > make check
> > > >
> > > > in qemu.
> > > > Then you can rerun the test with:
> > > > QTEST_QEMU_BINARY=path/to/qemu-system-x86_64 ./tests/postcopy-test
> > > >
> > > > if it works, reboot and check it still works from a fresh boot.
> > > >
> > > > Can you describe the system which your full test failed on? What
> > > > distro on the host? What type of host was it tested on?
> > > >
> > > > Dave
> > > >
> > >
> > >
> > > Thanks, Dave
> > >
> > > The host is CentOS 7; its original kernel is 3.10.0-327.el7.x86_64
> > > (CentOS 7.1?). The hardware platform is HSW-EP with 64GB RAM.
> >
> > OK, so your test fails on real hardware; my guess is that my test will work on
> > there.
> > Can you try your test with THP disabled on the host:
> >
> > echo never > /sys/kernel/mm/transparent_hugepage/enabled
> >
>
> If THP is disabled, there are no failures.
> And your test always passed, even when real post-copy failed.
>
> In my env, the output of
> 'cat /sys/kernel/mm/transparent_hugepage/enabled' is:
>
> [always] ...
OK, I can't get my test to fail on real hardware, only in a VM; but my
suspicion is that we're looking at the same bug: in both cases the failure
goes away if we disable THP, and both work on 4.4.x and fail on 4.5.x.
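To compare THP state between a working and a failing host, something like the following could help. It is only a rough sketch in Python: the function names are made up for illustration, and it assumes the usual sysfs format where the active choice is shown in square brackets, plus the thp_* counters in /proc/vmstat.

```python
def active_thp_mode(text):
    """Return the bracketed choice from a line like 'always madvise [never]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def thp_counters(vmstat_text):
    """Collect the thp_* counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("thp_"):
            counters[parts[0]] = int(parts[1])
    return counters


def report(sysfs_path="/sys/kernel/mm/transparent_hugepage/enabled",
           vmstat_path="/proc/vmstat"):
    """Print the active THP mode and the thp_* counters of the running host."""
    with open(sysfs_path) as f:
        print("THP mode:", active_thp_mode(f.read()))
    with open(vmstat_path) as f:
        for name, value in sorted(thp_counters(f.read()).items()):
            print(name, value)
```

Calling report() on the host before and after a migration attempt shows whether huge pages were actually being allocated at the time of the failure.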
I'd love to find a nice easy test that I could give to Andrea and Kirill.
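For anyone following along, here is a rough sketch of the kind of check that the 255 -> 256 fix quoted above applies to. It is Python, not the actual C from tests/postcopy-test, and it rests on an assumption: that each page carries a one-byte counter which a single sweep increments, so a later page may be exactly one step behind the page before it, and that may happen only once. The function name and the modulus parameter are invented for the illustration.

```python
def pages_consistent(first_bytes, modulus=256):
    """Check per-page byte counters for at most one step down by one.

    first_bytes: one counter value (0..255) per page, in address order.
    modulus: 256 is the correct wrap for a byte; 255 reproduces the bug.
    """
    if not first_bytes:
        return True
    last_byte = first_bytes[0]
    hit_edge = False
    for b in first_bytes[1:]:
        if b == last_byte:
            continue
        if (b + 1) % modulus == last_byte and not hit_edge:
            # The sweep has updated the earlier pages but not this one yet.
            hit_edge = True
            last_byte = b
        else:
            return False
    return True
```

With modulus=256 the sequence [0, 0, 255, 255] is accepted, because 255 + 1 wraps to 0. With modulus=255 the same sequence is rejected, which is the false failure that the fix avoids.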
I've also just confirmed that running (in a VM) a fedora-24 4.5.0 kernel
with a fedora-23 userspace (qemu built under f23) still fails with my test.
So the problem there is definitely triggered by the newer kernel, not
the newer userspace.
Dave
>
> Liang
>
> > Dave
> >
> > >
> > >
> > > > >
> > > > > Thanks!
> > > > > Liang
> > > > >
> > > > > >
> > > > > > Dave
> > > > > > >
> > > > > > > --
> > > > > > > Kirill A. Shutemov
> > > > > > --
> > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK