From: Eduardo Habkost <ehabkost@redhat.com>
To: "Denis V. Lunev" <den@virtuozzo.com>,
	"Longpeng (Mike)" <longpeng2@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>,
	pbonzini@redhat.com, rth@twiddle.net, qemu-devel@nongnu.org,
	rkagan@virtuozzo.com, Gonglei <arei.gonglei@huawei.com>,
	huangpeng <peter.huangpeng@huawei.com>,
	zhaoshenglong <zhaoshenglong@huawei.com>,
	herongguang.he@huawei.com
Subject: Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default
Date: Tue, 28 Nov 2017 19:13:26 -0200	[thread overview]
Message-ID: <20171128211326.GV3037@localhost.localdomain> (raw)
In-Reply-To: <2cd31202-27c3-983f-85a9-6814ff504706@virtuozzo.com>

[CCing the people who were copied on the original patch that
enabled l3-cache]

On Tue, Nov 28, 2017 at 11:20:27PM +0300, Denis V. Lunev wrote:
> On 11/28/2017 10:58 PM, Eduardo Habkost wrote:
> > Hi,
> >
> > On Fri, Nov 24, 2017 at 04:26:50PM +0300, Denis Plotnikov wrote:
> >> Commit 14c985cffa ("target-i386: present virtual L3 cache info for vcpus")
> >> introduced the l3-cache property and enabled it by default, exposing an L3
> >> cache to the guest.
> >>
> >> The motivation behind it was that in the Linux scheduler, when waking up
> >> a task on a sibling CPU, the task was put onto the target CPU's runqueue
> >> directly, without sending a reschedule IPI.  The reduction in the IPI
> >> count led to a performance gain.
> >>
> >> However, this isn't the whole story.  Once the task is on the target
> >> CPU's runqueue, it may have to preempt the current task on that CPU, be
> >> it the idle task putting the CPU to sleep or just another running task.
> >> For that, a reschedule IPI has to be issued, too.  Only when the normal
> >> task currently on that CPU has been running for a very short time do the
> >> fairness constraints prevent the preemption, and thus the IPI.
> >>
> >> This boils down to the improvement being only achievable in workloads
> >> with many actively switching tasks.  We had no access to the
> >> (proprietary?) SAP HANA benchmark the commit referred to, but the
> >> pattern is also reproduced with "perf bench sched messaging -g 1"
> >> on a 1-socket, 8-core vCPU topology, where we indeed see:
> >>
> >> l3-cache	resched IPIs/s	time / 10000 loops
> >> off		560K		1.8 sec
> >> on		40K		0.9 sec
> >>
> >> Now there's a downside: with an L3 cache the Linux scheduler is more eager
> >> to wake up tasks on sibling CPUs, resulting in unnecessary cross-vCPU
> >> interactions and therefore excessive halts and IPIs.  E.g. "perf bench
> >> sched pipe -i 100000" gives
> >>
> >> l3-cache	resched IPIs/s	HLTs/s		time / 100000 loops
> >> off		200 (no K)	230		0.2 sec
> >> on		400K		330K		0.5 sec
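For reference, a setup along these lines should be close to what is
described above; the memory size, image name and other machine parameters
are placeholders, only the topology and the l3-cache property matter:

  # 1 socket, 8 cores, 1 thread per core; flip l3-cache=on|off between runs
  qemu-system-x86_64 -enable-kvm -m 4G \
      -smp 8,sockets=1,cores=8,threads=1 \
      -cpu host,l3-cache=off \
      -drive file=guest.img,if=virtio

  # inside the guest, the two workloads quoted above:
  perf bench sched messaging -g 1
  perf bench sched pipe -i 100000
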
> >>
> >> In a more realistic test, we observe 15% degradation in VM density
> >> (measured as the number of VMs, each running Drupal CMS serving 2 HTTP
> >> requests per second to its main page, with 95th-percentile response
> >> latency under 100 ms) with l3-cache=on.
> >>
> >> We think that the mostly-idle scenario is more common in cloud and personal
> >> usage and should be optimized for by default; users of highly loaded VMs
> >> can be expected to tune them themselves.
> >>
> > There's one thing I don't understand in your test case: if you
> > just found out that Linux will behave worse if it assumes that
> > the VCPUs are sharing an L3 cache, why are you configuring an
> > 8-core VCPU topology explicitly?
> >
> > Do you still see a difference in the numbers if you use "-smp 8"
> > with no "cores" and "threads" options?
> >
> This is quite simple.  A lot of software licenses are bound to the number
> of CPU __sockets__.  Thus in many cases it is mandatory to set a topology
> with 1 socket/xx cores to reduce the amount of money that has to be paid
> for the software.

In this case it looks like we're talking about the expected
meaning of "cores=N".  My first interpretation would be that the
user obviously wants the guest to see the multiple cores sharing an
L3 cache, because that's how real CPUs normally work.  But I see
why you have different expectations.
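
Whatever default we end up with, both behaviors stay selectable on the
command line; just as an illustration, a 1-socket/8-core guest can ask for
either explicitly:

  # multi-core topology (e.g. for licensing), no shared L3 advertised
  -smp 8,sockets=1,cores=8,threads=1 -cpu host,l3-cache=off

  # same topology, with the virtual L3 cache exposed to the guest
  -smp 8,sockets=1,cores=8,threads=1 -cpu host,l3-cache=on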

Numbers from dedicated-pCPU scenarios would be helpful to guide the
decision.  I wouldn't like to cause a performance regression for
users who have fine-tuned the vCPU topology and set up CPU pinning.
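
(By dedicated-pCPU I mean the usual 1:1 pinning setups, roughly like the
sketch below; the thread IDs and host core numbers are only placeholders:

  # pin each vCPU thread to its own host core; the vCPU thread IDs can be
  # read from QMP "query-cpus" (thread_id field)
  taskset -pc 2 12345    # vCPU0 thread -> host core 2
  taskset -pc 3 12346    # vCPU1 thread -> host core 3
)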

-- 
Eduardo


Thread overview: 25+ messages
2017-11-24 13:26 [Qemu-devel] [PATCH] i386: turn off l3-cache property by default Denis Plotnikov
2017-11-28 18:54 ` Michael S. Tsirkin
2017-11-28 19:50   ` Paolo Bonzini
2017-11-28 20:05     ` Eduardo Habkost
2017-11-29  4:56     ` Roman Kagan
2017-11-28 19:58 ` Eduardo Habkost
2017-11-28 20:20   ` Denis V. Lunev
2017-11-28 21:13     ` Eduardo Habkost [this message]
2017-11-29  1:57       ` Gonglei (Arei)
2017-11-29  5:55         ` rkagan
2017-11-29  6:01           ` Gonglei (Arei)
2017-11-29  5:20       ` Longpeng (Mike)
2017-11-29  6:01         ` Roman Kagan
2017-11-29  7:38           ` Longpeng (Mike)
2017-11-29 10:41         ` Eduardo Habkost
2017-11-29 11:58           ` Longpeng (Mike)
2017-11-29 13:35             ` Roman Kagan
2017-11-29 17:09               ` Eduardo Habkost
2017-11-29 17:15               ` Paolo Bonzini
2017-11-30  6:28                 ` Roman Kagan
2017-11-30  9:26               ` Longpeng (Mike)
2017-11-29  5:46       ` Roman Kagan
2017-11-29 10:25         ` Eduardo Habkost
2017-11-29  4:17     ` Michael S. Tsirkin
2017-11-29  6:25       ` Roman Kagan
