Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default

From: Eduardo Habkost <ehabkost@redhat.com>
To: "Denis V. Lunev" <den@virtuozzo.com>,
	"Longpeng (Mike)" <longpeng2@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>,
	pbonzini@redhat.com, rth@twiddle.net, qemu-devel@nongnu.org,
	rkagan@virtuozzo.com, Gonglei <arei.gonglei@huawei.com>,
	huangpeng <peter.huangpeng@huawei.com>,
	zhaoshenglong <zhaoshenglong@huawei.com>,
	herongguang.he@huawei.com
Subject: Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default
Date: Tue, 28 Nov 2017 19:13:26 -0200
Message-ID: <20171128211326.GV3037@localhost.localdomain>
In-Reply-To: <2cd31202-27c3-983f-85a9-6814ff504706@virtuozzo.com>

[CCing the people who were copied in the original patch that
enabled l3cache]

On Tue, Nov 28, 2017 at 11:20:27PM +0300, Denis V. Lunev wrote:
> On 11/28/2017 10:58 PM, Eduardo Habkost wrote:
> > Hi,
> >
> > On Fri, Nov 24, 2017 at 04:26:50PM +0300, Denis Plotnikov wrote:
> >> Commit 14c985cffa "target-i386: present virtual L3 cache info for vcpus"
> >> introduced exposing an L3 cache to the guest and enabled it by default.
> >>
> >> The motivation behind it was that in the Linux scheduler, when waking up
> >> a task on a sibling CPU, the task was put onto the target CPU's runqueue
> >> directly, without sending a reschedule IPI.  Reduction in the IPI count
> >> led to performance gain.
> >>
> >> However, this isn't the whole story.  Once the task is on the target
> >> CPU's runqueue, it may have to preempt the current task on that CPU, be
> >> it the idle task putting the CPU to sleep or just another running task.
> >> For that a reschedule IPI will have to be issued, too.  Only when that
> >> other CPU has been running a normal task for too little time will the
> >> fairness constraints prevent the preemption and thus the IPI.
> >>
> >> This boils down to the improvement being only achievable in workloads
> >> with many actively switching tasks.  We had no access to the
> >> (proprietary?) SAP HANA benchmark the commit referred to, but the
> >> pattern is also reproduced with "perf bench sched messaging -g 1".
> >> On a 1-socket, 8-core vCPU topology we indeed see:
> >>
> >> l3-cache	#res IPI /s	#time / 10000 loops
> >> off		560K		1.8 sec
> >> on		40K		0.9 sec
> >>
> >> Now there's a downside: with L3 cache the Linux scheduler is more eager
> >> to wake up tasks on sibling CPUs, resulting in unnecessary cross-vCPU
> >> interactions and therefore excessive halts and IPIs.  E.g. "perf bench
> >> sched pipe -i 100000" gives
> >>
> >> l3-cache	#res IPI /s	#HLT /s		#time /100000 loops
> >> off		200 (no K)	230		0.2 sec
> >> on		400K		330K		0.5 sec
> >>
> >> In a more realistic test, we observe 15% degradation in VM density
> >> (measured as the number of VMs, each running Drupal CMS serving 2 http
> >> requests per second to its main page, with 95%-percentile response
> >> latency under 100 ms) with l3-cache=on.
> >>
> >> We think that the mostly-idle scenario is more common in cloud and personal
> >> usage, and should be optimized for by default; users of highly loaded
> >> VMs should be able to tune them up themselves.
> >>
> > There's one thing I don't understand in your test case: if you
> > just found out that Linux will behave worse if it assumes that
> > the VCPUs are sharing an L3 cache, why are you configuring an
> > 8-core VCPU topology explicitly?
> >
> > Do you still see a difference in the numbers if you use "-smp 8"
> > with no "cores" and "threads" options?
> >
> This is quite simple. A lot of software licenses are bound to the number
> of CPU __sockets__. Thus it is mandatory in a lot of cases to set a topology
> with 1 socket/xx cores to reduce the amount of money that has to
> be paid for the software.

In this case it looks like we're talking about the expected
meaning of "cores=N".  My first interpretation would be that the
user obviously wants the guest to see the multiple cores sharing a
L3 cache, because that's how real CPUs normally work.  But I see
why you have different expectations.

Numbers on dedicated-pCPU scenarios would be helpful to guide the
decision.  I wouldn't like to cause a performance regression for
users that fine-tuned vCPU topology and set up CPU pinning.
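
To put the quoted numbers side by side, here is a minimal sketch that
computes the l3-cache on/off ratios from the two tables in the patch
description.  Only the figures come from those tables; the dictionary
layout and the summarize() helper are illustrative assumptions, and
numbers from a dedicated-pCPU setup could be added in the same way.

```python
# Figures copied from the two tables quoted above (l3-cache off vs. on).
# The structure and names here are hypothetical, chosen for illustration.
results = {
    "sched messaging -g 1": {
        "off": {"ipi_per_s": 560_000, "time_s": 1.8},
        "on": {"ipi_per_s": 40_000, "time_s": 0.9},
    },
    "sched pipe -i 100000": {
        "off": {"ipi_per_s": 200, "time_s": 0.2},
        "on": {"ipi_per_s": 400_000, "time_s": 0.5},
    },
}


def summarize(results):
    """Return on/off ratios per benchmark; a value above 1 means 'on' is higher."""
    summary = {}
    for bench, r in results.items():
        summary[bench] = {
            "time_ratio_on_vs_off": r["on"]["time_s"] / r["off"]["time_s"],
            "ipi_ratio_on_vs_off": r["on"]["ipi_per_s"] / r["off"]["ipi_per_s"],
        }
    return summary


for bench, s in summarize(results).items():
    print(
        f"{bench}: time x{s['time_ratio_on_vs_off']:.2f}, "
        f"IPIs x{s['ipi_ratio_on_vs_off']:.3g}"
    )
```

With the quoted figures, l3-cache=on halves the run time of the
messaging benchmark but makes the pipe benchmark 2.5 times slower.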

-- 
Eduardo

Thread overview: 25+ messages
2017-11-24 13:26 [Qemu-devel] [PATCH] i386: turn off l3-cache property by default Denis Plotnikov
2017-11-28 18:54 ` Michael S. Tsirkin
2017-11-28 19:50   ` Paolo Bonzini
2017-11-28 20:05     ` Eduardo Habkost
2017-11-29  4:56     ` Roman Kagan
2017-11-28 19:58 ` Eduardo Habkost
2017-11-28 20:20   ` Denis V. Lunev
2017-11-28 21:13     ` Eduardo Habkost [this message]
2017-11-29  1:57       ` Gonglei (Arei)
2017-11-29  5:55         ` rkagan
2017-11-29  6:01           ` Gonglei (Arei)
2017-11-29  5:20       ` Longpeng (Mike)
2017-11-29  6:01         ` Roman Kagan
2017-11-29  7:38           ` Longpeng (Mike)
2017-11-29 10:41         ` Eduardo Habkost
2017-11-29 11:58           ` Longpeng (Mike)
2017-11-29 13:35             ` Roman Kagan
2017-11-29 17:09               ` Eduardo Habkost
2017-11-29 17:15               ` Paolo Bonzini
2017-11-30  6:28                 ` Roman Kagan
2017-11-30  9:26               ` Longpeng (Mike)
2017-11-29  5:46       ` Roman Kagan
2017-11-29 10:25         ` Eduardo Habkost
2017-11-29  4:17     ` Michael S. Tsirkin
2017-11-29  6:25       ` Roman Kagan