Message-ID: <55662460.2050501@fb.com>
Date: Wed, 27 May 2015 16:09:04 -0400
From: Josef Bacik
Subject: Re: [PATCH] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE
References: <1432675865-378571-1-git-send-email-jbacik@fb.com>
In-Reply-To: <1432675865-378571-1-git-send-email-jbacik@fb.com>
List-ID: linux-kernel@vger.kernel.org

On 05/26/2015 05:31 PM, Josef Bacik wrote:
> At Facebook we have a pretty heavily multi-threaded application that is
> sensitive to latency.  We have been pulling forward the old SD_WAKE_IDLE code
> because it gives us a pretty significant performance gain (like 20%).  It turns
> out this is because there are cases where the scheduler puts our task on a busy
> CPU when there are idle CPUs in the system.  We verify this by reading the
> cpu_delay_req_avg_us from the scheduler netlink stuff.  With our crappy patch we
> get much lower numbers vs baseline.
>
> SD_BALANCE_WAKE is supposed to find us an idle cpu to run on, however it is just
> looking for an idle sibling, preferring affinity over all else.  This is not
> helpful in all cases, and SD_BALANCE_WAKE's job is to find us an idle cpu, not
> guarantee affinity.  Fix this by first trying to find an idle sibling, and then
> if the cpu is not idle fall through to the logic to find an idle cpu.  With this
> patch we get slightly better performance than with our forward port of
> SD_WAKE_IDLE.  Thanks,
>

I rigged up a test script to run the perf bench sched tests and give me
the numbers.  Here are the numbers:

                  Messaging (total runtime, s)   Pipe (ops/sec)
4.0               56.934                         105620.762
4.0 + my patch    47.374                         113691.199

So ~20% better performance out of the Messaging test, which is sort of
like HHVM, and ~8% better pipe performance.  This is a 2-socket, 16-core
box.  I've attached the script I'm using; basically I just run each test
5 times, and for perf bench sched pipe I run NR_CPUS/2 instances in
parallel.

If you are interested I'd be happy to show you numbers for our HHVM test,
but they are less straightforward and require pretty pictures and a book
on how to read the numbers.  Thanks,

Josef
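
As a sanity check on the percentages quoted above, the relative changes can be recomputed from the four reported figures. The following is only a minimal sketch: the dictionary keys and the helper pct_change are illustrative names, not taken from the attached script. It shows that "~20%" for Messaging corresponds to the speedup (old runtime divided by new runtime), while the runtime itself drops by a somewhat smaller fraction.

```python
# Figures copied from the results reported in the message.
baseline = {"messaging_s": 56.934, "pipe_ops": 105620.762}
patched = {"messaging_s": 47.374, "pipe_ops": 113691.199}


def pct_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# Messaging reports a total runtime, so lower is better.
runtime_drop = -pct_change(baseline["messaging_s"], patched["messaging_s"])

# The same result expressed as a speedup: old time / new time - 1.
speedup = (baseline["messaging_s"] / patched["messaging_s"] - 1.0) * 100.0

# Pipe reports ops/sec, so higher is better.
pipe_gain = pct_change(baseline["pipe_ops"], patched["pipe_ops"])

print(f"messaging runtime reduction: {runtime_drop:.1f}%")
print(f"messaging speedup:           {speedup:.1f}%")
print(f"pipe throughput gain:        {pipe_gain:.1f}%")
```

With the numbers above this gives roughly a 16.8% runtime reduction (a 20.2% speedup) for Messaging and a 7.6% throughput gain for Pipe, which matches the rounded "~20%" and "~8%" in the text.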