From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1753351Ab1HAQ2y (ORCPT ); Mon, 1 Aug 2011 12:28:54 -0400
Received: from smtp105.prem.mail.ac4.yahoo.com ([76.13.13.44]:39698 "HELO
 smtp105.prem.mail.ac4.yahoo.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with SMTP id S1753041Ab1HAQ2v (ORCPT );
 Mon, 1 Aug 2011 12:28:51 -0400
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: 7ubjA3wVM1kL99xWbsqS0Z9SwulZZicelCPHlPq92Nrw0Zs
 aRf5swWFkJHs1Gu10DhOXkbKhMRxieBH5ny4T7Lk3fSQzEH4TJDgx4W6aaCQ
 BIRZ_8LzEDzkZj3T1b5yibUUlbJeL.3zIqULo77.cOpB78M9.ln2pCbb9zZV
 CqhlP2HisAVyLpZ4EnneO5bOinb3NzA3G1.7BeoZPfLnyeEft51XpCmyk9lD
 UFtXrPO4LiQiczmyqHjAHVEsVRMiyfFur7yq5CsrBh6OiL_J3gtclGHpo2Tz
 0hyAb56SqxZxQiULL7liSJw_ddMJG2YxG8ega1rQd_r6r0zxw
X-Yahoo-SMTP: _Dag8S.swBC1p4FJKLCXbs8NQzyse1SYSgnAbY0-
Message-Id: <20110801162823.755182213@linux.com>
User-Agent: quilt/0.48-1
Date: Mon, 01 Aug 2011 11:28:23 -0500
From: Christoph Lameter
To: Pekka Enberg
Cc: David Rientjes
Cc: Andi Kleen
Cc: tj@kernel.org
Cc: Metathronius Galabant
Cc: Matt Mackall
Cc: Eric Dumazet
Cc: Adrian Drzewiecki
Cc: linux-kernel@vger.kernel.org
Subject: [slub p3 0/7] SLUB: [RFC] Per cpu partial lists V3
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

V2->V3: Work on the todo list. Some work remains to reduce the code
	impact and make this all cleaner. (Pekka: patches 1-3 are cleanup
	patches of general usefulness. You already have #1; 2+3 could be
	picked up w/o any issue.)

The following patchset introduces per cpu partial lists, which allow a
performance increase of around 15% if there is contention for the node
lock (can be tested using hackbench).

These lists help to avoid per node lock overhead.
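
To make the locking argument concrete, here is a toy userspace model (not code from the patchset). It assumes that a cpu keeps up to `limit` partial slabs on a private list and only touches the shared per node list, under its lock, when the private list overflows or runs empty. The class names, the `limit` parameter and the drain/refill policy are all hypothetical and chosen only for illustration.

```python
import threading


class NodePartialList:
    """Hypothetical stand-in for a per node partial list guarded by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.slabs = []
        self.lock_acquisitions = 0  # how often the node lock was taken

    def put(self, slabs):
        with self.lock:
            self.lock_acquisitions += 1
            self.slabs.extend(slabs)

    def take(self, count):
        with self.lock:
            self.lock_acquisitions += 1
            taken = self.slabs[:count]
            del self.slabs[:count]
            return taken


class CpuPartialList:
    """Hypothetical per cpu list; limit=0 models 'no per cpu partial list'."""

    def __init__(self, node, limit):
        self.node = node
        self.limit = limit
        self.slabs = []

    def free_slab(self, slab):
        self.slabs.append(slab)
        if len(self.slabs) > self.limit:
            # Overflow: hand the whole batch to the node under one lock hold.
            self.node.put(self.slabs)
            self.slabs = []

    def get_slab(self):
        if not self.slabs:
            # Private list is empty: refill from the node list (takes the lock).
            self.slabs = self.node.take(max(1, self.limit))
        return self.slabs.pop() if self.slabs else None


def count_node_lock_acquisitions(limit, rounds=1000):
    """Alternate one free and one get per round; return node lock acquisitions."""
    node = NodePartialList()
    cpu = CpuPartialList(node, limit)
    for i in range(rounds):
        cpu.free_slab(i)
        cpu.get_slab()
    return node.lock_acquisitions


if __name__ == "__main__":
    print("without per cpu list:", count_node_lock_acquisitions(0))
    print("with per cpu list:   ", count_node_lock_acquisitions(4))
```

The model is single threaded, so it only counts how often the node lock would be taken; it says nothing about the actual cost of contention or about how the patches implement the lists.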
Allocator latency could be further reduced by making these operations
work without disabling interrupts (like the fastpath and the free
slowpath) as well as by implementing better ways of handling the cpu
array with partial pages.

I am still not satisfied with the cleanliness of the code after these
changes. Review with suggestions on how to restructure the code given
these changes in operations would be appreciated.

It is interesting to note that BSD has gone to a scheme with partial
pages only per cpu (source: Adrian). Transfer of cpu ownership is done
using IPIs. Probably too much overhead for our taste. The use of a few
per cpu partial pages looks to be beneficial though.

Note that there is no performance gain when there is no contention.

Performance:

                                         Before     After
./hackbench 100 process 200000  Time:  2299.072  1742.454
./hackbench 100 process 20000   Time:   224.654   182.393
./hackbench 100 process 20000   Time:   227.126   182.780
./hackbench 100 process 20000   Time:   219.608   182.899
./hackbench 10 process 20000    Time:    21.769    18.756
./hackbench 10 process 20000    Time:    21.657    18.938
./hackbench 10 process 20000    Time:    23.193    19.537
./hackbench 1 process 20000     Time:     2.337     2.263
./hackbench 1 process 20000     Time:     2.223     2.271
./hackbench 1 process 20000     Time:     2.269     2.301
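
For readers who want to relate the Before/After figures above to the "around 15%" claim, the relative time reduction per run can be computed with a short script. This is only a sketch: the numbers are copied from the figures above, and the helper name `time_reduction_percent` is made up for illustration.

```python
# (command, before, after) with times as reported by hackbench above
RUNS = [
    ("hackbench 100 process 200000", 2299.072, 1742.454),
    ("hackbench 100 process 20000", 224.654, 182.393),
    ("hackbench 100 process 20000", 227.126, 182.780),
    ("hackbench 100 process 20000", 219.608, 182.899),
    ("hackbench 10 process 20000", 21.769, 18.756),
    ("hackbench 10 process 20000", 21.657, 18.938),
    ("hackbench 10 process 20000", 23.193, 19.537),
    ("hackbench 1 process 20000", 2.337, 2.263),
    ("hackbench 1 process 20000", 2.223, 2.271),
    ("hackbench 1 process 20000", 2.269, 2.301),
]


def time_reduction_percent(before, after):
    """Percentage by which the run time dropped; negative means slower."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for command, before, after in RUNS:
        reduction = time_reduction_percent(before, after)
        print(f"{command:30s} {reduction:6.1f}%")
```

A negative value for a run means the "After" time was higher than the "Before" time for that run.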