From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758008Ab1JEPMt (ORCPT ); Wed, 5 Oct 2011 11:12:49 -0400
Received: from mga01.intel.com ([192.55.52.88]:6415 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757904Ab1JEPMs (ORCPT ); Wed, 5 Oct 2011 11:12:48 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="4.68,491,1312182000"; d="scan'208";a="70580979"
From: Andi Kleen
To: Christoph Lameter
Cc: Peter Zijlstra, starlight@binnacle.cx, Eric Dumazet,
	linux-kernel@vger.kernel.org, netdev, Willy Tarreau, Ingo Molnar
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
References: <6.2.5.6.2.20111003112108.03a83a28@binnacle.cx>
	<1317820942.6766.26.camel@twins>
Date: Wed, 05 Oct 2011 08:12:44 -0700
In-Reply-To: (Christoph Lameter's message of "Wed, 5 Oct 2011 09:26:24 -0500 (CDT)")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Christoph Lameter writes:

> On Wed, 5 Oct 2011, Peter Zijlstra wrote:
>
>> Clearly none of the tests being ran on a regular basis, by for instance
>> the Intel regression team, covers your needs. Start by fixing that.
>
> The most commonly run, the aged old tests contained in the AIM9 suite,
> have consistently shown these regressions over long years and they have
> been brought up repeatedly in numerous discussions. This is pervasive
> thoughout the OS hotpaths. Just look at how the page fault latencies
> change over time. Take a modern machine and then run successively older
> kernel versions on it. You will see performance getting better and
> latencies becoming smaller.

One of the reasons the Intel tests fix problems is that Alex and Tim and
others look into them and track down regressions.
If you see a new problem during your testing you should do the same.
This actually helps. It doesn't need to be a full analysis/patch; even
just posting some detailed information on a new regression is useful.

For example, one tool I found useful is to just enable the function graph
tracer and look at the latencies of functions in that path during the
test. If something changes dramatically it's relatively easy to point to.

So for example, if you see context switch times changing, it should not
be that difficult to do such a trace and at least point to the guilty
functions and post a mail.

>> Also, for latency, we've got ftrace and a latencytracer, provide traces
>> that illustrate your fail.
>
> We would need a backport of both to a kernel version that works with
> reasonable latencies so that we can figure out what caused these
> regressions for this particular case. Disabling network and kernel
> features usually gives you better performance but there are a lot of
> things in the hot paths these days that can not be disabled.

Please do function traces and point fingers at slow things in the hot
path. The more the merrier. Post a list of shame!

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only
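
A minimal sketch of the function graph tracer comparison suggested above,
not something from the original mail. It assumes two traces were saved as
text from the tracing directory's `trace` file with `function_graph` as the
current tracer and default output options (CPU, duration, function
columns). The names `parse_durations` and `regressions` and the 2x
threshold are hypothetical, chosen only for illustration. Closing braces
are matched to callers with a per-CPU stack, which is only approximate
when the trace starts mid-call or a task switch interleaves functions.

```python
import re
from collections import defaultdict

# One function_graph line with default options, for example:
#   " 0)   0.500 us    |    _raw_spin_lock();"   (leaf call)
#   " 0)               |  schedule() {"          (entry, no duration)
#   " 0) + 12.000 us   |  }"                     (exit, with duration)
LINE_RE = re.compile(
    r"^\s*(?P<cpu>\d+)\)\s*(?:[!+#*@$]\s*)?"
    r"(?:(?P<dur>\d+(?:\.\d+)?)\s+us)?\s*\|\s*(?P<body>.*)$"
)


def parse_durations(lines):
    """Return {function name: [durations in microseconds]}."""
    stacks = defaultdict(list)     # per-CPU stack of currently open functions
    durations = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue               # header comments, separators, etc.
        cpu, dur = m.group("cpu"), m.group("dur")
        body = m.group("body").strip()
        if body.endswith("{"):
            stacks[cpu].append(body.split("(")[0])
        elif body.startswith("}"):
            if stacks[cpu]:
                name = stacks[cpu].pop()
                if dur is not None:
                    durations[name].append(float(dur))
        elif body.endswith(";") and dur is not None:
            durations[body.split("(")[0]].append(float(dur))
    return durations


def regressions(old_lines, new_lines, factor=2.0):
    """List (name, old_mean_us, new_mean_us) where new >= factor * old."""
    old = parse_durations(old_lines)
    new = parse_durations(new_lines)
    found = []
    for name in old.keys() & new.keys():
        old_mean = sum(old[name]) / len(old[name])
        new_mean = sum(new[name]) / len(new[name])
        if old_mean > 0 and new_mean / old_mean >= factor:
            found.append((name, old_mean, new_mean))
    # Largest slowdown first, so the worst offenders lead the list.
    return sorted(found, key=lambda t: t[2] / t[1], reverse=True)
```

Feeding it the lines of a trace from the older kernel and one from the
newer kernel, taken during the same test, gives a short list of functions
whose mean latency grew the most, which is roughly the information worth
posting.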