From mboxrd@z Thu Jan  1 00:00:00 1970
From: "David S. Miller"
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Date: Tue, 05 Jul 2005 16:45:03 -0700 (PDT)
Message-ID: <20050705.164503.104035718.davem@davemloft.net>
References: <20050705.143548.28788459.davem@davemloft.net> <42CB14B2.5090601@cosmosbay.com> <20050705234104.GR16076@postel.suug.ch>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: dada1@cosmosbay.com, netdev@oss.sgi.com
To: tgraf@suug.ch
In-Reply-To: <20050705234104.GR16076@postel.suug.ch>
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

From: Thomas Graf
Date: Wed, 6 Jul 2005 01:41:04 +0200

> I still think we can fix this performance issue without manually
> unrolling the loop or we should at least try to. In the end gcc
> should notice the constant part of the loop and move it out so
> basically the only difference should be the additional prio++ and
> possibly a failing branch prediction.

But the branch prediction is where I personally think a lot of the
lossage is coming from. Mispredicted branches can cost upwards of 20
or 30 processor cycles, easily. That's getting close to the cost of
an L2 cache miss.

I see the difficulties with this change now. Why don't we revisit
this some time in the future?
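For readers without sch_generic.c at hand, here is a minimal sketch of the two loop shapes being argued about: a rolled loop over priority bands versus a manually unrolled sequence of per-band checks. Everything in it is a hypothetical stand-in, not the kernel's pfifo_fast code: the `struct band` ring, the `band_enqueue`/`band_dequeue` helpers, `NUM_BANDS = 3` and `BAND_CAP` are illustrative assumptions. Both functions return the same result; the difference is only in branch structure. The rolled form executes the same "is this band empty?" branch instruction for every band plus a loop-exit test, while the unrolled form typically gives each band's check its own branch instruction, which a predictor can track separately.

```c
#include <assert.h>

#define NUM_BANDS 3   /* hypothetical: three priority bands, 0 = highest */
#define BAND_CAP 16   /* hypothetical fixed capacity per band */

/* Hypothetical stand-in for a per-band packet queue (packets are ints). */
struct band {
    int items[BAND_CAP];
    int head, tail;   /* head == tail means empty; no wraparound in this sketch */
};

static void band_enqueue(struct band *b, int pkt)
{
    assert(b->tail < BAND_CAP);   /* sketch only: no wraparound handling */
    b->items[b->tail++] = pkt;
}

/* Returns 1 and stores the packet in *out if the band was non-empty. */
static int band_dequeue(struct band *b, int *out)
{
    if (b->head == b->tail)
        return 0;
    *out = b->items[b->head++];
    return 1;
}

/* Rolled form: one shared "band empty?" branch reused for every band,
 * plus the prio++ counter and the loop-exit test. */
static int dequeue_rolled(struct band bands[NUM_BANDS], int *out)
{
    for (int prio = 0; prio < NUM_BANDS; prio++)
        if (band_dequeue(&bands[prio], out))
            return prio;
    return -1;   /* all bands empty */
}

/* Manually unrolled form: same semantics, no counter or loop-exit test;
 * each band's empty check is a separate branch site. */
static int dequeue_unrolled(struct band bands[NUM_BANDS], int *out)
{
    if (band_dequeue(&bands[0], out))
        return 0;
    if (band_dequeue(&bands[1], out))
        return 1;
    if (band_dequeue(&bands[2], out))
        return 2;
    return -1;   /* all bands empty */
}
```

Whether gcc hoists the loop-invariant work and how much the branch layout matters depends on the compiler version and the CPU, so any real decision would rest on measurements rather than on this sketch.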