From: "David S. Miller"
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
Date: Tue, 05 Jul 2005 14:22:10 -0700 (PDT)
Message-ID: <20050705.142210.14973612.davem@davemloft.net>
References: <20050705134805.GH16076@postel.suug.ch> <42CAAE2F.5070807@cosmosbay.com> <20050705173411.GK16076@postel.suug.ch>
To: tgraf@suug.ch
Cc: dada1@cosmosbay.com, netdev@oss.sgi.com
In-Reply-To: <20050705173411.GK16076@postel.suug.ch>
List-Id: netdev.vger.kernel.org

From: Thomas Graf
Date: Tue, 5 Jul 2005 19:34:11 +0200

> Do as you wish, I don't feel like arguing about micro-optimizations.

I bet the performance gain really comes from eliminating the mispredicted branches in the loop. For loops with a small, fixed trip count, say 5 or 6 iterations or fewer, the loop branch totally defeats the branch prediction logic in most processors. By the time the chip moves the I-cache branch state to "likely", the loop has ended and we eat a mispredict.

I think the original patch, which hand-unrolls the loop in the C code, is OK. Adding -funroll-loops to the CFLAGS has lots of implications, and in particular the embedded folks might not be happy with some of the things that result from that.

So I'll apply the original unrolling patch for now.
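The misprediction argument can be made concrete with a small C sketch. It is a hypothetical illustration, not the actual sch_generic.c patch: the three-band layout, `NBANDS`, `first_band_loop`, `first_band_unrolled` and `cold_loop_mispredicts` are all assumed names. The predictor is modelled as a simple 2-bit saturating counter that starts cold (strongly not-taken) every time the loop is entered, which is the scenario described above; real processors use more elaborate predictors.

```c
#include <assert.h>

/* Hypothetical example: pick the first non-empty band out of three
 * priority bands, given their queue lengths. Names are illustrative. */
#define NBANDS 3

/* Loop form: besides the per-band test, every pass also executes the
 * loop-closing branch. */
int first_band_loop(const int qlen[NBANDS])
{
	for (int band = 0; band < NBANDS; band++)
		if (qlen[band] > 0)
			return band;
	return -1; /* all bands empty */
}

/* Hand-unrolled form: same result, but there is no loop-closing branch
 * left to mispredict, only the per-band tests. */
int first_band_unrolled(const int qlen[NBANDS])
{
	if (qlen[0] > 0)
		return 0;
	if (qlen[1] > 0)
		return 1;
	if (qlen[2] > 0)
		return 2;
	return -1; /* all bands empty */
}

/* Simplified model of a loop-closing branch under a 2-bit saturating
 * counter that is cold (state 0, strongly not-taken) on loop entry.
 * The branch is taken (trips - 1) times, then falls through once.
 * Returns the number of mispredictions for one run of the loop. */
int cold_loop_mispredicts(int trips)
{
	assert(trips >= 1);
	int counter = 0; /* states 0,1 predict not-taken; 2,3 predict taken */
	int misses = 0;

	for (int i = 1; i <= trips; i++) {
		int taken = (i < trips); /* the last pass exits the loop */
		int predicted_taken = (counter >= 2);

		if (taken != predicted_taken)
			misses++;
		/* train the counter toward the actual outcome, saturating */
		if (taken && counter < 3)
			counter++;
		else if (!taken && counter > 0)
			counter--;
	}
	return misses;
}
```

Under this model `cold_loop_mispredicts(3)` returns 3, so every loop-closing branch of a 3-iteration loop mispredicts; for 6 iterations it is 3 out of 6, and for 100 iterations still only 3 out of 100. The counter needs two taken outcomes before it predicts "taken", and by then a short loop is already about to exit, which is why hand-unrolling pays off for small fixed trip counts and matters much less for long loops.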