From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [net-next 1/2] qdisc: Allow qdiscs to provide backpressure up the stack.
Date: Thu, 26 Aug 2010 15:59:15 -0700 (PDT)
Message-ID: <20100826.155915.200379485.davem@davemloft.net>
References: <1282762851-3612-1-git-send-email-greearb@candelatech.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org
To: greearb@candelatech.com
Return-path:
Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:36862 "EHLO sunset.davemloft.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754609Ab0HZW67 (ORCPT ); Thu, 26 Aug 2010 18:58:59 -0400
In-Reply-To: <1282762851-3612-1-git-send-email-greearb@candelatech.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

From: Ben Greear
Date: Wed, 25 Aug 2010 12:00:50 -0700

> Some qdiscs, in some instances, can reliably detect when they
> are about to drop a packet in the dev_queue_xmit path. In
> this case, it would be nice to provide backpressure up the
> stack, and NOT free the skb in the qdisc logic.
>
> Signed-off-by: Ben Greear

I agree with Stephen that, if you're going to do this, it's better to
just update all of the qdiscs to handle the new semantic than to add
this special-case code path.

Right now the only configuration in which your scheme is going to work
is one with the default qdisc.  That is far too narrow in scope, in my
opinion.

There is also the question of what this actually accomplishes.  All I
can see happening is that instead of being limited to the child
device's backlog limits, you're limited by the combined backlog of two
devices.

In reality I don't see this helping much at all.  The most downstream
device provides the TX queue limiting, and that ought to be
sufficient.  In fact, allowing such an increase in backlogging is
something which might be completely unexpected.  If the limit on the
most downstream device is insufficient, increase it.
If the only remaining problem is pktgen, we can look into changing it
to operate with more information than just plain NETDEV_TX_BUSY.

Also, there is the issue of waking the queue back up.  If you pass
back NETDEV_TX_BUSY, the generic scheduler layer assumes there will
absolutely be something which triggers the queue to start transmitting
again.  Usually that is the hardware queue waking up in the device.
But as implemented, you've added no such notification, and therefore I
think it's possible for the pipeline of packets going through these
layered devices to get wedged and stop forever.

This is why you can't mix and match hardware queue status
notifications with software qdisc ones.  If you pass NETDEV_TX_BUSY
back, you had better generate fake queue wakeup notifications to get
the send engine going again as well.  But I believe we really don't
want that.

The only indication we really want to propagate back down to the
callers is NET_XMIT_CN which, as far as I can tell, does happen
correctly right now.  If not, that's a bug.

You're essentially trying to pass down a hardware queue condition for
something that is caused by a software queue limit, and therefore I
strongly disagree with what these changes are doing.