From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sowmini Varadhan Date: Thu, 07 Aug 2014 20:17:08 +0000 Subject: Re: [PATCH sparc] ldc_connect() should not return EINVAL when handshake is in progress. Message-Id: <20140807201708.GD29191@oracle.com> List-Id: References: <20140801135040.GE19031@oracle.com> In-Reply-To: <20140801135040.GE19031@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: sparclinux@vger.kernel.org (Maybe this discussion belongs in netdev?) On (08/06/14 23:00), David Miller wrote: > The scheme used there basically is to mark the carrier off when > the peer stops sinking packets we are TX'ing to it, and queue > up a timer. > > If the timer fires, we free up all of the packets in the transmit > queue so they don't linger forever and take up resources. > > Usually there is some "event" that can let us know that data is > being sinked again, and we can use that to turn the carrier on > and start transmitting again. Otherwise we'll have to poll > for such things in the timer. so in the case of sunvnet, there are 2 writes that can block- either !vnet_tx_dring_avail() (i.e., no descriptor rings) , or !tx_has_space_for() (i.e., cannot send the ldc trigger) vnet_start_xmit() already seems to have most of the smarts to handle both failures, the one missing piece is that __vnet_tx_trigger() should not loop infinitely on the ldc_write, but give up with EWOULDBLOCK after some finite number of tries- at that point vnet_start_xmit() recovers correctly (afaict). But then there's another problem at the tail of vnet_event()- if we hit the maybe_tx_wakeup() condition, we try to take the netif_tx_lock() in the recv-interrupt-context and can deadlock with dev_watchdog(). I could move the contents of maybe_tx_wakeup() into vnet_tx_timeout(), and that avoids deadlocks. However I think that now matches the description of polling-to-recover-from-blocked-tx above. And one side-effect of moving from the synchronous maybe_tx_wakeup() to the reliance on watchdog-timer to unblock things is that things like iperf throughput can fluctuate wildly. But given that this actually *is* a performance weakness, perhaps its not worth over-optimizing this case? Another data-point in favor of not over-optimizing the wrong thing: in separate experiments, when I try to move the data-handling code in vnet_event off to a bh-handler, performance improves noticeably- I rarely encounter the netif_queue_stopped(), so relying on vnet_tx_timeout() for recovery is not an issue with that prototype. Unless there's a simple way to kick off netif_wake_queue() from vnet_event(), it seems better to just make vnet_tx_timeout() do this job. --Sowmini