From mboxrd@z Thu Jan 1 00:00:00 1970
From: David L Stevens
Subject: Re: [PATCHv6 net-next 1/3] sunvnet: upgrade to VIO protocol version 1.6
Date: Thu, 18 Sep 2014 15:58:21 -0400
Message-ID: <541B395D.1000809@oracle.com>
References: <541A2316.5010603@oracle.com> <2AB76E42-C12D-47C5-8476-0D0C611691A5@oracle.com> <541AD838.50700@oracle.com> <9BA1705F-0C89-471A-9872-688A3FA3165C@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: David Miller , netdev@vger.kernel.org
To: Raghuram Kothakota
Return-path:
Received: from aserp1040.oracle.com ([141.146.126.69]:48271 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756218AbaIRT60 (ORCPT ); Thu, 18 Sep 2014 15:58:26 -0400
In-Reply-To: <9BA1705F-0C89-471A-9872-688A3FA3165C@oracle.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 09/18/2014 02:49 PM, Raghuram Kothakota wrote:
>
> I am probably not as knowledgeable of sunvnet as you may be, but I would assume
> the code is capable of handling the a vport removal and should have sufficient method
> cleanup as well.

There is nothing in the code that I saw that checks whether pending
descriptors are marked as VIO_DESC_DONE before freeing the buffer space
when the device is shut down. From that, I assume a shutdown during active
transmits could result in a fault or the remote looking at garbage.

Adding a wait for that condition could be "forever" if the remote
disappears while processing receives, so it'd need a timeout, too, and it
all needs to switch gracefully to the new buffer set. I didn't want to add
MTU changes to the list of things that trigger that, as I believe they
would, nor to complicate the MTU-relevant pieces I have with something that
I think should ultimately use fewer, larger buffers.

> In the virtualization world, we want resources to be efficiently used and memory is
> still very important resource.
> My concern is mostly because this memory usage of
> 32+MB is on a per LDC basis. LDoms today supports a max of 128 domains, but
> from my experience seen actual deployments of the order of 50 domains. This is
> going up as the platforms getting more and more powerful. If there are really
> that many peers, then the amount of memory consumed by one vnet instance
> is 50 * 32+MB = 1.6GB+. It's fine if this memory is really used, but it seems like this
> will be useful only when the peer is another linux guest with this version of vnet and
> also the MTU is configured to use 64K. The memory is being wasted for all other
> peers that either don't support 64K MTU or not configured to use it and also
> the switch port as obviously it doesn't support 64K MTU today.

Since the current code allows only 1500-byte MTUs, this patch set is useful
for any value larger than that-- on Solaris, up to 16000, and on Linux, as
high as 64K. I think dynamic buffer allocation should not be tied to these
other items, but it should be done. This isn't the end of sunvnet
development, but the beginning.

However, if 64K is too large, what value is not? Any number we pick is
arbitrary and with large numbers of LDOMs may be "too many." Of course, the
smaller it is, the smaller the benefit, too. I would prefer to reduce
VNET_MAXPACKET to some lower value, or make it a module parameter, than to
link this patchset to dynamic allocation of buffers. But the 16000 value
Solaris supports would correspond to ~400MB in your example above -- still
a large number for a single virtual interface.

>> b) I think most people will want to use large MTUs for performance; enough so
>> that perhaps the bring-up MTU should be 64K too
>
>
> From my experience in SPARC world, most customers pushed us back for any
> proposal to use Jumbo Frames. The customers who configured Jumbo frames,
> mostly used 9K for performance of NFS etc.

Customers have never been able to use jumbo frames on Linux LDOMs before,
and I expect an 8X improvement in throughput might affect their judgement
on it. More so, if those large buffers and throughput improvements can be
done with GSO/TSO by default with no MTU adjustments on the devices.

> When we implemented TSO support, we evaluated the cost of the buffers vs
> performance. We were able to limit TSO support to 8K(actually bit less) and still
> achieve high performance, for example we are able to drive line rate on a 10G
> and guest-to-guest of the order of 45+Gbps. So, my suggestion would be to
> increase the parallelism of the code more than depending on large MTU.

That's nice for Solaris. I see:

dlsl1 880 # ip link set mtu 8192 dev eth1
dlsl1 881 # netperf -H 10.0.0.2
TCP STREAM TEST to 10.0.0.2 : interval
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3176.10

dlsl1 882 # ip link set mtu 65535 dev eth1
dlsl1 883 # netperf -H 10.0.0.2
TCP STREAM TEST to 10.0.0.2 : interval
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7651.42

So, on Linux, throughput is more than doubled by supporting all the way to
64K-- something I expect to translate to TSO as well. Beyond that, it ought
to be whatever the admin wants-- not some arbitrary limit set in advance.
Since it is a virtual device, there is no physical constraint, as in
Ethernet signaling, to force it to 1500, or 9K. The only limit ought to be
the IPv4 max packet size of 64K IP data + framing (and arguably not even
that, if it's using primarily or exclusively IPv6).

                                                        +-DLS
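
For reference, the memory figures traded in this thread (50 peers at 32+MB
each, and the ~400MB estimate for a 16000-byte limit) can be reproduced
with a rough sketch. It assumes buffer memory per LDC scales linearly with
the maximum packet size, anchored at the 32MB-per-LDC figure quoted for
64K; that scaling rule and the names below are illustrative assumptions,
not taken from the driver.

```python
# Rough estimate of sunvnet buffer memory for one vnet instance.
# ASSUMPTION: per-LDC buffer memory scales linearly with the maximum
# packet size, anchored at ~32 MB per LDC when the limit is 64K.
# The constant and function names here are hypothetical.

BASELINE_MAXPACKET = 65535   # 64K limit discussed in the thread
BASELINE_PER_LDC_MB = 32     # "32+MB" per LDC quoted in the thread


def per_ldc_mb(max_packet: int) -> float:
    """Estimated buffer memory (MB) for one LDC at a given packet limit."""
    return BASELINE_PER_LDC_MB * max_packet / BASELINE_MAXPACKET


def total_mb(peers: int, max_packet: int) -> float:
    """Estimated buffer memory (MB) for one vnet instance with `peers` LDCs."""
    return peers * per_ldc_mb(max_packet)


if __name__ == "__main__":
    # 64K (this patch set), 16000 (Solaris limit), 9000 (common jumbo size)
    for max_packet in (65535, 16000, 9000):
        mb = total_mb(50, max_packet)
        print(f"{max_packet:>6} bytes x 50 peers: {mb:8.1f} MB")
```

Under that assumption, the 50-peer case at 64K comes to the 1.6GB quoted
above and the 16000-byte case lands a little under 400MB, which is why a
smaller VNET_MAXPACKET or a module parameter only scales the cost rather
than removing it.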