From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Duyck
Date: Wed, 14 Oct 2015 15:40:49 -0700
Subject: [Intel-wired-lan] [next-queue 13/17] fm10k: Update adaptive ITR algorithm
In-Reply-To: <1444853538.26286.42.camel@intel.com>
References: <1444779554-20464-1-git-send-email-jacob.e.keller@intel.com> <1444779554-20464-13-git-send-email-jacob.e.keller@intel.com> <561EA08C.8090705@gmail.com> <1444853538.26286.42.camel@intel.com>
Message-ID: <561ED9F1.5060104@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: intel-wired-lan@osuosl.org
List-ID:

On 10/14/2015 01:12 PM, Keller, Jacob E wrote:
> On Wed, 2015-10-14 at 11:35 -0700, Alexander Duyck wrote:
>> On 10/13/2015 04:39 PM, Jacob Keller wrote:
>>> +	 */
>>> +#define FM10K_ITR_SCALE_SMALL	6
>>> +#define FM10K_ITR_SCALE_MEDIUM	5
>>> +#define FM10K_ITR_SCALE_LARGE	4
>>> +
>>> +	if (avg_wire_size < 300)
>>> +		avg_wire_size *= FM10K_ITR_SCALE_SMALL;
>>> +	else if ((avg_wire_size >= 300) && (avg_wire_size < 1200))
>>> +		avg_wire_size *= FM10K_ITR_SCALE_MEDIUM;
>>>  	else
>>> -		avg_wire_size /= 2;
>>> +		avg_wire_size *= FM10K_ITR_SCALE_LARGE;
>>> +
>> Where is it these scaling values originated from?  Just looking through
>> the values I am not sure having this broken out like it is provides much
>> value.
>>
> I am not really sure what exactly you mean here?

The numbers are kind of all over the place.  The result of the math
above swings back and forth in a sawtooth between values and doesn't
seem to do anything really consistent.  For example, a packet that is
275 bytes in size will have an interrupt rate of 125K interrupts per
second, while a packet that is 276 bytes in size will be closer to 166K.
It is basically creating some odd peaks and valleys.

>
>> What I am getting at is that the input is a packet size, and the output
>> is a value somewhere between 2 and 47.
>> (I think that 47 is still a bit high by the way and probably should be
>> something more like 25 which I believe you established as the minimum
>> Tx interrupt rate in a later patch.)
>>
> The input is two fold for calculation, packet size, and number of
> packets.

Last I knew though the average packet size is the only portion we end up
using to actually set the interrupt moderation rate.

>> What you may want to do is look at pulling in the upper limit to
>> something more reasonable like 1536 for avg_wire_size, and then simplify
>> this logic a bit.  Specifically what is it you are trying to accomplish
>> by tweaking the scale factor like you are?  I assume you are wanting to
>> approximate a curve.  If so you might want to look at possibly including
>> an offset value so that you can smooth out the points where your
>> intersections occur.
>>
> I honestly don't know. I mostly took the original work from 6-7 months
> ago, and added your suggestions from that series, but maybe that isn't
> valid now?

I suspect some of my views have changed over the last several months
after dealing with interrupt moderation issues on ixgbe.  The thing I
started realizing is that with full sized Ethernet frames, 1514 bytes in
size, you have a maximum rate of something like 4 million packets per
second at 50Gbps.  It seems like you should be firing an interrupt at
least once every 100 packets, don't you think?  That is why I am now
thinking that at minimum 40K interrupts per second is necessary.
However, a side effect of that is that 100 packets will overrun a UDP
buffer in most cases, as there is only room for about 70 frames based on
math I saw when I submitted the patch for ixgbe
(https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?id=8ac34f10a5ea4c7b6f57dfd52b0693a2b67d9ac4).
So if I want to take UDP rmem_default into account we would be looking
at something more like 60K interrupts per second, which is getting
fairly tiny at 17us.

>> For example what you may want to consider doing would be to instead use
>> a multiplication factor for small, an addition value for medium, and for
>> large you simply cap it at a certain value.
>>
> So, how would this look? The capping makes sense, we should probably
> cap it at around 30 or something? I'm not sure that 25 is the limit,
> since for some workloads I think we did calculate that we could use
> that few interrupts.. maybe the CPU savings aren't worth it though if
> it messes up too many other flows?

So I have done some math based on an assumption of a 212992 wmem_default
and an overhead of 640 bytes per packet (256 skb, 64 headroom, 320
shared info).  Breaking it all down goes a little something like this:

    wmem_default / (size + overhead) = desired_pkts_per_int
    rate / bits_per_byte / (size + ethernet_overhead) = pkt_rate
    (desired_pkts_per_int / pkt_rate) * usec_per_sec = ITR value

So then when we start plugging in the numbers:

    212992 / (size + 640) = desired_pkts_per_int

We can simplify the second expression by just doing the division now:

    50,000,000,000 / 8 / (size + 24) = pkt_rate
    6,250,000,000 / (size + 24) = pkt_rate

Then we just need to plug in the values and reduce:

    (212,992 / (size + 640)) / (6,250,000,000 / (size + 24)) * 1,000,000 = ITR value
    (212,992 / (size + 640)) / (6,250 / (size + 24)) = ITR value
    (34.078 / (size + 640)) / (1 / (size + 24)) = ITR value
    (34 * (size + 24)) / (size + 640) = ITR value

So now we have the expression we are looking for in order to determine
the minimum interrupt rate, but we want to avoid the division.
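
As a rough sanity check on the reduction above, here is a minimal sketch
in C that evaluates both the unreduced expression and the reduced
(34 * (size + 24)) / (size + 640) form using integer math.  The constants
are the ones used in the derivation (212992 wmem_default, 640 bytes of
per-packet overhead, 24 bytes of Ethernet overhead, 50Gbps).  The macro
and function names are hypothetical and are not part of fm10k.

```c
#include <assert.h>
#include <stdint.h>

/* Constants taken from the derivation in this message. */
#define WMEM_DEFAULT	212992ULL	/* bytes of socket send buffer */
#define PKT_OVERHEAD	640ULL		/* 256 skb + 64 headroom + 320 shared info */
#define ETH_OVERHEAD	24ULL		/* per-frame overhead on the wire */
#define LINK_BPS	50000000000ULL	/* 50Gbps */
#define USEC_PER_SEC	1000000ULL

/* Hypothetical helper: ITR in usecs (rounded down) from the unreduced
 * expression (wmem_default / (size + overhead)) / pkt_rate * usec_per_sec.
 */
static uint64_t itr_usecs_full(uint64_t size)
{
	uint64_t bytes_per_sec = LINK_BPS / 8;

	/* Multiply before dividing so integer math does not truncate early. */
	return (WMEM_DEFAULT * USEC_PER_SEC * (size + ETH_OVERHEAD)) /
	       (bytes_per_sec * (size + PKT_OVERHEAD));
}

/* Hypothetical helper: the reduced form, 34 * (size + 24) / (size + 640). */
static uint64_t itr_usecs_reduced(uint64_t size)
{
	return (34 * (size + ETH_OVERHEAD)) / (size + PKT_OVERHEAD);
}

/* Both forms should agree for a full sized 1514 byte frame. */
static void itr_check_full_frame(void)
{
	assert(itr_usecs_full(1514) == itr_usecs_reduced(1514));
}
```

For a 1514 byte frame both helpers come out at 24us, which lines up with
the roughly 40K interrupts per second discussed earlier.  Rounding 34.078
down to 34 only shifts the result by a fraction of a percent.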
So in the end we have something that looks like this in order to
generate an approximation of the curve without having to do any
unnecessary math (dropping the +24, and the 3000 limit from earlier):

	/* the following is a crude approximation for the following expression:
	 * (34 * (size + 24))/(size + 640) = ITR value
	 */
	if (avg_wire_size <= 360) {
		/* start 333K ints/sec and gradually drop to 77K ints/sec */
		avg_wire_size *= 8;
		avg_wire_size += 376;
	} else if (avg_wire_size <= 1152) {
		/* 77K ints/sec to 45K ints/sec */
		avg_wire_size *= 3;
		avg_wire_size += 2176;
	} else if (avg_wire_size <= 1920) {
		/* 45K ints/sec to 38K ints/sec */
		avg_wire_size += 4480;
	} else {
		/* plateau at a limit of 38K ints/sec */
		avg_wire_size = 6656;
	}

Mind you the above is a crude approximation, but it should give decent
performance and it only strays from the values of the original function
by 1us or 2us and it stays under the curve.  It could probably use some
tuning and tweaking as well but you get the general idea.

You may even want to tune this to be a bit more aggressive in terms of
interrupts per second.  I know some distros such as RHEL are still
running around with an untuned skb_shared_info and such, and as a result
they take up more space, resulting in a larger memory footprint.  What
this would represent is modifying the 640 value in the original function
to increase based on the extra overhead.  Then it would be necessary to
modify the slopes, offsets, and transition points to get the right
approximation for the new curve.

> I could also try to take a completely different algorithm say from i40e
> instead? This one really has limited testing.

Yeah, this one was a "good enough" solution at the time and as I recall
it was a clone of the igb interrupt moderation.  Now that you have real
ports you probably need something better than 1Gbps.  The i40e one is
from ixgbe, which was inherited from e1000.
The problem with the interrupt moderation in that design is that for any
high throughput usage everything becomes bulk (8K ints per second).  For
1G that works fine.  But I can tell you from what I have seen on 10Gbps
NICs it doesn't do well under any kind of small-packet or single-threaded
throughput test.
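
Coming back to the piecewise approximation of
(34 * (size + 24)) / (size + 640) given earlier in this message, the
sketch below is one way to check how far it strays from the reference
curve.  It rests on an assumption the message does not state: that the
scaled avg_wire_size is later divided by 256 to yield microseconds (this
is what makes 6656 correspond to 26us, or about 38K ints/sec).  The
function names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the scaled value is later shifted right by 8 (divided by
 * 256) to obtain usecs.  This step is not spelled out in the message.
 */
#define ITR_SCALE_SHIFT	8

/* The piecewise approximation from the message, in scaled units. */
static uint32_t itr_scaled_approx(uint32_t avg_wire_size)
{
	if (avg_wire_size <= 360)
		return avg_wire_size * 8 + 376;
	if (avg_wire_size <= 1152)
		return avg_wire_size * 3 + 2176;
	if (avg_wire_size <= 1920)
		return avg_wire_size + 4480;
	return 6656;
}

/* Reference curve in the same scaled units (rounded down):
 * 256 * 34 * (size + 24) / (size + 640)
 */
static uint32_t itr_scaled_exact(uint32_t size)
{
	uint64_t num = ((uint64_t)34 << ITR_SCALE_SHIFT) * (size + 24);

	return (uint32_t)(num / (size + 640));
}

/* Largest amount, in 1/256 usec units, by which the approximation sits
 * below the reference over [lo, hi].  Returns -1 if the approximation
 * ever rises above the reference in that range.
 */
static int32_t itr_max_gap_scaled(uint32_t lo, uint32_t hi)
{
	int32_t max_gap = 0;

	assert(lo <= hi);

	for (uint32_t size = lo; size <= hi; size++) {
		int32_t gap = (int32_t)itr_scaled_exact(size) -
			      (int32_t)itr_scaled_approx(size);

		if (gap < 0)
			return -1;
		if (gap > max_gap)
			max_gap = gap;
	}

	return max_gap;
}
```

Under that assumption, itr_max_gap_scaled(64, 1920) stays below 512,
i.e. under 2us, and never goes negative, which matches the "1us or 2us"
and "stays under the curve" description.  The check does return -1 for
sizes just past 1920, where the 6656 plateau sits a fraction of a
microsecond above the reference, so that transition may be one of the
spots worth tuning.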