From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keller, Jacob E
Date: Wed, 14 Oct 2015 23:50:49 +0000
Subject: [Intel-wired-lan] [next-queue 13/17] fm10k: Update adaptive ITR algorithm
In-Reply-To: <561ED9F1.5060104@gmail.com>
References: <1444779554-20464-1-git-send-email-jacob.e.keller@intel.com>
 <1444779554-20464-13-git-send-email-jacob.e.keller@intel.com>
 <561EA08C.8090705@gmail.com>
 <1444853538.26286.42.camel@intel.com>
 <561ED9F1.5060104@gmail.com>
Message-ID: <1444866649.26286.58.camel@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: intel-wired-lan@osuosl.org
List-ID:

On Wed, 2015-10-14 at 15:40 -0700, Alexander Duyck wrote:
> On 10/14/2015 01:12 PM, Keller, Jacob E wrote:
> > On Wed, 2015-10-14 at 11:35 -0700, Alexander Duyck wrote:
> > > On 10/13/2015 04:39 PM, Jacob Keller wrote:
> > > > +    */
> > > > +#define FM10K_ITR_SCALE_SMALL 6
> > > > +#define FM10K_ITR_SCALE_MEDIUM 5
> > > > +#define FM10K_ITR_SCALE_LARGE 4
> > > > +
> > > > +   if (avg_wire_size < 300)
> > > > +           avg_wire_size *= FM10K_ITR_SCALE_SMALL;
> > > > +   else if ((avg_wire_size >= 300) && (avg_wire_size < 1200))
> > > > +           avg_wire_size *= FM10K_ITR_SCALE_MEDIUM;
> > > >     else
> > > > -           avg_wire_size /= 2;
> > > > +           avg_wire_size *= FM10K_ITR_SCALE_LARGE;
> > > > +
> > > Where is it these scaling values originated from? Just looking
> > > through the values I am not sure having this broken out like it is
> > > provides much value.
> > >
> > I am not really sure what exactly you mean here?
>
> The numbers are kind of all over the place. The result of the math
> above swings back and forth in a saw tooth between values and doesn't
> seem to do anything really consistent.
>
> For example a packet that is 275 in size will have an interrupt rate of
> 125K interrupts per second, while a packet that is 276 bytes in size
> will be closer to 166K. It is basically creating some odd peaks and
> valleys.
>

Yea ok this does do a lot of weird things. I can definitely look at
implementing what you suggest below and see how it goes.

> > > What I am getting at is that the input is a packet size, and the
> > > output is a value somewhere between 2 and 47. (I think that 47 is
> > > still a bit high by the way and probably should be something more
> > > like 25 which I believe you established as the minimum Tx interrupt
> > > rate in a later patch.)
> > >
> > The input is two fold for calculation, packet size, and number of
> > packets.
>
> Last I knew though the average packet size is the only portion we end
> up using to actually set the interrupt moderation rate.
>

You're correct, right now we calculate the average wire size and only
use that; the total packet count doesn't factor in.

> > > What you may want to do is look at pulling in the upper limit to
> > > something more reasonable like 1536 for avg_wire_size, and then
> > > simplify this logic a bit. Specifically what is it you are trying
> > > to accomplish by tweaking the scale factor like you are? I assume
> > > you are wanting to approximate a curve. If so you might want to
> > > look at possibly including an offset value so that you can smooth
> > > out the points where your intersections occur.
> > >
> > I honestly don't know. I mostly took the original work from 6-7
> > months ago, and added your suggestions from that series, but maybe
> > that isn't valid now?
>
> I suspect some of my views have changed over the last several months
> after dealing with interrupt moderation issues on ixgbe.
>

Sure. It's a complicated problem.

> The thing I started realizing is that with full sized Ethernet frames,
> 1514 bytes in size, you have a maximum rate of something like 4 million
> packets per second at 50Gbps. It seems like you should be firing an
> interrupt once every 100 packets at least don't you think?
> That is why I am now thinking at minimum 40K interrupts per second is
> necessary. However a side effect of that is that 100 packets will
> overrun a UDP buffer in most cases as there is only room for about 70
> frames based on math I saw when I submitted the patch for ixgbe
> (https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?id=8ac34f10a5ea4c7b6f57dfd52b0693a2b67d9ac4).
> So if I want to take UDP rmem_default into account we would be looking
> at something more like 60K interrupts per second which is getting
> fairly tiny at 17us
>

Ok.

> > > For example what you may want to consider doing would be to instead
> > > use a multiplication factor for small, an addition value for
> > > medium, and for large you simply cap it at a certain value.
> > >
> > So, how would this look? The capping makes sense, we should probably
> > cap it at around 30 or something? I'm not sure that 25 is the limit,
> > since for some workloads I think we did calculate that we could use
> > that few interrupts.. maybe the CPU savings aren't worth it though if
> > it messes up too many other flows?
>
> So I have done some math based on an assumption of a 212992
> wmem_default and an overhead of 640 bytes per packet (256 skb, 64
> headroom, 320 shared info). Breaking it all down goes a little
> something like this:
>
>     wmem_default / (size + overhead) = desired_pkts_per_int
>     rate / bits_per_byte / (size + ethernet_overhead) = pkt_rate
>     (desired_pkts_per_int / pkt_rate) * usec_per_sec = ITR value
>
> So then when we start plugging in the numbers:
>     212992 / (size + 640) = desired_pkts_per_int
>
> We can simplify the second expression by just doing the division now.
>     50,000,000,000 / 8 / (size + 24) = pkt_rate
>     6,250,000,000 / (size + 24) = pkt_rate
>
> Then we just need to plug in the values and reduce:
>     (212,992 / (size + 640)) / (6,250,000,000 / (size + 24))
>         * 1,000,000 = ITR value
>     (212,992 / (size + 640)) / (6,250 / (size + 24)) = ITR value
>     (34.078 / (size + 640)) / (1 / (size + 24)) = ITR value
>     (34 * (size + 24)) / (size + 640) = ITR value
>
> So now we have the expression we are looking for in order to determine
> the minimum interrupt rate, but we want to avoid the division. So in
> the end we have something that looks like this in order to generate an
> approximation of the curve without having to do any unnecessary math
> (dropping the +24, and the 3000 limit from earlier):
>
>     /* the following is a crude approximation for the following
>      * expression:
>      * (34 * (size + 24))/(size + 640) = ITR value
>      */
>     if (avg_wire_size <= 360) {
>         /* start 333K ints/sec and gradually drop to 77K ints/sec */
>         avg_wire_size *= 8;
>         avg_wire_size += 376;
>     } else if (avg_wire_size <= 1152) {
>         /* 77K ints/sec to 45K ints/sec */
>         avg_wire_size *= 3;
>         avg_wire_size += 2176;
>     } else if (avg_wire_size <= 1920) {
>         /* 45K ints/sec to 38K ints/sec */
>         avg_wire_size += 4480;
>     } else {
>         /* plateau at a limit of 38K ints/sec */
>         avg_wire_size = 6656;
>     }
>

So this is calculating the inverse of ints/sec? Where do we end up
converting this to microseconds? Hmm.

> Mind you the above is a crude approximation, but it should give decent
> performance and it only strays from the values of the original function
> by 1us or 2us and it stays under the curve. It could probably use some
> tuning and tweaking as well but you get the general idea.
>
> You may even want to tune this to be a bit more aggressive in terms of
> interrupts per second. I know some distros such as RHEL are still
> running around with an untuned skb_shared_info and such and as a result
> they take up more space resulting in a larger memory footprint.
> What this would represent is modifying the 640 value in the original
> function to increase based on the extra overhead. Then it would be
> necessary to modify the slopes, offsets, and transition points to get
> the right approximation for the new curve.
>
> > I could also try to take a completely different algorithm say from
> > i40e instead? This one really has limited testing.
>
> Yeah, this one was a "good enough" solution at the time and as I recall
> it was a clone of the igb interrupt moderation. Now that you have real
> ports you probably need something better than 1Gbps.
>
> The i40e one is from ixgbe, which was inherited from e1000. The problem
> with the interrupt moderation in that design is that for any high
> throughput usage everything becomes bulk (8K ints per second). For 1G
> that works fine. But I can tell you from what I have seen on 10Gbps
> NICs it doesn't do well under any kind of small packet, or single
> threaded throughput test.
>

Agreed ok. I will look into this more.

Regards,
Jake
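
For reference, the exact expression and the piecewise approximation
quoted in this thread can be compared side by side with a small
standalone C sketch. This is not driver code. The function names are
made up for illustration, and the division by 256 used to turn the
scaled value into microseconds is an assumption inferred from the rates
in the quoted comments (5632 / 256 = 22us, or about 45K ints/sec); the
thread itself does not say where or how that conversion happens.

```c
#include <assert.h>

/* Hypothetical helper: the exact expression derived in the thread,
 *   (34 * (size + 24)) / (size + 640) = ITR value
 * returned in microseconds as a double.
 */
static double itr_exact_usecs(unsigned int size)
{
	return 34.0 * (size + 24) / (size + 640);
}

/* Hypothetical helper: the piecewise-linear approximation quoted in the
 * thread. Returns the scaled value, before any conversion to usecs.
 */
static unsigned int itr_approx_scaled(unsigned int avg_wire_size)
{
	if (avg_wire_size <= 360) {
		/* steep segment: multiply and add an offset */
		avg_wire_size *= 8;
		avg_wire_size += 376;
	} else if (avg_wire_size <= 1152) {
		/* shallower segment */
		avg_wire_size *= 3;
		avg_wire_size += 2176;
	} else if (avg_wire_size <= 1920) {
		/* slope of one, offset only */
		avg_wire_size += 4480;
	} else {
		/* plateau */
		avg_wire_size = 6656;
	}
	return avg_wire_size;
}

/* ASSUMPTION: the scaled value is divided by 256 to obtain usecs.
 * This divisor is inferred from the ints/sec figures in the quoted
 * comments; it is not stated anywhere in the thread.
 */
static double itr_approx_usecs(unsigned int avg_wire_size)
{
	return itr_approx_scaled(avg_wire_size) / 256.0;
}

/* Checks the segment boundaries of the approximation. */
static void itr_sketch_check(void)
{
	assert(itr_approx_scaled(360) == 3256);
	assert(itr_approx_scaled(1152) == 5632);
	assert(itr_approx_scaled(1920) == 6400);
	assert(itr_approx_scaled(3000) == 6656);
	/* under the assumed divisor, 1152 bytes maps to 22us */
	assert(itr_approx_usecs(1152) == 22.0);
	/* the exact curve gives roughly 22.3us at the same size */
	assert(itr_exact_usecs(1152) > 22.0 && itr_exact_usecs(1152) < 23.0);
}
```

Calling itr_sketch_check() from a test main() exercises the segment
boundaries. Printing itr_exact_usecs() next to itr_approx_usecs() for
sizes from 64 to 2048 is one way to see how far the approximation strays
from the curve if the 640 overhead value or the segment slopes are
retuned.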