From: Alexander Duyck <alexander.duyck@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [next-queue 13/17] fm10k: Update adaptive ITR algorithm
Date: Wed, 14 Oct 2015 15:40:49 -0700
Message-ID: <561ED9F1.5060104@gmail.com>
In-Reply-To: <1444853538.26286.42.camel@intel.com>
On 10/14/2015 01:12 PM, Keller, Jacob E wrote:
> On Wed, 2015-10-14 at 11:35 -0700, Alexander Duyck wrote:
>> On 10/13/2015 04:39 PM, Jacob Keller wrote:
>>> + */
>>> +#define FM10K_ITR_SCALE_SMALL 6
>>> +#define FM10K_ITR_SCALE_MEDIUM 5
>>> +#define FM10K_ITR_SCALE_LARGE 4
>>> +
>>> + if (avg_wire_size < 300)
>>> + avg_wire_size *= FM10K_ITR_SCALE_SMALL;
>>> + else if ((avg_wire_size >= 300) && (avg_wire_size < 1200))
>>> + avg_wire_size *= FM10K_ITR_SCALE_MEDIUM;
>>> else
>>> - avg_wire_size /= 2;
>>> + avg_wire_size *= FM10K_ITR_SCALE_LARGE;
>>> +
>> Where is it these scaling values originated from? Just looking through
>> the values I am not sure having this broken out like it is provides much
>> value.
>>
> I am not really sure what exactly you mean here?
The numbers are kind of all over the place. The result of the math
above swings back and forth in a sawtooth between values and doesn't
seem to do anything really consistent.
For example a packet that is 275 bytes in size will have an interrupt
rate of 125K interrupts per second, while a packet that is 276 bytes in
size will be closer to 166K. It is basically creating some odd peaks and
valleys.
>
>> What I am getting at is that the input is a packet size, and the output
>> is a value somewhere between 2 and 47. (I think that 47 is still a bit
>> high by the way and probably should be something more like 25 which I
>> believe you established as the minimum Tx interrupt rate in a later
>> patch.)
>>
> The input is two fold for calculation, packet size, and number of
> packets.
Last I knew though the average packet size is the only portion we end up
using to actually set the interrupt moderation rate.
>> What you may want to do is look at pulling in the upper limit to
>> something more reasonable like 1536 for avg_wire_size, and then simplify
>> this logic a bit. Specifically what is it you are trying to accomplish
>> by tweaking the scale factor like you are? I assume you are wanting to
>> approximate a curve. If so you might want to look at possibly including
>> an offset value so that you can smooth out the points where your
>> intersections occur.
>>
> I honestly don't know. I mostly took the original work from 6-7 months
> ago, and added your suggestions from that series, but maybe that isn't
> valid now?
I suspect some of my views have changed over the last several months
after dealing with interrupt moderation issues on ixgbe.
The thing I started realizing is that with full sized Ethernet frames,
1514 bytes in size, you have a maximum rate of something like 4 million
packets per second at 50Gbps. It seems like you should be firing an
interrupt at least once every 100 packets, don't you think? That is why
I am now thinking at minimum 40K interrupts per second is necessary.
However a side effect of that is that 100 packets will overrun a UDP
buffer in most cases as there is only room for about 70 frames based on
math I saw when I submitted the patch for ixgbe
(https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?id=8ac34f10a5ea4c7b6f57dfd52b0693a2b67d9ac4).
So if I want to take UDP rmem_default into account we would be looking
at something more like 60K interrupts per second, which is getting fairly
tiny at 17us.
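The arithmetic above can be sketched roughly as follows. This is only an illustration: the helper names are made up for this example, and the 24 bytes of per-frame wire overhead is an assumption borrowed from the "(size + 24)" term used later in this message.

```c
#include <stdint.h>

/* Hypothetical constants for this sketch, based on figures in this thread. */
#define SKETCH_LINK_BITS_PER_SEC 50000000000ULL	/* 50Gbps */
#define SKETCH_WIRE_OVERHEAD 24			/* ASSUMPTION: extra bytes per frame on the wire */

/* Packets per second at line rate for a given frame size in bytes. */
static uint64_t sketch_pkts_per_sec(uint32_t frame_size)
{
	return SKETCH_LINK_BITS_PER_SEC / 8 /
	       (frame_size + SKETCH_WIRE_OVERHEAD);
}

/* Interrupts per second needed to fire once every pkts_per_int packets.
 * pkts_per_int must be non-zero.
 */
static uint64_t sketch_ints_per_sec(uint32_t frame_size, uint32_t pkts_per_int)
{
	return sketch_pkts_per_sec(frame_size) / pkts_per_int;
}

/* Interval between interrupts in microseconds, rounded down.
 * ints_per_sec must be non-zero.
 */
static uint64_t sketch_usec_per_int(uint64_t ints_per_sec)
{
	return 1000000ULL / ints_per_sec;
}
```

With 1514 byte frames, sketch_ints_per_sec(1514, 100) lands near the 40K figure and sketch_ints_per_sec(1514, 70) near the 60K figure mentioned above.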
>> For example what you may want to consider doing would be to instead
>> use
>> a multiplication factor for small, an addition value for medium, and
>> for
>> large you simply cap it at a certain value.
>>
> So, how would this look? The capping makes sense, we should probably
> cap it at around 30 or something? I'm not sure that 25 is the limit,
> since for some workloads I think we did calculate that we could use
> that few interrupts.. maybe the CPU savings aren't worth it though if
> it messes up too many other flows?
So I have done some math based on an assumption of a 212992 wmem_default
and an overhead of 640 bytes per packet (256 skb, 64 headroom, 320 shared
info). Breaking it all down goes a little something like this:
wmem_default / (size + overhead) = desired_pkts_per_int
rate / bits_per_byte / (size + ethernet_overhead) = pkt_rate
(desired_pkts_per_int / pkt_rate) * usec_per_sec = ITR value
So then when we start plugging in the numbers:
212992 / (size + 640) = desired_pkts_per_int
we can simplify the second expression by just doing the division now.
50,000,000,000 / 8 / (size + 24) = pkt_rate
6,250,000,000/(size + 24) = pkt_rate
Then we just need to plug in the values and reduce:
(212,992 / (size + 640)) / (6,250,000,000/(size + 24)) * 1,000,000
= ITR value
(212,992/(size + 640)) / (6,250/(size + 24)) = ITR value
(34.078/(size + 640)) / (1/(size + 24)) = ITR value
(34 * (size + 24))/(size + 640) = ITR value
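As a sanity check on the reduction above, here is a small sketch that evaluates the three original expressions directly and compares them with the reduced form. The function and macro names are invented for this example; the constants are the ones assumed in this message (212992 wmem_default, 640 bytes of overhead, 50Gbps, 24 bytes of Ethernet overhead).

```c
/* Hypothetical names; values are the assumptions stated in this message. */
#define DERIV_WMEM_DEFAULT	212992.0
#define DERIV_PKT_OVERHEAD	640.0
#define DERIV_ETH_OVERHEAD	24.0
#define DERIV_BYTES_PER_SEC	6250000000.0	/* 50Gbps / 8 */
#define DERIV_USEC_PER_SEC	1000000.0

/* ITR in microseconds computed step by step from the three expressions. */
static double deriv_itr_from_steps(double size)
{
	double desired_pkts_per_int =
		DERIV_WMEM_DEFAULT / (size + DERIV_PKT_OVERHEAD);
	double pkt_rate = DERIV_BYTES_PER_SEC / (size + DERIV_ETH_OVERHEAD);

	return desired_pkts_per_int / pkt_rate * DERIV_USEC_PER_SEC;
}

/* ITR in microseconds from the reduced form, with 34.078 rounded to 34. */
static double deriv_itr_reduced(double size)
{
	return 34.0 * (size + 24.0) / (size + 640.0);
}
```

The two functions differ only by the rounding of 34.078 down to 34, so they should agree to within a small fraction of a microsecond for any packet size.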
So now we have the expression we are looking for in order to determine
the minimum interrupt rate, but we want to avoid the division. So in
the end we have something that looks like this in order to generate an
approximation of the curve without having to do any unnecessary math
(dropping the +24, and the 3000 limit from earlier):
/* the following is a crude approximation for the following expression:
 * (34 * (size + 24))/(size + 640) = ITR value
 */
if (avg_wire_size <= 360) {
	/* start 333K ints/sec and gradually drop to 77K ints/sec */
	avg_wire_size *= 8;
	avg_wire_size += 376;
} else if (avg_wire_size <= 1152) {
	/* 77K ints/sec to 45K ints/sec */
	avg_wire_size *= 3;
	avg_wire_size += 2176;
} else if (avg_wire_size <= 1920) {
	/* 45K ints/sec to 38K ints/sec */
	avg_wire_size += 4480;
} else {
	/* plateau at a limit of 38K ints/sec */
	avg_wire_size = 6656;
}
Mind you the above is a crude approximation, but it should give decent
performance and it only strays from the values of the original function
by 1us or 2us and it stays under the curve. It could probably use some
tuning and tweaking as well but you get the general idea.
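For anyone who wants to compare the approximation against the curve, a rough sketch follows. All names are invented for this example. Note one loud ASSUMPTION: the message does not say how the resulting avg_wire_size is scaled into microseconds. Dividing by 256 is used here only because it reproduces the rates quoted in the comments (for example 6656 / 256 = 26us, or about 38K ints/sec).

```c
#include <stdint.h>

/* Piecewise-linear approximation from this message, returned unscaled. */
static uint32_t cmp_itr_approx_raw(uint32_t avg_wire_size)
{
	if (avg_wire_size <= 360)
		return avg_wire_size * 8 + 376;
	if (avg_wire_size <= 1152)
		return avg_wire_size * 3 + 2176;
	if (avg_wire_size <= 1920)
		return avg_wire_size + 4480;
	return 6656;
}

/* ASSUMPTION: the raw value is in units of 1/256 us (not stated above). */
static double cmp_itr_approx_usec(uint32_t avg_wire_size)
{
	return cmp_itr_approx_raw(avg_wire_size) / 256.0;
}

/* The curve being approximated: (34 * (size + 24)) / (size + 640). */
static double cmp_itr_curve_usec(uint32_t size)
{
	return 34.0 * (size + 24.0) / (size + 640.0);
}

/* Largest amount, in us, by which the approximation sits below the curve
 * for packet sizes in [lo, hi]. Returns 0 if it never sits below.
 */
static double cmp_max_gap_usec(uint32_t lo, uint32_t hi)
{
	double worst = 0.0;
	uint32_t s;

	for (s = lo; s <= hi; s++) {
		double gap = cmp_itr_curve_usec(s) - cmp_itr_approx_usec(s);

		if (gap > worst)
			worst = gap;
	}
	return worst;
}
```

Calling cmp_max_gap_usec() over a range of packet sizes gives a quick way to see how far a retuned set of slopes and offsets drifts from the curve.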
You may even want to tune this to be a bit more aggressive in terms of
interrupts per second. I know some distros such as RHEL are still
running around with an untuned skb_shared_info and such and as a result
they take up more space resulting in a larger memory footprint. What
this would represent is modifying the 640 value in the original function
to increase based on the extra overhead. Then it would be necessary to
modify the slopes, offsets, and transition points to get the right
approximation for the new curve.
> I could also try to take a completely different algorithm say from i40e
> instead? This one really has limited testing.
Yeah, this one was a "good enough" solution at the time and as I recall
it was a clone of the igb interrupt moderation. Now that you have real
ports you probably need something better than 1Gbps.
The i40e one is from ixgbe, which was inherited from e1000. The problem
with the interrupt moderation in that design is that for any high
throughput usage everything becomes bulk (8K ints per second). For 1G
that works fine. But I can tell you from what I have seen on 10Gbps
NICs it doesn't do well under any kind of small packet, or single
threaded throughput test.