From: Vick, Matthew <matthew.vick@intel.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [net-next 20/25] fm10k: Add support for ITR scaling based on PCIe link speed
Date: Mon, 6 Apr 2015 18:33:40 +0000
Message-ID: <D1481B56.7236D%matthew.vick@intel.com>
In-Reply-To: <5522BCC1.6040907@gmail.com>
On 4/6/15, 10:05 AM, "Alexander Duyck" <alexander.duyck@gmail.com> wrote:
>On 04/06/2015 09:39 AM, Vick, Matthew wrote:
>> On 4/4/15, 11:16 AM, "Alexander Duyck" <alexander.duyck@gmail.com> wrote:
>>
>>> On 04/03/2015 01:27 PM, Jeff Kirsher wrote:
[...]
>>>> @@ -164,6 +165,9 @@ struct fm10k_ring_container {
>>>> #define FM10K_ITR_10K 100 /* 100us */
>>>> #define FM10K_ITR_20K 50 /* 50us */
>>>> #define FM10K_ITR_ADAPTIVE 0x8000 /* adaptive interrupt moderation flag */
>>>> +#define FM10K_ITR_SCALE_SMALL 60 /* Constant factor for small frames */
>>>> +#define FM10K_ITR_SCALE_MEDIUM 50 /* Constant factor for medium frames */
>>>> +#define FM10K_ITR_SCALE_LARGE 40 /* Constant factor for large frames */
>>>>
>>>> #define FM10K_ITR_ENABLE (FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR)
>>>>
>>> These values don't make any sense to me, where did they come from?
>> They were experimental. I'll be the first to admit they probably aren't
>> perfect, but they're at least a significant improvement.
>>
>> Do you have some recommendations for interrupt rates at 25G/50G? Depending
>> on what the targets are, it'd be nice to get these scalars to something we
>> can shift by instead of divide by.
>
>If nothing else these need to be documented along with the expected
>range of interrupts per second.
>
>So for example if you expect the FM10K_ITR_SCALE_SMALL of 60 to be used
>with packets 300 bytes and smaller you should document somewhere that
>this is going to drive rates of 1 million to 200 thousand interrupts per
>second if that is what you actually want. I was assuming that was a bug
>since that value seems obscenely high, however at 50Gb/s I suppose it
>is possible that you could theoretically push 70Mpps so in theory that
>might be appropriate if we were actually processing small packets at
>line rate. :-)

That, um, was totally intentional. Yeah, 1 million interrupts or bust--I
totally didn't mess up the math when I had intended it to go up to 100k
int/s. :) Will definitely fix (and document) in v2.

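
For reference, the arithmetic being discussed can be sketched as below. This is only an illustration, not driver code: the three FM10K_ITR_SCALE_* values are taken from the quoted patch, while itr_usecs_v1() and itr_rate_per_sec() are hypothetical helper names, and the PCIe itr_scale step is left out.

```c
#include <stdint.h>

/* Divisors as quoted from the patch under review. */
#define FM10K_ITR_SCALE_SMALL	60
#define FM10K_ITR_SCALE_MEDIUM	50
#define FM10K_ITR_SCALE_LARGE	40

/* Hypothetical helper: the interval in usecs that the quoted hunk would
 * compute from the average wire size, before any PCIe link scaling.
 */
static uint32_t itr_usecs_v1(uint32_t avg_wire_size)
{
	if (avg_wire_size > 3000)
		avg_wire_size = 3000;

	if (avg_wire_size < 300)
		return avg_wire_size / FM10K_ITR_SCALE_SMALL;
	if (avg_wire_size < 1200)
		return avg_wire_size / FM10K_ITR_SCALE_MEDIUM;
	return avg_wire_size / FM10K_ITR_SCALE_LARGE;
}

/* Hypothetical helper: upper bound on interrupts per second for a given
 * interval. Returns 0 when the interval is 0, i.e. no moderation at all.
 */
static uint32_t itr_rate_per_sec(uint32_t usecs)
{
	return usecs ? 1000000u / usecs : 0;
}
```

Under these assumptions a 60-byte average gives 1 us (up to 1,000,000 interrupts/s), a 299-byte average gives 4 us (250,000 interrupts/s), and a 1514-byte average gives 37 us (about 27,000 interrupts/s), which lines up with the figures raised in the review.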
>>>> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>>>> index 1e08832..cd2e86a 100644
>>>> --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>>>> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>>>> @@ -1395,11 +1395,20 @@ static void fm10k_update_itr(struct fm10k_ring_container *ring_container)
>>>> if (avg_wire_size > 3000)
>>>> avg_wire_size = 3000;
>>>>
>>>> - /* Give a little boost to mid-size frames */
>>>> - if ((avg_wire_size > 300) && (avg_wire_size < 1200))
>>>> - avg_wire_size /= 3;
>>>> + /* Simple throttle rate management based on average wire size,
>>>> + * providing boosts to small and medium packet loads. Divide the
>>>> + * average wire size by a constant factor to calculate the minimum time
>>>> + * until the next interrupt in microseconds.
>>>> + */
>>>> + if (avg_wire_size < 300)
>>>> + avg_wire_size /= FM10K_ITR_SCALE_SMALL;
>>>> + else if ((avg_wire_size >= 300) && (avg_wire_size < 1200))
>>>> + avg_wire_size /= FM10K_ITR_SCALE_MEDIUM;
>>>> else
>>>> - avg_wire_size /= 2;
>>>> + avg_wire_size /= FM10K_ITR_SCALE_LARGE;
>>>> +
>>>> + /* Scale for various PCIe link speeds */
>>>> + avg_wire_size /= ring_container->ring->itr_scale;
>>>>
>>>> /* write back value and retain adaptive flag */
>>>> ring_container->itr = avg_wire_size | FM10K_ITR_ADAPTIVE;
>>> This seems like all it is doing is maxing out the interrupt rate for all
>>> cases. For example, a value of 1514 is reduced to 37.8. When you use
>>> that as a usecs value that means the queue is capable of over 26K
>>> interrupts per second.
>>>
>>> The division by itr_scale is a really bad idea. I would recommend
>>> replacing it with a shift and you should probably check for the value
>>> hitting 0.
>>>
>>> I suspect you probably aren't seeing much of a penalty because the MSI-X
>>> interrupt call is pretty cheap, however that is still a pretty high
>>> interrupt rate compared to past parts.
>> The interrupt rates are definitely open to suggestion. I tried to maintain
>> the idea of the original algorithm, which did basically the same thing
>> (avg_wire_size divided by some scalar), so I'm not seeing what's
>> significantly different here from that angle.
>>
>> A shift for the itr_scale is a fair suggestion. I'll incorporate that into
>> a v2.
>>
>> I definitely understand that I went with some high interrupt values, but
>> given the architecture of the device I'd be skeptical about dropping all
>> the way to 8000int/s.
>
>I think the big issue with this is that there is no bottom end. So if
>you have a setup where you are running small packets your adjustment
>brings it all the way down to 1us since 60/60 is 1. Then you also have
>to do a shift by 2 or divide by 4 for the PCIe gen1 which will bring the
>value down to 0 which isn't correct.
>
>What you probably need to do is look at creating a limit on just how low
>a value you can have. Since you are having to do a divide by 4 after
>the computation you might make 4us the lowest value you can generate for
>this algorithm, and probably make that the limit, excluding ITR
>disabled, for user controllable interrupt moderation as well. Then that
>way things are always at least throttled to no more than 250K
>interrupts/second per queue. Otherwise there is a good chance that
>everything with packet size smaller than 240B will just end up with no
>interrupt moderation since that is currently being shifted off into
>oblivion.
And that's very much a bug I'll make sure to get fixed.
The accidentally unmoderated user input case is a good catch as well! I'll
add some lower bounds based on the PCIe link speed to the set_coalesce
call.
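
One possible shape for the lower bound is sketched here, purely as an illustration of the review suggestion (shift instead of divide, then clamp). The helper name itr_scale_clamped() and its itr_shift and min_usecs parameters are hypothetical; only FM10K_ITR_ADAPTIVE comes from the quoted patch, and how a v2 would actually derive the floor from the PCIe link speed is not settled in this thread.

```c
#include <stdint.h>

#define FM10K_ITR_ADAPTIVE	0x8000	/* flag value from the quoted patch */

/* Hypothetical helper: apply the PCIe link scaling as a right shift
 * (standing in for the division by itr_scale) and never let the result
 * drop below min_usecs, so the interval cannot be shifted down to 0.
 */
static uint32_t itr_scale_clamped(uint32_t usecs, unsigned int itr_shift,
				  uint32_t min_usecs)
{
	usecs >>= itr_shift;
	if (usecs < min_usecs)
		usecs = min_usecs;

	return usecs;
}
```

With a floor of 4 us, as suggested in the review, a 1 us input shifted by 2 still comes out as 4 us (at most 250K interrupts/s per queue) rather than 0, and the caller can OR in FM10K_ITR_ADAPTIVE afterwards as the existing code does. The same clamp could in principle be applied to user-supplied values in the set_coalesce path.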