From: Vick, Matthew <matthew.vick@intel.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [net-next 20/25] fm10k: Add support for ITR scaling based on PCIe link speed
Date: Mon, 6 Apr 2015 16:39:38 +0000
Message-ID: <D147FAF6.72258%matthew.vick@intel.com>
In-Reply-To: <55202A91.4080403@gmail.com>

On 4/4/15, 11:16 AM, "Alexander Duyck" <alexander.duyck@gmail.com> wrote:
>On 04/03/2015 01:27 PM, Jeff Kirsher wrote:
[...]
>>
>> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h
>>b/drivers/net/ethernet/intel/fm10k/fm10k.h
>> index cc7f442e..feb53c0 100644
>> --- a/drivers/net/ethernet/intel/fm10k/fm10k.h
>> +++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
>> @@ -131,6 +131,7 @@ struct fm10k_ring {
>> * different for DCB and RSS modes
>> */
>> u8 qos_pc; /* priority class of queue */
>> + u8 itr_scale; /* throttle scaler based on PCI speed */
>> u16 vid; /* default vlan ID of queue */
>> u16 count; /* amount of descriptors */
>>
>
>This shouldn't be a part of the ring. If it is related to the ITR it
>really belongs in the ring container or the q_vector.
Fair enough--I definitely went back and forth on where to put it. I think
I'll put it in the ring container, since the ITR already gets updated
there. It's just mildly annoying during the initialization routines is all.
>> @@ -164,6 +165,9 @@ struct fm10k_ring_container {
>> #define FM10K_ITR_10K 100 /* 100us */
>> #define FM10K_ITR_20K 50 /* 50us */
>> #define FM10K_ITR_ADAPTIVE 0x8000 /* adaptive interrupt moderation
>>flag */
>> +#define FM10K_ITR_SCALE_SMALL 60 /* Constant factor for small frames */
>> +#define FM10K_ITR_SCALE_MEDIUM 50 /* Constant factor for medium frames
>>*/
>> +#define FM10K_ITR_SCALE_LARGE 40 /* Constant factor for large frames */
>>
>> #define FM10K_ITR_ENABLE (FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR)
>>
>
>These values don't make any sense to me, where did they come from?
They were experimental. I'll be the first to admit they probably aren't
perfect, but they're at least a significant improvement.

Do you have some recommendations for interrupt rates at 25G/50G? Depending
on what the targets are, it'd be nice to get these scalars to something we
can shift by instead of divide by.
>> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>>b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>> index 1e08832..cd2e86a 100644
>> --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>> @@ -1395,11 +1395,20 @@ static void fm10k_update_itr(struct
>>fm10k_ring_container *ring_container)
>> if (avg_wire_size > 3000)
>> avg_wire_size = 3000;
>>
>> - /* Give a little boost to mid-size frames */
>> - if ((avg_wire_size > 300) && (avg_wire_size < 1200))
>> - avg_wire_size /= 3;
>> + /* Simple throttle rate management based on average wire size,
>> + * providing boosts to small and medium packet loads. Divide the
>> + * average wire size by a constant factor to calculate the minimum
>>time
>> + * until the next interrupt in microseconds.
>> + */
>> + if (avg_wire_size < 300)
>> + avg_wire_size /= FM10K_ITR_SCALE_SMALL;
>> + else if ((avg_wire_size >= 300) && (avg_wire_size < 1200))
>> + avg_wire_size /= FM10K_ITR_SCALE_MEDIUM;
>> else
>> - avg_wire_size /= 2;
>> + avg_wire_size /= FM10K_ITR_SCALE_LARGE;
>> +
>> + /* Scale for various PCIe link speeds */
>> + avg_wire_size /= ring_container->ring->itr_scale;
>>
>> /* write back value and retain adaptive flag */
>> ring_container->itr = avg_wire_size | FM10K_ITR_ADAPTIVE;
>
>This seems like all it is doing is maxing out the interrupt rate for all
>cases. For example, a value of 1514 is reduced to 37.8. When you use
>that as a usecs value that means the queue is capable of over 26K
>interrupts per second.
>
>The division by itr_scale is a really bad idea. I would recommend
>replacing it with a shift and you should probably check for the value
>hitting 0.
>
>I suspect you probably aren't seeing much of a penalty because the MSI-X
>interrupt call is pretty cheap, however that is still a pretty high
>interrupt rate compared to past parts.
The interrupt rates are definitely open to suggestion. I tried to maintain
the idea of the original algorithm, which did basically the same thing
(avg_wire_size divided by some scalar), so I'm not seeing what's
significantly different here from that angle.

A shift for the itr_scale is a fair suggestion. I'll incorporate that into
a v2.

I definitely understand that I went with some high interrupt rates, but
given the architecture of the device I'd be skeptical about dropping all
the way to 8000 int/s.
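
To put numbers on the rates under discussion, here is a minimal sketch of
the arithmetic in the quoted fm10k_update_itr() hunk, written as standalone
C. The helper names itr_usecs() and max_ints_per_sec() are made up for
illustration, the wire-size averaging that precedes the hunk is left out,
and itr_scale is assumed to be non-zero.

```c
#include <stdint.h>

#define FM10K_ITR_SCALE_SMALL	60	/* Constant factor for small frames */
#define FM10K_ITR_SCALE_MEDIUM	50	/* Constant factor for medium frames */
#define FM10K_ITR_SCALE_LARGE	40	/* Constant factor for large frames */

/* Hypothetical helper mirroring the quoted hunk: average wire size in,
 * minimum time until the next interrupt (usecs) out.
 * Assumption: itr_scale is non-zero.
 */
static uint32_t itr_usecs(uint32_t avg_wire_size, uint32_t itr_scale)
{
	if (avg_wire_size > 3000)
		avg_wire_size = 3000;

	if (avg_wire_size < 300)
		avg_wire_size /= FM10K_ITR_SCALE_SMALL;
	else if (avg_wire_size < 1200)
		avg_wire_size /= FM10K_ITR_SCALE_MEDIUM;
	else
		avg_wire_size /= FM10K_ITR_SCALE_LARGE;

	/* Scale for various PCIe link speeds */
	return avg_wire_size / itr_scale;
}

/* Hypothetical helper: upper bound on interrupts per second for a given
 * delay. A delay of 0 is reported as 0 here to avoid dividing by zero.
 */
static uint32_t max_ints_per_sec(uint32_t usecs)
{
	return usecs ? 1000000u / usecs : 0;
}
```

With integer division, a 1514-byte average gives 1514 / 40 = 37 usecs
(37.8 before truncation), which caps the vector at roughly 27,000
interrupts per second when itr_scale is 1.
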
>> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
>>b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
>> index cc527dd..c7c9832 100644
>> --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
>> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
>> @@ -532,6 +532,9 @@ static void fm10k_configure_tx_ring(struct
>>fm10k_intfc *interface,
>> /* store tail pointer */
>> ring->tail = &interface->uc_addr[FM10K_TDT(reg_idx)];
>>
>> + /* store ITR scale */
>> + ring->itr_scale = hw->mac.itr_scale;
>> +
>> /* reset ntu and ntc to place SW in sync with hardwdare */
>> ring->next_to_clean = 0;
>> ring->next_to_use = 0;
>> @@ -638,6 +641,9 @@ static void fm10k_configure_rx_ring(struct
>>fm10k_intfc *interface,
>> /* store tail pointer */
>> ring->tail = &interface->uc_addr[FM10K_RDT(reg_idx)];
>>
>> + /* store ITR scale */
>> + ring->itr_scale = hw->mac.itr_scale;
>> +
>> /* reset ntu and ntc to place SW in sync with hardwdare */
>> ring->next_to_clean = 0;
>> ring->next_to_use = 0;
>> @@ -824,7 +830,8 @@ static irqreturn_t fm10k_msix_mbx_vf(int
>>__always_unused irq, void *data)
>>
>> /* re-enable mailbox interrupt and indicate 20us delay */
>> fm10k_write_reg(hw, FM10K_VFITR(FM10K_MBX_VECTOR),
>> - FM10K_ITR_ENABLE | FM10K_MBX_INT_DELAY);
>> + FM10K_ITR_ENABLE | (FM10K_MBX_INT_DELAY /
>> + hw->mac.itr_scale));
>>
>> /* service upstream mailbox */
>> if (fm10k_mbx_trylock(interface)) {
>> @@ -1007,7 +1014,8 @@ static irqreturn_t fm10k_msix_mbx_pf(int
>>__always_unused irq, void *data)
>>
>> /* re-enable mailbox interrupt and indicate 20us delay */
>> fm10k_write_reg(hw, FM10K_ITR(FM10K_MBX_VECTOR),
>> - FM10K_ITR_ENABLE | FM10K_MBX_INT_DELAY);
>> + FM10K_ITR_ENABLE | (FM10K_MBX_INT_DELAY /
>> + hw->mac.itr_scale));
>>
>> return IRQ_HANDLED;
>> }
>
>You would be much better off here using the itr_scale as a multiple or a
>shift value rather than doing a divide in an interrupt handler as a
>divide can be quite expensive.
Will fix for v2.
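
For reference, a rough sketch of what the shift form could look like for
the mailbox delay. It assumes itr_scale is re-encoded as log2 of the old
divisor, and it uses a stand-in constant for the 20us delay mentioned in
the quoted comment. The names MBX_INT_DELAY_US and mbx_delay_scaled() are
hypothetical and not part of the driver.

```c
#include <stdint.h>

#define MBX_INT_DELAY_US	20u	/* 20us, per the comment in the quoted hunk */

/* Hypothetical helper: scale the mailbox delay with a shift instead of a
 * divide. Assumption: itr_shift holds log2 of the old divisor and is
 * smaller than 32, so 0 means "no scaling".
 */
static uint32_t mbx_delay_scaled(uint32_t itr_shift)
{
	return MBX_INT_DELAY_US >> itr_shift;
}
```

A shift of 1 halves the delay to 10us and a shift of 2 gives 5us, so the
handler never executes a divide instruction.
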
>> @@ -56,6 +59,7 @@ static s32 fm10k_stop_hw_vf(struct fm10k_hw *hw)
>> fm10k_write_reg(hw, FM10K_TDBAH(i), bah);
>> fm10k_write_reg(hw, FM10K_RDBAL(i), bal);
>> fm10k_write_reg(hw, FM10K_RDBAH(i), bah);
>> + fm10k_write_reg(hw, FM10K_TDLEN(i), tdlen);
>> }
>>
>> return 0;
>> @@ -124,9 +128,12 @@ static s32 fm10k_init_hw_vf(struct fm10k_hw *hw)
>> /* record maximum queue count */
>> hw->mac.max_queues = i;
>>
>> - /* fetch default VLAN */
>> + /* fetch default VLAN and ITR scale */
>> hw->mac.default_vid = (fm10k_read_reg(hw, FM10K_TXQCTL(0)) &
>> FM10K_TXQCTL_VID_MASK) >> FM10K_TXQCTL_VID_SHIFT;
>> + hw->mac.itr_scale = (fm10k_read_reg(hw, FM10K_TDLEN(0)) &
>> + FM10K_TDLEN_ITR_SCALE_MASK) >>
>> + FM10K_TDLEN_ITR_SCALE_SHIFT;
>>
>> return 0;
>> }
>>
>
>This is a good reason to get rid of the divide in favor of a shift. If
>a VF didn't restore the tdlen value using the stop_hw_vf function then
>you could potentially have one driver load trip up the next and cause a
>divide by 0.
Agreed, and sure enough we have a commit in the works that assumes a
default value if we read zero. I originally went with the divide for
readability's sake, but the shift is probably the better choice, so I'll
send a v2 with that.
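
As an illustration of the "assume a default if we read zero" idea while
the divide is still in place, something along these lines would work. The
mask, shift, and default values below are placeholders chosen for the
example, not the definitions from the patch, and itr_scale_from_tdlen() is
a hypothetical name.

```c
#include <stdint.h>

/* Placeholder field layout and default; NOT taken from the patch. */
#define ITR_SCALE_MASK		0x00000E00u
#define ITR_SCALE_SHIFT		9
#define ITR_SCALE_DEFAULT	1u

/* Hypothetical helper: extract the ITR scale from a TDLEN register value
 * and fall back to a default when the field reads back as zero, so that a
 * later divide by the result cannot fault.
 */
static uint32_t itr_scale_from_tdlen(uint32_t tdlen)
{
	uint32_t scale = (tdlen & ITR_SCALE_MASK) >> ITR_SCALE_SHIFT;

	return scale ? scale : ITR_SCALE_DEFAULT;
}
```

Once the scale is applied as a shift, a zero field simply means no
scaling, and the fallback is only needed if zero should map to some other
default.
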
If you have some suggestions for an interrupt range, I can look into
incorporating those as well.

Thank you for the review, Alex!

Cheers,
Matthew