Subject: Re: [PATCH net-next V2] Documentation/networking: Add net DIM documentation
To: Randy Dunlap, Tal Gilboa, "David S. Miller"
Cc: "netdev@vger.kernel.org", Tariq Toukan
References: <1521657225-65392-1-git-send-email-talgi@mellanox.com>
From: Florian Fainelli
Date: Wed, 21 Mar 2018 12:44:29 -0700

On 03/21/2018 12:37 PM, Randy Dunlap wrote:
> On 03/21/2018 11:33 AM, Tal Gilboa wrote:
>> Net DIM is a generic algorithm, purposed for dynamically
>> optimizing network devices interrupt moderation. This
>> document describes how it works and how to use it.
>>
>> Signed-off-by: Tal Gilboa
>> ---
>>  Documentation/networking/net_dim.txt | 174 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 174 insertions(+)
>>  create mode 100644 Documentation/networking/net_dim.txt
>>
>> diff --git a/Documentation/networking/net_dim.txt b/Documentation/networking/net_dim.txt
>> new file mode 100644
>> index 0000000..9cb31c5
>> --- /dev/null
>> +++ b/Documentation/networking/net_dim.txt
>> @@ -0,0 +1,174 @@
>> +Net DIM - Generic Network Dynamic Interrupt Moderation
>> +======================================================
>> +
>> +Author:
>> +	Tal Gilboa
>> +
>> +
>> +Contents
>> +=========
>> +
>> +- Assumptions
>> +- Introduction
>> +- The Net DIM Algorithm
>> +- Registering a Network Device to DIM
>> +- Example
>> +
>> +Part 0: Assumptions
>> +======================
>> +
>> +This document assumes the reader has basic knowledge in network drivers
>> +and in general interrupt moderation.
>> +
>> +
>> +Part I: Introduction
>> +======================
>> +
>> +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the
>> +interrupt moderation configuration of a channel in order to optimize packet
>> +processing. The mechanism includes an algorithm which decides if and how to
>> +change moderation parameters for a channel, usually by performing an analysis on
>> +runtime data sampled from the system. Net DIM is such a mechanism. In each
>> +iteration of the algorithm, it analyses a given sample of the data, compares it
>> +to the previous sample and if required, it can decide to change some of the
>> +interrupt moderation configuration fields. The data sample is composed of data
>> +bandwidth, the number of packets and the number of events. The time between
>> +samples is also measured. Net DIM compares the current and the previous data and
>> +returns an adjusted interrupt moderation configuration object. In some cases,
>> +the algorithm might decide not to change anything.
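
To make the sampling part concrete, this is roughly how I read it, as a
user-space sketch. The struct and function names below are made up for
illustration and are not the Net DIM API; the per-millisecond units and
the handling of a zero time delta are assumptions as well.

```c
#include <stdint.h>

/* Hypothetical raw sample: cumulative counters plus a timestamp. */
struct sample {
	uint64_t time_us;	/* when the sample was taken, in microseconds */
	uint64_t bytes;
	uint64_t packets;
	uint64_t events;	/* interrupts/completion events seen so far */
};

/* Hypothetical derived rates, all per millisecond (assumed unit). */
struct rates {
	uint64_t bytes_per_ms;
	uint64_t packets_per_ms;
	uint64_t events_per_ms;
};

/*
 * Turn two raw samples into rates over the time between them.
 * Returns 0 on success, -1 if no time elapsed (nothing to compute).
 */
static int sample_rates(const struct sample *start,
			const struct sample *end,
			struct rates *out)
{
	uint64_t delta_us = end->time_us - start->time_us;

	if (delta_us == 0)
		return -1;

	/* Scale by 1000 to go from "per microsecond" to "per millisecond". */
	out->bytes_per_ms = (end->bytes - start->bytes) * 1000 / delta_us;
	out->packets_per_ms = (end->packets - start->packets) * 1000 / delta_us;
	out->events_per_ms = (end->events - start->events) * 1000 / delta_us;
	return 0;
}
```

With two samples taken 2000 us apart, a delta of 3000000 bytes, 2000
packets and 20 events gives 1500000 bytes/ms, 1000 packets/ms and
10 events/ms. The driver would only hand over the raw counters; the
division above is what I assume the algorithm does internally.
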
>> +The configuration fields are
>> +the minimum duration (microseconds) allowed between events and the maximum
>> +number of wanted packets per event. The Net DIM algorithm ascribes importance to
>> +increase bandwidth over reducing interrupt rate.
>> +
>> +
>> +Part II: The Net DIM Algorithm
>> +===============================
>> +
>> +Each iteration of the Net DIM algorithm follows these steps:
>> +1. Calculates new data sample.
>> +2. Compares it to previous sample.
>> +3. Makes a decision - suggests interrupt moderation configuration fields.
>> +4. Applies a schedule work function, which applies suggested configuration.
>> +
>> +The first two steps are straightforward, both the new and the previous data are
>> +supplied by the driver registered to Net DIM. The previous data is the new data
>> +supplied to the previous iteration. The comparison step checks the difference
>> +between the new and previous data and decides on the result of the last step.
>> +A step would result as "better" if bandwidth increases and as "worse" if
>> +bandwidth reduces. If there is no change in bandwidth, the packet rate is
>> +compared in a similar fashion - increase == "better" and decrease == "worse".
>> +In case there is no change in the packet rate as well, the interrupt rate is
>> +compared. Here the algorithm tries to optimize for lower interrupt rate so an
>> +increase in the interrupt rate is considered "worse" and a decrease is
>> +considered "better". Step #2 has an optimization for avoiding false results: it
>> +only considers a difference between samples as valid if it is greater than a
>> +certain percentage. Also, since Net DIM does not measure anything by itself, it
>> +assumes the data provided by the driver is valid.
>> +
>> +Step #3 decides on the suggested configuration based on the result from step #2
>> +and the internal state of the algorithm.
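
For what it is worth, the comparison in step #2 as described above could
be sketched like this. Again a user-space illustration only: the names
are hypothetical, the 10% threshold is an assumption since the text only
says "a certain percentage", and treating any change from a zero
reference as significant is my own choice.

```c
#include <stdint.h>

/* Hypothetical per-sample rates, same idea as the data sample above. */
struct rates {
	uint64_t bytes_per_ms;
	uint64_t packets_per_ms;
	uint64_t events_per_ms;
};

enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };

/* Assumed threshold: differences of 10% or less are treated as noise. */
#define SIGNIFICANT_PCT 10

/* Non-zero if val differs from ref by more than SIGNIFICANT_PCT percent. */
static int significant_diff(uint64_t val, uint64_t ref)
{
	uint64_t diff = val > ref ? val - ref : ref - val;

	if (ref == 0)
		return val != 0;	/* assumption: any change from zero counts */
	return (100 * diff) / ref > SIGNIFICANT_PCT;
}

/* Bandwidth first, then packet rate, then interrupt rate (lower is better). */
static enum cmp_result compare_rates(const struct rates *curr,
				     const struct rates *prev)
{
	if (significant_diff(curr->bytes_per_ms, prev->bytes_per_ms))
		return curr->bytes_per_ms > prev->bytes_per_ms ?
		       CMP_BETTER : CMP_WORSE;

	if (significant_diff(curr->packets_per_ms, prev->packets_per_ms))
		return curr->packets_per_ms > prev->packets_per_ms ?
		       CMP_BETTER : CMP_WORSE;

	if (significant_diff(curr->events_per_ms, prev->events_per_ms))
		return curr->events_per_ms < prev->events_per_ms ?
		       CMP_BETTER : CMP_WORSE;

	return CMP_SAME;
}
```

So a 20% bandwidth increase is "better", a 5% increase is ignored and
falls through to the packet rate, and with bandwidth and packet rate
unchanged a doubled interrupt rate is "worse". Having something along
these lines in the document, with the real threshold, would make the
ordering of the three checks obvious.
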
>> +The states reflect the "direction" of
>> +the algorithm: is it going left (reducing moderation), right (increasing
>> +moderation) or standing still. Another optimization is that if a decision
>> +to stay still is made multiple times, the interval between iterations of the
>> +algorithm would increase in order to reduce calculation overhead. Also, after
>> +"parking" on one of the most left or most right decisions, the algorithm may
>> +decide to verify this decision by taking a step in the other direction. This is
>> +done in order to avoid getting stuck in a "deep sleep" scenario. Once a
>> +decision is made, an interrupt moderation configuration is selected from
>> +the predefined profiles.
>
> I think a short description of the predefined profiles could help.

Agreed, it would help if the different modes
(NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE,
NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE) were expanded a bit further. The
whole term QE sounds very much like Ethernet converged adapter
terminology to me...

>
>> +
>> +The last step is to notify the registered driver that it should apply the
>> +suggested configuration. This is done by scheduling a work function, defined by
>> +the Net DIM API and provided by the registered driver.
>> +
>> +As you can see, Net DIM itself does not actively interact with the system. It
>> +would have trouble making the correct decisions if the wrong data is supplied to
>> +it and it would be useless if the work function would not apply the suggested
>> +configuration. This does, however, allow the registered driver some room for
>> +manoeuvre as it may provide partial data or ignore the algorithm suggestion
>> +under some conditions.
>> +
>> +
>> +Part III: Registering a Network Device to DIM
>> +==============================================
>> +
>> +Net DIM API exposes the main function net_dim(struct net_dim *dim,
>> +struct net_dim_sample end_sample).
>> +This function is the entry point to the Net
>> +DIM algorithm and has to be called every time the driver would like to check if
>> +it should change interrupt moderation parameters. The driver should provide two
>
> Is it completely up to the driver to decide when to call net_dim()?
> So it could be based on TX traffic, RX traffic, time, queue depths, etc.?
>
>> +data structures: struct net_dim and struct net_dim_sample. Struct net_dim
>> +describes the state of DIM for a specific object (RX queue, TX queue,
>> +other queues, etc.). This includes the current selected profile, previous data
>> +samples, the callback function provided by the driver and more.
>> +Struct net_dim_sample describes a data sample, which will be compared to the
>> +data sample stored in struct net_dim in order to decide on the algorithm's next
>> +step. The sample should include bytes, packets and interrupts, measured by
>> +the driver.
>> +
>> +In order to use Net DIM from a networking driver, the driver needs to call the
>> +main net_dim() function. The recommended method is to call net_dim() on each
>> +interrupt. Since Net DIM has a built-in moderation and it might decide to skip
>
> (continuing my question from above:)
> or on each interrupt. But the hardware could also be doing interrupt mitigation,
> so each interrupt doesn't always correlate to anything specific.
>
>> +iterations under certain conditions, there is no need to moderate the net_dim()
>> +calls as well. As mentioned above, the driver needs to provide an object of type
>> +struct net_dim to the net_dim() function call. It is advised for each entity
>> +using Net DIM to hold a struct net_dim as part of its data structure and use it
>> +as the main Net DIM API object. The struct net_dim_sample should hold the latest
>> +bytes, packets and interrupts count. No need to perform any calculations, just
>> +include the raw data.
>> +
>> +The net_dim() call itself does not return anything.
>> +Instead Net DIM relies on
>> +the driver to provide a callback function, which is called when the algorithm
>> +decides to make a change in the interrupt moderation parameters. This callback
>> +will be scheduled and run in a separate thread in order not to add overhead to
>> +the data flow. After the work is done, Net DIM algorithm needs to be set to
>> +the proper state in order to move to the next iteration.
>> +
>> +
>> +Part IV: Example
>> +=================
>> +
>> +The following code demonstrates how to register a driver to Net DIM. The actual
>> +usage is not complete but it should make the outline of the usage clear.
>> +
>> +my_driver.c:
>> +
>> +#include
>> +
>> +/* Callback for net DIM to schedule on a decision to change moderation */
>> +void my_driver_do_dim_work(struct work_struct *work)
>> +{
>> +	/* Get struct net_dim from struct work_struct */
>> +	struct net_dim *dim = container_of(work, struct net_dim,
>> +					   work);
>> +	/* Do interrupt moderation related stuff */
>> +	...
>> +
>> +	/* Signal net DIM work is done and it should move to next iteration */
>> +	dim->state = NET_DIM_START_MEASURE;
>> +}
>> +
>> +/* My driver's interrupt handler */
>> +int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
>> +{
>> +	...
>> +	/* A struct to hold current measured data */
>> +	struct net_dim_sample dim_sample;
>> +	...
>> +	/* Initiate data sample struct with current data */
>> +	net_dim_sample(my_entity->events,
>> +		       my_entity->packets,
>> +		       my_entity->bytes,
>> +		       &dim_sample);
>> +	/* Call net DIM */
>> +	net_dim(&my_entity->dim, dim_sample);
>> +	...
>> +}
>> +
>> +/* My entity's initialization function (my_entity was already allocated) */
>> +int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
>> +{
>> +	...
>> +	/* Initiate struct work_struct with my driver's callback function */
>> +	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
>> +	...
>> +}
>>
>
> Reviewed-by: Randy Dunlap
>
> thanks,
>
-- 
Florian